This is the second part of a ten-part post on the foundation of our understanding of high energy physics, which is Richard Feynman's functional integral. The first part is Action, and the following parts, which will appear at intervals of about a month, are Electromagnetism, Action for Fields, Radiation in an Oven, Matrix Multiplication, The Functional Integral, Gauge Invariance, Photons, and Interactions.

I'm hoping this blog will be fun and useful for everyone with an interest in science, so although a few formulae will pop up again, I'll keep them friendly by explaining all the pieces, as in the first part of the post. Please feel free to ask a question in the Comments if anything in the post seems unclear.

The clue that led to the discovery of quantum mechanics, whose principles are summarized in Feynman's functional integral, came from attempts to apply discoveries about heat and temperature to electromagnetic radiation. Today I would like to tell you about some of those discoveries.

Around 60 BC, Titus Lucretius Carus suggested in an epic poem, "On the Nature of Things," that matter consists of indivisible atoms moving incessantly in an otherwise empty void. In 1738 Daniel Bernoulli proposed that the pressure and temperature of gases are consequences of the random motions of large numbers of molecules. The theory was not immediately accepted, but around the middle of the nineteenth century, John James Waterston, August Krönig, Rudolf Clausius, and James Clerk Maxwell discovered that two everyday observations, that things tend to have a temperature which can increase or decrease, and that the temperatures of adjacent objects tend to change towards a common intermediate value, follow from the random behaviour of large numbers of microscopic objects subject to Newton's laws, in particular the conservation of total energy.

To understand how this happens, it is helpful to know about the rate of change with $ y$ of an expression such as $ c^y$, which means $ c$ to the power $ y$, where $ c$ is a fixed number greater than 0, and $ y$ could for example be time or a position coordinate, measured in units of a fixed amount of time or a fixed distance. $ y$ does not have to be a whole number, since any number $ y$ can be approximated as accurately as desired by a ratio of the form $ \frac{a}{b}$, where $ a$ and $ b$ are whole numbers, and $ c^{\frac{a}{b}}$ is the $ a$'th power of the $ b$'th root of $ c$. The rate of change with $ y$ of $ c^y$ at $ y = 0$, which we can write as $ \left( \frac{\mathrm{d}}{\mathrm{d} y} c^y \right)_0$, is called the natural logarithm of $ c$, and usually written as $ \mathrm{\ln} \left( c \right)$. Logarithms were studied in the early seventeenth century by John Napier. I explained the meaning of an expression like $ \frac{\mathrm{d}}{\mathrm{d} y}$ in the first part of the post, here.

For example, this diagram shows $ 2^y$ in blue, for $ y$ in the range $ - 1$ to 1. The red line is the straight line with the same rate of change with $ y$ as $ 2^y$ at $ y = 0$, and from its value $ \simeq 1.7$ at $ y = 1$, we can read from the graph that $ \mathrm{\ln} \left( 2 \right) \simeq \frac{1.7 - 1.0}{1} 
= 0.7$. The symbol $ \simeq$ means, "approximately equal to."

We have:

$\displaystyle \mathrm{\ln} \left( c \right) = \frac{c^{\mathrm{d} y} - c^0}{\mathrm{d} y} 
= \frac{c^{\mathrm{d} y} - 1}{\mathrm{d} y}, $

since $ c^0 = 1$ for all $ c$ greater than 0. Thus $ \mathrm{\ln} \left( 1 
\right) = 0$, since $ 1^y = 1$ for all $ y$. Leibniz's $ \mathrm{d}$ indicates that the formula is to be taken in the limit where the size of the tiny quantity $ \mathrm{d} y$ tends to 0.
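As a quick numerical check of the formula above, here is a short Python sketch (the function name and step sizes are my own choices) that watches $ \frac{c^{\mathrm{d} y} - 1}{\mathrm{d} y}$ settle down as $ \mathrm{d} y$ shrinks:

```python
import math

def ln_estimate(c, dy):
    # Approximate rate of change of c**y at y = 0, using a small step dy
    return (c**dy - 1.0) / dy

for dy in (0.1, 0.01, 0.001, 0.000001):
    print(dy, ln_estimate(2.0, dy))

# For comparison, the exact value of ln(2)
print(math.log(2.0))
```

As $ \mathrm{d} y$ shrinks, the estimates settle towards $ \mathrm{\ln} \left( 2 \right) \simeq 0.693$, consistent with the graphical estimate of about 0.7 above.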

We now find:

$\displaystyle \frac{\mathrm{d}}{\mathrm{d} y} c^y = \frac{c^{y + \mathrm{d} y} - c^y}{\mathrm{d} y} = c^y \frac{c^{\mathrm{d} y} - 1}{\mathrm{d} y} = \left( \mathrm{\ln} \left( c \right) \right) c^y . $

The number $ \mathrm{e}$ such that $ \mathrm{\ln} \left( \mathrm{e} \right) = 1$ is sometimes called Napier's number, and more commonly Euler's number. From the above formula, we have:

$\displaystyle \frac{\mathrm{d}}{\mathrm{d} y} \mathrm{e}^y = \mathrm{e}^y . $

For any fixed number $ a$, we have:

$\displaystyle \frac{\mathrm{d}}{\mathrm{d} y} \mathrm{e}^{ay} = \frac{\mathrm{e}^{a \left( y + \mathrm{d} y \right)} - \mathrm{e}^{ay}}{\mathrm{d} y} = a \mathrm{e}^{ay} \frac{\mathrm{e}^{a \mathrm{d} y} - 1}{a \mathrm{d} y} = a \mathrm{e}^{ay} \frac{\mathrm{e}^{\mathrm{d} y} - 1}{\mathrm{d} y} = a \mathrm{e}^{ay}, $

where the second-from-last equality follows because the limit of $ \frac{\mathrm{e}^{a \mathrm{d} y} - 1}{a \mathrm{d} y}$ as the tiny quantity $ \mathrm{d} y$ tends to 0 is the same as the limit of $ \frac{\mathrm{e}^{\mathrm{d} y} - 1}{\mathrm{d} y}$ as $ \mathrm{d} y$ tends to 0, since for fixed $ a$, $ a \mathrm{d} y$ is also a tiny quantity that tends to 0 as $ \mathrm{d} y$ does.

From the above equation with $ a$ chosen to be $ \mathrm{\ln} \left( c \right)$, we find that

$\displaystyle \mathrm{e}^{\left( \mathrm{\ln} \left( c \right) \right) y} = c^y $

for all $ y$, because both these expressions are equal to 1 for $ y = 0$, and they both satisfy the equation $ \frac{\mathrm{d}}{\mathrm{d} y} X = \left( 
\mathrm{\ln} \left( c \right) \right) X$. This equation fixes $ X \left( y 
\right)$ for all $ y$ once $ X \left( 0 \right)$ is given, because $ X \left( y 
\right)$ can be calculated from $ X \left( 0 \right)$ as accurately as desired, by dividing up the interval from 0 to $ y$ into a great number of sufficiently tiny intervals, and calculating the approximate value of $ X \left( y 
\right)$ at the end of each tiny interval from its approximate value at the start of that interval, by using $ X \left( y + \varepsilon \right) \simeq X \left( y 
\right) + \left( \frac{\mathrm{d} X}{\mathrm{d} y} \right)_y \varepsilon$.
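The step-by-step reconstruction described above can be tried out numerically. Here is a minimal Python sketch (the step count and the names are mine) that integrates $ \frac{\mathrm{d} X}{\mathrm{d} y} = \left( \mathrm{\ln} \left( c \right) \right) X$ from $ X \left( 0 \right) = 1$ and recovers $ c^y$ at $ y = 1$:

```python
import math

def integrate(a, y_end, steps=100000):
    # Reconstruct X(y_end) from X(0) = 1 using the stepping rule
    # X(y + eps) ~ X(y) + (dX/dy) * eps, with dX/dy = a * X
    eps = y_end / steps
    X = 1.0
    for _ in range(steps):
        X += a * X * eps
    return X

c = 2.0
# Integrating dX/dy = ln(c) * X up to y = 1 should reproduce c**1 = 2
print(integrate(math.log(c), 1.0))
```

Increasing the number of tiny intervals makes the reconstructed value as close to $ c^y$ as desired.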

From the above equation at $ y = 1$, we find that

$\displaystyle \mathrm{e}^{\mathrm{\ln} \left( c \right)} = c, $

for every number $ c$ greater than 0. Thus for any numbers $ a$ and $ b$, both greater than 0, we have $ \mathrm{e}^{\mathrm{\ln} \left( ab \right)} = ab = \mathrm{e}^{\mathrm{\ln} \left( a \right)} \mathrm{e}^{\mathrm{\ln} \left( b \right)} = \mathrm{e}^{\mathrm{\ln} \left( a \right) + \mathrm{\ln} \left( b \right)}$. Thus:

$\displaystyle \mathrm{\ln} \left( ab \right) = \mathrm{\ln} \left( a \right) + 
\mathrm{\ln} \left( b \right) . $

If $ X = \mathrm{e}^y$, so that the symbol $ X$, like the expression $ \mathrm{e}^y$, represents the collection of data that gives the value of the $ y$-dependent quantity $ \mathrm{e}^y$ at each value of $ y$, then from the formula $ \frac{\mathrm{d}}{\mathrm{d} y} \mathrm{e}^y = \mathrm{e}^y$ above, we have $ \frac{\mathrm{d} X}{\mathrm{d} y} = X$, and from the formula $ \mathrm{e}^{\mathrm{\ln} \left( c \right)} = c$ above, we have $ y = \mathrm{\ln} \left( X \right)$, for all values of $ X$ greater than 0. Thus $ \frac{\mathrm{d} X}{\mathrm{d} \left( \mathrm{\ln} \left( X \right) \right)} = X$, for all $ X$ greater than 0, so:

$\displaystyle \frac{\mathrm{d} \left( \mathrm{\ln} \left( X \right) \right)}{\mathrm{d} 
X} = \frac{1}{X}, $

for all values of $ X$ greater than 0. We can treat $ \frac{\mathrm{d} 
X}{\mathrm{d} y}$ as an ordinary ratio when taking its reciprocal, as I did here, because the rate of change of $ X$ with $ y$ is the reciprocal of the rate of change of $ y$ with $ X$.

To calculate the value of Napier's number $ \mathrm{e}$, we observe first that for all positive whole numbers $ n$:

$\displaystyle \frac{\mathrm{d}}{\mathrm{d} y} y^n = ny^{n - 1} . $

The above formula is true for $ n = 1$, since $ \frac{\mathrm{d} y}{\mathrm{d} y} = 1$, and if the formula is established for $ y^n$, then from Leibniz's rule for the rate of change of a product, which we obtained in the first part of the post here, we have:

$\displaystyle \frac{\mathrm{d}}{\mathrm{d} y} y^{n + 1} = \frac{\mathrm{d}}{\mathrm{d} y} \left( y^n y \right) = \left( \frac{\mathrm{d}}{\mathrm{d} y} y^n \right) y + y^n \frac{\mathrm{d} y}{\mathrm{d} y} = ny^{n - 1} y + y^n = \left( n + 1 \right) y^n . $

Thus if the formula is established for $ n$ then it is established for $ n + 1$, so it is established for all the positive whole numbers $ n = 1, 2, 3, \ldots$ in succession.
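The power rule just established can be spot-checked with a difference quotient, in the same spirit as the definition of $ \frac{\mathrm{d}}{\mathrm{d} y}$. A small Python sketch (the evaluation point 1.5 is an arbitrary choice of mine):

```python
def derivative(f, y, dy=1e-6):
    # Symmetric difference quotient, a good approximation to df/dy
    return (f(y + dy) - f(y - dy)) / (2 * dy)

# Check d/dy y**n = n * y**(n-1) at the point y = 1.5
for n in range(1, 6):
    print(n, derivative(lambda y: y**n, 1.5), n * 1.5**(n - 1))
```

The two printed columns agree to many decimal places for each $ n$.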

Therefore:

$\displaystyle \mathrm{e}^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \frac{y^4}{4!} + 
\ldots = \sum_{n = 0}^{\infty} \frac{y^n}{n!}, $

where $ 0!$ is defined to be 1, and for each positive whole number $ n$, $ n!$ is defined to be the product of all the whole numbers from 1 to $ n$. The exclamation mark $ !$ is usually read as "factorial". The $ \ldots$ mean that the sum continues in accordance with the pattern shown by the terms before the $ \ldots$. The symbol $ \sum$ is the Greek letter Sigma, and is called the summation sign. I explained its meaning in the first part of the post, here. The symbol $ \infty$ above the $ \sum$, which is read as "infinity", means that the sum is unending.

The reason the above formula for $ \mathrm{e}^y$ is true is that the sum on the right-hand side of the formula is equal to 1 for $ y = 0$, and it satisfies the same equation $ \frac{\mathrm{d} X}{\mathrm{d} y} = X$ as $ \mathrm{e}^y$ does. So for the same reason as we discussed above, the sum on the right-hand side of the formula is equal to $ \mathrm{e}^y$, for all $ y$. The reason the expression $ X = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \frac{y^4}{4!} + \ldots$ satisfies the equation $ \frac{\mathrm{d} X}{\mathrm{d} y} = X$ is that $ \frac{\mathrm{d}}{\mathrm{d} y}$ on the first term in $ X$ gives 0, and $ \frac{\mathrm{d}}{\mathrm{d} y}$ on each term in $ X$ after the first gives the preceding term in $ X$, since from our observation above, $ \frac{\mathrm{d}}{\mathrm{d} y} \frac{y^n}{n!} = \frac{ny^{n - 1}}{n!} = \frac{y^{n - 1}}{\left( n - 1 \right) !}$.

This argument that the expression $ X 
= 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \frac{y^4}{4!} + \ldots$ satisfies the equation $ \frac{\mathrm{d} X}{\mathrm{d} y} = X$ assumes that the endless sum tends to a finite limiting value as more and more terms are added, no matter how large the magnitude $ \left\vert y \right\vert$ of $ y$ is. In fact, the expression $ \frac{y^n}{n!}$ increases in magnitude with increasing $ n$ for $ n < \left\vert y \right\vert$, and then starts decreasing in magnitude more and more rapidly with increasing $ n$, so that the endless sum always does tend to a finite limiting value, no matter how large $ \left\vert y \right\vert$ is. If $ n$ is larger than $ 2 \left\vert y \right\vert$, then the endless sum of all the terms from $ \frac{y^n}{n!}$ onwards does not exceed $ \frac{\left\vert y \right\vert^n}{n!} \left( 1 + \frac{1}{2} + \frac{1}{4} + 
\frac{1}{8} + \ldots \right) = 2 \frac{\left\vert y \right\vert^n}{n!}$ in magnitude. The endless sum $ 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \ldots$ approaches 2 when it is continued without end, because each successive term halves the difference between 2 and the sum of the terms up to that point, as shown in this diagram.
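The behaviour of the terms $ \frac{y^n}{n!}$, and the bound on the endless sum of the remaining terms, can be checked numerically; in this Python sketch the choices $ y = 10$ and $ n = 25$ are mine:

```python
import math

y = 10.0  # a deliberately large magnitude for y

# The terms y**k / k! grow while k < |y|, then shrink
terms = [y**k / math.factorial(k) for k in range(61)]
assert all(terms[k + 1] > terms[k] for k in range(9))
assert all(terms[k + 1] < terms[k] for k in range(10, 60))

# For n larger than 2|y|, the sum of all the remaining terms
# stays below the bound 2 * y**n / n!
n = 25
tail = sum(y**k / math.factorial(k) for k in range(n, 150))
print(tail, 2 * y**n / math.factorial(n))
```

The computed tail is below the stated bound, so the endless sum converges even for this large value of $ \left\vert y \right\vert$.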

The above formula expressing $ \mathrm{e}^y$ as a sum of powers of $ y$ is an example of a "Taylor series," named after Brook Taylor. For $ y = 1$, it gives:

$\displaystyle \mathrm{e} = \mathrm{e}^1 = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + 
\frac{1}{4!} + \ldots = \sum_{n = 0}^{\infty} \frac{1}{n!} \simeq 2.718. $

The sum of the first 7 terms is sufficient to obtain the result to 3 decimal places.
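Here is the corresponding arithmetic in Python, summing the first 7 terms of the series:

```python
import math

total = 0.0
for n in range(7):  # the first 7 terms are n = 0, 1, ..., 6
    total += 1.0 / math.factorial(n)

print(round(total, 3))  # 2.718, agreeing with e to 3 decimal places
print(math.e)
```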

Let's now consider again, as in the first part of the post, here, the example of a collection of objects, such that each object behaves approximately as though its mass is concentrated at a single point, the objects are moving slowly compared to the speed of light, and the forces between the objects arise from their potential energy $ V$, which depends on their positions but not on their motions. We'll continue to assume that their motions are governed by Newton's laws, and thus by de Maupertuis's principle of stationary action, which I explained in the first part of the post, here, and we'll now assume that the objects are microscopic and their number is very large, so they could be atoms in solids, liquids, gases, or living things. We'll use the same notation as in the first part of the post, here.

We'll use Cartesian coordinates for the positions of the objects, as we did when we derived Newton's second law of motion from de Maupertuis's principle, in the first part of the post, here, and we'll now assume that the potential energy $ V$ depends only on the relative positions of the objects, as in the example of the gravitational potential energy, so that the value of $ V$ is unaltered if the positions $ x_I$ of the objects are shifted by a common displacement $ d$, so that $ x_{a I} 
\rightarrow x_{a I} + d_a$, for all $ 1 \leq I \leq N$, $ 1 \leq a \leq 3$. The symbol $ \leq$ means "less than or equal to."

By adding up Newton's second law of motion for the $ N$ objects, which we obtained from de Maupertuis's principle in the first part of the post, here, we find:

$\displaystyle \sum_{I = 1}^N \frac{\mathrm{d}}{\mathrm{d} t} \left( m_I \frac{\mathrm{d} x_{a I}}{\mathrm{d} t} \right) + \sum_{I = 1}^N \left( \frac{\partial V}{\partial x_{a I}} \right)_x = 0. $

From the formula we derived in the first part of the post, here, for the change of $ V$ when $ x$ is replaced by $ x + \varepsilon$ with $ \varepsilon$ small, so that the positions $ x_I$ of the objects are shifted by arbitrary small displacements $ \varepsilon_I$, we find that if we choose all the $ \varepsilon_I$ to be the same small displacement $ d$, so that $ \varepsilon_{a I} = d_a$ for all $ 1 \leq I \leq N$, $ 1 \leq a \leq 3$, then:

$\displaystyle 0 = V \left( x + \varepsilon \right) - V \left( x \right) \simeq \sum_{I = 1}^N \sum_{a = 1}^3 \left( \frac{\partial V}{\partial x_{a I}} \right)_x \varepsilon_{a I} = \sum_{I = 1}^N \sum_{a = 1}^3 \left( \frac{\partial V}{\partial x_{a I}} \right)_x d_a, $

in consequence of the assumption that $ V$ depends only on the relative positions of the objects. This is true for an arbitrary small displacement $ d$, so it's true in particular if $ d_a = \varepsilon \delta_{a b}$, where $ \varepsilon$ now represents an arbitrary small number rather than a whole collection of data as before, $ b$ is any of the numbers 1, 2, or 3, and $ \delta_{a b}$ is the Kronecker delta symbol that I mentioned in the first part of the post, here, which is defined to be 1 if its two indices are equal, and 0 otherwise. The error of the above formula tends to 0 more rapidly than in proportion to $ \varepsilon$ as $ \varepsilon$ approaches 0, so we have:

$\displaystyle 0 = \sum_{I = 1}^N \sum_{a = 1}^3 \left( \frac{\partial V}{\partial x_{a I}} \right)_x \delta_{a b} = \sum_{I = 1}^N \left( \frac{\partial V}{\partial x_{b I}} \right)_x, $

for all the relevant values 1, 2, and 3 of $ b$. Combining this result with the formula we obtained above by adding up Newton's law of motion for the $ N$ objects, we find:

$\displaystyle \sum_{I = 1}^N \frac{\mathrm{d}}{\mathrm{d} t} \left( m_I \frac{\mathrm{d} 
x_{a I}}{\mathrm{d} t} \right) = 0. $

The expression $ \frac{\mathrm{d} x_I}{\mathrm{d} t}$ represents the velocity of the $ I$'th object, and the product of an object's mass and its velocity is called its momentum. I shall let $ p$ represent the collection of data that gives the momenta of all the objects at each moment in time, so that $ p_{a I 
t} = m_I \left( \frac{\mathrm{d} x_{a I}}{\mathrm{d} t} \right)_t$ for all $ 1 \leq a \leq 3$, $ 1 \leq I \leq N$, and times $ t$. The above result can then be written:

$\displaystyle \frac{\mathrm{d}}{\mathrm{d} t} \sum_{I = 1}^N p_I = 0, $

which is usually referred to as the conservation of total momentum.
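Conservation of total momentum can be seen in a simple simulation. The sketch below uses two objects in one dimension with a made-up spring-like potential $ V = \frac{k}{2} \left( x_1 - x_2 - L \right)^2$ that depends only on their relative position; all the numbers are illustrative:

```python
# Two objects in one dimension, interacting through a potential
# that depends only on their relative position
m1, m2 = 1.0, 3.0
x1, x2 = 0.0, 2.0
p1, p2 = 0.5, -0.5   # total momentum starts at 0
k, L, dt = 2.0, 1.0, 0.001

for _ in range(10000):
    F = -k * (x1 - x2 - L)  # force on object 1; object 2 feels -F
    p1 += F * dt            # Newton's second law, one small time step
    p2 += -F * dt
    x1 += (p1 / m1) * dt
    x2 += (p2 / m2) * dt

print(p1 + p2)  # the total momentum keeps its initial value
```

Because the potential depends only on $ x_1 - x_2$, the forces on the two objects are equal and opposite at every step, so each step leaves $ p_1 + p_2$ unchanged.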

If the positions and momenta of all the objects are specified at one particular time, then their values at every other time are determined by Newton's second law of motion, which we obtained in the first part of the post, here, from de Maupertuis's principle. We'll now divide the range of the possible positions and momenta of the objects into equal size "bins", and ask what the most likely number of objects in each bin will be, if the objects are randomly distributed among the bins, subject to the total energy of the objects having a fixed value $ E$.

We'll assume that each bin is sufficiently small that we can treat the positions and momenta of objects in the same bin as approximately equal to one another, but also sufficiently large that the number of objects in a typical bin will be large. For this to be possible, we'll assume that the total number of objects, $ N$, is very large. This is reasonable for things in the everyday world, since the number of atoms in a kilogram of matter is in the range from about $ 10^{25}$ to $ 10^{27}$.

We'll allow for the possibility that there could be a number of different types of object, such that the masses and interactions of objects of the same type are either identical or very similar to one another, so that the kinetic energy $ T$ and the potential energy $ V$ are either exactly or approximately unaltered if the positions and momenta of two objects of the same type are swapped. Objects of different types could be different types of atom, or atoms of the same type in different situations. For example we'll treat two oxygen atoms as different types of object if they form parts of gas molecules contained in separate containers, or if one is part of a gas molecule and the other is part of the wall of a glass container. We'll assume that the number of objects of each different type is very large.

We'll assume that the total momentum of the objects is 0, so that the position of their centre of mass, which is $ \sum_{I = 1}^N m_I x_I$ divided by the total mass, is independent of time, and we'll assume that if any of the objects are parts of liquid or gas molecules, then some of the other objects form solid containers that prevent the liquids or gases from spreading without limit. The number of relevant bins is therefore finite, because the position coordinates of all the objects are bounded, and the momenta of the objects are also bounded, because we assumed above that the objects are moving slowly compared to the speed of light. We'll denote the number of relevant bins by $ B$.

I shall let $ m$ represent the collection of data that gives the total number of objects of each type, so that if $ j$ represents one of the different types of object, then $ m_j$ is the total number of objects of type $ j$, and I shall let $ n$ represent the collection of data that gives the number of objects of each type in each of the bins into which the range of possible positions and momenta of the objects has been divided, so that if $ s$ represents one of the bins, then $ n_{j s}$ is the number of objects of type $ j$ in bin $ s$.

The objects can be distinguished from one another even if they are of the same type and identical to one another, because we can trace their motions back to a particular time, and "label" identical objects by the positions and momenta they had at that time. The number of different assignments of the $ m_j$ objects of type $ j$ to the bins is $ B^{m_j}$, because each of the objects can be assigned independently to any of the $ B$ bins, and of these $ B^{m_j}$ assignments, the number such that $ n_{j 1}$ of the objects of type $ j$ are in bin 1, $ n_{j 2}$ of them are in bin 2, and so on, is:

$\displaystyle \frac{m_j !}{n_{j 1} !n_{j 2} !n_{j 3} ! \ldots n_{j B} !}, $

where $ M!$ was defined above, for non-negative whole numbers $ M$, to be the product of all the whole numbers from 1 to $ M$, with the empty product for $ M = 0$ giving $ 0! = 1$.

To understand why the above formula gives the number of different assignments of the $ m_j$ objects of type $ j$ to the $ B$ bins, such that for each whole number $ s$ in the range 1 to $ B$, the number of objects of type $ j$ in bin $ s$ is $ n_{j s}$, we note first that the number of different ways of putting $ M$ distinguishable objects in $ M$ distinguishable places, such that exactly one object goes to each place, is $ M!$, because we can put the first object in any of the $ M$ places, the second object in any of the remaining $ M - 1$ places, and so on. So if there were $ n_{j s}$ distinguishable places in bin $ s$, for each $ s$ in the range 1 to $ B$, then the number of different ways of putting the $ m_j$ objects in these $ n_{j 1} + n_{j 2} + \ldots + n_{j B} = m_j$ distinct places would be $ m_j !$. This overcounts the number of different assignments of the objects to the bins by a factor $ n_{j 1} !n_{j 2} !n_{j 3} ! \ldots n_{j B} !$, because we can divide up the $ m_j !$ arrangements into classes, such that arrangements are in the same class if they only differ by permuting objects within bins. Each class then corresponds to a different assignment of the objects to the bins. The number of the $ m_j !$ arrangements in each class is $ n_{j 1} !n_{j 2} !n_{j 3} ! \ldots n_{j B} !$, so the number of different classes is $ \frac{m_j !}{n_{j 1} !n_{j 2} !n_{j 3} ! \ldots n_{j B} !}$.
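The counting argument can be verified by brute force for small numbers. This Python sketch (my own toy sizes: 5 distinguishable objects of one type, $ B = 3$ bins) enumerates all the assignments and compares with the formula:

```python
from itertools import product
from math import factorial
from collections import Counter

m_j, B = 5, 3  # 5 distinguishable objects, 3 bins

# Tally, over all B**m_j assignments, how often each occupancy
# pattern (n1, n2, n3) occurs
counts = Counter()
for assignment in product(range(B), repeat=m_j):
    occupancy = tuple(assignment.count(s) for s in range(B))
    counts[occupancy] += 1

assert sum(counts.values()) == B**m_j  # 243 assignments in all

# Compare one occupancy pattern with m_j! / (n1! * n2! * n3!)
formula = factorial(5) // (factorial(2) * factorial(2) * factorial(1))
print(counts[(2, 2, 1)], formula)  # both 30
```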

The number of different assignments of all $ N$ objects to the $ B$ bins is $ B^N$, and of these, the number such that $ n_{j s}$ of the objects of type $ j$ are in bin $ s$, for all $ j$ and $ s$, which I shall denote by $ W_n$, is the product of the above number over $ j$, which we can write as:

$\displaystyle W_n = \prod_j \frac{m_j !}{n_{j 1} !n_{j 2} !n_{j 3} ! \ldots n_{j B} !} . 
$

The symbol $ \prod$ is the upper-case Greek letter Pi, and indicates a product of what follows it. It works in the same way as I described in the first part of the post, here, for $ \sum$, except that instead of forming the sum of the expression that follows it for all the specified values of the specified index, we form the product. If no specific range of the index is specified, then the product is over all relevant values of the index, and thus here over all the different types of object.

The total energy $ \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} 
\right)_n$ of the objects, when the numbers of objects of the different types in the different bins are given by the collection of data $ n$, is approximately:

$\displaystyle \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n \simeq \sum_j 
\sum_{s = 1}^B n_{j s} E_{j s}, $

where $ E_{j s}$ is the energy of an object of type $ j$ in the centre of bin $ s$. If we randomly drop the $ N$ objects into the $ B$ bins, and discard the result unless the total energy $ \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} 
\right)_n$ differs from $ E$ by at most a fixed small amount, then the probability that the numbers of objects of the different types in the different bins are given by $ n$ is $ W_n$, divided by the sum of $ W_{n'}$ over all $ n'$ such that $ \left( 
E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_{n'}$ is close enough to $ E$. Thus if the objects are randomly distributed among the bins, subject to the total energy of the objects having a fixed value $ E$, then the most likely number of objects of each type in each bin will be given by the distribution $ n$ for which $ W_n$ reaches its maximum value, among all the distributions $ n'$ for which $ \left( 
E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_{n'}$ is approximately equal to $ E$.
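We can see this maximization concretely in a toy example by brute force. In this Python sketch the numbers are my own: one type of object, $ m = 6$ objects, $ B = 3$ bins with energies 0, 1, and 2, and total energy fixed at $ E = 4$:

```python
from itertools import product
from math import factorial

m, energies, E = 6, (0, 1, 2), 4

def W(n):
    # Number of assignments with occupancy n = (n1, n2, n3)
    result = factorial(m)
    for n_s in n:
        result //= factorial(n_s)
    return result

# Among all occupancies with the right totals, find the one
# with the largest number of assignments W
best = None
for n in product(range(m + 1), repeat=3):
    if sum(n) == m and sum(n_s * E_s for n_s, E_s in zip(n, energies)) == E:
        if best is None or W(n) > W(best):
            best = n

print(best, W(best))  # (3, 2, 1), with W = 60
```

The most likely occupancy, $ \left( 3, 2, 1 \right)$, already falls off with increasing bin energy.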

To find the distribution $ n$ for which $ W_n$ reaches its maximum value, subject to $ \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n 
\simeq E$, we'll use the observation that the slope of a smooth hill is zero at the top of the hill. Thus we'll look for the distribution $ n$ such that the rate of change of $ W_n$ with each of the numbers $ n_{j s}$ would be 0, if it were not for the requirement that $ \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n 
\simeq E$. For convenience we'll do the calculation for $ \mathrm{\ln} \left( W_n \right)$ rather than for $ W_n$, so that the product of factors in $ W_n$ becomes the sum of the natural logarithms of those factors, due to the result we found above. This will give the same result for the most likely distribution $ n$, because $ \mathrm{\ln} \left( W_n \right)$ increases with increasing $ W_n$ for all $ W_n$ greater than 0, due to the result we found above, so that the $ n$ that gives the largest value of $ W_n$ will also be the $ n$ that gives the largest value of $ \mathrm{\ln} \left( W_n \right)$.

From the formula above for $ W_n$, we have:

$\displaystyle \mathrm{\ln} \left( W_n \right) = \sum_j \left( \mathrm{\ln} \left( m_j ! \right) - \sum_{s = 1}^B \mathrm{\ln} \left( n_{j s} ! \right) \right) . $

From the above definition of $ M!$, for non-negative whole numbers $ M$, we have:

$\displaystyle \mathrm{\ln} \left( M! \right) = \mathrm{\ln} \left( 1 \right) + \mathrm{\ln} \left( 2 \right) + \ldots + \mathrm{\ln} \left( M - 1 \right) + \mathrm{\ln} \left( M \right) . $

For very large $ M$, we can therefore write:

$\displaystyle \frac{\mathrm{d}}{\mathrm{d} M} \mathrm{\ln} \left( M! \right) \simeq \frac{\mathrm{\ln} \left( \left( M + 1 \right) ! \right) - \mathrm{\ln} \left( M! \right)}{1} = \mathrm{\ln} \left( M + 1 \right), $

where $ \frac{\mathrm{d}}{\mathrm{d} M} \mathrm{\ln} \left( M! \right)$ is the slope of a smooth curve that fits the values of $ M!$ at the whole numbers $ M$. The error of the above approximation changes in proportion to $ \mathrm{\ln} 
\left( M + 1 \right) - \mathrm{\ln} \left( M \right) = \mathrm{\ln} \left( 
\frac{M + 1}{M} \right) = \mathrm{\ln} \left( 1 + \frac{1}{M} \right)$ as $ M$ increases, and from above, we have $ \frac{\mathrm{d} \left( \mathrm{\ln} 
\left( X \right) \right)}{\mathrm{d} X} = \frac{1}{X}$, for all $ X > 0$, where the symbol $ >$ means "greater than". Thus $ \mathrm{\ln} \left( 1 + \frac{1}{M} \right) \simeq \mathrm{\ln} \left( 1 \right) + \left( \frac{\mathrm{d} \left( \mathrm{\ln} \left( X \right) \right)}{\mathrm{d} X} \right)_1 \frac{1}{M} = 0 + \frac{1}{M}$, so the error of the above approximation decreases in proportion to $ \frac{1}{M}$ as $ M$ increases.

And from $ \frac{\mathrm{d} \left( \mathrm{\ln} 
\left( X \right) \right)}{\mathrm{d} X} = \frac{1}{X}$ together with Leibniz's rule for the rate of change of a product, which we obtained in the first part of the post, here, we have:

$\displaystyle \frac{\mathrm{d}}{\mathrm{d} M} \left( \left( M + 1 \right) \mathrm{\ln} \left( M + 1 \right) - M \right) = \frac{\mathrm{d} \left( M + 1 \right)}{\mathrm{d} M} \mathrm{\ln} \left( M + 1 \right) + \left( M + 1 \right) \frac{\mathrm{d} \left( \mathrm{\ln} \left( M + 1 \right) \right)}{\mathrm{d} M} - \frac{\mathrm{d} M}{\mathrm{d} M} $

$\displaystyle = \mathrm{\ln} \left( M + 1 \right) + \left( M + 1 \right) \frac{1}{M + 1} 
- 1 = \mathrm{\ln} \left( M + 1 \right) . $

Thus if we use the above result that $ \frac{\mathrm{d}}{\mathrm{d} M} 
\mathrm{\ln} \left( M! \right) \simeq \mathrm{\ln} \left( M + 1 \right)$ even for $ M$ down to 0, then from the result that the integral of the rate of change is equal to the net change, which we obtained in the first part of the post, here, we have:

$\displaystyle \mathrm{\ln} \left( M! \right) = \mathrm{\ln} \left( M! \right) - \mathrm{\ln} \left( 0! \right) = \int_0^M \left( \frac{\mathrm{d}}{\mathrm{d} M'} \mathrm{\ln} \left( M'! \right) \right) \mathrm{d} M' \simeq \int_0^M \mathrm{\ln} \left( M' + 1 \right) \mathrm{d} M' $

$\displaystyle = \int_0^M \frac{\mathrm{d}}{\mathrm{d} M'} \left( \left( M' + 1 \right) \mathrm{\ln} \left( M' + 1 \right) - M' \right) \mathrm{d} M' = \left( M + 1 \right) \mathrm{\ln} \left( M + 1 \right) - M, $

since, as we noted above, $ 0! = 1$ and $ \mathrm{\ln} \left( 1 \right) = 0$. The above approximation for $ \mathrm{\ln} \left( M! \right)$ is in error by an amount that increases slowly for large $ M$, but its relative error tends to 0 for large $ M$, so it is accurate enough to use for finding the distribution $ n$ for which $ W_n$ reaches its maximum value, since the numbers $ n_{j s}$ will all increase in proportion to the total number of objects $ N$, which we have assumed to be very large. We can also use the simpler approximation:

$\displaystyle \mathrm{\ln} \left( M! \right) \simeq M \mathrm{\ln} \left( M \right) - M, 
$

whose relative error also tends to 0 for large $ M$.
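Both approximations are easy to test against the exact value of $ \mathrm{\ln} \left( M! \right)$; in this Python sketch the exact value comes from the log-gamma function, which satisfies $ \mathrm{lgamma} \left( M + 1 \right) = \mathrm{\ln} \left( M! \right)$:

```python
import math

def ln_factorial(M):
    # Exact ln(M!), via the log-gamma function
    return math.lgamma(M + 1)

for M in (10, 100, 1000, 10000):
    approx1 = (M + 1) * math.log(M + 1) - M   # (M+1) ln(M+1) - M
    approx2 = M * math.log(M) - M             # M ln(M) - M
    exact = ln_factorial(M)
    print(M, (approx1 - exact) / exact, (approx2 - exact) / exact)
```

The printed relative errors of both approximations shrink towards 0 as $ M$ grows.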

Thus we have:

$\displaystyle \frac{1}{N} \mathrm{\ln} \left( W_n \right) \simeq \frac{1}{N} \sum_j \left( m_j \mathrm{\ln} \left( m_j \right) - m_j - \sum_{s = 1}^B \left( n_{j s} \mathrm{\ln} \left( n_{j s} \right) - n_{j s} \right) \right) $

$\displaystyle = \sum_j \left( \frac{m_j}{N} \left( \mathrm{\ln} \left( m_j \right) - \mathrm{\ln} \left( N \right) \right) - \frac{m_j}{N} - \sum_{s = 1}^B \left( \frac{n_{j s}}{N} \left( \mathrm{\ln} \left( n_{j s} \right) - \mathrm{\ln} \left( N \right) \right) - \frac{n_{j s}}{N} \right) \right) $

$\displaystyle = \sum_j \left( \frac{m_j}{N} \mathrm{\ln} \left( \frac{m_j}{N} \right) - \frac{m_j}{N} - \sum_{s = 1}^B \left( \frac{n_{j s}}{N} \mathrm{\ln} \left( \frac{n_{j s}}{N} \right) - \frac{n_{j s}}{N} \right) \right), $

where the second line follows from the fact that $ \sum_{s = 1}^B n_{j s} = 
m_j$, for each type of object $ j$, and the third line follows from the relation $ \mathrm{\ln} \left( ab \right) = \mathrm{\ln} \left( a \right) + 
\mathrm{\ln} \left( b \right)$, which we obtained above, for any numbers $ a$ and $ b$, both greater than 0. Thus in the above approximation, $ \frac{1}{N} 
\mathrm{\ln} \left( W_n \right)$ depends only on the ratios $ \frac{m_j}{N}$ and $ \frac{n_{j s}}{N}$.

It is convenient to think of the ratios $ \frac{n_{j s}}{N}$ as coordinates in a "space", which I shall call the space of bin fractions, since $ \frac{n_{j s}}{N}$ is the fraction of the total number of objects $ N$ which are objects of type $ j$ in bin $ s$. The numbers $ n_{j s}$ are restricted to be whole numbers, but for fixed values of the ratios $ \frac{m_j}{N}$, these numbers will be proportional to $ N$, which we have assumed is very large. Thus the ratios $ \frac{n_{j s}}{N}$ only change by tiny amounts when $ n_{j s}$ change by $ \pm 1$, where the symbol $ \pm$ means "plus or minus," so since $ \mathrm{\ln} \left( X \right)$ depends smoothly on $ X$ for all numbers $ X > 0$, we can think of the coordinates $ \frac{n_{j s}}{N}$ as effectively continuous. If the number of types of object is $ Y$, then the space of bin fractions has $ YB$ dimensions, since a point in this space is specified by the $ YB$ numbers $ \frac{n_{j s}}{N}$.

The equation $ \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n = 
E$ imposes one relation among the $ YB$ coordinates of the space of bin fractions, so it defines a $ \left( YB - 1 \right)$-dimensional "surface" in this space. We are looking for the point $ n$ on this surface at which $ \mathrm{\ln} \left( W_n \right)$ reaches the largest value it takes anywhere on this surface. If we think of $ \mathrm{\ln} \left( W_n \right)$ as the height of a smooth "hill", then since the slope of a smooth hill is 0 in each direction at the top of the hill, the rate of change of $ \mathrm{\ln} \left( W_n \right)$ is 0 in each direction along the surface, at the point $ n$ on the surface where $ \mathrm{\ln} \left( W_n \right)$ reaches its maximum value. However, the rate of change of $ \mathrm{\ln} \left( W_n \right)$ in directions that are not along the surface does not have to be 0 at that point. So we are looking for a point $ n$ on the surface such that the rate of change of $ \mathrm{\ln} \left( W_n \right)$ in every direction, whether along the surface or not, is a multiple of the rate of change of $ \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}}
\right)_n$ in that direction, since the rate of change of $ \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} 
\right)_n$ along the surface is 0.

We can do that by looking for a point $ n$ for which an expression

$\displaystyle \mathrm{\ln} \left( W_n \right) - \beta \left( 
E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n $

reaches its maximum value, where $ \beta$, which is the Greek letter beta, is a fixed number whose value will be chosen later. For requiring that the rate of change of the above expression is 0 in every direction gives the equations:

$\displaystyle \frac{\partial}{\partial n_{j s}} \mathrm{\ln} \left( W_n \right) = \beta \frac{\partial}{\partial n_{j s}} \left( \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n \right) $

for all $ j$ and $ s$, so that the rate of change of $ \mathrm{\ln} \left( W_n \right)$ in every direction is a multiple of the rate of change of $ \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} 
\right)_n$ in that direction. As I explained in the first part of the post, here, the symbol $ \partial$ is an alternative notation for Leibniz's $ \mathrm{d}$, that is usually used to express the rate of change of a quantity that depends on a number of quantities that can vary continuously. The points $ n$ that solve the above equation for different values of $ \beta$ will form a "line" through the space of bin fractions, such that different values of $ \beta$ correspond to different points on the line, and after we have found this line, we can find the point where it intersects the surface defined by the equation $ \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n = 
E$. The number $ \beta$ is called a "Lagrange multiplier", after Joseph Louis Lagrange.

The $ Y$ equations $ \sum_{s = 1}^B n_{j s} = 
m_j$, one for each type of object $ j$, similarly each define a $ \left( YB - 1 \right)$-dimensional surface in the space of bin fractions, and we'll take these equations into account by using $ Y$ additional Lagrange multipliers $ \gamma_j$, one for each of these equations, where $ \gamma$ is the Greek letter gamma. So we'll look for a point $ n$ in the space of bin fractions for which an expression:

$\displaystyle K = \frac{1}{N} \mathrm{\ln} \left( W_n \right) - \beta \frac{1}{N} \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n - \frac{1}{N} \sum_{j = 1}^Y \gamma_j \sum_{s = 1}^B n_{j s} $

$\displaystyle = \sum_{j = 1}^Y \left( \frac{m_j}{N} \mathrm{\ln} \left( \frac{m_j}{N} \right) - \frac{m_j}{N} - \sum_{s = 1}^B \left( \frac{n_{j s}}{N} \mathrm{\ln} \left( \frac{n_{j s}}{N} \right) - \frac{n_{j s}}{N} + \beta \frac{n_{j s}}{N} E_{j s} + \gamma_j \frac{n_{j s}}{N} \right) \right) $

$\displaystyle = \sum_{j = 1}^Y \left( \frac{m_j}{N} \mathrm{\ln} \left( \frac{m_j}{N} \right) - \frac{m_j}{N} - \sum_{s = 1}^B \left( f_{j s} \mathrm{\ln} \left( f_{j s} \right) - f_{j s} + \beta f_{j s} E_{j s} + \gamma_j f_{j s} \right) \right) $

reaches its maximum value, where $ f_{j s} = \frac{n_{j s}}{N}$, and $ \beta$ and the $ \gamma_j$ are fixed numbers whose values will be chosen later. The second line here follows from the formula for $ \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} 
\right)_n$, as above, and the formula for $ \frac{1}{N} 
\mathrm{\ln} \left( W_n \right)$ we obtained above. From a calculation similar to the one above, we have:

$\displaystyle \frac{\mathrm{d}}{\mathrm{d} f} \left( f \mathrm{\ln} \left( f \right) - f 
\right) = \mathrm{\ln} \left( f \right), $

so:

$\displaystyle \frac{\partial K}{\partial f_{j s}} = - \mathrm{\ln} \left( f_{j s} \right) 
- \beta E_{j s} - \gamma_j . $

If this is 0 for all $ j$ and all $ s$ at a point $ f$ in the space of bin fractions, then the rate of change of $ K$ will be 0 in any direction where the rates of change of $ \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} 
\right)_n$ and all the quantities $ \sum_{s = 1}^B n_{j s}$ are 0. From the result we found above, this expression for $ \frac{\partial K}{\partial f_{j 
s}}$ is 0 when:

$\displaystyle f_{j s} = \mathrm{e}^{- \beta E_{j s} - \gamma_j}, $

where Napier's number $ \mathrm{e}$ was defined above, and its value was approximately calculated above. From the result above, $ \frac{\mathrm{d}}{\mathrm{d} f} \mathrm{\ln} \left( f \right)$ is $ > 0$ for all $ f > 0$, so $ \mathrm{\ln} \left( f \right)$ increases as $ f$ increases, for all $ f > 0$, so from the formula above, $ \frac{\partial K}{\partial f_{j 
s}}$ decreases as $ f_{j s}$ increases, for all values of $ f_{j s}$ greater than 0. Thus for $ f_{j s} > 0$, $ \frac{\partial K}{\partial f_{j 
s}}$ is positive when $ f_{j s}$ is less than the above value and negative when $ f_{j s}$ is greater than the above value, so since the terms in $ K$ that depend on any one $ f_{j s}$ are independent of all the other $ f_{j s}$, the maximum value of $ K$ in the region where all $ f_{j s}$ are $ > 0$ is attained when the value of each $ f_{j s}$ is given by the above formula.

And when the value of each $ f_{j s}$ is given by the above formula, the rate of change of $ \frac{1}{N} 
\mathrm{\ln} \left( W_n \right)$, in any direction in the space of bin fractions, is $ \beta$ times the rate of change of $ \frac{1}{N} \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n$ in that direction, plus the sum, over the object types $ j$, of $ \gamma_j$ times the rate of change of $ \frac{1}{N} \sum_{s = 1}^B n_{j s}$ in that direction.

Remembering that $ \beta$ and the $ \gamma_j$ are fixed numbers whose values will be chosen later, we'll now define the value of $ \frac{1}{N} E$, and the numbers $ \frac{1}{N} m_j$, to be such that the equation $ \frac{1}{N} \left( 
E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n = \frac{1}{N} E$, and the $ Y$ equations $ \frac{1}{N} \sum_{s = 1}^B n_{j s} = \frac{1}{N} m_j$, are all satisfied at the point where the value of each $ f_{j s} = \frac{n_{j s}}{N}$ is given by the above formula.

For fixed values of $ \frac{1}{N} E$, and the numbers $ \frac{1}{N} m_j$, each of these $ 1 + Y$ equations defines a $ \left( YB - 1 \right)$-dimensional surface in the space of bin fractions, and the intersection of these $ 1 + Y$ surfaces defines a $ \left( YB - 1 - Y \right)$-dimensional "surface" in the space of bin fractions. This $ \left( YB - 1 - Y \right)$-dimensional surface is the surface on which the $ 1 + Y$ equations $ \frac{1}{N} \left( 
E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n = \frac{1}{N} E$ and $ \frac{1}{N} \sum_{s = 1}^B n_{j s} = \frac{1}{N} m_j$ are all satisfied, so in any direction tangential to this $ \left( YB - 1 - Y \right)$-dimensional surface, the rates of change of $ \frac{1}{N} \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n$ and $ \frac{1}{N} \sum_{s = 1}^B n_{j s}$ are all 0. Thus at the point on this $ \left( YB - 1 - Y \right)$-dimensional surface where the value of each $ f_{j s}$ is given by the above formula, the rate of change of $ \frac{1}{N} 
\mathrm{\ln} \left( W_n \right)$, in any direction tangential to this $ \left( YB - 1 - Y \right)$-dimensional surface, is 0.

I'll refer to this $ \left( YB - 1 - Y \right)$-dimensional surface in the space of bin fractions as $ S \left( \beta, \gamma \right)$, since the fixed values of $ \frac{1}{N} E$ and the numbers $ \frac{1}{N} m_j$ on it are determined by the fixed values of $ \beta$ and the $ \gamma_j$. Since the maximum value of $ K$ in the region where all $ f_{j s}$ are $ > 0$ is attained when the value of each $ f_{j s}$ is given by the above formula, and the values of $ \frac{1}{N} \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n$ and the $ \frac{1}{N} \sum_{s = 1}^B n_{j s}$ are all fixed on $ S \left( \beta, \gamma \right)$, the maximum value of $ \frac{1}{N} 
\mathrm{\ln} \left( W_n \right)$ on $ S \left( \beta, \gamma \right)$, in the region where all $ f_{j s}$ are $ > 0$, is attained when the value of each $ f_{j s}$ is given by the above formula.

From the formula for $ \frac{1}{N} \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n$ as above, the fixed values of the $ 1 + Y$ quantities $ \frac{1}{N} E$ and the $ \frac{1}{N} m_j$ are determined in terms of the $ 1 + Y$ quantities $ \beta$ and the $ \gamma_j$ by the formulae:

$\displaystyle \frac{1}{N} E = \frac{1}{N} \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n = \sum_{j = 1}^Y \sum_{s = 1}^B f_{j s} E_{j s} = \sum_{j = 1}^Y \sum_{s = 1}^B \mathrm{e}^{- \beta E_{j s} - \gamma_j} E_{j s}, $

$\displaystyle \frac{1}{N} m_j = \frac{1}{N} \sum_{s = 1}^B n_{j s} = \sum_{s = 1}^B f_{j 
s} = \sum_{s = 1}^B \mathrm{e}^{- \beta E_{j s} - \gamma_j} . $

So for given values of $ \frac{1}{N} E$ and the $ \frac{1}{N} m_j$ that are not impossible, due for example to some of the $ \frac{1}{N} m_j$ being negative or greater than 1, or the $ \frac{1}{N} m_j$ not adding up to 1, there will be values of $ \beta$ and the $ \gamma_j$, typically uniquely determined, for which $ \frac{1}{N} E$ and the $ \frac{1}{N} m_j$, as determined by the above formulae, have the given values. These are the values of $ \beta$ and the $ \gamma_j$ which we choose, when the values of $ \frac{1}{N} E$ and the $ \frac{1}{N} m_j$ are given.

From the last formula above:

$\displaystyle \mathrm{e}^{\gamma_j} = \frac{N}{m_j} \sum_{s = 1}^B \mathrm{e}^{- \beta 
E_{j s}}, $

so from the formula above:

$\displaystyle n_{j s} = Nf_{j s} = N \mathrm{e}^{- \beta E_{j s} - \gamma_j} = m_j \frac{\mathrm{e}^{- \beta E_{j s}}}{\sum_{s' = 1}^B \mathrm{e}^{- \beta E_{j s'}}} . $

From above, Napier's number $ \mathrm{e}$ has the value $ \mathrm{e} \simeq 2.718 > 1$, so if $ \beta$ were negative, then for a given type of object $ j$, $ n_{j s}$ would be larger for bins for which the energy $ E_{j s}$ at their centres is larger, and if $ \beta$ were 0, $ n_{j s}$ would be the same for all bins, no matter how large the energy $ E_{j s}$ at their centres. In either of these cases, there would be no justification for our assumption that the objects are all moving slowly compared to the speed of light, so I'll assume that $ \beta > 0$.
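As a quick illustration of this formula for $ n_{j s}$, the following Python sketch (with invented bin energies and an invented $ m_j$) checks that the occupations add up to $ m_j$, and that for $ \beta > 0$ the bins with larger energies at their centres hold fewer objects:

```python
import math

# Occupation numbers n_s = m * exp(-beta*E_s) / sum_s' exp(-beta*E_s') for one
# type of object; energies and m are illustrative values, not from the post.
def occupations(m, energies, beta):
    weights = [math.exp(-beta * E) for E in energies]
    total = sum(weights)
    return [m * w / total for w in weights]

E = [0.1, 0.4, 0.9, 1.6]              # invented energies, in increasing order
n = occupations(1000.0, E, beta=2.0)

assert abs(sum(n) - 1000.0) < 1e-9    # the n_s add up to m
assert all(n[s] > n[s + 1] for s in range(len(E) - 1))  # higher energy, fewer objects
```

Running the same function with a negative $ \beta$ reverses the ordering, matching the remark above that a negative $ \beta$ would put more objects in the higher-energy bins.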

From the above formula for $ n_{j s}$, we have:

$\displaystyle E = \left( E_{\mathrm{\ensuremath{\operatorname{tot}}}} \right)_n = \sum_{j = 1}^Y \sum_{s = 1}^B n_{j s} E_{j s} = \sum_{j = 1}^Y m_j \sum_{s = 1}^B \frac{\mathrm{e}^{- \beta E_{j s}}}{\sum_{s' = 1}^B \mathrm{e}^{- \beta E_{j s'}}} E_{j s} = \sum_{j = 1}^Y m_j E_j, $

where:

$\displaystyle E_j = \frac{1}{m_j} \sum_{s = 1}^B n_{j s} E_{j s} = \sum_{s = 1}^B \frac{\mathrm{e}^{- \beta E_{j s}}}{\sum_{s' = 1}^B \mathrm{e}^{- \beta E_{j s'}}} E_{j s} $

is the average energy of an object of type $ j$.

We observe that $ E_j$ is a weighted average of the energies $ E_{j s}$ of the objects of type $ j$ at the bin centres, such that the relative weights of larger $ E_{j s}$ decrease as $ \beta$ increases, so we expect that $ E_j$ will decrease as $ \beta$ increases. To check this, we note that by a rearrangement of the above formula:

$\displaystyle E_j \sum_{s = 1}^B \mathrm{e}^{- \beta E_{j s}} = \sum_{s = 1}^B 
\mathrm{e}^{- \beta E_{j s}} E_{j s} . $

So from Leibniz's rule for the rate of change of a product, which we obtained in the first part of the post, here, we have:

$\displaystyle \frac{\mathrm{d} E_j}{\mathrm{d} \beta} \left( \sum_{s = 1}^B \mathrm{e}^{- \beta E_{j s}} \right) + E_j \frac{\mathrm{d}}{\mathrm{d} \beta} \sum_{s = 1}^B \mathrm{e}^{- \beta E_{j s}} = \frac{\mathrm{d}}{\mathrm{d} \beta} \sum_{s = 1}^B \mathrm{e}^{- \beta E_{j s}} E_{j s} . $

So from the result above, with the fixed number $ a$ taken as $ - E_{j s}$, and $ y$ taken as $ \beta$, we have:

$\displaystyle \frac{\mathrm{d} E_j}{\mathrm{d} \beta} \left( \sum_{s = 1}^B \mathrm{e}^{- \beta E_{j s}} \right) - E_j \sum_{s = 1}^B \mathrm{e}^{- \beta E_{j s}} E_{j s} = - \sum_{s = 1}^B \mathrm{e}^{- \beta E_{j s}} E^2_{j s} . $

From a rearrangement of this formula, we have:

$\displaystyle \frac{\mathrm{d} E_j}{\mathrm{d} \beta} = - \frac{\sum_{s = 1}^B \mathrm{e}^{- \beta E_{j s}} E_{j s} \left( E_{j s} - E_j \right)}{\sum_{s' = 1}^B \mathrm{e}^{- \beta E_{j s'}}} . $

From the formula for $ E_j$ above, we have:

$\displaystyle 0 = \sum_{s = 1}^B \frac{\mathrm{e}^{- \beta E_{j s}}}{\sum_{s' = 1}^B 
\mathrm{e}^{- \beta E_{j s'}}} \left( E_{j s} - E_j \right) E_j . $

From the sum of this formula and the previous one, we have:

$\displaystyle \frac{\mathrm{d} E_j}{\mathrm{d} \beta} = - \frac{\sum_{s = 1}^B \mathrm{e}^{- \beta E_{j s}} \left( E_{j s} - E_j \right)^2}{\sum_{s' = 1}^B \mathrm{e}^{- \beta E_{j s'}}} \leq 0, $

which confirms our expectation that $ \frac{\mathrm{d} E_j}{\mathrm{d} \beta} 
\leq 0$, and shows that for finite $ \beta$, $ \frac{\mathrm{d} E_j}{\mathrm{d} 
\beta} < 0$ unless $ E_{j s}$ has the same value for all bins $ s$.
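This relation between $ \frac{\mathrm{d} E_j}{\mathrm{d} \beta}$ and the weighted average of $ \left( E_{j s} - E_j \right)^2$ can be checked numerically. The following Python sketch (with invented bin energies) compares a central finite-difference estimate of $ \frac{\mathrm{d} E_j}{\mathrm{d} \beta}$ with minus that weighted average:

```python
import math

# Check that dE_j/dbeta is negative and equals minus the weighted mean of
# (E_s - E_j)^2; the bin energies are invented illustrative values.
def avg_energy(energies, beta):
    w = [math.exp(-beta * E) for E in energies]
    return sum(wi * Ei for wi, Ei in zip(w, energies)) / sum(w)

E = [0.0, 1.0, 2.5, 4.0]
beta, h = 1.2, 1e-5
slope = (avg_energy(E, beta + h) - avg_energy(E, beta - h)) / (2 * h)

Ej = avg_energy(E, beta)
w = [math.exp(-beta * x) for x in E]
weighted_spread = sum(wi * (x - Ej) ** 2 for wi, x in zip(w, E)) / sum(w)

assert slope < 0
assert abs(slope + weighted_spread) < 1e-6
```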

From the formula above for $ E$ in terms of the $ E_j$, we have:

$\displaystyle \frac{\mathrm{d} E}{\mathrm{d} \beta} = \sum_{j = 1}^Y m_j \frac{\mathrm{d} 
E_j}{\mathrm{d} \beta} \leq 0. $

From above, we are considering objects as being of different types if they are different types of atom, or atoms of the same type in different situations. For example, we are treating two oxygen atoms as different types of object if they form parts of gas molecules contained in separate containers, or if one is part of a gas molecule and the other is part of the wall of a glass container. And from above, $ W_n$ is the number of different assignments of the $ N$ objects to the $ B$ bins, such that the number of objects of type $ j$ in bin $ s$, for all $ j$ and $ s$, is $ n_{j s}$. Properties of the system such as the pressures of gases in separate containers depend only on the numbers $ n_{j s}$, and not on the details of which objects are in which bins, other than through the numbers $ n_{j s}$.

If each different assignment of the $ N = \sum_{j = 1}^Y m_j$ objects to the $ B$ bins, consistent with the given total energy $ E$ of the system, is equally likely, then as we observed above, the most likely values of the numbers $ n_{j s}$ will be those for which $ W_n$ reaches its maximum value, consistent with the given total energy $ E$. If the numbers $ n_{j s}$ initially differ from these values, then over the course of time, we expect them to tend towards these values. The reason for this is that we have assumed that the total energy can be expressed as a sum of the energies of the individual objects. There will be small corrections to this assumption, due for example to interactions between gas molecules in the same container, or small mutual interactions between atoms vibrating near the surfaces of different containers that are touching one another. These interactions will occur randomly and can change the numbers $ \frac{n_{j s}}{N}$ by small amounts such as $ \pm 
\frac{1}{N}$, so their net effect is that the numbers $ n_{j s}$ will drift towards their most likely values.

It's convenient, now, to change the meaning of $ E_j$, which I defined above to be the average energy of an object of type $ j$, to be the total energy of the objects of type $ j$, instead. The total energy $ E_j = \sum_{s = 1}^B n_{j s} E_{j s}$ of the objects of type $ j$ depends on the numbers $ n_{j s}$, so a drift of these numbers with time can result in a net transfer of energy from one type of object to another, while the total energy remains constant. If two systems, each of which might contain a number of different types of object, are initially separated from one another, with total energies $ E_1$ and $ E_2$ and initial values $ \beta_1$ and $ \beta_2$ of $ \beta$, and are brought into contact with one another, such that neither system exerts any mechanical, electromagnetic, or gravitational force on the other, but the numbers $ n_{j s}$ for each system can drift due to random microscopic interactions between parts of the two systems as above, for example where containers of gas that were initially separated are now touching one another, then the numbers $ n_{j s}$ for each system will drift towards values corresponding to a common final value $ \beta_f$ of $ \beta$ for both systems, which is the value for which $ W_n$ for the combined system is maximized, when the total energy of the combined system is $ E_1 + E_2$.

If the initial values $ \beta_1$ and $ \beta_2$ of $ \beta$ are such that $ \beta_1 \geq \beta_2$, then the final common value $ \beta_f$ of $ \beta$ cannot be such that $ \beta_f > \beta_1$, for by the result above, that would mean that the final energies $ E_{1 f}$ and $ E_{2 f}$ of the two systems satisfy $ E_{1 f} < E_1$ and $ E_{2 f} < E_2$, in contradiction with the conservation of the total energy of the combined system, which we found above follows from Newton's laws or de Maupertuis's principle, and which implies that $ E_{1 f} + E_{2 f} = E_1 + E_2$. And similarly, $ \beta_f$ cannot be such that $ \beta_f < \beta_2$, for by the result above, that would imply $ E_{1 
f} > E_1$ and $ E_{2 f} > E_2$, which again contradicts the conservation of the total energy of the combined system. Thus we must have $ \beta_1 \geq \beta_f 
\geq \beta_2$, so if $ \beta_1 = \beta_2$, then $ \beta_f = \beta_1 = \beta_2$.

If $ \beta_1 > \beta_2$, and each of the two systems contains at least one type of object for which $ E_{j s}$ has different values for at least two different bins, then from the result above, $ \frac{\mathrm{d} E_1}{\mathrm{d} \beta_1} < 
0$ and $ \frac{\mathrm{d} E_2}{\mathrm{d} \beta_2} < 0$, so $ \beta_1 > \beta_f 
> \beta_2$, and $ E_1 < E_{1 f}$ and $ E_2 > E_{2 f}$, so the drift of the numbers $ n_{j s}$ to their final values results in a net transfer of energy from the second system to the first. The values of $ \beta_f$, $ E_{1 f}$, and $ E_{2 f}$ are determined by the requirement that $ E_{1 f} - E_1 = E_2 - E_{2 
f}$, so that $ E_{1 f} + E_{2 f} = E_1 + E_2$.
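Here is a small Python illustration of this equilibration, with invented energy levels and object numbers for the two systems. It finds the common final value $ \beta_f$ by bisection, using the fact shown above that each system's energy decreases as its $ \beta$ increases, and checks that $ \beta_f$ lies between $ \beta_2$ and $ \beta_1$ and that energy flowed into the system with the larger initial $ \beta$:

```python
import math

# Two systems with invented energy levels reach a common beta_f determined by
# E1(beta_f) + E2(beta_f) = E1(beta_1) + E2(beta_2); solved by bisection.
def energy(levels, m, beta):
    w = [math.exp(-beta * E) for E in levels]
    return m * sum(wi * Ei for wi, Ei in zip(w, levels)) / sum(w)

levels1, levels2 = [0.0, 1.0, 2.0], [0.0, 0.5, 3.0]
m1, m2 = 100.0, 50.0
beta1, beta2 = 2.0, 0.5                      # beta1 > beta2
E_total = energy(levels1, m1, beta1) + energy(levels2, m2, beta2)

lo, hi = beta2, beta1                        # beta_f must lie between beta2 and beta1
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if energy(levels1, m1, mid) + energy(levels2, m2, mid) > E_total:
        lo = mid                             # too much energy: beta must increase
    else:
        hi = mid
beta_f = 0.5 * (lo + hi)

assert beta2 < beta_f < beta1
# Energy flowed from the second system to the first, as in the text.
assert energy(levels1, m1, beta_f) > energy(levels1, m1, beta1)
```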

These results show that $ \beta$ has the basic observed properties of temperature, except that $ \beta$ increases where temperature decreases, and conversely. To determine the relation between $ \beta$ and temperature, we'll consider the example of an ideal gas, which is a collection of randomly moving non-interacting molecules of mass $ m$, enclosed in a container. In accordance with our assumptions above, we'll assume that each molecule behaves approximately as though its mass is concentrated at a single point.

We'll take the container of the gas to be a box whose edges are aligned with the Cartesian coordinate directions, such that the interior dimensions of the box are $ L_1$, $ L_2$, and $ L_3$. The total momentum of the molecules and the box is 0 in accordance with our assumption above, so the position of the centre of mass of the molecules and the box is independent of time. The molecules are moving randomly in the interior of the box, and we'll assume that the box is sufficiently rigid, and its mass is sufficiently large compared to the mass $ m$ of each molecule, that we can treat the box to a good approximation as staying in a fixed position. The ranges of the position coordinates in the interior of the box are $ 0 \leq x_1 \leq L_1$, $ 0 \leq x_2 
\leq L_2$, and $ 0 \leq x_3 \leq L_3$. The potential energy $ V$ is 0 when all the molecules are in the interior of the box, and $ + \infty$ when any of the molecules is outside the interior of the box.

We'll now divide the range of the possible positions and momenta of each molecule into equal size bins as I described above, and we'll choose each bin to be a box with its edges aligned with the Cartesian coordinate directions, such that the length of each position edge of a bin is $ \varepsilon_x$ and the difference between the values of a momentum coordinate at the ends of a momentum edge of a bin is $ \varepsilon_p$. The sizes of the bin edges $ \varepsilon_x$ and $ \varepsilon_p$ are sufficiently small that we can treat all the molecules in a bin as being approximately at the same position and having approximately the same momentum, but sufficiently large that the number of molecules in a bin is large. This is not a problem for gas containers of everyday sizes, since the number of molecules in a cubic metre of air, to the nearest power of 10, is about $ 10^{25}$.

From above, the momentum $ p$ of a molecule at position $ x$ moving with speed $ v = \frac{\mathrm{d} x}{\mathrm{d} t}$ is $ p = mv$, so the kinetic energy of the molecule is:

$\displaystyle \frac{1}{2} m \sum_{a = 1}^3 v^2_a = \frac{1}{2 m} \sum_{a = 1}^3 p^2_a . 
$

For convenience we'll label the bins by the position $ x$ and the momentum $ p$ at their centres. We are ignoring the vibrations of the atoms in the walls of the container about their mean positions, so the gas molecules are the only relevant type of object, so we can drop the index $ j$ that represents the type of object in the formulae above. So from the formula above, the most likely number of molecules in the bin centred at position $ x$ and momentum $ p$ is:

$\displaystyle n_{p x} = N \mathrm{e}^{- \beta E_{p x} - \gamma} = N \frac{\mathrm{e}^{- \beta E_{p x}} \varepsilon^3_x \varepsilon^3_p}{\mathrm{e}^{\gamma} \varepsilon^3_x \varepsilon^3_p}, $

where $ N$ is the total number of gas molecules, $ \beta$ is the Lagrange multiplier related to the temperature of the gas, whose value is to be calculated from the total energy of the gas, as above, and:

$\displaystyle \mathrm{e}^{\gamma} \varepsilon^3_x \varepsilon^3_p = \sum_{\mathrm{\ensuremath{\operatorname{bins}}}} \mathrm{e}^{- \beta E_{p x}} \varepsilon^3_x \varepsilon^3_p . $

If $ x$ is in the interior of the container, then the potential energy $ V$ is 0, so from the formula above for the kinetic energy of a molecule:

$\displaystyle n_{p x} = \frac{N}{\mathrm{e}^{\gamma} \varepsilon^3_x \varepsilon^3_p} \mathrm{e}^{- \frac{\beta}{2 m} \left( p^2_1 + p^2_2 + p^2_3 \right)} \varepsilon^3_x \varepsilon^3_p . $

And if $ x$ is outside the interior of the container, then the potential energy $ V$ is $ + \infty$, so since we are assuming $ \beta > 0$ in accordance with the discussion above, the expression $ \mathrm{e}^{- \beta E_{p x}}$ is 0, so:

$\displaystyle n_{p x} = 0. $

Thus:

$\displaystyle \mathrm{e}^{\gamma} \varepsilon^3_x \varepsilon^3_p = \sum_{\text{bins inside the container}} \mathrm{e}^{- \frac{\beta}{2 m} \left( p^2_1 + p^2_2 + p^2_3 \right)} \varepsilon^3_x \varepsilon^3_p . $

If we now write $ \mathrm{d} x_1 = \mathrm{d} x_2 = \mathrm{d} x_3 = 
\varepsilon_x$, and $ \mathrm{d} p_1 = \mathrm{d} p_2 = \mathrm{d} p_3 = 
\varepsilon_p$, where we have temporarily relaxed the rule that Leibniz's $ \mathrm{d}$ means that formulae are to be evaluated in the limit where expressions such as $ \mathrm{d} x_1$ tend to 0, we thus have approximately:

$\displaystyle \mathrm{e}^{\gamma} \mathrm{d} x_1 \mathrm{d} x_2 \mathrm{d} x_3 \mathrm{d} p_1 \mathrm{d} p_2 \mathrm{d} p_3 \simeq \int \int \int \int \int \int \mathrm{e}^{- \frac{\beta}{2 m} \left( p^2_1 + p^2_2 + p^2_3 \right)} \mathrm{d} x_1 \mathrm{d} x_2 \mathrm{d} x_3 \mathrm{d} p_1 \mathrm{d} p_2 \mathrm{d} p_3 . $

If we did the calculations in the limit where the sizes $ \varepsilon_x$ and $ \varepsilon_p$ of the bin edges tend to 0, this formula would be exact. We assumed above that $ \varepsilon_x$ and $ \varepsilon_p$ are sufficiently small that we can treat all the molecules in a bin as being approximately at the same position and having approximately the same momentum, so I'll treat this formula as exact. We therefore have:

$\displaystyle \mathrm{e}^{\gamma} \mathrm{d} x_1 \mathrm{d} x_2 \mathrm{d} x_3 \mathrm{d} p_1 \mathrm{d} p_2 \mathrm{d} p_3 = \int_0^{L_1} \int_0^{L_2} \int_0^{L_3} \int \int \int \mathrm{e}^{- \frac{\beta}{2 m} \left( p^2_1 + p^2_2 + p^2_3 \right)} \mathrm{d} x_1 \mathrm{d} x_2 \mathrm{d} x_3 \mathrm{d} p_1 \mathrm{d} p_2 \mathrm{d} p_3 . $

We assumed above that the number of bins is finite, and we could implement that by putting an upper limit on the magnitudes $ \left\vert p_1 \right\vert$, $ \left\vert 
p_2 \right\vert$, and $ \left\vert p_3 \right\vert$ of the momentum coordinates. However the expression $ \mathrm{e}^{- \frac{\beta}{2 m} p^2_1}$ tends to 0 very rapidly with increasing $ \left\vert p_1 \right\vert$, once $ \left\vert p_1 \right\vert$ is larger than about $ \sqrt{\frac{2 m}{\beta}}$, so the value of $ \mathrm{e}^{\gamma}$ calculated with an upper limit much larger than $ \sqrt{\frac{2 m}{\beta}}$ on the magnitudes $ \left\vert p_a \right\vert$ will be almost identical to the value of $ \mathrm{e}^{\gamma}$ calculated with no upper limit on those magnitudes. So we'll calculate $ \mathrm{e}^{\gamma}$ with the limits on each $ p_a$ taken as $ - \infty$ and $ + \infty$, where the symbol $ \infty$ was defined above.

We'll complete the calculation of $ \mathrm{e}^{\gamma}$ by first calculating the number $ \int_{- \infty}^{\infty} \mathrm{e}^{- y^2} \mathrm{d} y$, and we'll calculate this number by first calculating its square, which I'll represent by $ X_{\infty}$. We can write this as:

$\displaystyle X_{\infty} = \int_{- \infty}^{\infty} \mathrm{e}^{- y^2_1} \mathrm{d} y_1 \int_{- \infty}^{\infty} \mathrm{e}^{- y^2_2} \mathrm{d} y_2 = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} \mathrm{e}^{- \left( y^2_1 + y^2_2 \right)} \mathrm{d} y_1 \mathrm{d} y_2 . $

We can think of $ y_1$ and $ y_2$ as Cartesian coordinates for the 2-dimensional plane of Euclidean geometry. The distance $ r$ from the point $ \left( 0, 0 
\right)$ to the point $ \left( y_1, y_2 \right)$ is then $ r = \sqrt{y^2_1 + 
y^2_2}$. We'll now define $ X_R$, for $ R \geq 0$, to be the value of the above double integral over the region $ r \leq R$, so that $ X_{\infty}$ is the limit of $ X_R$ as $ R$ tends to $ \infty$. Then $ \left( \frac{\mathrm{d} 
X}{\mathrm{d} R} \right)_R = \frac{X_{R + \mathrm{d} R} - X_R}{\mathrm{d} R}$ is $ \mathrm{e}^{- R^2}$ times the area between the circles $ r = R$ and $ r = R 
+ \mathrm{d} R$, divided by $ \mathrm{d} R$, which is $ \mathrm{e}^{- R^2} 2 \pi 
R$. Now:

$\displaystyle \frac{\mathrm{d}}{\mathrm{d} R} \mathrm{e}^{- R^2} = \frac{\mathrm{e}^{- \left( R + \mathrm{d} R \right)^2} - \mathrm{e}^{- R^2}}{\mathrm{d} R} = \frac{\mathrm{e}^{- R^2 - 2 R \mathrm{d} R} - \mathrm{e}^{- R^2}}{\mathrm{d} R} = - 2 R \frac{\mathrm{e}^{- R^2 - 2 R \mathrm{d} R} - \mathrm{e}^{- R^2}}{- 2 R \mathrm{d} R} $

$\displaystyle = - 2 R \frac{\mathrm{e}^{- R^2 + \mathrm{d} u} - \mathrm{e}^{- R^2}}{\mathrm{d} u} = - 2 R \left( \frac{\mathrm{e}^{u + \mathrm{d} u} - \mathrm{e}^u}{\mathrm{d} u} \right)_{u = - R^2} = - 2 R \left( \frac{\mathrm{d}}{\mathrm{d} u} \mathrm{e}^u \right)_{u = - R^2} $

$\displaystyle = - 2 R \left( \mathrm{e}^u \right)_{u = - R^2} = - 2 R \mathrm{e}^{- R^2} 
. $

The second step here is obtained because the terms neglected by dropping $ \left( \mathrm{d} R \right)^2$ in the exponent are all proportional to $ \frac{\left( \mathrm{d} R \right)^2}{\mathrm{d} R} = \mathrm{d} R$ or higher powers of $ \mathrm{d} R$, and thus give 0 in the limit where $ \mathrm{d} R$ tends to 0. The fourth step is obtained by writing the small quantity $ - 2 R 
\mathrm{d} R$ as $ \mathrm{d} u$. Leibniz's $ \mathrm{d}$ means that the expressions are to be taken in the limit where $ \mathrm{d} R$ tends to 0 from either positive or negative values, and for $ R$ not equal to 0, this is equivalent to the limit where $ - 2 R 
\mathrm{d} R$ tends to 0 from either negative or positive values. The seventh step follows from the result $ \frac{\mathrm{d}}{\mathrm{d} y} \mathrm{e}^y = \mathrm{e}^y$ which we obtained above, with $ y$ taken as $ u$.

Thus:

$\displaystyle \left( \frac{\mathrm{d} X}{\mathrm{d} R} \right)_R = \mathrm{e}^{- R^2} 2 
\pi R = - \pi \frac{\mathrm{d}}{\mathrm{d} R} \mathrm{e}^{- R^2} . $

From the result that the integral of the rate of change of a quantity is equal to the net change of that quantity, which we obtained in the first part of the post, here, we have:

$\displaystyle X_R = X_R - X_0 = \int_0^R \frac{\mathrm{d} X}{\mathrm{d} r} \mathrm{d} r = \int_0^R \left( - \pi \frac{\mathrm{d}}{\mathrm{d} r} \mathrm{e}^{- r^2} \right) \mathrm{d} r = - \pi \left( \mathrm{e}^{- R^2} - 1 \right), $

since $ \mathrm{e}^{- 0^2} = 1$. Thus $ X_{\infty} = \pi$, since the limit of $ \mathrm{e}^{- R^2}$ as $ R$ tends to $ \infty$ is 0. Thus:

$\displaystyle \int_{- \infty}^{\infty} \mathrm{e}^{- y^2} \mathrm{d} y = 
\sqrt{X_{\infty}} = \sqrt{\pi} . $
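This famous result is easy to check numerically: a Riemann sum of $ \mathrm{e}^{- y^2}$ over a wide finite range (the range and step size below are arbitrary choices) already captures essentially all of the area, and agrees with $ \sqrt{\pi}$:

```python
import math

# Riemann-sum check of the Gaussian integral over (-10, 10): the integrand is
# utterly negligible beyond that range, so the sum should be close to sqrt(pi).
dy = 1e-3
total = sum(math.exp(-(k * dy) ** 2) for k in range(-10000, 10000)) * dy
assert abs(total - math.sqrt(math.pi)) < 1e-5
```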

We now observe that according to the explanation I gave in the first part of the post, here, of the meaning of the integral of a quantity, say $ Q$, that depends smoothly on another quantity, say $ q$, over a range of values of $ q$, say from $ q_1$ to $ q_2$, where $ q_1 \leq q_2$, the range of $ q$ from $ q_1$ to $ q_2$ is divided up into a great number of tiny intervals, and the integral $ \int_{q_1}^{q_2} Q \mathrm{d} q$ is approximately the sum of a contribution from each of these tiny intervals, such that the contribution from each tiny interval is the value of $ Q$ at some point in that tiny interval, times the difference between the values of $ q$ at the ends of that tiny interval. The exact value of the integral is the limit of sums of this form, as the tiny intervals become so small and their number so great, that the size of the largest tiny interval tends to 0. So if $ q$ in turn depends on another quantity, say $ y$, such that $ \frac{\mathrm{d} q}{\mathrm{d} y}$, the rate of change of $ q$ with respect to $ y$, is $ > 0$ for all values of $ q$ from $ q_1$ to $ q_2$, then we have:

$\displaystyle \int_{q_1}^{q_2} Q \mathrm{d} q = \int_{y_1}^{y_2} Q \frac{\mathrm{d} 
q}{\mathrm{d} y} \mathrm{d} y, $

where $ y_1$ and $ y_2$ are the values of $ y$ such that $ q_1 = q \left( y_1 
\right)$ and $ q_2 = q \left( y_2 \right)$. For to each way of dividing up the range of $ y$ from $ y_1$ to $ y_2$ into tiny intervals, there is a corresponding division of the range of $ q$ from $ q_1$ to $ q_2$ into tiny intervals, such that if $ y$ is the end of one tiny interval and the start of another tiny interval in the range from $ y_1$ to $ y_2$, then $ q \left( y 
\right)$ is the end of a tiny interval and the start of another tiny interval in the range from $ q_1$ to $ q_2$. And if the ends of a tiny interval in the range from $ y_1$ to $ y_2$ are at $ y$ and $ y + \varepsilon$, where $ \varepsilon$ is very small, then the difference between the values of $ q$ at the ends of the corresponding tiny interval in the range from $ q_1$ to $ q_2$ is approximately $ \left( \frac{\mathrm{d} q}{\mathrm{d} y} \right)_y 
\varepsilon$, and the error of this approximation tends to 0 more rapidly than in proportion to $ \varepsilon$ as $ \varepsilon$ tends to 0. So for every way of dividing up the range of $ y$ from $ y_1$ to $ y_2$ into tiny intervals, the contributions of corresponding tiny intervals to the left-hand side and the right-hand side of the above equation are approximately equal, and in the limit where the tiny intervals become so small and their number so great that the size of the largest tiny interval tends to 0, the totals of all the contributions to the left-hand side and to the right-hand side become exactly equal, for if the size of a typical tiny interval is $ \propto \, \varepsilon$, where the symbol $ \propto$ means "proportional to," then the number of tiny intervals is $ \propto \, \frac{1}{\varepsilon}$, and the total error is a sum of $ \propto \, \frac{1}{\varepsilon}$ quantities, each of which tends to 0 more rapidly than in proportion to $ \varepsilon$ as $ \varepsilon$ tends to 0, so the total error tends to 0 as $ \varepsilon$ tends to 0.
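This change-of-variables rule can also be checked numerically. The following Python sketch uses an invented example, $ Q = q^2$ with $ q = y^3$, so that $ \frac{\mathrm{d} q}{\mathrm{d} y} = 3 y^2$; both Riemann sums below approximate the same integral, $ \int_0^8 q^2 \mathrm{d} q = \frac{512}{3}$:

```python
# Change of variables: sum Q dq over tiny intervals of q, and sum
# Q * (dq/dy) dy over the corresponding tiny intervals of y.
n = 100000
dq = 8.0 / n
lhs = sum(((k + 0.5) * dq) ** 2 for k in range(n)) * dq

dy = 2.0 / n
rhs = sum((((k + 0.5) * dy) ** 3) ** 2 * 3 * ((k + 0.5) * dy) ** 2
          for k in range(n)) * dy

assert abs(lhs - 512.0 / 3.0) < 1e-4   # both match the exact value 512/3
assert abs(lhs - rhs) < 1e-4
```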

From this observation, with $ Q$ taken as $ \mathrm{e}^{- \frac{\beta}{2 m} p^2_1}$, $ q$ taken as $ p_1$, and $ y$ taken as $ \sqrt{\frac{\beta}{2 m}} p_1$, so that $ Q = \mathrm{e}^{- y^2}$, $ p_1 = \sqrt{\frac{2 m}{\beta}} y$, and $ \frac{\mathrm{d} p_1}{\mathrm{d} y} = \sqrt{\frac{2 m}{\beta}}$, we find from the result above that:

$\displaystyle \int_{- \infty}^{\infty} \mathrm{e}^{- \frac{\beta}{2 m} p^2_1} \mathrm{d} p_1 = \sqrt{\frac{2 m}{\beta}} \int_{- \infty}^{\infty} \mathrm{e}^{- y^2} \mathrm{d} y = \sqrt{\frac{2 \pi m}{\beta}} . $

So from the formula above:

$\displaystyle \mathrm{e}^{\gamma} \mathrm{d} x_1 \mathrm{d} x_2 \mathrm{d} x_3 \mathrm{d} p_1 \mathrm{d} p_2 \mathrm{d} p_3 \simeq L_1 L_2 L_3 \left( \frac{2 \pi m}{\beta} \right)^{3 / 2} . $

So from the formula above, the most likely number of molecules in a bin centred at a position $ x$ inside the container and momentum $ p$, with edge sizes $ \mathrm{d} x_1$, $ \mathrm{d} x_2$, $ \mathrm{d} x_3$, $ \mathrm{d} p_1$, $ \mathrm{d} p_2$, and $ \mathrm{d} p_3$, is:

$\displaystyle n_{p x} = \frac{N}{L_1 L_2 L_3} \left( \frac{\beta}{2 \pi m} \right)^{3 / 2} \mathrm{e}^{- \frac{\beta}{2 m} \left( p^2_1 + p^2_2 + p^2_3 \right)} \mathrm{d} x_1 \mathrm{d} x_2 \mathrm{d} x_3 \mathrm{d} p_1 \mathrm{d} p_2 \mathrm{d} p_3 . $
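As a consistency check, summing these most likely occupation numbers over all the bins should give back $ N$. In the Python sketch below (with illustrative values of $ N$, $ m$, and $ \beta$), the sum over the three momentum coordinates factorizes into three identical 1-dimensional sums, and the position sum just gives the volume $ L_1 L_2 L_3$, which cancels:

```python
import math

# Summing n_px over all bins: the triple momentum sum factorizes into the cube
# of a 1-dimensional sum; N, m, and beta are invented illustrative values.
N, m, beta = 1000.0, 1.0, 2.0
dp = 0.01
one_d = sum(math.exp(-beta / (2 * m) * (k * dp) ** 2)
            for k in range(-1000, 1001)) * dp
total = N * (beta / (2 * math.pi * m)) ** 1.5 * one_d ** 3
assert abs(total - N) < 1e-6
```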

The pressure of the gas is the force per unit area on the walls of the container that results from the gas molecules bouncing off the walls. We'll calculate the force from the molecules bouncing off the wall of area $ L_2 L_3$ at $ x_1 = L_1$.

When an object of mass $ m$ moving with velocity $ v = \frac{\mathrm{d} x}{\mathrm{d} t}$ collides with an object of mass $ M$ that is initially at rest, and no other objects are involved, and the potential energy $ V$ is 0 except at the moment when the objects are in contact, then by the conservation of total energy, which we obtained in the first part of the post, here, the sum of the kinetic energies of the objects is the same before and after the collision, and by the result we found above, the sum of the momenta of the objects is the same before and after the collision. So if the final velocity of the object of mass $ m$ is $ v_f$, and the final velocity of the object of mass $ M$ is $ v_M$, we have:

$\displaystyle \frac{1}{2} mv^2 = \frac{1}{2} mv^2_f + \frac{1}{2} Mv^2_M, $

$\displaystyle mv = mv_f + Mv_M . $

If there is no force between the objects in the 2 or 3 coordinate directions during the collision, as will be the case when a point-like gas molecule hits a wall perpendicular to the 1 coordinate direction, then $ v_{f 2} = v_2$, $ v_{f 3} = v_3$, and $ v_{M 2} = v_{M 3} = 0$. So the equations above become:

$\displaystyle mv^2_1 = mv^2_{f 1} + Mv^2_{M 1}, $

$\displaystyle mv_1 = mv_{f 1} + Mv_{M 1}, $

whose solution with $ v_{f 1} \neq v_1$ is:

$\displaystyle v_{f 1} = - \frac{M - m}{M + m} v_1, \hspace{1.5cm} v_{M 1} = \frac{2 m}{M 
+ m} v_1 . $
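It's quick to confirm that this solution conserves both momentum and kinetic energy; in the Python sketch below, the masses and initial velocity are arbitrary illustrative values:

```python
# Check that the collision solution conserves momentum and kinetic energy.
m, M, v1 = 1.0, 25.0, 3.0
vf1 = -(M - m) / (M + m) * v1    # final velocity of the light object
vM1 = 2 * m / (M + m) * v1       # final velocity of the heavy object

assert abs(m * v1 - (m * vf1 + M * vM1)) < 1e-12             # momentum conserved
assert abs(m * v1 ** 2 - (m * vf1 ** 2 + M * vM1 ** 2)) < 1e-12  # energy conserved
```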

So when $ M$ is extremely large in comparison to $ m$, as for example when the object of mass $ M$ is the wall of a gas container of mass $ \sim 1$ kilogram, and the object of mass $ m$ is a gas molecule of mass $ \sim 10^{- 26}$ kilograms, we have $ v_{f 1} \simeq - v_1$ and $ v_{M 1} \simeq 0$ to such a high precision that I shall treat these relations as exact.

So if the 1 component of the velocity of a molecule of the ideal gas is $ v_1$ at a particular time, then the only values the 1 component of the velocity of that molecule ever takes are $ \pm v_1$. If $ v_1 > 0$, then that molecule transfers momentum $ 2 mv_1$ to the container wall at $ x_1 = L_1$, at moments separated by time intervals $ \frac{2 L_1}{v_1}$, so the average rate at which that molecule transfers momentum to that container wall is $ \frac{2 mv^2_1}{2 L_1} = \frac{mv^2_1}{L_1}$ per unit time, so since force is the rate of change of momentum, the average force exerted by that molecule on that container wall is $ \frac{mv^2_1}{L_1}$ in the outwards direction. Thus the pressure $ P$ on that container wall is the sum of $ \frac{mv^2_1}{L_1 L_2 L_3}$ over all the gas molecules in the container.
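The bookkeeping in that argument can be spelled out in a few lines of Python (illustrative numbers only, not from the text): each hit delivers impulse $ 2 mv_1$, hits are separated by $ \frac{2 L_1}{v_1}$, and total impulse divided by elapsed time reproduces $ \frac{mv^2_1}{L_1}$:

```python
def average_force(m, v1, L1, n_hits=1000):
    """Time-averaged force on the wall at x1 = L1 from one molecule
    with 1-velocity component v1, averaged over n_hits collisions."""
    impulse = n_hits * 2 * m * v1      # total momentum given to the wall
    elapsed = n_hits * 2 * L1 / v1     # the time between hits is 2*L1/v1
    return impulse / elapsed

# Illustrative values: the average force equals m*v1**2/L1 exactly.
m, v1, L1 = 1.0, 2.0, 4.0
assert abs(average_force(m, v1, L1) - m * v1**2 / L1) < 1e-12
```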

From the formula above, integrated over the volume of the container, the most likely number of molecules in a momentum bin centred at momentum $ p$, with edge sizes $ \mathrm{d} p_1$, $ \mathrm{d} p_2$, and $ \mathrm{d} p_3$, is:

$\displaystyle n_p = N \left( \frac{\beta}{2 \pi m} \right)^{3 / 2} \mathrm{e}^{- \frac{\beta}{2 m} \left( p^2_1 + p^2_2 + p^2_3 \right)} \mathrm{d} p_1 \mathrm{d} p_2 \mathrm{d} p_3 . $

We'll assume now that this most likely number of molecules in each momentum bin is the actual number of molecules in each momentum bin. Then the sum of $ \frac{mv^2_1}{L_1 L_2 L_3} = \frac{p^2_1}{mL_1 L_2 L_3}$ over all the gas molecules in the container is:

$\displaystyle P = \int \int \int N \left( \frac{\beta}{2 \pi m} \right)^{3 / 2} \frac{p^2_1}{mL_1 L_2 L_3} \mathrm{e}^{- \frac{\beta}{2 m} \left( p^2_1 + p^2_2 + p^2_3 \right)} \mathrm{d} p_1 \mathrm{d} p_2 \mathrm{d} p_3 = \frac{N}{mL_1 L_2 L_3} \sqrt{\frac{\beta}{2 \pi m}} \int_{- \infty}^{\infty} p^2_1 \hspace{0.2em} \mathrm{e}^{- \frac{\beta}{2 m} p^2_1} \mathrm{d} p_1, $

where I used the result above to calculate the integrals over $ p_2$ and $ p_3$.
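Each of those $ p_2$ and $ p_3$ integrals is $ \int_{- \infty}^{\infty} \mathrm{e}^{- \frac{\beta}{2 m} p^2} \mathrm{d} p = \sqrt{\frac{2 \pi m}{\beta}}$, and if you like you can confirm that numerically in Python with a simple trapezoid sum (the values of $ \beta$ and $ m$ below are arbitrary illustrative choices):

```python
import math

def trapezoid(f, a, b, n=100000):
    """Trapezoid-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

beta, m = 2.0, 3.0
# The integrand dies off so fast that [-50, 50] captures the whole integral.
val = trapezoid(lambda p: math.exp(-beta / (2 * m) * p**2), -50.0, 50.0)
assert abs(val - math.sqrt(2 * math.pi * m / beta)) < 1e-6
```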

We'll calculate the above integral over $ p_1$ by first calculating the integral $ \int_{- \infty}^{\infty} \mathrm{e}^{- y^2} y^2 \mathrm{d} y$. We found above that $ \frac{\mathrm{d}}{\mathrm{d} y} \mathrm{e}^{- y^2} = - 2 y \mathrm{e}^{- y^2}$, so:

$\displaystyle \int_{- \infty}^{\infty} \mathrm{e}^{- y^2} y^2 \mathrm{d} y = - \frac{1}{2} \int_{- \infty}^{\infty} \left( \frac{\mathrm{d}}{\mathrm{d} y} \mathrm{e}^{- y^2} \right) y \mathrm{d} y. $

From Leibniz's rule for the rate of change of a product, which we obtained in the first part of the post, here, we have:

$\displaystyle \frac{\mathrm{d}}{\mathrm{d} y} \left( \mathrm{e}^{- y^2} y \right) = \left( \frac{\mathrm{d}}{\mathrm{d} y} \mathrm{e}^{- y^2} \right) y + \mathrm{e}^{- y^2} . $

Thus:

$\displaystyle \int_{- \infty}^{\infty} \left( \frac{\mathrm{d}}{\mathrm{d} y} \mathrm{e}^{- y^2} \right) y \mathrm{d} y = \int_{- \infty}^{\infty} \left( \frac{\mathrm{d}}{\mathrm{d} y} \left( \mathrm{e}^{- y^2} y \right) - \mathrm{e}^{- y^2} \right) \mathrm{d} y. $

And from the result that the integral of the rate of change of a quantity is equal to the net change of that quantity, which we found in the first part of the post, here, we have:

$\displaystyle \int_{- \infty}^{\infty} \frac{\mathrm{d}}{\mathrm{d} y} \left( \mathrm{e}^{- y^2} y \right) \mathrm{d} y = \left( \mathrm{e}^{- y^2} y \right)_{y \rightarrow \infty} - \left( \mathrm{e}^{- y^2} y \right)_{y \rightarrow - \infty} . $

The magnitude of $ \mathrm{e}^{y^2}$ increases much more rapidly than the magnitude $ \left\vert y \right\vert$ of $ y$, as $ \left\vert y \right\vert$ becomes large, because the magnitude of $ \mathrm{e}^{y^2}$ is multiplied by a factor $ \mathrm{e} \simeq 2.718$ every time $ y^2$ increases by 1. Thus since $ \mathrm{e}^{- y^2} = \frac{1}{\mathrm{e}^{y^2}}$, the magnitude of $ \mathrm{e}^{- y^2} y$ tends rapidly to 0 as $ \left\vert y \right\vert$ becomes large, so both terms in the right-hand side of the above formula are 0. Thus:

$\displaystyle \int_{- \infty}^{\infty} \left( \frac{\mathrm{d}}{\mathrm{d} y} \mathrm{e}^{- y^2} \right) y \mathrm{d} y = - \int_{- \infty}^{\infty} \mathrm{e}^{- y^2} \mathrm{d} y = - \sqrt{\pi}, $

where at the last step, I used the result we found above. So from the formula above:

$\displaystyle \int_{- \infty}^{\infty} \mathrm{e}^{- y^2} y^2 \mathrm{d} y = - \frac{1}{2} \int_{- \infty}^{\infty} \left( \frac{\mathrm{d}}{\mathrm{d} y} \mathrm{e}^{- y^2} \right) y \mathrm{d} y = \frac{1}{2} \sqrt{\pi} . $
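Here is the Gaussian integral $ \int_{- \infty}^{\infty} \mathrm{e}^{- y^2} y^2 \mathrm{d} y = \frac{1}{2} \sqrt{\pi}$ checked numerically in Python, with a simple trapezoid sum over a range wide enough that the neglected tails are far below the tolerance:

```python
import math

def trapezoid(f, a, b, n=100000):
    """Trapezoid-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

# Integral of y^2 * e^(-y^2) over the whole real line equals sqrt(pi)/2.
val = trapezoid(lambda y: y**2 * math.exp(-y**2), -10.0, 10.0)
assert abs(val - math.sqrt(math.pi) / 2) < 1e-9
```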

So in a similar manner to the calculation above, with $ y$ again taken as $ \sqrt{\frac{\beta}{2 m}} p_1$, we have:

$\displaystyle \int_{- \infty}^{\infty} p^2_1 \hspace{0.2em} \mathrm{e}^{- \frac{\beta}{2 m} p^2_1} \mathrm{d} p_1 = \left( \frac{2 m}{\beta} \right)^{3 / 2} \int_{- \infty}^{\infty} \mathrm{e}^{- y^2} y^2 \mathrm{d} y = \frac{1}{2} \sqrt{\pi} \left( \frac{2 m}{\beta} \right)^{3 / 2} . $

So from the result above, the pressure $ P$ of the gas in the container is:

$\displaystyle P = \frac{N}{mL_1 L_2 L_3} \sqrt{\frac{\beta}{2 \pi m}} \hspace{0.8em} \frac{1}{2} \sqrt{\pi} \left( \frac{2 m}{\beta} \right)^{3 / 2} = \frac{N}{L_1 L_2 L_3} \hspace{0.8em} \frac{1}{\beta} . $

Comparing with the ideal gas law:

$\displaystyle PV = N \mathrm{k} T, $

which was deduced by Émile Clapeyron in 1834 from the experimental observations of Robert Boyle and Jacques Charles, where $ V = L_1 L_2 L_3$ now represents the volume of the container, $ T$ is the absolute temperature in kelvins, whose degrees are the same size as those of the centigrade scale except that the zero of temperature is at $ - 273.15^{\circ}$ centigrade instead of at $ 0^{\circ}$ centigrade, and:

$\displaystyle \mathrm{k} = 1.38 \times 10^{- 23} \hspace{0.8em} \mathrm{joules} \hspace{0.3em} \mathrm{per} \hspace{0.3em} \mathrm{degree} \hspace{0.3em} \mathrm{centigrade}, $

is known as Boltzmann's constant, after Ludwig Boltzmann, we therefore find that:

$\displaystyle \beta = \frac{1}{\mathrm{k} T} . $
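Putting the pieces together gives $ P = \frac{N \mathrm{k} T}{V}$, and as a sanity check, one mole of an ideal gas in 22.4 litres at $ 273.15$ kelvins should come out near one atmosphere, roughly $ 1.013 \times 10^5$ pascals (the numbers below are standard textbook values, not from the text):

```python
k = 1.38e-23     # Boltzmann's constant, joules per degree
N = 6.022e23     # Avogadro's number of molecules (one mole)
T = 273.15       # absolute temperature in kelvins
V = 22.4e-3     # container volume in cubic metres (22.4 litres)

P = N * k * T / V           # pressure in pascals
assert 0.99e5 < P < 1.03e5  # close to one atmosphere, 1.013e5 pascals
```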

So if a system in thermal equilibrium at absolute temperature $ T$ is composed of a very large number of microscopic objects of various types subject to Newton's laws of motion, which we derived from de Maupertuis's principle of stationary action in the first part of the post, here, and if the range of possible positions and momenta of the objects is divided up into $ B$ tiny bins of equal size, then from the result we found above, the most likely number of objects of type $ j$ in position and momentum bin number $ s$ is:

$\displaystyle n_{j s} = m_j \frac{\mathrm{e}^{- \frac{E_{j s}}{\mathrm{k} T}}}{\sum_{s = 1}^B \mathrm{e}^{- \frac{E_{j s}}{\mathrm{k} T}}}, $

where $ m_j$ is the total number of objects of type $ j$, and $ E_{j s}$ is the energy of an object of type $ j$ at the centre of bin number $ s$. This is called the Boltzmann distribution, and the corresponding distribution of the momenta of the molecules in an ideal gas, which we found above, with $ \beta$ identified as $ \frac{1}{\mathrm{k} T}$, is called the Maxwell-Boltzmann distribution.
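For a concrete feel for the Boltzmann distribution, here is a small Python sketch that fills three hypothetical energy bins at room temperature (the bin energies and object count are made up for illustration); the occupations add up to the total number of objects, and lower-energy bins end up more occupied:

```python
import math

k = 1.38e-23  # Boltzmann's constant, joules per degree

def boltzmann_numbers(m_j, energies, T):
    """Most likely occupation of each bin, for m_j objects whose bin
    energies are given, at absolute temperature T."""
    weights = [math.exp(-E / (k * T)) for E in energies]
    Z = sum(weights)  # the normalizing sum in the denominator
    return [m_j * w / Z for w in weights]

# Three hypothetical bins, energies in joules, at T = 300 kelvins.
n = boltzmann_numbers(1000, [0.0, 2e-21, 4e-21], 300.0)
assert abs(sum(n) - 1000) < 1e-9  # occupations add up to m_j
assert n[0] > n[1] > n[2]         # lower energy, more objects
```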

The clue that led to the discovery of quantum mechanics, whose principles are summarized in Feynman's functional integral, and which made possible, among other things, the design and construction of the computer on which you are reading this blog post, came from the attempted application of the Boltzmann distribution to electromagnetic radiation. In the next part of the post, Electromagnetism, we'll look at the discoveries about electricity and magnetism that enabled James Clerk Maxwell, in the middle of the nineteenth century, to identify light as waves of oscillating electric and magnetic fields, and to calculate the speed of light from measurements of:

  1. the force between parallel wires carrying electric currents;
  2. the heat given off by a long thin wire carrying an electric current; and
  3. the time integral of the temporary electric current that flows through a long thin wire, when a voltage is introduced between two parallel metal plates, close to each other but not touching, via that wire.