    The Most Famous Equation Of Physics And Its Derivation
    By Barry Barrett | May 11th 2012 08:45 AM | 17 comments
    About Barry

    I have a B.S. in Mathematics and a minor in Philosophy....

    PART A: Introduction: does the inertia of a body depend upon its energy content?
    PART B: The Toolkit: mathematical and physical assumptions 
    PART C: The Thought Experiment and the Word Problem
    PART D: The Derivation. Solving the Word Problem
    PART E: Conclusion: Einstein's Style or How to not be a Crack Pot

    PART A:  Does the inertia of a body depend upon its energy content?

    Students of physics know the answer to the question Einstein asked in the title of his celebrated paper, "Does the inertia of a body depend upon its energy content?" because in it Einstein derived the equation E=mc^2. The inertia (or mass) of an object at rest is equal to E/c^2, where E is the energy content of the object. A remarkable fact about this paper and many of his other early papers is their relative simplicity. I believe I can briefly state the few mathematical and physical assumptions used in the paper. These are conservation of energy, the principle of relativity, the formulas for the kinetic energy of a particle and the relativistic energy of light, and the binomial theorem. Given these assumptions, and a cleverly devised thought experiment to apply them to, E=mc^2 can be derived as though it were a simple undergraduate word problem.

    PART B: The Toolkit: Physical and Mathematical Assumptions

    Physical Assumptions:

    1. Conservation of Energy: Energy is neither created nor destroyed. The total energy in an isolated system remains constant. In Einstein's thought experiment, the system consists of all the volume enclosing a box.


    A classic example of conservation of energy: the pendulum
    The bob of a pendulum has kinetic energy (1/2)mv^2 and potential energy mgh, where m is the mass of the bob, v is its velocity, h is its height and g is a number that is determined by the strength of gravity. According to conservation of energy

    $$\frac{1}{2}mv^2 + mgh = \text{constant}$$

    As the bob ascends and its velocity decreases, h increases to keep the total constant. When the height is maximum, the velocity is zero and when the height is zero, the velocity is maximized. As the bob swings back and forth, kinetic and potential energy vary, but the total energy of the system remains constant.
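
    The pendulum bookkeeping can be sketched numerically. This is only an illustration: the bob mass, release height and value of g are assumed numbers, and speed_at_height is simply the conservation equation solved for v, so the constant total is built in rather than independently verified.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def total_energy(m, v, h, g=G):
    """Kinetic plus potential energy of the bob, in joules."""
    return 0.5 * m * v**2 + m * g * h

def speed_at_height(h, h_max, g=G):
    """Speed of a bob released from rest at h_max when it is at height h."""
    return math.sqrt(2.0 * g * (h_max - h))

m, h_max = 0.5, 0.20  # hypothetical bob: 0.5 kg released from 0.20 m
for h in (0.20, 0.10, 0.0):
    v = speed_at_height(h, h_max)
    print(f"h={h:.2f} m  v={v:.3f} m/s  E={total_energy(m, v, h):.4f} J")
```

    The printed speed rises as the height falls while the last column stays fixed, which is the trade-off described above.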

    2. The relativistic equation for the energy of light

    The energy $l^*$ of a light ray in a frame moving with a constant velocity v can be determined from its energy $l$ in a stationary frame from

    $$l^* = l\,\frac{1 - \frac{v}{c}\cos\phi}{\sqrt{1 - v^2/c^2}}$$

    [assuming the moving frame moves parallel to the x-axis of the stationary frame and $\phi$ is the angle of the light ray from the x-axis]

    If the light ray is moving parallel with the x-axis and in the same direction as the moving reference frame then $\phi = 0$ and cos(0)=1 and therefore

    $$l^* = l\,\frac{1 - \frac{v}{c}}{\sqrt{1 - v^2/c^2}}$$

    If the light ray were moving in the opposite direction, then (because cos(pi)=-1) the negative sign (-) before the v/c term is replaced by a positive sign (+). Einstein used this fact to simplify the mathematics of his derivation.
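
    A small numerical sketch of why the sign flip is useful: for two rays sent in opposite directions, the cosine terms cancel in the sum. The function name, the energy L_total, the speed ratio beta and the angle phi are illustrative choices, not values from Einstein's paper.

```python
import math

def light_energy_moving_frame(l, beta, phi):
    """Energy of a light ray seen from a frame moving at speed v = beta*c along x.

    l    -- energy of the ray in the stationary frame
    beta -- v/c, must satisfy 0 <= beta < 1
    phi  -- angle between the ray and the x-axis, in radians
    """
    return l * (1.0 - beta * math.cos(phi)) / math.sqrt(1.0 - beta**2)

L_total = 2.0   # hypothetical total emitted energy (arbitrary units)
beta = 0.1      # hypothetical v/c
phi = 0.3       # hypothetical emission angle in radians
ray_a = light_energy_moving_frame(L_total / 2, beta, phi)
ray_b = light_energy_moving_frame(L_total / 2, beta, phi + math.pi)  # opposite ray

# The sum no longer depends on phi: it equals L_total / sqrt(1 - beta^2).
print(ray_a + ray_b, L_total / math.sqrt(1.0 - beta**2))
```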

    3. Relativity: the freedom to choose different reference frames

    The principle of Special Relativity states that if a physical principle or equation holds in a stationary frame, it must also hold in a frame moving relative to it at constant velocity. We can evaluate a process from either reference frame. In the derivation of E=mc^2 below, the energy of a simple system is calculated in different frames of reference moving at constant velocity v relative to one another and, in accordance with the principle of relativity, it is assumed that the principle of conservation of energy holds equally in both frames of reference.

    4. Kinetic energy

    As mentioned above, the kinetic energy of a particle with mass m and velocity v is (1/2)mv^2.

    Mathematical Assumptions:

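    1. Elementary Algebra

    The Binomial Series:

    $$(1+x)^\alpha = 1 + \alpha x + \frac{\alpha(\alpha-1)}{2!}x^2 + \frac{\alpha(\alpha-1)(\alpha-2)}{3!}x^3 + \cdots \qquad (|x| < 1)$$

    2. Calculus (optional)

    The Taylor Series:

    $$f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots$$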
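
    As a rough check of how the binomial series is used with a square root, the sketch below sums the first terms of (1+x)^alpha for x = -v^2/c^2 and alpha = -1/2, which is the factor 1/sqrt(1 - v^2/c^2). The helper binomial_series and the speed ratio beta are hypothetical, chosen only for illustration.

```python
def binomial_series(x, alpha, terms):
    """Partial sum of (1 + x)**alpha = 1 + alpha*x + alpha*(alpha-1)/2! * x**2 + ..."""
    total, coeff = 0.0, 1.0
    for k in range(terms):
        total += coeff * x**k
        coeff *= (alpha - k) / (k + 1)  # next generalized binomial coefficient
    return total

beta = 0.01                    # hypothetical v/c
x, alpha = -beta**2, -0.5      # 1/sqrt(1 - v^2/c^2) written as (1 + x)**alpha
exact = (1.0 + x) ** alpha
two_terms = binomial_series(x, alpha, 2)   # equals 1 + beta**2 / 2
print(exact, two_terms, exact - two_terms)
```

    Keeping only two terms leaves an error that starts at fourth order in v/c, which is the size of the terms the derivation neglects.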

    These assumptions (and keen physical intuition) are all you need to derive E=mc^2. So, before I guide you through Einstein's derivation, try to derive it yourself. If I were a high school physics teacher I'd like to assign this word problem as extra credit.

    PART C: The Thought Experiment (applying mathematical and physical assumptions to a concrete scenario) and the Word Problem:

    Einstein began by constructing a simple thought experiment to apply his assumptions to.

    1. A box is weighed. 

    2. The box emits two, oppositely oriented, light rays for a short interval of time.

    3. The box is weighed again. 

    The Word Problem:

    Has the box lost any mass as a result of the energy emitted from it? If so, how much?

    Remember, light has no mass, therefore according to the principle of mass conservation, no mass has left the box.
    In 1905, Lavoisier's principle of conservation of mass was still generally accepted. This 1905 Annus Mirabilis paper would overturn that.

    A few remarks before presenting Einstein's derivation. 

    On reducing a complicated problem to a problem that is "as simple as possible, but not simpler." Applying the mathematical and physical assumptions above to this simple thought experiment reduces the deep question Einstein posed in 1905 to a problem that an intelligent undergraduate physics student could answer. Notice that Einstein didn't attempt to argue from general principles. If you immediately plunged into the equations and started messing around with the math, then you are not thinking like Einstein. 

    PART D: The Derivation. Solving the Word Problem 

    The initial total energy of the box, relative to the stationary system (x,y,z), in Einstein's thought experiment is $E_0$. This is the energy of the system, in the stationary frame, prior to the emission of the light rays.

    The initial total energy of the box, relative to the moving frame (t,u,v), is $H_0$. This is the energy of the box, relative to the moving frame, prior to the emission of the light rays.

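    If $E_1$ is the energy of the box, relative to the stationary frame, after the emission of the light rays and the total energy of both light rays is $L$ then, by conservation of energy

    $$E_0 = E_1 + \frac{L}{2} + \frac{L}{2}$$

    If $H_1$ is the energy of the box, relative to the moving frame, after the emission of the light rays and the energy of the light rays in the moving frame is given by

    $$\frac{L}{2}\,\frac{1 - \frac{v}{c}\cos\phi}{\sqrt{1 - v^2/c^2}} \quad \text{and} \quad \frac{L}{2}\,\frac{1 + \frac{v}{c}\cos\phi}{\sqrt{1 - v^2/c^2}}$$

    then, by conservation of energy and the relativistic equation for the energy of light

    $$H_0 = H_1 + \frac{L}{2}\,\frac{1 - \frac{v}{c}\cos\phi}{\sqrt{1 - v^2/c^2}} + \frac{L}{2}\,\frac{1 + \frac{v}{c}\cos\phi}{\sqrt{1 - v^2/c^2}}$$

    and therefore

    $$H_0 = H_1 + \frac{L}{\sqrt{1 - v^2/c^2}}$$

    Einstein then calculates that

    $$(H_0 - E_0) - (H_1 - E_1) = L\left\{\frac{1}{\sqrt{1 - v^2/c^2}} - 1\right\}$$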

    The first and second H-E terms on the left are measures of the change in energy of the box at the same instant in time due only to the relative motion of the two frames of reference. The first H-E term is the initial change in energy of the box due to relative motion of the two frames and the second H-E term is the change in energy of the box after the emission of the light rays, but again, only due to the relative motion of the two frames.

    Because the H-E terms measure the change in energy of the box due to the relative motion of the two frames only, each can differ from the kinetic energy K of the box only by an additive constant C, representing any other energy left over (such as the internal molecular energies of the box etc), and this constant is the same before and after the emission

    $$H_0 - E_0 = K_0 + C$$

    $$H_1 - E_1 = K_1 + C$$

    and therefore C cancels out leaving only the change in kinetic energy as

    $$K_0 - K_1 = L\left\{\frac{1}{\sqrt{1 - v^2/c^2}} - 1\right\}$$

    This can be approximated by the Binomial or Taylor Series "neglecting magnitudes of fourth order or higher." Applying the Binomial Theorem approximation above,

    $$K_0 - K_1 = \frac{1}{2}\,\frac{L}{c^2}\,v^2$$

    By comparing this to the expression for kinetic energy, (1/2)mv^2, one can infer that the change in mass of the box due to the emission of light is equal to L/c^2. Recall that L is the total energy of the light rays emitted from the box, therefore E=mc^2.
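
    For readers who want the approximation step spelled out, here is one way to write it, taking x = -v^2/c^2 and alpha = -1/2 in the binomial series. The symbols K_0 and K_1 for the kinetic energy of the box before and after the emission follow Einstein's paper; the layout is only a sketch of the step.

```latex
\begin{align*}
\frac{1}{\sqrt{1 - v^2/c^2}}
  &= (1 + x)^{\alpha}, \qquad x = -\frac{v^2}{c^2}, \quad \alpha = -\frac{1}{2} \\
  &= 1 + \frac{1}{2}\frac{v^2}{c^2} + \frac{3}{8}\frac{v^4}{c^4} + \cdots \\[4pt]
K_0 - K_1
  &= L\left\{\frac{1}{\sqrt{1 - v^2/c^2}} - 1\right\}
   \approx \frac{1}{2}\,\frac{L}{c^2}\,v^2
   \qquad \text{(dropping terms of order } v^4/c^4 \text{ and higher)}
\end{align*}
```

    Setting this beside (1/2)mv^2 term by term is what identifies L/c^2 as the mass lost by the box.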

    An Important Proviso: By using the binomial approximation in this derivation, it was assumed that v << c. If v were close to c, then this approximation would be invalid. This is why mc^2 is the rest energy of an object. The slower the box in the derivation is moving, the more accurate the approximation becomes. If an object is moving close to the speed of light, then the E=mc^2 approximation must be replaced by E^2 = (mc^2)^2 + (pc)^2 where p is momentum.
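
    To get a feel for how quickly the approximation degrades, the sketch below compares the full expression L(1/sqrt(1 - v^2/c^2) - 1) with its leading binomial term at a few speeds. The function names, the value of L and the list of speeds are arbitrary illustrative choices.

```python
import math

def kinetic_change_exact(L, beta):
    """L * (1/sqrt(1 - beta^2) - 1): the full expression for K0 - K1."""
    return L * (1.0 / math.sqrt(1.0 - beta**2) - 1.0)

def kinetic_change_approx(L, beta):
    """Leading binomial term (1/2) * L * beta^2, i.e. (1/2)(L/c^2) v^2."""
    return 0.5 * L * beta**2

L = 1.0  # emitted light energy in arbitrary units (hypothetical)
for beta in (0.001, 0.01, 0.1, 0.5, 0.9):
    exact = kinetic_change_exact(L, beta)
    approx = kinetic_change_approx(L, beta)
    print(f"v/c={beta:<5}  exact={exact:.6e}  approx={approx:.6e}  "
          f"relative error={(exact - approx) / exact:.2%}")
```

    The relative error grows roughly like (3/4)(v/c)^2 at low speeds and becomes large as v approaches c.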
    "From this equation it directly follows that:—If a body gives off the energy L in the form of radiation, its mass diminishes by L/c². The fact that the energy withdrawn from the body becomes energy of radiation evidently makes no difference, so that we are led to the more general conclusion that the mass of a body is a measure of its energy-content; if the energy changes by L, the mass changes in the same sense by L/9 × 10^20, the energy being measured in ergs, and the mass in grammes."
    Einstein, Albert. Does the Inertia of a Body Depend upon its Energy Content?
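
    The figure 9 × 10^20 in the quotation is c^2 in the units Einstein used (centimetres, grams, seconds). A quick sketch of the arithmetic, where the one-joule example is my own illustrative number:

```python
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s

def mass_change_grams(energy_ergs):
    """Mass change in grams for an energy change given in ergs (m = L / c^2)."""
    return energy_ergs / C_CM_PER_S**2

print(f"c^2 = {C_CM_PER_S**2:.4e} cm^2/s^2")
one_joule_in_ergs = 1.0e7    # 1 J = 10^7 erg
print(f"1 J of emitted light <-> {mass_change_grams(one_joule_in_ergs):.3e} g")
```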

    PART E: Einstein's Style or How to not be a Crack Pot

    In many of his papers Einstein derived general results by studying highly idealized thought experiments. He gave himself illuminating, but simple, word problems. In a great series of QFT lectures, Anthony Zee (see the suggestions section below) quotes Einstein as saying "Make physics as simple as possible, but not simpler." This was the approach of great physicists like Planck, Bohr and Feynman. Planck famously pioneered quantum mechanics by analyzing a box composed of small oscillators. At a TED talk in honor of Feynman, Susskind said that Feynman was a master of simplifying physics. Compare this tendency with the grandiose complexity found in the work (and blog posts) of crack pots.

    Now read the original paper


    I'd like to inspire curiosity in my readers.  So here are some questions to ponder.

    1. Did Einstein use the Taylor Series or the Binomial Series? Does his scratchwork still exist?
    2. Did Einstein know the answer before deriving it or was he surprised by his own result? If so, why did he suspect that mass was related to energy?
    3. What was the general consensus about the mass of light in 1900? What was Maxwell's opinion?
    4. I've stated some assumptions. Are there any "hidden assumptions" I've glossed over?
    5. What about those higher order terms in the Binomial expansion that I've ignored?
    6. Light always moves with the same speed and is always massless, but we have seen that light does have varying energy. This is different from the case of a particle. Why? 


    Albert Einstein. Does the Inertia of a body depend upon its energy content? The original paper. 

    Anthony Zee. Lectures on QFT. (with an emphasis on "making physics as simple as possible, but not simpler" especially in the lecture on "back of the envelope calculations") There also are many other excellent lecture series covering most of the major recent developments in physics including Dark Energy, Inflation, String Theory, and SUSY.

    A Yale University Lecture on Taylor Series that includes a very brief derivation of E=mc^2 using Taylor Series. There are also very good lectures on Special Relativity in the same series especially this lecture on 4-Vectors.

    In this MIT Lecture on Special Relativity Leonard Susskind derives E=mc^2 using the binomial theorem (at 1:16.00) and answers a question about the higher order terms of the binomial series approximation (at 1:22.40).  


    So there you have it, the answer as to why a hot potato seems to weigh less as it is tossed around. ….Well actually, after you factor in the incredible speed of light squared in the denominator … other factors prevail. Seriously though, thanks for the quick and efficient derivation. For clarity, I would add parentheses around the H0-E0 terms … and for elementary school readers, I would state what exactly is being used as x and alpha in the binomial series. A simple Pascal triangle should be enough to teach the binomial form to Asian players of the noisy plinko game machines in which they watch their fortunes drop …

    In 1906, Einstein did a derivation involving light zipping from one end of a cylinder to another. I'll look up my extensive notes on it … done on its centennial ...

    My biggest fault with your article is where you casually say, like a magician doing his patter, “Remember, light has no mass …”

    Good Lord. Pardon my Poynting Vector. And here we have David Halliday in his corner patiently trying to get us up to tangent spaces and 4-vectors (and with a wink and a blink … connections across fibers).

    Two photons moving in different directions have a mass as soon as you sum them as 4-vectors. Do you want to see me prove it?

    ~ Bullwinkle puts his gloved hand into a hat ~ and voila ~ our coyote David has ten thousand hits outlining what constitutes a vector space without even getting into the not-so-positive definite metric which is behind the curtain.
    My biggest fault with your article is where you casually say, like a magician doing his patter, “Remember, light has no mass …” Two photons moving in different directions have a mass as soon as you sum them as 4-vectors. Do you want to see me prove it?
    I only meant to convey that, prior to quantum mechanics, in the context of Maxwell's Equations, it didn't make much sense to talk about light having mass, so the answer was not obvious. I believe the general consensus at the time was that light was massless. The assumption that light has no mass or no rest mass is not part of the derivation. Two photons have mass or "effective mass" independently of their directions. The mass is determined by the frequency of the light. I wasn't under the impression that the mass of particles cancels out when they are traveling in opposite directions. But, I would like to see your calculation. Or did you mean that two photons moving in opposite direction have a total rest mass? That would be surprising.
    Remember, light has no mass, therefore according the principle of mass conservation, no mass has left the box.
    Light has no restmass. It does have mass. Then I stopped reading because I remember the derivation to be a lot easier than this here. I am not sure whether it is faulty or not, but assuming first such a complicated formula for the behavior of light (with cosine and all) and then the series expressions, this is not the simplest derivation.

    Have you not read Einstein's original paper?  The "complicated formula for the behavior of light (with cosines and all)" is the very first equation in Einstein's paper.


    I'd like to see Sascha prove E=mc^2 without high school level algebra. The purpose of the presentation was to guide the curious through the original paper which happens to be simpler than most people would guess. I never said it's the simplest derivation.
    You want to always do like the pioneers did before a field matured? Good luck with that.
    Thank you. It is true that although light has no rest mass, the equations E=mc^2 and E=hc/y where y is the wavelength of light and h is Planck's constant, imply the mass (usually called "effective mass") of light is given by the equation m=h/cy. Unfortunately, I can't edit the post.
    Here is the simplest way to get “mass without mass”.

    Start with the general E^2 –P^2 = m^2 equation with each term scaled in units of mass. I'm assuming you are already satisfied that it can be derived through a variety of means. Let's now use it.

    Consider just two photons moving in opposite directions, each with energy E.
    Their momentum is also of magnitude E, since for photons, the metric vanishes and E^2 =P^2.

    The time-like and space-like components of the 2 photons can be written in 4-vector form as (E, P) and (E, -P) … or perhaps you prefer Edt+Pdx and Edt-Pdx.

    To compute the 4-vector of BOTH photons together, you simply add up their individual components (just as you would do for combining the vectors of winds or currents).

    (E, P) + (E, -P) = (2E, 0). The resultant vector has just a component pointing in the time direction. Its non-zero magnitude is 2E units of mass. Voila: mass without mass.

    Nowhere do I introduce a preposterous effective mass for a photon.
    The key is to not lose sight of its 4-vector form. That's all folks.
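
    The 4-vector sum in this comment can be sketched in a few lines, in units where c = 1 as the comment assumes. The function invariant_mass and the photon energy are hypothetical names and values for illustration; the parallel-pair case is added by me to show that the direction of the photons matters.

```python
import math

def invariant_mass(four_momenta):
    """Mass of a system from summed (E, px, py, pz), in units where c = 1."""
    E = sum(p[0] for p in four_momenta)
    px = sum(p[1] for p in four_momenta)
    py = sum(p[2] for p in four_momenta)
    pz = sum(p[3] for p in four_momenta)
    m_squared = E**2 - (px**2 + py**2 + pz**2)
    return math.sqrt(max(m_squared, 0.0))  # clamp tiny negative rounding errors

E = 1.5  # hypothetical photon energy
photon_right = (E, E, 0.0, 0.0)    # moving along +x, |p| = E
photon_left = (E, -E, 0.0, 0.0)    # moving along -x

print(invariant_mass([photon_right]))                 # single photon
print(invariant_mass([photon_right, photon_left]))    # back-to-back pair
print(invariant_mass([photon_right, photon_right]))   # parallel pair
```

    Only the back-to-back pair comes out with a non-zero mass (2E); a single photon and a parallel pair both give zero.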

    What is alarming is that Barry would write: "did you mean that two photons moving in opposite direction have a total rest mass? That would be surprising."
    I wanted to know whether a system of photons could properly be considered to have "a rest mass." Consider the case with planes (identical planes slowly flying away from each other) instead of photons. The argument still works (giving 2E=2m or E=m) but the m is obviously rest mass in that case. But, you started by stating that the particles have zero rest mass, so you are saying that the system has rest mass even though the individual particles don't. I was unsure about whether that should be my conclusion, but I believe it is. I don't see any reason to be opaque about it.
    Yes, a system can have real rest-mass (and inertial mass via the equivalence principle) even though its individual parts may have no rest-mass. What's really funny about this whole historic exercise is the way Bohr took Einstein for a loop-de-loop by weighing a similar box of photons some twenty years later to defend Heisenberg's uncertainty principle  and his complementary way of seeing everything. Remember, it had to do with the way time would change as the box became lighter as it was suspended on a simple spring scale for weighing potatoes.
    A professor covered that and I've read about it, but forgot the details. I believe that Bohr somehow used Einstein's own theory of General Relativity against him. 
    Blue-Green is perfectly right. Mass is nothing else than the energy of the system observed from a reference frame in which the momenta of the system's components add up to zero. Such a reference frame is referred to as the center of mass frame. And by the way, here is a simpler derivation: http://www.adamauton.com/warp/emc2.html

    Now to go further and comb through and combine high dimensional structures .... let's realize that although the 4-vector form for a photon is vastly superior to trying to fit it into a scalar ... this form still does not do it justice since it wipes out an electromagnetic signal's polarization which was well known long before Einstein ... as was also the fact that radiant energy has momentum. This is why I mentioned the Poynting Vector which is a cross-product of the electric and magnetic fields and points in the spatial direction of a signal with a right-handed curl and all of that. I don't really know what is the best geometric object for a photon that doesn't compromise any of its features. I'd like to know what the full geometric object is for each fundamental particle. Suggestions are welcomed …. especially from the wolf Luboš ..... ~~ this is my second time to summon him here, the first invocation vanished ~~

    Since Sascha is not around to prove E=mc^2 without high school level algebra, I'll gladly do it. I call it the playground derivation of E=mc^2. It is based on Einstein’s 1906 derivation. Watch for how the scaffolding is removed. The essentials could be inscribed on a grain of rice. The only tricky equation I'll assume is E = pc for photons. This formula was already established in Maxwell's day. It was derived again using Special Relativity and yet again using elementary quantum considerations as I'll show right now. For light, “distance = rate times time” can be written as dx = cdt = c/v. By v I mean 1/dt, the inverse of dt, the same v or Greek nu that stands for frequency in the 2nd most famous formula in physics, E = hv. Plug v = c/dx into it and you get E = hc/dx. This can be written as E = pc once you realize that h/dx is the momentum p. The expression p = h/dx is often written as wavelength lambda = dx = h/p. This general formula for light led the way for de Broglie to claim it as being universal for all matter.

    Long before we had astronauts, Einstein could visualize the simplified physics of objects floating ~ actually free-falling ~ through space-time.

    Imagine an insulated cylinder (floating in space) of length L and mass M, with one end very hot. Photons push off from the hot end and race across to the cool end. This pushing away from the hot end gives the cylinder a tiny kick in the reverse direction. The recoil of the cylinder a distance –dx in time dt is precisely counter-balanced by the movement of the photons so that the center-of-mass of the entire system never changes.

    Let m be the unknown mass content of the thermal energy that is released from the hot end. It moves a distance L while the cylinder recoils a distance -dx. Think of a teeter-totter in which a large weight M is shifted a tiny distance –dx and a small weight m is shifted a long distance L. If the beam is to maintain its initial balance, then Mdx=mL. We’ll call this the balance beam equation.

    When the thermal energy is transferred from one end to the other, its mass equivalent m is added to the cold end and subtracted from the hot end.

    How big is dx? It is the distance the cylinder recoils while the photons cover the distance L in the brief time dt = L/c. The velocity V of the cylinder is -dx/dt. Its momentum is the Newtonian expression MV since the cylinder moves slowly. This momentum is exactly opposite in value to the total momentum p of the photons that kick off the hot surface with a combined energy E = pc. This E is the thermal energy that is being moved. When it is moving across the cylinder as radiant energy, it has a momentum p = E/c. This p is exactly balanced by the cylinder’s impulse MV. MV = p can be written as Mdx/dt = E/c or equivalently, dx = Edt/Mc = EL/Mc^2 (using dt=L/c). Use this expression for dx in the balance beam equation Mdx=mL and you have: MEL/Mc^2 = mL.

    Now here comes the voodoo math. In the last expression, the big M and big L can be completely removed. The M is divided by itself on one side. The L’s occur in the numerator on both sides and can be divided out. This is where the scaffolding disappears and the scales fall away from your eyes. The imaginary cylinder of M and L is an artifice that we don’t have to keep or attach any particular importance. The simplified balance beam equation now says: E/c^2 = m, which is the same as E = mc^2.

    If this formula were wrong, a system initially at rest and balanced, could not remain so.
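
    The claim that M and L drop out of the playground derivation can be checked numerically. This sketch follows the steps in the comment (momentum balance MV = E/c, transit time dt = L/c, balance beam Mdx = mL); the function names and the sample values of M, L and E are made up for illustration.

```python
C = 299_792_458.0  # speed of light in m/s

def recoil_distance(M, L, E, c=C):
    """Distance the cylinder recoils while the light crosses its length L.

    Momentum balance M*V = E/c acting for a time dt = L/c gives dx = E*L/(M*c^2).
    """
    V = (E / c) / M   # recoil speed of the cylinder
    dt = L / c        # transit time of the light
    return V * dt

def transferred_mass(M, L, E, c=C):
    """Mass m that must move a distance L to satisfy the balance M*dx = m*L."""
    return M * recoil_distance(M, L, E, c) / L

E = 1.0e6  # hypothetical thermal energy moved, in joules
for M, L in [(1.0, 1.0), (250.0, 3.0), (0.02, 40.0)]:
    print(M, L, transferred_mass(M, L, E), E / C**2)
```

    Whatever cylinder is chosen, the last two printed columns agree, which is the "scaffolding" falling away.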

    Good work. It looks good, but it seems you have sacrificed mathematical complexity for some physical complexity. Mathematics aside, the physical argument in Einstein's paper is very simple, but probably too subtle for most high school students. I'd probably provide some hints if presenting it to someone at that level.

    Don't be bashful. What would those extra “hints” be that would help a few more students (young or old) get it? In the playground derivation, I think it is very interesting how it hinges on something so fundamental as “an object at rest will stay at rest” and a bit of teeter-totter physics.

    Long before quantum physics was fashionable for pseudo-science, relativity was ripe for abuse with its warped spaces and time, arbitrary topologies and matter being convertible into invisible waves.  Einstein had his work cut out for him to keep his Principles on solid ground. As I noted in a concurrent thread on the 3rd most famous equation in physics, delta S > 0, Einstein's work in the patent office had him more than a little familiar with whacky inventors and impossible schemes.

    What do you think happens to the students who walk away from physics or don't even step over its rather low threshold? I don't know, however, I do know that popular media is replete with objects at rest that suddenly move about on their own as if they were possessed. Therefore, I implore you to spell out just what are those hints that would help people get over the bumpiest parts of the ride. Where is our test patient Helen when we need her? Just don't let her or your students get into wiccapaedia and google. That will be the end of the learning for most.

    The hint, in my case, would probably be a reminder that the kinetic energy of the box contains m, and that the difference in energy between the two frames is purely kinetic and that energy is conserved in both frames of reference. That would reduce it to an almost purely mathematical exercise, but I believe, for an introductory physics class, putting that information together in mathematical language would be an exercise in itself for most students.

    I wouldn't know where to give hints for your derivation. There are a lot of steps and I doubt many high school students could discover that method even with hints.