*your* problem, and it deserves your attention and dedication.

The other day I had a collaborative coding session with a post-doc (who also has a blog here, where he writes interesting stuff on machine learning and neural networks), Giles. He writes excellent Python code, in a principled, tidy way that physicists are usually incapable of matching, so I am learning a lot by working with him. At some point we found ourselves looking at the problem of finding the equation of the straight line in 3D which best fits a set of points.

"Wait -" I can hear some of you say, "are you telling me you didn't just look it up and get the solution from f***ing googling it?" Yeah, that's exactly it. We knew this "problem" has tabulated solutions, but it is fun to do it yourself, and that's precisely what I did: as Giles was busy with python, I took paper and pencil (really!) and proceeded to solve it by maximizing a likelihood function. How weird is that? It reminds me a short story by (I think) Isaac Asimov telling the tale of a starship losing its main computer, and nobody is capable of doing additions and multiplications any more because they've used computers for ages to do that. There's only one guy who can do it, and to the disbelief of all the other passengers he finds the navigation parameters required to bring the vessel back home.

But let's see what the problem is, and whether you can follow the very simple math and solve it again with my guidance. You have a set of "detector hits" - readouts of the position a particle passed through - in 3D space. Call them **x_i, y_i, z_i**, with **i = 1...N**. We assume that the **z_i** are known perfectly (because the detector is a planar device, well positioned at a certain **z** coordinate), while the **x_i** and **y_i** come in with some uncertainties, which for now we assume all equal to **s**. We search for an equation linking **x** to **z** and **y** to **z** separately, because the **x** and **y** coordinates of each point are measured fully independently, and a 2D parametrization of a line is easier.

We start by writing these equations down: for a line that is not orthogonal to the **z** axis we may write, in full generality,

**x(z) = x* + az**

**y(z) = y* + bz**

Now, if the ones above are the equations that connect the values of **x, y**, and **z** along the particle trajectory, we may write the probability of finding the particle at a position **x, y** if its **z** position is **z_i**, given the corresponding readouts **x_i, y_i** and uncertainty **s**, as

**P_i (x,y) = exp[-(x-x_i)^2 / (2s^2)] exp[-(y-y_i)^2 / (2s^2)]**

(up to a normalization constant, which plays no role in the maximization below).

Since we have a set of **N** readouts, we need to combine these probabilities, so that we can find the parameters **x*, y*, a, b** which define our particle path. But this is easy: we are talking of independent measurements, so we just multiply probabilities:

**P = P_1 P_2 P_3 .... P_N**

Now, the likelihood of the parameters **x*, y*, a, b** can be expressed as the product **P** above, when we interpret **P** as a function of those four parameters - which we can do once we have specified our data **x_i, y_i, z_i**. It is even easier if we work directly with the logarithm of the likelihood, because products become sums:

**log L(x*,y*,a,b) = log P = log P_1 + log P_2 + log P_3 + ... + log P_N**

Given the expression we have written above for the **P_i**'s, we thus find

**log L = -1/(2s^2) { SUM_i [(x-x_i)^2 + (y-y_i)^2] }**

where **SUM_i** is the sum over the **N** sets of data points. We can now substitute the mathematical formulation of our model for the particle trajectory - a line - into the **log L** expression, by expressing **x** and **y** in terms of **x*, a, y*, b**, and get

**log L = -1/(2s^2) { SUM_i [(x*+az_i - x_i)^2 + (y*+bz_i - y_i)^2] }**
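In Python, the expression above can be sketched as follows (the function and variable names are my own choices, not part of the derivation):

```python
import numpy as np

def log_likelihood(params, x, y, z, s=1.0):
    """Log-likelihood of the line parameters (x*, y*, a, b), up to an
    additive constant, for hits (x_i, y_i, z_i) with a common Gaussian
    uncertainty s on the x and y readouts."""
    xstar, ystar, a, b = params
    return -0.5 / s**2 * np.sum((xstar + a * z - x) ** 2
                                + (ystar + b * z - y) ** 2)
```

For hits lying exactly on the line, the sum of squared residuals vanishes and the log-likelihood reaches its maximum value of zero; any other choice of parameters gives a negative value.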

"Fine, very good, let's put this into a minimizer routine and we're done", I can almost hear some of the computer geek readers here. No, we're going on with paper and pencil, as the fun is about to start! In fact, now what we just need to do is to ask that the likelihood be maximized by the four parameters, and the thing will solve itself! We thus write:

**d log L(x*,y*,a,b) / dx* = 0**

**d log L(x*,y*,a,b) / dy* = 0**

**d log L(x*,y*,a,b) / da = 0**

**d log L(x*,y*,a,b) / db = 0**

(where the symbols "

**d logL / dx**", etc, mean taking the derivative of the

**logL**function with respect to the variable

**x**) and we'll obtain a set of four equations, from which we will be able to obtain the wanted values of the four parameters describing the particle trajectory. This is because when the likelihood is maximized its derivative with respect to each parameter will become zero - it has an extremum. Let's see what happens: e.g. the first equation becomes

**-1/(2s^2) SUM_i [ 2x* + 2az_i - 2x_i] = 0**

If you consider that **x*** and **a** do not depend on the index **i** over which the sum runs, we can simplify this equation (also getting rid of the constant factor **-1/(2s^2)**) as

**x* = SUM_i [x_i - a z_i] / N**

which we can write, calling **x_m** the average of the **x_i** and **z_m** the average of the **z_i**,

**x* = x_m - a z_m**

Simple, ain't it? Similar calculations will easily bring you to similar formulas for the other three parameters:

**y* = y_m - b z_m**

**a = [(xz)_m - x* z_m] / (z^2)_m**

**b = [(yz)_m - y* z_m] / (z^2)_m**

where again the suffix **_m** indicates that the expression it applies to is an average over the **N** measurements; this also applies to **xz, yz, z^2**, provided that you first multiply and then sum the components when you average them!
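As a numerical sanity check, one can verify with finite differences that all four derivatives of **log L** indeed vanish when the parameters take their true values, at least for made-up hits placed exactly on a line:

```python
import numpy as np

# Made-up hits lying exactly on x(z) = 1 + 2z, y(z) = -1 + 0.5z
z = np.array([0.0, 1.0, 2.0, 3.0])
x = 1.0 + 2.0 * z
y = -1.0 + 0.5 * z

def log_l(xstar, ystar, a, b, s=1.0):
    return -0.5 / s**2 * np.sum((xstar + a * z - x) ** 2
                                + (ystar + b * z - y) ** 2)

# Central finite differences of log L at the true parameters:
p0 = [1.0, -1.0, 2.0, 0.5]
eps = 1e-6
derivs = []
for k in range(4):
    up, dn = p0.copy(), p0.copy()
    up[k] += eps
    dn[k] -= eps
    derivs.append((log_l(*up) - log_l(*dn)) / (2 * eps))
print(derivs)  # all four derivatives are ~0 at the maximum
```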

We have gotten to a system of four equations in four unknowns. Such systems can be nasty in some cases, but this one is actually very straightforward to solve. Take the first equation, for **x***, substitute in it the expression for **a**, and you immediately get

**x* = [x_m - z_m (xz)_m / (z^2)_m ] / [1 - (z_m)^2 / (z^2)_m]**

Similarly, because of the identical formulation of the problem in the two planes **xz, yz**, you get the corresponding expression for **y*** by just substituting **y** for **x** and **b** for **a**:

**y* = [y_m - z_m (yz)_m / (z^2)_m] / [1 - (z_m)^2 / (z^2)_m]**

For **a** and **b** the solution is instead

**a = [(xz)_m - x* z_m] / (z^2)_m**

**b = [(yz)_m - y* z_m] / (z^2)_m**

which we may keep in a compact form by exploiting the expressions for **x*** and **y*** already given above.
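Putting the closed-form solution together in Python (a minimal sketch under the equal-uncertainty assumption; the function name is mine):

```python
import numpy as np

def fit_line_3d(x, y, z):
    """Maximum-likelihood straight-line fit x(z) = x* + a z, y(z) = y* + b z,
    assuming equal Gaussian uncertainties on all x_i and y_i readouts."""
    xm, ym, zm = x.mean(), y.mean(), z.mean()
    xzm, yzm, z2m = (x * z).mean(), (y * z).mean(), (z * z).mean()
    denom = 1.0 - zm ** 2 / z2m      # nonzero as long as the z_i differ
    xstar = (xm - zm * xzm / z2m) / denom
    ystar = (ym - zm * yzm / z2m) / denom
    a = (xzm - xstar * zm) / z2m
    b = (yzm - ystar * zm) / z2m
    return xstar, ystar, a, b
```

On hits generated exactly on a line the fit returns the generating parameters; on noisy hits it returns the values that maximize the likelihood.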

So, this is it. As you see, the solution takes roughly five minutes with paper and pencil. Ok ok, let's make it ten minutes, so that we go slowly and avoid mistakes... But what happens if the uncertainties are actually different for the **N** measurements, and different for the **x_i** and the **y_i**? Do we have to go back to the start and solve much harder equations? NO! The beauty of the Gaussian approximation of uncertainties, and the symmetry of the problem as written out above, is that we can substitute the arithmetic averages of the **x_i, y_i, (x_i z_i)**, etc. with weighted averages, where the weights are the inverse variances **1/sx_i^2, 1/sy_i^2**, and we are done!

What do I mean by weighted averages? Well, that is the solution of another likelihood maximization problem, but one which you should know by heart: the weighted average **x_w** of measurements **x_i**, each carrying an uncertainty **sx_i**, is

**x_w = [SUM_i x_i / sx_i^2] / [SUM_i 1 / sx_i^2]**

and similarly for all the other factors.
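The weighted average is a one-liner in Python (again, my own sketch):

```python
import numpy as np

def weighted_average(values, sigmas):
    """Inverse-variance weighted average of measurements with
    Gaussian uncertainties sigmas."""
    weights = 1.0 / sigmas ** 2
    return np.sum(weights * values) / np.sum(weights)
```

Note how a measurement with a large uncertainty barely affects the result, as it carries a small weight.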

Now let us stop for a second and think about what we have done. We started with a set of points in space, **(x_i, y_i, z_i)**, and postulated that they were the detection hits of a particle zipping through on a straight line. Under that assumption, we could solve the problem with simple math... But we should never forget that a few other hidden assumptions are also taken for granted in solving the problem with a maximum likelihood as above. What we have coded implies that:

1) each **x_i, y_i** measurement carries uncertainties **sx_i, sy_i** which are completely independent across **i**, and between the two coordinates;

2) the uncertainties **sx_i, sy_i** are all that is needed to describe the probability that a measurement of a position **x** returns a value **x_i**. More precisely, we have assumed that the **x_i** have values which are Gaussian-distributed around the unknown true value **x**, with a sigma equal to **sx_i**;

3) there is no uncertainty on the positions **z_i**. Although this is never true in practice, it is often a good approximation.

Assumption 2) in particular is quite a strong one, which in practice is never met exactly; on the other hand, a number of theorems in statistics reassure us that it is a pretty robust assumption in most cases, in the sense that a departure from that probability distribution will not harm our likelihood estimates much. Anyway, it is something to keep in mind!

So, I hope you enjoyed going through this very simple problem. You can learn a lot by solving similar problems with paper and pencil, but you need to keep your mind awake, and always know what assumptions you have made in extracting a solution!

---

Tommaso Dorigo (see his personal web page here) is an experimental particle physicist who works for the INFN and the University of Padova, and collaborates with the CMS experiment at the CERN LHC. He coordinates the MODE Collaboration, a group of physicists and computer scientists from eight institutions in Europe and the US who aim to enable end-to-end optimization of detector design with differentiable programming. Dorigo is an editor of the journals Reviews in Physics and Physics Open. In 2016 Dorigo published the book "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab", an insider view of the sociology of big particle physics experiments. You can get a copy of the book on Amazon, or contact him to get a free pdf copy if you have limited financial means.
