Data visualization is a key tool in Data Science. You should always look at your data. Invariably, you will discover aspects of the data you might not notice if you just blindly run algorithms on it. That said, another important rule of thumb in Data Science is that you should not only rely on what you think you are seeing. You should always follow up and confirm any graphical analysis with Math.

That should be the lesson of Where's The Fuzz?, a collection of observations and arguments based entirely on graphical analysis, which I recently wrote. As it turns out, much of what we think we see there can actually be an optical illusion. See sess's post over at Reddit.

This time around, I'm making source code available on GitHub, as well as the amplitude and phase data produced by that code. I'll have more to say about phase and period in a future installment, because there is a lot that is remarkable about this data.

So was it all an optical illusion? Mostly, yes, I think so. However, there is significant attenuation around dips relative to the median amplitude of the signal. In other words, the attenuation should only have been noticeable by comparing segments of data around dips with segments of data elsewhere.

I will start by showing signal amplitude data. This is calculated 700 observations at a time (i.e. 17-day windows) in steps of 21 observations (i.e. half-day increments), assuming the period is 0.88 days. The amplitude is calculated by fitting a pair of sine and cosine functions, along with linear, quadratic and logarithmic functions, which make the results more stable in cases where there's underlying non-stationarity. (The extra functions don't change these particular results a whole lot, but in experiments with synthetic data they do produce better results.)
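
As a rough illustration of the windowed fit described above (this is not the code published on GitHub), the sketch below runs an ordinary least-squares fit in each window using NumPy. The function names, the constant offset column, the use of time since the start of the window for the trend terms, and the `log1p` form of the logarithmic term are all assumptions made for the example.

```python
import numpy as np

def window_amplitude(t, flux, period=0.88):
    """Least-squares amplitude of a sinusoid of known period in one window."""
    omega = 2.0 * np.pi / period
    tau = t - t[0]  # time since window start, so the log term is defined at 0
    design = np.column_stack([
        np.ones_like(tau),   # constant offset (assumed, not stated in the post)
        np.sin(omega * t),   # sine component
        np.cos(omega * t),   # cosine component
        tau,                 # linear trend
        tau ** 2,            # quadratic trend
        np.log1p(tau),       # logarithmic trend (exact form is a guess)
    ])
    coef, *_ = np.linalg.lstsq(design, flux, rcond=None)
    # Amplitude of a*sin + b*cos is sqrt(a^2 + b^2).
    return float(np.hypot(coef[1], coef[2]))

def rolling_amplitude(t, flux, window=700, step=21, period=0.88):
    """Amplitude per window, reported at the mean time of each window."""
    centers, amps = [], []
    for start in range(0, len(t) - window + 1, step):
        sl = slice(start, start + window)
        centers.append(t[sl].mean())
        amps.append(window_amplitude(t[sl], flux[sl], period))
    return np.array(centers), np.array(amps)
```

Calling `rolling_amplitude(t, flux)` with arrays of observation times (in days) and normalized flux returns one amplitude per half-day step. The phase could be recovered from the same two coefficients, but it is left out here.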
Amplitude of 0.88-day signal, 700 observations at a time
The dashed blue lines represent dips D140, D260, D792, D1519 and D1568. A spurious increase in amplitude is expected around large dips because a short-lived spike is interpreted by the linear model as a large amplitude signal. Here are some summary stats:
  • The median amplitude is 9.7⋅10⁻⁵.
  • The median non-spurious amplitude (all those below 3.5⋅10⁻⁴) is 9.37⋅10⁻⁵.
  • The median non-spurious amplitude pre-1200 is 7.88⋅10⁻⁵.
  • The standard deviation for non-spurious amplitudes is 7.1⋅10⁻⁵, and 4.2⋅10⁻⁵ for pre-1200, but keep in mind we're not looking at normally distributed data. The distribution is asymmetric.
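
For anyone who wants to recompute these figures from the amplitude data, here is a minimal sketch. It assumes two NumPy arrays, `centers` (window center, in days) and `amps` (fitted amplitude). Both names are hypothetical, and I'm using the population standard deviation (NumPy's default), which may differ slightly from whatever convention produced the numbers above.

```python
import numpy as np

def amplitude_summary(centers, amps, spurious_above=3.5e-4, cutoff_day=1200.0):
    """Summary stats for an amplitude series, with and without spurious values."""
    centers = np.asarray(centers, dtype=float)
    amps = np.asarray(amps, dtype=float)
    keep = amps < spurious_above             # "non-spurious" amplitudes
    early = keep & (centers < cutoff_day)    # non-spurious and before day 1200
    return {
        "median_all": float(np.median(amps)),
        "median_non_spurious": float(np.median(amps[keep])),
        "median_non_spurious_early": float(np.median(amps[early])),
        "std_non_spurious": float(np.std(amps[keep])),
        "std_non_spurious_early": float(np.std(amps[early])),
    }
```

Since the distribution is asymmetric, the medians are probably the more useful numbers here.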

Let's take a look at amplitude using synthetic data. The following graph represents amplitude of a synthetic series with a 0.88-day signal of amplitude 10⁻⁴ and white noise with standard deviation also of 10⁻⁴. The synthetic series also includes a dip similar to D792.

    Amplitude of synthetic series, 0.88-day period, artificial D792 dip

    Large dips result in some data artifacts, as we can see. With a shorter observation window, the amplitude series becomes more noisy. With a longer window, it's less noisy, but the artifacts around large dips become more pronounced and longer-lived. Using a window of 700 observations (17 days) seemed like a reasonable compromise. It should help to look at a close-up of the synthetic D792 dip's artifacts:

    Close-up synthetic D792 amplitude, artifacts
    I also tested a synthetic light curve with many small Gaussian dips of up to 0.3%. The resulting amplitude series has some knot-looking artifacts, but no obvious bias either way:
    Amplitude of synthetic series with many small Gaussian dips
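
Synthetic series like the ones discussed above can be generated along these lines. This is only a sketch: the function names are made up, the cadence is derived from "700 observations ≈ 17 days", every dip is modeled as a Gaussian, and the depth and width of the D792-like dip are placeholder values rather than the ones I actually used.

```python
import numpy as np

def synthetic_light_curve(n_obs=60000, cadence=17.0 / 700, period=0.88,
                          amplitude=1e-4, noise_sd=1e-4,
                          dips=((792.0, 0.16, 1.0),), seed=0):
    """Flux = 1 + sinusoid + white noise - Gaussian dips.

    Each dip is (center_day, fractional_depth, width_days); the default
    D792-like dip uses placeholder depth and width.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_obs) * cadence
    flux = 1.0 + amplitude * np.sin(2.0 * np.pi * t / period)
    flux += rng.normal(0.0, noise_sd, size=n_obs)
    for center, depth, width in dips:
        flux -= depth * np.exp(-0.5 * ((t - center) / width) ** 2)
    return t, flux

def random_small_dips(n_dips, t_max, max_depth=0.003, seed=1):
    """Random Gaussian dips of up to 0.3% depth; the width range is arbitrary."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(0.0, t_max, n_dips)
    depths = rng.uniform(0.0, max_depth, n_dips)
    widths = rng.uniform(0.5, 3.0, n_dips)
    return list(zip(centers, depths, widths))
```

Passing `dips=random_small_dips(50, t_max=1450.0)` gives a series with many small dips instead of a single large one. The number of dips in that call is also just an example.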

    Before we look at attenuated sections of the light curve of Boyajian's star, let's consider a couple of reference segments of data, where the amplitude is relatively large. I will use the same y-axis scale in each case. First, this is flux from day 1305 to day 1325:

    Flux in section of light curve with high amplitude

    The following graph shows flux from day 510 to day 530, where the amplitude is also relatively large, within the pre-1200 period.

    Another light curve section with relatively large amplitude

    Close to D140, the 17-day section with most attenuation is centered at day 152.8, where the amplitude is 1.16⋅10⁻⁵. This amplitude is below 98.5% of all non-spurious amplitudes and 98.1% of all those before day 1200. Here is the relevant segment of the light curve (with the same scaling as the reference graphs above):

    Flux between days 142 and 162

    It's a significant amplitude dip, but is it indicative of anything? You might expect to see such a dip in amplitude in one out of 50 amplitude observations at most, or every ~25 days. It's not impossible to see this kind of amplitude dip about 12 days away from a flux dip, so we need further confirmation.
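
The arithmetic behind that estimate is simple enough to spell out. The sketch below assumes an array of amplitudes `amps` and half-day steps between windows; the function names are made up for illustration. Keep in mind that consecutive windows overlap heavily, so treating them as independent draws gives only a rough figure.

```python
import numpy as np

def fraction_above(amps, value):
    """Fraction of amplitudes strictly greater than `value`."""
    amps = np.asarray(amps, dtype=float)
    return float(np.mean(amps > value))

def expected_spacing_days(fraction_this_low, step_days=0.5):
    """Average spacing between windows this low, if windows were independent."""
    return step_days / fraction_this_low

# One window in 50 is a fraction of 0.02; at half-day steps that works out
# to roughly one such window every 25 days.
spacing = expected_spacing_days(1.0 / 50.0)
```

`fraction_above(amps, 1.16e-5)` is the kind of number quoted above as "below 98.5% of all non-spurious amplitudes", once the spurious values have been filtered out.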

    Notice that before day 600 or so, the frequency of significant amplitude dip groups appears to be closer to 1 in ~100 days. Around D140, we have a second such dip centered at day 130.5, with amplitude 2.05⋅10⁻⁵.

    Flux between days 120 and 140

    A couple of days from D260, we have an amplitude observation of 7.8⋅10⁻⁶, which is lower than 99% of pre-1200 amplitudes.

    Flux between days 247 and 267
    The next group of low-amplitude observations is found around day 367, where we also happen to find a dip about 8 days prior.

    Flux between days 357 and 377

    Next we have small amplitudes around day 500, where we again find another minor dip in the light curve. (Notice that we previously used the 510-530 segment as reference, and yes, there does seem to be a clear change right around day 510.)

    Flux between days 490 and 510

    Prior to the second largest dip in the light curve, D792, starting around day 600, there seems to be a general attenuation of the amplitude, with many dips, culminating in an amplitude dip around day 764. The minimum there is 1.2⋅10⁻⁵, which means it's less than at least 98% of all non-spurious pre-1200 amplitudes.

    Light curve between 754 and 774

    On day 829 we find a very similar minimum, and after that, attenuation at this level is largely over.

    We do find an interesting dip in amplitude between days 1110 and 1120, and we also find a small (barely noticeable) flux dip to account for it on day 1127.

    Flux around day 1120

    The next time we see a major drop in the signal's amplitude is shortly before the biggest dip in the light curve: D1519. A minimum amplitude of 4.27⋅10⁻⁶ is centered around day 1502. This is a highly significant minimum: less than 99.9% of all non-spurious amplitudes.

    Another highly significant minimum is found centered on day 1555.4, not far from D1568, the 3rd largest dip in the light curve.

    Even though there is a lot of variability after day 1500, it doesn't appear that these minimums are the result of artifacts. One thing that is noticeable after day 1500 is a signal or pseudo-signal with a period that appears to be about 2 days long.

    To conclude, the intuition seems to have been correct, even though it initially came from incorrect observations. I'm not sure how convincing it might be to readers, but there is further evidence tying the 0.88-day signal to dips in flux. I plan to discuss that in a future installment.