The Haystack and OVRO 15 GHz data for PKS 2131-021 show a clear sinusoidal pattern.
We wish to fit a sine wave with unknown period and phase, and a separate amplitude and offset for the early Haystack data and the later OVRO data. Conceptually this is
\begin{align} S_i^O &= A^O\sin(2\pi[(t_i^O - t_0)/P+\phi]) + S_0^O + n_i^O \\ S_i^H &= A^H\sin(2\pi[(t_i^H - t_0)/P+\phi]) + S_0^H + n_i^H \end{align}where $i$ is a sample index (with different ranges for OVRO and Haystack), $t_i$ is the observation time of sample $i$ and $n_i$ its noise, and the superscripts $O$ and $H$ indicate OVRO and Haystack respectively. The unknown parameters are the period $P$, phase $\phi$, amplitudes $A^O$ and $A^H$, and the offsets $S_0^O$ and $S_0^H$.
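As a concrete sketch (with a hypothetical helper name), one dataset's model from equations 1-2 can be evaluated like this:

```python
import numpy as np

def model_flux(t, P, phi, A, S0, t0=51000.0):
    """Evaluate A*sin(2*pi*[(t - t0)/P + phi]) + S0 from equations 1-2.
    t and t0 are in MJD, P in days, phi in periods."""
    return A * np.sin(2*np.pi*((t - t0)/P + phi)) + S0
```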
The noise includes both the instrument and any AGN variability not part of the sinusoidal motion. We will approximate the noise as Gaussian, with covariance $N^O$ and $N^H$. These covariance matrices aren't available directly, but can be inferred from light curve simulations provided by Felipe. I made scripts for OVRO and Haystack that read in the data and sims, build the covmats, and massage everything into a convenient format for my analysis. The result is these files: ovro_J2134-0153.hdf and haystack_J2134-0153.hdf. The covmats look like this.
Haystack | OVRO |
---|---|
![]() | ![]() |
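The covmat-building step can be sketched like this (a minimal version; the actual scripts also handle reading the data and sims and the format massaging):

```python
import numpy as np

def build_covmat(sims):
    """Estimate the noise covariance from an [nsim, nsamp] array of
    simulated light curves: subtract the mean curve, then average the
    outer products over simulations (with the usual nsim-1 normalization)."""
    d = sims - sims.mean(0)
    return d.T @ d / (sims.shape[0] - 1)
```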
These mostly look sensible. Nearby samples are strongly correlated, and this correlation falls off with distance. But let's look more carefully. Here's a plot of the covariance of the first sample with the other samples for OVRO.
The covariance starts high as expected, and falls to zero by around 1000 days. To my surprise it then becomes negative, before it starts rising again. By the last sample, some 15.6 years after the first, it is back to being almost 100% correlated! This behavior is probably caused by the simulations being Fourier-based, where the first and last samples are implied to be neighbors by the boundary conditions. In my case this was problematic, since it forces the likelihood to favor fits where the residual for the first and last sample of a dataset match, and it was willing to sacrifice a good period and phase match to do so!
Since I don't have the program that made the simulations, and making new sims would be a pain anyway, I instead chose to make an ad-hoc modification to the covariance matrix using this script. It finds the edge of the central positive-covariance region, and sets everything outside it to zero. To keep things simple it assumes equi-spaced samples when doing this, which, while not accurate, is a good enough approximation for our purpose. The resulting fixed datasets are here: ovro_J2134-0153_fix.hdf and haystack_J2134-0153_fix.hdf, and here's what the fixed covmats look like.
Haystack | OVRO |
---|---|
![]() | ![]() |
The central band only looks brighter because the color range is narrower here due to the lack of negative correlations.
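A minimal sketch of that ad-hoc fix, assuming equi-spaced samples as described (`prune_covmat` is a hypothetical name):

```python
import numpy as np

def prune_covmat(C):
    """Zero out the spurious long-range covariance: find where the first
    row first goes negative (the edge of the central positive region,
    in units of sample offset), then zero all entries at larger offsets."""
    neg = np.where(C[0] < 0)[0]
    lag = neg[0] if len(neg) else C.shape[0]
    i, j = np.indices(C.shape)
    Cfix = C.copy()
    Cfix[np.abs(i - j) >= lag] = 0
    return Cfix
```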
The model in equation 1-2 is inefficient to fit because it has four non-linear parameters: $P$, $\phi$, $A^O$ and $A^H$. We can reduce this to two by reparametrizing.
\begin{align} S_i^O &= \alpha\sin(\theta_i) + \beta\cos(\theta_i) + S_0^O + n_i^O \\ S_i^H &= A^{HO}\Big(\alpha\sin(\theta_i) + \beta\cos(\theta_i)\Big) + S_0^H + n_i^H \end{align}where $\theta_i = 2\pi(t_i - t_0)/P$, the phase $\phi$ has been absorbed into $\alpha$ and $\beta$, and $A^{HO} = A^H/A^O$ is the Haystack amplitude relative to the OVRO amplitude. With this the only non-linear parameters are $P$ and $A^{HO}$. Given these parameters, the rest can be evaluated in a single, linear step. We can write the model in linear algebra form
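Equations 3-4 can be stacked into a single linear system in $a=[\alpha,\beta,S_0^O,S_0^H]$. A sketch of the corresponding design matrix (function and argument names are hypothetical):

```python
import numpy as np

def build_design(t_ovro, t_hay, P, A_HO, t0=51000.0):
    """Design matrix for equations 3-4: OVRO rows on top, Haystack rows
    below, columns ordered as [alpha, beta, S0_O, S0_H]."""
    th_O = 2*np.pi*(t_ovro - t0)/P
    th_H = 2*np.pi*(t_hay  - t0)/P
    nO, nH = len(t_ovro), len(t_hay)
    Q = np.zeros((nO + nH, 4))
    Q[:nO, 0], Q[:nO, 1], Q[:nO, 2] = np.sin(th_O), np.cos(th_O), 1.0
    Q[nO:, 0], Q[nO:, 1], Q[nO:, 3] = A_HO*np.sin(th_H), A_HO*np.cos(th_H), 1.0
    return Q
```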
\begin{align} S &= Qa + n \end{align}Here $S$ is a stack of the OVRO and Haystack samples, $a = [\alpha,\beta,S_0^O,S_0^H]$ is the vector of linear parameters, and $Q$ is a matrix that depends on $P$ and $A^{HO}$ and implements equations 3-4. With this, we can write the likelihood as
\begin{align} -2\log L &= \log|2\pi N| + (S-Qa)^TN^{-1}(S-Qa) \notag \\ &= \log|2\pi N| + (a-\hat a)^T \hat A^{-1} (a-\hat a) + S^TN^{-1}S - \hat a^T \hat A^{-1} \hat a \end{align}where $\hat a = (Q^TN^{-1}Q)^{-1}Q^TN^{-1}S$ and $\hat A^{-1} = Q^TN^{-1}Q$. These have simple interpretations: $\hat a$ is the maximum-likelihood estimator for the linear parameters, and $\hat A$ is the covariance of this estimator. To reduce the non-linear fit to just two parameters we must marginalize over the linear parameters, which means integrating over all values of $a$. Since $a$ has a Gaussian likelihood with covariance $\hat A$, we know that its integral is just $\sqrt{|2\pi \hat A|}$, so the marginalized likelihood is
\begin{align} -2\log L' &= \log|2\pi N| - \log|2\pi \hat A| + S^TN^{-1}S - \hat a^T \hat A^{-1} \hat a \end{align}The first and third terms don't depend on any of our parameters and so can be ignored in an MCMC fit. Additionally ignoring the second term is equivalent to a Jeffreys minimum-information prior on $a$. We are therefore left with the following posterior probability for the non-linear parameters.
\begin{align} \log P &= \frac12 \hat a^T \hat A^{-1} \hat a + \log \textrm{prior} \end{align}where the prior is a manually chosen prior on the non-linear parameters. I used $P\in [1000,3000]$ days and $A^{HO}\in [0.1,4]$ in my fit.
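Putting the linear step and the posterior above together, the evaluation for a given $(P, A^{HO})$ might look like this sketch (the flat priors from the fit are hard-coded; the function name is hypothetical):

```python
import numpy as np

def marginalized_logpost(Q, N, S, P, A_HO):
    """Marginalized log-posterior: solve the linear step for ahat and
    Ahat^{-1} = Q^T N^{-1} Q, then return the quadratic form
    0.5 * ahat^T Ahat^{-1} ahat, subject to the flat priors
    P in [1000, 3000] days and A_HO in [0.1, 4]."""
    if not (1000 < P < 3000 and 0.1 < A_HO < 4):
        return -np.inf
    Ni   = np.linalg.inv(N)
    iA   = Q.T @ Ni @ Q                       # \hat A^{-1}
    ahat = np.linalg.solve(iA, Q.T @ Ni @ S)  # ML linear parameters
    return 0.5 * ahat @ iA @ ahat
```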
Here's a plot of what the marginalized posterior for our two non-linear parameters looks like, with and without the noise covariance fix. The color range is $-30$ to $0$ in log-posterior relative to the peak, with a contour interval of 1. As you can see, the fix makes a big difference. In fact, without the fix the preferred Haystack amplitude is negative!
Fixed | Raw |
---|---|
![]() | ![]() |
I implemented an emcee fit to the marginalized posterior, recording $\hat a$ and $\hat A$ for each sample via the "blob" feature. I then drew a realization of $a$ for each sample and used these to recover samples of the phase $\phi$ and amplitudes $A^O$ and $A^H$ from equations 1-2. I used two walkers per free parameter and 30000 samples (+30000 burnin) for each walker. The resulting samples can be found here: Haystack+OVRO, OVRO only.
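The derived-parameter step can be sketched as follows: for each stored blob $(\hat a, \hat A)$, draw a realization of $a$ and convert back to the parametrization of equations 1-2, using the identities $A^O=\sqrt{\alpha^2+\beta^2}$ and $\phi=\mathrm{atan2}(\beta,\alpha)/2\pi$ (the function name is hypothetical):

```python
import numpy as np

def draw_derived(ahat, Ahat, rng):
    """Draw a = [alpha, beta, S0_O, S0_H] from N(ahat, Ahat) and recover
    the equation 1-2 parameters: OVRO amplitude, phase (in periods),
    and the two offsets."""
    alpha, beta, S0_O, S0_H = rng.multivariate_normal(ahat, Ahat)
    A_O = np.hypot(alpha, beta)
    phi = (np.arctan2(beta, alpha) / (2*np.pi)) % 1.0
    return A_O, phi, S0_O, S0_H
```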
The degeneracy between $P$ and $\phi$ is due to a suboptimal choice of $t_0 = 51000$ mjd. I have tested values in the middle of the OVRO range, and this breaks the degeneracy, but I chose this one to be compatible with the papers.
 | Both | OVRO |
---|---|---|
$P$ | 1735.800 ± 14.046 days | 1757.119 ± 63.929 days |
$\phi$ | 0.170 ± 0.034 periods | 0.122 ± 0.153 periods |
$A^O$ | 0.453 ± 0.052 Jy | 0.464 ± 0.053 Jy |
$A^H$ | 0.837 ± 0.306 Jy | |
This can be compared to P = 1737.9 ± 2.6 and $\phi$ = 0.140 ± 0.005 from O'Neill et al, which ignores correlated noise in the fit. The values are consistent with mine to 1 sigma, while my errors are around 5-7x as large.
Here's what the fit looks like with $t_0 = 57500$ instead. This is in the middle of the OVRO period.
 | Both | OVRO |
---|---|---|
$P$ | 1735.989 ± 13.618 days | 1761.611 ± 65.601 days |
$\phi$ | 0.425 ± 0.020 periods | 0.416 ± 0.028 periods |
$A^O$ | 0.453 ± 0.052 Jy | 0.465 ± 0.053 Jy |
$A^H$ | 0.844 ± 0.312 Jy | |
While working on adding more datasets, I've encountered another issue with the original OVRO and Haystack sims: They don't just assume periodic boundary conditions - they also assume equi-spaced samples. Here's an illustration of this.
Both plots show the 0th and 100th rows from the covariance matrix measured from the OVRO simulations. In the left plot, the covariance is plotted as a function of sample offset, so 0 on the x axis is the covariance of a sample with itself, 1 is the covariance of a sample with its neighbor, and so on. As you can see, it's a very smooth function when plotted this way, and the 0th and 100th rows are identical to each other. That's just what one would expect for a model that's diagonal in Fourier space.
In the right plot, the covariance is instead plotted as a function of time offset, so 1000 days on the x axis would be the covariance between two samples separated by 1000 days. This is what we expect the physics to care about, so this is the space where we would expect the rows of the matrix to be smooth and identical. But instead we see that they look jagged and different. This is consistent with the left plot: Given that the samples themselves are non-equispaced, a well-behaved curve in one plot has to be jagged in the other. What's wrong is that the jagged one is the one with a physical x axis.
Here's the same plot for ALMA 91 GHz as an example. Here the time-offset plot is the well-behaved one, as it should be. So something changed between when the OVRO/Haystack sims were made and when the ALMA/ZTF/WISE sims were made.
Originally I just used the OVRO covmat as is, after pruning the periodic part. But of course, samples that are further apart should be less correlated, so the noise model should improve when interpolated onto the actual, non-equispaced time basis. I did this, taking the covmat from the left image to the right image below.
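A sketch of that interpolation step, assuming the equi-spaced covmat is a function of sample offset only, so its first row serves as a covariance-vs-lag profile (names are hypothetical):

```python
import numpy as np

def resample_covmat(C, t):
    """Interpolate an equi-spaced covmat onto the real sample times t:
    read off C[0] as covariance vs lag (lag step = mean spacing), then
    evaluate it at the actual |t_i - t_j| separations."""
    dt   = (t[-1] - t[0]) / (len(t) - 1)   # mean sample spacing
    lags = np.arange(C.shape[0]) * dt
    sep  = np.abs(t[:, None] - t[None, :])
    return np.interp(sep, lags, C[0])
```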
Equi-spaced | Resampled |
---|---|
![]() | ![]() |
Ideally this would reduce the error bars, but instead the $P$ uncertainty went from 14 days to 18 days.
$P$ = 1739.800 ± 17.353 days, $\phi$ = 0.173 ± 0.040 periods, $A^O$ = 0.374 ± 0.083 Jy, $A^H$ = 0.853 ± 0.304 Jy
$P$ = 1739.911 ± 18.412 days, $\phi$ = 0.436 ± 0.040 periods, $A^O$ = 0.373 ± 0.083 Jy, $A^H$ = 0.852 ± 0.307 Jy
I have a hypothesis for what's going on. If I understand it correctly, the noise model is built from the data itself, which means the large sinusoidal signal is a big contribution to it, and the signal therefore ends up penalizing itself in the likelihood. However, when the noise model assumes equi-spaced samples while the signal model uses the real sample spacing, the signal is pulled out of alignment with the noise model, and therefore isn't penalized as heavily as it really "should" be. If this hypothesis is right, then we could improve our error bars by building the noise model in a way that makes it insensitive to the sinusoidal signal, though there's a bit of a chicken-and-egg problem there. In any case, the change in the error bars is small enough that I don't think we need to spend more time on this, and quoting either the numbers from my earlier analysis or this one should be fine. The difference between them is probably not significant compared to the uncertainty in the noise model itself.