Eclipses of continuous gravitational waves as a probe of stellar structure

Understanding how stars work is a fundamental problem in astrophysics. We can’t open up a star to investigate its inner workings, which makes it difficult to test our models. Over the years, we have developed several ways to sneak a peek into what must be happening inside stars, such as by measuring solar neutrinos, or using asteroseismology to measure how sounds travels through a star. In this paper, we propose a new way to examine the hearts of stars using gravitational waves.

Gravitational waves interact very weakly with stuff. Whereas light gets blocked by material (meaning that we can’t see deeper than a star’s photosphere), gravitational waves will happily travel through pretty much anything. This property means that gravitational waves are hard to detect, but it also means that there’ll happily pass through an entire star. While the material that makes up a star will not affect the passing of a gravitational wave, its gravity will. The mass of a star can lead to gravitational lensing and a slight deflecting, magnification and delaying of a passing gravitational wave. If we can measure this lensing, we can reconstruct the mass of star, and potentially map out its internal structure.

Two types of eclipse: the eclipse of a distant gravitational wave (GW) source by the Sun, and gravitational waves from an accreting millisecond pulsar (MSP) eclipsed by its companion. Either scenario could enable us to see gravitational waves passing through a star. Figure 2 of Marchant et al. (2020).

We proposed looking at gravitational waves for eclipsing sources—where a gravitational wave source is behind a star. As the alignment of the Earth (and our detectors), the star and the source changes, the gravitational wave will travel through different parts of the star, and we will see a different amount of lensing, allowing us to measure the mass of the star at different radii. This sounds neat, but how often will we be lucky enough to see an eclipsing source?

To date, we have only seen gravitational waves from compact binary coalescences (the inspiral and merger of two black holes or neutron stars). These are not a good source for eclipses. The chances that they travel through a star is small (as space is pretty empty) [bonus note]. Furthermore, we might not even be able to work out that this happened. The signal is relatively short, so we can’t compare the signal before and during an eclipse. Another type of gravitational wave signal would be much better: a continuous gravitational wave signal.

Probability of observing at least one eclipsing source amongst a number of observed sources. Compact binary coalescences (CBCs, shown in purple) are the most rare, continuous gravitational waves (CGWs) eclipsed by the Sun (red) or by a companion (red) are more common. Here we assume companions are stars about a tenth the mass of the neutron star. The number of neutron stars with binary companions is estimated using the COSMIC population synthesis code. Results are shown for eclipses where the gravitational waves get within distance $b$ of the centre of the star. Figure 1 of Marchant et al. (2020).

Continuous gravitational waves are produced by rotating neutron stars. They are pretty much perfect for searching for eclipses. As you might guess from their name, continuous gravitational waves are always there. They happily hum away, sticking to pretty much the same note (they’d get pretty annoying to listen to). Therefore, we can measure them before, during and after an eclipse, and identify any changes due to the gravitational lensing. Furthermore, we’d expect that many neutron stars would be in close binaries, and therefore would be eclipsed by their partner. This would happen each time they orbit, potentially giving us lots of juicy information on these stars. All we need to do is measure the continuous gravitational wave…

The effect of the gravitational lensing by a star is small. We performed detailed calculations for our Sun (using MESA), and found that for the effects to be measurable you would need an extremely loud signal. A signal-to-noise ratio would need to be hundreds during the eclipse for measurement precision to be good enough to notice the imprint of lensing. To map out how things changed as the eclipse progressed, you’d need signal-to-noise ratios many times higher than this. As an eclipse by the Sun is only a small fraction of the time, we’re going to need some really loud signals (at least signal-to-noise ratios of 2500) to see these effects. We will need the next generation of gravitational wave detectors.

We are currently thinking about the next generation of gravitational wave detectors [bonus note]. The leading ideas are successors to LIGO and Virgo: detectors which cover a large range of frequencies to detect many different types of source. These will be expensive (billions of dollars, euros or pounds), and need international collaboration to finance. However, I also like the idea of smaller detectors designed to do one thing really well. Potentially these could be financed by a single national lab. I think eclipsing continuous waves are the perfect source for this—instead of needing a detector sensitive over a wide frequency range, we just need to be sensitive over a really narrow range. We will be able to detect continuous waves before we are able to see the impact of eclipses. Therefore, we’ll know exactly what frequency to tune for. We’ll also know exactly when we need to observe. I think it would be really awesome to have a tunable narrowband detector, which could measure the eclipse of one source, and then be tuned for the next one, and the next. By combining many observations, we could really build up a detailed picture of the Sun. I think this would be an exciting experiment—instrumentalists, put your thinking hats on!

Let’s reach for(the centres of) the stars.

arXiv: 1912.04268 [astro-ph.SR]
Journal: Physical Review D; 101(2):024039(15); 2020
Data release: Eclipses of continuous gravitational waves as a probe of stellar structure
CIERA story: Using gravitational waves to see inside stars
Why does the sun really shine? The Sun is a miasma of incandescent plasma

Bonus notes

Silver lining

Since signals from compact binary coalescences are so unlikely to be eclipsed by a star, we don’t have to worry that our measurements of the source property are being messed up by this type of gravitational lensing distorting the signal. Which is nice.

Prospects with LISA

If you were wondering if we could see these types of eclipses with the space-based gravitational wave observatory LISA, the answer is sadly no. LISA observes lower frequency gravitational waves. Lower frequency means longer wavelength, so long in fact that the wavelength is larger than the size of the Sun! Since the size of the Sun is so small compared to the gravitational wave, it doesn’t leave a same imprint: the wave effectively skips over the gravitational potential.

An introduction to LIGO–Virgo data analysis

LIGO and Virgo make their data open for anyone to try analysing [bonus note]. If you’re a student looking for a project, a teacher planning a class activity, or a scientist working on a paper, this data is waiting for you to use. Understanding how to analyse the data can be tricky. In this post, I’ll share some of the resources made by LIGO and Virgo to help introduce gravitational-wave analysis. These papers together should give you a good grounding in how to get started working with gravitational-wave data.

If you’d like a more in-depth understanding, I’d recommend visiting your local library for Michele Maggiore’s  Gravitational Waves: Volume 1.

The Data Analysis Guide

Title: A guide to LIGO-Virgo detector noise and extraction of transient gravitational-wave signals
arXiv:
1908.11170 [gr-qc]
Journal: Classical & Quantum Gravity;37(5):055002(54); 2020
Tutorial notebook: GitHub;  Google Colab; Binder
Code repository: Data Guide
LIGO science summary: A guide to LIGO-Virgo detector noise and extraction of transient gravitational-wave signals

It took many decades to develop the technology necessary to build gravitational-wave detectors. Similarly, gravitational-wave data analysis has developed over many decades—I’d say LIGO analysis was really kicked off in the early 1990s by Kipp Thorne’s group. There are now hundreds of papers on various aspects of gravitational-wave analysis. If you are new to the area, where should you start? Don’t panic! For the binary sources discovered so far, this Data Analysis Guide has you covered.

More details: The Data Analysis Guide

The GWOSC Paper

Title: Open data from the first and second observing runs of Advanced LIGO and Advanced Virgo
arXiv:
1912.11716 [gr-qc]
Website: Gravitational Wave Open Science Center
LIGO science summary: Open data from the first and second observing runs of Advanced LIGO and Advanced Virgo

Data from the LIGO and Virgo detectors is released by the Gravitational Wave Open Science Center (GWOSC, pronounced, unfortunately, as it is spelt). If you want to try analysing our delicious data yourself, either searching for signals or studying the signals we have found, GWOSC is the place to start. This paper outlines how these data are produced, going from our laser interferometers to your hard-drive. The paper specifically looks at the data released for our first and second observing runs (O1 and O2), however, GWOSC also host data from the initial detectors’ fifth science run (S5) and sixth science run (S6), and will be updated with new data in the future.

If you do use data from GWOSC, please remember to say thank you.

More details: The GWOSC Paper

I thought I saw a 2! Credit: Fox

The Data Analysis Guide

Synopsis: Data Analysis Guide
Read this if: You want an introduction to signal analysis
Favourite part: This is a great resource for new students [bonus note]

Gravitational-wave detectors measure ripples in spacetime. They record a simple time series of the stretching and squeezing of space as a gravitational wave passes. Well, they measure that, plus a whole lot of noise. Most of the time it is just noise. How do we go from this time series to discoveries about the Universe’s black holes and neutron stars? This paper gives the outline, it covers (in order)

1. An introduction to observations at the time of writing
2. The basics of LIGO and Virgo data—what it is the we analyse
3. The basics of detector noise—how we describe sources of noise in our data
4. Fourier analysis—how we go from time series to looking at the data in the as a function of frequency, which is the most natural way to analyse that data.
5. Time–frequency analysis and stationarity—how we check the stability of data from our detectors
6. Detector calibration and data quality—how we make sure we have good quality data
7. The noise model and likelihood—how we use our understanding of the noise, under the assumption of it being stationary, to work out the likelihood of different signals being in the data
8. Signal detection—how we identify times in the data which have a transient signal present
9. Inferring waveform and physical parameters—how we estimate the parameters of the source of a gravitational wave
10. Residuals around GW150914—a consistency check that we have understood the noise surrounding our first detection

The paper works through things thoroughly, and I would encourage you to work though it if you are interested.

I won’t summarise everything here, I want to focus the (roughly undergraduate-level) foundations of how we do our analysis in the frequency domain. My discussion of the GWOSC Paper goes into more detail on the basics of LIGO and Virgo data, and some details on calibration and data quality. I’ll leave talking about residuals to this bonus note, as it involves a long tangent and me needing to lie down for a while.

Fourier analysis

The signal our detectors measure is a time series $d(t)$. This is may just contain noise, $d(t) = n(t)$, or it may also contain a signal, $d(t) = n(t) + h(t)$.

There are many sources of noise for our detectors. The different sources can affect different frequencies. If we assume that the noise is stationary, so that it’s properties don’t change with time, we can simply describe the properties of the noise with the power spectral density $S_n(f)$. On average we expect the noise at a given frequency to be zero, but with it fluctuating up and down with a variance given by the power spectral density. We typically approximate the noise as Gaussian, such that

$n(f) \sim \mathcal{N}(0; S_n(f)/2)$,

where we use $\mathcal{N}(\mu; \sigma^2)$ to represent a normal distribution with mean $\mu$ and standard deviation $\sigma$. The approximations of stationary and Gaussian noise are good most of the time. The noise does vary over time, but is usually effectively stationary over the durations we look at for a signal. The noise is also mostly Gaussian except for glitches. These are taken into account when we search for signals, but we’ll ignore them for now. The statistical description of the noise in terms of the power spectral density allows us to understand our data, but this understanding comes as a function of frequency: we must transform of time domain data to frequency domain data.

The go from $d(t)$ to $d(f)$ we can use a Fourier transform. Fourier transforms are a way of converting a function of one variable into a function of a reciprocal variable—in the case of time you convert to frequency. Fourier transforms encode all the information of the original function, so it is possible to convert back and forth as you like. Really, a Fourier transform is just another way of looking at the same function.

The Fourier transform is defined as

$d(f) = \mathcal{F}_f\left\{d(t)\right\} = \int_{-\infty}^{\infty} d(t) \exp(-2\pi i f t) \,\mathrm{d}t$.

Now, from this you might notice a problem when it comes to real data analysis, namely that the integral is defined over an infinite amount of time. We don’t have that much data. Instead, we only have a short period.

We could recast the integral above over a shorter time if instead of taking the Fourier transform of $d(t)$, we take the Fourier transform of $d(t) \times w(t)$ where $w(t)$ is some window function which goes to zero outside of the time interval we are looking at. What we end up with is a convolution of the function we want with the Fourier transform of the window function,

$\mathcal{F}_f\left\{d(t)w(t)\right\} = d(f) \ast w(f)$.

It is important to pick a window function which minimises the distortion to the signal that we want. If we just take a tophat (also known as a boxcar or rectangular, possibly on account of its infamous criminal background) function which is abruptly cuts off the data at the ends of the time interval, we find that $w(f)$ is a sinc function. This is not a good thing, as it leads to all sorts of unwanted correlations between different frequencies, commonly known as spectral leakage. A much better choice is a function which smoothly tapers to zero at the edges. Using a tapering window, we lose a little data at the edges (we need to be careful choosing the length of the data analysed), but we can avoid the significant nastiness of spectral leakage. A tapering window function should always be used. Then or finite-time Fourier transform is then a good approximation to the exact $d(f)$.

Data processing to reveal GW150914. The top panel shows raw Hanford data. The second panel shows a window function being applied. The third panel shows the data after being whitened. This cleans up the data, making it easier to pick out the signal from all the low frequency noise. The bottom panel shows the whitened data after a bandpass filter is applied to pick out the signal. We don’t use the bandpass filter in our analysis (it is just for illustration), but the other steps reflect how we treat our data. Figure 2 of the Data Analysis Guide.

Now we have our data in the frequency domain, it is simple enough to compare the data to the expected noise a t a given frequency. If we measure something loud at a frequency with lots of noise we should be less surprised than if we measure something loud at a frequency which is usually quiet. This is kind of like how somewhat shouting is less startling at a rock concert than in a library. The appropriate way to weight is to divide by the square root of power spectral density $d_\mathrm{w}(f) \propto d(f)/[S_n(f)]^{1/2}$. This is known as whitening. Whitened data should have equal amplitude fluctuations at all frequencies, allowing for easy comparisons.

Now we understand the statistical properties of noise we can do some analysis! We can start by testing our assumption that the data are stationary and Gaussian by checking that that after whitening we get the expected distribution. We can also define the likelihood of obtaining the data $d(t)$ given a model of a gravitational-wave signal $h(t)$, as the properties of the noise mean that $d(f) - h(f) \sim \mathcal{N}(0; S_n(f)/2)$. Combining the likelihood for each individual frequency gives the overall likelihood

$\displaystyle p(d|h) \propto \exp\left[-\int_{-\infty}^{\infty} \frac{|d(f) - h(f)|^2}{S_n(f)} \mathrm{d}f \right]$.

This likelihood is at the heart of parameter estimation, as we can work out the probability of there being a signal with a given set of parameters. The Data Analysis Guide goes through many different analyses (including parameter estimation) and demonstrates how to check that noise is nice and Gaussian.

Distribution of residuals for 4 seconds of data around GW150914 after subtracting the maximum likelihood waveform. The residuals are the whitened Fourier amplitudes, and they should be consistent with a unit Gaussian. The residuals follow the expected distribution and show no sign of non-Gaussianity. Figure 14 of the Data Analysis Guide.

Homework

The Data Analysis Guide contains much more material on gravitational-wave data analysis. If you wanted to delve further, there many excellent papers cited. Favourites of mine include Finn (1992); Finn & Chernoff (1993); Cutler & Flanagan (1994); Flanagan & Hughes (1998); Allen (2005), and Allen et al. (2012). I would also recommend the tutorials available from GWOSC and the lectures from the Open Data Workshops.

The GWOSC Paper

Synopsis: GWOSC Paper
Read this if: You want to analyse our gravitational wave data
Favourite part: All the cool projects done with this data

You’re now up-to-speed with some ideas of how to analyse gravitational-wave data, you’ve made yourself a fresh cup of really hot tea, you’re ready to get to work! All you need are the data—this paper explains where this comes from.

Data production

The first step in getting gravitational-wave data is the easy one. You need to design a detector, convince science agencies to invest something like half a billion dollars in building one, then spend 40 years carefully researching the necessary technology and putting it all together as part of an international collaboration of hundreds of scientists, engineers and technicians, before painstakingly commissioning the instrument and operating it. For your convenience, we have done this step for you, but do feel free to try it yourself at home.

Gravitational-wave detectors like Advanced LIGO are built around an interferometer: they have two arms at right angles to each other, and we bounce lasers up and down them to measure their length. A passing gravitational wave will change the relative length of one arm relative to the other. This changes the time taken to travel along one arm compared to the other. Hence, when the two bits of light reach the output of the interferometer, they’ll have a different phase:where normally one light wave would have a peak, it’ll have a trough. This change in phase will change how light from the two arms combine together. When no gravitational wave is present, the light interferes destructively, almost cancelling out so that the output is dark. We measure the brightness of light at the output which tells us about how the length of the arms changes.

We want our detector in measure the gravitational-wave strain. That is the fractional change in length of the arms,

$\displaystyle h(t) = \frac{\Delta L(t)}{L}$,

where $\Delta L = L_x - L_y$ is the relative difference in the length of the two arms, and $L$ is the usually arm length. Since we love jargon in LIGO & Virgo, we’ll often refer to the strain as HOFT (as you would read $h(t)$ as h of t; it took me years to realise this) or DARM (differential arm measurement).

The actual output of the detector is the voltage from a photodiode measuring the intensity of the light. It is necessary to make careful calibration of the detectors. In theory this is simple: we change the position of the mirrors at the end of the arms and see how the output changes. In practise, it is very difficult. The GW150914 Calibration Paper goes into details for O1, more up-to-date descriptions are given in Cahillane et al. (2017) for LIGO and Acernese et al. (2018) for Virgo. The calibration of the detectors can drift over time, improving the calibration is one of the things we do between originally taking the data and releasing the final data.

The data are only celebrated between 10 Hz and 5 kHz, so don’t trust the data outside of that frequency range.

The next stage of our data’s journey is going through detector characterisation and data quality checks. In addition to measuring gravitational-wave strain, we record many other data channels: about 200,000 per detector. These measure all sorts of things, from the internal state of the instrument, to monitoring the physical environment around the detectors. These auxiliary channels are used to check the data quality. In some cases, an auxiliary channel will record a source of noise, like scattered light or the mains power frequency, allowing us to clean up our strain data by subtracting out this noise. In other cases, an auxiliary channel can act as a witness to a glitch in our detector, identifying when it is misbehaving so that we know not to trust that part of the data. The GW150914 Detector Characterisation Paper goes into details of how we check potential detections. In doing data quality checks we are careful to only use the auxiliary channels which record something which would be independent of a passing gravitational wave.

We have 4 flags for data quality:

1. DATA: All clear. Certified fresh. Eat as much as you like.
2. CAT1: A critical problem with the instrument. Data from these times are likely to be a dumpster fire of noise. We do not use them in our analyses, and they are currently excluded from our public releases. About 1.7% of Hanford data and 1.0% of time from Livingston was flagged with CAT1 in O1. In O2,  we got this done to 0.001% for Hanford, 0.003% for Livingston and 0.05% for Virgo.
3. CAT2: Some activity in an auxiliary channel (possibly the electric boogaloo monitor) which has a well understood correlation with the measured strain channel. You would therefore expect to find some form of glitchiness in the data.
4. CAT3: There is some correlation in an auxiliary channel and the strain channel which is not understood. We’re not currently using this flag, but it’s kept as an option.

It’s important to verify the data quality before starting your analysis. You don’t want to get excited to discover a completely new form of gravitational wave only to realise that it’s actually some noise from nearby logging. Remember, if a tree falls in the forest and no-one is around, LIGO will still know.

To test our systems, we also occasionally perform a signal injection: we move the mirrors to simulate a signal. This is useful for calibration and for testing analysis algorithms. We don’t perform injections very often (they get in the way of looking for real signals), but these times are flagged. Just as for data quality flags, it is important to check for injections before analysing a stretch of data.

Once passing through all these checks, the data is ready to analyse!

Excited Data. Credit: Paramount

Accessing the data

After our data have been lovingly prepared, they are served up in two data formats:

• Hierarchical Data Format HDF, which is a popular data storage format, as it is easily allows for metadata and multiple data sets (like the important data quality flags) to be packaged together.
• Gravitational Wave Frame GWF, which is the standard format we use internally. Veteran gravitational-wave scientists often get a far-way haunted look when you bring up how the specifications for this file format were decided. It’s best not to mention unless you are also buying them a stiff drink.

In these files, you will find $h(t)$ sampled at either 4096 Hz or 16384 Hz (either are available). Pick the sampling rate you need depending upon the frequency range you are interested in: the 4096 Hz data are good for upto 1.7 kHz, while the 16384 Hz are good to the limit of the calibration range at 5 kHz.

Files can be downloaded from the GWOSC website. If you want to download a large amount, it is recommended to use the CernVM-FS distributed file system.

To check when the gravitational-wave detectors were observing, you can use the Timeline search.

Screenshot of the GWOSC Timeline showing observing from the fifth science run (S5) on the initial detector era through to the second observing run (O2) of the advanced detector era. Bars show observing of GEO 600 (G1), Hanford (H1 and H2), Livingston (L1) and Virgo (V1). Hanford initial had two detectors housed within its site, the plan in the advanced detector era is to install the equipment as LIGO India instead.

Try this at home

Having gone through all these details, you should now know what are data is, over what ranges it can be analyzed, and how to get access to it. Your cup of tea has also probably gone cold. Why not make yourself a new one, and have a couple of biscuits as reward too. You deserve it!

To help you on your way in starting analysing the data, GWOSC has a set of tutorials (and don’t forget the Data Analysis Guide), and a collection of open source software. Have fun, and remember, it’s never aliens.

Bonus notes

Release schedule

The current policy is that data are released:

1. In a chunk surrounding an event at time of publication of that event. This enables the new detection to be analysed by anyone. We typically release about an hour of data around an event.
2. 18 months after the end of the run. This time gives us chance to properly calibrate the data, check the data quality, and then run the analyses we are committed to. A lot of work goes into producing gravitational wave data!

Start marking your calendars now for the release of O3 data.

Summer studenting

In summer 2019, while we were finishing up on the Data Analysis Guide, I gave it to one of my summer students Andrew Kim as an introduction. Andrew was working on gravitational-wave data analysis, so I hoped that he’d find it useful. He ended up working through the draft notebook made to accompany the paper and making a number of useful suggestions! He ended up as an author on the paper because of these contributions, which was nice.

The conspiracy of residuals

The Data Analysis Guide is an extremely useful paper. It explains many details of gravitational-wave analysis. The detections made by LIGO and Virgo over the last few years has increased the interest in analysing gravitational waves, making it the perfect time to write such an article. However, that’s not really what motivated us to write it.

In 2017, a paper appeared on the arXiv making claims of suspicious correlations in our LIGO data around GW150914. Could this call into question the very nature of our detection? No. The paper has two serious flaws.

1. The first argument in the paper was that there were suspicious phase correlations in the data. This is because the authors didn’t window their data before Fourier transforming.
2. The second argument was the residuals presented in Figure 1 of the GW150914 Discovery Paper contain a correlation. This is true, but these residuals aren’t actually the results of how we analyse the data. The point of Figure 1 was to that you don’t need our fancy analysis to see the signal—you can spot it by eye. Unfortunately, doing things by eye isn’t perfect, and this imperfection was picked up on.

The first flaw is a rookie mistake—pretty much everyone does it at some point. I did it starting out as a first-year PhD student, and I’ve run into it with all the undergraduates I’ve worked with writing their own analyses. The authors of this paper are rookies in gravitational-wave analysis, so they shouldn’t be judged too harshly for falling into this trap, and it is something so simple I can’t blame the referee of the paper for not thinking to ask. Any physics undergraduate who has met Fourier transforms (the second year of my degree) should grasp the mistake—it’s not something esoteric you need to be an expert in quantum gravity to understand.

The second flaw is something which could have been easily avoided if we had been more careful in the GW150914 Discovery Paper. We could have easily aligned the waveforms properly, or more clearly explained that the treatment used for Figure 1 is not what we actually do. However, we did write many other papers explaining what we did do, so we were hardly being secretive. While Figure 1 was not perfect, it was not wrong—it might not be what you might hope for, but it is described correctly in the text, and none of the LIGO–Virgo results depend on the figure in any way.

Recovered gravitational waveforms from our analysis of GW150914. The grey line shows the data whitened by the noise spectrum. The dark band shows our estimate for the waveform without assuming a particular source. The light bands show results if we assume it is a binary black hole (BBH) as predicted by general relativity. This plot more accurately represents how we analyse gravitational-wave data. Figure 6 of the GW150914 Parameter Estimation Paper.

Both mistakes are easy to fix. They are at the level of “Oops, that’s embarrassing! Give me 10 minutes. OK, that looks better”. Unfortunately, that didn’t happen.

The paper regrettably got picked up by science blogs, and caused much of a flutter. There were demands that LIGO and Virgo publically explain ourselves. This was difficult—the Collaboration is set up to do careful science, not handle a PR disaster. One of the problems was that we didn’t want to be seen to policing the use of our data. We can’t check that every paper ever using our data does everything perfectly. We don’t have time, and that probably wouldn’t encourage people to use our data if they knew any mistake would be pulled up by this 1000 person collaboration. A second problem was that getting anything approved as an official Collaboration document takes ages—getting consensus amongst so many people isn’t always easy. What would you do—would you want to be the faceless Collaboration persecuting the helpless, plucky scientists trying to check results?

There were private communications between people in the Collaboration and the authors. It took us a while to isolate the sources of the problems. In the meantime, pressure was mounting for an official™ response. It’s hard to justify why your analysis is correct by gesturing to a stack of a dozen papers—people don’t have time to dig through all that (I actually sent links to 16 papers to a science journalist who contacted me back in July 2017). Our silence may have been perceived as arrogance or guilt.

It was decided that we would put out an unofficial response. Ian Harry had been communicating with the authors, and wrote up his notes which Sean Carroll kindly shared on his blog. Unfortunately, this didn’t really make anyone too happy. The authors of the paper weren’t happy that something was shared via such an informal medium; the post is too technical for the general public to appreciate, and there was a minor typo in the accompanying code which (since fixed) was seized upon. It became necessary to write a formal paper.

Peer review will save the children! Credit: Fox

We did continue to try to explain the errors to the authors. I have colleagues who spent many hours in a room in Copenhagen trying to explain the mistakes. However, little progress was made, and it was not a fun time™. I can imagine at this point that the authors of the paper were sufficiently angry not to want to listen, which is a shame.

Now that the Data Analysis Guide is published, everyone will be satisfied, right? A refereed journal article should quash all fears, surely? Sadly, I doubt this will be the case. I expect these doubts will keep circulating for years. After all, there are those who still think vaccines cause autism. Fortunately, not believing in gravitational waves won’t kill any children. If anyone asks though, you can tell them that any doubts on LIGO’s analysis have been quashed, and that vaccines cause adults!

For a good account of the back and forth, Natalie Wolchover wrote a nice article in Quanta, and for a more acerbic view, try Mark Hannam’s blog.

What GW170729’s exceptional mass and spin tells us about its family tree

One of the great discoveries that came with our first observation of gravitational waves was that black holes can merge—two black holes in a binary can come together and form a bigger black hole. This had long been predicted, but never before witnessed. If black holes can merge once, can they go on to merge again? In this paper, we calculated how to identify a binary containing a second-generation black hole formed in a merger.

Merging black holes

Black holes have two important properties: their mass and their spin. When two black holes merge, the resulting black hole has:

1. A mass which is almost as big as the sum of the masses of its two parents. It is a little less (about 5%) as some of the energy is radiated away as gravitational waves.
2. A spin which is around 0.7. This is set by the angular momentum of the two black holes as they plunge in together. For equal-mass black holes, the orbit of the two black holes will give about enough angular momentum for the final black hole to be about 0.7. The spins of the two parent black holes will cause a bit a variation around this, depending upon the orientations of their spins. For more unequal mass binaries, the spin of the larger parent black hole becomes more important.

To look for second-generation (or higher) black holes formed in mergers, we need to look for more massive black holes with spins of about 0.7 [bonus note].

Combining black holes. The result of a merger is a larger black hole with significant spin. From Dawn Finney.

The difficult bit here is that we don’t know the distribution of masses and spins of the initial first-generation black holes. What is they naturally form with spins of 0.7? How can you tell if a black hole is unexpectedly large if you don’t know what sizes to expect? With the discovery of the 10 binary black holes found in our first and second observing runs, we are able to start making inferences about the properties of black holes—using these measurements of the population, we can estimate how probable it is that a binary contains a second generation black hole versus containing two first generation black hole.

GW170729

Amongst the black holes observed in O1 and O2, the source of GW170729 stands out. It is both the most massive, and one of only two systems (the other being GW151226) showing strong evidence for spin. This got me wondering if it could be a second-generation system? The high mass would be explained as we have a second-generation black hole, and the spin is larger than usual as a spin 0.7 sticks out.

Chase Kimball worked out the relative probability of getting a system with a given chirp mass and effective inspiral spin for a binary with a second-generation black hole verses a binary with only first-generation black holes. We worked in terms of chirp mass and effective inspiral spin, as these are the properties we measure well from a gravitational-wave signal.

Relative likelihood of a binary black hole being second-generation versus first-generation for different values of the chirp mass and the magnitude of the effective inspiral spin. The white contour gives the 90% credible area for GW170729. Figure 1 of Kimball et al. (2019).

The plot above shows the relative probabilities. Yellow indicate chirp mass and effective inspiral spins which are more likely with second-generation systems, while dark purple indicates values more likely with first-generation systems.. The first thing I realised was my idea about the spin was off. We expect binaries with second-generation black holes to be formed dynamically. Following the first merger, the black hole wander around until it gets close enough to form a new binary with a new black hole. For dynamically formed binaries the spins should be randomly distributed. This means that there’s only a small probability of having a spin aligned with the orbital angular momentum as measured for GW170729. Most of the time, you’d measure an effective inspiral spin of around zero.

Since we don’t know exactly the chirp mass and effective inspiral spin for GW170729, we have to average over our uncertainty. That gives the ratio of the probability of observing GW170729 given a second-generation source, verses given a first-generation source. Using different inferred black hole populations (for example, ones inferred including and excluding GW170729), we find ratios of between 0.2 (meaning the first-generation origin is more likely) and 16 (meaning second generation is more likely). The results change significantly as the result is sensitive to the maximum mass of a black hole. If we include GW170729 in our population inference for first-generation systems, the maximum mass goes up, and it’s easier to explain the system as first-generation (as you’d expect).

Before you place your bets, there is one more piece to the calculation. We have calculated the relative probabilities of the observed properties assuming either first-generation black holes or a second-generation black hole, but we have not folded in the relative rates of mergers [bonus note]. We expect first-generation only binaries to be more common than ones containing second generation black holes. In simulations of globular clusters, at most about 20% of merging binaries are with second-generation black holes. For binaries not in an environment like a globular cluster (where there are lots of nearby black holes to grab), we expect the fraction of second-generation black holes in binaries to be basically zero. Therefore, on balance we have at best a weak preference for a second-generation black hole and most probably just two first-generation black holes in GW170729’s source, despite its large mass.

Verdict

What we have learnt from this calculation is that it seems that all of the first 10 binary black holes contain only first-generation black holes. It is safe to infer the properties of first-generation black holes from these observations. Detecting second-generation black holes requires knowledge of this distribution, and crucially if there is a maximum mass. As we get more detection, we’ll be able to pin this down. There is still a lot to learn about the full black hole family.

If you’d like to understand our calculation, the paper is extremely short. It is therefore an excellent paper to bring to journal club if you are a PhD student who forgot you were presenting this week…

arXiv: 1903.07813 [astro-ph.HE]
Journal: Research Notes of the AAS; 4(1):2; 2020 [bonus note]
Gizmodo story: The gravitational wave detectors are turning back on and we’re psyched
Theme music: Nice to see you!

Bonus notes

Useful papers

Back in 2017 two papers hit the arXiv [bonus bonus note] at pretty much the same time addressing the expected properties of second-generation black holes: Fishbach, Holz & Farr (2017), and Gerosa & Berti (2017). Both are nice reads.

I was asked how we could tell if the black holes we were seeing were themselves the results of mergers back in 2016 when I was giving a talk to the Carolian Astronomical Society. It was a good question. I explained about the masses and spins, but I didn’t think about how to actually do the analysis to infer if we had a merger. I now make a note to remember any questions I’m asked, as they can be good inspiration for projects!

Bayes factor and odds ratio

The quantity we work out in the paper is the Bayes factor for a second-generation system verses a first-generation one

$\displaystyle \frac{P(\mathrm{GW170729}|\mathrm{Gen\ 2})}{P(\mathrm{GW170729}|\mathrm{Gen\ 1})}$.

What we want is the odds ratio

$\displaystyle \frac{P(\mathrm{Gen\ 2}|\mathrm{GW170729})}{P(\mathrm{Gen\ 1}|\mathrm{GW170729})}$,

which gives the betting odds for the two scenarios. The convert the Bayes factor into an odds ratio we need the prior odds

$\displaystyle \frac{P(\mathrm{Gen\ 2})}{P(\mathrm{Gen\ 1})}$.

We’re currently working on a better way to fold these pieces together.

1000 words

As this was a quick calculation, we thought it would be a good paper to be a Research Note. Research Notes are limited to 1000 words, which is a tough limit. We carefully crafted the document, using as many word-saving measures (such as abbreviations), as we could. We made it to the limit by our counting, only to submit and find that we needed to share another 100 off! Fortunately, the arXiv [bonus bonus note] is more forgiving, so you can read our more relaxed (but still delightfully short) version there. It’s the one I’d recommend.

arXiv

For those reading who are not professional physicists, the arXiv (pronounced archive, as the X is really the Greek letter chi χ) is a preprint server. It where physicists can post version of their papers ahead of publication. This allows sharing of results earlier (both good as it can take a while to get a final published paper, and because you can get feedback before finalising a paper), and, vitally, for free. Most published papers require a subscription to read. Fine if you’re at a university, not so good otherwise. The arXiv allows anyone to read the latest research. Admittedly, you have to be careful, as not everything on the arXiv will make it through peer review, and not everyone will update their papers to reflect the published version. However, I think the arXiv is a very good thing™. There are few things I can think of which have benefited modern science as much. I would 100% support those behind the arXiv receiving a Nobel Prize, as I think it has had just as a significant impact on the development of the field as the discovery of dark matter, understanding nuclear fission, or deducing the composition of the Sun.

Can neutron-star mergers explain the r-process enrichment in globular clusters?

Maybe

The mystery of the elements

Where do the elements come from? Hydrogen, helium and a little lithium were made in the big bang. These lighter elements are fused together inside stars, making heavier elements up to around iron. At this point you no longer get energy out by smooshing nuclei together. To build even heavier elements, you need different processes—one being to introduce lots of extra neutrons. Adding neutrons slowly leads to creation of s-process elements, while adding then rapidly leads to the creation of r-process elements. By observing the distribution of elements, we can figure out how often these different processes operate.

Periodic table showing the origins of different elements found in our Solar System. THis plot assumes that neutron star mergers are the dominant source of r-process elements. Credit: Jennifer Johnson

It has long been theorised that the site of r-process production could be neutron star mergers. Material ejected as the stars are ripped apart or ejected following the collision is naturally neutron rich. This undergoes radioactive decay leading making r-process elements. The discovery of the first binary neutron star collision confirmed this happens. If you have any gold or platinum jewellery, it’s origins can probably be traced back to a pair of neutron stars which collided billions of years ago!

The r-process may also occur in supernova explosions. It is most likely that it occurs in both supernovae and neutron star mergers—the question is which contributes more. Figuring this out would be helpful in our quest to understand how stars live and die.

Hubble Space Telescope image of the stars of NGC 1898, a globular cluster in the Large Magellanic Cloud. Credit: ESA/Hubble & NASA

In this paper, led by Michael Zevin, we investigated the r-process elements of globular clusters. Globular clusters are big balls of stars. Apart from being beautiful, globular clusters are an excellent laboratory for testing our understanding of stars,as there are so many packed into a (relatively) small space. We considered if observations of r-process enrichment could be explained by binary neutron star mergers?

Enriching globular clusters

The stars in globular clusters are all born around the same time. They should all be made from the same stuff; they should have the same composition, aside from any elements that they have made themselves. Since r-process elements are not made in stars, the stars in a globular cluster should have the same abundances of these elements. However, measurements of elements like lanthanum and europium, show star-to-star variation in some globular clusters.

This variation can happen if some stars were polluted by r-process elements made after the cluster formed. The first stars formed from unpolluted gas, while later stars formed from gas which had been enriched, possibly with stars closer to the source being more enriched than those further away. For this to work, we need (i) a process which can happen quickly [bonus science note], as the time over which stars form is short (they are almost the same age), and (ii) something that will happen in some clusters but not others—we need to hit the goldilocks zone of something not so rare that we’d almost never since enrichment, but not so common that almost all clusters would be enriched. Can binary neutron stars merge quickly enough and with the right rate to explain r-process enrichment?

Making binary neutron stars

There are two ways of making binary neutron stars: dynamically and via isolated evolution. Dynamically formed binaries are made when two stars get close enough to form a pairing, or when a star gets close to an binary existing binary resulting in one member getting ejecting and the interloper taking its place, or when two binaries get close together, resulting in all sorts of madness (Michael has previously looked at binary black holes formed through binary–binary interactions, and I love the animations, as shown below). Isolated evolution happens when you have a pair of stars that live their entire lives together. We examined both channels.

Dynamically formed binaries

With globular clusters having so many stars in such a small space, you might think that dynamical formation is a good bet for binary neutron star formation. We found that this isn’t the case. The problem is that neutron stars are relatively light. This causes two problems. First, generally the heaviest objects generally settle in the centre of a cluster where the density is highest and binaries are most likely to form. Second, in interactions, it is typically the heaviest objects that will be left in the binary. Black holes are more massive than neutron stars, so they will initially take the prime position. Through dynamical interactions, many will be eventually ejected from the cluster; however, even then, many of the remaining stars will be more massive than the neutron stars. It is hard for neutron stars to get the prime binary-forming positions [bonus note].

To check on the dynamical-formation potential, we performed two simulations: one with the standard mix of stars, and one ultimate best case™ where we artificially removed all the black holes. In both cases, we found that binary neutron stars take billions of years to merge. That’s far too long to lead to the necessary r-process enrichment.

Time taken for double black hole (DHB, shown in blue), neutron star–black hole (NSBH, shown in green), and double neutron star (DNS, shown in purple) [bonus note] binaries to form and then inspiral to merge in globular cluster simulations. Circles and dashed histograms show results for the standard cluster model. Triangles and solids histograms show results when black holes are artificially removed. Figure 1 of a Zevin et al. (2019).

Isolated binaries

Considering isolated binaries, we need to work out how many binary neutron stars will merge close enough to a cluster to enrich it. This requires a couple of ingredients: (I) knowing how many binary neutron stars form, and (ii) working how many are still close to the cluster when they merge. Neutron stars will get kicks when they are born in supernova explosions, and these are enough to kick them out of the cluster.  So long as they merge before they get too far, that’s OK for enrichment. Therefore we need to track both those that stay in the cluster, and those which leave but merge before getting too far. To estimate the number of enriching binary neutron stars, we simulated a populations of binary stars.

The evolution of binary neutron stars can be complicated. The neutron stars form from massive stars. In order for them to end up merging, they need to be in a close binary. This means that as the stars evolve and start to expand, they will transfer mass between themselves. This mass transfer can be stable, in which case the orbit widens, faster eventually shutting off the mass transfer, or it can be unstable, when the star expands leading to even more mass transfer (what’s really important is the rate of change of the size of the star compared to the Roche lobe). When mass transfer is extremely rapid, it leads to the formation of a common envelope: the outer layers of the donor ends up encompassing both the core of the star and the companion. Drag experienced in a common envelope can lead to the orbit shrinking, exactly as you’d want for a merger, but it can be too efficient, and the two stars may merge before forming two neutron stars. It’s also not clear what would happen in this case if there isn’t a clear boundary between the envelope and core of the donor star—it’s probable you’d just get a mess and the stars merging. We used COSMIC to see the effects of different assumptions about the physics:

• Model A: Our base model, which is in my opinion the least plausible. This assumes that helium stars can successfully survive a common envelope. Mass transfer from helium star will be especially important for our results, particularly what is called Case BB mass transfer [bonus note], which occurs once helium burning has finished in the core of a star, and is now burning is a shell outside the core.
• Model B: Here, we assume that stars without a clear core/envelope boundary will always merge during the common envelope. Stars burning helium in a shell lack a clear core/envelope boundary, and so any common envelopes formed from Case BB mass transfer will result in the stars merging (and no binary neutron star forming). This is a pessimistic model in terms of predicting rates.
• Model C: The same as Model A, but we use prescriptions from Tauris, Langer & Podsiadlowski (2015) for the orbital evolution and mass loss for mass transfer. These results show that mass transfer from helium stars typically proceeds stably. This means we don’t need to worry about common envelopes from Case BB mass transfer. This is more optimistic in terms of rates.
• Model D: The same as Model C, except all stars which undergo Case BB mass transfer are assumed to become ultra-stripped. Since they have less material in their envelopes, we give them smaller supernova natal kicks, the same as electron capture supernovae.

All our models can produce some merging neutron stars within 100 million years. However, for Model B, this number is small, so that only a few percent of globular clusters would be enriched. For the others, it would be a few tens of percent, but not all. Model A gives the most enrichment. Model C and D are similar, with Model D producing slightly less enrichment.

Post-supernova binary neutron star properties (systemic velocity $v_\mathrm{sys}$ vs inspiral time $t_\mathrm{insp}$, and orbital separation $a$ vs eccentricity $e$) for our population models. The lines in the left-hand plots show the bounds for a binary to enrich a cluster of a given virial radius: viable binaries are below the lines. In both plots, red, blue and green points are the binaries which could enrich clusters of virial radii 1 pc, 3 pc and 10 pc; of the other points, purple indicates systems where the secondary star went through Case BB mass transfer. Figure 2 of Zevin et al. (2019).

Maybe?

Our results show that the r-process enrichment of globular clusters could be explained by binary neutron star mergers if binaries can survive Case BB mass transfer without merging. If Case BB mass transfer is typically unstable and somehow it is possible to survive a common envelope (Model A), ~30−90% of globular clusters should be enriched (depending upon their mass and size). This rate is consistent with consistent with current observations, but it is a stretch to imagine stars surviving common envelopes in this case. However, if Case BB mass transfer is stable (Models C and D), we still have ~10−70% of globular clusters should be enriched. This could plausibly explain everything! If we can measure the enrichment in more clusters and accurately pin down the fraction which are enriched, we may learn something important about how binaries interact.

However, for our idea to work, we do need globular clusters to form stars over an extended period of time. If there’s no gas around to absorb the material ejected from binary neutron star mergers and then form new stars, we have not cracked the problem. The plot below shows that the build up of enriching material happens at around 40 million years after the initial start formation. This is when we need the gas to be around. If this is not the case, we need a different method of enrichment.

Probability of cluster enrichment $P_\mathrm{enrich}$ and number of enriching binary neutron star mergers per cluster $\Lambda_\mathrm{enrich}$ as a function of the timescale of star formation $\Delta \tau_\mathrm{SF}$. Dashed lines are used of a cluster of a million solar masses and solid lines are used for a cluster of half this mass. Results are shown for Model D. The build up happens around the same time in different models. Figure 5 in Zevin et al. (2019).

It may be interesting to look again at r-process enrichment from supernova.

arXiv: arXiv:1906.11299 [astro-ph.HE]
Journal: Astrophysical Journal; 886(1):4(16); 2019 [bonus note]
Alternative tile: The Europium Report

Bonus notes

Hidden pulsars and GW190425

The most recent gravitational-wave detection, GW190425, comes from a binary neutron star system of an unusually high mass. It’s mass is much higher than the population of binary neutron stars observed in our Galaxy. One explanation for this could be that it represents a population which is short lived, and we’d be unlikely to spot one in our Galaxy, as they’re not around for long. Consequently, the same physics may be important both for this study of globular clusters and for explaining GW190425.

Gravitational-wave sources and dynamical formation

The question of how do binary neutron stars form is important for understanding gravitational-wave sources. The question of whether dynamically formed binary neutron stars could be a significant contribution to the overall rate was recently studied in detail in a paper led by Northwestern PhD student Claire Ye. The conclusions of this work was that the fraction of binary neutron stars formed dynamically in globular clusters was tiny (in agreement with our results). Only about 0.001% of binary neutron stars we observe with gravitational waves would be formed dynamically in globular clusters.

Double vs binary

In this paper we use double black hole = DBH and double neutron star = DNS instead of the usual binary black hole = BBH and binary neutron star = BNS from gravitational-wave astronomy. The terms mean the same. I will use binary instead of double here as B is worth more than D in Scrabble.

Mass transfer cases

The different types of mass transfer have names which I always forget. For regular stars we have:

• Case A is from a star on the main sequence, when it is burning hydrogen in its core.
• Case B is from a star which has finished burning hydrogen in its core, and is burning hydrogen in shell/burning helium in the core.
• Case C is from a start which has finished core helium burning, and is burning helium in a shell. The star will now have carbon it its core, which may later start burning too.

The situation where mass transfer is avoided because the stars are well mixed, and so don’t expand, has also been referred to as Case M. This is more commonly known as (quai)chemically homogenous evolution.

If a star undergoes Case B mass transfer, it can lose its outer hydrogen-rich layers, to leave behind a helium star. This helium star may subsequently expand and undergo a new phase of mass transfer. The mass transfer from this helium star gets named similarly:

• Case BA is from the helium star while it is on the helium main sequence burning helium in its core.
• Case BB is from the helium star once it has finished core helium burning, and may be burning helium in a shell.
• Case BC is from the helium star once it is burning carbon.

If the outer hydrogen-rich layers are lost during Case C mass transfer, we are left with a helium star with a carbon–oxygen core. In this case, subsequent mass transfer is named as:

• Case CB if helium shell burning is on-going. (I wonder if this could lead to fast radio bursts?)
• Case CC once core carbon burning has started.

I guess the naming almost makes sense. Case closed!

Page count

Don’t be put off by the length of the paper—the bibliography is extremely detailed. Michael was exceedingly proud of the number of references. I think it is the most in any non-review paper of mine!

Classifying the unknown: Discovering novel gravitational-wave detector glitches using similarity learning

Gravity Spy is an awesome project that combines citizen science and machine learning to classify glitches in LIGO and Virgo data. Glitches are short bursts of noise in our detectors which make analysing our data more difficult. Some glitches have known causes, others are more mysterious. Classifying glitches into different types helps us better understand their properties, and in some cases track down their causes and eliminate them! In this paper, led by Scotty Coughlin, we demonstrated the effectiveness of a new tool which are citizen scientists can use to identify new glitch classes.

The Gravity Spy project

Gravitational-wave detectors are complicated machines. It takes a lot of engineering to achieve the required accuracy needed to observe gravitational waves. Most of the time, our detectors perform well. The background noise in our detectors is easy to understand and model. However, our detectors are also subject to glitches, unusual  (sometimes extremely loud and complicated) noise that doesn’t fit the usual properties of noise. Glitches are short, they only appear in a small fraction of the total data, but they are common. This makes detection and analysis of gravitational-wave signals more difficult. Detection is tricky because you need to be careful to distinguish glitches from signals (and possibly glitches and signals together), and understanding the signal is complicated as we may need to model a signal and a glitch together [bonus note]. Understanding glitches is essential if gravitational-wave astronomy is to be a success.

To understand glitches, we need to be able to classify them. We can search for glitches by looking for loud pops, whooshes and splats in our data. The task is then to spot similarities between them. Once we have a set of glitches of the same type, we can examine the state of the instruments at these times. In the best cases, we can identify the cause, and then work to improve the detectors so that this no longer happens. Other times, we might be able to find the source, but we can find one of the monitors in our detectors which acts a witness to the glitch. Then we know that if something appears in that monitor, we expect a glitch of a particular form. This might mean that we throw away that bit of data, or perhaps we can use the witness data to subtract out the glitch. Since glitches are so common, classifying them is a huge amount of work. It is too much for our detector characterisation experts to do by hand.

There are two cunning options for classifying large numbers of glitches

1. Get a computer to do it. The difficulty  is teaching a computer to identify the different classes. Machine-learning algorithms can do this, if they are properly trained. Training can require a large training set, and careful validation, so the process is still labour intensive.
2. Get lots of people to help. The difficulty here is getting non-experts up-to-speed on what to look for, and then checking that they are doing a good job. Crowdsourcing classifications is something citizen scientists can do, but we will need a large number of dedicated volunteers to tackle the full set of data.

The idea behind Gravity Spy is to combine the two approaches. We start with a small training set from our detector characterization experts, and train a machine-learning algorithm on them. We then ask citizen scientists (thanks Zooniverse) to classify the glitches. We start them off with glitches the machine-learning algorithm is confident in its classification; these should be easy to identify. As citizen scientists get more experienced, they level up and start tackling more difficult glitches. The citizen scientists validate the classifications of the machine-learning algorithm, and provide a larger training set (especially helpful for the rarer glitch classes) for it. We can then happily apply the machine-learning algorithm to classify the full data set [bonus note].

How Gravity Spy works: the interconnection of machine-learning classification and citizen-scientist classification. The similarity search is used to identify glitches similar to one which do not fit into current classes. Figure 2 of Coughlin et al. (2019).

I especially like the levelling-up system in Gravity Spy. I think it helps keep citizen scientists motivated, as it both prevents them from being overwhelmed when they start and helps them see their own progress. I am currently Level 4.

Gravity Spy works using images of the data. We show spectrograms, plots of how loud the output of the detectors are at different frequencies at different times. A gravitational wave form a binary would show a chirp structure, starting at lower frequencies and sweeping up.

Spectrogram showing the upward-sweeping chirp of gravitational wave GW170104 as seen in Gravity Spy. I correctly classified this as a Chirp.

New glitches

The Gravity Spy system works smoothly. However, it is set up to work with a fixed set of glitch classes. We may be missing new glitch classes, either because they are rare, and hadn’t been spotted by our detector characterization team, or because we changed something in our detectors and new class arose (we expect this to happen as we tune up the detectors between observing runs). We can add more classes to our citizen scientists and machine-learning algorithm to use, but how do we spot new classes in the first place?

Our citizen scientists managed to identify a few new glitches by spotting things which didn’t fit into any of the classes. These get put in the None-of-the-Above class. Occasionally, you’ll come across similar looking glitches, and by collecting a few of these together, build a new class. The Paired Dove and Helix classes were identified early on by our citizen scientists this way; my favourite suggested new class is the Falcon [bonus note]. The difficulty is finding a large number of examples of a new class—you might only recognise a common feature after going past a few examples, backtracking to find the previous examples is hard, and you just have to keep working until you are lucky enough to be given more of the same.

Example Helix (left) and Paired Dove glitches. These classes were identified by Gravity Spy citizen scientists. Helix glitches are related to related to hiccups in the auxiliary lasers used to calibrate the detectors by pushing on the mirrors. Paired Dove glitches are related to motion of the beamsplitter in the interferometer. Adapted from Figure 8 of Zevin et al. (2017).

To help our citizen scientists find new glitches, we created a similar search. Having found an interesting glitch, you can search for similar examples, and put quickly put together a collection of your new class. The video below shows how it works. The thing we had to work out was how to define similar?

Transfer learning

Our machine-learning algorithm only knows about the classes we tell it about. It then works out the features we distinguish the different classes, and are common to glitches of the same class. Working in this feature space, glitches form clusters of different classes.

Visualisation showing the clustering of different glitches in the Gravity Spy feature space. Each point is a different glitch from our training set. The feature space has more than three dimensions: this visualisation was made using a technique which preserves the separation and clustering of different and similar points. Figure 1 of Coughlin et al. (2019).

For our similarity search, our idea was to measure distances in feature space [bonus note for experts]. This should work well if our current set of classes have a wide enough set of features to capture to characteristics of the new class; however, it won’t be effective if the new class is completely different, so that its unique features are not recognised. As an analogy, imagine that you had an algorithm which classified M&M’s by colour. It would probably do well if you asked it to distinguish a new colour, but would probably do poorly if you asked it to distinguish peanut butter filled M&M’s as they are identified by flavour, which is not a feature it knows about. The strategy of using what a machine learning algorithm learnt about one problem to tackle a new problem is known as transfer learning, and we found this strategy worked well for our similarity search.

Raven Pecks and Water Jets

To test our similarity search, we applied it to two glitches classes not in the Gravity Spy set:

1. Raven Peck glitches are caused by thirsty ravens pecking ice built up along nitrogen vent lines outside of the Hanford detector. Raven Pecks look like horizontal lines in spectrograms, similar to other Gravity Spy glitch classes (like the Power Line, Low Frequency Line and 1080 Line). The similarity search should therefore do a good job, as we should be able to recognise its important features.
2. Water Jet glitches were caused by local seismic noise at the Hanford detector which  causes loud bands which disturb the input laser optics. These glitches are found between , over which time there are 26,871 total glitches in GRavity Spy. The Water Jet glitch doesn’t have anything to do with water, it is named based on its appearance (like a fountain, not a weasel). Its features are subtle, and unlike other classes, so we would expect this to be difficult for our similarity search to handle.

These glitches appeared in the data from the second observing run. Raven Pecks appeared between 14 April and 9 August 2017, and Water Jets appeared 4 January and 28 May 2017. Over these intervals there are a total of 13,513 and 26,871 Gravity Spy glitches from all type, so even if you knew exactly when to look, you have a large number to search through to find examples.

Example Raven Peck (left) and Water Jet (right) glitches. These classes of glitch are not included in the usual Gravity Spy scheme. Adapted from Figure 3 of Coughlin et al. (2019).

We tested using our machine-learning feature space for the similarity search against simpler approaches: using the raw difference in pixels, and using a principal component analysis to create a feature space. Results are shown in the plots below. These show the fraction of glitches we want returned by the similarity search versus the total number of glitches rejected. Ideally, we would want to reject all the glitches except the ones we want, so the search would return 100% of the wanted classes and reject almost 100% of the total set. However, the actual results will depend on the adopted threshold for the similarity search: if we’re very strict we’ll reject pretty much everything, and only get the most similar glitches of the class we want, if we are too accepting, we get everything back, regardless of class. The plots can be read as increasing the range of the similarity search (becoming less strict) as you go left to right.

Performance of the similarity search for Raven Peck (left) and Water Jet (right) glitches: the fraction of known glitches of the desired class that have a higher similarity score (compared to an example of that glitch class) than a given percentage of full data set. Results are shown for three different ways of defining similarity: the DIRECT machine-learning algorithm feature space (think line), a principal component analysis (medium line) and a comparison of pixels (thin line). Adapted from Figure 3 of Coughlin et al. (2019).

For the Raven Peck, the similarity search always performs well. We have 50% of Raven Pecks returned while rejecting 99% of the total set of glitches, and we can get the full set while rejecting 92% of the total set! The performance is pretty similar between the different ways of defining feature space. Raven Pecks are easy to spot.

Water Jets are more difficult. When we have 50% of Water Jets returned by the search, our machine-learning feature space can still reject almost all glitches. The simpler approaches do much worse, and will only reject about 30% of the full data set. To get the full set of Water Jets we would need to loosen the similarity search so that it only rejects 55% of the full set using our machine-learning feature space; for the simpler approaches we’d basically get the full set of glitches back. They do not do a good job at narrowing down the hunt for glitches. Despite our suspicion that our machine-learning approach would struggle, it still seems to do a decent job [bonus note for experts].

Do try this at home

Having developed and testing our similarity search tool, it is now live. Citizen scientists can use it to hunt down new glitch classes. Several new glitches classes have been identified in data from LIGO and Virgo’s (currently ongoing) third observing run. If you are looking for a new project, why not give it a go yourself? (Or get your students to give it a go, I’ve had some reasonable results with high-schoolers). There is the real possibility that your work could help us with the next big gravitational-wave discovery.

arXiv: arXiv:1903.04058 [astro-ph.IM]
Journal: Physical Review D; 99(8):082002(8); 2019
Websites: Gravity Spy; Gravity Spy Tools
Gravity Spy blog: Introducing Gravity Spy Tools
Current stats: Gravity Spy has 15,500 registered users, who have made 4.4 million glitch classifications, leading to 200,000 successfully identified glitches.

Bonus notes

Signals and glitches

The best example of a gravitational-wave overlapping a glitch is GW170817. The glitch meant that the signal in the LIGO Livingston detector wasn’t immediately recognised. Fortunately, the signal in the Hanford detector was easy to spot. The glitch was analyse and categorised in Gravity Spy. It is a simple glitch, so it wasn’t too difficult to remove from the data. As our detectors become more sensitive, so that detections become more frequent, we expect that signal overlapping with glitches will become a more common occurrence. Unless we can eliminate glitches, it is only a matter of time before we get a glitch that prevents us from analysing an important signal.

In the third observing run of LIGO and Virgo, we send out automated alerts when we have a new gravitational-wave candidate. Astronomers can then pounce into action to see if they can spot anything coinciding with the source. It is important to quickly check the state of the instruments to ensure we don’t have a false alarm. To help with this, a data quality report is automatically prepared, containing many diagnostics. The classification from the Gravity Spy algorithm is one of many pieces of information included. It is the one I check first.

The Falcon

Excellent Gravity Spy moderator EcceruElme suggested a new glitch class Falcon. This suggestion was followed up by Oli Patane, they found that all the examples identified occured between 6:30 am and 8:30 am on 20 June 2017 in the Hanford detector. The instrument was misbehaving at the time. To solve this, the detector was taken out of observing mode and relocked (the equivalent of switching it off and on again). Since this glitch class was only found in this one 2-hour window, we’ve not added it as a class. I love how it was possible to identify this problematic stretch of time using only Gravity Spy images (which don’t identify when they are from). I think this could be the seed of a good detective story. The Hanfordese Falcon?

Examples of the proposed Falcon glitch class, illustrating the key features (and where the name comes from). This new glitch class was suggested by Gravity Spy citizen scientist EcceruElme.

Distance measure

We chose a cosine distance to measure similarity in feature space. We found this worked better than a Euclidean metric. Possibly because for identifying classes it is more important to have the right mix of features, rather than how significant the individual features are. However, we didn’t do a systematic investigation of the optimal means of measuring similarity.

Retraining the neural net

We tested the performance of the machine-learning feature space in the similarity search after modifying properties of our machine-learning algorithm. The algorithm we are using is a deep multiview convolution neural net. We switched the activation function in the fully connected layer of the net, trying tanh and leaukyREU. We also varied the number of training rounds and the number of pairs of similar and dissimilar images that are drawn from the training set each round. We found that there was little variation in results. We found that leakyREU performed a little better than tanh, possibly because it covers a larger dynamic range, and so can allow for cleaner separation of similar and dissimilar features. The number of training rounds and pairs makes negligible difference, possibly because the classes are sufficiently distinct that you don’t need many inputs to identify the basic features to tell them apart. Overall, our results appear robust. The machine-learning approach works well for the similarity search.

GW190425—First discovery from O3

The first gravitational wave detection of LIGO and Virgo’s third observing run (O3) has been announced: GW190425! [bonus note] The signal comes from the inspiral of two objects which have a combined mass of about 3.4 times the mass of our Sun. These masses are in range expected for neutron stars, this makes GW190425 the second observation of gravitational waves from a binary neutron star inspiral (after GW170817). While the individual masses of the two components agree with the masses of neutron stars found in binaries, the overall mass of the binary (times the mass of our Sun) is noticeably larger than any previously known binary neutron star system. GW190425 may be the first evidence for multiple ways of forming binary neutron stars.

The gravitational wave signal

On 25 April 2019 the LIGO–Virgo network observed a signal. This was promptly shared with the world as candidate event S190425z [bonus note]. The initial source classification was as a binary neutron star. This caused a flurry of excitement in the astronomical community [bonus note], as the smashing together of two neutron stars should lead to the emission of light. Unfortunately, the sky localization was HUGE (the initial 90% area wass about a quarter of the sky, and the refined localization provided the next day wasn’t much improvement), and the distance was four times that of GW170817 (meaning that any counterpart would be about 16 times fainter). Covering all this area is almost impossible. No convincing counterpart has been found [bonus note].

Early sky localization for GW190425. Darker areas are more probable. This localization was circulated in GCN 24228 on 26 April and was used to guide follow-up, even though it covers a huge amount of the sky (the 90% area is about 18% of the sky).

The localization for GW19045 was so large because LIGO Hanford (LHO) was offline at the time. Only LIGO Livingston (LLO) and Virgo were online. The Livingston detector was about 2.8 times more sensitive than Virgo, so pretty much all the information came from Livingston. I’m looking forward to when we have a larger network of detectors at comparable sensitivity online (we really need three detectors observing for a good localization).

We typically search for gravitational waves by looking for coincident signals in our detectors. When looking for binaries, we have templates for what the signals look like, so we match these to the data and look for good overlaps. The overlap is quantified by the signal-to-noise ratio. Since our detectors contains all sorts of noise, you’d expect them to randomly match templates from time to time. On average, you’d expect the signal-to-noise ratio to be about 1. The higher the signal-to-noise ratio, the less likely that a random noise fluctuation could account for this.

Our search algorithms don’t just rely on the signal-to-noise ratio. The complication is that there are frequently glitches in our detectors. Glitches can be extremely loud, and so can have a significant overlap with a template, even though they don’t look anything like one. Therefore, our search algorithms also look at the overlap for different parts of the template, to check that these match the expected distribution (for example, there’s not one bit which is really loud, while the others don’t match). Each of our different search algorithms has their own way of doing this, but they are largely based around the ideas from Allen (2005), which is pleasantly readable if you like these sort of things. It’s important to collect lots of data so that we know the expected distribution of signal-to-noise ratio and signal-consistency statistics (sometimes things change in our detectors and new types of noise pop up, which can confuse things).

It is extremely important to check the state of the detectors at the time of an event candidate. In O3, we have unfortunately had to retract various candidate events after we’ve identified that our detectors were in a disturbed state. The signal consistency checks take care of most of the instances, but they are not perfect. Fortunately, it is usually easy to identify that there is a glitch—the difficult question is whether there is a glitch on top of a signal (as was the case for GW170817).  Our checks revealed nothing up with the detectors which could explain the signal (there was a small glitch in Livingston about 60 seconds before the merger time, but this doesn’t overlap with the signal).

Now, the search that identified GW190425 was actually just looking for single-detector events: outliers in the distribution of signal-to-noise ratio and signal-consistency as expected for signals. This was a Good Thing™. While the signal-to-noise ratio in Livingston was 12.9 (pretty darn good), the signal-to-noise ration in Virgo was only 2.5 (pretty meh) [bonus note]. This is below the threshold (signal-to-noise ratio of 4) the search algorithms use to look for coincidences (a threshold is there to cut computational expense: the lower the threshold, the more triggers need to be checked) [bonus note]. The Bad Thing™ about GW190425 being found by the single-detector search, and being missed by the usual multiple detector search, is that it is much harder to estimate the false-alarm rate—it’s much harder to rule out the possibility of some unusual noise when you don’t have another detector to cross-reference against. We don’t have a final estimate for the significance yet. The initial estimate was 1 in 69,000 years (which relies on significant extrapolation). What we can be certain of is that this event is a noticeable outlier: across the whole of O1, O2 and the first 50 days of O3, it comes second only to GW170817. In short, we can say that GW190425 is worth betting on, but I’m not sure (yet) how heavily you want to bet.

Detection statistics for GW190425 showing how it stands out from the background. The left plot shows the signal-to-noise ratio (SNR) and signal-consistency statistic from the GstLAL algorithm, which made the detection. The coloured density plot shows the distribution of background triggers. Right shows the detection statistic from PyCBC, which combines the SNR and their signal-consistency statistic. The lines show the background distributions. GW190425 is more significant than everything apart from GW170817. Adapted from Figures 1 and 6 of the GW190425 Discovery Paper.

I’m always cautious of single-detector candidates. If you find a high-mass binary black hole (which would be an extremely short template), or something with extremely high spins (indicating that the templates don’t match unless you push to the bounds of what is physical), I would be suspicious. Here, we do have consistent Virgo data, which is good for backing up what is observed in Livingston. It may be a single-detector detection, but it is a multiple-detector observation. To further reassure ourselves about GW190425, we ran our full set of detection algorithms on the Livingston data to check that they all find similar signals, with reasonable signal-consistency test values. Indeed, they do! The best explanation for the data seems to be a gravitational wave.

The source

Given that we have a gravitational wave, where did it come from? The best-measured property of a binary inspiral is its chirp mass—a particular combination of the two component masses. For GW190425, this is $1.44^{+0.02}_{-0.02}$ solar masses (quoting the 90% range for parameters). This is larger than GW170817’s $1.186^{+0.001}_{-0.001}$ solar masses: we have a heavier binary.

Estimated masses for the two components in the binary. We show results for two different spin limits. The two-dimensional shows the 90% probability contour, which follows a line of constant chirp mass. The one-dimensional plot shows individual masses; the dotted lines mark 90% bounds away from equal mass. The masses are in the range expected for neutron stars. Figure 3 of the GW190425 Discovery Paper.

Figuring out the component masses is trickier. There is a degeneracy between the spins and the mass ratio—by increasing the spins of the components it is possible to get more extreme mass ratios to fit the signal. As we did for GW170817, we quote results with two ranges of spins. The low-spin results use a maximum spin of 0.05, which matches the range of spins we see for binary neutron stars in our Galaxy, while the high-spin results use a limit of 0.89, which safely encompasses the upper limit for neutron stars (if they spin faster than about 0.7 they’ll tear themselves apart). We find that the heavier component of the binary has a mass of $1.62$$1.88$ solar masses with the low-spin assumption, and $1.61$$2.52$ solar masses with the high-spin assumption; the lighter component has a mass $1.45$$1.69$ solar masses with the low-spin assumption, and $1.12$$1.68$ solar masses with the high-spin. These are the range of masses expected for neutron stars.

Without an electromagnetic counterpart, we cannot be certain that we have two neutron stars. We could tell from the gravitational wave by measuring the imprint in the signal left by the tidal distortion of the neutron star. Black holes have a tidal deformability of 0, so measuring a nonzero tidal deformability would be the smoking gun that we have a neutron star. Unfortunately, the signal isn’t loud enough to find any evidence of these effects. This isn’t surprising—we couldn’t say anything for GW170817, without assuming its source was a binary neutron star, and GW170817 was louder and had a lower mass source (where tidal effects are easier to measure). We did check—it’s probably not the case that the components were made of marshmallow, but there’s not much more we can say (although we can still make pretty simulations). It would be really odd to have black holes this small, but we can’t rule out than at least one of the components was a black hole.

Two binary neutron stars is the most likely explanation for GW190425. How does it compare to other binary neutron stars? Looking at the 17 known binary neutron stars in our Galaxy, we see that GW190425’s source is much heavier. This is intriguing—could there be a different, previously unknown formation mechanism for this binary? Perhaps the survey of Galactic binary neutron stars (thanks to radio observations) is incomplete? Maybe the more massive binaries form in close binaries, which are had to spot in the radio (as the neutron star moves so quickly, the radio signals gets smeared out), or maybe such heavy binaries only form from stars with low metallicity (few elements heavier than hydrogen and helium) from earlier in the Universe’s history, so that they are no longer emitting in the radio today? I think it’s too early to tell—but it’s still fun to speculate. I expect there’ll be a flurry of explanations out soon.

Comparison of the total binary mass of the 10 known binary neutron stars in our Galaxy that will merge within a Hubble time and GW190425’s source (with both the high-spin and low-spin assumptions). We also show a Gaussian fit to the Galactic binaries. GW190425’s source is higher mass than previously known binary neutron stars. Figure 5 of the GW190425 Discovery Paper.

Since the source seems to be an outlier in terms of mass compared to the Galactic population, I’m a little cautious about using the low-spin results—if this sample doesn’t reflect the full range of masses, perhaps it doesn’t reflect the full range of spins too? I think it’s good to keep an open mind. The fastest spinning neutron star we know of has a spin of around 0.4, maybe binary neutron star components can spin this fast in binaries too?

One thing we can measure is the distance to the source: $160^{+70}_{-70}~\mathrm{Mpc}$. That means the signal was travelling across the Universe for about half a billion years. This is as many times bigger than diameter of Earth’s orbit about the Sun, as the diameter of the orbit is than the height of a LEGO brick. Space is big.

We have now observed two gravitational wave signals from binary neutron stars. What does the new observation mean for the merger rate of binary neutron stars? To go from an observed number of signals to how many binaries are out there in the Universe we need to know how sensitive our detectors are to the sources. This depends on  the masses of the sources, since more massive binaries produce louder signals. We’re not sure of the mass distribution for binary neutron stars yet. If we assume a uniform mass distribution for neutron stars between 0.8 and 2.3 solar masses, then at the end of O2 we estimated a merger rate of $110$$2520~\mathrm{Gpc^{-3}\,\mathrm{yr}^{-3}}$. Now, adding in the first 50 days of O3, we estimate the rate to be $250$$2470~\mathrm{Gpc^{-3}\,\mathrm{yr}^{-3}}$, so roughly the same (which is nice) [bonus note].

Since GW190425’s source looks rather different from other neutron stars, you might be interested in breaking up the merger rates to look at different classes. Using measured masses, we can construct rates for GW170817-like (matching the usual binary neutron star population) and GW190425-like binaries (we did something similar for binary black holes after our first detection). The GW170817-like rate is $110$$2500~\mathrm{Gpc^{-3}\,\mathrm{yr}^{-3}}$, and the GW190425-like rate is lower at $70$$4600~\mathrm{Gpc^{-3}\,\mathrm{yr}^{-3}}$. Combining the two (Assuming that binary neutron stars are all one class or the other), gives an overall rate of $290$$2810~\mathrm{Gpc^{-3}\,\mathrm{yr}^{-3}}$, which is not too different than assuming the uniform distribution of masses.

Given these rates, we might expect some more nice binary neutron star signals in the O3 data. There is a lot of science to come.

Future mysteries

GW190425 hints that there might be a greater variety of binary neutron stars out there than previously thought. As we collect more detections, we can start to reconstruct the mass distribution. Using this, together with the merger rate, we can start to pin down the details of how these binaries form.

As we find more signals, we should also find a few which are loud enough to measure tidal effects. With these, we can start to figure out the properties of the Stuff™ which makes up neutron stars, and potentially figure out if there are small black holes in this mass range. Discovering smaller black holes would be extremely exciting—these wouldn’t be formed from collapsing stars, but potentially could be remnants left over from the early Universe.

Probability distributions for neutron star masses and radii (blue for the more massive neutron star, orange for the lighter), assuming that GW190425’s source is a binary neutron star. The left plots use the high-spin assumption, the right plots use the low-spin assumptions. The top plots use equation-of-state insensitive relations, and the bottom use parametrised equation-of-state models incorporating the requirement that neutron stars can be 1.97 solar masses. Similar analyses were done in the GW170817 Equation-of-state Paper. In the one-dimensional plots, the dashed lines indicate the priors. Figure 16 of the GW190425 Discovery Paper.

With more detections (especially when we have more detectors online), we should also be lucky enough to have a few which are well localised. These are the events when we are most likely to find an electromagnetic counterpart. As our gravitational-wave detectors become more sensitive, we can detect sources further out. These are much harder to find counterparts for, so we mustn’t expect every detection to have a counterpart. However, for nearby sources, we will be able to localise them better, and so increase our odds of finding a counterpart. From such multimessenger observations we can learn a lot. I’m especially interested to see how typical GW170817 really was.

O3 might see gravitational wave detection becoming routine, but that doesn’t mean gravitational wave astronomy is any less exciting!

Title: GW190425: Observation of a compact binary coalescence with total mass ~ 3.4 solar masses
arXiv: arXiv:2001.01761 [astro-ph.HE] [bonus note]
Science summary: GW190425: The heaviest binary neutron star system ever seen?
Data release: Gravitational Wave Open Science Center; Parameter estimation results
Rating: 🥇😮🥂🥇

Bonus notes

Exceptional events

The plan for publishing papers in O3 is that we would write a paper for any particularly exciting detections (such as a binary neutron star), and then put out a catalogue of all our results later. The initial discovery papers wouldn’t be the full picture, just the key details so that the entire community could get working on them. Our initial timeline was to get the individual papers out in four months—that’s not going so well, it turns out that the most interesting events have lots of interesting properties, which take some time to understand. Who’d have guessed?

We’re still working on getting papers out as soon as possible. We’ll be including full analyses, including results which we can’t do on these shorter timescales in our catalogue papers. The catalogue paper for the first half of O3 (O3a) is currently pencilled in for April 2020.

Naming conventions

The name of a gravitational wave signal is set by the date it is observed. GW190425 is hence the gravitational wave (GW) observed on 2019 April 25th. Our candidates alerts don’t start out with the GW prefix, as we still need to do lots of work to check if they are real. Their names start with S for superevent (not for hope) [bonus bonus note], then the date, and then a letter indicating the order it was uploaded to our database of candidates (we upload candidates with false alarm rates of around one per hour, so there are multiple database entries per day, and most are false alarms). S190425z was the 26th superevent uploaded on 2019 April 25th.

What is a superevent? We call anything flagged by our detection pipelines an event. We have multiple detection pipelines, and often multiple pipelines produce events for the same stretch of data (you’d expect this to happen for real signals). It was rather confusing having multiple events for the same signal (especially when trying to quickly check a candidate to issue an alert), so in O3 we group together events from similar times into SUPERevents.

GRB 190425?

Pozanenko et al. (2019) suggest a gamma-ray burst observed by INTEGRAL (first reported in GCN 24170). The INTEGRAL team themselves don’t find anything in their data, and seem sceptical of the significance of the detection claim. The significance of the claim seems to be based on there being two peaks in the data (one about 0.5 seconds after the merger, one 5.9 seconds after the merger), but I’m not convinced why this should be the case. Nothing was observed by Fermi, which is possibly because the source was obscured by the Earth for them. I’m interested in seeing more study of this possible gamma-ray burst.

EMMA 2019

At the time of GW190425, I was attending the first day of the Enabling Multi-Messenger Astrophysics in the Big Data Era Workshop. This was a meeting bringing together many involved in the search for counterparts to gravitational wave events. The alert for S190425z cause some excitement. I don’t think there was much sleep that week.

Signal-to-noise ratio ratios

The signal-to-noise ratio reported from our search algorithm for LIGO Livingston is 12.9, and the same code gives 2.5 for Virgo. Virgo was about 2.8 times less sensitive that Livingston at the time, so you might be wondering why we have a signal-to-noise ratio of 2.8, instead of 4.6? The reason is that our detectors are not equally sensitive in all directions. They are most sensitive directly to sources directly above and below, and less sensitive to sources from the sides. The relative signal-to-noise ratios, together with the time or arrival at the different detectors, helps us to figure out the directions the signal comes from.

Detection thresholds

In O2, GW170818 was only detected by GstLAL because its signal-to-noise ratios in Hanford and Virgo (4.1 and 4.2 respectively) were below the threshold used by PyCBC for their analysis (in O2 it was 5.5). Subsequently, PyCBC has been rerun on the O2 data to produce the second Open Gravitational-wave Catalog (2-OGC). This is an analysis performed by PyCBC experts both inside and outside the LIGO Scientific & Virgo Collaboration. For this, a threshold of 4 was used, and consequently they found GW170818, which is nice.

I expect that if the threshold for our usual multiple-detector detection pipelines were lowered to ~2, they would find GW190425. Doing so would make the analysis much trickier, so I’m not sure if anyone will ever attempt this. Let’s see. Perhaps the 3-OGC team will be feeling ambitious?

Rates calculations

In comparing rates calculated for this papers and those from our end-of-O2 paper, my student Chase Kimball (who calculated the new numbers) would like me to remember that it’s not exactly an apples-to-apples comparison. The older numbers evaluated our sensitivity to gravitational waves by doing a large number of injections: we simulated signals in our data and saw what fraction of search algorithms could pick out. The newer numbers used an approximation (using a simple signal-to-noise ratio threshold) to estimate our sensitivity. Performing injections is computationally expensive, so we’re saving that for our end-of-run papers. Given that we currently have only two detections, the uncertainty on the rates is large, and so we don’t need to worry too much about the details of calculating the sensitivity. We did calibrate our approximation to past injection results, so I think it’s really an apples-to-pears-carved-into-the-shape-of-apples comparison.

Paper release

The original plan for GW190425 was to have the paper published before the announcement, as we did with our early detections. The timeline neatly aligned with the AAS meeting, so that seemed like an good place to make the announcement. We managed to get the the paper submitted, and referee reports back, but we didn’t quite get everything done in time for the AAS announcement, so Plan B was to have the paper appear on the arXiv just after the announcement. Unfortunately, there was a problem uploading files to the arXiv (too large), and by the time that was fixed the posting deadline had passed. Therefore, we went with Plan C or sharing the paper on the LIGO DCC. Next time you’re struggling to upload something online, remember that it happens to Nobel-Prize winning scientific collaborations too.

On the question of when it is best to share a paper, I’m still not decided. I like the idea of being peer-reviewed before making a big splash in the media. I think it is important to show that science works by having lots of people study a topic, before coming to a consensus. Evidence needs to be evaluated by independent experts. On the other hand, engaging the entire community can lead to greater insights than a couple of journal reviewers, and posting to arXiv gives opportunity to make adjustments before you having the finished article.

I think I am leaning towards early posting in general—the amount of internal review that our Collaboration papers receive, satisfies my requirements that scientists are seen to be careful, and I like getting a wider range of comments—I think this leads to having the best paper in the end.

S

The joke that S stands for super, not hope is recycled from an article I wrote for the LIGO Magazine. The editor, Hannah Middleton wasn’t sure that many people would get the reference, but graciously printed it anyway. Did people get it, or do I need to fly around the world really fast?

Deep and rapid observations of strong-lensing galaxy clusters within the sky localisation of GW170814

Gravitational waves and gravitational lensing are two predictions of general relativity. Gravitational waves are produced whenever masses accelerate. Gravitational lensing is produced by anything with mass. Gravitational lensing can magnify images, making it easier to spot far away things. In theory, gravitational waves can be lensed too. In this paper, we looked for evidence that GW170814 might have been lensed. (We didn’t find any, but this was my first foray into traditional astronomy).

The lensing of gravitational waves

Strong gravitational lensing magnifies a signal. A gravitational wave which has been lensed would therefore have a larger amplitude than if it had not been lensed. We infer the distance to the source of a gravitational wave from the amplitude. If we didn’t know a signal was lensed, we’d therefore think the source is much closer than it really is.

The shape of the gravitational wave encodes the properties of the source. This information is what lets us infer parameters. The example signal is GW150914 (which is fairly similar to GW170814). I made this explainer with Ban Farr and Nutsinee Kijbunchoo for the LIGO Magazine.

Mismeasuring the distance to a gravitational wave has important consequences for understanding their sources. As the gravitational wave travels across the expanding Universe, it gets stretched (redshifted) so by the time it arrives at our detectors it has a longer wavelength (and shorter frequency). If we assume that a signal came from a closer source, we’ll underestimate the amount of stretching the signal has undergone, and won’t fully correct for it. This means we’ll overestimate the masses when we infer them from the signal.

This possibility got a few people thinking when we announced our first detection, as GW150914 was heavier than previously observed black holes. Could we be seeing lensed gravitational waves?

Such strongly lensed gravitational waves should be multiply imaged. We should be able to see multiple copies of the same signal which have taken different paths from the source and then are bent by the gravity of the lens to reach us at different times. The delay time between images depends on the mass of the lens, with bigger lensing having longer delays. For galaxy clusters, it can be years.

The idea

Some of my former Birmingham colleagues who study gravitational lensing, were thinking about the possibility of having multiply imaged gravitational waves. I pointed out how difficult these would be to identify. They would come from the same part of the sky, and would have the same source parameters. However, since our uncertainties are so large for gravitational wave observations, I thought it would be tough to convince yourself that you’d seen the same signal twice [bonus note]. Lensing is expected to be rare [bonus note], so would you put your money on two signals (possibly years apart) being the same, or there just happening to be two similar systems somewhere in this huge patch of the sky?

However, if there were an optical counterpart to the merger, it would be much easier to tell that it was lensed. Since we know the location of galaxy clusters which could strongly lens a signal, we can target searches looking for counterparts at these clusters. The odds of finding anything are slim, but since this doesn’t take too much telescope time to look it’s still a gamble worth taking, as the potential pay-off would be huge.

Somehow [bonus note], I got involved in observing proposals to look for strongly lensed. We got everything in place for the last month of O2. It was just one month, so I wasn’t anticipating there being that much to do. I was very wrong.

GW170814

For GW170814 there were a couple of galaxy clusters which could serve as being strong gravitational lenses. Abell 3084 started off as the more probably, but as the sky localization for GW170814 was refined, SMACS J0304.3−4401 looked like the better bet.

Sky localization for GW170814 and the galaxy clusters Abell 3084 (filled circle), and SMACS J0304.3−4401 (open). The left plot shows the low-latency Bayestar localization (LIGO only dotted, LIGO and Virgo solid), and the right shows the refined LALInference sky maps (solid from GCN 21493, which we used for our observations, and dotted from GWTC-1). The dashed lines shows the Galactic plane. Figure 1 of Smith et al. (2019).

We observed both galaxy clusters using the Gemini Multi-Object Spectrographs (GMOS) on Gemini South and the Multi Unit Spectroscopic Explorer (MUSE) on the Very Large Telescope, both in Chile. You’ll never guess what we found…

That’s right, absolutely nothing! [bonus note] That’s not actually too surprising. GW170814‘s source was identified as a binary black hole—assuming no lensing, its source binary had masses around 25 and 30 solar masses. We don’t expect significant electromagnetic emission from a binary black hole merger (which would make it a big discovery if found, but that is a long shot). If there source were lensed, we would have overestimated the source masses, but to get the source into the neutron star mass range would take a ridiculous amount of lensing. However, the important point is that we have demonstrated that such a search for strong lensed images is possible!

The future

In O3 [bonus notebonus note], the team has been targeting lower mass systems, where a neutron star may get mislabelled as a black hole by mistake due to a moderate amount of lensing. A false identification here  could confuse our understanding of the minimum mass of a black hole, and also mean that we miss all sorts of lovely multimessenger observations, so this seems like a good plan to me.

arXiv: 1805.07370 [astro-ph.HE]
Journal: Monthly Notices of the Royal Astronomical Society; 485(4):5180–5191; 2019
Conference proceedings: 1803.07851 [astro-ph.HE] (from when work was still in-progress)
Future research: Are Double Stuf Oreos just gravitationally lensed regular Oreos?

Bonus notes

Statistical analysis

It is possible to do a statistical analysis to calculate the probability of two signals being lensed images of each. The best attempt I’ve seen at this is Hannuksela et al. (2019). They do a nice study considering lensing by galaxies (and find nothing conclusive).

Biasing merger rates

If we included lensed events in our calculations of the merger rate density (the rate of mergers per unit volume of space), without correcting for them being lensed, we would overestimate the merger rate density. We’d assume that all our mergers came from a smaller volume of space than they actually did, as we wouldn’t know that the lensed events are being seen from further away. As long as the fraction of lensed events is small, this shouldn’t be a big problem, so we’re probably safe not to worry about it.

Slippery slope

What actually happened was my then boss, Alberto Vecchio, asked me to do some calculations based upon the sky maps for our detections in O1 as they’d only take me 5 minutes. Obviously, there were then more calculations, advice about gravitational wave alerts, feedback on observing proposals… and eventually I thought that if I’d put in this much time I might as well get a paper to show for it.

It was interesting to see how electromagnetic observing works, but I’m not sure I’d do it again.

Upper limits

Following tradition, when we don’t make a detection, we can set an upper limit on what could be there. In this case, we conclude that there is nothing to see down to an i-band magnitude of 25. This is pretty faint, about 40 million times fainter than something you could see with the naked eye (translating to visibly light). We can set such a good upper limit (compared to other follow-up efforts) as we only needed to point the telescopes at a small patch of sky around the galaxy clusters, and so we could leave them staring for a relatively long time.

O3 lensing hype

In O3, two gravitational wave candidates (S190828j and S190828l) were found just 21 minutes apart—this, for reasons I don’t entirely understand, led to much speculation that they were multiple images of a gravitationally lensed source. For a comprehensive debunking, follow this Twitter thread.

Second star to the right and straight on ’til morning—Astrophysics white papers

What will be the next big thing in astronomy? One of the hard things about research is that you often don’t know what you will discover before you embark on an investigation. An idea might work out, or it might not, or along the way you might discover something unexpected which is far more interesting. As you might imagine, this can make laying definite plans difficult…

However, it is important to have plans for research. While you might not be sure of the outcome, it is necessary to weigh the risks and rewards associated with the probable results before you invest your time and taxpayers’ money!

To help with planning and prioritising, researchers in astrophysics often pull together white papers [bonus note]. These are sketches of ideas for future research, arguing why you think they might be interesting. These can then be discussed within the community to help shape the direction of the field. If other scientists find the paper convincing, you can build support which helps push for funding. If there are gaps in the logic, others can point these out to ave you heading the wrong way. This type of consensus building is especially important for large experiments or missions—you don’t want to spend a billion dollars on something unless you’re really sure it is a good idea and lots of people agree.

I have been involved with a few white papers recently. Here are some key ideas for where research should go.

Ground-based gravitational-wave detectors: The next generation

We’ve done some awesome things with Advanced LIGO and Advanced Virgo. In just a couple of years we have revolutionized our understanding of binary black holes. That’s not bad. However, our current gravitational-wave observatories are limited in what they can detect. What amazing things could we achieve with a new generation of detectors?

It can take decades to develop new instruments, therefore it’s important to start thinking about them early. Obviously, what we would most like is an observatory which can detect everything, but that’s not feasible. In this white paper, we pick the questions we most want answered, and see what the requirements for a new detector would be. A design which satisfies these specifications would therefore be a solid choice for future investment.

Binary black holes are the perfect source for ground-based detectors. What do we most want to know about them?

1. How many mergers are there, and how does the merger rate change over the history of the Universe? We want to know how binary black holes are made. The merger rate encodes lots of information about how to make binaries, and comparing how this evolves compared with the rate at which the Universe forms stars, will give us a deeper understanding of how black holes are made.
2. What are the properties (masses and spins) of black holes? The merger rate tells us some things about how black holes form, but other properties like the masses, spins and orbital eccentricity complete the picture. We want to make precise measurements for individual systems, and also understand the population.
3. Where do supermassive black holes come from? We know that stars can collapse to produce stellar-mass black holes. We also know that the centres of galaxies contain massive black holes. Where do these massive black holes come from? Do they grow from our smaller black holes, or do they form in a different way? Looking for intermediate-mass black holes in the gap in-between will tells us whether there is a missing link in the evolution of black holes.

The detection horizon (the distance to which sources can be detected) for Advanced LIGO (aLIGO), its upgrade A+, and the proposed Cosmic Explorer (CE) and Einstein Telescope (ET). The horizon is plotted for binaries with equal-mass, nonspinning components. Adapted from Hall & Evans (2019).

What can we do to answer these questions?

1. Increase sensitivity! Advanced LIGO and Advanced Virgo can detect a $30 M_\odot + 30 M_\odot$ binary out to a redshift of about $z \approx 1$. The planned detector upgrade A+ will see them out to redshift $z \approx 2$. That’s pretty impressive, it means we’re covering 10 billion years of history. However, the peak in the Universe’s star formation happens at around $z \approx 2$, so we’d really like to see beyond this in order to measure how the merger rate evolves. Ideally we would see all the way back to cosmic dawn at $z \approx 20$ when the Universe was only 200 million years old and the first stars light up.
2. Increase our frequency range! Our current detectors are limited in the range of frequencies they can detect. Pushing to lower frequencies helps us to detect heavier systems. If we want to detect intermediate-mass black holes of $100 M_\odot$ we need this low frequency sensitivity. At the moment, Advanced LIGO could get down to about $10~\mathrm{Hz}$. The plot below shows the signal from a $100 M_\odot + 100 M_\odot$ binary at $z = 10$. The signal is completely undetectable at $10~\mathrm{Hz}$.

The gravitational wave signal from the final stages of inspiral, merger and ringdown of a two 100 solar mass black holes at a redshift of 10. The signal chirps up in frequency. The colour coding shows parts of the signal above different frequencies. Part of Figure 2 of the Binary Black Holes White Paper.

3. Increase sensitivity and frequency range! Increasing sensitivity means that we will have higher signal-to-noise ratio detections. For these loudest sources, we will be able to make more precise measurements of the source properties. We will also have more detections overall, as we can survey a larger volume of the Universe. Increasing the frequency range means we can observe a longer stretch of the signal (for the systems we currently see). This means it is easier to measure spin precession and orbital eccentricity. We also get to measure a wider range of masses. Putting the improved sensitivity and frequency range together means that we’ll get better measurements of individual systems and a more complete picture of the population.

How much do we need to improve our observatories to achieve our goals? To quantify this, lets consider the boost in sensitivity relative to A+, which I’ll call $\beta_\mathrm{A+}$. If the questions can be answered with $\beta_\mathrm{A+} = 1$, then we don’t need anything beyond the currently planned A+. If we need a slightly larger $\beta_\mathrm{A+}$, we should start investigating extra ways to improve the A+ design. If we need much larger $\beta_\mathrm{A+}$, we need to think about new facilities.

The plot below shows the boost necessary to detect a binary (with equal-mass nonspinning components) out to a given redshift. With a boost of $\beta_\mathrm{A+} = 10$ (blue line) we can survey black holes around $10 M_\odot$$30 M_\odot$ across cosmic time.

The boost factor (relative to A+) $\beta_\mathrm{A+}$ needed to detect a binary with a total mass $M$ out to redshift $z$. The binaries are assumed to have equal-mass, nonspinning components. The colour scale saturates at $\log_{10} \beta_\mathrm{A+} = 4.5$. The blue curve highlights the reach at a boost factor of $\beta_\mathrm{A+} = 10$. The solid and dashed white lines indicate the maximum reach of Cosmic Explorer and the Einstein Telescope, respectively. Part of Figure 1 of the Binary Black Holes White Paper.

The plot above shows that to see intermediate-mass black holes, we do need to completely overhaul the low-frequency sensitivity. What do we need to detect a $100 M_\odot + 100 M_\odot$ binary at $z = 10$? If we parameterize the noise spectrum (power spectral density) of our detector as $S_n(f) = S_{10}(f/10~\mathrm{Hz})^\alpha$ with a lower cut-off frequency of $f_\mathrm{min}$, we can investigate the various possibilities. The plot below shows the possible combinations of parameters which meet of requirements.

Requirements on the low-frequency noise power spectrum necessary to detect an optimally oriented intermediate-mass binary black hole system with two 100 solar mass components at a redshift of 10. Part of Figure 2 of the Binary Black Holes White Paper.

To build up information about the population of black holes, we need lots of detections. Uncertainties scale inversely with the square root of the number of detections, so you would expect few percent uncertainty after 1000 detections. If we want to see how the population evolves, we need these many per redshift bin! The plot below shows the number of detections per year of observing time for different boost factors. The rate starts to saturate once we detect all the binaries in the redshift range. This is as good as you’ll ever going to get.

Expected rate of binary black hole detections $R_\mathrm{det}$ per redshift bin as a function of A+ boost factor $\beta_\mathrm{A+}$ for three redshift bins. The merging binaries are assumed to be uniformly distributed with a constant merger rate roughly consistent with current observations: the solid line is about the current median, while the dashed and dotted lines are roughly the 90% bounds. Figure 3 of the Binary Black Holes White Paper.

Looking at the plots above, it is clear that A+ is not going to satisfy our requirements. We need something with a boost factor of $\beta_\mathrm{A+} = 10$: a next-generation observatory. Both the Cosmic Explorer and Einstein Telescope designs do satisfy our goals.

Title: Deeper, wider, sharper: Next-generation ground-based gravitational-wave observations of binary black holes
arXiv:
1903.09220 [astro-ph.HE]
Theme music: Daft Punk

Extreme mass ratio inspirals are awesome

We have seen gravitational waves from a stellar-mass black hole merging with another stellar-mass black hole, can we observe a stellar-mass black hole merging with a massive black hole? Yes, these are a perfect source for a space-based gravitational wave observatory. We call these systems extreme mass-ratio inspirals (or EMRIs, pronounced em-rees, for short) [bonus note].

Having such an extreme mass ratio, with one black hole much bigger than the other, gives EMRIs interesting properties. The number of orbits over the course of an inspiral scales with the mass ratio: the more extreme the mass ratio, the more orbits there are. Each of these gives us something to measure in the gravitational wave signal.

A short section of an orbit around a spinning black hole. While inspirals last for years, this would represent only a few hours around a black hole of mass $M = 10^6 M_\odot$. The position is measured in terms of the gravitational radius $r_\mathrm{g} = GM/c^2$. The innermost stable orbit for this black hole would be about $r_\mathrm{g} = 2.3$. Part of Figure 1 of the EMRI White Paper.

As EMRIs are so intricate, we can make exquisit measurements of the source properties. These will enable us to:

• Measure the masses of both black holes to better than 10% precision
• Reconstruct the mass distribution of massive black holes out to redshift $z \gtrsim 4$
• Measure massive black hole spins to a precision of better than 0.001, giving us an insight into how they formed
• Perform precision tests of the no-hair theorem describing black holes in general relativity, and test alternative theories of gravity in the strong-field regime
• Cross-correlate locations with galaxy catalogues to measure the expansion of the Universe

Event rates for EMRIs are currently uncertain: there could be just one per year or thousands. From the rate we can figure out the details of what is going in in the nuclei of galaxies, and what types of objects you find there.

With EMRIs you can unravel mysteries in astrophysics, fundamental physics and cosmology.

Have we sold you that EMRIs are awesome? Well then, what do we need to do to observe them? There is only one currently planned mission which can enable us to study EMRIs: LISA. To maximise the science from EMRIs, we have to support LISA.

As an aspiring scientist, Lisa Simpson is a strong supporter of the LISA mission. Credit: Fox

Title: The unique potential of extreme mass-ratio inspirals for gravitational-wave astronomy
arXiv:
1903.03686 [astro-ph.HE]
Theme music: Muse

Bonus notes

White paper vs journal article

Since white papers are proposals for future research, they aren’t as rigorous as usual academic papers. They are really attempts to figure out a good question to ask, rather than being answers. White papers are not usually peer reviewed before publication—the point is that you want everybody to comment on them, rather than just one or two anonymous referees.

Whilst white papers aren’t quite the same class as journal articles, they do still contain some interesting ideas, so I thought they still merit a blog post.

Recycling

I have blogged about EMRIs before, so I won’t go into too much detail here. It was one of my former blog posts which inspired the LISA Science Team to get in touch to ask me to write the white paper.

The O2 Catalogue—It goes up to 11

The full results of our second advanced-detector observing run (O2) have now been released—we’re pleased to announce four new gravitational wave signals: GW170729, GW170809, GW170818 and GW170823 [bonus note]. These latest observations are all of binary black hole systems. Together, they bring our total to 10 observations of binary black holes, and 1 of a binary neutron star. With more frequent detections on the horizon with our third observing run due to start early 2019, the era of gravitational wave astronomy is truly here.

The population of black holes and neutron stars observed with gravitational waves and with electromagnetic astronomy. You can play with an interactive version of this plot online.

The new detections are largely consistent with our previous findings. GW170809, GW170818 and GW170823 are all similar to our first detection GW150914. Their black holes have masses around 20 to 40 times the mass of our Sun. I would lump GW170104 and GW170814 into this class too. Although there were models that predicted black holes of these masses, we weren’t sure they existed until our gravitational wave observations. The family of black holes continues out of this range. GW151012, GW151226 and GW170608 fall on the lower mass side. These overlap with the population of black holes previously observed in X-ray binaries. Lower mass systems can’t be detected as far away, so we find fewer of these. On the higher end we have GW170729 [bonus note]. Its source is made up of black holes with masses $50.2^{+16.2}_{-10.2} M_\odot$ and $34.0^{+9.1}_{-10.1} M_\odot$ (where $M_\odot$ is the mass of our Sun). The larger black hole is a contender for the most massive black hole we’ve found in a binary (the other probable contender is GW170823’s source, which has a $39.5^{+11.2}_{-6.7} M_\odot$ black hole). We have a big happy family of black holes!

Of the new detections, GW170729, GW170809 and GW170818 were both observed by the Virgo detector as well as the two LIGO detectors. Virgo joined O2 for an exciting August [bonus note], and we decided that the data at the time of GW170729 were good enough to use too. Unfortunately, Virgo wasn’t observing at the time of GW170823. GW170729 and GW170809 are very quiet in Virgo, you can’t confidently say there is a signal there [bonus note]. However, GW170818 is a clear detection like GW170814. Well done Virgo!

Using the collection of results, we can start understand the physics of these binary systems. We will be summarising our findings in a series of papers. A huge amount of work went into these.

The papers

The O2 Catalogue Paper

Title: GWTC-1: A gravitational-wave transient catalog of compact binary mergers observed by LIGO and Virgo during the first and second observing runs
arXiv:
1811.12907 [astro-ph.HE]
Data: Catalogue; Parameter estimation results
Journal: Physical Review X; 9(3):031040(49); 2019
LIGO science summary: GWTC-1: A new catalog of gravitational-wave detections

The paper summarises all our observations of binaries to date. It covers our first and second observing runs (O1 and O2). This is the paper to start with if you want any information. It contains estimates of parameters for all our sources, including updates for previous events. It also contains merger rate estimates for binary neutron stars and binary black holes, and an upper limit for neutron star–black hole binaries. We’re still missing a neutron star–black hole detection to complete the set.

More details: The O2 Catalogue Paper

The O2 Populations Paper

Title: Binary black hole population properties inferred from the first and second observing runs of Advanced LIGO and Advanced Virgo
arXiv:
1811.12940 [astro-ph.HE]
Journal: Astrophysical Journal Letters; 882(2):L24(30); 2019
Data: Population inference results
LIGO science summary: Binary black hole properties inferred from O1 and O2

Using our set of ten binary black holes, we can start to make some statistical statements about the population: the distribution of masses, the distribution of spins, the distribution of mergers over cosmic time. With only ten observations, we still have a lot of uncertainty, and can’t make too many definite statements. However, if you were wondering why we don’t see any more black holes more massive than GW170729, even though we can see these out to significant distances, so are we. We infer that almost all stellar-mass black holes have masses less than $45 M_\odot$.

More details: The O2 Populations Paper

The O2 Catalogue Paper

Synopsis: O2 Catalogue Paper
Read this if: You want the most up-to-date gravitational results
Favourite part: It’s out! We can tell everyone about our FOUR new detections

This is a BIG paper. It covers our first two observing runs and our main searches for coalescing stellar mass binaries. There will be separate papers going into more detail on searches for other gravitational wave signals.

The instruments

Gravitational wave detectors are complicated machines. You don’t just take them out of the box and press go. We’ll be slowly improving the sensitivity of our detectors as we commission them over the next few years. O2 marks the best sensitivity achieved to date. The paper gives a brief overview of the detector configurations in O2 for both LIGO detectors, which did differ, and Virgo.

During O2, we realised that one source of noise was beam jitter, disturbances in the shape of the laser beam. This was particularly notable in Hanford, where there was a spot on the one of the optics. Fortunately, we are able to measure the effects of this, and hence subtract out this noise. This has now been done for the whole of O2. It makes a big difference! Derek Davis and TJ Massinger won the first LIGO Laboratory Award for Excellence in Detector Characterization and Calibration™ for implementing this noise subtraction scheme (the award citation almost spilled the beans on our new detections). I’m happy that GW170104 now has an increased signal-to-noise ratio, which means smaller uncertainties on its parameters.

The searches

We use three search algorithms in this paper. We have two matched-filter searches (GstLAL and PyCBC). These compare a bank of templates to the data to look for matches. We also use coherent WaveBurst (cWB), which is a search for generic short signals, but here has been tuned to find the characteristic chirp of a binary. Since cWB is more flexible in the signals it can find, it’s slightly less sensitive than the matched-filter searches, but it gives us confidence that we’re not missing things.

The two matched-filter searches both identify all 11 signals with the exception of GW170818, which is only found by GstLAL. This is because PyCBC only flags signals above a threshold in each detector. We’re confident it’s real though, as it is seen in all three detectors, albeit below PyCBC’s threshold in Hanford and Virgo. (PyCBC only looked at signals found in coincident Livingston and Hanford in O2, I suspect they would have found it if they were looking at all three detectors, as that would have let them lower their threshold).

The search pipelines try to distinguish between signal-like features in the data and noise fluctuations. Having multiple detectors is a big help here, although we still need to be careful in checking for correlated noise sources. The background of noise falls off quickly, so there’s a rapid transition between almost-certainly noise to almost-certainly signal. Most of the signals are off the charts in terms of significance, with GW170818, GW151012 and GW170729 being the least significant. GW170729 is found with best significance by cWB, that gives reports a false alarm rate of $1/(50~\mathrm{yr})$.

Cumulative histogram of results from GstLAL (top left), PyCBC (top right) and cWB (bottom). The expected background is shown as the dashed line and the shaded regions give Poisson uncertainties. The search results are shown as the solid red line and named gravitational-wave detections are shown as blue dots. More significant results are further to the right of the plot. Fig. 2 and Fig. 3 of the O2 Catalogue Paper.

The false alarm rate indicates how often you would expect to find something at least as signal like if you were to analyse a stretch of data with the same statistical properties as the data considered, assuming that they is only noise in the data. The false alarm rate does not fold in the probability that there are real gravitational waves occurring at some average rate. Therefore, we need to do an extra layer of inference to work out the probability that something flagged by a search pipeline is a real signal versus is noise.

The results of this calculation is given in Table IV. GW170729 has a 94% probability of being real using the cWB results, 98% using the GstLAL results, but only 52% according to PyCBC. Therefore, if you’re feeling bold, you might, say, only wager the entire economy of the UK on it being real.

We also list the most marginal triggers. These all have probabilities way below being 50% of being real: if you were to add them all up you wouldn’t get a total of 1 real event. (In my professional opinion, they are garbage). However, if you want to check for what we might have missed, these may be a place to start. Some of these can be explained away as instrumental noise, say scattered light. Others show no obvious signs of disturbance, so are probably just some noise fluctuation.

The source properties

We give updated parameter estimates for all 11 sources. These use updated estimates of calibration uncertainty (which doesn’t make too much difference), improved estimate of the noise spectrum (which makes some difference to the less well measured parameters like the mass ratio), the cleaned data (which helps for GW170104), and our most currently complete waveform models [bonus note].

This plot shows the masses of the two binary components (you can just make out GW170817 down in the corner). We use the convention that the more massive of the two is $m_1$ and the lighter is $m_2$. We are now really filling in the mass plot! Implications for the population of black holes are discussed in the Populations Paper.

Estimated masses for the two binary objects for each of the events in O1 and O2. From lowest chirp mass (left; red) to highest (right; purple): GW170817 (solid), GW170608 (dashed), GW151226 (solid), GW151012 (dashed), GW170104 (solid), GW170814 (dashed), GW170809 (dashed), GW170818 (dashed), GW150914 (solid), GW170823 (dashed), GW170729 (solid). The contours mark the 90% credible regions. The grey area is excluded from our convention on masses. Part of Fig. 4 of the O2 Catalogue Paper. The mass ratio is $q = m_2/m_1$.

As well as mass, black holes have a spin. For the final black hole formed in the merger, these spins are always around 0.7, with a little more or less depending upon which way the spins of the two initial black holes were pointing. As well as being probably the most most massive, GW170729’s could have the highest final spin! It is a record breaker. It radiated a colossal $4.8^{+1.7}_{-1.7} M_\odot$ worth of energy in gravitational waves [bonus note].

Estimated final masses and spins for each of the binary black hole events in O1 and O2. From lowest chirp mass (left; red–orange) to highest (right; purple): GW170608 (dashed), GW151226 (solid), GW151012 (dashed), GW170104 (solid), GW170814 (dashed), GW170809 (dashed), GW170818 (dashed), GW150914 (solid), GW170823 (dashed), GW170729 (solid). The contours mark the 90% credible regions. Part of Fig. 4 of the O2 Catalogue Paper.

There is considerable uncertainty on the spins as there are hard to measure. The best combination to pin down is the effective inspiral spin parameter $\chi_\mathrm{eff}$. This is a mass weighted combination of the spins which has the most impact on the signal we observe. It could be zero if the spins are misaligned with each other, point in the orbital plane, or are zero. If it is non-zero, then it means that at least one black hole definitely has some spin. GW151226 and GW170729 have $\chi_\mathrm{eff} > 0$ with more than 99% probability. The rest are consistent with zero. The spin distribution for GW170104 has tightened up for GW170104 as its signal-to-noise ratio has increased, and there’s less support for negative $\chi_\mathrm{eff}$, but there’s been no move towards larger positive $\chi_\mathrm{eff}$.

Estimated effective inspiral spin parameters for each of the events in O1 and O2. From lowest chirp mass (left; red) to highest (right; purple): GW170817, GW170608, GW151226, GW151012, GW170104, GW170814, GW170809, GW170818, GW150914, GW170823, GW170729. Part of Fig. 5 of the O2 Catalogue Paper.

For our analysis, we use two different waveform models to check for potential sources of systematic error. They agree pretty well. The spins are where they show most difference (which makes sense, as this is where they differ in terms of formulation). For GW151226, the effective precession waveform IMRPhenomPv2 gives $0.20^{+0.18}_{-0.08}$ and the full precession model gives $0.15^{+0.25}_{-0.11}$ and extends to negative $\chi_\mathrm{eff}$. I panicked a little bit when I first saw this, as GW151226 having a non-zero spin was one of our headline results when first announced. Fortunately, when I worked out the numbers, all our conclusions were safe. The probability of $\chi_\mathrm{eff} < 0$ is less than 1%. In fact, we can now say that at least one spin is greater than $0.28$ at 99% probability compared with $0.2$ previously, because the full precession model likes spins in the orbital plane a bit more. Who says data analysis can’t be thrilling?

Our measurement of $\chi_\mathrm{eff}$ tells us about the part of the spins aligned with the orbital angular momentum, but not in the orbital plane. In general, the in-plane components of the spin are only weakly constrained. We basically only get back the information we put in. The leading order effects of in-plane spins is summarised by the effective precession spin parameter $\chi_\mathrm{p}$. The plot below shows the inferred distributions for $\chi_\mathrm{p}$. The left half for each event shows our results, the right shows our prior after imposed the constraints on spin we get from $\chi_\mathrm{eff}$. We get the most information for GW151226 and GW170814, but even then it’s not much, and we generally cover the entire allowed range of values.

Estimated effective inspiral spin parameters for each of the events in O1 and O2. From lowest chirp mass (left; red) to highest (right; purple): GW170817, GW170608, GW151226, GW151012, GW170104, GW170814, GW170809, GW170818, GW150914, GW170823, GW170729. The left (coloured) part of the plot shows the posterior distribution; the right (white) shows the prior conditioned by the effective inspiral spin parameter constraints. Part of Fig. 5 of the O2 Catalogue Paper.

One final measurement which we can make (albeit with considerable uncertainty) is the distance to the source. The distance influences how loud the signal is (the further away, the quieter it is). This also depends upon the inclination of the source (a binary edge-on is quieter than a binary face-on/off). Therefore, the distance is correlated with the inclination and we end up with some butterfly-like plots. GW170729 is again a record setter. It comes from a luminosity distance of $2.84^{+1.40}_{-1.36}~\mathrm{Gpc}$ away. That means it has travelled across the Universe for $3.2$$6.2$ billion years—it potentially started its journey before the Earth formed!

Estimated luminosity distances and orbital inclinations for each of the events in O1 and O2. From lowest chirp mass (left; red) to highest (right; purple): GW170817 (solid), GW170608 (dashed), GW151226 (solid), GW151012 (dashed), GW170104 (solid), GW170814 (dashed), GW170809 (dashed), GW170818 (dashed), GW150914 (solid), GW170823 (dashed), GW170729 (solid). The contours mark the 90% credible regions.An inclination of zero means that we’re looking face-on along the direction of the total angular momentum, and inclination of $\pi/2$ means we’re looking edge-on perpendicular to the angular momentum. Part of Fig. 7 of the O2 Catalogue Paper.

Waveform reconstructions

To check our results, we reconstruct the waveforms from the data to see that they match our expectations for binary black hole waveforms (and there’s not anything extra there). To do this, we use unmodelled analyses which assume that there is a coherent signal in the detectors: we use both cWB and BayesWave. The results agree pretty well. The reconstructions beautifully match our templates when the signal is loud, but, as you might expect, can resolve the quieter details. You’ll also notice the reconstructions sometimes pick up a bit of background noise away from the signal. This gives you and idea of potential fluctuations.

Time–frequency maps and reconstructed signal waveforms for the binary black holes. For each event we show the results from the detector where the signal was loudest. The left panel for each shows the time–frequency spectrogram with the upward-sweeping chip. The right show waveforms: blue the modelled waveforms used to infer parameters (LALInf; top panel); the red wavelet reconstructions (BayesWave; top panel); the black is the maximum-likelihood cWB reconstruction (bottom panel), and the green (bottom panel) shows reconstructions for simulated similar signals. I think the agreement is pretty good! All the data have been whitened as this is how we perform the statistical analysis of our data. Fig. 10 of the O2 Catalogue Paper.

I still think GW170814 looks like a slug. Some people think they look like crocodiles.

We’ll be doing more tests of the consistency of our signals with general relativity in a future paper.

Merger rates

Given all our observations now, we can set better limits on the merger rates. Going from the number of detections seen to the number merger out in the Universe depends upon what you assume about the mass distribution of the sources. Therefore, we make a few different assumptions.

For binary black holes, we use (i) a power-law model for the more massive black hole similar to the initial mass function of stars, with a uniform distribution on the mass ratio, and (ii) use uniform-in-logarithmic distribution for both masses. These were designed to bracket the two extremes of potential distributions. With our observations, we’re starting to see that the true distribution is more like the power-law, so I expect we’ll be abandoning these soon. Taking the range of possible values from our calculations, the rate is in the range of $9.7$$101~\mathrm{Gpc^{-3}\,yr^{-1}}$ for black holes between $5 M_\odot$ and $50 M_\odot$ [bonus note].

For binary neutron stars, which are perhaps more interesting astronomers, we use a uniform distribution of masses between $0.8 M_\odot$ and $2.3 M_\odot$, and a Gaussian distribution to match electromagnetic observations. We find that these bracket the range $97$$4440~\mathrm{Gpc^{-3}\,yr^{-1}}$. This larger than are previous range, as we hadn’t considered the Gaussian distribution previously.

90% upper limits for neutron star–black hole binaries. Three black hole masses were tried and two spin distributions. Results are shown for the two matched-filter search algorithms. Fig. 14 of the O2 Catalogue Paper.

Finally, what about neutron star–black holes? Since we don’t have any detections, we can only place an upper limit. This is a maximum of $610~\mathrm{Gpc^{-3}\,yr^{-1}}$. This is about a factor of 2 better than our O1 results, and is starting to get interesting!

We are sure to discover lots more in O3… [bonus note].

The O2 Populations Paper

Synopsis: O2 Populations Paper
Read this if: You want the best family portrait of binary black holes
Favourite part: A maximum black hole mass?

Each detection is exciting. However, we can squeeze even more science out of our observations by looking at the entire population. Using all 10 of our binary black hole observations, we start to trace out the population of binary black holes. Since we still only have 10, we can’t yet be too definite in our conclusions. Our results give us some things to ponder, while we are waiting for the results of O3. I think now is a good time to start making some predictions.

We look at the distribution of black hole masses, black hole spins, and the redshift (cosmological time) of the mergers. The black hole masses tell us something about how you go from a massive star to a black hole. The spins tell us something about how the binaries form. The redshift tells us something about how these processes change as the Universe evolves. Ideally, we would look at these all together allowing for mixtures of binary black holes formed through different means. Given that we only have a few observations, we stick to a few simple models.

To work out the properties of the population, we perform a hierarchical analysis of our 10 binary black holes. We infer the properties of the individual systems, assuming that they come from a given population, and then see how well that population fits our data compared with a different distribution.

In doing this inference, we account for selection effects. Our detectors are not equally sensitive to all sources. For example, nearby sources produce louder signals and we can’t detect signals that are too far away, so if you didn’t account for this you’d conclude that binary black holes only merged in the nearby Universe. Perhaps less obvious is that we are not equally sensitive to all source masses. More massive binaries produce louder signals, so we can detect these further way than lighter binaries (up to the point where these binaries are so high mass that the signals are too low frequency for us to easily spot). This is why we detect more binary black holes than binary neutron stars, even though there are more binary neutron stars out here in the Universe.

Masses

When looking at masses, we try three models of increasing complexity:

• Model A is a simple power law for the mass of the more massive black hole $m_1$. There’s no real reason to expect the masses to follow a power law, but the masses of stars when they form do, and astronomers generally like power laws as they’re friendly, so its a sensible thing to try. We fit for the power-law index. The power law goes from a lower limit of $5 M_\odot$ to an upper limit which we also fit for. The mass of the lighter black hole $m_2$ is assumed to be uniformly distributed between $5 M_\odot$ and the mass of the other black hole.
• Model B is the same power law, but we also allow the lower mass limit to vary from $5 M_\odot$. We don’t have much sensitivity to low masses, so this lower bound is restricted to be above $5 M_\odot$. I’d be interested in exploring lower masses in the future. Additionally, we allow the mass ratio $q = m_2/m_1$ of the black holes to vary, trying $q^{\beta_q}$ instead of Model A’s $q^0$.
• Model C has the same power law, but now with some smoothing at the low-mass end, rather than a sharp turn-on. Additionally, it includes a Gaussian component towards higher masses. This was inspired by the possibility of pulsational pair-instability supernova causing a build up of black holes at certain masses: stars which undergo this lose extra mass, so you’d end up with lower mass black holes than if the stars hadn’t undergone the pulsations. The Gaussian could fit other effects too, for example if there was a secondary formation channel, or just reflect that the pure power law is a bad fit.

In allowing the mass distributions to vary, we find overall rates which match pretty well those we obtain with our main power-law rates calculation included in the O2 Catalogue Paper, higher than with the main uniform-in-log distribution.

The fitted mass distributions are shown in the plot below. The error bars are pretty broad, but I think the models agree on some broad features: there are more light black holes than heavy black holes; the minimum black hole mass is below about $9 M_\odot$, but we can’t place a lower bound on it; the maximum black hole mass is above about $35 M_\odot$ and below about $50 M_\odot$, and we prefer black holes to have more similar masses than different ones. The upper bound on the black hole minimum mass, and the lower bound on the black hole upper mass are set by the smallest and biggest black holes we’ve detected, respectively.

Binary black hole merger rate as a function of the primary mass ($m_1$; top) and mass ratio ($q$; bottom). The solid lines and bands show the medians and 90% intervals. The dashed line shows the posterior predictive distribution: our expectation for future observations averaging over our uncertainties. Fig. 2 of the O2 Populations Paper.

That there does seem to be a drop off at higher masses is interesting. There could be something which stops stars forming black holes in this range. It has been proposed that there is a mass gap due to pair instability supernovae. These explosions completely disrupt their progenitor stars, leaving nothing behind. (I’m not sure if they are accompanied by a flash of green light). You’d expect this to kick for black holes of about $50$$60 M_\odot$. We infer that 99% of merging black holes have masses below $44.0 M_\odot$ with Model A, $41.8 M_\odot$ with Model B, and $41.8 M_\odot$ with Model C. Therefore, our results are not inconsistent with a mass gap. However, we don’t really have enough evidence to be sure.

We can compare how well each of our three models fits the data by looking at their Bayes factors. These naturally incorporate the complexity of the models: models with more parameters (which can be more easily tweaked to match the data) are penalised so that you don’t need to worry about overfitting. We have a preference for Model C. It’s not strong, but I think good evidence that we can’t use a simple power law.

Spins

To model the spins:

• For the magnitude, we assume a beta distribution. There’s no reason for this, but these are convenient distributions for things between 0 and 1, which are the limits on black hole spin (0 is nonspinning, 1 is as fast as you can spin). We assume that both spins are drawn from the same distribution.
• For the spin orientations, we use a mix of an isotropic distribution and a Gaussian centred on being aligned with the orbital angular momentum. You’d expect an isotropic distribution if binaries were assembled dynamically, and perhaps something with spins generally aligned with each other if the binary evolved in isolation.

We don’t get any useful information on the mixture fraction. Looking at the spin magnitudes, we have a preference towards smaller spins, but still have support for large spins. The more misaligned spins are, the larger the spin magnitudes can be: for the isotropic distribution, we have support all the way up to maximal values.

Inferred spin magnitude distributions. The left shows results for the parametric distribution, assuming a mixture of almost aligned and isotropic spin, with the median (solid), 50% and 90% intervals shaded, and the posterior predictive distribution as the dashed line. Results are included both for beta distributions which can be singular at 0 and 1, and with these excluded. Model V is a very low spin model shown for comparison. The right shows a binned reconstruction of the distribution for aligned and isotropic distributions, showing the median and 90% intervals. Fig. 8 of the O2 Populations Paper.

Since spins are harder to measure than masses, it is not surprising that we can’t make strong statements yet. If we were to find something with definitely negative $\chi_\mathrm{eff}$, we would be able to deduce that spins can be seriously misaligned.

Redshift evolution

As a simple model of evolution over cosmological time, we allow the merger rate to evolve as $(1+z)^\lambda$. That’s right, another power law! Since we’re only sensitive to relatively small redshifts for the masses we detect ($z < 1$), this gives a good approximation to a range of different evolution schemes.

Evolution of the binary black hole merger rate (blue), showing median, 50% and 90% intervals. For comparison, a non-evolving rate calculated using Model B is shown too. Fig. 6 of the O2 Populations Paper.

We find that we prefer evolutions that increase with redshift. There’s an 88% probability that $\lambda > 0$, but we’re still consistent with no evolution. We might expect rate to increase as star formation was higher bach towards $z =2$. If we can measure the time delay between forming stars and black holes merging, we could figure out what happens to these systems in the meantime.

The local merger rate is broadly consistent with what we infer with our non-evolving distributions, but is a little on the lower side.

Bonus notes

Naming

Gravitational waves are named as GW-year-month-day, so our first observation from 14 September 2015 is GW150914. We realise that this convention suffers from a Y2K-style bug, but by the time we hit 2100, we’ll have so many detections we’ll need a new scheme anyway.

Previously, we had a second designation for less significant potential detections. They were LIGO–Virgo Triggers (LVT), the one example being LVT151012. No-one was really happy with this designation, but it stems from us being cautious with our first announcement, and not wishing to appear over bold with claiming we’d seen two gravitational waves when the second wasn’t that certain. Now we’re a bit more confident, and we’ve decided to simplify naming by labelling everything a GW on the understanding that this now includes more uncertain events. Under the old scheme, GW170729 would have been LVT170729. The idea is that the broader community can decide which events they want to consider as real for their own studies. The current condition for being called a GW is that the probability of it being a real astrophysical signal is at least 50%. Our 11 GWs are safely above that limit.

The naming change has hidden the fact that now when we used our improved search pipelines, the significance of GW151012 has increased. It would now be a GW even under the old scheme. Congratulations LVT151012, I always believed in you!

Is it of extraterrestrial origin, or is it just a blurry figure? GW151012: the truth is out there!.

Burning bright

We are lacking nicknames for our new events. They came in so fast that we kind of lost track. Ilya Mandel has suggested that GW170729 should be the Tiger, as it happened on the International Tiger Day. Since tigers are the biggest of the big cats, this seems apt.

Carl-Johan Haster argues that LIGO+tiger = Liger. Since ligers are even bigger than tigers, this seems like an excellent case to me! I’d vote for calling the bigger of the two progenitor black holes GW170729-tiger, the smaller GW170729-lion, and the final black hole GW17-729-liger.

Suggestions for other nicknames are welcome, leave your ideas in the comments.

August 2017—Something fishy or just Poisson statistics?

The final few weeks of O2 were exhausting. I was trying to write job applications at the time, and each time I sat down to work on my research proposal, my phone went off with another alert. You may be wondering about was special about August. Some have hypothesised that it is because Aaron Zimmerman, my partner for the analysis of GW170104, was on the Parameter Estimation rota to analyse the last few weeks of O2. The legend goes that Aaron is especially lucky as he was bitten by a radioactive Leprechaun. I can neither confirm nor deny this. However, I make a point of playing any lottery numbers suggested by him.

A slightly more mundane explanation is that August was when the detectors were running nice and stably. They were observing for a large fraction of the time. LIGO Livingston reached its best sensitivity at this time, although it was less happy for Hanford. We often quantify the sensitivity of our detectors using their binary neutron star range, the average distance they could see a binary neutron star system with a signal-to-noise ratio of 8. If this increases by a factor of 2, you can see twice as far, which means you survey 8 times the volume. This cubed factor means even small improvements can have a big impact. The LIGO Livingston range peak a little over $100~\mathrm{Mpc}$. We’re targeting at least $120~\mathrm{Mpc}$ for O3, so August 2017 gives an indication of what you can expect.

Binary neutron star range for the instruments across O2. The break around week 3 was for the holidays (We did work Christmas 2015). The break at week 23 was to tune-up the instruments, and clean the mirrors. At week 31 there was an earthquake in Montana, and the Hanford sensitivity didn’t recover by the end of the run. Part of Fig. 1 of the O2 Catalogue Paper.

Of course, in the case of GW170817, we just got lucky.

Sign errors

GW170809 was the first event we identified with Virgo after it joined observing. The signal in Virgo is very quiet. We actually got better results when we flipped the sign of the Virgo data. We were just starting to get paranoid when GW170814 came along and showed us that everything was set up right at Virgo. When I get some time, I’d like to investigate how often this type of confusion happens for quiet signals.

SEOBNRv3

One of the waveforms, which includes the most complete prescription of the precession of the spins of the black holes, we use in our analysis goes by the technical name of SEOBNRv3. It is extremely computationally expensive. Work has been done to improve that, but this hasn’t been implemented in our reviewed codes yet. We managed to complete an analysis for the GW170104 Discovery Paper, which was a huge effort. I said then to not expect it for all future events. We did it for all the black holes, even for the lowest mass sources which have the longest signals. I was responsible for GW151226 runs (as well as GW170104) and I started these back at the start of the summer. Eve Chase put in a heroic effort to get GW170608 results, we pulled out all the stops for that.

Thanksgiving

I have recently enjoyed my first Thanksgiving in the US. I was lucky enough to be hosted for dinner by Shane Larson and his family (and cats). I ate so much I thought I might collapse to a black hole. Apparently, a Thanksgiving dinner can be 3000–4500 calories. That sounds like a lot, but the merger of GW170729 would have emitted about $5 \times 10^{40}$ times more energy. In conclusion, I don’t need to go on a diet.

Confession

We cheated a little bit in calculating the rates. Roughly speaking, the merger rate is given by

$\displaystyle R = \frac{N}{\langle VT\rangle}$,

where $N$ is the number of detections and $\langle VT\rangle$ is the amount of volume and time we’ve searched. You expect to detect more events if you increase the sensitivity of the detectors (and hence $V$), or observer for longer (and hence increase $T$). In our calculation, we included GW170608 in $N$, even though it was found outside of standard observing time. Really, we should increase $\langle VT\rangle$ to factor in the extra time outside of standard observing time when we could have made a detection. This is messy to calculate though, as there’s not really a good way to check this. However, it’s only a small fraction of the time (so the extra $T$ should be small), and for much of the sensitivity of the detectors will be poor (so $V$ will be small too). Therefore, we estimated any bias from neglecting this is smaller than our uncertainty from the calibration of the detectors, and not worth worrying about.

New sources

We saw our first binary black hole shortly after turning on the Advanced LIGO detectors. We saw our first binary neutron star shortly after turning on the Advanced Virgo detector. My money is therefore on our first neutron star–black hole binary shortly after we turn on the KAGRA detector. Because science…

Dirichlet Process Gaussian-mixture model: An application to localizing coalescing binary neutron stars with gravitational-wave observations

Where do gravitational waves like GW170817 come from? Using our network of detectors, we cannot pinpoint a source, but we can make a good estimate—the amplitude of the signal tells us about the distance; the time delay between the signal arriving at different detectors, and relative amplitudes of the signal in different detectors tells us about the sky position (see the excellent video by Leo Singer below).

In this paper we look at full three-dimensional localization of gravitational-wave sources; we important a (rather cunning) technique from computer vision to construct a probability distribution for the source’s location, and then explore how well we could localise a set of simulated binary neutron stars. Knowing the source location enables lots of cool science. First, it aids direct follow-up observations with non-gravitational-wave observatories, searching for electromagnetic or neutrino counterparts. It’s especially helpful if you can cross-reference with galaxy catalogues, to find the most probable source locations (this technique was used to find the kilonova associated with GW170817). Even without finding a counterpart, knowing the most probable host galaxy helps us figure out how the source formed (have lots of stars been born recently, or are all the stars old?), and allows us measure the expansion of the Universe. Having a reliable technique to reconstruct source locations is useful!

This was a fun paper to write [bonus note]. I’m sure it will be valuable, both for showing how to perform this type of reconstruction of a multi-dimensional probability density, and for its implications for source localization and follow-up of gravitational-wave signals. I go into details of both below, first discussing our statistical model (this is a bit technical), then looking at our results for a set of binary neutron stars (which have implications for hunting for counterparts) .

Dirichlet process Gaussian mixture model

When we analyse gravitational-wave data to infer the source properties (location, masses, etc.), we map out parameter space with a set of samples: a list of points in the parameter space, with there being more around more probable locations and fewer in less probable locations. These samples encode everything about the probability distribution for the different parameters, we just need to extract it…

For our application, we want a nice smooth probability density. How do we convert a bunch of discrete samples to a smooth distribution? The simplest thing is to bin the samples. However, picking the right bin size is difficult, and becomes much harder in higher dimensions. Another popular option is to use kernel density estimation. This is better at ensuring smooth results, but you now have to worry about the size of your kernels.

Our approach is in essence to use a kernel density estimate, but to learn the size and position of the kernels (as well as the number) from the data as an extra layer of inference. The “Gaussian mixture model” part of the name refers to the kernels—we use several different Gaussians. The “Dirichlet process” part refers to how we assign their properties (their means and standard deviations). What I really like about this technique, as opposed to the usual rule-of-thumb approaches used for kernel density estimation,  is that it is well justified from a theoretical point of view.

I hadn’t come across a Dirchlet process before. Section 2 of the paper is a walkthrough of how I built up an understanding of this mathematical object, and it contains lots of helpful references if you’d like to dig deeper.

In our application, you can think of the Dirichlet process as being a probability distribution for probability distributions. We want a probability distribution describing the source location. Given our samples, we infer what this looks like. We could put all the probability into one big Gaussian, or we could put it into lots of little Gaussians. The Gaussians could be wide or narrow or a mix. The Dirichlet distribution allows us to assign probabilities to each configuration of Gaussians; for example, if our samples are all in the northern hemisphere, we probably want Gaussians centred around there, rather than in the southern hemisphere.

With the resulting probability distribution for the source location, we can quickly evaluate it at a single point. This means we can rapidly produce a list of most probable source galaxies—extremely handy if you need to know where to point a telescope before a kilonova fades away (or someone else finds it).

Gravitational-wave localization

To verify our technique works, and develop an intuition for three-dimensional localizations, we used studied a set of simulated binary neutron star signals created for the First 2 Years trilogy of papers. This data set is well studied now, it illustrates performance it what we anticipated to be the first two observing runs of the advanced detectors, which turned out to be not too far from the truth. We have previously looked at three-dimensional localizations for these signals using a super rapid approximation.

The plots below show how well we could localise the sources of our binary neutron star sources. Specifically, the plots show the size of the volume which has a 90% probability of containing the source verses the signal-to-noise ratio (the loudness) of the signal. Typically, volumes are $10^4$$10^5~\mathrm{Mpc}^3$, which is about $10^{68}$$10^{69}$ Olympic swimming pools. Such a volume would contain something like $100$$1000$ galaxies.

Localization volume as a function of signal-to-noise ratio. The top panel shows results for two-detector observations: the LIGO-Hanford and LIGO-Livingston (HL) network similar to in the first observing run, and the LIGO and Virgo (HLV) network similar to the second observing run. The bottom panel shows all observations for the HLV network including those with all three detectors which are colour coded by the fraction of the total signal-to-noise ratio from Virgo. In both panels, there are fiducial lines scaling inversely with the sixth power of the signal-to-noise ratio. Adapted from Fig. 4 of Del Pozzo et al. (2018).

Looking at the results in detail, we can learn a number of things

1. The localization volume is roughly inversely proportional to the sixth power of the signal-to-noise ratio [bonus note]. Loud signals are localized much better than quieter ones!
2. The localization dramatically improves when we have three-detector observations. The extra detector improves the sky localization, which reduces the localization volume.
3. To get the benefit of the extra detector, the source needs to be close enough that all the detectors could get a decent amount of the signal-to-noise ratio. In our case, Virgo is the least sensitive, and we see the the best localizations are when it has a fair share of the signal-to-noise ratio.
4. Considering the cases where we only have two detectors, localization volumes get bigger at a given signal-to-noise ration as the detectors get more sensitive. This is because we can detect sources at greater distances.

Putting all these bits together, I think in the future, when we have lots of detections, it would make most sense to prioritise following up the loudest signals. These are the best localised, and will also be the brightest since they are the closest, meaning there’s the greatest potential for actually finding a counterpart. As the sensitivity of the detectors improves, its only going to get more difficult to find a counterpart to a typical gravitational-wave signal, as sources will be further away and less well localized. However, having more sensitive detectors also means that we are more likely to have a really loud signal, which should be really well localized.

Left: Localization (yellow) with a network of two low-sensitivity detectors. The sky location is uncertain, but we know the source must be nearby. Right: Localization (green) with a network of three high-sensitivity detectors. We have good constraints on the source location, but it could now be at a much greater range of distances. Not to scale.

Using our localization volumes as a guide, you would only need to search one galaxy to find the true source in about 7% of cases with a three-detector network similar to at the end of our second observing run. Similarly, only ten would need to be searched in 23% of cases. It might be possible to get even better performance by considering which galaxies are most probable because they are the biggest or the most likely to produce merging binary neutron stars. This is definitely a good approach to follow.

Galaxies within the 90% credible volume of an example simulated source, colour coded by probability. The galaxies are from the GLADE Catalog; incompleteness in the plane of the Milky Way causes the missing wedge of galaxies. The true source location is marked by a cross [bonus note]. Part of Figure 5 of Del Pozzo et al. (2018).

arXiv: 1801.08009 [astro-ph.IM]
Journal: Monthly Notices of the Royal Astronomical Society; 479(1):601–614; 2018
Code: 3d_volume
Buzzword bingo: Interdisciplinary (we worked with computer scientist Tom Haines); machine learning (the inference involving our Dirichlet process Gaussian mixture model); multimessenger astronomy (as our results are useful for following up gravitational-wave signals in the search for counterparts)

Bonus notes

Writing

We started writing this paper back before the first observing run of Advanced LIGO. We had a pretty complete draft on Friday 11 September 2015. We just needed to gather together a few extra numbers and polish up the figures and we’d be done! At 10:50 am on Monday 14 September 2015, we made our first detection of gravitational waves. The paper was put on hold. The pace of discoveries over the coming years meant we never quite found enough time to get it together—I’ve rewritten the introduction a dozen times. It’s extremely satisfying to have it done. This is a shame, as it meant that this study came out much later than our other three-dimensional localization study. The delay has the advantage of justifying one of my favourite acknowledgement sections.

Sixth power

We find that the localization volume $\Delta V$ is inversely proportional to the sixth power of the signal-to-noise ration $\varrho$. This is what you would expect. The localization volume depends upon the angular uncertainty on the sky $\Delta \Omega$, the distance to the source $D$, and the distance uncertainty $\Delta D$,

$\Delta V \sim D^2 \Delta \Omega \Delta D$.

Typically, the uncertainty on a parameter (like the masses) scales inversely with the signal-to-noise ratio. This is the case for the logarithm of the distance, which means

$\displaystyle \frac{\Delta D}{D} \propto \varrho^{-1}$.

The uncertainty in the sky location (being two dimensional) scales inversely with the square of the signal-to-noise ration,

$\Delta \Omega \propto \varrho^{-2}$.

The signal-to-noise ratio itself is inversely proportional to the distance to the source (sources further way are quieter. Therefore, putting everything together gives

$\Delta V \propto \varrho^{-6}$.

Treasure

We all know that treasure is marked by a cross. In the case of a binary neutron star merger, dense material ejected from the neutron stars will decay to heavy elements like gold and platinum, so there is definitely a lot of treasure at the source location.