Dirichlet Process Gaussian-mixture model: An application to localizing coalescing binary neutron stars with gravitational-wave observations

Where do gravitational waves like GW170817 come from? Using our network of detectors, we cannot pinpoint a source, but we can make a good estimate—the amplitude of the signal tells us about the distance; the time delay between the signal arriving at different detectors, and relative amplitudes of the signal in different detectors tells us about the sky position (see the excellent video by Leo Singer below).

In this paper we look at full three-dimensional localization of gravitational-wave sources; we important a (rather cunning) technique from computer vision to construct a probability distribution for the source’s location, and then explore how well we could localise a set of simulated binary neutron stars. Knowing the source location enables lots of cool science. First, it aids direct follow-up observations with non-gravitational-wave observatories, searching for electromagnetic or neutrino counterparts. It’s especially helpful if you can cross-reference with galaxy catalogues, to find the most probable source locations (this technique was used to find the kilonova associated with GW170817). Even without finding a counterpart, knowing the most probable host galaxy helps us figure out how the source formed (have lots of stars been born recently, or are all the stars old?), and allows us measure the expansion of the Universe. Having a reliable technique to reconstruct source locations is useful!

This was a fun paper to write [bonus note]. I’m sure it will be valuable, both for showing how to perform this type of reconstruction of a multi-dimensional probability density, and for its implications for source localization and follow-up of gravitational-wave signals. I go into details of both below, first discussing our statistical model (this is a bit technical), then looking at our results for a set of binary neutron stars (which have implications for hunting for counterparts) .

Dirichlet process Gaussian mixture model

When we analyse gravitational-wave data to infer the source properties (location, masses, etc.), we map out parameter space with a set of samples: a list of points in the parameter space, with there being more around more probable locations and fewer in less probable locations. These samples encode everything about the probability distribution for the different parameters, we just need to extract it…

For our application, we want a nice smooth probability density. How do we convert a bunch of discrete samples to a smooth distribution? The simplest thing is to bin the samples. However, picking the right bin size is difficult, and becomes much harder in higher dimensions. Another popular option is to use kernel density estimation. This is better at ensuring smooth results, but you now have to worry about the size of your kernels.

Our approach is in essence to use a kernel density estimate, but to learn the size and position of the kernels (as well as the number) from the data as an extra layer of inference. The “Gaussian mixture model” part of the name refers to the kernels—we use several different Gaussians. The “Dirichlet process” part refers to how we assign their properties (their means and standard deviations). What I really like about this technique, as opposed to the usual rule-of-thumb approaches used for kernel density estimation,  is that it is well justified from a theoretical point of view.

I hadn’t come across a Dirchlet process before. Section 2 of the paper is a walkthrough of how I built up an understanding of this mathematical object, and it contains lots of helpful references if you’d like to dig deeper.

In our application, you can think of the Dirichlet process as being a probability distribution for probability distributions. We want a probability distribution describing the source location. Given our samples, we infer what this looks like. We could put all the probability into one big Gaussian, or we could put it into lots of little Gaussians. The Gaussians could be wide or narrow or a mix. The Dirichlet distribution allows us to assign probabilities to each configuration of Gaussians; for example, if our samples are all in the northern hemisphere, we probably want Gaussians centred around there, rather than in the southern hemisphere.

With the resulting probability distribution for the source location, we can quickly evaluate it at a single point. This means we can rapidly produce a list of most probable source galaxies—extremely handy if you need to know where to point a telescope before a kilonova fades away (or someone else finds it).

Gravitational-wave localization

To verify our technique works, and develop an intuition for three-dimensional localizations, we used studied a set of simulated binary neutron star signals created for the First 2 Years trilogy of papers. This data set is well studied now, it illustrates performance it what we anticipated to be the first two observing runs of the advanced detectors, which turned out to be not too far from the truth. We have previously looked at three-dimensional localizations for these signals using a super rapid approximation.

The plots below show how well we could localise the sources of our binary neutron star sources. Specifically, the plots show the size of the volume which has a 90% probability of containing the source verses the signal-to-noise ratio (the loudness) of the signal. Typically, volumes are 10^410^5~\mathrm{Mpc}^3, which is about 10^{68}10^{69} Olympic swimming pools. Such a volume would contain something like 1001000 galaxies.

Volume verses signal-to-noise ratio

Localization volume as a function of signal-to-noise ratio. The top panel shows results for two-detector observations: the LIGO-Hanford and LIGO-Livingston (HL) network similar to in the first observing run, and the LIGO and Virgo (HLV) network similar to the second observing run. The bottom panel shows all observations for the HLV network including those with all three detectors which are colour coded by the fraction of the total signal-to-noise ratio from Virgo. In both panels, there are fiducial lines scaling inversely with the sixth power of the signal-to-noise ratio. Adapted from Fig. 4 of Del Pozzo et al. (2018).

Looking at the results in detail, we can learn a number of things

  1. The localization volume is roughly inversely proportional to the sixth power of the signal-to-noise ratio [bonus note]. Loud signals are localized much better than quieter ones!
  2. The localization dramatically improves when we have three-detector observations. The extra detector improves the sky localization, which reduces the localization volume.
  3. To get the benefit of the extra detector, the source needs to be close enough that all the detectors could get a decent amount of the signal-to-noise ratio. In our case, Virgo is the least sensitive, and we see the the best localizations are when it has a fair share of the signal-to-noise ratio.
  4. Considering the cases where we only have two detectors, localization volumes get bigger at a given signal-to-noise ration as the detectors get more sensitive. This is because we can detect sources at greater distances.

Putting all these bits together, I think in the future, when we have lots of detections, it would make most sense to prioritise following up the loudest signals. These are the best localised, and will also be the brightest since they are the closest, meaning there’s the greatest potential for actually finding a counterpart. As the sensitivity of the detectors improves, its only going to get more difficult to find a counterpart to a typical gravitational-wave signal, as sources will be further away and less well localized. However, having more sensitive detectors also means that we are more likely to have a really loud signal, which should be really well localized.

Banana vs cucumber

Left: Localization (yellow) with a network of two low-sensitivity detectors. The sky location is uncertain, but we know the source must be nearby. Right: Localization (green) with a network of three high-sensitivity detectors. We have good constraints on the source location, but it could now be at a much greater range of distances. Not to scale.

Using our localization volumes as a guide, you would only need to search one galaxy to find the true source in about 7% of cases with a three-detector network similar to at the end of our second observing run. Similarly, only ten would need to be searched in 23% of cases. It might be possible to get even better performance by considering which galaxies are most probable because they are the biggest or the most likely to produce merging binary neutron stars. This is definitely a good approach to follow.

Three-dimensional localization with galaxy catalgoue

Galaxies within the 90% credible volume of an example simulated source, colour coded by probability. The galaxies are from the GLADE Catalog; incompleteness in the plane of the Milky Way causes the missing wedge of galaxies. The true source location is marked by a cross [bonus note]. Part of Figure 5 of Del Pozzo et al. (2018).

arXiv: 1801.08009 [astro-ph.IM]
Journal: Monthly Notices of the Royal Astronomical Society; 479(1):601–614; 2018
Code: 3d_volume
Buzzword bingo: Interdisciplinary (we worked with computer scientist Tom Haines); machine learning (the inference involving our Dirichlet process Gaussian mixture model); multimessenger astronomy (as our results are useful for following up gravitational-wave signals in the search for counterparts)

Bonus notes


We started writing this paper back before the first observing run of Advanced LIGO. We had a pretty complete draft on Friday 11 September 2015. We just needed to gather together a few extra numbers and polish up the figures and we’d be done! At 10:50 am on Monday 14 September 2015, we made our first detection of gravitational waves. The paper was put on hold. The pace of discoveries over the coming years meant we never quite found enough time to get it together—I’ve rewritten the introduction a dozen times. It’s extremely satisfying to have it done. This is a shame, as it meant that this study came out much later than our other three-dimensional localization study. The delay has the advantage of justifying one of my favourite acknowledgement sections.

Sixth power

We find that the localization volume \Delta V is inversely proportional to the sixth power of the signal-to-noise ration \varrho. This is what you would expect. The localization volume depends upon the angular uncertainty on the sky \Delta \Omega, the distance to the source D, and the distance uncertainty \Delta D,

\Delta V \sim D^2 \Delta \Omega \Delta D.

Typically, the uncertainty on a parameter (like the masses) scales inversely with the signal-to-noise ratio. This is the case for the logarithm of the distance, which means

\displaystyle \frac{\Delta D}{D} \propto \varrho^{-1}.

The uncertainty in the sky location (being two dimensional) scales inversely with the square of the signal-to-noise ration,

\Delta \Omega \propto \varrho^{-2}.

The signal-to-noise ratio itself is inversely proportional to the distance to the source (sources further way are quieter. Therefore, putting everything together gives

\Delta V \propto \varrho^{-6}.


We all know that treasure is marked by a cross. In the case of a binary neutron star merger, dense material ejected from the neutron stars will decay to heavy elements like gold and platinum, so there is definitely a lot of treasure at the source location.


Going the distance: Mapping host galaxies of LIGO and Virgo sources in three dimensions using local cosmography and targeted follow-up

GW150914 claimed the title of many firsts—it was the first direct observation of gravitational waves, the first observation of a binary black hole system, the first observation of two black holes merging, the first time time we’ve tested general relativity in such extreme conditions… However, there are still many firsts for gravitational-wave astronomy yet to come (hopefully, some to be accompanied by cake). One of the most sought after, is the first is signal to have a clear electromagnetic counterpart—a glow in some part of the spectrum of light (from radio to gamma-rays) that we can observe with telescopes.

Identifying a counterpart is challenging, as it is difficult to accurately localise a gravitational-wave source. electromagnetic observers must cover a large area of sky before any counterparts fade. Then, if something is found, it can be hard to determine if that is from the same source as the gravitational waves, or some thing else…

To help the search, it helps to have as much information as possible about the source. Especially useful is the distance to the source. This can help you plan where to look. For nearby sources, you can cross-reference with galaxy catalogues, and perhaps pick out the biggest galaxies as the most likely locations for the source [bonus note]. Distance can also help plan your observations: you might want to start with regions of the sky where the source would be closer and so easiest to spot, or you may want to prioritise points where it is further and so you’d need to observe longer to detect it (I’m not sure there’s a best strategy, it depends on the telescope and the amount of observing time available). In this paper we describe a method to provide easy-to-use distance information, which could be supplied to observers to help their search for a counterpart.

Going the distance

This work is the first spin-off from the First 2 Years trilogy of papers, which looked a sky localization and parameter estimation for binary neutron stars in the first two observing runs of the advance-detector era. Binary neutron star coalescences are prime candidates for electromagnetic counterparts as we think there should be a bigger an explosion as they merge. I was heavily involved in the last two papers of the trilogy, but this study was led by Leo Singer: I think I mostly annoyed Leo by being a stickler when it came to writing up the results.

3D localization with the two LIGO detectors

Three-dimensional localization showing the 20%, 50%, and 90% credible levels for a typical two-detector early Advanced LIGO event. The Earth is shown at the centre, marked by \oplus. The true location is marked by the cross. Leo poetically described this as looking like the seeds of the jacaranda tree, and less poetically as potato chips. Figure 1 of Singer et al. (2016).

The idea is to provide a convenient means of sharing a 3D localization for a gravitational wave source. The full probability distribution is rather complicated, but it can be made more manageable if you break it up into pixels on the sky. Since astronomers need to decide where to point their telescopes, breaking up the 3D information along different lines of sight, should be useful for them.

Each pixel covers a small region of the sky, and along each line of sight, the probability distribution for distance D can be approximated using an ansatz

\displaystyle p(D|\mathrm{data}) \propto D^2\exp\left[-\frac{(D - \mu)^2}{2\sigma}\right],

where \mu and \sigma are calculated for each pixel individually.  The form of this ansatz can be understood as the posterior probability distribution is proportional to the product of the prior and the likelihood. Our prior is that sources are uniformly distributed in volume, which means \propto D^2, and the likelihood can often be well approximated as a Gaussian distribution, which gives the other piece [bonus note].

The ansatz doesn’t always fit perfectly, but it performs well on average. Considering the catalogue of binary neutron star signals used in the earlier papers, we find that roughly 50% of the time sources are found within the 50% credible volume, 90% are found in the 90% volume, etc. We looked at a more sophisticated means of constructing the localization volume in a companion paper.

The 3D localization is easy to calculate, and Leo has worked out a cunning way to evaluate the ansatz with BAYESTAR, our rapid sky-localization code, meaning that we can produce it on minute time-scales. This means that observers should have something to work with straight-away, even if we’ll need to wait a while for the full, final results. We hope that this will improve prospects for finding counterparts—some potential examples are sketched out in the penultimate section of the paper.

If you are interested in trying out the 3D information, there is a data release and the supplement contains a handy Python tutorial. We are hoping that the Collaboration will use the format for alerts for LIGO and Virgo’s upcoming observing run (O2).

arXiv: 1603.07333 [astro-ph.HE]; 1605.04242 [astro-ph.IM]
Journal: Astrophysical Journal Letters; 829(1):L15(7); 2016; Astrophysical Journal Supplement Series; 226(1):10(8); 2016
Data release: Going the distance
Favourite crisp flavour: Salt & vinegar
Favourite jacaranda: Jacaranda mimosifolia

Bonus notes

Catalogue shopping

The Event’s source has a luminosity distance of around 250–570 Mpc. This is sufficiently distant that galaxy catalogues are incomplete and not much use when it comes to searching. GW151226 and LVT151012 have similar problems, being at around the same distance or even further.

The gravitational-wave likelihood

For the professionals interested in understanding more about the shape of the likelihood, I’d recommend Cutler & Flanagan (1994). This is a fantastic paper which contains many clever things [bonus bonus note]. This work is really the foundation of gravitational-wave parameter estimation. From it, you can see how the likelihood can be approximated as a Gaussian. The uncertainty can then be evaluated using Fisher matrices. Many studies have been done using Fisher matrices, but it important to check that this is a valid approximation, as nicely explained in Vallisneri (2008). I ran into a case when it didn’t during my PhD.


As a reminder that smart people make mistakes, Cutler & Flanagan have a typo in the title of arXiv posting of their paper. This is probably the most important thing to take away from this paper.