Dirichlet Process Gaussian-mixture model: An application to localizing coalescing binary neutron stars with gravitational-wave observations

Where do gravitational waves like GW170817 come from? Using our network of detectors, we cannot pinpoint a source, but we can make a good estimate—the amplitude of the signal tells us about the distance; the time delays between the signal arriving at different detectors, and the relative amplitudes of the signal in each detector, tell us about the sky position (see the excellent video by Leo Singer below).

In this paper we look at full three-dimensional localization of gravitational-wave sources; we import a (rather cunning) technique from computer vision to construct a probability distribution for the source’s location, and then explore how well we could localise a set of simulated binary neutron stars. Knowing the source location enables lots of cool science. First, it aids direct follow-up observations with non-gravitational-wave observatories, searching for electromagnetic or neutrino counterparts. It’s especially helpful if you can cross-reference with galaxy catalogues, to find the most probable source locations (this technique was used to find the kilonova associated with GW170817). Even without finding a counterpart, knowing the most probable host galaxy helps us figure out how the source formed (have lots of stars been born recently, or are all the stars old?), and allows us to measure the expansion of the Universe. Having a reliable technique to reconstruct source locations is useful!

This was a fun paper to write [bonus note]. I’m sure it will be valuable, both for showing how to perform this type of reconstruction of a multi-dimensional probability density, and for its implications for source localization and follow-up of gravitational-wave signals. I go into the details of both below, first discussing our statistical model (this is a bit technical), then looking at our results for a set of binary neutron stars (which have implications for hunting for counterparts).

Dirichlet process Gaussian mixture model

When we analyse gravitational-wave data to infer the source properties (location, masses, etc.), we map out parameter space with a set of samples: a list of points in the parameter space, with more samples around more probable locations and fewer in less probable ones. These samples encode everything about the probability distribution for the different parameters—we just need to extract it…

For our application, we want a nice smooth probability density. How do we convert a bunch of discrete samples to a smooth distribution? The simplest thing is to bin the samples. However, picking the right bin size is difficult, and becomes much harder in higher dimensions. Another popular option is to use kernel density estimation. This is better at ensuring smooth results, but you now have to worry about the size of your kernels.
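
To make the trade-off concrete, here is a minimal sketch (in Python, and not code from the paper) of the two standard options applied to a one-dimensional set of samples; the numbers are made up purely for illustration.

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Pretend posterior samples, e.g. a distance in Mpc
samples = rng.normal(loc=100.0, scale=10.0, size=5000)

# Option 1: binning. Simple, but the result depends on the bin width we pick,
# and the number of bins needed blows up in higher dimensions.
counts, edges = np.histogram(samples, bins=30, density=True)

# Option 2: kernel density estimation. Smoother, but the kernel bandwidth
# (Scott's rule of thumb by default) is still a rule-of-thumb choice.
kde = gaussian_kde(samples)
grid = np.linspace(60.0, 140.0, 200)
density = kde(grid)  # smooth estimate of the probability density on the grid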

Our approach is in essence to use a kernel density estimate, but to learn the size and position of the kernels (as well as the number) from the data as an extra layer of inference. The “Gaussian mixture model” part of the name refers to the kernels—we use several different Gaussians. The “Dirichlet process” part refers to how we assign their properties (their means and standard deviations). What I really like about this technique, as opposed to the usual rule-of-thumb approaches used for kernel density estimation, is that it is well justified from a theoretical point of view.

I hadn’t come across a Dirichlet process before. Section 2 of the paper is a walkthrough of how I built up an understanding of this mathematical object, and it contains lots of helpful references if you’d like to dig deeper.

In our application, you can think of the Dirichlet process as being a probability distribution for probability distributions. We want a probability distribution describing the source location. Given our samples, we infer what this looks like. We could put all the probability into one big Gaussian, or we could put it into lots of little Gaussians. The Gaussians could be wide or narrow or a mix. The Dirichlet distribution allows us to assign probabilities to each configuration of Gaussians; for example, if our samples are all in the northern hemisphere, we probably want Gaussians centred around there, rather than in the southern hemisphere.
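
If you prefer to see this in code, one standard way to draw a distribution from a Dirichlet process is the stick-breaking construction. The sketch below (with a made-up concentration parameter and base distribution, and truncated to a finite number of components) produces a set of weights, means and widths; each such draw is itself a mixture of Gaussians, i.e. a probability distribution, which is what makes the Dirichlet process a probability distribution for probability distributions.

import numpy as np

rng = np.random.default_rng(0)
alpha = 1.0   # concentration: small alpha favours a few dominant Gaussians
n_max = 50    # truncation of the (formally infinite) number of components

# Stick-breaking weights: w_k = beta_k * prod_{j<k} (1 - beta_j), beta_k ~ Beta(1, alpha)
betas = rng.beta(1.0, alpha, size=n_max)
weights = betas * np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))

# Component parameters drawn from a (made-up) base distribution
means = rng.normal(0.0, 100.0, size=n_max)   # e.g. positions in Mpc
widths = rng.gamma(2.0, 5.0, size=n_max)     # kernel standard deviations in Mpc

# The triple (weights, means, widths) defines one Gaussian mixture; repeating
# the draw gives a different mixture, so the process is a distribution over
# distributions.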

With the resulting probability distribution for the source location, we can quickly evaluate it at a single point. This means we can rapidly produce a list of most probable source galaxies—extremely handy if you need to know where to point a telescope before a kilonova fades away (or someone else finds it).
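
As a rough illustration of the workflow (this is not the paper’s implementation, which is the 3d_volume code linked below, but scikit-learn’s variational Dirichlet process Gaussian mixture used as a stand-in), you can fit the mixture to the three-dimensional position samples and then evaluate the density at candidate galaxy positions to rank them. All of the coordinates here are placeholders.

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)

# Placeholder posterior samples of the source position (x, y, z) in Mpc
position_samples = rng.normal([30.0, -45.0, 80.0], [5.0, 5.0, 15.0], size=(4000, 3))

# Variational Dirichlet process Gaussian mixture fitted to the samples
dpgmm = BayesianGaussianMixture(
    n_components=20,                                      # truncation level
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
).fit(position_samples)

# Placeholder galaxy positions; rank them by the reconstructed density
galaxy_positions = rng.uniform([0.0, -90.0, 0.0], [60.0, 0.0, 200.0], size=(1000, 3))
log_density = dpgmm.score_samples(galaxy_positions)
top_ten = np.argsort(log_density)[::-1][:10]              # ten most probable hosts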

Gravitational-wave localization

To verify our technique works, and to develop an intuition for three-dimensional localizations, we studied a set of simulated binary neutron star signals created for the First 2 Years trilogy of papers. This data set is now well studied; it illustrates the performance we anticipated for the first two observing runs of the advanced detectors, which turned out to be not too far from the truth. We have previously looked at three-dimensional localizations for these signals using a super rapid approximation.

The plots below show how well we could localise the sources of our binary neutron star signals. Specifically, the plots show the size of the volume which has a 90% probability of containing the source versus the signal-to-noise ratio (the loudness) of the signal. Typically, volumes are 10^4–10^5~\mathrm{Mpc}^3, which is about 10^{68}–10^{69} Olympic swimming pools. Such a volume would contain something like 100–1000 galaxies.
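
If you would like to check the swimming-pool conversion yourself, here is the back-of-the-envelope arithmetic (assuming a 50 m × 25 m × 2 m Olympic pool).

Mpc_in_m = 3.086e22                  # one megaparsec in metres
pool_m3 = 50.0 * 25.0 * 2.0          # Olympic pool volume in cubic metres

volume_Mpc3 = 1e4                    # lower end of the typical localization volume
volume_m3 = volume_Mpc3 * Mpc_in_m**3
print(f"{volume_m3 / pool_m3:.1e} Olympic swimming pools")   # ~1.2e+68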

Volume versus signal-to-noise ratio

Localization volume as a function of signal-to-noise ratio. The top panel shows results for two-detector observations: the LIGO-Hanford and LIGO-Livingston (HL) network similar to that in the first observing run, and the LIGO and Virgo (HLV) network similar to that in the second observing run. The bottom panel shows all observations for the HLV network, including those with all three detectors, which are colour coded by the fraction of the total signal-to-noise ratio from Virgo. In both panels, there are fiducial lines scaling inversely with the sixth power of the signal-to-noise ratio. Adapted from Fig. 4 of Del Pozzo et al. (2018).

Looking at the results in detail, we can learn a number of things:

  1. The localization volume is roughly inversely proportional to the sixth power of the signal-to-noise ratio [bonus note]. Loud signals are localized much better than quieter ones!
  2. The localization dramatically improves when we have three-detector observations. The extra detector improves the sky localization, which reduces the localization volume.
  3. To get the benefit of the extra detector, the source needs to be close enough that all the detectors could get a decent amount of the signal-to-noise ratio. In our case, Virgo is the least sensitive, and we see that the best localizations are when it has a fair share of the signal-to-noise ratio.
  4. Considering the cases where we only have two detectors, localization volumes get bigger at a given signal-to-noise ratio as the detectors get more sensitive. This is because we can detect sources at greater distances.

Putting all these bits together, I think in the future, when we have lots of detections, it would make most sense to prioritise following up the loudest signals. These are the best localised, and will also be the brightest since they are the closest, meaning there’s the greatest potential for actually finding a counterpart. As the sensitivity of the detectors improves, it’s only going to get more difficult to find a counterpart to a typical gravitational-wave signal, as sources will be further away and less well localized. However, having more sensitive detectors also means that we are more likely to have a really loud signal, which should be really well localized.

Banana vs cucumber

Left: Localization (yellow) with a network of two low-sensitivity detectors. The sky location is uncertain, but we know the source must be nearby. Right: Localization (green) with a network of three high-sensitivity detectors. We have good constraints on the source location, but it could now be at a much greater range of distances. Not to scale.

Using our localization volumes as a guide, you would only need to search one galaxy to find the true source in about 7% of cases with a three-detector network similar to that at the end of our second observing run. Similarly, only ten galaxies would need to be searched in 23% of cases. It might be possible to get even better performance by considering which galaxies are most probable because they are the biggest or the most likely to produce merging binary neutron stars. This is definitely a good approach to follow.

Three-dimensional localization with galaxy catalogue

Galaxies within the 90% credible volume of an example simulated source, colour coded by probability. The galaxies are from the GLADE Catalog; incompleteness in the plane of the Milky Way causes the missing wedge of galaxies. The true source location is marked by a cross [bonus note]. Part of Figure 5 of Del Pozzo et al. (2018).

arXiv: 1801.08009 [astro-ph.IM]
Journal: Monthly Notices of the Royal Astronomical Society; 479(1):601–614; 2018
Code: 3d_volume
Buzzword bingo: Interdisciplinary (we worked with computer scientist Tom Haines); machine learning (the inference involving our Dirichlet process Gaussian mixture model); multimessenger astronomy (as our results are useful for following up gravitational-wave signals in the search for counterparts)

Bonus notes

Writing

We started writing this paper back before the first observing run of Advanced LIGO. We had a pretty complete draft on Friday 11 September 2015. We just needed to gather together a few extra numbers and polish up the figures and we’d be done! At 10:50 am on Monday 14 September 2015, we made our first detection of gravitational waves. The paper was put on hold. The pace of discoveries over the coming years meant we never quite found enough time to get it together—I’ve rewritten the introduction a dozen times. This is a shame, as it meant that this study came out much later than our other three-dimensional localization study. However, the delay has the advantage of justifying one of my favourite acknowledgement sections, and it’s extremely satisfying to finally have it done.

Sixth power

We find that the localization volume \Delta V is inversely proportional to the sixth power of the signal-to-noise ratio \varrho. This is what you would expect. The localization volume depends upon the angular uncertainty on the sky \Delta \Omega, the distance to the source D, and the distance uncertainty \Delta D,

\Delta V \sim D^2 \Delta \Omega \Delta D.

Typically, the uncertainty on a parameter (like the masses) scales inversely with the signal-to-noise ratio. This is the case for the logarithm of the distance, which means

\displaystyle \frac{\Delta D}{D} \propto \varrho^{-1}.

The uncertainty in the sky location (being two dimensional) scales inversely with the square of the signal-to-noise ratio,

\Delta \Omega \propto \varrho^{-2}.

The signal-to-noise ratio itself is inversely proportional to the distance to the source (sources further away are quieter), so D \propto \varrho^{-1}, and hence \Delta D \propto D \varrho^{-1} \propto \varrho^{-2}. Therefore, putting everything together gives

\Delta V \propto \varrho^{-6}.

Treasure

We all know that treasure is marked by a cross. In the case of a binary neutron star merger, dense material ejected from the neutron stars will decay to heavy elements like gold and platinum, so there is definitely a lot of treasure at the source location.


On symmetry

Dave Green only combs half of his beard, the rest follows by symmetry. — Dave Green Facts

Physicists love symmetry! Using symmetry can dramatically simplify a problem. The concept of symmetry is at the heart of modern theoretical physics and some of the most beautiful of scientific results.

In this post, I’ll give a brief introduction to how physicists think about symmetry. Symmetry can be employed in a number of ways when tackling a problem; we’ll have a look at how they can help you ask the right question and then check that your answer makes sense. In a future post I hope to talk about Noether’s Theorem, my all-time favourite result in theoretical physics, which is deeply entwined with the concept of symmetry. First, we shall discuss what we mean when we talk about symmetry.

What is symmetry?

We say something is symmetric with respect to a particular operation if it is unchanged after that operation. That might sound rather generic, but that’s because the operation can be practically anything. Let’s consider a few examples:

  • Possibly the most familiar symmetry would be reflection symmetry, when something is identical to its mirror image. Something has reflection symmetry if it is invariant under switching left and right. Squares have reflection symmetry along lines in the middle of their sides and along their diagonals, rectangles only have reflection symmetry along the lines in the middle of their sides, and circles have reflection symmetry through any line that goes through their centre.
    The Star Trek Mirror Universe actually does not have reflection symmetry with our own Universe. First, they switch good and evil, rather than left and right, and second, after this transformation, we can tell the two universes apart by checking Spock’s beard.
  • Rotational symmetry is when an object is identical after being rotated. Squares are the same after a 90° rotation, rectangles are the same after a 180° rotation, and circles are the same after a rotation by any angle. There is a link between the rotational symmetry of these shapes and their mirror symmetry: you can combine two reflections to make a rotation. With rotations we have seen that symmetries can either be discrete, as for a square when we have to rotate by multiples of 90°, or continuous, as for the circle where we can pick any angle we like.
  • Translational symmetry is similar to rotational symmetry, but is when an object is the same when shifted along a particular direction. This could be a spatial direction, so shifting everything to the left, or in time. These are a little more difficult to apply to the real world than the simplified models that physicists like to imagine.
    For translational invariance, imagine an infinite, flat plane, the same in all directions. This would be translationally invariant in any direction parallel to the ground. It would be a terrible place to lose your keys. If you can imagine an infinite blob of tangerine jelly, that is entirely the same in all directions, we can translate in any direction we like. We think the Universe is pretty much like this on the largest scales (where details like galaxies are no longer important), except it’s not quite as delicious.
    The backgrounds in some Scooby-Doo cartoons show periodic translational invariance: they repeat on a loop, so if you translate by the right amount they are the same. This is a discrete symmetry, just like rotating by a fixed angle. Similarly, if you have a rigid daily routine, such that you do the same thing at the same time every day, then your schedule is symmetric with respect to a time translation of 24 hours.
  • Exchange symmetry is when you can swap two (or more) things. If you are building a LEGO model, you can switch two bricks of the same size and colour and end up with the same result, hence it is symmetric under the exchange of those bricks. The idea that we have the same physical system when we swap two particles, like two electrons, is important in quantum mechanics. In my description of translational symmetry, I could equally well have used lime jelly instead of tangerine, or even strawberry, hence the argument is symmetric under exchange of flavours. The symmetry is destroyed should we eat the infinite jelly Universe (we might also get stomach ache).
    Mario and Luigi are not symmetric under exchange, as anyone who has tried to play multiplayer Super Mario Bros. will know, as Luigi is the better jumper and has the better moustache.

There are lots more potential symmetries. Some used by physicists seem quite obscure, such as Lorentz symmetry, but the important thing to remember is that a symmetry of a system means we get the same thing back after a transformation.

Sometimes we consider approximate symmetries, when something is almost the same under a transformation. Coke and Pepsi are approximately exchange symmetric: try switching them for yourself. They are similar, but it is possible to tell them apart. The Earth has approximate rotational symmetry, but it is not exact as it is lumpy. The spaceship at the start of Spaceballs has approximate translational invariance: it just keeps going and going, but the symmetry is not exact as it does end eventually, so the symmetry only applies to the middle.

How to use symmetry

When studying for an undergraduate degree in physics, one of the first things you come to appreciate is that some coordinate systems make problems much easier than others. Coordinates are the set of numbers that describe a position in some space. The most familiar are Cartesian coordinates, when you use x and y to describe horizontal and vertical position respectively. Cartesian coordinates give you a nice grid with everything at right-angles. Undergrad students often like to stick with Cartesian coordinates as they are straightforward and familiar. However, they can be a pain when describing a circle. If we want to plot the curve five units from the origin of our coordinate system (0,\,0), we have to solve \sqrt{x^2 + y^2} = 5. However, if we used a polar coordinate system, it would simply be r = 5. By using coordinates that match the symmetry of our system we greatly simplify the problem!
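
As a small illustration (a sketch rather than anything profound), here is the circle example in both coordinate systems.

import numpy as np

# Cartesian: solving sqrt(x^2 + y^2) = 5 gives two branches for y, and only
# works for |x| <= 5.
x = np.linspace(-5.0, 5.0, 101)
y_upper = np.sqrt(25.0 - x**2)
y_lower = -np.sqrt(25.0 - x**2)

# Polar: the same curve is simply r = 5 at every angle phi.
phi = np.linspace(0.0, 2.0 * np.pi, 101)
r = np.full_like(phi, 5.0)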

Treasure map

Pirates are trying to figure out where they buried their treasure. They know it’s 5 yarrrds from the doughnut. Calculating positions using Cartesian coordinates is difficult, but they are good for specifying specific locations, like that of the palm tree.

Treasure map

Using polar coordinates, it is easy to specify the location of points 5 yarrrds from the doughnut. Pirates prefer using polar coordinates; they really like using r.

Picking a coordinate system for a problem should depend on the symmetries of the system. If we had a system that was translation invariant, Cartesian coordinates are the best to use. If the system was invariant with respect to translation in the horizontal direction, then we know that our answer should not depend on x. If we have a system that is rotation invariant, polar coordinates are the best, as we should get an answer that doesn’t depend on the rotation angle \varphi. By understanding symmetries, we can formulate our analysis of the problem such that we ask the best questions.

At the end of my undergrad degree, my friends and I went along to an awards ceremony. I think we were hoping they’d have the miniature éclairs they normally had for special occasions. There was a chap from an evil corporation™ giving away branded clocks that apparently ran on water. We were fairly convinced there was more to it than that, so, as now fully qualified physicists, we thought we should be able to figure it out. We quickly came up with two ideas: that there was some powder inside the water tank that reacted with the water to produce energy, or that the electrodes reacted in a similar way to in a potato clock. We then started to argue about how to figure this out. At this point, Peter Littlewood, then head of the Cavendish Laboratory, wandered over. We explained the problem, but not our ideas. Immediately, he said that it must be to do with the electrodes due to symmetry. Current flows to power the clock. It’ll either flow left to right through the tank, or right to left. It doesn’t matter which, but the important thing is the clock can’t have reflection symmetry. If it did, there would be no preferred direction for the current to flow. To break the symmetry, the two (similar looking) electrodes must actually be different (and hence the potato clock theory is along the right lines). My friends and I all felt appropriately impressed and humbled, but it served as a good reminder that a simple concept like symmetry can be a powerful tool.

A concept I now try to impress upon my students is to use symmetry to guide their answers. Most are happy enough to use symmetry for error checking: if the solution is meant to have rotational symmetry and their answer depends on \varphi they know they’ve made a mistake. However, symmetry can sometimes directly tell you the answer.

Let’s imagine that you’ve baked a perfect doughnut, such that it has rotational symmetry. For some reason you sprinkled it with an even coating of electrons instead of hundreds and thousands. We now want to calculate the electric field surrounding the doughnut (for obvious reasons). The electric field tells us which way charges are pushed/pulled. We’d expect positive charges to be attracted towards our negatively charged doughnut. There should be a radial electric field to pull positive charges in, but since it has rotational symmetry, there shouldn’t be any field in the \varphi direction, as there’s no reason for charges to be pulled clockwise or anticlockwise round our doughnut. Therefore, we should be able to write down immediately that the electric field in the \varphi direction is zero, by symmetry.
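
In symbols (a sketch of the same argument, using cylindrical coordinates r, \varphi and z about the doughnut’s axis), the symmetry restricts the field to the form

\mathbf{E} = E_r(r,\,z)\,\hat{\mathbf{r}} + E_z(r,\,z)\,\hat{\mathbf{z}}, \qquad E_\varphi = 0,

with no dependence on \varphi anywhere.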

Most undergrads, though, will feel that this is cheating, and will often attempt to do all the algebra (hopefully using polar coordinates). Some will get this wrong, although there might be a few who are smart enough to note that their answer must be incorrect because of the symmetry. If symmetry tells you the answer, use it! Although it is good to practise your algebra (you get better by training), you can’t learn anything more than you already knew by symmetry. Working efficiently isn’t cheating, it’s smart.

Symmetry is a useful tool for problem solving, and something that everyone should make use of.