Accuracy of inference on the physics of binary evolution from gravitational-wave observations

Gravitational-wave astronomy lets us observing binary black holes. These systems, being made up of two black holes, are pretty difficult to study by any other means. It has long been argued that with this new information we can unravel the mysteries of stellar evolution. Just as a palaeontologist can discover how long-dead animals lived from their bones, we can discover how massive stars lived by studying their black hole remnants. In this paper, we quantify how much we can really learn from this black hole palaeontology—after 1000 detections, we should pin down some of the most uncertain parameters in binary evolution to a few percent precision.

Life as a binary

There are many proposed ways of making a binary black hole. The current leading contender is isolated binary evolution: start with a binary star system (most stars are in binaries or higher multiples, our lonesome Sun is a little unusual), and let the stars evolve together. Only a fraction will end with black holes close enough to merge within the age of the Universe, but these would be the sources of the signals we see with LIGO and Virgo. We consider this isolated binary scenario in this work [bonus note].

Now, you might think that with stars being so fundamentally important to astronomy, and with binary stars being so common, we’d have the evolution of binaries figured out by now. It turns out it’s actually pretty messy, so there’s lots of work to do. We consider constraining four parameters which describe the bits of binary physics which we are currently most uncertain of:

  • Black hole natal kicks—the push black holes receive when they are born in supernova explosions. We now the neutron stars get kicks, but we’re less certain for black holes [bonus note].
  • Common envelope efficiency—one of the most intricate bits of physics about binaries is how mass is transferred between stars. As they start exhausting their nuclear fuel they puff up, so material from the outer envelope of one star may be stripped onto the other. In the most extreme cases, a common envelope may form, where so much mass is piled onto the companion, that both stars live in a single fluffy envelope. Orbiting inside the envelope helps drag the two stars closer together, bringing them closer to merging. The efficiency determines how quickly the envelope becomes unbound, ending this phase.
  • Mass loss rates during the Wolf–Rayet (not to be confused with Wolf 359) and luminous blue variable phases–stars lose mass through out their lives, but we’re not sure how much. For stars like our Sun, mass loss is low, there is enough to gives us the aurora, but it doesn’t affect the Sun much. For bigger and hotter stars, mass loss can be significant. We consider two evolutionary phases of massive stars where mass loss is high, and currently poorly known. Mass could be lost in clumps, rather than a smooth stream, making it difficult to measure or simulate.

We use parameters describing potential variations in these properties are ingredients to the COMPAS population synthesis code. This rapidly (albeit approximately) evolves a population of stellar binaries to calculate which will produce merging binary black holes.

The question is now which parameters affect our gravitational-wave measurements, and how accurately we can measure those which do?

Merger rate with redshift and chirp mass

Binary black hole merger rate at three different redshifts z as calculated by COMPAS. We show the rate in 30 different chirp mass bins for our default population parameters. The caption gives the total rate for all masses. Figure 2 of Barrett et al. (2018)

Gravitational-wave observations

For our deductions, we use two pieces of information we will get from LIGO and Virgo observations: the total number of detections, and the distributions of chirp masses. The chirp mass is a combination of the two black hole masses that is often well measured—it is the most important quantity for controlling the inspiral, so it is well measured for low mass binaries which have a long inspiral, but is less well measured for higher mass systems. In reality we’ll have much more information, so these results should be the minimum we can actually do.

We consider the population after 1000 detections. That sounds like a lot, but we should have collected this many detections after just 2 or 3 years observing at design sensitivity. Our default COMPAS model predicts 484 detections per year of observing time! Honestly, I’m a little scared about having this many signals…

For a set of population parameters (black hole natal kick, common envelope efficiency, luminous blue variable mass loss and Wolf–Rayet mass loss), COMPAS predicts the number of detections and the fraction of detections as a function of chirp mass. Using these, we can work out the probability of getting the observed number of detections and fraction of detections within different chirp mass ranges. This is the likelihood function: if a given model is correct we are more likely to get results similar to its predictions than further away, although we expect their to be some scatter.

If you like equations, the from of our likelihood is explained in this bonus note. If you don’t like equations, there’s one lurking in the paragraph below. Just remember, that it can’t see you if you don’t move. It’s OK to skip the equation.

To determine how sensitive we are to each of the population parameters, we see how the likelihood changes as we vary these. The more the likelihood changes, the easier it should be to measure that parameter. We wrap this up in terms of the Fisher information matrix. This is defined as

\displaystyle F_{ij} = -\left\langle\frac{\partial^2\ln \mathcal{L}(\mathcal{D}|\left\{\lambda\right\})}{\partial \lambda_i \partial\lambda_j}\right\rangle,

where \mathcal{L}(\mathcal{D}|\left\{\lambda\right\}) is the likelihood for data \mathcal{D} (the number of observations and their chirp mass distribution in our case), \left\{\lambda\right\} are our parameters (natal kick, etc.), and the angular brackets indicate the average over the population parameters. In statistics terminology, this is the variance of the score, which I think sounds cool. The Fisher information matrix nicely quantifies how much information we can lean about the parameters, including the correlations between them (so we can explore degeneracies). The inverse of the Fisher information matrix gives a lower bound on the covariance matrix (the multidemensional generalisation of the variance in a normal distribution) for the parameters \left\{\lambda\right\}. In the limit of a large number of detections, we can use the Fisher information matrix to estimate the accuracy to which we measure the parameters [bonus note].

We simulated several populations of binary black hole signals, and then calculate measurement uncertainties for our four population uncertainties to see what we could learn from these measurements.


Using just the rate information, we find that we can constrain a combination of the common envelope efficiency and the Wolf–Rayet mass loss rate. Increasing the common envelope efficiency ends the common envelope phase earlier, leaving the binary further apart. Wider binaries take longer to merge, so this reduces the merger rate. Similarly, increasing the Wolf–Rayet mass loss rate leads to wider binaries and smaller black holes, which take longer to merge through gravitational-wave emission. Since the two parameters have similar effects, they are anticorrelated. We can increase one and still get the same number of detections if we decrease the other. There’s a hint of a similar correlation between the common envelope efficiency and the luminous blue variable mass loss rate too, but it’s not quite significant enough for us to be certain it’s there.

Correaltions between population parameters

Fisher information matrix estimates for fractional measurement precision of the four population parameters: the black hole natal kick \sigma_\mathrm{kick}, the common envelope efficiency \alpha_\mathrm{CE}, the Wolf–Rayet mass loss rate f_\mathrm{WR}, and the luminous blue variable mass loss rate f_\mathrm{LBV}. There is an anticorrealtion between f_\mathrm{WR} and \alpha_\mathrm{CE}, and hints at a similar anticorrelation between f_|mathrm{LBV} and \alpha_\mathrm{CE}. We show 1500 different realisations of the binary population to give an idea of scatter. Figure 6 of Barrett et al. (2018)

Adding in the chirp mass distribution gives us more information, and improves our measurement accuracies. The fraction uncertainties are about 2% for the two mass loss rates and the common envelope efficiency, and about 5% for the black hole natal kick. We’re less sensitive to the natal kick because the most massive black holes don’t receive a kick, and so are unaffected by the kick distribution [bonus note]. In any case, these measurements are exciting! With this type of precision, we’ll really be able to learn something about the details of binary evolution.

Standard deviation of measurements of population parameters

Measurement precision for the four population parameters after 1000 detections. We quantify the precision with the standard deviation estimated from the Fisher inforamtion matrix. We show results from 1500 realisations of the population to give an idea of scatter. Figure 5 of Barrett et al. (2018)

The accuracy of our measurements will improve (on average) with the square root of the number of gravitational-wave detections. So we can expect 1% measurements after about 4000 observations. However, we might be able to get even more improvement by combining constraints from other types of observation. Combining different types of observation can help break degeneracies. I’m looking forward to building a concordance model of binary evolution, and figuring out exactly how massive stars live their lives.

arXiv: 1711.06287 [astro-ph.HE]
Journal: Monthly Notices of the Royal Astronomical Society; 477(4):4685–4695; 2018
Favourite dinosaur: Professor Science

Bonus notes

Channel selection

In practise, we will need to worry about how binary black holes are formed, via isolated evolution or otherwise, before inferring the parameters describing binary evolution. This makes the problem more complicated. Some parameters, like mass loss rates or black hole natal kicks, might be common across multiple channels, while others are not. There are a number of ways we might be able to tell different formation mechanisms apart, such as by using spin measurements.

Kick distribution

We model the supernova kicks v_\mathrm{kick} as following a Maxwell–Boltzmann distribution,

\displaystyle p(v_\mathrm{kick}) = \sqrt{\frac{2}{\pi}}  \frac{v_\mathrm{kick}^2}{\sigma_\mathrm{kick}^3} \exp\left(\frac{-v_\mathrm{kick}^2}{2\sigma_\mathrm{kick}^2}\right),

where \sigma_\mathrm{kick} is the unknown population parameter. The natal kick received by the black hole v^*_\mathrm{kick} is not the same as this, however, as we assume some of the material ejected by the supernova falls back, reducing the over kick. The final natal kick is

v^*_\mathrm{kick} = (1-f_\mathrm{fb})v_\mathrm{kick},

where f_\mathrm{fb} is the fraction that falls back, taken from Fryer et al. (2012). The fraction is greater for larger black holes, so the biggest black holes get no kicks. This means that the largest black holes are unaffected by the value of \sigma_\mathrm{kick}.

The likelihood

In this analysis, we have two pieces of information: the number of detections, and the chirp masses of the detections. The first is easy to summarise with a single number. The second is more complicated, and we consider the fraction of events within different chirp mass bins.

Our COMPAS model predicts the merger rate \mu and the probability of falling in each chirp mass bin p_k (we factor measurement uncertainty into this). Our observations are the the total number of detections N_\mathrm{obs} and the number in each chirp mass bin c_k (N_\mathrm{obs} = \sum_k c_k). The likelihood is the probability of these observations given the model predictions. We can split the likelihood into two pieces, one for the rate, and one for the chirp mass distribution,

\mathcal{L} = \mathcal{L}_\mathrm{rate} \times \mathcal{L}_\mathrm{mass}.

For the rate likelihood, we need the probability of observing N_\mathrm{obs} given the predicted rate \mu. This is given by a Poisson distribution,

\displaystyle \mathcal{L}_\mathrm{rate} = \exp(-\mu t_\mathrm{obs}) \frac{(\mu t_\mathrm{obs})^{N_\mathrm{obs}}}{N_\mathrm{obs}!},

where t_\mathrm{obs} is the total observing time. For the chirp mass likelihood, we the probability of getting a number of detections in each bin, given the predicted fractions. This is given by a multinomial distribution,

\displaystyle \mathcal{L}_\mathrm{mass} = \frac{N_\mathrm{obs}!}{\prod_k c_k!} \prod_k p_k^{c_k}.

These look a little messy, but they simplify when you take the logarithm, as we need to do for the Fisher information matrix.

When we substitute in our likelihood into the expression for the Fisher information matrix, we get

\displaystyle F_{ij} = \mu t_\mathrm{obs} \left[ \frac{1}{\mu^2} \frac{\partial \mu}{\partial \lambda_i} \frac{\partial \mu}{\partial \lambda_j}  + \sum_k\frac{1}{p_k} \frac{\partial p_k}{\partial \lambda_i} \frac{\partial p_k}{\partial \lambda_j} \right].

Conveniently, although we only need to evaluate first-order derivatives, even though the Fisher information matrix is defined in terms of second derivatives. The expected number of events is \langle N_\mathrm{obs} \rangle = \mu t_\mathrm{obs}. Therefore, we can see that the measurement uncertainty defined by the inverse of the Fisher information matrix, scales on average as N_\mathrm{obs}^{-1/2}.

For anyone worrying about using the likelihood rather than the posterior for these estimates, the high number of detections [bonus note] should mean that the information we’ve gained from the data overwhelms our prior, meaning that the shape of the posterior is dictated by the shape of the likelihood.

Interpretation of the Fisher information matrix

As an alternative way of looking at the Fisher information matrix, we can consider the shape of the likelihood close to its peak. Around the maximum likelihood point, the first-order derivatives of the likelihood with respect to the population parameters is zero (otherwise it wouldn’t be the maximum). The maximum likelihood values of $latex N_\mathrm{obs} = \mu t_\mathrm{obs}$ and c_k = N_\mathrm{obs} p_k are the same as their expectation values. The second-order derivatives are given by the expression we have worked out for the Fisher information matrix. Therefore, in the region around the maximum likelihood point, the Fisher information matrix encodes all the relevant information about the shape of the likelihood.

So long as we are working close to the maximum likelihood point, we can approximate the distribution as a multidimensional normal distribution with its covariance matrix determined by the inverse of the Fisher information matrix. Our results for the measurement uncertainties are made subject to this approximation (which we did check was OK).

Approximating the likelihood this way should be safe in the limit of large N_\mathrm{obs}. As we get more detections, statistical uncertainties should reduce, with the peak of the distribution homing in on the maximum likelihood value, and its width narrowing. If you take the limit of N_\mathrm{obs} \rightarrow \infty, you’ll see that the distribution basically becomes a delta function at the maximum likelihood values. To check that our N_\mathrm{obs} = 1000 was large enough, we verified that higher-order derivatives were still small.

Michele Vallisneri has a good paper looking at using the Fisher information matrix for gravitational wave parameter estimation (rather than our problem of binary population synthesis). There is a good discussion of its range of validity. The high signal-to-noise ratio limit for gravitational wave signals corresponds to our high number of detections limit.



Hierarchical analysis of gravitational-wave measurements of binary black hole spin–orbit misalignments

Gravitational waves allow us to infer the properties of binary black holes (two black holes in orbit about each other), but can we use this information to figure out how the black holes and the binary form? In this paper, we show that measurements of the black holes’ spins can help us this out, but probably not until we have at least 100 detections.

Black hole spins

Black holes are described by their masses (how much they bend spacetime) and their spins (how much they drag spacetime to rotate about them). The orientation of the spins relative to the orbit of the binary could tell us something about the history of the binary [bonus note].

We considered four different populations of spin–orbit alignments to see if we could tell them apart with gravitational-wave observations:

  1. Aligned—matching the idealised example of isolated binary evolution. This stands in for the case where misalignments are small, which might be the case if material blown off during a supernova ends up falling back and being swallowed by the black hole.
  2. Isotropic—matching the expectations for dynamically formed binaries.
  3. Equal misalignments at birth—this would be the case if the spins and orbit were aligned before the second supernova, which then tilted the plane of the orbit. (As the binary inspirals, the spins wobble around, so the two misalignment angles won’t always be the same).
  4. Both spins misaligned by supernova kicks, assuming that the stars were aligned with the orbit before exploding. This gives a more general scatter of unequal misalignments, but typically the primary (bigger and first forming) black hole is more misaligned.

These give a selection of possible spin alignments. For each, we assumed that the spin magnitude was the same and had a value of 0.7. This seemed like a sensible idea when we started this study [bonus note], but is now towards the upper end of what we expect for binary black holes.

Hierarchical analysis

To measurement the properties of the population we need to perform a hierarchical analysis: there are two layers of inference, one for the individual binaries, and one of the population.

From a gravitational wave signal, we infer the properties of the source using Bayes’ theorem. Given the data d_\alpha, we want to know the probability that the parameters \mathbf{\Theta}_\alpha have different values, which is written as p(\mathbf{\Theta}_\alpha|d_\alpha). This is calculated using

\displaystyle p(\mathbf{\Theta}_\alpha|d_\alpha) = \frac{p(d_\alpha | \mathbf{\Theta}_\alpha) p(\mathbf{\Theta}_\alpha)}{p(d_\alpha)},

where p(d_\alpha | \mathbf{\Theta}_\alpha) is the likelihood, which we can calculate from our knowledge of the noise in our gravitational wave detectors, p(\mathbf{\Theta}_\alpha) is the prior on the parameters (what we would have guessed before we had the data), and the normalisation constant p(d_\alpha) is called the evidence. We’ll use the evidence again in the next layer of inference.

Our prior on the parameters should actually depend upon what we believe about the astrophysical population. It is different if we believed that Model 1 were true (when we’d only consider aligned spins) than for Model 2. Therefore, we should really write

\displaystyle p(\mathbf{\Theta}_\alpha|d_\alpha, \lambda) = \frac{p(d_\alpha | \mathbf{\Theta}_\alpha,\lambda) p(\mathbf{\Theta}_\alpha,\lambda)}{p(d_\alpha|\lambda)},

where  \lambda denotes which model we are considering.

This is an important point to remember: if you our using our LIGO results to test your theory of binary formation, you need to remember to correct for our choice of prior. We try to pick non-informative priors—priors that don’t make strong assumptions about the physics of the source—but this doesn’t mean that they match what would be expected from your model.

We are interested in the probability distribution for the different models: how many binaries come from each. Given a set of different observations \{d_\alpha\}, we can work this out using another application of Bayes’ theorem (yay)

\displaystyle p(\mathbf{\lambda}|\{d_\alpha\}) = \frac{p(\{d_\alpha\} | \mathbf{\lambda}) p(\mathbf{\lambda})}{p(\{d_\alpha\})},

where p(\{d_\alpha\} | \mathbf{\lambda}) is just all the evidences for the individual events (given that model) multiplied together, p(\mathbf{\lambda}) is our prior for the different models, and p(\{d_\alpha\}) is another normalisation constant.

Now knowing how to go from a set of observations to the probability distribution on the different channels, let’s give it a go!


To test our approach made a set of mock gravitational wave measurements. We generated signals from binaries for each of our four models, and analysed these as we would for real signals (using LALInference). This is rather computationally expensive, and we wanted a large set of events to analyse, so using these results as a guide, we created a larger catalogue of approximate distributions for the inferred source parameters p(\mathbf{\Theta}_\alpha|d_\alpha). We then fed these through our hierarchical analysis. The GIF below shows how measurements of the fraction of binaries from each population tightens up as we get more detections: the true fraction is marked in blue.

Fraction of binaries from each of the four models

Probability distribution for the fraction of binaries from each of our four spin misalignment populations for different numbers of observations. The blue dot marks the true fraction: and equal fraction from all four channels.

The plot shows that we do zoom in towards the true fraction of events from each model as the number of events increases, but there are significant degeneracies between the different models. Notably, it is difficult to tell apart Models 1 and 3, as both have strong support for both spins being nearly aligned. Similarly, there is a degeneracy between Models 2 and 4 as both allow for the two spins to have very different misalignments (and for the primary spin, which is the better measured one, to be quite significantly misaligned).

This means that we should be able to distinguish aligned from misaligned populations (we estimated that as few as 5 events would be needed to distinguish the case that all events came from either Model 1  or Model 2 if those were the only two allowed possibilities). However, it will be more difficult to distinguish different scenarios which only lead to small misalignments from each other, or disentangle whether there is significant misalignment due to big supernova kicks or because binaries are formed dynamically.

The uncertainty of the fraction of events from each model scales roughly with the square root of the number of observations, so it may be slow progress making these measurements. I’m not sure whether we’ll know the answer to how binary black hole form, or who will sit on the Iron Throne first.

arXiv: 1703.06873 [astro-ph.HE]
Journal: Monthly Notices of the Royal Astronomical Society471(3):2801–2811; 2017
Birmingham science summary: Hierarchical analysis of gravitational-wave measurements of binary black hole spin–orbit misalignment (by Simon)
If you like this you might like: Farr et al. (2017)Talbot & Thrane (2017), Vitale et al. (2017), Trifirò et al. (2016), Minogue (2000)

Bonus notes

Spin misalignments and formation histories

If you have two stars forming in a binary together, you’d expect them to be spinning in roughly the same direction, rotating the same way as they go round in their orbit (like our Solar System). This is because they all formed from the same cloud of swirling gas and dust. Furthermore, if two stars are to form a black hole binary that we can detect gravitational waves from, they need to be close together. This means that there can be tidal forces which gently tug the stars to align their rotation with the orbit. As they get older, stars puff up, meaning that if you have a close-by neighbour, you can share outer layers. This transfer of material will tend to align rotate too. Adding this all together, if you have an isolated binary of stars, you might expect that when they collapse down to become black holes, their spins are aligned with each other and the orbit.

Unfortunately, real astrophysics is rarely so clean. Even if the stars were initially rotating the same way as each other, they doesn’t mean that their black hole remnants will do the same. This depends upon how the star collapses. Massive stars explode as supernova, blasting off their outer layers while their cores collapse down to form black holes. Escaping material could carry away angular momentum, meaning that the black hole is spinning in a different direction to its parent star, or material could be blasted off asymmetrically, giving the new black hole a kick. This would change the plane of the binary’s orbit, misaligning the spins.

Alternatively, the binary could be formed dynamically. Instead of two stars living their lives together, we could have two stars (or black holes) come close enough together to form a binary. This is likely to happen in regions where there’s a high density of stars, such as a globular cluster. In this case, since the binary has been randomly assembled, there’s no reason for the spins to be aligned with each other or the orbit. For dynamically assembled binaries, all spin–orbit misalignments are equally probable.

Slow and steady

This project was led by Simon Stevenson. It was one of the first things we started working on at the beginning of his PhD. He has now graduated, and is off to start a new exciting life as a postdoc in Australia. We got a little distracted by other projects, most notably analysing the first detections of gravitational waves. Simon spent a lot of time developing the COMPAS population code, a code to simulate the evolution of binaries. Looking back, it’s impressive how far he’s come. This paper used a simple approximation to to estimate the masses of our black holes: we called it the Post-it note model, as we wrote it down on a single Post-it. Now Simon’s writing papers including the complexities of common-envelope evolution in order to explain LIGO’s actual observations.