Classifying the unknown: Discovering novel gravitational-wave detector glitches using similarity learning

 

Gravity Spy is an awesome project that combines citizen science and machine learning to classify glitches in LIGO and Virgo data. Glitches are short bursts of noise in our detectors which make analysing our data more difficult. Some glitches have known causes, others are more mysterious. Classifying glitches into different types helps us better understand their properties, and in some cases track down their causes and eliminate them! In this paper, led by Scotty Coughlin, we demonstrated the effectiveness of a new tool which are citizen scientists can use to identify new glitch classes.

The Gravity Spy project

Gravitational-wave detectors are complicated machines. It takes a lot of engineering to achieve the required accuracy needed to observe gravitational waves. Most of the time, our detectors perform well. The background noise in our detectors is easy to understand and model. However, our detectors are also subject to glitches, unusual  (sometimes extremely loud and complicated) noise that doesn’t fit the usual properties of noise. Glitches are short, they only appear in a small fraction of the total data, but they are common. This makes detection and analysis of gravitational-wave signals more difficult. Detection is tricky because you need to be careful to distinguish glitches from signals (and possibly glitches and signals together), and understanding the signal is complicated as we may need to model a signal and a glitch together [bonus note]. Understanding glitches is essential if gravitational-wave astronomy is to be a success.

To understand glitches, we need to be able to classify them. We can search for glitches by looking for loud pops, whooshes and splats in our data. The task is then to spot similarities between them. Once we have a set of glitches of the same type, we can examine the state of the instruments at these times. In the best cases, we can identify the cause, and then work to improve the detectors so that this no longer happens. Other times, we might be able to find the source, but we can find one of the monitors in our detectors which acts a witness to the glitch. Then we know that if something appears in that monitor, we expect a glitch of a particular form. This might mean that we throw away that bit of data, or perhaps we can use the witness data to subtract out the glitch. Since glitches are so common, classifying them is a huge amount of work. It is too much for our detector characterisation experts to do by hand.

There are two cunning options for classifying large numbers of glitches

  1. Get a computer to do it. The difficulty  is teaching a computer to identify the different classes. Machine-learning algorithms can do this, if they are properly trained. Training can require a large training set, and careful validation, so the process is still labour intensive.
  2. Get lots of people to help. The difficulty here is getting non-experts up-to-speed on what to look for, and then checking that they are doing a good job. Crowdsourcing classifications is something citizen scientists can do, but we will need a large number of dedicated volunteers to tackle the full set of data.

The idea behind Gravity Spy is to combine the two approaches. We start with a small training set from our detector characterization experts, and train a machine-learning algorithm on them. We then ask citizen scientists (thanks Zooniverse) to classify the glitches. We start them off with glitches the machine-learning algorithm is confident in its classification; these should be easy to identify. As citizen scientists get more experienced, they level up and start tackling more difficult glitches. The citizen scientists validate the classifications of the machine-learning algorithm, and provide a larger training set (especially helpful for the rarer glitch classes) for it. We can then happily apply the machine-learning algorithm to classify the full data set [bonus note].

The Gravity Spy workflow

How Gravity Spy works: the interconnection of machine-learning classification and citizen-scientist classification. The similarity search is used to identify glitches similar to one which do not fit into current classes. Figure 2 of Coughlin et al. (2019).

I especially like the levelling-up system in Gravity Spy. I think it helps keep citizen scientists motivated, as it both prevents them from being overwhelmed when they start and helps them see their own progress. I am currently Level 4.

Gravity Spy works using images of the data. We show spectrograms, plots of how loud the output of the detectors are at different frequencies at different times. A gravitational wave form a binary would show a chirp structure, starting at lower frequencies and sweeping up.

Gravitational-wave chirp

Spectrogram showing the upward-sweeping chirp of gravitational wave GW170104 as seen in Gravity Spy. I correctly classified this as a Chirp.

New glitches

The Gravity Spy system works smoothly. However, it is set up to work with a fixed set of glitch classes. We may be missing new glitch classes, either because they are rare, and hadn’t been spotted by our detector characterization team, or because we changed something in our detectors and new class arose (we expect this to happen as we tune up the detectors between observing runs). We can add more classes to our citizen scientists and machine-learning algorithm to use, but how do we spot new classes in the first place?

Our citizen scientists managed to identify a few new glitches by spotting things which didn’t fit into any of the classes. These get put in the None-of-the-Above class. Occasionally, you’ll come across similar looking glitches, and by collecting a few of these together, build a new class. The Paired Dove and Helix classes were identified early on by our citizen scientists this way; my favourite suggested new class is the Falcon [bonus note]. The difficulty is finding a large number of examples of a new class—you might only recognise a common feature after going past a few examples, backtracking to find the previous examples is hard, and you just have to keep working until you are lucky enough to be given more of the same.

Helix and Paired Dove

Example Helix (left) and Paired Dove glitches. These classes were identified by Gravity Spy citizen scientists. Helix glitches are related to related to hiccups in the auxiliary lasers used to calibrate the detectors by pushing on the mirrors. Paired Dove glitches are related to motion of the beamsplitter in the interferometer. Adapted from Figure 8 of Zevin et al. (2017).

To help our citizen scientists find new glitches, we created a similar search. Having found an interesting glitch, you can search for similar examples, and put quickly put together a collection of your new class. The video below shows how it works. The thing we had to work out was how to define similar?

Transfer learning

Our machine-learning algorithm only knows about the classes we tell it about. It then works out the features we distinguish the different classes, and are common to glitches of the same class. Working in this feature space, glitches form clusters of different classes.

Gravity Spy feature space

Visualisation showing the clustering of different glitches in the Gravity Spy feature space. Each point is a different glitch from our training set. The feature space has more than three dimensions: this visualisation was made using a technique which preserves the separation and clustering of different and similar points. Figure 1 of Coughlin et al. (2019).

For our similarity search, our idea was to measure distances in feature space [bonus note for experts]. This should work well if our current set of classes have a wide enough set of features to capture to characteristics of the new class; however, it won’t be effective if the new class is completely different, so that its unique features are not recognised. As an analogy, imagine that you had an algorithm which classified M&M’s by colour. It would probably do well if you asked it to distinguish a new colour, but would probably do poorly if you asked it to distinguish peanut butter filled M&M’s as they are identified by flavour, which is not a feature it knows about. The strategy of using what a machine learning algorithm learnt about one problem to tackle a new problem is known as transfer learning, and we found this strategy worked well for our similarity search.

Raven Pecks and Water Jets

To test our similarity search, we applied it to two glitches classes not in the Gravity Spy set:

  1. Raven Peck glitches are caused by thirsty ravens pecking ice built up along nitrogen vent lines outside of the Hanford detector. Raven Pecks look like horizontal lines in spectrograms, similar to other Gravity Spy glitch classes (like the Power Line, Low Frequency Line and 1080 Line). The similarity search should therefore do a good job, as we should be able to recognise its important features.
  2. Water Jet glitches were caused by local seismic noise at the Hanford detector which  causes loud bands which disturb the input laser optics. These glitches are found between , over which time there are 26,871 total glitches in GRavity Spy. The Water Jet glitch doesn’t have anything to do with water, it is named based on its appearance (like a fountain, not a weasel). Its features are subtle, and unlike other classes, so we would expect this to be difficult for our similarity search to handle.

These glitches appeared in the data from the second observing run. Raven Pecks appeared between 14 April and 9 August 2017, and Water Jets appeared 4 January and 28 May 2017. Over these intervals there are a total of 13,513 and 26,871 Gravity Spy glitches from all type, so even if you knew exactly when to look, you have a large number to search through to find examples.

Raven Peck and Water Jet glitches

Example Raven Peck (left) and Water Jet (right) glitches. These classes of glitch are not included in the usual Gravity Spy scheme. Adapted from Figure 3 of Coughlin et al. (2019).

We tested using our machine-learning feature space for the similarity search against simpler approaches: using the raw difference in pixels, and using a principal component analysis to create a feature space. Results are shown in the plots below. These show the fraction of glitches we want returned by the similarity search versus the total number of glitches rejected. Ideally, we would want to reject all the glitches except the ones we want, so the search would return 100% of the wanted classes and reject almost 100% of the total set. However, the actual results will depend on the adopted threshold for the similarity search: if we’re very strict we’ll reject pretty much everything, and only get the most similar glitches of the class we want, if we are too accepting, we get everything back, regardless of class. The plots can be read as increasing the range of the similarity search (becoming less strict) as you go left to right.

Similarity search performance

Performance of the similarity search for Raven Peck (left) and Water Jet (right) glitches: the fraction of known glitches of the desired class that have a higher similarity score (compared to an example of that glitch class) than a given percentage of full data set. Results are shown for three different ways of defining similarity: the DIRECT machine-learning algorithm feature space (think line), a principal component analysis (medium line) and a comparison of pixels (thin line). Adapted from Figure 3 of Coughlin et al. (2019).

For the Raven Peck, the similarity search always performs well. We have 50% of Raven Pecks returned while rejecting 99% of the total set of glitches, and we can get the full set while rejecting 92% of the total set! The performance is pretty similar between the different ways of defining feature space. Raven Pecks are easy to spot.

Water Jets are more difficult. When we have 50% of Water Jets returned by the search, our machine-learning feature space can still reject almost all glitches. The simpler approaches do much worse, and will only reject about 30% of the full data set. To get the full set of Water Jets we would need to loosen the similarity search so that it only rejects 55% of the full set using our machine-learning feature space; for the simpler approaches we’d basically get the full set of glitches back. They do not do a good job at narrowing down the hunt for glitches. Despite our suspicion that our machine-learning approach would struggle, it still seems to do a decent job [bonus note for experts].

Do try this at home

Having developed and testing our similarity search tool, it is now live. Citizen scientists can use it to hunt down new glitch classes. Several new glitches classes have been identified in data from LIGO and Virgo’s (currently ongoing) third observing run. If you are looking for a new project, why not give it a go yourself? (Or get your students to give it a go, I’ve had some reasonable results with high-schoolers). There is the real possibility that your work could help us with the next big gravitational-wave discovery.

arXiv: arXiv:1903.04058 [astro-ph.IM]
Journal: Physical Review D; 99(8):082002(8); 2019
Websites: Gravity Spy; Gravity Spy Tools
Gravity Spy blog: Introducing Gravity Spy Tools
Current stats: Gravity Spy has 15,500 registered users, who have made 4.4 million glitch classifications, leading to 200,000 successfully identified glitches.

Bonus notes

Signals and glitches

The best example of a gravitational-wave overlapping a glitch is GW170817. The glitch meant that the signal in the LIGO Livingston detector wasn’t immediately recognised. Fortunately, the signal in the Hanford detector was easy to spot. The glitch was analyse and categorised in Gravity Spy. It is a simple glitch, so it wasn’t too difficult to remove from the data. As our detectors become more sensitive, so that detections become more frequent, we expect that signal overlapping with glitches will become a more common occurrence. Unless we can eliminate glitches, it is only a matter of time before we get a glitch that prevents us from analysing an important signal.

Gravitational-wave alerts

In the third observing run of LIGO and Virgo, we send out automated alerts when we have a new gravitational-wave candidate. Astronomers can then pounce into action to see if they can spot anything coinciding with the source. It is important to quickly check the state of the instruments to ensure we don’t have a false alarm. To help with this, a data quality report is automatically prepared, containing many diagnostics. The classification from the Gravity Spy algorithm is one of many pieces of information included. It is the one I check first.

The Falcon

Excellent Gravity Spy moderator EcceruElme suggested a new glitch class Falcon. This suggestion was followed up by Oli Patane, they found that all the examples identified occured between 6:30 am and 8:30 am on 20 June 2017 in the Hanford detector. The instrument was misbehaving at the time. To solve this, the detector was taken out of observing mode and relocked (the equivalent of switching it off and on again). Since this glitch class was only found in this one 2-hour window, we’ve not added it as a class. I love how it was possible to identify this problematic stretch of time using only Gravity Spy images (which don’t identify when they are from). I think this could be the seed of a good detective story. The Hanfordese Falcon?

Characteristics of Falcon glitches

Examples of the proposed Falcon glitch class, illustrating the key features (and where the name comes from). This new glitch class was suggested by Gravity Spy citizen scientist EcceruElme.

Distance measure

We chose a cosine distance to measure similarity in feature space. We found this worked better than a Euclidean metric. Possibly because for identifying classes it is more important to have the right mix of features, rather than how significant the individual features are. However, we didn’t do a systematic investigation of the optimal means of measuring similarity.

Retraining the neural net

We tested the performance of the machine-learning feature space in the similarity search after modifying properties of our machine-learning algorithm. The algorithm we are using is a deep multiview convolution neural net. We switched the activation function in the fully connected layer of the net, trying tanh and leaukyREU. We also varied the number of training rounds and the number of pairs of similar and dissimilar images that are drawn from the training set each round. We found that there was little variation in results. We found that leakyREU performed a little better than tanh, possibly because it covers a larger dynamic range, and so can allow for cleaner separation of similar and dissimilar features. The number of training rounds and pairs makes negligible difference, possibly because the classes are sufficiently distinct that you don’t need many inputs to identify the basic features to tell them apart. Overall, our results appear robust. The machine-learning approach works well for the similarity search.

Dirichlet Process Gaussian-mixture model: An application to localizing coalescing binary neutron stars with gravitational-wave observations

Where do gravitational waves like GW170817 come from? Using our network of detectors, we cannot pinpoint a source, but we can make a good estimate—the amplitude of the signal tells us about the distance; the time delay between the signal arriving at different detectors, and relative amplitudes of the signal in different detectors tells us about the sky position (see the excellent video by Leo Singer below).

In this paper we look at full three-dimensional localization of gravitational-wave sources; we important a (rather cunning) technique from computer vision to construct a probability distribution for the source’s location, and then explore how well we could localise a set of simulated binary neutron stars. Knowing the source location enables lots of cool science. First, it aids direct follow-up observations with non-gravitational-wave observatories, searching for electromagnetic or neutrino counterparts. It’s especially helpful if you can cross-reference with galaxy catalogues, to find the most probable source locations (this technique was used to find the kilonova associated with GW170817). Even without finding a counterpart, knowing the most probable host galaxy helps us figure out how the source formed (have lots of stars been born recently, or are all the stars old?), and allows us measure the expansion of the Universe. Having a reliable technique to reconstruct source locations is useful!

This was a fun paper to write [bonus note]. I’m sure it will be valuable, both for showing how to perform this type of reconstruction of a multi-dimensional probability density, and for its implications for source localization and follow-up of gravitational-wave signals. I go into details of both below, first discussing our statistical model (this is a bit technical), then looking at our results for a set of binary neutron stars (which have implications for hunting for counterparts) .

Dirichlet process Gaussian mixture model

When we analyse gravitational-wave data to infer the source properties (location, masses, etc.), we map out parameter space with a set of samples: a list of points in the parameter space, with there being more around more probable locations and fewer in less probable locations. These samples encode everything about the probability distribution for the different parameters, we just need to extract it…

For our application, we want a nice smooth probability density. How do we convert a bunch of discrete samples to a smooth distribution? The simplest thing is to bin the samples. However, picking the right bin size is difficult, and becomes much harder in higher dimensions. Another popular option is to use kernel density estimation. This is better at ensuring smooth results, but you now have to worry about the size of your kernels.

Our approach is in essence to use a kernel density estimate, but to learn the size and position of the kernels (as well as the number) from the data as an extra layer of inference. The “Gaussian mixture model” part of the name refers to the kernels—we use several different Gaussians. The “Dirichlet process” part refers to how we assign their properties (their means and standard deviations). What I really like about this technique, as opposed to the usual rule-of-thumb approaches used for kernel density estimation,  is that it is well justified from a theoretical point of view.

I hadn’t come across a Dirchlet process before. Section 2 of the paper is a walkthrough of how I built up an understanding of this mathematical object, and it contains lots of helpful references if you’d like to dig deeper.

In our application, you can think of the Dirichlet process as being a probability distribution for probability distributions. We want a probability distribution describing the source location. Given our samples, we infer what this looks like. We could put all the probability into one big Gaussian, or we could put it into lots of little Gaussians. The Gaussians could be wide or narrow or a mix. The Dirichlet distribution allows us to assign probabilities to each configuration of Gaussians; for example, if our samples are all in the northern hemisphere, we probably want Gaussians centred around there, rather than in the southern hemisphere.

With the resulting probability distribution for the source location, we can quickly evaluate it at a single point. This means we can rapidly produce a list of most probable source galaxies—extremely handy if you need to know where to point a telescope before a kilonova fades away (or someone else finds it).

Gravitational-wave localization

To verify our technique works, and develop an intuition for three-dimensional localizations, we used studied a set of simulated binary neutron star signals created for the First 2 Years trilogy of papers. This data set is well studied now, it illustrates performance it what we anticipated to be the first two observing runs of the advanced detectors, which turned out to be not too far from the truth. We have previously looked at three-dimensional localizations for these signals using a super rapid approximation.

The plots below show how well we could localise the sources of our binary neutron star sources. Specifically, the plots show the size of the volume which has a 90% probability of containing the source verses the signal-to-noise ratio (the loudness) of the signal. Typically, volumes are 10^410^5~\mathrm{Mpc}^3, which is about 10^{68}10^{69} Olympic swimming pools. Such a volume would contain something like 1001000 galaxies.

Volume verses signal-to-noise ratio

Localization volume as a function of signal-to-noise ratio. The top panel shows results for two-detector observations: the LIGO-Hanford and LIGO-Livingston (HL) network similar to in the first observing run, and the LIGO and Virgo (HLV) network similar to the second observing run. The bottom panel shows all observations for the HLV network including those with all three detectors which are colour coded by the fraction of the total signal-to-noise ratio from Virgo. In both panels, there are fiducial lines scaling inversely with the sixth power of the signal-to-noise ratio. Adapted from Fig. 4 of Del Pozzo et al. (2018).

Looking at the results in detail, we can learn a number of things

  1. The localization volume is roughly inversely proportional to the sixth power of the signal-to-noise ratio [bonus note]. Loud signals are localized much better than quieter ones!
  2. The localization dramatically improves when we have three-detector observations. The extra detector improves the sky localization, which reduces the localization volume.
  3. To get the benefit of the extra detector, the source needs to be close enough that all the detectors could get a decent amount of the signal-to-noise ratio. In our case, Virgo is the least sensitive, and we see the the best localizations are when it has a fair share of the signal-to-noise ratio.
  4. Considering the cases where we only have two detectors, localization volumes get bigger at a given signal-to-noise ration as the detectors get more sensitive. This is because we can detect sources at greater distances.

Putting all these bits together, I think in the future, when we have lots of detections, it would make most sense to prioritise following up the loudest signals. These are the best localised, and will also be the brightest since they are the closest, meaning there’s the greatest potential for actually finding a counterpart. As the sensitivity of the detectors improves, its only going to get more difficult to find a counterpart to a typical gravitational-wave signal, as sources will be further away and less well localized. However, having more sensitive detectors also means that we are more likely to have a really loud signal, which should be really well localized.

Banana vs cucumber

Left: Localization (yellow) with a network of two low-sensitivity detectors. The sky location is uncertain, but we know the source must be nearby. Right: Localization (green) with a network of three high-sensitivity detectors. We have good constraints on the source location, but it could now be at a much greater range of distances. Not to scale.

Using our localization volumes as a guide, you would only need to search one galaxy to find the true source in about 7% of cases with a three-detector network similar to at the end of our second observing run. Similarly, only ten would need to be searched in 23% of cases. It might be possible to get even better performance by considering which galaxies are most probable because they are the biggest or the most likely to produce merging binary neutron stars. This is definitely a good approach to follow.

Three-dimensional localization with galaxy catalgoue

Galaxies within the 90% credible volume of an example simulated source, colour coded by probability. The galaxies are from the GLADE Catalog; incompleteness in the plane of the Milky Way causes the missing wedge of galaxies. The true source location is marked by a cross [bonus note]. Part of Figure 5 of Del Pozzo et al. (2018).

arXiv: 1801.08009 [astro-ph.IM]
Journal: Monthly Notices of the Royal Astronomical Society; 479(1):601–614; 2018
Code: 3d_volume
Buzzword bingo: Interdisciplinary (we worked with computer scientist Tom Haines); machine learning (the inference involving our Dirichlet process Gaussian mixture model); multimessenger astronomy (as our results are useful for following up gravitational-wave signals in the search for counterparts)

Bonus notes

Writing

We started writing this paper back before the first observing run of Advanced LIGO. We had a pretty complete draft on Friday 11 September 2015. We just needed to gather together a few extra numbers and polish up the figures and we’d be done! At 10:50 am on Monday 14 September 2015, we made our first detection of gravitational waves. The paper was put on hold. The pace of discoveries over the coming years meant we never quite found enough time to get it together—I’ve rewritten the introduction a dozen times. It’s extremely satisfying to have it done. This is a shame, as it meant that this study came out much later than our other three-dimensional localization study. The delay has the advantage of justifying one of my favourite acknowledgement sections.

Sixth power

We find that the localization volume \Delta V is inversely proportional to the sixth power of the signal-to-noise ration \varrho. This is what you would expect. The localization volume depends upon the angular uncertainty on the sky \Delta \Omega, the distance to the source D, and the distance uncertainty \Delta D,

\Delta V \sim D^2 \Delta \Omega \Delta D.

Typically, the uncertainty on a parameter (like the masses) scales inversely with the signal-to-noise ratio. This is the case for the logarithm of the distance, which means

\displaystyle \frac{\Delta D}{D} \propto \varrho^{-1}.

The uncertainty in the sky location (being two dimensional) scales inversely with the square of the signal-to-noise ration,

\Delta \Omega \propto \varrho^{-2}.

The signal-to-noise ratio itself is inversely proportional to the distance to the source (sources further way are quieter. Therefore, putting everything together gives

\Delta V \propto \varrho^{-6}.

Treasure

We all know that treasure is marked by a cross. In the case of a binary neutron star merger, dense material ejected from the neutron stars will decay to heavy elements like gold and platinum, so there is definitely a lot of treasure at the source location.