Puzzle procrastination: perplexing probabilities

I enjoy pondering over puzzles. Figuring out correct probabilities can be confusing, but it is a good exercise to practise logical reasoning. Previously, we have seen how to update probabilities when given new information; let’s see if we can use this to solve some puzzles!

1 Girls, boys and doughnuts

As an example, we’ve previously calculated the probabilities for the boy–girl distribution of our office-mate Iris’ children. Let’s imagine that we’ve popped over to Iris’ for doughnuts (this time while her children are out), and there we meet her sister Laura. Laura tells us that she has two children. What are the probabilities that Laura has: (i) two girls, (ii) a girl and a boy, or (iii) two boys?

It turns out that Laura has one of her children with her. After you finish your second doughnut (a chocolatey, custardy one), Laura introduces you to her daughter Lucy. Lucy loves LEGO, but that is unimportant for the current discussion. How does Lucy being a girl change the probabilities?

While you are finishing up your third doughnut (with plum and apple jam), you discover that Lucy is the eldest child. What are the probabilities now—have they changed?

Laura is a member of an extremely selective club for mothers with two children, of which at least one is a girl. They might fight crime at the weekends; Laura gets a little evasive about what they actually do. What are the probabilities that a random member of this club has (i) two girls, (ii) a girl and a boy, or (iii) two boys?

The answers to similar questions have been the subject of much argument, even though they aren’t about anything too complicated. If you figure out the answers, you might see how the way you phrase the question is important.

2 Do or do-nut

You are continuing to munch through the doughnuts at Iris’. You are working your way through a box of 24. There is one of each flavour and you know there is one you do not like (which we won’t mention for libel reasons). There’s no way of telling what flavour a doughnut is before biting into it. You have now eaten six, not one of which was the bad one. The others have eaten twelve between them. What is the probability that your nemesis doughnut is left? What is the probability that you will pick it up next?

You continue munching, as do the others. You discover that Iris, Laura and Lucy all hate the same flavour that you do, but that none of them have eaten it. There are now just four doughnuts left. Lucy admits that she did take one of the doughnuts to feed the birds in the garden (although they didn’t actually eat it as they are trying to stick to a balanced diet). She took the doughnut while there were still 12 left. What is the probability that the accursed flavour is still lurking amongst the final four?

You agree to take one each, one after another. Does it matter when you pick yours?

Happy pondering! I shall post the solutions later.

Tips for scientific writing

Second year physics undergraduates at the University of Birmingham have to write an essay as part of their course. As a tutor, it’s my job to give them advice on how to write in a scientific style (and then mark the results). I have assembled these tips to try to aid them (and make my marking less painful). Much of this advice also translates to paper writing, and I try to follow these tips myself.

Writing well is difficult. It requires practice. It is an important skill, yet it is something that I do not believe is frequently formally taught (at least in the sciences). Scientific and other technical writing can be especially hard, as it has its own rules that can be at odds with what we learn at school (when studying literature or creative writing). Reading the work of others is a good way of figuring out what works well and what does not.

In this post, I include some tips that I hope are useful (not everyone will agree). I begin by considering how to plan and structure a piece of writing (section 1), from the largest scale (section 1.2) progressing down to the smallest (section 1.4); then I discuss various aspects of technical writing (section 2), both in terms of content and style, including referencing (section 2.5), which is often problematic, and I conclude with some general editing advice (section 3) before summarising (section 4). If you have anything extra to add, please do so in the comments.

1 Structure and planning

The structure of your writing is important as it reflects the logical flow of your arguments. It is worth spending some time before you start writing considering what you want to say, and what is the best order for your ideas. (This is also true in exams: I have found when trying to answer essay questions it is worth the time to spend a couple of minutes planning, otherwise I am liable to miss out an important point). I frequently get frustrated that I must write linearly, one idea after another, and cannot introduce multiple strands at a time, with arguments intertwining with each other. However, putting in the effort to construct a clear progression does help your reader.

1.1 Title and audience

The first thing to consider is what you want to write and who is going to read it. Always write for your audience, and remember that professional scientists and the general public look for different things (this blog may be a poor example of this, as different posts are targeted towards different audiences).

Having thought about what you want to say, pick a title that reflects this. Don’t have a title “The life and works of Albert Einstein” if you are only going to cover special relativity, and don’t have a title “Equilibrium thermodynamics of non-oxide perovskite superconductors” if you are writing for a general audience. If your title is a question, make sure you answer it. It might be a good idea to write your title after you have finished your main text so that you can match it to what you have actually written.

1.2 Beginning, middle and end

To help your audience understand what you are telling them, begin with an introduction, and end with a summary. This is also true when giving a talk. Start by explaining what you will tell them, then tell them, then tell them what you told them. Repetition of key ideas makes them more memorable and helps to emphasise what your audience should take away.

At the beginning, introduce the key ideas you will talk about. If you are writing an essay titled “The Solar Neutrino Problem”, you should explain what a solar neutrino is and why there is a problem. You might also like to explain why the reader should care. Sketching out the contents of the rest of the work is useful as it prepares the reader for what will follow: it’s like warm-up stretches for the mind. The introduction sets the scene for the arguments to follow.

The main body of your text contains most of the information: this is where you introduce your ideas and explain them. It is the burger between the buns of the introduction and conclusion. For longer documents, or subjects with many aspects, you might consider breaking this up into sections (and subsections). Using headings (perhaps numbered for reference) is good: skimming section headings should give an outline of the contents. Some sections within the main body might be sufficiently involved to merit their own introduction and summary. There should be a clear progression of ideas: if you find there is a big jump, try writing some text to cover the transition (“Having explained how neutrinos are produced in the Sun, we now consider how they are detected on the Earth”).

After presenting your arguments, it is good to summarise. As an example, a summary on the solar neutrino problem could be:

“Experiments measuring neutrinos from the Sun only detected about a third as many as expected. This could indicate either a problem with our understanding of solar physics or of particle physics. It is not possible to modify solar models to match both the measured neutrino flux and observations of luminosity and composition; however, the reduced flux could be explained by introducing neutrino oscillations. These were subsequently observed in several experiments. The solar neutrino problem has therefore been resolved by introducing new particle physics.”

Don’t introduce new arguments at this stage: this is just as unsatisfying as reading a murder mystery and discovering the murderer was someone never mentioned before. In my solar neutrino example, both the solar models and neutrino oscillations should have been discussed. Distilling your argument down to a few lines also helps you to double-check your logic.

Either as part of your summary, or following on from it, end your writing with a conclusion. This is what you want your audience to have learnt (it should be the answer to your question). It is OK if you cannot produce a concrete answer; there are many cases where there is no clear-cut solution, perhaps because more data is needed: in these cases, your conclusion is that there is no simple answer. To check that you have successfully wrapped things up, try reading just your introduction and conclusion; these should pair up to form a delicious (but bite-sized) sandwich.

1.3 Paragraphs

On a smaller scale, your writing is organised using paragraphing. Paragraphs are the building blocks of your arguments; each paragraph should address a single point or idea. Big blocks of text are hard to read (and look intimidating), so it is good to break them up. You can think of each paragraph as a micro-essay: the first sentence (usually) introduces the subject, you then go on to elaborate, before reaching a conclusion at the end (see section 1.2). To check that your paragraph sticks to a single point (and doesn’t need to be broken up), try reading the first and last sentences, usually they should make sense together.

1.4 Sentences

Paragraphs are constructed from sentences. Ensure your sentences make sense, that they are grammatically correct and that their subject is clear.

Vary your sentence length. In technical writing there is often the temptation, even amongst the best writers, to include long, convoluted sentences in order to fully describe a complicated idea and include all the relevant details, but these can be hard to read, both because of the complexity of their structure, which may require significant mental effort to unpack, and because by the time they finally conclude, the reader has forgotten the initial topic of the over-long, rambling sentence. Brevity gives impact. Shorter sentences are easier to understand. Breaking up your ideas helps the reader. Short sentences also get boring. They seem repetitive. They are tiring to read. They can send your reader to sleep. It is, therefore, better to have a range of sentence lengths. Include some short. In addition to these, have some longer sentences, as these allow you to join up your ideas. If you are unsure where to break up long sentences, look for commas (or semi-colons, etc.); if you are unsure where to put commas, read the sentence and see where you would pause.

2 Writing style and referencing

Having discussed how to structure your writing, we now move on to what to write. Technical writing has some specific requirements with regard to content; these might seem peculiar when first encountered. I’ll try to explain why we do certain things in technical writing, and give some ideas on how to incorporate these ideas to improve your own writing.

2.1 Be specific

The most common mistake I come across in my students’ work is the failure to be specific. The following two points (sections 2.2 and 2.3) are closely related to this. As an example, consider making a comparison:

  • Poor — “Nuclear power provides more energy than fossil fuel.”
  • Better — “Per unit mass of fuel, nuclear fission releases more energy than the burning of fossil fuel.”
  • Even better — “Nuclear fission can produce ~8000 times as much energy per unit mass of fuel as burning fossil fuels: the same amount of energy is produced from 16 kg of fossil fuels as by using 2 g of uranium in a standard reactor (MacKay, 2008).”

Here, we have specified exactly what we are comparing, given figures to allow a quantitative comparison, and provided references for those figures (see section 2.5). If possible, give numbers; don’t say “many” or “lots” or “some”, but say “70%”, “9 billion” or “six Olympic swimming pools”.

Weak modifiers like “very”, “quite”, “somewhat” or “highly” are another example where it is better to be specific. What is the difference between being “hot” and being “very hot”? I might say that my bowl of soup is very hot, but does that tell you any more than if I just said it was hot? It is tempting to use these words for emphasis: surely if I were talking about the surface of the Sun we can agree that’s very hot? Not if you were to compare it to the centre of the Sun! Often, what is hot or cold, big or small, fast or slow depends upon the context. What is hot for soup is cold for the Sun, and what is cold for soup is hot for superconductors. It is much better to make distinctions by using figures: “The surface of the Sun is about 6000 K”.

It is OK to use “very” if you define the range where this is applicable, for example “High frequency radio waves are between 3 MHz and 30 MHz, very high frequency radio waves are between 30 MHz and 300 MHz, and ultra high frequency radio waves are between 300 MHz and 3 GHz.”

2.2 Provide justification

When putting forward an argument, it is necessary to include some evidence or justification to back it up. It is not sufficient merely to assert your opinion: you need the reader to be able to follow your reasoning. If you are using someone else’s argument, you should provide a citation (section 2.5); the reader can then check there to find the reasoning. However, if it is an important point you might like to add some exposition. If you are being good about providing quantitative statements (section 2.1), you are already part way there as you can use those figures as back-up. For example, if discussing global warming, it is easy to argue it is important if you have already included figures on how many people would lose their homes to rising sea-levels, or if comparing materials, it is straightforward to argue that aluminium is better for making aeroplanes than steel if you have already included their densities. Sometimes, all that is required is an explanation of your reasoning, for example, “It is a good idea to build nuclear power plants because this reduces reliance on fossil fuels” or “It is not advisable to lick the surface of the Sun because it doesn’t taste of golden syrup.” Here, the reader might disagree that it is a good idea to build nuclear power plants, but they understand that you are using dependence on fossil fuels as an argument instead of, say, environmental issues, or the reader might agree that it is a bad idea to lick the Sun, but might have been thinking more about its temperature than its flavour. Even if the reader does not agree with your conclusions, they should understand how you reached them.

A similar idea is to show rather than tell. Don’t tell me that something is a fascinating topic or an exciting concept, get on with explaining it! Similarly, don’t just say something is important, but explain why it is important. This allows the reader to decide upon things themselves, if you have justified your arguments then they should follow your logic.

2.3 Use the correct word

In technical writing there is often a specific word that should be used in a particular context. In common usage we might use weight and mass interchangeably; in physics they have different meanings. This sometimes trips people up as they naturally try to find synonyms to reduce the monotony of their work. Always use the correct term.

Technical language can be full of jargon. This makes things difficult to understand for an outsider. It is important to define unfamiliar terms to help the reader. In particular, acronyms must be defined the first time they are used. As an example, “When talking about online materials, the uniform resource locator (URL), otherwise known as the web address, is a string of characters that identifies a resource.” Avoid jargon as much as possible; try to always use the simplest word for the job. It will be necessary to use technical terms to describe things accurately, but if they are introduced carefully, these need not confuse the reader.

A particular pet-peeve of mine is the use of scare quotes, which I always read as if the author is making air quotes. If quoting someone else’s choice of phrase then quotation marks are appropriate, and a reference must be provided (section 2.5). Most of the time, these quotation marks are used to indicate that the author thinks the terminology isn’t quite right. If the terminology is incorrect, use a different word (the correct one); if the terminology is correct (if that is what is used in the field), then the quotation marks aren’t needed!

2.4 Use equations and diagrams

Most physics problems involve solving an equation or two. For these mathematical questions, I am always encouraging my students to explain their work, to use words. When writing essays, I find they have the opposite problem: they only use prose and don’t include equations (or diagrams). Equations are useful for concisely and precisely explaining relationships, so it is good to include them in writing.

Equations may put off general readers, but they improve the readability of technical work. Consider describing the kinetic energy of a (non-relativistic) particle:

  • With only words — “The kinetic energy of a particle depends upon its mass and speed: it is directly proportional to the mass and increases with the square of the speed.”
  • Using an equation — “The kinetic energy of a particle E is given by E = (1/2)mv^2, where m is its mass and v is its speed.”

The second method is more straightforward: there is no ambiguity in our description, and we also get the factor of a half so the reader can go away and calculate things for themselves. This was just a simple equation; if we were considering something more complicated, such as the kinetic energy of a relativistic particle

\displaystyle E = \left(\frac{1}{\sqrt{1- v^2/c^2}} - 1\right)mc^2,

where c is the speed of light, it is much harder to produce a comprehensive description using only words. In this case, it is tempting to miss out reference to the equation. Sometimes this is justified: if the equation is too complicated a reader will not understand its meaning, but, in many cases, an equation allows you to show exactly how a system changes, and this is extremely valuable.

When including an equation, always define the symbols that you are using. Some common constants, such as \pi, might be understood, but it is better to be safe than sorry.

Equations should be correctly punctuated. They are read as part of the surrounding text, with the equals sign read as the verb “equals”, etc.

Using diagrams is another way of providing information in a clear, concise format. Like equations, diagrams can replace long and potentially confusing sections of text. Diagrams can be pictures of an experimental set-up, schematics of the system under discussion, or show more abstract information, such as illustrating processes (perhaps as a flow chart). The cliché is that a picture is worth a thousand words; as diagrams are so awesome for conveying information, I’m not even going to attempt to give an example where I try to use only words. Below is an example figure, which I have chosen as it also includes equations.

“Figure 1 shows the proton–proton (pp) chain, the series of thermonuclear reactions that provides most (~99%) of the Sun’s energy (Bahcall, Serenelli & Basu, 2005). There are several neutrino-producing reactions.”

The pp chain

Figure 1: The thermonuclear reactions of the pp chain. The traditional names of the produced neutrinos are given in bold and the branch names are given in parentheses. Percentages indicate branching fractions. Adapted from Giunti & Kim (2007).

Graphs can be used to show relationships between quantities, or collections of data. They can be used for theoretical models or experimental results. In the example below I show both. Graphs might be useful for plotting especially complicated functions, where the equation isn’t easy to understand. There are many types of graph (scatter plots, histograms, pie charts), and picking the best way to show your data can be as challenging as obtaining it in the first place!

“In figure 2 we plot the orbital decay of the Hulse–Taylor binary pulsar, indicated by the shift in periastron time (the point in the orbit where the stars are closest together). The data are in excellent agreement with the prediction assuming that the orbit evolves because of the emission of gravitational waves.”

Periastron shift of binary pulsar

Figure 2: The cumulative shift of periastron time as a function of time of the Hulse–Taylor binary pulsar (PSR B1913+16). The points are measured values, while the curve is the theoretical prediction assuming gravitational-wave emission. Taken from Weisberg & Taylor (2005).

All diagrams should have a descriptive caption. It is usually good to number these for ease of reference. If you are using someone else’s figure, make sure to explicitly cite them in the caption (see section 2.5)—you need to unambiguously acknowledge that you have taken someone else’s work, and have not just used their data or ideas (which would also warrant a citation) to make your own.

Tables can also be used to present data. Tables might be better than plots when there are only a few numbers to present. Like figures, tables should have a caption (which includes relevant references if the data is taken from another source), they should be numbered, and they should be referred to explicitly in the text.

When writing, it is useful to remember that different people learn better through different means: some prefer words, some love equations, and others like visual representations. Including equations and figures can help you communicate effectively with a wider audience.

There are conventions for how to present equations, graphs and tables. I shall return to this in future posts. The rules may seem arcane, but they are designed to make communication clear.

2.5 Referencing

At the end of any good piece of technical writing there should be a list of references, hence I have tackled referencing last in this section. (Sometimes this is done in footnotes rather than at the end, but I’m ignoring that). However, referencing should not be considered something that is just tacked on at the end as an after-thought; it is one of the most important components of academic writing.

We include references for several reasons:

  1. To show the source of facts, figures and ideas. This allows readers to verify things that we quote, to double-check we’ve not made an error or misinterpreted things. It also distinguishes what is our own from what we have taken from elsewhere. This is important in avoiding plagiarism, as we acknowledge when we use someone else’s work.
  2. To provide the reader with a further source of information. It is not possible to explain everything, and a reader might be interested in finding out more about a topic, how a particular quantity was measured or how a particular calculation was done. By providing a reference we give the reader something further they can read if they want to (that doesn’t mean our work shouldn’t make sense on its own: you should be able to watch The Avengers without having seen Iron Man, but it’s still useful to know what to watch to find out the back-story). By following references readers can see how ideas have developed and changed, and gain a fuller understanding of a topic.
  3. To give credit for useful work. This is linked to the idea of not claiming the ideas as your own (avoiding plagiarism), but in addition, by referencing something you are publicising it, and by using it you are claiming that it is of good-enough quality to be trusted. If you look at an academic article you will often see a link to citing articles. The number of citations is used as a crude measure of the value of that paper. Furthermore, this linking can allow a reader to work forwards, finding new ideas built upon those in that paper, just as they can work backwards by following references.
  4. To show you know your stuff. This might sound rather cynical, but it is important to do your research. To understand a topic you need to know what work has been done in that area (you can’t always derive everything from first principles yourself), and you demonstrate your familiarity with a field by including references.

You must always include citations in the text at the relevant point: if you use an idea include your source, if you introduce a concept say where it came from. It is not acceptable just to have a list of references at the end: does the reader have to go through all of these to figure out what came from where?

There are multiple styles for putting citations in text. The two most common are the following:

  • Numeric (or Vancouver) — using a number, e.g., [1], where the references at the end form an ordered list. This has the advantage of not taking up much space, especially when including citations to multiple papers, e.g. [1–5].
  • Author–year (or Harvard) — using the authors and year of publication to identify the paper, e.g., (Einstein, 1905). This has the advantage of making it easier to identify a paper: I’ve no idea what [13] is until I flick to the end, but I know what (Hulse & Taylor, 1975) is about.

Which style you use might be specified for you or it might be a free choice. Whichever style you use, the important thing is to include relevant references at the appropriate place in the text.

Having figured out why we should reference, where we should put references and how to include citations in the text, the last piece is how to assemble the bibliographic information to include at the end (or in footnotes). Exactly what information is included and how it is formatted depends on the particular style: there are endless combinations. Again, this might be specified for you or might be a free choice, just make sure you are consistent. The basic information that is always included is an author (this may be an organisation rather than a person), so we know who to attribute the work to, and a date, so we know how up-to-date it is. Other information that is included depends upon the source we are referencing: a journal article will need the name of the journal, the volume and page number; a book will need a title, edition and publisher; a website will need a title and URL, etc. We need to include all the necessary information for the reader to find the exact source we used (hence we need to include the edition of a book, the date updated or written for a website, and so on).

There are numerous guides online for how to format references correctly. Some software does it automatically (I use Mendeley to produce BibTeX, but that’s not for everyone). The University of Birmingham has a comprehensive guide to using Harvard-style referencing.

A final issue remains of which sources to reference: how do you know that a source is reliable? This is an in-depth question, so I shall return to it in a dedicated post.

3 Editing

Writing isn’t finished as soon as you have all your ideas on the page, things often take some polishing up. Some people like to perfect things as they go along, others prefer to get everything down in whatever form and go back through after. Here, I conclude with some tips for editing.

3.1 Be merciless

Keep your writing short. Don’t waste your readers’ time or overcomplicate things. Cut unnecessary words.

There are some phrases that are typically superfluous:

  • “Obviously…” — If it is obvious, then the reader will realise it; if it’s not, you are patronising them.
  • “It should be noted that…” — That would be why it’s written down! (I hope you are not writing things that shouldn’t be noted).
  • “Remember that…” — You’re reminding the reader by writing it.
  • Any of the modifiers like “very”, “quite” or “extremely” mentioned in section 2.1.

3.2 Proof-read

The single best method to improve a piece of writing is to proof-read it. Reread what you have written to check that it says what you think it should. I find I have to wait for a while after writing something to read it properly, otherwise I read what I intended to write rather than what I actually did. Having others read it is an excellent way to check it makes sense (especially if you are not a native English speaker); this is best if they are representative of your target audience.

I hate it when others find a mistake in my writing. It’s like rubbing a cat the wrong way. However, each mistake you find and correct makes your writing a little better, and that’s really the important thing.

4 Summary

In conclusion, my main tips for good scientific writing are:

  • Plan what you want to tell your audience and how they will take your message away.
  • Say what you’re going to say (introduction), then say it (main text), then say what you said (conclusion).
  • Have a clear, logical flow, with one point per paragraph.
  • Be specific and back up your points with quantitative data and references.
  • Use equations and diagrams to help explain.
  • Be concise.
  • Proof-read (and get a second opinion).

If you have any further ideas for improving essay writing, please leave a comment.

An introduction to probability: Inference and learning from data

Probabilities are a way of quantifying your degree of belief. The more confident you are that something is true, the larger the probability assigned to it, with 1 used for absolute certainty and 0 used for complete impossibility. When you get new information that updates your knowledge, you should revise your probabilities. This is what we do all the time in science: we perform an experiment and use our results to update what we believe is true. In this post, I’ll explain how to update your probabilities, just as Sherlock Holmes updates his suspicions after uncovering new evidence.

Taking an umbrella

Imagine that you are a hard-working PhD student and you have been working late in your windowless office. Having finally finished analysing your data, you decide it’s about time to go home. You’ve been trapped inside so long that you have no idea what the weather is like outside: should you take your umbrella with you? What is the probability that it is raining? This will depend upon where you are, what time of year it is, and so on. I did my PhD in Cambridge, which is one of the driest places in England, so I’d be confident that I wouldn’t need one. We’ll assume that you’re somewhere it doesn’t rain most of the time too, so at any random time you probably wouldn’t need an umbrella. Just as you are about to leave, your office-mate Iris comes in dripping wet. Do you reconsider taking that umbrella? We’re still not certain that it’s raining outside (it could have stopped, or Iris could’ve just been in a massive water-balloon fight), but it’s now more probable that it is raining. I’d take the umbrella. When we get outside, we can finally check the weather, and be pretty certain if it’s raining or not (maybe not entirely certain as, after plotting that many graphs, we could be hallucinating).

In this story we get two new pieces of information: that newly-arrived Iris is soaked, and what we experience when we get outside. Both of these cause us to update our probability that it is raining. What we learn doesn’t influence whether it is raining or not, just what we believe regarding whether it is raining. Some people worry that probabilities should be some statement of absolute truth, and so because we changed our probability of it raining after seeing that our office-mate is wet, there should be some causal link between office-mates and the weather. We’re not saying that (you can’t control the weather by tipping a bucket of water over your office-mate); our probabilities just reflect what we believe. Hopefully you can imagine how your own belief that it is raining would change throughout the story; we’ll now discuss how to put this on a mathematical footing.

Bayes’ theorem

We’re going to venture into using some maths now, but it’s not too serious. You might like to skip to the example below if you prefer to see demonstrations first. I’ll use P(A) to mean the probability of A. A joint probability describes the probability of two (or more) things, so we have P(A, B) as the probability that both A and B happen. The probability that A happens given that B happens is the conditional probability P(A|B). Consider the joint probability of A and B: we want both to happen. We could construct this in a couple of ways. First we could imagine that A happens, and then B. In this case we build up the joint probability of both by working out the probability that A happens and then the probability B happens given A. Putting that in equation form

P(A,B) = P(A)P(B|A).

Alternatively, we could have B first and then A. This gives us a similar result of

P(A,B) = P(B)P(A|B).

Both of our equations give the same result. (We’ve checked this before). If we put the two together then

P(B|A)P(A) = P(A|B)P(B).

Now we divide both sides by P(A) and bam:

\displaystyle P(B|A) = \frac{P(A|B)P(B)}{P(A)},

this is Bayes’ theorem. I think the Reverend Bayes did rather well to get a theorem named after him for noting something that is true and then rearranging! We use Bayes’ theorem to update our probabilities.

Usually, when doing inference (when trying to learn from some evidence), we have some data (that our office-mate is damp) and we want to work out the probability of our hypothesis (that it’s raining). We want to calculate P(\mathrm{hypothesis}|\mathrm{data}). We normally have a model that can predict how likely it would be to observe that data if our hypothesis is true, so we know P(\mathrm{data}|\mathrm{hypothesis}); we just need to convert between the two. This is known as the inverse problem.

We can do this using Bayes’ theorem

\displaystyle P(\mathrm{hypothesis}|\mathrm{data}) = \frac{P(\mathrm{data}|\mathrm{hypothesis})P(\mathrm{hypothesis})}{P(\mathrm{data})}.

In this context, we give names to each of the probabilities (to make things sound extra fancy): P(\mathrm{hypothesis}|\mathrm{data}) is the posterior, because it’s what we get at the end; P(\mathrm{data}|\mathrm{hypothesis}) is the likelihood, it’s what you may remember calculating in statistics classes; P(\mathrm{hypothesis}) is the prior, because it’s what we believed about our hypothesis before we got the data, and P(\mathrm{data}) is the evidence. If ever you hear of someone doing something in a Bayesian way, it just means they are using the formula above. I think it’s rather silly to point this out, as it’s really the only logical way to do science, but people like to put “Bayesian” in the title of their papers as it sounds cool.

Whenever you get some new information, some new data, you should update your belief in your hypothesis using the above. The prior is what you believed about hypothesis before, and the posterior is what you believe after (you’ll use this posterior as your prior next time you learn something new). There are a couple of examples below, but before we get there I will take a moment to discuss priors.
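Before moving on, here is a minimal Python sketch (my addition, not from the original post) of this update rule for the common case where a hypothesis is either true or false, so the evidence is just the sum of the two ways the data could arise:

    def bayes_update(prior, likelihood, likelihood_if_false):
        """Return P(H|data) from P(H), P(data|H) and P(data|not H)."""
        # The evidence P(data) sums over the hypothesis being true and being false.
        evidence = likelihood * prior + likelihood_if_false * (1 - prior)
        return likelihood * prior / evidence

    # Example: fairly uninformative data barely shifts the prior.
    print(bayes_update(prior=0.3, likelihood=0.6, likelihood_if_false=0.5))  # ~0.34

The same pattern appears in all the worked examples below: multiply the likelihood by the prior, then divide by the evidence.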

About priors: what we already know

There have been many philosophical arguments about the use of priors in science. People worry that what you believe affects the results of science. Surely science should be above such things: it should be about truth, and should not be subjective! Sadly, this is not the case. Using Bayes’ theorem is the only logical thing to do. You can’t calculate a probability of what you believe after you get some data unless you know what you believed beforehand. If this makes you unhappy, just remember that when we changed our probability for it being raining outside, it didn’t change whether it was raining or not. If two different people use two different priors they can get two different results, but that’s OK, because they know different things, and so they should expect different things to happen.

To try to convince yourself that priors are necessary, consider the case that you are Sherlock Holmes (one of the modern versions), and you are trying to solve a bank robbery. There is a witness who saw the getaway, and they can remember what they saw with 99% accuracy (this gives the likelihood). If they say the getaway vehicle was a white transit van, do you believe them? What if they say it was a blue unicorn? In both cases the witness is the same, the likelihood is the same, but one is much more believable than the other. My prior that the getaway vehicle is a transit van is much greater than my prior for a blue unicorn: the latter can’t carry nearly as many bags of loot, and so is a silly choice.

If you find that changing your prior (to something else sensible) significantly changes your results, this just means that your data don’t tell you much. Imagine that you checked the weather forecast before leaving the office and it said “cloudy with 0–100% chance of precipitation”. You basically believe the same thing before and after. This really means that you need more (or better) data. I’ll talk about some good ways of calculating priors in the future.

Solving the inverse problem

Example 1: Doughnut allergy

We shall now attempt to use Bayes’ theorem to calculate some posterior probabilities. First, let’s consider a worrying situation. Imagine there is a rare genetic disease that makes you allergic to doughnuts. One in a million people have this disease, which only manifests later in life. You have tested positive. The test is 99% successful at detecting the disease if it is present, and returns a false positive (when you don’t have the disease) 1% of the time. How worried should you be? Let’s work out the probability of having the disease given that you tested positive

\displaystyle P(\mathrm{allergy}|\mathrm{positive}) = \frac{P(\mathrm{positive}|\mathrm{allergy})P(\mathrm{allergy})}{P(\mathrm{positive})}.

Our prior for having the disease is given by how common it is, P(\mathrm{allergy}) = 10^{-6}. The prior probability of not having the disease is P(\mathrm{no\: allergy}) = 1 - P(\mathrm{allergy}). The likelihood of our positive result is P(\mathrm{positive}|\mathrm{allergy}) = 0.99, which seems worrying. The evidence, the total probability of testing positive P(\mathrm{positive}) is found by adding the probability of a true positive and a false positive

 P(\mathrm{positive}) = P(\mathrm{positive}|\mathrm{allergy})P(\mathrm{allergy}) + P(\mathrm{positive}|\mathrm{no\: allergy})P(\mathrm{no\: allergy}).

The probability of a false positive is P(\mathrm{positive}|\mathrm{no\: allergy}) = 0.01. We thus have everything we need. Substituting everything in, gives

\displaystyle P(\mathrm{allergy}|\mathrm{positive}) = \frac{0.99 \times 10^{-6}}{0.99 \times 10^{-6} + 0.01 \times (1 - 10^{-6})} = 9.899 \times 10^{-5}.

Even after testing positive, you still only have about a one in ten thousand chance of having the allergy. While it is more likely that you have the allergy than a random member of the public, it’s still overwhelmingly probable that you’ll be fine continuing to eat doughnuts. Hurray!
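If you would like to check the arithmetic, here is a short Python sketch (my addition) using exactly the numbers quoted above:

    # Doughnut-allergy example: prevalence 1 in a million, 99% true-positive
    # rate, 1% false-positive rate (figures as quoted in the text).
    prior = 1e-6
    p_pos_given_allergy = 0.99
    p_pos_given_no_allergy = 0.01

    evidence = p_pos_given_allergy * prior + p_pos_given_no_allergy * (1 - prior)
    posterior = p_pos_given_allergy * prior / evidence
    print(posterior)  # ~9.9e-5, roughly a 1 in 10,000 chance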

Doughnut love

Doughnut love: probably fine.

Example 2: Boys, girls and water balloons

Second, imagine that Iris has three children. You know she has a boy and a girl, but you don’t know if she has two boys or two girls. You pop around for doughnuts one afternoon, and a girl opens the door. She is holding a large water balloon. What’s the probability that Iris has two girls? We want to calculate the posterior

\displaystyle P(\mathrm{two\: girls}|\mathrm{girl\:at\:door}) = \frac{P(\mathrm{girl\:at\:door}|\mathrm{two\: girls})P(\mathrm{two\: girls})}{P(\mathrm{girl\:at\:door})}.

As a prior, we’d expect boys and girls to be equally common, so P(\mathrm{two\: girls}) = P(\mathrm{two\: boys}) = 1/2. If we assume that it is equally likely that any one of the children opened the door, then the likelihood that one of the girls did so when there are two of them is P(\mathrm{girl\:at\:door}|\mathrm{two\: girls}) = 2/3. Similarly, if there were two boys, the probability of a girl answering the door is P(\mathrm{girl\:at\:door}|\mathrm{two\: boys}) = 1/3. The evidence, the total probability of a girl being at the door, is

P(\mathrm{girl\:at\:door}) = P(\mathrm{girl\:at\:door}|\mathrm{two\: girls})P(\mathrm{two\: girls}) + P(\mathrm{girl\:at\:door}|\mathrm{two\: boys})P(\mathrm{two\: boys}).

Using all of these,

\displaystyle P(\mathrm{two\: girls}|\mathrm{girl\:at\:door}) = \frac{(2/3)(1/2)}{(2/3)(1/2) + (1/3)(1/2)} = \frac{2}{3}.

Even though we already knew there was at least one girl, seeing a girl first makes it much more likely that Iris has two daughters. Whether or not you end up soaked is a different question.
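Again, a small Python sketch (my addition) reproduces the 2/3 result from the priors and likelihoods above:

    from fractions import Fraction

    # Girl-at-the-door example: equal priors, and a random child answers.
    prior_two_girls = Fraction(1, 2)
    p_girl_if_two_girls = Fraction(2, 3)  # two of the three children are girls
    p_girl_if_two_boys = Fraction(1, 3)   # only one of the three is a girl

    evidence = (p_girl_if_two_girls * prior_two_girls
                + p_girl_if_two_boys * (1 - prior_two_girls))
    print(p_girl_if_two_girls * prior_two_girls / evidence)  # 2/3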

Example 3: Fudge!

Finally, we shall return to the case of Ted and his overconsumption of fudge. Ted claims to have eaten a lethal dose of fudge. Given that he is alive to tell the anecdote, what is the probability that he actually ate the fudge? Here, our data is that Ted is alive, and our hypothesis is that he did eat the fudge. We have

\displaystyle P(\mathrm{fudge}|\mathrm{survive}) = \frac{P(\mathrm{survive}|\mathrm{fudge})P(\mathrm{fudge})}{P(\mathrm{survive})}.

This is a case where our prior, the probability that he would eat a lethal dose of fudge P(\mathrm{fudge}), makes a difference. We know the probability of surviving the fatal dose is P(\mathrm{survive}|\mathrm{fudge}) = 0.5. The evidence, the total probability of surviving P(\mathrm{survive}), is calculated by considering the two possible sequences of events: either Ted ate the fudge and survived or he didn’t eat the fudge and survived

P(\mathrm{survive}) = P(\mathrm{survive}|\mathrm{fudge})P(\mathrm{fudge}) + P(\mathrm{survive}|\mathrm{no\: fudge})P(\mathrm{no\: fudge}).

We’ll assume if he didn’t eat the fudge he is guaranteed to be alive, P(\mathrm{survive}| \mathrm{no\: fudge}) = 1. Since Ted either ate the fudge or he didn’t P(\mathrm{fudge}) + P(\mathrm{no\: fudge}) = 1. Therefore,

P(\mathrm{survive}) = 0.5 P(\mathrm{fudge}) + [1 - P(\mathrm{fudge})] = 1 - 0.5 P(\mathrm{fudge}).

This gives us a posterior

\displaystyle P(\mathrm{fudge}|\mathrm{survive}) = \frac{0.5 P(\mathrm{fudge})}{1 - 0.5 P(\mathrm{fudge})}.

We just need to decide on a suitable prior.

If we believe Ted could never possibly lie, then he must have eaten that fudge and P(\mathrm{fudge}) = 1. In this case,

\displaystyle P(\mathrm{fudge}|\mathrm{survive}) = \frac{0.5}{1 - 0.5} = 1.

Since we started being absolutely sure, we end up being absolutely sure: nothing could have changed our mind! This is a poor prior: it is too strong; we are being closed-minded. If you are closed-minded you can never learn anything new.

If we don’t know who Ted is, what fudge is, or the ease of consuming a lethal dose, then we might assume an equal prior on eating the fudge and not eating the fudge, P(\mathrm{fudge}) = 0.5. In this case we are in a state of ignorance. Our posterior is

\displaystyle P(\mathrm{fudge}|\mathrm{survive}) = \frac{0.5 \times 0.5}{1 - 0.5 \times 0.5} = \frac{1}{3}.

Even though we know nothing, we conclude that it’s more probable that Ted did not eat the fudge. In fact, it’s twice as probable that he didn’t eat the fudge as that he did, since P(\mathrm{no\: fudge}|\mathrm{survive}) = 1 - P(\mathrm{fudge}|\mathrm{survive}) = 2/3.

In reality, I think that it’s extremely improbable anyone could consume a lethal dose of fudge. I’m fairly certain that your body tries to protect you from such stupidity by expelling the fudge from your system before such a point. However, I will concede that it is not impossible. I want to assign a small probability to P(\mathrm{fudge}). I don’t know if this should be one in a thousand, one in a million or one in a billion, but let’s just say it is some small value p. Then

\displaystyle P(\mathrm{fudge}|\mathrm{survive}) = \frac{0.5 p}{1 - 0.5 p} \approx 0.5 p,

since the denominator is approximately one. Whatever small value I pick for the prior, the posterior probability that Ted ate the fudge is about half of it.
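This behaviour is easy to explore numerically; here is a minimal Python sketch (my addition) of the fudge posterior as a function of the prior, using the survival probabilities from the text:

    def p_fudge_given_survive(p_fudge):
        # P(survive|fudge) = 0.5 and P(survive|no fudge) = 1, as assumed above.
        evidence = 0.5 * p_fudge + 1.0 * (1 - p_fudge)  # = 1 - 0.5 * p_fudge
        return 0.5 * p_fudge / evidence

    for prior in (1.0, 0.5, 1e-3, 1e-6):
        print(prior, p_fudge_given_survive(prior))
    # A prior of 1 stays at 1, a prior of 0.5 gives 1/3, and small priors are roughly halved.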

Mr. Impossible

I would assign a much higher probability to Mr. Impossible being able to eat that much fudge than Ted.

While it might not be too satisfying that we can’t come up with incontrovertible proof that Ted didn’t eat the fudge, we might be able to shut him up by telling him that even someone who knows nothing would think his story is unlikely, and that we will need much stronger evidence before we can overcome our prior.

Homework example: Monty Hall

You now have all the tools necessary to tackle the Monty Hall problem, one of the most famous probability puzzles:

You are on a game show and are given the choice of three doors. Behind one is a car (a Lincoln Continental), but behind the others are goats (which you don’t want). You pick a door. The host, who knows what is behind the doors, opens another door to reveal a goat. They then offer you the chance to switch doors. Should you stick with your current door or not? — Monty Hall problem

You should be able to work out the probability of winning the prize by switching and sticking. You can’t guarantee you win, but you can maximise your chances.
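Once you have worked out your answer, you could check it against a simulation. Here is a hedged Monte Carlo sketch (my addition, not part of the problem) that plays many games with each strategy:

    import random

    def monty_hall(switch, trials=100_000):
        """Estimate the probability of winning the car by sticking or switching."""
        wins = 0
        for _ in range(trials):
            doors = [0, 1, 2]
            car = random.choice(doors)
            pick = random.choice(doors)
            # The host opens a door that is neither your pick nor the car.
            opened = random.choice([d for d in doors if d != pick and d != car])
            if switch:
                pick = next(d for d in doors if d != pick and d != opened)
            wins += (pick == car)
        return wins / trials

    print("stick: ", monty_hall(switch=False))
    print("switch:", monty_hall(switch=True))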

Summary

Whenever you encounter new evidence, you should revise how probable you think things are. This is true in science, where we perform experiments to test hypotheses; it is true when trying to solve a mystery using evidence, or trying to avoid getting a goat on a game show. Bayes’ theorem is used to update probabilities. Although Bayes’ theorem itself is quite simple, calculating likelihoods, priors and evidences for use in it can be difficult. I hope to return to all these topics in the future.

White lab coats, pink tutus and camouflage fatigues

In this post I contemplate the effects of stereotypes and biases. I hope that this will encourage you to examine these ideas too. I promise I’ll get back to more science soon.

Just over a week ago, I helped with an outreach event for year nine students. Some of the astrophysics PhD students and I ran an interactive lecture on gravity and its importance in astrophysics. These types of events are fun: you get to teach some physics to a (usually) enthusiastic audience, and hopefully inspire them to consider studying the subject. I also get to play with our Lycra Universe. I think it’s especially important to show students what a university environment is like and have them interact with real scientists. It is important to counter the stereotype that studying science means that you’ll spend all day in a lab wearing a white lab coat. (Although that would be cool. I’d want goggles too, and maybe a doomsday device).

This event was to promote the studying of STEM subjects. That’s science, technology, engineering and mathematics, because there’s nothing like an acronym to make things accessible. It is often argued that we need more people trained in STEM subjects for the economy, industry, or just so we can finally get pizza over the Internet. I like to encourage people to study these areas as I think it’s good to have a scientifically-literate population. Also, because science is awesome! The event was aimed specifically at encouraging a group who are under-represented at university-level STEM, namely girls.

There has been much written on gender and subject choice. I would recommend the Closing Doors report by the Institute of Physics. I will not attempt to unravel this subject. In all my experience, I have never noticed any difference in aptitude between genders. I don’t believe that the ability to pee standing up gives any advantage when studying physics—one could argue for a better understanding of parabolic motion, but anyone who has paid attention to the floor in the gents (I advise against this) knows this is demonstrably not the case. I assume the dominant factors are social pressures: a vicious circle of a subject becoming more associated with one gender, which makes people feel self-conscious or out of place studying it. Also: there are always bigots. It’s a real shame to be potentially missing out on capable scientists. There have been many attempts to try to counter this trend, to break the cycle—some of them truly awful.

Good arguments have been made that the gender segregation of toys pushes girls away from science and technology from an early age. (For some reason, there seems to be a ridiculous idea that women can only relate to things that are pink). It makes sense to me that if only boys get the chemistry sets and construction toys, then they are going to be more numerous in the STEM subjects. The fact that a few female LEGO scientists merits coverage in national newspapers, the BBC, etc. shows something isn’t quite right.

We are all influenced by our childhoods, and this got me thinking: I know of negative impacts for women from these gender biases, what are they for men? If women are under-represented in engineering, maths and physics, then men must be under-represented somewhere else to balance things: namely English, biology (conspicuous amongst the STEM subjects) and languages. We are short of male teachers and nurses. It seems that men are pushed away from caring careers or those with emphasis on communication.

The lack of men in certain professions is a problem, although I would say less so than the continued under-representation of women at senior positions (say as professors, CEOs or members of government). I was about to relax, since I hadn’t uncovered yet another unconscious bias to add to the list. Then I checked the news. I don’t know what’s in the news when you’re reading this, but at the time it was conflict in Ukraine, Iraq and Israel–Palestine—I assume things are much better in the future? One thing that struck me was that the combatants in the photos were almost exclusively men. It then occurred to me that for every girl who plays with a ballerina doll, there is a boy who plays with an action figure with a weapon. I’m not so naive as to suggest it’s as simple as growing up to be exactly like your toys (I, regrettably, am neither a dinosaur nor a cuddly elephant), but perhaps it is worth keeping in the front of our mind what identities we associate with each gender and how we project these onto children. I don’t want to say that being a ballerina isn’t a good vocation or hobby, or that being a soldier is a bad career. (Curiously, I believe that some of the requirements to be a good ballet dancer or soldier overlap, say discipline, determination, physical fitness and, perhaps, empathy). However, I think it is dangerous if we raise girls who primarily aspire to be pretty, and boys who resolve conflict through violence (men are more likely both to be victims of homicide and to take their own lives).

In conclusion, stereotypes can be damaging, be it that scientists are all socially-awkward comic-book geeks as in The Big Bang Theory, that men can’t talk about their feelings, or that women must be mothers. There is a balance between the genders: by assigning one quality to a particular gender, you can push the other away. Mathematical ability shouldn’t be masculine and compassion shouldn’t be feminine. This is not a new idea, but conveniently coincides with Emma Watson’s wonderful speech for the UN as part of the HeForShe campaign. Cultural biases might be more significant than you think, so give them some extra attention. Sexism hurts everyone, so let’s cut it out and all go play with some LEGO.

The Big Bang Theory

The Big Bang Theory‘s popularity has been credited with encouraging more students to take physics. The cast reflects traditional stereotypes: the men are physicists, an astronomer and an engineer, the women are two biologists and Penny.

How big is a black hole?

Physicists love things that are simple. This may be one of the reasons that I think black holes are cool.

Black holes form when you have something so dense that nothing can resist its own gravity: it collapses down, becoming smaller and smaller. Whatever formerly made up your object (usually, the remains of what made up a star) is crushed out of existence. It becomes infinitely compact, squeezed into an infinitely small space, such that you can say that whatever was there no longer exists. Black holes aren’t made of anything: they are just empty spacetime!

A spherical cow

Daisy, a spherical cow, or “moo-on”. Spherical cows are highly prized as pets amongst physicists because of their high degree of symmetry and ability to survive in a vacuum. They also produce delicious milkshakes.

Black holes are very simple because they are just vacuum. They are much simpler than tables, or mugs of coffee, or even spherical cows, which are all made up of things: molecules and atoms and other particles all wibbling about and interacting with each other. If you’re a fan of Game of Thrones, then you know the plot is rather complicated because there are a lot of characters. However, in a single glass of water there may be 10^25 molecules: imagine how involved things can be with that many things bouncing around, occasionally evaporating, or plotting to take over the Iron Throne and rust it to pieces! Even George R. R. Martin would struggle to kill off 10^25 characters. Black holes have no internal parts, they have no microstructure, they are just… nothing…

(In case you’re the type of person to worry about such things, this might not quite be true in a quantum theory, but I’m just treating them classically here.)

Since black holes aren’t made of anything, they don’t have a surface. There is no boundary, no crispy sugar shell, no transition from space to something else. This makes it difficult to really talk about the size of black holes: it is a question I often get asked when giving public talks. Black holes are really infinitely small if we just consider the point that everything collapsed to, but that’s not too useful. When we want to consider a size for a black hole, we normally use its event horizon.

Point of no return sign

The event horizon is not actually sign-posted. It’s not possible to fix a sign-post in empty space, and it would be sucked into the black hole. The sign would disappear faster than a Ramsay Street sign during a tour of the Neighbours set.

The event horizon is the point of no return. Once you pass it, the black hole’s gravity is inescapable; there’s no way out, even if you were able to travel at the speed of light (this is what makes them black holes). The event horizon separates the parts of the Universe where you can happily wander around from those where you’re trapped plunging towards the centre of the black hole. It is, therefore, a sensible measure of the extent of a black hole: it marks the region where the black hole’s gravity has absolute dominion (which is better than possessing the Iron Throne, and possibly even dragons).

The size of the event horizon depends upon the mass of the black hole. More massive black holes have stronger gravity, so their event horizons extend further. You need to stay further away from bigger black holes!

If we were to consider the simplest type of black hole, it’s relatively (pun intended) easy to work out where the event horizon is. The event horizon is a spherical surface, with radius

\displaystyle r_\mathrm{S} = \frac{2GM}{c^2},

which is known as the Schwarzschild radius, since this type of black hole was first theorised by Karl Schwarzschild (who was a real hard-core physicist). In this formula, M is the black hole’s mass (as it increases, so does the size of the event horizon); G is Newton’s gravitational constant (it sets the strength of gravity), and c is the speed of light (the same as in the infamous E = mc^2). You can plug some numbers into this formula (if you’re anything like me, two or three times before getting the correct answer) to find out how big a black hole is (or, equivalently, how much you need to squeeze something before it will collapse to a black hole).

What I find shocking is that black holes are tiny! I mean it, they’re really small. The Earth has a Schwarzschild radius of 9 mm, which means you could easily lose it down the back of the sofa. Until it promptly swallowed your sofa, of course. Stellar-mass black holes are just a few kilometres across. For comparison, the Sun has a radius of about 700,000 km. For the massive black hole at the centre of our Galaxy, the Schwarzschild radius is about 10^10 m, which does sound like a lot until you realise that it’s less than 10% of Earth’s orbital radius, and that it’s about four million solar masses squeezed into that space.
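These numbers are easy to reproduce; here is a small Python sketch (my addition) that plugs standard rounded constants into r_\mathrm{S} = 2GM/c^2:

    G = 6.674e-11      # Newton's gravitational constant, m^3 kg^-1 s^-2
    c = 2.998e8        # speed of light, m/s
    M_sun = 1.989e30   # mass of the Sun, kg

    def schwarzschild_radius(mass_kg):
        """Radius of the event horizon of a non-spinning black hole, in metres."""
        return 2 * G * mass_kg / c**2

    print(schwarzschild_radius(5.972e24))     # Earth: ~9 mm
    print(schwarzschild_radius(M_sun))        # Sun: ~3 km
    print(schwarzschild_radius(4e6 * M_sun))  # Galactic-centre black hole: ~1e10 m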

The event horizon changes shape if the black hole has angular momentum (if it is spinning). In this case, you can get closer in, but the position of the horizon doesn’t change much. In the most extreme case, the event horizon is at a radius of

\displaystyle r_\mathrm{g} = \frac{GM}{c^2}.

Relativists like this formula, since it’s even simpler than for the Schwarzschild radius (we don’t have to remember the value of two), and it’s often called the gravitational radius. It sets the scale in relativity problems, so computer simulations often use it as a unit instead of metres or light-years or parsecs or any of the other units astronomy students despair over learning.

We’ve now figured out a sensible means of defining the size of a black hole: we can use the event horizon (which separates the part of the Universe where you can escape from the black hole, from that where there is no escape), and the size of this is around the gravitational radius r_\mathrm{g}. An interesting consequence of this (well, something I think is interesting), is to consider the effective density of a black hole. Density is how much mass you can fit into a given space. In our case, we’ll consider the mass of the black hole and the volume of its event horizon. This would be something like

\displaystyle \rho = \frac{3 M}{4 \pi r_\mathrm{g}^3} = \frac{3 c^6}{4 \pi G^3 M^2},

where I’ve used \rho for density; you shouldn’t worry about the factors of \pi or G or c, I’ve just put them in in case you were curious. The interesting result is that the density decreases as the mass increases. More massive black holes are less dense! In fact, the most massive black holes, about a billion times the mass of our Sun, are less dense than water. They would float if you could find a big enough bath tub, and could somehow fill it without the water collapsing down to a black hole under its own weight…
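As a sanity check on that claim, here’s a quick sketch (again with hand-rounded constants) of the effective density of a billion-solar-mass black hole, using the gravitational radius as its edge:

    import math

    # Effective density of a black hole, treating the gravitational radius
    # r_g = G M / c^2 as its edge (hand-rounded constants, rough answers only).
    G = 6.674e-11     # m^3 kg^-1 s^-2
    c = 2.998e8       # m s^-1
    M_SUN = 1.989e30  # kg

    def effective_density(mass_kg):
        r_g = G * mass_kg / c**2
        return 3 * mass_kg / (4 * math.pi * r_g**3)

    print(f"Black hole: {effective_density(1e9 * M_SUN):.0f} kg/m^3")
    print("Water:      1000 kg/m^3")

The black hole comes out at around 150 kg/m^3, comfortably less dense than water.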

In general, it probably makes a lot more sense (and doesn’t break the laws of physics), if you stick with a rubber duck, rather than a black hole, as a bath-time toy.

In conclusion, black holes might be smaller (and less dense) than you’d expect. However, this doesn’t mean that they’re not very dangerous. As Tyrion Lannister has shown, it doesn’t pay to judge someone by their size alone.

How sport is like science

Athene Donald, Professor of Experimental Physics and soon-to-be Master of my old college, Churchill, recently blogged about how athletics resembles academia. She argued that both are hard careers: they require many years of training, and even then success is not guaranteed, as not everyone will reach the top to become an Olympian or a Professor. There is a big element of luck too: a career can stall because of an injury or because of time invested in a study that eventually yields null results, while, conversely, a single big championship win or serendipitous discovery can land a comfortable position. These factors can make these career paths unappealing, but most people who enter them do so because they love the area and have a real talent for the field.

The Breakfast Club

As The Breakfast Club taught us, being into physics or sports can have similar pressures.

I find this analogy extremely appealing. There are many parallels. Both sports and academic careers are meritocratic and competitive. Most who enter them will not become rich—those who do usually manage it by making use of their profile, either through product endorsement or through writing a book, say Stephen Hawking, or Michael Jordan (although he was still extremely well paid). Both fields have undisputed heavyweights like Einstein or Muhammad Ali, and media superstars like Neil deGrasse Tyson or Anna Kournikova; both have inspirational figures who have overcome adversity, be they Jesse Owens or Emmy Noether, and idols whose personal lives you probably shouldn’t emulate, say Tiger Woods or Richard Feynman. However, I think the similarity can stretch beyond career paths.

Athene says that although she doesn’t participate in athletics, she does enjoy watching the sport. I’m sure many can empathise with that position. I think that this is similarly the case for research: many enjoy finding out about new discoveries or ideas, even though they don’t want to invest the time studying themselves. There are many excellent books and documentaries, many excellent communicators of research. (I shall be helping out at this year’s British Science Festival, which I’m sure will be packed with people keen to find out about current research.) However, there is undoubtedly more that could be done, both in terms of growing the market and improving the quality—reporting of science is notoriously bad. If you were to go into any pub in the country, I’d expect you’d be able to find someone to have an in-depth conversation with about how best to manage the national football team, despite them never having been a professional footballer. Why not someone with similar opinions about research council funding? Can we make research as popular as sport?

Increasing engagement with and awareness of research is a popular subject, and most research grants will have some mention of wider impact; however, I don’t think that this is the only goal. According to UK government research, many young students do enjoy science, they just don’t feel it is for them. The problem is that people think that science is too difficult. Given my previous ramblings, that’s perhaps understandable. However, that was for academic research; science is far broader than that! There are many careers outside the lab, and understanding science is useful even if that’s not your job, for example when discussing subjects like global warming or vaccination that affect us all. Coming back to our sports analogy, the situation is like children not wanting to play football because they won’t be a professional. It’s true that most people aren’t good enough to play for England (potentially including members of the current squad, depending upon who you ask in that pub), but that doesn’t mean you can’t enjoy a kick around, perhaps play for a local team at the weekend, or even coach others. Playing sports regularly keeps you physically fit, which is a good thing™; taking an interest in science (or language or literature or etceteras) keeps you mentally fit, also a good thing™.

Chocolate models

Chocolate is also a good thing™. However, neither Nobel Prizes nor Olympic medals are made of chocolate, something I’m not sure that everyone appreciates. I’d make the gold Olympic medals out of milk chocolate, silver out of white and bronze out of dark. The Nobel Prize for Medicine should contain nuts as an incentive to cure allergies; the Prize for Economics should be mint(ed) chocolate, the Peace Prize Swiss chocolate, the Chemistry Prize should contain popping candy, and the Physics Prize should be orange chocolate (that’s my favourite).

How to encourage more people to engage in science is a complicated problem. There’s no single solution, but it is something to work on. I would definitely prefer to live in a science-literate society. Stressing applications of science beyond pure research might be one avenue. I would also like to emphasise that it’s OK to find science (and maths) hard. Problem solving is difficult, like long-distance running, but if you practise, it does get easier. I can only vouch for one side of that simile from personal experience, but since I’m a theoretician, I’m happy enough to state that without direct experimental confirmation. I guess that means I should take my own advice and participate more myself: spend a little more time being physically active? Motivating myself is also a difficult problem.

The missing link for black holes

There has been some recent excitement about the claimed identification of a 400-solar-mass black hole. A team of scientists have recently published a letter in the journal Nature where they show how X-ray measurements of a source in the nearby galaxy M82 can be interpreted as originating from a black hole with mass of around 400 times the mass of the Sun—from now on I’ll use M_\odot as shorthand for the mass of the Sun (one solar mass). This particular X-ray source is peculiarly bright and has long been suspected to potentially be a black hole with a mass around 100 M_\odot to 1000 M_\odot. If the result is confirmed, then it is the first definite detection of an intermediate-mass black hole, or IMBH for short, but why is this exciting?

Mass of black holes

In principle, a black hole can have any mass. To form a black hole you just need to squeeze mass down into a small enough space. For something the mass of the Earth, you need to squeeze down to a radius of about 9 mm, and for something about the mass of the Sun, you need to squeeze to a radius of about 3 km. Black holes are pretty small! Most of the time, things don’t collapse to form black holes because the materials they are made of are more than strong enough to counterbalance their own gravity.

Marshmallows

These innocent-looking marshmallows could collapse down to form black holes if they were squeezed down to a size of about 10^-29 m. The only thing stopping this is the incredible strength of marshmallow when compared to gravity.

Stellar-mass black holes

Only very massive things, where gravitational forces are immense, collapse down to black holes. This happens when the most massive stars reach the end of their lifetimes. Stars are kept puffy because they are hot. They are made of plasma where all their constituent particles are happily whizzing around and bouncing into each other. This can continue to happen while the star is undergoing nuclear fusion which provides the energy to keep things hot. At some point this fuel runs out, and then the core of the star collapses. What happens next depends on the mass of the core. The least massive stars (like our own Sun) will collapse down to become white dwarfs. In white dwarfs, the force of gravity is balanced by electrons. Electrons are rather anti-social and dislike sharing the same space with each other (a concept known as the Pauli exclusion principle, which is a consequence of their exchange symmetry), hence they put up a bit of a fight when squeezed together. The electrons can balance the gravitational force for masses up to about 1.4 M_\odot, known as the Chandrasekhar mass. After that they get squeezed together with protons and we are left with a neutron star. Neutron stars are much like giant atomic nuclei. The force of gravity is now balanced by the neutrons who, like electrons, don’t like to share space, but are less easy to bully than the electrons. The maximum mass of a neutron star is not exactly known, but we think it’s somewhere between 2 M_\odot and 3 M_\odot. After this, nothing can resist gravity and you end up with a black hole of a few times the mass of the Sun.

Collapsing stars produce the imaginatively named stellar-mass black holes, as they are about the same mass as stars. Stars lose a lot of mass during their lifetime, so the mass of a newly born black hole is less than the original mass of the star that formed it. The maximum mass of stellar-mass black holes is determined by the maximum size of stars. We have good evidence for stellar-mass black holes, for example from looking at X-ray binaries, where we see a hot disc of material swirling around the black hole.

Massive black holes

We also have evidence for another class of black holes: massive black holes, MBHs to their friends, or, if trying to sound extra cool, supermassive black holes. These may be 10^5 M_\odot to 10^9 M_\odot. The strongest evidence comes from our own galaxy, where we can see stars in the centre of the galaxy orbiting something so small and heavy it can only be a black hole.

We think that there is an MBH at the centre of pretty much every galaxy, like there’s a hazelnut at the centre of a Ferrero Rocher (in this analogy, I guess the Nutella could be delicious dark matter). From the masses we’ve measured, the properties of these black holes are correlated with the properties of their surrounding galaxies: bigger galaxies have bigger MBHs. The most famous of these correlations is the M–sigma relation, between the mass of the black hole (M) and the velocity dispersion, the range of orbital speeds, of stars surrounding it (the Greek letter sigma, \sigma). These correlations tell us that the evolution of the galaxy and its central black hole are linked somehow; this could be just because of their shared history or through some extra feedback too.

MBHs can grow by accreting matter (swallowing up clouds of gas or stars that stray too close) or by merging with other MBHs (we know galaxies merge). The rather embarrassing problem, however, is that we don’t know what the MBHs have grown from. There are really huge MBHs already present in the early Universe (they power quasars), so MBHs must be able to grow quickly. Did they grow from regular stellar-mass black holes or some form of super black hole that formed from a giant star that doesn’t exist today? Did lots of stellar-mass black holes collide to form a seed, or did material just accrete quickly? Did the initial black holes come from somewhere other than stars entirely, perhaps as leftovers from the Big Bang? We don’t have the data to tell where MBHs came from yet (gravitational waves could be useful for this).

Intermediate-mass black holes

However MBHs grew, it is generally agreed that we should be able to find some intermediate-mass black holes: black holes which haven’t grown enough to become MBHs. These might be found in dwarf galaxies, or maybe in globular clusters (giant collections of stars that formed together), perhaps even in the centre of galaxies orbiting an MBH. Finding some IMBHs will hopefully tell us about how MBHs formed (and so, possibly about how galaxies formed too).

IMBHs have proved elusive. They are difficult to spot compared to their bigger brothers and sisters. Not finding any might mean we’d need to rethink our ideas of how MBHs formed, and try to find a way for them to either be born about a million times the mass of the Sun, or be guaranteed to grow that big. The finding of the first IMBH tells us that things are more like common sense would dictate: black holes can come in the expected range of masses (phew!). We now need to identify some more to learn about their properties as a population.

In conclusion, black holes can come in a range of masses. We know about the smaller stellar-mass ones and the bigger massive black holes. We suspect that the bigger ones grow from smaller ones, and we now have some evidence for the existence of the hypothesised intermediate-mass black holes. Whatever their size though, black holes are awesome, and they shouldn’t worry about their weight.

An introduction to probability: Great expectations

We use probabilities a lot in science. Previously, I introduced the concept of probabilities; here I’ll explain the concept of expectation and averages. Expectation and average values are among the most useful statistics that we can construct from a probability distribution. This post contains traces of calculus, but is peanut free.

Expectations

Imagine that we have a discrete set of numeric outcomes, such as the number from rolling a die. We’ll label these as x_1, x_2, etc., or as x_i where the subscript i is used as shorthand to indicate any of the possible outcomes. The probability of the numeric value being a particular x_i is given by P(x_i). For rolling our die, the outcomes are one to six (x_1 = 1, x_2 = 2, etc.) and the probabilities are

\displaystyle P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6}.

The expectation value is the sum of all the possible outcomes multiplied by their respective probabilities,

\displaystyle \langle x \rangle = \sum_i x_i P(x_i),

where \sum_i means sum over all values of i (over all outcomes). The expectation value for rolling a die is

\displaystyle \langle x \rangle = 1 \times P(1) + 2 \times P(2) + 3 \times P(3) + 4 \times P(4) + 5 \times P(5) + 6 \times P(6) = \frac{7}{2} .

The expectation value of a distribution is its average, the value you’d expect after many (infinite) repetitions. (Of course this is not possible in reality—you’d certainly get RSI—but it is useful for keeping statisticians out of mischief).
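If you’d like to see that convergence without the RSI, here’s a little Python sketch that estimates the die’s expectation value by brute-force rolling:

    import random

    # Estimate the expectation value of a fair die by averaging many rolls,
    # and compare with the exact sum over outcomes.
    exact = sum(x * (1 / 6) for x in range(1, 7))   # 7/2

    rolls = [random.randint(1, 6) for _ in range(100_000)]
    estimate = sum(rolls) / len(rolls)

    print(f"Exact expectation:     {exact}")
    print(f"Average of many rolls: {estimate:.3f}")

The average of the rolls wanders ever closer to 7/2 as you crank up the number of repetitions.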

For a continuous distribution, the expectation value is given by

\displaystyle \langle x \rangle = \int x p(x) \, \mathrm{d} x ,

where p(x) is the probability density function.

You can use the expectation value to guide predictions for the outcome. You can never predict with complete accuracy (unless there is only one possible outcome), but you can use knowledge of the probabilities of the various outcomes to inform your predictions.

Imagine that after buying a large quantity of fudge, for entirely innocent reasons, the owner offers you the chance to play double-or-nothing—you’ll either pay double the bill or nothing, based on some random event—should you play? Obviously, this depends upon the probability of winning. Let’s say that the probability of winning is p and find out how high it needs to be to be worthwhile. We can use the expectation value to calculate how much we should expect to pay: if this is less than the bill as it stands, it’s worth giving it a go; if it is larger than the original bill, we should expect to pay more (and so probably shouldn’t play). The expectation value is

\displaystyle \langle x \rangle = 0 \times p + 2 \times (1 - p) = 2 (1 - p),

where I’m working in terms of unified fudge currency, which, shockingly, is accepted in very few shops, but has the nice property that your fudge bill is always 1. Anyhow, if \langle x \rangle is less than one, which happens when p > 0.5, it’s worth playing. If we were tossing a (fair) coin, we’d expect to come out even; if we had to roll a six, we’d expect to pay more.
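For the sceptical, here’s a short simulation (working in the same fudge units, with a made-up number of trials) that confirms the break-even point:

    import random

    # Expected fudge bill for double-or-nothing with winning probability p:
    # pay 0 with probability p, pay 2 with probability 1 - p.
    def expected_bill(p):
        return 0 * p + 2 * (1 - p)

    def simulated_bill(p, trials=100_000):
        return sum(0 if random.random() < p else 2 for _ in range(trials)) / trials

    for p in [1 / 2, 1 / 6]:   # a coin toss, and needing to roll a six
        print(f"p = {p:.3f}: expected bill = {expected_bill(p):.2f}, "
              f"simulated = {simulated_bill(p):.2f}")

With the coin the bill averages out at 1 (break even); needing a six it averages about 1.67, so you’d expect to pay more.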

The expectation value is the equivalent of the mean. This is the average that people usually think of first. If you have a set of numeric results, you calculate the mean by adding up all of your results and dividing by the total number of results N. Imagine each outcome x_i occurs n_i times, then the mean is

\displaystyle \bar{x} = \sum_i x_i \frac{n_i}{N}.

We can estimate the probability of each outcome as P(x_i) = n_i/N so that \bar{x} = \langle x \rangle.

Median and mode

Aside from the mean, there are two other averages in common use: the median and the mode. These aren’t used quite as often as the mean, despite sounding friendlier. With a set of numeric results, the median is the middle result and the mode is the most common result. We can define equivalents for both when dealing with probability distributions.

To calculate the median we find the value where the total probability of being smaller (or bigger) than it is a half: P(x < x_\mathrm{med}) = 0.5. This can be done by adding up probabilities until you get a half

\displaystyle \sum_{x_i \, \leq \, x_\mathrm{med} } P(x_i) = 0.5.

For a continuous distribution this becomes

\displaystyle \int_{x_\mathrm{low}}^{x_\mathrm{med}} p(x) \, \mathrm{d}x = 0.5,

where x_\mathrm{low} is the lower limit of the distribution. (That’s all the calculus out of the way now, so if you’re not a fan you can relax). The LD50 lethal dose is a median. The median is effectively the middle of the distribution, the point at which you’re equally likely to be higher or lower.

The median is often used as it is not as sensitive as the mean to a few outlying results which are far from the typical values.

The mode is the value with the largest probability, the most probable outcome

\displaystyle P(x_\mathrm{mode}) = \max P(x_i).

For a continuous distribution, it is the point which maximises the probability density function

\displaystyle p(x_\mathrm{mode}) = \max p(x).

The modal value is the most probable outcome, the most likely result, the one to bet on if you only have one chance.
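Here’s a small sketch pulling the three averages together for a discrete distribution; the outcomes and probabilities are invented purely for illustration:

    # Mean, median and mode of a made-up discrete distribution.
    distribution = {1: 0.30, 2: 0.25, 3: 0.20, 4: 0.15, 5: 0.10}

    mean = sum(x * p for x, p in distribution.items())

    cumulative = 0.0
    for x in sorted(distribution):
        cumulative += distribution[x]
        if cumulative >= 0.5:   # first point where the total probability reaches a half
            median = x
            break

    mode = max(distribution, key=distribution.get)

    print(f"Mean = {mean:.2f}, median = {median}, mode = {mode}")

For this bottom-heavy distribution the three averages all disagree: the mean is 2.5, the median 2 and the mode 1.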

Education matters

Every so often, some official, usually an education minister, says something about wanting more than half of students to be above average. This results in much mocking, although seemingly little rise in the standards for education ministers. Having discussed averages ourselves, we can now see if it’s entirely fair to pick on these poor MPs.

The first line of defence is that we should probably specify the distribution we’re averaging over. It may well be that they actually meant the average bear. It’s a sad truth that bears perform badly in formal education. Many blame the unfortunate stereotyping resulting from Winnie the Pooh. It might make sense to compare with performance in the past to see if standards are improving. We could imagine that taking the average from the 1400s would indeed show some improvement. For argument’s sake, let’s say that we are indeed talking about the average over this year’s students.

If the average we were talking about was the median, then it would be impossible for more (or fewer) than half of students to do better than average. In that case, it is entirely fair to mock the minister, and possibly to introduce them to the average bear. In this case, a mean bear.

If we were referring to the mode, then it is quite simple for more than half of the students to do better than this. To achieve this we just need a bottom-heavy distribution, a set of results where the most likely outcome is low, but most students do better than this. We might want to question an education system where the aim is to have a large number of students doing poorly though!

Finally, there is the mean; to use the mean, we first have to decide if we are averaging a sensible quantity. For education performance this normally means exam results. Let’s sidestep the issue of whether we want to reduce the output of the education system down to a single number, and consider the properties we want in order to take a sensible average. We want the results to be numeric (check); to be ordered, such that high is good and low is bad (or vice versa), so 42 is better than 41 but not as good as 43 and so on (check); and to be on a linear scale. The last criterion means that performance is directly proportional to the mark: a mark twice as big is twice as good. Most exams I’ve encountered are not like this, but I can imagine that it is possible to define a mark scheme this way. Let’s keep imagining, and assume things are sensible (and perhaps think about kittens and rainbows too… ).

We can construct a distribution where over half of students perform better than the mean. To do this we’d need a long tail: a few students doing really very poorly. These few outliers are enough to skew the mean and make everyone else look better by comparison. This might be better than the modal case, where we had a glut of students doing badly, as now we can have lots doing nicely. However, it also means that there are a few students who are totally failed by the system (perhaps growing up to become a minister for education), which is sad.
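To see how easily a long tail does the job, here’s a toy example with an entirely made-up set of exam marks:

    # A made-up set of exam marks with a long low tail: two students do very
    # badly, dragging the mean down below most of the class.
    marks = [5, 10, 62, 64, 65, 66, 68, 70, 71, 72]

    mean = sum(marks) / len(marks)
    above_average = sum(1 for m in marks if m > mean)

    print(f"Mean mark: {mean:.1f}")
    print(f"{above_average} out of {len(marks)} students are above average")

Here the mean is dragged down to about 55, so eight out of ten students are above average, even though nobody is doing spectacularly well.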

In summary, it is possible to have more than 50% of students performing above average, assuming that we are not using the median. It’s therefore unfair to heckle officials with claims of innumeracy. However, for these targets to be met requires lots of students to do badly. This seems like a poor goal. It’s probably better to try to aim for a more sensible distribution with about half of students performing above average, just like you’d expect.

On symmetry

Dave Green only combs half of his beard, the rest follows by symmetry. — Dave Green Facts

Physicists love symmetry! Using symmetry can dramatically simplify a problem. The concept of symmetry is at the heart of modern theoretical physics and some of the most beautiful of scientific results.

In this post, I’ll give a brief introduction to how physicists think about symmetry. Symmetry can be employed in a number of ways when tackling a problem; we’ll have a look at how they can help you ask the right question and then check that your answer makes sense. In a future post I hope to talk about Noether’s Theorem, my all-time favourite result in theoretical physics, which is deeply entwined with the concept of symmetry. First, we shall discuss what we mean when we talk about symmetry.

What is symmetry?

We say something is symmetric with respect to a particular operation if it is unchanged after that operation. That might sound rather generic, but that’s because the operation can be practically anything. Let’s consider a few examples:

  • Possibly the most familiar symmetry would be reflection symmetry, when something is identical to its mirror image. Something has reflection symmetry if it is invariant under switching left and right. Squares have reflection symmetry along lines in the middle of their sides and along their diagonals, rectangles only have reflection symmetry along the lines in the middle of their sides, and circles have reflection symmetry through any line that goes through their centre.
    The Star Trek Mirror Universe actually does not have reflection symmetry with our own Universe. First, they switch good and evil, rather than left and right, and second, after this transformation, we can tell the two universes apart by checking Spock’s beard.
  • Rotational symmetry is when an object is identical after being rotated. Squares are the same after a 90° rotation, rectangles are the same after a 180° rotation, and circles are the same after a rotation by any angle. There is a link between the rotational symmetry of these shapes and their mirror symmetry: you can combine two reflections to make a rotation. With rotations we have seen that symmetries can either be discrete, as for a square when we have to rotate by multiples of 90°, or continuous, as for the circle where we can pick any angle we like.
  • Translational symmetry is similar to rotational symmetry, but is when an object is the same when shifted along a particular direction. This could be a spatial direction, so shifting everything to the left, or in time. These are a little more difficult to apply to the real world than the simplified models that physicists like to imagine.
    For translational invariance, imagine an infinite, flat plane, the same in all directions. This would be translationally invariant in any direction parallel to the ground. It would be a terrible place to lose your keys. If you can imagine an infinite blob of tangerine jelly, that is entirely the same in all directions, we can translate in any direction we like. We think the Universe is pretty much like this on the largest scales (where details like galaxies are no longer important), except, it’s not quite as delicious.
    The backgrounds in some Scooby-Doo cartoons show periodic translational invariance: they repeat on a loop, so if you translate by the right amount they are the same. This is a discrete symmetry, just like rotating by a fixed angle. Similarly, if you have a rigid daily routine, such that you do the same thing at the same time every day, then your schedule is symmetric with respect to a time translation of 24 hours.
  • Exchange symmetry is when you can swap two (or more) things. If you are building a LEGO model, you can switch two bricks of the same size and colour and end up with the same result, hence it is symmetric under the exchange of those bricks. The idea that we have the same physical system when we swap two particles, like two electrons, is important in quantum mechanics. In my description of translational symmetry, I could equally well have used lime jelly instead of tangerine, or even strawberry, hence the argument is symmetric under exchange of flavours. The symmetry is destroyed should we eat the infinite jelly Universe (we might also get stomach ache).
    Mario and Luigi are not symmetric under exchange, as anyone who has tried to play multiplayer Super Mario Bros. will know, as Luigi is the better jumper and has the better moustache.

There are lots more potential symmetries. Some used by physicists seem quite obscure, such as Lorentz symmetry, but the important thing to remember is that a symmetry of a system means we get the same thing back after a transformation.

Sometimes we consider approximate symmetries, when something is almost the same under a transformation. Coke and Pepsi are approximately exchange symmetric: try switching them for yourself. They are similar, but it is possible to tell them apart. The Earth has approximate rotational symmetry, but it is not exact as it is lumpy. The spaceship at the start of Spaceballs has approximate translational invariance: it just keeps going and going, but the symmetry is not exact as it does end eventually, so the symmetry only applies to the middle.

How to use symmetry

When studying for an undergraduate degree in physics, one of the first things you come to appreciate is that some coordinate systems make problems much easier than others. Coordinates are the set of numbers that describe a position in some space. The most familiar are Cartesian coordinates, where you use x and y to describe horizontal and vertical position respectively. Cartesian coordinates give you a nice grid with everything at right-angles. Undergrad students often like to stick with Cartesian coordinates as they are straightforward and familiar. However, they can be a pain when describing a circle. If we want to plot a line five units from the origin of our coordinate system (0,\,0), we have to solve \sqrt{x^2 + y^2} = 5. However, if we used a polar coordinate system, it would simply be r = 5. By using coordinates that match the symmetry of our system we greatly simplify the problem!

Treasure map

Pirates are trying to figure out where they buried their treasure. They know it’s 5 yarrrds from the doughnut. Calculating positions using Cartesian coordinates is difficult, but they are good for specifying specific locations, like that of the palm tree.

Treasure map

Using polar coordinates, it is easy to specify the location of points 5 yarrrds from the doughnut. Pirates prefer using polar coordinates; they really like using r.

Picking a coordinate system for a problem should depend on the symmetries of the system. If we have a system that is translation invariant, Cartesian coordinates are the best to use. If the system is invariant with respect to translation in the horizontal direction, then we know that our answer should not depend on x. If we have a system that is rotation invariant, polar coordinates are the best, as we should get an answer that doesn’t depend on the rotation angle \varphi. By understanding symmetries, we can formulate our analysis of the problem such that we ask the best questions.
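As a tiny illustration, here’s a Python sketch converting Cartesian positions to polar ones; the three points are arbitrary spots on the pirates’ treasure circle:

    import math

    # Convert Cartesian (x, y) coordinates to polar (r, phi) coordinates.
    def to_polar(x, y):
        r = math.hypot(x, y)       # sqrt(x^2 + y^2)
        phi = math.atan2(y, x)     # angle measured from the x-axis
        return r, phi

    # Three points that all lie 5 units from the origin (the doughnut).
    for x, y in [(5, 0), (3, 4), (0, -5)]:
        r, phi = to_polar(x, y)
        print(f"(x, y) = ({x}, {y})  ->  r = {r:.1f}, phi = {math.degrees(phi):.0f} deg")

However you rotate the map, r stays at 5; only \varphi changes, which is exactly the sort of thing symmetry tells you to expect.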

At the end of my undergrad degree, my friends and I went along to an awards ceremony. I think we were hoping they’d have the miniature éclairs they normally had for special occasions. There was a chap from an evil corporation™ giving away branded clocks that apparently ran on water. We were fairly convinced there was more to it than that, so, as now fully qualified physicists, we thought we should be able to figure it out. We quickly came up with two ideas: that there was some powder inside the water tank that reacted with the water to produce energy, or that the electrodes reacted in a similar way to in a potato clock. We then started to argue about how to figure this out. At this point, Peter Littlewood, then head of the Cavendish Laboratory, wandered over. We explained the problem, but not our ideas. Immediately, he said that it must be to do with the electrodes due to symmetry. Current flows to power the clock. It’ll either flow left to right through the tank, or right to left. It doesn’t matter which, but the important thing is the clock can’t have reflection symmetry. If it did, there would be no preferred direction for the current to flow. To break the symmetry, the two (similar looking) electrodes must actually be different (and hence the potato clock theory is along the right lines). My friends and I all felt appropriately impressed and humbled, but it served as a good reminder that a simple concept like symmetry can be a powerful tool.

A concept I now try to impress upon my students is to use symmetry to guide their answers. Most are happy enough to use symmetry for error checking: if the solution is meant to have rotational symmetry and their answer depends on \varphi they know they’ve made a mistake. However, symmetry can sometimes directly tell you the answer.

Let’s imagine that you’ve baked a perfectly round doughnut, such that it has rotational symmetry. For some reason you sprinkled it with an even coating of electrons instead of hundreds and thousands. We now want to calculate the electric field surrounding the doughnut (for obvious reasons). The electric field tells us which way charges are pushed/pulled. We’d expect positive charges to be attracted towards our negatively charged doughnut. There should be a radial electric field to pull positive charges in, but since it has rotational symmetry, there shouldn’t be any field in the \varphi direction, as there’s no reason for charges to be pulled clockwise or anticlockwise round our doughnut. Therefore, we should be able to write down immediately that the electric field in the \varphi direction is zero, by symmetry.

Most undergrads, though, will feel that this is cheating, and will often attempt to do all the algebra (hopefully using polar coordinates). Some will get this wrong, although there might be a few who are smart enough to note that their answer must be incorrect because of the symmetry. If symmetry tells you the answer, use it! Although it is good to practise your algebra (you get better by training), you can’t learn anything more than you already knew by symmetry. Working efficiently isn’t cheating, it’s smart.

Symmetry is a useful tool for problem solving, and something that everyone should make use of.

An introduction to probability: Leaving nothing to chance

Probabilities and science

Understanding probabilities is important in science. Once you’ve done an experiment, you need to be able to extract from your data information about your theory. Only rarely do you get a simple yes or no: most of the time you have to work with probabilities to quantify your degree of certainty. I’ll (probably) be writing about probabilities in connection with my research, so I thought it would be useful to introduce some of the concepts.

I’ll be writing a series of posts, hopefully going through from the basics to the limits of my understanding. We’ll begin with introducing the concept of probability. There’s a little bit of calculus, but you can skip that without affecting the rest; just remember you can’t grow up to be big and strong if you don’t finish your calculus.

What is a probability?

A probability describes the degree of belief that you have in a proposition. We talk about probabilities quite intuitively: there are some angry-looking, dark clouds overhead and I’ve just lit the barbecue, so it’s probably going to rain; it’s more likely that United will win this year’s sportsball league than Rovers, or it’s more credible that Ted is exaggerating in his anecdote than he actually ate that much fudge…

We formalise the concept of a probability, so that it can be used in calculations, by assigning them numerical values (not by making them wear a bow-tie, although that is obviously cool). Conventionally, we use 0 for impossible, 1 for certain and the range in between for intermediate probabilities. For example, if we were tossing a coin, we might expect it to be heads half the time, hence the probability of heads is P(\mathrm{head}) = 1/2, or if rolling a die, the probability of getting a six is P(6) = 1/6.

For both the coin and the die we have a number of equally probable outcomes: two for the coin (heads and tails) and six for the die (1, 2, 3, 4, 5 and 6). This does not have to be the case: imagine picking a letter at random from a sample of English text. Some letters are more common than others—this is why different letters have different values in Scrabble and why hangman can be tricky. The most frequent letter is “e”, the probability of picking it is about 0.12, and the least frequent is “z”, the probability of picking that is just 0.0007.

Often we consider a parameter that has a continuous range, rather than discrete values (as in the previous examples). For example, I might be interested in the mass of a black hole, which can have any positive value. We then use a probability density function p(x) such that the probability for the parameter lies in the range a \leq x \leq b is given by the integral

\displaystyle P(a \leq x \leq b) = \int_a^b p(x)\, \mathrm{d}x.

Performing an integral is just calculating the area under a curve; it can be thought of as the equivalent of adding up an infinite number of infinitely closely spaced slices. Returning to how much fudge Ted actually ate, we might want to find the probability that he ate a mass of fudge m that was larger than zero, but smaller than the fatal dose M. If we had a probability density function p(m), we would calculate

\displaystyle P(0 < m \leq M) = \int_0^{M} p(m)\, \mathrm{d}m.

The probability density is largest where the probability is greatest and smallest where the probability is smallest, as you’d expect. Calculating probabilities and probability distributions is, in general, a difficult problem; it’s actually what I spend a lot of my time doing. We’ll return to calculating probabilities later.
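To make the integral a little more concrete, here’s a sketch that numerically adds up those slices for a probability density; the exponential form for Ted’s fudge consumption (and the value of tau) is entirely made up for illustration:

    import math

    # Numerically integrate a probability density p(m) between a and b using
    # the trapezium rule (adding up lots of thin slices). The exponential
    # density here is invented purely as an example; tau sets the typical
    # mass of fudge consumed.
    def p(m, tau=0.5):
        return math.exp(-m / tau) / tau

    def probability(a, b, n=10_000):
        dm = (b - a) / n
        masses = [a + i * dm for i in range(n + 1)]
        values = [p(m) for m in masses]
        return dm * (0.5 * values[0] + sum(values[1:-1]) + 0.5 * values[-1])

    M = 1.0   # a pretend fatal dose, in kilograms of fudge
    print(f"P(0 < m <= M) = {probability(0.0, M):.3f}")

With these made-up numbers the answer comes out at about 0.86; the exact integral is 1 - exp(-M/tau), so you can check the slicing is doing its job.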

Combining probabilities

There are several recipes for combining probabilities to construct other probabilities, just like there are recipes to combine sugar and dairy to make fudge. Admittedly, probabilities are less delicious than fudge, but they are also less likely to give you cavities. If we have a set of disjoint outcomes, we can work out the probability of that set by adding up the probabilities of the individual outcomes. For example, when rolling our die, the probability of getting an even number is

\displaystyle P(\mathrm{even}) = P(2) + P(4) + P(6) = \frac{1}{6} +\frac{1}{6} +\frac{1}{6} = \frac{1}{2}.

(This is similar to what we’re doing when integrating up the probability density function for continuous distributions: there we’re adding up the probability that the variable x is in each infinitesimal range \mathrm{d}x).

If we have two independent events, then the probability of both of them occurring is calculated by multiplying the two individual probabilities together. For example, we could consider the probability of rolling a six and the probability of Ted surviving eating the lethal dose of fudge, then

\displaystyle P(\mathrm{6\: and\: survive}) = P(6) \times P(\mathrm{survive}).

The most commonly quoted quantity for a lethal dose is the median lethal dose or LD50, which is the dose that kills half the population, so we can take the probability of surviving to be 0.5. Thus,

\displaystyle P(\mathrm{6\: and\: survive}) = P(6) \times P(\mathrm{survive}) = \frac{1}{12} .

Events are independent if they don’t influence each other. Rolling a six shouldn’t influence Ted’s medical condition, and Ted’s survival shouldn’t influence the roll of a die, so these events are independent.

Things are more interesting when events are not independent. We then have to deal with conditional probabilities: the conditional probability P(\mathrm{A}|\mathrm{B}) is the probability of \mathrm{A} given that \mathrm{B} is true. For example, if I told you that I rolled an even number, the probability of me having rolled a six is P(6|\mathrm{even}) = 1/3. If I told you that I have rolled a six, then the probability of me having rolled an even number is P(\mathrm{even}|6) = 1—it’s a dead cert, so bet all your fudge on that! When combining probabilities from dependent events, we chain probabilities together. The probability of rolling a six and an even number is the probability of rolling an even number multiplied by the probability of rolling a six given that I rolled an even number

\displaystyle P(\mathrm{6\: and\: even}) = P(6|\mathrm{even}) \times P(\mathrm{even})= \frac{1}{3} \times \frac{1}{2} = \frac{1}{6},

or equivalently the probability of rolling a six multiplied by the probability of rolling an even number given that I rolled a six

\displaystyle P(\mathrm{6\: and\: even}) = P(\mathrm{even} | 6) \times P(6) = 1 \times \frac{1}{6} = \frac{1}{6}.

Reassuringly, we do get the same answer. This is a bit of a silly example, as we know that if we’ve rolled a six we have rolled an even number, so all we are doing is calculating the probability of rolling a six.
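You can check these conditional probabilities by brute-force enumeration over the faces of the die; a quick sketch:

    from fractions import Fraction

    # Enumerate the faces of a fair die and count outcomes to check
    # P(6 | even), P(even | 6) and P(6 and even).
    faces = list(range(1, 7))
    even_faces = [x for x in faces if x % 2 == 0]
    sixes = [x for x in faces if x == 6]

    p_6_given_even = Fraction(even_faces.count(6), len(even_faces))
    p_even_given_6 = Fraction(sum(1 for x in sixes if x % 2 == 0), len(sixes))
    p_6_and_even = Fraction(sum(1 for x in faces if x == 6 and x % 2 == 0), len(faces))

    print(f"P(6 | even)   = {p_6_given_even}")   # 1/3
    print(f"P(even | 6)   = {p_even_given_6}")   # 1
    print(f"P(6 and even) = {p_6_and_even}")     # 1/6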

We can use conditional probabilities for independent events: this is really easy as the conditional probability is just the straight probability. The probability of Ted surviving his surfeit of fudge given that I rolled a six is just the probability of him surviving, P(\mathrm{survive}|6) = P(\mathrm{survive}).

Let’s try a more complicated example: let’s imagine that Ted is playing fudge roulette. This is like Russian roulette, except you roll a die and if it comes up six, then you have to eat the lethal dose of fudge. His survival probability now depends on the roll of the die. We want to calculate the probability that Ted will live to tomorrow. If Ted doesn’t roll a six, we’ll assume that he has a 100% survival rate (based on that one anecdote where he claims to have created a philosopher’s stone by soaking duct tape in holy water); this isn’t quite right, but is good enough. The probability of Ted not rolling a six and surviving is

\displaystyle P(\mathrm{not\: 6\: and\: survive}) = P(\mathrm{survive} | \mathrm{not\: 6}) \times P(\mathrm{not\: 6}) = 1 \times \frac{5}{6} = \frac{5}{6}.

The probability of Ted rolling a six (and eating the fudge) and then surviving is

\displaystyle P(\mathrm{6\: and\: survive}) = P(\mathrm{survive} | \mathrm{6}) \times P(\mathrm{6}) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12}.

We have two disjoint outcomes (rolling a six and surviving, and not rolling a six and surviving), so the total probability of surviving is given by the sum

\displaystyle P(\mathrm{survive}) =P(\mathrm{not\: 6\: and\: survive}) +P(\mathrm{6\: and\: survive}) = \frac{5}{6} +\frac{1}{12} =\frac{11}{12}.

It seems likely that he’ll make it, although fudge roulette is still a dangerous game!

There’s actually an easier way of calculating the probability that Ted survives. There are only two possible outcomes: Ted survives or he doesn’t. Since one of these must happen, their probabilities must add to one: the survival probability is

P(\mathrm{survive}) = 1 - P(\mathrm{not\: survive}).

We’ve already seen this, as we’ve used the fact that the probability of not rolling a six is P(\mathrm{not\: 6}) = 1 - P(6) = 5/6. The probability of not surviving is much easier to work out as there’s only one way that can happen: rolling a six and then overdosing on fudge. The probability is

\displaystyle P(\mathrm{not\: surviving}) = P(\mathrm{fudge\: overdose}|6) \times P(6) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12},

and so the survival probability is P(\mathrm{survive}) = 1 - 1/12 = 11/12, exactly as before, but in fewer steps.
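If you’d rather let a computer do the worrying, here’s a quick simulation of fudge roulette, using the 50% survival chance for the overdose from the LD50 above:

    import random

    # Simulate many rounds of fudge roulette: roll a die; on a six, eat the
    # lethal dose and survive with probability 0.5; otherwise survive for sure.
    def plays_and_survives():
        if random.randint(1, 6) == 6:
            return random.random() < 0.5   # 50:50 on the LD50
        return True

    trials = 100_000
    survived = sum(plays_and_survives() for _ in range(trials))

    print(f"Simulated survival probability: {survived / trials:.3f}")
    print(f"Exact answer: {11 / 12:.3f}")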

In a future post we’ll try working out the probability that Ted did eat a lethal dose of fudge given that he is alive to tell the anecdote. This is known as an inverse problem, and is similar to what scientists do all the time. We do experiments and get data, then we need to work out the probability of our theory (that Ted ate the fudge) being correct given the data (that he’s still alive).

Interpreting probabilities

We have now discussed what a probability is and how we can combine them. We should now think about how to interpret them. It’s easy enough to understand that a probability of 0.05 means that we expect something should happen on average once in 20 times, and that it is more probable than something with a probability of 0.01, but less likely than something with a probability of 0.10. However, we are not good at having an intuitive understanding of probabilities.

Consider the case that a scientist announces a result with 95% confidence. That sounds pretty good. Think how surprised you would be (assuming that their statistics are all correct) if the result turned out to be wrong. I feel like I would be pretty surprised. Now consider rolling two dice: how surprised would you be if you rolled two sixes? The probability of the result being wrong is 1 - 0.95 = 0.05, or one in twenty. The probability of rolling two sixes is 1/6 \times 1/6 = 1/36, or about one in forty. Hence, you should be almost twice as surprised by rolling a double six as by a 95% confidence-level result being incorrect.

When dealing with probabilities, I find it useful to make a comparison to something familiar. While Ted is more likely than not to survive fudge roulette, there is a one in twelve chance of dying. That’s three times as likely as rolling a double six, or as probable as rolling a six and getting heads. That’s riskier than I’d like, so I’m going to stick to consuming fudge in moderation.