Of p and sigma

Ask a doctor or a psychologist if they’re sure about something, and they might say “it has p<0.05”. Ask a physicist, and they’ll say it’s a “5 sigma result”. On the surface, they sound like they’re talking about completely different things. As it turns out, they’re not quite that different.

Whether it’s a p-value or a sigma, what scientists are giving you is shorthand for a probability. The p-value is the probability itself, while sigma tells you how many standard deviations something is away from the mean on a normal distribution. For people not used to statistics this might sound very complicated, but it’s not so tricky in the end. There’s a bell-shaped curve, called a normal distribution, and you can look at how much of the area under it lies above a certain point, with the distance from the center measured in units called standard deviations, or “sigmas”. That gives you your probability.

What are these numbers a probability of? At first, you might think they’re a probability of the scientist being right: of the medicine working, or the Higgs boson being there.

That would be reasonable, but it’s not how it works. Scientists can’t measure the chance they’re right. All they can do is compare models. When a scientist reports a p-value, what they’re doing is comparing to a kind of default model, called a “null hypothesis”. There are different null hypotheses for different experiments, depending on what the scientists want to test. For the Higgs, scientists looked at pairs of photons detected by the LHC. The null hypothesis was that these photons were created by other parts of the Standard Model, like the strong force, and not by a Higgs boson. For medicine, the null hypothesis might be that people get better on their own after a certain amount of time. That’s hard to estimate, which is why medical experiments use a control group: a similar group without the medicine, to see how much they get better on their own.

Once we have a null hypothesis, we can use it to estimate how likely a result like the experiment’s would be if the null hypothesis were all that was going on. If there were no Higgs, and all those photons just came from other particles, what’s the chance there would still be a giant pile of them at one specific energy? If the medicine didn’t do anything, what’s the chance the control group would still do that much worse than the treatment group?
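To make that question concrete, here is a minimal sketch of a counting experiment, written in Python. Everything in it is an assumption made for illustration: the function name `poisson_tail`, the idea that the null hypothesis predicts a Poisson-distributed number of events, and the made-up counts (100 expected, 150 seen). Real analyses are far more involved.

```python
import math

def poisson_tail(observed, expected):
    """Chance of seeing `observed` or more events when the null
    hypothesis predicts `expected` events on average (Poisson counts)."""
    # Add up the probabilities of every count below `observed`,
    # working with logarithms so large factorials don't overflow.
    below = sum(
        math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
        for k in range(observed)
    )
    # Whatever is left over is the chance of `observed` or more.
    # (For extremely small tails this subtraction loses precision;
    # it is fine for an illustration.)
    return max(0.0, 1.0 - below)

# Hypothetical numbers: the null hypothesis predicts 100 events, we saw 150.
p_value = poisson_tail(150, 100.0)
print(p_value)
```

The bigger the pile of events compared to what the null hypothesis predicts, the smaller this number gets.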

Ideally, you want a small probability here. In medicine and psychology, you’re looking for a probability below 5%, for p<0.05. In physics, you need 5 sigma to make a discovery, which corresponds to a probability of about one in 3.5 million. If the probability is low, then you can say that it would be quite unlikely for your result to happen if the null hypothesis were true. If you’ve got a better hypothesis (the Higgs exists, the medicine works), then you should pick that instead.
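If you want to check the one in 3.5 million figure yourself, the conversion between sigma and probability is short. The sketch below assumes the one-sided convention (only the area above the threshold counts), which is the one that matches that figure. The function names `sigma_to_p` and `p_to_sigma` are made up for this example; it uses only Python’s standard library.

```python
import math
from statistics import NormalDist

def sigma_to_p(n_sigma):
    """One-sided probability of landing more than n_sigma standard
    deviations above the mean of a normal distribution."""
    # erfc keeps its precision in the far tail, where 1 - cdf would not.
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

def p_to_sigma(p):
    """Inverse conversion: how many sigmas a one-sided probability p is."""
    return NormalDist().inv_cdf(1.0 - p)

p_five_sigma = sigma_to_p(5)
print(p_five_sigma)        # roughly 2.9e-7
print(1 / p_five_sigma)    # roughly 3.5 million
print(p_to_sigma(0.05))    # p = 0.05 is between 1 and 2 sigma
```

Under this convention, the p<0.05 threshold sits well under 2 sigma, which shows just how much stricter the physicists’ standard is.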

Note that this probability still uses a model: it’s the probability of the result given that the model is true. It isn’t the probability that the model is true, given the result. That probability is more important to know, but trickier to calculate. To get from one to the other, you need to include more assumptions: about how likely your model was to begin with, given everything else you know about the world. Depending on those assumptions, even the tiniest p-value might not show that your null hypothesis is wrong.
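The step from one probability to the other is Bayes’ theorem, and a rough sketch shows how much the starting assumptions matter. The code below is a simplification: it treats the p-value as if it were the probability of the result under the null hypothesis (strictly, a p-value is the chance of a result at least that extreme), and it assumes the alternative hypothesis would produce the result with certainty. The function name and all the numbers are invented for illustration.

```python
def posterior_null(prior_null, p_result_given_null, p_result_given_alt):
    """Probability that the null hypothesis is true after seeing the
    result, via Bayes' theorem with two competing hypotheses."""
    prior_alt = 1.0 - prior_null
    null_weight = p_result_given_null * prior_null
    alt_weight = p_result_given_alt * prior_alt
    return null_weight / (null_weight + alt_weight)

# Same result (probability 0.05 under the null), two different starting points.
open_minded = posterior_null(0.5, 0.05, 1.0)       # both hypotheses equally likely beforehand
skeptical = posterior_null(0.999999, 0.05, 1.0)    # alternative thought to be a one-in-a-million shot

print(open_minded)
print(skeptical)
```

With the open-minded starting point, the null hypothesis ends up looking unlikely. With the skeptical one, it remains almost certainly true, even though the p-value is identical.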

In practice, unfortunately, we usually can’t estimate all of those assumptions in detail. The best we can do is guess their effect, in a very broad way. That usually just means accepting a threshold for p-values, declaring some a discovery and others not. That limitation is part of why medicine and psychology demand p-values of 0.05, while physicists demand 5 sigma results. Medicine and psychology have some assumptions they can rely on: that people function like people, that biology and physics keep working. Physicists don’t have those assumptions, so we have to be extra-strict.

Ultimately, though, we’re all asking the same kind of question. And now you know how to understand it when we do.