# Of p and sigma

Ask a doctor or a psychologist if they’re sure about something, and they might say “it has p<0.05”. Ask a physicist, and they’ll say it’s a “5 sigma result”. On the surface, they sound like they’re talking about completely different things. As it turns out, they’re not quite that different.

Whether it’s a p-value or a sigma, what scientists are giving you is shorthand for a probability. The p-value is the probability itself, while sigma tells you how many standard deviations something is away from the mean on a normal distribution. For people not used to statistics this might sound very complicated, but it’s not so tricky in the end. There’s a graph, called a normal distribution, and you can look at how much of it is above a certain point, measured in units called standard deviations, or “sigmas”. That gives you your probability.

What are these numbers a probability of? At first, you might think they’re a probability of the scientist being right: of the medicine working, or the Higgs boson being there.

That would be reasonable, but it’s not how it works. Scientists can’t measure the chance they’re right. All they can do is compare models. When a scientist reports a p-value, what they’re doing is comparing to a kind of default model, called a “null hypothesis”. There are different null hypotheses for different experiments, depending on what the scientists want to test. For the Higgs, scientists looked at pairs of photons detected by the LHC. The null hypothesis was that these photons were created by other parts of the Standard Model, like the strong force, and not by a Higgs boson. For medicine, the null hypothesis might be that people get better on their own after a certain amount of time. That’s hard to estimate, which is why medical experiments use a control group: a similar group without the medicine, to see how much they get better on their own.

Once we have a null hypothesis, we can use it to estimate how likely the result of the experiment would be if that hypothesis were true. If there was no Higgs, and all those photons just came from other particles, what’s the chance there would still be a giant pile of them at one specific energy? If the medicine didn’t do anything, what’s the chance the control group did that much worse than the treatment group?

Ideally, you want a small probability here. In medicine and psychology, you’re looking for a probability below 5%, written p<0.05. In physics, you need 5 sigma to make a discovery, which corresponds to a probability of about one in 3.5 million. If the probability is low, then you can say that it would be quite unlikely for your result to happen if the null hypothesis were true. If you’ve got a better hypothesis (the Higgs exists, the medicine works), then you should pick that instead.
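If you want to check those numbers yourself, here is a minimal sketch of the conversion between sigmas and probabilities. It assumes the one-sided convention (the area of the normal distribution above a given number of sigmas), which is the usual choice in particle physics; medical studies often use two-sided tests instead, which would change the numbers somewhat. The function names `sigma_to_p` and `p_to_sigma` are my own, chosen for illustration.

```python
import math
from statistics import NormalDist


def sigma_to_p(n_sigma: float) -> float:
    """One-sided tail probability of a standard normal above n_sigma."""
    # erfc keeps precision for tiny tails, where 1 - cdf would round badly.
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


def p_to_sigma(p: float) -> float:
    """Number of sigmas whose one-sided tail probability equals p."""
    # By symmetry, the upper-tail point is minus the lower-tail quantile.
    return -NormalDist().inv_cdf(p)


if __name__ == "__main__":
    p5 = sigma_to_p(5.0)
    print(f"5 sigma  -> p = {p5:.3g} (about 1 in {1 / p5:,.0f})")
    print(f"p = 0.05 -> {p_to_sigma(0.05):.2f} sigma")
```

Under that convention, 5 sigma comes out near one in 3.5 million, while p = 0.05 is only about 1.64 sigma. That gap is the difference in strictness between the two thresholds.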

Note that this probability still uses a model: it’s the probability of the result given that the model is true. It isn’t the probability that the model is true, given the result. That probability is more important to know, but trickier to calculate. To get from one to the other, you need to include more assumptions: about how likely your model was to begin with, given everything else you know about the world. Depending on those assumptions, even the tiniest p-value might not show that your null hypothesis is wrong.
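To see how much those extra assumptions matter, here is a rough sketch using Bayes' theorem with two competing models. Every number in it is made up for illustration: the prior for the null hypothesis, and the chance of seeing the result under each model. It also simplifies by treating the chance of the result under the null as if it were the p-value itself, which is not strictly correct (a p-value is a tail probability), so take it as a cartoon rather than a recipe.

```python
def posterior_null(prior_null: float,
                   p_result_given_null: float,
                   p_result_given_alt: float) -> float:
    """Bayes' theorem for two models: P(null is true | result)."""
    prior_alt = 1.0 - prior_null
    # Total probability of the result, summed over both models.
    evidence = (prior_null * p_result_given_null
                + prior_alt * p_result_given_alt)
    return prior_null * p_result_given_null / evidence


if __name__ == "__main__":
    # Hypothetical numbers: the result has a 5% chance under the null
    # and an 80% chance under the alternative.
    for prior_null in (0.5, 0.999, 0.999999):
        post = posterior_null(prior_null, 0.05, 0.8)
        print(f"prior P(null) = {prior_null}: posterior P(null) = {post:.4f}")
```

With an even prior, the same result leaves the null hypothesis with only about a 6% chance. If the null was overwhelmingly likely to begin with, it stays the better bet even after a result it only produces 5% of the time.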

In practice, unfortunately, we usually can’t estimate all of those assumptions in detail. The best we can do is guess their effect, in a very broad way. That usually just means accepting a threshold for p-values, declaring some a discovery and others not. That limitation is part of why medicine and psychology demand p-values of 0.05, while physicists demand 5 sigma results. Medicine and psychology have some assumptions they can rely on: that people function like people, that biology and physics keep working. Physicists don’t have those assumptions, so we have to be extra-strict.

Ultimately, though, we’re all asking the same kind of question. And now you know how to understand it when we do.

# Halloween Post: Superstimuli for Physicists

For Halloween, this blog has a tradition of covering “the spooky side” of physics. This year, I’m bringing in a concept from biology to ask a spooky physics “what if?”

In the 1950s, biologists discovered that birds were susceptible to a worryingly effective trick. When given artificial eggs larger and brighter than their own, the birds focused on the new eggs to the exclusion of their real ones. They couldn’t help trying to hatch the fake eggs, even if the eggs were so large that the birds would fall off when they tried to sit on them. The effect, since observed in other species, became known as a supernormal stimulus, or superstimulus.

Can this happen to humans? Some think so. They worry about junk food we crave more than actual nutrients, or social media that eclipses our real relationships. Naturally, this idea inspires horror writers, who write about haunting music you can’t stop listening to, or holes in a wall that “fit” so well you’re compelled to climb in.

(And yes, it shows up in porn as well.)

But this is a physics blog, not a biology blog. What kind of superstimulus would work on physicists?

Well for one, this sounds a lot like some criticisms of string theory. Instead of a theory that just unifies some forces, why not unify all the forces? Instead of just learning some advanced mathematics, why not learn more, and more? And if you can’t be falsified by any experiment, well, all that would do is spoil the fun, right?

But it’s not just string theory you could apply this logic to. Astrophysicists study not just one world but many. Cosmologists study the birth and death of the entire universe. Particle physicists study the fundamental pieces that make up the fundamental pieces. We all partake in the euphoria of problem-solving, a perpetual rush where each solution leads to yet another question.

Do I actually think that string theory is a superstimulus, that astrophysics or particle physics is a superstimulus? In a word, no. Much as it might look that way from the news coverage, most physicists don’t work on these big, flashy questions. Far from being lured in by irresistible super-scale problems, most physicists work with tabletop experiments and useful materials. For those of us who do look up at the sky or down at the roots of the world, we do it not just because it’s compelling but because it has a good track record: physics wouldn’t exist if Newton hadn’t cared about the orbits of the planets. We study extremes because they advance our understanding of everything else, because they give us steam engines and transistors and change everyone’s lives for the better.

Then again, if I had fallen victim to a superstimulus, I’d say that anyway, right?

*cue spooky music*

# Congratulations to Arthur Ashkin, Gérard Mourou, and Donna Strickland!

The 2018 Physics Nobel Prize was announced this week, awarded to Arthur Ashkin, Gérard Mourou, and Donna Strickland for their work in laser physics.

Some Nobel prizes recognize discoveries of the fundamental nature of reality. Others recognize the tools that make those discoveries possible.

Ashkin developed techniques that use lasers to hold small objects in place, culminating in “optical tweezers” that can pick up and move individual bacteria. Mourou and Strickland developed chirped pulse amplification, the current state of the art in extremely high-power lasers. Strickland is only the third woman to win the Nobel prize in physics, and Ashkin, at 96, is the oldest person ever to win the prize.

(As an aside, the phrase “optical tweezers” probably has you imagining two beams of laser light pinching a bacterium between them, like microscopic lightsabers. In fact, optical tweezers use a single beam, focused and bent so that if an object falls out of place it will gently roll back to the middle of the beam. Instead of tweezers, it’s really more like a tiny laser spoon.)

The Nobel announcement emphasizes practical applications, like eye surgery. It’s important to remember that these are research tools as well. I wouldn’t have recognized the names of Ashkin, Mourou, and Strickland, but I recognized atom trapping, optical tweezers, and ultrashort pulses. Hang around atomic physicists, or quantum computing experiments, and these words pop up again and again. These are essential tools that have given rise to whole subfields. LIGO won a Nobel based on the expectation that it would kick-start a vast new area of research. Ashkin, Mourou, and Strickland’s work already has.

# Different Fields, Different Worlds

My grandfather is a molecular biologist. When we meet, we swap stories: the state of my field and his, different methods and focuses but often a surprising amount of common ground.

Recently he forwarded me an article by Raymond Goldstein, a biological physicist, arguing that biologists ought to be more comfortable with physical reasoning. The article is interesting in its own right, contrasting how physicists and biologists think about the relationship between models, predictions, and experiments. But what struck me most about the article wasn’t the content, but the context.

Goldstein’s article focuses on a question that seemed to me oddly myopic: should physical models be in the Results section, or the Discussion section?

As someone who has never written a paper with either a Results section or a Discussion section, I wondered why anyone would care. In my field, paper formats are fairly flexible. We usually have an Introduction and a Conclusion, yes, but in between we use however many sections we need to explain what we need to. In contrast, biology papers seem to have a very fixed structure: after the Introduction, there’s a Results section, a Discussion section, and a Materials and Methods section at the end.

At first blush, this seemed incredibly bizarre. Why describe your results before the methods you used to get them? How do you talk about your results without discussing them, but still take a full section to do it? And why do reviewers care how you divide things up in the first place?

It made a bit more sense once I thought about how biology differs from theoretical physics. In theoretical physics, the “methods” are most of the result: unsolved problems are usually unsolved because existing methods don’t solve them, and we need to develop new methods to make progress. Our “methods”, in turn, are often the part of the paper experts are most eager to read. In biology, in contrast, the methods are much more standardized. While papers will occasionally introduce new methods, there are so many unexplored biological phenomena that most of the time researchers don’t need to invent a new method: just asking a question no-one else has asked can be enough for a discovery. In that environment, the “results” matter a lot more: they’re the part that takes the most scrutiny, that needs to stand up on its own.

I can even understand the need for a fixed structure. Biology is a much bigger field than theoretical physics. My field is small enough that we all pretty much know each other. If a paper is hard to read, we’ll probably get a chance to ask the author what they meant. Biology, in contrast, is huge. An important result could come from anywhere, and anyone. Having a standardized format makes it a lot easier to scan through an unfamiliar paper and find what you need, especially when there might be hundreds of relevant papers.

The problem with a standardized system, as always, is the existence of exceptions. A more “physics-like” biology paper is more readable with “physics-like” conventions, even if the rest of the field needs to stay “biology-like”. Because of that, I have a lot of sympathy for Goldstein’s argument, but I can’t help but feel that he should be asking for more. If creating new mathematical models and refining them with observation is at the heart of what Goldstein is doing, then maybe he shouldn’t have to use Results/Discussion/Methods in the first place. Maybe he should be allowed to write biology papers that look more like physics papers.