# Shape the Science to the Statistics, Not the Statistics to the Science

In theatre, and more generally in writing, the advice is always to “show, don’t tell”. You could just tell your audience that Long John Silver is a ruthless pirate, but it works a lot better to show him marching a prisoner off the plank. Rather than just informing with words, you want to make things as concrete as possible, with actions.

There is a similar rule in pedagogy. Pedagogy courses teach you to be explicit about your goals, planning a course by writing down Intended Learning Outcomes. (They never seem amused when I ask about the Unintended Learning Outcomes.) At first, you’d want to write down outcomes like “students will understand calculus” or “students will know what a sine is”. These, however, are hard to judge, and thus hard to plan around. Instead, the advice is to write outcomes that correspond to actions you want the students to take, things you want them to be capable of doing: “students can perform integration by parts” “students can decide correctly whether to use a sine or cosine”. Again and again, the best way to get the students to know something is to get them to do something.

Jay Daigle recently finished a series of blog posts on how scientists use statistics to test hypotheses. I recommend it, it’s a great introduction to the concepts scientists use to reason about data, as well as a discussion of how they often misuse those concepts and what they can do better. I have a bit of a different perspective on one of the “takeaways” of the post, and I wanted to highlight that here.

The center of Daigle’s point is a tool, widely used in science, called Neyman-Pearson Hypothesis Testing. Neyman-Pearson is a tool for making decisions involving a threshold for significance: a number that scientists often call a p-value. If you follow the procedure, only acting when you find a p-value below 0.05, then you will only be wrong 5% of the time: specifically, that will be your rate of false positives, the percent of the time you conclude some action works when it really doesn’t.

A core problem, from Daigle’s perspective, is that scientists use Neyman-Pearson for the wrong purpose. Neyman-Pearson is a tool for making decisions, not a test that tells you whether or not a specific claim is true. It tells you “on average, if I approve drugs when their p-value is below 0.05, only 5% of them will fail”. That’s great if you can estimate how bad it is to deny a drug that should be approved, how bad it is to approve a drug that should be denied, and calculate out on average how often you can afford to be wrong. It doesn’t tell you anything about the specific drug, though. It doesn’t tell you “every drug with a p-value below 0.05 works”. It certainly doesn’t tell you “a drug with a p-value of 0.051 almost works” or “a drug with a p-value of 0.001 definitely works”. It just doesn’t give you that information.

In later posts, Daigle suggests better tools, which he argues map better to what scientists want to do, as well as general ways scientists can do better. Section 4. in particular focuses on the idea that one thing scientists need to do is ask better questions. He uses a specific example from cognitive psychology, a study that tests whether describing someone’s face makes you worse at recognizing it later. That’s a clear scientific question, one that can be tested statistically. That doesn’t mean it’s a good question, though. Daigle points out that questions like this have a problem: it isn’t clear what the result actually tells us.

Here’s another example of the same problem. In grad school, I knew a lot of social psychologists. One was researching a phenomenon called extended contact. Extended contact is meant to be a foil to another phenomenon called direct contact, both having to do with our views of other groups. In direct contact, making a friend from another group makes you view that whole group better. In extended contact, making a friend who has a friend from another group makes you view the other group better.

The social psychologist was looking into a concrete-sounding question: which of these phenomena, direct or extended contact, is stronger?

At first, that seems like it has the same problem as Daigle’s example. Suppose one of these effects is larger: what does that mean? Why do we care?

Well, one answer is that these aren’t just phenomena: they’re interventions. If you know one phenomenon is stronger than another, you can use that to persuade people to be more accepting of other groups. The psychologist’s advisor even had a procedure to make people feel like they made a new friend. Armed with that, it’s definitely useful to know whether extended contact or direct contact is better: whichever one is stronger is the one you want to use!

You do need some “theory” behind this, of course. You need to believe that, if a phenomenon is stronger in your psychology lab, it will be stronger wherever you try to apply it in the real world. It probably won’t be stronger every single time, so you need some notion of how much stronger it needs to be. That in turn means you need to estimate costs: what it costs if you pick the weaker one instead, how much money you’re wasting or harm you’re doing.

You’ll notice this is sounding a lot like the requirements I described earlier, for Neyman-Pearson. That’s not accident: as you try to make your science more and more clearly defined, it will get closer and closer to a procedure to make a decision, and that’s exactly what Neyman-Pearson is good for.

So in the end I’m quite a bit more supportive of Neyman-Pearson than Daigle is. That doesn’t mean it isn’t being used wrong: most scientists are using it wrong. Instead of calculating a p-value each time they make a decision, they do it at the end of a paper, misinterpreting it as evidence that one thing or another is “true”. But I think that what these scientists need to do is not chance their statistics, but change their science. If they focused their science on making concrete decisions, they would actually be justified in using Neyman-Pearson…and their science would get a lot better in the process.

# The Most Anthropic of All Possible Worlds

Today, we’d call Leibniz a mathematician, a physicist, and a philosopher. As a mathematician, Leibniz turned calculus into something his contemporaries could actually use. As a physicist, he championed a doomed theory of gravity. In philosophy, he seems to be most remembered for extremely cheaty arguments.

I don’t blame him for this. Faced with a tricky philosophical problem, it’s enormously tempting to just blaze through with an answer that makes every subtlety irrelevant. It’s a temptation I’ve succumbed to time and time again. Faced with a genie, I would always wish for more wishes. On my high school debate team, I once forced everyone at a tournament to switch sides with some sneaky definitions. It’s all good fun, but people usually end up pretty annoyed with you afterwards.

People were annoyed with Leibniz too, especially with his solution to the problem of evil. If you believe in a benevolent, all-powerful god, as Leibniz did, why is the world full of suffering and misery? Leibniz’s answer was that even an all-powerful god is constrained by logic, so if the world contains evil, it must be logically impossible to make the world any better: indeed, we live in the best of all possible worlds. Voltaire famously made fun of this argument in Candide, dragging a Leibniz-esque Professor Pangloss through some of the most creative miseries the eighteenth century had to offer. It’s possibly the most famous satire of a philosopher, easily beating out Aristophanes’ The Clouds (which is also great).

Physicists can also get accused of cheaty arguments, and probably the most mocked is the idea of a multiverse. While it hasn’t had its own Candide, the multiverse has been criticized by everyone from bloggers to Nobel prizewinners. Leibniz wanted to explain the existence of evil, physicists want to explain “unnaturalness”: the fact that the kinds of theories we use to explain the world can’t seem to explain the mass of the Higgs boson. To explain it, these physicists suggest that there are really many different universes, separated widely in space or built in to the interpretation of quantum mechanics. Each universe has a different Higgs mass, and ours just happens to be the one we can live in. This kind of argument is called “anthropic” reasoning. Rather than the best of all possible worlds, it says we live in the world best-suited to life like ours.

I called Leibniz’s argument “cheaty”, and you might presume I think the same of the multiverse. But “cheaty” doesn’t mean “wrong”. It all depends what you’re trying to do.

Leibniz’s argument and the multiverse both work by dodging a problem. For Leibniz, the problem of evil becomes pointless: any evil might be necessary to secure a greater good. With a multiverse, naturalness becomes pointless: with many different laws of physics in different places, the existence of one like ours needs no explanation.

In both cases, though, the dodge isn’t perfect. To really explain any given evil, Leibniz would have to show why it is secretly necessary in the face of a greater good (and Pangloss spends Candide trying to do exactly that). To explain any given law of physics, the multiverse needs to use anthropic reasoning: it needs to show that that law needs to be the way it is to support human-like life.

This sounds like a strict requirement, but in both cases it’s not actually so useful. Leibniz could (and Pangloss does) come up with an explanation for pretty much anything. The problem is that no-one actually knows which aspects of the universe are essential and which aren’t. Without a reliable way to describe the best of all possible worlds, we can’t actually test whether our world is one.

The same problem holds for anthropic reasoning. We don’t actually know what conditions are required to give rise to people like us. “People like us” is very vague, and dramatically different universes might still contain something that can perceive and observe. While it might seem that there are clear requirements, so far there hasn’t been enough for people to do very much with this type of reasoning.

However, for both Leibniz and most of the physicists who believe anthropic arguments, none of this really matters. That’s because the “best of all possible worlds” and “most anthropic of all possible worlds” aren’t really meant to be predictive theories. They’re meant to say that, once you are convinced of certain things, certain problems don’t matter anymore.

Leibniz, in particular, wasn’t trying to argue for the existence of his god. He began the argument convinced that a particular sort of god existed: one that was all-powerful and benevolent, and set in motion a deterministic universe bound by logic. His argument is meant to show that, if you believe in such a god, then the problem of evil can be ignored: no matter how bad the universe seems, it may still be the best possible world.

Similarly, the physicists convinced of the multiverse aren’t really getting there through naturalness. Rather, they’ve become convinced of a few key claims: that the universe is rapidly expanding, leading to a proliferating multiverse, and that the laws of physics in such a multiverse can vary from place to place, due to the huge landscape of possible laws of physics in string theory. If you already believe those things, then the naturalness problem can be ignored: we live in some randomly chosen part of the landscape hospitable to life, which can be anywhere it needs to be.

So despite their cheaty feel, both arguments are fine…provided you agree with their assumptions. Personally, I don’t agree with Leibniz. For the multiverse, I’m less sure. I’m not confident the universe expands fast enough to create a multiverse, I’m not even confident it’s speeding up its expansion now. I know there’s a lot of controversy about the math behind the string theory landscape, about whether the vast set of possible laws of physics are as consistent as they’re supposed to be…and of course, as anyone must admit, we don’t know whether string theory itself is true! I don’t think it’s impossible that the right argument comes around and convinces me of one or both claims, though. These kinds of arguments, “if assumptions, then conclusion” are the kind of thing that seems useless for a while…until someone convinces you of the conclusion, and they matter once again.

So in the end, despite the similarity, I’m not sure the multiverse deserves its own Candide. I’m not even sure Leibniz deserved Candide. But hopefully by understanding one, you can understand the other just a bit better.

# The Undefinable

If I can teach one lesson to all of you, it’s this: be precise. In physics, we try to state what we mean as precisely as we can. If we can’t state something precisely, that’s a clue: maybe what we’re trying to state doesn’t actually make sense.

Someone recently reached out to me with a question about black holes. He was confused about how they were described, about what would happen when you fall in to one versus what we could see from outside. Part of his confusion boiled down to a question: “is the center really an infinitely small point?”

I remembered a commenter a while back who had something interesting to say about this. Trying to remind myself of the details, I dug up this question on Physics Stack Exchange. user4552 has a detailed, well-referenced answer, with subtleties of General Relativity that go significantly beyond what I learned in grad school.

According to user4552, the reason this question is confusing is that the usual setup of general relativity cannot answer it. In general relativity, singularities like the singularity in the middle of a black hole aren’t treated as points, or collections of points: they’re not part of space-time at all. So you can’t count their dimensions, you can’t see whether they’re “really” infinitely small points, or surfaces, or lines…

This might surprise people (like me) who have experience with simpler equations for these things, like the Schwarzchild metric. The Schwarzchild metric describes space-time around a black hole, and in the usual coordinates it sure looks like the singularity is at a single point where r=0, just like the point where r=0 is a single point in polar coordinates in flat space. The thing is, though, that’s just one sort of coordinates. You can re-write a metric in many different sorts of coordinates, and the singularity in the center of a black hole might look very different in those coordinates. In general relativity, you need to stick to things you can say independent of coordinates.

Ok, you might say, so the usual mathematics can’t answer the question. Can we use more unusual mathematics? If our definition of dimensions doesn’t tell us whether the singularity is a point, maybe we just need a new definition!

According to user4552, people have tried this…and it only sort of works. There are several different ways you could define the dimension of a singularity. They all seem reasonable in one way or another. But they give different answers! Some say they’re points, some say they’re three-dimensional. And crucially, there’s no obvious reason why one definition is “right”. The question we started with, “is the center really an infinitely small point?”, looked like a perfectly reasonable question, but it actually wasn’t: the question wasn’t precise enough.

This is the real problem. The problem isn’t that our question was undefined, after all, we can always add new definitions. The problem was that our question didn’t specify well enough the definitions we needed. That is why the question doesn’t have an answer.

Once you understand the difference, you see these kinds of questions everywhere. If you’re baffled by how mass could have come out of the Big Bang, or how black holes could radiate particles in Hawking radiation, maybe you’ve heard a physicist say that energy isn’t always conserved. Energy conservation is a consequence of symmetry, specifically, symmetry in time. If your space-time itself isn’t symmetric (the expanding universe making the past different from the future, a collapsing star making a black hole), then you shouldn’t expect energy to be conserved.

I sometimes hear people object to this. They ask, is it really true that energy isn’t conserved when space-time isn’t symmetric? Shouldn’t we just say that space-time itself contains energy?

And well yes, you can say that, if you want. It isn’t part of the usual definition, but you can make a new definition, one that gives energy to space-time. In fact, you can make more than one new definition…and like the situation with the singularity, these definitions don’t always agree! Once again, you asked a question you thought was sensible, but it wasn’t precise enough to have a definite answer.

Keep your eye out for these kinds of questions. If scientists seem to avoid answering the question you want, and keep answering a different question instead…it might be their question is the only one with a precise answer. You can define a method to answer your question, sure…but it won’t be the only way. You need to ask precise enough questions to get good answers.

# The Only Speed of Light That Matters

A couple weeks back, someone asked me about a Veritasium video with the provocative title “Why No One Has Measured The Speed Of Light”. Veritasium is a science popularization youtube channel, and usually a fairly good one…so it was a bit surprising to see it make a claim usually reserved for crackpots. Many, many people have measured the speed of light, including Ole Rømer all the way back in 1676. To argue otherwise seems like it demands a massive conspiracy.

Veritasium wasn’t proposing a conspiracy, though, just a technical point. Yes, many experiments have measured the speed of light. However, the speed they measure is in fact a “two-way speed”, the speed that light takes to go somewhere and then come back. They leave open the possibility that light travels differently in different directions, and only has the measured speed on average: that there are different “one-way speeds” of light.

The loophole is clearest using some of the more vivid measurements of the speed of light, timing how long it takes to bounce off a mirror and return. It’s less clear using other measurements of the speed of light, like Rømer’s. Rømer measured the speed of light using the moons of Jupiter, noticing that the time they took to orbit appeared to change based on whether Jupiter was moving towards or away from the Earth. For this measurement Rømer didn’t send any light to Jupiter…but he did have to make assumptions about Jupiter’s rotation, using it like a distant clock. Those assumptions also leave the door open to a loophole, one where the different one-way speeds of light are compensated by different speeds for distant clocks. You can watch the Veritasium video for more details about how this works, or see the wikipedia page for the mathematical details.

When we think of the speed of light as the same in all directions, in some sense we’re making a choice. We’ve chosen a convention, called the Einstein synchronization convention, that lines up distant clocks in a particular way. We didn’t have to choose that convention, though we prefer to (the math gets quite a bit more complicated if we don’t). And crucially for any such choice, it is impossible for any experiment to tell the difference.

So far, Veritasium is doing fine here. But if the video was totally fine, I wouldn’t have written this post. The technical argument is fine, but the video screws up its implications.

Near the end of the video, the host speculates whether this ambiguity is a clue. What if a deeper theory of physics could explain why we can’t tell the difference between different synchronizations? Maybe that would hint at something important.

Well, it does hint at something important, but not something new. What it hints at is that “one-way speeds” don’t matter. Not for light, or really for anything else.

Think about measuring the speed of something, anything. There are two ways to do it. One is to time it against something else, like the signal in a wire, and assume we know that speed. Veritasium shows an example of this, measuring the speed of a baseball that hits a target and sends a signal back. The other way is to send it somewhere with a clock we trust, and compare it to our clock. Each of these requires that something goes back and forth, even if it’s not the same thing each time. We can’t measure the one-way speed of anything because we’re never in two places at once. Everything we measure, every conclusion we come to about the world, rests on something “two-way”: our actions go out, our perceptions go in. Even our depth perception is an inference from our ancestors, whose experience seeing food and traveling to it calibrated our notion of distance.

Synchronization of clocks is a convention because the external world is a convention. What we have really, objectively, truly, are our perceptions and our memories. Everything else is a model we build to fill the gaps in between. Some features of that model are essential: if you change them, you no longer match our perceptions. Other features, though, are just convenience: ways we arrange the model to make it easier to use, to make it not “sound dumb”, to tell a coherent story. Synchronization is one of those things: the notion that you can compare times in distant places is convenient, but as relativity already tells us in other contexts, not necessary. It’s part of our storytelling, not an essential part of our model.

# Duality and Emergence: When Is Spacetime Not Spacetime?

Spacetime is doomed! At least, so say some physicists. They don’t mean this as a warning, like some comic-book universe-destroying disaster, but rather as a research plan. These physicists believe that what we think of as space and time aren’t the full story, but that they emerge from something more fundamental, so that an ultimate theory of nature might not use space or time at all. Other, grumpier physicists are skeptical. Joined by a few philosophers, they think the “spacetime is doomed” crowd are over-excited and exaggerating the implications of their discoveries. At the heart of the argument is the distinction between two related concepts: duality and emergence.

In physics, sometimes we find that two theories are actually dual: despite seeming different, the patterns of observations they predict are the same. Some of the more popular examples are what we call holographic theories. In these situations, a theory of quantum gravity in some space-time is dual to a theory without gravity describing the edges of that space-time, sort of like how a hologram is a 2D image that looks 3D when you move it. For any question you can ask about the gravitational “bulk” space, there is a matching question on the “boundary”. No matter what you observe, neither description will fail.

If theories with gravity can be described by theories without gravity, does that mean gravity doesn’t really exist? If you’re asking that question, you’re asking whether gravity is emergent. An emergent theory is one that isn’t really fundamental, but instead a result of the interaction of more fundamental parts. For example, hydrodynamics, the theory of fluids like water, emerges from more fundamental theories that describe the motion of atoms and molecules.

(For the experts: I, like most physicists, am talking about “weak emergence” here, not “strong emergence”.)

The “spacetime is doomed” crowd think that not just gravity, but space-time itself is emergent. They expect that distances and times aren’t really fundamental, but a result of relationships that will turn out to be more fundamental, like entanglement between different parts of quantum fields. As evidence, they like to bring up dualities where the dual theories have different concepts of gravity, number of dimensions, or space-time. Using those theories, they argue that space and time might “break down”, and not be really fundamental.

The skeptics, though, bring up an important point. If two theories are really dual, then no observation can distinguish them: they make exactly the same predictions. In that case, say the skeptics, what right do you have to call one theory more fundamental than the other? You can say that gravity emerges from a boundary theory without gravity, but you could just as easily say that the boundary theory emerges from the gravity theory. The whole point of duality is that no theory is “more true” than the other: one might be more or less convenient, but both describe the same world. If you want to really argue for emergence, then your “more fundamental” theory needs to do something extra: to predict something that your emergent theory doesn’t predict.

Sometimes this is a fair objection. There are members of the “spacetime is doomed” crowd who are genuinely reckless about this, who’ll tell a journalist about emergence when they really mean duality. But many of these people are more careful, and have thought more deeply about the question. They tend to have some mix of these two perspectives:

First, if two descriptions give the same results, then do the descriptions matter? As physicists, we have a history of treating theories as the same if they make the same predictions. Space-time itself is a result of this policy: in the theory of relativity, two people might disagree on which one of two events happened first or second, but they will agree on the overall distance in space-time between the two. From this perspective, a duality between a bulk theory and a boundary theory isn’t evidence that the bulk theory emerges from the boundary, but it is evidence that both the bulk and boundary theories should be replaced by an “overall theory”, one that treats bulk and boundary as irrelevant descriptions of the same physical reality. This perspective is similar to an old philosophical theory called positivism: that statements are meaningless if they cannot be derived from something measurable. That theory wasn’t very useful for philosophers, which is probably part of why some philosophers are skeptics of “space-time is doomed”. The perspective has been quite useful to physicists, though, so we’re likely to stick with it.

Second, some will say that it’s true that a dual theory is not an emergent theory…but it can be the first step to discover one. In this perspective, dualities are suggestive evidence that a deeper theory is waiting in the wings. The idea would be that one would first discover a duality, then discover situations that break that duality: examples on one side that don’t correspond to anything sensible on the other. Maybe some patterns of quantum entanglement are dual to a picture of space-time, but some are not. (Closer to my sub-field, maybe there’s an object like the amplituhedron that doesn’t respect locality or unitarity.) If you’re lucky, maybe there are situations, or even experiments, that go from one to the other: where the space-time description works until a certain point, then stops working, and only the dual description survives. Some of the models of emergent space-time people study are genuinely of this type, where a dimension emerges in a theory that previously didn’t have one. (For those of you having a hard time imagining this, read my old post about “bubbles of nothing”, then think of one happening in reverse.)

It’s premature to say space-time is doomed, at least as a definite statement. But it is looking like, one way or another, space-time won’t be the right picture for fundamental physics. Maybe that’s because it’s equivalent to another description, redundant embellishment on an essential theoretical core. Maybe instead it breaks down, and a more fundamental theory could describe more situations. We don’t know yet. But physicists are trying to figure it out.

# Of p and sigma

Ask a doctor or a psychologist if they’re sure about something, and they might say “it has p<0.05”. Ask a physicist, and they’ll say it’s a “5 sigma result”. On the surface, they sound like they’re talking about completely different things. As it turns out, they’re not quite that different.

Whether it’s a p-value or a sigma, what scientists are giving you is shorthand for a probability. The p-value is the probability itself, while sigma tells you how many standard deviations something is away from the mean on a normal distribution. For people not used to statistics this might sound very complicated, but it’s not so tricky in the end. There’s a graph, called a normal distribution, and you can look at how much of it is above a certain point, measured in units called standard deviations, or “sigmas”. That gives you your probability.

What are these numbers a probability of? At first, you might think they’re a probability of the scientist being right: of the medicine working, or the Higgs boson being there.

That would be reasonable, but it’s not how it works. Scientists can’t measure the chance they’re right. All they can do is compare models. When a scientist reports a p-value, what they’re doing is comparing to a kind of default model, called a “null hypothesis”. There are different null hypotheses for different experiments, depending on what the scientists want to test. For the Higgs, scientists looked at pairs of photons detected by the LHC. The null hypothesis was that these photons were created by other parts of the Standard Model, like the strong force, and not by a Higgs boson. For medicine, the null hypothesis might be that people get better on their own after a certain amount of time. That’s hard to estimate, which is why medical experiments use a control group: a similar group without the medicine, to see how much they get better on their own.

Once we have a null hypothesis, we can use it to estimate how likely it is that it produced the result of the experiment. If there was no Higgs, and all those photons just came from other particles, what’s the chance there would still be a giant pile of them at one specific energy? If the medicine didn’t do anything, what’s the chance the control group did that much worse than the treatment group?

Ideally, you want a small probability here. In medicine and psychology, you’re looking for a 5% probability, for p<0.05. In physics, you need 5 sigma to make a discovery, which corresponds to a one in 3.5 million probability. If the probability is low, then you can say that it would be quite unlikely for your result to happen if the null hypothesis was true. If you’ve got a better hypothesis (the Higgs exists, the medicine works), then you should pick that instead.

Note that this probability still uses a model: it’s the probability of the result given that the model is true. It isn’t the probability that the model is true, given the result. That probability is more important to know, but trickier to calculate. To get from one to the other, you need to include more assumptions: about how likely your model was to begin with, given everything else you know about the world. Depending on those assumptions, even the tiniest p-value might not show that your null hypothesis is wrong.

In practice, unfortunately, we usually can’t estimate all of those assumptions in detail. The best we can do is guess their effect, in a very broad way. That usually just means accepting a threshold for p-values, declaring some a discovery and others not. That limitation is part of why medicine and psychology demand p-values of 0.05, while physicists demand 5 sigma results. Medicine and psychology have some assumptions they can rely on: that people function like people, that biology and physics keep working. Physicists don’t have those assumptions, so we have to be extra-strict.

Ultimately, though, we’re all asking the same kind of question. And now you know how to understand it when we do.

# Don’t Trust the Experiments, Trust the Science

I was chatting with an astronomer recently, and this quote by Arthur Eddington came up:

“Never trust an experimental result until it has been confirmed by theory.”

Arthur Eddington

At first, this sounds like just typical theorist arrogance, thinking we’re better than all those experimentalists. It’s not that, though, or at least not just that. Instead, it’s commenting on a trend that shows up again and again in science, but rarely makes the history books. Again and again an experiment or observation comes through with something fantastical, something that seems like it breaks the laws of physics or throws our best models into disarray. And after a few months, when everyone has checked, it turns out there was a mistake, and the experiment agrees with existing theories after all.

You might remember a recent example, when a lab claimed to have measured neutrinos moving faster than the speed of light, only for it to turn out to be due to a loose cable. Experiments like this aren’t just a result of modern hype: as Eddington’s quote shows, they were also common in his day. In general, Eddington’s advice is good: when an experiment contradicts theory, theory tends to win in the end.

This may sound unscientific: surely we should care only about what we actually observe? If we defer to theory, aren’t we putting dogma ahead of the evidence of our senses? Isn’t that the opposite of good science?

To understand what’s going on here, we can use an old philosophical argument: David Hume’s argument against miracles. David Hume wanted to understand how we use evidence to reason about the world. He argued that, for miracles in particular, we can never have good evidence. In Hume’s definition, a miracle was something that broke the established laws of science. Hume argued that, if you believe you observed a miracle, there are two possibilities: either the laws of science really were broken, or you made a mistake. The thing is, laws of science don’t just come from a textbook: they come from observations as well, many many observations in many different conditions over a long period of time. Some of those observations establish the laws in the first place, others come from the communities that successfully apply them again and again over the years. If your miracle was real, then it would throw into doubt many, if not all, of those observations. So the question you have to ask is: it it more likely those observations were wrong? Or that you made a mistake? Put another way, your evidence is only good enough for a miracle if it would be a bigger miracle if you were wrong.

Hume’s argument always struck me as a little bit too strict: if you rule out miracles like this, you also rule out new theories of science! A more modern approach would use numbers and statistics, weighing the past evidence for a theory against the precision of the new result. Most of the time you’d reach the same conclusion, but sometimes an experiment can be good enough to overthrow a theory.

Still, theory should always sit in the background, a kind of safety net for when your experiments screw up. It does mean that when you don’t have that safety net you need to be extra-careful. Physics is an interesting case of this: while we have “the laws of physics”, we don’t have any established theory that tells us what kinds of particles should exist. That puts physics in an unusual position, and it’s probably part of why we have such strict standards of statistical proof. If you’re going to be operating without the safety net of theory, you need that kind of proof.

This post was also inspired by some biological examples. The examples are politically controversial, so since this is a no-politics blog I won’t discuss them in detail. (I’ll also moderate out any comments that do.) All I’ll say is that I wonder if in that case the right heuristic is this kind of thing: not to “trust scientists” or “trust experts” or even “trust statisticians”, but just to trust the basic, cartoon-level biological theory.

# Facts About Math Are Facts About Us

Each year, the Niels Bohr International Academy has a series of public talks. Part of Copenhagen’s Folkeuniversitet (“people’s university”), they attract a mix of older people who want to keep up with modern developments and young students looking for inspiration. I gave a talk a few days ago, as part of this year’s program. The last time I participated, back in 2017, I covered a topic that comes up a lot on this blog: “The Quest for Quantum Gravity”. This year, I was asked to cover something more unusual: “The Unreasonable Effectiveness of Mathematics in the Natural Sciences”.

Some of you might notice that title is already taken: it’s a famous lecture by the physicist Wigner, from 1959. Wigner posed an interesting question: why is advanced mathematics so useful in physics? Time and time again, mathematicians develop an idea purely for its own sake, only for physicists to find it absolutely indispensable to describe some part of the physical world. Should we be surprised that this keeps working? Suspicious?

I talked a bit about this: some of the answers people have suggested over the years, and my own opinion. But like most public talks, the premise was mostly a vehicle for cool examples: physicists through history bringing in new math, and surprising mathematical facts like the ones I talked about a few weeks back at Culture Night. Because of that, I was actually a bit unprepared to dive into the philosophical side of the topic (despite it being in principle a very philosophical topic!) When one of the audience members brought up mathematical Platonism, I floundered a bit, not wanting to say something that was too philosophically naive.

Well, if there’s anywhere I can be naive, it’s my own blog. I even have a label for Amateur Philosophy posts. So let’s do one.

Mathematical Platonism is the idea that mathematical truths “exist”: that they’re somewhere “out there” being discovered. On the other side, one might believe that mathematics is not discovered, but invented. For some reason, a lot of people with the latter opinion seem to think this has something to do with describing nature (for example, an essay a few years back by Lee Smolin defines mathematics as “the study of systems of evoked relationships inspired by observations of nature”).

I’m not a mathematical Platonist. I don’t even like to talk about which things do or don’t “exist”. But I also think that describing mathematics in terms of nature is missing the point. Mathematicians aren’t physicists. While there may have been a time when geometers argued over lines in the sand, these days mathematicians’ inspiration isn’t usually the natural world, at least not in the normal sense.

Instead, I think you can’t describe mathematics without describing mathematicians. A mathematical fact is, deep down, something a mathematician can say without other mathematicians shouting them down. It’s an allowed move in what my hazy secondhand memory of Wittgenstein wants to call a “language game”: something that gets its truth from a context of people interpreting and reacting to it, in the same way a move in chess matters only when everyone is playing by its rules.

This makes mathematics sound very subjective, and we’re used to the opposite: the idea that a mathematical fact is as objective as they come. The important thing to remember is that even with this kind of description, mathematics still ends up vastly less subjective than any other field. We care about subjectivity between different people: if a fact is “true” for Brits and “false” for Germans, then it’s a pretty limited fact. Mathematics is special because the “rules of its game” aren’t rules of one group or another. They’re rules that are in some sense our birthright. Any human who can read and write, or even just act and perceive, can act as a Turing Machine, a universal computer. With enough patience and paper, anything that you can prove to one person you can prove to another: you just have to give them the rules and let them follow them. It doesn’t matter how smart you are, or what you care about most: if something is mathematically true for others, it is mathematically true for you.

Some would argue that this is evidence for mathematical Platonism, that if something is a universal truth it should “exist”. Even if it does, though, I don’t think it’s useful to think of it in that way. Once you believe that mathematical truth is “out there”, you want to try to characterize it, to say something about it besides that it’s “out there”. You’ll be tempted to have an opinion on the Axiom of Choice, or the Continuum Hypothesis. And the whole point is that those aren’t sensible things to have opinions on, that having an opinion about them means denying the mathematical proofs that they are, in the “standard” axioms, undecidable. Whatever is “out there”, it has to include everything you can prove with every axiom system, whichever non-standard ones you can cook up, because mathematicians will happily work on any of them. The whole point of mathematics, the thing that makes it as close to objective as anything can be, is that openness: the idea that as long as an argument is good enough, as long as it can convince anyone prepared to wade through the pages, then it is mathematics. Nothing, so long as it can convince in the long-run, is excluded.

If we take this definition seriously, there are some awkward consequences. You could imagine a future in which every mind, everyone you might be able to do mathematics with, is crushed under some tyrant, forced to agree to something false. A real philosopher would dig in to this corner case, try to salvage the definition or throw it out. I’m not a real philosopher though. So all I can say is that while I don’t think that tyrant gets to define mathematics, I also don’t think there are good alternatives to my argument. Our only access to mathematics, and to truth in general, is through the people who pursue it. I don’t think we can define one without the other.

# Of Cows and Razors

Last week’s post came up on Reddit, where a commenter made a good point. I said that one of the mysteries of neutrinos is that they might not get their mass from the Higgs boson. This is true, but the commenter rightly points out it’s true of other particles too: electrons might not get their mass from the Higgs. We aren’t sure. The lighter quarks might not get their mass from the Higgs either.

When talking physics with the public, we usually say that electrons and quarks all get their mass from the Higgs. That’s how it works in our Standard Model, after all. But even though we’ve found the Higgs boson, we can’t be 100% sure that it functions the way our model says. That’s because there are aspects of the Higgs we haven’t been able to measure directly. We’ve measured how it affects the heaviest quark, the top quark, but measuring its interactions with other particles will require a bigger collider. Until we have those measurements, the possibility remains open that electrons and quarks get their mass another way. It would be a more complicated way: we know the Higgs does a lot of what the model says, so if it deviates in another way we’d have to add more details, maybe even more undiscovered particles. But it’s possible.

If I wanted to defend the idea that neutrinos are special here, I would point out that neutrino masses, unlike electron masses, are not part of the Standard Model. For electrons, we have a clear “default” way for them to get mass, and that default is in a meaningful way simpler than the alternatives. For neutrinos, every alternative is complicated in some fashion: either adding undiscovered particles, or unusual properties. If we were to invoke Occam’s Razor, the principle that we should always choose the simplest explanation, then for electrons and quarks there is a clear winner. Not so for neutrinos.

I’m not actually going to make this argument. That’s because I’m a bit wary of using Occam’s Razor when it comes to questions of fundamental physics. Occam’s Razor is a good principle to use, if you have a good idea of what’s “normal”. In physics, you don’t.

To illustrate, I’ll tell an old joke about cows and trains. Here’s the version from The Curious Incident of the Dog in the Night-Time:

There are three men on a train. One of them is an economist and one of them is a logician and one of them is a mathematician. And they have just crossed the border into Scotland (I don’t know why they are going to Scotland) and they see a brown cow standing in a field from the window of the train (and the cow is standing parallel to the train). And the economist says, ‘Look, the cows in Scotland are brown.’ And the logician says, ‘No. There are cows in Scotland of which at least one is brown.’ And the mathematician says, ‘No. There is at least one cow in Scotland, of which one side appears to be brown.’

If we want to be as careful as possible, the mathematician’s answer is best. But we expect not to have to be so careful. Maybe the economist’s answer, that Scottish cows are brown, is too broad. But we could imagine an agronomist who states “There is a breed of cows in Scotland that is brown”. And I suggest we should find that pretty reasonable. Essentially, we’re using Occam’s Razor: if we want to explain seeing a brown half-cow from a train, the simplest explanation would be that it’s a member of a breed of cows that are brown. It would be less simple if the cow were unique, a brown mutant in a breed of black and white cows. It would be even less simple if only one side of the cow were brown, and the other were another color.

When we use Occam’s Razor in this way, we’re drawing from our experience of cows. Most of the cows we meet are members of some breed or other, with similar characteristics. We don’t meet many mutant cows, or half-colored cows, so we think of those options as less simple, and less likely.

But what kind of experience tells us which option is simpler for electrons, or neutrinos?

The Standard Model is a type of theory called a Quantum Field Theory. We have experience with other Quantum Field Theories: we use them to describe materials, metals and fluids and so forth. Still, it seems a bit odd to say that if something is typical of these materials, it should also be typical of the universe. As another physicists in my sub-field, Nima Arkani-Hamed, likes to say, “the universe is not a crappy metal!”

We could also draw on our experience from other theories in physics. This is a bit more productive, but has other problems. Our other theories are invariably incomplete, that’s why we come up with new theories in the first place…and with so few theories, compared to breeds of cows, it’s unclear that we really have a good basis for experience.

Physicists like to brag that we study the most fundamental laws of nature. Ordinarily, this doesn’t matter as much as we pretend: there’s a lot to discover in the rest of science too, after all. But here, it really makes a difference. Unlike other fields, we don’t know what’s “normal”, so we can’t really tell which theories are “simpler” than others. We can make aesthetic judgements, on the simplicity of the math or the number of fields or the quality of the stories we can tell. If we want to be principled and forego all of that, then we’re left on an abyss, a world of bare observations and parameter soup.

If a physicist looks out a train window, will they say that all the electrons they see get their mass from the Higgs? Maybe, still. But they should be careful about it.

# Digging for Buried Insight

The scientific method, as we usually learn it, starts with a hypothesis. The scientist begins with a guess, and asks a question with a clear answer: true, or false? That guess lets them design an experiment, observe the consequences, and improve our knowledge of the world.

But where did the scientist get the hypothesis in the first place? Often, through some form of exploratory research.

Exploratory research is research done, not to answer a precise question, but to find interesting questions to ask. Each field has their own approach to exploration. A psychologist might start with interviews, asking broad questions to find narrower questions for a future survey. An ecologist might film an animal, looking for changes in its behavior. A chemist might measure many properties of a new material, seeing if any stand out. Each approach is like digging for treasure, not sure of exactly what you will find.

Mathematicians and theoretical physicists don’t do experiments, but we still need hypotheses. We need an idea of what we plan to prove, or what kind of theory we want to build: like other scientists, we want to ask a question with a clear, true/false answer. And to find those questions, we still do exploratory research.

What does exploratory research look like, in the theoretical world? Often, it begins with examples and calculations. We can start with a known method, or a guess at a new one, a recipe for doing some specific kind of calculation. Recipe in hand, we proceed to do the same kind of calculation for a few different examples, covering different sorts of situation. Along the way, we notice patterns: maybe the same steps happen over and over, or the result always has some feature.

We can then ask, do those same steps always happen? Does the result really always have that feature? We have our guess, our hypothesis, and our attempt to prove it is much like an experiment. If we find a proof, our hypothesis was true. On the other hand, we might not be able to find a proof. Instead, exploring, we might find a counterexample – one where the steps don’t occur, the feature doesn’t show up. That’s one way to learn that our hypothesis was false.

This kind of exploration is essential to discovery. As scientists, we all have to eventually ask clear yes/no questions, to submit our beliefs to clear tests. But we can’t start with those questions. We have to dig around first, to observe the world without a clear plan, to get to a point where we have a good question to ask.