Category Archives: Amateur Philosophy

Shape the Science to the Statistics, Not the Statistics to the Science

In theatre, and more generally in writing, the advice is always to “show, don’t tell”. You could just tell your audience that Long John Silver is a ruthless pirate, but it works a lot better to show him marching a prisoner off the plank. Rather than just informing with words, you want to make things as concrete as possible, with actions.

There is a similar rule in pedagogy. Pedagogy courses teach you to be explicit about your goals, planning a course by writing down Intended Learning Outcomes. (They never seem amused when I ask about the Unintended Learning Outcomes.) At first, you’d want to write down outcomes like “students will understand calculus” or “students will know what a sine is”. These, however, are hard to judge, and thus hard to plan around. Instead, the advice is to write outcomes that correspond to actions you want the students to take, things you want them to be capable of doing: “students can perform integration by parts”, “students can decide correctly whether to use a sine or cosine”. Again and again, the best way to get the students to know something is to get them to do something.

Jay Daigle recently finished a series of blog posts on how scientists use statistics to test hypotheses. I recommend it, it’s a great introduction to the concepts scientists use to reason about data, as well as a discussion of how they often misuse those concepts and what they can do better. I have a bit of a different perspective on one of the “takeaways” of the post, and I wanted to highlight that here.

The center of Daigle’s point is a tool, widely used in science, called Neyman-Pearson Hypothesis Testing. Neyman-Pearson is a tool for making decisions: you compute a number scientists call a p-value, and act only when it falls below a pre-set threshold for significance. If you follow the procedure, only acting when you find a p-value below 0.05, then among the cases where the effect isn’t real you will be wrong only 5% of the time: that’s your rate of false positives, the percent of the time you conclude some action works when it really doesn’t.
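
To make that guarantee concrete, here is a toy simulation (my own illustration, not something from Daigle’s posts; the sample sizes and threshold are arbitrary). It tests thousands of “drugs” that truly do nothing, and counts how often the procedure approves one anyway:

```python
# A minimal sketch of the Neyman-Pearson guarantee described above.
# Every "drug" here has zero real effect, so any approval is a false positive.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_trials, n_patients, alpha = 10_000, 50, 0.05

false_positives = 0
for _ in range(n_trials):
    control = rng.normal(0.0, 1.0, n_patients)  # outcomes without the drug
    treated = rng.normal(0.0, 1.0, n_patients)  # identical distribution: no real effect
    _, p_value = ttest_ind(treated, control)
    if p_value < alpha:                         # the decision rule
        false_positives += 1

print(f"False positive rate: {false_positives / n_trials:.3f}")  # comes out near 0.05
```

The rate comes out near 0.05, just as the procedure promises, and that is all it promises: nothing in the simulation tells you which individual “drugs” were the unlucky approvals.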

A core problem, from Daigle’s perspective, is that scientists use Neyman-Pearson for the wrong purpose. Neyman-Pearson is a tool for making decisions, not a test that tells you whether or not a specific claim is true. It tells you “on average, if I approve drugs when their p-value is below 0.05, only 5% of them will fail”. That’s great if you can estimate how bad it is to deny a drug that should be approved, how bad it is to approve a drug that should be denied, and calculate out on average how often you can afford to be wrong. It doesn’t tell you anything about the specific drug, though. It doesn’t tell you “every drug with a p-value below 0.05 works”. It certainly doesn’t tell you “a drug with a p-value of 0.051 almost works” or “a drug with a p-value of 0.001 definitely works”. It just doesn’t give you that information.

In later posts, Daigle suggests better tools, which he argues map better to what scientists want to do, as well as general ways scientists can do better. Section 4 in particular focuses on the idea that one thing scientists need to do is ask better questions. He uses a specific example from cognitive psychology, a study that tests whether describing someone’s face makes you worse at recognizing it later. That’s a clear scientific question, one that can be tested statistically. That doesn’t mean it’s a good question, though. Daigle points out that questions like this have a problem: it isn’t clear what the result actually tells us.

Here’s another example of the same problem. In grad school, I knew a lot of social psychologists. One was researching a phenomenon called extended contact. Extended contact is meant to be a foil to another phenomenon called direct contact, both having to do with our views of other groups. In direct contact, making a friend from another group makes you view that whole group better. In extended contact, making a friend who has a friend from another group makes you view the other group better.

The social psychologist was looking into a concrete-sounding question: which of these phenomena, direct or extended contact, is stronger?

At first, that seems like it has the same problem as Daigle’s example. Suppose one of these effects is larger: what does that mean? Why do we care?

Well, one answer is that these aren’t just phenomena: they’re interventions. If you know one phenomenon is stronger than another, you can use that to persuade people to be more accepting of other groups. The psychologist’s advisor even had a procedure to make people feel like they made a new friend. Armed with that, it’s definitely useful to know whether extended contact or direct contact is better: whichever one is stronger is the one you want to use!

You do need some “theory” behind this, of course. You need to believe that, if a phenomenon is stronger in your psychology lab, it will be stronger wherever you try to apply it in the real world. It probably won’t be stronger every single time, so you need some notion of how much stronger it needs to be. That in turn means you need to estimate costs: what it costs if you pick the weaker one instead, how much money you’re wasting or harm you’re doing.

You’ll notice this is sounding a lot like the requirements I described earlier, for Neyman-Pearson. That’s no accident: as you try to make your science more and more clearly defined, it will get closer and closer to a procedure to make a decision, and that’s exactly what Neyman-Pearson is good for.

So in the end I’m quite a bit more supportive of Neyman-Pearson than Daigle is. That doesn’t mean it isn’t being used wrong: most scientists are using it wrong. Instead of calculating a p-value each time they make a decision, they do it at the end of a paper, misinterpreting it as evidence that one thing or another is “true”. But I think that what these scientists need to do is not change their statistics, but change their science. If they focused their science on making concrete decisions, they would actually be justified in using Neyman-Pearson…and their science would get a lot better in the process.

Einstein-Years

Scott Aaronson recently published an interesting exchange on his blog Shtetl-Optimized, between him and cognitive psychologist Steven Pinker. The conversation was about AI: Aaronson is optimistic (though not insanely so), Pinker is pessimistic (again, not insanely so). While fun reading, the whole thing would normally be a bit too off-topic for this blog, except that Aaronson’s argument ended up invoking something I do know a bit about: how we make progress in theoretical physics.

Aaronson was trying to respond to an argument of Pinker’s, that super-intelligence is too vague and broad to be something we could expect an AI to have. Aaronson asks us to imagine an AI that is nothing more or less than a simulation of Einstein’s brain. Such a thing isn’t possible today, and might not even be efficient, but it has the advantage of being something concrete we can all imagine. Aaronson then suggests imagining that AI sped up a thousandfold, so that in one year it covers a thousand years of Einstein’s thought. Such an AI couldn’t solve every problem, of course. But in theoretical physics, surely such an AI could be safely described as super-intelligent: an amazing power that would change the shape of physics as we know it.

I’m not as sure of this as Aaronson is. We don’t have a machine that generates a thousand Einstein-years to test, but we do have one piece of evidence: the 76 Einstein-years the man actually lived.

Einstein is rightly famous as a genius in theoretical physics. His annus mirabilis resulted in four papers that revolutionized the field, and the next decade saw his theory of general relativity transform our understanding of space and time. Later, he explored what general relativity was capable of and framed challenges that deepened our understanding of quantum mechanics.

After that, though…not so much. For Einstein-decades, he tried to work towards a new unified theory of physics, and as far as I’m aware made no useful progress at all. I’ve never seen someone cite work from that period of Einstein’s life.

Aaronson mentions simulating Einstein “at his peak”, and it would be tempting to assume that the unified theory came “after his peak”, when age had weakened his mind. But while that kind of thing can sometimes be an issue for older scientists, I think it’s overstated. I don’t think careers peak early because of “youthful brains”, and with the exception of genuine dementia I don’t think older physicists are that much worse-off cognitively than younger ones. The reason so many prominent older physicists go down unproductive rabbit-holes isn’t because they’re old. It’s because genius isn’t universal.

Einstein made the progress he did because he was the right person to make that progress. He had the right background, the right temperament, and the right interests to take others’ mathematics and treat it seriously as physics. As he aged, he built on what he found, and that background in turn enabled him to do more great things. But eventually, the path he walked down simply wasn’t useful anymore. His story ended with him driven to a theory that simply wasn’t going to work, because given his experience up to that point that was the work that interested him most.

I think genius in physics is in general like that. It can feel very broad because a good genius picks up new tricks along the way, and grows their capabilities. But throughout, you can see the links: the tools mastered at one age that turn out to be just right for a new pattern. For the greatest geniuses in my field, you can see the “signatures” in their work, hints at why they were just the right genius for one problem or another. Give one a thousand years, and I suspect the well would eventually run dry: the state of knowledge would no longer be suitable for even their breadth.

…of course, none of that really matters for Aaronson’s point.

A century of Einstein-years wouldn’t have found the Standard Model or String Theory, but a century of physicist-years absolutely did. If instead of a simulation of Einstein, your AI was a simulation of a population of scientists, generating new geniuses as the years go by, then the argument works again. Sure, such an AI would be much more expensive, much more difficult to build, but the first one might have been as well. The point of the argument is simply to show such a thing is possible.

The core of Aaronson’s point rests on two key traits of technology. Technology is replicable: once we know how to build something, we can build more of it. Technology is scalable: if we know how to build something, we can try to build a bigger one with more resources. Evolution can tap into both of these, but not reliably: just because it’s possible to build a mind a thousand times better at some task doesn’t mean evolution ever will.

That is why the possibility of AI leads to the possibility of super-intelligence. If we can make a computer that can do something, we can make it do that something faster. That something doesn’t have to be “general”, you can have programs that excel at one task or another. For each such task, with more resources you can scale things up: so anything a machine can do now, a later machine can probably do better. Your starting-point doesn’t necessarily even have to be efficient, or a good algorithm: bad algorithms will take longer to scale, but could eventually get there too.

The only question at that point is “how fast?” I don’t have the impression that’s settled. The achievements that got Pinker and Aaronson talking, GPT-3 and DALL-E and so forth, impressed people by their speed, by how soon they got to capabilities we didn’t expect them to have. That doesn’t mean that something we might really call super-intelligence is close: that has to do with the details, with what your target is and how fast you can actually scale. And it certainly doesn’t mean that another approach might not be faster! (As a total outsider, I can’t help but wonder if current ML is in some sense trying to fit a cubic with straight lines.)

It does mean, though, that super-intelligence isn’t inconceivable, or incoherent. It’s just the recognition that technology is a master of brute force, and brute force eventually triumphs. If you want to think about what happens in that “eventually”, that’s a very important thing to keep in mind.

The Most Anthropic of All Possible Worlds

Today, we’d call Leibniz a mathematician, a physicist, and a philosopher. As a mathematician, Leibniz turned calculus into something his contemporaries could actually use. As a physicist, he championed a doomed theory of gravity. In philosophy, he seems to be most remembered for extremely cheaty arguments.

Free will and determinism? Can’t it just be a coincidence?

I don’t blame him for this. Faced with a tricky philosophical problem, it’s enormously tempting to just blaze through with an answer that makes every subtlety irrelevant. It’s a temptation I’ve succumbed to time and time again. Faced with a genie, I would always wish for more wishes. On my high school debate team, I once forced everyone at a tournament to switch sides with some sneaky definitions. It’s all good fun, but people usually end up pretty annoyed with you afterwards.

People were annoyed with Leibniz too, especially with his solution to the problem of evil. If you believe in a benevolent, all-powerful god, as Leibniz did, why is the world full of suffering and misery? Leibniz’s answer was that even an all-powerful god is constrained by logic, so if the world contains evil, it must be logically impossible to make the world any better: indeed, we live in the best of all possible worlds. Voltaire famously made fun of this argument in Candide, dragging a Leibniz-esque Professor Pangloss through some of the most creative miseries the eighteenth century had to offer. It’s possibly the most famous satire of a philosopher, easily beating out Aristophanes’ The Clouds (which is also great).

Physicists can also get accused of cheaty arguments, and probably the most mocked is the idea of a multiverse. While it hasn’t had its own Candide, the multiverse has been criticized by everyone from bloggers to Nobel prizewinners. Leibniz wanted to explain the existence of evil, physicists want to explain “unnaturalness”: the fact that the kinds of theories we use to explain the world can’t seem to explain the mass of the Higgs boson. To explain it, these physicists suggest that there are really many different universes, separated widely in space or built into the interpretation of quantum mechanics. Each universe has a different Higgs mass, and ours just happens to be the one we can live in. This kind of argument is called “anthropic” reasoning. Rather than the best of all possible worlds, it says we live in the world best-suited to life like ours.

I called Leibniz’s argument “cheaty”, and you might presume I think the same of the multiverse. But “cheaty” doesn’t mean “wrong”. It all depends what you’re trying to do.

Leibniz’s argument and the multiverse both work by dodging a problem. For Leibniz, the problem of evil becomes pointless: any evil might be necessary to secure a greater good. With a multiverse, naturalness becomes pointless: with many different laws of physics in different places, the existence of one like ours needs no explanation.

In both cases, though, the dodge isn’t perfect. To really explain any given evil, Leibniz would have to show why it is secretly necessary in the face of a greater good (and Pangloss spends Candide trying to do exactly that). To explain any given law of physics, the multiverse needs to use anthropic reasoning: it needs to show that the law in question must be the way it is to support human-like life.

This sounds like a strict requirement, but in both cases it’s not actually so useful. Leibniz could (and Pangloss does) come up with an explanation for pretty much anything. The problem is that no-one actually knows which aspects of the universe are essential and which aren’t. Without a reliable way to describe the best of all possible worlds, we can’t actually test whether our world is one.

The same problem holds for anthropic reasoning. We don’t actually know what conditions are required to give rise to people like us. “People like us” is very vague, and dramatically different universes might still contain something that can perceive and observe. While it might seem that there are clear requirements, so far there hasn’t been enough for people to do very much with this type of reasoning.

However, for both Leibniz and most of the physicists who believe anthropic arguments, none of this really matters. That’s because the “best of all possible worlds” and “most anthropic of all possible worlds” aren’t really meant to be predictive theories. They’re meant to say that, once you are convinced of certain things, certain problems don’t matter anymore.

Leibniz, in particular, wasn’t trying to argue for the existence of his god. He began the argument convinced that a particular sort of god existed: one that was all-powerful and benevolent, and set in motion a deterministic universe bound by logic. His argument is meant to show that, if you believe in such a god, then the problem of evil can be ignored: no matter how bad the universe seems, it may still be the best possible world.

Similarly, the physicists convinced of the multiverse aren’t really getting there through naturalness. Rather, they’ve become convinced of a few key claims: that the universe is rapidly expanding, leading to a proliferating multiverse, and that the laws of physics in such a multiverse can vary from place to place, due to the huge landscape of possible laws of physics in string theory. If you already believe those things, then the naturalness problem can be ignored: we live in some randomly chosen part of the landscape hospitable to life, which can be anywhere it needs to be.

So despite their cheaty feel, both arguments are fine…provided you agree with their assumptions. Personally, I don’t agree with Leibniz. For the multiverse, I’m less sure. I’m not confident the universe expands fast enough to create a multiverse, I’m not even confident it’s speeding up its expansion now. I know there’s a lot of controversy about the math behind the string theory landscape, about whether the vast set of possible laws of physics are as consistent as they’re supposed to be…and of course, as anyone must admit, we don’t know whether string theory itself is true! I don’t think it’s impossible that the right argument comes around and convinces me of one or both claims, though. These kinds of arguments, “if assumptions, then conclusion”, are the kind of thing that seems useless for a while…until someone convinces you of the conclusion, and they matter once again.

So in the end, despite the similarity, I’m not sure the multiverse deserves its own Candide. I’m not even sure Leibniz deserved Candide. But hopefully by understanding one, you can understand the other just a bit better.

The Only Speed of Light That Matters

A couple weeks back, someone asked me about a Veritasium video with the provocative title “Why No One Has Measured The Speed Of Light”. Veritasium is a science popularization youtube channel, and usually a fairly good one…so it was a bit surprising to see it make a claim usually reserved for crackpots. Many, many people have measured the speed of light, including Ole Rømer all the way back in 1676. To argue otherwise seems like it demands a massive conspiracy.

Veritasium wasn’t proposing a conspiracy, though, just a technical point. Yes, many experiments have measured the speed of light. However, the speed they measure is in fact a “two-way speed”, the speed that light takes to go somewhere and then come back. They leave open the possibility that light travels differently in different directions, and only has the measured speed on average: that there are different “one-way speeds” of light.

The loophole is clearest using some of the more vivid measurements of the speed of light, timing how long it takes to bounce off a mirror and return. It’s less clear using other measurements of the speed of light, like Rømer’s. Rømer measured the speed of light using the moons of Jupiter, noticing that the time they took to orbit appeared to change based on whether Jupiter was moving towards or away from the Earth. For this measurement Rømer didn’t send any light to Jupiter…but he did have to make assumptions about the regularity of the moons’ orbits, using them like a distant clock. Those assumptions also leave the door open to a loophole, one where the different one-way speeds of light are compensated by different speeds for distant clocks. You can watch the Veritasium video for more details about how this works, or see the Wikipedia page for the mathematical details.

When we think of the speed of light as the same in all directions, in some sense we’re making a choice. We’ve chosen a convention, called the Einstein synchronization convention, that lines up distant clocks in a particular way. We didn’t have to choose that convention, though we prefer to (the math gets quite a bit more complicated if we don’t). And crucially for any such choice, it is impossible for any experiment to tell the difference.
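
To see exactly how much freedom that choice leaves, here is a small worked example, using the standard parametrization of alternative conventions (Reichenbach’s ε; the notation is mine, not the video’s):

```latex
% One-way speeds under a general synchronization convention, parametrized by
% Reichenbach's \epsilon with 0 < \epsilon < 1; Einstein's choice is \epsilon = 1/2.
c_{+} = \frac{c}{2\epsilon}, \qquad c_{-} = \frac{c}{2(1-\epsilon)}
% Over a distance L, the round trip takes the same time for every \epsilon,
% which is why no two-way experiment can tell the conventions apart:
t_{\mathrm{round}} = \frac{L}{c_{+}} + \frac{L}{c_{-}}
                   = \frac{2\epsilon L}{c} + \frac{2(1-\epsilon)L}{c} = \frac{2L}{c}
```

Any ε between 0 and 1 reproduces every round-trip timing ever measured; the only thing that changes is the bookkeeping of distant clocks.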

So far, Veritasium is doing fine here. But if the video was totally fine, I wouldn’t have written this post. The technical argument is fine, but the video screws up its implications.

Near the end of the video, the host speculates whether this ambiguity is a clue. What if a deeper theory of physics could explain why we can’t tell the difference between different synchronizations? Maybe that would hint at something important.

Well, it does hint at something important, but not something new. What it hints at is that “one-way speeds” don’t matter. Not for light, or really for anything else.

Think about measuring the speed of something, anything. There are two ways to do it. One is to time it against something else, like the signal in a wire, and assume we know that speed. Veritasium shows an example of this, measuring the speed of a baseball that hits a target and sends a signal back. The other way is to send it somewhere with a clock we trust, and compare it to our clock. Each of these requires that something goes back and forth, even if it’s not the same thing each time. We can’t measure the one-way speed of anything because we’re never in two places at once. Everything we measure, every conclusion we come to about the world, rests on something “two-way”: our actions go out, our perceptions come in. Even our depth perception is an inference we inherit from our ancestors, whose experience of seeing food and traveling to it calibrated our notion of distance.

Synchronization of clocks is a convention because the external world is a convention. What we have really, objectively, truly, are our perceptions and our memories. Everything else is a model we build to fill the gaps in between. Some features of that model are essential: if you change them, you no longer match our perceptions. Other features, though, are just convenience: ways we arrange the model to make it easier to use, to make it not “sound dumb”, to tell a coherent story. Synchronization is one of those things: the notion that you can compare times in distant places is convenient, but as relativity already tells us in other contexts, not necessary. It’s part of our storytelling, not an essential part of our model.

Don’t Trust the Experiments, Trust the Science

I was chatting with an astronomer recently, and this quote by Arthur Eddington came up:

“Never trust an experimental result until it has been confirmed by theory.”

Arthur Eddington

At first, this sounds like just typical theorist arrogance, thinking we’re better than all those experimentalists. It’s not that, though, or at least not just that. Instead, it’s commenting on a trend that shows up again and again in science, but rarely makes the history books. Again and again an experiment or observation comes through with something fantastical, something that seems like it breaks the laws of physics or throws our best models into disarray. And after a few months, when everyone has checked, it turns out there was a mistake, and the experiment agrees with existing theories after all.

You might remember a recent example, when a lab claimed to have measured neutrinos moving faster than the speed of light, only for it to turn out to be due to a loose cable. Experiments like this aren’t just a result of modern hype: as Eddington’s quote shows, they were also common in his day. In general, Eddington’s advice is good: when an experiment contradicts theory, theory tends to win in the end.

This may sound unscientific: surely we should care only about what we actually observe? If we defer to theory, aren’t we putting dogma ahead of the evidence of our senses? Isn’t that the opposite of good science?

To understand what’s going on here, we can use an old philosophical argument: David Hume’s argument against miracles. David Hume wanted to understand how we use evidence to reason about the world. He argued that, for miracles in particular, we can never have good evidence. In Hume’s definition, a miracle was something that broke the established laws of science. Hume argued that, if you believe you observed a miracle, there are two possibilities: either the laws of science really were broken, or you made a mistake. The thing is, laws of science don’t just come from a textbook: they come from observations as well, many, many observations in many different conditions over a long period of time. Some of those observations establish the laws in the first place, others come from the communities that successfully apply them again and again over the years. If your miracle was real, then it would throw into doubt many, if not all, of those observations. So the question you have to ask is: is it more likely those observations were wrong? Or that you made a mistake? Put another way, your evidence is only good enough for a miracle if it would be a bigger miracle if you were wrong.

Hume’s argument always struck me as a little bit too strict: if you rule out miracles like this, you also rule out new theories of science! A more modern approach would use numbers and statistics, weighing the past evidence for a theory against the precision of the new result. Most of the time you’d reach the same conclusion, but sometimes an experiment can be good enough to overthrow a theory.
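
To make that “more modern approach” concrete, here is a toy Bayesian version of the weighing (the numbers are mine and purely illustrative): a tiny prior that a well-established law is wrong, weighed against the chance that the experiment simply made a mistake.

```python
# A sketch of Hume's argument in Bayesian form, with made-up numbers.
def posterior_law_broken(prior_broken, p_anomaly_if_broken, p_anomaly_if_mistake):
    """Probability the law really is broken, given that we saw the anomaly."""
    evidence = (p_anomaly_if_broken * prior_broken
                + p_anomaly_if_mistake * (1 - prior_broken))
    return p_anomaly_if_broken * prior_broken / evidence

# A law backed by decades of observations, tested by a decent but fallible experiment:
print(posterior_law_broken(1e-6, 0.99, 0.01))   # ~1e-4: almost certainly a loose cable
# The same law, tested by an extraordinarily reliable experiment:
print(posterior_law_broken(1e-6, 0.99, 1e-9))   # ~0.999: now the "miracle" wins
```

Most anomalies land in the first regime, which is why Eddington’s advice usually works; the second regime is how new theories ever manage to get through at all.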

Still, theory should always sit in the background, a kind of safety net for when your experiments screw up. It does mean that when you don’t have that safety net you need to be extra-careful. Physics is an interesting case of this: while we have “the laws of physics”, we don’t have any established theory that tells us what kinds of particles should exist. That puts physics in an unusual position, and it’s probably part of why we have such strict standards of statistical proof. If you’re going to be operating without the safety net of theory, you need that kind of proof.

This post was also inspired by some biological examples. The examples are politically controversial, so since this is a no-politics blog I won’t discuss them in detail. (I’ll also moderate out any comments that do.) All I’ll say is that I wonder if in that case the right heuristic is this kind of thing: not to “trust scientists” or “trust experts” or even “trust statisticians”, but just to trust the basic, cartoon-level biological theory.

Facts About Math Are Facts About Us

Each year, the Niels Bohr International Academy has a series of public talks. Part of Copenhagen’s Folkeuniversitet (“people’s university”), they attract a mix of older people who want to keep up with modern developments and young students looking for inspiration. I gave a talk a few days ago, as part of this year’s program. The last time I participated, back in 2017, I covered a topic that comes up a lot on this blog: “The Quest for Quantum Gravity”. This year, I was asked to cover something more unusual: “The Unreasonable Effectiveness of Mathematics in the Natural Sciences”.

Some of you might notice that title is already taken: it’s a famous lecture by the physicist Wigner, from 1959. Wigner posed an interesting question: why is advanced mathematics so useful in physics? Time and time again, mathematicians develop an idea purely for its own sake, only for physicists to find it absolutely indispensable to describe some part of the physical world. Should we be surprised that this keeps working? Suspicious?

I talked a bit about this: some of the answers people have suggested over the years, and my own opinion. But like most public talks, the premise was mostly a vehicle for cool examples: physicists through history bringing in new math, and surprising mathematical facts like the ones I talked about a few weeks back at Culture Night. Because of that, I was actually a bit unprepared to dive into the philosophical side of the topic (despite it being in principle a very philosophical topic!). When one of the audience members brought up mathematical Platonism, I floundered a bit, not wanting to say something that was too philosophically naive.

Well, if there’s anywhere I can be naive, it’s my own blog. I even have a label for Amateur Philosophy posts. So let’s do one.

Mathematical Platonism is the idea that mathematical truths “exist”: that they’re somewhere “out there” being discovered. On the other side, one might believe that mathematics is not discovered, but invented. For some reason, a lot of people with the latter opinion seem to think this has something to do with describing nature (for example, an essay a few years back by Lee Smolin defines mathematics as “the study of systems of evoked relationships inspired by observations of nature”).

I’m not a mathematical Platonist. I don’t even like to talk about which things do or don’t “exist”. But I also think that describing mathematics in terms of nature is missing the point. Mathematicians aren’t physicists. While there may have been a time when geometers argued over lines in the sand, these days mathematicians’ inspiration isn’t usually the natural world, at least not in the normal sense.

Instead, I think you can’t describe mathematics without describing mathematicians. A mathematical fact is, deep down, something a mathematician can say without other mathematicians shouting them down. It’s an allowed move in what my hazy secondhand memory of Wittgenstein wants to call a “language game”: something that gets its truth from a context of people interpreting and reacting to it, in the same way a move in chess matters only when everyone is playing by its rules.

This makes mathematics sound very subjective, and we’re used to the opposite: the idea that a mathematical fact is as objective as they come. The important thing to remember is that even with this kind of description, mathematics still ends up vastly less subjective than any other field. We care about subjectivity between different people: if a fact is “true” for Brits and “false” for Germans, then it’s a pretty limited fact. Mathematics is special because the “rules of its game” aren’t rules of one group or another. They’re rules that are in some sense our birthright. Any human who can read and write, or even just act and perceive, can act as a Turing Machine, a universal computer. With enough patience and paper, anything that you can prove to one person you can prove to another: you just have to give them the rules and let them follow them. It doesn’t matter how smart you are, or what you care about most: if something is mathematically true for others, it is mathematically true for you.

Some would argue that this is evidence for mathematical Platonism, that if something is a universal truth it should “exist”. Even if it does, though, I don’t think it’s useful to think of it in that way. Once you believe that mathematical truth is “out there”, you want to try to characterize it, to say something about it besides that it’s “out there”. You’ll be tempted to have an opinion on the Axiom of Choice, or the Continuum Hypothesis. And the whole point is that those aren’t sensible things to have opinions on, that having an opinion about them means denying the mathematical proofs that they are, in the “standard” axioms, undecidable. Whatever is “out there”, it has to include everything you can prove with every axiom system, whichever non-standard ones you can cook up, because mathematicians will happily work on any of them. The whole point of mathematics, the thing that makes it as close to objective as anything can be, is that openness: the idea that as long as an argument is good enough, as long as it can convince anyone prepared to wade through the pages, then it is mathematics. Nothing, so long as it can convince in the long-run, is excluded.

If we take this definition seriously, there are some awkward consequences. You could imagine a future in which every mind, everyone you might be able to do mathematics with, is crushed under some tyrant, forced to agree to something false. A real philosopher would dig in to this corner case, try to salvage the definition or throw it out. I’m not a real philosopher though. So all I can say is that while I don’t think that tyrant gets to define mathematics, I also don’t think there are good alternatives to my argument. Our only access to mathematics, and to truth in general, is through the people who pursue it. I don’t think we can define one without the other.

Of Cows and Razors

Last week’s post came up on Reddit, where a commenter made a good point. I said that one of the mysteries of neutrinos is that they might not get their mass from the Higgs boson. This is true, but the commenter rightly points out it’s true of other particles too: electrons might not get their mass from the Higgs. We aren’t sure. The lighter quarks might not get their mass from the Higgs either.

When talking physics with the public, we usually say that electrons and quarks all get their mass from the Higgs. That’s how it works in our Standard Model, after all. But even though we’ve found the Higgs boson, we can’t be 100% sure that it functions the way our model says. That’s because there are aspects of the Higgs we haven’t been able to measure directly. We’ve measured how it affects the heaviest quark, the top quark, but measuring its interactions with other particles will require a bigger collider. Until we have those measurements, the possibility remains open that electrons and quarks get their mass another way. It would be a more complicated way: we know the Higgs does a lot of what the model says, so if it deviates in another way we’d have to add more details, maybe even more undiscovered particles. But it’s possible.

If I wanted to defend the idea that neutrinos are special here, I would point out that neutrino masses, unlike electron masses, are not part of the Standard Model. For electrons, we have a clear “default” way for them to get mass, and that default is in a meaningful way simpler than the alternatives. For neutrinos, every alternative is complicated in some fashion: either adding undiscovered particles, or unusual properties. If we were to invoke Occam’s Razor, the principle that we should always choose the simplest explanation, then for electrons and quarks there is a clear winner. Not so for neutrinos.

I’m not actually going to make this argument. That’s because I’m a bit wary of using Occam’s Razor when it comes to questions of fundamental physics. Occam’s Razor is a good principle to use, if you have a good idea of what’s “normal”. In physics, you don’t.

To illustrate, I’ll tell an old joke about cows and trains. Here’s the version from The Curious Incident of the Dog in the Night-Time:

There are three men on a train. One of them is an economist and one of them is a logician and one of them is a mathematician. And they have just crossed the border into Scotland (I don’t know why they are going to Scotland) and they see a brown cow standing in a field from the window of the train (and the cow is standing parallel to the train). And the economist says, ‘Look, the cows in Scotland are brown.’ And the logician says, ‘No. There are cows in Scotland of which at least one is brown.’ And the mathematician says, ‘No. There is at least one cow in Scotland, of which one side appears to be brown.’

One side of this cow appears to be very fluffy.

If we want to be as careful as possible, the mathematician’s answer is best. But we expect not to have to be so careful. Maybe the economist’s answer, that Scottish cows are brown, is too broad. But we could imagine an agronomist who states “There is a breed of cows in Scotland that is brown”. And I suggest we should find that pretty reasonable. Essentially, we’re using Occam’s Razor: if we want to explain seeing a brown half-cow from a train, the simplest explanation would be that it’s a member of a breed of cows that are brown. It would be less simple if the cow were unique, a brown mutant in a breed of black and white cows. It would be even less simple if only one side of the cow were brown, and the other were another color.

When we use Occam’s Razor in this way, we’re drawing from our experience of cows. Most of the cows we meet are members of some breed or other, with similar characteristics. We don’t meet many mutant cows, or half-colored cows, so we think of those options as less simple, and less likely.

But what kind of experience tells us which option is simpler for electrons, or neutrinos?

The Standard Model is a type of theory called a Quantum Field Theory. We have experience with other Quantum Field Theories: we use them to describe materials, metals and fluids and so forth. Still, it seems a bit odd to say that if something is typical of these materials, it should also be typical of the universe. As another physicist in my sub-field, Nima Arkani-Hamed, likes to say, “the universe is not a crappy metal!”

We could also draw on our experience from other theories in physics. This is a bit more productive, but has other problems. Our other theories are invariably incomplete, that’s why we come up with new theories in the first place…and with so few theories, compared to breeds of cows, it’s unclear that we really have a good basis for experience.

Physicists like to brag that we study the most fundamental laws of nature. Ordinarily, this doesn’t matter as much as we pretend: there’s a lot to discover in the rest of science too, after all. But here, it really makes a difference. Unlike other fields, we don’t know what’s “normal”, so we can’t really tell which theories are “simpler” than others. We can make aesthetic judgements, on the simplicity of the math or the number of fields or the quality of the stories we can tell. If we want to be principled and forego all of that, then we’re left with an abyss, a world of bare observations and parameter soup.

If a physicist looks out a train window, will they say that all the electrons they see get their mass from the Higgs? Maybe, still. But they should be careful about it.

Who Is, and Isn’t, Counting Angels on a Pinhead

How many angels can dance on the head of a pin?

It’s a question famous for its sheer pointlessness. While probably no-one ever had that exact debate, “how many angels fit on a pin” has become a metaphor, first for a host of old theology debates that went nowhere, and later for any academic study that seems like a waste of time. Occasionally, physicists get accused of doing this: typically string theorists, but also people who debate interpretations of quantum mechanics.

Are those accusations fair? Sometimes yes, sometimes no. In order to tell the difference, we should think about what’s wrong, exactly, with counting angels on the head of a pin.

One obvious answer is that knowing the number of angels that fit on a needle’s point is useless. Wikipedia suggests that was the origin of the metaphor in the first place, a pun on “needle’s point” and “needless point”. But this answer is a little too simple, because this would still be a useful debate if angels were real and we could interact with them. “How many angels fit on the head of a pin” is really a question about whether angels take up space, whether two angels can be at the same place at the same time. Asking that question about particles led physicists to bosons and fermions, which among other things led us to invent the laser. If angelology worked, perhaps we would have angel lasers as well.

Be not afraid of my angel laser

“If angelology worked” is key here, though. Angelology didn’t work, it didn’t lead to angel-based technology. And while Medieval people couldn’t have known that for certain, maybe they could have guessed. When people accuse academics of “counting angels on the head of a pin”, they’re saying those academics should be able to guess that their work is destined for uselessness.

How do you guess something like that?

Well, one problem with counting angels is that nobody doing the counting had ever seen an angel. Counting angels on the head of a pin implies debating something you can’t test or observe. That can steer you off-course pretty easily, into conclusions that are either useless or just plain wrong.

This can’t be the whole of the problem though, because of mathematics. We rarely accuse mathematicians of counting angels on the head of a pin, but the whole point of math is to proceed by pure logic, without an experiment in sight. Mathematical conclusions can sometimes be useless (though we can never be sure, some ideas are just ahead of their time), but we don’t expect them to be wrong.

The key difference is that mathematics has clear rules. When two mathematicians disagree, they can look at the details of their arguments, make sure every definition is as clear as possible, and discover which one made a mistake. Working this way, what they build is reliable. Even if it isn’t useful yet, the result is still true, and so may well be useful later.

In contrast, when you imagine Medieval monks debating angels, you probably don’t imagine them with clear rules. They might quote contradictory bible passages, argue everyday meanings of words, and win based more on who was poetic and authoritative than who really won the argument. Picturing a debate over how many angels can fit on the head of a pin, it seems more like Calvinball than like mathematics.

This then, is the heart of the accusation. Saying someone is just debating how many angels can dance on a pin isn’t merely saying they’re debating the invisible. It’s saying they’re debating in a way that won’t go anywhere, a debate without solid basis or reliable conclusions. It’s saying, not just that the debate is useless now, but that it will likely always be useless.

As an outsider, you can’t just dismiss a field because it can’t do experiments. What you can and should do, is dismiss a field that can’t produce reliable knowledge. This can be hard to judge, but a key sign is to look for these kinds of Calvinball-style debates. Do people in the field seem to argue the same things with each other, over and over? Or do they make progress and open up new questions? Do the people talking seem to be just the famous ones? Or are there cases of young and unknown researchers who happen upon something important enough to make an impact? Do people just list prior work in order to state their counter-arguments? Or do they build on it, finding consequences of others’ trusted conclusions?

A few corners of string theory do have this Calvinball feel, as do a few of the debates about the fundamentals of quantum mechanics. But if you look past the headlines and blogs, most of each of these fields seems more reliable. Rather than interminable back-and-forth about angels and pinheads, these fields are quietly accumulating results that, one way or another, will give people something to build on.

Being Precise About Surprise

A reader pointed me to Stephen Wolfram’s one-year update of his proposal for a unified theory of physics. I was pretty squeamish about it one year ago, and now I’m even less interested in wading into the topic. But I thought it would be worth saying something, and rather than say something specific, I realized I could say something general. I thought I’d talk a bit about how we judge good and bad research in theoretical physics.

In science, there are two things we want out of a new result: we want it to be true, and we want it to be surprising. The first condition should be obvious, but the second is also important. There’s no reason to do an experiment or calculation if it will just tell us something we already know. We do science in the hope of learning something new, and that means that the best results are the ones we didn’t expect.

(What about replications? We’ll get there.)

If you’re judging an experiment, you can measure both of these things with statistics. Statistics lets you estimate how likely an experiment’s conclusion is to be true: was there a large enough sample? Strong enough evidence? It also lets you judge how surprising the experiment is, by estimating how likely it would be to happen given what was known beforehand. Did existing theories and earlier experiments make the result seem likely, or unlikely? While you might not have considered replications surprising, from this perspective they can be: if a prior experiment seems unreliable, successfully replicating it can itself be a surprising result.

If instead you’re judging a theoretical result, these measures get more subtle. There aren’t always good statistical tools to test them. Nonetheless, you don’t have to rely on vague intuitions either. You can be fairly precise, both about how true a result is and how surprising it is.

We get our results in theoretical physics through mathematical methods. Sometimes, this is an actual mathematical proof: guaranteed to be true, no statistics needed. Sometimes, it resembles a proof, but falls short: vague definitions and unstated assumptions mar the argument, making it less likely to be true. Sometimes, the result uses an approximation. In those cases we do get to use some statistics, estimating how good the approximation may be. Finally, a result can’t be true if it contradicts something we already know. This could be a logical contradiction in the result itself, but if the result is meant to describe reality (note: not always the case), it might contradict the results of a prior experiment.

What makes a theoretical result surprising? And how precise can we be about that surprise?

Theoretical results can be surprising in the light of earlier theory. Sometimes, this gets made precise by a no-go theorem, a proof that some kind of theoretical result is impossible to obtain. If a result finds a loophole in a no-go theorem, that can be quite surprising. Other times, a result is surprising because it’s something no-one else was able to do. To be precise about that kind of surprise, you need to show that the result is something others wanted to do, but couldn’t. Maybe someone else made a conjecture, and only you were able to prove it. Maybe others did approximate calculations, and now you can do them more precisely. Maybe a question was controversial, with different people arguing for different sides, and you have a more conclusive argument. This is one of the better reasons to include a long list of references in a paper: not to pad your friends’ citation counts, but to show that your accomplishment is surprising: that others might have wanted to achieve it, but had to settle for something lesser.

In general, this means that showing a theoretical result is good, not merely true but surprising and new, links you up to the rest of the theoretical community. You can put in all the work you like on a theory of everything, and make it as rigorous as possible, but if all you did was reproduce a sub-case of someone else’s theory then you haven’t accomplished all that much. If you put your work in context, compare and contrast to what others have done before, then we can start getting precise about how much we should be surprised, and get an idea of what your result is really worth.

Reality as an Algebra of Observables

Listen to a physicist talk about quantum mechanics, and you’ll hear the word “observable”. Observables are, intuitively enough, things that can be observed. They’re properties that, in principle, one could measure in an experiment, like the position of a particle or its momentum. They’re the kinds of things linked by uncertainty principles, where the better you know one, the worse you know the other.

Some physicists get frustrated by this focus on measurements alone. They think we ought to treat quantum mechanics, not like a black box that produces results, but as information about some underlying reality. Instead of just observables, they want us to look for “beables”: not just things that can be observed, but things that something can be. From their perspective, the way other physicists focus on observables feels like giving up, like those physicists are abandoning their sacred duty to understand the world. Others, like the Quantum Bayesians or QBists, disagree, arguing that quantum mechanics really is, and ought to be, a theory of how individuals get evidence about the world.

I’m not really going to weigh in on that debate, I still don’t feel like I know enough to even write a decent summary. But I do think that one of the instincts on the “beables” side is wrong. If we focus on observables in quantum mechanics, I don’t think we’re doing anything all that unusual. Even in other parts of physics, we can think about reality purely in terms of observations. Doing so isn’t a dereliction of duty: often, it’s the most useful way to understand the world.

When we try to comprehend the world, we always start alone. From our time in the womb, we have only our senses and emotions to go on. With a combination of instinct and inference we start assembling a consistent picture of reality. Philosophers called phenomenologists (not to be confused with the physicists called phenomenologists) study this process in detail, trying to characterize how different things present themselves to an individual consciousness.

For my point here, these details don’t matter so much. That’s because in practice, we aren’t alone in understanding the world. Based on what others say about the world, we conclude they perceive much like we do, and we learn by their observations just as we learn by our own. We can make things abstract: instead of the specifics of how individuals perceive, we think about groups of scientists making measurements. At the end of this train lie observables: things that we as a community could in principle learn, and share with each other, ignoring the details of how exactly we measure them.

If each of these observables was unrelated, just scattered points of data, then we couldn’t learn much. Luckily, they are related. In quantum mechanics, some of these relationships are the uncertainty principles I mentioned earlier. Others relate measurements at different places, or at different times. The fancy way to refer to all these relationships is as an algebra: loosely, it’s something you can “do algebra with”, like you did with numbers and variables in high school. When physicists and mathematicians want to do quantum mechanics or quantum field theory seriously, they often talk about an “algebra of observables”, a formal way of thinking about all of these relationships.
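
For a concrete, if cartoonish, taste of what such an algebra looks like, here is a sketch with the simplest quantum system there is, a single spin (the example is mine, not anything specific from the quantum field theory literature):

```python
# Observables for a single spin: measurements along x, y, and z (Pauli matrices).
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def commutator(a, b):
    """The 'algebra' part: how far two observables are from commuting."""
    return a @ b - b @ a

# [sx, sy] = 2i*sz: spin-x and spin-y measurements don't commute, and that
# non-zero commutator is where the uncertainty relation between them comes from.
print(np.allclose(commutator(sx, sy), 2j * sz))  # True
```

The relationships, not the matrices themselves, are the point: keep track of which observables commute with which, and you have the kind of structure physicists mean by an “algebra of observables”.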

Focusing on those two things, observables and how they are related, isn’t just useful in the quantum world. It’s an important way to think in other areas of physics too. If you’ve heard people talk about relativity, the focus on measurement screams out, in thought experiments full of abstract clocks and abstract yardsticks. Without this discipline, you find paradoxes, only to resolve them when you carefully track what each person can observe. More recently, physicists in my field have had success computing the chance particles collide by focusing on the end result, the actual measurements people can make, ignoring what might happen in between to cause that measurement. We can then break measurements down into simpler measurements, or use the structure of simpler measurements to guess more complicated ones. While we typically have done this in quantum theories, that’s not really a limitation: the same techniques make sense for problems in classical physics, like computing the gravitational waves emitted by colliding black holes.

With this in mind, we really can think of reality in those terms: not as a set of beable objects, but as a set of observable facts, linked together in an algebra of observables. Paring things down to what we can know in this way is more honest, and it’s also more powerful and useful. Far from a betrayal of physics, it’s the best advantage we physicists have in our quest to understand the world.