Tag Archives: philosophy of science

On Stubbornness and Breaking Down

In physics, we sometimes say that an idea “breaks down”. What do we mean by that?

When a theory “breaks down”, we mean that it stops being accurate. Newton’s theory of gravity is excellent most of the time, but for objects under strong enough gravity or at high enough speed its predictions stop matching reality and a new theory (relativity) is needed. This is the sense in which we say that Newtonian gravity breaks down for the orbit of Mercury, or breaks down much more severely in the region around a black hole.

When a symmetry is “broken”, we mean that it stops holding true. Most of physics looks the same when you flip it in a mirror, a property called parity symmetry. Take a pile of electric and magnetic fields, currents and wires, and you’ll find their mirror reflection is also a perfectly reasonable pile of electric and magnetic fields, currents and wires. This isn’t true for all of physics, though: the weak nuclear force isn’t the same when you flip it in a mirror. We say that the weak force breaks parity symmetry.

What about when a more general “idea” breaks down? What about space-time?

In order for space-time to break down, there needs to be a good reason to abandon the idea. And depending on how stubborn you are about it, that reason can come at different times.

You might think of space-time as just Einstein’s theory of general relativity. In that case, you could say that space-time breaks down as soon as the world deviates from that theory: any modification to general relativity, no matter how small, would mean that space-time has broken down. You can think of this as the “least stubborn” option, the one with barely any stubbornness at all, which lets space-time break down with a tiny nudge.

But if general relativity breaks down, a slightly more stubborn person could insist that space-time is still fine. You can still describe things as located at specific places and times, moving across curved space-time. They just obey extra forces, on top of those built into the space-time.

Such a person would be happy as long as general relativity remained a good approximation of what was going on, but they might admit space-time has broken down when general relativity becomes a bad approximation. If there are only small corrections on top of the usual space-time picture, then space-time would be fine, but if those corrections got so big that they overwhelmed the original predictions of general relativity, then that’s quite a different situation. At that point, space-time may have stopped being a useful description, and it may be much better to describe the world in another way.

But we could imagine an even more stubborn person who still insists that space-time is fine. Ultimately, our predictions about the world are mathematical formulas. No matter how complicated they are, we can always subtract a piece off of those formulas corresponding to the predictions of general relativity, and call the rest an extra effect. That may be a totally useless thing to do that doesn’t help you calculate anything, but someone could still do it, and thus insist that space-time still hasn’t broken down.

To convince such a person, space-time would need to break down in a way that made some important concept behind it invalid. There are various ways this could happen, corresponding to different concepts. For example, one unusual proposal is that space-time is non-commutative. If that were true then, in addition to the usual Heisenberg uncertainty principle between position and momentum, there would be an uncertainty principle between different directions in space-time. That would mean that you can’t define the position of something in all directions at once, which many people would agree is an important part of having a space-time!
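
To give a rough sense of what that would mean, here is the relation usually written down for the simplest non-commutative models, as a schematic sketch (θ here is an assumed constant antisymmetric matrix, and conventions vary):

    % Coordinates are promoted to operators that no longer commute:
    [\hat{x}^{\mu}, \hat{x}^{\nu}] = i\,\theta^{\mu\nu}
    % which implies, roughly, an uncertainty relation between different directions:
    \Delta x^{\mu}\,\Delta x^{\nu} \gtrsim \tfrac{1}{2}\,|\theta^{\mu\nu}|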

Ultimately, physics is concerned with practicality. We want our concepts not just to be definable, but to do useful work in helping us understand the world. Our stubbornness should depend on whether a concept, like space-time, is still useful. If it is, we keep it. But if the situation changes, and another concept is more useful, then we can confidently say that space-time has broken down.

Why Dark Matter Feels Like Cheating (And Why It Isn’t)

I’ve never met someone who believed the Earth was flat. I’ve met a few who believed it was six thousand years old, but not many. Occasionally, I run into crackpots who rail against relativity or quantum mechanics, or more recent discoveries like quarks or the Higgs. But for one conclusion of modern physics, the doubters are common. For this one idea, the average person may not insist that the physicists are wrong, but they’ll usually roll their eyes a little and ask the occasional “really?”

That idea is dark matter.

For the average person, dark matter doesn’t sound like normal, responsible science. It sounds like cheating. Scientists try to explain the universe, using stars and planets and gravity, and eventually they notice the equations don’t work, so they just introduce some new matter nobody can detect. It’s as if a budget didn’t add up, so the accountant just introduced some “dark expenses” to hide the problem.

Part of what’s going on here is that fundamental physics, unlike other fields, doesn’t have to reduce to something else. An accountant has to explain the world in terms of transfers of money, a chemist in terms of atoms and molecules. A physicist has to explain the world in terms of math, with no more restrictions than that. Whatever the “base level” of another field is, physics can, and must, go deeper.

But that doesn’t explain everything. Physics may have to explain things in terms of math, but we shouldn’t just invent new math whenever we feel like it. Surely, we should prefer explanations in terms of things we know to explanations in terms of things we don’t know. The question then becomes, what justifies the preference? And when do we get to break it?

Imagine you’re camping in your backyard. You’ve brought a pack of jumbo marshmallows. You wake up to find a hole torn in the bag, a few marshmallows strewn on a trail into the bushes, the rest gone. You’re tempted to imagine a new species of ant, with enormous jaws capable of ripping open plastic and hauling the marshmallows away. Then you remember your brother likes marshmallows, and it’s probably his fault.

Now imagine instead you’re camping in the Amazon rainforest. Suddenly, the ant explanation makes sense. You may not have a particular species of ants in mind, but you know the rainforest is full of new species no-one has yet discovered. And you’re pretty sure your brother couldn’t have flown to your campsite in the middle of the night and stolen your marshmallows.

We do have a preference against introducing new types of “stuff”, like new species of ants or new particles. We have that preference because these new types of stuff are unlikely, based on our current knowledge. We don’t expect new species of ants in our backyards, because we think we have a pretty good idea of what kinds of ants exist, and we think a marshmallow-stealing brother is more likely. That preference gets dropped, however, based on the strength of the evidence. If it’s very unlikely our brother stole the marshmallows, and if we’re somewhere our knowledge of ants is weak, then the marshmallow-stealing ants are more likely.
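
If you like seeing the logic spelled out, here is a toy Bayesian version of the marshmallow story in Python. The numbers are entirely made up for illustration, and the function is just a hypothetical helper, but the shape of the argument is the same: priors set the preference, and strong enough evidence overturns it.

    # Toy Bayesian comparison of "unknown ants" vs "the brother", with made-up numbers.
    def posterior_ants(prior_ants, p_evidence_given_ants, p_evidence_given_brother):
        """Probability that ants (rather than the brother) took the marshmallows."""
        prior_brother = 1.0 - prior_ants
        numerator = prior_ants * p_evidence_given_ants
        return numerator / (numerator + prior_brother * p_evidence_given_brother)

    # Backyard: giant unknown ants are very unlikely, and the brother is right there.
    print(posterior_ants(prior_ants=0.001,
                         p_evidence_given_ants=0.9,
                         p_evidence_given_brother=0.5))   # ~0.002: blame the brother

    # Amazon: unknown ants are plausible, the brother almost certainly isn't around.
    print(posterior_ants(prior_ants=0.3,
                         p_evidence_given_ants=0.9,
                         p_evidence_given_brother=0.01))  # ~0.97: blame the ants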

Dark matter is a massive leap. It’s not a massive leap because we can’t see it, but simply because it involves new particles, particles not in our Standard Model of particle physics. (Or, for the MOND-ish fans, new fields not present in Einstein’s theory of general relativity.) It’s hard to justify physics beyond the Standard Model, and our standards for justifying it are in general very high: we need very precise experiments to conclude that the Standard Model is well and truly broken.

For dark matter, we keep those standards. The evidence for some kind of dark matter, that there is something that can’t be explained by just the Standard Model and Einstein’s gravity, is at this point very strong. Far from a vague force that appears everywhere, we can map dark matter’s location and systematically describe its effects on everything from the motion of galaxies to clusters of galaxies to the early history of the universe. We’ve checked whether there’s something we’ve left out, whether black holes or unseen planets might account for it, and they can’t. It’s still possible we’ve missed something, just like it’s possible your brother flew to the Amazon to steal your marshmallows, but it’s less likely than the alternatives.

Also, much like ants in the rainforest, we don’t know every type of particle. We know there are things we’re missing: new types of neutrinos, or new particles to explain quantum gravity. These don’t have to have anything to do with dark matter, they might be totally unrelated. But they do show that we should expect, sometimes, to run into particles we don’t already know about. We shouldn’t expect that we already know all the particles.

If physicists did what the cartoons suggest, it really would be cheating. If we proposed dark matter because our equations didn’t match up, and stopped checking, we’d be no better than an accountant adding “dark money” to a budget. But we didn’t do that. When we argue that dark matter exists, it’s because we’ve actually tried to put together the evidence, because we’ve weighed it against the preference to stick with the Standard Model and found the evidence tips the scales. The instinct to call it cheating is a good instinct, one you should cultivate. But here, it’s an instinct physicists have already taken into account.

Machine Learning, Occam’s Razor, and Fundamental Physics

There’s a saying in physics, attributed to the famous genius John von Neumann: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”

Say you want to model something, like some surprising data from a particle collider. You start with some free parameters: numbers in your model that aren’t decided yet. You then decide those numbers, “fixing” them based on the data you want to model. Your goal is for your model not only to match the data, but to predict something you haven’t yet measured. Then you can go out and check, and see if your model works.

The more free parameters you have in your model, the easier this can go wrong. More free parameters make it easier to fit your data, but that’s because they make it easier to fit any data. Your model ends up not just matching the physics, but matching the mistakes as well: the small errors that crop up in any experiment. A model like that may look like it’s a great fit to the data, but its predictions will almost all be wrong. It wasn’t just fit, it was overfit.

We have statistical tools that tell us when to worry about overfitting, when we should be impressed by a model and when it has too many parameters. We don’t actually use these tools correctly, but they still give us a hint of what we actually want to know, namely, whether our model will make the right predictions. In a sense, these tools form the mathematical basis for Occam’s Razor, the idea that the best explanation is often the simplest one, and Occam’s Razor is a critical part of how we do science.
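
Here is a minimal sketch of that failure mode in Python, assuming numpy is available: fit polynomials of increasing degree to a handful of noisy points, then compare the error on the fitted points with the error on points the fit never saw.

    # Overfitting sketch: more free parameters fit the noise, not just the physics.
    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 1, 10)
    x_test = np.linspace(0, 1, 100)
    truth = lambda x: np.sin(2 * np.pi * x)
    y_train = truth(x_train) + 0.1 * rng.standard_normal(x_train.size)

    for degree in (1, 3, 9):   # with enough parameters you can "fit an elephant"
        coeffs = np.polyfit(x_train, y_train, degree)
        fit_error = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_error = np.mean((np.polyval(coeffs, x_test) - truth(x_test)) ** 2)
        print(degree, round(fit_error, 4), round(test_error, 4))

    # Typically the fit error keeps shrinking as the degree grows,
    # while the error on unseen points eventually gets worse: overfitting.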

So, did you know machine learning was just modeling data?

All of the much-hyped recent advances in artificial intelligence, GPT and Stable Diffusion and all those folks, at heart they’re all doing this kind of thing. They start out with a model (with a lot more than five parameters, arranged in complicated layers…), then use data to fix the free parameters. Unlike most of the models physicists use, they can’t perfectly fix these numbers: there are too many of them, so they have to approximate. They then test their model on new data, and hope it still works.

Increasingly, it does, and impressively well, so well that the average person probably doesn’t realize this is what it’s doing. When you ask one of these AIs to make an image for you, what you’re doing is asking what image the model predicts would show up captioned with your text. It’s the same sort of thing as asking an economist what their model predicts the unemployment rate will be when inflation goes up. The machine learning model is just way, way more complicated.

As a physicist, the first time I heard about this, I had von Neumann’s quote in the back of my head. Yes, these machines are dealing with a lot more data, from a much more complicated reality. They literally are trying to fit elephants, even elephants wiggling their trunks. Still, the sheer number of parameters seemed fishy here. And for a little bit things seemed even more fishy, when I learned about double descent.

Suppose you start increasing the number of parameters in your model. Initially, your model gets better and better. Your predictions have less and less error, your error descends. Eventually, though, the error increases again: you have too many parameters so you’re over-fitting, and your model is capturing accidents in your data, not reality.

In machine learning, weirdly, this is often not the end of the story. Sometimes, your prediction error rises, only to fall once more, in a double descent.
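
If you want to see this for yourself, here is a rough sketch in Python, assuming numpy. The random-feature setup is just one standard illustrative choice, not anything specific to the systems mentioned above: sweep the number of parameters past the number of data points and watch what the test error often does.

    # Double descent sketch: minimum-norm least squares on random ReLU features.
    import numpy as np

    rng = np.random.default_rng(1)
    n_train, n_test = 40, 500
    x_tr = rng.uniform(-1, 1, n_train)
    x_te = rng.uniform(-1, 1, n_test)
    target = lambda x: np.sin(3 * x)
    y_tr = target(x_tr) + 0.1 * rng.standard_normal(n_train)

    def features(x, w, b):
        """Fixed random ReLU features: one column per parameter."""
        return np.maximum(0.0, np.outer(x, w) + b)

    for p in (5, 20, 40, 80, 400):   # number of parameters, passing through p = n_train
        w, b = rng.standard_normal(p), rng.standard_normal(p)
        coef, *_ = np.linalg.lstsq(features(x_tr, w, b), y_tr, rcond=None)
        test_error = np.mean((features(x_te, w, b) @ coef - target(x_te)) ** 2)
        print(p, round(test_error, 3))

    # The test error often spikes near p = n_train (the "interpolation threshold")
    # and then falls again as p keeps growing: the second descent.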

For a while, I found this deeply disturbing. The idea that you can fit your data, start overfitting, and then keep overfitting, and somehow end up safe in the end, was terrifying. The way some of the popular accounts described it, like you were just overfitting more and more and that was fine, was baffling, especially when they seemed to predict that you could keep adding parameters, keep fitting tinier and tinier fleas on the elephant’s trunk, and your predictions would never start going wrong. It would be the death of Occam’s Razor as we know it, more complicated explanations beating simpler ones off to infinity.

Luckily, that’s not what happens. And after talking to a bunch of people, I think I finally understand this enough to say something about it here.

The right way to think about double descent is as overfitting prematurely. You do still expect your error to eventually go up: your model won’t be perfect forever, at some point you will really overfit. It might take a long time, though: machine learning people are trying to model very complicated things, like human behavior, with giant piles of data, so very complicated models may often be entirely appropriate. In the meantime, due to a bad choice of model, you can accidentally overfit early. You will eventually overcome this, pushing past with more parameters into a model that works again, but for a little while you might convince yourself, wrongly, that you have nothing more to learn.

(You can even mitigate this by tweaking your setup, potentially avoiding the problem altogether.)

So Occam’s Razor still holds, but with a twist. The best model is simple enough, but no simpler. And if you’re not careful enough, you can convince yourself that a too-simple model is as complicated as you can get.

(Image from Astral Codex Ten)

I was reminded of all this recently by some articles by Sabine Hossenfelder.

Hossenfelder is a critic of mainstream fundamental physics. The articles were her restating a point she’s made many times before, including in (at least) one of her books. She thinks the people who propose new particles and try to search for them are wasting time, and the experiments motivated by those particles are wasting money. She’s motivated by something like Occam’s Razor, the need to stick to the simplest possible model that fits the evidence. In her view, the simplest models are those in which we don’t detect any more new particles any time soon, so those are the models she thinks we should stick with.

I tend to disagree with Hossenfelder. Here, I was oddly conflicted. In some of her examples, it seemed like she had a legitimate point. Others seemed like she missed the mark entirely.

Talk to most astrophysicists, and they’ll tell you dark matter is settled science. Indeed, there is a huge amount of evidence that something exists out there in the universe that we can’t see. It distorts the way galaxies rotate, lenses light with its gravity, and wiggled the early universe in pretty much the way you’d expect matter to.

What isn’t settled is whether that “something” interacts with anything else. It has to interact with gravity, of course, but everything else is in some sense “optional”. Astroparticle physicists use satellites to search for clues that dark matter has some other interactions: perhaps it is unstable, sometimes releasing tiny signals of light. If it did, it might solve other problems as well.

Hossenfelder thinks this is bunk (in part because she thinks those other problems are bunk). I kind of do too, though perhaps for a more general reason: I don’t think nature owes us an easy explanation. Dark matter isn’t obligated to solve any of our other problems, it just has to be dark matter. That seems in some sense like the simplest explanation, the one demanded by Occam’s Razor.

At the same time, I disagree with her substantially more on collider physics. At the Large Hadron Collider so far, all of the data is reasonably compatible with the Standard Model, our roughly half-century old theory of particle physics. Collider physicists search that data for subtle deviations, one of which might point to a general discrepancy, a hint of something beyond the Standard Model.

While my intuitions say that the simplest dark matter is completely dark, they don’t say that the simplest particle physics is the Standard Model. Back when the Standard Model was proposed, people might have said it was exceptionally simple because it had a property called “renormalizability”, but these days we view that as less important. Physicists like Ken Wilson and Steven Weinberg taught us to view theories as a kind of series of corrections, like a Taylor series in calculus. Each correction encodes new, rarer ways that particles can interact. A renormalizable theory is just the first term in this series. The higher terms might be zero, but they might not. We even know that some terms cannot be zero, because gravity is not renormalizable.
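
Schematically, the kind of series Wilson and Weinberg taught us to write looks something like the sketch below. The scale Λ and the coefficients are placeholders standing in for whatever heavy physics we haven’t seen yet; the Standard Model is just the leading term.

    \mathcal{L}_{\text{eff}} \;=\; \mathcal{L}_{\text{SM}}
      \;+\; \sum_{i} \frac{c_i^{(6)}}{\Lambda^{2}}\,\mathcal{O}_i^{(6)}
      \;+\; \sum_{j} \frac{c_j^{(8)}}{\Lambda^{4}}\,\mathcal{O}_j^{(8)}
      \;+\; \cdots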

The two cases on the surface don’t seem that different. Dark matter might have zero interactions besides gravity, but it might have other interactions. The Standard Model might have zero corrections, but it might have nonzero corrections. But for some reason, my intuition treats the two differently: I would find it completely reasonable for dark matter to have no extra interactions, but very strange for the Standard Model to have no corrections.

I think part of where my intuition comes from here is my experience with other theories.

One example is a toy model called sine-Gordon theory. In sine-Gordon theory, this Taylor series of corrections is a very familiar Taylor series: the sine function! If you go correction by correction, you’ll see new interactions and more new interactions. But if you actually add them all up, something surprising happens. Sine-Gordon turns out to be a special theory, one with “no particle production”: unlike in normal particle physics, in sine-Gordon particles can neither be created nor destroyed. You would never know this if you did not add up all of the corrections.
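
For the curious, in one common convention the sine-Gordon Lagrangian looks like the sketch below. Expanding the cosine is exactly the “correction by correction” series, and the special no-particle-production property only shows up once you keep the whole thing.

    \mathcal{L} \;=\; \tfrac{1}{2}(\partial_\mu\phi)^2 \;+\; \frac{m^2}{\beta^2}\cos(\beta\phi)
      \;=\; \tfrac{1}{2}(\partial_\mu\phi)^2 \;+\; \frac{m^2}{\beta^2}
        \;-\; \tfrac{1}{2}m^2\phi^2 \;+\; \frac{m^2\beta^2}{24}\phi^4 \;-\;\cdots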

String theory itself is another example. In string theory, elementary particles are replaced by strings, but you can think of that stringy behavior as a series of corrections on top of ordinary particles. Once again, you can try adding these things up correction by correction, but once again the “magic” doesn’t happen until the end. Only in the full series does string theory “do its thing”, and fix some of the big problems of quantum gravity.

If the real world really is a theory like this, then I think we have to worry about something like double descent.

Remember, double descent happens when our models can prematurely get worse before getting better. This can happen if the real thing we’re trying to model is very different from the model we’re using, like the example in this explainer that tries to use straight lines to match a curve. If we think a model is simpler because it puts fewer corrections on top of the Standard Model, then we may end up rejecting a reality with infinite corrections, a Taylor series that happens to add up to something quite nice. Occam’s Razor stops helping us if we can’t tell which models are really the simple ones.

The problem is that every notion of “simple” we can appeal to here is aesthetic, a choice based on what makes the math look nicer. Other sciences don’t have this problem. When a biologist or a chemist wants to look for the simplest model, they look for a model with fewer organisms, fewer reactions…in the end, fewer atoms and molecules, fewer of the building-blocks given to those fields by physics. Fundamental physics can’t do this: we build our theories up from mathematics, and mathematics only demands that we be consistent. We can call theories simpler because we can write them in a simple way (but we could write them in a different way too). Or we can call them simpler because they look more like toy models we’ve worked with before (but those toy models are just a tiny sample of all the theories that are possible). We don’t have a standard of simplicity that is actually reliable.

(Image from the Wikipedia page for dark matter halos)

There is one other way out of this pickle. A theory that is easier to write down is under no obligation to be true. But it is more likely to be useful. Even if the real world is ultimately described by some giant pile of mathematical parameters, if a simple theory is good enough for the engineers then it’s a better theory to aim for: a useful theory that makes peoples’ lives better.

I kind of get the feeling Hossenfelder would make this objection. I’ve seen her argue on twitter that scientists should always be able to say what their research is good for, and her Guardian article has this suggestive sentence: “However, we do not know that dark matter is indeed made of particles; and even if it is, to explain astrophysical observations one does not need to know details of the particles’ behaviour.”

Ok yes, to explain astrophysical observations one doesn’t need to know the details of dark matter particles’ behavior. But taking a step back, one doesn’t actually need to explain astrophysical observations at all.

Astrophysics and particle physics are not engineering problems. Nobody out there is trying to steer a spacecraft all the way across a galaxy, navigating the distribution of dark matter, or creating new universes and trying to make sure they go just right. Even if we might do these things some day, it will be so far in the future that our attempts to understand them won’t just be quaint: they will likely be actively damaging, like confusing old research in dead languages that the field would be better off ignoring so it can start from scratch.

Because of that, usefulness is also not a meaningful guide. It cannot tell you which theories are more simple, which to favor with Occam’s Razor.

Hossenfelder’s highest-profile recent work falls afoul of one or the other of her principles. Her work on the foundations of quantum mechanics could genuinely be useful, but there’s no reason aside from claims of philosophical beauty to expect it to be true. Her work on modeling dark matter is at least directly motivated by data, but is guaranteed to not be useful.

I’m not pointing this out to call Hossenfelder a hypocrite, as some sort of ad hominem or tu quoque. I’m pointing this out because I don’t think it’s possible to do fundamental physics today without falling afoul of these principles. If you want to hold out hope that your work is useful, you don’t have a great reason besides a love of pretty math: otherwise, anything useful would have been discovered long ago. If you just try to model existing data as best you can, then you’re making a model for events far away or locked in high-energy particle colliders, a model no-one else besides other physicists will ever use.

I don’t know the way through this. I think if you need to take Occam’s Razor seriously, to build on the same foundations that work in every other scientific field…then you should stop doing fundamental physics. You won’t be able to make it work. If you still need to do it, if you can’t give up the sub-field, then you should justify it on building capabilities, on the kind of “practice” Hossenfelder also dismisses in her Guardian piece.

We don’t have a solid foundation, a reliable notion of what is simple and what isn’t. We have guesses and personal opinions. And until some experiment uncovers some blinding flash of new useful meaningful magic…I don’t think we can do any better than that.

Congratulations to Alain Aspect, John F. Clauser and Anton Zeilinger!

The 2022 Nobel Prize in Physics was announced this week, awarded to Alain Aspect, John F. Clauser, and Anton Zeilinger for experiments with entangled photons, establishing the violation of Bell inequalities and pioneering quantum information science.

I’ve complained in the past about the Nobel prize being awarded for “baskets” of loosely related topics. This year, though, the three Nobelists have a clear link: they were pioneers in investigating and using quantum entanglement.

You can think of a quantum particle like a coin frozen in mid-air. Once measured, the coin falls, and you read it as heads or tails, but before then the coin is neither, with equal chance to be one or the other. In this metaphor, quantum entanglement slices the coin in half. Slice a coin in half on a table, and its halves will either both show heads, or both tails. Slice our “frozen coin” in mid-air, and it keeps this property: the halves, both still “frozen”, can later be measured as both heads, or both tails. Even if you separate them, the outcomes never become independent: you will never find one half-coin to land on tails, and the other on heads.

For those who read my old posts, I think this is a much better metaphor than the different coin-cut-in-half metaphor I used five years ago.

Einstein thought that this couldn’t be the whole story. He was bothered by the way that measuring a “frozen” coin seems to change its behavior faster than light, screwing up his theory of special relativity. Entanglement, with its ability to separate halves of a coin as far as you liked, just made the problem worse. He thought that there must be a deeper theory, one with “hidden variables” that determined whether the halves would be heads or tails before they were separated.

In 1964, a theoretical physicist named J.S. Bell found that Einstein’s idea had testable consequences. He wrote down a set of statistical relations, called Bell inequalities, that have to hold if there are hidden variables of the type Einstein imagined, and then showed that quantum mechanics could violate those inequalities.

Bell’s inequalities were just theory, though, until this year’s Nobelists arrived to test them. Clauser was first: in the 70’s, he proposed a variant of Bell’s inequalities, then tested them by measuring members of a pair of entangled photons in two different places. He found complete agreement with quantum mechanics.
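
For a taste of the numbers involved, here is a small numeric check in Python of the CHSH form of the inequality that Clauser helped propose. It uses the textbook quantum prediction for a spin-singlet pair; the state and the angles are just the standard illustrative choice.

    # CHSH check: local hidden variables require |S| <= 2, quantum mechanics can exceed it.
    import numpy as np

    E = lambda a, b: -np.cos(a - b)   # singlet-state correlation for angles a, b

    def chsh(a, a2, b, b2):
        """The CHSH combination of correlations."""
        return E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)

    a, a2, b, b2 = 0.0, np.pi / 2, np.pi / 4, 3 * np.pi / 4
    print(abs(chsh(a, a2, b, b2)))    # 2*sqrt(2) ~ 2.83, comfortably above 2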

Still, there was a loophole left for Einstein’s idea. If the settings on the two measurement devices could influence the pair of photons when they were first entangled, that would allow hidden variables to influence the outcome in a way that avoided Bell and Clauser’s calculations. It was Aspect, in the 80’s, who closed this loophole: by doing experiments fast enough to change the measurement settings after the photons were entangled, he could show that the settings could not possibly influence the forming of the entangled pair.

Aspect’s experiments, in many minds, were the end of the story. They were the ones emphasized in the textbooks when I studied quantum mechanics in school.

The remaining loopholes are trickier. Some hope for a way to correlate the behavior of particles and measurement devices that doesn’t run afoul of Aspect’s experiment. This idea, called superdeterminism, has recently had a few passionate advocates, but speaking personally I’m still confused as to how it’s supposed to work. Others want to jettison special relativity altogether. This would not only involve measurements influencing each other faster than light, but would also break a kind of symmetry present in the experiments, because it would declare one measurement or the other to have happened “first”, something special relativity forbids. The majority, uncomfortable with either approach, thinks that quantum mechanics is complete, with no deterministic theory that can replace it. They differ only on how to describe, or interpret, the theory, a debate more the domain of careful philosophy than of physics.

After all of these philosophical debates over the nature of reality, you may ask: what can quantum entanglement do for you?

Suppose you want to make a computer out of quantum particles, one that uses the power of quantum mechanics to do things no ordinary computer can. A normal computer needs to copy data from place to place, from hard disk to RAM to your processor. Quantum particles, however, can’t be copied: a theorem says that you cannot make an identical, independent copy of a quantum particle. Moving quantum data then required a new method, pioneered by Anton Zeilinger in the late 90’s using quantum entanglement. The method destroys the original particle to make a new one elsewhere, which led to it being called quantum teleportation after the Star Trek devices that do the same with human beings. Quantum teleportation can’t move information faster than light (there’s a reason the inventor of Le Guin’s ansible despairs of the materialism of “Terran physics”), but it is still a crucial technology for quantum computers, one that will be more and more relevant as time goes on.

Shape the Science to the Statistics, Not the Statistics to the Science

In theatre, and more generally in writing, the advice is always to “show, don’t tell”. You could just tell your audience that Long John Silver is a ruthless pirate, but it works a lot better to show him marching a prisoner off the plank. Rather than just informing with words, you want to make things as concrete as possible, with actions.

There is a similar rule in pedagogy. Pedagogy courses teach you to be explicit about your goals, planning a course by writing down Intended Learning Outcomes. (They never seem amused when I ask about the Unintended Learning Outcomes.) At first, you’d want to write down outcomes like “students will understand calculus” or “students will know what a sine is”. These, however, are hard to judge, and thus hard to plan around. Instead, the advice is to write outcomes that correspond to actions you want the students to take, things you want them to be capable of doing: “students can perform integration by parts”, or “students can decide correctly whether to use a sine or cosine”. Again and again, the best way to get the students to know something is to get them to do something.

Jay Daigle recently finished a series of blog posts on how scientists use statistics to test hypotheses. I recommend it: it’s a great introduction to the concepts scientists use to reason about data, as well as a discussion of how they often misuse those concepts and what they can do better. I have a bit of a different perspective on one of the “takeaways” of the series, and I wanted to highlight that here.

The center of Daigle’s point is a tool, widely used in science, called Neyman-Pearson Hypothesis Testing. Neyman-Pearson is a tool for making decisions by applying a significance threshold to a number that scientists call a p-value. If you follow the procedure, only acting when you find a p-value below 0.05, then your rate of false positives, the percent of the time you conclude some action works when it really doesn’t, will be only 5%.

A core problem, from Daigle’s perspective, is that scientists use Neyman-Pearson for the wrong purpose. Neyman-Pearson is a tool for making decisions, not a test that tells you whether or not a specific claim is true. It tells you “on average, if I only approve drugs when their p-value is below 0.05, then I will approve an ineffective drug only 5% of the time”. That’s great if you can estimate how bad it is to deny a drug that should be approved, how bad it is to approve a drug that should be denied, and calculate out on average how often you can afford to be wrong. It doesn’t tell you anything about the specific drug, though. It doesn’t tell you “every drug with a p-value below 0.05 works”. It certainly doesn’t tell you “a drug with a p-value of 0.051 almost works” or “a drug with a p-value of 0.001 definitely works”. It just doesn’t give you that information.
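
To make the “on average” nature of that guarantee concrete, here is a short simulation sketch in Python, assuming numpy and scipy are available: many imaginary trials of a drug that truly does nothing, with the procedure “approving” whenever p < 0.05.

    # Neyman-Pearson in action: the 5% is a long-run false positive rate.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_trials, n_patients, false_positives = 10_000, 50, 0

    for _ in range(n_trials):
        control = rng.normal(0.0, 1.0, n_patients)   # recovery without the drug
        treated = rng.normal(0.0, 1.0, n_patients)   # the drug has no real effect
        p_value = stats.ttest_ind(treated, control).pvalue
        if p_value < 0.05:
            false_positives += 1

    print(false_positives / n_trials)   # close to 0.05, the promised error rate
    # The guarantee is about this long-run rate, not about any single trial being "true".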

In later posts, Daigle suggests better tools, which he argues map better to what scientists want to do, as well as general ways scientists can do better. Section 4 in particular focuses on the idea that one thing scientists need to do is ask better questions. He uses a specific example from cognitive psychology, a study that tests whether describing someone’s face makes you worse at recognizing it later. That’s a clear scientific question, one that can be tested statistically. That doesn’t mean it’s a good question, though. Daigle points out that questions like this have a problem: it isn’t clear what the result actually tells us.

Here’s another example of the same problem. In grad school, I knew a lot of social psychologists. One was researching a phenomenon called extended contact. Extended contact is meant to be a foil to another phenomenon called direct contact, both having to do with our views of other groups. In direct contact, making a friend from another group makes you view that whole group better. In extended contact, making a friend who has a friend from another group makes you view the other group better.

The social psychologist was looking into a concrete-sounding question: which of these phenomena, direct or extended contact, is stronger?

At first, that seems like it has the same problem as Daigle’s example. Suppose one of these effects is larger: what does that mean? Why do we care?

Well, one answer is that these aren’t just phenomena: they’re interventions. If you know one phenomenon is stronger than another, you can use that to persuade people to be more accepting of other groups. The psychologist’s advisor even had a procedure to make people feel like they made a new friend. Armed with that, it’s definitely useful to know whether extended contact or direct contact is better: whichever one is stronger is the one you want to use!

You do need some “theory” behind this, of course. You need to believe that, if a phenomenon is stronger in your psychology lab, it will be stronger wherever you try to apply it in the real world. It probably won’t be stronger every single time, so you need some notion of how much stronger it needs to be. That in turn means you need to estimate costs: what it costs if you pick the weaker one instead, how much money you’re wasting or harm you’re doing.

You’ll notice this is sounding a lot like the requirements I described earlier, for Neyman-Pearson. That’s no accident: as you try to make your science more and more clearly defined, it will get closer and closer to a procedure for making a decision, and that’s exactly what Neyman-Pearson is good for.

So in the end I’m quite a bit more supportive of Neyman-Pearson than Daigle is. That doesn’t mean it isn’t being used wrong: most scientists are using it wrong. Instead of calculating a p-value each time they make a decision, they do it at the end of a paper, misinterpreting it as evidence that one thing or another is “true”. But I think that what these scientists need to do is not change their statistics, but change their science. If they focused their science on making concrete decisions, they would actually be justified in using Neyman-Pearson…and their science would get a lot better in the process.

The Most Anthropic of All Possible Worlds

Today, we’d call Leibniz a mathematician, a physicist, and a philosopher. As a mathematician, Leibniz turned calculus into something his contemporaries could actually use. As a physicist, he championed a doomed theory of gravity. In philosophy, he seems to be most remembered for extremely cheaty arguments.

Free will and determinism? Can’t it just be a coincidence?

I don’t blame him for this. Faced with a tricky philosophical problem, it’s enormously tempting to just blaze through with an answer that makes every subtlety irrelevant. It’s a temptation I’ve succumbed to time and time again. Faced with a genie, I would always wish for more wishes. On my high school debate team, I once forced everyone at a tournament to switch sides with some sneaky definitions. It’s all good fun, but people usually end up pretty annoyed with you afterwards.

People were annoyed with Leibniz too, especially with his solution to the problem of evil. If you believe in a benevolent, all-powerful god, as Leibniz did, why is the world full of suffering and misery? Leibniz’s answer was that even an all-powerful god is constrained by logic, so if the world contains evil, it must be logically impossible to make the world any better: indeed, we live in the best of all possible worlds. Voltaire famously made fun of this argument in Candide, dragging a Leibniz-esque Professor Pangloss through some of the most creative miseries the eighteenth century had to offer. It’s possibly the most famous satire of a philosopher, easily beating out Aristophanes’ The Clouds (which is also great).

Physicists can also get accused of cheaty arguments, and probably the most mocked is the idea of a multiverse. While it hasn’t had its own Candide, the multiverse has been criticized by everyone from bloggers to Nobel prizewinners. Leibniz wanted to explain the existence of evil; physicists want to explain “unnaturalness”: the fact that the kinds of theories we use to explain the world can’t seem to explain the mass of the Higgs boson. To explain it, these physicists suggest that there are really many different universes, separated widely in space or built into the interpretation of quantum mechanics. Each universe has a different Higgs mass, and ours just happens to be the one we can live in. This kind of argument is called “anthropic” reasoning. Rather than the best of all possible worlds, it says we live in the world best-suited to life like ours.

I called Leibniz’s argument “cheaty”, and you might presume I think the same of the multiverse. But “cheaty” doesn’t mean “wrong”. It all depends what you’re trying to do.

Leibniz’s argument and the multiverse both work by dodging a problem. For Leibniz, the problem of evil becomes pointless: any evil might be necessary to secure a greater good. With a multiverse, naturalness becomes pointless: with many different laws of physics in different places, the existence of one like ours needs no explanation.

In both cases, though, the dodge isn’t perfect. To really explain any given evil, Leibniz would have to show why it is secretly necessary in the face of a greater good (and Pangloss spends Candide trying to do exactly that). To explain any given law of physics, the multiverse needs to use anthropic reasoning: it needs to show that the law has to be the way it is to support human-like life.

This sounds like a strict requirement, but in both cases it’s not actually so useful. Leibniz could (and Pangloss does) come up with an explanation for pretty much anything. The problem is that no-one actually knows which aspects of the universe are essential and which aren’t. Without a reliable way to describe the best of all possible worlds, we can’t actually test whether our world is one.

The same problem holds for anthropic reasoning. We don’t actually know what conditions are required to give rise to people like us. “People like us” is very vague, and dramatically different universes might still contain something that can perceive and observe. While it might seem that there are clear requirements, so far they haven’t been pinned down well enough for people to do very much with this type of reasoning.

However, for both Leibniz and most of the physicists who believe anthropic arguments, none of this really matters. That’s because the “best of all possible worlds” and “most anthropic of all possible worlds” aren’t really meant to be predictive theories. They’re meant to say that, once you are convinced of certain things, certain problems don’t matter anymore.

Leibniz, in particular, wasn’t trying to argue for the existence of his god. He began the argument convinced that a particular sort of god existed: one that was all-powerful and benevolent, and set in motion a deterministic universe bound by logic. His argument is meant to show that, if you believe in such a god, then the problem of evil can be ignored: no matter how bad the universe seems, it may still be the best possible world.

Similarly, the physicists convinced of the multiverse aren’t really getting there through naturalness. Rather, they’ve become convinced of a few key claims: that the universe is rapidly expanding, leading to a proliferating multiverse, and that the laws of physics in such a multiverse can vary from place to place, due to the huge landscape of possible laws of physics in string theory. If you already believe those things, then the naturalness problem can be ignored: we live in some randomly chosen part of the landscape hospitable to life, which can be anywhere it needs to be.

So despite their cheaty feel, both arguments are fine…provided you agree with their assumptions. Personally, I don’t agree with Leibniz. For the multiverse, I’m less sure. I’m not confident the universe expands fast enough to create a multiverse; I’m not even confident it’s speeding up its expansion now. I know there’s a lot of controversy about the math behind the string theory landscape, about whether the vast set of possible laws of physics are as consistent as they’re supposed to be…and of course, as anyone must admit, we don’t know whether string theory itself is true! I don’t think it’s impossible that the right argument comes around and convinces me of one or both claims, though. These kinds of arguments, “if assumptions, then conclusion”, are the kind of thing that seems useless for a while…until someone convinces you of the conclusion, and they matter once again.

So in the end, despite the similarity, I’m not sure the multiverse deserves its own Candide. I’m not even sure Leibniz deserved Candide. But hopefully by understanding one, you can understand the other just a bit better.

The Undefinable

If I can teach one lesson to all of you, it’s this: be precise. In physics, we try to state what we mean as precisely as we can. If we can’t state something precisely, that’s a clue: maybe what we’re trying to state doesn’t actually make sense.

Someone recently reached out to me with a question about black holes. He was confused about how they were described, about what would happen when you fall into one versus what we could see from outside. Part of his confusion boiled down to a question: “is the center really an infinitely small point?”

I remembered a commenter a while back who had something interesting to say about this. Trying to remind myself of the details, I dug up this question on Physics Stack Exchange. user4552 has a detailed, well-referenced answer, with subtleties of General Relativity that go significantly beyond what I learned in grad school.

According to user4552, the reason this question is confusing is that the usual setup of general relativity cannot answer it. In general relativity, singularities like the singularity in the middle of a black hole aren’t treated as points, or collections of points: they’re not part of space-time at all. So you can’t count their dimensions, you can’t see whether they’re “really” infinitely small points, or surfaces, or lines…

This might surprise people (like me) who have experience with simpler equations for these things, like the Schwarzschild metric. The Schwarzschild metric describes space-time around a black hole, and in the usual coordinates it sure looks like the singularity is at a single point where r=0, just like the point where r=0 is a single point in polar coordinates in flat space. The thing is, though, that’s just one choice of coordinates. You can re-write a metric in many different coordinate systems, and the singularity in the center of a black hole might look very different in those coordinates. In general relativity, you need to stick to things you can say independent of coordinates.
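
For reference, this is the Schwarzschild metric in those usual coordinates, with G = c = 1: the curvature singularity sits at r = 0, while the apparent trouble at r = 2M turns out to be just a coordinate artifact.

    ds^2 \;=\; -\left(1 - \frac{2M}{r}\right) dt^2
      \;+\; \left(1 - \frac{2M}{r}\right)^{-1} dr^2
      \;+\; r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)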

Ok, you might say, so the usual mathematics can’t answer the question. Can we use more unusual mathematics? If our definition of dimensions doesn’t tell us whether the singularity is a point, maybe we just need a new definition!

According to user4552, people have tried this…and it only sort of works. There are several different ways you could define the dimension of a singularity. They all seem reasonable in one way or another. But they give different answers! Some say they’re points, some say they’re three-dimensional. And crucially, there’s no obvious reason why one definition is “right”. The question we started with, “is the center really an infinitely small point?”, looked like a perfectly reasonable question, but it actually wasn’t: the question wasn’t precise enough.

This is the real problem. The problem isn’t that our question was undefined; after all, we can always add new definitions. The problem was that our question didn’t specify clearly enough which definitions we needed. That is why the question doesn’t have an answer.

Once you understand the difference, you see these kinds of questions everywhere. If you’re baffled by how mass could have come out of the Big Bang, or how black holes could radiate particles in Hawking radiation, maybe you’ve heard a physicist say that energy isn’t always conserved. Energy conservation is a consequence of symmetry, specifically, symmetry in time. If your space-time itself isn’t symmetric (the expanding universe making the past different from the future, a collapsing star making a black hole), then you shouldn’t expect energy to be conserved.

I sometimes hear people object to this. They ask, is it really true that energy isn’t conserved when space-time isn’t symmetric? Shouldn’t we just say that space-time itself contains energy?

And well yes, you can say that, if you want. It isn’t part of the usual definition, but you can make a new definition, one that gives energy to space-time. In fact, you can make more than one new definition…and like the situation with the singularity, these definitions don’t always agree! Once again, you asked a question you thought was sensible, but it wasn’t precise enough to have a definite answer.

Keep your eye out for these kinds of questions. If scientists seem to avoid answering the question you want, and keep answering a different question instead…it might be their question is the only one with a precise answer. You can define a method to answer your question, sure…but it won’t be the only way. You need to ask precise enough questions to get good answers.

The Only Speed of Light That Matters

A couple weeks back, someone asked me about a Veritasium video with the provocative title “Why No One Has Measured The Speed Of Light”. Veritasium is a science popularization youtube channel, and usually a fairly good one…so it was a bit surprising to see it make a claim usually reserved for crackpots. Many, many people have measured the speed of light, including Ole Rømer all the way back in 1676. To argue otherwise seems like it demands a massive conspiracy.

Veritasium wasn’t proposing a conspiracy, though, just a technical point. Yes, many experiments have measured the speed of light. However, the speed they measure is in fact a “two-way speed”, the speed that light takes to go somewhere and then come back. They leave open the possibility that light travels differently in different directions, and only has the measured speed on average: that there are different “one-way speeds” of light.

The loophole is clearest using some of the more vivid measurements of the speed of light, timing how long it takes to bounce off a mirror and return. It’s less clear using other measurements of the speed of light, like Rømer’s. Rømer measured the speed of light using the moons of Jupiter, noticing that the time they took to orbit appeared to change based on whether Jupiter was moving towards or away from the Earth. For this measurement Rømer didn’t send any light to Jupiter…but he did have to make assumptions about the moons’ orbits, using them like a distant clock. Those assumptions also leave the door open to a loophole, one where the different one-way speeds of light are compensated by different speeds for distant clocks. You can watch the Veritasium video for more details about how this works, or see the Wikipedia page for the mathematical details.

When we think of the speed of light as the same in all directions, in some sense we’re making a choice. We’ve chosen a convention, called the Einstein synchronization convention, that lines up distant clocks in a particular way. We didn’t have to choose that convention, though we prefer to (the math gets quite a bit more complicated if we don’t). And crucially for any such choice, it is impossible for any experiment to tell the difference.
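
One standard way to parametrize this freedom is Reichenbach’s ε convention, sketched below: the outbound and return speeds of light are allowed to differ, yet any round trip comes out the same, so no two-way measurement can pin ε down. Einstein’s convention is simply the symmetric choice ε = 1/2.

    c_{+} = \frac{c}{2\epsilon}, \qquad c_{-} = \frac{c}{2(1-\epsilon)}, \qquad 0 < \epsilon < 1
    % and yet, for any choice of \epsilon, a round trip over a distance L takes the same time:
    \frac{L}{c_{+}} + \frac{L}{c_{-}} \;=\; \frac{2\epsilon L}{c} + \frac{2(1-\epsilon)L}{c} \;=\; \frac{2L}{c}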

So far, Veritasium is doing fine here. But if the video was totally fine, I wouldn’t have written this post. The technical argument is fine, but the video screws up its implications.

Near the end of the video, the host speculates whether this ambiguity is a clue. What if a deeper theory of physics could explain why we can’t tell the difference between different synchronizations? Maybe that would hint at something important.

Well, it does hint at something important, but not something new. What it hints at is that “one-way speeds” don’t matter. Not for light, or really for anything else.

Think about measuring the speed of something, anything. There are two ways to do it. One is to time it against something else, like the signal in a wire, and assume we know that speed. Veritasium shows an example of this, measuring the speed of a baseball that hits a target and sends a signal back. The other way is to send it somewhere with a clock we trust, and compare it to our clock. Each of these requires that something goes back and forth, even if it’s not the same thing each time. We can’t measure the one-way speed of anything because we’re never in two places at once. Everything we measure, every conclusion we come to about the world, rests on something “two-way”: our actions go out, our perceptions go in. Even our depth perception is an inference inherited from our ancestors, whose experience of seeing food and traveling to it calibrated our notion of distance.

Synchronization of clocks is a convention because the external world is a convention. What we have really, objectively, truly, are our perceptions and our memories. Everything else is a model we build to fill the gaps in between. Some features of that model are essential: if you change them, you no longer match our perceptions. Other features, though, are just convenience: ways we arrange the model to make it easier to use, to make it not “sound dumb”, to tell a coherent story. Synchronization is one of those things: the notion that you can compare times in distant places is convenient, but as relativity already tells us in other contexts, not necessary. It’s part of our storytelling, not an essential part of our model.

Duality and Emergence: When Is Spacetime Not Spacetime?

Spacetime is doomed! At least, so say some physicists. They don’t mean this as a warning, like some comic-book universe-destroying disaster, but rather as a research plan. These physicists believe that what we think of as space and time aren’t the full story, but that they emerge from something more fundamental, so that an ultimate theory of nature might not use space or time at all. Other, grumpier physicists are skeptical. Joined by a few philosophers, they think the “spacetime is doomed” crowd are over-excited and exaggerating the implications of their discoveries. At the heart of the argument is the distinction between two related concepts: duality and emergence.

In physics, sometimes we find that two theories are actually dual: despite seeming different, the patterns of observations they predict are the same. Some of the more popular examples are what we call holographic theories. In these situations, a theory of quantum gravity in some space-time is dual to a theory without gravity describing the edges of that space-time, sort of like how a hologram is a 2D image that looks 3D when you move it. For any question you can ask about the gravitational “bulk” space, there is a matching question on the “boundary”. No matter what you observe, neither description will fail.

If theories with gravity can be described by theories without gravity, does that mean gravity doesn’t really exist? If you’re asking that question, you’re asking whether gravity is emergent. An emergent theory is one that isn’t really fundamental, but instead a result of the interaction of more fundamental parts. For example, hydrodynamics, the theory of fluids like water, emerges from more fundamental theories that describe the motion of atoms and molecules.

(For the experts: I, like most physicists, am talking about “weak emergence” here, not “strong emergence”.)

The “spacetime is doomed” crowd think that not just gravity, but space-time itself is emergent. They expect that distances and times aren’t really fundamental, but a result of relationships that will turn out to be more fundamental, like entanglement between different parts of quantum fields. As evidence, they like to bring up dualities where the dual theories have different concepts of gravity, number of dimensions, or space-time. Using those theories, they argue that space and time might “break down”, and not be really fundamental.

(I’ve made arguments like that in the past too.)

The skeptics, though, bring up an important point. If two theories are really dual, then no observation can distinguish them: they make exactly the same predictions. In that case, say the skeptics, what right do you have to call one theory more fundamental than the other? You can say that gravity emerges from a boundary theory without gravity, but you could just as easily say that the boundary theory emerges from the gravity theory. The whole point of duality is that no theory is “more true” than the other: one might be more or less convenient, but both describe the same world. If you want to really argue for emergence, then your “more fundamental” theory needs to do something extra: to predict something that your emergent theory doesn’t predict.

Sometimes this is a fair objection. There are members of the “spacetime is doomed” crowd who are genuinely reckless about this, who’ll tell a journalist about emergence when they really mean duality. But many of these people are more careful, and have thought more deeply about the question. They tend to have some mix of these two perspectives:

First, if two descriptions give the same results, then do the descriptions matter? As physicists, we have a history of treating theories as the same if they make the same predictions. Space-time itself is a result of this policy: in the theory of relativity, two people might disagree on which one of two events happened first or second, but they will agree on the overall distance in space-time between the two. From this perspective, a duality between a bulk theory and a boundary theory isn’t evidence that the bulk theory emerges from the boundary, but it is evidence that both the bulk and boundary theories should be replaced by an “overall theory”, one that treats bulk and boundary as irrelevant descriptions of the same physical reality. This perspective is similar to an old philosophical theory called positivism: that statements are meaningless if they cannot be derived from something measurable. That theory wasn’t very useful for philosophers, which is probably part of why some philosophers are skeptics of “space-time is doomed”. The perspective has been quite useful to physicists, though, so we’re likely to stick with it.

Second, some will say that it’s true that a dual theory is not an emergent theory…but it can be the first step to discover one. In this perspective, dualities are suggestive evidence that a deeper theory is waiting in the wings. The idea would be that one would first discover a duality, then discover situations that break that duality: examples on one side that don’t correspond to anything sensible on the other. Maybe some patterns of quantum entanglement are dual to a picture of space-time, but some are not. (Closer to my sub-field, maybe there’s an object like the amplituhedron that doesn’t respect locality or unitarity.) If you’re lucky, maybe there are situations, or even experiments, that go from one to the other: where the space-time description works until a certain point, then stops working, and only the dual description survives. Some of the models of emergent space-time people study are genuinely of this type, where a dimension emerges in a theory that previously didn’t have one. (For those of you having a hard time imagining this, read my old post about “bubbles of nothing”, then think of one happening in reverse.)

It’s premature to say space-time is doomed, at least as a definite statement. But it is looking like, one way or another, space-time won’t be the right picture for fundamental physics. Maybe that’s because it’s equivalent to another description, redundant embellishment on an essential theoretical core. Maybe instead it breaks down, and a more fundamental theory could describe more situations. We don’t know yet. But physicists are trying to figure it out.

Of p and sigma

Ask a doctor or a psychologist if they’re sure about something, and they might say “it has p<0.05”. Ask a physicist, and they’ll say it’s a “5 sigma result”. On the surface, they sound like they’re talking about completely different things. As it turns out, they’re not quite that different.

Whether it’s a p-value or a sigma, what scientists are giving you is shorthand for a probability. The p-value is the probability itself, while sigma tells you how many standard deviations something is away from the mean on a normal distribution. For people not used to statistics this might sound very complicated, but it’s not so tricky in the end. There’s a graph, called a normal distribution, and you can look at how much of it is above a certain point, measured in units called standard deviations, or “sigmas”. That gives you your probability.

Give it a try: how much of a normal distribution lies past the 1σ line? How about 2σ?

What are these numbers a probability of? At first, you might think they’re a probability of the scientist being right: of the medicine working, or the Higgs boson being there.

That would be reasonable, but it’s not how it works. Scientists can’t measure the chance they’re right. All they can do is compare models. When a scientist reports a p-value, what they’re doing is comparing to a kind of default model, called a “null hypothesis”. There are different null hypotheses for different experiments, depending on what the scientists want to test. For the Higgs, scientists looked at pairs of photons detected by the LHC. The null hypothesis was that these photons were created by other parts of the Standard Model, like the strong force, and not by a Higgs boson. For medicine, the null hypothesis might be that people get better on their own after a certain amount of time. That’s hard to estimate, which is why medical experiments use a control group: a similar group without the medicine, to see how much they get better on their own.

Once we have a null hypothesis, we can use it to estimate how likely it is that it produced the result of the experiment. If there was no Higgs, and all those photons just came from other particles, what’s the chance there would still be a giant pile of them at one specific energy? If the medicine didn’t do anything, what’s the chance the control group did that much worse than the treatment group?

Ideally, you want a small probability here. In medicine and psychology, you’re looking for a probability below 5%, for p<0.05. In physics, you need 5 sigma to make a discovery, which corresponds to about a one in 3.5 million probability. If the probability is low, then you can say that it would be quite unlikely for your result to happen if the null hypothesis was true. If you’ve got a better hypothesis (the Higgs exists, the medicine works), then you should pick that instead.
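
If you want to convert between the two yourself, here is a short sketch in Python, assuming scipy is available. It also answers the 1σ and 2σ question from earlier.

    # Converting sigmas to probabilities and back: norm.sf gives the area beyond a point.
    from scipy.stats import norm

    print(norm.sf(1))       # ~0.16: about 16% of the curve lies past 1 sigma
    print(norm.sf(2))       # ~0.023: a bit over 2% lies past 2 sigma
    print(norm.sf(5))       # ~2.9e-7, roughly one in 3.5 million: the "5 sigma" standard
    print(norm.isf(0.05))   # ~1.64: the one-sided threshold matching p = 0.05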

Note that this probability still uses a model: it’s the probability of the result given that the model is true. It isn’t the probability that the model is true, given the result. That probability is more important to know, but trickier to calculate. To get from one to the other, you need to include more assumptions: about how likely your model was to begin with, given everything else you know about the world. Depending on those assumptions, even the tiniest p-value might not show that your null hypothesis is wrong.

In practice, unfortunately, we usually can’t estimate all of those assumptions in detail. The best we can do is guess their effect, in a very broad way. That usually just means accepting a threshold for p-values, declaring some a discovery and others not. That limitation is part of why medicine and psychology demand p-values below 0.05, while physicists demand 5 sigma results. Medicine and psychology have some assumptions they can rely on: that people function like people, that biology and physics keep working. Physicists don’t have those assumptions, so we have to be extra-strict.

Ultimately, though, we’re all asking the same kind of question. And now you know how to understand it when we do.