So, would you believe I’ve never visited CERN before?
I was at CERN for a few days this week, visiting friends and collaborators and giving an impromptu talk. Surprisingly, this is the first time I’ve been, a bit of an embarrassing admission for someone who’s ostensibly a particle physicist.
Despite that, CERN felt oddly familiar. The maze of industrial buildings and winding roads, the security gates and cards (and work-arounds for when you arrive outside of card-issuing hours, assisted by friendly security guards), the constant construction and remodeling, all of it reminded me of the times I visited SLAC during my PhD. This makes a lot of sense, of course: one accelerator is at least somewhat like another. But besides a visit to Fermilab for a conference several years ago, I haven’t been in many other places like that since then.
(One thing that might have also been true of SLAC and Fermilab but I never noticed: CERN buildings not only have evacuation instructions for the building in case of a fire, but also evacuation instructions for the whole site.)
CERN is a bit less “pretty” than SLAC on average, without the nice grassy area in the middle or the California sun that goes with it. It makes up for it with what seems like more in terms of outreach resources, including a big wooden dome of a mini-museum sponsored by Rolex, and a larger visitor center still under construction.
CERN hosts a variety of theoretical physicists doing various different types of work. I was hosted by the “QCD group”, but the string theorists just down the hall include a few people I know as well. The lounge had a few cardboard signs hidden under the table, leftovers of CERN’s famous yearly Christmas play directed by John Ellis.
It’s been a fun, if brief, visit. I’ll likely get to see a bit more this summer, when they host Amplitudes 2023. Until then, it was fun reconnecting with that “accelerator feel”.
Since Valentine’s Day was this week, it’s time for the next installment of my traditional Valentine’s Day Physics Poems. New readers, don’t let this drive you off, I only do it once a year! And if you actually like it, you can take a look at poems from previous years here.
Married to a Model
If you ever face a physics class distracted,
Rappers and footballers twinkling on their phones,
Then like an awkward youth pastor, interject,
“You know who else is married to a Model?”
Her name is Standard, you see,
Wife of fifty years to Old Man Physics,
Known for her beauty, charm, and strangeness too.
But Old Man Physics has a wandering eye,
and dreams of Models Beyond.
Let the old man bend your ear,
a litany of Problems.
He’ll never understand her, so he starts.
Some matters she holds weighty, some feather-light
with nary rhyme or reason
(which he is owed, he’s sure).
She’s unnatural, he says,
(echoing Higgins et al.),
a set of rules he can’t predict.
(But with those rules, all else is possible.)
Some regularities she holds to fast, despite room for exception,
others breaks, like an ill-lucked bathroom mirror.
And then, he says, she’ll just blow up
(when taken to extremes),
while singing nonsense in the face of Gravity.
He’s been keeping a careful eye
and noticing anomalies
(and each time, confronting them,
finds an innocent explanation,
but no matter).
And he imagines others
with yet wilder curves
and more sensitive reactions
(and nonsense, of course,
that he’s lived fifty years without).
Old man physics talks,
But beyond the talk,
beyond the phases and phrases,
(conscious uncoupling, non-empirical science),
he stays by her side.
He knows Truth,
in this world,
is worth fighting for.
I’ve never met someone who believed the Earth was flat. I’ve met a few who believed it was six thousand years old, but not many. Occasionally, I run into crackpots who rail against relativity or quantum mechanics, or more recent discoveries like quarks or the Higgs. But for one conclusion of modern physics, the doubters are common. For this one idea, the average person may not insist that the physicists are wrong, but they’ll usually roll their eyes a little bit, ask the occasional “really?”
That idea is dark matter.
For the average person, dark matter doesn’t sound like normal, responsible science. It sounds like cheating. Scientists try to explain the universe, using stars and planets and gravity, and eventually they notice the equations don’t work, so they just introduce some new matter nobody can detect. It’s as if a budget didn’t add up, so the accountant just introduced some “dark expenses” to hide the problem.
But that doesn’t explain everything. Physics may have to explain things in terms of math, but we shouldn’t just invent new math whenever we feel like it. Surely, we should prefer explanations in terms of things we know to explanations in terms of things we don’t know. The question then becomes, what justifies the preference? And when do we get to break it?
Imagine you’re camping in your backyard. You’ve brought a pack of jumbo marshmallows. You wake up to find a hole torn in the bag, a few marshmallows strewn on a trail into the bushes, the rest gone. You’re tempted to imagine a new species of ant, with enormous jaws capable of ripping open plastic and hauling the marshmallows away. Then you remember your brother likes marshmallows, and it’s probably his fault.
Now imagine instead you’re camping in the Amazon rainforest. Suddenly, the ant explanation makes sense. You may not have a particular species of ants in mind, but you know the rainforest is full of new species no-one has yet discovered. And you’re pretty sure your brother couldn’t have flown to your campsite in the middle of the night and stolen your marshmallows.
We do have a preference against introducing new types of “stuff”, like new species of ants or new particles. We have that preference because these new types of stuff are unlikely, based on our current knowledge. We don’t expect new species of ants in our backyards, because we think we have a pretty good idea of what kinds of ants exist, and we think a marshmallow-stealing brother is more likely. That preference gets dropped, however, based on the strength of the evidence. If it’s very unlikely our brother stole the marshmallows, and if we’re somewhere our knowledge of ants is weak, then the marshmallow-stealing ants are more likely.
Dark matter is a massive leap. It’s not a massive leap because we can’t see it, but simply because it involves new particles, particles not in our Standard Model of particle physics. (Or, for the MOND-ish fans, new fields not present in Einstein’s theory of general relativity.) It’s hard to justify physics beyond the Standard Model, and our standards for justifying it are in general very high: we need very precise experiments to conclude that the Standard Model is well and truly broken.
For dark matter, we keep those standards. The evidence for some kind of dark matter, that there is something that can’t be explained by just the Standard Model and Einstein’s gravity, is at this point very strong. Far from a vague force that appears everywhere, we can map dark matter’s location, systematically describe its effect on the motion of galaxies to clusters of galaxies to the early history of the universe. We’ve checked if there’s something we’ve left out, if black holes or unseen planets might cover it, and they can’t. It’s still possible we’ve missed something, just like it’s possible your brother flew to the Amazon to steal your marshmallows, but it’s less likely than the alternatives.
Also, much like ants in the rainforest, we don’t know every type of particle. We know there are things we’re missing: new types of neutrinos, or new particles to explain quantum gravity. These don’t have to have anything to do with dark matter, they might be totally unrelated. But they do show that we should expect, sometimes, to run into particles we don’t already know about. We shouldn’t expect that we already know all the particles.
If physicists did what the cartoons suggest, it really would be cheating. If we proposed dark matter because our equations didn’t match up, and stopped checking, we’d be no better than an accountant adding “dark money” to a budget. But we didn’t do that. When we argue that dark matter exists, it’s because we’ve actually tried to put together the evidence, because we’ve weighed it against the preference to stick with the Standard Model and found the evidence tips the scales. The instinct to call it cheating is a good instinct, one you should cultivate. But here, it’s an instinct physicists have already taken into account.
I’ve had this conversation a few times over the years. Usually, the people I’m talking to are worried about black holes. They’ve heard that the Large Hadron Collider speeds up particles to amazingly high energies before colliding them together. They worry that these colliding particles could form a black hole, which would fall into the center of the Earth and busily gobble up the whole planet.
This pretty clearly hasn’t happened. But also, physicists were pretty confident that it couldn’t happen. That isn’t to say they thought it was impossible to make a black hole with the LHC. Some physicists actually hoped to make a black hole: it would have been evidence for extra dimensions, curled-up dimensions much larger than the tiny ones required by string theory. They figured out the kind of evidence they’d see if the LHC did indeed create a black hole, and we haven’t seen that evidence. But even before running the machine, they were confident that such a black hole wouldn’t gobble up the planet. Why?
The best argument is also the most unsatisfying. The LHC speeds up particles to high energies, but not unprecedentedly high energies. High-energy particles called cosmic rays enter the atmosphere every day, some of which are at energies comparable to the LHC. The LHC just puts the high-energy particles in front of a bunch of sophisticated equipment so we can measure everything about them. If the LHC could destroy the world, cosmic rays would have already done so.
That’s a very solid argument, but it doesn’t really explain why. Also, it may not be true for future colliders: we could build a collider with enough energy that cosmic rays don’t commonly meet it. So I should give another argument.
The next argument is Hawking radiation. In Stephen Hawking’s most famous accomplishment, he argued that because of quantum mechanics black holes are not truly black. Instead, they give off a constant radiation of every type of particle mixed together, shrinking as it does so. The radiation is faintest for large black holes, but gets more and more intense the smaller the black hole is, until the smallest black holes explode into a shower of particles and disappear. This argument means that a black hole small enough that the LHC could produce it would radiate away to nothing in almost an instant: not long enough to leave the machine, let alone fall to the center of the Earth.
This is a good argument, but maybe you aren’t as sure as I am about Hawking radiation. As it turns out, we’ve never measured Hawking radiation, it’s just a theoretical expectation. Remember that the radiation gets fainter the larger the black hole is: for a black hole in space with the mass of a star, the radiation is so tiny it would be almost impossible to detect even right next to the black hole. From here, in our telescopes, we have no chance of seeing it.
So suppose tiny black holes didn’t radiate, and suppose the LHC could indeed produce them. Wouldn’t that have been dangerous?
Here, we can do a calculation. I want you to appreciate how tiny these black holes would be.
From science fiction and cartoons, you might think of a black hole as a kind of vacuum cleaner, sucking up everything nearby. That’s not how black holes work, though. The “sucking” black holes do is due to gravity, no stronger than the gravity of any other object with the same mass at the same distance. The only difference comes when you get close to the event horizon, an invisible sphere close-in around the black hole. Pass that line, and the gravity is strong enough that you will never escape.
We know how to calculate the position of the event horizon of a black hole. It’s the Schwarzchild radius, and we can write it in terms of Newton’s constant G, the mass of the black hole M, and the speed of light c, as follows:
The Large Hadron Collider’s two beams each have an energy around seven tera-electron-volts, or TeV, so there are 14 TeV of energy in total in each collision. Imagine all of that energy being converted into mass, and that mass forming a black hole. That isn’t how it would actually happen: some of the energy would create other particles, and some would give the black hole a “kick”, some momentum in one direction or another. But we’re going to imagine a “worst-case” scenario, so let’s assume all the energy goes to form the black hole. Electron-volts are a weird physicist unit, but if we divide them by the speed of light squared (as we should if we’re using to create a mass), then Wikipedia tells us that each electron-volt will give us kilograms. “Tera” is the SI prefix for . Thus our tiny black hole starts with a mass of
Plugging in Newton’s constant ( meters cubed per kilogram per second squared), and the speed of light ( meters per second), and we get a radius of,
That, by the way, is amazingly tiny. The size of an atom is about meters. If every atom was a tiny person, and each of that person’s atoms was itself a person, and so on for five levels down, then the atoms of the smallest person would be the same size as this event horizon.
Now, we let this little tiny black hole fall. Let’s imagine it falls directly towards the center of the Earth. The only force affecting it would be gravity (if it had an electrical charge, it would quickly attract a few electrons and become neutral). That means you can think of it as if it were falling through a tiny hole, with no friction, gobbling up anything unfortunate enough to fall within its event horizon.
For our first estimate, we’ll treat the black hole as if it stays the same size through its journey. Imagine the black hole travels through the entire earth, absorbing a cylinder of matter. Using the Earth’s average density of 5515 kilograms per cubic meter, and the Earth’s maximum radius of 6378 kilometers, our cylinder adds a mass of,
That’s absurdly tiny. That’s much, much, much tinier than the mass we started out with. Absorbing an entire cylinder through the Earth makes barely any difference.
You might object, though, that the black hole is gaining mass as it goes. So really we ought to use a differential equation. If the black hole travels a distance r, absorbing mass as it goes at average Earth density , then we find,
Solving this, we get
Where is the mass we start out with.
Plug in the distance through the Earth for r, and we find…still about ! It didn’t change very much, which makes sense, it’s a very very small difference!
So let’s ask a question: how long would it take for a black hole, oscillating like this, to double its mass?
We want to solve,
so we need the black hole to travel a total distance of
That’s a huge distance! The Earth’s radius, remember, is 6378 kilometers. So traveling that far would take
Ten to the sixty-seven years. Our universe is only about ten to the ten years old. In another five times ten to the nine years, the Sun will enter its red giant phase, and swallow the Earth. There simply isn’t enough time for this tiny tiny black hole to gobble up the world, before everything is already gobbled up by something else. Even in the most pessimistic way to walk through the calculation, it’s just not dangerous.
I hope that, if you were worried about black holes at the LHC, you’re not worried any more. But more than that, I hope you’ve learned three lessons. First, that even the highest-energy particle physics involves tiny energies compared to day-to-day experience. Second, that gravitational effects are tiny in the context of particle physics. And third, that with Wikipedia access, you too can answer questions like this. If you’re worried, you can make an estimate, and check!
As the new year approaches, people think about the future. Me, I’m thinking about the future of fundamental physics, about what might lie beyond the Standard Model. Physicists search for many different things, with many different motivations. Some are clear missing pieces, places where the Standard Model fails and we know we’ll need to modify it. Others are based on experience, with no guarantees but an expectation that, whatever we find, it will be surprising. Finally, some are cool possibilities, ideas that would explain something or fill in a missing piece but aren’t strictly necessary.
The Almost-Sure Things
Science isn’t math, so nothing here is really a sure thing. We might yet discover a flaw in important principles like quantum mechanics and special relativity, and it might be that an experimental result we trust turns out to be flawed. But if we chose to trust those principles, and our best experiments, then these are places we know the Standard Model is incomplete:
Neutrino Masses: The original Standard Model’s neutrinos were massless. Eventually, physicists discovered this was wrong: neutrinos oscillate, switching between different types in a way they only could if they had different masses. This result is familiar enough that some think of it as already part of the Standard Model, not really beyond. But the masses of neutrinos involve unsolved mysteries: we don’t know what those masses are, but more, there are different ways neutrinos could have mass, and we don’t yet know which is present in nature. Neutrino masses also imply the existence of an undiscovered “sterile” neutrino, a particle that doesn’t interact with the strong, weak, or electromagnetic forces.
Dark Matter Phenomena (and possibly Dark Energy Phenomena): Astronomers first suggested dark matter when they observed galaxies moving at speeds inconsistent with the mass of their stars. Now, they have observed evidence for it in a wide variety of situations, evidence which seems decisively incompatible with ordinary gravity and ordinary matter. Some solve this by introducing dark matter, others by modifying gravity, but this is more of a technical difference than it sounds: in order to modify gravity, one must introduce new quantum fields, much the same way one does when introducing dark matter. The only debate is how “matter-like” those fields need to be, but either approach goes beyond the Standard Model.
Quantum Gravity:It isn’t as hard to unite quantum mechanics and gravity as you might think. Physicists have known for decades how to write down a naive theory of quantum gravity, one that follows the same steps one might use to derive the quantum theory of electricity and magnetism. The problem is, this theory is incomplete. It works at low energies, but as the energy increases it loses the ability to make predictions, eventually giving nonsensical answers like probabilities greater than one. We have candidate solutions to this problem, like string theory, but we might not know for a long time which solution is right.
Landau Poles: Here’s a more obscure one. In particle physics we can zoom in and out in our theories, using similar theories at different scales. What changes are the coupling constants, numbers that determine the strength of the different forces. You can think of this in a loosely reductionist way, with the theories at smaller scales determining the constants for theories at larger scales. This gives workable theories most of the time, but it fails for at least one part of the Standard Model. In electricity and magnetism, the coupling constant increases as you zoom in. Eventually, it becomes infinite, and what’s more, does so at a finite energy scale. It’s still not clear how we should think about this, but luckily we won’t have to very soon: this energy scale is vastly vastly higher than even the scale of quantum gravity.
Some Surprises Guarantee Others: The Standard Model is special in a way that gravity isn’t. Even if you dial up the energy, a Standard Model calculation will always “make sense”: you never get probabilities greater than one. This isn’t true for potential deviations from the Standard Model. If the Higgs boson turns out to interact differently than we expect, it wouldn’t just be a violation of the Standard Model on its own: it would guarantee mathematically that, at some higher energy, we’d have to find something new. That was precisely the kind of argument the LHC used to find the Higgs boson: without the Higgs, something new was guaranteed to happen within the energy range of the LHC to prevent impossible probability numbers.
The Argument from (Theoretical) Experience
Everything in this middle category rests on a particular sort of argument. It’s short of a guarantee, but stronger than a dream or a hunch. While the previous category was based on calculations in theories we already know how to write down, this category relies on our guesses about theories we don’t yet know how to write.
Suppose we had a deeper theory, one that could use fewer parameters to explain the many parameters of the Standard Model. For example, it might explain the Higgs mass, letting us predict it rather than just measuring it like we do now. We don’t have a theory like that yet, but what we do have are many toy model theories, theories that don’t describe the real world but do, in this case, have fewer parameters. We can observe how these theories work, and what kinds of discoveries scientists living in worlds described by them would make. By looking at this process, we can get a rough idea of what to expect, which things in our own world would be “explained” in other ways in these theories.
The Hierarchy Problem: This is also called the naturalness problem. Suppose we had a theory that explained the mass of the Higgs, one where it wasn’t just a free parameter. We don’t have such a theory for the real Higgs, but we do have many toy models with similar behavior, ones with a boson with its mass determined by something else. In these models, though, the mass of the boson is always close to the energy scale of other new particles, particles which have a role in determining its mass, or at least in postponing that determination. This was the core reason why people expected the LHC to find something besides the Higgs. Without such new particles, the large hierarchy between the mass of the Higgs and the mass of new particles becomes a mystery, one where it gets harder and harder to find a toy model with similar behavior that still predicts something like the Higgs mass.
The Strong CP Problem: The weak nuclear force does what must seem like a very weird thing, by violating parity symmetry: the laws that govern it are not the same when you flip the world in a mirror. This is also true when you flip all the charges as well, a combination called CP (charge plus parity). But while it may seem strange that the weak force violates this symmetry, physicists find it stranger that the strong force seems to obey it. Much like in the hierarchy problem, it is very hard to construct a toy model that both predicts a strong force that maintains CP (or almost maintains it) and doesn’t have new particles. The new particle in question, called the axion, is something some people also think may explain dark matter.
Matter-Antimatter Asymmetry: We don’t know the theory of quantum gravity. Even if we did, the candidate theories we have struggle to describe conditions close to the Big Bang. But while we can’t prove it, many physicists expect the quantum gravity conditions near the Big Bang to produce roughly equal amounts of matter and antimatter. Instead, matter dominates: we live in a world made almost entirely of matter, with no evidence of large antimatter areas even far out in space. This lingering mystery could be explained if some new physics was biased towards matter instead of antimatter.
Various Problems in Cosmology: Many open questions in cosmology fall in this category. The small value of the cosmological constant is mysterious for the same reasons the small value of the Higgs mass is, but at a much larger and harder to fix scale. The early universe surprises many cosmologists by its flatness and uniformity, which has led them to propose new physics. This surprise is not because such flatness and uniformity is mathematically impossible, but because it is not the behavior they would expect out of a theory of quantum gravity.
The Cool Possibilities
Some ideas for physics beyond the standard model aren’t required, either from experience or cold hard mathematics. Instead, they’re cool, and would be convenient. These ideas would explain things that look strange, or make for a simpler deeper theory, but they aren’t the only way to do so.
Grand Unified Theories: Not the same as a “theory of everything”, Grand Unified Theories unite the three “particle physics forces”: the strong nuclear force, the weak nuclear force, and electromagnetism. Under such a theory, the different parameters that determine the strengths of those forces could be predicted from one shared parameter, with the forces only seeming different at low energies. These theories often unite the different matter particles too, but they also introduce new particles and new forces. These forces would, among other things, make protons unstable, and so giant experiments have been constructed to try to detect a proton decaying into other particles. So far none has been seen.
Low-Energy Supersymmetry: String theory requires supersymmetry, a relationship where matter and force particles share many properties. That supersymmetry has to be “broken”, which means that while the matter and force particles have the same charges, they can have wildly different masses, so that the partner particles are all still undiscovered. Those masses may be extremely high, all the way up at the scale of quantum gravity, but they could also be low enough to test at the LHC. Physicists hoped to detect such particles, as they could have been a good solution to the hierarchy problem. Now that the LHC hasn’t found these supersymmetric particles, it is much harder to solve the problem this way, though some people are still working on it.
Large Extra Dimensions: String theory also involves extra dimensions, beyond our usual three space and one time. Those dimensions are by default very small, but some proposals have them substantially bigger, big enough that we could have seen evidence for them at the LHC. These proposals could explain why gravity is so much weaker than the other forces. Much like the previous members of this category though, no evidence for this has yet been found.
I think these categories are helpful, but experts may quibble about some of my choices. I also haven’t mentioned every possible thing that could be found beyond the Standard Model. If you’ve heard of something and want to know which category I’d put it in, let me know in the comments!
Quanta has come up a number of times on this blog, they’re a science news outlet set up by the Simons Foundation. Their goal is to enhance the public understanding of science and mathematics. They cover topics other outlets might find too challenging, and they cover the topics others cover with more depth. Most people I know who’ve worked with them have been impressed by their thoroughness: they take fact-checking to a level I haven’t seen with other science journalists. If you’re doing a certain kind of mathematical work, then you hope that Quanta decides to cover it.
A while back, as I was chatting with one of their journalists, I had a startling realization: if I want Quanta to cover something, I can send them a tip, and if they’re interested they’ll write about it. That realization resulted in the article I talked about here. Chatting with the journalist interviewing me for that article, though, I learned something if anything even more startling: if I want Quanta to cover something, and I want to write about it, I can pitch the article to Quanta, and if they’re interested they’ll pay me to write about it.
Around the same time, I happened to talk to a few people in my field, who had a problem they thought Quanta should cover. A software, called FORM, was used in all the most serious collider physics calculations. Despite that, the software wasn’t being supported: its future was unclear. You can read the article to learn more.
One thing I didn’t mention in that article: I hadn’t used FORM before I started writing it. I don’t do those “most serious collider physics calculations”, so I’d never bothered to learn FORM. I mostly use Mathematica, a common choice among physicists who want something easy to learn, even if it’s not the strongest option for many things.
(By the way, it was surprisingly hard to find quotes about FORM that didn’t compare it specifically to Mathematica. In the end I think I included one, but believe me, there could have been a lot more.)
Now, I wonder if I should have been using FORM all along. Many times I’ve pushed to the limits of what Mathematica could comfortable handle, the limits of what my computer’s memory could hold, equations long enough that just expanding them out took complicated work-arounds. If I had learned FORM, maybe I would have breezed through those calculations, and pushed even further.
I’d love it if this article gets FORM more attention, and more support. But also, I’d love it if it gives a window on the nuts and bolts of hard-core particle physics: the things people have to do to turn those T-shirt equations into predictions for actual colliders. It’s a world in between physics and computer science and mathematics, a big part of the infrastructure of how we know what we know that, precisely because it’s infrastructure, often ends up falling through the cracks.
There’s a saying in physics, attributed to the famous genius John von Neumann: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”
Say you want to model something, like some surprising data from a particle collider. You start with some free parameters: numbers in your model that aren’t decided yet. You then decide those numbers, “fixing” them based on the data you want to model. Your goal is for your model not only to match the data, but to predict something you haven’t yet measured. Then you can go out and check, and see if your model works.
The more free parameters you have in your model, the easier this can go wrong. More free parameters make it easier to fit your data, but that’s because they make it easier to fit any data. Your model ends up not just matching the physics, but matching the mistakes as well: the small errors that crop up in any experiment. A model like that may look like it’s a great fit to the data, but its predictions will almost all be wrong. It wasn’t just fit, it was overfit.
So, did you know machine learning was just modeling data?
All of the much-hyped recent advances in artificial intelligence, GPT and Stable Diffusion and all those folks, at heart they’re all doing this kind of thing. They start out with a model (with a lot more than five parameters, arranged in complicated layers…), then use data to fix the free parameters. Unlike most of the models physicists use, they can’t perfectly fix these numbers: there are too many of them, so they have to approximate. They then test their model on new data, and hope it still works.
Increasingly, it does, and impressively well, so well that the average person probably doesn’t realize this is what it’s doing. When you ask one of these AIs to make an image for you, what you’re doing is asking what image the model predicts would show up captioned with your text. It’s the same sort of thing as asking an economist what their model predicts the unemployment rate will be when inflation goes up. The machine learning model is just way, way more complicated.
As a physicist, the first time I heard about this, I had von Neumann’s quote in the back of my head. Yes, these machines are dealing with a lot more data, from a much more complicated reality. They literally are trying to fit elephants, even elephants wiggling their trunks. Still, the sheer number of parameters seemed fishy here. And for a little bit things seemed even more fishy, when I learned about double descent.
Suppose you start increasing the number of parameters in your model. Initially, your model gets better and better. Your predictions have less and less error, your error descends. Eventually, though, the error increases again: you have too many parameters so you’re over-fitting, and your model is capturing accidents in your data, not reality.
In machine learning, weirdly, this is often not the end of the story. Sometimes, your prediction error rises, only to fall once more, in a double descent.
For a while, I found this deeply disturbing. The idea that you can fit your data, start overfitting, and then keep overfitting, and somehow end up safe in the end, was terrifying. The way some of the popular accounts described it, like you were just overfitting more and more and that was fine, was baffling, especially when they seemed to predict that you could keep adding parameters, keep fitting tinier and tinier fleas on the elephant’s trunk, and your predictions would never start going wrong. It would be the death of Occam’s Razor as we know it, more complicated explanations beating simpler ones off to infinity.
Luckily, that’s not what happens. And after talking to a bunch of people, I think I finally understand this enough to say something about it here.
The right way to think about double descent is as overfitting prematurely. You do still expect your error to eventually go up: your model won’t be perfect forever, at some point you will really overfit. It might take a long time, though: machine learning people are trying to model very complicated things, like human behavior, with giant piles of data, so very complicated models may often be entirely appropriate. In the meantime, due to a bad choice of model, you can accidentally overfit early. You will eventually overcome this, pushing past with more parameters into a model that works again, but for a little while you might convince yourself, wrongly, that you have nothing more to learn.
So Occam’s Razor still holds, but with a twist. The best model is simple enough, but no simpler. And if you’re not careful enough, you can convince yourself that a too-simple model is as complicated as you can get.
I was reminded of all this recently by somearticles by Sabine Hossenfelder.
Hossenfelder is a critic of mainstream fundamental physics. The articles were her restating a point she’s made many times before, including in (at least) one of her books. She thinks the people who propose new particles and try to search for them are wasting time, and the experiments motivated by those particles are wasting money. She’s motivated by something like Occam’s Razor, the need to stick to the simplest possible model that fits the evidence. In her view, the simplest models are those in which we don’t detect any more new particles any time soon, so those are the models she thinks we should stick with.
I tend to disagree with Hossenfelder. Here, I was oddly conflicted. In some of her examples, it seemed like she had a legitimate point. Others seemed like she missed the mark entirely.
Talk to most astrophysicists, and they’ll tell you dark matter is settled science. Indeed, there is a huge amount of evidence that something exists out there in the universe that we can’t see. It distorts the way galaxies rotate, lenses light with its gravity, and wiggled the early universe in pretty much the way you’d expect matter to.
What isn’t settled is whether that “something” interacts with anything else. It has to interact with gravity, of course, but everything else is in some sense “optional”. Astroparticle physicists use satellites to search for clues that dark matter has some other interactions: perhaps it is unstable, sometimes releasing tiny signals of light. If it did, it might solve other problems as well.
Hossenfelder thinks this is bunk (in part because she thinks those other problems are bunk). I kind of do too, though perhaps for a more general reason: I don’t think nature owes us an easy explanation. Dark matter isn’t obligated to solve any of our other problems, it just has to be dark matter. That seems in some sense like the simplest explanation, the one demanded by Occam’s Razor.
At the same time, I disagree with her substantially more on collider physics. At the Large Hadron Collider so far, all of the data is reasonably compatible with the Standard Model, our roughly half-century old theory of particle physics. Collider physicists search that data for subtle deviations, one of which might point to a general discrepancy, a hint of something beyond the Standard Model.
While my intuitions say that the simplest dark matter is completely dark, they don’t say that the simplest particle physics is the Standard Model. Back when the Standard Model was proposed, people might have said it was exceptionally simple because it had a property called “renormalizability”, but these days we view that as less important. Physicists like Ken Wilson and Steven Weinberg taught us to view theories as a kind of series of corrections, like a Taylor series in calculus. Each correction encodes new, rarer ways that particles can interact. A renormalizable theory is just the first term in this series. The higher terms might be zero, but they might not. We even know that some terms cannot be zero, because gravity is not renormalizable.
The two cases on the surface don’t seem that different. Dark matter might have zero interactions besides gravity, but it might have other interactions. The Standard Model might have zero corrections, but it might have nonzero corrections. But for some reason, my intuition treats the two differently: I would find it completely reasonable for dark matter to have no extra interactions, but very strange for the Standard Model to have no corrections.
I think part of where my intuition comes from here is my experience with other theories.
One example is a toy model called sine-Gordon theory. In sine-Gordon theory, this Taylor series of corrections is a very familiar Taylor series: the sine function! If you go correction by correction, you’ll see new interactions and more new interactions. But if you actually add them all up, something surprising happens. Sine-Gordon turns out to be a special theory, one with “no particle production”: unlike in normal particle physics, in sine-Gordon particles can neither be created nor destroyed. You would never know this if you did not add up all of the corrections.
String theory itself is another example. In string theory, elementary particles are replaced by strings, but you can think of that stringy behavior as a series of corrections on top of ordinary particles. Once again, you can try adding these things up correction by correction, but once again the “magic” doesn’t happen until the end. Only in the full series does string theory “do its thing”, and fix some of the big problems of quantum gravity.
If the real world really is a theory like this, then I think we have to worry about something like double descent.
Remember, double descent happens when our models can prematurely get worse before getting better. This can happen if the real thing we’re trying to model is very different from the model we’re using, like the example in this explainer that tries to use straight lines to match a curve. If we think a model is simpler because it puts fewer corrections on top of the Standard Model, then we may end up rejecting a reality with infinite corrections, a Taylor series that happens to add up to something quite nice. Occam’s Razor stops helping us if we can’t tell which models are really the simple ones.
The problem here is that every notion of “simple” we can appeal to here is aesthetic, a choice based on what makes the math look nicer. Other sciences don’t have this problem. When a biologist or a chemist wants to look for the simplest model, they look for a model with fewer organisms, fewer reactions…in the end, fewer atoms and molecules, fewer of the building-blocks given to those fields by physics. Fundamental physics can’t do this: we build our theories up from mathematics, and mathematics only demands that we be consistent. We can call theories simpler because we can write them in a simple way (but we could write them in a different way too). Or we can call them simpler because they look more like toy models we’ve worked with before (but those toy models are just a tiny sample of all the theories that are possible). We don’t have a standard of simplicity that is actually reliable.
There is one other way out of this pickle. A theory that is easier to write down is under no obligation to be true. But it is more likely to be useful. Even if the real world is ultimately described by some giant pile of mathematical parameters, if a simple theory is good enough for the engineers then it’s a better theory to aim for: a useful theory that makes peoples’ lives better.
I kind of get the feeling Hossenfelder would make this objection. I’ve seen her argue on twitter that scientists should always be able to say what their research is good for, and her Guardian article has this suggestive sentence: “However, we do not know that dark matter is indeed made of particles; and even if it is, to explain astrophysical observations one does not need to know details of the particles’ behaviour.”
Ok yes, to explain astrophysical observations one doesn’t need to know the details of dark matter particles’ behavior. But taking a step back, one doesn’t actually need to explain astrophysical observations at all.
Astrophysics and particle physics are not engineering problems. Nobody out there is trying to steer a spacecraft all the way across a galaxy, navigating the distribution of dark matter, or creating new universes and trying to make sure they go just right. Even if we might do these things some day, it will be so far in the future that our attempts to understand them won’t just be quaint: they will likely be actively damaging, confusing old research in dead languages that the field will be better off ignoring to start from scratch.
Because of that, usefulness is also not a meaningful guide. It cannot tell you which theories are more simple, which to favor with Occam’s Razor.
Hossenfelder’s highest-profile recent work falls afoul of one or the other of her principles. Her work on the foundations of quantum mechanics could genuinely be useful, but there’s no reason aside from claims of philosophical beauty to expect it to be true. Her work on modeling dark matter is at least directly motivated by data, but is guaranteed to not be useful.
I’m not pointing this out to call Hossenfelder a hypocrite, as some sort of ad hominem or tu quoque. I’m pointing this out because I don’t think it’s possible to do fundamental physics today without falling afoul of these principles. If you want to hold out hope that your work is useful, you don’t have a great reason besides a love of pretty math: otherwise, anything useful would have been discovered long ago. If you just try to model existing data as best you can, then you’re making a model for events far away or locked in high-energy particle colliders, a model no-one else besides other physicists will ever use.
I don’t know the way through this. I think if you need to take Occam’s Razor seriously, to build on the same foundations that work in every other scientific field…then you should stop doing fundamental physics. You won’t be able to make it work. If you still need to do it, if you can’t give up the sub-field, then you should justify it on building capabilities, on the kind of “practice” Hossenfelder also dismisses in her Guardian piece.
We don’t have a solid foundation, a reliable notion of what is simple and what isn’t. We have guesses and personal opinions. And until some experiment uncovers some blinding flash of new useful meaningful magic…I don’t think we can do any better than that.
I’ve done a lot of work with what we like to call “bootstrap” methods. Instead of doing a particle physics calculation in all its gory detail, we start with a plausible guess and impose requirements based on what we know. Eventually, we have the right answer pulled up “by its own bootstraps”: the only answer the calculation could have, without actually doing the calculation.
This method works very well, but so far it’s only been applied to certain kinds of calculations, involving mathematical functions called polylogarithms. More complicated calculations involve a mathematical object called an elliptic curve, and until very recently it wasn’t clear how to bootstrap them. To get people thinking about it, my colleagues Hjalte Frellesvig and Andrew McLeod asked the Carlsberg Foundation (yes, that Carlsberg) to fund a mini-conference. The idea was to get elliptic people and bootstrap people together (along with Hjalte’s tribe, intersection theory people) to hash things out. “Jumpstart people” are not a thing in physics, so despite the title they were not invited.
Having the conference so soon after the yearly Elliptics meeting had some strange consequences. There was only one actual duplicate talk, but the first day of talks all felt like they would have been welcome additions to the earlier conference. Some might be functioning as “overflow”: Elliptics this year focused on discussion and so didn’t have many slots for talks, while this conference despite its discussion-focused goal had a more packed schedule. In other cases, people might have been persuaded by the more relaxed atmosphere and lack of recording or posted slides to give more speculative talks. Oliver Schlotterer’s talk was likely in this category, a discussion of the genus-two functions one step beyond elliptics that I think people at the previous conference would have found very exciting, but which involved work in progress that I could understand him being cautious about presenting.
The other days focused more on the bootstrap side, with progress on some surprising but not-quite-yet elliptic avenues. It was great to hear that Mark Spradlin is making new progress on his Ziggurat story, to hear James Drummond suggest a picture for cluster algebras that could generalize to other theories, and to get some idea of the mysterious ongoing story that animates my colleague Cristian Vergu.
There was one thing the organizers couldn’t have anticipated that ended up throwing the conference into a new light. The goal of the conference was to get people started bootstrapping elliptic functions, but in the meantime people have gotten started on their own. Roger Morales Espasa presented his work on this with several of my other colleagues. They can already reproduce a known result, the ten-particle elliptic double-box, and are well on-track to deriving something genuinely new, the twelve-particle version. It’s exciting, but it definitely makes the rest of us look around and take stock. Hopefully for the better!
I had a paper two weeks ago with a Master’s student, Alex Chaparro Pozo. I haven’t gotten a chance to talk about it yet, so I thought I should say a few words this week. It’s another entry in what I’ve been calling my cabinet of curiosities, interesting mathematical “objects” I’m sharing with the world.
I calculate scattering amplitudes, formulas that give the probability that particles scatter off each other in particular ways. While in principle I could do this with any particle physics theory, I have a favorite: a “toy model” called N=4 super Yang-Mills. N=4 super Yang-Mills doesn’t describe reality, but it lets us figure out cool new calculation tricks, and these often end up useful in reality as well.
Many scattering amplitudes in N=4 super Yang-Mills involve a type of mathematical functions called polylogarithms. These functions are especially easy to work with, but they aren’t the whole story. One we start considering more complicated situations (what if two particles collide, and eight particles come out?) we need more complicated functions, called elliptic polylogarithms.
The original calculation was pretty complicated. Two particles colliding, eight particles coming out, meant that in total we had to keep track of ten different particles. That gets messy fast. I’m pretty good at dealing with six particles, not ten. Luckily, it turned out there was a way to pretend there were six particles only: by “twisting” up the calculation, we found a toy model within the toy model: a six-particle version of the calculation. Much like the original was in a theory that doesn’t describe the real world, these six particles don’t describe six particles in that theory: they’re a kind of toy calculation within the toy model, doubly un-real.
With this nested toy model, I was confident we could do the calculation. I wasn’t confident I’d have time for it, though. This ended up making it perfect for a Master’s thesis, which is how Alex got into the game.
Alex worked his way through the calculation, programming and transforming, going from one type of mathematical functions to another (at least once because I’d forgotten to tell him the right functions to use, oops!) There were more details and subtleties than expected, but in the end everything worked out.
Alex left the field (not, as far as I know, because of this). And for a while, because of that especially thorough scooping, I didn’t publish.
What changed my mind, in part, was seeing the field develop in the meantime. It turns out toy models, and even nested toy models, are quite useful. We still have a lot ofuncertainty about what to do, how to use the new calculation methods and what they imply. And usually, the best way to get through that kind of uncertainty is with simple, well-behaved toy models.
So I thought, in the end, that this might be useful. Even if it’s a toy version of something that already exists, I expect it to be an educational toy, one we can learn a lot from. So I’ve put it out into the world, as part of this year’s cabinet of curiosities.
Elliptics has been growing in recent years, hurtling into prominence as a subfield of amplitudes (which is already a subfield of theoretical physics). This has led to growing lists of participants and a more and more packed schedule.
This year walked all of that back a bit. There were three talks a day: two one-hour talks by senior researchers and one half-hour talk by a junior researcher. The rest, as well as the whole last day, are geared to discussion. It’s an attempt to go back to the subfield’s roots. In the beginning, the Elliptics conferences drew together a small group to sort out a plan for the future, digging through the often-confusing mathematics to try to find a baseline for future progress. The field has advanced since then, but some of our questions are still almost as basic. What relations exist between different calculations? How much do we value fast numerics, versus analytical understanding? What methods do we want to preserve, and which aren’t serving us well? To answer these questions, it helps to get a few people together in one place, not to silently listen to lectures, but to question and discuss and hash things out. I may have heard a smaller range of topics at this year’s Elliptics, but due to the sheer depth we managed to probe on those fewer topics I feel like I’ve learned much more.
Since someone always asks, I should say that the talks were not recorded, but they are posting slides online, so if you’re interested in the topic you can watch there. A few people discussed new developments, some just published and some yet to be published. I discussed the work I talked about last week, and got a lot of good feedback and ideas about how to move forward.