Why Solving the Muon Puzzle Doesn’t Solve the Puzzle

You may have heard that the muon g-2 problem has been solved.

Muons are electrons’ heavier cousins. As spinning charged particles, they are magnetic, with the strength of that magnetism characterized by a number denoted “g”. If you were to guess this number from classical physics alone, you’d conclude it should be 2, but quantum mechanics tweaks it. The leftover part, “g-2”, can be measured, and predicted, with extraordinary precision, which ought to make it an ideal test: if our current understanding of particle physics, called the Standard Model, is subtly wrong, the difference might be noticeable there.
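
For readers who want the symbols: physicists usually package the leftover as the “anomalous magnetic moment”, and its first quantum correction was famously computed by Schwinger in 1948. These are standard textbook expressions, reproduced here for reference:

```latex
% Dirac's prediction for a pointlike spinning particle: g = 2.
% The "anomalous" part is the leftover:
a_\mu = \frac{g - 2}{2}

% Leading quantum correction (Schwinger, 1948):
a_\mu \approx \frac{\alpha}{2\pi} \approx 0.00116
```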

And for a while, it looked like such a difference was indeed noticeable. Extremely precise experiments over the last thirty years have consistently found a number slightly different from the extremely precise calculations, different enough that it seemed quite unlikely to be due to chance.
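
“Unlikely to be due to chance” is itself a calculation: divide the gap between measurement and prediction by the combined uncertainty of the two. A minimal sketch in Python; the inputs are roughly the values reported in 2021, in units of 10^-11, quoted from memory purely for illustration:

```python
# How a "sigma" discrepancy between a measurement and a prediction is computed.
# Numbers below are approximate 2021-era values (units of 10^-11), quoted from
# memory for illustration only -- see the official publications for the real ones.
import math

def discrepancy_sigma(measured, m_err, predicted, p_err):
    """Gap between two values, in units of their combined uncertainty."""
    return abs(measured - predicted) / math.sqrt(m_err**2 + p_err**2)

print(discrepancy_sigma(measured=116592061, m_err=41,
                        predicted=116591810, p_err=43))
# prints ~4.2: a "4.2 sigma" tension, very unlikely as a pure fluke
```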

Now, the headlines are singing a different tune.

What changed?

Those headlines might make you think the change was an experimental result, a new measurement that changed the story. It wasn’t, though. There is a new, more precise measurement, but it agrees with the old measurements.

So the change has to be in the calculations, right? They did a new calculation, corrected a mistake or just pushed up their precision, and found that the Standard Model matches the experiment after all?

…sort of, but again, not really. The group of theoretical physicists associated with the experiment did release new, more accurate calculations. But it wasn’t the new calculations, by themselves, that made a difference. Instead, it was a shift in what kind of calculations they used…or even more specifically, what kind of calculations they trusted.

Parts of the calculation of g-2 can be done with Feynman diagrams, those photogenic squiggles you see on physicists’ blackboards. That part is very precise, and not especially controversial. However, Feynman diagrams only work well when forces between particles are comparatively weak. They’re great for electromagnetism, even better for the weak nuclear force. But for the strong nuclear force, the one that holds protons and neutrons together, you often need a different method.
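
To see why weak forces matter, note that the diagrams organize the answer as a series in the force’s strength: each extra layer of squiggles is suppressed by another power of a small number. Schematically (the first coefficient is Schwinger’s famous 1/2; the rest are left abstract):

```latex
a_\mu^{\text{QED}} = \frac{1}{2}\left(\frac{\alpha}{\pi}\right)
  + C_2\left(\frac{\alpha}{\pi}\right)^2
  + C_3\left(\frac{\alpha}{\pi}\right)^3 + \cdots,
\qquad \frac{\alpha}{\pi} \approx 0.0023
```

For the strong force at these energies, the analogous expansion parameter is not small, so the series stops being useful.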

For g-2, that used to be done via a “data-driven” method. Physicists measured different things, particles affected by the strong nuclear force in different ways, and used that to infer how the strong force would affect g-2. By getting a consistent picture from different experiments, they were reasonably confident that they had the right numbers.
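
Concretely, the key input is the measured rate for electron-positron collisions to produce strongly-interacting particles, fed into a dispersion relation. The standard leading-order form looks like this (sketched from memory; see the Muon g-2 Theory Initiative white papers for the exact conventions):

```latex
a_\mu^{\text{HVP}} = \frac{\alpha^2}{3\pi^2}
  \int_{s_{\text{th}}}^{\infty} \frac{ds}{s} \, K(s) \, R(s),
\qquad
R(s) = \frac{\sigma(e^+ e^- \to \text{hadrons})}{\sigma(e^+ e^- \to \mu^+ \mu^-)}
% K(s) is a known kernel that weights low energies most heavily
```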

Back in 2020, though, a challenger arrived on the scene with another method. Called lattice QCD, this method involves building gigantic computer simulations of the effect of the strong force. People have been doing lattice QCD since the 1970s, and the simulations have been getting better and better, until in 2020 a group managed to calculate the piece of g-2 that had until then been done by the data-driven method.
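
If you’re curious what those simulations look like in miniature: the core trick is to replace continuous spacetime with a grid of points and average observables over randomly sampled field configurations. Here is a toy Python sketch for a one-dimensional scalar field, nothing like real QCD (no quarks, no gluons, no four dimensions), just the Monte Carlo skeleton:

```python
# Toy Metropolis Monte Carlo on a one-dimensional lattice: the skeleton of how
# lattice field theory averages observables over random field configurations.
# Purely illustrative -- real lattice QCD has quarks, gluons, and 4 dimensions.
import math
import random

N, MASS2, STEP, SWEEPS = 50, 1.0, 0.5, 5000
phi = [0.0] * N  # the field: one number per lattice site, periodic boundary

def action_change(i, new):
    """Change in the discretized action if site i is set to `new`."""
    left, right = phi[(i - 1) % N], phi[(i + 1) % N]
    def local(x):  # nearest-neighbor "kinetic" term plus a mass term
        return 0.5 * ((x - left) ** 2 + (right - x) ** 2) + 0.5 * MASS2 * x * x
    return local(new) - local(phi[i])

avg_phi2 = 0.0
for _ in range(SWEEPS):
    for i in range(N):
        proposal = phi[i] + random.uniform(-STEP, STEP)
        dS = action_change(i, proposal)
        # Metropolis rule: always accept "downhill" moves, sometimes "uphill"
        if dS < 0 or random.random() < math.exp(-dS):
            phi[i] = proposal
    avg_phi2 += sum(x * x for x in phi) / N

print("lattice average of phi^2:", avg_phi2 / SWEEPS)
```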

The lattice group found a very different result than what had been found previously. Instead of a wild disagreement with experiment, their calculation agreed. According to them, everything was fine, the muon g-2 was behaving exactly as the Standard Model predicted.

For some of us, that’s where the mystery ended. Clearly, something must be wrong with the data-driven method, not with the Standard Model. No more muon puzzle.

But the data-driven method wasn’t just a guess, it was being used for a reason. A significant group of physicists found the arguments behind it convincing. Now, there was a new puzzle: figuring out why the data-driven method and lattice QCD disagree.

Five years later, has that mystery been solved? Is that, finally, what the headlines are about?

Again, not really, no.

The theorists associated with the experiment have decided to trust lattice QCD, not the data-driven method. But they don’t know what went wrong, exactly.

Instead, they’ve highlighted cracks in the data-driven method. The way the data-driven method works, it brings together different experiments to try to get a shared picture. But that shared picture has started to fall apart. A new measurement by a different experiment doesn’t fit into the system: the data-driven method now “has tensions”, as physicists say. It’s no longer possible to combine all the experiments into a shared picture the way they used to. Meanwhile, lattice QCD has gotten even better, reaching even higher precision. From the perspective of the theorists associated with the muon g-2 experiment, switching methods is now clearly the right call.

But does that mean they solved the puzzle?

If you were confident that lattice QCD is the right approach, then the puzzle was already solved in 2020. All that changed was the official collaboration finally acknowledging that.

And if you were confident that the data-driven method was the right approach, then the puzzle is even worse. Now, there are tensions within the method itself…but still no explanation of what went wrong! If you had good reasons to think the method should work, you still have those good reasons. Now you’re just…more puzzled.

I am reminded of another mystery, a few years back, when an old experiment announced a dramatically different measurement for the mass of the W boson. At the time, I argued the big mystery was not how the W boson’s mass had changed (it hadn’t), but how they came to be so confident in a result so different from what others, also confidently, had found. In physics, our confidence is encoded in numbers, estimated and measured and tested and computed. If we’re not estimating that confidence correctly…then that’s the real mystery, the real puzzle. One much more important to solve.


Also, I had two more pieces out this week! In Quanta I have a short explainer about bosons and fermions, while at Ars Technica I have a piece about machine learning at the LHC. I may have a “bonus info” post on the latter at some point; I have to think about whether I have enough material for it.

4 thoughts on “Why Solving the Muon Puzzle Doesn’t Solve the Puzzle”

  1. One Guy

    Does this data-driven method predict other results? If so, it makes sense to try to measure them again more precisely and see if the method is wrong again.

    1. 4gravitons (Post author)

      I have the vague impression that’s kind of what happened (a more precise measurement that diverged from the data-driven method’s predictions); it’s just that, because of how the data-driven method works, this mostly looks like a tension between two different measurements from their perspective. But this isn’t something I know about in detail.

  2. Andrew Oh-Willeke

    “If you were confident that lattice QCD is the right approach, then the puzzle was already solved in 2020.”

    The data-driven method was always fundamentally “a cheat”. The goal was always to do a first-principles calculation using the equations of the Standard Model and no inputs other than the fundamental experimentally measured parameters of the Standard Model. Comparing theoretical predictions to experiment is the most basic and straightforward kind of science there is.

    The data-driven method was used because doing the calculations from first principles for the hadronic part of muon g-2, as one ideally would to test the theory, was just too hard at the desired precision, until the BMW group came along and did it. The key components of the BMW calculations were then reproduced independently many times to corroborate their lattice QCD calculation.

    In 2020, the puzzle wasn’t quite solved, because there hadn’t been independent replication of their result; by 2025, there was (and indeed, refinements of the lattice QCD calculation beyond those available in 2020 encouragingly made it more precise and brought its conclusion closer to the final experimental result).

    There is a new puzzle regarding why there is a tension between the experimental results that were used for the 2020 data-driven result and the new experimental result that is closer to the lattice QCD calculation. But that’s an analytically distinct question from whether the muon g-2 puzzle is solved.

    The new puzzle is more like the forensic, after-the-fact analysis of why the Challenger exploded, the kind that ultimately fingered its O-rings. We know that something went wrong in one or more of the underlying experiments, because they aren’t consistent, and we now need to figure out what it is. Presumptively, somebody did something wrong in how they conducted the experiment, with the most anodyne of the possibilities being that some source of systematic error in one or more of the experiments was underestimated or omitted. (The prediction based on the data-driven method may have been off by 4-5 sigma, given that method’s stated uncertainties, but it still wasn’t that far off in absolute terms, in a measurement whose uncertainties were at the parts-per-billion level.)

    The new puzzle also has a lot in common with the proton radius puzzle: figuring out why protons in muonic hydrogen seemingly had a different radius than those in ordinary hydrogen, even though there was no physics reason why this should be the case. New experiments ultimately reconciled the two.

    The new puzzle is a lot less significant, however. It’s basically an after-the-fact debriefing, an analysis to try to do better next time when conducting experiments, rather than a meaningful unsolved question in physics that has to be resolved.

    1. 4gravitons (Post author)

      There’s a big difference between space shuttles and physics experiments, though. For the space shuttle, it’s obvious what “going right” looks like: a space shuttle that doesn’t explode. In physics, the problem appears to be in our tools for estimating whether there is a problem in the first place. A priori, we don’t know how far that goes. Whatever methodology led people to be this confident that the data-driven method indicated a real anomaly may well have led them to be overconfident about other results. Accurately estimating theoretical uncertainties is how we know that there are problems to begin with; if we can’t actually do that, it jeopardizes everything.
