Have you heard the news about the W boson?
The W boson is a fundamental particle, part of the Standard Model of particle physics. It is what we call a “force-carrying boson”, a particle related to the weak nuclear force in the same way photons are related to electromagnetism. Unlike photons, W bosons are “heavy”: they have a mass. We can’t usually predict masses of particles, but the W boson is a bit different, because its mass comes from the Higgs boson in a special way, one that ties it to the masses of other particles like the Z boson. The upshot is that if you know the mass of a few other particles, you can predict the mass of the W.
And according to a recent publication, that prediction is wrong. A team analyzed results from the Tevatron, an older machine that was the biggest predecessor of today’s Large Hadron Collider. They treated the data with groundbreaking care, mindbogglingly even taking into account the shape of the detector’s wires. And after all that analysis, they found that the W bosons detected at the Tevatron had a different mass than the mass predicted by the Standard Model.
How different? Here’s where precision comes in. In physics, we decide whether to trust a measurement with a statistical tool: we calculate how likely the measurement would be if it were an accident. In this case: if the Standard Model were correct, how likely would it be that the measurement still came out this way? To discover a new particle, we require this chance to be about one in 3.5 million, or in our jargon, five sigma. That was the requirement for discovering the Higgs boson. This super-precise measurement of the W boson doesn’t have five sigma…it has seven sigma. That means, if we trust the analysis team, a measurement like this could come out of the Standard Model by accident only about one in a trillion times.
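The translation from sigmas to odds is just the tail of a normal distribution. A minimal sketch, using only Python’s standard library and the one-tailed convention that matches the “one in 3.5 million” figure above:

```python
import math

def sigma_to_p(n_sigma):
    """One-tailed probability of a normal fluctuation at least
    n_sigma standard deviations above the mean."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

for n in (5, 7):
    p = sigma_to_p(n)
    print(f"{n} sigma: p = {p:.2e}  (about 1 in {1 / p:,.0f})")
```

Five sigma works out to roughly one in 3.5 million; seven sigma to roughly one in 800 billion, which the text above rounds to “about one in a trillion”.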
Ok, should we trust the analysis team?
If you want to know that, I’m the wrong physicist to ask. The right physicists are experimental particle physicists. They do analyses like that one, and they know what can go wrong. Everyone I’ve heard from in that field emphasized that this was a very careful group, who did a lot of things impressively right…but there is still room for mistakes. One pointed out that the new measurement isn’t just inconsistent with the Standard Model, but with many previous measurements too. Those measurements are less precise, but still precise enough that we should be a bit skeptical. Another went into more detail about specific clues as to what might have gone wrong.
If you can’t find a particle experimentalist, the next best choice is a particle phenomenologist. These are the people who try to make predictions for new experiments, who use theoretical physics to propose new models that future experiments can test. Here’s one giving a first impression, and discussing some ways to edit the Standard Model to agree with the new measurement. Here’s another discussing what to me is an even more interesting question: if we take these measurements seriously, both the new one and the old ones, then what do we believe?
I’m not an experimentalist or a phenomenologist. I’m an “amplitudeologist”. I work not on the data, or the predictions, but the calculational tools used to make those predictions, called “scattering amplitudes”. And that gives me a different view on the situation.
See, in my field, precision is one of our biggest selling points. If you want theoretical predictions to match precise experiments, you need our tricks to compute them. We believe (and argue to grant agencies) that this precision will be important: if a precise experiment and a precise prediction disagree, it could be the first clue to something truly new. New solid evidence of something beyond the Standard Model would revitalize all of particle physics, giving us a concrete goal and killing fruitless speculation.
This result shakes my faith in that a little. Probably, the analysis team got something wrong. Possibly, all previous analyses got something wrong. Either way, a lot of very careful smart people tried to estimate their precision, got very confident…and got it wrong.
(There’s one more alternative: maybe million-to-one chances really do crop up nine times out of ten.)
If some future analysis digs down deep in precision, and finds another deviation from the Standard Model, should we trust it? What if it’s measuring something new, and we don’t have the prior experiments to compare to?
(This would happen if we build a new, even higher-energy collider. There are things that collider could measure, like the chance one Higgs boson splits into two, that we could not measure with any earlier machine. If we measured that, we couldn’t compare it to the Tevatron or the LHC; we’d have only the new collider to go on.)
Statistics are supposed to tell us whether to trust a result. Here, they’re not doing their job. And that creates the scary possibility that some anomaly shows up, some real deviation deep in the sigmas that hints at a whole new path for the field…and we just end up bickering about who screwed it up. Or the equally scary possibility that we find a seven-sigma signal of some amazing new physics, build decades of new theories on it…and it isn’t actually real.
We don’t just trust statistics. We also trust the things normal people trust. Do other teams find the same result? (I hope that they’re trying to get to this same precision here, and see what went wrong!) Does the result match other experiments? Does it make predictions, which then get tested in future experiments?
All of those are heuristics of course. Nothing can guarantee that we measure the truth. Each trick just corrects for some of our biases, some of the ways we make mistakes. We have to hope that’s good enough, that if there’s something to see we’ll see it, and if there’s nothing to see we won’t. Precision, my field’s raison d’être, can’t be enough to convince us by itself. But it can help.
Thanks for the post and the links!
Most of the experts do not seem to be particularly convinced. Maybe 7 sigma is too much?
Or the discrepancy is too big to be plausible?
Yet, in T. Dorigo’s post I read that the uncertainty bar is significantly narrower than in most of the other measurements. Only two of them (D0 II, ATLAS) come relatively close to that accuracy. Obviously, there are details and subtleties, beyond the disagreement with the Standard Model, that cause this lack of enthusiasm.
Yeah the disagreement with other, prior measurements is much more important than the disagreement with the SM here. The new measurement is much more precise, but the older measurements are still precise enough that either way something fishy happened (see the Resonaances post in particular for an attempt to make that quantitative).
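To make the tension between two independent measurements rough-and-ready quantitative, the usual move is to divide their difference by the quadrature sum of their uncertainties. A sketch, with numbers close to the published values but used here only as illustrative stand-ins (roughly 80,433.5 ± 9.4 MeV for the new Tevatron result versus roughly 80,379 ± 12 MeV for a prior world average):

```python
import math

# Illustrative numbers, close to the published values (MeV):
new_mass, new_err = 80433.5, 9.4    # new Tevatron (CDF) result
old_mass, old_err = 80379.0, 12.0   # rough prior world average

# Tension in sigma: difference over quadrature-summed uncertainties,
# assuming the two results are independent and Gaussian.
tension = abs(new_mass - old_mass) / math.hypot(new_err, old_err)
print(f"tension: about {tension:.1f} sigma")
```

Even a few sigma of tension with prior measurements is already “something fishy”, independent of the seven-sigma disagreement with the Standard Model prediction itself.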
As for “7 sigma is too much”, it’s hard to say. I like (I think it was) Dorigo’s point that ATLAS and CMS must be kicking themselves now trying to figure out how they can catch up to that precision.
I’m not sure I’d be too worried about a lack of faith in future statistical analyses. If there is a problem it’s likely to be an unanticipated systematic error. Such systematic errors are the bane of the data analyst’s life since you cannot detect them in the analysis.
Sure, I’m not literally saying “statistics is broken”. But if unanticipated systematic errors come up every time you try to do something statistically ambitious, then it discourages you from doing more things that are statistically ambitious.
“This would happen if we build a new even higher-energy collider. There are things the collider could measure, like the chance one Higgs boson splits into two, that we could not measure with any earlier machine. If we measured that, we couldn’t compare it to the Tevatron or the LHC, we’d have only the new collider to go on.”
This is why we almost always set up at least two parallel experiments with a major experimental apparatus like the Tevatron or the LHC, and don’t trust an experimental result fully until it is replicated by both experiments. And why we also insist on having some plausible theoretical explanation before being really comfortable with a result. The shorthand is that five sigma is a discovery, but the formula for a discovery is really five sigma, confirmed at more than one experiment, with a plausible theoretical basis in physics.
FWIW, the rush to phenomenology papers using only a single new measurement of the W boson mass, when there are already nine or ten measurements of the same quantity that contradict the new result, is silly. Perhaps, if the scientific community had better incentives, we’d see instead a rush of papers about possible sources of unquantified systematic error, which is what every physicist with a general-public orientation I’ve seen size it up has said. I also share the view of some critics that this result really should have been preprinted before being published, to allow for more commentary than a small panel of peer reviewers can provide.
At a minimum, the phenomenologists should be looking at their theories from the perspective of a revised global average that includes this as just one new measurement of something already measured and known. All of the measurements still agree to at least three significant digits, regardless of the number of standard deviations between this result and the prior global average and global electroweak fit value. This isn’t quite as extreme as muon g-2, where experiment and theoretical prediction agree at the part-per-million level, but more caution is warranted when the deviations between results are small in percentage terms, since it is easy to get a systematic uncertainty estimate wrong, even for the best of the best scientists.
I also think the better way to think about the global electroweak fit is as an indirect measurement that should just be included in the global average with an uncertainty weight, which isn’t all that special and just adds one more datapoint. Ultimately, a global electroweak fit is based on other measurements, like the Z boson mass, the fine-structure constant, the Fermi constant, the Higgs boson mass, and the top quark mass, each of which has its own uncertainties to some greater or lesser degree. The global electroweak fit doesn’t come from some mathematical constant; ultimately, it is just another measurement made by different means.
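The “uncertainty weight” here is the standard inverse-variance weighting: each result enters the average with weight 1/σ². A minimal sketch with two hypothetical inputs (illustrative numbers only, not an official combination, and assuming independent Gaussian errors):

```python
def combine(measurements):
    """Inverse-variance weighted average of (value, uncertainty) pairs.
    Assumes independent, Gaussian uncertainties."""
    weights = [1.0 / err ** 2 for _, err in measurements]
    total = sum(weights)
    mean = sum(w * val for w, (val, _) in zip(weights, measurements)) / total
    return mean, total ** -0.5

# Illustrative inputs (MeV): a rough prior average and one new measurement.
mean, err = combine([(80379.0, 12.0), (80433.5, 9.4)])
print(f"combined: {mean:.1f} +/- {err:.1f} MeV")
```

Note that the combined uncertainty is smaller than either input’s, which is why a single outlier with a small error bar can drag the whole average noticeably.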
Given that the Tevatron result was using old data one also sees similarities to the muonic hydrogen radius case where it turned out that it was the old ordinary hydrogen radius measurement that was flawed, rather than the new muonic hydrogen radius measurement.
There was also a notable preprint, published before any of the W boson news broke, which deserves mention: it suggests a pretty much across-the-board systematic error of 20 MeV (which would reduce the previously measured values) arising from the definitions used to translate experimental data into a final result, in this and all of the previous measurements except the global electroweak fit. See Scott Willenbrock, “Mass and width of an unstable particle”, arXiv:2203.11056 (March 21, 2022).