Monthly Archives: October 2024

Transforming Particles Are Probably Here to Stay

It can be tempting to imagine the world in terms of lego-like building-blocks. Atoms are made of protons, neutrons, and electrons stuck together, and protons and neutrons are in turn made of stuck-together quarks. And while atoms, despite the name, aren’t indivisible, you might think that if you look small enough you’ll find indivisible, unchanging pieces, the smallest building-blocks of reality.

Part of that is true. We might, at some point, find the smallest pieces, the things everything else is made of. (In a sense, it’s quite likely we’ve already found them!) But those pieces don’t behave like lego blocks. They aren’t indivisible and unchanging.

Instead, particles, even the most fundamental particles, transform! The most familiar example is beta decay, a radioactive process where a neutron turns into a proton, emitting an electron and a neutrino. This process can be explained in terms of more fundamental particles: the neutron is made of three quarks, and one of those quarks transforms from a “down quark” to an “up quark”. But the explanation, as far as we can tell, doesn’t go any deeper. Quarks aren’t unchanging, they transform.

Beta decay! Ignore the W, which is important but not for this post.
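
Written out as reactions, with the W the caption mentions left as an unseen intermediate step (strictly speaking, the neutral particle emitted here is an antineutrino, a detail the post glosses over):

```latex
% Beta decay at the level of the neutron and proton:
\[ n \;\to\; p + e^{-} + \bar{\nu}_e \]
% The same process at the level of quarks: a down quark becomes an up quark.
\[ d \;\to\; u + e^{-} + \bar{\nu}_e \]
```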

There’s a suggestion I keep hearing, both from curious amateurs and from dedicated crackpots: why doesn’t this mean that quarks have parts? If a down quark can turn into an up quark, an electron, and a neutrino, then why doesn’t that mean that a down quark contains an up quark, an electron, and a neutrino?

The simplest reason is that this isn’t the only way a quark transforms. You can also have beta-plus decay, where an up quark transforms into a down quark, emitting a neutrino and the positively charged anti-particle of the electron, called a positron.

Also, ignore the directions of the arrows, that’s weird particle physics notation that doesn’t matter here.
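
As a reaction, beta-plus decay at the quark level looks like this (here the neutral particle really is a neutrino):

```latex
% Beta-plus decay: an up quark becomes a down quark,
% emitting a positron and a neutrino.
\[ u \;\to\; d + e^{+} + \nu_e \]
```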

So to make your idea work, you’d somehow need each down quark to contain an up quark plus some other particles, and each up quark to contain a down quark plus some other particles.

Can you figure out some complicated scheme that works like that? Maybe. But there’s a deeper reason why this is the wrong path.

Transforming particles are part of a broader phenomenon, called particle production. Reactions in particle physics can produce new particles that weren’t there before. This wasn’t part of the earliest theories of quantum mechanics that described one electron at a time. But if you want to consider the quantum properties of not just electrons, but the electric field as well, then you need a more complete theory, called a quantum field theory. And in those theories, you can produce new particles. It’s as simple as turning on the lights: from a wiggling electron, you make light, which in a fully quantum theory is made up of photons. Those photons weren’t “part of” the electron to start with, they are produced by its motion.

If you want to avoid transforming particles, to describe everything in terms of lego-like building-blocks, then you want to avoid particle production altogether. Can you do this in a quantum field theory?

Actually, yes! But your theory won’t describe the whole of the real world.

In physics, we have examples of theories that don’t have particle production. These example theories have a property called integrability: they are theories we can “solve”, doing calculations that aren’t possible in ordinary theories. The name comes from the fact that the oldest such theories in classical mechanics were solved using integrals.

Normal particle physics theories have conserved charges. Beta decay conserves electric charge: you start out with a neutral particle, and end up with one particle with positive charge and another with negative charge. It also conserves other things, like “electron-number” (the electron has electron-number one, the neutrino that comes out with it has electron-number minus one), energy, and momentum.
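
Spelled out for beta decay of a neutron, in units of the proton’s charge, the bookkeeping looks like this:

```latex
% Electric charge: the neutron's zero charge is exactly accounted for.
\[ \underbrace{0}_{n} \;=\; \underbrace{(+1)}_{p} \;+\; \underbrace{(-1)}_{e^{-}} \;+\; \underbrace{0}_{\bar{\nu}_e} \]
% Electron-number: zero before and after (+1 for the electron, -1 for the antineutrino).
\[ 0 \;=\; 0 \;+\; (+1) \;+\; (-1) \]
```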

Integrable theories have those charges too, but they have more. In fact, they have an infinite number of conserved charges. As a result, you can show that in these theories it is impossible to produce new particles. It’s as if each particle’s existence is its own kind of conserved charge, one that can never be created or destroyed, so that each collision just rearranges the particles, never makes new ones.

But while we can write down these theories, we know they can’t describe the whole of the real world. In an integrable theory, when you build things up from the fundamental building-blocks, their energies follow a pattern. Compare the energies of a bunch of different combinations, and the gaps between them show a characteristic kind of statistical behavior called a Poisson distribution.

Look at the gaps between the energy levels of atomic nuclei, and you’ll find a very different kind of behavior. It’s called a Wigner-Dyson distribution, and it indicates the opposite of integrability: chaos. Chaos is behavior that can’t be “solved” the way integrable theories can, behavior that has to be approached with simulations and approximations.
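
To see the contrast concretely, here’s a minimal numerical sketch (my own toy example, not a real nuclear spectrum): gaps between independently placed levels follow a Poisson distribution and pile up near zero, while gaps between eigenvalues of a random matrix, the usual stand-in for a chaotic spectrum, show the “level repulsion” characteristic of Wigner-Dyson statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000

# "Integrable" stand-in: independent random levels, whose gaps follow a Poisson distribution.
poisson_levels = np.sort(rng.uniform(0, N, size=N))

# "Chaotic" stand-in: eigenvalues of a random symmetric (GOE) matrix, whose gaps follow Wigner-Dyson.
A = rng.normal(size=(N, N))
goe_levels = np.linalg.eigvalsh((A + A.T) / np.sqrt(2 * N))

def normalized_gaps(levels):
    """Nearest-neighbour gaps from the middle of the spectrum, rescaled to mean 1."""
    gaps = np.diff(np.sort(levels))
    bulk = gaps[len(gaps) // 4 : 3 * len(gaps) // 4]   # crude unfolding: stay in the bulk
    return bulk / bulk.mean()

for name, levels in [("Poisson (integrable-like)", poisson_levels),
                     ("Wigner-Dyson (chaotic)", goe_levels)]:
    s = normalized_gaps(levels)
    # Poisson gaps pile up near zero; Wigner-Dyson gaps avoid it ("level repulsion").
    print(f"{name}: fraction of gaps smaller than 0.1 = {np.mean(s < 0.1):.3f}")
```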

So if you really want there to be un-changing building-blocks, if you think that’s really essential? Then you should probably start looking at integrable theories. But I wouldn’t hold my breath if I were you: the real world seems pretty clearly chaotic, not integrable. And probably, that means particle production is here to stay.

Lack of Recognition Is a Symptom, Not a Cause

Science is all about being first. Once a discovery has been made, discovering the same thing again is redundant. At best, you can improve the statistical evidence…but for a theorem or a concept, you don’t even have that. This is why we make such a big deal about priority: the first person to discover something did something very valuable. The second, no matter how much effort and insight went into their work, did not.

Because priority matters, for every big scientific discovery there is a priority dispute. Read about science’s greatest hits, and you’ll find people who were left in the wings despite their accomplishments, people who arguably found key ideas and key discoveries earlier than the people who ended up famous. That’s why the idea Peter Higgs is best known for, the Higgs mechanism,

“is therefore also called the Brout–Englert–Higgs mechanism, or Englert–Brout–Higgs–Guralnik–Hagen–Kibble mechanism, Anderson–Higgs mechanism, Anderson–Higgs–Kibble mechanism, Higgs–Kibble mechanism by Abdus Salam and ABEGHHK’tH mechanism (for Anderson, Brout, Englert, Guralnik, Hagen, Higgs, Kibble, and ‘t Hooft) by Peter Higgs.”

Those who don’t get the fame don’t get the rewards. The scientists who get less recognition than they deserve get fewer grants and worse positions, losing out on the career outcomes that the person famous for the discovery gets, even if the less-recognized scientist made the discovery first.

…at least, that’s the usual story.

You can start to see the problem when you notice a contradiction: if a discovery has already been made, what would bring someone to re-make it?

Sometimes, people actually “steal” discoveries, finding something that isn’t widely known and re-publishing it without acknowledging the author. More often, though, the re-discoverer genuinely didn’t know. That’s because, in the real world, we don’t all know about a discovery as soon as it’s made. It has to be communicated.

At minimum, this means you need enough time to finish ironing out the kinks of your idea, write up a paper, and disseminate it. In the days before the internet, dissemination might involve mailing pre-prints to universities across the ocean. It’s relatively easy, in such a world, for two people to get started discovering the same thing, write it up, and even publish it before they learn about the other person’s work.

Sometimes, though, something gets rediscovered long after the original paper should have been available. In those cases, the problem isn’t time, it’s reach. Maybe the original paper was written in a way that hid its implications. Maybe it was published in a way only accessible to a smaller community: either a smaller part of the world, like papers that were only available to researchers in the USSR, or a smaller research community. Maybe the time hadn’t come yet, and the whole reason why the result mattered had yet to really materialize.

For a result like that, a lack of citations isn’t really the problem. Rather than someone who struggles because their work is overlooked, these are people whose work is overlooked, in a sense, because they are struggling: because their work is having a smaller impact on the work of others. Acknowledging them later can do something, but it can’t change the fact that this was work published for a smaller community, yielding smaller rewards.

And ultimately, it isn’t just priority we care about, but impact. While the first European to make contact with the New World might have been Erik the Red, we don’t call the massive exchange of plants and animals between the Old and New World the “Red Exchange”. Erik the Red being “first” matters much less, historically speaking, than Columbus changing the world. Similarly, in science, being the first to discover something is meaningless if that discovery doesn’t change how other people do science, and the person who manages to cause that change is much more valuable than someone who does the same work but doesn’t manage the same reach.

Am I claiming that it’s fair when scientists get famous for other peoples’ discoveries? No, it’s definitely not fair. It’s not fair because most of the reasons one might have lesser reach aren’t under one’s control. Soviet scientists (for the most part) didn’t choose to be based in the USSR. People who make discoveries before they become relevant don’t choose the time in which they were born. And while you can get better at self-promotion with practice, there’s a limited extent to which often-reclusive scientists should be blamed for their lack of social skills.

What I am claiming is that addressing this isn’t a matter of scrupulously citing the “original” discoverer after the fact. That’s a patch, and a weak one. If we want to get science closer to the ideal, where each discovery only has to be made once, then we need to work to increase reach for everyone. That means finding ways to speed up publication, to let people quickly communicate preliminary ideas with a wide audience and change the incentives so people aren’t penalized when others take up those ideas. It means enabling conversations between different fields and sub-fields, building shared vocabulary and opportunities for dialogue. It means making a community that rewards in-person hand-shaking less and careful online documentation more, so that recognition isn’t limited to the people with the money to go to conferences and the social skills to schmooze their way through them. It means anonymity when possible, and openness when we can get away with it.

Lack of recognition and redundant effort are both bad, and they both stem from the same failures to communicate. Instead of fighting about who deserves fame, we should work to make sure that science is truly global and truly universal. We can aim for a future where no-one’s contribution goes unrecognized, and where anything that is known to one is known to all.

Congratulations to John Hopfield and Geoffrey Hinton!

The 2024 Physics Nobel Prize was announced this week, awarded to John Hopfield and Geoffrey Hinton for using physics to propose foundational ideas in the artificial neural networks used for machine learning.

If the picture above looks off-center, it’s because this is the first time since 2015 that the Physics Nobel has been given to two, rather than three, people. Since several past prizes bundled together disparate ideas in order to make a full group of three, it’s noteworthy that this year the committee decided that each of these people deserved 1/2 the prize amount, without trying to find one more person to water it down further.

Hopfield was trained as a physicist, working in the broad area known as “condensed matter physics”. Condensed matter physicists use physics to describe materials, from semiconductors to crystals to glass. Over the years, Hopfield started using this training less for the traditional subject matter of the field and more to study the properties of living systems. He moved from a position in the physics department of Princeton to chemistry and biology at Caltech. While at Caltech he started studying neuroscience and proposed what are now known as Hopfield networks as a model for how neurons store memory. Hopfield networks have very similar properties to a more traditional condensed matter system called a “spin glass”, and from what he knew about those systems Hopfield could make predictions for how his networks would behave. Those networks would go on to be a major inspiration for the artificial neural networks used for machine learning today.

Hinton was not trained as a physicist, and in fact has said that he didn’t pursue physics in school because the math was too hard! Instead, he got a bachelor’s degree in psychology, and a PhD in the then-nascent field of artificial intelligence. In the 1980s, shortly after Hopfield published his network, Hinton proposed a network inspired by a closely related area of physics, one that describes temperature in terms of the statistics of moving particles. His network, called a Boltzmann machine, would be modified and made more efficient over the years, eventually becoming a key part of how artificial neural networks are “trained”.

These people obviously did something impressive. Was it physics?

In 2014, the Nobel prize in physics was awarded to the people who developed blue LEDs. Some of these people were trained as physicists, some weren’t: Wikipedia describes them as engineers. At the time, I argued that this was fine, because these people were doing “something physicists are good at”, studying the properties of a physical system. Ultimately, the thing that ties together different areas of physics is training: physicists are the people who study under other physicists, and go on to collaborate with other physicists. That can evolve in unexpected directions, from more mathematical research to touching on biology and social science…but as long as the work benefits from being linked to physics departments and physics degrees, it makes sense to say it “counts as physics”.

By that logic, we can probably call Hopfield’s work physics. Hinton’s case is less clear-cut: his work was inspired by a physical system, but so are other ideas in computer science, like simulated annealing. Other ideas, like genetic algorithms, are inspired by biological systems: does that mean they count as biology?

Then there’s the question of the Nobel itself. If you want to get a Nobel in physics, it usually isn’t enough to transform the field. Your idea has to actually be tested against nature. Theoretical physics is its own discipline, with several ideas that have had an enormous influence on how people investigate new theories, ideas which have never gotten Nobels because the ideas were not intended, by themselves, to describe the real world. Hopfield networks and Boltzmann machines, similarly, do not exist as physical systems in the real world. They exist as computer simulations, and it is those computer simulations that are useful. But one can simulate many ideas in physics, and that doesn’t tend to be enough by itself to get a Nobel.

Ultimately, though, I don’t think this way of thinking about things is helpful. The Nobel isn’t capable of being “fair”, there’s no objective standard for Nobel-worthiness, and not much reason for there to be. The Nobel doesn’t determine which new research gets funded, nor does it incentivize anyone (except maybe Brian Keating). Instead, I think the best way of thinking about the Nobel these days is a bit like Disney.

When Disney was young, its movies had to stand or fall on their own merits. Now, with so many iconic movies in its history, Disney movies are received in the context of that history. Movies like Frozen or Moana aren’t just trying to be a good movie by themselves, they’re trying to be a Disney movie, with all that entails.

Similarly, when the Nobel was young, it was just another award, trying to reward things that Alfred Nobel might have thought deserved rewarding. Now, though, each Nobel prize is expected to be “Nobel-like”, an analogy between each laureate and the laureates of the past. When new people are given Nobels the committee is on some level consciously telling a story, saying that these people fit into the prize’s history.

This year, the Nobel committee clearly wanted to say something about AI. There is no Nobel prize for computer science, or even a Nobel prize for mathematics. (Hinton already has the Turing award, the most prestigious award in computer science.) So to say something about AI, the Nobel committee gave awards in other fields. In addition to physics, this year’s chemistry award went in part to the people behind AlphaFold2, a machine learning tool to predict what shapes proteins fold into. For both prizes, the committee had a reasonable justification. AlphaFold2 genuinely is an amazing advance in the chemistry of proteins, a research tool like nothing that came before. And the work of Hopfield and Hinton did lead ideas in physics to have an enormous impact on the world, an impact that is worth recognizing. Ultimately, though, whether or not these people should have gotten the Nobel doesn’t depend on that justification. It’s an aesthetic decision, one that (unlike Disney’s baffling decision to make live-action remakes of their most famous movies) doesn’t even need to impress customers. It’s a question of whether the choice is “Nobel-ish” enough, according to the tastes of the Nobel committee. The Nobel is essentially expensive fanfiction of itself.

And honestly? That’s fine. I don’t think there’s anything else they could be doing at this point.

At Quanta This Week, With a Piece on Multiple Imputation

I’ve got another piece in Quanta Magazine this week.

While my past articles in Quanta have been about physics, this time I’m stretching my science journalism muscles in a new direction. I was chatting with a friend who works for a pharmaceutical company, and he told me about a statistical technique that sounded ridiculous. Luckily, he’s a patient person, and after annoying him and a statistician family member for a while I understood that the technique actually made sense. Since I love sharing counterintuitive facts, I thought this would be a great story to share with Quanta’s readers. I then tracked down more statisticians, and annoyed them in a more professional way, finally resulting in the Quanta piece.

The technique is called multiple imputation, and is a way to deal with missing data. By filling in (“imputing”) missing information with good enough guesses, you can treat a dataset with missing data as if it was complete. If you do this imputation multiple times with the help of a source of randomness, you can also model how uncertain those guesses are, so your final statistical estimates are as uncertain as they ought to be. That, in a nutshell, is multiple imputation.
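
To make that concrete, here is a minimal sketch of the whole loop on an invented toy dataset (all names, sizes, and models here are my own illustration, not anything from the Quanta piece): impute the missing values several times with added noise, estimate something from each completed dataset, then pool the estimates and their uncertainty using Rubin’s rules.

```python
import numpy as np

rng = np.random.default_rng(42)
n, M = 500, 20                       # sample size, number of imputations

# Toy data: y depends on a fully observed covariate x, and ~30% of y is missing.
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
missing = rng.random(n) < 0.3
y_obs = np.where(missing, np.nan, y)
obs = ~missing

# Fit a simple regression of y on x using the complete cases,
# to get "good enough guesses" for the missing values.
slope, intercept = np.polyfit(x[obs], y_obs[obs], 1)
resid_sd = np.std(y_obs[obs] - (slope * x[obs] + intercept))

estimates, variances = [], []
for _ in range(M):
    y_imp = y_obs.copy()
    # Impute each missing y as its predicted value plus random noise, so the
    # filled-in values reflect our uncertainty rather than pretending we know them.
    # (A fully "proper" version would also redraw the regression coefficients
    # each round; this sketch keeps them fixed for brevity.)
    y_imp[missing] = (slope * x[missing] + intercept
                      + rng.normal(scale=resid_sd, size=missing.sum()))
    estimates.append(y_imp.mean())               # the quantity we care about
    variances.append(y_imp.var(ddof=1) / n)      # its variance within this completed dataset

# Pool with Rubin's rules: total variance = within-imputation + (1 + 1/M) * between-imputation.
Q = np.mean(estimates)
W = np.mean(variances)
B = np.var(estimates, ddof=1)
T = W + (1 + 1 / M) * B
print(f"pooled estimate of the mean of y: {Q:.3f} +/- {np.sqrt(T):.3f}")
```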

In the piece, I try to cover the key points: how the technique came to be, how it spread, and why people use it. To complement that, in this post I wanted to get a little bit closer to the technical details, and say a bit about why some of the workarounds a naive physicist would come up with don’t actually work.

If you’re anything like me, multiple imputation sounds like a very weird way to deal with missing data. In order to fill in missing data, you have to use statistical techniques to find good guesses. Why can’t you just use the same techniques to analyze the data in the first place? And why do you have to use a random number generator to model your uncertainty, instead of just doing propagation of errors?

It turns out, you can sort of do both of these things. Full Information Maximum Likelihood is a method where you use all the data you have, and only the data you have, without imputing anything or throwing anything out. The catch is that you need a model, one with parameters you can try to find the most likely values for. Physicists usually do have a model like this (for example, the Standard Model), so I assumed everyone would. But for many things you want to measure in social science and medicine, you don’t have any such model, so multiple imputation ends up being more versatile in practice.

(If you want more detail on this, you need to read something written by actual statisticians. The aforementioned statistician family member has a website here that compares and contrasts multiple imputation with full information maximum likelihood.)
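
For comparison, here is an equally rough sketch of the full-information idea on the same kind of toy problem (again my own illustration, not anything from the piece, and not how dedicated statistics software does it): assume the data come from a bivariate normal model, and let each row contribute whatever likelihood its observed entries allow, with nothing imputed and nothing thrown away.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
missing = rng.random(n) < 0.3

def neg_loglik(params):
    # Parameters of the assumed bivariate normal model, transformed so the
    # optimizer can roam freely: means, log standard deviations, atanh(correlation).
    mx, my, log_sx, log_sy, atanh_r = params
    sx, sy = np.exp(log_sx), np.exp(log_sy)
    r = np.tanh(atanh_r)
    cov = np.array([[sx**2, r * sx * sy], [r * sx * sy, sy**2]])
    ll = 0.0
    # Complete rows contribute the joint density of (x, y)...
    xy = np.column_stack([x[~missing], y[~missing]])
    ll += stats.multivariate_normal(mean=[mx, my], cov=cov).logpdf(xy).sum()
    # ...while rows with y missing contribute only the marginal density of x.
    ll += stats.norm(loc=mx, scale=sx).logpdf(x[missing]).sum()
    return -ll

res = optimize.minimize(neg_loglik, x0=np.zeros(5), method="Nelder-Mead",
                        options={"maxiter": 5000, "maxfev": 5000})
print(f"FIML estimate of the mean of y: {res.x[1]:.3f}")
```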

What about the randomness? It turns out there is yet another technique, called Fractional Imputation. While multiple imputation randomly chooses different values to impute, fractional imputation gives each value a weight based on the chance for it to come up. This gives the same result…if you can compute the weights, and store all the results. The impression I’ve gotten is that people are working on this, but it isn’t very well-developed.

“Just do propagation of errors”, the thing I wanted to suggest as a physicist, is much less of an option. In many of these datasets, you don’t attribute errors to the base data points to begin with. And on the other hand, if you want to be more sophisticated, then something like propagation of errors is too naive. You have a variety of different variables, correlated with each other in different ways, giving a complicated multivariate distribution. Propagation of errors is already pretty fraught when you go beyond linear relationships (something they don’t tend to tell baby physicists); using it for this would be pushing it rather too far.
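
For reference, standard first-order propagation of errors looks like this, and already bakes in the assumption that the relationship is roughly linear over the scale of the uncertainties:

```latex
% First-order (linear) propagation of errors for f(x_1, ..., x_n),
% including the correlation terms that make the multivariate case messy:
\[ \sigma_f^{2} \;\approx\; \sum_{i}\left(\frac{\partial f}{\partial x_i}\right)^{2}\sigma_{x_i}^{2}
   \;+\; 2\sum_{i<j}\frac{\partial f}{\partial x_i}\,\frac{\partial f}{\partial x_j}\,\operatorname{cov}(x_i,x_j) \]
```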

The thing I next wanted to suggest, “just carry the distribution through the calculation”, turns out to relate to something I’ve called the “one philosophical problem of my sub-field”. In the area of physics I’ve worked in, a key question is what it means to have “done” an integral. Here, one can ask what it means to do a calculation on a distribution. In both cases, the end goal is to get numbers out: physics predictions on the one hand, statistical estimates on the other. You can get those numbers by “just” doing numerics, using randomness and approximations to estimate the number you’re interested in. And in a way, that’s all you can do. Any time you “just do the integral” or “just carry around the distribution”, the thing you get in the end is some function: it could be a well-understood function like a sine or log, or it could be an exotic function someone defined for that purpose. But whatever function you get, you get numbers out of it the same way. A sine or a log, on a computer, is just an approximation scheme, a program that outputs numbers.

(But we do still care about analytic results, we don’t “just” do numerics. That’s because understanding the analytics helps us do numerics better, we can get more precise numbers faster and more stably. If you’re just carrying around some arbitrarily wiggly distribution, it’s not clear you can do that.)

So at this point, I get it. I’m still curious to see how Fractional Imputation develops, and when I do have an actual model I’d lean toward using Full Information Maximum Likelihood instead. (And there are probably some other caveats I may need to learn at some point!) But I’m comfortable with the idea that Multiple Imputation makes sense for the people using it.