The Nowhere String

Space and time seem as fundamental as anything can get. Philosophers like Immanuel Kant thought that they were inescapable, that we could not conceive of the world without space and time. But increasingly, physicists suspect that space and time are not as fundamental as they appear. When they try to construct a theory of quantum gravity, physicists find puzzles, paradoxes that suggest that space and time may just be approximations to a more fundamental underlying reality.

One piece of evidence that quantum gravity researchers point to is the existence of dualities. These are pairs of theories that seem to describe different situations, including with different numbers of dimensions, but that are secretly indistinguishable, connected by a “dictionary” that lets you interpret any observation in one world in terms of an equivalent observation in the other world. By itself, duality doesn’t mean that space and time aren’t fundamental: as I explained in a blog post a few years ago, it could still be that one “side” of the duality is a true description of space and time, and the other is just a mathematical illusion. To show definitively that space and time are not fundamental, you would want to find a situation where they “break down”, where you can go from a theory that has space and time to a theory that doesn’t. Ideally, you’d want a physical means of going between them: some kind of quantum field that, as it shifts, changes the world between space-time and not space-time.

What I didn’t know when I wrote that post was that physicists already knew about such a situation in 1993.

Back when I was in pre-school, famous string theorist Edward Witten was trying to understand something that others had described as a duality, and realized there was something more going on.

In string theory, particles are described by lengths of vibrating string. In practice, string theorists like to think about what it’s like to live on the string itself, seeing it vibrate. In that world, there are two dimensions, one space dimension back and forth along the string and one time dimension going into the future. To describe the vibrations of the string in that world, string theorists use the same kind of theory that people use to describe physics in our world: a quantum field theory. In string theory, you have a two-dimensional quantum field theory stuck “inside” a theory with more dimensions describing our world. You see that this world exists by seeing the kinds of vibrations your two-dimensional world can have, through a type of quantum field called a scalar field. With ten scalar fields, ten different ways you can push energy into your stringy world, you can infer that the world around you is a space-time with ten dimensions.

String theory has “extra” dimensions beyond the three of space and one of time we’re used to, and these extra dimensions can be curled up in various ways to hide them from view, often using a type of shape called a Calabi-Yau manifold. In the late 80’s and early 90’s, string theorists had found a similarity between the two-dimensional quantum field theories you get folding string theory around some of these Calabi-Yau manifolds and another type of two-dimensional quantum field theory related to theories used to describe superconductors. People called the two types of theories dual, but Witten figured out there was something more going on.

Witten described the two types of theories in the same framework, and showed that they weren’t two equivalent descriptions of the same world. Rather, they were two different ways one theory could behave.

The two behaviors were connected by something physical: the value of a quantum field called a modulus field. This field can be described by a number, and that number can be positive or negative.

When the modulus field is a large positive number, then the theory behaves like string theory twisted around a Calabi-Yau manifold. In particular, the scalar fields have many different values they can take, values that are smoothly related to each other. These values are nothing more or less than the position of the string in space and time. Because the scalars can take many values, the string can sit in many different places, and because the values are smoothly related to each other, the string can smoothly move from one place to another.

When the modulus field is a large negative number, then the theory is very different. What people thought of as the other side of the duality, a theory like the theories used to describe superconductors, is the theory that describes what happens when the modulus field is large and negative. In this theory, the scalars can no longer take many values. Instead, they have one option, one stable solution. That means that instead of there being many different places the string could sit, describing space, there are no different places, and thus no space. The string lives nowhere.
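For readers who want a bit more detail, the two behaviors can be sketched with a schematic potential energy of the kind used in this sort of two-dimensional model. This is an illustrative sketch under assumptions, not a formula quoted in this post: the symbols $s_i$ (the scalar fields), $p$ (one extra field), $r$ (the value of the modulus), the integer $n$, the coupling $e$, and the polynomial $G$ (assumed to be chosen so that its derivatives all vanish only at $s = 0$) are labels picked here for illustration.

```latex
U = \frac{e^{2}}{2}\Big(\sum_i |s_i|^{2} - n\,|p|^{2} - r\Big)^{2}
    + |G(s)|^{2} + |p|^{2}\sum_i \Big|\frac{\partial G}{\partial s_i}\Big|^{2}

% r \gg 0:  U = 0 requires \sum_i |s_i|^2 = r, with p = 0 and G(s) = 0
%           -> a continuous family of solutions (many "places")
% r \ll 0:  U = 0 requires |p|^2 = -r/n and \partial G / \partial s_i = 0, so s_i = 0
%           -> a single solution (no "places")
```

In this sketch, when $r$ is large and positive the lowest-energy configurations form a whole smooth space of values for the scalars, while when $r$ is large and negative the scalars are pinned to a single value.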

These are two very different situations, one with space and one without. And they’re connected by something physical. You could imagine manipulating the modulus field, using other fields to funnel energy into it, pushing it back and forth from a world with space to a world of nowhere. Much more than the examples I was aware of, this is a super-clear example of a model where space is not fundamental, but where it can be manipulated, existing or not existing based on physical changes.

We don’t know whether a model like this describes the real world. But it’s gratifying to know that it can be written down, that there is a picture, in full mathematical detail, of how this kind of thing works. Hopefully, it makes the idea that space and time are not fundamental sound a bit more reasonable.

The “That’s Neat” Level

Everything we do, we do for someone.

The simplest things we do for ourselves. We grab that chocolate bar on the table and eat it, and it makes us happier.

Unless the chocolate bar is homemade, we probably paid money for it. We do other things, working for a living, to get the money to get those chocolate bars for ourselves.

(We also get chocolate bars for our loved ones, or for people we care about. Whether this is not in a sense also getting a chocolate bar for yourself is left as an exercise to the reader.)

What we do for the money, in turn, is driven by what would make someone else happier. Sometimes this is direct: you cut someone’s hair, they enjoy the breeze, they pay you, you enjoy the chocolate.

Other times, this gets mediated. You work in HR at a haircut chain. The shareholders want more money, to buy things like chocolate bars, so they vote for a board who wants to do what the shareholders want so as not to be in breach of contract and get fewer chocolate bars, so the board tells you to do things they believe will achieve that, and you do them because that’s how you get your chocolate bars. Every so often, the shareholders take a look at how many chocolate bars they can afford and adjust.

Compared to all this, academia is weirdly un-mediated.

It gets the closest to this model with students. Students want to learn certain things because they will allow them to provide other people with better services in future, which they can use to buy chocolate bars, and other things for the sheer pleasure, a neat experience almost comparable to a chocolate bar. People running universities want more money from students so they can spend it on things like giant statues of chocolate bars, so they instruct people working in the university to teach more of the things students want. (Typically in a very indirect way, for example funding a department in the US based on number of majors rather than number of students.)

But there’s a big chunk of academics whose performance is mostly judged not by their teaching, but by their research. They are paid salaries by departments based on the past quality of their research, or paid out of grants awarded based on the expected future quality of their research. (Or to combine them, paid salaries by departments based on the expected size of their grants.)

And in principle, that introduces many layers of mediation. The research universities and grant agencies are funded by governments, which pool money together in the expectation that someday by doing so they will bring about a world where more people can eat chocolate bars.

But the potential to bring about a world of increased chocolate bars isn’t like maximizing shareholder value. Nobody can check, one year later, how much closer you are to the science-fueled chocolate bar utopia.

And so in practice, in science, people fund you because they think what you’re doing is neat. Because it scratches the chocolate-bar-shaped hole in their brains. They might have some narrative about how your work could lead to the chocolate bar utopia the government is asking for, but it’s not like they’re calculating the expected distribution of chocolate bars if they fund your project versus another. You have to convince a human being, not that you are doing something instrumentally and measurably useful…but that you are doing something cool.

And that makes us very weird people! Halfway between haircuts and HR, selling a chocolate bar that promises to be something more.

Replacing Space-Time With the Space in Your Eyes

Nima Arkani-Hamed thinks space-time is doomed.

That doesn’t mean he thinks it’s about to be destroyed by a supervillain. Rather, Nima, like many physicists, thinks that space and time are just approximations to a deeper reality. In order to make sense of gravity in a quantum world, seemingly fundamental ideas, like that particles move through particular places at particular times, will probably need to become more flexible.

But while most people who think space-time is doomed research quantum gravity, Nima’s path is different. Nima has been studying scattering amplitudes, formulas used by particle physicists to predict how likely particles are to collide in particular ways. He has been trying to find ways to calculate these scattering amplitudes without referring directly to particles traveling through space and time. In the long run, the hope is that knowing how to do these calculations will help suggest new theories beyond particle physics, theories that can’t be described with space and time at all.

Ten years ago, Nima figured out how to do this in a particular theory, one that doesn’t describe the real world. For that theory he was able to find a new picture of how to calculate scattering amplitudes based on a combinatorial, geometric space with no reference to particles traveling through space-time. He gave this space the catchy name “the amplituhedron”. In the years since, he found a few other “hedra” describing different theories.

Now, he’s got a new approach. The new approach doesn’t have the same kind of catchy name: people sometimes call it surfaceology, or curve integral formalism. Like the amplituhedron, it involves concepts from combinatorics and geometry. It isn’t quite as “pure” as the amplituhedron: it uses a bit more from ordinary particle physics, and while it avoids specific paths in space-time it does care about the shape of those paths. Still, it has one big advantage: unlike the amplituhedron, Nima’s new approach looks like it can work for at least a few of the theories that actually describe the real world.

The amplituhedron was mysterious. Instead of space and time, it described the world in terms of a geometric space whose meaning was unclear. Nima’s new approach also describes the world in terms of a geometric space, but this space’s meaning is much clearer.

The space is called “kinematic space”. That probably still sounds mysterious. “Kinematic” in physics refers to motion. In the beginning of a physics class when you study velocity and acceleration before you’ve introduced a single force, you’re studying kinematics. In particle physics, kinematic refers to the motion of the particles you detect. If you see an electron going up and to the right at a tenth the speed of light, those are its kinematics.

Kinematic space, then, is the space of observations. By saying that his approach is based on ideas in kinematic space, what Nima is saying is that it describes colliding particles not based on what they might be doing before they’re detected, but on mathematics that asks questions only about facts about the particles that can be observed.

(For the experts: this isn’t quite true, because he still needs a concept of loop momenta. He’s getting the actual integrands from his approach, rather than the dual definition he got from the amplituhedron. But he does still have to integrate one way or another.)

Quantum mechanics famously has many interpretations. In my experience, Nima’s favorite interpretation is the one known as “shut up and calculate”. Instead of arguing about the nature of an indeterminately philosophical “real world”, Nima thinks quantum physics is a tool to calculate things people can observe in experiments, and that’s the part we should care about.

From a practical perspective, I agree with him. And I think if you have this perspective, then ultimately, kinematic space is where your theories have to live. Kinematic space is nothing more or less than the space of observations, the space defined by where things land in your detectors, or if you’re a human and not a collider, in your eyes. If you want to strip away all the speculation about the nature of reality, this is all that is left over. Any theory, of any reality, will have to be described in this way. So if you think reality might need a totally new weird theory, it makes sense to approach things like Nima does, and start with the one thing that will always remain: observations.

I Ain’t Afraid of No-Ghost Theorems

In honor of Halloween this week, let me say a bit about the spookiest term in physics: ghosts.

In particle physics, we talk about the universe in terms of quantum fields. There is an electron field for electrons, a gluon field for gluons, a Higgs field for Higgs bosons. The simplest fields, for the simplest particles, can be described in terms of just a single number at each point in space and time, a value describing how strong the field is. More complicated fields require more numbers.

Most of the fundamental forces have what we call vector fields. They’re called this because they are often described with vectors, lists of numbers that identify a direction in space and time. But these vectors actually contain too many numbers.

These extra numbers have to be tidied up in some way in order to describe vector fields in the real world, like the electromagnetic field or the gluon field of the strong nuclear force. There are a number of tricks, but the nicest is usually to add some extra particles called ghosts. Ghosts are designed to cancel out the extra numbers in a vector, leaving the right description for a vector field. They’re set up mathematically such that they can never be observed, they’re just a mathematical trick.

Mathematical tricks aren’t all that spooky (unless you’re scared of mathematics itself, anyway). But in physics, ghosts can take on a spookier role as well.

In order to do their job cancelling those numbers, ghosts need to function as a kind of opposite to a normal particle, a sort of undead particle. Normal particles have kinetic energy: as they go faster and faster, they have more and more energy. Said another way, it takes more and more energy to make them go faster. Ghosts have negative kinetic energy: the faster they go, the less energy they have.

If ghosts are just a mathematical trick, that’s fine, they’ll do their job and cancel out what they’re supposed to. But sometimes, physicists accidentally write down a theory where the ghosts aren’t just a trick cancelling something out, but real particles you could detect, without anything to hide them away.

In a theory where ghosts really exist, the universe stops making sense. The universe defaults to the lowest energy it can reach. If making a ghost particle go faster reduces its energy, then the universe will make ghost particles go faster and faster, and make more and more ghost particles, until everything is jam-packed with super-speedy ghosts unto infinity, never-ending because it’s always possible to reduce the energy by adding more ghosts.
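A schematic way to see the runaway, using the ordinary high-school formula for kinetic energy as a stand-in (an assumption made here for illustration; the real statement is about quantum fields, and the symbols $m$ and $v$ for mass and speed are not from the discussion above):

```latex
E_{\text{normal}}(v) = +\tfrac{1}{2} m v^{2}, \qquad
E_{\text{ghost}}(v) = -\tfrac{1}{2} m v^{2}, \qquad
\frac{dE_{\text{ghost}}}{dv} = -m v < 0 \quad (v > 0)
```

For the normal particle the lowest energy is at $v = 0$. For the ghost there is no lowest energy at all: speeding up always lowers it further.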

The absence of ghosts, then, is a requirement for a sensible theory. People prove theorems showing that their new ideas don’t create ghosts. And if your theory does start seeing ghosts…well, that’s the spookiest omen of all: an omen that your theory is wrong.

Transforming Particles Are Probably Here to Stay

It can be tempting to imagine the world in terms of lego-like building-blocks. Atoms stick together protons, neutrons, and electrons, and protons and neutrons are made of stuck-together quarks in turn. And while atoms, despite the name, aren’t indivisible, you might think that if you look small enough you’ll find indivisible, unchanging pieces, the smallest building-blocks of reality.

Part of that is true. We might, at some point, find the smallest pieces, the things everything else is made of. (In a sense, it’s quite likely we’ve already found them!) But those pieces don’t behave like lego blocks. They aren’t indivisible and unchanging.

Instead, particles, even the most fundamental particles, transform! The most familiar example is beta decay, a radioactive process where a neutron turns into a proton, emitting an electron and a neutrino. This process can be explained in terms of more fundamental particles: the neutron is made of three quarks, and one of those quarks transforms from a “down quark” to an “up quark”. But the explanation, as far as we can tell, doesn’t go any deeper. Quarks aren’t unchanging, they transform.

[Figure: Beta decay! Ignore the W, which is important but not for this post.]

There’s a suggestion I keep hearing, both from curious amateurs and from dedicated crackpots: why doesn’t this mean that quarks have parts? If a down quark can turn into an up quark, an electron, and a neutrino, then why doesn’t that mean that a down quark contains an up quark, an electron, and a neutrino?

The simplest reason is that this isn’t the only way a quark transforms. You can also have beta-plus decay, where an up quark transforms into a down quark, emitting a neutrino and the positively charged anti-particle of the electron, called a positron.

[Figure: Beta-plus decay. Also, ignore the directions of the arrows, that’s weird particle physics notation that doesn’t matter here.]

So to make your idea work, you’d somehow need each down quark to contain an up quark plus some other particles, and each up quark to contain a down quark plus some other particles.

Can you figure out some complicated scheme that works like that? Maybe. But there’s a deeper reason why this is the wrong path.

Transforming particles are part of a broader phenomenon, called particle production. Reactions in particle physics can produce new particles that weren’t there before. This wasn’t part of the earliest theories of quantum mechanics that described one electron at a time. But if you want to consider the quantum properties of not just electrons, but the electric field as well, then you need a more complete theory, called a quantum field theory. And in those theories, you can produce new particles. It’s as simple as turning on the lights: from a wiggling electron, you make light, which in a fully quantum theory is made up of photons. Those photons weren’t “part of” the electron to start with, they are produced by its motion.

If you want to avoid transforming particles, to describe everything in terms of lego-like building-blocks, then you want to avoid particle production altogether. Can you do this in a quantum field theory?

Actually, yes! But your theory won’t describe the whole of the real world.

In physics, we have examples of theories that don’t have particle production. These example theories have a property called integrability. They are theories we can “solve”, doing calculations that aren’t possible in ordinary theories. The name comes from the fact that the oldest such theories, in classical mechanics, were solved using integrals.

Normal particle physics theories have conserved charges. Beta decay conserves electric charge: you start out with a neutral particle, and end up with one particle with positive charge and another with negative charge. It also conserves other things, like “electron-number” (the electron has electron-number one, the neutrino that comes out with it has electron-number minus one), energy, and momentum.
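The bookkeeping can be checked by hand, or with a few lines of Python. This is just a sketch of the arithmetic: the dictionary and function names are made up for illustration, and it labels the neutral particle emitted alongside the electron as an anti-neutrino, which is the stricter name for what is loosely called a neutrino above.

```python
from fractions import Fraction

# Electric charge (in units of the proton's charge) and "electron-number".
PARTICLES = {
    "neutron":       {"charge": Fraction(0),     "electron_number": 0},
    "proton":        {"charge": Fraction(1),     "electron_number": 0},
    "down quark":    {"charge": Fraction(-1, 3), "electron_number": 0},
    "up quark":      {"charge": Fraction(2, 3),  "electron_number": 0},
    "electron":      {"charge": Fraction(-1),    "electron_number": 1},
    "positron":      {"charge": Fraction(1),     "electron_number": -1},
    "neutrino":      {"charge": Fraction(0),     "electron_number": 1},
    "anti-neutrino": {"charge": Fraction(0),     "electron_number": -1},
}

def total(names, key):
    """Add up one kind of charge over a list of particle names."""
    return sum(PARTICLES[name][key] for name in names)

def is_conserved(before, after):
    """True if both charges have the same total before and after."""
    return all(total(before, key) == total(after, key)
               for key in ("charge", "electron_number"))

# Beta decay, for the whole neutron and for the quark inside it.
beta_neutron = (["neutron"], ["proton", "electron", "anti-neutrino"])
beta_quark = (["down quark"], ["up quark", "electron", "anti-neutrino"])
# Beta-plus decay at the quark level.
beta_plus = (["up quark"], ["down quark", "positron", "neutrino"])
# A made-up reaction that violates charge conservation.
forbidden = (["down quark"], ["up quark", "positron", "neutrino"])

for name, reaction in [("beta (neutron)", beta_neutron), ("beta (quark)", beta_quark),
                       ("beta-plus", beta_plus), ("forbidden", forbidden)]:
    print(name, is_conserved(*reaction))
```

The three real decays balance on both counts, while the made-up one fails because the electric charge doesn’t add up.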

Integrable theories have those charges too, but they have more. In fact, they have an infinite number of conserved charges. As a result, you can show that in these theories it is impossible to produce new particles. It’s as if each particle’s existence is its own kind of conserved charge, one that can never be created or destroyed, so that each collision just rearranges the particles, never makes new ones.

But while we can write down these theories, we know they can’t describe the whole of the real world. In an integrable theory, when you build things up from the fundamental building-blocks, their energy follows a pattern. Compare the energy of a bunch of different combinations, and you find a characteristic kind of statistical behavior called a Poisson distribution.

Look at the distribution of energies of nuclei of atoms, and you’ll find a very different kind of behavior. It’s called a Wigner-Dyson distribution, and it indicates the opposite of integrability: chaos. Chaos is behavior that can’t be “solved” like integrable theories, behavior that has to be approached by simulations and approximations.
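To get a feel for the difference between the two kinds of statistics, here is a toy sketch in Python. It is not nuclear data: as stand-ins, it assumes independent, randomly scattered energy levels for the Poisson case, and the two energy levels of random symmetric 2×2 matrices for the Wigner-Dyson case. All function names are invented for this example. The thing to compare is how often two neighboring levels land very close together.

```python
import math
import random

def poisson_spacings(n_levels, rng):
    """Gaps between independent, uniformly scattered levels, in units of the mean gap."""
    levels = sorted(rng.random() for _ in range(n_levels))
    gaps = [b - a for a, b in zip(levels, levels[1:])]
    mean = sum(gaps) / len(gaps)
    return [g / mean for g in gaps]

def wigner_spacings(n_samples, rng):
    """Gap between the two eigenvalues of random symmetric matrices [[a, b], [b, c]]."""
    gaps = []
    for _ in range(n_samples):
        a = rng.gauss(0.0, 1.0)
        c = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, math.sqrt(0.5))
        # Eigenvalues are (a + c)/2 +/- sqrt(((a - c)/2)^2 + b^2), so the gap is:
        gaps.append(math.sqrt((a - c) ** 2 + 4.0 * b ** 2))
    mean = sum(gaps) / len(gaps)
    return [g / mean for g in gaps]

def fraction_below(spacings, threshold):
    """Fraction of gaps smaller than the threshold (in units of the mean gap)."""
    return sum(1 for s in spacings if s < threshold) / len(spacings)

rng = random.Random(0)  # fixed seed so the run is repeatable
poisson = poisson_spacings(20001, rng)
wigner = wigner_spacings(20000, rng)

print("tiny gaps, independent levels:", fraction_below(poisson, 0.1))
print("tiny gaps, random matrices:   ", fraction_below(wigner, 0.1))
```

In the independent case, gaps smaller than a tenth of the average turn up close to ten percent of the time. In the random-matrix case they are far rarer: the levels “repel” each other, which is the hallmark of the chaotic, Wigner-Dyson behavior.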

So if you really want there to be un-changing building-blocks, if you think that’s really essential? Then you should probably start looking at integrable theories. But I wouldn’t hold my breath if I were you: the real world seems pretty clearly chaotic, not integrable. And probably, that means particle production is here to stay.

Lack of Recognition Is a Symptom, Not a Cause

Science is all about being first. Once a discovery has been made, discovering the same thing again is redundant. At best, you can improve the statistical evidence…but for a theorem or a concept, you don’t even have that. This is why we make such a big deal about priority: the first person to discover something did something very valuable. The second, no matter how much effort and insight went into their work, did not.

Because priority matters, for every big scientific discovery there is a priority dispute. Read about science’s greatest hits, and you’ll find people who were left in the wings despite their accomplishments, people who arguably found key ideas and key discoveries earlier than the people who ended up famous. That’s why the idea Peter Higgs is best known for, the Higgs mechanism,

“is therefore also called the Brout–Englert–Higgs mechanism, or Englert–Brout–Higgs–Guralnik–Hagen–Kibble mechanism, Anderson–Higgs mechanism, Anderson–Higgs–Kibble mechanism, Higgs–Kibble mechanism by Abdus Salam and ABEGHHK’tH mechanism (for Anderson, Brout, Englert, Guralnik, Hagen, Higgs, Kibble, and ‘t Hooft) by Peter Higgs.”

Those who don’t get the fame don’t get the rewards. The scientists who get less recognition than they deserve get fewer grants and worse positions, losing out on the career outcomes that the person famous for the discovery gets, even if the less-recognized scientist made the discovery first.

…at least, that’s the usual story.

You can start to see the problem when you notice a contradiction: if a discovery has already been made, what would bring someone to re-make it?

Sometimes, people actually “steal” discoveries, finding something that isn’t widely known and re-publishing it without acknowledging the author. More often, though, the re-discoverer genuinely didn’t know. That’s because, in the real world, we don’t all know about a discovery as soon as it’s made. It has to be communicated.

At minimum, this means you need enough time to finish ironing out the kinks of your idea, write up a paper, and disseminate it. In the days before the internet, dissemination might involve mailing pre-prints to universities across the ocean. It’s relatively easy, in such a world, for two people to get started discovering the same thing, write it up, and even publish it before they learn about the other person’s work.

Sometimes, though, something gets rediscovered long after the original paper should have been available. In those cases, the problem isn’t time, it’s reach. Maybe the original paper was written in a way that hid its implications. Maybe it was published in a way only accessible to a smaller community: either a smaller part of the world, like papers that were only available to researchers in the USSR, or a smaller research community. Maybe the time hadn’t come yet, and the whole reason why the result mattered had yet to really materialize.

For a result like that, a lack of citations isn’t really the problem. Rather than someone who struggles because their work is overlooked, these are people whose work is overlooked, in a sense, because they are struggling: because their work is having a smaller impact on the work of others. Acknowledging them later can do something, but it can’t change the fact that this was work published for a smaller community, yielding smaller rewards.

And ultimately, it isn’t just priority we care about, but impact. While the first European to make contact with the New World might have been Erik the Red, we don’t call the massive exchange of plants and animals between the Old and New World the “Red Exchange”. Erik the Red being “first” matters much less, historically speaking, than Columbus changing the world. Similarly, in science, being the first to discover something is meaningless if that discovery doesn’t change how other people do science, and the person who manages to cause that change is much more valuable than someone who does the same work but doesn’t manage the same reach.

Am I claiming that it’s fair when scientists get famous for other people’s discoveries? No, it’s definitely not fair. It’s not fair because most of the reasons one might have lesser reach aren’t under one’s control. Soviet scientists (for the most part) didn’t choose to be based in the USSR. People who make discoveries before they become relevant don’t choose the time in which they were born. And while you can get better at self-promotion with practice, there’s a limited extent to which often-reclusive scientists should be blamed for their lack of social skills.

What I am claiming is that addressing this isn’t a matter of scrupulously citing the “original” discoverer after the fact. That’s a patch, and a weak one. If we want to get science closer to the ideal, where each discovery only has to be made once, then we need to work to increase reach for everyone. That means finding ways to speed up publication, to let people quickly communicate preliminary ideas with a wide audience and change the incentives so people aren’t penalized when others take up those ideas. It means enabling conversations between different fields and sub-fields, building shared vocabulary and opportunities for dialogue. It means making a community that rewards in-person hand-shaking less and careful online documentation more, so that recognition isn’t limited to the people with the money to go to conferences and the social skills to schmooze their way through them. It means anonymity when possible, and openness when we can get away with it.

Lack of recognition and redundant effort are both bad, and they both stem from the same failures to communicate. Instead of fighting about who deserves fame, we should work to make sure that science is truly global and truly universal. We can aim for a future where no-one’s contribution goes unrecognized, and where anything that is known to one is known to all.

Congratulations to John Hopfield and Geoffrey Hinton!

The 2024 Physics Nobel Prize was announced this week, awarded to John Hopfield and Geoffrey Hinton for using physics to propose foundational ideas in the artificial neural networks used for machine learning.

This is the first time since 2015 that the Physics Nobel has been given to two, rather than three, people. Since several past prizes bundled together disparate ideas in order to make a full group of three, it’s noteworthy that this year the committee decided that each of these people deserved 1/2 the prize amount, without trying to find one more person to water it down further.

Hopfield was trained as a physicist, working in the broad area known as “condensed matter physics”. Condensed matter physicists use physics to describe materials, from semiconductors to crystals to glass. Over the years, Hopfield started using this training less for the traditional subject matter of the field and more to study the properties of living systems. He moved from a position in the physics department of Princeton to chemistry and biology at Caltech. While at Caltech he started studying neuroscience and proposed what are now known as Hopfield networks as a model for how neurons store memory. Hopfield networks have very similar properties to a more traditional condensed matter system called a “spin glass”, and from what he knew about those systems Hopfield could make predictions for how his networks would behave. Those networks would go on to be a major inspiration for the artificial neural networks used for machine learning today.
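To make the idea of a network that stores memories a bit more concrete, here is a minimal sketch of a Hopfield-style network in Python. It assumes the common textbook setup (units that are +1 or -1, a “Hebbian” rule for the connections, and one-at-a-time updates), which may differ in detail from Hopfield’s original formulation. The function names and the two tiny “memories” are made up for illustration.

```python
def train(patterns):
    """Hebbian rule: units that agree in a stored pattern get a positive connection."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no unit connects to itself
                    weights[i][j] += p[i] * p[j] / n
    return weights

def energy(weights, state):
    """Spin-glass-style energy: lower when units agree with their connections."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))

def recall(weights, state, max_sweeps=10):
    """Update units one at a time until nothing changes."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

memory_a = [1, 1, 1, 1, -1, -1, -1, -1]
memory_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([memory_a, memory_b])

noisy = list(memory_a)
noisy[0] = -noisy[0]  # corrupt one unit of the first memory
restored = recall(weights, noisy)

print("recovered the stored memory:", restored == memory_a)
print("energy before and after:", energy(weights, noisy), energy(weights, restored))
```

Starting from a corrupted copy of a memory, the network slides “downhill” in energy until it settles on the stored pattern, much as a physical system settles into a low-energy state.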

Hinton was not trained as a physicist, and in fact has said that he didn’t pursue physics in school because the math was too hard! Instead, he got a bachelor’s degree in psychology, and a PhD in the then-nascent field of artificial intelligence. In the 1980’s, shortly after Hopfield published his network, Hinton proposed a network inspired by a closely related area of physics, one that describes temperature in terms of the statistics of moving particles. His network, called a Boltzmann machine, would be modified and made more efficient over the years, eventually becoming a key part of how artificial neural networks are “trained”.
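For the curious, the physics connection can be written in one line. The sketch below uses the standard textbook form, with symbols chosen here for illustration: $s_i$ for the state of each unit, $w_{ij}$ for the connection strengths, $b_i$ for biases, and $T$ for a “temperature”.

```latex
E(s) = -\sum_{i<j} w_{ij}\, s_i s_j - \sum_i b_i s_i, \qquad
P(s) = \frac{e^{-E(s)/T}}{Z}, \qquad
Z = \sum_{s'} e^{-E(s')/T}
```

This is the same rule physicists use for a collection of particles in contact with a heat bath: low-energy configurations are more likely, and the temperature sets how strongly they are preferred.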

These people obviously did something impressive. Was it physics?

In 2014, the Nobel prize in physics was awarded to the people who developed blue LEDs. Some of these people were trained as physicists, some weren’t: Wikipedia describes them as engineers. At the time, I argued that this was fine, because these people were doing “something physicists are good at”, studying the properties of a physical system. Ultimately, the thing that ties together different areas of physics is training: physicists are the people who study under other physicists, and go on to collaborate with other physicists. That can evolve in unexpected directions, from more mathematical research to touching on biology and social science…but as long as the work benefits from being linked to physics departments and physics degrees, it makes sense to say it “counts as physics”.

By that logic, we can probably call Hopfield’s work physics. Hinton is more uncertain: his work was inspired by a physical system, but so are other ideas in computer science, like simulated annealing. Other ideas, like genetic algorithms, are inspired by biological systems: does that mean they count as biology?

Then there’s the question of the Nobel itself. If you want to get a Nobel in physics, it usually isn’t enough to transform the field. Your idea has to actually be tested against nature. Theoretical physics is its own discipline, with several ideas that have had an enormous influence on how people investigate new theories, ideas which have never gotten Nobels because the ideas were not intended, by themselves, to describe the real world. Hopfield networks and Boltzmann machines, similarly, do not exist as physical systems in the real world. They exist as computer simulations, and it is those computer simulations that are useful. But one can simulate many ideas in physics, and that doesn’t tend to be enough by itself to get a Nobel.

Ultimately, though, I don’t think this way of thinking about things is helpful. The Nobel isn’t capable of being “fair”, there’s no objective standard for Nobel-worthiness, and not much reason for there to be. The Nobel doesn’t determine which new research gets funded, nor does it incentivize anyone (except maybe Brian Keating). Instead, I think the best way of thinking about the Nobel these days is a bit like Disney.

When Disney was young, its movies had to stand or fall on their own merits. Now, with so many iconic movies in its history, Disney movies are received in the context of that history. Movies like Frozen or Moana aren’t just trying to be a good movie by themselves, they’re trying to be a Disney movie, with all that entails.

Similarly, when the Nobel was young, it was just another award, trying to reward things that Alfred Nobel might have thought deserved rewarding. Now, though, each Nobel prize is expected to be “Nobel-like”, an analogy between each laureate and the laureates of the past. When new people are given Nobels the committee is on some level consciously telling a story, saying that these people fit into the prize’s history.

This year, the Nobel committee clearly wanted to say something about AI. There is no Nobel prize for computer science, or even a Nobel prize for mathematics. (Hinton already has the Turing award, the most prestigious award in computer science.) So to say something about AI, the Nobel committee gave awards in other fields. In addition to physics, this year’s chemistry award went in part to the people behind AlphaFold2, a machine learning tool to predict what shapes proteins fold into. For both prizes, the committee had a reasonable justification. AlphaFold2 genuinely is an amazing advance in the chemistry of proteins, a research tool like nothing that came before. And the work of Hopfield and Hinton did lead ideas in physics to have an enormous impact on the world, an impact that is worth recognizing. Ultimately, though, whether or not these people should have gotten the Nobel doesn’t depend on that justification. It’s an aesthetic decision, one that (unlike Disney’s baffling decision to make live-action remakes of their most famous movies) doesn’t even need to impress customers. It’s a question of whether the action is “Nobel-ish” enough, according to the tastes of the Nobel committee. The Nobel is essentially expensive fanfiction of itself.

And honestly? That’s fine. I don’t think there’s anything else they could be doing at this point.

At Quanta This Week, With a Piece on Multiple Imputation

I’ve got another piece in Quanta Magazine this week.

While my past articles in Quanta have been about physics, this time I’m stretching my science journalism muscles in a new direction. I was chatting with a friend who works for a pharmaceutical company, and he told me about a statistical technique that sounded ridiculous. Luckily, he’s a patient person, and after annoying him and a statistician family member for a while I understood that the technique actually made sense. Since I love sharing counterintuitive facts, I thought this would be a great story to share with Quanta’s readers. I then tracked down more statisticians, and annoyed them in a more professional way, finally resulting in the Quanta piece.

The technique is called multiple imputation, and is a way to deal with missing data. By filling in (“imputing”) missing information with good enough guesses, you can treat a dataset with missing data as if it were complete. If you do this imputation multiple times with the help of a source of randomness, you can also model how uncertain those guesses are, so your final statistical estimates are as uncertain as they ought to be. That, in a nutshell, is multiple imputation.
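For the programmers in the audience, here is a minimal sketch of that recipe in Python. It is a toy, not what statisticians actually run: the imputation model (a normal distribution refit on a bootstrap resample of the observed values), the function name `pooled_mean`, and the example data are all my own assumptions. The combining step at the end is the standard one, usually called Rubin’s rules.

```python
import random
import statistics

def pooled_mean(values, n_imputations=20, seed=0):
    """Estimate the mean of `values` (None marks a missing entry) by multiple imputation."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    estimates = []   # the mean of each completed dataset
    variances = []   # the squared standard error of each of those means
    for _ in range(n_imputations):
        # Toy imputation model (an assumption): fit a normal distribution to a
        # bootstrap resample of the observed values, so that uncertainty in the
        # model itself shows up as variation between imputations.
        resample = rng.choices(observed, k=len(observed))
        mu = statistics.mean(resample)
        sigma = statistics.stdev(resample)
        # Fill in each missing entry with a random draw from that model.
        completed = [v if v is not None else rng.gauss(mu, sigma) for v in values]
        # Analyze the completed dataset as if nothing had been missing.
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    # Rubin's rules: average the estimates, and add the between-imputation
    # variance (slightly inflated) to the average within-imputation variance.
    within = statistics.mean(variances)
    between = statistics.variance(estimates)
    total = within + (1 + 1 / n_imputations) * between
    return statistics.mean(estimates), total ** 0.5

data = [4.1, None, 5.0, 3.8, None, 4.6, 5.3, None, 4.4, 4.9]
estimate, std_error = pooled_mean(data)
print(f"mean = {estimate:.2f} +/- {std_error:.2f}")
```

The “between” term is where the randomness earns its keep: it measures how much the answer moves when the guesses change, and that extra spread gets added to the final uncertainty.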

In the piece, I try to cover the key points: how the technique came to be, how it spread, and why people use it. To complement that, in this post I wanted to get a little bit closer to the technical details, and say a bit about why some of the workarounds a naive physicist would come up with don’t actually work.

If you’re anything like me, multiple imputation sounds like a very weird way to deal with missing data. In order to fill in missing data, you have to use statistical techniques to find good guesses. Why can’t you just use the same techniques to analyze the data in the first place? And why do you have to use a random number generator to model your uncertainty, instead of just doing propagation of errors?

It turns out, you can sort of do both of these things. Full Information Maximum Likelihood is a method where you use all the data you have, and only the data you have, without imputing anything or throwing anything out. The catch is that you need a model, one with parameters you can try to find the most likely values for. Physicists usually do have a model like this (for example, the Standard Model), so I assumed everyone would. But for many things you want to measure in social science and medicine, you don’t have any such model, so multiple imputation ends up being more versatile in practice.

(If you want more detail on this, you need to read something written by actual statisticians. The aforementioned statistician family member has a website here that compares and contrasts multiple imputation with full information maximum likelihood.)

What about the randomness? It turns out there is yet another technique, called Fractional Imputation. While multiple imputation randomly chooses different values to impute, fractional imputation gives each value a weight based on the chance for it to come up. This gives the same result…if you can compute the weights, and store all the results. The impression I’ve gotten is that people are working on this, but it isn’t very well-developed.
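As a toy illustration of the difference (my own, and much simpler than anything in the actual literature), suppose one entry is missing and a hypothetical imputation model says it takes one of three values with known probabilities. The names `fractional_mean` and `multiple_imputation_mean`, and all the numbers, are made up for the example.

```python
import random

observed = [2.0, 3.0, 5.0]
# Hypothetical imputation model for the single missing entry: value -> probability.
candidates = {1.0: 0.2, 4.0: 0.5, 6.0: 0.3}

def fractional_mean(observed, candidates):
    """Keep every candidate value as a fractional row, weighted by its probability."""
    total = sum(observed) + sum(value * weight for value, weight in candidates.items())
    count = len(observed) + sum(candidates.values())
    return total / count

def multiple_imputation_mean(observed, candidates, n_imputations, seed=0):
    """Draw one candidate at random per imputation, then average the resulting means."""
    rng = random.Random(seed)
    values, weights = list(candidates), list(candidates.values())
    means = []
    for _ in range(n_imputations):
        draw = rng.choices(values, weights=weights, k=1)[0]
        means.append((sum(observed) + draw) / (len(observed) + 1))
    return sum(means) / len(means)

exact = fractional_mean(observed, candidates)
sampled = multiple_imputation_mean(observed, candidates, n_imputations=10_000)
print(exact, sampled)
```

With enough random draws the two numbers agree. The weighted version gets there without any randomness, but only because this toy hands you the weights for free.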

“Just do propagation of errors”, the thing I wanted to suggest as a physicist, is much less of an option. In many of these datasets, you don’t attribute errors to the base data points to begin with. And on the other hand, if you want to be more sophisticated, then something like propagation of errors is too naive. You have a variety of different variables, correlated with each other in different ways, giving a complicated multivariate distribution. Propagation of errors is already pretty fraught when you go beyond linear relationships (something they don’t tend to tell baby physicists); using it for this would be pushing it rather too far.
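To see how quickly the textbook formula can go wrong, here is a small sketch with an example of my own choosing: squaring a quantity whose central value is zero. The function names are mine, and the “true” spread is estimated by brute-force sampling.

```python
import random
import statistics

def linear_propagation(f_prime, mean, sigma):
    """First-order propagation of errors: sigma_f is roughly |f'(mean)| * sigma."""
    return abs(f_prime(mean)) * sigma

def monte_carlo_spread(f, mean, sigma, n_samples=100_000, seed=0):
    """Estimate the spread of f(x) by sampling x from a normal distribution."""
    rng = random.Random(seed)
    samples = [f(rng.gauss(mean, sigma)) for _ in range(n_samples)]
    return statistics.pstdev(samples)

# f(x) = x^2, with x normally distributed around 0 with standard deviation 1.
naive = linear_propagation(lambda x: 2 * x, mean=0.0, sigma=1.0)
sampled = monte_carlo_spread(lambda x: x * x, mean=0.0, sigma=1.0)
print(naive, sampled)
```

The linear formula reports zero uncertainty, because the derivative vanishes at the central value. The sampled spread comes out close to the exact answer, the square root of 2 (about 1.41).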

The thing I next wanted to suggest, “just carry the distribution through the calculation”, turns out to relate to something I’ve called the “one philosophical problem of my sub-field”. In the area of physics I’ve worked in, a key question is what it means to have “done” an integral. Here, one can ask what it means to do a calculation on a distribution. In both cases, the end goal is to get numbers out: physics predictions on the one hand, statistical estimates on the other. You can get those numbers by “just” doing numerics, using randomness and approximations to estimate the number you’re interested in. And in a way, that’s all you can do. Any time you “just do the integral” or “just carry around the distribution”, the thing you get in the end is some function: it could be a well-understood function like a sine or log, or it could be an exotic function someone defined for that purpose. But whatever function you get, you get numbers out of it the same way. A sine or a log, on a computer, is just an approximation scheme, a program that outputs numbers.

(But we do still care about analytic results, we don’t “just” do numerics. That’s because understanding the analytics helps us do numerics better, we can get more precise numbers faster and more stably. If you’re just carrying around some arbitrarily wiggly distribution, it’s not clear you can do that.)

So at this point, I get it. I’m still curious to see how Fractional Imputation develops, and when I do have an actual model I’d lean toward using Full Information Maximum Likelihood instead. (And there are probably some other caveats I may need to learn at some point!) But I’m comfortable with the idea that Multiple Imputation makes sense for the people using it.

The Mistakes Are the Intelligence

There’s a lot of hype around large language models, the foundational technology behind services like ChatGPT. Representatives of OpenAI have stated that, in a few years, these models might have “PhD-level intelligence”. On the other hand, at the time, ChatGPT couldn’t count the number of letter “r”s in the word “strawberry”. The model and the setup around it have improved, and OpenAI’s o1 model apparently now gets the correct 3 “r”s…but I’m sure it makes other silly mistakes, mistakes an intelligent human would never make.

The mistakes made by large language models are important, due to the way those models are used. If people are going to use them for customer service, writing transcripts, or editing grammar, they don’t want to introduce obvious screwups. (Maybe this means they shouldn’t use the models this way!)

But the temptation is to go further, to say that these mistakes are proof that these models are, and will always be, dumb, not intelligent. And that’s not the right way to think about intelligence.

When we talk about intelligent people, when we think about measuring things like IQ, we’re looking at a collection of different traits. These traits typically go together in humans: a human who is good at one will usually be good at the others. But from the perspective of computer science, these traits are very different.

Intelligent people tend to be good at following complex instructions. They can remember more, and reason faster. They can hold a lot in their head at once, from positions of objects to vocabulary.

These are all things that computers, inherently, are very good at. When Turing wrote down his abstract description of a computer, he imagined a machine with infinite memory, able to follow any instructions with perfect fidelity. Nothing could live up to that ideal, but modern computers are much closer to it than humans. “Computer” used to be a job, with rooms full of people (often women) hired to do calculations for scientific projects. We don’t do that any more, machines have made that work superfluous.

What’s more, the kind of processing a Turing machine does is probably the only way to reliably answer questions. If you want to make sure you get the correct answer every time, then it seems that you can’t do better than to use a sufficiently powerful computer.

But while computer-the-machine replaced computer-the-job, mathematician-the-job still exists. And that’s because not all intelligence is about answering questions reliably.

Alexander Grothendieck was a famous mathematician, known for his deep insights and powerful ideas. According to legend, when giving a talk referring to prime numbers, someone in the audience asked him to name a specific prime. He named 57.

With a bit of work, any high-school student can figure out that 57, which equals 3 times 19, isn’t a prime number. A computer can easily figure out that 57 is not a prime number. Even ChatGPT knows that 57 is not a prime number.
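The “bit of work” here is trial division, which is about as reliable a rule as they come. A minimal sketch, with function names of my own:

```python
def smallest_factor(n):
    """Return the smallest factor of n greater than 1 (n itself when n is prime)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    d = 2
    # Any composite n has a factor no larger than its square root.
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def is_prime(n):
    return n >= 2 and smallest_factor(n) == n

factor = smallest_factor(57)
print(factor, 57 // factor)  # 3 19
print(is_prime(57))          # False
```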

But this doesn’t mean that Grothendieck was dumber than a high school student, or dumber than ChatGPT. Grothendieck was using a different kind of intelligence, the heuristic kind.

Heuristics are unreliable reasoning. They’re processes that get the right answer some of the time, but not all of the time. Because of that, though, they don’t have the same limits as reliable computer programs. Pick the right situation and the right conditions, and a heuristic can give you an answer faster than you could possibly get by following reliable rules.
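Primality happens to have a classic example of this trade-off. The sketch below is my own illustration (it has nothing to do with the Grothendieck story): a check based on Fermat’s little theorem, which takes only a handful of multiplications even for enormous numbers, never rejects a prime, but occasionally lets a composite number through.

```python
def fermat_probably_prime(n, base=2):
    """Fast heuristic: every prime passes, but so do a few composites."""
    if n < 4:
        return n in (2, 3)
    # Fermat's little theorem: if n is prime and does not divide base,
    # then base**(n - 1) leaves remainder 1 when divided by n.
    return pow(base, n - 1, n) == 1

print(fermat_probably_prime(97))   # True, and 97 really is prime
print(fermat_probably_prime(57))   # False, the shortcut catches 57
print(fermat_probably_prime(341))  # True, even though 341 == 11 * 31
```

Most of the time the shortcut agrees with the slow, reliable method. Pick the wrong number, like 341, and it confidently names a “prime” that isn’t one.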

Intelligent humans follow instructions well, but they also have good heuristics. They solve problems creatively, sometimes problems that are very hard for computers to address. People like Grothendieck make leaps of mathematical reasoning, guessing at the right argument before they have completely fleshed out a proof. This kind of intelligence is error-prone: rely on it, and you might claim 57 is prime. But at the moment, it’s our only intellectual advantage over machines.

Ultimately, ChatGPT is an advance in language processing, and language is a great example. Sentences don’t have definite meaning, we interpret what we read and hear in context, and sometimes our interpretation is wrong. Sometimes we hear words no-one actually said! It’s impossible, both for current technology and for the human brain, to process general text in a 100% reliable way. So large language models like GPT don’t do it reliably. They use an approximate model, a big complicated pile of rules tweaked over and over again until, enough of the time, they get the next word right in a text.

The kind of heuristic reasoning done by large language models is more effective than many people expected. Being able to predict the next word in a text unreliably also means you can write code unreliably, or count things unreliably, or do math unreliably. You can’t do any of these things as well as an appropriately-chosen human, at least not with current resources.

But in the longer run, heuristic intelligence is precisely the type of intelligence we should aspire to…or fear. Right now, we hire humans to do intellectual work because they have good heuristics. If we could build a machine with equivalent or better heuristics for those tasks, then people would hire a lot fewer humans. And if you’re worried about AI taking over the world, you’re worried about AI coming up with shortcuts to our civilization, tricks we couldn’t anticipate or plan against that destroy everything we care about. Those tricks can’t come from following rules: if they did, we could discover them just as easily. They would have to come from heuristics, sideways solutions that don’t work all the time but happen to work the one time that matters.

So yes, until the latest release, ChatGPT couldn’t tell you how many “r”s are in “strawberry”. Counting “r”s is something computers could already do, because it’s something that can be done by following reliable rules. It’s also something you can do easily, if you follow reliable rules. ChatGPT impresses people because it can do some of the things you do, that can’t be done with reliable rules. If technology like it has any chance of changing the world, those are the kinds of things it will have to be able to do.
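For the record, the reliable-rules version of the strawberry problem is about as short as programs get. A sketch, with a function name of my own:

```python
def count_letter(word, letter):
    """Count occurrences of a letter by checking every character in turn."""
    return sum(1 for ch in word.lower() if ch == letter.lower())

print(count_letter("strawberry", "r"))  # 3
```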

The Bystander Effect for Reviewers

I probably came off last week as a bit of an extreme “journal abolitionist”. This week, I wanted to give a couple caveats.

First, as a commenter pointed out, the main journals we use in my field are run by nonprofits. Physical Review Letters, the journal where we publish five-page papers about flashy results, is run by the American Physical Society. The Journal of High-Energy Physics, where we publish almost everything else, is run by SISSA, the International School for Advanced Studies in Trieste. (SISSA does use Springer, a regular for-profit publisher, to do the actual publishing.)

The journals are also funded collectively, something I pointed out here before but that might not have been obvious to readers of last week’s post. There is an agreement, SCOAP3, where research institutions band together to pay the journals. Authors don’t have to pay to publish, and individual libraries don’t have to pay for subscriptions.

And this is a lot better than the situation in other fields, yeah! Though I’d love to quantify how much. I haven’t been able to find a detailed breakdown, but SCOAP3 pays around 1200 EUR per article published. What I’d like to do (but not this week) is to compare this to what other fields pay, as well as to publishing that doesn’t have the same sort of trapped audience, and to online-only free journals like SciPost. (For example, publishing actual physical copies of journals at this point is sort of a vanity thing, so maybe we should compare costs to vanity publishers?)

Second, there’s reviewing itself. Even without traditional journals, one might still want to keep peer review.

What I wanted to understand last week was what peer review does right now, in my field. We read papers fresh off the arXiv, before they’ve gone through peer review. Authors aren’t forced to update the arXiv with the journal version of their paper: if they want to leave up another version, even one that was rejected by the reviewers, they’re free to do so, and most of us wouldn’t notice. And the sort of in-depth review that happens in peer review also happens without it. When we have journal clubs and nominate someone to present a recent paper, or when we try to build on a result or figure out why it contradicts something we thought we knew, we go through the same kind of in-depth reading that (in the best cases) reviewers do.

But I think I’ve hit upon something review does that those kinds of informal things don’t. It gets us to speak up about it.

I presented at a journal club recently. I read through a bombastic new paper, figured out what I thought was wrong with it, and explained it to my colleagues.

But did I reach out to the author? No, of course not, that would be weird.

Psychologists talk about the bystander effect. If someone collapses on the street, and you’re the only person nearby, you’ll help. If you’re one of many, you’ll wait and see if someone else helps instead.

I think there’s a bystander effect for correcting people. If someone makes a mistake and publishes something wrong, we’ll gripe about it to each other. But typically, we won’t feel like it’s our place to tell the author. We might get into a frustrating argument, there wouldn’t be much in it for us, and it might hurt our reputation if the author is well-liked.

(People do speak up when they have something to gain, of course. That’s why when you write a paper, most of the people emailing you won’t be criticizing the science: they’ll be telling you you need to cite them.)

Peer review changes the expectations. Suddenly, you’re expected to criticize, it’s your social role. And you’re typically anonymous, you don’t have to worry about the consequences. It becomes a lot easier to say what you really think.

(It also becomes quite easy to say lazy stupid things, of course. This is why I like setups like SciPost, where reviews are made public even when the reviewers are anonymous. It encourages people to put some effort in, and it means that others can see that a paper was rejected for bad reasons and put less stock in the rejection.)

I think any new structure we put in place should keep this feature. We need to preserve some way to designate someone a critic, to give someone a social role that lets them let loose and explain why someone else is wrong. And having these designated critics around does help my field. The good criticisms get implemented in the papers, the authors put the new versions up on arXiv. Reviewing papers for journals does make our science better…even if none of us read the journal itself.