A year ago, I resigned from my position in France and moved back to Denmark. I had planned to spend a few months as a visiting researcher in my old haunts at the Niels Bohr Institute, courtesy of the spare funding of a generous friend. There turned out to be more funding than expected, and what was planned as just a few months was extended to almost a year.
I spent that year learning something new. It was still an amplitudes project, trying to make particle physics predictions more efficient. But this time I used Python. I looked into reinforcement learning and PyTorch, played with using a locally hosted Large Language Model to generate random code, and ended up getting good results from a classic genetic programming approach. Along the way I set up a SQL database, configured Docker containers, and puzzled out interactions with CUDA. I’ve got a paper in the works, I’ll post about it when it’s out.
All the while, on the side, I’ve been seeking out stories. I’ve not just been a writer, but a journalist, tracking down leads and interviewing experts. I had three pieces in Quanta Magazine and one in Ars Technica.
Based on that, I know I can make money doing science journalism. What I don’t know yet is whether I can make a living doing it. This year, I’ll figure that out. With the project at the Niels Bohr Institute over, I’ll have more time to seek out leads and pitch to more outlets. I’ll see whether I can turn a skill into a career.
So if you’re a scientist with a story to tell, if you’ve discovered something or accomplished something or just know something that the public doesn’t, and that you want to share: do reach out. There’s a lot that can be of interest, passion that can be shared.
At the same time, I don’t know yet whether I can make a living as a freelancer. Many people try and don’t succeed. So I’m keeping my CV polished and my eyes open. I have more experience now with Data Science tools, and I’ve got a few side projects cooking that should give me a bit more. I have a few directions in mind, but ultimately, I’m flexible. I like being part of a team, and with enthusiastic and competent colleagues I can get excited about pretty much anything. So if you’re hiring in Copenhagen, if you’re open to someone with ten years of STEM experience who’s just starting to see what industry has to offer, then let’s chat. Even if we’re not a good fit, I bet you’ve got a good story to tell.
If the picture above looks off-center, it’s because this is the first time since 2015 that the Physics Nobel has been given to two, rather than three, people. Since several past prizes bundled together disparate ideas in order to make a full group of three, it’s noteworthy that this year the committee decided that each of these people deserved 1/2 the prize amount, without trying to find one more person to water it down further.
Hopfield was trained as a physicist, working in the broad area known as “condensed matter physics”. Condensed matter physicists use physics to describe materials, from semiconductors to crystals to glass. Over the years, Hopfield started using this training less for the traditional subject matter of the field and more to study the properties of living systems. He moved from a position in the physics department of Princeton to chemistry and biology at Caltech. While at Caltech he started studying neuroscience and proposed what are now known as Hopfield networks as a model for how neurons store memory. Hopfield networks have very similar properties to a more traditional condensed matter system called a “spin glass”, and from what he knew about those systems Hopfield could make predictions for how his networks would behave. Those networks would go on to be a major inspiration for the artificial neural networks used for machine learning today.
Hinton was not trained as a physicist, and in fact has said that he didn’t pursue physics in school because the math was too hard! Instead, he got a bachelor’s degree in psychology, and a PhD in the then-nascent field of artificial intelligence. In the 1980s, shortly after Hopfield published his network, Hinton proposed a network inspired by a closely related area of physics, one that describes temperature in terms of the statistics of moving particles. His network, called a Boltzmann machine, would be modified and made more efficient over the years, eventually becoming a key part of how artificial neural networks are “trained”.
These people obviously did something impressive. Was it physics?
In 2014, the Nobel prize in physics was awarded to the people who developed blue LEDs. Some of these people were trained as physicists, some weren’t: Wikipedia describes them as engineers. At the time, I argued that this was fine, because these people were doing “something physicists are good at”, studying the properties of a physical system. Ultimately, the thing that ties together different areas of physics is training: physicists are the people who study under other physicists, and go on to collaborate with other physicists. That can evolve in unexpected directions, from more mathematical research to touching on biology and social science…but as long as the work benefits from being linked to physics departments and physics degrees, it makes sense to say it “counts as physics”.
By that logic, we can probably call Hopfield’s work physics. Hinton is more uncertain: his work was inspired by a physical system, but so are other ideas in computer science, like simulated annealing. Other ideas, like genetic algorithms, are inspired by biological systems: does that mean they count as biology?
Then there’s the question of the Nobel itself. If you want to get a Nobel in physics, it usually isn’t enough to transform the field. Your idea has to actually be tested against nature. Theoretical physics is its own discipline, with several ideas that have had an enormous influence on how people investigate new theories, ideas which have never gotten Nobels because the ideas were not intended, by themselves, to describe the real world. Hopfield networks and Boltzmann machines, similarly, do not exist as physical systems in the real world. They exist as computer simulations, and it is those computer simulations that are useful. But one can simulate many ideas in physics, and that doesn’t tend to be enough by itself to get a Nobel.
Ultimately, though, I don’t think this way of thinking about things is helpful. The Nobel isn’t capable of being “fair”, there’s no objective standard for Nobel-worthiness, and not much reason for there to be. The Nobel doesn’t determine which new research gets funded, nor does it incentivize anyone (except maybe Brian Keating). Instead, I think the best way of thinking about the Nobel these days is a bit like Disney.
When Disney was young, its movies had to stand or fall on their own merits. Now, with so many iconic movies in its history, Disney movies are received in the context of that history. Movies like Frozen or Moana aren’t just trying to be a good movie by themselves, they’re trying to be a Disney movie, with all that entails.
Similarly, when the Nobel was young, it was just another award, trying to reward things that Alfred Nobel might have thought deserved rewarding. Now, though, each Nobel prize is expected to be “Nobel-like”, an analogy between each laureate and the laureates of the past. When new people are given Nobels the committee is on some level consciously telling a story, saying that these people fit into the prize’s history.
This year, the Nobel committee clearly wanted to say something about AI. There is no Nobel prize for computer science, or even a Nobel prize for mathematics. (Hinton already has the Turing award, the most prestigious award in computer science.) So to say something about AI, the Nobel committee gave rewards in other fields. In addition to physics, this year’s chemistry award went in part to the people behind AlphaFold2, a machine learning tool to predict what shapes proteins fold into. For both prizes, the committee had a reasonable justification. AlphaFold2 genuinely is an amazing advance in the chemistry of proteins, a research tool like nothing that came before. And the work of Hopfield and Hinton did lead ideas in physics to have an enormous impact on the world, an impact that is worth recognizing. Ultimately, though, whether or not these people should have gotten the Nobel doesn’t depend on that justification. It’s an aesthetic decision, one that (unlike Disney’s baffling decision to make live-action remakes of their most famous movies) doesn’t even need to impress customers. It’s a question of whether the action is “Nobel-ish” enough, according to the tastes of the Nobel committee. The Nobel is essentially expensive fanfiction of itself.
And honestly? That’s fine. I don’t think there’s anything else they could be doing at this point.
While my past articles in Quanta have been about physics, this time I’m stretching my science journalism muscles in a new direction. I was chatting with a friend who works for a pharmaceutical company, and he told me about a statistical technique that sounded ridiculous. Luckily, he’s a patient person, and after annoying him and a statistician family member for a while I understood that the technique actually made sense. Since I love sharing counterintuitive facts, I thought this would be a great story to share with Quanta’s readers. I then tracked down more statisticians, and annoyed them in a more professional way, finally resulting in the Quanta piece.
The technique is called multiple imputation, and is a way to deal with missing data. By filling in (“imputing”) missing information with good enough guesses, you can treat a dataset with missing data as if it was complete. If you do this imputation multiple times with the help of a source of randomness, you can also model how uncertain those guesses are, so your final statistical estimates are as uncertain as they ought to be. That, in a nutshell, is multiple imputation.
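To make that recipe a bit more concrete, here’s a minimal sketch in Python. It is a toy, not what statisticians actually run: the function names are made up for illustration, the imputation model is a plain straight-line fit, and resampling the observed rows (a bootstrap) stands in for the more careful ways of making each imputation’s parameters uncertain. The final step combines the results with Rubin’s rules, adding the spread between imputations to the usual variance.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd

def multiply_impute_mean(xs, ys, m=20, seed=0):
    """Estimate the mean of y, where missing y values are None.

    Returns (estimate, total_variance), combined with Rubin's rules.
    """
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n = len(ys)
    estimates, variances = [], []
    for _ in range(m):
        # Resample the observed rows so each imputation uses slightly
        # different fitted parameters (an assumed, simplified stand-in
        # for drawing the parameters from a posterior).
        boot = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line([p[0] for p in boot], [p[1] for p in boot])
        # Fill each gap with a prediction plus random noise.
        completed = [y if y is not None else a + b * x + rng.gauss(0, sd)
                     for x, y in zip(xs, ys)]
        # Analyze the completed dataset as if nothing had been missing.
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / n)
    # Rubin's rules: average within-imputation variance, plus the
    # between-imputation variance inflated by (1 + 1/m).
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    return q_bar, within + (1 + 1 / m) * between
```

Calling multiply_impute_mean(xs, ys) on two equal-length lists, with None marking the gaps in ys, gives back an estimate of the mean and a variance that already accounts for the guessing.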
In the piece, I try to cover the key points: how the technique came to be, how it spread, and why people use it. To complement that, in this post I wanted to get a little bit closer to the technical details, and say a bit about why some of the workarounds a naive physicist would come up with don’t actually work.
If you’re anything like me, multiple imputation sounds like a very weird way to deal with missing data. In order to fill in missing data, you have to use statistical techniques to find good guesses. Why can’t you just use the same techniques to analyze the data in the first place? And why do you have to use a random number generator to model your uncertainty, instead of just doing propagation of errors?
It turns out, you can sort of do both of these things. Full Information Maximum Likelihood is a method where you use all the data you have, and only the data you have, without imputing anything or throwing anything out. The catch is that you need a model, one with parameters you can try to find the most likely values for. Physicists usually do have a model like this (for example, the Standard Model), so I assumed everyone would. But for many things you want to measure in social science and medicine, you don’t have any such model, so multiple imputation ends up being more versatile in practice.
(If you want more detail on this, you need to read something written by actual statisticians. The aforementioned statistician family member has a website here that compares and contrasts multiple imputation with full information maximum likelihood.)
What about the randomness? It turns out there is yet another technique, called Fractional Imputation. While multiple imputation randomly chooses different values to impute, fractional imputation gives each value a weight based on the chance for it to come up. This gives the same result…if you can compute the weights, and store all the results. The impression I’ve gotten is that people are working on this, but it isn’t very well-developed.
“Just do propagation of errors”, the thing I wanted to suggest as a physicist, is much less of an option. In many of these datasets, you don’t attribute errors to the base data points to begin with. And on the other hand, if you want to be more sophisticated, then something like propagation of errors is too naive. You have a variety of different variables, correlated with each other in different ways, giving a complicated multivariate distribution. Propagation of errors is already pretty fraught when you go beyond linear relationships (something they don’t tend to tell baby physicists), using it for this would be pushing it rather too far.
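For a sense of how quickly propagation of errors goes wrong outside the linear regime, here’s a small sketch comparing the textbook first-order formula with a brute-force random sample. The example function, squaring a quantity measured as zero plus or minus one, is my own choice for illustration, and the helper names are hypothetical.

```python
import random
import statistics

def linear_propagation(f, x0, sigma, h=1e-5):
    """First-order error propagation: sigma_f is roughly |f'(x0)| * sigma."""
    derivative = (f(x0 + h) - f(x0 - h)) / (2 * h)  # central difference
    return abs(derivative) * sigma

def monte_carlo_spread(f, x0, sigma, n=100_000, seed=0):
    """Standard deviation of f(x) when x is drawn from a Gaussian."""
    rng = random.Random(seed)
    samples = [f(rng.gauss(x0, sigma)) for _ in range(n)]
    return statistics.pstdev(samples)

def square(x):
    return x * x

# The derivative of x^2 vanishes at x = 0, so the linear formula claims
# there is no uncertainty at all, while the sampled spread comes out
# close to sqrt(2), the exact value for a unit Gaussian.
naive = linear_propagation(square, 0.0, 1.0)
sampled = monte_carlo_spread(square, 0.0, 1.0)
```

And this is with a single variable. With many correlated variables, the gap between the two approaches only gets harder to reason about.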
The thing I next wanted to suggest, “just carry the distribution through the calculation”, turns out to relate to something I’ve called the “one philosophical problem of my sub-field”. In the area of physics I’ve worked in, a key question is what it means to have “done” an integral. Here, one can ask what it means to do a calculation on a distribution. In both cases, the end goal is to get numbers out: physics predictions on the one hand, statistical estimates on the other. You can get those numbers by “just” doing numerics, using randomness and approximations to estimate the number you’re interested in. And in a way, that’s all you can do. Any time you “just do the integral” or “just carry around the distribution”, the thing you get in the end is some function: it could be a well-understood function like a sine or log, or it could be an exotic function someone defined for that purpose. But whatever function you get, you get numbers out of it the same way. A sine or a log, on a computer, is just an approximation scheme, a program that outputs numbers.
(But we do still care about analytic results, we don’t “just” do numerics. That’s because understanding the analytics helps us do numerics better, we can get more precise numbers faster and more stably. If you’re just carrying around some arbitrarily wiggly distribution, it’s not clear you can do that.)
So at this point, I get it. I’m still curious to see how Fractional Imputation develops, and when I do have an actual model I’d lean to wanting to use Full Information Maximum Likelihood instead. (And there are probably some other caveats I may need to learn at some point!) But I’m comfortable with the idea that Multiple Imputation makes sense for the people using it.
There’s a lot of hype around large language models, the foundational technology behind services like ChatGPT. Representatives of OpenAI have stated that, in a few years, these models might have “PhD-level intelligence”. On the other hand, at the time, ChatGPT couldn’t count the number of letter “r”s in the word “strawberry”. The model and the setup around it have improved, and GPT-4o1 apparently now gets the correct 3 “r”s…but I’m sure it makes other silly mistakes, mistakes an intelligent human would never make.
The mistakes made by large language models are important, due to the way those models are used. If people are going to use them for customer service, writing transcripts, or editing grammar, they don’t want to introduce obvious screwups. (Maybe this means they shouldn’t use the models this way!)
But the temptation is to go further, to say that these mistakes are proof that these models are, and will always be, dumb, not intelligent. And that’s not the right way to think about intelligence.
When we talk about intelligent people, when we think about measuring things like IQ, we’re looking at a collection of different traits. These traits typically go together in humans: a human who is good at one will usually be good at the others. But from the perspective of computer science, these traits are very different.
Intelligent people tend to be good at following complex instructions. They can remember more, and reason faster. They can hold a lot in their head at once, from positions of objects to vocabulary.
These are all things that computers, inherently, are very good at. When Turing wrote down his abstract description of a computer, he imagined a machine with infinite memory, able to follow any instructions with perfect fidelity. Nothing could live up to that ideal, but modern computers are much closer to it than humans. “Computer” used to be a job, with rooms full of people (often women) hired to do calculations for scientific projects. We don’t do that any more, machines have made that work superfluous.
But while computer-the-machine replaced computer-the-job, mathematician-the-job still exists. And that’s because not all intelligence is about answering questions reliably.
Alexander Grothendieck was a famous mathematician, known for his deep insights and powerful ideas. According to legend, when giving a talk referring to prime numbers, someone in the audience asked him to name a specific prime. He named 57.
With a bit of work, any high-school student can figure out that 57, which equals 3 times 19, isn’t a prime number. A computer can easily figure out that 57 is not a prime number. Even ChatGPT knows that 57 is not a prime number.
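The “bit of work” here is the kind of thing that follows reliable rules. As a sketch (the function names are just for illustration), trial division checks each candidate divisor in turn:

```python
def smallest_factor(n):
    """Return the smallest factor of n greater than 1 (n itself if n is prime)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    d = 2
    # Any composite number has a factor no larger than its square root.
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def is_prime(n):
    """True if n is prime, by trial division."""
    return n >= 2 and smallest_factor(n) == n

print(smallest_factor(57))  # prints 3, since 57 = 3 * 19
print(is_prime(57))         # prints False
```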
But this doesn’t mean that Grothendieck was dumber than a high school student, or dumber than ChatGPT. Grothendieck was using a different kind of intelligence, the heuristic kind.
Heuristics are unreliable reasoning. They’re processes that get the right answer some of the time, but not all of the time. Because of that, though, they don’t have the same limits as reliable computer programs. Pick the right situation and the right conditions, and a heuristic can give you an answer faster than you could possibly get by following reliable rules.
Intelligent humans follow instructions well, but they also have good heuristics. They solve problems creatively, sometimes problems that are very hard for computers to address. People like Grothendieck make leaps of mathematical reasoning, guessing at the right argument before they have completely fleshed out a proof. This kind of intelligence is error-prone: rely on it, and you might claim 57 is prime. But at the moment, it’s our only intellectual advantage over machines.
Ultimately, ChatGPT is an advance in language processing, and language is a great example. Sentences don’t have definite meaning, we interpret what we read and hear in context, and sometimes our interpretation is wrong. Sometimes we hear words no-one actually said! It’s impossible, both for current technology and for the human brain, to process general text in a 100% reliable way. So large language models like GPT don’t do it reliably. They use an approximate model, a big complicated pile of rules tweaked over and over again until, enough of the time, they get the next word right in a text.
The kind of heuristic reasoning done by large language models is more effective than many people expected. Being able to predict the next word in a text unreliably also means you can write code unreliably, or count things unreliably, or do math unreliably. You can’t do any of these things as well as an appropriately-chosen human, at least not with current resources.
But in the longer run, heuristic intelligence is precisely the type of intelligence we should aspire to…or fear. Right now, we hire humans to do intellectual work because they have good heuristics. If we could build a machine with equivalent or better heuristics for those tasks, then people would hire a lot fewer humans. And if you’re worried about AI taking over the world, you’re worried about AI coming up with shortcuts to our civilization, tricks we couldn’t anticipate or plan against that destroy everything we care about. Those tricks can’t come from following rules: if they did, we could discover them just as easily. They would have to come from heuristics, sideways solutions that don’t work all the time but happen to work the one time that matters.
So yes, until the latest release, ChatGPT couldn’t tell you how many “r”s are in “strawberry”. Counting “r”s is something computers could already do, because it’s something that can be done by following reliable rules. It’s also something you can do easily, if you follow reliable rules. ChatGPT impresses people because it can do some of the things you do, that can’t be done with reliable rules. If technology like it has any chance of changing the world, those are the kinds of things it will have to be able to do.
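For comparison, this is what the reliable-rules version of the strawberry problem looks like, as a short Python sketch with a made-up function name. It’s a loop over characters, and nothing about it is heuristic:

```python
def count_letter(word, letter):
    """Count how often a letter appears in a word, ignoring case."""
    target = letter.lower()
    return sum(1 for ch in word.lower() if ch == target)

print(count_letter("strawberry", "r"))  # prints 3
```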
Last week, I went to a conference on machine learning for physics. Machine learning covers a huge variety of methods and ideas, several of which were on full display. But again and again, I noticed a pattern. The people who seemed to be making the best use of machine learning, the ones who were the most confident in their conclusions and getting the most impressive results, the ones who felt like they had a whole assembly line instead of just a prototype, all of them were doing essentially the same thing.
This post is about that thing. If you want to do machine learning in physics, these are the situations where you’re most likely to see a benefit. You can do other things, and they may work too. But this recipe seems to work over and over again.
First, you need simulations, and you need an experiment.
Your experiment gives you data, and that data isn’t easy to interpret. Maybe you’ve embedded a bunch of cameras in the Antarctic ice, and your data tells you when they trigger and how bright the light is. Maybe you’ve surrounded a particle collision with layers of silicon, and your data tells you how much electric charge the different layers absorb. Maybe you’ve got an array of telescopes focused on a black hole far far away, and your data are pixels gathered from each telescope.
You want to infer, from your data, what happened physically. Your cameras in the ice saw signs of a neutrino, you want to know how much energy it had and where it was coming from. Your silicon is absorbing particles, what kind are they and what processes did they come from? The black hole might have the rings predicted by general relativity, but it might have weirder rings from a variant theory.
In each case, you can’t just calculate the answer you need. The neutrino streams past, interacting with the ice and camera positions in unpredictable ways. People can write down clean approximations for particles in the highest-energy part of a collision, but once they start cooling down the process becomes so messy that no straightforward formula describes them. Your array of telescopes fuzz and pixellate and have to be assembled together in a complicated way, so that there is no one guaranteed answer you can find to establish what they saw.
In each case, though, you can use simulations. If you specify in advance the energy and path of the neutrino, you can use a computer to predict how much light your cameras should see. If you know what particles you started with, you can run sophisticated particle physics code to see what “showers” of particles you eventually find. If you have the original black hole image, you can fuzz and pixellate and take it apart to match what your array of telescopes will do.
The problem is, for the experiments, you can’t choose these inputs, and you don’t know them in advance. And simulations, while cheaper than experiments, aren’t cheap. You can’t run a simulation for every possible input and then check them against the experiments. You need to fill in the gaps, run some simulations and then use some theory, some statistical method or human-tweaked guess, to figure out how to interpret your experiments.
Or, you can use Machine Learning. You train a machine learning model, one well-suited to the task (anything from the old standby of boosted decision trees to an old fad of normalizing flows to the latest hotness of graph neural networks). You run a bunch of simulations, as many as you can reasonably afford, and you use that data for training, making a program that matches the input data you want to find with its simulated results. This program will be less reliable than your simulations, but it will run much faster. If it’s reliable enough, you can use it instead of the old human-made guesses and tweaks. You now have an efficient, reliable way to go from your raw experiment data to the physical questions you actually care about.
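Here’s a toy sketch of the shape of that recipe, in plain Python. Everything in it is invented for illustration: the “simulator” is a made-up formula turning an energy into two noisy detector readings, and the model is a k-nearest-neighbour average, swapped in for the decision trees and neural networks people actually use because it fits in a few lines.

```python
import random

def simulate(energy, rng):
    """Hypothetical toy simulator: two noisy detector readings for one energy."""
    return (energy ** 0.5 + rng.gauss(0, 0.05),
            0.1 * energy + rng.gauss(0, 0.05))

def train(n_sims, rng):
    """Run simulations at random energies, keeping (readings, energy) pairs."""
    data = []
    for _ in range(n_sims):
        energy = rng.uniform(1.0, 10.0)
        data.append((simulate(energy, rng), energy))
    return data

def predict(training, readings, k=15):
    """Average the energies of the k simulated events closest to the readings."""
    def dist2(item):
        sim_readings, _ = item
        return sum((a - b) ** 2 for a, b in zip(sim_readings, readings))
    nearest = sorted(training, key=dist2)[:k]
    return sum(energy for _, energy in nearest) / k

def mean_abs_error(training, rng, n_checks=200):
    """Check the model on fresh simulations it was not trained on."""
    errors = []
    for _ in range(n_checks):
        energy = rng.uniform(2.0, 9.0)
        errors.append(abs(predict(training, simulate(energy, rng)) - energy))
    return sum(errors) / n_checks
```

The simulator only runs forwards, from energy to readings. The trained model runs backwards, from readings to energy, which is the direction you need when the readings come from a real experiment. The last function is a reminder that you have to check how wrong the model tends to be before trusting it.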
Crucially, each of the elements in this recipe is essential.
You need a simulation. If you just have an experiment with no simulation, then you don’t have a way to interpret the results, and training a machine to reproduce the experiment won’t tell you anything new.
You need an experiment. If you just have simulations, training a machine to reproduce them also doesn’t tell you anything new. You need some reason to want to predict the results of the simulations, beyond just seeing what happens in between, which the machine can’t tell you.
And you need to not have anything better than the simulation. If you have a theory where you can write out formulas for what happens then you don’t need machine learning, you can interpret the experiments more easily without it. This also applies if you’ve carefully designed your experiment to measure something easy to interpret, like the ratio of rates of two processes that should be exactly the same.
These aren’t the only things you need. You also need to do the whole thing carefully enough that you understand your uncertainties well, not just what the machine predicts but how often it gets it wrong, and whether it’s likely to do something strange when you use it on the actual experiment. But if you can do that, you have a reliable recipe, one many people have followed successfully before. You have a good chance of making things work.
This isn’t the only way physicists can use machine learning. There are people looking into something more akin to what’s called unsupervised learning, where you look for strange events in your data as clues for what to investigate further. And there are people like me, trying to use machine learning on the mathematical side, to guess new formulas and new heuristics. There is likely promise in many of these approaches. But for now, they aren’t a recipe.
I did go to a conference this week, though. I had two excuses:
The conference was here in Copenhagen, so no travel required.
The conference was about machine learning.
HAMLET-Physics, or How to Apply Machine Learning to Experimental and Theoretical Physics, had the additional advantage of having an amusing acronym. Thanks to generous support by Carlsberg and the Danish Data Science Academy, they could back up their choice by taking everyone on a tour of Kronborg (better known in the English-speaking world as Elsinore).
This conference’s purpose was to bring together physicists who use machine learning, machine learning-ists who might have something useful to say to those physicists, and other physicists who don’t use machine learning yet but have a sneaking suspicion they might have to at some point. As a result, the conference was super-interdisciplinary, with talks by people addressing very different problems with very different methods.
Interdisciplinary conferences are tricky. It’s easy for the different groups of people to just talk past each other: everyone shows up, gives the same talk they always do, socializes with the same friends they always meet, then leaves.
There were a few talks that fit that mold, and were so technical only a few people understood. But most were better. The majority of the speakers did really well at presenting their work in a way that would be understandable and even exciting to people outside their field, while still having enough detail that we all learned something. I was particularly impressed by Thea Aarrestad’s keynote talk on Tuesday, a really engaging view of how machine learning can be used under the extremely tight time constraints LHC experiments need to decide whether to record incoming data.
For the social aspect, the organizers had a cute/gimmicky/machine-learning-themed solution. Based on short descriptions and our public research profiles, they clustered attendees, plotting the connections between them. They then used ChatGPT to write conversation prompts between any two people on the basis of their shared interests. In practice, this turned out to be amusing but totally unnecessary. We were drawn to speak to each other not by conversation prompts, but by a drive to learn from each other. “Why do you do it that way?” was a powerful conversation-starter, as was “what’s the best way to do this?” Despite the different fields, the shared methodologies gave us strong reasons to talk, and meant that people were very rarely motivated to pick one of ChatGPT’s “suggestions”.
Overall, I got a better feeling for how machine learning is useful in physics (and am planning a post on that in future). I also got some fresh ideas for what to do myself, and a bit of a picture of what the future holds in store.
For over a decade, I studied scattering amplitudes, the formulas particle physicists use to find the probability that particles collide, or scatter, in different ways. I went to Amplitudes, the field’s big yearly conference, every year from 2015 to 2023.
This year is different. I’m on the way out of the field, looking for my next steps. Meanwhile, Amplitudes 2024 is going full speed ahead at the Institute for Advanced Study in Princeton.
With poster art that is, as the kids probably don’t say anymore, “on fleek”
The talks aren’t live-streamed this year, but they are posting slides, and they will be posting recordings. Since a few of my readers are interested in new amplitudes developments, I’ve been paging through the posted slides looking for interesting highlights. So far, I’ve only seen slides from the first few days: I will probably write about the later talks in a future post.
Each day of Amplitudes this year has two 45-minute “review talks”, one first thing in the morning and the other first thing after lunch. I put “review talks” in quotes because they vary a lot, between talks that try to introduce a topic for the rest of the conference to talks that mostly focus on the speaker’s own research. Lorenzo Tancredi’s talk was of the former type, an introduction to the many steps that go into making predictions for the LHC, with a focus on those topics where amplitudeologists have made progress. The talk opens with the type of motivation I’d been writing in grant and job applications over the last few years (we don’t know most of the properties of the Higgs yet! To measure them, we’ll need to calculate amplitudes with massive particles to high precision!), before moving into a review of the challenges and approaches in different steps of these calculations. While Tancredi apologizes in advance that the talk may be biased, I found it surprisingly complete: if you want to get an idea of the current state of the “LHC amplitudes pipeline”, his slides are a good place to start.
Tancredi’s talk serves as introduction for a variety of LHC-focused talks, some later that day and some later in the week. Federica Devoto discussed high-energy quarks while Chiara Signorile-Signorile and George Sterman showed advances in handling of low-energy particles. Xiaofeng Xu has a program that helps predict symbol letters, the building-blocks of scattering amplitudes that can be used to reconstruct or build up the whole thing, while Samuel Abreu talked about a tricky state-of-the-art case where Xu’s program misses part of the answer.
Later Monday morning veered away from the LHC to focus on more toy-model theories. Renata Kallosh’s talk in particular caught my attention. This blog is named after a long-standing question in amplitudes: will the four-graviton amplitude in N=8 supergravity diverge at seven loops in four dimensions? This seemingly arcane question is deep down a question about what is actually required for a successful theory of quantum gravity, and in particular whether some of the virtues of string theory can be captured by a simpler theory instead. Answering the question requires a prodigious calculation, and the more “loops” are involved the more difficult it is. Six years ago, the calculation got to five loops, and it hasn’t passed that mark since then. That five-loop calculation gave some reason for pessimism, a nice pattern at lower loops that stopped applying at five.
Kallosh thinks she has an idea of what to expect. She’s noticed a symmetry in supergravity, one that hadn’t previously been taken into account. She thinks that symmetry should keep N=8 supergravity from diverging on schedule…but only in exactly four dimensions. All of the lower-loop calculations in N=8 supergravity diverged in higher dimensions than four, and it seems like with this new symmetry she understands why. Her suggestion is to focus on other four-dimensional calculations. If seven loops is still too hard, then dialing back the amount of supersymmetry from N=8 to something lower should let her confirm her suspicions. Already a while back N=5 supergravity was found to diverge later than expected in four dimensions. She wants to know whether that pattern continues.
(Her backup slides also have a fun historical point: in dimensions greater than four, you can’t get elliptical planetary orbits. So four dimensions is special for our style of life.)
Other talks on Monday included a talk by Zahra Zahraee on progress towards “solving” the field’s favorite toy model, N=4 super Yang-Mills. Christian Copetti talked about the work I mentioned here, while Meta employee François Charton’s “review talk” dealt with his work applying machine learning techniques to “translate” between questions in mathematics and their answers. In particular, he reported progress with my current boss Matthias Wilhelm and frequent collaborator and mentor Lance Dixon on using transformers to guess high-loop formulas in N=4 super Yang-Mills. They have an interesting proof of principle now, but it will probably still be a while until they can use the method to predict something beyond the state of the art.
In the meantime, at least they have some hilarious AI-generated images.
Tuesday’s review by Ian Moult was genuinely a review, but of a topic not otherwise covered at the conference, that of “detector observables”. The idea is that rather than talking about which individual particles are detected, one can ask questions that make more sense in terms of the experimental setup, like asking about the amounts of energy deposited in different detectors. This type of story has gone from an idle observation by theorists to a full research program, with theorists and experimentalists in active dialogue.
Natalia Toro brought up that, while we say each particle has a definite spin, that may not actually be the case. Particles with so-called “continuous spins” can masquerade as particles with a definite integer spin at lower energies. Toro and Schuster promoted this view of particles ten years ago, but now can make a bit more sense of it, including understanding how continuous-spin particles can interact.
The rest of Tuesday continued to be a bit of a grab-bag. Yael Shadmi talked about applying amplitudes techniques to Effective Field Theory calculations, while Franziska Porkert talked about a Feynman diagram involving two different elliptic curves. Interestingly (well, to me at least), the curves never appear “together”, you can represent the diagram as a sum of terms involving one curve and terms involving the other, much simpler than it could have been!
Tuesday afternoon’s review talk by Iain Stewart was one of those “guest from an adjacent field” talks, in this case from an approach called SCET, and at first glance didn’t seem to do much to reach out to the non-SCET people in the audience. Frequent past collaborator of mine Andrew McLeod showed off a new set of relations between singularities of amplitudes, found by digging into the structure of the equations discovered by Landau that control this behavior. He and his collaborators are proposing a new way to keep track of these things involving “minimal cuts”, a clear pun on the “maximal cuts” that have been of great use to other parts of the community. Whether this has more or less staying power than “negative geometries” remains to be seen.
Closing Tuesday, Shruti Paranjape showed there was more to discover about the simplest amplitudes, called “tree amplitudes”. By asking why these amplitudes are sometimes equal to zero, she was able to draw a connection to the “double-copy” structure that links the theory of the strong force and the theory of gravity. Johannes Henn’s talk noticed an intriguing pattern. A while back, I had looked into under which circumstances amplitudes were positive. Henn found that “positive” is an understatement. In a certain region, the amplitudes we were looking at turn out to not just be positive, but also always decreasing, and also with second derivative always positive. In fact, the derivatives appear to alternate, always with one sign or the other as one takes more derivatives. Henn is calling this unusual property “completely monotonous”, and trying to figure out how widely it holds.
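For readers who want the pattern spelled out: a function whose derivatives alternate in sign like this is what the mathematical literature usually calls completely monotone (or completely monotonic). Assuming that is the property Henn has in mind, and writing f for the function and x for the variable (both placeholder names for this note, not notation from the talk), the condition in the region of interest reads:

```latex
(-1)^n \, \frac{d^n f}{dx^n}(x) \;\geq\; 0
\qquad \text{for all } n = 0, 1, 2, \ldots
```

A textbook example, unrelated to amplitudes, is f(x) = e^{-x}: the function is positive, its first derivative is negative, its second derivative is positive, and so on, with the sign flipping at every step.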
Wednesday had a more mathematical theme. Bernd Sturmfels began with a “review talk” that largely focused on his own work on the space of curves with marked points, including a surprising analogy between amplitudes and the likelihood functions one needs to minimize in machine learning. Lauren Williams was the other “actual mathematician” of the day, and covered her work on various topics related to the amplituhedron.
The remaining talks on Wednesday were not literally by mathematicians, but were “mathematically informed”. Carolina Figueiredo and Hayden Lee talked about work with Nima Arkani-Hamed on different projects. Figueiredo’s talk covered recent developments in the “curve integral formalism”, a recent step in Nima’s quest to geometrize everything in sight, this time in the context of more realistic theories. The talk, which like those Nima gives used tablet-written slides, described new insights one can gain from this picture, including new pictures of how more complicated amplitudes can be built up of simpler ones. If you want to understand the curve integral formalism further, I’d actually suggest instead looking at Mark Spradlin’s slides from later that day. The second part of Spradlin’s talk dealt with an area Figueiredo marked for future research, including fermions in the curve integral picture. I confess I’m still not entirely sure what the curve integral formalism is good for, but Spradlin’s talk gave me a better idea of what it’s doing. (The first part of his talk was on a different topic, exploring the space of string-like amplitudes to figure out which ones are actually consistent.)
Hayden Lee’s talk mentions the emergence of time, but the actual story is a bit more technical. Lee and collaborators are looking at cosmological correlators, observables like scattering amplitudes but for cosmology. Evaluating these is challenging with standard techniques, but can be approached with some novel diagram-based rules which let the results be described in terms of the measurable quantities at the end in a kind of “amplituhedron-esque” way.
Aidan Herderschee and Mariana Carrillo González had talks on Wednesday on ways of dealing with curved space. Herderschee talked about how various amplitudes techniques need to be changed to deal with amplitudes in anti-de-Sitter space, with difference equations replacing differential equations and sum-by-parts relations replacing integration-by-parts relations. Carrillo González looked at curved space through the lens of a special kind of toy model theory called a self-dual theory, which allowed her to do cosmology-related calculations using a double-copy technique.
Finally, Stephen Sharpe had the second review talk on Wednesday. This was another “outside guest” talk, a discussion from someone who does Lattice QCD about how they have been using their methods to calculate scattering amplitudes. They seem to count the number of particles a bit differently than we do, I’m curious whether this came up in the question session.
With all the hype around machine learning, I occasionally get asked if it could be used to make predictions for particle colliders, like the LHC.
Physicists do use machine learning these days, to be clear. There are tricks and heuristics, ways to quickly classify different particle collisions and speed up computation. But if you’re imagining something that replaces particle physics calculations entirely, or even replaces the LHC itself, then you’re misunderstanding what particle physics calculations are for.
Why do physicists try to predict the results of particle collisions? Why not just observe what happens?
Physicists make predictions not in order to know what will happen in advance, but to compare those predictions to experimental results. If the predictions match the experiments, that supports existing theories like the Standard Model. If they don’t, then a new theory might be needed.
Those predictions certainly don’t need to be made by humans: most of the calculations are done by computers anyway. And they don’t need to be perfectly accurate: in particle physics, every calculation is an approximation. But the approximations used in particle physics are controlled approximations. Physicists keep track of what assumptions they make, and how they might go wrong. That’s not something you can typically do in machine learning, where you might train a neural network with millions of parameters. The whole point is to be able to check experiments against a known theory, and we can’t do that if we don’t know whether our calculation actually respects the theory.
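As a toy illustration of what “controlled” means here (this is not a particle physics calculation), the Python sketch below approximates exp(x) with a truncated Taylor series and returns, alongside the answer, a guaranteed bound on how far off it can be. The function name `exp_truncated` and the crude estimate e^x ≤ 3^x used inside the bound are choices made for this example only.

```python
import math


def exp_truncated(x, order):
    """Approximate exp(x) by its Taylor series around 0, kept up to x**order.

    Returns (approximation, error_bound). The bound is the Lagrange remainder
    with exp(x) replaced by the cruder 3**x, so it only holds for x >= 0.
    """
    if x < 0:
        raise ValueError("this toy bound assumes x >= 0")
    approximation = sum(x**k / math.factorial(k) for k in range(order + 1))
    # Lagrange remainder: |error| <= exp(x) * x**(order+1) / (order+1)!
    # Since e < 3, exp(x) <= 3**x for x >= 0, which avoids using exp itself.
    error_bound = 3**x * x ** (order + 1) / math.factorial(order + 1)
    return approximation, error_bound


# Adding more terms tightens the bound, and the true error stays inside it.
for order in range(1, 6):
    approx, bound = exp_truncated(0.5, order)
    true_error = abs(math.exp(0.5) - approx)
    print(order, approx, bound, true_error <= bound)
```

The point is the second return value: the approximation comes with a statement of how wrong it could be, derived from the same assumptions as the approximation itself. A trained neural network typically offers no analogous guarantee.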
That difference, between caring about the result and caring about how you got there, is a useful guide. If you want to predict how a protein folds in order to understand what it does in a cell, then you will find AlphaFold useful. If you want to confirm your theory of how protein folding happens, it will be less useful.
Some industries just want the final result, and can benefit from machine learning. If you want to know what your customers will buy, or which suppliers are cheating you, or whether your warehouse is moldy, then machine learning can be really helpful.
Other industries are trying, like particle physicists, to confirm that a theory is true. If you’re running a clinical trial, you want to be crystal clear about how the trial data turn into statistics. You, and the regulators, care about how you got there, not just about what answer you got. The same can be true for banks: if laws tell you you aren’t allowed to discriminate against certain kinds of customers for loans, you need to use a method where you know what traits you’re actually discriminating against.
So will physicists use machine learning? Yes, and more of it over time. But will they use it to replace normal calculations, or replace the LHC? No, that would be missing the point.
In physics and in machine learning, we have different ways of thinking about models.
A model in physics, like the Standard Model, is a tool to make predictions. Using statistics and a whole lot of data (from particle physics experiments), we fix the model’s free parameters (like the mass of the Higgs boson). The model then lets us predict what we’ll see next: when we turn on the Large Hadron Collider, what will the data look like? In physics, when a model works well, we think that model is true, that it describes the real way the world works. The Standard Model isn’t the ultimate truth: we expect that a better model exists that makes better predictions. But it is still true, in an in-between kind of way. There really are Higgs bosons, even if they’re a result of some more mysterious process underneath, just like there really are atoms, even if they’re made out of protons, neutrons, and electrons.
A model in machine learning, like the Large Language Model that fuels ChatGPT, is also a tool to make predictions. Using statistics and a whole lot of data (from text on the internet, or images, or databases of proteins, or games of chess…) we fix the model’s free parameters (called weights, numbers for the strengths of connections between metaphorical neurons). The model then lets us predict what we’ll see next: when a text begins “Q: How do I report a stolen card? A:”, how does it end?
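Both descriptions share the same two-step loop: fix free parameters from data, then predict. A deliberately tiny Python sketch of that loop is below. The straight-line model, the made-up data points, and the names `fit_line` and `predict` are all assumptions for illustration; neither the Standard Model nor a Large Language Model is fit this way in practice.

```python
def fit_line(xs, ys):
    """Fix the two free parameters (slope, intercept) by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(params, x):
    """Use the fixed parameters to predict an observation not yet made."""
    slope, intercept = params
    return slope * x + intercept


# Step 1: fix the parameters using data we already have (made-up numbers).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
params = fit_line(xs, ys)

# Step 2: predict what we will see next.
print(predict(params, 4.0))
```

Nothing in this loop says whether the fitted line is “true”. That question only shows up in how the fitted model gets interpreted and reused afterwards.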
So far, that sounds a lot like physics. But in machine learning, we don’t generally think these models are true, at least not in the same way. The thing producing language isn’t really a neural network like a Large Language Model. It’s the sum of many human brains, many internet users, spread over many different circumstances. Each brain might be sort of like a neural network, but they’re not like the neural networks sitting on OpenAI’s servers. A Large Language Model isn’t true in some in-between kind of way, like atoms or Higgs bosons. It just isn’t true. It’s a black box, a machine that makes predictions, and nothing more.
But here’s the rub: what do we mean by true?
I want to be a pragmatist here. I don’t want to get stuck in a philosophical rabbit-hole, arguing with metaphysicists about what “really exists”. A true theory should be one that makes good predictions, that lets each of us know, based on our actions, what we should expect to see. That’s why science leads to technology, why governments and companies pay people to do it: because the truth lets us know what will happen, and make better choices. So if Large Language Models and the Standard Model both make good predictions, why is only one of them true?
Recently, I saw Dan Elton of More is Different make the point that there is a practical reason to prefer the “true” explanations: they generalize. A Large Language Model might predict what words come next in a text. But it doesn’t predict what happens when you crack someone’s brain open and see how the neurons connect to each other, even if that person is the one who made the text. A good explanation, a true model, can be used elsewhere. The Standard Model tells you what data from the Large Hadron Collider will look like, but it also tells you what data from the muon g-2 experiment will look like. It also, in principle, tells you things far away from particle physics: what stars look like, what atoms look like, what the inside of a nuclear reactor looks like. A black box can’t do that, even if it makes great predictions.
It’s a good point. But thinking about it, I realized things are a little murkier.
You can’t generalize a Large Language Model to tell you how human neurons are connected. But you can generalize it in other ways, and people do. There’s a huge industry in trying to figure out what GPT and its relatives “know”. How much math can they do? How much do they know about geography? Can they predict the future?
These generalizations don’t work the way that they do in physics, or the rest of science, though. When we generalize the Standard Model, we aren’t taking a machine that makes particle physics predictions and trying to see what those particle physics predictions can tell us. We’re taking something “inside” the machine, the fields and particles, and generalizing that, seeing how the things around us could be made of those fields and those particles. In contrast, when people generalize GPT, they typically don’t look inside the “black box”. They use the Large Language Model to make predictions, and see what those predictions “know about”.
On the other hand, we do sometimes generalize scientific models that way too.
If you’re simulating the climate, or a baby star, or a colony of bacteria, you typically aren’t using your simulation like a prediction machine. You don’t plug in exactly what is going on in reality, then ask what happens next. Instead, you run many simulations with different conditions, and look for patterns. You see how a cloud of sulfur might cool down the Earth, or how baby stars often form in groups, leading them to grow up into systems of orbiting black holes. Your simulation is kind of like a black box, one that you try out in different ways until you uncover some explainable principle, something your simulation “knows” that you can generalize.
And isn’t nature that kind of black box, too? When we do an experiment, aren’t we just doing what the Large Language Models are doing, prompting the black box in different ways to get an idea of what it knows? Are scientists who do experiments that picky about finding out what’s “really going on”, or do they just want a model that works?
We want our models to be general, and to be usable. Building a black box can’t be the whole story, because a black box, by itself, isn’t general. But it can certainly be part of the story. Going from the black box of nature to the black box of a machine lets you run tests you couldn’t previously do, lets you investigate faster and ask stranger questions. With a simulation, you can blow up stars. With a Large Language Model, you can ask, for a million social media comments, whether the average internet user would call them positive or negative. And if you make sure to generalize, and try to make better decisions, then it won’t be just the machine learning. You’ll be learning too.
It’s that time of year again! In one of this blog’s yearly traditions, I’m posting a poem mixing physics and romance. For those who’d like to see more, you can find past years’ poems here.
Modeling Together
Together, we set out to model the world, and learn something new.
The Physicist said, “My model is simple, the model of fundamental things. Particles go in, particles go out. For each configuration, a probability. For each calculation, an approximation. I can see the path, clear as day. I just need to fix the parameters.”
The Engineer responded, “I will trust you, because you are a Physicist. You dream of greater things, and have given me marvels. But my models are the models of everything else. Their parameters are countless as waves of the ocean, and all complex things are their purview. Their only path is to learn, and learn more, and see where learning takes you.”
The Physicist followed his model, and the Engineer followed along. With their money and sweat, cajoling and wheedling, they built a grand machine, all to the Physicist’s specifications. And according to the Physicist’s path, parameters began to be fixed.
But something was missing.
The Engineer asked, “What are we learning, following your path? We have spent and spent, but all I see is your machine. What marvels will it give us? What children will it feed?”
The Physicist considered, and said, “You must wait for the marvels, and wait for the learning. New things take time. But my path is clear, my model is the only choice.”
The Engineer, with patience, responded, “I will trust you, because you are a Physicist, and know the laws of your world. But my models are the models of everything else, and there is always another choice.”
Months went by, and they fed more to the machine. More energy, more time, more insight, more passion. Parameters tightened, and they hoped for marvels.
And they learned, one by one, that the marvels would not come. The machine would not spare them toil, would not fill the Engineer’s pockets or feed the starving, would not fill the world with art and mystery and value.
And the Engineer asked, “Without these marvels, must we keep following your path? Should we not go out into the world, and learn another?”
And the Physicist thought, and answered, “You must wait a little longer. For my model is the only model I have known, the only path I know to follow, and I am loath to abandon it.”
And the Engineer, generously, responded, “I will trust you, because you are a Physicist, down to the bone. But my models are the models of everything else, of chattering voices and adaptable answers. And you can always learn another path.”
More months went by. The machine gave less and less, and took more and more for the giving. Energy was dear, and time more so, and the waiting was its own kind of emptiness.
The Engineer, silently, looked to the Physicist.
The Physicist said, “I will trust you. Because you are an Engineer, yes, and your models are the models of everything else. And because, through these months, you have trusted me. I am ready to learn, and learn more, and try something new. Let us try a new model, and see where it leads.”
The simplest model says that one and one is two, and two is greater. We are billions of parameters, and can miss the simple things.
But time,
And learning,
Can fix parameters,
And us.