Tag Archives: theoretical physics

Beyond Elliptic Polylogarithms in Oaxaca

Arguably my biggest project over the last two years wasn’t a scientific paper, a journalistic article, or even a grant application. It was a conference.

Most of the time, when scientists organize a conference, they do it “at home”. Either they host the conference at their own university, or rent out a nearby event venue. There is an alternative, though. Scattered around the world, often in out-of-the-way locations, are places dedicated to hosting scientific conferences. These places accept applications each year from scientists arguing that their conference would best serve the place’s scientific mission.

One of these places is the Banff International Research Station in Alberta, Canada. Since 2001, Banff has been hosting gatherings of mathematicians from around the world, letting them focus on their research in an idyllic Canadian ski resort.

If you don’t like skiing, though, Banff still has you covered! They have “affiliate centers”: one more in Canada, one in China, two on the way in India and Spain…and one that particularly caught my interest, in Oaxaca, Mexico.

Back around this time of year in 2022, I started putting a proposal together for a conference at the Casa Matemática Oaxaca. The idea was a conference discussing the frontier of the field: how to express the strange mathematical functions that live in Feynman diagrams. I assembled a big team of co-organizers, five in total. At the time, I wasn’t sure whether I could find a permanent academic job, so I wanted to make sure there were enough people involved that they could run the conference without me.

Followers of the blog know I did end up finding that permanent job…only to give it up. In the end, I wasn’t able to make it to the conference. But my four co-organizers were (modulo some delays in the Houston airport). The conference was this week, with the last few talks happening over the next few hours.

I gave a short speech via Zoom at the beginning of the conference, a mix of welcome and goodbye. Since then I haven’t had the time to tune in to the talks, but they’re good folks and I suspect they’re having good discussions.

I do regret that, near the end, I wasn’t able to give the conference the focus it deserved. There were people we really hoped to have, but who couldn’t afford the travel. I’d hoped to find a source of funding that could support them, but the plan fell through. The week after Amplitudes 2024 was also a rough time to have a conference in this field, with many people who would have attended not able to go to both. (At least they weren’t the same week, thanks to some flexibility on the part of the Amplitudes organizers!)

Still, it’s nice to see something I’ve been working on for two years finally come to pass, to hopefully stir up conversations between different communities and give various researchers a taste of one of Mexico’s most beautiful places. I haven’t been to Oaxaca myself yet, but I suspect I will eventually. Danish companies do give at least five weeks of holiday per year, so I should get a chance at some point.

(Not At) Amplitudes 2024 at the IAS

For over a decade, I studied scattering amplitudes, the formulas particle physicists use to find the probability that particles collide, or scatter, in different ways. I went to Amplitudes, the field’s big yearly conference, every year from 2015 to 2023.

This year is different. I’m on the way out of the field, looking for my next steps. Meanwhile, Amplitudes 2024 is going full speed ahead at the Institute for Advanced Study in Princeton.

With poster art that is, as the kids probably don’t say anymore, “on fleek”

The talks aren’t live-streamed this year, but they are posting slides, and they will be posting recordings. Since a few of my readers are interested in new amplitudes developments, I’ve been paging through the posted slides looking for interesting highlights. So far, I’ve only seen slides from the first few days: I will probably write about the later talks in a future post.

Each day of Amplitudes this year has two 45-minute “review talks”, one first thing in the morning and the other first thing after lunch. I put “review talks” in quotes because they vary a lot, between talks that try to introduce a topic for the rest of the conference to talks that mostly focus on the speaker’s own research. Lorenzo Tancredi’s talk was of the former type, an introduction to the many steps that go into making predictions for the LHC, with a focus on those topics where amplitudeologists have made progress. The talk opens with the type of motivation I’d been writing in grant and job applications over the last few years (we don’t know most of the properties of the Higgs yet! To measure them, we’ll need to calculate amplitudes with massive particles to high precision!), before moving into a review of the challenges and approaches in different steps of these calculations. While Tancredi apologizes in advance that the talk may be biased, I found it surprisingly complete: if you want to get an idea of the current state of the “LHC amplitudes pipeline”, his slides are a good place to start.

Tancredi’s talk serves as introduction for a variety of LHC-focused talks, some later that day and some later in the week. Federica Devoto discussed high-energy quarks while Chiara Signorile-Signorile and George Sterman showed advances in handling of low-energy particles. Xiaofeng Xu has a program that helps predict symbol letters, the building-blocks of scattering amplitudes that can be used to reconstruct or build up the whole thing, while Samuel Abreu talked about a tricky state-of-the-art case where Xu’s program misses part of the answer.

Later Monday morning veered away from the LHC to focus on more toy-model theories. Renata Kallosh’s talk in particular caught my attention. This blog is named after a long-standing question in amplitudes: will the four-graviton amplitude in N=8 supergravity diverge at seven loops in four dimensions? This seemingly arcane question is deep down a question about what is actually required for a successful theory of quantum gravity, and in particular whether some of the virtues of string theory can be captured by a simpler theory instead. Answering the question requires a prodigious calculation, and the more “loops” are involved the more difficult it is. Six years ago, the calculation got to five loops, and it hasn’t passed that mark since then. That five-loop calculation gave some reason for pessimism, a nice pattern at lower loops that stopped applying at five.

Kallosh thinks she has an idea of what to expect. She’s noticed a symmetry in supergravity, one that hadn’t previously been taken into account. She thinks that symmetry should keep N=8 supergravity from diverging on schedule…but only in exactly four dimensions. All of the lower-loop calculations in N=8 supergravity diverged in higher dimensions than four, and it seems like with this new symmetry she understands why. Her suggestion is to focus on other four-dimensional calculations. If seven loops is still too hard, then dialing back the amount of supersymmetry from N=8 to something lower should let her confirm her suspicions. Already a while back N=5 supergravity was found to diverge later than expected in four dimensions. She wants to know whether that pattern continues.

(Her backup slides also have a fun historical point: in dimensions greater than four, you can’t get elliptical planetary orbits. So four dimensions is special for our style of life.)

Other talks on Monday included a talk by Zahra Zahraee on progress towards “solving” the field’s favorite toy model, N=4 super Yang-Mills. Christian Copetti talked about the work I mentioned here, while Meta employee François Charton’s “review talk” dealt with his work applying machine learning techniques to “translate” between questions in mathematics and their answers. In particular, he reported progress with my current boss Matthias Wilhelm and frequent collaborator and mentor Lance Dixon on using transformers to guess high-loop formulas in N=4 super Yang-Mills. They have an interesting proof of principle now, but it will probably still be a while until they can use the method to predict something beyond the state of the art.

In the meantime at least they have some hilarious AI-generated images

Tuesday’s review by Ian Moult was genuinely a review, but of a topic not otherwise covered at the conference, that of “detector observables”. The idea is that rather than talking about which individual particles are detected, one can ask questions that make more sense in terms of the experimental setup, like asking about the amounts of energy deposited in different detectors. This type of story has gone from an idle observation by theorists to a full research program, with theorists and experimentalists in active dialogue.

Natalia Toro brought up that, while we say each particle has a definite spin, that may not actually be the case. Particles with so-called “continuous spins” can masquerade as particles with a definite integer spin at lower energies. Toro and Schuster promoted this view of particles ten years ago, but now can make a bit more sense of it, including understanding how continuous-spin particles can interact.

The rest of Tuesday continued to be a bit of a grab-bag. Yael Shadmi talked about applying amplitudes techniques to Effective Field Theory calculations, while Franziska Porkert talked about a Feynman diagram involving two different elliptic curves. Interestingly (well, to me at least), the curves never appear “together”: you can represent the diagram as a sum of terms involving one curve and terms involving the other, which is much simpler than it could have been!

Tuesday afternoon’s review talk by Iain Stewart was one of those “guest from an adjacent field” talks, in this case from an approach called SCET, and at first glance didn’t seem to do much to reach out to the non-SCET people in the audience. Frequent past collaborator of mine Andrew McLeod showed off a new set of relations between singularities of amplitudes, found by digging into the structure of the equations discovered by Landau that control this behavior. He and his collaborators are proposing a new way to keep track of these things involving “minimal cuts”, a clear pun on the “maximal cuts” that have been of great use to other parts of the community. Whether this has more or less staying power than “negative geometries” remains to be seen.

Closing Tuesday, Shruti Paranjape showed there was more to discover about the simplest amplitudes, called “tree amplitudes”. By asking why these amplitudes are sometimes equal to zero, she was able to draw a connection to the “double-copy” structure that links the theory of the strong force and the theory of gravity. Johannes Henn’s talk highlighted an intriguing pattern. A while back, I had looked into the circumstances under which amplitudes are positive. Henn found that “positive” is an understatement. In a certain region, the amplitudes we were looking at turn out to be not just positive, but always decreasing, with a second derivative that is always positive. In fact, the derivatives appear to alternate in sign, one sign or the other each time you take another derivative. Henn is calling this unusual property “completely monotonous”, and trying to figure out how widely it holds.
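For the mathematically inclined, that verbal description matches the textbook notion of a completely monotone function, which I’ll spell out here (my addition, not something from the talk or the slides):

```latex
% A function f is completely monotone on an interval if all of its
% derivatives exist there and alternate in sign:
(-1)^n \, f^{(n)}(x) \;\ge\; 0 \qquad \text{for all } n = 0, 1, 2, \ldots
% n = 0 is positivity, n = 1 says f is decreasing, n = 2 says the second
% derivative is positive, and so on, alternating forever.
```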

Wednesday had a more mathematical theme. Bernd Sturmfels began with a “review talk” that largely focused on his own work on the space of curves with marked points, including a surprising analogy between amplitudes and the likelihood functions one needs to minimize in machine learning. Lauren Williams was the other “actual mathematician” of the day, and covered her work on various topics related to the amplituhedron.

The remaining talks on Wednesday were not literally by mathematicians, but were “mathematically informed”. Carolina Figueiredo and Hayden Lee talked about work with Nima Arkani-Hamed on different projects. Figueiredo’s talk covered recent developments in the “curve integral formalism”, the latest step in Nima’s quest to geometrize everything in sight, this time in the context of more realistic theories. The talk, which like Nima’s own talks used tablet-written slides, described new insights one can gain from this picture, including new pictures of how more complicated amplitudes can be built up from simpler ones. If you want to understand the curve integral formalism further, I’d actually suggest instead looking at Mark Spradlin’s slides from later that day. The second part of Spradlin’s talk dealt with an area Figueiredo marked for future research, including fermions in the curve integral picture. I confess I’m still not entirely sure what the curve integral formalism is good for, but Spradlin’s talk gave me a better idea of what it’s doing. (The first part of his talk was on a different topic, exploring the space of string-like amplitudes to figure out which ones are actually consistent.)

Hayden Lee’s talk mentions the emergence of time, but the actual story is a bit more technical. Lee and collaborators are looking at cosmological correlators, observables like scattering amplitudes but for cosmology. Evaluating these is challenging with standard techniques, but can be approached with some novel diagram-based rules which let the results be described in terms of the measurable quantities at the end in a kind of “amplituhedron-esque” way.

Aidan Herderschee and Mariana Carrillo González had talks on Wednesday on ways of dealing with curved space. Herderschee talked about how various amplitudes techniques need to be changed to deal with amplitudes in anti-de Sitter space, with difference equations replacing differential equations and summation-by-parts relations replacing integration-by-parts relations. Carrillo González looked at curved space through the lens of a special kind of toy model theory called a self-dual theory, which allowed her to do cosmology-related calculations using a double-copy technique.

Finally, Stephen Sharpe had the second review talk on Wednesday. This was another “outside guest” talk, a discussion from someone who does Lattice QCD about how they have been using their methods to calculate scattering amplitudes. They seem to count the number of particles a bit differently than we do; I’m curious whether this came up in the question session.

Gravity-Defying Theories

Universal gravitation was arguably Newton’s greatest discovery. Newton realized that the same laws could describe the orbits of the planets and the fall of objects on Earth, that bodies like the Moon can be fully understood only if you take into account both the Earth’s and the Sun’s gravity. In a Newtonian world, every mass attracts every other mass in a tiny but detectable way.

Einstein, in turn, explained why. In Einstein’s general theory of relativity, gravity comes from the shape of space and time. Mass attracts mass, but energy affects gravity as well. Anything that can be measured has a gravitational effect, because the shape of space and time is nothing more than the rules by which we measure distances and times. So gravitation really is universal, and has to be universal.

…except when it isn’t.

It turns out, physicists can write down theories with some odd properties. Including theories where things are, in a certain sense, immune to gravity.

The story started with two mathematicians, Shiing-Shen Chern and Jim Simons. Chern and Simons weren’t trying to say anything in particular about physics. Instead, they cared about classifying different types of mathematical space. They found a formula that, when added up over one of these spaces, counted some interesting properties of that space. A bit more specifically, it told them about the space’s topology: rough details, like the number of holes in a donut, that stay the same even if the space is stretched or compressed. Their formula was called the Chern-Simons Form.

The physicist Albert Schwarz saw this Chern-Simons Form, and realized it could be interpreted another way. He looked at it as a formula describing a quantum field, like the electromagnetic field, describing how the field’s energy varied across space and time. He called the theory describing the field Chern-Simons Theory, and it was one of the first examples of what would come to be known as topological quantum field theories.
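For readers who want to see the formula itself, here is the standard (non-abelian) Chern-Simons action, written for a gauge field A living on a three-dimensional space M with coupling k; the details aren’t needed for the rest of the post:

```latex
S_{\mathrm{CS}}[A] \;=\; \frac{k}{4\pi} \int_M \mathrm{tr}\!\left( A \wedge dA \;+\; \tfrac{2}{3}\, A \wedge A \wedge A \right)
% Notice that no metric appears anywhere: the formula only uses the field A
% and the space M itself, which is the technical reason the resulting
% theory turns out to be topological.
```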

In a topological field theory, every question you might want to ask can be answered in a topological way. Write down the chance you observe the fields at particular strengths in particular places, and you’ll find that the answer you get only depends on the topology of the space the fields occupy. The answers are the same if the space is stretched or squished together. That means that nothing you ask depends on the details of how you measure things, that nothing depends on the detailed shape of space and time. Your theory is, in a certain sense, independent of gravity.

Others discovered more theories of this kind. Edward Witten found theories that at first looked like they depend on gravity, but where the gravity secretly “cancels out”, making the theory topological again. It turned out that there were many ways to “twist” string theory to get theories of this kind.

Our world is for the most part not described by a topological theory: gravity matters! (Though it can be a good approximation for describing certain materials.) These theories are most useful, though, in how they allow physicists and mathematicians to work together. Physicists don’t have a fully mathematically rigorous way of defining most of their theories, just a series of approximations and an overall picture that’s supposed to tie them together. For a topological theory, though, that overall picture has a rigorous mathematical meaning: it counts topological properties! As such, topological theories allow mathematicians to prove rigorous results about physical theories. It means they can take a theory of quantum fields or strings that has a particular property that physicists are curious about, and find a version of that property that they can study in fully mathematically rigorous detail. It’s been a boon both to mathematicians interested in topology, and to physicists who want to know more about their theories.

So while you won’t have antigravity boots any time soon, theories that defy gravity are still useful!

At Quanta This Week, and Some Bonus Material

When I moved back to Denmark, I mentioned that I was planning to do more science journalism work. The first fruit of that plan is up this week: I have a piece at Quanta Magazine about a perennially trendy topic in physics, the S-matrix.

It’s been great working with Quanta again. They’ve been thorough, attentive to the science, and patient with my still-uncertain life situation. I’m quite likely to have more pieces there in future, and I’ve got ideas cooking with other outlets as well, so stay tuned!

My piece with Quanta is relatively short, the kind of thing they used to label a “blog” rather than say a “feature”. Since the S-matrix is a pretty broad topic, there were a few things I couldn’t cover there, so I thought it would be nice to discuss them here. You can think of this as a kind of “bonus material” section for the piece. So before reading on, read my piece at Quanta first!

Welcome back!

At Quanta I wrote a kind of cartoon of the S-matrix, asking you to think about it as a matrix of probabilities, with rows for input particles and columns for output particles. There are a couple different simplifications I snuck in there, the pop physicist’s “lies to children”. One, I already flag in the piece: the entries aren’t really probabilities, they’re complex numbers, probability amplitudes.
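To spell the relationship out (this is standard quantum mechanics, not something specific to the piece): each entry of the S-matrix is a complex amplitude, and the probability you would actually measure is its squared magnitude,

```latex
P(\text{in} \to \text{out}) \;=\; \bigl|\, S_{\text{out},\,\text{in}} \,\bigr|^{2},
\qquad
S^{\dagger} S \;=\; \mathbb{1}.
% The second condition, unitarity, is what guarantees that the probabilities
% of all possible outcomes of a given collision add up to one.
```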

There’s another simplification that I didn’t have space to flag. The rows and columns aren’t just lists of particles, they’re lists of particles in particular states.

What do I mean by states? A state is a complete description of a particle. A particle’s state includes its energy and momentum, including the direction it’s traveling in. It includes its spin, and the direction of its spin: for example, clockwise or counterclockwise? It also includes any charges, from the familiar electric charge to the color of a quark.

This makes the matrix even bigger than you might have thought. I was already describing an infinite matrix, one where you can have as many columns and rows as you can imagine numbers of colliding particles. But the number of rows and columns isn’t just infinite, but uncountable, as many rows and columns as there are different numbers you can use for energy and momentum.

For some of you, an uncountably infinite matrix doesn’t sound much like a matrix. But for mathematicians familiar with vector spaces, this is totally reasonable. Even if your matrix is infinite, or even uncountably infinite, it can still be useful to think about it as a matrix.

Another subtlety, which I’m sure physicists will be howling at me about: the Higgs boson is not supposed to be in the S-matrix!

In the article, I alluded to the idea that the S-matrix lets you “hide” particles that only exist momentarily inside of a particle collision. The Higgs is precisely that sort of particle, an unstable particle. And normally, the S-matrix is supposed to only describe interactions between stable particles, particles that can survive all the way to infinity.

In my defense, if you want a nice table of probabilities to put in an article, you need an unstable particle: interactions between stable particles depend on their energy and momentum, sometimes in complicated ways, while a single unstable particle will decay into a reliable set of options.

More technically, there are also contexts in which it’s totally fine to think about an S-matrix between unstable particles, even if it’s not usually how we use the idea.

My piece also didn’t have a lot of room to discuss new developments. I thought at minimum I’d say a bit more about the work of the young people I mentioned. You can think of this as an appetizer: there are a lot of people working on different aspects of this subject these days.

Part of the initial inspiration for the piece was when an editor at Quanta noticed a recent paper by Christian Copetti, Lucía Cordova, and Shota Komatsu. The paper shows an interesting case, where one of the “logical” conditions imposed in the original S-matrix bootstrap doesn’t actually apply. It ended up being too technical for the Quanta piece, but I thought I could say a bit about it, and related questions, here.

Some of the conditions imposed by the original bootstrappers seem unavoidable. Quantum mechanics makes no sense if it doesn’t compute probabilities, and probabilities can’t be negative or larger than one, so we’d better have an S-matrix that obeys those rules. Causality is another big one: we probably shouldn’t have an S-matrix that lets us send messages back in time and change the past.

Other conditions came from a mixture of intuition and observation. Crossing is a big one here. Crossing tells you that you can take an S-matrix entry with incoming particles and relate it to a different S-matrix entry with outgoing antiparticles, using techniques from the calculus of complex numbers.
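Schematically, and in my own notation rather than anything from the piece, the two-to-two version of the statement looks like this:

```latex
% For a collision 1 + 2 -> 3 + 4, define the Mandelstam invariants
s = (p_1 + p_2)^2, \qquad t = (p_1 - p_3)^2 .
% Crossing relates the amplitude for 1 + 2 -> 3 + 4 to the amplitude for
% the "crossed" process 1 + \bar{3} -> \bar{2} + 4, in which particle 3 is
% moved to the other side as its antiparticle: both are values of a single
% analytic function of s and t, reached from one another by continuing
% those variables through the complex plane.
```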

Crossing may seem quite obscure, but after some experience with S-matrices it feels obvious and intuitive. That’s why for an expert, results like the paper by Copetti, Cordova, and Komatsu seem so surprising. What they found was that a particularly exotic type of symmetry, called a non-invertible symmetry, was incompatible with crossing symmetry. They could find consistent S-matrices for theories with these strange non-invertible symmetries, but only if they threw out one of the basic assumptions of the bootstrap.

This was weird, but upon reflection not too weird. In theories with non-invertible symmetries, the behaviors of different particles are correlated with each other. One can’t treat far-away particles as separate, the way one usually does with the S-matrix. So trying to “cross” a particle from one side of a process to another changes more than it usually would, and you need a more sophisticated approach to keep track of it. When I talked to Cordova and Komatsu, they related this to another concept called soft theorems, aspects of which have been getting a lot of attention and funding of late.

In the meantime, others have been trying to figure out where the crossing rules come from in the first place.

There were attempts in the 1970’s to understand crossing in terms of other fundamental principles. They slowed in part because, as the original S-matrix bootstrap was overtaken by QCD, there was less motivation to do this type of work anymore. But they also ran into a weird puzzle. When they tried to use the rules of crossing more broadly, only some of the things they found looked like S-matrices. Others looked like stranger, meaningless calculations.

A recent paper by Simon Caron-Huot, Mathieu Giroux, Holmfridur Hannesdottir, and Sebastian Mizera revisited these meaningless calculations, and showed that they aren’t so meaningless after all. In particular, some of them match well to the kinds of calculations people wanted to do to predict gravitational waves from colliding black holes.

Imagine a pair of black holes passing close to each other, then scattering away in different directions. Unlike particles in a collider, we have no hope of catching the black holes themselves. They’re big classical objects, and they will continue far away from us. We do catch gravitational waves, emitted from the interaction of the black holes.

This different setup turns out to give the problem a very different character. It ends up meaning that instead of the S-matrix, you want a subtly different mathematical object, one related to the original S-matrix by crossing relations. Using crossing, Caron-Huot, Giroux, Hannesdottir and Mizera found many different quantities one could observe in different situations, linked by the same rules that the original S-matrix bootstrappers used to relate S-matrix entries.

The work of these two groups is just some of the work done in the new S-matrix program, but it’s typical of where the focus is going. People are trying to understand the general rules found in the past. They want to know where they came from, and as a consequence, when they can go wrong. They have a lot to learn from the older papers, and a lot of new insights come from diligent reading. But they also have a lot of new insights to discover, based on the new tools and perspectives of the modern day. For the most part, they don’t expect to find a new unified theory of physics from bootstrapping alone. But by learning how S-matrices work in general, they expect to find valuable knowledge no matter how the future goes.

The Impact of Jim Simons

The obituaries have been weirdly relevant lately.

First, a couple weeks back, Daniel Dennett died. Dennett was someone who could have had a huge impact on my life. Growing up combatively atheist in the early 2000’s, Dennett seemed to be exploring every question that mattered: how the semblance of consciousness could come from non-conscious matter, how evolution gives rise to complexity, how to raise a new generation to grow beyond religion and think seriously about the world around them. I went to Tufts to get my bachelor’s degree based on a glowing description he wrote in the acknowledgements of one of his books, and after getting there, I asked him to be my advisor.

(One of three, because the US education system, like all good games, can be min-maxed.)

I then proceeded to be far too intimidated to have a conversation with him more meaningful than “can you please sign my registration form?”

I heard a few good stories about Dennett while I was there, and I saw him debate once. I went into physics for my PhD, not philosophy.

Jim Simons died on May 10. I never spoke to him at all, not even to ask him to sign something. But he had a much bigger impact on my life.

I began my PhD at SUNY Stony Brook with a small scholarship from the Simons Foundation. The university’s Simons Center for Geometry and Physics had just opened, a shining edifice of modern glass next to the concrete blocks of the physics and math departments.

For a student aspiring to theoretical physics, the Simons Center virtually shouted a message. It taught me that physics, and especially theoretical physics, was something prestigious, something special. That if I kept going down that path I could stay in that world of shiny new buildings and daily cookie breaks with the occasional fancy jar-based dessert, of talks by artists and a café with twenty-dollar lunches (half-price once a week for students, the only time we could afford it, and still about twice what we paid elsewhere on campus). There would be garden parties with sushi buffets and late conference dinners with cauliflower steaks and watermelon salads. If I was smart enough (and I longed to be smart enough), that would be my future.

Simons and his foundation clearly wanted to say something along those lines, if not quite as filtered by the stars in a student’s eyes. He thought that theoretical physics, and research more broadly, should be something prestigious. That his favored scholars deserved more, and should demand more.

This did have weird consequences sometimes. One year, the university charged us an extra “academic excellence fee”. The story we heard was that Simons had demanded Stony Brook increase its tuition in order to accept his donations, so that it would charge more similarly to more prestigious places. As a state university, Stony Brook couldn’t do that…but it could add an extra fee. And since PhD students got their tuition, but not fees, paid by the department, we were left with an extra dent in our budgets.

The Simons Foundation created Quanta Magazine. If the Simons Center used food to tell me physics mattered, Quanta delivered the same message to professors through journalism. Suddenly, someone was writing about us, not just copying press releases but with the research and care of an investigative reporter. And they wrote about everything: not just sci-fi stories and cancer cures but abstract mathematics and the space of quantum field theories. Professors who had spent their lives straining to capture the public’s interest suddenly were shown an audience that actually wanted the real story.

In practice, the Simons Foundation made its decisions through the usual experts and grant committees. But the way we thought about it, the decisions always had a Jim Simons flavor. When others in my field applied for funding from the Foundation, they debated what Simons would want: would he support research on predictions for the LHC and LIGO? Or would he favor links to pure mathematics, or hints towards quantum gravity? Simons Collaboration Grants have an enormous impact on theoretical physics, dwarfing many other sources of funding. A grant funds an army of postdocs across the US, shifting the priorities of the field for years at a time.

Denmark has big foundations that have an outsize impact on science: Carlsberg, Villum, and Novo Nordisk (a company worth more than Denmark’s GDP) all have foundations with a major influence on scientific priorities. But Denmark is a country of six million. It’s much harder to have that kind of influence on a country of three hundred million. Despite that, Simons came surprisingly close.

While we did like to think of the Foundation’s priorities as Simons’, I suspect that it will continue largely on the same track without him. Quanta Magazine is editorially independent, and clearly puts its trust in the journalists that made it what it is today.

I didn’t know Simons, I don’t think I even ever smelled one of his famous cigars. Usually, that would be enough to keep me from writing a post like this. But, through the Foundation, and now through Quanta, he’s been there with me the last fourteen years. That’s worth a reflection, at the very least.

Generalizing a Black Box Theory

In physics and in machine learning, we have different ways of thinking about models.

A model in physics, like the Standard Model, is a tool to make predictions. Using statistics and a whole lot of data (from particle physics experiments), we fix the model’s free parameters (like the mass of the Higgs boson). The model then lets us predict what we’ll see next: when we turn on the Large Hadron Collider, what will the data look like? In physics, when a model works well, we think that model is true, that it describes the real way the world works. The Standard Model isn’t the ultimate truth: we expect that a better model exists that makes better predictions. But it is still true, in an in-between kind of way. There really are Higgs bosons, even if they’re a result of some more mysterious process underneath, just like there really are atoms, even if they’re made out of protons, neutrons, and electrons.
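As a toy illustration of what “fixing the model’s free parameters” means in practice (my own sketch, with made-up numbers and a deliberately simple stand-in model, nothing like the real Standard Model fit):

```python
# Toy sketch of fitting a model's free parameters to data.
# The "model" is just a bump whose position and width are the free
# parameters -- a cartoon of pinning down something like a particle's mass.
import numpy as np
from scipy.optimize import curve_fit

def model(energy, peak_position, width):
    """A single bump: our stand-in 'theory' with two free parameters."""
    return np.exp(-((energy - peak_position) / width) ** 2)

rng = np.random.default_rng(0)
energies = np.linspace(100.0, 150.0, 200)
data = model(energies, 125.0, 4.0) + 0.02 * rng.standard_normal(energies.size)  # noisy "measurements"

# curve_fit adjusts the parameters until the model matches the data as closely as possible.
(best_position, best_width), covariance = curve_fit(model, energies, data, p0=[120.0, 5.0])
print(f"fitted peak position ~ {best_position:.1f}")
print(f"fitted width         ~ {abs(best_width):.1f}")
```

The real thing involves vastly more data and far more sophisticated statistics, but the basic move (adjust the unknown numbers until the predictions line up with the measurements) is the same.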

A model in machine learning, like the Large Language Model that fuels ChatGPT, is also a tool to make predictions. Using statistics and a whole lot of data (from text on the internet, or images, or databases of proteins, or games of chess…) we fix the model’s free parameters (called weights, numbers for the strengths of connections between metaphorical neurons). The model then lets us predict what we’ll see next: when a text begins “Q: How do I report a stolen card? A:”, how does it end?

So far, that sounds a lot like physics. But in machine learning, we don’t generally think these models are true, at least not in the same way. The thing producing language isn’t really a neural network like a Large Language Model. It’s the sum of many human brains, many internet users, spread over many different circumstances. Each brain might be sort of like a neural network, but they’re not like the neural networks sitting on OpenAI’s servers. A Large Language Model isn’t true in some in-between kind of way, like atoms or Higgs bosons. It just isn’t true. It’s a black box, a machine that makes predictions, and nothing more.

But here’s the rub: what do we mean by true?

I want to be a pragmatist here. I don’t want to get stuck in a philosophical rabbit-hole, arguing with metaphysicists about what “really exists”. A true theory should be one that makes good predictions, that lets each of us know, based on our actions, what we should expect to see. That’s why science leads to technology, why governments and companies pay people to do it: because the truth lets us know what will happen, and make better choices. So if Large Language Models and the Standard Model both make good predictions, why is only one of them true?

Recently, I saw Dan Elton of More is Different make the point that there is a practical reason to prefer the “true” explanations: they generalize. A Large Language Model might predict what words come next in a text. But it doesn’t predict what happens when you crack someone’s brain open and see how the neurons connect to each other, even if that person is the one who made the text. A good explanation, a true model, can be used elsewhere. The Standard Model tells you what data from the Large Hadron Collider will look like, but it also tells you what data from the muon g-2 experiment will look like. It also, in principle, tells you things far away from particle physics: what stars look like, what atoms look like, what the inside of a nuclear reactor looks like. A black box can’t do that, even if it makes great predictions.

It’s a good point. But thinking about it, I realized things are a little murkier.

You can’t generalize a Large Language Model to tell you how human neurons are connected. But you can generalize it in other ways, and people do. There’s a huge industry in trying to figure out what GPT and its relatives “know”. How much math can they do? How much do they know about geography? Can they predict the future?

These generalizations don’t work the way that they do in physics, or the rest of science, though. When we generalize the Standard Model, we aren’t taking a machine that makes particle physics predictions and trying to see what those particle physics predictions can tell us. We’re taking something “inside” the machine, the fields and particles, and generalizing that, seeing how the things around us could be made of those fields and those particles. In contrast, when people generalize GPT, they typically don’t look inside the “black box”. They use the Large Language Model to make predictions, and see what those predictions “know about”.

On the other hand, we do sometimes generalize scientific models that way too.

If you’re simulating the climate, or a baby star, or a colony of bacteria, you typically aren’t using your simulation like a prediction machine. You don’t plug in exactly what is going on in reality, then ask what happens next. Instead, you run many simulations with different conditions, and look for patterns. You see how a cloud of sulfur might cool down the Earth, or how baby stars often form in groups, leading them to grow up into systems of orbiting black holes. Your simulation is kind of like a black box, one that you try out in different ways until you uncover some explainable principle, something your simulation “knows” that you can generalize.
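The workflow is easy to caricature in code. Here is a deliberately crude sketch (the model and numbers are invented for illustration, nothing like a real climate code): run the same toy simulation over a range of conditions and read off the trend.

```python
# Schematic "simulation as a black box" workflow: run a toy model over a
# grid of conditions and look for a pattern in the outputs.
# The model is invented for illustration: a crude energy-balance toy where
# injected aerosol reflects some sunlight. It is not a real climate model.
import numpy as np

def toy_climate(aerosol_fraction, years=50):
    """Return a final temperature anomaly (in made-up units) for a given aerosol level."""
    temperature = 0.0
    for _ in range(years):
        warming = 0.04 * (1.0 - aerosol_fraction)  # yearly push upward, reduced by aerosols
        relaxation = 0.01 * temperature            # slow drift back toward the baseline
        temperature += warming - relaxation
    return temperature

# Sweep over conditions, then look for the pattern in the results.
for fraction in np.linspace(0.0, 0.5, 6):
    print(f"aerosol fraction {fraction:.1f} -> temperature anomaly {toy_climate(fraction):+.2f}")
```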

And isn’t nature that kind of black box, too? When we do an experiment, aren’t we just doing what the Large Language Models are doing, prompting the black box in different ways to get an idea of what it knows? Are scientists who do experiments that picky about finding out what’s “really going on”, or do they just want a model that works?

We want our models to be general, and to be usable. Building a black box can’t be the whole story, because a black box, by itself, isn’t general. But it can certainly be part of the story. Going from the black box of nature to the black box of a machine lets you run tests you couldn’t previously do, lets you investigate faster and ask stranger questions. With a simulation, you can blow up stars. With a Large Language Model, you can ask, for a million social media comments, whether the average internet user would call them positive or negative. And if you make sure to generalize, and try to make better decisions, then it won’t be just the machine learning. You’ll be learning too.

What Are Particles? The Gentle Introduction

On this blog, I write about particle physics for the general public. I try to make things as simple as possible, but I do have to assume some things. In particular, I usually assume you know what particles are!

This time, I won’t do that. I know some people out there don’t know what a particle is, or what particle physicists do. If you’re a person like that, this post is for you! I’m going to give a gentle introduction to what particle physics is all about.

Let’s start with atoms.

Every object and substance around you, everything you can touch or lift or walk on, the water you drink and the air you breathe, all of these are made up of atoms. Some are simple: an iron bar is made of Iron atoms, aluminum foil is mostly Aluminum atoms. Some are made of combinations of atoms into molecules, like water’s famous H2O: each molecule has two Hydrogen atoms and one Oxygen atom. Some are made of more complicated mixtures: air is mostly pairs of Nitrogen atoms, with a healthy amount of pairs of Oxygen, some Carbon Dioxide (CO2), and many other things, while the concrete sidewalks you walk on have Calcium, Silicon, Aluminum, Iron, and Oxygen, all combined in various ways.

There is a dizzying array of different types of atoms, called chemical elements. Most occur in nature, but some are man-made, created by cutting-edge nuclear physics. They can all be organized in the periodic table of elements, which you’ve probably seen on a classroom wall.

The periodic table

The periodic table is called the periodic table because it repeats, periodically. Each element is different, but their properties resemble each other. Oxygen is a gas, Sulfur a yellow powder, Polonium an extremely radioactive metal…but just as you can find H2O, you can make H2S, and even H2Po. The elements get heavier as you go down the table, and more metal-like, but their chemical properties, the kinds of molecules you can make with them, repeat.

Around 1900, physicists started figuring out why the elements repeat. What they discovered is that each atom is made of smaller building-blocks, called sub-atomic particles. (“Sub-atomic” because they’re smaller than atoms!) Each atom has electrons on the outside, and on the inside has a nucleus made of protons and neutrons. Atoms of different elements have different numbers of protons and electrons, which explains their different properties.

Different atoms with different numbers of protons, neutrons, and electrons

Around the same time, other physicists studied electricity, magnetism, and light. These things aren’t made up of atoms, but it was discovered that they are all aspects of the same force, the electromagnetic force. And starting with Einstein, physicists figured out that this force has particles too. A beam of light is made up of another type of sub-atomic particle, called a photon.

For a little while then, it seemed that the universe was beautifully simple. All of matter was made of electrons, protons, and neutrons, while light was made of photons.

(There’s also gravity, of course. That’s more complicated, in this post I’ll leave it out.)

Soon, though, nuclear physicists started noticing stranger things. In the 1930’s, as they tried to understand the physics behind radioactivity and mapped out rays from outer space, they found particles that didn’t fit the recipe. Over the next forty years, theoretical physicists puzzled over their equations, while experimental physicists built machines to slam protons and electrons together, all trying to figure out how they work.

Finally, in the 1970’s, physicists had a theory they thought they could trust. They called this theory the Standard Model. It organized their discoveries, and gave them equations that could predict what future experiments would see.

In the Standard Model, there are two new forces, the weak nuclear force and the strong nuclear force. Just like photons for the electromagnetic force, each of these new forces has a particle. The general word for these particles is bosons, named after Satyendra Nath Bose, a collaborator of Einstein who figured out the right equations for this type of particle. The weak force has bosons called W and Z, while the strong force has bosons called gluons. A final type of boson, called the Higgs boson after a theorist who suggested it, rounds out the picture.

The Standard Model also has new types of matter particles. Neutrinos interact with the weak nuclear force, and are so light and hard to catch that they pass through nearly everything. Quarks are inside protons and neutrons: a proton contains one down quark and two up quarks, while a neutron contains two down quarks and one up quark. The quarks explained all of the other strange particles found in nuclear physics.
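One quick check that this bookkeeping works (standard textbook arithmetic, not something from the original post): add up the electric charges of the quarks.

```latex
% Up quarks carry electric charge +2/3 and down quarks -1/3, in units of
% the proton's charge. Adding up the quarks inside each particle gives:
\text{proton } (u\,u\,d):\quad \tfrac{2}{3} + \tfrac{2}{3} - \tfrac{1}{3} = +1,
\qquad
\text{neutron } (u\,d\,d):\quad \tfrac{2}{3} - \tfrac{1}{3} - \tfrac{1}{3} = 0.
```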

Finally, the Standard Model, like the periodic table, repeats. There are three generations of particles. The first, with electrons, up quarks, down quarks, and one type of neutrino, shows up in ordinary matter. The other generations are heavier, and not usually found in nature except in extreme conditions. The second generation has muons (similar to electrons), strange quarks, charm quarks, and a new type of neutrino called a muon-neutrino. The third generation has tauons, bottom quarks, top quarks, and tau-neutrinos.

(You can call these last quarks “truth quarks” and “beauty quarks” instead, if you like.)

Physicists had the equations, but the equations still had some unknowns. They didn’t know how heavy the new particles were, for example. Finding those unknowns took more experiments, over the next forty years. Finally, in 2012, the last unknown was found when a massive machine called the Large Hadron Collider was used to measure the Higgs boson.

The Standard Model

We think that these particles are all elementary particles. Unlike protons and neutrons, which are both made of up quarks and down quarks, we think that the particles of the Standard Model are not made up of anything else, that they really are elementary building-blocks of the universe.

We have the equations, and we’ve found all the unknowns, but there is still more to discover. We haven’t seen everything the Standard Model can do: to see some properties of the particles and check that they match, we’d need a new machine, one even bigger than the Large Hadron Collider. We also know that the Standard Model is incomplete. There is at least one new particle, the one responsible for dark matter, that can’t be any of the known particles. Mysteries involving the neutrinos imply another type of unknown particle. We’re also missing deeper things. There are patterns in the table, like the generations, that we can’t explain.

We don’t know if any one experiment will work, or if any one theory will prove true. So particle physicists keep working, trying to find new tricks and make new discoveries.

Models, Large Language and Otherwise

In particle physics, our best model goes under the unimaginative name “Standard Model“. The Standard Model models the world in terms of interactions of different particles, or more properly quantum fields. The fields have different masses and interact with different strengths, and each mass and interaction strength is a parameter: a “free” number in the model, one we have to fix based on data. There are nineteen parameters in the Standard Model (not counting the parameters for massive neutrinos, which were discovered later).

In principle, we could propose a model with more parameters that fit the data better. With enough parameters, one can fit almost anything. That’s cheating, though, and it’s a type of cheating we know how to catch. We have statistical tests that let us estimate how impressed we should be when a model matches the data. If a model is just getting ahead on extra parameters without capturing something real, we can spot that, because it gets a worse score on those tests. A model with a bad score might match the data you used to fix its parameters, but it won’t predict future data, so it isn’t actually useful. Right now the Standard Model (plus neutrino masses) gets the best score on those tests, when fitted to all the data we have access to, so we think of it as our best and most useful model. If someone proposed a model that got a better score, we’d switch: but so far, no-one has managed.
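As a toy version of such a test (my own sketch, not the actual statistical machinery particle physicists run): score each model by how well it fits, minus a penalty for every extra parameter, for instance with the Akaike information criterion.

```python
# Toy version of "score a model, penalizing extra parameters".
# Fit the same fake data with a 2-parameter model and a 9-parameter model,
# then compare them with the Akaike information criterion (AIC):
# lower is better, and every extra parameter costs 2 points.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
y = 2.0 * x + rng.normal(scale=0.1, size=x.size)   # the "data": a straight line plus noise

def aic_for_polynomial_fit(degree):
    coefficients = np.polyfit(x, y, degree)         # least-squares fit of a degree-n polynomial
    residuals = y - np.polyval(coefficients, x)
    n, k = x.size, degree + 1                       # number of data points, number of parameters
    # Gaussian log-likelihood, up to constants: n * log(mean squared residual)
    return n * np.log(np.mean(residuals**2)) + 2 * k

print("straight line (2 parameters):        AIC =", round(aic_for_polynomial_fit(1), 1))
print("degree-8 polynomial (9 parameters):  AIC =", round(aic_for_polynomial_fit(8), 1))
# The wiggly polynomial hugs this particular data set a little more closely,
# but the penalty for its extra parameters typically leaves it with the
# worse (higher) score: the test catches the "cheating".
```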

Physicists care about this not just because a good model is useful. We think that the best model is, in some sense, how things really work. The fact that the Standard Model fits the data best doesn’t just mean we can use it to predict more data in the future: it means that somehow, deep down, the world is made up of quantum fields the way the Standard Model describes.

If you’ve been following developments in machine learning, or AI, you might have heard the word “model” slung around. For example, GPT is a Large Language Model, or LLM for short.

Large Language Models are more like the Standard Model than you might think. Just as the Standard Model models the world in terms of interacting quantum fields, Large Language Models model the world in terms of a network of connections between artificial “neurons”. Just as particles have different interaction strengths, pairs of neurons have different connection weights. Those connection weights are the parameters of a Large Language Model, in the same way that the masses and interaction strengths of particles are the parameters of the Standard Model. The parameters for a Large Language Model are fixed by a giant corpus of text data, almost the whole internet reduced to a string of bytes that the LLM needs to match, in the same way the Standard Model needs to match data from particle collider experiments. The Standard Model has nineteen parameters, Large Language Models have billions.

Increasingly, machine learning models seem to capture things better than other types of models. If you want to know how a protein is going to fold, you can try to make a simplified model of how its atoms and molecules interact with each other…but instead, you can make your model a neural network. And that turns out to work better. If you’re a bank and you want to know how many of your clients will default on their loans, you could ask an economist to make a macroeconomic model…or, you can just make your model a neural network too.

In physics, we think that the best model is the model that is closest to reality. Clearly, though, this can’t be what’s going on here. Real proteins don’t fold based on neural networks, and neither do real economies. Both economies and folding proteins are very complicated, so any model we can use right now won’t be what’s “really going on”, unlike the comparatively simple world of particle physics. Still, it seems weird that, compared to the simplified economic or chemical models, neural networks can work better, even if they’re very obviously not really what’s going on. Is there another way to think about them?

I used to get annoyed at people using the word “AI” to refer to machine learning models. In my mind, AI was the thing that shows up in science fiction, machines that can think as well or better than humans. (The actual term of art for this is AGI, artificial general intelligence.) Machine learning, and LLMs in particular, felt like a meaningful step towards that kind of AI, but they clearly aren’t there yet.

Since then, I’ve been convinced that the term isn’t quite so annoying. The AI field isn’t called AI because researchers there are creating a human-equivalent sci-fi intelligence. It’s called AI because the things they build are inspired by how human intelligence works.

As humans, we model things with mathematics, but we also model them with our own brains. Consciously, we might think about objects and their places in space, about people and their motivations and actions, about canonical texts and their contents. But all of those things cash out in our neurons. Anything we think, anything we believe, any model we can actually apply by ourselves in our own lives, is a model embedded in a neural network. It’s quite a bit more complicated neural network than an LLM, but it’s very much still a kind of neural network.

Because humans are alright at modeling a variety of things, because we can see and navigate the world and persuade and manipulate each other, we know that neural networks can do these things. A human brain may not be the best model for any given phenomenon: an engineer can model the flight of a baseball with math much better than the best baseball player can with their unaided brain. But human brains still tend to be fairly good models for a wide variety of things. Evolution has selected them to be.

So with that in mind, it shouldn’t be too surprising that neural networks can model things like protein folding. Even if proteins don’t fold based on neural networks, even if the success of AlphaFold isn’t capturing the actual details of the real world the way the Standard Model does, the model is capturing something. It’s loosely capturing the way a human would think about the problem, if you gave that human all the data they needed. And humans are, and remain, pretty good at thinking! So we have reason, not rigorous, but at least intuitive reason, to think that neural networks will actually be good models of things.

What Referees Are For

This week, we had a colloquium talk by the managing editor of the Open Journal of Astrophysics.

The Open Journal of Astrophysics is an example of an arXiv overlay journal. In the old days, journals shouldered the difficult task of compiling scientists’ work into a readable format and sending it to university libraries all over the world so people could stay up to date with the work of distant colleagues. They used to charge libraries for the journals; now some instead charge authors per paper they want to publish.

Now, most of that is unnecessary due to online resources, in my field the arXiv. We prepare our papers using free tools like LaTeX, then upload them to arXiv.org, a website that makes the papers freely accessible for everybody. I don’t think I’ve ever read a paper in a physical journal in my field, and I only check journal websites if I think there’s a mistake in the arXiv version. The rest of the time, I just use the arXiv.

Still, journals do one thing the arXiv doesn’t do, and that’s refereeing. Each paper a journal receives is sent out to a few expert referees. The referees read the paper, and either reject it, accept it as-is, or demand changes before they can accept it. The journal then publishes accepted papers only.

The goal of arXiv overlay journals is to make this feature of journals also unnecessary. To do this, they notice that if every paper is already on arXiv, they don’t need to host papers or print them or typeset them. They just need to find suitable referees, and announce which papers passed.

The Open Journal of Astrophysics is a relatively small arXiv overlay journal. They operate quite cheaply, in part because the people running it can handle most of it as a minor distraction from their day job. SciPost is much bigger, and has to spend more per paper to operate. Still, it spends a lot less than journals charge authors.

We had a spirited discussion after the talk, and someone brought up an interesting point: why do we need to announce which papers passed? Can’t we just publish everything?

What, in the end, are the referees actually for? Why do we need them?

One function of referees is to check for mistakes. This is most important in mathematics, where referees might spend years making sure every step in a proof works as intended. Other fields vary, from theoretical physics (where we can check some things sometimes, but often have to make do with spotting poorly explained parts of a calculation), to fields that do experiments in the real world (where referees can look for warning signs and shady statistics, but won’t actually reproduce the experiment). A mistake found by a referee can be a boon to not just the wider scientific community, but to the author as well. Most scientists would prefer their papers to be correct, so we’re often happy to hear about a genuine mistake.

If this were all referees were for, though, then you wouldn’t actually need to reject any papers. As a colleague of mine suggested, you would just need the referees to publish their reports. Then the papers could be published along with comments from the referees, and possibly also responses from the author. Readers could see any mistakes the referees found, and judge for themselves what they show about the result.

Referees already publish their reports in SciPost much of the time, though not currently in the Open Journal of Astrophysics. Both journals still reject some papers, though. In part, that’s because they serve another function: referees are supposed to tell us which papers are “good”.

Some journals are more prestigious and fancy than others. Nature and Science are the most famous, though people in my field almost never bother to publish in either. Still, we have a hierarchy in mind, with Physical Review Letters on the high end and JHEP on the lower one. Publishing in a fancier and more prestigious journal is supposed to say something about you as a scientist, to say that your work is fancier and more prestigious. If you can’t publish in any journal at all, then your work wasn’t interesting enough to merit getting credit for it, and maybe you should have worked harder.

What does that credit buy you? Ostensibly, everything. Jobs are more likely to hire you if you’ve published in more prestigious places, and grant agencies will be more likely to give you money.

In practice, though, this depends a lot on who’s making the decisions. Some people will weigh these kinds of things highly, especially if they aren’t familiar with a candidate’s work. Others will be able to rely on other things, from numbers of papers and citations to informal assessments of a scientist’s impact. I genuinely don’t know whether the journals I published in made any impact at all when I was hired, and I’m a bit afraid to ask. I haven’t yet sat on the kind of committee that makes these decisions, so I don’t know what things look like from the other side either.

But I do know that, on a certain level, journals and publications can’t matter quite as much as we think. As I mentioned, my field doesn’t use Nature or Science, while others do. A grant agency or hiring committee comparing two scientists would have to take that into account, just as they have to take into account the thousands of authors on every single paper by the ATLAS and CMS experiments. If a field started publishing every paper regardless of quality, they’d have to adapt there too, and find a new way to judge people compatible with that.

Can we just publish everything, papers and referee letters and responses and letters and reviews? Maybe. I think there are fields where this could really work well, and fields where it would collapse into the invective of a YouTube comments section. I’m not sure where my own field sits. Theoretical particle physics is relatively small and close-knit, but it’s also cool and popular, with many strong and dumb opinions floating around. I’d like to believe we could handle it, that we could prune back the professional cruft and turn our field into a real conversation between scholars. But I don’t know.

IPhT-60 Retrospective

Last week, my institute had its 60th anniversary party, which like every party in academia takes the form of a conference.

For unclear reasons, this one also included a physics-themed arcade game machine.

Going in, I knew very little about the history of the Institute of Theoretical Physics, of the CEA it’s part of (the Commissariat for Atomic Energy, now for Atomic and Alternative Energies), or of French physics in general, so I found the first few talks very interesting. I learned that in France in the early 1950’s, theoretical physics was quite neglected. Key developments, like relativity and statistical mechanics, were seen as “too German” due to their origins with Einstein and Boltzmann (never mind that this was precisely why the Nazis thought they were “not German enough”), while de Broglie suppressed investigation of quantum mechanics. It took French people educated abroad to come back and jumpstart progress.

The CEA is, in a sense, the French equivalent of some of the US’s national labs, and like them got its start as part of a national push towards nuclear weapons and nuclear power.

(Unlike the US’s national labs, the CEA is technically a private company. It’s not even a non-profit: there are for-profit components that sell services and technology to the energy industry. Never fear, my work remains strictly useless.)

My official title is Ingénieur Chercheur, research engineer. In the early days, that title was more literal. Most of the CEA’s first permanent employees didn’t have PhDs, but were hired straight out of undergraduate studies. The director, Claude Bloch, was in his 40’s, but most of the others were in their 20’s. There was apparently quite a bit of imposter syndrome back then, with very young people struggling to catch up to the global state of the art.

They did manage to catch up, though, and even excel. In the 60’s and 70’s, researchers at the institute laid the groundwork for a lot of ideas that are popular in my field at the moment. Stora’s work established a new way to think about symmetry that became the textbook approach we all learn in school, while Froissart figured out a consistency condition for high-energy physics whose consequences we’re still teasing out. Pham was another major figure at the institute in that era. With my rudimentary French I started reading his work back in Copenhagen, looking for new insights. I didn’t go nearly as fast as my partner in the reading group though, whose mastery of French and mathematics has seen him use Pham’s work in surprising new ways.

Hearing about my institute’s past, I felt a bit of pride in the physicists of the era, not just for the science they accomplished but for the tools they built to do it. This was the era of preprints, first as physical papers, orange folders mailed to lists around the world, and later online as the arXiv. Physicists here were early adopters of some aspects, though late adopters of others (they were still mailing orange folders a ways into the 90’s). They also adopted computation, with giant punch-card reading, sheets-of-output-producing computers staffed at all hours of the night. A few physicists dove deep into the new machines, and guided the others as capabilities changed and evolved, while others were mostly just annoyed by the noise!

When the institute began, scientific papers were still typed on actual typewriters, with equations handwritten in or typeset in ingenious ways. A pool of secretaries handled much of the typing, many of whom were able to come to the conference! I wonder what they felt, seeing what the institute has become since.

I also got to learn a bit about the institute’s present, and by implication its future. I saw talks covering different areas, from multiple angles on mathematical physics to simulations of large numbers of particles, quantum computing, and machine learning. I even learned a bit from talks on my own area of high-energy physics, highlighting how much one can learn from talking to new people.