Tag Archives: mathematics

Integration by Parts, Evolved

I posted what may be my last academic paper today, about a project I’ve been working on with Matthias Wilhelm for most of the last year. The paper is now online here. For me, the project has been a chance to broaden my horizons, learn new skills, and start to step out of my academic comfort zone. For Matthias, I hope it was grant money well spent.

I wanted to work on something related to machine learning, for the usual trendy employability reasons. Matthias was already working with machine learning, but was interested in pursuing a different question.

When is machine learning worthwhile? Machine learning methods are heuristics, unreliable methods that sometimes work well. You don’t use a heuristic if you have a reliable method that runs fast enough. But if all you have are heuristics to begin with, then machine learning can give you a better heuristic.

Matthias noticed a heuristic embedded deep in how we do particle physics, and guessed that we could do better. In particle physics, we use pictures called Feynman diagrams to predict the probabilities for different outcomes of collisions, comparing those predictions to observation to look for evidence of new physics. Each Feynman diagram corresponds to an integral, and for each calculation there are hundreds, thousands, or even millions of those integrals to do.

Luckily, physicists don’t actually have to do all those integrals. It turns out that most of them are related, by a slightly more advanced version of that calculus class mainstay, integration by parts. Using integration by parts you can solve a list of equations, finding out how to write your integrals in terms of a much smaller list.
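To make that concrete, here is a standard textbook illustration (not an example from the paper). Take the family of one-loop "tadpole" integrals I(a) in D dimensions, with a mass m and an integer power a on the denominator. In dimensional regularization the integral of a total derivative vanishes, and working out that derivative relates neighbouring members of the family:

```latex
I(a) = \int d^D k \, \frac{1}{(k^2 - m^2)^a},
\qquad
0 = \int d^D k \, \frac{\partial}{\partial k^\mu} \frac{k^\mu}{(k^2 - m^2)^a}
  = (D - 2a)\, I(a) - 2 a\, m^2\, I(a+1)
\quad\Longrightarrow\quad
I(a+1) = \frac{D - 2a}{2 a\, m^2}\, I(a).
```

Applying this repeatedly writes every I(a) with a ≥ 1 as a multiple of the single integral I(1), which is the kind of "much smaller list" the equations are meant to produce. Realistic calculations involve many such relations across many integrals at once.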

How big a list of equations do you need, and which ones? Twenty-five years ago, Stefano Laporta proposed a “golden rule” to choose, based on his own experience, and people have been using it (more or less, with their own tweaks) since then.

Laporta’s rule is a heuristic, with no proof that it is the best option, or even that it will always work. So we probably shouldn’t have been surprised when someone came up with a better heuristic. Watching talks at a December 2023 conference, Matthias saw a presentation by Johann Usovitsch on a curious new rule. The rule was surprisingly simple, just one extra condition on top of Laporta’s. But it was enough to reduce the number of equations by a factor of twenty.

That’s great progress, but it’s also a bit frustrating. Over almost twenty-five years, no-one had guessed this one simple change?

Maybe, thought Matthias and I, we need to get better at guessing.

We started out thinking we’d try reinforcement learning, a technique where a machine is trained by playing a game again and again, changing its strategy when that strategy brings it a reward. We thought we could have the machine learn to cut away extra equations, getting rewarded if it could cut more while still getting the right answer. We didn’t end up pursuing this very far before realizing another strategy would be a better fit.

What is a rule, but a program? Laporta’s golden rule and Johann’s new rule could both be expressed as simple programs. So we decided to use a method that could guess programs.

One method stood out for sheer trendiness and audacity: FunSearch. FunSearch is a type of algorithm called a genetic algorithm, which tries to mimic evolution. It makes a population of different programs, “breeds” them with each other to create new programs, and periodically selects out the ones that perform best. That’s not the trendy or audacious part, though: people have been doing that sort of genetic programming for a long time.

The trendy, audacious part is that FunSearch generates these programs with a Large Language Model, or LLM (the type of technology behind ChatGPT). Using an LLM trained to complete code, FunSearch presents the model with two programs labeled v0 and v1 and asks it to complete v2. In general, program v2 will have some traits from v0 and v1, but also a lot of variation due to the unpredictable output of LLMs. The inventors of FunSearch used this to contribute the variation needed for evolution, using it to evolve programs to find better solutions to math problems.

We decided to try FunSearch on our problem, modifying it a bit to fit the case. We asked it to find a shorter list of equations, giving a better score for a shorter list but a penalty if the list wasn’t able to solve the problem fully.
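The scoring idea can be sketched with a much simpler stand-in. The code below is not FunSearch and not the setup from the paper: it is a plain genetic algorithm over yes/no choices of equations, run on a made-up toy linear system. The names `EQUATIONS`, `PENALTY`, `score`, and `evolve` are all hypothetical, and "solving the problem fully" is assumed here to mean that the chosen equations have the same rank as the full list.

```python
import random
from fractions import Fraction

# Toy system (an assumption for illustration): 4 independent rows plus
# 6 redundant combinations of them. Each row is one linear equation.
EQUATIONS = [
    [1, 0, 0, 0, 1], [0, 1, 0, 0, 2], [0, 0, 1, 0, 3], [0, 0, 0, 1, 4],
    [1, 1, 0, 0, 3], [0, 0, 1, 1, 7], [1, 1, 1, 1, 10],
    [2, 0, 0, 0, 2], [1, -1, 0, 0, -1], [0, 1, 1, 0, 5],
]

def rank(rows):
    """Rank of a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            if m[i][c] != 0:
                factor = m[i][c] / m[rk][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

FULL_RANK = rank(EQUATIONS)
# Large enough that any incomplete list scores below the full list.
PENALTY = len(EQUATIONS) + 1

def score(mask):
    """Shorter lists score higher; lists that lose rank are penalized."""
    chosen = [eq for eq, keep in zip(EQUATIONS, mask) if keep]
    deficit = FULL_RANK - rank(chosen)
    return -sum(mask) - PENALTY * deficit

def evolve(generations=60, pop_size=30, seed=0):
    rng = random.Random(seed)
    n = len(EQUATIONS)
    # Start from the full list plus random subsets.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: pop_size // 2]  # keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # crossover
            child[rng.randrange(n)] ^= 1                      # mutation
            children.append(child)
        population = survivors + children
    return max(population, key=score)

best = evolve()
print(best, sum(best))
```

Because the best half always survives and the full list is in the starting population, the result can never be worse than keeping every equation. FunSearch differs in the important way described above: it evolves programs that choose equations, with an LLM supplying the variation, rather than evolving the choices directly.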

Some tinkering and headaches later, it worked! After a few days and thousands of program guesses, FunSearch was able to find a program that reproduced the new rule Johann had presented. A few hours more, and it even found a rule that was slightly better!

But then we started wondering: do we actually need days of GPU time to do this?

An expert on heuristics we knew had insisted, at the beginning, that we try something simpler. The approach we tried then didn’t work. But after running into some people using genetic programming at a conference last year, we decided to try again, using a Python package they used in their work. This time, it worked like a charm, taking hours rather than days to find good rules.

This was all pretty cool, a great opportunity for me to cut my teeth on Python programming and its various attendant skills. And it’s been inspiring, with Matthias drawing together more people interested in seeing just how much these kinds of heuristic methods can do in this area. I should be clear, though, that so far I don’t think our result is useful. We did better than the state of the art on an example, but only slightly, and in a way that I’d guess doesn’t generalize. And we needed quite a bit of overhead to do it. Ultimately, while I suspect there’s something useful to find in this direction, it’s going to require more collaboration, both with people using the existing methods who know better what the bottlenecks are, and with experts in these, and other, kinds of heuristics.

So I’m curious to see what the future holds. And for the moment, happy that I got to try this out!

How Small Scales Can Matter for Large Scales

Why Quantum Gravity Is Controversial

Merging quantum mechanics and gravity is a famously hard physics problem. Explaining why merging quantum mechanics and gravity is hard is, in turn, a very hard science communication problem. The more popular descriptions tend to lead to misunderstandings, and I’ve posted many times over the years to chip away at those misunderstandings.

Merging quantum mechanics and gravity is hard…but despite that, there are proposed solutions. String Theory is supposed to be a theory of quantum gravity. Loop Quantum Gravity is supposed to be a theory of quantum gravity. Asymptotic Safety is supposed to be a theory of quantum gravity.

One of the great virtues of science and math is that we are, eventually, supposed to agree. Philosophers and theologians might argue to the end of time, but in math we can write down a proof, and in science we can do an experiment. If we don’t yet have the proof or the experiment, then we should reserve judgement. Either way, there’s no reason to get into an unproductive argument.

Despite that, string theorists and loop quantum gravity theorists and asymptotic safety theorists, famously, like to argue! There have been bitter, vicious, public arguments about the merits of these different theories, and decades of research doesn’t seem to have resolved them. To an outside observer, this makes quantum gravity seem much more like philosophy or theology than like science or math.

Why is there still controversy in quantum gravity? We can’t do quantum gravity experiments, sure, but if that were the problem physicists could just write down the possibilities and leave it at that. Why argue?

Some of the arguments are for silly aesthetic reasons, or motivated by academic politics. Some are arguments about which approaches are likely to succeed in future, which as always is something we can’t actually reliably judge. But the more justified arguments, the strongest and most durable ones, are about a technical challenge. They’re about something called non-perturbative physics.

Most of the time, when physicists use a theory, they’re working with an approximation. Instead of the full theory, they’re making an assumption that makes the theory easier to use. For example, if you assume that the velocity of an object is small, you can use Newtonian physics instead of special relativity. Often, physicists can systematically relax these assumptions, including more and more of the behavior of the full theory and getting a better and better approximation to the truth. This process is called perturbation theory.
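The small-velocity example can be written out explicitly. This is the standard expansion of the relativistic kinetic energy of a mass m moving at speed v, in powers of v/c:

```latex
E_{\text{kin}} = m c^2 \left( \frac{1}{\sqrt{1 - v^2/c^2}} - 1 \right)
= \frac{1}{2} m v^2
+ \frac{3}{8} \frac{m v^4}{c^2}
+ \frac{5}{16} \frac{m v^6}{c^4}
+ \dots
```

The first term is the Newtonian answer, and each further term is a smaller correction when v is much less than c. Keeping more terms is what "systematically relaxing the assumption" looks like in this case.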

Other times, this doesn’t work well. The full theory has some trait that isn’t captured by the approximations, something that hides away from these systematic tools. The theory has some important aspect that is non-perturbative.

Every proposed quantum gravity theory uses approximations like this. The theory’s proponents try to avoid these approximations when they can, but often they have to approximate and hope they don’t miss too much. The opponents, in turn, argue that the theory’s proponents are missing something important, some non-perturbative fact that would doom the theory altogether.

Asymptotic Safety is built on top of an approximation, one different from what other quantum gravity theorists typically use. To its proponents, work using their approximation suggests that gravity works without any special modifications, that the theory of quantum gravity is easier to find than it seems. Its opponents aren’t convinced, and think that the approximation is missing something important which shows that gravity needs to be modified.

In Loop Quantum Gravity, the critics think their approximation misses space-time itself. Proponents of Loop Quantum Gravity have been unable to prove that their theory, if you take all the non-perturbative corrections into account, doesn’t just roll up all of space and time into a tiny spiky ball. They expect that their theory should allow for a smooth space-time like we experience, but the critics aren’t convinced, and without being able to calculate the non-perturbative physics neither side can convince the other.

String Theory was founded and originally motivated by perturbative approximations. Later, String Theorists figured out how to calculate some things non-perturbatively, often using other simplifications like supersymmetry. But core questions, like whether or not the theory allows a positive cosmological constant, seem to depend on non-perturbative calculations that the theory gives no instructions for how to do. Some critics don’t think there is a consistent non-perturbative theory at all, that the approximations String Theorists use don’t actually approximate to anything. Even within String Theory, there are worries that the theory might try to resist approximation in odd ways, becoming more complicated whenever a parameter is small enough that you could use it to approximate something.

All of this would be less of a problem with real-world evidence. Many fields of science are happy to use approximations that aren’t completely rigorous, as long as those approximations have a good track record in the real world. In general though, we don’t expect evidence relevant to quantum gravity any time soon. Maybe we’ll get lucky, and studies of cosmology will reveal something, or an experiment on Earth will have a particularly strange result. But nature has no obligation to help us out.

Without evidence, though, we can still make mathematical progress. You could imagine someone proving that the various perturbative approaches to String Theory become inconsistent when stitched together into a full non-perturbative theory. Alternatively, you could imagine someone proving that a theory like String Theory is unique, that no other theory can do some key thing that it does. Either of these seems unlikely to come any time soon, and most researchers in these fields aren’t pursuing questions like that. But the fact the debate could be resolved means that it isn’t just about philosophy or theology. There’s a real scientific, mathematical controversy, one rooted in our inability to understand these theories beyond the perturbative methods their proponents use. And while I don’t expect it to be resolved any time soon, one can always hold out hope for a surprise.

Amplitudes 2024, Continued

Beyond Elliptic Polylogarithms in Oaxaca

Arguably my biggest project over the last two years wasn’t a scientific paper, a journalistic article, or even a grant application. It was a conference.

Most of the time, when scientists organize a conference, they do it “at home”. Either they host the conference at their own university, or rent out a nearby event venue. There is an alternative, though. Scattered around the world, often in out-of-the-way locations, are places dedicated to hosting scientific conferences. These places accept applications each year from scientists arguing that their conference would best serve the place’s scientific mission.

One of these places is the Banff International Research Station in Alberta, Canada. Since 2001, Banff has been hosting gatherings of mathematicians from around the world, letting them focus on their research in an idyllic Canadian ski resort.

If you don’t like skiing, though, Banff still has you covered! They have “affiliate centers” in other locations, with one elsewhere in Canada, one in China, two on the way in India and Spain…and one that particularly caught my interest, in Oaxaca, Mexico.

Back around this time of year in 2022, I started putting a proposal together for a conference at the Casa Matemática Oaxaca. The idea would be a conference discussing the frontier of the field, how to express the strange mathematical functions that live in Feynman diagrams. I assembled a big team of co-organizers, five in total. At the time, I wasn’t sure whether I could find a permanent academic job, so I wanted to make sure there were enough people involved that they could run the conference without me.

Followers of the blog know I did end up finding that permanent job…only to give it up. In the end, I wasn’t able to make it to the conference. But my four co-organizers were (modulo some delays in the Houston airport). The conference was this week, with the last few talks happening over the next few hours.

I gave a short speech via Zoom at the beginning of the conference, a mix of welcome and goodbye. Since then I haven’t had the time to tune in to the talks, but they’re good folks and I suspect they’re having good discussions.

I do regret that, near the end, I wasn’t able to give the conference the focus it deserved. There were people we really hoped to have, but who couldn’t afford the travel. I’d hoped to find a source of funding that could support them, but the plan fell through. The week after Amplitudes 2024 was also a rough time to have a conference in this field, with many people who would have attended not able to go to both. (At least they weren’t the same week, thanks to some flexibility on the part of the Amplitudes organizers!)

Still, it’s nice to see something I’ve been working on for two years finally come to pass, to hopefully stir up conversations between different communities and give various researchers a taste of one of Mexico’s most beautiful places. I haven’t been to Oaxaca yet, but I suspect I will eventually. Danish companies do give at minimum five weeks of holiday per year, so I should get a chance at some point.

(Not At) Amplitudes 2024 at the IAS

For over a decade, I studied scattering amplitudes, the formulas particle physicists use to find the probability that particles collide, or scatter, in different ways. I went to Amplitudes, the field’s big yearly conference, every year from 2015 to 2023.

This year is different. I’m on the way out of the field, looking for my next steps. Meanwhile, Amplitudes 2024 is going full speed ahead at the Institute for Advanced Study in Princeton.

With poster art that is, as the kids probably don’t say anymore, “on fleek”

The talks aren’t live-streamed this year, but they are posting slides, and they will be posting recordings. Since a few of my readers are interested in new amplitudes developments, I’ve been paging through the posted slides looking for interesting highlights. So far, I’ve only seen slides from the first few days: I will probably write about the later talks in a future post.

Each day of Amplitudes this year has two 45-minute “review talks”, one first thing in the morning and the other first thing after lunch. I put “review talks” in quotes because they vary a lot, from talks that try to introduce a topic for the rest of the conference to talks that mostly focus on the speaker’s own research. Lorenzo Tancredi’s talk was of the former type, an introduction to the many steps that go into making predictions for the LHC, with a focus on those topics where amplitudeologists have made progress. The talk opens with the type of motivation I’d been writing in grant and job applications over the last few years (we don’t know most of the properties of the Higgs yet! To measure them, we’ll need to calculate amplitudes with massive particles to high precision!), before moving into a review of the challenges and approaches in different steps of these calculations. While Tancredi apologizes in advance that the talk may be biased, I found it surprisingly complete: if you want to get an idea of the current state of the “LHC amplitudes pipeline”, his slides are a good place to start.

Tancredi’s talk serves as introduction for a variety of LHC-focused talks, some later that day and some later in the week. Federica Devoto discussed high-energy quarks while Chiara Signorile-Signorile and George Sterman showed advances in handling of low-energy particles. Xiaofeng Xu has a program that helps predict symbol letters, the building-blocks of scattering amplitudes that can be used to reconstruct or build up the whole thing, while Samuel Abreu talked about a tricky state-of-the-art case where Xu’s program misses part of the answer.

Later Monday morning veered away from the LHC to focus on more toy-model theories. Renata Kallosh’s talk in particular caught my attention. This blog is named after a long-standing question in amplitudes: will the four-graviton amplitude in N=8 supergravity diverge at seven loops in four dimensions? This seemingly arcane question is deep down a question about what is actually required for a successful theory of quantum gravity, and in particular whether some of the virtues of string theory can be captured by a simpler theory instead. Answering the question requires a prodigious calculation, and the more “loops” are involved the more difficult it is. Six years ago, the calculation got to five loops, and it hasn’t passed that mark since then. That five-loop calculation gave some reason for pessimism, a nice pattern at lower loops that stopped applying at five.

Kallosh thinks she has an idea of what to expect. She’s noticed a symmetry in supergravity, one that hadn’t previously been taken into account. She thinks that symmetry should keep N=8 supergravity from diverging on schedule…but only in exactly four dimensions. All of the lower-loop calculations in N=8 supergravity diverged in higher dimensions than four, and it seems like with this new symmetry she understands why. Her suggestion is to focus on other four-dimensional calculations. If seven loops is still too hard, then dialing back the amount of supersymmetry from N=8 to something lower should let her confirm her suspicions. Already a while back N=5 supergravity was found to diverge later than expected in four dimensions. She wants to know whether that pattern continues.

(Her backup slides also have a fun historical point: in dimensions greater than four, you can’t get elliptical planetary orbits. So four dimensions is special for our style of life.)

Other talks on Monday included a talk by Zahra Zahraee on progress towards “solving” the field’s favorite toy model, N=4 super Yang-Mills. Christian Copetti talked about the work I mentioned here, while Meta employee François Charton’s “review talk” dealt with his work applying machine learning techniques to “translate” between questions in mathematics and their answers. In particular, he reported progress with my current boss Matthias Wilhelm and frequent collaborator and mentor Lance Dixon on using transformers to guess high-loop formulas in N=4 super Yang-Mills. They have an interesting proof of principle now, but it will probably still be a while until they can use the method to predict something beyond the state of the art.

In the meantime at least they have some hilarious AI-generated images

Tuesday’s review by Ian Moult was genuinely a review, but of a topic not otherwise covered at the conference, that of “detector observables”. The idea is that rather than talking about which individual particles are detected, one can ask questions that make more sense in terms of the experimental setup, like asking about the amounts of energy deposited in different detectors. This type of story has gone from an idle observation by theorists to a full research program, with theorists and experimentalists in active dialogue.

Natalia Toro brought up that, while we say each particle has a definite spin, that may not actually be the case. Particles with so-called “continuous spins” can masquerade as particles with a definite integer spin at lower energies. Toro and Schuster promoted this view of particles ten years ago, but now can make a bit more sense of it, including understanding how continuous-spin particles can interact.

The rest of Tuesday continued to be a bit of a grab-bag. Yael Shadmi talked about applying amplitudes techniques to Effective Field Theory calculations, while Franziska Porkert talked about a Feynman diagram involving two different elliptic curves. Interestingly (well, to me at least), the curves never appear “together”: you can represent the diagram as a sum of terms involving one curve and terms involving the other, much simpler than it could have been!

Tuesday afternoon’s review talk by Iain Stewart was one of those “guest from an adjacent field” talks, in this case from an approach called SCET, and at first glance didn’t seem to do much to reach out to the non-SCET people in the audience. Frequent past collaborator of mine Andrew McLeod showed off a new set of relations between singularities of amplitudes, found by digging into the structure of the equations discovered by Landau that control this behavior. He and his collaborators are proposing a new way to keep track of these things involving “minimal cuts”, a clear pun on the “maximal cuts” that have been of great use to other parts of the community. Whether this has more or less staying power than “negative geometries” remains to be seen.

Closing Tuesday, Shruti Paranjape showed there was more to discover about the simplest amplitudes, called “tree amplitudes”. By asking why these amplitudes are sometimes equal to zero, she was able to draw a connection to the “double-copy” structure that links the theory of the strong force and the theory of gravity. Johannes Henn’s talk noticed an intriguing pattern. A while back, I had looked into under which circumstances amplitudes were positive. Henn found that “positive” is an understatement. In a certain region, the amplitudes we were looking at turn out to not just be positive, but also always decreasing, and also with second derivative always positive. In fact, the derivatives appear to alternate, always with one sign or the other as one takes more derivatives. Henn is calling this unusual property “completely monotonous”, and trying to figure out how widely it holds.
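In the mathematics literature, a property of this kind usually goes by the name "completely monotone". Assuming that is the notion meant here, the condition on a function f in the region of interest, together with a simple example that satisfies it, reads:

```latex
(-1)^n \, \frac{d^n f}{dx^n}(x) \ge 0 \quad \text{for all } n = 0, 1, 2, \dots
\qquad \text{e.g.} \quad
f(x) = \frac{1}{x}, \; x > 0: \quad
\frac{d^n f}{dx^n}(x) = \frac{(-1)^n \, n!}{x^{n+1}}.
```

For n = 0, 1, 2 this says the function is positive, decreasing, and has positive second derivative, matching the pattern described above. The example 1/x is only an illustration of the definition, not one of the amplitudes from the talk.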

Wednesday had a more mathematical theme. Bernd Sturmfels began with a “review talk” that largely focused on his own work on the space of curves with marked points, including a surprising analogy between amplitudes and the likelihood functions one needs to minimize in machine learning. Lauren Williams was the other “actual mathematician” of the day, and covered her work on various topics related to the amplituhedron.

The remaining talks on Wednesday were not literally by mathematicians, but were “mathematically informed”. Carolina Figueiredo and Hayden Lee talked about work with Nima Arkani-Hamed on different projects. Figueiredo’s talk covered recent developments in the “curve integral formalism”, a recent step in Nima’s quest to geometrize everything in sight, this time in the context of more realistic theories. The talk, which like those Nima gives used tablet-written slides, described new insights one can gain from this picture, including new pictures of how more complicated amplitudes can be built up of simpler ones. If you want to understand the curve integral formalism further, I’d actually suggest instead looking at Mark Spradlin’s slides from later that day. The second part of Spradlin’s talk dealt with an area Figueiredo marked for future research, including fermions in the curve integral picture. I confess I’m still not entirely sure what the curve integral formalism is good for, but Spradlin’s talk gave me a better idea of what it’s doing. (The first part of his talk was on a different topic, exploring the space of string-like amplitudes to figure out which ones are actually consistent.)

Hayden Lee’s talk mentions the emergence of time, but the actual story is a bit more technical. Lee and collaborators are looking at cosmological correlators, observables like scattering amplitudes but for cosmology. Evaluating these is challenging with standard techniques, but can be approached with some novel diagram-based rules which let the results be described in terms of the measurable quantities at the end in a kind of “amplituhedron-esque” way.

Aidan Herderschee and Mariana Carrillo González had talks on Wednesday on ways of dealing with curved space. Herderschee talked about how various amplitudes techniques need to be changed to deal with amplitudes in anti-de-Sitter space, with difference equations replacing differential equations and sum-by-parts relations replacing integration-by-parts relations. Carrillo González looked at curved space through the lens of a special kind of toy model theory called a self-dual theory, which allowed her to do cosmology-related calculations using a double-copy technique.
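For readers who haven't met it, summation by parts is the discrete cousin of integration by parts, with differences of neighbouring terms playing the role of derivatives. The sketch below checks the generic identity on two arbitrary sequences. It is not the specific set of relations from Herderschee's talk, and the function names are made up for illustration.

```python
def forward_diff(seq, k):
    """Discrete analogue of a derivative: seq[k+1] - seq[k]."""
    return seq[k + 1] - seq[k]

def sum_by_parts_sides(f, g, m, n):
    """Return both sides of the summation-by-parts identity

        sum_{k=m}^{n} f[k] * (g[k+1] - g[k])
          = f[n+1]*g[n+1] - f[m]*g[m]
            - sum_{k=m}^{n} g[k+1] * (f[k+1] - f[k])

    so that they can be compared directly.
    """
    lhs = sum(f[k] * forward_diff(g, k) for k in range(m, n + 1))
    boundary = f[n + 1] * g[n + 1] - f[m] * g[m]
    rhs = boundary - sum(g[k + 1] * forward_diff(f, k) for k in range(m, n + 1))
    return lhs, rhs

# Two arbitrary integer sequences, long enough to reach index n + 1.
f = [k * k for k in range(12)]
g = [2 ** k for k in range(12)]

lhs, rhs = sum_by_parts_sides(f, g, 1, 10)
print(lhs == rhs)
```

The boundary term plays the same role as the "surface term" in integration by parts. When it vanishes, the identity trades a difference acting on one sequence for a difference acting on the other.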

Finally, Stephen Sharpe had the second review talk on Wednesday. This was another “outside guest” talk, a discussion from someone who does Lattice QCD about how they have been using their methods to calculate scattering amplitudes. They seem to count the number of particles a bit differently than we do; I’m curious whether this came up in the question session.

Gravity-Defying Theories

Universal gravitation was arguably Newton’s greatest discovery. Newton realized that the same laws could describe the orbits of the planets and the fall of objects on Earth, that bodies like the Moon can be fully understood only if you take into account both the Earth and the Sun’s gravity. In a Newtonian world, every mass attracts every other mass in a tiny, but detectable way.

Einstein, in turn, explained why. In Einstein’s general theory of relativity, gravity comes from the shape of space and time. Mass attracts mass, but energy affects gravity as well. Anything that can be measured has a gravitational effect, because the shape of space and time is nothing more than the rules by which we measure distances and times. So gravitation really is universal, and has to be universal.

…except when it isn’t.

It turns out, physicists can write down theories with some odd properties. Including theories where things are, in a certain sense, immune to gravity.

The story started with two mathematicians, Shiing-Shen Chern and Jim Simons. Chern and Simons weren’t trying to say anything in particular about physics. Instead, they cared about classifying different types of mathematical space. They found a formula that, when added up over one of these spaces, counted some interesting properties of that space. A bit more specifically, it told them about the space’s topology: rough details, like the number of holes in a donut, that stay the same even if the space is stretched or compressed. Their formula was called the Chern-Simons Form.

The physicist Albert Schwarz saw this Chern-Simons Form, and realized it could be interpreted another way. He looked at it as a formula describing a quantum field, like the electromagnetic field, describing how the field’s energy varied across space and time. He called the theory describing the field Chern-Simons Theory, and it was one of the first examples of what would come to be known as topological quantum field theories.

In a topological field theory, every question you might want to ask can be answered in a topological way. Write down the chance you observe the fields at particular strengths in particular places, and you’ll find that the answer you get only depends on the topology of the space the fields occupy. The answers are the same if the space is stretched or squished together. That means that nothing you ask depends on the details of how you measure things, that nothing depends on the detailed shape of space and time. Your theory is, in a certain sense, independent of gravity.

Others discovered more theories of this kind. Edward Witten found theories that at first looked like they depend on gravity, but where the gravity secretly “cancels out”, making the theory topological again. It turned out that there were many ways to “twist” string theory to get theories of this kind.

Our world is for the most part not described by a topological theory: gravity matters! (Though a topological theory can be a good approximation for describing certain materials.) These theories are most useful, though, in how they allow physicists and mathematicians to work together. Physicists don’t have a fully mathematically rigorous way of defining most of their theories, just a series of approximations and an overall picture that’s supposed to tie them together. For a topological theory, though, that overall picture has a rigorous mathematical meaning: it counts topological properties! As such, topological theories allow mathematicians to prove rigorous results about physical theories. It means they can take a theory of quantum fields or strings that has a particular property that physicists are curious about, and find a version of that property that they can study in fully mathematically rigorous detail. It’s been a boon both to mathematicians interested in topology, and to physicists who want to know more about their theories.

So while you won’t have antigravity boots any time soon, theories that defy gravity are still useful!

The Impact of Jim Simons

The obituaries have been weirdly relevant lately.

First, a couple weeks back, Daniel Dennett died. Dennett was someone who could have had a huge impact on my life. When I was growing up combatively atheist in the early 2000s, Dennett seemed to be exploring every question that mattered: how the semblance of consciousness could come from non-conscious matter, how evolution gives rise to complexity, how to raise a new generation to grow beyond religion and think seriously about the world around them. I went to Tufts to get my bachelor’s degree based on a glowing description he wrote in the acknowledgements of one of his books, and after getting there, I asked him to be my advisor.

(One of three, because the US education system, like all good games, can be min-maxed.)

I then proceeded to be far too intimidated to have a conversation with him more meaningful than “can you please sign my registration form?”

I heard a few good stories about Dennett while I was there, and I saw him debate once. I went into physics for my PhD, not philosophy.

Jim Simons died on May 10. I never spoke to him at all, not even to ask him to sign something. But he had a much bigger impact on my life.

I began my PhD at SUNY Stony Brook with a small scholarship from the Simons Foundation. The university’s Simons Center for Geometry and Physics had just opened, a shining edifice of modern glass next to the concrete blocks of the physics and math departments.

For a student aspiring to theoretical physics, the Simons Center virtually shouted a message. It taught me that physics, and especially theoretical physics, was something prestigious, something special. That if I kept going down that path I could stay in that world of shiny new buildings and daily cookie breaks with the occasional fancy jar-based desserts, of talks by artists and a café with twenty-dollar lunches (half-price once a week for students, the only time we could afford it, and still about twice what we paid elsewhere on campus). There would be garden parties with sushi buffets and late conference dinners with cauliflower steaks and watermelon salads. If I was smart enough (and I longed to be smart enough), that would be my future.

Simons and his foundation clearly wanted to say something along those lines, if not quite as filtered by the stars in a student’s eyes. He thought that theoretical physics, and research more broadly, should be something prestigious. That his favored scholars deserved more, and should demand more.

This did have weird consequences sometimes. One year, the university charged us an extra “academic excellence fee”. The story we heard was that Simons had demanded Stony Brook increase its tuition in order to accept his donations, so that its price would be closer to that of more prestigious places. As a state university, Stony Brook couldn’t do that…but it could add an extra fee. And since PhD students got their tuition, but not fees, paid by the department, we were left with an extra dent in our budgets.

The Simons Foundation created Quanta Magazine. If the Simons Center used food to tell me physics mattered, Quanta delivered the same message to professors through journalism. Suddenly, someone was writing about us, not just copying press releases but with the research and care of an investigative reporter. And they wrote about everything: not just sci-fi stories and cancer cures but abstract mathematics and the space of quantum field theories. Professors who had spent their lives straining to capture the public’s interest suddenly were shown an audience that actually wanted the real story.

In practice, the Simons Foundation made its decisions through the usual experts and grant committees. But the way we thought about it, the decisions always had a Jim Simons flavor. When others in my field applied for funding from the Foundation, they debated what Simons would want: would he support research on predictions for the LHC and LIGO? Or would he favor links to pure mathematics, or hints towards quantum gravity? Simons Collaboration Grants have an enormous impact on theoretical physics, dwarfing many other sources of funding. A grant funds an army of postdocs across the US, shifting the priorities of the field for years at a time.

Denmark has big foundations that have an outsize impact on science. Carlsberg, Villum, and the bigger-than-Denmark’s-GDP Novo Nordisk have foundations with a major influence on scientific priorities. But Denmark is a country of six million. It’s much harder to have that influence on a country of three hundred million. Despite that, Simons came surprisingly close.

While we did like to think of the Foundation’s priorities as Simons’, I suspect that it will continue largely on the same track without him. Quanta Magazine is editorially independent, and clearly puts its trust in the journalists that made it what it is today.

I didn’t know Simons, I don’t think I even ever smelled one of his famous cigars. Usually, that would be enough to keep me from writing a post like this. But, through the Foundation, and now through Quanta, he’s been there with me the last fourteen years. That’s worth a reflection, at the very least.

Getting It Right vs Getting It Done

With all the hype around machine learning, I occasionally get asked if it could be used to make predictions for particle colliders, like the LHC.

Physicists do use machine learning these days, to be clear. There are tricks and heuristics, ways to quickly classify different particle collisions and speed up computation. But if you’re imagining something that replaces particle physics calculations entirely, or even replaces the LHC itself, then you’re misunderstanding what particle physics calculations are for.

Why do physicists try to predict the results of particle collisions? Why not just observe what happens?

Physicists make predictions not in order to know what will happen in advance, but to compare those predictions to experimental results. If the predictions match the experiments, that supports existing theories like the Standard Model. If they don’t, then a new theory might be needed.

Those predictions certainly don’t need to be made by humans: most of the calculations are done by computers anyway. And they don’t need to be perfectly accurate: in particle physics, every calculation is an approximation. But the approximations used in particle physics are controlled approximations. Physicists keep track of what assumptions they make, and how they might go wrong. That’s not something you can typically do in machine learning, where you might train a neural network with millions of parameters. The whole point is to be able to check experiments against a known theory, and we can’t do that if we don’t know whether our calculation actually respects the theory.
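
To make "controlled approximation" concrete, here is a toy sketch in Python. It is not a particle physics calculation: the series for ln(1+g) just stands in for an expansion in a small coupling, and the function name and the value of g are made up for illustration. The point is that the approximation comes with a bound on how wrong it can be.

```python
import math


def truncated_log1p(g, order):
    """Sum ln(1+g) = g - g^2/2 + g^3/3 - ... up to the given order.

    Returns the partial sum and a bound on the truncation error. For
    0 < g <= 1 the series alternates with shrinking terms, so the error
    is at most the size of the first omitted term.
    """
    partial = sum((-1) ** (n + 1) * g ** n / n for n in range(1, order + 1))
    bound = g ** (order + 1) / (order + 1)
    return partial, bound


g = 0.1  # hypothetical small "coupling", chosen only for illustration
for order in (1, 2, 3):
    approx, bound = truncated_log1p(g, order)
    actual_error = abs(approx - math.log1p(g))
    print(f"order {order}: approx={approx:.6f}, "
          f"error bound={bound:.1e}, actual error={actual_error:.1e}")
```

At every order the actual error stays inside the stated bound, which is the kind of bookkeeping a neural network with millions of parameters typically doesn't hand you.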

That difference, between caring about the result and caring about how you got there, is a useful guide. If you want to predict how a protein folds in order to understand what it does in a cell, then you will find AlphaFold useful. If you want to confirm your theory of how protein folding happens, it will be less useful.

Some industries just want the final result, and can benefit from machine learning. If you want to know what your customers will buy, or which suppliers are cheating you, or whether your warehouse is moldy, then machine learning can be really helpful.

Other industries are trying, like particle physicists, to confirm that a theory is true. If you’re running a clinical trial, you want to be crystal clear about how the trial data turn into statistics. You, and the regulators, care about how you got there, not just about what answer you got. The same can be true for banks: if laws tell you you aren’t allowed to discriminate against certain kinds of customers for loans, you need to use a method where you know what traits you’re actually discriminating against.

So will physicists use machine learning? Yes, and more of it over time. But will they use it to replace normal calculations, or replace the LHC? No, that would be missing the point.

Models, Large Language and Otherwise

In particle physics, our best model goes under the unimaginative name “Standard Model”. The Standard Model models the world in terms of interactions of different particles, or more properly quantum fields. The fields have different masses and interact with different strengths, and each mass and interaction strength is a parameter: a “free” number in the model, one we have to fix based on data. There are nineteen parameters in the Standard Model (not counting the parameters for massive neutrinos, which were discovered later).

In principle, we could propose a model with more parameters that fit the data better. With enough parameters, one can fit almost anything. That’s cheating, though, and it’s a type of cheating we know how to catch. We have statistical tests that let us estimate how impressed we should be when a model matches the data. If a model is just getting ahead on extra parameters without capturing something real, we can spot that, because it gets a worse score on those tests. A model with a bad score might match the data you used to fix its parameters, but it won’t predict future data, so it isn’t actually useful. Right now the Standard Model (plus neutrino masses) gets the best score on those tests, when fitted to all the data we have access to, so we think of it as our best and most useful model. If someone proposed a model that got a better score, we’d switch: but so far, no-one has managed.
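
Here is a small sketch of that kind of cheating, in Python with made-up numbers. It is not one of the statistical tests physicists actually use. It just checks each fitted model against "future" points it was not fitted to, which I've placed exactly on the underlying line for simplicity. All names and data here are hypothetical.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns c[0] + c[1]*x + ... coefficients."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][j] = xs[i] ** j.
    a = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def evaluate(coeffs, x):
    return sum(c * x ** power for power, c in enumerate(coeffs))


def mean_squared_error(coeffs, xs, ys):
    return sum((evaluate(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: the "true" law is y = 2x, measured with made-up errors.
train_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
errors = [0.3, -0.3, 0.3, -0.3, 0.3]
train_y = [2 * x + e for x, e in zip(train_x, errors)]
# "Future" data, taken error-free for simplicity.
future_x = [-0.75, -0.25, 0.25, 0.75]
future_y = [2 * x for x in future_x]

results = {}
for degree in (1, 4):  # two parameters versus five
    coeffs = fit_polynomial(train_x, train_y, degree)
    results[degree] = (
        mean_squared_error(coeffs, train_x, train_y),
        mean_squared_error(coeffs, future_x, future_y),
    )
    print(f"{degree + 1} parameters: fit error={results[degree][0]:.4f}, "
          f"future error={results[degree][1]:.4f}")
```

With these made-up numbers, the five-parameter curve matches the data it was fitted to essentially exactly, yet misses the future points by far more than the two-parameter line does (a mean squared error of roughly 0.125 versus 0.004).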

Physicists care about this not just because a good model is useful. We think that the best model is, in some sense, how things really work. The fact that the Standard Model fits the data best doesn’t just mean we can use it to predict more data in the future: it means that somehow, deep down, the world is made up of quantum fields the way the Standard Model describes.

If you’ve been following developments in machine learning, or AI, you might have heard the word “model” slung around. For example, GPT is a Large Language Model, or LLM for short.

Large Language Models are more like the Standard Model than you might think. Just as the Standard Model models the world in terms of interacting quantum fields, Large Language Models model the world in terms of a network of connections between artificial “neurons”. Just as particles have different interaction strengths, pairs of neurons have different connection weights. Those connection weights are the parameters of a Large Language Model, in the same way that the masses and interaction strengths of particles are the parameters of the Standard Model. The parameters for a Large Language Model are fixed by a giant corpus of text data, almost the whole internet reduced to a string of bytes that the LLM needs to match, in the same way the Standard Model needs to match data from particle collider experiments. The Standard Model has nineteen parameters; Large Language Models have billions.

Increasingly, machine learning models seem to capture things better than other types of models. If you want to know how a protein is going to fold, you can try to make a simplified model of how its atoms and molecules interact with each other…but instead, you can make your model a neural network. And that turns out to work better. If you’re a bank and you want to know how many of your clients will default on their loans, you could ask an economist to make a macroeconomic model…or, you can just make your model a neural network too.

In physics, we think that the best model is the model that is closest to reality. Clearly, though, this can’t be what’s going on here. Real proteins don’t fold based on neural networks, and neither do real economies. Both economies and folding proteins are very complicated, so any model we can use right now won’t be what’s “really going on”, unlike the comparatively simple world of particle physics. Still, it seems weird that, compared to the simplified economic or chemical models, neural networks can work better, even if they’re very obviously not really what’s going on. Is there another way to think about them?

I used to get annoyed at people using the word “AI” to refer to machine learning models. In my mind, AI was the thing that shows up in science fiction, machines that can think as well as or better than humans. (The actual term of art for this is AGI, artificial general intelligence.) Machine learning, and LLMs in particular, felt like a meaningful step towards that kind of AI, but they clearly aren’t there yet.

Since then, I’ve been convinced that the term isn’t quite so annoying. The AI field isn’t called AI because it’s creating a human-equivalent sci-fi intelligence. It’s called AI because the things it builds are inspired by how human intelligence works.

As humans, we model things with mathematics, but we also model them with our own brains. Consciously, we might think about objects and their places in space, about people and their motivations and actions, about canonical texts and their contents. But all of those things cash out in our neurons. Anything we think, anything we believe, any model we can actually apply by ourselves in our own lives, is a model embedded in a neural network. It’s quite a bit more complicated a neural network than an LLM, but it’s very much still a kind of neural network.

Because humans are alright at modeling a variety of things, because we can see and navigate the world and persuade and manipulate each other, we know that neural networks can do these things. A human brain may not be the best model for any given phenomenon: an engineer can model the flight of a baseball with math much better than the best baseball player can with their unaided brain. But human brains still tend to be fairly good models for a wide variety of things. Evolution has selected them to be.

So with that in mind, it shouldn’t be too surprising that neural networks can model things like protein folding. Even if proteins don’t fold based on neural networks, even if the success of AlphaFold isn’t capturing the actual details of the real world the way the Standard Model does, the model is capturing something. It’s loosely capturing the way a human would think about the problem, if you gave that human all the data they needed. And humans are, and remain, pretty good at thinking! So we have reason, not rigorous, but at least intuitive reason, to think that neural networks will actually be good models of things.

4 gravitons

Stories about physics from someone who's been there