Tag Archives: DoingScience

Cabinet of Curiosities: The Coaction

I had two more papers out this week, continuing my cabinet of curiosities. I’ll talk about one of them today, and the other in (probably) two weeks.

This week, I’m talking about a paper I wrote with an excellent Master’s student, Andreas Forum. Andreas came to me looking for a project on the mathematical side. I had a rather nice idea for his project at first, to explain a proof in an old math paper so it could be used by physicists.

Unfortunately, the proof I sent him off to explain didn’t actually exist. Fortunately, by the time we figured this out Andreas had learned quite a bit of math, so he was ready for his next project: a coaction for Calabi-Yau Feynman diagrams.

We chose to focus on one particular diagram, called a sunrise diagram for its resemblance to a sun rising over the sea:

This diagram

Feynman diagrams depict paths traveled by particles. The paths are a metaphor, or organizing tool, for more complicated calculations: computations of the chances fundamental particles behave in different ways. Each diagram encodes a complicated integral. This one shows one particle splitting into many, then those many particles reuniting into one.

Do the integrals in Feynman diagrams, and you get a variety of different mathematical functions. Many of them integrate to functions called polylogarithms, and we’ve gotten really, really good at working with them. We can integrate them up, simplify them, and sometimes we can guess them so well we don’t have to do the integrals at all! We can do all of that because we know how to break polylogarithm functions apart, with a mathematical operation called a coaction. The coaction chops polylogarithms up into simpler parts, parts that are easier to work with.
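As a concrete illustration, here is how the coaction acts on the simplest interesting polylogarithm, the dilogarithm. This is a sketch in one common convention, ignoring subtleties involving factors of i\pi, and the symbol \Delta for the coaction is notation introduced only for this example:

```latex
\Delta\,\mathrm{Li}_2(x)
  = 1 \otimes \mathrm{Li}_2(x)
  + \mathrm{Li}_2(x) \otimes 1
  + \mathrm{Li}_1(x) \otimes \log x,
\qquad
\mathrm{Li}_1(x) = -\log(1-x).
```

The only nontrivial piece pairs two ordinary logarithms, each simpler than the dilogarithm they came from.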

More complicated Feynman diagrams give more complicated functions, though. Some of them give what are called elliptic functions. You can think of these functions as involving a geometrical shape, in this case a torus.

Other functions involve more complicated geometrical shapes, in some cases very complicated. For example, some involve the Calabi-Yau manifolds studied by string theorists. These sunrise diagrams are some of the simplest to involve such complicated geometry.

Other researchers had proposed a coaction for elliptic functions back in 2018. When they derived it, though, they left a recipe for something more general. Follow the instructions in the paper, and you could in principle find a coaction for other diagrams, even the Calabi-Yau ones, if you set it up right.

I had an idea for how to set it up right, and in the grand tradition of supervisors everywhere I got Andreas to do the dirty work of applying it. Despite the delay of our false start and despite the fact that this was probably in retrospect too big a project for a normal Master’s thesis, Andreas made it work!

Our result, though, is a bit weird. The coaction is a powerful tool for polylogarithms because it chops them up finely: keep chopping, and you get down to very simple functions. Our coaction isn’t quite so fine: we don’t chop our functions into as many parts, and the parts are more mysterious, more difficult to handle.

We think these are temporary problems though. The recipe we applied turns out to be a recipe with a lot of choices to make, less like Julia Child and more like one of those books where you mix-and-match recipes. We believe the community can play with the parameters of this recipe, finding new versions of the coaction for new uses.

This is one of the shiniest of the curiosities in my cabinet this year; I hope it gets put to good use.

Cabinet of Curiosities: The Cubic

Before I launch into the post: I got interviewed on Theoretically Podcasting, a new YouTube channel focused on beginning grad student-level explanations of topics in theoretical physics. If that sounds interesting to you, check it out!

This Fall is paper season for me. I’m finishing up a number of different projects, on a number of different things. Each one was its own puzzle: a curious object found, polished, and sent off into the world.

Monday I published the first of these curiosities, along with Jake Bourjaily and Cristian Vergu.

I’ve mentioned before that the calculations I do involve a kind of “alphabet”. Break down a formula for the probability that two particles collide, and you find pieces that occur again and again. In the nicest cases, those pieces are rational functions, but they can easily get more complicated. I’ve talked before about a case where square roots enter the game, for example. But if square roots appear, what about something even more complicated? What about cubic roots?

What about 1024th roots?

Occasionally, my co-authors and I would say something like that at the end of a talk and an older professor would scoff: “Cube roots? Impossible!”

You might imagine these professors were just being unreasonable skeptics, the elderly-but-distinguished scientists from that Arthur C. Clarke quote. But while they turned out to be wrong, they weren’t being unreasonable. They were thinking back to theorems from the 60’s, theorems which seemed to argue that these particle physics calculations could only have a few specific kinds of behavior: they could behave like rational functions, like logarithms, or like square roots. Theorems which, as they understood them, would have made our claims impossible.

Eventually, we decided to figure out what the heck was going on here. We grabbed the simplest example we could find (a cube root involving three loops and eleven gluons in N=4 super Yang-Mills…yeah) and buckled down to do the calculation.

When we want to calculate something specific to our field, we can reference textbooks and papers, and draw on our own experience. Much of the calculation was like that. A crucial piece, though, involved something quite a bit less specific: calculating a cubic root. And for things like that, you can tell your teachers we use only the very best: Wikipedia.

Check out the Wikipedia entry for the cubic formula. It’s complicated, in ways the quadratic formula isn’t. It involves complex numbers, for one. But it’s not that crazy.

What those theorems from the 60’s said (and what they actually said, not what people misremembered them as saying), was that you can’t take a single limit of a particle physics calculation, and have it behave like a cubic root. You need to take more limits, not just one, to see it.

It turns out, you can even see this just from the Wikipedia entry. There’s a big cube root sign in the middle there, equal to some variable “C”. Look at what’s inside that cube root. You want that part inside to vanish. That means two things need to cancel: Wikipedia labels them \Delta_1, and \sqrt{\Delta_1^2-4\Delta_0^3}. Do some algebra, and you’ll see that for those to cancel, you need \Delta_0=0.

So you look at the limit, \Delta_0\rightarrow 0. This time you need not just some algebra, but some calculus. I’ll let the students in the audience work it out, but at the end of the day, you should notice how C behaves when \Delta_0 is small. It isn’t like \sqrt[3]{\Delta_0}. It’s like just plain \Delta_0. The cube root goes away.
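For readers who would rather check this numerically than do the calculus, here is a minimal Python sketch. It assumes \Delta_1 is held fixed and positive, takes the minus sign in front of the square root (the case where the two terms cancel), and stays on the real branch of the cube root. The function name cardano_C and the sample values are illustrative choices, not anything from the paper. Expanding the square root for small \Delta_0 suggests C \approx \Delta_0 / \sqrt[3]{\Delta_1}, so the ratio C/\Delta_0 should settle to a constant.

```python
import math


def cardano_C(delta0: float, delta1: float) -> float:
    """Real branch of C = cbrt((delta1 - sqrt(delta1**2 - 4*delta0**3)) / 2).

    Sketch only: assumes delta1 > 0 and a non-negative discriminant,
    so that every intermediate quantity is real.
    """
    disc = delta1**2 - 4 * delta0**3
    if delta1 <= 0 or disc < 0:
        raise ValueError("sketch only covers delta1 > 0 with a real square root")
    # Multiply by the conjugate so the two nearly equal terms are never
    # subtracted directly: (d1 - s) / 2 == 2 * d0**3 / (d1 + s).
    inside = 2 * delta0**3 / (delta1 + math.sqrt(disc))
    # Sign-preserving real cube root (works for negative delta0 too).
    return math.copysign(abs(inside) ** (1.0 / 3.0), inside)


delta1 = 2.0  # arbitrary fixed value, chosen for illustration
predicted_ratio = delta1 ** (-1.0 / 3.0)

for delta0 in (1e-1, 1e-2, 1e-3, 1e-4):
    c = cardano_C(delta0, delta1)
    print(f"delta0={delta0:.0e}  C/delta0={c / delta0:.9f}  "
          f"predicted={predicted_ratio:.9f}")
```

If C behaved like a cube root of \Delta_0, the ratio C/\Delta_0 would blow up as \Delta_0 shrinks; instead it should approach the predicted constant.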

It can come back, but only if you take another limit: not just \Delta_0\rightarrow 0, but \Delta_1\rightarrow 0 as well. And that’s just fine according to those theorems from the 60’s. So our cubic curiosity isn’t impossible after all.

Our calculation wasn’t quite this simple, of course. We had to close a few loopholes, checking our example in detail using more than just Wikipedia-based methods. We found what we thought was a toy example, that turned out to be even more complicated, involving roots of a degree-six polynomial (one that has no “formula”!).

And in the end, polished and in their display case, we’ve put our examples up for the world to see. Let’s see what people think of them!

Why the Antipode Was Supposed to Be Useless

A few weeks back, Quanta Magazine had an article about a new discovery in my field, called antipodal duality.

Some background: I’m a theoretical physicist, and I work on finding better ways to make predictions in particle physics. Folks in my field make these predictions with formulas called “scattering amplitudes” that encode the probability that particles bounce, or scatter, in particular ways. One trick we’ve found is that these formulas can often be written as “words” in a kind of “alphabet”. If we know the alphabet, we can make our formulas much simpler, or even guess formulas we could never have calculated any other way.

Quanta’s article describes how a few friends of mine (Lance Dixon, Ömer Gürdoğan, Andrew McLeod, and Matthias Wilhelm) noticed a weird pattern in two of these formulas, from two different calculations. If you flip the “words” around, back to front (an operation called the antipode), you go from a formula describing one collision of particles to a formula for totally different particles. Somehow, the two calculations are “dual”: two different-seeming descriptions that secretly mean the same thing.
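To make “flipping the words around” concrete, here is a toy Python sketch. It assumes a formula is stored as a dictionary from words (tuples of letter names) to integer coefficients, and it uses the usual sign convention in which a word of length n picks up a factor of (-1)^n when reversed. The letters a, b, c and the example coefficients are made up, and the sketch only shows the flipping itself, not how the letters of one calculation get matched to those of the other.

```python
from typing import Dict, Tuple

Word = Tuple[str, ...]
Symbol = Dict[Word, int]  # maps each word to its integer coefficient


def antipode(symbol: Symbol) -> Symbol:
    """Reverse every word, multiplying by (-1)**len(word)."""
    flipped: Symbol = {}
    for word, coeff in symbol.items():
        reversed_word = word[::-1]
        sign = -1 if len(word) % 2 else 1
        flipped[reversed_word] = flipped.get(reversed_word, 0) + sign * coeff
    # Drop any words whose coefficients cancelled.
    return {w: c for w, c in flipped.items() if c != 0}


# Hypothetical example: two three-letter words with made-up coefficients.
example: Symbol = {("a", "b", "c"): 2, ("a", "a", "b"): -1}
print(antipode(example))
print(antipode(antipode(example)) == example)
```

Flipping twice gives back the original formula, which is part of why the operation looks so innocuous on paper.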

Quanta quoted me for their article, and I was (pleasantly) baffled. See, the antipode was supposed to be useless. The mathematicians told us it was something the math allows us to do, like you’re allowed to order pineapple on pizza. But just like pineapple on pizza, we couldn’t imagine a situation where we actually wanted to do it.

What Quanta didn’t say was why we thought the antipode was useless. That’s a hard story to tell, one that wouldn’t fit in a piece like that.

It fits here, though. So in the rest of this post, I’d like to explain why flipping around words is such a strange, seemingly useless thing to do. It’s strange because it swaps two things that in physics we thought should be independent: branch cuts and derivatives, or particles and symmetries.

Let’s start with the first things in each pair: branch cuts, and particles.

The first few letters of our “word” tell us something mathematical, and they tell us something physical. Mathematically, they tell us ways that our formula can change suddenly, and discontinuously.

Take a logarithm, the inverse of e^x. You’re probably used to plugging in positive numbers, and getting out something reasonable, that changes in a smooth and regular way: after all, e^x is always positive, right? But in mathematics, you don’t have to just use positive numbers. You can use negative numbers. Even more interestingly, you can use complex numbers. And if you take the logarithm of a complex number, and look at the imaginary part, it looks like this:

Mostly, this complex logarithm still seems to be doing what it’s supposed to, changing in a nice slow way. But there is a weird “cut” in the graph for negative numbers: a sudden jump, from \pi to -\pi. That jump is called a “branch cut”.
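You can see the same jump without a plot. This short Python sketch uses the standard library’s cmath.log, which places its branch cut along the negative real axis; the offset eps and the test point -1 are arbitrary choices for illustration.

```python
import cmath
import math

eps = 1e-12  # small imaginary offset, chosen for illustration

# Approach the negative real axis at -1 from just above and just below.
just_above = cmath.log(complex(-1.0, +eps))
just_below = cmath.log(complex(-1.0, -eps))

print("imaginary part just above the cut:", just_above.imag)
print("imaginary part just below the cut:", just_below.imag)
print("size of the jump:", just_above.imag - just_below.imag)
print("2*pi for comparison:", 2 * math.pi)

# Across the positive real axis there is no jump at all.
smooth_above = cmath.log(complex(1.0, +eps))
smooth_below = cmath.log(complex(1.0, -eps))
print("jump across the positive axis:", smooth_above.imag - smooth_below.imag)
```

The imaginary part flips from close to \pi to close to -\pi across the negative axis, while nothing dramatic happens on the positive side.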

As physicists, we usually don’t like our formulas to make sudden changes. A change like this is an infinitely fast jump, and we don’t like infinities much either. But we do have one good use for a formula like this, because sometimes our formulas do change suddenly: when we have enough energy to make a new particle.

Imagine colliding two protons together, like at the LHC. Colliding particles doesn’t just break the protons into pieces: due to Einstein’s famous E=mc^2, it can create new particles as well. But to create a new particle, you need enough energy: mc^2 worth of energy. So as you dial up the energy of your protons, you’ll notice a sudden change: you couldn’t create, say, a Higgs boson, and now you can. Our formulas represent some of those kinds of sudden changes with branch cuts.

So the beginning of our “words” represents branch cuts and particles. The end represents derivatives and symmetries.

Derivatives come from the land of calculus, a place spooky to those with traumatic math class memories. Derivatives shouldn’t be so spooky though. They’re just ways we measure change. If we have a formula that is smoothly changing as we change some input, we can describe that change with a derivative.

The endings of our “words” tell us what happens when we take a derivative. They tell us which ways our formulas can smoothly change, and what happens when they do.

In doing so, they tell us about something some physicists make sound spooky, called symmetries. Symmetries are changes we can make that don’t really change what’s important. For example, you could imagine lifting up the entire Large Hadron Collider and (carefully!) carrying it across the ocean, from France to the US. We’d expect that, once all the scared scientists return and turn it back on, it would start getting exactly the same results. Physics has “translation symmetry”: you can move, or “translate” an experiment, and the important stuff stays the same.

These symmetries are closely connected to derivatives. If changing something doesn’t change anything important, that should be reflected in our formulas: they shouldn’t change either, so their derivatives should be zero. If instead the symmetry isn’t quite true, if it’s what we call “broken”, then by knowing how it was “broken” we know what the derivative should be.

So branch cuts tell us about particles, derivatives tell us about symmetries. The weird thing about the antipode, the un-physical bizarre thing, is that it swaps them. It makes the particles of one calculation determine the symmetries of another.

(And lest you’ve heard about particles with symmetries, like gluons and SU(3)…this is a different kind of thing. I don’t have enough room to explain why here, but it’s completely unrelated.)

Why the heck does this duality exist?

A commenter on the last post asked me to speculate. I said there that I have no clue, and that’s most of the answer.

If I had to speculate, though, my answer might be disappointing.

Most of the things in physics we call “dualities” have fairly deep physical meanings, linked to twisting spacetime in complicated ways. AdS/CFT isn’t fully explained, but it seems to be related to something called the holographic principle, the idea that gravity ties together the inside of space with the boundary around it. T-duality, an older concept in string theory, is explained: it is a consequence of how strings “see” the world in terms of things to wrap around and things to spin around. In my field, one of our favorite dualities links back to this as well: amplitude-Wilson loop duality is linked to fermionic T-duality.

The antipode doesn’t twist spacetime, it twists the mathematics. And it may be that it matters only because the mathematics is so constrained that it’s forced to happen.

The trick that Lance Dixon and co. used to discover antipodal duality is the same trick I used with Lance to calculate complicated scattering amplitudes. It relies on taking a general guess of words in the right “alphabet”, and constraining it: using mathematical and physical principles it must obey and throwing out every illegal answer until there’s only one answer left.
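The flavor of that guess-and-constrain strategy can be sketched with a toy in Python. Everything here is invented for illustration: the three-letter alphabet, the rule for which letter may come first, the rule for which letters may come last, and the forbidden neighboring pairs. Real calculations constrain the coefficients of a sum of many words rather than accepting or rejecting single words, so treat this as a cartoon of the idea rather than the actual method.

```python
from itertools import product

# All of these are hypothetical stand-ins, not rules from any real calculation.
ALPHABET = ("a", "b", "c")
ALLOWED_FIRST = {"a"}        # stand-in for "only physical branch cuts"
ALLOWED_LAST = {"b", "c"}    # stand-in for "only allowed derivatives"
FORBIDDEN_PAIRS = {("a", "a"), ("b", "b"), ("c", "c"), ("c", "b")}


def is_legal(word):
    """Return True if the word passes every toy constraint."""
    if word[0] not in ALLOWED_FIRST:
        return False
    if word[-1] not in ALLOWED_LAST:
        return False
    # No forbidden pair of letters may appear side by side.
    return all(pair not in FORBIDDEN_PAIRS for pair in zip(word, word[1:]))


# Start from the most general guess: every three-letter word.
candidates = list(product(ALPHABET, repeat=3))
survivors = [word for word in candidates if is_legal(word)]

print(f"{len(candidates)} candidate words, {len(survivors)} left after constraints")
print(survivors)
```

With these made-up rules the constraints cut the candidates down to a single word, mirroring the way the real method keeps throwing out illegal answers until only one is left.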

Currently, there are some hints that the principles used for the different calculations linked by antipodal duality are “antipodal mirrors” of each other: that different principles have the same implication when the duality “flips” them around. If so, then it could be this duality is in some sense just a coincidence: not a coincidence limited to a few calculations, but a coincidence limited to a few principles. Thought of in this way, it might not tell us a lot about other situations, it might not really be “deep”.

Of course, I could be wrong about this. It could be much more general, could mean much more. But in that context, I really have no clue what to speculate. The antipode is weird: it links things that really should not be physically linked. We’ll have to see what that actually means.

Amplitudes 2022 Retrospective

I’m back from Amplitudes 2022 with more time to write, and (besides the several papers I’m working on) that means writing about the conference! Casual readers be warned, there’s no way around this being a technical post, I don’t have the space to explain everything!

I mostly said all I wanted about the way the conference was set up in last week’s post, but one thing I didn’t say much about was the conference dinner. Most conference dinners are the same aside from the occasional cool location or haggis speech. This one did have a cool location, and a cool performance by a blind pianist, but the thing I really wanted to comment on was the setup. Typically, the conference dinner at Amplitudes is a sit-down affair: people sit at tables in one big room, maybe getting up occasionally to pick up food, and eventually someone gives an after-dinner speech. This time the tables were standing tables, spread across several rooms. This was a bit tiring on a hot day, but it did have the advantage that it naturally mixed people around. Rather than mostly talking to “your table”, you’d wander, ending up at a new table every time you picked up new food or drinks. It was a good way to meet new people, a surprising number of whom in my case apparently read this blog. It did make it harder to do an after-dinner speech, so instead Lance gave an after-conference speech, complete with the now-well-established running joke where Greta Thunberg tries to get us to fly less.

(In another semi-running joke, the organizers tried to figure out who had attended the most of the yearly Amplitudes conferences over the years. Weirdly, no-one has attended all twelve.)

In terms of the content, and things that stood out:

Nima is getting close to publishing his newest ‘hedron, the surfacehedron, and correspondingly was able to give a lot more technical detail about it. (For his first and most famous amplituhedron, see here.) He still didn’t have enough time to explain why he has to use category theory to do it, but at least he was concrete enough that it was reasonably clear where the category theory was showing up. (I wasn’t there for his eight-hour lecture at the school the week before, maybe the students who stuck around until 2am learned some category theory there.) Just from listening in on side discussions, I got the impression that some of the ideas here actually may have near-term applications to computing Feynman diagrams: this hasn’t been a feature of previous ‘hedra and it’s an encouraging development.

Alex Edison talked about progress towards this blog’s namesake problem, the question of whether N=8 supergravity diverges at seven loops. Currently they’re working at six loops on the N=4 super Yang-Mills side, not yet in a form that can be “double-copied” to supergravity. The tools they’re using are increasingly sophisticated, including various slick tricks from algebraic geometry. They are looking to the future: if they’re hoping their methods will reach seven loops, the same methods have to make six loops a breeze.

Xi Yin approached a puzzle with methods from String Field Theory, prompting the heretical-for-us title “on-shell bad, off-shell good”. A colleague reminded me of a local tradition for dealing with heretics.

While Nima was talking about a new ‘hedron, other talks focused on the original amplituhedron. Paul Heslop found that the amplituhedron is not literally a positive geometry, despite slogans to the contrary, but what it is is nonetheless an interesting generalization of the concept. Livia Ferro has made more progress on her group’s momentum amplituhedron: previously only valid at tree level, they now have a picture that can accommodate loops. I wasn’t sure this would be possible: there are a lot of things that work at tree level and not for loops, so I’m quite encouraged that this one made the leap successfully.

Sebastian Mizera, Andrew McLeod, and Hofie Hannesdottir all had talks that could be roughly summarized as “deep principles made surprisingly useful”. Each took topics that were explored in the 60’s and translated them into concrete techniques that could be applied to modern problems. There were surprisingly few talks on the completely concrete end, on direct applications to collider physics. I think Simone Zoia’s was the only one to actually feature collider data with error bars, which might explain why I singled him out to ask about those error bars later.

Likewise, Matthias Wilhelm’s talk was the only one on functions beyond polylogarithms, the elliptic functions I’ve also worked on recently. I wonder if the under-representation of some of these topics is due to the existence of independent conferences: in a year when in-person conferences are packed in after being postponed across the pandemic, when there are already dedicated conferences for elliptics and practical collider calculations, maybe people are just a bit too tired to go to Amplitudes as well.

Talks on gravitational waves seem to have stabilized at roughly a day’s worth, which seems reasonable. While the subfield’s capabilities continue to be impressive, it’s also interesting how often new conceptual challenges appear. It seems like every time a challenge to their results or methods is resolved, a new one shows up. I don’t know whether the field will ever get to a stage of “business as usual”, or whether it will be novel qualitative questions “all the way up”.

I haven’t said much about the variety of talks bounding EFTs and investigating their structure, though this continues to be an important topic. And I haven’t mentioned Lance Dixon’s talk on antipodal duality, largely because I’m planning a post on it later: Quanta Magazine had a good article on it, but there are some aspects even Quanta struggled to cover, and I think I might have a good way to do it.

At Amplitudes 2022 in Prague

It’s that time of year again! I’m at the big yearly conference of my subfield, Amplitudes, this year in Prague.

The conference poster included a picture of Prague’s famous clock, which is admittedly cool. But I think this computer-generated anachronism from Matt Schwartz’s machine learning talk is much more fun.

Amplitudes has grown, and keeps growing. The last time we met in person, there were 175 of us. This year, many people are skipping: some avoiding travel due to COVID, others just exhausted from a summer filled with long-postponed conferences. Nonetheless, we have more people here than then: 222 registered participants!

The large number of people means a large number of talks. Almost all were quite short, 25+5 minutes. Some speakers took advantage of the short length to deliver very accessible talks. Others seemed to think of the time limit as an excuse to cut short the introduction and dive right into technical details. We had just a few 40+5 minute talks, each a review from an adjacent field.

It’s been fun seeing people in person again. I think half of my conversations started with “It’s been a long time!” It’s easy for motivation to wane when you don’t have regular contact with the wider field, getting enthusiastic about shared goals and brainstorming big questions.

I’ll probably give a longer retrospective later: the packed schedule means I don’t have much time to write! But I can say that I’ve largely enjoyed this, the organizers were organized and the presenters presented and things felt a bit more like they ought to in the world.

Covering the Angles

One way to think of science is as a lot of interesting little problems. Some scientists are driven by questions like “how does this weird cell work?” or “how accurately can I predict the chance these particles collide?” If the puzzles are fun enough and the questions are interesting enough, then that can be enough motivation on its own.

Another perspective thinks of science as pursuit of a few big problems. Physicists want to write down the laws of nature, to know where the universe came from, to reconcile gravity and quantum mechanics. Biologists want to understand how life works and manipulate it, psychologists want the same for the human mind. For some scientists, these big questions are at the heart of why they do science. Someone in my field once joked he can’t get up in the morning without telling himself “spacetime is doomed”.

Even if you care about the big questions, though, you can’t neglect the small ones. That’s because modern science is collaborative. A big change, like a new particle or a whole new theory of physics, requires confirmation. It’s not enough for one person to propose it. The ideas that last in science last because they crop up in many different places, with many different methods. They last because we check all the angles, compulsively, looking for any direction that might be screwed up.

In those checks, any and all science can be useful. We need the big conceptual leaps from people like Einstein and the careful and systematic measurements of Brahe. We need people who look for the wackiest ideas, not just because they might be true, but to rule them out when they’re false, to make us all the more confident we’re on the right path. We need people pushing tried-and-true theories to the next leap of precision, to show that nothing is hiding in the gaps and make it clearer when something is. We need many people pushing many different paths: all are necessary, and any one might be crucial.

Often, one of these paths gets the lion’s share of the glory: the press, the Nobel, the mention in the history books. But the other paths still matter: we wouldn’t be confident in the science if they didn’t exist. Most working scientists will be on those other paths, as a matter of course. But we still need them to get science done.

The Conference Dilemma: Freshness vs. Breadth

Back in 2017, I noticed something that should have struck me as a little odd. My sub-field has a big yearly conference, called Amplitudes, that brings in everyone who works on our kind of research. Amplitudes 2017 was fun, but not “fresh”: most people talked about work they had already published. A smaller conference I went to that year, called QCD Meets Gravity, was much “fresher”: a lot of discussion of work in progress and work “hot off the presses”.

At the time, I chalked the difference up to timing: it was a few months later, and people happened to have projects that matured around then. But I realized recently there’s another reason, one why you would expect bigger conferences to have less fresh content.

See, I’ve recently been on the other “side of the curtain”: I was an organizer for Amplitudes last year. And I noticed one big obstacle to having fresh content: the timeframe.

The bigger a conference is, the longer in advance you need to invite speakers. It’s a bigger task to organize everyone, to make sure travel and hotels and raw availability work, that everyone has time to prepare their talks and you have a nice full (but not too full) schedule. So when we started asking people, we didn’t know what the “freshest” work was going to be. We had recommendations from our scientific committee (a group of experts in the subfield whose job is to suggest speakers), but in practice the goal is more one of breadth than freshness: we needed to make sure that everybody in our community was represented.

A smaller conference can get around this. It can be organized a bit later, so the organizers have more information about new developments. It covers a smaller area, so the organizers have more information about new hot topics and unpublished results. And it typically invites most of the sub-community anyway, so you’re guaranteed to cover the hot new stuff just by raw completeness.

This doesn’t mean small conferences are “just better” or anything like that. Breadth is genuinely useful: a big conference covering a whole subfield is great for bringing a community together, getting everyone on a shared page and expanding their horizons. There’s a real tradeoff between those goals and getting a conference with the latest progress. It’s not a fixed tradeoff, we can improve both goals at once (I think at Amplitudes we as organizers could have been better at highlighting unpublished work), but we still have to make choices of what to emphasize.

Einstein-Years

Scott Aaronson recently published an interesting exchange on his blog Shtetl-Optimized, between him and cognitive psychologist Steven Pinker. The conversation was about AI: Aaronson is optimistic (though not insanely so), while Pinker is pessimistic (again, not insanely so). While fun reading, the whole thing would normally be a bit too off-topic for this blog, except that Aaronson’s argument ended up invoking something I do know a bit about: how we make progress in theoretical physics.

Aaronson was trying to respond to an argument of Pinker’s, that super-intelligence is too vague and broad to be something we could expect an AI to have. Aaronson asks us to imagine an AI that is nothing more or less than a simulation of Einstein’s brain. Such a thing isn’t possible today, and might not even be efficient, but it has the advantage of being something concrete we can all imagine. Aaronson then suggests imagining that AI sped up a thousandfold, so that in one year it covers a thousand years of Einstein’s thought. Such an AI couldn’t solve every problem, of course. But in theoretical physics, surely such an AI could be safely described as super-intelligent: an amazing power that would change the shape of physics as we know it.

I’m not as sure of this as Aaronson is. We don’t have a machine that generates a thousand Einstein-years to test, but we do have one piece of evidence: the 76 Einstein-years the man actually lived.

Einstein is rightly famous as a genius in theoretical physics. His annus mirabilis resulted in five papers that revolutionized the field, and the next decade saw his theory of general relativity transform our understanding of space and time. Later, he explored what general relativity was capable of and framed challenges that deepened our understanding of quantum mechanics.

After that, though…not so much. For Einstein-decades, he tried to work towards a new unified theory of physics, and as far as I’m aware made no useful progress at all. I’ve never seen someone cite work from that period of Einstein’s life.

Aaronson mentions simulating Einstein “at his peak”, and it would be tempting to assume that the unified theory came “after his peak”, when age had weakened his mind. But while that kind of thing can sometimes be an issue for older scientists, I think it’s overstated. I don’t think careers peak early because of “youthful brains”, and with the exception of genuine dementia I don’t think older physicists are that much worse-off cognitively than younger ones. The reason so many prominent older physicists go down unproductive rabbit-holes isn’t because they’re old. It’s because genius isn’t universal.

Einstein made the progress he did because he was the right person to make that progress. He had the right background, the right temperament, and the right interests to take others’ mathematics and treat it seriously as physics. As he aged, he built on what he found, and that background in turn enabled him to do more great things. But eventually, the path he walked down simply wasn’t useful anymore. His story ended, driven to a theory that simply wasn’t going to work, because given his experience up to that point that was the work that interested him most.

I think genius in physics is in general like that. It can feel very broad because a good genius picks up new tricks along the way, and grows their capabilities. But throughout, you can see the links: the tools mastered at one age that turn out to be just right for a new pattern. For the greatest geniuses in my field, you can see the “signatures” in their work, hints at why they were just the right genius for one problem or another. Give one a thousand years, and I suspect the well would eventually run dry: the state of knowledge would no longer be suitable for even their breadth.

…of course, none of that really matters for Aaronson’s point.

A century of Einstein-years wouldn’t have found the Standard Model or String Theory, but a century of physicist-years absolutely did. If instead of a simulation of Einstein, your AI was a simulation of a population of scientists, generating new geniuses as the years go by, then the argument works again. Sure, such an AI would be much more expensive, much more difficult to build, but the first one might have been as well. The point of the argument is simply to show such a thing is possible.

The core of Aaronson’s point rests on two key traits of technology. Technology is replicable: once we know how to build something, we can build more of it. Technology is scalable: if we know how to build something, we can try to build a bigger one with more resources. Evolution can tap into both of these, but not reliably: just because it’s possible to build a mind a thousand times better at some task doesn’t mean it will.

That is why the possibility of AI leads to the possibility of super-intelligence. If we can make a computer that can do something, we can make it do that something faster. That something doesn’t have to be “general”, you can have programs that excel at one task or another. For each such task, with more resources you can scale things up: so anything a machine can do now, a later machine can probably do better. Your starting-point doesn’t necessarily even have to be efficient, or a good algorithm: bad algorithms will take longer to scale, but could eventually get there too.

The only question at that point is “how fast?” I don’t have the impression that’s settled. The achievements that got Pinker and Aaronson talking, GPT-3 and DALL-E and so forth, impressed people by their speed, by how soon they got to capabilities we didn’t expect them to have. That doesn’t mean that something we might really call super-intelligence is close: that has to do with the details, with what your target is and how fast you can actually scale. And it certainly doesn’t mean that another approach might not be faster! (As a total outsider, I can’t help but wonder if current ML is in some sense trying to fit a cubic with straight lines.)

It does mean, though, that super-intelligence isn’t inconceivable, or incoherent. It’s just the recognition that technology is a master of brute force, and brute force eventually triumphs. If you want to think about what happens in that “eventually”, that’s a very important thing to keep in mind.

Proxies for Proxies

Why pay scientists?

Maybe you care about science itself. You think that exploring the world should be one of our central goals as human beings, that it “makes our country worth defending”.

Maybe you care about technology. You support science because, down the line, you think it will give us new capabilities that improve people’s lives. Maybe you expect this to happen directly, or maybe indirectly as “spinoff” inventions like the internet.

Maybe you just think science is cool. You want the stories that science tells: they entertain you, they give you a place in the world, they help distract from the mundane day to day grind.

Maybe you just think that the world ought to have scientists in it. You can think of it as a kind of bargain, maintaining expertise so that society can tackle difficult problems. Or you can be more cynical, paying early-career scientists on the assumption that most will leave academia and cheapen labor costs for tech companies.

Maybe you want to pay the scientists to teach, to be professors at universities. You notice that they don’t seem to be happy if you don’t let them research, so you throw a little research funding at them, as a treat.

Maybe you just want to grow your empire: your department, your university, the job numbers in your district.

In most jobs, you’re supposed to do what people pay you to do. As a scientist, the people who pay you have all of these motivations and more. You can’t simply choose to do what people pay you to do.

So you come up with a proxy. You sum up all of these ideas, into a vague picture of what all those people want. You have some idea of scientific quality: not just a matter of doing science correctly and carefully, but doing interesting science. It’s not something you ever articulate. It’s likely even contradictory: after all, the goals it approximates often are. Nonetheless, it’s your guide, and not just your guide: it’s the guide of those who hire you, those who choose if you get promoted or whether you get more funding. All of these people have some vague idea in their head of what makes good science, their own proxy for the desires of the vast mass of voters and decision-makers and funders.

But of course, the standard is still vague. Should good science be deep? Which topics are deeper than others? Should it be practical? Practical for whom? Should it be surprising? What do you expect to happen, and what would surprise you? Should it get the community excited? Which community?

As a practicing scientist, you have to build your own proxy for these proxies. The same work that could get you hired in one place might meet blank stares at another, and you can’t build your life around those unpredictable quirks. So you make your own vague idea of what you’re supposed to do, an alchemy of what excites you and what makes an impact and what your friends are doing. You build a stand-in in your head, on the expectation that no-one else will have quite the same stand-in, then go out and convince the other stand-ins to give money to your version. You stand on a shifting pile of unwritten rules, subtler even than some artists, because at the end of the day there’s never a real client to be seen. Just another proxy.

Types of Undergrad Projects

I saw a discussion on Twitter recently, about PhD programs in the US. Apparently universities are putting more and more weight on whether prospective students published a paper during their Bachelor’s degree. For some, it’s even an informal requirement. Some of those in the discussion were skeptical that the students were really contributing to these papers much, and thought that most of the work must have been done by the papers’ other authors. If so, this would mean universities are relying more and more on a metric that depends on whether students can charm their professors enough to be “included” in this way, rather than their own abilities.

I won’t say all that much about the admissions situation in the US. (Except to say that if you find yourself making up new criteria to carefully sift out a few from a group of already qualified-enough candidates, maybe you should consider not doing that.) What I did want to say a bit about is what undergraduates can typically actually do, when it comes to research in my field.

First, I should clarify that I’m talking about students in the US system here. Undergraduate degrees in Europe follow a different path. Students typically take three years to get a Bachelor’s degree, often with a project at the end, followed by a two-year Master’s degree capped with a Master’s thesis. A European Master’s thesis doesn’t have to result in a paper, but is often at least on that level, while a European Bachelor project typically isn’t. US Bachelor’s degrees are four years, so one might expect a Bachelor’s thesis to be in between a European Bachelor’s project and Master’s thesis. In practice, it’s a bit different: courses for Master’s students in Europe will generally cover material taught to PhD students in the US, so a typical US Bachelor’s student won’t have had some courses that have a big role in research in my field, like Quantum Field Theory. On the other hand, the US system is generally much more flexible, with students choosing more of their courses and having more opportunities to advance ahead of the default path. So while US Bachelor’s students don’t typically take Quantum Field Theory, the more advanced students can and do.

Because of that, how advanced a given US Bachelor’s student is varies. A small number are almost already PhD students, and do research to match. Most aren’t, though. Despite that, it’s still possible for such a student to complete a real research project in theoretical physics, one that results in a real paper. What does that look like?

Sometimes, it’s because the student is working with a toy model. The problems we care about in theoretical physics can be big and messy, involving a lot of details that only an experienced researcher will know. If we’re lucky, we can make a simpler version of the problem, one that’s easier to work with. Toy models like this are often self-contained, the kind of thing a student can learn without all of the background we expect. The models may be simpler than the real world, but they can still be interesting, suggesting new behavior that hadn’t been considered before. As such, with a good choice of toy model an undergraduate can write something that’s worthy of a real physics paper.

Other times, the student is doing something concrete in a bigger collaboration. This isn’t quite the same as the “real scientists” doing all the work, because the student has a real task to do, just one that is limited in scope. Maybe there is particular computer code they need to get working, or a particular numerical calculation they need to do. The calculation may be comparatively straightforward, but in combination with other results it can still merit a paper. My first project as a PhD student was a little like that, tackling one part of a larger calculation. Once again, the task can be quite self-contained, the kind of thing you can teach a student over a summer project.

Undergraduate projects in the US won’t always result in a paper, and I don’t think anyone should expect, or demand, that they do. But a nontrivial number do, and not because the student is “cheating”. With luck, a good toy model or a well-defined sub-problem can lead a Bachelor’s student to make a real contribution to physics, and get a paper in the bargain.