Amplitudes has grown, and keeps growing. The last time we met in person, there were 175 of us. This year, many people are skipping: some avoiding travel due to COVID, others just exhausted from a summer filled with long-postponed conferences. Nonetheless, we have more people here than then: 222 registered participants!
The large number of people means a large number of talks. Almost all were quite short, 25+5 minutes. Some speakers took advantage of the short length to deliver very accessible talks. Others seemed to think of the time limit as an excuse to cut short the introduction and dive right into technical details. We had just a few 40+5 minute talks, each a review from an adjacent field.
It’s been fun seeing people in person again. I think half of my conversations started with “It’s been a long time!” It’s easy for motivation to wane without regular contact with the wider field, without the chance to get enthusiastic about shared goals and brainstorm big questions together.
I’ll probably give a longer retrospective later: the packed schedule means I don’t have much time to write! But I can say that I’ve largely enjoyed this: the organizers were organized, the presenters presented, and things felt a bit more like they ought to in the world.
Scott Aaronson recently published an interesting exchange on his blog Shtetl-Optimized, between him and cognitive psychologist Steven Pinker. The conversation was about AI: Aaronson is optimistic (though not insanely so); Pinker is pessimistic (again, not insanely so). While fun reading, the whole thing would normally be a bit too off-topic for this blog, except that Aaronson’s argument ended up invoking something I do know a bit about: how we make progress in theoretical physics.
Aaronson was trying to respond to an argument of Pinker’s, that super-intelligence is too vague and broad to be something we could expect an AI to have. Aaronson asks us to imagine an AI that is nothing more or less than a simulation of Einstein’s brain. Such a thing isn’t possible today, and might not even be efficient, but it has the advantage of being something concrete we can all imagine. Aaronson then suggests imagining that AI sped up a thousandfold, so that in one year it covers a thousand years of Einstein’s thought. Such an AI couldn’t solve every problem, of course. But in theoretical physics, surely such an AI could be safely described as super-intelligent: an amazing power that would change the shape of physics as we know it.
I’m not as sure of this as Aaronson is. We don’t have a machine that generates a thousand Einstein-years to test, but we do have one piece of evidence: the 76 Einstein-years the man actually lived.
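Early on, those years were extraordinarily productive: special relativity, general relativity, and foundational work on quantum theory. After that, though…not so much. For Einstein-decades, he tried to work towards a new unified theory of physics, and as far as I’m aware made no useful progress at all. I’ve never seen someone cite work from that period of Einstein’s life.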
Aaronson mentions simulating Einstein “at his peak”, and it would be tempting to assume that the unified theory came “after his peak”, when age had weakened his mind. But while that kind of thing can sometimes be an issue for older scientists, I think it’s overstated. I don’t think careers peak early because of “youthful brains”, and with the exception of genuine dementia I don’t think older physicists are that much worse-off cognitively than younger ones. The reason so many prominent older physicists go down unproductive rabbit-holes isn’t because they’re old. It’s because genius isn’t universal.
Einstein made the progress he did because he was the right person to make that progress. He had the right background, the right temperament, and the right interests to take others’ mathematics seriously as physics. As he aged, he built on what he found, and that background in turn enabled him to do more great things. But eventually, the path he walked down simply wasn’t useful anymore. His story ended with a theory that simply wasn’t going to work, because, given his experience up to that point, that was the work that interested him most.
I think genius in physics is in general like that. It can feel very broad because a good genius picks up new tricks along the way, and grows their capabilities. But throughout, you can see the links: the tools mastered at one age that turn out to be just right for a new pattern. For the greatest geniuses in my field, you can see the “signatures” in their work, hints at why they were just the right genius for one problem or another. Give one a thousand years, and I suspect the well would eventually run dry: the state of knowledge would no longer be suitable for even their breadth.
…of course, none of that really matters for Aaronson’s point.
A century of Einstein-years wouldn’t have found the Standard Model or String Theory, but a century of physicist-years absolutely did. If instead of a simulation of Einstein, your AI was a simulation of a population of scientists, generating new geniuses as the years go by, then the argument works again. Sure, such an AI would be much more expensive and much more difficult to build, but the simulated Einstein might have been as well. The point of the argument is simply to show such a thing is possible.
The core of Aaronson’s point rests on two key traits of technology. Technology is replicable: once we know how to build something, we can build more of it. Technology is scalable: if we know how to build something, we can try to build a bigger one with more resources. Evolution can tap into both of these, but not reliably: just because it’s possible to build a mind a thousand times better at some task doesn’t mean it will.
That is why the possibility of AI leads to the possibility of super-intelligence. If we can make a computer that can do something, we can make it do that something faster. That something doesn’t have to be “general”: you can have programs that excel at one task or another. For each such task, with more resources you can scale things up, so anything a machine can do now, a later machine can probably do better. Your starting point doesn’t even have to be efficient, or a good algorithm: bad algorithms will take longer to scale, but could eventually get there too.
The only question at that point is “how fast?” I don’t have the impression that’s settled. The achievements that got Pinker and Aaronson talking, GPT-3 and DALL-E and so forth, impressed people by their speed, by how soon they got to capabilities we didn’t expect them to have. That doesn’t mean that something we might really call super-intelligence is close: that has to do with the details, with what your target is and how fast you can actually scale. And it certainly doesn’t mean that another approach might not be faster! (As a total outsider, I can’t help but wonder if current ML is in some sense trying to fit a cubic with straight lines.)
It does mean, though, that super-intelligence isn’t inconceivable, or incoherent. It’s just the recognition that technology is a master of brute force, and brute force eventually triumphs. If you want to think about what happens in that “eventually”, that’s a very important thing to keep in mind.
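A toy version of that cubic-with-straight-lines aside, purely my own illustration and not a claim about how any ML system works: approximate x³ on [0, 1] with straight segments joining evenly spaced points. The straight lines never become exact, but each doubling of the number of segments cuts the worst error by roughly a factor of four. Brute force gets there, slowly, while an ansatz that is itself a cubic would be exact with just four coefficients. The function names and segment counts below are arbitrary choices.

```python
from typing import Callable


def cubic(x: float) -> float:
    return x ** 3


def piecewise_linear_max_error(
    f: Callable[[float], float], segments: int, samples_per_segment: int = 50
) -> float:
    """Largest |f(x) - L(x)| on [0, 1], where L joins f's values at evenly spaced
    points with straight lines. Estimated by sampling points inside each segment."""
    worst = 0.0
    h = 1.0 / segments
    for k in range(segments):
        x0, x1 = k * h, (k + 1) * h
        y0, y1 = f(x0), f(x1)
        width = x1 - x0
        for j in range(samples_per_segment + 1):
            x = x0 + width * j / samples_per_segment
            line = y0 + (y1 - y0) * (x - x0) / width  # straight line through both ends
            worst = max(worst, abs(f(x) - line))
    return worst


if __name__ == "__main__":
    previous = None
    for n in (2, 4, 8, 16, 32):
        err = piecewise_linear_max_error(cubic, n)
        note = "" if previous is None else f"  (shrank by {previous / err:.2f}x)"
        print(f"{n:>3} segments: max error {err:.2e}{note}")
        previous = err
```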
The Niels Bohr Institute is hosting a conference this week on New Ideas in Cosmology. I’m no cosmologist, but it’s a pretty cool field, so as a local I’ve been sitting in on some of the talks. So far they’ve had a selection of really interesting speakers with quite a variety of interests, including a talk by Roger Penrose with his trademark hand-stippled drawings.
One thing that has impressed me has been the “interdisciplinary” feel of the conference. By all rights this should be one “discipline”, cosmology. But in practice, each speaker came at the subject from a different direction. They all had a shared core of knowledge, common models of the universe they all compare to. But the knowledge they brought to the subject varied: some had deep knowledge of the mathematics of gravity, others worked with string theory, or particle physics, or numerical simulations. Each talk, aware of the varied audience, was a bit “colloquium-style”, introducing a framework before diving into the latest research. Each speaker knew enough to talk to the others, but not so much that they couldn’t learn from them. It’s been unexpectedly refreshing, a real interdisciplinary conference done right.
I’m at a conference this week of a very particular type: a birthday conference. When folks in my field turn 60, their students and friends organize a special conference for them, celebrating their research legacy. With COVID restrictions just loosening, my advisor Michael Douglas is getting a last-minute conference. And as one of the last couple of students he graduated at Stony Brook, I naturally showed up.
The conference, Mikefest, is at the Institut des Hautes Études Scientifiques, just outside of Paris. Mike was a big supporter of the IHES, putting in a lot of fundraising work for them. Another big supporter, James Simons, was Mike’s employer for a little while after his time at Stony Brook. The conference center we’re meeting in is named for him.
I wasn’t involved in organizing the conference, so it was interesting seeing differences between this and other birthday conferences. Other conferences focus on the birthday prof’s “family tree”: their advisor, their students, and some of their postdocs. We’ve had several talks from Mike’s postdocs, and one from his advisor, but only one from a student. Including that student and me, three of Mike’s students are here; two others have had their work mentioned but aren’t speaking or attending.
Most of the speakers have collaborated with Mike, but only for a few papers each. All of them emphasized a broader debt though, for discussions and inspiration outside of direct collaboration. The message, again and again, is that Mike’s work has been broad enough to touch a wide range of people. He’s worked on branes and the landscape of different string theory universes, pure mathematics and computation, neuroscience and recently even machine learning. The talks generally begin with a few anecdotes about Mike, before pivoting into research talks on the speakers’ recent work. The recent-ness of the work is perhaps another difference from some birthday conferences: as one speaker said, this wasn’t just a celebration of Mike’s past, but a “welcome back” after his return from the finance world.
One thing I don’t know is how much this conference might have been limited by coming together on short notice. For other birthday conferences impacted by COVID (and I’m thinking of one in particular), it would be nice to have enough lead time to get most of the birthday prof’s friends and “academic family” there in person. As-is, though, Mike seems to be having fun regardless.
Before this year’s prize was announced, I remember a few “water cooler chats” about who might win. No guess came close, though. The Nobel committee seems to have settled into a strategy of prizes on a loosely linked “basket” of topics, with half the prize going to a prominent theorist and the other half going to two experimental, observational, or (in this case) computational physicists. It’s still unclear why they’re doing this, but regardless it makes it hard to predict what they’ll do next!
When I read the announcement, my first reaction was, “surely it’s not that Parisi?” Giorgio Parisi is known in my field for the Altarelli-Parisi equations (more properly known as the DGLAP equations, the longer acronym because, as is often the case in physics, the Soviets got there first). These equations are in some sense why the scattering amplitudes I study are ever useful at all. I calculate collisions of individual fundamental particles, like quarks and gluons, but a real particle collider like the LHC collides protons. Protons are messy, interacting combinations of quarks and gluons. When they collide you need not merely the equations describing colliding quarks and gluons, but those that describe their messy dynamics inside the proton, and in particular how those dynamics look different for experiments with different energies. The equation that describes that is the DGLAP equation.
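For the curious, here is the DGLAP equation in its standard leading-order form. The notation is the usual textbook one rather than anything from this post: f_i(x, μ²) is the density of parton i (a quark flavor or the gluon) carrying a fraction x of the proton’s momentum when probed at energy scale μ, α_s is the strong coupling, and the splitting functions P_ij(z) encode how one parton radiates another. Read schematically, it says how those densities shift as the collision energy changes.

```latex
\mu^2 \frac{\partial f_i(x,\mu^2)}{\partial \mu^2}
  = \frac{\alpha_s(\mu^2)}{2\pi} \sum_j \int_x^1 \frac{dz}{z}\,
    P_{ij}(z)\, f_j\!\left(\frac{x}{z}, \mu^2\right)
```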
As it turns out, Parisi is known for a lot more than the DGLAP equation. He is best known for his work on “spin glasses”, models of disordered magnetic materials where competing interactions pull the spins in conflicting directions, so they never quite settle down. He also worked on a variety of other complex systems, including flocks of birds!
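One ingredient of spin glasses, called frustration, already shows up with three spins. In this toy sketch (my own illustration; real spin-glass models such as Sherrington-Kirkpatrick use many spins with random couplings, where brute force quickly becomes hopeless), each pair of spins on a triangle “wants” to point in opposite directions. There’s no way to satisfy all three pairs at once, so the lowest energy is shared by six different arrangements, each leaving one pair unhappy.

```python
from itertools import product
from typing import Dict, List, Tuple

Spins = Tuple[int, ...]
Couplings = Dict[Tuple[int, int], float]


def energy(spins: Spins, couplings: Couplings) -> float:
    """E = sum of J_ij * s_i * s_j over bonds; J_ij > 0 favors opposite spins."""
    return sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())


def ground_states(n: int, couplings: Couplings) -> Tuple[float, List[Spins]]:
    """Try all 2**n arrangements of n up/down (+1/-1) spins; return the lowest
    energy and every arrangement that achieves it."""
    best = float("inf")
    winners: List[Spins] = []
    for spins in product((-1, 1), repeat=n):
        e = energy(spins, couplings)
        if e < best:
            best, winners = e, [spins]
        elif e == best:
            winners.append(spins)
    return best, winners


def unsatisfied_bonds(spins: Spins, couplings: Couplings) -> int:
    """Count bonds whose preference (aligned or anti-aligned) is violated."""
    return sum(1 for (a, b), j in couplings.items() if j * spins[a] * spins[b] > 0)


if __name__ == "__main__":
    # Three spins on a triangle, every pair preferring to point opposite ways.
    triangle: Couplings = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
    e0, states = ground_states(3, triangle)
    print(e0, len(states))  # -1.0 6
    print([unsatisfied_bonds(s, triangle) for s in states])  # [1, 1, 1, 1, 1, 1]
```

Flip the signs of the couplings so every pair prefers to align, and the frustration disappears: two ground states, every bond satisfied.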
I don’t know as much about Manabe and Hasselmann’s work. I’ve only seen a few talks on the details of climate modeling. I’ve seen plenty of talks on other types of computer modeling, though, from people who model stars, galaxies, or black holes, and from those I can appreciate what Manabe and Hasselmann did. I recognize the importance of those first one-dimensional models, a single column of air, especially back in the 60’s when computer power was limited. Even more, I recognize how impressive it is for someone to stay at the forefront of that kind of field, upgrading models for forty years to stay relevant into the 2000’s, as Manabe did. Those talks also taught me about the challenge of coupling different scales: how small effects in churning fluids can add up and affect the simulation, and how hard it is to model different scales at once. To use these effects to discover which models are reliable, as Hasselmann did, is a major accomplishment.
Scientific programming was in the news lately, when doubts were raised about a coronavirus simulation by researchers at Imperial College London. While the doubts appear to have been put to rest, doing so involved digging through some seriously messy code. The whole situation seems to have gotten a lot of people worried. If these people are that bad at coding, why should we trust their science?
I don’t know much about coronavirus simulations; my knowledge there begins and ends with a talk I saw last month. But I know a thing or two about bad scientific code, because I write it. My code is atrocious. And I’ve seen published code that’s worse.
Why do scientists write bad code?
In part, it’s a matter of training. Some scientists have formal coding training, but most don’t. I took two CS courses in college and that was it. Despite that lack of training, we’re expected and encouraged to code. Before I took those courses, I spent a summer working in a particle physics lab, where I was expected to pick up the C++-based interface pretty much on the fly. I don’t think there’s another community out there that has as much reason to code as scientists do, and as little training for it.
Would it be useful for scientists to have more of the tools of a trained coder? Sometimes, yeah. Version control is a big one: I’ve collaborated on papers that used Git and papers that didn’t, and there’s a big difference. There are coding habits that would speed up our work and lead to fewer dead ends, and they’re worth picking up when we have the time.
But there’s a reason we don’t prioritize “proper coding”. It’s because the things we’re trying to do, from a coding perspective, are really easy.
What, code-wise, is a coronavirus simulation? A vector of “people”, really just simple labels, all randomly infecting each other and recovering, with a few parameters describing how likely they are to do so and how long it takes. What do I do, code-wise? Mostly, giant piles of linear algebra.
These are not some sort of cutting-edge programming tasks. These are things people have been able to do since the dawn of computers. These are things that, when you screw them up, become quite obvious quite quickly.
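To give a sense of how simple this is code-wise, here’s a toy agent-based SIR sketch. To be clear, it is not the Imperial College model: the parameter names and values (contacts per day, transmission and recovery probabilities) are made-up assumptions. Each “person” really is just a label, and a whole epidemic fits in a few dozen lines.

```python
import random
from collections import Counter
from typing import List, Tuple

SUSCEPTIBLE, INFECTED, RECOVERED = "S", "I", "R"


def simulate_epidemic(
    population: int = 1000,
    initial_infected: int = 5,
    contacts_per_day: int = 10,
    p_transmit: float = 0.03,
    p_recover: float = 0.1,
    days: int = 100,
    seed: int = 0,
) -> List[Tuple[int, int, int]]:
    """Toy agent-based SIR model (hypothetical parameters); returns daily (S, I, R) counts."""
    rng = random.Random(seed)
    people = [INFECTED] * initial_infected + [SUSCEPTIBLE] * (population - initial_infected)
    history = []
    for _ in range(days):
        tomorrow = people[:]  # update from today's states so the loop order doesn't matter
        for i, state in enumerate(people):
            if state != INFECTED:
                continue
            # Each infected person meets some random others and may infect them...
            for _ in range(contacts_per_day):
                j = rng.randrange(population)
                if people[j] == SUSCEPTIBLE and rng.random() < p_transmit:
                    tomorrow[j] = INFECTED
            # ...and recovers with a fixed probability each day.
            if rng.random() < p_recover:
                tomorrow[i] = RECOVERED
        people = tomorrow
        counts = Counter(people)
        s, inf, r = counts[SUSCEPTIBLE], counts[INFECTED], counts[RECOVERED]
        assert s + inf + r == population  # cheap sanity check: every label is S, I, or R
        history.append((s, inf, r))
    return history


if __name__ == "__main__":
    for day, (s, i, r) in enumerate(simulate_epidemic(), start=1):
        if day % 10 == 0:
            print(f"day {day:3d}: S={s} I={i} R={r}")
```

Checks like the assertion above, or noticing that the number of susceptible people ever goes up, are the kind of cheap tests that make bugs in code like this hard to miss.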
Compared to that, the everyday tasks of software developers, like making a reliable interface for users, or efficient graphics, are much more difficult. They’re tasks that really require good coding practices, that just can’t function without them.
For us, the important part is not the coding itself, but what we’re doing with it. Whatever bugs are in a coronavirus simulation, they will have much less impact than, for example, the way in which the simulation includes superspreaders. Bugs in my code give me obviously wrong answers; bad scientific assumptions are much harder for me to root out.
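Here’s a rough sketch of why an assumption like that can matter more than any bug. It uses a branching process, a standard toy model of an outbreak’s early stage rather than anything from the Imperial code: every case infects a random number of others, averaging R. The probability that a single introduction eventually dies out is the smallest solution of q = G(q), where G is the probability generating function of the number of infections per case. A common way to model superspreading is a negative binomial distribution, which keeps the average fixed but concentrates infections in a few cases; the values R = 2 and dispersion k = 0.1 here are purely illustrative assumptions.

```python
import math
from typing import Callable

PGF = Callable[[float], float]


def extinction_probability(pgf: PGF, iterations: int = 10_000) -> float:
    """Probability that an outbreak started by one case dies out: the smallest
    solution of q = G(q) on [0, 1], reached by iterating q -> G(q) from q = 0."""
    q = 0.0
    for _ in range(iterations):
        q = pgf(q)
    return q


def poisson_pgf(r: float) -> PGF:
    # Every case equally infectious: infections per case ~ Poisson(mean r).
    return lambda s: math.exp(r * (s - 1.0))


def negative_binomial_pgf(r: float, k: float) -> PGF:
    # Same mean r, but small dispersion k: most cases infect nobody, a few infect many.
    return lambda s: (1.0 + (r / k) * (1.0 - s)) ** (-k)


if __name__ == "__main__":
    R = 2.0
    print(f"no superspreading:    {extinction_probability(poisson_pgf(R)):.3f}")
    print(f"heavy superspreading: {extinction_probability(negative_binomial_pgf(R, 0.1)):.3f}")
```

With R = 2, the everyone-alike assumption gives an extinction probability of about 0.2, while the superspreading version gives roughly 0.89. Same average, a very different picture, and no bug involved.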
There’s an exception that proves the rule here: when the coding task is actually difficult, scientists step up and write better code. Scientists who want to run efficiently on supercomputers, who worry about numerical error, or who need to simulate many scales at once learn how to code properly. The code behind the LHC might still be jury-rigged by industry standards, but it’s light-years better than typical scientific code.
I get the furor around the Imperial group’s code. I get that, when a government makes a critical decision, you hope that their every input is as professional as possible. But without getting too political for this blog, let me just say that whatever your politics are, if any of it is based on science, it comes from code like this. Psychology studies, economic modeling, polling…they’re using code, and it’s jury-rigged to hell. Scientists just have more important things to worry about.
On one hand, the practical benefits of a 53-qubit computer are pretty minimal. Scott discusses some applications: you can generate random numbers, distributed in a way that will let others verify that they are truly random, the kind of thing it’s occasionally handy to do in cryptography. Still, by itself this won’t change the world, and compared to the quantum computing hype, I can understand if people find this underwhelming.
Ok, I’m actually just re-phrasing what I said before. The Extended Church-Turing Thesis proposes that a classical computer (more specifically, a probabilistic Turing machine) can efficiently simulate any reasonable computation. Falsifying it means finding something that a classical computer cannot compute efficiently but another sort of computer (say, a quantum computer) can. If the calculation Google did truly can’t be done efficiently on a classical computer (this is not proven, though experts seem to expect it to be true) then yes, that’s what Google claims to have done.
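Some back-of-the-envelope arithmetic shows why 53 qubits is a meaningful size. The most naive classical simulation stores one complex amplitude for every basis state, 2ⁿ of them for n qubits. Assuming 16 bytes per amplitude (double-precision complex numbers), 53 qubits needs about 144 petabytes. Cleverer classical methods avoid storing the whole state, which is part of why classical hardness is expected rather than proven; this sketch only captures the brute-force cost.

```python
def statevector_bytes(qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for a brute-force state vector: one complex amplitude per basis
    state, 2**qubits of them, at 16 bytes each for double-precision complex."""
    return (2 ** qubits) * bytes_per_amplitude


if __name__ == "__main__":
    for n in (10, 20, 30, 40, 53):
        print(f"{n:2d} qubits: {statevector_bytes(n):.3e} bytes")
```

Each extra qubit doubles the memory, which is the whole point: the classical cost grows exponentially while the quantum device just adds one more qubit.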
So we get back to the real question: should we be impressed by quantum supremacy?
Well, should we have been impressed by the Higgs?
The detection of the Higgs boson in 2012 hasn’t led to any new Higgs-based technology. No-one expected it to. It did teach us something about the world: that the Higgs boson exists, and that it has a particular mass. I think most people accept that that’s important: that it’s worth knowing how the world works on a fundamental level.
Google may have detected the first-known violation of the Extended Church-Turing Thesis. This could eventually lead to some revolutionary technology. For now, though, it hasn’t. Instead, it teaches us something about the world.
It may not seem like it, at first. Unlike the Higgs boson, “Extended Church-Turing is false” isn’t a law of physics. Instead, it’s a fact about our capabilities. It’s a statement about the kinds of computers we can and cannot build, about the kinds of algorithms we can and cannot implement, the calculations we can and cannot do.
Facts about our capabilities are still facts about the world. They’re still worth knowing, for the same reasons that facts about the world are still worth knowing. They still give us a clearer picture of how the world works, which tells us in turn what we can and cannot do. According to the leaked paper, Google has taught us a new fact about the world, a deep fact about our capabilities. If that’s true we should be impressed, even without new technology.
There’s a picture we learn in high school. It’s not the whole story, certainly: philosophers of science have much more sophisticated notions. But for practicing scientists, it’s a picture that often sits in the back of our minds, informing what we do. Because of that, it’s worth examining in detail.
In the high school picture, scientific theories make predictions. Importantly, postdictions don’t count: if you “predict” something that already happened, it’s too easy to cheat and adjust your prediction. Also, your predictions must be different from those of other theories. If all you can do is explain the same results with different words you aren’t doing science, you’re doing “something else” (“metaphysics”, “religion”, “mathematics”…whatever the person you’re talking to wants to make fun of, but definitely not science).
Seems reasonable, right? Let’s try a thought experiment.
In the late 1950’s, the physics of protons and neutrons was still quite mysterious. They seemed to be part of a bewildering zoo of particles that no-one could properly explain. In the 60’s and 70’s the field started converging on the right explanation, from Gell-Mann’s eightfold way to the parton model to the full theory of quantum chromodynamics (QCD for short). Today we understand the theory well enough to package things into computer code: amplitudes programs like BlackHat for collisions of individual quarks, jet algorithms that describe how those quarks become signals in colliders, lattice QCD implemented on supercomputers for pretty much everything else.
Now imagine that you had a time machine, prodigious programming skills, and a grudge against 60’s-era physicists.
Suppose you wrote a computer program that combined the best of QCD in the modern world. BlackHat and more from the amplitudes side, the best jet algorithms and lattice QCD code, and more: a program that could reproduce any calculation in QCD that anyone can do today. Further, suppose you don’t care about silly things like making your code readable. Since I began the list above with BlackHat, we’ll call the combined box of different codes BlackBox.
Now suppose you went back in time, and told the bewildered scientists of the 50’s that nuclear physics was governed by a very complicated set of laws: the ones implemented in BlackBox.
Your “BlackBox theory” passes the high school test. Not only would it match all previous observations, it could make predictions for any experiment the scientists of the 50’s could devise. Up until the present day, your theory would match observations as well as…well as well as QCD does today.
(Let’s ignore for the moment that they didn’t have computers that could run this code in the 50’s. This is a thought experiment: we can fudge things a bit.)
Now suppose that one of those enterprising 60’s scientists, Gell-Mann or Feynman or the like, noticed a pattern. Maybe they got it from an experiment scattering electrons off of protons, maybe they saw it in BlackBox’s code. They notice that different parts of “BlackBox theory” run on related rules. Based on those rules, they suggest a deeper reality: protons are made of quarks!
But is this “quark theory” scientific?
“Quark theory” doesn’t make any new predictions. Anything you could predict with quarks, you could predict with BlackBox. According to the high school picture of science, for these 60’s scientists quarks wouldn’t be scientific: they would be “something else”, metaphysics or religion or mathematics.
And in practice? I doubt that many scientists would care.
“Quark theory” makes the same predictions as BlackBox theory, but I think most of us understand that it’s a better theory. It actually explains what’s going on. It takes different parts of BlackBox and unifies them into a simpler whole. And even without new predictions, that would be enough for the scientists in our thought experiment to accept it as science.
Why am I thinking about this? For two reasons:
First, I want to think about what happens when we get to a final theory, a “Theory of Everything”. It’s probably ridiculously arrogant to think we’re anywhere close to that yet, but nonetheless the question is on physicists’ minds more than it has been for most of history.
Right now, the Standard Model has many free parameters, numbers we can’t predict and must fix based on experiments. Suppose there are two options for a final theory: one that has a free parameter, and one that doesn’t. Once that one free parameter is fixed, both theories will match every test you could ever devise (they’re theories of everything, after all).
If we come up with both theories before testing that final parameter, then all is well. The theory with no free parameters will predict the result of that final experiment, the other theory won’t, so the theory without the extra parameter wins the high school test.
What if we do the experiment first, though?
If we do, then we’re in a strange situation. Our “prediction” of the one free parameter is now a “postdiction”. We’ve matched numbers, sure, but by the high school picture we aren’t doing science. Our theory, the same theory that was scientific if history went the other way, is now relegated to metaphysics/religion/mathematics.
I don’t know about you, but I’m uncomfortable with the idea that what is or is not science depends on historical chance. I don’t like the idea that we could be stuck with a theory that doesn’t explain everything, simply because our experimentalists were able to work a bit faster.
My second reason focuses on the here and now. You might think we have nothing like BlackBox on offer, no time travelers taunting us with poorly commented code. But we’ve always had the option of our own Black Box theory: experiment itself.
The Standard Model fixes some of its parameters from experimental results. You do a few experiments, and you can predict the results of all the others. But why stop there? Why not fix all of our parameters with experiments? Why not fix everything with experiments?
That’s the Black Box Theory of Everything. Each individual experiment you could possibly do gets its own parameter, describing the result of that experiment. You do the experiment, fix that parameter, then move on to the next experiment. Your theory will never be falsified, you will never be proven wrong. Sure, you never predict anything either, but that’s just an extreme case of what we have now, where the Standard Model can’t predict the mass of the Higgs.
What’s wrong with the Black Box Theory? (I trust we can all agree that it’s wrong.)
It’s not just that it can’t make predictions. You could make it a Black Box All But One Theory instead, that predicts one experiment and takes every other experiment as input. You could even make a Black Box Except the Standard Model Theory, that predicts everything we can predict now and just leaves out everything we’re still confused by.
The Black Box Theory is wrong because the high school picture of what counts as science is wrong. The high school picture is a useful guide, it’s a good rule of thumb, but it’s not the ultimate definition of science. And especially now, when we’re starting to ask questions about final theories and ultimate parameters, we can’t cling to the high school picture. We have to be willing to actually think, to listen to the philosophers and consider our own motivations, to figure out what, in the end, we actually mean by science.
When I join a new department or institute, the first thing I ask is “do we have a cluster?”
Most of what I do, I do on a computer. Gone are the days when theorists would always do all their work on notepads and chalkboards (though many still do!). Instead, we use specialized computer programs like Mathematica and Maple. Using a program helps keep us from forgetting pesky minus signs, and it allows working with equations far too long to fit on a sheet of paper.
Supercomputers are great, but they’re also expensive. The people who use supercomputers are the ones who model large, complicated systems, like the weather, or supernovae. For most theorists, you still want power, but you don’t need quite that much. That’s where computer clusters come in.
A computer cluster is pretty much what it sounds like: several computers wired together. Different clusters contain different numbers of computers. For example, my department has a ten-node cluster. Sure, that doesn’t stack up to a supercomputer, but it’s still ten times as fast as an ordinary computer, right?
The power of ten computers!
Well, not exactly. As several of my friends have been surprised to learn, the computers on our cluster are actually slower than most of our laptops.
The power of ten old computers!
Still, ten older computers are faster than one new one, yes?
Even then, it depends how you use it.
Run a normal task on a cluster, and it’s just going to run on one of the computers, which, as I’ve said, are slower than a modern laptop. You need to get smarter.
There are two big advantages of clusters: time, and parallelization.
Sometimes, you want to do a calculation that will take a long time. Your computer is going to be busy for a day or two, and that’s inconvenient when you want to do…well, pretty much anything else. A cluster is a space to run those long calculations. You put the calculation on one of the nodes, you go back to doing your work, and you check back in a day or two to see if it’s finished.
Clusters are at their most powerful when you can parallelize. If you need to do ten versions of the same calculation, each slightly different, then rather than doing them one at a time a cluster lets you do them all at once. At that point, it really is making you ten times faster.
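Here’s a minimal sketch of the “ten versions at once” pattern, assuming the cluster runs the SLURM scheduler (many do, but yours may use something else). You’d submit it with something like `sbatch --array=0-9 job.sh`, where `job.sh` is a hypothetical wrapper that runs this script; SLURM then starts ten copies on whatever nodes are free and tells each one its index through the `SLURM_ARRAY_TASK_ID` environment variable. The parameter list and the “calculation”, a truncated series for the dilogarithm Li₂(x), are made-up stand-ins for something slow.

```python
import os

# Hypothetical sweep: ten slightly different versions of the same calculation.
PARAMETERS = [0.1 * (i + 1) for i in range(10)]


def expensive_calculation(x: float, terms: int = 100_000) -> float:
    """Stand-in for a slow job: the truncated series sum_{n>=1} x**n / n**2,
    which approximates the dilogarithm Li2(x) for 0 <= x <= 1."""
    return sum(x ** n / n ** 2 for n in range(1, terms + 1))


def main() -> None:
    # The scheduler starts one copy of this script per array index and sets
    # SLURM_ARRAY_TASK_ID; fall back to 0 when running it by hand.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    x = PARAMETERS[task_id]
    print(f"task {task_id}: x = {x:.1f}, result = {expensive_calculation(x):.6f}")


if __name__ == "__main__":
    main()
```

The design choice is that each copy knows nothing about the others: no communication, no shared memory. That makes it the easiest kind of parallelism to get right, and it’s exactly the “ten slightly different calculations” case where a cluster really does make you ten times faster.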
If you ever program, I’d encourage you to look into the resources you have available. A cluster is a very handy thing to have access to, no matter what you’re doing!