Sometimes I envy astronomers. Particle physicists can write books full of words and pages of colorful graphs and charts, and the public won’t retain any of it. Astronomers can mesmerize the world with a single picture.
Images like this enter the popular imagination. The Hubble telescope’s deep field has appeared on essentially every artistic product one could imagine. As of writing this, searching for “Hubble” on Etsy gives almost 5,000 results. “JWST”, the acronym for the James Webb Space Telescope, already gives over 1,000, including several on the front page that already contain just-released images. Despite the Large Hadron Collider having operated for over a decade, searching “LHC” also leads to just around 1,000 results…and a few on the front page are actually pictures of the JWST!
It would be great as particle physicists to have that kind of impact…but I think we shouldn’t stress ourselves too much about it. Ultimately astronomers will always have this core advantage. Space is amazing, visually stunning and mind-bogglingly vast. It has always had a special place for human cultures, and I’m happy for astronomers to inherit that place.
Back in 2017, I noticed something that should have struck me as a little odd. My sub-field has a big yearly conference, called Amplitudes, that brings in everyone who works on our kind of research. Amplitudes 2017 was fun, but not “fresh”: most people talked about work they had already published. A smaller conference I went to that year, called QCD Meets Gravity, was much “fresher”: a lot of discussion of work in progress and work “hot off the presses”.
At the time, I chalked the difference up to timing: it was a few months later, and people happened to have projects that matured around then. But I realized recently there’s another reason, one why you would expect bigger conferences to have less fresh content.
The bigger a conference is, the longer in advance you need to invite speakers. It’s a bigger task to organize everyone, to make sure travel and hotels and raw availability work, that everyone has time to prepare their talks and you have a nice full (but not too full) schedule. So when we started asking people, we didn’t know what the “freshest” work was going to be. We had recommendations from our scientific committee (a group of experts in the subfield whose job is to suggest speakers), but in practice the goal is more one of breadth than freshness: we needed to make sure that everybody in our community was represented.
A smaller conference can get around this. It can be organized a bit later, so the organizers have more information about new developments. It covers a smaller area, so the organizers have more information about new hot topics and unpublished results. And it typically invites most of the sub-community anyway, so you’re guaranteed to cover the hot new stuff just by raw completeness.
This doesn’t mean small conferences are “just better” or anything like that. Breadth is genuinely useful: a big conference covering a whole subfield is great for bringing a community together, getting everyone on a shared page and expanding their horizons. There’s a real tradeoff between those goals and getting a conference with the latest progress. It’s not a fixed tradeoff, we can improve both goals at once (I think at Amplitudes we as organizers could have been better at highlighting unpublished work), but we still have to make choices of what to emphasize.
Scott Aaronson recently published an interesting exchange on his blog Shtetl-Optimized, between him and cognitive psychologist Steven Pinker. The conversation was about AI: Aaronson is optimistic (though not insanely so), Pinker is pessimistic (again, not insanely though). While fun reading, the whole thing would normally be a bit too off-topic for this blog, except that Aaronson’s argument ended up invoking something I do know a bit about: how we make progress in theoretical physics.
Aaronson was trying to respond to an argument of Pinker’s, that super-intelligence is too vague and broad to be something we could expect an AI to have. Aaronson asks us to imagine an AI that is nothing more or less than a simulation of Einstein’s brain. Such a thing isn’t possible today, and might not even be efficient, but it has the advantage of being something concrete we can all imagine. Aaronson then suggests imagining that AI sped up a thousandfold, so that in one year it covers a thousand years of Einstein’s thought. Such an AI couldn’t solve every problem, of course. But in theoretical physics, surely such an AI could be safely described as super-intelligent: an amazing power that would change the shape of physics as we know it.
I’m not as sure of this as Aaronson is. We don’t have a machine that generates a thousand Einstein-years to test, but we do have one piece of evidence: the 76 Einstein-years the man actually lived.
After that, though…not so much. For Einstein-decades, he tried to work towards a new unified theory of physics, and as far as I’m aware made no useful progress at all. I’ve never seen someone cite work from that period of Einstein’s life.
Aaronson mentions simulating Einstein “at his peak”, and it would be tempting to assume that the unified theory came “after his peak”, when age had weakened his mind. But while that kind of thing can sometimes be an issue for older scientists, I think it’s overstated. I don’t think careers peak early because of “youthful brains”, and with the exception of genuine dementia I don’t think older physicists are that much worse-off cognitively than younger ones. The reason so many prominent older physicists go down unproductive rabbit-holes isn’t because they’re old. It’s because genius isn’t universal.
Einstein made the progress he did because he was the right person to make that progress. He had the right background, the right temperament, and the right interests to take others’ mathematics and take them seriously as physics. As he aged, he built on what he found, and that background in turn enabled him to do more great things. But eventually, the path he walked down simply wasn’t useful anymore. His story ended, driven to a theory that simply wasn’t going to work, because given his experience up to that point that was the work that interested him most.
I think genius in physics is in general like that. It can feel very broad because a good genius picks up new tricks along the way, and grows their capabilities. But throughout, you can see the links: the tools mastered at one age that turn out to be just right for a new pattern. For the greatest geniuses in my field, you can see the “signatures” in their work, hints at why they were just the right genius for one problem or another. Give one a thousand years, and I suspect the well would eventually run dry: the state of knowledge would no longer be suitable for even their breadth.
…of course, none of that really matters for Aaronson’s point.
A century of Einstein-years wouldn’t have found the Standard Model or String Theory, but a century of physicist-years absolutely did. If instead of a simulation of Einstein, your AI was a simulation of a population of scientists, generating new geniuses as the years go by, then the argument works again. Sure, such an AI would be much more expensive, much more difficult to build, but the first one might have been as well. The point of the argument is simply to show such a thing is possible.
The core of Aaronson’s point rests on two key traits of technology. Technology is replicable: once we know how to build something, we can build more of it. Technology is scalable: if we know how to build something, we can try to build a bigger one with more resources. Evolution can tap into both of these, but not reliably: just because it’s possible to build a mind a thousand times better at some task doesn’t mean it will.
That is why the possibility of AI leads to the possibility of super-intelligence. If we can make a computer that can do something, we can make it do that something faster. That something doesn’t have to be “general”, you can have programs that excel at one task or another. For each such task, with more resources you can scale things up: so anything a machine can do now, a later machine can probably do better. Your starting-point doesn’t necessarily even have to be efficient, or a good algorithm: bad algorithms will take longer to scale, but could eventually get there too.
The only question at that point is “how fast?” I don’t have the impression that’s settled. The achievements that got Pinker and Aaronson talking, GPT-3 and DALL-E and so forth, impressed people by their speed, by how soon they got to capabilities we didn’t expect them to have. That doesn’t mean that something we might really call super-intelligence is close: that has to do with the details, with what your target is and how fast you can actually scale. And it certainly doesn’t mean that another approach might not be faster! (As a total outsider, I can’t help but wonder if current ML is in some sense trying to fit a cubic with straight lines.)
It does mean, though, that super-intelligence isn’t inconceivable, or incoherent. It’s just the recognition that technology is a master of brute force, and brute force eventually triumphs. If you want to think about what happens in that “eventually”, that’s a very important thing to keep in mind.
Maybe you care about technology. You support science because, down the line, you think it will give us new capabilities that improve people’s lives. Maybe you expect this to happen directly, or maybe indirectly as “spinoff” inventions like the internet.
Maybe you just think science is cool. You want the stories that science tells: they entertain you, they give you a place in the world, they help distract from the mundane day to day grind.
Maybe you just think that the world ought to have scientists in it. You can think of it as a kind of bargain, maintaining expertise so that society can tackle difficult problems. Or you can be more cynical, paying early-career scientists on the assumption that most will leave academia and cheapen labor costs for tech companies.
Maybe you want to pay the scientists to teach, to be professors at universities. You notice that they don’t seem to be happy if you don’t let them research, so you throw a little research funding at them, as a treat.
Maybe you just want to grow your empire: your department, your university, the job numbers in your district.
In most jobs, you’re supposed to do what people pay you to do. As a scientist, the people who pay you have all of these motivations and more. You can’t simply choose to do what people pay you to do.
So you come up with a proxy. You sum up all of these ideas, into a vague picture of what all those people want. You have some idea of scientific quality: not just a matter of doing science correctly and carefully, but doing interesting science. It’s not something you ever articulate. It’s likely even contradictory: after all, the goals it approximates often are. Nonetheless, it’s your guide, and not just your guide: it’s the guide of those who hire you, those who choose if you get promoted or whether you get more funding. All of these people have some vague idea in their head of what makes good science, their own proxy for the desires of the vast mass of voters and decision-makers and funders.
But of course, the standard is still vague. Should good science be deep? Which topics are deeper than others? Should it be practical? Practical for whom? Should it be surprising? What do you expect to happen, and what would surprise you? Should it get the community excited? Which community?
As a practicing scientist, you have to build your own proxy for these proxies. The same work that could get you hired in one place might meet blank stares at another, and you can’t build your life around those unpredictable quirks. So you make your own vague idea of what you’re supposed to do, an alchemy of what excites you and what makes an impact and what your friends are doing. You build a stand-in in your head, on the expectation that no-one else will have quite the same stand-in, then go out and convince the other stand-ins to give money to your version. You stand on a shifting pile of unwritten rules, subtler even than some artists, because at the end of the day there’s never a real client to be seen. Just another proxy.
If you imagine a particle physicist, you probably picture someone spending their whole day dreaming up new particles. They figure out how to test those particles in some big particle collider, and for a lucky few their particle gets discovered and they get a Nobel prize.
Occasionally, a wiseguy asks if we can’t just cut out the middleman. Instead of dreaming up particles to test, why don’t we just write down every possible particle and test for all of them? It would save the Nobel committee a lot of money at least!
It turns out, you can sort of do this, through something called Effective Field Theory. An Effective Field Theory is a type of particle physics theory that isn’t quite true: instead, it’s “effectively” true, meaning true as long as you don’t push it too far. If you test it at low energies and don’t “zoom in” too much then it’s fine. Crank up your collider energy high enough, though, and you expect the theory to “break down”, revealing new particles. An Effective Field Theory lets you “hide” unknown particles inside new interactions between the particles we already know.
To help you picture how this works, imagine that the pink and blue lines here represent familiar particles like electrons and quarks, while the dotted line is a new particle somebody dreamed up. (The picture is called a Feynman diagram, if you don’t know what that is check out this post.)
In an Effective Field Theory, we “zoom out”, until the diagram looks like this:
Now we’ve “hidden” the new particle. Instead, we have a new type of interaction between the particles we already know.
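As a rough illustration of this “zooming out”, here is the standard low-energy expansion for the exchange of a single heavy particle. The symbols are assumptions introduced for the sketch, not taken from the text above: g is the coupling of the new particle to the known ones, M is its mass, and q is the momentum it carries. Overall signs and factors of i depend on conventions.

```latex
\frac{g^2}{q^2 - M^2}
  = -\frac{g^2}{M^2}\,\frac{1}{1 - q^2/M^2}
  = -\frac{g^2}{M^2}\left(1 + \frac{q^2}{M^2} + \frac{q^4}{M^4} + \cdots\right),
  \qquad |q^2| \ll M^2 .
```

The leading term no longer depends on q at all: it looks like the known particles meeting at a single point, with strength g^2/M^2. The later terms are further new interactions, each suppressed by more powers of the heavy mass, which is why they only become visible as the energy is cranked up.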
So instead of writing down every possible new particle we can imagine, we only have to write down every possible interaction between the particles we already know.
Using these rules you can play a kind of game. You start out with a space representing all of the interactions you can imagine. You begin chipping at it, carving away parts that don’t obey the rules, and you see what shape is left over. You end up with plots that look a bit like carving a ham.
People in my subfield are getting good at this kind of game. It isn’t quite our standard fare: usually, we come up with tricks to make calculations with specific theories easier. Instead, many groups are starting to look at these general, effective theories. We’ve made friends with groups in related fields, building new collaborations. There still isn’t one clear best way to do this carving, so each group manages to find a way to chip a little farther. Out of the block of every theory we could imagine, we’re carving out a space of theories that make sense, theories that could conceivably be right. Theories that are worth testing.
Today, we’d call Leibniz a mathematician, a physicist, and a philosopher. As a mathematician, Leibniz turned calculus into something his contemporaries could actually use. As a physicist, he championed a doomed theory of gravity. In philosophy, he seems to be most remembered for extremely cheaty arguments.
I don’t blame him for this. Faced with a tricky philosophical problem, it’s enormously tempting to just blaze through with an answer that makes every subtlety irrelevant. It’s a temptation I’ve succumbed to time and time again. Faced with a genie, I would always wish for more wishes. On my high school debate team, I once forced everyone at a tournament to switch sides with some sneaky definitions. It’s all good fun, but people usually end up pretty annoyed with you afterwards.
People were annoyed with Leibniz too, especially with his solution to the problem of evil. If you believe in a benevolent, all-powerful god, as Leibniz did, why is the world full of suffering and misery? Leibniz’s answer was that even an all-powerful god is constrained by logic, so if the world contains evil, it must be logically impossible to make the world any better: indeed, we live in the best of all possible worlds. Voltaire famously made fun of this argument in Candide, dragging a Leibniz-esque Professor Pangloss through some of the most creative miseries the eighteenth century had to offer. It’s possibly the most famous satire of a philosopher, easily beating out Aristophanes’ The Clouds (which is also great).
I called Leibniz’s argument “cheaty”, and you might presume I think the same of the multiverse. But “cheaty” doesn’t mean “wrong”. It all depends what you’re trying to do.
Leibniz’s argument and the multiverse both work by dodging a problem. For Leibniz, the problem of evil becomes pointless: any evil might be necessary to secure a greater good. With a multiverse, naturalness becomes pointless: with many different laws of physics in different places, the existence of one like ours needs no explanation.
In both cases, though, the dodge isn’t perfect. To really explain any given evil, Leibniz would have to show why it is secretly necessary in the face of a greater good (and Pangloss spends Candide trying to do exactly that). To explain any given law of physics, the multiverse needs to use anthropic reasoning: it needs to show that that law needs to be the way it is to support human-like life.
This sounds like a strict requirement, but in both cases it’s not actually so useful. Leibniz could (and Pangloss does) come up with an explanation for pretty much anything. The problem is that no-one actually knows which aspects of the universe are essential and which aren’t. Without a reliable way to describe the best of all possible worlds, we can’t actually test whether our world is one.
The same problem holds for anthropic reasoning. We don’t actually know what conditions are required to give rise to people like us. “People like us” is very vague, and dramatically different universes might still contain something that can perceive and observe. While it might seem that there are clear requirements, so far there hasn’t been enough for people to do very much with this type of reasoning.
However, for both Leibniz and most of the physicists who believe anthropic arguments, none of this really matters. That’s because the “best of all possible worlds” and “most anthropic of all possible worlds” aren’t really meant to be predictive theories. They’re meant to say that, once you are convinced of certain things, certain problems don’t matter anymore.
Leibniz, in particular, wasn’t trying to argue for the existence of his god. He began the argument convinced that a particular sort of god existed: one that was all-powerful and benevolent, and set in motion a deterministic universe bound by logic. His argument is meant to show that, if you believe in such a god, then the problem of evil can be ignored: no matter how bad the universe seems, it may still be the best possible world.
So despite their cheaty feel, both arguments are fine…provided you agree with their assumptions. Personally, I don’t agree with Leibniz. For the multiverse, I’m less sure. I’m not confident the universe expands fast enough to create a multiverse, I’m not even confident it’s speeding up its expansion now. I know there’s a lot of controversy about the math behind the string theory landscape, about whether the vast set of possible laws of physics are as consistent as they’re supposed to be…and of course, as anyone must admit, we don’t know whether string theory itself is true! I don’t think it’s impossible that the right argument comes around and convinces me of one or both claims, though. These kinds of arguments, “if assumptions, then conclusion” are the kind of thing that seems useless for a while…until someone convinces you of the conclusion, and they matter once again.
So in the end, despite the similarity, I’m not sure the multiverse deserves its own Candide. I’m not even sure Leibniz deserved Candide. But hopefully by understanding one, you can understand the other just a bit better.
I saw a discussion on Twitter recently, about PhD programs in the US. Apparently universities are putting more and more weight on whether prospective students published a paper during their Bachelor’s degree. For some, it’s even an informal requirement. Some of those in the discussion were skeptical that the students were really contributing to these papers much, and thought that most of the work must have been done by the papers’ other authors. If so, this would mean universities are relying more and more on a metric that depends on whether students can charm their professors enough to be “included” in this way, rather than their own abilities.
I won’t say all that much about the admissions situation in the US. (Except to say that if you find yourself making up new criteria to carefully sift out a few from a group of already qualified-enough candidates, maybe you should consider not doing that.) What I did want to say a bit about is what undergraduates can typically actually do, when it comes to research in my field.
First, I should clarify that I’m talking about students in the US system here. Undergraduate degrees in Europe follow a different path. Students typically take three years to get a Bachelor’s degree, often with a project at the end, followed by a two-year Master’s degree capped with a Master’s thesis. A European Master’s thesis doesn’t have to result in a paper, but is often at least on that level, while a European Bachelor project typically isn’t. US Bachelor’s degrees are four years, so one might expect a Bachelor’s thesis to be in between a European Bachelor’s project and Master’s thesis. In practice, it’s a bit different: courses for Master’s students in Europe will generally cover material taught to PhD students in the US, so a typical US Bachelor’s student won’t have had some courses that have a big role in research in my field, like Quantum Field Theory. On the other hand, the US system is generally much more flexible, with students choosing more of their courses and having more opportunities to advance ahead of the default path. So while US Bachelor’s students don’t typically take Quantum Field Theory, the more advanced students can and do.
Because of that, how advanced a given US Bachelor’s student is varies. A small number are almost already PhD students, and do research to match. Most aren’t, though. Despite that, it’s still possible for such a student to complete a real research project in theoretical physics, one that results in a real paper. What does that look like?
Sometimes, it’s because the student is working with a toy model. The problems we care about in theoretical physics can be big and messy, involving a lot of details that only an experienced researcher will know. If we’re lucky, we can make a simpler version of the problem, one that’s easier to work with. Toy models like this are often self-contained, the kind of thing a student can learn without all of the background we expect. The models may be simpler than the real world, but they can still be interesting, suggesting new behavior that hadn’t been considered before. As such, with a good choice of toy model an undergraduate can write something that’s worthy of a real physics paper.
Other times, the student is doing something concrete in a bigger collaboration. This isn’t quite the same as the “real scientists” doing all the work, because the student has a real task to do, just one that is limited in scope. Maybe there is particular computer code they need to get working, or a particular numerical calculation they need to do. The calculation may be comparatively straightforward, but in combination with other results it can still merit a paper. My first project as a PhD student was a little like that, tackling one part of a larger calculation. Once again, the task can be quite self-contained, the kind of thing you can teach a student over a summer project.
Undergraduate projects in the US won’t always result in a paper, and I don’t think anyone should expect, or demand, that they do. But a nontrivial number do, and not because the student is “cheating”. With luck, a good toy model or a well-defined sub-problem can lead a Bachelor’s student to make a real contribution to physics, and get a paper in the bargain.
I’ve tried to convince you that you are a particle detector. You choose your experiment, what actions you take, and then observe the outcome. If you focus on that view of yourself, data out and data in, you start to wonder if the world outside really has any meaning. Maybe you’re just trapped in the Matrix.
From a physics perspective, you actually are trapped in a sort of a Matrix. We call it the S Matrix.
“S” stands for scattering. The S Matrix is a formula we use, a mathematical tool that tells us what happens when fundamental particles scatter: when they fly towards each other, colliding or bouncing off. For each action we could take, the S Matrix gives the probability of each outcome: for each pair of particles we collide, the chance we detect different particles at the end. You can imagine putting every possible action in a giant vector, and every possible observation in another giant vector. Arrange the probabilities for each action-observation pair in a big square grid, and that’s a matrix.
Actually, I lied a little bit. This is particle physics, and particle physics uses quantum mechanics. Because of that, the entries of the S Matrix aren’t probabilities: they’re complex numbers called probability amplitudes. You have to multiply them by their complex conjugate to get probability out.
Ok, that probably seemed like a lot of detail. Why am I telling you all this?
What happens when you multiply the whole S Matrix by its complex conjugate? (Using matrix multiplication, naturally.) You can still pick your action, but now you’re adding up every possible outcome. You’re asking “suppose I take an action. What’s the chance that anything happens at all?”
The answer to that question is 1. There is a 100% chance that something happens, no matter what you do. That’s just how probability works.
We call this property unitarity, the property of giving “unity”, or one. And while it may seem obvious, it isn’t always so easy. That’s because we don’t actually know the S Matrix formula most of the time. We have to approximate it, a partial formula that only works for some situations. And unitarity can tell us how much we can trust that formula.
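To make the bookkeeping concrete, here is a minimal sketch in plain Python. The two-by-two matrix and the angle `theta` are made up for illustration and do not describe any real scattering process. Rows label outcomes and columns label actions. For a matrix, “multiplying by its complex conjugate” means multiplying by the conjugate transpose, so that is what the sketch computes.

```python
import math

def conjugate_transpose(m):
    """Swap rows and columns, and complex-conjugate every entry."""
    return [[m[r][c].conjugate() for r in range(len(m))]
            for c in range(len(m[0]))]

def matmul(a, b):
    """Ordinary matrix multiplication for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical toy S Matrix: two possible actions, two possible outcomes.
theta = 0.3  # arbitrary illustrative angle
S = [[complex(math.cos(theta)), 1j * math.sin(theta)],
     [1j * math.sin(theta), complex(math.cos(theta))]]

# Each entry is a probability amplitude; its squared magnitude
# (the entry times its complex conjugate) is a probability.
probabilities = [[abs(entry) ** 2 for entry in row] for row in S]

# Chance that *anything* happens, for each action (each column).
totals = [sum(probabilities[i][j] for i in range(2)) for j in range(2)]

# Unitarity: conjugate transpose times S is the identity matrix,
# whose diagonal entries are exactly those totals.
S_dagger_S = matmul(conjugate_transpose(S), S)

print(totals)
print(S_dagger_S)
```

The diagonal of `S_dagger_S` reproduces the totals, which is the “100% chance that something happens” statement, one action at a time.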
Imagine doing an experiment trying to detect neutrinos, like the IceCube Neutrino Observatory. For you to detect the neutrinos, they must scatter off of electrons, kicking them off of their atoms or transforming them into another charged particle. You can then notice what happens as the energy of the neutrinos increases. If you do that, you’ll notice the probability also start to increase: it gets more and more likely that the neutrino can scatter an electron. You might propose a formula for this, one that grows with energy. [EDIT: Example changed after a commenter pointed out an issue with it.]
If you keep increasing the energy, though, you run into a problem. Those probabilities you predict are going to keep increasing. Eventually, you’ll predict a probability greater than one.
That tells you that your theory might have been fine before, but doesn’t work for every situation. There’s something you don’t know about, which will change your formula when the energy gets high. You’ve violated unitarity, and you need to fix your theory.
In this case, the fix is already known. Neutrinos and electrons interact due to another particle, called the W boson. If you include that particle, then you fix the problem: your probabilities stop going up and up. Instead, they start slowing down, and stay below one.
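The pattern can be sketched with a toy model in Python. This is not the real neutrino-electron calculation: the functional forms, the `coupling` value, and the `heavy_mass` value (loosely inspired by the W boson’s mass of roughly 80 GeV) are assumptions chosen only to show a formula that grows without bound next to one that levels off once the heavy particle is included.

```python
def naive_probability(energy, coupling=0.5, heavy_mass=80.0):
    """Toy low-energy formula: the amplitude grows like energy squared."""
    amplitude = coupling * (energy / heavy_mass) ** 2
    return amplitude ** 2

def completed_probability(energy, coupling=0.5, heavy_mass=80.0):
    """Toy formula including the heavy particle: the amplitude levels off."""
    amplitude = coupling * energy ** 2 / (energy ** 2 + heavy_mass ** 2)
    return amplitude ** 2

# Hypothetical energies, in the same (arbitrary) units as heavy_mass.
energies = [10.0, 50.0, 100.0, 200.0, 1000.0]

for energy in energies:
    naive = naive_probability(energy)
    completed = completed_probability(energy)
    flag = "  <-- violates unitarity" if naive > 1.0 else ""
    print(f"E = {energy:7.1f}   naive = {naive:12.5f}   "
          f"completed = {completed:8.5f}{flag}")
```

At low energies the two toy formulas nearly agree, which is why the simpler one can look fine for a long time. Only once the energy approaches the heavy mass does the naive one climb past one, signalling that something is missing.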
For other theories, we don’t yet know the fix. Try to write down an S Matrix for colliding gravitational waves (or really, gravitons), and you meet the same kind of problem, a probability that just keeps growing. Currently, we don’t know how that problem should be solved: string theory is one answer, but may not be the only one.
So even if you’re trapped in an S Matrix, sending data out and data in, you can still use logic. You can still demand that probability makes sense, that your matrix never gives a chance greater than 100%. And you can learn something about physics when you do!
The Niels Bohr Institute is hosting a conference this week on New Ideas in Cosmology. I’m no cosmologist, but it’s a pretty cool field, so as a local I’ve been sitting in on some of the talks. So far they’ve had a selection of really interesting speakers with quite a variety of interests, including a talk by Roger Penrose with his trademark hand-stippled drawings.
One thing that has impressed me has been the “interdisciplinary” feel of the conference. By all rights this should be one “discipline”, cosmology. But in practice, each speaker came at the subject from a different direction. They all had a shared core of knowledge, common models of the universe they all compare to. But the knowledge they brought to the subject varied: some had deep knowledge of the mathematics of gravity, others worked with string theory, or particle physics, or numerical simulations. Each talk, aware of the varied audience, was a bit “colloquium-style”, introducing a framework before diving in to the latest research. Each speaker knew enough to talk to the others, but not so much that they couldn’t learn from them. It’s been unexpectedly refreshing, a real interdisciplinary conference done right.
I’m at a conference this week of a very particular type: a birthday conference. When folks in my field turn 60, their students and friends organize a special conference for them, celebrating their research legacy. With COVID restrictions just loosening, my advisor Michael Douglas is getting a last-minute conference. And as one of the last couple students he graduated at Stony Brook, I naturally showed up.
The conference, Mikefest, is at the Institut des Hautes Études Scientifiques, just outside of Paris. Mike was a big supporter of the IHES, putting in a lot of fundraising work for them. Another big supporter, James Simons, was Mike’s employer for a little while after his time at Stony Brook. The conference center we’re meeting in is named for him.
I wasn’t involved in organizing the conference, so it was interesting seeing differences between this and other birthday conferences. Other conferences focus on the birthday prof’s “family tree”: their advisor, their students, and some of their postdocs. We’ve had several talks from Mike’s postdocs, and one from his advisor, but only one from a student. Including him and me, three of Mike’s students are here: another two have had their work mentioned but aren’t speaking or attending.
Most of the speakers have collaborated with Mike, but only for a few papers each. All of them emphasized a broader debt though, for discussions and inspiration outside of direct collaboration. The message, again and again, is that Mike’s work has been broad enough to touch a wide range of people. He’s worked on branes and the landscape of different string theory universes, pure mathematics and computation, neuroscience and recently even machine learning. The talks generally begin with a few anecdotes about Mike, before pivoting into research talks on the speakers’ recent work. The recent-ness of the work is perhaps another difference from some birthday conferences: as one speaker said, this wasn’t just a celebration of Mike’s past, but a “welcome back” after his return from the finance world.
One thing I don’t know is how much this conference might have been limited by coming together on short notice. For other birthday conferences impacted by COVID (and I’m thinking of one in particular), it might be nice to have enough time to have most of the birthday prof’s friends and “academic family” there in person. As-is, though, Mike seems to be having fun regardless.