I’m traveling this week, so this will just be a short post. This isn’t a scientific trip exactly: I’m in Poland, at an event connected to the 550th anniversary of the birth of Copernicus.
Part of this event involved visiting the Copernicus Science Center, the local children’s science museum. The place was sold out completely. For any tired science communicators, I recommend going to a sold-out science museum: the sheer enthusiasm you’ll find there is balm for the most jaded soul.
For those who have been following these developments, things don’t feel quite so sudden. Already in 2019, AI Dungeon showed off how an early version of GPT could be used to mimic an old-school text-adventure game, and a tumblr blogger built a bot that imitates his posts as a fun side project. Still, the newer programs have shown some impressive capabilities.
Are we close to “real AI”, to artificial minds like the positronic brains in Isaac Asimov’s I, Robot? I can’t say, in part because I’m not sure what “real AI” really means. But if you want to understand where things like ChatGPT come from, how they work and why they can do what they do, then all the talk of AI won’t be helpful. Instead, you need to think of an entirely different set of Asimov novels: the Foundation series.
While Asimov’s more famous I, Robot focused on the science of artificial minds, the Foundation series is based on a different fictional science, the science of psychohistory. In the stories, psychohistory is a kind of futuristic social science. In the real world, historians and sociologists can find general principles of how people act, but don’t yet have the kind of predictive theories physicists or chemists do. Foundation imagines a future where powerful statistical methods have allowed psychohistorians to precisely predict human behavior: not yet that of individual people, but at least the average behavior of civilizations. They can not only guess when an empire is soon to fall, but calculate how long it will be before another empire rises, something few responsible social scientists would pretend to do today.
GPT and similar programs aren’t built to predict the course of history, but they do predict something: given part of a text, they try to predict the rest. They’re called Large Language Models, or LLMs for short. They’re “models” in the sense of mathematical models, formulas that let us use data to make predictions about the world, and the part of the world they model is our use of language.
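To make "predict the rest of the text" concrete, here's a toy sketch in Python. The corpus and the table-of-counts "model" are my own invented miniature, nothing like a real LLM, which replaces the table with a giant flexible formula, but the job is the same: given text so far, score plausible continuations.

```python
from collections import Counter, defaultdict

# A miniature "language model": count, in a pile of text, which word tends
# to follow which, then predict a plausible next word.
corpus = "the cat sat on the mat the cat ate the fish".split()

following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

# The most plausible continuation after "the":
prediction = following["the"].most_common(1)[0][0]
print(prediction)  # "cat": it followed "the" twice, "mat" and "fish" once each
```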
Normally, a mathematical model is designed based on how we think the real world works. A mathematical model of a pandemic, for example, might use a list of people, each one labeled as infected or not. It could include an unknown number, called a parameter, for the chance that one person infects another. That parameter would then be filled in, or fixed, based on observations of the pandemic in the real world.
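Fixing a parameter from observations can be as simple as this sketch (the numbers are hypothetical, purely to illustrate the idea):

```python
# A designed model: we decide up front that each contact between an infected
# and a healthy person has some fixed chance p of passing on the infection.
# The observations below are invented for illustration.
observed_contacts = 1000   # contacts between infected and healthy people
observed_infections = 150  # how many of those contacts spread the infection

# The natural estimate fixes the parameter from the data:
p = observed_infections / observed_contacts
print(p)  # 0.15
```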
LLMs (as well as most of the rest of what people call “AI” these days) are a bit different. Their models aren’t based on what we expect about the real world. Instead, they’re in some sense “generic”, models that could in principle describe just about anything. In order to make this work, they have a lot more parameters, tons and tons of flexible numbers that can get fixed in different ways based on data.
The surprising thing is that this works, and works remarkably well. Just as psychohistory from the Foundation novels can predict events with much more detail than today’s historians and sociologists, LLMs can predict what a text will look like much more precisely than today’s literature professors. That isn’t necessarily because LLMs are “intelligent”, or because they’re “copying” things people have written. It’s because they’re mathematical models, built by statistically analyzing a giant pile of texts.
Just as Asimov’s psychohistory can’t predict the behavior of individual people, LLMs can’t predict the behavior of individual texts. If you start writing something, you shouldn’t expect an LLM to predict exactly how you would finish. Instead, LLMs predict what, on average, the rest of the text would look like. They give a plausible answer, one of many, for what might come next.
They can’t do that perfectly, but doing it imperfectly is enough to do quite a lot. It’s why they can be used to make chatbots, by predicting how someone might plausibly respond in a conversation. It’s why they can write fiction, or ads, or college essays, by predicting a plausible response to a book jacket or ad copy or essay prompt.
LLMs like GPT were invented by computer scientists, not social scientists or literature professors. Because of that, they get described as part of progress towards artificial intelligence, not as progress in social science. But if you want to understand what ChatGPT is right now, and how it works, then that perspective won’t be helpful. You need to put down your copy of I, Robot and pick up Foundation. You’ll still be impressed, but you’ll have a clearer idea of what could come next.
Since Valentine’s Day was this week, it’s time for the next installment of my traditional Valentine’s Day Physics Poems. New readers, don’t let this drive you off, I only do it once a year! And if you actually like it, you can take a look at poems from previous years here.
Married to a Model
If you ever face a physics class distracted,
Rappers and footballers twinkling on their phones,
Then like an awkward youth pastor, interject,
“You know who else is married to a Model?”
Her name is Standard, you see,
Wife of fifty years to Old Man Physics,
Known for her beauty, charm, and strangeness too.
But Old Man Physics has a wandering eye,
and dreams of Models Beyond.
Let the old man bend your ear,
a litany of Problems.
He’ll never understand her, so he starts.
Some matters she holds weighty, some feather-light
with nary rhyme or reason
(which he is owed, he’s sure).
She’s unnatural, he says,
(echoing Higgins et al.),
a set of rules he can’t predict.
(But with those rules, all else is possible.)
Some regularities she holds to fast, despite room for exception,
others breaks, like an ill-lucked bathroom mirror.
And then, he says, she’ll just blow up
(when taken to extremes),
while singing nonsense in the face of Gravity.
He’s been keeping a careful eye
and noticing anomalies
(and each time, confronting them,
finds an innocent explanation,
but no matter).
And he imagines others
with yet wilder curves
and more sensitive reactions
(and nonsense, of course,
that he’s lived fifty years without).
Old Man Physics talks,
But beyond the talk,
beyond the phases and phrases,
(conscious uncoupling, non-empirical science),
he stays by her side.
He knows Truth,
in this world,
is worth fighting for.
You can think of a quantum particle like a coin frozen in mid-air. Once measured, the coin falls, and you read it as heads or tails, but before then the coin is neither, with equal chance to be one or the other. In this metaphor, quantum entanglement slices the coin in half. Slice a coin in half on a table, and its halves will either both show heads, or both tails. Slice our “frozen coin” in mid-air, and it keeps this property: the halves, both still “frozen”, can later be measured as both heads, or both tails. Even if you separate them, the outcomes never become independent: you will never find one half-coin to land on tails, and the other on heads.
Einstein thought that this couldn’t be the whole story. He was bothered by the way that measuring a “frozen” coin seems to change its behavior faster than light, screwing up his theory of special relativity. Entanglement, with its ability to separate halves of a coin as far as you liked, just made the problem worse. He thought that there must be a deeper theory, one with “hidden variables” that determined whether the halves would be heads or tails before they were separated.
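In code, the perfect correlation is almost trivially easy to mimic — and notice that this toy version is exactly a hidden-variable theory of the kind Einstein wanted, since the outcome is settled the moment the coin is split:

```python
import random

# Toy "entangled coin": both halves carry one shared, predetermined outcome.
# This is a hidden-variable model: the answer exists before measurement.
def measure_split_coin():
    shared_outcome = random.choice(["heads", "tails"])
    return shared_outcome, shared_outcome

for _ in range(1000):
    left, right = measure_split_coin()
    assert left == right  # never one heads and one tails
```

The catch is that once you allow measurements at different angles, no predetermined model like this can reproduce everything quantum mechanics predicts.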
Einstein’s hidden variables weren’t just philosophy: in the 1960s, John Bell showed that any such theory makes predictions that differ from quantum mechanics, in the form of inequalities that correlations between distant measurements must obey. Bell’s inequalities were just theory, though, until this year’s Nobelists arrived to test them. Clauser was first: in the 70’s, he proposed a variant of Bell’s inequalities, then tested them by measuring members of a pair of entangled photons in two different places. He found complete agreement with quantum mechanics.
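The variant Clauser helped propose is known as the CHSH inequality (for Clauser, Horne, Shimony, and Holt). Here’s a sketch of the arithmetic, using the textbook quantum prediction E(a, b) = −cos(a − b) for the correlation between spin measurements on a singlet pair (an assumption of this sketch; photon experiments use a doubled-angle version of the same formula):

```python
import math

# Quantum-mechanical correlation between measurements at angles a and b
# on a spin-singlet pair.
def correlation(a, b):
    return -math.cos(a - b)

# Measurement settings chosen to maximize the quantum violation.
a1, a2 = 0.0, math.pi / 2
b1, b2 = math.pi / 4, 3 * math.pi / 4

# CHSH combination: any local hidden-variable theory keeps |S| <= 2.
S = (correlation(a1, b1) - correlation(a1, b2)
     + correlation(a2, b1) + correlation(a2, b2))

print(abs(S))  # 2*sqrt(2), about 2.83: above the hidden-variable bound of 2
```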
Still, there was a loophole left for Einstein’s idea. If the settings on the two measurement devices could influence the pair of photons when they were first entangled, that would allow hidden variables to influence the outcome in a way that avoided Bell and Clauser’s calculations. It was Aspect, in the 80’s, who closed this loophole: by doing experiments fast enough to change the measurement settings after the photons were entangled, he could show that the settings could not possibly influence the forming of the entangled pair.
Aspect’s experiments, in many minds, were the end of the story. They were the ones emphasized in the textbooks when I studied quantum mechanics in school.
The remaining loopholes are trickier. Some hope for a way to correlate the behavior of particles and measurement devices that doesn’t run afoul of Aspect’s experiment. This idea, called superdeterminism, has recently had a few passionate advocates, but speaking personally I’m still confused as to how it’s supposed to work. Others want to jettison special relativity altogether. This would not only involve measurements influencing each other faster than light, but would also break a kind of symmetry present in the experiments, because it would declare one measurement or the other to have happened “first”, something special relativity forbids. The majority, uncomfortable with either approach, thinks that quantum mechanics is complete, with no deterministic theory that can replace it. They differ only on how to describe, or interpret, the theory, a debate more the domain of careful philosophy than of physics.
After all of these philosophical debates over the nature of reality, you may ask: what can quantum entanglement actually do for you?
For overambitious apes like us, adding integers is the easiest thing in the world. Take one berry, add another, and you have two. Each remains separate, you can lay them in a row and count them one by one, each distinct thing adding up to a group of distinct things.
Other things in math are less like berries. Add two real numbers, like pi and the square root of two, and you get another real number, bigger than the first two, something you can write in an infinite messy decimal. You know in principle you can separate it out again (subtract pi, get the square root of two), but you can’t just stare at it and see the parts. This is less like adding berries, and more like adding fluids. Pour some water into some other water, and you certainly have more water. You don’t have “two waters”, though, and you can’t tell which part started as which.
Some things in math look like berries, but are really like fluids. Take a polynomial, say 5x^2 + 6x + 8. It looks like three types of things, like three berries: five x^2, six x, and eight 1. Add another polynomial, and the illusion continues: add x^2 + x + 1 and you get 6x^2 + 7x + 9. You’ve just added more x^2, more x, more 1, like adding more strawberries, blueberries, and raspberries.
But those berries were a choice you made, and not the only one. You can rewrite that first polynomial, for example saying 5x^2 + 6x + 8 = 5(x+1)^2 - 4(x+1) + 7. That’s the same thing, you can check. But now it looks like five (x+1)^2, negative four (x+1), and seven 1. It’s different numbers of different things, blackberries or gooseberries or something. And you can do this in many ways, infinitely many in fact. The polynomial isn’t really a collection of berries, for all it looked like one. It’s much more like a fluid, a big sloshing mess you can pour into buckets of different sizes. (Technically, it’s a vector space. Your berries were a basis.)
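The berry swap can be checked mechanically. A sketch in Python, assuming the polynomial in question is 5x^2 + 6x + 8 (consistent with the coefficients five, six, and eight above):

```python
# Coefficients of p(x) = 5x^2 + 6x + 8 in the "berry" basis {x^2, x, 1}.
a, b, c = 5, 6, 8

# Rewrite p in the basis {(x+1)^2, (x+1), 1}: expanding
#   A(x+1)^2 + B(x+1) + C = A*x^2 + (2A + B)*x + (A + B + C)
# and matching coefficients gives:
A = a
B = b - 2 * A
C = c - A - B
print(A, B, C)  # 5 -4 7

# Sanity check: both "baskets" hold the same polynomial.
x = 3.0
assert a * x**2 + b * x + c == A * (x + 1)**2 + B * (x + 1) + C
```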
Even smart, advanced students can get tripped up on this. You can be used to treating polynomials as a fluid, and forget that directions in space are a fluid, one you can rotate as you please. If you’re used to directions in space, you’ll get tripped up by something else. You’ll find that types of particles can be more fluid than berry, the question of which quark is which not as simple as how many strawberries and blueberries you have. The laws of physics themselves are much more like a fluid, which should make sense if you take a moment, because they are made of equations, and equations are like a fluid.
So my fellow overambitious apes, do be careful. Not many things are like berries in the end. A whole lot are like fluids.
Monday is Valentine’s Day, so I’m following my yearly tradition and posting a poem about love and physics. If you like it, be sure to check out my poems from past years here.
A physicist once dreamed
of a life like a crystal.
Each facet the same, again and again,
until the end of time.
This is, of course, impossible.
A physicist once dreamed
of a life like a crystal.
Each facet the same, again and again,
with reliable effort
(what the young physicists call work).
This, (you might say of course,) is possible.
It means more than you’d think.
A thing we model as a spring
(or: anyone and anything)
has a restoring force:
a force to pull it back
a force to keep it going.
A thing we model as a spring
(yes you and me and everything)
has a damping force, too:
this slows it down
and tires it out.
The dismal law
of finite life.
The driving force is another thing
no mere possession of the spring.
The driving force comes from
o u t s i d e
and breaks the rules.
Your rude “of course”:
a sign you guess
a simple resolution.
That outside helpmeet,
will be used up,
fueling that crystal life.
That was the discovery.
No net drain,
but back and forth,
each feeding the other.
With this alone
(and only this)
the system breaks the dismal law
and lives forever.
(As a child, did you ever sing,
of giving away, and giving away,
and only having more?)
A physicist dreamed,
of a life like a crystal.
Collaboration made it real.
There’s something endlessly fascinating about the early days of quantum physics. In a century, we went from a few odd, inexplicable experiments to a practically complete understanding of the fundamental constituents of matter. Along the way the new ideas ended a world war, almost fueled another, and touched almost every field of inquiry. The people lucky enough to be part of this went from familiarly dorky grad students to architects of a new reality. Victor Weisskopf was one of those people, and The Joy of Insight: Passions of a Physicist is his autobiography.
Less well-known today than his contemporaries, Weisskopf made up for it with a front-row seat to basically everything that happened in particle physics. In the late 20’s and early 30’s he went from studying in Göttingen (including a crush on Maria Göppert before a car-owning Joe Mayer snatched her up) to a series of postdoctoral positions that would exhaust even a modern-day physicist, working in Leipzig, Berlin, Copenhagen, Cambridge, Zurich, and Copenhagen again, before fleeing Europe for a faculty position in Rochester, New York. During that time he worked for, studied under, collaborated or partied with basically everyone you might have heard of from that period. As a result, this section of the autobiography was my favorite, chock-full of stories, from the well-known (Pauli’s rudeness and mythical tendency to break experimental equipment) to the less well-known (a lab in Milan planned to prank Pauli with a door that would trigger a fake explosion when opened, which worked every time they tested it…and failed when Pauli showed up), to the more personal (including an in-retrospect terrifying visit to the Soviet Union, where they asked him to critique a farming collective!). That era also saw his “almost Nobel”, in his case almost discovering the Lamb Shift.
Despite an “almost Nobel”, Weisskopf was paid pretty poorly when he arrived in Rochester. His story there puts something I’d learned before about another refugee physicist, Hertha Sponer, in a new light. Sponer’s university also didn’t treat her well, and it seemed reminiscent of modern academia. Weisskopf, though, thinks his treatment was tied to his refugee status: that, aware that they had nowhere else to go, universities gave the scientists who fled Europe worse deals than they would have in a Nazi-less world, snapping up talent for cheap. I could imagine this was true for Sponer as well.
Like almost everyone with the relevant expertise, Weisskopf was swept up in the Manhattan project at Los Alamos. There he rose in importance, both in the scientific effort (becoming deputy leader of the theoretical division) and the local community (spending some time on and chairing the project’s “town council”). Like the first sections, this surreal time leads to a wealth of anecdotes, all fascinating. In his descriptions of the life there I can see the beginnings of the kinds of “hiking retreats” physicists would build in later years, like the one at Aspen, that almost seem like attempts to recreate that kind of intense collaboration in an isolated natural place.
After the war, Weisskopf worked at MIT before a stint as director of CERN. He shepherded the facility’s early days, when they were building their first accelerators and deciding what kinds of experiments to pursue. I’d always thought that the “nuclear” in CERN’s name was an artifact of the times, when “nuclear” and “particle” physics were thought of as the same field, but according to Weisskopf the fields were separate and it was already a misnomer when the place was founded. Here the book’s supply of anecdotes becomes a bit thinner, and instead he spends pages on glowing descriptions of people he befriended. The pattern continues after the directorship as his duties get more administrative, spending time as head of the physics department at MIT and working on arms control, some of the latter while a member of the Pontifical Academy of Sciences (which apparently even a Jewish atheist can join). He does work on some science, though, collaborating on the “bag of quarks” model of protons and neutrons. He lives to see the fall of the Berlin Wall, and the end of the book has a bit of 90’s optimism to it, the feeling that finally the conflicts of his life would be resolved. Finally, the last chapter abandons chronology altogether, and is mostly a list of his opinions of famous composers, capped off with a Bohr-inspired musing on the complementary nature of science and the arts, humanities, and religion.
One of the things I found most interesting in this book was actually something that went unsaid. Weisskopf’s most famous student was Murray Gell-Mann, a key player in the development of the theory of quarks (including coining the name). Gell-Mann was famously cultured (in contrast to the boorish-almost-as-affectation Feynman) with wide interests in the humanities, and he seems like exactly the sort of person Weisskopf would have gotten along with. Surprisingly though, he gets no anecdotes in this book, and no glowing descriptions: just a few paragraphs, mostly emphasizing how smart he was. I have to wonder if there was some coldness between them. Maybe Weisskopf had difficulty with a student who became so famous in his own right, or maybe they just never connected. Maybe Weisskopf was just trying to be generous: the other anecdotes in that part of the book are of much less famous people, and maybe Weisskopf wanted to prioritize promoting them, feeling that they were underappreciated.
Weisskopf keeps the physics light to try to reach a broad audience. This means he opts for short explanations, and often these are whatever is easiest to reach for. It creates some interesting contradictions: the way he describes his “almost Nobel” work in quantum electrodynamics is very much the way someone would have described it at the time, but very much not how it would be understood later, and by the time he talks about the bag of quarks model his more modern descriptions don’t cleanly link with what he said earlier. Overall, his goal isn’t really to explain the physics, but to explain the physicists. I enjoyed the book for that: people do it far too rarely, and the result was a really fun read.
Ask a doctor or a psychologist if they’re sure about something, and they might say “it has p<0.05”. Ask a physicist, and they’ll say it’s a “5 sigma result”. On the surface, they sound like they’re talking about completely different things. As it turns out, they’re not quite that different.
Whether it’s a p-value or a sigma, what scientists are giving you is shorthand for a probability. The p-value is the probability itself, while sigma tells you how many standard deviations something is away from the mean on a normal distribution. For people not used to statistics this might sound very complicated, but it’s not so tricky in the end. There’s a graph, called a normal distribution, and you can look at how much of it is above a certain point, measured in units called standard deviations, or “sigmas”. That gives you your probability.
What are these numbers a probability of? At first, you might think they’re a probability of the scientist being right: of the medicine working, or the Higgs boson being there.
That would be reasonable, but it’s not how it works. Scientists can’t measure the chance they’re right. All they can do is compare models. When a scientist reports a p-value, what they’re doing is comparing to a kind of default model, called a “null hypothesis”. There are different null hypotheses for different experiments, depending on what the scientists want to test. For the Higgs, scientists looked at pairs of photons detected by the LHC. The null hypothesis was that these photons were created by other parts of the Standard Model, like the strong force, and not by a Higgs boson. For medicine, the null hypothesis might be that people get better on their own after a certain amount of time. That’s hard to estimate, which is why medical experiments use a control group: a similar group without the medicine, to see how much they get better on their own.
Once we have a null hypothesis, we can use it to estimate how likely it is that it produced the result of the experiment. If there was no Higgs, and all those photons just came from other particles, what’s the chance there would still be a giant pile of them at one specific energy? If the medicine didn’t do anything, what’s the chance the control group did that much worse than the treatment group?
Ideally, you want a small probability here. In medicine and psychology, you’re looking for a probability below 5%, p<0.05. In physics, you need 5 sigma to make a discovery, which corresponds to a one in 3.5 million probability. If the probability is low, then you can say that it would be quite unlikely for your result to happen if the null hypothesis were true. If you’ve got a better hypothesis (the Higgs exists, the medicine works), then you should pick that instead.
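The sigma-to-probability conversion is one line of standard math. A sketch, using the one-sided tail of the normal distribution (the convention behind the “one in 3.5 million” figure):

```python
import math

def sigma_to_probability(sigma):
    # Area of the normal distribution at least `sigma` standard deviations
    # above the mean (the one-sided tail).
    return 0.5 * math.erfc(sigma / math.sqrt(2))

p = sigma_to_probability(5)
print(p)      # about 2.9e-7
print(1 / p)  # about 3.5 million
```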
Before this year’s prize was announced, I remember a few “water cooler chats” about who might win. No guess came close, though. The Nobel committee seems to have settled into a strategy of prizes on a loosely linked “basket” of topics, with half the prize going to a prominent theorist and the other half going to two experimental, observational, or (in this case) computational physicists. It’s still unclear why they’re doing this, but regardless it makes it hard to predict what they’ll do next!
When I read the announcement, my first reaction was, “surely it’s not that Parisi?” Giorgio Parisi is known in my field for the Altarelli-Parisi equations (more properly known as the DGLAP equations, the longer acronym because, as is often the case in physics, the Soviets got there first). These equations are in some sense why the scattering amplitudes I study are ever useful at all. I calculate collisions of individual fundamental particles, like quarks and gluons, but a real particle collider like the LHC collides protons. Protons are messy, interacting combinations of quarks and gluons. When they collide you need not merely the equations describing colliding quarks and gluons, but those that describe their messy dynamics inside the proton, and in particular how those dynamics look different for experiments with different energies. The equation that describes that is the DGLAP equation.
As it turns out, Parisi is known for a lot more than the DGLAP equation. He is best known for his work on “spin glasses”, models of materials where quantum spins try to line up with each other, never quite settling down. He also worked on a variety of other complex systems, including flocks of birds!
I don’t know as much about Manabe and Hasselmann’s work. I’ve only seen a few talks on the details of climate modeling. I’ve seen plenty of talks on other types of computer modeling, though, from people who model stars, galaxies, or black holes. And from those, I can appreciate what Manabe and Hasselmann did. Based on those talks, I recognize the importance of those first one-dimensional models, a single column of air, especially back in the 60’s when computer power was limited. Even more, I recognize how impressive it is for someone to stay on the forefront of that kind of field, upgrading models for forty years to stay relevant into the 2000’s, as Manabe did. Those talks also taught me about the challenge of coupling different scales: how small effects in churning fluids can add up and affect the simulation, and how hard it is to model different scales at once. To use these effects to discover which models are reliable, as Hasselmann did, is a major accomplishment.