This blog is currently hosted on a site called WordPress.com. When I started the blog, I picked WordPress mostly just because it was easy and free. (Since then I started paying them money, both to remove ads and to get a custom domain, 4gravitons.com.)
Now, the blog is more popular, and you guys access it in a wide variety of ways. 333 of you are other users of WordPress.com: WordPress has a “Reader” tab that lets users follow other blogs through the site. (I use that tab to keep up with a few of the blogs in my Blogroll.) 258 of you instead get a weekly email: this is a service WordPress.com offers, letting people sign up by email to the blog. Others follow on social media: on Twitter, Facebook, and Tumblr.
(Are there other options? If someone’s figured out how to follow with an RSS feed, or wants me to change something so they can do that, let me know in the comments!)
Recently, I’ve gotten a bit annoyed with the emails WordPress sends out. The problem is that they don’t seem to handle images in a sensible way: I can scale an image to fit in a blog post, but in the email the image is always full-size, sometimes taking up the entire screen.
Last year, someone reached out to me from Substack.com, trying to recruit me to switch to their site. Substack is an increasingly popular blogging platform, focused on email newsletters. The whole site is built around the idea that posts are emailed to subscribers, with a simplified layout that makes that feasible and consistent. Like WordPress, they have a system where people can follow the blog through a Substack account, and the impression I get is that a lot of people use it, browsing topics they find interesting.
(Substack also has a system for paid subscribers. That isn’t mandatory, and partly due to recent events I’m not expecting to use it.)
Since Substack is built for emails, I’m guessing it would solve the issue I’ve been having with images. It would also let more people discover the blog via the Substack app. On the other hand, Substack allows a lot less customization. I wouldn’t be able to have the cute pull-down menus from the top of the blog, or the Feynman diagram background. I don’t think I could have the Tag cloud or the Categories filter.
Most importantly, though, I don’t want to lose long-term readers. I don’t know if some of you would have more trouble accessing Substack than WordPress, or if some really prefer to follow here.
One option is that I use both sites, at least for a bit. There are services built for cross-posting that let a post on Substack automatically get posted on WordPress as well. I might do that temporarily (to make sure everyone has enough warning to transfer) or permanently (if there are people who really would never use Substack).
(I also might end up making an institutional web page with some of the useful educational guides, once I’ve got a permanent job. That could cover some features that Substack can’t.)
I wanted to do a couple of polls, to get a feeling for your opinions. The first is a direct question: do you prefer I stay at WordPress, prefer I switch to Substack, or do you not care either way? (For example, if you follow me via Facebook, you’ll get a link every week regardless.) The second poll asks about more detailed concerns, and you can pick as many entries as you want, to give me a feeling for what matters to you. Please, if you read the blog at all regularly, fill out both polls: I want to know what you think!
It’s a question I’ve now heard several times, in different forms. People hear that I’ll be hired as a researcher at an institute of theoretical physics, and they ask, “what, exactly, are they paying you to research?”
The answer, with some caveats: “Whatever I want.”
When a company hires a researcher, they want to accomplish specific things: to improve their products, to make new ones, to cut down on fraud or out-think the competition. Some government labs are the same: if you work for NIST, for example, your work should contribute in some way to achieving more precise measurements and better standards for technology.
Other government labs, and universities, are different. They pursue basic research, research not on any specific application but on the general principles that govern the world. Researchers doing basic research are given a lot of freedom, and that freedom increases as their careers go on.
As a PhD student, a researcher is a kind of apprentice, working for their advisor. Even then, they have some independence: an advisor may suggest projects, but PhD students usually need to decide how to execute them on their own. In some fields, there can be even more freedom: in theoretical physics, it’s not unusual for the more independent students to collaborate with people other than just their advisor.
Postdocs, in turn, have even more freedom. In some fields they get hired to work on a specific project, but they tend to have more freedom as to how to execute it than a PhD student would. Other fields give them more or less free rein: in theoretical physics, a postdoc will have some guidance, but often will be free to work on whatever they find interesting.
Professors, and other long-term researchers, have the most freedom of all. Over the climb from PhD to postdoc to professor, researchers build judgement, demonstrating a track record of tackling worthwhile scientific problems. Universities, and institutes of basic research, trust that judgement. They hire for that judgement. They give their long-term researchers free rein to investigate whatever questions they think are valuable.
In practice, there are some restrictions. Usually, you’re supposed to research in a particular field: at an institute for theoretical physics, I should probably research theoretical physics. (But that can mean many things: one of my future colleagues studies the science of cities.) Further pressure comes from grant funding: the money you need to hire other researchers or buy equipment can come with restrictions attached. When you apply for a grant, you have to describe what you plan to do. (In practice, grant agencies are more flexible about this than you might expect, allowing all sorts of changes if you have a good reason…but you still can’t completely reinvent yourself.) Your colleagues themselves also have an impact: it’s much easier to work on something when you can walk down the hall and ask an expert when you get stuck. It’s why we seek out colleagues who care about the same big questions as we do.
Overall, though, research is one of the freest professions there is. If you can get a job learning for a living, and do it well enough, then people will trust your judgement. They’ll set you free to ask your own questions, and seek your own answers.
My blog began, almost eleven years ago, with the title “Four Gravitons and a Grad Student”. Since then, I finished my PhD. The “Grad Student” dropped from the title, and the mysterious word “postdoc” showed up on a few pages. For three years I worked as a postdoc at the Perimeter Institute in Canada, before hopping the pond and starting another three-year postdoc job in Denmark. With a grant from the EU, three years became four. More funding got me to five (with a fancier title), and now I’m nearing six. At each step, my contract has been temporary: at first three years at a time, then one-year extensions. Each year I applied, all over the world, looking for a permanent job: for a chance to settle down somewhere, to build my own research group without worrying about having to move the next year.
This year, things have finally worked out. In the Fall I will be moving to France, starting a junior permanent position with L’Institut de Physique Théorique (or IPhT) at CEA Paris-Saclay.
A photo of the entryway to the Institute, taken when I interviewed
It’s been a long journey to get here, with a lot of soul-searching. This year in particular has been a year of reassessment: of digging deep and figuring out what matters to me, what I hope to accomplish and what clues I have to guide the way. Sometimes I feel like I’ve matured more as a physicist in the last year than in the last three put together.
The CEA (originally Commissariat à l’énergie atomique, now Commissariat à l’énergie atomique et aux énergies alternatives, or Alternative Energies and Atomic Energy Commission, and yes that means they’re using the “A” for two things at the same time), is roughly a parallel organization to the USA’s Department of Energy. Both organizations began as a way to manage their nation’s nuclear program, but both branched out, both into other forms of energy and into scientific research. Both run a nationwide network of laboratories, lightly linked but independent from their nations’ universities, both with notable facilities for particle physics. The CEA’s flagship site is in Saclay, on the outskirts of Paris, and it’s their Institute for Theoretical Physics where I’ll be working.
My new position is genuinely permanent: unlike a tenure-track position in the US, I don’t go up for review after a fixed span of time, with the expectation that if I don’t get promoted I lose the job altogether. It’s also not a university, which in particular means I’m not required to teach. I’ll have the option of teaching, working with nearby universities. In the long run, I think I’ll pursue that option. I’ve found teaching helpful the past couple years: it’s helped me think about physics, and think about how to communicate physics. But it’s good not to have to rush into preparing a new course when I arrive, as new professors often do.
Working temporary positions year after year, not knowing where I’ll be the next year, has been stressful. Others have had it worse, though. Some of you might have seen a recent post by Bret Deveraux, a military historian with a much more popular blog who has been in a series of adjunct positions. Deveraux describes the job market for the humanities in the US quite well. I’m in theoretical physics in Europe, so while my situation hasn’t been easy, it has been substantially better.
First, there’s the physics component. Physics has “adjunctified” much less than other fields. I don’t think I know a single physicist who has taken an adjunct teaching position, the kind of thing where you’re paid per course and only to teach. I know many who have left physics for other kinds of work, for Wall Street or Silicon Valley or to do data science for a bank or to teach high school. On the other side, I know people in other fields who do work as adjuncts, particularly in mathematics.
Deveraux blames the culture of his field, but I think funding must also play an important role. Physicists, and scientists in many other areas, rarely get professor positions right after their PhDs, but that doesn’t mean they leave the field entirely: most can find postdoc positions. Those postdocs are focused on research, and are often paid for by government grants: in my field in the US, that usually means the Department of Energy. People can go through two or sometimes even three such positions before finding something permanent, if they don’t leave the field before that. Without something like the Department of Energy or National Institutes of Health providing funding, I don’t know if the humanities could imitate that structure even if they wanted to.
Europe, in turn, has a different situation than the US. Most European countries don’t have a tenure track: just permanent positions and fixed-term positions. Funding also works quite differently. Department of Energy funding in the US is spread widely and lightly: grants are shared by groups of theorists at a given university, each getting funding for a few postdocs and PhDs across the group. In Europe, a lot of the funding is much more concentrated: big grants from the European Research Council going to individual professors, with various national and private grants supplementing or mirroring that structure. That kind of funding, and the rarity of tenure, in turn leads to a different kind of temporary position: one hired not to teach a course but to do research for as long as the funding lasts. The Danish word for my current title is Adjunkt, but that is, as they say in France, a faux ami: the official English translation is Assistant Professor, and it’s nothing like a US adjunct. I know people in a variety of forms of that kind of position in a variety of countries, people who landed a five-year grant where they could act like a professor, hire people and so on, but who in the end were expected to move when the grant was over. It’s a stressful situation, but at least it lets us further our research and make progress, unlike a US adjunct in the humanities or math who needs to spend much of their time on teaching.
I do hope Deveraux finds a permanent position; he’s got a great blog. And to return to the theme of the post, I am extremely grateful and happy that I have managed to find a permanent position. I’m looking forward to joining the group at Saclay: to learning more about physics from them, but also to having a place where I can start to build something, and make a lasting impact on the world around me.
I’m traveling this week, so this will just be a short post. This isn’t a scientific trip exactly: I’m in Poland, at an event connected to the 550th anniversary of the birth of Copernicus.
Not this one, but they do have nice posters!
Part of this event involved visiting the Copernicus Science Center, the local children’s science museum. The place was sold out completely. For any tired science communicators, I recommend going to a sold-out science museum: the sheer enthusiasm you’ll find there is balm for the most jaded soul.
Scientists have famously bad work-life balance. You’ve probably heard stories of scientists working long into the night, taking work with them on weekends or vacation, or falling behind during maternity or paternity leave.
Some of this is culture. Certain fields have a very cutthroat attitude, with many groups competing to get ahead and careers on the line if they fail. Not every field is like that, though: there are sub-fields that are more collaborative than competitive, that understand work-life balance and try to work together toward a shared goal. I’m in a sub-field like that, so I know they exist.
Put aside the culture, and you’ve still got passion. Science is fun, it’s puzzle after puzzle, topics chosen because we find them fascinating. Even in the healthiest workplace you’d still have scientists pondering in the shower and scribbling notes on the plane, mixing business with pleasure because the work is genuinely both.
But there’s one more reason scientists are workaholics. I suspect, ultimately, it’s the most powerful reason. It’s that every scientist is, in some sense, irreplaceable.
In most jobs, if you go on vacation, someone can fill in when you’re gone. The replacement may not be perfect (think about how many times you watched movies in school with a substitute teacher), but they can cover for you, making some progress on your tasks until you get back. That works because you and they have a shared training, a common core that means they can step in and get what needs to be done done.
Scientists have shared training too, of course. Some of our tasks work the same way, the kind of thing that any appropriate expert can do, that just need someone to spend the time to do them.
But our training has a capstone, the PhD thesis. And the thing about a PhD thesis is that it is, always and without exception, original research. Each PhD thesis is an entirely new result, something no-one else had known before, discovered by the PhD candidate. Each PhD thesis is unique.
That, in turn, means that each scientist is unique. Each of us has our own knowledge, our own background, our own training, built up not just during the PhD but through our whole career. And sometimes, the work we do requires that unique background. It’s why we collaborate, why we reach out to different people around the world, looking for the unique few people who know how to do what we need.
Over time, we become a kind of embodiment of our accumulated knowledge. We build a perspective shaped by our experience, goals for the field and curiosity just a bit different from everyone else’s. We act as agents of that perspective, each the one person who can further our particular vision of where science is going. When we enter a collaboration, when we walk into the room at a conference, we are carrying with us all we picked up along the way, each a story just different enough to matter. We extrapolate from what we know, and try to do everything that knowledge can do.
So we can, and should, take vacations, yes, and we can, and should, try to maintain a work-life balance. We need to in order to survive, to stay sane. But we do have to accept that when we do, certain things won’t get done as fast. Our own personal vision, our extrapolated knowledge…will just have to wait.
Scientists want to know everything, and we’ve been trying to get there since the dawn of science. So why aren’t we there yet? Why are there things we still don’t know?
Sometimes, the reason is obvious: we can’t do the experiments yet. Victorian London had neither the technology nor the wealth to build a machine like Fermilab, so they couldn’t discover the top quark. Even if Newton had the idea for General Relativity, the telescopes of the era wouldn’t have let astronomers see its effect on the motion of Mercury. As we grow (in technology, in resources, in knowledge, in raw number of human beings), we can test more things and learn more about the world.
But I’m a theoretical physicist, not an experimental physicist. I still want to understand the world, but what I contribute aren’t new experiments, but new ideas and new calculations. This brings back the question in a new form: why are there calculations we haven’t done yet? Why are there ideas we haven’t had yet?
Sometimes, we can track the reason down to bottlenecks. A bottleneck is a step in a calculation that, for some reason, is harder than the rest. As you try to push a calculation to new heights, the bottleneck is the first thing that slows you down, like the way liquid bubbles through the neck of a literal bottle. If you can clear the bottleneck, you can speed up your calculation and accomplish more.
In the clearest cases, we can see how these bottlenecks could be solved with more technology. As computers get faster and more powerful, calculations become possible that weren’t possible before, in the same way new experiments become possible with new equipment. This is essentially what has happened recently with machine learning, where relatively old ideas are finally feasible to apply on a massive scale.
In physics, a subtlety is that we rarely have access to the most powerful computers available. Some types of physics are done on genuine supercomputers, but for more speculative or lower-priority research we have to use small computer clusters, or even our laptops. Something can be a bottleneck not because it can’t be done on any computer, but because it can’t be done on the computers we can afford.
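To make the idea of a bottleneck concrete, here’s a toy sketch in Python (entirely made-up steps, not a real physics calculation) of a computation where one step dominates the runtime:

```python
import time

def cheap_step(data):
    # Fast bookkeeping: negligible cost.
    return [x + 1 for x in data]

def expensive_step(data):
    # The bottleneck: its cost grows quadratically with the input size.
    return [sum(abs(x - y) for y in data) for x in data]

def calculation(data):
    data = cheap_step(data)
    data = expensive_step(data)   # almost all the runtime is spent here
    return cheap_step(data)

data = list(range(2000))
start = time.perf_counter()
calculation(data)
print(f"total time: {time.perf_counter() - start:.2f} s")
# Doubling the input roughly quadruples the time spent in expensive_step,
# so speeding up the cheap steps barely helps: only clearing the bottleneck does.
```

Speed up the cheap steps all you like and the total barely moves; the calculation only really gets faster when the slow step does.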
Most of the time, bottlenecks aren’t quite so obvious. That’s because in theoretical physics, often, we don’t know what we want to calculate. If we want to know why something happens, and not merely that it happens, then we need a calculation that we can interpret, that “makes sense” and that thus, hopefully, we can generalize. We might have some ideas for how that calculation could work: some property a mathematical theory might have that we already know how to understand. Some of those ideas are easy to check, so we check, and make progress. Others are harder, and we have to decide: is the calculation worth it, if we don’t know if it will give us the explanation we need?
Those decisions provide new bottlenecks, often hidden ones. As we get better at calculation, the threshold for an “easy” check gets easier and easier to meet. We put aside fewer possibilities, so we notice more things, which inspire yet more ideas. We make more progress, not because the old calculations were impossible, but because they weren’t easy enough, and now they are. Progress fuels progress, a virtuous cycle that gets us closer and closer to understanding everything we want to understand (which is everything).
In the past, Michio Kaku made important contributions to string theory, but he’s best known for what could charitably be called science popularization. He’s an excited promoter of physics and technology, but that excitement often strays into inaccuracy. Pretty much every time I’ve heard him mentioned, it’s for some wildly overenthusiastic statement about physics that, rather than just being simplified for a general audience, is generally flat-out wrong, conflating a bunch of different developments in a way that makes zero actual sense.
Michio Kaku isn’t unique in this. There’s a whole industry in making nonsense statements about science, overenthusiastic books and videos hinting at science fiction or mysticism. Deepak Chopra is a famous figure from deeper on this spectrum, known for peddling loosely quantum-flavored spirituality.
There was a time I was worried about this kind of thing. Super-popular misinformation is the bogeyman of the science popularizer, the worry that for every nice, careful explanation we give, someone else will give a hundred explanations that are way more exciting and total baloney. Somehow, though, I hear less and less from these people over time, and thus worry less and less about them.
(But then in practice, I’m more likely to reflect on content with even smaller audiences.)
If misinformation is this popular, shouldn’t I be doing more to combat it?
Popular misinformation is also going to be popular among critics. For every big-time nonsense merchant, there are dozens of people breaking down and debunking every false statement they say, every piece of hype they release. Often, these people will end up saying the same kinds of things over and over again.
If I can be useful, I don’t think it will be by saying the same thing over and over again. I come up with new metaphors, new descriptions, new explanations. I clarify things others haven’t clarified, I clear up misinformation others haven’t addressed. That feels more useful to me, especially in a world where others are already countering the big problems. I write, and writing lasts, and can be used again and again when needed. I don’t need to keep up with the Kakus and Chopras of the world to do that.
(Which doesn’t imply I’ll never address anything one of those people says…but if I do, it will be because I have something new to say back!)
Nowadays, we have telescopes that detect not just light, but gravitational waves. We’ve already learned quite a bit about astrophysics from these telescopes. They observe ripples coming from colliding black holes, giving us a better idea of what kinds of black holes exist in the universe. But the coolest thing a gravitational wave telescope could discover is something that hasn’t been seen yet: a cosmic string.
This art is from an article in Symmetry magazine which is, as far as I can tell, not actually about cosmic strings.
You might have heard of cosmic strings, but unless you’re a physicist you probably don’t know much about them. They’re a prediction, coming from cosmology, of giant string-like objects floating out in space.
That might sound like it has something to do with string theory, but it doesn’t have to: you can have these things without any string theory at all. Instead, you might have heard that cosmic strings are some kind of “cracks” or “wrinkles” in space-time. Some articles describe this as like what happens when ice freezes, cracks forming as water settles into a crystal.
That description, in terms of ice forming cracks between crystals, is great…if you’re a physicist who already knows how ice forms cracks between crystals. If you’re not, I’m guessing reading those kinds of explanations isn’t helpful. I’m guessing you’re still wondering why there ought to be any giant strings floating in space.
The real explanation has to do with a type of mathematical gadget physicists use, called a scalar field. You can think of a scalar field as described by a number, like a temperature, that can vary in space and time. The field carries potential energy, and that energy depends on what the scalar field’s “number” is. Left alone, the field settles into a situation with as little potential energy as it can, like a ball rolling down a hill. That situation is one of the field’s default values, something we call a “vacuum” value. Changing the field away from its vacuum value can take a lot of energy. The Higgs boson is one example of a scalar field. Its vacuum value is the value it has in day to day life. In order to make a detectable Higgs boson at the Large Hadron Collider, they needed to change the field away from its vacuum value, and that took a lot of energy.
In the very early universe, almost back at the Big Bang, the world was famously in a hot dense state. That hot dense state meant that there was a lot of energy to go around, so scalar fields could vary far from their vacuum values, pretty much randomly. As the universe expanded and cooled, there was less and less energy available for these fields, and they started to settle down.
Now, the thing about these default, “vacuum” values of a scalar field is that there doesn’t have to be just one of them. Depending on what kind of mathematical function the field’s potential energy is, there could be several different possibilities each with equal energy.
Let’s imagine a simple example, of a field with two vacuum values: +1 and -1. As the universe cooled down, some parts of the universe would end up with that scalar field number equal to +1, and some to -1. But what happens in between?
The scalar field can’t just jump from -1 to +1: that’s not allowed in physics. It has to pass through 0 in between. But, unlike -1 and +1, 0 is not a vacuum value. When the scalar field number is equal to 0, the field has more energy than it does when it’s equal to -1 or +1. Usually, a lot more energy.
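If you like seeing the numbers, here’s a small Python sketch using one common textbook choice of potential energy, a “double-well” (the specific formula is my illustration, not something forced by the argument above):

```python
def potential(phi):
    # A "double-well" potential: zero energy at phi = +1 and phi = -1,
    # the two vacuum values, and higher energy everywhere in between.
    return (phi**2 - 1)**2

for phi in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    print(f"phi = {phi:+.1f}  ->  potential energy = {potential(phi):.2f}")
# phi = -1.0 and phi = +1.0 both sit at zero energy.
# phi = 0.0, which the field must pass through to get from one vacuum value
# to the other, costs the most.
```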
That means the region of scalar field number 0 can’t spread very far: the further it spreads, the more energy it takes to keep it that way. On the other hand, the region can’t vanish altogether: something needs to happen to transition between the numbers -1 and +1.
The thing that happens is called a domain wall. A domain wall is a thin sheet, as thin as it can physically be, where the scalar field doesn’t take its vacuum value. You can roughly think of it as made up of the scalar field, a churning zone of the kind of bosons the LHC was trying to detect.
This sheet still has a lot of energy, bound up in the unusual value of the scalar field, like an LHC collision in every proton-sized chunk. As such, like any object with a lot of energy, it has a gravitational field. For a domain wall, the effect of this gravity would be very very dramatic: so dramatic, that we’re pretty sure they’re incredibly rare. If they were at all common, we would have seen evidence of them long before now!
Ok, I’ve shown you a wall, that’s weird, sure. What does that have to do with cosmic strings?
The number representing a scalar field doesn’t have to be a real number: it can be imaginary instead, or even complex. Now I’d like you to imagine a field with vacuum values on the unit circle, in the complex plane. That means that +1 and -1 are still vacuum values, but so are i and -i, and everything else you can write as e^{iθ}: any complex number of size one. However, 0 is still not a vacuum value. Neither is, for example, 2.
With vacuum values like this, you can’t form domain walls. You can make a path between -1 and +1 that only goes through the unit circle, through i for example. The field will be at its vacuum value throughout, taking no extra energy.
However, imagine the different regions arranged in a circle: say a blue area at the bottom is at vacuum value -1 and a red area is at +1. You might have i in a green region, and -i in a purple region, covering the whole unit circle smoothly as you go around.
Now, think about what happens in the middle of the circle. On one side of the circle, you have -1. On the other, +1. (Or, on one side i, on the other, -i.) No matter what, different sides of the circle are not allowed to be next to each other: you can’t just jump between them. So in the very middle of the circle, something else has to happen.
Once again, that something else is a field that goes away from its vacuum value, that passes through 0. Once again, that takes a lot of energy, so it occupies as little space as possible. But now, that space isn’t a giant wall. Instead, it’s a squiggly line: a cosmic string.
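If you want to check the “something else has to happen” claim yourself, here’s a toy Python sketch (my own made-up field, not a realistic model) that walks around a loop, adds up how much the field’s phase turns, and finds a full turn: a winding that no smooth rearrangement can undo, so somewhere inside the loop the field must leave the unit circle of vacuum values.

```python
import cmath, math

def field(x, y):
    # Toy complex scalar field: its value always sits on the unit circle,
    # and traces that circle once as you walk around the origin.
    z = complex(x, y)
    return z / abs(z)

def winding_number(radius, steps=1000):
    # Add up the change in the field's phase as we walk around a loop.
    total = 0.0
    prev = cmath.phase(field(radius, 0.0))
    for k in range(1, steps + 1):
        angle = 2 * math.pi * k / steps
        cur = cmath.phase(field(radius * math.cos(angle), radius * math.sin(angle)))
        diff = cur - prev
        # Keep each small step in (-pi, pi] so we count net turns, not jumps.
        if diff <= -math.pi:
            diff += 2 * math.pi
        elif diff > math.pi:
            diff -= 2 * math.pi
        total += diff
        prev = cur
    return round(total / (2 * math.pi))

# The winding is 1 no matter how big the loop: it can't be smoothed away,
# so the line of trapped high-energy field inside -- the cosmic string --
# can't simply vanish.
print(winding_number(1.0), winding_number(50.0))
```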
Cosmic strings don’t have as dramatic a gravitational effect as domain walls. That means they might not be super-rare. There might be some we haven’t seen yet. And if we do see them, it could be because they wiggle space and time, making gravitational waves.
Cosmic strings don’t require string theory, they come from a much more basic gadget, scalar fields. We know there is one quite important scalar field, the Higgs field. The Higgs vacuum values aren’t like +1 and -1, or like the unit circle, though, so the Higgs by itself won’t make domain walls or cosmic strings. But there are a lot of proposals for scalar fields, things we haven’t discovered but that physicists think might answer lingering questions in particle physics, and some of those could have the right kind of vacuum values to give us cosmic strings. Thus, if we manage to detect cosmic strings, we could learn something about one of those lingering questions.
I’ve met quite a few people who are surprised by this. I hear the same question again and again, from curious Danes at outreach events to a tired border guard in the pre-clearance area of the Toronto airport: why are you, an American, working here?
It’s not, on the face of it, an unreasonable question. Moving internationally is hard and expensive. You may have to take your possessions across the ocean, learn new languages and customs, and navigate an unfamiliar bureaucracy. You begin as a temporary resident, not a citizen, with all the risks and uncertainty that involves. Given a choice, most people choose to stay close to home. Countries sometimes back up this choice with additional incentives. There are laws in many places that demand that, given a choice, companies hire a local instead of a foreigner. In some places these laws apply to universities as well. With all that weight, why do so many researchers move abroad?
Two different forces stir the pot, making universities international: specialization, and diversification.
We researchers may find it easier to live close to the people we grew up with, but we work better near people who share our research interests. Science, and scholarship more generally, are often collaborative: we need to discuss with and learn from others to make progress. That’s still very hard to do remotely: it requires serendipity, chance encounters in the corridor and chats at the lunch table. As researchers in general have become more specialized, we’ve gotten to the point where not just any university will do: the people who do our kind of work are few enough that we often have to go to other countries to find them.
Specialization alone would tend to lead to extreme clustering, with researchers in each area gathering in only a few places. Universities push back against this, though. A university wants to maximize the chance that one of their researchers makes a major breakthrough, so they don’t want to hire someone whose work will just be a copy of someone they already have. They want to encourage interdisciplinary collaboration, to try to get people in different areas to talk to each other. Finally, they want to offer a wide range of possible courses, to give the students (many of whom are still local), a chance to succeed at many different things. As a result, universities try to diversify their faculty, to hire people from areas that, while not too far for meaningful collaboration, are distinct from what their current employees are doing.
The result is a constant international churn. We search for jobs in a particular sweet spot: with people close enough to spur good discussion, but far enough to not overspecialize. That search takes us all over the world, and all but guarantees we won’t find a job where we were trained, let alone where we were born. It makes universities quite international places, with a core of local people augmented by opportune choices from around the world. It makes us, and the way we lead our lives, quite unusual on a global scale. But it keeps the science fresh, and the ideas moving.
For those who have been following these developments, things don’t feel quite so sudden. Already in 2019, AI Dungeon showed off how an early version of GPT could be used to mimic an old-school text-adventure game, and a tumblr blogger built a bot that imitates his posts as a fun side project. Still, the newer programs have shown some impressive capabilities.
Are we close to “real AI”, to artificial minds like the positronic brains in Isaac Asimov’s I, Robot? I can’t say, in part because I’m not sure what “real AI” really means. But if you want to understand where things like ChatGPT come from, how they work and why they can do what they do, then all the talk of AI won’t be helpful. Instead, you need to think of an entirely different set of Asimov novels: the Foundation series.
While Asimov’s more famous I, Robot focused on the science of artificial minds, the Foundation series is based on a different fictional science, the science of psychohistory. In the stories, psychohistory is a kind of futuristic social science. In the real world, historians and sociologists can find general principles of how people act, but don’t yet have the kind of predictive theories physicists or chemists do. Foundation imagines a future where powerful statistical methods have allowed psychohistorians to precisely predict human behavior: not yet that of individual people, but at least the average behavior of civilizations. They can not only guess when an empire is soon to fall, but calculate how long it will be before another empire rises, something few responsible social scientists would pretend to do today.
GPT and similar programs aren’t built to predict the course of history, but they do predict something: given part of a text, they try to predict the rest. They’re called Large Language Models, or LLMs for short. They’re “models” in the sense of mathematical models, formulas that let us use data to make predictions about the world, and the part of the world they model is our use of language.
Normally, a mathematical model is designed based on how we think the real world works. A mathematical model of a pandemic, for example, might use a list of people, each one labeled as infected or not. It could include an unknown number, called a parameter, for the chance that one person infects another. That parameter would then be filled in, or fixed, based on observations of the pandemic in the real world.
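As a cartoon of what “fixing a parameter based on observations” means, here’s a short Python sketch, with invented numbers, that fits the one unknown in a deliberately simple growth model to some observed case counts:

```python
import math

# Invented observations: infected counts on days 0..4.
observed = [10, 14, 21, 30, 44]

def model(day, rate):
    # A deliberately simple model: cases grow exponentially at an unknown rate.
    return observed[0] * math.exp(rate * day)

# Fix the parameter by trying values and keeping the one that fits the data best.
best_rate = min(
    (r / 1000 for r in range(0, 1001)),
    key=lambda r: sum((model(d, r) - obs) ** 2 for d, obs in enumerate(observed)),
)
print(f"best-fit growth rate: {best_rate:.3f} per day")
print("prediction for day 7:", round(model(7, best_rate)))
```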
LLMs (as well as most of the rest of what people call “AI” these days) are a bit different. Their models aren’t based on what we expect about the real world. Instead, they’re in some sense “generic”, models that could in principle describe just about anything. In order to make this work, they have a lot more parameters, tons and tons of flexible numbers that can get fixed in different ways based on data.
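Continuing the cartoon above: instead of a formula chosen because we think it reflects how infection spreads, a “generic” model just has lots of adjustable numbers that bend to fit whatever data you hand it. A minimal sketch, again with invented data:

```python
import numpy as np

# The same invented case counts as before.
observed = [10, 14, 21, 30, 44]
days = np.arange(len(observed))

# A generic model: a polynomial with several free coefficients, chosen not
# because pandemics "are" polynomials but because it's flexible enough to
# fit almost any smooth curve once its parameters are fixed by the data.
coeffs = np.polyfit(days, observed, deg=3)
print("fitted parameters:", np.round(coeffs, 2))
print("model's value on day 3:", round(np.polyval(coeffs, 3)))
```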
The surprising thing is that this works, and works remarkably well. Just as psychohistory from the Foundation novels can predict events with much more detail than today’s historians and sociologists, LLMs can predict what a text will look like much more precisely than today’s literature professors. That isn’t necessarily because LLMs are “intelligent”, or because they’re “copying” things people have written. It’s because they’re mathematical models, built by statistically analyzing a giant pile of texts.
Just as Asimov’s psychohistory can’t predict the behavior of individual people, LLMs can’t predict the behavior of individual texts. If you start writing something, you shouldn’t expect an LLM to predict exactly how you would finish. Instead, LLMs predict what, on average, the rest of the text would look like. They give a plausible answer, one of many, for what might come next.
They can’t do that perfectly, but doing it imperfectly is enough to do quite a lot. It’s why they can be used to make chatbots, by predicting how someone might plausibly respond in a conversation. It’s why they can write fiction, or ads, or college essays, by predicting a plausible response to a book jacket or ad copy or essay prompt.
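You don’t need anything as big as GPT to see the basic trick. Here’s a toy sketch of a “language model” so small it only remembers which word tends to follow which (a bigram model), trained on a few invented sentences; the sentences are mine, purely for illustration:

```python
import random
from collections import defaultdict

# A tiny invented "pile of texts" to analyze statistically.
corpus = (
    "the string wiggles space and time . "
    "the string carries a lot of energy . "
    "the field settles into a vacuum value . "
).split()

# Count which word follows which: these counts are the model's parameters.
follows = defaultdict(list)
for word, nxt in zip(corpus, corpus[1:]):
    follows[word].append(nxt)

def continue_text(start, length=6, seed=0):
    # Predict a plausible (not unique!) continuation, one word at a time.
    random.seed(seed)
    words = [start]
    for _ in range(length):
        options = follows.get(words[-1])
        if not options:
            break
        words.append(random.choice(options))
    return " ".join(words)

print(continue_text("the"))  # one plausible continuation among many
```

Scale the same idea up from pairs of words to long stretches of text, and from a dozen counts to billions of parameters, and you get something much closer to an LLM.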
LLMs like GPT were invented by computer scientists, not social scientists or literature professors. Because of that, they get described as part of progress towards artificial intelligence, not as progress in social science. But if you want to understand what ChatGPT is right now, and how it works, then that perspective won’t be helpful. You need to put down your copy of I, Robot and pick up Foundation. You’ll still be impressed, but you’ll have a clearer idea of what could come next.