Over the last year, some people felt like they were living in a science fiction novel. Last November, the research laboratory OpenAI released ChatGPT, a program that can answer questions on a wide variety of topics. Last month, they announced GPT-4, a more powerful version of ChatGPT’s underlying program. Already in February, Microsoft used GPT-4 to add a chatbot feature to its search engine Bing, which journalists quickly managed to use to spin tales of murder and mayhem.
For those who have been following these developments, things don’t feel quite so sudden. Already in 2019, AI Dungeon showed off how an early version of GPT could be used to mimic an old-school text-adventure game, and a tumblr blogger built a bot that imitates his posts as a fun side project. Still, the newer programs have shown some impressive capabilities.
Are we close to “real AI”, to artificial minds like the positronic brains in Isaac Asimov’s I, Robot? I can’t say, in part because I’m not sure what “real AI” really means. But if you want to understand where things like ChatGPT come from, how they work and why they can do what they do, then all the talk of AI won’t be helpful. Instead, you need to think of an entirely different set of Asimov novels: the Foundation series.
While Asimov’s more famous I, Robot focused on the science of artificial minds, the Foundation series is based on a different fictional science, the science of psychohistory. In the stories, psychohistory is a kind of futuristic social science. In the real world, historians and sociologists can find general principles of how people act, but don’t yet have the kind of predictive theories physicists or chemists do. Foundation imagines a future where powerful statistical methods have allowed psychohistorians to precisely predict human behavior: not yet that of individual people, but at least the average behavior of civilizations. They can not only guess when an empire is soon to fall, but calculate how long it will be before another empire rises, something few responsible social scientists would pretend to do today.
GPT and similar programs aren’t built to predict the course of history, but they do predict something: given part of a text, they try to predict the rest. They’re called Large Language Models, or LLMs for short. They’re “models” in the sense of mathematical models, formulas that let us use data to make predictions about the world, and the part of the world they model is our use of language.
Normally, a mathematical model is designed based on how we think the real world works. A mathematical model of a pandemic, for example, might use a list of people, each one labeled as infected or not. It could include an unknown number, called a parameter, for the chance that one person infects another. That parameter would then be filled in, or fixed, based on observations of the pandemic in the real world.
LLMs (as well as most of the rest of what people call “AI” these days) are a bit different. Their models aren’t based on what we expect about the real world. Instead, they’re in some sense “generic”, models that could in principle describe just about anything. In order to make this work, they have a lot more parameters, tons and tons of flexible numbers that can get fixed in different ways based on data.
(If that part makes you a bit uncomfortable, it bothers me too, though I’ve mostly made my peace with it.)
The surprising thing is that this works, and works surprisingly well. Just as psychohistory from the Foundation novels can predict events with much more detail than today’s historians and sociologists, LLMs can predict what a text will look like much more precisely than today’s literature professors. That isn’t necessarily because LLMs are “intelligent”, or because they’re “copying” things people have written. It’s because they’re mathematical models, built by statistically analyzing a giant pile of texts.
Just as Asimov’s psychohistory can’t predict the behavior of individual people, LLMs can’t predict the behavior of individual texts. If you start writing something, you shouldn’t expect an LLM to predict exactly how you would finish. Instead, LLMs predict what, on average, the rest of the text would look like. They give a plausible answer, one of many, for what might come next.
They can’t do that perfectly, but doing it imperfectly is enough to do quite a lot. It’s why they can be used to make chatbots, by predicting how someone might plausibly respond in a conversation. It’s why they can write fiction, or ads, or college essays, by predicting a plausible response to a book jacket or ad copy or essay prompt.
LLMs like GPT were invented by computer scientists, not social scientists or literature professors. Because of that, they get described as part of progress towards artificial intelligence, not as progress in social science. But if you want to understand what ChatGPT is right now, and how it works, then that perspective won’t be helpful. You need to put down your copy of I, Robot and pick up Foundation. You’ll still be impressed, but you’ll have a clearer idea of what could come next.
I think there should be a deeper exploration of the connection between Transformer Architectures and PsychoHistory, drawing inspiration from the Foundation series, Westworld, and recent research. Simulating human behavior through autonomous agents, like those built with OpenAI’s APIs, can create interactive models that align with human values and improve without human input. These models could be personalized, integrated into software, and contribute to more advanced, task-specific AI systems.
The paper “Generative Agents: Interactive Simulacra of Human Behavior” explores this concept further. By applying these human-like agents beyond game NPCs, we can envision a future akin to the TV series Pantheon, where uploaded human minds achieve exponential development and breakthroughs. Accelerated simulations, driven by advanced GPUs, could help us reach new heights in research and technology. Ultimately, simulating AI agents as societies could lead to more advanced, better-aligned, and personalized AI systems, surpassing the capabilities of GPT and other large language models.
here are some of the papers where such research is geared towards
LikeLiked by 1 person
I don’t know that I’d say that the models improve, since it’s not that there’s actual new training of the underlying model going on, just finding tricks to fit a lot of past experiences of a simulated agent into a fixed context window. But these developments are fascinating nonetheless, thanks for sharing!