Tag Archives: PublicPerception

An AI Opinions Chart

You ever read something and suddenly a whole classification scheme lights up in your head?

A thread on X from “stringking42069” showed me a combination of opinions I hadn’t seen before. stringking42069 is a pro-string theory commentator with a macho gym bro memer gimmick. He’s openly contemptuous of many physicists who describe themselves as string theorists, arguing that only a smaller number really deserve the name.

To be clear, none of that is the new combination. Long-time readers of this blog will remember a frequent commenter with a very similar attitude, if much less tendency to use the word “bro”.

The new thing, from my perspective, is how he thinks about AI. As he explains in that thread, he sees AI as great at certain kinds of physics calculations, ones where the methods and goals are mostly known and the challenge is working out the math. He doesn’t expect it to be able to contribute real creativity or judgement, the messy decision-making that physicists use to decide what is worth building in the first place.

Others with that perspective tend to argue that this will be a boon for scientists, who AI will free up to do creative work, multiplying their output. The difference is, stringking42069 thinks a lot of scientists are not doing creative work in the first place, including most of the people making extensive use of AI. So if anything he’s happy to see them go, and only pissed that they’re sucking up resources and attention on the way out, and discouraging students who could be joining the parts of the field that do real creative work.

It made me realize that there are two axes to thinking about AI in physics.

On the one hand, there’s where you think AI capabilities are. Is AI going to lead to “a nation of geniuses in a data center”, an AI-powered super-(cyber-)Ed Witten for everything and everyone? Is AI great at routine work and coding, but will never be able to do anything really creative or novel? Or is AI total hype, almost always a waste of time?

On the other hand, there’s another axis: misanthropy about science. For some of the people arguing about AI online, most scientists are good people trying their best to do worthwhile things. For others, most scientists are complacent and cliquish, wasting time and money on ideas that are going nowhere and forcing the real geniuses out of the field.

Put those together, and you get the table below:

| | Thinks academia is mostly fine | Misanthrope |
|---|---|---|
| AI geniuses are coming | The practice of science will change. We’ll play at science like chess, and have fun trying to read and understand amazing AI insights. | Soon all scientists will be out of a job when the public notices AI can do it all better. Then the real breakthroughs will come. |
| AI can do routine work | AI frees scientists to focus on what we do best: creativity. We should think carefully about how to train junior scientists now, though. | AI is comparable to bad scientists who only do derivative work. If they leave, we real paradigm-changers could inherit the field. |
| AI is complete hype | Most scientists don’t use AI. AI is worrying because it misleads students and the public, who should listen to real scientists. | Scientists are shilling for AI companies, as you should expect for people who waste the public’s money on reputation games. |

This classification is missing a lot, of course. One important question is not just what AI can do in principle, but what it can do cost-effectively, and whether anyone is actually willing to pay for it. A point where I agree with stringking42069 is that companies get a lot of good PR out of building AI physicists right now, and that PR benefit won’t be relevant forever. I’m also leaving out the more general questions of AI’s effect on society, for example people who think AI geniuses will lead to the end of the world as we know it.

But I suspect if you look at this table, you can already start matching the scientists you see on social media. I’ve seen examples of all of these in the wild (though the bottom-left is somewhat rare, as far as I can tell). Where do you fall?

Should You Read What You Cite? That Depends

When arXiv announced it would ban people for hallucinated citations, that is citations of papers that don’t exist, the discussion online got sidetracked by the question of whether academics actually read the papers they cite. Some people proudly insisted that any good scholar always reads every paper they reference, others argued that was ridiculous.

As always, the answer is never that simple. In certain fields, it is enormously important to read the papers you cite if you want to do solid, careful, scholarly work. In others, it’s entirely irrelevant.

It mostly comes down to what citations are for. And luckily, I’ve already written a post about that.

So let’s go through the citation motivations I mention in that post.

First, some citations are about respecting priority, feeding the system by which academics get credit for having an idea first. The incentive system of academia depends on getting this more or less right, but that doesn’t mean every academic has to check things at every step of the way. Besides, if you get this wrong, you’ll find out quickly. Submit a paper to a preprint server like arXiv, and you’ll be sure to get emails telling you that some obscure Soviet researcher figured it all out first.

Other citations are about substantiating claims. These are the most important to get right. Here, you really ought to have read, if not the whole paper, at least the full justification for the claim you’re making. You can have some leeway if the methods are unfamiliar enough, for example a complicated experiment you can’t understand all the details of. Science and technology do require some trust. But you should have at least a sense of where things could go wrong, and why.

Citations to provide context are a different beast. Here, you’re trying to tell a reader where your ideas come from. You can’t show them the conversations you have with your colleagues, the things they value and get you excited about. So you have to show them papers instead. But the papers aren’t the thing you read, they’re just a convenient proxy.

Finally, citations do sometimes just exist to follow social conventions. And yeah, you don’t have to read these, just like you don’t have to say how you’re doing when someone asks you how you’re doing. They’re the academic equivalent of social white lies, and should be taken roughly as seriously, both by their supporters and detractors.

Doing Things Well Is an International Activity

In the US, funding agencies seem to be increasingly opposed to an often inevitable feature of good science: international collaboration. Scientists have been told by officials at the National Institutes of Health that they need to remove mention of foreign collaborators from progress reports, or that they need to avoid such collaborations to begin with. At NASA, officials have told scientists that rather than just avoiding funding work in China, they should actively avoid collaborating with Chinese researchers. And a recently introduced bill would make that restriction more explicit.

I have a general policy against discussing concrete political issues on this blog, so I’m not going to dig into the details of who’s doing what here, how far it’s going or how novel it is. That policy extends to the comments. If you mention specific laws, politicians, or political parties, I will delete your comment.

I do want to say something more general, though. I think people often underestimate just how important international collaboration is.

I’ve talked before about how scientific specialization spreads scientists around the world. Scientists want to work with people who work on their specific interests, and there are often only a few people that fit that description. So people move across the world, creating centers of expertise.

More than that, though, essentially any activity, done well, is done internationally. The better you want to perform, the more likely it is that the best collaborator will be someone in another country.

People don’t notice this as much as they could, because they’re used to the exceptions. Popular art is often siloed by language and cultural references. Sports are intentionally set up as competitions between regions and nations, and militaries compete as a practical necessity. But without those exceptions, international competition wins out. The best doctor, the best classical musician, and the best businessperson for a job can’t be expected to come from one country or another. Those fields, like science, are international.

When that internationalism is weak, it’s a warning sign. Without that drive to succeed on an international stage, scientists get lazy. There are countries with a history of academic cronyism, where universities were run more on interpersonal politics than scholarly merit, cozy fiefdoms where prominent academics dole out positions. To combat this, policymakers work to make their research systems more international. They explicitly ask about international collaborations and participation in international conferences in grant applications, not to discourage them, but to encourage them: to reward academics who show merit on the international stage and break up lazy patronage networks.

It worries me that it sounds like some US policymakers want to do the opposite. People are increasingly worried about bias and groupthink in the sciences, and increasingly mad that scientists could be wasting the public’s money to maintain a cushy lifestyle. International collaboration is how you hold scientists to account, how you force them to compete and show their merit. If you drop that, academia is going to get a whole lot worse.

Breakthrough Prize 2026

Because of last week’s “bonus info” post, I’m only now getting around to commenting on this year’s Breakthrough Prizes in Fundamental Physics. While I don’t comment on them every year, I know enough about several of this year’s winners that I figured a post would be helpful.

For those who haven’t heard of it, the Breakthrough Prizes are a bit like the Nobel, if it was created by a 21st century rich person instead of a 19th century one. They give out more money, and instead of an organization like the Swedish Academy of Sciences they pick winners via a committee of past winners. They’re more flexible in structure than the Nobel, with extra prizes for early-career researchers and a tendency to reward accomplishments that are either entirely theoretical or solid experimental work that doesn’t show a new discovery, both of which are things the Nobel Prize is structured to avoid. They’ve also shown willingness to reward large collaborations, rather than following the Nobel’s informal rule to only give the award to three people at a time.

This last was on display in their main award in physics this year, for the muon g-2 collaborations. The award is going to collaborations of scientists and engineers at three different particle colliders, for work done over a span of over fifty years to measure the magnetic properties of the muon. These measurements have shown a tantalizing discrepancy with predictions that inspired many to conjecture new physics. However, in the last few years it’s looked more and more like the discrepancy was due to an imprecise prediction, and better methods seem to be converging to the experimental value. At this point, smart money is that there is no disagreement with the Standard Model here, but as always in science there’s a chance some mystery remains.

The Breakthrough Prize also offered a special, out-of-schedule prize to David Gross. Already a Nobel laureate, Gross had a crucial role in our understanding of the force of quantum chromodynamics that binds protons and neutrons together. He was also a major founding figure in string theory, and since the Breakthrough Prize is more comfortable recognizing theoretical contributions they get to mention this as well. Gross is also known in the community for his personality, which tends to fill up any room he’s in. I can only imagine the conversations that led to Breakthrough’s decision to add a special prize for him this year.

Breakthrough is also adding a new recurring prize, the Vera Rubin New Frontiers Prize, honoring women who make important contributions to physics within two years of their PhD. The prize is a bit smaller than the existing early-career New Horizons in Physics Prizes, presumably because it goes to even younger researchers. This year’s winner is from my old field, scattering amplitudes. Carolina Figueiredo is part of the latest evolution of the research program behind the amplituhedron. The new framework of “surfaceology” seems like a promising geometry-flavored way to understand particle physics calculations in more realistic theories, and unlike its predecessors may have some practical value eventually as well. Congrats Carolina!

Finally, the New Horizons in Physics Prizes are for impressive early-career researchers. I don’t know much about the first recipient, Benjamin Safdi, who works on searches for axions and axion-like particles, today’s most trendy dark matter candidate. I know a bit more about the work done by Clay Córdova, Thomas Dumitrescu, Shu-Heng Shao, and Yifan Wang, having met several of them in my physics career. They work on what are called generalized symmetries, concepts which go beyond the usual idea of how symmetry is supposed to work by involving more complicated tensors. I saw these crop up a fair bit in talks, but they were distant enough from my area that I never had a particularly clear grasp of what people were doing with them. I know even less about the work of the last group, Dillon Brout, J. Colin Hill, Mathew Madhavacheril, Maria Vincenzi, Daniel Scolnic, and W. L. Kimmy Wu, on cosmological measurements, but I was friends with Mathew in grad school and am impressed that he’s now working on cosmology given how little cosmology research there was at Stony Brook at the time.

The Twitter of Physics

The paper I talked about last week was frustratingly short. That’s not because the authors were trying to hide anything, or because they were lazy. It’s just that these days, that’s how the game is played.

Twitter started out with a fun gimmick: all posts had to be under 140 characters. The restriction inspired some great comedy, trying to pack as much humor as possible into a bite-sized format. Then, Twitter somehow became the place for journalists to discuss the news, tech people to discuss the industry, and politicians to discuss politics. Now, the length limit fuels conflict, an endless scroll of strong opinions without space for nuance.

Physics has something like this too.

In the 1950’s, it was hard for scientists to get the word out quickly about important results. The journal Physical Review had a trick: instead of normal papers, they’d accept breaking news in the form of letters to the editor, which they could publish more quickly than the average paper. In 1958, editor Samuel Goudsmit founded a new journal, Physical Review Letters (or PRL for short), that would publish those letters all in one place, enforcing a length limit to make them faster to process.

The new journal was a hit, and soon played host to a series of breakthrough results, as scientists chose it as a way to get their work out fast. That popularity created a problem, though. As PRL’s reputation grew, physicists started trying to publish there not because their results needed to get out fast, but because just by publishing in PRL, their papers would be associated with all of the famous breakthroughs the journal had covered. Goudsmit wrote editorials trying to slow this trend, but to no avail.

Now, PRL is arguably the most prestigious journal in physics, hosting over a quarter of Nobel prize-winning work. Its original motivation is no longer particularly relevant: the journal is not all that much faster than other journals in its area, if at all, and is substantially slower than the preprint server arXiv, which is where physicists actually read papers in practice.

The length limit has changed over the years, but not dramatically. It now sits at 3,750 words, typically allowing a five-or-six page article in tight two-column text.

If you see a physics paper on arXiv.org that fits the format, it’s almost certainly aimed at PRL, or one of the journals with similar policies that it inspired. It means the authors think their work is cool enough to hang out with a quarter of all Nobel-winning results, or at least would like it to be.

And that, in turn, means that anyone who wants to claim that prestige has to be concise. They have to leave out details (often, saving them for a later publication in a less-renowned journal). The results have to lean, by the journal’s nature, more to physicist-clickbait and a cleaned-up story than to anything their colleagues can actually replicate.

Is it fun? Yeah, I had some PRLs in my day. It’s a rush, shining up your work as far as it can go, trimming down complexities into six pages of essentials.

But I’m not sure it’s good for the field.

About the OpenAI Amplitudes Paper, but Not as Much as You’d Like

I’ve had a bit more time to dig into the paper I mentioned last week, where OpenAI collaborated with amplitudes researchers, using one of their internal models to find and prove a simplified version of a particle physics formula. I figured I’d say a bit about my own impressions from reading the paper and OpenAI’s press release.

This won’t be a real “deep dive”, though it will be long nonetheless. As it turns out, most of the questions I’d like answers to aren’t answered in the paper or the press release. Getting them will involve actual journalistic work, i.e. blocking off time to interview people, and I haven’t done that yet. What I can do is talk about what I know so far, and what I’m still wondering.

Context:

Scattering amplitudes are formulas used by particle physicists to make predictions. For a while, people would just calculate these when they needed them, writing down pages of mess that you could plug numbers into to get answers. However, forty years ago two physicists decided they wanted more, writing “we hope to obtain a simplified form for the answer, making our result not only an experimentalist’s, but a theorist’s delight.”

In their next paper, they managed to find that “theorist’s delight”: a simplified, intuitive-looking answer that worked for calculations involving any number of particles, summarizing many different calculations. Ten years later, a few people had started building on it, and ten years after that, the big shots started paying attention. A whole subfield, “amplitudeology”, grew from that seed, finding new forms of “theorist’s delight” in scattering amplitudes.

Each subfield has its own kind of “theory of victory”, its own concept for what kind of research is most likely to yield progress. In amplitudes, it’s these kinds of simplifications. When they work out well, they yield new, more efficient calculation techniques, yielding new messy results which can be simplified once more. To one extent or another, most of the field is chasing after those situations when simplification works out well.

That motivation shapes both the most ambitious projects of senior researchers, and the smallest student projects. Students often spend enormous amounts of time looking for a nice formula for something and figuring out how to generalize it, often on a question suggested by a senior researcher. These projects mostly serve as training, but occasionally manage to uncover something more impressive and useful, an idea others can build around.

I’m mentioning all of this, because as far as I can tell, what ChatGPT and the OpenAI internal model contributed here roughly lines up with the roles students have on amplitudes papers. In fact, it’s not that different from the role one of the authors, Alfredo Guevara, had when I helped mentor him during his Master’s.

Senior researchers noticed something unusual, suggested by prior literature. They decided to work out the implications, did some calculations, and got some messy results. It wasn’t immediately clear how to clean up the results, or generalize them. So they waited, and eventually were contacted by someone eager for a research project, who did the work to get the results into a nice, general form. Then everyone publishes together on a shared paper.

How impressed should you be?

I said, “as far as I can tell” above. What’s annoying is that this paper makes it hard to tell.

If you read through the paper, they mention AI briefly in the introduction, saying they used GPT-5.2 Pro to conjecture formula (39) in the paper, and an OpenAI internal model to prove it. The press release actually goes into more detail, saying that the humans found formulas (29)-(32), and GPT-5.2 Pro found a special case where it could simplify them to formulas (35)-(38), before conjecturing (39). You can get even more detail from an X thread by one of the authors, OpenAI Research Scientist Alex Lupsasca. Alex had done his PhD with another one of the authors, Andrew Strominger, and was excited to apply the tools he was developing at OpenAI to his old research field. So they looked for a problem, and tried out the one that ended up in the paper.

What is missing, from the paper, press release, and X thread, is any real detail about how the AI tools were used. We don’t have the prompts, or the output, or any real way to assess how much input came from humans and how much from the AI.

(We have more for their follow-up paper, where Lupsasca posted a transcript of the chat.)

Contra some commentators, I don’t think the authors are being intentionally vague here. They’re following business as usual. In a theoretical physics paper, you don’t list who did what, or take detailed account of how you came to the results. You clean things up, and create a nice narrative. This goes double if you’re aiming for one of the most prestigious journals, which tend to have length limits.

This business-as-usual approach is ok, if frustrating, for the average physics paper. It is, however, entirely inappropriate for a paper showcasing emerging technologies. For a paper that was going to be highlighted this prominently by OpenAI, the question of how they reached their conclusion is much more interesting than the results themselves. And while I wouldn’t ask them to go to the standards of an actual AI paper, with ablation analysis and all that jazz, they could at least have aimed for the level of detail of my final research paper, which gave samples of the AI input and output used in its genetic algorithm.

For the moment, then, I have to guess what input the AI had, and what it actually accomplished.

Let’s focus on the work done by the internal OpenAI model. The descriptions I’ve seen suggest that it started where GPT-5.2 Pro did, with formulas (29)-(32), but with a more specific prompt that guided what it was looking for. It then ran for 12 hours with no additional input, and both conjectured (39) and proved it was correct, providing essentially the proof that follows formula (39) in the paper.

Given that, how impressed should we be?

First, the model needs to decide to go to a specialized region, instead of trying to simplify the formula in full generality. I don’t know whether they prompted their internal model explicitly to do this. It’s not something I’d expect a student to do, because students don’t know what types of results are interesting enough to get published, so they wouldn’t be confident in computing only a limited version of a result without an advisor telling them it was ok. On the other hand, it is actually something I’d expect an LLM to be unusually likely to do, as a result of not managing to consistently stick to the original request! What I don’t know is whether the LLM proposed this for the right reason: that if you have the formula for one region, you can usually find it for other regions.

Second, the model needs to take formulas (29)-(32), write them in the specialized region, and simplify them to formulas (35)-(38). I’ve seen a few people saying you can do this pretty easily with Mathematica. That’s true, though not every senior researcher is comfortable doing that kind of thing, as you need to be a bit smarter than just using the Simplify[] command. Most of the people on this paper strike me as pen-and-paper types who wouldn’t necessarily know how to do that. It’s definitely the kind of thing I’d expect most students to figure out, perhaps after a couple of weeks of flailing around if it’s their first crack at it. The LLM likely would not have used Mathematica, but would have used SymPy, since these “AI scientist” setups usually can write and execute Python code. You shouldn’t think of this as the AI reasoning through the calculation itself, but it at least sounds like it was reasonably quick at coding it up.
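
To make the “smarter than Simplify[]” point concrete, here is a minimal SymPy sketch. The expression is a made-up toy, not one of the paper’s formulas, and the “specialized region” is assumed to be nothing more than both variables being positive. The names `expr`, `generic`, and `regional` are hypothetical. The point is only that a blanket simplify call stalls until you tell the system which region you are in.

```python
import sympy as sp

# Generic symbols: SymPy assumes nothing about their sign or reality.
x, y = sp.symbols("x y")

# Toy "messy" expression (hypothetical, not taken from the paper).
expr = sp.sqrt(x**2) + sp.Abs(x * y) / y

# A blanket simplify cannot safely collapse this for generic x and y.
generic = sp.simplify(expr)

# Specialize to the region x > 0, y > 0 by swapping in symbols that
# carry that assumption, then simplify again.
xp, yp = sp.symbols("x y", positive=True)
regional = sp.simplify(expr.subs({x: xp, y: yp}))

print(generic)   # still contains the square root and absolute value
print(regional)  # collapses to 2*x inside the region
```

Real amplitudes calculations need far more than a sign assumption, but the workflow has the same shape: restrict to a region, restate the expression there, and only then ask for a simplification.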

Then, the model needs to conjecture formula (39). This gets highlighted in the intro, but as many have pointed out, it’s pretty easy to do. If any non-physicists are still reading at this point, take a look:

Could you guess (39) from (35)-(38)?
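
For readers who want to try the “guess the pattern” step themselves, here is a toy sketch in Python. It has nothing to do with the actual formulas (35)-(39): the function `special_case` is a hypothetical stand-in for a result that gets messier as the number of particles grows, and `conjecture` is the closed form you might guess by eye from the first few values. Checking many cases is not a proof, which is why the paper still needs one.

```python
from fractions import Fraction


def special_case(n: int) -> Fraction:
    """Toy stand-in for a 'messy' n-dependent result: a sum of n terms."""
    terms = (Fraction(1, k * (k + 1)) for k in range(1, n + 1))
    return sum(terms, Fraction(0))


def conjecture(n: int) -> Fraction:
    """Closed form guessed from the first few special cases."""
    return Fraction(n, n + 1)


# Work out the first few cases exactly: 1/2, 2/3, 3/4, 4/5.
first_cases = [special_case(n) for n in range(1, 5)]
print(first_cases)

# Test the guess well beyond the cases it was guessed from.
holds = all(special_case(n) == conjecture(n) for n in range(1, 51))
print(holds)
```

Exact rational arithmetic matters here: with floating-point numbers, a correct conjecture could fail the equality check through rounding alone.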

After that, the paper goes over the proof that formula (39) is correct. Most of this proof isn’t terribly difficult, but the way it begins is actually unusual in an interesting way. The proof uses ideas from time-ordered perturbation theory, an old-fashioned way to do particle physics calculations. Time-ordered perturbation theory isn’t something any of the authors are known for using with regularity, but it has recently seen a resurgence in another area of amplitudes research, showing up for example in papers by Matthew Schwartz, a colleague of Strominger at Harvard.

If a student of Strominger came up with an idea drawn from time-ordered perturbation theory, that would actually be pretty impressive. It would mean that, rather than just learning from their official mentor, this student was talking to other people in the department and broadening their horizons, showing a kind of initiative that theoretical physicists value a lot.

From an LLM, though, this is not impressive in the same way. The LLM was not trained by Strominger, it did not learn specifically from Strominger’s papers. Its context suggested it was working on an amplitudes paper, and it produced an idea which would be at home in an amplitudes paper, just a different one than the one it was working on.

While not impressive, that capability may be quite useful. Academic subfields can often get very specialized and siloed. A tool that suggests ideas from elsewhere in the field could help some people broaden their horizons.

Overall, it appears that the twelve-hour OpenAI internal model run reproduced roughly what an unusually bright student would be able to contribute over the course of a several-month project. Like most student projects, you could find a senior researcher who could do the project much faster, maybe even faster than the LLM. But it’s unclear whether any of the authors could have: different senior researchers have different skillsets.

A stab at implications:

If we take all this at face value, it looks like OpenAI’s internal model was able to do a reasonably competent student project with no serious mistakes in twelve hours. If they started selling that capability, what would happen?

If it’s cheap enough, you might wonder if professors would choose to use the OpenAI model instead of hiring students. I don’t think this would happen, though: I think it misunderstands why these kinds of student projects exist in a theoretical field. Professors sometimes use students to get results they care about, but more often, the student’s interest is itself the motivation, with the professor wanting to educate someone, to empire-build, or just to take on their share of the department’s responsibilities. AI is only useful for this insofar as AI companies continue reaching out to these people to generate press releases: once this is routinely possible, the motivation goes away.

More dangerously, if it’s even cheaper, you could imagine students being tempted to use it. The whole point of a student project is to train and acculturate the student, to get them to the point where they have affection for the field and the capability to do more impressive things. You can’t skip that, but people are going to be tempted to.

And of course, there is the broader question of how much farther this technology can go. That’s the hardest to estimate here, since we don’t know the prompts used. So I don’t know if seeing this result tells us anything more about the bigger picture than we knew going in.

Remaining questions:

At the end of the day, there are a lot of things I still want to know. And if I do end up covering this professionally, they’re things I’ll ask.

  1. What was the prompt given to the internal model, and how much did it do based on that prompt?
  2. Was it really done in one shot, no retries or feedback?
  3. How much did running the internal model cost?
  4. Is this result likely to be useful? Are there things people want to calculate that this could make easier? Recursion relations it could seed? Is it useful for SCET somehow?
  5. How easy would it have been for the authors to do what the LLM did? What about other experts in the community?

Hypothesis: If AI Is Bad at Originality, It’s a Documentation Problem

Recently, a few people have asked me about this paper.

A couple weeks back, OpenAI announced a collaboration with a group of amplitudes researchers, physicists who study the types of calculations people do to make predictions at particle colliders. The amplitudes folks had identified an interesting loophole, finding a calculation that many would have expected to be zero actually gave a nonzero answer. They did the calculation for different examples involving more and more particles, and got some fairly messy answers. They suspected, as amplitudes researchers always expect, that there was a simpler formula, one that worked for any number of particles. But they couldn’t find it.

Then a former amplitudes researcher at OpenAI suggested that they use AI to find it.

“Use AI” can mean a lot of different things, and most of them don’t look much like the way the average person talks to ChatGPT. This was closer than most. They were using “reasoning models”, loops that try to predict the next few phrases in a “chain of thought” again and again and again. Using that kind of tool, they were able to find that simpler formula, and mathematically prove that it was correct.

A few of you are hoping for an in-depth post about what they did, and its implications. This isn’t that. I’m still figuring out if I’ll be writing that for an actual news site, for money, rather than free, for you folks.

Instead, I want to talk about a specific idea I’ve seen crop up around the paper.

See, for some, the existence of a result like this isn’t all that surprising.

Mathematicians have been experimenting with reasoning models for a bit, now. Recently, a group published a systematic study, setting the AI loose on a database of minor open problems proposed by the famously amphetamine-fueled mathematician Paul Erdős. The AI managed to tackle a few of the problems, sometimes by identifying existing solutions that had not yet been linked to the problem database, but sometimes by proofs that appeared to be new.

The Erdős problems solved by the AI were not especially important. Neither was the problem solved by the amplitudes researchers, as far as I can tell at this point.

But I get the impression the amplitudes problem was a bit more interesting than the Erdős problems. The difference, so far, has mostly been attributed to human involvement. This amplitudes paper started because human amplitudes researchers found an interesting loophole, and only after that used the AI. Unlike the mathematicians, they weren’t just searching a database.

This lines up with a general point, one people tend to make much less carefully. It’s often said that, unlike humans, AI will never be truly creative. It can solve mechanical problems, do things people have done before, but it will never be good at having truly novel ideas.

To me, that line of thinking goes a bit too far. I suspect it’s right on one level, that it will be hard for any of these reasoning models to propose anything truly novel. But if so, I think it will be for a different reason.

The thing is, creativity is not as magical as we make it out to be. Our ideas, scientific or artistic, don’t just come from the gods. They recombine existing ideas, shuffling them in ways more akin to randomness than miracle. They’re then filtered through experience, deep heuristics honed over careers. Some people are good at ideas, and some are bad at them. Having ideas takes work, and there are things people do to improve their ideas. Nothing about creativity suggests it should be impossible to mechanize.

However, a machine trained on text won’t necessarily know how to do any of that.

That’s because in science, we don’t write down our inspirations. By the time a result gets into a scientific paper or textbook, it’s polished and refined into a pure argument, cutting out most of the twists and turns that were an essential part of the creative process. Mathematics is even worse: most math papers don’t even mention the motivation behind the work, let alone the path taken to the paper.

This lack of documentation makes it hard for students, making success depend much more on having the right mentors to model good practices than on being able to pick them up from literature everyone can access. I suspect it makes it even harder for language models. And if today’s language model-based reasoning tools are bad at that crucial, human-seeming step of coming up with the right idea at the right time? I think that has more to do with this lack of documentation than with the fact that they’re “statistical parrots”.

Most Academics Don’t Choose Their Specialty

It’s there in every biography, and many interviews: the moment the scientist falls in love with an idea. It can be a kid watching ants in the backyard, a teen peering through a telescope, or an undergrad seeing a heart cell beat on a slide. It’s a story so common that it forms the heart of the public idea of a scientist: not just someone smart enough to understand the world, but someone passionate enough to dive into their one particular area above all else. It’s easy to think of it as a kind of passion most people never get to experience.

And it does happen, sometimes. But it’s a lot less common than you’d think.

I first started to suspect this as a PhD student. In the US, getting accepted into a PhD program doesn’t guarantee you an advisor to work with. You have to impress a professor to get them to spend limited time and research funding on you. In practice, the result was the academic analog of the dating scene. Students looked for who they might have a chance with, based partly on interest but mostly on availability and luck and rapport, and some bounced off many potential mentors before finding one that would stick.

Then, for those who continued to postdoctoral positions, the same story happened all over again. Now, they were applying for jobs, looking for positions where they were qualified enough and might have some useful contacts, with interest in the specific research topic at best a distant third.

Working in the EU, I’ve seen the same patterns, but offset a bit. Students do a Master’s thesis, and the search for a mentor there is messy and arbitrary in similar ways. Then for a PhD, they apply for specific projects elsewhere, and as each project is its own funded position the same job search dynamics apply.

The picture only really clicked for me, though, when I started doing journalism.

Nowadays, I don’t do science, I interview people about it. The people I interview are by and large survivors: people who got through the process of applying again and again and now are sitting tight in an in-principle permanent position. They’re people with a lot of freedom to choose what to do.

And so I often ask for that reason, that passion, that scientific love-at-first-sight moment: why do you study what you do? It’s a story that audiences love, and thus that editors love: it’s always a great way to begin a piece.

But surprisingly often, I get an unromantic answer. Why study this? Because it was available. Because in the Master’s, that professor taught the intro course. Because in college, their advisor had contacts with that lab to arrange a study project. Because that program accepted people from that country.

And I’ve noticed how even the romantic answers tend to be built on the unromantic ones. The professors who know how to weave a story, to self-promote and talk like a politician, they’ll be able to tell you about falling in love with something, sure. But if you read between the lines, you’ll notice where their anecdotes fall, how they trace a line through the same career steps that less adroit communicators admit were the real motivation.

There have been times I’ve thought that my problem was a lack of passion, that I wasn’t in love the same way other scientists were in love. I’ve even felt guilty, that I took resources and positions from people who were. There is still some truth in that guilt: I don’t think I had the same passion for my science as most of my colleagues.

But I appreciate more now that that passion is in part a story. We don’t choose our specialty, making some grand agentic move. Life chooses for us. And the romance comes in how you tell that story, after the fact.

The Timeline for Replacing Theorists Is Not Technological

Quanta Magazine recently published a reflection by Natalie Wolchover on the state of fundamental particle physics. The discussion covers a lot of ground, but one particular paragraph has gotten the lion’s share of the attention. Wolchover talked to Jared Kaplan, the ex-theoretical physicist turned co-founder of Anthropic, one of the foremost AI companies today.

Kaplan was one of Nima Arkani-Hamed’s PhD students, which adds an extra little punch.

There’s a lot to contest here. Is AI technology anywhere close to generating papers as good as the top physicists, or is that relegated to the sci-fi future? Does Kaplan really believe this, or is he just hyping up his company?

I don’t have any special insight into those questions, about the technology and Kaplan’s motivations. But I think that, even if we trusted him on the claim that AI could be generating Witten- or Nima-level papers in three years, that doesn’t mean it will replace theoretical physicists. That part of the argument isn’t a claim about the technology, but about society.

So let’s take the technological claims as given, and make them a bit more specific. Since we don’t have any objective way of judging the quality of scientific papers, let’s stick to the subjective. Today, there are a lot of people who get excited when Witten posts a new paper. They enjoy reading his papers, they find the insights inspiring, and they love the clarity of the writing and its tendency to clear up murky ideas. They also find the papers reliable: they very rarely have mistakes, and don’t leave important questions unanswered.

Let’s use that as our baseline, then. Suppose that Anthropic had an AI workflow that could reliably write papers that were just as appealing to physicists as Witten’s papers are, for the same reasons. What happens to physicists?

Witten himself is retired, which for an academic means you do pretty much the same thing you were doing before, but now paid out of things like retirement savings and pension funds, not an institute budget. Nobody is going to fire Witten: there’s no salary to fire him from. And unless he finds these developments intensely depressing and demoralizing (possible, but that very much depends on how this is presented), he’s not going to stop writing papers. Witten isn’t getting replaced.

More generally, though, I don’t think this directly results in anyone getting fired, or in universities trimming positions. The people making funding decisions aren’t just sitting on a pot of money, trying to maximize research output. They’ve got money to be spent on hires, and different pools of money to be spent on equipment, and the hires get distributed based on what current researchers at the institutes think is promising. Universities want to hire people who can get grants, to help fund the university, and absent rules about AI personhood, the AIs won’t be applying for grants.

Funding cuts might be argued for based on AI, but that will happen long before AI is performing at the Witten level. We already see this happening in other industries or government agencies, where groups that already want to cut funding are getting think tanks and consultants to write estimates that justify cutting positions, without actually caring whether those estimates are performed carefully enough to justify their conclusions. That can happen now, and doesn’t depend on technological progress.

AI could also replace theoretical physicists in another sense: the physicists themselves might use AI to do most of their work. That’s more plausible, but here adoption still heavily depends on social factors. Will people feel like they are being assessed on whether they can produce these Witten-level papers, and that only those who make them get hired, or funded? Maybe. But it will propagate unevenly, from subfield to subfield. Some areas will make their own rules forbidding AI content, and there will be battles and scandals and embarrassments aplenty. It won’t be a single switch, with the technology alone setting the timeline.

Finally, AI could replace theoretical physicists in another way, by people outside of academia filling the field so much that theoretical physicists have nothing more that they want to do. Some non-physicists are very passionate about physics, and some of those people have a lot of money. I’ve done writing work for one such person, whose foundation is now attempting to build an AI Physicist. If these AI Physicists get to Witten-level quality, they might start writing compelling paper after compelling paper. Those papers, though, will be specialized because of their origins. Much as philanthropists mostly fund the subfields they’ve heard of, philanthropist-funded AI will mostly target topics the people running the AI have heard are important. Much like physicists themselves adopting the technology, there will be uneven progress from subfield to subfield, inch by socially-determined inch.

In a hard-to-quantify area like progress in science, that’s all you can hope for. I suspect Kaplan got a bit of a distorted picture of how progress and merit work in theoretical physics. He studied with Nima Arkani-Hamed, who is undeniably exceptionally brilliant but also undeniably exceptionally charismatic. It must feel to a student of Nima’s that academia simply hires the best people, that it does whatever it takes to accomplish the obviously best research. But the best research is not obvious.

I think some of these people imagine a more direct replacement process, not arranged by topic and tastes, but by goals. They picture AI sweeping in and doing what theoretical physics was always “meant to do”: solve quantum gravity, and proceed to shower us with teleporters and antigravity machines. I don’t think there’s any reason to expect that to happen. If you just asked a machine to come up with the most useful model of the universe for a near-term goal, then in all likelihood it wouldn’t consider theoretical high-energy physics at all. If you see your AI as a tool to navigate between utopia and dystopia, theoretical physics might matter at some point: when your AI has devoured the inner solar system, is about to spread beyond the few light-minutes within which it can signal itself in real time, and has to commit to a strategy. But as long as the inner solar system remains un-devoured, I don’t think you’ll see an obviously successful theory of fundamental physics.

On Theories of Everything and Cures for Cancer

Some people are disappointed in physics. Shocking, I know!

Those people, when careful enough, clarify that they’re disappointed in fundamental physics: not the physics of materials or lasers or chemicals or earthquakes, or even the physics of planets and stars, but the physics that asks big fundamental questions, about the underlying laws of the universe and where they come from.

Some of these people are physicists themselves, or were once upon a time. These often have in mind other directions physicists should have gone. They think that, with attention and funding, their own ideas would have gotten us closer to our goals than the ideas that, in practice, got the attention and the funding.

Most of these people, though, aren’t physicists. They’re members of the general public.

It’s disappointment from the general public, I think, that feels the most unfair to physicists. The general public reads history books, and hears about a series of revolutions: Newton and Maxwell, relativity and quantum mechanics, and finally the Standard Model. They read science fiction books, and see physicists finding “theories of everything”, and making teleporters and antigravity engines. And they wonder what made the revolutions stop, and postponed the science fiction future.

Physicists point out, rightly, that this is an oversimplified picture of how the world works. Something happens between those revolutions, the kind of progress not simple enough to summarize for history class. People tinker away at puzzles, and make progress. And they’re still doing that, even for the big fundamental questions. Physicists know more about even faraway flashy topics like quantum gravity than they did ten years ago. And while physicists and ex-physicists can argue about whether that work is on the right path, it’s certainly farther along its own path than it was. We know things we didn’t know before, and progress continues to be made. We aren’t at the “revolution” stage yet, or even all that close. But most progress isn’t revolutionary, and no one can predict how often revolutions should take place. A revolution is never “due”, and thus can never be “overdue”.

Physicists, in turn, often don’t notice how normal this kind of reaction from the public is. They think people are being stirred up by grifters, or negatively polarized by excess hype, that fundamental physics is facing an unfair reaction only shared by political hot-button topics. But while there are grifters, and people turned off by the hype…this is also just how the public thinks about science.

Have you ever heard the phrase “a cure for cancer”?

Fiction is full of scientists working on a cure for cancer, or who discovered a cure for cancer, or were prevented from finding a cure for cancer. It’s practically a trope. It’s literally a trope.

It’s also a real thing people work on, in a sense. Many scientists work on better treatments for a variety of different cancers. They’re making real progress, even dramatic progress. As many whose loved ones have cancer know, it’s much more likely for someone with cancer to survive than it was, say, twenty years ago.

But those cures don’t meet the threshold for science fiction, or for the history books. They don’t move us, like the polio vaccine did, from a world where you know many people with a disease to a world where you know none. They don’t let doctors give you a magical pill, like in a story or a game, that instantly cures your cancer.

For the vast majority of medical researchers, that kind of goal isn’t realistic, and isn’t worth thinking about. The few that do pursue it work towards extreme long-term solutions, like periodically replacing everyone’s skin with a cloned copy.

So while you will run into plenty of media descriptions of scientists working on cures for cancer, you won’t see the kind of thing the public thinks of as an actual “cure for cancer”. And people are genuinely disappointed about this! “Where’s my cure for cancer?” is a complaint on the same level as “where’s my hovercar?” There are people who think that medical science has made no progress in fifty years, because after all those news articles, we still don’t have a cure for cancer.

I appreciate that there are real problems in what messages are being delivered to the public about physics, both from hypesters in the physics mainstream and grifters outside it. But put those problems aside, and a deeper issue remains. People understand the world as best they can, as a story. And the world is complicated and detailed, full of many people making incremental progress on many things. Compared to a story, the truth is always at a disadvantage.