Tag Archives: PublicPerception

Newsworthiness Bias

I had a chat about journalism recently, and came away with a realization about just how weird science journalism, in particular, is.

Journalists aren’t supposed to be cheerleaders. Journalism and PR have very different goals (which is why I keep those sides of my work separate). A journalist is supposed to be uncompromising, to write the truth even if it paints the source in a bad light.

Norms are built around this. Serious journalistic outlets usually don’t let sources see pieces before they’re published. The source doesn’t have the final say in how they’re portrayed: the journalist reserves the right to surprise them if justified. Investigative journalists can be superstars, digging up damning secrets about the powerful.

When a journalist starts a project, the piece might turn out positive, or negative. A politician might be the best path forward, or a disingenuous grifter. A business might be a great investment opportunity, or a total scam. A popular piece of art might be a triumph, or a disappointment.

And a scientific result?

It might be a fraud, of course. Scientific fraud does exist, and is a real problem. But it’s not common, really. Pick a random scientific paper, filter by papers you might consider reporting on in the first place, and you’re very unlikely to find a fraudulent result. Science journalists occasionally report on spectacularly audacious scientific frauds, or frauds in papers that have already made the headlines. But you don’t expect fraud in the average paper you cover.

It might be scientifically misguided: flawed statistics, a gap in a proof, a misuse of concepts. Journalists aren’t usually equipped to ferret out these issues, though. Instead, this is handled in principle by peer review, and in practice by the scientific community outside of the peer review process.

Instead, for a scientific result, the most common negative judgement isn’t that it’s a lie, or a mistake. It’s that it’s boring.

And certainly, a good science journalist can judge a paper as boring. But there is a key difference between doing that, and judging a politician as crooked or a popular work of art as mediocre. You can write an article about the lying candidate for governor, or the letdown Tarantino movie. But if a scientific result is boring, and nobody else has covered it…then it isn’t newsworthy.

In science, people don’t usually publish their failures, their negative results, their ho-hum obvious conclusions. That fills the literature with only the successes, a phenomenon called publication bias. It also means, though, that scientists try to make their results sound more successful, more important and interesting, than they actually are. Some of the folks fighting the replication crisis have coined a term for this: they call it importance hacking.

The same incentives apply to journalists, especially freelancers. Starting out, it was far from clear that I could make enough to live on. I felt like I had to make every lead count, to find a newsworthy angle on every story idea I could find, because who knew when I would find another one? Over time, I learned to balance that pull better. Now that I’m making most of my income from consulting instead, the pressure has eased almost entirely: there are things I’m tempted to importance-hack for the sake of friends, but nothing that I need to importance-hack to stay in the black.

Doing journalism on the side may be good for me personally at the moment, but it’s not really a model. Much like we need career scientists, even if their work is sometimes boring, we need career journalists, even if they’re sometimes pressured to overhype.

So if we don’t want to incentivize science journalists to be science cheerleaders, what can we do instead?

In science, one way to address publication bias is with pre-registered studies. A scientist sets out what they plan to test, and a journal agrees to publish the result, no matter what it is. You could imagine something like this for science journalism. I once proposed a recurring column where every month I would cover a random paper from arXiv.org, explaining what it set out to accomplish. I get why the idea was turned down, but I still think about it.

In journalism, the arts offer the closest parallel with a different approach. There are many negative reviews of books, movies, and music, and most of them merely accuse the art of being boring, not evil. These exist because they focus on popular works that people pay attention to anyway, so that any negative coverage has someone to convince. You could imagine applying this model to science, though it could be a bit silly. I’m envisioning a journalist who writes an article every time Witten publishes, rating some papers impressive and others disappointing, the same way a music journalist might cover every Taylor Swift album.

Neither of these models is really satisfactory. You could imagine an even more adversarial model, where journalists run around accusing random scientists of wasting the government’s money, but that seems dramatically worse.

So I’m not sure. Science is weird, and hard to accurately value: if we knew how much something mattered already, it would be engineering, not science. Journalism is weird: it’s public-facing research, where facing the public is the whole point. Their combination? Even weirder.

Microdosing Vibe Physics

Have you heard of “vibe physics”?

The phrase “vibe coding” came first. People have been using large language models like ChatGPT to write computer code (and not the way I did last year). They chat with the model, describing what they want to do and asking the model to code it up. You can guess the arguments around this, from people who are convinced AI is already better than a human programmer to people sure the code will be riddled with errors and vulnerabilities.

Now, there are people claiming not only to do vibe coding, but vibe physics: doing theoretical physics by chatting with an AI.

I think we can all agree that’s a lot less plausible. Some of the people who do vibe coding actually know how to code, but I haven’t seen anyone claiming to do vibe physics who actually understands physics. They’re tech entrepreneurs in the most prominent cases, random people on the internet otherwise. And while a lot of computer code is a minor tweak on something someone has already done, theoretical physics doesn’t work that way: if someone has already come up with your idea, you’re an educator, not a physicist.

Still, I think there is something to keep in mind about the idea of “vibe physics”, related to where physics comes from.

Here’s a question to start with: go back a bit before the current chat-bot boom. There were a ton of other computational and mathematical tools. Theorem-proving software could encode almost arbitrary mathematical statements in computer code and guarantee their accuracy. Statistical concepts like Bayes’ rule described how to reason from evidence to conclusions, not flawlessly but as well as anyone reliably can. We had computer simulations for a wealth of physical phenomena, and approximation schemes for many others.

With all those tools, why did we still have human physicists?

That is, go back before ChatGPT, before large language models. Why not just code up a program that starts with the evidence and checks which mathematical model fits it best?

In principle, I think you really could have done that. But you could never run that program. It would take too long.

Doing science 100% correctly and reliably is agonizingly slow, and prohibitively expensive. You cannot check every possible model, nor can you check those models against all the available data. You must simplify your problem, somehow, even if it makes your work less reliable, and sometimes incorrect.
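To get a feel for why, here’s a toy sketch of that brute-force program, purely my own illustration: enumerate candidate models (here, just polynomials), fit each one to the evidence, and score fit against complexity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "evidence": noisy samples of an unknown law (secretly quadratic).
x = np.linspace(-1, 1, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.1, x.size)

def score(degree):
    """Fit a polynomial of this degree; return a BIC-like score.

    Lower is better: the first term rewards a close fit, the second
    penalizes extra parameters."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    n, k = x.size, degree + 1
    return n * np.log(np.mean(resid**2)) + k * np.log(n)

# "Check every model" -- here, only ten candidates. The space of all
# mathematical models is unbounded, which is why nobody runs this
# search over physics as a whole.
scores = {d: score(d) for d in range(10)}
best = min(scores, key=scores.get)
print(best)
```

On this toy data the quadratic family tends to win. The point is that the honest version of that loop, over every model a physicist might write down and every piece of available data, could never finish; something has to prune it.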

And for most of history, humans have provided that simplification.

A physicist isn’t going to consider every possible model. They’re going to consider models that are similar to models they studied, or similar to models others propose. They aren’t going to consider all the evidence. They’ll look at some of the evidence, the evidence other physicists are talking about and puzzled by. They won’t simulate the consequences of their hypotheses in exhaustive detail. Instead, they’ll guess, based on their own experience, a calculation that captures what they expect to be relevant.

Human physicists provided the unreliable part of physics, the heuristics. The “vibe physics”, if you will.

AI is also unreliable, also heuristic. But humans still do this better than AI.

Part of the difference is specificity. These AIs are trained on all of human language, and then perhaps fine-tuned on a general class of problems. A human expert has spent their life fine-tuning on one specific type of problem, and their intuitions, their heuristics, their lazy associations and vibes, all will be especially well-suited to problems of that type.

Another part of the difference, though, is scale.

When you talk to ChatGPT, it follows its vibes into paragraphs of text. If you turn on reasoning features, you make it check its work in the background, but it still is generating words upon words inside, evaluating those words, then generating more.

I suspect, for a physicist, the “control loop” is much tighter. Many potential ideas get ruled out a few words in. Many aren’t even expressed in words at all, just concepts. A human physicist is ultimately driven by vibes, but they check and verify those vibes, based on their experience, at a much higher frequency than any current AI system can achieve.

(I know almost nothing about neuroscience. I’m just basing this on what it can feel like, to grope through a sentence and have it assemble itself as it goes into something correct, rather than having to go back and edit it.)

As companies get access to bigger datacenters, I suspect they’ll try to make this loop tighter, to get AI to do something closer to what humans seem to do. And then maybe AI will be able to do vibe physics.

Even then, though, you should not do vibe physics with the AI.

If you look at the way people describe doing vibe physics, they’re not using the AI for the vibes. They’re providing the vibes, and the AI is supposed to check things.

And that, I can confidently say, is completely ass-backwards. The AI is a vibe machine, it is great at vibes. Substituting your vibes will just make it worse. On the other hand, the AI is awful at checking things. It can find published papers sometimes, which can help you check something. But it is not set up to do the math, at least not unless the math can be phrased as a simple Python script or an IMO problem. In order to do anything like that, it has to call another type of software to verify. And you could have just used that software.

Theoretical physics is still not something everyone can do. Proposing a crackpot theory based on praise from ChatGPT and a list of papers it claims are related to your idea may feel more convincing than proposing one based on a few papers you found on Google and a couple of YouTube videos, which makes it more tempting. But it’s still proposing a crackpot theory. If you want to get involved, there’s still no substitute for actually learning how physics works.

Hype, Incentives, and Culture

To be clear, hype isn’t just lying.

We have a word for when someone lies to convince someone else to pay them, and that word is fraud. Most of what we call hype doesn’t reach that bar.

Instead, hype lives in a gray zone of affect and metaphor.

Some hype is pure affect. It’s about the subjective details, it’s about mood. “This is amazing” isn’t a lie, or at least, isn’t a lie you can check. They might really be amazed!

Some hype relies on metaphor. A metaphor can’t really be a lie, because a metaphor is always incomplete. But a metaphor can certainly be misleading. It can associate something minor with something important, or add emotional valence that isn’t really warranted.

Hype lives in a gray zone…and precisely because it lives in a gray zone, not everything that looks like hype is intended to be hype.

We think of hype as a consequence of incentives. Scientists hype their work to grant committees to get grants, and hype it more to the public for prestige. Companies hype their products to sell them, and their business plans to draw in investors.

But what looks like hype can also be language, and culture.

To many people in the rest of the world, the way Americans talk about almost everything is hype. Everything is bigger and nicer and cooler. This isn’t because Americans are under some sort of weird extra career incentives, though. It’s just how they expect to talk, how they learned to talk, how everyone around them normally talks.

Similarly, people in different industries are used to talking differently. Depending on what work you do, you interpret different metaphors in different ways. What might seem like an enthusiastic endorsement in one industry might be dismissive in another.

In the end, it takes two to communicate: a speaker, and an audience. Speakers want to get their audience excited, and, hopefully, if they don’t just want to hype, to help them understand something of the truth. That means understanding how the audience communicates enthusiasm, and how that differs from the speaker’s own habits. It means understanding language, and culture.

Bonus Info on the LHC and Beyond

Three of my science journalism pieces went up last week!

(This is a total coincidence. One piece was a general explainer “held in reserve” for a nice slot in the schedule, one was a piece I drafted in February, while the third I worked on in May. In journalism, things take as long as they take.)

The shortest piece, at Quanta Magazine, was an explainer about the two types of particles in physics: bosons, and fermions.

I don’t have a ton of bonus info here, because of how tidy the topic is, so just two quick observations.

First, I have the vague impression that Bose, bosons’ namesake, is “claimed” by both modern-day Bangladesh and India. I had friends in grad school who were proud of their fellow physicist from Bangladesh, but while he did his most famous work in Dhaka, he was born and died in Calcutta. Since both cities were part of British India for most of his life, these things get complicated.

Second, at the end of the piece I mention a “world on a wire” where fermions and bosons are the same. One example of such a “wire” is a string, like in string theory. One thing all young string theorists learn is “bosonization”: the idea that, in a 1+1-dimensional world like a string, you can re-write any theory with fermions as a theory with bosons, as well as vice versa. This has important implications for how string theory is set up.

Next, in Ars Technica, I had a piece about how LHC physicists are using machine learning to untangle the implications of quantum interference.

As a journalist, it’s really easy to fall into a trap where you give the main person you interview too much credit: after all, you’re approaching the story from their perspective. I tried to be cautious about this, only to be stymied when literally everyone else I interviewed praised Aishik Ghosh to the skies and credited him with being the core motivating force behind the project. So I shrugged my shoulders and followed suit. My understanding is that he has been appropriately rewarded and will soon be a professor at Georgia Tech.

I didn’t list the inventors of the NSBI method that Ghosh and co. used, but names like Kyle Cranmer and Johann Brehmer tend to get bandied about. It’s a method that was originally explored for a more general goal, trying to characterize what the Standard Model might be missing, while the work I talk about in the piece takes it in a new direction, closer to the typical things the ATLAS collaboration looks for.

I also did not say nearly as much as I was tempted to about how the ATLAS collaboration publishes papers, which was honestly one of the most intriguing parts of the story for me.

There is a huge amount of review that goes on inside ATLAS before one of their papers reaches the outside world, way more than there ever is in a journal’s peer review process. This is especially true for “physics papers”, where ATLAS is announcing a new conclusion about the physical world, as ATLAS’s reputation stands on those conclusions being reliable. That means starting with an “internal note” that’s hundreds of pages long (and sometimes over a thousand), an editorial board that manages the editing process, disseminating the paper to the entire collaboration for comment, and getting specific experts and institute groups within the collaboration to read through the paper in detail.

The process is a bit less onerous for “technical papers”, which describe a new method, not a new conclusion about the world. Still, it’s cumbersome enough that for those papers, often scientists don’t publish them “within ATLAS” at all, instead releasing them independently.

The results I reported on are special because they involved a physics paper and a technical paper, both within the ATLAS collaboration process. Instead of just working with partial or simplified data, they wanted to demonstrate the method on a “full analysis”, with all the computation and human coordination that requires. Normally, ATLAS wouldn’t go through the whole process of publishing a physics paper without basing it on new data, but this was different: the method had the potential to be so powerful that the more precise results would be worth stating as physics results alone.

(Also, for the people in the comments worried about training a model on old data: that’s not what they did. In physics, they don’t try to train a neural network to predict the results of colliders; such a model wouldn’t tell us anything useful. They run colliders to tell us whether what they see matches the Standard Model’s predictions. The neural network is trained to predict not what the experiment will say, but what the Standard Model will say, since we can usually only figure that out through time-consuming simulations. So it’s trained on (new) simulations, not on experimental data.)
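For the curious, the trick of using a trained classifier to stand in for expensive theory calculations can be sketched in miniature. This is my own toy illustration, not anything from the ATLAS analysis: simulate data under two theories, train a classifier to tell them apart, and read the likelihood ratio off the classifier’s odds.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two "simulators": samples drawn under two rival theories.
# (Stand-ins for expensive Standard Model simulations.)
x0 = rng.normal(0.0, 1.0, 5000)  # theory A: mean 0
x1 = rng.normal(1.0, 1.0, 5000)  # theory B: mean 1

# Train a tiny logistic-regression classifier to tell them apart,
# by plain gradient descent.
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(5000), np.ones(5000)])
w, b = 0.0, 0.0
for _ in range(2000):
    s = 1 / (1 + np.exp(-(w * x + b)))  # sigmoid output
    w -= 0.1 * np.mean((s - y) * x)     # gradient of cross-entropy loss
    b -= 0.1 * np.mean(s - y)

# A well-trained classifier's odds s/(1-s) estimate the likelihood
# ratio p_B(x)/p_A(x). Here the true log-ratio is x - 0.5, so w
# should land near 1 and b near -0.5.
print(w, b)
```

The real analyses do the same thing with far richer simulations and far bigger networks, but the logic is the same: the network learns to compare theories, not to predict what the collider will see.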

Finally, on Friday I had a piece in Physics Today about the European Strategy for Particle Physics (or ESPP), and in particular, plans for the next big collider.

Before I even started working on this piece, I saw a thread by Patrick Koppenburg on some of the 263 documents submitted for the ESPP update. While my piece ended up mostly focused on the big circular collider plan that most of the field is converging on (the future circular collider, or FCC), Koppenburg’s thread was more wide-ranging, meant to illustrate the breadth of ideas under discussion. Some of that discussion is about the LHC’s current plans, like its “high-luminosity” upgrade that will see it gather data at much higher rates up until 2040. Some of it is assessing broader concerns, which it may surprise some of you to learn includes sustainability: yes, there are more or less sustainable ways to build giant colliders.

The most fun part of the discussion, though, concerns all of the other collider proposals.

Some report progress on new technologies. Muon colliders are the most famous of these, but there are other proposals that would specifically help with a linear collider. I never did end up understanding what the Cool Copper Collider (C³) is all about, beyond that it lets you get more energy in a smaller machine without super-cooling. If you know about it, chime in in the comments! Meanwhile, plasma wakefield acceleration could accelerate electrons on a wave of plasma. The catch is that you want to collide electrons and positrons, and if you try to stick a positron in plasma it will happily annihilate with the first electron it meets. So what do you do? You go half-and-half, with the HALHF project: speed up the electron with a plasma wakefield, accelerate the positron normally, and have them meet in the middle.

Others are backup plans, or “budget options”, where CERN could get somewhat better measurements of some parameters if they can’t stir up the funding to measure the things they really want. They could put electrons and positrons into the LHC tunnel instead of building a new one, for a weaker machine that could still study the Higgs boson to some extent. They could use a similar experiment to produce Z bosons instead, which could serve as a bridge to a different collider project. Or they could collide the LHC’s proton beam with an electron beam, for an experiment that mixes advantages and disadvantages of some of the other approaches.

While working on the piece, one resource I found invaluable was this colloquium talk by Tristan du Pree, where he goes through the 263 submissions and digs up a lot of interesting numbers and commentary. Read the slides for quotes from the different national inputs and “solo inputs” with comments from particular senior scientists. I used that talk to get a broad impression of what the community was feeling, and it was interesting how well it was reflected in the people I interviewed. The physicist based in Switzerland felt the most urgency for the FCC plan, while the Dutch sources were more cautious, with other Europeans firmly in the middle.

Going over the FCC report itself, one thing I decided to leave out of the discussion was the cost-benefit analysis. There’s the potential for a cute sound-bite there, “see, the collider is net positive!”, but I’m pretty skeptical of the kind of analysis they’re doing there, even if it is standard practice for government projects. Between the biggest benefits listed being industrial benefits to suppliers and early-career researcher training (is a collider unusually good for either of those things, compared to other ways we spend money?) and the fact that about 10% of the benefit is the science itself (where could one possibly get a number like that?), it feels like whatever reasoning is behind this is probably the kind of thing that makes rigor-minded economists wince. I wasn’t able to track down the full calculation though, so I really don’t know, maybe this makes more sense than it looks.

I think a stronger argument than anything along those lines is a much more basic point, about expertise. Right now, we have a community of people trying to do something that is not merely difficult, but fundamental. This isn’t like sending people to space, where many of the engineering concerns will go away when we can send robots instead. This is fundamental engineering progress in how to manipulate the forces of nature (extremely powerful magnets, high voltages) and process huge streams of data. Pushing those technologies to the limit seems like it’s going to be relevant, almost no matter what we end up doing. That’s still not putting the science first and foremost, but it feels a bit closer to an honest appraisal of what good projects like this do for the world.

In Scientific American, With a Piece on Vacuum Decay

I had a piece in Scientific American last week. It’s paywalled, but if you’re a subscriber there you can see it, or you can buy the print magazine.

(I also had two pieces out in other outlets this week. I’ll be saying more about them…in a couple weeks.)

The Scientific American piece is about an apocalyptic particle physics scenario called vacuum decay. It’s a topic I covered last year in Quanta Magazine: an unlikely event in which the Higgs field, which gives fundamental particles their mass, changes value, suddenly making all other particles much more massive and changing physics as we know it. It’s a change that physicists think would start as a small bubble and spread at (almost) the speed of light, covering the universe.

What I wrote for Quanta was a short news piece covering a small adjustment to the calculation, one that made the chance of vacuum decay slightly more likely. (But still mind-bogglingly small, to be clear.)

Scientific American asked for a longer piece, and that gave me space to dig deeper. I was able to say more about how vacuum decay works, with a few metaphors that I think should make it a lot easier to understand. I also got to learn about some new developments, in particular, an interesting story about how tiny primordial black holes could make vacuum decay dramatically more likely.

One thing that was a bit too complicated to talk about was the puzzles involved in trying to calculate these chances. In the article, I mention a calculation of the chance of vacuum decay by a team including Matthew Schwartz. That calculation wasn’t the first to estimate the chance of vacuum decay, and it’s not the most recent update either. Instead, I picked it because Schwartz’s team approached the question in what struck me as a more reliable way, trying to cut through confusion by asking the most basic question you can in a quantum theory: given that now you observe X, what’s the chance that later you observe Y? Figuring out how to turn vacuum decay into that kind of question correctly is tricky (for example, you need to include the possibility that vacuum decay happens, then reverses, then happens again).

The calculations of black holes speeding things up didn’t work things out in quite as much detail. I like to think I’ve made a small contribution by motivating them to look at Schwartz’s work, which might spawn a more rigorous calculation in future. When I talked to Schwartz, he wasn’t even sure whether the picture of a bubble forming in one place and spreading at light speed is correct: he’d calculated the chance of the initial decay, but hadn’t found a similarly rigorous way to think about the aftermath. So even more than the uncertainty I talk about in the piece, the questions about new physics and probability, there is even some doubt about whether the whole picture really works the way we’ve been imagining it.

That makes for a murky topic! But it’s also a flashy one, a compelling story for science fiction and the public imagination, and yeah, another motivation to get high-precision measurements of the Higgs and top quark from future colliders! (If maybe not quite the way this guy said it.)

There Is No Shortcut to Saying What You Mean

Blogger Andrew Oh-Willeke of Dispatches from Turtle Island pointed me to an editorial in Science about the phrase scientific consensus.

The editorial argues that by referring to conclusions like the existence of climate change or vaccine safety as “the scientific consensus”, communicators have inadvertently fanned the flames of distrust. By emphasizing agreement between scientists, the phrase “scientific consensus” leaves open the question of how that consensus was reached. More conspiracy-minded people imagine shady backroom deals and corrupt payouts, while the more realistic blame incentives and groupthink. If you disagree with “the scientific consensus”, you may thus decide the best way forward is to silence those pesky scientists.

(The link to current events is left as an exercise to the reader, to comment on elsewhere. As usual, please no explicit discussion of politics on this blog!)

Instead of “scientific consensus”, the editorial suggests another term, convergence of evidence. The idea is that by centering the evidence instead of the scientists, the phrase would make it clear that these conclusions are justified by something more than social pressures, and will remain even if the scientists promoting them are silenced.

Oh-Willeke pointed me to another blog post responding to the editorial, which has a nice discussion of how the terms were used historically, showing their popularity over time. “Convergence of evidence” was more popular in the 1950’s, with a small surge in the late 90’s and early 2000’s. “Scientific consensus” rose in the 1980’s and 90’s, lining up with a time when social scientists were skeptical about science’s objectivity and wanted to explore the social reasons why scientists come to agreement. It then fell around the year 2000, before rising again, this time used instead by professional groups of scientists to emphasize their agreement on issues like climate change.

(The blog post then goes on to try to motivate the word “consilience” instead, on the rather thin basis that “convergence of evidence” isn’t interdisciplinary enough, which seems like a pretty silly objection. “Convergence” implies coming in from multiple directions, it’s already interdisciplinary!)

I appreciate “convergence of evidence”, it seems like a useful phrase. But I think the editorial is working from the wrong perspective, in trying to argue for which terms “we should use” in the first place.

Sometimes, as a scientist or an organization or a journalist, you want to emphasize evidence. Is it “a preponderance of evidence”, most but not all? Is it “overwhelming evidence”, evidence so powerful it is unlikely to ever be defeated? Or is it a “convergence of evidence”, evidence that came in slowly from multiple paths, each independent route making a coincidence that much less likely?

But sometimes, you want to emphasize the judgement of the scientists themselves.

Sometimes when scientists agree, they’re working not from evidence but from personal experience: feelings of which kinds of research pan out and which don’t, or shared philosophies that sit deep in how they conceive their discipline. Describing physicists’ reasons for expecting supersymmetry before the LHC turned on as a convergence of evidence would be inaccurate. Describing it as having been a (not unanimous) consensus gets much closer to the truth.

Sometimes, scientists do have evidence, but as a journalist, you can’t evaluate its strength. You note some controversy, you can follow some of the arguments, but ultimately you have to be honest about how you got the information. And sometimes, that will be because it’s what most of the responsible scientists you talked to agreed on: scientific consensus.

As science communicators, we care about telling the truth (as much as we ever can, at any rate). As a result, we cannot adopt blanket rules of thumb. We cannot say, “we as a community are using this term now”. The only responsible thing we can do is to think about each individual word. We need to decide what we actually mean, to read widely and learn from experience, to find which words express our case in a way that is both convincing and accurate. There’s no shortcut to that, no formula where you just “use the right words” and everything turns out fine. You have to do the work, and hope it’s enough.

Antimatter Isn’t Magic

You’ve heard of antimatter, right?

For each type of particle, there is a rare kind of evil twin with the opposite charge, called an anti-particle. When an anti-proton meets a proton, they annihilate each other in a giant blast of energy.

I see a lot of questions online about antimatter. One recurring theme is people asking a very general question: how does antimatter work?

If you’ve just heard the pop physics explanation, antimatter probably sounds like magic. What about antimatter lets it destroy normal matter? Does it need to touch? How long does it take? And what about neutral particles like neutrons?

You find surprisingly few good explanations of this online, but I can explain why. Physicists like me don’t expect antimatter to be confusing in this way, because to us, antimatter isn’t doing anything all that special. When a particle and an antiparticle annihilate, they’re doing the same thing that any other pair of particles do when they do…basically anything else.

Instead of matter and antimatter, let’s talk about one of the oldest pieces of evidence for quantum mechanics, the photoelectric effect. Scientists shone light at a metal, and found that if the wavelength of the light was short enough, electrons would spring free, causing an electric current. If the wavelength was too long, the metal wouldn’t emit any electrons, no matter how much light they shone. Einstein won his Nobel prize for the explanation: the light hitting the metal comes in particle-sized pieces, called photons, whose energy is determined by the wavelength of the light. If the individual photons don’t have enough energy to get an electron to leave the metal, then no electron will move, no matter how many photons you use.
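If you want to see the arithmetic, Einstein’s relation makes the threshold concrete: a photon of wavelength λ carries energy E = hc/λ, and only photons above the metal’s work function can free an electron. Here’s a quick check, using cesium’s work function of roughly 2.1 eV:

```python
# Einstein's photoelectric relation: a photon of wavelength lam carries
# E = h*c/lam, and only photons with E above the metal's work function
# can free an electron, no matter how many photons arrive.
H = 6.626e-34   # Planck constant, J*s
C = 2.998e8     # speed of light, m/s
EV = 1.602e-19  # joules per electronvolt

def photon_energy_ev(wavelength_nm):
    """Energy of a single photon, in electronvolts."""
    return H * C / (wavelength_nm * 1e-9) / EV

def ejects_electron(wavelength_nm, work_function_ev):
    return photon_energy_ev(wavelength_nm) > work_function_ev

# Cesium's work function is about 2.1 eV:
print(ejects_electron(500, 2.1))  # green light: ~2.48 eV per photon -> True
print(ejects_electron(700, 2.1))  # red light:  ~1.77 eV per photon -> False
```

At 500 nm each photon carries about 2.48 eV, enough to free an electron from cesium; at 700 nm each carries about 1.77 eV, and no amount of extra light helps.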

What happens to the photons after they hit the metal?

They go away. We say they are absorbed: an electron absorbs a photon and speeds up, gaining enough kinetic energy to escape the metal.

But we could just as easily say the photon is annihilated, if we wanted to.

In the photoelectric effect, you start with one electron and one photon, they come together, and you end up with one electron and no photon. In proton-antiproton annihilation, you start with a proton and an antiproton, they come together, and you end up with no protons or antiprotons, but instead “energy”…which, in the simplest case, means two photons.

That’s all that happens, deep down at the root of things. The laws of physics are rules about inputs and outputs. Start with these particles, they come together, you end up with these other particles. Sometimes one of the particles stays the same. Sometimes particles seem to transform, and different kinds of particles show up. Sometimes some of the particles are photons, and you think of them as “just energy”, and easy to absorb. But particles are particles, and nothing is “just energy”. Absorption, decay, annihilation: each one is just another type of what we call an interaction.

What makes annihilation of matter and antimatter seem unique comes down to charges. Interactions have to obey the laws of physics: they conserve energy, they conserve momentum, and they conserve charge.

So why can an antiproton and a proton annihilate to pure photons, while two protons can’t? A proton and an antiproton have opposite charges, which add up to zero, and a photon has zero charge. You could combine two protons to make something else, but whatever you made would have to have the same total charge as two protons.

What about neutrons? A neutron has no electric charge, so you might think it wouldn’t need antimatter. But a neutron has another type of charge, called baryon number. In order to annihilate one, you’d need an anti-neutron, which would still have zero electric charge but would have the opposite baryon number. (By the way, physicists have been making anti-neutrons since 1956.)
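That bookkeeping is simple enough to sketch in code. Here’s a toy checker, with charge assignments I’ve filled in myself (real interactions conserve more than these two numbers, energy and momentum among them): an interaction is allowed only if electric charge and baryon number balance between inputs and outputs.

```python
# Toy conservation-law checker (illustrative only; real physics tracks
# more charges and more rules, such as energy, momentum, and lepton number).
# Each particle carries a pair: (electric charge, baryon number).
CHARGES = {
    "proton":      (+1, +1),
    "antiproton":  (-1, -1),
    "neutron":     ( 0, +1),
    "antineutron": ( 0, -1),
    "photon":      ( 0,  0),
    "higgs":       ( 0,  0),
}

def totals(particles):
    """Sum electric charge and baryon number over a list of particles."""
    charge = sum(CHARGES[p][0] for p in particles)
    baryon = sum(CHARGES[p][1] for p in particles)
    return (charge, baryon)

def allowed(inputs, outputs):
    """An interaction can only happen if the charges balance."""
    return totals(inputs) == totals(outputs)

print(allowed(["proton", "antiproton"], ["photon", "photon"]))    # True
print(allowed(["proton", "proton"], ["photon", "photon"]))        # False: charge +2
print(allowed(["neutron", "antineutron"], ["photon", "photon"]))  # True
print(allowed(["neutron", "neutron"], ["photon", "photon"]))      # False: baryon number +2
print(allowed(["higgs"], ["photon", "photon"]))                   # True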

On the other hand, photons have no charge of any kind. Neither do Higgs bosons. So one Higgs boson can become two photons, without annihilating with anything else. Each of these particles can be called its own antiparticle: a photon is also an antiphoton, a Higgs is also an anti-Higgs.

Because particle-antiparticle annihilation follows the same rules as other interactions between particles, it also takes place via the same forces. When a proton and an antiproton annihilate each other, they typically do so via the electromagnetic force. This is why you end up with light, which is an electromagnetic wave. Like everything in the quantum world, this annihilation isn’t certain. It has a chance to happen, proportional to the strength of the interaction force involved.

What about neutrinos? They also appear to have a kind of charge, called lepton number. That might not really be a conserved charge, and neutrinos might be their own antiparticles, like photons. However, they are much less likely to annihilate than protons and antiprotons: they have no electric charge, so their interaction depends not on the electromagnetic force but on the much weaker weak nuclear force. A weaker force means a less likely interaction.

Antimatter might seem like the stuff of science fiction. But it’s not really harder to understand than anything else in particle physics.

(I know, that’s a low bar!)

It’s just interactions. Particles go in, particles go out. If it follows the rules, it can happen, if it doesn’t, it can’t. Antimatter is no different.

I’ve Felt Like a Hallucinating LLM

ChatGPT and its kin work by using Large Language Models, or LLMs.

A climate model is a pile of mathematics and code, honed on data from the climate of the past. Tell it how the climate starts out, and it will give you a prediction for what happens next.

Similarly, a language model is a pile of mathematics and code, honed on data from the texts of the past. Tell it how a text starts, and it will give you a prediction for what happens next.

We have a rough idea of what a climate model can predict. The climate has to follow the laws of physics, for example. Similarly, a text should follow the laws of grammar, the order of verbs and nouns and so forth. The creators of the earliest, smallest language models figured out how to do that reasonably well.
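A toy version of one of those early, small models, in the spirit of the old n-gram approach (my own illustration, far simpler than any real LLM), just counts which word most often follows which in its training text, then “predicts” by picking the most common continuation:

```python
from collections import Counter, defaultdict

# Toy bigram language model: learn which word most often follows which
# from a tiny training "corpus", then predict the most likely next word.
corpus = (
    "the hero sets out . the hero saves the world . "
    "the hero returns home . the queen falls ."
).split()

# Count, for each word, what follows it.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict(word):
    """The most common word to follow `word` in the training text."""
    return follows[word].most_common(1)[0][0]

print(predict("the"))   # 'hero': the most typical continuation in this corpus
print(predict("hero"))  # whichever of sets/saves/returns was counted first among ties
```

Everything a model like this “knows” is a statistical pattern in its training text, which is why it predicts the typical continuation, not necessarily the true one. Modern LLMs replace the counting with a vast neural network, but the question they answer is the same.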

Texts do more than just follow grammar, though. They can describe the world. And LLMs are both surprisingly good and surprisingly bad at that. They can do a lot when used right, answering test questions most humans would struggle with. But they also “hallucinate”, confidently saying things that have nothing to do with reality.

If you want to understand why large language models make both good predictions and bad, you shouldn’t just think about abstract “texts”. Instead, think about a specific type of text: a story.

Stories follow grammar, most of the time. But they also follow their own logic. The hero sets out, saves the world, and returns home again. The evil queen falls from the tower at the climax of the final battle. There are three princesses, and only the third can break the spell.

We aren’t usually taught this logic, like we’re taught physics or grammar. We learn it from experience, from reading stories and getting used to patterns. It’s the logic, not of how a story must go, but of how a story typically goes. And that question, of what typically comes next, is exactly the question LLMs are designed to answer.

It’s also a question we sometimes answer.

I was a theatre kid, and I loved improv in particular. Some of it was improv comedy, the games and skits you might have seen on “Whose Line is it Anyway?” But some of it was more…hippy stuff.

I’d meet up with a group on Saturdays. One year we made up a creation myth, half-rehearsed and half-improvised, a collection of gods and primordial beings. The next year we moved the story forward. Civilization had risen…and fallen again. We played a group of survivors gathered around a campfire, wary factions wondering what came next.

We plotted out characters ahead of time. I was the “villain”, or the closest we had to one. An enforcer of the just-fallen empire, the oppressor embodied. While the others carried clubs, staves, and farm implements, I was the only one with a real weapon: a sword.

(Plastic in reality, but the audience knew what to do.)

In the arguments and recriminations of the story, that sword set me apart, a constant threat that turned my character from contemptible to dangerous, that gave me a seat at the table even as I antagonized and stirred the pot.

But the story had another direction. The arguments pushed and pulled, and gradually the survivors realized that they would not survive if they did not put their grievances to rest, if they did not seek peace. So, one man stepped forward, and tossed his staff into the fire.

The others followed. One by one, clubs and sticks and menacing tools were cast aside. And soon, I was the only one armed.

If I had been behaving logically, if I had followed my character’s interests, I would have “won” there. I had gotten what I wanted: now there was no check on my power.

But that wasn’t what the story wanted. Improv is a game of fast decisions and fluid invention. We follow our instincts, and our instincts are shaped by experience. The stories of the past guide our choices, and must often be the only guide: we don’t have time to edit, or to second-guess.

And I felt the story, and what it wanted. It was a command that transcended will, that felt like it left no room for an individual actor making an individual decision.

I cast my sword into the fire.

The instinct that brought me to do that is the same instinct that guides authors when they say that their characters write themselves, when their story goes in an unexpected direction. It’s an instinct that can be tempered and counteracted, with time and effort, because it can easily lead to nonsense. It’s why every good book needs an editor, why improv can be as repetitive as it is magical.

And it’s been the best way I’ve found to understand LLMs.

An LLM telling a story tells a typical story, based on the data used to create it. In the same way, an LLM giving advice gives typical advice, to some extent in content but more importantly in form, advice that is confident and mentions things advice often mentions. An LLM writing a biography will write a typical biography, which may not be your biography, even if your biography was one of those used to create it, because it tries to predict how a biography should go based on all the other biographies. And all of these predictions and hallucinations are very much the kind of snap judgement that disarmed me.

These days, people are trying to build on top of LLMs and make technology that does more, that can edit and check its decisions. For the most part, they’re building these checks out of LLMs. Instead of telling one story, of someone giving advice on the internet, they tell two stories: the advisor and the editor, one giving the advice and one correcting it. They have to tell these stories many times, broken up into many parts, to approximate something other than the improv actor’s first instincts, and that’s why software that does this is substantially more expensive than more basic software that doesn’t.

I can’t say how far they’ll get. Models need data to work well, decisions need reliability to be good, computers need infrastructure to compute. But if you want to understand what’s at an LLM’s beating heart, think about the first instincts you have in writing or in theatre, in stories or in play. Then think about a machine that just does that.

I Have a Theory

“I have a theory,” says the scientist in the book. But what does that mean? What does it mean to “have” a theory?

First, there’s the everyday sense. When you say “I have a theory”, you’re talking about an educated guess. You think you know why something happened, and you want to check your idea and get feedback. A pedant would tell you you don’t really have a theory: you have a hypothesis. It’s “your” hypothesis, “your theory”, because it’s what you think happened.

The pedant would insist that “theory” means something else. A theory isn’t a guess, even an educated guess. It’s an explanation with evidence, tested and refined in many different contexts in many different ways, a whole framework for understanding the world, the most solid knowledge science can provide. Despite the pedant’s insistence, that isn’t the only way scientists use the word “theory”. But it is a common one, and a central one. You don’t really “have” a theory like this, though, except in the sense that we all do. These are explanations with broad consensus, things you either know of or don’t; they don’t belong to one person or another.

Except, that is, if one person takes credit for them. We sometimes say “Darwin’s theory of evolution”, or “Einstein’s theory of relativity”. In that sense, we could say that Einstein had a theory, or that Darwin had a theory.

Sometimes, though, “theory” doesn’t mean this standard official definition, even when scientists say it. And that changes what it means to “have” a theory.

For some researchers, a theory is a lens with which to view the world. This happens sometimes in physics, where you’ll find experts who want to think about a situation in terms of thermodynamics, or in terms of a technique called Effective Field Theory. It happens in mathematics, where some choose to analyze an idea with category theory not to prove new things about it, but just to translate it into category theory lingo. It’s most common, though, in the humanities, where researchers often specialize in a particular “interpretive framework”.

For some, a theory is a hypothesis, but also a pet project. There are physicists who come up with an idea (maybe there’s a variant of gravity with mass! maybe dark energy is changing!) and then focus their work around that idea. That includes coming up with ways to test whether the idea is true, showing the idea is consistent, and understanding what variants of the idea could be proposed. These ideas are hypotheses, in that they’re something the scientist thinks could be true. But they’re also ideas with many moving parts that motivate work by themselves.

Taken to the extreme, this kind of “having” a theory can go from healthy science to political bickering. Instead of viewing an idea as a hypothesis you might or might not confirm, it can become a platform to fight for. Instead of investigating consistency and proposing tests, you focus on arguing against objections and disproving your rivals. This sometimes happens in science, especially in more embattled areas, but it happens much more often with crackpots, where people who have never really seen science done can decide it’s time for their idea, right or wrong.

Finally, sometimes someone “has” a theory that isn’t a hypothesis at all. In theoretical physics, a “theory” can refer to a complete framework, even if that framework isn’t actually supposed to describe the real world. Some people spend time focusing on a particular framework of this kind, understanding its properties in the hope of getting broader insights. By becoming an expert on one particular theory, they can be said to “have” that theory.

Bonus question: in what sense do string theorists “have” string theory?

You might imagine that string theory is an interpretive framework, like category theory, with string theorists coming up with the “string version” of things others understand in other ways. This, for the most part, doesn’t happen. Without knowing whether string theory is true, there isn’t much benefit in just translating other things to string theory terms, and people for the most part know this.

For some, string theory is a pet project hypothesis. There is a community of people who try to get predictions out of string theory, or who investigate whether string theory is consistent. It’s not a huge number of people, but it exists. A few of these people can get more combative, or make unwarranted assumptions based on dedication to string theory in particular: for example, you’ll see the occasional argument that because something is difficult in string theory it must be impossible in any theory of quantum gravity. You see a spectrum in the community, from people for whom string theory is a promising project to people for whom it is a position that needs to be defended and argued for.

For the rest, the question of whether string theory describes the real world takes a back seat. They’re people who “have” string theory in the sense that they’re experts, and they use the theory primarily as a mathematical laboratory to learn broader things about how physics works. If you ask them, they might still say that they hypothesize string theory is true. But for most of these people, that question isn’t central to their work.

This Week, at FirstPrinciples.org

I’ve got a piece out this week in a new venue: FirstPrinciples.org, where I’ve written a profile of a startup called Vaire Computing.

Vaire works on reversible computing, an idea that tries to leverage thermodynamics to make a computer that wastes as little heat as possible. While I learned a lot of fun things that didn’t make it into the piece…I’m not going to share them this week! That’s because I’m working on another piece about reversible computing, focused on a different aspect of the field. When that piece is out I’ll have a big “bonus material post” talking about what I learned writing both pieces.

This week, instead, the bonus material is about FirstPrinciples.org itself, where you’ll be seeing me write more often in future. The First Principles Foundation was founded by Ildar Shar, a Canadian tech entrepreneur who thinks that physics is pretty cool. (Good taste, that!) His foundation aims to support scientific progress, especially in addressing the big, fundamental questions. They give grants, analyze research trends, build scientific productivity tools…and most relevantly for me, publish science news on their website, in a section called the Hub.

The first time I glanced through the Hub, it was clear that FirstPrinciples and I have a lot in common. Like me, they’re interested both in scientific accomplishments and in the human infrastructure that makes them possible. They’ve interviewed figures in the open access movement, like the creators of arXiv and SciPost. On the science side, they mix coverage of the mainstream and reputable with outsiders challenging the status quo, and hot news topics with explainers of key concepts. They’re still new, and still figuring out what they want to be. But from what I’ve seen so far, it looks like they’re going somewhere good.