Transforming Particles Are Probably Here to Stay

It can be tempting to imagine the world in terms of lego-like building-blocks. Atoms are made of protons, neutrons, and electrons stuck together, and protons and neutrons are in turn made of stuck-together quarks. And while atoms, despite the name, aren’t indivisible, you might think that if you look small enough you’ll find indivisible, unchanging pieces, the smallest building-blocks of reality.

Part of that is true. We might, at some point, find the smallest pieces, the things everything else is made of. (In a sense, it’s quite likely we’ve already found them!) But those pieces don’t behave like lego blocks. They aren’t indivisible and unchanging.

Instead, particles, even the most fundamental particles, transform! The most familiar example is beta decay, a radioactive process where a neutron turns into a proton, emitting an electron and an antineutrino. This process can be explained in terms of more fundamental particles: the neutron is made of three quarks, and one of those quarks transforms from a “down quark” to an “up quark”. But the explanation, as far as we can tell, doesn’t go any deeper. Quarks aren’t unchanging, they transform.

[Figure: Beta decay! Ignore the W, which is important but not for this post.]

There’s a suggestion I keep hearing, both from curious amateurs and from dedicated crackpots: why doesn’t this mean that quarks have parts? If a down quark can turn into an up quark, an electron, and an antineutrino, then why doesn’t that mean that a down quark contains an up quark, an electron, and an antineutrino?

The simplest reason is that this isn’t the only way a quark transforms. You can also have beta-plus decay, where an up quark transforms into a down quark, emitting a neutrino and the positively charged anti-particle of the electron, called a positron.

[Figure: Beta-plus decay. Also, ignore the directions of the arrows; that’s weird particle physics notation that doesn’t matter here.]

So to make your idea work, you’d somehow need each down quark to contain an up quark plus some other particles, and each up quark to contain a down quark plus some other particles.

Can you figure out some complicated scheme that works like that? Maybe. But there’s a deeper reason why this is the wrong path.

Transforming particles are part of a broader phenomenon, called particle production. Reactions in particle physics can produce new particles that weren’t there before. This wasn’t part of the earliest theories of quantum mechanics that described one electron at a time. But if you want to consider the quantum properties of not just electrons, but the electric field as well, then you need a more complete theory, called a quantum field theory. And in those theories, you can produce new particles. It’s as simple as turning on the lights: from a wiggling electron, you make light, which in a fully quantum theory is made up of photons. Those photons weren’t “part of” the electron to start with, they are produced by its motion.

If you want to avoid transforming particles, to describe everything in terms of lego-like building-blocks, then you want to avoid particle production altogether. Can you do this in a quantum field theory?

Actually, yes! But your theory won’t describe the whole of the real world.

In physics, we have examples of theories that don’t have particle production. These example theories have a property called integrability. They are theories we can “solve”, doing calculations that aren’t possible in ordinary theories, named after the fact that the oldest such theories in classical mechanics were solved using integrals.

Normal particle physics theories have conserved charges. Beta decay conserves electric charge: you start out with a neutral particle, and end up with one particle with positive charge and another with negative charge. It also conserves other things, like “electron-number” (the electron has electron-number one, the antineutrino that comes out with it has electron-number minus one), energy, and momentum.
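
To make that bookkeeping concrete, here is the full tally for beta decay, written out as a balance sheet:

```latex
n \;\to\; p + e^- + \bar{\nu}_e
\qquad
\begin{aligned}
\text{electric charge:} \quad & 0 \;=\; (+1) + (-1) + 0 \\
\text{electron-number:} \quad & 0 \;=\; 0 + (+1) + (-1)
\end{aligned}
```

Every line balances, which is exactly what “conserved” means.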

Integrable theories have those charges too, but they have more. In fact, they have an infinite number of conserved charges. As a result, you can show that in these theories it is impossible to produce new particles. It’s as if each particle’s existence is its own kind of conserved charge, one that can never be created or destroyed, so that each collision just rearranges the particles, never makes new ones.

But while we can write down these theories, we know they can’t describe the whole of the real world. In an integrable theory, when you build things up from the fundamental building-blocks, their energy follows a pattern. Compare the spacings between the energy levels of a bunch of different combinations, and you find a characteristic kind of statistical behavior called a Poisson distribution.

Look at the spacings between energy levels of atomic nuclei, and you’ll find a very different kind of behavior. It’s called a Wigner-Dyson distribution, and it indicates the opposite of integrability: chaos. Chaos is behavior that can’t be “solved” like integrable theories, behavior that has to be approached by simulations and approximations.
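
If you want to see the contrast for yourself, here is a minimal numerical sketch (my own toy illustration, assuming numpy is available): independently scattered energy levels give Poisson-distributed spacings, while the eigenvalues of a random symmetric matrix, a standard stand-in for a chaotic system, show the Wigner-Dyson pattern, with small spacings strongly suppressed.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000

# Integrable stand-in: independently scattered levels -> Poisson spacings.
poisson_levels = np.sort(rng.uniform(0.0, N, size=N))
poisson_spacings = np.diff(poisson_levels)
poisson_spacings /= poisson_spacings.mean()  # set the mean spacing to 1

# Chaotic stand-in: eigenvalues of a random symmetric (GOE) matrix
# -> Wigner-Dyson spacings, which avoid zero ("level repulsion").
A = rng.normal(size=(N, N))
goe_levels = np.linalg.eigvalsh((A + A.T) / 2)

# Crude unfolding: keep only the middle of the spectrum, where the
# density of levels is roughly constant.
middle = goe_levels[N // 4 : 3 * N // 4]
goe_spacings = np.diff(middle)
goe_spacings /= goe_spacings.mean()

# Poisson spacings pile up near zero; Wigner-Dyson spacings avoid it.
print("fraction of spacings below 0.1:")
print("  Poisson      :", np.mean(poisson_spacings < 0.1))  # about 0.1
print("  Wigner-Dyson :", np.mean(goe_spacings < 0.1))      # much smaller
```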

So if you really want there to be unchanging building-blocks, if you think that’s really essential? Then you should probably start looking at integrable theories. But I wouldn’t hold my breath if I were you: the real world seems pretty clearly chaotic, not integrable. And probably, that means particle production is here to stay.

Lack of Recognition Is a Symptom, Not a Cause

Science is all about being first. Once a discovery has been made, discovering the same thing again is redundant. At best, you can improve the statistical evidence…but for a theorem or a concept, you don’t even have that. This is why we make such a big deal about priority: the first person to discover something did something very valuable. The second, no matter how much effort and insight went into their work, did not.

Because priority matters, for every big scientific discovery there is a priority dispute. Read about science’s greatest hits, and you’ll find people who were left waiting in the wings despite their accomplishments, people who arguably found key ideas and made key discoveries earlier than the people who ended up famous. That’s why the idea Peter Higgs is best known for, the Higgs mechanism,

“is therefore also called the Brout–Englert–Higgs mechanism, or Englert–Brout–Higgs–Guralnik–Hagen–Kibble mechanism, Anderson–Higgs mechanism, Anderson–Higgs–Kibble mechanism, Higgs–Kibble mechanism by Abdus Salam and ABEGHHK’tH mechanism (for Anderson, Brout, Englert, Guralnik, Hagen, Higgs, Kibble, and ‘t Hooft) by Peter Higgs.”

Those who don’t get the fame don’t get the rewards. The scientists who get less recognition than they deserve get fewer grants and worse positions, losing out on the career outcomes that the person famous for the discovery gets, even if the less-recognized scientist made the discovery first.

…at least, that’s the usual story.

You can start to see the problem when you notice a contradiction: if a discovery has already been made, what would bring someone to re-make it?

Sometimes, people actually “steal” discoveries, finding something that isn’t widely known and re-publishing it without acknowledging the author. More often, though, the re-discoverer genuinely didn’t know. That’s because, in the real world, we don’t all know about a discovery as soon as it’s made. It has to be communicated.

At minimum, this means you need enough time to finish ironing out the kinks of your idea, write up a paper, and disseminate it. In the days before the internet, dissemination might involve mailing pre-prints to universities across the ocean. It’s relatively easy, in such a world, for two people to get started discovering the same thing, write it up, and even publish it before they learn about the other person’s work.

Sometimes, though, something gets rediscovered long after the original paper should have been available. In those cases, the problem isn’t time, it’s reach. Maybe the original paper was written in a way that hid its implications. Maybe it was published in a way only accessible to a smaller community: either a smaller part of the world, like papers that were only available to researchers in the USSR, or a smaller research community. Maybe the time hadn’t come yet, and the whole reason why the result mattered had yet to really materialize.

For a result like that, a lack of citations isn’t really the problem. Rather than someone who struggles because their work is overlooked, these are people whose work is overlooked, in a sense, because they are struggling: because their work is having a smaller impact on the work of others. Acknowledging them later can do something, but it can’t change the fact that this was work published for a smaller community, yielding smaller rewards.

And ultimately, it isn’t just priority we care about, but impact. While the first European to reach the New World might have been Erik the Red, we don’t call the massive exchange of plants and animals between the Old and New World the “Red Exchange”. Erik the Red being “first” matters much less, historically speaking, than Columbus changing the world. Similarly, in science, being the first to discover something is meaningless if that discovery doesn’t change how other people do science, and the person who manages to cause that change is much more valuable than someone who does the same work but doesn’t manage the same reach.

Am I claiming that it’s fair when scientists get famous for other people’s discoveries? No, it’s definitely not fair. It’s not fair because most of the reasons one might have lesser reach aren’t under one’s control. Soviet scientists (for the most part) didn’t choose to be based in the USSR. People who make discoveries before they become relevant don’t choose the time in which they were born. And while you can get better at self-promotion with practice, there’s a limited extent to which often-reclusive scientists should be blamed for their lack of social skills.

What I am claiming is that addressing this isn’t a matter of scrupulously citing the “original” discoverer after the fact. That’s a patch, and a weak one. If we want to get science closer to the ideal, where each discovery only has to be made once, then we need to work to increase reach for everyone. That means finding ways to speed up publication, to let people quickly communicate preliminary ideas with a wide audience and change the incentives so people aren’t penalized when others take up those ideas. It means enabling conversations between different fields and sub-fields, building shared vocabulary and opportunities for dialogue. It means making a community that rewards in-person hand-shaking less and careful online documentation more, so that recognition isn’t limited to the people with the money to go to conferences and the social skills to schmooze their way through them. It means anonymity when possible, and openness when we can get away with it.

Lack of recognition and redundant effort are both bad, and they both stem from the same failures to communicate. Instead of fighting about who deserves fame, we should work to make sure that science is truly global and truly universal. We can aim for a future where no-one’s contribution goes unrecognized, and where anything that is known to one is known to all.

The Mistakes Are the Intelligence

There’s a lot of hype around large language models, the foundational technology behind services like ChatGPT. Representatives of OpenAI have stated that, in a few years, these models might have “PhD-level intelligence”. On the other hand, at the time, ChatGPT couldn’t count the number of letter “r”s in the word “strawberry”. The models and the setups around them have improved, and OpenAI’s newer o1 model apparently now gets the correct 3 “r”s…but I’m sure it makes other silly mistakes, mistakes an intelligent human would never make.

The mistakes made by large language models are important, due to the way those models are used. If people are going to use them for customer service, writing transcripts, or editing grammar, they don’t want the models to introduce obvious screwups. (Maybe this means they shouldn’t use the models this way!)

But the temptation is to go further, to say that these mistakes are proof that these models are, and will always be, dumb, not intelligent. And that’s not the right way to think about intelligence.

When we talk about intelligent people, when we think about measuring things like IQ, we’re looking at a collection of different traits. These traits typically go together in humans: a human who is good at one will usually be good at the others. But from the perspective of computer science, these traits are very different.

Intelligent people tend to be good at following complex instructions. They can remember more, and reason faster. They can hold a lot in their head at once, from positions of objects to vocabulary.

These are all things that computers, inherently, are very good at. When Turing wrote down his abstract description of a computer, he imagined a machine with infinite memory, able to follow any instructions with perfect fidelity. Nothing could live up to that ideal, but modern computers are much closer to it than humans. “Computer” used to be a job, with rooms full of people (often women) hired to do calculations for scientific projects. We don’t do that any more, machines have made that work superfluous.

What’s more, the kind of processing a Turing machine does is probably the only way to reliably answer questions. If you want to make sure you get the correct answer every time, then it seems that you can’t do better than to use a sufficiently powerful computer.

But while computer-the-machine replaced computer-the-job, mathematician-the-job still exists. And that’s because not all intelligence is about answering questions reliably.

Alexander Grothendieck was a famous mathematician, known for his deep insights and powerful ideas. According to legend, when he was giving a talk that referred to prime numbers, someone in the audience asked him to name a specific prime. He named 57.

With a bit of work, any high-school student can figure out that 57, which equals 3 times 19, isn’t a prime number. A computer can easily figure out that 57 is not a prime number. Even ChatGPT knows that 57 is not a prime number.
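
Both checks really are just rule-following, the kind of thing a computer does with perfect reliability. A trivial illustration, in Python purely for concreteness:

```python
def is_prime(n: int) -> bool:
    """Trial division: slow for big numbers, but perfectly reliable."""
    if n < 2:
        return False
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            return False
    return True

print(is_prime(57))             # False, since 57 = 3 * 19
print("strawberry".count("r"))  # 3
```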

But this doesn’t mean that Grothendieck was dumber than a high school student, or dumber than ChatGPT. Grothendieck was using a different kind of intelligence, the heuristic kind.

Heuristics are unreliable reasoning. They’re processes that get the right answer some of the time, but not all of the time. Because of that, though, they don’t have the same limits as reliable computer programs. Pick the right situation and the right conditions, and a heuristic can give you an answer faster than you could possibly get by following reliable rules.
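
For a toy version of the tradeoff, compare the trial division above to the Fermat primality test, a classic heuristic: it is far faster on big numbers and usually right, but it can be fooled. The smallest composite that fools the base-2 version is 341 = 11 × 31:

```python
def fermat_probably_prime(n: int, base: int = 2) -> bool:
    """A heuristic: primes always pass this test, but so do a few
    rare composites, so a True answer is only probably right."""
    return pow(base, n - 1, n) == 1

print(fermat_probably_prime(57))   # False: correctly rejected
print(fermat_probably_prime(101))  # True: correctly accepted
print(fermat_probably_prime(341))  # True, but wrong! 341 = 11 * 31
```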

Intelligent humans follow instructions well, but they also have good heuristics. They solve problems creatively, sometimes problems that are very hard for computers to address. People like Grothendieck make leaps of mathematical reasoning, guessing at the right argument before they have completely fleshed out a proof. This kind of intelligence is error-prone: rely on it, and you might claim 57 is prime. But at the moment, it’s our only intellectual advantage over machines.

Ultimately, ChatGPT is an advance in language processing, and language is a great example of a task that resists reliable rules. Sentences don’t have definite meaning, we interpret what we read and hear in context, and sometimes our interpretation is wrong. Sometimes we hear words no-one actually said! It’s impossible, both for current technology and for the human brain, to process general text in a 100% reliable way. So large language models like GPT don’t do it reliably. They use an approximate model, a big complicated pile of rules tweaked over and over again until, enough of the time, they get the next word right in a text.
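
You can get a feel for the shape of that task with something absurdly small (a toy of my own, not how GPT actually works): a “bigram” model that counts which word followed which in its training text, then guesses the most common continuation. It is built to guess well on average, not to be right every time:

```python
from collections import Counter, defaultdict

# A hypothetical, absurdly small training text.
text = "the cat sat on the mat and the cat ran to the door".split()

# Count which word followed which.
follows = defaultdict(Counter)
for word, nxt in zip(text, text[1:]):
    follows[word][nxt] += 1

def predict_next(word: str) -> str:
    """Guess the most common continuation seen in training: often a
    reasonable guess, never a guaranteed-correct answer."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" (it followed "the" twice out of four)
```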

The kind of heuristic reasoning done by large language models is more effective than many people expected. Being able to predict the next word in a text unreliably also means you can write code unreliably, or count things unreliably, or do math unreliably. The models can’t do any of these things as well as an appropriately-chosen human, at least not with current resources.

But in the longer run, heuristic intelligence is precisely the type of intelligence we should aspire to…or fear. Right now, we hire humans to do intellectual work because they have good heuristics. If we could build a machine with equivalent or better heuristics for those tasks, then people would hire a lot fewer humans. And if you’re worried about AI taking over the world, you’re worried about AI coming up with shortcuts past our civilization’s defenses, tricks we couldn’t anticipate or plan against that destroy everything we care about. Those tricks can’t come from following rules: if they did, we could discover them just as easily. They would have to come from heuristics, sideways solutions that don’t work all the time but happen to work the one time that matters.

So yes, until the latest release, ChatGPT couldn’t tell you how many “r”s are in “strawberry”. Counting “r”s is something computers could already do, because it’s something that can be done by following reliable rules. It’s also something you can do easily, if you follow reliable rules. ChatGPT impresses people because it can do some of the things you do, that can’t be done with reliable rules. If technology like it has any chance of changing the world, those are the kinds of things it will have to be able to do.

The Bystander Effect for Reviewers

I probably came off last week as a bit of an extreme “journal abolitionist”. This week, I wanted to give a couple caveats.

First, as a commenter pointed out, the main journals we use in my field are run by nonprofits. Physical Review Letters, the journal where we publish five-page papers about flashy results, is run by the American Physical Society. The Journal of High-Energy Physics, where we publish almost everything else, is run by SISSA, the International School for Advanced Studies in Trieste. (SISSA does use Springer, a regular for-profit publisher, to do the actual publishing.)

The journals are also funded collectively, something I pointed out here before but might not have been obvious to readers of last week’s post. There is an agreement, SCOAP3, where research institutions band together to pay the journals. Authors don’t have to pay to publish, and individual libraries don’t have to pay for subscriptions.

And this is a lot better than the situation in other fields, yeah! Though I’d love to quantify how much. I haven’t been able to find a detailed breakdown, but SCOAP3 pays around 1200 EUR per article published. What I’d like to do (but not this week) is to compare this to what other fields pay, as well as to publishing that doesn’t have the same sort of trapped audience, and to online-only free journals like SciPost. (For example, publishing actual physical copies of journals at this point is sort of a vanity thing, so maybe we should compare costs to vanity publishers?)

Second, there’s reviewing itself. Even without traditional journals, one might still want to keep peer review.

What I wanted to understand last week was what peer review does right now, in my field. We read papers fresh off the arXiv, before they’ve gone through peer review. Authors aren’t forced to update the arXiv with the journal version of their paper: if they want to keep another version, even one that was rejected by the reviewers, they’re free to do so, and most of us wouldn’t notice. And the sort of in-depth review that happens in peer review also happens without it. When we have journal clubs and nominate someone to present a recent paper, or when we try to build on a result or figure out why it contradicts something we thought we knew, we go through the same kind of in-depth reading that (in the best cases) reviewers do.

But I think I’ve hit upon something review does that those kinds of informal things don’t: it gets us to speak up.

I presented at a journal club recently. I read through a bombastic new paper, figured out what I thought was wrong with it, and explained it to my colleagues.

But did I reach out to the author? No, of course not, that would be weird.

Psychologists talk about the bystander effect. If someone collapses on the street, and you’re the only person nearby, you’ll help. If you’re one of many, you’ll wait and see if someone else helps instead.

I think there’s a bystander effect for correcting people. If someone makes a mistake and publishes something wrong, we’ll gripe about it to each other. But typically, we won’t feel like it’s our place to tell the author. We might get into a frustrating argument, there wouldn’t be much in it for us, and it might hurt our reputation if the author is well-liked.

(People do speak up when they have something to gain, of course. That’s why when you write a paper, most of the people emailing you won’t be criticizing the science: they’ll be telling you you need to cite them.)

Peer review changes the expectations. Suddenly, you’re expected to criticize, it’s your social role. And you’re typically anonymous, you don’t have to worry about the consequences. It becomes a lot easier to say what you really think.

(It also becomes quite easy to say lazy stupid things, of course. This is why I like setups like SciPost, where reviews are made public even when the reviewers are anonymous. It encourages people to put some effort in, and it means that others can see that a paper was rejected for bad reasons and put less stock in the rejection.)

I think any new structure we put in place should keep this feature. We need to preserve some way to designate someone a critic, to give someone a social role that lets them let loose and explain why someone else is wrong. And having these designated critics around does help my field. The good criticisms get implemented in the papers, the authors put the new versions up on arXiv. Reviewing papers for journals does make our science better…even if none of us read the journal itself.

Why Journals Are Sticky

An older professor in my field has a quirk: every time he organizes a conference, he publishes all the talks in a conference proceeding.

In some fields, this would be quite normal. In computer science, where progress flows like a torrent, new developments are announced at conferences long before they have the time to be written up carefully as a published paper. Conference proceedings are summaries of what was presented at the conference, published so that anyone can catch up on the new developments.

In my field, this is rarer. A few results at each conference will be genuinely new, never-before-published discoveries. Most, though, are talks on older results, results already available online. Writing them up again in summarized form as a conference proceeding seems like a massive waste of time.

The cynical explanation is that this professor is doing this for the citations. Each conference proceeding one of his students publishes is another publication on their CV, another work that they can demand people cite whenever someone uses their ideas or software, something that puts them above others’ students without actually doing any extra scientific work.

I don’t think that’s how this professor thinks about it, though. He certainly cares about his students’ careers, and will fight for them to get cited as much as possible. But he asks everyone at the conference to publish a proceeding, not just his students. I think he’d argue that proceedings are helpful, that they can summarize papers in new ways and make them more accessible. And if they give everyone involved a bit more glory, if they let them add new entries to their CV and get fancy books on their shelves, so much the better for everyone.

My guess is, he really believes something like that. And I’m fairly sure he’s wrong.

The occasional conference proceeding helps, but only because it makes us more flexible. Sometimes, it’s important to let others know about a new result that hasn’t been published yet, and we let conference proceedings go into less detail than a full published paper, so this can speed things up. Sometimes, an old result can benefit from a new, clearer explanation, which normally couldn’t be published unless it contained a new result (or was framed as lecture notes). It’s good to have the option of a conference proceeding.

But there is absolutely no reason to have one for every single talk at a conference.

Between the cynical reason and the explicit reason, there’s the banal one. This guy insists on conference proceedings because they were more useful in the past, because they’re useful in other fields, and because he’s been doing them himself for years. He insists on them because to him, they’re a part of what it means to be a responsible scientist.

And people go along with it. Because they don’t want to get into a fight with this guy, certainly. But also because it’s a bit of extra work that could give a bit of a career boost, so what’s the harm?

I think something similar to this is why academic journals still work the way they do.

In the past, journals were the way physicists heard about new discoveries. They would get each edition in the mail, and read up on new developments. Journals needed to pay professional copyeditors and printers, so they needed money, and they got that money by being part of for-profit companies that sold shares to investors.

Now, though, physicists in my field don’t read journals. We publish our new discoveries online on a non-profit website, formatting them ourselves with software that uses the same programming skills we use in the rest of our professional lives. We then discuss the papers in email threads and journal club meetings. When a paper is wrong, or missing something important, we tell the author, and they fix it.

Oh, and then after that we submit the papers to the same for-profit journals and the same review process that we used to use before we did all this, listing the journals that finally accept the papers on our CVs.

Why do we still do that?

Again, you can be cynical. You can accuse the journals of mafia-ish behavior, you can tie things back to the desperate need to publish in high-ranked journals to get hired. But I think the real answer is a bit more innocent, and human, than that.

Imagine that you’re a senior person in the field. You may remember the time before we had all of these nice web-based publishing options, when journals were the best way to hear about new developments. More importantly than that, though, you’ve worked with these journals. You’ve certainly reviewed papers for them, everyone in the field does that, but you may have also served as an editor, tracking down reviewers and handling communication between the authors and the journal. You’ve seen plenty of cases where the journal mattered, where tracking down the right reviewers caught a mistake or shot down a crackpot’s ambitions, where the editing cleaned something up or made a work appear more professional. You think of the journals as having high standards, standards you have helped to uphold: when choosing between candidates for a job, you notice that one has several papers in Physical Review Letters, and remember papers you’ve rejected for not meeting what you intuited were that journal’s standards. To you, journals are a key part of being a responsible scientist.

Does any of that make journals worth it, though?

Well, that depends on costs. It depends on alternatives. It depends not merely on what the journals catch, but on how often they do it, and how much would have been caught on its own. It depends on whether the high standards you want to apply to job applicants are already being applied by the people who write their recommendation letters and establish their reputations.

And you’re not in a position to evaluate any of that, of course. Few people are, unless they spend a ton of time thinking about scientific publishing.

And thus, for the non-senior people, there’s not much reason to push back. One hears a few lofty speeches about Elsevier’s profits, and dreams about the end of the big for-profit journals. But most people aren’t cut out to be crusaders or reformers, especially when they signed up to be scientists. Most people are content not to annoy the most respected people in their field by telling them that something they’ve spent an enormous amount of time on is now pointless. Most people want to be seen as helpful by these people, to not slack off on work like reviewing that they argue needs doing.

And most of us have no reason to think we know that much better, anyway. Again, we’re scientists, not scientific publishing experts.

I don’t think it’s good practice to accuse people of cognitive biases. Everyone thinks they have good reasons to believe what they believe, and the only way to convince them is to address those reasons.

But the way we use journals in physics these days is genuinely baffling. It’s hard to explain, it’s the kind of thing people have been looking quizzically at for years. And this kind of explanation is the only one I’ve found that matches what I’ve seen. Between the cynical explanation and the literal arguments, there’s the basic human desire to do what seems like the responsible thing. That tends to explain a lot.

Why Quantum Gravity Is Controversial

Merging quantum mechanics and gravity is a famously hard physics problem. Explaining why merging quantum mechanics and gravity is hard is, in turn, a very hard science communication problem. The more popular descriptions tend to lead to misunderstandings, and I’ve posted many times over the years to chip away at those misunderstandings.

Merging quantum mechanics and gravity is hard…but despite that, there are proposed solutions. String Theory is supposed to be a theory of quantum gravity. Loop Quantum Gravity is supposed to be a theory of quantum gravity. Asymptotic Safety is supposed to be a theory of quantum gravity.

One of the great virtues of science and math is that we are, eventually, supposed to agree. Philosophers and theologians might argue to the end of time, but in math we can write down a proof, and in science we can do an experiment. If we don’t yet have the proof or the experiment, then we should reserve judgement. Either way, there’s no reason to get into an unproductive argument.

Despite that, string theorists and loop quantum gravity theorists and asymptotic safety theorists, famously, like to argue! There have been bitter, vicious, public arguments about the merits of these different theories, and decades of research doesn’t seem to have resolved them. To an outside observer, this makes quantum gravity seem much more like philosophy or theology than like science or math.

Why is there still controversy in quantum gravity? We can’t do quantum gravity experiments, sure, but if that were the problem physicists could just write down the possibilities and leave it at that. Why argue?

Some of the arguments are for silly aesthetic reasons, or motivated by academic politics. Some are arguments about which approaches are likely to succeed in future, which as always is something we can’t actually reliably judge. But the more justified arguments, the strongest and most durable ones, are about a technical challenge. They’re about something called non-perturbative physics.

Most of the time, when physicists use a theory, they’re working with an approximation. Instead of the full theory, they’re making an assumption that makes the theory easier to use. For example, if you assume that the velocity of an object is small, you can use Newtonian physics instead of special relativity. Often, physicists can systematically relax these assumptions, including more and more of the behavior of the full theory and getting a better and better approximation to the truth. This process is called perturbation theory.
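
A textbook example (mine, not anything specific to quantum gravity): expand the relativistic kinetic energy in powers of the small quantity v/c. The first term is the Newtonian answer, and each further term is a systematic correction:

```latex
E_{\text{kin}} = mc^2\left(\frac{1}{\sqrt{1 - v^2/c^2}} - 1\right)
               = \frac{1}{2}mv^2 + \frac{3}{8}\frac{mv^4}{c^2} + \cdots
```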

Other times, this doesn’t work well. The full theory has some trait that isn’t captured by the approximations, something that hides away from these systematic tools. The theory has some important aspect that is non-perturbative.
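
The standard cautionary example is an effect that scales like $e^{-1/g^2}$ for a small coupling g. Every derivative of that function vanishes at g = 0, so its entire perturbative expansion around that point is zero:

```latex
f(g) = e^{-1/g^2} \quad\Rightarrow\quad f(0) = f'(0) = f''(0) = \cdots = 0
```

No matter how many orders you compute, perturbation theory sees nothing of an effect like this. That is exactly the kind of thing “non-perturbative” means.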

Every proposed quantum gravity theory uses approximations like this. The theory’s proponents try to avoid these approximations when they can, but often they have to approximate and hope they don’t miss too much. The opponents, in turn, argue that the theory’s proponents are missing something important, some non-perturbative fact that would doom the theory altogether.

Asymptotic Safety is built on top of an approximation, one different from what other quantum gravity theorists typically use. To its proponents, work using their approximation suggests that gravity works without any special modifications, that the theory of quantum gravity is easier to find than it seems. Its opponents aren’t convinced, and think that the approximation is missing something important which shows that gravity needs to be modified.

In Loop Quantum Gravity, the critics think their approximation misses space-time itself. Proponents of Loop Quantum Gravity have been unable to prove that their theory, if you take all the non-perturbative corrections into account, doesn’t just roll up all of space and time into a tiny spiky ball. They expect that their theory should allow for a smooth space-time like we experience, but the critics aren’t convinced, and without being able to calculate the non-perturbative physics neither side can convince the other.

String Theory was founded on, and originally motivated by, perturbative approximations. Later, String Theorists figured out how to calculate some things non-perturbatively, often using other simplifications like supersymmetry. But core questions, like whether or not the theory allows a positive cosmological constant, seem to depend on non-perturbative calculations that the theory gives no instructions for how to do. Some critics don’t think there is a consistent non-perturbative theory at all, that the approximations String Theorists use don’t actually approximate anything. Even within String Theory, there are worries that the theory might try to resist approximation in odd ways, becoming more complicated whenever a parameter is small enough that you could use it to approximate something.

All of this would be less of a problem with real-world evidence. Many fields of science are happy to use approximations that aren’t completely rigorous, as long as those approximations have a good track record in the real world. In general though, we don’t expect evidence relevant to quantum gravity any time soon. Maybe we’ll get lucky, and studies of cosmology will reveal something, or an experiment on Earth will have a particularly strange result. But nature has no obligation to help us out.

Without evidence, though, we can still make mathematical progress. You could imagine someone proving that the various perturbative approaches to String Theory become inconsistent when stitched together into a full non-perturbative theory. Alternatively, you could imagine someone proving that a theory like String Theory is unique, that no other theory can do some key thing that it does. Either of these seems unlikely to come any time soon, and most researchers in these fields aren’t pursuing questions like that. But the fact the debate could be resolved means that it isn’t just about philosophy or theology. There’s a real scientific, mathematical controversy, one rooted in our inability to understand these theories beyond the perturbative methods their proponents use. And while I don’t expect it to be resolved any time soon, one can always hold out hope for a surprise.

Clickbait or Koan

Last month, I had a post about a type of theory that is, in a certain sense, “immune to gravity”. These theories don’t allow you to build antigravity machines, and they aren’t totally independent of the overall structure of space-time. But they do ignore the core thing most people think of as gravity, the curvature of space that sends planets around the Sun and apples to the ground. And while that trait isn’t something we can use for new technology, it has led to extremely productive conversations between mathematicians and physicists.

After posting, I had some interesting discussions on twitter. A few people felt that I was over-hyping things. Given all the technical caveats, does it really make sense to say that these theories defy gravity? Isn’t a title like “Gravity-Defying Theories” just clickbait?

Obviously, I don’t think so.

There’s a concept in education called inductive teaching. We remember facts better when they come in context, especially the context of us trying to solve a puzzle. If you try to figure something out, and then find an answer, you’re going to remember that answer better than if you were just told the answer from the beginning. There are some similarities here to the concept of a Zen koan: by asking questions like “what is the sound of one hand clapping?” a Zen master is supposed to get you to think about the world in a different way.

When I post with a counterintuitive title, I’m aiming for that kind of effect. I know that you’ll read the title and think “that can’t be right!” Then you’ll read the post, and hear the explanation. That explanation will stick with you better because you asked that question, because “how can that be right?” is the solution to a puzzle that, in that span of words, you cared about.

Clickbait is bad for two reasons. First, it sucks you into reading things that aren’t actually interesting. I write my blog posts because I think they’re interesting, so I hope I avoid that. Second, it can spread misunderstandings. I try to be careful about these, and I have some tips for how you can be too:

  1. Correct the misunderstanding early. If I’m worried a post might be misunderstood in a clickbaity way, I make sure that every time I post the link I include a sentence discouraging the misunderstanding. For example, for the post on Gravity-Defying Theories, before the link I wrote “No flying cars, but it is technically possible for something to be immune to gravity”. If I’m especially worried, I’ll also make sure that the first paragraph of the piece corrects the misunderstanding as well.
  2. Know your audience. This means both knowing the normal people who read your work, and how far something might go if it catches on. Your typical readers might be savvy enough to skip the misunderstanding, but if they latch on to the naive explanation immediately then the “koan” effect won’t happen. The wider your reach can be, the more careful you need to be about what you say. If you’re a well-regarded science news outlet, don’t write a title saying that scientists have built a wormhole.
  3. Have enough of a conclusion to be “worth it”. This is obviously a bit subjective. If your post introduces a mystery and the answer is that you just made some poetic word choice, your audience is going to feel betrayed, like the puzzle they were considering didn’t have a puzzly answer after all. Whatever you’re teaching in your post, it needs to have enough “meat” that the answer feels like a real discovery, like the reader did some real work to get there.

I don’t think I always live up to these, but I do try. And I think trying is better than the conservative option, of never having catchy titles that make counterintuitive claims. One of the most fun aspects of science is that sometimes a counterintuitive fact is actually true, and that’s an experience I want to share.

Does Science Require Publication?

Seen on Twitter:

[Embedded tweet: Yann LeCun arguing that if you do research and don’t publish it, it isn’t really science, and you will be forgotten.]

As is traditional, twitter erupted into dumb arguments over this. Some made fun of Yann LeCun for implying that Elon Musk will be forgotten, which despite any other faults of his seems unlikely. Science popularizer Sabine Hossenfelder pointed out that there are two senses of “publish” getting confused here: publish as in “make public” and publish as in “put in a scientific journal”. The latter tends to be necessary for scientists in practice, but is not required in principle. (The way journals work has changed a lot over just the last century!) The former, Sabine argued, is still 100% necessary.

Plenty of people on twitter still disagreed (this always happens). It got me thinking a bit about the role of publication in science.

When we talk about what science requires or doesn’t require, what are we actually talking about?

“Science” is a word, and like any word its meaning is determined by how it is used. Scientists use the word “science” of course, as do schools and governments and journalists. But if we’re getting into arguments about what does or does not count as science, then we’re wading into a philosophical problem, one that philosophers of science work on: understanding what counts as science and what doesn’t.

What do philosophers of science want? Many things, but a big one is to explain why science works so well. Over a few centuries, humanity went from understanding the world in terms of familiar materials and living creatures to decomposing them in terms of molecules and atoms and cells and proteins. In doing this, we radically changed what we were capable of: we built computers far beyond the reach of any blacksmith, and found cures for diseases that previously couldn’t even be told apart. And while other human endeavors have seen some progress over this time (democracy, human rights…), science’s accomplishment demands an explanation.

Part of that explanation, I think, has to include making results public. Alchemists were interested in many of the things later chemists were, and had started to get some valuable insights. But alchemists were fearful of what their knowledge would bring (especially the ones who actually thought they could turn lead into gold). They published almost only in code. As such, the pieces of progress they made didn’t build up, didn’t aggregate, didn’t become overall progress. It was only when a new scientific culture emerged, when natural philosophers and physicists and chemists started writing to each other as clearly as they could, that knowledge began to build on itself.

Some on twitter pointed out the example of the Manhattan project during World War II. A group of scientists got together and made progress on something almost entirely in secret. Does that not count as science?

I’m willing to bite this bullet: I don’t think it does! When the Soviets tried to replicate the bomb, they mostly had to start from scratch, aside from some smuggled atomic secrets. Today, nations trying to build their own bombs know more, but they still must reinvent most of it. We may think this is a good thing, we may not want more countries to make progress in this way. But I don’t think we can deny that it genuinely does slow progress!

At the same time, to contradict myself a bit: I think you can think of science that happens within a particular community. The scientists of the Manhattan project didn’t publish in journals the Soviets could read. But they did write internal reports, they did publish to each other. I don’t think science by its nature has to include the whole of humanity (if it does, then perhaps studying the inside of black holes really is unscientific). You probably can do science sticking to just your own little world. But it will be slower. Better, for progress’s sake, if you can include people from across the world.

The Impact of Jim Simons

The obituaries have been weirdly relevant lately.

First, a couple weeks back, Daniel Dennett died. Dennett was someone who could have had a huge impact on my life. To someone growing up combatively atheist in the early 2000s, Dennett seemed to be exploring every question that mattered: how the semblance of consciousness could come from non-conscious matter, how evolution gives rise to complexity, how to raise a new generation to grow beyond religion and think seriously about the world around them. I went to Tufts to get my bachelor’s degree based on a glowing description he wrote in the acknowledgements of one of his books, and after getting there, I asked him to be my advisor.

(One of three, because the US education system, like all good games, can be min-maxed.)

I then proceeded to be far too intimidated to have a conversation with him more meaningful than “can you please sign my registration form?”

I heard a few good stories about Dennett while I was there, and I saw him debate once. I went into physics for my PhD, not philosophy.

Jim Simons died on May 10. I never spoke to him at all, not even to ask him to sign something. But he had a much bigger impact on my life.

I began my PhD at SUNY Stony Brook with a small scholarship from the Simons Foundation. The university’s Simons Center for Geometry and Physics had just opened, a shining edifice of modern glass next to the concrete blocks of the physics and math departments.

For a student aspiring to theoretical physics, the Simons Center virtually shouted a message. It taught me that physics, and especially theoretical physics, was something prestigious, something special. That if I kept going down that path I could stay in that world of shiny new buildings and daily cookie breaks with the occasional fancy jar-based desserts, of talks by artists and a café with twenty-dollar lunches (half-price once a week for students, the only time we could afford it, and still about twice what we paid elsewhere on campus). There would be garden parties with sushi buffets and late conference dinners with cauliflower steaks and watermelon salads. If I was smart enough (and I longed to be smart enough), that would be my future.

Simons and his foundation clearly wanted to say something along those lines, if not quite as filtered by the stars in a student’s eyes. He thought that theoretical physics, and research more broadly, should be something prestigious. That his favored scholars deserved more, and should demand more.

This did have weird consequences sometimes. One year, the university charged us an extra “academic excellence fee”. The story we heard was that Simons had demanded Stony Brook increase its tuition in order to accept his donations, so that it would charge more similarly to more prestigious places. As a state university, Stony Brook couldn’t do that…but it could add an extra fee. And since PhD students got their tuition, but not fees, paid by the department, we were left with an extra dent in our budgets.

The Simons Foundation created Quanta Magazine. If the Simons Center used food to tell me physics mattered, Quanta delivered the same message to professors through journalism. Suddenly, someone was writing about us, not just copying press releases but with the research and care of an investigative reporter. And they wrote about everything: not just sci-fi stories and cancer cures but abstract mathematics and the space of quantum field theories. Professors who had spent their lives straining to capture the public’s interest suddenly were shown an audience that actually wanted the real story.

In practice, the Simons Foundation made its decisions through the usual experts and grant committees. But the way we thought about it, the decisions always had a Jim Simons flavor. When others in my field applied for funding from the Foundation, they debated what Simons would want: would he support research on predictions for the LHC and LIGO? Or would he favor links to pure mathematics, or hints towards quantum gravity? Simons Collaboration Grants have an enormous impact on theoretical physics, dwarfing many other sources of funding. A grant funds an army of postdocs across the US, shifting the priorities of the field for years at a time.

Denmark has big foundations that have an outsize impact on science. Carlsberg, Villum, and Novo Nordisk (a company worth more than Denmark’s entire GDP) all have foundations with a major influence on scientific priorities. But Denmark is a country of six million. It’s much harder to have that influence on a country of three hundred million. Despite that, Simons came surprisingly close.

While we did like to think of the Foundation’s priorities as Simons’, I suspect that it will continue largely on the same track without him. Quanta Magazine is editorially independent, and clearly puts its trust in the journalists that made it what it is today.

I didn’t know Simons, I don’t think I even ever smelled one of his famous cigars. Usually, that would be enough to keep me from writing a post like this. But, through the Foundation, and now through Quanta, he’s been there with me the last fourteen years. That’s worth a reflection, at the very least.

Getting It Right vs Getting It Done

With all the hype around machine learning, I occasionally get asked if it could be used to make predictions for particle colliders, like the LHC.

Physicists do use machine learning these days, to be clear. There are tricks and heuristics, ways to quickly classify different particle collisions and speed up computation. But if you’re imagining something that replaces particle physics calculations entirely, or even replaces the LHC itself, then you’re misunderstanding what particle physics calculations are for.

Why do physicists try to predict the results of particle collisions? Why not just observe what happens?

Physicists make predictions not in order to know what will happen in advance, but to compare those predictions to experimental results. If the predictions match the experiments, that supports existing theories like the Standard Model. If they don’t, then a new theory might be needed.

Those predictions certainly don’t need to be made by humans: most of the calculations are done by computers anyway. And they don’t need to be perfectly accurate: in particle physics, every calculation is an approximation. But the approximations used in particle physics are controlled approximations. Physicists keep track of what assumptions they make, and how they might go wrong. That’s not something you can typically do in machine learning, where you might train a neural network with millions of parameters. The whole point is to be able to check experiments against a known theory, and we can’t do that if we don’t know whether our calculation actually respects the theory.
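
Here is a minimal sketch of that difference, using a deliberately simple stand-in (a Taylor series, nothing like a real particle physics calculation): a controlled approximation comes with a bound on its own error, known without ever peeking at the true answer.

```python
import math

def sin_approx(x: float, terms: int) -> float:
    """Approximate sin(x) by the first few terms of its Taylor series."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

def error_bound(x: float, terms: int) -> float:
    """The 'controlled' part: a bound on the approximation's error,
    known in advance (for |x| < 1, the first omitted term of the
    alternating series bounds the error)."""
    return abs(x) ** (2 * terms + 1) / math.factorial(2 * terms + 1)

x = 0.5
print(abs(math.sin(x) - sin_approx(x, 3)) <= error_bound(x, 3))  # True
```

A trained neural network gives you a number too, but nothing analogous to `error_bound`: no statement, derived from the theory itself, of how far off that number can possibly be.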

That difference, between caring about the result and caring about how you got there, is a useful guide. If you want to predict how a protein folds in order to understand what it does in a cell, then you will find AlphaFold useful. If you want to confirm your theory of how protein folding happens, it will be less useful.

Some industries just want the final result, and can benefit from machine learning. If you want to know what your customers will buy, or which suppliers are cheating you, or whether your warehouse is moldy, then machine learning can be really helpful.

Other industries are trying, like particle physicists, to confirm that a theory is true. If you’re running a clinical trial, you want to be crystal clear about how the trial data turn into statistics. You, and the regulators, care about how you got there, not just about what answer you got. The same can be true for banks: if laws tell you you aren’t allowed to discriminate against certain kinds of customers for loans, you need to use a method where you know what traits you’re actually discriminating against.

So will physicists use machine learning? Yes, and more of it over time. But will they use it to replace normal calculations, or replace the LHC? No, that would be missing the point.