Tag Archives: DoingScience

Hypothesis: If AI Is Bad at Originality, It’s a Documentation Problem

Recently, a few people have asked me about this paper.

A couple weeks back, OpenAI announced a collaboration with a group of amplitudes researchers, physicists who study the types of calculations people do to make predictions at particle colliders. The amplitudes folks had identified an interesting loophole, finding that a calculation many would have expected to give zero actually gave a nonzero answer. They did the calculation for different examples involving more and more particles, and got some fairly messy answers. They suspected, as amplitudes researchers always do, that there was a simpler formula, one that worked for any number of particles. But they couldn’t find it.

Then a former amplitudes researcher at OpenAI suggested that they use AI to find it.

“Use AI” can mean a lot of different things, and most of them don’t look much like the way the average person talks to ChatGPT. This was closer than most. They were using “reasoning models”, loops that try to predict the next few phrases in a “chain of thought” again and again and again. Using that kind of tool, they were able to find that simpler formula, and mathematically prove that it was correct.

A few of you are hoping for an in-depth post about what they did, and its implications. This isn’t that. I’m still figuring out if I’ll be writing that for an actual news site, for money, rather than free, for you folks.

Instead, I want to talk about a specific idea I’ve seen crop up around the paper.

See, for some, the existence of a result like this isn’t all that surprising.

Mathematicians have been experimenting with reasoning models for a bit, now. Recently, a group published a systematic study, setting the AI loose on a database of minor open problems proposed by the famously amphetamine-fueled mathematician Paul Erdős. The AI managed to tackle a few of the problems, sometimes by identifying existing solutions that had not yet been linked to the problem database, but sometimes by proofs that appeared to be new.

The Erdős problems solved by the AI were not especially important. Neither was the problem solved by the amplitudes researchers, as far as I can tell at this point.

But I get the impression the amplitudes problem was a bit more interesting than the Erdős problems. The difference, so far, has mostly been attributed to human involvement. This amplitudes paper started because human amplitudes researchers found an interesting loophole, and only after that used the AI. Unlike the mathematicians, they weren’t just searching a database.

This lines up with a general point, one people tend to make much less carefully. It’s often said that, unlike humans, AI will never be truly creative. It can solve mechanical problems, do things people have done before, but it will never be good at having truly novel ideas.

To me, that line of thinking goes a bit too far. I suspect it’s right on one level, that it will be hard for any of these reasoning models to propose anything truly novel. But if so, I think it will be for a different reason.

The thing is, creativity is not as magical as we make it out to be. Our ideas, scientific or artistic, don’t just come from the gods. They recombine existing ideas, shuffling them in ways more akin to randomness than miracle. They’re then filtered through experience, deep heuristics honed over careers. Some people are good at ideas, and some are bad at them. Having ideas takes work, and there are things people do to improve their ideas. Nothing about creativity suggests it should be impossible to mechanize.

However, a machine trained on text won’t necessarily know how to do any of that.

That’s because in science, we don’t write down our inspirations. By the time a result gets into a scientific paper or textbook, it’s polished and refined into a pure argument, cutting out most of the twists and turns that were an essential part of the creative process. Mathematics is even worse: most math papers don’t even mention the motivation behind the work, let alone the path taken to the paper.

This lack of documentation makes it hard for students, making success much more a function of having the right mentors to model good practices, rather than being able to pick them up from literature everyone can access. I suspect it makes it even harder for language models. And if today’s language model-based reasoning tools are bad at that crucial, human-seeming step, of coming up with the right idea at the right time? I think that has more to do with this lack of documentation, than with the fact that they’re “statistical parrots”.

Most Academics Don’t Choose Their Specialty

It’s there in every biography, and many interviews: the moment the scientist falls in love with an idea. It can be a kid watching ants in the backyard, a teen peering through a telescope, or an undergrad seeing a heart cell beat on a slide. It’s a story so common that it forms the heart of the public idea of a scientist: not just someone smart enough to understand the world, but someone passionate enough to dive into their one particular area above all else. It’s easy to think of it as a kind of passion most people never get to experience.

And it does happen, sometimes. But it’s a lot less common than you’d think.

I first started to suspect this as a PhD student. In the US, getting accepted into a PhD program doesn’t guarantee you an advisor to work with. You have to impress a professor to get them to spend limited time and research funding on you. In practice, the result was the academic analog of the dating scene. Students looked for who they might have a chance with, based partly on interest but mostly on availability and luck and rapport, and some bounced off many potential mentors before finding one that would stick.

Then, for those who continued to postdoctoral positions, the same story happened all over again. Now, they were applying for jobs, looking for positions where they were qualified enough and might have some useful contacts, with interest in the specific research topic at best a distant third.

Working in the EU, I’ve seen the same patterns, but offset a bit. Students do a Master’s thesis, and the search for a mentor there is messy and arbitrary in similar ways. Then for a PhD, they apply for specific projects elsewhere, and as each project is its own funded position the same job search dynamics apply.

The picture only really clicked for me, though, when I started doing journalism.

Nowadays, I don’t do science, I interview people about it. The people I interview are by and large survivors: people who got through the process of applying again and again and now are sitting tight in an in-principle permanent position. They’re people with a lot of freedom to choose what to do.

And so I often ask for that reason, that passion, that scientific love at first sight moment: why do you study what you do? It’s a story that audiences love, and thus that editors love, and it’s always a great way to begin a piece.

But surprisingly often, I get an unromantic answer. Why study this? Because it was available. Because in the Master’s, that professor taught the intro course. Because in college, their advisor had contacts with that lab to arrange a study project. Because that program accepted people from that country.

And I’ve noticed how even the romantic answers tend to be built on the unromantic ones. The professors who know how to weave a story, to self-promote and talk like a politician, they’ll be able to tell you about falling in love with something, sure. But if you read between the lines, you’ll notice where their anecdotes fall, how they trace a line through the same career steps that less adroit communicators admit were the real motivation.

There have been times I’ve thought that my problem was a lack of passion, that I wasn’t in love the same way other scientists were in love. I’ve even felt guilty, that I took resources and positions from people who were. There is still some truth in that guilt: I don’t think I had the same passion for my science as most of my colleagues.

But I appreciate more now, that that passion is in part a story. We don’t choose our specialty, making some grand agentic move. Life chooses for us. And the romance comes in how you tell that story, after the fact.

A Paper With a Bluesky Account

People make social media accounts for their pets. Why not a scientific paper?

Anthropologist Ed Hagen made a Bluesky account for his recent preprint, “Menopause averted a midlife energetic crisis with help from older children and parents: A simulation study.” The paper’s topic itself is interesting (menopause is surprisingly rare among mammals, he has a plausible account as to why), but not really the kind of thing I cover here.

Rather, it’s his motivation that’s interesting. Hagen didn’t make the account out of pure self-promotion or vanity. Instead, he’s promoting it as a novel approach to scientific publishing. Unlike Twitter, Bluesky is based on an open, decentralized protocol. Anyone can host an account compatible with Bluesky on their own computer, and anyone with the programming know-how can build a computer program that reads Bluesky posts. That means that nothing actually depends on Bluesky, in principle: the users have ultimate control.

Hagen’s idea, then, is that this could be a way to fulfill the role of scientific journals without channeling money and power to for-profit publishers. If each paper is hosted on a scientist’s own site, the papers can link to each other by following each other. Scientists on Bluesky can follow or like the paper, or comment on and discuss it, creating a way to measure interest from the scientific community and aggregate reviews, two things journals are supposed to cover.

I must admit, I’m skeptical. The interface really seems poorly suited for this. Hagen’s paper’s account is called @menopause-preprint.edhagen.net. When he publishes another paper on menopause, what will he call it? How is he planning to keep track of interactions from other scientists with an account for every single paper? Won’t swapping between fifteen Bluesky accounts every morning get tedious? Or will he just do this with papers he wants to promote?

I applaud the general idea. Decentralized hosting seems like a great way to get around some of the problems of academic publishing. But this will definitely take a lot more work, if it’s ever going to be viable on a useful scale.

Still, I’ll keep an eye on it, and see if others give it a try. Stranger things have happened.

What You’re Actually Scared of in Impostor Syndrome

Academics tend to face a lot of impostor syndrome. Something about a job with no clear criteria for success, where you could always in principle do better and you mostly only see the cleaned-up, idealized version of others’ work, is a recipe for driving people utterly insane with fear.

The way most of us talk about that fear, it can seem like a cognitive bias, like a failure of epistemology. “Competent people think they’re less competent than they are,” the less-discussed half of the Dunning-Kruger effect.

(I’ve talked about it that way before. And, in an impostor-syndrome-inducing turn of events, I got quoted in a news piece in Nature about it.)

There’s something missing in that perspective, though. It doesn’t really get across how impostor syndrome feels. There’s something very raw about it, something that feels much more personal and urgent than an ordinary biased self-assessment.

To get at the core of it, let me ask a question: what happens to impostors?

The simple answer, the part everyone will admit to, is to say they stop getting grants, or stop getting jobs. Someone figures out they can’t do what they claim, and stops choosing them to receive limited resources. Pretty much anyone with impostor syndrome will say that they fear this: the moment that they reach too far, and the world decides they aren’t worth the money after all.

In practice, it’s not even clear that that happens. You might have people in your field who are actually thought of as impostors, on some level. People who get snarked about behind their back, people where everyone rolls their eyes when they ask a question at a conference and the question just never ends. People who are thought of as shiny storytellers without substance, who spin a tale for journalists but aren’t accomplishing anything of note. Those people…aren’t facing consequences at all, really! They keep getting the grants, they keep finding the jobs, and the ranks of people leaving for industry are instead mostly filled with those you respect.

Instead, I think what we fear when we feel impostor syndrome isn’t the obvious consequence, or even the real consequence, but something more primal. Primatologists and psychologists talk about our social brain, and the role of ostracism. They talk about baboons who piss off the alpha and get beat up and cast out of the group, how a social animal on their own risks starvation and becomes easy prey for bigger predators.

I think when we wake up in a cold sweat remembering how we had no idea what that talk was about, and were too afraid to ask, it’s a fear on that level that’s echoing around in our heads. That the grinding jags of adrenaline, the run-away-and-hide feeling of never being good enough, the desperate unsteadiness of trying to sound competent when you’re sure that you’re not and will get discovered at any moment…that’s not based on any realistic fears about what would happen if you got caught. That’s your monkey-brain, telling you a story drilled down deep by evolution.

Does that help? I’m not sure. If you manage to tell your inner monkey that it won’t get eaten by a lion if its friends stop liking it, let me know!

I Have a Theory

“I have a theory,” says the scientist in the book. But what does that mean? What does it mean to “have” a theory?

First, there’s the everyday sense. When you say “I have a theory”, you’re talking about an educated guess. You think you know why something happened, and you want to check your idea and get feedback. A pedant would tell you you don’t really have a theory, you have a hypothesis. It’s “your” hypothesis, “your theory”, because it’s what you think happened.

The pedant would insist that “theory” means something else. A theory isn’t a guess, even an educated guess. It’s an explanation with evidence, tested and refined in many different contexts in many different ways, a whole framework for understanding the world, the most solid knowledge science can provide. Despite the pedant’s insistence, that isn’t the only way scientists use the word “theory”. But it is a common one, and a central one. You don’t really “have” a theory like this, though, except in the sense that we all do. These are explanations with broad consensus, things you either know of or don’t. They don’t belong to one person or another.

Except, that is, if one person takes credit for them. We sometimes say “Darwin’s theory of evolution”, or “Einstein’s theory of relativity”. In that sense, we could say that Einstein had a theory, or that Darwin had a theory.

Sometimes, though, “theory” doesn’t mean this standard official definition, even when scientists say it. And that changes what it means to “have” a theory.

For some researchers, a theory is a lens with which to view the world. This happens sometimes in physics, where you’ll find experts who want to think about a situation in terms of thermodynamics, or in terms of a technique called Effective Field Theory. It happens in mathematics, where some choose to analyze an idea with category theory not to prove new things about it, but just to translate it into category theory lingo. It’s most common, though, in the humanities, where researchers often specialize in a particular “interpretive framework”.

For some, a theory is a hypothesis, but also a pet project. There are physicists who come up with an idea (maybe there’s a variant of gravity with mass! maybe dark energy is changing!) and then focus their work around that idea. That includes coming up with ways to test whether the idea is true, showing the idea is consistent, and understanding what variants of the idea could be proposed. These ideas are hypotheses, in that they’re something the scientist thinks could be true. But they’re also ideas with many moving parts that motivate work by themselves.

Taken to the extreme, this kind of “having” a theory can go from healthy science to political bickering. Instead of viewing an idea as a hypothesis you might or might not confirm, it can become a platform to fight for. Instead of investigating consistency and proposing tests, you focus on arguing against objections and disproving your rivals. This sometimes happens in science, especially in more embattled areas, but it happens much more often with crackpots, where people who have never really seen science done can decide it’s time for their idea, right or wrong.

Finally, sometimes someone “has” a theory that isn’t a hypothesis at all. In theoretical physics, a “theory” can refer to a complete framework, even if that framework isn’t actually supposed to describe the real world. Some people spend time focusing on a particular framework of this kind, understanding its properties in the hope of getting broader insights. By becoming an expert on one particular theory, they can be said to “have” that theory.

Bonus question: in what sense do string theorists “have” string theory?

You might imagine that string theory is an interpretive framework, like category theory, with string theorists coming up with the “string version” of things others understand in other ways. This, for the most part, doesn’t happen. Without knowing whether string theory is true, there isn’t much benefit in just translating other things to string theory terms, and people for the most part know this.

For some, string theory is a pet project hypothesis. There is a community of people who try to get predictions out of string theory, or who investigate whether string theory is consistent. It’s not a huge number of people, but it exists. A few of these people can get more combative, or make unwarranted assumptions based on dedication to string theory in particular: for example, you’ll see the occasional argument that because something is difficult in string theory it must be impossible in any theory of quantum gravity. You see a spectrum in the community, from people for whom string theory is a promising project to people for whom it is a position that needs to be defended and argued for.

For the rest, the question of whether string theory describes the real world takes a back seat. They’re people who “have” string theory in the sense that they’re experts, and they use the theory primarily as a mathematical laboratory to learn broader things about how physics works. If you ask them, they might still say that they hypothesize string theory is true. But for most of these people, that question isn’t central to their work.

Integration by Parts, Evolved

I posted what may be my last academic paper today, about a project I’ve been working on with Matthias Wilhelm for most of the last year. The paper is now online here. For me, the project has been a chance to broaden my horizons, learn new skills, and start to step out of my academic comfort zone. For Matthias, I hope it was grant money well spent.

I wanted to work on something related to machine learning, for the usual trendy employability reasons. Matthias was already working with machine learning, but was interested in pursuing a different question.

When is machine learning worthwhile? Machine learning methods are heuristics, unreliable methods that sometimes work well. You don’t use a heuristic if you have a reliable method that runs fast enough. But if all you have are heuristics to begin with, then machine learning can give you a better heuristic.

Matthias noticed a heuristic embedded deep in how we do particle physics, and guessed that we could do better. In particle physics, we use pictures called Feynman diagrams to predict the probabilities for different outcomes of collisions, comparing those predictions to observation to look for evidence of new physics. Each Feynman diagram corresponds to an integral, and for each calculation there are hundreds, thousands, or even millions of those integrals to do.

Luckily, physicists don’t actually have to do all those integrals. It turns out that most of them are related, by a slightly more advanced version of that calculus class mainstay, integration by parts. Using integration by parts you can solve a list of equations, finding out how to write your integrals in terms of a much smaller list.
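As a toy illustration (not taken from the paper), consider the simplest family of loop integrals, the one-loop massive tadpole in Euclidean signature, I(a) = ∫ d^D k / (k² + m²)^a. Setting the integral of a total derivative to zero gives the relation (D − 2a) I(a) + 2a m² I(a+1) = 0, so every I(a) is a known multiple of the single integral I(1). The sketch below applies that relation repeatedly; the function name and the choice of example are assumptions made for illustration.

```python
from fractions import Fraction


def tadpole_coefficient(a, dim, m2=1):
    """Return c such that I(a) = c * I(1) for the Euclidean tadpole family
    I(a) = integral d^dim k / (k^2 + m2)^a.

    Uses the integration-by-parts relation
        I(j + 1) = (2*j - dim) / (2*j*m2) * I(j),
    applied repeatedly starting from j = 1.
    """
    if a < 1:
        raise ValueError("this sketch only handles a >= 1")
    coeff = Fraction(1)
    for j in range(1, a):
        coeff *= Fraction(2 * j - dim, 1) / (2 * j * Fraction(m2))
    return coeff


# In one dimension the integrals are elementary, which gives a sanity check:
# for m2 = 1, I(1) = pi, I(2) = pi/2 and I(3) = 3*pi/8.
for power in (1, 2, 3):
    print(power, tadpole_coefficient(power, dim=1))
```

Real calculations involve many propagators and many such relations at once, which is why choosing which equations to generate becomes a problem in its own right.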

How big a list of equations do you need, and which ones? Twenty-five years ago, Stefano Laporta proposed a “golden rule” to choose, based on his own experience, and people have been using it (more or less, with their own tweaks) since then.

Laporta’s rule is a heuristic, with no proof that it is the best option, or even that it will always work. So we probably shouldn’t have been surprised when someone came up with a better heuristic. Watching talks at a December 2023 conference, Matthias saw a presentation by Johann Usovitsch on a curious new rule. The rule was surprisingly simple, just one extra condition on top of Laporta’s. But it was enough to reduce the number of equations by a factor of twenty.

That’s great progress, but it’s also a bit frustrating. Over almost twenty-five years, no-one had guessed this one simple change?

Maybe, thought Matthias and I, we need to get better at guessing.

We started out thinking we’d try reinforcement learning, a technique where a machine is trained by playing a game again and again, changing its strategy when that strategy brings it a reward. We thought we could have the machine learn to cut away extra equations, getting rewarded if it could cut more while still getting the right answer. We didn’t end up pursuing this very far before realizing another strategy would be a better fit.

What is a rule, but a program? Laporta’s golden rule and Johann’s new rule could both be expressed as simple programs. So we decided to use a method that could guess programs.

One method stood out for sheer trendiness and audacity: FunSearch. FunSearch is a type of algorithm called a genetic algorithm, which tries to mimic evolution. It makes a population of different programs, “breeds” them with each other to create new programs, and periodically selects out the ones that perform best. That’s not the trendy or audacious part, though: people have been doing that sort of genetic programming for a long time.
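The breed-and-select loop can be sketched in a few lines. This is a generic toy, not FunSearch and not the code used in the project: it evolves lists of bits rather than programs, and every name and parameter value here is an assumption chosen for illustration.

```python
import random


def evolve(fitness, genome_length=20, population_size=30,
           generations=40, mutation_rate=0.05, seed=0):
    """Toy genetic algorithm over bit lists.

    Returns the best genome found and the best score seen at the start
    of each generation.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        # Selection: keep the better-scoring half as parents.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            # "Breeding": one-point crossover between two parents.
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        # Parents survive unchanged, so the best score never decreases.
        population = parents + children
    best = max(population, key=fitness)
    return best, history


# Toy objective: maximize the number of ones in the genome.
best, history = evolve(fitness=sum)
print(sum(best), history[0], history[-1])
```

In genetic programming the genomes are programs (often expression trees) instead of bit lists, but the loop of scoring, selecting, recombining and mutating is the same.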

The trendy, audacious part is that FunSearch generates these programs with a Large Language Model, or LLM (the type of technology behind ChatGPT). Using an LLM trained to complete code, FunSearch presents the model with two programs labeled v0 and v1 and asks it to complete v2. In general, program v2 will have some traits from v0 and v1, but also a lot of variation due to the unpredictable output of LLMs. The inventors of FunSearch used this to contribute the variation needed for evolution, using it to evolve programs to find better solutions to math problems.
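A rough sketch of how such a prompt might be assembled is below. It is only an illustration of the v0/v1/v2 idea described above, not the prompt format of the actual FunSearch code; the function `build_prompt`, the name `rule`, and the example parent programs are all hypothetical.

```python
def build_prompt(program_a, program_b, name="rule", signature="(equation)"):
    """Assemble a code-completion prompt from two parent programs.

    Each parent is the source of a function called `name`. The parents are
    renamed to `<name>_v0` and `<name>_v1`, and the prompt ends with the
    header of `<name>_v2`, which the code model is left to complete.
    """
    versions = []
    for index, source in enumerate((program_a, program_b)):
        renamed = source.strip().replace(
            f"def {name}(", f"def {name}_v{index}(", 1)
        versions.append(renamed)
    header = (f"def {name}_v2{signature}:\n"
              f'    """Improved version of {name}_v1."""\n')
    return "\n\n".join(versions + [header])


# Two hypothetical parent rules deciding whether to keep an equation.
parent_a = '''
def rule(equation):
    return sum(equation) <= 4
'''
parent_b = '''
def rule(equation):
    return max(equation) <= 2
'''

prompt = build_prompt(parent_a, parent_b)
print(prompt)
```

Whatever the model writes after the final header becomes the body of the new candidate, which is then scored like any other member of the population.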

We decided to try FunSearch on our problem, modifying it a bit to fit the case. We asked it to find a shorter list of equations, giving a better score for a shorter list but a penalty if the list wasn’t able to solve the problem fully.
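A scoring function of that shape might look like the sketch below. The stand-in for “solves the problem fully” is an assumption: here a selection counts as complete if it has the same rank as the full list of linear equations, whereas the real criterion in the project concerned reducing Feynman integrals. The penalty value and all names are likewise illustrative.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of equal-length rows, via exact Gaussian elimination."""
    matrix = [[Fraction(x) for x in row] for row in rows]
    rank_so_far = 0
    n_cols = len(matrix[0]) if matrix else 0
    for col in range(n_cols):
        # Find a row at or below the current position with a nonzero entry.
        pivot = next((r for r in range(rank_so_far, len(matrix))
                      if matrix[r][col] != 0), None)
        if pivot is None:
            continue
        matrix[rank_so_far], matrix[pivot] = matrix[pivot], matrix[rank_so_far]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank_so_far + 1, len(matrix)):
            factor = matrix[r][col] / matrix[rank_so_far][col]
            matrix[r] = [x - factor * y
                         for x, y in zip(matrix[r], matrix[rank_so_far])]
        rank_so_far += 1
    return rank_so_far


def score(selected, all_equations, penalty=1000):
    """Shorter selections score higher; incomplete ones are penalized."""
    chosen = [all_equations[i] for i in selected]
    value = -len(chosen)
    if rank(chosen) < rank(all_equations):
        value -= penalty
    return value


equations = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 2],   # sum of the first two rows, so redundant
    [2, 0, 2],   # twice the first row, so redundant
]

for selection in ([0, 1, 2, 3], [0, 1], [0, 3]):
    print(selection, score(selection, equations))
```

The penalty is large compared with the length term so that an incomplete list can never outscore a complete one just by being short.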

Some tinkering and headaches later, it worked! After a few days and thousands of program guesses, FunSearch was able to find a program that reproduced the new rule Johann had presented. A few hours more, and it even found a rule that was slightly better!

But then we started wondering: do we actually need days of GPU time to do this?

An expert on heuristics we knew had insisted, at the beginning, that we try something simpler. The approach we tried then didn’t work. But after running into some people using genetic programming at a conference last year, we decided to try again, using a Python package they used in their work. This time, it worked like a charm, taking hours rather than days to find good rules.

This was all pretty cool, a great opportunity for me to cut my teeth on Python programming and its various attendant skills. And it’s been inspiring, with Matthias drawing together more people interested in seeing just how much these kinds of heuristic methods can do there. I should be clear though, that so far I don’t think our result is useful. We did better than the state of the art on an example, but only slightly, and in a way that I’d guess doesn’t generalize. And we needed quite a bit of overhead to do it. Ultimately, while I suspect there’s something useful to find in this direction, it’s going to require more collaboration, both with people using the existing methods who know better what the bottlenecks are, and with experts in these, and other, kinds of heuristics.

So I’m curious to see what the future holds. And for the moment, happy that I got to try this out!

Physics Gets Easier, Then Harder

Lack of Recognition Is a Symptom, Not a Cause

The Bystander Effect for Reviewers

I probably came off last week as a bit of an extreme “journal abolitionist”. This week, I wanted to give a couple caveats.

First, as a commenter pointed out, the main journals we use in my field are run by nonprofits. Physical Review Letters, the journal where we publish five-page papers about flashy results, is run by the American Physical Society. The Journal of High Energy Physics, where we publish almost everything else, is run by SISSA, the International School for Advanced Studies in Trieste. (SISSA does use Springer, a regular for-profit publisher, to do the actual publishing.)

The journals are also funded collectively, something I pointed out here before but might not have been obvious to readers of last week’s post. There is an agreement, SCOAP3, where research institutions band together to pay the journals. Authors don’t have to pay to publish, and individual libraries don’t have to pay for subscriptions.

And this is a lot better than the situation in other fields, yeah! Though I’d love to quantify how much. I haven’t been able to find a detailed breakdown, but SCOAP3 pays around 1200 EUR per article published. What I’d like to do (but not this week) is to compare this to what other fields pay, as well as to publishing that doesn’t have the same sort of trapped audience, and to online-only free journals like SciPost. (For example, publishing actual physical copies of journals at this point is sort of a vanity thing, so maybe we should compare costs to vanity publishers?)

Second, there’s reviewing itself. Even without traditional journals, one might still want to keep peer review.

What I wanted to understand last week was what peer review does right now, in my field. We read papers fresh off the arXiv, before they’ve gone through peer review. Authors aren’t forced to update the arXiv with the journal version of their paper. If they want to post another version, even one that was rejected by the reviewers, they’re free to do so, and most of us wouldn’t notice. And the sort of in-depth review that happens in peer review also happens without it. When we have journal clubs and nominate someone to present a recent paper, or when we try to build on a result or figure out why it contradicts something we thought we knew, we go through the same kind of in-depth reading that (in the best cases) reviewers do.

But I think I’ve hit upon something review does that those kinds of informal things don’t. It gets us to speak up about it.

I presented at a journal club recently. I read through a bombastic new paper, figured out what I thought was wrong with it, and explained it to my colleagues.

But did I reach out to the author? No, of course not, that would be weird.

Psychologists talk about the bystander effect. If someone collapses on the street, and you’re the only person nearby, you’ll help. If you’re one of many, you’ll wait and see if someone else helps instead.

I think there’s a bystander effect for correcting people. If someone makes a mistake and publishes something wrong, we’ll gripe about it to each other. But typically, we won’t feel like it’s our place to tell the author. We might get into a frustrating argument, there wouldn’t be much in it for us, and it might hurt our reputation if the author is well-liked.

(People do speak up when they have something to gain, of course. That’s why when you write a paper, most of the people emailing you won’t be criticizing the science: they’ll be telling you you need to cite them.)

Peer review changes the expectations. Suddenly, you’re expected to criticize: it’s your social role. And you’re typically anonymous, so you don’t have to worry about the consequences. It becomes a lot easier to say what you really think.

(It also becomes quite easy to say lazy stupid things, of course. This is why I like setups like SciPost, where reviews are made public even when the reviewers are anonymous. It encourages people to put some effort in, and it means that others can see that a paper was rejected for bad reasons and put less stock in the rejection.)

I think any new structure we put in place should keep this feature. We need to preserve some way to designate someone a critic, to give someone a social role that lets them let loose and explain why someone else is wrong. And having these designated critics around does help my field. The good criticisms get implemented in the papers, and the authors put the new versions up on arXiv. Reviewing papers for journals does make our science better…even if none of us read the journal itself.

HAMLET-Physics 2024

Back in January, I announced I was leaving France and leaving academia. Since then, it hasn’t made much sense for me to go to conferences, even the big conference of my sub-field or the conference I organized.

I did go to a conference this week, though. I had two excuses:

The conference was here in Copenhagen, so no travel required.
The conference was about machine learning.

HAMLET-Physics, or How to Apply Machine Learning to Experimental and Theoretical Physics, had the additional advantage of having an amusing acronym. Thanks to generous support by Carlsberg and the Danish Data Science Academy, they could back up their choice by taking everyone on a tour of Kronborg (better known in the English-speaking world as Elsinore).

This conference’s purpose was to bring together physicists who use machine learning, machine learning-ists who might have something useful to say to those physicists, and other physicists who don’t use machine learning yet but have a sneaking suspicion they might have to at some point. As a result, the conference was super-interdisciplinary, with talks by people addressing very different problems with very different methods.

Interdisciplinary conferences are tricky. It’s easy for the different groups of people to just talk past each other: everyone shows up, gives the same talk they always do, socializes with the same friends they always meet, then leaves.

There were a few talks that hit that mold, and were so technical only a few people understood. But most were better. The majority of the speakers did really well at presenting their work in a way that would be understandable and even exciting to people outside their field, while still having enough detail that we all learned something. I was particularly impressed by Thea Aarrestad’s keynote talk on Tuesday, a really engaging view of how machine learning can be used under the extremely tight time constraints LHC experiments need to decide whether to record incoming data.

For the social aspect, the organizers had a cute/gimmicky/machine-learning-themed solution. Based on short descriptions and our public research profiles, they clustered attendees, plotting the connections between them. They then used ChatGPT to write conversation prompts between any two people on the basis of their shared interests. In practice, this turned out to be amusing but totally unnecessary. We were drawn to speak to each other not by conversation prompts, but by a drive to learn from each other. “Why do you do it that way?” was a powerful conversation-starter, as was “what’s the best way to do this?” Despite the different fields, the shared methodologies gave us strong reasons to talk, and meant that people were very rarely motivated to pick one of ChatGPT’s “suggestions”.

Overall, I got a better feeling for how machine learning is useful in physics (and am planning a post on that in future). I also got some fresh ideas for what to do myself, and a bit of a picture of what the future holds in store.

4 gravitons

Stories about physics from someone who's been there