Monthly Archives: May 2026

Doing Things Well Is an International Activity

In the US, funding agencies seem to be increasingly opposed to an often inevitable feature of good science: international collaboration. Scientists have been told by officials at the National Institutes of Health that they need to remove mention of foreign collaborators from progress reports, or that they need to avoid such collaborations to begin with. At NASA, officials have told scientists that rather than just avoiding funding work in China, they should actively avoid collaborating with Chinese researchers. And a recently introduced bill would make that restriction more explicit.

I have a general policy against discussing concrete political issues on this blog, so I’m not going to dig into the details of who’s doing what here, how far it’s going or how novel it is. That policy extends to the comments. If you mention specific laws, politicians, or political parties, I will delete your comment.

I do want to say something more general, though. I think people often underestimate just how important international collaboration is.

I’ve talked before about how scientific specialization spreads scientists around the world. Scientists want to work with people who work on their specific interests, and there are often only a few people that fit that description. So people move across the world, creating centers of expertise.

More than that, though, essentially any activity, done well, is done internationally. The better you want to perform, the more likely it is that the best collaborator will be someone in another country.

People don’t notice this as much as they could, because they’re used to the exceptions. Popular art is often siloed by language and cultural references. Sports are intentionally set up as competitions between regions and nations, and militaries compete as a practical necessity. But without those exceptions, international competition wins out. The best doctor, the best classical musician, and the best businessperson for a job can’t be expected to come from one country or another. Those fields, like science, are international.

When that internationalism is weak, it’s a warning sign. Without that drive to succeed on an international stage, scientists get lazy. There are countries with a history of academic cronyism, where universities were run more on interpersonal politics than scholarly merit, cozy fiefdoms where prominent academics dole out positions. To combat this, policymakers work to make their research systems more international. They explicitly ask about international collaborations and participation in international conferences in grant applications, not to discourage them, but to encourage them: to reward academics who show merit on the international stage and break up lazy patronage networks.

It worries me that it sounds like some US policymakers want to do the opposite. People are increasingly worried about bias and groupthink in the sciences, and increasingly mad that scientists could be wasting the public’s money to maintain a cushy lifestyle. International collaboration is how you hold scientists to account, how you force them to compete and show their merit. If you drop that, academia is going to get a whole lot worse.

ArXiv Will Ban You for Hallucinated References

Thomas Dietterich, Chair of the Computer Science section of the preprint server arXiv.org, recently clarified the site’s policies towards “hallucinated” citations and other signs of careless use of AI in a post on X. If your paper contains a citation to a paper that doesn’t actually exist, or shows other signs that you didn’t read it before posting, such as leftover chatbot commentary (the example he gave was “here is a 200 word summary; would you like me to make any changes?”), then you can get banned from arXiv for one year. Even after that year you’d be on a kind of “probation”, and would need to show that your next few papers had been accepted by peer-reviewed journals before posting them.

At the risk of saying the obvious, this is a good idea! arXiv isn’t peer review, it isn’t meant to judge the value of the papers it hosts. But it still needs to be a useful place for scientists to post their papers, which is why they try to keep spam and irrelevant content to a minimum. If you don’t actually endorse the content of a paper, you shouldn’t post it in the first place.

That said, the whole existence of hallucinated citations on arXiv feels a little silly. It makes sense for academic journals and preprint servers in other fields. But arXiv was the first site of its kind for a reason. Its users, physicists, mathematicians, and computer scientists, don’t need much hand-holding when it comes to computers. Papers submitted to arXiv aren’t typically written in Word. They’re written in a document-writing language called LaTeX, which lets users make decently-formatted papers without help from a journal. Physicist-written code may be terrible by any reasonable criteria…but it exists, much more universally than, for example, biologist-written code.

This extends to citations. In my old field, there is a database called INSPIRE that updates automatically from arXiv. Click on a paper, and a handy “cite” link gives you standardized citations in several formats, ready to copy and paste into your LaTeX code. Nearly every citation in my papers is copied from there. The ones that aren’t are either from other fields where I didn’t know of that style of database, or things that haven’t been published (this can be manuscripts in preparation, or personal communications).

All of this, though, feels like a lot less than what the field could be doing. In a world where almost everyone posts their papers to the same website, and almost everyone has at least a rudimentary understanding of programming…why are people still writing citations in free-form text in the first place? Why aren’t citations built in to the submitted papers on arXiv, automatically linked to the papers they cite? Why don’t we have a setup where, except for a small number of “special” citations, every citation is built so that it automatically goes to a real paper, and gives a clear error message if it doesn’t? In short, why are hallucinated citations even possible?
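To make the idea concrete, here is a minimal sketch in Python of what "every citation goes to a real paper, or gives a clear error" could look like. It assumes citations are stored as bare arXiv identifiers rather than free-form text. The `KNOWN_PAPERS` dictionary, the `CitationError` class, and both function names are hypothetical, and the identifiers and titles are made up; a real system would look papers up in the archive itself or in a database like INSPIRE, and nothing here describes an existing interface.

```python
import re

# New-style arXiv identifiers look like YYMM.NNNN or YYMM.NNNNN,
# optionally followed by a version suffix such as "v2".
ARXIV_ID = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")

# Hypothetical stand-in for a lookup against the real archive.
# The identifiers and titles below are made up for illustration.
KNOWN_PAPERS = {
    "9901.00001": "Placeholder paper A",
    "9901.00002": "Placeholder paper B",
}


class CitationError(Exception):
    """Raised when a citation does not point at a known paper."""


def resolve(citation_id, known_papers=KNOWN_PAPERS):
    """Return the title for a citation, or raise a clear error."""
    match = ARXIV_ID.match(citation_id)
    if match is None:
        raise CitationError(f"{citation_id!r} is not a well-formed arXiv identifier")
    base_id = match.group(1)  # identifier without the version suffix
    if base_id not in known_papers:
        raise CitationError(f"{citation_id!r} does not match any known paper")
    return known_papers[base_id]


def check_bibliography(citation_ids, known_papers=KNOWN_PAPERS):
    """Collect an error message for every citation that fails to resolve."""
    errors = []
    for citation_id in citation_ids:
        try:
            resolve(citation_id, known_papers)
        except CitationError as err:
            errors.append(str(err))
    return errors
```

Calling `check_bibliography(["9901.00001", "9901.99999", "Smith et al. 2024"])` would flag the second entry as unknown and the third as malformed. The small number of "special" citations (personal communications, manuscripts in preparation) would need a separate, explicitly marked path.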

Look, I’m naive, I get that. I believe in automation, not in the modern context of LLMs and other heuristics, but in setting clear procedures and building clear rules. The world doesn’t work that way! The clear rules are always more contentious than you expect, the fuzzy human-led version always the only choice people can agree on.

But still. Citations. There has to be a better system, right?

Make No Mistakes

I’m taking a Danish exam next week, and it’s a big one, a culmination of years learning the language. My classmates are stressed. Despite how much we’ve learned, it feels like we’re always making little mistakes. We write the wrong prepositions, put verbs in the wrong form, or mess up the order of words in a sentence. And while we should have time to check our work, that doesn’t help as much as it should. If we don’t notice a mistake the first time around, what guarantee is there that we notice it on the next read, or the next? Too many checks and we can even end up second-guessing ourselves, “correcting” something that was right to begin with.

It’s given me some sympathy for AI.

Earlier this month, investor Marc Andreessen posted a custom prompt he inputs when using AI, which was immediately mocked.

Current AI custom prompt:

You are a world class expert in all domains. Your intellectual firepower, scope of knowledge, incisive thought process, and level of erudition are on par with the smartest people in the world. Answer with complete, detailed, specific answers. Process…
— Marc Andreessen 🇺🇸 (@pmarca) May 4, 2026

The silliest instruction, according to many critics, was to “Never hallucinate or make anything up.” It’s similar to a prompt that’s become a meme used to make fun of AI-using “vibe coders”, “Make no mistakes”.

Experts point out that this is just not how AI works. Large language model-powered programs like ChatGPT are inherently random, producing text largely based on its similarity to other text. “Hallucinations” or “mistakes” are an inevitable feature of the technology, and a prompt like the one Andreessen wrote isn’t a set of instructions the AI will follow without error: it’s just another part of the text the AI is trying to generate.

All that said, telling an AI to “make no mistakes” should have some effect. But it likely won’t be what you want.

The best way I’ve found to understand AI is in terms of stories. Chatbots like ChatGPT take a large language model, a mathematical formula for how words are most likely to appear in a text, and warp it, twisting it to almost always produce one particular kind of text: one half of a dialogue with a fictional AI assistant. This twisted formula determines how the AI responds to your prompts, but these days it also is used behind the scenes, in a kind of structured soliloquy called a “chain of thought”. You can think of the prompts you send to the AI as a preface to those soliloquies, and imagine the AI telling stories of a sort that would typically follow that preface.
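Here is a toy sketch of that picture in Python. It is not how a real language model is built: a real model assigns probabilities to the next word given all the text so far, while this sketch jumps straight to whole "stories". The `STORY_PROBS` table, its prefaces, and every probability in it are invented purely for illustration. The only point is that a preface shifts the odds of what follows, and the result is still a random draw.

```python
import random

# Toy stand-in for a language model: each preface (prompt) maps to a
# probability distribution over the kind of story that follows it.
# Every probability here is invented purely for illustration.
STORY_PROBS = {
    "answer the question": {
        "answers correctly": 0.80,
        "makes a mistake": 0.20,
    },
    "answer the question, make no mistakes": {
        "answers correctly": 0.70,
        "double-checks and catches a real problem": 0.15,
        "makes a mistake anyway": 0.10,
        "second-guesses a correct answer": 0.05,
    },
}


def sample_story(preface, rng):
    """Draw one continuation at random, weighted by its probability."""
    dist = STORY_PROBS[preface]
    stories = list(dist)
    weights = [dist[s] for s in stories]
    return rng.choices(stories, weights=weights, k=1)[0]


def tally(preface, n, seed=0):
    """Sample n continuations and count how often each one occurs."""
    rng = random.Random(seed)
    counts = {story: 0 for story in STORY_PROBS[preface]}
    for _ in range(n):
        counts[sample_story(preface, rng)] += 1
    return counts


if __name__ == "__main__":
    for preface in STORY_PROBS:
        print(preface, tally(preface, 1000))
```

In this toy, changing the preface changes the mix of stories, but no outcome ever has probability zero.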

So if you tell an AI “make no mistakes” or “do not hallucinate”, you’re making it more likely to generate the kind of story that begins, “the AI was instructed to make no mistakes”.

Let me put it this way, Mr. Amer. The 9000 series is the most reliable computer ever made. No 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error. – HAL 9000, “2001: A Space Odyssey”

You’d expect this to affect the chain of thought. For example, the AI might occasionally pause to say “I’m supposed to make no mistakes, so I should check this. What could have gone wrong?” and then list something that plausibly could be wrong with its idea. If this happens often enough, you’ll probably catch some real problems.

But I’m reminded of my classmates, practicing for that Danish exam. We can go over the text again and again, asking if this thing, or that, might be wrong. We can try again and again to use our mental model of the Danish language, seeing if this time it catches a new mistake. But there are things we won’t catch. And if we do it too much, we’ll second-guess ourselves out of the good answers, too.

Ultimately, “make no mistakes” isn’t a great instruction, either for humans or for chatbots. And its use by people like Marc Andreessen has me wondering if they are used to interacting with humans in the same way, as tools that keep making mistakes no matter how many times they’re instructed not to, requiring more and more long-winded instructions and yet continuing to misbehave.

Then again, that may be a mistake on my part.

Bonus Info for “100-year-old assumption about the universe may soon be overturned”

I had a piece up in New Scientist last week (paywalled, sorry!), about a new analysis that suggests the universe is less homogeneous (more “lumpy”) than most cosmologists believe.

The piece was a bit different than my usual. Normally I do what people in the biz call “features”: longer articles about general trends. This was a much more classic “news piece”. The people I interviewed had several papers up in early April, the editors at New Scientist thought they were interesting enough to write about, so I was asked for a short, timely piece with the key takeaways.

That means I didn’t have a ton of space for background info. So if you’d like to know more, this post is for you!

The 100-year-old assumption in the title refers to the Friedmann–Lemaître–Robertson–Walker (or FLRW) universe, an idea that first came together in the 1920s, in which cosmologists model the universe as homogeneous and isotropic: the same no matter where, or in which direction, you look. That sounds like a crazy assumption, but on the largest scales we can measure it’s actually mostly fine. Once you’re trying to calculate ripples in the cosmic microwave background or find out how fast distant galaxies are accelerating away, it works surprisingly well to act like the universe is an evenly-mixed soup of matter, radiation, dark matter, and dark energy.
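For readers who want the formula, the standard textbook way to write the FLRW assumption is as a line element. The notation below is the conventional one rather than anything taken from the New Scientist piece: a single function of time, the scale factor $a(t)$, stretches all distances equally, and a constant $k$ fixes the curvature of space.

```latex
ds^2 = -c^2\,dt^2 + a(t)^2 \left[ \frac{dr^2}{1 - k r^2}
       + r^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right) \right]
```

Nothing in the bracket depends on where you are, and the angular part treats every direction alike, which is what "homogeneous and isotropic" means in practice.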

But every assumption in physics has its doubters. The doubters of homogeneity are known as inhomogeneous cosmologists, and I’ve been sympathetic to their complaints for a while now.

I even let an inhomogeneous cosmologist do a guest post on my blog, back in 2019. That post argued something dramatic: that dark energy may not even exist, but that measurements of accelerating expansion may be a consequence of a dramatic lopsidedness in the universe around us.

The people I covered in New Scientist, Asta Heinesen, Tim Clifton, and Sofie Marie Koksbang, are arguing something much less dramatic…but that’s part of what makes it more compelling. Instead of arguing that the universe is dramatically uneven or lopsided, they’re arguing that the universe can still be on average smooth and homogeneous, the soup of galaxies people seem to expect…but still, can’t be fully modeled that way.

This is a tricky distinction to explain, and certainly something I didn’t have space to cover well enough in New Scientist. But let me take a stab at it here:

Any cosmologist will agree that FLRW can’t be the whole story. We know the universe isn’t a perfectly mixed soup: there are galaxies, and stars, and black holes, and they all wiggle the fabric of the universe in different places. When cosmologists study the universe as a whole, they’re averaging out all of that, to get the overall behavior, a bit like you could average the number of children in each family to get the average children per family in a country.

But FLRW isn’t just an average, it’s a model of spacetime. Because of that, it has to obey certain equations, called Einstein’s equations. It has to make sense by itself, as the correct answer for how spacetime would behave if it were filled with a uniform soup.

That’s an extra restriction, and that extra restriction can get you in trouble. To continue with the analogy, any real family has a whole number of children. But the average family doesn’t have a whole number of children. When I was born, the average family in the US had around 2.5 children. A lot of cartoons imagined what the half-child looked like.

From the perspective of Heinesen, Clifton, and Koksbang, assuming FLRW is a bit like assuming that the average family must have two children, or three, and can’t possibly have 2.5. Averages don’t have to look like sensible spacetimes, they don’t have to obey the Einstein equations.
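The analogy is easy to check with a few lines of Python. The family sizes below are made up. The second half goes a step beyond the analogy: it assumes the "rule" in question is nonlinear, as Einstein's equations are, and uses squaring as a stand-in to show that applying such a rule to an average is not the same as averaging the rule's results. It is a toy illustration, not the calculation Heinesen, Clifton, and Koksbang actually do.

```python
from statistics import mean

# Made-up numbers of children in six hypothetical families.
children = [1, 2, 2, 3, 3, 4]

average = mean(children)  # 15 / 6

# Each family obeys the "whole number" rule; the average need not.
each_family_is_whole = all(isinstance(n, int) for n in children)
average_is_whole = float(average).is_integer()

# A nonlinear rule (here, squaring) gives different answers depending on
# whether you apply it before or after averaging.
average_of_squares = mean(n * n for n in children)  # 43 / 6
square_of_average = average ** 2                    # 2.5 squared

print(average, each_family_is_whole, average_is_whole)
print(average_of_squares, square_of_average)
```

Every individual entry satisfies the rule, the average does not, and the order of "average" and "apply the rule" matters.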

In practice, the assumption of FLRW has worked a lot better than assuming that the average family can’t have 2.5 children, and that’s why Heinesen, Clifton, and Koksbang are cautious. They’re not claiming that inhomogeneity can explain everything, all the way to major components of the universe like dark energy. But they do think it can be a good explanation for smaller effects. And as cosmologists worry about smaller and smaller effects, wondering if dark energy changes over time and why the expansion rate of the universe doesn’t match up between different measurements, it can be important to remember that averages aren’t all-powerful. Eventually, they can break down. It’s a more subtle issue than a fractional child. But, as I covered in New Scientist, it may already be happening.

Breakthrough Prize 2026

Because of last week’s “bonus info” post, I’m only now getting around to commenting on this year’s Breakthrough Prizes in Fundamental Physics. While I don’t comment on them every year, I know enough about several of this year’s winners that I figured a post would be helpful.

For those who haven’t heard of them, the Breakthrough Prizes are a bit like the Nobel, if it were created by a 21st-century rich person instead of a 19th-century one. They give out more money, and instead of an organization like the Royal Swedish Academy of Sciences they pick winners via a committee of past winners. They’re more flexible in structure than the Nobel, with extra prizes for early-career researchers and a tendency to reward accomplishments that are either entirely theoretical or solid experimental work that doesn’t show a new discovery, both of which are things the Nobel Prize is structured to avoid. They’ve also shown willingness to reward large collaborations, rather than following the Nobel’s informal rule to only give the award to three people at a time.

This last was on display in their main physics award this year, for the muon g-2 collaborations. The award is going to collaborations of scientists and engineers at three different particle physics laboratories, for work done over a span of more than fifty years to measure the magnetic properties of the muon. These measurements have shown a tantalizing discrepancy with predictions that inspired many to conjecture new physics. However, in the last few years it’s looked more and more like the discrepancy was due to an imprecise prediction, and better methods seem to be converging to the experimental value. At this point, smart money is that there is no disagreement with the Standard Model here, but as always in science there’s a chance some mystery remains.

The Breakthrough Prize also offered a special, out-of-schedule prize to David Gross. Already a Nobel laureate, Gross had a crucial role in our understanding of quantum chromodynamics, the theory of the strong force, which binds protons and neutrons together. He was also a major founding figure in string theory, and since the Breakthrough Prize is more comfortable recognizing theoretical contributions, they get to mention this as well. Gross is also known in the community for his personality, which tends to fill up any room he’s in. I can only imagine the conversations that led to Breakthrough’s decision to add a special prize for him this year.

Breakthrough is also adding a new recurring prize, the Vera Rubin New Frontiers Prize, honoring women who make important contributions to physics within two years of their PhD. The prize is a bit smaller than the existing early-career New Horizons in Physics Prizes, presumably because it goes to even younger researchers. This year’s winner is from my old field, scattering amplitudes. Carolina Figueiredo is part of the latest evolution of the research program behind the amplituhedron. The new framework of “surfaceology” seems like a promising geometry-flavored way to understand particle physics calculations in more realistic theories, and unlike its predecessors may have some practical value eventually as well. Congrats Carolina!

Finally, the New Horizons in Physics Prizes are for impressive early-career researchers. I don’t know much about the first recipient, Benjamin Safdi, who works on searches for axions and axion-like particles, today’s most trendy dark matter candidate. I know a bit more about the work done by Clay Córdova, Thomas Dumitrescu, Shu-Heng Shao, and Yifan Wang, having met several of them in my physics career. They work on what are called generalized symmetries, concepts which go beyond the usual idea of how symmetry is supposed to work by involving more complicated tensors. I saw these crop up a fair bit in talks, but they were distant enough from my area that I never had a particularly clear grasp of what people were doing with them. I know even less about the work of the last group, Dillon Brout, J. Colin Hill, Mathew Madhavacheril, Maria Vincenzi, Daniel Scolnic, and W. L. Kimmy Wu, on cosmological measurements, but I was friends with Mathew in grad school and am impressed that he’s now working on cosmology given how little cosmology research there was at Stony Brook at the time.

4 gravitons

Stories about physics from someone who's been there