Category Archives: Amateur Philosophy

Ideally, Exams Are for the Students

I should preface this by saying I don’t actually know that much about education. I taught a bit in my previous life as a professor, yes, but I probably spent more time being taught how to teach than actually teaching.

Recently, the Atlantic had a piece about testing accommodations for university students, like extra time on exams, or getting to do an exam in a special distraction-free environment. The piece quotes university employees who are having more and more trouble satisfying these accommodations, and includes the statistic that 20 percent of undergraduate students at Brown and Harvard are registered as disabled.

The piece has kicked off a firestorm on social media, mostly focused on that statistic (which conveniently appears just before the piece’s paywall). People are shocked, and cynical. They feel like more and more students are cheating the system, getting accommodations that they don’t actually deserve.

I feel like there is a missing mood in these discussions, that the social media furor is approaching this from the wrong perspective. People are forgetting what exams actually ought to be for.

Exams are for the students.

Exams are measurement tools. An exam for a class says whether a student has learned the material, or whether they haven’t, and need to retake the class or do more work to get there. An entrance exam, or a standardized exam like the SAT, predicts a student’s future success: whether they will be able to benefit from the material at a university, or whether they don’t yet have the background for that particular program of study.

These are all pieces of information that matter most to the students themselves, information that helps them structure their decisions. If you want to learn the material, should you take the course again? Which universities are you prepared for, and which not?

We have accommodations, and concepts like disability, because we believe that there are kinds of students for whom the exams don’t give this information accurately. We think that a student with more time, or who can take the exam in a distraction-free environment, would have a more accurate idea of whether they need to retake the material, or whether they’re ready for a course of study, than a student who has to take the exam under ordinary conditions. And we think we can identify the students who this matters for, and the students for whom this doesn’t matter nearly as much.

These aren’t claims about our values, or about what students deserve. They’re empirical claims, about how test results correlate with outcomes the students want. The conversation, then, needs to be built on top of those empirical claims. Are we better at predicting the success of students who receive accommodations, or worse? Can we measure that at all, or are we just guessing? And are we accurately communicating to students that their exam results tell them something useful and statistically robust, something that should help them plan their lives?

Values come in later, of course. We don’t have infinite resources, as the Atlantic piece emphasizes. We can’t measure everyone with as much precision as we would like. At some level, generalization takes over and accuracy is lost. There is absolutely a debate to be had about which measurements we can afford to make, and which we can’t.

But in order to have that argument at all, we first need to agree on what we’re measuring. And I feel like most of the people talking about this piece haven’t gotten there yet.

Bonus Info For “Cosmic Paradox Reveals the Awful Consequence of an Observer-Free Universe”

I had a piece in Quanta Magazine recently, about a tricky paradox that’s puzzling quantum gravity researchers and some early hints at its resolution.

The paradox comes from trying to describe “closed universes”, which are universes where it is impossible to reach the edge, even if you had infinite time to do it. This could be because the universe wraps around like a globe, or because the universe is expanding so fast no traveler could ever reach an edge. Recently, theoretical physicists have been trying to describe these closed universes, and have noticed a weird issue: each such universe appears to have only one possible quantum state. In general, quantum systems have more possible states the more complex they are, so for a whole universe to have only one possible state is a very strange thing, implying a bizarrely simple universe. Most worryingly, our universe may well be closed. Does that mean that secretly, the real world has only one possible state?
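
A standard textbook illustration of "more possible states the more complex they are" is a collection of N two-state systems, such as spins. This example is mine, not from the piece. The number of independent quantum states grows exponentially with N, and the maximum entropy is the logarithm of that number. A universe with only one state therefore has zero entropy:

```latex
\dim \mathcal{H}_N = 2^N, \qquad
S_{\max} = \log \dim \mathcal{H}_N = N \log 2, \qquad
\dim \mathcal{H} = 1 \;\Longrightarrow\; S_{\max} = \log 1 = 0.
```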

There is a possible solution that a few groups are playing around with. The argument that a closed universe has only one state depends on the fact that nothing inside a closed universe can reach the edge. But if nothing can reach the edge, then trying to observe the universe as a whole from outside would tell you nothing of use. Instead, any reasonable measurement would have to come from inside the universe. Such a measurement introduces a new kind of “edge of the universe”, this time not in the far distance, but close by: the edge between an observer and the rest of the world. And when you add that edge to the calculations, the universe stops being closed, and has all the many states it ought to.

This was an unusually tricky story for me to understand. I narrowly avoided several misconceptions, and I’m still not sure I managed to dodge all of them. Likewise, it was unusually tricky for the editors to understand, and I suspect it was especially tricky for Quanta’s social media team to understand.

It was also, quite clearly, tricky for the readers to understand. So I thought I would use this post to clear up a few misconceptions. I’ll say a bit more about what I learned investigating this piece, and try to clarify what the result does and does not mean.

Q: I’m confused about the math terms you’re using. Doesn’t a closed set contain its boundary?

A: Annoyingly, what physicists mean by a closed universe is a bit different from what mathematicians mean by a closed manifold, which is in turn more restrictive than what mathematicians mean by a closed set. One way of thinking about this that helped me: in an open set, you can take a limit that takes you out of the set, which is like being able to describe a (possibly infinite) path that takes you “out of the universe”. A closed set doesn’t allow that: every path, no matter how long, still ends up in the same universe.
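
Here is a standard example of the set-theory version of this idea, which is mine rather than from the interviews. In the open interval (0,1), the sequence below stays inside the set at every step but converges to a point outside it. In the closed interval [0,1], no such escape is possible:

```latex
x_n = 1 - \frac{1}{n+1} \in (0,1) \quad \text{for all } n \ge 1,
\qquad \lim_{n\to\infty} x_n = 1 \notin (0,1);

\text{whereas } x_n \in [0,1] \text{ and } x_n \to x \;\Longrightarrow\; x \in [0,1].
```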

Q: So a bunch of string theorists did a calculation and got a result that doesn’t make sense, a one-state universe. What if they’re just wrong?

A: Two things:

First, the people I talked to emphasized that it’s pretty hard to wiggle out of the conclusion. It’s not just a matter of saying you don’t believe in string theory and that’s that. The argument is based in pretty fundamental principles, and it’s not easy to propose a way out that doesn’t mess up something even more important.

That’s not to say it’s impossible. One of the people I interviewed, Henry Maxfield, thinks that some of the recent arguments are misunderstanding how to use one of their core techniques, in a way that accidentally presupposes the one-state universe.

But even he thinks that the bigger point, that closed universes have only one state, is probably true.

And that’s largely due to a second reason: there are older arguments that back the conclusion up.

One of the oldest dates back to John Wheeler, a physicist famous both for deep musings about the nature of space and time and for coining evocative terms like “wormhole”. In the 1960s, Wheeler argued that, in a theory where space and time can be curved, one should think of a system’s state as including every configuration it can evolve into over time, since it can be tricky to specify a moment “right now”. In a closed universe, you could expect a quantum system to explore every possible configuration…meaning that such a universe should be described by only one state.

Later, physicists studying holography ran into a similar conclusion. They kept noticing systems in quantum gravity where you can describe everything that happens inside by what happens on the edges. If there are no edges, that seems to suggest that in some sense there is nothing inside. Apparently, Lenny Susskind had a slide at the end of his talks in the ’90s where he kept bringing up this point.

So even if the modern arguments are wrong, and even if string theory is wrong…it still looks like the overall conclusion is right.

Q: If a closed universe has only one state, does that make it deterministic, and thus classical?

A: Oh boy…

So, on the one hand, there is an idea, which I think also goes back to Wheeler, that asks: “if the universe as a whole has a wavefunction, how does it collapse?” One possibility is that the universe has only one state, so that nobody is needed to collapse the wavefunction: it is already in a definite state.

On the other hand, a universe with only one state does not actually look much like a classical universe. Our universe looks classical largely due to a process called decoherence, where small quantum systems interact with big quantum systems with many states, diluting quantum effects until the world looks classical. If there is only one state, there are no big systems to interact with, and the world has large quantum fluctuations that make it look very different from a classical universe.

Q: How, exactly, are you defining “observer”?

A: A few commenters helpfully chimed in to talk about how physics models observers as “witness” systems, objects that preserve some record of what happens to them. A simple example is a ball sitting next to a bowl: if you find the ball in the bowl later, it means something moved it. This process, preserving what happens and making it more obvious, is in essence how physicists think about observers.

However, this isn’t the whole story in this case. Here, different research groups introducing observers are doing it in different ways. That’s, in part, why none of them are confident they have the right answer.

One of the approaches describes an observer in terms of its path through space and time, its worldline. Instead of a detailed witness system with specific properties, all they do is pick out a line and say “the observer is there”. Identifying that line, and declaring it different from its surroundings, seems to be enough to recover the complexity the universe ought to have.

The other approach treats the witness system in a bit more detail. We usually treat an observer in quantum mechanics as infinitely large compared to the quantum systems they measure. This approach instead gives the observer a finite size, and uses that to estimate how far their experience will be from classical physics.

Crucially, neither approach is a matter of defining a physical object and looking for it in the theory. Given a collection of atoms, neither team can tell you what is an observer and what isn’t. Instead, in each approach, the observer is arbitrary: a choice, made by us when we use quantum mechanics, of what to count as an observer and what to count as the rest of the world. That choice can be made in many different ways, and each approach tries to describe what happens when you change that choice.

This is part of what makes this approach uncomfortable to some more philosophically-minded physicists: it treats observers not as a predictable part of the physical world, but as a mathematical description used to make statements about the world.

Q: If these ideas come from AdS/CFT, which is an open universe, how do you use them to describe a closed universe?

A: While more examples emerged later, initially theorists were thinking about two types of closed universes:

First, think about a black hole. You may have heard that when you fall into a black hole, you watch the whole universe age away before your eyes, due to the dramatic differences in the passage of time caused by the extreme gravity. Once you’ve seen the outside universe fade away, you are essentially in a closed universe of your own. The outside world will never affect you again, and you are isolated, with no path to the outside. These black hole interiors are one of the examples theorists looked at.

The other example is so-called “baby universes”. When physicists use quantum mechanics to calculate the chance of something happening, they have to add up every possible series of events that could have happened in between. For quantum gravity, this includes every possible arrangement of space and time. This includes arrangements with different shapes, including ones with tiny extra “baby universes” which branch off from the main universe and return. Universes with these “baby universes” are another example that theorists considered to understand closed universes.

Q: So wait, are you actually saying the universe needs to be observed to exist? That’s ridiculous, didn’t the universe exist long before humans existed to observe it? Is this some sort of Copenhagen Interpretation thing, or that thing called QBism?

A: You’re starting to ask philosophical questions, and here’s the thing:

There are physicists who spend their time thinking about how to interpret quantum mechanics. They talk to philosophers, and try to figure out how to answer these kinds of questions in a consistent and systematic way, keeping track of all the potential pitfalls and implications. They’re part of a subfield called “quantum foundations”.

The physicists whose work I was talking about in that piece are not those people.

Of the people I interviewed, one of them, Rob Myers, probably has lunch with quantum foundations researchers on occasion. The others, based at places like MIT and the IAS, probably don’t even do that.

Instead, these are people trying to solve a technical problem, people whose first inclination is to put philosophy to the side and “shut up and calculate”. These people did a calculation that ought to have worked, checking how many quantum states they could find in a closed universe, and found a weird and annoying answer: just one. Trying to solve the problem, they’ve done technical calculations, introducing a path through the universe, or a boundary around an observer, and seeing what happens. While some of them may have their own philosophical leanings, they’re not writing works of philosophy. Their papers don’t talk through the philosophical implications of their ideas in all that much detail, and they may well have different thoughts as to what those implications are.

So while I suspect I know the answers they would give to some of these questions, I’m not sure.

Instead, how about I tell you what I think?

I’m not a philosopher, so I can’t promise my views will be consistent, or that they won’t suffer from some pitfall. But unlike other people’s views, I can tell you what my own views are.

To start off: yes, the universe existed before humans. No, there is nothing special about our minds: we don’t have psychic powers to create the universe with our thoughts or anything dumb like that.

What I think is that, if we want to describe the world, we ought to take lessons from science.

Science works. It works for many reasons, but two important ones stand out.

Science works because it leads to technology, and it leads to technology because it guides actions. It lets us ask, if I do this, what will happen? What will I experience?

And science works because it lets people reach agreement. It lets people reach agreement because it lets us ask, if I observe this, what do I expect you to observe? And if we agree, we can agree on the science.

Ultimately, if we want to describe the world with the virtues of science, our descriptions need to obey this rule: they need to let us ask “what if?” questions about observations.

That means that science cannot avoid an observer. It can often hide the observer, place them far away and give them an infinite mind to behold what they see, so that one observer is essentially the same as another. But we shouldn’t expect to always be able to do this. Sometimes, we can’t avoid saying something about the observer: about where they are, or how big they are, for example.

These observers, though, don’t have to actually exist. We should be able to ask “what if” questions about others, and that means we should be able to dream up fictional observers, and ask, if they existed, what would they see? We can imagine observers swimming in the quark-gluon plasma after the Big Bang, or sitting inside a black hole’s event horizon, or outside our visible universe. The existence of the observer isn’t a physical requirement, but a methodological one: a restriction on how we can make useful, scientific statements about the world. Our theory doesn’t have to explain where observers “come from”, and can’t and shouldn’t do that. The observers aren’t part of the physical world being described, they’re a precondition for us to describe that world.

Is this the Copenhagen Interpretation? I’m not a historian, but I don’t think so. The impression I get is that there was no real Copenhagen Interpretation, that Bohr and Heisenberg, while more deeply interested in philosophy than many physicists today, didn’t actually think things through in enough depth to have a perspective you can name and argue with.

Is this QBism? I don’t think so. It aligns with some things QBists say, but they say a lot of silly things as well. It’s probably some kind of instrumentalism, for what that’s worth.

Is it logical positivism? I’ve been told logical positivists would argue that the world outside the visible universe does not exist. If that’s true, I’m not a logical positivist.

Is it pragmatism? Maybe? What I’ve seen of pragmatism definitely appeals to me, but I’ve seen my share of negative characterizations as well.

In the end, it’s an idea about what’s useful and what’s not, about what moves science forward and what doesn’t. It tries to avoid being preoccupied with unanswerable questions, and as much as possible to cash things out in testable statements. If I do this, what happens? What if I did that instead?

The results I covered for Quanta, to me, show that the observer matters on a deep level. That isn’t a physical statement, it isn’t a mystical statement. It’s a methodological statement: if we want to be scientists, we can’t give up on the observer.

When Your Theory Is Already Dead

Occasionally, people try to give “even-handed” accounts of crackpot physics, like people who claim to have invented anti-gravity devices. These accounts don’t go so far as to say that the crackpots are right, and will freely point out plausible doubts about the experiments. But at the end of the day, they’ll conclude that we still don’t really know the answer, and perhaps the next experiment will go differently. More tests are needed.

For someone used to engineering, or to sciences without much theory behind them, this might sound pretty reasonable. Sure, any one test can be critiqued. But you can’t prove a negative: you can’t rule out a future test that might finally see the effect.

That’s all well and good…if you have no idea what you’re doing. But these people, just like anyone else who grapples with physics, aren’t just proposing experiments. They’re proposing theories: models of the world.

And once you’ve got a theory, you don’t just have to care about future experiments. You have to care about past experiments too. Some theories…are already dead.

[Image: the “You’re already dead” scene from the anime Fist of the North Star. Warning: this is a link to TVTropes; enter only if you have lots of time on your hands.]

To get a little more specific, let’s talk about antigravity proposals that use scalar fields.

Scalar fields seem to have some sort of mysticism attached to them in the antigravity crackpot community, but for physicists they’re just the simplest possible type of field, the most obvious thing anyone would have proposed once they were comfortable enough with the idea of fields in the first place. We know of one, the Higgs field, which gives rise to the Higgs boson.

We also know that if there are any more, they’re pretty subtle…and as a result, pretty useless.

We know this because of a wide variety of what are called “fifth-force experiments”, tests and astronomical observations looking for an undiscovered force that, like gravity, reaches out to long distances. Many of these experiments are quite general, the sort of thing that would pick up a wide variety of scalar fields. And so far, none of them have seen anything.

That “so far” doesn’t mean “wait and see”, though. Each time physicists run a fifth-force experiment, they establish a limit. They say, “a fifth force cannot be like this”. It can’t be this strong, it can’t operate on these scales, it can’t obey this model. Each experiment doesn’t just say “no fifth force yet”, it says “no fifth force of this kind, at all”.
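
One common way experimenters make “a fifth force cannot be like this” precise is to add a Yukawa term to Newtonian gravity. This is a standard parametrization, not something specific to any one proposal. A new scalar field whose particle has mass m_φ produces a force with range λ, and α measures its strength relative to gravity:

```latex
V(r) = -\frac{G\, m_1 m_2}{r}\left(1 + \alpha\, e^{-r/\lambda}\right),
\qquad \lambda = \frac{\hbar}{m_\phi c}.
```

In these terms, a null result at some range λ rules out every |α| above a bound that the experiment sets at that range.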

When you write down a theory, if you’re not careful, you might find it has already been ruled out by one of these experiments. This happens to physicists all the time. Physicists want to use scalar fields to understand the expansion of the universe, and they use them to think about dark matter. And frequently, a model one physicist proposed will be ruled out, not by new experiments, but by someone doing the math and realizing that the model is already contradicted by a pre-existing fifth-force experiment.
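
The "already dead" check can be sketched in a few lines. This sketch assumes a new force described by a strength `alpha` relative to gravity and a range `lam` in meters, and it treats each past null experiment as excluding |alpha| above some bound over a window of ranges. The class `Limit`, the function `excluded_by`, and every entry in `PAST_LIMITS` are hypothetical and invented for illustration. Real published bounds are curves in this plane, not boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """Upper bound on |alpha| from one (hypothetical) null experiment,
    valid for force ranges between lam_min and lam_max (in meters)."""
    name: str
    lam_min: float
    lam_max: float
    alpha_max: float


def excluded_by(alpha, lam, limits):
    """Return the names of past experiments that already rule out a
    force with strength alpha (relative to gravity) and range lam."""
    return [
        lim.name
        for lim in limits
        if lim.lam_min <= lam <= lim.lam_max and abs(alpha) > lim.alpha_max
    ]


# Hypothetical limits, invented for illustration only.
PAST_LIMITS = [
    Limit("torsion balance (hypothetical)", 1e-4, 1e-2, 1e-2),
    Limit("lunar ranging (hypothetical)", 1e7, 1e9, 1e-10),
]

print(excluded_by(alpha=1.0, lam=1e-3, limits=PAST_LIMITS))   # ['torsion balance (hypothetical)']
print(excluded_by(alpha=1e-3, lam=1e-3, limits=PAST_LIMITS))  # []
```

The point of the design is that no new experiment is involved: a model proposed today can come back with a non-empty list on its first run.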

So can you prove a negative? Sort of.

If you never commit to a model, if you never propose an explanation, then you can never be disproven, you can always wait for the experiment of your dreams to come true. But if you have any model, any idea, any explanation at all, then your explanation will have implications. Those implications may kill your theory in a future experiment. Or, they may have already killed it.

Requests for an Ethnography of Cheating

What is AI doing to higher education? And what, if anything, should be done about it?

Chad Orzel at Counting Atoms had a post on this recently, tying the question to a broader point. There is a fundamental tension in universities, between actual teaching and learning and credentials. A student who just wants the piece of paper at the end has no reason not to cheat if they can get away with it, so the easier it becomes to get away with cheating (say, by using AI), the less meaningful the credential gets. Meanwhile, professors who want students to actually learn something are reduced to trying to “trick” these goal-oriented students into accidentally doing something that makes them fall in love with a subject, while being required to police the credential side of things.

Social science, as Orzel admits and emphasizes, is hard. Any broad-strokes picture like this breaks down into details, and while Orzel talks through some of those details, he and I are, of course, not social scientists.

Because of that, I’m not going to propose my own “theory” here. Instead, think of this post as a request.

I want to read an ethnography of cheating. Like other ethnographies, it should involve someone spending time in the culture in question (here, cheating students), talking to the people involved, and getting a feeling for what they believe and value. Ideally, it would be augmented with an attempt at quantitative data, like surveys, that estimate how representative the picture is.

I suspect that cheating students aren’t just trying to get a credential. Part of why is that I remember teaching pre-meds. In the US, students don’t directly study medicine as a Bachelor’s degree. Instead, they study other subjects as pre-medical students (“pre-meds”), and then apply to medical school, which grants a degree on the same level as a PhD. As part of their application, they submit scores from a standardized test called the MCAT, which checks that they have the basic level of math and science that medical schools expect.

A pre-med in a physics class, then, has good reason to want to learn: the better they know their physics, the better they will do on the MCAT. If cheating were mostly about just trying to get a credential, pre-meds wouldn’t cheat.

I’m pretty sure they do cheat, though. I didn’t catch any cheaters back when I taught, but there were a lot of students who tried to push the rules, pre-meds and not.

Instead, I think there are a few other motivations involved. And in an ethnography of cheating, I’d love to see some attempt to estimate how prevalent they are:

  1. Temptation: Maybe students know that they shouldn’t cheat, in the same way they know they should go to the gym. They want to understand the material and learn in the same way people who exercise have physical goals. But the mind, and flesh, are weak. You have a rough week, you feel like you can’t handle the work right now. So you compensate. Some of the motivation here is still due to credentials: a student who shrugs and accepts that their breakup will result in failing a course is a student who might have to pay for an extra year of ultra-expensive US university education to get that credential. But I suspect there is a more fundamental motivation here, related to ego and easy self-deception. If you do the assignment, even if you cheat for part of it, you get to feel like you did it, while if you just turn in a blank page you have to accept the failure.
  2. Skepticism: Education isn’t worth much if it doesn’t actually work. Students may be skeptical that the things that professors are asking them to do actually help them learn what they want to learn, or that the things the professors want them to learn are actually the course’s most valuable content. A student who uses ChatGPT to write an essay might believe that they will never have to write something without ChatGPT in life, so why not use it now? Sometimes professors simply aren’t explicit about what an exercise is actually meant to teach (there have been a huge number of blog posts explaining that writing is meant to teach you to think, not to write), and sometimes professors are genuinely pretty bad at teaching, since there is little done to retain the good ones in most places. A student in this situation still has to be optimistic about some aspect of the education, at some time. But they may be disillusioned, or just interested in something very different.
  3. Internalized Expectations: Do employers actually care if you get a bad grade? Does it matter? By the time a student is in college, they’ve been spending half their waking hours in a school environment for over a decade. Maybe the need to get good grades is so thoroughly drilled in that the actual incentives don’t matter. If you think of yourself as the kind of person who doesn’t fail courses, and you start failing, what do you do?
  4. External Non-Credential Expectations: Don’t worry about the employers, worry about the parents. Some college students have the kind of parents who keep checking in on how they’re doing, who want to see evidence and progress the same way they did when they were kids. Any feedback, no matter how much it’s intended to teach, not to judge, might get twisted into a judgement. Better to avoid that judgement, right?
  5. Credentials, but for the Government, not Employers: Of course, for some students, failing really does wreck their life. If you’re on the kind of student visa that requires you to maintain grades above a certain level, you’ve got a much stronger incentive to cheat, imposed for much less reason.

If you’re aware of a good ethnography of cheating, let me know! And if you’re a social scientist, consider studying this!

The Rocks in the Ground Era of Fundamental Physics

It’s no secret that the early twentieth century was a great time to make progress in fundamental physics. On one level, it was an era when huge swaths of our understanding of the world were being rewritten, with relativity and quantum mechanics just being explored. It was a time when a bright student could guide the emergence of whole new branches of scholarship, and recently discovered physical laws could influence world events on a massive scale.

Put that way, it sounds like it was a time of low-hanging fruit, the early days of a field when great strides can be made before the easy problems are all solved and only the hard ones are left. And that’s part of it, certainly: the fields that sprang from that era have gotten more complex and challenging over time, requiring more specialized knowledge to make any kind of progress. But there is also a physical reason why physicists had such an enormous impact back then.

The early twentieth century was the last time that you could dig up a rock out of the ground, do some chemistry, and end up with a discovery about the fundamental laws of physics.

When scientists like Curie and Becquerel were working with uranium, they didn’t yet understand the nature of atoms. The distinctions between elements were described in qualitative terms, but only just beginning to be physically understood. That meant that a weird object in nature, “a weird rock”, could do quite a lot of interesting things.

And once you find a rock that does something physically unexpected, you can scale up. From the chemistry experiments of a single scientist’s lab, countries can build industrial processes to multiply the effect. Nuclear power and the bomb were such radical changes because they represented the end effect of understanding the nature of atoms, and atoms are something people could build factories to manipulate.

Scientists went on to push that understanding further. They wanted to know what the smallest pieces of matter were composed of, to learn the laws behind the most fundamental laws they knew. And with relativity and quantum mechanics, they could begin to do so systematically.

US particle physics has a nice bit of branding. They talk about three frontiers: the Energy Frontier, the Intensity Frontier, and the Cosmic Frontier.

Some things we can’t yet test in physics are gated by energy. If we haven’t discovered a particle, it may be because it’s unstable, decaying quickly into lighter particles so we can’t observe it in everyday life. If these particles interact appreciably with particles of everyday matter like protons and electrons, then we can try to make them in particle colliders. Colliders end up creating pretty much every such particle up to a certain mass, due to a combination of the tendency in quantum mechanics for everything that can happen to happen, and relativity’s E=mc^2. In the mid-20th century these particle colliders were serious pieces of machinery, but still small enough to be turned into industrial products: now, there are so-called medical accelerators in many hospitals based on their designs. But current particle accelerators are a different beast, massive facilities built by international collaborations. This is the Energy Frontier.
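
The "up to a certain mass" rule is mostly bookkeeping with E=mc^2, and a rough sketch follows. It assumes masses and energies quoted in GeV, the particle physicist's convention with c = 1. It ignores conservation laws other than energy, and it ignores the fact that in a proton collider each colliding quark or gluon carries only part of the beam energy. The function names are hypothetical. The 13.6 TeV figure is roughly the LHC's current collision energy, and 125 GeV is roughly the Higgs boson mass.

```python
# Exact SI values: the electronvolt in joules and the speed of light in m/s.
GEV_IN_JOULES = 1.602176634e-10
C = 299_792_458.0


def gev_to_kg(mass_gev):
    """Convert a mass quoted in GeV/c^2 to kilograms using E = m c^2."""
    return mass_gev * GEV_IN_JOULES / C**2


def kinematically_allowed(collision_energy_gev, product_masses_gev):
    """Rough check: can the center-of-mass collision energy pay for the
    rest energy (m c^2) of all the products?  Masses in GeV/c^2, so c = 1.

    Ignores conservation laws other than energy, and the fact that in a
    proton collider each colliding quark or gluon carries only part of
    the beam energy.
    """
    return sum(product_masses_gev) <= collision_energy_gev


print(f"Higgs mass: {gev_to_kg(125):.3e} kg")
print(kinematically_allowed(13_600, [125]))           # True
print(kinematically_allowed(13_600, [8_000, 8_000]))  # False
```

In practice the real reach is well below this ceiling, which is part of why each step up the Energy Frontier needs a bigger machine.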

Some things in physics are gated by how rare they are. Some particles interact only very faintly with other particles, so to detect them, physicists have to scan a huge chunk of matter, a giant tank of argon or a kilometer of Antarctic ice, looking for deviations from the norm. Over time, these experiments have gotten bigger, looking for more and more subtle effects. A few weird ones still fit on tabletops, but only because they have the tools to measure incredibly small variations. Most are gigantic. This is the Intensity Frontier.
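
Why "more subtle" means "bigger" can be seen with Poisson statistics. The sketch below assumes the signal rate scales linearly with the amount of target material and the running time, and it ignores backgrounds and detector efficiency. The rate `HYPOTHETICAL_RATE` and both function names are invented for illustration and do not describe any real experiment.

```python
import math


def expected_events(rate_per_tonne_year, tonnes, years):
    """Expected number of signal interactions, assuming the rate scales
    linearly with the amount of target material and the running time."""
    return rate_per_tonne_year * tonnes * years


def prob_at_least_one(mu):
    """Poisson probability of seeing one or more events when mu are expected."""
    return 1.0 - math.exp(-mu)


HYPOTHETICAL_RATE = 0.01  # events per tonne-year; invented for illustration

for tonnes in (1, 1_000):
    mu = expected_events(HYPOTHETICAL_RATE, tonnes, years=1)
    print(f"{tonnes:>5} t for 1 yr: expect {mu:.2f} events, "
          f"P(at least one) = {prob_at_least_one(mu):.4f}")
```

A one-tonne detector would almost never see such a process, while a thousand-tonne detector almost always would.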

Finally, the Cosmic Frontier looks for the unknown behind both kinds of gates, using the wider universe to look at events with extremely high energy or size.

Pushing these frontiers has meant cleaning up our understanding of the fundamental laws of physics up to these frontiers. It means that whatever is still hiding, it either requires huge amounts of energy to produce, or is an extremely rare, subtle effect.

That means that you shouldn’t expect another nuclear bomb out of fundamental physics. Physics experiments are already working on vast scales, to the extent that a secret government project would have to be smaller than publicly known experiments, in physical size, energy use, and budget. And you shouldn’t expect another nuclear power plant, either: we’ve long passed the kinds of things you could devise a clever industrial process to take advantage of at scale.

Instead, new fundamental physics will only be directly useful once we’re the kind of civilization that operates on a much greater scale than we do today. That means larger than the solar system: there wouldn’t be much advantage, at this point, in putting a particle physics experiment on the edge of the Sun. It means the kind of civilization that tosses galaxies around.

It means that right now, you won’t see militaries or companies pushing the frontiers of fundamental physics, unlike the way they might have wanted to at the dawn of the twentieth century. By the time fundamental physics is useful in that way, all of these actors will likely be radically different: companies, governments, and in all likelihood human beings themselves. Instead, supporting fundamental physics right now is an act of philanthropy, maintaining a practice because it maintains good habits of thought and produces powerful ideas, the same reasons organizations support mathematics or poetry. That’s not nothing, and fundamental physics is still often affordable as philanthropy goes. But it’s not changing the world, not the way physicists did in the early twentieth century.

Two Types of Scientific Fraud: for a Fee and for Power

A paper about scientific fraud has been making the rounds on social media lately. The authors gather evidence of large-scale networks of fraudsters across multiple fields, from teams of editors that fast-track fraudulent research to businesses that take over journals, sell spots for articles, and then move on to a new target when the journal is de-indexed. I’m not an expert in this kind of statistical sleuthing, but the work looks impressively thorough.

Still, I think the authors overplay their results a bit. They describe themselves as revealing something many scientists underestimate. They point to what they label as misconceptions: that scientific fraud is usually perpetrated alone by individual unethical scientists, or that it is almost entirely a problem of the developing world, and present their work as disproving those misconceptions. Listen to them, and you might get the feeling that science is rife with corruption, that no result, or scientist, can be trusted.

As far as I can tell, though, those “misconceptions” they identify are true. Someone who believes that scientific fraud is perpetrated by loners is probably right, as is someone who believes it largely takes place outside of the first world.

As is often the case, the problem is words.

“Scientific Fraud” is a single term for two different things. The two both involve bad actors twisting scientific activity. But in everything else — their incentives, their geography, their scale, and their consequences — they are dramatically different.

One of the types of scientific fraud is largely about power.

In references 84-89 of the paper, the authors give examples of large-scale scientific fraud in Europe and the US. All (except one, which I’ll mention later) are about the career of a single researcher. Each of these people systematically bent the truth, whether with dodgy statistics, doctored images, or inflated citation counts. Some seemed motivated to promote a particular scientific argument, cutting corners to push a particular conclusion through. Others were purer cases of self-promotion. These people often put pressure on students, postdocs, and other junior researchers in their orbits, which increases the scale of their impact. In some cases, their work rippled out to convince other researchers, prolonging bad ideas and strangling good ones. These were people with power, who leveraged that power to increase their power.

There also don’t appear to be that many of them. These people are loners in a meaningful sense, cores of fraud working on their own behalf. They don’t form networks with each other, for the most part: because they work towards their own aggrandizement, they have no reason to trust anyone else doing the same. I have yet to see evidence that the number of these people is increasing. They exist, they’re a problem, they’re important to watch out for. But they’re not a crisis, and they shouldn’t shift your default expectations of science.

The other, quite different, type of scientific fraud is fraud for a fee.

The cases this paper investigates seem to fall into this category. They are businesses, offering the raw material of academic credit (papers, co-authorship, citations, publication) for cash. They’re paper mills, of various sorts. These are, at least from an academic perspective, large organizations, with hundreds or thousands of customers and tens of suborned editors or scientists farming out their credibility. As the authors of this paper argue, fraudsters of this type are churning out more and more papers, potentially now fueled by AI, adding up to a still small, but non-negligible, proportion of scientific papers in total.

Compared to the first type of fraud, though, buying credit in this way doesn’t give very much power. As the paper describes, many of the papers churned out by paper mills don’t even go into relevant journals: for example, they mention “an article about roasting hazelnuts in a journal about HIV/AIDS care”. An article like that isn’t going to mislead the hazelnut roasting community, or the HIV/AIDS community. Indeed, that would be counter to its purpose. The paper isn’t intended to be read at all, and ideally gets ignored: it’s just supposed to inflate a number.

These numbers are most relevant in the developing world, and when push comes to shove, almost all of the buyers of these services identified by the authors of this paper come from there. In many developing countries, a combination of low trust and advice from economists leads to explicit point systems, where academics are paid or hired explicitly based on criteria like where and how often they publish or how they are cited. The more a country can trust people to vouch for each other without corruption, the less these kinds of incentives have purchase. Outside of the developing world, involvement in paper mills and the like generally seems to involve a much smaller number of people, and typically as sellers, not buyers: selling first-world credibility in exchange for fees from many developing-world applicants.

(The one reference I mentioned above is an interesting example of this: a system built out of points and low trust to recruit doctors from the developing world to the US, gamed by a small number of co-authorship brokers.)

This kind of fraud doesn’t influence science directly. Its perpetrators aren’t trying to get noticed, but to keep up a cushy scam. You don’t hear their conclusions in the press, and other scientists don’t see their work. Instead, they siphon off resources: cannibalizing journals, flooding editors with mass-produced crap, and filling positions and slurping up science budgets in the countries that can least afford them. As they publish more and more, they shouldn’t affect your expectations of the credibility of science: any science you hear about will be either genuine, or fraud from the other category. But they do make the science you hear about harder and harder to do.

(The authors point out one exception: what about AI? If a company trains a large language model on the current internet, will its context windows be long enough to tell that that supposedly legitimate paper about hazelnuts is in an HIV/AIDS journal? If something gets said often enough, copied again and again in papers sold by a mill, will an AI trained on all these papers be convinced? Presumably, someone is being paid good money to figure out how to filter AI-generated slop from training data: can they filter paper mill fraud as well?)

It’s a shame that we have one term, scientific fraud, to deal with these two very different things. But it’s important to keep in mind that they are different. Fraud for power and fraud for money can have very different profiles, and offer very different risks. If you don’t trust a scientific result, it’s worth understanding what might be at play.

Technology as Evidence

How much can you trust general relativity?

On the one hand, you can read through a lovely Wikipedia article full of tests, explaining just how far and how precisely scientists have pushed their knowledge of space and time. On the other hand, you can trust GPS satellites.

As many of you may know, GPS wouldn’t work if we didn’t know about general relativity. In order for the GPS in your phone to know where you are, it has to compare signals from different satellites, each giving the location and time the signal was sent. To get an accurate result, the times measured on those satellites have to be adjusted: because of the weaker gravity they experience, time moves more quickly for them than for us down on Earth. (Their orbital speed slows their clocks slightly, by special relativity, but the gravitational effect wins.)
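To put rough numbers on that adjustment, here is a back-of-the-envelope sketch of my own, not how GPS engineers actually implement the correction. It assumes a circular orbit of radius about 26,560 km, a spherical, non-rotating Earth, a ground clock at Earth’s mean radius, and the weak-field approximation, in which a clock at radius r runs fast by about GM/(rc²) relative to one deeper in the potential, and a clock moving at speed v runs slow by about v²/(2c²). The constant and function names are illustrative.

```python
# Rough, weak-field estimate of how fast a GPS satellite clock runs
# relative to a clock on the ground. Assumptions (illustrative only):
# circular orbit, spherical non-rotating Earth, ground clock at mean radius.
GM_EARTH = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
C = 299_792_458.0           # speed of light, m/s
R_EARTH = 6.371e6           # mean Earth radius (ground clock), m
R_GPS = 2.656e7             # assumed circular GPS orbit radius, m
SECONDS_PER_DAY = 86_400


def gps_clock_drift_per_day():
    """Return (gravitational, velocity, net) clock offsets in seconds per day.

    Positive values mean the satellite clock runs fast relative to the ground.
    """
    # Gravity: the clock higher up in Earth's potential runs faster.
    grav = GM_EARTH / C**2 * (1 / R_EARTH - 1 / R_GPS)
    # Special relativity: orbital speed slows the clock; v^2 = GM/r for a circular orbit.
    v_squared = GM_EARTH / R_GPS
    vel = -v_squared / (2 * C**2)
    return (grav * SECONDS_PER_DAY,
            vel * SECONDS_PER_DAY,
            (grav + vel) * SECONDS_PER_DAY)


grav, vel, net = gps_clock_drift_per_day()
print(f"gravity:  {grav * 1e6:+.1f} microseconds/day")
print(f"velocity: {vel * 1e6:+.1f} microseconds/day")
print(f"net:      {net * 1e6:+.1f} microseconds/day")
# A timing error dt turns into a distance error of roughly c * dt.
print(f"uncorrected ranging error: ~{C * net / 1000:.1f} km/day")
```

With these assumptions it prints about +45.7 and -7.2 microseconds per day for the two effects, a net of roughly +38.5, in line with the commonly quoted figure of about 38 microseconds per day. Multiplied by the speed of light, that is a positioning error on the order of ten kilometers per day if nobody corrected for it.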

In a sense, general relativity gets tested every minute of every day, on every phone in the world. That’s pretty trustworthy! Any time that science is used in technology, it gets tested in this way. The ideas we can use are ideas that have shown they can perform, ideas which do what we expect again and again and again.

In another sense, though, GPS is a pretty bad test of general relativity. It tests one of general relativity’s simplest consequences, based on the Schwarzschild metric for how gravity behaves near a large massive object, and only to a modest degree of precision. Gravity could still violate general relativity in a huge number of other ways, and GPS would still function. That’s why the other tests are valuable: if you want to be sure general relativity doesn’t break down, you need to test it under conditions that GPS doesn’t cover, and to higher precision.

Once you know to look for it, these layers of tests come up everywhere. You might see the occasional article talking about tests of quantum gravity. The tests they describe are very specific, testing a very general and basic question: does quantum mechanics make sense at all in a gravitational world? In contrast, most scientists who research quantum gravity don’t find that question very interesting: if gravity breaks quantum mechanics in a way those experiments could test, it’s hard to imagine it not leading to a huge suite of paradoxes. Instead, quantum gravity researchers tend to be interested in deeper problems with quantum gravity, distinctions between theories that don’t dramatically break with our existing ideas, but that, for that very reason, are much harder to test.

The easiest tests are important, especially when they come from technology: they tell us, on a basic level, what we can trust. But we need the hard tests too, because those are the tests that are most likely to reveal something new, and bring us to a new level of understanding.

Value in Formal Theory Land

What makes a physics theory valuable?

You may think that a theory’s job is to describe reality, to be true. If that’s the goal, we have a whole toolbox of ways to assess its value. We can check if it makes predictions and if those predictions are confirmed. We can assess whether the theory can cheat to avoid the consequences of its predictions (falsifiability) and whether its complexity is justified by the evidence (Occam’s razor, and statistical methods that follow from it).

But not every theory in physics can be assessed this way.

Some theories aren’t even trying to be true. Others may hope to have evidence some day, but are clearly not there yet, either because the tests are too hard or the theory hasn’t been fleshed out enough.

Some people specialize in theories like these. We sometimes say they’re doing “formal theory”, working with the form of theories rather than whether they describe the world.

Physics isn’t mathematics. Work in formal theory is still supposed to help describe the real world. But that help might take a long time to arrive. Until then, how can formal theorists know which theories are valuable?

One option is surprise. After years tinkering with theories, a formal theorist will have some idea of which sorts of theories are possible and which aren’t. Some of this is intuition and experience, but sometimes it comes in the form of an actual “no-go theorem”, a proof that a specific kind of theory cannot be consistent.

Intuition and experience can be wrong, though. Even no-go theorems are fallible, both because they have assumptions which can be evaded and because people often assume they go further than they do. So some of the most valuable theories are valuable because they are surprising: because they do something that many experienced theorists think is impossible.

Another option is usefulness. Here I’m not talking about technology: these are theories that may or may not describe the real world and can’t be tested in feasible experiments; they’re not being used for technology! But they can certainly be used by other theorists. They can show better ways to make predictions from other theories, or better ways to check other theories for contradictions. They can be a basis that other theories are built on.

I remember, back before my PhD, hearing about the consistent histories interpretation of quantum mechanics. I hadn’t heard much about it, but I did hear that it allowed calculations that other interpretations didn’t. At the time, I thought this was an obvious improvement: surely, if you can’t choose based on observations, you should at least choose an interpretation that is useful. In practice, it doesn’t quite live up to the hype. The things it allows you to calculate are things other interpretations would say don’t make sense to ask, questions like “what was the history of the universe” instead of observations you can test like “what will I see next?” But still, being able to ask new questions has proven useful to some, and kept a community interested.

Often, formal theories are judged on vaguer criteria. There’s a notion of explanatory power, of making disparate effects more intuitively part of the same whole. There’s elegance, or beauty, which is the theorist’s Occam’s razor, favoring ideas that do more with less. And there’s pure coolness, where a bunch of nerds are going to lean towards ideas that let them play with wormholes and multiverses.

But surprise, and usefulness, feel more solid to me. If you can find someone who says “I didn’t think this was possible”, then you’ve almost certainly done something valuable. And if you can’t do that, “I’d like to use this” is an excellent recommendation too.

Experiments Should Be Surprising, but Not Too Surprising

People are talking about colliders again.

This year, the European particle physics community is updating its shared plan for the future, the European Strategy for Particle Physics. A raft of proposals at the end of March stirred up a wave of public debate, focused on asking what sort of new particle collider should be built, and discussing potential reasons why.

That discussion, in turn, has got me thinking about experiments, and how they’re justified.

The purpose of experiments, and of science in general, is to learn something new. The more sure we are of something, the less reason there is to test it. Scientists don’t check whether the Sun rises every day. Like everyone else, they assume it will rise, and use that knowledge to learn other things.

You want your experiment to surprise you. But to design an experiment to surprise you, you run into a contradiction.

Suppose that every morning, you check whether the Sun rises. If it doesn’t, you will really be surprised! You’ll have made the discovery of the century! That’s a really exciting payoff, grant agencies should be lining up to pay for…

Well, is that actually likely to happen, though?

The same reasons it would be surprising if the Sun stopped rising are reasons why we shouldn’t expect the Sun to stop rising. A sunrise-checking observatory has incredibly high potential scientific reward…but an absurdly low chance of giving that reward.

Ok, so you can re-frame your experiment. You’re not hoping the Sun won’t rise, you’re observing the sunrise. You expect it to rise, almost guaranteed, so your experiment has an almost guaranteed payoff.

But what a small payoff! You saw exactly what you expected, there’s no science in that!

By either criterion, the “does the Sun rise” observatory is a stupid experiment. Real experiments operate in between the two extremes. They also mix motivations. Together, that leads to some interesting tensions.

What was the purpose of the Large Hadron Collider?

There were a few things physicists were pretty sure of, when they planned the LHC. Previous colliders had measured W bosons and Z bosons, and their properties made it clear that something was missing. If you could collide protons with enough energy, physicists were pretty sure you’d see the missing piece. Physicists had a reasonably plausible story for that missing piece, in the form of the Higgs boson. So physicists could be pretty sure they’d see something, and reasonably sure it would be the Higgs boson.

If physicists expected the Higgs boson, what was the point of the experiment?

First, physicists expected to see the Higgs boson, but they didn’t expect it to have the mass that it did. In fact, they didn’t know anything about the particle’s mass, besides that it should be low enough that the collider could produce it, and high enough that it hadn’t been detected before. The specific number? That was a surprise, and an almost-inevitable one. A rare creature, an almost-guaranteed scientific payoff.

I say almost, because there was a second point. The Higgs boson didn’t have to be there. In fact, it didn’t have to exist at all. There was a much bigger potential payoff, of noticing something very strange, something much more complicated than the straightforward theory most physicists had expected.

(Many people also argued for another almost-guaranteed payoff, and that got a lot more press. People talked about finding the origin of dark matter by discovering supersymmetric particles, which they argued was almost guaranteed due to a principle called naturalness. This is very important for understanding the history…but it’s an argument that many people feel has failed, and that isn’t showing up much anymore. So for this post, I’ll leave it to the side.)

This mix, of a guaranteed small surprise and the potential for a very large surprise, was a big part of what made the LHC make sense. The mix has changed a bit for people considering a new collider, and it’s making for a rougher conversation.

Like the LHC, most of the new collider proposals have a guaranteed payoff. Where the LHC could measure the mass of the Higgs, these new colliders will measure its “couplings”: how strongly it influences other particles and forces.

Unlike the LHC, though, this guarantee is not a guaranteed surprise. Before building the LHC, we did not know the mass of the Higgs, and we could not predict it. On the other hand, now we absolutely can predict the couplings of the Higgs. We have quite precise numbers, our expectation for what they should be based on a theory that so far has proven quite successful.

We aren’t certain, of course, just like physicists weren’t certain before. The Higgs boson might have many surprising properties, things that contradict our current best theory and usher in something new. These surprises could genuinely tell us something about some of the big questions, from the nature of dark matter to the universe’s balance of matter and antimatter to the stability of the laws of physics.

But of course, they also might not. We no longer have that rare creature, a guaranteed mild surprise, to hedge in case the big surprises fail. We have guaranteed observations, and experimenters will happily tell you about them…but no guaranteed surprises.

That’s a strange position to be in. And I’m not sure physicists have figured out what to do about it.

Does Science Require Publication?

Seen on Twitter:

As is traditional, Twitter erupted into dumb arguments over this. Some made fun of Yann LeCun for implying that Elon Musk will be forgotten, which despite any other faults of his seems unlikely. Science popularizer Sabine Hossenfelder pointed out that there are two senses of “publish” getting confused here: publish as in “make public” and publish as in “put in a scientific journal”. The latter tends to be necessary for scientists in practice, but is not required in principle. (The way journals work has changed a lot over just the last century!) The former, Sabine argued, is still 100% necessary.

Plenty of people on Twitter still disagreed (this always happens). It got me thinking a bit about the role of publication in science.

When we talk about what science requires or doesn’t require, what are we actually talking about?

“Science” is a word, and like any word its meaning is determined by how it is used. Scientists use the word “science” of course, as do schools and governments and journalists. But if we’re getting into arguments about what does or does not count as science, then we’re asking about a philosophical problem, one in which philosophers of science try to understand what counts as science and what doesn’t.

What do philosophers of science want? Many things, but a big one is to explain why science works so well. Over a few centuries, humanity went from understanding the world in terms of familiar materials and living creatures to decomposing them in terms of molecules and atoms and cells and proteins. In doing this, we radically changed what we were capable of: computers out of the reach of blacksmiths, and cures for diseases that people couldn’t even tell apart. And while other human endeavors have seen some progress over this time (democracy, human rights…), science’s accomplishment demands an explanation.

Part of that explanation, I think, has to include making results public. Alchemists were interested in many of the things later chemists were, and had started to get some valuable insights. But alchemists were fearful of what their knowledge would bring (especially the ones who actually thought they could turn lead into gold). They published almost exclusively in code. As such, the pieces of progress they made didn’t build up, didn’t aggregate, didn’t become overall progress. It was only when a new scientific culture emerged, when natural philosophers and physicists and chemists started writing to each other as clearly as they could, that knowledge began to build on itself.

Some on Twitter pointed out the example of the Manhattan Project during World War II. A group of scientists got together and made progress on something almost entirely in secret. Does that not count as science?

I’m willing to bite this bullet: I don’t think it does! When the Soviets tried to replicate the bomb, they mostly had to start from scratch, aside from some smuggled atomic secrets. Today, nations trying to build their own bombs know more, but they still must reinvent most of it. We may think this is a good thing, we may not want more countries to make progress in this way. But I don’t think we can deny that it genuinely does slow progress!

At the same time, to contradict myself a bit: I think it makes sense to talk about science that happens within a particular community. The scientists of the Manhattan Project didn’t publish in journals the Soviets could read. But they did write internal reports, they did publish to each other. I don’t think science by its nature has to include the whole of humanity (if it does, then perhaps studying the inside of black holes really is unscientific). You probably can do science sticking to just your own little world. But it will be slower. Better, for progress’s sake, if you can include people from across the world.