A reader pointed me to Stephen Wolfram’s one-year update of his proposal for a unified theory of physics. I was pretty squeamish about it one year ago, and now I’m even less interested in wading in to the topic. But I thought it would be worth saying something, and rather than say something specific, I realized I could say something general. I thought I’d talk a bit about how we judge good and bad research in theoretical physics.
In science, there are two things we want out of a new result: we want it to be true, and we want it to be surprising. The first condition should be obvious, but the second is also important. There’s no reason to do an experiment or calculation if it will just tell us something we already know. We do science in the hope of learning something new, and that means that the best results are the ones we didn’t expect.
(What about replications? We’ll get there.)
If you’re judging an experiment, you can measure both of these things with statistics. Statistics lets you estimate how likely an experiment’s conclusion is to be true: was there a large enough sample? Strong enough evidence? It also lets you judge how surprising the experiment is, by estimating how likely it would be to happen given what was known beforehand. Did existing theories and earlier experiments make the result seem likely, or unlikely? While you might not have considered replications surprising, from this perspective they can be: if a prior experiment seems unreliable, successfully replicating it can itself be a surprising result.
If instead you’re judging a theoretical result, these measures get more subtle. There aren’t always good statistical tools to test them. Nonetheless, you don’t have to rely on vague intuitions either. You can be fairly precise, both about how true a result is and how surprising it is.
We get our results in theoretical physics through mathematical methods. Sometimes, this is an actual mathematical proof: guaranteed to be true, no statistics needed. Sometimes, it resembles a proof, but falls short: vague definitions and unstated assumptions mar the argument, making it less likely to be true. Sometimes, the result uses an approximation. In those cases we do get to use some statistics, estimating how good the approximation may be. Finally, a result can’t be true if it contradicts something we already know. This could be a logical contradiction in the result itself, but if the result is meant to describe reality (note: not always the case), it might contradict the results of a prior experiment.
What makes a theoretical result surprising? And how precise can we be about that surprise?
Theoretical results can be surprising in the light of earlier theory. Sometimes, this gets made precise by a no-go theorem, a proof that some kind of theoretical result is impossible to obtain. If a result finds a loophole in a no-go theorem, that can be quite surprising. Other times, a result is surprising because it’s something no-one else was able to do. To be precise about that kind of surprise, you need to show that the result is something others wanted to do, but couldn’t. Maybe someone else made a conjecture, and only you were able to prove it. Maybe others did approximate calculations, and now you can do them more precisely. Maybe a question was controversial, with different people arguing for different sides, and you have a more conclusive argument. This is one of the better reasons to include a long list of references in a paper: not to pad your friends’ citation counts, but to show that your accomplishment is surprising: that others might have wanted to achieve it, but had to settle for something lesser.
In general, this means that showing whether a theoretical result is good: not merely true, but surprising and new, links you up to the rest of the theoretical community. You can put in all the work you like on a theory of everything, and make it as rigorous as possible, but if all you did was reproduce a sub-case of someone else’s theory then you haven’t accomplished all that much. If you put your work in context, compare and contrast to what others have done before, then we can start getting precise about how much we should be surprised, and get an idea of what your result is really worth.



