Sabine Hossenfelder had an explainer video recently on how to tell science from pseudoscience. This is a famously difficult problem, so naturally we have different opinions. I actually think the picture she draws is reasonably sound. But while it is a good criterion to tell whether you yourself are doing pseudoscience, it’s surprisingly tricky to apply it to other people.

Hossenfelder argues that science, at its core, is about explaining observations. To tell whether something is science or pseudoscience you need to ask, first, if it agrees with observations, and second, if it *is simpler than* those observations. In particular, a scientist should prefer models with *fewer parameters*. If your model has so many parameters that you can fit any observation, you’re not being scientific.

This is a great rule of thumb, one that as Hossenfelder points out forms the basis of a whole raft of statistical techniques. It does rely on one tricky judgement, though: how many parameters does your model actually have?
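One such technique is the Akaike Information Criterion (AIC), which scores a model's fit and adds a penalty of 2 for every free parameter. A minimal sketch in Python (the data and the two competing models here are synthetic, invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + rng.normal(0, 0.1, x.size)  # the "truth" here is a straight line

def aic(y, y_fit, k):
    # Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k,
    # where k is the number of fitted parameters.
    n = y.size
    rss = np.sum((y - y_fit) ** 2)
    return n * np.log(rss / n) + 2 * k

# Compare a 2-parameter line against a 6-parameter quintic (lower AIC is better).
for degree in (1, 5):
    y_fit = np.polyval(np.polyfit(x, y, degree), x)
    print(degree, aic(y, y_fit, degree + 1))
```

The quintic always achieves a smaller residual, but the 2k penalty means it typically loses on AIC here: its extra parameters don't explain enough to pay for themselves.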

Suppose I’m one of those wacky theorists who propose a whole new particle to explain some astronomical mystery. Hossenfelder, being more conservative in these things, proposes a model with no new particles. Neither of our models fit the data perfectly. Perhaps my model fits a little better, but after all it has one extra parameter, from the new particle. If we want to compare our models, we should take that into account, and penalize mine.

Here’s the question, though: how do I know that Hossenfelder didn’t start out with more particles, and got rid of them to get a better fit? If she did, she had more parameters than I did. She just fit them away.

The problem here is closely related to one called the look-elsewhere effect. Scientists don’t publish everything they try. An unscrupulous scientist can do a bunch of different tests until one of them randomly works, and just publish that one, making the result look meaningful when really it was just random chance. Even if no individual scientist is unscrupulous, a *community* can do the same thing: many scientists testing many different models, until one accidentally appears to work.
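The effect is easy to simulate. A sketch in Python (the numbers are arbitrary, chosen only to make the effect visible): run twenty tests on pure noise, many times over, and see how often at least one of them comes up "significant" by chance.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_tests = 10_000, 20
alpha = 0.05

# Under the null hypothesis (pure noise), p-values are uniform on [0, 1].
p = rng.uniform(size=(n_trials, n_tests))

# Chance that at least one of the 20 null tests looks "significant":
frac = np.mean(p.min(axis=1) < alpha)
print(frac)        # near 1 - 0.95**20, i.e. about 0.64

# A simple look-elsewhere correction (Bonferroni): demand significance
# after accounting for how many places you looked.
frac_corr = np.mean(p.min(axis=1) < alpha / n_tests)
print(frac_corr)   # back near alpha / n_tests, as it should be
```

Publishing only the one "significant" test out of twenty makes a 64% coincidence look like a 5% one; correcting for all the tests you *could* have run restores the honest rate.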

As a scientist, you mostly know if your motivations are genuine. You know if you actually tried a bunch of different models or had good reasons from the start to pick the one you did. As someone judging other scientists, you often don’t have that luxury. Sometimes you can look at prior publications and see all the other attempts someone made. Sometimes they’ll even tell you explicitly what parameters they used and how they fit them. But sometimes, someone will swear up and down that their model is just the most natural, principled choice they could have made, and they never considered anything else. When that happens, how do we guard against the look-elsewhere effect?

The normal way to deal with the look-elsewhere effect is to consider, not just whatever tests the scientist claims to have done, but all tests they could reasonably have done. You need to count *all* the parameters, not just the ones they say they varied.

This works in some fields. If you have an idea of what’s reasonable and what’s not, you have a relatively manageable list of things to look at. You can come up with clear rules for which theories are simpler than others, and people will agree on them.

Physics doesn’t have it so easy. We don’t have any pre-set rules for what kind of model is “reasonable”. If we want to parametrize every “reasonable” model, the best we can do are what are called Effective Field Theories, theories which try to describe every possible type of new physics in terms of its effect on the particles we already know. Even there, though, we need assumptions. The most popular effective field theory, called SMEFT, assumes the forces of the Standard Model keep their known symmetries. You get a different model if you relax that assumption, and even that model isn’t the most general: for example, it still keeps relativity intact. Try to make the most general model possible, and you end up waist-deep in parameter soup.

Subjectivity is a dirty word in science…but as far as I can tell it’s the only way out of this. We can try to count parameters when we can, and use statistical tools…but at the end of the day, we still need to make choices. We need to judge what counts as an extra parameter and what doesn’t, which possible models to compare to and which to ignore. That’s going to be dependent on our scientific culture, on fashion and aesthetics, there just isn’t a way around that. The best we can do is own up to our assumptions, and be ready to change them when we need to.

Henrik Munch: Great blog post Matt! Your last paragraph reminds me of Paul Feyerabend’s so-called epistemological anarchism: although we teach high school students about the scientific method, in reality it’s more of a “whatever works” kind of situation – scientific methodology is arguably much less rigid than scientists would like to admit.

Now, if science is about explaining observations, as Hossenfelder argues, how can we say that folks who spend their entire career on N=8 supergravity or 2-dimensional CFTs are doing science? They would only be explaining observations in a very spin-off kind of way, if ever.


4gravitons (post author): From what I’ve seen Hossenfelder is mostly ok with “formal theory”. Yeah, we aren’t directly describing observations, but we’re understanding the kinds of theories that do describe observations, and that understanding is hopefully useful: more of a spin-in than a spin-off in some sense.


ohwilleke: Coming at this from an applied math background: the chi-square test has a well-established way of evaluating goodness of fit versus the number of parameters in a model and making that tradeoff. Coming up with a model in fundamental physics is more epic than that, of course, and the risk of such a simplistic approach is that your set of data points may not reflect the full range of phenomena you want your theory to explain. Still, you’d think that a generalization of this kind of test could definitively say that one model is better than another, at least in principle.
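A common version of that established tradeoff is the reduced chi-square, which divides a fit’s chi-square by its degrees of freedom, n − k, so that extra parameters only help if they improve the fit enough. A sketch on synthetic data (the two competing models here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 40)
sigma = 0.5                              # assumed measurement error
y = 3.0 + 1.5 * x + rng.normal(0, sigma, x.size)

def reduced_chi2(y, y_fit, n_params):
    chi2 = np.sum(((y - y_fit) / sigma) ** 2)
    dof = y.size - n_params              # degrees of freedom: n - k
    return chi2 / dof

# A 2-parameter line versus a 9-parameter degree-8 polynomial.
for degree in (1, 8):
    fit = np.polyval(np.polyfit(x, y, degree), x)
    print(degree, reduced_chi2(y, fit, degree + 1))
```

A good model lands near a reduced chi-square of 1; the many-parameter model buys a smaller raw chi-square but loses degrees of freedom, so the ratio barely improves or gets worse.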

But this kind of thinking illustrates the problem with treating “which model of the physical world is best” in abstract terms, as an extended example will illustrate. It ignores the fact that deciding which model is better requires you to know yourself well enough to know what you really consider the most important parts of quality (to use Pirsig’s terminology) in what you are looking for. Thinking at a very abstract level, rather than concretely, obscures this, especially when you are comparing the relative desirability of a few dozen quite specific contending models, and not an infinitely variable set of models in hypothetical theory space.

For example, you can do a lot of useful, very high precision science with a proton, neutron, residual strong force (PNRSF) model with 10 experimentally measured parameters: the extremely exactly measured proton and neutron masses, the extremely precisely measured proton and neutron magnetic moments, an extremely precisely measured proton charge radius and neutron radius respectively, three experimentally measured, non-integer coefficients in the Reid potential function (each known to five significant digits), and the half-life of the neutron.

In contrast, the Standard Model (SM) needs fourteen parameters (all of which are measured much less precisely) to describe the same phenomena that you can describe with the proton, neutron and residual strong force model: six quark masses, four CKM matrix elements, two weak force masses (from which the weak force coupling constant can be derived), a QCD coupling constant, and a U(1) coupling constant. And honestly, you really need the three charged lepton masses and the Higgs boson mass as well to really get the calculations right (even though these make only virtual contributions to the phenomena described by the PNRSF model), bringing you to 18 parameters.

So, with the PNRSF you need 10 parameters all measured with exquisite precision, while with the SM you need 18 parameters all measured with profoundly less precision, to describe all of the phenomena you can describe with the PNRSF model.

Furthermore, as you know better than anyone, the math in the SM, should you try to apply it, is profoundly more difficult to use than the math needed to apply the PNRSF model.

And it isn’t as if the domain of applicability where the PNRSF model is useful is small. It encompasses everything you need for chemistry and biology, and everything you need to do nuclear fusion and fission reactions. This is where the vast share of our experimental data about strong-force-bound systems has come from, for all of human history (and, for that matter, probably for the vast majority of the history of any technologically advanced intelligent alien races in other galaxies whom we haven’t encountered yet but who must be out there somewhere too).

So, the PNRSF gives you a much more precise fit to the data, with much easier calculations relative to the SM, for almost all phenomena other than particle colliders, high energy cosmic rays, and neutron stars (and even with neutron stars the PNRSF gives the SM a run for its money, and our observational data isn’t good enough to really discriminate between the two models). So, using the hypothetical set of all data from all experiments ever conducted by human beings (or all sentient beings in the universe, for that matter) regarding phenomena involving systems bound by what the SM calls the strong force, in any naive conceptual generalization of the chi-square test, the PNRSF wins hands down over the SM.

If we just stopped doing particle accelerator experiments that aren’t “natural” in the non-technical and non-quantitative sense of that word, the closeness of fit and utility of the PNRSF, which has ten fewer experimentally measured parameters (almost half as many), would make it even more obviously the superior choice relative to the SM. We could just throw our QCD and weak force theory textbooks in the trash and put all of the world’s amplitudologists on the dole, at a substantial economic savings that could be used to pay athletes and movie stars and viral YouTube stars better.

But of course, unlike in a chi-square analysis, we actually prefer a model that has a vastly greater domain of applicability than the PNRSF, even at the cost of a less precise fit to the data in the cases where the PNRSF is applicable, less accurately measured parameters, and far more difficult math.

The SM produces intelligible, consistent results that are probably a pretty decent description of reality all of the way up to at least the GUT energy scale of 10^16 GeV, while you start to see mesons that the PNRSF is at a loss to explain like pions and kaons at the 2 GeV energy scale and higher. So, for 99.999999999999999% of the energy scale ranges from zero to a potential GUT scale UV completion, the SM kicks the PNRSF’s butt.

Yes, in the 0-2 GeV range the PNRSF model is superior. And yes, the 0-2 GeV domain of applicability is found in almost all of the space-time in our universe in the 13.8 billion years that have elapsed after the first hour after the Big Bang, even if you ignore the vacuum bits and just look at the energy scale or temperature of baryonic matter in the universe on a mass-weighted basis, most of which is in ordinary stars (as opposed to neutron stars and black holes), which are still pretty damn hot. This is true even though the SM won’t tell you what the inside of a black hole looks like, so it isn’t truly complete over the entire conceivable domain of applicability of a model describing the laws of nature either. But the SM still wins the domain of applicability contest, because it has the largest domain of applicability for a model that actually works and that also describes the phenomena of the PNRSF.

Intuitively, we know that the SM is more fundamental and hence a preferred and better description of the laws of the universe, even though it has almost twice as many parameters, each of which is less precisely measured than those of the PNRSF, and even though it is much harder to do calculations with, sometimes taking months or years on powerful computers at fairly low precision, instead of an hour or two tops in the PNRSF, for comparable physical systems.

So, why are we willing to pay the price of more parameters, less accuracy and more work to use it?

Ultimately, it is a matter of all or nothing thinking, which is usually a fallacy, but not here.

The whole point of knowing the “laws of the universe” is to have a model that is tolerably accurate for the maximal domain of applicability – one that comes tolerably close for 99% of the theoretically possible cases, rather than for 2 divided by 10^16, times 100, percent of the theoretically possible cases.

If we came up with a model that added ten more parameters to the SM, had half the precision of the SM, and took ten times as long to do calculations with, but provided absolute certainty that it gave roughly correct answers from the GUT scale up to infinitely high energies, we’d demote the SM to the low-glamour but hard-working status of the PNRSF model today (as we already do when we patronizingly call the SM a low energy effective theory) in a heartbeat, in favor of that more byzantine, harder to work with, and less precise, infinitely-high-energy super-standard model.

The bottom line is that goodness of fit to observations and fewer parameters are the marks of a superior model, but only to the extent that you are making an apples to apples comparison of models with the same domain of applicability.

If, however, you have two models with different domains of applicability, the model with a significantly greater domain of applicability, provided it achieves even crude goodness of fit over the entire range where it applies, is still going to be viewed as better and more fundamental, even if it is more cumbersome and even if it has more parameters, so long as its number of parameters is roughly on the same order of magnitude, within a power of ten or so. We’d still prefer the infinitely-high-energy super-standard model to the Standard Model even if it had 250 parameters (as some maximal SUSY models do) rather than the 25 or so parameters of the SM.

Of course, this is partially true because science is an all-you-can-eat buffet and not a prix fixe menu from which you may choose only one selection.

When we say that the SM is superior and more fundamental than the PNRSF model, we still get to choose which of those two models we want to use for any particular problem.

Indeed, in real life applications, we’ll choose to use the clunky periodic table of elements and table of isotopes model (i.e. the CRC Handbook of Chemistry and Physics), with hundreds of parameters, over the extremely elegant and beautiful ten-parameter PNRSF model whose parameters are measured more precisely.

Why?

Because it’s cumbersome to do the math from scratch every time using the PNRSF model, and because even with a very skilled practitioner, the PNRSF model is more vulnerable to conceptual and theoretical user error in its application to a particular problem than the directly measured values of the hundreds of parameters in the Handbook of Chemistry and Physics, which is far more robust in the face of use by less skilled practitioners and provides a result much more quickly.

Declaring one model the king of physics, the one that constitutes the True Laws of Nature, doesn’t banish the others from their roles as courtiers in the aristocracy of good scientific models, who have their uses at the appropriate moments (which is actually most of the time).

Similarly, just because you have a theory of quantum gravity, and general relativity, and Newtonian gravity, doesn’t mean that you can’t use the approximation that gravity is a uniform acceleration pulling towards the center of the Earth at 9.8 meters per second squared, when that simplification is good enough to get the job done. The theory of quantum gravity, which would have the most expansive domain of applicability, would still be preferred as the more fundamental law of Nature, even if the calculations you do with it are less precise than comparable calculations done with general relativity, or with Newtonian gravity, which has an even narrower domain of applicability.
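How good that uniform-acceleration simplification is can be quantified directly. A quick sketch, using standard values for G and for Earth’s mass and radius, of how far a constant surface g drifts from the Newtonian 1/r² value at various altitudes:

```python
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
M = 5.972e24           # Earth's mass, kg
R = 6.371e6            # Earth's mean radius, m

g_surface = G * M / R**2           # about 9.82 m/s^2

# Fractional error of the constant-g approximation at altitude h:
for h in (100.0, 10_000.0, 400_000.0):   # building, airliner, ISS orbit
    g_h = G * M / (R + h)**2
    print(f"h = {h:>9.0f} m: error {100 * (g_surface - g_h) / g_h:.3f}%")
```

For everyday altitudes the error is a small fraction of a percent, which is exactly why the simplification gets the job done; only far from the surface does the narrower domain of applicability start to show.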


4gravitons (post author): Great analysis!

I think there is another way to characterize why we prefer the SM to the PNRSF, one that makes the justification match Chi-squared-like reasoning a bit better. (Here I mean “prefer to believe in”, rather than “prefer to use”.)

The idea is that new theories don’t merely give new models for a fixed set of data points: they also change which data points we view as distinct. If you think of your data as every observation someone has ever made, each as its own data point, then the tiny amount of data from high-energy colliders is swamped by the sum of all chemistry and nuclear physics from all recorded history. But once we have quantum field theory, we don’t think of all of those chemistry and nuclear physics experiments as independent data points: we think of them as measuring a few common observables, at a few points along a large range of energies. And once you have that perspective, you can (probably) justify the SM on chi-squared grounds, because the weight of the collider data gets much closer to the weight of the pre-collider data.

That does require that change in perspective though, and that doesn’t seem like the kind of thing we can formalize. On some level you have to already buy into the new theory to accept that reasoning, you can’t get there by pure thought alone.
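A toy illustration of that reweighting (all the dataset sizes and the count of “underlying observables” here are invented for the sake of the example):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: a million routine low-energy measurements that,
# after the change of perspective, are repeated draws of a few common
# observables, plus a hundred collider points.
low_energy = rng.normal(1.0, 0.1, 1_000_000)
collider = rng.normal(5.0, 0.5, 100)

# Naive view: every observation is its own data point, so the
# low-energy data outweighs the collider data 10,000 to 1.
print(low_energy.size / collider.size)

# QFT-informed view: bin the million measurements into, say, 10
# underlying observables (averages with correspondingly tiny error
# bars), and the two datasets carry comparable weight in a fit.
binned = low_energy.reshape(10, -1).mean(axis=1)
print(binned.size, collider.size)
```

The information isn’t thrown away by binning, it just migrates into the precision of the 10 averaged values, which is the sense in which the collider data stops being swamped.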

(By the way, is it actually that much easier to use the PNRSF in practice, for a typical problem of relevance? I had the impression that modeling large nuclei is still pretty hard, involves supercomputers, etc., and I don’t think those people are using full-on QCD.)


ohwilleke: “is it actually that much easier to use the PNRSF in practice, for a typical problem of relevance? I had the impression that modeling large nuclei is still pretty hard, involves supercomputers, etc., and I don’t think those people are using full-on QCD”

Yes. For example, this is the theory that was used to design the A-bomb, the H-bomb, and almost all of the commercial and naval nuclear fission reactors in use today (all of which predated the Standard Model and QCD entirely), and this was done with slide rules and punch card computers that a college student’s graphing calculator could put to shame.

It was also used to generate the original models of neutron star structure, which continue to be the starting point and benchmark for most sophisticated QCD-driven analyses of more complex possibilities like quark stars and strange stars, again in the pre-silicon-processor, pre-keyboard-UI era. It continues to be an important tool in evaluating the shape of atomic nuclei in potentially unstable isotopes and in looking for “islands of stability” among hypothetical, not-yet-observed isotopes. You still need lots of computing power and analytical chops to formulate and solve many-body problems using the PNRSF model in complex systems, but it is far less finicky and uses far less computing power than, for example, lattice QCD methods of evaluating the same thing.

If you use the PNRSF you have one empirically validated analytical expression for interactions between nucleons, to which you have to make an EM adjustment. If you do the same thing with QCD, you have to do propagator calculations separately for several mesons, and in practice have to use empirical data to estimate the rates at which three kinds of pions with varying properties, and three or four other mesons with second or third order impacts reflected by additional terms in the RSF equation, are exchanged between every nucleon and every other nucleon, which is vastly more difficult and takes much more computing power.

Usually, for a nucleus with lots of nucleons, you can use symmetry arguments, clever framing of the question, and resorts to average values (which you then have to separately validate as appropriate) to greatly simplify the problem relative to a true N-body analysis.
