As the saying goes, it is better not to see laws or sausages being made. You’d prefer to see the clean package on the outside than the mess behind the scenes.
The same is true of science. A good paper tells a nice, clean story: a logical argument from beginning to end, with no extra baggage to slow it down. That story isn’t a lie: for any decent paper in theoretical physics, the conclusions will follow from the premises. Most of the time, though, it isn’t how the physicist actually did it.
The way we actually make discoveries is messy. It involves looking for inspiration in all the wrong places: pieces of old computer code and old problems, trying to reproduce this or that calculation with this or that method. In the end, once we find something interesting enough, we can reconstruct a clearer, cleaner story, something actually fit to publish. We hide the original mess partly for career reasons (easier to get hired if you tell a clean, heroic story), partly to be understood (a paper that embraced the mess of discovery would be a mess to read), and partly just due to that deep human instinct to not let others see us that way.
The trouble is, some of that “mess” is useful, even essential. And because it’s never published or put into textbooks, the only way to learn it is word of mouth.
A lot of these messy tricks involve numerics. Many theoretical physics papers derive things analytically, writing out equations in symbols. It’s easy to make a mistake in that kind of calculation, either by writing something wrong on paper or by introducing a bug in computer code. To catch mistakes, we check many things numerically: we plug in numbers to make sure everything still works. Sometimes this means using an approximation, checking that two things cancel to a large enough number of decimal places. Sometimes instead it’s exact: we plug in prime numbers, and can much more easily see if two things are equal, or if something is rational or contains a square root. Sometimes numerics aren’t just used to check something, but to find a solution: exploring many options in an easier numerical calculation, finding one that works, and then redoing it analytically.
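To make the two kinds of numerical check concrete, here is a minimal Python sketch. The functions lhs and rhs are hypothetical stand-ins for a long expression from an analytic calculation and its claimed simplification; they are not taken from any real paper. The approximate check compares floating-point values at random points to about twelve significant digits. The exact check plugs in pairs of primes as exact rationals using Python’s fractions module, so agreement means genuine equality rather than equality up to rounding.

```python
import math
import random
from fractions import Fraction


def lhs(x, y):
    # Hypothetical "long form" of an expression, as it might come out of
    # a lengthy analytic calculation.
    return 1 / (x * (x + y)) + 1 / (y * (x + y))


def rhs(x, y):
    # Hypothetical "simplified" result we want to check against lhs.
    return 1 / (x * y)


# Approximate check: random floating-point inputs, agreement to ~12 digits.
random.seed(0)
for _ in range(5):
    x, y = random.uniform(1, 10), random.uniform(1, 10)
    assert math.isclose(lhs(x, y), rhs(x, y), rel_tol=1e-12)

# Exact check: plug in distinct primes as exact rationals, so any mismatch
# shows up as a nonzero rational difference instead of rounding noise.
for x, y in [(2, 3), (5, 7), (11, 13)]:
    assert lhs(Fraction(x), Fraction(y)) == rhs(Fraction(x), Fraction(y))

print(lhs(Fraction(2), Fraction(3)))  # → 1/6
```

With exact inputs, a wrong result also shows up as an exact rational, whose numerator and denominator can often be factored back into the primes that were plugged in, which hints at where the error came from.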
“Ansatze” are also common: our fancy word for an educated guess. These we sometimes admit, when they’re at the core of a new scientific idea. But the more minor examples go unmentioned. If a paper shows a nice clean formula and proves it’s correct, but doesn’t explain how the authors got it…probably, they used an ansatz. This trick can go hand-in-hand with numerics as well: make a guess, check that it matches the right numbers, then try to see why it’s true.
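One common way to combine an ansatz with numerics is to guess a functional form with unknown coefficients, fix the coefficients from a few exactly computed values, and then test the guess on values that were not used. The sketch below assumes a toy problem, the sum of the first n squares, and a cubic polynomial ansatz; solve_exact is a hypothetical helper doing exact Gauss-Jordan elimination over fractions, not a routine from any particular package.

```python
from fractions import Fraction


def solve_exact(matrix, rhs):
    """Solve a square, nonsingular linear system exactly (Gauss-Jordan over Fractions)."""
    n = len(matrix)
    # Augmented matrix of exact rationals.
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero pivot in this column and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalize the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def target(n):
    # Toy stand-in for a quantity we can only compute case by case.
    return sum(k * k for k in range(1, n + 1))


def basis(n):
    # Ansatz: target(n) = c3*n^3 + c2*n^2 + c1*n + c0 with unknown rational c's.
    return [n**3, n**2, n, 1]


# Fix the four unknowns from four data points...
sample = [1, 2, 3, 4]
coeffs = solve_exact([basis(n) for n in sample], [target(n) for n in sample])
print(coeffs)  # → [Fraction(1, 3), Fraction(1, 2), Fraction(1, 6), Fraction(0, 1)]

# ...then check the guess on points that were NOT used in the fit.
for n in range(5, 50):
    assert sum(c * b for c, b in zip(coeffs, basis(n))) == target(n)
```

Only the final loop gives real evidence: with as many data points as unknowns the fit itself always succeeds. Seeing why the formula holds is then the remaining, analytic step.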
The messy tricks can also involve the code itself. In my field we often use “computer algebra” systems, programs that do our calculations for us. These systems are programming languages in their own right, and we need to write computer code for them. That code gets passed around informally, but it is almost never standardized. Mathematical concepts that come up again and again can be implemented very differently by different people, some much more efficiently than others.
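As an illustration of how one concept can be coded with very different efficiency, the sketch below computes nested harmonic sums, S_{a1,a2,...}(n) = sum over i from 1 to n of S_{a2,...}(i)/i^{a1}, with the sum over no indices set to 1, in two ways. These sums are only an example chosen here (they do appear in some perturbative calculations), and the code is plain Python with exact fractions rather than a computer algebra system. The names harmonic_naive and harmonic_running are hypothetical.

```python
from fractions import Fraction


def harmonic_naive(indices, n):
    """S_{a1,a2,...}(n) written directly as nested recursion."""
    if not indices:
        return Fraction(1)  # empty index list: S(n) = 1
    a, rest = indices[0], indices[1:]
    # Recomputes every inner sum from scratch for each outer i.
    return sum(
        (Fraction(1, i**a) * harmonic_naive(rest, i) for i in range(1, n + 1)),
        Fraction(0),
    )


def harmonic_running(indices, n):
    """Same sums, built from the innermost index outward with running totals."""
    # inner[i] holds S_{rest}(i) for i = 0..n; for no indices it is 1.
    inner = [Fraction(1)] * (n + 1)
    for a in reversed(indices):
        outer = [Fraction(0)] * (n + 1)
        total = Fraction(0)
        for i in range(1, n + 1):
            total += Fraction(1, i**a) * inner[i]
            outer[i] = total  # outer[i] = S_{a, rest}(i)
        inner = outer
    return inner[n]


print(harmonic_running((1, 1), 2))  # → 7/4
assert harmonic_naive((2, 1, 3), 8) == harmonic_running((2, 1, 3), 8)
```

Both functions give identical exact results, but the naive recursion revisits the same inner sums over and over, so its cost grows like a power of n set by the number of indices, while the running-total version makes a single pass over i per index.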
I don’t think it’s unreasonable that we leave “the mess” out of our papers. They would certainly be hard to understand otherwise! But it’s a shame we don’t publish our dirty tricks somewhere, even in special “dirty tricks” papers. Students often start out assuming everything is done the clean way, and begin to doubt themselves when they find that the clean way is much too slow to make progress. Learning the tricks is a big part of learning to be a physicist. We should find a better way to teach them.
This is a very interesting point. I believe that these “tricks” and this kind of expertise, which usually do not appear in scientific papers, can be learned from good master’s and PhD theses; this is one of the most important roles of a thesis. My former PhD advisor always said that a thesis will be read by current and future students, so all the important details must be carefully written down. This is even more fundamental for us as theoretical physicists.
I’m a big fan of pushing for a cultural shift where all papers include links (maybe to a permanent repository hosted by the journal?) to a folder containing all of the messy Mathematica, MATLAB, and Python scripts used in the work. And, even though the university offices responsible for printing would hate it, all theses should include these scripts in their entirety within the thesis itself.
I firmly believe I could have shaved two years off my PhD if I had just been able to see the unnecessarily secretive code used by some of the people whose work mine was built on. Months and months were wasted wondering, “How on earth are these people doing this?” Maybe if we really enforced good code-commenting practice early on, people wouldn’t be too embarrassed to share the nitty-gritty dirty details. I suspect that embarrassment is the source of a lot of the secrecy.
Yeah, some of this is just ugly code, to the point that it’s extremely hard to read. One of my first papers involved modifying someone else’s code, and in this case it was published code, in a public package…and the commenting was still bad enough that most of the project was just figuring out how it actually worked.
I agree that learning good commenting, and more general “good programmer habits” early on might benefit a lot of scientists.
Excellent post!
I think many people have realized this, as I see more and more articles on the arXiv that come with code, Mathematica notebooks or other supplementary material. For my own work I use GitHub to store the code and link it to the arXiv entry through https://paperswithcode.com (the “Code & Data” tab below the abstract). It’s very easy.
But in fact I have only recently started to do so myself, and I can see that most of my collaborators are not yet aware of this possibility. Of course this requires a bit more work and discipline in terms of coding habits, but everybody has something to gain from it. Not least the authors! I find it amazingly hard to go back to my own computations from only a couple of years ago if I didn’t properly document the code at the time…