In Defense of Shitty Code

Scientific programming has been in the news lately, after doubts were raised about a coronavirus simulation by researchers at Imperial College London. While the doubts appear to have been put to rest, doing so involved digging through some seriously messy code. The whole situation seems to have gotten a lot of people worried. If these people are that bad at coding, why should we trust their science?

I don’t know much about coronavirus simulations; my knowledge there begins and ends with a talk I saw last month. But I know a thing or two about bad scientific code, because I write it. My code is atrocious. And I’ve seen published code that’s worse.

Why do scientists write bad code?

In part, it’s a matter of training. Some scientists have formal coding training, but most don’t. I took two CS courses in college and that was it. Despite that lack of training, we’re expected and encouraged to code. Before I took those courses, I spent a summer working in a particle physics lab, where I was expected to pick up the C++-based interface pretty much on the fly. I don’t think there’s another community out there that has as much reason to code as scientists do, and as little training for it.

Would it be useful for scientists to have more of the tools of a trained coder? Sometimes, yeah. Version control is a big one: I’ve collaborated on papers that used Git and papers that didn’t, and there’s a big difference. There are coding habits that would speed up our work and lead to fewer dead ends, and they’re worth picking up when we have the time.

But there’s a reason we don’t prioritize “proper coding”. It’s because the things we’re trying to do, from a coding perspective, are really easy.

What, code-wise, is a coronavirus simulation? A vector of “people”, really just simple labels, all randomly infecting each other and recovering, with a few parameters describing how likely they are to do so and how long it takes. What do I do, code-wise? Mostly, giant piles of linear algebra.
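
To make that concrete, here is a minimal sketch of the kind of simulation I mean, written in Python. Every name and number in it is invented for illustration; this is not the Imperial College code, just the general shape of a toy agent-based epidemic model:

    # A toy epidemic model: a list of "people" who are susceptible,
    # infected, or recovered. All parameters are made up.
    import random

    N = 1000            # population size
    P_INFECT = 0.0001   # chance per infected person per day of infecting a given susceptible
    RECOVERY_DAYS = 14  # days until an infected person recovers

    state = ['S'] * N         # 'S' susceptible, 'I' infected, 'R' recovered
    days_infected = [0] * N
    state[0] = 'I'            # seed one infection

    for day in range(200):
        infected = [i for i in range(N) if state[i] == 'I']
        for i in range(N):
            if state[i] == 'S' and any(random.random() < P_INFECT for _ in infected):
                state[i] = 'I'
        for i in infected:
            days_infected[i] += 1
            if days_infected[i] >= RECOVERY_DAYS:
                state[i] = 'R'
        print(day, state.count('S'), state.count('I'), state.count('R'))

That’s the whole thing: lists, loops, and a random number generator.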

These are not some sort of cutting-edge programming tasks. These are things people have been able to do since the dawn of computers. These are things that, when you screw them up, become quite obvious quite quickly.

Compared to that, the everyday tasks of software developers, like making a reliable interface for users or efficient graphics, are much more difficult. They’re tasks that really require good coding practices, tasks that just can’t function without them.

For us, the important part is not the coding itself, but what we’re doing with it. Whatever bugs are in a coronavirus simulation, they will have much less impact than, for example, the way in which the simulation includes superspreaders. Bugs in my code give me obviously wrong answers; bad scientific assumptions are much harder for me to root out.

There’s an exception that proves the rule here: when the coding task is actually difficult, scientists step up and write better code. Scientists who want to run efficiently on supercomputers, who worry about numerical error, or who need to simulate on many scales at once learn how to code properly. The code behind the LHC might still be jury-rigged by industry standards, but it’s light-years better than typical scientific code.

I get the furor around the Imperial group’s code. I get that, when a government makes a critical decision, you hope that their every input is as professional as possible. But without getting too political for this blog, let me just say that whatever your politics are, if any of it is based on science, it comes from code like this. Psychology studies, economic modeling, polling…they’re using code, and it’s jury-rigged to hell. Scientists just have more important things to worry about.

11 thoughts on “In Defense of Shitty Code”

  1. vuurklip

    “Shitty” code producing correct results is OK, maybe, but “shitty” code producing garbage is just “shit”. Why would a scientist be rigorous about the design of an experiment and data collection but be sloppy about processing the data? Doesn’t the coding demand the same rigour?

    1. 4gravitons Post author

      Scientists prioritize checking that the code does the right thing scientifically over whether it’s bug-free. That means checking things like “if I close all the malls in this simulation, does this do what I expect it to?” rather than “if I use the same seed do I get the same output?” I don’t know whether or not the Imperial team checked that kind of thing, but if not their scientific negligence is going to matter a lot more than their coding negligence.
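
      To make the contrast concrete: in Python, assuming a hypothetical run_simulation function (the name and its arguments are made up here, not anything from the Imperial code), the two kinds of checks look roughly like this:

          # "Coding" check: same seed, same output (reproducibility).
          def check_reproducibility(run_simulation):
              assert run_simulation(seed=42) == run_simulation(seed=42)

          # "Scientific" check: closing the malls should not increase infections.
          def check_closing_malls(run_simulation):
              baseline = run_simulation(seed=42, close_malls=False)
              intervention = run_simulation(seed=42, close_malls=True)
              assert intervention <= baseline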

      Also, I don’t know if you’ve ever seen experimental setups in a lab, but they’re messy. Really messy. Experimentalists run tests on their setups for a wide variety of things, to determine whether to trust them. But they aren’t following good engineering procedure at all, and they won’t meet standards from that field.

      Basically, if the scientists are doing their job they’ll have tested the scientific output, and that’s much more relevant for catching incorrect results than good coding in these contexts.

  2. Marc Briand

    I am a software developer with a couple decades’ worth of experience. For the most part, I agree with what you’re saying, with some caveats (which I will get to). In fact, we face the same issue within our industry, where a group of developers may have lots of domain knowledge in, say, embedded systems, but are not experts at the programming languages they use (almost always C and C++ in embedded systems) or great at writing code. We make allowances because their domain knowledge may be more important than their coding ability. For example, it might be more important that a developer thoroughly understand the ins and outs of a particular analog-to-digital converter chip, including its unpublished quirks, than the best way to structure the code that talks to the chip. It is not reasonable to expect perfection in everything we do. But we have processes to mitigate the risks inherent in “shitty” code: code reviews and rigorous testing to name a couple.

    Now for the caveats: I have done a fair amount of simulation work myself, and I have been surprised over and over by how devious a simple bug in the simulation can be. The simulation won’t necessarily fail spectacularly, which is what we all would hope for. It can manifest as a gradual error in output Y vs. input X, indicating a trend that does not in fact exist, or worse perhaps, hiding a trend that does. The problem here is that software is not like math, despite our attempts to convince ourselves that it is. In the math world, real numbers do not overflow; they just get bigger and bigger. In the software world, we don’t have real numbers, we have floating-point numbers, which are rather shabby real number wanna-bes that work okay most of the time — except when they don’t. One instance where they don’t is where the inevitable rounding errors inherent in their use are multiplied millions of times.
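
    To see what I mean, here is a tiny Python illustration (the numbers are arbitrary, just large enough to make the rounding visible):

        # 0.1 has no exact binary representation, so each addition rounds a little.
        import math
        n = 10_000_000
        total = 0.0
        for _ in range(n):
            total += 0.1
        print(total)                  # noticeably off from the ideal 1,000,000.0
        print(math.fsum([0.1] * n))   # error-tracking summation stays much closer

    Millions of small roundings add up: the naive sum drifts, while a more careful summation method barely moves.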

    I don’t have evidence to back this up, but I propose that people who write shitty code are more likely to create these kinds of devious bugs than people who write good code. So if you want to defend the practice of scientists writing shitty code, I hope that you will also promote testing that goes beyond mere sanity testing and includes test cases designed to sniff out the pathological behaviors in the simulation.

    Another caveat: it troubles me that you say scientists are encouraged to write code. Why aren’t you encouraged, instead, to learn how to use robust, well-tested simulation packages? In the case of the coronavirus simulation, I find it hard to believe that it was necessary to write a special program for that purpose. As you said yourself, it’s just a bunch of vectors. Although it may sound like it, I’m not being a software development snob here, saying “stay in your lane.” In fact, I give the same advice to my colleagues: if you can, find a way to avoid writing code! Use a standard library or a well-known and exercised open source library instead.

    Hey, man, I love your blog!

    1. 4gravitons Post author

      For the coronavirus simulation, I would guess everything in it would be discrete, so it wouldn’t have the specific problems you mention. But that doesn’t rule out other subtle bugs of course.

      I agree that it would be great to use a standard library or package for a lot of these things. There’s a reason I use Mathematica. The issue is one of demand: most of these topics are small and specialized enough that the professionals haven’t bothered to write anything for them. And given how bad these scientists are at coding, you don’t want them making the package themselves! I do think that there is probably an untapped resource out there of generic packages that turn out to do what a given subfield needs, and that plenty of scientists would probably adopt them if they knew about them. But in many cases the appropriate professional packages just don’t exist.

  3. Jacques Distler

    It really depends on whether this code is a one-off, or whether you either

    a) expect to be using this code years from now or
    b) expect others to pick up/use/extend this code.

    If either of those pertain, then writing crappy code today will lead to a world of pain tomorrow.

    Documenting code, writing tests, … are the sort of boring activities that most scientists (and many professional programmers) would rather have a root canal than do. But if you ever have to pick up someone else’s code or dust off some code that you wrote years ago, you’ll be glad that they (or you) made the effort.

    1. 4gravitons Post author

      Agreed. But I think there’s a difference between “your bad coding is going to cause months of unnecessary grad student headaches” and “your bad coding is going to lead you to publish incorrect conclusions about the world”. The former is common enough in science, and I agree we should be much better about avoiding it. The latter is much rarer.

  4. ohwilleke

    Genuine coding itself is already impressive. You would be stunned at how far up the food chain of big decision making in finance, in economics, and in lots of scientific fields, analysis and decisions are done via Excel spreadsheets. A key economics journal article that reached an incorrect conclusion due to an Excel spreadsheet formula error was actually used as the basis for making major economic policy decisions for about a decade in dozens of countries. Even Mathematica is a step up from that.

  5. nostalgebraist

    Huh… I usually agree with your posts, the ones I understand anyway, but I disagree pretty strongly with this one.

    I don’t see any clear-cut distinction between “easy” programming tasks that don’t require good practices, and “hard” or “cutting-edge” ones that do.

    Most of the code written in the software industry is also doing routine and familiar stuff. There’s nothing new or cutting-edge about “make a database of users with names, passwords and profiles,” or “hook that database up to a website,” or “make the website look sleek and modern and Web 2.0-ish.” These, and other tasks of comparable mundanity, are what software engineers at tech companies are paid to do. If version control, automated testing, modular design, and other “good coding” practices are important at tech companies, it’s not because they are doing novel computational tasks.

    Indeed, I’d argue that good coding practices and “doing cutting-edge programming tasks” are almost mutually exclusive. Good coding practices become well-defined once an area of work is sufficiently well-understood that you can identify routine steps that can be automated, or identify pitfalls that repeatedly come up and develop explicit warnings against them. It’s like coming into a math/physics subfield and cleaning up the notation and concepts after an initial period of groundbreaking but notationally/conceptually somewhat confused work. Good programming, like good notation, is about identifying manual steps that skilled practitioners can already do and making those steps automatic. It’s not about going from impossible to possible, but from possible to easy and from easy to automatic.

    As an example, consider automated testing. Automated testing only makes sense when you have a clear sense of what right and wrong outputs look like — when you’re doing “things that, when you screw them up, become quite obvious quite quickly.” If the tech industry wasn’t doing such things, it wouldn’t do automated testing — but it does, and (for the most part) academia doesn’t. Automated testing would be immensely helpful in numerics: it’s easy to check if you successfully wrote a 4th-order method (try it out, check the order of the errors), so we should be able to automate this, and let anyone verify any method they’ve written with the click of a button. Yet, grad students still spend lots of time doing this “easy” task over and over, sometimes even making new human errors while trying to check for previous human errors (source: personal experience :P). Yes, it’s “obvious” what a wrong output looks like here, it’s “easy” to check for it — but it takes time and effort, and when a human does it instead of a computer, it takes away mental resources that could be allocated to things a computer could not do. The reason this task should be automated is precisely that it is so routine: every unit of human effort expended on it is a purely wasted unit, because it could so unambiguously be done by automation.
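
    For concreteness, here is roughly what such an automated check could look like in Python, using a generic textbook RK4 on a problem with a known answer (a sketch, not anyone’s actual research code):

        import math

        def rk4_step(f, t, y, h):
            # one classical 4th-order Runge-Kutta step
            k1 = f(t, y)
            k2 = f(t + h / 2, y + h * k1 / 2)
            k3 = f(t + h / 2, y + h * k2 / 2)
            k4 = f(t + h, y + h * k3)
            return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

        def solve(f, y0, t_end, h):
            t, y = 0.0, y0
            while t < t_end - 1e-12:
                y = rk4_step(f, t, y, h)
                t += h
            return y

        def error(h):
            # test problem y' = y, y(0) = 1, exact answer e at t = 1
            return abs(solve(lambda t, y: y, 1.0, 1.0, h) - math.e)

        ratio = error(0.1) / error(0.05)
        assert 10 < ratio < 25, ratio   # a genuine 4th-order method gives roughly 2**4 = 16

    Halve the step and the error should drop by a factor of about sixteen; anything wildly different means the method isn’t actually 4th order, and a computer can check that with no human effort at all.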

    Finally, I don’t think the range of scientific problems which could be caused by “bugs” has a narrow or even well-defined scope. If someone says their Covid-19 simulation could have any number of completely arbitrary bugs in it, then I don’t trust anything that comes out of it — as far as I can tell this is the only possible response, on a literal reading of “any number of completely arbitrary bugs in it.” So the argument depends on the idea that sufficiently impactful bugs would be noticed by the developers. But the only way I can trust this assumption is knowing about how the developers checked their code for correctness — which means I’m asking them for good code practice! “Their code must have been ‘good’ enough to produce results that, right or wrong, looked publishable” is a very weak constraint, but without special attention to coding practice, it’s the only constraint we can trust to have been active in the production of academic code.

    1. 4gravitons Post author

      You make a good point about “simple” vs “hard” tasks; on reflection, the line I was trying to draw there doesn’t really make sense. If there’s something sensible left over, maybe it’s a question of scale: you’re perfectly right that most software engineering tasks are not hard from a CS standpoint, but they’re hard from a coordination standpoint, with many contributors and many tasks that need to be done in a uniform and consistent way.

      The scale perspective, at least, seems to line up fairly well with the relevant scientific cultures. Subfields whose coding projects are bigger tend to have better coding practices, in the same way that subfields with more “CS-hard” requirements, like good numerical accuracy, do. That does suggest an important exception, which embarrassingly for me probably includes the Imperial group: if your coding project is atypically complicated for your subfield, the flaws in your coding practices will be atypically relevant.

      The case of testing suites is an interesting one. You’re right to point out that this is only relevant when you’re doing something “easy” enough to check. There’s a bound on the other side too, though: it needs to be “hard enough” that you can’t use existing professional code. That feels like a relatively small window. It certainly happens: if you care a lot about efficiency you might be forced to code up your own Runge-Kutta, while still being able to isolate it enough from the rest of your code that you can do automated testing. But I suspect you’d get a lot of cases outside that range, where your bugs are not “this module that I can cleanly describe in terms of a known problem is wrong” but “some of the code I used to connect the professionally coded known problem bits to each other is wrong”.

      As to why I think those bugs are likely to be caught, even without good coding practices, it’s mostly because I think good scientific practice should be enough by itself to catch relevant bugs. It kind of has to, because not every scientific task uses code at all. If I’m doing a pen-and-paper calculation, I don’t have access to automated testing. If I’m building a physical experiment, I can’t implement industrial quality control. Scientists in those situations figure out their own checks, rooted in what they expect to go wrong scientifically. “How good is the vacuum?” “What’s the background rate?” “Do these numbers add up to zero?” I’m not saying these kinds of checks always work, or that one should always trust scientists: there’s a lot of bad science out there in every field. I just don’t think code is uniquely vulnerable to this kind of thing. I think we’re debating “good coding practices” and not “good pen-and-paper practices” because today there is an industry of professional coders, and no longer an industry of professional pen-and-paper computers, not because scientists’ coding practices are any more suspect than their pen-and-paper ones.

      In terms of COVID simulations specifically, I’m coming from a more skeptical place than it probably seems, which might help illustrate my position. We had a colloquium from a COVID modeler here recently, one of the modelers the Danish government has been consulting. He showed plots of two models, one that included “superspreaders” and one that didn’t. Both seemed to describe current data equally well, but while the “superspreaders” model showed infections continuing to fall after our lockdown ends, the “no superspreaders” model showed them shooting up immediately. The speaker put a positive spin on it, “with more detail it looks like we’re safe”, but my takeaway was that there is still a huge amount of scientific uncertainty in COVID modeling. And if you’re worried about governments taking academic models seriously, that seems a lot more relevant than whether one group’s simulation has subtle bugs.

      My slogan in this post was something like “scientific problems matter more than coding problems”. That has a corollary: if bugs do get through and affect your published work, if a buggy result is good enough to “look publishable”, then that tells me something about your science. It tells me that the scientific constraints on your work are weak enough that I shouldn’t rely on them that much to begin with, that even if your code were perfect your assumptions just haven’t been tested enough.

      (Not saying much about it here, but I agree 100% with your point about wasted effort: I doubt bad code affects our scientific beliefs, but it certainly makes a lot of grad students unnecessarily miserable.)

      1. nostalgebraist

        Thanks for the response.

        I do agree that code is not special: for the scientific enterprise to work, we have to trust that scientists will do all kinds of nontrivial things with sufficient care, and if I don’t trust someone’s ability to write code when their research requires code, maybe I just don’t trust their ability to [X] when their research requires [X] in general.

        And you make a good point about the role of mature scientific ideas in building this trust. If you do an experiment whose outcome is an N-dimensional vector, and your theory rules out all outcomes outside a very narrow N-d region, then a lot of mistakes will produce an obviously wrong result instead of a silently wrong one.

        However, I do want to say that the comparison to other aspects of science cuts both ways. Code is a lot like lab work — it’s experimentalism in a simulated world — but I don’t see lab scientists display the same “not really my area” attitude toward lab work that simulation scientists frequently display towards code. Sure, maybe academic labs look sloppy relative to industry labs, but you wouldn’t ever see a lab experimentalist dismiss “good lab practice” per se as somehow irrelevant to their work. In other words, what is missing is the attitudes which, in other areas of science, work to build the trust we’re talking about.

        The stock negative stereotype of a lab PI is a dictator who micromanages people’s working hours and insists on meticulous record-keeping and impeccable caretaking of the instruments. (I’m sure you’ve seen those scary letters from chemistry PIs that have been passed around in the last few years.) Whereas the stock negative stereotype of a simulation PI is someone who has never seen the code, doesn’t care about it, and thinks of it as some grubby thing beneath them that the postdocs and grad students will work out on their own. It’s as if the director of a chemistry lab were to reveal they had no idea what instruments were in their lab or how they worked! This seems like an important difference when I’m trying to judge how much I trust the end result.

        1. 4gravitons Post author

          Huh, this makes me think that a big part of the difference here is our stereotypes about lab PIs. My stereotype of a lab is a messy place in which everything is held together by duct tape and ritual, where postdocs and grad students pass on half-understood tricks and nothing is properly documented. This is how phdcomics depicts labs, and phdcomics tends if anything to be more accurate than I expect. There are PIs who never show up, and there are PIs who micromanage, but I don’t think the micromanagey ones have a reputation for meticulousness nearly as much as they do for unreasonable expectations of what can be done with the equipment. But I probably haven’t seen the letters you mention.

          I don’t see experimentalists actively dismissing proper lab/engineering practices, but I think that’s because lab/engineering professionals don’t critique them as often. You can demand a scientist publish their code; it’s much harder for them to publicly share a physical experiment. You could critique blueprints and design specs, but often those don’t even exist.

          (Maybe chemistry is different from physics here? Chemistry has a lot of repetitive manual tasks, like pipetting, where you can judge quality; physics has a lot of one-off black magic, like using scotch tape to make layers of graphene.)
