Fields and Scale

I am a theoretical particle physicist, and every morning I check the arXiv.

arXiv.org is a type of website called a preprint server. It’s where we post papers before they are submitted to (and printed by) a journal. In practice, everything in our field shows up on arXiv, publicly accessible, before it appears anywhere else. There’s no peer review process on arXiv (the journals still handle that), but in our field peer review doesn’t often catch substantive errors. So we almost never read the journals: we just check arXiv.

And so every day, I check the arXiv. I go to the section on my sub-field, and I click on a link that lists all of the papers that were new that day. I skim the titles, and if I see an interesting paper I’ll read the abstract, and maybe download the full thing. Checking as I’m writing this, there were ten papers posted in my field, and another twenty “cross-lists” were posted in other fields but additionally classified in mine.

Other fields use arXiv: mathematicians and computer scientists and even economists use it in roughly the same way physicists do. For biology and medicine, though, there are different, newer sites: bioRxiv and medRxiv.

One thing you may notice is the different capitalization. When physicists write arXiv, the “X” is capitalized. In the logo, it looks like a Greek letter chi, thus saying “archive”. The biologists and medical researchers capitalize the R instead. The logo still has an X that looks like a chi, but positioned with the R it looks like the Rx of medical prescriptions.

Something I noticed, but you might not, was the lack of a handy link to see new papers. You can search medRxiv and bioRxiv, and filter by date. But there’s no link that directly takes you to the newest papers. That suggests that biologists aren’t using bioRxiv the way we use arXiv, checking the new papers every day.

I was curious if this had to do with the scale of the field. I have the impression that physics and mathematics are smaller fields than biology, and that much less physics and mathematics research goes on than medical research. Certainly, theoretical particle physics is a small field. So I might have expected arXiv to be smaller than bioRxiv and medRxiv, and I certainly would expect fewer papers in my sub-field than papers in a medium-sized subfield of biology.

On the other hand, arXiv in my field is universal. In biology, bioRxiv and medRxiv are still quite controversial. More and more people are using them, but not every journal accepts papers posted to a preprint server. Many people still don’t use these services. So I might have expected bioRxiv and medRxiv to be smaller.

Checking now, neither answer is quite right. I looked between November 1 and November 2, and asked each site how many papers were uploaded between those dates. arXiv had the most, 604 papers. bioRxiv had roughly half that many, 348. medRxiv had 97.

arXiv represents multiple fields, while bioRxiv is “just” biology. Specializing, on that day arXiv had 235 physics papers, 135 mathematics papers, and 250 computer science papers. So each individual field has fewer papers than biology in this period.

Specializing even further, I can look at a subfield. My subfield, which is fairly small, had 20 papers between those dates. Cell biology, which I would expect to be quite a big subfield, had 33.

Overall, the numbers were weirdly comparable, with medRxiv unexpectedly small compared to both arXiv and bioRxiv. I’m not sure whether there are more biologists than physicists, but I’m pretty sure there should be more cell biologists than theoretical particle physicists. This suggests that many biologists still aren’t using bioRxiv. It makes me wonder: will bioRxiv grow dramatically in future? Are the people running it ready if it does?

3 thoughts on “Fields and Scale”

  1. AZ

    My current fun hypothesis is that the use of preprint servers in a field is inversely correlated with the ease of fraud in that field. For example, in hep-th, why even publish at all: cranks will be disregarded immediately since they don’t speak the mathematical language and give themselves away after one paragraph, and genuine errors can be uncovered by everyone as there’s no data that needs to be trusted. You can’t hide on math-ph or hep-th!

    Fully deductive (in the philosophical sense of the term) fields have absolutely no need to publish in a journal. Whereas data-dependent, inductive sciences need authorities and safety checks to guard against fraudulent data, necessitating reviewers and journals. Case in point: a news outlet saying “a new medRxiv preprint claims that…” makes the hair on your neck stand up like no other sentence can. As such, I doubt bioRxiv and medRxiv will ever reach a popularity even close to arXiv.

    1. 4gravitons Post author

      The principle makes sense, but it relies on the belief that peer review is actually capable of checking for fraudulent data (or other ways an empirical article can be wrong: crappy statistics, garden of forking paths, etc.). It can do that sometimes, but (I have the impression) it’s not like peer reviewers get to look at the initial dataset, they just get the same information everyone else who reads the paper does. So while it might be harder to hide in the life sciences, I don’t think it’s necessarily harder to hide in a way that peer reviewers can help with.

      (If anything it kind of goes in the other direction: in pure math, the peer reviewers are genuinely doing something that a casual reader isn’t, namely going through the whole proof and making sure it works. That does sometimes catch errors. It’s something that culturally physicists don’t often do (with some dedicated exceptions), even when it’s a topic that could benefit from the same approach. But that isn’t typically possible for a data-based paper.)

      I think you’re on to something with the mention of news though. One reason that it’s less important to rely on journals in physics is just because the stakes are lower. If the popular press is going to be spreading your conclusions and people will use them for their own health, then you probably want an extra guarantee that at least someone sensible has looked at it. While if a news outlet wants to say something about a dubious result in theoretical physics then it’s either pure entertainment level or it’s something like Quanta Magazine, where they have people who know how to ask the right questions and figure out whether a paper holds up.

  2. ohwilleke

    “will bioRxiv grow dramatically in future?”

    I hope so. It would be better yet for all of the archive preprint sites to merge.

    I read five of the top-level arXiv preprint categories every day, and read bioRxiv when someone else has called attention to papers there (my main interest is mostly genetics and especially ancient DNA studies).

    These sites are to the diffusion of scientific knowledge through old-school scientific journals what Spotify is to FM radio. The amount of access they provide at a minimal cost (monthly broadband Internet access fees and a laptop) to a casually interested outsider is immense.

    Why limit yourself to reading the press releases at Science Daily when you can read the primary sources yourself?
