Tag Archives: arXiv

A Field That Doesn’t Read Its Journals

Last week, the University of California system ended negotiations with Elsevier, one of the top academic journal publishers. UC had been trying to get Elsevier to switch to a new type of contract, one in which instead of paying for access to journals they pay for their faculty to publish, then make all the results openly accessible to the public. In the end they couldn’t reach an agreement and thus didn’t renew their contract, cutting Elsevier off from millions of dollars and their faculty from reading certain (mostly recent) Elsevier journal articles. There’s a nice interview here with one of the librarians who was sent to negotiate the deal.

I’m optimistic about what UC was trying to do. Their proposal sounds like it addresses some of the concerns raised here with open-access systems. Currently, journals that offer open access often charge fees directly to the scientists publishing in them, fees that have to be scrounged up from somebody’s grant at the last minute. By setting up a deal for all their faculty together, UC would have avoided that. While the deal fell through, having an organization as big as the whole University of California system advocating open access (and putting the squeeze on Elsevier’s profits) seems like it can only lead to progress.

The whole situation feels a little surreal, though, when I compare it to my own field.

At the risk of jinxing it, my field’s relationship with journals is even weirder than xkcd says.

arXiv.org is a website that hosts what are called “preprints”, which originally meant papers that haven’t been published yet. They’re online, freely accessible to anyone who wants to read them, and will be for as long as arXiv exists to host them. Essentially everything anyone publishes in my field ends up on arXiv.

Journals don’t mind, in part, because many of them are open-access anyway. There’s an organization, SCOAP3, that runs what is in some sense a large-scale version of what UC was trying to set up: instead of paying for subscriptions, university libraries pay SCOAP3 and it covers the journals’ publication costs.

This means that there are two coexisting open-access systems, the journals themselves and arXiv. But in practice, arXiv is the one we actually use.

If I want to show a student a paper, I don’t send them to the library or the journal website, I tell them how to find it on arXiv. If I’m giving a talk, there usually isn’t room for a journal reference, so I’ll give the arXiv number instead. In a paper, we do give references to journals…but they’re most useful when they have arXiv links as well. I think the only times I’ve actually read an article in a journal were for articles so old that arXiv didn’t exist when they were published.

We still submit our papers to journals, though. Peer review still matters, we still want to determine whether our results are cool enough for the fancy journals or only good enough for the ordinary ones. We still put journal citations on our CVs so employers and grant agencies know not only what we’ve done, but which reviewers liked it.

But the actual copy-editing and formatting and publishing, that the journals still employ people to do? Mostly, it never gets read.

In my experience, that editing isn’t too impressive. Often, it’s about changing things to fit the journal’s preferences: its layout, its conventions, its inconvenient proprietary document formats. I haven’t seen them try to fix grammar, or improve phrasing. Maybe my papers have unusually good grammar, maybe they do more for other papers. And maybe they used to do more, when journals had a more central role. But now, they don’t change much.

Sometimes the journal version ends up on arXiv, if the authors put it there. Sometimes it doesn’t. And sometimes the result is in between. For my last paper about Calabi-Yau manifolds in Feynman diagrams, we got several helpful comments from the reviewers, but the journal also weighed in to get us to remove our more whimsical language, down to the word “bestiary”. For the final arXiv version, we updated for the reviewer comments, but kept the whimsical words. In practice, that version is the one people in our field will read.

This has some awkward effects. It means that sometimes important corrections don’t end up on arXiv, and people don’t see them. It means that technically, if someone wanted to insist on keeping an incorrect paper online, they could, even if a corrected version was published in a journal. And of course, it means that a large amount of effort is dedicated to publishing journal articles that very few people read.

I don’t know whether other fields could get away with this kind of system. Physics is small. It’s small enough that it’s not so hard to get corrections from authors when one needs to, small enough that social pressure can get wrong results corrected. It’s small enough that arXiv and SCOAP3 can exist, funded by universities and private foundations. A bigger field might not be able to do any of that.

For physicists, we should keep in mind that our system can and should still be improved. For other fields, it’s worth considering whether you can move in this direction, and what it would cost to do so. Academic publishing is in a pretty bizarre place right now, but hopefully we can get it to a better one.

A Paper About Ranking Papers

If you’ve ever heard someone list problems in academia, citation-counting is usually near the top. Hiring and tenure committees want easy numbers to judge applicants with: number of papers, number of citations, or related statistics like the h-index. Unfortunately, these metrics can be gamed, leading to a host of bad practices that get blamed for pretty much everything that goes wrong in science. In physics, it’s not even clear that these statistics tell us anything: papers in our field have been including more citations over time, and for thousand-person experimental collaborations the number of citations and papers don’t really reflect any one person’s contribution.

It’s pretty easy to find people complaining about this. It’s much rarer to find a proposed solution.

That’s why I quite enjoyed Alessandro Strumia and Riccardo Torre’s paper last week, on Biblioranking fundamental physics.

Some of their suggestions are quite straightforward. With the number of citations per paper increasing, it makes sense to divide each paper by the number of citations it contains: it means more to get cited by a paper with ten citations than by a paper with one hundred. Similarly, you could divide credit for a paper among its authors, rather than giving each author full credit.

Some are more elaborate. They suggest using a variant of Google’s PageRank algorithm to rank papers and authors. Essentially, the algorithm imagines someone wandering from paper to paper and tries to figure out which papers are more central to the network. This is apparently an old idea, but by combining it with their normalization by number of citations they eke a bit more mileage from it. (I also found their treatment a bit clearer than the older papers they cite. There are a few more elaborate setups in the literature as well, but they seem to have a lot of free parameters so Strumia and Torre’s setup looks preferable on that front.)

One final problem they consider is that of self-citations, and citation cliques. In principle, you could boost your citation count by citing yourself. While that’s easy to correct for, you could also be one of a small number of authors who cite each other a lot. To keep the system from being gamed in this way, they propose a notion of a “CitationCoin” that counts (normalized) citations received minus (normalized) citations given. The idea is that, just as you can’t make anyone richer just by passing money between your friends without doing anything with it, so a small community can’t earn “CitationCoins” without getting the wider field interested.

There are still likely problems with these ideas. Dividing each paper by its number of authors seems like overkill: a thousand-person paper is not typically going to get a thousand times as many citations. I also don’t know whether there are ways to game this system: since the metrics are based in part on citations given, not just citations received, I worry there are situations where it would be to someone’s advantage to cite others less. I think they manage to avoid this by normalizing by number of citations given, and they emphasize that PageRank itself is estimating something we directly care about: how often people read a paper. Still, it would be good to see more rigorous work probing the system for weaknesses.

In addition to the proposed metrics, Strumia and Torre’s paper is full of interesting statistics about the arXiv and InSpire databases, both using more traditional metrics and their new ones. Whether or not the methods they propose work out, the paper is definitely worth a look.

An Elliptical Workout

I study scattering amplitudes, probabilities that particles scatter off each other.

In particular, I’ve studied them using polylogarithmic functions. Polylogarithmic functions can be taken apart into “logs”, which obey identities much like logarithms do. They’re convenient and nice, and for my favorite theory of N=4 super Yang-Mills they’re almost all you need.

Well, until ten particles get involved, anyway.

That’s when you start needing elliptic integrals, and elliptic polylogarithms. These integrals substitute one of the “logs” of a polylogarithm with an integration over an elliptic curve.

And with Jacob Bourjaily, Andrew McLeod, Marcus Spradlin, and Matthias Wilhelm, I’ve now computed one.

tenpointimage

This one, to be specific

Our paper, The Elliptic Double-Box Integral, went up on the arXiv last night.

The last few weeks have been a frenzy of work, finishing up our calculations and writing the paper. It’s the fastest I’ve ever gotten a paper out, which has been a unique experience.

Computing this integral required new, so far unpublished tricks by Jake Bourjaily, as well as some rather powerful software and Mark Spradlin’s extensive expertise in simplifying polylogarithms. In the end, we got the integral into a “canonical” form, one other papers had proposed as the right way to represent it, with the elliptic curve in a form standardized by Weierstrass.

One of the advantages of fixing a “canonical” form is that it should make identities obvious. If two integrals are actually the same, then writing them according to the same canonical rules should make that clear. This is one of the nice things about polylogarithms, where these identities are really just identities between logs and the right form is comparatively easy to find.

Surprisingly, the form we found doesn’t do this. We can write down an integral in our “canonical” form that looks different, but really is the same as our original integral. The form other papers had suggested, while handy, can’t be the final canonical form.

What the final form should be, we don’t yet know. We have some ideas, but we’re also curious what other groups are thinking. We’re relatively new to elliptic integrals, and there are other groups with much more experience with them, some with papers coming out soon. As far as we know they’re calculating slightly different integrals, ones more relevant for the real world than for N=4 super Yang-Mills. It’s going to be interesting seeing what they come up with. So if you want to follow this topic, don’t just watch for our names on the arXiv: look for Claude Duhr and Falko Dulat, Luise Adams and Stefan Weinzierl. In the elliptic world, big things are coming.

An Amplitudes Flurry

Now that we’re finally done with flurries of snow here in Canada, in the last week arXiv has been hit with a flurry of amplitudes papers.

kitchener-construction

We’re also seeing a flurry of construction, but that’s less welcome.

Andrea Guerrieri, Yu-tin Huang, Zhizhong Li, and Congkao Wen have a paper on what are known as soft theorems. Most famously studied by Weinberg, soft theorems are proofs about what happens when a particle in an amplitude becomes “soft”, or when its momentum becomes very small. Recently, these theorems have gained renewed interest, as new amplitudes techniques have allowed researchers to go beyond Weinberg’s initial results (to “sub-leading” order) in a variety of theories.

Guerrieri, Huang, Li, and Wen’s contribution to the topic looks like it clarifies things quite a bit. Previously, most of the papers I’d seen about this had been isolated examples. This paper ties the various cases together in a very clean way, and does important work in making some older observations more rigorous.

 

Vittorio Del Duca, Claude Duhr, Robin Marzucca, and Bram Verbeek wrote about transcendental weight in something known as the multi-Regge limit. I’ve talked about transcendental weight before: loosely, it’s counting the power of pi that shows up in formulas. The multi-Regge limit concerns amplitudes with very high energies, in which we have a much better understanding of how the amplitudes should behave. I’ve used this limit before, to calculate amplitudes in N=4 super Yang-Mills.

One slogan I love to repeat is that N=4 super Yang-Mills isn’t just a toy model, it’s the most transcendental part of QCD. I’m usually fairly vague about this, because it’s not always true: while often a calculation in N=4 super Yang-Mills will give the part of the same calculation in QCD with the highest power of pi, this isn’t always the case, and it’s hard to propose a systematic principle for when it should happen. Del Duca, Duhr, Marzucca, and Verbeek’s work is a big step in that direction. While some descriptions of the multi-Regge limit obey this property, others don’t, and in looking at the ones that don’t the authors gain a better understanding of what sorts of theories only have a “maximally transcendental part”. What they find is that even when such theories aren’t restricted to N=4 super Yang-Mills, they have shared properties, like supersymmetry and conformal symmetry. Somehow these properties are tied to the transcendentality of functions in the amplitude, in a way that’s still not fully understood.

 

My colleagues at Perimeter released two papers over the last week: one, by Freddy Cachazo and Alfredo Guevara, uses amplitudes techniques to look at classical gravity, while the other, by Sebastian Mizera and Guojun Zhang, looks at one of the “pieces” inside string theory amplitudes.

I worked with Freddy and Alfredo on an early version of their result, back at the PSI Winter School. While I was off lazing about in Santa Barbara, they were hard at work trying to understand how the quantum-looking “loops” one can use to make predictions for potential energy in classical gravity are secretly classical. What they ended up finding was a trick to figure out whether a given amplitude was going to have a classical part or be purely quantum. So far, the trick works for amplitudes with one loop, and a few special cases at higher loops. It’s still not clear if it works for the general case, and there’s a lot of work still to do to understand what it means, but it definitely seems like an idea with potential. (Pun mostly not intended.)

I’ve talked before about “Z theory”, the weird thing you get when you isolate the “stringy” part of string theory amplitudes. What Sebastian and Guojun have carved out isn’t quite the same piece, but it’s related. I’m still not sure of the significance of cutting string amplitudes up in this way, I’ll have to read the paper more thoroughly (or chat with the authors) to find out.

A Tale of Two Archives

When it comes to articles about theoretical physics, I have a pet peeve, one made all the more annoying by the fact that it appears even in pieces that are otherwise well written. It involves the following disclaimer:

“This article has not been peer-reviewed.”

Here’s the thing: if you’re dealing with experiments, peer review is very important. Plenty of experiments have subtle problems with their methods, enough that it’s important to have a group of experts who can check them. In experimental fields, you really shouldn’t trust things that haven’t been through a journal yet: there’s just a lot that can go wrong.

In theoretical physics, though, peer review is important for different reasons. Most papers are mathematically rigorous enough that they’re not going to be wrong per se, and most of the ways they could be wrong won’t be caught by peer review. While peer review sometimes does catch mistakes, much more often it’s about assessing the significance of a result. Peer review determines whether a result gets into a prestigious journal or a less prestigious one, which in turn matters for job and grant applications.

As such, it doesn’t really make sense for a journalist to point out that a theoretical physics paper hasn’t been peer reviewed yet. If you think it’s important enough to write an article about, then you’ve already decided it’s significant: peer review wasn’t going to tell you anything else.

We physicists post our papers to arXiv, a free-to-access paper repository, before submitting them to journals. While arXiv does have some moderation, it’s not much: pretty much anyone in the field can post whatever they want.

This leaves a lot of people confused. In that sort of system, how do we know which papers to trust?

Let’s compare to another archive: Archive of Our Own, or AO3 for short.

Unlike arXiv, AO3 hosts not physics, but fanfiction. However, like arXiv it’s quite lightly moderated and free to access. On arXiv you want papers you can trust, on AO3 you want stories you enjoy. In each case, if anyone can post, how do you find them?

The first step is filtering. AO3 and arXiv both have systems of tags and subject headings. The headings on arXiv are simpler and more heavily moderated than those on AO3, but they both serve the purpose of letting people filter out the subjects, whether scientific or fictional, that they find interesting. If you’re interested in astrophysics, try astro-ph on arXiv. If you want Harry Potter fanfiction, try the “Harry Potter – J.K. Rowling” tag on AO3.

Beyond that, it helps to pay attention to authors. When an author has written something you like, it’s worth it not only to keep up with other things they write, but to see which other authors they like and pay attention to them as well. That’s true whether the author is Juan Maldacena or your favorite source of Twilight fanfic.

Even if you follow all of this, you can’t trust every paper you find on arXiv. You also won’t enjoy everything you dig up on AO3. Either way, publication (in journals or books) won’t solve your problem: both are an additional filter, but not an infallible one. Judgement is still necessary.

This is all to say that “this article has not been peer-reviewed” can be a useful warning, but often isn’t. In theoretical physics, knowing who wrote an article and what it’s about will often tell you much more than whether or not it’s been peer-reviewed yet.

arXiv vs. snarXiv: Can You Tell the Difference?

Have you ever played arXiv vs snarXiv?

arXiv is a preprint repository: it’s where we physicists put our papers before they’re published to journals.

snarXiv is…well..sound it out.

A creation of David Simmons-Duffin, snarXiv randomly generates titles and abstracts out of trendy arXiv buzzwords. It’s designed so that the papers on it look almost plausible…until you take a closer look, anyway.

Hence the game, arXiv vs snarXiv. Given just the titles of two papers, can you figure out which one is real, and which is fake?

I played arXiv vs snarXiv for a bit today, waiting for some code to run. Out of twenty questions, I only got two wrong.

Sometimes, it was fairly clear which paper was fake because snarXiv overreached. By trying to pile on too many buzzwords, it ended up with a title that repeated itself, or didn’t quite work grammatically.

Other times, I had to use some actual physics knowledge. Usually, this meant noticing when a title tied together unrelated areas in an implausible way. When a title claims to tie obscure mathematical concepts from string theory to a concrete problem in astronomy, it’s pretty clearly snarXiv talking.

The toughest questions, including the ones I got wrong, were when snarXiv went for something subtle. For short enough titles, the telltale signs of snarXiv were suppressed. There just weren’t enough buzzwords for a mistake to show up. I’m not sure there’s a way to distinguish titles like that, even for people in the relevant sub-field.

How well do you do at arXiv vs snarXiv? Any tips?

arXiv, Our Printing Press

IMG_20160714_091400

Johannes Gutenberg, inventor of the printing press, and possibly the only photogenic thing on the Mainz campus

I’ve had a few occasions to dig into older papers recently, and I’ve noticed a trend: old papers are hard to read!

Ok, that might not be surprising. The older a paper is, the greater the chance it will use obsolete notation, or assume a context that has long passed by. Older papers have different assumptions about what matters, or what rigor requires, and their readers cared about different things. All this is to be expected: a slow, gradual approach to a modern style and understanding.

I’ve been noticing, though, that this slow, gradual approach doesn’t always hold. Specifically, it seems to speed up quite dramatically at one point: the introduction of arXiv, the website where we store all our papers.

Part of this could just be a coincidence. As it happens, the founding papers in my subfield, those that started Amplitudes with a capital “A”, were right around the time that arXiv first got going. It could be that all I’m noticing is the difference between Amplitudes and “pre-Amplitudes”, with the Amplitudes subfield sharing notation more than they did before they had a shared identity.

But I suspect that something else is going on. With arXiv, we don’t just share papers (that was done, piecemeal, before arXiv). We also share LaTeX.

LaTeX is a document formatting language, like a programming language for papers. It’s used pretty much universally in physics and math, and increasingly in other fields. As it turns out, when we post a paper to arXiv, we don’t just send a pdf: we include the raw LaTeX code as well.

Before arXiv, if you wanted to include an equation from another paper, you’d format it yourself. You’d probably do it a little differently from the other paper, in accord with your own conventions, and just to make it easier on yourself. Over time, more and more differences would crop up, making older papers harder and harder to read.

With arXiv, you can still do all that. But you can also just copy.

Since arXiv makes the LaTeX code behind a paper public, it’s easy to lift the occasional equation. Even if you’re not lifting it directly, you can see how they coded it. Even if you don’t plan on copying, the default gets flipped around: instead of having to try to make your equation like the one in the previous paper and accidentally getting it wrong, every difference is intentional.

This reminds me, in a small-scale way, of the effect of the printing press on anatomy books.

Before the printing press, books on anatomy tended to be full of descriptions, but not illustrations. Illustrations weren’t reliable: there was no guarantee the monk who copied them would do so correctly, so nobody bothered. This made it hard to tell when an anatomist (fine it was always Galen) was wrong: he could just be using an odd description. It was only after the printing press that books could actually have illustrations that were reliable across copies of a book. Suddenly, it was possible to point out that a fellow anatomist had left something out: it would be missing from the illustration!

In a similar way, arXiv seems to have led to increasingly standard notation. We still aren’t totally consistent…but we do seem a lot more consistent than older papers, and I think arXiv is the reason why.