Tag Archives: arXiv

Who Plagiarizes an Acknowledgements Section?

I’ve got plagiarists on the brain.

Maybe it was running into this interesting discussion about a plagiarized application for the National Science Foundation’s prestigious Graduate Research Fellowship Program. Maybe it’s due to the talk Paul Ginsparg, founder of arXiv, gave this week about, among other things, detecting plagiarism.

Using arXiv’s repository of every paper someone in physics thought was worth posting, Ginsparg has been using statistical techniques to sift out cases of plagiarism. Probably the funniest cases involved people copying a chunk of their thesis acknowledgements section, as excerpted here. Compare:

“I cannot describe how indebted I am to my wonderful girlfriend, Amanda, whose love and encouragement will always motivate me to achieve all that I can. I could not have written this thesis without her support; in particular, my peculiar working hours and erratic behaviour towards the end could not have been easy to deal with!”

“I cannot describe how indebted I am to my wonderful wife, Renata, whose love and encouragement will always motivate me to achieve all that I can. I could not have written this thesis without her support; in particular, my peculiar working hours and erratic behaviour towards the end could not have been easy to deal with!”
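To get a feel for how a computer might flag a pair like this, here is a minimal sketch. It is not Ginsparg's actual method, which isn't described here; it assumes one common approach, comparing the overlapping runs of five words ("shingles") that two texts share. The function names and the unrelated comparison sentence are made up for illustration.

```python
import re

def shingles(text, n=5):
    """Return the set of overlapping n-word sequences in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b, n=5):
    """Shared n-word shingles divided by all distinct shingles (0 to 1)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Opening clauses of the two acknowledgements quoted above.
original = ("I cannot describe how indebted I am to my wonderful girlfriend, "
            "Amanda, whose love and encouragement will always motivate me")
copy = ("I cannot describe how indebted I am to my wonderful wife, "
        "Renata, whose love and encouragement will always motivate me")
# A made-up sentence of similar length, for contrast.
unrelated = ("We thank the referee for helpful comments on the manuscript "
             "and the funding agency for support")

score_copy = jaccard(original, copy)
score_unrelated = jaccard(original, unrelated)
print(score_copy, score_unrelated)
```

Swapping out two words only disturbs the handful of shingles that contain them, so the copied pair still scores far above the unrelated one.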

Why would someone do this? Copying the scientific part of a thesis makes sense, in a twisted way: science is hard! But why would someone copy the fluff at the end, the easy part that’s supposed to be a genuine take on your emotions?

The thing is, the acknowledgements section of a thesis isn’t exactly genuine. It’s very formal: a required section of the thesis, with tacit expectations about what’s appropriate to include and what isn’t. It’s also the sort of thing you only write once in your life: while published papers also have acknowledgements sections, they’re typically much shorter, and have different conventions.

If you ever were forced to write thank-you notes as a kid, you know where I’m going with this.

It’s not that you don’t feel grateful. You do! But when you feel grateful, you express it by saying “thank you” and moving on. Writing a note about it isn’t very intuitive; it’s not a way you’re used to expressing gratitude, so the whole experience feels like you’re just following a template.

Literally in some cases.

That sort of situation, where it doesn’t matter how strongly you feel something, only whether you express it in the right way, is a breeding ground for plagiarism. Aunt Mildred isn’t going to care what you write in your thank-you note, and Amanda/Renata isn’t going to be moved by your acknowledgements section. It’s so easy to decide, in that kind of situation, that it’s better to just grab whatever appropriate text you can than to teach yourself a new style of writing.

In general, plagiarism happens because there’s a disconnect between incentives and what they’re meant to be for. In a world where very few beginning graduate students actually have a solid research plan, the NSF’s fellowship application feels like a demand for creative lying, not an honest way to judge scientific potential. In countries eager for highly cited faculty but short on established experts able to judge scientific merit, tenure becomes easier to get by faking a series of papers than by doing the actual work.

If we want to get rid of plagiarism, we need to make sure our incentives match our intent. We need a system in which people succeed when they do real work, get fellowships when they honestly have talent, and where we care about whether someone was grateful, not how they express it. If we can’t do that, then there will always be people trying to sneak through the cracks.

Amplitudes on Paperscape

Paperscape is a very cool tool developed by Damien George and Rob Knegjens. It analyzes papers from arXiv, the paper repository where almost all physics and math papers live these days. By putting papers that cite each other closer together and pushing papers that don’t cite each other further apart, Paperscape creates a map of all the papers on arXiv, arranged into “continents” based on the links between them. Papers with more citations are shown larger, newer papers are shown brighter, and subject categories are indicated by color-coding.
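The "closer together, further apart" idea can be sketched in a few lines. This is a toy force-directed layout, not Paperscape's actual code: the force laws, step size, and the six-paper citation list are all assumptions chosen for illustration. Every pair of papers repels, and pairs linked by a citation are also pulled together by a spring.

```python
import math
import random

def layout(n_papers, citations, steps=200, step_size=0.05, seed=0):
    """Toy layout: all pairs repel; pairs linked by a citation also attract."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_papers)]
    linked = {frozenset(pair) for pair in citations}
    for _ in range(steps):
        force = [[0.0, 0.0] for _ in range(n_papers)]
        for i in range(n_papers):
            for j in range(i + 1, n_papers):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                # Clamp tiny distances so the repulsion never blows up.
                dist = max(math.hypot(dx, dy), 0.01)
                pull = -1.0 / dist          # repulsion between every pair
                if frozenset((i, j)) in linked:
                    pull += dist            # spring attraction for cited pairs
                fx = pull * dx / dist
                fy = pull * dy / dist
                force[i][0] += fx
                force[i][1] += fy
                force[j][0] -= fx
                force[j][1] -= fy
        for p, f in zip(pos, force):
            p[0] += step_size * f[0]
            p[1] += step_size * f[1]
    return pos

def mean_distance(pos, pairs):
    """Average distance between the listed pairs of papers."""
    return sum(math.dist(pos[i], pos[j]) for i, j in pairs) / len(pairs)

# Hypothetical data: papers 0-2 cite each other, as do papers 3-5.
citations = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
positions = layout(6, citations)
cited = mean_distance(positions, citations)
uncited = mean_distance(positions, [(i, j) for i in range(3) for j in range(3, 6)])
print(cited, uncited)
```

With no citations bridging them, the two groups settle into tight clusters that drift apart, a miniature version of two continents. A real map needs something far more efficient than this all-pairs loop.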

Here’s a zoomed-out view:

[Image: Paperscape full map]

Already you can see several distinct continents, corresponding to different arXiv categories like high energy theory and astrophysics.

If you want to find amplitudes on this map, just zoom in between the purple continent (high energy theory, much of which is string theory) and the green one (high energy lattice, nuclear experiment, high energy experiment, and high energy phenomenology; broadly speaking, these are all particle physics).

[Image: Paperscape amplitudes map]

When you zoom in, Paperscape shows words that commonly appear in a given region of papers. Zoomed in this far, you can see amplitudes!

Amplitudeologists like me live on an island between particle physics and string theory. We’re connected on both sides by bridges of citations and shared terms, linking us to people who study quarks and gluons on one side and to people who study strings and geometry on the other. Think of us like Manhattan, an island between two shores, densely networked into its surroundings.

[Image: Paperscape zoomed map]

Zoom in further, and you can see common keywords for individual papers. Exploring around here shows not only what is getting talked about, but also which subject categories the papers fall under. You can see by the color-coding that many papers in amplitudes are posted as hep-th, or high energy theory, but there’s a fair number of papers from hep-ph (phenomenology) and from nuclear physics as well.

There are a lot of interesting things you can do with Paperscape. You can search for individuals, or look at individual papers, seeing who they cite and who cites them. Try it out!

What’s up with arXiv?

First of all, I wanted to take a moment to say that this is the one-year anniversary of this blog. I’ve been posting every week, almost always on Friday, since I was first motivated to start blogging back in November 2012. It’s been a fun ride, through ups and downs, Ars Technica and Amplituhedra, and I hope it’s been fun for you, the reader, as well!

I’ve been giving links to arXiv since my very first post, but I haven’t gone into detail about what arXiv is. Since arXiv is a rather unusual phenomenon, it could use a fuller description.

[Image: arXiv]

The word arXiv is pronounced much like the normal word archive: just think of the capital X as the Greek letter chi.

Much as the name would suggest, arXiv is an archive, specifically a preprint archive. A preprint is in a sense a paper before it becomes a paper; more accurately, it is a scientific paper that has not yet been published in a journal. In the past, such preprints would be kept by individual universities, or passed between interested individuals. Now arXiv, for an increasing range of fields (first physics and mathematics, now also computer science, quantitative biology, quantitative finance, and statistics), puts all of the preprints in one easily accessible, free-to-access place.

Different fields have different conventions when it comes to using arXiv. As a theoretical physicist, I can only really speak to how we use the system.

When theoretical physicists write a paper, it is often not immediately clear which journal we should submit it to. Different journals have different standards, and a paper that will gather more interest can be published in a more prestigious journal. In order to gauge how much interest a paper will raise, most theoretical physicists will put their papers up on arXiv as preprints first, letting them sit there for a few months to drum up attention and get feedback before formally submitting the paper to a journal.

The arXiv isn’t just for preprints, though. Once a paper is published in a journal, a copy of the paper remains on arXiv. Often, the copy on arXiv will be updated when the paper is updated, changed to the journal’s preferred format and labeled with the correct journal reference. So arXiv, ultimately, contains almost all of the papers published in theoretical physics in the last decade or two, all free to read.

But it’s not just papers! The digital format of arXiv makes it much easier to post other files alongside a paper, so that many people upload not just their results, but the computer code they used to generate them, or their raw data in long files. You can also post papers too long or unwieldy to publish in a journal, making arXiv an excellent dropping-off point for information in whatever format you think is best.

We stand at the edge of a new age of freely accessible science. As more and more disciplines start to use arXiv and similar services, we’ll have more flexibility to get more information to more people, while still keeping the advantage of peer review for publication in actual journals. It’s going to be very interesting to see where things go from here.