Monthly Archives: September 2024

The Mistakes Are the Intelligence

There’s a lot of hype around large language models, the foundational technology behind services like ChatGPT. Representatives of OpenAI have stated that, in a few years, these models might have “PhD-level intelligence“. On the other hand, at the time, ChatGPT couldn’t count the number of letter “r”s in the word “strawberry”. The model and the setup around it has improved, and GPT-4o1 apparently now gets the correct 3 “r”s…but I’m sure it makes other silly mistakes, mistakes an intelligent human would never make.

The mistakes made by large language models are important, due to the way those models are used. If people are going to use them for customer service, writing transcripts, or editing grammar, they don’t want to introduce obvious screwups. (Maybe this means they shouldn’t use the models this way!)

But the temptation is to go further, to say that these mistakes are proof that these models are, and will always be, dumb, not intelligent. And that’s not the right way to think about intelligence.

When we talk about intelligent people, when we think about measuring things like IQ, we’re looking at a collection of different traits. These traits typically go together in humans: a human who is good at one will usually be good at the others. But from the perspective of computer science, these traits are very different.

Intelligent people tend to be good at following complex instructions. They can remember more, and reason faster. They can hold a lot in their head at once, from positions of objects to vocabulary.

These are all things that computers, inherently, are very good at. When Turing wrote down his abstract description of a computer, he imagined a machine with infinite memory, able to follow any instructions with perfect fidelity. Nothing could live up to that ideal, but modern computers are much closer to it than humans. “Computer” used to be a job, with rooms full of people (often women) hired to do calculations for scientific projects. We don’t do that any more, machines have made that work superfluous.

What’s more, the kind of processing a Turing machine does is probably the only way to reliably answer questions. If you want to make sure you get the correct answer every time, then it seems that you can’t do better than to use a sufficiently powerful computer.

But while computer-the-machine replaced computer-the-job, mathematician-the-job still exists. And that’s because not all intelligence is about answering questions reliably.

Alexander Grothendieck was a famous mathematician, known for his deep insights and powerful ideas. According to legend, when giving a talk referring to prime numbers, someone in the audience asked him to name a specific prime. He named 57.

With a bit of work, any high-school student can figure out that 57, which equals 3 times 19, isn’t a prime number. A computer can easily figure out that 57 is not a prime number. Even ChatGPT knows that 57 is not a prime number.

But this doesn’t mean that Grothendieck was dumber than a high school student, or dumber than ChatGPT. Grothendieck was using a different kind of intelligence, the heuristic kind.

Heuristics are unreliable reasoning. They’re processes that get the right answer some of the time, but not all of the time. Because of that, though, they don’t have the same limits as reliable computer programs. Pick the right situation and the right conditions, and a heuristic can give you an answer faster than you could possibly get by following reliable rules.

Intelligent humans follow instructions well, but they also have good heuristics. They solve problems creatively, sometimes problems that are very hard for computers to address. People like Grothendieck make leaps of mathematical reasoning, guessing at the right argument before they have completely fleshed out a proof. This kind of intelligence is error-prone: rely on it, and you might claim 57 is prime. But at the moment, it’s our only intellectual advantage over machines.

Ultimately, ChatGPT is an advance in language processing, and language is a great example. Sentences don’t have definite meaning, we interpret what we read and hear in context, and sometimes our interpretation is wrong. Sometimes we hear words no-one actually said! It’s impossible, both for current technology and for the human brain, to process general text in a 100% reliable way. So large language models like GPT don’t do it reliably. They use an approximate model, a big complicated pile of rules tweaked over and over again until, enough of the time, they get the next word right in a text.

The kind of heuristic reasoning done by large language models is more effective than many people expected. Being able to predict the next word in a text unreliably also means you can write code unreliably, or count things unreliably, or do math unreliably. You can’t do any of these things as well as an appropriately-chosen human, at least not with current resources.

But in the longer run, heuristic intelligence is precisely the type of intelligence we should aspire to…or fear. Right now, we hire humans to do intellectual work because they have good heuristics. If we could build a machine with equivalent or better heuristics for those tasks, then people would hire a lot fewer humans. And if you’re worried about AI taking over the world, you’re worried about AI coming up with shortcuts to our civilization, tricks we couldn’t anticipate or plan against that destroy everything we care about. Those tricks can’t come from following rules: if they did, we could discover them just as easily. They would have to come from heuristics, sideways solutions that don’t work all the time but happen to work the one time that matters.

So yes, until the latest release, ChatGPT couldn’t tell you how many “r”s are in “strawberry”. Counting “r”s is something computers could already do, because it’s something that can be done by following reliable rules. It’s also something you can do easily, if you follow reliable rules. ChatGPT impresses people because it can do some of the things you do, that can’t be done with reliable rules. If technology like it has any chance of changing the world, those are the kinds of things it will have to be able to do.

The Bystander Effect for Reviewers

I probably came off last week as a bit of an extreme “journal abolitionist”. This week, I wanted to give a couple caveats.

First, as a commenter pointed out, the main journals we use in my field are run by nonprofits. Physical Review Letters, the journal where we publish five-page papers about flashy results, is run by the American Physical Society. The Journal of High-Energy Physics, where we publish almost everything else, is run by SISSA, the International School for Advanced Studies in Trieste. (SISSA does use Springer, a regular for-profit publisher, to do the actual publishing.)

The journals are also funded collectively, something I pointed out here before but might not have been obvious to readers of last week’s post. There is an agreement, SCOAP3, where research institutions band together to pay the journals. Authors don’t have to pay to publish, and individual libraries don’t have to pay for subscriptions.

And this is a lot better than the situation in other fields, yeah! Though I’d love to quantify how much. I haven’t been able to find a detailed breakdown, but SCOAP3 pays around 1200 EUR per article published. What I’d like to do (but not this week) is to compare this to what other fields pay, as well as to publishing that doesn’t have the same sort of trapped audience, and to online-only free journals like SciPost. (For example, publishing actual physical copies of journals at this point is sort of a vanity thing, so maybe we should compare costs to vanity publishers?)

Second, there’s reviewing itself. Even without traditional journals, one might still want to keep peer review.

What I wanted to understand last week was what peer review does right now, in my field. We read papers fresh off the arXiv, before they’ve gone through peer review. Authors aren’t forced to update the arXiv with the journal version of their paper, if they want another version, even if that version was rejected by the reviewers, then they’re free to do so, and most of us wouldn’t notice. And the sort of in-depth review that happens in peer review also happens without it. When we have journal clubs and nominate someone to present a recent paper, or when we try to build on a result or figure out why it contradicts something we thought we knew, we go through the same kind of in-depth reading that (in the best cases) reviewers do.

But I think I’ve hit upon something review does that those kinds of informal things don’t. It gets us to speak up about it.

I presented at a journal club recently. I read through a bombastic new paper, figured out what I thought was wrong with it, and explained it to my colleagues.

But did I reach out to the author? No, of course not, that would be weird.

Psychologists talk about the bystander effect. If someone collapses on the street, and you’re the only person nearby, you’ll help. If you’re one of many, you’ll wait and see if someone else helps instead.

I think there’s a bystander effect for correcting people. If someone makes a mistake and publishes something wrong, we’ll gripe about it to each other. But typically, we won’t feel like it’s our place to tell the author. We might get into a frustrating argument, there wouldn’t be much in it for us, and it might hurt our reputation if the author is well-liked.

(People do speak up when they have something to gain, of course. That’s why when you write a paper, most of the people emailing you won’t be criticizing the science: they’ll be telling you you need to cite them.)

Peer review changes the expectations. Suddenly, you’re expected to criticize, it’s your social role. And you’re typically anonymous, you don’t have to worry about the consequences. It becomes a lot easier to say what you really think.

(It also becomes quite easy to say lazy stupid things, of course. This is why I like setups like SciPost, where reviews are made public even when the reviewers are anonymous. It encourages people to put some effort in, and it means that others can see that a paper was rejected for bad reasons and put less stock in the rejection.)

I think any new structure we put in place should keep this feature. We need to preserve some way to designate someone a critic, to give someone a social role that lets them let loose and explain why someone else is wrong. And having these designated critics around does help my field. The good criticisms get implemented in the papers, the authors put the new versions up on arXiv. Reviewing papers for journals does make our science better…even if none of us read the journal itself.

Why Journals Are Sticky

An older professor in my field has a quirk: every time he organizes a conference, he publishes all the talks in a conference proceeding.

In some fields, this would be quite normal. In computer science, where progress flows like a torrent, new developments are announced at conferences long before they have the time to be written up carefully as a published paper. Conference proceedings are summaries of what was presented at the conference, published so that anyone can catch up on the new developments.

In my field, this is rarer. A few results at each conference will be genuinely new, never-before-published discoveries. Most, though, are talks on older results, results already available online. Writing them up again in summarized form as a conference proceeding seems like a massive waste of time.

The cynical explanation is that this professor is doing this for the citations. Each conference proceeding one of his students publishes is another publication on their CV, another work that they can demand people cite whenever someone uses their ideas or software, something that puts them above others’ students without actually doing any extra scientific work.

I don’t think that’s how this professor thinks about it, though. He certainly cares about his students’ careers, and will fight for them to get cited as much as possible. But he asks everyone at the conference to publish a proceeding, not just his students. I think he’d argue that proceedings are helpful, that they can summarize papers in new ways and make them more accessible. And if they give everyone involved a bit more glory, if they let them add new entries to their CV and get fancy books on their shelves, so much the better for everyone.

My guess is, he really believes something like that. And I’m fairly sure he’s wrong.

The occasional conference proceeding helps, but only because it makes us more flexible. Sometimes, it’s important to let others know about a new result that hasn’t been published yet, and we let conference proceedings go into less detail than a full published paper, so this can speed things up. Sometimes, an old result can benefit from a new, clearer explanation, which normally couldn’t be published without it being a new result (or lecture notes). It’s good to have the option of a conference proceeding.

But there is absolutely no reason to have one for every single talk at a conference.

Between the cynical reason and the explicit reason, there’s the banal one. This guy insists on conference proceedings because they were more useful in the past, because they’re useful in other fields, and because he’s been doing them himself for years. He insists on them because to him, they’re a part of what it means to be a responsible scientist.

And people go along with it. Because they don’t want to get into a fight with this guy, certainly. But also because it’s a bit of extra work that could give a bit of a career boost, so what’s the harm?

I think something similar to this is why academic journals still work the way they do.

In the past, journals were the way physicists heard about new discoveries. They would get each edition in the mail, and read up on new developments. The journal needed to pay professional copyeditors and printers, so they needed money, and they got that money from investors by being part of for-profit companies that sold shares.

Now, though, physicists in my field don’t read journals. We publish our new discoveries online on a non-profit website, formatting them ourselves with software that uses the same programming skills we use in the rest of our professional lives. We then discuss the papers in email threads and journal club meetings. When a paper is wrong, or missing something important, we tell the author, and they fix it.

Oh, and then after that we submit the papers to the same for-profit journals and the same review process that we used to use before we did all this, listing the journals that finally accept the papers on our CVs.

Why do we still do that?

Again, you can be cynical. You can accuse the journals of mafia-ish behavior, you can tie things back to the desperate need to publish in high-ranked journals to get hired. But I think the real answer is a bit more innocent, and human, than that.

Imagine that you’re a senior person in the field. You may remember the time before we had all of these nice web-based publishing options, when journals were the best way to hear about new developments. More importantly than that, though, you’ve worked with these journals. You’ve certainly reviewed papers for them, everyone in the field does that, but you may have also served as an editor, tracking down reviewers and handling communication between the authors and the journal. You’ve seen plenty of cases where the journal mattered, where tracking down the right reviewers caught a mistake or shot down a crackpot’s ambitions, where the editing cleaned something up or made a work more appear more professional. You think of the journals as having high standards, standards you have helped to uphold: when choosing between candidates for a job, you notice that one has several papers in Physical Review Letters, and remember papers you’ve rejected for not meeting what you intuited were that journal’s standards. To you, journals are a key part of being a responsible scientist.

Does any of that make journals worth it, though?

Well, that depends on costs. It depends on alternatives. It depends not merely on what the journals catch, but on how often they do it, and how much would have been caught on its own. It depends on whether the high standards you want to apply to job applicants are already being applied by the people who write their recommendation letters and establish their reputations.

And you’re not in a position to evaluate any of that, of course. Few people are, who don’t spend a ton of time thinking about scientific publishing.

And thus, for the non-senior people, there’s not much reason to push back. One hears a few lofty speeches about Elsevier’s profits, and dreams about the end of the big for-profit journals. But most people aren’t cut out to be crusaders or reformers, especially when they signed up to be scientists. Most people are content not to annoy the most respected people in their field by telling them that something they’ve spent an enormous amount of time on is now pointless. Most people want to be seen as helpful by these people, to not slack off on work like reviewing that they argue needs doing.

And most of us have no reason to think we know that much better, anyway. Again, we’re scientists, not scientific publishing experts.

I don’t think it’s good practice to accuse people of cognitive biases. Everyone thinks they have good reasons to believe what they believe, and the only way to convince them is to address those reasons.

But the way we use journals in physics these days is genuinely baffling. It’s hard to explain, it’s the kind of thing people have been looking quizzically at for years. And this kind of explanation is the only one I’ve found that matches what I’ve seen. Between the cynical explanation and the literal arguments, there’s the basic human desire to do what seems like the responsible thing. That tends to explain a lot.

Grad Students Don’t Have Majors

A pet peeve of mine:

Suppose you’re writing a story, and one of your characters is studying for a PhD in linguistics. You could call them a grad student or a PhD student, a linguistics student or even just a linguist. But one thing you absolutely shouldn’t call them is a linguistics major.

Graduate degrees, from the PhD to medical doctors to masters degrees, don’t have majors. Majors are a very specific concept, from a very specific system: one that only applies to undergraduate degrees, and even there is uncommon to unheard of in most of the world.

You can think of “major” as short for “major area of study”. In many universities in the US, bachelor’s degree students enter not as students of a particular topic, but as “undecided” students. They then have some amount of time to choose a major. Majors define some of your courses, but not all of them. You can also have “minors”, minor areas of study where you take a few courses from another department, and you typically have to take some number of general courses from other departments as well. Overall, the US system for bachelor’s students is quite flexible. The idea is that students can choose from a wide range of courses offered by different departments at a university, focusing on one department’s program but sampling from many. The major is your major focus, but not your only focus.

Basically no other degree works this way.

In Europe, bachelor’s degree students sign up as students of a specific department. By default, all of their courses will be from that department. If you have to learn more math, or writing skills, then normally your department will have its own math or writing course, focused on the needs of their degree. It can be possible to take courses from other departments, but it’s not common and it’s often not easy, sometimes requiring special permission. You’re supposed to have done your general education as a high school student, and be ready to focus on a particular area.

Graduate degrees in the US also don’t work this way. A student in medical school or law school isn’t a medicine major or a law major, they’re a med student or a law student. They typically don’t take courses from the rest of the university at that point, just from the med school or the law school. A student studying for an MBA (Master’s in Business Administration) is similarly a business student, not the business major they might have been during their bachelor’s studies. And a student studying for a PhD is a PhD student, a student of a specific department. They might still have the option of taking classes outside of that department (for example, I took classes in science communication). But these are special exceptions. A linguistics PhD student will take almost all of their classes from the linguistics department, a physics PhD student will take almost all of their classes from the physics department. They don’t have majors.

So the next time you write a story with people with advanced degrees, keep this in mind. Majors are a thing for US bachelor’s degrees, and a few similar systems. Anything else, don’t call it a major!