Scott Aaronson recently published an interesting exchange on his blog Shtetl-Optimized, between him and the cognitive psychologist Steven Pinker. The conversation was about AI: Aaronson is optimistic (though not insanely so), while Pinker is pessimistic (again, not insanely so). While fun reading, the whole thing would normally be a bit too off-topic for this blog, except that Aaronson’s argument ended up invoking something I do know a bit about: how we make progress in theoretical physics.
Aaronson was trying to respond to an argument of Pinker’s: that super-intelligence is too vague and broad to be something we could expect an AI to have. Aaronson asks us to imagine an AI that is nothing more or less than a simulation of Einstein’s brain. Such a thing isn’t possible today, and might not even be efficient, but it has the advantage of being something concrete we can all imagine. Aaronson then suggests imagining that AI sped up a thousandfold, so that in one year it covers a thousand years of Einstein’s thought. Such an AI couldn’t solve every problem, of course. But in theoretical physics, surely such an AI could be safely described as super-intelligent: an amazing power that would change the shape of physics as we know it.
I’m not as sure of this as Aaronson is. We don’t have a machine that generates a thousand Einstein-years to test, but we do have one piece of evidence: the 76 Einstein-years the man actually lived.
Einstein is rightly famous as a genius in theoretical physics. His annus mirabilis resulted in four papers that revolutionized the field, and the next decade saw his theory of general relativity transform our understanding of space and time. Later, he explored what general relativity was capable of and framed challenges that deepened our understanding of quantum mechanics.
After that, though…not so much. For Einstein-decades, he tried to work towards a new unified theory of physics, and as far as I’m aware made no useful progress at all. I’ve never seen someone cite work from that period of Einstein’s life.
Aaronson mentions simulating Einstein “at his peak”, and it would be tempting to assume that the unified theory came “after his peak”, when age had weakened his mind. But while that kind of thing can sometimes be an issue for older scientists, I think it’s overstated. I don’t think careers peak early because of “youthful brains”, and with the exception of genuine dementia I don’t think older physicists are that much worse off cognitively than younger ones. The reason so many prominent older physicists go down unproductive rabbit-holes isn’t because they’re old. It’s because genius isn’t universal.
Einstein made the progress he did because he was the right person to make that progress. He had the right background, the right temperament, and the right interests to take others’ mathematics and treat it seriously as physics. As he aged, he built on what he found, and that background in turn enabled him to do more great things. But eventually, the path he walked simply wasn’t useful anymore. His story ended with him driven to a theory that simply wasn’t going to work, because given his experience up to that point, that was the work that interested him most.
I think genius in physics is in general like that. It can feel very broad because a good genius picks up new tricks along the way, and grows their capabilities. But throughout, you can see the links: the tools mastered at one age that turn out to be just right for a new pattern. For the greatest geniuses in my field, you can see the “signatures” in their work, hints at why they were just the right genius for one problem or another. Give one a thousand years, and I suspect the well would eventually run dry: the state of knowledge would no longer be suitable for even their breadth.
…of course, none of that really matters for Aaronson’s point.
A century of Einstein-years wouldn’t have found the Standard Model or String Theory, but a century of physicist-years absolutely did. If instead of a simulation of Einstein, your AI was a simulation of a population of scientists, generating new geniuses as the years go by, then the argument works again. Sure, such an AI would be much more expensive, much more difficult to build, but the first one might have been as well. The point of the argument is simply to show such a thing is possible.
The core of Aaronson’s point rests on two key traits of technology. Technology is replicable: once we know how to build something, we can build more of it. Technology is scalable: if we know how to build something, we can try to build a bigger one with more resources. Evolution can tap into both of these, but not reliably: just because it’s possible to build a mind a thousand times better at some task doesn’t mean evolution will build one.
That is why the possibility of AI leads to the possibility of super-intelligence. If we can make a computer that can do something, we can make it do that something faster. That something doesn’t have to be “general”: you can have programs that excel at one task or another. For each such task, with more resources you can scale things up: so anything a machine can do now, a later machine can probably do better. Your starting-point doesn’t necessarily even have to be efficient, or a good algorithm: bad algorithms will take longer to scale, but could eventually get there too.
The only question at that point is “how fast?” I don’t have the impression that’s settled. The achievements that got Pinker and Aaronson talking, GPT-3 and DALL-E and so forth, impressed people by their speed, by how soon they got to capabilities we didn’t expect them to have. That doesn’t mean that something we might really call super-intelligence is close: that has to do with the details, with what your target is and how fast you can actually scale. And it certainly doesn’t mean that another approach might not be faster! (As a total outsider, I can’t help but wonder if current ML is in some sense trying to fit a cubic with straight lines.)
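To make that parenthetical a little less hand-wavy, here is a toy, purely illustrative sketch (my own arbitrary choices of target function, interval, and segment counts). A “connect the dots” piecewise-linear fit to a cubic does converge, it just needs ever more pieces:

```python
import numpy as np

# Piecewise-linear fits to f(x) = x^3 on [-1, 1].
# Each fit uses N straight segments through N+1 evenly spaced knots;
# the worst-case error shrinks as N grows, but only by brute force.
f = lambda x: x**3
xs = np.linspace(-1, 1, 1001)  # dense grid on which to measure the error

errors = {}
for n_segments in (2, 8, 32):
    knots = np.linspace(-1, 1, n_segments + 1)
    fit = np.interp(xs, knots, f(knots))  # straight lines between knots
    errors[n_segments] = float(np.max(np.abs(fit - f(xs))))
    print(f"{n_segments:2d} segments: max error {errors[n_segments]:.4f}")
```

More segments always help, but no finite number of straight lines is ever exactly a cubic, which is the sense of the quip.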
It does mean, though, that super-intelligence isn’t inconceivable, or incoherent. It’s just the recognition that technology is a master of brute force, and brute force eventually triumphs. If you want to think about what happens in that “eventually”, that’s a very important thing to keep in mind.
Regarding “youthful brains”, what you really need is youthfulness of mind. Part of it is being open to new ideas, regardless of whether they are newly emerged or you newly came to them. And you should look at your own ideas in new ways too, as if you were seeing them for the first time instead of having lived with them for decades.
Speaking of which, the hard thing seems to be combining knowledge and experience, that is, being oldish, with the youthfulness of mind defined above. My guess is that the rarity of such an old/young intersection is a common issue for theorists.
This link to a skeptical article about this issue by Ted Chiang in the New Yorker may be of interest: http://www.newyorker.com/culture/annals-of-inquiry/why-computers-wont-make-themselves-smarter
Yeah, it looks like in the end he comes to basically the same conclusion, but with different valence. A single “Einstein simulation” or equivalent wouldn’t be enough for an intelligence explosion, a “scientific community” simulation would. I agree I don’t think there’s any evidence the latter is particularly close, but it still can be useful to think about, in the same way the far future generally is. And I don’t think we know that kind of thing is the only way to get an intelligence explosion, it’s just the only one we can describe simply in terms of extrapolations of existing human capabilities.
I like your blog.
Science magazine messed up my subscription during the pandemic, so I have not been reading much recently. When I was, however, there were numerous articles questioning the scientific value of research in artificial intelligence, precisely because there is no evidence that what they do can be replicated. That is, producing one “AI Einstein” is a headline. Reproducing a copy of that “AI Einstein”, in order to demonstrate an understanding of the mechanisms involved, is what is being questioned.
From the first time I ever read about the Turing test, I questioned whether it had anything whatever to do with human intelligence. All of us are aware of con-artists who imitate legitimate entities with the expectation of fooling enough people to make their efforts profitable. As with your description of Einstein, Turing was a combination of intellect and environment. His genius lay in the observation that actual computation on sheets of paper never involves more than finitely many symbols at any given step of a calculation. The relationship of this fact to “intelligence” appears, to me, to be more sociological.
From the time of the priority debate between Newton and Leibniz, British mathematics evolved separately from Continental mathematics. Empiricist philosophy is more closely associated with British mathematics. The idea that the universe can be explained as a machine can never be anything more than a belief defended with rhetoric. And, one side effect of rationalist demands in this regard has been to reduce their own “ground” for knowledge to mere dogmatic stipulation.
Popper’s student Bartley documents this in his book “The Retreat to Commitment”, available online at archive.org. A simple way to grasp the situation is to read the quote from “Alice in Wonderland” in the Wikipedia entry on Humpty Dumpty, followed by the discussion in the Wikipedia entry on the Münchhausen trilemma. When you deny both infinite and circular regresses, what is left is simply the dogmatic stipulation of what words mean by other words.
What is referred to as artificial intelligence depends, however, on more than Turing-complete computation. What a logician might call a truth function is a switching function in other contexts. An excellent book on the mathematics of switching functions is “Threshold Logic” by Hu. The class of switching functions partitions according to the subclass of linearly separable switching functions. These are what Hu refers to as threshold functions.
As I wrote in a comment which will probably not be published on Dr. Aaronson’s blog, to associate “intelligence” with this mathematics is essentially to define intelligence as effective decidability. From Hu’s book, one surmises that brute force addresses two things. First, the ratio of threshold functions to switching functions decreases to zero as the number of Boolean variables increases without bound. Second, linear separability can be approximated. Such an approximation is a “reach to infinity”, since what is not linearly separable cannot be made so.
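For readers without Hu’s book: “linearly separable” means the function’s output can be reproduced by comparing a weighted sum of the inputs against a threshold. A minimal sketch of the distinction (the function names and the small integer search range are my own choices, purely for illustration; the classic switching function that is not a threshold function is XOR):

```python
from itertools import product

def is_threshold(fn, n, weight_range=range(-4, 5)):
    """Brute-force check: is the n-input switching function `fn` a
    threshold function, i.e. is there a weight vector w and bias b
    with fn(x) == (w . x + b >= 0) on all 2^n Boolean inputs?"""
    inputs = list(product((0, 1), repeat=n))
    for *w, b in product(weight_range, repeat=n + 1):
        if all(fn(x) == (sum(wi * xi for wi, xi in zip(w, x)) + b >= 0)
               for x in inputs):
            return True
    return False

AND = lambda x: x[0] & x[1]
XOR = lambda x: x[0] ^ x[1]
print(is_threshold(AND, 2))  # True: AND is linearly separable
print(is_threshold(XOR, 2))  # False: XOR is not, for any weights
```

The vanishing ratio mentioned above already bites at small n: of the 16 two-input switching functions, all but XOR and XNOR are threshold functions, and the fraction only falls from there.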
I believe that this accounts for the conclusions in your remark about the cubics.
I did once purchase a book on artificial neural networks which I never bothered to read. Its opening chapter explained the relationship between linear separability and Weierstrass’ polynomial approximation theorem.
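That relationship can at least be glimpsed numerically: Bernstein polynomials are the standard constructive proof of Weierstrass’ theorem. A quick sketch (the target function and the degrees are arbitrary choices of mine), showing the uniform error fall as the degree grows:

```python
import math

def bernstein(f, n, x):
    """Degree-n Bernstein polynomial approximation of f on [0, 1]."""
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)  # continuous, but not smooth at x = 0.5
grid = [i / 200 for i in range(201)]

errors = {}
for n in (4, 16, 64):
    errors[n] = max(abs(bernstein(f, n, x) - f(x)) for x in grid)
    print(f"degree {n:2d}: max error {errors[n]:.4f}")
```

The convergence is slow near the kink, but it is uniform, which is the theorem’s whole point.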
Time to go. The sledgehammer I use to pay my bills is waiting.
I’d also heard that there was some discussion about whether the various impressive AI artifacts people are creating are reproducible, but I do get the impression there’s concrete research on this topic: at minimum there is quite a lot of discussion of scaling, and attempts to discover what it buys and what you can expect it to buy.
I actually am pretty bullish on the Turing test, in a certain context. It isn’t a proxy for certain kinds of intelligence: it won’t tell you whether the machine can also prove theorems or keep your kids from sticking tinkertoys in their noses. But it does neatly encapsulate intelligence in the sense of “intelligent beings should be treated as citizens”. Once you can consistently convince other people that you are a person, they will treat you as a person. That’s the only way that has ever worked: we treat others as equals because their behavior convinces us that they are our equals.
Not being a computer scientist, I don’t know enough about switching and threshold functions to know what you’re talking about there. But I suggest if you think it accounts for the phenomenon of double descent then you should consider why the phenomenon of double descent is still considered an open problem. Maybe that’s just because someone with the relevant expertise hasn’t published on the topic, but then if you’re such a person maybe you should consider doing so. (Or maybe you just mean it explains the toy model with fitting a polynomial?)
(I’m also vaguely curious why you expect your comment to Aaronson not to be published…you seem polite enough here, if long-winded 😉 and I think I’m more censorious than he is.)
One reason I like your blog is the temperament behind your statements. The “civility interpretation” for the Turing test would be a good example of what I mean.
I have no doubt that there is research intended to address the questions brought out in the various articles I read in the past. As a whole, the science community does respond to certain kinds of criticism. Many of the alternative hypotheses for global warming have been addressed; vaccine researchers moved quickly to obtain reliable data on possible connections to autism; and, if I recall correctly, the reproducibility problem erupted in the field of psychology, where a great deal of effort is being made to improve quality.
Dr. Aaronson’s blog is fantastic. I simply think that something I inadvertently wrote may have violated what rules he does exercise. I will check tomorrow evening. If it does get posted I might find myself inundated with papers I should read. And, as I always do, I will go through them.
What I study on my own, as an autodidact, is the foundations of mathematics. When I started 37 years ago, I had certain ideas about mathematics not unlike what one finds across the Internet with talented people working in application. But, I had to sort through claims like “mathematics is formal,” “mathematics is extensional,” “mathematics is fictional,” and so on. Consequently, the matters are not so simple and the questions are very hard. A great deal rests upon belief and folklore.
I did not mean to imply that a question like double descent is simply understood within its problem domain. It is just that the entire field of artificial intelligence rests on the theory of switching functions. That there are consequences to this can be seen in discussions on Dr. Aaronson’s blog when researchers in neuroscience direct attention to the use of quantum formalism in their own field.
When Birkhoff and von Neumann attempted to characterize a logic for quantum mechanics, they introduced the theory of orthomodular lattices. Every Boolean lattice is an orthomodular lattice. So, it is intrinsic to the mathematics that quantum formalism will have applications beyond physics.
I often ask myself if certain problems in application domains are knocking on the door of undecidability and incompleteness. The expertise needed to even begin to address such questions is far beyond my abilities.