After I wrote about AI last time (https://fakenous.substack.com/p/are-we-on-the-verge-of-true-ai), some readers urged me to take more seriously AI existential risk, i.e., the threat that AI is going to kill all humans. Here, AI researcher Eliezer Yudkowsky freaks out about this threat: www.youtube.com/watch?v=gA1sNLL6yg4
He makes it sound like we’re virtually certain to all be killed by a superintelligent AI in the not-too-distant future. Here, I’ll explain why I’m not freaking out as much as Eliezer Yudkowsky.
1. The History of AI Predictions
AI researchers have a history of exaggerated predictions. From Wikipedia:
1958, H. A. Simon and Allen Newell: “within ten years a digital computer will be the world’s chess champion”; “within ten years a digital computer will discover and prove an important new mathematical theorem.”
1965, H. A. Simon: “machines will be capable, within twenty years, of doing any work a man can do.”
1970, Marvin Minsky: “In from three to eight years we will have a machine with the general intelligence of an average human being.” [Edit: This quote may be bogus: https://www.openphilanthropy.org/research/what-should-we-learn-from-past-ai-forecasts/ —mh]
(Note: The computers of 1978 did not in fact have the intelligence of a human being.)
Science fiction stories were similarly off. The highly intelligent HAL 9000 computer imagined by Arthur C. Clarke was supposed to exist in 2001. The Skynet system that tried to kill all humans in The Terminator was supposed to achieve self-awareness in 1997. (They also developed time travel by 2029, which seems unlikely.)
In 2005, Ray Kurzweil predicted that personal computers would match the computing power of the human brain by 2020, AI would pass the Turing Test by 2029, and people would be uploading themselves by the end of the 2030’s.
That’s just about AI, but more generally, humans have an embarrassing history of crying wolf about all kinds of things (https://fakenous.substack.com/p/the-doomsday-cult). This shows that there is a strong, widespread bias in the human mind towards disaster predictions. When you hear a disaster prediction, or feel inclined to make one yourself, you need to discount it to correct for that bias.
2. Will AI Achieve Consciousness?
What do modern AI programs do? They do giant math problems. There's an excellent video explaining neural networks, the type of program that recent specialized AI systems (like DALL-E, ChatGPT, or self-driving car software) use.
They do a very large, complex arithmetic problem, starting from an input string of numbers, using a set of numbers the program already has (acquired during “training”), and outputting another series of numbers. It’s a huge problem, but it is all just adding and multiplying little numbers.
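To make that concrete, here is a minimal sketch in Python of what one "layer" of a neural network computes. The sizes and the random numbers standing in for trained weights are made up purely for illustration; a real model just chains millions of steps like this one.

```python
import numpy as np

# A toy "layer" of a neural network: nothing but multiplication and addition.
# The sizes and random weights here are invented purely for illustration.
rng = np.random.default_rng(0)

input_numbers = rng.normal(size=8)     # the numbers that come in
weights = rng.normal(size=(8, 4))      # numbers the program already has (from "training")
biases = rng.normal(size=4)            # more stored numbers

# Multiply, add, then clip negatives to zero (a common "squashing" step).
output_numbers = np.maximum(0, input_numbers @ weights + biases)

print(output_numbers)                  # another series of numbers, passed to the next layer
```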
Now to modernize Searle’s classic Chinese Room Argument: Imagine that we put you in a room with a giant book full of numbers. The book contains, say, 175 billion little numbers. A piece of paper with numbers on it comes in through a slit in the wall. You take the numbers on that paper, start multiplying them by numbers in the book, adding the results to other numbers, etc. It takes you about a million years, but eventually, after doing lots of addition and multiplication, you come up with a string of numbers that is the answer to this huge arithmetic problem. You write those numbers down on another piece of paper and pass them out through the slit in the wall.
Unbeknownst to you, the numbers that came in were the encoding of a prompt that someone typed into a chatbot program, and the numbers you passed out were the encoding of the chatbot’s response to that prompt.
Q: Do you, in this scenario, understand the prompt and your response to it?
A: Obviously not.
Well, that is all that an AI chatbot does. It converts your prompt into numbers, then multiplies and adds a whole bunch of numbers really fast, then converts the resulting string of numbers into text. If that wouldn’t give you an understanding of the text, there is no reason to think it would give a computer an understanding of the text either.
E.g., the text could be talking about the beautiful colors in the sunset. But neither you in the room nor the computer need have ever seen that sunset. A colorblind person could implement the program; he wouldn’t then suddenly know what colors look like.
It doesn’t matter if they give the program a trillion parameters instead of 175 billion, or if they give it more training data, etc. It’s still just multiplying and adding up numbers. There’s no reason to think that would give anyone an understanding of any of the text that those numbers encode.
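Here is a deliberately silly sketch of that prompt-to-response pipeline, just to make it visible. The tiny vocabulary and the random "model" are invented for illustration and are nothing like a real chatbot's actual code; the point is only the shape of the process: text becomes numbers, arithmetic is done on the numbers, and the result becomes text again.

```python
import numpy as np

# A toy chatbot "pipeline": text in -> numbers -> arithmetic -> numbers -> text out.
# The vocabulary and random "model" below are invented for illustration only.
vocab = ["the", "sunset", "is", "beautiful", "red", "orange"]

def encode(text):
    return [vocab.index(w) for w in text.split()]      # words -> numbers

def decode(token_ids):
    return " ".join(vocab[i] for i in token_ids)       # numbers -> words

rng = np.random.default_rng(0)
model_numbers = rng.normal(size=(len(vocab), len(vocab)))  # the "book of numbers"

def respond(prompt):
    ids = encode(prompt)
    scores = sum(model_numbers[i] for i in ids)        # just adding rows of numbers
    next_id = int(np.argmax(scores))                   # pick the biggest number
    return decode(ids + [next_id])

# Prints some continuation of the prompt. The program never saw a sunset;
# it only did arithmetic on numbers that happen to encode words.
print(respond("the sunset is"))
```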
That’s basically why I don’t believe that we’re about to build a conscious computer.
Q: But the computer is doing the same thing the brain does! So how can it not be conscious?
A: I don’t think it does the same thing the brain does. You probably didn’t learn English by reading 8 million web pages and 570 gigabytes of text. You did, however, observe the actual phenomena that the words refer to. The computer didn’t observe any of them. The computer also is not made up of 86 billion little cells sending electrical impulses to each other. (And btw, the “neurons” of the neural network are generally not physically real.) It contains completely different materials in a completely different arrangement, doing completely different things. (More discussion of the nature of the mind is warranted, but there’s no time for that now.)
Caveat: I don’t know exactly what causes consciousness, so I can’t rule out that we are somehow going to artificially create it at some future time. I am just saying that the mere fact that a machine produces humanlike text, via the sort of processes current computers use, is not good evidence that it has any mental states.
3. Is Non-Conscious AI Safe?
Does this mean that AI is perfectly safe? Of course not. In some ways, non-conscious AI’s might be more dangerous than conscious beings. Consider the fate of Elaine Herzberg, the first pedestrian to be killed by a self-driving car (https://en.wikipedia.org/wiki/Death_of_Elaine_Herzberg). In March 2018, she was walking a bicycle across a road (jaywalking), when a self-driving Uber car hit and killed her. Apparently, the software detected her 6 seconds before the collision but did not recognize her as a person until it was too late. It may have been “confused” by the presence of the bicycle. (Presumably, the training data contained both pedestrians and bicyclists, but not people walking a bike.) The human backup operator in the car also was not watching the road, so this was also a human failure.
The interesting point is not that self-driving cars can have accidents. We have accidents too! The interesting point is that they make kinds of mistakes that no person would make. (People make different mistakes, like getting drunk or texting their friends while driving.) No person, if actually watching the road, would have hit that pedestrian. No person would think that because the pedestrian has a bicycle, that means you don’t have to slow down for them.
The reason a computer can make that kind of absurd error is that the computer has no understanding of any of this. It’s just a calculating machine. It doesn’t know what a person is, or a car, or what driving is, or why you shouldn’t run over people, etc.
Self-driving cars could still be safer than humans overall. But the concern we should have is about more powerful AI’s that we would use for more important tasks. It’s not implausible, for example, that if we put military weapons systems under AI control, a computer error might trigger a war. This is a real risk that we’re going to have to take care to avoid.
However, the usual scenarios you hear in which AI winds up deciding to kill everyone seem to essentially depend on the idea that the AI will literally have desires, in the same sense that we do. If you don’t assume that, it’s hard to make out a plausible extinction scenario. Any way of exterminating the entire human species is going to involve a very complicated plan, including specific counter-measures for everything that the humans are going to do to try to avoid extinction. Out of a huge space of possible sets of actions the machine can do, it would have to navigate an extremely specific, narrow path that leads to human extinction. It’s very hard to see how the computer is going to just happen to end up doing the exact things it needs to do to kill all of us, merely as a result of programming error, or some unusual input that wasn’t represented in the training data.
In other words, it’s highly unlikely that the set of weights and biases in the neural network that minimizes the loss function during training is going to just happen to be one of the tiny number of such sets that lead it to do some human-extinction-causing sequence of actions in the real world.
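For a concrete picture of what training actually selects for, here is a minimal sketch of gradient-descent training on invented data with a deliberately tiny model. The loop just nudges the stored numbers until the outputs match the training examples; nothing in it rewards, or even represents, plans about the outside world.

```python
import numpy as np

# Training, in miniature: nudge the stored numbers (weights) so the model's
# outputs get closer to the training examples. The data here is invented.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))                   # made-up training inputs
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w + 0.1 * rng.normal(size=100)     # made-up training targets

w = np.zeros(3)                                 # the weights start as arbitrary numbers
for step in range(500):
    error = x @ w - y                           # how wrong the outputs are
    gradient = x.T @ error / len(y)             # which direction reduces the error
    w -= 0.1 * gradient                         # nudge the weights that way

print(w)   # ends up near true_w: weights that minimize the loss, and nothing more
```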
Look at real-world computer failures. In the Uber accident discussed above, the car killed the pedestrian, but that didn’t involve the computer going through a sequence of actions that looked like a clever plan to target that pedestrian (or to do anything else). It didn’t swerve toward the pedestrian as she moved out of the way, or drive off the road to get her. It just failed to brake.
4. The Dangers of Intelligence
I admitted above (sec. 2) that we might produce consciousness artificially at some future time. So what if we make an AI that literally has desires and understanding? Sure, then it might kill us. But it probably won’t.
4.1. Would a Superintelligent Being Want to Kill Us?
Probably not. If we figured out how to produce genuine desires and other mental states, we would probably create AI’s with benevolent desires.
When asked why the AI would want to kill us, Yudkowsky explained that “your body is made out of atoms that it could use for something else.” He made this sound almost inevitable, just based on theoretical reflection about the nature of intelligence. But consider actual, observed examples of intelligence.
Humans are highly intelligent compared to other animals. We are vastly smarter than ants (just as, people speculate, advanced AI will be vastly smarter than us). Yet we have not tried to exterminate all ants.
Or compare high-IQ humans to low-IQ humans. The high-IQ humans do not generally try to kill off the low-IQ ones. High-IQ people are in fact much less likely than low-IQ people to want to harm other people. So if we somehow build super-intelligent conscious AI’s, they will probably be even less crime-prone than smart humans.
Granted, we only have a small amount of evidence, since we only know one intelligent species, and conscious computers might be different. But at least the little bit of empirical evidence we have about intelligence does not suggest that it inevitably leads to wanting to destroy less intelligent beings.
Objection: Human tribes have often killed and/or enslaved other tribes. We still routinely torture other species and then chop their bodies into pieces for our pleasure. If we model an AI on our own minds, then it will lose all respect for us as soon as it has more power than us—because that’s what we are like.
Reply: That’s a fair point. So we’d better not model the AI on ourselves—at least not the average human; perhaps we could design a computer to imitate the best people we know. If we make a conscious machine, we’ll probably design it to be much more benevolent than us. E.g., if we figure out the basis for emotions, perhaps we’d make the AI love us.
4.2. Would a Superintelligent Being Be Able to Kill Us?
Occasionally, an intelligent person has bad goals. Ted Kaczynski, a.k.a. the Unabomber, is said to have a 167 IQ, making him among the smartest murderers in history. He’s now in prison for mailing numerous bombs and killing three people.
Wherever he’s being held, he’s much smarter than any of the guards in that prison. But that does not mean that he’s going to escape that prison.
You can imagine any genius you want. Say Einstein, or Isaac Newton, or whoever is your favorite genius, gets thrown in prison. If the genius and the prison guard have a contest of wits starting from equal positions, then the genius is going to win. But if the genius starts out on the inside of the prison cell, and the guard on the outside, then the genius is never getting out. Intelligence isn’t magic; it doesn’t enable you to do just anything you want.
Take another example. If Magnus Carlsen (the human world chess champion) plays a match against AlphaZero (the world’s best chess program), Carlsen is going to lose, with near 100% certainty. But now imagine that Carlsen gets a handicap: AlphaZero has to play without one of its knights. Well, in that case, Carlsen is going to win every time. There’s just no recovering (against an expert player) from that kind of disadvantage. It doesn’t matter if you ramp up the program’s processing power by a trillion-fold or let it train for a decade. The program could play a perfect game from that point, and it would still lose. (Ask any chess player.)
If we build an Artificial General Intelligence, we’re probably going to give ourselves huge starting advantages over it, advantages that simply can’t be overcome even with the equivalent of perfect play.
4.3. Objection: The Precautionary Principle: If there is even a small chance that AI will kill everyone, shouldn’t we try to stop it?
Okay, this one is a good point. The odds of our building an evil, superintelligent computer are small, but we should still make efforts to avoid doing that, since the consequences are so large. So I’m glad that there are AI safety researchers working on that now.
It’s worth considering the other side of the ledger, though, the good side of superintelligent AI. There are other existential risks that humanity faces, which AI may help us avert. E.g., it might help us build an effective asteroid-defense system, or find a cure for a global pandemic. It seems more likely that advanced AI will help save the human species from extinction than that it will extinguish the species. So it makes sense to continue working on AI, while still working on ways of making it safer.
5. Humans: The Real Threat
As I mentioned in a previous post (https://fakenous.substack.com/p/existential-risks-ai), I think our biggest problem isn’t AI alignment but human alignment.
We’re lucky that nuclear weapons are expensive and difficult to make. If we had a cheap, widely available technology with the power to extinguish the human species, some human would deliberately use it to do that. That, in my view, is the biggest threat to human survival. Extremely evil/crazy humans are a very small portion of the species, but a small number might be able to cause enormous harm, and (unlike killer computers) we have definite, empirical proof of such humans existing. If ISIS had access to a technology that would kill all the Jews in the world, they would use it in a second.
So if, e.g., we build a superintelligent AI with lots of safety features built in, we also need to figure out how to stop humans from deliberately disabling the safety features.
I think you may be confusing consciousness (the capacity to experience qualia) with agency (the act of taking steps to achieve objectives). Reinforcement learning agents such as AlphaZero already possess "desires" – they aim to maximize their chances of winning. Yudkowsky's concern is that we might create an exceptionally intelligent agent with an inadequately defined utility function, resulting in everyone dying. The agent does not necessarily need to be conscious.
People working on AI-related existential risks often reference thought experiments, like Steve Omohundro's work on instrumental convergence, to illustrate why an intelligent agent might cause human extinction. Essentially, Omohundro argues that nearly all objectives, pursued by a sufficiently capable utility-maximizing agent, drive the agent to seek power, ultimately disempowering humans and potentially killing them in the process.
I find the reasoning behind the thought experiment convincing. I'm less convinced that we'll inevitably build such an AI, though. Yudkowsky's thoughts might stem from prior RL agent research, yet current self-supervised deep learning methods, used in LLMs like GPT-4, aren't producing anything close to such an agent. Still, we probably should avoid constructing unrestricted utility-maximizers until we can align them with correct moral values.
This was a nice piece to read alongside AC10's "Why I Am Not (As Much Of) A Doomer (As Some People)"
https://open.substack.com/pub/astralcodexten/p/why-i-am-not-as-much-of-a-doomer?r=u0kr&utm_campaign=post&utm_medium=web