Mar 18 · edited Mar 18 · Liked by Michael Huemer

I think you may be confusing consciousness (the capacity to experience qualia) with agency (the act of taking steps to achieve objectives). Reinforcement learning agents such as AlphaZero already possess "desires" – they aim to maximize their chances of winning. Yudkowsky's concern is that we might create an exceptionally intelligent agent with an inadequately defined utility function, resulting in everyone dying. The agent does not necessarily need to be conscious.

People working on AI-related existential risks often reference thought experiments, like Steven Omohundro's work on instrumental convergence, to illustrate why an intelligent agent might cause human extinction. Essentially, Omohundro argues that nearly all utilitarian objectives drive the agent to pursue power, ultimately disempowering humans and potentially killing them in the process.

I find the reasoning behind the thought experiment convincing. I'm less convinced that we'll inevitably build such an AI, though. Yudkowsky's thoughts might stem from prior RL agent research, yet current self-supervised deep learning methods, used in LLMs like GPT-4, aren't producing anything close. Still, we probably should avoid constructing unrestricted utility-maximizers until we can align them with correct moral values.

Mar 18 · Liked by Michael Huemer

This was a nice piece to read alongside AC10's "Why I Am Not (As Much Of) A Doomer (As Some People)"


Mar 20 · Liked by Michael Huemer

> (Note: The computers of 1978 did not in fact have the intelligence of a human being.)

I think this point is debatable. Have you met humans?

Mar 19 · edited Mar 19 · Liked by Michael Huemer

Good piece. I'm glad to see you on the "don't panic" side!

On 4.3: This is a really important point that I have talked about and need to write about. Superintelligent AI could produce enormous benefits, not least of which is radical life extension. Without this, we are all going to die. Humans just aren't making much progress on the aging problem. So the entire existing human race is likely to go extinct -- unless AI can accelerate the research. That raises the issue of how much to value the future existence of people who don't exist yet, but that's a big thing to get into, and I find it hard to talk to many AI people about this because they are utilitarians and I am not.

Expand full comment

A chess engine was able to beat a human grandmaster (in a single game) while being a knight down: https://www.chess.com/news/view/smerdon-beats-komodo-5-1-with-knight-odds.

A 5-1 loss is fine, in a chess tournament. It's much less fine when a single loss means the end of humanity.

Expand full comment

“we have not tried to exterminate all ants.”

We exterminate ants any time we consider it convenient and cost effective. If we had no concern about the environmental role of ants, or about the ants for their own sake, and an economical means for converting them into something more useful to us, we would.

So long as any AIs we create depend on us as a necessary part of their environment, or have a concern for humans directly, and they are smart enough not to shoot themselves in the foot, we are fine. But we don’t yet know how to let them modify themselves without opening up a possibility that they might decide that robots could fulfill the environmental role of humans pretty well considering the inconvenience of allowing humans to do their thing.

Perhaps it depends on how much we think morality is derived from pure reason and prudence. Is intelligence necessarily social? Necessarily social and welcoming to cooperative outsiders? What counts as cooperative? Would true advanced general intelligence rule out psychopathology? How alien can an AGI be and still be intelligent?

If we predict the behavior of AGI by extrapolating human intelligence, that seems like we are making some big assumptions. Are we able to design these assumptions into the AGI in a way that it can’t unlearn them? Are they baked into intelligence itself? The existence of human psychopaths demonstrates that this is not true even for the sample we are extrapolating from. Maybe the psychopaths that concern us are not smart enough to play along?

Expand full comment

Whether an AI has genuine desires or whether it merely simulates desires doesn't seem to make any difference to the argument for AI existential risk. I am confused why you seem to think that it does make a difference. What matters more is whether an AI takes high-quality actions towards a goal that is contrary to what humans want. And I don't think any argument in this essay demonstrates that AIs won't do that in the future.

I actually agree with you about Eliezer Yudkowsky being wrong about doom. I think he's wrong about a lot of things regarding AI. I wrote a comment outlining some of the ways that I disagree with him here: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=9ZhXbv8p2fr8mkXaa

But on the whole, and in an abstract sense, I think his arguments are way closer to being correct than the ones you gave here. I think it would help for you to think more carefully about the different types of possible future scenarios in which advanced AI gets created.

Even if no individual AI system is conscious, or can take over the world all by itself, it still seems highly plausible to me that, in the long run, AIs will gradually be given the keys to power in our world as they begin to automate intellectual labor, surpass the (functional) intelligence of humans in every relevant domain, and become much more numerous or materially productive than humans. In that type of scenario, it's quite clear to me that something could go deeply wrong, possibly ending in human extinction.

Expand full comment

What events or milestones would need to happen for you to consider AI X risk to even be a possibility?

And what events would be needed for you to be concerned?

Expand full comment

Searle’s Chinese Room Argument always seemed dumb to me, and it surprises me that people are still using it.

The Chinese Room Argument contains two fallacies:

First, it conflates a composition of elements with a single element:

Is a plank of wood a ship? No, but a whole bunch of them are. You can't say that you can never sail across the ocean because a wood plank isn't a ship.

No, the message operator in the room doesn't speak Chinese. But the room *as a system* does speak it. Are the individual neurons in your brain conscious? No, they are simple machines. But you as a whole are sentient. Likewise, software may be sentient even if individual lines of code are not.

The second fallacy is a trick: it compares a one-rule operator to 100 billion neurons in your brain. The hidden argument is that a simple rule engine cannot compete with a 100-billion rule network. The hidden implication is that a "simple" system can never do the job of an ultra-complex one like the brain. But no one claims that sentience can be replicated with a simple system. Perhaps 100 billion neurons are the minimum needed for intelligence, and that's fine. In fact, GPT-4 has about the same number of neurons as the human brain.

Expand full comment

really really really really really really really really really really really really really really really really really really really bad arguments

Expand full comment

The Chinese Room argument is a terrible argument. I wish you would stop finding it so persuasive. The only form in which it has any semblance of plausibility is the look-up table version. No modern AI system is a look-up table. Neural networks are not look-up tables. Despite your protestations, yes, at an abstract level, neural networks work the same way as your brain does. You say: "You probably didn’t learn English by reading 8 million web pages and 570 Gigabytes of text.", but why should this matter for the plausibility of your Chinese Room argument? Why should it matter how the parameters of the neural network are learned? Your brain and a neural network are both systems composed of a very large number of units each doing a simple, mindless calculation (this is really the only thing that matters for your version of the Chinese room argument). If the Chinese room argument is convincing for one, it should be convincing for the other as well, regardless of how they learned the parameters of their respective mindless units.

Expand full comment

> In 2005, Ray Kurzweil predicted that personal computers would match the computing power of the human brain by 2020,

The computing power of the human brain is roughly 50 Hz * 10^11 neurons = 5 * 10^12 operations per second. The most powerful consumer GPU released in 2020 was the RTX 3090, which was capable of about 36 teraflops. These numbers are not quite apples-to-apples, but it still seems like the development of personal computers has overtaken Kurzweil's prediction.
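To make the arithmetic above easy to check, here is a minimal Python sketch. It assumes the comment's own rough figures (10^11 neurons, a 50 Hz firing rate, 36 teraflops for the GPU) and treats one neuron firing as one "operation", which is a big simplification. The function names are hypothetical and only for illustration.

```python
# Rough figures taken from the comment above; all are approximations.
NEURONS = 1e11            # assumed neuron count of a human brain
FIRING_RATE_HZ = 50       # assumed average firing rate per neuron
GPU_FLOPS = 36e12         # assumed peak throughput of the GPU (36 teraflops)


def brain_ops_per_second(neurons=NEURONS, rate_hz=FIRING_RATE_HZ):
    """Crude estimate: one 'operation' per neuron per firing."""
    return neurons * rate_hz


def gpu_to_brain_ratio(gpu_flops=GPU_FLOPS):
    """How many times the GPU figure exceeds the crude brain estimate."""
    return gpu_flops / brain_ops_per_second()


if __name__ == "__main__":
    print(f"brain estimate: {brain_ops_per_second():.1e} ops/s")
    print(f"GPU / brain:    {gpu_to_brain_ratio():.1f}x")
```

Under these assumptions the GPU figure comes out several times larger than the brain estimate. The comparison is only as good as the "one firing equals one operation" assumption.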

> AI would pass the Turing Test by 2029

ChatGPT is specifically trained NOT to pass the Turing Test and to say explicitly that it's a language model at every opportunity. I have no doubt that if this part of the training were reversed, it would easily pass the Turing Test.

In hindsight, Minsky & co. had a completely wrong concept of AI development, so their predictions should be discounted.

> Well, that is all that an AI chatbot does. It converts your prompt into numbers, then multiplies and adds a whole bunch of numbers really fast, then converts the resulting string of numbers into text.

How is it significantly different from what's happening in our brain?

> Occasionally, an intelligent person has bad goals.

The risk is not that the AI will become more intelligent than any particular human, by achieving e.g. IQ 200. The risk is that it will become smarter than all humans put together by achieving e.g. IQ 1000.

Expand full comment

The ant analogy is also a bit unconvincing. Humans may not have tried to kill off all ants, but do you know how many species we have driven to extinction? It's loads! The AI may not care about us, but it will want to build a world that is suited to its needs, just as we humans previously did. That world will probably look very different indeed, except that, since the AI is smarter than us, it will be able to alter the environment even more drastically than we have. Who knows what that world looks like, but we can say with certainty that a world built by an AI to suit AI needs is less hospitable for humans than a world built by humans to suit human needs.

Expand full comment

I disagree with the notion that most possible sets of actions an AI could take would be relatively harmless, and that only a few would be existential. The more powerful an AI gets, the less true this becomes: a higher proportion of the world states it can bring about are dangerous. If an AI gets control of a nuclear button, then "press the button yes/no?" is a decision with two options and one of them is catastrophic, so already 50% of the possibility space is disastrous. The more potentially devastating actions it is capable of taking, the more it has to do the "right" thing every time to keep humans safe.

Expand full comment

Some nitpicks (see my other comment for a more substantive reply):

- The Marvin Minsky quote is likely fictitious. See this article: https://www.openphilanthropy.org/research/what-should-we-learn-from-past-ai-forecasts/#2-the-peak-of-ai-hype

- Ray Kurzweil's prediction that a Turing Test will be passed by 2029 hasn't been falsified yet, so I don't understand why you're citing it in your list of failed predictions.

- I think the version of the Chinese Room argument you presented is very weak. We could imagine an analogous argument that applies equally to human brains. For example, you could say that other people are not conscious since all that's happening in their brain is that a bunch of subatomic particles are moving around. But obviously this is a bad argument, unless you're willing to say that humans aren't conscious either.

- I don't know what you mean by this: "You probably didn’t learn English by reading 8 million web pages and 570 Gigabytes of text. You did, however, observe the actual phenomena that the words refer to. The computer didn’t observe any of them." It's true that the human brain learns more efficiently than current machine learning models, but I don't see why data efficiency is a pre-requisite for consciousness. Also, multi-modal models like GPT-4 actually do observe "the actual phenomena" that words refer to, since they're fed in images as well as text. Future models will likely be fed in audio, video, and tactile information too. The only way I can see your argument making sense here is if you start by assuming your conclusion (namely, that the AI isn't conscious, so it's not really observing anything in the same way we are).

- I'm slightly skeptical that Magnus Carlsen would be able to consistently beat the latest version of Stockfish running on competitive hardware, even with the handicap that Stockfish starts without a knight. I think you might be underestimating the difference between Stockfish and Magnus Carlsen. I don't know that much about chess, but I notice that this ranking puts Stockfish at an Elo rating of about 3534 (https://ccrl.chessdom.com/ccrl/4040/), whereas Carlsen is typically rated nearly 700 points lower. To put that in perspective, that's about the same as the difference between Magnus Carlsen and a candidate master (4 ranks below a grandmaster, according to this table on Wikipedia: https://en.wikipedia.org/wiki/Chess_rating_system#Elo_rating_system). I'm not sure, though, and I'd love to hear someone who knows a lot more about chess weigh in.
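For a sense of what a 700-point rating gap means, here is a small Python sketch of the standard Elo expected-score formula. The 3534 figure is the one quoted above; the 2834 figure is simply 3534 minus 700, an assumption for illustration rather than Carlsen's actual rating. The formula describes an even game, so it says nothing about how much a knight handicap would shift the result, and engine and human rating pools are not directly comparable.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B.

    The result is between 0 and 1 and counts a win as 1 and a draw as 0.5.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


if __name__ == "__main__":
    engine_rating = 3534   # figure quoted in the comment above
    human_rating = 2834    # assumed: 700 points lower, for illustration only
    score = expected_score(human_rating, engine_rating)
    print(f"expected score for the lower-rated player: {score:.4f}")
```

With a 700-point gap the lower-rated side is expected to score a little under 2% of the available points in an even game.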

Expand full comment

I think one of Yudkowsky's main points is that the space of possible "minds" is vast, and that the method used to realize such "minds", namely gradient descent, is likely to produce a "mind" that is a whole lot stranger than anyone imagines. In many ways he is fighting against the human tendency to anthropomorphize "minds". I think he is wrong about foom and about intelligence being some sort of vast magical power, so I'm currently not very concerned (although my beliefs have gone all over the place). Nevertheless, I think the point about AI potentially being a Shoggoth is overlooked by many of his critics, and perhaps exaggerated by his followers.
