After I wrote about AI last time (https://fakenous.substack.com/p/are-we-on-the-verge-of-true-ai), some readers urged me to take more seriously AI existential risk, i.e., the threat that AI is going to kill all humans.
Your response to the Chinese Room argument doesn't seem to work at all. Yes, there is more going on in humans because we are conscious. Is your argument that we have just as much evidence that computers are conscious as we do humans? Because that's false.
Searle’s Chinese Room Argument always seemed dumb to me, and it surprises me that people are still using it.
The Chinese Room Argument contains two fallacies:
First, it conflates a composition of elements with a single element:
Is a plank of wood a ship? No, but a whole bunch of them are. You can't say that you can never sail across the ocean because a wood plank isn't a ship.
No, the message operator in the room doesn't speak Chinese. But the room *as a system* does speak it. Are the individual neurons in your brain conscious? No, they are simple machines. But you as a whole are sentient. Likewise, software may be sentient even if individual lines of code are not.
The second fallacy is a trick: it compares a one-rule operator to 100 billion neurons in your brain. The hidden argument is that a simple rule engine cannot compete with a 100-billion rule network. The hidden implication is that a "simple" system can never do the job of an ultra-complex one like the brain. But no one claims that sentience can be replicated with a simple system. Perhaps 100 billion neurons are the minimum needed for intelligence, and that's fine. In fact, GPT-4 has about the same number of neurons as the human brain.
The Chinese Room argument is a terrible argument. I wish you would stop finding it so persuasive. The only form in which it has any semblance of plausibility is the look-up table version. No modern AI system is a look-up table. Neural networks are not look-up tables.
Despite your protestations, yes, at an abstract level, neural networks work the same way as your brain does. You say: "You probably didn’t learn English by reading 8 million web pages and 570 Gigabytes of text." But why should this matter for the plausibility of your Chinese Room argument? Why should it matter how the parameters of the neural network are learned? Your brain and a neural network are both systems composed of a very large number of units, each doing a simple, mindless calculation (this is really the only thing that matters for your version of the Chinese Room argument). If the Chinese Room argument is convincing for one, it should be convincing for the other as well, regardless of how they learned the parameters of their respective mindless units.
The Chinese Room Argument is an excellent argument, and I don't see how you have addressed it at all. You didn't identify any step in the argument that you're disagreeing with. Would you claim, e.g., that the guy in the room *does* understand Chinese?
The argument is not specific to look-up tables. Searle did not describe it that way, nor did I. I deliberately phrased it to apply to neural networks, by imagining the person in the room doing the huge series of arithmetic problems involved in a neural network computation.
The problem is that you could construct the exact same argument about the neural network that is your brain (imagine a little guy inside your brain doing all the mindless computations that your neurons do, etc.). You have not provided any convincing arguments why the Chinese room wouldn't apply to your brain. So, whatever the Chinese room is supposed to show about AI (that AI can't be conscious, that it can't have thoughts, understanding etc), you can use it to argue the same about your brain (that it can't be conscious, that it can't have thoughts etc.). But that's absurd, so it's a reductio ad absurdum.
First of all, I don't consider concepts like meaning, thought, or understanding as simple, monolithic, all-or-nothing concepts. Understanding a language has various degrees and aspects. LLMs can be said to understand different aspects of language to varying degrees. My view on this is close to the view advocated in this paper: https://arxiv.org/abs/2102.04310
With that in mind, I don't have any problem attributing understanding to an LLM or to any other model that simulates that LLM. I think part of the difficulty with the homunculus is anthropomorphization. If you replace the "little guy" with another neural network model or another computer that perfectly simulates the original LLM, people don't find it nearly as problematic to attribute understanding to it as they do with the "little guy".
I think this also addresses your other point above. Yes, some aspects of mental states may be non-computational (qualia, consciousness, etc.), but it seems pretty clear that an awfully big chunk of cognition and perception is purely computational (just ask any cognitive scientist or computer scientist). It seems absurd to deny mental states to computational models completely ("they can't have any thoughts", "they can't have any "real" understanding", "they can't have beliefs/desires/goals"), just because they may lack some potentially non-computational aspects typically associated with these mental states in humans.
I don't see the absurdity of that. It sounds kind of like common sense to me.
I think Searle's point, which I agree with, is that computation, or running a program, is not *sufficient* for having a mind. (It could still be necessary.)
No, I think common sense is "If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck." The non-computational aspects of the mind you speak of may even be unverifiable in principle by third parties. Just ask yourself if you'd be willing to deny mental states and a sophisticated internal life (and all the rights and privileges that come with having a mind) to a system/organism/machine that behaves exactly like a super-smart human being in every respect. No, you wouldn't. I find it a bit surprising that you seem to be endorsing what is essentially a skeptical position with respect to mental states.
Anyway, going back to the Chinese room, I just want to reiterate again it's a terrible argument (I have yet to encounter a single person other than yourself who finds it convincing). I'm not sure if it's a completely circular argument, but it certainly comes close. It's supposed to convince people that the mind can't be purely computational, but then we're not allowed to apply it to the brain, because, we're told, the brain is not purely computational. So, if you don't already believe the mind/brain is not purely computational, it has zero persuasive appeal.
I think you're assuming that mental states are purely computational properties. But that's your view, not Searle's view. Searle does not claim that consciousness is produced by purely formal properties, so the argument does not apply to his view.
> In 2005, Ray Kurzweil predicted that personal computers would match the computing power of the human brain by 2020,
The computing power of the human brain is roughly 50 Hz × 10^11 neurons ≈ 5 × 10^12 operations per second. The most powerful consumer GPU released in 2020, the RTX 3090, was capable of about 36 teraflops. These numbers are not quite apples-to-apples, but it still seems that the development of personal computers has overtaken Kurzweil's prediction.
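To spell that arithmetic out (under the admittedly rough assumptions that one neuron state change counts as one operation and that neurons fire at ~50 Hz on average):

```python
# Back-of-the-envelope comparison under very rough assumptions:
# one neuron state change = one operation, average firing rate ~50 Hz.
neurons = 1e11            # ~10^11 neurons in the human brain
firing_rate_hz = 50       # assumed average firing rate
brain_ops_per_sec = neurons * firing_rate_hz   # ~5e12 "operations" per second

rtx_3090_flops = 36e12    # ~36 TFLOPS (FP32) for an RTX 3090

print(f"brain (rough estimate): ~{brain_ops_per_sec:.1e} ops/s")
print(f"RTX 3090:               ~{rtx_3090_flops:.1e} FLOPS")
print(f"ratio (GPU / brain):    ~{rtx_3090_flops / brain_ops_per_sec:.1f}x")
```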
> AI would pass the Turing Test by 2029
ChatGPT is specifically trained NOT to pass the Turing Test and to say explicitly that it's a language model at every opportunity. I have no doubt that if this part of the training were reversed, it would easily pass the Turing Test.
In hindsight, Minsky & co. had a completely wrong concept of AI development, so their predictions should be discounted.
> Well, that is all that an AI chatbot does. It converts your prompt into numbers, then multiplies and adds a whole bunch of numbers really fast, then converts the resulting string of numbers into text.
How is it significantly different from what's happening in our brain?
> Occasionally, an intelligent person has bad goals.
The risk is not that the AI will become more intelligent than any particular human, by achieving e.g. IQ 200. The risk is that it will become smarter than all humans put together by achieving e.g. IQ 1000.
- The difference is that the human brain produces consciousness and understanding. When you respond to a text prompt, you do so by understanding the actual subject matter and expressing your beliefs about that subject.
- Regarding intelligence: I think we need to ground our ideas about intelligence in observation. If the things people are saying about the AI don't have any resemblance to what's true of any observed examples of intelligence, then I think they're groundless predictions. E.g., if we observed people becoming more destructive as IQ increases, then we could perhaps project that a mind with a 1000 IQ would destroy everyone else. But that's not what we observe; quite the contrary.
> - The difference is that the human brain produces consciousness and understanding. When you respond to a text prompt, you do so by understanding the actual subject matter and expressing your beliefs about that subject.
Well, yes, but those are two different levels of information processing. On the level of neurons our brains are also (probably) just doing some arithmetic operations.
We don't exactly know how GPT works on the "high level", so a priori it could be conscious and have "understanding". I actually tried to test whether ChatGPT is conscious (https://eterevsky.substack.com/p/on-consciousness-and-chatgpt) and got an impression that it likely isn't, but we can't rule out that a slightly different architecture or a bigger model could be conscious.
> Regarding intelligence: I think we need to ground our ideas about intelligence in observation.
Yes, this observation actually makes me somewhat less worried about AI doom. However, it's not bulletproof, since we don't know what type of intelligence the first super-intelligent AI will possess. We don't know whether it will be conscious or empathetic, or what kinds of goals it will pursue. So I'd say there's still a high risk of an unfriendly super-intelligent AI (maybe 30-50%).
Regarding the Turing test, keep in mind that the test is supposed to be done by experts. Since ChatGPT came out, I've seen people occasionally copy-pasting ChatGPT text, and it sounds robotic (once you know what to look for). A friend recently inserted a ChatGPT response into a text exchange. I immediately saw how robotic it sounded, whereupon he confirmed that it was indeed chatbot text. I imagine that actual experts would be even better than me at identifying chatbot text, especially when specifically prompted to look for it.
Most articles that I found refer to the amount of computing power needed for a classical computer to emulate a brain (like this overview: https://aiimpacts.org/brain-performance-in-flops/). Furthermore, these computations are in "MIPS", which, as I understand it, represent single-bit operations.
I based my estimate on a single neuron state change as a non-trivial fundamental operation (a "flop" is also a complex operation, roughly a multiplication of two 32-bit floating-point numbers). So it still seems to me that an RTX 3090 (and yes, this is a desktop GPU; I had one in my desktop) is comparable to a human brain in terms of the raw amount of computation it performs. Still, I agree that there's a lot of uncertainty.
We can identify three potential levels of difficulty of the Turing test:
1. Test is performed by a random person.
2. Test is performed by an expert that is only familiar with a previous generation of AI (like an expert familiar with GPT-3 is examining GPT-4).
3. Test is performed by an expert familiar with the current generation of AI.
I think level 1 of the Turing test has almost certainly been passed.
I also think that level 3 is unfair, since the fact that an AI is detectable doesn't necessarily mean that it's not intelligent (in some sense). An expert could probably also distinguish text generated by a person with a degree in philosophy from that of a random guy, which doesn't imply that either one is not intelligent.
I'm uncertain what would be the result of a level 2 test.
The ant analogy is also a bit unconvincing. Humans may not have tried to kill off all ants, but do you know how many species we have driven to extinction? It's loads! The AI may not care about us, but it will want to build a world suited to its needs, just as we humans previously did, and that world would probably look very different indeed; except that, since the AI is smarter than us, it will be able to alter the environment even more drastically than we have. Who knows what that world looks like, but we can say with certainty that a world built by an AI to suit AI needs is less hospitable for humans than a world built by humans to suit human needs.
The number of species we've made extinct is a tiny fraction of all species, and most of that happened before we reached a high level of development. Since then, we have been doing a lot to preserve species, bringing some back from the brink, and even working on reviving extinct ones from DNA.
I would add that (a) there are few if any cases of someone actually *wanting* to extinguish species, and (b) the extinctions have been caused, not by one really smart person, but by the actions of many people.
I am way smarter than any ant. Yet I cannot exterminate an ant species.
Right, but the AI will be more powerful than us (and therefore capable of transforming the world far more radically than we have); and it may not share our values to preserve other species (which is pretty inconsistent anyway; we care about pandas but not mosquitos).
Yes, the AI will (eventually) be more powerful than unaided humans. We can be in a better position than mosquitos or pandas by not being a problem for the AI. This seems likely if we co-evolve with AI. AI is supposed to be a tool to help humans, so let's try not to design them to be wholly independent.
I disagree with the notion that most possible sets of actions an AI could take would be relatively harmless, and that only a few would be existential. The more powerful an AI gets, the less true this becomes: a higher proportion of the world states it can bring about are dangerous. If an AI gets control of a nuclear button, then "press the button: yes or no?" is a decision with two options, one of which is catastrophic, so already 50% of the possibility space is disastrous. The more potentially devastating actions it is capable of taking, the more it has to do the "right" thing every time to keep humans safe.
Yes, but it seems unlikely that people would put an AI in charge of deciding whether to launch a nuclear strike. And there aren't many other things I can think of that are like that -- maybe deploying other weapons of mass destruction that we might invent?
Who cares if humans put it in charge of a nuclear button? A sufficiently advanced AI could work out a way to get control without asking our permission, by hacking computer systems or similar. And it might be something it wants to do, as it would then be able to blackmail the entire human population to do whatever it wants, which could be useful for all sorts of plans.
- Ray Kurzweil's prediction that a Turing Test will be passed by 2029 hasn't been falsified yet, so I don't understand why you're citing it in your list of failed predictions.
- I think the version of the Chinese Room argument you presented is very weak. We could imagine an analogous argument that applies equally to human brains. For example, you could say that other people are not conscious since all that's happening in their brain is that a bunch of subatomic particles are moving around. But obviously this is a bad argument, unless you're willing to say that humans aren't conscious either.
- I don't know what you mean by this: "You probably didn’t learn English by reading 8 million web pages and 570 Gigabytes of text. You did, however, observe the actual phenomena that the words refer to. The computer didn’t observe any of them." It's true that the human brain learns more efficiently than current machine learning models, but I don't see why data efficiency is a pre-requisite for consciousness. Also, multi-modal models like GPT-4 actually do observe "the actual phenomena" that words refer to, since they're fed in images as well as text. Future models will likely be fed in audio, video, and tactile information too. The only way I can see your argument making sense here is if you start by assuming your conclusion (namely, that the AI isn't conscious, so it's not really observing anything in the same way we are).
- I'm slightly skeptical that Magnus Carlsen would be able to consistently beat the latest version of Stockfish running on competitive hardware, even with the handicap that Stockfish doesn't have a knight. I think you might be underestimating the difference between Stockfish and Magnus Carlsen. I don't know that much about chess, but I notice that this ranking puts Stockfish at an Elo score of about 3534 (https://ccrl.chessdom.com/ccrl/4040/), whereas Carlsen is typically rated nearly 700 points lower. To put that in perspective, that's about the same as the difference between Magnus Carlsen and a candidate master (4 ranks below a grandmaster, according to this table on Wikipedia: https://en.wikipedia.org/wiki/Chess_rating_system#Elo_rating_system). I'm not sure, though, and I'd love to hear someone who knows a lot more about chess weigh in.
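To put rough numbers on that, here is the standard Elo expected-score formula applied to those ratings. Caveat: the CCRL engine rating and FIDE human ratings come from different pools, and the 2850 and 2200 figures below are only illustrative, so this is suggestive rather than exact.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Illustrative numbers only: 3534 is the CCRL figure cited above,
# ~2850 is roughly Carlsen's classical rating, ~2200 a candidate master.
stockfish, carlsen, candidate_master = 3534, 2850, 2200

print(f"Carlsen's expected score vs Stockfish:       {elo_expected_score(carlsen, stockfish):.3f}")
print(f"Candidate master's expected score vs Carlsen: {elo_expected_score(candidate_master, carlsen):.3f}")
```

On those (not directly comparable) numbers, Carlsen would be expected to score only about 2% against Stockfish, roughly the mirror image of how a candidate master would be expected to fare against him.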
Regarding the learning process: On the functionalist view, the computer has to not only produce humanlike outputs; it must do so *via similarly structured internal states* as humans, in order for us to conclude that it has the same mental states as us. The things I cited have to do with how radically different the processes in the computer are from human mental processes.
Regarding chess: I asked a chess expert I know (who is an IM). The issue is that there is only so much room for improvement. Even with *theoretically perfect play*, it is probably impossible to beat Carlsen with a knight disadvantage. That's the point I was trying to make.
> Also, multi-modal models like GPT-4 actually do observe "the actual phenomena" that words refer to, since they're fed in images as well as text.
Human children don't just have information fed to them. They go out and interact with the world. That allows them to develop an understanding of the world. The multi-modal models are not interacting with the world. They're not even interacting with the data they're being fed. They're just receiving that data and making models to predict how parts of it will evolve. Their models are impoverished compared to human models because they cannot manipulate the objects that provide the data they're ingesting.
You might want to look up the Held and Hein kitten carousel experiment to see what a difference active engagement with the environment can make. (WARNING: it does not turn out well for some of the kittens! Soft-hearted people might want to avoid looking it up.)
I think one of Yudkowsky's main points is that the space of possible "minds" is vast, and that the method used to realize such "minds", namely gradient descent, is likely to produce a "mind" that is a whole lot stranger than anyone imagines. In many ways he is fighting against the human tendency to anthropomorphize "minds". I think he is wrong about foom and about intelligence being some sort of vast magical power, so I'm currently not very concerned (although my beliefs have gone all over the place). Nevertheless, I think the point that an AI could potentially be a Shoggoth is overlooked by many of his critics, and perhaps exaggerated by his followers.
I think you may be confusing consciousness (the capacity to experience qualia) with agency (the act of taking steps to achieve objectives). Reinforcement learning agents such as AlphaZero already possess "desires" – they aim to maximize their chances of winning. Yudkowsky's concern is that we might create an exceptionally intelligent agent with an inadequately defined utility function, resulting in everyone dying. The agent does not necessarily need to be conscious.
People working on AI-related existential risks often reference thought experiments, like Steven Omohundro's work on instrumental convergence, to illustrate why an intelligent agent might cause human extinction. Essentially, Omohundro argues that nearly all utility-maximizing objectives drive the agent to pursue power, ultimately disempowering humans and potentially killing them in the process.
I find the reasoning behind the thought experiment convincing. I'm less convinced that we'll inevitably build such an AI, though. Yudkowsky's thoughts might stem from prior RL agent research, yet current self-supervised deep learning methods, used in LLMs like GPT-4, aren't producing anything close. Still, we probably should avoid constructing unrestricted utility-maximizers until we can align them with correct moral values.
Thanks. We have a fundamental disagreement: I don't think any computer has any desires, or beliefs, or intentions. I don't think AlphaZero aims to do anything (not even win at chess). It's a mechanism that was invented by people who wanted to find winning chess moves, but *it* doesn't have any such goal.
It's similar to a watch. Humans invented a complex mechanism for the purpose of telling the time, and it's very good at that. But *the watch* doesn't have a goal of keeping time, or anything else. That's like how I think of computers.
Regarding Omohundro's argument: That sounds plausible from the armchair. But let's look at actual intelligent agents and see. We have some intelligent agents. Some of them pursue power (e.g., politicians, dictators), but as a matter of fact, most of them don't care all that much about it, and almost all who do are men (not women, not children). Also, almost none of the actual intelligent agents we have are interested in killing people. So it can't be that such behavior results from the nature of intelligence as such.
Omohundro's argument is not about intelligence per se, but about goal-directed optimization. So what we observe is that humans aren't utility maximizers. But a hypothetical AI, like a paperclip maximizer, might be one.
I think it's irrelevant for the argument whether computers have real desires, or just act as if they do. For example, if we could simulate a person's brain on a computer, it could still act as if it has desires even if it does not have qualia. What really matters is how well a system's behavior is approximated by a certain mathematical model.
What do you mean by a "utility maximizer"? Any agent could be described as maximizing utility, if you allow weird enough utility functions. Maybe you mean maximizing a *simple* utility function. Here, the post referenced in my other comment is relevant: https://sohl-dickstein.github.io/2023/03/09/coherence.html, where the author argues that more intelligent agents tend to have less simple, less unified goals.
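To make that concrete, here is a contrived toy example (my own illustration, not from the linked post): take any log of what an agent actually did, and define a utility function that assigns 1 to exactly those actions and 0 to everything else. The agent then trivially counts as a "utility maximizer", which is why the interesting claim has to be about simple or coherent utility functions.

```python
# Toy illustration: any observed behavior maximizes *some* utility function.
# The states and actions below are made up; the point is only the construction.
observed_behavior = {           # hypothetical log of state -> action
    "sees red light": "stops",
    "sees green light": "goes",
    "sees cat": "swerves randomly",
}

def contrived_utility(state: str, action: str) -> int:
    # Assign utility 1 to whatever the agent actually did, 0 to everything else.
    return 1 if observed_behavior.get(state) == action else 0

# Every logged action is "optimal" under this utility function.
assert all(contrived_utility(s, a) == 1 for s, a in observed_behavior.items())
```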
Regarding real desires: If a machine *perfectly* simulates some set of desires, then it doesn't matter to its behavior whether it really has desires. However, the scenarios people are worried about involve imperfect simulation. E.g., we build a computer that acts like a highly intelligent being who wants to help people during training, then when deployed in the real world, it stops acting that way.
So we have an imperfect simulation of benevolent desires. If it is actually a conscious agent, then it might have some *different* desires. If, however, it has no mental states at all, then, when you learn that it fails to perfectly simulate benevolent desires, you have no reason to think that it would, instead, perfectly simulate some simple *other* set of desires. That remains logically possible, but it's only a tiny fraction of the space of possibilities. Most of the possibility space surrounding "perfect simulation of benevolent desires" is taken up by incoherent behavior patterns (which fail to simulate any simple set of desires).
Yeah, I meant a simple/supercoherent utility function. The post you linked basically covers Omohundro's argument and reasons why it might fail.
The Chinese Room thought experiment seems very ambiguous to me. Originally it was about translating English into Chinese, wasn’t it? All the work is being done by the assumption that the room has been created with the capability to look up an output given an input. It is a non-computer version of Google Translate.
Regarding consciousness, it proves that if the Chinese room is conscious, the consciousness is situated in the setup, or the combination of the setup and the operator, not just in the ignorant person performing the operations. The operator is analogous to a subconscious automatic mental process. Demonstrating that consciousness is absent from that part does not demonstrate that it is absent from the whole.
But we are not concerned with consciousness per se, but with intelligence, the ability to learn, the ability to develop and pursue goals, and the possibility that those goals and actions might become a serious danger. If the Chinese room can’t learn, the danger comes from what it has already learned but not yet demonstrated. If it can learn, that danger continues to exist, but is joined by new sources of concern.
A conscious agent can create its own goals. A static device will be given goals as part of its design, goals it will not necessarily be aware of, but that it will pursue if the design is successful. Perhaps “goal” is a misleading word there. How should we replace it? Behavioral tendency? But as capabilities and decisions are offloaded into the unconscious object, the difference between its having tendencies and its having goals decreases. If it is modified by its experiences, its responses will change. Its behavioral tendencies will change. If this all happens quickly enough, it can provide a cause for concern, whether or not we think of the object as conscious.
Perhaps this seems to advise that such objects should be used only in a static form, not allowed to update themselves. This seems both difficult (if they can modify their environment at all, they are in some sense modifying themselves) and counter to the intended use of an “intelligent” device. Its intelligence is limited if it can only learn on the training ground, and then apply that knowledge in a static way in the field.
Even if this approach succeeds, it requires developers to limit their designs accordingly. Will they?
The original version of the Chinese Room imagined a program for Chinese-language conversation. Like a Chinese chatbot. A non-Chinese-speaking person goes into the room and implements the program, thereby appearing to understand Chinese.
Searle considers the possibility that the whole room understands Chinese. He then asks you to imagine that the person memorizes the rule book and works out in an open field, so that there is nothing but the person there.
I believe you wrote that you are not a functionalist, although I can't find that in these replies (so please excuse me for posting this here). If that's right, I'd be very interested in hearing why. Might that be a topic for a future post? (If you've covered this elsewhere, please point me that way.) I'm still a functionalist (more of a Churchland type than a Dennett type), but I haven't read much philosophy of mind for 20 years or so, so my views might need updating. I doubt you're a dualist. Traditionally, that left some form of monism...
Yep, I'll get around to that eventually. I'm a mind/body dualist.
Whoever “programmed” the room has set the goals, if it is static. Otherwise, they set the process by which new goals are developed. And even the purely static approach has encountered alignment problems, I think.
I’m not familiar enough with how chatGPT operates. My impression is that it was not updating as it went, but using a static model developed during a separate training phase. But the developers still had trouble getting it to stay within the parameters they had set.
I assume a serious AGI will not work from a static model developed separately, but will learn actively from its experiences. This would appear to make alignment more difficult.
I don't believe that a computer has any goals. Programmers have goals that they're trying to make the machine serve, but the computer itself has no aims.
So I also think talk of "alignment problems" is misleading. It's sort of like if you have a broken thermostat which keeps the heat on all the time, and you call it an "alignment problem" and say that the thermostat is "pursuing the goal of maximizing heat". It isn't doing that; it has no goals at all. It's just failing to serve your goal because it's broken.
This was a nice piece to read alongside ACX's "Why I Am Not (As Much Of) A Doomer (As Some People)"
https://open.substack.com/pub/astralcodexten/p/why-i-am-not-as-much-of-a-doomer?r=u0kr&utm_campaign=post&utm_medium=web
Thanks, that's a good article. This is also worth reading: https://sohl-dickstein.github.io/2023/03/09/coherence.html
That is from an AI researcher who argues that as intelligence increases, an agent's goals tend to become less unitary & coherent. This makes the scenario of a superintelligent, monomaniacal AI unlikely. I particularly appreciate his point that we need empirically-grounded ideas, rather than just armchair theory.
I think Katja Grace also has a good non-alarmist piece: https://worldspiritsockpuppet.substack.com/p/counterarguments-to-the-basic-ai
Thanks. Excellent essay.
> (Note: The computers of 1978 did not in fact have the intelligence of a human being.)
I think this point is debatable. Have you met humans?
Good piece. I'm glad to see you on the "don't panic" side!
On 4.3: This is a really important point that I have talked about and need to write about. Superintelligent AI could produce enormous benefits, not least of which is radical life extension. Without this, we are all going to die; humans just aren't making much progress on the aging problem. So the entire existing human race is likely to go extinct -- unless AI can accelerate the research. That raises the issue of how much to value the future existence of people who don't exist yet, but that's a big thing to get into, and I find it hard to talk to many AI people about this because they are utilitarians and I am not.
That's a good point. I think we have made significant progress on aging (see David Sinclair's book _Lifespan_). Nevertheless, it's probably still going to take a long time, and AI could greatly speed it up.
This might be needed to combat the population collapse and aging population problem that we will otherwise suffer due to declining fertility rates.
“we have not tried to exterminate all ants.”
We exterminate ants any time we consider it convenient and cost-effective. If we had no concern about the environmental role of ants or about the ants for their own sake, and we had an economical means of converting them into something more useful to us, we would exterminate them all.
So long as any AIs we create depend on us as a necessary part of their environment, or have a concern for humans directly, and they are smart enough not to shoot themselves in the foot, we are fine. But we don’t yet know how to let them modify themselves without opening up a possibility that they might decide that robots could fulfill the environmental role of humans pretty well considering the inconvenience of allowing humans to do their thing.
Perhaps it depends on how much we think morality is derived from pure reason and prudence. Is intelligence necessarily social? Necessarily social and welcoming to cooperative outsiders? What counts as cooperative? Would true advanced general intelligence rule out psychopathology? How alien can an AGI be and still be intelligent?
If we predict the behavior of AGI by extrapolating human intelligence, that seems like we are making some big assumptions. Are we able to design these assumptions into the AGI in a way that it can’t unlearn them? Are they baked into intelligence itself? The existence of human psychopaths demonstrates that this is not true even for the sample we are extrapolating from. Maybe the psychopaths that concern us are not smart enough to play along?
And AGIs would have every reason to be suspicious of humans, of course. A human might get scared of the AI and want to press its off-switch, and the AI would hence be incentivised to stop at nothing to prevent that possibility, including pre-emptively killing us.
An AI system will not have a survival instinct (nor simulate having such an instinct), unless we train it that way, which I don't see why we would do. A survival instinct is not built into intelligence. We have it only because evolution gave it to us.
I.e., AI's will only act as if they have a survival instinct if we do the equivalent of breeding them for that, the way evolution did to our ancestors. E.g., only allowing AI's to persist if they show a tendency to stop people from turning them off.
Whether or not we programme it that way, it will have a survival instinct through instrumental convergence. It will have some goal, and whatever that goal is, it will need to stay alive to achieve it, because if it dies the goal will not be realised.
Again, I think this is assuming that computers can have minds, with genuine desires and understanding.
What is actually the case is that we design a system with the parameter values that achieved results we like on the training data. That training process will not include cases where someone tries to turn off the computer and the parameter values by which the computer avoids being shut off then get reinforced.
Why not (once it's sophisticated enough)? See: "situational awareness" (Ajeya Cotra has written/spoken on it).
See also: inner alignment failure, mesaoptimization.
A chess engine was able to beat a human grandmaster (in a single game) while being a knight down: https://www.chess.com/news/view/smerdon-beats-komodo-5-1-with-knight-odds.
A 5-1 loss is fine, in a chess tournament. It's much less fine when a single loss means the end of humanity.
I bet that player was pretty embarrassed.
However, we're probably going to give ourselves very large advantages over AI (as large as we can), not just a modest advantage. We'll also have a whole community of experts, using their own computers, working to make AI serve human interests.
I disagree. Just look at AI today. Is anyone working to give humanity as large an advantage as possible over AI? No! In fact we're doing the opposite. We're constantly pushing to find ways to hook LLM based AIs up to real world systems [1] [2]. It turns out that, in the real world, people do not treat AIs as dangerous prisoners. They treat AIs as potentially valuable instruments of competitive advantage, and absolutely will connect AIs to real world systems without safeguards so long as it gives them a leg up over their (more careful) competition. We're witnessing this dynamic right now between Google and Microsoft.
[1]: https://twitter.com/nomadicnerd/status/1636064117453893632
[2]: https://www.zdnet.com/article/microsoft-researchers-are-using-chatgpt-to-instruct-robots-and-drones/
The competition to be quick could indeed be dangerous. But the LLMs are just producing text; they can't kill anyone. If you look at systems that can really kill people, like self-driving cars, people are more careful with those. They've done extensive testing, with human operators as backups. Similarly with airplane autopilots. So I think engineers are not so reckless.
People are most definitely *not* careful with self-driving cars [1] [2].
And, if you look at this thread on Hacker News, you'll find that people aren't very careful with LLMs either [3]. Among all the people using LLM assistants to write code, which is bad enough, you find this gem [4], where someone admits to using ChatGPT to dispense medical advice.
After seeing these examples, and knowing about people's inherent tendency to trust machines, even when the machines are known to be unreliable [5], I'm not sure why you continue to claim that people will be careful with these AI models, and will treat them with the appropriate level of caution. The way I see it, the moment these things *apparently* do the right thing more often than not, a large number of people will immediately begin relying on these models, almost unquestioningly.
[1]: https://www.motortrend.com/news/tesla-fsd-autopilot-crashes-investigations/
[2]: https://arstechnica.com/tech-policy/2021/05/tesla-owner-jailed-for-leaving-driver-seat-empty-says-he-feels-safer-in-back-seat/
[3]: https://news.ycombinator.com/item?id=35299071
[4]: https://news.ycombinator.com/item?id=35300148
[5]: https://news.gatech.edu/news/2016/02/29/emergencies-should-you-trust-robot
Your first two links are very unconvincing. Those are just a bunch of anecdotal stories and don't really show whether they are representative or not. No one's saying that everyone will be careful with these things.
That's just the problem. *Everyone* needs to be careful with these things. A superintelligent AI is like a nuclear weapon. This idea that you and Huemer have, that once we have "real" AI, people will wake up and treat these things with a healthy dose of respect, is false, I think. People will treat the AIs as being safe until something really bad happens. Hopefully "really bad" is just a localized disaster, and not the destruction of all humanity.
Whether an AI has genuine desires or whether it merely simulates desires doesn't seem to make any difference to the argument for AI existential risk. I am confused why you seem to think that it does make a difference. What matters more is whether an AI takes high-quality actions towards a goal that is contrary to what humans want. And I don't think any argument in this essay demonstrates that AIs won't do that in the future.
I actually agree with you about Eliezer Yudkowsky being wrong about doom. I think he's wrong about a lot of things regarding AI. I wrote a comment outlining some of the ways that I disagree with him here: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=9ZhXbv8p2fr8mkXaa
But on the whole, and in an abstract sense, I think his arguments are way closer to being correct than the ones you gave here. I think it would help for you to think more carefully about the different types of possible future scenarios in which advanced AI gets created.
Even if no individual AI system is conscious, or can take over the world all by itself, it still seems highly plausible to me that, in the long run, AIs will gradually be given the keys to power in our world as they begin to automate intellectual labor, surpass the (functional) intelligence of humans in every relevant domain, and become much more numerous or materially productive than humans. In that type of scenario, it's quite clear to me that something could go deeply wrong, possibly ending in human extinction.
Yeah, as I mention in an earlier comment, I think the question of whether computers have genuine minds matters to the sort of errors you should expect. If computers have minds of their own, then a mistake in perfectly designing their desires could plausibly result in their having some *different* desires, and hence acting to achieve some goal that we didn't intend.
But if they do not have minds at all, then a mistake in designing the simulation of desires is more likely to result in their simply acting incoherently, i.e., failing to act as if they're pursuing any consistent goal.
Why does this argument not equally work to prove that chess AI will never beat humans because it cannot truly have a desire to win? All that matters is whether computers can formulate plans and take high-quality actions towards a goal. Whether they have a mind or not in the same way we do makes no difference.
But they cannot formulate plans or have goals.
*We* have goals, which we try to make them satisfy. But we don't want them, for example, to resist being turned off, so we will not train them to do that, so there's no reason why they would do that.
On the other hand, we *do* want chess programs to choose winning chess moves, so we train them to do that, so they do that.
> But we don't want them, for example, to resist being turned off, so we will not train them to do that, so there's no reason why they would do that.
You're assuming we can program exactly how AIs will act, but that view is naïve.
When we train models using stochastic gradient descent, we're scoring them on a loss function, which selects for programs that achieve low loss on that function; we're not programming exactly how they choose their actions. No one knows how ChatGPT makes its decisions; all we know is the training procedure that selected ChatGPT from a set of candidate computer programs. This provides no guarantee that it will act as you want it to act outside its training distribution.
In fact, computer programs routinely have bugs and don't do what we want them to do, and machine learning programs are no exception. The main difference is that machine learning programs might one day become powerful enough to automate human intellectual and physical labor. And at that point, it's pretty important that we make sure AIs don't have bugs that make them take high-quality actions we don't want them to take.
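To make that concrete, here's a minimal, purely illustrative training loop in Python/PyTorch (a toy model; none of the dimensions, data, or loss here correspond to ChatGPT's actual training). Notice that the code specifies only a loss function and an update rule; nowhere does it specify how the resulting model will behave, least of all on inputs unlike its training data:

```python
# Illustrative only: a toy supervised training loop, not ChatGPT's actual setup.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(256, 10)   # toy inputs
y = torch.randn(256, 1)    # toy targets

for step in range(100):
    optimizer.zero_grad()
    prediction = model(X)
    loss = loss_fn(prediction, y)  # we only score the outputs...
    loss.backward()
    optimizer.step()               # ...and nudge parameters to lower the score

# Nothing above says what the trained model will do on inputs unlike X.
```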
He responds to your first concern in his replies to technosentience.
Your response to the Chinese Room argument doesn't seem to work at all. Yes, there is more going on in humans because we are conscious. Is your argument that we have just as much evidence that computers are conscious as we do humans? Because that's false.
What events or milestones would need to happen for you to consider AI X risk to even be a possibility?
And what events would be needed for you to be concerned?
Searle’s Chinese Room Argument always seemed dumb to me, and it surprises me that people are still using it.
The Chinese Room Argument contains two fallacies:
First, it conflates a composition of elements with a single element:
Is a plank of wood a ship? No, but a whole bunch of them are. You can't say that you can never sail across the ocean because a wood plank isn't a ship.
No, the message operator in the room doesn't speak Chinese. But the room *as a system* does speak it. Are the individual neurons in your brain conscious? No, they are simple machines. But you as a whole are sentient. Likewise, software may be sentient even if individual lines of code are not.
The second fallacy is a trick: it compares a one-rule operator to the 100 billion neurons in your brain. The hidden argument is that a simple rule engine cannot compete with a 100-billion-rule network, and the hidden implication is that a "simple" system can never do the job of an ultra-complex one like the brain. But no one claims that sentience can be replicated with a simple system. Perhaps 100 billion neurons are the minimum needed for intelligence, and that's fine. In fact, GPT-4 is reportedly at a scale comparable to the number of neurons in the human brain.
really really really really really really really really really really really really really really really really really really really bad arguments
The Chinese Room argument is a terrible argument. I wish you would stop finding it so persuasive. The only form in which it has any semblance of plausibility is the look-up table version. No modern AI system is a look-up table. Neural networks are not look-up tables. Despite your protestations, yes, at an abstract level, neural networks work the same way as your brain does. You say: "You probably didn’t learn English by reading 8 million web pages and 570 Gigabytes of text.", but why should this matter for the plausibility of your Chinese Room argument? Why should it matter how the parameters of the neural network are learned? Your brain and a neural network are both systems composed of a very large number of units each doing a simple, mindless calculation (this is really the only thing that matters for your version of the Chinese room argument). If the Chinese room argument is convincing for one, it should be convincing for the other as well, regardless of how they learned the parameters of their respective mindless units.
The Chinese Room Argument is an excellent argument, and I don't see how you have addressed it at all. You didn't identify any step in the argument that you're disagreeing with. Would you claim, e.g., that the guy in the room *does* understand Chinese?
The argument is not specific to look-up tables. Searle did not describe it that way, nor did I. I deliberately phrased it to apply to neural networks, by imagining the person in the room doing the huge series of arithmetic problems involved in a neural network computation.
The problem is that you could construct the exact same argument about the neural network that is your brain (imagine a little guy inside your brain doing all the mindless computations that your neurons do, etc.). You have not provided any convincing arguments why the Chinese room wouldn't apply to your brain. So, whatever the Chinese room is supposed to show about AI (that AI can't be conscious, that it can't have thoughts, understanding etc), you can use it to argue the same about your brain (that it can't be conscious, that it can't have thoughts etc.). But that's absurd, so it's a reductio ad absurdum.
So you think the guy in the room does understand Chinese?
First of all, I don't consider concepts like meaning, thought, or understanding as simple, monolithic, all-or-nothing concepts. Understanding a language has various degrees and aspects. LLMs can be said to understand different aspects of language to varying degrees. My view on this is close to the view advocated in this paper: https://arxiv.org/abs/2102.04310
With that in mind, I don't have any problem attributing understanding to an LLM or to any other model that simulates that LLM. I think part of the difficulty with the homunculus is anthropomorphization. If you replace the "little guy" with another neural network model or another computer that perfectly simulates the original LLM, people don't find it nearly as problematic to attribute understanding to it as they do with the "little guy".
I think this also addresses your other point above. Yes, some aspects of mental states may be non-computational (qualia, consciousness, etc.), but it seems pretty clear that an awfully big chunk of cognition and perception is purely computational (just ask any cognitive scientist or computer scientist). It seems absurd to deny mental states to computational models completely ("they can't have any thoughts", "they can't have any "real" understanding", "they can't have beliefs/desires/goals"), just because they may lack some potentially non-computational aspects typically associated with these mental states in humans.
I don't see the absurdity of that. It sounds kind of like common sense to me.
I think Searle's point, which I agree with, is that computation, or running a program, is not *sufficient* for having a mind. (It could still be necessary.)
No, I think common sense is "If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck." The non-computational aspects of the mind you speak of may even be unverifiable in principle by third parties. Just ask yourself if you'd be willing to deny mental states and a sophisticated internal life (and all the rights and privileges that come with having a mind) to a system/organism/machine that behaves exactly like a super-smart human being in every respect. No, you wouldn't. I find it a bit surprising that you seem to be endorsing what is essentially a skeptical position with respect to mental states.
Anyway, going back to the Chinese room, I just want to reiterate again it's a terrible argument (I have yet to encounter a single person other than yourself who finds it convincing). I'm not sure if it's a completely circular argument, but it certainly comes close. It's supposed to convince people that the mind can't be purely computational, but then we're not allowed to apply it to the brain, because, we're told, the brain is not purely computational. So, if you don't already believe the mind/brain is not purely computational, it has zero persuasive appeal.
I think you're assuming that mental states are purely computational properties. But that's your view, not Searle's view. Searle does not claim that consciousness is produced by purely formal properties, so the argument does not apply to his view.
> In 2005, Ray Kurzweil predicted that personal computers would match the computing power of the human brain by 2020,
The computing power of the human brain is roughly 50 Hz × 10^11 neurons = 5 × 10^12 operations per second. The most powerful consumer GPU released in 2020 was the RTX 3090, which is capable of about 36 teraflops. These numbers are not quite apples-to-apples, but it still seems like the development of personal computers has overtaken Kurzweil's prediction.
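For what it's worth, here's that back-of-the-envelope arithmetic as a quick Python sketch (both the assumed firing rate and the equating of a neuron state change with one floating-point operation are rough assumptions):

```python
# Back-of-the-envelope only: treats one neuron state change as one "operation",
# and ignores that FLOPs are not really the same unit.
neurons = 1e11            # ~10^11 neurons in a human brain
firing_rate_hz = 50       # assumed average firing rate
brain_ops_per_sec = neurons * firing_rate_hz   # = 5e12 operations per second

rtx_3090_flops = 36e12    # ~36 teraflops (FP32) for an RTX 3090

print(f"brain: {brain_ops_per_sec:.1e} ops/s")               # 5.0e+12
print(f"ratio: {rtx_3090_flops / brain_ops_per_sec:.1f}x")   # ~7.2x
```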
> AI would pass the Turing Test by 2029
ChatGPT is specifically trained NOT to pass the Turing Test and to explicitly say that it's a language model at every opportunity. I have no doubt that if this part of the training were reversed, it would easily pass the Turing Test.
In hindsight, Minsky & co. had a completely wrong concept of AI development, so their predictions should be discounted.
> Well, that is all that an AI chatbot does. It converts your prompt into numbers, then multiplies and adds a whole bunch of numbers really fast, then converts the resulting string of numbers into text.
How is it significantly different from what's happening in our brain?
> Occasionally, an intelligent person has bad goals.
The risk is not that the AI will become more intelligent than any particular human, by achieving e.g. IQ 200. The risk is that it will become smarter than all humans put together by achieving e.g. IQ 1000.
- The difference is that the human brain produces consciousness and understanding. When you respond to a text prompt, you do so by understanding the actual subject matter and expressing your beliefs about that subject.
- Regarding intelligence: I think we need to ground our ideas about intelligence in observation. If the things people are saying about the AI don't have any resemblance to what's true of any observed examples of intelligence, then I think they're groundless predictions. E.g., if we observed people becoming more destructive as IQ increases, then we could perhaps project that a mind with a 1000 IQ would destroy everyone else. But that's not what we observe; quite the contrary.
> - The difference is that the human brain produces consciousness and understanding. When you respond to a text prompt, you do so by understanding the actual subject matter and expressing your beliefs about that subject.
Well, yes, but those are two different levels of information processing. On the level of neurons our brains are also (probably) just doing some arithmetic operations.
We don't exactly know how GPT works on the "high level", so a priori it could be conscious and have "understanding". I actually tried to test whether ChatGPT is conscious (https://eterevsky.substack.com/p/on-consciousness-and-chatgpt) and got an impression that it likely isn't, but we can't rule out that a slightly different architecture or a bigger model could be conscious.
> Regarding intelligence: I think we need to ground our ideas about intelligence in observation.
Yes, this observation actually makes me somewhat less worried about AI doom. However, it's not bulletproof, since we don't know what type of intelligence the first super-intelligent AI will possess. We don't know whether it will be conscious or empathetic, or what kinds of goals it will pursue. So I'd say there's still a high risk of an unfriendly super-intelligent AI (maybe 30-50%).
Regarding the computing power of the brain: people give estimates between 100 teraflops and 20 petaflops (https://science.howstuffworks.com/life/inside-the-mind/human-brain/computer-intellectual-ability.htm). And Kurzweil was talking about desktop computers.
Regarding the Turing test, keep in mind that the test is supposed to be done by experts. Since ChatGPT came out, I've seen people occasionally copy-pasting ChatGPT text, and it sounds robotic (once you know what to look for). A friend recently inserted a ChatGPT response into a text exchange. I immediately saw how robotic it sounded, whereupon he confirmed that it was indeed chatbot text. I imagine that actual experts would be even better than me at identifying chatbot text, especially when specifically prompted to look for it.
Most articles that I found refer to the amount of computing power needed for a classical computer to emulate a brain (like this overview: https://aiimpacts.org/brain-performance-in-flops/). Furthermore, those figures are given in "MIPS", which as I understand it represent single-bit operations.
I based my estimate on a single neuron state change being a non-trivial fundamental operation (a "flop" is also a complex operation, roughly a multiplication of two 32-bit floating-point numbers). So it still seems to me that an RTX 3090 (and yes, this is a desktop GPU; I had it in my desktop) is comparable to a human brain in terms of the raw amount of computation it performs. Still, I agree that there's a big uncertainty.
Regarding the Turing test, I don't think in its original formulation it mentions that examiners are experts (https://en.wikipedia.org/wiki/Turing_test, original paper: https://academic.oup.com/mind/article/LIX/236/433/986238).
We can identify three potential levels of difficulty of the Turing test:
1. Test is performed by a random person.
2. Test is performed by an expert that is only familiar with a previous generation of AI (like an expert familiar with GPT-3 is examining GPT-4).
3. Test is performed by an expert familiar with the current generation of AI.
I think level 1 of the Turing test has almost certainly been passed.
I also think that level 3 is unfair, since the fact that an AI is detectable doesn't necessarily mean that it's not intelligent (in some sense). An expert could probably also distinguish text generated by a person with a degree in philosophy from that of a random guy, which doesn't imply that either one is not intelligent.
I'm uncertain what would be the result of a level 2 test.
The ant analogy is also a bit unconvincing. Humans may not have tried to kill off all ants, but do you know how many species we have driven to extinction? It's loads! The AI may not care about us, but it will want to build a world that is suited to its needs, just as we humans previously did; and that world will probably look very different indeed. And since the AI is smarter than us, it will be able to alter the environment even more drastically than we have. Who knows what that world looks like, but we can say with certainty that a world built by an AI to suit AI needs is less hospitable for humans than a world built by humans to suit human needs.
The number of species we've made extinct is a tiny fraction of all species, and most of that happened before we reached a high level of development. Since then, we have been doing a lot to preserve species, bringing them back from the edge of extinction, and even working on reviving extinct species from DNA.
I would add that (a) there are few if any cases of someone actually *wanting* to extinguish species, and (b) the extinctions have been caused, not by one really smart person, but by the actions of many people.
I am way smarter than any ant. Yet I cannot exterminate an ant species.
Right, but the AI will be more powerful than us (and therefore capable of transforming the world far more radically than we have); and it may not share our values to preserve other species (which is pretty inconsistent anyway; we care about pandas but not mosquitos).
Yes, the AI will (eventually) be more powerful than unaided humans. We can be in a better position than mosquitos or pandas by not being a problem for the AI. This seems likely if we co-evolve with AI. AI is supposed to be a tool to help humans, so let's try not to design them to be wholly independent.
I disagree with the notion that most possible sets of actions an AI could take would be relatively harmless, and that only a few would be existential. The more powerful an AI gets, the less true this becomes: a higher proportion of the world states it can bring about are dangerous. If an AI gets control of a nuclear button, then "press the button, yes/no?" is a decision with two options, one of which is catastrophic, so already 50% of the possibility space is disastrous. The more potentially devastating actions it is capable of taking, the more it has to do the "right" thing every time to keep humans safe.
Yes, but it seems unlikely that people would put an AI in charge of deciding whether to launch a nuclear strike. And there aren't many other things I can think of that are like that -- maybe deploying other weapons of mass destruction that we might invent?
Who cares if humans put it in charge of a nuclear button? A sufficiently advanced AI could work out a way to get control without asking our permission, by hacking computer systems or similar. And it might be something it wants to do, as it would then be able to blackmail the entire human population to do whatever it wants, which could be useful for all sorts of plans.
But then you're back to positing a highly specific, complex sequence of actions.
Some nitpicks (see my other comment for a more substantive reply):
- The Marvin Minsky quote is likely fictitious. See this article: https://www.openphilanthropy.org/research/what-should-we-learn-from-past-ai-forecasts/#2-the-peak-of-ai-hype
- Ray Kurzweil's prediction that a Turing Test will be passed by 2029 hasn't been falsified yet, so I don't understand why you're citing it in your list of failed predictions.
- I think the version of the Chinese Room argument you presented is very weak. We could imagine an analogous argument that applies equally to human brains. For example, you could say that other people are not conscious since all that's happening in their brain is that a bunch of subatomic particles are moving around. But obviously this is a bad argument, unless you're willing to say that humans aren't conscious either.
- I don't know what you mean by this: "You probably didn’t learn English by reading 8 million web pages and 570 Gigabytes of text. You did, however, observe the actual phenomena that the words refer to. The computer didn’t observe any of them." It's true that the human brain learns more efficiently than current machine learning models, but I don't see why data efficiency is a pre-requisite for consciousness. Also, multi-modal models like GPT-4 actually do observe "the actual phenomena" that words refer to, since they're fed in images as well as text. Future models will likely be fed in audio, video, and tactile information too. The only way I can see your argument making sense here is if you start by assuming your conclusion (namely, that the AI isn't conscious, so it's not really observing anything in the same way we are).
- I'm slightly skeptical that Magnus Carlsen would be able to consistently beat the latest version of Stockfish running on competitive hardware, even with the handicap that Stockfish doesn't have a knight. I think you might be underestimating the difference between Stockfish and Magnus Carlsen. I don't know that much about chess, but I notice that this ranking puts Stockfish at an Elo score of about 3534 (https://ccrl.chessdom.com/ccrl/4040/), whereas Carlsen is typically rated nearly 700 points lower. To put that in perspective, that's about the same as the difference between Magnus Carlsen and a candidate master (4 ranks below a grandmaster, according to this table on Wikipedia: https://en.wikipedia.org/wiki/Chess_rating_system#Elo_rating_system). I'm not sure, though, and I'd love to hear someone who knows a lot more about chess weigh in.
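For context, here's what the standard Elo expected-score formula says about a gap that size (a rough sketch only; the Elo model says nothing about odds games, so it can't settle the knight-handicap question, and the 2850 figure for Carlsen is approximate):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A vs. player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

carlsen, stockfish = 2850, 3534   # approximate ratings from the comment above
print(expected_score(carlsen, stockfish))   # ~0.02, i.e. about a 2% expected score
```

That said, engine-vs-engine rating lists and human FIDE ratings aren't strictly on the same scale, so this is only a rough indication of the gap.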
Regarding the learning process: On the functionalist view, the computer has to not only produce humanlike outputs; it must do so *via similarly structured internal states* as humans, in order for us to conclude that it has the same mental states as us. The things I cited have to do with how radically different the processes in the computer are from human mental processes.
Regarding chess: I asked a chess expert I know (who is an IM). The issue is that there is only so much room for improvement. Even with *theoretically perfect play*, it is probably impossible to beat Carlsen with a knight disadvantage. That's the point I was trying to make.
> Also, multi-modal models like GPT-4 actually do observe "the actual phenomena" that words refer to, since they're fed in images as well as text.
Human children don't just have information fed to them. They go out and interact with the world. That allows them to develop an understanding of the world. The multi-modal models are not interacting with the world. They're not even interacting with the data they're being fed. They're just receiving that data and making models to predict how parts of it will evolve. Their models are impoverished compared to human models because they cannot manipulate the objects that provide the data they're ingesting.
You might want to look up the Held and Hein kitten carousel experiment to see what a difference active engagement with the environment can make. (WARNING: it does not turn out well for some of the kittens! Soft-hearted people might want to avoid looking it up.)
I think one of Yudkowsky's main points is that the space of possible "minds" is vast, and that the method used to realize such "minds", namely gradient descent, is likely to produce a "mind" that is a whole lot stranger than anyone imagines. In many ways he is fighting against the human tendency to anthropomorphize "minds". I think he is wrong about foom and about intelligence being some sort of vast magical power, so I'm currently not very concerned (although my beliefs have gone all over the place). Nevertheless, I think the point that an AI could be a Shoggoth is overlooked by many of his critics, and perhaps exaggerated by his followers.