You're right that ChatGPT is far from being sentient. However, I don't think that means the methods used to train ChatGPT can't scale up to a genuine understanding of reality.

A neural network trained to guess the next word in a sentence might still end up with an internal model of the world, because such a model helps it predict how a human would complete the given input. Neural networks discovering hidden structure in their data is already a known phenomenon: for example, computer vision models learn to recognize objects' edges. Knowledge about the world helps text prediction on tasks like logical reasoning, so a sufficiently advanced model will likely incorporate it into its predictions.

Unlike with most such learned features, we may even be able to build tools that extract the model's internal truth function. Truth has a very nice property, logical consistency: if the model estimates that proposition X holds with probability p, then proposition not-X should hold with probability 1-p, and so on. Very few naturally occurring features have this structure, so the property can be used to distinguish truth from other features.
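
To make the consistency property concrete, here is a minimal sketch in Python. It is only a toy illustration, not the method from any particular paper: the function names and the probability values are made up, and a real tool would read these probabilities out of a model's internal activations rather than from a hand-written list.

```python
def consistency_loss(p_x: float, p_not_x: float) -> float:
    """Squared violation of the constraint p(X) + p(not-X) = 1."""
    return (p_x - (1.0 - p_not_x)) ** 2


def mean_consistency_loss(pairs) -> float:
    """Average violation over (p(X), p(not-X)) pairs for one candidate feature."""
    return sum(consistency_loss(p, q) for p, q in pairs) / len(pairs)


# Hypothetical readouts of two candidate features on three
# statement/negation pairs (illustrative numbers only).
truth_like = [(0.9, 0.1), (0.2, 0.8), (0.7, 0.3)]      # flips under negation
sentiment_like = [(0.9, 0.8), (0.2, 0.3), (0.7, 0.6)]  # barely changes under negation

print("truth-like feature:    ", mean_consistency_loss(truth_like))
print("sentiment-like feature:", mean_consistency_loss(sentiment_like))
```

The truth-like feature scores close to zero and the other one does not, which is the sense in which consistency can single out truth. Note that a feature stuck at 0.5 everywhere also passes this check trivially, so a practical method would presumably need an extra term that rewards confident answers.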

In fact, this approach of searching for a logically consistent feature has already been tried on existing LLMs, with promising results. You can read a Twitter thread about it here: https://twitter.com/CollinBurns4/status/1600892261633785856, and a full explanation of the paper in this blog post: https://www.alignmentforum.org/posts/L4anhrxjv8j2yRKKp/how-discovering-latent-knowledge-in-language-models-without.
