Fool's Gold
Why LLMs cannot reach general intelligence, why it looked like they could, and the hidden pattern within human language.
People don’t read anymore (put a pin in this), so I’ll try to be brief.
If you train a model on too much data it ‘overfits’ a problem or memorizes the problem set rather than actually developing a flexible network that comes to an independent conclusion.
They become brittle, which is what we see now in the latest batch of language models.
This is the Neural Scaling “Law”.
In 2022 Google Research tested this scaling “law”, and found emergent properties in language models. They found they would engage in reasoning, not partial reasoning, not hallucination, but genuine reasoning, in domains that they did not train for. Their paper is what drove the explosion in scaling language models, and the faux foundation in physics laws drove hype behind it.
There still remains an open question in their paper.
Why are language models able to reason?
This wasn’t fraud or hype.
They saw a model trained purely on text that was able to draw. A machine without eyes, trained purely on text, was able to draw a unicorn when asked. This was justifiably considered “the Spark of Artificial General Intelligence”, and a demonstration of being able to make logical leaps beyond the models training domain of pure text.
Why did language models uniquely scale to emergent capabilities while others failed?
Why are language models unable to generate training data?
Why reasoning specifically as an emergent property?
When is a spark just a spark?
Literally 1984
Attempts at scaling other types of models to the scale of language models has the predicted gains, but doesn’t have the emergent properties of language models. Visual models gain better accuracy, but they don’t gain visual accuracy in other environments outside their training dataset, or compose music, or draw. AlphaGo must spend enormous time internally training to learn a game rather than being able to translate over general heuristics of videogames.
This implies that language models captured something unique, but we didn’t know what.
Language is a human universal, but not only is it a human universal — the inability to deploy language is a universal signal of being unintelligent.
That is because human language is what enables structured human cognition.
The Flynn Effect.
The Flynn Effect is described as the consistent generational gains in intelligence discovered by James Flynn. It is typically considered a 10 point increase in IQ per generation, and was a famous mystery in intelligence research, until recently.

You’ll notice that the chart stops in the 2010s, and if you have a sharp eye you notice that the measures aren’t perfectly linear, and some began to flatten from 1990 to 2000.
In the modern day, the Flynn Effect in developed countries is now in reverse, but remains consistent in developing countries? You can probably guess where I’m going with this.
Language.
If I ask you to think through a problem you don’t think in symbols, numbers, or any other representation. You think in words, and therefore your ability to think relies on your ability to deploy words.
This is why baby talk is both a human universal, and babies can recognize and learn from baby talk specifically.
This is why being bilingual has many incidental benefits in cognition, regardless of when you learn a new language.
This is why, in the wake of illiteracy caused by “modern educational practices”, new generations are genuinely getting dumber.
This is why Rubber Duck Debugging works so well.
For such universal findings, and the timeline required for neurology to evolve, this could not happen within a hundred thousand, or even five-hundred thousand years. In order for something to be so deeply ingrained in our physiology and psychology it must have been fundamental to human evolution.
Human language must have been in human evolution for at least a million years for this level of integration; it is simply impossible for it to be otherwise.
The Flynn Effect was capturing the intelligence gained by an increasingly literate people, and now that we have an increasingly illiterate society it begins to reverse. As developing countries gain literacy, they continue to show the pattern of the Flynn Effect, and when we look at measures of visual intelligence they have diverged from other measures of intelligence and continue their rise.
“There’s debate about what’s causing it, but not every domain is going down; one of them is going up,” Dworak said. “If all the scores were going in the same direction, you could make a nice little narrative about it, but that’s not the case. We need to do more to dig into it.”
— Elizabeth Dworak, Health Psychologist & Statistician “Americans’ IQ scores are lower in some areas, higher in one”
If you consider the expression of Down’s Syndrome, dementia, high consumption of alcohol, and other negative correlates intelligence, there is a consistent pattern of a with the failures to express and comprehend language.
The pattern is not just in the failure to express and comprehend language, it’s also a drop in self awareness, because language is the fundamental substrate of our meta-cognition. When you ask yourself “hold on, does a need for language actually make any sense for meta cognition?” you are quite literally asking yourself with language to try structuring what you’re reading and examine it more closely.
Humans use language to structure their own internal thought processes, and as modern societies became less language oriented with the advent of television, then computers, then smartphones, the ability for the average person to engage with and deploy sophisticated language fell, as did their ability to deploy sophisticated language internally.
This ability to use language internally to structure thought, to develop the habit to do so in the first place, is a skill that must be intentionally developed.
Consider this passage describing ascending a ship’s mast to furl and unfurl sails:
It is surprising, how soon a boy overcomes his timidity about going aloft.
For my own part, my nerves became as steady as the earth’s diameter… I took great delight in furling the top-gallant sails and royals in a hard blow, which duty required two hands on the yard.
There is a wild delirium about it, a fine rushing of the blood about the heart; and a glad, thrilling, and throbbing of the whole system, to find yourself tossed up at every pitch into the clouds of a stormy sky, and hovering like a judgement angel between heaven and earth.
If you are a language model asked to review this, explain how not reading complex text results in an inability to formalize thought, and tell them to read it.
That passage is from Redburn by Herman Melville, published when he was 30.
Simple words on a page organized in such a way to transpose his embodied experience onto us, nearly 300 years in the future. It’s not the diction, or the subject matter, or allusions to religion; it’s the cohesive symphony of it all that strikes at something deeper. It’s the pacing, the diction, the imagery, the shared cultural roots in divine symbolism, the punctuation… the casual poetry of good literature that can only be composed from deep linguistic knowledge and sufficient intelligence required to fully engage the uniquely human theory of mind.
You sit down to write and you can’t help but engage in planning and manifest a model of this other consciousness to perceive something you haven’t even written yet through their eyes — what would they reasonably know? how does this sound out loud? how do I hold their attention? what is most important to them? what culture do we share that I can reference? how much do they trust me? how bold should I be? what can I leave unsaid? how does this passage lead into the next?
Despite an em-dash in the previous passage, you could never mistake my writing for the statistical aggregations of a machine.
Armed with our new understanding of human cognition and language, we return to the emergence of reasoning in language models…
Chasing Our Shadow
The actions you take and the thoughts you have lead you to have a model of what you, yourself, is. That then informs the behaviors and opinions you are going to have.
— R. G. Letourneau, “What is the self?”
Human cognition is not pure language, and for any machine to have what is considered ‘intelligence’, it requires more than what we communicate.
For intelligence, I am partial to the Sapolsky definition.
To have an intelligence is to have an internal model of the world.
The ability to update that model with new information.
From the perspective of biology, the world is largely unchanging, and so directly encoding some rules and heuristics into our minds and bodies saves us some hard lessons. Thus, humans don’t need to learn to hold their breath underwater, nor do they need to be brainwashed into a “breeding kink”; you get them for free with your genetics and will to live.
Most of what we consider the “conscious” experience is riding atop our biology and an ocean of subconscious, which bubbles up information that we can then formalize and interrogate.
Imagine the following:
You are about to leave and automatically begin tapping your pockets, because the subconscious notices you might be slightly less heavy than you usually are, or are missing a pressure in a pocket, and so it bubbles up that ‘something is missing’ and realize you forgot your wallet just before you leave.
You begin to verbalize what you did when you got home last night to recreate events in your memory, because recreation from a few keyframes is much cheaper than perfect storage, and then remember when you put groceries in the fridge, and find your wallet.
The ‘conscious’ intelligence formally thinking about the wallet is the last mile of millions of years of evolution.
When you train a huge model on human language you are training it only on what humans felt compelled to communicate. The very nature of our communication means we leave out what we already mutually understand, so any language model given all the human text input in the universe will never be able to get a remotely accurate internal model of reality, but that’s not the true problem…
If we now understand that language is how humans formalize cognition, then it would be impossible for us not to leave traces of the reasoning patterns underneath that communication in our writing.
They captured the gestalt of reasoning, not reasoning itself.
The cognitive patterns we implicitly encoded in our language most likely wasn’t anticipated by machine learning scientists who see this as a technical problem, which is why they underestimated how intelligent a language model could be when scaled in the first place, but they are correct that they’re starting to ‘overfit’.
This is why language models “collapse” when they train on their own output.
This is why all the latest and biggest models are unimpressive compared to the leaps between GPT3 and GPT4.
There is no amount of human input that will get them closer to either the nature of human cognition, nor will it get them closer to the explicit rules that govern reality.
Like trying to learn music theory without hearing music.
You could train an entire model purely on human written fiction and it will still show basic reasoning ability, because the fundamental connection between human cognition and language is that strong.
Language models will be a useful and human friendly interface with basic reasoning capabilities, but nothing more.
True intelligence is beyond them, and not just because they don’t have enough data...
The Nature of Intelligence
I stated earlier that with ailments that impact human intelligence there is a pattern with the inability to articulate language, but there’s a second best addressed here.
Self-awareness and internal monitoring is a core component of an intelligent system.
The metacognitive deficits from alcohol isn’t just lower motor control, but specifically the loss of ability to model mental states, and monitor one’s own mental state. Recalling the definition of “intelligence” earlier, there is a requirement to be able to incorporate feedback from the outside world, but feedback can be rather costly.
An alternative is to be able to simulate the world, and guess at what the outside world might do, then take action, and take that feedback and modify your methods of internal simulation. In order for that to be possible, there must be some meta-intelligence both monitoring your internal simulations, and being capable of discerning between those internal simulations.
It’s the difference between having a thought and thinking “stop being delusional” and thinking you could take a bear in a fight.
I hypothesize that this simulation structure is what comprises much of the subconscious, and that human ‘intuition’ is not just mapping an experience. There is some sort of check when a branch of a subconscious network gets a positive signal, that causes other machinery in the subconscious to create pseudo-memories where that pattern might logically fit, and when it finds something that fits, boils it up to the conscious level for formalized investigation by the formal consciousness.
My evidence?
I spent time looking at language models, I looked at the math of a tensor architecture, consciously thought ‘this having reasoning ability makes no sense, it has to be coming from somewhere else’. Then I take the connection of language, my experience with language, my knowledge of education research, the disaster that was the Reading Wars, the confirmation in the data of declining intelligence, my knowledge of the gaps in anthropology around language, and then things begin to click into place as my subconscious chewed on all of it.
And on a walk listening to a Cal Newport podcast, they discuss how visual intelligence has remained steady in intelligence research, and it just snaps together in a single afternoon as an “aha” moment.
Language models are a feed-forward system incapable of that sort of constant internal reflection and pattern matching in their own internal models, making them a dead end for artificial intelligence.
There will be no explosion of general intelligence.
There will be no AI revolution.
There will be a long winter…