In the past decade, powerful AI systems have matched or surpassed human levels of performance in a number of tasks such as image and speech recognition, skin cancer classification, breast cancer detection, and highly complex games like Go. These AI breakthroughs have been based on increasingly powerful and inexpensive computing technologies, innovative deep learning (DL) algorithms, and huge amounts of data on almost any subject. More recently, the advent of large language models (LLMs) is taking AI to the next level. And, for many technologists like me, LLMs and their associated chatbots have introduced us to the fascinating world of human language and cognition.
I recently learned the difference between form, communicative intent, meaning, and understanding from “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data,” a 2020 paper by linguistics professors Emily Bender and Alexander Koller. These linguistic concepts helped me understand the authors’ argument that “in contrast to some current hype, meaning cannot be learned from form alone. This means that even large language models such as BERT do not learn meaning; they learn some reflection of meaning into the linguistic form which is very useful in applications.”
A few weeks ago, I came across another interesting paper, “Dissociating Language and Thought in Large Language Models: a Cognitive Perspective,” published in January 2023 by its principal authors, linguist Kyle Mahowald and cognitive neuroscientist Anna Ivanova, together with four additional co-authors. The paper nicely explains how the study of human language, cognition, and neuroscience sheds light on the potential capabilities of LLMs and chatbots. Let me briefly discuss what I learned.
“Today’s large language models (LLMs) routinely generate coherent, grammatical and seemingly meaningful paragraphs of text,” said the paper’s abstract. “This achievement has led to speculation that these networks are — or will soon become — thinking machines, capable of performing tasks that require abstract knowledge and reasoning.” “Here, we review the capabilities of LLMs by considering their performance on two different aspects of language use: formal linguistic competence, which includes knowledge of rules and patterns of a given language, and functional linguistic competence, a host of cognitive abilities required for language understanding and use in the real world.”
The authors point out that there’s a tight relationship between language and thought in humans. When we hear or read a sentence, we typically assume that it was produced by a rational person based on their real-world knowledge, critical thinking, and reasoning abilities. We generally view other people’s statements not just as a reflection of their linguistic skills, but as a window into their mind.
The rise of LLMs has brought to the fore two common fallacies about the language-thought relationship:
- good at language → good at thought: a human or machine that’s good at language must also be good at thinking; and
- bad at thought → bad at language: a model that’s bad at thinking must also be a bad language model.
LLMs like GPT-3 and chatbots like ChatGPT can generate language that rivals human output, which, as per the good at language → good at thought fallacy, has led to claims that LLMs represent a major step towards the development of human-like AI. Given that until very recently our language interactions have only been with other humans, it’s not surprising that we’re now ascribing human-like properties to these novel machine interactions.
But, as we’ve been learning, LLMs are actually not so good at thinking because they lack the world knowledge, common sense, and reasoning abilities of humans. This has led us to assume that, as per the bad at thought → bad at language fallacy, LLMs may not be as good at language as they seem because they don’t really understand what they’re saying the way a human would.
These two fallacies “stem from the conflation of language and thought, and both can be avoided if we distinguish between two kinds of linguistic competence: formal linguistic competence (the knowledge of rules and statistical regularities of language) and functional linguistic competence (the ability to use language in the real world, which often draws on non-linguistic capacities).”
The paper further explains that the distinction between formal and functional linguistic competence comes from our understanding of the functional architecture of the human brain. Research in cognitive science and neuroscience has established that in the human brain “the machinery dedicated to processing language is separate from the machinery responsible for memory, reasoning, and social skills. Armed with this distinction, we evaluate contemporary LLM performance and argue that LLMs have promise as scientific models of one piece of the human cognitive toolbox — formal language processing — but fall short of modeling human thought.” Let me summarize the difference between formal and functional linguistic competence.
Formal linguistic competence
The paper defines formal linguistic competence “as a set of core, specific capacities required to produce and comprehend a given language.” These include knowing a language’s vocabulary, the grammatical rules for forming correct sentences, and the many exceptions to these rules and idiosyncratic language constructions. “The result is the human ability to understand and produce language and to make judgments of the kind of utterances that are acceptable and unacceptable in a language.”
LLMs are already generating coherent, grammatical texts that, in many cases, are indistinguishable from human-generated language. They have demonstrated the ability to acquire linguistic concepts like hierarchical structures and certain abstract categories from their inputs alone. In addition, LLMs have shown substantial value in the study of language learning and processing. “We therefore conclude that LLMs are on track to acquiring formal linguistic competence,” said the authors.
Human language processing, including spoken, written, or signed language comprehension and production, is concentrated in specific brain areas in the frontal and temporal lobes, typically in the left hemisphere. This set of brain language areas is used only for language processing: “it responds robustly and reliably when people listen to, read, or generate sentences,” but not when they’re engaged in non-language cognitive tasks like solving problems, doing arithmetic, listening to music, recognizing facial expressions, or thinking in general.
Functional linguistic competence
The paper defines functional linguistic competence “as non-language-specific cognitive functions that are required when we use language in real-world circumstances.” These include:
- formal reasoning, e.g., problem solving, quantitative thinking, and logical analysis;
- world knowledge, e.g., common knowledge about how things generally work, including facts, concepts, and ideas;
- social cognition, e.g., assumptions about human behavior that we generally share with other people, and understanding the social context of conversations; and
- situation modeling, i.e., tracking the key details of conversations, stories, books, or films that unfold over time.
“Real-life language use requires integrating language into a broader cognitive framework.”
The paper explains that these cognitive functions, while required for real-world language comprehension and production, are not specific to language and are supported by different brain circuits from those required to listen to, read, or generate fluent sentences. That means that models that master the vocabulary and grammatical rules of human language still cannot use language in human-like ways. “In particular, they struggle when engaging in formal reasoning, fail to acquire comprehensive and consistent world knowledge, cannot track objects, relations and events in long inputs, and are unable to generate utterances intentionally or infer communicative intent from linguistic input. In other words, their functional language competence remains in its infancy.”
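To make the distinction a bit more concrete, here is a minimal sketch of how one might probe the two competences side by side. It assumes the Hugging Face transformers library and the small, public GPT-2 checkpoint; the prompts are my own illustrations, not examples from the paper. The first prompt asks only for a fluent continuation (formal competence), while the second asks for a continuation whose correctness depends on simple arithmetic and world knowledge (functional competence).

```python
# Sketch: probing formal vs. functional linguistic competence with a small LLM.
# Assumes the Hugging Face `transformers` library and the public "gpt2" checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Formal probe: any grammatical, fluent continuation counts as success.
formal_probe = "The scientist who discovered the ancient manuscript said that"

# Functional probe: a fluent continuation is not enough; the answer also has
# to be factually and arithmetically right.
functional_probe = "If Alice has 3 apples and gives 2 of them to Bob, Alice now has"

for prompt in (formal_probe, functional_probe):
    # Greedy decoding (do_sample=False) keeps the output deterministic for inspection.
    result = generator(prompt, max_new_tokens=20, do_sample=False)
    print(f"PROMPT: {prompt}")
    print(f"OUTPUT: {result[0]['generated_text']}\n")
```

Nothing in the code guarantees a particular output; the point is that the same text-completion machinery handles both prompts, so fluency on the first prompt tells us little about whether the second one is answered correctly.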
“The many failures of LLMs on non-linguistic tasks do not undermine them as good models of language processing,” wrote the authors in conclusion. “After all, the set of areas that support language processing in the human brain also cannot do math, solve logical problems, or even track the meaning of a story across multiple paragraphs.”
“Finally, to those who are looking to language models as a route to AGI, we suggest that, instead of or in addition to scaling up the size of the models, more promising solutions will come in the form of modular architectures — pre-specified or emergent — that, like the human brain, integrate language processing with additional systems that carry out perception, reasoning, and planning.”