A recent issue of The Economist featured a technology focus on “The breakthrough AI needs” with eight articles on the topic. “Two years after ChatGPT took the world by storm, generative artificial intelligence seems to have hit a roadblock,” said the issue’s lead article. “The energy costs of building and using bigger models are spiralling, and breakthroughs are getting harder. Fortunately, researchers and entrepreneurs are racing for ways around the constraints. Their ingenuity will not just transform AI. It will determine which firms prevail, whether investors win, and which country holds sway over the technology.”
This is frightening for investors who’ve bet big on AI, but there is no reason to panic, said the article. “Plenty of other technologies have faced limits and gone on to prosper thanks to human ingenuity,” it added. “Already, developments in AI are showing how constraints can stimulate creativity.” In particular, The Economist mentions two such major innovations: the development of chips with the special-purpose architectures needed to train and run AI models as quickly and energy-efficiently as possible, and the development of smaller, more specialized, domain-specific models that consume far less energy than the very large models that rely on brute-force computational power.
Let me discuss each of these two innovations.
Special-purpose architectures
“A Cambrian moment,” one of The Economist’s articles, explains the key advances in chip architectures over the past couple of decades. AI has propelled chip architectures toward specialization and a tighter bond with the software they’re designed to run. The article uses the geological Cambrian period, when life on Earth diversified at a remarkable pace, as a metaphor for the diversification that chips have been going through. Let me explain.
Cells, the basic building blocks of all life, first emerged on Earth around 4 billion years ago. Evolution continued to perfect the cell over the next few billion years, giving rise to a variety of single-celled organisms, followed later by multicellular organisms of various types. Then, around 540 million years ago, a dramatic change took place in life on Earth: the Cambrian Explosion.
Using the now-perfected cells as building blocks, evolution took off in a very different direction. Over the next 50 million years or so, evolution accelerated, ushering in a diverse set of organisms far larger and more complex than anything that existed before. By the end of the Cambrian geological period, the diversity and complexity of life began to resemble that of today.
A kind of Cambrian Explosion has been taking place in the world of IT over the past few decades. For the past 50 to 60 years, we’ve been perfecting our microprocessors, memory chips and other digital components based on our ability to cram more transistors into an integrated circuit, leading to dramatic increases in computing performance.
In his legendary 1965 paper, Intel co-founder Gordon Moore first articulated what’s become known as Moore’s Law, the empirical observation that the number of components in integrated circuits had doubled every year since their invention in 1958. Moore predicted that the trend would continue for at least ten years, a prediction he subsequently revised to a doubling every two years. The semi-log graphs associated with Moore’s Law have since become a visual metaphor for the technology revolution unleashed by the exponential improvements of just about all digital components, from processing speeds and storage capacity to networking bandwidth and pixels.
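To get a feel for what doubling every two years means, here is a small back-of-the-envelope sketch in Python. The 1971 starting point, roughly 2,300 transistors on the Intel 4004, is my own illustrative assumption, not a figure from The Economist.

```python
# Rough illustration of Moore's Law: transistor counts doubling every two years.
# Starting point (Intel 4004, ~2,300 transistors in 1971) is an illustrative assumption.
start_year, start_transistors = 1971, 2_300

for year in range(1971, 2031, 10):
    doublings = (year - start_year) / 2          # one doubling every two years
    count = start_transistors * 2 ** doublings
    print(f"{year}: ~{count:,.0f} transistors")
```

Run as written, the sketch lands in the tens of billions of transistors by the early 2020s, roughly where leading chips actually are, which is what makes those semi-log graphs such a compelling visual.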
For decades, central processing units (CPUs) were the basic components of general-purpose computers capable of running any software: from operating systems, to middleware like compilers, database systems and browsers, to a wide variety of applications. General-purpose CPUs are arguably the equivalent of cells, the basic building blocks of life in biology. Up to roughly the 2000s, IT hardware and software companies were content with the exponential processing gains that chipmakers delivered to CPUs every few years. Moore’s Law has had a very impressive run, but, as with all good things, especially those based on exponential improvements, it could not last: the long-anticipated slowdown of semiconductor advances finally arrived in the 2000s.
Around the same time, AI applications based on machine learning models started to take off. Machine learning algorithms are based on artificial neural networks, a highly specialized architecture inspired by the structure of the human brain, made up of layers of simulated nodes, or artificial neurons, that perform additions and multiplications in parallel across very large numbers of nodes.
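As a concrete, deliberately minimal illustration of those additions and multiplications, here is a sketch of a single neural-network layer in Python with NumPy; the layer sizes and random weights are placeholders of my own, not anything from the article.

```python
import numpy as np

# A single neural-network layer: each artificial neuron computes a weighted sum
# of its inputs (multiplications and additions), adds a bias, and applies a
# simple nonlinearity. Sizes and weights here are arbitrary placeholders.
rng = np.random.default_rng(0)
inputs = rng.normal(size=512)               # activations from the previous layer
weights = rng.normal(size=(256, 512))       # one row of weights per neuron
biases = rng.normal(size=256)

outputs = np.maximum(weights @ inputs + biases, 0.0)   # ReLU nonlinearity
print(outputs.shape)                        # (256,) -- one value per neuron
```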
General-purpose CPUs were not designed to support the large-scale parallel processing of the simple arithmetic operations used by neural networks. Specialized AI accelerators with many cores were required to process large neural network algorithms in parallel. Graphics processing units (GPUs), originally designed to speed up the demanding graphics of image processing and video games, turned out to be well suited for processing neural networks, outperforming general-purpose CPUs by orders of magnitude.
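To see why hardware that performs many of these simple operations at once matters, compare an explicit Python loop over neurons with a single vectorized matrix multiply; NumPy hands the latter to optimized parallel routines, and on a GPU the same operation would be spread across thousands of cores. This timing sketch is my own illustration, and the exact gap will depend on the machine it runs on.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.normal(size=4096)
weights = rng.normal(size=(4096, 4096))

# Neuron-by-neuron loop: one weighted sum at a time.
t0 = time.perf_counter()
slow = np.array([w.dot(inputs) for w in weights])
t1 = time.perf_counter()

# The same arithmetic expressed as one matrix multiply, done in parallel.
fast = weights @ inputs
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.4f}s  matmul: {t2 - t1:.4f}s")
print("same result:", np.allclose(slow, fast))
```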
The 2010s saw the development of increasingly impressive AI applications based on multi-layered deep learning neural networks, in which each layer’s outputs are passed on to the next layer only if they clear a certain activation threshold. Deep learning networks can be a few layers deep or more than a hundred. By the late 2010s, leading-edge AI applications required far more processing power than previous applications had just a few years earlier.
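A deep network is simply many such layers chained together. The sketch below stacks a handful of layers, with the ReLU playing the role of the threshold that decides what gets passed on; the depth and widths are arbitrary illustrative choices, not taken from any particular model.

```python
import numpy as np

rng = np.random.default_rng(1)
layer_widths = [784, 512, 256, 128, 10]      # arbitrary example: a few layers deep

# Random placeholder weights; a real model would learn these during training.
layers = [(rng.normal(size=(n_out, n_in)) * 0.05, np.zeros(n_out))
          for n_in, n_out in zip(layer_widths[:-1], layer_widths[1:])]

x = rng.normal(size=layer_widths[0])         # an input, e.g. a flattened image
for weights, biases in layers:
    x = np.maximum(weights @ x + biases, 0.0)  # only values above zero pass on

print(x.shape)                               # (10,) -- the network's final output
```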
GPUs have become necessary, but not sufficient. More recently, foundation models and generative AI applications have become much bigger, and the volume of data they process has increased exponentially, giving rise to a memory-access bottleneck. Addressing these memory bottlenecks required taking the parallel approach of GPUs one step further, a step Google took by developing a chip specifically designed for large neural networks: the tensor processing unit (TPU).
The TPU “contains thousands of multiply-and-add units directly connected in a giant grid. The TPU loads the data from external memory into its grid, where it flows through in regular waves, similar to how a heart pumps blood. After each multiplication the results are passed to the next unit. By reusing data from previous steps, the TPU reduces the need to access the off-chip memory. TPUs are a type of ‘domain-specific architecture’ (DSA): processors that are hard-wired for one purpose. DSAs designed for AI algorithms are typically faster and more energy-efficient than generalist CPUs or even GPUs.”
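The key idea in that description, reusing data on the chip instead of repeatedly going back to external memory, can be hinted at in software with a blocked, or tiled, matrix multiply: each tile is loaded once and reused for many multiply-add operations. This is only a loose software analogy of my own, not how a TPU’s grid of multiply-and-add units is actually programmed.

```python
import numpy as np

def tiled_matmul(a, b, tile=64):
    """Blocked matrix multiply: each tile of `a` and `b` is reused for many
    multiply-adds, a rough software analogy for on-chip data reuse."""
    n = a.shape[0]
    c = np.zeros((n, n))
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                # Work on small blocks that would fit in fast local memory.
                c[i:i+tile, j:j+tile] += a[i:i+tile, k:k+tile] @ b[k:k+tile, j:j+tile]
    return c

a = np.random.default_rng(0).normal(size=(256, 256))
b = np.random.default_rng(1).normal(size=(256, 256))
print(np.allclose(tiled_matmul(a, b), a @ b))   # True
```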
Smaller, more energy-efficient AI models
For much of the transistor’s history, chips not only got faster but used less power. “But that era is over,” notes “The relentless innovation machine,” another article in The Economist’s issue. “Leading-edge AI processors cram more transistors on a single chip or stack multiple ‘chiplets’ into one package to boost computing oomph. But the performance gains have come at a cost: the energy consumed by a chip has ballooned.” Blackwell, Nvidia’s latest super chip, “runs five times faster than its predecessor, but uses 70% more power in the process.”
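Those two numbers are worth putting side by side: five times the speed for 70% more power still works out to roughly a threefold improvement in performance per watt, a quick back-of-the-envelope calculation sketched below using only the figures quoted above.

```python
# Back-of-the-envelope: performance per watt, relative to the previous generation.
speedup = 5.0          # "runs five times faster than its predecessor"
power_increase = 1.70  # "uses 70% more power"

perf_per_watt_gain = speedup / power_increase
print(f"~{perf_per_watt_gain:.1f}x more work per unit of energy")   # ~2.9x
```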
“Data centres lash hundreds or thousands of these power-hungry chips together to run large artificial-intelligence (AI) models. By some estimates, OpenAI, maker of ChatGPT, guzzled more than 50 gigawatt-hours of electricity to train its latest model. The International Energy Agency calculates that in 2022 data centres consumed 460 terawatt-hours, or almost 2% of global electricity demand. The agency expects this figure to double by 2026.”
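A quick sanity check on those figures: if 460 terawatt-hours is roughly 2% of global electricity demand, total demand is on the order of 23,000 TWh, and a single 50 GWh training run is a small fraction of the data-centre total. The arithmetic below is my own illustration of the quoted numbers, nothing more.

```python
# Putting the quoted energy figures in context (illustrative arithmetic only).
datacentre_twh_2022 = 460            # IEA estimate for data centres in 2022
share_of_global = 0.02               # "almost 2% of global electricity demand"
training_run_gwh = 50                # estimated energy to train one large model

global_demand_twh = datacentre_twh_2022 / share_of_global
print(f"Implied global demand: ~{global_demand_twh:,.0f} TWh")
print(f"One training run: {training_run_gwh / 1000:.3f} TWh "
      f"({training_run_gwh / 1000 / datacentre_twh_2022:.3%} of data-centre use)")
```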
GPUs, TPUs, and other highly specialized AI chips will improve the energy efficiency of AI systems as well as their performance. The next major step will be to identify additional AI functions whose performance and energy efficiency can be significantly improved with domain-specific architectures (DSAs). Over time, very large AI models that rely on brute-force computational power might give way to smaller AI systems, functions, and chips optimized for specific domains.
Once more, we can look to evolution for inspiration. Our brains are amazingly energy-efficient organs. The human brain contains approximately 100 billion neurons and consumes up to 20% of the energy used by our body, more than any other organ. “In computing terms, it can perform the equivalent of an exaflop — a billion-billion (1 followed by 18 zeros) mathematical operations per second — with just 20 watts of power. In comparison, one of the most powerful supercomputers in the world, the Oak Ridge Frontier, has recently demonstrated exaflop computing. But it needs a million times more power — 20 megawatts — to pull off this feat.”
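The gap in that quote can also be expressed as energy per operation: an exaflop at 20 watts versus an exaflop at 20 megawatts is a factor of a million in operations per joule, as the small calculation below shows using the article’s round numbers.

```python
# Energy-efficiency comparison using the article's round numbers.
ops_per_second = 1e18        # one exaflop, for both the brain and Frontier
brain_watts = 20             # ~20 W for the human brain
frontier_watts = 20e6        # ~20 MW for the Frontier supercomputer

brain_ops_per_joule = ops_per_second / brain_watts
frontier_ops_per_joule = ops_per_second / frontier_watts
print(f"Brain:    ~{brain_ops_per_joule:.0e} operations per joule")
print(f"Frontier: ~{frontier_ops_per_joule:.0e} operations per joule")
print(f"Ratio:    ~{brain_ops_per_joule / frontier_ops_per_joule:,.0f}x")
```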
How did the human brain get to be so efficient? Using brain-scanning tools like functional MRI, cognitive scientists and neuroscientists have shown that the functions of the brain, like those involved in language processing, reasoning, and problem solving, are concentrated in distinct regions of the brain. Over tens of millions of years, going back to early primates, our brains have evolved as a complex system of specialized functions working together to enable us to survive and reproduce, as required by natural selection.
In the end, pushing AI beyond its current limits with ingenuity rather than brute force is truly the breakthrough that AI needs. “The AI era is still in its infancy, and much remains uncertain.” The next several decades promise to be very exciting as well as challenging.