“Even in its present state, the technology of artificial intelligence raises many concerns as it transitions from research into widespread use,” wrote UC Berkeley professor Stuart Russell in a recent essay, If We Succeed. “These concerns include potential misuses such as cybercrime, surveillance, disinformation, and political manipulation; the exacerbation of inequality and of many forms of bias in society; the creation and deployment of lethal autonomous weapons; and the usurpation of human roles in the economy and in social relationships.”
“My concern here, however, is with the potential consequences of success in creating general-purpose AI: that is, systems capable of quickly learning to perform at a high level in any task environment where humans (or collections of humans) can perform well.” Decades ago, Alan Turing expressed a similar concern. While answering a question during a lecture in 1951, Turing said: “It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. … At some stage therefore we should have to expect the machines to take control.” Over the past decade, a number of public figures like Stephen Hawking and Nick Bostrom have argued that a superintelligent AI could become difficult or impossible for humans to control, and might even pose an existential threat to humanity.
Until recently, we didn’t have to worry about the consequences of such superhuman AIs because AI applications weren’t all that smart and were mostly confined to research labs. But powerful AI systems have now matched or surpassed human levels of performance in a number of application-specific tasks, such as image and speech recognition, skin cancer classification, breast cancer detection, and complex games like Jeopardy! and Go. While these breakthroughs, generally referred to as soft or narrow AI, have mastered tasks once viewed as the exclusive domain of humans, AI still lacks the deeper, general-purpose human intelligence that’s long been measured in IQ tests.
“General-purpose AI has been the long-term goal of the field since its inception,” noted Russell. “Given the huge levels of investment in AI research and development and the influx of talented researchers into the field, it is reasonable to suppose that fundamental advances will continue to occur as we find new applications for which existing techniques and concepts are inadequate. … The potential benefits of general-purpose AI would be far greater than those of a collection of narrow, application-specific AI systems. For this reason, the prospect of creating general-purpose AI is driving massive investments and geopolitical rivalries.”
Russell isn’t worried that increasingly powerful AI systems will spontaneously become conscious and decide to hurt or eliminate humans sometime in the future. His concern is with the potential damage that could be inflicted by a powerful AI that humans are unable to control, not because it’s malicious, but because it’s been poorly developed and inadequately tested. We’ve seen the disasters that poorly developed and managed technologies can cause, such as the nuclear accidents at Chernobyl and Fukushima.
Russell’s concerns about AI and his proposals for overcoming them are the subject of his 2019 book Human Compatible: Artificial Intelligence and the Problem of Control, and are nicely summarized in this NY Times op-ed and in this excellent Stanford seminar, which I recently attended.
According to long-held notions of human behavior, a rational action is one that can be expected to achieve one’s objectives. When AI emerged in the 1950s, researchers borrowed these notions of rational behavior in humans to define machine intelligence: a machine is intelligent to the extent that its actions can be expected to achieve its objectives. Russell calls this the standard model in AI.
However, unlike humans, machines have no objectives of their own, so it’s up to us not only to create the machines but also to define the objectives we want them to achieve. “The more intelligent the machine, the more likely it is to complete that objective.”
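To make the standard model concrete, here is a minimal sketch in Python. It is my own toy illustration, not anything from Russell’s book: an agent that simply maximizes a fixed, human-written objective, and is therefore indifferent to anything the objective fails to mention.

```python
def standard_model_agent(actions, objective):
    """The standard model: pick whichever action scores highest on a fixed objective."""
    return max(actions, key=objective)

# Hypothetical outcomes for two ways of fetching coffee. Only "coffee"
# appears in the objective; the vase is invisible to the machine.
outcomes = {
    "fetch coffee carefully":   {"coffee": 1, "vase_broken": 0},
    "sprint through the house": {"coffee": 1, "vase_broken": 1},
}

def objective(action):
    # The objective we actually wrote down: count cups of coffee, nothing else.
    return outcomes[action]["coffee"]

print(standard_model_agent(outcomes, objective))
# Both actions score identically, so the agent has no reason to prefer the
# careful one: "don't break the vase" was never specified completely and correctly.
```

The point of the toy example is that the agent’s competence isn’t the problem; the problem is that it pursues exactly, and only, what we managed to write down.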
“Unfortunately, this standard model is a mistake,” said Russell. “It makes no sense to design machines that are beneficial to us only if we write down our objectives completely and correctly.” What happens if the machines are pursuing the fixed objectives that we gave them, but those objectives are misaligned with human benefit? There’s no need to assume that the machines are out of control due to some emergent consciousness that’s spontaneously generated its own objectives. “All that is needed to assure catastrophe is a highly competent machine combined with humans who have an imperfect ability to specify human preferences completely and correctly. This is why, when a genie has granted us three wishes, our third wish is always to undo the first two wishes.” The consequences of unleashing forces we inadequately understand were famously depicted in The Sorcerer’s Apprentice segment of the classic 1940 film Fantasia.
“The standard model, then, despite all its achievements, is a mistake. The mistake comes from transferring a perfectly reasonable definition of intelligence from humans to machines. It is not rational for humans to deploy machines that pursue fixed objectives when there is a significant possibility that those objectives diverge from our own.” Such a future is almost inevitable because there’s little chance that we can specify our objectives completely and correctly. “Indeed, we may lose control altogether, as machines take preemptive steps to ensure that the stated objective is achieved.”
Instead, we must design AI systems that are beneficial to humans, not just smart. Rather than developing machines to achieve their own objectives, we should embrace a different AI model: machines are beneficial to the extent that their actions can be expected to achieve our objectives. This means that machines will necessarily be uncertain about our objectives, while being obliged to pursue them on our behalf. Such a change might seem small, but it’s crucial.
“This new model for AI, with its emphasis on uncertainty about objectives, entails a binary coupling between machines and humans that gives it a flavor quite different from the unary standard model of decoupled machines pursuing fixed objectives.”
“Uncertainty about objectives might sound counterproductive, but it is actually an essential feature of safe intelligent systems. It implies that no matter how intelligent they become, machines will always defer to humans. They will ask permission when appropriate, they will accept correction, and, most important, they will allow themselves to be switched off - precisely because they want to avoid doing whatever it is that would give humans a reason to switch them off. Once the focus shifts from building machines that are intelligent to ones that are beneficial, controlling them will become a far easier feat.”
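To see why uncertainty makes deference rational, here is a second toy sketch, again with my own illustrative numbers rather than Russell’s formal analysis: a robot that holds a probability distribution over what the human actually wants, and compares acting on its best guess with deferring to the human, which includes allowing itself to be corrected or switched off.

```python
def expected_value(belief, payoffs):
    """Average payoff of a choice over the robot's belief about the human's true goal."""
    return sum(prob * payoffs[goal] for goal, prob in belief.items())

# Hypothetical payoffs, judged from the human's point of view.
# "act": the robot commits to its best-guess action.
act_payoffs   = {"wants_task_done": 10, "wants_something_else": -10}
# "defer": the robot pauses so the human can correct it or switch it off.
defer_payoffs = {"wants_task_done": 8,  "wants_something_else": 0}

for p_task in (0.95, 0.6):
    belief = {"wants_task_done": p_task, "wants_something_else": 1 - p_task}
    act   = expected_value(belief, act_payoffs)
    defer = expected_value(belief, defer_payoffs)
    choice = "act" if act > defer else "defer to the human"
    print(f"P(task)={p_task}: act={act:.1f}, defer={defer:.1f} -> {choice}")

# A confident robot (0.95) acts; an uncertain robot (0.6) defers, because
# by its own estimate of what the human wants, letting the human intervene
# has higher expected value than pressing ahead.
```

The numbers are arbitrary, but the structure captures the argument: a machine that is genuinely uncertain about our preferences has a built-in reason to ask, to accept correction, and to leave the off switch alone.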
Putting such a model into practice will require a great deal of research in the coming decades, said Russell in conclusion. “This won’t be easy. But it’s clear that this model must be in place before the abilities of A.I. systems exceed those of humans in the areas that matter. If we manage to do that, the result will be a new relationship between humans and machines, one that I hope will enable us to navigate the next few decades successfully.”