“AI systems don’t operate like traditional software – they require distinct development processes and rely on specialized and costly resources currently pooled in the hands of a few large tech companies,” wrote David Gray Widder, Meredith Whittaker, and Sarah Myers West in a recent research paper, “Open (for Business): Big Tech, Concentrated Power, and the Political Economy of Open AI” (Open AI). “Even so, many of the original promises of the open source movement, which were made in reference to open source software, are now being projected onto ‘open’ AI. From the promise that open source could democratize software development, to the view that many eyes on open code could ensure it did what it said it did and was free of vulnerabilities, to the perspective that open source levels the playing field, allowing the most innovative to triumph. Open source software did many of these things, to varying degrees. But ‘open’ AI is a different story.”
I’ve been personally involved with open source, open standards, and related initiatives for the past few decades. In the mid-to-late 1990s, I led IBM’s Internet Division, and let’s remember that a major part of the explosive growth of the internet and World Wide Web was due to the widespread availability of open source implementations of their key open standards. In the early 2000s I led IBM’s Linux initiative, which played a leading role in organizing a consortium of companies to support the development of Linux; that consortium later became the Linux Foundation (LF). And, for the past two years, I’ve been a member of the Advisory Board of the Linux Foundation Research division.
In trying to understand how the original promises of the open source movement now apply to this new generation of AI systems, I found the Open AI paper quite helpful. Let me summarize some of the key points discussed in the paper.
What is (and is not) open about Open Source AI
The paper notes that openness in AI is a hard concept to define, in part because AI itself is not clearly defined, and neither is what ‘open’ means when dealing with highly complex systems like AI. “Indeed, there is currently no agreed on definition of ‘open’ or ‘open source’ AI, even as attention to the topic has exploded,” note the authors.
Artificial intelligence first emerged in the mid-1950s as a promising new academic discipline that aimed to develop intelligent machines capable of handling human-like tasks such as natural language processing. AI became one of the most exciting areas of computer science in the 1960s, ’70s, and ’80s, but after years of unfulfilled promises and hype, a so-called AI winter of reduced interest and funding set in, nearly killing the field.
AI was successfully reborn in the 1990s with a totally different paradigm based on analyzing large amounts of data with sophisticated algorithms and powerful computers. This data-centric AI paradigm has continued to advance over the past 20 years with major innovations in a number of areas like big data, predictive analytics, machine learning, and more recently large language models (LLMs) and generative chatbots. AI has now emerged as one of the defining technologies of the 21st century, if not the most important.
“Because large and generative AI systems are those that most clearly perturb the boundaries and question the traditional ideologies of open source and open science, we focus primarily on those systems in this paper,” wrote the authors. “When we use the term AI, we are using it to refer to these large systems.”
“Broadly, the terms ‘open’ and ‘open source’ are used in the context of AI in varying ways to refer to a range of capabilities that can be broadly bucketed as offering attributes of transparency — the ability to access and vet source code, documentation and data; reusability — the ability and licensing needed to allow third parties to reuse source code and/or data; and extensibility — the ability to build on top of extant off-the-shelf models, ‘tuning’ them for one or another specific purpose. While the terms ‘open’ and ‘open source’ are used variably to refer to these attributes, in practice there are gradients of openness that offer vastly differing levels of access.”
“We find that while a handful of maximally open AI systems exist, which offer intentional and extensive transparency, reusability, and extensibility — the resources needed to build AI from scratch, and to deploy large AI systems at scale, remain ‘closed’ — available only to those with significant (almost always corporate) resources.” The paper references “The Gradient of Generative AI Release: Methods and Considerations,” a recent paper that defined six gradients of openness in current generative AI systems.
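To make the extensibility attribute a bit more concrete, here is a minimal sketch of what building on top of an openly released, off-the-shelf model looks like in practice. It assumes the Hugging Face transformers and datasets libraries, uses GPT-2 purely as a stand-in for any openly downloadable model, and the domain corpus file name is hypothetical; it illustrates the general pattern rather than any specific system discussed in the paper.

```python
# Minimal sketch: "tuning" an openly released model for a specific purpose.
# Assumes the Hugging Face transformers/datasets libraries; GPT-2 is only a
# stand-in for any openly downloadable model, and the corpus file is hypothetical.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # any openly downloadable model with a suitable license
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token; reuse EOS
model = AutoModelForCausalLM.from_pretrained(model_name)

# A small domain-specific corpus (hypothetical file) -- the "extension" a
# third party contributes on top of the released model.
dataset = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("tuned-model")  # the tuned artifact others could, in turn, reuse
```

This kind of reuse is only possible when the model weights themselves are released and licensed for it; a model reachable only through a hosted API sits much lower on the gradient of openness, whatever attributes of transparency it may offer.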
The arguments for and against ‘open’ AI
“Openness is currently being referenced by powerful actors in policy discussions in an effort to shape the trajectory of regulatory policy around AI,” note the authors. This is not surprising. “Conversations and understandings about ‘open’ AI have significant and high-stakes implications beyond the narrow terms of open source licenses, transparency mechanisms, or extensibility, and are currently being deployed to shape the AI policy landscape overall. As with any examination of lobbying and influence, the rhetoric around ‘open’ AI policy advocacy must be read in light of the particular interests of the entity making them. And as in the past, openness is today being used by companies as a rhetorical wand to lobby to entrench and bolster their positions.”
IBM’s embrace of Linux in the 2000s was accompanied by similarly serious challenges. Some of the companies that viewed Linux as a competitive threat strongly attacked not only IBM, but also companies that used Linux as well as the overall Linux community. Threats against Linux and the companies that embraced it continued for years, including a multi-billion-dollar lawsuit against IBM.
The Open AI paper discussed some of the claims being made for and against open source AI. Let me discuss two of these claims: one for, safety through transparency; and one against, increased insecurity.
Open AI creates safety through transparency
In the late 1990s I was the industry co-chair of the President’s Information Technology Advisory Committee (PITAC). The use of open source software in highly sensitive systems was still fairly new, so in October of 1999 PITAC convened a Panel on Open Source Software for High End Computing that included representatives from universities, federal agencies, national laboratories, and supercomputing vendors. The Panel held a number of meetings over the next year and released its final report in October of 2000. Our findings are particularly relevant to the current discussion:
“The PITAC believes the open source development model represents a viable strategy for producing high quality software through a mixture of public, private, and academic partnerships. … Open source software may offer potential security advantages over the traditional proprietary development model. Specifically, access by developers to source code allows for a thorough examination that decreases the potential for embedded trap doors and/or Trojan horses. In addition, the open source model increases the number of programmers searching for software bugs and subsequently developing fixes, thus reducing potential areas for exploitation by malicious programmers.”
Similarly, the Open AI paper notes that “open-source AI promotes safety by enabling researchers and authorities to audit model performance, identify risks, and establish mitigations or countermeasures.” However, the authors add that “The effectiveness of auditing as a safety measure is heavily predicated on ensuring that significant resources are available and incentives aligned such that meaningful audits actually take place, and are robust enough to account for the real risks posed by the deployment of AI models.”
Open AI increases insecurity
“Arguments in the opposing direction position open source AI as a source of deep insecurity, by making powerful technology widely available for reuse, potentially placing it in the hands of bad actors.”
“Concerns about insecurity emanating from the proliferation of open source AI models are justified, among other reasons because open source models enable AI to be fine-tuned at small scale without a steep learning curve. What remains unremarked and unclear about this argument, especially given those making it, is why access to the same or similarly powerful models to those obtained through a cloud contract from Microsoft or Google — which is the current standard — poses less danger than reusing an openly released AI model. … A small number of large players does not, of its own accord, ensure safer AI.”
Conclusion
“Even in its more maximal instantiations, in which ‘open’ AI systems provide robust transparency, reusability, and extensibility, such affordances do not, on their own, ensure democratic access to or meaningful competition in AI,” wrote the authors in conclusion. “Nor does openness alone solve the problem of AI oversight and scrutiny. Even so, the rhetoric and promise of openness in AI systems is being leveraged by powerful companies to bolster their positions in the face of growing interest in AI regulation.”
“Policymakers need to approach the task of regulating AI with a clear understanding of the many things AI is, and is not, and with a materially grounded recognition of what ‘open’ AI can, and cannot, deliver. This will produce a vastly different picture of the affordances of ‘open’ AI than that being painted in much of the current rhetoric. It will also require focusing on the significant differences between open source software and ‘open’ AI, and recognizing that the development processes, resource requirements, and inherent centralization of AI mean that it cannot be easily described or defined in terms forged originally to promote and define open source software.”