Irving Wladawsky-Berger

A collection of observations, news and resources on the changing nature of innovation, technology, leadership, and other subjects.


“There is significant interest in the development and application of foundation models for scientific discovery,” notes “Foundation Models for Scientific Discovery and Innovation,” a recent report from the National Academies. “Foundation models possess the capacity to generate outputs or findings and discern patterns within extensive data sets with data volumes that are considered overwhelming for classical modes of inquiry. Efforts are under way to use these models to accelerate various aspects of scientific workflows (including streamlining literature reviews, planning experiments, data analysis, and code development) and generating novel findings and hypotheses that can then spur further research directions. However, significant challenges remain in the effective use of these models in scientific applications, including issues with flawed or limited training data and limited verification, validation, and uncertainty quantification capabilities.”

High performance computing has been a major part of my education and subsequent career. In the late 1960s I was doing atomic and molecular calculations as a PhD physics student at the University of Chicago. Then in the early 1990s, I was the general manager of IBM’s new Scalable POWERparallel (SP) family of parallel supercomputers.

The advances of supercomputers over the past several decades have been remarkable. The machines I used as a graduate student in the 1960s probably had a peak performance of a few million floating point calculations per second (megaflops). Every year since 1993, the TOP500 project has been publishing a list of the 500 most powerful supercomputers in the world. In the latest such list, the fastest supercomputer surpassed 1.8 billion billion floating point calculations per second (1.8 exaflops).
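
To put that six-decade advance in perspective, here is a rough back-of-the-envelope calculation. The 1960s figure of "a few megaflops" is an assumption (3 megaflops is used below); the exascale figure comes from the TOP500 numbers cited above.

```python
# Back-of-the-envelope speedup from a late-1960s machine to a modern
# exascale system. The 1960s figure is an assumed illustrative value.
megaflops_1960s = 3e6        # ~3 million floating point ops/sec (assumed)
exaflops_2020s = 1.8e18      # ~1.8 billion billion ops/sec (TOP500)

speedup = exaflops_2020s / megaflops_1960s
print(f"speedup factor: {speedup:.0e}")  # roughly 11-12 orders of magnitude
```

In other words, today's fastest machine is on the order of a trillion times faster than the systems of the late 1960s.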

AI is now taking high performance computing to a whole new level of capabilities. A September 2023 issue of The Economist, “How AI Can Revolutionize Science,” included a number of articles on the impact of AI on scientific discovery. “Debate about artificial intelligence (AI) tends to focus on its potential dangers: algorithmic bias and discrimination, the mass destruction of jobs and even, some say, the extinction of humanity,” noted the issue’s lead article. “As some observers fret about these dystopian scenarios, however, others are focusing on the potential rewards. AI could, they claim, help humanity solve some of its biggest and thorniest problems. And, they say, AI will do this in a very specific way: by radically accelerating the pace of scientific discovery, especially in areas such as medicine, climate science and green technology.”

The US Department of Energy (DOE) has long been a leader in the applications of the world’s most advanced supercomputers to conduct research on the biggest challenges facing our world, including modeling climate change, designing new kinds of materials, and protecting national security. DOE has some of the world’s most powerful supercomputers in its national laboratories, along with the expertise on how to use them effectively.

To explore how AI foundation models can be used to complement traditional supercomputing for scientific research, DOE asked the National Academies to conduct a study on what’s required and recommend the approaches and investments necessary to support such an expanded DOE mission. “The resulting report recommends that DOE leverage its world-class scientific workforce, unique data, and experimental infrastructure to invest in foundation models development, particularly in areas of strategic importance for DOE’s mission.”

This is a long, highly comprehensive report. Let me summarize the report’s key findings.

Integrating Foundation Models with Traditional Scientific Computing: A Pathway to Accelerated Discovery

“Foundation models can significantly enhance the entire research life cycle at DOE national laboratories and user facilities through multiple avenues of hybridization,” such as:

  • Simulation Acceleration and Enhancement: Foundation models, trained as surrogate models, can emulate computationally expensive physics simulations, allowing for accelerated parameter sweeps, ensemble studies, and real-time forecasting. Applications range from turbulence and fusion modeling to Earth systems science and high-energy physics.
  • Experimental Data Analysis: DOE facilities generate massive data sets across diverse modalities. Multimodal foundation models can interpret these data in real time, performing automated feature extraction, anomaly detection, and pattern recognition. This capability paves the way for “self-driving” experiments that optimize limited facility time and dynamically adjust to emergent results, fundamentally transforming experimental workflows.
  • Knowledge Discovery and Hypothesis Generation: With the scientific literature growing exponentially, AI foundation models — especially LLMs fine-tuned on curated corpora such as DOE’s Office of Scientific and Technical Information repositories — can synthesize findings, identify knowledge gaps, generate novel hypotheses, and suggest experiment designs.
  • Autonomous Laboratories: The fusion of foundation models with robotics and automated platforms unlocks the vision of “self-driving laboratories” that can autonomously design, execute, and interpret experiments. These systems promise to dramatically accelerate research cycles in materials discovery, synthetic biology, and beyond.
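
The surrogate-model idea in the first bullet can be sketched in a few lines. The sketch below is illustrative only: a cheap analytic function stands in for the expensive physics simulation, and a least-squares polynomial stands in for the trained foundation model; the workflow (sample the real simulation, fit a surrogate, sweep cheaply) is the part that carries over.

```python
import numpy as np

# Hypothetical "expensive" physics simulation, stood in for here by a
# cheap analytic function so the example runs instantly.
def expensive_simulation(x):
    return np.sin(3 * x) + 0.5 * x**2

# Step 1: run the real simulation at a modest number of sampled parameters.
x_train = np.linspace(0.0, 2.0, 20)
y_train = expensive_simulation(x_train)

# Step 2: fit a cheap surrogate to the sampled runs (a polynomial here;
# a neural network or foundation model in the report's setting).
coeffs = np.polyfit(x_train, y_train, deg=6)
surrogate = np.poly1d(coeffs)

# Step 3: use the surrogate for a dense parameter sweep that would be
# prohibitively expensive with the full simulation.
x_sweep = np.linspace(0.0, 2.0, 10_000)
y_sweep = surrogate(x_sweep)

# Sanity check: the surrogate tracks the true model on held-out points.
x_test = np.array([0.33, 1.17, 1.84])
max_err = np.max(np.abs(surrogate(x_test) - expensive_simulation(x_test)))
print(f"max surrogate error on held-out points: {max_err:.4f}")
```

The design trade-off is exactly the one the report describes: a few expensive, trusted simulation runs buy a fast approximation that can be queried millions of times for ensembles and real-time forecasting.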
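
The second bullet's real-time anomaly detection can likewise be illustrated with a classical baseline. This is a toy z-score detector on simulated detector readings with injected spikes; it is not the multimodal foundation-model approach the report envisions, only the simplest version of the pattern-recognition task such models would generalize.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated detector readings: mostly nominal, with a few injected spikes
# standing in for the anomalies a model would flag in real time.
readings = rng.normal(loc=10.0, scale=0.5, size=500)
readings[[120, 300, 450]] += 6.0   # injected anomalies

# Simple z-score detector: flag readings far from the bulk of the data.
mean = readings.mean()
std = readings.std()
z = np.abs(readings - mean) / std
anomalies = np.flatnonzero(z > 4.0)
print("flagged indices:", anomalies)   # recovers the three injected spikes
```

In a “self-driving” experiment, a flag like this would feed back into the control loop, triggering a re-measurement or a change in instrument settings rather than just a printout.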

“The Department of Energy (DOE) should study and develop the fusion of artificial intelligence (AI) and human capabilities. At present, AI systems handle the repetitive, manual, or routine tasks, and are starting to show abilities to reason. As AI becomes more capable, deep analysis and strategy recommendations become feasible, but humans should maintain oversight and validation, particularly for qualification and other aspects of DOE’s mission.”

Use Cases for DOE Foundation Models

“Commercial industry has driven rapid progress in developing large language model-based foundation models, yielding a robust ecosystem of tools and capabilities. … DOE can leverage these industry advances and findings as it develops foundation models for science and conducts coordinated DOE-wide assessments to identify appropriate opportunities.”

DOE retains clear strategic advantages in five areas:

  • A world-class scientific workforce in computational science;
  • Access to large-scale, science-focused, and experimental computing hardware;
  • Stewardship of unique experimental facilities and open or classified scientific data;
  • Capability to tackle long-term, high-risk, high-reward scientific problems; and
  • Access to unique scientific data for training future foundation models that may not be easily reproduced.

“The Department of Energy (DOE) should study and develop the fusion of artificial intelligence (AI) and human capabilities. …  As AI becomes more capable, deep analysis and strategy recommendations become feasible, but humans should maintain oversight and validation, particularly for qualification and other aspects of DOE’s mission. … DOE’s mission encompasses many areas including materials science, chemistry, physics, energy, Earth systems, and high-performance computing, to name a few. DOE also supports national security missions such as stewardship of the nation’s nuclear stockpile.”

Strategic Considerations and Directions for DOE Foundation Models

“Many DOE missions demand rapid analysis and decision making under urgent national security or economic constraints. Although the national laboratories hold deep institutional expertise — embedded in their workforce, legacy data sets, and extensive experimental and modeling infrastructure — the sheer scale of the DOE system, characterized by siloed specialized knowledge and the complexity of coordinating a large, distributed workforce, can be misaligned with the agility required for decisive action. Development of foundation models for this purpose poses a unique opportunity to address rapid analysis and decision making.”

“DOE is uniquely positioned to shape the future of AI-driven science. … DOE’s unique strengths, such as its mission-driven work, long-term career paths, and powerful supercomputing infrastructure, can be leveraged to attract talent. Building a strong academic pipeline through closer collaboration with universities is also essential for its long-term success.”

“To increase the success of future foundation models for science, the Department of Energy should invest in large-scale data user facilities (classified and unclassified), leveraged by artificial intelligence’s growing capability to interpret heterogeneous scientific data, similar to the successes experienced with previous investments in supercomputers and open-source scientific computing libraries.”

Foundation Model Challenges

“Applying foundation models within DOE missions presents a multilayered set of scientific and operational challenges,” notes the report in its final section, reminding us that AI foundation models emerged in domains like natural language processing and vision, whose methods do not transfer directly to the strict requirements of DOE applications. These challenges include:

Security and safety considerations

“While AI systems can exceed human performance in many ways, they can also fail in ways a human likely never would. For this reason, it is important to keep humans in the loop for oversight and validation. DOE should prioritize further study of how AI and human capabilities can complement one another and evaluate the capabilities and risks of agentic AI systems.”

“To address potential security risks and protect against adversarial attacks, processes must be developed and implemented to verify that foundation models are reliable, safe, and trustworthy throughout their life cycles. DOE should explore proactive cybersecurity strategies for AI assurance, red-teaming, and the development of countermeasures.”

Improving validation and reproducibility

“AI models are inherently complex, opaque, and increasingly deployed in high-stakes situations. Establishing standards for verification, validation, and uncertainty quantification (VVUQ) for foundation models is critical to ensure that these systems can be used safely and effectively. … DOE should lead the development of VVUQ frameworks tailored to foundation models and prioritize data collection efforts to support reproducible foundation model training and validation.

“While many of the technical challenges associated with foundation models can be addressed through research and development within DOE, deployment at DOE scale will involve external partnerships. DOE should deliberately pursue partnerships with industry and academia to address national mission goals.”
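
One concrete, widely used building block for the uncertainty quantification the report calls for is an ensemble: train many models on resampled data and read the spread of their predictions as an uncertainty estimate. The sketch below uses a toy linear model and bootstrap resampling; it is a minimal illustration of the VVUQ idea, not the tailored frameworks the report recommends DOE develop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy observations of an underlying linear trend y = 2x + 0.3.
x = np.linspace(0.0, 1.0, 40)
y = 2.0 * x + 0.3 + rng.normal(scale=0.1, size=x.size)

# Train an ensemble of models, each on a bootstrap resample of the data.
n_models = 50
x_query = 0.5
predictions = []
for _ in range(n_models):
    idx = rng.integers(0, x.size, size=x.size)   # bootstrap resample
    coeffs = np.polyfit(x[idx], y[idx], deg=1)   # stand-in "model"
    predictions.append(np.polyval(coeffs, x_query))

predictions = np.array(predictions)
mean = predictions.mean()
std = predictions.std()

# The ensemble spread is the uncertainty estimate attached to the prediction.
print(f"prediction at x=0.5: {mean:.3f} +/- {2 * std:.3f}")
```

The same pattern scales up: an ensemble of foundation-model surrogates whose members disagree sharply on an input is signaling exactly the kind of low-confidence regime where a human, or the full physics simulation, should take over.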

Investing in infrastructure and workforce

“To construct specialized foundation models and fully realize their power, DOE needs to modernize existing infrastructure and invest in new infrastructure to generate, curate, and facilitate the large data corpus necessary for scientific foundation models. DOE should also invest in large-scale data user facilities (classified and unclassified) that leverage AI’s growing capability to interpret heterogeneous scientific data.”

“To cultivate and maintain a top-tier workforce, DOE needs to design leadership-scale scientific research programs in machine learning and provide staff opportunities to rapidly adapt to a quickly evolving technological landscape. While it is difficult to compete with industry for AI talent, DOE can leverage its unique access to state-of-the-art science and mission-critical applications to attract early-career scientists.”

