UMD Researchers Awarded $7M to Integrate AI into High-Performance Computing
ChatGPT and other large language models (LLMs) are increasingly helping coders and software engineers streamline their workflows, and Facebook parent company Meta even reports nearly all its developers use internal LLMs to enhance productivity.
While advanced artificial intelligence (AI) has rapidly progressed, it falls short in creating parallel code for software used in high-performance computing (HPC), often called supercomputing. Instead of a single machine carrying out a string of instructions, HPC involves executing complex parallel programs and processing massive datasets across hundreds or thousands of computing cores all at once—and this is where large language models can get tongue-tied.
Supported by a new $7 million multi-institutional award from the U.S. Department of Energy (DOE), two University of Maryland researchers—Abhinav Bhatele, an associate professor of computer science, and Tom Goldstein, a professor of computer science—are working to address this issue, collaborating with federal scientists and others to develop an AI-assisted HPC software ecosystem.
The project will take advantage of supercomputers at the Lawrence Livermore National Laboratory in California and Oak Ridge National Laboratory in Tennessee to test the team’s novel software, in hopes of increasing the productivity of LLM-assisted software development for these massive computing clusters by at least tenfold.
While it’s relatively straightforward to write code for one computer, writing code that needs to run on 1,000 computers simultaneously is a feat, said Goldstein, director of the UMD Center for Machine Learning.
“All processors in the system must work in unison—timing their computation and communication in a sort of symphony—to ensure everything works,” he said.
Existing LLMs are good helpers for developers creating simpler programs, but they can’t handle more complicated tasks.
“If you ask an LLM to write code for a sequential program, one whose instructions run one after the other, it’ll do just fine,” Bhatele said. “But if you ask it to write code for parallel programs, that’s where LLMs can fail, creating code that’s confusing or that just doesn’t work.”
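To illustrate the distinction Bhatele draws, here is a minimal sketch (not the project’s software) contrasting a sequential computation with a parallel version of the same task. The `square`, `sequential_sum_of_squares`, and `parallel_sum_of_squares` names are invented for this example; the point is that the parallel version must coordinate multiple workers and combine their results, which is exactly the kind of structure LLMs often get wrong at HPC scale.

```python
from multiprocessing import Pool

def square(x):
    return x * x

def sequential_sum_of_squares(values):
    # Sequential: each square is computed one after the other
    # on a single core.
    return sum(square(v) for v in values)

def parallel_sum_of_squares(values, workers=4):
    # Parallel: the squares are computed across worker processes
    # and then combined. Distributing the work and merging the
    # results correctly is what makes parallel code harder to write.
    with Pool(processes=workers) as pool:
        return sum(pool.map(square, values))

if __name__ == "__main__":
    data = list(range(1000))
    # Both versions must agree on the answer.
    print(sequential_sum_of_squares(data) == parallel_sum_of_squares(data))
```

At supercomputer scale the same idea plays out across thousands of nodes communicating over a network, where timing and data movement matter far more than in this toy example.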
Goldstein’s research aims to address these shortcomings by helping LLMs handle massive amounts of information. One approach is to train the models on millions of tokens specific to a codebase so that they gain a comprehensive understanding of it; another is to give them a way to retrieve only the most relevant parts of the code, allowing them to use the codebase without having to process it all at once.
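The retrieval idea can be sketched in a few lines. This is a toy bag-of-words ranker, not the team’s method, and the `retrieve` function and sample snippets are invented for illustration: given a question, it scores each code snippet by similarity and hands the LLM only the top match instead of the whole codebase.

```python
from collections import Counter
import math

def bag_of_words(text):
    # Tokenize crudely: lowercase, drop parentheses, split on whitespace.
    return Counter(text.lower().replace("(", " ").replace(")", " ").split())

def cosine_similarity(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, snippets, k=1):
    # Rank snippets by similarity to the query and return the top k,
    # so a model sees only the most relevant parts of a large codebase.
    q = bag_of_words(query)
    ranked = sorted(snippets,
                    key=lambda s: cosine_similarity(q, bag_of_words(s)),
                    reverse=True)
    return ranked[:k]

snippets = [
    "def halo_exchange(grid, comm): ...  # MPI boundary exchange",
    "def load_config(path): ...  # read simulation parameters",
    "def reduce_global_sum(comm, local): ...  # MPI allreduce wrapper",
]
print(retrieve("how does the code exchange halo boundaries with MPI", snippets))
```

Real retrieval systems typically replace the word counts with learned embeddings, but the pipeline — index the codebase, score by similarity, pass only the top hits to the model — is the same shape.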
Meanwhile, Bhatele’s work targets objectives beyond generating correct code, the focus of many researchers: improving the code’s execution speed, quality and even energy efficiency.
Bhatele and Goldstein—who both have appointments in the University of Maryland Institute for Advanced Computer Studies (UMIACS)—will receive approximately $1 million of the DOE funds over three years to support gathering data and experimenting with techniques like generating synthetic code using LLMs to augment real data.
“We’re going to build our own datasets, our own language model tools, and our own retrieval models,” Goldstein said. “So, one of our biggest challenges is that we must cook from scratch. But that’s a challenge that we’re really excited about.”