Sandia Labs: Sandia researcher awarded Early-Career Research Program grant

Working to solve a problem, supercomputing researchers may encounter incomplete data or flawed programs. Sandia researcher Drew Kouri has attracted interest from the broad computing community for his ability to mitigate uncertainty in both supercomputer programs and data, optimizing each to reach the best solutions.

His research was awarded a best-paper designation for 2019 in the journal Optimization Letters and has now earned him a Department of Energy Early-Career Research Program grant titled “Adaptive and Fault-Tolerant Algorithms for Data-Driven Optimization, Design, and Learning.”

The DOE grant, for Advanced Scientific Computing Research, provides about $500,000 per year for five years and is expected to cover Kouri’s salary and research expenses, including the salaries of post-doctoral assistants.

“Maintaining our nation’s brain trust of world-class scientists and researchers is one of DOE’s top priorities — and that means we need to give them the resources they need to succeed early on in their careers,” Secretary of Energy Jennifer M. Granholm said. The grant is one of 83 distributed this year by the 12-year-old program.

Kouri’s optimization algorithms solve complex problems in technical fields that may involve uncertain responses from subcomponents. Among those of interest to him are interactions between ice sheets and sea ice in climate models and between fuel pellets and protective cladding in light-water nuclear reactors. Other applications for Kouri’s optimization algorithms are radio frequency cavity designs for particle accelerators, energy network resource allocation, parameter estimation in seismology and the training of machine-learning models.

Already the joint author of more than 20 papers on the theme of developing algorithms for risk minimization, Kouri earned his doctorate in 2012 from Rice University in Houston in computational and applied mathematics. (His dissertation title, which seems a signpost for Kouri’s later work, is “An Approach for the Adaptive Solution of Optimization Problems Governed by Partial Differential Equations with Uncertain Coefficients.”)

Resilient algorithms to optimize extreme-scale simulations

“I am developing optimization algorithms to produce solutions that are resilient to faults and errors induced by three factors: next-generation supercomputing hardware, physical data insufficiencies and uncertainties in the model,” he said.

“While it may not always be possible to ‘reduce uncertainty,’ still, one must make a decision that accounts for the uncertainty.”

His algorithms, which handle uncertainties by mathematically quantifying their effects, are a way for supercomputer programmers to work around mistakes caused by error-prone hardware or software “without throwing an entire day’s work away,” he suggested.
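
As a rough illustration of what "mathematically quantifying" uncertainty can look like, the Python sketch below compares two ways of scoring a decision when one input is uncertain: the average loss, and the average of the worst-case tail of losses (often called conditional value-at-risk). This is a textbook-style toy, not Kouri's algorithm; the cost function, the sample distribution and all function names are assumptions made for the example.

```python
import random
import statistics

def cost(z, xi):
    """Hypothetical loss of decision z when the uncertain input takes value xi."""
    return (z - xi) ** 2

def mean_risk(losses):
    """Risk-neutral score: the plain average loss."""
    return statistics.fmean(losses)

def cvar(losses, beta=0.9):
    """Risk-averse score: average of the worst (1 - beta) fraction of losses."""
    ordered = sorted(losses)
    # Keep at least one sample in the tail, even for beta close to 1.
    k = min(int(beta * len(ordered)), len(ordered) - 1)
    return statistics.fmean(ordered[k:])

def best_decision(candidates, samples, risk):
    """Pick the candidate decision whose sampled losses have the smallest risk score."""
    return min(candidates, key=lambda z: risk([cost(z, xi) for xi in samples]))

# Assumed uncertainty: the input is usually near 0 but occasionally near 5.
rng = random.Random(0)
samples = [
    rng.gauss(0.0, 0.5) if rng.random() < 0.9 else rng.gauss(5.0, 0.5)
    for _ in range(2000)
]
candidates = [i / 10 for i in range(51)]  # decisions 0.0, 0.1, ..., 5.0

z_mean = best_decision(candidates, samples, mean_risk)
z_cvar = best_decision(candidates, samples, cvar)
print("risk-neutral decision:", z_mean)
print("risk-averse decision: ", z_cvar)
```

In this toy setup the risk-averse score pulls the chosen decision toward the rare, costly outcomes, which is one simple sense in which a decision can "account for" uncertainty that cannot be removed.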

Kouri noted that uncertainties may grow as researchers at the national laboratories and other supercomputing locations upgrade their computers from petascale (a million billion operations per second) to exascale (a billion billion operations per second), because the increased speed and data flow of the incoming machines may magnify omissions and other errors.

His work is related to artificial intelligence and machine learning, which also use optimization techniques to reach their solutions, he said.

“Machine learning and AI problems are typically posed as optimization problems,” he said, “and the algorithms that I develop could be applied to solve them.” The difference lies in how the problems are modeled.

“Machine learning and artificial intelligence models often are not motivated by physics. For the problems that I consider, the models typically come from physical laws,” he said.

According to the Early Career grant description of Kouri’s intent, his “work will permit inexpensive approximations during early iterations [and] … employ randomized compression of individual components to lessen the memory burden and to safeguard against hardware faults and failures.”
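
The idea of "inexpensive approximations during early iterations" can be sketched, under simplifying assumptions, as an optimization loop that uses only a few random samples at first and more as it nears a solution. The Python example below is a minimal illustration of that one idea; it does not implement randomized compression or fault tolerance, and the cost function, step size, sample schedule and function names are hypothetical choices for the example rather than details from the grant.

```python
import random

def sample_gradient(z, samples):
    """Gradient in z of the sample-average cost (1/N) * sum((z - xi)**2)."""
    return 2.0 * sum(z - xi for xi in samples) / len(samples)

def adaptive_descent(draw, z0=0.0, step=0.25, n0=4, growth=2, iterations=10):
    """Gradient descent whose sample size grows each iteration.

    Early steps use few samples (cheap, rough gradients); later steps use
    many samples (costly, accurate gradients).
    """
    z, n = z0, n0
    history = []
    for _ in range(iterations):
        samples = [draw() for _ in range(n)]
        z -= step * sample_gradient(z, samples)
        history.append((n, z))
        n *= growth
    return z, history

# Assumed uncertain input: normally distributed around 3.0.
rng = random.Random(1)
z_final, history = adaptive_descent(lambda: rng.gauss(3.0, 1.0))

for n, z in history:
    print(f"samples used: {n:5d}   current estimate: {z:.4f}")
```

Here the exact minimizer of the expected cost is the mean of the input, 3.0, so the printed estimates show how rough early steps still move the iterate in the right direction before the more expensive, more accurate steps refine it.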