Synthetic Data Sets Set New Standard in Energy Systems Research

A decent maxim for systems engineering would be “there is no substitute for the real thing”: nothing beats testing a new device or idea on a real system, at absolute fidelity. But for obvious reasons, this is rarely possible on energy systems. No utility is going to experiment with the power it delivers to thousands, if not millions, of customers.

A zoom-in of a Smart-DS synthetic distribution system in San Francisco.
Rather than the real thing, a National Renewable Energy Laboratory (NREL) team and external collaborators have invented the next best thing: near-replicas of electricity grids, realistic down to individual devices and up to long-distance power transmission. Following its application in several groundbreaking research projects, the team’s creation, Synthetic Models for Advanced, Realistic Testing: Distribution Systems and Scenarios (Smart-DS), is set to become the new standard for full-scale modeling and simulation of energy system scenarios.

“The goal of this project is to create the next generation of electrical data sets,” said NREL Engineer Tarek Elgindy. “This displaces what’s been used for the past 15 or 20 years.”

Historically, engineers and innovators benchmarked new technologies on much-simplified models, such as those derived from the IEEE test feeders that first appeared in the early ’90s. These served a valuable purpose in verifying that then-emerging computer simulation tools were accurate, but their small size and lack of specificity were a far cry from the true complexity and scale of the actual electric grid. Moreover, to adapt these common test systems to their specific applications (for example, projecting the impact of rooftop solar growth or testing a new control algorithm), researchers often customize their experimental data sets. The result is messy: researchers use many modified forms of a generic data set, which leaves power grid research both inexact and hard to reproduce. Smart-DS is intended to solve those issues.

Without insult or exaggeration, comparing Smart-DS data sets to the current standard of power systems testing is like comparing the “Mona Lisa” to a stick figure. While the largest IEEE test system has 8,500 electric nodes, the Smart-DS San Francisco data set has 10 million. Smart-DS, which emerged from the ARPA-E GRID DATA program, algorithmically generates a model energy system to match an area’s geography, population distribution, land use, and other factors. The result is like an alternate reality in which a region’s power system developed slightly differently but functions the same way. The modeled energy system captures true complexity and provides a new standard with realistic scale and detail. The method researchers used to build the data sets has been published by IEEE, though the specific software is proprietary.

“In addition to having something at scale that is reasonably similar, now researchers have a resource to compare developments, to say that ‘my algorithm is 25% faster,’” said NREL Principal Research Engineer and Group Manager Bryan Palmintier. “With the old system, there was no way to compare big system results. Researchers would need to go to the same utility, use the same data, same everything.”

Utility data—the actual area-specific information of how power is bought, moved, and used—is necessarily protected and only accessible through non-disclosure agreements, which limit the data’s application. The Smart-DS data sets avoid the confidentiality concerns but keep the realism.

“These data sets are good fiction—they are not the real grids by design. We don’t want to build the real systems, because then we’re subject to constraints,” Palmintier said. “This is for folks who are building up greater algorithms and grid solutions and need a high-fidelity standardized resource to validate them.”

The first data sets from Smart-DS reproduced the electric distribution systems of entire cities: Greensboro, North Carolina; Santa Fe, New Mexico; and the Bay Area in California. Drafts of these data sets have been published on a data repository developed within the same ARPA-E program, and final data sets are expected by the end of the year. For each of the synthetic city networks, the development team also created standardized scenarios, for example, possible expansions of new technologies such as electric vehicles and battery storage. Instead of modifying the IEEE 37-node test feeder to look vaguely similar to a specific scenario, researchers can grab a Smart-DS-generated data set like “San Francisco, Version 1.0, High solar, moderate storage.”

A Smart-DS synthetic distribution system data set of Santa Fe, demonstrating the scale of Smart-DS grid representation.
The difference is not just consistency, but resolution. The synthetic systems include tens of millions of electric nodes, representing millions of customers and the diverse electrical infrastructure that feeds those loads. These are thousands of times larger than previous test systems, and unlike anything before, Smart-DS data sets include the entire electric system: transmission and distribution data sets combined, along with the low-voltage system that connects the grid to individual customers. This high resolution will be on display in the upcoming release of Smart-DS data sets, which includes a replica of Texas’ entire power system.

The scenarios built on top of these data sets provide standardized locations and technical parameters to capture realistic impacts from other domains. Scenarios are provided at multiple levels (e.g., low, moderate, high) of rooftop solar, utility-scale solar, customer storage, utility storage, electric vehicle (EV) chargers, enhanced utility controls, and more, including a resilience scenario with damaged equipment. Researchers can then mix and match these scenarios to stress-test their technologies against reasonable and comparable variations in future energy systems.
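To make the “mix and match” idea concrete, here is a minimal sketch of how a researcher might enumerate scenario combinations. It assumes each scenario category takes one of the three levels named above; the category subset, the dictionary layout, and the label format are hypothetical illustrations, not the actual Smart-DS file structure or naming scheme.

```python
from itertools import product

# Hypothetical scenario dimensions and levels, drawn from the categories the
# article lists; the real Smart-DS release may organize them differently.
LEVELS = ("low", "moderate", "high")
DIMENSIONS = ("rooftop solar", "customer storage", "EV chargers")


def scenario_combinations(dimensions=DIMENSIONS, levels=LEVELS):
    """Yield every mix-and-match combination as a dict {dimension: level}."""
    for choice in product(levels, repeat=len(dimensions)):
        yield dict(zip(dimensions, choice))


def label(region, version, combo):
    """Build a human-readable label (hypothetical format) for one combination."""
    parts = [f"{level} {dim}" for dim, level in combo.items()]
    return ", ".join([region, f"Version {version}"] + parts)


combos = list(scenario_combinations())
print(len(combos))  # 3 levels ** 3 dimensions = 27
print(label("San Francisco", "1.0", combos[0]))
```

Because every combination is generated from the same fixed list of levels, two research groups that pick the same label are testing against the same scenario, which is the comparability the standardized scenarios aim for.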

What is the secret to creating such realistic models? Besides extensive collaboration with Comillas Pontifical University, Massachusetts Institute of Technology, Texas A&M University, leading distribution software vendors, and multiple utilities, the development team borrowed from NREL’s unique toolkit. This toolkit includes ResStock for residential building models, ComStock for commercial building models, and the NREL-managed National Solar Radiation Database for realistic solar availability. Together, these provide key inputs for realistic scenarios. At such large sizes, however, running the full models effectively for advanced use cases can be challenging.

“There are a lot of tools that model systems really well—refined models for distribution systems [D] and transmission systems [T],” Elgindy said. “When you get to the scale of Texas, how do you integrate these? We use HELICS to glue them together, to coordinate electricity between T and D, and to connect that domain knowledge.”

HELICS, the Hierarchical Engine for Large-scale Infrastructure Co-Simulation, allows Smart-DS to combine distinct energy systems and research domains. HELICS brings together several NREL tools within Smart-DS to enable simulation of the next generation of energy system models. This allows users to simulate a combination of various parts of the grid, such as the transmission system and a portion of the distribution system, on their laptop, and to scale these simulations up to high-performance computing resources such as a utility or university cluster. Simulating the entire energy grid of Texas does require some heavy lifting: Smart-DS uses a dedicated node on NREL’s Eagle supercomputer to link large numbers of distribution simulations on other Eagle nodes with commercial power-flow tools running in Windows.
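The core idea of transmission-distribution (T&D) co-simulation is an exchange loop: the transmission model hands a substation voltage to the distribution models, and the distribution models hand their resulting load back. The toy sketch below shows that pattern in plain Python. It does not use the HELICS API; the feeder and transmission models, the linear voltage-sag sensitivity, the evening load shape, and the fixed-point iteration are all simplifying assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Feeder:
    """Toy distribution feeder whose load (MW) depends on voltage (p.u.)."""
    name: str
    base_load_mw: float

    def load(self, voltage_pu: float, hour: int) -> float:
        # Hypothetical daily shape: load is 20% higher during hours 17-21.
        shape = 1.2 if 17 <= hour <= 21 else 1.0
        # Constant-impedance approximation: power scales with voltage squared.
        return self.base_load_mw * shape * voltage_pu ** 2


class Transmission:
    """Toy transmission model: substation voltage sags linearly with load."""

    def __init__(self, sensitivity_pu_per_mw: float):
        self.k = sensitivity_pu_per_mw

    def voltage(self, total_load_mw: float) -> float:
        return 1.0 - self.k * total_load_mw


def cosimulate(trans, feeders, hours, tol=1e-9, max_iter=50):
    """At each time step, pass loads D -> T and voltage T -> D until the
    exchanged voltage stops changing (a simple fixed-point co-iteration)."""
    results = []
    for hour in hours:
        v = 1.0
        for _ in range(max_iter):
            total = sum(f.load(v, hour) for f in feeders)  # D -> T: loads
            v_new = trans.voltage(total)                    # T -> D: voltage
            converged = abs(v_new - v) < tol
            v = v_new
            if converged:
                break
        # Report the load consistent with the final voltage.
        total = sum(f.load(v, hour) for f in feeders)
        results.append((hour, v, total))
    return results


feeders = [Feeder("feeder_a", 10.0), Feeder("feeder_b", 20.0)]
trans = Transmission(sensitivity_pu_per_mw=0.001)
for hour, v, p in cosimulate(trans, feeders, hours=[12, 18]):
    print(f"hour {hour}: V={v:.4f} p.u., load={p:.2f} MW")
```

In a real deployment, each model would run as a separate process (possibly on a different machine), and a co-simulation framework such as HELICS would handle the time synchronization and data exchange that this sketch does with a single Python loop.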

Together, Smart-DS and associated tools have revolutionized research at NREL. Landmark simulations have been achieved on replica energy systems, including algorithms for NREL’s Autonomous Energy Grids initiative, advanced mobility integration in the GEMINI-XFC project, standardized models for the ARPA-E PERFORM project, and cybersecurity validations under the Situational Awareness of Grid Anomalies project.

To arrive at such high-fidelity models, the development team created some resources along the way. DiTTo (Distribution Transformation Tool) is one software spin-off; it converts between various distribution system modeling formats. Another, the Reference Network Model-US, which generates electrical models that match the technical specifications of U.S. systems, is being adapted by URBANopt™ to understand relationships between building layouts and distribution systems.
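One common way to build a converter among many modeling formats is a hub-and-spoke (“many-to-one-to-many”) design: each input format gets a reader into a shared in-memory model, and each output format gets a writer out of it. The minimal sketch below illustrates that pattern with two toy formats (CSV and JSON) and a single hypothetical `Line` element. It is not DiTTo’s code or API, and real distribution formats carry far more detail.

```python
import csv
import io
import json
from dataclasses import dataclass, asdict


@dataclass
class Line:
    """Hypothetical common representation of one distribution line."""
    name: str
    from_bus: str
    to_bus: str
    length_m: float


# Readers: each parses one toy input format into the common model.
def read_csv(text):
    rows = csv.DictReader(io.StringIO(text))
    return [Line(r["name"], r["from_bus"], r["to_bus"], float(r["length_m"])) for r in rows]


def read_json(text):
    return [Line(**item) for item in json.loads(text)]


# Writers: each serializes the common model into one output format.
def write_csv(lines):
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "from_bus", "to_bus", "length_m"])
    writer.writeheader()
    for line in lines:
        writer.writerow(asdict(line))
    return out.getvalue()


def write_json(lines):
    return json.dumps([asdict(line) for line in lines])


READERS = {"csv": read_csv, "json": read_json}
WRITERS = {"csv": write_csv, "json": write_json}


def convert(text, src, dst):
    """Any reader feeds the shared model; any writer consumes it."""
    return WRITERS[dst](READERS[src](text))


csv_text = "name,from_bus,to_bus,length_m\nL1,b1,b2,120.5\n"
print(convert(csv_text, "csv", "json"))
```

The payoff of the shared model is scaling: supporting N formats takes N readers and N writers, rather than a separate converter for every ordered pair of formats.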

Although Smart-DS has transformed the scale and fidelity of energy systems research, this is just its debut. In step with the Advanced Research on Integrated Energy Systems (ARIES) initiative, Smart-DS and associated tools will push even further into the frontiers of integrated energy systems, uniting power hardware with high-detail models of diverse energy domains to enable research that truly reflects the scale and complexity of modern power systems.

The next generation of Smart-DS data sets will be published soon, featuring larger scales, more sophisticated scenarios, and greater integration with other energy domains. Stay tuned for news about the next release of Smart-DS data sets.