Scientists Reveal Detailed Human Pangenome Reference That Captures Human Diversity

The Human Genome Project, funded by the National Institutes of Health (NIH), ended in April 2003 and produced a human genome sequence made up of a patchwork of data from a small number of individuals. This lack of diversity limited its usefulness as a research tool for understanding human health and disease. Now, researchers have published a new set of reference human genome sequences that reveals far more genomic diversity from different populations of people than was available previously.

Led by the international Human Pangenome Reference Consortium, the work is funded by the National Human Genome Research Institute (NHGRI) of the NIH and appears in a set of papers published May 10 in the journal Nature.

Washington University School of Medicine in St. Louis serves as the national coordinating center for the consortium. Ting Wang, the Sanford C. and Karen P. Loewentheil Distinguished Professor of Medicine at Washington University, leads the coordinating center and is a co-senior author on one of the papers. Washington University also is a key player in the national data production center, led by the University of California, Santa Cruz. Wang leads Washington University’s data production, which contributed almost one-third of the data for this first set of studies.

“These new pangenome reference sequences will serve as an important tool in understanding the diversity of human genetics and its role in determining how human health is maintained and what can go wrong in various diseases,” Wang said. “We look forward to sharing this resource with the research community around the world.”

The new pangenome reference includes genome sequences from 47 people of diverse backgrounds. The study recruited participants from communities around the globe, including people of African Caribbean ancestry in Barbados; people of African ancestry in the southwest U.S.; people of Peruvian ancestry in Lima, Peru; members of the Punjabi community in Lahore, Pakistan; and Han Chinese people in southern China, among others.

The work is ongoing, and the researchers hope to have sequenced the genomes of 350 people by mid-2024. This larger sample size will offer a more complete view of the full diversity of human populations globally.

A human genome is the DNA blueprint guiding the embryonic development and daily bodily functions of a person. In general, any two individuals’ genomes are about 99.6% identical. The remaining 0.4% difference makes each person unique and can reveal information about a person’s health and risk of diseases such as cancer, Alzheimer’s disease and heart disease.

The current reference human genome sequence has gaps, especially in areas that are repetitive and hard to read. Recent technological advances such as long-read DNA sequencing, which reads longer stretches of DNA at a time, helped researchers fill in those gaps to create the first complete human genome sequence. This complete sequence was released last year by the NIH-funded Telomere-to-Telomere (T2T) consortium and is incorporated into the current pangenome reference.

“The human pangenome reference will enable us to represent tens of thousands of novel genomic variants in regions of the genome that were previously inaccessible,” said Wen-Wei Liao, a doctoral student in Washington University School of Medicine’s Division of Biology & Biomedical Sciences and a co-first author of one of the Nature papers. Liao, who holds two academic affiliations, is currently conducting his research at Yale University. “With a pangenome reference, we can accelerate clinical research by improving our understanding of the link between genes and disease traits in diverse populations.”

Other researchers contributing to the new human pangenome reference include those from the University of California, Santa Cruz; Harvard Medical School; Yale University; Heinrich Heine University in Germany; and the University of Tennessee.