
Pramod Mahajan

Pramod B. Mahajan, Ph.D., is chair of the AAPS Pharmacogenomics focus group.

Last week, the National Human Genome Research Institute (NHGRI) announced completion of the Encyclopedia of DNA Elements, or Project ENCODE, which essentially forms the latest chapters added to the book of the human genome. The DNA double helix was first described in 1953. Several technological and scientific discoveries during the subsequent decades (fig. 1) helped us understand how genetic information is translated into proteins, the workhorses of living cells. In 2003, scientists succeeded in deciphering a complete sequence of the human genome. Initial analysis of the approximately 3,100 million letters in this book of human life revealed, to everyone’s surprise, that only about 1.5% of the entire human genome sequence accounted for all the proteins made in the human body. So the million-dollar question was: What is the remaining 98.5% doing in human cells? Multiple hypotheses ensued, including one that characterized this extra DNA as junk. Well, scientists at the NHGRI didn’t quite agree with that theory and set out to investigate.

In 2003, a consortium of more than 400 leading researchers from all over the world was put together for this task. The first milestone was to fully analyze about 30 megabases, or approximately 1% of the genome, using high-throughput bioinformatics methods to identify and catalog the functional elements within this region. By 2007, this pilot phase had already produced some very interesting and novel results. For example, contrary to the generally accepted view, these studies established that the genome is transcribed pervasively, producing many non-protein-coding transcripts. These results also reinforced the utility of the experimental techniques and bioinformatic tools used. Moreover, next-generation sequencing technology became available, encouraging extension of these studies to the remaining part of the human genome.

This decade-long study, which cost approximately $164 million, has allowed the consortium scientists to assign functions to more than 80% of the human genome, uncovering millions of new regulatory elements and defining hitherto unknown mechanisms of regulation. Over 15 trillion bytes of data were generated, consuming over 300 years of computing time! Even presenting this voluminous data and the conclusions of these studies has led to some ingenious publication methods. The novelty does not end there, though.

In addition to providing the scientific community with a goldmine of data to learn from, this project has contributed significantly to the development of novel genome sequencing technologies, bringing the cost down from about $1,000 per megabase to less than 50 cents per megabase. While managing the sequence data will pose the next challenge, the benefits of this information for disease identification, drug development, and drug repurposing will almost certainly outweigh the challenges. Understanding new mechanisms of human gene regulation will also lead to better treatments for complex metabolic disorders, neurodegenerative diseases such as Alzheimer’s and Parkinson’s, and neurodevelopmental conditions such as autism. Thus, yesterday’s junk DNA is proving to be the state-of-the-art data of tomorrow’s health care.

What do you think the next human genome milestones will be?