Penn State scientists discover origins of genomic 'dark matter'

UNIVERSITY PARK, Pa. -- Scientists at Penn State have achieved a major milestone in understanding how genomic "dark matter" originates. This "dark matter" -- called non-coding RNA -- comprises more than 95 percent of the human genome, but it does not contain the blueprint, or code, for making proteins. The team's findings may eventually help to pinpoint exactly where complex-disease traits reside, since the genetic origins of many diseases lie outside the coding regions of the genome.

The research, published Sept. 18 as an Advance Online Publication in the journal Nature, was performed by B. Franklin Pugh, holder of the Willaman Chair in Molecular Biology at Penn State, and Penn State postdoctoral scholar Bryan Venters, who now holds a faculty position at Vanderbilt University.

In their research, Pugh and Venters set out to identify the precise location of the beginnings of transcription -- the first step in the expression of genes into proteins. "During transcription, DNA is copied into RNA -- the single-stranded genetic material that is thought to have preceded the appearance of DNA on Earth -- by an enzyme called RNA polymerase and, after several more steps, genes are encoded and proteins eventually are produced," Pugh explained. He added that, in their quest to learn just where transcription begins, other scientists had looked directly at RNA. Pugh and Venters instead determined where along human chromosomes the proteins that initiate transcription of the non-coding RNA are located.

"These non-coding RNAs have been called the 'dark matter' of the genome because, just like the dark matter of the universe, they are massive in terms of coverage -- making up over 95 percent of the human genome. However, they are difficult to detect and no one knows exactly what they all are doing or why they are there," Pugh said. "Now at least we know that they are real, and not just 'noise' or 'junk.' Of course, the next step is to answer the question, 'What, in fact, do they do?'"

Pugh added that this research could represent one step toward solving the problem of "missing heritability" -- a concept that describes how most traits, including many diseases, cannot be accounted for by individual genes and seem to have their origins in regions of the genome that do not code for proteins. "It is difficult to pin down the source of a disease when the mutation maps to a region of the genome with no known function," Pugh said. "However, if such regions produce RNA then we are one step closer to understanding that disease."

The research was funded by the U.S. National Institutes of Health. More information and an illustration are online at

Last Updated September 24, 2013