Going to the Dogs

Nancy Marie Brown
September 01, 2001

The average investor has only 11,000 more genes than a worm? Leave it to the New York Times' business writers to put the human genome sequence in perspective.

Craig Venter of Celera Genomics, lead author on the paper in Science that announced the completion of the sequence last February, phrased it nicely too. As he told Reuters news service: "Corn has the same number of genes as humans."

Indeed, the big surprise when the human genome sequence was published was how few genes we have: Scientists had expected the number to top 100,000; it now seems that we have only 30,000 or so.

How can something as magnificently complex as a human have only three times the genes of a worm? As Venter and his coauthors wrote, "Now we know what we have to explain."

Andy Clark, a professor of biology at Penn State and a consultant for Celera, thinks the way to begin is to compare our genome with those of other mammals. Celera did, in fact, complete the mouse genome sequence shortly after the human one, and found that only 300 human genes had no clear match to genes in the mouse.

"What's the next genome to sequence?" asked Clark at his Frontiers of Science lecture. He leaned against the podium, hand on hip, a long, lean, Lincolnesque figure with a salt-and-pepper beard. "The dog," he said.

The dog? Clark had lots of reasons: "We live in close proximity to dogs, and because of this we have an acute awareness of differences in their health and well-being. There are a great number of disorders being carefully studied by veterinarians. We have a depth of knowledge of dogs' nutrition, mostly due to companies who are in the business of making dog food. Dogs are used in cardiac research: They're a model for human cardiac physiology. The order of the genes on the human and dog chromosomes shows amazing similarity. And if you compare human diseases with their dog equivalents, you find that for over 350 diseases the dog equivalent is precisely the same disease that humans get."

Scientifically, almost any mammal would serve Clark's purpose: to allow three-way comparisons among the genes of humans, mice, and some other mammal. The mouse has long been the standard for biological studies. "The mouse is a powerful system," he explained. "We have a huge knowledge base of similarities between mice and humans. It's really staggering to see the parallels."

A study by Clark and Penn State undergraduate Stacy Hubbell shows what might be possible if the dog genome were known as well. "We retrieved as many dog genes as we could," Clark said. "We found 254 such genes in databases, genes from 2,000 letters long up to 60,000."

Using a computer program called BLAST, developed in part by Penn State computer scientist Webb Miller, Clark and Hubbell matched some of those genes to the corresponding human and mouse genes. Hubbell then checked each of the potential matches by hand.

Clark projected a short stretch of one gene, in all three versions, onto the lecture hall's screen. "Out of 60 bases, the human genetic difference is only 5 bases. It's dead obvious that this is the same gene as the dog gene. When you scan the sequence your eye sees that the mouse gene tends to be a little more different."

Some of those differences matter and some don't, as the gene's DNA code is read by the cell's machinery, transcribed into RNA, and used to fashion a protein. "Some changes in the DNA will result in changes in the protein," Clark said, "and some will not. That's a key point."

Although each group of three letters in the RNA message translates into one amino acid, some amino acids are more tolerant of typos than others. "Consider the third position of the amino acid Serine," Clark said. "Whatever base you put into that position, you'll get a Serine. It can be UCU, UCC, UCA, or UCG." (In RNA, U, or uracil, replaces the T, or thymine, in DNA.)
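Clark's serine example can be checked against the standard genetic code. The short Python sketch below is an illustration only, not something shown in the lecture: it builds the standard RNA codon table and reports which amino acids result when only the third base of a codon varies. The names CODON_TABLE and third_position_amino_acids are invented for this example.

```python
from itertools import product

# Standard genetic code, with codons ordered U, C, A, G at each position.
# One-letter amino-acid codes; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def third_position_amino_acids(first_two):
    """Amino acids produced by varying only the third base (hypothetical helper)."""
    return {CODON_TABLE[first_two + base] for base in BASES}


# UCU, UCC, UCA, and UCG all encode serine ("S"): the third position is silent.
print(third_position_amino_acids("UC"))  # {'S'}

# By contrast, AU + third base gives isoleucine ("I") or methionine ("M").
print(third_position_amino_acids("AU"))
```

Not every amino acid is as forgiving as serine in this respect, which is why the second call returns more than one result.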

While many "devastating disorders," Clark noted, are caused by the change of a single letter, in other cases (like the third position of Serine) a single base change does nothing. Factor out these "silent sites," and the average difference between what is known of the dog genome and the human genome is 7.5 percent; between dog and mouse, it's 7.98 percent. These meaningful changes, collectively termed "mis-sense," might be what causes the evolutionary difference between species, Clark said. "A given gene has a much higher silent rate of change than its mis-sense rate. There's considerable constraint to the changes that are allowed, and it seems as though Nature generally—but not always—avoids mis-sense differences."
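To make the silent versus mis-sense distinction concrete, here is a minimal Python sketch that compares two already-aligned coding sequences codon by codon. It is a simplification and not the method Clark and Hubbell used: it assumes the sequences have equal length, contain no insertions or deletions, and start in the correct reading frame, and it counts changed codons rather than individual bases. The function name count_changes and the two short sequences are made up for illustration.

```python
from itertools import product

# Standard genetic code (RNA alphabet); "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_changes(seq_a, seq_b):
    """Return (silent, missense) counts of differing codons.

    Assumes both sequences are aligned, in frame, and the same length.
    A change to or from a stop codon is lumped in with missense here.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be equal length and whole codons")
    silent = missense = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codon, nothing to count
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1  # different spelling, same amino acid
        else:
            missense += 1  # the protein itself changes
    return silent, missense


# Made-up sequences: Ser-Gly-Met versus Ser-Gly-Ile.
print(count_changes("UCUGGAAUG", "UCCGGAAUA"))  # (1, 1)
```

In the made-up pair, the first codon changes from UCU to UCC without altering the serine, while the last changes methionine to isoleucine.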

In addition, some genes are more likely to change than others. "Different genes are evolving at different rates. Some genes are going fast, others slowly." The genes for proteins that recognize pathogens show the highest rate of change, allowing a species to adapt quickly to a new disease threat. Other genes, such as those that govern the development of a sperm's tail, have changed very little. "You could put the human protein into a dog and vice versa and it would work," said Clark. "It's rather unsettling. These are things all mammals have to do more or less the same way."

One gene that shows a very high rate of mis-sense mutations in humans is the growth hormone receptor gene. While between dog and mouse there are 40 silent changes and 16 mis-sense changes, between human and mouse there are 27 silent changes and a whopping 43 changes that affect the amino acid sequence. "It's a very intriguing result. It looks like there's a specific tendency for the growth hormone receptor gene to evolve faster in humans." Clark speculated that at some point in human evolution, it suddenly became advantageous for us to be tall.
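The contrast in those counts is easier to see as a ratio of mis-sense to silent changes. The Python sketch below does only that arithmetic on the four numbers quoted above. It is a rough illustration: a full analysis would also normalize by how many silent and mis-sense sites the gene contains, which this does not attempt. The function name is hypothetical.

```python
def missense_to_silent(missense, silent):
    """Crude ratio of raw counts (hypothetical helper, no site normalization)."""
    return missense / silent


# Counts for the growth hormone receptor gene as quoted in the article.
dog_mouse = missense_to_silent(missense=16, silent=40)
human_mouse = missense_to_silent(missense=43, silent=27)

print(f"dog vs. mouse:   {dog_mouse:.2f}")    # 0.40
print(f"human vs. mouse: {human_mouse:.2f}")  # 1.59
```

Between dog and mouse, silent changes outnumber mis-sense changes by more than two to one; between human and mouse the balance tips the other way.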

"It requires comparison among more than two species to make these kinds of statements," he noted. "The genes that are accelerated, we've found, are the genes involved in growth and in color vision"—two areas in which we know we're different from our dogs. "It's really a very satisfying kind of result."

With a full sequence of dog genes to match against the mouse and human genomes, more similarities and differences will become apparent. As anthropologist Svante Pääbo commented in Science, such cross-species comparisons "will make the unity of life more obvious to everyone." They will be "both a source of humility and a blow to the idea of human uniqueness."

Andy Clark, Ph.D., is professor of biology in the Eberly College of Science, 326 Mueller Lab, University Park, PA 16802; 814-863-3891; c92@psu.edu.


It's a BLAST!

"The idea for BLAST was very simple. So simple, most people did not believe it would be very effective." Webb Miller, professor of computer science and engineering at Penn State, is talking about the Basic Local Alignment Search Tool, the subject of a 1990 paper he co-authored with Stephen Altschul, Warren Gish, and David Lipman, from the National Center for Biotechnology Information, and Eugene Myers, of the University of Arizona. BLAST is a computer program that compares a protein’s amino-acid sequence with a database of all known proteins.

BLAST works in three steps, Miller explains. "First, it finds pairs of short regions, one region from each of the two sequences being compared, that are exactly the same. Second, for each of the pairs found in step one, it determines whether these short matching regions lie in longer regions that match even if insertions or deletions in the regions are not allowed. Third, for each of the longer matching regions in step two, it determines whether the matching regions are alike in longer regions that match when insertions and deletions are allowed.

"The reason for having so many steps is that step one is much faster than step two, and step two is much faster than step three," Miller adds. "The idea is to do a fast step that eliminates most of the possible matches and then apply the slower step to relatively few cases."
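The three-step filter Miller describes can be sketched in a few dozen lines of Python. This is a toy illustration of the idea, not the BLAST program itself: the real tool uses protein scoring matrices, more careful extension rules, and statistical significance estimates. Here, step three is stood in for by a standard Smith-Waterman local alignment score, and every function name, score, and cutoff below is an assumption chosen for the example.

```python
from collections import defaultdict


def find_seeds(query, subject, k=4):
    """Step 1: exact length-k matches, as (query_pos, subject_pos) pairs."""
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(query) - k + 1)
            for j in index.get(query[i:i + k], [])]


def extend_ungapped(query, subject, i, j, k, match=1, mismatch=-1, drop=3):
    """Step 2: grow a seed in both directions with no insertions or deletions.

    Each direction stops once the running score falls `drop` below its best.
    Returns (score, query_start, query_end, subject_start, subject_end).
    """
    def walk(qi, sj, step):
        score = best = best_len = length = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score >= drop:
                break
            qi += step
            sj += step
        return best, best_len

    right, right_len = walk(i + k, j + k, 1)
    left, left_len = walk(i - 1, j - 1, -1)
    score = k * match + left + right
    return (score, i - left_len, i + k + right_len,
            j - left_len, j + k + right_len)


def gapped_score(a, b, match=1, mismatch=-1, gap=-2):
    """Step 3: Smith-Waterman local alignment score, gaps allowed."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def toy_search(query, subject, k=4, ungapped_cutoff=6, pad=5):
    """Run the cheap steps first; run the slow step only on the survivors."""
    results, seen = [], set()
    for i, j in find_seeds(query, subject, k):
        score, q0, q1, s0, s1 = extend_ungapped(query, subject, i, j, k)
        if score < ungapped_cutoff or (q0, s0) in seen:
            continue  # filtered out cheaply, or already handled
        seen.add((q0, s0))
        window_q = query[max(0, q0 - pad):q1 + pad]
        window_s = subject[max(0, s0 - pad):s1 + pad]
        results.append((gapped_score(window_q, window_s), q0, s0))
    return sorted(results, reverse=True)


# Made-up sequences: the query sits inside the subject at position 3.
print(toy_search("ACGTACGT", "TTTACGTACGTTTT"))  # [(8, 0, 3)]
```

In the made-up example, step one finds eight seeds, step two discards those that do not extend into a strong ungapped match, and the expensive gapped comparison runs only once.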

The program surprised even its creators. It was both fast and accurate. You could quickly determine the function of a new amino-acid sequence by searching for matching sequences that had already been investigated. BLAST was so successful, in fact, that the article in the Journal of Molecular Biology became the most cited paper of the decade, according to the Institute for Scientific Information. "The number of citations a paper receives reflects its impact on science," says Miller; with over 100,000 citations in ten years, BLAST has played a critical role in DNA research.

Miller hopes his latest program, PIPMaker, published in the April 2000 issue of Genome Research, will have a similar impact. BLAST can compare amino-acid sequences thousands of letters long; PIPMaker can handle sequences millions of letters long. It can compare the complete genomes of the human and the mouse. "When I charted my goal in 1990, I was aware that my prediction for completion of the mouse sequence, 2008, coincided with me reaching retirement age. If the prediction had been accurate, I could finish my project and then turn my attention to some hobby. As it turns out, I’ll soon need to identify another goal, because the completion of the human and mouse sequences is way ahead of schedule."

—Kristin McKee

Webb Miller, Ph.D., is professor of computer science and engineering in the Eberly College of Science, 326a Pond Laboratory, University Park, PA 16802; 814-865-4551; wcm2@psu.edu.
