This article first appeared in the St. Louis Beacon, Feb. 3, 2010 - Gene researchers looking for the genes that cause inherited diseases have been having a rough decade. The decoding of the human genome eight years ago was supposed to open the door to a golden age for them.
To hunt for disease genes, all they would have to do is compare the genomes of people with inherited disorders to the standard human genome to find the DNA differences that cause the disease. Of course, there would be lots of other differences too, but if researchers looked at lots of patients, the differences the patients share should reveal the culprits.
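The comparison described above can be sketched in a few lines of Python. This is a toy illustration built on assumptions. Each person's genome is reduced to a set of made-up variant labels, one for each place where that person differs from the standard sequence. A "culprit" is simply any difference carried by at least a chosen fraction of patients. A real study would also compare against healthy people and apply statistics. The function name `shared_variants` and the data are hypothetical.

```python
from collections import Counter

def shared_variants(patients, min_fraction):
    """Return the differences carried by at least min_fraction of the patients.

    Each patient is a set of hypothetical variant labels, one per place where
    that person's DNA differs from the standard (reference) sequence.
    """
    counts = Counter(v for variants in patients for v in variants)
    threshold = min_fraction * len(patients)
    return {v for v, n in counts.items() if n >= threshold}

# Made-up data: every patient has private differences, but one is shared by all.
patients = [
    {"chr1:1000A>G", "chr2:500C>T", "chr7:42G>A"},
    {"chr3:77T>C", "chr7:42G>A"},
    {"chr5:900G>C", "chr7:42G>A", "chr9:12A>T"},
]

print(shared_variants(patients, min_fraction=1.0))  # {'chr7:42G>A'}
```

Each private difference appears in only one patient and falls below the threshold. Only the shared one survives.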
Very direct, very logical, very satisfying. In a cloud of research funding, the hunt began.
The first hurdle was the sheer size of the enterprise. Your genome is made up of roughly 3 billion nucleotides (the letters of which DNA molecules are composed). That's BILLION. Eight years ago sequencing on that scale was a nontrivial task - there was simply no practical way to compare the genomes of lots of people.
So the researchers took a shortcut that seemed surefire. It involves the normal variations that make each of us a unique individual: the random little differences between the DNA sequence of each person's genome and the standard sequence. Such differences are called "polymorphisms" (poly is Greek for 'many,' morph Greek for 'form'). Most of these differences arise from errors in copying the DNA, and involve only a single nucleotide code letter. They are called, logically, Single Nucleotide Polymorphisms, or SNPs (pronounced "snips").
If you simply bust up a DNA molecule into lots of tiny bits a few genes long, the odds are that a few of them will contain a SNP. If you do it again for a second person, you will get a few more, as this person is different from the first at many places on the genome. Now do it for lots of people, and you'll get lots of fragments, each with a unique SNP. Researchers soon had SNP libraries covering essentially the entire human genome.
Now here's the fun part. Any gene change causing an inherited disease has to be SOMEWHERE on the genome, so it has to be on one of the tiny bits in the SNP library! To identify which one, all the researcher has to do is take advantage of the fact that DNA is a double-stranded molecule. When you separate the strands by gentle heating, each naked DNA strand will bind back to its complementary strand when cooled - or to any other DNA strand with the complementary sequence.
To take advantage of this, researchers invented what we now call the "gene chip," little glass slivers with lots of single-stranded DNA molecules stuck to them. A so-called SNP chip can have as many as 500,000 different SNP-labeled DNA bits on it. Dip it into a solution of a person's DNA, and every SNP that person has will reveal its presence by binding to the chip.
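The pairing rule behind the gene chip can be sketched as a toy Python model. The assumptions are loud ones. DNA is treated as a string of the letters A, T, G and C. A probe "binds" only when the sample contains its exact complementary strand. The two six-letter probes below are invented for illustration and are far shorter than real ones. The names `denature`, `read_chip` and the probe sequences are hypothetical.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Sequence of the partner strand, read in the same 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def denature(double_stranded):
    """'Gently heat' duplex fragments (each given by one strand) into single strands."""
    strands = set()
    for top in double_stranded:
        strands.add(top)
        strands.add(reverse_complement(top))
    return strands

def read_chip(probes, sample_strands):
    """Names of probes that find an exactly complementary strand in the sample.

    Toy model: only perfect matches bind. Real hybridization also depends on
    probe length, temperature and salt, and can tolerate some mismatches.
    """
    return {name for name, probe in probes.items()
            if reverse_complement(probe) in sample_strands}

# Hypothetical probes for the two versions of one SNP (fourth letter T vs G).
probes = {"reference": "ACGTAC", "variant": "ACGGAC"}

person_1 = denature(["ACGTAC"])   # carries the standard letter
person_2 = denature(["ACGGAC"])   # carries the SNP

print(read_chip(probes, person_1))  # {'reference'}
print(read_chip(probes, person_2))  # {'variant'}
```

Each person lights up only the probe matching the letter they actually carry. This is how a single dip of the chip can report many SNPs at once.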
Does the approach work? Yep. Looking at the genomes of people with inherited disorders, researchers have identified about 2,000 disease-associated SNPs.
So why aren't the gene hunters celebrating? Isn't this just what they sought?
Yeah, but nature has a way of confounding the most beautifully designed experiments. The problem is that for each inherited disorder, the SNPs turned up in the gene hunt account for only a small portion of the disorder's inherited risk.
But the SNP library covered the whole genome! Where are the missing variants, the DNA changes that cause most instances of the disease?
The question is both profoundly important and profoundly disturbing. Important because we are talking about genetic diseases that affect a great many people; disturbing because the very direct, very logical, very satisfying approach taken by the gene hunters must, in some way we do not yet understand, be wrong-headed. Somewhere we are making an assumption that isn't true.
So now the real hunt begins. We need to re-examine all of our assumptions and ferret out the bad boy. What is it we think we know but don't?
For my money, a logical place to look more closely is the assumption that gene changes cause inherited diseases. We are actually assuming a good deal more than that in our SNP hunts: that each inherited disorder is associated with an individual gene change. Sickle cell anemia is caused by a single nucleotide change in a hemoglobin gene, inherited breast cancer by a change in either of two BRCA genes, Tay-Sachs disease by a single-gene change that renders the enzyme hexosaminidase nonfunctional, and so forth.
What if things aren't so simple? What if, for most inherited disorders, several different paths are involved in producing the disease state?
All sorts of different combinations of DNA changes could get the job done. None of them would show up in a SNP hunt for disease association, because the association of any one of them with the disease depends on the precise combination of others present. The missing DNA changes are on SNP-labeled fragments, all right, right in front of our faces, but spread across many different combinations rather than forming one simple common pattern.
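The argument can be made concrete with an extreme toy model, written here in Python. The disease rule is an assumption chosen for illustration, not a known mechanism. A person falls ill when carrying exactly one of two hypothetical variants, A or B, but not both. All four combinations are taken to be equally common. The illness is then entirely genetic, yet a scan that looks at one variant at a time sees nothing.

```python
from itertools import product

def is_sick(has_a, has_b):
    # Hypothetical rule: illness requires exactly one of the two variants.
    return has_a != has_b

# Every combination of carrying / not carrying A and B, assumed equally common.
people = [(a, b, is_sick(a, b)) for a, b in product([False, True], repeat=2)]

def carrier_rate(variant_index, sick):
    """Fraction of the sick (or healthy) group that carries one variant."""
    group = [person for person in people if person[2] == sick]
    return sum(person[variant_index] for person in group) / len(group)

for index, name in enumerate("AB"):
    print(name, "patients:", carrier_rate(index, True),
          "healthy:", carrier_rate(index, False))
```

Each variant turns up in half of the patients and half of the healthy people, so a single-SNP association test finds no signal. Yet knowing both variants predicts the disease perfectly. This is a deliberately extreme case. Less symmetric combinations would generally leave a weak single-variant signal rather than none.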
So the SNP hunt is going to have to go. We had to try it as the most promising and logical approach, but it hasn't worked and isn't going to. We are just going to have to compare entire genomes. Then you don't miss anything.
Fortunately, in the eight years since the SNP hunts began, sequencing DNA has become a lot easier. For a lot less money, and in a lot less time, the full genome sequence of a person can be crunched out on huge computer-run sequencing machines. One of the world's great sequencing centers, stuffed full of DNA sequencing machines serviced by an army of technicians, is right down the road at the Washington University School of Medicine.
Last week, the Washington University Center, in collaboration with St. Jude Children's Research Hospital in Memphis, announced a $65 million project targeting cancer in kids. As a full-genome comparison study, it seems to me a big step in the right direction. The study will involve 600 young patients being treated at St. Jude. The Washington University Center will sequence the full genome of each child twice: first the genome of cells from normal tissue, and then the genome of cells from the tumor.
Gene researchers will be very interested in what they find out. Each child can be expected to exhibit a host of differences from the standard human genome sequence, reflecting his or her individuality. Of those, the differences found only in the tumor cells are probably not inherited ones (else they would be present in all body cells), but rather the result of DNA damage during development or childhood. Of the DNA changes found in all body cells and therefore likely inherited, the researchers can look for those that show up repeatedly in children with the same disorder. The hunt is the same as that carried out unsuccessfully with SNPs, but now we will be able to see all the data.
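The bookkeeping in the comparison described above can be sketched with Python sets. This is a simplified illustration, not the project's actual pipeline. Each child's normal and tumor genomes are reduced to sets of made-up variant labels. Differences seen in normal tissue are treated as present in every body cell, and so likely inherited. The helper names and data are hypothetical.

```python
from collections import Counter

def split_variants(children):
    """Separate likely-inherited differences from tumor-only ones.

    children maps a child ID to (normal_variants, tumor_variants), each a set of
    hypothetical labels for places where that genome differs from the reference.
    """
    inherited = {}
    tumor_only = {}
    for child, (normal, tumor) in children.items():
        inherited[child] = set(normal)        # in normal tissue, assumed in all body cells
        tumor_only[child] = tumor - normal    # seen only in the tumor, so likely acquired
    return inherited, tumor_only

def recurrent(variant_sets, min_children):
    """Differences that show up in at least min_children of the children."""
    counts = Counter(v for variants in variant_sets.values() for v in variants)
    return {v for v, n in counts.items() if n >= min_children}

# Made-up data for three children with the same kind of tumor.
children = {
    "child1": ({"v1", "v2"}, {"v1", "v2", "t1"}),
    "child2": ({"v1", "v3"}, {"v1", "v3", "t2"}),
    "child3": ({"v1", "v4"}, {"v1", "v4", "t3"}),
}

inherited, tumor_only = split_variants(children)
print(tumor_only)                             # tumor-only differences per child
print(recurrent(inherited, min_children=3))   # {'v1'}
```

Here only `v1` is shared by all three children. That is the kind of recurring inherited difference the researchers would look for. With real genomes the sets would be vastly larger, but the subtraction and counting are the same.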
'On science'
George B. Johnson's "On Science" column looks at scientific issues and explains them in an accessible manner. There is no dumbing down in Johnson's writing; rather, he uses analogy and precise terms to open the world of science to others.
Johnson, Ph.D., professor emeritus of Biology at Washington University, has taught biology and genetics to undergraduates for more than 30 years. Also professor of genetics at Washington University’s School of Medicine, Johnson is a student of population genetics and evolution, renowned for his pioneering studies of genetic variability.
He has authored more than 50 scientific publications and seven texts, including "BIOLOGY" (with botanist Peter Raven), "THE LIVING WORLD" and a widely used high school biology textbook, "HOLT BIOLOGY."
As the founding director of The Living World, the education center at the St. Louis Zoo, from 1987 to 1990, he was responsible for developing innovative high-tech exhibits and new educational programs.
Copyright George Johnson