Monday, February 26, 2007

A Genome-wide Association Study to Find Genes Linked to Diabetes

I meant to get this post about an important new paper up 10 days ago, but illness intervened. The paper I discuss has now been published in the latest print issue of Nature.

Genome-wide association studies are hot right now, and Nature has recently published a large study that identifies new genes possibly linked to type II diabetes.

For those of you not familiar with such studies, here is a very brief, oversimplified account - for more in-depth commentary on such studies, check out Nature's News and Views accompanying the article (subscription required).

The goal of genome-wide association studies is to find genes linked with a disease, in this case, diabetes. Diseases such as cystic fibrosis result from a catastrophic defect in a single gene, have clear inheritance patterns, and involve few environmental factors, so it's relatively straightforward to identify the gene and even understand how the defective gene results in the disease. But most common diseases that afflict industrialized societies, especially the US, such as heart disease and diabetes, are extremely complex. They result from the interplay of multiple environmental and genetic factors. A major goal of biomedical research is to identify these factors and to understand how they contribute to the development of the disease.

So how do you find the genes involved in something like diabetes? Ideally, you would get a study population of thousands of people, and compare the genomes of those who have diabetes with the genomes of those who don't have it. Variants of genes that tend to be found among diabetics, but not healthy subjects, would then be candidate diabetes genes. Unfortunately, we cannot (yet) sequence the complete genomes of thousands of people with the limited time and money available. And in fact, I'm exaggerating when I say that sequencing entire genomes would be ideal - the vast majority of our genome does not vary among individuals, so we really only want to look at those sequences where we vary. Major efforts have gone into looking at where human DNA varies among individuals (check this out), so we have some idea of where to check our genomes for differences.

We also have new technologies that make it feasible to compare the genomes of thousands of people without completely sequencing each genome. We can use microarrays to probe hundreds of thousands of places in the genome where there are known differences among people (see here; also look at this PDF for an intro to a non-array technology). In most genome-wide association studies, the differences that researchers look for are SNPs (single nucleotide polymorphisms) - single 'letters' or bases of DNA that vary among people. At each SNP it is possible to find up to four different variants (either an A, T, C, or G, although at many of these points, only one or two of these variants will actually exist among humans). In the study published in Nature, the researchers looked for SNPs that tend to show up among diabetics.
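To make "SNPs that tend to show up among diabetics" a bit more concrete, here is a minimal sketch of one common way to test a single SNP: count how often each variant appears in diabetics and in non-diabetics, then run a chi-square test on the resulting 2x2 table. This is not necessarily the statistic the authors used, and the function name and the allele counts below are made up purely for illustration.

```python
import math


def allelic_association(case_counts, control_counts):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    case_counts and control_counts are (variant_allele, other_allele) tuples.
    Returns (chi2, p_value). Hypothetical helper, not from the paper.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row_cases, row_controls = a + b, c + d
    col_variant, col_other = a + c, b + d
    if min(row_cases, row_controls, col_variant, col_other) == 0:
        raise ValueError("every row and column of the table must be non-zero")

    # Standard shortcut formula for the 2x2 chi-square statistic.
    chi2 = n * (a * d - b * c) ** 2 / (
        row_cases * row_controls * col_variant * col_other
    )
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: the variant makes up 70% of alleles in diabetics
# and 60% of alleles in non-diabetics.
chi2, p_value = allelic_association((700, 300), (600, 400))
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

A small p-value says the difference in variant frequency between the two groups is unlikely to be chance alone. With hundreds of thousands of SNPs tested, some will look associated by chance, which is one reason a follow-up stage matters.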

The authors of this paper, working in Canada, France, and the UK, used blood samples from 1,363 people about evenly divided between type II diabetics and non-diabetics. They determined the identity of the DNA base for 400,000 SNPs in each subject - in more technical terms, they genotyped 400,000 SNPs in their subjects. To give you an idea of how much of the genome this covers, that's about 1 SNP for every 7,500 bases in the 3 billion base genome, which is reasonably good coverage. These SNPs, however, are not evenly spaced over the whole genome, so you don't literally have a SNP every 7,500 bases - it is thus important to select the right set of SNPs, a complicated issue which we won't get into here.
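The "1 SNP for every 7,500 bases" figure is just the genome length divided by the number of SNPs genotyped. A quick sketch of that arithmetic, using the rounded numbers from this post (the variable names are mine):

```python
# Rounded figures quoted in the post.
GENOME_LENGTH_BASES = 3_000_000_000
SNPS_GENOTYPED = 400_000

# Average spacing only; real SNP sets are not evenly spread over the genome.
average_spacing = GENOME_LENGTH_BASES / SNPS_GENOTYPED
print(f"About 1 SNP per {average_spacing:,.0f} bases on average")
```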

This first screen identified about 60 SNPs possibly associated with diabetes, but to rigorously test that association the authors looked at the SNPs in a much larger study population. Since in this second stage they were now only checking a small number of SNPs, it was feasible to genotype a much larger group of people. As opposed to 1,363 people, they genotyped 5,511, divided about evenly between diabetics and non-diabetics. Ultimately they were able to check 57 SNPs, and the association held up for eight of them, which fell within five regions, or loci, of the genome. One of these five loci contains a gene previously found to be linked with diabetes; this gene, TCF7L2, codes for a transcription factor. The other four loci also contained plausible candidate genes.
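To see why the two-stage design makes a larger second group feasible, here is a rough back-of-the-envelope sketch. It counts individual genotype calls (one SNP read in one person) as a stand-in for cost, which is my simplifying assumption, and it assumes the 5,511 second-stage subjects are separate from the first 1,363.

```python
# Figures from the post.
STAGE1_PEOPLE = 1_363
STAGE1_SNPS = 400_000
STAGE2_PEOPLE = 5_511
STAGE2_SNPS = 57

# Two-stage design: screen every SNP in the small group,
# then re-test only the surviving SNPs in the large group.
two_stage_calls = STAGE1_PEOPLE * STAGE1_SNPS + STAGE2_PEOPLE * STAGE2_SNPS

# Hypothetical alternative: genotype every SNP in everyone.
one_stage_calls = (STAGE1_PEOPLE + STAGE2_PEOPLE) * STAGE1_SNPS

print(f"Two-stage design: {two_stage_calls:,} genotype calls")
print(f"Everything in everyone: {one_stage_calls:,} genotype calls")
print(f"Ratio: about {one_stage_calls / two_stage_calls:.1f}x")
```

Under these assumptions the second stage adds only a tiny fraction to the total number of genotype calls, even though it involves about four times as many people as the first.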

The most amazing gene found was one for a zinc transporter protein that is expressed only in the beta cells that secrete insulin, a process which requires zinc. One of the SNPs in this locus is non-synonymous, that is, it produces an amino acid change in the zinc transporter protein, and such a change could easily, but not necessarily, affect the protein's function. This remarkable find immediately suggests treatment possibilities, such as drugs or diet supplements that could compensate for the change in this zinc transporter, though much more research will be needed to understand just what role this protein plays in the development of diabetes.

One of the most surprising conclusions in this paper is that most of the genetic variants associated with diabetes are the variants possessed by most of the human population, at least in the ethnic groups covered. In the case of the zinc transporter, those with a 'C' at that position in their DNA are more susceptible to diabetes - and most of us do have a 'C' at that position. The authors reported eight SNPs linked with diabetes, and in six of these cases the diabetes-associated variant is the major variant in the human populations studied. This suggests that most of us are genetically predisposed to diabetes, while a few of us are resistant, and thus for most of us environmental factors will play a large role in determining whether we develop diabetes. There is also an evolutionary story here - the major variants of these eight SNPs may have been beneficial in the environment in which our earliest human ancestors lived, but under our current diets these variants have become a liability. This is not the first time such an idea has been proposed, but the results of this study strengthen the evidence for this scenario.

This paper presents some of the earliest results using a genome-wide association approach with the new technologies described at the sites I linked to above. Whether such studies will substantially improve our understanding of major diseases, and our ability to treat them, is still an open question, but these first results appear promising. The authors of this study report that they found other promising genes, which they will report later after further analysis. The hope is that such studies will enable us to confidently identify people who are at risk, and develop new treatments for these diseases that are major killers in our society.

1 comment:

Valentin Dinu said...

GWAS are very promising. dbGaP, a new db from NCBI, will hold and make available data from several GWAS studies. this will be an excellent resource for understanding the causes of complex diseases. valentin dinu, http://www.dinuinformatics.info/