Adaptive Complexity: January 2008

Wednesday, January 30, 2008

Self-Organizing Metabolism and the Origins of Life

The late Leslie Orgel, a pioneering researcher in pre-biotic evolution, has an interesting essay in the most recent issue of PLoS Biology. A long-running debate in origins-of-life research has been over what came first: genetic material or a self-organizing metabolism.

The self-organizing metabolism theory has been most prominently argued by Stuart Kauffman. Orgel doesn't rebut Kauffman's theoretical work, but he does claim it makes unrealistic assumptions about peptide chemistry, and thus is unlikely to be what happened on earth 4 billion years ago.

Orgel makes some reasonable arguments, but what is really needed is more experimental work to figure out just how plausible this self-organizing metabolism idea really is.

Monday, January 28, 2008

Sequencing 1000 Human Genomes - How Many Do We Really Need?

A group of the world's leading sequencing centers has announced plans to sequence 1000 human genomes. The cost of the first human genome project was about $3 billion; by comparison, the next 1000 will be a steal at possibly only $50 million (and that's total cost, not per genome). But that's still a lot of money - why are we investing so much in sequencing genomes? It may be a lot up front, but the benefits, in terms of both economics and medical research, easily outweigh the cost of such a large project. By pooling sequencing resources and making large amounts of genome sequence data available up front, we can avoid inefficient and redundant sequencing efforts by independent research groups trying to discover gene variants involved in disease. In fact, it would probably be worthwhile to sequence 10,000 human genomes. With 1000 genomes, we're at least making a good start.

The 1000 genomes project comes at a time when new, genome-wide disease studies have created a need for an extensive catalog of human genetic variation, and new technology has provided a potentially efficient way to fill that need. Genome-wide association studies have scanned the genomes of thousands of people, sick and healthy, to find regions of DNA that may be involved in diseases like diabetes and heart disease. These studies have highlighted regions of our chromosomes that could be involved in disease, but the genome scans used are often still too low-resolution for researchers to pinpoint the exact genetic variant that might be involved. In many cases there are dozens or more genes inside of a disease-linked region, any of which could be the disease gene of interest. As a result, groups that do these genome-wide studies have to invest considerable time and resources mapping genetic variants at higher resolution - which of course significantly increases the effort required to get anything useful out of genome-wide association studies. What we need is a much better, much more detailed catalog of all the spots where humans vary in their genomes.

Although we're all roughly 99.9% similar in our genomes, that still leaves millions of DNA positions where we vary. And often we don't just vary at single DNA base-pairs (instances where in one position you might have a 'T', while I have a 'C'); small deletions and insertions of stretches of DNA also exist, and can be involved in disease. The 1000 genomes project intends to map both single-base changes and these 'structural variants.'

Many genetic variants important in health show up in just 1%, 0.1%, or even 0.01% of the population. Imagine a medically important genetic variant present in just 0.1% of the U.S. population; 300,000 people will have this variant and will possibly be at risk for a particular disease. Knowing about such variants could help us understand how the disease develops, and possibly design prevention and treatment strategies. For 300,000 people in the U.S., and millions more around the world, it would therefore be a good thing to know about this variant. But to find those rare variants, we can't just sequence 100 human genomes, and even with 1000, we would miss a lot.
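To put rough numbers on this, here is a small sketch of the sampling arithmetic. It assumes each sequenced person is an independent random draw from the population and that a carrier is always detected when sequenced, which is a simplification. The function name and the round U.S. population figure of 300 million (implied by the 300,000 carriers above) are illustrative assumptions, not anything taken from the 1000 genomes project itself.

```python
def chance_of_seeing_variant(carrier_fraction, genomes_sequenced):
    """Probability that at least one sequenced person carries the variant.

    Assumes independent random sampling and perfect detection.
    """
    return 1.0 - (1.0 - carrier_fraction) ** genomes_sequenced


US_POPULATION = 300_000_000  # assumed round number, for illustration only

for fraction in (0.01, 0.001, 0.0001):
    carriers = fraction * US_POPULATION
    for n in (100, 1000, 10_000):
        p = chance_of_seeing_variant(fraction, n)
        print(f"{fraction:.2%} of people ({carriers:,.0f} carriers), "
              f"{n:>6} genomes: {p:.1%} chance of seeing it at least once")
```

Under these assumptions, a variant carried by 0.1% of people turns up at least once in roughly 63% of 1000-genome samples, but in fewer than 10% of 100-genome samples. Seeing a variant once is also a weaker standard than reliably distinguishing it from a sequencing error, so the real requirement is stricter than this sketch suggests.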

So it's obvious that 1000 genomes is just a start. To go more aggressively after rare variants that are likely to be medically important, the 1000 genomes group is also going to focus on the small gene-containing fraction of the human genome. This will enable them to put more resources into finding rare variants near genes - places where we expect medically important variants to show up.

And let's not forget the technological benefits: by using next-generation sequencing technology, which is still not quite mature, this consortium hopes to develop innovative ways to effectively and cheaply re-sequence human genomes. In both physical technology and data analysis methods, we'll see benefits that will lead to more widespread use of this technology, hopefully in clinical diagnostics as well as research.

It has been nearly 20 years since the Human Genome Project officially began. We're still waiting for the promised medical benefits to emerge, but in the meantime, this effort has transformed biological research - in my opinion, at least as much as the foundational discoveries of molecular biology in the 50's and 60's. Future medical benefits will be based on this science.

Saturday, January 12, 2008

Richard Feynman on Doubt

[Fill in she instead of he as appropriate...]

"The scientist has a lot of experience with ignorance and doubt and uncertainty, and this experience is of very great importance, I think. When a scientist doesn't know the answer to a problem, he is ignorant. When he has a hunch as to what the result is, he is uncertain. And when he is pretty damn sure of what the result is going to be, he is in some doubt."

A scientific theory that withstands that kind of scrutiny for over a hundred years is a damn good theory.

(Quote is from The Pleasure of Finding Things Out, p. 146 - the book, not the TV documentary transcript of the same name.)

An "Irrational Attachment to The Theory of Evolution"?

In Gail Collins' NY Times column today, she says this:

Huckabee seems to be a nice guy, but conservatives are afraid he’d break up the old evangelical-plutocrat Republican alliance and most liberals are restrained by their irrational attachment to the theory of evolution.

Excuse me? I can't quite tell if Collins is being tongue-in-cheek (I frankly don't read her column enough to get where she's coming from), but it looks like Collins is the rationally challenged one here. What she just said is just as absurd as something like: "most liberals are restrained by their irrational attachment to the theory of quantum mechanics."

For people like myself who prefer to reside in the reality-based community, acceptance of the evidence for evolution is in fact an excellent litmus test for people who want my vote. It's a good indication of how decision-makers value evidence versus pet beliefs. Scientific evidence does not necessarily dictate what the correct government policy should be, but it sure as hell can rule out harebrained ones. I view that as a good thing.

This post isn't an endorsement or a slam against any particular political party or candidate - except of course candidates who have an irrational opposition to very successful, fundamental fields of science. The point is, creation vs. evolution isn't just some freak side issue on the fringes of the culture wars - it cuts to the heart of how people respond to the single most successful approach humans have developed for understanding and influencing the reality-based world.

Thursday, January 10, 2008

What Next Generation DNA Sequencing Means For You

Of all the 'Greatest Scientific Breakthroughs' of 2007 heralded in the pages of various newspapers and magazines this past month, perhaps the most unsung one is the entrance of next-generation DNA sequencing onto the stage of serious research. Prior to this year, the latest sequencing technologies were limited in their usefulness and accessibility due to their cost and a steep technical learning curve. That's now changing, and a group of recent research papers gives us a hint of just how powerful this new technology is going to be. Not only will next-generation sequencing be the biggest change in genomics since the advent of microarray technology, but it may also prove to be the first genome-scale technology to become part of everyday medical practice.

Sanger DNA sequencing is one of the most important scientific technologies created in the 20th century. It's the dominant method of sequencing DNA today, and very little of the best biological research of the last 20 years could have been done without it, including the whole genome sequencing projects that have thoroughly transformed modern biology. Now, new next-generation sequencing methods promise to rival Sanger sequencing in significance.

So what's so great about the latest sequencing technology? Sanger sequencing is inherently a one-at-a-time technology - it generates a single sequence read of one region of DNA at a time. This works well in many applications. If you're sequencing one gene from one sample, Sanger sequencing is reliable, and generates long sequence reads that, under the right conditions, can average well over 500 nucleotide bases. You get a nice clean readout of a single strand of DNA, as you can see in this example:

Modern sequencing machines that use this method generate a 4-color fluorescent dye readout, which you can see in the graph in the figure. Each peak of fluorescence in the graph represents one nucleotide base, and you know which base it is from the color of the dye.

Next-generation sequencing (pyrosequencing is one of several such methods) can't generate the nice, long sequence reads you get with Sanger sequencing, nor are the individual reads as accurate. Instead of 500 DNA bases or more, on some platforms you get only about 25 bases. But the difference is that you get lots and lots of sequence reads. Instead of just one long read from just one gene (or region of the genome), you get thousands of short, error-prone reads, from hundreds or thousands of different genes or genomic regions. Why exactly is this better? The individual reads may be short and error prone, but as they add up, you get accurate coverage of your DNA sample; thus you can get accurate sequence of many regions of the genome at once.
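How short, error-prone reads can add up to an accurate sequence is easiest to see with a toy example. The sketch below assumes the reads have already been placed at known start positions (real pipelines have to work that out by alignment or assembly) and simply takes a majority vote at each position. The function name and the tiny made-up reads are illustrative only.

```python
from collections import Counter, defaultdict


def consensus(placed_reads, length):
    """Majority-vote base call at each position from overlapping reads.

    placed_reads: list of (start_position, read_sequence) pairs.
    Positions with no coverage are reported as 'N'.
    """
    pileup = defaultdict(Counter)
    for start, read in placed_reads:
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    return "".join(
        pileup[pos].most_common(1)[0][0] if pileup[pos] else "N"
        for pos in range(length)
    )


# Toy data: the true sequence is ACGTACGTAC; two reads contain one error each.
reads = [
    (0, "ACGTA"),
    (0, "ACCTA"),  # error at position 2 (C instead of G)
    (2, "GTACG"),
    (3, "TACGT"),
    (4, "ACGTA"),
    (5, "CGTAC"),
    (5, "CGTTC"),  # error at position 8 (T instead of A)
]
print(consensus(reads, 10))
```

Each erroneous base is outvoted by the correct bases from other reads covering the same position, which is why deeper coverage compensates for lower per-read accuracy.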

Next-generation sequencing isn't quite ready to replace Sanger sequencing of entire genomes, but in the meantime, it is poised to replace yet another major technology in genomics: microarrays. Like next-generation sequencing, microarrays can be used to examine thousands of genes in one experiment, and they are one of the bedrock technologies of genomic research. Microarrays are based on hybridization - you're basically seeing which fluorescently labeled DNA from your sample sticks (hybridizes) to spots of DNA probes on a microchip. The more fluorescent the spot, the more DNA of that particular type was in the original sample, like in this figure:

But quantifying the fluorescence of thousands of spots on a chip can be unreliable from experiment to experiment, and some DNA can hybridize to more than one spot, generating misleading results.

Next-generation sequencing gets around this by generating actual sequence reads. You want to know how much of a particular RNA molecule was in your sample? Simply tally up the number of sequence reads corresponding to that RNA molecule! Instead of measuring a fluorescent spot, trying to control for all sorts of experimental variation, you're just counting sequence reads. This technique works very well for some applications, and it has recently been used to look at regulatory markings in chromatin, to find where a neural regulatory protein binds in the genome, to look at the differences between stem cells and differentiated cells, and to see how a regulatory protein behaves after being activated by an external signal.
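The counting idea can be sketched in a few lines. This assumes each read has already been assigned to the RNA it came from (the hard part in practice), and the gene names are made up for illustration. Scaling to reads per million is one common way to compare samples of different sizes, not necessarily what any of the studies mentioned above did.

```python
from collections import Counter


def reads_per_million(read_assignments):
    """Tally reads per RNA and scale the counts to reads per million."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    return {name: 1_000_000 * n / total for name, n in counts.items()}


# Hypothetical assignments: one entry per sequence read.
assignments = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"] * 1
for name, rpm in sorted(reads_per_million(assignments).items()):
    print(f"{name}: {rpm:,.0f} reads per million")
```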

I've left out one of the major selling points of this technology: it's going to be cheap. You get a lot of sequence at a fairly low cost. And this is why it may end up being the one technology that truly brings the benefit of genomics into our everyday medical care. Because next-generation sequencing is cheap and easy to automate, diagnostics based on sequencing, especially cancer diagnostics, will become much more routine, and so will treatments based on such genetic profiling. It will be much easier to look at risk factors for genetic diseases. Microbial infections will be easier to characterize in detail.

All of this is still a few years off, but the promise of this technology is already apparent enough to include it among the great breakthroughs of 2007.

Go look at the very informative websites of 454 Life Sciences, Illumina, and Applied Biosystems, the major players in next-generation sequencing.

For more on Sanger sequencing, check out Sanger's Nobel Lecture (pdf file).

A recent commentary and primer on next-generation sequencing in Nature Methods (subscription required).

Real Science vs Intelligent Design

If Intelligent Design advocates are so insistent that most of the human genome is functional, why aren't they doing any research like this? Eric Lander's group at MIT devised a way to test whether the thousands of non-conserved, putative protein-coding genes are likely to be spurious or true protein-producing genes.

From the paper, here is their rationale:

The three most widely used human gene catalogs [Ensembl, RefSeq, and Vega] together contain a total of 24,500 protein-coding genes. It is broadly suspected that a large fraction of these entries is simply spurious ORFs, because they show no evidence of evolutionary conservation. [Recent studies indicate that only 20,000 show evolutionary conservation with dog.] However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation; the alternative hypothesis is that these ORFs are valid human genes that reflect gene innovation in the primate lineage or gene loss in other lineages.

Here is what they test:

The purpose of this article is to test whether the nonconserved human ORFs represent bona fide human protein-coding genes or whether they are simply spurious occurrences in cDNAs.

And here is their conclusion:

Here, we provide strong evidence to show that the vast majority of the nonconserved ORFs are spurious.

This is how you do science. If ID advocates were serious about science, they would be testing similar hypotheses and publishing them.

Wednesday, January 09, 2008

Scientific American and Web 2.0

What are blogs, wikis, and networking sites doing for science? Scientific American has an upcoming feature article about science and the Web 2.0. And in the spirit of Web 2.0, they're inviting your comments, so go check it out.

It's a fascinating topic, and I'll have more to say about it soon.

Thanks to Jean-Claude Bradley over at Scientific Blogging for mentioning this.

Nature comments on creationism

Tomorrow's Nature issue has an editorial (subscription only) praising the latest NAS book on evolution/creationism. The editorial goes on to suggest that:

"Between now and the 200th anniversary of Charles Darwin's birth on 12 February 2009, every science academy and society with a stake in the credibility of evolution should summarize evidence for it on their website and take every opportunity to promote it."

They also post a link to paleontologist Kevin Padian's testimony at the Kitzmiller intelligent design trial - it's worth checking out.

Tuesday, January 08, 2008

PNAS Evolution Editorial

If you can get journal access, check out this editorial by creation/evolution veteran Francisco Ayala in the latest edition of the Proceedings of the National Academy of Sciences.

Monday, January 07, 2008

Yes, you do have to be a genius to read this blog...

Although if the blog is that incomprehensible, I'm not sure what that implies about the writer.

Sunday, January 06, 2008

What Genes Did We Lose to Become Human?

When we think of the genetic changes that took place during our evolutionary history, we typically think of changes that resulted in a gain of function, like genetic changes that resulted in a larger and more sophisticated brain, improved teeth for our changing prehistoric diet, better bone anatomy for bipedalism, better throat anatomy for speech, and so on. In many cases however, we have lost genes in our evolutionary history, and some of those losses have been beneficial. The most widely known example, found in every introductory biochemistry textbook, is the sickle-cell mutation in hemoglobin - a clear example of a mutation that damages a functional protein yet confers a beneficial effect. People with mutations in both copies of this particular gene are terribly sick, but those who have one good and one bad copy are more resistant to malaria. Another example is the CCR5 gene - people with mutations that damage this gene are more resistant to HIV. In the more distant past, a universal human mutation in a particular muscle gene that results in weaker jaw muscles may have played a role in brain evolution, by removing a constraint on skull dimensions.

These few examples were found primarily by luck, but now with the availability of multiple mammalian genome sequences, researchers can systematically search for human genes that show signs of being adaptively lost at some point in our history. David Haussler's group at UC Santa Cruz, in a recent paper, looked for the genes we lost as we developed into our modern-day human species. What they found could help us better understand our evolutionary history, and possibly the human diseases that are the side-effects of that history.

It's not hard to find genes that have been lost in the human genome - our genomes are littered with pseudogenes, genes which harbor inactivating mutations making them unable to produce a functional protein. But most of these damaged genes have functional copies elsewhere in the genome. Genes are frequently duplicated in the random shuffling that goes on in our chromosomes, and often the duplicate copy will be destroyed by mutations while the good copy continues to perform its original function. Another frequent phenomenon is the production of processed pseudogenes - these are genes that were produced when RNA was transcribed back into DNA and integrated into the genome. The result is a gene that looks just like a highly processed RNA molecule, often surrounded by the classic genetic residue that is left behind when a piece of DNA is integrated back into the genome.

Looking for lost genes

Haussler's group was not interested in these two relatively mundane classes of pseudogenes; they were searching for genes that are clearly functional in other mammals, and which have not been duplicated or reverse transcribed from RNA. In other words, the genes they were looking for would have no other functional copies hanging around somewhere else in the genome. Genes in this category are likely to be candidates for adaptive losses - genes whose function has been completely eliminated from the human genome, which may have provided some benefit to the original ancestor in which the gene was lost.

These researchers performed their search by comparing functional genes in the mouse genome with genes in the human and dog genomes. Mice are more closely related to humans than dogs are; a gene that is present and functional in mice and dogs, but destroyed by mutation in humans has therefore been present in mammals for a very, very long time, but was recently lost in the human lineage.
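The selection criteria described in the last two paragraphs amount to a simple filter. The sketch below is only an illustration of that logic, not the pipeline used in the paper: the record fields, the gene names, and the idea that each property is already known as a yes/no flag are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    name: str                        # hypothetical gene label
    functional_in_mouse: bool
    functional_in_dog: bool
    disrupted_in_human: bool
    has_functional_human_copy: bool  # e.g. an intact duplicate elsewhere
    is_processed_pseudogene: bool    # reverse transcribed from RNA


def is_candidate_lost_gene(gene):
    """True if the gene is intact in mouse and dog, broken in human,
    and not explained by duplication or reverse transcription."""
    return (
        gene.functional_in_mouse
        and gene.functional_in_dog
        and gene.disrupted_in_human
        and not gene.has_functional_human_copy
        and not gene.is_processed_pseudogene
    )


genes = [
    GeneRecord("geneX", True, True, True, False, False),   # candidate loss
    GeneRecord("geneY", True, True, True, True, False),    # duplicate survives
    GeneRecord("geneZ", True, True, True, False, True),    # processed pseudogene
    GeneRecord("geneW", True, False, True, False, False),  # not intact in dog
]
print([g.name for g in genes if is_candidate_lost_gene(g)])
```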

You can see how this works in the figure below (which I have marked up slightly from the original version in the paper). The mouse and human versions of our hypothetical gene line up nicely, but there is an inactivating mutation in the human version. In the dog version (not shown), that mutation is absent.

What they found

After completing their genome scan, and applying various quality control filters, Haussler's group came up with 72 candidate lost genes. They found some well-known lost genes, such as GULO, the gene for an enzyme necessary for making vitamin C, which has been destroyed in primates but is still functional in most other mammals. (If we still had a functional copy of this gene, we wouldn't get scurvy.)

These researchers found new lost genes as well, most of which have poorly characterized functions. One gene, named ACYL3 (NM_177028 in the mouse genome, for those of you who want to check out GenBank), contains a very highly conserved enzyme structure, called an acyltransferase domain (see more here). This particular acyltransferase domain is very ancient - it is found in bacteria, archaea, plants, fungi, and animals. Many species have multiple copies of this domain (the fruitfly has over 30), but mammals have only a few copies, and humans have absolutely no functional copies. We know almost nothing about this gene: it produces a membrane protein, it is expressed in the mouse pituitary gland, and it is necessary for normal embryo development in worms and flies. Why was it lost in humans, and was that loss beneficial? We don't know yet. Do we get some diseases that mice don't get, because we lack this gene? It would be fascinating to find out.

Although we know little about ACYL3, Haussler's group was able to pinpoint the timeframe of the loss. This gene was destroyed by a nonsense mutation, a change from TGG (coding for the amino acid tryptophan) to TGA, which means stop - the protein is truncated right in the middle of a highly conserved region. This TGG to TGA change is found only in chimps and humans, not gorillas, orangutans, or any other mammals checked. The mutation therefore happened after the chimp-human lineage split off from the other great apes. Gorillas and orangutans have a functional ACYL3 gene; humans and chimps don't.
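The reasoning here can be illustrated with a toy example: scan each species' coding sequence codon by codon, flag any stop codon that appears before the end, and see which species share it. The short sequences below are invented for illustration and are not the real ACYL3 sequences; only the TGG to TGA change and the species pattern are taken from the description above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons in the standard genetic code


def first_premature_stop(coding_sequence):
    """Return the index of the first in-frame stop codon that occurs
    before the final codon, or None if there is no premature stop."""
    usable = len(coding_sequence) - len(coding_sequence) % 3
    codons = [coding_sequence[i:i + 3] for i in range(0, usable, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


# Invented toy sequences: codon 2 is TGG (tryptophan) or TGA (stop).
toy_sequences = {
    "mouse":     "ATGGCTTGGCCATAA",
    "orangutan": "ATGGCTTGGCCATAA",
    "gorilla":   "ATGGCTTGGCCATAA",
    "chimp":     "ATGGCTTGACCATAA",
    "human":     "ATGGCTTGACCATAA",
}

truncated = sorted(
    species for species, seq in toy_sequences.items()
    if first_premature_stop(seq) is not None
)
print(truncated)
```

When the same stop codon sits at the same position in exactly two sister lineages and nowhere else, the simplest reading is a single mutation in their shared ancestor rather than two independent events.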

Other lost genes found by these researchers included genes lost in only chimps, gorillas and humans, genes lost in all great apes, and genes lost in primates. Each of these losses is an example of an ancient gene that was functional for hundreds of millions of years, but then lost very recently in the lineage that led to primates, and ultimately humans. Without detailed functional studies in multiple species, it is hard to know which genes were lost adaptively, providing an immediate benefit sustained by natural selection, and which were lost simply because they were no longer necessary in a particular environment (such as the vitamin C biosynthesis enzyme). But with this list of genes in hand, we know where to start. It would be fascinating to figure out if any of these lost genes are linked to human diseases (like GULO and scurvy), since the lost function could provide an important clue regarding the mechanism of the disease.

It's worth noting the implications of studies like this for the creation-evolution debate. Creationists, including most of the major advocates for its latest form, Intelligent Design, have frequently expressed their disbelief in human evolution and the common ancestry of today's species. In the case of ACYL3, the only plausible explanation for the pattern of mutation that Haussler's group found is that humans and chimps shared a common ancestor. If humans and chimps had been created de novo as separate species, it would be an extremely unlikely coincidence for these identical mutations to occur by chance individually in each species. And when you factor in the distribution of dozens, hundreds, thousands of such mutations, all occurring in a similar pattern, it becomes not just unlikely but essentially impossible that such patterns of mutation would have arisen by chance in each separately created species. One could suggest that a designer just decided to arrange things this way, but that argument falls in the same class as the claim that God just made the universe and the earth appear old, after creating it all 10,000 years ago. There is simply no rationale based on evidence to question the fact of common descent, and those who continue to resist this fact can only do so for religious or psychological reasons.

Figures from the original paper were annotated, cropped and posted under the PLoS Open Access License.

Friday, January 04, 2008

A Science Debate in the US Presidential Election?

It's a long shot, but a worthwhile cause. If you care about science issues in the upcoming US election, consider signing up to support Science Debate 2008.

Check out this book

The National Academy of Sciences has revised this classic book on Evolution and Creationism. (It's free online.)

How To Grow a New Head: The Amazing Regenerative Powers of Planaria

Planarians have fascinated biologists for centuries with their amazing powers of regeneration. If you decapitate a planarian, the body can grow a new head, and the head can grow a new body. In fact, if you cut out a very tiny chunk from the side of a planarian, that chunk will be able to regenerate a new, complete organism. How do these strange critters manage this? What genes do they have that we don't have? As it turns out, most planarian genes are shared with humans, and several groups of scientists are using the latest tools of genomics and molecular biology to figure out just what it is that gives planarians their remarkable powers of regeneration. These researchers hope that planarians will ultimately teach us how to regenerate injured human tissue.

Planarians are small flatworms that live in ponds and rivers on almost every continent. They are among the simplest organisms that possess a central nervous system: behind the two cross-eyed looking eye spots is a small bundle of neurons that serves as a brain. Extending from the brain is a primitive yet effective nervous system. Planarians also have a digestive system, and, in some species, both male and female reproductive organs (sexual planarians are hermaphrodites). All of these specialized tissues can be regrown from a tiny chunk (as few as about 10,000 cells) cut out of the side of the worm. The great geneticist Thomas Morgan, before he went on to do his famous fruitfly work, discovered (in the kind of work that would be a 10 year-old boy's dream come true) that as little as 1/279th of a planarian is enough to regrow the entire body. This is an incredible feat for such a small chunk of cells - the cells have to reorient themselves, establishing anterior and posterior ends, as well as bilateral symmetry. These cells also have to reform every type of tissue in the organism. (For some fascinating pictures of a regrowing head, check out this PDF file.)

Revealing the Secrets of Regeneration

The secret to these remarkable powers is a population of stem cells, called neoblasts, scattered throughout the organism. How these neoblasts function in regeneration has been the focus of intense research, notably in the labs of Alejandro Sanchez Alvarado at the University of Utah, and Phillip Newmark, at the University of Illinois, Urbana-Champaign. These researchers are using the latest tools of genetics, such as knocking down genes with RNAi, to identify those genes that enable these stem cells to properly rebuild complete planarian bodies. And to leverage all of the tools that modern molecular biology has to offer, the Genome Sequencing Center at Washington University is currently sequencing a planarian genome.

Researchers in the Sanchez Alvarado lab recently identified two important genes involved in this process, genes that just happen to be important genes in all animals, including humans. These researchers found that when they shut down a beta-catenin gene using RNAi, planarians with amputated tails would grow a new head, instead of a new tail. In fact, shutting down beta-catenin could also transform tails into heads in whole animals with no amputated parts at all. The Sanchez Alvarado lab also found that a gene which antagonizes beta-catenin, called APC (the planarian version of a gene involved in human colon cancer), has the opposite effects of beta-catenin: shutting down APC results in planarians that regrow tails in the place of amputated heads. These two genes, highly conserved in all animals, thus appear to work as master switches in planarian regeneration. An important emerging theme in modern biology is that the amazingly diverse traits found among animals are not the result of new genes for each new feature; instead, this diversity arises from how a core set of genes is put to use. Planarians share the majority of their genes with us, which means that we may some day learn how planarians use their genetic toolkit, and thus be able to manipulate human genes to perform such amazing feats of regeneration. While we won't be regrowing human heads, there is a very real hope that we'll be able to regrow human nerve tissue, such as spinal cords.

Planarians in Evolution

Planarians are also remarkable examples of creatures that appear 'half-way evolved' (of course modern planarians have been evolving as much as we have): they have features that likely resemble evolutionary intermediates of complex vertebrate organs, such as eyes. Planarian eyes don't have lenses, but they do have light sensing cells hooked up to the central nervous system, as well as a curved 'optic cup' that enables planarians to gain some information about the direction of a light source. Another primitive feature of planarians is the pattern of early cell divisions in planarian embryos: in contrast with the highly ordered process seen in vertebrates, planarian embryos undergo a series of very disordered cell divisions. And yet this system of disordered embryogenesis produces perfectly good adult organisms. People often find it hard to imagine how evolution can produce complex processes or structures, such as human eyes, or the development of a human embryo. How could an eye progressively evolve? In planarians, we can see that much more primitive physiological systems, having only some of the features of their more complex human counterparts, can be perfectly functional and therefore sustained by natural selection.

For more on planarians, and some fun pictures, check out:

http://www.planarians.org/
and this cool picture.

Photo credit: Wikimedia Commons (uploaded by Alejandro Sanchez), under the Creative Commons License