Adaptive Complexity: 2008

Saturday, March 22, 2008

Adaptive Complexity is Moving - Update your links

I'm completely moving this blog over to Scientific Blogging. If you follow this blog via RSS, the new link is:

Adaptive Complexity.

If you read this through the DNA Network, you'll see the new feed shortly.

My Scientific Blogging column is here.

If you're waiting for the long-delayed next installment on the science in Pynchon's Against the Day, it will go up there (soon, I hope).

Why am I moving? Scientific Blogging is a great and growing community of science bloggers, as opposed to an isolated blog on blogger.

On top of that, I get to write about science for a broader audience. The NSF's Science and Engineering Indicators consistently show that people's acceptance of well-established, mainstream scientific findings - that humans are the product of evolution, that the continents move, that the universe began with the Big Bang - correlates strongly with education. This is a major reason why scientists should be reaching out to the broader public to get people informed and excited about science.

So come on over and visit my new blogging home, and check out some of the other great writers there. And if you're so inclined, come on over and start your own science blog.

Friday, March 07, 2008

Statisticians are verifiably insane

R is a great free statistical software package. I'm trying to use it frequently enough to get comfortable with the basic commands and data objects, so that I don't have to dig through the help manual every time I want to do some small thing. I'm trying to break my Excel habit - if you want real statistics, you don't use Excel, right?

I'm writing my paper, and need to make a simple bar graph. The basic kind, with error bars, which you see in every single issue of every science journal published today. In about 1 minute I can make one, using Excel:

I'm thinking, 'wow, that was easy in Excel, but I should really learn to do it in R.' Of course in R, 35 minutes later, I still haven't figured out how to get the chart to look the way I want. So I go to the trusty R Help mailing list, confident that someone else has shared my frustration. And there I learn that statisticians hate these charts. Do we need any more evidence that statisticians are insane?

(I understand their point, but in most cases - come on! Bar charts are great!)
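For what it's worth, the numbers behind one of these charts are simple, whatever software draws it. Here is a minimal sketch in Python (not R), assuming the bar height is the sample mean and the error bar is the standard error of the mean, which is only one of several conventions. The group names and measurements are made up for illustration, and `bar_with_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def bar_with_error(samples):
    """Return (bar height, error-bar half-length) for one group.

    Assumption: the bar shows the sample mean and the error bar shows
    the standard error of the mean (sample standard deviation / sqrt(n)).
    """
    n = len(samples)
    return mean(samples), stdev(samples) / sqrt(n)


# Hypothetical measurements, purely for illustration.
groups = {
    "control": [4.0, 5.0, 6.0],
    "treated": [7.0, 9.0, 11.0],
}

for name, samples in groups.items():
    height, sem = bar_with_error(samples)
    print(f"{name}: bar height {height:.2f}, error bar +/- {sem:.2f}")
```

Any plotting tool then only needs those two numbers per group: one for the bar, one for the whisker.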

Thursday, March 06, 2008

Bad Science Journalism: The Myth of the Oppressed Underdog

There is a particular narrative about science that science journalists love to write about, and Americans love to hear. I call it the 'oppressed underdog' narrative, and it would be great except for the fact that it's usually wrong.

The narrative goes like this:

1. The famous, brilliant scientist So-and-so hypothesized that X was true.

2. X, forever after, became dogma among scientists, simply by virtue of the brilliance and fame of Dr. So-and-so.

3. This dogmatic assent continues unchallenged until an intrepid, underdog scientist comes forward with a dramatic new theory, completely overturning X, in spite of sustained, hostile opposition by the dogmatic scientific establishment.

We love stories like this; in our culture we love the underdog, who sticks to his or her guns, in spite of heavy opposition. In this narrative, we have heroes, villains, and a famous, brilliant scientist proven wrong.

I'm sure you could pick out instances in science history where this story is true, but more often it is not. You wouldn't know this from the pages of our major news media though; in fact you'd probably get the impression that the underdog narrative is the way science works. And many journalists may think that too; after all, most of them read (or misread) Thomas Kuhn when they were in college, and Kuhn brought this kind of narrative to a new high. The impression this narrative leaves is that science only progresses by the efforts of brave individuals who are willing to weather the wrath of the scientific establishment.

Why is this narrative about science wrong? Let me illustrate with the nearest example I have at hand right now: a piece out of the 2007 Best American Science and Nature Writing: "The Effeminate Sheep", by Jonah Lehrer, a piece that was originally published in Seed magazine. The piece is about Joan Roughgarden, an accomplished Stanford biologist, and the transgendered author of the book, Evolution's Rainbow, about the sexual diversity that we find in nature. As we learn in her book, gay sex is quite popular in the animal world - bonobos, fish, giraffes, whales, and big-horn sheep make humans look incredibly prudish.

This raises a question: being gay has obvious evolutionary fitness consequences - without modern medicine, you have to have heterosexual sex to have offspring. So is homosexuality in nature just a freak occurrence, a case of bad genes; or is it something that is in some way adaptive and therefore under selective pressure? Does this mean that there are problems with our current understanding of sexual selection in evolutionary theory?

(Important aside: let's make it clear right now that my intention is not to knock Dr. Roughgarden or her research, or her book - I'm talking here about how science is presented to the public. In fact I feel a little school pride in Roughgarden's accomplishments - she and I both spent some of our educational careers at the University of Rochester, though obviously not at the same time.)

Getting back to our underdog narrative, take a look at how Lehrer sets up the story. After giving a very brief introduction to Darwin's ideas about sexual selection, using the classic example of peacock tails, he writes:

"Darwin's theory of sex has been biological dogma ever since he postulated why peacocks flirt. His gendered view of life has become a centerpiece of evolution, one of his great scientific legacies."

There you have the classic start of the narrative: Darwin, our brilliant scientist, came up with a theory about evolutionary sexual selection, which has been dogma among biologists ever since.

But this story isn't true: Darwin's theories about selection took some time before they were widely accepted (in fact, Darwin's claim that all living species share a common ancestry was accepted before his ideas about selection). And even then, they weren't taken as dogma; researchers have been actively studying the subject for a long time. The theory of sexual selection has undergone heavy scrutiny and extensive modification, including an effort to put it within the mathematical framework of game theory - a development which didn't take place until 100 years after Darwin proposed sexual selection. Biological dogma ever since Darwin? Hardly! (Take a look at this book on the development and status of theories of sexual selection.)

But Lehrer doesn't bother to tell his readers any of this; it would spoil the underdog narrative. It's time to introduce the underdog scientist ready to overturn it all:

"Despite this new evidence [of gay sex among animals], sexual selection theory is still stuck in the nineteenth century. The Victorian peacock remains the standard-bearer. But as far as Roughgarden is concerned, that's bad science: 'The time has come to declare that sexual theory is indeed false and stop shoehorning one exception after another into a sexual selection framework ... To do otherwise suggests that sexual selection theory is unfalsifiable, not subject to refutation.'"

Roughgarden is the underdog against the scientific establishment - an establishment stuck in the nineteenth century, so willing to protect its pet theory that it will go so far as to make it unfalsifiable.

What is this revolutionary idea that the establishment is hostile towards? Roughgarden believes that homosexual behavior is an adaptive trait, one preserved by natural selection to play an important role in group cohesion, or as Lehrer puts it, "gayness is a necessary side effect of getting along." To support this, Roughgarden marshals examples of unorthodox sexual arrangements in many different species, and explains that these arrangements actually promote evolutionary fitness in complex animal societies. She has presented this evidence in a popular book, Evolution's Rainbow, and in a review article in the journal Science (subscription required).

Her ideas are unquestionably bold. She and her co-authors set themselves an ambitious task to replace the current theory of sexual selection:

"We think that the notion of females choosing the genetically best males is mistaken. Studies repeatedly show that females exert choice to increase number, not genetic quality, of offspring and not to express an arbitrary feminine aesthetic. Instead, we suggest that animals cooperate to rear the largest number of offspring possible, because offspring are investments held in common. We therefore propose replacing sexual selection theory with an approach to explaining reproductive social behavior that has its basis in cooperative game theory."

They go on to present their new mathematical formalism for their ideas. This is how science is supposed to work, incidentally. Roughgarden wrote a popular book, but didn't expect that to be a substitute for genuine scientific papers. She knows she has to convince her scientific peers, and to do that, she wrote a technical paper, spelling out the mathematical basis for her novel ideas. The next step is for the people who understand those mathematical details to check them out, work them over, and see how persuasive they are.

But that's not how the underdog narrative goes. According to Lehrer, by this point it should have been an open-and-shut case, if it weren't for the hostile scientific establishment:

"Despite Roughgarden's long list of peer-reviewed articles in prestigious journals, most evolutionary biologists remain skeptical of her conclusions... In the absence of something conclusive, most scientists stick with Darwin and Dawkins."

In the underdog narrative, it is wrong for the establishment to remain skeptical, which in reality is exactly the opposite of how science is supposed to work. It is not like a courtroom where innocence is the presumption; in science, a novel idea is unfounded until proven otherwise. And Roughgarden's publication record, impressive as it is, is not evidence that her hypothesis is correct. Nor is the fact that her review article was published in Science evidence that her idea is true. It does mean though that she's put something together serious enough to deserve a hearing. (One more nitpicking point - "scientists are sticking with Dawkins"? Dawkins may be the authority on the subject to someone like Lehrer who has probably read only popular books on evolution. Dawkins has done some professional work in this field, but he's not the reigning authority.)

Our underdog narrative is almost complete: we have a reigning scientific authority, Darwin, whose ideas are entrenched dogma among an establishment that is skeptical of our underdog scientist, whose ideas are so obviously true that they would have been accepted if it weren't for the closed-mindedness of the defenders of orthodoxy.

And closed-minded they are: Lehrer, instead of summarizing the real critiques that Roughgarden's paper generated (those with access to Science can read them here and here), suggests that biologists are unwilling to abandon their dogma of sexual selection and view homosexuality as anything but "sexual deviants" and "statistical outliers."

That's not exactly what Roughgarden's critics are saying. The responses to her Science review included two major criticisms: that Roughgarden did not correctly characterize sexual selection as it is currently understood, and that some of her assumptions in her game theory model were wrong. Most writers did think that she had offered something interesting, although not something which completely negates the theory of sexual selection; several writers suggested that current theories could be modified to incorporate Roughgarden's ideas. Lehrer does quote one scientist who basically says just that. He quotes PZ Myers, a University of Minnesota biologist:

"I think much of what Roughgarden says is very interesting. But I think she discounts many of the modifications that have been made to sexual selection since Darwin originally proposed it. So in that sense, her Darwin is a straw man. You don't have to dismiss the modern version of sexual selection in order to explain sexual selection of homosexuality."

Our narrative would not be complete without a final look at our persevering underdog:

"Roughgarden remains defiant," Lehrer writes. And we learn the real source of the establishment's skepticism. "I think many scientists discount me because of who I am," the transgendered Roughgarden says. "The theory is becoming Ptolemaic. It clearly has the trajectory of a hypothesis in trouble."

We are left with the impression of scientists hanging on to a sinking theoretical ship, unable to move forward in their understanding because they have something personally against the underdog of the narrative.

It's amazing science makes any advances at all, with such closed-minded people in control of the field! But that's not really how things work. The real story is an example of science operating the way it is supposed to operate: a researcher comes up with new and very interesting observations that seem to challenge our current understanding of an important problem. She works to put those observations under some sort of theoretical framework, and presents the results in a paper to her scientific peers. Her fellow scientists think the work is interesting, but remain unconvinced because the evidence or theoretical development is not yet sufficient to support the hypothesis. What should happen next is that our researcher should go out and collect more evidence, correct any mistakes in the analysis or make a persuasive reply to her critics, and try again. A major new idea, one which overturns an existing, well-supported theory, does not get established in one paper. There has to be follow-up and debate, and if the idea holds up to scrutiny it will be accepted.

Beware the underdog narrative in science journalism. This narrative severely misrepresents how science really works. It's designed to elicit our sympathy for a not-yet-established theory, maybe one that is socially attractive, and to arouse our indignation against the staid community of eggheaded scientists. This underdog narrative plays on our emotions, it makes for a good read, and helps us feel good about ourselves when we stand up for our convictions. What gets lost is the scientific method, the idea that novel proposals need to be thoroughly vetted and tested, no matter how intuitively attractive they are. That vetting process is done by a dynamic community of smart, educated, competitive people, who care passionately about science. It's a community where everyone wants to come up with the next big theory that overturns long-held beliefs. But that's hard to do, especially in fields where all the low-hanging fruit has been picked over by really talented people for decades or centuries. If a new theory is being presented in the media as the centerpiece of an underdog narrative, you can bet the farm that this theory is not yet substantiated by the evidence.

I said I wasn't writing this to knock Dr. Roughgarden, but I'm going to renege on that promise just a little: based on how she's quoted in the piece, she does seem to be feeding the narrative. She's not giving her colleagues enough credit for giving her ideas a hearing. What non-heterosexual behavior among vertebrates implies about evolution is a fascinating question, one plenty of biologists would be happy to know more about. But it's a question that's only going to be settled by evidence.

This is not a real scientific conference

I'm all for open debate over the genuine science behind global warming - researchers who have research results that don't jibe with the IPCC's consensus report should be free to present their stuff at real scientific conferences and not be ostracized just because their results are different.

But, as Real Climate points out, the Heartland Institute International Conference on Climate Change didn't even try to disguise the fact that it was a PR event, and not a scientific conference. The conference organizers invited speakers with this letter (PDF file), which clearly states that:

"The purpose of the conference is to generate international media attention to the fact that many scientists believe forecasts of rapid warming and catastrophic events are not supported by sound science, and that expensive campaigns to reduce greenhouse gas emissions are not necessary or cost-effective. " (emphasis mine)

There are big conferences which are basically promotional events (not necessarily a bad thing!) like the Bio-IT World Conference, but these are not to be confused with real research conferences like the Keystone Symposia or the Cold Spring Harbor Meetings, where the latest, and often unpublished research is presented primarily to other scientists, and not to the media or interested laypeople. And real scientific conferences aren't usually sponsored by organizations that have decided what the answer to a scientific question has to be beforehand (the Heartland Institute is specifically devoted to "discover, develop, and promote free-market solutions to social and economic problems" - I'm all for finding free-market solutions when they're feasible, but we ought to [gasp!] consider cases where the free market solution isn't the most efficient or beneficial).

John Tierney is having trouble understanding this distinction as well, implying that Heartland's advocacy simply counteracts government climate change-advocacy perpetuated by bureaucrats doling out funding at the NSF and NASA (never mind those pesky study sections!).

If you've never been to a real science conference, I suppose it can be hard to understand that, for all of the human failings that exist in science like anywhere else, there are strong institutional cultural mechanisms in place that make science work - like the fact that speakers at a meeting are chosen by a scientific organizing committee, and not by financial sponsors. Advocacy can play an important role in public policy, but research conferences aren't about advocacy.

(Real Climate has more.)

Wednesday, February 27, 2008

Science Blogging: There are Plenty of Readers Out There

There is currently a blogging debate going on about the absence of science from science blogs (initiated over at bayblab). Why are the most popular science blogs full of religion, politics, and controversy?

Larry at Sandwalk has also weighed in on the issue and notes that his posts devoted to genuine science get only a fraction of the readers that his more controversial posts get. There is no doubt that the evolution and creation controversy draws people - I see that in my traffic stats too.

So what should science blogs be about? I agree with Larry's point that a blog should be about whatever the writer is interested in; if you want to write about religion, science, or maybe even Thomas Pynchon (check my labels), go for it. But do you need to avoid the science to draw readers? Are most readers just bored to tears by our blogging on peer-reviewed research?

Not by a long shot, but you have to go out and find those readers. There are millions of readers who love to read the NY Times science section, Scientific American, Discover, and a bunch of other magazines like them. Books like "The Best American Science Writing" do reasonably well every year. These publications cover controversies, but they primarily have really good science writing - and maybe science bloggers could learn something from them.

Those of us who are professional scientists are so used to writing for our colleagues or blogging peers that we think, with a few tweaks, it should be easy to get the reader off the street interested too. If those readers aren't interested, we take it to mean that few people care about real science. But I don't believe it. It's hard work to write about science well for the average National Geographic or Discover reader, and those of us who are used to writing NIH proposals may have a lot to learn.

If your major goal is to have your posts linked to by most of the other science bloggers around, that's fantastic and admirable, but it's not the same thing as writing for a big popular audience. When we write science mainly for other bloggers, we're not going to get a lot of traffic for our serious science posts.

The other problem is being found in the overgrown jungle of cyberspace. It feels like there are more blogs than people out there, about all sorts of crazy stuff; so it takes a lot of work for an interested, non-professional science reader to dig up the good stuff. Things like Digg and reddit make it easier, but it's still hard. You probably know where I'm going if you've been following my blog - the other place I write for, Scientific Blogging, is focused on writing about science for a really broad audience. It has double the traffic of ScienceBlogs, so clearly it's reaching a different audience. The readers are out there. (And anyone can go sign up for a blog on Scientific Blogging.)

Money has also come up in the debate - should bloggers be paid? I don't do ads here, but Scientific Blogging does give the larger share of the ad revenue back to its writers, who are free to write what they want (as long as it is about real science - real being the key term here). So I write there, and get paid. But if you write a book, you get paid. If you write a science piece for the New Yorker or National Geographic, you get paid. The pay for blogging for me and most of the rest of us doesn't come close to the value of the time we put into it - those of us with day jobs are not about to get rich from this. I'm hoping to be able to head over to Amazon a little more often, and to not have to stop buying coffee at my favorite roaster when money runs out about a week before I get my paycheck. But I do this primarily to talk about science - if money were my major motivator for doing things, there is no chance in hell I'd be a postdoc right now.

Sunday, February 24, 2008

Plug-And-Play Inside Your Cells: Signals and Side Effects

My latest column is up on Scientific Blogging:

If you've ever had a severe asthma attack or gone into premature labor, there is a good chance you were given the drug terbutaline. Terbutaline can relax your involuntary smooth muscle when it's causing problems: in constricted airways during an asthma attack, or in the uterus during contractions. But if you've taken terbutaline, you've probably also noticed another effect: it can induce a pounding, racing heartbeat. How can one drug produce such opposite effects - relaxing smooth muscle in some parts of your body, while making your cardiac muscle work harder?

The answer is that terbutaline switches on a common information-processing module, called a signaling pathway, which gets used over and over in different cells to perform very different jobs. This information-processing module can be plugged into different cell types, where it will transmit signals from the environment outside the cell to the inside where the information is processed and acted upon. Because our cells use a common set of information-processing modules to carry out so many different jobs, it's easy for drugs that act on these modules to produce a wide range of side-effects.

Go read the rest at my column.

(Just to repeat my earlier note to long-time readers: from now on, my stuff aimed at a lay audience goes up on Scientific Blogging, more technical or personal stuff stays here on Blogger.)

Wednesday, February 20, 2008

Making Biology Easy Enough For Engineers

No, I'm not knocking the intelligence of engineers. But we're still not at the point where, in the words of synthetic biologist Drew Endy:

...when I want to go build some new biotechnology, whether it makes a food that I can eat or a bio-fuel that I can use in my vehicle, or I have some disease I want to try and cure, I don't want that project to be a research project. I want it to be an engineering project.

Just like designing a new bridge or a new car is not a scientific research project, designing biotechnology shouldn't always be a research project. But biology is still too hard, argues Drew Endy, in a reflective interview on The Edge. (Thanks to The Seven Stones for the tipoff).

Endy draws a distinction between those of us trying to reverse engineer complex biological systems and those who want to build them - you could say, systems biologists vs. synthetic biologists:

Engineers hate complexity. I hate emergent properties. I like simplicity. I don't want the plane I take tomorrow to have some emergent property while it's flying.

He seems to also be arguing that if we want to build truly predictive models of biological systems, like, say, an individual yeast, we should work on building biological systems, not just reverse engineering them:

If I wanted to be able to model biological systems, if I wanted to be able to predict their behavior when the environment or I make a change to them, I should be building the biological systems myself.

I understand this to mean that you start by engineering really simple things (individual genes), and move up to more complex things (promoters, chromosomes, genomes).

This sounds like a useful approach, but I still don't see how synthetic biology is going to go from engineering really, really simple systems to systems that approach the complexity of real organisms. In the case of mechanical or electrical engineering, the physical theory behind how these systems behave has been worked out, to a high level of sophistication, for decades. And thus we can engineer, fairly easily, things from thermostats to computers to Boeing planes.

But how do we go from building artificial genes and promoters to artificial metabolic pathways (without just copying and pasting an existing metabolic pathway, with minor tweaks)? Let's say you can cheaply synthesize a 50 million-base artificial chromosome, big enough to hold a set of metabolic or signaling pathways of your custom design. How do you choose what to put on your artificial chromosome?

I don't see how you can do it without a genuinely quantitative, formal, theoretical framework for treating biological systems, which we just don't have yet. To echo Endy's earlier quote on engineering, every new effort to model a biological system is a research project in itself, not a routine engineering task. How do we change that?

It's a fascinating interview, worth checking out.

Tuesday, February 12, 2008

If Darwin Had A Web Browser, He Would Never Have Written The Origin

I've got a post on Darwin and the pace of science today up at Scientific Blogging:

How can today's wired, multitasking scientist ever compete with the great scientists of the past? One feature of Darwin's work as a scientist was that it proceeded slowly, very, very slowly. He wrote massive groundbreaking books, compiled huge amounts of data on orchids, barnacles, and Galapagos animals, but all over a long period of time. Scientists in Darwin's day had hours to kill on long voyages, took long walks out in the field, and waited while their scientific correspondence leisurely wended its way across oceans or continents.

Even in the first half of the 20th century, great scientists are famous for what they accomplished on long walks, hiking trips, and train rides. Niels Bohr would walk for hours around Copenhagen and come up with groundbreaking ideas, while Werner Heisenberg spent weeks every year hiking in the mountains. Even Richard Feynman, working in our more modern (but still pre-internet) era, insisted on long blocks of time to concentrate; he likened his thought process to building a house of cards, easily toppled by distraction and difficult to put back together.

Does that mean the kind of science we do in our overscheduled, multitasking world will never be the same as it was in the past? Certainly in one sense it won't - earlier generations of scientists had one distinct advantage we don't have today: Servant

Read the rest here

A note to regular readers: I'm going to start a division of labor between my blogs - posts aimed at a broad, non-specialist audience will go up at my column on Scientific Blogging, and more technical stuff or personal rants (which I haven't written much of lately, but more on systems biology and genomics is coming) will stay here in their entirety. I'll put links and the first couple paragraphs of my Scientific Blogging posts here.

Sunday, February 10, 2008

Neil Shubin's Your Inner Fish and More on Darwin Day

Tuesday, Feb. 12 is the anniversary of Darwin's birthday, and has been dubbed Darwin Day. In celebration of Darwin's scientific achievement, many organizations are holding Darwin Day events.

Over at Scientific Blogging, we're celebrating with a feature page chock full of Darwin Day articles, links to Darwin Day blogging around the web, and highlights of events around the country. If you have a blog post you'd like highlighted for Darwin Day, head on over there, download the badge, and we'll put up a link to your post.

Included is my review of Neil Shubin's recent book, Your Inner Fish: A Journey Into the 3.5-Billion-Year History of the Human Body. Just to get you started, here is the first paragraph:

“‘What does the body of a professor share with a blob?’ Neil Shubin answers this and other questions about the evolutionary history of our anatomy in Your Inner Fish: A Journey Into The 3.5-Billion-Year History of the Human Body (Pantheon, 2008). As an undergraduate student considering a research career in science, I once endured a 7 AM human anatomy course. In my semi-conscious state, breathing the slightly disturbing fumes of the preservative that the teaching assistant kept spraying on the cadavers, I was thinking, ‘this is morbidly fascinating, but really not that relevant to what scientists do today.’ If Neil Shubin had been teaching my anatomy course, I wouldn’t have struggled to get out of bed and make it to class on time. His book is a fun, compelling tour of the evolutionary history of the human body, filled with dozens of examples that nicely illustrate why biology only makes real sense when it is understood in the context of evolution.”

Go read the rest over at Scientific Blogging

Saturday, February 09, 2008

Looking for Darwin Day Writers!!

Darwin Day is coming up Tuesday, and I'm sure a lot of you bloggers out there will be writing good stuff. Over at Scientific Blogging, we're organizing a last-minute Darwin Day Carnival. Download the badge here (keep in mind the page is still in the draft stage), put the badge in your post, and we'll provide an intro and link to your post.

Scientific Blogging gets almost 600,000 visitors per month - if you want a larger audience for your Darwin Day post, come take advantage of this Carnival.

Any questions? Leave a comment in this post, and I'll get back to you.

Tuesday, February 05, 2008

Super Tuesday Links

I'm trying to find time to write a paper, and thus I've neglected my blog. But here are two interesting links:

Super Tuesday and Science Debate 2008: The US National Academies of Sciences have joined in the call for a US Presidential Election Science Debate. The National Academies have offered to co-sponsor this event. If you haven't heard of this movement yet, go check it out.

One of the pioneers of molecular biology, Nobel Laureate Joshua Lederberg, passed away this past weekend. Lederberg worked in an era very different from today's big-team biology. His brilliant work making bacterial genetics possible is inspiring to at least one biologist who longs for more science based on the ingenuity and creativity of individual researchers, as opposed to the large-scale, brute-force work common in today's biomedical research.

Wednesday, January 30, 2008

Self-Organizing Metabolism and the Origins of Life

The late Leslie Orgel, a pioneering researcher in pre-biotic evolution, has an interesting essay in the most recent issue of PLoS Biology. A long-running debate in origins-of-life research has been over what came first: genetic material or a self-organizing metabolism.

The self-organizing metabolism theory has been most prominently argued by Stuart Kauffman. Orgel doesn't rebut Kauffman's theoretical work, but he does claim it makes unrealistic assumptions about peptide chemistry, and thus is unlikely to be what happened on earth 4 billion years ago.

Orgel makes some reasonable arguments, but what is really needed is more experimental work to figure out just how plausible this self-organizing metabolism idea really is.

Monday, January 28, 2008

Sequencing 1000 Human Genomes - How Many Do We Really Need?

A group of the world's leading sequencing centers have announced plans to sequence 1000 human genomes. The cost of the first human genome project was about $3 billion; by comparison, the next 1000 will be a steal at possibly only $50 million (and that's total cost, not per genome). But that's still a lot of money - why are we investing so much in sequencing genomes? It may be a lot up front, but the benefits, in terms of both economics and medical research, easily outweigh the cost of such a large project. By pooling sequencing resources and making large amounts of genome sequence data available up front, we can avoid inefficient and redundant sequencing efforts by independent research groups trying to discover gene variants involved in disease. In fact, it would probably be worthwhile to sequence 10,000 human genomes. With 1000 genomes, we're at least making a good start.

The 1000 genomes project comes at a time when new, genome-wide disease studies have created a need for an extensive catalog of human genetic variation, and new technology has provided a potentially efficient way to fill that need. Genome-wide association studies have scanned the genomes of thousands of people, sick and healthy, to find regions of DNA that may be involved in diseases like diabetes and heart disease. These studies have highlighted regions of our chromosomes that could be involved in disease, but the genome scans used are often still too low-resolution for researchers to pinpoint the exact genetic variant that might be involved. In many cases there are dozens or more genes inside a disease-linked region, any of which could be the disease gene of interest. As a result, groups that do these genome-wide studies have to invest considerable time and resources mapping genetic variants at higher resolution - which of course significantly increases the effort required to get anything useful out of genome-wide association studies. What we need is a much better, much more detailed catalog of all the spots where humans vary in their genomes.

Although we're all roughly 99.9% similar in our genomes, that still leaves millions of DNA positions where we vary. And often we don't just vary at single DNA base-pairs (instances where in one position you might have a 'T', while I have a 'C'); small deletions and insertions of stretches of DNA also exist, and can be involved in disease. The 1000 genomes project intends to map both single-base changes and these 'structural variants.'

Many genetic variants important in health show up in just 1%, 0.1%, or even 0.01% of the population. Imagine a medically important genetic variant present in just 0.1% of the U.S. population; 300,000 people will have this variant and may be at risk for a particular disease. Knowing about such variants could help us understand how the disease develops, and possibly design prevention and treatment strategies. For 300,000 people in the U.S., and millions more around the world, it would therefore be a good thing to know about this variant. But to find those rare variants, we can't just sequence 100 human genomes, and even with 1000, we would miss a lot.
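For the numerically inclined, here is a rough sketch of why sample size matters so much for rare variants. It is my own back-of-the-envelope model, not anything from the 1000 genomes group: I assume each sequenced person carries the variant independently with probability equal to the carrier fraction, and I only ask whether the variant shows up at least once. The function name and the 300 million population figure (the round number implied by "0.1% is 300,000 people") are my choices for illustration.

```python
def chance_of_seeing_variant(carrier_fraction, people_sequenced):
    """Probability that at least one sequenced person carries the variant.

    Assumes carriers are sampled independently (a simplification).
    """
    return 1.0 - (1.0 - carrier_fraction) ** people_sequenced


if __name__ == "__main__":
    us_population = 300_000_000  # round figure assumed for illustration
    print(f"Carriers at 0.1%: {round(0.001 * us_population):,}")

    for n in (100, 1_000, 10_000):
        for f in (0.01, 0.001, 0.0001):
            p = chance_of_seeing_variant(f, n)
            print(f"{n:>6} genomes, carrier fraction {f:.2%}: {p:.1%}")
```

Under these assumptions, a variant carried by 0.1% of people turns up at least once in roughly 10% of 100-genome samples and roughly 63% of 1000-genome samples. Seeing a variant once is also a long way from confidently calling it with error-prone reads, so if anything these numbers are optimistic.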

So it's obvious that 1000 genomes is just a start. To go more aggressively after rare variants that are likely to be medically important, the 1000 genomes group will also focus on the small gene-containing fraction of the human genome. This will enable them to put more resources into finding rare variants near genes - places where we expect medically important variants to show up.

And let's not forget the technological benefits: by using next-generation sequencing technology, which is still not quite mature, this consortium hopes to develop innovative ways to effectively and cheaply re-sequence human genomes. In both physical technology and data analysis methods, we'll see benefits that will lead to more widespread use of this technology, hopefully in clinical diagnostics as well as research.

It has been nearly 20 years since the Human Genome Project officially began. We're still waiting for the promised medical benefits to emerge, but in the meantime, this effort has transformed biological research - in my opinion, at least as much as the foundational discoveries of molecular biology in the 1950s and 1960s. Future medical benefits will be based on this science.

Saturday, January 12, 2008

Richard Feynman on Doubt

[Fill in she instead of he as appropriate...]

"The scientist has a lot of experience with ignorance and doubt and uncertainty, and this experience is of very great importance, I think. When a scientist doesn't know the answer to a problem, he is ignorant. When he has a hunch as to what the result is, he is uncertain. And when he is pretty damn sure of what the result is going to be, he is in some doubt."

A scientific theory that withstands that kind of scrutiny for over a hundred years is a damn good theory.

(Quote is from The Pleasure of Finding Things Out, p. 146 - the book, not the TV documentary transcript of the same name.)

An "Irrational Attachment to The Theory of Evolution"?

In Gail Collins' NY Times column today, she says this:

Huckabee seems to be a nice guy, but conservatives are afraid he’d break up the old evangelical-plutocrat Republican alliance and most liberals are restrained by their irrational attachment to the theory of evolution.

Excuse me? I can't quite tell if Collins is being tongue-in-cheek (I frankly don't read her column enough to get where she's coming from), but it looks like Collins is the rationally challenged one here. What she just said is just as absurd as something like: "most liberals are restrained by their irrational attachment to the theory of quantum mechanics."

For people like myself who prefer to reside in the reality-based community, acceptance of the evidence for evolution is in fact an excellent litmus test for people who want my vote. It's a good indication of how decision-makers value evidence versus pet beliefs. Scientific evidence does not necessarily dictate what the correct government policy should be, but it sure as hell can rule out harebrained ones. I view that as a good thing.

This post isn't an endorsement of or a slam against any particular political party or candidate - except of course candidates who have an irrational opposition to very successful, fundamental fields of science. The point is, creation vs. evolution isn't just some freak side issue on the fringes of the culture wars - it cuts to the heart of how people respond to the single most successful approach humans have developed for understanding and influencing the reality-based world.

Thursday, January 10, 2008

What Next Generation DNA Sequencing Means For You

Of all the 'Greatest Scientific Breakthroughs' of 2007 heralded in the pages of various newspapers and magazines this past month, perhaps the most unsung one is the entrance of next-generation DNA sequencing onto the stage of serious research. Until this past year, the latest sequencing technologies were limited in their usefulness and accessibility due to their cost and a steep technical learning curve. That's now changing, and a group of recent research papers gives us a hint of just how powerful this new technology is going to be. Not only will next-generation sequencing be the biggest change in genomics since the advent of microarray technology, but it may also prove to be the first genome-scale technology to become part of everyday medical practice.

Sanger DNA sequencing is one of the most important scientific technologies created in the 20th century. It's the dominant method of sequencing DNA today, and very little of the best biological research of the last 20 years could have been done without it, including the whole genome sequencing projects that have thoroughly transformed modern biology. Now, new next-generation sequencing methods promise to rival Sanger sequencing in significance.

So what's so great about the latest sequencing technology? Sanger sequencing is inherently a one-at-a-time technology - it generates a single sequence read of one region of DNA at a time. This works well in many applications. If you're sequencing one gene from one sample, Sanger sequencing is reliable, and generates long sequence reads that, under the right conditions, can average well over 500 nucleotide bases. You get a nice clean readout of a single strand of DNA, as you can see in this example:

Modern sequencing machines that use this method generate a 4-color fluorescent dye readout, which you can see in the graph in the figure. Each peak of fluorescence in the graph represents one nucleotide base, and you know which base it is from the color of the dye.

Next-generation sequencing (pyrosequencing is one of several methods that fall under this label) can't generate the nice, long sequence reads you get with Sanger sequencing, nor are the individual reads as accurate. Instead of 500 DNA bases or more, some platforms give you just about 25 bases. But the difference is that you get lots and lots of sequence reads. Instead of just one long read from just one gene (or region of the genome), you get thousands of short, error-prone reads, from hundreds or thousands of different genes or genomic regions. Why exactly is this better? The individual reads may be short and error-prone, but as they add up, you get accurate coverage of your DNA sample; thus you can get accurate sequence of many regions of the genome at once.
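To see why a pile of short, sloppy reads can add up to an accurate sequence, here is a toy simulation. It is a sketch under generous assumptions, not how any real sequencing pipeline works: I assume we already know where each read belongs on the reference (real software has to align the reads first), errors are simple random substitutions, and the read length, error rate, and function names are all made up for illustration.

```python
import random
from collections import Counter


def simulate_reads(reference, n_reads, read_len=25, error_rate=0.02, seed=0):
    """Draw short reads from random positions, adding random base errors."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = []
        for base in reference[start:start + read_len]:
            if rng.random() < error_rate:
                # Substitute a different base to mimic a sequencing error.
                base = rng.choice([b for b in "ACGT" if b != base])
            bases.append(base)
        reads.append((start, "".join(bases)))
    return reads


def consensus(reads, length):
    """Majority vote at each position; 'N' where no read covers it."""
    votes = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            votes[start + offset][base] += 1
    return "".join(v.most_common(1)[0][0] if v else "N" for v in votes)


if __name__ == "__main__":
    rng = random.Random(1)
    reference = "".join(rng.choice("ACGT") for _ in range(200))
    reads = simulate_reads(reference, n_reads=400)
    called = consensus(reads, len(reference))
    mismatches = sum(a != b for a, b in zip(called, reference))
    print(f"{len(reads)} reads, {mismatches} consensus mismatches "
          f"out of {len(reference)} bases")
```

The design choice worth noticing is the vote: any single read can be wrong at a given position, but when many reads overlap the same spot, the errors rarely agree with each other, so the majority base wins.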

Next-generation sequencing isn't quite ready to replace Sanger sequencing of entire genomes, but in the meantime, it is poised to replace yet another major technology in genomics: microarrays. Like next-generation sequencing, microarrays can be used to examine thousands of genes in one experiment, and they are one of the bedrock technologies of genomic research. Microarrays are based on hybridization - you're basically seeing which fluorescently labeled DNA from your sample sticks (hybridizes) to spots of DNA probes on a microchip. The more fluorescent the spot, the more DNA of that particular type was in the original sample, like in this figure:

But quantifying the fluorescence of thousands of spots on a chip can be unreliable from experiment to experiment, and some DNA can hybridize to more than one spot, generating misleading results.

Next-generation sequencing gets around this by generating actual sequence reads. You want to know how much of a particular RNA molecule was in your sample? Simply tally up the number of sequence reads corresponding to that RNA molecule! Instead of measuring a fluorescent spot, trying to control for all sorts of experimental variation, you're just counting sequence reads. This technique works very well for some applications, and it has recently been used to look at regulatory markings in chromatin, to find where a neural regulatory protein binds in the genome, to look at the differences between stem cells and differentiated cells, and to see how a regulatory protein behaves after being activated by an external signal.
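The counting idea is simple enough to sketch in a few lines. This is only an illustration: I assume each read has already been matched to the RNA it came from (that matching step is the hard part in practice), and the gene names are hypothetical. A real analysis would also correct for things like transcript length, which I ignore here.

```python
from collections import Counter


def count_reads(read_assignments):
    """Tally how many reads were assigned to each RNA molecule."""
    return Counter(read_assignments)


def relative_abundance(counts):
    """Convert raw read counts into fractions of all assigned reads."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical assignments: one entry per sequence read.
    assignments = ["geneA", "geneB", "geneA", "geneC", "geneA"]
    counts = count_reads(assignments)
    for name, fraction in relative_abundance(counts).items():
        print(f"{name}: {counts[name]} reads ({fraction:.0%} of total)")
```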

I've left out one of the major selling points of this technology: it's going to be cheap. You get a lot of sequence at a fairly low cost. And this is why it may end up being the one technology that truly brings the benefit of genomics into our everyday medical care. Because next-generation sequencing is cheap and easy to automate, diagnostics based on sequencing, especially cancer diagnostics, will become much more routine, and so will treatments based on such genetic profiling. It will be much easier to look at risk factors for genetic diseases. Microbial infections will be easier to characterize in detail.

All of this is still a few years off, but the promise of this technology is already apparent enough to include it among the great breakthroughs of 2007.

Go look at the very informative websites of 454 Life Sciences, Illumina, and Applied Biosystems, the major players in next-generation sequencing.

For more on Sanger sequencing, check out Sanger's Nobel Lecture (pdf file).

A recent commentary and primer on next-generation sequencing in Nature Methods (subscription required).

Real Science vs Intelligent Design

If Intelligent Design advocates are so insistent that most of the human genome is functional, why aren't they doing any research like this? Eric Lander's group at MIT devised a way to test whether the thousands of non-conserved, putative protein-coding genes are likely to be spurious or true protein-producing genes.

From the paper, here is their rationale:

The three most widely used human gene catalogs [Ensembl, RefSeq, and Vega] together contain a total of 24,500 protein-coding genes. It is broadly suspected that a large fraction of these entries is simply spurious ORFs, because they show no evidence of evolutionary conservation. [Recent studies indicate that only 20,000 show evolutionary conservation with dog.] However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation; the alternative hypothesis is that these ORFs are valid human genes that reflect gene innovation in the primate lineage or gene loss in other lineages.

Here is what they test:

The purpose of this article is to test whether the nonconserved human ORFs represent bona fide human protein-coding genes or whether they are simply spurious occurrences in cDNAs.

And here is their conclusion:

Here, we provide strong evidence to show that the vast majority of the nonconserved ORFs are spurious.

This is how you do science. If ID advocates were serious about science, they would be testing similar hypotheses and publishing them.

Wednesday, January 09, 2008

Scientific American and Web 2.0

What are blogs, wikis, and networking sites doing for science? Scientific American has an upcoming feature article about science and the Web 2.0. And in the spirit of Web 2.0, they're inviting your comments, so go check it out.

It's a fascinating topic, and I'll have more to say about it soon.

Thanks to Jean-Claude Bradley over at Scientific Blogging for mentioning this.

Nature comments on creationism

Tomorrow's Nature issue has an editorial (subscription only) praising the latest NAS book on evolution/creationism. The editorial goes on to suggest that:

"Between now and the 200th anniversary of Charles Darwin's birth on 12 February 2009, every science academy and society with a stake in the credibility of evolution should summarize evidence for it on their website and take every opportunity to promote it."

They also post a link to paleontologist Kevin Padian's testimony at the Kitzmiller intelligent design trial - it's worth checking out.

Tuesday, January 08, 2008

PNAS Evolution Editorial

If you can get journal access, check out this editorial by creation/evolution veteran Francisco Ayala in the latest edition of the Proceedings of the National Academy of Sciences.

Monday, January 07, 2008

Yes, you do have to be a genius to read this blog...

Although if the blog is that incomprehensible, I'm not sure what that implies about the writer.

Sunday, January 06, 2008

What Genes Did We Lose to Become Human?

When we think of the genetic changes that took place during our evolutionary history, we typically think of changes that resulted in a gain of function, like genetic changes that resulted in a larger and more sophisticated brain, improved teeth for our changing prehistoric diet, better bone anatomy for bipedalism, better throat anatomy for speech, and so on. In many cases however, we have lost genes in our evolutionary history, and some of those losses have been beneficial. The most widely known example, found in every introductory biochemistry textbook, is the sickle-cell mutation in hemoglobin - a clear example of a mutation that damages a functional protein yet confers a beneficial effect. People with mutations in both copies of this particular gene are terribly sick, but those who have one good and one bad copy are more resistant to malaria. Another example is the CCR5 gene - people with mutations that damage this gene are more resistant to HIV. In the more distant past, a universal human mutation in a particular muscle gene that results in weaker jaw muscles may have played a role in brain evolution, by removing a constraint on skull dimensions.

These few examples were found primarily by luck, but now with the availability of multiple mammalian genome sequences, researchers can systematically search for human genes that show signs of being adaptively lost at some point in our history. David Haussler's group at UC Santa Cruz, in a recent paper, looked for the genes we lost as we developed into our modern-day human species. What they found could help us better understand our evolutionary history, and possibly the human diseases that are the side-effects of that history.

It's not hard to find genes that have been lost in the human genome - our genomes are littered with pseudogenes, genes which harbor inactivating mutations making them unable to produce a functional protein. But most of these damaged genes have functional copies elsewhere in the genome. Genes are frequently duplicated in the random shuffling that goes on in our chromosomes, and often the duplicate copy will be destroyed by mutations while the good copy continues to perform its original function. Another frequent phenomenon is the production of processed pseudogenes - these are genes that were produced when RNA was transcribed back into DNA and integrated into the genome. The result is a gene that looks just like a highly processed RNA molecule, often surrounded by the classic genetic residue that is left behind when a piece of DNA is integrated back into the genome.

Looking for lost genes

Haussler's group was not interested in these two relatively mundane classes of pseudogenes; they were searching for genes that are clearly functional in other mammals, and which have not been duplicated or reverse transcribed from RNA. In other words, the genes they were looking for would have no other functional copies hanging around somewhere else in the genome. Genes in this category are likely to be candidates for adaptive losses - genes whose function has been completely eliminated from the human genome, which may have provided some benefit to the original ancestor in which the gene was lost.

These researchers performed their search by comparing functional genes in the mouse genome with genes in the human and dog genomes. Mice are more closely related to humans than dogs are; a gene that is present and functional in mice and dogs, but destroyed by mutation in humans has therefore been present in mammals for a very, very long time, but was recently lost in the human lineage.

You can see how this works in the figure below (which I have marked up slightly from the original version in the paper). The mouse and human versions of our hypothetical gene line up nicely, but there is an inactivating mutation in the human version. In the dog version (not shown), that mutation is absent.
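The logic of the search boils down to a filter. Here is a minimal sketch of that filter as I've described it in this post; it is not the actual pipeline from the paper, and the record fields and example gene names are hypothetical stand-ins for what would really be messy sequence comparisons.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    name: str
    functional_in_mouse: bool
    functional_in_dog: bool
    intact_in_human: bool
    human_has_functional_copy_elsewhere: bool  # e.g. a duplicated gene
    looks_reverse_transcribed: bool            # e.g. a processed pseudogene


def is_candidate_lost_gene(gene):
    """True if the gene is old (mouse and dog have it), broken in humans,
    and not explained by the two mundane classes of pseudogenes."""
    return (
        gene.functional_in_mouse
        and gene.functional_in_dog
        and not gene.intact_in_human
        and not gene.human_has_functional_copy_elsewhere
        and not gene.looks_reverse_transcribed
    )


if __name__ == "__main__":
    # Made-up records for illustration only.
    genes = [
        GeneRecord("example_lost", True, True, False, False, False),
        GeneRecord("example_duplicate", True, True, False, True, False),
        GeneRecord("example_processed", True, True, False, False, True),
        GeneRecord("example_intact", True, True, True, False, False),
    ]
    for gene in genes:
        if is_candidate_lost_gene(gene):
            print(f"{gene.name}: candidate for a loss in the human lineage")
```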

What they found

After completing their genome scan, and applying various quality control filters, Haussler's group came up with 72 candidate lost genes. They found some well-known lost genes, such as GULO, the gene for an enzyme necessary for making vitamin C, which has been destroyed in primates but is still functional in most other mammals. (If we still had a functional copy of this gene, we wouldn't get scurvy.)

These researchers found new lost genes as well, most of which have poorly characterized functions. One gene, named ACYL3 (NM_177028 in the mouse genome, for those of you who want to check out GenBank), contains a very highly conserved enzyme structure, called an acyltransferase domain (see more here). This particular acyltransferase domain is very ancient - it is found in bacteria, archaea, plants, fungi, and animals. Many species have multiple copies of this domain (the fruitfly has over 30), but mammals have only a few copies, and humans have absolutely no functional copies. We know almost nothing about this gene: it produces a membrane protein, it is expressed in the mouse pituitary gland, and it is necessary for normal embryo development in worms and flies. Why was it lost in humans, and was that loss beneficial? We don't know yet. Do we get some diseases that mice don't get, because we lack this gene? It would be fascinating to find out.

Although we know little about ACYL3, Haussler's group was able to pinpoint the timeframe of the loss. This gene was destroyed by a nonsense mutation, a change from TGG (coding for the amino acid tryptophan) to TGA, which means stop - the protein is truncated right in the middle of a highly conserved region. This TGG to TGA change is found only in chimps and humans, not gorillas, orangutans, or any other mammals checked. The mutation therefore happened after the chimp-human lineage split off from the other great apes. Gorillas and orangutans have a functional ACYL3 gene; humans and chimps don't.
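If you want to see what a nonsense mutation does, here is a small sketch. The DNA sequence is made up (it is not the real ACYL3 sequence), and the codon table is just the handful of entries from the standard genetic code that this example needs. The per-species codons at the end simply restate the pattern described above.

```python
# A small subset of the standard genetic code; '*' marks a stop codon.
CODON_TABLE = {
    "ATG": "M", "TGG": "W", "GCT": "A", "AAA": "K", "GAA": "E",
    "TGA": "*", "TAA": "*", "TAG": "*",
}


def translate(dna):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up sequences: identical except for a single TGG -> TGA change.
intact = "ATGGCTTGGAAAGAATAA"
mutant = "ATGGCTTGAAAAGAATAA"

# Codon at the mutated position in each species, as described in the post.
codon_by_species = {
    "human": "TGA", "chimp": "TGA", "gorilla": "TGG", "orangutan": "TGG",
}
species_with_stop = [
    species for species, codon in codon_by_species.items()
    if CODON_TABLE[codon] == "*"
]

if __name__ == "__main__":
    print("intact protein:", translate(intact))
    print("mutant protein:", translate(mutant))
    print("species sharing the stop codon:", species_with_stop)
```

One base changes, and everything downstream of the new stop codon never gets made.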

Other lost genes found by these researchers included genes lost in only chimps, gorillas and humans, genes lost in all great apes, and genes lost in primates. Each of these losses is an example of an ancient gene that was functional for hundreds of millions of years, but then lost very recently in the lineage that led to primates, and ultimately humans. Without detailed functional studies in multiple species, it is hard to know which genes were lost adaptively, providing an immediate benefit sustained by natural selection, and which were lost simply because they were no longer necessary in a particular environment (such as the vitamin C biosynthesis enzyme). But with this list of genes in hand, we know where to start. It would be fascinating to figure out if any of these lost genes are linked to human diseases (like GULO and scurvy), since the lost function could provide an important clue regarding the mechanism of the disease.

It's worth noting the implications of studies like this for the creation-evolution debate. Creationists, including most of the major advocates for its latest form, Intelligent Design, have frequently expressed their disbelief in human evolution and the common ancestry of today's species. In the case of ACYL3, the only plausible explanation for the pattern of mutation that Haussler's group found is that humans and chimps shared a common ancestor. If humans and chimps had been created de novo as separate species, it would be an extremely unlikely coincidence for these identical mutations to occur by chance individually in each species. And when you factor in the distribution of dozens, hundreds, thousands of such mutations, all occurring in a similar pattern, it becomes not just unlikely but essentially impossible that such patterns of mutation would have arisen by chance in each separately created species. One could suggest that a designer just decided to arrange things this way, but that argument falls in the same class as the claim that God just made the universe and the earth appear old, after creating it all 10,000 years ago. There is simply no rationale based on evidence to question the fact of common descent, and those who continue to resist this fact can only do so for religious or psychological reasons.

Figures from the original paper were annotated, cropped and posted under the PLoS Open Access License.

Friday, January 04, 2008

A Science Debate in the US Presidential Election?

It's a long shot, but a worthwhile cause. If you care about science issues in the upcoming US election, consider signing up to support Science Debate 2008.

Check out this book

The National Academy of Sciences has revised this classic book on Evolution and Creationism. (It's free online.)

How To Grow a New Head: The Amazing Regenerative Powers of Planaria

Planarians have fascinated biologists for centuries with their amazing powers of regeneration. If you decapitate a planarian, the body can grow a new head, and the head can grow a new body. In fact, if you cut out a very tiny chunk from the side of a planarian, that chunk will be able to regenerate a new, complete organism. How do these strange critters manage this? What genes do they have that we don't have? As it turns out, most planarian genes are shared with humans, and several groups of scientists are using the latest tools of genomics and molecular biology to figure out just what it is that gives planarians their remarkable powers of regeneration. These researchers hope that planarians will ultimately teach us how to regenerate injured human tissue.

Planarians are small flatworms that live in ponds and rivers on almost every continent. They are among the simplest organisms that possess a central nervous system: behind the two cross-eyed looking eye spots is a small bundle of neurons that serves as a brain. Extending from the brain is a primitive yet effective nervous system. Planarians also have a digestive system, and, in some species, both male and female reproductive organs (sexual planarians are hermaphrodites). All of these specialized tissues can be regrown from a tiny chunk (as few as about 10,000 cells) cut out of the side of the worm. The great geneticist Thomas Morgan, before he went on to do his famous fruitfly work, discovered (in the kind of work that would be a 10-year-old boy's dream come true) that as little as 1/279th of a planarian is enough to regrow the entire body. This is an incredible feat for such a small chunk of cells - the cells have to reorient themselves, establishing anterior and posterior ends, as well as bilateral symmetry. These cells also have to reform every type of tissue in the organism. (For some fascinating pictures of a regrowing head, check out this PDF file.)

Revealing the Secrets of Regeneration

The secret to these remarkable powers is a population of stem cells, called neoblasts, scattered throughout the organism. How these neoblasts function in regeneration has been the focus of intense research, notably in the labs of Alejandro Sanchez Alvarado at the University of Utah, and Phillip Newmark, at the University of Illinois, Urbana-Champaign. These researchers are using the latest tools of genetics, such as knocking down genes with RNAi, to identify those genes that enable these stem cells to properly rebuild complete planarian bodies. And to leverage all of the tools that modern molecular biology has to offer, the Genome Sequencing Center at Washington University is currently sequencing a planarian genome.

Researchers in the Sanchez Alvarado lab recently identified two important genes involved in this process, genes that just happen to be important genes in all animals, including humans. These researchers found that when they shut down a beta-catenin gene using RNAi, planarians with amputated tails would grow a new head, instead of a new tail. In fact, shutting down beta-catenin could also transform tails into heads in whole animals with no amputated parts at all. The Sanchez Alvarado lab also found that a gene which antagonizes beta-catenin, called APC (the planarian version of a gene involved in human colon cancer), has the opposite effects of beta-catenin: shutting down APC results in planarians that regrow tails in the place of amputated heads. These two genes, highly conserved in all animals, thus appear to work as master switches in planarian regeneration. An important emerging theme in modern biology is that the amazingly diverse traits found among animals are not the result of new genes for each new feature; instead, this diversity arises from how a core set of genes is put to use. Planarians share the majority of their genes with us, which means that we may some day learn how planarians use their genetic toolkit, and thus be able to manipulate human genes to perform such amazing feats of regeneration. While we won't be regrowing human heads, there is a very real hope that we'll be able to regrow human nerve tissue, such as spinal cords.
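The 'master switch' idea can be written down as a toy model. To be clear, this is a cartoon of the results described above and not a model from the paper: I am assuming that beta-catenin activity is normally high at a tail-facing wound and low at a head-facing wound, and that whatever grows back depends only on that one signal. The function and argument names are mine.

```python
def regenerated_structure(wound_site, rnai_target=None):
    """Toy switch: high beta-catenin activity -> tail, low -> head.

    wound_site: 'anterior' (head cut off) or 'posterior' (tail cut off).
    rnai_target: None, 'beta-catenin', or 'APC'.
    """
    # Assumed baseline: activity is high only at posterior wounds.
    beta_catenin_high = (wound_site == "posterior")

    if rnai_target == "beta-catenin":
        # Knocking down beta-catenin removes the signal everywhere.
        beta_catenin_high = False
    elif rnai_target == "APC":
        # APC antagonizes beta-catenin, so without APC the signal stays on.
        beta_catenin_high = True

    return "tail" if beta_catenin_high else "head"


if __name__ == "__main__":
    for site in ("anterior", "posterior"):
        for target in (None, "beta-catenin", "APC"):
            result = regenerated_structure(site, target)
            print(f"wound: {site:9}  RNAi: {str(target):12}  regrows: {result}")
```

A real worm is obviously doing far more than flipping one boolean, but the sketch captures why knocking down one gene gives you two-headed worms and knocking down its antagonist gives you two-tailed ones.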

Planarians in Evolution

Planarians are also remarkable examples of creatures that appear 'half-way evolved' (of course modern planarians have been evolving as much as we have): they have features that likely resemble evolutionary intermediates of complex vertebrate organs, such as eyes. Planarian eyes don't have lenses, but they do have light sensing cells hooked up to the central nervous system, as well as a curved 'optic cup' that enables planarians to gain some information about the direction of a light source. Another primitive feature of planarians is the pattern of early cell divisions in their embryos: in contrast with the highly ordered process seen in vertebrates, planarian embryos undergo a series of very disordered cell divisions. And yet this system of disordered embryogenesis produces perfectly good adult organisms. People often find it hard to imagine how evolution can produce complex processes or structures, such as human eyes, or the development of a human embryo. How could an eye progressively evolve? In planarians, we can see that much more primitive physiological systems, having only some of the features of their more complex human counterparts, can be perfectly functional and therefore sustained by natural selection.

For more on planarians, and some fun pictures, check out:

http://www.planarians.org/
and this cool picture.

Photo credit: Wikimedia Commons (uploaded by Alejandro Sanchez), under the Creative Commons License