Saturday, April 28, 2007

Are boring science classes the reason we aren't training enough American scientists?

Tom Friedman, in Friday's NY Times column (subscription required), comments on Walter Isaacson's new biography of Einstein and asks:

"If Einstein were alive today and learned science the boring way it is taught in so many U.S. schools, wouldn’t he have ended up at a Wall Street hedge fund rather than developing theories of relativity for a Nobel Prize?"

While our schools can undoubtedly use improvement in their science curricula, there is a more important reason we're not training as many scientists in this country as we should - science just doesn't pay off, at least not until the scientist has reached an age when most other professionals in the US are thinking about how much to put in their 401Ks instead of worrying about where to find their first job. If Einstein were alive today, he'd have good reason for choosing to work at that hedge fund over the spartan life of a young scientist struggling to get a tenure-track job.

By the time you start making a reasonable salary as a scientist, you've watched for years as all your friends who are MBA graduates, computer engineers, physicians, lawyers, or even moderately paid teachers buy houses, take family vacations, buy their kids music, sports, and dance lessons, and start substantial retirement savings - all while you were just barely getting by from paycheck to paycheck. Scientists begin saving for retirement and their kids' education at an age which financial self-help books tend to describe, with half-hearted optimism, in phrases like 'not completely hopeless.'

For a science career, you live in poverty for 5-7 years of grad school, with no retirement benefits and poor health insurance. That's somewhat understandable - law students and med students also live in poverty during their graduate training (although a PhD program is longer, and senior science grad students are already working at a professional level that lawyers and physicians don't reach until after they graduate).

After grad school, a postdoctoral fellowship is absolutely necessary for any kind of academic career at a research university or in a senior position as an industrial scientist - in other words, the kind of permanent job that induces people to seek a PhD in the first place. Postdoctoral fellowships usually run at least three years now, with a salary in the 32-45k range (you maybe reach 45k after 2-3 years, if you have an outside source of funding). As in grad school, there are no 401Ks or other retirement benefits, and even when you do get decent health insurance, all the money for the premiums (including the 'employer' contribution) is considered taxable income through some perverted loophole in the tax code.

Scientists tend to be in their mid- to late 30's by the time they get a permanent job, and on average they don't get their first research grant (critical for establishing an independent research program) until age 42. This is extremely late for people who want at least some financial security before they start a family. Women scientists are disproportionately affected, because the brunt of the work of having children, biologically and usually socially, falls on women. Those who hope to have a family life are faced with the unpleasant choice of waiting a long time or risking their job and financial security to have children at a more normal age.

The whole process can be extremely discouraging and demoralizing. I have a PhD, more than 8 years of experience in the lab, several published papers, my own funding based on a successful, peer-reviewed fellowship proposal, and I earn less than my younger sister, a clothing store manager who has worked her way up from an entry-level job in the two years she's been out of college. I work nights and weekends with no overtime pay. Instead of recording overtime, I fill out a time sheet at the end of the month stating my "hours absent" from the university. I have school debt that's accumulating interest, while my kids rely on grandparents to buy them shoes and swimming lessons.

The National Postdoctoral Association puts it this way:

"Postdoctoral scientists have slipped between the cracks of the scientific workforce as a heterogeneous group of 'apprentice' scientists. They generally do not have well-defined expectations of employment, appropriate employment rights and responsibilities, commensurate or even normalized pay scales, performance evaluations, consistent employment benefits such as proper health care, pensions, occupational health insurance, or procedures for resolving conflict."

We could have the best public school science curricula in the world, but that would not be enough to overcome the large disincentives students face when considering a career in science.

Tuesday, April 24, 2007

The Rhesus Macaque Genome - Can it help us learn about ourselves?

Just recently Science published the paper describing the latest primate genome - the rhesus macaque genome. (Check out Science's macaque website for some good (and free) articles on the subject.) Sequencing a large genome like this one is resource intensive (unlike microbial genomes, which are now easily and routinely sequenced), so why did scientists sequence yet another primate genome? In addition to the human genome we already have the chimp genome, and we also have several non-primate mammalian genomes - the mouse, rat, cow, dog, and opossum genomes. Is this a good use of our money? Why put in so much effort just to study evolution?

Evolution is worth studying in and of itself, but it is also so tightly connected with every field of biology that it's hard to avoid when you're studying anything else. We sequence these genomes because we know we can use evolutionary principles to understand the nuts and bolts of the genome. This strategy has been used already with great success in major genetic model organisms, including flies, worms, and yeast.

Most of us, I would bet, are more interested in the human genome than any other, and ultimately we sequence these primate genomes to understand our own genome. The chimp genome is helpful, but we need a more distantly related species to really enable us to effectively use genome comparison to learn about all those parts in our DNA. The rhesus macaque is a great pick, because it has been used extensively in medical research, and it is an Old World monkey - one of our closest relatives outside of the great apes.

Genome sequencing simply gives us raw sequence, such as this region from human chromosome 11:
(Sequence is read from left to right, line by line, like regular text.)

GAGGAGGCGGCGGAGGAGGGGCCGCCCGCGGCCCCCGGCTCACTCCGGCACTCCG
GGCCGCTCGGCCCCCATGCCTGCCCGACCGCGCTGCCGGAGCCCCAGGTCCGGGG
GCGGAGGGGAGCGCTGCCGCGGGGGTGGGCGGGCGGGGCGCGGGGGCCATGTGCG
AGCGCGGCAGGGAGGCGGGCGGGGCGGGCTGCAGGCGGGGTCCGACTCTGGGGCC
AGTCCGGGCCACGGTTGGGACCCAGTCGAGGGTCGGACTGGTCAGGGTTCAGGCG
GGATCCGGCGTCCGAGTCCTGGTGGGCCGGCCTGGGGCAGGATCTGGCTCTGGCT
GCGGGTCCTGACTCGGGTCAGGGTTGGGCCTCCGATCCAGCCCGCTCCGGGGCAG
GGTTCAATCCCGCATTTGCCGAAGTCCCTGGGGCTGGCCGGGGTGGAAGACGGGG
AGGGCTCTATGTCTGGGAAGGGGCTCTGAAGACCACGTGGGGGCGCTCGAAGGGG
CCTGGGGCCACCCTCCTCTCTGGGTCAAAGGTCATCGCACCGGCAGGGGAGAACT
TCCTCCTCCTTGGCTCTCCCCACTTACTTCCTGATAACCTGGTAGAGGTCTCCCG
CGGGCGGGGAGGGGGAGGCGTAGCAACTTTAGGCAACTTCCCAAAGGTGTGCGCA
GGTTGGGGGCGGGACGCGGCGCCCCGGGAGGTGGCGGCCTCTGCGACAGCGGGAG
TATAAGAGTGGACCTGCAGGCTGGTCGCGAGGAGGTGGAGCGGCGCCCGCCGTGT
GCCTGGGACCGGCATGCTGGGGCAGGAGGGCAGCCGCGTGTCAGGTGTGAAAAGC
TCTGGAGGTGTTTTCATGAGTCCGTGCCTGTGCGTGTGGATGTGGGGAGACCTAG
TGAGAGTGTGTGTGATCATGAGCCTTGACTGAGTTCGTGGATGGGGTGTGCGCTC
CAGGAGAAGTGTGTGAGCACAAGTGTGAGCAGGAGTGAGCACGGGTTTGGGAAGG
CCGGTGCAAGTGTGAAAGCCCTCAGCAGAGAGCGAGCCTGCGTGGGCTTGTGGGG
CTCCTGAGCACCCCGGTGAGTGGAGTGTGTGAACTCGGTGTGAGCACGTCCACTG
GCCTTGGGTCTGCTCTCCAATGCAGAATACCCAGATGAGGGCAGGGTCTCAGAGG
TCCCCCCAACATCTGGAGAAAACTGGGAAGTATCCTGCTCCTGGCTAGGGATTCC
AGGTGGGGTTGAAGGTTGCCTGGGGGCTACGGTTACCCTGCTCCCTGGCCTGGGT
GGGAGTAGGGGCTTTCTAAGCCTCCCCCAGGTTCCCAAGGGGGAGACCTGCTGTC
AGTTACTGGCCCTGAAGACTCTGTTTCCATGGCAACAGCTAGGAGGGGGCAGTGT
TCCTGGGCAGTCCTTCCTTGGACTCTGCCCCCCTTCTTCCCCACTTGCTGGGCTT
GGAAGCCTGGCCCTAGGCCCGAGGTTGGGCAACCCGTGTGGCAGGGTGTCTCCCA
TCCCCCATACCAGTGCTTTCCTGCGAACCTATGGGTCTCTCCGTGCAGGTGACCA
GCGCCATGTCCAGCCAGGTGGTGGGCATTGAGCCTCTCTACATCAAGGCAGAGCC
GGCCAGCCCTGACAGTCCAAAGGGTTCCTCGGAGACAGAGACCGAGCCTCCTGTG
GCCCTGGCCCCTGGTCCAGCTCCCACTCGCTGCCTCCCAGGCCACAAGGAAGAGG
AGGATGGGGAGGGGGCTGGGCCTGGCGAGCAGGGCGGTGGGAAGCTGGTGCTCAG
CTCCCTGCCCAAGCGCCTCTGCCTGGTCTGTGGGGACGTGGCCTCCGGCTACCAC
TATGGTGTGGCATCCTGTGAGGCCTGCAAAGCCTTCTTCAAGAGGACCATCCAGG
GTGAGCCCCCAGCCCACTCCCCTGTCCTTTGCCCTGCACCCTCTGGGTACACTGC
TGGGTGCAATAGGCCCCCTGATGGCTGTGGCACCGCTTGAGGCTAACAATCTGGT
GTTTCCAGTCCCTCTACCTCCCAGAGACACTCTTTCCCTGAGAAGTATGGTAAAA
GCACCGGGTGTGCTGATGCATTGCAGTGGATGTGAGTGAGTTCAGGGTACCACCT
GGGTACTCTAGGCCCAGCACCTTCTACAGTGGCTCTGAAAGAGTCCAAGGCAGCC
TCTGTCTGTTCCTAAGCTTTGTTCTTGTTTCTGGCAGCTTCTGACCTCTCCCCAG
CATAGAACATGTCCCCTTTTTGTTAATTTTCCCAAAGCAGCACCAACACAAGGCA
GATTTTAATTTTTTTTTTTTTGAGACAGAGTCTCACTCTGTTGTTCAGGCTAGAG
TGCAGTGGCACAATCTCTGCTCACTGCAACCTTTGCCCCTGGGTTCAAGAGATTC
TCCTGCCTCAGCCTCCTGAGTAGCTGAGACTGCAGGTGTGCACCACCACGCCCAG
CTAATTTTTGTATTTTTAGTAGAGACGACGTTTCACCATGTCGGCCAGGCTGGTC
TGGAATTCCTGACCACAAATGATCCACCTGCCTCGGCCTCCCAAAACAAGGCAGA
TTTTTATCAGTACTTGAGAGGGGCTACATCATAGTTTAGCACCCAACTTTAAAAA
GACTAACAGGCAAGGCCGGACACAGTTGCTCACACCTGTAATCCCAGCACTTTGG
GAGGCCAAGGTGGGCGGATCACCTGAGGTCAGGAGATCGAGACCAGCCTGGCCAG
GGTGGTGAAACCGCATCTCTACTAAAAATGCAAAAAATTAGCTGGGCATGGTGGC
TCGCGCCTGTAATCTCAGCTACTTGCTACTTGAGAGGCTGAGGCAGGAGAATTGC
TTGAACCCAGGAGGCAGAGGTTGCAGTGAGCCAAGATCACACCACTGTACTCCAG
CCTGGGTGACAGAGCGAGATTCCATCTCAAAAAAAAAAAAAAAAGGCCGGGCACT
GTGGCTCATGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCATGAGGTC
AGGAGATTGAGAACATCCTGGCTAACACGGTGAAACACTGTCTCTACTAAAAATA
CAAAAAATTAGCTGGGCATGGTGGCGGGCGCCTGTAATCCCAGCTACTTGGGAGG
CTGAGGCAGGAGAATGGCGTGAACCCAGGAGGCGGAGGTTGCAGTGAGCCAAGAT
CACGCCACTGCACTCCAGCCTGGGCGACAGAGTGAGACTCCGTCTCAAAAAAAAA
AAAAAAAAGGCTGGGCGCGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGC
CGAGACGGGCGGATCACCTGAGGTCAGGAGTTTGAGACCAGCCTGACCAATGTGA
TGAAACCCCGTCTCTACTGAAAATACAAAAATTAGCCAAGCATGGTGGCATGCGC
CTGTCATCCCACTCAAGAGGCTGAGACAGGAGAATTGCTTGAACCTGGGAGGCAG
AGGTTGCAATGAGCCCAGATCGCGCCATTGCACTCTAGCCTGCGCAACAAAAGTG
AAACTCCACCTCAAAAAACAAAAACAAAAACAAAAACAAAAAAACCCAAAAACGC
TGGGCTTGGTGGCTCATGGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCAGAC
GGATCACGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATGGTAAAACCCCGTC
TCTACTAAAAATACAAAAATTAGCCGGGCGTGGTGGTGAGTGCCTGTAATCCCAC
TACTTGGGAGGCTGAGGCAGGAGAATTGCTTGAACCCGGGAGGCAGAGGTTGCAG
TGAGCTGAGATCATGCCACAGCACTCTAGTCTGGGCAACAGAATGAGACACTCTC
ATCTCAAAAAAAAAAAAAAAAGGACTTACAGGCATGTCTGCTCTTAAAAGTCACT
AATTTTTTTCTCACTCAGGAAAGCTTATCAGAATTTGGGGGAATGAGCAAGATGC
TGACATTAAGCATTGCCTGGGAAGGGCCTATTATTTCCGTTATTTCTGCTTTTAT
GTAACCATTGGTTACTTTGGGGGCTATAACACGTATAATTAAAAAAAAAAAAAAA
AAGGCCAAGTGTGGTGGCTCACACCTGTAATCTCAGCACTTTCGGAGGCTAAGAT
GGGAGGATCACAAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATGGTGAAACCC
TGTCTGTACTAGAAATACAAAAATTAGCCAGGTGTCGTGGTGGGTGCCTGTAGTC
CCAGCTACTCAGGAGGCTGAGGCAGGAGAATTGCTGGAACCCAGGAGGCAGAGGT
TGGAGTTAGCCAAGATCGTGCCACTGCACTCCCAGCCTGGGTGACAGAGTGAGAG
TTCGTATCAAAAAAAAAAAAAAAAAAAAAATCTTGAGTGCTTACCTTGTGCTAGG
CACTGTATTCTTTTATGATCTCAGTTAGTCCCCACAGCAACCCTATAAGGTGTCA
GTACTGTTATAACTGAAACTAAGAGAGGCATTTGAAACTTTGTTGAAGTCTCACA
ACTAGGAAATGGCAGAACCAAGATTTGAACTTGGGTCAGTATAGGTCCAGAGCTG
AGCTCTTCAATGTTAGACTGCTTCCTCTGCTTATTACTAATAACACCGAACTTTG
GACAGACGCTGAATGACTGATTGTGACATTCCAGCACGTTTTTTTTTTTTTTTTT
GAGACAGTCTCGTGTGGTCGCCCAGGCTGGAGTGCAGTGGCACGATCTCGGCTCA
CTGCAAGCTCCGCCTCCCGGGTTCACACCATTCTCCTGCCTCAGCCTCCTGAGTA
GCTGGGACTACAGGTGCCCGCCACCACGCCTGGCTAATTTTTTGTACTTTTAGTA
GAGACGGGGTTTCAGCGTGTTAGCCAAGATGGTCTTGATTTCCTGACCTCGAGAT
CCACCTGCCTTGGACTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACTGCTCCT
GGCCAGGTTTTTTTTTTTTTTTTTTTTTTTTTTGAGATGGAGTTTTGCTCTTGTT
GTCCAGGCTGGAGTGCAACGGCCTGCAGTCGTGGTTCACTGCAACCTCTGCCTCC
CGGGTTCAAGCCATTCACCTGCCTCAGCCTCCCAAGTAGCTGGGATTACAGGCGC
CTGCCACCATGCCCGGCTAATTTTTGTGTTTTTAGTAGGGATGGGGTTTCACCAT
GTTGGCCAGGCTGGCCTCAAACTCCTGACCTCAGGCGATCTGCCCTCCTCGGGCT
TCCAAAGTGCTGGGATTATAGGTGTGAGCCACTGCACCCCGCCAATCCAGCAAGT
TTTAACTTGGCCAAAATCCACCAATCTTAAACTTTGTGCACCCTTCCCACTCTGA
AGAACAGTGAGCCAGCCGGCCAGGGTGCGGGTATCTCCTACCTACCCTGGGGCCC
CTCACTGTATGTTGACTATTGACAAATATTTATTGTGTGCTGGCTGTGAATAGGA
CTTGTATATTGAGCACTTAGGTGTCATGAACCATGCTGGATGTTTTGACCATATT
ATCCCCTTTAATTCTCACGACCCAACTCTGTGGGGCACTTTTACAGCTGGGAAAC
TGAGGGTTCAAGGGGTTAGGTATGGGACTTGCCCAAGGTCATAAAGGTATGTGGT
AGCCAGAGTCCCTGTTCGGCACAGACCTGTTCTTTGCTGTCCTGGCCAGTGTTCC
AGGCCTTGGGGACATAGCTGGGGCTGAAGCAGGGCTGTTTCTGCCCTCAGGCAGT
TTACATCCTGGCAGAGGGGAGAGCTGGGCAACAGTGAGTTGCACAGACTTGTCTT
ATTACCGCTGTGGTATGTGCAGGAAGGGGAGGTGCTGGTTCTGAGGCTCCAGAGG
GCTTGTCTTTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTTTGTTGCCCAGGCTA
GAGTCCAGTGGCGCGATCTCGGCTCAGTGCAAGCTCCGCCTCCCGGGTTCAAGCG
ATTCTCCTGCCTCAGCCTCCCCAATAGCTGGGATTACAGGCGCATGGCACCACGC
ACGGCTAATTTTGGTATTTTTAGTAGAGACTGGGTTTCACCATGTTAGCCAGGAT
GGTCTCGATCTCCTGACCTCGTGATCCACCCGCCTCGGCCTCCCAAAGTGCTGGG
ATTACGCTCCCGGCCTCTTTTTTTTTTTAGACAGAGTCTCACTCTGTTGCCAGGC
TATAGTACAGTGGCACGATCTCAGCTTACTGCAACCTCCGCCTCCCAGGTTCAAG
CGATTGTTCTCCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCACACGCCCAGCT
AATTTTTGTATTTTTAGTAGAGACAGGGTTTCACCGTGTTGGTCAGGCTGGTCTC
AAACTCCTCACCTCGTGATCTGCCTGCCTCGGCCTCCCAAAGTGCTGGGATTATA
GGCGTGAGCCACTGCGCCTGGCCTTTTTTTTTTTTTGGTACAGAGTTTCGCTCTG
GTTGCCCAGGCTGGAGTGCAATGGCACGATCTTGGCTCACTGCAGCCTCTGCCTC
CCGGGTTCAAGCGATTCTCCTGCCTCAGCCTCCGGAGCAGCTGGGATTACAGACA
TGCACCACCATGTCCGGCTAATTTTTTTTTTTCGAGATGGAGTCTCACTGTGTCA
CCCAGGCTGGAGTGCAGTGGCACAATCTCGGCTCACTGCAACCTCTGCCTCCCGG
GTTCAAGCGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGACTACAGGTGCCTG
CCACCACACCCAGCTAATTTTTGTACTTTTAGTAGAGACGGGGTTTTACCATGTT
GGCCAGGCTGGTCTTGAACTTCTGACCTCAGGTGATCCACCCACCTCGGTCTCCC
AAAGTGCTGGGATTACAGGCGTGAGCCACCGTGCCCGGCCGTGGTGTCTTGAGCT
GAGTGCAGAAGCGCAAATAGGGGGTAGGAGAAAATGCACCGCGAGGAGAAATGTG
CTGCGGGCCTGCTGTCTAGCTGTGTCATTTGGTCGTTGCGGGGCCCTGTGAGGCC
GGGAGGGCTGCCAGCACCCACCATGTGCCAGGCCTCGTTGCTAGTGCTGGGGCCA
GTTCCTGCCCCGGTGGAGCTGCCACTGAAGGGGGAGGCGTAATAAACAAGATAGG
TGAGTGCATATGCAGCGTGGTCTGTTGTGCTGAGGGCTGAAGAGAAACCAGAAGC
AGGGCTCAGAGGCCAGGAGGACTCTGCAAAGGGATTTGGCATTATCACAGGGTGG
CCAGGGAAGATCTTCAAGGTGACAGTGAGCAGAGGGAGGTGAGGGAGCCTGTGTG
GACTTCAGGACTAGAGCTCCAGGCAGGGCCTGTTTGAGGAACATGGAGGAGGCGA
GAGCAAGGAGTAGAGGTCAAAAGGAGGCAAGAAGCAGGGGCGTAGGCCTAGGAGG
ACATAGGTTCGCTTTGGCTTGGACTCAGAGAAGGGAAATCCCCAGAGGGTTTTGA
GAAGAGGAGGTACAGGATGTAATGGAGGCTTAATAGGACCCTCTTGGCTGCTGAG
TCGAGAACAGACTGGAGCAAGCAGGGACAGCCAAGCGAGGGGCGAGGTGACAGTG
ACTATCAGGTCAAGGGTGGAAGTAGTTGCCAGGGGCAGGAGGCGGATTCTGGACC
TTGGAGGAGGTAAAGCCCACCAGAATGTGTCGGTGGCTTGGATGTGGGGTGTGAG
AGGAACCAGAGATTCTGCCTAGGTTTCTTCTTGGGCAAGTGAACACGTGGAGTCC
ACGTAGGCTGTGTTCGGTCCGAGATGCCTTCTAGACATGCAGGATGTCAAGGAGG
CAGCTGGAGAGATGGGTCTGGAGCTCACAGCAAGTCCAGGCTAGAGGTAGAAACG
TGAGAGCCCCACGGCTGGGGAAGATTGCCATGGGATTGGAGATGAGCTCCAAGGA
CAGCCCTGGCAGTCTGGATGGAAGAGCTTGGGAAGATGCTCAGAAACCACAAAGT
GGCTGGTGCGGTGGGAGGAAAACCAGAGTGTATGCTGTCCTAGAAGCAAAAGAAG
AAAGTGTTTCAGTGTTTCTAGGAGCAGGAAGTGATCAACAGCCTTAGATCCTCCT
TTTAGGCCAAGTAACATGAGGACTAAGAATTGACCACTGGATTTAGCAATGCAGA
GGTCCTTGTGGCCCTTGATGTCGGCAGATGAGGGCAGTGTGGTCCAGAGATGAGG
CTTGGGGCTGAGATGCAGCCCCGCTGCCTGGTCCAGCTCCTCCCTCATCCAGGCA
GGGCTCCCCCGCCCAGCAGCCACTCCCCTCCCTGCCTGCTCATGGCCCCCTGCTC
TCCCTTTCCTCCCCATACCCCCAGACCTGTGCTTGCCCGGGGAGAGTCAGGGCTC
TCCTGTCAGCTGGGTCCCCTCCCAGCCCCGGGAGGCCGCCACTGGAGCCCTGCCT
CTTCCTGGCAGGGAGCATCGAGTACAGCTGTCCGGCCTCCAACGAGTGTGAGATC
ACCAAGCGGAGACGCAAGGCCTGCCAGGCCTGCCGCTTCACCAAGTGCCTGCGGG
TGGGCATGCTCAAGGAGGGTGAGCGCTGGGCAGGGGCTGGGCGAGGGCTGGGGGA
GTCGGGGACCCGGGCCAGGTGGGGGTGAGGCCTGGGAGTTCTGGTGAGTGGACTC
GGG

I purposely included a long chunk (actually it's really a very tiny piece of chromosome 11) just to convey what this vast sea of unannotated sequence looks like. About all you could do with this is use the genetic code to see if there is something that looks like a protein-coding segment in there. But actual protein coding regions are very sparse, and broken into fragments called exons, which are spliced together before the final protein is made.
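To make "use the genetic code to look for something protein-like" a bit more concrete, here is a minimal sketch in Python. It scans the three forward reading frames for stretches that begin with a start codon (ATG) and run to the first stop codon. The function name find_orfs and the min_codons cutoff are my own inventions for illustration, and the sketch ignores the reverse strand and, more importantly, introns - which is exactly why this naive approach does so poorly on human DNA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for each ATG-to-stop stretch on the forward strand.

    start/end are 0-based, end-exclusive positions; the stop codon is included.
    min_codons is an arbitrary illustrative cutoff (stop codon not counted).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon (three letters) at a time.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the frame and keep scanning
    return orfs
```

Run on a chunk like the one above, a scan of this kind mostly turns up short, meaningless stretches, because real human coding sequence is split across exons that no single reading frame spans.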

What we really want to know is where the gene is (in this case, an estrogen receptor gene) and where its controlling elements are. You won't be able to see the details below, but here is the big picture:




And here we're looking for only a few elements - I haven't included promoter regions, enhancers, non-coding RNAs, transposable elements... To find these elements requires three things:

- computer tools to build models of these elements and search the sequence
- sequence from related species for comparison
- experiments to test your computer predictions.

We have these three elements for yeast, flies, and worms, but in the case of humans, we have sorely needed more sequence from an animal like the rhesus monkey.


I'll finish up with an example from my own work in yeast. Certain proteins, which are master regulators of cell division, modify target proteins at the sequence 'TP..any letter..R or K'. (Now we're talking about protein sequence, so we don't just have A's, T's, G's, and C's.) To understand how these master regulators carry out their role, we would like to know exactly which proteins are their targets. How do we find those targets? Easy - just look for any protein that has 'TP..any letter..R or K' in it, and you have a candidate protein that you can test in the lab!
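The naive "just look for it" search is only a few lines of code. Here is a rough sketch, assuming the motif means T, then P, then any single amino acid, then R or K, which I've written as the regular expression TP.[RK]. The function name and the toy protein in the test are made up for illustration.

```python
import re

# T, P, any one residue, then R or K (my reading of the motif described above).
MOTIF = re.compile(r"TP.[RK]")

def find_motif_sites(protein):
    """Return (position, matched_text) for every motif hit in a protein sequence.

    Positions are 0-based offsets into the one-letter amino acid string.
    """
    return [(m.start(), m.group()) for m in MOTIF.finditer(protein.upper())]
```

Four-letter patterns this loose are bound to show up all over the place by chance, so a search like this only produces candidates.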

Well, it turns out it's not so easy - many proteins have this 'TP..any letter..R or K' just by chance - too many to test in the lab. So we want to choose the most likely targets - those whose sequence has been conserved throughout evolution. You can line up the sequences from different species, and easily see the 'TP..any letter..R or K' which has been conserved over 100 million years of evolution:



The sequence on the top line is from baker's yeast, and the sequence on the bottom is from a yeast that shared an ancestor with baker's yeast 100 million years back.
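The conservation filter can be sketched the same way. This is a simplified illustration, not the actual pipeline I use: it assumes you already have two aligned sequences of equal length (with '-' for gaps), finds the motif in the first one, and keeps only the hits where the second sequence also matches at the same alignment columns. The function name and the toy sequences in the test are hypothetical.

```python
import re

# Same assumed motif as before: T, P, any one residue, then R or K.
MOTIF = re.compile(r"TP.[RK]")

def conserved_sites(aligned_a, aligned_b):
    """Return alignment columns where the motif occurs in both aligned sequences.

    Both inputs are rows of a pairwise alignment, so they must be the same
    length. A hit interrupted by a gap character in either row is not counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for m in MOTIF.finditer(aligned_a):
        # Require the second species to match the motif in the same columns.
        if MOTIF.fullmatch(aligned_b[m.start():m.end()]):
            hits.append(m.start())
    return hits
```

With more than two species you would ask for the motif in every row of the alignment, which shrinks the candidate list further.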

Comparative genomics really works. It has helped us learn a tremendous amount about flies, worms, and yeast. With the macaque genome, we'll hopefully have the same success learning about our own genome.

Sunday, April 22, 2007

Fattening up our nation with farm subsidies...

By the end of this week, this blog should be back in full swing - I've got a post on Vectors, Quaternions and Thomas Pynchon's Against the Day coming up, a review of David Lindley's new book Uncertainty, and a discussion of why the newly sequenced macaque genome, published in Science, is going to help us learn great things.

In the meantime, here's a thought-provoking, if somewhat obvious, article in today's NY Times magazine on how the US government's farm subsidies are exacerbating the obesity epidemic that biomedical researchers and physicians are trying so hard to contain.

Almost the best quote I've read all week (it can't quite compete with some gems from the Gonzales hearing):

"The farm bill essentially treats our children as a human Disposall for all the unhealthful calories that the farm bill has encouraged American farmers to overproduce."

As a postdoc with three kids (one of whom, as a second grader, gets to be one of those human Disposalls) and a subsistence salary, this subject hits close to home. It's extremely depressing to walk through the grocery store and see how much cheaper it would be if I gave up on the real juice and fresh vegetables, and instead fed my kids flavored corn syrup, tater tots, and processed frozen vegetables drenched in salt and fake butter. At least in my case, the depressingly tight grocery budget is temporary (assuming a tenure-track job is somewhere in my future), but that's not true for millions of Americans who are very likely to end up obese while living on affordable junk food.

Sunday, April 15, 2007

David Brooks and the Age of Darwin

NY Times columnist David Brooks has written about our society's new grand, all-encompassing narrative - evolutionary theory (TimesSelect - subscription required).

His point is basically this:

"And it occurred to me that while we postmoderns say we detest all-explaining narratives, in fact a newish grand narrative has crept upon us willy-nilly and is now all around. Once the Bible shaped all conversation, then Marx, then Freud, but today Darwin is everywhere."

He concludes that our society has set aside its postmodern aversion to grand explanatory narratives and embraced a new one, Darwinism, which we can "embrace, argue with or unconsciously submit to."

In his column, Brooks refers to both popular science books and the latest genomics and brain research. While I don't disagree with Brooks that this latest biological view of life is something one can "embrace or argue with," I think it's very important to distinguish between popular science and mainstream research. It's one thing to read Daniel Dennett's Breaking the Spell (basically David Hume's The Natural History of Religion skillfully and elegantly reframed from an evolutionary perspective) and disagree with such a thoroughly evolutionary outlook, but quite another to reject the message from genomics research showing that genes impact our susceptibility to type II diabetes, nicotine addiction, prostate cancer, and Alzheimer's. Our genes really do affect so much of our lives, including our personalities, our sexual behavior, how we think, and how we age. In fact, too much of the credit for this grand narrative goes to Darwin - Mendel is at least equally responsible.

Our society may have a new "grand narrative that explains behavior and gives shape to our history." Maybe not all, but certainly a good part of this grand narrative is based in solid research that has revealed much about how the world does in fact work, unlike previous grand narratives noted by Brooks - those based on a literal reading of the Bible, or Marx, or Freud. This new narrative is not exclusively the work of "evolutionary theorists" who write popular books and come up with theories to explain everything. True scientific theories are always tested against nature in the lab or in the field, and like physics, with its grand narrative of atoms, modern biology's grand narrative is based deeply in the rigorous study of our world.

Monday, April 02, 2007

Apologies for the extended absence

I apologize for being absent the entire month of March - some sort of respiratory infection has now dragged on for two months, culminating in a middle-of-the-night trip to the ER Friday for what was probably a bad reaction to some cold medication. It's been impossible to keep up with this blog.

I also recently got back from a very interesting systems biology conference in Colorado, which I hope to report on soon.

Stay tuned...