Tuesday, October 21, 2003

Celera, Hype and the Human Genome Project

A few weeks ago, in the context of discussing the dog genome announcement, I mentioned that Celera's contribution to the Human Genome Project was less awe-inspiring than it appeared at first glance, leaning heavily as it did on the freely available work of the Human Genome Consortium for the assembly of the fragments Celera had collected using shotgun sequencing. In light of that post, I think this PNAS analysis by Robert Waterston, Eric Lander and John Sulston is well worth reading, as is this response by Craig Venter, Eugene Myers and others on the Celera sequencing team. The following excerpt from the Waterston/Lander/Sulston critique sums up the reality of the situation, in my view:

The international Human Genome Project (HGP) and Celera Genomics published articles last year on the sequence of the human genome (1, 2). In a recent article (3), we analyzed aspects of the Celera article.

We noted that the article did not report an assembly of Celera's own data but rather reported only joint assemblies based on a data set that included the assembled genome sequence of the HGP. Approximately 60% of the underlying sequence data and 100% of the mapping data used in Celera's analysis came from the HGP, and the HGP genome assembly itself contained 90% of the euchromatic sequence of the human genome. We also noted that Celera used various approaches for using the HGP data (referred to as perfect tiling, gap filling, and compartmentalized assembly; see Fig. 1) that implicitly preserved much of the HGP assembly information. We concluded that Celera's assemblies made extensive and inextricable use of the HGP genome information and thus were not an independent assembly of the human genome.


Our report elicited two commentaries. One, by Green (4), concurred with our analysis. The other, by Myers et al. (five of the Celera authors), raised certain issues about our analysis (5). Specifically, they acknowledge that their approaches preserved the HGP assembly to some extent, but they contend that the role of the HGP data in the Celera joint assemblies was minor.

Here we address the technical issues raised by Myers et al. We show that the analysis of Myers et al. underestimates the role of the HGP genome assembly in their work because they focus on only one of the ways in which the HGP data were used. Moreover, we note that the major role of the HGP sequence can be directly seen from the properties of the Celera assembly.

If there's one conclusion that can be drawn from all of this, it is that private does not automatically mean better, though this should not be taken as a defense of all government endeavors. The key thing to take into account, whether one is dealing with public or private entities, is whether they face competition: without the impetus provided by the threat of Celera patenting the completed genome, the Human Genome Consortium would still be making slow progress at its task today, and perhaps even two years from now.

There is one other, extremely important lesson that I feel is worth drawing from the Human Genome Project, one that will probably go down badly with most hard-core libertarians: the importance of publicly funded research. There are very strong positive externalities in scientific research that make measures like patent protection and trade secrecy inadequate for promoting the public good, and a tremendous amount of important work currently being carried out would simply not be possible under a system in which private companies had 20-year monopolies on research into various portions of humanity's genetic inheritance. One can argue that government-run institutes might not be the best way to encourage scientific investigation, but that such investigation ought to be officially encouraged, rather than left entirely to the market, is something I believe to be indisputable.