Update on genome sequencing costs

For many years, the National Human Genome Research Institute (NHGRI) has tracked the costs associated with DNA sequencing performed at the sequencing centers funded by the Institute. This information has served as an important benchmark for assessing improvements in DNA sequencing technologies and for establishing the DNA sequencing capacity of the NHGRI Genome Sequencing Program (GSP). Here, NHGRI provides an analysis of these data, which gives one view of the remarkable improvements in DNA sequencing technologies and data-production pipelines in recent years.

To calculate the cost of sequencing a genome, one needs to know the size of that genome and the ‘sequence coverage’ (i.e., ‘sequence redundancy’) required to generate a high-quality assembly of the genome on the specific sequencing platform being used. For generating the “Cost per Genome” graph, the assumed genome size was 3,000 Mb (i.e., the size of a human genome). The assumed sequence coverage differed among sequencing platforms, depending on each platform's average sequence read length: platforms with shorter reads were assigned higher coverage.

The following ‘sequence coverage’ values were used in calculating the cost per genome:

Sanger-based sequencing (average read length = 500-600 bases): 6-fold coverage
454 sequencing (average read length = 300-400 bases): 10-fold coverage
Illumina and SOLiD sequencing (average read length = 75-150 bases): 30-fold coverage
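
The arithmetic implied by these assumptions can be sketched in a few lines of Python. This is an illustration only, not NHGRI's actual calculation: it assumes that the amount of sequence to generate is the genome size multiplied by the fold coverage, and that the cost per genome is that amount multiplied by a per-megabase cost. The function names and the cost_per_mb parameter are hypothetical, and no real cost figures are included.

```python
# Assumed genome size from the text: 3,000 Mb (the size of a human genome).
GENOME_SIZE_MB = 3_000

# Fold coverage assumed for each platform, taken from the list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
COVERAGE = {
    "sanger": 6,
    "454": 10,
    "illumina_solid": 30,
}


def megabases_required(platform, genome_size_mb=GENOME_SIZE_MB):
    """Raw sequence (in Mb) needed to reach the assumed coverage."""
    return genome_size_mb * COVERAGE[platform]


def cost_per_genome(platform, cost_per_mb, genome_size_mb=GENOME_SIZE_MB):
    """Cost per genome, assuming cost scales linearly with Mb sequenced.

    cost_per_mb is a caller-supplied value; none is given in the text.
    """
    return megabases_required(platform, genome_size_mb) * cost_per_mb


if __name__ == "__main__":
    for platform in COVERAGE:
        print(platform, megabases_required(platform), "Mb")
```

Under these assumptions, a Sanger-based genome requires 18,000 Mb of raw sequence, a 454 genome 30,000 Mb, and an Illumina or SOLiD genome 90,000 Mb. A short-read platform must therefore be considerably cheaper per megabase before it becomes cheaper per genome.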