The Genomics Core blog: How many reads do I need to sequence?

Sunday, 25 January 2015


A common question we're asked is "how many reads do I need to sequence a sample?" In this post I'm going to focus on genomes, exomes and amplicomes and introduce the Lander-Waterman equation [1]. RNA-seq, ChIP-seq and other counting applications are more complex, and the answer for them is very much 'how long is a piece of string': it depends on the complexity of your sample and the sensitivity you'd like to achieve, and is also affected by the number of replicates you have.

The Lander-Waterman equation

Almost everyone doing NGS is using this equation, even if they are not aware of it. Anyone under 27 was born after it was published (1988), but it is a good equation to understand if you are sequencing. Put simply, it lets you estimate how many reads of a specific length you need to sequence your genome to a given coverage.

The general equation is C = LN/G, where C is the redundancy of coverage, G is the haploid genome size (for an exome or amplicome, the size of the targeted region), L is the sequence read length and N is the number of sequence reads. It can be rearranged to N = CG/L, which gives the number of reads needed to sequence a genome, exome or amplicome (amplicon panel) to a desired coverage (this is what we typically discuss when designing experiments).
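As a minimal sketch, both forms of the equation can be written as two small Python functions. The function names are our own, all sizes are in base pairs, and rounding up with math.ceil is an assumption (you cannot sequence a fraction of a read).

```python
import math


def coverage(read_length_bp: float, n_reads: float, genome_size_bp: float) -> float:
    """Lander-Waterman redundancy of coverage: C = L * N / G."""
    return read_length_bp * n_reads / genome_size_bp


def reads_needed(target_coverage: float, genome_size_bp: float, read_length_bp: float) -> int:
    """Rearranged form N = C * G / L.

    Rounded up to a whole number of reads (an illustrative choice).
    """
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)
```

For example, reads_needed(30, 3_000_000_000, 250) gives 360,000,000 reads, and plugging that back into coverage(250, 360_000_000, 3_000_000_000) returns 30.0.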

In the examples below, paired-end reads of 125bp from each end of a fragment are used, but for simplicity each pair is treated as a single 250bp read, so the read counts given are really numbers of read pairs.

Human genome (3Gb) 30x coverage = 360M reads.
Human exome (150Mb) 50x coverage = 30M reads.
Human amplicome (30 x 250bp amplicons = 7.5kb) 1000x coverage = 0.03M reads (30,000).
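The three examples can be reproduced with a short Python sketch. It assumes, as the post does, that each 2 x 125 bp pair counts as one 250 bp read, so N is a number of read pairs; the helper name and the doubling to individual 125 bp reads are illustrative additions. The amplicon panel is entered exactly as described, 30 amplicons of 250 bp (7.5 kb of target).

```python
import math

# Each 2 x 125 bp paired-end fragment is treated as one 250 bp "read",
# as in the post, so N below counts read pairs (fragments).
READ_LENGTH_BP = 2 * 125


def reads_needed(target_coverage, target_size_bp, read_length_bp=READ_LENGTH_BP):
    """N = C * G / L, rounded up to a whole number of read pairs."""
    return math.ceil(target_coverage * target_size_bp / read_length_bp)


examples = {
    # name: (target size in bp, desired coverage)
    "Human genome": (3_000_000_000, 30),
    "Human exome": (150_000_000, 50),
    "Human amplicome": (30 * 250, 1000),  # 30 amplicons of 250 bp = 7.5 kb
}

for name, (size_bp, cov) in examples.items():
    pairs = reads_needed(cov, size_bp)
    # A 2 x 125 bp run produces two 125 bp reads per fragment.
    print(f"{name}: {pairs:,} read pairs ({2 * pairs:,} individual 125 bp reads)")
```

This prints 360,000,000, 30,000,000 and 30,000 read pairs respectively, which is worth keeping in mind when comparing against a sequencer's output that counts each end separately.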

[1] Lander, E. S. & Waterman, M. S. Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2, 231–239 (1988).

Eric Lander led the Whitehead Institute/MIT Center for Genome Research and was the founding director of the Broad Institute. Michael S. Waterman is one of the founders of computational biology and gave his name to another important algorithm, Smith-Waterman alignment. While at the University of Southern California he also co-wrote Computational Genome Analysis with our Director, Simon Tavare.

Comments:

용준, 3 February 2015 at 23:46
I think there are a couple of errors in the last example.
Anyway, thanks for the nice post.