The Genomics Core blog: January 2015

Tuesday, 27 January 2015

Use your local support team

We have a half-day workshop on Thursday for NGS newbies, the focus of which is library prep for next-generation sequencing. We organise seminars from commercial providers of new technologies throughout the year, but this is a semi-annual event where local users get a chance to present their work, and new users get to hear about what's possible with NGS.

This year we have presentations about RNA-seq, ChIP-seq, Exome-seq, FFPE genomes, DNA methylation, targeted resequencing and a talk on the UoC 10,000 Genomes Project; and afterwards we'll wrap up with beer and pizza. These days require lots of organisation (thanks to Fatimah for organising this year's event) but, for the new users especially, turn out to be well worth the effort.

Making use of your local support teams: We also make sure we keep a good relationship with our local technical support teams and run a series of commercial presentations throughout the year. This works out to be much easier to organise as they do the prep work! While we're here in the Genomics Core to help our local users, we get lots of queries from people outside the Cambridge Institute, and this is one way we've found to increase the support we can offer.

Every other month we have Illumina come in to present on a specific library prep, or talk about recent updates. Sandra (Field Application Specialist) and Carla (Marketing Technology Specialist) generally talk for 30 minutes, followed by Q&A, and then spend some time with users on a one-to-one basis troubleshooting their problems.

We also try to arrange a training session once per quarter with Thermo. We've been using their ABI 7900 qPCR instruments for eight years and buy in quite a lot of their SYBR and TaqMan master-mixes. Ever since we started working with them we've run the "An introduction to qPCR" course for new users. The last one was run by Emma and everyone said it was a great introductory session.

What's in it for them: Neither Illumina nor Thermo would do this for free if there was nothing in it for them. They get to interact directly with potential new customers, and get feedback on how their technologies are working in the real world. Some of these conversations might end up as research collaborations. Some of the contacts might end up as new sales contracts too (I know why they are really here)!

What's in it for us: These talks have been reasonably well attended and increase the support we can offer (albeit indirectly), and the feedback from users has been almost universally positive. I'd encourage you to get in touch with your local sales or technical rep and ask if they can help you too. They might even supply doughnuts!

PS: Thanks very much to Carla and Sandra at Illumina for the seminars over the past 12 months. And to Emma for the most recent qPCR training.

PPS: If you missed the registration link to the event on Thursday, send us a message via a comment below!

Sunday, 25 January 2015

How many reads do I need to sequence?

A common question we're asked is "how many reads should I use to sequence a sample?" I'm going to focus on genomes, exomes and amplicomes in this post and introduce the Lander-Waterman equation [1]. Other applications are more complex: for RNA-seq, ChIP-seq and other counting applications the answer is very much 'how long is a piece of string'. It depends on the complexity of your sample and the sensitivity you'd like to get, and is also affected by the number of replicates you have.

The Lander-Waterman equation

Lander-Waterman: Almost everyone doing NGS is using this equation, even if they are not aware of it. Anyone under 27 was born after it was published (1988), but it is an equation that is good to understand if you are sequencing. Basically it allows you to estimate how many reads of a specific length you need to sequence your genome.

The general equation is C = LN/G where: C is the redundancy of coverage, G is the haploid genome size, L is the sequence read length, and N is the number of sequence reads (G and L must be in the same units, e.g. bp). It can be rearranged to N = CG/L, allowing you to compute the number of reads needed to sequence a genome, exome or amplicome (amplicon-panel) to a desired coverage (this is what we typically discuss when designing experiments).

In the examples below paired-end reads of 125bp from each end of a fragment are used, but these are converted to single 250bp reads for simplicity, so each "read" counted below is really one read pair.

Human genome (3Gb) 30x coverage = 360M reads.
Human exome (150Mb) 50x coverage = 30M reads.
Human amplicome (300x250bp amplicons, 0.075Mb) 1000x coverage = 0.3M reads.
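If you'd rather not do the arithmetic by hand, here is a minimal Python sketch of N = CG/L and its inverse. The function names (reads_needed, coverage_from_reads) are my own, made up for illustration, and rounding up to a whole read is my assumption rather than part of the equation. It reproduces the genome and exome figures above using the same 250bp "combined" read length.

```python
import math

READ_LENGTH_BP = 250  # 2 x 125bp paired-end, treated as a single 250bp read


def reads_needed(coverage, target_size_bp, read_length_bp=READ_LENGTH_BP):
    """Lander-Waterman rearranged: N = C * G / L (rounded up to a whole read)."""
    return math.ceil(coverage * target_size_bp / read_length_bp)


def coverage_from_reads(n_reads, target_size_bp, read_length_bp=READ_LENGTH_BP):
    """Lander-Waterman as published: C = L * N / G."""
    return read_length_bp * n_reads / target_size_bp


# Target sizes in bp and desired coverage, taken from the examples in this post.
examples = {
    "Human genome": (30, 3_000_000_000),
    "Human exome": (50, 150_000_000),
}

for name, (coverage, size_bp) in examples.items():
    n = reads_needed(coverage, size_bp)
    print(f"{name}: {coverage}x needs {n / 1e6:g}M reads")
```

The same function works for any target, so long as the target size and read length are both given in bp. Bear in mind this is an average: it says nothing about how evenly the reads are spread, so real experiments usually need some headroom.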

[1] Lander, E. S. & Waterman, M. S. Genomic Mapping by Fingerprinting Random Clones: A Mathematical Analysis. Genomics 2, 231–239 (1988).

Eric Lander founded both the Whitehead and Broad Institutes. Michael S. Waterman is one of the founders of computational biology and gave his name to another important algorithm: Smith-Waterman alignment. He also wrote Computational Genome Analysis with our Director Simon Tavare while at the University of Southern California.