Thursday 4 December 2014

Is my NGS library any good?

We've all been there. You bought the extortionately priced kit, you ran the gels, you lovingly removed every single SPRI bead, you sweated in a lab coat for days, and finally you elute your first ever NGS libraries. The question is, how can you tell if you were wasting your time? What if your tube turns out to contain nothing but buffer? Or worse, what if it can be sequenced, but it produces nothing more than a load of expensive gobbledegook?

Never fear: if your experimental design is up to scratch, you need only three simple quality checks to tell you if your library is a Science paper in the making, or a bit of a dud:
  1. Bioanalyzer for Size Distribution
  2. qPCR for Quantification
  3. Nanodrop for Chemical Contamination (optional)

1. Bioanalyzer for Size Distribution

The Agilent Bioanalyzer or Tapestation runs 1ul of your library in a microfluidics gel-like cartridge, and shows you the range of sizes in your library, as well as an estimate of library quantity.
A good Bioanalyzer trace will look different depending on the type of library you are assaying. Preferably, your library should appear as a single discrete peak approximating a bell curve. It should be larger than ~150bp, but smaller than ~700bp.
The Bioanalyzer trace is essential for detecting Illumina adapter contamination, which can be spotted as a sharp peak between 100 - 150bp. If you are a member of the CRUK Cambridge Institute, we can train you on how to run the Bioanalyzer and offer you advice on interpreting your Bioanalyzer trace.

A clean library on the Bioanalyzer: this will sequence like a dream

A problematic library on the Bioanalyzer: it will be difficult to sequence this library well.

Once you have run your library on the Bioanalyzer, use manual integration or the region table to select the entire trace and determine the average size of your library. You will need this to calculate your nanomolar concentration later. If you sequence with us, we will ask for this information at submission - it must be accurate in order for us to provide you with a high sequencing yield and quality.

Look out! Certain library prep types do not give an accurate length estimate on the Bioanalyzer due to the presence of secondary structures in the DNA (e.g. Truseq DNA PCR-free). If you're using a kit, the protocol should clearly state if this is the case - and should give you the length to use in quantification calculations.

I wouldn't recommend you use the Bioanalyzer nmol/l concentration for multiplexing, unless you really know what you are doing - or it is explicitly recommended in your protocol or kit. After all, the Bioanalyzer nmol/l value is only accurate for quantifying certain library prep types, and it is biased by any DNA in your sample which does not contain Illumina adapters.

2. qPCR for Quantification

I like to recommend quantification of libraries by qPCR, using primers designed to target the Illumina adapters. Our NGS service currently uses the KAPA library quantification kit (LQK) for this, and we find it very reliable - but there are alternative kits out there which we haven't tested.
A high quality library should be concentrated, ideally >10nM, but not too concentrated, ideally <100nM.
If you find your libraries are consistently very high yield (>100nM), then it is likely that you are performing more cycles of PCR than you need; this is likely to give you unnecessarily high PCR duplicate rates in your data. Reduce your protocol by 1 PCR cycle at a time until you are reliably getting 10nM - 100nM libraries. Make sure you remember to dilute your library pools to within our submission requirements, currently 10nM - 20nM.
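
If you want to sanity-check a dilution into the 10nM - 20nM submission range, the usual C1 x V1 = C2 x V2 relationship can be sketched in a few lines of Python. The function name and the example numbers here are purely illustrative, not part of any kit protocol:

```python
def dilution_volumes(stock_nM, target_nM, final_volume_ul):
    """Return (library_ul, buffer_ul) to dilute a stock to target_nM.

    Uses C1 * V1 = C2 * V2. All names and values are illustrative.
    """
    if stock_nM <= 0 or target_nM <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_nM > stock_nM:
        raise ValueError("cannot dilute to a concentration above the stock")
    library_ul = target_nM * final_volume_ul / stock_nM
    buffer_ul = final_volume_ul - library_ul
    return library_ul, buffer_ul


# Example: a hypothetical 60nM pool diluted to 15nM in a final volume of 20ul
library_ul, buffer_ul = dilution_volumes(stock_nM=60.0, target_nM=15.0, final_volume_ul=20.0)
print(f"{library_ul:.1f} ul library + {buffer_ul:.1f} ul buffer")
```

Aiming for the middle of the range (15nM here) leaves a little room for pipetting error in either direction.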

My top tips for high quality qPCR quantification:
  • Aliquot your qPCR mastermix and your standards into single-use batches prior to first use, to avoid template contamination and the effects of repeated freeze-thaw cycles
  • Wipe down all working surfaces and pipettes with a DNA degrading cleaning agent e.g. DNA Away/DNAoff/DNAZap, before starting work
  • Make a serial dilution and take triplicate measurements, then use the median concentration result
  • Check your serial dilution and your replicate measurements give highly reproducible concentration values
  • Check that your results are all comfortably within the range of your standard curve
If you use our NGS service and you choose to use the KAPA LQK, we can provide you with aliquots of the recommended DNA dilution buffer (Tris-HCl with 0.05% Tween). Also, if you are within the CRUK-CI, we offer training on how to perform real-time PCR, and you can sign out a KAPA qPCR kit from the Genomics Core to take advantage of the Institute’s bulk discount.
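
The median-of-triplicates and reproducibility tips above can be roughed out in Python. This is only a sketch under some assumptions of mine: the qPCR software reports each diluted replicate in pM, "highly reproducible" is taken to mean every dilution agrees within 20% of the median, and the function names and readings are made up for illustration. Check your kit's documentation for any fragment-size correction it requires, which is not included here.

```python
from statistics import median


def library_concentration_nM(replicates_pM, dilution_factor):
    """Median of replicate readings (pM, diluted sample) scaled back
    to the undiluted library and converted to nM."""
    return median(replicates_pM) * dilution_factor / 1000.0


def dilutions_agree(estimates_nM, tolerance=0.2):
    """True if every per-dilution estimate lies within `tolerance`
    (as a fraction) of the median estimate. The 20% default is an
    assumption, not a kit specification."""
    centre = median(estimates_nM)
    return all(abs(e - centre) <= tolerance * centre for e in estimates_nM)


# Hypothetical readings: {dilution factor: triplicate concentrations in pM}
readings = {
    10_000: [2.05, 1.98, 2.10],
    100_000: [0.21, 0.19, 0.20],
}

estimates = [library_concentration_nM(reps, factor) for factor, reps in readings.items()]
print("Dilutions agree:", dilutions_agree(estimates))
```

If the dilutions disagree, that is a hint that something went wrong with pipetting or that a reading fell outside the standard curve, so repeat the assay rather than averaging the problem away.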

If you must know about the Qubit...

Other quant methods like Qubit or Bioanalyzer can be great for some library types, as long as you know what you are doing - but both will over-estimate your library concentration if you have an inefficient adapter ligation reaction. So use them with care.

Our submission guidelines are in nmol/l (nM), so if you use the Qubit you need to convert ng/ul to nM using the following equation:

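y = (x × 1,000,000) / (660 × L)

x: concentration in ng/ul
L: average library length (bp)
y: concentration in nM
660: approximate average mass of one base pair of double-stranded DNA (g/mol)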
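
For anyone who prefers a script to a calculator, here is a minimal Python sketch of the ng/ul to nM conversion. It assumes the commonly used approximation of 660 g/mol per base pair of double-stranded DNA; the function name and example values are illustrative only.

```python
# Approximate average mass of one double-stranded DNA base pair (g/mol).
AVG_BP_MASS_G_PER_MOL = 660.0


def ng_per_ul_to_nM(conc_ng_per_ul, avg_length_bp):
    """Convert a dsDNA library concentration from ng/ul to nM.

    nM = (ng/ul * 1e6) / (660 * average length in bp)
    """
    if avg_length_bp <= 0:
        raise ValueError("average library length must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * avg_length_bp)


# Example: a hypothetical 2 ng/ul library with a 400bp average size
print(round(ng_per_ul_to_nM(2.0, 400), 2))  # 7.58
```

Note how much the answer depends on L: the same 2 ng/ul reading at 200bp would be twice the molarity, which is why an accurate average size matters.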

3. Nanodrop for Chemical Contamination

The Nanodrop is a quick and dirty assay for protein and chemical contaminants which interfere with sequencing - including the real killers ethanol and phenol. Test 1ul of each NGS library, preferably before you pool them for submission. I recommend you check that the 260/280 ratio is greater than 1.8, and that the 260/230 ratio is greater than 2.0. The trace should look like this:

A good Nanodrop profile

A bad Nanodrop profile. Do you see the peak at 230nm?

A library with a 260/280 ratio less than 1.8, or a 260/230 ratio less than 2.0, may cluster poorly, and therefore generate low quality data. If you're new to the library preparation process and you can spare the sample, I recommend you throw this one away and start again - while paying very careful attention to each cleanup step.
Always use the recommended cleanup method; don't be tempted to swap a bead cleanup for a column, or vice versa, even if it is more convenient! That will waste your time in the long run.
If you've got a contaminant and your library is irreplaceable, consider whether your yield is sufficiently high for you to repeat the final cleanup step. If not, have a chat with your NGS provider and ask if they will try sequencing it anyway. If you sequence with us here at CRUK-CI, we will always try our best to get you sequence data - as long as you know that you run the risk of paying for a lane of data which you can't use.
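
If you are checking a whole plate of libraries, a tiny Python helper can flag the suspect ones for you. This sketch uses the thresholds recommended earlier in this section (260/280 greater than 1.8, 260/230 greater than 2.0); the function name and the example ratios are hypothetical.

```python
def nanodrop_flags(ratio_260_280, ratio_260_230, min_260_280=1.8, min_260_230=2.0):
    """Return a list of warnings for a library's Nanodrop purity ratios.

    An empty list means both ratios meet the recommended thresholds.
    """
    flags = []
    if ratio_260_280 < min_260_280:
        flags.append("260/280 below %.1f" % min_260_280)
    if ratio_260_230 < min_260_230:
        flags.append("260/230 below %.1f" % min_260_230)
    return flags


# Hypothetical readings for two libraries: (260/280, 260/230)
libraries = {"lib_A": (1.85, 2.10), "lib_B": (1.90, 1.20)}
for name, (r280, r230) in libraries.items():
    problems = nanodrop_flags(r280, r230)
    print(name, "OK" if not problems else "; ".join(problems))
```

This only checks the purity ratios. It deliberately ignores the Nanodrop concentration reading.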

Whatever happens, do NOT use the Nanodrop quantity measurement for quantifying your DNA/RNA prior to library preparation, OR your final library concentration. DON'T DO IT. This is the most easily avoidable mistake in NGS. Don't be that scientist!

I hope that is enough to get you started. As ever, if you want advice on whether your library is going to sequence well on the Illumina platform, the best place to go is your local NGS facility (if you have one), or Illumina's technical support team.

Happy Sequencing!