The Genomics Core blog: 2014

Thursday, 4 December 2014

Is my NGS library any good?

We've all been there. You bought the extortionately priced kit, you ran the gels, you lovingly removed every single SPRI bead, you sweated in a lab coat for days, and finally you elute your first ever NGS libraries. The question is, how can you tell if you were wasting your time? What if your tube turns out to contain nothing but buffer? Or worse, what if it can be sequenced, but it produces nothing more than a load of expensive gobbledegook?

Never fear, if your experimental design is up to scratch, then you need only three simple quality checks to tell you if your library is a Science paper in the making, or a bit of a dud:

Bioanalyzer for Library Length
qPCR for Concentration
Nanodrop for Chemical Contamination (optional)

1. Bioanalyzer for Size Distribution

The Agilent Bioanalyzer or Tapestation runs 1ul of your library in a microfluidics gel-like cartridge, and shows you the range of sizes in your library, as well as an estimate of library quantity.

A good Bioanalyzer trace will look different depending on the type of library you are assaying. Preferably, your library should appear as a single discrete peak approximating a bell curve. It should be larger than ~150bp, but smaller than ~700bp.

The Bioanalyzer trace is essential for detecting Illumina adapter contamination, which can be spotted as a sharp peak between 100 - 150bp. If you are a member of the CRUK Cambridge Institute, we can train you on how to run the Bioanalyzer and offer you advice on interpreting your Bioanalyzer trace.

A clean library on the Bioanalyzer: this will sequence like a dream

A problematic library on the Bioanalyzer: it will be difficult to sequence this library well.

Once you have run your library on the Bioanalyzer, use manual integration or the region table to select the entire trace and determine the average size of your library. You will need this to calculate your nanomolar concentration later. If you sequence with us, we will ask for this information at submission - it must be accurate in order for us to provide you with a high sequencing yield and quality.

Look out! Certain library prep types do not give an accurate length estimate on the Bioanalyzer due to the presence of secondary structures in the DNA (e.g. Truseq DNA PCR-free). If you're using a kit, the protocol should clearly state if this is the case - and should give you the length to use in quantification calculations.

I wouldn't recommend you use the Bioanalyzer nmol/l concentration for multiplexing, unless you really know what you are doing - or it is explicitly recommended in your protocol or kit. After all, the Bioanalyzer nmol/l value is only accurate for quantifying certain library prep types, and it is biased by any DNA in your sample which does not contain Illumina adapters.

2. qPCR for Quantification

I like to recommend quantification of libraries by qPCR, using primers designed to target the Illumina adapters. Our NGS service currently uses the KAPA library quantification kit (LQK) for this, and we find it very reliable - but there are alternative kits out there which we haven't tested.

A high quality library should be reasonably concentrated, ideally >10nM, but not too concentrated, ideally <100nM.

If you find your libraries are consistently very high yield (>100nM), then it is likely that you are performing more cycles of PCR than you need; this is likely to give you unnecessarily high PCR duplicate rates in your data. Reduce your protocol by one PCR cycle at a time until you are reliably getting 10nM - 100nM libraries. Make sure you remember to dilute your library pools to within our submission requirements, currently 10nM - 20nM.

My top tips for high quality qPCR quantification:

Aliquot your qPCR mastermix and your standards into single-use batches prior to first use, to avoid template contamination and the effects of repeated freeze-thaw cycles

Wipe down all working surfaces and pipettes with a DNA-degrading cleaning agent (e.g. DNA Away/DNAoff/DNAZap) before starting work

Make a serial dilution and take triplicate measurements, then use the median concentration result

Check your serial dilution and your replicate measurements give highly reproducible concentration values

Check that your results are all comfortably within the range of your standard curve
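
If you like to see the arithmetic, here is a minimal Python sketch of the "serial dilution, triplicates, median" tip. The function name, the dilution factors and the readings are all made up for illustration; they are not part of any kit protocol, and your kit may require extra corrections (such as a size adjustment) which are not shown here.

```python
from statistics import median

def library_concentration_nM(measurements):
    """Back-calculate the undiluted library concentration from qPCR readings.

    `measurements` maps a dilution factor (e.g. 1000 for a 1:1000 dilution)
    to the replicate concentrations, in nM, measured at that dilution.
    """
    # Median of the replicates at each dilution, scaled back to the stock
    per_dilution = {
        factor: median(replicates) * factor
        for factor, replicates in measurements.items()
    }
    # Median across the dilution series is the reported value
    return median(per_dilution.values()), per_dilution

# Hypothetical triplicate readings (nM) at two dilutions, for illustration only
readings = {
    1_000: [0.0200, 0.0210, 0.0190],
    10_000: [0.0021, 0.0020, 0.0022],
}
estimate, per_dilution = library_concentration_nM(readings)
print(f"Estimated stock concentration: {estimate:.1f}nM")
```

Comparing the values in `per_dilution` with one another is a quick way to see whether the dilution series agrees with itself.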

If you use our NGS service and you choose to use the KAPA LQK, we can provide you with aliquots of the recommended DNA dilution buffer (Tris-HCl with 0.05% Tween). Also, if you are within the CRUK-CI, we offer training on how to perform real-time PCR, and you can sign out a KAPA qPCR kit from the Genomics Core to take advantage of the Institute’s bulk discount.

If you must know about the Qubit...

Other quant methods like Qubit or Bioanalyzer can be great for some library types, as long as you know what you are doing - but both will over-estimate your library concentration if you have an inefficient adapter ligation reaction. So use them with care.

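Our submission guidelines are in nmol/l (nM), so if you use the Qubit you need to convert ng/ul to nM using the following equation:

y = (x × 10^6) / (660 × L), where 660 g/mol is the approximate average mass of one base pair of double-stranded DNA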

x: concentration in ng/ul

L: average library length (bp)

y: concentration in nM.
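
As a rough sketch, the conversion can be written in Python as below. It assumes the commonly used average mass of 660 g/mol per base pair of double-stranded DNA; the function name and the example numbers are mine, purely for illustration.

```python
def ng_per_ul_to_nM(x_ng_per_ul, length_bp, mass_per_bp=660.0):
    """Convert a mass concentration (ng/ul) to a molar concentration (nM).

    x_ng_per_ul: concentration in ng/ul (x)
    length_bp:   average library length in bp (L)
    mass_per_bp: assumed average mass of one dsDNA base pair, in g/mol
    """
    if length_bp <= 0:
        raise ValueError("library length must be positive")
    return x_ng_per_ul / (mass_per_bp * length_bp) * 1e6

# Illustrative values only: a 6.6 ng/ul library with an average length of 500bp
print(f"{ng_per_ul_to_nM(6.6, 500):.1f}nM")
```

Note how strongly the answer depends on L: the same ng/ul reading gives half the nM value if the library is twice as long, which is why the average size from the Bioanalyzer needs to be accurate.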

3. Nanodrop for Chemical Contamination

The Nanodrop is a quick and dirty assay for protein and chemical contaminants which interfere with sequencing - including the real killers ethanol and phenol. Test 1ul of each NGS library, preferably before you pool them for submission. I recommend you check that the 260/280 ratio is greater than 1.8, and that the 260/230 ratio is greater than 2.0. The trace should look like this:

A good Nanodrop profile

A bad Nanodrop profile. Do you see the peak at 230nm?

A library with a 260/280 ratio less than 1.8, or a 260/230 ratio less than 2.0, may cluster poorly, and therefore generate low quality data. If you're new to the library preparation process and you can spare the sample, I recommend you throw this one away and start again - while paying very careful attention to each cleanup step.
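
If you are checking a whole batch of libraries, a few lines of Python can do the screening for you. This is only a sketch using the thresholds recommended above (260/280 of at least 1.8, 260/230 of at least 2.0); the function name and the example readings are hypothetical.

```python
def nanodrop_flags(ratio_260_280, ratio_260_230,
                   min_260_280=1.8, min_260_230=2.0):
    """Return a list of warnings for one library; an empty list means it passes."""
    flags = []
    if ratio_260_280 < min_260_280:
        flags.append(f"260/280 is {ratio_260_280}, below {min_260_280}")
    if ratio_260_230 < min_260_230:
        flags.append(f"260/230 is {ratio_260_230}, below {min_260_230}")
    return flags

# Hypothetical readings as (260/280, 260/230), for illustration only
libraries = {"lib1": (1.9, 2.1), "lib2": (1.85, 1.2)}
for name, (r280, r230) in libraries.items():
    flags = nanodrop_flags(r280, r230)
    print(name, "OK" if not flags else "; ".join(flags))
```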

Always use the recommended cleanup method. Don't be tempted to swap a bead cleanup for a column, or vice versa, even if it is more convenient! That will waste your time in the long run.

If you've got a contaminant and your library is irreplaceable, consider whether your yield is sufficiently high for you to repeat the final cleanup step. If not, have a chat with your NGS provider and ask if they will try sequencing it anyway. If you sequence with us here at CRUK-CI, we will always try our best to get you sequence data - as long as you know that you run the risk of paying for a lane of data which you can't use.

Whatever happens, do NOT use the Nanodrop quantity measurement for quantifying your DNA/RNA prior to library preparation, OR your final library concentration. DON'T DO IT. This is the most easily avoidable mistake in NGS. Don't be that scientist!

I hope that is enough to get you started. As ever, if you want advice on whether your library is going to sequence well on the Illumina platform, the best place to go is your local NGS facility (if you have one), or Illumina's technical support team: techsupport@illumina.com.

Happy Sequencing!

Friday, 17 October 2014

Indexing 2: Troubleshooting a bad index balance

Indexes are one of the simplest improvements in the last five years of sequencing, with the most incredible far-reaching effects. Today I will share a complementary pair of posts tackling the problems our customers experience most frequently when submitting indexed libraries for sequencing.

Why did I get very different yields for the libraries in my pool?

We've seen this so many times. You think you have carefully quantified and pooled your libraries, and then your sequencing data comes back with a massive variation in the number of reads for each library in your pool. What a nightmare!

Don't be fooled - there is nothing that your sequencing provider can do on the sequencer to cause a variable yield from your different indexes. An imbalance between indexes within your library pool arises during the pooling process, so an imbalanced pool indicates something has gone wrong during pooling.

Normally the problem is one of the following:

1. Different libraries in the pool are of different lengths
2. Quantification of the libraries prior to pooling was not accurate
3. The process of mixing the libraries into the pool was not robust

First check #1: Are your libraries of different average size?

Measure the length of every library prior to pooling on the Bioanalyzer or Tapestation (or similar).
Make sure you are including all of the visible peaks in your length measurement, including any adapter dimers, since they all contribute to the clustering.
Check that all of the libraries in your pool are a similar length to one another

Clustering efficiency is a non-linear function of length, because small fragments cluster disproportionately more efficiently than large ones. So if you mix a library of 200bp 50:50 with a library of 600bp, you will receive much more data for the short 200bp library.

As a guideline, all libraries should ideally be within +/- 50bp of one another.
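
Here is a small Python sketch of that length check. I have read "within +/- 50bp of one another" as "no two libraries differ by more than 50bp", which is my interpretation rather than a hard rule; the function name and the lengths are made up for illustration.

```python
from itertools import combinations

def mismatched_pairs(lengths_bp, tolerance_bp=50):
    """Return every pair of libraries whose average lengths differ by more
    than `tolerance_bp`. `lengths_bp` maps library name to length in bp."""
    return [
        (name_a, name_b)
        for (name_a, len_a), (name_b, len_b) in combinations(lengths_bp.items(), 2)
        if abs(len_a - len_b) > tolerance_bp
    ]

# Hypothetical average lengths from the Bioanalyzer, for illustration only
lengths = {"A": 380, "B": 410, "C": 520}
for pair in mismatched_pairs(lengths):
    print("Length mismatch:", pair)
```

In this made-up pool, library C would be flagged against both A and B, so it would be a candidate for pooling separately.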

Then check #2: Was your quantification prior to pooling accurate?

If your quantification is not reproducible then your library balance will be way off, whatever else you do well. When troubleshooting an imbalanced pool, I recommend you repeat quantification on your individual libraries a second time, and see if you receive the same result.

It is worth asking your NGS provider to share their quantification results with you, so you can compare them to your own expectation. No two quantification measurements will ever be in precise agreement, but your NGS provider must have a very robust process in order to provide you with a reliable per-lane yield, so you can use their result as a gold-standard during troubleshooting.

If you are quantifying by qPCR, here are some valuable tips to improve robustness:

Perform quantification measurements in triplicate on your plate
Check your triplicate measurements are within ~0.5 Ct values
Take the median value of your triplicates
Quantify all libraries which you plan to pool together on a single qPCR plate
Always run a no-template control to check for nonspecific amplification or contamination
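
The triplicate tips above can be sketched in Python like this. I have assumed that "within ~0.5 Ct" means the highest and lowest Ct in a triplicate differ by no more than 0.5; the function name and Ct values are hypothetical.

```python
from statistics import median

def summarise_triplicate(ct_values, max_spread=0.5):
    """Return (median Ct, spread, passes) for one set of replicate Ct values.

    The spread is the highest Ct minus the lowest Ct; the replicates pass
    when that spread is no larger than `max_spread`.
    """
    spread = max(ct_values) - min(ct_values)
    return median(ct_values), spread, spread <= max_spread

# Hypothetical Ct values, for illustration only
for name, cts in {"lib1": [12.1, 12.3, 12.2], "lib2": [12.1, 12.3, 13.4]}.items():
    med, spread, ok = summarise_triplicate(cts)
    print(f"{name}: median Ct {med}, spread {spread:.1f}, {'OK' if ok else 'repeat'}")
```

Using the median rather than the mean means a single stray well, like the third replicate of the made-up lib2, does not drag the result with it, although a failed spread check is still a good reason to repeat the measurement.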

If you are quantifying by Qubit or Bioanalyzer, I recommend that you swap to qPCR as soon as possible - and I bet you will see a better pooling balance afterwards.

Finally, have a look at #3: Was the process of mixing the libraries robust?

A common mistake when pooling is to quantify your library, perform a dilution, and then assume the diluted library will be exactly the concentration you aimed for. Unfortunately this is only true if your original concentration is close to your goal. As a guideline, any dilution greater than 1:5 is unlikely to be sufficiently robust for multiplexing. Using small volumes during dilution steps can really exacerbate this problem.

The best practice for diluting highly concentrated libraries prior to pooling is to dilute them to a low value just higher than your goal, then re-quantify, then do a final small dilution to reach your goal. Use large volumes for your dilution steps, and keep your final dilution step as small as possible - and definitely less than 1:5. I often aim for a final 1:2 dilution step.

Consider this simple example:

Library A is at 100nM, so I dilute 1ul in 9ul of buffer to give me 10nM
Library B is at 300nM, so I dilute 1ul in 29ul of buffer to give me 10nM
Library C is at 600nM, so I dilute 1ul in 59ul of buffer to give me 10nM
I then mix 10ul of the diluted A, B and C.

Frankly, my pooling balance is going to be rubbish.

Here's what I should do instead:

Library A is at 100nM, so I dilute 10ul in 40ul of buffer to aim for 20nM, then I re-quantify and find out it is actually at 18nM. I mix 10ul of this with 8ul of buffer to give 10nM
Library B is at 300nM, so I dilute 10ul in 140ul of buffer to aim for 20nM, then I re-quantify and find out it is actually at 22nM. I mix 10ul of this with 12ul of buffer to give me 10nM
Library C is at 600nM, so I dilute 10ul in 290ul of buffer to aim for 20nM, then I re-quantify and find out it is actually at 15nM. I mix 10ul of this with 5ul of buffer to give me 10nM
I then mix 10ul of the diluted A, B and C

My pooling balance will be beautiful
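
The buffer volumes in the two-step example come from the usual C1 x V1 = C2 x V2 relationship. Here is a minimal Python sketch which reproduces them; the function name is mine, and the concentrations are the ones from the example.

```python
def buffer_volume_ul(stock_nM, target_nM, stock_volume_ul):
    """Volume of buffer to add to `stock_volume_ul` of library so that the
    mix ends up at `target_nM` (from C1 * V1 = C2 * V2)."""
    if not 0 < target_nM <= stock_nM:
        raise ValueError("target must be positive and no higher than the stock")
    return stock_volume_ul * (stock_nM / target_nM - 1)

# name: (starting nM, re-quantified nM after the first dilution)
libraries = {"A": (100, 18), "B": (300, 22), "C": (600, 15)}

for name, (start_nM, requant_nM) in libraries.items():
    first = buffer_volume_ul(start_nM, 20, 10)    # 10ul of library, aiming for 20nM
    final = buffer_volume_ul(requant_nM, 10, 10)  # 10ul of the re-quantified dilution, down to 10nM
    print(f"Library {name}: {first:.0f}ul buffer, then {final:.0f}ul buffer")
```

The key point is that the second call uses the re-quantified value, not the 20nM you were aiming for.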

For the true NGS novices out there, if you don't know how I calculated the dilution steps in the example above then check this out.

If you have checked #1, #2, and #3 and everything looks perfect, then get in touch with Illumina's tech support team (techsupport@illumina.com) or with your NGS provider.

Indexing 1: A Simple NGS Pooling How-To Guide

How do I pool my library at a defined concentration?

I get asked this a lot. Our current submission requirements are 10nM - 20nM in 15ul, but what does this mean? Is the total DNA concentration in the pool 10nM, and each individual library therefore much less? Or is it that each library within the pool is at a final concentration of 10nM?

Simply put, our submission guidelines IGNORE your indexes. Quantification and clustering cannot differentiate between indexes on a sample, so all we are interested in is the total quantity of DNA in your pool. So, for example, if you have five libraries in a pool, the final pool DNA concentration must be at least 10nM - which means that each library within that pool is at least 2nM.
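
To make the five-library example concrete, here is a tiny Python sketch. It assumes the libraries are mixed in equal volumes, and the function name is just for illustration.

```python
def pool_summary(library_nM):
    """For libraries mixed in equal volumes, return the total pool
    concentration and each library's contribution to it (all in nM)."""
    n = len(library_nM)
    contributions = [conc / n for conc in library_nM]
    return sum(contributions), contributions

# Five libraries, each diluted to 10nM before pooling
total, contributions = pool_summary([10, 10, 10, 10, 10])
print(f"Pool: {total}nM, each library: {contributions[0]}nM")
```

So a pool made from five 10nM libraries is itself 10nM, which is what the submission guideline measures, while each library contributes 2nM.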

Here is the simplest at-a-glance method to dilute and pool your libraries. For more detailed hints and tips read on to my next post!

Quantify and quality check all of your libraries
Select a goal concentration for pooling - at or below the lowest concentration of your set of libraries.
Make sure this is within our current submission guidelines.
Dilute all of your libraries to that concentration, using Illumina Resuspension Buffer, EB, or 10mM Tris pH 8.5 with 0.1% Tween.
Combine an equal volume of all of your libraries in your pool tube

Ta-da! You are ready to submit your pool for sequencing.
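
The at-a-glance method can be sketched in Python as follows. The 10nM - 20nM limits are the submission guidelines quoted in this post; the function name, the 20ul working volume and the example concentrations are assumptions made for illustration.

```python
def pooling_plan(library_nM, final_volume_ul=20.0, min_nM=10.0, max_nM=20.0):
    """Pick a goal concentration and work out one dilution per library.

    `library_nM` maps library name to its quantified concentration in nM.
    Returns (goal_nM, plan), where plan maps each name to
    (library volume in ul, buffer volume in ul) for `final_volume_ul` of diluted library.
    """
    # Goal: at or below the lowest library, and inside the guideline range
    goal_nM = min(min(library_nM.values()), max_nM)
    if goal_nM < min_nM:
        raise ValueError("lowest library is below the minimum submission concentration")
    plan = {}
    for name, conc in library_nM.items():
        library_ul = final_volume_ul * goal_nM / conc
        plan[name] = (library_ul, final_volume_ul - library_ul)
    return goal_nM, plan

# Hypothetical concentrations, for illustration only
goal, plan = pooling_plan({"A": 40, "B": 25, "C": 50})
for name, (library_ul, buffer_ul) in plan.items():
    print(f"{name}: {library_ul:.1f}ul library + {buffer_ul:.1f}ul buffer -> {goal}nM")
```

After diluting, you would combine an equal volume of each diluted library in the pool tube, as in the last step of the list.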

Friday, 19 September 2014

Science of the Yesteryear

Are you old enough to remember The Magic Roundabout, Hong Kong Phooey and Mr Benn? Did you finish your degree barely touching a computer? When you graduated was 'genomics' a mere glint in Fred Sanger's eye? If you answered 'yes' to any of these questions then you, like me, may feel befuddled by the dizzying speed of technological advances.

Don't despair, even when you say you went to 'Glastonbury' in 1990 and realise the app-savvy, linked-in, 'omics'-brains you are talking to weren't even born. If you have spent years in fusty, ill-funded labs only to stumble, blinded, into the light of modern science, here are my rules for survival:

1. Don't cry

2. Even if you don't know what the piece of data being flashed on the screen is telling you, you are still a good person

3. You are still making a contribution, however small

4. Don't waste money on expensive running shoes; you'll never be able to catch the latest advances

5. Accept your limits. You have fewer brain cells than you had when you were 20

6. It is inevitable that one day you will be replaced by a robot, but as the great Loudon Wainwright III said (young folk may be more familiar with his famous son, Rufus), 'at least you've been a has-been and not just a never-was.'

7. It's not your fault; you were born too soon.

Wednesday, 17 September 2014

qPCR quantification using our new Agilent Bravo robot

We have an exciting new instrument in the Genomics Core which will enable us to automate several of our protocols which until now have been quite labour intensive. This is the Bravo Robot by Agilent. Having had some previous experience with automation, I think the results we have seen so far are really quite promising for saving hands-on time and for improving consistency and accuracy within protocols.

qPCR test run - One of the first tests we ran upon installation of the Bravo was qPCR quantification. We quantify all libraries which are submitted to our sequencing service so we can aim to generate cluster densities on the flowcell which will yield large amounts of high quality data.

For this test, a single RNA-seq library was quantified using our standard method with the Illumina library quantification kit from Kapa Biosystems. To test the reproducibility of the robot, we performed qPCR on this one sample 24 times, as this is the maximum number of tubes which can be loaded per run.

In addition, the same 24 aliquots of this one library were set up in the same way but manually. This let us judge the liquid handling on the Bravo from the reproducibility of the 24 replicates, and also compare manual against automated set-up. The library had been quantified previously, so we expected the concentration to be 50nM.

Agilent Bravo Robot

The results show that there is higher variation in concentrations achieved manually although both methods slightly overestimated the expected concentration of 50nM. We saw an average concentration of 55.6nM manually (an 11% increase from 50nM) in comparison to a concentration of 52.7nM (a 5.5% increase from 50nM) on the Bravo.

Despite a difference of about 1.8% between the average concentrations from the manual and the automated set-up, we can see that the Bravo has yielded more consistent results, which is what we would expect and also good news. Since this test, we have been quantifying all SLX library submissions using the Bravo.

Although the robot will not be for general use and we will be unable to run individuals' qPCR, we will be using it for all qPCR quantification of libraries submitted to our NGS service and for generating RNA-seq libraries. Once these protocols become robust within our lab, we will explore the option of utilising the robot in other protocols, including exome library prep. It will additionally be used for automation of ChIP by the Odom group.