The Genomics Core blog: July 2014

Friday, 25 July 2014

PhiX Control - Phact or PhiXion?

Many of our users may have heard us talking about PhiX, or seen a percentage of their reads aligning to the PhiX genome in the Multi-Genome Alignment report (see figure below). This is a result of Illumina's recommendation to use the PhiX control (see TechNote) for troubleshooting and quality control purposes. Several features of PhiX make it a good NGS control: it has a small 5,386 bp genome, it has a balanced base composition (45% GC and 55% AT), and its library averages 375 bp, making it well suited to clustering and sequencing. The PhiX genome was also the first DNA genome ever to be sequenced.


The Multi-Genome Alignment report

Why PhiX helps on the sequencer:
We use the PhiX control to assess the quality of sequencing runs in Sequencing Analysis Viewer (SAV) (see image below for an example). Illumina ship PhiX control at 10 nM; we dilute it, denature it using our standard protocol and aliquot it ready to use before clustering. As Illumina suggest, we spike in 1% PhiX in lanes 1-7 and 5% in lane 8 of all our HiSeq runs, unless requested otherwise. We spike in 5% on our MiSeq runs, as we see more variable libraries being sequenced there. When checking run performance metrics in SAV, we look at cluster density, clusters passing filter (how many of the clusters are true clusters), error rate, phasing and pre-phasing, and the alignment rate, to check that the right amount of PhiX was spiked in and is aligning to the PhiX genome. Because PhiX is a known, well-behaved library, these results help us determine whether a problem with a poorly performing run or lane lies with the library or with the machine.
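The spike-in arithmetic and the alignment check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Core's protocol or SAV's internal logic: the function names are hypothetical, concentrations are assumed to be in nM for the diluted, denatured library and PhiX, the spike-in is treated as a molar fraction, and the +/-50% tolerance for flagging a lane is an arbitrary example. Only the 1% (lanes 1-7) and 5% (lane 8) HiSeq spike-in levels come from this post.

```python
# HiSeq spike-in levels from the post: 1% in lanes 1-7, 5% in lane 8.
EXPECTED_HISEQ_PCT = {lane: (5.0 if lane == 8 else 1.0) for lane in range(1, 9)}


def phix_volume_ul(total_ul, phix_fraction, library_nm, phix_nm):
    """Hypothetical helper: volume of PhiX (uL) so that PhiX makes up
    phix_fraction of the molecules in a mix of total_ul.

    Assumes library_nm and phix_nm are the concentrations (nM) of the
    diluted, denatured library and PhiX being mixed.
    """
    if not 0 <= phix_fraction < 1:
        raise ValueError("phix_fraction must be in [0, 1)")
    # Solve phix_nm*v / (phix_nm*v + library_nm*(total_ul - v)) = phix_fraction for v.
    return (phix_fraction * library_nm * total_ul) / (
        phix_nm * (1 - phix_fraction) + phix_fraction * library_nm
    )


def flag_phix_lanes(aligned_pct, expected_pct, tolerance=0.5):
    """Return lanes whose % reads aligned to PhiX is far from the spike-in.

    tolerance is a hypothetical relative threshold (0.5 means +/-50% of
    the expected percentage), not an Illumina specification.
    """
    flagged = []
    for lane, observed in aligned_pct.items():
        expected = expected_pct[lane]
        if abs(observed - expected) > tolerance * expected:
            flagged.append(lane)
    return flagged
```

With the library and PhiX at the same concentration, the PhiX volume is simply the target fraction of the total (1 uL of PhiX in 100 uL for a 1% spike-in); the general formula only matters when the two are diluted to different concentrations.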

PhiX helps our troubleshooting:
We check run performance at every stage possible: after the clustering step, to ensure fluidics delivery is equal across all lanes; at the first-base report, to check the run looks good at the start of sequencing; and then throughout the run via the run metrics. When troubleshooting, we look at the same metrics mentioned earlier in much more detail, along with many others such as %base and the images, to check that the library is balanced and that the machine is behaving. When a run hasn't performed as expected and we cannot work out the cause, we may also get Illumina involved and discuss the run with them.
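One of the checks above, that fluidics delivery is equal across lanes, can be approximated by comparing each lane's cluster density with the run median. The sketch below is hypothetical: the function name and the 20% default tolerance are illustrative choices, not SAV thresholds or Illumina specifications, and the densities are whatever your run reports (for example, K/mm2).

```python
from statistics import median


def uneven_lanes(cluster_density, max_rel_dev=0.2):
    """Flag lanes whose cluster density deviates from the run median.

    cluster_density maps lane number -> density (any consistent unit).
    max_rel_dev is a hypothetical tolerance: 0.2 flags lanes more than
    20% above or below the median of all lanes.
    """
    mid = median(cluster_density.values())
    return sorted(
        lane
        for lane, density in cluster_density.items()
        if abs(density - mid) > max_rel_dev * mid
    )
```

A single flagged lane with otherwise consistent densities points more towards a delivery or loading problem in that lane than towards the whole run.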

It can help get better results with funky libraries:
There are many different library prep methods, which makes it difficult to predict the performance of every sequencing run. Some methods, such as bisulfite sequencing, iCLIP, BLESS and amplicon sequencing, can produce "funky" libraries that might require up to 50% PhiX.

The Genomics Core recommend using a higher percentage of PhiX when:

You have a low-diversity library (under-clustering can also help here)
You have small amplicon pools
You are doing bisulfite sequencing, BLESS or amplicon sequencing
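Why PhiX helps with low-diversity libraries can be shown with a little arithmetic: if every read in an amplicon pool has the same base at a given cycle, that cycle's base composition is completely skewed, and mixing in PhiX pulls it back towards balance. This minimal sketch assumes PhiX contributes bases in proportion to its 45% GC / 55% AT content split evenly between complementary bases (A = T = 27.5%, C = G = 22.5%) at every cycle, which is an idealisation; the function names are hypothetical.

```python
from collections import Counter

# Assumption: PhiX's 45% GC / 55% AT split evenly between complementary bases.
PHIX_COMPOSITION = {"A": 0.275, "C": 0.225, "G": 0.225, "T": 0.275}


def cycle_composition(reads, cycle):
    """Fraction of reads showing each base at a given (0-based) cycle."""
    counts = Counter(read[cycle] for read in reads)
    total = sum(counts.values())
    return {base: counts.get(base, 0) / total for base in "ACGT"}


def mix_with_phix(library_comp, phix_fraction):
    """Base composition at one cycle after spiking in phix_fraction PhiX."""
    return {
        base: (1 - phix_fraction) * library_comp[base]
        + phix_fraction * PHIX_COMPOSITION[base]
        for base in "ACGT"
    }


reads = ["ACGTTT", "ACGAAA", "ACGCCC"]  # toy amplicon reads sharing a prefix
first_cycle = cycle_composition(reads, 0)  # all A
balanced = mix_with_phix(first_cycle, 0.5)
```

For an amplicon pool whose reads all start with A, a 50% PhiX spike-in brings A at the first cycle down from 100% to about 64% of clusters, with the other three bases now represented.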

Here is the Illumina product code if you would like to order PhiX:

PhiX control v3

FC-110-3001

Thursday, 10 July 2014

Agilent’s Clinical Meeting, Haloplex and SureSelect


I recently attended the Agilent Clinical Meeting in London, which featured some very informative presentations on the extent to which Next Generation Sequencing is aiding disease diagnosis and screening in the clinic. Clinical genetics labs need to be able to provide diagnostic tests that have a very rapid turnaround time and are also cost-effective.

Many diagnostic tests are currently based on Sanger sequencing of a specific gene. However, many speakers talked about how they are developing panels for targeted sequencing, as well as studying exonic regions by exome sequencing. Despite the advances of NGS in the clinic, it was clear that whole genome sequencing is where we want to be heading, to get a complete picture. Unfortunately, right now it is not affordable enough.

It was interesting to see at this conference how other labs are using Agilent’s enrichment and panel solutions, so I thought I would summarise the technologies here.

HaloPlex Target Enrichment

What is HaloPlex?

HaloPlex is a target enrichment system for the analysis of genomic regions of interest, aimed at studying large numbers of samples.

How does it work?
1. The workflow appears quite simple, starting with DNA fragmentation using restriction enzymes.

2. Probes designed to both ends of a DNA fragment are hybridised to it, forming circular DNA molecules.

3. A clean-up step using magnetic streptavidin beads captures only those fragments containing the biotinylated HaloPlex probes. The circular molecules are then closed by ligation.

4. Finally, PCR is used to amplify the targeted fragments ready for sequencing.
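The first and last HaloPlex steps can be illustrated with a toy in-silico digest: cut a sequence at restriction sites, then keep only the fragments that overlap a target region, standing in for the fragments that probes were designed against. This is a simplified sketch, not Agilent's design algorithm: the recognition site, the choice to cut immediately before each site, and the function names are all assumptions, and the real probe design is done in SureDesign.

```python
import re


def digest(sequence, sites):
    """Toy restriction digest returning half-open (start, end) fragments.

    Simplification: cuts immediately before every occurrence of any
    recognition site (real enzymes cut at enzyme-specific offsets).
    """
    cut_positions = {0, len(sequence)}
    for site in sites:
        # Lookahead finds overlapping occurrences of the site too.
        for match in re.finditer(f"(?={re.escape(site)})", sequence):
            cut_positions.add(match.start())
    cuts = sorted(cut_positions)
    return [(start, end) for start, end in zip(cuts, cuts[1:]) if end > start]


def select_targeted(fragments, target_start, target_end):
    """Keep fragments overlapping the half-open target interval,
    standing in for the fragments that probes would be designed to."""
    return [
        (start, end)
        for start, end in fragments
        if start < target_end and end > target_start
    ]


fragments = digest("AAGATCTTTTGATCCC", ["GATC"])
targeted = select_targeted(fragments, 5, 8)
```

For example, digesting "AAGATCTTTTGATCCC" at "GATC" gives fragments (0, 2), (2, 10) and (10, 16), and only (2, 10) overlaps a target spanning positions 5 to 8.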

What can I enrich for?
SureDesign software can be used to design these custom panels for specific genes or for thousands of exons of interest.

Several clinical labs described how HaloPlex technology is enabling them to design diagnostic tests that screen for specific disease-causing genes. Its popularity seemed to be down to the fast turnaround time made possible by the reduced amount of sample preparation required.

SureSelect Target Enrichment

What is SureSelect?

Agilent’s SureSelect technology lets you look at the whole exome or at a targeted panel. It has become a very useful tool for focusing on familial disease loci and for validating whole genome sequencing results.

How does it work?

The SureSelect workflow starts with shearing of gDNA, followed by library preparation that incorporates the adaptors required for sequencing and indexes for multiplexing. Regions of interest are then selected by a 24-hour hybridisation with biotinylated RNA library baits, followed by a clean-up step using magnetic streptavidin beads. The baits can be custom designed using Agilent’s SureDesign software. Finally, the captured regions are amplified by PCR, ready for sequencing.
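Hybrid capture can be thought of as an overlap test: a sheared fragment is pulled down if it overlaps a bait well enough to hybridise. The sketch below assumes sorted, non-overlapping bait intervals and a hypothetical minimum overlap in bases; real hybridisation efficiency depends on many more factors (sequence, GC content, bait design), so treat it as an illustration of the selection step and of the resulting on-target fraction, not a model of SureSelect chemistry. The function names are hypothetical.

```python
from bisect import bisect_left


def capture(fragments, baits, min_overlap=1):
    """Return fragments overlapping any bait by at least min_overlap bases.

    fragments and baits are half-open (start, end) intervals; baits must be
    sorted and non-overlapping, so their ends are sorted as well.
    """
    bait_starts = [start for start, _ in baits]
    kept = []
    for f_start, f_end in fragments:
        # Index of the last bait that starts before this fragment ends.
        i = bisect_left(bait_starts, f_end) - 1
        # Walk back while baits still end after the fragment starts.
        while i >= 0 and baits[i][1] > f_start:
            b_start, b_end = baits[i]
            if min(f_end, b_end) - max(f_start, b_start) >= min_overlap:
                kept.append((f_start, f_end))
                break
            i -= 1
    return kept


def on_target_fraction(fragments, baits, min_overlap=1):
    """Fraction of fragments that would be captured by the baits."""
    if not fragments:
        return 0.0
    return len(capture(fragments, baits, min_overlap)) / len(fragments)
```

Using binary search over the bait starts keeps each lookup fast even with the very large bait sets of an exome design.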

Exomes in the Genomics Core

Here in the Genomics Core, we are currently using Illumina’s Nextera Rapid Exome kit for exome sequencing and Fluidigm Access Arrays for generating libraries for targeted sequencing.

Agilent have recently released a new SureSelect kit, SureSelectQXT, which combines transposase-based library preparation with target enrichment. We have just received one of these kits and will soon be testing it in the lab.

Thursday, 3 July 2014

To seq or not to seq, that is the DGE question.

The most common question asked in Differential Gene Expression (DGE) experimental design meetings at the CI is: "Should we do RNA-seq or microarray processing?" It all boils down to what questions you want to answer and how the data will integrate into the bigger experiment. I have described some of the most commonly asked or discussed questions below, and hopefully this information will help you think about the direction you want to go.

Why are people doing RNA-seq?
Isn't RNA-seq really expensive?
What about analysis, does it take longer to analyse RNA-seq data?
Do I need as many replicates?
How long does it take?
How many samples can be processed at once?

1. Why are people doing RNA-seq? RNA-seq data has a greater dynamic range than microarray data. RNA-seq gives a digital readout (counting the number of reads), whereas a microarray gives an analogue readout (fluorescence units); this can be useful if you are looking at the extremes of expression. If you wish, you can also take your prepared library in the future and do a different type of sequencing and analysis to look for splice junctions and other transcriptional changes, although it is important to remember that wanting more than DGE needs a completely different experimental design. With microarray processing you are constrained by the design of the array and the species you want to explore; with RNA-seq there is no such constraint, as long as you have an adequate reference genome or transcriptome to align your data to. There are lots of technical reasons to choose one method over the other, but I do not think you can ignore the fact that RNA-seq is the new technology and people are choosing it because it is fashionable and may seem more attractive for publications.
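The digital-versus-analogue point can be made concrete with a toy comparison. Here read counts are idealised as exactly proportional to expression (no sampling noise), while the array signal is assumed to be linear with a hypothetical gain and background until it saturates at 65,535, the maximum of a 16-bit image. Under those assumptions a true 4-fold difference between two highly expressed genes is preserved in the counts but compressed on the array. All names and parameters are illustrative.

```python
import math

# Assumption: the scanner writes 16-bit images, so signal saturates at 2**16 - 1.
SCANNER_MAX = 65535


def array_signal(expression, gain=10.0, background=50.0):
    """Hypothetical analogue readout: background plus a linear response
    to expression, clipped at the scanner's maximum value."""
    return min(background + gain * expression, SCANNER_MAX)


def log2_ratio(a, b):
    """log2 fold change of a over b."""
    return math.log2(a / b)


# Idealised digital readout: counts proportional to expression.
counts_ratio = log2_ratio(20000, 5000)
# Analogue readout for the same two genes; the brighter one saturates.
array_ratio = log2_ratio(array_signal(20000), array_signal(5000))
```

With the default parameters, log2_ratio(20000, 5000) is exactly 2, while the corresponding array ratio is below 1 because the brighter spot has saturated.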

2. Isn’t RNA-seq really expensive? Currently, the biggest cost in sequencing is library preparation, and in the Core we are investigating alternative suppliers to reduce this cost. Nonetheless, sequencing costs are currently approximately the same as microarray costs for DGE analysis within the CIGC.
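The reason library preparation dominates the cost is simple arithmetic: each sample pays for its own library prep, but the cost of a sequencing lane is shared among every sample multiplexed on it. A minimal sketch, with hypothetical function names; no prices are given in this post, so any figures you plug in are your own.

```python
def cost_per_sample(library_prep_cost, lane_cost, samples_per_lane):
    """Per-sample cost: the sample's own library prep plus an equal
    share of the sequencing lane it is multiplexed on."""
    if samples_per_lane < 1:
        raise ValueError("samples_per_lane must be at least 1")
    return library_prep_cost + lane_cost / samples_per_lane


def library_prep_share(library_prep_cost, lane_cost, samples_per_lane):
    """Fraction of the per-sample cost that comes from library prep."""
    total = cost_per_sample(library_prep_cost, lane_cost, samples_per_lane)
    return library_prep_cost / total
```

The more samples share a lane, the larger the fraction of the per-sample cost that comes from library prep, which is why a cheaper library prep supplier has the biggest effect.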

3. What about analysis, does it take longer to analyse RNA-seq data? By its nature, RNA-seq generates more data, and processing it requires a significant amount of computing time. That aside, similarly stringent workflows and pipelines are in place for both methods to produce a comprehensive gene list for each comparison.
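As an illustration of the first step of a DGE comparison once reads have been counted per gene, the sketch below normalises counts to counts per million (CPM) and computes a log2 fold change between two groups. It is a simplified example with hypothetical names and an arbitrary pseudocount of 1; real pipelines add more robust normalisation and statistical testing across replicates (for example, with dedicated packages such as edgeR or DESeq2), and this is not the Core's pipeline.

```python
import math


def cpm(counts):
    """Counts per million for one sample, given {gene: read count}."""
    total = sum(counts.values())
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 of mean CPM in group_b over mean CPM in group_a, per gene.

    Each group is a list of samples ({gene: count}) sharing the same genes;
    the pseudocount (an arbitrary choice) avoids dividing by zero.
    """
    cpm_a = [cpm(sample) for sample in group_a]
    cpm_b = [cpm(sample) for sample in group_b]
    result = {}
    for gene in group_a[0]:
        mean_a = sum(sample[gene] for sample in cpm_a) / len(cpm_a)
        mean_b = sum(sample[gene] for sample in cpm_b) / len(cpm_b)
        result[gene] = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return result
```

Because CPM divides by each sample's total, a sample sequenced twice as deeply does not look twice as highly expressed.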

4. Do I need as many replicates? Yes. The design of a DGE experiment, including its replication requirements, remains similar for RNA-seq and microarray, so the number of replicates recommended will be the same for either method.

5. How long does it take? RNA-seq takes about the same amount of time to process samples in the lab as microarray samples. For both it takes just under a week to get to QC’ed cRNA (microarray) or normalised pooled libraries (RNA-seq). We process both protocols within the institute. However, as we no longer have a working microarray scanner on site, the guys at the Department of Pathology kindly perform the scanning step for us.

6. How many samples can be processed at once? Microarray project designs are constrained to multiples of 12 to get the most out of the consumables, due to the way they are manufactured. RNA-seq uses 96 individual indexes, so if you are processing up to 94 samples (we use 2 indexes for positive controls), all samples can be pooled together. It gets a little more complicated for larger projects, but this is also true for microarray processing.
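The batching rules above reduce to rounding up. This sketch uses the numbers from this post (arrays in multiples of 12, 96 indexes with 2 reserved for positive controls) and hypothetical function names; it ignores the extra complications of larger projects, such as balancing samples across pools.

```python
import math

ARRAY_BATCH = 12      # from the post: microarray designs work in multiples of 12
INDEXES = 96          # from the post: 96 individual indexes
CONTROL_INDEXES = 2   # from the post: 2 indexes used for positive controls


def microarray_slots(n_samples):
    """Array positions consumed, rounded up to a multiple of 12."""
    return math.ceil(n_samples / ARRAY_BATCH) * ARRAY_BATCH


def rnaseq_pools(n_samples):
    """Index pools needed when 94 indexes are free for samples in each pool."""
    per_pool = INDEXES - CONTROL_INDEXES
    return math.ceil(n_samples / per_pool)
```

For example, 30 samples would use 36 array positions, leaving 6 unused, while 94 samples fit in a single RNA-seq pool and 95 would need two.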