The Genomics Core blog: 2016

Monday, 24 October 2016

How do I submit my index information into Lablink?

When accepting sequencing submissions in the Genomics Core, there may be instances where we have to contact you if there is an error with your submission form. The most common problems relate to index information. We have put together some instructions here that we hope should make things easier and help us to get started on your sequencing as soon as we can.

Please only follow these instructions if:

· The index sequences you have used are visible in the index sequences tab of the submission form

· There are fewer than 384 samples within your pool.

If the points above are not true, please see section, unspecified index further on in this blog.

1. Completing the sample/reagent label field

1a. Navigate to the index sequences tab of the sample submission form.

1b. Search for your index sequences

1c. Copy the index name from column C of the index sequences tab, e.g A001-A005 to the column Sample/Reagent Label, of the submission form tab.

Figure 1-index sequences tab of the sample submission form

Figure2-submission form

2. Completing the UDF/Index type field

2a. Select the correct UDF/Index type from the drop down menu on the submission form tab.

IMPORTANT –please make sure the Index type field matches column B of the index sequences tab. This ensures that your library goes through our acceptance step. Please see the two following examples.

Example1- I am submitting a Truseq LT library consisting of 5 samples and used indexes A001-A005.

The sample/reagent label on the submission form should read A001-A005.
The UDF/index type should read Truseq LT.

Figure 3 - The index type field next to these indexes is Truseq LT so this is what should be entered into the UDF/Index type field.

Figure 4- Submission form

In most cases, the Index type will match to the indexes you have used as expected. However there are now multiple kits available which share the same indexes.
Because of this, there may be some cases where the index sequences you select will have a different Index type to the library you have made. (see example 2 below) This may affect you if you are submitting for Nextera XT or Nextera.

Example 2- I used indexes N701-N501, N702-N501, N703-N501, N704-N501. I prepared the libraries using a Nextera XT library prep kit.

The sample/reagent label on the submission form should read N701-N501, N702-N501, N703-N501, N704-N501. The UDF/index type should read Nextera and not Nextera XT. This is because the UDF/Index type needs to match column B on the index sequences tab.

Figure 5 - The index type field next to these indexes is Nextera so this is what should be entered into the UDF/Index type field

3. Unspecified Index

If your pool has index sequences not present in the index sequences tab OR if you have a pool which is made up of more than 384 samples, you will need to submit as unspecified index.

In the submission form:

· Sample/reagent label should read – unspecified

· UDF/Index type should read –Unspecified (other)

You should submit your pool as one row on the form. Libraries submitted as unspecified index cannot be demultiplexed by the Genomics Core but we do have a demultiplexing guide on lablink which should give some useful information.

Important-since we have no index sequence information, please write in the comments section of the form the index lengths for Index 1 and Index 2. Without this information your sequencing may be delayed whilst we contact you to check these parameters.
Once you have submitted your libraries, the Genomics Core would like to start working on your sequencing as soon as we can.

If the incorrect index type has been selected, we will need to delete your submission and we would ask you to submit again after making changes to your sample sheet and following the instructions above. Of course whilst this guide should be used to help you, we are always here to discuss this with you in person if you have any questions. Alternatively you can contact us on our helpdesk: genomics-helpdesk@cruk.cam.ac.uk

Sunday, 9 October 2016

Recent papers that the Genomics Core has helped with

I like to highlight some of the really interesting work we've been involved with, or that has come out of the Institute from time to time, and I recently updated our lab home page with links to a couple of papers. i thought I'd take the opportunity to write about them in a bit more detail here. Many of you will already know I run the Genomics Core facility at CRUKs Cambridge Institute. We do a lot of Illumina sequencing! The lab works on a huge number of projects for the research groups here in the Institute, and also across many groups in Cambridge via a long-running sequencing collaboration. We do do some R&D work in my lab, but >90% of our efforts are working with, or for, other research groups.

The papers:

Pereira et al Nature Communications 2016: The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes
Bruna et al Cell 2016: A Biobank of Breast Cancer Explants with Preserved Intra-tumor Heterogeneity to Screen Anticancer Compounds
Lensing et al Nature Methods 2016: DSBCapture: in situ capture and sequencing of DNA breaks).
Murtaza et al Nature Communications 2015: Multifocal clonal evolution characterized using circulating tumour DNA in a case of metastatic breast cancer

Highlights from the last years genomics research include work from the Caldas group who have completed three project over the lat year I've included here; 1) profiling of almost 2500 Breast Cancer patients for mutational analysis of 173 genes using a targeted pull-down (Pereira et al Nature Communications 2016); 2) cancer exomes from Murtaza et al,; 3) PDXs from Bruna et al.; and the Balasubramanian group who have shown that it is possible to capture and sequence double-strand DNA breaks (DSBs) in situ and directly map these at single-nucleotide resolution, enabling the study of DSB origin (Lensing et al. Nature Methods 2016). The rapid speed and unbiased nature of the genome-wide experiments being performed in the Institute, and often prepped and sequenced in the Genomics core continue to increase our understanding cancer biology.

Why is my HiSeq 2500 sequencing taking longer than usual

With the introduction of the HiSeq 4000 we're able to sequence faster and cheaper than ever before. But as we're transitioning the larger projects over to HiSeq 4000 a side-effect is fewer and fewer samples to run on HiSeq 2500; and as we're waiting for samples to fill the 8 lane flowcell that means longer wait times for you. We thought this post might help you determine if you still need to use HiSeq 2500, or if you can migrate over to HiSeq 4000. Most sequencing is taking under 2 weeks, but some people are now waiting up to one month for 2500 data.

Running a big RNA-seq project is easy(ish)

Last year we completed our largest ever RNA-seq project: 528 samples of TruSeq mRNA, 60 lanes of HiSeq 2500 SE50, 13 billion reads - and all in 16 weeks. Being able to do such a large project in such a short time and get high quality data from nearly all samples really demonstrates the robustness of RNA-seq. If you're thinking that a project larger than 96 samples might be too much to consider, then come and talk to us (and Bioinformatics) at a Tuesday afternoon experimental design meeting - and we'll convince you it can be a pretty smooth process.

We've been using Illumina's TruSeq mRNA-seq automated on our Agilent Bravo robot and the sequencing was done on HiSeq 2500, although we're currently moving to HiSeq 4000.

528 samples processed on six-plates of RNA-seq
QC lanes sequenced and analysed
60 lanes of SE50bp sequencing in total, 10 lanes per plate
12,918,018,345 PF reads for this project (215M reads per lane on average)
24M reads per sample on average
16 weeks from start to finish

This has been a large and complex project where we had lots of discussions along the way. I think that everyone involved has contributed to the success so far: the research group who asked us to do the project, my lab, and also our Bioinformatics Core. The ability to discuss the experiment at different stages, and to focus on QC issues as they arise really makes using the Cores a great place to do your projects.

Sunday, 7 February 2016

Our first paper on the bioRxiv

I just uploaded our paper, which has also been submitted to BioTechniques, onto the bioRxiv preprint server. The work we present comes from an idea I had shortly after first using Agilent's BioAnalyser in 2000. I was blown away by this piece of technology that has become the de facto standard for RNA QC, and has also pretty much replaced gel electrophoresis for DNA fragment analysis in NGS applications. When launched in 1999, it was the only microfulidics instrument for biology applications. The idea was a simple one: can bioanalyser chips be swapped between assays?