Friday, 15 July 2016

Why is my HiSeq 2500 sequencing taking longer than usual?

With the introduction of the HiSeq 4000 we're able to sequence faster and cheaper than ever before. But as we transition the larger projects over to HiSeq 4000, a side-effect is that fewer and fewer samples are left to run on HiSeq 2500. Because we wait for enough samples to fill the 8-lane flowcell, that means longer wait times for you. We thought this post might help you determine whether you still need to use HiSeq 2500 or can migrate over to HiSeq 4000. Most sequencing is taking under 2 weeks, but some people are now waiting up to one month for 2500 data.

We bought the new instruments in Genomics to do large RNA-seq gene expression and exome projects. The HiSeq 4000 has an increased maximum read-length (PE150 vs PE125 on HiSeq 2500) and increased cluster density (312M clusters vs 250M on HiSeq 2500), so users can expect to see lower costs for sequencing. As a guide, expect to run the following number of samples per application:

  • Genomes - 6 human genomes (30x coverage) per flowcell in just 3 days.
  • Exomes - 90 Nextera exomes (4Gb per exome) per flowcell in under 2 days.
  • RNA-seq - 125 mRNA-seq DGE (20M reads per sample) per flowcell in under 2 days.
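
As a rough sanity check on the RNA-seq figure, the arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats the 312M and 250M cluster counts as per-lane figures, assumes an 8-lane flowcell on both instruments, and counts one cluster as one read towards the 20M per sample. The function name is made up for this example.

```python
def samples_per_flowcell(clusters_per_lane, lanes, reads_per_sample):
    """Return how many whole samples fit on one flowcell.

    Assumes every cluster yields one usable read for a sample,
    which ignores filtering losses and uneven pooling.
    """
    total_clusters = clusters_per_lane * lanes
    return total_clusters // reads_per_sample


# Assumption: cluster counts are per lane, with 8 lanes per flowcell.
hiseq_4000 = samples_per_flowcell(312_000_000, 8, 20_000_000)
hiseq_2500 = samples_per_flowcell(250_000_000, 8, 20_000_000)

print("HiSeq 4000 mRNA-seq samples per flowcell:", hiseq_4000)
print("HiSeq 2500 mRNA-seq samples per flowcell:", hiseq_2500)
```

Under those assumptions the HiSeq 4000 comes out at 124 samples, in line with the guide figure of 125, against 100 for a HiSeq 2500 flowcell.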

How do the instruments differ: The HiSeq 4000 performs very well for RNA-seq and exomes. Data are highly comparable, and for new projects you should certainly migrate to HiSeq 4000. If you are in the middle of a project it is probably worth a discussion to decide on the best time to switch, or how to mitigate the longer wait times on HiSeq 2500.

The main differences between the machines are the clustering chemistry (either random clusters or patterned flowcells) and the sequencing chemistry (either the original 4-colour SBS, or the NextSeq-only 2-colour version). The amount of data they each generate and the costs also vary, so I've listed them below.

Costs are based on paired-end 150bp reads for equivalence
The NextSeq is the easiest system to run instead of the 2500, as it takes a single sample or pool per flowcell and generates about the same amount of data as a HiSeq 4000 lane; however, it uses a different sequencing chemistry. Rapid runs cost the most but "should" generate data almost identical to the normal 8-lane flowcells.

Get in touch via the HelpDesk if you have any questions, or pop down for a chat.

PS: Want to know more about HiSeq 4000? Then read this post on my personal blog - (almost) everything you wanted to know about @illumina HiSeq 4000...and some stuff you didn't