Sequencing

Whether you are a novice or an expert in next-generation sequencing (NGS), GTAC can help you design a study, prepare DNA or RNA libraries, sequence samples, and analyze the data.

GTAC Sequencing FAQs


I'm not familiar with next-generation sequencing. Can the GTAC help me with experiment design?


We would be happy to discuss the details of your experiment with you. To begin the discussion, please submit a Project Inquiry form.
For help with rigorous statistical design or maximization of patient cohorts, contact The Department of Statistical Genomics.


How long does it take to produce data from a sample?


This depends on the type and scale of the project. In general, the following ranges give a sense of how long each step can take:

  • Library prep: 1-6 days, depending on the prep type and sample size
  • Sequencing: 1-4 days, depending on the read length
  • Analysis: 2-10 days, depending on the project

Samples in the queue ahead of yours can lengthen these turnaround times. When the project is submitted, we can provide an estimate of how long it will take.
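As a rough illustration, the per-step ranges above can be added up to bound end-to-end turnaround. This is a minimal sketch that assumes the steps run back to back and ignores time spent in the queue; the names `STEP_RANGES_DAYS` and `total_turnaround` are hypothetical.

```python
# Hypothetical helper: per-step turnaround ranges (days) quoted in the FAQ.
# Assumes steps run back to back; time spent in the queue is NOT included.
STEP_RANGES_DAYS = {
    "library prep": (1, 6),
    "sequencing": (1, 4),
    "analysis": (2, 10),
}


def total_turnaround(steps):
    """Return (min_days, max_days) summed over the named steps, excluding queue time."""
    low = sum(STEP_RANGES_DAYS[step][0] for step in steps)
    high = sum(STEP_RANGES_DAYS[step][1] for step in steps)
    return low, high


# Full service (library prep + sequencing + analysis): (4, 20)
print(total_turnaround(STEP_RANGES_DAYS))
# Self-prepared library (sequencing + analysis only): (3, 14)
print(total_turnaround(["sequencing", "analysis"]))
```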


Can I prepare my own library?


Yes, libraries can be submitted directly for sequencing. We recommend submitting libraries at a concentration of 10 nM in a minimum volume of 20 µl. For libraries constructed outside GTAC, we cannot guarantee sequencing results.
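If your library is more concentrated than 10 nM, the standard C1 × V1 = C2 × V2 dilution gives the volumes to mix. A minimal sketch, assuming you already know the library's molar concentration; `dilution_volumes` is a hypothetical helper, and the diluent is whatever buffer your own protocol specifies.

```python
# Hypothetical helper: C1 * V1 = C2 * V2 dilution of a library to the
# recommended submission concentration (10 nM) and minimum volume (20 µl).
MIN_SUBMISSION_UL = 20.0


def dilution_volumes(stock_nM, target_nM=10.0, final_ul=MIN_SUBMISSION_UL):
    """Return (library_ul, diluent_ul) needed to reach target_nM in final_ul."""
    if final_ul < MIN_SUBMISSION_UL:
        raise ValueError("final volume is below the 20 µl minimum")
    if stock_nM < target_nM:
        raise ValueError("stock is already below the target concentration")
    library_ul = target_nM * final_ul / stock_nM  # V1 = C2 * V2 / C1
    return library_ul, final_ul - library_ul


# A 40 nM stock: 5 µl library + 15 µl diluent gives 20 µl at 10 nM
print(dilution_volumes(40.0))
```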


If I am making my own library, how should I QC my final library?


We suggest running your samples on a DNA Bioanalyzer chip to check the size distribution and using a fluorometric method such as Qubit to measure the concentration. Together these give a close, but not perfect, estimate of the molar concentration of fragments in your sample. The best way to determine the concentration of your library is qPCR, which gives the most accurate estimate of the number of adapter-ligated fragments in your sample.
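The Qubit-plus-Bioanalyzer estimate converts a mass concentration into a molar one using the average fragment length. A minimal sketch, assuming the conventional average mass of 660 g/mol per double-stranded base pair; `library_nM` is a hypothetical name, and qPCR remains the more accurate measure of adapter-ligated fragments.

```python
# Hypothetical helper: convert a fluorometric (e.g. Qubit) mass concentration
# into a molar concentration using the average fragment size from a
# Bioanalyzer trace. 660 g/mol is the conventional average mass of one
# double-stranded DNA base pair.
BP_MASS_G_PER_MOL = 660.0


def library_nM(conc_ng_per_ul, avg_fragment_bp):
    """Approximate molarity (nM) of a dsDNA library."""
    grams_per_liter = conc_ng_per_ul * 1e-3           # 1 ng/µl = 1 mg/L
    grams_per_mol = BP_MASS_G_PER_MOL * avg_fragment_bp
    return grams_per_liter / grams_per_mol * 1e9      # mol/L -> nM


# 3.3 ng/µl of a library averaging 500 bp is about 10 nM
print(round(library_nM(3.3, 500), 2))
```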
 
Please note that peaks at roughly 120-140 bp indicate adapter dimers in your sample. These will show up in the sequence data, because the small fragments bind and amplify on the flowcell. Smaller peaks indicate primer dimers, which can cause you to miscalculate the concentration of your final library. We recommend purifying the sample with a 1.0X ratio of AMPure XP beads to remove them.
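The peak-size rules above can be written as a simple check on a list of Bioanalyzer peak sizes. This is a hypothetical sketch: it assumes you have already read the peak sizes off the trace, and it treats any peak below 120 bp as a primer dimer, which is our reading of "smaller peaks". For a ratio-based AMPure XP cleanup, 1.0X means a bead volume equal to the sample volume.

```python
# Hypothetical QC helper. Size ranges follow the FAQ: ~120-140 bp peaks are
# adapter dimers; anything smaller is treated as a primer dimer (assumption).
ADAPTER_DIMER_BP = (120, 140)


def classify_peaks(peak_sizes_bp):
    """Group Bioanalyzer peak sizes (bp) into dimer and library peaks."""
    low, high = ADAPTER_DIMER_BP
    groups = {"primer_dimer": [], "adapter_dimer": [], "library": []}
    for size in peak_sizes_bp:
        if size < low:
            groups["primer_dimer"].append(size)
        elif size <= high:
            groups["adapter_dimer"].append(size)
        else:
            groups["library"].append(size)
    return groups


def needs_cleanup(peak_sizes_bp):
    """True if any dimer peak is present, suggesting a bead cleanup."""
    groups = classify_peaks(peak_sizes_bp)
    return bool(groups["primer_dimer"] or groups["adapter_dimer"])


def bead_volume_ul(sample_ul, ratio=1.0):
    """Bead volume for a ratio-based cleanup (1.0X = equal to sample volume)."""
    return sample_ul * ratio


print(classify_peaks([60, 128, 350]))
print(needs_cleanup([350]), bead_volume_ul(50.0))
```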

We have also seen instances where the ratio of template to primer is not optimal in the amplification steps, resulting in a bimodal or trimodal size distribution. These samples can be sequenced if the entire distribution is below ~700 bp, if the higher-molecular-weight material is removed to give a more optimal size range, or if you compensate by reducing the amount of library loaded onto the sequencer. However, this artifact can bias your data whether or not you selectively remove the larger population from the sequencing library. With longer read lengths, this distribution can also produce artificial splice sites. If at all possible, we suggest repeating the amplification step with more primer, less template, or fewer cycles.

If you generate your sequencing library by PCR (without an adapter ligation step), we have seen instances where a significant amount of template remains, which can throw off the measured concentration and size of your library. We recommend optimizing your PCR to minimize the amount of template required.
 


What is the difference between paired end and single read sequencing runs?


Single-read sequencing generates data from only one end of each fragment, which is ideal for transcript identification in RNA-seq and for ChIP-seq. Paired-end sequencing, by contrast, generates reads from both ends of each DNA fragment. Because the two reads come from a single molecule, they provide linking information that is useful for assembly and for analyses that require reads to be accurately mapped back to a large reference genome.
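To illustrate the linking information a read pair carries, the sketch below infers the length of the original fragment from where the two mates align. It is a hypothetical example using 1-based, inclusive reference coordinates for two mates that map to the same chromosome in the expected orientation; aligners report this quantity more carefully, for example as the template length (TLEN) field in SAM output.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical mate alignment: 1-based, inclusive reference coordinates."""
    start: int
    end: int


def fragment_span(mate1: Alignment, mate2: Alignment):
    """Return (fragment_length, unsequenced_gap) for a properly paired read pair.

    The gap is negative when the two mates overlap.
    """
    left = min(mate1.start, mate2.start)
    right = max(mate1.end, mate2.end)
    fragment_length = right - left + 1
    sequenced = (mate1.end - mate1.start + 1) + (mate2.end - mate2.start + 1)
    return fragment_length, fragment_length - sequenced


# 2x150 reads at 1001-1150 and 1251-1400: a 400 bp fragment with a 100 bp gap
print(fragment_span(Alignment(1001, 1150), Alignment(1251, 1400)))
```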


Will GTAC help me analyze my data?


Yes, the GTAC provides analytical support to its customers. A list of the services that we provide and the deliverable data that you will receive can be found here: https://gtac.wustl.edu/services/sequencing/


What is the format of the data and analysis that I will receive?


Data formats and analysis deliverables for each service are listed at https://gtac.wustl.edu/services/sequencing/


What if I need more assistance with my analysis?


If you need assistance beyond the analysis GTAC offers, we recommend contacting the Center for Biomedical Informatics. They will work with you either as collaborators or on a fee-for-service basis, and can be contacted via their website: http://cbmi.wustl.edu/


I don’t have a high-powered computer to analyze my data. Can I get access to one?


Yes, for Washington University investigators the Center for High Performance Computing can offer cluster services.


Where can I read more about next generation sequencing?

What loading concentration should I use?


There are a few variables that dictate the optimal loading concentration. First, each instrument has different recommendations. On the HiSeq 2500, we have found that a loading concentration of 12-14 pM works well for libraries we have prepared, which have a balanced base composition, balanced indexes, and an optimal fragment size range. For submissions from new labs, we recommend starting at a lower loading concentration (8-10 pM) and working up toward the optimal point: a lower-than-ideal concentration still provides usable data, just fewer reads, whereas a loading concentration that is too high can dramatically reduce the number of clusters that pass filter and leave you with very few reads to work with. We also suggest starting at a much lower loading concentration for samples with a skewed base composition, samples using in-line barcodes, and samples with a broad size distribution. For MiSeq runs, we typically use a concentration of 10 pM, and for the HiSeq 3000, 250 pM.
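The starting points described above can be collected into a small lookup. This is a hypothetical sketch that encodes only the numbers quoted in this answer; it deliberately does not pick a value for the "much lower" cases, because no number is given for them, and it is not a substitute for discussing loading with GTAC.

```python
# Hypothetical lookup of the loading concentrations (pM) quoted in the FAQ.
HISEQ_2500_OPTIMAL_PM = (12, 14)    # balanced GTAC-prepared libraries
HISEQ_2500_NEW_LAB_PM = (8, 10)     # conservative start for new submitters
TYPICAL_PM = {"MiSeq": (10, 10), "HiSeq 3000": (250, 250)}


def starting_loading_pm(instrument, new_submitter=False):
    """Return the (low, high) starting loading concentration in pM."""
    if instrument == "HiSeq 2500":
        return HISEQ_2500_NEW_LAB_PM if new_submitter else HISEQ_2500_OPTIMAL_PM
    if instrument in TYPICAL_PM:
        return TYPICAL_PM[instrument]
    raise ValueError(f"no loading guidance for {instrument!r}")


def start_much_lower(skewed_base_composition=False, inline_barcodes=False,
                     broad_size_distribution=False):
    """True if the FAQ advises starting well below the usual concentration."""
    return skewed_base_composition or inline_barcodes or broad_size_distribution


print(starting_loading_pm("HiSeq 2500", new_submitter=True))  # (8, 10)
print(start_much_lower(inline_barcodes=True))                  # True
```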


Can I run multiple experimental samples on a single lane of sequencing?


Yes, we routinely multiplex experimental samples. For our standard library prep service, we have run up to 96 samples in one lane using indexing. For amplicon sequencing through our PCR core services, we can accommodate up to 384 samples in a single lane. Please be aware that we do not recommend mixing different library types on the same run: samples with different size ranges or prepared by different methods can cluster with different efficiencies, resulting in unexpected read numbers across samples.
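When planning a multiplexed lane, dividing the lane's output among the pooled samples gives a first-pass read budget. A minimal sketch, assuming perfectly even pooling (in practice read numbers vary between samples, as noted above) and using per-lane figures from the instrument comparison at the end of this page; the helper names are hypothetical.

```python
# Hypothetical helpers for budgeting reads across indexed samples in one lane.
# Assumes an evenly balanced pool; real per-sample counts will vary.


def reads_per_sample(reads_per_lane, n_samples):
    """Expected reads per sample if the lane is split evenly."""
    return reads_per_lane / n_samples


def max_samples_per_lane(reads_per_lane, target_reads_per_sample):
    """Largest number of samples that each still reach the target read count."""
    return reads_per_lane // target_reads_per_sample


# 96 samples in a 150-million-read lane: about 1.56 million reads each
print(reads_per_sample(150_000_000, 96))
# 20 million reads per sample in a 312-million-read lane: 15 samples
print(max_samples_per_lane(312_000_000, 20_000_000))
```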


What is indexing, and how does it work?


Multiplexing with indexes is a method for tagging samples during Illumina library preparation so that they can be pooled, sequenced together in a single lane, and then separated and analyzed individually after sequencing is complete. Indexing lets you use the full capacity of a lane even if a single sample needs only a fraction of it. For most commercial kits, the tag added during library construction is a unique 6-8 bp DNA sequence. After read 1, an index sequencing primer is used to read the tag. Both single-read and paired-end samples can be indexed.

We recommend this method of multiplexing over in-line barcodes, where the first bases of read 1 are your tag. The HiSeq software relies on an even base composition in the first four bases to identify cluster positions. If the base composition is not well balanced across your barcodes, the software cannot identify clusters reliably, and many of your potential reads will be filtered out of the data.

Illumina instruments also support dual indexing, in which a different tag is placed on each end of the fragment. The instrument reads the i7 index as read 2 and the i5 index as read 3; for paired-end runs, the paired read is read 4.
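The demultiplexing step and the base-balance concern described above can be sketched in a few lines. This is a hypothetical illustration, not Illumina's software: it assigns each read to the sample whose index is within one mismatch (an assumed tolerance), sends unmatched or ambiguous reads to an "undetermined" bin, and flags in-line barcode positions among the first four where every barcode carries the same base, the most extreme form of imbalance.

```python
def hamming(a, b):
    """Number of mismatched positions (a length difference counts as mismatches)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def demultiplex(read_pairs, sample_indexes, max_mismatches=1):
    """Split (index_read, insert_read) pairs into per-sample bins.

    max_mismatches=1 is an assumed tolerance for illustration.
    """
    bins = {name: [] for name in sample_indexes}
    bins["undetermined"] = []
    for index_read, insert_read in read_pairs:
        hits = [name for name, index in sample_indexes.items()
                if hamming(index_read, index) <= max_mismatches]
        # Exactly one match is required; zero or several go to "undetermined".
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(insert_read)
    return bins


def uniform_positions(inline_barcodes, n_positions=4):
    """Positions among the first n where all barcodes share a single base."""
    return [pos for pos in range(n_positions)
            if len({bc[pos] for bc in inline_barcodes}) == 1]


pairs = [("ACGTAC", "r1"), ("ACGTAA", "r2"), ("GGGGGG", "r3"), ("TGCATG", "r4")]
print(demultiplex(pairs, {"s1": "ACGTAC", "s2": "TGCATG"}))
print(uniform_positions(["AACG", "ATCC", "AGCA"]))  # [0, 2]
```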


How much sequencing coverage do I need for my experiment?


Coverage refers to sequencing a DNA target until the total sequence obtained equals a certain multiple of the target's size. For example, 1-fold coverage of a 3 Mb genome means 3 million base pairs are sequenced, and 6-fold coverage means 18 million base pairs are sequenced. The point of multiple-fold coverage is to distinguish errors introduced by PCR or sequencing from true variants.

  • Whole-genome sequencing: 30X of the genome size
  • Targeted or exome capture: 200X of the target size
  • RNA-seq: for identifying transcripts rather than variants, only 1X coverage of the transcriptome is required; higher coverage is suggested for low-abundance transcript analysis, transcript discovery, and alternative splicing analysis
  • ChIP-seq: depends on the transcription factor or protein of interest; human transcription factors usually require 10 to 50 million sequence reads
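The coverage arithmetic above follows the relationship coverage = (number of reads × read length) / target size. A minimal sketch reusing the 3 Mb example; the helpers are hypothetical and give raw coverage only, before duplicates or unmapped reads reduce it. Each read of a pair counts separately, so a 2x150 pair contributes 300 bases.

```python
# Hypothetical helpers for raw sequencing coverage:
#   coverage = reads * read_length / target_size
# Each read in a pair counts separately (a 2x150 pair contributes 300 bases).


def bases_needed(target_bp, fold_coverage):
    """Total sequenced bases required to reach the requested fold coverage."""
    return target_bp * fold_coverage


def reads_needed(target_bp, fold_coverage, read_length_bp):
    """Reads required, rounded up to a whole read."""
    return -(-bases_needed(target_bp, fold_coverage) // read_length_bp)


def coverage(n_reads, read_length_bp, target_bp):
    """Fold coverage achieved by n_reads of the given length."""
    return n_reads * read_length_bp / target_bp


print(bases_needed(3_000_000, 6))         # 18000000, as in the 6-fold example
print(reads_needed(3_000_000, 6, 150))    # 120000 reads of 150 bp
print(coverage(120_000, 150, 3_000_000))  # 6.0
```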


What is the difference between the instruments?


The HiSeqs and the MiSeq use the same sequencing-by-synthesis chemistry. They differ in flowcell capacity, available read lengths, and run times. The HiSeq 3000 uses patterned flowcell technology, which allows many more clusters to be analyzed in a shorter time. The HiSeq 2500 allows large data outputs and offers the most flexibility with its two-lane flowcells. The MiSeq offers the longest read lengths and handles low-diversity samples best.

Specification         HiSeq 3000     HiSeq 2500         MiSeq
Reads per lane        312 million    150 million        15 million
Read lengths          2x150          2x150              2x250
Run time              3.5 days       40 hours           40 hours
Lanes per flowcell    8              2                  1
Flowcells per run     1 at a time    2 simultaneously   1 at a time
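Per-run capacity follows from multiplying the instrument comparison's rows together. A minimal sketch using only the figures in that comparison; it assumes every lane on every flowcell in a run is used, and the helper names are hypothetical.

```python
# Figures from the instrument comparison:
# (reads per lane, lanes per flowcell, flowcells per run)
INSTRUMENTS = {
    "HiSeq 3000": (312_000_000, 8, 1),
    "HiSeq 2500": (150_000_000, 2, 2),
    "MiSeq": (15_000_000, 1, 1),
}


def reads_per_run(instrument):
    """Total reads if every lane of every flowcell in the run is used."""
    per_lane, lanes, flowcells = INSTRUMENTS[instrument]
    return per_lane * lanes * flowcells


def lanes_needed(instrument, total_reads):
    """Whole lanes required to reach total_reads (ceiling division)."""
    per_lane = INSTRUMENTS[instrument][0]
    return -(-total_reads // per_lane)


for name in INSTRUMENTS:
    print(name, reads_per_run(name))
print(lanes_needed("HiSeq 2500", 400_000_000))  # 3
```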