GENOME TECHNOLOGY ACCESS CENTER • DEPARTMENT OF GENETICS • WASHINGTON UNIVERSITY SCHOOL OF MEDICINE

Bioinformatics NextGen Analysis

The sequence analysis group performs a number of analyses, which are included in the price of sequencing, to assist customers with their research needs. The goal of our analyses is to empower investigators by making next generation sequencing accessible to all users. We do this by performing complex and time-consuming steps on a large computational cluster so researchers can investigate specific hypotheses on their personal computers. To request examples of our standard exports, please contact genetics-gtac@email.wustl.edu.

Analysis Pipelines

DNA Sequence Analysis

GTAC supports several DNA analysis pipelines that can be used for whole genome, exome, or targeted capture data. Generally, we use NovoAlign for sequence alignment, and Samtools and Picard Tools for processing the alignments and removing or marking duplicate reads. Variants can be detected using GATK v4 (human and mouse), Sentieon, Samtools, or FreeBayes. Sequence variants are returned in a standard VCF file. The variants can be annotated using SnpEff or ANNOVAR to include information from many biological databases. The annotation and formatting of the variants can be customized to fit your needs. Additional downstream analyses, or filtering of SNPs against other samples, may be available at a nominal rate.
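
For investigators who want to inspect a returned VCF file on their own computer, the following is a minimal Python sketch of reading the eight fixed columns of one VCF data line. The function name parse_vcf_line and the example record are hypothetical and made up for illustration; they are not part of the GTAC pipeline, and real projects would normally rely on an established VCF library.

```python
def parse_vcf_line(line):
    """Split one tab-delimited VCF data line into its eight fixed fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),                       # 1-based position
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),                 # multiple ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


# Hypothetical record, invented for illustration only.
example = "chr1\t12345\t.\tA\tG,T\t50.0\tPASS\tDP=30;DB"
record = parse_vcf_line(example)
```

Header lines (those beginning with "#") are not handled here and would need to be skipped before calling the function.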


RNA-Seq

GTAC offers a comprehensive analysis of RNA-seq data that includes multiple levels of quality control, differential expression, and pathway analysis. Sequence reads are aligned to the reference genome using STAR, and gene counts are derived from uniquely aligned reads using featureCounts from the Subread package. To identify differentially expressed genes, the gene counts are imported into R. TMM normalization factors are calculated using edgeR to adjust for differences in sample read numbers, and differential expression analysis is performed in limma with voom. Pathways significantly altered in sample groups are detected using GAGE with the Gene Ontology (GO) and KEGG pathways.
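
To illustrate why raw gene counts must be adjusted for differences in sample read numbers, here is a small Python sketch of counts-per-million scaling. This is a deliberately simplified stand-in and is not the TMM method used in the pipeline; the function name counts_per_million, the gene names, and the counts are all hypothetical.

```python
def counts_per_million(gene_counts):
    """Scale raw gene counts by the sample's total count (library size)."""
    library_size = sum(gene_counts.values())
    if library_size == 0:
        raise ValueError("sample has no counted reads")
    return {
        gene: count * 1_000_000 / library_size
        for gene, count in gene_counts.items()
    }


# Hypothetical samples: sample_b was sequenced twice as deeply as sample_a,
# but the relative expression of the two genes is identical.
sample_a = {"geneX": 50, "geneY": 150}
sample_b = {"geneX": 100, "geneY": 300}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
```

After scaling, both samples give the same values for each gene, so the difference in sequencing depth no longer looks like a difference in expression.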

For more information see:
STAR
edgeR
Sailfish
Subread


ChIP-Seq

Sequence reads are aligned to the reference genome using NovoAlign. Peaks are called for each IP and input pairing using MACS2, and the samples are assessed using ChIPQC. MACS2 is capable of detecting the narrow binding signals of transcription factors and the broad signals of histone modifications. The called peaks are annotated with information about the nearest transcription start site. For experiments with biological replicates, a statistical test for differential binding between conditions is performed using the DiffBind Bioconductor package. The pipeline also supports the IDR and PePr methods of creating a consensus peak set for one condition with biological replicates.
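
The nearest transcription start site annotation can be pictured with the short Python sketch below. It is an illustration only, not the annotation code used by the pipeline: the function name nearest_tss and the coordinates are hypothetical, and the sketch assumes a single chromosome and ignores strand.

```python
import bisect


def nearest_tss(peak_summit, tss_positions):
    """Return (tss, signed_distance) for the TSS closest to a peak summit.

    tss_positions must be a non-empty, sorted list of coordinates on the
    same chromosome as the peak. A negative distance means the summit lies
    at a lower coordinate than the TSS.
    """
    i = bisect.bisect_left(tss_positions, peak_summit)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = tss_positions[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda tss: abs(tss - peak_summit))
    return best, peak_summit - best


# Hypothetical TSS coordinates, invented for illustration only.
tss_positions = [1000, 5000, 9000]
annotation = nearest_tss(4200, tss_positions)
```

Sorting the start sites once and using a binary search keeps each lookup fast even when there are many peaks to annotate.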

16S Analysis

GTAC offers 16S rRNA profiling using QIIME or MVRSION (Multiple 16S Variable Region Species-Level IdentificatiON), a method developed in-house. MVRSION identifies and quantifies members of bacterial communities through simultaneous analysis of multiple variable regions of the bacterial 16S rRNA gene. Analyzing several regions together improves the sensitivity and specificity of bacterial species identification and provides a more accurate assessment of relative abundance in the population. MVRSION combines high-throughput microfluidics for PCR amplification, short read DNA sequencing, and a custom algorithm for optimizing taxonomic assignment. The main output of MVRSION is a table that lists the bacterial species detected in the sample along with their relative abundance estimates. These estimates are fed into QIIME for visualization and statistical analysis of the bacterial communities.
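
As a rough illustration of what a relative abundance table contains, the Python sketch below converts per-species read counts into fractions of the sample total. It shows only this final arithmetic and does not reproduce the MVRSION taxonomic assignment algorithm; the function name relative_abundance, the species, and the counts are hypothetical.

```python
def relative_abundance(species_counts):
    """Convert per-species read counts into fractions of the sample total."""
    total = sum(species_counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {species: count / total for species, count in species_counts.items()}


# Hypothetical counts, invented for illustration only.
species_counts = {
    "Escherichia coli": 60,
    "Bacteroides fragilis": 30,
    "Lactobacillus crispatus": 10,
}
abundance = relative_abundance(species_counts)
```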

Single Cell RNA-seq

We support multiple single cell RNA-seq protocols. Our most popular scRNA-seq protocol is the 10X Genomics single cell gene expression platform. We also support Drop-seq and FACS-seq gene expression analysis. 10X scRNA-seq gene expression data are processed with 10X’s Cell Ranger software. Drop-seq and FACS-seq data are processed using variations of our in-house RNA-seq pipeline.


Additional Analysis

We also perform some basic manipulations that allow more advanced users flexibility with downstream analyses:
 

Demultiplexing

Modern sequencers produce tremendous amounts of sequence in a single lane, so samples are often mixed in a single lane in a process called multiplexing. During library construction, each sample is tagged with a unique index sequence that is part of the sequencing adapter. Dual indexing (using two indexes) is also commonly used to tag samples. The informatics team “demultiplexes” the sequence data, assigning each read to the appropriate sample. Demultiplexing (demuxing) is included in all of the analytic pipelines. Single and dual indexes are supported, but “barcodes” (index sequences that are attached to the library fragment and sequenced upstream of the inserts) are no longer supported and are not recommended because they often inhibit sequence generation.
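
The idea behind demultiplexing can be sketched in a few lines of Python. This is an illustration only and not the software GTAC runs: the function names, sample names, and index sequences are hypothetical, and the one-mismatch tolerance is an assumption made for the example.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sample_indexes, max_mismatches=1):
    """Return the sample whose index matches index_read, or None.

    A read is assigned only when exactly one sample index is within
    max_mismatches of the observed index read (assumed tolerance).
    """
    hits = [
        sample
        for sample, index in sample_indexes.items()
        if len(index) == len(index_read)
        and hamming(index, index_read) <= max_mismatches
    ]
    # Ambiguous or unmatched reads are left unassigned.
    return hits[0] if len(hits) == 1 else None


# Hypothetical sample sheet, invented for illustration only.
sample_indexes = {"sampleA": "ACGTAC", "sampleB": "TGCATG"}

exact = assign_sample("ACGTAC", sample_indexes)
one_error = assign_sample("ACGTAA", sample_indexes)
unmatched = assign_sample("GGGGGG", sample_indexes)
```

For dual-indexed libraries, the same comparison could be applied to the two index reads joined together, again as an assumption about one possible approach.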


Sequence Alignment

Sequence alignment is the first step of many of our analysis pipelines, but can also be run as a stand-alone service. GTAC can align your DNA data using either NovoAlign or BWA – two accurate and highly regarded sequence alignment tools. RNA data is aligned using the STAR aligner.


 

Fully Supported Organisms:

 Human Homo sapiens
 Mouse Mus musculus
 Zebrafish Danio rerio
 Fruit fly Drosophila melanogaster
 Nematode Caenorhabditis elegans
 Yeast Saccharomyces cerevisiae
 E. coli Escherichia coli
 

Other Organisms *:

 Chicken Gallus gallus
 Arabidopsis Arabidopsis thaliana
 C. briggsae Caenorhabditis briggsae
 Chlamydomonas Chlamydomonas reinhardtii
 Plasmodium Plasmodium falciparum
 Cryptococcus Cryptococcus neoformans
* For analyses on organisms not listed, please contact us at genetics-gtac@email.wustl.edu