What it does
Uses DESeq2 version 1.40.2 to estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.
Inputs
Count Files
DESeq2 takes count tables generated by featureCounts, HTSeq-count or StringTie as input. A count table must be generated for each sample individually. One header row is assumed, but files with no header (e.g. from HTSeq-count) can be used by setting the Files have header? option to No.
DESeq2 can handle multiple factors that affect your experiment. The first factor you enter is treated as the primary factor affecting gene expression. Optionally, you can enter one or more secondary factors that might influence your experiment. The final output reports the changes in gene expression due to the primary factor, accounting for the secondary factors. Each factor has two levels/states. For each factor level, select the appropriate count tables from your history.
The following table gives some examples of factors and their levels:
Factor | Factor level 1 | Factor level 2 |
Treatment | Treated | Untreated |
Condition | Knockdown | Wildtype |
TimePoint | Day4 | Day1 |
SeqType | SingleEnd | PairedEnd |
Gender | Female | Male |
Note: Output log2 fold changes are based on primary factor level 1 vs. factor level 2, so the order of the factor levels matters. For example, for the factor 'Treatment' in the table above, DESeq2 computes fold changes of 'Treated' samples against 'Untreated' samples, i.e. the values correspond to up- or down-regulation of genes in Treated samples.
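The sign convention in the note above can be illustrated with a minimal Python sketch. The function name and the mean counts are hypothetical, and DESeq2 itself estimates fold changes from its fitted negative binomial model rather than from a plain ratio of means, so this shows only the direction of the comparison.

```python
import math

def log2_fold_change(level1_mean, level2_mean):
    """Log2 ratio of factor level 1 over factor level 2 (illustration only)."""
    return math.log2(level1_mean / level2_mean)

# Hypothetical mean normalized counts for a single gene.
treated_mean = 200.0    # factor level 1
untreated_mean = 50.0   # factor level 2

lfc = log2_fold_change(treated_mean, untreated_mean)
# A positive value means the gene is higher in Treated (level 1).
print(lfc)
```

Swapping the two levels flips the sign of the result, which is why the order in which factor levels are entered matters.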
DESeq2 can also take transcript-level counts from quantification tools such as kallisto, Salmon and Sailfish. This Galaxy wrapper uses the Bioconductor tximport package to process the transcript counts for DESeq2.
Salmon or Sailfish Files
Salmon or Sailfish quant.sf files can be imported by setting type to Salmon or Sailfish respectively above. Note: for previous versions of Salmon or Sailfish, in which the quant.sf files start with comment lines, you will need to remove the comment lines before using the files here. An example of the format is shown below.
Example:
Name | Length | EffectiveLength | TPM | NumReads |
NR_001526 | 164 | 20.4518 | 0 | 0 |
NR_001526_1 | 164 | 20.4518 | 0 | 0 |
NR_001526_2 | 164 | 20.4518 | 0 | 0 |
NM_130786 | 1764 | 1956.04 | 2.47415 | 109.165 |
NR_015380 | 2129 | 2139.53 | 1.77331 | 85.5821 |
NM_001198818 | 9360 | 7796.58 | 2.38616e-07 | 4.19648e-05 |
NM_001198819 | 9527 | 7964.62 | 0 | 0 |
NM_001198820 | 9410 | 7855.78 | 0 | 0 |
NM_014576 | 9267 | 7714.88 | 0.0481114 | 8.37255 |
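For older quant.sf files, the comment lines could be removed with a small script before upload. The sketch below assumes comment lines begin with `#` and that the column header line itself is not commented; both are assumptions to check against your own files, and the function name is hypothetical.

```python
def strip_comment_lines(lines, prefix="#"):
    """Return only the lines that do not start with the comment prefix."""
    return [line for line in lines if not line.startswith(prefix)]

# Hypothetical file content: one comment line, then header and data rows.
raw = [
    "# hypothetical comment line\n",
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n",
    "NM_130786\t1764\t1956.04\t2.47415\t109.165\n",
]

cleaned = strip_comment_lines(raw)
print("".join(cleaned), end="")
```

If the header row in your files is itself prefixed with `#`, it would need to be restored without the prefix rather than dropped.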
kallisto Files
kallisto abundance.tsv files can be imported by setting type to kallisto above. An example of the format is shown below.
Example:
target_id | length | eff_length | est_counts | tpm |
NR_001526 | 164 | 20.4518 | 0 | 0 |
NR_001526_1 | 164 | 20.4518 | 0 | 0 |
NR_001526_2 | 164 | 20.4518 | 0 | 0 |
NM_130786 | 1764 | 1956.04 | 109.165 | 2.47415 |
NR_015380 | 2129 | 2139.53 | 85.5821 | 1.77331 |
NM_001198818 | 9360 | 7796.58 | 4.19648e-05 | 2.38616e-07 |
NM_001198819 | 9527 | 7964.62 | 0 | 0 |
NM_001198820 | 9410 | 7855.78 | 0 | 0 |
NM_014576 | 9267 | 7714.88 | 8.37255 | 0.0481114 |
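Note that the two formats order their count and TPM columns differently (NumReads is last in quant.sf, while est_counts comes before tpm in abundance.tsv). A minimal sketch of reading the estimated counts by column name rather than by position is shown below. It assumes tab-separated files with a header row; the `COUNT_COLUMNS` mapping and `read_counts` function are hypothetical helpers, not part of the wrapper or of tximport.

```python
import csv
import io

# Identifier and count column names for each format (taken from the examples).
COUNT_COLUMNS = {
    "salmon": ("Name", "NumReads"),
    "kallisto": ("target_id", "est_counts"),
}

def read_counts(handle, kind):
    """Map transcript identifier to estimated count, looked up by header name."""
    id_col, count_col = COUNT_COLUMNS[kind]
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_col]: float(row[count_col]) for row in reader}

# Two rows from the kallisto example, written as tab-separated text.
kallisto_text = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "NM_130786\t1764\t1956.04\t109.165\t2.47415\n"
    "NR_015380\t2129\t2139.53\t85.5821\t1.77331\n"
)

counts = read_counts(io.StringIO(kallisto_text), "kallisto")
print(counts)
```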
Output
DESeq2 generates a tabular file containing the columns described below and, optionally, visualizations of the results as a PDF.
Column | Description |
1 | Gene Identifiers |
2 | mean normalised counts, averaged over all samples from both conditions |
3 | the logarithm (to base 2) of the fold change (see the note in the Inputs section) |
4 | standard error estimate for the log2 fold change estimate |
5 | Wald statistic |
6 | p value for the statistical significance of this change |
7 | p value adjusted for multiple testing with the Benjamini-Hochberg procedure, which controls the false discovery rate (FDR) |
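As a rough illustration of how column 7 relates to column 6, the following Python sketch implements the textbook Benjamini-Hochberg adjustment. The function name and the p values are hypothetical, and the values DESeq2 reports can differ from this plain calculation (for example, because DESeq2 can filter genes before adjustment), so treat it as a sketch of the procedure rather than a reproduction of the tool's output.

```python
def benjamini_hochberg(pvalues):
    """Textbook Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p value so that the adjusted
    # values never increase as the raw p values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p values for four genes.
pvals = [0.01, 0.04, 0.03, 0.5]
padj = benjamini_hochberg(pvals)
print(padj)
```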
By selecting Output sample size factors in the "Output options" selection box, the size factors used to normalize the samples can also be output as a tabular file.
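To give a sense of what the size factors represent, here is a minimal Python sketch of the median-of-ratios idea that DESeq2 uses by default for normalization. The function name and the count matrix are hypothetical, and the sketch omits details of the real implementation, so the numbers it produces are for illustration only.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    kept = []           # genes with a nonzero count in every sample
    log_geo_means = []  # log of each kept gene's geometric mean
    for row in counts:
        if all(c > 0 for c in row):
            kept.append(row)
            log_geo_means.append(sum(math.log(c) for c in row) / n_samples)
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to its geometric mean.
        log_ratios = [math.log(row[j]) - lg
                      for row, lg in zip(kept, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical counts: sample 2 is sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [0, 5],      # skipped: contains a zero
    [30, 60],
]
sf = size_factors(counts)
print(sf)
```

Dividing each sample's raw counts by its size factor puts the samples on a common scale, which is what the normalized counts in column 2 of the output are based on.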