
What it does

Estimates differential gene expression from short-read sequence counts using methods appropriate for count data. If you have paired data, you may also want to consider Tophat/Cufflinks. Input must be raw count data for each sequence, arranged as a rectangular matrix in a tabular file. Note: the counts must not be scaled or otherwise transformed; please make sure you supply untransformed raw counts of reads for each sequence.

Performs digital differential gene expression analysis between groups (e.g. a treatment and a control). Biological replicates provide the information about experimental variability that is required for reliable inference.

What it does not do

edgeR requires biological replicates. Without replicates, the analysis cannot account for the known, important sources of experimental variability that the approach implemented here depends on.

Input

A count matrix with sequence names as rows and sample-specific read counts for each sequence as columns. The matrix must have 2 header rows: the first gives the group assignment of each sample and the second uniquely identifies each sample. The first column must contain a unique set of names (e.g. Feature names).

Example:

#       G1:Mut  G1:Mut  G1:Mut  G2:WT   G2:WT   G2:WT
#Feature        Spl1    Spl2    Spl3    Spl4    Spl5    Spl6
NM_001001130    97      43      61      34      73      26
NM_001001144    25      8       9       3       5       5
NM_001001152    72      45      29      20      31      13
NM_001001160    0       1       1       1       0       0
NM_001001177    0       1       0       4       3       3
NM_001001178    0       2       1       0       4       0
NM_001001179    0       0       0       0       0       2
NM_001001180    0       0       0       0       0       2
NM_001001181    415     319     462     185     391     155
NM_001001182    1293    945     987     297     938     496
NM_001001183    5       4       11      7       11      2
NM_001001184    135     198     178     110     205     64
NM_001001185    186     1       0       1       1       0
NM_001001186    75      90      91      34      63      54
NM_001001187    267     236     170     165     202     51
NM_001001295    5       2       6       1       7       0
NM_001001309    1       0       0       1       2       1
...

Please use the "Count reads in features with htseq-count" tool to generate the count matrix.
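
The layout described above can be checked before submitting a file. The following is a minimal Python sketch, not part of the tool: the function name `read_count_matrix` is hypothetical, and it assumes the file is tab-delimited and that counts are plain integers.

```python
import csv
import io


def read_count_matrix(handle):
    """Parse a tabular count matrix with two header rows (illustrative helper).

    Row 1: group label for each sample column (first cell is a placeholder).
    Row 2: unique sample names (first cell labels the feature column).
    Other rows: a unique feature name followed by one raw count per sample.
    """
    reader = csv.reader(handle, delimiter="\t")  # assumption: tab-delimited
    groups = next(reader)[1:]
    samples = next(reader)[1:]
    if len(groups) != len(samples):
        raise ValueError("group and sample header rows differ in length")
    if len(set(samples)) != len(samples):
        raise ValueError("sample names must be unique")

    counts = {}
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines
        feature, values = row[0], row[1:]
        if feature in counts:
            raise ValueError(f"duplicate feature name: {feature}")
        if len(values) != len(samples):
            raise ValueError(f"wrong number of columns for {feature}")
        # int() rejects values such as "1.5", which would suggest that the
        # data were scaled or transformed rather than raw counts.
        counts[feature] = [int(v) for v in values]
    return groups, samples, counts


# Two rows taken from the example matrix above.
example = (
    "#\tG1:Mut\tG1:Mut\tG1:Mut\tG2:WT\tG2:WT\tG2:WT\n"
    "#Feature\tSpl1\tSpl2\tSpl3\tSpl4\tSpl5\tSpl6\n"
    "NM_001001130\t97\t43\t61\t34\t73\t26\n"
    "NM_001001144\t25\t8\t9\t3\t5\t5\n"
)
groups, samples, counts = read_count_matrix(io.StringIO(example))
print(groups)
print(samples)
print(counts["NM_001001144"])
```

A file that fails these checks is unlikely to match the expected input format, although passing them does not guarantee that the tool will accept it.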

Output

A tabular file containing relative expression levels and statistical estimates of differential expression probability, together with the R scripts, a log, and some helpful diagnostic plots.

Fixed Parameters

Method used to allow the prior distribution for the dispersion to be abundance-dependent: movingave

False discovery rate adjustment method used: Benjamini and Hochberg (1995)

GLM dispersion estimate used: Tagwise Dispersion

Gene filter used: less than 1 count per million reads
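
Two of the fixed parameters above can be illustrated in a few lines of Python. This is a sketch for intuition only; the tool itself performs these steps in R through edgeR. The function names are hypothetical, and the filter rule shown here (drop a feature whose counts per million are below 1 in every sample) is an assumption about how the threshold is applied, which the text above does not spell out.

```python
def counts_per_million(counts):
    """Convert raw counts to counts per million using each sample's total.

    counts: dict mapping feature name -> list of raw counts, one per sample.
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        feature: [1e6 * c / lib if lib else 0.0 for c, lib in zip(row, lib_sizes)]
        for feature, row in counts.items()
    }


def filter_low_expression(counts, min_cpm=1.0):
    """Keep features reaching min_cpm in at least one sample (assumed rule)."""
    cpm = counts_per_million(counts)
    return {f: row for f, row in counts.items() if any(v >= min_cpm for v in cpm[f])}


def benjamini_hochberg(pvalues):
    """Benjamini and Hochberg (1995) adjusted p-values, in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so that a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: each sample has a library size of 2,000,000 reads.
toy_counts = {
    "high": [1_999_999, 1_999_998],
    "low": [1, 0],   # 0.5 CPM at most, so it is dropped
    "mid": [0, 2],   # 1.0 CPM in the second sample, so it is kept
}
kept = filter_low_expression(toy_counts)
print(sorted(kept))

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

The adjustment is applied to the p-values of the features that remain after filtering, so removing low-count features also reduces the number of tests being corrected for.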

Attribution

This tool wraps the edgeR Bioconductor package, so all calculations and plots are controlled by that code. See the edgeR documentation for full details and appropriate attribution. The recommended reference is Mark D. Robinson, Davis J. McCarthy, Gordon K. Smyth, PMCID: PMC2796818.

When applying the LIMMA (Linear models for RNA-Seq) analysis, the tool also makes use of the limma Bioconductor package. The recommended reference is Smyth, G. K. (2005). Limma: linear models for microarray data. In: 'Bioinformatics and Computational Biology Solutions using R and Bioconductor'. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds), Springer, New York, pages 397--420.