
MAGeCK pathway (version 0.5.9.2)
The gene ranking file generated by the gene test step. Only one enrichment comparison will be performed.
The pathway file in GMT format. See Help below for more information.

What it does

MAGeCK pathway invokes robust ranking aggregation (RRA) to test whether a pathway is enriched in one particular gene ranking. See More Information below.


Inputs

Gene Ranking files

A gene ranking file is required as input and can be produced using mageck test. An example of the gene ranking file (gene summary file) is as follows:

id num neg|score neg|p-value neg|fdr neg|rank neg|goodsgrna neg|lfc pos|score pos|p-value pos|fdr pos|rank pos|goodsgrna pos|lfc
ESPL1 12 6.4327e-10 7.558e-06 7.9e-05 1 -2.35 11 0.99725 0.99981 0.999992 615 0 -0.07
RPL18 12 6.4671e-10 7.558e-06 7.9e-05 2 -2.12 11 0.99799 0.99989 0.999992 620 0 -0.32
CDK1 12 2.6439e-09 7.558e-06 7.9e-05 3 -1.93 12 1.0 0.99999 0.999992 655 0 -0.12
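As a rough illustration of how such a file can be read, the sketch below parses a whitespace- or tab-separated gene summary into one dictionary per gene, keyed by the header names. The function name read_gene_summary and the sample rows (GENE_A, GENE_B and their values) are hypothetical and are not part of MAGeCK; real files contain more columns than this excerpt.

```python
import io


def read_gene_summary(handle):
    """Yield one dict per gene, keyed by the column names in the header line."""
    header = handle.readline().split()
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(
                f"expected {len(header)} fields, got {len(fields)}: {line!r}"
            )
        yield dict(zip(header, fields))


# Hypothetical excerpt with only a few columns, for illustration only.
example = io.StringIO(
    "id\tnum\tneg|score\tpos|score\n"
    "GENE_A\t4\t0.001\t0.9\n"
    "GENE_B\t4\t0.5\t0.02\n"
)
rows = list(read_gene_summary(example))

# Pick the gene with the smallest negative-selection score.
best_neg = min(rows, key=lambda row: float(row["neg|score"]))
print(best_neg["id"])
```

Looking up values by header name rather than by position keeps the sketch independent of the exact column order.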

Pathway file

MAGeCK pathway also requires a pathway file in GMT format. The GMT (Gene Matrix Transposed) format is a tab-delimited file format that describes gene sets, and it is the same GMT format used in Gene Set Enrichment Analysis (GSEA). Each row represents one gene set: the first column contains the gene set name, the second column contains a description of the gene set, and the remaining columns contain the names or IDs of the genes in the set. GMT pathway files can be downloaded directly from the GSEA MSigDB database. An example of the GMT format is as follows:

Gene Set Name Description Genes
KEGG_RIBOSOME http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_RIBOSOME RPL35 RPL23 RPL3...
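The row layout described above (name, description, then genes, separated by tabs) can be sketched as a small parser. This is a minimal sketch, not the code MAGeCK itself uses; the function name parse_gmt and the DEMO_SET row are hypothetical.

```python
def parse_gmt(lines):
    """Map each gene set name to a (description, list of genes) pair."""
    gene_sets = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"GMT row needs a name, a description and at least one gene: {line!r}"
            )
        name, description, genes = fields[0], fields[1], fields[2:]
        # Drop empty entries left by trailing tabs.
        gene_sets[name] = (description, [gene for gene in genes if gene])
    return gene_sets


# Hypothetical gene set, for illustration only.
demo = ["DEMO_SET\thypothetical description\tGENE_A\tGENE_B\tGENE_C\n"]
sets = parse_gmt(demo)
print(sets["DEMO_SET"][1])
```

To read a real file, the same function could be given an open file handle, for example parse_gmt(open(path)) with path pointing to a downloaded GMT file.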

Outputs

Pathway summary file

An example of the pathway summary output file is as follows:

id num neg|score neg|rra neg|p-value neg|fdr neg|rank neg|goodgene neg|lfc pos|score pos|rra pos|p-value pos|fdr pos|rank pos|goodgene pos|lfc
KEGG_RIBOSOME 88 1 0 0 0 1 0 0 1 0 0 0 1 00    

The contents of each column are as follows:

By default, genes are ranked by the p.neg field. To rank by the p.pos field instead, use the --sort-criteria option.
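For a summary that has already been produced, a comparable re-ordering can be approximated afterwards. The sketch below assumes that p.neg and p.pos correspond to the neg|p-value and pos|p-value columns of the summary header; that mapping, the function name sort_summary and the sample rows are assumptions made for illustration, not documented MAGeCK behavior.

```python
def sort_summary(rows, criteria="neg"):
    """Return rows ordered by ascending '<criteria>|p-value' (smallest first)."""
    if criteria not in ("neg", "pos"):
        raise ValueError("criteria must be 'neg' or 'pos'")
    key = f"{criteria}|p-value"  # assumed column name, see the note above
    return sorted(rows, key=lambda row: float(row[key]))


# Hypothetical rows, for illustration only.
rows = [
    {"id": "SET_A", "neg|p-value": "0.01", "pos|p-value": "0.90"},
    {"id": "SET_B", "neg|p-value": "0.50", "pos|p-value": "0.02"},
]
by_neg = [row["id"] for row in sort_summary(rows)]
by_pos = [row["id"] for row in sort_summary(rows, "pos")]
print(by_neg, by_pos)
```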


More Information

Overview of the MAGeCK algorithm

Briefly, read counts from different samples are first median-normalized to adjust for the effect of library sizes and read count distributions. Then the variance of read counts is estimated by sharing information across features, and a negative binomial (NB) model is used to test whether sgRNA abundance differs significantly between treatments and controls. This approach is similar to those used for differential RNA-Seq analysis. We rank sgRNAs based on P-values calculated from the NB model, and use a modified robust ranking aggregation (RRA) algorithm named α-RRA to identify positively or negatively selected genes. More specifically, α-RRA assumes that if a gene has no effect on selection, then sgRNAs targeting this gene should be uniformly distributed across the ranked list of all the sgRNAs. α-RRA ranks genes by comparing the skew in rankings to the uniform null model, and prioritizes genes whose sgRNA rankings are consistently higher than expected. α-RRA calculates the statistical significance of the skew by permutation, and a detailed description of the algorithm is presented in the Materials and methods section of the MAGeCK paper. Finally, MAGeCK reports positively and negatively selected pathways by applying α-RRA to the rankings of genes in a pathway.
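The idea of comparing rankings against a uniform null model can be sketched in a few lines. The code below is a simplified illustration of a rank aggregation score, not MAGeCK's implementation: it takes the percentile ranks (rank divided by the total number of items) of the members of one group, and for the k-th smallest percentile computes the probability that the k-th smallest of n uniform draws would be at least that small. The score is the minimum of those probabilities. The alpha argument here filters on percentile, which is a simplifying assumption standing in for the α cutoff, and the permutation step used to obtain P-values is omitted. All names and sample values are hypothetical.

```python
from math import comb


def order_stat_pvalue(u, k, n):
    """P(k-th smallest of n Uniform(0,1) draws <= u) = P(Binomial(n, u) >= k)."""
    return sum(comb(n, j) * u**j * (1.0 - u) ** (n - j) for j in range(k, n + 1))


def rra_score(percentiles, alpha=None):
    """Smallest order-statistic probability over the sorted percentile ranks.

    If alpha is given, only percentiles below alpha are scanned (a simplified
    stand-in for the alpha cutoff). With none below alpha the score is 1.0.
    """
    u_sorted = sorted(percentiles)
    n = len(u_sorted)
    candidates = [
        order_stat_pvalue(u, k, n)
        for k, u in enumerate(u_sorted, start=1)
        if alpha is None or u < alpha
    ]
    return min(candidates, default=1.0)


# Hypothetical percentile ranks for two groups of four members each.
skewed = rra_score([0.01, 0.02, 0.03, 0.60], alpha=0.25)  # mostly near the top
spread = rra_score([0.20, 0.45, 0.70, 0.95], alpha=0.25)  # roughly uniform
print(skewed < spread)
```

A smaller score indicates rankings that are more concentrated near the top of the list than the uniform model would predict. The same function applies whether the group is the sgRNAs of a gene or the genes of a pathway.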

For more information on using MAGeCK, see the MAGeCK website.