
MAGeCKs test (version 0.5.9.2.1)
A tab-separated file of read counts. See Help below for format
You can choose to either specify the treated samples or the control
If sample label is provided, the labels must match the labels in the first line of the count table, separated by comma (,); for example, HL60.final,KBM7.final. For sample index, 0,2 means the 1st and 3rd samples are treatment experiments. See Help below for a detailed description.
If sample label is provided, the labels must match the labels in the first line of the count table, separated by comma (,). Default is all the samples not specified in treatment experiments. See Help below for a detailed description.
Output Options
Advanced Options

What it does

Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout (MAGeCK) is a computational tool for identifying important genes from genome-scale CRISPR-Cas9 knockout (GeCKO) screens. It can be used to prioritize single-guide RNAs, genes and pathways in such screens. MAGeCK identifies both positively and negatively selected genes simultaneously and reports robust results across different experimental conditions. MAGeCK is developed and maintained by Wei Li and Han Xu from Prof. Xiaole Shirley Liu's lab at the Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health. MAGeCK has been used to identify functional lncRNAs from screens with a validation rate close to 100%.


mageck test

This tests and ranks sgRNAs and genes based on the table provided.

Inputs

sgRNA count file

The input sgRNA count file must be tab-delimited and list the name of each sgRNA, the gene it targets, and then the read counts in each sample. A header line is optional. For example, in the study of T. Wang et al. (Science, 2014) there are four CRISPR screening samples, labeled HL60.initial, KBM7.initial, HL60.final and KBM7.final, as shown below.

Example:

sgRNA gene HL60.initial KBM7.initial HL60.final KBM7.final
A1CF_m52595977 A1CF 213 274 883 175
A1CF_m52596017 A1CF 294 412 1554 1891
A1CF_m52596056 A1CF 421 368 566 759
A1CF_m52603842 A1CF 274 243 314 855
A1CF_m52603847 A1CF 0 50 145 266
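
The layout described above can be checked before running the tool. The following is a minimal sketch, not part of MAGeCK: the helper name read_count_table is hypothetical, and it assumes a header line is present and that counts are whole numbers.

```python
import csv
import io

# Two rows of the example table above, written with real tab separators.
COUNT_TABLE = (
    "sgRNA\tgene\tHL60.initial\tKBM7.initial\tHL60.final\tKBM7.final\n"
    "A1CF_m52595977\tA1CF\t213\t274\t883\t175\n"
    "A1CF_m52596017\tA1CF\t294\t412\t1554\t1891\n"
)


def read_count_table(handle):
    """Hypothetical helper: return (sample_labels, rows).

    Each row is (sgrna_id, gene, [count per sample]).
    Assumes the first line is a header, which the tool treats as optional.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if len(header) < 3:
        raise ValueError("expected sgRNA, gene and at least one sample column")
    labels = header[2:]
    rows = []
    for lineno, fields in enumerate(reader, start=2):
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != len(header):
            raise ValueError(f"line {lineno}: expected {len(header)} columns")
        rows.append((fields[0], fields[1], [int(x) for x in fields[2:]]))
    return labels, rows


labels, rows = read_count_table(io.StringIO(COUNT_TABLE))
print(labels)
print(rows[0])
```

A column-count error here usually means the file is space-separated rather than tab-separated.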

Sample Labels

In the Treatment and Control inputs above, you can specify samples by either Sample Label or Sample Index. If sample labels are used, they MUST match the sample labels in the first line of the count table, for example "HL60.final,KBM7.final". A sample's index is its position among the samples in the sgRNA read count file, starting from 0. In the example above, there are four samples, indexed as follows:

sample index
HL60.initial 0
KBM7.initial 1
HL60.final 2
KBM7.final 3
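
To make the label and index conventions concrete, here is a small sketch that turns either form into 0-based sample indices. The function resolve_samples is a hypothetical illustration, not MAGeCK's own parsing code, and treating an all-digit specification as indices is an assumption of this sketch.

```python
LABELS = ["HL60.initial", "KBM7.initial", "HL60.final", "KBM7.final"]


def resolve_samples(spec, labels):
    """Hypothetical helper: map 'HL60.final,KBM7.final' or '0,2' to indices."""
    tokens = [t.strip() for t in spec.split(",") if t.strip()]
    if not tokens:
        raise ValueError("no samples given")
    # Assumption of this sketch: all-digit tokens are indices, anything else labels.
    if all(t.isdigit() for t in tokens):
        indices = [int(t) for t in tokens]
        for i in indices:
            if i >= len(labels):
                raise ValueError(f"sample index {i} is out of range")
        return indices
    missing = [t for t in tokens if t not in labels]
    if missing:
        raise ValueError(f"labels not found in the count table header: {missing}")
    return [labels.index(t) for t in tokens]


treatment = resolve_samples("HL60.final,KBM7.final", LABELS)
# Controls default to every sample not listed as a treatment.
control = [i for i in range(len(LABELS)) if i not in treatment]
print(treatment, control)
```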

Control sgRNA file

The optional Control sgRNA file is used to generate the null distribution when calculating p-values. If this option is not specified, MAGeCK generates the null distribution of RRA scores by assuming that all of the genes in the library are non-essential. This approach is sometimes over-conservative, and you can improve on it if you know that some genes are not essential. Given the corresponding sgRNA IDs, MAGeCK can estimate p-values more accurately. To use this option, prepare a text file listing the IDs of the control sgRNAs, one sgRNA ID per line.
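
One way to build such a file is to select the sgRNA IDs of genes you already consider non-essential. The sketch below assumes a hypothetical helper write_control_sgrnas. The gene set and the GENE_X entry are placeholders for illustration only, not a biological claim.

```python
import io

# (sgRNA ID, gene) pairs; GENE_X_sg1 is a made-up placeholder entry.
ROWS = [
    ("A1CF_m52595977", "A1CF"),
    ("A1CF_m52596017", "A1CF"),
    ("GENE_X_sg1", "GENE_X"),
]


def write_control_sgrnas(rows, control_genes, handle):
    """Hypothetical helper: write one control sgRNA ID per line; return the count."""
    written = 0
    for sgrna_id, gene in rows:
        if gene in control_genes:
            handle.write(sgrna_id + "\n")
            written += 1
    return written


buffer = io.StringIO()  # swap in open("control_sgrnas.txt", "w") for a real file
n_written = write_control_sgrnas(ROWS, {"A1CF"}, buffer)
print(buffer.getvalue(), end="")
```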


Outputs

This tool outputs

  • a ranked sgRNA Summary file
  • a ranked Gene Summary file

Optionally, under Output Options you can choose to output

  • a Normalized Counts table
  • a PDF of the plots
  • the .R and .Rnw files to generate the report
  • a Log file of the analysis

sgRNA Summary file

An example of the sgRNA ranking output is as follows:

sgrna Gene control_count treatment_count control_mean treat_mean LFC control_var adj_var score p.low p.high p.twosided FDR high_in_treatment
INO80B_m74682554 INO80B 0.0/0.0 1220.15/1476.14 0.810860 1348.15 10.70 0.0 19.0767 308.478 1.0 1.11022e-16 2.22044e-16 1.57651e-14 True
NHS_p17705966 NHS 1.62172/3.90887 2327.09/1849.95 2.76529 2088.52 9.54 2.61554 68.2450 252.480 1.0 1.11022e-16 2.22044e-16 1.57651e-14 True

The contents of each column are as follows:

Gene Summary file

An example of the gene summary output file is as follows:

id num neg|score neg|p-value neg|fdr neg|rank neg|goodsgrna neg|lfc pos|score pos|p-value pos|fdr pos|rank pos|goodsgrna pos|lfc
ESPL1 12 6.4327e-10 7.558e-06 7.9e-05 1 11 -2.35 0.99725 0.99981 0.999992 615 0 -0.07
RPL18 12 6.4671e-10 7.558e-06 7.9e-05 2 11 -2.12 0.99799 0.99989 0.999992 620 0 -0.32
CDK1 12 2.6439e-09 7.558e-06 7.9e-05 3 12 -1.93 1.0 0.99999 0.999992 655 0 -0.12

The contents of each column are as follows:

Genes are ranked by the p.neg field (by default). If you need a ranking by the p.pos, you can use the --sort-criteria option.
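
If the file has already been produced, it can also be re-ranked afterwards. This is a minimal sketch using a cut-down table with only three of the columns shown above. It assumes the Gene Summary file is tab-delimited, and rank_genes is a hypothetical helper.

```python
import csv
import io

# Cut-down gene summary: only the id and the two rank columns from the example.
GENE_SUMMARY = (
    "id\tneg|rank\tpos|rank\n"
    "RPL18\t2\t620\n"
    "CDK1\t3\t655\n"
    "ESPL1\t1\t615\n"
)


def rank_genes(handle, column="pos|rank"):
    """Hypothetical helper: return rows sorted by an integer rank column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return sorted(reader, key=lambda row: int(row[column]))


by_pos = [row["id"] for row in rank_genes(io.StringIO(GENE_SUMMARY))]
by_neg = [row["id"] for row in rank_genes(io.StringIO(GENE_SUMMARY), "neg|rank")]
print(by_pos)
print(by_neg)
```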


More Information

Overview of the MAGeCK algorithm

Briefly, read counts from different samples are first median-normalized to adjust for the effect of library sizes and read count distributions. Then the variance of read counts is estimated by sharing information across features, and a negative binomial (NB) model is used to test whether sgRNA abundance differs significantly between treatments and controls. This approach is similar to those used for differential RNA-Seq analysis. We rank sgRNAs based on P-values calculated from the NB model, and use a modified robust ranking aggregation (RRA) algorithm named α-RRA to identify positively or negatively selected genes. More specifically, α-RRA assumes that if a gene has no effect on selection, then sgRNAs targeting this gene should be uniformly distributed across the ranked list of all the sgRNAs. α-RRA ranks genes by comparing the skew in rankings to the uniform null model, and prioritizes genes whose sgRNA rankings are consistently higher than expected. α-RRA calculates the statistical significance of the skew by permutation, and a detailed description of the algorithm is presented in the Materials and methods section of the MAGeCK paper. Finally, MAGeCK reports positively and negatively selected pathways by applying α-RRA to the rankings of genes in a pathway.
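
The normalization step can be illustrated with a median-of-ratios scheme of the kind used in differential RNA-Seq analysis. This is a sketch under stated assumptions: the function names are hypothetical, the toy counts are invented, and whether MAGeCK's implementation matches this formula in every detail is an assumption.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Return one size factor per sample (column) of a count matrix.

    Each sgRNA's counts are divided by that sgRNA's geometric mean across
    samples. A sample's size factor is the median of its ratios.
    """
    n_samples = len(counts[0])
    # Keep sgRNAs with nonzero counts everywhere so the geometric mean is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no sgRNA has nonzero counts in every sample")
    geo_means = [
        math.exp(sum(math.log(c) for c in row) / n_samples) for row in usable
    ]
    return [
        median(row[j] / g for row, g in zip(usable, geo_means))
        for j in range(n_samples)
    ]


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = median_ratio_size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data: the second sample was sequenced twice as deeply as the first.
toy_counts = [[10, 20], [100, 200], [30, 60]]
normalized = normalize(toy_counts)
print(normalized)
```

After normalization the two toy samples agree for every sgRNA, which is the library-size effect the normalization is meant to remove.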

For more information on using MAGeCK, see the MAGeCK website.