MAGeCK mle (version 0.5.9.2.1)
Provide a tab-separated count table. Each line in the table should include the sgRNA name (1st column), the target gene (2nd column) and the read counts in each sample. See Help below for more information.
You can choose to either provide a design matrix or specify the samples.
Provide a design matrix, either as a file name or as a quoted string of the design matrix, for example 1,1;1,0. The rows of the design matrix must match the order of the samples in the count table (if --include-samples is not specified), or the order of the samples given by the --include-samples option.

What it does

MAGeCK mle calculates gene essentiality from CRISPR screens. Whereas the original algorithm in MAGeCK test relies on robust ranking aggregation (RRA), MAGeCK mle uses a measurement called the beta score to call gene essentiality: a positive beta score means a gene is positively selected, and a negative beta score means a gene is negatively selected. The beta score is similar to the log-fold change in differential expression analysis. Compared with RRA, this measurement has the following advantages:

  • It has only one score for one gene, instead of two scores in RRA: one for positive selection, one for negative selection;
  • It allows a direct comparison across multiple conditions, or even experiments;
  • It is able to incorporate sgRNA efficiency information.

Inputs

sgRNA count file

The sgRNA read count file is passed to the mle command with the -k parameter. It should list the name of each sgRNA and the gene it targets, followed by the read counts in each sample. Items should be separated by tabs ('\t'). A header line is optional. For example, in the study of T. Wang et al. (Science 2014), there are 4 CRISPR screening samples, labeled HL60.initial, KBM7.initial, HL60.final and KBM7.final. Here are a few lines of the read count file:

sgRNA gene HL60.initial KBM7.initial HL60.final KBM7.final
A1CF_m52595977 A1CF 213 274 883 175
A1CF_m52596017 A1CF 294 412 1554 1891
A1CF_m52596056 A1CF 421 368 566 759
A1CF_m52603842 A1CF 274 243 314 855
A1CF_m52603847 A1CF 0 50 145 266
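
As a rough illustration of the format described above, the following Python sketch reads a tab-separated count table into sample labels and per-sgRNA counts. The helper name `read_count_table` is hypothetical and not part of MAGeCK, and the sketch assumes that the optional header line is present and that counts are integers.

```python
import csv
import io

# Rows copied from the example above, joined with tabs as the format requires.
COUNT_TABLE = "\n".join([
    "sgRNA\tgene\tHL60.initial\tKBM7.initial\tHL60.final\tKBM7.final",
    "A1CF_m52595977\tA1CF\t213\t274\t883\t175",
    "A1CF_m52596017\tA1CF\t294\t412\t1554\t1891",
    "A1CF_m52603847\tA1CF\t0\t50\t145\t266",
])


def read_count_table(handle):
    """Return (sample_labels, rows); each row is (sgrna, gene, [counts]).

    Hypothetical helper: assumes a header line and integer read counts.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[2:]  # columns 1 and 2 are sgRNA name and target gene
    rows = []
    for line_no, fields in enumerate(reader, start=2):
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(fields)}"
            )
        rows.append((fields[0], fields[1], [int(x) for x in fields[2:]]))
    return samples, rows


samples, rows = read_count_table(io.StringIO(COUNT_TABLE))
print(samples)
print(rows[0])
```

Checking the field count per line catches the most common problem with hand-edited tables, namely spaces used in place of tabs.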

Design matrix file

The sample labels can either be specified in the tool form above or, alternatively, a design matrix file can be provided. The design matrix is generally a binary matrix indicating which sample (given in the first column) is affected by which condition (given in the first row). For the meaning of the design matrix, check the input file format page.

Samples baseline HL60 KBM7
HL60.initial 1 0 0
KBM7.initial 1 0 0
HL60.final 1 1 0
KBM7.final 1 0 1
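
The example matrix above can also be checked programmatically before it is uploaded. The sketch below parses the matrix, verifies that every entry is 0 or 1, and converts it to the quoted-string form mentioned in the tool form (1,1;1,0). The function names are hypothetical, the file is assumed to be tab-separated, and the sketch assumes that rows are separated by ';' and values by ',' as the form's example suggests.

```python
# Example design matrix from above, written with tabs (an assumption about
# the file's delimiter).
DESIGN_MATRIX = "\n".join([
    "Samples\tbaseline\tHL60\tKBM7",
    "HL60.initial\t1\t0\t0",
    "KBM7.initial\t1\t0\t0",
    "HL60.final\t1\t1\t0",
    "KBM7.final\t1\t0\t1",
])


def parse_design_matrix(text):
    """Return (conditions, sample_labels, matrix) from tab-separated text."""
    lines = [line.split("\t") for line in text.splitlines() if line.strip()]
    conditions = lines[0][1:]  # first row: condition labels
    labels = [fields[0] for fields in lines[1:]]  # first column: sample labels
    matrix = [[int(v) for v in fields[1:]] for fields in lines[1:]]
    for label, row in zip(labels, matrix):
        if len(row) != len(conditions):
            raise ValueError(f"{label}: expected {len(conditions)} values")
        if any(v not in (0, 1) for v in row):
            raise ValueError(f"{label}: entries must be 0 or 1")
    return conditions, labels, matrix


def to_inline_string(matrix):
    """Format the matrix as a string such as '1,1;1,0' (assumed row/column separators)."""
    return ";".join(",".join(str(v) for v in row) for row in matrix)


conditions, labels, matrix = parse_design_matrix(DESIGN_MATRIX)
print(labels)
print(to_inline_string(matrix))
```

Comparing `labels` with the sample columns of the count table is a simple way to confirm that the row order matches, which the tool form requires.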

The following are the rules for the design matrix file:

Control sgRNA file

The optional Control sgRNAs file is used to generate the null distribution when calculating p values. If this option is not specified, MAGeCK generates the null distribution of RRA scores by assuming that all genes in the library are non-essential (see More Information below). This approach is sometimes over-conservative, and you can improve on it if you know that some genes are not essential. If you provide the corresponding sgRNA IDs with this option, MAGeCK will estimate p values more accurately. To use this option, prepare a text file listing the IDs of the control sgRNAs, one sgRNA ID per line.
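
A control sgRNA file in this one-ID-per-line layout could be produced with a few lines of Python. In the sketch below, the helper `write_control_sgrnas` is hypothetical, and the choice of A1CF as a control gene is made up purely for illustration; which genes are truly non-essential depends on the screen.

```python
import os
import tempfile


def write_control_sgrnas(path, rows, control_genes):
    """Write the IDs of sgRNAs targeting control_genes, one ID per line.

    rows is an iterable of (sgrna_id, gene) pairs, e.g. the first two
    columns of the count table. Returns the list of IDs written.
    """
    wanted = set(control_genes)
    ids = [sgrna_id for sgrna_id, gene in rows if gene in wanted]
    with open(path, "w", encoding="utf-8") as out:
        for sgrna_id in ids:
            out.write(sgrna_id + "\n")
    return ids


# Hypothetical input: treating A1CF as a control gene is an assumption
# made only for this example, and the RNF11 ID is invented.
rows = [
    ("A1CF_m52595977", "A1CF"),
    ("A1CF_m52596017", "A1CF"),
    ("RNF11_example", "RNF11"),
]
path = os.path.join(tempfile.mkdtemp(), "control_sgrnas.txt")
written = write_control_sgrnas(path, rows, control_genes=["A1CF"])
print(written)
```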

Outputs

This tool outputs

  • a ranked sgRNA Summary file
  • a ranked Gene Summary file

Optionally, under Output Options you can choose to output

  • a Log file of the analysis

If successful, MAGeCK mle will generate two files, the Gene Summary file (including gene beta scores), and the sgRNA Summary file (including sgRNA efficiency probability predictions).

Gene Summary file (including beta scores)

An example of the gene summary output file is below. This file includes the beta scores in two conditions specified in the design matrix (HL60|beta and KBM7|beta), and the associated statistics. For more information, check the output format specification of the mageck test Gene Summary file.

Gene sgRNA HL60|beta HL60|z HL60|p-value HL60|fdr HL60|wald-p-value HL60|wald-fdr KBM7|beta KBM7|z KBM7|p-value KBM7|fdr KBM7|wald-p-value KBM7|wald-fdr
RNF14 10 0.24927 0.72077 0.36256 0.75648 0.47105 0.9999 0.57276 1.6565 0.06468 0.32386 0.097625 0.73193
RNF10 10 0.10159 0.29373 0.92087 0.98235 0.76896 0.9999 0.11341 0.32794 0.90145 0.97365 0.74296 0.98421
RNF11 10 3.6354 10.513 0.00028 0.021739 7.5197e-26 1.3376e-22 2.5928 7.4925 0.0014898 0.032024 6.7577e-14 1.33e-11
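
To show how the beta score and its statistics might be used downstream, the sketch below filters a gene summary table for one condition and reports the direction of selection from the sign of the beta score. It uses only the HL60 columns of the example above. The function name and the FDR cutoff of 0.05 are assumptions for illustration, not MAGeCK defaults, and the file is assumed to be tab-separated.

```python
import csv
import io

# HL60 columns of the example above, joined with tabs.
GENE_SUMMARY = "\n".join([
    "Gene\tsgRNA\tHL60|beta\tHL60|z\tHL60|p-value\tHL60|fdr\tHL60|wald-p-value\tHL60|wald-fdr",
    "RNF14\t10\t0.24927\t0.72077\t0.36256\t0.75648\t0.47105\t0.9999",
    "RNF10\t10\t0.10159\t0.29373\t0.92087\t0.98235\t0.76896\t0.9999",
    "RNF11\t10\t3.6354\t10.513\t0.00028\t0.021739\t7.5197e-26\t1.3376e-22",
])


def significant_genes(handle, condition, fdr_cutoff=0.05, fdr_column="fdr"):
    """Return (gene, beta, direction) for genes below the FDR cutoff.

    fdr_column may be "fdr" or "wald-fdr"; the cutoff is an illustrative
    choice, not a recommendation.
    """
    hits = []
    for record in csv.DictReader(handle, delimiter="\t"):
        beta = float(record[f"{condition}|beta"])
        fdr = float(record[f"{condition}|{fdr_column}"])
        if fdr >= fdr_cutoff:
            continue
        if beta > 0:
            direction = "positive"  # positively selected
        elif beta < 0:
            direction = "negative"  # negatively selected
        else:
            direction = "none"
        hits.append((record["Gene"], beta, direction))
    return hits


hits = significant_genes(io.StringIO(GENE_SUMMARY), "HL60")
print(hits)
```

Because each gene has a single beta score per condition, the same function can be called once per condition label and the results compared directly.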

sgRNA Summary file (including sgRNA efficiency probability predictions)

An example of the sgRNA ranking output is as follows:

sgrna Gene control_count treatment_count control_mean treat_mean LFC control_var adj_var score p.low p.high p.twosided FDR high_in_treatment
INO80B_m74682554 INO80B 0.0/0.0 1220.15/1476.14 0.810860 1348.15 10.70 0.0 19.0767 308.478 1.0 1.11022e-16 2.22044e-16 1.57651e-14 True
NHS_p17705966 NHS 1.62172/3.90887 2327.09/1849.95 2.76529 2088.52 9.54 2.61554 68.2450 252.480 1.0 1.11022e-16 2.22044e-16 1.57651e-14 True
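
In the two example rows, the control_count and treatment_count fields appear to hold one value per sample, separated by '/'. Under that assumption, which is inferred from the example values and not stated on this page, the sketch below splits the treatment_count field and compares its mean with the reported treat_mean column. The helper names are hypothetical.

```python
import csv
import io


def tabbed(line):
    """Join whitespace-separated fields with tabs (the assumed file delimiter)."""
    return "\t".join(line.split())


# Rows copied from the example above.
SGRNA_SUMMARY = "\n".join(tabbed(line) for line in [
    "sgrna Gene control_count treatment_count control_mean treat_mean LFC "
    "control_var adj_var score p.low p.high p.twosided FDR high_in_treatment",
    "INO80B_m74682554 INO80B 0.0/0.0 1220.15/1476.14 0.810860 1348.15 10.70 "
    "0.0 19.0767 308.478 1.0 1.11022e-16 2.22044e-16 1.57651e-14 True",
    "NHS_p17705966 NHS 1.62172/3.90887 2327.09/1849.95 2.76529 2088.52 9.54 "
    "2.61554 68.2450 252.480 1.0 1.11022e-16 2.22044e-16 1.57651e-14 True",
])


def split_counts(field):
    """Split a '/'-separated count field into floats (assumed layout)."""
    return [float(value) for value in field.split("/")]


def treatment_means(handle):
    """Map sgRNA ID to (mean of treatment_count values, reported treat_mean)."""
    result = {}
    for record in csv.DictReader(handle, delimiter="\t"):
        values = split_counts(record["treatment_count"])
        computed = sum(values) / len(values)
        result[record["sgrna"]] = (computed, float(record["treat_mean"]))
    return result


means = treatment_means(io.StringIO(SGRNA_SUMMARY))
for sgrna, (computed, reported) in means.items():
    print(sgrna, round(computed, 2), reported)
```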

The contents of each column are as follows:


More Information

Overview of the MAGeCK algorithm

Briefly, read counts from different samples are first median-normalized to adjust for the effect of library sizes and read count distributions. Then the variance of read counts is estimated by sharing information across features, and a negative binomial (NB) model is used to test whether sgRNA abundance differs significantly between treatments and controls. This approach is similar to those used for differential RNA-Seq analysis. We rank sgRNAs based on P-values calculated from the NB model, and use a modified robust ranking aggregation (RRA) algorithm named α-RRA to identify positively or negatively selected genes. More specifically, α-RRA assumes that if a gene has no effect on selection, then sgRNAs targeting this gene should be uniformly distributed across the ranked list of all the sgRNAs. α-RRA ranks genes by comparing the skew in rankings to the uniform null model, and prioritizes genes whose sgRNA rankings are consistently higher than expected. α-RRA calculates the statistical significance of the skew by permutation, and a detailed description of the algorithm is presented in the Materials and methods section of the MAGeCK paper. Finally, MAGeCK reports positively and negatively selected pathways by applying α-RRA to the rankings of genes in a pathway.
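
The median-normalization step described above can be sketched in a few lines. The version below uses the median-of-ratios scheme that is common in RNA-Seq analysis: each sample's size factor is the median, over sgRNAs, of the ratio between its count and the geometric mean of that sgRNA's counts across samples. Whether this matches MAGeCK's implementation in every detail is an assumption, and the function names and input counts are made up for illustration.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Return one size factor per sample.

    counts is a list of rows, one per sgRNA, each a list of per-sample
    read counts. sgRNAs with a zero count in any sample are skipped,
    because their geometric mean would be zero.
    """
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue
        geo_mean = math.exp(sum(math.log(c) for c in row) / n_samples)
        for j, c in enumerate(row):
            ratios[j].append(c / geo_mean)
    return [median(r) for r in ratios]


def median_normalize(counts):
    """Divide each count by its sample's size factor."""
    factors = median_ratio_size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Made-up counts: the second sample was sequenced twice as deeply.
raw = [[10, 20], [100, 200], [30, 60]]
factors = median_ratio_size_factors(raw)
normalized = median_normalize(raw)
print(factors)
print(normalized[0])
```

After normalization the two samples in this toy input have identical counts, which is the expected outcome when the samples differ only in sequencing depth.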

For more information on using MAGeCK, see the MAGeCK website.