fgsea is a Bioconductor package for fast preranked gene set enrichment analysis (GSEA). Its speed comes from an algorithm that calculates the GSEA statistic cumulatively, which allows random samples to be reused across gene sets of different sizes. See the preprint for algorithmic details.
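For background on the GSEA statistic mentioned above, here is a minimal Python sketch of the classic weighted running-sum enrichment score (ES) for a single gene set. It is not fgsea's cumulative sampling algorithm, which lives in the R/C++ package. The function name `enrichment_score` and the default weight exponent `p = 1` are illustrative assumptions. Genes in the set raise the running sum in proportion to |stat|^p, genes outside it lower it by a constant step, and the ES is the largest deviation from zero.

```python
from typing import Sequence, Set, Tuple


def enrichment_score(ranked: Sequence[Tuple[str, float]],
                     gene_set: Set[str], p: float = 1.0) -> float:
    """Hypothetical helper: weighted running-sum ES for one gene set.

    `ranked` holds (gene, stat) pairs sorted by decreasing stat.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    # Normaliser so that the hit increments sum to 1 over the whole list.
    hit_norm = sum(abs(stat) ** p for (_, stat), hit in zip(ranked, hits) if hit)
    if hit_norm == 0:
        raise ValueError("all genes in the set have a zero statistic")
    running, best = 0.0, 0.0
    for (_, stat), hit in zip(ranked, hits):
        if hit:
            running += abs(stat) ** p / hit_norm  # step up, weighted by the statistic
        else:
            running -= 1.0 / n_miss               # equal step down for each non-member
        if abs(running) > abs(best):
            best = running                        # keep the maximum deviation, with its sign
    return best


ranked = [("VDR", 67.198), ("IL20RA", 65.963), ("MPHOSPH10", 51.353), ("RCAN1", 50.269)]
# Both set members sit at the top of the list, so the ES is close to +1.
print(enrichment_score(ranked, {"VDR", "IL20RA"}))
```

Significance in GSEA comes from comparing this ES with scores from random gene sets of the same size. fgsea's speed-up lies in reusing those random samples across set sizes rather than in computing the ES itself.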
Inputs
Ranked Genes
A two-column file containing a ranked list of genes is required. The first column must contain the gene identifiers and the second column the statistic used for ranking. Gene identifiers must be unique within the file and must be of the same type as the identifiers in the Gene Sets file.
Example:
Symbol | Ranked Stat |
VDR | 67.198 |
IL20RA | 65.963 |
MPHOSPH10 | 51.353 |
RCAN1 | 50.269 |
HILPDA | 50.015 |
TSC22D3 | 47.496 |
FAM107B | 45.926 |
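The requirements above (two columns, unique identifiers, a numeric ranking statistic) can be checked before running the tool. This is a minimal Python sketch that assumes a tab-separated file with one header line, as in the example; the function name `read_ranked_genes` is hypothetical and not part of fgsea or this wrapper.

```python
from typing import Iterable, List, Tuple


def read_ranked_genes(lines: Iterable[str],
                      has_header: bool = True) -> List[Tuple[str, float]]:
    """Hypothetical validator: parse (identifier, statistic) rows.

    Assumes tab-separated columns. Raises ValueError on a wrong column count,
    a non-numeric statistic or a duplicate identifier. Returns the pairs
    sorted by decreasing statistic.
    """
    ranked, seen = [], set()
    header_pending = has_header
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if header_pending:
            header_pending = False  # skip the first non-blank line (column names)
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {n}: expected 2 columns, got {len(fields)}")
        gene, stat = fields[0].strip(), float(fields[1])  # float() raises ValueError if not numeric
        if gene in seen:
            raise ValueError(f"line {n}: duplicate identifier {gene!r}")
        seen.add(gene)
        ranked.append((gene, stat))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


example = "Symbol\tRanked Stat\nVDR\t67.198\nIL20RA\t65.963\nMPHOSPH10\t51.353\n"
print(read_ranked_genes(example.splitlines()))
```

Raising an error on duplicates, instead of silently keeping one value, mirrors the requirement that identifiers must not repeat. A repeated identifier usually means the list was built from probe- or transcript-level results that still need collapsing to one row per gene.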
Gene Sets
A Gene Sets file is required. This can be a tabular file in Gene Matrix Transposed (GMT) format. In GMT format, each row represents a gene set: the set name is in the first column, a description in the second, and the identifiers of the genes in the set in the remaining columns (see the example below). GMT files with any type of identifier (e.g. Entrez IDs, gene symbols) can be used, but the same type of identifier must be present in the Ranked Genes file. More information on GMT format can be found at the Broad website. GMT files for human gene sets can be obtained from the Broad's MSigDB collections.
HALLMARK_APOPTOSIS | http://www.broadinstitute.org/gsea/msigdb/cards/HALLMARK_APOPTOSIS | CASP3 | CASP9 | ... |
HALLMARK_HYPOXIA | http://www.broadinstitute.org/gsea/msigdb/cards/HALLMARK_HYPOXIA | PGK1 | PDK1 | ... |
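Because the GMT and Ranked Genes files must use the same identifier type, it helps to count how many genes of each set actually appear in the ranked list. This minimal Python sketch assumes tab-separated GMT rows. The names `read_gmt` and `overlap_report` are hypothetical helpers, not fgsea functions.

```python
from typing import Dict, Iterable, List


def read_gmt(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical parser: name<TAB>description<TAB>gene1<TAB>gene2 ..."""
    gene_sets: Dict[str, List[str]] = {}
    for n, line in enumerate(lines, start=1):
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        if not any(fields):
            continue  # skip blank lines
        if len(fields) < 3:
            raise ValueError(f"line {n}: a GMT row needs a name, a description and at least one gene")
        name, genes = fields[0], fields[2:]  # fields[1] is the description, unused here
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing cells
    return gene_sets


def overlap_report(gene_sets: Dict[str, List[str]], ranked_ids: Iterable[str]) -> Dict[str, int]:
    """Number of distinct genes per set that also occur in the ranked list."""
    ids = set(ranked_ids)
    return {name: len(set(genes) & ids) for name, genes in gene_sets.items()}


gmt = (
    "HALLMARK_APOPTOSIS\thttp://www.broadinstitute.org/gsea/msigdb/cards/HALLMARK_APOPTOSIS\tCASP3\tCASP9\n"
    "HALLMARK_HYPOXIA\thttp://www.broadinstitute.org/gsea/msigdb/cards/HALLMARK_HYPOXIA\tPGK1\tPDK1\n"
)
sets = read_gmt(gmt.splitlines())
print(overlap_report(sets, ["CASP3", "PGK1", "VDR"]))
```

If every set reports an overlap of zero, the two files most likely use different identifier types, for example Entrez IDs in one and gene symbols in the other.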
Alternatively, an RData file containing a collection of gene sets can be provided, such as the ones provided here containing mouse versions of the MSigDB collections.
Outputs
Wrapper released under MIT License. Copyright (c) 2017 Mark Dunning