LDA Effect Size (LEfSe) (Segata et al., 2010) is an algorithm for high-dimensional biomarker discovery and explanation. It identifies genomic features (genes, pathways, or taxa) that characterize the differences between two or more biological conditions (or classes, see figure below). It emphasizes both statistical significance and biological relevance, allowing researchers to identify differentially abundant features that are also consistent with biologically meaningful categories (subclasses). LEfSe first robustly identifies features that are statistically different among biological classes. It then performs additional tests to assess whether these differences are consistent with the expected biological behavior.
Specifically, LEfSe first uses the non-parametric factorial Kruskal-Wallis (KW) rank-sum test to detect features with significant differential abundance with respect to the class of interest. Biological significance is then investigated with a set of pairwise tests among subclasses using the (unpaired) Wilcoxon rank-sum test. As a last step, LEfSe uses Linear Discriminant Analysis (LDA) to estimate the effect size of each differentially abundant feature and, if desired by the investigator, to perform dimension reduction.

LEfSe consists of six modules performing the following steps (see the figure below). The first step is to upload your file using Galaxy's "Get-Data / Upload-file". The next steps are:

A) Format Data for LEfSe: selects the structure of the problem (classes, subclasses, subjects) and formats the tabular abundance data for module B
B) LDA Effect Size (LEfSe): performs the analysis using the data formatted with module A and provides input for the visualization modules (C, D, E, F)
C) Plot LEfSe Results: graphically reports the discovered biomarkers (output of B) with their effect sizes
D) Plot Cladogram: graphically represents the discovered biomarkers (output of B) in a taxonomic tree specified by the hierarchical feature names (not available for non-hierarchical features)
E) Plot One Feature: plots the row values of a feature (biomarker or not) as an abundance histogram with the class and subclass structure (one feature at a time)
F) Plot Differential Features: plots the row values of all features (biomarkers or not) as abundance histograms with the class and subclass structure and provides a zip archive of the figures
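The two screening tests described above can be illustrated with a minimal, standard-library Python sketch. It is not the code run by module B. The function names, the nested `class -> subclass -> values` layout, the 0.05 threshold, the normal approximation for the Wilcoxon test, the omission of tie corrections, and the rule that every cross-class subclass pair must be significant and point in the same direction are all simplifying assumptions made for illustration. The LDA effect-size step is not covered.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks of `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df >= 1."""
    y = max(x, 0.0) / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def wilcoxon_rank_sum(x, y):
    """Unpaired rank-sum test; returns (z, two-sided p) via normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return z, math.erfc(abs(z) / math.sqrt(2))


def is_candidate_biomarker(feature, alpha=0.05):
    """Apply the KW test across classes, then pairwise subclass tests."""
    classes = list(feature)
    per_class = [[v for sub in feature[c].values() for v in sub] for c in classes]
    h = kruskal_wallis_h(per_class)
    if chi2_sf(h, len(classes) - 1) >= alpha:
        return False  # not differentially abundant between classes
    for c1, c2 in combinations(classes, 2):
        directions = set()
        for s1 in feature[c1].values():
            for s2 in feature[c2].values():
                z, p = wilcoxon_rank_sum(s1, s2)
                if p >= alpha:
                    return False  # one subclass pair does not differ
                directions.add(z > 0)
        if len(directions) > 1:
            return False  # subclass pairs disagree on the direction
    return True


# Hypothetical abundances of one feature, grouped by class and subclass.
separated = {
    "healthy": {"male": [1, 2, 3, 4], "female": [5, 6, 7, 8]},
    "disease": {"male": [11, 12, 13, 14], "female": [15, 16, 17, 18]},
}
overlapping = {
    "healthy": {"male": [1, 3, 5, 7], "female": [2, 4, 6, 8]},
    "disease": {"male": [1.5, 3.5, 5.5, 7.5], "female": [2.5, 4.5, 6.5, 8.5]},
}
print(is_candidate_biomarker(separated), is_candidate_biomarker(overlapping))
```

In this sketch a feature is kept only if it passes both screens. With very small subclasses the normal approximation is rough, so the exact p-values should not be taken at face value.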
hg clone https://toolshed.g2.bx.psu.edu/repos/george-weingart/lefse
| Name | Description | Version | Minimum Galaxy Version |
|---|---|---|---|
| | | 1.0 | any |
| | | 1.0 | any |
| | | 1.0 | any |
| | | 1.0 | any |
| | | 1.0 | any |
| | | 1.0 | any |