Clustering using RaceID (version 0.2.3+galaxy3)

RaceID is a clustering algorithm for identifying cell types from single-cell RNA-sequencing data. It was specifically designed to detect rare cell types, which appear as outliers in conventional clustering methods.

This module performs clustering and outlier detection, and ultimately tells you how well defined your clusters are (even if the resulting t-SNE plots look messy).

The tool generates the following:

- A list of the most differentially expressed genes in each cluster

- Cluster stability plots:

  - A plot of the mean within-cluster dispersion as a function of the cluster number, highlighting the saturation point inferred by the saturation criterion applied by RaceID3. The saturation point, highlighted in blue, is the number of clusters at which the change in within-cluster dispersion upon adding further clusters approaches linear behaviour.

    - The point where this curve flattens out gives a rough estimate of how many clusters are present in your data.

  - A scatter plot showing the gene expression variance as a function of the mean, together with the inferred polynomial fit of the background model and a local regression.

    - This shows which genes vary significantly more than expected under a background model of random expression.

  - A Jaccard stability plot, which shows how well defined your clusters are prior to outlier identification.

    - Good, stable clusters should score near 0.8, although 0.6 is acceptable; when there are many clusters, one or two low-scoring clusters are also acceptable.

- Heatmaps:

  - The initial and final clustering (as determined using a random forest)
  - The most differentially expressed genes in each cluster
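
The idea behind the within-cluster dispersion plot can be illustrated with a small Python sketch. It is not RaceID3's implementation: the function name `saturation_point`, the tolerance `tol`, and the rule used to decide that the curve has become "approximately linear" (consecutive decreases in dispersion differing by less than a fraction of the first decrease) are assumptions made for illustration only.

```python
def saturation_point(dispersion, tol=0.05):
    """Return the smallest cluster number k from which the mean
    within-cluster dispersion decreases approximately linearly.

    dispersion[i] is the mean within-cluster dispersion for k = i + 1 clusters.
    ASSUMPTION (hypothetical rule, not RaceID3's exact criterion): the curve is
    treated as linear once two consecutive decreases differ by less than
    tol times the first decrease.
    """
    if len(dispersion) < 3:
        raise ValueError("need dispersion values for at least 3 cluster numbers")
    # First differences: change in dispersion when going from k to k + 1 clusters.
    deltas = [dispersion[i + 1] - dispersion[i] for i in range(len(dispersion) - 1)]
    scale = abs(deltas[0]) or 1.0
    for i in range(len(deltas) - 1):
        # Second difference: how much the change itself changes.
        if abs(deltas[i + 1] - deltas[i]) < tol * scale:
            return i + 1  # index i corresponds to k = i + 1 clusters
    # No saturation detected within the tested range of cluster numbers.
    return len(dispersion)
```

For example, with dispersions `[10, 6, 4, 3.5, 3.0, 2.5]` for k = 1 to 6, the decrease per added cluster is constant from k = 3 onward, so this sketch returns 3.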
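
The variance-versus-mean background model can be sketched in the same spirit. The code below assumes the background is a second-order polynomial fitted by least squares to log variance as a function of log mean, and flags genes whose variance exceeds the fitted value by a hypothetical `fold` factor. The function names and the threshold are illustrative assumptions; RaceID's own fit and its significance test may differ.

```python
import math


def fit_log_quadratic(means, variances):
    """Least-squares fit of log(var) = a + b*log(mean) + c*log(mean)**2.

    ASSUMPTION: a quadratic in log space stands in for the background model.
    Returns the coefficients [a, b, c].
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    # Normal equations (X^T X) beta = X^T y for the design matrix [1, x, x^2].
    s = [sum(x ** p for x in xs) for p in range(5)]  # sums of x^0 .. x^4
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    A = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]


def excess_variance_genes(genes, means, variances, fold=2.0):
    """Return genes whose variance is more than `fold` times the fitted
    background variance (hypothetical threshold, for illustration only)."""
    a, b, c = fit_log_quadratic(means, variances)
    flagged = []
    for gene, m, v in zip(genes, means, variances):
        x = math.log(m)
        expected = math.exp(a + b * x + c * x * x)
        if v > fold * expected:
            flagged.append(gene)
    return flagged
```

Fitting in log space keeps highly and lowly expressed genes on a comparable scale; genes lying well above the curve are the candidates for biologically meaningful variability.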
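
A Jaccard stability score compares each cluster with its best-matching cluster after the cells are resampled and re-clustered, then averages that best match over many replicates. The sketch below covers only the scoring step; how the replicates are generated (for example, bootstrap resampling of cells) is left out, and the function names and the dictionary-of-sets input format are assumptions rather than RaceID's interface.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of cell IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(original, replicates):
    """Mean best-match Jaccard score of each original cluster across replicates.

    original:   dict mapping cluster label -> set of cell IDs (hypothetical format)
    replicates: list of clusterings in the same format, e.g. from re-clustering
                resampled cells; only cells present in a replicate are compared.
    """
    scores = {}
    for label, cells in original.items():
        per_rep = []
        for rep in replicates:
            sampled = set().union(*rep.values())
            # Restrict the original cluster to the cells that were resampled.
            present = set(cells) & sampled
            if not present:
                continue
            # Best match: the replicate cluster most similar to this cluster.
            per_rep.append(max(jaccard(present, other) for other in rep.values()))
        scores[label] = sum(per_rep) / len(per_rep) if per_rep else float("nan")
    return scores
```

Scores from such a calculation are read against the thresholds given for the Jaccard plot: a cluster recovered almost unchanged in every replicate scores close to 1, while one that is repeatedly split or merged drops towards the 0.6 range or below.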

The tool requires the RDS input from the previous filtering / normalisation / confounder removal step to work.