GraphEmbed (version 2.4.0)

Input Matrix:

An feature-observation matrix, with features as rows and observations as columns (e.g. Genes vs Cells)

Observation Classes:

A two-column file with observations in the first column, and an integer representing their assigned class in the second column (e.g. 'YFPCD24X3w_20 2')

Confidence bias for clustering:

Number of links towards closest neighbour with the same class:

Number of links towards denser neighbours with a different class:

Number of nearest neighbours to limit the horizon of where to search for denser neighbours of a different class:

Number of mutual nearest neighbours that define outlier instances.:

Use Advanced Parameters?:

Output Log file:

GraphEmbed

Compute a 2D embedding of a data matrix given supervised class information.

Input: A discrete label for each instance is expected.

A graph is built where nodes are instances and there exist two types of edges:

'knn' edges

An edge to the k-th nearest instance that has the same label.

'k_shift' edges

An edge to the k-th nearest instance that is denser and has a different label

Density is defined as the sum of the pairwise cosine similarity between an instance and all the other instances. The desired edge length is the euclidean distance between the instances. If the endpoints of an edge have the same label then the desired distance is divided by 1 + class_confidence. A k-shift edge is deleted if at least one of the endpoints is an outlier. Outlier nodes are defined as those instances that have no mutual k neighbors.

Finally the embedding is computed as the 2D coordinates of the corresponding graph embedding using the force layout algorithm from Tomihisa Kamada, and Satoru Kawai. "An algorithm for drawing general undirected graphs.", Information processing letters 31, no. 1 (1989): 7-15.