Galaxy | Tool Preview

GraphEmbed (version 2.4.0)
An feature-observation matrix, with features as rows and observations as columns (e.g. Genes vs Cells)
A two-column file with observations in the first column, and an integer representing their assigned class in the second column (e.g. 'YFPCD24X3w_20 2')


Compute a 2D embedding of a data matrix given supervised class information.

Input: A discrete label for each instance is expected.

A graph is built where nodes are instances and there exist two types of edges:

  • 'knn' edges
    An edge to the k-th nearest instance that has the same label.
  • 'k_shift' edges
    An edge to the k-th nearest instance that is denser and has a different label

Density is defined as the sum of the pairwise cosine similarity between an instance and all the other instances. The desired edge length is the euclidean distance between the instances. If the endpoints of an edge have the same label then the desired distance is divided by 1 + class_confidence. A k-shift edge is deleted if at least one of the endpoints is an outlier. Outlier nodes are defined as those instances that have no mutual k neighbors.

Finally the embedding is computed as the 2D coordinates of the corresponding graph embedding using the force layout algorithm from Tomihisa Kamada, and Satoru Kawai. "An algorithm for drawing general undirected graphs.", Information processing letters 31, no. 1 (1989): 7-15.