The data retrieval tool presented here allows the user to retrieve expression matrices and metadata for any public experiment available on the Human Cell Atlas (HCA) data portal.
To use it, set the name, label, or ID of the desired project, which can be found in the HCA data browser (https://data.humancellatlas.org/explore/projects), and select the matrix format (Matrix Market or Loom).
For projects that include more than one organism, an organism must be specified. If none is specified, the job fails and lists the available organisms in its stdout.
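The organism rule above can be sketched as a small selection function. This is a hypothetical illustration of the described behaviour, not the tool's own code: the function name `select_organism` and the error messages are made up, and rejecting an organism that is not in the project is an added assumption.

```python
def select_organism(available, requested=None):
    """Hypothetical sketch of the organism rule; not the tool's own code.

    available: organisms present in the project, e.g. ["Homo sapiens"].
    requested: the organism set by the user, or None if left empty.
    """
    if requested is not None:
        # Assumption: an organism absent from the project is rejected too.
        if requested not in available:
            raise ValueError(
                f"{requested!r} is not in this project; options: {', '.join(available)}"
            )
        return requested
    if len(available) == 1:
        # Single-organism project: no choice is needed.
        return available[0]
    # Multi-organism project and nothing specified: fail, listing the options
    # (the real job prints these to its stdout).
    raise ValueError("several organisms found; specify one of: " + ", ".join(available))


print(select_organism(["Homo sapiens"]))  # Homo sapiens
```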
The outputs depend on the selected format.
When "Matrix Market" is selected, the outputs are in 10X-compatible Matrix Market format:
Matrix (txt):
Contains the expression values, in raw counts, for genes (rows) and cells (columns). The file is formatted as a Matrix Market file and is therefore accompanied by separate files for the gene identifiers and the cell identifiers.
Genes (tsv):
Identifiers for the genes in the expression matrix, in the same order as the matrix rows. The identifier column is repeated, so the file has two identical columns.
Barcodes (tsv):
Identifiers for the cells of the data matrix. The file is ordered to match the columns of the matrix.
Experiment Design file (tsv):
Contains metadata for the different cells of the experiment.
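The four Matrix Market outputs described above can be loaded together with only the Python standard library. The sketch below is illustrative and not part of the tool: the function names and toy inputs are hypothetical, and it assumes that the first column of the experiment design file holds the same cell identifiers as the barcodes file. For real data, `scipy.io.mmread` is the usual way to read the matrix itself.

```python
import csv
import io


def read_matrix_market(handle):
    """Parse a coordinate Matrix Market file into a sparse dict.

    Returns (n_rows, n_cols, {(row, col): value}) with 0-based indices.
    """
    header = handle.readline()
    if not header.lower().startswith("%%matrixmarket matrix coordinate"):
        raise ValueError("expected a coordinate Matrix Market file")
    line = handle.readline()
    while line.startswith("%"):  # skip comment lines before the size line
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(x) for x in line.split())
    entries = {}
    for _ in range(n_entries):
        i, j, value = handle.readline().split()
        # Matrix Market indices are 1-based; store them 0-based.
        entries[(int(i) - 1, int(j) - 1)] = float(value)
    return n_rows, n_cols, entries


def read_first_column(handle):
    """Return the first tab-separated field of every non-empty line."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row[0] for row in reader if row]


def load_hca_matrix(matrix_handle, genes_handle, barcodes_handle, design_handle):
    """Hypothetical helper: matrix, gene IDs, cell IDs and per-cell metadata."""
    n_rows, n_cols, entries = read_matrix_market(matrix_handle)
    genes = read_first_column(genes_handle)        # one gene per matrix row
    barcodes = read_first_column(barcodes_handle)  # one cell per matrix column
    if len(genes) != n_rows or len(barcodes) != n_cols:
        raise ValueError("genes/barcodes do not match the matrix dimensions")
    # Assumption: the design file's first column is the cell identifier.
    reader = csv.reader(design_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    by_cell = {row[0]: dict(zip(header, row)) for row in reader if row}
    # Align metadata to the matrix column order; missing cells get {}.
    metadata = [by_cell.get(cell, {}) for cell in barcodes]
    return genes, barcodes, entries, metadata


# Toy inputs (made up): 2 genes x 3 cells, 3 non-zero counts.
matrix = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "%\n"
    "2 3 3\n"
    "1 1 5\n"
    "2 2 1\n"
    "1 3 2\n"
)
genes_tsv = io.StringIO("gene1\tgene1\ngene2\tgene2\n")
barcodes_tsv = io.StringIO("cellA\ncellB\ncellC\n")
design_tsv = io.StringIO("cell_id\tcell_type\ncellA\tT cell\n")

genes, barcodes, entries, metadata = load_hca_matrix(
    matrix, genes_tsv, barcodes_tsv, design_tsv
)
print(entries[(0, 2)])  # 2.0
print(metadata[1])  # {}
```

Storing only non-zero entries keeps memory proportional to the number of counts rather than genes times cells, which matters because single-cell count matrices are mostly zeros.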
When "Loom" is selected, the output is a single Loom HDF5 file:
Loom (h5):
Contains the expression values, in raw counts, for genes (rows) and cells (columns), together with a cell metadata table and a gene metadata table, in a single HDF5 file following the Loom specification (http://linnarssonlab.org/loompy/format/index.html).
Version history
0.0.4+galaxy0: Retrieves data from the EBI FTP server until an equivalent Matrix service for DCP 2.0 is established. Handles multi-organism studies.
0.0.2+galaxy0: Initial contribution. Ni Huang and Pablo Moreno, Teichmann Lab at Wellcome Sanger Institute and Expression Atlas team https://www.ebi.ac.uk/gxa/home at EMBL-EBI https://www.ebi.ac.uk/.