In Galaxy, an immuneML dataset is simply a Galaxy collection containing all relevant files (including an optional metadata file). The Create dataset Galaxy tool allows users to import data from various formats and create immuneML datasets in Galaxy. These datasets are stored in an optimized binary (Pickle) format, so they can be loaded quickly into other Galaxy tools without the import parameters having to be specified again.
Before creating a dataset, the relevant data files must first be uploaded to Galaxy. This can be done either by uploading files from your local computer (use the 'Upload file' tool under the 'Get local data' menu), or by fetching remote data from the iReceptor Plus Gateway or VDJdb (see the tutorial 'How to import remote AIRR datasets in Galaxy').
The imported immuneML dataset is stored in a Galaxy collection, which will appear as a history item on the right side of the screen, and can later be selected as input to other tools.
The tool has a simplified and an advanced interface. The simplified interface is fully button-based, and relies on default settings for importing datasets. The advanced interface gives full control over import settings through a YAML specification. In most cases, the simplified interface will suffice.
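As a rough illustration of what the advanced interface expects, the following is a minimal sketch of a YAML specification, assuming an AIRR-format repertoire dataset. The dataset name `my_dataset`, the instruction name `my_dataset_export_instruction` and the metadata file name `metadata.csv` are hypothetical placeholders, and the available import parameters differ per format, so the tutorial should be consulted for the authoritative list.

```yaml
definitions:
  datasets:
    my_dataset:                    # hypothetical, user-defined dataset name
      format: AIRR                 # assumed import format for this sketch
      params:
        path: ./                   # assumes the data files sit in the working directory
        is_repertoire: true        # import as a repertoire dataset
        metadata_file: metadata.csv  # hypothetical metadata file name
instructions:
  my_dataset_export_instruction:   # hypothetical, user-defined instruction name
    type: DatasetExport
    datasets:
    - my_dataset                   # must match the dataset name defined above
    export_formats:
    - ImmuneML                     # assumed choice: the binary format used by other immuneML Galaxy tools
```

In this sketch the `definitions` section describes how the files are read, and the `instructions` section tells immuneML to export the resulting dataset.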
For exhaustive documentation of this tool and more information about immuneML datasets, see the tutorial 'How to make an immuneML dataset in Galaxy'.
Tool output
This Galaxy tool will produce the following history elements:
- ImmuneML dataset: a sequence, receptor or repertoire dataset which can be used as an input to other immuneML tools. The history element contains a summary HTML page describing general characteristics of the dataset, including the name of the dataset (which is used in the dataset definition of a YAML specification), the dataset type and size, available labels, and a link to download the raw data files.
- create_dataset.yaml: the YAML specification file that was used by immuneML to create the dataset. This file can be downloaded, altered (for example, to export files in AIRR format or to use non-standard import parameters), and run again using the 'Advanced' interface.