comparison immuneml_simulate_dataset.xml @ 0:629e7e403e19 draft
"planemo upload commit 2fed2858d4044a3897a93a5604223d1d183ceac0-dirty"
author | immuneml |
---|---|
date | Thu, 01 Jul 2021 11:36:43 +0000 |
parents | |
children | ed3932e6d616 |
-1:000000000000 | 0:629e7e403e19 |
---|---|
1 <tool id="immuneml_simulate_dataset" name="Simulate a synthetic immune receptor or repertoire dataset" version="@VERSION@.0"> | |
2 <description></description> | |
3 <macros> | |
4 <import>prod_macros.xml</import> | |
5 </macros> | |
6 <expand macro="requirements" /> | |
7 <command><![CDATA[ | |
8 | |
9 cp "$yaml_input" yaml_copy && | |
10 immune-ml ./yaml_copy "${html_outfile.files_path}" --tool DataSimulationTool && | |
11 | |
12 mv "${html_outfile.files_path}/index.html" "${html_outfile}" && | |
13 mv "${html_outfile.files_path}/immuneML_output.zip" "$archive" | |
14 | |
15 ]]> | |
16 </command> | |
17 <inputs> | |
18 <param name="yaml_input" type="data" format="txt" label="YAML specification" multiple="false"/> | |
19 </inputs> | |
20 <outputs> | |
21 <data format="zip" name="archive" label="Archive: dataset simulation"/> | |
22 <data format="iml_dataset" name="html_outfile" label="ImmuneML dataset (simulated sequences)"/> | |
23 </outputs> | |
24 | |
25 | |
26 <help><![CDATA[ | |
27 | |
28 This Galaxy tool allows you to quickly make a dummy dataset. | |
29 The tool generates a SequenceDataset, ReceptorDataset or RepertoireDataset consisting of random CDR3 sequences, which could be used for benchmarking machine learning methods or encodings, | |
30 or testing out other functionalities. | |
31 The amino acids in the sequences are chosen from a uniform random distribution, and there is no underlying structure in the sequences. | |
32 | |
33 You can control: | |
34 | |
35 - The number of sequences in the dataset and, in the case of a RepertoireDataset, the number of repertoires | |
36 | |
37 - The length of the generated sequences | |
38 | |
39 - Labels, which can be used as a target when training ML models | |
40 | |
41 Note that because these labels are randomly assigned, they carry no meaning, and it is not possible to train an ML model with high classification accuracy on this data. | |
42 Meaningful labels can be added using the `Simulate immune events into existing repertoire/receptor dataset <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_simulation>`_ Galaxy tool. | |
43 | |
44 For exhaustive documentation of this tool and an example YAML specification, see the tutorial `How to simulate an AIRR dataset in Galaxy <https://docs.immuneml.uio.no/galaxy/galaxy_simulate_dataset.html>`_. | |
45 | |
46 **Tool output** | |
47 | |
48 This Galaxy tool will produce the following history elements: | |
49 | |
50 - ImmuneML dataset (simulated sequences): a sequence, receptor or repertoire dataset which can be used as an input to other immuneML tools. The history element contains a summary HTML page describing general characteristics of the dataset, including the name of the dataset | |
51 (which is used in the dataset definition of a YAML specification), the dataset type and size, available labels, and a link to download the raw data files. | |
52 | |
53 - Archive: dataset simulation: a .zip file containing the complete output folder as it was produced by immuneML. This folder | |
54 contains the output of the DatasetExport instruction including raw data files. | |
55 Furthermore, the folder contains the complete YAML specification file for the immuneML run, the HTML output and a log file. | |
56 | |
57 | |
58 ]]> | |
59 </help> | |
60 | |
61 </tool> |
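
The help text above describes sequences whose amino acids are drawn from a uniform random distribution, with user-controlled repertoire count, sequence count, sequence lengths, and randomly assigned labels. The following is a minimal Python sketch of that idea, not immuneML's actual implementation. All names here (`simulate_repertoires`, `weighted_choice`, `random_sequence`, and the parameter names) are hypothetical and chosen only for illustration.

```python
import random

# The 20 standard amino acids, each equally likely to be drawn.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def weighted_choice(rng, probabilities):
    """Pick one key from a {value: probability} mapping."""
    values = list(probabilities)
    weights = [probabilities[value] for value in values]
    return rng.choices(values, weights=weights, k=1)[0]


def random_sequence(rng, length):
    """Build one sequence by sampling amino acids uniformly at random."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def simulate_repertoires(repertoire_count, sequence_count,
                         length_probabilities, label_probabilities, seed=0):
    """Return a list of repertoires, each with a random label and sequences.

    Hypothetical helper: it mirrors the controls listed in the help text
    (dataset size, sequence length, labels) but is not the immuneML API.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    repertoires = []
    for _ in range(repertoire_count):
        sequences = [
            random_sequence(rng, weighted_choice(rng, length_probabilities))
            for _ in range(sequence_count)
        ]
        # The label is drawn independently of the sequences, so it carries
        # no signal that a classifier could learn.
        label = weighted_choice(rng, label_probabilities)
        repertoires.append({"label": label, "sequences": sequences})
    return repertoires


dataset = simulate_repertoires(
    repertoire_count=4,
    sequence_count=3,
    length_probabilities={12: 0.5, 14: 0.5},
    label_probabilities={True: 0.5, False: 0.5},
)
```

Because the label is sampled independently of the sequence content, a model trained on such data should perform at chance level, which matches the note in the help text about randomly assigned labels.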