Uploaded by dcouvin, Mon, 10 Jan 2022 20:06:07 +0000

# Quick guide to running ResFinder with Cromwell

### Disclaimer
Support is not offered for running Cromwell, and none of the files in this
directory are guaranteed to work. These files were uploaded as inspiration.
Please do not report issues relating to this directory.

## Prepare input files

Two input files are needed:

1. input_data.tsv
2. input.json

Templates can be found in the ResFinder directory scripts/wdl.

### input_data.tsv
Tab-separated file. It should contain the following columns, in this order:

1. Absolute path to fasta/fastq file 1
2. Absolute path to fastq file 2 (can be empty, but the column must exist)
3. Species
4. Type of data, must be one of: assembly, paired

Each row should contain a single sample.

#### Species
If the species cannot be provided, put "other" (case-sensitive).

#### Type of data

* assembly: Fasta file containing contigs from a de novo assembly.
* paired: Pair of fastq files containing read data for the forward and reverse
  reads.
* single: **Not implemented** Read data from single-end sequencing.

#### Example
```
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_01_1.fq	/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_01_2.fq	Escherichia coli	paired
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_05_1.fq	/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_05_2.fq	Escherichia coli	paired
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09a_1.fq	/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09a_2.fq	Escherichia coli	paired
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09b_1.fq	/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09b_2.fq	Escherichia coli	paired
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_01.fa		Escherichia coli	assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_02.fa		Escherichia coli	assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_03.fa		Escherichia coli	assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_05.fa		Escherichia coli	assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09a.fa		Escherichia coli	assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09b.fa		Escherichia coli	assembly
```

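The species column can contain spaces (e.g. Escherichia coli). The columns therefore have to be separated by real tab characters, and assembly rows need an empty second column. The sketch below is one way to write such a file with `printf` and then check each row with `awk`. The file name `my_input_data.tsv` and the `/data/...` sample paths are hypothetical placeholders. The checks encode only the rules stated above: four columns, a data type of `assembly` or `paired`, and a fastq file 2 for paired rows. They are not a validation that the WDL itself performs.

```bash
#!/usr/bin/env bash
# Minimal sketch: write a small input_data.tsv and sanity-check it.
# The file name and sample paths are hypothetical placeholders.
set -euo pipefail

tsv="my_input_data.tsv"

# printf reuses its format for every group of four arguments, so each
# group becomes one tab-separated row. The assembly row passes "" as
# fastq file 2, which leaves an empty (but present) second column.
printf '%s\t%s\t%s\t%s\n' \
  "/data/sample_A_1.fq" "/data/sample_A_2.fq" "Escherichia coli" "paired" \
  "/data/sample_B.fa"   ""                    "Escherichia coli" "assembly" \
  > "$tsv"

# Check every row: exactly four tab-separated columns, a supported data
# type, and a second fastq file for paired rows.
awk -F'\t' '
  NF != 4 { printf "line %d: expected 4 columns, found %d\n", NR, NF; bad = 1; next }
  $4 != "assembly" && $4 != "paired" { printf "line %d: unsupported data type \"%s\"\n", NR, $4; bad = 1 }
  $4 == "paired" && $2 == "" { printf "line %d: paired row without fastq file 2\n", NR; bad = 1 }
  END { exit bad }
' "$tsv" && echo "$tsv passed the checks"
```

The `awk` step reports each failing row by its line number and exits non-zero, so it can also be run on its own against a hand-edited input_data.tsv.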
### input.json
JSON formatted file containing input and output information.

The file should consist of a single JSON object (dict/hash/map) with the
following keys:

* Resistance.inputSamplesFile: Absolute path to input_data.tsv.
* Resistance.outputDir: Absolute path to output directory.
* Resistance.geneCov: Fraction of gene coverage needed for resistance gene hits.
* Resistance.geneID: Fraction of nucleotide identity needed in resistance gene
  hits.
* Resistance.pointCov: Fraction of gene coverage needed for point mutation gene
  hits.
* Resistance.pointID: Fraction of nucleotide identity needed in point mutation
  gene hits.

If you are running on Computerome and using the input.json template, you
probably won't need to change the following keys:

* Resistance.python: Path to python3 interpreter.
* Resistance.kma: Path to kma application.
* Resistance.blastn: Path to blastn application.
* Resistance.resfinder: Path to run_resfinder.py.
* Resistance.resDB: Path to ResFinder database.
* Resistance.pointDB: Path to PointFinder database.

The values of Resistance.inputSamplesFile and Resistance.outputDir should be
the absolute paths to input_data.tsv and the desired output directory,
respectively.

#### Example

```json
{
  "Resistance.inputSamplesFile": "/home/projects/cge/people/rkmo/delme/res_input.tsv",
  "Resistance.outputDir": "/home/projects/cge/people/rkmo/delme/",
  "Resistance.geneCov": 0.6,
  "Resistance.geneID": 0.8,
  "Resistance.pointCov": 0.6,
  "Resistance.pointID": 0.8,
  "Resistance.python": "python3",
  "Resistance.kma": "/home/projects/cge/apps/resfinder/resfinder/cge/kma/kma",
  "Resistance.blastn": "blastn",
  "Resistance.resfinder": "/home/projects/cge/apps/resfinder/resfinder/run_resfinder.py",
  "Resistance.resDB": "/home/projects/cge/apps/resfinder/resfinder/db_resfinder",
  "Resistance.pointDB": "/home/projects/cge/apps/resfinder/resfinder/db_pointfinder"
}
```

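Between runs, usually only the two user-specific paths in input.json change. One option is to generate the file from a heredoc and check that it parses before starting Cromwell. The sketch below assumes python3 is on PATH and uses its standard `json.tool` module purely as a syntax check. The sample file `my_input_data.tsv` and the output directory `resfinder_output/` under the current directory are hypothetical placeholders. The thresholds and the tool and database paths are copied from the Computerome example.

```bash
#!/usr/bin/env bash
# Minimal sketch: write input.json for the Resistance workflow and check
# that it is valid JSON. The two user paths below are hypothetical
# placeholders; tool and database paths come from the Computerome example.
set -euo pipefail

samples_tsv="$PWD/my_input_data.tsv"   # $PWD keeps this an absolute path
out_dir="$PWD/resfinder_output/"       # likewise absolute

cat > input.json <<EOF
{
  "Resistance.inputSamplesFile": "${samples_tsv}",
  "Resistance.outputDir": "${out_dir}",
  "Resistance.geneCov": 0.6,
  "Resistance.geneID": 0.8,
  "Resistance.pointCov": 0.6,
  "Resistance.pointID": 0.8,
  "Resistance.python": "python3",
  "Resistance.kma": "/home/projects/cge/apps/resfinder/resfinder/cge/kma/kma",
  "Resistance.blastn": "blastn",
  "Resistance.resfinder": "/home/projects/cge/apps/resfinder/resfinder/run_resfinder.py",
  "Resistance.resDB": "/home/projects/cge/apps/resfinder/resfinder/db_resfinder",
  "Resistance.pointDB": "/home/projects/cge/apps/resfinder/resfinder/db_pointfinder"
}
EOF

# json.tool prints the parse error and exits non-zero if the file is not valid JSON.
python3 -m json.tool input.json > /dev/null && echo "input.json is valid JSON"
```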
## Run Cromwell

Cromwell needs Java to run. Load a suitable Java module, for example:

```bash
module load openjdk/16
```

A Cromwell call looks like this:

```bash
java -Dconfig.file=<CONF> -jar <CROMWELL> run <WDL> --inputs <JSON>
```

### `<CONF>` and `<CROMWELL>`
Computerome-specific.

* `<CONF>`: Path to the Computerome configuration for Cromwell. You need to
  change this if you are not running Cromwell on Computerome. Computerome path:
  /home/projects/cge/apps/resfinder/resfinder/scripts/wdl/computerome.conf

* `<CROMWELL>`: Path to the Cromwell jar file on Computerome:
  /services/tools/cromwell/50/cromwell-50.jar

### `<WDL>`
ResFinder-specific.

* `<WDL>`: Path to the WDL file that specifies how to run ResFinder. Path to
  resfinder.wdl on Computerome:
  /home/projects/cge/apps/resfinder/resfinder/scripts/wdl/resfinder.wdl

### `<JSON>`
User/run-specific.

Path to input.json, which specifies all the parameters for ResFinder (see above).

### Run example

```bash
java -Dconfig.file=/home/projects/cge/apps/resfinder/resfinder/scripts/wdl/computerome.conf \
  -jar /services/tools/cromwell/50/cromwell-50.jar \
  run /home/projects/cge/apps/resfinder/resfinder/scripts/wdl/resfinder.wdl \
  --inputs /home/projects/cge/apps/resfinder/resfinder/scripts/wdl/input.json
```

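For repeated runs, it can help to keep the four placeholders in shell variables and assemble the call in one place. The sketch below is a hypothetical helper and is not part of ResFinder. It defaults to the Computerome paths listed above, lets you override each one by exporting `CONF`, `CROMWELL`, `WDL` or `JSON`, and prints the command instead of running it. The functions `cromwell_cmd`, `check_files` and `run_cromwell` are illustrative names.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ResFinder): assemble the Cromwell call
# from the four placeholders. Defaults are the Computerome paths above;
# export CONF, CROMWELL, WDL or JSON beforehand to override them.
set -euo pipefail

CONF="${CONF:-/home/projects/cge/apps/resfinder/resfinder/scripts/wdl/computerome.conf}"
CROMWELL="${CROMWELL:-/services/tools/cromwell/50/cromwell-50.jar}"
WDL="${WDL:-/home/projects/cge/apps/resfinder/resfinder/scripts/wdl/resfinder.wdl}"
JSON="${JSON:-/home/projects/cge/apps/resfinder/resfinder/scripts/wdl/input.json}"

# Print the command that would be run (dry run, nothing is executed).
cromwell_cmd() {
  printf 'java -Dconfig.file=%s -jar %s run %s --inputs %s\n' \
    "$CONF" "$CROMWELL" "$WDL" "$JSON"
}

# Return non-zero, and name each missing file on stderr, if any
# placeholder does not point to an existing file.
check_files() {
  local f status=0
  for f in "$CONF" "$CROMWELL" "$WDL" "$JSON"; do
    if [ ! -f "$f" ]; then
      echo "not found: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Check the inputs, then launch Cromwell with the same arguments.
run_cromwell() {
  check_files || return 1
  java -Dconfig.file="$CONF" -jar "$CROMWELL" run "$WDL" --inputs "$JSON"
}

cromwell_cmd
```

Once the printed command looks right, replace the final `cromwell_cmd` line with `run_cromwell` to actually start the workflow.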
### Post run

All ResFinder output will be located in the provided output directory.

In the directory where you execute Cromwell, the following two directories will
also be created:

* cromwell-executions
* cromwell-workflow-logs

They contain logging information and cached results.
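If a sample fails, the per-task files that Cromwell keeps under cromwell-executions are usually the first place to look. The sketch below assumes Cromwell's usual layout, in which each task attempt has an `execution` directory holding a `stderr` file and an `rc` file with the task's exit code. The exact nesting (workflow name, run ID, `call-*`, `shard-*`) depends on the workflow and backend, so the sketch searches with `find` instead of using fixed paths. `report_failures` is a hypothetical function name.

```bash
#!/usr/bin/env bash
# Minimal sketch: point out task attempts worth inspecting after a run.
# Assumes each task attempt has an execution/ directory holding "stderr"
# and "rc" (exit code) files, as in Cromwell's usual layout.

report_failures() {
  local exec_dir="${1:-cromwell-executions}"
  if [ ! -d "$exec_dir" ]; then
    echo "no $exec_dir directory here; run this where Cromwell was started" >&2
    return 1
  fi

  echo "Non-empty stderr files:"
  # -size +0 matches files larger than zero blocks, i.e. any non-empty file.
  find "$exec_dir" -type f -name stderr -size +0

  echo "Non-zero return codes:"
  find "$exec_dir" -type f -name rc | while IFS= read -r rc_file; do
    # Strip whitespace and the trailing newline before comparing with "0".
    if [ "$(tr -d '[:space:]' < "$rc_file")" != "0" ]; then
      echo "$rc_file"
    fi
  done
}
```

After sourcing the file, call `report_failures` from the directory where Cromwell was started, or pass the path explicitly, e.g. `report_failures /path/to/cromwell-executions`.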