comparison resfinder/scripts/wdl/README.md @ 0:55051a9bc58d draft default tip

Uploaded
author dcouvin
date Mon, 10 Jan 2022 20:06:07 +0000
# Quick guide to running ResFinder with Cromwell

### Disclaimer
No support is offered for running Cromwell, and no file in this directory is
guaranteed to work. These files were uploaded as inspiration. Please do not
report issues relating to this directory.

## Prepare input files

Two input files are needed:

1. input_data.tsv
2. input.json

Templates can be found in the ResFinder directory scripts/wdl.

### input_data.tsv
Tab-separated file. It should contain the following columns, in this order:

1. Absolute path to fasta/fastq file 1
2. Absolute path to fastq file 2 (can be empty, but the column must exist)
3. Species
4. Type of data, must be one of: assembly, paired

Each row should contain a single sample.

#### Species
If the species cannot be provided, put "other" (case sensitive).

#### Type of data

* assembly: Fasta file containing contigs from a de novo assembly.
* paired: Pair of fastq files containing the forward and reverse reads.
* single: **Not implemented** Read data from single-end sequencing.


#### Example
```
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_01_1.fq	/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_01_2.fq	Escherichia coli	paired
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_05_1.fq	/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_05_2.fq	Escherichia coli	paired
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09a_1.fq	/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09a_2.fq	Escherichia coli	paired
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09b_1.fq	/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09b_2.fq	Escherichia coli	paired
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_01.fa		Escherichia coli	assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_02.fa		Escherichia coli	assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_03.fa		Escherichia coli	assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_05.fa		Escherichia coli	assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09a.fa		Escherichia coli	assembly
/home/projects/cge/apps/resfinder/resfinder/tests/data/test_isolate_09b.fa		Escherichia coli	assembly
```

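Because the tab separators and the empty second column are easy to lose when editing by hand, the file can also be generated. The following is a minimal sketch, not part of ResFinder: the helper names `make_row` and `rows_to_tsv` and the sample paths are hypothetical, and it only enforces the column rules listed above.

```python
import csv
import io
import os

# Data types the guide lists as usable; "single" is marked as not implemented.
VALID_TYPES = {"assembly", "paired"}


def make_row(file1, file2, species, data_type):
    """Return one input_data.tsv row as a list of four column values."""
    if data_type not in VALID_TYPES:
        raise ValueError(f"unsupported data type: {data_type!r}")
    if not os.path.isabs(file1):
        raise ValueError(f"path must be absolute: {file1!r}")
    if data_type == "paired" and not file2:
        raise ValueError("paired data needs a second fastq file")
    if data_type == "assembly":
        file2 = ""  # column 2 stays empty, but the column itself is kept
    return [file1, file2, species or "other", data_type]


def rows_to_tsv(rows):
    """Serialise rows as tab-separated text, one sample per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample paths, for illustration only.
rows = [
    make_row("/data/sample_01_1.fq", "/data/sample_01_2.fq", "Escherichia coli", "paired"),
    make_row("/data/sample_02.fa", "", "Escherichia coli", "assembly"),
]
tsv_text = rows_to_tsv(rows)
print(tsv_text, end="")
```

Writing `tsv_text` to a file gives an input_data.tsv in which every row has exactly four tab-separated columns, including the empty one for assemblies.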
### input.json
JSON formatted file containing input and output information.

The file should consist of a single dict/hash/map with the following keys:

* Resistance.inputSamplesFile: Absolute path to input_data.tsv.
* Resistance.outputDir: Absolute path to output directory.
* Resistance.geneCov: Fraction of gene coverage needed for resistance gene hits.
* Resistance.geneID: Fraction of nucleotide identity needed in resistance gene
  hits.
* Resistance.pointCov: Fraction of gene coverage needed for point mutation gene
  hits.
* Resistance.pointID: Fraction of nucleotide identity needed in point mutation gene
  hits.

If you are running on Computerome and using the input.json template, you
probably won't need to change the following:

* Resistance.python: Path to python3 interpreter.
* Resistance.kma: Path to kma application.
* Resistance.blastn: Path to blastn application.
* Resistance.resfinder: Path to run_resfinder.py.
* Resistance.resDB: Path to ResFinder database.
* Resistance.pointDB: Path to PointFinder database.

The values of Resistance.inputSamplesFile and Resistance.outputDir should be
the absolute paths to input_data.tsv and the desired output directory,
respectively.

#### Example

```json
{
  "Resistance.inputSamplesFile": "/home/projects/cge/people/rkmo/delme/res_input.tsv",
  "Resistance.outputDir": "/home/projects/cge/people/rkmo/delme/",
  "Resistance.geneCov": 0.6,
  "Resistance.geneID": 0.8,
  "Resistance.pointCov": 0.6,
  "Resistance.pointID": 0.8,
  "Resistance.python": "python3",
  "Resistance.kma": "/home/projects/cge/apps/resfinder/resfinder/cge/kma/kma",
  "Resistance.blastn": "blastn",
  "Resistance.resfinder": "/home/projects/cge/apps/resfinder/resfinder/run_resfinder.py",
  "Resistance.resDB": "/home/projects/cge/apps/resfinder/resfinder/db_resfinder",
  "Resistance.pointDB": "/home/projects/cge/apps/resfinder/resfinder/db_pointfinder"
}
```

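When many runs share the same tool paths, the run-specific keys can be filled in programmatically on top of the template. The sketch below is one possible way to do that and is not shipped with ResFinder: `build_inputs` is a hypothetical helper, the `/home/user/...` paths are placeholders, and the range check assumes that "fraction" means a value between 0 and 1. The threshold values are simply copied from the example above.

```python
import json

# Threshold keys described above (without the "Resistance." prefix).
THRESHOLD_KEYS = ("geneCov", "geneID", "pointCov", "pointID")


def build_inputs(template, samples_file, output_dir, **thresholds):
    """Copy the template and set the run-specific keys on top of it."""
    inputs = dict(template)
    inputs["Resistance.inputSamplesFile"] = samples_file
    inputs["Resistance.outputDir"] = output_dir
    for name, value in thresholds.items():
        if name not in THRESHOLD_KEYS:
            raise KeyError(f"unknown threshold: {name}")
        if not 0.0 <= value <= 1.0:
            # Assumption: fractions are expressed in the range 0..1.
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        inputs[f"Resistance.{name}"] = value
    return inputs


# Stand-in for the tool-path part of the template (two keys from the example).
template = {"Resistance.python": "python3", "Resistance.blastn": "blastn"}

inputs = build_inputs(
    template,
    "/home/user/run1/input_data.tsv",  # placeholder path
    "/home/user/run1/output",          # placeholder path
    geneCov=0.6, geneID=0.8, pointCov=0.6, pointID=0.8,
)
inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```

In practice the `template` dict would be loaded from the input.json template with `json.load`, so that all six tool and database paths are carried over unchanged.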
## Run Cromwell

Cromwell needs Java to run. Load a valid Java module, for example:

```bash
module load openjdk/16
```

A Cromwell call looks like this:

```bash
java -Dconfig.file=<CONF> -jar <CROMWELL> run <WDL> --inputs <JSON>
```

### <CONF> and <CROMWELL>
Computerome specific.

* <CONF>: Path to the Computerome configuration for Cromwell. You need to change
  this if you are not running Cromwell on Computerome. Computerome path:
  /home/projects/cge/apps/resfinder/resfinder/scripts/wdl/computerome.conf

* <CROMWELL>: Path to the Cromwell jar file on Computerome:
  /services/tools/cromwell/50/cromwell-50.jar

### <WDL>
ResFinder specific.

* <WDL>: Path to the wdl file that specifies how to run ResFinder. Path to
  resfinder.wdl on Computerome:
  /home/projects/cge/apps/resfinder/resfinder/scripts/wdl/resfinder.wdl

### <JSON>
User/run specific.

Path to input.json. Specifies all the parameters for ResFinder (see above).

### Run example

```bash
java -Dconfig.file=/home/projects/cge/apps/resfinder/resfinder/scripts/wdl/computerome.conf -jar /services/tools/cromwell/50/cromwell-50.jar run /home/projects/cge/apps/resfinder/resfinder/scripts/wdl/resfinder.wdl --inputs /home/projects/cge/apps/resfinder/resfinder/scripts/wdl/input.json
```

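If the call is launched from a script, the four placeholders map directly onto an argument list. The following sketch only assembles and prints the command; `cromwell_command` is a hypothetical helper, and the paths are the Computerome ones quoted in this guide.

```python
import shlex


def cromwell_command(conf, cromwell_jar, wdl, inputs_json):
    """Assemble the Cromwell call as an argument list (no shell quoting needed)."""
    return [
        "java",
        f"-Dconfig.file={conf}",  # <CONF>
        "-jar", cromwell_jar,     # <CROMWELL>
        "run", wdl,               # <WDL>
        "--inputs", inputs_json,  # <JSON>
    ]


wdl_dir = "/home/projects/cge/apps/resfinder/resfinder/scripts/wdl"
cmd = cromwell_command(
    conf=f"{wdl_dir}/computerome.conf",
    cromwell_jar="/services/tools/cromwell/50/cromwell-50.jar",
    wdl=f"{wdl_dir}/resfinder.wdl",
    inputs_json=f"{wdl_dir}/input.json",
)

# Print the command for inspection; running it is left to the caller.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the workflow, provided a Java module has been loaded in the environment first.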
### Post run

All ResFinder output will be located in the provided output directory.

In the directory where you execute Cromwell the following two directories will
also be created:

* cromwell-executions
* cromwell-workflow-logs

They contain logging information and cached results.
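
A quick way to confirm that a run left these directories behind is to check for them in the directory Cromwell was started from. This is a small sketch under that assumption; `missing_cromwell_dirs` is a hypothetical helper, and the temporary directory only stands in for a real run directory.

```python
import tempfile
from pathlib import Path

# Directory names taken from the list above.
CROMWELL_DIRS = ("cromwell-executions", "cromwell-workflow-logs")


def missing_cromwell_dirs(run_dir):
    """Return the expected Cromwell directories that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [name for name in CROMWELL_DIRS if not (run_dir / name).is_dir()]


# Demonstration on a temporary directory instead of a real run directory.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_cromwell_dirs(tmp)
    (Path(tmp) / "cromwell-executions").mkdir()
    after = missing_cromwell_dirs(tmp)

print(before)
print(after)
```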