comparison scripts/ReMatCh/utils/README.md @ 0:c6bab5103a14 draft
"planemo upload commit 6abf3e299d82d07e6c3cf8642bdea80e96df64c3-dirty"
author | iss |
---|---|
date | Mon, 21 Mar 2022 15:23:09 +0000 |
parents | |
children |
-1:000000000000 | 0:c6bab5103a14 |
---|---|
1 ReMatCh | |
2 ======= | |
3 *Reads mapping against target sequences, checking mapping and consensus sequences production* | |
4 | |
5 <https://github.com/B-UMMI/ReMatCh> | |
6 | |
7 Table of Contents | |
8 -- | |
9 | |
10 [Combine alignment consensus](#combine-alignment-consensus) | |
11 [Convert Ns to gaps](#convert-ns-to-gaps) | |
12 [gffParser](#gffparser) | |
13 [Restart ReMatCh](#restart-rematch) | |
14 [Strip Alignment](#strip-alignment) | |
15 | |
16 | |
17 ## Combine Alignment Consensus | |
18 | |
19 Combine the alignment consensus sequences from the first ReMatCh run into a single file per reference sequence. | |
20 | |
21 **Dependencies** | |
22 - Python v3 | |
23 | |
24 **Usage** | |
25 | |
26 usage: combine_alignment_consensus.py [-h] [--version] | |
27 -w /path/to/rematch/working/directory/ | |
28 [-o /path/to/output/directory/] | |
29 | |
30 Combine the alignment consensus sequences from ReMatCh first run by reference sequences into single files | |
31 | |
32 optional arguments: | |
33 -h, --help show this help message and exit | |
34 --version Version information | |
35 | |
36 Required options: | |
37 -w /path/to/rematch/working/directory/ | |
38 --workdir /path/to/rematch/working/directory/ Path to the directory where ReMatCh was running (default: None) | |
39 | |
40 General facultative options: | |
41 -o --outdir /path/to/output/directory/ Path to the directory where the combined sequence files will be stored (default: .) | |
42 | |
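
As a rough illustration of what "combining by reference" means, the sketch below regroups per-sample consensus sequences so that each reference sequence ends up with one multi-FASTA text holding every sample. The input dictionary, the function name `combine_by_reference` and the use of sample names as FASTA headers are assumptions made for this example; they do not describe how `combine_alignment_consensus.py` reads the ReMatCh working directory.

```python
from collections import defaultdict


def combine_by_reference(consensus_by_sample):
    """Regroup {sample: {reference: sequence}} into {reference: multi-FASTA text}.

    Hypothetical helper: the real script reads files from the ReMatCh
    working directory, whose layout is not described in this README.
    """
    grouped = defaultdict(list)
    for sample in sorted(consensus_by_sample):
        for reference, sequence in consensus_by_sample[sample].items():
            # One FASTA record per sample, named after the sample (assumption).
            grouped[reference].append(">{}\n{}\n".format(sample, sequence))
    return {reference: "".join(records) for reference, records in grouped.items()}


# Made-up sample and reference names, for illustration only.
example = {
    "sampleA": {"gene1": "ATGC", "gene2": "GGTA"},
    "sampleB": {"gene1": "ATGA"},
}
combined = combine_by_reference(example)
print(combined["gene1"])
```

Each value of `combined` could then be written to its own file in the output directory, one file per reference.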
43 | |
44 | |
45 ## Convert Ns to Gaps | |
46 | |
47 | |
48 Convert the Ns into gaps in a fasta file. | |
49 | |
50 **Dependencies** | |
51 - Python (2.7.x) | |
52 | |
53 **Usage** | |
54 | |
55 usage: convert_Ns_to_gaps.py [-h] [--version] | |
56 -i /path/to/input/file.fasta | |
57 -o /path/to/converted/output/file.fasta | |
58 | |
59 Convert the Ns into gaps | |
60 | |
61 optional arguments: | |
62 -h, --help show this help message and exit | |
63 --version Version information | |
64 | |
65 Required options: | |
66 -i --infile /path/to/input/file.fasta Path to the fasta file (default: None) | |
67 -o --outfile /path/to/converted/output/file.fasta Converted output fasta file (default: converted_Ns_to_gaps.fasta) | |
68 | |
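
The conversion itself can be sketched in a few lines. This is a minimal illustration, not the code of `convert_Ns_to_gaps.py`: it assumes the gap character is `-`, treats lowercase `n` the same as `N`, and leaves header lines untouched so that an `N` in a sequence name is not altered.

```python
def convert_ns_to_gaps(fasta_text, gap="-"):
    """Replace N/n with a gap character in sequence lines only."""
    converted = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header lines may legitimately contain the letter N.
            converted.append(line)
        else:
            converted.append(line.replace("N", gap).replace("n", gap))
    return "\n".join(converted) + "\n"


# Made-up record, for illustration only.
fasta = ">sample_N1\nACGTNNAC\nnnGT\n"
print(convert_ns_to_gaps(fasta))
```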
69 | |
70 | |
71 ## gffParser | |
72 | |
73 | |
74 Parser for GFF3 files such as those produced by [PROKKA](https://github.com/tseemann/prokka). The files must contain both the features and the sequence. The parser retrieves the CDS sequences in the GFF file and can extend each of them by the number of nucleotides specified in `--extraSeq`. To parse only a selection of CDS of interest, provide `--select` with a txt file listing the IDs of interest, one per line. Alternatively, use the `--fromFile` option to retrieve sequences from the GFF file by position, providing a txt file with the contig ID, start and end position of each sequence of interest, one per line. `--extraSeq` can also be applied with this method. | |
75 | |
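
To make the two input formats concrete, here is a small sketch of how such files could be read. It assumes the coordinates file holds comma-separated `contig,start,end` lines, as the usage text for `--fromFile` describes, and that the selection file holds one ID per line. The function names and the example IDs are hypothetical and are not taken from `gffParser.py`.

```python
import csv
import io


def read_coordinates(handle):
    """Yield (contig, start, end) tuples from lines of the form contig,start,end."""
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        contig, start, end = (field.strip() for field in row)
        yield contig, int(start), int(end)


def read_ids(handle):
    """Return the non-empty lines of a --select style file as a set of IDs."""
    return {line.strip() for line in handle if line.strip()}


# In-memory stand-ins for the two text files; the contents are made up.
coords_file = io.StringIO("contig_1,100,250\ncontig_2,5,80\n")
ids_file = io.StringIO("PROKKA_00001\nPROKKA_00042\n")
print(list(read_coordinates(coords_file)))
print(sorted(read_ids(ids_file)))
```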
76 **Dependencies** | |
77 - Python (2.7.x) | |
78 - [Biopython](http://biopython.org/) (1.68 or similar) | |
79 | |
80 **Usage** | |
81 | |
82 usage: gffParser.py [-h] | |
83 -i INPUT [-x EXTRASEQ] [-k] [-o OUTPUTDIR] | |
84 [-s SELECT] [-f FROMFILE] [--version] | |
85 | |
86 GFF3 parser for feature sequence retrieval, containing both sequences and annotations. | |
87 | |
88 optional arguments: | |
89 -h, --help Show this help message and exit | |
90 -i --input INPUT | |
91 GFF3 file to parse, containing both sequences and annotations (like the one obtained from PROKKA). | |
92 -x --extraSeq EXTRASEQ | |
93 Extra sequence to retrieve per feature in gff. | |
94 -k, --keepTemporaryFiles | |
95 Keep temporary gff(without sequence) and fasta files. | |
96 -o --outputDir OUTPUTDIR | |
97 Path to where the output is to be saved. | |
98 -s --select SELECT | |
99 txt file with the IDs of interest, one per line | |
100 -f --fromFile FROMFILE | |
101 Sequence coordinates to be retrieved. Requires contig ID and coords (contig,start,end) in a csv file, one per line. | |
102 --version Display version, and exit. | |
103 | |
104 **Output** | |
105 | |
106 *<filename>.fasta* | |
107 Multi-fasta file with the retrieved sequences. | |
108 Headers contain the feature ID, followed by '=' and the position of that feature: the original sequence ID, a '#', and the start and end coordinates separated by '_' (>featureID=contig#start_end). | |
109 If the `--fromFile` option is used, there is no feature ID, so the header contains only the position: the original sequence ID, a '#', and the start and end coordinates separated by '_' (>contig#start_end). | |
110 | |
111 *<filename>.txt* | |
112 Feature IDs of the sequences that could not be retrieved because, after applying `--extraSeq`, the start or end position falls outside the sequence that contains the feature. | |
113 | |
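
The header layout and the out-of-bounds failure described above can be sketched as follows. This is an illustration under stated assumptions, not the logic of `gffParser.py`: coordinates are treated as 1-based and inclusive (the GFF3 convention), the extension is applied symmetrically on both sides, and the helper names `extend_coordinates` and `make_header` are hypothetical.

```python
def extend_coordinates(start, end, extra, contig_length):
    """Extend 1-based inclusive coordinates by `extra` on both sides.

    Returns None when the extended range falls outside the contig, which is
    the situation in which a feature would be reported as failed.
    """
    new_start, new_end = start - extra, end + extra
    if new_start < 1 or new_end > contig_length:
        return None
    return new_start, new_end


def make_header(contig, start, end, feature_id=None):
    """Build '>featureID=contig#start_end', or '>contig#start_end' without an ID."""
    position = "{}#{}_{}".format(contig, start, end)
    return ">{}={}".format(feature_id, position) if feature_id else ">" + position


# Made-up contig and feature names, for illustration only.
print(make_header("contig_1", 100, 250, feature_id="geneA"))
print(make_header("contig_1", 100, 250))
print(extend_coordinates(100, 250, 50, 1000))
print(extend_coordinates(10, 250, 50, 1000))
```

In this sketch the last call returns `None` because extending by 50 nucleotides would move the start before the first position of the contig.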
114 | |
115 | |
116 ## Restart ReMatCh | |
117 | |
118 | |
119 Restart a ReMatCh run that was abruptly terminated. | |
120 | |
121 **Dependencies** | |
122 - Python (2.7.x) | |
123 | |
124 **Usage** | |
125 | |
126 usage: restart_rematch.py [-h] [--version] -i | |
127 /path/to/initial/workdir/directory/ | |
128 [-w /path/to/workdir/directory/] [-j N] | |
129 [--runFailedSamples] | |
130 | |
131 Restart a ReMatCh run abruptly terminated | |
132 | |
133 optional arguments: | |
134 -h, --help show this help message and exit | |
135 --version Version information | |
136 | |
137 Required options: | |
138 -i /path/to/initial/workdir/directory/, --initialWorkdir /path/to/initial/workdir/directory/ | |
139 Path to the directory where ReMatCh was running (default: None) | |
140 | |
141 General facultative options: | |
142 -w, --workdir /path/to/workdir/directory/ | |
143 Path to the directory where ReMatCh will run again (default: .) | |
144 -j N, --threads N | |
145 Number of threads to use instead of the ones set in initial ReMatCh run (default: None) | |
146 --runFailedSamples | |
147 Will run ReMatCh for those samples missing, as well as for samples that did not run successfully in initial ReMatCh run (default: False) | |
148 | |
149 | |
150 | |
151 ## Strip Alignment | |
152 | |
153 | |
154 Strip alignment positions containing gaps or missing data, | |
155 as well as invariable positions. | |
156 | |
157 **Dependencies** | |
158 - Python (2.7.x) | |
159 - [Biopython](http://biopython.org/) (1.68 or similar) | |
160 | |
161 **Usage** | |
162 | |
163 usage: strip_alignment.py [-h] [--version] | |
164 -i /path/to/aligned/input/file.fasta -o /path/to/stripped/output/file.fasta [--notGAPs] | |
165 [--notMissing] [--notInvariable] | |
166 | |
167 Strip alignment positions containing gaps, missing data and invariable positions | |
168 | |
169 optional arguments: | |
170 -h, --help show this help message and exit | |
171 --version Version information | |
172 | |
173 Required options: | |
174 -i, --infile /path/to/aligned/input/file.fasta | |
175 Path to the aligned fasta file (default: None) | |
176 -o, --outfile /path/to/stripped/output/file.fasta | |
177 Stripped output fasta file (default: alignment_stripped.fasta) | |
178 | |
179 General facultative options: | |
180 --notGAPs Not strip positions with GAPs (default: False) | |
181 --notMissing Not strip positions with missing data (default: False) | |
182 --notInvariable | |
183 Not strip invariable sites (default: False) |
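
The column filtering can be illustrated with a short sketch. It is not the code of `strip_alignment.py` (which depends on Biopython); it works on plain strings and assumes that gaps are written as `-`, that missing data is written as `N`, and that a position is invariable when all of its remaining bases are identical. The three keyword arguments mirror `--notGAPs`, `--notMissing` and `--notInvariable` when set to `False`.

```python
def strip_alignment(sequences, strip_gaps=True, strip_missing=True, strip_invariable=True):
    """Return the aligned sequences with unwanted columns removed.

    `sequences` maps names to equal-length aligned strings. Sequences are
    upper-cased before comparison, so the output is upper case as well.
    """
    names = list(sequences)
    columns = zip(*(sequences[name].upper() for name in names))
    kept = []
    for column in columns:
        if strip_gaps and "-" in column:
            continue
        if strip_missing and "N" in column:
            continue
        # A column is invariable when all its informative bases are identical.
        bases = set(column) - {"-", "N"}
        if strip_invariable and len(bases) <= 1:
            continue
        kept.append(column)
    stripped = ["".join(chars) for chars in zip(*kept)] if kept else [""] * len(names)
    return dict(zip(names, stripped))


# Made-up alignment, for illustration only.
alignment = {"s1": "ACGT-A", "s2": "ACTTNA", "s3": "AGGTAA"}
print(strip_alignment(alignment))
print(strip_alignment(alignment, strip_invariable=False))
```

With the default settings only the second and third columns survive, because they are the only positions that vary and contain neither a gap nor missing data.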