annotate dante_gff_to_dna.xml @ 0:a5f1638b73be draft

Uploaded
author petr-novak
date Wed, 26 Jun 2019 08:01:42 -0400
<tool id="domains_extract" name="Extract Domains Nucleotide Sequences" version="1.0.0">
    <description>Tool to extract nucleotide sequences of protein domains found by DANTE</description>
    <requirements>
        <requirement type="package">biopython</requirement>
    </requirements>
    <command>
        TEMP_DIR_LINEAGES=\$(mktemp -d) &amp;&amp;
        python3 ${__tool_directory__}/dante_gff_to_dna.py --domains_gff ${domains_gff} --input_dna ${input_dna} --out_dir \$TEMP_DIR_LINEAGES

        #if $extend_edges:
        --extended True
        #else:
        --extended False
        #end if
        --classification ${__tool_data_path__}/protein_domains/${db_type}_class
        &amp;&amp;

        cat \$TEMP_DIR_LINEAGES/domains_counts.txt \$TEMP_DIR_LINEAGES/*fasta > $out_fasta &amp;&amp;
        rm -rf \$TEMP_DIR_LINEAGES
    </command>
    <inputs>
        <param format="fasta" type="data" name="input_dna" label="Input DNA" help="Choose input DNA sequence(s) to extract the domains from" />
        <param format="gff" type="data" name="domains_gff" label="Protein domains GFF" help="Choose filtered protein domains GFF3 (DANTE's output)" />
        <param name="db_type" type="select" label="Select taxon and protein domain database version (REXdb)" help="">
            <options from_file="rexdb_versions.txt">
                <column name="name" index="0"/>
                <column name="value" index="1"/>
            </options>
        </param>

        <param name="extend_edges" type="boolean" truevalue="True" falsevalue="False" checked="True" label="Extend sequence edges" help="Extend extracted sequence edges to the full length of the database domain sequences"/>
    </inputs>
    <outputs>
        <data format="fasta" name="out_fasta" label="Concatenated fasta domains NT sequences from ${input_dna.hid}" />
    </outputs>

    <help>

**WHAT IT DOES**

This tool extracts nucleotide sequences of protein domains from reference DNA based on DANTE's output. It can be used, e.g., for deriving phylogenetic relations of individual mobile elements within a species. This can be done separately for each protein domain type.
In that case, run DANTE on the input DNA before using this tool:

1. Protein Domains Finder
2. Protein Domains Filter (quality filter + type of domain, e.g. RT)

INPUTS:

* original DNA sequence in multifasta format to extract the domains from
* DANTE's output GFF3 file (preferably filtered for quality and specific domain type)

OUTPUT:

* concatenated fasta file of nucleotide sequences for individual transposon lineages

By default, sequences are EXTENDED if the alignment reported by LASTAL does not cover the whole protein sequence from the database.
As a result, the nucleotide region corresponding to the WHOLE aligned database domain is reported. For every record in the GFF3 file, the sequence is reported for the BEST HIT within the domain region under the following conditions:

* The domain must not be ambiguous, i.e. the FINAL CLASSIFICATION of the domain region corresponds to the last classification level
* A sequence is not reported if the extracted region contains any Ns

    </help>
</tool>