<tool id="tagdust_architecture_manager" name="tagdust architecture manager" tool_type="manage_data" version="0.0.1">
    <description>architecture creator</description>
    <command interpreter="python">
        tagdust_architecture_data_manager.py
        --data_table_name "tagdust_architecture"
        --json_output_file "${json_output_file}"
    </command>
    <inputs>
        <repeat name="hmms" title="HMM Building Blocks">
            <param name="block" type="text" size="25" label="Next HMM building block" />
        </repeat>
        <param name="name" type="text" value="" label="Name field for the entry. Defaults to a concatenation of the HMM values if left blank." />
        <param name="value" type="text" value="" label="Value field for the entry. Defaults to name if left blank." />
        <param name="dbkey" type="text" value="" label="Dbkey field for the entry. Defaults to value if left blank." />
    </inputs>
    <outputs>
        <data name="json_output_file" format="data_manager_json"/>
    </outputs>

    <help>
Adds a path to the tagdust references.

The tool will check the path exists but NOT check that it holds the expected data type.

If name is not provided, a concatenation of the HMM values is used.

If value is not provided, the name (or its default) is used.

If dbkey is not provided, the value (or its default) is used.
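
The fallback order described above can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the data manager's actual code: the function name resolve_entry and the "_" separator used to join the HMM values are hypothetical.

```python
# Hypothetical sketch of the fallback chain; not the data manager's actual code.
def resolve_entry(hmms, name="", value="", dbkey=""):
    """Fill blank fields in the order name, value, dbkey."""
    # Assumption: the HMM values are joined with "_"; the real separator may differ.
    name = name or "_".join(hmms)
    # A blank value falls back to the (possibly defaulted) name.
    value = value or name
    # A blank dbkey falls back to the (possibly defaulted) value.
    dbkey = dbkey or value
    return {"name": name, "value": value, "dbkey": dbkey}


# Only value is supplied: name comes from the blocks, dbkey follows value.
entry = resolve_entry(["B:GTA,AAC", "R:N"], value="my_arch")
print(entry)
```

Because each field falls back to the one before it, supplying only the building blocks is enough to produce a complete entry.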

====

Taken from the TagDust2 manual, http://tagdust.sourceforge.net (part of the Version 2_31 download).

Raw sequences produced by next generation sequencing (NGS) machines can contain adapter, linker,
barcode and fingerprint sequences. TagDust2 is a program to extract and correctly label the sequences
to be mapped in downstream pipelines.
TagDust allows users to specify the expected architecture of a read and converts it into a hidden
Markov model. The latter can assign sequences to a particular barcode (or index) even in the presence
of sequencing errors. Sequences not matching the architecture (primer dimers, contaminants etc.) are
automatically discarded.

TagDust requires an input file containing sequences and a user-defined HMM architecture used to
extract the reads. The architecture is composed of a selection of pre-defined building blocks representing
indices, barcodes, spacers and other sequences one might encounter in the raw output of a sequenced
sample.

HMM Building Blocks

TagDust comes with a set of pre-defined HMM building blocks. Each includes a silent state at the
beginning and end used to link blocks together. Each block is specified by a unique letter followed
by a colon and some information about the sequence.

Read
Segment modeling the read.
Code: R:N

Optional
Segment modeling an optional single nucleotide or short stretch of nucleotides.
Code: O:N

G addition
Segment modeling the occasional addition of guanines to the reads
(89.3% chance of a single G, 19.5% chance of 2 Gs, ...).
Code: G:G

Barcode or Index
Segment modeling a set of barcode sequences. For each sequence a separate HMM is created. The
barcode sequences must be given as a comma-separated list. A null model of the same length as the
barcode is automatically added and initialized to the background nucleotide frequencies.
Code: B:GTA,AAC

Fingerprint or Unique Molecular Identifier - UMI
Segment modeling a fingerprint (or unique molecular identifier). Insertions and deletions are by
default not allowed within a fingerprint segment.
Code: F:NNN

Spacer
Segment modeling a pre-defined sequence.
Code: S:GTA

Partial
This segment is used to model sequences that may only be partially present at the 5' or 3' end of
the read. The transition probabilities are set automatically based on the length
distribution of exactly matching adapters.
Code: P:CCTTAA
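
As a rough illustration of how the block codes listed above are structured, the following Python sketch splits a code into its letter and its sequence part. It is not part of TagDust or of this data manager: the helper name parse_block, the KNOWN_BLOCKS table and the four-block example architecture are all hypothetical, and the only check made is that the letter is one of those listed above and that something follows the colon.

```python
# Hypothetical helper, not part of TagDust or of this data manager.
# The letters are the building blocks described in this help text.
KNOWN_BLOCKS = {
    "R": "read",
    "O": "optional",
    "G": "G addition",
    "B": "barcode or index",
    "F": "fingerprint (UMI)",
    "S": "spacer",
    "P": "partial",
}


def parse_block(block):
    """Split a code such as 'B:GTA,AAC' into (letter, list of sequences)."""
    letter, sep, payload = block.partition(":")
    if not sep or letter not in KNOWN_BLOCKS or not payload:
        raise ValueError("not a valid building block: %r" % block)
    # Only barcode blocks normally carry several comma-separated sequences.
    return letter, payload.split(",")


# Illustrative architecture: barcode, spacer, fingerprint, then the read.
architecture = ["B:GTA,AAC", "S:GTA", "F:NNN", "R:N"]
for block in architecture:
    letter, sequences = parse_block(block)
    print(KNOWN_BLOCKS[letter], sequences)
```

Each entry in the HMM Building Blocks repeat corresponds to one such code, entered in the order the segments are expected to appear in the read.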


    </help>
    <citations>
    </citations>

</tool>