annotate data_manager/tagdust_architecture_data_manager.xml @ 0:e3b3261e5498 draft default tip

Uploaded
author brenninc
date Sun, 08 May 2016 04:44:17 -0400
<tool id="tagdust_architecture_manager" name="tagdust architecture manager" tool_type="manage_data" version="0.0.1">
    <description>architecture creator</description>
    <command interpreter="python">
        tagdust_architecture_data_manager.py
        --data_table_name "tagdust_architecture"
        --json_output_file "${json_output_file}"
    </command>
    <inputs>
        <repeat name="hmms" title="HMM Building Blocks">
            <param name="block" type="text" size="25" label="Next HMM Building block" />
        </repeat>
        <param name="name" type="text" value="" label="name field for the entry. Defaults to a concatenation of hmm values if left blank." />
        <param name="value" type="text" value="" label="value field for the entry. Defaults to name if left blank." />
        <param name="dbkey" type="text" value="" label="dbkey field for the entry. Defaults to value if left blank." />
    </inputs>
    <outputs>
        <data name="json_output_file" format="data_manager_json"/>
    </outputs>

    <help>
Adds a path to the tagdust references.

The tool will check that the path exists but will NOT check that it holds the expected data type.

If name is not provided, a concatenation of the hmm values is used.

If value is not provided, the name (or its default) is used.

If dbkey is not provided, the value (or its default) is used.
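
The three fallback rules above chain together, so an entry can be derived from the building blocks alone. The following is a minimal Python sketch of that chain, not the code of tagdust_architecture_data_manager.py: the function name `resolve_entry` and the `separator` used for the concatenation are assumptions made for illustration.

```python
def resolve_entry(blocks, name="", value="", dbkey="", separator="_"):
    """Apply the fallback chain: name from blocks, value from name, dbkey from value.

    `separator` is an assumption; the real script may join the blocks differently.
    """
    # Fields left blank in the form are treated as empty strings.
    name = name.strip() or separator.join(blocks)
    value = value.strip() or name
    dbkey = dbkey.strip() or value
    return {"name": name, "value": value, "dbkey": dbkey}


# With only building blocks supplied, all three fields fall back to the same string.
entry = resolve_entry(["B:GTA,AAC", "R:N"])
print(entry)
```

Supplying only `name` would leave `value` and `dbkey` equal to that name, and supplying only `value` would leave `dbkey` equal to that value.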

====

Taken from The TagDust2 Manual http://tagdust.sourceforge.net (part of Version 2_31 download)

Raw sequences produced by next generation sequencing (NGS) machines can contain adapter, linker,
barcode and fingerprint sequences. TagDust2 is a program to extract and correctly label the sequences
to be mapped in downstream pipelines.
TagDust allows users to specify the expected architecture of a read and converts it into a hidden
Markov model. The latter can assign sequences to a particular barcode (or index) even in the presence
of sequencing errors. Sequences not matching the architecture (primer dimers, contaminants etc.) are
automatically discarded.

TagDust requires an input file containing sequences and a user-defined HMM architecture used to extract
the reads. The architecture is composed of a selection of pre-defined building blocks representing
indices, barcodes, spacers and other sequences one might encounter in the raw output of a sequenced
sample.

HMM Building Blocks

TagDust comes with a set of pre-defined HMM building blocks. Each includes a silent state at the
beginning and end used to link blocks together. Each block is specified by a unique letter followed
by a colon and some information about the sequence.

Read
Segment modeling the read.
Code: R:N

Optional
Segment modeling an optional single or short stretch of nucleotides.
Code: O:N

G addition
Segment modeling the occasional addition of guanines to the reads
(89.3% chance of a single G, 19.5% chance of 2 Gs, ...).
Code: G:G

Barcode or Index
Segment modeling a set of barcode sequences. For each sequence a separate HMM is created. The
barcode sequences must be given as a comma separated list. A null model of the same length as the
barcode is automatically added and initialized to the background nucleotide frequencies.
Code: B:GTA,AAC

Fingerprint or Unique Molecular Identifier - UMI
Segment modeling a fingerprint (or unique molecular identifier). Insertions and deletions are by
default not allowed within a fingerprint segment.
Code: F:NNN

Spacer
Segment modeling a pre-defined sequence.
Code: S:GTA

Partial
This segment is used to model sequences that may only be partially present at the 5' or 3' end of
the read. The transition probabilities (orange and blue) are set automatically based on the length
distribution of exactly matching adapters.
Code: P:CCTTAA
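
Each code listed above has the same shape: one letter, a colon, then sequence information. As an illustration only, a block string could be checked for that shape before it is stored. The sketch below is a syntactic check under stated assumptions, not TagDust's own parser: the names `BLOCK_TYPES` and `parse_block` are hypothetical, and only the barcode block is split on commas because it is the only one described as taking a list.

```python
# Letters taken from the building-block list above; the labels are illustrative.
BLOCK_TYPES = {
    "R": "read",
    "O": "optional",
    "G": "G addition",
    "B": "barcode or index",
    "F": "fingerprint (UMI)",
    "S": "spacer",
    "P": "partial",
}


def parse_block(block):
    """Split a block such as 'B:GTA,AAC' into (letter, list of sequences)."""
    letter, sep, payload = block.strip().partition(":")
    if not sep or letter not in BLOCK_TYPES:
        raise ValueError("unknown building block: %r" % block)
    if not payload:
        raise ValueError("missing sequence information: %r" % block)
    # Only barcode blocks are documented as taking a comma separated list.
    sequences = payload.split(",") if letter == "B" else [payload]
    return letter, sequences


for block in ["B:GTA,AAC", "F:NNN", "R:N"]:
    letter, sequences = parse_block(block)
    print(BLOCK_TYPES[letter], sequences)
```

A check like this only catches malformed strings; whether a given combination of blocks forms a usable architecture is still decided by TagDust itself.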


    </help>
    <citations>
    </citations>

</tool>