comparison data_manager/tagdust_architecture_data_manager.xml @ 0:e3b3261e5498 draft default tip
Uploaded
author | brenninc |
---|---|
date | Sun, 08 May 2016 04:44:17 -0400 |
parents | |
children |
-1:000000000000 | 0:e3b3261e5498 |
---|---|
1 <tool id="tagdust_architecture_manager" name="tagdust architecture manager" tool_type="manage_data" version="0.0.1"> | |
2 <description>architecture creator</description> | |
3 <command interpreter="python"> | |
4 tagdust_architecture_data_manager.py | |
5 --data_table_name "tagdust_architecture" | |
6 --json_output_file "${json_output_file}" | |
7 </command> | |
8 <inputs> | |
9 <repeat name="hmms" title="HMM Building Blocks"> | |
10 <param name="block" type="text" size="25" label="Next HMM Building block" /> | |
11 </repeat> | |
12 <param name="name" type="text" value="" label="name field for the entry. Defaults to a concatenation of hmm values if left blank." /> | |
13 <param name="value" type="text" value="" label="value field for the entry. Defaults to name if left blank." /> | |
14 <param name="dbkey" type="text" value="" label="dbkey field for the entry. Defaults to value if left blank." /> | |
15 </inputs> | |
16 <outputs> | |
17 <data name="json_output_file" format="data_manager_json"/> | |
18 </outputs> | |
19 | |
20 <help> | |
21 Adds a path to the tagdust references. | |
22 | |
23 The tool will check the path exists but NOT check that it holds the expected data type. | |
24 | |
25 If name is not provided, a concatenation of hmm values is used. | |
26 | |
27 If value is not provided, the name (or its default) will be used. | |
28 | |
29 If dbkey is not provided, the value (or its default) will be used. | |
30 | |
31 ==== | |
32 | |
33 Taken from The TagDust2 Manual http://tagdust.sourceforge.net (part of Version 2_31 download) | |
34 | |
35 Raw sequences produced by next generation sequencing (NGS) machines can contain adapter, linker, | |
36 barcode and fingerprint sequences. TagDust2 is a program to extract and correctly label the sequences | |
37 to be mapped in downstream pipelines. | |
38 TagDust allows users to specify the expected architecture of a read and converts it into a hidden | |
39 Markov model. The latter can assign sequences to a particular barcode (or index) even in the presence | |
40 of sequencing errors. Sequences not matching the architecture (primer dimers, contaminants etc.) are | |
41 automatically discarded. | |
42 | |
43 TagDust requires an input file containing sequences and a user-defined HMM architecture used to | |
44 extract the reads. The architecture is composed of a selection of pre-defined building blocks representing | |
45 indices, barcodes, spacers and other sequences one might encounter in the raw output of a sequenced | |
46 sample. | |
47 | |
48 HMM Building Blocks | |
49 | |
50 TagDust comes with a set of pre-defined HMM building blocks. Each includes a silent state at the | |
51 beginning and end used to link blocks together. Each block is specified by a unique letter followed | |
52 by a colon and some information about the sequence. | |
53 | |
54 Read | |
55 Segment modeling the read. | |
56 Code: R:N | |
57 | |
58 Optional | |
59 Segment modeling an optional single or short stretch of nucleotides. | |
60 Code: O:N | |
61 | |
62 G addition | |
63 Segment modeling the occasional addition of guanines to the reads. | |
64 (89.3% chance of a single G, 19.5% chance of 2 Gs, ...). | |
65 Code: G:G | |
66 | |
67 Barcode or Index | |
68 Segment modeling a set of barcode sequences. For each sequence a separate HMM is created. The | |
69 barcode sequences must be given as a comma separated list. A null model of the same length as the | |
70 barcode is automatically added and initialized to the background nucleotide frequencies. | |
71 Code: B:GTA,AAC | |
72 | |
73 Fingerprint or Unique Molecular Identifier - UMI | |
74 Segment modeling a fingerprint (or unique molecular identifier). Insertions and deletions are by | |
75 default not allowed within a fingerprint segment. | |
76 Code: F:NNN | |
77 | |
78 Spacer | |
79 Segment modeling a pre-defined sequence. | |
80 Code: S:GTA | |
81 | |
82 Partial | |
83 This segment is used to model sequences that may only be partially present at the 5' or 3' end of | |
84 the read. The transition probabilities (orange and blue) are set automatically based on the length | |
85 distribution of exactly matching adapters. | |
86 Code: P:CCTTAA | |
87 | |
88 | |
89 </help> | |
90 <citations> | |
91 </citations> | |
92 | |
93 </tool> |
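
The help text above describes two rules: every building block is a single letter, a colon, and some sequence information, and the name, value and dbkey fields fall back on one another when left blank. The following is a minimal sketch of those rules in Python, not the code of tagdust_architecture_data_manager.py itself. The function names, the space used to join blocks into a default name, and the choice to reject unknown letters are all assumptions made for illustration; only the block letters (R, O, G, B, F, S, P) and the fallback order come from the help text.

```python
# Illustrative sketch only; names and the separator are hypothetical.
# Block letters listed in the help text: Read, Optional, G addition,
# Barcode, Fingerprint, Spacer, Partial.
KNOWN_BLOCK_LETTERS = set("ROGBFSP")


def check_block(block):
    """Return the block if it looks like LETTER:INFO, else raise ValueError."""
    letter, sep, info = block.partition(":")
    if sep != ":" or letter not in KNOWN_BLOCK_LETTERS or not info:
        raise ValueError("not a recognised HMM building block: %r" % block)
    return block


def resolve_fields(blocks, name="", value="", dbkey="", separator=" "):
    """Apply the fallback chain: name -> joined blocks, value -> name, dbkey -> value."""
    blocks = [check_block(b.strip()) for b in blocks]
    # Assumption: the default name joins the blocks with a single space.
    name = name.strip() or separator.join(blocks)
    value = value.strip() or name
    dbkey = dbkey.strip() or value
    return {"name": name, "value": value, "dbkey": dbkey}


if __name__ == "__main__":
    # A barcode block followed by the read itself, with all fields left blank.
    print(resolve_fields(["B:GTA,AAC", "R:N"]))
```

With all three fields left blank, every field ends up holding the joined block list; supplying only a value leaves the name as the joined blocks while dbkey follows the value.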