Mercurial > repos > brenninc > data_manager_tagdust_architecture
view data_manager/tagdust_architecture_data_manager.xml @ 0:e3b3261e5498 draft default tip
Uploaded
author | brenninc |
---|---|
date | Sun, 08 May 2016 04:44:17 -0400 |
parents | |
children |
line wrap: on
line source
<tool id="tagdust_architecture_manager" name="tagdust architecture manager" tool_type="manage_data" version="0.0.1"> <description>architecture creator</description> <command interpreter="python"> tagdust_architecture_data_manager.py --data_table_name "tagdust_architecture" --json_output_file "${json_output_file}" </command> <inputs> <repeat name="hmms" title="HMM Building Blocks"> <param name="block" type="text" size="25" label="Next HMM Building block" /> </repeat> <param name="name" type="text" value="" label="name field for the entry. Defaults to a contactenation of hmm values if left blank." /> <param name="value" type="text" value="" label="value field for the entry. Defaults to name if left blank." /> <param name="dbkey" type="text" value="" label="dbkey field for the entry. Defaults to value if left blank." /> </inputs> <outputs> <data name="json_output_file" format="data_manager_json"/> </outputs> <help> Adds a path to the tagdust references. The tool will check the path exists but NOT check that it holds the expected data type. If name is not provided a concatenation of hmm values is used. If value is not provided, the name will be used (or its default) If dbkey is not provided, the value will be used (or its default) ==== Taken from The TagDust2 Manual http://tagdust.sourceforge.net (part of Version 2_31 download) Raw sequences produced by next generation sequencing (NGS) machines can contain adapter, linker, barcode and fingerprint sequences. TagDust2 is a program to extract and correctly label the sequences to be mapped in downstream pipelines. TagDust allows users to specify the expected architecture of a read and converts it into a hidden Markov model. The latter can assign sequences to a particular barcode (or index) even in the presence of sequencing errors. Sequences not matching the architecture (primer dimers, contaminants etc.) are automatically discarded TagDust requires an input file containing sequences and a user defined HMM architecture used to ex- tract the reads. The architecture is composed of a selection of pre-defined building blocks representing indices, barcodes, spacers and other sequences one might encounter in the raw output of a sequenced sample. HMM Building Blocks TagDust comes with a set of pre-defined HMM building blocks. Each includes a silent state at the beginning and end used to link blocks together. Each block is specified by a unique letter following by a colon and some information about the sequence. Read Segment modeling the read. Code: R:N Optional Segment modeling an optional single or short stretch of nucleotides. Code: O:N G addition Segment modeling the occasional addition of guanines to the reads. (89.3% chance of a single G , 19.5% chance of 2 Gs..). Code: G:G Barcode or Index Segment modeling a set of barcode sequences. For each sequence a separate HMM is created. The barcode sequences must be given as a comma separated list. A null model of the same length as the barcode is automatically added and initialized to the background nucleotide frequencies. Code: B:GTA,AAC Fingerprint or Unique Molecular Identifier - UMI Segment modeling a fingerprint (or unique molecular identifiers). Insertions and deletions are by default not allowed within a fingerprint segment. Code: F:NNN Spacer Segment modeling a pre-defined sequence. Code: S:GTA Partial This segment is used to model sequences that may only be partially present at the 5‘ or 3‘ end of the read. The transition probabilities (orange and blue) are set automatically based on the length distribution of exactly matching adapters. Code: P:CCTTAA </help> <citations> </citations> </tool>