Mercurial > repos > iuc > mothur_chimera_bellerophon
view chimera.bellerophon.xml @ 0:2146e981bc7f draft
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/mothur commit a9d1e0debcd357d8080a1c6c5f1d206dd45a7a4d
author | iuc |
---|---|
date | Fri, 19 May 2017 05:40:38 -0400 |
parents | |
children | d8fdd5448cda |
line wrap: on
line source
<tool profile="16.07" id="mothur_chimera_bellerophon" name="Chimera.bellerophon" version="@WRAPPER_VERSION@.0"> <description>Find putative chimeras using bellerophon</description> <macros> <import>macros.xml</import> </macros> <expand macro="requirements"/> <expand macro="stdio"/> <expand macro="version_command"/> <command><![CDATA[ @SHELL_OPTIONS@ ## create symlinks to input datasets ln -s "$fasta" fasta.dat && echo 'chimera.bellerophon( fasta=fasta.dat, filter=$filter, correction=$correction, #if int($window) > 0: window=$window, #end if #if $increment: increment=$increment, #end if processors='\${GALAXY_SLOTS:-8}' )' | sed 's/ //g' ## mothur trips over whitespace | mothur | tee mothur.out.log ]]></command> <inputs> <param name="fasta" type="data" format="fasta" label="fasta - Candiate Sequences"/> <param name="filter" type="boolean" falsevalue="False" truevalue="True" checked="false" label="filter - Apply a 50% soft vertical filter"/> <param name="correction" type="boolean" falsevalue="False" truevalue="True" checked="true" label="correction - Use the square root of the distances instead of the distance value"/> <param name="window" type="integer" value="0" label="window - Length of sequence you want in each window analyzed (uses default if < 1)" help="Default is 25% of the sequence length."/> <param name="increment" type="integer" value="25" min="0" label="increment - Increment for window slide on each iteration" help="Default is 25, but you may set it up to sequence length minus twice the window."/> </inputs> <outputs> <expand macro="logfile-output"/> <data name="out_file" format="txt" from_work_dir="fasta.bellerophon.chimeras" label="${tool.name} on ${on_string}: bellerophon.chimeras"/> <data name="out_accnos" format="mothur.accnos" from_work_dir="fasta.bellerophon.accnos" label="${tool.name} on ${on_string}: bellerophon.accnos"/> </outputs> <tests> <test> <param name="fasta" value="Mock_S280_L001_R1_001_small.trim.contigs.good.fasta"/> <param name="window" value="-1"/> <!-- don't use window if negative --> <output name="out_file" md5="435ff11d15d6c117e29747ab8ac36c75" ftype="txt"/> <output name="out_accnos" ftype="mothur.accnos" file="Mock_S280_L001_R1_001_small.trim.contigs.good.bellerophon.accnos"/> <output name="logfile" ftype="txt"> <!-- outputs some results to the log file --> <assert_contents> <has_text text="Minimum"/> <has_text text="Median"/> <has_text text="Maximum"/> </assert_contents> </output> <expand macro="logfile-test"/> </test> </tests> <help> <![CDATA[ @MOTHUR_OVERVIEW@ **Command Documentation** The chimera.bellerophon_ command identifies putative chimeras using the bellerophon_ approach. Advantages of Bellerophon: 1) You can process all sequences from a PCR-clone library in a single analysis and don't have to inspect outputs for every sequence in the dataset. 2) The approximate putative breakpoint is calculated using a sliding window (see above) and will help verification of the chimera manually. 3) A chimeric sequence is not only tested against two (putative) parent sequences but rather is assessed by how well it fits into the complete phylogenetic environment of a multiple sequence alignment. Hence sequences do not become invisible to the program as is the case with CHIMERA_CHECK (see Ref 1 below). 4) The calculations Bellerophon uses to detect chimeric sequences are computationally relatively cheap and results are quickly calculated for datasets with up 50 sequences (~1 min). Larger datasets take longer - 100 sequences ~30 min, 300 sequences ~8 hours. Tips for using Bellerophon: 1) Bellerophon works most efficiently if the parent sequences or non-chimeric sequences closely related to the parent sequences are present in the dataset analyzed. Therefore, as many sequences as possible from the one PCR-clone library should be included in the analysis since the parent sequences of any chimera are most likely to be in that dataset. Addition of non-chimeric outgroup sequences (e.g. from isolates) may help refine an analysis by providing reference points (and a broader phylogenetic context) in the analysis, but be aware of increasing analysis time with bigger datasets. 2) Bellerophon is compromised by using sequences of different lengths as this can produce artificial skews in distance matrices of fragments of the alignment. Datasets containing sequences of the same length and covering the same portion of the gene should be used (usually not an issue with sequences from a PCR-clone library). The filter will automatically remove sequences too short for the window size, i.e. less than 600 bp for a window size of 300. 3) If possible multiple window sizes should be used as the number of identified chimeras can vary with the choice of the window size. 4) Re-running the dataset without the first reported chimeras may identify additional putative chimeras by reducing noise in the analysis. Ideally, the dataset should continue to be re-run removing previously reported chimeras until no chimeras are identified. 5) Bellerophon should be used in concert with other detection methods such as CHIMERA_CHECK and putatively identified chimeras should always be confirmed by manual inspection of the sequences for signature shifts. .. _bellerophon: http://comp-bio.anu.edu.au/Bellerophon/doc/doc.html .. _chimera.bellerophon: https://www.mothur.org/wiki/Chimera.bellerophon ]]> </help> <expand macro="citations"/> </tool>