
<tool id="seq_length" name="Sequence lengths" version="0.0.3">
    <description>from FASTA, QUAL, FASTQ, or SFF file</description>
    <requirements>
        <!-- This is currently the last release of Biopython which is available via Galaxy's legacy XML packaging system -->
        <requirement type="package" version="1.67">biopython</requirement>
    </requirements>
    <version_command>
python $__tool_directory__/seq_length.py --version
</version_command>
    <command detect_errors="aggressive">
python $__tool_directory__/seq_length.py -i '$input_file' -f '$input_file.ext' -o '$output_file'
    </command>
    <inputs>
        <param name="input_file" type="data" format="fasta,qual,fastq,sff" label="Sequence file" help="FASTA, QUAL, FASTQ, or SFF format." />
    </inputs>
    <outputs>
        <data name="output_file" format="tabular" label="${on_string} length"/>
    </outputs>
    <tests>
        <test>
            <param name="input_file" value="four_human_proteins.fasta" ftype="fasta" />
            <output name="output_file" file="four_human_proteins.length.tabular" ftype="tabular" />
            <assert_stdout>
                <has_line line="4 sequences, total length 3297" />
            </assert_stdout>
        </test>
        <test>
            <param name="input_file" value="SRR639755_sample_strict.fastq" ftype="fastq" />
            <output name="output_file" file="SRR639755_sample_strict.length.tabular" ftype="tabular" />
            <assert_stdout>
                <has_line line="2 sequences, total length 202" />
            </assert_stdout>
        </test>
        <test>
            <param name="input_file" value="MID4_GLZRM4E04_rnd30.sff" ftype="sff" />
            <output name="output_file" file="MID4_GLZRM4E04_rnd30.length.tabular" ftype="tabular" />
            <assert_stdout>
                <has_line line="30 sequences, total length 7504" />
            </assert_stdout>
        </test>
    </tests>
    <help>
**What it does**

Takes a FASTA, QUAL, FASTQ or Standard Flowgram Format (SFF) file and produces a
two-column tabular file with one line per sequence: the sequence identifier,
then that sequence's length. A summary line giving the number of sequences and
their total length is also printed to stdout.
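
The behaviour described above can be sketched in plain Python for the FASTA case
only. This is a hypothetical illustration, not the tool's actual seq_length.py
(which uses Biopython's SeqIO and also handles QUAL, FASTQ and SFF). The helper
names fasta_lengths and write_tabular are made up, and the identifier is assumed
to be the first word of each title line, mirroring Biopython's default record.id
for FASTA. The summary line follows the format checked in the tool's tests.

```python
import io


def fasta_lengths(handle):
    """Yield (identifier, length) pairs from a FASTA file handle.

    Hypothetical helper: the identifier is the first word of each
    title line; duplicate identifiers are yielded unchanged.
    """
    ident = None
    length = 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts, so emit the previous one (if any)
            if ident is not None:
                yield ident, length
            words = line[1:].split()
            ident = words[0] if words else ""
            length = 0
        elif ident is not None:
            # Sequence lines may be wrapped; add up their lengths
            length += len(line.replace(" ", ""))
    if ident is not None:
        yield ident, length


def write_tabular(records, out):
    """Write one identifier/length line per record; return (count, total)."""
    count = total = 0
    for ident, length in records:
        out.write("%s\t%i\n" % (ident, length))
        count += 1
        total += length
    return count, total


example = io.StringIO(">seq1 first\nACGT\nAC\n>seq2\nMKV\n")
out = io.StringIO()
count, total = write_tabular(fasta_lengths(example), out)
print(out.getvalue(), end="")
print("%i sequences, total length %i" % (count, total))
```

Here the tabular output is two lines (seq1 with length 6, seq2 with length 3)
followed by the summary "2 sequences, total length 9".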

WARNING: Duplicate sequence identifiers are not merged or removed; each
occurrence appears as a separate line in the tabular output.
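
Because duplicates are passed through unchanged, it may be worth checking the
output before using the identifiers as unique keys downstream. A minimal sketch,
assuming the two-column tab-separated layout described above; the function name
duplicate_ids is hypothetical and not part of this tool.

```python
import collections
import io


def duplicate_ids(tabular_handle):
    """Return identifiers seen more than once in an identifier/length
    tabular file, in first-seen order (hypothetical helper)."""
    counts = collections.Counter()
    for line in tabular_handle:
        if not line.strip():
            continue  # ignore blank lines
        ident = line.rstrip("\n").split("\t")[0]
        counts[ident] += 1
    # Counter keeps insertion order, so results follow the file order
    return [ident for ident, n in counts.items() if n > 1]


table = io.StringIO("seq1\t6\nseq2\t3\nseq1\t4\n")
print(duplicate_ids(table))
```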

**References**

This tool uses Biopython's ``SeqIO`` module to read sequences, so please cite
the Biopython application note (and Galaxy too, of course):

Cock et al. (2009). Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.

This tool is available to install into other Galaxy instances via the Galaxy
Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/seq_length
    </help>
    <citations>
        <citation type="doi">10.1093/bioinformatics/btp163</citation>
    </citations>
</tool>
author	peterjc
date	Mon, 14 May 2018 12:09:50 -0400
parents	458f987918a6
children	fcdf11fb34de