Mercurial > repos > iuc > kc_align

<tool id="kc-align" name="Kc-Align" version="0.1.2" python_template_version="3.5">
    <requirements>
        <requirement type="package" version="0.5.8">kcalign</requirement>
        <requirement type="package" version="3.2.2">kalign3</requirement>
    </requirements>
    <command detect_errors="exit_code">
        <![CDATA[
        kc-align
            --mode $position.mode
            --reference '$reference'
            --reads '$reads'
            #if $position.mode == "genome":
                --start $position.start
                --end $position.end
            #end if
    ]]></command>
    <inputs>
        <param name="reference" type="data" format="fasta" label="Reference Sequence" help="Single FASTA reference sequence to be aligned" />
        <param name="reads" type="data" format="fasta" label="Reads" help="Multi-FASTA of seqeunces to be aligned with the reference" />
        <param name="outname" type="text" value="out" label="Output Prefix" />
        <conditional name="position" >
            <param name="mode" type="select" label="Mode" >
                <option value="genome">Genome</option>
                <option value="gene">Gene</option>
                <option value="mixed">Mixed</option>
            </param>
            <when value="genome" >
                <param name="start" type="text" value="0" label="Start Position(s)" help="The 1-indexed start position of the gene of interest in the reference sequence (For multi-segmented sequences, input each start site separated by a comma ex: 12562,12591)" />
                <param name="end" type="text" value="0" label="End Position(s)" help="The 1-indexed end position of the gene of interest in the reference sequence (For multi-segmented sequences, input each end site separated by a comma ex: 12592,13905)" />
            </when>
            <when value="gene" >
            </when>
            <when value="mixed" >
            </when>
        </conditional>
    </inputs>
    <outputs>
        <data name="fasta" format="fasta" from_work_dir="codon_align.fasta" label="${outname}.fasta" />
        <data name="clustal" format="txt" from_work_dir="codon_align.clustal" label="${outname}.clustal" />
    </outputs>
<tests>
    <test>
        <param name="reference" ftype="fasta" value="wuhan_ref.fasta" />
        <param name="reads" ftype="fasta" value="3.fasta" />
        <param name="mode" value="genome" />
        <param name="start" value="21563" />
        <param name="end" value="25384" />
        <output name="fasta" ftype="fasta" compare="diff" value="kc-align.fasta" />
        <output name="clustal" ftype="txt" compare="diff" value="kc-align.clustal" />
    </test>
</tests>
    <help><![CDATA[

============
Kc-Align
============

Kc-Algin is a codon-aware multiple aligner that uses Kalgin2 to produce in-frame gapped codon alignments for selection analysis of small genomes (mostly viral and some smaller bacterial genomes). Takes nucleotide seqeunces as inputs, converts them to their in-frame amino acid sequences, performs multiple alignment with Kalign, and then converts the alignments back to their original codon sequence while preserving the gaps. Produces two outputs: the gapped nucleotide alignments in FASTA format and in CLUSTAL format.

Kc-Align will also attempt to detect any frameshift mutations in the input reads. If a frameshift is detected, that sequence will not be included in the multiple alignment and its ID will be printed to stdout.

Kc-Align also has functionality for genes that are are composed of more than one continuous sequence (currently only support for two segments). This can be achieved by entering each segments start coordinate in the Start Position parameter separated by a comma and then doing the same for each segments end coordinate in the End Position parameter (Ex: Start Postion: 12562,12591 End Position: 12592,13905)

Modes:
------

Kc-Align can be run in three different modes, depending on your input data.

* In **genome** mode, the "reference" and "reads" input parameters are all full genome FASTA files. This mode also requires the 1-based start and end position numbers corresponding to the gene you are interested in aligning from the reference input.

* If both the "reference" and "reads" inputs are already in-frame genes, the **gene** mode should be used. This mode does not require start and end position parameters as the reference is already in-frame.

* For the case when your "reference" is an in-frame gene while the "reads" are whole genomes, the **mixed** mode can be used. Like gene mode, this mode does not require the start and end point position parameters.


    ]]></help>
    <citations>
        <citation type="bibtex">
            @misc{githubkcalign,
            author = {Nicholas Keener, Emil Bouvier},
            year = {2020},
            title = {Kc-Align},
            publisher = {Github},
            journal = {Github repository},
            url = {https://github.com/davebx/kc-align},
        }</citation>
    </citations>
</tool>
author	iuc
date	Fri, 27 Mar 2020 15:00:09 -0400
parents	04b13fc809ac
children	20bef04b5272