<tool id="gamma" name="GAMMA" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.2">
	<description>finds gene matches in microbial genomic data using protein identity</description>
	<macros>
		<token name="@TOOL_VERSION@">2.2</token>
		<token name="@VERSION_SUFFIX@">0</token>
	</macros>
    <creator>
        <person givenName="Lieven" familyName="Sterck" url="https://github.com/lsterck" />
        <organization name="Sciensano-BioIT" url="https://github.com/BioinformaticsPlatformWIV-ISP" />
    </creator>
	<requirements>
		<requirement type="package" version="@TOOL_VERSION@">GAMMA</requirement>
	</requirements>
	<version_command>echo @TOOL_VERSION@</version_command>
	<command detect_errors="exit_code"><![CDATA[
 GAMMA.py

 '$input_fasta'
 '$input_db'
 gamma_out

 $all
 -i $identity
 $extended

 $out_options.fasta
 $out_options.gff
 $out_options.headless

	]]></command>
	<inputs>
        <param name="input_fasta" type="data" format="fasta" label="Input FASTA file" help="a genome or assembly in FASTA format" />
        <param name="input_db" type="data" format="fasta" label="Database to screen against" help="a multifasta database of the coding sequence of genes." />
        <param argument="--all" type="boolean" truevalue="-a" falsevalue="" checked="false" label="Include all gene matches" help="Returns all (including overlapping) gene matches" />
        <param argument="--identity" type="integer" min="0" max="100" value="90" label="Nucleotide sequence identity" help="The minimum nucleotide sequence identity (%) used by the BLAT search, given as an integer (e.g., '-i 95' for a 95% threshold). Default is 90." />
        <param argument="--extended" type="boolean" truevalue="-e" falsevalue="" checked="false" label="Return all gene mutations" help="Returns all gene mutations; otherwise, if more than 10 mutations are present, only the count is given" />
        <section name="out_options" title="Output options">
            <param argument="--fasta" type="boolean" truevalue="-f" falsevalue="" checked="false" label="Write out a multifasta file of the gene matches" />
            <param argument="--gff" type="boolean" truevalue="-g" falsevalue="" checked="false" label="Generate a general feature format (.GFF) file of the output gene matches" />
            <param argument="--headless" type="boolean" truevalue="-l" falsevalue="" checked="false" label="Remove the column headers in the .gamma output file" />
        </section>
	</inputs>
	<outputs>
        <data name="gamma_out" format="tabular" from_work_dir="gamma_out.gamma" label="${tool.name} on $on_string: GAMMA Output" />
        <data name="gamma_gff" format="gff" from_work_dir="gamma_out.gff" label="${tool.name} on $on_string: GFF file">
            <filter>out_options['gff'] is True </filter>
        </data>
        <data name="gamma_fasta" format="fasta" from_work_dir="gamma_out.fasta" label="${tool.name} on $on_string: FASTA file">
            <filter>out_options['fasta'] is True </filter>
        </data>
	</outputs>
	<tests>
        <test expect_num_outputs="1">
            <param name="input_fasta" value="contig_in.fasta" ftype="fasta"/>
            <param name="input_db" value="lukE_6.fasta" ftype="fasta"/>
            <output name="gamma_out" file="gamma_out.gamma" ftype="tabular"/>
        </test>
        <test expect_num_outputs="1">
            <param name="input_fasta" value="pDHQP1701672_amr_plasmid.fa" ftype="fasta"/>
            <param name="input_db" value="ResFinderDB_subset.fsa" ftype="fasta"/>
            <output name="gamma_out" file="gamma_amr.gamma" ftype="tabular"/>
        </test>
        <test expect_num_outputs="3">
            <param name="input_fasta" value="contig_in.fasta" ftype="fasta"/>
            <param name="input_db" value="lukE_6.fasta" ftype="fasta"/>
            <section name="out_options">
                <param name="gff" value="true"/>
                <param name="fasta" value="true"/>
            </section>
            <output name="gamma_out" file="gamma_out.gamma" ftype="tabular"/>
            <output name="gamma_gff" file="gamma_out.gff" ftype="gff"/>
            <output name="gamma_fasta" >
                <assert_contents>
                    <has_text_matching expression="^>lukE:6:BA000033\.2.+contig_in\n" />
                </assert_contents>
            </output>
        </test>
	</tests>
	<help><![CDATA[
**GAMMA (Gene Allele Mutation Microbial Assessment)**
is a tool that finds gene matches in microbial genomic data using protein coding (rather than nucleotide) identity, and then translates and annotates the match by providing the type (e.g., mutant, truncation) and a translated description (e.g., Y190S mutant, truncation at residue 110). Because microbial gene families often have multiple alleles and existing databases are rarely exhaustive, GAMMA is helpful in both identifying and explaining how unique alleles differ from their closest known matches.

Output:

The default output of GAMMA is a tab-delimited file with 15 columns:

- Gene – The name of the closest matching gene (target) from the database. If there are ambiguous gene matches (i.e., multiple target matches with the same number of non-degenerate codon changes, basepair changes, and transversions), a "‡" is appended to the gene name.
- Contig – The name of the contig on which the match was found.
- Start – The start position of the sequence matching the gene on the contig.
- Stop – The end position of the sequence matching the gene on the contig.
- Match_Type – The type of the gene match based on the translation of the sequence (i.e., the protein sequence). Can be native (for identical amino acid sequences to the target), mutant (for nonsynonymous mutations), truncation (for nonsense mutations), indels (for insertions/deletions), nonstop (for a missing stop codon), contig edge (for matches that are truncated at the start or stop of a contig), or a combination of multiple types (e.g., indel truncation).
- Description – A short description of the match type.
- Codon_Changes – The count of the non-degenerate codon changes in the sequence versus the closest match from the database.
- BP_Changes – The count of the basepair changes in the sequence versus the closest match from the database.
- Transversions – The count of basepair changes that are transversions (i.e., purine to pyrimidine or vice versa, such as an A -> C or a T -> G).
- Codon_Percent – The percent (expressed as a decimal value) of the degenerate codon similarity between the query and match sequence. Gene matches with large insertions may show a negative value.
- BP_Percent – The percent (expressed as a decimal value) of the basepair similarity between the query and match sequence. Gene matches with large insertions may show a negative value.
- Percent_Length – The percent (expressed as a decimal value) of the length of the target covered by the matching sequence, maximum of 1.
- Match_Length – The length (in basepairs) of the matching sequence.
- Target_Length – The length (in basepairs) of the target sequence.
- Strand – The sense of the strand (+ or -) on which the match is found.

Additional outputs in GFF format and a FASTA file of the gene matches (in the positive sense) can be generated using the -g and -f options, respectively.
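
For downstream scripting, the table described above can be read with Python's standard ``csv`` module. The following is a minimal sketch, not part of GAMMA itself: the function name ``read_gamma`` is hypothetical, the header names are assumed to match the column labels listed above exactly, and the sample row is made up purely for illustration.

```python
import csv
import io

# Column labels as listed in the help text above. That the header row of a
# .gamma file uses exactly these names is an assumption.
GAMMA_COLUMNS = [
    "Gene", "Contig", "Start", "Stop", "Match_Type", "Description",
    "Codon_Changes", "BP_Changes", "Transversions", "Codon_Percent",
    "BP_Percent", "Percent_Length", "Match_Length", "Target_Length", "Strand",
]


def read_gamma(handle, headless=False):
    """Yield one dict per gene match from a tab-delimited GAMMA table.

    Set headless=True for files written without the column header row;
    the column labels above are then supplied explicitly.
    """
    fieldnames = GAMMA_COLUMNS if headless else None
    reader = csv.DictReader(handle, fieldnames=fieldnames, delimiter="\t")
    for row in reader:
        # Percent columns are decimals (0.98 means 98%), so parse as floats.
        for key in ("Codon_Percent", "BP_Percent", "Percent_Length"):
            row[key] = float(row[key])
        # Lengths are given in basepairs; assumed to be plain integers.
        for key in ("Match_Length", "Target_Length"):
            row[key] = int(row[key])
        yield row


# Made-up example content (hypothetical gene and contig names).
SAMPLE = "\t".join(GAMMA_COLUMNS) + "\n" + "\t".join([
    "geneA", "contig_1", "100", "1036", "mutant", "Y190S mutant",
    "1", "1", "1", "0.9968", "0.9989", "1.0", "936", "936", "+",
]) + "\n"

for match in read_gamma(io.StringIO(SAMPLE)):
    # Report matches that differ from the database target at the protein level.
    if match["Match_Type"] != "native":
        print(match["Gene"], match["Match_Type"], match["Description"])
```

To read a real result, pass an open file handle instead of the in-memory sample, and set ``headless=True`` if the column headers were removed.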

**More Information**

- **Official Repository**: `GAMMA on GitHub`_

.. _GAMMA on GitHub: https://github.com/rastanton/GAMMA

	]]></help>
	<citations>
		<citation type="doi">10.1093/bioinformatics/btab607</citation>
	</citations>
</tool>
author	iuc
date	Tue, 19 Aug 2025 19:50:33 +0000