<tool id="proteinortho_summary" name="Proteinortho summary" version="@TOOL_VERSION@+galaxy@WRAPPER_VERSION@" profile="@PROFILE@">
    <description>summarizes the orthology-pairs/RBH files</description>
    <macros>
        <import>proteinortho_macros.xml</import>
    </macros>
    <expand macro="biotools"/>
    <expand macro="requirements"/>
    <expand macro="version_command"/>
    <command detect_errors="exit_code"><![CDATA[
        export TERM=dumb &&
        proteinortho_summary.pl
            '$queryfile'
            #if $queryfile2:
                '$queryfile2'
            #end if
            2>&1
            | awk '/^$/ && !f{f=1;next}1' ## remove potentially present 1st empty line
            | awk 'BEGIN{i=0} /^$/{i+=1}{print > ("output" i ".tsv")}' ## split file at empty lines
        &&
        mv output0.tsv '$distribution' &&
        mv output1.tsv '$adjacencyMat' &&
        mv output2.tsv '$average1paths' &&
        mv output3.tsv '$adjacencyMatSquared' &&
        mv output4.tsv '$average2paths'
    ]]></command>
    <inputs>
        <param name="queryfile" type="data" format="tabular" label="An orthology-pairs / RBH file"/>
        <param name="queryfile2" type="data" format="tabular" optional="true" label="(optional) A second orthology-pairs / RBH file" help="If you provide a second file, the difference is calculated (GRAPH - second GRAPH)"/>
    </inputs>
    <outputs>
        <data name="distribution" format="tabular" label="${tool.name} on ${on_string}: Protein-Group distribution"/>
        <data name="adjacencyMat" format="tabular" label="${tool.name} on ${on_string}: Adjacency Matrix"/>
        <data name="average1paths" format="tabular" label="${tool.name} on ${on_string}: Average number of Edges"/>
        <data name="adjacencyMatSquared" format="tabular" label="${tool.name} on ${on_string}: Matrix of 2-paths"/>
        <data name="average2paths" format="tabular" label="${tool.name} on ${on_string}: Average number of 2-paths"/>
    </outputs>
    <tests>
        <test expect_num_outputs="5">
            <param name="queryfile" value="result.proteinortho-graph"/>
            <output name="distribution">
                <assert_contents>
                    <has_text text="%"/>
                </assert_contents>
            </output>
            <output name="adjacencyMat">
                <assert_contents>
                    <has_text text="18"/>
                    <has_text text="14"/>
                    <has_text text="TERM" negate="true"/>
                </assert_contents>
            </output>
            <output name="average1paths">
                <assert_contents>
                    <has_text text="9.6"/>
                    <has_text text="15"/>
                    <has_text text="TERM" negate="true"/>
                </assert_contents>
            </output>
            <output name="adjacencyMatSquared">
                <assert_contents>
                    <has_text text="750"/>
                    <has_text text="74"/>
                    <has_text text="TERM" negate="true"/>
                </assert_contents>
            </output>
            <output name="average2paths">
                <assert_contents>
                    <has_text text="1088.8"/>
                    <has_text text="1374.2"/>
                    <has_text text="TERM" negate="true"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="5">
            <param name="queryfile" value="result.proteinortho-graph"/>
            <param name="queryfile2" value="result.blast-graph"/>
            <output name="average2paths">
                <assert_contents>
                    <has_text text="49.6"/>
                    <has_text text="59.8"/>
                    <has_text text="TERM" negate="true"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="5">
            <param name="queryfile" value="result.blast-graph"/>
            <output name="average2paths">
                <assert_contents>
                    <has_text text="115.2"/>
                    <has_text text="TERM" negate="true"/>
                </assert_contents>
            </output>
        </test>
    </tests>
    <help><![CDATA[proteinortho summary

**What it does**

proteinortho_summary : Summarizes the (orthology-pairs/RBH) file(s) to determine how well the species are connected to each other.

 * **Protein-Group distribution** (for orthology-pairs) : This report contains overall statistics about the output. (i) Number of groups that contain at least p% of the input species (with p ranging between 0 and 100). (ii) Number of groups for each input species.

 * **Adjacency Matrix** : How well the species are directly connected to each other.

 * **Average number of Edges** : The average number of connections for each species.

 * **Matrix of 2-paths** : The square of the adjacency matrix = The number of paths of length 2 between two species.

 * **Average number of 2-paths** : The average number of 2-paths for each species. If a species is not well connected to all the other species, it will result in a low average.
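
The relation between the adjacency matrix and the matrix of 2-paths can be illustrated with a minimal Python sketch. The species names and edge counts below are hypothetical and are not Proteinortho output. The averaging rule used here (mean over the other species, ignoring the diagonal) is an assumption made for illustration only; the normalization used by proteinortho_summary may differ.

```python
# Hypothetical edge counts between four species (illustration only).
# Species D is weakly connected to all other species.
species = ["A", "B", "C", "D"]
adjacency = [
    [0, 10, 10, 1],
    [10, 0, 10, 1],
    [10, 10, 0, 1],
    [1, 1, 1, 0],
]


def mat_square(m):
    """Return m * m; entry (i, j) counts the paths of length 2 from i to j."""
    n = len(m)
    return [
        [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def row_means_off_diagonal(m):
    """Mean of each row over the other species (assumed averaging rule)."""
    n = len(m)
    return [sum(m[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


two_paths = mat_square(adjacency)
avg_edges = row_means_off_diagonal(adjacency)
avg_two_paths = row_means_off_diagonal(two_paths)

for name, edges, paths in zip(species, avg_edges, avg_two_paths):
    print(name, edges, paths)
```

In this toy graph, species D has the lowest average in both reports, which is the pattern to look for when a species is poorly connected to the rest.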

If you supply a second orthology-pairs/RBH file, then the difference is calculated for all 4 outputs.

E.g. given the RBH file and the orthology-pairs file of the same run : The outputs show how much the clustering removed from the initial reciprocal best hit graph.
Or given 2 orthology-pairs files from the same set of fasta files with different parameters (evalue,...) : The outputs show how the parameters change the connectivity of the output.

**Other Proteinortho-Tools for downstream analysis**

* `proteinortho grab proteins` : find proteins/genes in a given fasta file and retrieve their sequence(s). You can also use an orthology-groups file.

More information can be found on GitLab: https://gitlab.com/paulklemm_PHD/proteinortho
]]>
    </help>
    <expand macro="citations"/>
</tool>
author	iuc
date	Tue, 31 Oct 2023 16:31:54 +0000
parents	de26b312c0d2
children