Mercurial > repos > iuc > astral

<tool id="astral" name="ASTRAL-III" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.00">
    <description>Estimating an unrooted species tree given a set of unrooted gene trees</description>
    <macros>
        <token name="@TOOL_VERSION@">5.7.8</token>
        <token name="@VERSION_SUFFIX@">0</token>
    </macros>
    <requirements>
        <requirement type="package" version="@TOOL_VERSION@">astral-tree</requirement>
    </requirements>
    <version_command><![CDATA[
python '$__tool_directory__/version_command.py'
    ]]></version_command>
    <command detect_errors="exit_code"><![CDATA[
        astral
        --input '$input'
        --branch-annotate ${branch_annotate}
        --output ./output.tre
        --lambda $lambda 2>&1
        | tee '$log_output'
        &&
        mv ./output.tre '$output'
        #if $branch_annotate == "16" or $branch_annotate == "32"
            &&
            mv freqQuad.csv '$branch_annotations'
        #end if
    ]]></command>
    <inputs>
        <param type="data" argument="--input" format="newick" label="Tree file" />
        <param argument="--branch-annotate" type="select" label="Amount of annotations that should be added to each branch (-t)">
            <option value="0">0: no annotations</option>
            <option value="1">1: only the quartet support for the main resolution</option>
            <option value="2">2: full annotation</option>
            <option value="3" selected="true">3 (default): only the posterior probability for the main resolution</option>
            <option value="4">4: three alternative posterior probabilities</option>
            <option value="8">8: three alternative quartet scores</option>
            <option value="16">16: for file export of branch annotations to freqQuad.csv (see below)</option>
            <option value="32">32: for file export of branch annotations to freqQuad.csv (see below)</option>
            <option value="10">10: p-values of a polytomy null hypothesis test (default: 3)</option>
        </param>
        <param argument="--lambda" type="float" min="0.0" max="10.0" value="0.5" optional="true" label="Lambda parameter for the Yule prior"
            help="Used in the calculations of branch lengths and posterior probabilities.">
            <validator type="in_range" min="0.0" max="10.0"/>
        </param>
    </inputs>
    <outputs>
        <data name="output" format="newick" label="Output tree file"/>
        <data name="log_output" format="txt" label="Astral log"/>
        <data name="branch_annotations" format="tabular" label="Branch annotations file.">
            <filter>branch_annotate == '16' or branch_annotate == '32'</filter>

        </data>
    </outputs>
    <tests>
        <test expect_num_outputs="3">
            <param name="input" value="song_mammals.424.gene.tre" ftype="newick"/>
            <param name="branch_annotate" value="16" />
            <param name="lambda" value="2.0" />
            <output name="output" file="song_mammals.tre" ftype="newick"/>
            <output name="log_output">
                 <assert_contents>
                    <has_line line="Number of taxa: 37 (37 species)" />
                    <has_line line="gradient0: 1933" />
                    <has_line line="Number of Clusters after addition by distance: 1933" />
                    <has_line line="Final quartet score is: 25526915" />
                </assert_contents>
            </output>
            <output name="branch_annotations" file="freqQuad.csv" ftype="tabular"/>
            <assert_command_version>
                <has_text text="@TOOL_VERSION@"/>
            </assert_command_version>
        </test>
        <test expect_num_outputs="2">
            <param name="input" value="song_primates.424.gene.tre"/>
            <param name="branch_annotate" value="0" />
            <param name="lambda" value="2.0" />
            <output name="output" file="song_primates.tre" ftype="newick"/>
            <output name="log_output">
                <assert_contents>
                    <has_line line="Number of taxa: 14 (14 species)" />
                    <has_line line="gradient0: 339" />
                    <has_line line="Number of Clusters after addition by distance: 339" />
                    <has_line line="Final quartet score is: 389734" />
                </assert_contents>
            </output>
            <assert_command_version>
                <has_text text="@TOOL_VERSION@"/>
            </assert_command_version>
        </test>
    </tests>
    <help><![CDATA[

ASTRAL is a tool for estimating an unrooted species tree given a set of unrooted gene trees. ASTRAL is statistically consistent under the multi-species coalescent model (and thus is useful for handling incomplete lineage sorting, i.e., ILS). ASTRAL finds the species tree that has the maximum number of shared induced quartet trees with the set of gene trees, subject to the constraint that the set of bipartitions in the species tree comes from a predefined set of bipartitions. This predefined set is empirically decided by ASTRAL (but see tutorial on how to expand it). The current code corresponds to ASTRAL-III (see below for the publication).

The algorithm was designed by Tandy Warnow and Siavash Mirarab originally. ASTRAL-III incorporates many ideas by Chao Zhang and Maryam Rabiee. Code developers are mainly Siavash Mirarab, Chao Zhang, Maryam Rabiee, and Erfan Sayyari.

Github: https://github.com/smirarab/ASTRAL

----

**input**
- The input gene trees are in the Newick format.

**output**
- The output is in Newick format.

**Newick annotations**
    - No annotations (-t 0): This turns off calculation and reporting of posterior probabilities and branch lengths.
    - Quartet support (-t 1): The percentage of quartets in your gene trees that agree with a branch (normalized quartet support) give use a nice way of measuring the amount of gene tree conflict around a branch. Note that the local posterior probabilities are computed based on a transformation of normalized quartet scores (see Figure 2 of the paper).
    - Alternative quartet topologies (-t 8): Outputs q1, q2, q3; these three values show quartet support (as defined above) for the main topology (LR|SO), first alternative (RS|LO) and second alternative (RO|LS), respectively.
    - Local posterior (-t 3): is the default where we show local posterior probability for the main topology.
    - Alternative posteriors (-t 4): The output includes three local posterior probabilities: one for the main topology, and one for each of the two alternatives (RS|LO and RO|LS, in that order). The posterior of the three topologies adds up to 1. This is because of our locality assumption, which basically asserts that we assume the four groups around the branch (L, R, S, and O) are each correct and therefore, there are only three possible alternatives.
    - Full annotation (-t 2): When you use this option, for each branch you get a lot of different measurements. The values q1, q2 and q3 show quartet support (as defined in the description of -t 1) for the main topology, the first alternative, and the second alternative, respectively. The values f1, f2 and f3 show the total number of quartet trees in all the gene trees that support the main topology, the first alternative, and the second alternative, respectively. The values pp1, pp2 and pp3 show the local posterior probabilities (as defined in the description of -t 4) for the main topology, the first alternative, and the second alternative, respectively.
    - QC: this shows the total number of quartets defined around each branch (this is what the paper calls *m*).
    - EN: this is the effective number of genes for the branch. If you don't have any missing data, this would be the number of branches in your tree. When there are missing data, some gene trees might have nothing to say about a branch. Thus, the effective number of genes might be smaller than the total number of genes.
    - Polytomy test (-t 10): runs an experimental test to see if a null hypothesis that the branch is a polytomy could be rejected. See this paper: doi:10.3390/genes9030132.

**File export of branch annotations**
    Since it is often hard to know which branch is L and which branch is R, understanding branch annotations is a bit hard for most users.
    To help, we have added a feature for outputting some of the branch annotations into a .csv file.
    Note that we strongly suggest using DiscoVista to visualize the quartet frequencies. If you find DiscoVista hard to install and use, you can instead use these .csv files.
    To get the .csv outputs, you can use -t 16 and -t 32.
    The .csv output has the following format:

    - The output file is always called freqQuad.csv and is written to the same directory as the input file (sorry for the ugliness!)
    - The file is tab-delimited.
    - 1st column: a dummy name for the node. Note that each three lines in a row have the same node number. Around each node, we have three possible unrooted toplogies (NNI rearrangements). We show stats for these three rearrangements.
    - 2nd column: the topology name for which we are giving the scores. Here, t1 is always the main topology (observed in your species tree) and t2 and t3 are the two alternatives.
    - 3rd column: Gives the actual topology with the format: {A}|{B}#{C}|{D}. This means that the quartet topology being scored is putting groups A and B together on one side, and groups C and D on the other side. Please remember that quartets are unrooted trees. Each of the groups is a comma-separate list of species.
    - 5th column: number of gene trees that match the the topology in this line. You will note that this number is not always an integer number. The reason is that in each gene tree, groups A, B, C, and D may not be together. Those gene trees still count as 1 unit, but they can contribute a fraction of that total of 1 to each of the tree topologies. So a gene tree may count as 0.7 for one topology, 0.2 for another, and 0.1 for the third.
    - 6th column: This is the total number of gene trees that had any useful information about this branch. If you have no missing data, this should equal the total number of gene trees. If you have missing data, some genes may be missing one of groups A, B, C, or D entirely. Those genes will be agnostic about this branch. This column gives the number of genes that have at least one species from each of A, B, C, and D.
    - 4th column is likely you are interested in and it depends on whether you used -t 16 or -t 32. The -t 16 column is the local posterior for the topology given in this line. Note that the local posterior probability is different from normalized quartet score. See Figure 2 of the paper. The -t 32 column is simply 5th column divided by the 6th column. Thus, it gives the normalized quartet score for that topology. Note that the three lines with the same node name (1st column) will add up to one in their 4th column.

Prior hyper-parameter
    Our calculations of the local posterior probabilities and branch lengths use a Yule prior model for the branch lengths of the species tree.
    The speciation rate (in coalescent units) of the Yule process (lambda) is by default set to 0.5, which results in a flat prior for the quartet frequencies in the [1/3,1] range.
    Using -c option one can adjust the hyper-parameter for the prior.
    For example, you might want to estimate lambda from the data after one run and plug the estimate prior in a subsequent run.
    We have not yet fully explored the impact of lambda on the posterior.
    For branch lengths, lambda acts as a pseudocount and can have a substantial impact on the estimated branch length for very long branches.
    More specifically, if there is no, or very little discordance around a branch,
    the MAP lengths of the branch (which is what we report) is almost fully determined by the prior.

    Note that setting lambda to 0 results in reporting ML estimates of the branch lengths instead of MAP.
    However, for branches with no discordance, we cannot compute a branch lengths.
    For these, we currently arbitrarily set ML to 10 coalescent units (we might change this in future versions).
    ]]></help>
    <citations>
        <citation type="doi">doi:10.1186/s12859-018-2129-y</citation>
    </citations>
</tool>