Galaxy | Tool Preview

MutCount (version 2.2.0)
It must contain the concatenated file in NUCLEIC format from Phylogeny tool
List the species separated with a comma (for e.g Ap,As,Ct,Gt,Yu)
List the species separated with a comma (for e.g Ap,As,Ct,Gt,Yu)
Sets the length (in codons) of the resampled sequences
Sets the number of resampled sequences

Authors Eric Fontanillas created the version 1 of this pipeline. Victor Mataigne developped version 2.

Galaxy integration Julie Baffard and ABiMS TEAM, Roscoff Marine Station

Contact support.abims@sb-roscoff.fr for any questions or concerns about the Galaxy implementation of this tool.
Credits : Gildas le Corguillé, Misharl Monsoor

Description

1-Separated mode

Input files are all the orthogroups computed by the AdaptSearch suite; Counts and test are computed on each group separatly. This mode counts occurrences of amino-acids or nucleic acids, according to the sequences type, in each distinct orthogroup. Then, two subgroups of species are set by the user :

Within the groups, the program checks wether the occurrences of each element (amino-acid, nucleic acid, thermostability indice, GC content …) is higher of lower between one species and all the species of the opposite group. Binomial tests are then performed of these counts.

2-Concatenated mode

The input file is the super-alignment obtained by concatenation of all the orthogroups computed by the AdaptSearch suite. This script counts the number of codons, amino acids, and types of amino acids in sequences, as well as the mutation bias from one item to another between 2 sequences. Counts are then compared to empirical p-values, obtained from bootstrapped sequences obtained from a subset of sequences.

In the output files, the pvalues indicate the position of the observed data in a distribution of empirical counts obtained from a resampling of the data. Values above 0.95 indicate a significantly higher count, values under 0.05 a significantly lower count.

The script resamples random pairs of aligned codon to determine what counts can be expected under the hypothesis of an homogenous dataset. Counts are performed on each generated random alignement, thousands of alignments allow to draw a gaussian distribution of the counts. Then the script simply checks whether the observed data are within the 5% lowest or 5% highest values of the distribution.


Input files

If you choose the concatenated method, the input file is the concatenated genes fasta file (in nucleic format) from a previous run of the toolConcatPhyl.

If you choose the separated method, there are two input files : - A dataset collection containing output files from the CDS_Search tool, the one without indels. These files must be in nucleic or proteic format according to the format chosen along with the method. - The concatenated genes fasta file from ConcatPhyl, only used here to retrieve species name.


Parameters

There are parameters only for the "Concatenated" method :


Outputs

Many outputs in .csv format , varying according to the chosen method and format (separated, nucleic ...)
  • When method = concat : 6 .csv outputs : countings of codons, amino acids, amino acids types, and transitions from amino acid to amino acid and from amino acid type to amino acid type.
  • When method = separated and format = nucleic : 4 collections with several .csv files : counts tables and binomial sign tests results for nucleotides and various indices (GC and purine percent ...) .
  • When method = separated and format = proteic : 4 collections with several .csv files : counts tables and binomial sign tests results for amino-acids and various indices (thermophilic indices, hydratation, partial specific volume...).

System Message: WARNING/2 (<string>, line 76)

Definition list ends without a blank line; unexpected unindent.

The AdaptSearch Pipeline

/repository/static/images/b24aa87591a5d671/adaptsearch_picture_helps.png

Changelog

Version 2.2.0 - 10/07/2018 - Updated separated mode : added a binomial sign test

Version 2.1 - 26/02/2017 - Fully re-written the concat method : fixed mistakes + cleaner code - Splitted output of concatenated method in several csv files. - Bug corrected in output files of separated method.

Version 2.0 - 12/07/2017

  • NEW: Replaced the zip between tools by Dataset Collection
  • More functional tests

Version 1.0 - 14/04/2017

  • Added the tools to the suite
  • Added a functional test with planemo
  • Planemo test using conda dependencies for python
  • Scripts renamed + symlinks to the directory 'scripts'