Galaxy | Tool Preview

Filter sequence alignment (version 1.9.1.0)
Filters out positions that are gaps in more than allowed_gap_frac of the sequences.
Removes sequences whose dissimilarity to the consensus sequence is approximately more than x standard deviations above the mean for all sequences.
For example, if 0.10 is specified, the 10% most entropic base positions are filtered out. If this value is used, any lane mask supplied will be ignored. Entropy filtering occurs after gap filtering.
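
The gap and entropy filters described above can be sketched in plain Python. This is a minimal illustration, not the tool's actual implementation: the function names, the dict-of-strings alignment representation, the treatment of both "-" and "." as gaps, and the choice to compute Shannon entropy over non-gap characters only are all assumptions made for the example.

```python
import math
from collections import Counter

GAP_CHARS = "-."  # assumption: both characters denote a gap


def gap_fraction(column):
    """Fraction of sequences that have a gap at this position."""
    return sum(ch in GAP_CHARS for ch in column) / len(column)


def shannon_entropy(column):
    """Shannon entropy (in bits) of the non-gap characters in a column."""
    residues = [ch for ch in column if ch not in GAP_CHARS]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def filter_alignment(seqs, allowed_gap_frac, entropy_threshold=None):
    """Hypothetical helper: seqs maps sequence id -> aligned sequence string."""
    if len({len(s) for s in seqs.values()}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = list(zip(*seqs.values()))

    # Step 1: gap filtering. Drop positions whose gap fraction exceeds the limit.
    keep = [i for i, col in enumerate(columns)
            if gap_fraction(col) <= allowed_gap_frac]

    # Step 2: entropy filtering, applied after gap filtering.
    # Assumption: the percentage is taken over the positions that survived step 1.
    if entropy_threshold is not None:
        n_drop = int(len(keep) * entropy_threshold)
        ranked = sorted(keep, key=lambda i: shannon_entropy(columns[i]),
                        reverse=True)
        dropped = set(ranked[:n_drop])
        keep = [i for i in keep if i not in dropped]

    return {name: "".join(seq[i] for i in keep) for name, seq in seqs.items()}


example = {"a": "AC-G-", "b": "AC-T-", "c": "AG-GA", "d": "AT-GA"}
print(filter_alignment(example, allowed_gap_frac=0.4))
print(filter_alignment(example, allowed_gap_frac=0.4, entropy_threshold=0.34))
```

In the example, the third position (all gaps) and the fifth position (gaps in half of the sequences) are removed by the gap filter; with the entropy threshold set, the most variable of the three remaining positions is removed as well.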

What it does

Apply this script before building a tree from sequences that were aligned against a template alignment (e.g., with PyNAST). The script removes positions that are gaps in every sequence (common for PyNAST, as typical sequences cover only 200-400 bases but are aligned against the full 16S gene). Additionally, the user can supply a lanemask file that defines which positions should be included when building the tree and which should be ignored. Typically, this will differentiate between non-conserved positions, which are uninformative for tree building, and conserved positions, which are informative for tree building. FILTERING ALIGNMENTS WHICH WERE BUILT WITH PYNAST AGAINST THE GREENGENES CORE SET ALIGNMENT SHOULD BE CONSIDERED AN ESSENTIAL STEP.
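
As a rough sketch of how a lanemask selects positions, the snippet below assumes the mask is a string of "1" (keep) and "0" (ignore) characters with one character per alignment position. That format, the function name apply_lane_mask, and the dict-of-strings alignment are assumptions for illustration and may not match the tool's actual file handling.

```python
def apply_lane_mask(seqs, lane_mask):
    """Keep only the alignment positions whose mask character is '1'.

    seqs: dict mapping sequence id -> aligned sequence string (hypothetical layout).
    lane_mask: string of '0'/'1' characters, one per alignment position (assumed format).
    """
    for name, seq in seqs.items():
        if len(seq) != len(lane_mask):
            raise ValueError(
                f"sequence {name!r} has length {len(seq)}, "
                f"but the mask has length {len(lane_mask)}"
            )
    keep = [i for i, flag in enumerate(lane_mask) if flag == "1"]
    return {name: "".join(seq[i] for i in keep) for name, seq in seqs.items()}


aligned = {"seq1": "ACGT", "seq2": "A-GT"}
print(apply_lane_mask(aligned, "1011"))  # the second position is masked out
```

Checking the mask length against every sequence up front catches the common mistake of pairing a mask with an alignment built against a different template.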