Antismash (version 6.1.1+galaxy1)

Sequence file in GenBank, EMBL or FASTA format:

GFF3 file:

Specify GFF3 file to extract features from

Taxonomic classification of input sequence:

Source of DNA

Specify algorithm used for gene finding:

The 'error' option will raise an error if genefinding is attempted. The 'none' option will not run genefinding

Annotate with TIGRFam:

Annotate clusters using TIGRFam profiles. TIGRFAMs is a collection of manually curated protein families focusing primarily on prokaryotic sequences

Full genome PFAM annotation:

Each gene product encoded in the detected BGCs is analyzed against the PFAM database. Hits are annotated in the final GenBank/EMBL files. Selecting this option normally increases the runtime.

PFAM annotation for only clusters:

Run a cluster-limited HMMer analysis

Run active site finder analysis:

Active sites of several highly conserved biosynthetic enzymes are detected and variations of the active sites are reported

Comparison against MIBiG database:

Run a comparison against the MIBiG database

BLAST identified clusters against known clusters:

Compare identified clusters against a database of antiSMASH-predicted clusters.

KnownCluster BLAST analysis:

Compare identified clusters against known gene clusters from the MIBiG database. MIBiG is a hand-curated data collection of biosynthetic gene clusters that have been experimentally characterized.

Subcluster BLAST analysis:

The identified clusters are searched against a database containing operons involved in the biosynthesis of common secondary metabolite building blocks (e.g. the biosynthesis of non-proteinogenic amino acids)

Run Pfam to Gene Ontology mapping module:

RREFinder precision mode:

Run RREFinder precision mode on all RiPP gene clusters. Many ribosomally synthesized and posttranslationally modified peptide classes (RiPPs) are reliant on a domain called the RiPP recognition element (RRE). The RRE binds specifically to a precursor peptide and directs the posttranslational modification enzymes to their substrates

Analysis of secondary metabolism gene families (smCOGs):

It attempts to allocate each gene in the detected gene clusters to a secondary metabolism-specific gene family using profile hidden Markov models specific for the conserved sequence region characteristic of this family. In other words, each gene of the cluster is compared to a database of clusters of orthologous groups of proteins involved in secondary metabolism

Lowest GC content to annotate TTA codons at:

High-GC bacterial sequences contain the rare Leu codon “TTA” as a means of post-transcriptional regulation by limiting/controlling the amount of TTA-tRNA in the cell. This type of regulation is commonly found in secondary metabolite BGCs. This feature will annotate such TTA codons in the identified BGCs. Default: 0.65
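As a rough illustration of what this threshold controls, the sketch below computes the GC fraction of a sequence and reports in-frame TTA codons only when that fraction reaches the cutoff. It is not antiSMASH's own implementation: the function names are hypothetical, and it assumes every coding sequence is given on the forward strand, starting at its first codon.

```python
from typing import Dict, List


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def tta_codon_offsets(cds: str) -> List[int]:
    """0-based offsets of in-frame TTA codons in a coding sequence."""
    cds = cds.upper()
    return [i for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] == "TTA"]


def annotate_tta(record_seq: str, genes: Dict[str, str],
                 threshold: float = 0.65) -> Dict[str, List[int]]:
    """Report TTA codons per gene, but only if the record is GC-rich enough.

    `genes` maps a gene name to its coding sequence (hypothetical input shape).
    Genes without any TTA codon are left out of the result.
    """
    if gc_fraction(record_seq) < threshold:
        return {}
    found = {}
    for name, cds in genes.items():
        offsets = tta_codon_offsets(cds)
        if offsets:
            found[name] = offsets
    return found
```

With the default of 0.65, a record whose GC fraction is below 65% yields no TTA annotations at all, whatever its genes contain.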

Advanced options

Sideloadings

Outputs:

What it does

AntiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes. It integrates and cross-links with a large number of in silico secondary metabolite analysis tools that have been published earlier.

antiSMASH is powered by several open source tools: NCBI BLAST+, HMMer 3, Muscle 3, Glimmer 3, FastTree, TreeGraph 2, Indigo-depict, PySVG and JQuery SVG.

Input

The ideal input for antiSMASH is an annotated nucleotide file in GenBank or EMBL format. You can either upload a GenBank/EMBL file manually, or simply enter the GenBank/RefSeq accession number of your sequence for antiSMASH to upload it. If no annotation is available, we recommend running your sequence through an annotation pipeline like RAST to obtain GBK/EMBL files with high-quality annotations.

Alternatively, you can provide a FASTA file containing a single sequence. antiSMASH will generate a preliminary annotation using Prodigal, and use that to run the rest of the analysis. You can also provide gene annotations in GFF3 format. Input files should be properly formatted. If you are creating your GBK/EMBL/FASTA file manually, be sure to do so in a plain text editor like Notepad or Emacs, and save your files as "All files (*.*)", ending with the correct extension (for example ".fasta", ".gbk", or ".embl").
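The formatting advice above can be checked before upload. The following is a minimal sketch, not part of antiSMASH or Galaxy: the function names are hypothetical, and it tests only the two points mentioned in the text, namely the file extension and, for FASTA input, that exactly one sequence is present.

```python
from pathlib import Path
from typing import List

# Extensions given as examples in the text above.
KNOWN_EXTENSIONS = {".fasta", ".gbk", ".embl"}


def count_fasta_records(text: str) -> int:
    """Count FASTA header lines, i.e. lines starting with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def check_input(filename: str, text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in KNOWN_EXTENSIONS:
        problems.append("unexpected extension: %r" % ext)
    if ext == ".fasta":
        n = count_fasta_records(text)
        if n != 1:
            problems.append("expected 1 FASTA sequence, found %d" % n)
    return problems
```

For example, `check_input("genome.fasta", ">contig1\nACGT\n")` returns an empty list, while a file saved as `genome.txt` is flagged for its extension.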

There are several optional analyses that may or may not be run on your sequence. Highly recommended is the Gene Cluster Blast Comparative Analysis. It runs BlastP with each amino acid sequence from a detected gene cluster as a query against a large database of predicted protein sequences from secondary metabolite biosynthetic gene clusters, then pools the results to identify the gene clusters that are most homologous to the gene cluster detected in your query nucleotide sequence. This analysis is selected by default.
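The pooling step can be pictured with a small sketch. This is not antiSMASH's actual scoring, which is not described here; it assumes a simplified ranking in which reference clusters are ordered by how many distinct query genes hit them, with the summed best bit score as a tie-breaker. The function name and the `(query_gene, reference_cluster, bit_score)` tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query_gene, reference_cluster, bit_score)


def rank_clusters(hits: Iterable[Hit]) -> List[Tuple[str, int, float]]:
    """Pool per-gene BlastP hits into a ranking of reference clusters.

    Returns (cluster, number_of_query_genes_with_a_hit, summed_best_score),
    sorted so the cluster matched by the most query genes comes first.
    """
    # For every reference cluster, keep only the best score per query gene.
    best: Dict[str, Dict[str, float]] = defaultdict(dict)
    for gene, cluster, score in hits:
        previous = best[cluster].get(gene, 0.0)
        best[cluster][gene] = max(previous, score)

    ranked = [
        (cluster, len(genes), sum(genes.values()))
        for cluster, genes in best.items()
    ]
    # More matching genes first, then higher total score, then name for stability.
    ranked.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return ranked
```

Under this simplified scheme, a reference cluster hit by two query genes ranks above one hit by a single gene, even if that single hit has a higher score.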

Also available is the analysis of secondary metabolism gene families (smCOGs). This analysis attempts to allocate each gene in the detected gene clusters to a secondary metabolism-specific gene family using profile hidden Markov models specific for the conserved sequence region characteristic of this family. Additionally, a phylogenetic tree is constructed of each gene together with the (max. 100) sequences of the smCOG seed alignment. This analysis is selected by default.

Output

The output of the antiSMASH analysis pipeline is organized in an interactive HTML page with SVG graphics, and different parts of the analysis are displayed in different panels for every gene cluster.

In the upper right, a small list of buttons offers further functionality. The house-shaped button takes you back to the antiSMASH start page. The question-mark button takes you to this help page. The exclamation-mark button leads to a page with information about antiSMASH. The downward-pointing arrow opens a menu for downloading the complete set of results from the antiSMASH run, a summary Excel file, and the summary EMBL/GenBank output file. The EMBL/GenBank file can be viewed in a genome browser such as Artemis.