
Please note that running any of the tasks end_to_end, phamer, phagcn, phatyp, or cherry will automatically run phavip as well. The output files are the same, but the supplementary files are written to the output of the corresponding task.

Input

Contig sequences in FASTA format
Optionally, your own predicted protein sequences can be provided (by default, the tool uses prodigal and diamond blastp for the prediction)

Output

A tabular dataset with the following columns:

Accession: the accession or the name of the input contigs.
Length: the length of input contigs.
Protein_num: total number of predicted proteins.
Annotated_num: number of proteins that have significant alignments.
Annotation_rate: percentage of proteins that have annotations.
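
As a rough illustration of how the last three columns relate, the sketch below recomputes Annotation_rate from Protein_num and Annotated_num. The function name is hypothetical, and whether the tool writes the value as a percentage (0 to 100) or as a fraction (0 to 1) is an assumption here; the sketch returns a percentage, following the column description above.

```python
def annotation_rate(protein_num: int, annotated_num: int) -> float:
    """Return the percentage of predicted proteins that have an annotation.

    Hypothetical helper, not part of the tool. Assumes the rate is
    Annotated_num / Protein_num expressed as a percentage.
    """
    if protein_num < 0 or annotated_num < 0:
        raise ValueError("counts must be non-negative")
    if annotated_num > protein_num:
        raise ValueError("annotated_num cannot exceed protein_num")
    if protein_num == 0:
        # A contig with no predicted proteins has nothing to annotate.
        return 0.0
    return 100.0 * annotated_num / protein_num
```

For example, a contig with 40 predicted proteins of which 10 have significant alignments would give a rate of 25.0 under this assumption.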

In addition, a gene annotation table can be produced, with the following columns:

Genome: the accession or the name of the input contigs.
ORF: the ID of the translated protein.
Start: start position on the genome.
End: end position on the genome.
Strand: forward (1) or reverse (-1).
GC: GC content.
Annotation: the annotation of the proteins.
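
The gene annotation table can be read with standard tools. The sketch below is a minimal example that assumes the table is tab-separated and has a header row with the column names listed above; the rows in EXAMPLE are made up for illustration and do not come from a real run, and load_annotations is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; real values come from the tool's output.
EXAMPLE = (
    "Genome\tORF\tStart\tEnd\tStrand\tGC\tAnnotation\n"
    "contig_1\tcontig_1_1\t10\t400\t1\t0.45\tterminase large subunit\n"
    "contig_1\tcontig_1_2\t450\t900\t-1\t0.41\thypothetical protein (no hit)\n"
    "contig_2\tcontig_2_1\t5\t300\t1\t0.52\thypothetical protein\n"
)


def load_annotations(handle):
    """Group annotated ORFs by contig, converting numeric columns."""
    genes = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        genes[row["Genome"]].append(
            {
                "orf": row["ORF"],
                "start": int(row["Start"]),
                "end": int(row["End"]),
                "strand": int(row["Strand"]),  # 1 = forward, -1 = reverse
                "gc": float(row["GC"]),
                "annotation": row["Annotation"],
            }
        )
    return dict(genes)


genes = load_annotations(io.StringIO(EXAMPLE))
```

To read a real file instead of the example string, pass an open file handle, for example `open(path, newline="")`, where `path` points to the downloaded dataset.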

Please note that there are two kinds of hypothetical protein:

hypothetical protein (no hit): a protein has no alignment results to the reference database.
hypothetical protein: a protein has alignment results, but the annotation of the hit is "hypothetical protein".
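
When post-processing the Annotation column, it can be useful to separate these cases. The sketch below assumes the labels appear exactly as "hypothetical protein (no hit)" for proteins without any alignment and plain "hypothetical protein" for proteins whose hit is itself uncharacterised; the function and category names are hypothetical.

```python
NO_HIT = "hypothetical protein (no hit)"
HYPOTHETICAL = "hypothetical protein"


def classify_annotation(annotation: str) -> str:
    """Sort an Annotation value into one of three illustrative categories."""
    label = annotation.strip().lower()
    if label == NO_HIT:
        return "no_hit"  # no alignment to the reference database
    if label == HYPOTHETICAL:
        return "hypothetical_hit"  # aligned, but the hit has no known function
    return "annotated"  # aligned to a protein with a functional annotation
```

Counting the three categories per contig gives a quick view of how much of the annotation is functionally informative.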