Please note that running any of the tasks end_to_end, phamer, phagcn, phatyp, or cherry will automatically run phavip as well.
The output files are the same, but the supplementary files will be written to the output of the corresponding task.
Input
- Contig sequences in FASTA format
- Optionally, your own predicted protein sequences (by default, the tool uses prodigal and diamond blastp for the prediction)
Output
A tabular dataset with the following columns:
- Accession: the accession or the name of the input contigs.
- Length: the length of input contigs.
- Protein_num: total number of predicted proteins.
- Annotated_num: number of proteins that have significant alignments.
- Annotation_rate: percentage of proteins that have annotations.
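As a minimal sketch of how the last three columns relate, Annotation_rate can be recomputed from Protein_num and Annotated_num. The function name `annotation_rate` is hypothetical, and the exact rounding and scale the tool uses in its output are not stated here, so treat the formula as an assumption based on the column descriptions above.

```python
def annotation_rate(protein_num: int, annotated_num: int) -> float:
    """Return the percentage of predicted proteins that have annotations.

    Assumption: rate = 100 * Annotated_num / Protein_num. The rounding
    applied by the tool itself is not documented here.
    """
    if protein_num < 0 or annotated_num < 0:
        raise ValueError("counts must be non-negative")
    if annotated_num > protein_num:
        raise ValueError("Annotated_num cannot exceed Protein_num")
    if protein_num == 0:
        # A contig with no predicted proteins has nothing to annotate.
        return 0.0
    return 100.0 * annotated_num / protein_num


if __name__ == "__main__":
    # A contig with 40 predicted proteins, 10 of them with significant alignments.
    print(annotation_rate(40, 10))
```

This kind of check can be useful for confirming that a summary table and a gene annotation table for the same contig agree.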
In addition, the gene annotation itself can be produced, with the following columns:
- Genome: the accession or the name of the input contigs.
- ORF: the ID of the translated protein.
- Start: start position on the genome.
- End: end position on the genome.
- Strand: forward (1) or reverse (-1).
- GC: GC content.
- Annotation: the annotation of the proteins.
Please note that there are two kinds of hypothetical protein:
- hypothetical protein (no hit): the protein has no alignment results to the reference database.
- hypothetical protein: the protein has alignment results, but the annotation is "hypothetical protein".
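The two kinds of hypothetical protein can be told apart when post-processing the Annotation column. The sketch below assumes that a protein without any alignment carries the literal label "hypothetical protein (no hit)" and that a protein with an alignment but no known function carries the plain label "hypothetical protein"; the function name and the returned category names are hypothetical and not part of the tool.

```python
# Assumed label strings for the Annotation column (see the note above).
NO_HIT_LABEL = "hypothetical protein (no hit)"
HIT_UNKNOWN_LABEL = "hypothetical protein"


def classify_annotation(annotation: str) -> str:
    """Sort an Annotation value into one of three illustrative categories."""
    label = annotation.strip().lower()
    if label == NO_HIT_LABEL:
        # No alignment to the reference database at all.
        return "no_hit"
    if label == HIT_UNKNOWN_LABEL:
        # Aligned to a reference protein whose function is unknown.
        return "hit_without_function"
    # Anything else is treated as a functional annotation.
    return "annotated"


if __name__ == "__main__":
    for value in ["hypothetical protein (no hit)",
                  "hypothetical protein",
                  "major capsid protein"]:
        print(value, "->", classify_annotation(value))
```

Under this reading, only the "no_hit" category would be excluded from Annotated_num, since that column counts proteins that have significant alignments.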