Commit message:
planemo upload for repository https://github.com/Helmholtz-UFZ/ufz-galaxy-tools/blob/main/tools/virsorter commit fdde667342e69dd11ef98029e0d6d3376d0d95c7
added:
test-data.sh test-data/8seq.fa test-data/virsorter.loc tool-data/virsorter.loc.sample tool_data_table_conf.xml.sample tool_data_table_conf.xml.test virsorter_run.xml
diff -r 000000000000 -r 76a7de225f06 test-data.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data.sh Sun Jun 23 16:48:49 2024 +0000
@@ -0,0 +1,4 @@
+#!/bin/bash
+
+cd test-data/
+wget -O - https://osf.io/v46sc/download | tar -xvz
\ No newline at end of file
diff -r 000000000000 -r 76a7de225f06 test-data/8seq.fa
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/8seq.fa Sun Jun 23 16:48:49 2024 +0000
b'@@ -0,0 +1,6 @@\n+>Caudo-circular\n+CTGTTCATCGTTCCGGGGGGGCTCAACGTAGATGTTGGGGCCGCAGACTTGACACTCGAAATCATTATCGCAACGGAGCAACCAGCAGACCTTGAGTCGAGAAGTGCGGTGCTTTCGAATATGGCTTACATCATAAAGGATGCAATCGCCATGGGTCACAACCATGCCTACAACGACAACGAACCTATACCGATGCGAGCCATCCTAGAGATGCCCGTTCAATGCGAACCGTTCCTTGCTCGTTTCGATAATAGCTTAATCGGATGGACGTGTCAACTGACGTTTAATTTAGACAACCGGAATGATGTCTGTCTAATACCTATCAAGTGATAACGCTTAGACTCGGTGGGAAGACGTACCCATGCCCTAACGTCAGCACCGAGCTGTTGAGGGTGGCAAAGCTCTGGAAGAAGAACGCTCAGGCGAAATTGAGGCGTGCGAAAATTAACGCATCGGGTGACCTGAAGAGTAGTATGAAGCTCACCCTGCATCAAGACCGCGATGATATATGGGTTGACCTTACACCTGATGTCGATTACTGGGAATTCGTAGACCTTGGAGTTCGCGGAGCAGGTCCATTGACGAAGAACGACAAAGACAAAGGCTACCCGTTCAAAGGCCAGCAAGAGTCTCCATTCAAGTACACTAAAAAAAGACCACCTCTTAAACCCTTAATGGATTGGATAAAGGTGAAGCCAGTCACCCCAAGAGACAAAGACGGGAAGTTTATGAAGCACCGAGCCTTCGCCTTCCTGTTGCAAGCGGCTATCTTTCGAAGAGGACTGAAGCCGTCCTTCTTTATATCTGGCACTGGAGACAGAATACAAAGGAAGTACGCTCAATCTATTGCCGATGCATACGCAAAGGATATGGCCAATGCGATGCTCGACAACCTGTCATAAACGAACACCCTAAAAAGGATGGCCTACACTATCAACGAAGAACCTAATAACGCTTATTTCGCAAGTACCTTACAACCCATCATCTGGTCTGTGAGCGACACCGCAATGGGATTCAAGTATCGATTCACCCTTCGGATAAAGGATGCCAGTGGCACGGAGTTGGTTCGTCTACACCAGCAACCGAATAACTCGGCCTCGGCAGTGTTCGATGTCTCGAATGTCTTAGACTCCTACGTTGACTTGAACCCGTACTTGACACCTCCCAACCTTGCAATCAATGTCTCGAGCGTTCACACCACGGGACCATACTCGGCTCAGAGTTCCCTTATGCCCAAGACGGCTGAGTTTTTCTTAGTCGATGCAGGTCACACCTCCGCCACAACTGCGGATGGGATTCCAACGACGACTTGGGAATACACAGACAGGAAAGTGACCGCCATGCGATGGCAAGGTGTTAACTCTGACCAAGTGGGTTGGAATCAAGTCTTGGCGAATAGCGGTTACCTCCTTCAAGACTTTACAGATGATGGATACAATACACCAGTCTTCTCCGATAGTCCTTTAAACACGGCGGCAACCATTGGCAGTGACTGGACGTGGCTCAATGCAGGCACCAAGGTCTTTGAGTCCTTTGTTTCGGAGGATGGTTGGAGGACGTTAACACGAGCAGTTGGCTCCGTGGCTCCTATGACGGTAGTAAAGGCTTCAGATTGGACGAACATTAAAGTCTATCTAAATAGCACTCTCCAAGCTTCCCAAGCAAAGAGCAACACAGGACTCGGCGGCGTGATGTCGGCAGATGTTAACCTTGCTGAGGAGCTCATTCAATGTACTGCCGTTGGGCCAGCCAATTTAATAAGCTCTAACATCTGGACGGATATCGCCACTGCATTCGTTACGAACACTTGGACTCACTTCGATGTCGTGTTTTCTGAAAATGAAGGAGCAGACCAACAGAGCCACCTGTATCGATTTAAGCGACTTGACAAGAATTGCTTCTACGAGGCAGTGACATTGGCATTCAGCAATCGCTCCGGCGGTTGGGATTATATCGACTTCA
CATTAGAGAAGAAACGAAAGACCAGAGTGACAAGCCGTAATCGATTCGATGGCTTGCAGGGTAATTACTTGGAGGCTTCGTCTACCATCCCATTTAGGAGCACTGGCCTTGAAGGTGGCATCACCACACGTCGAGTTCAATCCCGAAGAAGCGTAATGGTGAACACGGGCTTCTATCCAGAATCACAAAACGACTTGATGGAATCCCTATATCTGGCACGGGATGTCTACGAGATTACACCAGCTGGCGTGGTGACCAAAGTATACGTGACAGACTCCTCGTGGAAGGAGCTCACAAGGAATCTAGACAACCAGTTTCGATATTCAGTTTCGTATGAATACGCCAAGGAAAGAATAACCTAATGACACGACTCGAAGCAAAGCATTTGGACACGGGTGATTGGTATCCGTTGCACCTTCCCATGGGTGTGGTGGTGCCTGTTACGTTTCGATTCGCAGACCCCAGCAAGCTCACTCAACGGGAAGCTCCATACTCTGGTGCTTTTCAACTGCCTTTCTCCAATGAGAATAACCAGTTCTTCGGCCATTACTATCAGGCTAATTTAACCAACGCGACCTTTGATGCTGGCCAGCCCAATACGTGCCGACTCCTCGTCGATGGAGAGGTGATTATAGAAGGCTCCCTCCAGCTTCGAAGCATTTCGTTGGTAGACAACACCTACGAGGTTAACGTGGTGTCAGGGGCTGGAGATTTGTTTACTCGATTGGGAGACACCAAGCTTCGGGATGCCATCGACAATCCCGATGATTATGGATATACGGCTTCTGACTCAGCCGTTATCGATAGCTGGACGAGCGATATCACAAATGGGTCCGTAGGCAATGGAGTGATTCGAATCCCTTTATTGGATACAGGTAATTCTCCAGCTGGGAGGTTCTTCGCGGATTATGGCGTGGAGGAAGGGCTATTCAAGGCAGGCTATCTTCTGCCGAGCAATATGAGCCCGACTATGAGTGTCGACCATCTGTTGCGGAAAGTACTGTCTCACTTTGGGTATACCATCGACTCCGTGTTTATGGCGACAACGGCTTGGACGAATTTAGTGATGACGCTTGCCAGCAGTTCTGGAGGTATACCGTTTCGCCCTTATTATGGATGTAAGGTAGGCCGACTGACTGACCAAGTGCTTTACGACTACCAGAGCATCTCTCCCAGTTATAATGCCGTAAACTTTAACGACGAAGCATCTGCTCCGTTTTTTGACCCAGATGGTCTTTTCTCTGGGGGCCTGTTCACCGCCCCTATCGATATTGATGCCGTCTTCAGGATATCGTTGCTGATTGACAATTCAGCGAGTATCAGTGCATCGAATATGACCATCTCCTTAATGAGTGGAGGCGCGCAAGTCTGGAGCACCAGCCAAGGAATCCCGACCAACACCACTCAAACGGTCTTTCCTTTTACTTGGGAGACGACTCCCATCTCTATGACGCAAGGGCAGAACCTCACGGTCGTTGTTTCAGTGTCAAACATCAATGGCACCTCCCTGACATTGGTAGCTTCTCAATACACGTACTTTCAATTCGTTTCGTACTCTTCTCCGTTTTCTGCCAATGGCTTAATCTCGCCTATCGACGGCATACCCGATATGACCTGTGCTGGCTTTATTCGGGATTTGGTTCAGCGGTTCAACCTGTCTCTCCTCCAATCCCAAGACCCACAGGTACTTAGGCTGGAGCCGTTAACCGATGTCATTGGCACAGGCAAGGTACTCGACTGGACTGCCAAGGTAGATGTCTCAAAGACCTACCTACTCAAGCCAGCGACTAGCCTCCGCAACAAGACAATCGCTTTCTCAGACGGTGTCGACAAGGATGCTCCCAACGTATATCATCAGGAGAACTACGCTTTCCCCATGGGTGAGTATAAATACATCTCTGAGGATTCCTTTGCTCAAGGTGCGGCTACAAATAAGGCAGTCTTTGGTTCGTCCATGATTTCCCTATTGCCGAAATACGACTGGTCAGGAGTGGACGTGA
ATA'..b'AATTTCCACAGAACTCTGGTTATCCGCAGCTGTCGAGAACACCTGACTCTTACGTGTGGGAATTGTGGTATTCCTCTCAATCAATGGTGTTGCGACACCACCCATTGTTTCGATACCAAGTGTAAGAGGAGTGACATCCAGCAGCAATACATCTTCAACTTCTCCTCCCATTACACCGGCCTGAATGGCAGCACCTACAGCAACCGCTTCATCAGGATTGATATTCTTGTACGGATCCTTACCAATGAATTCCTTCACAGTCCTGACAACTGCAGGCATCCTGGTAGAACCACCGATAAGCAGCACCTTTTCGATATCTGAAGTCTTAAGGTTTGCATCCTTTAGCGCCTGTTTCATGGATTGGAGGGTCTTATCGATGAGATCCTCGGTCATCTTTTCAAATTGTGCCCTTGTAATATCGATATCGACATGTTTTGGCTGACCGTTAGAATCGGCTGTAATGAAAGGAAGGTTGATGTTTGTGGTACCAACGCCTGAAAGTTCAATCTTGGCCTTCTCCGCAGCATCCTTGAATCTCTGCATTGCTGCCTTGTCACCGGAAAGATCGATACCTTCTTCCTTTTTGAATTCGCTTATGATGTGATTGACAATCCTGTCATCAAAGTCATCTCCTCCAAGATGGGTATCACCACTTGTGGAAAGTACTTCAAAAACCCCGTCTCCCAACTCAAGGATGGAAACATCAAAAGTACCGCCTCCAAGATCGTAAACAAGGATTTTGTGGTCACCTGTTTCCTTGTCCAGTCCATATGCCAGAGAAGCTGCTGTGGGCTCATTTATAATTCTCTTAACTTCAAGCCCTGCAATCTTACCGGCATCCTTTGTAGCCTGCCTCTGGGAGTCGTTGAAATAAGCAGGGACTGTGATTACAGTATCGGTAATTGTTTCACCGATATAACTTTCTGCATCCTCCTTGAGTTTGCGCAGTATCATTGCAGAAATTTCCTGTGGCGTATACTCCTTATCGTTGAGAGTTACCTGATAGTCCCCTTCGCCAATATGACGTTTGATTGAGCTTACCGTATTATTCGGATTTGCTACCATCTGTCTTTTGGCTACCTGACCCACAAGTTTCTCCCCCTTCTTTGAGAATCCCACTACTGAGGGAGTAGTCCTGCTGCCTTCTGCATTTGGTAATACAGTTGGTTCGCCACCTTCAATCACTGCCATGCAGGAATTTGTAGTTCCCAGATCAATTCCAAGTATTTTACCCATATAGATTACCTCTTTGTAATATTATCAGATTTATAGTTAATATTTCTTATCAATTCGTTTCAATGAACTTCACTTCTCCTCTTTGTCAGAAGAAGTGTTTTTGGAAACCGCCACCATAGCGGGCCGAATTACCCTGGAATTTAACTTGTATCCAGGTTTGCAAACATCAATGATAGTATTGTCAGGATGATCCGCGTGTTCAACGTGCATCATGGCTTCATGTTTGGAAGGGTCAAATTCCTCGCCTTCACATTCAATCCTTTTGAGACCCTCTTTTTCCAGGATCGATACAAATTGCTTGAACACCATTTCTACACCTTCGACCACCGAATTCACATCATCAGTATTGTGAGCTGACTCTATGGCCCTCTCGAAGTTATCATAGACATCAAGCAATTCCACCATTAGGTTTTCCACCGCAAAATTGCGGAATTCTTCCTGCTCCTTGCGGGTACGTTTGCGGAAATTATCAAATTCTGCTCTCTTACGTAAAAGGTCTTCCTTCAGAGATGCAATTTCCGCCTCCTTTTCCTGAACCAGCTGTTCCAGCTCTTCAGGAGAATTTCCCGCGTCTGAGGAGTTATTATCTTCTTGTTTATCTTTACCTACCATCACTGACATCTCCGCCTTTGTATTGAGCGGTCCAACTTATAATAGAAATTGCAAAGAATTGGATTGTATGAGGTTCTATAAATAATTTGCGTATGCCAAGTGGGATCTAATCAGGAATATTGATTCTGAACCCTGTAAGAGAAG
AACTTGACAACCAGCATACTAACAATCATACAAATCGCCAGTGCGGAAACTAGCCTATTCCATCCTATGGAAAAAAGACCATCGTTAAAAGCAAAACTCAGTCCACCCAGAAAAGAGATGATTGAAAAGGTACTTACCGAGGAAACAAGAGCTGCACCCACAATTGATCCCACTTCGTCGGCCCCTCTCTTTTCAAAGAAAACAGAACCAAGAATGAATATGATGGCGAATACCAAGAGTATAATAGCATAAGGTACGGGGTTCGAACCGGTGGTTACTATTTGCATTATACCCATGGCCATACCTACCATAAAAATTGCCATTGCAAAAGATATTAACAGTGCCTGAATAAAAGGATTTTCTGACATTTCCAAATCCATCCCTCTCTGCAAAAAAGAGGTCCTGATAGTATATATTTATTGTCTCGTTGATATAAACCCAAAGCATATATTAAATAAAACATTTCTCGAAATATGAACATCTTAATTTGCGAATATGCCACAAGTACGGGAATGGGAGGTACCTTCCTGCTTGAAGGTAAAGCCATGTTAAAAAGCCTTGCAGAAAGCTTTGCCGCAGGCCATCATGAAATCAAATATACGACATCTGCAACCGAAATTCAGATCGGCACACCTGTATATTGTAACCATGAAAACATAAAGACGGTATTACAGGCCGAGGCCAATAAATGTGATTGCGCATTGATAATCGCTCCCGATGACCTGCTTGCAGGGCTTGTGGAAGAAATAGTAGAACATACCTCTAATATGGGATCATCCCCAGAGGTCATCCGCAAGTGTGCGGATAAATTCAAGTGCGGAAAAATTCTTGAAAGCAATGGCATACCTGCACCCCAACTTGTTGAAACGGTGGAAGATATCAAAAAAGATACCAATTATGTTGTCAAACCCCGTTATGGATGTGCTGCGGAAAACACGATCATTACATCTAATCCTTCAATATCCCAGGACATGGTTGTAACCGAATATGTGGAAGGAGAACACCTGAGTGTGAGTCTTGTCTGTGCCGATGTGCCCCTCCCACTGACAATCAATCGCCAGTACATAGAAATTACAGGAAGAGGAGATAAAACTGTCATCGATTACAGGGGAGGGATAACCCCCTATGAAACACCTGACAGGGAACTGATTATCGATACTGCAAAAAGGGCAGCCCAGGCGCTTGGATGCAGGGGGTACACAGGTGTTGATATTGTAATGGGTGACAGCCCCTATGTAATAGACATAAATCCCAGGCCAACCACTTCGCTTGTGGGCATCTGCAAAATAATCAAACCCCAAATAGGAGAATTGTTAATCGATGCCCTGGAAGAGAACCTACCACCAACAGTACATATTACAGGCAATTATGAATTCCGAAAGGAGGACCTTTTATGAAAAAACTCGGGATAGATATCGGAGGGGCAAACACAAAAATTGCCTCCTCAGACGGCTCTGTCTGTGAACTCTACTATGTACCTCTCTGGAAAGGAACAAAACTTCCAACGGTTCTGAAAATTATCTCTGAAAAACACAATCCTTCTCATGTGGGTGTTGTAATGACCGGAGAACTTGCTGATTGTTACGATAACAAAAAAGCAGGGGTTGTCGGCATTATGGATATTGTAAGAGACAGTTTCGACTGTGAGATTGATTTTCTGGACCAGGAAGGCAATTTTGTAAAGCACACACAAAATCCCGCCTCCCTTGCCGCAGCAAACTGGATGGCTTCTGCAAGTCTTGTTGCCAGAAAAATGAAAGATTGTCTTTTTGTGGATATGGGCAGTACTACAAGTGACCTAATCCCTGTCAAAGGCGGGAGGATCGTTGCACACAATACCGATACACAACGTCTTGCAAACAACGAATTATTGTATCAGGGTGTCCTGAGGACAAATATAGCGGCCCTGCTGGACAGTGTGGCGATAAGTGCGGGGAATTGTCGAATCGCCTCTGAACTATTTGCCACAAGCGCGGATGCTTACCTTCTCCTTTCAGATA
TCGATG\n' |
diff -r 000000000000 -r 76a7de225f06 test-data/virsorter.loc
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/virsorter.loc Sun Jun 23 16:48:49 2024 +0000
@@ -0,0 +1,3 @@
+# value name path
+# value must be the version of the database, e.g. 0.4, needs to be a number
+0.4 virsorter 0.4 2.2.4 ${__HERE__}/db/
\ No newline at end of file
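Reviewer sketch (not part of the changeset): the .loc entry above is easier to check against the four columns declared in tool_data_table_conf.xml.sample (value, name, virsorter_version, path) when it is parsed field by field. The snippet below is a minimal illustration, assuming the fields are tab-separated as is usual for Galaxy .loc files and that the name field is "virsorter 0.4"; the flattened diff rendering does not show the original whitespace, so both points are assumptions. The function name parse_loc is hypothetical.

```python
from typing import Dict, List

# Column order as declared in the <columns> element of the data table config.
COLUMNS = ["value", "name", "virsorter_version", "path"]


def parse_loc(text: str, comment_char: str = "#") -> List[Dict[str, str]]:
    """Parse tab-separated .loc content into one dict per data row."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} fields, got {len(fields)}: {line!r}"
            )
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


# Assumed tab-separated form of the test entry shown in the hunk above.
example = (
    "# value\tname\tpath\n"
    "0.4\tvirsorter 0.4\t2.2.4\t${__HERE__}/db/"
)
rows = parse_loc(example)
# The comments in the .loc file require the value column to be numeric.
db_version = float(rows[0]["value"])
```

A row with a missing or extra field raises ValueError, which is a quick way to spot a header comment or entry that does not match the declared columns.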
diff -r 000000000000 -r 76a7de225f06 tool-data/virsorter.loc.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool-data/virsorter.loc.sample Sun Jun 23 16:48:49 2024 +0000
@@ -0,0 +1,3 @@
+# Your virsorter.loc file should include an entry per line
+# value should be db version, i.e. a number
+#value name virsorter_version path
diff -r 000000000000 -r 76a7de225f06 tool_data_table_conf.xml.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_data_table_conf.xml.sample Sun Jun 23 16:48:49 2024 +0000
@@ -0,0 +1,10 @@
+<?xml version="1.0"?>
+<tables>
+    <!-- Locations of indexes for virsorter, since reference data seems not to be versioned we use the download date as version -->
+    <table name="virsorter" comment_char="#">
+        <columns>value, name, virsorter_version, path</columns>
+        <file path="tool-data/virsorter.loc" />
+    </table>
+</tables>
+
+
diff -r 000000000000 -r 76a7de225f06 tool_data_table_conf.xml.test
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_data_table_conf.xml.test Sun Jun 23 16:48:49 2024 +0000
@@ -0,0 +1,6 @@
+<tables>
+    <table name="virsorter" comment_char="#">
+        <columns>value, name, virsorter_version, path</columns>
+        <file path="${__HERE__}/test-data/virsorter.loc" />
+    </table>
+</tables>
\ No newline at end of file
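Reviewer sketch (not part of the changeset): the test configuration relies on the `${__HERE__}` placeholder being replaced with the directory that contains the file. The snippet below only illustrates that substitution and how the column list maps onto the .loc fields; it is not Galaxy's implementation. The function name read_table_conf and the directory /tmp/virsorter are hypothetical.

```python
import xml.etree.ElementTree as ET
from string import Template

# Content of tool_data_table_conf.xml.test as added by this changeset.
TABLE_CONF = """<tables>
    <table name="virsorter" comment_char="#">
        <columns>value, name, virsorter_version, path</columns>
        <file path="${__HERE__}/test-data/virsorter.loc" />
    </table>
</tables>"""


def read_table_conf(xml_text: str, here: str) -> dict:
    """Expand ${__HERE__} and return the first table's settings."""
    # safe_substitute leaves any other $-placeholders untouched.
    expanded = Template(xml_text).safe_substitute(__HERE__=here)
    table = ET.fromstring(expanded).find("table")
    return {
        "name": table.get("name"),
        "comment_char": table.get("comment_char"),
        "columns": [c.strip() for c in table.find("columns").text.split(",")],
        "loc_path": table.find("file").get("path"),
    }


# "/tmp/virsorter" is a hypothetical checkout directory used for illustration.
conf = read_table_conf(TABLE_CONF, "/tmp/virsorter")
```

With the hypothetical directory above, the loc_path entry resolves to /tmp/virsorter/test-data/virsorter.loc, and the columns entry gives the order in which the .loc fields are read.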
diff -r 000000000000 -r 76a7de225f06 virsorter_run.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/virsorter_run.xml Sun Jun 23 16:48:49 2024 +0000
b'@@ -0,0 +1,233 @@\n+<tool id="virsorter" name="VirSorter" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.0" license="MIT">\n+ <description>identify DNA and RNA virus</description>\n+ <macros>\n+ <token name="@TOOL_VERSION@">2.2.4</token>\n+ <token name="@VERSION_SUFFIX@">0</token>\n+ </macros>\n+ <xrefs>\n+ <xref type="bio.tools">virsorter</xref>\n+ </xrefs>\n+ <requirements>\n+ <requirement type="package" version="@TOOL_VERSION@">virsorter</requirement>\n+ <!-- virsorter (virsorter setup -d db -j 4) creates a conda env yaml (and conda env)\n+ that is used by the snakemake pipeline. in the tool we disable the use of the conda env. \n+ \n+ the following are the pinned requirements used in the conda env that would be created by the setup \n+ -->\n+ <requirement type="package" version="1548">last</requirement>\n+ <requirement type="package" version="0.3.3">ncbi-genome-download</requirement>\n+ <requirement type="package" version="0.16.12">ruamel.yaml</requirement>\n+ <requirement type="package" version="2.6.3">prodigal</requirement>\n+ <requirement type="package" version="1.1.3">screed</requirement>\n+ <requirement type="package" version="3.4">hmmer</requirement>\n+ <requirement type="package" version="0.22.1">scikit-learn</requirement>\n+ <requirement type="package" version="0.7.0">imbalanced-learn</requirement>\n+ <requirement type="package" version="1.2.5">pandas</requirement>\n+ <requirement type="package" version="0.13.2">seaborn</requirement>\n+ <requirement type="package" version="1.23.5">numpy</requirement>\n+ <!-- additional pin (the datamanager failed because of this .. 
so I\'m adding this just be sure) -->\n+ <requirement type="package" version="2.7.0">pulp</requirement> <!-- needs to be pinned, because the old-ish snakemake used in vs2 struggles with https://github.com/snakemake/snakemake/issues/2607 -->\n+ </requirements>\n+ <version_command><![CDATA[virsorter --version 2> /dev/null | grep "^virsorter" | cut -d" " -f3]]></version_command>\n+ <command detect_errors="exit_code"><![CDATA[\n+ virsorter run all\n+ --db-dir \'$db_dir.fields.path\'\n+ --seqfile \'$seqfile\'\n+ --include-groups \'#echo ",".join($include_groups)#\'\n+ --jobs \\${GALAXY_SLOTS:-4}\n+ --min-score $min_score\n+ --min-length $min_length\n+ $keep_original_seq\n+ $exclude_lt2gene\n+ $prep_for_dramv\n+ $high_confidence_only\n+ $hallmark_required\n+ $hallmark_required_on_short\n+ $viral_gene_required\n+ $viral_gene_enrich_off\n+ $seqname_suffix_off\n+ $provirus_cond.provirus_off\n+ #if $provirus_cond.provirus_off != ""\n+ --max-orf-per-seq $provirus_cond.max_orf_per_seq\n+ #end if\n+ --tmpdir \\${TEMP:-\\$_GALAXY_JOB_TMP_DIR}\n+ --rm-tmpdir\n+ --use-conda-off\n+ ]]></command>\n+ <inputs>\n+ <param argument="--db-dir" type="select" label="Reference database">\n+ <options from_data_table="virsorter">\n+ <validator type="no_options" message="Built-in reference data is not available. Contact the Galaxy admin." />\n+ </options>\n+ </param>\n+ <param argument="--seqfile" type="data" format="fasta,fasta.gz,fasta.bz2,fastqsanger,fastqsanger.gz,fastqsanger.bz2" label="Sequences" help="" />\n+ <param argument="--include-groups" type="select" multiple="true" optional="false" label="Viral groups" help="Classifiers for these groups will be used">\n+ <option value="dsDNAphage" selected="true">dsDNAphage</option>\n+ <option value="NCLDV">NCLDV</option>\n+ <option value="RNA">RNA</option>\n+ <option value="ssDNA" selected="true">ssDNA</option>\n+ <option value="lavidavi'..b'A fasta sequence.\n+\n+The default score cutoff (0.5) works well known viruses (RefSeq). 
For the real environmental data, we can expect to get false positives (non-viral) with the default cutoff. Generally, samples with more host (e.g. bulk metaG) and unknown sequences (e.g. soil) tends to have more false positives. We find a score cutoff of 0.9 work well as a cutoff for high confidence hits, but there are also many viral hits with score <0.9. It\'s difficult to separate the viral and non-viral hits by score alone. So we recommend using the default score cutoff (0.5) for maximal sensitivity and then applying a quality checking step using checkV. Here is a tutorial of [viral identification SOP](https://www.protocols.io/view/viral-sequence-identification-sop-with-virsorter2-btv8nn9w) used in Sullivan Lab.\n+\n+**Output**\n+\n+identified viral **sequences**, including the following types:\n+\n+- full sequences identified as viral (identified with suffix ``||full``);\n+- partial sequences identified as viral (identified with suffix ``||{i}_partial``); here ``{i}`` can be numbers starting from 0 to max number of viral fragments found in that contig;\n+- short (less than two genes) sequences with hallmark genes identified as viral (identified with suffix ``||lt2gene``);\n+\n+Note that suffix `||full`, `||lt2gene` and `||{i}_partial` have been added to original sequence names to differentiate sub-sequences in case of multiple viral subsequences found in one contig. Partial sequences can be treated as proviruses since they are extracted from longer host sequences. Full sequences, however, can be proviruses or free virus since it can be a short fragment sequenced from a provirus region. Moreover, "full" sequences are just sequences with strong viral signal as a whole ("nearly full" is more accurate). They might be trimmed due to partial gene overhang at ends, duplicate segments from circular genomes, and an end trimming step for all identified viral sequences to find the optimal viral segments (longest within 95% of peak score by default). 
Again, the "full" sequences trimmed by the end trimming step should not be interpreted as provirus, since genes that have low impact on score, such as unknown gene or genes shared by host and virus, could be trimmed. If you prefer the full sequences (ending with ||full) not to be trimmed and leave it to specialized tools such as checkV, you can use `--keep-original-seq` option.\n+\n+**Scores**: This table can be used for further screening of results. It includes the following columns:\n+\n+- sequence name\n+- score of each viral sequences across groups (multiple columns)\n+- max score across groups\n+- max score group\n+- contig length\n+- hallmark gene count\n+- viral gene %\n+- nonviral gene %\n+\n+**Boundary** information: This is a intermediate file that \n+1) might have extra records compared to other two files and should be ignored;\n+2) do not include the viral sequences with < 2 gene but have >= 1 hallmark gene;\n+3) the group and trim_pr are intermediate results and might not match the max_group and max_score respectively in the Scores output.\n+Only some of the columns in this file might be useful:\n+\n+- seqname: original sequence name\n+- trim_orf_index_start, trim_orf_index_end: start and end ORF index on orignal sequence of identified viral sequence\n+- trim_bp_start, trim_bp_end: start and end position on orignal sequence of identified viral sequence\n+- trim_pr: score of final trimmed viral sequence\n+- partial: full sequence as viral or partial sequence as viral; this is defined when a full sequence has score > score cutoff, it is full (0), or else any viral sequence extracted within it is partial (1)\n+- pr_full: score of the original sequence\n+- hallmark_cnt: hallmark gene count\n+- group: the classifier of viral group that gives high score; this should NOT be used as reliable classification\n+ ]]></help>\n+ <citations>\n+ <citation type="doi">10.1186/s40168-020-00990-y</citation>\n+ </citations>\n+</tool>\n' |
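Reviewer sketch (not part of the changeset): the help text above describes three suffixes appended to sequence names, `||full`, `||{i}_partial` and `||lt2gene`. For downstream screening it can help to split an output name back into the original contig name and its category. The snippet below is a minimal illustration of that naming rule as described in the help text; the function name split_seqname and the example contig names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the three suffix forms described in the tool help.
_SUFFIX = re.compile(
    r"^(?P<name>.*)\|\|(?:(?P<full>full)|(?P<lt2gene>lt2gene)|(?P<idx>\d+)_partial)$"
)


def split_seqname(seqname: str) -> Tuple[str, str, Optional[int]]:
    """Return (original name, category, fragment index or None)."""
    match = _SUFFIX.match(seqname)
    if match is None:
        raise ValueError(f"no recognised suffix in {seqname!r}")
    name = match.group("name")
    if match.group("full"):
        return name, "full", None
    if match.group("lt2gene"):
        return name, "lt2gene", None
    # Partial sequences carry the index of the viral fragment in the contig.
    return name, "partial", int(match.group("idx"))


# Hypothetical sequence names used only for illustration.
full_hit = split_seqname("contig_1||full")
partial_hit = split_seqname("contig_2||0_partial")
short_hit = split_seqname("contig_3||lt2gene")
```

Only the last `||` in a name is treated as the separator, so original names that themselves contain `||` are kept intact. Names without one of the three suffixes raise ValueError.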