Repository 'vapper'
hg clone https://toolshed.g2.bx.psu.edu/repos/johnheap/vapper

Changeset 4:8f6469ffef85 (2018-07-19)
Previous changeset 3:4432e4183ebd (2018-07-11) Next changeset 5:7f3cfd8d114c (2019-06-03)
Commit message:
planemo upload for repository https://github.com/johnheap/VAPPER-Galaxy
modified:
Tryp_T.py
added:
LICENSE.md
README.md
test-data/Test.fa
test-data/test_html.html
diff -r 4432e4183ebd -r 8f6469ffef85 LICENSE.md
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/LICENSE.md Thu Jul 19 06:38:58 2018 -0400
@@ -0,0 +1,16 @@
+ * Copyright 2018 University of Liverpool
+ * Author John Heap, Computational Biology Facility, UoL
+ * Based on original scripts of Sara Silva Silva Pereira, Institute of Infection and Global Health, UoL
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
\ No newline at end of file
diff -r 4432e4183ebd -r 8f6469ffef85 README.md
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.md Thu Jul 19 06:38:58 2018 -0400
@@ -0,0 +1,22 @@
+This is the repository associated with the VAPPER Galaxy Tool available in the Galaxy Toolshed.
+VAPPER accurately quantifies the variant antigen diversity in Trypanosoma congolense isolates or the variable COG presence in T. vivax isolates.
+
+
+The Trypanosoma congolense variant antigen repertoire is divided into 15 clades or phylotypes. These phylotypes are present in any T. congolense isolate, but their relative abundance varies between strains. The purpose of the VAPPER is to accurately quantify antigen diversity in any T. congolense isolate by calculating the relative frequency of each phylotype. 
+
+The Galaxy VAPPER Tool has three modes. 
+1) T.congolense Genomic:
+This takes raw NGS reads (or pre-assembled contigs) as input, assembles them de novo, searches for evidence of each phylotype based on hidden Markov models (HMM), and calculates their relative abundances. 
+The results are visualized in three different ways: a table with each phylotype and their relative frequencies as proportions of the full repertoire in the given genome; a heat map with dendrogram showing either absolute VAP variation or deviation from the mean, using our pilot dataset; and a Principal Component Analysis (PCA) plot showing variation distribution in the given sample compared to our pilot dataset. 
+
+2) T.congolense Transcriptomic:
+This requires NGS paired reads and uses bowtie2 and samtools for read mapping and processing, cufflinks for transcript abundance estimation, and hmmer for sequence identification. The output is a stacked bar chart and a table of frequencies based on the transcript abundances.
+
+3) T.vivax clusters of orthologs
+The approach for T. vivax relies on the presence/absence of clusters of orthologs (COGs). It requires velvet for the genome assembly and blast. It receives paired sequencing reads in fastq format (or a contig file if already assembled) and the output is a binary matrix of the presence/absence of each COG/gene for a given sample. Within the tool there is a database of 28 isolates that are used as a comparison, producing a heatmap and dendrogram.
+
+
+
+
+
+
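The README above describes the genomic mode as calculating the relative frequency of each phylotype as a proportion of the full repertoire. A minimal sketch of that arithmetic follows. The function name `relative_frequencies` and the hit counts are hypothetical and are not taken from the VAPPER source or its data.

```python
def relative_frequencies(counts):
    """Return each phylotype's share of the total hit count.

    `counts` maps a phylotype name to a non-negative number of hits.
    """
    total = sum(counts.values())
    if total == 0:
        # No hits at all: report zero for every phylotype instead of dividing by zero.
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}


# Hypothetical hit counts for three phylotypes (illustration only).
example_counts = {"phy1": 30, "phy2": 50, "phy3": 20}
example_freqs = relative_frequencies(example_counts)
```

When at least one hit is present, the returned frequencies sum to 1, which is what allows them to be compared between isolates.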
diff -r 4432e4183ebd -r 8f6469ffef85 Tryp_T.py
--- a/Tryp_T.py Wed Jul 11 08:58:14 2018 -0400
+++ b/Tryp_T.py Thu Jul 19 06:38:58 2018 -0400
@@ -89,7 +89,7 @@
     if strain == "Tc148":
         refName = dir_path + "/data/Reference/148_prot.fasta"
     if strain == "IL3000":
-        refName = dir_path + "data/Reference/IL3000_prot.fasta"
+        refName = dir_path + "/data/Reference/IL3000_prot.fasta"
 
     cuff_df = pd.read_csv(inputName+".cuff/genes.fpkm_tracking", sep='\t')
     cuff_df = cuff_df[(cuff_df['FPKM'] > 0)]
@@ -124,6 +124,8 @@
                         while line[0] != '>':
                             outfile.write(line)
                             line=ref.readline()
+                            if not line:
+                                break;
                 else:
                     line = ref.readline()
             else:
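The two hunks above fix a missing path separator and a read loop that indexed `line[0]` after `readline()` had returned an empty string at end of file. The sketch below shows one way to avoid both classes of error. It is not the code in Tryp_T.py: `reference_path` and `copy_record` are hypothetical helpers, and only the two reference file names are taken from the diff.

```python
import io
import os

# File names as they appear in the diff above; the mapping itself is illustrative.
REFERENCE_FILES = {"Tc148": "148_prot.fasta", "IL3000": "IL3000_prot.fasta"}


def reference_path(dir_path, strain):
    """Build the reference path with os.path.join so no separator can be forgotten."""
    return os.path.join(dir_path, "data", "Reference", REFERENCE_FILES[strain])


def copy_record(ref, wanted, outfile):
    """Copy the first FASTA record whose header contains `wanted`.

    readline() returns '' at end of file, so every loop tests the line
    for truthiness before looking at its first character.
    """
    line = ref.readline()
    while line:
        if line.startswith(">") and wanted in line:
            outfile.write(line)
            line = ref.readline()
            # Stop at the next header or at end of file, whichever comes first.
            while line and not line.startswith(">"):
                outfile.write(line)
                line = ref.readline()
            return True
        line = ref.readline()
    return False


# Usage with in-memory files; the wanted record is the last one in the input.
ref = io.StringIO(">seq1\nAAAA\n>seq2\nCCCC\nGGGG\n")
out = io.StringIO()
found = copy_record(ref, "seq2", out)
```

Because the wanted record is the last one in the input, the inner loop ends on the empty string returned at end of file, which is the case the added `if not line: break` guards against.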
diff -r 4432e4183ebd -r 8f6469ffef85 test-data/Test.fa
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/Test.fa Thu Jul 19 06:38:58 2018 -0400
@@ -0,0 +1,672922 @@
+>NODE_1_length_170_cov_8.364706
+TACAGAGGCGAACGAGTTGTGCTCACTTAGCTCAATAAAGTGAATTGAATTCATGAGTGC
+ACTAATATGGGAATGAAATAGGAGAATAGATGCATTGGATGCTGCTTCTTTTTTTTACCC
+TTAAGAAAAGTTTGTGGTGATCAGAGTGTATATACCTGTCTTACCCTGGAGGTGATAGTT
+TCATATTAATCTCTTGATAATGTGCATAACATTTCTCCATTACTAAGCTG
[...]
+>NODE_146912_length_192_cov_6.156250
+CGAAGCAGAGAAGCGTGCCGCTGCACTTCGTGAGAAACGTGGGCTTACCATCGCATTCCC
+GCCTTAAACATAAATCCGGCTGCTTTTTCAGTGTGCTTGTAGAACTTATTCCAATTTCGC
+TTCTCGAAGCTGTGATGTGATCCACCCAGCAGGAGACTACGTGCTGTTGCAGTGTGCTGC
+AGGACTGTTACGGGGGGGACTATATGGTGGTGGATGTACATGATGGGGGAACGTGTGGCA
+ACACAGGGGGAA
diff -r 4432e4183ebd -r 8f6469ffef85 test-data/test_html.html
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/test_html.html Thu Jul 19 06:38:58 2018 -0400
@@ -0,0 +1,1 @@
+<html><title>T.congolense VAP</title><body><div style='text-align:center'><h2>Trypanosoma Variant Antigen Profile</h2><h3>test_html<br>Genomic Analysis</h3></p>Relative abundance and the deviation from the mean of the 15 phylotypes within the variant repertoire. <style> table, th, tr, td {border: 1px solid black; border-collapse: collapse;}</style><table style='width:50%;margin-left:25%;text-align:center'><tr><th>Phylotype</th><th>Relative Frequency</th><th>Deviation from Mean</th></tr><tr><td>phy1</td><td>0.1117</td><td>0.0008</td></tr><tr><td>phy2</td><td>0.1036</td><td>0.0073</td></tr><tr><td>phy3</td><td>0.0785</td><td>-0.0012</td></tr><tr><td>phy4</td><td>0.0111</td><td>-0.0045</td></tr><tr><td>phy5</td><td>0.0543</td><td>-0.0137</td></tr><tr><td>phy6</td><td>0.0563</td><td>-0.0162</td></tr><tr><td>phy7</td><td>0.0734</td><td>0.0217</td></tr><tr><td>phy8</td><td>0.0161</td><td>0.0079</td></tr><tr><td>phy9</td><td>0.0111</td><td>-0.0031</td></tr><tr><td>phy10</td><td>0.0282</td><td>-0.0011</td></tr><tr><td>phy11</td><td>0.1268</td><td>-0.0131</td></tr><tr><td>phy12</td><td>0.0584</td><td>0.0037</td></tr><tr><td>phy13</td><td>0.0624</td><td>-0.0094</td></tr><tr><td>phy14</td><td>0.0372</td><td>0.0056</td></tr><tr><td>phy15</td><td>0.1710</td><td>0.0152</td></tr></table><p> <h3>The Variation Heat Map and Dendogram</h3> The absolute phylotype variation in the sample compared to model dataset.</p><img src = 'heatmap.png' alt='Variation Heatmap' style='max-width:100%'><br><br><p><h3>The Deviation Heat Map and Dendogram</h3>The phylotype variation expressed as the deviation from your sample mean compared to the model dataset</p><img src = 'dheatmap.png' alt='Deviation Heatmap' style='max-width:100%'><br><br><p><h3>The Variation PCA plot</h3>PCA analysis corresponding to absolute variation. Colour coded according to location</p><img src = 'vapPCA.png' alt='PCA Analysis' style='max-width:100%'><br><br></div></body></html>
\ No newline at end of file