Mercurial > repos > iuc > vsnp_determine_ref_from_data
changeset 0:12f2b14549f6 draft
"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/vsnp commit 524a39e08f2bea8b8754284df606ff8dd27ed24b"
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/macros.xml Wed Dec 02 09:11:24 2020 +0000
@@ -0,0 +1,30 @@
+<?xml version='1.0' encoding='UTF-8'?>
+<macros>
+    <token name="@WRAPPER_VERSION@">1.0</token>
+    <token name="@PROFILE@">19.09</token>
+    <xml name="param_input_type">
+        <param name="input_type" type="select" label="Choose the category of the files to be analyzed">
+            <option value="single" selected="true">Single files</option>
+            <option value="collection">Collections of files</option>
+        </param>
+    </xml>
+    <xml name="param_reference_source">
+        <param name="reference_source" type="select" label="Choose the source for the reference genome">
+            <option value="cached" selected="true">locally cached</option>
+            <option value="history">from history</option>
+        </param>
+    </xml>
+    <xml name="citations">
+        <citations>
+            <citation type="bibtex">
+                @misc{None,
+                journal = {None},
+                author = {1. Stuber T},
+                title = {Manuscript in preparation},
+                year = {None},
+                url = {https://github.com/USDA-VS/vSNP},}
+            </citation>
+        </citations>
+    </xml>
+</macros>
+
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/Mbovis-01D6_avg_mq.json Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,1 @@ +{"name":null,"index":["NC_002945.4:1057","NC_002945.4:4480","NC_002945.4:8741","NC_002945.4:29061","NC_002945.4:33788","NC_002945.4:41228","NC_002945.4:41437","NC_002945.4:50470","NC_002945.4:59861","NC_002945.4:69913","NC_002945.4:70082","NC_002945.4:70438","NC_002945.4:79918","NC_002945.4:96244","NC_002945.4:110198","NC_002945.4:114965","NC_002945.4:117800","NC_002945.4:127447","NC_002945.4:130166","NC_002945.4:130237","NC_002945.4:140686","NC_002945.4:143799","NC_002945.4:144992","NC_002945.4:148871","NC_002945.4:159370","NC_002945.4:160535","NC_002945.4:165799","NC_002945.4:166696","NC_002945.4:179885","NC_002945.4:189083","NC_002945.4:192177","NC_002945.4:198890","NC_002945.4:223919","NC_002945.4:230661","NC_002945.4:232188","NC_002945.4:295519","NC_002945.4:299636","NC_002945.4:304339","NC_002945.4:319911","NC_002945.4:332124","NC_002945.4:332128","NC_002945.4:332144","NC_002945.4:332145","NC_002945.4:332154","NC_002945.4:332215","NC_002945.4:332218","NC_002945.4:333010","NC_002945.4:340088","NC_002945.4:340090","NC_002945.4:340091","NC_002945.4:340092","NC_002945.4:340097","NC_002945.4:362818","NC_002945.4:364560","NC_002945.4:364804","NC_002945.4:366022","NC_002945.4:407246","NC_002945.4:430077","NC_002945.4:438482","NC_002945.4:441762","NC_002945.4:449922","NC_002945.4:452398","NC_002945.4:460722","NC_002945.4:467343","NC_002945.4:467402","NC_002945.4:479644","NC_002945.4:483845","NC_002945.4:485584","NC_002945.4:488897","NC_002945.4:490878","NC_002945.4:507929","NC_002945.4:518522","NC_002945.4:519412","NC_002945.4:541571","NC_002945.4:544180","NC_002945.4:577068","NC_002945.4:598704","NC_002945.4:600207","NC_002945.4:611077","NC_002945.4:622386","NC_002945.4:641896","NC_002945.4:642875","NC_002945.4:644245","NC_002945.4:649910","NC_002945.4:652349","NC_002945.4:673880","NC_002945.4:680416","NC_002945.4:685069","NC_002945.4:7
01329","NC_002945.4:701386","NC_002945.4:712319","NC_002945.4:723170","NC_002945.4:726979","NC_002945.4:737636","NC_002945.4:738102","NC_002945.4:745507","NC_002945.4:760347","NC_002945.4:792617","NC_002945.4:804997","NC_002945.4:808601","NC_002945.4:811737","NC_002945.4:812709","NC_002945.4:828003","NC_002945.4:832093","NC_002945.4:833960","NC_002945.4:839308","NC_002945.4:843812","NC_002945.4:854043","NC_002945.4:865821","NC_002945.4:870116","NC_002945.4:884432","NC_002945.4:889897","NC_002945.4:905912","NC_002945.4:917766","NC_002945.4:920753","NC_002945.4:941068","NC_002945.4:942431","NC_002945.4:943719","NC_002945.4:946102","NC_002945.4:948022","NC_002945.4:948811","NC_002945.4:948974","NC_002945.4:965529","NC_002945.4:967989","NC_002945.4:973459","NC_002945.4:974604","NC_002945.4:976327","NC_002945.4:982301","NC_002945.4:990611","NC_002945.4:998183","NC_002945.4:998196","NC_002945.4:1018313","NC_002945.4:1021422","NC_002945.4:1034434","NC_002945.4:1036102","NC_002945.4:1036530","NC_002945.4:1096802","NC_002945.4:1104019","NC_002945.4:1104291","NC_002945.4:1124266","NC_002945.4:1137800","NC_002945.4:1139489","NC_002945.4:1159390","NC_002945.4:1160992","NC_002945.4:1168458","NC_002945.4:1186381","NC_002945.4:1190076","NC_002945.4:1190080","NC_002945.4:1190084","NC_002945.4:1191092","NC_002945.4:1199529","NC_002945.4:1199530","NC_002945.4:1199951","NC_002945.4:1206896","NC_002945.4:1212203","NC_002945.4:1213847","NC_002945.4:1214540","NC_002945.4:1224899","NC_002945.4:1230875","NC_002945.4:1244746","NC_002945.4:1259250","NC_002945.4:1264712","NC_002945.4:1295457","NC_002945.4:1312836","NC_002945.4:1314197","NC_002945.4:1333537","NC_002945.4:1335092","NC_002945.4:1341613","NC_002945.4:1383731","NC_002945.4:1405922","NC_002945.4:1412824","NC_002945.4:1412828","NC_002945.4:1412885","NC_002945.4:1412893","NC_002945.4:1421904","NC_002945.4:1442194","NC_002945.4:1467394","NC_002945.4:1470606","NC_002945.4:1479827","NC_002945.4:1481327","NC_002945.4:1484942","NC_002945.
4:1492328","NC_002945.4:1498639","NC_002945.4:1501932","NC_002945.4:1509487","NC_002945.4:1517866","NC_002945.4:1524526","NC_002945.4:1529147","NC_002945.4:1533175","NC_002945.4:1535299","NC_002945.4:1535303","NC_002945.4:1535366","NC_002945.4:1536267","NC_002945.4:1547426","NC_002945.4:1568090","NC_002945.4:1584881","NC_002945.4:1591357","NC_002945.4:1594398","NC_002945.4:1597464","NC_002945.4:1597847","NC_002945.4:1600443","NC_002945.4:1619153","NC_002945.4:1619361","NC_002945.4:1625561","NC_002945.4:1628068","NC_002945.4:1632869","NC_002945.4:1659174","NC_002945.4:1682044","NC_002945.4:1701507","NC_002945.4:1711760","NC_002945.4:1716413","NC_002945.4:1717086","NC_002945.4:1720220","NC_002945.4:1741553","NC_002945.4:1762390","NC_002945.4:1790296","NC_002945.4:1799442","NC_002945.4:1803035","NC_002945.4:1817260","NC_002945.4:1828312","NC_002945.4:1833330","NC_002945.4:1863248","NC_002945.4:1871114","NC_002945.4:1880430","NC_002945.4:1894922","NC_002945.4:1896107","NC_002945.4:1915461","NC_002945.4:1915936","NC_002945.4:1920100","NC_002945.4:1932972","NC_002945.4:1941781","NC_002945.4:1954048","NC_002945.4:1957978","NC_002945.4:1958977","NC_002945.4:1961656","NC_002945.4:1974665","NC_002945.4:1989922","NC_002945.4:1996251","NC_002945.4:2002061","NC_002945.4:2007303","NC_002945.4:2010421","NC_002945.4:2020061","NC_002945.4:2021640","NC_002945.4:2024890","NC_002945.4:2027869","NC_002945.4:2035774","NC_002945.4:2036697","NC_002945.4:2049171","NC_002945.4:2057553","NC_002945.4:2059249","NC_002945.4:2059920","NC_002945.4:2075405","NC_002945.4:2078648","NC_002945.4:2093479","NC_002945.4:2096812","NC_002945.4:2099043","NC_002945.4:2118096","NC_002945.4:2121160","NC_002945.4:2137049","NC_002945.4:2138896","NC_002945.4:2145868","NC_002945.4:2163576","NC_002945.4:2204661","NC_002945.4:2210027","NC_002945.4:2239061","NC_002945.4:2257546","NC_002945.4:2267557","NC_002945.4:2268821","NC_002945.4:2283200","NC_002945.4:2283218","NC_002945.4:2283220","NC_002945.4:2283227","NC_00294
5.4:2283235","NC_002945.4:2283236","NC_002945.4:2283350","NC_002945.4:2283353","NC_002945.4:2283355","NC_002945.4:2283362","NC_002945.4:2283366","NC_002945.4:2283367","NC_002945.4:2283368","NC_002945.4:2283371","NC_002945.4:2308525","NC_002945.4:2310215","NC_002945.4:2333994","NC_002945.4:2339770","NC_002945.4:2358298","NC_002945.4:2360219","NC_002945.4:2368982","NC_002945.4:2369407","NC_002945.4:2378324","NC_002945.4:2381437","NC_002945.4:2384647","NC_002945.4:2410761","NC_002945.4:2412437","NC_002945.4:2413021","NC_002945.4:2418267","NC_002945.4:2428397","NC_002945.4:2433602","NC_002945.4:2479007","NC_002945.4:2492067","NC_002945.4:2497022","NC_002945.4:2499336","NC_002945.4:2506199","NC_002945.4:2508626","NC_002945.4:2513801","NC_002945.4:2515130","NC_002945.4:2520576","NC_002945.4:2524942","NC_002945.4:2528517","NC_002945.4:2529413","NC_002945.4:2532958","NC_002945.4:2538021","NC_002945.4:2539896","NC_002945.4:2549198","NC_002945.4:2573831","NC_002945.4:2615591","NC_002945.4:2631265","NC_002945.4:2656304","NC_002945.4:2656651","NC_002945.4:2662768","NC_002945.4:2663582","NC_002945.4:2667489","NC_002945.4:2683485","NC_002945.4:2688315","NC_002945.4:2729845","NC_002945.4:2747797","NC_002945.4:2749502","NC_002945.4:2758761","NC_002945.4:2767533","NC_002945.4:2770129","NC_002945.4:2794510","NC_002945.4:2806603","NC_002945.4:2807510","NC_002945.4:2807511","NC_002945.4:2809255","NC_002945.4:2819758","NC_002945.4:2823105","NC_002945.4:2870414","NC_002945.4:2870624","NC_002945.4:2873027","NC_002945.4:2884747","NC_002945.4:2886118","NC_002945.4:2890220","NC_002945.4:2893045","NC_002945.4:2899163","NC_002945.4:2899584","NC_002945.4:2900525","NC_002945.4:2918203","NC_002945.4:2924775","NC_002945.4:2927134","NC_002945.4:2931071","NC_002945.4:2931113","NC_002945.4:2942926","NC_002945.4:2946800","NC_002945.4:2956778","NC_002945.4:2964207","NC_002945.4:2978162","NC_002945.4:2978164","NC_002945.4:2983580","NC_002945.4:2984156","NC_002945.4:3018593","NC_002945.4:3031841","NC_002
945.4:3039600","NC_002945.4:3040820","NC_002945.4:3042914","NC_002945.4:3045025","NC_002945.4:3053649","NC_002945.4:3053756","NC_002945.4:3063074","NC_002945.4:3068041","NC_002945.4:3069493","NC_002945.4:3070642","NC_002945.4:3088868","NC_002945.4:3093531","NC_002945.4:3098932","NC_002945.4:3100639","NC_002945.4:3103354","NC_002945.4:3106064","NC_002945.4:3106527","NC_002945.4:3116059","NC_002945.4:3127117","NC_002945.4:3137471","NC_002945.4:3140342","NC_002945.4:3151212","NC_002945.4:3154140","NC_002945.4:3172929","NC_002945.4:3173568","NC_002945.4:3191792","NC_002945.4:3247551","NC_002945.4:3250072","NC_002945.4:3250245","NC_002945.4:3270181","NC_002945.4:3294771","NC_002945.4:3295991","NC_002945.4:3297558","NC_002945.4:3304410","NC_002945.4:3304946","NC_002945.4:3306898","NC_002945.4:3310831","NC_002945.4:3319244","NC_002945.4:3330907","NC_002945.4:3338298","NC_002945.4:3347870","NC_002945.4:3368453","NC_002945.4:3371156","NC_002945.4:3396621","NC_002945.4:3396650","NC_002945.4:3413486","NC_002945.4:3414355","NC_002945.4:3421983","NC_002945.4:3422650","NC_002945.4:3439578","NC_002945.4:3451869","NC_002945.4:3453219","NC_002945.4:3460907","NC_002945.4:3464357","NC_002945.4:3464485","NC_002945.4:3464524","NC_002945.4:3468669","NC_002945.4:3476130","NC_002945.4:3482644","NC_002945.4:3484836","NC_002945.4:3486507","NC_002945.4:3493554","NC_002945.4:3495510","NC_002945.4:3497957","NC_002945.4:3533661","NC_002945.4:3546799","NC_002945.4:3553753","NC_002945.4:3564896","NC_002945.4:3567535","NC_002945.4:3574014","NC_002945.4:3574955","NC_002945.4:3591452","NC_002945.4:3600600","NC_002945.4:3622899","NC_002945.4:3624371","NC_002945.4:3626128","NC_002945.4:3630061","NC_002945.4:3645682","NC_002945.4:3655045","NC_002945.4:3667823","NC_002945.4:3712401","NC_002945.4:3718169","NC_002945.4:3718628","NC_002945.4:3719802","NC_002945.4:3723554","NC_002945.4:3725203","NC_002945.4:3729351","NC_002945.4:3751627","NC_002945.4:3769174","NC_002945.4:3776764","NC_002945.4:3778473","NC_0
02945.4:3800223","NC_002945.4:3805467","NC_002945.4:3816878","NC_002945.4:3821259","NC_002945.4:3839650","NC_002945.4:3846859","NC_002945.4:3874432","NC_002945.4:3877448","NC_002945.4:3884519","NC_002945.4:3888418","NC_002945.4:3902781","NC_002945.4:3905690","NC_002945.4:3957298","NC_002945.4:3966140","NC_002945.4:3969490","NC_002945.4:3969558","NC_002945.4:3969875","NC_002945.4:4003460","NC_002945.4:4008509","NC_002945.4:4010760","NC_002945.4:4017319","NC_002945.4:4018300","NC_002945.4:4029201","NC_002945.4:4046572","NC_002945.4:4070056","NC_002945.4:4076594","NC_002945.4:4077189","NC_002945.4:4080736","NC_002945.4:4096612","NC_002945.4:4128841","NC_002945.4:4130927","NC_002945.4:4149101","NC_002945.4:4155870","NC_002945.4:4159272","NC_002945.4:4160820","NC_002945.4:4162407","NC_002945.4:4162554","NC_002945.4:4180986","NC_002945.4:4205111","NC_002945.4:4207380","NC_002945.4:4214259","NC_002945.4:4219009","NC_002945.4:4222196","NC_002945.4:4226875","NC_002945.4:4231626","NC_002945.4:4245762","NC_002945.4:4251588","NC_002945.4:4264139","NC_002945.4:4278315","NC_002945.4:4281136","NC_002945.4:4282825","NC_002945.4:4293932","NC_002945.4:4298964","NC_002945.4:4303164","NC_002945.4:4311425","NC_002945.4:4321337","NC_002945.4:4339036","NC_002945.4:4347304","NC_002945.4:228109","NC_002945.4:331051","NC_002945.4:331241","NC_002945.4:331411","NC_002945.4:960995","NC_002945.4:997676","NC_002945.4:1005705","NC_002945.4:1348342","NC_002945.4:1723583","NC_002945.4:1961826","NC_002945.4:3373966","NC_002945.4:3941254","NC_002945.4:4236320","NC_002945.4:1277988","NC_002945.4:1382465","NC_002945.4:1463503","NC_002945.4:1704859","NC_002945.4:1806623","NC_002945.4:1911237","NC_002945.4:3942270"],"data":[60,60,60,59,60,59,60,59,59,60,60,59,60,59,60,60,59,59,60,60,60,60,59,59,59,59,60,59,60,59,60,60,60,60,60,59,60,60,59,59,59,59,59,59,59,59,59,57,57,57,57,57,58,59,60,60,60,59,59,60,59,59,60,59,60,60,59,60,60,59,59,59,59,60,60,59,59,59,59,60,60,60,60,60,59,60,59,60,60,60,60,59,59,60,59,5
9,59,59,59,59,59,60,60,59,58,60,59,60,59,59,59,59,59,60,59,59,60,59,59,59,60,59,59,59,60,60,60,59,59,60,60,60,59,60,59,59,55,60,60,60,59,59,60,59,60,60,52,55,56,59,59,59,60,59,60,60,59,59,60,60,59,59,60,59,59,59,60,59,60,60,59,59,56,56,59,60,59,58,60,59,59,60,59,59,59,59,59,60,59,58,57,57,60,59,60,60,59,60,59,60,59,59,59,60,59,59,60,60,60,59,60,60,60,60,59,60,59,59,60,60,59,59,59,60,60,59,60,59,60,60,59,59,59,59,60,60,59,59,59,59,59,59,59,59,59,60,60,59,59,60,60,60,59,60,59,59,60,60,59,59,59,59,59,60,60,60,59,60,59,59,59,59,59,59,60,60,60,60,60,60,60,60,59,60,59,60,60,60,59,59,60,59,60,60,60,60,59,60,60,59,60,60,59,59,60,60,60,59,60,60,59,59,60,59,60,60,59,59,60,60,60,59,60,59,59,59,59,60,60,59,59,59,60,60,60,59,59,60,60,59,60,60,60,59,59,59,60,59,59,60,59,59,60,60,60,60,59,60,60,60,59,60,59,59,60,60,60,60,59,60,60,60,59,59,60,59,60,59,59,59,59,59,60,59,60,59,60,59,60,59,59,60,60,60,60,59,59,59,60,60,60,60,58,60,59,60,59,59,60,60,60,59,59,59,59,59,59,60,59,59,60,60,60,59,60,60,59,60,59,60,60,60,60,60,60,60,59,60,59,59,59,59,59,59,59,59,59,59,59,59,60,60,60,60,60,59,60,60,59,59,59,59,59,59,59,60,60,59,60,59,60,60,60,59,60,60,59,59,60,60,59,59,59,60,59,59,59,60,59,59,60,60,59,60,59,60,60,60,59,59,59,60,60,60,59,60,59,59,59,60,59,59,59,60,60,60,59,59,60,60,60,59,59,59,60,56,60,60,59,60,60,60]} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/Mbovis-01D6_snps.json Wed Dec 02 09:11:24 2020 +0000
@@ -0,0 +1,1 @@
+{"columns":["NC_002945.4:1005705","NC_002945.4:1348342","NC_002945.4:1382465","NC_002945.4:1463503","NC_002945.4:1704859","NC_002945.4:1723583","NC_002945.4:1911237","NC_002945.4:1961826","NC_002945.4:228109","NC_002945.4:2412437","NC_002945.4:2413021","NC_002945.4:3069493","NC_002945.4:3319244","NC_002945.4:3373966","NC_002945.4:3413486","NC_002945.4:3941254","NC_002945.4:3942270","NC_002945.4:4236320","NC_002945.4:4278315","NC_002945.4:960995","NC_002945.4:997676"],"index":["SRR1792265_zc","SRR1792272_zc","SRR1792271_zc","SRR8073662_zc","SRR1791772_zc","SRR1791698_zc_vcf","root"],"data":[["C","G","G","A","C","G","C","G","C","R","C","A","C","G","A","G","A","G","T","T","C"],["G","A","G","A","C","A","C","C","T","A","T","C","A","A","G","A","A","A","C","G","T"],["G","A","G","A","C","A","C","C","T","A","T","C","A","A","G","A","A","A","C","G","T"],["G","A","G","A","C","G","C","C","T","A","T","C","A","G","G","G","A","G","C","G","T"],["G","A","C","G","T","G","C","C","T","A","T","C","A","G","G","G","A","G","C","G","T"],["G","A","G","A","C","G","T","C","T","A","T","C","A","G","G","G","C","G","C","G","T"],["C","G","G","A","C","G","C","G","C","G","T","C","A","G","G","G","A","G","C","T","C"]]}
\ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/Mbovis-01D6_snps.newick Wed Dec 02 09:11:24 2020 +0000
@@ -0,0 +1,1 @@
+(root,((((SRR1792271_zc,SRR1792272_zc),SRR1791772_zc),SRR8073662_zc),SRR1791698_zc_vcf),SRR1792265_zc);
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/Mbovis-01D_avg_mq.json Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,1 @@ +{"name":null,"index":["NC_002945.4:1057","NC_002945.4:4480","NC_002945.4:8741","NC_002945.4:29061","NC_002945.4:33788","NC_002945.4:41228","NC_002945.4:41437","NC_002945.4:50470","NC_002945.4:59861","NC_002945.4:69913","NC_002945.4:70082","NC_002945.4:70438","NC_002945.4:79918","NC_002945.4:96244","NC_002945.4:110198","NC_002945.4:114965","NC_002945.4:117800","NC_002945.4:127447","NC_002945.4:130166","NC_002945.4:130237","NC_002945.4:140686","NC_002945.4:143799","NC_002945.4:144992","NC_002945.4:148871","NC_002945.4:159370","NC_002945.4:160535","NC_002945.4:165799","NC_002945.4:166696","NC_002945.4:179885","NC_002945.4:189083","NC_002945.4:192177","NC_002945.4:198890","NC_002945.4:223919","NC_002945.4:230661","NC_002945.4:232188","NC_002945.4:295519","NC_002945.4:299636","NC_002945.4:304339","NC_002945.4:319911","NC_002945.4:332124","NC_002945.4:332128","NC_002945.4:332144","NC_002945.4:332145","NC_002945.4:332154","NC_002945.4:332215","NC_002945.4:332218","NC_002945.4:333010","NC_002945.4:340088","NC_002945.4:340090","NC_002945.4:340091","NC_002945.4:340092","NC_002945.4:340097","NC_002945.4:362818","NC_002945.4:364560","NC_002945.4:364804","NC_002945.4:366022","NC_002945.4:407246","NC_002945.4:430077","NC_002945.4:438482","NC_002945.4:441762","NC_002945.4:449922","NC_002945.4:452398","NC_002945.4:460722","NC_002945.4:467343","NC_002945.4:467402","NC_002945.4:479644","NC_002945.4:483845","NC_002945.4:485584","NC_002945.4:488897","NC_002945.4:490878","NC_002945.4:507929","NC_002945.4:518522","NC_002945.4:519412","NC_002945.4:541571","NC_002945.4:544180","NC_002945.4:577068","NC_002945.4:598704","NC_002945.4:600207","NC_002945.4:611077","NC_002945.4:622386","NC_002945.4:641896","NC_002945.4:642875","NC_002945.4:644245","NC_002945.4:649910","NC_002945.4:652349","NC_002945.4:673880","NC_002945.4:680416","NC_002945.4:685069","NC_002945.4:70
1329","NC_002945.4:701386","NC_002945.4:712319","NC_002945.4:723170","NC_002945.4:726979","NC_002945.4:737636","NC_002945.4:738102","NC_002945.4:745507","NC_002945.4:760347","NC_002945.4:792617","NC_002945.4:804997","NC_002945.4:808601","NC_002945.4:811737","NC_002945.4:812709","NC_002945.4:828003","NC_002945.4:832093","NC_002945.4:833960","NC_002945.4:839308","NC_002945.4:843812","NC_002945.4:854043","NC_002945.4:865821","NC_002945.4:870116","NC_002945.4:884432","NC_002945.4:889897","NC_002945.4:905912","NC_002945.4:917766","NC_002945.4:920753","NC_002945.4:941068","NC_002945.4:942431","NC_002945.4:943719","NC_002945.4:946102","NC_002945.4:948022","NC_002945.4:948811","NC_002945.4:948974","NC_002945.4:965529","NC_002945.4:967989","NC_002945.4:973459","NC_002945.4:974604","NC_002945.4:976327","NC_002945.4:982301","NC_002945.4:990611","NC_002945.4:998183","NC_002945.4:998196","NC_002945.4:1018313","NC_002945.4:1021422","NC_002945.4:1034434","NC_002945.4:1036102","NC_002945.4:1036530","NC_002945.4:1096802","NC_002945.4:1104019","NC_002945.4:1104291","NC_002945.4:1124266","NC_002945.4:1137800","NC_002945.4:1139489","NC_002945.4:1159390","NC_002945.4:1160992","NC_002945.4:1168458","NC_002945.4:1186381","NC_002945.4:1190076","NC_002945.4:1190080","NC_002945.4:1190084","NC_002945.4:1191092","NC_002945.4:1199529","NC_002945.4:1199530","NC_002945.4:1199951","NC_002945.4:1206896","NC_002945.4:1212203","NC_002945.4:1213847","NC_002945.4:1214540","NC_002945.4:1224899","NC_002945.4:1230875","NC_002945.4:1244746","NC_002945.4:1259250","NC_002945.4:1264712","NC_002945.4:1295457","NC_002945.4:1312836","NC_002945.4:1314197","NC_002945.4:1333537","NC_002945.4:1335092","NC_002945.4:1341613","NC_002945.4:1383731","NC_002945.4:1405922","NC_002945.4:1412824","NC_002945.4:1412828","NC_002945.4:1412885","NC_002945.4:1412893","NC_002945.4:1421904","NC_002945.4:1442194","NC_002945.4:1467394","NC_002945.4:1470606","NC_002945.4:1479827","NC_002945.4:1481327","NC_002945.4:1484942","NC_002945.4
:1492328","NC_002945.4:1498639","NC_002945.4:1501932","NC_002945.4:1509487","NC_002945.4:1517866","NC_002945.4:1524526","NC_002945.4:1529147","NC_002945.4:1533175","NC_002945.4:1535299","NC_002945.4:1535303","NC_002945.4:1535366","NC_002945.4:1536267","NC_002945.4:1547426","NC_002945.4:1568090","NC_002945.4:1584881","NC_002945.4:1591357","NC_002945.4:1594398","NC_002945.4:1597464","NC_002945.4:1597847","NC_002945.4:1600443","NC_002945.4:1619153","NC_002945.4:1619361","NC_002945.4:1625561","NC_002945.4:1628068","NC_002945.4:1632869","NC_002945.4:1659174","NC_002945.4:1682044","NC_002945.4:1701507","NC_002945.4:1711760","NC_002945.4:1716413","NC_002945.4:1717086","NC_002945.4:1720220","NC_002945.4:1741553","NC_002945.4:1762390","NC_002945.4:1790296","NC_002945.4:1799442","NC_002945.4:1803035","NC_002945.4:1817260","NC_002945.4:1828312","NC_002945.4:1833330","NC_002945.4:1863248","NC_002945.4:1871114","NC_002945.4:1880430","NC_002945.4:1894922","NC_002945.4:1896107","NC_002945.4:1915461","NC_002945.4:1915936","NC_002945.4:1920100","NC_002945.4:1932972","NC_002945.4:1941781","NC_002945.4:1954048","NC_002945.4:1957978","NC_002945.4:1958977","NC_002945.4:1961656","NC_002945.4:1974665","NC_002945.4:1989922","NC_002945.4:1996251","NC_002945.4:2002061","NC_002945.4:2007303","NC_002945.4:2010421","NC_002945.4:2020061","NC_002945.4:2021640","NC_002945.4:2024890","NC_002945.4:2027869","NC_002945.4:2035774","NC_002945.4:2036697","NC_002945.4:2049171","NC_002945.4:2057553","NC_002945.4:2059249","NC_002945.4:2059920","NC_002945.4:2075405","NC_002945.4:2078648","NC_002945.4:2093479","NC_002945.4:2096812","NC_002945.4:2099043","NC_002945.4:2118096","NC_002945.4:2121160","NC_002945.4:2137049","NC_002945.4:2138896","NC_002945.4:2145868","NC_002945.4:2163576","NC_002945.4:2204661","NC_002945.4:2210027","NC_002945.4:2239061","NC_002945.4:2257546","NC_002945.4:2267557","NC_002945.4:2268821","NC_002945.4:2283200","NC_002945.4:2283218","NC_002945.4:2283220","NC_002945.4:2283227","NC_002945
.4:2283235","NC_002945.4:2283236","NC_002945.4:2283350","NC_002945.4:2283353","NC_002945.4:2283355","NC_002945.4:2283362","NC_002945.4:2283366","NC_002945.4:2283367","NC_002945.4:2283368","NC_002945.4:2283371","NC_002945.4:2308525","NC_002945.4:2310215","NC_002945.4:2333994","NC_002945.4:2339770","NC_002945.4:2358298","NC_002945.4:2360219","NC_002945.4:2368982","NC_002945.4:2369407","NC_002945.4:2378324","NC_002945.4:2381437","NC_002945.4:2384647","NC_002945.4:2410761","NC_002945.4:2412437","NC_002945.4:2413021","NC_002945.4:2418267","NC_002945.4:2428397","NC_002945.4:2433602","NC_002945.4:2479007","NC_002945.4:2492067","NC_002945.4:2497022","NC_002945.4:2499336","NC_002945.4:2506199","NC_002945.4:2508626","NC_002945.4:2513801","NC_002945.4:2515130","NC_002945.4:2520576","NC_002945.4:2524942","NC_002945.4:2528517","NC_002945.4:2529413","NC_002945.4:2532958","NC_002945.4:2538021","NC_002945.4:2539896","NC_002945.4:2549198","NC_002945.4:2573831","NC_002945.4:2615591","NC_002945.4:2631265","NC_002945.4:2656304","NC_002945.4:2656651","NC_002945.4:2662768","NC_002945.4:2663582","NC_002945.4:2667489","NC_002945.4:2683485","NC_002945.4:2688315","NC_002945.4:2729845","NC_002945.4:2747797","NC_002945.4:2749502","NC_002945.4:2758761","NC_002945.4:2767533","NC_002945.4:2770129","NC_002945.4:2794510","NC_002945.4:2806603","NC_002945.4:2807510","NC_002945.4:2807511","NC_002945.4:2809255","NC_002945.4:2819758","NC_002945.4:2823105","NC_002945.4:2870414","NC_002945.4:2870624","NC_002945.4:2873027","NC_002945.4:2884747","NC_002945.4:2886118","NC_002945.4:2890220","NC_002945.4:2893045","NC_002945.4:2899163","NC_002945.4:2899584","NC_002945.4:2900525","NC_002945.4:2918203","NC_002945.4:2924775","NC_002945.4:2927134","NC_002945.4:2931071","NC_002945.4:2931113","NC_002945.4:2942926","NC_002945.4:2946800","NC_002945.4:2956778","NC_002945.4:2964207","NC_002945.4:2978162","NC_002945.4:2978164","NC_002945.4:2983580","NC_002945.4:2984156","NC_002945.4:3018593","NC_002945.4:3031841","NC_0029
45.4:3039600","NC_002945.4:3040820","NC_002945.4:3042914","NC_002945.4:3045025","NC_002945.4:3053649","NC_002945.4:3053756","NC_002945.4:3063074","NC_002945.4:3068041","NC_002945.4:3069493","NC_002945.4:3070642","NC_002945.4:3088868","NC_002945.4:3093531","NC_002945.4:3098932","NC_002945.4:3100639","NC_002945.4:3103354","NC_002945.4:3106064","NC_002945.4:3106527","NC_002945.4:3116059","NC_002945.4:3127117","NC_002945.4:3137471","NC_002945.4:3140342","NC_002945.4:3151212","NC_002945.4:3154140","NC_002945.4:3172929","NC_002945.4:3173568","NC_002945.4:3191792","NC_002945.4:3247551","NC_002945.4:3250072","NC_002945.4:3250245","NC_002945.4:3270181","NC_002945.4:3294771","NC_002945.4:3295991","NC_002945.4:3297558","NC_002945.4:3304410","NC_002945.4:3304946","NC_002945.4:3306898","NC_002945.4:3310831","NC_002945.4:3319244","NC_002945.4:3330907","NC_002945.4:3338298","NC_002945.4:3347870","NC_002945.4:3368453","NC_002945.4:3371156","NC_002945.4:3396621","NC_002945.4:3396650","NC_002945.4:3413486","NC_002945.4:3414355","NC_002945.4:3421983","NC_002945.4:3422650","NC_002945.4:3439578","NC_002945.4:3451869","NC_002945.4:3453219","NC_002945.4:3460907","NC_002945.4:3464357","NC_002945.4:3464485","NC_002945.4:3464524","NC_002945.4:3468669","NC_002945.4:3476130","NC_002945.4:3482644","NC_002945.4:3484836","NC_002945.4:3486507","NC_002945.4:3493554","NC_002945.4:3495510","NC_002945.4:3497957","NC_002945.4:3533661","NC_002945.4:3546799","NC_002945.4:3553753","NC_002945.4:3564896","NC_002945.4:3567535","NC_002945.4:3574014","NC_002945.4:3574955","NC_002945.4:3591452","NC_002945.4:3600600","NC_002945.4:3622899","NC_002945.4:3624371","NC_002945.4:3626128","NC_002945.4:3630061","NC_002945.4:3645682","NC_002945.4:3655045","NC_002945.4:3667823","NC_002945.4:3712401","NC_002945.4:3718169","NC_002945.4:3718628","NC_002945.4:3719802","NC_002945.4:3723554","NC_002945.4:3725203","NC_002945.4:3729351","NC_002945.4:3751627","NC_002945.4:3769174","NC_002945.4:3776764","NC_002945.4:3778473","NC_00
2945.4:3800223","NC_002945.4:3805467","NC_002945.4:3816878","NC_002945.4:3821259","NC_002945.4:3839650","NC_002945.4:3846859","NC_002945.4:3874432","NC_002945.4:3877448","NC_002945.4:3884519","NC_002945.4:3888418","NC_002945.4:3902781","NC_002945.4:3905690","NC_002945.4:3957298","NC_002945.4:3966140","NC_002945.4:3969490","NC_002945.4:3969558","NC_002945.4:3969875","NC_002945.4:4003460","NC_002945.4:4008509","NC_002945.4:4010760","NC_002945.4:4017319","NC_002945.4:4018300","NC_002945.4:4029201","NC_002945.4:4046572","NC_002945.4:4070056","NC_002945.4:4076594","NC_002945.4:4077189","NC_002945.4:4080736","NC_002945.4:4096612","NC_002945.4:4128841","NC_002945.4:4130927","NC_002945.4:4149101","NC_002945.4:4155870","NC_002945.4:4159272","NC_002945.4:4160820","NC_002945.4:4162407","NC_002945.4:4162554","NC_002945.4:4180986","NC_002945.4:4205111","NC_002945.4:4207380","NC_002945.4:4214259","NC_002945.4:4219009","NC_002945.4:4222196","NC_002945.4:4226875","NC_002945.4:4231626","NC_002945.4:4245762","NC_002945.4:4251588","NC_002945.4:4264139","NC_002945.4:4278315","NC_002945.4:4281136","NC_002945.4:4282825","NC_002945.4:4293932","NC_002945.4:4298964","NC_002945.4:4303164","NC_002945.4:4311425","NC_002945.4:4321337","NC_002945.4:4339036","NC_002945.4:4347304","NC_002945.4:228109","NC_002945.4:331051","NC_002945.4:331241","NC_002945.4:331411","NC_002945.4:960995","NC_002945.4:997676","NC_002945.4:1005705","NC_002945.4:1348342","NC_002945.4:1723583","NC_002945.4:1961826","NC_002945.4:3373966","NC_002945.4:3941254","NC_002945.4:4236320","NC_002945.4:1277988","NC_002945.4:1382465","NC_002945.4:1463503","NC_002945.4:1704859","NC_002945.4:1806623","NC_002945.4:1911237","NC_002945.4:3942270"],"data":[60,60,60,59,60,59,60,59,59,60,60,59,60,59,60,60,59,59,60,60,60,60,59,59,59,59,60,59,60,59,60,60,60,60,60,59,60,60,59,59,59,59,59,59,59,59,59,57,57,57,57,57,58,59,60,60,60,59,59,60,59,59,60,59,60,60,59,60,60,59,59,59,59,60,60,59,59,59,59,60,60,60,60,60,59,60,59,60,60,60,60,59,59,60,59,59
,59,59,59,59,59,60,60,59,58,60,59,60,59,59,59,59,59,60,59,59,60,59,59,59,60,59,59,59,60,60,60,59,59,60,60,60,59,60,59,59,55,60,60,60,59,59,60,59,60,60,52,55,56,59,59,59,60,59,60,60,59,59,60,60,59,59,60,59,59,59,60,59,60,60,59,59,56,56,59,60,59,58,60,59,59,60,59,59,59,59,59,60,59,58,57,57,60,59,60,60,59,60,59,60,59,59,59,60,59,59,60,60,60,59,60,60,60,60,59,60,59,59,60,60,59,59,59,60,60,59,60,59,60,60,59,59,59,59,60,60,59,59,59,59,59,59,59,59,59,60,60,59,59,60,60,60,59,60,59,59,60,60,59,59,59,59,59,60,60,60,59,60,59,59,59,59,59,59,60,60,60,60,60,60,60,60,59,60,59,60,60,60,59,59,60,59,60,60,60,60,59,60,60,59,60,60,59,59,60,60,60,59,60,60,59,59,60,59,60,60,59,59,60,60,60,59,60,59,59,59,59,60,60,59,59,59,60,60,60,59,59,60,60,59,60,60,60,59,59,59,60,59,59,60,59,59,60,60,60,60,59,60,60,60,59,60,59,59,60,60,60,60,59,60,60,60,59,59,60,59,60,59,59,59,59,59,60,59,60,59,60,59,60,59,59,60,60,60,60,59,59,59,60,60,60,60,58,60,59,60,59,59,60,60,60,59,59,59,59,59,59,60,59,59,60,60,60,59,60,60,59,60,59,60,60,60,60,60,60,60,59,60,59,59,59,59,59,59,59,59,59,59,59,59,60,60,60,60,60,59,60,60,59,59,59,59,59,59,59,60,60,59,60,59,60,60,60,59,60,60,59,59,60,60,59,59,59,60,59,59,59,60,59,59,60,60,59,60,59,60,60,60,59,59,59,60,60,60,59,60,59,59,59,60,59,59,59,60,60,60,59,59,60,60,60,59,59,59,60,56,60,60,59,60,60,60]} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/Mbovis-01D_snps.json Wed Dec 02 09:11:24 2020 +0000
@@ -0,0 +1,1 @@
+{"columns":["NC_002945.4:1005705","NC_002945.4:1348342","NC_002945.4:1382465","NC_002945.4:1463503","NC_002945.4:1704859","NC_002945.4:1723583","NC_002945.4:1911237","NC_002945.4:1961826","NC_002945.4:228109","NC_002945.4:2412437","NC_002945.4:2413021","NC_002945.4:3069493","NC_002945.4:3319244","NC_002945.4:3373966","NC_002945.4:3413486","NC_002945.4:3941254","NC_002945.4:3942270","NC_002945.4:4236320","NC_002945.4:4278315","NC_002945.4:960995","NC_002945.4:997676"],"index":["SRR1792265_zc","SRR1792272_zc","SRR1792271_zc","SRR8073662_zc","SRR1791772_zc","SRR1791698_zc_vcf","root"],"data":[["C","G","G","A","C","G","C","G","C","R","C","A","C","G","A","G","A","G","T","T","C"],["G","A","G","A","C","A","C","C","T","A","T","C","A","A","G","A","A","A","C","G","T"],["G","A","G","A","C","A","C","C","T","A","T","C","A","A","G","A","A","A","C","G","T"],["G","A","G","A","C","G","C","C","T","A","T","C","A","G","G","G","A","G","C","G","T"],["G","A","C","G","T","G","C","C","T","A","T","C","A","G","G","G","A","G","C","G","T"],["G","A","G","A","C","G","T","C","T","A","T","C","A","G","G","G","C","G","C","G","T"],["C","G","G","A","C","G","C","G","C","G","T","C","A","G","G","G","A","G","C","T","C"]]}
\ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/Mbovis-01D_snps.newick Wed Dec 02 09:11:24 2020 +0000
@@ -0,0 +1,1 @@
+(root,((((SRR1792271_zc,SRR1792272_zc),SRR1791772_zc),SRR8073662_zc),SRR1791698_zc_vcf),SRR1792265_zc);
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/Mbovis-01_avg_mq.json Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,1 @@ +{"name":null,"index":["NC_002945.4:1057","NC_002945.4:4480","NC_002945.4:8741","NC_002945.4:29061","NC_002945.4:33788","NC_002945.4:41228","NC_002945.4:41437","NC_002945.4:50470","NC_002945.4:59861","NC_002945.4:69913","NC_002945.4:70082","NC_002945.4:70438","NC_002945.4:75274","NC_002945.4:79918","NC_002945.4:96244","NC_002945.4:110198","NC_002945.4:114965","NC_002945.4:117800","NC_002945.4:127447","NC_002945.4:130166","NC_002945.4:130237","NC_002945.4:140686","NC_002945.4:143799","NC_002945.4:144992","NC_002945.4:148871","NC_002945.4:159370","NC_002945.4:160535","NC_002945.4:165799","NC_002945.4:166696","NC_002945.4:179885","NC_002945.4:189083","NC_002945.4:192177","NC_002945.4:198890","NC_002945.4:223919","NC_002945.4:230661","NC_002945.4:232188","NC_002945.4:249090","NC_002945.4:295519","NC_002945.4:299636","NC_002945.4:304339","NC_002945.4:319911","NC_002945.4:332124","NC_002945.4:332128","NC_002945.4:332215","NC_002945.4:332218","NC_002945.4:333010","NC_002945.4:340088","NC_002945.4:340090","NC_002945.4:340091","NC_002945.4:340092","NC_002945.4:340097","NC_002945.4:364560","NC_002945.4:364804","NC_002945.4:366022","NC_002945.4:407246","NC_002945.4:430077","NC_002945.4:438482","NC_002945.4:441762","NC_002945.4:449922","NC_002945.4:452398","NC_002945.4:460722","NC_002945.4:467343","NC_002945.4:467402","NC_002945.4:479644","NC_002945.4:483845","NC_002945.4:485584","NC_002945.4:488897","NC_002945.4:490878","NC_002945.4:507929","NC_002945.4:518522","NC_002945.4:519412","NC_002945.4:541571","NC_002945.4:544180","NC_002945.4:577068","NC_002945.4:598704","NC_002945.4:600207","NC_002945.4:611077","NC_002945.4:622386","NC_002945.4:642172","NC_002945.4:642875","NC_002945.4:644245","NC_002945.4:649910","NC_002945.4:652349","NC_002945.4:680416","NC_002945.4:685069","NC_002945.4:701329","NC_002945.4:701386","NC_002945.4:707522","NC_002945.4:7123
19","NC_002945.4:723170","NC_002945.4:726979","NC_002945.4:737636","NC_002945.4:738102","NC_002945.4:745507","NC_002945.4:760347","NC_002945.4:792617","NC_002945.4:804997","NC_002945.4:808601","NC_002945.4:811737","NC_002945.4:812709","NC_002945.4:828003","NC_002945.4:832093","NC_002945.4:833960","NC_002945.4:843812","NC_002945.4:854043","NC_002945.4:865821","NC_002945.4:870116","NC_002945.4:884432","NC_002945.4:889897","NC_002945.4:905912","NC_002945.4:917766","NC_002945.4:920753","NC_002945.4:941068","NC_002945.4:942431","NC_002945.4:943719","NC_002945.4:946102","NC_002945.4:948022","NC_002945.4:948811","NC_002945.4:948974","NC_002945.4:965529","NC_002945.4:967989","NC_002945.4:973459","NC_002945.4:974604","NC_002945.4:976327","NC_002945.4:982301","NC_002945.4:990611","NC_002945.4:998183","NC_002945.4:998196","NC_002945.4:1018313","NC_002945.4:1021422","NC_002945.4:1034434","NC_002945.4:1036102","NC_002945.4:1036530","NC_002945.4:1096802","NC_002945.4:1104019","NC_002945.4:1104291","NC_002945.4:1124266","NC_002945.4:1137800","NC_002945.4:1139489","NC_002945.4:1159390","NC_002945.4:1160992","NC_002945.4:1168458","NC_002945.4:1186381","NC_002945.4:1191092","NC_002945.4:1199529","NC_002945.4:1199530","NC_002945.4:1199951","NC_002945.4:1206896","NC_002945.4:1212203","NC_002945.4:1214540","NC_002945.4:1224899","NC_002945.4:1230875","NC_002945.4:1244746","NC_002945.4:1259250","NC_002945.4:1264712","NC_002945.4:1295457","NC_002945.4:1312836","NC_002945.4:1314197","NC_002945.4:1333537","NC_002945.4:1335092","NC_002945.4:1341613","NC_002945.4:1383731","NC_002945.4:1405922","NC_002945.4:1412824","NC_002945.4:1412828","NC_002945.4:1412885","NC_002945.4:1412893","NC_002945.4:1421904","NC_002945.4:1442194","NC_002945.4:1462755","NC_002945.4:1467394","NC_002945.4:1470606","NC_002945.4:1479827","NC_002945.4:1481327","NC_002945.4:1484942","NC_002945.4:1492328","NC_002945.4:1498639","NC_002945.4:1501932","NC_002945.4:1509487","NC_002945.4:1517866","NC_002945.4:1524526","NC_002945.
4:1529147","NC_002945.4:1533175","NC_002945.4:1535299","NC_002945.4:1535303","NC_002945.4:1535366","NC_002945.4:1536267","NC_002945.4:1547426","NC_002945.4:1568090","NC_002945.4:1584881","NC_002945.4:1591357","NC_002945.4:1594398","NC_002945.4:1597464","NC_002945.4:1597847","NC_002945.4:1600443","NC_002945.4:1619153","NC_002945.4:1619361","NC_002945.4:1625561","NC_002945.4:1628068","NC_002945.4:1632869","NC_002945.4:1659174","NC_002945.4:1682044","NC_002945.4:1701507","NC_002945.4:1717086","NC_002945.4:1720220","NC_002945.4:1723479","NC_002945.4:1741553","NC_002945.4:1762390","NC_002945.4:1790296","NC_002945.4:1796727","NC_002945.4:1803035","NC_002945.4:1817260","NC_002945.4:1828312","NC_002945.4:1833330","NC_002945.4:1863248","NC_002945.4:1871114","NC_002945.4:1880430","NC_002945.4:1894922","NC_002945.4:1896107","NC_002945.4:1915461","NC_002945.4:1915936","NC_002945.4:1920100","NC_002945.4:1932972","NC_002945.4:1941781","NC_002945.4:1954048","NC_002945.4:1957978","NC_002945.4:1958977","NC_002945.4:1961656","NC_002945.4:1967341","NC_002945.4:1974665","NC_002945.4:2002061","NC_002945.4:2007303","NC_002945.4:2010421","NC_002945.4:2020061","NC_002945.4:2021640","NC_002945.4:2024890","NC_002945.4:2027869","NC_002945.4:2035774","NC_002945.4:2036697","NC_002945.4:2049171","NC_002945.4:2051968","NC_002945.4:2057553","NC_002945.4:2059249","NC_002945.4:2059920","NC_002945.4:2075405","NC_002945.4:2078648","NC_002945.4:2093479","NC_002945.4:2096812","NC_002945.4:2099043","NC_002945.4:2118096","NC_002945.4:2121160","NC_002945.4:2137049","NC_002945.4:2138896","NC_002945.4:2145868","NC_002945.4:2163576","NC_002945.4:2178975","NC_002945.4:2204661","NC_002945.4:2239061","NC_002945.4:2257546","NC_002945.4:2267557","NC_002945.4:2268821","NC_002945.4:2283200","NC_002945.4:2283218","NC_002945.4:2283220","NC_002945.4:2283227","NC_002945.4:2283235","NC_002945.4:2283236","NC_002945.4:2283350","NC_002945.4:2283353","NC_002945.4:2283355","NC_002945.4:2283362","NC_002945.4:2283366","NC_00294
5.4:2283367","NC_002945.4:2283368","NC_002945.4:2283371","NC_002945.4:2308525","NC_002945.4:2310215","NC_002945.4:2333994","NC_002945.4:2339770","NC_002945.4:2358298","NC_002945.4:2360219","NC_002945.4:2368982","NC_002945.4:2369407","NC_002945.4:2378324","NC_002945.4:2381437","NC_002945.4:2384647","NC_002945.4:2410761","NC_002945.4:2412437","NC_002945.4:2418267","NC_002945.4:2428397","NC_002945.4:2429853","NC_002945.4:2433602","NC_002945.4:2479007","NC_002945.4:2492067","NC_002945.4:2497022","NC_002945.4:2499336","NC_002945.4:2506199","NC_002945.4:2508626","NC_002945.4:2513801","NC_002945.4:2515130","NC_002945.4:2520576","NC_002945.4:2524942","NC_002945.4:2528517","NC_002945.4:2529413","NC_002945.4:2532958","NC_002945.4:2538021","NC_002945.4:2539896","NC_002945.4:2549198","NC_002945.4:2573831","NC_002945.4:2615591","NC_002945.4:2631265","NC_002945.4:2656304","NC_002945.4:2656651","NC_002945.4:2662768","NC_002945.4:2663582","NC_002945.4:2667489","NC_002945.4:2683485","NC_002945.4:2688315","NC_002945.4:2729845","NC_002945.4:2747797","NC_002945.4:2749502","NC_002945.4:2758761","NC_002945.4:2767533","NC_002945.4:2770129","NC_002945.4:2794510","NC_002945.4:2806603","NC_002945.4:2807510","NC_002945.4:2807511","NC_002945.4:2809255","NC_002945.4:2819758","NC_002945.4:2823105","NC_002945.4:2870414","NC_002945.4:2870624","NC_002945.4:2873027","NC_002945.4:2886118","NC_002945.4:2890220","NC_002945.4:2893045","NC_002945.4:2899163","NC_002945.4:2899584","NC_002945.4:2900525","NC_002945.4:2918203","NC_002945.4:2924775","NC_002945.4:2927134","NC_002945.4:2931071","NC_002945.4:2931113","NC_002945.4:2942926","NC_002945.4:2946800","NC_002945.4:2956778","NC_002945.4:2964207","NC_002945.4:2978162","NC_002945.4:2978164","NC_002945.4:2983580","NC_002945.4:2984156","NC_002945.4:3018593","NC_002945.4:3031841","NC_002945.4:3039600","NC_002945.4:3040820","NC_002945.4:3042914","NC_002945.4:3043695","NC_002945.4:3045025","NC_002945.4:3053649","NC_002945.4:3053756","NC_002945.4:3063074","NC_002
945.4:3068041","NC_002945.4:3070642","NC_002945.4:3088868","NC_002945.4:3093531","NC_002945.4:3098932","NC_002945.4:3100639","NC_002945.4:3103354","NC_002945.4:3106064","NC_002945.4:3106527","NC_002945.4:3116059","NC_002945.4:3127117","NC_002945.4:3137471","NC_002945.4:3140342","NC_002945.4:3151212","NC_002945.4:3154140","NC_002945.4:3172929","NC_002945.4:3173568","NC_002945.4:3191792","NC_002945.4:3247551","NC_002945.4:3250072","NC_002945.4:3250245","NC_002945.4:3252431","NC_002945.4:3270181","NC_002945.4:3294771","NC_002945.4:3295991","NC_002945.4:3297558","NC_002945.4:3304410","NC_002945.4:3304946","NC_002945.4:3306898","NC_002945.4:3309513","NC_002945.4:3310831","NC_002945.4:3330907","NC_002945.4:3338298","NC_002945.4:3347870","NC_002945.4:3368453","NC_002945.4:3371156","NC_002945.4:3396621","NC_002945.4:3396650","NC_002945.4:3414355","NC_002945.4:3421983","NC_002945.4:3422650","NC_002945.4:3439578","NC_002945.4:3451869","NC_002945.4:3453219","NC_002945.4:3460907","NC_002945.4:3464357","NC_002945.4:3464524","NC_002945.4:3468669","NC_002945.4:3476130","NC_002945.4:3482644","NC_002945.4:3484836","NC_002945.4:3486507","NC_002945.4:3488828","NC_002945.4:3493554","NC_002945.4:3495510","NC_002945.4:3497957","NC_002945.4:3533661","NC_002945.4:3546799","NC_002945.4:3564896","NC_002945.4:3567535","NC_002945.4:3574014","NC_002945.4:3574955","NC_002945.4:3591452","NC_002945.4:3600600","NC_002945.4:3622899","NC_002945.4:3624371","NC_002945.4:3626128","NC_002945.4:3630061","NC_002945.4:3645682","NC_002945.4:3655045","NC_002945.4:3667823","NC_002945.4:3672841","NC_002945.4:3712401","NC_002945.4:3718169","NC_002945.4:3718628","NC_002945.4:3719802","NC_002945.4:3723554","NC_002945.4:3725203","NC_002945.4:3729351","NC_002945.4:3751627","NC_002945.4:3769174","NC_002945.4:3776764","NC_002945.4:3778473","NC_002945.4:3800223","NC_002945.4:3805467","NC_002945.4:3816878","NC_002945.4:3821259","NC_002945.4:3825329","NC_002945.4:3839650","NC_002945.4:3846859","NC_002945.4:3872596","NC_0
02945.4:3874432","NC_002945.4:3877448","NC_002945.4:3884519","NC_002945.4:3888418","NC_002945.4:3902781","NC_002945.4:3905690","NC_002945.4:3957298","NC_002945.4:3966140","NC_002945.4:3969490","NC_002945.4:3969558","NC_002945.4:3969875","NC_002945.4:3993571","NC_002945.4:4003460","NC_002945.4:4008509","NC_002945.4:4010760","NC_002945.4:4017319","NC_002945.4:4017949","NC_002945.4:4018300","NC_002945.4:4029201","NC_002945.4:4046572","NC_002945.4:4052766","NC_002945.4:4070056","NC_002945.4:4076594","NC_002945.4:4077189","NC_002945.4:4080736","NC_002945.4:4096612","NC_002945.4:4128841","NC_002945.4:4130927","NC_002945.4:4149101","NC_002945.4:4155870","NC_002945.4:4159272","NC_002945.4:4160820","NC_002945.4:4162407","NC_002945.4:4162554","NC_002945.4:4180986","NC_002945.4:4205111","NC_002945.4:4207380","NC_002945.4:4214259","NC_002945.4:4219009","NC_002945.4:4222196","NC_002945.4:4226875","NC_002945.4:4231626","NC_002945.4:4245762","NC_002945.4:4264139","NC_002945.4:4281136","NC_002945.4:4282825","NC_002945.4:4298964","NC_002945.4:4303164","NC_002945.4:4311425","NC_002945.4:4321337","NC_002945.4:4339036","NC_002945.4:4347304","NC_002945.4:332144","NC_002945.4:332145","NC_002945.4:332154","NC_002945.4:362818","NC_002945.4:641896","NC_002945.4:673880","NC_002945.4:839308","NC_002945.4:1190076","NC_002945.4:1190080","NC_002945.4:1190084","NC_002945.4:1213847","NC_002945.4:1711760","NC_002945.4:1716413","NC_002945.4:1799442","NC_002945.4:1989922","NC_002945.4:1996251","NC_002945.4:2210027","NC_002945.4:2413021","NC_002945.4:2884747","NC_002945.4:3069493","NC_002945.4:3319244","NC_002945.4:3413486","NC_002945.4:3464485","NC_002945.4:3553753","NC_002945.4:4251588","NC_002945.4:4278315","NC_002945.4:4293932","NC_002945.4:228109","NC_002945.4:331051","NC_002945.4:331241","NC_002945.4:331411","NC_002945.4:960995","NC_002945.4:997676","NC_002945.4:1005705","NC_002945.4:1348342","NC_002945.4:1723583","NC_002945.4:1961826","NC_002945.4:3373966","NC_002945.4:3941254","NC_002945.4:423
6320","NC_002945.4:1277988","NC_002945.4:1382465","NC_002945.4:1463503","NC_002945.4:1704859","NC_002945.4:1806623","NC_002945.4:1911237","NC_002945.4:3942270"],"data":[60,60,60,59,60,59,60,59,59,60,60,59,60,60,59,60,60,59,59,60,60,60,60,59,59,59,59,60,59,60,59,60,60,60,60,60,60,59,60,60,59,59,59,59,59,59,57,57,57,57,57,59,60,60,60,59,59,60,59,59,60,59,60,60,59,60,60,59,59,59,59,60,60,59,59,59,59,60,60,60,60,59,59,59,60,60,60,60,60,59,59,60,59,59,59,59,59,59,59,60,60,59,58,59,60,59,59,59,59,59,60,59,59,60,59,59,59,60,59,59,59,60,60,60,59,59,60,60,60,59,60,59,59,55,60,60,60,59,59,60,59,60,60,59,59,59,60,59,60,59,59,60,60,59,59,60,59,59,59,60,59,60,60,59,59,56,56,59,60,60,59,58,60,59,59,60,59,59,59,59,59,60,59,58,57,56,60,59,60,60,59,60,59,60,59,59,59,60,59,59,59,60,60,60,60,60,60,59,60,60,59,60,60,59,59,59,60,60,59,60,59,60,60,59,59,59,59,60,60,60,59,59,59,59,59,59,59,60,60,59,60,59,60,60,60,59,60,59,59,60,60,59,59,59,59,60,59,60,60,59,60,59,59,59,59,59,59,60,60,60,60,60,60,60,60,59,60,59,60,60,60,59,59,60,59,60,60,60,59,60,60,60,59,60,60,59,59,60,60,60,59,60,60,59,59,60,59,60,60,59,59,60,60,60,59,60,59,59,59,59,60,60,59,59,59,60,60,60,59,59,60,60,59,60,60,59,59,59,60,59,59,60,59,59,60,60,60,60,59,60,60,60,59,60,59,59,60,60,60,60,59,59,60,60,59,59,60,59,60,59,59,59,59,59,60,59,60,59,60,59,60,59,59,60,57,60,60,60,59,59,59,60,60,60,60,58,60,59,60,59,59,60,60,59,59,59,59,59,59,59,59,60,60,60,59,60,60,60,59,60,59,60,60,60,60,60,60,59,60,59,59,59,59,59,60,59,59,59,59,59,59,59,60,60,60,60,60,59,60,60,60,59,59,60,59,59,59,59,59,60,60,59,60,59,60,60,60,60,59,60,60,60,59,59,59,60,60,59,59,59,60,59,59,59,60,59,59,60,60,59,60,59,60,60,60,59,59,60,60,59,59,59,59,60,59,59,59,59,59,58,60,60,60,52,55,56,60,59,60,59,59,59,60,60,60,60,60,60,60,60,59,60,60,59,60,60,60,59,59,60,60,60,59,59,59,60,56,60,60,59,60,60,60]} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/Mbovis-01_snps.json Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,1 @@ +{"columns":["NC_002945.4:1005705","NC_002945.4:1348342","NC_002945.4:1382465","NC_002945.4:1463503","NC_002945.4:1704859","NC_002945.4:1723583","NC_002945.4:1911237","NC_002945.4:1961826","NC_002945.4:228109","NC_002945.4:2412437","NC_002945.4:2413021","NC_002945.4:3069493","NC_002945.4:3319244","NC_002945.4:3373966","NC_002945.4:3413486","NC_002945.4:3941254","NC_002945.4:3942270","NC_002945.4:4236320","NC_002945.4:4278315","NC_002945.4:960995","NC_002945.4:997676"],"index":["SRR1792265_zc","SRR1792272_zc","SRR1792271_zc","SRR8073662_zc","SRR1791772_zc","SRR1791698_zc_vcf","root"],"data":[["C","G","G","A","C","G","C","G","C","R","C","A","C","G","A","G","A","G","T","T","C"],["G","A","G","A","C","A","C","C","T","A","T","C","A","A","G","A","A","A","C","G","T"],["G","A","G","A","C","A","C","C","T","A","T","C","A","A","G","A","A","A","C","G","T"],["G","A","G","A","C","G","C","C","T","A","T","C","A","G","G","G","A","G","C","G","T"],["G","A","C","G","T","G","C","C","T","A","T","C","A","G","G","G","A","G","C","G","T"],["G","A","G","A","C","G","T","C","T","A","T","C","A","G","G","G","C","G","C","G","T"],["C","G","G","A","C","G","C","G","C","G","T","C","A","G","G","G","A","G","C","T","C"]]} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/Mbovis-01_snps.newick Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,1 @@ +(root,((((SRR1792271_zc,SRR1792272_zc),SRR1791772_zc),SRR8073662_zc),SRR1791698_zc_vcf),SRR1792265_zc);
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/NC_002945v4.fasta Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,101 @@ +>NC_002945.4 Mycobacterium bovis AF2122/97 genome assembly, chromosome: Mycobacterium_bovis_AF2122/97 +TTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTTAACGGCGACC +CTAAGGTTGACGACGGACCCAGCAGTGATGCTAATCTCAGCGCTCCGCTGACCCCTCAGCAAAGGGCTTG +GCTCAATCTCGTCCAGCCATTGACCATCGTCGAGGGGTTTGCTCTGTTATCCGTGCCGAGCAGCTTTGTC +CAAAACGAAATCGAGCGCCATCTGCGGGCCCCGATTACCGACGCTCTCAGCCGCCGACTCGGACATCAGA +TCCAACTCGGGGTCCGCATCGCTCCGCCGGCGACCGACGAAGCCGACGACACTACCGTGCCGCCTTCCGA +AAATCCTGCTACCACATCGCCAGACACCACAACCGACAACGACGAGATTGATGACAGCGCTGCGGCACGG +GGCGATAACCAGCACAGTTGGCCAAGTTACTTCACCGAGCGCCCGCGCAATACCGATTCCGCTACCGCTG +GCGTAACCAGCCTTAACCGTCGCTACACCTTTGATACGTTCGTTATCGGCGCCTCCAACCGGTTCGCGCA +CGCCGCCGCCTTGGCGATCGCAGAAGCACCCGCCCGCGCTTACAACCCCCTGTTCATCTGGGGCGAGTCC +GGTCTCGGCAAGACACACCTGCTACACGCGGCAGGCAACTATGCCCAACGGTTGTTCCCGGGAATGCGGG +TCAAATATGTCTCCACCGAGGAATTCACCAACGACTTCATTAACTCGCTCCGCGATGACCGCAAGGTCGC +ATTCAAACGCAGCTACCGCGACGTAGACGTGCTGTTGGTCGACGACATCCAATTCATTGAAGGCAAAGAG +GGTATTCAAGAGGAGTTCTTCCACACCTTCAACACCTTGCACAATGCCAACAAGCAAATCGTCATCTCAT +CTGACCGCCCACCCAAGCAGCTCGCCACCCTCGAGGACCGGCTGAGAACCCGCTTTGAGTGGGGGCTGAT +CACTGACGTACAACCACCCGAGCTGGAGACCCGCATCGCCATCTTGCGCAAGAAAGCACAGATGGAACGG +CTCGCGATCCCCGACGATGTCCTCGAACTCATCGCCAGCAGTATCGAACGCAATATCCGTGAACTCGAGG +GCGCGCTGATCCGGGTCACCGCGTTCGCCTCATTGAACAAAACACCAATCGACAAAGCGCTGGCCGAGAT +TGTGCTTCGCGATCTGATCGCCGACGCCAACACCATGCAAATCAGCGCGGCGACGATCATGGCTGCCACC +GCCGAATACTTCGACACTACCGTCGAAGAGCTTCGCGGGCCCGGCAAGACCCGAGCACTGGCCCAGTCAC +GACAGATTGCGATGTACCTGTGTCGTGAGCTCACCGATCTTTCGTTGCCCAAAATCGGCCAAGCGTTCGG +CCGTGATCACACAACCGTCATGTACGCCCAACGCAAGATCCTGTCCGAGATGGCCGAGCGCCGTGAGGTC +TTTGATCACGTCAAAGAACTCACCACTCGCATCCGTCAGCGCTCCAAGCGCTAGCACGGCGTGTTCTTCC +GACAACGTTCTTAAAAAAACTTCTCTCTCCCAGGTCACACCAGTCACAGAGATTGGCTGTGAGTGTCGCT +GTGCACAAACCGCGCACAGACTCATACAGTCCCGGCGGTTCCGTTCACAACCCACGCCTCATCCCCACCG 
+ACCCAACACACACCCCACAGTCATCGCCACCGTCATCCACAACTCCGACCGACGTCGACCTGCACCAAGA +CCAGACTGTCCCCAAACTGCACACCCTCTAATACTGTTACCGAGATTTCTTCGTCGTTTGTTCTTGGAAA +GACAGCGCTGGGGATCGTTCGCTGGATACCACCCGCATAACTGGCTCGTCGCGGTGGGTCAGAGGTCAAT +GATGAACTTTCAAGTTGACGTGAGAAGCTCTACGGTTGTTGTTCGACTGCTGTTGCGGCCGTCGTGGCGG +GTCACGCGTCATGGGCGTTCGTCGTTGGCAGTCCCCACGCTAGCGGGGCGCTAGCCACGGGATCGAACTC +ATCGTGAGGTGAAAGGGCGCAATGGACGCGGCTACGACAAGAGTTGGCCTCACCGACTTGACGTTTCGTT +TGCTACGAGAGTCTTTCGCCGATGCGGTGTCGTGGGTGGCTAAAAATCTGCCAGCCAGGCCCGCGGTGCC +GGTGCTCTCCGGCGTGTTGTTGACCGGCTCGGACAACGGTCTGACGATTTCCGGATTCGACTACGAGGTT +TCCGCCGAGGCCCAGGTTGGCGCTGAAATTGTTTCTCCTGGAAGCGTTTTAGTTTCTGGCCGATTGTTGT +CCGATATTACCCGGGCGTTGCCTAACAAGCCCGTAGGCGTTCATGTCGAAGGTAACCGGGTCGCATTGAC +CTGCGGTAACGCCAGGTTTTCGCTACCGACGATGCCAGTCGAGGATTATCCGACGCTGCCGACGCTGCCG +GAAGAGACCGGATTGTTGCCTGCGGAATTATTCGCCGAGGCAATCAGTCAGGTCGCTATCGCCGCCGGCC +GGGACGACACGCTGCCTATGTTGACCGGCATCCGGGTCGAAATCCTCGGTGAGACGGTGGTTTTGGCCGC +TACCGACAGGTTTCGCCTGGCTGTTCGAGAACTGAAGTGGTCGGCGTCGTCGCCAGATATCGAAGCGGCT +GTGCTGGTCCCGGCCAAGACGCTGGCCGAGGCCGCCAAAGCGGGCATCGGCGGCTCTGACGTTCGTTTGT +CGTTGGGTACTGGGCCGGGGGTGGGCAAGGATGGCCTGCTCGGTATCAGTGGGAACGGCAAGCGCAGCAC +CACGCGACTTCTTGATGCCGAGTTCCCGAAGTTTCGGCAGTTGCTACCAACCGAACACACCGCGGTGGCC +ACCATGGACGTGGCCGAGTTGATCGAAGCGATCAAGCTGGTTGCGTTGGTAGCTGATCGGGGCGCGCAGG +TGCGCATGGAGTTCGCTGATGGCAGCGTGCGGCTTTCTGCGGGTGCCGATGATGTTGGACGAGCCGAGGA +AGATCTTGTTGTTGACTATGCCGGTGAACCATTGACGATTGCGTTTAACCCAACCTATCTAACGGACGGT +TTGAGTTCGTTGCGCTCGGAGCGAGTGTCTTTCGGGTTTACGACTGCGGGTAAGCCTGCCTTGCTACGTC +CGGTGTCCGGGGACGATCGCCCTGTGGCGGGTCTGAATGGCAACGGTCCGTTCCCGGCGGTGTCGACGGA +CTATGTCTATCTGTTGATGCCGGTTCGGTTGCCGGGCTGAGCACTTGGCGCCCGGGTAGGTGTACGTCCG +TCATTTGGGGCTGCGTGACTTCCGGTCCTGGGCATGTGTAGATCTGGAATTGCATCCAGGGCGGACGGTT +TTTGTTGGGCCTAACGGTTATGGTAAGACGAATCTTATTGAGGCACTGTGGTATTCGACGACGTTAGGTT +CGCACCGCGTTAGCGCCGATTTGCCGTTGATCCGGGTAGGTACCGATCGTGCGGTGATCTCCACGATCGT +GGTGAACGACGGTAGAGAATGTGCCGTCGACCTCGAGATCGCCACGGGGCGAGTCAACAAAGCGCGATTG 
+AATCGATCATCGGTCCGAAGTACACGTGATGTGGTCGGAGTGCTTCGAGCTGTGTTGTTTGCCCCTGAGG +ATCTGGGGTTGGTTCGTGGGGATCCCGCTGACCGGCGGCGCTATCTGGATGATCTGGCGATCGTGCGTAG +GCCTGCGATCGCTGCGGTACGAGCCGAATATGAGAGGGTGGTGCGCCAGCGGACGGCGTTATTGAAGTCC +GTACCTGGAGCACGGTATCGGGGTGACCGGGGTGTGTTTGACACTCTTGAGGTATGGGACAGTCGTTTGG +CGGAGCACGGGGCTGAACTGGTGGCCGCCCGCATCGATTTGGTCAACCAGTTGGCACCGGAAGTGAAGAA +GGCATACCAGCTGTTGGCGCCGGAATCGCGATCGGCGTCTATCGGTTATCGGGCCAGCATGGATGTAACC +GGTCCCAGCGAGCAGTCAGATACCGATCGGCAATTGTTAGCAGCTCGGCTGTTGGCGGCGCTGGCGGCCC +GTCGGGATGCCGAACTCGAGCGTGGGGTTTGTCTAGTTGGTCCGCACCGTGACGACCTAATACTGCGACT +AGGCGATCAACCCGCGAAAGGATTTGCTAGCCATGGGGAGGCGTGGTCGTTGGCGGTGGCACTGCGGTTG +GCGGCCTATCAACTGTTACGCGTTGATGGTGGTGAGCCGGTGTTGTTGCTCGACGACGTGTTCGCCGAAC +TGGATGTCATGCGCCGTCGAGCGTTGGCGACGGCGGCCGAGTCCGCCGAACAGGTGTTGGTGACTGCCGC +GGTGCTCGAGGATATTCCCGCCGGCTGGGACGCCAGGCGGGTGCACATCGATGTGCGTGCCGATGACACC +GGATCGATGTCGGTGGTTCTGCCATGACGGGTTCTGTTGACCGGCCCGACCAGAATCGCGGTGAGCGATT +AATGAAGTCACCAGGGTTGGATTTGGTCAGGCGCACCCTGGACGAAGCTCGTGCTGCTGCCCGCGCGCGC +GGACAAGACGCCGGTCGAGGGCGGGTCGCTTCCGTTGCGTCGGGTCGGGTGGCCGGACGGCGACGAAGCT +GGTCGGGTCCGGGGCCCGACATTCGTGATCCACAACCGCTGGGTAAGGCCGCTCGTGAGCTGGCAAAGAA +ACGCGGCTGGTCGGTGCGGGTCGCCGAGGGTATGGTGCTCGGCCAGTGGTCTGCGGTGGTCGGCCACCAG +ATCGCCGAACATGCACGCCCGACTGCGCTAAACGACGGGGTGTTGAGCGTGATTGCGGAGTCGACGGCGT +GGGCGACGCAGTTGAGGATCATGCAGGCCCAGCTTCTGGCCAAGATCGCCGCAGCGGTTGGCAACGATGT +GGTGCGATCGCTAAAGATCACCGGGCCGGCGGCACCATCGTGGCGCAAGGGGCCTCGCCATATTGCCGGT +AGGGGTCCGCGCGACACCTACGGATAACACGTCGATCGGCCCAGAACAAGGCGCTCCGGTCCCGGCCTGA +GAGCCTCGAGGACGAAGCGGATCCGTATGCCGGACGTCGGGACGCACCAGGAAGAAAGATGTCCGACGCA +CGGCGCGGTTAGATGGGTAAAAACGAGGCCAGAAGATCGGCCCTGGCGCCCGATCACGGTACAGTGGTGT +GCGACCCCCTGCGGCGACTCAACCGCATGCACGCAACCCCTGAGGAGAGTATTCGGATCGTGGCTGCCCA +GAAAAAGAAGGCCCAAGACGAATACGGCGCTGCGTCTATCACCATTCTCGAAGGGCTGGAGGCCGTCCGC +AAACGTCCCGGCATGTACATTGGCTCGACCGGTGAGCGCGGTTTACACCATCTCATTTGGGAGGTGGTCG +ACAACGCGGTCGACGAGGCGATGGCCGGTTATGCAACCACAGTGAACGTAGTGCTGCTTGAGGATGGCGG 
+TGTCGAGGTCGCCGACGACGGCCGCGGCATTCCGGTCGCCACCCACGCCTCCGGCATACCGACCGTCGAC +GTGGTGATGACACAACTACATGCCGGCGGCAAGTTCGACTCGGACGCGTATGCGATATCTGGTGGTCTGC +ACGGCGTCGGCGTGTCGGTGGTTAACGCGCTATCCACCCGGCTCGAAGTCGAGATCAAGCGCGACGGGTA +CGAGTGGTCTCAGGTTTATGAGAAGTCGGAACCCCTGGGCCTCAAGCAAGGGGCGCCGACCAAGAAGACG +GGGTCAACGGTACGGTTCTGGGCCGACCCCGCTGTTTTCGAAACCACGGAATACGACTTCGAAACCGTCG +CCCGCCGGCTGCAAGAGATGGCGTTCCTCAACAAGGGGCTGACCATCAACCTGACCGACGAGAGGGTGAC +CCAAGACGAGGTCGTCGACGAAGTGGTCAGCGACGTCGCCGAGGCGCCGAAGTCGGCAAGTGAACGCGCA +GCCGAATCCACTGCACCGCACAAAGTTAAGAGCCGCACCTTTCACTATCCGGGTGGCCTGGTGGACTTCG +TGAAACACATCAACCGCACCAAGAACGCGATTCATAGCAGCATCGTGGACTTTTCCGGCAAGGGCACCGG +GCACGAGGTGGAGATCGCGATGCAATGGAACGCCGGGTATTCGGAGTCGGTGCACACCTTCGCCAACACC +ATCAACACCCACGAGGGCGGCACCCACGAAGAGGGCTTCCGCAGCGCGCTGACGTCGGTGGTGAACAAGT +ACGCCAAGGACCGCAAGCTACTGAAGGACAAGGACCCCAACCTCACCGGTGACGATATCCGGGAAGGCCT +GGCCGCTGTGATCTCGGTGAAGGTCAGCGAACCGCAGTTCGAGGGCCAGACCAAGACCAAGTTGGGCAAC +ACCGAGGTCAAATCGTTTGTGCAGAAGGTCTGTAATGAACAGCTGACCCACTGGTTTGAAGCCAACCCCA +CCGACTCGAAAGTCGTTGTGAACAAGGCTGTGTCCTCGGCGCAAGCCCGTATCGCGGCACGTAAGGCACG +AGAGTTGGTGCGGCGTAAGAGCGCCACCGACATCGGTGGATTGCCCGGCAAGCTGGCCGATTGCCGTTCC +ACGGATCCGCGCAAGTCCGAACTGTATGTCGTAGAAGGTGACTCGGCCGGCGGTTCTGCAAAAAGCGGTC +GCGATTCGATGTTCCAGGCGATACTTCCGCTGCGCGGCAAGATCATCAATGTGGAGAAAGCGCGCATCGA +CCGGGTGCTAAAGAACACCGAAGTTCAGGCGATCATCACGGCGCTGGGCACCGGGATCCACGACGAGTTC +GATATCGGCAAGCTGCGCTACCACAAGATCGTGCTGATGGCCGACGCCGATGTTGACGGCCAACATATTT +CCACGCTGTTGTTGACGTTGTTGTTCCGGTTCATGCGGCCGCTCATCGAGAACGGGCATGTGTTTTTGGC +ACAACCGCCGCTGTACAAACTCAAGTGGCAGCGCAGTGACCCGGAATTCGCATACTCCGACCGCGAGCGC
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/NC_002945v4.yml Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,5 @@ +bovis: + - '11001110' + - '11011110' + - '11001100' +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/fasta_indexes.loc Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,1 @@ +89 89 Mycobacterium_AF2122 ${__HERE__}/NC_002945v4.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/input_avg_mq_json.json Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,1 @@ +{"name":null,"index":["NC_002945.4:1005705","NC_002945.4:1018313","NC_002945.4:1021422","NC_002945.4:1034434","NC_002945.4:1036102","NC_002945.4:1036530","NC_002945.4:1057","NC_002945.4:1096802","NC_002945.4:110198","NC_002945.4:1104019","NC_002945.4:1104291","NC_002945.4:1124266","NC_002945.4:1137800","NC_002945.4:1139489","NC_002945.4:114965","NC_002945.4:1159390","NC_002945.4:1160992","NC_002945.4:1168458","NC_002945.4:117800","NC_002945.4:1186381","NC_002945.4:1190076","NC_002945.4:1190080","NC_002945.4:1190084","NC_002945.4:1191092","NC_002945.4:1199529","NC_002945.4:1199530","NC_002945.4:1199951","NC_002945.4:1206896","NC_002945.4:1212203","NC_002945.4:1213847","NC_002945.4:1214540","NC_002945.4:1224899","NC_002945.4:1230875","NC_002945.4:1244746","NC_002945.4:1259250","NC_002945.4:1264712","NC_002945.4:127447","NC_002945.4:1277988","NC_002945.4:1295457","NC_002945.4:130166","NC_002945.4:130237","NC_002945.4:1312836","NC_002945.4:1314197","NC_002945.4:1333537","NC_002945.4:1335092","NC_002945.4:1341613","NC_002945.4:1348342","NC_002945.4:1382465","NC_002945.4:1383731","NC_002945.4:1405922","NC_002945.4:140686","NC_002945.4:1412824","NC_002945.4:1412828","NC_002945.4:1412885","NC_002945.4:1412893","NC_002945.4:1421904","NC_002945.4:143799","NC_002945.4:1442194","NC_002945.4:144992","NC_002945.4:1463503","NC_002945.4:1467394","NC_002945.4:1470606","NC_002945.4:1479827","NC_002945.4:1481327","NC_002945.4:1484942","NC_002945.4:148871","NC_002945.4:1492328","NC_002945.4:1498639","NC_002945.4:1501932","NC_002945.4:1509487","NC_002945.4:1517866","NC_002945.4:1524526","NC_002945.4:1529147","NC_002945.4:1533175","NC_002945.4:1535299","NC_002945.4:1535303","NC_002945.4:1535366","NC_002945.4:1536267","NC_002945.4:1547426","NC_002945.4:1568090","NC_002945.4:1584881","NC_002945.4:1591357","NC_002945.4:159370","NC_002945.4:1594398","NC_002945.4
:1597464","NC_002945.4:1597847","NC_002945.4:1600443","NC_002945.4:160535","NC_002945.4:1619153","NC_002945.4:1619361","NC_002945.4:1625561","NC_002945.4:1628068","NC_002945.4:1632869","NC_002945.4:165799","NC_002945.4:1659174","NC_002945.4:166696","NC_002945.4:1682044","NC_002945.4:1701507","NC_002945.4:1704859","NC_002945.4:1711760","NC_002945.4:1716413","NC_002945.4:1717086","NC_002945.4:1720220","NC_002945.4:1723583","NC_002945.4:1741553","NC_002945.4:1762390","NC_002945.4:1790296","NC_002945.4:179885","NC_002945.4:1799442","NC_002945.4:1803035","NC_002945.4:1806623","NC_002945.4:1817260","NC_002945.4:1828312","NC_002945.4:1833330","NC_002945.4:1863248","NC_002945.4:1871114","NC_002945.4:1880430","NC_002945.4:189083","NC_002945.4:1894922","NC_002945.4:1896107","NC_002945.4:1911237","NC_002945.4:1915461","NC_002945.4:1915936","NC_002945.4:1920100","NC_002945.4:192177","NC_002945.4:1932972","NC_002945.4:1941781","NC_002945.4:1954048","NC_002945.4:1957978","NC_002945.4:1958977","NC_002945.4:1961656","NC_002945.4:1961826","NC_002945.4:1974665","NC_002945.4:198890","NC_002945.4:1989922","NC_002945.4:1996251","NC_002945.4:2002061","NC_002945.4:2007303","NC_002945.4:2010421","NC_002945.4:2020061","NC_002945.4:2021640","NC_002945.4:2024890","NC_002945.4:2027869","NC_002945.4:2035774","NC_002945.4:2036697","NC_002945.4:2049171","NC_002945.4:2057553","NC_002945.4:2059249","NC_002945.4:2059920","NC_002945.4:2075405","NC_002945.4:2078648","NC_002945.4:2093479","NC_002945.4:2096812","NC_002945.4:2099043","NC_002945.4:2118096","NC_002945.4:2121160","NC_002945.4:2137049","NC_002945.4:2138896","NC_002945.4:2145868","NC_002945.4:2163576","NC_002945.4:2204661","NC_002945.4:2210027","NC_002945.4:2239061","NC_002945.4:223919","NC_002945.4:2257546","NC_002945.4:2267557","NC_002945.4:2268821","NC_002945.4:228109","NC_002945.4:2283200","NC_002945.4:2283218","NC_002945.4:2283220","NC_002945.4:2283227","NC_002945.4:2283235","NC_002945.4:2283236","NC_002945.4:2283350","NC_002945.4:228335
3","NC_002945.4:2283355","NC_002945.4:2283362","NC_002945.4:2283366","NC_002945.4:2283367","NC_002945.4:2283368","NC_002945.4:2283371","NC_002945.4:230661","NC_002945.4:2308525","NC_002945.4:2310215","NC_002945.4:232188","NC_002945.4:2333994","NC_002945.4:2339770","NC_002945.4:2358298","NC_002945.4:2360219","NC_002945.4:2368982","NC_002945.4:2369407","NC_002945.4:2378324","NC_002945.4:2381437","NC_002945.4:2384647","NC_002945.4:2410761","NC_002945.4:2412437","NC_002945.4:2413021","NC_002945.4:2418267","NC_002945.4:2428397","NC_002945.4:2433602","NC_002945.4:2479007","NC_002945.4:2492067","NC_002945.4:2497022","NC_002945.4:2499336","NC_002945.4:2506199","NC_002945.4:2508626","NC_002945.4:2513801","NC_002945.4:2515130","NC_002945.4:2520576","NC_002945.4:2524942","NC_002945.4:2528517","NC_002945.4:2529413","NC_002945.4:2532958","NC_002945.4:2538021","NC_002945.4:2539896","NC_002945.4:2549198","NC_002945.4:2573831","NC_002945.4:2615591","NC_002945.4:2631265","NC_002945.4:2656304","NC_002945.4:2656651","NC_002945.4:2662768","NC_002945.4:2663582","NC_002945.4:2667489","NC_002945.4:2683485","NC_002945.4:2688315","NC_002945.4:2729845","NC_002945.4:2747797","NC_002945.4:2749502","NC_002945.4:2758761","NC_002945.4:2767533","NC_002945.4:2770129","NC_002945.4:2794510","NC_002945.4:2806603","NC_002945.4:2807510","NC_002945.4:2807511","NC_002945.4:2809255","NC_002945.4:2819758","NC_002945.4:2823105","NC_002945.4:2870414","NC_002945.4:2870624","NC_002945.4:2873027","NC_002945.4:2884747","NC_002945.4:2886118","NC_002945.4:2890220","NC_002945.4:2893045","NC_002945.4:2899163","NC_002945.4:2899584","NC_002945.4:2900525","NC_002945.4:29061","NC_002945.4:2918203","NC_002945.4:2924775","NC_002945.4:2927134","NC_002945.4:2931071","NC_002945.4:2931113","NC_002945.4:2942926","NC_002945.4:2946800","NC_002945.4:295519","NC_002945.4:2956778","NC_002945.4:2964207","NC_002945.4:2978162","NC_002945.4:2978164","NC_002945.4:2983580","NC_002945.4:2984156","NC_002945.4:299636","NC_002945.4:3018593","
NC_002945.4:3031841","NC_002945.4:3039600","NC_002945.4:3040820","NC_002945.4:3042914","NC_002945.4:304339","NC_002945.4:3045025","NC_002945.4:3053649","NC_002945.4:3053756","NC_002945.4:3063074","NC_002945.4:3068041","NC_002945.4:3069493","NC_002945.4:3070642","NC_002945.4:3088868","NC_002945.4:3093531","NC_002945.4:3098932","NC_002945.4:3100639","NC_002945.4:3103354","NC_002945.4:3106064","NC_002945.4:3106527","NC_002945.4:3116059","NC_002945.4:3127117","NC_002945.4:3137471","NC_002945.4:3140342","NC_002945.4:3151212","NC_002945.4:3154140","NC_002945.4:3172929","NC_002945.4:3173568","NC_002945.4:3191792","NC_002945.4:319911","NC_002945.4:3247551","NC_002945.4:3250072","NC_002945.4:3250245","NC_002945.4:3270181","NC_002945.4:3294771","NC_002945.4:3295991","NC_002945.4:3297558","NC_002945.4:3304410","NC_002945.4:3304946","NC_002945.4:3306898","NC_002945.4:331051","NC_002945.4:3310831","NC_002945.4:331241","NC_002945.4:331411","NC_002945.4:3319244","NC_002945.4:332124","NC_002945.4:332128","NC_002945.4:332144","NC_002945.4:332145","NC_002945.4:332154","NC_002945.4:332215","NC_002945.4:332218","NC_002945.4:333010","NC_002945.4:3330907","NC_002945.4:3338298","NC_002945.4:3347870","NC_002945.4:3368453","NC_002945.4:3371156","NC_002945.4:3373966","NC_002945.4:33788","NC_002945.4:3396621","NC_002945.4:3396650","NC_002945.4:340088","NC_002945.4:340090","NC_002945.4:340091","NC_002945.4:340092","NC_002945.4:340097","NC_002945.4:3413486","NC_002945.4:3414355","NC_002945.4:3421983","NC_002945.4:3422650","NC_002945.4:3439578","NC_002945.4:3451869","NC_002945.4:3453219","NC_002945.4:3460907","NC_002945.4:3464357","NC_002945.4:3464485","NC_002945.4:3464524","NC_002945.4:3468669","NC_002945.4:3476130","NC_002945.4:3482644","NC_002945.4:3484836","NC_002945.4:3486507","NC_002945.4:3493554","NC_002945.4:3495510","NC_002945.4:3497957","NC_002945.4:3533661","NC_002945.4:3546799","NC_002945.4:3553753","NC_002945.4:3564896","NC_002945.4:3567535","NC_002945.4:3574014","NC_002945.4:357495
5","NC_002945.4:3591452","NC_002945.4:3600600","NC_002945.4:3622899","NC_002945.4:3624371","NC_002945.4:3626128","NC_002945.4:362818","NC_002945.4:3630061","NC_002945.4:364560","NC_002945.4:3645682","NC_002945.4:364804","NC_002945.4:3655045","NC_002945.4:366022","NC_002945.4:3667823","NC_002945.4:3712401","NC_002945.4:3718169","NC_002945.4:3718628","NC_002945.4:3719802","NC_002945.4:3723554","NC_002945.4:3725203","NC_002945.4:3729351","NC_002945.4:3751627","NC_002945.4:3769174","NC_002945.4:3776764","NC_002945.4:3778473","NC_002945.4:3800223","NC_002945.4:3805467","NC_002945.4:3816878","NC_002945.4:3821259","NC_002945.4:3839650","NC_002945.4:3846859","NC_002945.4:3874432","NC_002945.4:3877448","NC_002945.4:3884519","NC_002945.4:3888418","NC_002945.4:3902781","NC_002945.4:3905690","NC_002945.4:3941254","NC_002945.4:3942270","NC_002945.4:3957298","NC_002945.4:3966140","NC_002945.4:3969490","NC_002945.4:3969558","NC_002945.4:3969875","NC_002945.4:4003460","NC_002945.4:4008509","NC_002945.4:4010760","NC_002945.4:4017319","NC_002945.4:4018300","NC_002945.4:4029201","NC_002945.4:4046572","NC_002945.4:4070056","NC_002945.4:407246","NC_002945.4:4076594","NC_002945.4:4077189","NC_002945.4:4080736","NC_002945.4:4096612","NC_002945.4:41228","NC_002945.4:4128841","NC_002945.4:4130927","NC_002945.4:41437","NC_002945.4:4149101","NC_002945.4:4155870","NC_002945.4:4159272","NC_002945.4:4160820","NC_002945.4:4162407","NC_002945.4:4162554","NC_002945.4:4180986","NC_002945.4:4205111","NC_002945.4:4207380","NC_002945.4:4214259","NC_002945.4:4219009","NC_002945.4:4222196","NC_002945.4:4226875","NC_002945.4:4231626","NC_002945.4:4236320","NC_002945.4:4245762","NC_002945.4:4251588","NC_002945.4:4264139","NC_002945.4:4278315","NC_002945.4:4281136","NC_002945.4:4282825","NC_002945.4:4293932","NC_002945.4:4298964","NC_002945.4:430077","NC_002945.4:4303164","NC_002945.4:4311425","NC_002945.4:4321337","NC_002945.4:4339036","NC_002945.4:4347304","NC_002945.4:438482","NC_002945.4:441762","NC_002
945.4:4480","NC_002945.4:449922","NC_002945.4:452398","NC_002945.4:460722","NC_002945.4:467343","NC_002945.4:467402","NC_002945.4:479644","NC_002945.4:483845","NC_002945.4:485584","NC_002945.4:488897","NC_002945.4:490878","NC_002945.4:50470","NC_002945.4:507929","NC_002945.4:518522","NC_002945.4:519412","NC_002945.4:541571","NC_002945.4:544180","NC_002945.4:577068","NC_002945.4:59861","NC_002945.4:598704","NC_002945.4:600207","NC_002945.4:611077","NC_002945.4:622386","NC_002945.4:641896","NC_002945.4:642875","NC_002945.4:644245","NC_002945.4:649910","NC_002945.4:652349","NC_002945.4:673880","NC_002945.4:680416","NC_002945.4:685069","NC_002945.4:69913","NC_002945.4:70082","NC_002945.4:701329","NC_002945.4:701386","NC_002945.4:70438","NC_002945.4:712319","NC_002945.4:723170","NC_002945.4:726979","NC_002945.4:737636","NC_002945.4:738102","NC_002945.4:745507","NC_002945.4:760347","NC_002945.4:792617","NC_002945.4:79918","NC_002945.4:804997","NC_002945.4:808601","NC_002945.4:811737","NC_002945.4:812709","NC_002945.4:828003","NC_002945.4:832093","NC_002945.4:833960","NC_002945.4:839308","NC_002945.4:843812","NC_002945.4:854043","NC_002945.4:865821","NC_002945.4:870116","NC_002945.4:8741","NC_002945.4:884432","NC_002945.4:889897","NC_002945.4:905912","NC_002945.4:917766","NC_002945.4:920753","NC_002945.4:941068","NC_002945.4:942431","NC_002945.4:943719","NC_002945.4:946102","NC_002945.4:948022","NC_002945.4:948811","NC_002945.4:948974","NC_002945.4:960995","NC_002945.4:96244","NC_002945.4:965529","NC_002945.4:967989","NC_002945.4:973459","NC_002945.4:974604","NC_002945.4:976327","NC_002945.4:982301","NC_002945.4:990611","NC_002945.4:997676","NC_002945.4:998183","NC_002945.4:998196"],"data":[60,60,59,60,59,59,60,55,60,60,60,60,59,59,60,60,59,60,59,60,52,55,56,59,59,59,60,59,60,60,59,59,60,60,59,59,59,56,60,60,60,59,59,59,60,59,60,60,60,60,60,59,59,56,56,59,60,60,59,60,59,58,60,59,59,59,60,59,59,59,59,59,60,59,58,57,57,60,59,60,60,59,59,60,59,60,59,59,59,59,60,59,59,60,60,59
,60,60,59,59,60,60,60,60,60,59,60,60,59,59,60,60,60,59,59,59,60,59,60,59,60,60,59,60,60,60,59,59,59,59,60,59,60,60,59,59,59,59,59,59,59,59,59,60,60,59,59,60,60,60,59,60,59,59,60,60,59,59,59,59,59,60,60,60,60,59,60,59,59,59,59,59,59,59,60,60,60,60,60,60,60,60,60,59,60,60,59,60,60,60,59,59,60,59,60,60,60,60,59,60,60,59,60,60,59,59,60,60,60,59,60,60,59,59,60,59,60,60,59,59,60,60,60,59,60,59,59,59,59,60,60,59,59,59,60,60,60,59,59,60,60,59,60,60,60,59,59,59,60,59,59,59,60,59,59,60,60,60,59,60,59,60,60,60,59,60,60,59,59,60,60,60,60,60,59,60,60,60,59,59,60,59,60,59,59,59,59,59,60,59,60,59,60,59,60,59,59,59,60,60,60,60,59,59,59,60,60,60,60,60,60,59,59,59,59,59,59,59,59,60,58,60,59,60,59,60,59,59,57,57,57,57,57,60,60,60,59,59,59,59,59,59,60,59,59,60,60,60,59,60,60,59,60,59,60,60,60,60,60,60,60,59,60,59,58,59,59,59,60,59,60,59,59,59,59,59,59,59,59,60,60,60,60,60,59,60,60,59,59,59,59,59,59,59,60,59,60,60,59,60,59,60,60,60,59,60,60,59,59,60,60,60,59,59,59,59,60,59,60,59,59,60,59,59,60,60,59,60,59,60,60,60,59,60,59,59,60,60,60,59,60,59,59,59,59,60,59,59,59,60,60,59,59,60,59,60,60,59,60,60,59,59,59,59,59,60,60,59,59,59,59,59,60,60,60,60,60,59,60,59,60,60,60,60,60,59,60,59,59,60,59,59,59,59,60,59,59,59,60,60,59,58,60,59,60,59,59,60,59,59,59,60,59,59,60,59,59,59,60,59,59,59,59,59,60,60,60,59,59,59,60,60]} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/input_newick.newick Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,1 @@ +(root,((((SRR1792271_zc,SRR1792272_zc),SRR1791772_zc),SRR8073662_zc),SRR1791698_zc_vcf),SRR1792265_zc);
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/input_snps_json.json Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,1 @@ +{"columns":["NC_002945.4:1005705","NC_002945.4:1348342","NC_002945.4:1382465","NC_002945.4:1463503","NC_002945.4:1704859","NC_002945.4:1723583","NC_002945.4:1911237","NC_002945.4:1961826","NC_002945.4:228109","NC_002945.4:2412437","NC_002945.4:2413021","NC_002945.4:3069493","NC_002945.4:3319244","NC_002945.4:3373966","NC_002945.4:3413486","NC_002945.4:3941254","NC_002945.4:3942270","NC_002945.4:4236320","NC_002945.4:4278315","NC_002945.4:960995","NC_002945.4:997676"],"index":["SRR1792265_zc","SRR1792272_zc","SRR1792271_zc","SRR8073662_zc","SRR1791772_zc","SRR1791698_zc_vcf","root"],"data":[["C","G","G","A","C","G","C","G","C","R","C","A","C","G","A","G","A","G","T","T","C"],["G","A","G","A","C","A","C","C","T","A","T","C","A","A","G","A","A","A","C","G","T"],["G","A","G","A","C","A","C","C","T","A","T","C","A","A","G","A","A","A","C","G","T"],["G","A","G","A","C","G","C","C","T","A","T","C","A","G","G","G","A","G","C","G","T"],["G","A","C","G","T","G","C","C","T","A","T","C","A","G","G","G","A","G","C","G","T"],["G","A","G","A","C","G","T","C","T","A","T","C","A","G","G","G","C","G","C","G","T"],["C","G","G","A","C","G","C","G","C","G","T","C","A","G","G","G","A","G","C","T","C"]]} \ No newline at end of file
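The `input_snps_json.json` fixture above appears to follow the layout that pandas calls the `split` orientation (`columns`, `index`, `data`). That reading is an assumption based on the keys and is not stated in the changeset. Under that assumption, a minimal stdlib sketch of turning such a payload into per-sample calls could look like the following. The payload is a shortened two-sample, two-position excerpt of the fixture, and the helper name `split_json_to_rows` is hypothetical.

```python
import json

# Shortened excerpt in the same shape as input_snps_json.json (illustrative only).
payload = json.loads(
    '{"columns": ["NC_002945.4:960995", "NC_002945.4:997676"],'
    ' "index": ["SRR1792265_zc", "root"],'
    ' "data": [["T", "C"], ["T", "C"]]}'
)


def split_json_to_rows(obj):
    """Map each sample name to a {position: base} dict (hypothetical helper)."""
    return {
        sample: dict(zip(obj["columns"], calls))
        for sample, calls in zip(obj["index"], obj["data"])
    }


rows = split_json_to_rows(payload)
```

Keying by sample name makes it straightforward to compare the JSON `index` with the leaf names in `input_newick.newick`, which lists the same seven labels.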
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/output_dbkey.txt Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,1 @@ +AF2122 \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/output_metrics.tabular Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,2 @@ +# File Number of Good SNPs Average Coverage Genome Coverage + 0
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/output_metrics.txt Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,6 @@ +Sample: Mcap_Deer_DE_SRR650221 +Brucella counts: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, +TB counts: 2,2,0,0,4,5,0,0, +Para counts: 0,0,0, +Group: TB +dbkey: AF2122
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/output_vcf.vcf Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,100 @@ +##fileformat=VCFv4.2 +##fileDate=20200302 +##source=freeBayes v1.3.1-dirty +##reference=/home/galaxy/galaxy/tool-data/AF2122/seq/AF2122.fa +##contig=<ID=NC_002945.4,length=4349904> +##phasing=none +##commandline="freebayes --region NC_002945.4:0..4349904 --bam b_0.bam --fasta-reference /home/galaxy/galaxy/tool-data/AF2122/seq/AF2122.fa --vcf ./vcf_output/part_NC_002945.4:0..4349904.vcf -u -n 0 --haplotype-length -1 --min-repeat-size 5 --min-repeat-entropy 1 -m 1 -q 0 -R 0 -Y 0 -e 1 -F 0.05 -C 2 -G 1 --min-alternate-qsum 0" +##filter="QUAL > 0" +##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data"> +##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth at the locus"> +##INFO=<ID=DPB,Number=1,Type=Float,Description="Total read depth per bp at the locus; bases in reads overlapping / bases in haplotype"> +##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes"> +##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes"> +##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]"> +##INFO=<ID=RO,Number=1,Type=Integer,Description="Count of full observations of the reference haplotype."> +##INFO=<ID=AO,Number=A,Type=Integer,Description="Count of full observations of this alternate haplotype."> +##INFO=<ID=PRO,Number=1,Type=Float,Description="Reference allele observation count, with partial observations recorded fractionally"> +##INFO=<ID=PAO,Number=A,Type=Float,Description="Alternate allele observations, with partial observations recorded fractionally"> +##INFO=<ID=QR,Number=1,Type=Integer,Description="Reference allele quality sum in phred"> +##INFO=<ID=QA,Number=A,Type=Integer,Description="Alternate allele quality sum in phred"> +##INFO=<ID=PQR,Number=1,Type=Float,Description="Reference allele 
quality sum in phred for partial observations"> +##INFO=<ID=PQA,Number=A,Type=Float,Description="Alternate allele quality sum in phred for partial observations"> +##INFO=<ID=SRF,Number=1,Type=Integer,Description="Number of reference observations on the forward strand"> +##INFO=<ID=SRR,Number=1,Type=Integer,Description="Number of reference observations on the reverse strand"> +##INFO=<ID=SAF,Number=A,Type=Integer,Description="Number of alternate observations on the forward strand"> +##INFO=<ID=SAR,Number=A,Type=Integer,Description="Number of alternate observations on the reverse strand"> +##INFO=<ID=SRP,Number=1,Type=Float,Description="Strand balance probability for the reference allele: Phred-scaled upper-bounds estimate of the probability of observing the deviation between SRF and SRR given E(SRF/SRR) ~ 0.5, derived using Hoeffding's inequality"> +##INFO=<ID=SAP,Number=A,Type=Float,Description="Strand balance probability for the alternate allele: Phred-scaled upper-bounds estimate of the probability of observing the deviation between SAF and SAR given E(SAF/SAR) ~ 0.5, derived using Hoeffding's inequality"> +##INFO=<ID=AB,Number=A,Type=Float,Description="Allele balance at heterozygous sites: a number between 0 and 1 representing the ratio of reads showing the reference allele to all reads, considering only reads from individuals called as heterozygous"> +##INFO=<ID=ABP,Number=A,Type=Float,Description="Allele balance probability at heterozygous sites: Phred-scaled upper-bounds estimate of the probability of observing the deviation between ABR and ABA given E(ABR/ABA) ~ 0.5, derived using Hoeffding's inequality"> +##INFO=<ID=RUN,Number=A,Type=Integer,Description="Run length: the number of consecutive repeats of the alternate allele in the reference genome"> +##INFO=<ID=RPP,Number=A,Type=Float,Description="Read Placement Probability: Phred-scaled upper-bounds estimate of the probability of observing the deviation between RPL and RPR given E(RPL/RPR) ~ 0.5, derived 
using Hoeffding's inequality"> +##INFO=<ID=RPPR,Number=1,Type=Float,Description="Read Placement Probability for reference observations: Phred-scaled upper-bounds estimate of the probability of observing the deviation between RPL and RPR given E(RPL/RPR) ~ 0.5, derived using Hoeffding's inequality"> +##INFO=<ID=RPL,Number=A,Type=Float,Description="Reads Placed Left: number of reads supporting the alternate balanced to the left (5') of the alternate allele"> +##INFO=<ID=RPR,Number=A,Type=Float,Description="Reads Placed Right: number of reads supporting the alternate balanced to the right (3') of the alternate allele"> +##INFO=<ID=EPP,Number=A,Type=Float,Description="End Placement Probability: Phred-scaled upper-bounds estimate of the probability of observing the deviation between EL and ER given E(EL/ER) ~ 0.5, derived using Hoeffding's inequality"> +##INFO=<ID=EPPR,Number=1,Type=Float,Description="End Placement Probability for reference observations: Phred-scaled upper-bounds estimate of the probability of observing the deviation between EL and ER given E(EL/ER) ~ 0.5, derived using Hoeffding's inequality"> +##INFO=<ID=DPRA,Number=A,Type=Float,Description="Alternate allele depth ratio. Ratio between depth in samples with each called alternate allele and those without."> +##INFO=<ID=ODDS,Number=1,Type=Float,Description="The log odds ratio of the best genotype combination to the second-best."> +##INFO=<ID=GTI,Number=1,Type=Integer,Description="Number of genotyping iterations required to reach convergence or bailout."> +##INFO=<ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex."> +##INFO=<ID=CIGAR,Number=A,Type=String,Description="The extended CIGAR representation of each alternate allele, with the exception that '=' is replaced by 'M' to ease VCF parsing. 
Note that INDEL alleles do not have the first matched base (which is provided by default, per the spec) referred to by the CIGAR."> +##INFO=<ID=NUMALT,Number=1,Type=Integer,Description="Number of unique non-reference alleles in called genotypes at this position."> +##INFO=<ID=MEANALT,Number=A,Type=Float,Description="Mean number of unique non-reference allele observations per sample with the corresponding alternate alleles."> +##INFO=<ID=LEN,Number=A,Type=Integer,Description="allele length"> +##INFO=<ID=MQM,Number=A,Type=Float,Description="Mean mapping quality of observed alternate alleles"> +##INFO=<ID=MQMR,Number=1,Type=Float,Description="Mean mapping quality of observed reference alleles"> +##INFO=<ID=PAIRED,Number=A,Type=Float,Description="Proportion of observed alternate alleles which are supported by properly paired read fragments"> +##INFO=<ID=PAIREDR,Number=1,Type=Float,Description="Proportion of observed reference alleles which are supported by properly paired read fragments"> +##INFO=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum depth in gVCF output block."> +##INFO=<ID=END,Number=1,Type=Integer,Description="Last position (inclusive) in gVCF output record."> +##INFO=<ID=technology.ILLUMINA,Number=A,Type=Float,Description="Fraction of observations supporting the alternate observed in reads from ILLUMINA"> +##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> +##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype"> +##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood, log10-scaled likelihoods of the data given the called genotype for each possible genotype generated from the reference and alternate alleles given the sample ploidy"> +##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth"> +##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Number of observation for each allele"> 
+##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count"> +##FORMAT=<ID=QR,Number=1,Type=Integer,Description="Sum of quality of the reference observations"> +##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count"> +##FORMAT=<ID=QA,Number=A,Type=Integer,Description="Sum of quality of the alternate observations"> +##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum depth in gVCF output block."> +#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 13-1941-6 +NC_002945.4 1 . N . . . . GT ./. +NC_002945.4 2 . N . . . . GT ./. +NC_002945.4 3 . N . . . . GT ./. +NC_002945.4 4 . N . . . . GT ./. +NC_002945.4 5 . N . . . . GT ./. +NC_002945.4 6 . N . . . . GT ./. +NC_002945.4 7 . N . . . . GT ./. +NC_002945.4 8 . N . . . . GT ./. +NC_002945.4 9 . N . . . . GT ./. +NC_002945.4 10 . N . . . . GT ./. +NC_002945.4 11 . N . . . . GT ./. +NC_002945.4 12 . N . . . . GT ./. +NC_002945.4 13 . N . . . . GT ./. +NC_002945.4 14 . N . . . . GT ./. +NC_002945.4 15 . N . . . . GT ./. +NC_002945.4 16 . N . . . . GT ./. +NC_002945.4 17 . N . . . . GT ./. +NC_002945.4 18 . N . . . . GT ./. +NC_002945.4 19 . N . . . . GT ./. +NC_002945.4 20 . N . . . . GT ./. +NC_002945.4 21 . N . . . . GT ./. +NC_002945.4 22 . N . . . . GT ./. +NC_002945.4 23 . N . . . . GT ./. +NC_002945.4 24 . N . . . . GT ./. +NC_002945.4 25 . N . . . . GT ./. +NC_002945.4 26 . N . . . . GT ./. +NC_002945.4 27 . N . . . . GT ./. +NC_002945.4 28 . N . . . . GT ./. +NC_002945.4 29 . N . . . . GT ./. +NC_002945.4 30 . N . . . . GT ./. +NC_002945.4 31 . N . . . . GT ./. +NC_002945.4 32 . N . . . . GT ./. +NC_002945.4 33 . N . . . . GT ./. +NC_002945.4 34 . N . . . . GT ./. +NC_002945.4 35 . N . . . . GT ./. +NC_002945.4 36 . N . . . . GT ./. +NC_002945.4 37 . N . . . . GT ./.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/paired_dbkey.txt Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,1 @@ +AF2122 \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/paired_metrics.txt Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,6 @@ +Sample: forward +Brucella counts: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, +TB counts: 4,4,0,0,8,10,0,0, +Para counts: 0,0,0, +Group: TB +dbkey: AF2122
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/vcf_input.vcf Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,64 @@ +##fileformat=VCFv4.2 +##fileDate=20200302 +##source=freeBayes v1.3.1-dirty +##reference=/home/galaxy/galaxy/tool-data/AF2122/seq/AF2122.fa +##contig=<ID=NC_002945.4,length=4349904> +##phasing=none +##commandline="freebayes --region NC_002945.4:0..4349904 --bam b_0.bam --fasta-reference /home/galaxy/galaxy/tool-data/AF2122/seq/AF2122.fa --vcf ./vcf_output/part_NC_002945.4:0..4349904.vcf -u -n 0 --haplotype-length -1 --min-repeat-size 5 --min-repeat-entropy 1 -m 1 -q 0 -R 0 -Y 0 -e 1 -F 0.05 -C 2 -G 1 --min-alternate-qsum 0" +##filter="QUAL > 0" +##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data"> +##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth at the locus"> +##INFO=<ID=DPB,Number=1,Type=Float,Description="Total read depth per bp at the locus; bases in reads overlapping / bases in haplotype"> +##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes"> +##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes"> +##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]"> +##INFO=<ID=RO,Number=1,Type=Integer,Description="Count of full observations of the reference haplotype."> +##INFO=<ID=AO,Number=A,Type=Integer,Description="Count of full observations of this alternate haplotype."> +##INFO=<ID=PRO,Number=1,Type=Float,Description="Reference allele observation count, with partial observations recorded fractionally"> +##INFO=<ID=PAO,Number=A,Type=Float,Description="Alternate allele observations, with partial observations recorded fractionally"> +##INFO=<ID=QR,Number=1,Type=Integer,Description="Reference allele quality sum in phred"> +##INFO=<ID=QA,Number=A,Type=Integer,Description="Alternate allele quality sum in phred"> +##INFO=<ID=PQR,Number=1,Type=Float,Description="Reference allele 
quality sum in phred for partial observations"> +##INFO=<ID=PQA,Number=A,Type=Float,Description="Alternate allele quality sum in phred for partial observations"> +##INFO=<ID=SRF,Number=1,Type=Integer,Description="Number of reference observations on the forward strand"> +##INFO=<ID=SRR,Number=1,Type=Integer,Description="Number of reference observations on the reverse strand"> +##INFO=<ID=SAF,Number=A,Type=Integer,Description="Number of alternate observations on the forward strand"> +##INFO=<ID=SAR,Number=A,Type=Integer,Description="Number of alternate observations on the reverse strand"> +##INFO=<ID=SRP,Number=1,Type=Float,Description="Strand balance probability for the reference allele: Phred-scaled upper-bounds estimate of the probability of observing the deviation between SRF and SRR given E(SRF/SRR) ~ 0.5, derived using Hoeffding's inequality"> +##INFO=<ID=SAP,Number=A,Type=Float,Description="Strand balance probability for the alternate allele: Phred-scaled upper-bounds estimate of the probability of observing the deviation between SAF and SAR given E(SAF/SAR) ~ 0.5, derived using Hoeffding's inequality"> +##INFO=<ID=AB,Number=A,Type=Float,Description="Allele balance at heterozygous sites: a number between 0 and 1 representing the ratio of reads showing the reference allele to all reads, considering only reads from individuals called as heterozygous"> +##INFO=<ID=ABP,Number=A,Type=Float,Description="Allele balance probability at heterozygous sites: Phred-scaled upper-bounds estimate of the probability of observing the deviation between ABR and ABA given E(ABR/ABA) ~ 0.5, derived using Hoeffding's inequality"> +##INFO=<ID=RUN,Number=A,Type=Integer,Description="Run length: the number of consecutive repeats of the alternate allele in the reference genome"> +##INFO=<ID=RPP,Number=A,Type=Float,Description="Read Placement Probability: Phred-scaled upper-bounds estimate of the probability of observing the deviation between RPL and RPR given E(RPL/RPR) ~ 0.5, derived 
using Hoeffding's inequality"> +##INFO=<ID=RPPR,Number=1,Type=Float,Description="Read Placement Probability for reference observations: Phred-scaled upper-bounds estimate of the probability of observing the deviation between RPL and RPR given E(RPL/RPR) ~ 0.5, derived using Hoeffding's inequality"> +##INFO=<ID=RPL,Number=A,Type=Float,Description="Reads Placed Left: number of reads supporting the alternate balanced to the left (5') of the alternate allele"> +##INFO=<ID=RPR,Number=A,Type=Float,Description="Reads Placed Right: number of reads supporting the alternate balanced to the right (3') of the alternate allele"> +##INFO=<ID=EPP,Number=A,Type=Float,Description="End Placement Probability: Phred-scaled upper-bounds estimate of the probability of observing the deviation between EL and ER given E(EL/ER) ~ 0.5, derived using Hoeffding's inequality"> +##INFO=<ID=EPPR,Number=1,Type=Float,Description="End Placement Probability for reference observations: Phred-scaled upper-bounds estimate of the probability of observing the deviation between EL and ER given E(EL/ER) ~ 0.5, derived using Hoeffding's inequality"> +##INFO=<ID=DPRA,Number=A,Type=Float,Description="Alternate allele depth ratio. Ratio between depth in samples with each called alternate allele and those without."> +##INFO=<ID=ODDS,Number=1,Type=Float,Description="The log odds ratio of the best genotype combination to the second-best."> +##INFO=<ID=GTI,Number=1,Type=Integer,Description="Number of genotyping iterations required to reach convergence or bailout."> +##INFO=<ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex."> +##INFO=<ID=CIGAR,Number=A,Type=String,Description="The extended CIGAR representation of each alternate allele, with the exception that '=' is replaced by 'M' to ease VCF parsing. 
Note that INDEL alleles do not have the first matched base (which is provided by default, per the spec) referred to by the CIGAR."> +##INFO=<ID=NUMALT,Number=1,Type=Integer,Description="Number of unique non-reference alleles in called genotypes at this position."> +##INFO=<ID=MEANALT,Number=A,Type=Float,Description="Mean number of unique non-reference allele observations per sample with the corresponding alternate alleles."> +##INFO=<ID=LEN,Number=A,Type=Integer,Description="allele length"> +##INFO=<ID=MQM,Number=A,Type=Float,Description="Mean mapping quality of observed alternate alleles"> +##INFO=<ID=MQMR,Number=1,Type=Float,Description="Mean mapping quality of observed reference alleles"> +##INFO=<ID=PAIRED,Number=A,Type=Float,Description="Proportion of observed alternate alleles which are supported by properly paired read fragments"> +##INFO=<ID=PAIREDR,Number=1,Type=Float,Description="Proportion of observed reference alleles which are supported by properly paired read fragments"> +##INFO=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum depth in gVCF output block."> +##INFO=<ID=END,Number=1,Type=Integer,Description="Last position (inclusive) in gVCF output record."> +##INFO=<ID=technology.ILLUMINA,Number=A,Type=Float,Description="Fraction of observations supporting the alternate observed in reads from ILLUMINA"> +##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> +##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype"> +##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood, log10-scaled likelihoods of the data given the called genotype for each possible genotype generated from the reference and alternate alleles given the sample ploidy"> +##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth"> +##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Number of observation for each allele"> 
+##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count"> +##FORMAT=<ID=QR,Number=1,Type=Integer,Description="Sum of quality of the reference observations"> +##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count"> +##FORMAT=<ID=QA,Number=A,Type=Integer,Description="Sum of quality of the alternate observations"> +##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum depth in gVCF output block."> +#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 13-1941-6 +NC_002945.4 2898437 . T G 0.263449 . AB=0;ABP=0;AC=0;AF=0;AN=2;AO=2;CIGAR=1X;DP=2;DPB=2;DPRA=0;EPP=3.0103;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=2.77259;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=0;QR=0;RO=0;RPL=2;RPP=7.35324;RPPR=0;RPR=0;RUN=1;SAF=1;SAP=3.0103;SAR=1;SRF=0;SRP=0;SRR=0;TYPE=snp;technology.ILLUMINA=1 GT:DP:AD:RO:QR:AO:QA:GL 0/0:2:0,2:0:0:2:0:0,-0.60206,-8.68589e-09
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/vsnp_dnaprints.loc Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,4 @@ +## vSNP DNAprints files +#Value Name Path Description +AF2122 Mycobacterium_AF2122/NC_002945v4.yml ${__HERE__}/NC_002945v4.yml DNAprints file for Mycobacterium bovis AF2122/97 +#NC_006932 Brucella_abortus1/NC_006932-NC_006933.yml /vsnp/NC_006932/Brucella_abortus1/NC_006932-NC_006933.yml DNAprints file for Brucella abortus bv. 1 str. 9-941
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool-data/fasta_indexes.loc.sample Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,29 @@ +#This is a sample file distributed with Galaxy that enables tools +#to use a directory of Samtools indexed sequences data files. You will need +#to create these data files and then create a fasta_indexes.loc file +#similar to this one (store it in this directory) that points to +#the directories in which those files are stored. The fasta_indexes.loc +#file has this format (white space characters are TAB characters): +# +# <unique_build_id> <dbkey> <display_name> <file_base_path> +# +#So, for example, if you had hg19 Canonical indexed stored in +# +# /depot/data2/galaxy/hg19/sam/, +# +#then the fasta_indexes.loc entry would look like this: +# +#hg19canon hg19 Human (Homo sapiens): hg19 Canonical /depot/data2/galaxy/hg19/sam/hg19canon.fa +# +#and your /depot/data2/galaxy/hg19/sam/ directory +#would contain hg19canon.fa and hg19canon.fa.fai files. +# +#Your fasta_indexes.loc file should include an entry per line for +#each index set you have stored. The file in the path does actually +#exist, but it should never be directly used. Instead, the name serves +#as a prefix for the index file. For example: +# +#hg18canon hg18 Human (Homo sapiens): hg18 Canonical /depot/data2/galaxy/hg18/sam/hg18canon.fa +#hg18full hg18 Human (Homo sapiens): hg18 Full /depot/data2/galaxy/hg18/sam/hg18full.fa +#hg19canon hg19 Human (Homo sapiens): hg19 Canonical /depot/data2/galaxy/hg19/sam/hg19canon.fa +#hg19full hg19 Human (Homo sapiens): hg19 Full /depot/data2/galaxy/hg19/sam/hg19full.fa
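The sample file above describes a TAB-separated layout with `#` comment lines. As a rough illustration of that layout only (this is not how Galaxy itself loads `.loc` files), the entries can be read with the standard `csv` module. The function name `read_loc` and the in-memory sample are assumptions made for the sketch; the column order is taken from the `value, dbkey, name, path` declaration in `tool_data_table_conf.xml.sample`.

```python
import csv
import io

# Column order as declared for the fasta_indexes table.
COLUMNS = ["value", "dbkey", "name", "path"]


def read_loc(handle, columns=COLUMNS, comment_char="#"):
    """Yield one dict per non-comment, non-blank line of a TAB-separated .loc file."""
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith(comment_char):
            continue
        yield dict(zip(columns, fields))


# In-memory stand-in for a real fasta_indexes.loc file.
sample = io.StringIO(
    "#comment line\n"
    "hg19canon\thg19\tHuman (Homo sapiens): hg19 Canonical\t"
    "/depot/data2/galaxy/hg19/sam/hg19canon.fa\n"
)
entries = list(read_loc(sample))
```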
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool-data/vsnp_dnaprints.loc.sample Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,4 @@ +## vSNP DNAprints files +#Value Name Path Description +#AF2122 Mycobacterium_AF2122/NC_002945v4.yml /vsnp/AF2122/Mycobacterium_AF2122/NC_002945v4.yml DNAprints file for Mycobacterium bovis AF2122/97 +#NC_006932 Brucella_abortus1/NC_006932-NC_006933.yml /vsnp/NC_006932/Brucella_abortus1/NC_006932-NC_006933.yml DNAprints file for Brucella abortus bv. 1 str. 9-941
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool-data/vsnp_genbank.loc.sample Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,4 @@ +## vSNP Genbank files +#Value Name Path Description +#AF2122 Mycobacterium_AF2122/NC_002945v4.gbk vsnp/AF2122/Mycobacterium_AF2122/NC_002945v4.gbk Genbank file for Mycobacterium bovis AF2122/97 +#NC_006932 Brucella_abortus1/NC_006932-NC_006933.gbk vsnp/NC_006932/Brucella_abortus1/NC_006932-NC_006933.gbk Genbank file for Brucella abortus bv. 1 str. 9-941
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool_data_table_conf.xml.sample Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,18 @@ +<tables> + <!-- Location of SAMTools indexes for FASTA files --> + <table name="fasta_indexes" comment_char="#"> + <columns>value, dbkey, name, path</columns> + <file path="tool-data/fasta_indexes.loc" /> + </table> + <!-- Location of genbank files for vsnp_build_tables tool --> + <table name="vsnp_genbank" comment_char="#"> + <columns>value, name, path, description</columns> + <file path="tool-data/vsnp_genbank.loc" /> + </table> + <!-- Location of dnaprints files for vsnp_determine_ref_from_data tool --> + <table name="vsnp_dnaprints" comment_char="#"> + <columns>value, name, path, description</columns> + <file path="tool-data/vsnp_dnaprints.loc" /> + </table> +</tables> +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool_data_table_conf.xml.test Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,15 @@ +<tables> + <table name="fasta_indexes" comment_char="#"> + <columns>value, dbkey, name, path</columns> + <file path="${__HERE__}/test-data/fasta_indexes.loc" /> + </table> + <!-- Location of genbank files for vsnp_build_tables tool --> + <table name="vsnp_genbank" comment_char="#"> + <columns>value, name, path, description</columns> + <file path="${__HERE__}/test-data/vsnp_genbank.loc" /> + </table> + <table name="vsnp_dnaprints" comment_char="#"> + <columns>value, name, path, description</columns> + <file path="${__HERE__}/test-data/vsnp_dnaprints.loc" /> + </table> +</tables>
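Both table configuration files above pair a comma-separated `<columns>` declaration with a `<file path=...>` pointing at a `.loc` file. A small sketch of reading that structure with the standard library follows. It only illustrates the file layout and makes no claim about how Galaxy parses these files or expands `${__HERE__}`; the function name `table_layouts` is hypothetical.

```python
import xml.etree.ElementTree as ET

# One table copied from tool_data_table_conf.xml.test.
CONF = """<tables>
    <table name="vsnp_dnaprints" comment_char="#">
        <columns>value, name, path, description</columns>
        <file path="${__HERE__}/test-data/vsnp_dnaprints.loc" />
    </table>
</tables>"""


def table_layouts(xml_text):
    """Return {table name: (column names, .loc path)} for each <table> element."""
    layouts = {}
    for table in ET.fromstring(xml_text).iter("table"):
        columns = [c.strip() for c in table.findtext("columns").split(",")]
        layouts[table.get("name")] = (columns, table.find("file").get("path"))
    return layouts


layouts = table_layouts(CONF)
```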
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/vsnp_add_zero_coverage.py Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,189 @@ +#!/usr/bin/env python + +import argparse +import multiprocessing +import os +import queue +import re +import shutil + +import pandas +import pysam +from Bio import SeqIO + +INPUT_BAM_DIR = 'input_bam_dir' +INPUT_VCF_DIR = 'input_vcf_dir' +OUTPUT_VCF_DIR = 'output_vcf_dir' +OUTPUT_METRICS_DIR = 'output_metrics_dir' + + +def get_base_file_name(file_path): + base_file_name = os.path.basename(file_path) + if base_file_name.find(".") > 0: + # Eliminate the extension. + return os.path.splitext(base_file_name)[0] + elif base_file_name.endswith("_vcf"): + # The "." character has likely + # changed to an "_" character. + return base_file_name[:-len("_vcf")] + return base_file_name + + +def get_coverage_and_snp_count(task_queue, reference, output_metrics, output_vcf, timeout): + while True: + try: + tup = task_queue.get(block=True, timeout=timeout) + except queue.Empty: + break + bam_file, vcf_file = tup + # Create a coverage dictionary. + coverage_dict = {} + coverage_list = pysam.depth(bam_file, split_lines=True) + for line in coverage_list: + chrom, position, depth = line.split('\t') + coverage_dict["%s-%s" % (chrom, position)] = depth + # Convert it to a data frame. + coverage_df = pandas.DataFrame.from_dict(coverage_dict, orient='index', columns=["depth"]) + # Create a zero coverage dictionary. + zero_dict = {} + for record in SeqIO.parse(reference, "fasta"): + chrom = record.id + total_len = len(record.seq) + for pos in list(range(1, total_len + 1)): + zero_dict["%s-%s" % (str(chrom), str(pos))] = 0 + # Convert it to a data frame with depth_x + # and depth_y columns - index is NaN. + zero_df = pandas.DataFrame.from_dict(zero_dict, orient='index', columns=["depth"]) + coverage_df = zero_df.merge(coverage_df, left_index=True, right_index=True, how='outer') + # depth_x "0" column no longer needed. 
+ coverage_df = coverage_df.drop(columns=['depth_x']) + coverage_df = coverage_df.rename(columns={'depth_y': 'depth'}) + # Convert the NaN to 0 coverage and get some metrics. + coverage_df = coverage_df.fillna(0) + coverage_df['depth'] = coverage_df['depth'].apply(int) + total_length = len(coverage_df) + average_coverage = coverage_df['depth'].mean() + zero_df = coverage_df[coverage_df['depth'] == 0] + total_zero_coverage = len(zero_df) + total_coverage = total_length - total_zero_coverage + genome_coverage = "{:.2%}".format(total_coverage / total_length) + # Process the associated VCF input. + column_names = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "Sample"] + vcf_df = pandas.read_csv(vcf_file, sep='\t', header=None, names=column_names, comment='#') + good_snp_count = len(vcf_df[(vcf_df['ALT'].str.len() == 1) & (vcf_df['REF'].str.len() == 1) & (vcf_df['QUAL'] > 150)]) + base_file_name = get_base_file_name(vcf_file) + if total_zero_coverage > 0: + header_file = "%s_header.csv" % base_file_name + with open(header_file, 'w') as outfile: + with open(vcf_file) as infile: + for line in infile: + if re.search('^#', line): + outfile.write("%s" % line) + vcf_df_snp = vcf_df[vcf_df['REF'].str.len() == 1] + vcf_df_snp = vcf_df_snp[vcf_df_snp['ALT'].str.len() == 1] + vcf_df_snp['ABS_VALUE'] = vcf_df_snp['CHROM'].map(str) + "-" + vcf_df_snp['POS'].map(str) + vcf_df_snp = vcf_df_snp.set_index('ABS_VALUE') + cat_df = pandas.concat([vcf_df_snp, zero_df], axis=1, sort=False) + cat_df = cat_df.drop(columns=['CHROM', 'POS', 'depth']) + cat_df[['ID', 'ALT', 'QUAL', 'FILTER', 'INFO']] = cat_df[['ID', 'ALT', 'QUAL', 'FILTER', 'INFO']].fillna('.') + cat_df['REF'] = cat_df['REF'].fillna('N') + cat_df['FORMAT'] = cat_df['FORMAT'].fillna('GT') + cat_df['Sample'] = cat_df['Sample'].fillna('./.') + cat_df['temp'] = cat_df.index.str.rsplit('-', n=1) + cat_df[['CHROM', 'POS']] = pandas.DataFrame(cat_df.temp.values.tolist(), index=cat_df.index) + cat_df = 
cat_df[['CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO', 'FORMAT', 'Sample']] + cat_df['POS'] = cat_df['POS'].astype(int) + cat_df = cat_df.sort_values(['CHROM', 'POS']) + body_file = "%s_body.csv" % base_file_name + cat_df.to_csv(body_file, sep='\t', header=False, index=False) + if output_vcf is None: + output_vcf_file = os.path.join(OUTPUT_VCF_DIR, "%s.vcf" % base_file_name) + else: + output_vcf_file = output_vcf + with open(output_vcf_file, "w") as outfile: + for cf in [header_file, body_file]: + with open(cf, "r") as infile: + for line in infile: + outfile.write("%s" % line) + else: + if output_vcf is None: + output_vcf_file = os.path.join(OUTPUT_VCF_DIR, "%s.vcf" % base_file_name) + else: + output_vcf_file = output_vcf + shutil.copyfile(vcf_file, output_vcf_file) + bam_metrics = [base_file_name, "", "%4f" % average_coverage, genome_coverage] + vcf_metrics = [base_file_name, str(good_snp_count), "", ""] + if output_metrics is None: + output_metrics_file = os.path.join(OUTPUT_METRICS_DIR, "%s.tabular" % base_file_name) + else: + output_metrics_file = output_metrics + metrics_columns = ["File", "Number of Good SNPs", "Average Coverage", "Genome Coverage"] + with open(output_metrics_file, "w") as fh: + fh.write("# %s\n" % "\t".join(metrics_columns)) + fh.write("%s\n" % "\t".join(bam_metrics)) + fh.write("%s\n" % "\t".join(vcf_metrics)) + task_queue.task_done() + + +def set_num_cpus(num_files, processes): + num_cpus = int(multiprocessing.cpu_count()) + if num_files < num_cpus and num_files < processes: + return num_files + if num_cpus < processes: + half_cpus = int(num_cpus / 2) + if num_files < half_cpus: + return num_files + return half_cpus + return processes + + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + + parser.add_argument('--output_metrics', action='store', dest='output_metrics', required=False, default=None, help='Output metrics text file') + parser.add_argument('--output_vcf', action='store', dest='output_vcf', 
required=False, default=None, help='Output VCF file') + parser.add_argument('--reference', action='store', dest='reference', help='Reference dataset') + parser.add_argument('--processes', action='store', dest='processes', type=int, help='User-selected number of processes to use for job splitting') + + args = parser.parse_args() + + # The assumption here is that the list of files + # in both INPUT_BAM_DIR and INPUT_VCF_DIR are + # equal in number and named such that they are + # properly matched if the directories contain + # more than 1 file (i.e., hopefully the bam file + # names and vcf file names will be something like + # Mbovis-01D6_* so they can be # sorted and properly + # associated with each other). + bam_files = [] + for file_name in sorted(os.listdir(INPUT_BAM_DIR)): + file_path = os.path.abspath(os.path.join(INPUT_BAM_DIR, file_name)) + bam_files.append(file_path) + vcf_files = [] + for file_name in sorted(os.listdir(INPUT_VCF_DIR)): + file_path = os.path.abspath(os.path.join(INPUT_VCF_DIR, file_name)) + vcf_files.append(file_path) + + multiprocessing.set_start_method('spawn') + queue1 = multiprocessing.JoinableQueue() + num_files = len(bam_files) + cpus = set_num_cpus(num_files, args.processes) + # Set a timeout for get()s in the queue. + timeout = 0.05 + + # Add each associated bam and vcf file pair to the queue. + for i, bam_file in enumerate(bam_files): + vcf_file = vcf_files[i] + queue1.put((bam_file, vcf_file)) + + # Complete the get_coverage_and_snp_count task. + processes = [multiprocessing.Process(target=get_coverage_and_snp_count, args=(queue1, args.reference, args.output_metrics, args.output_vcf, timeout, )) for _ in range(cpus)] + for p in processes: + p.start() + for p in processes: + p.join() + queue1.join() + + if queue1.empty(): + queue1.close() + queue1.join_thread()
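The worker-count rule in `set_num_cpus` above can be exercised without touching the host by passing the CPU count in as an argument. The following is a minimal sketch under that assumption; `pick_worker_count` and its `available` parameter are hypothetical names that are not part of the changeset.

```python
def pick_worker_count(num_files, requested, available):
    """Same decision order as set_num_cpus, with the CPU count injected.

    num_files -- number of queued input files
    requested -- the value the user passed as --processes
    available -- stand-in for multiprocessing.cpu_count()
    """
    # Never start more workers than there are files to process.
    if num_files < available and num_files < requested:
        return num_files
    # The user asked for more workers than the host has CPUs:
    # fall back to half of the CPUs (or fewer if there are few files).
    if available < requested:
        half = available // 2
        if num_files < half:
            return num_files
        return half
    return requested


print(pick_worker_count(2, 8, 16))   # few files, plenty of CPUs
print(pick_worker_count(10, 8, 4))   # request exceeds CPUs
print(pick_worker_count(10, 4, 16))  # request fits
```

One case worth checking when adapting this rule: with `available=1` and a larger `requested` value, the halving step yields 0 workers, so a caller may want to clamp the result with `max(1, ...)` before starting processes.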
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/vsnp_build_tables.py Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,382 @@ +#!/usr/bin/env python + +import argparse +import multiprocessing +import os +import queue +import re + +import pandas +import pandas.io.formats.excel +from Bio import SeqIO + +INPUT_JSON_AVG_MQ_DIR = 'input_json_avg_mq_dir' +INPUT_JSON_DIR = 'input_json_dir' +INPUT_NEWICK_DIR = 'input_newick_dir' +# Maximum columns allowed in a LibreOffice +# spreadsheet is 1024. Excel allows for +# 16,384 columns, but we'll set the lower +# number as the maximum. Some browsers +# (e.g., Firefox on Linux) are configured +# to use LibreOffice for Excel spreadsheets. +MAXCOLS = 1024 +OUTPUT_EXCEL_DIR = 'output_excel_dir' + + +def annotate_table(table_df, group, annotation_dict): + for gbk_chrome, pro in list(annotation_dict.items()): + ref_pos = list(table_df) + ref_series = pandas.Series(ref_pos) + ref_df = pandas.DataFrame(ref_series.str.split(':', expand=True).values, columns=['reference', 'position']) + all_ref = ref_df[ref_df['reference'] == gbk_chrome] + positions = all_ref.position.to_frame() + # Create an annotation file. + annotation_file = "%s_annotations.csv" % group + with open(annotation_file, "a") as fh: + for _, row in positions.iterrows(): + pos = row.position + try: + aaa = pro.iloc[pro.index.get_loc(int(pos))][['chrom', 'locus', 'product', 'gene']] + try: + chrom, name, locus, tag = aaa.values[0] + print("{}:{}\t{}, {}, {}".format(chrom, pos, locus, tag, name), file=fh) + except ValueError: + # If only one annotation for the entire + # chromosome (e.g., flu) then having [0] fails + chrom, name, locus, tag = aaa.values + print("{}:{}\t{}, {}, {}".format(chrom, pos, locus, tag, name), file=fh) + except KeyError: + print("{}:{}\tNo annotated product".format(gbk_chrome, pos), file=fh) + # Read the annotation file into a data frame. 
+ annotations_df = pandas.read_csv(annotation_file, sep='\t', header=None, names=['index', 'annotations'], index_col='index') + # Remove the annotation_file from disk since both + # cascade and sort tables are built using the file, + # and it is opened for writing in append mode. + os.remove(annotation_file) + # Process the data. + table_df_transposed = table_df.T + table_df_transposed.index = table_df_transposed.index.rename('index') + table_df_transposed = table_df_transposed.merge(annotations_df, left_index=True, right_index=True) + table_df = table_df_transposed.T + return table_df + + +def excel_formatter(json_file_name, excel_file_name, group, annotation_dict): + pandas.io.formats.excel.header_style = None + table_df = pandas.read_json(json_file_name, orient='split') + if annotation_dict is not None: + table_df = annotate_table(table_df, group, annotation_dict) + else: + table_df = table_df.append(pandas.Series(name='no annotations')) + writer = pandas.ExcelWriter(excel_file_name, engine='xlsxwriter') + table_df.to_excel(writer, sheet_name='Sheet1') + writer_book = writer.book + ws = writer.sheets['Sheet1'] + format_a = writer_book.add_format({'bg_color': '#58FA82'}) + format_g = writer_book.add_format({'bg_color': '#F7FE2E'}) + format_c = writer_book.add_format({'bg_color': '#0000FF'}) + format_t = writer_book.add_format({'bg_color': '#FF0000'}) + format_normal = writer_book.add_format({'bg_color': '#FDFEFE'}) + formatlowqual = writer_book.add_format({'font_color': '#C70039', 'bg_color': '#E2CFDD'}) + format_ambigous = writer_book.add_format({'font_color': '#C70039', 'bg_color': '#E2CFDD'}) + format_n = writer_book.add_format({'bg_color': '#E2CFDD'}) + rows, cols = table_df.shape + ws.set_column(0, 0, 30) + ws.set_column(1, cols, 2.1) + ws.freeze_panes(2, 1) + format_annotation = writer_book.add_format({'font_color': '#0A028C', 'rotation': '-90', 'align': 'top'}) + # Set last row. 
+ ws.set_row(rows + 1, cols + 1, format_annotation) + # Make sure that row/column locations don't overlap. + ws.conditional_format(rows - 2, 1, rows - 1, cols, {'type': 'cell', 'criteria': '<', 'value': 55, 'format': formatlowqual}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'cell', 'criteria': '==', 'value': 'B$2', 'format': format_normal}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'text', 'criteria': 'containing', 'value': 'A', 'format': format_a}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'text', 'criteria': 'containing', 'value': 'G', 'format': format_g}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'text', 'criteria': 'containing', 'value': 'C', 'format': format_c}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'text', 'criteria': 'containing', 'value': 'T', 'format': format_t}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'text', 'criteria': 'containing', 'value': 'S', 'format': format_ambigous}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'text', 'criteria': 'containing', 'value': 'Y', 'format': format_ambigous}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'text', 'criteria': 'containing', 'value': 'R', 'format': format_ambigous}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'text', 'criteria': 'containing', 'value': 'W', 'format': format_ambigous}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'text', 'criteria': 'containing', 'value': 'K', 'format': format_ambigous}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'text', 'criteria': 'containing', 'value': 'M', 'format': format_ambigous}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'text', 'criteria': 'containing', 'value': 'N', 'format': format_n}) + ws.conditional_format(2, 1, rows - 2, cols, {'type': 'text', 'criteria': 'containing', 'value': '-', 'format': format_n}) + format_rotation = writer_book.add_format({}) + format_rotation.set_rotation(90) + for column_num, column_name 
in enumerate(list(table_df.columns)): + ws.write(0, column_num + 1, column_name, format_rotation) + format_annotation = writer_book.add_format({'font_color': '#0A028C', 'rotation': '-90', 'align': 'top'}) + # Set last row. + ws.set_row(rows, 400, format_annotation) + writer.save() + + +def get_annotation_dict(gbk_file): + gbk_dict = SeqIO.to_dict(SeqIO.parse(gbk_file, "genbank")) + annotation_dict = {} + tmp_file = "features.csv" + # Create a file of chromosomes and features. + for chromosome in list(gbk_dict.keys()): + with open(tmp_file, 'w+') as fh: + for feature in gbk_dict[chromosome].features: + if "CDS" in feature.type or "rRNA" in feature.type: + try: + product = feature.qualifiers['product'][0] + except KeyError: + product = None + try: + locus = feature.qualifiers['locus_tag'][0] + except KeyError: + locus = None + try: + gene = feature.qualifiers['gene'][0] + except KeyError: + gene = None + fh.write("%s\t%d\t%d\t%s\t%s\t%s\n" % (chromosome, int(feature.location.start), int(feature.location.end), locus, product, gene)) + # Read the chromosomes and features file into a data frame. + df = pandas.read_csv(tmp_file, sep='\t', names=["chrom", "start", "stop", "locus", "product", "gene"]) + # Process the data. + df = df.sort_values(['start', 'gene'], ascending=[True, False]) + df = df.drop_duplicates('start') + pro = df.reset_index(drop=True) + pro.index = pandas.IntervalIndex.from_arrays(pro['start'], pro['stop'], closed='both') + annotation_dict[chromosome] = pro + return annotation_dict + + +def get_base_file_name(file_path): + base_file_name = os.path.basename(file_path) + if base_file_name.find(".") > 0: + # Eliminate the extension. + return os.path.splitext(base_file_name)[0] + elif base_file_name.find("_") > 0: + # The dot extension was likely changed to + # the " character. 
+ items = base_file_name.split("_") + return "_".join(items[0:-1]) + else: + return base_file_name + + +def output_cascade_table(cascade_order, mqdf, group, annotation_dict): + cascade_order_mq = pandas.concat([cascade_order, mqdf], join='inner') + output_table(cascade_order_mq, "cascade", group, annotation_dict) + + +def output_excel(df, type_str, group, annotation_dict, count=None): + # Output the temporary json file that + # is used by the excel_formatter. + if count is None: + if group is None: + json_file_name = "%s_order_mq.json" % type_str + excel_file_name = os.path.join(OUTPUT_EXCEL_DIR, "%s_table.xlsx" % type_str) + else: + json_file_name = "%s_%s_order_mq.json" % (group, type_str) + excel_file_name = os.path.join(OUTPUT_EXCEL_DIR, "%s_%s_table.xlsx" % (group, type_str)) + else: + if group is None: + json_file_name = "%s_order_mq_%d.json" % (type_str, count) + excel_file_name = os.path.join(OUTPUT_EXCEL_DIR, "%s_table_%d.xlsx" % (type_str, count)) + else: + json_file_name = "%s_%s_order_mq_%d.json" % (group, type_str, count) + excel_file_name = os.path.join(OUTPUT_EXCEL_DIR, "%s_%s_table_%d.xlsx" % (group, type_str, count)) + df.to_json(json_file_name, orient='split') + # Output the Excel file. 
+ excel_formatter(json_file_name, excel_file_name, group, annotation_dict) + + +def output_sort_table(cascade_order, mqdf, group, annotation_dict): + sort_df = cascade_order.T + sort_df['abs_value'] = sort_df.index + sort_df[['chrom', 'pos']] = sort_df['abs_value'].str.split(':', expand=True) + sort_df = sort_df.drop(['abs_value', 'chrom'], axis=1) + sort_df.pos = sort_df.pos.astype(int) + sort_df = sort_df.sort_values(by=['pos']) + sort_df = sort_df.drop(['pos'], axis=1) + sort_df = sort_df.T + sort_order_mq = pandas.concat([sort_df, mqdf], join='inner') + output_table(sort_order_mq, "sort", group, annotation_dict) + + +def output_table(df, type_str, group, annotation_dict): + if isinstance(group, str) and group.startswith("dataset"): + # Inputs are single files, not collections, + # so input file names are not useful for naming + # output files. + group_str = None + else: + group_str = group + count = 0 + chunk_start = 0 + chunk_end = 0 + column_count = df.shape[1] + if column_count >= MAXCOLS: + # Here the number of columns is greater than + # the maximum allowed by Excel, so multiple + # outputs will be produced. + while column_count >= MAXCOLS: + count += 1 + chunk_end += MAXCOLS + df_of_type = df.iloc[:, chunk_start:chunk_end] + output_excel(df_of_type, type_str, group_str, annotation_dict, count=count) + chunk_start += MAXCOLS + column_count -= MAXCOLS + count += 1 + df_of_type = df.iloc[:, chunk_start:] + output_excel(df_of_type, type_str, group_str, annotation_dict, count=count) + else: + output_excel(df, type_str, group_str, annotation_dict) + + +def preprocess_tables(task_queue, annotation_dict, timeout): + while True: + try: + tup = task_queue.get(block=True, timeout=timeout) + except queue.Empty: + break + newick_file, json_file, json_avg_mq_file = tup + avg_mq_series = pandas.read_json(json_avg_mq_file, typ='series', orient='split') + # Map quality to dataframe. + mqdf = avg_mq_series.to_frame(name='MQ') + mqdf = mqdf.T + # Get the group. 
+ group = get_base_file_name(newick_file) + snps_df = pandas.read_json(json_file, orient='split') + with open(newick_file, 'r') as fh: + for line in fh: + line = re.sub('[:,]', '\n', line) + line = re.sub('[)(]', '', line) + line = re.sub(r'[0-9].*\.[0-9].*\n', '', line) + line = re.sub('root\n', '', line) + sample_order = line.split('\n') + sample_order = list([_f for _f in sample_order if _f]) + sample_order.insert(0, 'root') + tree_order = snps_df.loc[sample_order] + # Count number of SNPs in each column. + snp_per_column = [] + for column_header in tree_order: + count = 0 + column = tree_order[column_header] + for element in column: + if element != column[0]: + count = count + 1 + snp_per_column.append(count) + row1 = pandas.Series(snp_per_column, tree_order.columns, name="snp_per_column") + # Count number of SNPS from the + # top of each column in the table. + snp_from_top = [] + for column_header in tree_order: + count = 0 + column = tree_order[column_header] + # for each element in the column + # skip the first element + for element in column[1:]: + if element == column[0]: + count = count + 1 + else: + break + snp_from_top.append(count) + row2 = pandas.Series(snp_from_top, tree_order.columns, name="snp_from_top") + tree_order = tree_order.append([row1]) + tree_order = tree_order.append([row2]) + # In pandas=0.18.1 even this does not work: + # abc = row1.to_frame() + # abc = abc.T --> tree_order.shape (5, 18), abc.shape (1, 18) + # tree_order.append(abc) + # Continue to get error: "*** ValueError: all the input arrays must have same number of dimensions" + tree_order = tree_order.T + tree_order = tree_order.sort_values(['snp_from_top', 'snp_per_column'], ascending=[True, False]) + tree_order = tree_order.T + # Remove snp_per_column and snp_from_top rows. + cascade_order = tree_order[:-2] + # Output the cascade table. + output_cascade_table(cascade_order, mqdf, group, annotation_dict) + # Output the sorted table. 
+ output_sort_table(cascade_order, mqdf, group, annotation_dict) + task_queue.task_done() + + +def set_num_cpus(num_files, processes): + num_cpus = int(multiprocessing.cpu_count()) + if num_files < num_cpus and num_files < processes: + return num_files + if num_cpus < processes: + half_cpus = int(num_cpus / 2) + if num_files < half_cpus: + return num_files + return half_cpus + return processes + + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + + parser.add_argument('--input_avg_mq_json', action='store', dest='input_avg_mq_json', required=False, default=None, help='Average MQ json file') + parser.add_argument('--input_newick', action='store', dest='input_newick', required=False, default=None, help='Newick file') + parser.add_argument('--input_snps_json', action='store', dest='input_snps_json', required=False, default=None, help='SNPs json file') + parser.add_argument('--gbk_file', action='store', dest='gbk_file', required=False, default=None, help='Optional gbk file'), + parser.add_argument('--processes', action='store', dest='processes', type=int, help='User-selected number of processes to use for job splitting') + + args = parser.parse_args() + + if args.gbk_file is not None: + # Create the annotation_dict for annotating + # the Excel tables. + annotation_dict = get_annotation_dict(args.gbk_file) + else: + annotation_dict = None + + # The assumption here is that the list of files + # in both INPUT_NEWICK_DIR and INPUT_JSON_DIR are + # named such that they are properly matched if + # the directories contain more than 1 file (i.e., + # hopefully the newick file names and json file names + # will be something like Mbovis-01D6_* so they can be + # sorted and properly associated with each other). 
+ if args.input_newick is not None: + newick_files = [args.input_newick] + else: + newick_files = [] + for file_name in sorted(os.listdir(INPUT_NEWICK_DIR)): + file_path = os.path.abspath(os.path.join(INPUT_NEWICK_DIR, file_name)) + newick_files.append(file_path) + if args.input_snps_json is not None: + json_files = [args.input_snps_json] + else: + json_files = [] + for file_name in sorted(os.listdir(INPUT_JSON_DIR)): + file_path = os.path.abspath(os.path.join(INPUT_JSON_DIR, file_name)) + json_files.append(file_path) + if args.input_avg_mq_json is not None: + json_avg_mq_files = [args.input_avg_mq_json] + else: + json_avg_mq_files = [] + for file_name in sorted(os.listdir(INPUT_JSON_AVG_MQ_DIR)): + file_path = os.path.abspath(os.path.join(INPUT_JSON_AVG_MQ_DIR, file_name)) + json_avg_mq_files.append(file_path) + + multiprocessing.set_start_method('spawn') + queue1 = multiprocessing.JoinableQueue() + queue2 = multiprocessing.JoinableQueue() + num_files = len(newick_files) + cpus = set_num_cpus(num_files, args.processes) + # Set a timeout for get()s in the queue. + timeout = 0.05 + + for i, newick_file in enumerate(newick_files): + json_file = json_files[i] + json_avg_mq_file = json_avg_mq_files[i] + queue1.put((newick_file, json_file, json_avg_mq_file)) + + # Complete the preprocess_tables task. + processes = [multiprocessing.Process(target=preprocess_tables, args=(queue1, annotation_dict, timeout, )) for _ in range(cpus)] + for p in processes: + p.start() + for p in processes: + p.join() + queue1.join() + + if queue1.empty(): + queue1.close() + queue1.join_thread()
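The column splitting performed by `output_table` above can be checked in isolation by computing only the slice bounds. This is a minimal sketch, assuming the same `MAXCOLS` limit of 1024; `chunk_bounds` is a hypothetical helper that is not part of the changeset.

```python
MAXCOLS = 1024


def chunk_bounds(column_count, max_cols=MAXCOLS):
    """Return the (start, stop) column slices that output_table would write."""
    if column_count < max_cols:
        # Everything fits into a single spreadsheet.
        return [(0, column_count)]
    bounds = []
    start = 0
    remaining = column_count
    while remaining >= max_cols:
        bounds.append((start, start + max_cols))
        start += max_cols
        remaining -= max_cols
    # Trailing chunk, equivalent to df.iloc[:, chunk_start:].
    bounds.append((start, column_count))
    return bounds


print(chunk_bounds(10))
print(chunk_bounds(2500))
print(chunk_bounds(2048))
```

Under this reading of the loop, a column count that is an exact multiple of `MAXCOLS` ends with an empty trailing slice, such as `(2048, 2048)`, which would appear to produce one extra spreadsheet with no data columns.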
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/vsnp_determine_ref_from_data.py Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,231 @@ +#!/usr/bin/env python + +import argparse +import gzip +import os +from collections import OrderedDict + +import yaml +from Bio.SeqIO.QualityIO import FastqGeneralIterator + +OUTPUT_DBKEY_DIR = 'output_dbkey' +OUTPUT_METRICS_DIR = 'output_metrics' + + +def get_base_file_name(file_path): + base_file_name = os.path.basename(file_path) + if base_file_name.find(".") > 0: + # Eliminate the extension. + return os.path.splitext(base_file_name)[0] + elif base_file_name.find("_fq") > 0: + # The "." character has likely + # changed to an "_" character. + return base_file_name.split("_fq")[0] + elif base_file_name.find("_fastq") > 0: + return base_file_name.split("_fastq")[0] + return base_file_name + + +def get_dbkey(dnaprints_dict, key, s): + # dnaprints_dict looks something like this: + # {'brucella': {'NC_002945v4': ['11001110', '11011110', '11001100']} + # {'bovis': {'NC_006895': ['11111110', '00010010', '01111011']}} + d = dnaprints_dict.get(key, {}) + for data_table_value, v_list in d.items(): + if s in v_list: + return data_table_value + return "" + + +def get_dnaprints_dict(dnaprint_fields): + # A dnaprint_fields entry looks something like this: + # [['AF2122', '/galaxy/tool-data/vsnp/AF2122/dnaprints/NC_002945v4.yml']] + dnaprints_dict = {} + for item in dnaprint_fields: + # Here item is a 2-element list of data + # table components: value and path. 
+ value = item[0] + path = item[1].strip() + with open(path, "rt") as fh: + # The format of all dnaprints yaml + # files is something like this: + # brucella: + # - 0111111111111111 + print_dict = yaml.load(fh, Loader=yaml.Loader) + for print_dict_k, print_dict_v in print_dict.items(): + dnaprints_v_dict = dnaprints_dict.get(print_dict_k, {}) + if len(dnaprints_v_dict) > 0: + # dnaprints_dict already contains k (e.g., 'brucella', + # and dnaprints_v_dict will be a dictionary # that + # looks something like this: + # {'NC_002945v4': ['11001110', '11011110', '11001100']} + value_list = dnaprints_v_dict.get(value, []) + value_list = value_list + print_dict_v + dnaprints_v_dict[value] = value_list + else: + # dnaprints_v_dict is an empty dictionary. + dnaprints_v_dict[value] = print_dict_v + dnaprints_dict[print_dict_k] = dnaprints_v_dict + # dnaprints_dict looks something like this: + # {'brucella': {'NC_002945v4': ['11001110', '11011110', '11001100']} + # {'bovis': {'NC_006895': ['11111110', '00010010', '01111011']}} + return dnaprints_dict + + +def get_group_and_dbkey(dnaprints_dict, brucella_string, brucella_sum, bovis_string, bovis_sum, para_string, para_sum): + if brucella_sum > 3: + group = "Brucella" + dbkey = get_dbkey(dnaprints_dict, "brucella", brucella_string) + elif bovis_sum > 3: + group = "TB" + dbkey = get_dbkey(dnaprints_dict, "bovis", bovis_string) + elif para_sum >= 1: + group = "paraTB" + dbkey = get_dbkey(dnaprints_dict, "para", para_string) + else: + group = "" + dbkey = "" + return group, dbkey + + +def get_oligo_dict(): + oligo_dict = {} + oligo_dict["01_ab1"] = "AATTGTCGGATAGCCTGGCGATAACGACGC" + oligo_dict["02_ab3"] = "CACACGCGGGCCGGAACTGCCGCAAATGAC" + oligo_dict["03_ab5"] = "GCTGAAGCGGCAGACCGGCAGAACGAATAT" + oligo_dict["04_mel"] = "TGTCGCGCGTCAAGCGGCGTGAAATCTCTG" + oligo_dict["05_suis1"] = "TGCGTTGCCGTGAAGCTTAATTCGGCTGAT" + oligo_dict["06_suis2"] = "GGCAATCATGCGCAGGGCTTTGCATTCGTC" + oligo_dict["07_suis3"] = "CAAGGCAGATGCACATAATCCGGCGACCCG" + 
oligo_dict["08_ceti1"] = "GTGAATATAGGGTGAATTGATCTTCAGCCG" + oligo_dict["09_ceti2"] = "TTACAAGCAGGCCTATGAGCGCGGCGTGAA" + oligo_dict["10_canis4"] = "CTGCTACATAAAGCACCCGGCGACCGAGTT" + oligo_dict["11_canis"] = "ATCGTTTTGCGGCATATCGCTGACCACAGC" + oligo_dict["12_ovis"] = "CACTCAATCTTCTCTACGGGCGTGGTATCC" + oligo_dict["13_ether2"] = "CGAAATCGTGGTGAAGGACGGGACCGAACC" + oligo_dict["14_63B1"] = "CCTGTTTAAAAGAATCGTCGGAACCGCTCT" + oligo_dict["15_16M0"] = "TCCCGCCGCCATGCCGCCGAAAGTCGCCGT" + oligo_dict["16_mel1b"] = "TCTGTCCAAACCCCGTGACCGAACAATAGA" + oligo_dict["17_tb157"] = "CTCTTCGTATACCGTTCCGTCGTCACCATGGTCCT" + oligo_dict["18_tb7"] = "TCACGCAGCCAACGATATTCGTGTACCGCGACGGT" + oligo_dict["19_tbbov"] = "CTGGGCGACCCGGCCGACCTGCACACCGCGCATCA" + oligo_dict["20_tb5"] = "CCGTGGTGGCGTATCGGGCCCCTGGATCGCGCCCT" + oligo_dict["21_tb2"] = "ATGTCTGCGTAAAGAAGTTCCATGTCCGGGAAGTA" + oligo_dict["22_tb3"] = "GAAGACCTTGATGCCGATCTGGGTGTCGATCTTGA" + oligo_dict["23_tb4"] = "CGGTGTTGAAGGGTCCCCCGTTCCAGAAGCCGGTG" + oligo_dict["24_tb6"] = "ACGGTGATTCGGGTGGTCGACACCGATGGTTCAGA" + oligo_dict["25_para"] = "CCTTTCTTGAAGGGTGTTCG" + oligo_dict["26_para_sheep"] = "CGTGGTGGCGACGGCGGCGGGCCTGTCTAT" + oligo_dict["27_para_cattle"] = "TCTCCTCGGTCGGTGATTCGGGGGCGCGGT" + return oligo_dict + + +def get_seq_counts(value, fastq_list, gzipped): + count = 0 + for fastq_file in fastq_list: + if gzipped: + with gzip.open(fastq_file, 'rt') as fh: + for title, seq, qual in FastqGeneralIterator(fh): + count += seq.count(value) + else: + with open(fastq_file, 'r') as fh: + for title, seq, qual in FastqGeneralIterator(fh): + count += seq.count(value) + return(value, count) + + +def get_species_counts(fastq_list, gzipped): + count_summary = {} + oligo_dict = get_oligo_dict() + for v1 in oligo_dict.values(): + returned_value, count = get_seq_counts(v1, fastq_list, gzipped) + for key, v2 in oligo_dict.items(): + if returned_value == v2: + count_summary.update({key: count}) + count_list = [] + for v in count_summary.values(): + 
count_list.append(v) + brucella_sum = sum(count_list[:16]) + bovis_sum = sum(count_list[16:24]) + para_sum = sum(count_list[24:]) + return count_summary, count_list, brucella_sum, bovis_sum, para_sum + + +def get_species_strings(count_summary): + binary_dictionary = {} + for k, v in count_summary.items(): + if v > 1: + binary_dictionary.update({k: 1}) + else: + binary_dictionary.update({k: 0}) + binary_dictionary = OrderedDict(sorted(binary_dictionary.items())) + binary_list = [] + for v in binary_dictionary.values(): + binary_list.append(v) + brucella_binary = binary_list[:16] + brucella_string = ''.join(str(e) for e in brucella_binary) + bovis_binary = binary_list[16:24] + bovis_string = ''.join(str(e) for e in bovis_binary) + para_binary = binary_list[24:] + para_string = ''.join(str(e) for e in para_binary) + return brucella_string, bovis_string, para_string + + +def output_dbkey(file_name, dbkey, output_file): + # Output the dbkey. + with open(output_file, "w") as fh: + fh.write("%s" % dbkey) + + +def output_files(fastq_file, count_list, group, dbkey, dbkey_file, metrics_file): + base_file_name = get_base_file_name(fastq_file) + output_dbkey(base_file_name, dbkey, dbkey_file) + output_metrics(base_file_name, count_list, group, dbkey, metrics_file) + + +def output_metrics(file_name, count_list, group, dbkey, output_file): + # Output the metrics. 
+ with open(output_file, "w") as fh: + fh.write("Sample: %s\n" % file_name) + fh.write("Brucella counts: ") + for i in count_list[:16]: + fh.write("%d," % i) + fh.write("\nTB counts: ") + for i in count_list[16:24]: + fh.write("%d," % i) + fh.write("\nPara counts: ") + for i in count_list[24:]: + fh.write("%d," % i) + fh.write("\nGroup: %s" % group) + fh.write("\ndbkey: %s\n" % dbkey) + + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + + parser.add_argument('--dnaprint_fields', action='append', dest='dnaprint_fields', nargs=2, required=False, default=None, help="List of dnaprints data table value, name and path fields") + parser.add_argument('--read1', action='store', dest='read1', required=True, default=None, help='Required: single read') + parser.add_argument('--read2', action='store', dest='read2', required=False, default=None, help='Optional: paired read') + parser.add_argument('--gzipped', action='store_true', dest='gzipped', default=False, help='Input files are gzipped') + parser.add_argument('--output_dbkey', action='store', dest='output_dbkey', required=True, default=None, help='Output reference file') + parser.add_argument('--output_metrics', action='store', dest='output_metrics', required=True, default=None, help='Output metrics file') + + args = parser.parse_args() + + fastq_list = [args.read1] + if args.read2 is not None: + fastq_list.append(args.read2) + + # The value of dnaprint_fields is a list of lists, where each list is + # the [value, name, path] components of the vsnp_dnaprints data table. + # The data_manager_vsnp_dnaprints tool assigns the dbkey column from the + # all_fasta data table to the value column in the vsnp_dnaprints data + # table to ensure a proper mapping for discovering the dbkey. + dnaprints_dict = get_dnaprints_dict(args.dnaprint_fields) + + # Here fastq_list consists of either a single read + # or a set of paired reads, producing single outputs. 
+ count_summary, count_list, brucella_sum, bovis_sum, para_sum = get_species_counts(fastq_list, args.gzipped) + brucella_string, bovis_string, para_string = get_species_strings(count_summary) + group, dbkey = get_group_and_dbkey(dnaprints_dict, brucella_string, brucella_sum, bovis_string, bovis_sum, para_string, para_sum) + output_files(args.read1, count_list, group, dbkey, dbkey_file=args.output_dbkey, metrics_file=args.output_metrics)
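The counting, binarization, and lookup steps of the script above can be tried on in-memory data. This is a minimal sketch, assuming toy oligo sequences and reads held in a list rather than FASTQ files; `count_oligos`, `dna_print`, `lookup_dbkey`, and every value in the sample data are hypothetical and not part of the changeset.

```python
def count_oligos(oligos, reads):
    """Sum the occurrences of each oligo across all reads (as get_seq_counts does)."""
    return {name: sum(read.count(seq) for read in reads)
            for name, seq in oligos.items()}


def dna_print(counts):
    """Build the 0/1 string, ordered by oligo name (as get_species_strings does).

    A position is 1 only when the count is greater than 1,
    matching the `v > 1` test in the script.
    """
    return ''.join('1' if counts[name] > 1 else '0' for name in sorted(counts))


def lookup_dbkey(prints_by_dbkey, print_string):
    """Return the first dbkey whose list contains print_string, else ''."""
    for dbkey, prints in prints_by_dbkey.items():
        if print_string in prints:
            return dbkey
    return ""


# Toy data, invented for illustration only.
oligos = {"01_a": "ACGT", "02_b": "GGCC", "03_c": "TTAA"}
reads = ["ACGTACGT", "TTGGCCAA", "ACGTTT"]
prints_by_dbkey = {"toy_ref_1": ["100", "110"], "toy_ref_2": ["011"]}

counts = count_oligos(oligos, reads)
print(counts)
print(dna_print(counts))
print(lookup_dbkey(prints_by_dbkey, dna_print(counts)))
```

In this toy run, the oligo that is seen exactly once still contributes a 0 to the print, because the threshold is "more than one hit" rather than "at least one hit".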
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/vsnp_determine_ref_from_data.xml Wed Dec 02 09:11:24 2020 +0000 @@ -0,0 +1,154 @@ +<tool id="vsnp_determine_ref_from_data" name="vSNP: determine reference" version="1.0.0"> + <description>from input data</description> + <macros> + <import>macros.xml</import> + </macros> + <requirements> + <requirement type="package" version="1.76">biopython</requirement> + <requirement type="package" version="5.3">pyyaml</requirement> + </requirements> + <command detect_errors="exit_code"><![CDATA[ +#import re +#set gzipped = 'false' +#set input_type = $input_type_cond.input_type + +#if $input_type in ["single", "pair"]: + #set read1 = $input_type_cond.read1 + #set read1_identifier = re.sub('[^\s\w\-]', '_', str($read1.element_identifier)) + ln -s '${read1}' '${read1_identifier}' && + #if $input_type == "pair": + #set read2 = $input_type_cond.read2 + #set read2_identifier = re.sub('[^\s\w\-]', '_', str($read2.element_identifier)) + ln -s '${read2}' '${read2_identifier}' && + #else: + #set read2 = None + #end if +#else: + #set read1 = $input_type_cond.reads_collection['forward'] + #set read1_identifier = re.sub('[^\s\w\-]', '_', str($read1.element_identifier)) + ln -s '${read1}' '${read1_identifier}' && + #set read2 = $input_type_cond.reads_collection['reverse'] + #set read2_identifier = re.sub('[^\s\w\-]', '_', str($read2.element_identifier)) + ln -s '${read2}' '${read2_identifier}' && +#end if + +python '$__tool_directory__/vsnp_determine_ref_from_data.py' + --read1 '${read1_identifier}' + #if $read2 is not None + --read2 '${read2_identifier}' + #end if + --output_dbkey '$output_dbkey' + --output_metrics '$output_metrics' +#if $read1.is_of_type('fastqsanger.gz'): + --gzipped +#end if +#set $dnaprint_fields = $__app__.tool_data_tables['vsnp_dnaprints'].get_fields() +#for $i in $dnaprint_fields: + --dnaprint_fields '${i[0]}' '${i[2]}' +#end for +]]></command> + <inputs> + <conditional name="input_type_cond"> + <param 
name="input_type" type="select" label="Choose the category of the files to be analyzed"> + <option value="single" selected="true">Single files</option> + <option value="paired">Paired reads</option> + <option value="pair">Paired reads in separate data sets</option> + </param> + <when value="single"> + <param name="read1" type="data" format="fastqsanger.gz,fastqsanger" label="Read1 fastq file"/> + </when> + <when value="paired"> + <param name="reads_collection" type="data_collection" format="fastqsanger,fastqsanger.gz" collection_type="paired" label="Collection of fastqsanger paired read files"/> + </when> + <when value="pair"> + <param name="read1" type="data" format="fastqsanger.gz,fastqsanger" label="Read1 fastq file"/> + <param name="read2" type="data" format="fastqsanger.gz,fastqsanger" label="Read2 fastq file"/> + </when> + </conditional> + </inputs> + <outputs> + <data name="output_dbkey" format="txt" label="${tool.name} (dbkey) on ${on_string}"/> + <data name="output_metrics" format="txt" label="${tool.name} (metrics) on ${on_string}"/> + </outputs> + <tests> + <!-- 1 single read --> + <test expect_num_outputs="2"> + <param name="input_type" value="single"/> + <param name="read1" value="Mcap_Deer_DE_SRR650221.fastq.gz" ftype="fastqsanger.gz"/> + <output name="output_dbkey" file="output_dbkey.txt" ftype="txt"/> + <output name="output_metrics" file="output_metrics.txt" ftype="txt"/> + </test> + <!-- 1 set of paired reads --> + <test expect_num_outputs="2"> + <param name="input_type" value="pair"/> + <param name="read1" value="forward.fastq.gz" ftype="fastqsanger.gz"/> + <param name="read2" value="reverse.fastq.gz" ftype="fastqsanger.gz"/> + <output name="output_dbkey" file="paired_dbkey.txt" ftype="txt"/> + <output name="output_metrics" file="paired_metrics.txt" ftype="txt"/> + </test> + <!-- A collection of paired reads --> + <test expect_num_outputs="2"> + <param name="input_type" value="paired"/> + <param name="reads_collection"> + <collection 
type="paired"> + <element name="forward" value="forward.fastq.gz" ftype="fastqsanger.gz"/> + <element name="reverse" value="reverse.fastq.gz" ftype="fastqsanger.gz"/> + </collection> + </param> + <output name="output_dbkey" file="paired_dbkey.txt" ftype="txt"/> + <output name="output_metrics" file="paired_metrics.txt" ftype="txt"/> + </test> + </tests> + <help> +**What it does** + +Accepts a single fastqsanger read, a set of paired reads, or a collection of single or paired reads (bacterial samples) and +inspects the data to discover the best reference genome for aligning the reads. + +The information needed to discover the best reference is maintained by the USDA in this repository_. References are currently +limited to TB complex, paraTB, and Brucella, but information for additional references will be added. The information for each +reference is a string consisting of zeros and ones, compiled by USDA researchers, which we call a "DNA print". These strings +are maintained in yaml files for use in Galaxy, +and are installed via the **vSNP DNAprints data manager** tool. + +.. _repository: https://github.com/USDA-VS/vSNP_reference_options + +This tool creates an in-memory dictionary of these DNA print strings for matching with a string generated by inspecting the +input sample data. During inspection, this tool accrues sequence counts for supported species, ultimately generating a string +consisting of zeros and ones based on the counts (i.e., a DNA print). This string is then compared to the strings contained +in the in-memory dictionary of DNA prints to find a match. + +The strings in the in-memory dictionary are each associated with a Galaxy "dbkey" (i.e., genome build), so when a match is found, +the associated "dbkey" is passed to a mapper (e.g., **Map with BWA-MEM**), typically within a workflow via an expression tool, +to align the reads to the associated reference. 
+ +This tool produces 2 text files, a "dbkey" file that contains the dbkey string and a "metrics" file that provides information +about the sequence counts that were discovered in the input sample data that produced the "DNA print" string. + +This tool is important for samples containing bacterial species because many of the samples have a "mixed bag" of species, +and discovering the primary species is critical. DNA print matching is currently supported for the following genomes: + + * Mycobacterium bovis AF2122/97 + * Brucella abortus bv. 1 str. 9-941 + * Brucella abortus strain BER + * Brucella canis ATCC 23365 + * Brucella ceti TE10759-12 + * Brucella melitensis bv. 1 str. 16M + * Brucella melitensis bv. 3 str. Ether + * Brucella melitensis BwIM_SOM_36b + * Brucella melitensis ATCC 23457 + * Brucella ovis ATCC 25840 + * Brucella suis 1330 + * Mycobacterium tuberculosis H37Rv + * Mycobacterium avium subsp. paratuberculosis strain Telford + * Mycobacterium avium subsp. paratuberculosis K-10 + * Brucella suis ATCC 23445 + * Brucella suis bv. 3 str. 686 + +**Required Options** + + * **Choose the category of the files to be analyzed** - select "Single files", "Paired reads", or "Paired reads in separate data sets", then select the appropriate history items (a single fastqsanger read, a paired collection of fastqsanger reads, or two separate fastqsanger reads) based on the selected option. + </help> + <expand macro="citations"/> +</tool> +
