Previous changeset 5:aa156d61c38c (2016-08-11) | Next changeset 7:61bd336c50c2 (2016-08-12)
Commit message:
Deleted selected files
removed:
calc_fitness.py
calc_fitness.xml
diff -r aa156d61c38c -r 6693daf10224 calc_fitness.py
--- a/calc_fitness.py Thu Aug 11 18:34:42 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,553 +0,0 @@\n-# A translation of calc_fitness.pl into python! For analysis of Tn-Seq.\n-# This script requires BioPython, which in turn has a good number of dependencies (some optional but very helpful).\n-# How to install BioPython and a list of its dependencies can be found here: http://biopython.org/DIST/docs/install/Installation.html\n-# To see what future edits / tests are planned for this script, search for the phrase "in the future".\n-\n-\n-\n-\n-\n-\n-\n-\n-\n-\n-##### ARGUMENTS #####\n-\n-def print_usage():\n-\tprint "\\n" + "You are missing one or more required flags. A complete list of flags accepted by calc_fitness is as follows:" + "\\n\\n"\n-\tprint "\\033[1m" + "Required" + "\\033[0m" + "\\n"\n-\tprint "-ref" + "\\t\\t" + "The name of the reference genome file, in GenBank format." + "\\n"\n-\tprint "-t1" + "\\t\\t" + "The name of the bowtie mapfile from time 1." + "\\n"\n-\tprint "-t2" + "\\t\\t" + "The name of the bowtie mapfile from time 2." + "\\n"\n-\tprint "-out" + "\\t\\t" + "Name of a file to enter the .csv output." + "\\n"\n-\tprint "\\n"\n-\tprint "\\033[1m" + "Optional" + "\\033[0m" + "\\n"\n-\tprint "-expansion" + "\\t\\t" + "Expansion factor (default: 250)" + "\\n"\n-\tprint "-d" + "\\t\\t" + "All reads being analyzed are downstream of the transposon" + "\\n"\n-\tprint "-reads1" + "\\t\\t" + "The number of reads to be used to calculate the correction factor for time 0." + "\\n\\t\\t" + "(default counted from bowtie output)" + "\\n"\n-\tprint "-reads2" + "\\t\\t" + "The number of reads to be used to calculate the correction factor for time 6." 
+ "\\n\\t\\t" + "(default counted from bowtie output)" + "\\n"\n-\tprint "-cutoff" + "\\t\\t" + "Discard any positions where the average of counted transcripts at time 0 and time 1 is below this number (default 0)" + "\\n"\n-\tprint "-cutoff2" + "\\t\\t" + "Discard any positions within the normalization genes where the average of counted transcripts at time 0 and time 1 is below this number (default 0)" + "\\n"\n-\tprint "-strand" + "\\t\\t" + "Use only the specified strand (+ or -) when counting transcripts (default: both)" + "\\n"\n-\tprint "-normalize" + "\\t" + "A file that contains a list of genes that should have a fitness of 1" + "\\n"\n-\tprint "-maxweight" + "\\t" + "The maximum weight a transposon gene can have in normalization calculations" + "\\n"\n-\tprint "-multiply" + "\\t" + "Multiply all fitness scores by a certain value (e.g., the fitness of a knockout). You should normalize the data." + "\\n"\n-\tprint "-ef" + "\\t\\t" + "Exclude insertions that occur in the first N amount (%) of gene--becuase may not affect gene function." + "\\n"\n-\tprint "-el" + "\\t\\t" + "Exclude insertions in the last N amount (%) of the gene--considering truncation may not affect gene function." + "\\n"\n-\tprint "-wig" + "\\t\\t" + "Create a wiggle file for viewing in a genome browser. Provide a filename." 
+ "\\n"\n-\tprint "\\n"\n-\n-import argparse \n-parser = argparse.ArgumentParser()\n-parser.add_argument("-ref", action="store", dest="ref_genome")\n-parser.add_argument("-t1", action="store", dest="mapfile1")\n-parser.add_argument("-t2", action="store", dest="mapfile2")\n-parser.add_argument("-out", action="store", dest="outfile")\n-parser.add_argument("-out2", action="store", dest="outfile2")\n-parser.add_argument("-expansion", action="store", dest="expansion_factor")\n-parser.add_argument("-d", action="store", dest="downstream")\n-parser.add_argument("-reads1", action="store", dest="reads1")\n-parser.add_argument("-reads2", action="store", dest="reads2")\n-parser.add_argument("-cutoff", action="store", dest="cutoff")\n-parser.add_argument("-cutoff2", action="store", dest="cutoff2")\n-parser.add_argument("-strand", action="store", dest="usestrand")\n-parser.add_argument("-normalize", action="store", dest="normalize")\n-parser.add_argument("-maxweight", action="store", dest="max_weight")\n-parser.add_argument("-multiply", action="store", dest="multiply")\n-parser.add_argument("-ef", action="store", dest="exclude_first")\n-parser.add_argument("-el", action="store", dest="exclude_last")\n-parser.add_arg'..b'rtion within the transposon genes weighted by how many insertions each had.\n-\n-\taverage = sum / count\n-\ti = 0\n-\tweighted_sum = 0\n-\tweight_sum = 0\n-\twhile i < len(weights):\n-\t\tweighted_sum += weights[i]*scores[i]\n-\t\tweight_sum += weights[i]\n-\t\ti += 1\n-\tweighted_average = weighted_sum/weight_sum\n- \n-# Prints the regular average, weighted average, and total insertions for reference\n- \n-\tprint "Normalization step:" + "\\n"\n-\tprint "Regular average: " + str(average) + "\\n"\n-\tprint "Weighted Average: " + str(weighted_average) + "\\n"\n-\tprint "Total Insertions: " + str(count) + "\\n"\n- \n-# The actual normalization happens here; every fitness score is divided by the average fitness found for genes that should have a value of 1. 
\n-# For example, if the average fitness for genes was too low overall - let\'s say 0.97 within the normalization geness - every fitness would be proportionally raised.\n-\n-\told_ws = 0\n-\tnew_ws = 0\n-\twcount = 0\n-\tfor list in results:\n-\t\tif list[11] == \'W\':\n-\t\t\tcontinue\n-\t\tnew_w = float(list[11])/weighted_average\n-\t\t\n-# Sometimes you want to multiply all the fitness values by a constant; this does that.\n-# For example you might multiply all the values by a constant for a genetic interaction screen - where Tn-Seq is performed as usual except there\'s one background knockout all the mutants share. This is\n-# because independent mutations should have a fitness value that\'s equal to their individual fitness values multipled, but related mutations will deviate from that; to find those deviations you\'d multiply\n-# all the fitness values from mutants from a normal library by the fitness of the background knockout and compare that to the fitness values found from the knockout library!\n-\t\t\n-\t\tif arguments.multiply:\n-\t\t\tnew_w *= float(arguments.multiply)\n-\t\t\n-# Records the old w score for reference, and adds it to a total sum of all w scores (so that the old w mean and new w mean can be printed later).\n-\t\t\n-\t\tif float(list[11]) > 0:\n-\t\t\told_ws += float(list[11])\n-\t\t\tnew_ws += new_w\n-\t\t\twcount += 1\n-\t\t\t\n-# Writes the new w score into the results list of lists.\n-\t\t\t\n-\t\tlist[12] = new_w\n-\t\t\n-# Adds a line to wiglist for each insertion position, with the insertion position and its new w value.\n-\t\t\n-\t\tif (arguments.wig):\n-\t\t\twigstring += str(list[0]) + " " + str(new_w) + "\\n"\n- \n-# Prints the old w mean and new w mean for reference.\n- \n-\told_w_mean = old_ws / wcount\n-\tnew_w_mean = new_ws / wcount\n-\tprint "Old W Average: " + str(old_w_mean) + "\\n"\n-\tprint "New W Average: " + str(new_w_mean) + "\\n"\n-\n-# Overwrites the old file with the normalized file.\n-\n-with 
open(arguments.outfile, "wb") as csvfile:\n- writer = csv.writer(csvfile)\n- writer.writerows(results)\n- \n-# If a WIG file was requested, actually creates the WIG file and writes wiglist to it\n-# So what\'s written here is the WIG header plus each insertion position and it\'s new w value if normalization were called for, and each insertion position and its unnormalized w value if normalization were not called for.\n-\t\t\n-if (arguments.wig):\n-\tif (arguments.normalize):\n-\t\twith open(arguments.wig, "wb") as wigfile:\n-\t\t\twigfile.write(wigstring)\n-\telse:\n-\t\tfor list in results:\n-\t\t\twigstring += str(list[0]) + " " + str(list[11]) + "\\n"\n-\t\twith open(arguments.wig, "wb") as wigfile:\n-\t\t\t\twigfile.write(wigstring)\t\t\t\t\t\n-\n-\n-\n-\n-\n-#FITOPSpy="-ef .0 -el .10 -cutoff 0"\n-#\n-#python ../script/calc_fitness.py $FITOPSpy -wig gview/py_L1_2394eVI_Gluc.wig -t1 alignments/L1_2394eVI_Input.map -t2 alignments/L1_2394eVI_Gluc_T2.map -ref=NC_003028b2.gbk -out results/py_L1_2394eVI_Gluc.csv -out2 results/py_2_L1_2394eVI_Gluc.txt -expansion 675 -normalize tigr4_normal.txt\n-#python ../script/calc_fitness.py $FITOPSpy -wig gview/py_L3_2394eVI_Gluc.wig -t1 alignments/L3_2394eVI_Input.map -t2 alignments/L3_2394eVI_Gluc_T2.map -ref=NC_003028b2.gbk -out results/py_L3_2394eVI_Gluc.csv -out2 results/py_2_L3_2394eVI_Gluc.txt -expansion 244 -normalize tigr4_normal.txt\n\\ No newline at end of file\n' |
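The normalization step in the removed script divides every fitness score by a weighted average of the scores found in the normalization genes, then optionally multiplies by a constant. A minimal Python 3 sketch of that arithmetic follows. The function names are hypothetical, and the way the weight cap is applied (a simple `min` against `max_weight`) is an assumption, because the part of the script that builds `weights` is elided in this diff. The default of 75 is taken from the `maxweight` parameter in calc_fitness.xml.

```python
def weighted_average(scores, weights, max_weight=75.0):
    """Weighted mean of fitness scores from the normalization genes.

    ASSUMPTION: each weight is capped with min(w, max_weight); the removed
    script's own capping code is not visible in this diff.
    """
    capped = [min(w, max_weight) for w in weights]
    weight_sum = sum(capped)
    if weight_sum == 0:
        raise ValueError("no insertions available for normalization")
    weighted_sum = sum(w * s for w, s in zip(capped, scores))
    return weighted_sum / weight_sum


def normalize(fitnesses, scores, weights, max_weight=75.0, multiply=None):
    """Divide each fitness by the weighted average; optionally rescale.

    `multiply` mirrors the -multiply flag: a constant applied after
    normalization (hypothetical parameter name).
    """
    avg = weighted_average(scores, weights, max_weight)
    normalized = []
    for w in fitnesses:
        new_w = w / avg
        if multiply is not None:
            new_w *= multiply
        normalized.append(new_w)
    return normalized
```

With an average of 0.97 in the normalization genes, as in the script's own comment, a raw fitness of 0.97 is raised to 1.0 and every other score is raised proportionally.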
diff -r aa156d61c38c -r 6693daf10224 calc_fitness.xml
--- a/calc_fitness.xml Thu Aug 11 18:34:42 2016 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,159 +0,0 @@\n-<tool id="calc_fitness" name="Calculate Fitnesses">\n- <description>of transposon insertion locations</description>\n- <requirements>\n- <requirement type="package" version="1.64">biopython</requirement>\n- </requirements>\n- <command interpreter="python">\n- calc_fitness.py \n- -ef $ef \n- -el $el \n- -wig $output3 \n- -t1 $t1 \n- -t2 $t2 \n- -ref $ref \n- -out $output \n- -out2 $output2 \n- -expansion $expansion\n- -maxweight $maxweight\n- -cutoff $cutoff\n- -cutoff2 $cutoff2\n- -strand $strand\n- #if $normalization.calculations == "yes":\n- -normalize $normalization.genes\n- #end if\n- #if $multiply.choice == "yes":\n- -multiply $multiply.factor\n- #end if\n- #if $reads.downstream == "yes":\n- -d 1\n- #end if\n- #if $reads1.choice == "yes":\n- -reads1 $reads1.number\n- #end if\n- #if $reads2.choice == "yes":\n- -reads1 $read1.number\n- #end if\n- </command>\n- <inputs>\n- <param name="t1" type="data" label="Map files from t1"/>\n- <param name="t2" type="data" label="Map files from t2"/>\n- <param name="ref" type="data" label="GenBank reference genome"/>\n- <conditional name="normalization">\n- <param name="calculations" type="select" label="Normalize fitness calculations?">\n- <option value="no">No</option>\n- <option value="yes">Yes</option>\n- </param>\n- <when value="no">\n- <!-- do nothing -->\n- </when>\n- <when value="yes"> \n- <param name="genes" type="data" label="Genes to normalize by" />\n- </when>\n- </conditional>\n- <param name="strand" type="select" label="Use reads from which strands?">\n- <option value="both">both</option>\n- <option value="+">Watson (+)</option>\n- <option value="-">Crick (-)</option>\n- </param>\n- <param name="expansion" type="float" value="250" label="Expansion factor"/>\n- <param name="cutoff" type="float" value="0.0" label="Cutoff"/>\n- <param name="cutoff2" type="float" value="0.0" label="Cutoff2"/>\n- <param name="ef" type="float" value="0.0" label="Exclude first %"/>\n- <param name="el" type="float" 
value="0.0" label="Exclude last %"/>\n- <param name="maxweight" type="float" value="75" label="Maximum weight of a transposon gene in normalization calculations"/>\n- <conditional name="multiply">\n- <param name="choice" type="select" label="Multiply fitness scores by a certain value?">\n- <option value="no">No</option>\n- <option value="yes">Yes</option>\n- </param>\n- <when value="no">\n- <!-- do nothing -->\n- </when>\n- <when value="yes"> \n- <param name="factor" type="float" value="0.0" label="Multiply by" />\n- </when>\n- </conditional>\n- <conditional name="reads">\n- <param name="downstream" type="select" label="Are all reads downstream of the transposon?">\n- <option value="no">No</option>\n- <option value="yes">Yes</option>\n- </param>\n- <when value="no">\n- <!-- do nothing -->\n- </when>\n- <when value="yes"> \n- <!-- do nothing -->\n- </when>\n- </conditional>\n- <conditional name="reads1">\n- <param name="choice" type="select" label="Set reads1 manually?">\n- <option value="no">No</option>\n- <option value="yes">Yes</option>\n- </param>\n- <when value="no">\n- <!-- do nothing -->\n- </when>\n- <when value="yes"> \n- <param name="number" type="float" value="0.0" label="Reads1" />\n- </when>\n- </conditional>\n- <conditional name="reads2">\n- <param name="choice" type="select" label="Set reads2 manually?">\n- <option value="no">No</option>\n- <option value="yes">Yes</option>\n- </param>\n- <when value="no">\n- <!-- do nothing -->\n- </when>\n- <when value="yes"> \n- <param name="number" type="float" value="0.0" label="Reads2" />\n- </when>\n- </conditional>\n- </inputs>\n- <outputs>\n- '..b'g Illumina sequencing reads from t1 and t2.\n-\n-**The options explained**\n-\n-Map files from t1: a bowtie mapfile containing the mapped flanking reads from t1\n-\n-Map files from t2: a bowtie mapfile containing the mapped flanking reads from t2\n-\n-GenBank reference genome: the reference genome of whatever model you\'re working with, which needs to be in standard 
genbank format. For more on that format see the genbank website.\n-\n-Normalizing fitness calculations: our normalization relies on the fitness scores of insertions within transposon genes, which ought to have a neutral fitness of 1. The file of normalization genes should be formatted so that each line is a single gene loci like "SP_0017"\n-\n-Using reads from certain strands: typically users will use reads from both strands, but this lets you do things like comparing reads between strands.\n-\n-Expansion factor: the expansion factor of the bacteria culture you got your reads from - this is something you should measure when you\'re growing up the bacteria from t1 to t2. Using the default expansion factor of 250 will give you very rough fitness calculations and so it\'s not recommended.\n-\n-Cutoff: the cutoff for all genes; insertion locations with an average count less than this number will be disregarded, as insertion locations with a low number of reads can have inaccurate fitnesses calculated, for the same reason studies with low sample sizes can be inaccurate.\n-\n-Cutoff2: the cutoff for the normalization genes; only has an effect if larger than cutoff\n-\n-Exclude first %: insertions in the very beginning of genes sometimes don\'t actually interfere with their function, and so you can exclude insertions from the first % of a gene from being counted as within those genes. This mostly affects the aggregate calculations downstream.\n-\n-Exclude last %: similarly insertions in the very end of genes sometimes don\'t actually interfere with their function, and so you can exclude insertions from the last % of a gene. Also mostly affects the aggregate calculations downstream.\n-\n-Maximum weight of a transposon gene in normalization calculations: in the normalization calculations, fitnesses within transposon genes are weighted according to their number of reads, as fitnesses calculated from more reads tend to be more accurate. 
However, to keep those fitnesses with huge numbers of reads from vastly outweighing the others, you can limit the max weight.\n-\n-Multiplying fitness scores by a certain value: what it says on the lid; you can multiply the normalized fitness scores by a certain value. This can be helpful for genetic interaction screens, where Tn-seq is performed as usual except there\'s one background knockout all the mutants share. This is because a combination of independent mutations should have a fitness value that\'s equal to their individual fitness values multipled, but related mutations will deviate from that; to find those deviations you\'d multiply all the fitness values from mutants from a normal library by the fitness of the background knockout and compare that to the fitness values found from the knockout library!\t\n-\n-Setting reads1 / reads2 manually: these are related to the correction factor calculations; it\'s not recommended that you set them manually. If this number is too low it will cause a mathematical error and Calculate Fitness will not work.\n-\n-Output: the output is a csv (comma separated values) file containing the fitness values calculated. Each line besides the header will represent the following information for an insertion location: position, strand, count_1, count_2, ratio, mt_freq_t1, mt_freq_t2, pop_freq_t1, pop_freq_t2, gene, D, W, nW\n-\n-Output2: a txt file containing the percent blanks to be used in the Aggregate tool for normalization\n-\n-Output3: a wig file that can be used for visualization of the fitness values; each line besides the header will be an insertion location and its (possibly normalized) fitness.\n-\n-</help>\n-</tool>\n\\ No newline at end of file\n' |
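The help text for the multiply option describes a multiplicative model for genetic interaction screens: independent mutations should combine to the product of their individual fitness values, and related mutations deviate from that product. The comparison can be sketched as below. This is an illustration only; the function and argument names are hypothetical and nothing like it appears in the removed tool.

```python
def interaction_deviation(w_single, w_background, w_double_observed):
    """Observed minus expected fitness under the multiplicative model.

    w_single:          fitness of a mutant in the normal library
    w_background:      fitness of the shared background knockout
    w_double_observed: fitness of the same mutant in the knockout library

    A result near 0 suggests the two mutations act independently; a clearly
    positive or negative value suggests an interaction.
    """
    expected = w_single * w_background
    return w_double_observed - expected
```

For example, a mutant with fitness 0.8 in a background of fitness 0.5 is expected to score 0.4 in the knockout library; an observed 0.4 gives a deviation of 0.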