
--- a/mmquant.xml	Thu Aug 11 03:26:32 2016 -0400
+++ b/mmquant.xml	Wed Feb 15 06:03:00 2017 -0500
@@ -29,6 +29,8 @@
 			-c "$count"
 			-m "$merge"
 			-o "$output"
+			-d "$n_overlap"
+			-D "$pc_overlap"
     ]]></command>
     <inputs>
         <param name="annotation" type="data" label="Annotation" format="gtf" />
@@ -47,6 +49,8 @@
         <param name="gene_name" type="boolean" label="Print gene name instead of IDs" truevalue="-g" falsevalue="" help="use gene name instead of gene ID in the output file" />
 		<param name="count" type="integer" value="0" min="0" label="Count threshold" help="Do not display genes with less than N reads" />
 		<param name="merge" type="float" value="0.0" min="0.0" max="1.0" label="Merge threshold" help="Merge gene aggregate count with parent aggregate if count is low" />
+		<param name="n_overlap" type="integer" value="30" min="1" label="Difference of overlapping" help="Minimum difference in overlapping bp between the best match and the other matches" />
+		<param name="pc_overlap" type="float" value="0.5" min="0.0" max="1.0" label="Ratio of overlapping" help="Minimum ratio of overlapping bp between the best match and the other matches" />
     </inputs>
     <outputs>
         <data name="output" format="txt" label="${tool.name} on ${on_string}" />
@@ -101,6 +105,20 @@
 .. _TopHat2: http://ccb.jhu.edu/software/tophat/index.shtml
 .. _STAR: https://github.com/alexdobin/STAR/releases

+**Read mapping to several genes**
+
+We assume here that the ``-l 1`` strategy is used (i.e. a read is attributed to a gene as soon as at least 1 nucleotide overlaps). The example extends to the other strategies as well.
+
+If a read (say, of size 100) maps unambiguously and overlaps with both gene A and gene B, it will be counted as 1 for the new "gene" gene_A--gene_B. However, suppose that only 1 nucleotide overlaps with gene A, whereas 100 nucleotides overlap with gene B (yes, genes A and B overlap). You would probably like to attribute the read to gene B.
+
+The options ``Difference of overlapping`` and ``Ratio of overlapping`` control this. We compute the number of overlapping nucleotides between a read and each gene it overlaps. If a read overlaps "significantly" more with one gene than with all the other genes, the read is attributed to the former gene only.
+
+The option ``Difference of overlapping`` *n* compares the difference in overlapping nucleotides. Let *N_A* and *N_B* be the numbers of overlapping nucleotides with genes A and B respectively. If *N_A >= N_B + n*, then the read will be attributed to gene A only.
+
+The option ``Ratio of overlapping`` *m* compares the ratio of overlapping nucleotides. If *N_A / N_B >= m*, then the read will be attributed to gene A only.
+
+If both options ``Difference of overlapping`` *n* and ``Ratio of overlapping`` *m* are used, then the read will be attributed to gene A only if and only if both *N_A >= N_B + n* and *N_A / N_B >= m*.
+
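The attribution rule described above can be sketched in a few lines of Python. This is only an illustration of the inequalities as stated in the help text, not the tool's actual implementation: the function name ``attribute_read``, the dictionary input, and the handling of ties are assumptions made for this example.

```python
def attribute_read(overlaps, n=None, m=None):
    """Return the sorted list of genes a read is attributed to.

    overlaps: dict mapping gene name -> number of overlapping nucleotides.
    n: "Difference of overlapping" threshold (None = option not used).
    m: "Ratio of overlapping" threshold (None = option not used).
    Hypothetical helper, written only to illustrate the rule in the text.
    """
    # Keep only the genes that actually overlap the read.
    hits = {gene: bp for gene, bp in overlaps.items() if bp > 0}
    # Nothing to resolve with a single gene, or when no option is used.
    if len(hits) <= 1 or (n is None and m is None):
        return sorted(hits)
    best_gene = max(hits, key=hits.get)
    best = hits[best_gene]
    for gene, bp in hits.items():
        if gene == best_gene:
            continue
        # N_A >= N_B + n must hold against every other gene.
        if n is not None and best < bp + n:
            return sorted(hits)
        # N_A / N_B >= m must hold against every other gene.
        if m is not None and best / bp < m:
            return sorted(hits)
    return [best_gene]


# Example from the text: 1 nt overlaps gene A, 100 nt overlap gene B.
print(attribute_read({"A": 1, "B": 100}, n=30, m=0.5))
# A closer call: the difference (10) is below n=30, so both genes are kept.
print(attribute_read({"A": 60, "B": 50}, n=30, m=0.5))
```

When the function returns several genes, the read would be counted for the merged "gene" (e.g. gene_A--gene_B), as explained earlier in this section.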

 **Output file**