# HG changeset patch
# User m-zytnicki
# Date 1470900392 14400
# Node ID 60abb65400044ca860faad22bee85d4abe42a918
planemo upload commit fb76aa0a938a2498d3206e6039bc1d9906e6c2ce-dirty
diff -r 000000000000 -r 60abb6540004 mmquant.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/mmquant.xml Thu Aug 11 03:26:32 2016 -0400
@@ -0,0 +1,187 @@
+
+
+ mmquant
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**Why use this tool?**
+
+This tool counts the number of reads (produced by RNA-Seq) per gene, much like HTSeq-count_ and featureCounts_. The main difference from other tools is how multi-mapping reads are counted: if a read maps to gene A, gene B, and gene C, the tool creates a new feature, "geneA--geneB--geneC", which is counted once.
+
+.. _HTSeq-count: http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html
+.. _featureCounts: http://bioinf.wehi.edu.au/featureCounts/
+
+**Why does it matter?**
+
+Recently, an article_ showed that RNA-Seq quantification tools are not accurate, which leads to errors when looking for differentially expressed genes. The authors suggest the method used here. It may not identify the individual genes that are differentially expressed (something that RNA-Seq alone cannot do), but it does identify the groups of genes that are differentially expressed.
+
+.. _article: http://www.genomebiology.com/2015/16/1/177
+
+**Strands**
+
+Strands can be:
+
+* for paired-end reads: ``U`` (unknown), ``FR`` (forward-reverse), ``RF`` (reverse-forward), ``FF`` (forward-forward);
+
+* for single-end reads: ``U`` (unknown), ``F`` (forward), ``R`` (reverse);
+
+* Default: ``U``.
+
+
+**Annotation file**
+
+The annotation file should be in GTF format. GFF might work too. The tool only uses the gene, transcript, and exon feature types.
+
+
+**Reads files**
+
+The reads should be given in SAM or BAM format, and be sorted by position. The reads can be single-end or paired-end (or a mixture thereof).
+
+You can use samtools_ to sort them. This tool uses the NH tag (which gives the number of hits for each read, see the specification_), so make sure that your mapping tool sets it correctly (TopHat2_ and STAR_ do). You should also check how your mapping tool handles multi-mapping reads (this can usually be tuned using the appropriate parameters).
+
+.. _samtools: http://www.htslib.org/
+.. _specification: https://samtools.github.io/hts-specs/SAMv1.pdf
+.. _TopHat2: http://ccb.jhu.edu/software/tophat/index.shtml
+.. _STAR: https://github.com/alexdobin/STAR/releases
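+
+To check that a mapper sets the NH tag, you can inspect the optional fields of a few SAM records. The sketch below only relies on the SAM layout (11 mandatory columns followed by ``TAG:TYPE:VALUE`` fields); the function name ``nh_tag`` and the example record are made up for illustration, and this is not how mmquant itself reads the files.
+

```python
def nh_tag(sam_line):
    """Return the NH (number of hits) value of a SAM record, or None if absent."""
    if sam_line.startswith("@"):
        return None  # header line, not an alignment record
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == "NH":
            return int(parts[2])
    return None


# Hypothetical record with three reported hits.
record = "read1\t0\tchr1\t100\t255\t50M\t*\t0\t0\tAAAA\tIIII\tNM:i:0\tNH:i:3"
print(nh_tag(record))
```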
+
+
+**Output file**
+
+The output is a tab-separated file, to be used in edgeR or DESeq, for instance. If *n* reads files are provided, the output contains *n* + 1 columns:
+
+==============  ========  ========  ===
+Gene            sample_1  sample_2  ...
+==============  ========  ========  ===
+gene_A          ...       ...       ...
+gene_B          ...       ...       ...
+gene_B--gene_C  ...       ...       ...
+==============  ========  ========  ===
+
+The first column contains the gene IDs.
+If a read maps to several genes (say, gene_B and gene_C), a new feature, gene_B--gene_C, is added to the table. The reads that can be mapped to both genes are counted there, and not in the gene_B or gene_C lines.
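+
+The counting rule above can be sketched as follows. This is an illustration, not the tool's code: the names ``feature_name`` and ``count_reads`` are hypothetical, and sorting the gene IDs before joining them is an assumption made here so that the merged name does not depend on the order of the hits.
+

```python
from collections import Counter


def feature_name(gene_ids):
    # Assumption: IDs are sorted so that the merged name is deterministic.
    return "--".join(sorted(set(gene_ids)))


def count_reads(read_to_genes):
    """Count each read once, under the feature made of all the genes it hits."""
    counts = Counter()
    for genes in read_to_genes.values():
        if genes:  # reads that hit no gene stay unassigned
            counts[feature_name(genes)] += 1
    return counts


# Hypothetical assignments of reads to genes.
reads = {
    "read1": ["gene_A"],
    "read2": ["gene_B", "gene_C"],
    "read3": ["gene_C", "gene_B"],
    "read4": [],
}
counts = count_reads(reads)
for feature in sorted(counts):
    print(feature, counts[feature], sep="\t")
```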
+
+With the ``Print names`` option, the gene names are used instead of gene IDs. If two different genes have the same name, the systematic name is added, like: ``Mat2a (ENSMUSG00000053907)``.
+
+Note that the gene IDs and gene names should be given in the GTF file after the ``gene_id`` and ``gene_name`` tags respectively.
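+
+As an illustration of where these tags live, the sketch below extracts them from the last (attributes) column of a GTF line and builds the label described above. The helper names and the ``duplicated_names`` argument are hypothetical, and the regular expression only handles the common ``key "value";`` form.
+

```python
import re

# Matches attributes written as: key "value";
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')


def gtf_attributes(column9):
    """Parse the attributes column of a GTF line into a dictionary."""
    return dict(ATTRIBUTE.findall(column9))


def label(attrs, duplicated_names=()):
    """Gene name, followed by the gene ID when the name is shared by several genes."""
    name = attrs["gene_name"]
    if name in duplicated_names:
        return f'{name} ({attrs["gene_id"]})'
    return name


attrs = gtf_attributes('gene_id "ENSMUSG00000053907"; gene_name "Mat2a";')
print(label(attrs))
print(label(attrs, duplicated_names={"Mat2a"}))
```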
+
+**Output stats**
+
+The output stats are written to standard error.
+
+The general shape is::
+
+ Results for sample_A:
+ # hits: N
+ # uniquely mapped reads: N (x%)
+ # ambiguous hits: N (x%)
+ # non-uniquely mapped hits: N (x%)
+ # unassigned hits: N (x%)
+
+These figures mainly provide stats on hits; one sequence may have zero, one, or several hits. An ambiguous hit is a hit that overlaps several annotation features. A non-uniquely mapped hit belongs to a sequence that maps to several loci in the genome.
+
+**Overlap**
+
+The way a read R is mapped to a gene A depends on the overlap *n* value:
+
+====================  ================================================
+if *n* is             then R is mapped to A iff
+====================  ================================================
+a negative value      R is included in A
+a positive integer    they have at least *n* nucleotides in common
+a float value (0, 1)  *n* % of the nucleotides of R are shared with A
+====================  ================================================
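+
+The table can be read as the following sketch. It makes several assumptions that the text does not state: intervals are 1-based and inclusive (as in GTF), a float *n* is treated as the minimum fraction of R that must be shared, and *n* = 0 is rejected because the table does not cover it. The function names are hypothetical.
+

```python
def overlap_length(read, gene):
    """Number of shared nucleotides between two inclusive (start, end) intervals."""
    return max(0, min(read[1], gene[1]) - max(read[0], gene[0]) + 1)


def read_matches_gene(read, gene, n):
    shared = overlap_length(read, gene)
    read_length = read[1] - read[0] + 1
    if n < 0:
        return shared == read_length      # R is included in A
    if 0 < n < 1:
        return shared >= n * read_length  # assumed: at least a fraction n of R
    if n >= 1:
        return shared >= n                # at least n nucleotides in common
    raise ValueError("n = 0 is not covered by the table")


gene = (100, 400)     # the exon of test_mmquant_1.gtf
inside = (100, 149)   # a 50M read starting at position 100
partial = (90, 139)   # hypothetical read sharing 40 of its 50 nucleotides
print(read_matches_gene(inside, gene, -1))
print(read_matches_gene(partial, gene, -1))
print(read_matches_gene(partial, gene, 40))
print(read_matches_gene(partial, gene, 0.9))
```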
+
+**Merge Threshold**
+
+Sometimes, there are very few reads that can be mapped unambiguously to a gene A, because it is very similar to gene B.
+
+==============  ==========
+Gene            sample_1
+==============  ==========
+gene_A          *x*
+gene_B          *y*
+gene_A--gene_B  *z*
+==============  ==========
+
+In the example above, suppose that *x* << *z*. In this case, you can move all the reads from gene_A to gene_A--gene_B using the merge threshold *t*, a float in (0, 1): if *x* < *t* × *y*, the reads are transferred.
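+
+A minimal sketch of this rule for a single sample, applying the condition exactly as written above (*x* compared with *t* × *y*). The function name is hypothetical, and dropping the gene_A line after the transfer is an assumption.
+

```python
def merge_if_rare(counts, gene_a, gene_b, t):
    """Move the reads of gene_a to the merged feature when x < t * y."""
    merged = "--".join([gene_a, gene_b])
    x = counts.get(gene_a, 0)
    y = counts.get(gene_b, 0)
    result = dict(counts)
    if x < t * y:
        result[merged] = result.get(merged, 0) + x
        result.pop(gene_a, None)  # assumption: the emptied line is removed
    return result


# Hypothetical counts for one sample.
counts = {"gene_A": 2, "gene_B": 100, "gene_A--gene_B": 500}
print(merge_if_rare(counts, "gene_A", "gene_B", 0.1))
print(merge_if_rare(counts, "gene_A", "gene_B", 0.01))
```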
+
+**Count Threshold**
+
+If the maximum number of reads for a gene is less than the count threshold (a non-negative integer), then the corresponding line is discarded.
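+
+As a sketch, reading "maximum" as the maximum over all the samples of a line (an assumption), the filter amounts to the following; the function name and the counts are made up for illustration.
+

```python
def apply_count_threshold(table, threshold):
    """Keep the lines whose largest count reaches the threshold."""
    return {
        gene: sample_counts
        for gene, sample_counts in table.items()
        if max(sample_counts) >= threshold
    }


# Hypothetical counts for two samples.
table = {
    "gene_A": [0, 3],
    "gene_B": [12, 40],
    "gene_B--gene_C": [4, 9],
}
print(apply_count_threshold(table, 10))
```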
+
+
+**Contact**
+
+Comments? Suggestions? Do not hesitate to send me an email_.
+
+.. _email: mailto:matthias.zytnicki@toulouse.inra.fr
+
+
+
+@misc{bitbucketmmquant,
+ author = {Zytnicki.},
+ year = {2016},
+ title = {multi-mapping-counter},
+ publisher = {BitBucket},
+ journal = {BitBucket repository},
+ url = {https://bitbucket.org/mzytnicki/multi-mapping-counter},
+}
+
+
diff -r 000000000000 -r 60abb6540004 test-data/test_mmquant_1.gtf
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/test_mmquant_1.gtf Thu Aug 11 03:26:32 2016 -0400
@@ -0,0 +1,1 @@
+chr1 mmquant exon 100 400 . + 0 gene_id "gene"; transcript_id "transcript";
diff -r 000000000000 -r 60abb6540004 test-data/test_mmquant_1.sam
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/test_mmquant_1.sam Thu Aug 11 03:26:32 2016 -0400
@@ -0,0 +1,2 @@
+@SQ SN:chr1 LN:500
+read1 0 chr1 100 255 50M * 0 0 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:0
diff -r 000000000000 -r 60abb6540004 test-data/test_mmquant_1.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/test_mmquant_1.txt Thu Aug 11 03:26:32 2016 -0400
@@ -0,0 +1,2 @@
+Gene test
+gene 1
diff -r 000000000000 -r 60abb6540004 tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Thu Aug 11 03:26:32 2016 -0400
@@ -0,0 +1,21 @@
+
+
+
+
+
+ https://bitbucket.org/mzytnicki/multi-mapping-counter/get/master.zip
+ g++ mmquant.cpp -Wall -pthread -std=c++11 -lz -o mmquant -O3
+
+
+ $INSTALL_DIR/bin
+
+
+ $INSTALL_DIR/bin
+
+
+
+
+ Compiling mmquant requires a C++11 compiler and zlib.
+
+
+