diff MIRUReader/README.md @ 0:f0e3646a4e45 draft

Uploaded
author dcouvin
date Tue, 17 Aug 2021 19:15:15 +0000
parents
children
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MIRUReader/README.md	Tue Aug 17 19:15:15 2021 +0000
@@ -0,0 +1,88 @@
+# MIRUReader
+
+## Description
+
+MIRUReader identifies the 24-locus MIRU-VNTR profile of *Mycobacterium tuberculosis* complex (MTBC) isolates directly from long reads generated by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) sequencers. It also works on assembled genomes.
+
+## Requirements
+
+* Linux
+* primersearch from [EMBOSS](http://emboss.sourceforge.net/download/)
+   * install from the official website, or
+   * install via conda: `conda install -c bioconda emboss`
+   * ensure the `primersearch` command is on your `PATH`, so that it can be run directly by typing `primersearch` on the command line
+* [*pandas*](https://pandas.pydata.org/) 
+   * can be installed via conda `conda install pandas` or via PyPI `pip install pandas`
+* [*statistics*](https://pypi.org/project/statistics/)
+   * only needed on Python 2, where it can be installed via PyPI `pip install statistics`; Python 3.4 and later include it in the standard library
+
+## Installation
+
+`git clone https://github.com/phglab/MIRUReader.git`
+
+## Change log
+#### 13/09/2019
+- Added a check to ensure primersearch is executable before MIRUReader runs.
+- Updated the README documentation.
+
+#### 04/07/2019
+- Updated the output format for option '--details'.
+
+#### 14/06/2019
+- Added automatic conversion of fastq to fasta.
+
+## Usage example
+
+To analyse a single sample:
+```
+python /your/path/to/MIRUReader.py -r sample.fasta -p sampleID > miru.txt
+```
+
+To analyse multiple samples:
+1. Create a whitespace-separated mapping file (mappingFile.txt) with one sample per line, giving the reads file followed by the sample ID:
+
+        sample_001.fasta sample_001
+        sample_002.fasta sample_002
+        ...
+
+2. Then run the program:
+```
+while read -r -a line; do python /your/path/to/MIRUReader.py -r "${line[0]}" -p "${line[1]}"; done < mappingFile.txt > miru.multiple.txt
+```
+
+## Output example
+
+```
+sample_prefix   0154    0424    0577    0580    0802    0960    1644    1955    2059    2163b   2165    2347    2401    2461    2531    2687    2996    3007    3171    3192    3690    4052    4156    4348
+sample_001      2       4       4       2       3       3       3       2       2       5       4       4       4       2       5       1       6       3       3       5       3       7       2       3
+```
+
+Notes:
+* The program is compatible with Python 2 and Python 3.
+* Accepted read file formats are '.fastq', '.fastq.gz', '.fasta', and '.fasta.gz'.
+* The program output is tab-delimited plain text that can be copied into or opened in an Excel spreadsheet.
+
+## Full usage
+
+| Main options | Description |
+| ------------ | ----------- |
+| -r READS | Input reads file in fastq or fasta format, gzipped or uncompressed. |
+| -p PREFIX | Sample ID, required for naming the output file. |
+| --table TABLE | Allele calling table; default is MIRU_table. A user-defined table in the same fixed format can be supplied, but custom tables for other VNTR schemes have not been tested. |
+| --primers PRIMERS | Primer sequences; default is MIRU_primers. A user-defined file in the same fixed format can be supplied. |
+
+
+| Additional options | Description |
+| ------------------ | ----------- |
+| --amplicons | Use output from primersearch ("prefix.18.primersearch.out") and summarize the MIRU profile directly. |
+| --details | For further inspection: displays the repeat count details for each locus, together with the total number of mismatches in the primer sequence alignments. |
+| --nofasta | Delete the fasta file that is generated when the input reads are in fastq format. |
+
+## FAQ
+1. **Why are there two MIRU allele calling tables (MIRU_table and MIRU_table_0580)?** 
+
+MIRU locus 0580 (MIRU_table_0580) uses a different numbering system for determining repeat numbers than the other 23 MIRU loci (MIRU_table) for MTBC isolates.
+
+
+## Troubleshooting
+1. If the error message `OSError: primersearch is not found.` appears, ensure that the `primersearch` executable is in a directory listed in your environment path (`echo $PATH`) and can be called directly.
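
The PATH check described in the troubleshooting step can also be scripted before a batch run. The following is a minimal Python 3 sketch, not MIRUReader's own check: the helper name `find_missing_tools` is hypothetical, and it only tests whether an executable with the given name is visible on the current `PATH`.

```python
import shutil


def find_missing_tools(tools=("primersearch",)):
    """Return the names in `tools` that cannot be found on the current PATH.

    Hypothetical helper for illustration; MIRUReader performs its own check.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        # Same situation that leads to "OSError: primersearch is not found."
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("primersearch found at " + shutil.which("primersearch"))
```

If the script reports that `primersearch` is missing even though EMBOSS is installed, the shell that launches Python is probably not using the environment (for example a conda environment) in which EMBOSS was installed.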