diff README.org @ 0:dd46956ff61f draft

Uploaded
author petr-novak
date Fri, 08 Dec 2017 09:57:17 -0500
parents
children
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/README.org	Fri Dec 08 09:57:17 2017 -0500
@@ -0,0 +1,73 @@
+#+TITLE:  Sequence Read Simulator
+#+AUTHOR: Petr Novak
+
+Create pseudo short reads (Illumina-like) from long reads.
+
+* Requirements
+- Python version > 3.4
+- Biopython
+
+* Available tools
+** long_reads_sampling
+#+BEGIN_EXAMPLE
+
+usage: long_reads_sampling.py [-h] [-i INPUT] [-o OUTPUT] [-l TOTAL_LENGTH]
+                              [-s SEED]
+
+Create sample of long reads, instead of setting number of reads to be sampled,
+total length of all sampled sequences is defined
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -i INPUT, --input INPUT
+                        file with long reads in fasta format (default: None)
+  -o OUTPUT, --output OUTPUT
+                        Output file name (default: None)
+  -l TOTAL_LENGTH, --total_length TOTAL_LENGTH
+                        total length of sampled output (default: None)
+  -s SEED, --seed SEED  random number generator seed (default: 123)
+#+END_EXAMPLE
+
+** long2short
+#+BEGIN_EXAMPLE
+usage: long2short.py [-h] [-i INPUT] [-o OUTPUT] [-cov COVERAGE]
+                     [-L INSERT_LENGTH] [-l READ_LENGTH]
+
+Creates pseudo short reads from long oxford nanopore reads
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -i INPUT, --input INPUT
+                        file with long reads in fasta format (default: None)
+  -o OUTPUT, --output OUTPUT
+                        Output file name (default: None)
+  -cov COVERAGE, --coverage COVERAGE
+                        samplig coverage (default: 0.1)
+  -L INSERT_LENGTH, --insert_length INSERT_LENGTH
+                        length of insert, must be longer than read length
+                        (default: 600)
+  -l READ_LENGTH, --read_length READ_LENGTH
+                        read length (default: 100)
+
+#+END_EXAMPLE
+The resulting reads, in FASTA format, have names that include the following information:
+ - index of the original long read
+ - position of the pseudo forward read in the long read
+Forward and reverse reads are interleaved, and reverse reads are the reverse complement of the original long sequence.
+Example output:
+#+BEGIN_EXAMPLE
+>1_1_101_f
+TGGTACTTGCGGTTACGTATTGCTAGCTAGTCTCCATTTGTCCGTTGGTCTTAGGTGATT
+TTCCAAGCTTTGTGTGTAAATGTAAGGATCCTCATTTGTA
+>1_1_101_r
+GTTTTGTTATCGTGATCCACAGATCAGAAGATATCGCCGCTCACCTGTCAATTAATCTTA
+ACTTAATGTACACTAGGGTTTTGGTTTTAACTGCTATCTT
+>1_2001_2101_f
+CTGAGTTGGGCAACATAGCCGACAAATTTGAACAATAAGCCGGTCCAGCCTTCTTTCTCA
+GCTGATACATGAAACAAATCAAAGGAGCATTGTAAAGGCG
+>1_2001_2101_r
+TTTTGAATGATGGCACTACCGTGATCAAGGACGATGGTCTCCGTTCACTCGCTTTTGTTG
+TACGTTCTCTATGAACTTGGTTTCTTTGCATTCGGTTCTT
+>1_4001_4101_f
+GAAGTTGAAGGAACATTTGGAAAGGTGTGTGAAGACTAATTTGGTCT
+#+END_EXAMPLE
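
The naming scheme and interleaving described in the README can be sketched in Python. This is a minimal illustration, not the code of long2short.py. The function names `reverse_complement` and `pseudo_pairs` are hypothetical. The 1-based `start_end` suffix, the spacing of one pair every `2 * read_length / coverage` bases, and taking the reverse read from the far end of the insert are assumptions chosen to agree with the example output (starts 1, 2001 and 4001 with the default settings).

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def pseudo_pairs(long_read, index, read_length=100, insert_length=600,
                 coverage=0.1):
    """Yield (name, sequence) tuples, forward and reverse reads interleaved.

    Assumed behaviour, inferred from the README example only:
    - one insert is taken every 2 * read_length / coverage bases
    - the forward read is the first read_length bases of the insert
    - the reverse read is the reverse complement of the last read_length bases
    - both names carry the 1-based position of the forward read
    """
    if insert_length <= read_length:
        raise ValueError("insert_length must be longer than read_length")
    step = round(2 * read_length / coverage)
    # Only inserts that fit completely inside the long read are emitted here.
    for start in range(0, len(long_read) - insert_length + 1, step):
        insert = long_read[start:start + insert_length]
        name = "{}_{}_{}".format(index, start + 1, start + read_length + 1)
        yield name + "_f", insert[:read_length]
        yield name + "_r", reverse_complement(insert[-read_length:])


if __name__ == "__main__":
    # Print FASTA records for a toy 5000 bp "long read".
    for name, seq in pseudo_pairs("ACGT" * 1250, index=1):
        print(">" + name)
        print(seq)
```

With the defaults, a 5000 bp input yields three pairs, named `1_1_101`, `1_2001_2101` and `1_4001_4101`, each with an `_f` and an `_r` record. Unlike the truncated last record in the README example, this sketch skips inserts that would run past the end of the long read.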