annotate README.html @ 0:a4cd8608ef6b draft

Uploaded
author petr-novak
date Mon, 01 Apr 2019 07:56:36 -0400
parents
children
rev   line source
0
a4cd8608ef6b Uploaded
petr-novak
parents:
diff changeset
1 <?xml version="1.0" encoding="UTF-8" ?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
4
5 <html xmlns="http://www.w3.org/1999/xhtml">
6
7 <head>
8 <title>README.html</title>
9
10 </head>
11
12 <body>
13
14 <h1>RepeatExplorer utilities</h1>
15 <p>This repository includes utilities for preprocessing NGS data into a format suitable for RepeatExplorer and TAREAN
16 analysis. Each tool also includes an XML file that defines the tool interface for the Galaxy environment.</p>
17 <h2>Available tools</h2>
18 <h3>Paired fastq reads filtering and interlacing</h3>
19 <p>tool definition file: <code>paired_fastq_filtering.xml</code></p>
20 <p>This tool performs memory-efficient preprocessing of two fastq files. Its output can be used as input for RepeatExplorer clustering. Input files can be gzip-compressed (.gz extension). Reads are filtered based on quality and on the presence of N bases and adapters. The two input fastq files are processed in parallel, and only complete pairs are kept. Because the input files are processed in chunks, paired reads must be complete and in the same order in both input files. All reads that pass the quality filter will be written to the output files. If sampling is specified, only a sample of the sequences will be returned. Cutadapt is run with these options:</p>
21 <p><code>--anywhere='AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT'
22 --anywhere='AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT'
23 --anywhere='GATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
24 --anywhere='ATCTCGTATGCCGTCTTCTGCTTG'
25 --anywhere='CAAGCAGAAGACGGCATACGAGAT'
26 --anywhere='GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC'
27 --error-rate=0.05
28 --times=1 --overlap=15 --discard</code></p>
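<p>As a rough illustration, the options above could be assembled into a command line as in the following Python sketch. The function name <code>cutadapt_args</code>, the file paths and the use of <code>--output</code> are assumptions made for this example; how the tool itself invokes cutadapt is not shown in this README.</p>

```python
# Sketch only: assemble the cutadapt options listed above into an argument list.
# ADAPTERS copies the six --anywhere sequences from this README verbatim.
ADAPTERS = [
    "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT",
    "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
    "ATCTCGTATGCCGTCTTCTGCTTG",
    "CAAGCAGAAGACGGCATACGAGAT",
    "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC",
]


def cutadapt_args(input_path, output_path):
    """Return an argv list for cutadapt (hypothetical helper, not part of the tool)."""
    args = ["cutadapt"]
    # Each adapter may occur anywhere in the read.
    args += ["--anywhere=" + adapter for adapter in ADAPTERS]
    # --discard drops every read in which an adapter was found.
    args += ["--error-rate=0.05", "--times=1", "--overlap=15", "--discard"]
    # Input and output handling is an assumption of this sketch.
    args += ["--output=" + output_path, input_path]
    return args
```

<p>The returned list can be passed to <code>subprocess.run</code>; because of <code>--discard</code>, reads containing an adapter are removed rather than trimmed.</p>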
29 <p>Order of fastq file processing:</p>
30 <ol>
31 <li>Trimming (optional)</li>
32 <li>Filter by quality</li>
33 <li>Discard single reads, keep complete pairs</li>
34 <li>Cutadapt filtering</li>
35 <li>Discard single reads, keep complete pairs</li>
36 <li>Sampling (optional)</li>
37 <li>Interlacing two fasta files</li>
38 </ol>
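<p>The core of the steps above (quality filtering, keeping complete pairs, optional sampling and interlacing) can be sketched in Python as follows. This is a simplified illustration, not the tool's implementation: trimming and the cutadapt step are omitted, reads are assumed to be already parsed into <code>(name, sequence, qualities)</code> tuples, and the thresholds <code>min_qual</code> and <code>min_fraction</code> are hypothetical values chosen for the example.</p>

```python
import random


def keep(read, min_qual=10, min_fraction=0.95):
    """Hypothetical quality rule: no N bases and enough high-quality bases."""
    name, seq, quals = read
    if "N" in seq.upper():
        return False
    good = sum(1 for q in quals if q >= min_qual)
    return good >= min_fraction * len(quals)


def filter_pairs(reads1, reads2, sample_size=None, seed=0):
    """Filter two same-ordered read lists and return interlaced FASTA lines."""
    # A pair survives only if both mates pass the filter; this relies on
    # both inputs listing the pairs in the same order.
    pairs = [(a, b) for a, b in zip(reads1, reads2) if keep(a) and keep(b)]
    # Optional sampling is done on whole pairs so mates stay together.
    if sample_size is not None and sample_size < len(pairs):
        rng = random.Random(seed)
        chosen = sorted(rng.sample(range(len(pairs)), sample_size))
        pairs = [pairs[i] for i in chosen]
    # Interlace: mate 1 is always followed directly by mate 2.
    lines = []
    for a, b in pairs:
        for name, seq, _ in (a, b):
            lines.append(">" + name)
            lines.append(seq)
    return lines
```

<p>Filtering and sampling whole pairs, rather than individual reads, is what keeps the interlaced output consistent: every sequence from the first file is immediately followed by its mate from the second file.</p>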
39 <h3>Single fastq reads filtering</h3>
40 <p>tool definition file: <code>single_fastq_filtering.xml</code></p>
41 <p>This tool performs preprocessing
42 of a single fastq file. The input file can be gzip-compressed (.gz extension). Reads
43 are filtered based on quality and on the presence of N bases and adapters. All reads
44 that pass the quality filter will be written to the output file. If sampling is
45 specified, only a sample of the sequences will be returned.</p>
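<p>Accepting both plain and gzip-compressed input can be handled by choosing the file opener from the extension, as in this minimal Python sketch. The function name <code>read_fastq</code> is hypothetical, and the sketch assumes simple four-line fastq records; it is not taken from the tool's source.</p>

```python
import gzip


def read_fastq(path):
    """Yield (name, sequence, quality_string) from a plain or .gz fastq file."""
    # Pick the opener from the extension; both are opened in text mode.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\n")
            # Drop the leading '@' from the header line.
            yield header[1:], seq, qual
```

<p>Because the function is a generator, records are read one at a time and the whole file never has to fit in memory.</p>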
46 <h3>Fasta affixer</h3>
47 <p>tool definition file: <code>fasta_affixer.xml</code></p>
48 <p>Tool for adding a prefix and suffix to sequence names in fasta-formatted sequences. This tool is useful
49 if you want to run a comparative analysis with RepeatExplorer and need to
50 add sample codes to sequence identifiers.</p>
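<p>The effect on sequence names can be illustrated with a short Python sketch. The function <code>affix_fasta</code> is hypothetical and treats the whole header line as the sequence name, which is an assumption; the actual tool may handle headers with descriptions differently.</p>

```python
def affix_fasta(lines, prefix="", suffix=""):
    """Return FASTA lines with prefix and suffix added to each sequence name."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header line: wrap the name (everything after '>') with the affixes.
            out.append(">" + prefix + line[1:] + suffix)
        else:
            # Sequence line: copied unchanged.
            out.append(line)
    return out
```

<p>For example, calling <code>affix_fasta</code> with <code>prefix="SPA_"</code> turns the header <code>&gt;read1</code> into <code>&gt;SPA_read1</code>, so reads from different samples stay distinguishable after the files are concatenated.</p>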
51 <h2>Dependencies</h2>
52 <ul><li>R programming environment with the packages <em>optparse</em> and <em>ShortRead</em> (Bioconductor) installed</li>
53 <li>python3</li>
54 <li>cutadapt</li></ul>
55 <h2>License</h2>
56 <p>Copyright (c) 2012 Petr Novak (petr@umbr.cas.cz), Jiri Macas and Pavel Neumann,
57 Laboratory of Molecular Cytogenetics (http://w3lamc.umbr.cas.cz/lamc/),
58 Institute of Plant Molecular Biology, Biology Centre AS CR, Ceske Budejovice, Czech Republic</p>
59 <p>This program is free software: you can redistribute it and/or modify
60 it under the terms of the GNU General Public License as published by
61 the Free Software Foundation, either version 3 of the License, or
62 (at your option) any later version.</p>
63 <p>This program is distributed in the hope that it will be useful,
64 but WITHOUT ANY WARRANTY; without even the implied warranty of
65 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
66 GNU General Public License for more details.
67 You should have received a copy of the GNU General Public License
68 along with this program. If not, see <a href="http://www.gnu.org/licenses/">http://www.gnu.org/licenses/</a>.</p>
69 </body>
70 </html>