annotate profrep_db_reducing.xml @ 4:e27e86406f56 draft

Uploaded
author petr-novak
date Wed, 26 Jun 2019 10:23:50 -0400
parents a5f1638b73be
children
rev   line source
0
a5f1638b73be Uploaded
petr-novak
parents:
diff changeset
1 <tool id="profrep_db_reducing" name="cd-hit based size reduction of ProfRep database" version="1.0.0">
2 <description>Reduce a database of read sequences based on their similarity to speed up ProfRep</description>
3 <requirements>
4 <requirement type="package" version="1.0.0">profrep</requirement>
5 <requirement type="package" version="4.6.4">cd-hit</requirement>
6 </requirements>
7 <command>
8 python3 ${__tool_directory__}/profrep_db_reducing.py --reads_all ${reads_all} --cls ${cls} --cluster_size ${cluster_size} --identity_th ${identity_th} --reads_reduced ${reads_reduced} --cls_reduced ${cls_reduced}
9 </command>
10 <inputs>
11 <param format="fasta" type="data" name="reads_all" label="NGS reads" help="Choose the input file containing all read sequences to be reduced" />
12 <param format="fasta" type="data" name="cls" label="RE clusters" help="Choose the file listing all clusters and the reads belonging to them [RE archive -> seqclust -> clustering -> hitsort.cls]" />
13 <param name="cluster_size" type="integer" value="1000" min="1" max="1000000000" label="Min cluster size" help="Only reads from the largest clusters are reduced; this parameter sets the minimum number of reads a cluster must contain to be included in the reduction" />
14 <param name="identity_th" type="float" value="0.90" min="0.1" max="1.0" label="Reads identity threshold" help="Minimum proportion of identity between read sequences required to group and reduce them" />
15 </inputs>
16
17 <outputs>
18 <data format="fasta" name="cls_reduced" label="Modified cls file of ${cls.hid}" />
19 <data format="fasta" name="reads_reduced" label="Reduced reads database of ${reads_all.hid}" />
20 </outputs>
21
22 <help>
23
24 **WHAT IT DOES**
25
26 This tool reduces the database of all reads based on the similarity between them, using **cd-hit**. It groups similar reads, and in the reduced database each group is replaced by a single representative read. The new read IDs also indicate the number of reads that each representative stands for. The identity threshold for grouping reads (a cd-hit parameter) is set to **0.9** by default; this value usually gives a good balance between reduction level and accuracy. Because a new read database is produced, the CLS file that links reads to clusters has to be modified as well. The result is a reduced database of read sequences and a modified CLS file adjusted to the new read database. The actual reduction level depends on the number of clusters involved and on how big they are. The default cluster size threshold is **1000**, which means that all clusters containing 1000 or more reads will undergo the reduction.
27
28 </help>
29 </tool>
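
The help text of this tool describes the reduction only in words. As a rough illustration, the Python sketch below shows the three ideas it mentions: selecting clusters by minimum size, passing the identity threshold to cd-hit, and encoding the group size in the representative read ID. It is not taken from profrep_db_reducing.py. The function names, the choice of the `cd-hit-est` program, and the `_x<N>` ID suffix are assumptions made for illustration only.

```python
def select_clusters(clusters, min_size=1000):
    """Return the names of clusters holding at least min_size reads.

    `clusters` is a hypothetical in-memory mapping of cluster name to a
    list of read IDs; how the real script parses the CLS file is not
    shown in the wrapper.
    """
    return [name for name, reads in clusters.items() if len(reads) >= min_size]


def cdhit_command(fasta_in, fasta_out, identity=0.9):
    """Build (but do not run) a cd-hit-est call.

    -i, -o and -c are the input, output and sequence identity threshold
    options of cd-hit-est. Whether the real script uses cd-hit-est or
    another cd-hit program is an assumption.
    """
    return ["cd-hit-est", "-i", fasta_in, "-o", fasta_out, "-c", str(identity)]


def representative_id(read_id, group_size):
    """Hypothetical naming scheme: append the number of reads represented."""
    return f"{read_id}_x{group_size}"
```

With the defaults from the wrapper (`cluster_size` 1000, `identity_th` 0.90), only clusters of 1000 or more reads would be passed to cd-hit, and all other reads would be kept unchanged.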