rapidcluster: rapidcluster.xml annotate

annotate rapidcluster.xml @ 0:12f2dd9ac1fd draft

Uploaded

author	hathkul
date	Mon, 26 Dec 2016 11:04:51 -0500

<tool id="rapidcluster_2" name="RapidCluster" version="2">

<description>Cluster closely related sequences using Levenshtein edit distance filtering.</description>

<version_command>rapidcluster -v</version_command>

<command interpreter="perl">rapidcluster -i $input -o $output -d $distance -f $filter -c $max_clusters > $report
</command>

<inputs>
<param name="input" type="data" format="fasta" label="Input file" help="Must be the FASTA output of FASTAptamer-Count"></param>
<param name="distance" type="integer" label="Levenshtein Edit Distance" value="1" help="Maximum edit distance from the seed sequence for a sequence to join its cluster. Edit distance is the minimum number of insertions, deletions, or substitutions required to transform one sequence into another."></param>
<param name="filter" type="integer" label="Read Filter" optional="true" value="1" help="Only sequences with total reads greater than the value supplied will be clustered."></param>
<param name="max_clusters" type="integer" label="Maximum number of clusters to find" optional="true" value="500" help="RapidCluster stops after finding this many clusters."></param>
</inputs>

<outputs>
<data name="output" format="fasta" label="${input.name} RapidCluster output"></data>
<data name="report" format="txt" label="${input.name} RapidCluster report"></data>
</outputs>

<help>

.. class:: warningmark

RapidCluster requires a FASTA-formatted input file generated by FASTAptamer-Count.

.. class:: warningmark

RapidCluster clusters exhaustively and can take several hours to run. To speed it up, use the "Read Filter" option to exclude low-read sequences and set a reasonable "Maximum number of clusters to find".

------

This version does not calculate the exact Levenshtein distance for each pair of sequences; instead, it only checks whether the distance is within the user-defined edit distance. This makes the script much faster when clustering highly similar sequences.
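
A threshold test of this kind can be sketched as a banded Levenshtein computation. The Python below is an illustration, not RapidCluster's own Perl implementation. It uses one standard technique: it fills only the cells within ``k`` of the main diagonal and stops as soon as every cell in a row exceeds ``k``. The function name ``within_distance`` is hypothetical.

```python
def within_distance(a: str, b: str, k: int) -> bool:
    """Return True if the Levenshtein distance between a and b is at most k.

    Hypothetical helper: only cells within k of the main diagonal are filled,
    and the scan stops early once no alignment can stay within k edits.
    """
    if abs(len(a) - len(b)) > k:
        # The length difference alone already requires more than k edits.
        return False
    too_far = k + 1  # stand-in value for any distance above k
    prev = [j if k >= j else too_far for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        lo = max(1, i - k)
        hi = min(len(b), i + k)
        curr = [too_far] * (len(b) + 1)
        curr[0] = i if k >= i else too_far
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        if min(curr) > k:
            # Every partial alignment already needs more than k edits.
            return False
        prev = curr
    return k >= prev[len(b)]


print(within_distance("ACGT", "AGGT", 1), within_distance("ACGT", "AGGA", 1))  # True False
```

The exact distance is never needed, so cells outside the band are treated as k + 1. A pair whose lengths differ by more than k is rejected before any table is built.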

RapidCluster begins with the most abundant sequence in the population, referred to as the "seed sequence," and groups with it every sequence in the file whose edit distance from the seed is less than or equal to the specified edit distance (Cluster #1). The next most abundant unclustered sequence then serves as the seed for assembling the second cluster from the remaining sequences (Cluster #2), followed by the next most abundant unclustered sequence (Cluster #3), and so on. This process repeats until every sequence that passes the "Read Filter" has been clustered or the maximum number of clusters has been found.
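
The greedy, seed-based procedure can be sketched in Python as follows. This is a minimal illustration, not the tool's source, and it rests on several assumptions:

- ``greedy_seed_clusters`` and ``levenshtein`` are hypothetical names.
- The input is a dictionary that maps each sequence to its read count.
- Ties in abundance are broken by input order.
- Exact edit distance is computed for simplicity, although RapidCluster itself only performs a threshold check.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Exact edit distance using two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def greedy_seed_clusters(counts: Dict[str, int], max_distance: int,
                         max_clusters: int = 500) -> Dict[str, int]:
    """Map each clustered sequence to its cluster number, seeding by abundance."""
    # Most abundant first; sorted() is stable, so ties keep input order.
    remaining = sorted(counts, key=lambda s: -counts[s])
    assignment: Dict[str, int] = {}
    cluster = 0
    while remaining and max_clusters > cluster:
        cluster += 1
        seed = remaining[0]  # most abundant unclustered sequence
        leftover = []
        for seq in remaining:
            if max_distance >= levenshtein(seed, seq):
                assignment[seq] = cluster
            else:
                leftover.append(seq)
        remaining = leftover
    return assignment


counts = {"ACGT": 50, "ACGA": 20, "TTTT": 10, "ACTT": 5}  # made-up read counts
print(greedy_seed_clusters(counts, max_distance=1))
# {'ACGT': 1, 'ACGA': 1, 'ACTT': 1, 'TTTT': 2}
```

Each candidate is compared only with the current seed, not with every member of the cluster. This matches the description above, and it is why each cluster costs one pass over the remaining sequences.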

The output is FASTA-formatted, with the following fields on each FASTA identifier line::

    >Rank-Reads-RPM-Cluster#-RankWithinCluster-EditDistanceFromSeedSequence
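
Downstream scripts can split this identifier line back into its six fields. The sketch below makes three assumptions: the fields are separated by single hyphens, none of them contains a hyphen, and every field except RPM is an integer. ``ClusterHeader`` and ``parse_header`` are hypothetical names, and the example line is made up.

```python
from typing import NamedTuple


class ClusterHeader(NamedTuple):
    rank: int
    reads: int
    rpm: float
    cluster: int
    rank_in_cluster: int
    edit_distance: int


def parse_header(line: str) -> ClusterHeader:
    """Split a RapidCluster identifier line into its six fields."""
    fields = line.strip().lstrip(">").split("-")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    rank, reads, rpm, cluster, rank_in_cluster, distance = fields
    return ClusterHeader(int(rank), int(reads), float(rpm),
                         int(cluster), int(rank_in_cluster), int(distance))


# Made-up identifier line, for illustration only.
h = parse_header(">3-210-420.5-1-2-1")
print(h.cluster, h.rank_in_cluster, h.edit_distance)  # 1 2 1
```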

.. class:: infomark

The "Read Filter" excludes from clustering any sequence whose total read count is less than or equal to the supplied integer. Because clustering large datasets is computationally expensive, the default setting of 1 removes singleton sequences (sequences with only one read) from clustering.
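
The filter rule can be sketched as follows. The sketch assumes that FASTAptamer-Count identifier lines have the form ``>Rank-Reads-RPM`` and that each sequence sits on a single line. ``read_fastaptamer`` and ``read_filter`` are hypothetical helper names, and the records are invented for illustration. A sequence is kept only when its read count is strictly greater than the threshold, so the default of 1 drops singletons.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fastaptamer(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (reads, sequence) pairs from FASTAptamer-Count style FASTA.

    Assumes identifier lines of the form >Rank-Reads-RPM and one sequence
    line per record.
    """
    reads: Optional[int] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            reads = int(line[1:].split("-")[1])  # second field = total reads
        elif line and reads is not None:
            yield reads, line
            reads = None


def read_filter(records: Iterable[Tuple[int, str]],
                min_reads: int = 1) -> List[Tuple[int, str]]:
    """Keep only records whose read count is strictly greater than min_reads."""
    return [(n, seq) for n, seq in records if n > min_reads]


# Hypothetical FASTAptamer-Count records; values are illustrative only.
fasta = [">1-5-500000.00", "ACGT", ">2-1-100000.00", "ACGA"]
print(read_filter(read_fastaptamer(fasta)))  # [(5, 'ACGT')]
```

Because the comparison in this sketch is strict (``n > min_reads``), a threshold of 0 keeps every sequence.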

</help>

</tool>
