# HG changeset patch
# User fastaptamer
# Date 1423596629 18000
# Node ID 307254415eb13385f1493640184f741c1d960a06
Uploaded

diff -r 000000000000 -r 307254415eb1 fastaptamer_cluster_1.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/fastaptamer_cluster_1.xml Tue Feb 10 14:30:29 2015 -0500
@@ -0,0 +1,65 @@
+
+
+ Cluster closely-related sequences using Levenshtein edit distance.
+
+ fastaptamer_cluster -v
+
+ fastaptamer_cluster -i $input -o $output -d $distance -f $filter > $report
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+.. class:: warningmark
+
+FASTAptamer-Cluster requires a FASTA formatted input file generated by FASTAptamer-Count.
+
+.. class:: warningmark
+
+FASTAptamer-Cluster uses an exhaustive approach to clustering and can take *several* hours to process. For faster processing utilize the "Read Filter" option to exclude low read sequences.
+
+------
+
+**FASTAptamer-Cluster** uses the Levenshtein algorithm to cluster together closely-related sequences based on a user-defined edit distance (*the minimum number of insertions, deletions, or substitutions required to transform one string into another*).
+
+FASTAptamer-Cluster begins with the most abundant sequence in a population, referred to as the "seed sequence," and clusters with it every sequence in the file within an edit distance less than or equal to the specified edit distance (Cluster #1). The next most abundant unclustered sequence then serves as the next seed sequence for assembling the second cluster from the remaining sequences (Cluster #2), followed by the next most abundant unclustered sequence (Cluster #3), and so on. This process is iterated until every sequence is clustered.
+
+Output is FASTA formatted with the following information on the FASTA identifier line:
+
+ >Rank-Reads-RPM-Cluster#-RankWithinCluster-EditDistanceFromSeedSequence
+
+.. class:: infomark
+
+The "Read Filter" excludes from the clustering process sequences with a total number of reads less than or equal to the integer supplied. Because of the computational complexity of clustering large datasets, the default filter setting of 1 is designed to eliminate singleton sequences from clustering.
+
+------
+
+.. image::
+ http://burkelab.missouri.edu/images/fastaptamer-logo-xs.png
+ :height: 98
+ :width: 300
+
+For more information on FASTAptamer, visit our website_.
+
+FASTAptamer is distributed under a GNU GPL v3.0 license. For complete license click here_.
+
+.. _here: http://burkelab.missouri.edu/fastaptamer/LICENSE.txt
+.. _website: http://burkelab.missouri.edu/fastaptamer.html
+
+
+
+
+ doi:10.1038/mtna.2015.4
+
+
+