comparison fasta_clipping_histogram.xml @ 0:78a7d28f2a15 draft
Uploaded
| author | idot |
|---|---|
| date | Wed, 10 Jul 2013 06:13:48 -0400 |
| parents | |
| children |
| -1:000000000000 | 0:78a7d28f2a15 |
|---|---|
| 1 <tool id="cshl_fasta_clipping_histogram" name="Length Distribution"> | |
| 2 <description>chart</description> | |
| 3 <command>fasta_clipping_histogram.pl $input $outfile</command> | |
| 4 | |
| 5 <inputs> | |
| 6 <param format="fasta" name="input" type="data" label="Library to analyze" /> | |
| 7 </inputs> | |
| 8 | |
| 9 <outputs> | |
| 10 <data format="png" name="outfile" metadata_source="input" | |
| 11 /> | |
| 12 </outputs> | |
| 13 <help> | |
| 14 | |
| 15 **What it does** | |
| 16 | |
| 17 This tool creates a histogram image of the sequence length distribution in a given FASTA dataset. | |
| 18 | |
| 19 **TIP:** Use this tool after clipping your library (with **FASTX Clipper tool**), to visualize the clipping results. | |
| 20 | |
| 21 ----- | |
| 22 | |
| 23 **Output Examples** | |
| 24 | |
| 25 In the following library, most sequences are 24-mers to 27-mers. | |
| 26 This could indicate an abundance of endo-siRNAs (depending, of course, on what you tried to sequence in the first place). | |
| 27 | |
| 28 .. image:: ./static/fastx_icons/fasta_clipping_histogram_1.png | |
| 29 | |
| 30 | |
| 31 In the following library, most sequences are 19-, 22-, or 23-mers. | |
| 32 This could indicate an abundance of miRNAs (depending, of course, on what you tried to sequence in the first place). | |
| 33 | |
| 34 .. image:: ./static/fastx_icons/fasta_clipping_histogram_2.png | |
| 35 | |
| 36 | |
| 37 ----- | |
| 38 | |
| 39 | |
| 40 **Input Formats** | |
| 41 | |
| 42 This tool accepts short-read FASTA files. The reads don't have to be short, but each sequence must be on a single line, like so:: | |
| 43 | |
| 44 >sequence1 | |
| 45 AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG | |
| 46 >sequence2 | |
| 47 GTGTGTGTGGGAAGTTGACACAGTA | |
| 48 >sequence3 | |
| 49 CCTTGAGATTAACGCTAATCAAGTAAAC | |
| 50 | |
| 51 | |
| 52 If the sequences span multiple lines:: | |
| 53 | |
| 54 >sequence1 | |
| 55 CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG | |
| 56 TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG | |
| 57 aactggtctttacctTTAAGTTG | |
| 58 | |
| 59 Use the **FASTA Width Formatter** tool to reformat the FASTA file into single-line sequences:: | |
| 60 | |
| 61 >sequence1 | |
| 62 CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG | |
| 63 | |
| 64 | |
| 65 ----- | |
| 66 | |
| 67 | |
| 68 | |
| 69 **Multiplicity counts (a.k.a. read counts)** | |
| 70 | |
| 71 If the sequence identifier (the text after the '>') contains a dash and a number, the number is treated as a multiplicity count (i.e., how many times that sequence was repeated in the original FASTA file before collapsing). | |
| 72 | |
| 73 Example 1 - The following FASTA file *does not* have multiplicity counts:: | |
| 74 | |
| 75 >seq1 | |
| 76 GGATCC | |
| 77 >seq2 | |
| 78 GGTCATGGGTTTAAA | |
| 79 >seq3 | |
| 80 GGGATATATCCCCACACACACACAC | |
| 81 | |
| 82 Each sequence counts as one, to produce the following chart: | |
| 83 | |
| 84 .. image:: ./static/fastx_icons/fasta_clipping_histogram_3.png | |
| 85 | |
| 86 | |
| 87 Example 2 - The following FASTA file has multiplicity counts:: | |
| 88 | |
| 89 >seq1-2 | |
| 90 GGATCC | |
| 91 >seq2-10 | |
| 92 GGTCATGGGTTTAAA | |
| 93 >seq3-3 | |
| 94 GGGATATATCCCCACACACACACAC | |
| 95 | |
| 96 The first sequence counts as 2, the second as 10, the third as 3, to produce the following chart: | |
| 97 | |
| 98 .. image:: ./static/fastx_icons/fasta_clipping_histogram_4.png | |
| 99 | |
| 100 Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts. | |
| 101 | |
| 102 </help> | |
| 103 </tool> | |
| 104 <!-- FASTA-Clipping-Histogram is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) --> |
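
The multiplicity rule described in the help text can be illustrated with a small Python sketch. This is not the code of `fasta_clipping_histogram.pl`; it is a hypothetical re-implementation of the counting step only (no chart is drawn). It assumes single-line sequences and assumes the count is a trailing `-<digits>` suffix on the identifier, which is one reading of "contains a dash and a number". The function name `length_histogram` and the variable `example` are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the multiplicity count is a trailing "-<digits>" suffix
# on the identifier line (e.g. ">seq2-10" means a count of 10).
COUNT_SUFFIX = re.compile(r"-(\d+)$")


def length_histogram(lines):
    """Map each sequence length to the total multiplicity seen at that length.

    Expects single-line FASTA: one identifier line, then one sequence line.
    """
    histogram = Counter()
    count = 1  # weight applied to the next sequence line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            match = COUNT_SUFFIX.search(line)
            # Identifiers without a count suffix weigh as one read.
            count = int(match.group(1)) if match else 1
        else:
            histogram[len(line)] += count
    return histogram


# The "Example 2" records from the help text.
example = """>seq1-2
GGATCC
>seq2-10
GGTCATGGGTTTAAA
>seq3-3
GGGATATATCCCCACACACACACAC
""".splitlines()

for length, total in sorted(length_histogram(example).items()):
    print(length, total)
```

With the records from Example 1 (no suffixes), every sequence would contribute a weight of one instead.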
