annotate tools/indels/sam_indel_filter.xml @ 1:cdcb0ce84a1b

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:45:15 -0500
parents 9071e359b9a3
children
rev   line source
0
9071e359b9a3 Uploaded
xuebing
parents:
diff changeset
<tool id="sam_indel_filter" name="Filter Indels" version="1.0.0">
  <description>for SAM</description>
  <command interpreter="python">
    sam_indel_filter.py
      --input=$input1
      --quality_threshold=$quality_threshold
      --adjacent_bases=$adjacent_bases
      --output=$out_file1
  </command>
  <inputs>
    <param format="sam" name="input1" type="data" label="Select dataset to filter" />
    <param name="quality_threshold" type="integer" value="40" label="Quality threshold for adjacent bases" help="Takes Phred value assuming Sanger scale; usually between 0 and 40, but up to 93" />
    <param name="adjacent_bases" type="integer" value="1" label="The number of adjacent bases to match on either side of the indel" help="If one side is shorter than this width, the read will be excluded" />
  </inputs>
  <outputs>
    <data format="sam" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="input1" value="sam_indel_filter_in1.sam" ftype="sam"/>
      <param name="quality_threshold" value="14"/>
      <param name="adjacent_bases" value="2"/>
      <output name="out_file1" file="sam_indel_filter_out1.sam" ftype="sam"/>
    </test>
    <test>
      <param name="input1" value="sam_indel_filter_in1.sam" ftype="sam"/>
      <param name="quality_threshold" value="29"/>
      <param name="adjacent_bases" value="5"/>
      <output name="out_file1" file="sam_indel_filter_out2.sam" ftype="sam"/>
    </test>
    <test>
      <param name="input1" value="sam_indel_filter_in2.sam" ftype="sam"/>
      <param name="quality_threshold" value="7"/>
      <param name="adjacent_bases" value="1"/>
      <output name="out_file1" file="sam_indel_filter_out3.sam" ftype="sam"/>
    </test>
  </tests>
  <help>

**What it does**

Extracts alignments that contain an indel from SAM files produced by BWA. It currently handles only alignments with exactly one insertion or one deletion; any alignment with more than one indel is skipped. It matches CIGAR strings (column 6 in the SAM file) like 5M3I5M or 4M2D10M, so there must be a run of matches or mismatches of sufficient length on either side of the indel.

-----

**Example**

Suppose you have the following::

  r770 89 ref 116 37 17M1I5M = 72131356 0 CACACTGTGACAGACAGCGCAGC 00/02!!0//1200210AA44/1 XT:A:U CM:i:2 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:1 XG:i:1 MD:Z:22
  r770 181 ref 116 0 24M = 72131356 0 TTGGTGCGCGCGGTTGAGGGTTGG $$(#%%#$%#%####$%%##$###
  r1945 177 ref 41710908 0 23M 190342418 181247988 0 AGAGAGAGAGAGAGAGAGAGAGA SQQWZYURVYWX]]YXTSY]]ZM XT:A:R CM:i:0 SM:i:0 AM:i:0 X0:i:163148 XM:i:0 XO:i:0 XG:i:0 MD:Z:23
  r3671 117 ref 190342418 0 24M = 190342418 0 CTGGCGTTCTCGGCGTGGATGGGT #####$$##$#%#%%###%$#$##
  r3671 153 ref 190342418 37 16M1I6M = 190342418 0 TCTAACTTAGCCTCATAATAGCT /&lt;&lt;!"0///////00/!!0121/ XT:A:U CM:i:2 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:1 XG:i:1 MD:Z:22
  r3824 117 ref 80324999 0 24M = 80324999 0 TCCAGTCGCGTTGTTAGGTTCGGA #$#$$$#####%##%%###**#+/
  r3824 153 ref 80324999 37 8M1I14M = 80324999 0 TTTAGCCCGAAATGCCTAGAGCA 4;6//11!"11100110////00 XT:A:U CM:i:2 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:1 XG:i:1 MD:Z:22
  r4795 81 ref 26739130 0 23M 57401793 57401793 0 TGGCATTCCTGTAGGCAGAGAGG AZWWZS]!"QNXZ]VQ]]]/2]] XT:A:R CM:i:2 SM:i:0 AM:i:0 X0:i:3 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:23
  r4795 161 ref 57401793 37 23M 26739130 26739130 0 GATCACCCAGGTGATGTAACTCC ]WV]]]]WW]]]]]]]]]]PU]] XT:A:U CM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:23
  r4800 16 ref 241 255 15M1D8M = 0 0 CGTGGCCGGCGGGCCGAAGGCAT IIIIIIIIIICCCCIII?IIIII XT:A:U CM:i:2 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:1 XG:i:1 MD:Z:22
  r5377 170 ref 59090793 37 23M 26739130 26739130 0 TATCAATAAGGTGATGTAACTCG ]WV]ABAWW]]]]]P]P//GU]] XT:A:U CM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:23
  r5612 151 ref 190342418 37 19M1I3M = 190342418 0 TCTAACTTAGCCTCATAATAGCT /&lt;&lt;!"0/4//7//00/BC0121/ XT:A:U CM:i:2 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:1 XG:i:1 MD:Z:22


To select only alignments with indels, decide on the minimum quality the adjacent bases must have and on the number of adjacent bases to check. If you set the quality threshold to 14 (a Phred value on the Sanger scale, encoded as the character "/", ASCII 47) and the number of bases to check to 2, you will get the following output::

  r770 89 ref 116 37 17M1I5M = 72131356 0 CACACTGTGACAGACAGCGCAGC 00/02!!0//1200210AA44/1 XT:A:U CM:i:2 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:1 XG:i:1 MD:Z:22
  r4800 16 ref 241 255 15M1D8M = 0 0 CGTGGCCGGCGGGCCGAAGGCAT IIIIIIIIIICCCCIII?IIIII XT:A:U CM:i:2 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:1 XG:i:1 MD:Z:22
  r5612 151 ref 190342418 37 19M1I3M = 190342418 0 TCTAACTTAGCCTCATAATAGCT /&lt;&lt;!"0/4//7//00/BC0121/ XT:A:U CM:i:2 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:1 XG:i:1 MD:Z:22


For more information on SAM, please consult the `SAM format description`__.

.. __: http://www.ncbi.nlm.nih.gov/pubmed/19505943


  </help>
</tool>
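The filtering rule described in the tool's help can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the contents of `sam_indel_filter.py`, which is not shown on this page. The names `SINGLE_INDEL` and `passes_filter` are hypothetical, and two details are assumptions: the comparison is inclusive (`>=`), and qualities are decoded as Sanger Phred+33.

```python
import re

# Hypothetical pattern: a run of matches, exactly one insertion or deletion,
# then another run of matches (e.g. 5M3I5M or 4M2D10M).
SINGLE_INDEL = re.compile(r"^(\d+)M(\d+)([ID])(\d+)M$")


def passes_filter(cigar, qual, quality_threshold, adjacent_bases, offset=33):
    """Return True if the alignment has one indel with good flanking bases.

    Assumptions (not confirmed by the tool source): the threshold test is
    inclusive, and `qual` uses the Sanger encoding (Phred + 33).
    """
    m = SINGLE_INDEL.match(cigar)
    if m is None:
        # No indel, more than one indel, or an unsupported CIGAR operation.
        return False
    left_len = int(m.group(1))
    indel_len = int(m.group(2))
    kind = m.group(3)
    right_len = int(m.group(4))
    if left_len < adjacent_bases or right_len < adjacent_bases:
        # One side is shorter than the requested width: exclude the read.
        return False
    # Inserted bases occupy positions in the read; deleted bases do not.
    right_start = left_len + (indel_len if kind == "I" else 0)
    flank = (qual[left_len - adjacent_bases:left_len]
             + qual[right_start:right_start + adjacent_bases])
    return all(ord(c) - offset >= quality_threshold for c in flank)


# Usage on (name, CIGAR, QUAL) triples taken from the help example.
records = [
    ("r770", "17M1I5M", "00/02!!0//1200210AA44/1"),
    ("r3671", "16M1I6M", '/<<!"0///////00/!!0121/'),
    ("r4800", "15M1D8M", "IIIIIIIIIICCCCIII?IIIII"),
]
kept = [name for name, cigar, qual in records
        if passes_filter(cigar, qual, quality_threshold=14, adjacent_bases=2)]
print(kept)
```

Under these assumptions, r3671 is dropped because a base next to its insertion has quality `!` (Phred 0), while r770 and r4800 are kept.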