annotate bismark_deduplicate/deduplicate_bismark @ 7:fcadce4d9a06 draft

planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
author bgruening
date Sat, 06 May 2017 13:18:09 -0400
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long;


### This script removes alignments to the same position in the genome, which can arise by e.g. PCR amplification
### Paired-end alignments are considered duplicates if both partner reads start and end at the exact same position

### May 13, 2013
### Changed the single-end trimming behavior so that only the start coordinate will be used. This avoids missing duplicate reads that have been trimmed to a varying extent
### Changed the way of determining the end of reads in SAM format to using the CIGAR string if the read contains InDels

### 16 July 2013
### Adding a new deduplication mode for barcoded RRBS-Seq

### 27 Sept 2013
### Added close statement for all output filehandles (which should probably have been there from the start...)

### 8 Jan 2015
### To detect a paired-end command from the @PG line we are now requiring spaces before and after the -1 or -2

### 09 Mar 2015
### Removing newline characters also from the read conversion flag in case the tags had been reordered and are now present in the very last column

### 19 Aug 2015
### Hiding the option --representative from view to discourage people from using it (it was nearly always not what they wanted to do anyway). It should still work
### for alignments that do not contain any indels
### Just for reference, here is the text:
### print "--representative\twill browse through all sequences and print out the sequence with the most representative (as in most frequent) methylation call for any given position. Note that this is very likely the most highly amplified PCR product for a given sequence\n\n";


my $dedup_version = 'v0.16.3';

my $help;
my $representative;
my $single;
my $paired;
my $global_single;
my $global_paired;
my $vanilla;
my $samtools_path;
my $bam;
my $rrbs;
my $version;

# Parse the command-line options; GetOptions returns false if parsing failed
my $command_line = GetOptions ('help'            => \$help,
                               'representative'  => \$representative,
                               's|single'        => \$global_single,
                               'p|paired'        => \$global_paired,
                               'vanilla'         => \$vanilla,
                               'samtools_path=s' => \$samtools_path,
                               'bam'             => \$bam,
                               'barcode'         => \$rrbs,
                               'version'         => \$version,
                              );

die "Please respecify command line options\n\n" unless ($command_line);

if ($help){
  print_helpfile();
  exit;
}

if ($version){
  print << "VERSION";

Bismark Deduplication Module

Deduplicator Version: $dedup_version
Copyright 2010-16 Felix Krueger, Babraham Bioinformatics
www.bioinformatics.babraham.ac.uk/projects/bismark/


VERSION
  exit;
}



# Any remaining arguments are taken as the Bismark output files to deduplicate
my @filenames = @ARGV;

unless (@filenames){
  print "Please provide one or more Bismark output files for deduplication\n\n";
  sleep (2);
  print_helpfile();
  exit;
}


### OPTIONS
unless ($global_single or $global_paired){
  if ($vanilla){
    die "Please specify either -s (single-end) or -p (paired-end) for deduplication. Reading this information from the \@PG header line only works for SAM/BAM files\n\n";
  }
  warn "\nNeither -s (single-end) nor -p (paired-end) selected for deduplication. Trying to extract this information for each file separately from the \@PG line of the SAM/BAM file\n";
}

if ($global_paired){
  if ($global_single){
    die "Please select either -s for single-end files or -p for paired-end files, but not both at the same time!\n\n";
  }
  if ($vanilla){

    if ($rrbs){
      die "Barcode deduplication only works with Bismark SAM (or BAM) output (in attempt to phase out the vanilla format)\n";
    }

    warn "Processing paired-end custom Bismark output file(s):\n";
    warn join ("\t",@filenames),"\n\n";
  }
  else{
    warn "Processing paired-end Bismark output file(s) (SAM format):\n";
    warn join ("\t",@filenames),"\n\n";
  }
}
else{
  if ($vanilla){
    warn "Processing single-end custom Bismark output file(s):\n";
    warn join ("\t",@filenames),"\n\n";
  }
  else{
    warn "Processing single-end Bismark output file(s) (SAM format):\n";
    warn join ("\t",@filenames),"\n\n";
  }
}

### PATH TO SAMTOOLS
if (defined $samtools_path){
  # if Samtools was specified as full command
  if ($samtools_path =~ /samtools$/){
    if (-e $samtools_path){
      # Samtools executable found
    }
    else{
      die "Could not find an installation of Samtools at the location $samtools_path. Please respecify\n";
    }
  }
  else{
    unless ($samtools_path =~ /\/$/){
      $samtools_path =~ s/$/\//;
    }
    $samtools_path .= 'samtools';
    if (-e $samtools_path){
      # Samtools executable found
    }
    else{
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
148 die "Could not find an installation of Samtools at the location $samtools_path. Please respecify\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
149 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
150 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
151 }
# Check whether Samtools is in the PATH if no path was supplied by the user
else{
    if (!system "which samtools >/dev/null 2>&1"){ # STDOUT is discarded, STDERR is redirected to STDOUT. Returns 0 if Samtools is in the PATH
        $samtools_path = `which samtools`;
        chomp $samtools_path;
    }
}

# $bam becomes 1 (Samtools available), 2 (no Samtools: gzipped SAM instead) or 0 (BAM output not requested)
if ($bam){
    if (defined $samtools_path){
        $bam = 1;
    }
    else{
        warn "No Samtools found on your system, writing out a gzipped SAM file instead\n";
        $bam = 2;
    }
}
else{
    $bam = 0;
}


if ($representative){
    warn "\nIf there are several alignments to a single position in the genome, the alignment with the most representative methylation call will be chosen (this might be the most highly amplified PCR product...)\n\n";
    sleep (1);
}
elsif($rrbs){
    warn "\nIf the input is a multiplexed sample with several alignments to a single position in the genome, only alignments with a unique barcode will be chosen\n\n";
    sleep (1);
}
else{ # default; random (=first) alignment
    warn "\nIf there are several alignments to a single position in the genome, the first alignment will be chosen. Since the input files are not in any way sorted this is a near-enough random selection of reads.\n\n";
    sleep (1);
}

foreach my $file (@filenames){

    if ($global_single){
        $paired = 0;
        $single = 1;
    }
    elsif($global_paired){
        $paired = 1;
        $single = 0;
    }

    unless($global_single or $global_paired){

        warn "Trying to determine the type of mapping from the SAM header line\n"; sleep(1);

        ### If the user did not specify whether the alignment file is single-end or paired-end, try to get this information from the @PG header line in the SAM/BAM file
        if ($file =~ /\.gz$/){
            open (DETERMINE,"gunzip -c $file |") or die "Unable to read from gzipped file $file: $!\n";
        }
        elsif ($file =~ /\.bam$/){
            open (DETERMINE,"$samtools_path view -h $file |") or die "Unable to read from BAM file $file: $!\n";
        }
        else{
            open (DETERMINE,'<',$file) or die "Unable to read from $file: $!\n";
        }
        while (<DETERMINE>){
            last unless (/^\@/); # stop at the first non-header line
            if ($_ =~ /^\@PG/){
                # warn "found the \@PG line:\n";
                # warn "$_";

                # the -1 and -2 options in the recorded command line indicate a paired-end run
                if ($_ =~ /\s+-1\s+/ and $_ =~ /\s+-2\s+/){
                    warn "Treating file as paired-end data (extracted from \@PG line)\n"; sleep(1);
                    $paired = 1;
                    $single = 0;
                }
                else{
                    warn "Treating file as single-end data (extracted from \@PG line)\n"; sleep(1);
                    $paired = 0;
                    $single = 1;
                }
            }
        }
        close DETERMINE or warn "$!\n";
    }

    if ($file =~ /(\.bam$|\.sam$)/){
        bam_isEmpty($file);
    }


    ### OPTIONS
    unless ($single or $paired){
        die "Please specify either -s (single-end) or -p (paired-end) for deduplication, or provide a SAM/BAM file that contains the \@PG header line\n\n";
    }

    ###
    unless ($vanilla){
        if ($paired){
            test_positional_sorting($file);
        }
    }


    ### writing to a report file
    ### (e.g. sample.bam -> sample.deduplication_report.txt)
    my $report = $file;

    $report =~ s/\.gz$//;
    $report =~ s/\.sam$//;
    $report =~ s/\.bam$//;
    $report =~ s/\.txt$//;
    $report =~ s/$/.deduplication_report.txt/;

    open (REPORT,'>',$report) or die "Failed to write report file to $report: $!\n\n";


    ### for representative methylation calls we need to discriminate between single-end and paired-end files as the latter have 2 methylation call strings
    if($representative){
        deduplicate_representative($file);
    }

    elsif($rrbs){
        deduplicate_barcoded_rrbs($file);
    }

fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
272 ### as the default option we simply write out the first read for a position and discard all others. This is the fastest option
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
273 else{
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
274
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
275 my %unique_seqs;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
276 my %positions;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
277
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
278 if ($file =~ /\.gz$/){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
279 open (IN,"gunzip -c $file |") or die "Unable to read from gzipped file $file: $!\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
280 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
281 elsif ($file =~ /\.bam$/){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
282 open (IN,"$samtools_path view -h $file |") or die "Unable to read from BAM file $file: $!\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
283 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
284 else{
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
285 open (IN,'<',$file) or die "Unable to read from $file: $!\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
286 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
287
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
288 my $outfile = $file;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
289 $outfile =~ s/\.gz$//;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
290 $outfile =~ s/\.sam$//;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
291 $outfile =~ s/\.bam$//;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
292 $outfile =~ s/\.txt$//;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
293
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
294 if ($vanilla){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
295 $outfile =~ s/$/_deduplicated.txt/;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
296 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
297 else{
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
298 if ($bam == 1){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
299 $outfile =~ s/$/.deduplicated.bam/;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
300 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
301 elsif ($bam == 2){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
302 $outfile =~ s/$/.deduplicated.sam.gz/;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
303 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
304 else{
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
305 $outfile =~ s/$/.deduplicated.sam/;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
306 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
307 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
308 if ($bam == 1){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
309 open (OUT,"| $samtools_path view -bSh 2>/dev/null - > $outfile") or die "Failed to write to $outfile: $!\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
310 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
311 elsif($bam == 2){ ### no Samtools found on system. Using GZIP compression instead
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
312 open (OUT,"| gzip -c - > $outfile") or die "Failed to write to $outfile: $!\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
313 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
314
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
315 else{
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
316 open (OUT,'>',$outfile) or die "Unable to write to $outfile: $!\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
317 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
318
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
319 ### need to proceed slightly differently for the custom Bismark and Bismark SAM output
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
320 if ($vanilla){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
321 $_ = <IN>; # Bismark version header
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
322 print OUT; # Printing the Bismark version to the de-duplicated file again
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
323 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
324 my $count = 0;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
325 my $unique_seqs = 0;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
326 my $removed = 0;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
327
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
328 while (<IN>){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
329
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
330 if ($count == 0){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
331 if ($_ =~ /^Bismark version:/){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
332 warn "The file appears to be in the custom Bismark format and not in SAM format. Please see option --vanilla!\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
333 sleep (2);
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
334 print_helpfile();
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
335 exit;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
336 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
337 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
338
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
339 ### if this is a SAM file we copy header lines to the output unchanged and exclude them from de-duplication
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
340 unless ($vanilla){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
341 if (/^\@\w{2}\t/){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
342 print "skipping header line:\t$_";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
343 print OUT "$_"; # Printing the header lines again into the de-duplicated file
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
344 next;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
345 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
346 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
347
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
348 ++$count;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
349 my $composite; # stores the positional key. For single-end data we only use the start coordinate since the end might have been trimmed to different lengths
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
350
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
351 my ($strand,$chr,$start,$end,$cigar);
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
352 my $line1;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
353
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
354 if ($vanilla){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
355 ($strand,$chr,$start,$end) = (split (/\t/))[1,2,3,4];
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
356 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
357 else{ # SAM format
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
358 ($strand,$chr,$start,$cigar) = (split (/\t/))[1,2,3,5]; # we are assigning the FLAG value to $strand
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
359
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
360 ### SAM single-end
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
361 if ($single){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
362
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
363 if ($strand == 0 ){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
364 ### read aligned to the forward strand. No action needed
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
365 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
366 elsif ($strand == 16){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
367 ### read is on reverse strand
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
368
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
369 $start -= 1; # only need to adjust this once, so that start + aligned length gives the inclusive end coordinate
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
370
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
371 # for InDel-free matches we can simply use the M number in the CIGAR string
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
372 if ($cigar =~ /^(\d+)M$/){ # linear match
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
373 $start += $1;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
374 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
375
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
376 else{
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
377 # parsing CIGAR string, e.g. 10M2D5M yields the lengths (10,2,5) and the operations (M,D,M)
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
378 my @len = split (/\D+/,$cigar); # storing the length per operation
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
379 my @ops = split (/\d+/,$cigar); # storing the operation
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
380 shift @ops; # remove the empty first element
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
381 die "CIGAR string contained a non-matching number of lengths and operations\n" unless (scalar @len == scalar @ops);
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
382
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
383 # warn "CIGAR string; $cigar\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
384 ### determining end position of a read
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
385 foreach my $index(0..$#len){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
386 if ($ops[$index] eq 'M'){ # standard matching bases
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
387 $start += $len[$index];
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
388 # warn "Operation is 'M', adding $len[$index] bp\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
389 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
390 elsif($ops[$index] eq 'I'){ # insertions do not affect the end position
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
391 # warn "Operation is 'I', next\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
392 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
393 elsif($ops[$index] eq 'D'){ # deletions do affect the end position
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
394 # warn "Operation is 'D',adding $len[$index] bp\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
395 $start += $len[$index];
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
396 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
397 else{
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
398 die "Found CIGAR operations other than M, I or D: '$ops[$index]'. Not allowed at the moment\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
399 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
400 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
401 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
402 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
403 $composite = join (":",$strand,$chr,$start);
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
404 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
405 elsif($paired){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
406
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
407 ### storing the current line
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
408 $line1 = $_;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
409
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
410 my $read_conversion;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
411 my $genome_conversion;
        # Bismark stores the read conversion in the XR:Z: tag and the genome conversion in the XG:Z: tag
        while ( /(XR|XG):Z:([^\t]+)/g ) {
          my $tag = $1;
          my $value = $2;

          if ($tag eq "XR") {
            $read_conversion = $value;
            $read_conversion =~ s/\r//;
            chomp $read_conversion;
          } elsif ($tag eq "XG") {
            $genome_conversion = $value;
            $genome_conversion =~ s/\r//;
            chomp $genome_conversion;
          }
        }
        die "Failed to determine read and genome conversion from line: $line1\n\n" unless ($read_conversion and $genome_conversion);

        my $index;
        if ($read_conversion eq 'CT' and $genome_conversion eq 'CT') { ## original top strand
          $index = 0;
          $strand = '+';
        } elsif ($read_conversion eq 'GA' and $genome_conversion eq 'CT') { ## complementary to original top strand
          $index = 1;
          $strand = '-';
        } elsif ($read_conversion eq 'GA' and $genome_conversion eq 'GA') { ## complementary to original bottom strand
          $index = 2;
          $strand = '+';
        } elsif ($read_conversion eq 'CT' and $genome_conversion eq 'GA') { ## original bottom strand
          $index = 3;
          $strand = '-';
        } else {
          die "Unexpected combination of read and genome conversion: '$read_conversion' / '$genome_conversion'\n";
        }

        # if read 1 aligns in forward orientation we can use its start position directly, and only need to work out the end position of read 2
        if ($index == 0 or $index == 2){

          ### reading in the next line (read 2)
          $_ = <IN>;
          # read 2 supplies the end position: take its start (POS) and its CIGAR string
          ($end,my $cigar_2) = (split (/\t/))[3,5];

          $end -= 1; # POS is 1-based, so end = POS - 1 + aligned reference length; only need to adjust this once

          # for InDel free matches we can simply use the M number in the CIGAR string
          if ($cigar_2 =~ /^(\d+)M$/){ # linear match
            $end += $1;
          }
          else{
            # parsing CIGAR string
            my @len = split (/\D+/,$cigar_2); # storing the length per operation
            my @ops = split (/\d+/,$cigar_2); # storing the operation
            shift @ops; # remove the empty first element
            die "CIGAR string contained a non-matching number of lengths and operations ($cigar_2)\n" unless (scalar @len == scalar @ops);

            # warn "CIGAR string; $cigar_2\n";
            ### determining end position of the read
            foreach my $index(0..$#len){
              if ($ops[$index] eq 'M'){ # standard matching bases
                $end += $len[$index];
                # warn "Operation is 'M', adding $len[$index] bp\n";
              }
              elsif($ops[$index] eq 'I'){ # insertions do not affect the end position
                # warn "Operation is 'I', next\n";
              }
              elsif($ops[$index] eq 'D'){ # deletions do affect the end position
                # warn "Operation is 'D',adding $len[$index] bp\n";
                $end += $len[$index];
              }
              else{
                die "Found CIGAR operations other than M, I or D: '$ops[$index]'. Not allowed at the moment\n";
              }
            }
          }
        }
        else{
          # otherwise read 1 aligns in reverse orientation: work out the end of the fragment from read 1 first, then take the start from the next line (read 2)

          $end = $start - 1; # need to adjust this only once

          # for InDel free matches we can simply use the M number in the CIGAR string
          if ($cigar =~ /^(\d+)M$/){ # linear match
            $end += $1;
          }
          else{
            # parsing CIGAR string
            my @len = split (/\D+/,$cigar); # storing the length per operation
            my @ops = split (/\d+/,$cigar); # storing the operation
            shift @ops; # remove the empty first element
            die "CIGAR string contained a non-matching number of lengths and operations ($cigar)\n" unless (scalar @len == scalar @ops);

            # warn "CIGAR string; $cigar\n";
            ### determining end position of the read
            foreach my $index(0..$#len){
              if ($ops[$index] eq 'M'){ # standard matching bases
                $end += $len[$index];
                # warn "Operation is 'M', adding $len[$index] bp\n";
              }
              elsif($ops[$index] eq 'I'){ # insertions do not affect the end position
                # warn "Operation is 'I', next\n";
              }
              elsif($ops[$index] eq 'D'){ # deletions do affect the end position
                # warn "Operation is 'D',adding $len[$index] bp\n";
                $end += $len[$index];
              }
              else{
                die "Found CIGAR operations other than M, I or D: '$ops[$index]'. Not allowed at the moment\n";
              }
            }
          }

          ### reading in the next line
          $_ = <IN>;
          # the only thing we need is the start position
          ($start) = (split (/\t/))[3];
        }
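        # Worked example with made-up values (an illustration, not real data):
        # for a forward pair where read 1 starts at 100 and read 2 has POS 200 and
        # CIGAR 10M2D5M1I4M, the end is 200 - 1 + 10 (M) + 2 (D) + 5 (M) + 0 (I) + 4 (M) = 220,
        # so the fragment is taken to span 100..220.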
        $composite = join (":",$strand,$chr,$start,$end);
      }

      else{
        die "Input must be single or paired-end\n";
      }
    }

    if (exists $unique_seqs{$composite}){
      ++$removed;
      unless (exists $positions{$composite}){
        $positions{$composite}++;
      }
    }
    else{
      if ($paired){
        unless ($vanilla){
          print OUT "$line1"; # printing first paired-end line for SAM output
        }
      }
      print OUT "$_"; # printing single-end SAM alignment or second paired-end line
      $unique_seqs{$composite}++;
    }
  }
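  # Illustration with a hypothetical key (not real data): if two read pairs both
  # yield the composite '+:chr1:100:220', the first pair is printed and stored in
  # %unique_seqs; the second pair only increments $removed and is recorded once
  # in %positions.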

  my $percentage;
  my $percentage_leftover;
  my $leftover = $count - $removed;

  unless ($count == 0){
    $percentage = sprintf("%.2f",$removed/$count*100);
    $percentage_leftover = sprintf("%.2f",$leftover/$count*100);
  }
  else{
    $percentage = 'N/A';
    $percentage_leftover = 'N/A';
  }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
566
        warn "\nTotal number of alignments analysed in $file:\t$count\n";
        warn "Total number duplicated alignments removed:\t$removed ($percentage%)\n";
        warn "Duplicated alignments were found at:\t",scalar keys %positions," different position(s)\n\n";
        warn "Total count of deduplicated leftover sequences: $leftover ($percentage_leftover% of total)\n\n";

        print REPORT "\nTotal number of alignments analysed in $file:\t$count\n";
        print REPORT "Total number duplicated alignments removed:\t$removed ($percentage%)\n";
        print REPORT "Duplicated alignments were found at:\t",scalar keys %positions," different position(s)\n\n";
        print REPORT "Total count of deduplicated leftover sequences: $leftover ($percentage_leftover% of total)\n\n";
    }

    close OUT or warn "Failed to close output filehandle: $!\n";
    close REPORT or warn "Failed to close report filehandle: $!\n";

}
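
### Illustrative sketch only, not part of the original script. The reverse-strand
### handling in deduplicate_barcoded_rrbs() derives the end coordinate of a read from
### its 1-based start position and its CIGAR string. The helper below shows one way this
### could be written as a stand-alone function. The sub name and its interface are
### assumptions made for this example, and it only knows the M, I and D operations
### that the script itself handles.
sub sketch_end_from_cigar{
    my ($start,$cigar) = @_;
    my $end = $start - 1; # 1-based start, so end = start + reference length - 1
    while ($cigar =~ /(\d+)(\D)/g){
        my ($len,$op) = ($1,$2);
        if ($op eq 'M' or $op eq 'D'){ # matches and deletions consume reference bases
            $end += $len;
        }
        elsif ($op eq 'I'){
            # insertions do not consume reference bases
        }
        else{
            die "Unsupported CIGAR operation '$op' in '$cigar'\n";
        }
    }
    return $end;
}
### Usage: sketch_end_from_cigar(100,'10M2D5M') spans 17 reference bases (positions 100 to 116)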


sub deduplicate_barcoded_rrbs{

    my $file = shift;

    my %unique_seqs;
    my %positions;

    if ($file =~ /\.gz$/){
        open (IN,"gunzip -c $file |") or die "Unable to read from gzipped file $file: $!\n";
    }
    elsif ($file =~ /\.bam$/){
        open (IN,"$samtools_path view -h $file |") or die "Unable to read from BAM file $file: $!\n";
    }
    else{
        ### 3-argument open so that special characters in the file name are not interpreted as an open mode
        open (IN,'<',$file) or die "Unable to read from $file: $!\n";
    }

    my $outfile = $file;
    $outfile =~ s/\.gz$//;
    $outfile =~ s/\.sam$//;
    $outfile =~ s/\.bam$//;
    $outfile =~ s/\.txt$//;

    if ($vanilla){
        $outfile =~ s/$/_dedup_RRBS.txt/;
    }
    else{
        if ($bam == 1){
            $outfile =~ s/$/.dedup_RRBS.bam/;
        }
        elsif ($bam == 2){
            $outfile =~ s/$/.dedupRRBS.sam.gz/;
        }
        else{
            $outfile =~ s/$/.dedup_RRBS.sam/;
        }
    }
    if ($bam == 1){
        open (OUT,"| $samtools_path view -bSh 2>/dev/null - > $outfile") or die "Failed to write to $outfile: $!\n";
    }
    elsif($bam == 2){ ### no Samtools found on system. Using GZIP compression instead
        open (OUT,"| gzip -c - > $outfile") or die "Failed to write to $outfile: $!\n";
    }
    else{
        open (OUT,'>',$outfile) or die "Unable to write to $outfile: $!\n";
    }

    ### This mode only supports Bismark SAM output
    my $count = 0;
    my $unique_seqs = 0;
    my $removed = 0;

    while (<IN>){

        if ($count == 0){
            if ($_ =~ /^Bismark version:/){
                warn "The file appears to be in the custom Bismark and not SAM format. Please see option --vanilla!\n";
                sleep (2);
                print_helpfile();
                exit;
            }
        }

        ### if this was a SAM file we ignore header lines
        if (/^\@\w{2}\t/){
            warn "skipping SAM header line:\t$_";
            print OUT; # Printing the header lines again into the de-duplicated file
            next;
        }

        ++$count;
        my $composite; # storing positional data. For single end data we are only using the start coordinate since the end might have been trimmed to different lengths
        ### in this barcoded mode we also store the read barcode as additional means of assisting the deduplication
        ### in effect the $composite string looks like this (separated by ':'):

        ### FLAG:chromosome:start:barcode

        my $end;
        my $line1;

        # SAM format
        my ($id,$strand,$chr,$start,$cigar) = (split (/\t/))[0,1,2,3,5]; # we are assigning the FLAG value to $strand

        # capturing in list context leaves $barcode undefined if the read ID does not end in ':BARCODE'
        my ($barcode) = $id =~ /:(\w+)$/;
        unless (defined $barcode){
            die "Failed to extract a barcode from the read ID (last element of each read ID needs to be the barcode sequence, e.g. ':CATG')\n\n";
        }

        ### SAM single-end
        if ($single){

            if ($strand == 0 ){
                ### read aligned to the forward strand. No action needed
            }
            elsif ($strand == 16){
                ### read is on reverse strand

                $start -= 1; # only need to adjust this once

                # for InDel free matches we can simply use the M number in the CIGAR string
                if ($cigar =~ /^(\d+)M$/){ # linear match
                    $start += $1;
                }
                else{
                    # parsing CIGAR string
                    my @len = split (/\D+/,$cigar); # storing the length per operation
                    my @ops = split (/\d+/,$cigar); # storing the operation
                    shift @ops; # remove the empty first element
                    die "CIGAR string contained a non-matching number of lengths and operations\n" unless (scalar @len == scalar @ops);

                    # warn "CIGAR string; $cigar\n";
                    ### determining end position of a read
                    foreach my $index(0..$#len){
                        if ($ops[$index] eq 'M'){ # standard matching bases
                            $start += $len[$index];
                            # warn "Operation is 'M', adding $len[$index] bp\n";
                        }
                        elsif($ops[$index] eq 'I'){ # insertions do not affect the end position
                            # warn "Operation is 'I', next\n";
                        }
                        elsif($ops[$index] eq 'D'){ # deletions do affect the end position
                            # warn "Operation is 'D',adding $len[$index] bp\n";
bgruening
parents:
diff changeset
707 $start += $len[$index];
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
708 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
709 else{
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
710 die "Found CIGAR operations other than M, I or D: '$ops[$index]'. Not allowed at the moment\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
711 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
712 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
713 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
714 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
715
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
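    # Illustrative sketch, not part of the original script: the CIGAR walk above
    # advances the position by every M and D operation and skips I, so for
    # '10M2I5D20M' the read covers 10 + 5 + 20 = 35 reference bases. Assuming one
    # wanted that rule in a single place, it could look like the helper below.
    # The name cigar_reference_span and its variables are hypothetical.
    sub cigar_reference_span {
        my ($cigar_string) = @_;
        my $span = 0;
        # walk the string one <length><operation> pair at a time
        while ($cigar_string =~ /(\d+)([A-Z=])/g) {
            my ($length, $op) = ($1, $2);
            if ($op eq 'M' or $op eq 'D') {
                $span += $length; # matches and deletions consume reference bases
            }
            elsif ($op ne 'I') { # insertions consume none; anything else is rejected
                die "Found CIGAR operations other than M, I or D: '$op'. Not allowed at the moment\n";
            }
        }
        return $span;
    }
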
    ### Here we take the barcode sequence into consideration
    $composite = join (":",$strand,$chr,$start,$barcode);
    # warn "$composite\n\n";
    # sleep(1);
}
elsif($paired){

    ### storing the current line
    $line1 = $_;

    my $read_conversion;
    my $genome_conversion;

    while ( /(XR|XG):Z:([^\t]+)/g ) {
        my $tag = $1;
        my $value = $2;

        if ($tag eq "XR") {
            $read_conversion = $value;
            $read_conversion =~ s/\r//;
            chomp $read_conversion;
        }
        elsif ($tag eq "XG") {
            $genome_conversion = $value;
            $genome_conversion =~ s/\r//;
            chomp $genome_conversion;
        }
    }
    # both the XR (read conversion) and XG (genome conversion) tags must be present
    die "Failed to determine read and genome conversion from line: $line1\n\n" unless ($read_conversion and $genome_conversion);


    my $index;
    if ($read_conversion eq 'CT' and $genome_conversion eq 'CT') { ## original top strand
        $index = 0;
        $strand = '+';
    } elsif ($read_conversion eq 'GA' and $genome_conversion eq 'CT') { ## complementary to original top strand
        $index = 1;
        $strand = '-';
    } elsif ($read_conversion eq 'GA' and $genome_conversion eq 'GA') { ## complementary to original bottom strand
        $index = 2;
        $strand = '+';
    } elsif ($read_conversion eq 'CT' and $genome_conversion eq 'GA') { ## original bottom strand
        $index = 3;
        $strand = '-';
    } else {
        die "Unexpected combination of read and genome conversion: '$read_conversion' / '$genome_conversion'\n";
    }

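    # Illustrative sketch, not part of the original script: the if/elsif chain
    # above maps each XR/XG pair to a strand index (0 to 3), and the even indices
    # (0 and 2) are the ones reported as '+'. Assuming a table-driven form were
    # preferred, the same mapping could be written as below. The name
    # strand_index_for and its variables are hypothetical.
    sub strand_index_for {
        my ($read_conv, $genome_conv) = @_;
        my %index_for = (
            'CT:CT' => 0, # original top strand
            'GA:CT' => 1, # complementary to original top strand
            'GA:GA' => 2, # complementary to original bottom strand
            'CT:GA' => 3, # original bottom strand
        );
        my $key = "$read_conv:$genome_conv";
        die "Unexpected combination of read and genome conversion: '$read_conv' / '$genome_conv'\n"
            unless exists $index_for{$key};
        my $idx = $index_for{$key};
        return ($idx, ($idx % 2 == 0) ? '+' : '-');
    }
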
    # if the read aligns in forward orientation we can certainly use the start position of read 1, and only need to work out the end position of read 2
    if ($index == 0 or $index == 2){

        ### reading in the next line
        $_ = <IN>;
        # the only thing we need is the end position
        ($end,my $cigar_2) = (split (/\t/))[3,5];

        $end -= 1; # only need to adjust this once

        # for InDel free matches we can simply use the M number in the CIGAR string
        if ($cigar_2 =~ /^(\d+)M$/){ # linear match
            $end += $1;
        }
        else{
            # parsing CIGAR string
            my @len = split (/\D+/,$cigar_2); # storing the length per operation
            my @ops = split (/\d+/,$cigar_2); # storing the operation
            shift @ops; # remove the empty first element
            die "CIGAR string contained a non-matching number of lengths and operations ($cigar_2)\n" unless (scalar @len == scalar @ops);

            # warn "CIGAR string; $cigar_2\n";
            ### determining end position of the read
            foreach my $index(0..$#len){
                if ($ops[$index] eq 'M'){ # standard matching bases
                    $end += $len[$index];
                    # warn "Operation is 'M', adding $len[$index] bp\n";
                }
                elsif($ops[$index] eq 'I'){ # insertions do not affect the end position
                    # warn "Operation is 'I', next\n";
                }
                elsif($ops[$index] eq 'D'){ # deletions do affect the end position
                    # warn "Operation is 'D',adding $len[$index] bp\n";
                    $end += $len[$index];
                }
                else{
                    die "Found CIGAR operations other than M, I or D: '$ops[$index]'. Not allowed at the moment\n";
                }
            }
        }
    }
    else{
        # else read 1 aligns in reverse orientation and we need to work out the end of the fragment first, and use the start of the next line

        $end = $start - 1; # need to adjust this only once

        # for InDel free matches we can simply use the M number in the CIGAR string
        if ($cigar =~ /^(\d+)M$/){ # linear match
            $end += $1;
        }
        else{
            # parsing CIGAR string
            my @len = split (/\D+/,$cigar); # storing the length per operation
            my @ops = split (/\d+/,$cigar); # storing the operation
            shift @ops; # remove the empty first element
            die "CIGAR string contained a non-matching number of lengths and operations ($cigar)\n" unless (scalar @len == scalar @ops);

            # warn "CIGAR string; $cigar\n";
            ### determining end position of the read
            foreach my $index(0..$#len){
                if ($ops[$index] eq 'M'){ # standard matching bases
                    $end += $len[$index];
                    # warn "Operation is 'M', adding $len[$index] bp\n";
                }
                elsif($ops[$index] eq 'I'){ # insertions do not affect the end position
                    # warn "Operation is 'I', next\n";
                }
                elsif($ops[$index] eq 'D'){ # deletions do affect the end position
                    # warn "Operation is 'D',adding $len[$index] bp\n";
                    $end += $len[$index];
                }
                else{
                    die "Found CIGAR operations other than M, I or D: '$ops[$index]'. Not allowed at the moment\n";
                }
            }
        }

        ### reading in the next line
        $_ = <IN>;
        # the only thing we need is the start position
        ($start) = (split (/\t/))[3];
parents:
diff changeset
845 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
846
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
847 ### Here we take the barcode sequence into consideration
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
848 $composite = join (":",$strand,$chr,$start,$end,$barcode);
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
849 }
    else{
      die "Input must be single or paired-end\n";
    }

    if (exists $unique_seqs{$composite}){
      ++$removed;
      unless (exists $positions{$composite}){
        $positions{$composite}++;
      }
    }
    else{
      if ($paired){
        print OUT $line1; # printing first paired-end line for SAM output
      }
      print OUT; # printing single-end SAM alignment or second paired-end line
      $unique_seqs{$composite}++;
    }
  }

  my $percentage;
  my $percentage_leftover;
  my $leftover = $count - $removed;

  unless ($count == 0){
    $percentage = sprintf("%.2f",$removed/$count*100);
    $percentage_leftover = sprintf("%.2f",$leftover/$count*100);
  }
  else{
    $percentage = 'N/A';
    $percentage_leftover = 'N/A';
  }


  warn "\nTotal number of alignments analysed in $file:\t$count\n";
  warn "Total number duplicated alignments removed:\t$removed ($percentage%)\n";
  warn "Duplicated alignments were found at:\t",scalar keys %positions," different position(s)\n\n";
  warn "Total count of deduplicated leftover sequences: $leftover ($percentage_leftover% of total)\n\n";

  print REPORT "\nTotal number of alignments analysed in $file:\t$count\n";
  print REPORT "Total number duplicated alignments removed:\t$removed ($percentage%)\n";
  print REPORT "Duplicated alignments were found at:\t",scalar keys %positions," different position(s)\n\n";
  print REPORT "Total count of deduplicated leftover sequences: $leftover ($percentage_leftover% of total)\n\n";

}
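
### Illustrative sketch, not part of the original script.
### The CIGAR handling above can be summarised in a small stand-alone helper. This is only a
### minimal sketch under stated assumptions: the name sketch_cigar_end is hypothetical, the
### end coordinate is assumed to start at $start - 1, and 'M' is assumed to advance the end
### position in the same way as 'D' (the 'M' branch is not shown in this part of the listing).
sub sketch_cigar_end{
  my ($start,$cigar) = @_;
  my $end = $start - 1; # assumption: a lone 10M starting at position 100 then ends at 109
  while ($cigar =~ /(\d+)([A-Z])/g){
    my ($len,$op) = ($1,$2);
    if ($op eq 'M' or $op eq 'D'){ # matches (assumed) and deletions consume reference bases
      $end += $len;
    }
    elsif ($op eq 'I'){ # insertions do not affect the end position
      next;
    }
    else{
      die "Found CIGAR operations other than M, I or D: '$op'\n";
    }
  }
  return $end;
}
### Under these assumptions, sketch_cigar_end(100,'5M2D5M') adds 5 + 2 + 5 to 99.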

sub bam_isEmpty{

  my $file = shift;

  if ($file =~ /\.bam$/){
    open (EMPTY,"$samtools_path view $file |") or die "Unable to read from BAM file $file: $!\n";
  }
  else{
    open (EMPTY,$file) or die "Unable to read from $file: $!\n";
  }
  my $count = 0;
  while (<EMPTY>){
    if ($_){
      $count++; # file contains data, fine.
    }
    last; # one line is enough
  }

  if ($count == 0){
    die "\n### File appears to be empty, terminating deduplication process. Please make sure the input file has not been truncated. ###\n\n";
  }
  close EMPTY or warn "$!\n";
}


sub print_helpfile{
  print "\n",'='x111,"\n";
  print "\nThis script is supposed to remove alignments to the same position in the genome from the Bismark mapping output\n(both single and paired-end SAM files), which can arise by e.g. excessive PCR amplification. If sequences align\nto the same genomic position but on different strands they will be scored individually.\n\nNote that deduplication is not recommended for RRBS-type experiments!\n\nIn the default mode, the first alignment to a given position will be used irrespective of its methylation call\n(this is the fastest option, and as the alignments are not ordered in any way this is also near enough random).\n\n";
  print "For single-end alignments only the start coordinate of a read will be used for deduplication.\n\n";
  print "For paired-end alignments the start coordinate of the first read and the end coordinate of the second\nread will be used for deduplication. ";
  print "This script expects the Bismark output to be in SAM format\n(Bismark v0.6.x or higher). To deduplicate the old custom Bismark output please specify '--vanilla'.\n\n";
  print "*** Please note that for paired-end BAM files the deduplication script expects Read1 and Read2 to\nfollow each other in consecutive lines! If the file has been sorted by position make sure that you re-sort it\nby read name first (e.g. using samtools sort -n) ***\n\n";

  print '='x111,"\n\n";
  print ">>> USAGE: ./deduplicate_bismark_alignment_output.pl [options] filename(s) <<<\n\n";

  print "-s/--single\t\tdeduplicate single-end Bismark files (default format: SAM)\n";
  print "-p/--paired\t\tdeduplicate paired-end Bismark files (default format: SAM)\n\n";
  print "--vanilla\t\tThe input file is in the old custom Bismark format and not in SAM format\n\n";
  print "--barcode\t\tIn addition to chromosome, start position and orientation this will also take a potential barcode into\n\t\t\tconsideration while deduplicating. The barcode needs to be the last element of the read ID and separated\n\t\t\tby a ':', e.g.: MISEQ:14:000000000-A55D0:1:1101:18024:2858_1:N:0:CTCCT\n\n";
  print "--bam\t\t\tThe output will be written out in BAM format instead of the default SAM format. This script will\n\t\t\tattempt to use the path to Samtools that was specified with '--samtools_path', or, if it hasn't\n\t\t\tbeen specified, attempt to find Samtools in the PATH. If no installation of Samtools can be found,\n\t\t\tthe SAM output will be compressed with GZIP instead (yielding a .sam.gz output file)\n\n";
  print "--samtools_path\t\tThe path to your Samtools installation, e.g. /home/user/samtools/. Does not need to be specified\n\t\t\texplicitly if Samtools is in the PATH already\n\n";
  print "--version\t\tPrint version information and exit\n";

  print '='x111,"\n\n";

  print "This script was last modified on November 04, 2015\n\n";
}
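
### Illustrative sketch, not part of the original script: one way to pull a barcode off the
### end of a read ID, as described in the --barcode help text above. The name sketch_barcode
### is hypothetical, and the pattern used by the real script may differ.
sub sketch_barcode{
  my $id = shift;
  return unless ($id =~ /:([^:]+)$/); # no ':'-separated trailing element found
  return $1; # e.g. CTCCT for the example read ID given in the help text
}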




sub test_positional_sorting{

  my $filename = shift;

  print "\nNow testing Bismark result file $filename for positional sorting (which would be bad...)\t";
  sleep(1);

  if ($filename =~ /\.gz$/) {
    open (TEST,"gunzip -c $filename |") or die "Can't open gzipped file $filename: $!\n";
  }
  elsif ($filename =~ /bam$/ || isBam($filename) ){ ### this allows reading BAM files that do not end in *.bam
    if ($samtools_path){
      open (TEST,"$samtools_path view -h $filename |") or die "Can't open BAM file $filename: $!\n";
    }
    else{
      die "Sorry, couldn't find an installation of Samtools. Either specify an alternative path using the option '--samtools_path /your/path/', or use a SAM file instead\n\n";
    }
  }
  else {
    open (TEST,$filename) or die "Can't open file $filename: $!\n";
  }

  my $count = 0;

  while (<TEST>) {
    if (/^\@/) { # testing header lines if they contain the @SO flag (for being sorted)
      if (/^\@SO/) {
        die "SAM/BAM header line '$_' indicates that the Bismark alignment file has been sorted by chromosomal positions which is incompatible with correct methylation extraction. Please use an unsorted file instead (e.g. use samtools sort -n)\n\n";
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
975 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
976 next;
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
977 }
    $count++;

    last if ($count > 100000); # else we test the first 100000 sequences if they start with the same read ID

    my ($id_1) = (split (/\t/));

    ### reading the next line which should be read 2
    $_ = <TEST>;
    last unless (defined $_); # input ended after read 1, nothing left to compare
    my ($id_2) = (split (/\t/));
    last unless ($id_2);
    ++$count;

    if ($id_1 eq $id_2){
      ### ids are the same
      next;
    }
    else{ ### in previous versions of Bismark we appended /1 and /2 to the read IDs for easier eyeballing which read is which. These tags need to be removed first
      my $id_1_trunc = $id_1;
      $id_1_trunc =~ s/\/1$//;
      my $id_2_trunc = $id_2;
      $id_2_trunc =~ s/\/2$//;

      unless ($id_1_trunc eq $id_2_trunc){
        die "\nThe IDs of Read 1 ($id_1) and Read 2 ($id_2) are not the same. This might be a result of sorting the paired-end SAM/BAM files by chromosomal position which is not compatible with correct methylation extraction. Please use an unsorted file instead (e.g. use samtools sort -n)\n\n";
      }
    }
  }
  # close TEST or die $!; somehow fails on our cluster...
  ### If it hasn't died so far then it seems the file is in the correct Bismark format (read 1 and read 2 of a pair directly following each other)
  warn "...passed!\n";
  sleep(1);

}
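
### ILLUSTRATIVE SKETCH (not part of the original script): the two checks made by the
### sorting test above, pulled out as small pure functions so they can be tried in isolation.
### Both subroutine names are hypothetical and nothing in this script calls them.
### Assumption 1: the SAM format records the sort order as an 'SO:' tag on the '@HD' header
### line (e.g. "@HD\tVN:1.0\tSO:coordinate"), so a tag-based check may be closer to the
### intent than looking for a line that starts with '@SO'.
### Assumption 2: the read IDs of a pair are either identical or differ only by a trailing
### /1 and /2, as described in the comments above.

sub header_says_coordinate_sorted_sketch {
  my ($line) = @_;

  # only the @HD line carries the sort order
  return 0 unless (defined $line and $line =~ /^\@HD\t/);

  # the SO tag must be a complete tab-delimited field with the value 'coordinate'
  return ($line =~ /\tSO:coordinate(?:\t|\s*$)/) ? 1 : 0;
}

sub read_ids_form_pair_sketch {
  my ($id_1,$id_2) = @_;
  return 0 unless (defined $id_1 and defined $id_2);

  # identical IDs are the normal case
  return 1 if ($id_1 eq $id_2);

  # older output appended /1 and /2; strip those tags before comparing
  (my $id_1_trunc = $id_1) =~ s/\/1$//;
  (my $id_2_trunc = $id_2) =~ s/\/2$//;

  return ($id_1_trunc eq $id_2_trunc) ? 1 : 0;
}

### Possible usage inside a loop over the input lines:
###   die "file appears to be sorted by position\n" if header_says_coordinate_sorted_sketch($_);
###   die "read IDs do not match\n" unless read_ids_form_pair_sketch($id_1,$id_2);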

sub isBam{

  my $filename = shift;

  # reading the first line of the input file to see if it is a BAM file in disguise (i.e. a BAM file that does not end in *.bam which may be produced by Galaxy)
  open (DISGUISE,"gunzip -c $filename |") or die "Failed to open filehandle DISGUISE for $filename\n\n";

  ### when BAM files are read through a gunzip -c stream they start with BAM...
  my $bam_in_disguise = <DISGUISE>;
  # warn "BAM in disguise: $bam_in_disguise\n\n";

  if ($bam_in_disguise){
    if ($bam_in_disguise =~ /^BAM/){
      close (DISGUISE) or warn "Had trouble closing filehandle BAM in disguise: $!\n";
      return 1;
    }
    else{
      close (DISGUISE) or warn "Had trouble closing filehandle BAM in disguise: $!\n";
      return 0;
    }
  }
  else{
    close (DISGUISE) or warn "Had trouble closing filehandle BAM in disguise: $!\n";
    return 0;
  }
}
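
### ILLUSTRATIVE SKETCH (not part of the original script): the same "BAM in disguise" test
### without spawning an external gunzip process, so the file name never passes through a shell.
### Assumptions: the core module IO::Uncompress::Gunzip is available, and a BAM file
### decompresses to a stream that starts with the characters 'BAM' (the same test isBam uses).
### The subroutine name is hypothetical and nothing in this script calls it.

use IO::Uncompress::Gunzip ();

sub looks_like_bam_sketch {
  my ($filename) = @_;

  # Transparent => 0 makes new() fail (return undef) for input that is not gzip-compressed
  my $gz = IO::Uncompress::Gunzip->new($filename, Transparent => 0) or return 0;

  # only the first few uncompressed bytes are needed
  my $start = '';
  my $status = $gz->read($start,4);
  $gz->close();

  return 0 unless (defined $status and $status > 0);
  return ($start =~ /^BAM/) ? 1 : 0;
}

### Possible usage:
###   my $is_bam = looks_like_bam_sketch($filename);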


#####################################

### The following subroutine "deduplicate_representative" only works correctly for reads that do not contain any indels (as in only Bowtie1-based alignments).
### Please refrain from using it unless you want to test something out.

#####################################

sub deduplicate_representative {
  my $file = shift;

  my %positions;
  my %unique_seqs;

  my $count = 0;
  my $unique_seqs = 0;
  my $removed = 0;

  ### going through the file first and storing all positions as well as their methylation call strings in a hash
  if ($file =~ /\.gz$/){
    open (IN,"gunzip -c $file |") or die "Unable to read from gzipped file $file: $!\n";
  }
  elsif ($file =~ /\.bam$/){
    open (IN,"$samtools_path view -h $file |") or die "Unable to read from BAM file $file: $!\n";
  }
  else{
    open (IN,$file) or die "Unable to read from $file: $!\n";
  }

  if ($single){

    my $outfile = $file;
    $outfile =~ s/\.gz$//;
    $outfile =~ s/\.sam$//;
    $outfile =~ s/\.bam$//;
    $outfile =~ s/\.txt$//;

    if ($vanilla){
      $outfile =~ s/$/.deduplicated_to_representative_sequences.txt/;
    }
    else{
      if ($bam == 1){
        $outfile =~ s/$/.deduplicated_to_representative_sequences.bam/;
      }
      elsif ($bam == 2){
        $outfile =~ s/$/.deduplicated_to_representative_sequences.sam.gz/;
      }
      else{
        $outfile =~ s/$/.deduplicated_to_representative_sequences.sam/;
      }
    }

    if ($bam == 1){
      open (OUT,"| $samtools_path view -bSh 2>/dev/null - > $outfile") or die "Failed to write to $outfile: $!\n";
    }
    elsif($bam == 2){ ### no Samtools found on system. Using GZIP compression instead
      open (OUT,"| gzip -c - > $outfile") or die "Failed to write to $outfile: $!\n";
    }
    else{
      open (OUT,'>',$outfile) or die "Unable to write to $outfile: $!\n";
    }

    warn "Reading and storing all alignment positions\n";

    ### need to proceed slightly differently for the custom Bismark and Bismark SAM output
    if ($vanilla){
      $_ = <IN>; # Bismark version header
      print OUT; # Printing the Bismark version to the de-duplicated file again
    }

    while (<IN>){

      if ($count == 0){
        if ($_ =~ /^Bismark version:/){
          warn "The file appears to be in the custom Bismark and not SAM format. Please see option --vanilla!\n";
          sleep (2);
          print_helpfile();
          exit;
        }
      }

      ### if this was a SAM file we ignore header lines
      unless ($vanilla){
        if (/^\@\w{2}\t/){
          warn "skipping SAM header line:\t$_";
          print OUT; # Printing the header lines again into the de-duplicated file
          next;
        }
      }

      my ($strand,$chr,$start,$end,$meth_call);

      if ($vanilla){
        ($strand,$chr,$start,$end,$meth_call) = (split (/\t/))[1,2,3,4,7];
      }
      else{ # SAM format

        ($strand,$chr,$start,my $seq,$meth_call) = (split (/\t/))[1,2,3,9,13]; # we are assigning the FLAG value to $strand
        ### SAM single-end
        $end = $start + length($seq) - 1;
      }

      my $composite = join (":",$strand,$chr,$start,$end);

      $count++;
      $positions{$composite}->{$meth_call}->{count}++;
      $positions{$composite}->{$meth_call}->{alignment} = $_;
    }
    warn "Stored ",scalar keys %positions," different positions for $count sequences in total (+ and - alignments to the same position are scored individually)\n\n";
    close IN or warn $!;
  }

  elsif ($paired){

    ### we are going to concatenate both methylation call strings from the paired end file to form a joint methylation call string

    my $outfile = $file;
    if ($vanilla){
      $outfile =~ s/$/_deduplicated_to_representative_sequences_pe.txt/;
    }
    else{
      $outfile =~ s/$/_deduplicated_to_representative_sequences_pe.sam/;
    }

    open (OUT,'>',$outfile) or die "Unable to write to $outfile: $!\n";
    warn "Reading and storing all alignment positions\n";

    ### need to proceed slightly differently for the custom Bismark and Bismark SAM output
    if ($vanilla){
      $_ = <IN>; # Bismark version header
      print OUT; # Printing the Bismark version to the de-duplicated file again
    }

    while (<IN>){

      if ($count == 0){
        if ($_ =~ /^Bismark version:/){
          warn "The file appears to be in the custom Bismark and not SAM format. Please see option --vanilla!\n";
          sleep (2);
          print_helpfile();
          exit;
        }
      }

      ### if this was a SAM file we ignore header lines
      unless ($vanilla){
        if (/^\@\w{2}\t/){
          warn "skipping SAM header line:\t$_";
          print OUT; # Printing the header lines again into the de-duplicated file
          next;
        }
      }

      my ($strand,$chr,$start,$end,$meth_call_1,$meth_call_2);
      my $line1;

      if ($vanilla){
        ($strand,$chr,$start,$end,$meth_call_1,$meth_call_2) = (split (/\t/))[1,2,3,4,7,10];
      }
      else{ # SAM paired-end format

        ($strand,$chr,$start,$meth_call_1) = (split (/\t/))[1,2,3,13]; # we are assigning the FLAG value to $strand

        ### storing the first line (= read 1 alignment)
        $line1 = $_;

        ### reading in the next line
        $_ = <IN>;
        # we only need the end position and the methylation call
        (my $pos,my $seq_2,$meth_call_2) = (split (/\t/))[3,9,13];
        $end = $pos + length($seq_2) - 1;
      }

      my $composite = join (":",$strand,$chr,$start,$end);
      $count++;
      my $meth_call = $meth_call_1.$meth_call_2;

      $positions{$composite}->{$meth_call}->{count}++;
      if ($vanilla){
        $positions{$composite}->{$meth_call}->{alignment} = $_;
      }
      else{ # SAM PAIRED-END
        $positions{$composite}->{$meth_call}->{alignment_1} = $line1;
        $positions{$composite}->{$meth_call}->{alignment_2} = $_;
      }
    }
    warn "Stored ",scalar keys %positions," different positions for $count sequences in total (+ and - alignments to the same position are scored individually)\n\n";
    close IN or warn $!;
  }
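
### Illustration only (not part of the original script): a minimal, standalone
### sketch of how the paired-end SAM branch above derives its composite key
### (FLAG:chromosome:start of read 1:end of read 2) and the joint methylation
### call. The two records and the helper name composite_for_pair are
### hypothetical; the column indices mirror the ones used above and assume the
### XM:Z tag sits in column 14 of each record.

```perl
use strict;
use warnings;

# Two hypothetical, simplified SAM records forming one read pair
my @pair = (
  join("\t", 'r1', 99,  'chr1', 100, 42, '10M', '=', 150,  60, 'ACGTACGTAC', 'IIIIIIIIII', 'NM:i:0', 'MD:Z:10', 'XM:Z:..z....Z..'),
  join("\t", 'r1', 147, 'chr1', 150, 42, '10M', '=', 100, -60, 'TTGGCCAATT', 'IIIIIIIIII', 'NM:i:0', 'MD:Z:10', 'XM:Z:.h.....x..'),
);

# Returns the composite position key and the concatenated methylation call
sub composite_for_pair {
  my ($line1, $line2) = @_;
  my ($flag, $chr, $start, $meth_call_1) = (split /\t/, $line1)[1,2,3,13];
  my ($pos, $seq_2, $meth_call_2)        = (split /\t/, $line2)[3,9,13];
  my $end = $pos + length($seq_2) - 1;   # end of read 2, computed from its sequence length
  return (join(':', $flag, $chr, $start, $end), $meth_call_1.$meth_call_2);
}

my ($composite, $meth_call) = composite_for_pair(@pair);
print "$composite\t$meth_call\n";
```

### As in the code above, the end coordinate is computed from the sequence
### length alone, so the sketch does not account for insertions or deletions.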

  ### PRINTING RESULTS

  ### Now going through all stored positions and printing out the methylation call which is most representative, i.e. the one which occurred most often
  warn "Now printing out alignments with the most representative methylation call(s)\n";
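
### Illustration only (not part of the original script): a minimal, standalone
### sketch of the selection rule described above, i.e. keeping, for each
### position, the alignment whose methylation call occurred most often. The
### %positions contents and the helper name most_representative are
### hypothetical; the nested layout (position => call => count/alignment)
### follows the structure filled in by the reading code.

```perl
use strict;
use warnings;

# Hypothetical stored data: position key => methylation call => { count, alignment }
my %positions = (
  '0:chr1:100:109' => {
    'XM:Z:..z.......' => { count => 3, alignment => "readA\n" },
    'XM:Z:..Z.......' => { count => 1, alignment => "readB\n" },
  },
  '16:chr2:500:509' => {
    'XM:Z:..........' => { count => 2, alignment => "readC\n" },
  },
);

# Returns the alignment stored for the most frequent methylation call
sub most_representative {
  my ($calls) = @_;
  my ($best) = sort { $calls->{$b}{count} <=> $calls->{$a}{count} } keys %$calls;
  return $calls->{$best}{alignment};
}

my %kept = map { $_ => most_representative($positions{$_}) } keys %positions;
print "$_\t$kept{$_}" for sort keys %kept;
```

### When two calls are equally frequent, which one comes first depends on hash
### key order, so the choice between tied calls is arbitrary in this sketch.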
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
1236
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
1237 foreach my $pos (keys %positions){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
1238 foreach my $meth_call (sort { $positions{$pos}->{$b}->{count} <=> $positions{$pos}->{$a}->{count} }keys %{$positions{$pos}}){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
1239 if ($paired){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
1240 if ($vanilla){
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
1241 print OUT $positions{$pos}->{$meth_call}->{alignment};
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
1242 }
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
1243 else{
fcadce4d9a06 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit b'e6ee273f75fff61d1e419283fa8088528cf59470\n'
bgruening
parents:
diff changeset
1244 print OUT $positions{$pos}->{$meth_call}->{alignment_1}; # SAM read 1
1245 print OUT $positions{$pos}->{$meth_call}->{alignment_2}; # SAM read 2
1246 }
1247 }
1248 else{ # single-end
1249 print OUT $positions{$pos}->{$meth_call}->{alignment};
1250 }
1251 $unique_seqs++;
1252 last; ### exit the inner loop once the alignment with the most frequent methylation call for this position has been printed
1253 }
1254 }
1255
1256 my $percentage;
1257 unless ($count == 0){
1258 $percentage = sprintf ("%.2f",$unique_seqs*100/$count);
1259 }
1260 else{
1261 $percentage = 'N/A';
1262 }
1263
1264 warn "\nTotal number of alignments analysed in $file:\t$count\n";
1265 warn "Total number of representative alignments printed from $file in total:\t$unique_seqs ($percentage%)\n\n";
1266 print REPORT "\nTotal number of alignments analysed in $file:\t$count\n";
1267 print REPORT "Total number of representative alignments printed from $file in total:\t$unique_seqs ($percentage%)\n\n";
1268
1269 }
1270
1271
1272