annotate TopHit_namefilter/TopHit_namefilter_galaxy.pl @ 0:9f1fe290345e default tip

Migrated tool version 0.1.Alx from old tool shed archive to new tool shed repository
author abossers
date Tue, 07 Jun 2011 18:07:34 -0400
#!/usr/bin/perl -w

# Simple filter to keep just the TOPHIT / first occurrence of some identifier.
# Useful for keeping only the first (top) hit in BLAST output when multiple
# hits are returned.
#
# Please be aware that NO additional filtering or checking is done on, for
# instance, E values of BLAST hits. Tophit = FIRST hit, not necessarily the best.
#
# input   list/table having some groupable identifier
# input   the column number to filter on (column numbers start at 1)
# input   number of occurrences to keep
#           note that the hits are displayed in order of occurrence
#           and NOT sorted on the given column!
# input   column splitter (default TAB)
#           Note that: splitting on tab:  \t
#                      splitting on pipe: \|
#                      combined splits:   -|\|  (splits on '-' OR '|')
#
# output  the same table having only the FIRST occurrence(s) of the identifier.
#
# alex.bossers@wur.nl
#

my $version = "v0.13.alx 19-5-2011";
# Version history
# 0.13  19-05-2011  added extra cmdline opt: hits to keep -> first galaxy version
# 0.12  19-05-2011  mods to fit initial needs. Not distributed.
# 0.1   xx-xx-2010  template

use strict;
use warnings;

# cmd line options
if (@ARGV < 5) {
    warn "Error: not enough arguments\n";
    usage();
}
# only accept file names made of letters, digits and _ . - /
my ($input)  = $ARGV[0] =~ m/^([A-Z0-9_.\-\/]+)$/i;
my $column   = $ARGV[1];    # column numbers start at 1!
my $splitter = $ARGV[2];    # splitter for fields to use (might need enclosing "")
my $hits     = $ARGV[3];    # number of occurrences to keep
my ($output) = $ARGV[4] =~ m/^([A-Z0-9_.\-\/]+)$/i;

if (!defined $input || !defined $output) { warn "Invalid input/output file name\n"; usage(); }
if ($column !~ /^\d+$/ || $hits !~ /^\d+$/ || $column < 1 || $hits < 1) { warn "Invalid column/hits number\n"; usage(); }

# keeping track
my $entrycounter = 0;
my $filter_count = 0;

# open the files (three-argument open keeps odd file names from being read as a mode)
open(IN, '<', $input)   || die "Input file error: $!\n";
open(OUT, '>', $output) || die "Output file error: $!\n";

# count occurrences per key (the value in the specified column) and
# print a line only while its key has been seen fewer than $hits times
my %filtered;
while (<IN>) {
    chomp;
    my $line   = $_;
    my @fields = split($splitter, $line);
    #print "@fields\n";
    $entrycounter++;
    # lines with too few columns share the empty key instead of warning
    my $key = defined $fields[$column-1] ? $fields[$column-1] : '';
    if (exists $filtered{$key}) {
        if ($filtered{$key} < $hits) {
            # number of occurrences to keep
            print OUT "$line\n";
            $filtered{$key}++;
            $filter_count++;
        }
        next;
    }
    else {
        $filtered{$key} = 1;    # first occurrence
        print OUT "$line\n";
        #print "key: $key\tLine: $line\n";
        $filter_count++;
    }
}

# end and close
close(IN);
close(OUT);

print "\nVersion : $version\nComments/bugs : alex.bossers\@wur.nl\n";
print "Processed : $entrycounter entries\n";
print "Filtered : $filter_count entries remain\n";

sub usage {
    warn "\nVersion: $version\nContact/bugs: alex.bossers\@wur.nl\n";
    my ($cmd) = $0 =~ m/([A-Z0-9_.-]+)$/i;
    # backslashes are doubled so that \t and \| are printed literally
    die <<EOF;
usage: $cmd <infile> <column> <splitter> <hits> <outfile>

    INPUT:  infile    Input original tabular/text

            column    Input column number to use (>= 1)

            splitter  Splitter char to use (e.g. \\t for tab)
                      For splitting on pipe use escaping: \\|
                      Combined splits possible: -|\\| splits on both - and |

            hits      Number of hits to keep (in order of occurrence)
                      The results are NOT sorted!

    OUTPUT: outfile   Output filename of filtered table.

EOF
}
#end script
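
The filtering rule described in the header comment (keep the first N lines per identifier, in input order, with the splitter treated as a regular expression) can be tried in isolation with a small sketch like the one below. The helper name keep_first_hits and the sample rows are made up for illustration and are not part of the tool; the sketch works on an in-memory list instead of files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): keep at most $hits
# lines per distinct value of column $column (1-based), in input order.
sub keep_first_hits {
    my ($lines, $column, $splitter, $hits) = @_;
    my (%seen, @kept);
    for my $line (@$lines) {
        # the splitter is used as a regular expression, e.g. \t or -|\|
        my @fields = split /$splitter/, $line;
        my $key = defined $fields[$column - 1] ? $fields[$column - 1] : '';
        next if ($seen{$key} || 0) >= $hits;   # key already seen often enough
        $seen{$key}++;
        push @kept, $line;
    }
    return @kept;
}

# Made-up BLAST-like rows: query ID, subject ID, E value.
my @rows = (
    "q1\thitA\t1e-50",
    "q1\thitB\t1e-60",
    "q2\thitC\t1e-10",
    "q1\thitD\t1e-05",
    "q2\thitE\t1e-08",
);

my @top1 = keep_first_hits(\@rows, 1, '\t', 1);
my @top2 = keep_first_hits(\@rows, 1, '\t', 2);

print "$_\n" for @top1;   # prints the q1/hitA row and the q2/hitC row
```

Note that with one hit per key the q1/hitA row is kept even though q1/hitB has the smaller E value in this sample: the filter keeps the first occurrence, not the best one, so the input should already be ordered the way you want.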