annotate TopHit_namefilter/TopHit_namefilter_galaxy.pl @ 0:9f1fe290345e default tip
Migrated tool version 0.1.Alx from old tool shed archive to new tool shed repository
author | abossers |
---|---|
date | Tue, 07 Jun 2011 18:07:34 -0400 |
parents | |
children |
#!/usr/bin/perl -w

# Simple filter to keep just the TOPHIT / first occurrence(s) of some identifier.
# Useful for keeping only the first (top) hit per query in BLAST output when
# multiple hits are returned.
#
# Please be aware that NO additional filtering or checking is done on, for
# instance, E-values of BLAST hits. Tophit = FIRST hit, not necessarily the best.
#
# input   list/table having some groupable identifier
# input   the column number to filter on (column numbers start at 1)
# input   number of occurrences to keep
#         note that the hits are written in order of occurrence
#         and are NOT sorted on the given column!
# input   column splitter (default TAB); it is used as a Perl regular expression:
#           splitting on tab:   \t
#           splitting on pipe:  \|
#           combined splits:    -|\|   (splits on '-' OR '|')
#
# output  the same table having only the first <hits> occurrences of each identifier.
#
# alex.bossers@wur.nl
#

my $version = "v0.13.alx 19-5-2011";
# Version history
#    0.13    19-05-2011  added extra cmdline opt: hits to keep -> first galaxy version
#    0.12    19-05-2011  mods to fit initial needs. Not distributed.
#    0.1     xx-xx-2010  template

use strict;
use warnings;

# command line options: five arguments are required
if (@ARGV < 5) {
    warn "Error: not enough arguments\n";
    usage();
}
# file names are restricted to letters, digits and _ . - /
my ($input)  = $ARGV[0] =~ m/^([A-Z0-9_.\-\/]+)$/i;
my $column   = $ARGV[1];    # column numbers start at 1!
my $splitter = $ARGV[2];    # splitter for fields to use (might need enclosing "")
my $hits     = $ARGV[3];    # number of occurrences to keep
my ($output) = $ARGV[4] =~ m/^([A-Z0-9_.\-\/]+)$/i;

unless (defined $input && defined $output) {
    warn "Invalid characters in input/output file name\n";
    usage();
}
unless ($column =~ /^\d+$/ && $hits =~ /^\d+$/ && $column >= 1 && $hits >= 1) {
    warn "Invalid column/hits number\n";
    usage();
}

# keeping track
my $entrycounter = 0;    # number of input lines read
my $filter_count = 0;    # number of lines written to the output

# open the files (three-argument open, so the file names are never
# interpreted as open modes or pipes)
open(IN,  '<', $input)  || die "Input file error: $!\n";
open(OUT, '>', $output) || die "Output file error: $!\n";

# Count how often each key (the value in the chosen column) has been seen
# and write a line only while its key has been seen at most $hits times.
my %filtered;
while (my $line = <IN>) {
    chomp $line;
    my @fields = split($splitter, $line);
    #print "@fields\n";
    $entrycounter++;
    # lines with fewer than $column fields share the empty key
    my $key = defined $fields[$column-1] ? $fields[$column-1] : '';
    # skip the line once $hits occurrences of this key have been kept
    next if ++$filtered{$key} > $hits;
    print OUT "$line\n";
    #print "key: $key\tLine: $line\n";
    $filter_count++;
}

# end and close
close(IN);
close(OUT) || die "Output file error: $!\n";

print "\nVersion   : $version\nComments/bugs : alex.bossers\@wur.nl\n";
print "Processed : $entrycounter entries\n";
print "Filtered  : $filter_count entries remain\n";

sub usage {
    warn "\nVersion: $version\nContact/bugs: alex.bossers\@wur.nl\n";
    my ($cmd) = $0 =~ m/([A-Z0-9_.-]+)$/i;
    # the heredoc interpolates, so backslashes are doubled to print literally
    die <<EOF;
usage: $cmd <infile> <column> <splitter> <hits> <outfile>

    INPUT:  infile      Input original tabular/text

            column      Input column number to use (>= 1)

            splitter    Splitter (regular expression) to use (e.g. \\t for tab)
                        For splitting on pipe use escaping: \\|
                        Combined splits possible: -|\\| splits on both - and |

            hits        Number of hits to keep (in order of occurrence)
                        The results are NOT sorted!

    OUTPUT: outfile     Output filename of filtered table.

EOF
}
#end script
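
The rule the script applies (keep the first N lines per key, in input order, with no sorting) can be tried in isolation. What follows is a minimal sketch and not the author's code: the helper name `keep_first_hits` and the sample rows are made up for illustration, and the splitter is assumed to be passed as a regular-expression string, as the script does with its third argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): keep the first
# $hits lines for each distinct value found in column $column (1-based).
# $splitter is used as a regular expression by split().
sub keep_first_hits {
    my ($lines, $column, $splitter, $hits) = @_;
    my %seen;    # key => number of times the key has been encountered
    my @kept;
    for my $line (@$lines) {
        my @fields = split($splitter, $line);
        # Lines with too few columns are grouped under the empty key.
        my $key = defined $fields[$column - 1] ? $fields[$column - 1] : '';
        next if ++$seen{$key} > $hits;
        push @kept, $line;
    }
    return @kept;
}

# Made-up BLAST-like rows: query id, subject id, E-value.
my @rows = (
    "q1\ts1\t1e-50",
    "q1\ts2\t1e-20",
    "q2\ts3\t1e-5",
    "q1\ts4\t1e-80",
    "q2\ts5\t1e-30",
);

# Keep one hit per query id (column 1), splitting on tab.
my @top = keep_first_hits(\@rows, 1, '\t', 1);
print "$_\n" for @top;
```

With these rows and one hit per key, the call keeps the `s1` row for `q1` and the `s3` row for `q2`. The `q1`/`s4` row is dropped even though it has the lowest E-value, because the rule is purely positional.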