What it does
TopHit_namefilter is a SIMPLE filter that keeps only the first N occurrences (the "top hits") of an identifier. It is useful for keeping only the first N hits per query in BLAST output when multiple hits were returned and you don't want to rerun the BLAST analysis.
Please be aware that NO additional filtering or checking is done, for instance on the E-values of BLAST hits. Top hit = FIRST hit, not necessarily the best. If multiple hits per identifier are returned, they will NOT be sorted or grouped: in the example below, with 2 hits requested, the second hit for an identifier occurs somewhere else in the input and therefore also somewhere else in the output file.
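The "first N occurrences" behaviour described above can be sketched in a few lines. This is a minimal Python illustration, not the actual Perl script: the function name `tophit_filter`, its parameters, the sample lines, and the choice to skip lines that lack the requested column are all assumptions made for the example, and the line order produced by the real tool may differ.

```python
import re
from collections import Counter


def tophit_filter(lines, column=1, hits=1, splitter=r"\t"):
    """Keep the first `hits` lines seen for each identifier in `column` (1-based).

    Hypothetical helper for illustration; not the tool's own code.
    """
    seen = Counter()
    kept = []
    for line in lines:
        fields = re.split(splitter, line.rstrip("\n"))
        if column > len(fields):
            continue  # assumption: lines without the requested column are skipped
        key = fields[column - 1]
        if seen[key] < hits:
            seen[key] += 1
            kept.append(line)  # the ORIGINAL, unsplit line is kept
    return kept


# Made-up BLAST-like lines: query, subject, E-value (tab-separated).
blast_like = [
    "queryA\tsubject1\t1e-5",
    "queryB\tsubject1\t1e-20",
    "queryA\tsubject2\t1e-50",
    "queryA\tsubject3\t1e-3",
]

for line in tophit_filter(blast_like, column=1, hits=1):
    print(line)
```

With hits=1 the sketch keeps the queryA line with E-value 1e-5 and drops the later, better 1e-50 line, which is exactly the "first hit, not necessarily the best" caveat.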
Comments/feedback on the Perl script or GALAXY wrapper: alex.bossers@wur.nl
Note! Beware the special use of splitters! The splitter is used as a Perl split pattern, so characters that have a special meaning in a Perl split (such as the pipe |) must be escaped with a leading backslash (\).
Examples of splitters before filtering (end result will remain the ORIGINAL unsplit line!):
Splitter  Meaning                          Example line to split    Split result for filtering only!
--------  -------------------------------  -----------------------  --------------------------------
\t        Single tab                       Foo<tab>Bar<tab>here     ---> Foo Bar here
\|        Single pipe                      Foo<tab>Bar|here         ---> Foo<tab>Bar here
-         Single dash                      Foo-Bar                  ---> Foo Bar
-|\|      Combined splits on dash OR pipe  Foo-Bar|here             ---> Foo Bar here
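The splitter examples above can be tried out with a short script. The sketch below uses Python's `re.split` as a stand-in for Perl's split (an assumption for illustration; the tool itself is a Perl script), and the helper name `split_for_filtering` is hypothetical. For these four patterns the two languages split the same way.

```python
import re


def split_for_filtering(line, splitter):
    """Split a line on a regex splitter (hypothetical helper, illustration only)."""
    return re.split(splitter, line)


# One (line, splitter) pair per row of the splitter table.
examples = [
    ("Foo\tBar\there", r"\t"),   # single tab
    ("Foo\tBar|here", r"\|"),    # single pipe, escaped
    ("Foo-Bar", r"-"),           # single dash
    ("Foo-Bar|here", r"-|\|"),   # dash OR pipe
]

for line, splitter in examples:
    print(repr(splitter), "--->", split_for_filtering(line, splitter))
```

Note that the pipe between the dash and the escaped pipe in the last pattern is left unescaped on purpose: there it acts as the "OR" of the pattern.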
EXAMPLE
Parameters: Column = 1, hits = 2 and splitter = \t
Input
Any text/tabular file:
Q3262-21 gi|71066702|gb|AE016828.2| tja..here something extra
Q3262-23 gi|71066702|gb|AE016828.2| okay
Q3262-24 gi|71066702|gb|AE016828.2| nothing there
Q3262-21 gi|71066702|gb|AE016828.2| enhier was zonder space :)
Q3262-26 gi|71066702|gb|AE016828.2| or still
Q3262-21 gi|71066702|gb|AE016828.2|
Q3262-21 gi|71066702|gb|AE016828.2|
Q3262-21 gi|71066702|gb|AE016828.2|
Q3262-21 gi|71066702|gb|AE016828.2|
Q3262-21 gi|145004|gb|M80806.1|COXTRANSPO
Q3262-21 gi|144996|gb|M20482.1|COXHSPAB
Q3262-21 gi|161761570|gb|CP000890.1|
Q3262-30 gi|161761570|gb|CP000890.1|
Q3262-21 gi|161761570|gb|CP000890.1|
Q3262-21 gi|161761570|gb|CP000890.1|
Q3262-21 gi|161761570|gb|CP000890.1|
Outputs
Q3262-21 gi|71066702|gb|AE016828.2| tja..here something extra
Q3262-23 gi|71066702|gb|AE016828.2| okay
Q3262-21 gi|71066702|gb|AE016828.2| enhier was zonder space :)
Q3262-24 gi|71066702|gb|AE016828.2| nothing there
Q3262-26 gi|71066702|gb|AE016828.2| or still
Q3262-30 gi|161761570|gb|CP000890.1|
Please acknowledge our work when you find it useful!