TopHit filter (version 0.1.Alx)
See the help below for advanced splitter options and for how to use a pipe ({pipe}) as splitter.
If multiple hits are returned, they will not be sorted!

What it does

TopHit_namefilter is a SIMPLE filter that keeps just the TOPHIT, or the first [N] occurrence(s), of an identifier. It is useful for keeping only the first N tophits in BLAST output when multiple hits were returned and you don't want to rerun the BLAST analysis.

Please be aware that NO additional filtering or checking is done on, for instance, the E-values of BLAST hits. Tophit = FIRST hit, not necessarily the best. If multiple hits are selected to be returned, they will NOT be sorted: in the example below, with 2 hits requested, the second hit for an identifier occurs somewhere else in the input and therefore also somewhere else in the output file.
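
The behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the actual Perl script: the function name top_hits, its parameter names, and the sample data are hypothetical, and the sketch assumes that lines are emitted in input order and that lines without the requested column are skipped.

```python
import re
from collections import defaultdict

def top_hits(lines, column=1, hits=1, splitter=r"\t"):
    """Keep the first `hits` lines for each identifier in `column` (1-based)."""
    pattern = re.compile(splitter)
    seen = defaultdict(int)  # identifier -> number of lines kept so far
    kept = []
    for line in lines:
        fields = pattern.split(line.rstrip("\n"))
        if column > len(fields):
            continue  # assumption: lines lacking the column are skipped
        key = fields[column - 1]
        if seen[key] < hits:
            seen[key] += 1
            kept.append(line)  # the ORIGINAL, unsplit line is kept
    return kept

# Hypothetical sample: the identifier q1 occurs three times.
sample = [
    "q1\thitA\n",
    "q2\thitB\n",
    "q1\thitC\n",
    "q1\thitD\n",
    "q3\thitE\n",
]
result = top_hits(sample, column=1, hits=2)
print("".join(result), end="")
```

Note that no scores are inspected anywhere: "first" simply means first in the file, which is why the input should already be ordered the way you want it before filtering.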

Comments/feedback on the Perl script or GALAXY wrapper: alex.bossers@wur.nl


Note! Beware the special use of splitters! The splitter is used as a Perl split pattern, so characters that have a special meaning there (such as the pipe |) need to be escaped by a leading backslash (\).

Examples of splitters. The split is used for filtering only; the end result will remain the ORIGINAL unsplit line!

Splitter   Meaning                           Example line to split          Split result for filtering only!
--------   -------------------------------   -----------------------        --------------------------------
  \t       Single tab                        Foo<tab>Bar<tab>here    --->   Foo          Bar        here
  \|       Single pipe                       Foo<tab>Bar|here        --->   Foo<tab>Bar  here
  -        Single dash                       Foo-Bar                 --->   Foo          Bar
  -|\|     Combined splits on dash OR pipe   Foo-Bar|here            --->   Foo          Bar        here
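
The splitters in the table can be tried out with Python's re.split, which treats these particular patterns the same way as a Perl split pattern. This is a sketch for experimenting with splitters, not part of the tool itself; the variable names are illustrative.

```python
import re

# (splitter, example line) pairs taken from the table above.
examples = [
    (r"\t", "Foo\tBar\there"),   # single tab
    (r"\|", "Foo\tBar|here"),    # single pipe, escaped
    (r"-", "Foo-Bar"),           # single dash
    (r"-|\|", "Foo-Bar|here"),   # dash OR pipe
]

split_results = [re.split(pattern, text) for pattern, text in examples]

for (pattern, text), fields in zip(examples, split_results):
    print(f"{pattern!r:10} {text!r:20} -> {fields}")

# An unescaped "|" means "empty pattern OR empty pattern" and therefore
# does not split on the literal pipe character; hence the leading backslash.
```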

EXAMPLE

Parameters: Column = 1, hits = 2 and splitter = \t

Input

Any text/tabular file:

Q3262-21     gi|71066702|gb|AE016828.2|      tja..here something extra
Q3262-23     gi|71066702|gb|AE016828.2|      okay
Q3262-24     gi|71066702|gb|AE016828.2| nothing there
Q3262-21     gi|71066702|gb|AE016828.2| enhier       was zonder space :)
Q3262-26     gi|71066702|gb|AE016828.2|      or still
Q3262-21     gi|71066702|gb|AE016828.2|
Q3262-21     gi|71066702|gb|AE016828.2|
Q3262-21     gi|71066702|gb|AE016828.2|
Q3262-21     gi|71066702|gb|AE016828.2|
Q3262-21     gi|145004|gb|M80806.1|COXTRANSPO
Q3262-21     gi|144996|gb|M20482.1|COXHSPAB
Q3262-21     gi|161761570|gb|CP000890.1|
Q3262-30     gi|161761570|gb|CP000890.1|
Q3262-21     gi|161761570|gb|CP000890.1|
Q3262-21     gi|161761570|gb|CP000890.1|
Q3262-21     gi|161761570|gb|CP000890.1|

Outputs

Q3262-21     gi|71066702|gb|AE016828.2|      tja..here something extra
Q3262-23     gi|71066702|gb|AE016828.2|      okay
Q3262-21     gi|71066702|gb|AE016828.2| enhier       was zonder space :)
Q3262-24     gi|71066702|gb|AE016828.2| nothing there
Q3262-26     gi|71066702|gb|AE016828.2|      or still
Q3262-30     gi|161761570|gb|CP000890.1|

Please acknowledge our work when you find it useful!