view TopHit_namefilter/TopHit_namefilter.xml @ 0:9f1fe290345e default tip
Migrated tool version 0.1.Alx from old tool shed archive to new tool shed repository
author | abossers |
---|---|
date | Tue, 07 Jun 2011 18:07:34 -0400 |
<tool id="TopHit_namefilter" name="TopHit filter" version="0.1.Alx">
  <description>Simple filter to keep N occurrences of lines in a file</description>
  <command interpreter="perl">
    TopHit_namefilter_galaxy.pl $input $column "$splitter" $hits $output_file
    <!-- 2>$logfile -->
  </command>
  <inputs>
    <param name="input" type="data" format="tabular,txt" label="Input tabular or plain text file" />
    <param name="column" type="integer" size="4" value="1" label="Column number to use after the split!" />
    <param name="splitter" type="text" size="10" value="\t" label="Splitter character/code to use" help="See help below for advanced options and how to use {pipe}" >
      <sanitizer>
        <valid>
          <add value="\"/>
          <add value="&gt;"/>
          <add value="%"/>
          <add value="|"/>
        </valid>
      </sanitizer>
    </param>
    <param name="hits" type="integer" size="4" value="1" label="Number of occurrences to keep" help="They will not be sorted!" />
  </inputs>
  <outputs>
    <data name="output_file" format="input" label="Filtered table/text" />
  </outputs>
  <tests>
  </tests>
  <help>

**What it does**

TopHit_namefilter is a SIMPLE filter that keeps just the TOPHIT, i.e. the first [N] occurrence(s) of some identifier. It is useful for keeping only the first N top hits in BLAST output when multiple hits were returned (and you don't want to rerun the BLAST analysis).

Please be aware that NO additional filtering or checking is done on, for instance, E-values of BLAST hits. Tophit = FIRST hit, not necessarily the best.

If multiple hits are selected to be returned, they will NOT be sorted (see the example below, where 2 hits are requested and the second hit occurs somewhere else in the input and therefore in the output file).

**Comments/feedback** on the Perl script or GALAXY wrapper: alex.bossers@wur.nl

-----

**Note!**

Beware the special use of splitters, especially if you want to use special characters that have a "perl" split meaning. They need to be escaped by a leading \\.

Examples of splitters before filtering (end result will remain the ORIGINAL unsplit line!):

::

  Splitter  Meaning                          Example line to split          Split result for filtering only!
  --------  -------------------------------  -----------------------        --------------------------------
  \t        Single tab                       Foo&lt;tab&gt;Bar&lt;tab&gt;here     --->  Foo Bar here
  \|        Single pipe                      Foo&lt;tab&gt;Bar|here         --->  Foo&lt;tab&gt;Bar here
  -         Single dash                      Foo-Bar                  --->  Foo Bar
  -|\|      Combined splits on dash OR pipe  Foo-Bar|here             --->  Foo Bar here

-----

**EXAMPLE**

Parameters: Column = 1, **hits = 2** and splitter = \\t

**Input**

Any text/tabular file:

::

  Q3262-21 gi|71066702|gb|AE016828.2| tja..here something extra
  Q3262-23 gi|71066702|gb|AE016828.2| okay
  Q3262-24 gi|71066702|gb|AE016828.2| nothing there
  Q3262-21 gi|71066702|gb|AE016828.2| enhier was zonder space :)
  Q3262-26 gi|71066702|gb|AE016828.2| or still
  Q3262-21 gi|71066702|gb|AE016828.2|
  Q3262-21 gi|71066702|gb|AE016828.2|
  Q3262-21 gi|71066702|gb|AE016828.2|
  Q3262-21 gi|71066702|gb|AE016828.2|
  Q3262-21 gi|145004|gb|M80806.1|COXTRANSPO
  Q3262-21 gi|144996|gb|M20482.1|COXHSPAB
  Q3262-21 gi|161761570|gb|CP000890.1|
  Q3262-30 gi|161761570|gb|CP000890.1|
  Q3262-21 gi|161761570|gb|CP000890.1|
  Q3262-21 gi|161761570|gb|CP000890.1|
  Q3262-21 gi|161761570|gb|CP000890.1|

**Outputs**

::

  Q3262-21 gi|71066702|gb|AE016828.2| tja..here something extra
  Q3262-23 gi|71066702|gb|AE016828.2| okay
  Q3262-21 gi|71066702|gb|AE016828.2| enhier was zonder space :)
  Q3262-24 gi|71066702|gb|AE016828.2| nothing there
  Q3262-26 gi|71066702|gb|AE016828.2| or still
  Q3262-30 gi|161761570|gb|CP000890.1|

-----

Please acknowledge our work when you find it useful!

|

  </help>
</tool>
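The filtering rule described in the help text (split each line on the splitter, take the chosen column as the key, keep at most N lines per key, and write out the original unsplit line) can be sketched in a few lines. This is a minimal Python illustration, not the Perl script `TopHit_namefilter_galaxy.pl` that the wrapper calls, whose source is not shown here. The function name `top_hits`, its parameters, the sample rows, and the choice to drop lines that lack the requested column are all assumptions made for illustration. The sketch emits kept lines in input order, which may differ from what the real script does.

```python
import re
from collections import Counter


def top_hits(lines, column=1, splitter=r"\t", hits=1):
    """Keep the first `hits` lines for each key found in `column` (1-based).

    `splitter` is treated as a regular expression, mirroring the help text's
    remark that characters with a special split meaning must be escaped.
    """
    pattern = re.compile(splitter)
    seen = Counter()  # how many lines have been kept so far for each key
    kept = []
    for line in lines:
        fields = pattern.split(line.rstrip("\n"))
        if column > len(fields):
            # Assumption: lines without the requested column are dropped.
            continue
        key = fields[column - 1]
        if seen[key] < hits:
            seen[key] += 1
            kept.append(line)  # the ORIGINAL, unsplit line is kept
    return kept


# Hypothetical sample data: the query ID is in column 1, separated by a tab.
rows = [
    "Q1\thitA|x",
    "Q2\thitB|y",
    "Q1\thitC|z",
    "Q1\thitD|w",
]

for row in top_hits(rows, column=1, splitter=r"\t", hits=2):
    print(row)
```

With `hits=2`, the third `Q1` line is the only one discarded. A combined splitter such as `-|\|` works the same way, because the pattern is an alternation of a dash and an escaped pipe.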