view TopHit_namefilter/TopHit_namefilter.xml @ 0:9f1fe290345e default tip

Migrated tool version 0.1.Alx from old tool shed archive to new tool shed repository
author abossers
date Tue, 07 Jun 2011 18:07:34 -0400

<tool id="TopHit_namefilter" name="TopHit filter" version="0.1.Alx">
  <description>Keep only the first N lines for each identifier in a file</description>
  <command interpreter="perl">
    TopHit_namefilter_galaxy.pl
      $input
      $column
      "$splitter"
      $hits
      $output_file
      <!-- 2&gt;$logfile -->
  </command>
  <inputs>
   <param name="input" type="data" format="tabular,txt" label="Input tabular or plain text file" />
   <param name="column" type="integer" size="4" value="1" label="Column number (after splitting) that holds the identifier" />
   <param name="splitter" type="text" size="10" value="\t" label="Splitter character/code to use" help="See help below for advanced options and how to use {pipe}" >
		<sanitizer>
			<valid>
				<add value="\"/>
				<add value=">"/>
				<add value="%"/>
				<add value="|"/>
			</valid>
		</sanitizer>
   </param>
   <param name="hits" type="integer" size="4" value="1" label="Number of occurrences to keep" help="Kept lines are not sorted!" />
  </inputs>
  <outputs>
    <data name="output_file" format="input" label="Filtered table/text" />
  </outputs>
  <tests>
  </tests>
  <help>
**What it does**

TopHit_namefilter is a SIMPLE filter that keeps only the first N occurrences (the "top hits") of each identifier in a file.
It is useful for keeping only the first N hits per query in BLAST output when multiple hits were returned (and you don't want to rerun the BLAST analysis).

Please be aware that NO additional filtering or checking is done, for instance on the E-values of BLAST hits.
Top hit = FIRST hit, not necessarily the best one. If more than one hit per identifier is kept,
the hits will NOT be sorted: a second hit that occurs further down in the input
also appears further down in the output file (see the example below).
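
As an illustration only, the filtering described above can be sketched in Python. This is not the Perl script shipped with the tool: the function name ``top_hits`` and its arguments are hypothetical, and Python's ``re.split`` stands in for Perl's ``split``, assuming the splitter is applied as a regular expression.

```python
import re
from collections import defaultdict


def top_hits(lines, column=1, splitter=r"\t", hits=1):
    """Keep the first `hits` lines seen for each key, in input order.

    Hypothetical helper; assumes every line has at least `column` fields.
    """
    seen = defaultdict(int)  # key -> number of lines kept so far
    kept = []
    for line in lines:
        fields = re.split(splitter, line.rstrip("\n"))
        key = fields[column - 1]  # column is 1-based, as in the tool form
        if seen[key] == hits:
            continue  # quota for this key is already filled
        seen[key] += 1
        kept.append(line)  # the original, unsplit line is kept
    return kept


rows = ["a\t1", "b\t2", "a\t3", "a\t4", "b\t5"]
# Keeps a-1, b-2, a-3 and b-5; a-4 is the third "a" line and is dropped.
print(top_hits(rows, column=1, splitter=r"\t", hits=2))
```

No E-value or score is consulted anywhere in this sketch: which lines survive depends only on their position in the input.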

**Comments/feedback** on the Perl script or GALAXY wrapper: alex.bossers@wur.nl

-----

**Note!** Take care with splitters! The splitter is used as a Perl split pattern, so characters that have a special
meaning in such a pattern (for example the pipe) need to be escaped by a leading \\.

Examples of splitters. The split is only used to find the identifier column for filtering; the output always contains the ORIGINAL, unsplit lines:

::

  Splitter   Meaning                           Example line to split          Split result for filtering only!
  --------   -------------------------------   -----------------------        --------------------------------
    \t       Single tab                        Foo&lt;tab&gt;Bar&lt;tab&gt;here    ---&gt;   Foo          Bar        here
    \|       Single pipe                       Foo&lt;tab&gt;Bar|here        ---&gt;   Foo&lt;tab&gt;Bar  here
    -        Single dash                       Foo-Bar                 ---&gt;   Foo          Bar
    -|\|     Combined splits on dash OR pipe   Foo-Bar|here            ---&gt;   Foo          Bar        here
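
The splitter examples in the table above can be checked with a short Python sketch. Python's ``re.split`` is used here as a stand-in for Perl's ``split``, which is an assumption: the two agree for these patterns but differ in some edge cases (for example, Perl drops trailing empty fields). The helper name ``split_fields`` is hypothetical.

```python
import re

# (splitter, example line) pairs taken from the table above
examples = [
    (r"\t", "Foo\tBar\there"),
    (r"\|", "Foo\tBar|here"),
    (r"-", "Foo-Bar"),
    (r"-|\|", "Foo-Bar|here"),
]


def split_fields(splitter, line):
    """Split a line on the splitter, treated as a regular expression."""
    return re.split(splitter, line)


for splitter, line in examples:
    print(repr(splitter), split_fields(splitter, line))
```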


-----

**EXAMPLE**

Parameters: column = 1, **hits = 2** and splitter = \\t

**Input**

Any text/tabular file:

::

   Q3262-21	gi|71066702|gb|AE016828.2|	tja..here something extra
   Q3262-23	gi|71066702|gb|AE016828.2|	okay
   Q3262-24	gi|71066702|gb|AE016828.2| nothing there
   Q3262-21	gi|71066702|gb|AE016828.2| enhier	was zonder space :)
   Q3262-26	gi|71066702|gb|AE016828.2|	or still
   Q3262-21	gi|71066702|gb|AE016828.2|
   Q3262-21	gi|71066702|gb|AE016828.2|
   Q3262-21	gi|71066702|gb|AE016828.2|
   Q3262-21	gi|71066702|gb|AE016828.2|
   Q3262-21	gi|145004|gb|M80806.1|COXTRANSPO
   Q3262-21	gi|144996|gb|M20482.1|COXHSPAB
   Q3262-21	gi|161761570|gb|CP000890.1|
   Q3262-30	gi|161761570|gb|CP000890.1|
   Q3262-21	gi|161761570|gb|CP000890.1|
   Q3262-21	gi|161761570|gb|CP000890.1|
   Q3262-21	gi|161761570|gb|CP000890.1|


**Output**

::

   Q3262-21	gi|71066702|gb|AE016828.2|	tja..here something extra
   Q3262-23	gi|71066702|gb|AE016828.2|	okay
   Q3262-24	gi|71066702|gb|AE016828.2| nothing there
   Q3262-21	gi|71066702|gb|AE016828.2| enhier	was zonder space :)
   Q3262-26	gi|71066702|gb|AE016828.2|	or still
   Q3262-30	gi|161761570|gb|CP000890.1|

-----

Please acknowledge our work if you find it useful!

|


  </help>
</tool>