
Repository text_processing
Owner: bgruening
Synopsis: High performance text processing tools using the GNU coreutils, sed, awk and friends.
This repository contains a variety of text processing tools.

  * awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
  * sed - Stream Editor ( https://www.gnu.org/software/sed/ )
  * grep - Search files ( http://www.gnu.org/software/grep/ )
  * sort_columns - Sorting every line according to their columns
  * GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):
  * sort - sort files
  * join - join two files, based on common key field.
  * cut - keep/discard fields from a file
  * unsorted_uniq - keep unique/duplicated lines in a file
  * sorted_uniq - keep unique/duplicated lines in a file
  * head - keep the first X lines in a file
  * tail - keep the last X lines in a file

  Originally known as "Unix Tools" and developed by Assaf Gordon @ Greg Hannon's lab ( http://hannonlab.cshl.edu )
  at Cold Spring Harbor Laboratory, it is now hosted at https://github.com/bgruening/galaxytools/tree/master/tools/text_processing
  and open for contributions. It will also replace several smaller sed, sort and uniq wrappers developed over time.
  Repository-Maintainer: Björn Grüning
Content homepage: https://www.gnu.org/software/
Type: unrestricted
Revision: 20:fbf99087e067
This revision can be installed: True
Times cloned / installed: 54357

Repository README files - may contain important installation or license information

Galaxy wrappers for common unix text-processing tools

The initial work was done by Assaf Gordon and Greg Hannon's lab ( http://hannonlab.cshl.edu ) at Cold Spring Harbor Laboratory ( http://www.cshl.edu ). In late 2013, maintenance and further development were taken over by Bjoern Gruening. Feel free to contribute any general-purpose text manipulation tool to this repository.

Tools:

  • awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
  • sed - Stream Editor ( http://sed.sf.net )
  • grep - Search files ( http://www.gnu.org/software/grep/ )
  • sort_columns - Sorting every line according to their columns
  • GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):
  • sort - sort files
  • join - join two files, based on common key field.
  • cut - keep/discard fields from a file
  • unsorted_uniq - keep unique/duplicated lines in a file
  • sorted_uniq - keep unique/duplicated lines in a file
  • head - keep the first X lines in a file.
  • tail - keep the last X lines in a file.
  • unfold_column - unfold a column with multiple entities into multiple lines
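
The two uniq entries carry the same description, and unfold_column is easiest to understand from an example. The sketch below shows one way these behaviours could look on the command line. The function names are hypothetical stand-ins, and the assumption that sorted_uniq relies on adjacent duplicates while unsorted_uniq does not is inferred from the tool names, not taken from the wrapper code.

```shell
#!/bin/sh
# Illustrative sketch only: these functions are hypothetical and are
# NOT the code used by the Galaxy wrappers.

# Deduplicate input that is already sorted.
# uniq only collapses duplicates that are adjacent.
uniq_sorted() {
    uniq
}

# Deduplicate unsorted input, keeping the first occurrence of each line.
uniq_unsorted() {
    awk '!seen[$0]++'
}

# Unfold column $1 (1-based) of tab-separated input, splitting on separator $2.
# A single-character separator is taken literally; longer ones act as a regex.
unfold_column() {
    awk -F '\t' -v OFS='\t' -v col="$1" -v sep="$2" '{
        n = split($col, parts, sep)
        if (n == 0) { print; next }   # empty cell: keep the row unchanged
        for (i = 1; i <= n; i++) {
            $col = parts[i]           # one output row per entity
            print
        }
    }'
}

printf 'b\na\nb\n' | uniq_sorted        # the two "b" lines are not adjacent, so both remain
printf 'b\na\nb\n' | uniq_unsorted      # the second "b" is dropped
printf 'g1\tx,y\n' | unfold_column 2 ','
```

The last command turns the single row `g1<TAB>x,y` into two rows, one per entity in column 2.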

Few improvements over the standard tools:

Requirements:

  • Coreutils version 8.22 or later.
  • AWK version 4.0.1 or later.
  • SED version 4.2 with a special patch
  • Grep with PCRE support

All dependencies will be installed automatically with the Galaxy Tool Shed and the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing
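
For a manual setup outside the Tool Shed, the minimum versions above could be checked roughly as follows. This is a minimal sketch: `version_ge` and `grep_has_pcre` are hypothetical helper names, the version string passed in the example is made up, and `sort -V` is a GNU coreutils extension that may be missing on other systems.

```shell
#!/bin/sh
# Hypothetical helpers for checking the requirements by hand.

# version_ge A B: succeed if version A is greater than or equal to version B.
# Uses version sort: the smaller of the two versions is listed first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# grep_has_pcre: succeed if the local grep accepts -P (Perl-compatible regex).
grep_has_pcre() {
    printf 'a1\n' | grep -qP '\d' 2>/dev/null
}

# Example with a made-up installed version, compared against the 8.22 minimum.
if version_ge "8.32" "8.22"; then
    echo "coreutils version is new enough"
fi

if grep_has_pcre; then
    echo "grep supports PCRE"
else
    echo "grep lacks PCRE support"
fi
```

Note that a plain string comparison would get cases such as 8.4 versus 8.22 wrong, which is why the sketch uses version sort.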

NOTE About Security

The included tools are secure (barring unintentional bugs): The main concern might be executing system commands with awk's "system" and sed's "e" commands, or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands. These commands are DISABLED using the "--sandbox" parameter to awk and sed.

A user trying to run an awk program similar to:

BEGIN { system("ls") }

will get an error (in Galaxy) saying:

fatal: 'system' function not allowed in sandbox mode.

A user trying to run a sed program similar to:

1els

will get an error (in Galaxy) saying:

sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode

That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it.
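
The awk behaviour described above can be checked from a shell. The following is a sketch under assumptions: it expects GNU awk to be installed as `gawk`, the function name `sandbox_blocks_system` is hypothetical, and the exact command line used by the Galaxy wrappers may differ.

```shell
#!/bin/sh
# Hypothetical check that gawk refuses system() in sandbox mode.
# Return codes: 0 = call was blocked, 1 = call was executed, 2 = gawk not found.
sandbox_blocks_system() {
    if ! command -v gawk >/dev/null 2>&1; then
        echo "gawk not found" >&2
        return 2
    fi
    # With --sandbox, gawk should abort with a fatal error (non-zero exit)
    # instead of running the harmless "true" command.
    if gawk --sandbox 'BEGIN { system("true") }' >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

if sandbox_blocks_system; then
    echo "system() is blocked in sandbox mode"
else
    echo "sandbox check failed or gawk is missing"
fi
```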

Installation

Should be done via the Galaxy Tool Shed. Install the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing

TODO

  • add shuf, we can remove the random feature from sort and use shuf instead
  • move some advanced settings under a conditional, for example the cut tool offers to cut bytes
  • cut wrapper has some output conditional magic for interval files, that needs to be checked
  • comm wrapper, see the Galaxy default one
  • evaluate the join wrappers against the Galaxy ones, maybe we should drop them
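
As background for the shuf item: GNU `sort -R` orders lines by a random hash of the sort key, so identical lines always end up next to each other, whereas `shuf` produces a random permutation of all lines. A minimal sketch of the difference, assuming GNU coreutils is available (`shuffle_lines` is a hypothetical name, not an existing wrapper):

```shell
#!/bin/sh
# Hypothetical helper: random permutation of the input lines via shuf.
shuffle_lines() {
    shuf
}

# sort -R keeps equal lines adjacent (for example a, a, b, b or b, b, a, a).
printf 'a\nb\na\nb\n' | sort -R

# shuf may interleave equal lines in any order.
printf 'a\nb\na\nb\n' | shuffle_lines
```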

License

  • Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu)
  • Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Contents of this repository

Name  Description                            Version      Minimum Galaxy Version
      assuming sorted input file             9.3+galaxy1  23.1
      reverse a file (reverse cat)           9.3+galaxy1  23.1
      in a specific column                   9.3+galaxy1  23.1
      lines from a dataset (tail)            9.3+galaxy1  23.1
      (combine multiple files)               9.3+galaxy1  23.1
                                             9.3+galaxy1  23.1
      according to their columns             9.3+galaxy1  23.1
      parts of text                          9.3+galaxy1  23.1
      columns from a table (cut)             9.3+galaxy1  23.1
      two files                              9.3+galaxy1  23.1
      data in ascending or descending order  9.3+galaxy1  23.1
      occurrences of each record             9.3+galaxy1  23.1
      in entire line                         9.3+galaxy1  23.1
      with recurring lines                   9.3+galaxy1  23.1
      with sed                               9.3+galaxy1  23.1
      with awk                               9.3+galaxy1  23.1
      columns from a table                   9.3+galaxy1  23.1
      lines from a dataset (head)            9.3+galaxy1  23.1
      (grep)                                 9.3+galaxy1  23.1
      tail-to-head (cat)                     9.3+galaxy1  23.1

Categories
Text Manipulation - Tools for manipulating data