diff readme.rst @ 0:5314e5d6f040 draft
Imported from capsule None
author | bgruening |
---|---|
date | Thu, 29 Jan 2015 07:53:17 -0500 |
parents | |
children | 43b1f073b693 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/readme.rst Thu Jan 29 07:53:17 2015 -0500
@@ -0,0 +1,124 @@
+Galaxy wrappers for common unix text-processing tools
+=====================================================
+
+The initial work was done by Assaf Gordon and Greg Hannon's lab ( http://hannonlab.cshl.edu )
+in Cold Spring Harbor Laboratory ( http://www.cshl.edu ). In late 2013 maintenance and
+further development were taken over by Bjoern Gruening. Feel free to contribute any general purpose
+text manipulation tool to this repository.
+
+
+Tools:
+------
+
+  * awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
+  * sed - Stream Editor ( http://sed.sf.net )
+  * grep - Search files ( http://www.gnu.org/software/grep/ )
+  * sort_columns - Sort every line according to its columns
+  * GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):
+
+    * sort - sort files
+    * join - join two files, based on a common key field.
+    * cut - keep/discard fields from a file
+    * unsorted_uniq - keep unique/duplicated lines in a file
+    * sorted_uniq - keep unique/duplicated lines in a file
+    * head - keep the first X lines in a file.
+    * tail - keep the last X lines in a file.
+    * unfold_column - unfold a column with multiple entities into multiple lines
+
+
+A few improvements over the standard tools:
+-------------------------------------------
+
+  * EasyJoin - A Join tool that does not require pre-sorted files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
+  * Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
+  * Find_and_Replace - Find/Replace text in a line or specific column.
+  * Grep with Perl syntax - uses grep with Perl-Compatible regular expressions.
+  * HTML'd Grep - grep text in a file, and produce highlighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
+
+
+Requirements:
+-------------
+
+  * Coreutils version 8.22 or later.
+  * AWK version 4.0.1 or later.
+  * SED version 4.2 *with* a special patch
+  * Grep with PCRE support
+
+All dependencies will be installed automatically with the Galaxy `Tool Shed`_ and the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing
+
+
+-------------------
+NOTE About Security
+-------------------
+
+The included tools are secure (barring unintentional bugs):
+The main concern might be executing system commands with awk's "system" and sed's "e" commands,
+or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
+These commands are DISABLED using the "--sandbox" parameter to awk and sed.
+
+A user trying to run an awk program similar to::
+
+    BEGIN { system("ls") }
+
+will get an error (in Galaxy) saying::
+
+    fatal: 'system' function not allowed in sandbox mode.
+
+A user trying to run a SED program similar to::
+
+    1els
+
+will get an error (in Galaxy) saying::
+
+    sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode
+
+That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it.
+
+------------
+Installation
+------------
+
+Should be done via the Galaxy `Tool Shed`_.
+Install the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing
+
+.. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed
+
+
+----
+TODO
+----
+
+  * add shuf, we can remove the random feature from sort and use shuf instead
+  * move some advanced settings under a conditional, for example the cut tool offers to cut bytes
+  * cut wrapper has some output conditional magic for interval files, that needs to be checked
+  * comm wrapper, see the Galaxy default one
+  * evaluate the join wrappers against the Galaxy ones, maybe we should drop them
+
+
+-------
+License
+-------
+
+  * Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu)
+  * Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com)
+
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+