comparison readme.rst @ 0:5314e5d6f040 draft

Imported from capsule None
author bgruening
date Thu, 29 Jan 2015 07:53:17 -0500
parents
children 43b1f073b693
1 Galaxy wrappers for common unix text-processing tools
2 =====================================================
3
4 The initial work was done by Assaf Gordon and Greg Hannon's lab ( http://hannonlab.cshl.edu )
5 in Cold Spring Harbor Laboratory ( http://www.cshl.edu ). In late 2013 maintenance and
6 further development were taken over by Bjoern Gruening. Feel free to contribute any general-purpose
7 text manipulation tool to this repository.
8
9
10 Tools:
11 ------
12
13 * awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
14 * sed - Stream Editor ( http://sed.sf.net )
15 * grep - Search files ( http://www.gnu.org/software/grep/ )
16 * sort_columns - Sorting every line according to its columns
17 * GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):
18
19 * sort - sort files
20 * join - join two files, based on common key field.
21 * cut - keep/discard fields from a file
22 * unsorted_uniq - keep unique/duplicated lines in a file
23 * sorted_uniq - keep unique/duplicated lines in a file
24 * head - keep the first X lines in a file.
25 * tail - keep the last X lines in a file.
26 * unfold_column - unfold a column with multiple entities into multiple lines
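
To make the ``unfold_column`` description concrete, here is a minimal sketch of the idea in plain awk. It is not the wrapper's actual implementation: the function name, its arguments, and the sample data are assumptions made for illustration, and it assumes tab-separated input with a single-character separator inside the column.

```shell
# Hypothetical helper (not part of this repository): unfold column $1 of
# tab-separated input read from stdin, splitting its value on separator $2.
unfold_column() {
    awk -F '\t' -v OFS='\t' -v col="$1" -v sep="$2" '
    {
        # Split the chosen field into its individual entities.
        n = split($col, parts, sep)
        # An empty field has nothing to unfold: print the line unchanged.
        if (n == 0) { print; next }
        # Emit one output line per entity, keeping all other columns.
        for (i = 1; i <= n; i++) {
            $col = parts[i]
            print
        }
    }'
}

# Example: the second column holds two comma-separated entities.
printf 'gene1\tGO:1,GO:2\tx\n' | unfold_column 2 ','
```

With the sample line above, each entity of the second column ends up on its own line while the first and third columns are repeated.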
27
28
29 A few improvements over the standard tools:
30 -------------------------------------------
31
32 * EasyJoin - A Join tool that does not require pre-sorting the files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
33 * Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
34 * Find_and_Replace - Find/Replace text in a line or specific column.
35 * Grep with Perl syntax - uses grep with Perl-Compatible regular expressions.
36 * HTML'd Grep - grep text in a file and produce highlighted HTML output for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
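
Plain ``join`` expects both inputs to be sorted on the join field, which is the inconvenience EasyJoin removes. The sketch below shows one simple way to get that effect with standard tools. It is not the EasyJoin script itself: the function name is hypothetical, and it assumes tab-separated files without header lines, joined on the first column.

```shell
# Hypothetical helper (not the EasyJoin script): join two tab-separated
# files on their first column without asking the caller to sort them.
join_unsorted() {
    tab=$(printf '\t')
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    # sort and join must agree on collation, so use the C locale for both.
    LC_ALL=C sort -t "$tab" -k1,1 "$1" > "$tmp_a"
    LC_ALL=C sort -t "$tab" -k1,1 "$2" > "$tmp_b"
    LC_ALL=C join -t "$tab" "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return $status
}

# Usage (file names are placeholders):
#   join_unsorted left.tsv right.tsv > joined.tsv
```

Note that the output of this sketch is ordered by the join key, not by the original line order of either input.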
37
38
39 Requirements:
40 -------------
41
42 * Coreutils version 8.22 or later.
43 * AWK version 4.0.1 or later.
44 * SED version 4.2 *with* a special patch
45 * Grep with PCRE support
46
47 All dependencies will be installed automatically with the Galaxy `Tool Shed`_ and the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing
48
49
50 -------------------
51 NOTE About Security
52 -------------------
53
54 The included tools are secure (barring unintentional bugs):
55 The main concern might be executing system commands with awk's "system" and sed's "e" commands,
56 or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
57 These commands are DISABLED using the "--sandbox" parameter to awk and sed.
58
59 User trying to run an awk program similar to::
60
61 BEGIN { system("ls") }
62
63 Will get an error (in Galaxy) saying::
64
65 fatal: 'system' function not allowed in sandbox mode.
66
67 User trying to run a SED program similar to::
68
69 1els
70
71 will get an error (in Galaxy) saying::
72
73 sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode
74
75 That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it.
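
The awk behaviour described above can also be checked outside Galaxy. The following is a small sketch, assuming GNU awk is installed under the name ``gawk``. The variable name ``sandbox_result`` is made up for this example, and the exact wording of the error message may differ between gawk versions.

```shell
# Check that gawk refuses system() when run with --sandbox.
# Assumption: GNU awk is available as "gawk"; other awk implementations
# do not necessarily provide a --sandbox option.
if command -v gawk >/dev/null 2>&1; then
    if gawk --sandbox 'BEGIN { system("ls") }' >/dev/null 2>&1; then
        # gawk exited successfully, so the command was not rejected.
        sandbox_result="allowed"
    else
        # gawk exited with an error, as expected in sandbox mode.
        sandbox_result="blocked"
    fi
else
    sandbox_result="gawk-missing"
fi
echo "awk system() under --sandbox: $sandbox_result"
```

A result of ``allowed`` would mean the installed awk does not enforce the restriction and should not be used for these wrappers.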
76
77 ------------
78 Installation
79 ------------
80
81 Installation should be done via the Galaxy `Tool Shed`_.
82 Install the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing
83
84 .. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed
85
86
87 ----
88 TODO
89 ----
90
91 * add shuf; we can then remove the random feature from sort and use shuf instead
92 * move some advanced settings under a conditional, for example the cut tool offers to cut bytes
93 * cut wrapper has some output conditional magic for interval files, that needs to be checked
94 * comm wrapper, see the Galaxy default one
95 * evaluate the join wrappers against the Galaxy ones, maybe we should drop them
96
97
98 -------
99 License
100 -------
101
102 * Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu)
103 * Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com)
104
105
106 Permission is hereby granted, free of charge, to any person obtaining
107 a copy of this software and associated documentation files (the
108 "Software"), to deal in the Software without restriction, including
109 without limitation the rights to use, copy, modify, merge, publish,
110 distribute, sublicense, and/or sell copies of the Software, and to
111 permit persons to whom the Software is furnished to do so, subject to
112 the following conditions:
113
114 The above copyright notice and this permission notice shall be
115 included in all copies or substantial portions of the Software.
116
117 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
118 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
119 MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
120 IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
121 CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
122 TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
123 SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
124