annotate readme.rst @ 0:5314e5d6f040 draft

Imported from capsule None
author bgruening
date Thu, 29 Jan 2015 07:53:17 -0500
parents
children 43b1f073b693
Galaxy wrappers for common unix text-processing tools
=====================================================

The initial work was done by Assaf Gordon and Greg Hannon's lab ( http://hannonlab.cshl.edu )
at Cold Spring Harbor Laboratory ( http://www.cshl.edu ). In late 2013, maintenance and
further development were taken over by Bjoern Gruening. Feel free to contribute any general-purpose
text manipulation tool to this repository.


Tools:
------

* awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
* sed - Stream Editor ( http://sed.sf.net )
* grep - Search files ( http://www.gnu.org/software/grep/ )
* sort_columns - Sort every line according to its columns
* GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):

  * sort - sort files
  * join - join two files, based on a common key field.
  * cut - keep/discard fields from a file
  * unsorted_uniq - keep unique/duplicated lines in a file
  * sorted_uniq - keep unique/duplicated lines in a file
  * head - keep the first X lines in a file.
  * tail - keep the last X lines in a file.
  * unfold_column - unfold a column with multiple entities into multiple lines
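
The idea behind unfold_column is probably the least obvious one in the list above, so here is a minimal sketch of it in plain awk. It is not the wrapper's actual implementation: the function name ``unfold_first_column`` is hypothetical, and the sketch assumes tab-separated input, the first column as the one to unfold, and a comma as the separator between entities.

```shell
# Hypothetical helper, not part of the Galaxy wrappers.
# Assumptions: tab-separated input, entities in column 1, comma-separated.
unfold_first_column() {
    awk -F '\t' -v OFS='\t' '
        {
            # Split the first column into its individual entities.
            n = split($1, parts, ",")
            # An empty first column has nothing to unfold; keep the line.
            if (n == 0) { print; next }
            # Emit one output line per entity, other columns unchanged.
            for (i = 1; i <= n; i++) {
                $1 = parts[i]
                print
            }
        }' "$@"
}
```

Under these assumptions, a line whose first column holds ``a,b`` is written out twice, once with ``a`` and once with ``b`` in the first column, and lines with a single entity pass through unchanged.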


A few improvements over the standard tools:
-------------------------------------------

* EasyJoin - A join tool that does not require pre-sorted files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
* Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
* Find_and_Replace - Find/Replace text in a line or specific column.
* Grep with Perl syntax - uses grep with Perl-Compatible regular expressions.
* HTML'd Grep - grep text in a file, and produce highlighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
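
The plain coreutils ``join`` expects both inputs to be sorted on the join field, which is the step a tool like EasyJoin takes off the user's hands. The sketch below only illustrates that idea and is not the linked easyjoin script: the function name ``join_unsorted`` is hypothetical, and it assumes whitespace-separated files joined on their first field.

```shell
# Hypothetical helper, not the actual EasyJoin script.
# Assumption: whitespace-separated files, joined on the first field.
join_unsorted() {
    left_sorted=$(mktemp)
    right_sorted=$(mktemp)
    # join needs both inputs sorted on the join field with the same
    # collation rules, so the locale is pinned for sort and join alike.
    LC_ALL=C sort -k1,1 "$1" > "$left_sorted"
    LC_ALL=C sort -k1,1 "$2" > "$right_sorted"
    LC_ALL=C join "$left_sorted" "$right_sorted"
    # Remove the temporary sorted copies.
    rm -f "$left_sorted" "$right_sorted"
}
```

It would be called as ``join_unsorted left.txt right.txt`` (file names are placeholders). Sorting into temporary files rather than using process substitution keeps the sketch usable in a plain POSIX shell.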


Requirements:
-------------

* Coreutils version 8.22 or later.
* AWK version 4.0.1 or later.
* SED version 4.2 *with* a special patch
* Grep with PCRE support

All dependencies will be installed automatically with the Galaxy `Tool Shed`_ and the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing


-------------------
NOTE About Security
-------------------

The included tools are secure (barring unintentional bugs):
The main concern might be executing system commands with awk's "system" and sed's "e" commands,
or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
These commands are DISABLED using the "--sandbox" parameter to awk and sed.

A user trying to run an awk program similar to::

    BEGIN { system("ls") }

will get an error (in Galaxy) saying::

    fatal: 'system' function not allowed in sandbox mode.

A user trying to run a sed program similar to::

    1els

will get an error (in Galaxy) saying::

    sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode
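
To try the sandbox behaviour outside Galaxy, something along these lines should work. It is a sketch, not the command line the wrappers actually build: the function names are hypothetical, and it assumes that GNU awk is installed as ``gawk`` and that the local ``sed`` build accepts ``--sandbox``.

```shell
# Hypothetical helpers; the names are illustrative and are not part of
# the Galaxy wrappers. Assumes gawk and a sed build that knows --sandbox.

# Run an awk program with system() and file redirection disabled.
sandboxed_awk() {
    program=$1
    shift
    gawk --sandbox "$program" "$@"
}

# Run a sed script with the e, r and w commands disabled.
sandboxed_sed() {
    script=$1
    shift
    sed --sandbox -e "$script" "$@"
}
```

Ordinary programs run as usual through these helpers, while ``sandboxed_awk 'BEGIN { system("ls") }'`` and ``sandboxed_sed '1els'`` are expected to stop with the errors quoted above and a non-zero exit status.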

That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it.

------------
Installation
------------

Installation should be done via the Galaxy `Tool Shed`_.
Install the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing

.. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed


----
TODO
----

* add shuf; we can remove the random feature from sort and use shuf instead
* move some advanced settings under a conditional, for example the cut tool offers to cut bytes
* the cut wrapper has some output conditional magic for interval files that needs to be checked
* comm wrapper, see the Galaxy default one
* evaluate the join wrappers against the Galaxy ones, maybe we should drop them


-------
License
-------

* Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu)
* Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com)


Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.