comparison readme.rst @ 0:5314e5d6f040 draft
Imported from capsule None
author | bgruening |
---|---|
date | Thu, 29 Jan 2015 07:53:17 -0500 |
parents | |
children | 43b1f073b693 |
-1:000000000000 | 0:5314e5d6f040 |
---|---|
1 Galaxy wrappers for common unix text-processing tools | |
2 ===================================================== | |
3 | |
4 The initial work was done by Assaf Gordon and Greg Hannon's lab ( http://hannonlab.cshl.edu ) | |
5 at Cold Spring Harbor Laboratory ( http://www.cshl.edu ). In late 2013, maintenance and | |
6 further development were taken over by Bjoern Gruening. Feel free to contribute any general-purpose | |
7 text manipulation tool to this repository. | |
8 | |
9 | |
10 Tools: | |
11 ------ | |
12 | |
13 * awk - The AWK programming language ( http://www.gnu.org/software/gawk/ ) | |
14 * sed - Stream Editor ( http://sed.sf.net ) | |
15 * grep - Search files ( http://www.gnu.org/software/grep/ ) | |
16 * sort_columns - Sort each line according to its columns | |
17 * GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ): | |
18 | |
19 * sort - sort files | |
20 * join - join two files, based on common key field. | |
21 * cut - keep/discard fields from a file | |
22 * unsorted_uniq - keep unique/duplicated lines in a file | |
23 * sorted_uniq - keep unique/duplicated lines in a file | |
24 * head - keep the first X lines in a file. | |
25 * tail - keep the last X lines in a file. | |
26 * unfold_column - unfold a column with multiple entities into multiple lines | |
27 | |
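As a rough illustration of what unfold_column does, the following Python sketch splits one column of a tab-separated line on a delimiter and emits one output line per value. The function name `unfold_column`, the tab field separator, and the comma delimiter are assumptions made for this example; the actual wrapper may behave differently.

```python
def unfold_column(lines, column, delimiter=","):
    """Yield one line per value found in the given 1-based column.

    Assumes tab-separated fields; names and defaults are illustrative only.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        for value in fields[column - 1].split(delimiter):
            # Copy the row and replace the multi-valued field with one value.
            unfolded = list(fields)
            unfolded[column - 1] = value
            yield "\t".join(unfolded)


if __name__ == "__main__":
    rows = ["gene1\tGO:1,GO:2", "gene2\tGO:3"]
    for row in unfold_column(rows, column=2):
        print(row)
```

A line whose selected column holds a single value passes through unchanged, so the output always has at least as many lines as the input.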
28 | |
29 A few improvements over the standard tools: | |
30 ------------------------------------------- | |
31 | |
32 * EasyJoin - A Join tool that does not require pre-sorting the files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin ) | |
33 * Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin ) | |
34 * Find_and_Replace - Find/Replace text in a line or specific column. | |
35 * Grep with Perl syntax - uses grep with Perl-Compatible regular expressions. | |
36 * HTML'd Grep - grep text in a file, and produce highlighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header ) | |
37 | |
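Joining two files that are not pre-sorted can be done by indexing one file in memory and streaming the other. The Python sketch below shows that general idea as a hash join. It is not the EasyJoin implementation, which may well work differently, and the function name `hash_join`, the tab separator, and the 1-based key parameters are assumptions made for this example.

```python
from collections import defaultdict


def hash_join(left_lines, right_lines, left_key=1, right_key=1, sep="\t"):
    """Inner-join two iterables of delimited lines on 1-based key columns.

    Neither input needs to be sorted: the right side is indexed in memory.
    """
    index = defaultdict(list)
    for line in right_lines:
        fields = line.rstrip("\n").split(sep)
        key = fields[right_key - 1]
        # Store the non-key fields so the key is not repeated in the output.
        index[key].append(fields[:right_key - 1] + fields[right_key:])

    for line in left_lines:
        fields = line.rstrip("\n").split(sep)
        # Emit one output line per matching right-hand row.
        for rest in index.get(fields[left_key - 1], []):
            yield sep.join(fields + rest)
```

Output lines follow the order of the left input, and the whole right input is held in memory, so this approach suits cases where at least one file is of moderate size.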
38 | |
39 Requirements: | |
40 ------------- | |
41 | |
42 * Coreutils version 8.22 or later. | |
43 * AWK version 4.0.1 or later. | |
44 * SED version 4.2 *with* a special patch | |
45 * Grep with PCRE support | |
46 | |
47 All dependencies are installed automatically when you install the following repository from the Galaxy `Tool Shed`_: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing | |
48 | |
49 | |
50 ------------------- | |
51 NOTE About Security | |
52 ------------------- | |
53 | |
54 The included tools are secure (barring unintentional bugs): | |
55 The main concern might be executing system commands with awk's "system" and sed's "e" commands, | |
56 or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands. | |
57 These commands are DISABLED using the "--sandbox" parameter to awk and sed. | |
58 | |
59 A user trying to run an awk program similar to:: | |
60 | |
61 BEGIN { system("ls") } | |
62 | |
63 will get an error (in Galaxy) saying:: | |
64 | |
65 fatal: 'system' function not allowed in sandbox mode. | |
66 | |
67 A user trying to run a sed program similar to:: | |
68 | |
69 1els | |
70 | |
71 will get an error (in Galaxy) saying:: | |
72 | |
73 sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode | |
74 | |
75 That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it. | |
76 | |
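A wrapper that relies on sandbox mode has to pass the flag on every invocation. The following Python sketch shows one way to build such command lines. The helper names `sandboxed_awk_argv`, `sandboxed_sed_argv`, and `run` are hypothetical, and the sketch assumes that the `awk` and `sed` found on the PATH are builds that accept `--sandbox`, as described above. It does not reproduce the actual Galaxy wrappers.

```python
import subprocess


def sandboxed_awk_argv(program, input_path):
    """Return an argv list that runs an awk program in sandbox mode."""
    # '--' ends option parsing so the program text is never read as a flag.
    return ["awk", "--sandbox", "--", program, input_path]


def sandboxed_sed_argv(script, input_path):
    """Return an argv list that runs a sed script in sandbox mode."""
    # '-e' marks the next argument as the script, whatever it starts with.
    return ["sed", "--sandbox", "-e", script, input_path]


def run(argv):
    """Run a command without a shell and return (exit status, stderr text)."""
    # Passing a list (no shell=True) keeps user text out of shell parsing.
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stderr
```

Calling `run(sandboxed_awk_argv('BEGIN { system("ls") }', "input.txt"))` with a sandbox-capable awk should return a non-zero status together with an error message on stderr like the one quoted above; `input.txt` is a placeholder file name.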
77 ------------ | |
78 Installation | |
79 ------------ | |
80 | |
81 Installation should be done via the Galaxy `Tool Shed`_. | |
82 Install the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing | |
83 | |
84 .. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed | |
85 | |
86 | |
87 ---- | |
88 TODO | |
89 ---- | |
90 | |
91 * add shuf; we can then remove the random feature from sort and use shuf instead | |
92 * move some advanced settings under a conditional, for example, the cut tool offers to cut bytes | |
93 * cut wrapper has some output conditional magic for interval files, that needs to be checked | |
94 * comm wrapper, see the Galaxy default one | |
95 * evaluate the join wrappers against the Galaxy ones, maybe we should drop them | |
96 | |
97 | |
98 ------- | |
99 License | |
100 ------- | |
101 | |
102 * Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu) | |
103 * Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com) | |
104 | |
105 | |
106 Permission is hereby granted, free of charge, to any person obtaining | |
107 a copy of this software and associated documentation files (the | |
108 "Software"), to deal in the Software without restriction, including | |
109 without limitation the rights to use, copy, modify, merge, publish, | |
110 distribute, sublicense, and/or sell copies of the Software, and to | |
111 permit persons to whom the Software is furnished to do so, subject to | |
112 the following conditions: | |
113 | |
114 The above copyright notice and this permission notice shall be | |
115 included in all copies or substantial portions of the Software. | |
116 | |
117 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, | |
118 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF | |
119 MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. | |
120 IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY | |
121 CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, | |
122 TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE | |
123 SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. | |
124 |