annotate readme.rst @ 21:86755160afbf draft default tip
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/text_processing/text_processing commit c2b1677d1c94433f777c2dc28ac8eec0a99cc6a7
author | bgruening |
---|---|
date | Fri, 16 Aug 2024 10:41:54 +0000 |
parents | 43b1f073b693 |
children |
Galaxy wrappers for common unix text-processing tools
=====================================================

The initial work was done by Assaf Gordon and Greg Hannon's lab ( http://hannonlab.cshl.edu )
at Cold Spring Harbor Laboratory ( http://www.cshl.edu ). In late 2013 maintenance and
further development were taken over by Bjoern Gruening. Feel free to contribute any general-purpose
text manipulation tool to this repository.


Tools:
------

* awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
* sed - Stream Editor ( http://sed.sf.net )
* grep - Search files ( http://www.gnu.org/software/grep/ )
* sort_columns - Sort every line according to its columns
* GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):

  * sort - sort files
  * join - join two files, based on a common key field.
  * cut - keep/discard fields from a file
  * unsorted_uniq - keep unique/duplicated lines in a file
  * sorted_uniq - keep unique/duplicated lines in a file
  * head - keep the first X lines in a file.
  * tail - keep the last X lines in a file.
  * unfold_column - unfold a column with multiple entities into multiple lines
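
As an illustration of the least self-explanatory entry in the list, the following Python sketch mimics what unfold_column is described as doing: it splits one column that holds several values and emits one line per value. The function name, the tab field separator and the comma value separator are assumptions made for this example; they are not taken from the wrapper's implementation or its defaults.

```python
def unfold_column(lines, column, value_sep=",", field_sep="\t"):
    """Yield one output line per value found in the given 1-based column."""
    for line in lines:
        fields = line.rstrip("\n").split(field_sep)
        # Split the chosen column into its individual entities.
        for value in fields[column - 1].split(value_sep):
            unfolded = list(fields)
            unfolded[column - 1] = value
            yield field_sep.join(unfolded)


# Hypothetical input: the second column of the first row holds three values.
rows = ["a\tx,y,z\t1", "b\tw\t2"]
for out in unfold_column(rows, column=2):
    print(out)
```

The first input row becomes three output rows that differ only in the second column, while the second row passes through unchanged.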


A few improvements over the standard tools:
-------------------------------------------

* EasyJoin - A Join tool that does not require pre-sorted files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
* Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
* Find_and_Replace - Find/Replace text in a line or specific column.
* Grep with Perl syntax - uses grep with Perl-Compatible regular expressions.
* HTML'd Grep - grep text in a file and produce highlighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
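
The idea of joining files that are not pre-sorted can be sketched in Python with a hash join. This only illustrates the effect; it is not how the easyjoin script is implemented, and the function name, the tab delimiter and the 1-based key columns are assumptions made for this example.

```python
from collections import defaultdict


def hash_join(left, right, left_key=1, right_key=1, sep="\t"):
    """Inner-join two iterables of delimited lines on 1-based key columns."""
    index = defaultdict(list)
    for line in right:
        fields = line.rstrip("\n").split(sep)
        key = fields[right_key - 1]
        # Keep the non-key fields of every right-hand row under its key.
        index[key].append(fields[:right_key - 1] + fields[right_key:])

    joined = []
    for line in left:
        fields = line.rstrip("\n").split(sep)
        key = fields[left_key - 1]
        rest = fields[:left_key - 1] + fields[left_key:]
        for other in index.get(key, []):
            # Output layout follows join(1): key first, then remaining fields.
            joined.append(sep.join([key] + rest + other))
    return joined


left = ["b\t2", "a\t1", "c\t3"]   # deliberately unsorted
right = ["c\tz", "a\tx"]
print(hash_join(left, right))
```

Because the right-hand file is indexed in memory, neither input needs to be sorted; the output follows the order of the left-hand input.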


Requirements:
-------------

* Coreutils version 8.22 or later.
* AWK version 4.0.1 or later.
* SED version 4.2 *with* a special patch
* Grep with PCRE support

All dependencies are installed automatically when the tools are installed from the Galaxy `Tool Shed`_ via the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing


-------------------
NOTE About Security
-------------------

The included tools are secure (barring unintentional bugs).
The main concern might be executing system commands with awk's "system" and sed's "e" commands,
or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
These commands are DISABLED using the "--sandbox" parameter to awk and sed.

A user trying to run an awk program similar to::

    BEGIN { system("ls") }

will get an error (in Galaxy) saying::

    fatal: 'system' function not allowed in sandbox mode.

A user trying to run a SED program similar to::

    1els

will get an error (in Galaxy) saying::

    sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode

That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it.
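
One way to confirm the sandbox behaviour outside Galaxy is to run the awk example from this section through GNU awk directly. The Python sketch below assumes a ``gawk`` binary with ``--sandbox`` support is on the PATH; the helper name is hypothetical, and the exact wording of the error message may differ between gawk versions, so only the word "sandbox" is checked.

```python
import shutil
import subprocess


def awk_sandbox_blocks_system(gawk="gawk"):
    """Return True if gawk --sandbox rejects system(), None if gawk is missing."""
    if shutil.which(gawk) is None:
        return None
    result = subprocess.run(
        [gawk, "--sandbox", 'BEGIN { system("ls") }'],
        capture_output=True,
        text=True,
    )
    # In sandbox mode gawk is expected to abort with a fatal error
    # instead of running "ls".
    return result.returncode != 0 and "sandbox" in result.stderr


print(awk_sandbox_blocks_system())
```

A result of ``None`` only means that no gawk binary was found, not that the sandbox failed.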

------------
Installation
------------

Installation should be done via the Galaxy `Tool Shed`_.
Install the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing

.. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed


----
TODO
----

* add shuf; we can remove the random feature from sort and use shuf instead
* move some advanced settings under a conditional, for example the cut tool offers to cut bytes
* the cut wrapper has some output conditional magic for interval files that needs to be checked
* comm wrapper, see the Galaxy default one
* evaluate the join wrappers against the Galaxy ones, maybe we should drop them


-------
License
-------

* Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu)
* Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com)


Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.