annotate readme.rst @ 21:86755160afbf draft default tip

planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/text_processing/text_processing commit c2b1677d1c94433f777c2dc28ac8eec0a99cc6a7
author bgruening
date Fri, 16 Aug 2024 10:41:54 +0000
parents 43b1f073b693
children
0
5314e5d6f040 Imported from capsule None
bgruening
parents:
diff changeset
Galaxy wrappers for common unix text-processing tools
=====================================================

1
43b1f073b693 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/text_processing/text_processing commit 369e40078146d00608d52205bb8cee66ae735b76-dirty
bgruening
parents: 0
diff changeset
The initial work was done by Assaf Gordon and Greg Hannon's lab ( http://hannonlab.cshl.edu )
at Cold Spring Harbor Laboratory ( http://www.cshl.edu ). In late 2013, maintenance and
further development were taken over by Bjoern Gruening. Feel free to contribute any general-purpose
text manipulation tool to this repository.


Tools:
------

* awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
* sed - Stream Editor ( http://sed.sf.net )
* grep - Search files ( http://www.gnu.org/software/grep/ )
* sort_columns - Sort every line according to its columns
* GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):

  * sort - sort files
  * join - join two files, based on a common key field.
  * cut - keep/discard fields from a file
  * unsorted_uniq - keep unique/duplicated lines in a file
  * sorted_uniq - keep unique/duplicated lines in a file
  * head - keep the first X lines in a file.
  * tail - keep the last X lines in a file.
  * unfold_column - unfold a column with multiple entities into multiple lines


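As a rough illustration of what "unfolding" a column means, the sketch below splits one column into its entities and repeats the rest of the line for each of them. It is not the wrapper's code: the function name ``unfold_column``, the tab field separator, and the comma entity separator are all assumptions made for this example.

```python
from typing import Iterable, Iterator


def unfold_column(lines: Iterable[str], column: int,
                  entity_sep: str = ",", field_sep: str = "\t") -> Iterator[str]:
    """Yield one output line per entity found in the chosen column.

    `column` is 1-based. `entity_sep` and `field_sep` are assumptions
    made for this sketch, not settings taken from the wrapper.
    """
    for line in lines:
        fields = line.rstrip("\n").split(field_sep)
        if column > len(fields):
            # The line is too short to contain the column: pass it through.
            yield field_sep.join(fields)
            continue
        for entity in fields[column - 1].split(entity_sep):
            unfolded = list(fields)
            unfolded[column - 1] = entity
            yield field_sep.join(unfolded)


# Hypothetical sample rows, invented for this example.
rows = ["gene1\tGO:1,GO:2\tchr1", "gene2\tGO:3\tchr2"]
for out in unfold_column(rows, column=2):
    print(out)
```

With the sample rows above, the first row is emitted twice (once per entity in column 2) and the second row once.
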

A few improvements over the standard tools:
-------------------------------------------

* EasyJoin - A Join tool that does not require pre-sorting the files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
* Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
* Find_and_Replace - Find/Replace text in a line or specific column.
* Grep with Perl syntax - uses grep with Perl-Compatible regular expressions.
* HTML'd Grep - grep text in a file, and produce highlighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )


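To show why a join does not have to depend on pre-sorted input, here is a minimal sketch that swaps in a hash join: one file is indexed by key in memory and the other is streamed against it. This is not how EasyJoin itself is implemented; the function name ``hash_join``, the tab separator, and the output layout (key first, then the remaining fields of each side) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def hash_join(left: Iterable[str], right: Iterable[str],
              left_col: int = 1, right_col: int = 1,
              sep: str = "\t") -> Iterator[str]:
    """Inner-join two sets of delimited lines on a key column (1-based)."""
    # Index the right-hand lines by key, so neither input has to be sorted.
    index: Dict[str, List[List[str]]] = defaultdict(list)
    for line in right:
        fields = line.rstrip("\n").split(sep)
        index[fields[right_col - 1]].append(fields)

    for line in left:
        fields = line.rstrip("\n").split(sep)
        key = fields[left_col - 1]
        for match in index.get(key, []):
            # Emit the key, then the remaining fields of each side.
            rest_left = fields[:left_col - 1] + fields[left_col:]
            rest_right = match[:right_col - 1] + match[right_col:]
            yield sep.join([key] + rest_left + rest_right)


# Hypothetical, deliberately unsorted sample data.
left_rows = ["b\t2", "a\t1"]
right_rows = ["a\tx", "c\tz", "b\ty"]
for joined in hash_join(left_rows, right_rows):
    print(joined)
```

The trade-off is memory: the indexed side must fit in RAM, whereas a sort-then-join approach can work on larger files.
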

Requirements:
-------------

* Coreutils version 8.22 or later.
* AWK version 4.0.1 or later.
* SED version 4.2 *with* a special patch
* Grep with PCRE support

All dependencies are installed automatically when the tools are installed from the Galaxy `Tool Shed`_ using the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing


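For a manual installation, the minimum versions listed above have to be compared numerically rather than as plain strings (otherwise "8.9" would look newer than "8.22"). The sketch below is one way to do that; the helper names and the sample banner strings are hypothetical, and the output of a real ``--version`` call may be formatted differently.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted version number from a version banner."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(banner: str, minimum: str) -> bool:
    """Compare versions component by component, as integers."""
    return parse_version(banner) >= parse_version(minimum)


# Sample banners invented for this example.
print(meets_minimum("sort (GNU coreutils) 8.32", "8.22"))
print(meets_minimum("GNU Awk 4.0.0", "4.0.1"))
```
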

-------------------
NOTE About Security
-------------------

The included tools are secure (barring unintentional bugs).
The main concern might be executing system commands with awk's "system" and sed's "e" commands,
or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
These commands are DISABLED using the "--sandbox" parameter to awk and sed.

A user trying to run an awk program similar to::

    BEGIN { system("ls") }

will get an error (in Galaxy) saying::

    fatal: 'system' function not allowed in sandbox mode.

A user trying to run a SED program similar to::

    1els

will get an error (in Galaxy) saying::

    sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode

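The sandboxing described above comes down to always passing ``--sandbox`` when the user-supplied program is run. The following sketch shows that idea for awk only; it is not the wrapper's actual command line. The function name ``sandboxed_awk_command`` and the binary name ``gawk`` are assumptions, and the command is only executed if ``gawk`` happens to be installed.

```python
import shutil
import subprocess
from typing import List


def sandboxed_awk_command(program: str, input_path: str) -> List[str]:
    """Build an argument list that runs an awk program with --sandbox."""
    # "gawk" is assumed to be the awk binary; the wrapper may call it differently.
    return ["gawk", "--sandbox", program, input_path]


cmd = sandboxed_awk_command('BEGIN { system("ls") }', "/dev/null")
print(cmd)

# Only try to run it when gawk is actually installed on this machine.
if shutil.which("gawk"):
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.returncode, result.stderr.strip())
```

Passing the arguments as a list (rather than a single shell string) also keeps the user's program text away from shell interpretation.
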

That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it.

------------
Installation
------------

Installation should be done via the Galaxy `Tool Shed`_.
Install the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing

.. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed


----
TODO
----

* add shuf; we can then remove the random feature from sort and use shuf instead
* move some advanced settings under a conditional; for example, the cut tool offers to cut bytes
* the cut wrapper has some output conditional magic for interval files that needs to be checked
* add a comm wrapper; see the Galaxy default one
* evaluate the join wrappers against the Galaxy ones; maybe we should drop them


-------
License
-------

* Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu)
* Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com)


Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.