annotate tools/sample_seqs/README.rst @ 6:31f5701cd2e9 draft

v0.2.4 Depends on Biopython 1.67 via legacy Tool Shed package or bioconda.
author peterjc
date Thu, 11 May 2017 07:24:38 -0400
parents 6b71ad5d43fb
children 86710edcec02
Galaxy tool to sub-sample sequence files
========================================

This tool is copyright 2014-2017 by Peter Cock, The James Hutton Institute
(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
See the licence text below (MIT licence).

This tool is a short Python script (using Biopython library functions)
to sub-sample sequence files (in a range of formats including FASTA, FASTQ,
and SFF). This can be useful for preparing a small sample of data to test
or time a new pipeline, or for reducing the read coverage in a de novo
assembly.
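As a rough illustration of the sub-sampling idea (not the tool's actual code,
which relies on Biopython's parsers), the sketch below keeps every N-th record
of a FASTA file using only the Python standard library. The helper names
``read_fasta`` and ``take_every_nth`` are hypothetical and exist only for this
example.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def take_every_nth(records, n):
    """Keep the 1st, (n+1)-th, (2n+1)-th, ... record from any iterator."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return islice(records, 0, None, n)


example = """>seq1
ACGT
>seq2
GGCC
TTAA
>seq3
AAAA
>seq4
CCCC
>seq5
TTTT
""".splitlines()

sampled = list(take_every_nth(read_fasta(example), 2))
print([name for name, _ in sampled])
```

Because both helpers work on iterators, only one record is held in memory at a
time, which matters for large read files.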

This tool is available from the Galaxy Tool Shed at:

* http://toolshed.g2.bx.psu.edu/view/peterjc/sample_seqs


Automated Installation
======================

This should be straightforward using the Galaxy Tool Shed, which should be
able to automatically install the dependency on Biopython, and then install
this tool and run its unit tests.


Manual Installation
===================

There are just two files to install to use this tool from within Galaxy:

* ``sample_seqs.py`` (the Python script)
* ``sample_seqs.xml`` (the Galaxy tool definition)

The suggested location is in a dedicated ``tools/sample_seqs`` folder.

You will also need to modify the ``tool_conf.xml`` file to tell Galaxy to offer the
tool. One suggested location is in the filters section. Simply add the line::

    <tool file="sample_seqs/sample_seqs.xml" />
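If you would rather script this edit than make it by hand, the following is a
minimal sketch using Python's standard ``xml.etree.ElementTree`` module. The
trimmed-down ``TOOL_CONF`` text, the section id ``filter``, the placeholder
tool entry and the helper name ``add_tool`` are all assumptions made for
illustration; check the section ids in your own Galaxy tool configuration
file before adapting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a Galaxy tool configuration file.
TOOL_CONF = """<toolbox>
  <section id="filter" name="Filter and Sort">
    <tool file="filters/placeholder.xml" />
  </section>
</toolbox>"""


def add_tool(conf_xml, section_id, tool_file):
    """Return conf_xml with a <tool file=.../> entry added to one section.

    The entry is only added if it is not already present, so the function
    can safely be run more than once.
    """
    root = ET.fromstring(conf_xml)
    section = root.find("./section[@id='%s']" % section_id)
    if section is None:
        raise ValueError("No section with id %r" % section_id)
    already_listed = any(
        tool.get("file") == tool_file for tool in section.findall("tool")
    )
    if not already_listed:
        ET.SubElement(section, "tool", {"file": tool_file})
    return ET.tostring(root, encoding="unicode")


updated = add_tool(TOOL_CONF, "filter", "sample_seqs/sample_seqs.xml")
print(updated)
```

Galaxy normally needs to be restarted (or its toolbox reloaded) before a newly
listed tool appears in the tool panel.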

You will also need to install Biopython 1.62 or later.

If you wish to run the unit tests, also move/copy the ``test-data/`` files
under Galaxy's ``test-data/`` folder. Then::

    ./run_tests.sh -id sample_seqs

That's it.


History
=======

======= ======================================================================
Version Changes
------- ----------------------------------------------------------------------
v0.0.1  - Initial version.
v0.1.1  - Using ``optparse`` to provide a proper Python command line API.
v0.1.2  - Interleaved mode for working with paired records.
        - Tool definition now embeds citation information.
v0.2.0  - Option to give number of sequences (or pairs) desired.
          This works by first counting all your sequences, then calculating
          the percentage required in order to sample them uniformly (evenly).
          This makes two passes through the input and is therefore slower.
v0.2.1  - Was missing a file for the functional tests.
        - Included testing of stdout messages.
        - Includes testing of failure modes.
v0.2.2  - Reorder XML elements (internal change only).
        - Use ``format_source=...`` tag.
        - Planemo for Tool Shed upload (``.shed.yml``, internal change only).
v0.2.3  - Do the Biopython imports at the script start (internal change only).
        - Clarify paired read example in help text.
v0.2.4  - Depends on Biopython 1.67 via legacy Tool Shed package or bioconda.
        - Style changes to Python code (internal change only).
======= ======================================================================
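The two-pass "desired count" mode noted under v0.2.0, and the interleaved
pairs mode from v0.1.2, can be sketched roughly as follows. This is an
illustrative approximation using only the standard library, not the script's
real implementation: it picks evenly spaced positions directly rather than
working via a percentage, and the names ``pick_evenly``, ``sample_count`` and
``in_pairs`` are hypothetical.

```python
def pick_evenly(total, wanted):
    """Return `wanted` distinct, evenly spread indices from range(total)."""
    if not 0 < wanted <= total:
        raise ValueError("wanted must be between 1 and the record count")
    return [i * total // wanted for i in range(wanted)]


def sample_count(make_iterator, wanted):
    """Two-pass sampling: count the records, then keep evenly spaced ones.

    make_iterator is a zero-argument callable returning a fresh iterator,
    since the input has to be read twice.
    """
    total = sum(1 for _ in make_iterator())  # pass 1: count everything
    keep = set(pick_evenly(total, wanted))
    # pass 2: emit only the records at the chosen positions
    return [rec for i, rec in enumerate(make_iterator()) if i in keep]


def in_pairs(records):
    """Group an interleaved stream (fwd, rev, fwd, rev, ...) into pairs."""
    it = iter(records)
    return zip(it, it)


reads = ["r%d" % i for i in range(10)]
print(sample_count(lambda: iter(reads), 4))

interleaved = ["a/1", "a/2", "b/1", "b/2", "c/1", "c/2", "d/1", "d/2"]
print(sample_count(lambda: in_pairs(interleaved), 2))
```

Sampling whole pairs rather than single records keeps each forward read
together with its partner, which is the point of an interleaved mode.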


Developers
==========

This script and related tools are being developed on this GitHub repository:
https://github.com/peterjc/pico_galaxy/tree/master/tools/sample_seqs

For pushing a release to the test or main "Galaxy Tool Shed", use the following
Planemo commands (which require that you have set your Tool Shed access details
in ``~/.planemo.yml`` and that you have access rights on the Tool Shed)::

    $ planemo shed_update -t testtoolshed --check_diff tools/sample_seqs/
    ...

or::

    $ planemo shed_update -t toolshed --check_diff tools/sample_seqs/
    ...

To just build and check the tar ball, use::

    $ planemo shed_upload --tar_only tools/sample_seqs/
    ...
    $ tar -tzf shed_upload.tar.gz
    test-data/MID4_GLZRM4E04_rnd30_frclip.pair_sample_N5.sff
    test-data/MID4_GLZRM4E04_rnd30_frclip.sff
    test-data/MID4_GLZRM4E04_rnd30_frclip.sample_C1.sff
    test-data/MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff
    test-data/MID4_GLZRM4E04_rnd30_frclip.pair_sample_N5.sff
    test-data/ecoli.fastq
    test-data/ecoli.pair_sample_N100.fastq
    test-data/ecoli.sample_C10.fastq
    test-data/ecoli.sample_N100.fastq
    test-data/get_orf_input.Suis_ORF.prot.fasta
    test-data/get_orf_input.Suis_ORF.prot.pair_sample_C10.fasta
    test-data/get_orf_input.Suis_ORF.prot.pair_sample_N100.fasta
    test-data/get_orf_input.Suis_ORF.prot.sample_C10.fasta
    test-data/get_orf_input.Suis_ORF.prot.sample_N100.fasta
    tools/sample_seqs/README.rst
    tools/sample_seqs/sample_seqs.py
    tools/sample_seqs/sample_seqs.xml
    tools/sample_seqs/tool_dependencies.xml
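Checking the listing by eye is easy to get wrong, so the comparison could be
automated along these lines. This is only a sketch using Python's standard
``tarfile`` module; the helper name ``missing_from_tarball`` is hypothetical,
and the demonstration builds a small stand-in archive in a temporary folder
rather than reading a real Planemo tar ball.

```python
import io
import os
import tarfile
import tempfile


def missing_from_tarball(tar_path, expected):
    """Return the expected member names that the tar ball does not contain."""
    with tarfile.open(tar_path, "r:gz") as handle:
        present = set(handle.getnames())
    return sorted(set(expected) - present)


expected = [
    "tools/sample_seqs/README.rst",
    "tools/sample_seqs/sample_seqs.py",
    "tools/sample_seqs/sample_seqs.xml",
    "tools/sample_seqs/tool_dependencies.xml",
]

# Self-contained demonstration: write a stand-in archive that deliberately
# omits the last expected file, then check what is reported as missing.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "shed_upload.tar.gz")
    with tarfile.open(demo_path, "w:gz") as handle:
        for name in expected[:3]:
            payload = b"placeholder"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            handle.addfile(info, io.BytesIO(payload))
    missing = missing_from_tarball(demo_path, expected)

print(missing)
```

For a real check, pass the path of the ``shed_upload.tar.gz`` file that
Planemo produced, together with the full list of files you expect to ship.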


Licence (MIT)
=============

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.