annotate tools/sample_seqs/README.rst @ 4:d3aa9f25c24c draft

v0.2.2 use format_source and other internal changes
author peterjc
date Wed, 05 Aug 2015 12:30:18 -0400
parents 02c13ef1a669
children 6b71ad5d43fb

Galaxy tool to sub-sample sequence files
========================================

This tool is copyright 2014-2015 by Peter Cock, The James Hutton Institute
(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
See the licence text below (MIT licence).

This tool is a short Python script (using Biopython library functions)
to sub-sample sequence files (in a range of formats including FASTA, FASTQ,
and SFF). This can be useful for preparing a small sample of data to test
or time a new pipeline, or for reducing the read coverage in a de novo
assembly.

This tool is available from the Galaxy Tool Shed at:

* http://toolshed.g2.bx.psu.edu/view/peterjc/sample_seqs


Automated Installation
======================

This should be straightforward using the Galaxy Tool Shed, which should be
able to automatically install the dependency on Biopython, and then install
this tool and run its unit tests.


Manual Installation
===================

There are just two files to install to use this tool from within Galaxy:

* ``sample_seqs.py`` (the Python script)
* ``sample_seqs.xml`` (the Galaxy tool definition)

The suggested location is in a dedicated ``tools/sample_seqs`` folder.

You will also need to modify the ``tool_conf.xml`` file to tell Galaxy to offer the
tool. One suggested location is in the filters section. Simply add the line::

    <tool file="sample_seqs/sample_seqs.xml" />

You will also need to install Biopython 1.62 or later.
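
To check the Biopython requirement before installing, something like the
following minimal sketch could be used. The helper names ``version_tuple``
and ``biopython_is_recent_enough`` are illustrative and not part of this
tool; the sketch assumes only that Biopython, when installed, exposes its
version string as ``Bio.__version__``.

```python
import re


def version_tuple(text):
    """Turn a version string such as '1.62' into a tuple such as (1, 62).

    Only the leading digits of each dot-separated piece are used, and
    parsing stops at the first piece without leading digits.
    """
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def biopython_is_recent_enough(minimum="1.62"):
    """Return True if Biopython is importable and at least `minimum`."""
    try:
        import Bio
    except ImportError:
        # Biopython is not installed at all
        return False
    return version_tuple(Bio.__version__) >= version_tuple(minimum)
```

Calling ``biopython_is_recent_enough()`` returns ``False`` both when
Biopython is missing and when it is older than the requested version.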

If you wish to run the unit tests, also move/copy the ``test-data/`` files
under Galaxy's ``test-data/`` folder. Then::

    ./run_tests.sh -id sample_seqs

That's it.


History
=======

======= ======================================================================
Version Changes
------- ----------------------------------------------------------------------
v0.0.1  - Initial version.
v0.1.1  - Using ``optparse`` to provide a proper Python command line API.
v0.1.2  - Interleaved mode for working with paired records.
        - Tool definition now embeds citation information.
v0.2.0  - Option to give number of sequences (or pairs) desired.
          This works by first counting all your sequences, then calculating
          the percentage required in order to sample them uniformly (evenly).
          This makes two passes through the input and is therefore slower.
v0.2.1  - Was missing a file for the functional tests.
        - Included testing of stdout messages.
        - Included testing of failure modes.
v0.2.2  - Reorder XML elements (internal change only).
        - Use ``format_source=...`` tag.
        - Planemo for Tool Shed upload (``.shed.yml``, internal change only).
======= ======================================================================
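
The count mode added in v0.2.0 can be pictured with the following sketch.
It is not the code in ``sample_seqs.py``, and the names ``count_records``,
``sample_evenly`` and ``make_iterator`` are hypothetical; it only illustrates
the two-pass idea (count first, then keep evenly spaced records) under the
assumption that the input can be iterated over twice.

```python
def count_records(make_iterator):
    """First pass: count the records without holding them in memory."""
    return sum(1 for _ in make_iterator())


def sample_evenly(make_iterator, desired):
    """Second pass: yield `desired` records spread evenly over the input.

    `make_iterator` is a zero-argument callable returning a fresh iterator
    over the records each time it is called (hypothetical interface).
    """
    if desired < 0:
        raise ValueError("Desired count must not be negative")
    total = count_records(make_iterator)
    if desired > total:
        raise ValueError(
            "Asked for %i records, but only %i are available" % (desired, total)
        )
    for index, record in enumerate(make_iterator()):
        # Keep the record whenever the running quota (the fraction
        # desired/total accumulated so far) crosses a whole number.
        if (index + 1) * desired // total > index * desired // total:
            yield record
```

For example, ``list(sample_evenly(lambda: iter(range(10)), 4))`` keeps four
of the ten items. With Biopython, ``make_iterator`` could be something like
``lambda: SeqIO.parse("reads.fastq", "fastq")``, where the filename is only
a placeholder.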


Developers
==========

This script and related tools are being developed on this GitHub repository:
https://github.com/peterjc/pico_galaxy/tree/master/tools/sample_seqs

For pushing a release to the test or main "Galaxy Tool Shed", use the following
Planemo commands (which require that you have set your Tool Shed access details
in ``~/.planemo.yml`` and that you have access rights on the Tool Shed)::

    $ planemo shed_update --shed_target testtoolshed --check_diff ~/repositories/pico_galaxy/tools/sample_seqs/
    ...

or::

    $ planemo shed_update --shed_target toolshed --check_diff ~/repositories/pico_galaxy/tools/sample_seqs/
    ...

To just build and check the tarball, use::

    $ planemo shed_upload --tar_only ~/repositories/pico_galaxy/tools/sample_seqs/
    ...
    $ tar -tzf shed_upload.tar.gz
    test-data/MID4_GLZRM4E04_rnd30_frclip.pair_sample_N5.sff
    test-data/MID4_GLZRM4E04_rnd30_frclip.sff
    test-data/MID4_GLZRM4E04_rnd30_frclip.sample_C1.sff
    test-data/MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff
    test-data/MID4_GLZRM4E04_rnd30_frclip.pair_sample_N5.sff
    test-data/ecoli.fastq
    test-data/ecoli.pair_sample_N100.fastq
    test-data/ecoli.sample_C10.fastq
    test-data/ecoli.sample_N100.fastq
    test-data/get_orf_input.Suis_ORF.prot.fasta
    test-data/get_orf_input.Suis_ORF.prot.pair_sample_C10.fasta
    test-data/get_orf_input.Suis_ORF.prot.pair_sample_N100.fasta
    test-data/get_orf_input.Suis_ORF.prot.sample_C10.fasta
    test-data/get_orf_input.Suis_ORF.prot.sample_N100.fasta
    tools/sample_seqs/README.rst
    tools/sample_seqs/sample_seqs.py
    tools/sample_seqs/sample_seqs.xml
    tools/sample_seqs/tool_dependencies.xml
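
The same check of the tarball contents could be scripted, for instance to
confirm that the key tool files were packaged. The following is only a
sketch using Python's standard ``tarfile`` module; the function names
``list_tarball`` and ``missing_files`` are made up for illustration and
are not part of Planemo or this tool.

```python
import tarfile


def list_tarball(path):
    """Return the member names of a gzipped tarball, like ``tar -tzf``."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


def missing_files(path, expected):
    """Return those names in `expected` that are absent from the tarball."""
    present = set(list_tarball(path))
    return [name for name in expected if name not in present]
```

For example, ``missing_files("shed_upload.tar.gz",
["tools/sample_seqs/sample_seqs.py", "tools/sample_seqs/sample_seqs.xml"])``
would return an empty list if both files are in the archive.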


Licence (MIT)
=============

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.