annotate split_tabular_columns.py @ 0:d43312f961cc draft default tip

planemo upload for repository https://github.com/jj-umn/galaxytools/tree/master/split_tabular_columns commit 1d5750b99b90bb1d2730c816a95849e9b9a7d2f9-dirty
author jjohnson
date Wed, 01 Mar 2017 14:01:57 -0500
#!/usr/bin/env python
"""
#
#------------------------------------------------------------------------------
# University of Minnesota
# Copyright 2016, Regents of the University of Minnesota
#------------------------------------------------------------------------------
# Author:
#
# James E Johnson
#
#------------------------------------------------------------------------------
"""

"""
Split the selected columns on a pattern
and print one line for each item produced by the split.

For example:
  split_tabular_columns.py -c 3 -c 4 -s '; '
with the tab-separated input line:
  1  1.3  id1; id2  desc1; desc2  AMDLID
will be output as:
  1  1.3  id1  desc1  AMDLID
  1  1.3  id2  desc2  AMDLID
"""

import sys
import os.path
import optparse


def __main__():
    # Parse Command Line
    parser = optparse.OptionParser()
    parser.add_option('-i', '--input', dest='input', default=None, help='Tabular input file')
    parser.add_option('-o', '--output', dest='output', default=None, help='Tabular output file')
    parser.add_option('-c', '--column', type='int', action='append', dest='column', default=[], help='column ordinal to split')
    parser.add_option('-s', '--split_on', dest='split_on', default=' ', help='String on which to split columns')
    parser.add_option('-d', '--debug', dest='debug', action='store_true', default=False, help='Turn on wrapper debugging to stderr')
    (options, args) = parser.parse_args()
    # Input file (defaults to stdin)
    if options.input is not None:
        try:
            inputPath = os.path.abspath(options.input)
            inputFile = open(inputPath, 'r')
        except Exception as e:
            sys.stderr.write("failed: %s\n" % e)
            sys.exit(2)
    else:
        inputFile = sys.stdin
    # Output file (defaults to stdout)
    if options.output is not None:
        try:
            outputPath = os.path.abspath(options.output)
            outputFile = open(outputPath, 'w')
        except Exception as e:
            sys.stderr.write("failed: %s\n" % e)
            sys.exit(3)
    else:
        outputFile = sys.stdout
    # Convert 1-based column ordinals to 0-based indexes
    split_cols = [x - 1 for x in options.column]
    split_on = options.split_on
    try:
        for i, line in enumerate(inputFile):
            fields = line.rstrip('\r\n').split('\t')
            split_fields = dict()
            cnt = 0
            # Split each selected column that exists on this line
            for c in split_cols:
                if c < len(fields):
                    split_fields[c] = fields[c].split(split_on)
                    cnt = max(cnt, len(split_fields[c]))
            if cnt == 0:
                # Nothing to split: pass the line through unchanged
                outputFile.write("%s\n" % '\t'.join(fields))
            else:
                # Emit one line per split item. This assumes every split column
                # yields the same number of items; otherwise an IndexError is
                # raised and reported by the handler below.
                for n in range(0, cnt):
                    flds = [x if c not in split_cols else split_fields[c][n] for (c, x) in enumerate(fields)]
                    outputFile.write("%s\n" % '\t'.join(flds))
    except Exception as e:
        sys.stderr.write("failed: Error reading %s - %s\n" % (options.input if options.input else 'stdin', e))
        sys.exit(1)

if __name__ == "__main__":
    __main__()
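
The per-line splitting step can also be looked at in isolation. What follows is only a minimal sketch, not part of the committed script: the helper name `split_row` is hypothetical, and it assumes, as the loop in `__main__` appears to, that every split column yields the same number of items. It takes 0-based column indexes, whereas the `-c` option takes 1-based ordinals.

```python
def split_row(fields, split_cols, split_on):
    """Expand one row into one row per split item (hypothetical helper).

    fields     -- list of column values for a single input line
    split_cols -- 0-based indexes of the columns to split
    split_on   -- separator string used to split those columns
    """
    split_fields = {}
    cnt = 0
    for c in split_cols:
        # Columns beyond the end of this row are silently skipped
        if c < len(fields):
            split_fields[c] = fields[c].split(split_on)
            cnt = max(cnt, len(split_fields[c]))
    if cnt == 0:
        # Nothing was split: return the row unchanged
        return [list(fields)]
    rows = []
    for n in range(cnt):
        # Assumption: each split column has at least cnt items
        rows.append([
            split_fields[c][n] if c in split_fields else x
            for c, x in enumerate(fields)
        ])
    return rows


# The example from the module docstring: columns 3 and 4 (indexes 2 and 3)
line = "1\t1.3\tid1; id2\tdesc1; desc2\tAMDLID"
rows = split_row(line.split("\t"), [2, 3], "; ")
for row in rows:
    print("\t".join(row))
```

Keeping the expansion in a pure function like this would make it possible to test the logic without going through files or stdin, which may or may not suit how the tool is wrapped for Galaxy.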