sharplabtool: tools/rgenetics/rgRegion.py annotate

annotate tools/rgenetics/rgRegion.py @ 0:9071e359b9a3

Uploaded

author	xuebing
date	Fri, 09 Mar 2012 19:37:19 -0500

"""
released under the terms of the LGPL
copyright ross lazarus August 2007
for the rgenetics project

Special galaxy tool for the camp2007 data
Allows grabbing arbitrary columns from an arbitrary region

Needs a mongo results file passed in as $infile (for example as a library
parameter) - this file must have a very specific structure: whitespace
delimited, first row holds the column titles, then one row per marker
rs chrom offset float1...floatn

called as
<command interpreter="python">
rgRegion.py $infile '$cols' $r $tag $out_file1
</command>

cols is a comma delimited list of chosen column numbers (1-based) for the subset
r is a ucsc location region pasted into the tool, like chr8:10,000-100,000

"""

import string
import sys

# map every punctuation character to '_' so the tag is safe to reuse
trantab = str.maketrans(string.punctuation, '_' * len(string.punctuation))
print('##rgRegion.py started')
if len(sys.argv) != 6:
    print('##!expected 6 params in sys.argv, got %d - %s' % (len(sys.argv), sys.argv))
    sys.exit(1)
print('##got %d - %s' % (len(sys.argv), sys.argv))
# quick and dirty for galaxy - we always get something for each parameter
fname = sys.argv[1]
# ''.split(',') returns [''], so drop empty entries before testing for "no columns"
wewant = [x for x in sys.argv[2].split(',') if x.strip()]
region = sys.argv[3].lower()
tag = sys.argv[4].translate(trantab)
ofname = sys.argv[5]
myname = 'rgRegion'
if len(wewant) == 0:  # no columns selected?
    print('##!%s: no columns selected - cannot run' % myname)
    sys.exit(1)
try:
    f = open(fname, 'r')
except IOError:  # bad input file name?
    print('##!%s unable to open file %s' % (myname, fname))
    sys.exit(1)
try:  # TODO make a regexp?
    c, rest = region.split(':')
    c = c.replace('chr', '')  # leave although will break strict genome graphs
    rest = rest.replace(',', '')  # remove commas
    spos, epos = rest.split('-')
    spos = int(spos)
    epos = int(epos)
except ValueError:
    print('##!%s unable to parse region %s - MUST look like "chr8:10,000-100,000"' % (myname, region))
    sys.exit(1)
print('##%s parsing chrom %s from %d to %d' % (myname, c, spos, epos))
res = []
cnames = next(f).strip().split()  # column titles for output
linelen = len(cnames)
wewant = [int(x) - 1 for x in wewant]  # need col numbers base 0
for n, l in enumerate(f):
    ll = l.strip().split()
    if len(ll) != linelen:  # check first so short or blank rows cannot raise IndexError
        print('##! looking for %d fields - found %d in ll=%s' % (linelen, len(ll), str(ll)))
        continue
    thisc = ll[1].lower()  # region was lowercased above, so compare in lower case
    thispos = int(ll[2])
    if (thisc == c) and (thispos >= spos) and (thispos <= epos):
        res.append([ll[x] for x in wewant])  # subset of columns!
o = open(ofname, 'w')
res = ['%s\n' % '\t'.join(x) for x in res]  # turn into tab delim string
print('##%s selected and returning %d data rows' % (myname, len(res)))
head = [cnames[x] for x in wewant]  # ah, list comprehensions - list of needed column names
o.write('%s\n' % '\t'.join(head))  # header row for output
o.write(''.join(res))
o.close()
f.close()
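
The script's own "TODO make a regexp?" note suggests parsing the pasted UCSC region with a regular expression. The block below is a minimal sketch of that idea together with the column subsetting step, not part of the original tool. The helper names parse_region and select_rows are hypothetical, the sample rows are invented for illustration, and the sketch assumes chromosome names should be compared case-insensitively.

```python
import io
import re

# Hypothetical pattern for a UCSC-style region such as "chr8:10,000-100,000".
# The "chr" prefix is optional and commas inside the coordinates are allowed.
REGION_RE = re.compile(r'^(?:chr)?(\w+):([\d,]+)-([\d,]+)$', re.IGNORECASE)


def parse_region(region):
    """Return (chrom, start, end) with chrom lowercased, or raise ValueError."""
    m = REGION_RE.match(region.strip())
    if m is None:
        raise ValueError('region must look like "chr8:10,000-100,000": %r' % region)
    chrom = m.group(1).lower()
    start = int(m.group(2).replace(',', ''))
    end = int(m.group(3).replace(',', ''))
    return chrom, start, end


def select_rows(lines, cols, region):
    """Yield the header subset, then the column subset of each row in region.

    lines  - iterator over text lines in "rs chrom offset float1...floatn" layout
    cols   - comma delimited 1-based column numbers, e.g. "1,3,4"
    region - UCSC-style region string
    """
    chrom, start, end = parse_region(region)
    want = [int(c) - 1 for c in cols.split(',')]
    header = next(lines).split()
    yield [header[i] for i in want]
    for line in lines:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip malformed rows
        if fields[1].lower() == chrom and start <= int(fields[2]) <= end:
            yield [fields[i] for i in want]


# Invented sample data, used only to exercise the sketch.
SAMPLE = io.StringIO(
    'rs chrom offset p1 p2\n'
    'rs1 8 5000 0.5 0.1\n'
    'rs2 8 20000 0.01 0.2\n'
    'rs3 X 20000 0.3 0.4\n'
)
rows = list(select_rows(SAMPLE, '1,3,4', 'chr8:10,000-100,000'))
for row in rows:
    print('\t'.join(row))
```

Raising ValueError from parse_region keeps the "unable to parse region" handling in one place. A caller such as the script above could catch it and print the same ##! message before exiting.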
