Mercurial > repos > vipints > fml_gff3togtf
annotate bed_to_gff.py @ 11:5c6f33e20fcc default tip
requirement tag added
author | vipints <vipin@cbio.mskcc.org> |
---|---|
date | Fri, 24 Apr 2015 18:04:27 -0400 |
parents | c42c69aa81f8 |
children |
rev | line source |
---|---|
10
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
|
#!/usr/bin/env python
"""
Convert genome annotation data in a 12 column BED format to GFF3.

Usage:
    python bed_to_gff.py in.bed > out.gff

Requirement:
    helper.py : https://github.com/vipints/GFFtools-GX/blob/master/helper.py

Copyright (C)
    2009-2012 Friedrich Miescher Laboratory of the Max Planck Society, Tubingen, Germany.
    2012-2015 Memorial Sloan Kettering Cancer Center New York City, USA.
"""

import re
import sys
import helper

def __main__():
    """
    main function
    """

    try:
        bed_fname = sys.argv[1]
    except IndexError:
        # no input file given: show the usage text and exit
        print(__doc__)
        sys.exit(-1)

    bed_fh = helper.open_file(bed_fname)

    for line in bed_fh:
        line = line.strip('\n\r')

        # skip blank lines and comment lines
        if not line or line[0] == '#':
            continue

        parts = line.split('\t')
        assert len(parts) >= 12, line

        # block starts (last column), relative to the feature start;
        # drop the empty field left by a trailing comma
        rstarts = parts[-1].split(',')
        if rstarts[-1] == '':
            rstarts.pop()

        # block sizes (second-to-last column), same trailing-comma handling
        exon_lens = parts[-2].split(',')
        if exon_lens[-1] == '':
            exon_lens.pop()

        if len(rstarts) != len(exon_lens):
            continue  # skip records where col 11 and col 12 are inconsistent

        if len(rstarts) != int(parts[-3]):
            continue  # skip records where the number of exons differs from the block count

        if parts[5] not in ['+', '-']:
            parts[5] = '.'  # replace the unknown strand with '.'

        # bed2gff result lines; BED starts are 0-based, GFF3 starts are 1-based
        sys.stdout.write('%s\tbed2gff\tgene\t%d\t%s\t%s\t%s\t.\tID=Gene:%s;Name=Gene:%s\n' % (parts[0], int(parts[1])+1, parts[2], parts[4], parts[5], parts[3], parts[3]))
        sys.stdout.write('%s\tbed2gff\ttranscript\t%d\t%s\t%s\t%s\t.\tID=%s;Name=%s;Parent=Gene:%s\n' % (parts[0], int(parts[1])+1, parts[2], parts[4], parts[5], parts[3], parts[3], parts[3]))

        # one exon line per BED block
        st = int(parts[1])
        for ex_cnt in range(int(parts[-3])):
            start = st + int(rstarts[ex_cnt]) + 1
            stop = start + int(exon_lens[ex_cnt]) - 1
            sys.stdout.write('%s\tbed2gff\texon\t%d\t%d\t%s\t%s\t.\tParent=%s\n' % (parts[0], start, stop, parts[4], parts[5], parts[3]))

    bed_fh.close()


if __name__ == "__main__":
    __main__()
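
The exon arithmetic in the loop above can be checked in isolation. The following is a minimal sketch, not part of the repository: the function name `bed_blocks_to_gff_exons` and the example values are hypothetical. It assumes the usual conventions that BED starts are 0-based with block starts given relative to the feature start, while GFF3 coordinates are 1-based and inclusive. Unlike the script, which silently skips inconsistent records, this sketch raises an error.

```python
def bed_blocks_to_gff_exons(chrom_start, block_sizes, block_starts):
    """Return 1-based, inclusive (start, end) pairs, one per BED block.

    Hypothetical helper for illustration only.
    chrom_start  -- 0-based feature start (BED column 2), as an int
    block_sizes  -- comma-separated block sizes (BED column 11)
    block_starts -- comma-separated block starts relative to chrom_start
                    (BED column 12)
    """
    # A trailing comma leaves an empty field; ignore empty fields.
    sizes = [int(x) for x in block_sizes.split(',') if x]
    starts = [int(x) for x in block_starts.split(',') if x]

    if len(sizes) != len(starts):
        raise ValueError('block sizes and block starts differ in length')

    exons = []
    for rel_start, size in zip(starts, sizes):
        start = chrom_start + rel_start + 1  # shift from 0-based to 1-based
        end = start + size - 1               # inclusive end
        exons.append((start, end))
    return exons


# Two blocks of a feature starting at 0-based position 1000.
print(bed_blocks_to_gff_exons(1000, '100,200,', '0,3800,'))
# prints [(1001, 1100), (4801, 5000)]
```

In this example the last exon ends at 5000, which is the value a BED record spanning 1000 to 5000 would carry in its end column. This is consistent with the script writing the BED end column to the gene and transcript lines unchanged.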