annotate riboseqr/triplet.py @ 5:423ad61697c4

Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero). readingFrame function fails with subscript out of bounds. Bugfix 2: [triplet] Check if transcript name in SAM matches the name in FASTA. Produce an error message if it's not. This fixes the problem where an empty plot is produced (no bars). [ribosome_profile] - A proper error message is now produced if an invalid transcript name is provided. Updated test data
author Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
date Tue, 27 Oct 2015 12:21:50 +0000
parents c34c364ce75d
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
1 #!/usr/bin/env python
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
2 import os
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
3 import sys
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
4 import argparse
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
5 import logging
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
6 import rpy2.robjects as robjects
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
7
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
8 import utils
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
9
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
10 rscript = ''
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
11 R = robjects.r
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
12
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
13
5
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
14 class TripletPeriodicityError(Exception):
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
15 pass
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
16
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
17
0
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
18 def run_rscript(command=None):
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
19 """Run R command, log it, append to rscript"""
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
20 global rscript
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
21 if not command:
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
22 return
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
23 logging.debug(command)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
24 rscript += '{}\n'.format(command)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
25 output = R(command)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
26 return output
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
27
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
28
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
29 def find_periodicity(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
30 rdata_load='Prepare.rda', start_codons='ATG', stop_codons='TAG,TAA,TGA',
5
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
31 fasta_file=None, include_lengths='25:30', analyze_plot_lengths=None,
0
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
32 text_legend='Frame 0, Frame 1, Frame 2', rdata_save='Periodicity.rda',
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
33 html_file='Periodicity-report.html', output_path=os.getcwd()):
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
34 """Plot triplet periodicity from prepared R data file. """
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
35 logging.debug('{}'.format(R('sessionInfo()')))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
36 cmd = 'suppressMessages(library(riboSeqR))'
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
37 run_rscript(cmd)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
38
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
39 logging.debug('Loading saved R data file')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
40 cmd = 'load("{}")'.format(rdata_load)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
41 run_rscript(cmd)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
42
5
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
43 # get the first seqname (populated from input file which was generated from
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
44 # SAM)
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
45 refseq = R('levels(riboDat@riboGR$`RiboSeq file 1`@seqnames)[1]')
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
46 refseq = tuple(refseq)[0]
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
47
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
48 fasta_valid = False
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
49 for line in open(fasta_file):
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
50 if line.startswith('>'):
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
51 seqname = line.strip().lstrip('>').split()
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
52 # get the FASTA header, split on space, and use the first column to
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
53 # compare. Multiple fields separated by space doesn't appear to be
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
54 # a problem for riboSeqR
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
55 if seqname[0] == refseq:
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
56 fasta_valid = True
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
57 break
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
58 if not fasta_valid:
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
59 raise TripletPeriodicityError(
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
60 'Transcript name in SAM alignment does not match with name in the transcriptome '
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
61 'FASTA header. Please also check if the reads were aligned to this transcriptome '
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
62 'and not the genome')
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
63
0
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
64 # R("""options(showTailLines=Inf)""")
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
65 starts, stops = (utils.process_args(start_codons, ret_mode='charvector'),
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
66 utils.process_args(stop_codons, ret_mode='charvector'))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
67
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
68 cmd = ('fastaCDS <- findCDS(fastaFile={0!r}, startCodon={1}, '
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
69 'stopCodon={2})'.format(fasta_file, starts, stops))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
70 run_rscript(cmd)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
71
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
72 logging.debug('Potential coding sequences using start codon (ATG) and '
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
73 'stop codons TAG, TAA, TGA')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
74 logging.debug('{}\n'.format(R['fastaCDS']))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
75
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
76 cmd = """fCs <- frameCounting(riboDat, fastaCDS, lengths={0})
5
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
77
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
78 """.format(include_lengths)
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
79 if analyze_plot_lengths:
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
80 cmd += 'fS <- readingFrame(rC=fCs, lengths={0}); fS'.format(analyze_plot_lengths)
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
81 else:
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
82 cmd += 'fS <- readingFrame(rC=fCs); fS'
0
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
83 run_rscript(cmd)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
84
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
85 logging.debug('riboDat \n{}\n'.format(R['riboDat']))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
86 logging.debug('fCs\n{0}\n'.format(R['fCs']))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
87 logging.debug('Reading frames for each n-mer\n{}'.format(R['fS']))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
88
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
89 legend = utils.process_args(text_legend, ret_mode='charvector')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
90
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
91 for fmat in ('pdf', 'png'):
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
92 if fmat == 'png':
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
93 cmd = '{0}(file="{1}", type="cairo")'
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
94 else:
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
95 cmd = '{0}(file="{1}")'
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
96 run_rscript(cmd.format(fmat, os.path.join(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
97 output_path, '{0}.{1}'.format('Periodicity-plot', fmat))))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
98 run_rscript('plotFS(fS, legend.text = {0})'.format(legend))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
99 run_rscript('dev.off()')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
100
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
101 run_rscript('save("fCs", "fS", "riboDat", "fastaCDS", '
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
102 'file="{}", compress=FALSE)'.format(rdata_save))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
103
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
104 html = '<h2>Triplet periodicity - results</h2><hr>'
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
105 html += ('<h4>Results of reading frame analysis</h4>'
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
106 '<pre>{}</pre><br>'.format(R['fS']))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
107 html += ('<p>Lengths used for reading frame analysis - <code>{0}</code>'
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
108 '<br>Lengths selected for the plot - <code>{1}</code>'
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
109 '</p>'.format(include_lengths, analyze_plot_lengths))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
110 html += ('<p><img src="Periodicity-plot.png" border="1" '
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
111 'alt="Triplet periodicity plot" />'
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
112 '<br><a href="Periodicity-plot.pdf">PDF version</a></p>')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
113
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
114 logging.debug('\n{:#^80}\n{}\n{:#^80}\n'.format(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
115 ' R script for this session ', rscript, ' End R script '))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
116
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
117 with open(os.path.join(output_path, 'periodicity.R'), 'w') as r:
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
118 r.write(rscript)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
119
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
120 html += ('<h4>R script for this session</h4>'
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
121 '<p><a href="periodicity.R">periodicity.R</a></p>'
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
122 '<p>Next step: <em>Metagene analysis</em></p>')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
123
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
124 with open(html_file, 'w') as f:
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
125 f.write(html)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
126
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
127
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
128 if __name__ == '__main__':
5
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
129
0
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
130 parser = argparse.ArgumentParser(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
131 description='Plot triplet periodicity for different read lengths.')
5
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
132
0
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
133 # required arguments
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
134 flags = parser.add_argument_group('required arguments')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
135 flags.add_argument(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
136 '--rdata_load', required=True,
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
137 help='riboSeqR data from input preparation step (Prepare.rda)')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
138 flags.add_argument(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
139 '--fasta_file', required=True,
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
140 help='FASTA file of the reference transcriptome')
5
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
141
0
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
142 # optional arguments
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
143 parser.add_argument(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
144 '--start_codons',
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
145 help='Start codon(s) to use. (default: %(default)s)', default='ATG')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
146 parser.add_argument(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
147 '--stop_codons', help='Stop codon(s) to use. (default: %(default)s)',
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
148 default='TAG, TAA, TGA')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
149 parser.add_argument(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
150 '--include_lengths',
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
151 help='Lengths of ribosome footprints to be included '
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
152 '(default: %(default)s)', default='25:30')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
153 parser.add_argument(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
154 '--analyze_plot_lengths',
5
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
155 help='Lenghts of reads to be analyzed for frame-shift and plotting')
0
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
156 parser.add_argument(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
157 '--text_legend',
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
158 help='Text for legend used in the plot (default: %(default)s)',
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
159 default='Frame 0, Frame 1, Frame 2')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
160 parser.add_argument(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
161 '--rdata_save', help='File to write RData to (default: %(default)s)',
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
162 default='Periodicity.rda')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
163 parser.add_argument('--html_file', help='Output file for results (HTML)')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
164 parser.add_argument('--output_path',
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
165 help='Files are saved in this directory')
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
166 parser.add_argument(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
167 '--debug', help='Produce debug output', action='store_true')
5
423ad61697c4 Bugfix 1: [triplet] Lengths (frameCounting) if given should be a range (not zero).
Vimalkumar Velayudhan <vimalkumarvelayudhan@gmail.com>
parents: 0
diff changeset
168
0
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
169 args = parser.parse_args()
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
170 if args.debug:
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
171 logging.basicConfig(format='%(module)s: %(levelname)s - %(message)s',
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
172 level=logging.DEBUG, stream=sys.stdout)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
173 logging.debug('Supplied Arguments\n{}\n'.format(vars(args)))
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
174
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
175 if not os.path.exists(args.output_path):
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
176 os.mkdir(args.output_path)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
177
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
178 find_periodicity(
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
179 rdata_load=args.rdata_load, start_codons=args.start_codons,
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
180 stop_codons=args.stop_codons, fasta_file=args.fasta_file,
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
181 include_lengths=args.include_lengths,
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
182 analyze_plot_lengths=args.analyze_plot_lengths,
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
183 text_legend=args.text_legend,
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
184 rdata_save=args.rdata_save, html_file=args.html_file,
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
185 output_path=args.output_path)
c34c364ce75d First commit
Vimalkumar Velayudhan <vimal@biotechcoder.com>
parents:
diff changeset
186 logging.debug("Done!")