annotate seq2HLA.py @ 1:155b796033b6 draft default tip

Uploaded
author sebastian-boegel
date Fri, 21 Dec 2012 03:46:15 -0500
parents 913ea6991ee4
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
1 ##########################################################################################################
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
2 #Title:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
3 #seq2HLA - HLA genotyping from RNA-Seq sequence reads
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
4 #
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
5 #Release: 1.0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
6 #
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
7 #Author:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
8 #Sebastian Boegel, 2012 (c)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
9 #TRON - Translational Oncology at the University Medical Center Mainz, 55131 Mainz, Germany
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
10 #University Medical Center of the Johannes Gutenberg-University Mainz, III. Medical Department, Mainz, Germany
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
11 #
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
12 #Contact:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
13 #boegels@uni-mainz.de
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
14 #
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
15 #Synopsis:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
16 #We developed an in-silico method "Seq2HLA", written in python and R, which takes standard RNA-Seq sequence reads in fastq format
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
17 #as input, uses a bowtie index comprising all HLA alleles and outputs the most likely HLA class I and class II genotypes,
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
18 #a p-value for each call, and the expression of each class
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
19 #
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
20 #Usage:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
21 #python seq2HLA.py -1 <readfile1> -2 <readfile2> -r "<runname>" -l <readlength> [-3 <int>]*
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
22 #*optional (Default:0)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
23 #
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
24 #Dependencies:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
25 #0.) seq2HLA is a python script, developed with Python 2.6.8
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
26 #1.) bowtie must be reachable by the command "bowtie". seq2HLA was developed and tested with bowtie version 0.12.7 (64-bit). The call to bowtie is invoked with 6 CPUs. You can change that in the function "mapping".
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
27 #2.) R must be installed, seq2HLA.py was developed and tested with R version 2.12.2 (2011-02-25)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
28 #3.) Input must be paired-end reads in fastq-format
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
29 #4.) Index files must be located in the folder "references".
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
30 #5.) Packages: biopython (developed with V1.58), numpy (1.3.0)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
31 ###########################################################################################################
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
32
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
33 from operator import itemgetter
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
34 import sys
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
35 import linecache
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
36 import ast
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
37 import os
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
38 import shutil
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
39 import tempfile
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
40 import subprocess
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
41 from Bio import SeqIO
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
42 import numpy
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
43 import operator
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
44 from optparse import OptionParser
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
45
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
46 log_stderr = sys.stderr
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
47 log_file = open("err.log","w")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
48 sys.stderr = log_file
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
49
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
50 import pysam
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
51
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
52 #These variables need to be global, as they are filled and used by different modules
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
53 readcount={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
54 readspergroup={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
55 allelesPerLocus={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
56
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
57 def main(runName,readFile1,readFile2,fastaClassI,fastaClassII,bowtiebuildClassI,bowtiebuildClassII,mismatch,trim3,output1,output2,logfile,gzipped,num_threads):
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
58 log = open(logfile,"w")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
59 # mapopt="-a -v"+str(mismatch)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
60 mapopt="-a -v"+str(mismatch)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
61 #call HLA typing for Class I
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
62 mainClassI(runName+"-ClassI",readFile1,readFile2,bowtiebuildClassI,fastaClassI,mapopt,trim3,output1,log,gzipped,num_threads)
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
63 #call HLA typing for Class II
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
64 mainClassII(runName+"-ClassII",readFile1,readFile2,bowtiebuildClassII,fastaClassII,mapopt,trim3,output2,log,gzipped,num_threads)
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
65
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
66 #---------------Class I-------------------------------
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
67 def mainClassI(runName,readFile1,readFile2,bowtiebuild,hla1fasta,mapopt,trim3,finaloutput,log,gzipped,num_threads):
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
68 #-------1st iteration-----------------------------------
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
69 log.write("----------HLA class I------------\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
70 sam1=runName+"-iteration1.bam"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
71 iteration=1
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
72 log.write("First iteration starts....\nMapping ......\n")
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
73 mapping(sam1,runName,readFile1,readFile2,bowtiebuild,1,mapopt,trim3,log,gzipped,num_threads)
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
74 medians=[]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
75 medians.extend([0,0,0])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
76 medianflag=False
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
77 #Calculation of first digital haplototype .....
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
78 output1=runName+".digitalhaplotype1"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
79 log.write("Calculation of first digital haplototype .....\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
80 map=createRefDict(hla1fasta,"A","B","C")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
81 readMapping(map,sam1)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
82 predictHLA1(sam1,medians,output1,medianflag,log)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
83 log.write("1st iteration done.\nNow removing reads that mapped to the three top-scoring groups .......\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
84 removeReads(runName,createRemoveList(runName,map))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
85
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
86 #------2nd iteration------------------------------------------
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
87 log.write("Second iterations starts .....\n Mapping ......\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
88 medians=[]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
89 iteration=2
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
90 sam2=runName+"-iteration2.bam"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
91 newReadFile1=runName+"-2nditeration_1.fq"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
92 newReadFile2=runName+"-2nditeration_2.fq"
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
93 mapping(sam2,runName,newReadFile1,newReadFile2,bowtiebuild,2,mapopt,trim3,log,gzipped,num_threads)
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
94 medianfile=runName+".digitalhaplotype1"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
95 for i in range(2,5):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
96 medians.append(linecache.getline(medianfile, i).split('\t', 3)[2])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
97 medianflag=True
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
98 output2=runName+".digitalhaplotype2"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
99 #Calculation of second digital haplototype
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
100 log.write("Calculation of second digital haplototype .....\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
101 map=createRefDict(hla1fasta,"A","B","C")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
102 readMapping(map,sam2)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
103 predictHLA1(sam2,medians,output2,medianflag,log)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
104 log.write("2nd iteration done.")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
105 reportHLAgenotype(output1,output2,finaloutput)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
106 log.write("Calculation of locus-specific expression ...\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
107 expression("A","B","C",694,694,694,map,runName,readFile1,finaloutput,gzipped)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
108
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
109 #--------------Class II----------------------------------------
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
110 def mainClassII(runName,readFile1,readFile2,bowtiebuild,hla2fasta,mapopt,trim3,finaloutput,log,gzipped,num_threads):
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
111 #-------1st iteration----------------------------------------
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
112 log.write("----------HLA class II------------\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
113 sam1=runName+"-iteration1.bam"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
114 iteration=1
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
115 log.write("ClassII: first iteration starts....\nMapping ......\n")
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
116 mapping(sam1,runName,readFile1,readFile2,bowtiebuild,1,mapopt,trim3,log,gzipped,num_threads)
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
117 medians=[]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
118 medians.extend([0,0,0])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
119 medianflag=False
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
120 output1=runName+".digitalhaplotype1"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
121 log.write("ClassII: calculation of first digital haplototype .....\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
122 map=createRefDict(hla2fasta,"DQA1","DQB1","DRB1")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
123 readMapping(map,sam1)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
124 predictHLA2(sam1,medians,output1,medianflag,log)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
125 log.write("1st iteration done.\nNow removing reads that mapped to the three top-scoring groups .......\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
126 removeReads(runName,createRemoveList(runName,map))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
127
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
128 #------2nd iteration------------------------------------------
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
129 log.write("Second iterations starts .....\n Mapping ......\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
130 medians=[]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
131 iteration=2
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
132 sam2=runName+"-iteration2.bam"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
133 newReadFile1=runName+"-2nditeration_1.fq"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
134 newReadFile2=runName+"-2nditeration_2.fq"
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
135 mapping(sam2,runName,newReadFile1,newReadFile2,bowtiebuild,2,mapopt,trim3,log,gzipped,num_threads)
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
136 medianfile=runName+".digitalhaplotype1"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
137 for i in range(2,5):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
138 medians.append(float(linecache.getline(medianfile, i).split('\t', 3)[2])/2.0)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
139 medianflag=True
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
140 output2=runName+".digitalhaplotype2"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
141 #Calculation of second digital haplototype
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
142 log.write("ClassII: calculation of second digital haplototype .....\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
143 map=createRefDict(hla2fasta,"DQA1","DQB1","DRB1")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
144 readMapping(map,sam2)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
145 predictHLA2(sam2,medians,output2,medianflag,log)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
146 log.write("2nd iteration done.\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
147 reportHLAgenotype(output1,output2,finaloutput)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
148 log.write("Calculation of locus-specific expression ...\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
149 expression("DQA1","DQB1","DRB1",400,421,421,map,runName,readFile1,finaloutput,gzipped)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
150
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
151 #performs the bowtie mapping for the 2 iterations using the given parameters
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
152 def mapping(sam,runName,readFile1,readFile2,bowtiebuild,iteration,mapopt,trim3,log,gzipped,num_threads):
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
153 if iteration==1:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
154 if gzipped == "True":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
155 sam = sam.rsplit(".",1)[0]
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
156 cmd1 = "bowtie --threads "+num_threads+" -3 %s --quiet -S %s --al %s.aligned %s -1 <(zcat %s) -2 <(zcat %s) | awk -F '\t' '$3 != \"*\"{ print $0 }' | samtools view -bS - | samtools sort - %s" % (trim3,mapopt,runName,bowtiebuild,readFile1,readFile2,sam)
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
157 mappingOutput = execBowtie(cmd1)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
158
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
159 cmd2 = "samtools index %s.bam" % sam
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
160 createBAMIndex(cmd2)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
161 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
162 sam = sam.rsplit(".",1)[0]
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
163 cmd1 = "bowtie --threads "+num_threads+" -3 %s --quiet -S %s --al %s.aligned %s -1 %s -2 %s | awk -F '\t' '$3 != \"*\"{ print $0 }' | samtools view -bS - | samtools sort - %s" % (trim3,mapopt,runName,bowtiebuild,readFile1,readFile2,sam)
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
164 mappingOutput = execBowtie(cmd1)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
165
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
166 cmd2 = "samtools index %s.bam" % sam
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
167 createBAMIndex(cmd2)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
168 # if iteration==2:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
169 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
170
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
171 sam = sam.rsplit(".",1)[0]
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
172 cmd2 = "bowtie --threads "+num_threads+" -3 %s --quiet -S %s %s -1 %s -2 %s | samtools view -bS - | samtools sort - %s" % (trim3,mapopt,bowtiebuild,readFile1,readFile2,sam)
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
173 mappingOutput = execBowtie(cmd2)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
174
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
175 cmd3 = "samtools index %s.bam" % sam
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
176 createBAMIndex(cmd3)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
177
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
178 log.write(mappingOutput+"\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
179
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
180 def createBAMIndex(cmd):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
181 proc = subprocess.Popen(["/bin/bash","-c",cmd],stderr=subprocess.PIPE)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
182 (stdoutdata,stderrdata) = proc.communicate()
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
183 r = proc.returncode
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
184 if r != 0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
185 raise IOError(stderrdata)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
186
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
187 def execBowtie(cmd):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
188 proc = subprocess.Popen(["/bin/bash","-c",cmd],stdout=subprocess.PIPE,stderr=subprocess.PIPE)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
189 (stdoutdata,stderrdata) = proc.communicate()
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
190 mappingOutput = stdoutdata
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
191 r = proc.returncode
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
192 if r != 0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
193 raise IOError(stderrdata)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
194 return mappingOutput
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
195
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
196 #create dictionary "map", that contains all IMGT/HLA-allele names as keys and the allele name (e.g. A*02:01:01) as value
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
197 #dictionary "readcount" is initialized with 0 for each allele
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
198 #dictionary "readspergroup" is initialized with 0 for each group (2digit, e.g. A*01)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
199 #dictionary "allelesPerLocus" stores the number of alleles per locus.
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
200 def createRefDict(hlafasta,locus1,locus2,locus3):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
201 map={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
202 for locus in (locus1,locus2,locus3):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
203 allelesPerLocus[locus]=0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
204 handle=open(hlafasta,'r')
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
205 for record in SeqIO.parse(handle, "fasta") :
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
206 l=record.description.split(' ')
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
207 hlapseudoname=l[0]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
208 hlaallele=l[1]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
209 map[hlapseudoname]=hlaallele
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
210 readcount[hlaallele]=0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
211 readspergroup[hlaallele.split(":")[0]]=0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
212 allelesPerLocus[hlaallele.split('*')[0]]+=1
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
213 handle.close()
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
214 return map
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
215
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
216 #Open sam-file and count mappings for each allele
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
217 def readMapping(map,sam):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
218 samhandle = pysam.Samfile(sam,"rb")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
219 for ar in samhandle.fetch():
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
220 hla = samhandle.getrname(ar.tid)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
221 readcount[map[hla]]+=1
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
222 samhandle.close()
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
223
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
224 #predict HLA class II genotype
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
225 def predictHLA2(sam,medians,output,medianflag,log):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
226
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
227 readspergroupDQAlist=[]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
228 readspergroupDQBlist=[]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
229 readspergroupDRBlist=[]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
230 readspergroupDQA={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
231 readspergroupDQB={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
232 readspergroupDRB={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
233 maxallelepergroup={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
234
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
235 #for each allele, to which at least 1 read map, find the allele which has the most reads within a group (2-digit-level, e.g. DQA*02") and save
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
236 #i) the key of this allele as ambassador for the group => maxallelepergroup holds for each group the top-scoring allele
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
237 #ii) the number of reads mapping to the top-scoring allele => readspergroup
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
238 for key in readcount:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
239 if readcount[key] > 0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
240 group=key.split(":")[0]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
241 if readspergroup[group] <=readcount[key]:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
242 readspergroup[group]=readcount[key]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
243 maxallelepergroup[group]=key
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
244
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
245 #consider all DQA-,DQB-, and DRB-groups seperately
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
246 #readspergroup<DQA|DQB|DRB>list = list of all reads mapping to the top-scoring groups minus the decision threshold (which is 0 in the first iteration)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
247 #readspergroup<DQA|DQB|DRB> = contains the same entries as the list, but the entries are uniquely accessible via the group-key (e.g. DQA*01)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
248 for key in readspergroup:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
249 keyBegin = key[0:3]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
250 if keyBegin=='DQA':
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
251 readspergroupDQAlist.append(readspergroup[key]-float(medians[0]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
252 readspergroupDQA[key]=readspergroup[key]-float(medians[0])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
253 if keyBegin=='DQB':
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
254 readspergroupDQBlist.append(readspergroup[key]-float(medians[1]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
255 readspergroupDQB[key]=readspergroup[key]-float(medians[1])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
256 if keyBegin=='DRB':
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
257 readspergroupDRBlist.append(readspergroup[key]-float(medians[2]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
258 readspergroupDRB[key]=readspergroup[key]-float(medians[2])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
259
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
260 #compute the decision threshold (median) for homozygosity vs. heterozygosity for the second iteration
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
261 medianDQA=numpy.median(readspergroupDQAlist)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
262 medianDQB=numpy.median(readspergroupDQBlist)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
263 medianDRB=numpy.median(readspergroupDRBlist)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
264 a = b = c = ""
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
265
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
266 #Determine top-scoring group of the whole locus (DQA,DQBB,DRB) and store it
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
267 #maxkey<DQA,DQB,DRB> = group (e.g. DQA*01) with the most reads
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
268 #It can be that, e.g. in cancer cells a whole locus is lost. For that reason it is checked if the
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
269 #number of mapping reads of the top-scoring group maxkey<DQA,DQB,DRB> is > 0, otherwise "no" ist reported for this locus
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
270 if len(readspergroupDQAlist)>0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
271 maxkeyDQA=max(readspergroupDQA,key=lambda a:readspergroupDQA.get(a))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
272 if readspergroupDQA[maxkeyDQA] > 0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
273 maxDQA=maxallelepergroup[maxkeyDQA]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
274 a = max(readspergroupDQA,key=lambda a:readspergroupDQA.get(a))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
275 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
276 a = "no"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
277 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
278 a = "no"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
279
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
280 if len(readspergroupDQBlist)>0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
281 maxkeyDQB=max(readspergroupDQB,key=lambda a:readspergroupDQB.get(a))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
282 if readspergroupDQB[maxkeyDQB] > 0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
283 maxDQB=maxallelepergroup[maxkeyDQB]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
284 b = max(readspergroupDQB,key=lambda a:readspergroupDQB.get(a))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
285 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
286 b = "no"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
287 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
288 b = "no"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
289
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
290 if len(readspergroupDRBlist)>0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
291 maxkeyDRB=max(readspergroupDRB,key=lambda a:readspergroupDRB.get(a))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
292 if readspergroupDRB[maxkeyDRB] > 0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
293 maxDRB=maxallelepergroup[maxkeyDRB]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
294 c = max(readspergroupDRB,key=lambda a:readspergroupDRB.get(a))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
295 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
296 c = "no"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
297 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
298 c = "no"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
299
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
300 readspergroupDQA=sorted(readspergroupDQA.items(), key=itemgetter(1),reverse=True)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
301 readspergroupDQB=sorted(readspergroupDQB.items(), key=itemgetter(1),reverse=True)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
302 readspergroupDRB=sorted(readspergroupDRB.items(), key=itemgetter(1),reverse=True)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
303
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
304 dqavec=dqbvec=drbvec=""
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
305 if medianflag:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
306 #in the 2nd iteration:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
307 #1.) DO NOT remove the top-scoring group from the set <dqa,dqb,drb>vec as this is more strict when calculating the probability of the top scoring group being an outlier
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
308 #The strings <dqa,dqb,drb>vec are used by the R-script to calculate the probability of the top scoring group being an outlier
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
309 for key in maxallelepergroup:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
310 keyBegin = key[0:3]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
311 if keyBegin=="DQA":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
312 dqavec=dqavec+str(readcount[maxallelepergroup[key]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
313 #print key+": "+str(readcount[maxallelepergroup[key]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
314 if keyBegin=="DQB":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
315 dqbvec=dqbvec+str(readcount[maxallelepergroup[key]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
316 #print key+": "+str(readcount[maxallelepergroup[key]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
317 if keyBegin=="DRB":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
318 drbvec=drbvec+str(readcount[maxallelepergroup[key]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
319 #print key+": "+str(readcount[maxallelepergroup[key]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
320 #2.) Add the decision thresholds to the sets <dqa,dqb,drb>vec, as this enables measuring the distance of the top-scoring group to this distance
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
321 if a=="no":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
322 dqavec+=str(medians[0])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
323 if b=="no":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
324 dqbvec+=str(medians[1])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
325 if c=="no":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
326 drbvec+=str(medians[2])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
327 #3.) In case of, e.g. a loss of whole HLA-locus (DQA/B,DRB), <dqa,dqb,drb>vec only contain medians[<0|1|2>].
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
328 #To avoid errors in the R-Script, set to 0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
329 if dqavec==str(medians[0]):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
330 dqavec = dqacount="0"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
331 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
332 dqacount=str(readcount[maxallelepergroup[maxkeyDQA]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
333 if dqavec==str(readcount[maxallelepergroup[maxkeyDQA]])+",":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
334 dqavec=str(readcount[maxallelepergroup[maxkeyDQA]])+","+str(readcount[maxallelepergroup[maxkeyDQA]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
335 if dqbvec==str(medians[1]):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
336 dqbvec = dqbcount="0"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
337 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
338 dqbcount=str(readcount[maxallelepergroup[maxkeyDQB]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
339 if dqbvec==str(readcount[maxallelepergroup[maxkeyDQB]])+",":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
340 dqbvec=str(readcount[maxallelepergroup[maxkeyDQB]])+","+str(readcount[maxallelepergroup[maxkeyDQB]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
341 if drbvec==str(medians[2]):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
342 drbvec = drbcount="0"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
343 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
344 drbcount=str(readcount[maxallelepergroup[maxkeyDRB]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
345 if drbvec==str(readcount[maxallelepergroup[maxkeyDRB]])+",":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
346 drbvec=str(readcount[maxallelepergroup[maxkeyDDRB]])+","+str(readcount[maxallelepergroup[maxkeyDRB]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
347 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
348 #in the 1st iteration: remove the top-scoring group from the set <dqa,dqb,drb>vec as this increases the certainty when calculating the probability of the top scoring group being an outlier
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
349 for key in maxallelepergroup:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
350 keyBegin = key[0:3]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
351 if keyBegin=="DQA":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
352 if key!=maxkeyDQA:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
353 dqavec=dqavec+str(readcount[maxallelepergroup[key]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
354 #print key+": "+str(readcount[maxallelepergroup[key]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
355 if keyBegin=="DQB":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
356 if key!=maxkeyDQB:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
357 dqbvec=dqbvec+str(readcount[maxallelepergroup[key]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
358 #print key+": "+str(readcount[maxallelepergroup[key]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
359 if keyBegin=="DRB":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
360 if key!=maxkeyDRB:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
361 drbvec=drbvec+str(readcount[maxallelepergroup[key]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
362 #print key+": "+str(readcount[maxallelepergroup[key]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
363 #2.) DO NOT add the decision thresholds to the sets <dqa,dqb,drb>vec
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
364 if dqavec=="":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
365 dqacount=dqavec="0"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
366 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
367 dqacount=str(readcount[maxallelepergroup[maxkeyDQA]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
368 if dqavec==str(readcount[maxallelepergroup[maxkeyDQA]])+",":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
369 dqavec=str(readcount[maxallelepergroup[maxkeyDQA]])+","+str(readcount[maxallelepergroup[maxkeyDQA]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
370 if len(dqavec.split(","))==2:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
371 dqavec=dqavec+"0,"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
372
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
373 if dqbvec=="":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
374 dqbcount=dqbvec="0"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
375 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
376 dqbcount=str(readcount[maxallelepergroup[maxkeyDQB]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
377 if dqbvec==str(readcount[maxallelepergroup[maxkeyDQB]])+",":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
378 dqbvec=str(readcount[maxallelepergroup[maxkeyDQB]])+","+str(readcount[maxallelepergroup[maxkeyDQB]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
379 if len(dqbvec.split(","))==2:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
380 dqbvec=dqbvec+"0,"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
381
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
382 if drbvec=="":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
383 drbcount=drbvec="0"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
384 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
385 drbcount=str(readcount[maxallelepergroup[maxkeyDRB]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
386 if drbvec==str(readcount[maxallelepergroup[maxkeyDRB]])+",":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
387 drbvec=str(readcount[maxallelepergroup[maxkeyDDRB]])+","+str(readcount[maxallelepergroup[maxkeyDRB]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
388 if len(drbvec.split(","))==2:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
389 drbvec=drbvec+"0,"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
390
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
391 #call R-script "commmand.R" to calculate the confidence of the top-scoring allele
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
392 routput=os.popen("R --vanilla < "+sys.path[0]+"/command.R --args "+dqacount+" "+dqavec+" "+maxkeyDQA+" "+dqbcount+" "+dqbvec+" "+maxkeyDQB+" "+drbcount+" "+drbvec+" "+maxkeyDRB).read()
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
393 parseOutput=routput.split("\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
394
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
395 entries = []
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
396 for entry in parseOutput:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
397 if entry[0:3]=="[1]":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
398 entries.append(str(entry[4:len(entry)]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
399
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
400 if a=="no" and entries[1]!="NA":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
401 entries[1]=str(1-float(entries[1]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
402 if b=="no" and entries[3]!="NA":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
403 entries[3]=str(1-float(entries[3]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
404 if c=="no" and entries[5]!="NA":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
405 entries[5]=str(1-float(entries[5]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
406
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
407 pred2File(entries,medianDQA,medianDQB,medianDRB,output,a,b,c,log)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
408
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
409 #predict HLA class I genotype
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
410 def predictHLA1(sam,medians,output,medianflag,log):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
411
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
412 readspergroupAlist=[]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
413 readspergroupBlist=[]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
414 readspergroupClist=[]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
415 readspergroupA={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
416 readspergroupB={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
417 readspergroupC={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
418 maxallelepergroup={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
419
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
420 #for each allele, to which at least 1 read map, find the allele which has the most reads within a group (2-digit-level, e.g. A*02") and save
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
421 #i) the key of this allele as ambassador for the group => maxallelepergroup holds for each group the top-scoring allele
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
422 #ii) the number of reads mapping to the top-scoring allele => readspergroup
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
423 for key in readcount:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
424 if readcount[key] > 0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
425 group=key.split(":")[0]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
426 if readspergroup[group] <=readcount[key]:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
427 readspergroup[group]=readcount[key]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
428 maxallelepergroup[group]=key
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
429
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
430
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
431 #consider all A-,B-, and C-groups seperately
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
432 #readspergroup<A|B|C>list = list of all reads mapping to the top-scoring groups minus the decision threshold (which is 0 in the first iteration)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
433 #readspergroup<A|B|C> = contains the same entries as the list, but the entries are uniquely accessible via the group-key (e.g. B*27)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
434 for key in readspergroup:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
435 firstKey = key[0]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
436 if key[0]=='A':
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
437 readspergroupAlist.append(readspergroup[key]-float(medians[0]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
438 readspergroupA[key]=readspergroup[key]-float(medians[0])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
439 if key[0]=='B':
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
440 readspergroupBlist.append(readspergroup[key]-float(medians[1]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
441 readspergroupB[key]=readspergroup[key]-float(medians[1])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
442 if key[0]=='C':
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
443 readspergroupClist.append(readspergroup[key]-float(medians[2]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
444 readspergroupC[key]=readspergroup[key]-float(medians[2])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
445
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
446 #compute the decision threshold (median) for homozygosity vs. heterozygosity for the second iteration
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
447 medianA=numpy.median(readspergroupAlist)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
448 medianB=numpy.median(readspergroupBlist)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
449 medianC=numpy.median(readspergroupClist)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
450 a = b = c = ""
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
451
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
452 #Determine top-scoring group of the whole locus (A,B,C) and store it
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
453 #maxkey<A,B,C> = group (e.g. A*02) with the most reads
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
454 #It can be that, e.g. in cancer cells A whole locus is lost. For that reason it is checked if the
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
455 #number of mapping reads of the top-scoring group maxkey<A,B,C> is > 0, otherwise "no" ist reported for this locus
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
456 if len(readspergroupAlist)>0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
457 maxkeyA=max(readspergroupA,key=lambda a:readspergroupA.get(a))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
458 if readspergroupA[maxkeyA] > 0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
459 maxA=maxallelepergroup[maxkeyA]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
460 a = max(readspergroupA,key=lambda a:readspergroupA.get(a))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
461 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
462 a = "no"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
463 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
464 a = "no"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
465
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
466 if len(readspergroupBlist)>0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
467 maxkeyB=max(readspergroupB,key=lambda a:readspergroupB.get(a))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
468 if readspergroupB[maxkeyB] > 0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
469 maxB=maxallelepergroup[maxkeyB]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
470 b = max(readspergroupB,key=lambda a:readspergroupB.get(a))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
471 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
472 b = "no"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
473 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
474 b = "no"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
475
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
476 if len(readspergroupClist)>0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
477 maxkeyC=max(readspergroupC,key=lambda a:readspergroupC.get(a))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
478 if readspergroupC[maxkeyC] > 0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
479 maxC=maxallelepergroup[maxkeyC]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
480 c = max(readspergroupC,key=lambda a:readspergroupC.get(a))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
481 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
482 c = "no"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
483 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
484 c = "no"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
485
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
486 readspergroupA=sorted(readspergroupA.items(), key=itemgetter(1),reverse=True)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
487 readspergroupB=sorted(readspergroupB.items(), key=itemgetter(1),reverse=True)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
488 readspergroupC=sorted(readspergroupC.items(), key=itemgetter(1),reverse=True)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
489
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
490 avec=bvec=cvec=""
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
491
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
492 if medianflag:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
493 #in the 2nd iteration:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
494 #1.) DO NOT remove the top-scoring group from the set <a,b,c>vec as this is more strict when calculating the probability of the top scoring group being an outlier
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
495 #The strings <a,b,c>vec are used by the R-script to calculate the probability of the top scoring group being an outlier
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
496 for key in maxallelepergroup:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
497 if key[0]=="A":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
498 avec=avec+str(readcount[maxallelepergroup[key]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
499 #print key+": "+str(readcount[maxallelepergroup[key]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
500 if key[0]=="B":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
501 bvec=bvec+str(readcount[maxallelepergroup[key]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
502 #print key+": "+str(readcount[maxallelepergroup[key]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
503 if key[0]=="C":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
504 cvec=cvec+str(readcount[maxallelepergroup[key]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
505 #print key+": "+str(readcount[maxallelepergroup[key]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
506 #2.) Add the decision thresholds to the sets <a,b,c>vec, as this enables measuring the distance of the top-scoring group to this distance
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
507 if a=="no":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
508 avec+=str(medians[0])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
509 if b=="no":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
510 bvec+=str(medians[1])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
511 if c=="no":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
512 cvec+=str(medians[2])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
513 #3.) In case of, e.g. a loss of whole HLA-locus (A,B,C), <a,b,c>vec only contain median[<0|1|2>].
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
514 #To avoid errors in the R-Script, set to 0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
515 if avec=="" or avec==medians[0]:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
516 avec = acount="0"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
517 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
518 acount=str(readcount[maxallelepergroup[maxkeyA]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
519
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
520 if bvec=="" or bvec==medians[1]:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
521 bvec = bcount="0"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
522 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
523 bcount=str(readcount[maxallelepergroup[maxkeyB]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
524
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
525 if cvec=="" or cvec==medians[2]:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
526 cvec = ccount="0"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
527 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
528 ccount=str(readcount[maxallelepergroup[maxkeyC]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
529 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
530 #in the 1st iteration: remove the top-scoring group from the set <a,b,c>vec as this increases the certainty when calculating the probability of the top scoring group being an outlier
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
531 for key in maxallelepergroup:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
532 if key[0]=="A" and key!=maxkeyA:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
533 avec=avec+str(readcount[maxallelepergroup[key]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
534 #print key+": "+str(readcount[maxallelepergroup[key]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
535 if key[0]=="B" and key!=maxkeyB:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
536 bvec=bvec+str(readcount[maxallelepergroup[key]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
537 #print key+": "+str(readcount[maxallelepergroup[key]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
538 if key[0]=="C" and key!=maxkeyC:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
539 cvec=cvec+str(readcount[maxallelepergroup[key]])+","
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
540 #print key+": "+str(readcount[maxallelepergroup[key]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
541 #2.) DO NOT add the decision thresholds to the sets <a,b,c>vec
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
542 if avec=="":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
543 acount=avec="0"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
544 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
545 acount=str(readcount[maxallelepergroup[maxkeyA]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
546 if bvec=="":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
547 bcount=bvec="0"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
548 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
549 bcount=str(readcount[maxallelepergroup[maxkeyB]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
550 if cvec=="":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
551 ccount=cvec="0"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
552 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
553 ccount=str(readcount[maxallelepergroup[maxkeyC]])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
554
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
555 #call R-script "commmand.R" to calculate the confidence of the top-scoring allele
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
556 routput=os.popen("R --vanilla < "+sys.path[0]+"/command.R --args "+acount+" "+avec+" "+maxkeyA+" "+bcount+" "+bvec+" "+maxkeyB+" "+ccount+" "+cvec+" "+maxkeyC).read()
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
557 parseOutput=routput.split("\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
558
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
559 entries = []
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
560 for entry in parseOutput:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
561 if entry[0:3]=="[1]":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
562 entries.append(str(entry[4:len(entry)]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
563
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
564 if a=="no" and entries[1]!="NA":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
565 entries[1]=str(1-float(entries[1]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
566 if b=="no" and entries[3]!="NA":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
567 entries[3]=str(1-float(entries[3]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
568 if c=="no" and entries[5]!="NA":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
569 entries[5]=str(1-float(entries[5]))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
570
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
571 pred2File(entries,medianA,medianB,medianC,output,a,b,c,log)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
572
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
573 #write digital haplotype into file
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
574 def pred2File(entries,medianA,medianB,medianC,output,a,b,c,log):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
575 out = open(output, 'w')
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
576 out.write("HLA\tHLA1\tmedian-Value\talternative\tp-Value\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
577 out.write("A\t"+a+"\t")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
578 # write the median-Value for A
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
579 out.write(str(medianA)+"\t")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
580 out.write(entries[0]+"\t"+entries[1]+"\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
581
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
582 out.write("B\t"+b+"\t")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
583 # write the median-Value for B
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
584 out.write(str(medianB)+"\t")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
585 out.write(entries[2]+"\t"+entries[3]+"\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
586
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
587 out.write("C\t"+c+"\t")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
588 # write the median-Value for C
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
589 out.write(str(medianC)+"\t")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
590 out.write(entries[4]+"\t"+entries[5]+"\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
591 out.close()
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
592 log.write("The digital haplotype is written into "+output+"\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
593
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
594 #open mapping file and all read ids to the list "removeList", which map to one of the three groups in "alleles"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
595 def createRemoveList(runName,map):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
596 removeList={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
597 alleles = []
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
598 sam=runName+"-iteration1.bam"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
599 alleles_in=runName+".digitalhaplotype1"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
600 for i in range(2,5):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
601 alleles.append(linecache.getline(alleles_in, i).split('\t', 2)[1])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
602
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
603 samhandle=pysam.Samfile(sam,'rb')
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
604 for read in samhandle.fetch():
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
605 illuminaid = read.qname
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
606 hlapseudoname = samhandle.getrname(read.tid)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
607 if map[hlapseudoname].split(':')[0] in alleles:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
608 removeList[illuminaid]=1
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
609 samhandle.close()
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
610 return removeList
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
611
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
612 #Remove reads that mapped to the three top-scoring alleles and write the remaining reads into two new read files
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
613 def removeReads(runName,removeList):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
614 aligned1=runName+"_1.aligned"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
615 aligned2=runName+"_2.aligned"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
616 newReadFile1=runName+"-2nditeration_1.fq"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
617 newReadFile2=runName+"-2nditeration_2.fq"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
618 #r1 and r2, which are the input of bowtie in the 2nd iteration
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
619 r1=open(newReadFile1,"w")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
620 r2=open(newReadFile2,"w")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
621 #open the 2 files, that contain the reads that mapped in the 1st iteration
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
622 aligned_handle1=open(aligned1,"r")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
623 aligned_handle2=open(aligned2,"r")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
624
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
625 #One read entry consists of 4 lines: header, seq, "+", qualities.
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
626 for record in SeqIO.parse(aligned_handle1, "fastq"):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
627 illuminaid=record.id.split('/')[0].split(' ')[0]#find exact id, which also appears in the mapping file
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
628 if not illuminaid in removeList:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
629 SeqIO.write(record, r1, "fastq")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
630
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
631 for record in SeqIO.parse(aligned_handle2, "fastq"):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
632 illuminaid=record.id.split('/')[0].split(' ')[0]#find exact id, which also appears in the mapping file
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
633 if not illuminaid in removeList:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
634 SeqIO.write(record, r2, "fastq")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
635
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
636 #write the final prediction (both digital haplotypes) to file and stdout
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
637 def reportHLAgenotype(output1,output2,finaloutput):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
638 filehandle1=open(output1,'r').readlines()[1:4]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
639 filehandle2=open(output2,'r').readlines()[1:4]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
640 outfile=open(finaloutput, 'w')
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
641
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
642 outfile.write("HLA\tHLA1\tp-Value\tHLA2\tp-Value\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
643 for i in range(len(filehandle1)):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
644 filehandle1[i]=filehandle1[i][0:-1]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
645 for i in range(len(filehandle2)):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
646 filehandle2[i]=filehandle2[i][0:-1]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
647
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
648 a1 = filehandle1[0].split('\t',2)[1]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
649 a1score = filehandle1[0].split('\t')[4]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
650 b1 = filehandle1[1].split('\t',2)[1]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
651 b1score = filehandle1[1].split('\t')[4]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
652 c1 = filehandle1[2].split('\t',2)[1]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
653 c1score = filehandle1[2].split('\t')[4]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
654
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
655 a2 = filehandle2[0].split('\t',2)[1]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
656 if a2 == "no":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
657 a2 = "hoz("+filehandle2[0].split('\t')[3]+")"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
658 a2score = filehandle2[0].split('\t')[4]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
659 b2 = filehandle2[1].split('\t',2)[1]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
660 if b2 == "no":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
661 b2 = "hoz("+filehandle2[1].split('\t')[3]+")"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
662 b2score = filehandle2[1].split('\t')[4]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
663 c2 = filehandle2[2].split('\t', 2)[1]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
664 if c2 == "no":
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
665 c2 = "hoz("+filehandle2[2].split('\t')[3]+")"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
666 c2score = filehandle2[2].split('\t')[4]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
667 #write complete HLA genotype to file
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
668 outfile.write("A\t"+a1+"\t"+a1score+"\t"+a2+"\t"+a2score+"\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
669 outfile.write("B\t"+b1+"\t"+b1score+"\t"+b2+"\t"+b2score+"\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
670 outfile.write("C\t"+c1+"\t"+c1score+"\t"+c2+"\t"+c2score+"\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
671 #.. and print it to STDOUT
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
672 #print "A\t"+a1+"\t"+a1score+"\t"+a2+"\t"+a2score
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
673 #print "B\t"+b1+"\t"+b1score+"\t"+b2+"\t"+b2score
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
674 #print "C\t"+c1+"\t"+c1score+"\t"+c2+"\t"+c2score
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
675
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
676 #calculate locus-specific expression
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
677 def expression(locus1,locus2,locus3,length1,length2,length3,map,runName,readFile1,finaloutput,gzipped):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
678 outfile = open(finaloutput,'a')
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
679 aligned1=runName+"_1.aligned"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
680 sam=runName+"-iteration1.bam"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
681 alleles_in=runName+".digitalhaplotype1"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
682 if gzipped:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
683 proc=subprocess.Popen(["/bin/bash","-c","wc -l <(zcat " + readFile1 + ")"],stdout=subprocess.PIPE,stderr=subprocess.PIPE)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
684 (stdoutdata,stderrdata) = proc.communicate()
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
685 numberlines = stdoutdata
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
686 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
687 proc=subprocess.Popen(["/bin/bash","-c","wc -l " + readFile1],stdout=subprocess.PIPE,stderr=subprocess.PIPE)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
688 (stdoutdata,stderrdata) = proc.communicate()
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
689 numberlines = stdoutdata
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
690 totalreads=int(numberlines.split(" ")[0])/4
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
691 alleles=[]
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
692 for i in range(2,5):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
693 alleles.append(linecache.getline(alleles_in, i).split('\t', 2)[1])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
694 alleles.append(linecache.getline(alleles_in, i).split('\t')[3])
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
695
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
696 #create read dictionary
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
697 reads={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
698 aligned_handle1=open(aligned1,"r")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
699 for record in SeqIO.parse(aligned_handle1, "fastq"):
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
700 illuminaid=record.id.split('/')[0].split(' ')[0]#find exact id, which also appears in the mapping file
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
701 reads[illuminaid]={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
702 reads[illuminaid][locus1]=0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
703 reads[illuminaid][locus2]=0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
704 reads[illuminaid][locus3]=0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
705 samhandle = pysam.Samfile(sam,"rb")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
706 for read in samhandle.fetch():
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
707 illuminaid = read.qname
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
708 hlapseudoname = samhandle.getrname(read.tid)
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
709 if map[hlapseudoname].split(':')[0] in alleles:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
710 reads[illuminaid][map[hlapseudoname].split('*')[0]]+=1
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
711
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
712 count={}
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
713 count[locus1]=0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
714 count[locus2]=0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
715 count[locus3]=0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
716 for key in reads:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
717 n=0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
718 for locus in reads[key]:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
719 if reads[key][locus] > 0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
720 n+=1
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
721 for locus in reads[key]:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
722 if reads[key][locus] > 0:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
723 count[locus]+=float(1.0/float(n))
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
724
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
725 outfile.write("\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
726 #Calculate RPKM and print expression values for each locus to stdout
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
727 for locus in count:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
728 if locus==locus1:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
729 outfile.write(locus+": "+str(round(float((1000.0/length1))*float((1000000.0/totalreads))*count[locus],2))+" RPKM\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
730 if locus==locus2:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
731 outfile.write(locus+": "+str(round(float((1000.0/length2))*float((1000000.0/totalreads))*count[locus],2))+" RPKM\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
732 if locus==locus3:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
733 outfile.write(locus+": "+str(round(float((1000.0/length3))*float((1000000.0/totalreads))*count[locus],2))+" RPKM\n")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
734
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
735
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
736 if __name__ == '__main__':
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
737 parser = OptionParser(usage="usage: %prog -1 readFile1 -2 readFile2 -r runName -l readlength [-3 <int>]", version="%prog 1.0")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
738 parser.add_option("-z","--gzipped",action="store",dest="gzipped",help="Select if input reads are in gzip format")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
739 parser.add_option("-1",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
740 action="store",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
741 dest="readFile1",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
742 help="File name of #1 mates ")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
743 parser.add_option("-2",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
744 action="store",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
745 dest="readFile2",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
746 help="File name of #2 mates")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
747 parser.add_option("-r", "--runName",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
748 action="store",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
749 dest="runName",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
750 help="Name of this HLA typing run. Wil be used throughout this process as part of the name of the newly created files.")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
751 parser.add_option("-l", "--length",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
752 action="store",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
753 dest="length",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
754 help="Readlength")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
755 parser.add_option("-3", "--trim3",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
756 action="store",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
757 dest="trim3",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
758 default="0",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
759 help="Bowtie option: -3 <int> trims <int> bases from the low quality 3' end of each read. Default: 0")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
760 parser.add_option("-o", "--output1",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
761 action="store",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
762 dest="output1",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
763 help="Output file 1")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
764 parser.add_option("-p", "--output2",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
765 action="store",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
766 dest="output2",
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
767 help="Output file 2")
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
768 parser.add_option("-g","--log",action="store",dest="logfile",help="Output Log File")
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
769 parser.add_option("-t","--threads",type="int",dest="num_threads",help="Define amount of threads to use for Bowtie in this script")
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
770
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
771
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
772 (options, args) = parser.parse_args()
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
773 if not options.readFile1:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
774 parser.error('File name #1 pair not given.')
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
775 if not options.readFile2:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
776 parser.error('File name #2 pair not given.')
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
777 if not options.runName:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
778 parser.error('Run name not given.')
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
779 if not options.length:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
780 parser.error('Readlength not given.')
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
781 gzipped = options.gzipped
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
782 readFile1=options.readFile1
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
783 readFile2=options.readFile2
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
784 runName=options.runName
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
785 output1 = options.output1
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
786 output2 = options.output2
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
787 logfile = options.logfile
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
788 bowtiebuildClassI=sys.path[0]+"/references/ClassIWithoutNQex2-3.plus75"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
789 bowtiebuildClassII=sys.path[0]+"/references/HLA2.ex2.plus75"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
790 fastaClassI=sys.path[0]+"/references/ClassIWithoutNQex2-3.plus75.fasta"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
791 fastaClassII=sys.path[0]+"/references/HLA2.ex2.plus75.fasta"
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
792 #as shown in the publication HLA typing with RNA-Seq works best by allowing as less mismatches as necessary
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
793 if int(options.length)<=50:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
794 mismatch=1
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
795 elif int(options.length)>50 and int(options.length)<=100:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
796 mismatch=2
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
797 else:
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
798 mismatch=3
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
799 trim3=str(options.trim3)
1
155b796033b6 Uploaded
sebastian-boegel
parents: 0
diff changeset
800 main(runName,readFile1,readFile2,fastaClassI,fastaClassII,bowtiebuildClassI,bowtiebuildClassII,mismatch,trim3,output1,output2,logfile,gzipped,options.num_threads)
0
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
801 sys.stderr = log_stderr
913ea6991ee4 Uploaded
sebastian-boegel
parents:
diff changeset
802 log_file.close()