annotate query_metexp.py @ 21:19d8fd10248e

* Added interface to METEXP data store, including tool to fire queries in batch mode
* Improved quantification output files of MsClust, a.o. sorting mass list based on intensity (last two columns of quantification files)
* Added Molecular Mass calculation method
author pieter.lukasse@wur.nl
date Wed, 05 Mar 2014 17:20:11 +0100
parents
children cd4f13119afa
#!/usr/bin/env python
# encoding: utf-8
'''
Module to query a set of identifications against the METabolomics EXPlorer database.

It will take the input file and, for each record, query the selected MetExp DB
by CAS number, then by molecular formula, then by molecular mass, stopping at the
first query that returns hits. If one or more compounds are found in the
MetExp DB then extra information regarding these compounds is added to the output file.

The output file is thus the input file enriched with information about
related items found in the selected MetExp DB.
'''
import csv
import sys
import fileinput
import urllib2
from collections import OrderedDict

__author__ = "Pieter Lukasse"
__contact__ = "pieter.lukasse@wur.nl"
__copyright__ = "Copyright, 2014, Plant Research International, WUR"
__license__ = "Apache v2"

def _process_file(in_xsv, delim='\t'):
    '''
    Generic method to parse a tab-separated file returning a dictionary with named columns
    @param in_xsv: input filename to be parsed
    @param delim: column delimiter (defaults to tab)
    '''
    # Close the file handle as soon as all rows have been read:
    with open(in_xsv, 'rU') as in_file:
        data = list(csv.reader(in_file, delimiter=delim))
    return _process_data(data)

def _process_data(data):
    '''
    Convert a list of rows (header row first) into an ordered dictionary that maps
    each column name to the list of values in that column.
    Note: the header row is removed from the given list.
    '''
    header = data.pop(0)
    # Create dictionary with column name as key
    output = OrderedDict()
    for index in xrange(len(header)):
        output[header[index]] = [row[index] for row in data]
    return output


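To illustrate the column-oriented structure that `_process_data` returns, here is a minimal standalone sketch. It is written for Python 3, whereas the module itself targets Python 2, and unlike the original it does not modify the list it is given. The name `rows_to_columns` and the sample rows are made up for this example.

```python
from collections import OrderedDict


def rows_to_columns(rows):
    """Turn a list of rows (header first) into an ordered dict of column lists."""
    header, records = rows[0], rows[1:]
    columns = OrderedDict()
    for index, name in enumerate(header):
        columns[name] = [record[index] for record in records]
    return columns


# Hypothetical sample data, not taken from MetExp:
sample = [
    ["CAS", "FORMULA", "MM"],
    ["50-00-0", "CH2O", "30.026"],
    ["undef", "C6H12O6", "180.156"],
]
columns = rows_to_columns(sample)
```

Each key holds the values of one column, so record `i` is recovered by taking element `i` from every list, which is what `_query_and_add_data` does.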
def _query_and_add_data(input_data, casid_col, formula_col, molecular_mass_col, metexp_dblink, separation_method):
    '''
    This method will iterate over the records in the input_data and
    will enrich them with the related information found (if any) in the
    MetExp Database.
    '''
    merged = []

    for i in xrange(len(input_data[input_data.keys()[0]])):
        # Get the record in same dictionary format as input_data, but containing
        # a value at each column instead of a list of all values of all records:
        input_data_record = OrderedDict(zip(input_data.keys(), [input_data[key][i] for key in input_data.keys()]))

        # read the CAS id, formula and molecular mass:
        cas_id = input_data_record[casid_col]
        formula = input_data_record[formula_col]
        molecular_mass = input_data_record[molecular_mass_col]

        # search for related records in MetExp:
        data_found = None
        if cas_id != "undef":
            # 1- search for other experiments where this CAS id has been found:
            query_link = metexp_dblink + "/find_entries/query?cas_nr=" + cas_id + "&method=" + separation_method
            data_found = _fire_query_and_return_dict(query_link + "&_format_result=tsv")
            data_type_found = "CAS"
        if data_found is None:
            # 2- search for other experiments where this FORMULA has been found:
            query_link = metexp_dblink + "/find_entries/query?molecule_formula=" + formula + "&method=" + separation_method
            data_found = _fire_query_and_return_dict(query_link + "&_format_result=tsv")
            data_type_found = "FORMULA"
        if data_found is None:
            # 3- search for other experiments where this MM has been found:
            query_link = metexp_dblink + "/find_entries/query?molecule_mass=" + molecular_mass + "&method=" + separation_method
            data_found = _fire_query_and_return_dict(query_link + "&_format_result=tsv")
            data_type_found = "MM"

        if data_found is None:
            # If still nothing found, just add empty columns
            extra_cols = ['', '', '', '', '', '', '', '']
        else:
            # Add info found:
            extra_cols = _get_extra_info_and_link_cols(data_found, data_type_found, query_link)

        # Take all data and merge it into a "flat"/simple array of values:
        field_values_list = _merge_data(input_data_record, extra_cols)

        merged.append(field_values_list)

    # return the merged/enriched records:
    return merged


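The lookup order used above (CAS number first, then formula, then molecular mass) can be tried out without a MetExp server. The following is a rough sketch for Python 3, not part of the module: `build_query_links`, `first_hit`, `fake_fetch` and the `example.org` base URL are hypothetical, and `urlencode` is used here to percent-escape the values, whereas the module concatenates them unescaped.

```python
from urllib.parse import urlencode


def build_query_links(base_url, cas_id, formula, molecular_mass, method):
    """Return (data_type, url) pairs in the order the lookup tries them."""
    attempts = []
    if cas_id != "undef":
        attempts.append(("CAS", {"cas_nr": cas_id}))
    attempts.append(("FORMULA", {"molecule_formula": formula}))
    attempts.append(("MM", {"molecule_mass": molecular_mass}))

    links = []
    for data_type, params in attempts:
        params["method"] = method
        # urlencode escapes the values; the module builds the string by hand.
        links.append((data_type, base_url + "/find_entries/query?" + urlencode(params)))
    return links


def first_hit(links, fetch):
    """Call fetch(url) for each link and return the first non-None result."""
    for data_type, url in links:
        found = fetch(url)
        if found is not None:
            return data_type, url, found
    return None


def fake_fetch(url):
    # Stand-in for the web-service call: only the mass query "finds" something.
    return {"organism": ["tomato"]} if "molecule_mass" in url else None


# Hypothetical base URL and record values:
links = build_query_links("http://example.org/metexp", "undef", "CH2O", "30.026", "GC")
hit = first_hit(links, fake_fetch)
```

Passing the fetch function in as a parameter keeps the fallback logic testable without network access.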
def _get_extra_info_and_link_cols(data_found, data_type_found, query_link):
    '''
    This method will go over the data found and will return a
    list with the following items:
    - The type of query that produced the hits ("CAS", "FORMULA" or "MM")
    - Experiment details where hits have been found:
        'organism', 'tissue', 'experiment_name', 'user_name', 'column_type'
    - CAS numbers of the hits
    - Link that executes same query
    '''
    # Default to empty collections for columns missing from the result;
    # set() keeps only the unique values of a column:
    organism_set = []
    tissue_set = []
    experiment_name_set = []
    user_name_set = []
    column_type_set = []
    cas_nr_set = []

    if 'organism' in data_found:
        organism_set = set(data_found['organism'])
    if 'tissue' in data_found:
        tissue_set = set(data_found['tissue'])
    if 'experiment_name' in data_found:
        experiment_name_set = set(data_found['experiment_name'])
    if 'user_name' in data_found:
        user_name_set = set(data_found['user_name'])
    if 'column_type' in data_found:
        column_type_set = set(data_found['column_type'])
    if 'CAS' in data_found:
        cas_nr_set = set(data_found['CAS'])

    result = [data_type_found,
              _to_xsv(organism_set),
              _to_xsv(tissue_set),
              _to_xsv(experiment_name_set),
              _to_xsv(user_name_set),
              _to_xsv(column_type_set),
              _to_xsv(cas_nr_set),
              # To let Excel interpret as link, use e.g. =HYPERLINK("http://stackoverflow.com", "friendly name"):
              "=HYPERLINK(\"" + query_link + "\", \"Link to entries found in DB \")"]
    return result


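As a rough illustration of the eight extra columns built above, the sketch below summarises a small, made-up result for Python 3. The name `summarise_hits` and the sample values are hypothetical, and `sorted()` is added only to make the order predictable; the module iterates over the sets directly, so its order may differ.

```python
def summarise_hits(data_found, data_type_found, query_link):
    """Build the extra columns: query type, six "|"-terminated value lists, Excel link."""
    fields = ["organism", "tissue", "experiment_name", "user_name", "column_type", "CAS"]
    row = [data_type_found]
    for field in fields:
        # Columns missing from the result yield an empty string.
        unique_values = sorted(set(data_found.get(field, [])))
        row.append("".join(value + "|" for value in unique_values))
    row.append('=HYPERLINK("%s", "Link to entries found in DB ")' % query_link)
    return row


# Hypothetical query result and link:
hits = {"organism": ["tomato", "tomato", "potato"], "tissue": ["leaf"]}
row = summarise_hits(hits, "CAS", "http://example.org/q")
```

The row always has eight entries, matching the eight empty strings used when nothing is found.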
def _to_xsv(data_set):
    '''
    Join the items of data_set into a single string, appending a "|" after
    every item (so a non-empty result ends with a trailing "|").
    '''
    result = ""
    for item in data_set:
        result = result + str(item) + "|"
    return result


def _fire_query_and_return_dict(url):
    '''
    This method will fire the query as a web-service call and
    return the results as a dictionary with the column names as keys
    and the list of column values as values, or None if there are no hits
    '''

    try:
        data = urllib2.urlopen(url).read()

        # transform to dictionary:
        result = []
        data_rows = data.split("\n")

        # check if there is any data in the response:
        if len(data_rows) <= 1 or data_rows[1].strip() == '':
            # means there is only the header row...so no hits:
            return None

        for data_row in data_rows:
            if not data_row.strip() == '':
                row_as_list = _str_to_list(data_row, delimiter='\t')
                result.append(row_as_list)

        # return result processed into a dict:
        return _process_data(result)

    except urllib2.HTTPError as e:
        raise Exception("HTTP error for URL: %s : %s - %s" % (url, e.code, e.reason))
    except urllib2.URLError as e:
        # e.reason may be a plain string or an exception with a single argument,
        # so format it as a whole instead of indexing into its args:
        raise Exception("Network error: %s. Administrator: please check if MetExp service [%s] is accessible from your Galaxy server. " % (e.reason, url))

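The parsing half of `_fire_query_and_return_dict` can be exercised without a network call. Below is a simplified sketch for Python 3 that takes the response text directly. The name `parse_tsv_response` and the sample responses are assumptions for illustration, and the "no hits" check is simplified: blank lines are dropped first, whereas the module inspects the second raw line.

```python
def parse_tsv_response(text):
    """Parse a tab-separated response body; return None when only a header came back."""
    rows = [line.split("\t") for line in text.split("\n") if line.strip() != ""]
    if len(rows) <= 1:
        # Only the header row (or nothing at all), so there are no hits.
        return None
    header, records = rows[0], rows[1:]
    return {name: [record[i] for record in records] for i, name in enumerate(header)}


# Hypothetical response bodies:
no_hits = parse_tsv_response("organism\ttissue\n")
two_hits = parse_tsv_response("organism\ttissue\ntomato\tleaf\npotato\ttuber\n")
```

Returning `None` for an empty result is what lets the caller fall through to the next query type.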
175 def _str_to_list(data_row, delimiter='\t'):
19d8fd10248e * Added interface to METEXP data store, including tool to fire queries in batch mode
pieter.lukasse@wur.nl
parents:
diff changeset
176 result = []
19d8fd10248e * Added interface to METEXP data store, including tool to fire queries in batch mode
pieter.lukasse@wur.nl
parents:
diff changeset
177 for column in data_row.split(delimiter):
19d8fd10248e * Added interface to METEXP data store, including tool to fire queries in batch mode
pieter.lukasse@wur.nl
parents:
diff changeset
178 result.append(column)
19d8fd10248e * Added interface to METEXP data store, including tool to fire queries in batch mode
pieter.lukasse@wur.nl
parents:
diff changeset
179 return result
19d8fd10248e * Added interface to METEXP data store, including tool to fire queries in batch mode
pieter.lukasse@wur.nl
parents:
diff changeset
180
19d8fd10248e * Added interface to METEXP data store, including tool to fire queries in batch mode
pieter.lukasse@wur.nl
parents:
diff changeset
181
# alternative: ?
# s = requests.Session()
# s.verify = False
# #s.auth = (token01, token02)
# resp = s.get(url, params={'name': 'anonymous'}, stream=True)
# content = resp.content
# # transform to dictionary:


def _merge_data(input_data_record, extra_cols):
    '''
    Adds the extra information to the existing data record and returns
    the combined new record.
    '''
    record = []
    for column in input_data_record:
        record.append(input_data_record[column])

    # add extra columns
    for column in extra_cols:
        record.append(column)

    return record


def _save_data(data_rows, headers, out_csv):
    '''
    Writes tab-separated data to file
    @param data_rows: rows (lists of column values) of the merged/enriched dataset
    @param headers: column names, written as the first line
    @param out_csv: output csv file
    '''

    # Open output file for writing ('wb' is what the Python 2 csv module expects)
    outfile_single_handle = open(out_csv, 'wb')
    try:
        output_single_handle = csv.writer(outfile_single_handle, delimiter="\t")

        # Write headers
        output_single_handle.writerow(headers)

        # Write one line for each row
        for data_row in data_rows:
            output_single_handle.writerow(data_row)
    finally:
        # always close the handle so that the output is flushed to disk
        outfile_single_handle.close()

def _get_metexp_URL(metexp_dblink_file):
    '''
    Read out and return the URL stored in the given file.
    '''
    file_input = fileinput.input(metexp_dblink_file)
    try:
        for line in file_input:
            # skip empty lines and comment lines
            if line.strip() and line[0] != '#':
                # just return the first line that is not a comment line,
                # without its trailing newline:
                return line.strip()
    finally:
        file_input.close()


def main():
    '''
    MetExp Query main function

    The input file can be any tabular file, as long as it contains a column for the molecular mass
    and one for the formula of the respective identification. These two columns are then
    used to query against the MetExp database.
    '''
    input_file = sys.argv[1]
    casid_col = sys.argv[2]
    formula_col = sys.argv[3]
    molecular_mass_col = sys.argv[4]
    metexp_dblink_file = sys.argv[5]
    separation_method = sys.argv[6]
    output_result = sys.argv[7]

    # Parse metexp_dblink_file to find the URL to the MetExp service:
    metexp_dblink = _get_metexp_URL(metexp_dblink_file)

    # Parse tabular input file into dictionary/array:
    input_data = _process_file(input_file)

    # Query data against MetExp DB:
    enriched_data = _query_and_add_data(input_data, casid_col, formula_col, molecular_mass_col, metexp_dblink, separation_method)
    headers = input_data.keys() + ['METEXP hits for ', 'METEXP hits: organisms', 'METEXP hits: tissues',
                                   'METEXP hits: experiments', 'METEXP hits: user names', 'METEXP hits: column types', 'METEXP hits: CAS nrs', 'Link to METEXP hits']

    _save_data(enriched_data, headers, output_result)


if __name__ == '__main__':
    main()
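
The link-file lookup done by `_get_metexp_URL` can be sketched in Python 3 as follows. This is only an illustration under assumptions: the function name `read_first_url`, the sample file content and the placeholder URL are hypothetical and not part of the module, and the sketch uses a plain `open` instead of `fileinput`. It assumes the link file holds optional `#` comment lines followed by the service URL on a line of its own.

```python
import os
import tempfile


def read_first_url(path):
    """Return the first non-empty, non-comment line of the file, stripped."""
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith('#'):
                return stripped
    return None  # no URL line found


# Hypothetical link file content; the URL is a placeholder.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('# MetExp service location\n\nhttp://example.org/metexp\n')

url = read_first_url(tmp.name)
os.remove(tmp.name)
print(url)
```

Stripping the line matters because a trailing newline would otherwise end up inside any query URL built from the returned value.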