# Export Metabolic Model

Export tabular data (CSV/TSV) into COBRA metabolic models in various formats.

## Overview

Export Metabolic Model (exportMetabolicModel) converts structured tabular data containing reaction information into fully functional COBRA metabolic models. This tool enables creation of custom models from spreadsheet data and supports multiple output formats including SBML, JSON, MATLAB, and YAML.

## Usage

### Command Line

```bash
exportMetabolicModel --input model_data.csv \
  --format sbml \
  --output custom_model.xml \
  --out_log conversion.log \
  --tool_dir /path/to/COBRAxy/src
```

### Galaxy Interface

Select "Export Metabolic Model" from the COBRAxy tool suite and configure conversion parameters.

## Parameters

### Required Parameters

| Parameter | Flag | Description |
|-----------|------|-------------|
| Input File | `--input` | Tabular file (CSV/TSV) with model data |
| Output Format | `--format` | Model format (sbml, json, mat, yaml) |
| Output File | `--output` | Output model file path |
| Output Log | `--out_log` | Log file for conversion process |

### Optional Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Tool Directory | `--tool_dir` | COBRAxy installation directory | Current directory |

## Input Format

### Tabular Model Data

The input file must contain structured model information with the following columns:

```csv
Reaction_ID,GPR_Rule,Reaction_Formula,Lower_Bound,Upper_Bound,Objective_Coefficient,Medium_Member,Compartment,Subsystem
R00001,GENE1 or GENE2,A + B -> C + D,-1000.0,1000.0,0.0,FALSE,cytosol,Glycolysis
R00002,GENE3 and GENE4,E <-> F,-1000.0,1000.0,0.0,FALSE,mitochondria,TCA_Cycle
EX_glc_e,-,glc_e <->,-1000.0,1000.0,0.0,TRUE,extracellular,Exchange
BIOMASS,GENE5,0.5 A + 0.3 B -> 1 BIOMASS,0.0,1000.0,1.0,FALSE,cytosol,Biomass
```

### Required Columns

| Column | Description | Format |
|--------|-------------|--------|
| **Reaction_ID** | Unique reaction identifier | String |
| **Reaction_Formula** | Stoichiometric equation | Metabolite notation |
| **Lower_Bound** | Minimum flux constraint | Numeric |
| **Upper_Bound** | Maximum flux constraint | Numeric |

### Optional Columns

| Column | Description | Default |
|--------|-------------|---------|
| **GPR_Rule** | Gene-protein-reaction association | Empty string |
| **Objective_Coefficient** | Biomass/objective weight | 0.0 |
| **Medium_Member** | Exchange reaction flag | FALSE |
| **Compartment** | Subcellular location | Empty |
| **Subsystem** | Metabolic pathway | Empty |
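
As a rough illustration of the two tables above, the following sketch checks a header for the required columns and fills in the documented defaults for optional ones. The helper name `load_reaction_rows` and the choice to treat an empty cell like a missing column are assumptions made for this example, not a description of the tool's internals.

```python
import csv
import io

# Column names and defaults are copied from the tables above.
REQUIRED = ("Reaction_ID", "Reaction_Formula", "Lower_Bound", "Upper_Bound")
OPTIONAL_DEFAULTS = {
    "GPR_Rule": "",
    "Objective_Coefficient": "0.0",
    "Medium_Member": "FALSE",
    "Compartment": "",
    "Subsystem": "",
}


def load_reaction_rows(text, delimiter=","):
    """Parse tabular text, check required columns, fill optional defaults.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        raise ValueError(f"Missing required columns: {', '.join(missing)}")
    rows = []
    for row in reader:
        for name, default in OPTIONAL_DEFAULTS.items():
            # Assumption: an absent column or an empty cell means "use the default".
            if not row.get(name):
                row[name] = default
        rows.append(row)
    return rows


sample = (
    "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
    "R00002,E <-> F,-1000.0,1000.0\n"
)
rows = load_reaction_rows(sample)
print(rows[0]["Medium_Member"])
```

For a TSV file, pass `delimiter="\t"`.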

## Output Formats

### SBML (Systems Biology Markup Language)
- **Format**: XML-based standard
- **Extension**: `.xml` or `.sbml`
- **Use Case**: Interoperability with other tools
- **Advantages**: Widely supported, standardized

### JSON (JavaScript Object Notation)
- **Format**: COBRApy native JSON
- **Extension**: `.json`
- **Use Case**: Python/COBRApy workflows
- **Advantages**: Human-readable, lightweight

### MATLAB (.mat)
- **Format**: MATLAB workspace format
- **Extension**: `.mat`
- **Use Case**: MATLAB COBRA Toolbox
- **Advantages**: Direct MATLAB compatibility

### YAML (YAML Ain't Markup Language)
- **Format**: Human-readable data serialization
- **Extension**: `.yml` or `.yaml`
- **Use Case**: Configuration and documentation
- **Advantages**: Most human-readable format
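
For orientation, each of these formats corresponds to a writer function in COBRApy's `cobra.io` module. The sketch below pairs the `--format` values with those writers and a conventional extension. The helper names `output_name` and `save_model` are hypothetical, and whether the tool dispatches this way internally is an assumption.

```python
# --format value -> (cobra.io writer name, conventional extension)
FORMATS = {
    "sbml": ("write_sbml_model", ".xml"),
    "json": ("save_json_model", ".json"),
    "mat": ("save_matlab_model", ".mat"),
    "yaml": ("save_yaml_model", ".yaml"),
}


def output_name(stem, fmt):
    """Return a file name with the conventional extension for fmt."""
    if fmt not in FORMATS:
        raise ValueError(f"Unknown format: {fmt!r}; use one of {sorted(FORMATS)}")
    return stem + FORMATS[fmt][1]


def save_model(model, stem, fmt):
    """Write a cobra.Model in the requested format and return the path."""
    path = output_name(stem, fmt)
    import cobra.io  # imported lazily so output_name works without COBRApy

    getattr(cobra.io, FORMATS[fmt][0])(model, path)
    return path


print(output_name("liver_model", "sbml"))
```

Keeping the mapping in one dictionary means an unsupported format is rejected before anything is written.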

## Reaction Formula Syntax

### Standard Notation
```
# Irreversible reaction
A + B -> C + D

# Reversible reaction
A + B <-> C + D

# With stoichiometric coefficients
2 A + 3 B -> 1 C + 4 D

# Compartmentalized metabolites
glc_c + atp_c -> g6p_c + adp_c
```

### Compartment Suffixes
- `_c`: Cytosol
- `_m`: Mitochondria
- `_e`: Extracellular
- `_r`: Endoplasmic reticulum
- `_x`: Peroxisome
- `_n`: Nucleus

### Exchange Reactions
```
# Import reaction
EX_glc_e: glc_e <->

# Export reaction
EX_co2_e: co2_e <->
```
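
To make the notation above concrete, here is a minimal parser sketch that turns a formula into a stoichiometry dictionary (negative for substrates, positive for products) and looks up the compartment from the metabolite suffix. It assumes terms are separated by ` + ` and that a coefficient is separated from its metabolite by whitespace; the tool's own parser may accept more variants. The function names are hypothetical.

```python
import re

# Suffix table copied from the "Compartment Suffixes" list above.
COMPARTMENTS = {
    "c": "Cytosol", "m": "Mitochondria", "e": "Extracellular",
    "r": "Endoplasmic reticulum", "x": "Peroxisome", "n": "Nucleus",
}
TERM = re.compile(r"(?:(\d+(?:\.\d+)?)\s+)?(\S+)")


def parse_formula(formula):
    """Return (stoichiometry, reversible) for a formula such as 'A + B -> C'."""
    if "<->" in formula:
        left, right = formula.split("<->", 1)
        reversible = True
    elif "->" in formula:
        left, right = formula.split("->", 1)
        reversible = False
    else:
        raise ValueError(f"No reaction arrow in: {formula!r}")

    stoichiometry = {}
    for side, sign in ((left, -1.0), (right, 1.0)):
        for term in side.split(" + "):
            term = term.strip()
            if not term:
                continue  # empty side, as in the exchange reaction 'glc_e <->'
            match = TERM.fullmatch(term)
            if match is None:
                raise ValueError(f"Cannot parse term: {term!r}")
            coefficient = float(match.group(1) or 1.0)
            name = match.group(2)
            stoichiometry[name] = stoichiometry.get(name, 0.0) + sign * coefficient
    return stoichiometry, reversible


def compartment_of(metabolite):
    """Return the compartment name for a suffixed metabolite ID, or None."""
    return COMPARTMENTS.get(metabolite.rsplit("_", 1)[-1])


print(parse_formula("2 A + 3 B -> 1 C + 4 D"))
print(parse_formula("glc_e <->"), compartment_of("glc_e"))
```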

## GPR Rule Syntax

### Logical Operators
- **`and`**: All listed gene products are required together (for example, subunits of a complex)
- **`or`**: Any one of the listed gene products is sufficient (for example, isozymes)
- **Parentheses**: Grouping for complex logic

### Examples
```
# Single gene
GENE1

# Alternative genes (isozymes)
GENE1 or GENE2 or GENE3

# Required genes (complex)
GENE1 and GENE2

# Complex logic
(GENE1 and GENE2) or (GENE3 and GENE4)
```
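
The logic of these rules can be illustrated with a small evaluator that decides whether a set of available genes satisfies a rule. This is a sketch for understanding the syntax, not the tool's parser. The function name `evaluate_gpr` is hypothetical, and treating an empty rule or the `-` placeholder from the example CSV as "no gene requirement" is an assumption.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def evaluate_gpr(rule, present_genes):
    """Return True if the genes in present_genes satisfy the GPR rule."""
    tokens = TOKEN.findall(rule)
    if not tokens or tokens == ["-"]:
        return True  # assumption: no rule means no gene requirement
    position = 0

    def parse_or():
        nonlocal position
        value = parse_and()
        while position < len(tokens) and tokens[position].lower() == "or":
            position += 1
            right = parse_and()  # parse first so tokens are always consumed
            value = value or right
        return value

    def parse_and():
        nonlocal position
        value = parse_atom()
        while position < len(tokens) and tokens[position].lower() == "and":
            position += 1
            right = parse_atom()
            value = value and right
        return value

    def parse_atom():
        nonlocal position
        if position >= len(tokens):
            raise ValueError(f"Unexpected end of rule: {rule!r}")
        token = tokens[position]
        position += 1
        if token == "(":
            value = parse_or()
            if position >= len(tokens) or tokens[position] != ")":
                raise ValueError(f"Missing ')' in rule: {rule!r}")
            position += 1
            return value
        if token == ")" or token.lower() in ("and", "or"):
            raise ValueError(f"Unexpected token {token!r} in rule: {rule!r}")
        return token in present_genes

    result = parse_or()
    if position != len(tokens):
        raise ValueError(f"Unexpected token {tokens[position]!r} in rule: {rule!r}")
    return result


rule = "(GENE1 and GENE2) or (GENE3 and GENE4)"
print(evaluate_gpr(rule, {"GENE3", "GENE4"}))
print(evaluate_gpr(rule, {"GENE1", "GENE3"}))
```

Here `and` binds more tightly than `or`, which matches the usual reading of the parenthesized example.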

## Examples

### Create Basic Model

```bash
# Convert simple CSV to SBML model
exportMetabolicModel --input simple_model.csv \
  --format sbml \
  --output simple_model.xml \
  --out_log simple_conversion.log \
  --tool_dir /opt/COBRAxy/src
```

### Multi-format Export

```bash
# Create models in all supported formats
formats=("sbml" "json" "mat" "yaml")
for fmt in "${formats[@]}"; do
  exportMetabolicModel --input comprehensive_model.csv \
    --format "$fmt" \
    --output "model.$fmt" \
    --out_log "conversion_$fmt.log" \
    --tool_dir /opt/COBRAxy/src
done
```

### Custom Model Creation

```bash
# Build tissue-specific model from curated data
exportMetabolicModel --input liver_reactions.tsv \
  --format sbml \
  --output liver_model.xml \
  --out_log liver_model.log \
  --tool_dir /opt/COBRAxy/src
```

### Model Integration Pipeline

```bash
# Extract existing model, modify, and recreate
importMetabolicModel --model ENGRO2 \
  --out_tabular base_model.csv \
  --tool_dir /opt/COBRAxy/src

# Edit base_model.csv with custom reactions/constraints
# and save the result as modified_model.csv

# Create modified model
exportMetabolicModel --input modified_model.csv \
  --format sbml \
  --output custom_model.xml \
  --out_log custom_creation.log \
  --tool_dir /opt/COBRAxy/src
```

## Model Validation

### Automatic Checks

The tool performs validation during conversion:
- **Stoichiometric Balance**: Reaction mass balance
- **Metabolite Consistency**: Compartment assignments
- **Bound Validation**: Feasible constraint ranges
- **Objective Function**: Valid biomass reaction

### Post-conversion Validation

```python
import cobra
from cobra.manipulation import check_mass_balance

# Load and validate model
model = cobra.io.read_sbml_model('custom_model.xml')

# Check basic properties
print(f"Reactions: {len(model.reactions)}")
print(f"Metabolites: {len(model.metabolites)}")
print(f"Genes: {len(model.genes)}")

# Test model solvability
solution = model.optimize()
print(f"Growth rate: {solution.objective_value}")

# Validate mass balance (only meaningful if metabolites carry formulas)
unbalanced = check_mass_balance(model)
if unbalanced:
    print("Unbalanced reactions found:", unbalanced)
```

## Integration Workflow

### Upstream Data Sources

#### COBRAxy Tools
- [Import Metabolic Model](import-metabolic-model.md) - Extract tabular data for modification

#### External Sources
- **Databases**: KEGG, Reactome, BiGG
- **Literature**: Manually curated reactions
- **Spreadsheets**: User-defined custom models

### Downstream Applications

#### COBRAxy Analysis
- [RAS to Bounds](ras-to-bounds.md) - Apply constraints to custom model
- [Flux Simulation](flux-simulation.md) - Sample fluxes from custom model
- [MAREA](marea.md) - Analyze custom pathways

#### External Tools
- **COBRApy**: Python-based analysis
- **COBRA Toolbox**: MATLAB analysis
- **OptFlux**: Strain design
- **Escher**: Pathway visualization

### Typical Pipeline

```bash
# 1. Start with existing model data
importMetabolicModel --model ENGRO2 \
  --out_tabular base_reactions.csv \
  --tool_dir /opt/COBRAxy/src

# 2. Modify/extend the reaction data
# Edit base_reactions.csv to add tissue-specific reactions
# and save the result as modified_reactions.csv

# 3. Create custom model
exportMetabolicModel --input modified_reactions.csv \
  --format sbml \
  --output tissue_model.xml \
  --out_log tissue_creation.log \
  --tool_dir /opt/COBRAxy/src

# 4. Validate and use custom model
ras_to_bounds --model Custom --input tissue_model.xml \
  --ras_input tissue_expression.tsv \
  --idop tissue_bounds/ \
  --tool_dir /opt/COBRAxy/src

# 5. Perform flux analysis
flux_simulation --model Custom --input tissue_model.xml \
  --bounds tissue_bounds/*.tsv \
  --algorithm CBS --idop tissue_fluxes/ \
  --tool_dir /opt/COBRAxy/src
```

## Quality Control

### Input Data Validation

#### Pre-conversion Checks
- **Format Consistency**: Verify column headers and data types
- **Reaction Completeness**: Check for missing required fields
- **Stoichiometric Validity**: Validate reaction formulas
- **Bound Feasibility**: Ensure lower ≤ upper bounds

#### Common Data Issues
```bash
# Check for missing reaction IDs
awk -F',' 'NR>1 && ($1=="" || $1=="NA") {print "Empty ID in line " NR}' input.csv

# Validate reaction directions
awk -F',' 'NR>1 && $3 !~ /->|<->/ {print "Invalid formula: " $1 ", " $3}' input.csv

# Check bound consistency
awk -F',' 'NR>1 && $4>$5 {print "Invalid bounds: " $1 ", LB=" $4 " > UB=" $5}' input.csv
```
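
The `awk` one-liners above rely on the column order of the example file and split on every comma, so a quoted field that contains a comma would shift the columns. As an alternative, the same three checks can be sketched in Python with the standard `csv` module, which looks columns up by name and honors quoting. The function name `find_issues` is hypothetical.

```python
import csv
import io


def find_issues(text, delimiter=","):
    """Return a list of human-readable problems found in tabular model data."""
    issues = []
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        line = reader.line_num  # physical line of the row just read
        reaction_id = (row.get("Reaction_ID") or "").strip()
        if reaction_id in ("", "NA"):
            issues.append(f"line {line}: empty reaction ID")
        formula = row.get("Reaction_Formula") or ""
        if "->" not in formula:  # this test also covers '<->'
            issues.append(f"line {line}: no arrow in formula {formula!r}")
        try:
            lower = float(row.get("Lower_Bound") or "")
            upper = float(row.get("Upper_Bound") or "")
        except ValueError:
            issues.append(f"line {line}: non-numeric bound")
            continue
        if lower > upper:
            issues.append(f"line {line}: LB={lower} > UB={upper}")
    return issues


sample = (
    "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound,Subsystem\n"
    'R1,A -> B,0.0,1000.0,"Glycolysis, upper part"\n'
    "R2,C = D,10.0,-10.0,TCA_Cycle\n"
)
for issue in find_issues(sample):
    print(issue)
```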

### Model Quality Assessment

#### Structural Properties
- **Network Connectivity**: Ensure realistic pathway structure
- **Compartmentalization**: Validate transport reactions
- **Exchange Reactions**: Verify medium composition
- **Biomass Function**: Check objective reaction completeness

#### Functional Testing
```python
import cobra

# Test model functionality
model = cobra.io.read_sbml_model('custom_model.xml')

# Check growth capability
growth = model.optimize().objective_value
print(f"Maximum growth rate: {growth}")

# Flux Variability Analysis (by default evaluated at the optimal objective value)
fva_result = cobra.flux_analysis.flux_variability_analysis(model)
tolerance = 1e-9  # compare against a tolerance rather than exact zero
blocked_reactions = fva_result[
    (fva_result.minimum.abs() < tolerance) & (fva_result.maximum.abs() < tolerance)
]
print(f"Blocked reactions: {len(blocked_reactions)}")

# Essential gene analysis
essential_genes = cobra.flux_analysis.find_essential_genes(model)
print(f"Essential genes: {len(essential_genes)}")
```

## Tips and Best Practices

### Data Preparation
- **Consistent Naming**: Use systematic metabolite/reaction IDs
- **Compartment Notation**: Follow standard suffixes (_c, _m, _e)
- **Balanced Reactions**: Verify mass and charge balance
- **Realistic Bounds**: Use physiologically relevant constraints

### Model Design
- **Modular Structure**: Organize reactions by pathway/subsystem
- **Exchange Reactions**: Include all necessary transport processes
- **Biomass Function**: Define appropriate growth objective
- **Gene Associations**: Add GPR rules where available

### Format Selection
- **SBML**: Choose for maximum compatibility and sharing
- **JSON**: Use for COBRApy-specific workflows
- **MATLAB**: Select for COBRA Toolbox integration
- **YAML**: Pick for human-readable documentation

### Performance Optimization
- **Model Size**: Balance comprehensiveness with computational efficiency
- **Reaction Pruning**: Remove unnecessary or blocked reactions
- **Compartmentalization**: Minimize unnecessary compartments
- **Validation**: Test model properties before distribution

## Troubleshooting

### Common Issues

**Conversion fails with format error**
- Check CSV/TSV column headers and data consistency
- Verify reaction formula syntax
- Ensure numeric fields contain valid numbers

**Model is infeasible after conversion**
- Check reaction bounds for conflicts
- Verify exchange reaction setup
- Validate stoichiometric balance

**Missing metabolites or reactions**
- Confirm all required columns present in input
- Check for empty rows or malformed data
- Validate reaction formula parsing

### Error Messages

| Error | Cause | Solution |
|-------|-------|----------|
| "Input file not found" | Invalid file path | Check file location and permissions |
| "Unknown format" | Invalid output format | Use: sbml, json, mat, or yaml |
| "Formula parsing failed" | Malformed reaction equation | Check reaction formula syntax |
| "Model infeasible" | Conflicting constraints | Review bounds and exchange reactions |

### Performance Issues

**Slow conversion**
- Large input files require more processing time
- Complex GPR rules increase parsing overhead
- Monitor system memory usage

**Memory errors**
- Reduce model size or split into smaller files
- Increase available system memory
- Use more efficient data structures

**Output file corruption**
- Ensure sufficient disk space
- Check file write permissions
- Verify format-specific requirements

## Advanced Usage

### Batch Model Creation

```python
#!/usr/bin/env python3
import subprocess
import pandas as pd


def customize_for_tissue(data, tissue):
    """Placeholder: replace with your own tissue-specific filtering."""
    return data.copy()


# Create multiple tissue-specific models
tissues = ['liver', 'muscle', 'brain', 'heart']
base_data = pd.read_csv('base_model.csv')

for tissue in tissues:
    # Modify base data for tissue specificity
    tissue_data = customize_for_tissue(base_data, tissue)
    tissue_data.to_csv(f'{tissue}_model.csv', index=False)

    # Convert to SBML; check=True raises if the tool exits with an error
    subprocess.run([
        'exportMetabolicModel',
        '--input', f'{tissue}_model.csv',
        '--format', 'sbml',
        '--output', f'{tissue}_model.xml',
        '--out_log', f'{tissue}_conversion.log',
        '--tool_dir', '/opt/COBRAxy/src'
    ], check=True)
```

### Model Merging

Combine multiple tabular files that share the same header into one comprehensive model:

```bash
# Merge core metabolism with tissue-specific pathways
cat core_reactions.csv > combined_model.csv
tail -n +2 tissue_reactions.csv >> combined_model.csv
tail -n +2 disease_reactions.csv >> combined_model.csv

# Create merged model
exportMetabolicModel --input combined_model.csv \
  --format sbml \
  --output comprehensive_model.xml \
  --out_log merge_conversion.log \
  --tool_dir /opt/COBRAxy/src
```
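
Plain concatenation silently accepts files whose headers differ and reactions that appear in more than one file. A slightly more defensive merge could look like the sketch below, which assumes that duplicate `Reaction_ID` values should be treated as an error. The function name `merge_tables` is hypothetical.

```python
import csv
import io


def merge_tables(*texts):
    """Concatenate tables that share a header, rejecting duplicate reaction IDs."""
    merged, seen, header = [], set(), None
    for text in texts:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError(f"Header mismatch: {reader.fieldnames} != {header}")
        for row in reader:
            reaction_id = row["Reaction_ID"]
            if reaction_id in seen:
                raise ValueError(f"Duplicate Reaction_ID: {reaction_id}")
            seen.add(reaction_id)
            merged.append(row)
    if header is None:
        raise ValueError("No input tables given")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged)
    return out.getvalue()


core = "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\nR1,A -> B,0.0,1000.0\n"
tissue = "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\nR2,B -> C,0.0,1000.0\n"
print(merge_tables(core, tissue))
```

The returned text can be written to `combined_model.csv` and passed to `exportMetabolicModel` as in the command above.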

### Model Versioning

Track model versions and changes:

```bash
# Version control for model development
git add model_v1.csv
git commit -m "Initial model version"

# Create versioned models
exportMetabolicModel --input model_v1.csv --format sbml \
  --output model_v1.xml --out_log model_v1.log --tool_dir /opt/COBRAxy/src
exportMetabolicModel --input model_v2.csv --format sbml \
  --output model_v2.xml --out_log model_v2.log --tool_dir /opt/COBRAxy/src

# Compare model versions
# (cobra_compare_models is a placeholder for whichever comparison tool you use)
cobra_compare_models model_v1.xml model_v2.xml
```
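
Because the tabular inputs are the versioned source, one simple way to compare two versions is to diff the tables by `Reaction_ID` before exporting. The sketch below reports added, removed, and changed reactions. The function names are hypothetical, and this compares the input tables only, not the exported SBML files.

```python
import csv
import io


def index_by_id(text):
    """Map each Reaction_ID to its full row."""
    return {row["Reaction_ID"]: row for row in csv.DictReader(io.StringIO(text))}


def diff_models(old_text, new_text):
    """Return reaction IDs that were added, removed, or changed."""
    old, new = index_by_id(old_text), index_by_id(new_text)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


v1 = (
    "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
    "R1,A -> B,0.0,1000.0\n"
    "R2,B -> C,0.0,1000.0\n"
)
v2 = (
    "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
    "R1,A -> B,0.0,500.0\n"
    "R3,C -> D,0.0,1000.0\n"
)
print(diff_models(v1, v2))
```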

## See Also

- [Import Metabolic Model](import-metabolic-model.md) - Extract tabular data from existing models
- [RAS to Bounds](ras-to-bounds.md) - Apply constraints to custom models
- [Flux Simulation](flux-simulation.md) - Analyze custom models with flux sampling
- [Model Creation Tutorial](/tutorials/custom-model-creation.md)
- [COBRA Model Standards](/tutorials/cobra-model-standards.md)