comparison COBRAxy/docs/tools/export-metabolic-model.md @ 547:73f2f7e2be17 draft

Uploaded
author francesco_lapi
date Tue, 28 Oct 2025 10:44:07 +0000
parents fcdbc81feb45
children
comparison
546:01147e83f43c 547:73f2f7e2be17
1 # Export Metabolic Model 1 # Export Metabolic Model
2 2
3 Export tabular data (CSV/TSV) into COBRA metabolic models in various formats. 3 Convert tabular data into a COBRA metabolic model.
4 4
5 ## Overview 5 ## Overview
6 6
7 Export Metabolic Model (exportMetabolicModel) converts structured tabular data containing reaction information into fully functional COBRA metabolic models. This tool enables creation of custom models from spreadsheet data and supports multiple output formats including SBML, JSON, MATLAB, and YAML. 7 Export Metabolic Model converts structured tabular data (CSV/TSV) into functional COBRA models in SBML, JSON, MATLAB, or YAML formats.
8 8
9 ## Usage 9 **Input**: Tabular model data (CSV/TSV)
10 **Output**: SBML/JSON/MAT/YAML model files
10 11
11 ### Command Line 12 ## Galaxy Interface
13
14 In Galaxy: **COBRAxy → Export Metabolic Model**
15
16 1. Upload tabular model data file
17 2. Select output format (SBML/JSON/MAT/YAML)
18 3. Click **Run tool**
19
20 ## Command-line console
12 21
13 ```bash 22 ```bash
14 exportMetabolicModel --input model_data.csv \ 23 exportMetabolicModel \
15 --format sbml \ 24 --input model_data.csv \
16 --output custom_model.xml \ 25 --format sbml \
17 --out_log conversion.log \ 26 --output custom_model.xml \
18 --tool_dir /path/to/COBRAxy/src 27 --out_log conversion.log
19 ``` 28 ```
20 29
21 ### Galaxy Interface
22
23 Select "Export Metabolic Model" from the COBRAxy tool suite and configure conversion parameters.
24
25 ## Parameters 30 ## Parameters
26
27 ### Required Parameters
28 31
29 | Parameter | Flag | Description | 32 | Parameter | Flag | Description |
30 |-----------|------|-------------| 33 |-----------|------|-------------|
31 | Input File | `--input` | Tabular file (CSV/TSV) with model data | 34 | Input File | `--input` | Tabular file (CSV/TSV) with model data |
32 | Output Format | `--format` | Model format (sbml, json, mat, yaml) | 35 | Output Format | `--format` | Model format: sbml, json, mat, yaml |
33 | Output File | `--output` | Output model file path | 36 | Output File | `--output` | Output model file path |
34 | Output Log | `--out_log` | Log file for conversion process | 37 | Output Log | `--out_log` | Log file |
35
36 ### Optional Parameters
37
38 | Parameter | Flag | Description | Default |
39 |-----------|------|-------------|---------|
40 | Tool Directory | `--tool_dir` | COBRAxy installation directory | Current directory |
41 38
42 ## Input Format 39 ## Input Format
43 40
44 ### Tabular Model Data 41 Example input:
45
46 The input file must contain structured model information with the following columns:
47 42
48 ```csv 43 ```csv
49 Reaction_ID,GPR_Rule,Reaction_Formula,Lower_Bound,Upper_Bound,Objective_Coefficient,Medium_Member,Compartment,Subsystem 44 ReactionID,Formula,GPR,lower_bound,upper_bound,ObjectiveCoefficient,InMedium,TranslationIssues
50 R00001,GENE1 or GENE2,A + B -> C + D,-1000.0,1000.0,0.0,FALSE,cytosol,Glycolysis 45 R00001,A + B -> C + D,GENE1 or GENE2,-1000.0,1000.0,0.0,FALSE,
51 R00002,GENE3 and GENE4,E <-> F,-1000.0,1000.0,0.0,FALSE,mitochondria,TCA_Cycle 46 EX_glc_e,glc_e <->,-,-1000.0,1000.0,0.0,TRUE,
52 EX_glc_e,-,glc_e <->,-1000.0,1000.0,0.0,TRUE,extracellular,Exchange
53 BIOMASS,GENE5,0.5 A + 0.3 B -> 1 BIOMASS,0.0,1000.0,1.0,FALSE,cytosol,Biomass
54 ``` 47 ```
55 48
56 ### Required Columns 49 **File Format Notes:**
57 50 - Use **comma-separated** (CSV) or **tab-separated** (TSV)
58 | Column | Description | Format | 51 - First row must contain column headers
59 |--------|-------------|--------| 52 - Required columns: ReactionID, Formula, lower_bound, upper_bound
60 | **Reaction_ID** | Unique reaction identifier | String | 53 - Optional columns: GPR, ObjectiveCoefficient, InMedium, Pathway_1, Pathway_2
61 | **Reaction_Formula** | Stoichiometric equation | Metabolite notation |
62 | **Lower_Bound** | Minimum flux constraint | Numeric |
63 | **Upper_Bound** | Maximum flux constraint | Numeric |
64
65 ### Optional Columns
66
67 | Column | Description | Default |
68 |--------|-------------|---------|
69 | **GPR_Rule** | Gene-protein-reaction association | Empty string |
70 | **Objective_Coefficient** | Biomass/objective weight | 0.0 |
71 | **Medium_Member** | Exchange reaction flag | FALSE |
72 | **Compartment** | Subcellular location | Empty |
73 | **Subsystem** | Metabolic pathway | Empty |
74
75 ## Output Formats
76
77 ### SBML (Systems Biology Markup Language)
78 - **Format**: XML-based standard
79 - **Extension**: `.xml` or `.sbml`
80 - **Use Case**: Interoperability with other tools
81 - **Advantages**: Widely supported, standardized
82
83 ### JSON (JavaScript Object Notation)
84 - **Format**: COBRApy native JSON
85 - **Extension**: `.json`
86 - **Use Case**: Python/COBRApy workflows
87 - **Advantages**: Human-readable, lightweight
88
89 ### MATLAB (.mat)
90 - **Format**: MATLAB workspace format
91 - **Extension**: `.mat`
92 - **Use Case**: MATLAB COBRA Toolbox
93 - **Advantages**: Direct MATLAB compatibility
94
95 ### YAML (YAML Ain't Markup Language)
96 - **Format**: Human-readable data serialization
97 - **Extension**: `.yml` or `.yaml`
98 - **Use Case**: Configuration and documentation
99 - **Advantages**: Most human-readable format
100 54
101 ## Reaction Formula Syntax 55 ## Reaction Formula Syntax
102 56
103 ### Standard Notation
104 ``` 57 ```
105 # Irreversible reaction 58 # Irreversible
106 A + B -> C + D 59 A + B -> C + D
107 60
108 # Reversible reaction 61 # Reversible
109 A + B <-> C + D 62 A + B <-> C + D
110 63
111 # With stoichiometric coefficients 64 # With stoichiometry
112 2 A + 3 B -> 1 C + 4 D 65 2 A + 3 B -> 1 C + 4 D
113
114 # Compartmentalized metabolites
115 glc_c + atp_c -> g6p_c + adp_c
116 ```
117
118 ### Compartment Suffixes
119 - `_c`: Cytosol
120 - `_m`: Mitochondria
121 - `_e`: Extracellular
122 - `_r`: Endoplasmic reticulum
123 - `_x`: Peroxisome
124 - `_n`: Nucleus
125
126 ### Exchange Reactions
127 ```
128 # Import reaction
129 EX_glc_e: glc_e <->
130
131 # Export reaction
132 EX_co2_e: co2_e <->
133 ``` 66 ```
134 67
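The formula notation shown above (an arrow separating two sides, terms joined by ` + `, and an optional leading coefficient) can be illustrated with a small parser. This is a minimal sketch written for this note and is not COBRAxy's actual parser; the function name `parse_formula` and the sign convention (negative for consumed, positive for produced) are assumptions.

```python
def parse_formula(formula):
    """Split a reaction formula into ({metabolite: coefficient}, reversible)."""
    # Test the reversible arrow first, because "<->" also contains "->".
    if "<->" in formula:
        left, right = formula.split("<->", 1)
        reversible = True
    elif "->" in formula:
        left, right = formula.split("->", 1)
        reversible = False
    else:
        raise ValueError(f"No '->' or '<->' arrow in formula: {formula!r}")

    stoichiometry = {}
    # Left-hand metabolites are consumed (negative), right-hand ones produced.
    for side, sign in ((left, -1.0), (right, 1.0)):
        for term in side.split(" + "):
            parts = term.split()
            if not parts:
                # Empty side, as in the exchange reaction "glc_e <->".
                continue
            if len(parts) == 1:
                coefficient, metabolite = 1.0, parts[0]
            elif len(parts) == 2:
                coefficient, metabolite = float(parts[0]), parts[1]
            else:
                raise ValueError(f"Cannot parse term: {term!r}")
            stoichiometry[metabolite] = (
                stoichiometry.get(metabolite, 0.0) + sign * coefficient
            )
    return stoichiometry, reversible


print(parse_formula("2 A + 3 B -> 1 C + 4 D"))
print(parse_formula("glc_e <->"))
```

Checking `<->` before `->` matters because the irreversible arrow is a substring of the reversible one; splitting on `->` first would leave a stray `<` attached to the last reactant.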
135 ## GPR Rule Syntax 68 ## GPR Rule Syntax
136 69
137 ### Logical Operators
138 - **AND**: Gene products required together
139 - **OR**: Alternative gene products
140 - **Parentheses**: Grouping for complex logic
141
142 ### Examples
143 ``` 70 ```
144 # Single gene 71 # Single gene
145 GENE1 72 GENE1
146 73
147 # Alternative genes (isozymes) 74 # Alternative genes (OR)
148 GENE1 or GENE2 or GENE3 75 GENE1 or GENE2
149 76
150 # Required genes (complex) 77 # Required complex (AND)
151 GENE1 and GENE2 78 GENE1 and GENE2
152 79
153 # Complex logic 80 # Nested logic
154 (GENE1 and GENE2) or (GENE3 and GENE4) 81 (GENE1 and GENE2) or GENE3
155 ``` 82 ```
83
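To make the precedence in these rules concrete (`and` binds tighter than `or`, parentheses override both), the sketch below evaluates a rule against a set of genes considered present. It is an illustration only, not the logic COBRAxy or COBRApy uses internally; `evaluate_gpr` and `active_genes` are hypothetical names, and the `-` placeholder used for reactions without genes is not handled.

```python
import re


def evaluate_gpr(rule, active_genes):
    """Return True if the GPR rule is satisfied by the set of active genes."""
    # Tokens are parentheses or runs of non-space, non-parenthesis characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    position = 0

    def parse_or():
        nonlocal position
        value = parse_and()
        while position < len(tokens) and tokens[position].lower() == "or":
            position += 1
            right = parse_and()  # always parse, so the position advances
            value = value or right
        return value

    def parse_and():
        nonlocal position
        value = parse_atom()
        while position < len(tokens) and tokens[position].lower() == "and":
            position += 1
            right = parse_atom()
            value = value and right
        return value

    def parse_atom():
        nonlocal position
        if position >= len(tokens):
            raise ValueError("Unexpected end of rule")
        token = tokens[position]
        position += 1
        if token == "(":
            value = parse_or()
            if position >= len(tokens) or tokens[position] != ")":
                raise ValueError("Missing closing parenthesis")
            position += 1
            return value
        if token == ")" or token.lower() in ("and", "or"):
            raise ValueError(f"Unexpected token: {token!r}")
        return token in active_genes

    result = parse_or()
    if position != len(tokens):
        raise ValueError(f"Unexpected token: {tokens[position]!r}")
    return result


print(evaluate_gpr("(GENE1 and GENE2) or GENE3", {"GENE3"}))  # True
print(evaluate_gpr("GENE1 and GENE2", {"GENE1"}))  # False
```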
84 ## Output Formats
85
86 - **SBML**: XML standard, maximum compatibility
87 - **JSON**: COBRApy native format
88 - **MATLAB**: COBRA Toolbox compatibility
89 - **YAML**: Human-readable format
156 90
157 ## Examples 91 ## Examples
158 92
159 ### Create Basic Model 93 ### Basic Export
160 94
161 ```bash 95 ```bash
162 # Convert simple CSV to SBML model 96 exportMetabolicModel --input model.csv \
163 exportMetabolicModel --input simple_model.csv \
164 --format sbml \ 97 --format sbml \
165 --output simple_model.xml \ 98 --output model.xml \
166 --out_log simple_conversion.log \ 99 --out_log conversion.log
167 --tool_dir /opt/COBRAxy/src
168 ``` 100 ```
169
170 ### Multi-format Export
171
172 ```bash
173 # Create models in all supported formats
174 formats=("sbml" "json" "mat" "yaml")
175 for fmt in "${formats[@]}"; do
176 exportMetabolicModel --input comprehensive_model.csv \
177 --format "$fmt" \
178 --output "model.$fmt" \
179 --out_log "conversion_$fmt.log" \
180 --tool_dir /opt/COBRAxy/src
181 done
182 ```
183
184 ### Custom Model Creation
185
186 ```bash
187 # Build tissue-specific model from curated data
188 exportMetabolicModel --input liver_reactions.tsv \
189 --format sbml \
190 --output liver_model.xml \
191 --out_log liver_model.log \
192 --tool_dir /opt/COBRAxy/src
193 ```
194
195 ### Model Integration Pipeline
196
197 ```bash
198 # Extract existing model, modify, and recreate
199 importMetabolicModel --model ENGRO2 \
200 --out_tabular base_model.csv \
201 --tool_dir /opt/COBRAxy/src
202
203 # Edit base_model.csv with custom reactions/constraints
204
205 # Create modified model
206 exportMetabolicModel --input modified_model.csv \
207 --format sbml \
208 --output custom_model.xml \
209 --out_log custom_creation.log \
210 --tool_dir /opt/COBRAxy/src
211 ```
212
213 ## Model Validation
214
215 ### Automatic Checks
216
217 The tool performs validation during conversion:
218 - **Stoichiometric Balance**: Reaction mass balance
219 - **Metabolite Consistency**: Compartment assignments
220 - **Bound Validation**: Feasible constraint ranges
221 - **Objective Function**: Valid biomass reaction
222
223 ### Post-conversion Validation
224
225 ```python
226 import cobra
227
228 # Load and validate model
229 model = cobra.io.read_sbml_model('custom_model.xml')
230
231 # Check basic properties
232 print(f"Reactions: {len(model.reactions)}")
233 print(f"Metabolites: {len(model.metabolites)}")
234 print(f"Genes: {len(model.genes)}")
235
236 # Test model solvability
237 solution = model.optimize()
238 print(f"Growth rate: {solution.objective_value}")
239
240 # Validate mass balance
241 unbalanced = cobra.manipulation.check_mass_balance(model)
242 if unbalanced:
243     print("Unbalanced reactions found:", unbalanced)
244 ```
245
246 ## Integration Workflow
247
248 ### Upstream Data Sources
249
250 #### COBRAxy Tools
251 - [Import Metabolic Model](import-metabolic-model.md) - Extract tabular data for modification
252
253 #### External Sources
254 - **Databases**: KEGG, Reactome, BiGG
255 - **Literature**: Manually curated reactions
256 - **Spreadsheets**: User-defined custom models
257
258 ### Downstream Applications
259
260 #### COBRAxy Analysis
261 - [RAS to Bounds](ras-to-bounds.md) - Apply constraints to custom model
262 - [Flux Simulation](flux-simulation.md) - Sample fluxes from custom model
263 - [MAREA](marea.md) - Analyze custom pathways
264
265 #### External Tools
266 - **COBRApy**: Python-based analysis
267 - **COBRA Toolbox**: MATLAB analysis
268 - **OptFlux**: Strain design
269 - **Escher**: Pathway visualization
270
271 ### Typical Pipeline
272
273 ```bash
274 # 1. Start with existing model data
275 importMetabolicModel --model ENGRO2 \
276 --out_tabular base_reactions.csv \
277 --tool_dir /opt/COBRAxy/src
278
279 # 2. Modify/extend the reaction data
280 # Edit base_reactions.csv to add tissue-specific reactions
281
282 # 3. Create custom model
283 exportMetabolicModel --input modified_reactions.csv \
284 --format sbml \
285 --output tissue_model.xml \
286 --out_log tissue_creation.log \
287 --tool_dir /opt/COBRAxy/src
288
289 # 4. Validate and use custom model
290 ras_to_bounds --model Custom --input tissue_model.xml \
291 --ras_input tissue_expression.tsv \
292 --idop tissue_bounds/ \
293 --tool_dir /opt/COBRAxy/src
294
295 # 5. Perform flux analysis
296 flux_simulation --model Custom --input tissue_model.xml \
297 --bounds tissue_bounds/*.tsv \
298 --algorithm CBS --idop tissue_fluxes/ \
299 --tool_dir /opt/COBRAxy/src
300 ```
301
302 ## Quality Control
303
304 ### Input Data Validation
305
306 #### Pre-conversion Checks
307 - **Format Consistency**: Verify column headers and data types
308 - **Reaction Completeness**: Check for missing required fields
309 - **Stoichiometric Validity**: Validate reaction formulas
310 - **Bound Feasibility**: Ensure lower ≤ upper bounds
311
312 #### Common Data Issues
313 ```bash
314 # Check for missing reaction IDs
315 awk -F',' 'NR>1 && ($1=="" || $1=="NA") {print "Empty ID in line " NR}' input.csv
316
317 # Validate reaction directions
318 awk -F',' 'NR>1 && $3 !~ /->|<->/ {print "Invalid formula: " $1 ", " $3}' input.csv
319
320 # Check bound consistency
321 awk -F',' 'NR>1 && $4>$5 {print "Invalid bounds: " $1 ", LB=" $4 " > UB=" $5}' input.csv
322 ```
323
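The awk checks above address columns by position (`$3`, `$4`, `$5`), which matches the old header order but not the reordered header in the new revision, where `Formula` comes second and the bounds are named `lower_bound` and `upper_bound`. A name-based check avoids that dependency. The following is a minimal sketch using only the Python standard library; it assumes the new revision's required column names, and `check_model_table` is a hypothetical helper, not part of COBRAxy.

```python
import csv
import io

# Required columns as listed in the new revision's file format notes.
REQUIRED_COLUMNS = ("ReactionID", "Formula", "lower_bound", "upper_bound")


def check_model_table(handle, delimiter=","):
    """Return a list of human-readable problems found in a tabular model file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return [f"Missing required columns: {', '.join(missing)}"]

    problems = []
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        reaction_id = (row["ReactionID"] or "").strip()
        if reaction_id in ("", "NA"):
            problems.append(f"Line {line_number}: empty reaction ID")
        # "->" is also a substring of "<->", so one test covers both arrows.
        if "->" not in (row["Formula"] or ""):
            problems.append(f"Line {line_number}: no arrow in formula")
        try:
            lower = float(row["lower_bound"])
            upper = float(row["upper_bound"])
        except (TypeError, ValueError):
            problems.append(f"Line {line_number}: non-numeric bounds")
            continue
        if lower > upper:
            problems.append(
                f"Line {line_number}: lower bound {lower} > upper bound {upper}"
            )
    return problems


sample = io.StringIO(
    "ReactionID,Formula,GPR,lower_bound,upper_bound\n"
    "R00001,A + B -> C + D,GENE1 or GENE2,-1000.0,1000.0\n"
    "R00002,E <-> F,GENE3,10.0,-10.0\n"
)
for problem in check_model_table(sample):
    print(problem)
```

For a TSV file, pass `delimiter="\t"` and an open file handle instead of the in-memory sample.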
324 ### Model Quality Assessment
325
326 #### Structural Properties
327 - **Network Connectivity**: Ensure realistic pathway structure
328 - **Compartmentalization**: Validate transport reactions
329 - **Exchange Reactions**: Verify medium composition
330 - **Biomass Function**: Check objective reaction completeness
331
332 #### Functional Testing
333 ```python
334 import cobra  # test model functionality
335 model = cobra.io.read_sbml_model('custom_model.xml')
336
337 # Check growth capability
338 growth = model.optimize().objective_value
339 print(f"Maximum growth rate: {growth}")
340
341 # Flux Variability Analysis
342 fva_result = cobra.flux_analysis.flux_variability_analysis(model)
343 blocked_reactions = fva_result[(fva_result.minimum == 0) & (fva_result.maximum == 0)]
344 print(f"Blocked reactions: {len(blocked_reactions)}")
345
346 # Essential gene analysis
347 essential_genes = cobra.flux_analysis.find_essential_genes(model)
348 print(f"Essential genes: {len(essential_genes)}")
349 ```
350
351 ## Tips and Best Practices
352
353 ### Data Preparation
354 - **Consistent Naming**: Use systematic metabolite/reaction IDs
355 - **Compartment Notation**: Follow standard suffixes (_c, _m, _e)
356 - **Balanced Reactions**: Verify mass and charge balance
357 - **Realistic Bounds**: Use physiologically relevant constraints
358
359 ### Model Design
360 - **Modular Structure**: Organize reactions by pathway/subsystem
361 - **Exchange Reactions**: Include all necessary transport processes
362 - **Biomass Function**: Define appropriate growth objective
363 - **Gene Associations**: Add GPR rules where available
364
365 ### Format Selection
366 - **SBML**: Choose for maximum compatibility and sharing
367 - **JSON**: Use for COBRApy-specific workflows
368 - **MATLAB**: Select for COBRA Toolbox integration
369 - **YAML**: Pick for human-readable documentation
370
371 ### Performance Optimization
372 - **Model Size**: Balance comprehensiveness with computational efficiency
373 - **Reaction Pruning**: Remove unnecessary or blocked reactions
374 - **Compartmentalization**: Minimize unnecessary compartments
375 - **Validation**: Test model properties before distribution
376 101
377 ## Troubleshooting 102 ## Troubleshooting
378 103
379 ### Common Issues 104 | Error | Solution |
380 105 |-------|----------|
381 **Conversion fails with format error** 106 | "Formula parsing failed" | Check reaction formula syntax |
382 - Check CSV/TSV column headers and data consistency 107 | "Model infeasible" | Review bounds and exchange reactions |
383 - Verify reaction formula syntax
384 - Ensure numeric fields contain valid numbers
385
386 **Model is infeasible after conversion**
387 - Check reaction bounds for conflicts
388 - Verify exchange reaction setup
389 - Validate stoichiometric balance
390
391 **Missing metabolites or reactions**
392 - Confirm all required columns present in input
393 - Check for empty rows or malformed data
394 - Validate reaction formula parsing
395
396 ### Error Messages
397
398 | Error | Cause | Solution |
399 |-------|-------|----------|
400 | "Input file not found" | Invalid file path | Check file location and permissions |
401 | "Unknown format" | Invalid output format | Use: sbml, json, mat, or yaml |
402 | "Formula parsing failed" | Malformed reaction equation | Check reaction formula syntax |
403 | "Model infeasible" | Conflicting constraints | Review bounds and exchange reactions |
404
405 ### Performance Issues
406
407 **Slow conversion**
408 - Large input files require more processing time
409 - Complex GPR rules increase parsing overhead
410 - Monitor system memory usage
411
412 **Memory errors**
413 - Reduce model size or split into smaller files
414 - Increase available system memory
415 - Use more efficient data structures
416
417 **Output file corruption**
418 - Ensure sufficient disk space
419 - Check file write permissions
420 - Verify format-specific requirements
421
422 ## Advanced Usage
423
424 ### Batch Model Creation
425
426 ```python
427 #!/usr/bin/env python3
428 import subprocess
429 import pandas as pd
430
431 # Create multiple tissue-specific models
432 tissues = ['liver', 'muscle', 'brain', 'heart']
433 base_data = pd.read_csv('base_model.csv')
434
435 for tissue in tissues:
436     # Modify base data for tissue specificity (customize_for_tissue is a user-supplied helper, not defined here)
437     tissue_data = customize_for_tissue(base_data, tissue)
438     tissue_data.to_csv(f'{tissue}_model.csv', index=False)
439
440     # Convert to SBML
441     subprocess.run([
442         'exportMetabolicModel',
443         '--input', f'{tissue}_model.csv',
444         '--format', 'sbml',
445         '--output', f'{tissue}_model.xml',
446         '--out_log', f'{tissue}_conversion.log',
447         '--tool_dir', '/opt/COBRAxy/src'
448     ])
449 ```
450
451 ### Model Merging
452
453 Combine multiple tabular files into comprehensive models:
454
455 ```bash
456 # Merge core metabolism with tissue-specific pathways
457 cat core_reactions.csv > combined_model.csv
458 tail -n +2 tissue_reactions.csv >> combined_model.csv
459 tail -n +2 disease_reactions.csv >> combined_model.csv
460
461 # Create merged model
462 exportMetabolicModel --input combined_model.csv \
463 --format sbml \
464 --output comprehensive_model.xml --out_log comprehensive_model.log \
465 --tool_dir /opt/COBRAxy/src
466 ```
467
468 ### Model Versioning
469
470 Track model versions and changes:
471
472 ```bash
473 # Version control for model development
474 git add model_v1.csv
475 git commit -m "Initial model version"
476
477 # Create versioned models
478 exportMetabolicModel --input model_v1.csv --format sbml \
479 --output model_v1.xml --out_log model_v1.log --tool_dir /opt/COBRAxy/src
480 exportMetabolicModel --input model_v2.csv --format sbml \
481 --output model_v2.xml --out_log model_v2.log --tool_dir /opt/COBRAxy/src
482
483 # Compare model versions (assumes a user-provided cobra_compare_models script on PATH)
484 cobra_compare_models model_v1.xml model_v2.xml
485 ```
486 108
487 ## See Also 109 ## See Also
488 110
489 - [Import Metabolic Model](import-metabolic-model.md) - Extract tabular data from existing models 111 - [Import Metabolic Model](reference/import-metabolic-model)
490 - [RAS to Bounds](ras-to-bounds.md) - Apply constraints to custom models 112 - [RAS to Bounds](tools/ras-to-bounds)
491 - [Flux Simulation](flux-simulation.md) - Analyze custom models with flux sampling 113 - [Flux Simulation](tools/flux-simulation)
492 - [Model Creation Tutorial](/tutorials/custom-model-creation.md)
493 - [COBRA Model Standards](/tutorials/cobra-model-standards.md)