comparison COBRAxy/docs/tools/import-metabolic-model.md @ 547:73f2f7e2be17 draft

Uploaded
author francesco_lapi
date Tue, 28 Oct 2025 10:44:07 +0000
parents fcdbc81feb45
children
comparison
546:01147e83f43c 547:73f2f7e2be17
1 # Import Metabolic Model 1 # Import Metabolic Model
2 2
3 Import and extract metabolic model components into tabular format for analysis and integration. 3 Import and extract metabolic model components into tabular format.
4 4
5 ## Overview 5 ## Overview
6 6
7 Import Metabolic Model (importMetabolicModel) imports metabolic models from various formats (SBML, JSON, MAT, YAML) and extracts key components into comprehensive tabular summaries. This tool processes built-in or custom models, applies medium constraints, handles gene nomenclature conversion, and outputs structured data for downstream analysis. 7 Import Metabolic Model extracts metabolic models in SBML, JSON, MAT, or YAML format into a tabular summary for analysis.
8 8
9 ## Usage 9 **Input**: Model file or built-in models
10 **Output**: Tabular data (CSV/TSV)
10 11
11 ### Command Line 12 ## Galaxy Interface
13
14 In Galaxy: **COBRAxy → Import Metabolic Model**
15
16 1. Select built-in model or upload custom file
17 2. Set model name and medium configuration
18 3. Click **Run tool**
19
20 ## Command-line console
12 21
13 ```bash 22 ```bash
14 importMetabolicModel --model ENGRO2 \ 23 # Import built-in model
15 --name ENGRO2 \ 24 importMetabolicModel \
16 --medium_selector allOpen \ 25 --model ENGRO2 \
17 --out_tabular model_data.csv \ 26 --name ENGRO2 \
18 --out_log extraction.log \ 27 --medium_selector allOpen \
19 --tool_dir /path/to/COBRAxy/src 28 --out_tabular model_data.csv \
29 --out_log extraction.log
20 ``` 30 ```
21
22 ### Galaxy Interface
23
24 Select "Import Metabolic Model" from the COBRAxy tool suite and configure model extraction parameters.
25 31
26 ## Parameters 32 ## Parameters
27 33
28 ### Required Parameters 34 ### Model Selection
29 35
30 | Parameter | Flag | Description | 36 | Parameter | Flag | Description |
31 |-----------|------|-------------| 37 |-----------|------|-------------|
32 | Model Name | `--name` | Model identifier for output files | 38 | Built-in Model | `--model` | ENGRO2 or Recon |
33 | Medium Selector | `--medium_selector` | Medium configuration option | 39 | Custom Model | `--input` | Path to SBML/JSON/MAT/YAML file |
34 | Output Tabular | `--out_tabular` | Output file path (CSV or XLSX) |
35 | Output Log | `--out_log` | Log file for processing information |
36 | Tool Directory | `--tool_dir` | COBRAxy installation directory |
37 40
38 ### Model Selection Parameters 41 **Note**: Use either `--model` OR `--input`.
39 42
40 | Parameter | Flag | Description | Default |
41 |-----------|------|-------------|---------|
42 | Built-in Model | `--model` | Pre-installed model (ENGRO2, Recon, HMRcore) | - |
43 | Custom Model | `--input` | Path to custom SBML/JSON model file | - |
44 43
45 **Note**: Provide either `--model` OR `--input`, not both. 44 ### Required
46 45
47 ### Optional Parameters 46 | Parameter | Flag | Description |
47 |-----------|------|-------------|
48 | Model Name | `--name` | Model identifier |
49 | Medium Selector | `--medium_selector` | Medium configuration (use `allOpen`) |
50 | Output Tabular | `--out_tabular` | Output file (CSV/XLSX) |
51 | Output Log | `--out_log` | Log file |
52
53 ### Optional
48 54
49 | Parameter | Flag | Description | Default | 55 | Parameter | Flag | Description | Default |
50 |-----------|------|-------------|---------| 56 |-----------|------|-------------|---------|
51 | Custom Medium | `--custom_medium` | CSV file with medium constraints | - | 57 | Custom Medium | `--custom_medium` | CSV file with medium constraints | - |
58 | Gene Format | `--gene_format` | Gene ID conversion: Default, ENSG, HGNC_ID, entrez_id | Default |
52 59
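The flag tables above can be summarised as an argument parser. The following is a minimal sketch, not the tool's actual parser: it assumes the flags behave as documented in the revised tables (`--model` and `--input` mutually exclusive, four required flags, two optional ones), and `build_parser` and the `prog` name are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (illustration only)."""
    parser = argparse.ArgumentParser(prog="importMetabolicModel-sketch")

    # Exactly one model source must be given: a built-in model or a custom file.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--model", choices=["ENGRO2", "Recon"])
    source.add_argument("--input", help="Path to SBML/JSON/MAT/YAML file")

    # Flags listed as required in the table.
    parser.add_argument("--name", required=True)
    parser.add_argument("--medium_selector", required=True)
    parser.add_argument("--out_tabular", required=True)
    parser.add_argument("--out_log", required=True)

    # Flags listed as optional in the table.
    parser.add_argument("--custom_medium")
    parser.add_argument(
        "--gene_format",
        choices=["Default", "ENSG", "HGNC_ID", "entrez_id"],
        default="Default",
    )
    return parser


args = build_parser().parse_args([
    "--model", "ENGRO2",
    "--name", "ENGRO2",
    "--medium_selector", "allOpen",
    "--out_tabular", "model_data.csv",
    "--out_log", "extraction.log",
])
print(args.model, args.input, args.gene_format)
```

Passing both `--model` and `--input` to this sketch makes `argparse` exit with a usage error, which matches the "either/or" note in the documentation.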
53 ## Model Selection 60 ## Built-in Models
54 61
55 ### Built-in Models 62 - **ENGRO2**: ~500 reactions (recommended)
63 - **Recon**: ~10,000 reactions (genome-wide)
56 64
57 #### ENGRO2 65 See [Built-in Models](reference/built-in-models) for details.
58 - **Species**: Homo sapiens
59 - **Scope**: Genome-scale reconstruction
60 - **Reactions**: ~2,000 reactions
61 - **Metabolites**: ~1,500 metabolites
62 - **Coverage**: Comprehensive human metabolism
63 66
64 #### Recon 67 ## Supported Formats
65 - **Species**: Homo sapiens
66 - **Scope**: Recon3D human reconstruction
67 - **Reactions**: ~10,000+ reactions
68 - **Metabolites**: ~5,000+ metabolites
69 - **Coverage**: Most comprehensive human model
70 68
71 #### HMRcore 69 - **Model formats**: SBML (.xml), JSON (.json), MAT (.mat), YAML (.yml)
72 - **Species**: Homo sapiens 70 - **Compression**: .zip, .gz, .bz2 (e.g., `model.xml.gz`)
73 - **Scope**: Core metabolic network
74 - **Reactions**: ~300 essential reactions
75 - **Metabolites**: ~200 core metabolites
76 - **Coverage**: Central carbon and energy metabolism
77 71
78 ### Custom Models 72 Compressed files are automatically detected and extracted.
79
80 Supported formats for custom model import:
81 - **SBML**: Systems Biology Markup Language (.xml, .sbml)
82 - **JSON**: COBRApy JSON format (.json)
83 - **MAT**: MATLAB format (.mat)
84 - **YML**: YAML format (.yml, .yaml)
85 - **Compressed**: All formats support .gz, .zip, .bz2 compression
86
87 ## Medium Configuration
88
89 ### allOpen (Default)
90 - All exchange reactions unconstrained
91 - Maximum metabolic flexibility
92 - Suitable for general analysis
93
94 ### Custom Medium
95 Users can specify custom medium constraints by providing a CSV file with exchange reaction bounds.
96 73
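The documentation states that compressed model files are detected and extracted automatically but does not say how. One plausible approach, sketched below under the assumption that detection is based on the file extension, uses only the Python standard library. The helper name `read_model_bytes` and the single-member rule for `.zip` archives are assumptions for illustration, not COBRAxy behaviour.

```python
import bz2
import gzip
import tempfile
import zipfile
from pathlib import Path


def read_model_bytes(path):
    """Return the raw model bytes, decompressing by file extension (assumed rule)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".gz":
        with gzip.open(path, "rb") as handle:
            return handle.read()
    if suffix == ".bz2":
        with bz2.open(path, "rb") as handle:
            return handle.read()
    if suffix == ".zip":
        with zipfile.ZipFile(path) as archive:
            # Assumption: the archive holds exactly one model file.
            members = [n for n in archive.namelist() if not n.endswith("/")]
            if len(members) != 1:
                raise ValueError("expected exactly one file in the archive")
            return archive.read(members[0])
    # Not a recognised compressed extension: read the file as-is.
    return path.read_bytes()


# Round-trip demo with a placeholder payload (not a real model).
payload = b"<sbml></sbml>"
with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "model.xml.gz"
    with gzip.open(gz_path, "wb") as handle:
        handle.write(payload)
    restored = read_model_bytes(gz_path)
print(restored == payload)
```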
97 ## Output Format 74 ## Output Format
98 75
99 ### Tabular Summary File 76 **ENGRO2 model:**
100
101 The output contains comprehensive model information in CSV or XLSX format:
102
103 #### Column Structure
104 ``` 77 ```
105 Reaction_ID GPR_Rule Reaction_Formula Lower_Bound Upper_Bound Objective_Coefficient Medium_Member Compartment Subsystem 78 ReactionID Formula GPR lower_bound upper_bound ObjectiveCoefficient Pathway_1 Pathway_2 InMedium TranslationIssues
106 R00001 GENE1 or GENE2 A + B -> C + D -1000.0 1000.0 0.0 FALSE cytosol Glycolysis 79 R00001 A + B -> C + D GENE1 or GENE2 -1000.0 1000.0 0.0 Glycolysis Central_Metabolism FALSE
107 R00002 GENE3 and GENE4 E <-> F -1000.0 1000.0 0.0 FALSE mitochondria TCA_Cycle 80 EX_glc_e glc_e <-> - -1000.0 1000.0 0.0 Exchange Transport TRUE
108 EX_glc_e - glc_e <-> -1000.0 1000.0 0.0 TRUE extracellular Exchange
109 ``` 81 ```
110 82
111 #### Data Fields 83 **Other models (Recon):**
84 ```
85 ReactionID Formula GPR lower_bound upper_bound ObjectiveCoefficient InMedium TranslationIssues
86 R00001 A + B -> C + D GENE1 or GENE2 -1000.0 1000.0 0.0 FALSE
87 EX_glc_e glc_e <-> - -1000.0 1000.0 0.0 TRUE
88 ```
112 89
113 | Field | Description | Values | 90 **File Format Notes:**
114 |-------|-------------|---------| 91 - Output can be **tab-separated** (CSV) or Excel (XLSX)
115 | Reaction_ID | Unique reaction identifier | String | 92 - Contains all model information in tabular format
116 | GPR_Rule | Gene-protein-reaction association | Logical expression | 93 - Can be edited and re-imported using Export Metabolic Model
117 | Reaction_Formula | Stoichiometric equation | Metabolites with coefficients | 94
118 | Lower_Bound | Minimum flux constraint | Numeric (typically -1000) | 95 ## Understanding Medium Composition
119 | Upper_Bound | Maximum flux constraint | Numeric (typically 1000) | 96
120 | Objective_Coefficient | Biomass/objective weight | Numeric (0 or 1) | 97 Exchange reactions with `InMedium = TRUE` represent nutrients in the medium:
121 | Medium_Member | Exchange reaction flag | TRUE/FALSE | 98 - **Lower bound**: Maximum uptake rate (negative value, e.g., -10 = uptake of up to 10 mmol/gDW/hr)
122 | Compartment | Subcellular location | String (for ENGRO2 only) | 99 - **Upper bound**: Maximum secretion rate (positive value)
123 | Subsystem | Metabolic pathway | String | 100
101 Example:
102 ```
103 EX_glc_e glc_e <-> - -10.0 1000.0 0.0 TRUE
104 ```
105 Glucose uptake: up to 10 mmol/gDW/hr (lower bound = -10)
106
107 More info: [COBRApy Media Documentation](https://cobrapy.readthedocs.io/en/latest/media.html)
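To make the medium convention concrete, here is a minimal sketch that reads the tabular output and reports the maximum uptake rate of each reaction flagged `InMedium = TRUE`. It assumes the file is tab-separated and uses the column names shown under "Other models"; the inline `SAMPLE` data and the function name `medium_uptake` are hypothetical, and a real file would be the one written to `--out_tabular`.

```python
import csv
import io

# Hypothetical sample mirroring the documented column layout (tab-separated).
SAMPLE = "\n".join([
    "\t".join(["ReactionID", "Formula", "GPR", "lower_bound", "upper_bound",
               "ObjectiveCoefficient", "InMedium", "TranslationIssues"]),
    "\t".join(["R00001", "A + B -> C + D", "GENE1 or GENE2",
               "-1000.0", "1000.0", "0.0", "FALSE", ""]),
    "\t".join(["EX_glc_e", "glc_e <->", "-",
               "-10.0", "1000.0", "0.0", "TRUE", ""]),
])


def medium_uptake(handle):
    """Map each InMedium reaction ID to its maximum uptake rate."""
    uptake = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["InMedium"].strip().upper() == "TRUE":
            # Uptake is the negated lower bound; a non-negative bound means no uptake.
            uptake[row["ReactionID"]] = max(0.0, -float(row["lower_bound"]))
    return uptake


uptake = medium_uptake(io.StringIO(SAMPLE))
print(uptake)  # {'EX_glc_e': 10.0}
```

For a file on disk, the same function can be called on `open("model_data.csv", newline="")` in place of the `io.StringIO` wrapper.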
124 108
125 ## Examples 109 ## Examples
126 110
127 ### Extract Built-in Model Data 111 ### Extract Built-in Model
128 112
129 ```bash 113 ```bash
130 # Extract ENGRO2 model with default settings
131 importMetabolicModel --model ENGRO2 \ 114 importMetabolicModel --model ENGRO2 \
132 --name ENGRO2_extraction \ 115 --name ENGRO2_extraction \
133 --medium_selector allOpen \ 116 --medium_selector allOpen \
134 --out_tabular ENGRO2_data.csv \ 117 --out_tabular ENGRO2_data.csv \
135 --out_log ENGRO2_log.txt \ 118 --out_log ENGRO2_log.txt
136 --tool_dir /opt/COBRAxy/src
137 ``` 119 ```
138 120
139 ### Process Custom Model 121 ### Process Custom Model
140 122
141 ```bash 123 ```bash
142 # Extract custom SBML model 124 importMetabolicModel --input custom_model.xml \
143 importMetabolicModel --input /data/custom_model.xml \
144 --name CustomModel \ 125 --name CustomModel \
145 --medium_selector allOpen \ 126 --medium_selector allOpen \
146 --out_tabular custom_model_data.csv \ 127 --out_tabular custom_data.csv \
147 --out_log custom_extraction.log \ 128 --out_log custom_log.txt
148 --tool_dir /opt/COBRAxy/src
149 ``` 129 ```
150
151 ### Extract Core Model for Quick Analysis
152
153 ```bash
154 # Extract HMRcore for rapid prototyping
155 importMetabolicModel --model HMRcore \
156 --name CoreModel \
157 --medium_selector allOpen \
158 --out_tabular core_reactions.csv \
159 --out_log core_log.txt \
160 --tool_dir /opt/COBRAxy/src
161 ```
162
163 ### Batch Processing Multiple Models
164
165 ```bash
166 #!/bin/bash
167 models=("ENGRO2" "HMRcore" "Recon")
168 for model in "${models[@]}"; do
169 importMetabolicModel --model "$model" \
170 --name "${model}_extract" \
171 --medium_selector allOpen \
172 --out_tabular "${model}_data.csv" \
173 --out_log "${model}_log.txt" \
174 --tool_dir /opt/COBRAxy/src
175 done
176 ```
177
178 ## Use Cases
179
180 ### Model Comparison
181 Extract multiple models to compare:
182 - Reaction coverage across different reconstructions
183 - Gene-reaction associations
184 - Pathway representation
185 - Metabolite compartmentalization
186
187 ### Data Integration
188 Prepare model data for:
189 - Custom analysis pipelines
190 - Database integration
191 - Pathway annotation
192 - Cross-reference mapping
193
194 ### Quality Control
195 Validate model properties:
196 - Check reaction balancing
197 - Verify gene associations
198 - Assess network connectivity
199 - Identify missing annotations
200
201 ### Custom Analysis
202 Export structured data for:
203 - Network analysis (graph theory)
204 - Machine learning applications
205 - Statistical modeling
206 - Comparative genomics
207
208 ## Integration Workflow
209
210 ### Downstream Tools
211
212 The extracted tabular data serves as input for:
213
214 #### COBRAxy Tools
215 - [RAS Generator](ras-generator.md) - Use extracted GPR rules
216 - [RPS Generator](rps-generator.md) - Use reaction formulas
217 - [RAS to Bounds](ras-to-bounds.md) - Use reaction bounds
218 - [MAREA](marea.md) - Use reaction annotations
219
220 #### External Analysis
221 - **R/Bioconductor**: Import CSV for pathway analysis
222 - **Python/pandas**: Load data for network analysis
223 - **MATLAB**: Process XLSX for modeling
224 - **Cytoscape**: Network visualization
225 - **Databases**: Populate reaction databases
226
227 ### Typical Pipeline
228
229 ```bash
230 # 1. Extract model components
231 importMetabolicModel --model ENGRO2 --name ModelData \
232 --out_tabular model_components.csv \
233 --tool_dir /opt/COBRAxy/src
234
235 # 2. Use extracted data for RAS analysis
236 ras_generator -td /opt/COBRAxy/src -rs Custom \
237 -rl model_components.csv \
238 -in expression_data.tsv -ra ras_scores.tsv
239
240 # 3. Apply constraints and sample fluxes
241 ras_to_bounds -td /opt/COBRAxy/src -ms Custom -mo model_components.csv \
242 -ir ras_scores.tsv -idop constrained_bounds/
243
244 # 4. Visualize results
245 marea -td /opt/COBRAxy/src -input_data ras_scores.tsv \
246 -choice_map Custom -custom_map custom.svg -idop results/
247 ```
248
249 ## Quality Control
250
251 ### Pre-extraction Validation
252 - Verify model file integrity and format
253 - Check SBML compliance for custom models
254 - Validate gene ID formats and coverage
255 - Confirm medium constraint specifications
256
257 ### Post-extraction Checks
258 - **Completeness**: Verify all expected reactions extracted
259 - **Consistency**: Check stoichiometric balance
260 - **Annotations**: Validate gene-reaction associations
261 - **Formatting**: Confirm output file structure
262
263 ### Data Validation
264
265 #### Reaction Balancing
266 ```bash
267 # List reactions whose formula lacks a reaction arrow (-> or <->)
268 awk -F'\t' 'NR>1 && $3 !~ /->/ {print $1, $3}' model_data.csv
269 ```
270
271 #### Gene Coverage
272 ```bash
273 # Count reactions with GPR rules
274 awk -F'\t' 'NR>1 && $2 != "" {count++} END {print count " reactions with GPR"}' model_data.csv
275 ```
276
277 #### Exchange Reactions
278 ```bash
279 # List medium components
280 awk -F'\t' 'NR>1 && $7 == "TRUE" {print $1}' model_data.csv
281 ```
282
283 ## Tips and Best Practices
284
285 ### Model Selection
286 - **ENGRO2**: Balanced coverage for human tissue analysis
287 - **HMRcore**: Fast processing for algorithm development
288 - **Recon**: Comprehensive analysis requiring computational resources
289 - **Custom**: Organism-specific or specialized models
290
291 ### Output Format Optimization
292 - **CSV**: Lightweight, universal compatibility
293 - Choose based on downstream analysis requirements
294
295 ### Performance Considerations
296 - Large models (Recon) may require substantial memory
297 - Consider batch processing for multiple extractions
298 130
299 ## Troubleshooting 131 ## Troubleshooting
300 132
301 ### Common Issues 133 | Error | Solution |
302 134 |-------|----------|
303 **Model loading fails** 135 | "Model file not found" | Check file path |
304 - Check file format and compression 136 | "Unsupported format" | Use SBML, JSON, MAT, or YAML |
305 - Verify SBML/JSON/MAT/YAML validity for custom models
306 - Ensure sufficient system memory
307
308 **Empty output file**
309 - Model may contain no reactions
310 - Check model file integrity
311 - Verify tool directory configuration
312
313 ### Error Messages
314
315 | Error | Cause | Solution |
316 |-------|-------|----------|
317 | "Model file not found" | Invalid file path | Check file location and permissions |
318 | "Unsupported format" | Invalid model format | Use SBML, JSON, MAT, or YAML |
319 | "Memory allocation error" | Insufficient system memory | Use smaller model or increase memory |
320
321 ### Performance Issues
322
323 **Slow processing**
324 - Large models require more time
325 - Monitor system resource usage
326
327 **Memory errors**
328 - Reduce model size if possible
329 - Process in smaller batches
330 - Increase available system memory
331
332 **Output file corruption**
333 - Check disk space availability
334 - Verify file write permissions
335 - Monitor for system interruptions
336
337 ## Advanced Usage
338
339 ### Batch Extraction Script
340
341 ```python
342 #!/usr/bin/env python3
343 import subprocess
344 import sys
345
346 models = ['ENGRO2', 'HMRcore', 'Recon']
347
348 for model in models:
349 cmd = [
350 'importMetabolicModel',
351 '--model', model,
352 '--name', f'{model}_data',
353 '--medium_selector', 'allOpen',
354 '--out_tabular', f'{model}.csv',
355 '--out_log', f'{model}.log',
356 '--tool_dir', '/opt/COBRAxy/src'
357 ]
358 subprocess.run(cmd, check=True)
359 ```
360
361 ### Database Integration
362
363 Export model data to databases:
364
365 ```sql
366 -- Load CSV into PostgreSQL
367 CREATE TABLE model_reactions (
368 reaction_id VARCHAR(50),
369 gpr_rule TEXT,
370 reaction_formula TEXT,
371 lower_bound FLOAT,
372 upper_bound FLOAT,
373 objective_coefficient FLOAT,
374 medium_member BOOLEAN,
375 compartment VARCHAR(50),
376 subsystem VARCHAR(100)
377 );
378
379 COPY model_reactions FROM 'model_data.csv' WITH CSV HEADER;
380 ```
381 137
382 ## See Also 138 ## See Also
383 139
384 - [Export Metabolic Model](export-metabolic-model.md) - Export tabular data to model formats 140 - [Export Metabolic Model](reference/export-metabolic-model)
385 - [RAS Generator](ras-generator.md) - Use extracted GPR rules for RAS computation 141 - [RAS Generator](tools/ras-generator)
386 - [RPS Generator](rps-generator.md) - Use reaction formulas for RPS analysis 142 - [RPS Generator](tools/rps-generator)
387 - [Custom Model Tutorial](/tutorials/custom-model-integration.md)