comparison COBRAxy/docs/tools/import-metabolic-model.md @ 542:fcdbc81feb45 draft

Uploaded
author francesco_lapi
date Sun, 26 Oct 2025 19:27:41 +0000
parents
children 73f2f7e2be17
541:fa93040a75af 542:fcdbc81feb45
1 # Import Metabolic Model
2
3 Import and extract metabolic model components into tabular format for analysis and integration.
4
5 ## Overview
6
7 Import Metabolic Model (`importMetabolicModel`) loads a metabolic model in SBML, JSON, MAT, or YAML format and extracts its key components into a tabular summary. The tool processes built-in or custom models, applies medium constraints, handles gene nomenclature conversion, and writes structured data for downstream analysis.
8
9 ## Usage
10
11 ### Command Line
12
13 ```bash
14 importMetabolicModel --model ENGRO2 \
15 --name ENGRO2 \
16 --medium_selector allOpen \
17 --out_tabular model_data.csv \
18 --out_log extraction.log \
19 --tool_dir /path/to/COBRAxy/src
20 ```
21
22 ### Galaxy Interface
23
24 Select "Import Metabolic Model" from the COBRAxy tool suite and configure model extraction parameters.
25
26 ## Parameters
27
28 ### Required Parameters
29
30 | Parameter | Flag | Description |
31 |-----------|------|-------------|
32 | Model Name | `--name` | Model identifier for output files |
33 | Medium Selector | `--medium_selector` | Medium configuration option |
34 | Output Tabular | `--out_tabular` | Output file path (CSV or XLSX) |
35 | Output Log | `--out_log` | Log file for processing information |
36 | Tool Directory | `--tool_dir` | COBRAxy installation directory |
37
38 ### Model Selection Parameters
39
40 | Parameter | Flag | Description | Default |
41 |-----------|------|-------------|---------|
42 | Built-in Model | `--model` | Pre-installed model (ENGRO2, Recon, HMRcore) | - |
43 | Custom Model | `--input` | Path to custom model file (SBML, JSON, MAT, or YAML) | - |
44
45 **Note**: Provide either `--model` OR `--input`, not both.
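
The either/or rule above can be checked before the tool is ever invoked. The following is a minimal sketch, not part of COBRAxy: `build_import_command` is a hypothetical helper that assembles the flags documented in the tables above and rejects calls that set both, or neither, of `--model` and `--input`.

```python
from typing import List, Optional


def build_import_command(name: str, out_tabular: str, out_log: str, tool_dir: str,
                         model: Optional[str] = None,
                         input_path: Optional[str] = None,
                         medium_selector: str = "allOpen") -> List[str]:
    """Assemble an importMetabolicModel argument list (hypothetical helper)."""
    # Exactly one model source is allowed: built-in (--model) or custom (--input).
    if (model is None) == (input_path is None):
        raise ValueError("Provide exactly one of model or input_path")
    source = ["--model", model] if model is not None else ["--input", input_path]
    return ["importMetabolicModel", *source,
            "--name", name,
            "--medium_selector", medium_selector,
            "--out_tabular", out_tabular,
            "--out_log", out_log,
            "--tool_dir", tool_dir]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, as in the batch script under Advanced Usage.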
46
47 ### Optional Parameters
48
49 | Parameter | Flag | Description | Default |
50 |-----------|------|-------------|---------|
51 | Custom Medium | `--custom_medium` | CSV file with medium constraints | - |
52
53 ## Model Selection
54
55 ### Built-in Models
56
57 #### ENGRO2
58 - **Species**: Homo sapiens
59 - **Scope**: Genome-scale reconstruction
60 - **Reactions**: ~2,000 reactions
61 - **Metabolites**: ~1,500 metabolites
62 - **Coverage**: Comprehensive human metabolism
63
64 #### Recon
65 - **Species**: Homo sapiens
66 - **Scope**: Recon3D human reconstruction
67 - **Reactions**: >10,000 reactions
68 - **Metabolites**: >5,000 metabolites
69 - **Coverage**: Most comprehensive human model
70
71 #### HMRcore
72 - **Species**: Homo sapiens
73 - **Scope**: Core metabolic network
74 - **Reactions**: ~300 essential reactions
75 - **Metabolites**: ~200 core metabolites
76 - **Coverage**: Central carbon and energy metabolism
77
78 ### Custom Models
79
80 Supported formats for custom model import:
81 - **SBML**: Systems Biology Markup Language (.xml, .sbml)
82 - **JSON**: COBRApy JSON format (.json)
83 - **MAT**: MATLAB format (.mat)
84 - **YML**: YAML format (.yml, .yaml)
85 - **Compressed**: All formats support .gz, .zip, .bz2 compression
86
87 ## Medium Configuration
88
89 ### allOpen (Default)
90 - All exchange reactions unconstrained
91 - Maximum metabolic flexibility
92 - Suitable for general analysis
93
94 ### Custom Medium
95 Users can specify custom medium constraints by providing a CSV file with exchange reaction bounds.
96
97 ## Output Format
98
99 ### Tabular Summary File
100
101 The output contains comprehensive model information in CSV or XLSX format:
102
103 #### Column Structure
104 ```
105 Reaction_ID GPR_Rule Reaction_Formula Lower_Bound Upper_Bound Objective_Coefficient Medium_Member Compartment Subsystem
106 R00001 GENE1 or GENE2 A + B -> C + D -1000.0 1000.0 0.0 FALSE cytosol Glycolysis
107 R00002 GENE3 and GENE4 E <-> F -1000.0 1000.0 0.0 FALSE mitochondria TCA_Cycle
108 EX_glc_e - glc_e <-> -1000.0 1000.0 0.0 TRUE extracellular Exchange
109 ```
110
111 #### Data Fields
112
113 | Field | Description | Values |
114 |-------|-------------|---------|
115 | Reaction_ID | Unique reaction identifier | String |
116 | GPR_Rule | Gene-protein-reaction association | Logical expression |
117 | Reaction_Formula | Stoichiometric equation | Metabolites with coefficients |
118 | Lower_Bound | Minimum flux constraint | Numeric (typically -1000) |
119 | Upper_Bound | Maximum flux constraint | Numeric (typically 1000) |
120 | Objective_Coefficient | Biomass/objective weight | Numeric (0 or 1) |
121 | Medium_Member | Exchange reaction flag | TRUE/FALSE |
122 | Compartment | Subcellular location | String (for ENGRO2 only) |
123 | Subsystem | Metabolic pathway | String |
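
To show how the fields above map onto typed values, here is a small sketch that reads the table with Python's standard `csv` module. It assumes the file is tab-delimited with the column names shown in the sample (pass `delimiter=","` if your file is comma-separated); `load_reactions` is an illustrative name, not a COBRAxy function.

```python
import csv
import io
from typing import Dict, List


def load_reactions(handle, delimiter: str = "\t") -> List[Dict[str, object]]:
    """Read the tabular summary and convert numeric and boolean fields."""
    rows = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        row["Lower_Bound"] = float(row["Lower_Bound"])
        row["Upper_Bound"] = float(row["Upper_Bound"])
        row["Objective_Coefficient"] = float(row["Objective_Coefficient"])
        # Medium_Member is documented as TRUE/FALSE text.
        row["Medium_Member"] = row["Medium_Member"].strip().upper() == "TRUE"
        rows.append(row)
    return rows


# Two rows from the sample table above, written as tab-delimited text.
sample = (
    "Reaction_ID\tGPR_Rule\tReaction_Formula\tLower_Bound\tUpper_Bound\t"
    "Objective_Coefficient\tMedium_Member\tCompartment\tSubsystem\n"
    "R00001\tGENE1 or GENE2\tA + B -> C + D\t-1000.0\t1000.0\t0.0\tFALSE\tcytosol\tGlycolysis\n"
    "EX_glc_e\t-\tglc_e <->\t-1000.0\t1000.0\t0.0\tTRUE\textracellular\tExchange\n"
)
reactions = load_reactions(io.StringIO(sample))
medium = [r["Reaction_ID"] for r in reactions if r["Medium_Member"]]
print(medium)  # ['EX_glc_e']
```

For a file on disk, open it with `open("model_data.csv", newline="")` and pass the handle to `load_reactions`.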
124
125 ## Examples
126
127 ### Extract Built-in Model Data
128
129 ```bash
130 # Extract ENGRO2 model with default settings
131 importMetabolicModel --model ENGRO2 \
132 --name ENGRO2_extraction \
133 --medium_selector allOpen \
134 --out_tabular ENGRO2_data.csv \
135 --out_log ENGRO2_log.txt \
136 --tool_dir /opt/COBRAxy/src
137 ```
138
139 ### Process Custom Model
140
141 ```bash
142 # Extract custom SBML model
143 importMetabolicModel --input /data/custom_model.xml \
144 --name CustomModel \
145 --medium_selector allOpen \
146 --out_tabular custom_model_data.csv \
147 --out_log custom_extraction.log \
148 --tool_dir /opt/COBRAxy/src
149 ```
150
151 ### Extract Core Model for Quick Analysis
152
153 ```bash
154 # Extract HMRcore for rapid prototyping
155 importMetabolicModel --model HMRcore \
156 --name CoreModel \
157 --medium_selector allOpen \
158 --out_tabular core_reactions.csv \
159 --out_log core_log.txt \
160 --tool_dir /opt/COBRAxy/src
161 ```
162
163 ### Batch Processing Multiple Models
164
165 ```bash
166 #!/bin/bash
167 models=("ENGRO2" "HMRcore" "Recon")
168 for model in "${models[@]}"; do
169   importMetabolicModel --model "$model" \
170     --name "${model}_extract" \
171     --medium_selector allOpen \
172     --out_tabular "${model}_data.csv" \
173     --out_log "${model}_log.txt" \
174     --tool_dir /opt/COBRAxy/src
175 done
176 ```
177
178 ## Use Cases
179
180 ### Model Comparison
181 Extract multiple models to compare:
182 - Reaction coverage across different reconstructions
183 - Gene-reaction associations
184 - Pathway representation
185 - Metabolite compartmentalization
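
As a rough illustration of the reaction-coverage comparison, the sketch below collects the `Reaction_ID` column from two extracted tables and reports shared and model-specific identifiers. It assumes tab-delimited files, and the result is only meaningful when both models use the same identifier namespace; the function names and the toy tables are illustrative.

```python
import csv
import io
from typing import Dict, Set


def reaction_ids(handle, delimiter: str = "\t") -> Set[str]:
    """Collect the Reaction_ID column of one extracted table."""
    return {row["Reaction_ID"] for row in csv.DictReader(handle, delimiter=delimiter)}


def compare_coverage(ids_a: Set[str], ids_b: Set[str]) -> Dict[str, Set[str]]:
    """Split identifiers into shared and model-specific sets."""
    return {"shared": ids_a & ids_b, "only_a": ids_a - ids_b, "only_b": ids_b - ids_a}


# Toy tables standing in for two extraction outputs.
table_a = "Reaction_ID\tGPR_Rule\nR00001\tGENE1\nR00002\tGENE3\n"
table_b = "Reaction_ID\tGPR_Rule\nR00002\tGENE3\nR00003\tGENE5\n"

result = compare_coverage(reaction_ids(io.StringIO(table_a)),
                          reaction_ids(io.StringIO(table_b)))
print(sorted(result["shared"]))  # ['R00002']
```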
186
187 ### Data Integration
188 Prepare model data for:
189 - Custom analysis pipelines
190 - Database integration
191 - Pathway annotation
192 - Cross-reference mapping
193
194 ### Quality Control
195 Validate model properties:
196 - Check reaction balancing
197 - Verify gene associations
198 - Assess network connectivity
199 - Identify missing annotations
200
201 ### Custom Analysis
202 Export structured data for:
203 - Network analysis (graph theory)
204 - Machine learning applications
205 - Statistical modeling
206 - Comparative genomics
207
208 ## Integration Workflow
209
210 ### Downstream Tools
211
212 The extracted tabular data serves as input for:
213
214 #### COBRAxy Tools
215 - [RAS Generator](ras-generator.md) - Use extracted GPR rules
216 - [RPS Generator](rps-generator.md) - Use reaction formulas
217 - [RAS to Bounds](ras-to-bounds.md) - Use reaction bounds
218 - [MAREA](marea.md) - Use reaction annotations
219
220 #### External Analysis
221 - **R/Bioconductor**: Import CSV for pathway analysis
222 - **Python/pandas**: Load data for network analysis
223 - **MATLAB**: Process XLSX for modeling
224 - **Cytoscape**: Network visualization
225 - **Databases**: Populate reaction databases
226
227 ### Typical Pipeline
228
229 ```bash
230 # 1. Extract model components
231 importMetabolicModel --model ENGRO2 --name ModelData \
232 --medium_selector allOpen --out_tabular model_components.csv \
233 --out_log extraction.log --tool_dir /opt/COBRAxy/src
234
235 # 2. Use extracted data for RAS analysis
236 ras_generator -td /opt/COBRAxy/src -rs Custom \
237 -rl model_components.csv \
238 -in expression_data.tsv -ra ras_scores.tsv
239
240 # 3. Apply RAS-derived constraints to the reaction bounds
241 ras_to_bounds -td /opt/COBRAxy/src -ms Custom -mo model_components.csv \
242 -ir ras_scores.tsv -idop constrained_bounds/
243
244 # 4. Visualize results
245 marea -td /opt/COBRAxy/src -input_data ras_scores.tsv \
246 -choice_map Custom -custom_map custom.svg -idop results/
247 ```
248
249 ## Quality Control
250
251 ### Pre-extraction Validation
252 - Verify model file integrity and format
253 - Check SBML compliance for custom models
254 - Validate gene ID formats and coverage
255 - Confirm medium constraint specifications
256
257 ### Post-extraction Checks
258 - **Completeness**: Verify all expected reactions extracted
259 - **Consistency**: Check stoichiometric balance
260 - **Annotations**: Validate gene-reaction associations
261 - **Formatting**: Confirm output file structure
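
Some of the post-extraction checks can be automated. The sketch below is one possible approach rather than a COBRAxy feature: it assumes a tab-delimited table with the column names from the Output Format section, and `check_table` is a hypothetical helper that reports missing columns, duplicate reaction IDs, inverted bounds, and empty tables. It does not test stoichiometric balance.

```python
import csv
import io
from typing import List

# Columns this sketch relies on; Compartment is left out because the field
# table documents it for ENGRO2 only.
REQUIRED_COLUMNS = ["Reaction_ID", "GPR_Rule", "Reaction_Formula",
                    "Lower_Bound", "Upper_Bound"]


def check_table(handle, delimiter: str = "\t") -> List[str]:
    """Return a list of human-readable problems found in the table."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems, seen = [], set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        rid = row["Reaction_ID"]
        if rid in seen:
            problems.append(f"line {line_no}: duplicate Reaction_ID {rid}")
        seen.add(rid)
        if float(row["Lower_Bound"]) > float(row["Upper_Bound"]):
            problems.append(f"line {line_no}: lower bound exceeds upper bound for {rid}")
    if not seen:
        problems.append("no reactions found")
    return problems


header = "\t".join(REQUIRED_COLUMNS)
good = header + "\nR00001\tGENE1 or GENE2\tA + B -> C + D\t-1000.0\t1000.0\n"
print(check_table(io.StringIO(good)))  # []
```

An empty list means none of these particular checks failed; it is not a guarantee that the model is complete.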
262
263 ### Data Validation
264
265 #### Reaction Balancing
266 ```bash
267 # List reactions whose formula has no arrow (-> or <->); assumes tab-delimited output
268 awk -F'\t' 'NR>1 && $3 !~ /->/ {print $1, $3}' model_data.csv
269 ```
270
271 #### Gene Coverage
272 ```bash
273 # Count reactions with GPR rules (skips empty cells and the "-" placeholder)
274 awk -F'\t' 'NR>1 && $2 != "" && $2 != "-" {count++} END {print count+0 " reactions with GPR"}' model_data.csv
275 ```
276
277 #### Exchange Reactions
278 ```bash
279 # List medium components
280 awk -F'\t' 'NR>1 && $7 == "TRUE" {print $1}' model_data.csv
281 ```
282
283 ## Tips and Best Practices
284
285 ### Model Selection
286 - **ENGRO2**: Balanced coverage for human tissue analysis
287 - **HMRcore**: Fast processing for algorithm development
288 - **Recon**: Comprehensive analysis requiring computational resources
289 - **Custom**: Organism-specific or specialized models
290
291 ### Output Format Optimization
292 - **CSV**: Lightweight, universal compatibility
293 - Choose CSV or XLSX based on downstream analysis requirements
294
295 ### Performance Considerations
296 - Large models (Recon) may require substantial memory
297 - Consider batch processing for multiple extractions
298
299 ## Troubleshooting
300
301 ### Common Issues
302
303 **Model loading fails**
304 - Check file format and compression
305 - Verify SBML/JSON/MAT/YAML validity for custom models
306 - Ensure sufficient system memory
307
308 **Empty output file**
309 - Model may contain no reactions
310 - Check model file integrity
311 - Verify tool directory configuration
312
313 ### Error Messages
314
315 | Error | Cause | Solution |
316 |-------|-------|----------|
317 | "Model file not found" | Invalid file path | Check file location and permissions |
318 | "Unsupported format" | Invalid model format | Use SBML, JSON, MAT, or YAML |
319 | "Memory allocation error" | Insufficient system memory | Use smaller model or increase memory |
320
321 ### Performance Issues
322
323 **Slow processing**
324 - Large models require more time
325 - Monitor system resource usage
326
327 **Memory errors**
328 - Reduce model size if possible
329 - Process in smaller batches
330 - Increase available system memory
331
332 **Output file corruption**
333 - Check disk space availability
334 - Verify file write permissions
335 - Monitor for system interruptions
336
337 ## Advanced Usage
338
339 ### Batch Extraction Script
340
341 ```python
342 #!/usr/bin/env python3
343 import subprocess
344 import sys
345
346 models = ['ENGRO2', 'HMRcore', 'Recon']
347
348 for model in models:
349     cmd = [
350         'importMetabolicModel',
351         '--model', model,
352         '--name', f'{model}_data',
353         '--medium_selector', 'allOpen',
354         '--out_tabular', f'{model}.csv',
355         '--out_log', f'{model}.log',
356         '--tool_dir', '/opt/COBRAxy/src'
357     ]
358     subprocess.run(cmd, check=True)
359 ```
360
361 ### Database Integration
362
363 Export model data to databases:
364
365 ```sql
366 -- Load CSV into PostgreSQL
367 CREATE TABLE model_reactions (
368 reaction_id VARCHAR(50),
369 gpr_rule TEXT,
370 reaction_formula TEXT,
371 lower_bound FLOAT,
372 upper_bound FLOAT,
373 objective_coefficient FLOAT,
374 medium_member BOOLEAN,
375 compartment VARCHAR(50),
376 subsystem VARCHAR(100)
377 );
378
379 COPY model_reactions FROM '/path/to/model_data.csv' WITH (FORMAT csv, HEADER true);
380 ```
381
382 ## See Also
383
384 - [Export Metabolic Model](export-metabolic-model.md) - Export tabular data to model formats
385 - [RAS Generator](ras-generator.md) - Use extracted GPR rules for RAS computation
386 - [RPS Generator](rps-generator.md) - Use reaction formulas for RPS analysis
387 - [Custom Model Tutorial](/tutorials/custom-model-integration.md)