# Import Metabolic Model

Import and extract metabolic model components into tabular format for analysis and integration.

## Overview

Import Metabolic Model (`importMetabolicModel`) imports metabolic models from various formats (SBML, JSON, MAT, YAML) and extracts key components into comprehensive tabular summaries. This tool processes built-in or custom models, applies medium constraints, handles gene nomenclature conversion, and outputs structured data for downstream analysis.

## Usage

### Command Line

```bash
importMetabolicModel --model ENGRO2 \
                     --name ENGRO2 \
                     --medium_selector allOpen \
                     --out_tabular model_data.csv \
                     --out_log extraction.log \
                     --tool_dir /path/to/COBRAxy/src
```

### Galaxy Interface

Select "Import Metabolic Model" from the COBRAxy tool suite and configure model extraction parameters.

## Parameters

### Required Parameters

| Parameter | Flag | Description |
|-----------|------|-------------|
| Model Name | `--name` | Model identifier for output files |
| Medium Selector | `--medium_selector` | Medium configuration option |
| Output Tabular | `--out_tabular` | Output file path (CSV or XLSX) |
| Output Log | `--out_log` | Log file for processing information |
| Tool Directory | `--tool_dir` | COBRAxy installation directory |

### Model Selection Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Built-in Model | `--model` | Pre-installed model (ENGRO2, Recon, HMRcore) | - |
| Custom Model | `--input` | Path to custom model file (SBML, JSON, MAT, or YAML) | - |

**Note**: Provide either `--model` OR `--input`, not both.
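
A wrapper script can enforce this either/or rule before the tool is invoked. The sketch below uses a mutually exclusive group from Python's `argparse` to illustrate the idea. It assumes exactly one of the two flags must be supplied and is not the tool's actual argument parser; `build_parser` and the `prog` name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors only the two model-selection flags."""
    parser = argparse.ArgumentParser(prog="import-model-wrapper")
    # required=True: exactly one of --model / --input must be given
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--model", choices=["ENGRO2", "Recon", "HMRcore"])
    source.add_argument("--input")
    return parser


args = build_parser().parse_args(["--model", "ENGRO2"])
print(args.model, args.input)
```

Passing both flags, or neither, makes `argparse` exit with a usage error, so the mistake surfaces before any model is loaded.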

### Optional Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Custom Medium | `--custom_medium` | CSV file with medium constraints | - |

## Model Selection

### Built-in Models

#### ENGRO2
- **Species**: Homo sapiens
- **Scope**: Genome-scale reconstruction
- **Reactions**: ~2,000 reactions
- **Metabolites**: ~1,500 metabolites
- **Coverage**: Comprehensive human metabolism

#### Recon
- **Species**: Homo sapiens
- **Scope**: Recon3D human reconstruction
- **Reactions**: ~10,000+ reactions
- **Metabolites**: ~5,000+ metabolites
- **Coverage**: Most comprehensive human model

#### HMRcore
- **Species**: Homo sapiens
- **Scope**: Core metabolic network
- **Reactions**: ~300 essential reactions
- **Metabolites**: ~200 core metabolites
- **Coverage**: Central carbon and energy metabolism

### Custom Models

Supported formats for custom model import:
- **SBML**: Systems Biology Markup Language (.xml, .sbml)
- **JSON**: COBRApy JSON format (.json)
- **MAT**: MATLAB format (.mat)
- **YAML**: YAML format (.yml, .yaml)
- **Compressed**: All formats support .gz, .zip, .bz2 compression
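
The extension list above can be turned into a quick pre-flight check on a file name. The following is a minimal sketch that assumes the format is inferred purely from the file suffix; `detect_format` is a hypothetical helper, and the tool itself may detect formats differently.

```python
from pathlib import Path
from typing import Optional, Tuple

# Extension tables taken from the list of supported formats
MODEL_FORMATS = {
    ".xml": "SBML", ".sbml": "SBML", ".json": "JSON",
    ".mat": "MAT", ".yml": "YAML", ".yaml": "YAML",
}
COMPRESSION = {".gz", ".zip", ".bz2"}


def detect_format(filename: str) -> Tuple[str, Optional[str]]:
    """Return (model format, compression suffix or None) for a file name."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    compression = None
    # Peel off a trailing compression suffix such as .gz
    if suffixes and suffixes[-1] in COMPRESSION:
        compression = suffixes.pop()
    if not suffixes or suffixes[-1] not in MODEL_FORMATS:
        raise ValueError(f"Unsupported model format: {filename}")
    return MODEL_FORMATS[suffixes[-1]], compression


print(detect_format("custom_model.xml.gz"))
```

A suffix check only catches obvious mistakes; a file with a valid extension can still contain an invalid model.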

## Medium Configuration

### allOpen (Default)
- All exchange reactions unconstrained
- Maximum metabolic flexibility
- Suitable for general analysis

### Custom Medium
Users can specify custom medium constraints by providing a CSV file with exchange reaction bounds.

## Output Format

### Tabular Summary File

The output contains comprehensive model information in CSV or XLSX format:

#### Column Structure
```
Reaction_ID    GPR_Rule           Reaction_Formula    Lower_Bound    Upper_Bound    Objective_Coefficient    Medium_Member    Compartment      Subsystem
R00001         GENE1 or GENE2     A + B -> C + D      -1000.0        1000.0         0.0                      FALSE            cytosol          Glycolysis
R00002         GENE3 and GENE4    E <-> F             -1000.0        1000.0         0.0                      FALSE            mitochondria     TCA_Cycle
EX_glc_e       -                  glc_e <->           -1000.0        1000.0         0.0                      TRUE             extracellular    Exchange
```

#### Data Fields

| Field | Description | Values |
|-------|-------------|--------|
| Reaction_ID | Unique reaction identifier | String |
| GPR_Rule | Gene-protein-reaction association | Logical expression |
| Reaction_Formula | Stoichiometric equation | Metabolites with coefficients |
| Lower_Bound | Minimum flux constraint | Numeric (typically -1000) |
| Upper_Bound | Maximum flux constraint | Numeric (typically 1000) |
| Objective_Coefficient | Biomass/objective weight | Numeric (0 or 1) |
| Medium_Member | Exchange reaction flag | TRUE/FALSE |
| Compartment | Subcellular location | String (for ENGRO2 only) |
| Subsystem | Metabolic pathway | String |
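
When loading the table in a script, the numeric and boolean fields usually need converting from text. The sketch below shows one way to do this with Python's standard `csv` module. It assumes a tab-delimited file with the column names listed above (pass `delimiter=","` for comma-separated output); `load_reactions` is a hypothetical helper, and the sample rows are copied from the column-structure example.

```python
import csv
import io


def load_reactions(handle, delimiter="\t"):
    """Parse the tabular summary into a list of dicts with typed fields."""
    rows = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        row["Lower_Bound"] = float(row["Lower_Bound"])
        row["Upper_Bound"] = float(row["Upper_Bound"])
        row["Objective_Coefficient"] = float(row["Objective_Coefficient"])
        # TRUE/FALSE text becomes a Python bool
        row["Medium_Member"] = row["Medium_Member"].strip().upper() == "TRUE"
        rows.append(row)
    return rows


header = ("Reaction_ID\tGPR_Rule\tReaction_Formula\tLower_Bound\tUpper_Bound\t"
          "Objective_Coefficient\tMedium_Member\tCompartment\tSubsystem\n")
sample = header + (
    "R00001\tGENE1 or GENE2\tA + B -> C + D\t-1000.0\t1000.0\t0.0\tFALSE\tcytosol\tGlycolysis\n"
    "EX_glc_e\t-\tglc_e <->\t-1000.0\t1000.0\t0.0\tTRUE\textracellular\tExchange\n"
)
reactions = load_reactions(io.StringIO(sample))
print(reactions[1]["Reaction_ID"], reactions[1]["Medium_Member"])
```

To read a real export, replace the `io.StringIO` object with `open("model_data.csv", newline="")`.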

## Examples

### Extract Built-in Model Data

```bash
# Extract ENGRO2 model with default settings
importMetabolicModel --model ENGRO2 \
                     --name ENGRO2_extraction \
                     --medium_selector allOpen \
                     --out_tabular ENGRO2_data.csv \
                     --out_log ENGRO2_log.txt \
                     --tool_dir /opt/COBRAxy/src
```

### Process Custom Model

```bash
# Extract custom SBML model
importMetabolicModel --input /data/custom_model.xml \
                     --name CustomModel \
                     --medium_selector allOpen \
                     --out_tabular custom_model_data.csv \
                     --out_log custom_extraction.log \
                     --tool_dir /opt/COBRAxy/src
```

### Extract Core Model for Quick Analysis

```bash
# Extract HMRcore for rapid prototyping
importMetabolicModel --model HMRcore \
                     --name CoreModel \
                     --medium_selector allOpen \
                     --out_tabular core_reactions.csv \
                     --out_log core_log.txt \
                     --tool_dir /opt/COBRAxy/src
```

### Batch Processing Multiple Models

```bash
#!/bin/bash
models=("ENGRO2" "HMRcore" "Recon")
for model in "${models[@]}"; do
  importMetabolicModel --model "$model" \
                       --name "${model}_extract" \
                       --medium_selector allOpen \
                       --out_tabular "${model}_data.csv" \
                       --out_log "${model}_log.txt" \
                       --tool_dir /opt/COBRAxy/src
done
```

## Use Cases

### Model Comparison
Extract multiple models to compare:
- Reaction coverage across different reconstructions
- Gene-reaction associations
- Pathway representation
- Metabolite compartmentalization
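
As an illustration of the first point, reaction coverage can be compared by intersecting the `Reaction_ID` columns of two extracted tables. This is a sketch only: it assumes tab-delimited tables, the helper names are hypothetical, and the two toy tables use made-up IDs.

```python
import csv
import io


def reaction_ids(handle, delimiter="\t"):
    """Collect the set of Reaction_ID values from an extracted table."""
    return {row["Reaction_ID"] for row in csv.DictReader(handle, delimiter=delimiter)}


def compare_models(ids_a, ids_b):
    """Split reaction IDs into shared and model-specific groups."""
    return {
        "shared": sorted(ids_a & ids_b),
        "only_a": sorted(ids_a - ids_b),
        "only_b": sorted(ids_b - ids_a),
    }


# Toy tables with made-up IDs, standing in for two extracted models
table_a = "Reaction_ID\tSubsystem\nR1\tGlycolysis\nR2\tTCA_Cycle\n"
table_b = "Reaction_ID\tSubsystem\nR2\tTCA_Cycle\nR3\tExchange\n"
result = compare_models(reaction_ids(io.StringIO(table_a)),
                        reaction_ids(io.StringIO(table_b)))
print(result)
```

Raw ID overlap is only meaningful when both models use the same identifier namespace; otherwise the IDs need to be mapped to a common scheme first.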

### Data Integration
Prepare model data for:
- Custom analysis pipelines
- Database integration
- Pathway annotation
- Cross-reference mapping

### Quality Control
Validate model properties:
- Check reaction balancing
- Verify gene associations
- Assess network connectivity
- Identify missing annotations

### Custom Analysis
Export structured data for:
- Network analysis (graph theory)
- Machine learning applications
- Statistical modeling
- Comparative genomics

## Integration Workflow

### Downstream Tools

The extracted tabular data serves as input for:

#### COBRAxy Tools
- [RAS Generator](ras-generator.md) - Use extracted GPR rules
- [RPS Generator](rps-generator.md) - Use reaction formulas
- [RAS to Bounds](ras-to-bounds.md) - Use reaction bounds
- [MAREA](marea.md) - Use reaction annotations

#### External Analysis
- **R/Bioconductor**: Import CSV for pathway analysis
- **Python/pandas**: Load data for network analysis
- **MATLAB**: Process XLSX for modeling
- **Cytoscape**: Network visualization
- **Databases**: Populate reaction databases
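
For network analysis or visualization, each `Reaction_Formula` has to be split into substrates and products first. The sketch below does this with the standard library only. It assumes formulas use `->` and `<->` arrows with ` + ` between terms and an optional leading numeric coefficient, as in the column-structure example; real exports may follow other conventions, and the function names are hypothetical.

```python
import re

ARROW = re.compile(r"\s*(<->|->)\s*")
COEFF = re.compile(r"^(\d+(?:\.\d+)?)\s+(\S.*)$")


def parse_side(side):
    """Map each metabolite on one side of a formula to its coefficient."""
    species = {}
    for term in filter(None, (t.strip() for t in side.split(" + "))):
        match = COEFF.match(term)
        if match:
            species[match.group(2)] = float(match.group(1))
        else:
            species[term] = 1.0  # no explicit coefficient
    return species


def parse_formula(formula):
    """Split 'A + B -> C + D' into (substrates, products, reversible)."""
    parts = ARROW.split(formula.strip(), maxsplit=1)
    if len(parts) != 3:
        raise ValueError(f"No reaction arrow in formula: {formula!r}")
    left, arrow, right = parts
    return parse_side(left), parse_side(right), arrow == "<->"


def reaction_edges(reaction_id, formula):
    """Directed metabolite->reaction and reaction->metabolite edges."""
    substrates, products, reversible = parse_formula(formula)
    edges = [(m, reaction_id) for m in substrates]
    edges += [(reaction_id, m) for m in products]
    return edges, reversible


print(reaction_edges("R00001", "A + B -> C + D"))
```

The resulting edge list describes a bipartite metabolite-reaction graph that can be passed to a graph library or written out for Cytoscape.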

### Typical Pipeline

```bash
# 1. Extract model components
importMetabolicModel --model ENGRO2 --name ModelData \
                     --medium_selector allOpen \
                     --out_tabular model_components.csv \
                     --out_log model_import.log \
                     --tool_dir /opt/COBRAxy/src

# 2. Use extracted data for RAS analysis
ras_generator -td /opt/COBRAxy/src -rs Custom \
              -rl model_components.csv \
              -in expression_data.tsv -ra ras_scores.tsv

# 3. Apply constraints and sample fluxes
ras_to_bounds -td /opt/COBRAxy/src -ms Custom -mo model_components.csv \
              -ir ras_scores.tsv -idop constrained_bounds/

# 4. Visualize results
marea -td /opt/COBRAxy/src -input_data ras_scores.tsv \
      -choice_map Custom -custom_map custom.svg -idop results/
```

## Quality Control

### Pre-extraction Validation
- Verify model file integrity and format
- Check SBML compliance for custom models
- Validate gene ID formats and coverage
- Confirm medium constraint specifications

### Post-extraction Checks
- **Completeness**: Verify all expected reactions extracted
- **Consistency**: Check stoichiometric balance
- **Annotations**: Validate gene-reaction associations
- **Formatting**: Confirm output file structure

### Data Validation

The commands below assume a tab-delimited file; for comma-separated output, replace `-F'\t'` with `-F','`.

#### Reaction Balancing
```bash
# List reactions whose formula has no reaction arrow (-> or <->).
# This flags malformed formulas; it does not test mass balance.
awk -F'\t' 'NR>1 && $3 !~ /->/ {print $1, $3}' model_data.csv
```

#### Gene Coverage
```bash
# Count reactions with GPR rules (empty or "-" means no rule)
awk -F'\t' 'NR>1 && $2 != "" && $2 != "-" {count++} END {print count+0 " reactions with GPR"}' model_data.csv
```

#### Exchange Reactions
```bash
# List medium components
awk -F'\t' 'NR>1 && $7 == "TRUE" {print $1}' model_data.csv
```
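
The same three checks can be combined into one pass in Python, which also copes with quoted fields that trip up `awk`. The sketch below assumes a tab-delimited table with the documented column names and treats an empty or `-` GPR rule as "no rule", as in the column-structure example; `qc_summary` is a hypothetical helper.

```python
import csv
import io


def qc_summary(handle, delimiter="\t"):
    """Count GPR coverage and list medium members and arrow-less formulas."""
    summary = {"total": 0, "with_gpr": 0, "medium": [], "no_arrow": []}
    for row in csv.DictReader(handle, delimiter=delimiter):
        summary["total"] += 1
        if row["GPR_Rule"].strip() not in ("", "-"):
            summary["with_gpr"] += 1
        if row["Medium_Member"].strip().upper() == "TRUE":
            summary["medium"].append(row["Reaction_ID"])
        # "->" also occurs inside "<->", so one test covers both arrows
        if "->" not in row["Reaction_Formula"]:
            summary["no_arrow"].append(row["Reaction_ID"])
    return summary


sample = (
    "Reaction_ID\tGPR_Rule\tReaction_Formula\tMedium_Member\n"
    "R00001\tGENE1 or GENE2\tA + B -> C + D\tFALSE\n"
    "R00002\tGENE3 and GENE4\tE <-> F\tFALSE\n"
    "EX_glc_e\t-\tglc_e <->\tTRUE\n"
)
summary = qc_summary(io.StringIO(sample))
print(summary)
```

For a real export, pass `open("model_data.csv", newline="")` instead of the in-memory sample.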

## Tips and Best Practices

### Model Selection
- **ENGRO2**: Balanced coverage for human tissue analysis
- **HMRcore**: Fast processing for algorithm development
- **Recon**: Comprehensive analysis requiring computational resources
- **Custom**: Organism-specific or specialized models

### Output Format Optimization
- **CSV**: Lightweight, universal compatibility
- Choose based on downstream analysis requirements

### Performance Considerations
- Large models (Recon) may require substantial memory
- Consider batch processing for multiple extractions

## Troubleshooting

### Common Issues

**Model loading fails**
- Check file format and compression
- Verify SBML/JSON/MAT/YAML validity for custom models
- Ensure sufficient system memory

**Empty output file**
- Model may contain no reactions
- Check model file integrity
- Verify tool directory configuration

### Error Messages

| Error | Cause | Solution |
|-------|-------|----------|
| "Model file not found" | Invalid file path | Check file location and permissions |
| "Unsupported format" | Invalid model format | Use SBML, JSON, MAT, or YAML |
| "Memory allocation error" | Insufficient system memory | Use smaller model or increase memory |

### Performance Issues

**Slow processing**
- Large models require more time
- Monitor system resource usage

**Memory errors**
- Reduce model size if possible
- Process in smaller batches
- Increase available system memory

**Output file corruption**
- Check disk space availability
- Verify file write permissions
- Monitor for system interruptions

## Advanced Usage

### Batch Extraction Script

```python
#!/usr/bin/env python3
import subprocess

models = ['ENGRO2', 'HMRcore', 'Recon']

for model in models:
    # Build the command as a list so no shell quoting is needed
    cmd = [
        'importMetabolicModel',
        '--model', model,
        '--name', f'{model}_data',
        '--medium_selector', 'allOpen',
        '--out_tabular', f'{model}.csv',
        '--out_log', f'{model}.log',
        '--tool_dir', '/opt/COBRAxy/src'
    ]
    # check=True raises CalledProcessError if an extraction fails
    subprocess.run(cmd, check=True)
```

### Database Integration

Export model data to databases:

```sql
-- Load CSV into PostgreSQL
CREATE TABLE model_reactions (
    reaction_id VARCHAR(50),
    gpr_rule TEXT,
    reaction_formula TEXT,
    lower_bound FLOAT,
    upper_bound FLOAT,
    objective_coefficient FLOAT,
    medium_member BOOLEAN,
    compartment VARCHAR(50),
    subsystem VARCHAR(100)
);

-- COPY reads the file on the database server;
-- from psql, \copy reads a client-side file instead
COPY model_reactions FROM 'model_data.csv' WITH CSV HEADER;
```

## See Also

- [Export Metabolic Model](export-metabolic-model.md) - Export tabular data to model formats
- [RAS Generator](ras-generator.md) - Use extracted GPR rules for RAS computation
- [RPS Generator](rps-generator.md) - Use reaction formulas for RPS analysis
- [Custom Model Tutorial](/tutorials/custom-model-integration.md)