# Metabolic Model Setting

Extract and organize metabolic model components into tabular format for analysis and integration.

## Overview

Metabolic Model Setting (metabolicModel2Tabular) extracts key components from metabolic models (SBML or another supported format) and writes them to a tabular summary. The tool processes built-in or custom models, applies medium constraints, optionally converts gene nomenclature, and outputs structured data for downstream analysis.

## Usage

### Command Line

```bash
metabolicModel2Tabular --model ENGRO2 \
    --name ENGRO2 \
    --medium_selector allOpen \
    --gene_format Default \
    --out_tabular model_data.csv \
    --out_log extraction.log \
    --tool_dir /path/to/COBRAxy
```

### Galaxy Interface

Select "Metabolic Model Setting" from the COBRAxy tool suite and configure model extraction parameters.

## Parameters

### Required Parameters

| Parameter | Flag | Description |
|-----------|------|-------------|
| Model Name | `--name` | Model identifier for output files |
| Medium Selector | `--medium_selector` | Medium configuration option |
| Output Tabular | `--out_tabular` | Output file path (CSV or XLSX) |
| Output Log | `--out_log` | Log file for processing information |
| Tool Directory | `--tool_dir` | COBRAxy installation directory |

### Model Selection Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Built-in Model | `--model` | Pre-installed model (ENGRO2, Recon, HMRcore) | - |
| Custom Model | `--input` | Path to custom model file (SBML, JSON, MAT, or YML) | - |

**Note**: Provide either `--model` or `--input`, not both.

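
The rule that exactly one model source is given can be enforced before the tool is launched. The following is a minimal sketch, not part of COBRAxy: `build_command` and `BUILTIN_MODELS` are hypothetical names, and the only flags used are the ones listed in the tables above.

```python
from typing import List, Optional

# Built-in model names as listed in the Model Selection Parameters table.
BUILTIN_MODELS = {"ENGRO2", "Recon", "HMRcore"}


def build_command(name: str, out_tabular: str, out_log: str, tool_dir: str,
                  model: Optional[str] = None,
                  input_path: Optional[str] = None,
                  medium_selector: str = "allOpen",
                  gene_format: str = "Default") -> List[str]:
    """Hypothetical helper: assemble the argument list for metabolicModel2Tabular."""
    # Exactly one of --model / --input must be supplied.
    if (model is None) == (input_path is None):
        raise ValueError("provide exactly one of model or input_path")
    if model is not None and model not in BUILTIN_MODELS:
        raise ValueError(f"unknown built-in model: {model}")
    source = ["--model", model] if model is not None else ["--input", input_path]
    return ["metabolicModel2Tabular", *source,
            "--name", name,
            "--medium_selector", medium_selector,
            "--gene_format", gene_format,
            "--out_tabular", out_tabular,
            "--out_log", out_log,
            "--tool_dir", tool_dir]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without COBRAxy installed.
    print(" ".join(build_command("ENGRO2", "model_data.csv", "extraction.log",
                                 "/path/to/COBRAxy", model="ENGRO2")))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`.
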
### Optional Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Gene Format | `--gene_format` | Gene ID format conversion | Default |

## Model Selection

### Built-in Models

#### ENGRO2
- **Species**: Homo sapiens
- **Scope**: Genome-scale reconstruction
- **Reactions**: ~2,000 reactions
- **Metabolites**: ~1,500 metabolites
- **Coverage**: Comprehensive human metabolism

#### Recon
- **Species**: Homo sapiens
- **Scope**: Recon3D human reconstruction
- **Reactions**: ~10,000+ reactions
- **Metabolites**: ~5,000+ metabolites
- **Coverage**: Most comprehensive human model

#### HMRcore
- **Species**: Homo sapiens
- **Scope**: Core metabolic network
- **Reactions**: ~300 essential reactions
- **Metabolites**: ~200 core metabolites
- **Coverage**: Central carbon and energy metabolism

### Custom Models

Supported formats for custom model import:
- **SBML**: Systems Biology Markup Language (.xml, .sbml)
- **JSON**: COBRApy JSON format (.json)
- **MAT**: MATLAB format (.mat)
- **YML**: YAML format (.yml, .yaml)
- **Compressed**: All formats support .gz, .zip, .bz2 compression

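
To illustrate how a model path with an optional compression suffix can be handled, here is a minimal sketch using only the Python standard library. It is not the loader COBRAxy uses: `read_model_bytes` and `model_format` are hypothetical helpers, and the sketch assumes that a `.zip` archive contains a single model file.

```python
import bz2
import gzip
import zipfile
from pathlib import Path

COMPRESSION_SUFFIXES = {".gz", ".bz2", ".zip"}
FORMAT_BY_EXTENSION = {
    ".xml": "sbml", ".sbml": "sbml", ".json": "json",
    ".mat": "mat", ".yml": "yaml", ".yaml": "yaml",
}


def read_model_bytes(path: str) -> bytes:
    """Return the raw bytes of a model file, decompressing by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".gz":
        with gzip.open(path, "rb") as handle:
            return handle.read()
    if suffix == ".bz2":
        with bz2.open(path, "rb") as handle:
            return handle.read()
    if suffix == ".zip":
        with zipfile.ZipFile(path) as archive:
            # Assumption: the archive holds one model file; read the first entry.
            return archive.read(archive.namelist()[0])
    return Path(path).read_bytes()


def model_format(path: str) -> str:
    """Guess the model format from the extension preceding any compression suffix."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes = suffixes[:-1]
    extension = suffixes[-1] if suffixes else ""
    return FORMAT_BY_EXTENSION.get(extension, "unknown")
```

A caller would first check `model_format(path)` and then hand the bytes from `read_model_bytes(path)` to the parser for that format.
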
## Medium Configuration

### allOpen (Default)
- All exchange reactions unconstrained
- Maximum metabolic flexibility
- Suitable for general analysis

### Custom Medium
Users can specify custom medium constraints through the Galaxy interface or by modifying the tool configuration.

## Gene Format Options

| Format | Description | Example |
|--------|-------------|---------|
| Default | Original model gene IDs | As stored in model |
| ENSNG | Ensembl Gene IDs | ENSG00000139618 |
| HGNC_SYMBOL | HUGO Gene Symbols | BRCA2 |
| HGNC_ID | HUGO Gene Nomenclature Committee IDs | HGNC:1101 |
| ENTREZ | NCBI Entrez Gene IDs | 675 |

Gene format conversion uses internal mapping tables and may not cover all genes in custom models.

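
The effect of incomplete mapping tables can be illustrated with a small sketch. It is not the tool's conversion code: `convert_gpr` is a hypothetical function and the mapping is a plain dictionary supplied by the caller. Gene IDs without a mapping are left unchanged and reported, and a rule of `-` is treated as "no GPR".

```python
import re
from typing import Dict, Set, Tuple

LOGICAL_TOKENS = {"and", "or"}


def convert_gpr(rule: str, mapping: Dict[str, str]) -> Tuple[str, Set[str]]:
    """Replace gene IDs in a GPR rule; keep unmapped IDs and return them as a set."""
    unmapped: Set[str] = set()
    if rule.strip() in ("", "-"):
        return rule, unmapped  # empty rule or placeholder: nothing to convert

    def swap(match):
        token = match.group(0)
        if token.lower() in LOGICAL_TOKENS:
            return token  # logical operators are not gene IDs
        if token in mapping:
            return mapping[token]
        unmapped.add(token)
        return token

    # Gene tokens: letters, digits, and punctuation common in gene IDs (e.g. "HGNC:1101").
    converted = re.sub(r"[A-Za-z0-9_.:\-]+", swap, rule)
    return converted, unmapped


if __name__ == "__main__":
    mapping = {"ENSG00000139618": "BRCA2"}  # illustrative single-entry table
    print(convert_gpr("ENSG00000139618 or GENE2", mapping))
```

The returned set of unmapped IDs gives a quick measure of how much of a custom model the mapping table covers.
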
## Output Format

### Tabular Summary File

The output contains comprehensive model information in CSV or XLSX format:

#### Column Structure
```
Reaction_ID  GPR_Rule         Reaction_Formula  Lower_Bound  Upper_Bound  Objective_Coefficient  Medium_Member  Compartment    Subsystem
R00001       GENE1 or GENE2   A + B -> C + D    -1000.0      1000.0       0.0                    FALSE          cytosol        Glycolysis
R00002       GENE3 and GENE4  E <-> F           -1000.0      1000.0       0.0                    FALSE          mitochondria   TCA_Cycle
EX_glc_e     -                glc_e <->         -1000.0      1000.0       0.0                    TRUE           extracellular  Exchange
```

#### Data Fields

| Field | Description | Values |
|-------|-------------|--------|
| Reaction_ID | Unique reaction identifier | String |
| GPR_Rule | Gene-protein-reaction association | Logical expression |
| Reaction_Formula | Stoichiometric equation | Metabolites with coefficients |
| Lower_Bound | Minimum flux constraint | Numeric (typically -1000) |
| Upper_Bound | Maximum flux constraint | Numeric (typically 1000) |
| Objective_Coefficient | Biomass/objective weight | Numeric (0 or 1) |
| Medium_Member | Exchange reaction flag | TRUE/FALSE |
| Compartment | Subcellular location | String (for ENGRO2 only) |
| Subsystem | Metabolic pathway | String |

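
As a sketch of how the table can be read programmatically, the following uses only the Python standard library. `load_reactions` and `summarize` are hypothetical helpers, not COBRAxy functions. The column names are those listed above, and the file is assumed to be tab-delimited; pass another `delimiter` if it is not.

```python
import csv
from typing import Dict, List

NUMERIC_FIELDS = ("Lower_Bound", "Upper_Bound", "Objective_Coefficient")


def load_reactions(path: str, delimiter: str = "\t") -> List[Dict[str, object]]:
    """Read the tabular summary and convert numeric and boolean columns."""
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            for field in NUMERIC_FIELDS:
                row[field] = float(row[field])
            row["Medium_Member"] = row["Medium_Member"].strip().upper() == "TRUE"
            rows.append(row)
    return rows


def summarize(rows: List[Dict[str, object]]) -> Dict[str, int]:
    """Count reactions, GPR coverage, medium members, and reversible reactions."""
    return {
        "reactions": len(rows),
        # "-" is treated as a placeholder for "no GPR rule".
        "with_gpr": sum(1 for r in rows if r["GPR_Rule"].strip() not in ("", "-")),
        "medium_members": sum(1 for r in rows if r["Medium_Member"]),
        "reversible": sum(1 for r in rows if r["Lower_Bound"] < 0 < r["Upper_Bound"]),
    }
```

For example, `summarize(load_reactions("model_data.csv"))` returns the four counts as a dictionary.
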
## Examples

### Extract Built-in Model Data

```bash
# Extract ENGRO2 model with default settings
metabolicModel2Tabular --model ENGRO2 \
    --name ENGRO2_extraction \
    --medium_selector allOpen \
    --gene_format Default \
    --out_tabular ENGRO2_data.csv \
    --out_log ENGRO2_log.txt \
    --tool_dir /opt/COBRAxy
```

### Process Custom Model

```bash
# Extract custom SBML model with gene conversion
metabolicModel2Tabular --input /data/custom_model.xml \
    --name CustomModel \
    --medium_selector allOpen \
    --gene_format HGNC_SYMBOL \
    --out_tabular custom_model_data.xlsx \
    --out_log custom_extraction.log \
    --tool_dir /opt/COBRAxy
```

### Extract Core Model for Quick Analysis

```bash
# Extract HMRcore for rapid prototyping
metabolicModel2Tabular --model HMRcore \
    --name CoreModel \
    --medium_selector allOpen \
    --gene_format ENSNG \
    --out_tabular core_reactions.csv \
    --out_log core_log.txt \
    --tool_dir /opt/COBRAxy
```

### Batch Processing Multiple Models

```bash
#!/bin/bash
models=("ENGRO2" "HMRcore" "Recon")
for model in "${models[@]}"; do
    metabolicModel2Tabular --model "$model" \
        --name "${model}_extract" \
        --medium_selector allOpen \
        --gene_format HGNC_SYMBOL \
        --out_tabular "${model}_data.csv" \
        --out_log "${model}_log.txt" \
        --tool_dir /opt/COBRAxy
done
```

## Use Cases

### Model Comparison
Extract multiple models to compare:
- Reaction coverage across different reconstructions
- Gene-reaction associations
- Pathway representation
- Metabolite compartmentalization

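
A reaction-coverage comparison between two extracted tables can be sketched as follows. `reaction_ids` and `compare_models` are hypothetical helpers. The sketch assumes tab-delimited files with a `Reaction_ID` column, and it is only meaningful when both models use the same reaction identifier namespace.

```python
import csv
from typing import Dict, Set


def reaction_ids(path: str, delimiter: str = "\t") -> Set[str]:
    """Collect the Reaction_ID column of one extracted table."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row["Reaction_ID"] for row in csv.DictReader(handle, delimiter=delimiter)}


def compare_models(path_a: str, path_b: str, delimiter: str = "\t") -> Dict[str, Set[str]]:
    """Split reaction IDs into shared, only-in-A, and only-in-B sets."""
    ids_a = reaction_ids(path_a, delimiter)
    ids_b = reaction_ids(path_b, delimiter)
    return {
        "shared": ids_a & ids_b,
        "only_a": ids_a - ids_b,
        "only_b": ids_b - ids_a,
    }
```

For instance, `compare_models("ENGRO2_data.csv", "HMRcore_data.csv")` would work on the files written by the batch example above.
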
### Data Integration
Prepare model data for:
- Custom analysis pipelines
- Database integration
- Pathway annotation
- Cross-reference mapping

### Quality Control
Validate model properties:
- Check reaction balancing
- Verify gene associations
- Assess network connectivity
- Identify missing annotations

### Custom Analysis
Export structured data for:
- Network analysis (graph theory)
- Machine learning applications
- Statistical modeling
- Comparative genomics

## Integration Workflow

### Downstream Tools

The extracted tabular data serves as input for:

#### COBRAxy Tools
- [RAS Generator](ras-generator.md) - Use extracted GPR rules
- [RPS Generator](rps-generator.md) - Use reaction formulas
- [RAS to Bounds](ras-to-bounds.md) - Use reaction bounds
- [MAREA](marea.md) - Use reaction annotations

#### External Analysis
- **R/Bioconductor**: Import CSV for pathway analysis
- **Python/pandas**: Load data for network analysis
- **MATLAB**: Process XLSX for modeling
- **Cytoscape**: Network visualization
- **Databases**: Populate reaction databases

### Typical Pipeline

```bash
# 1. Extract model components (all required parameters included)
metabolicModel2Tabular --model ENGRO2 --name ModelData \
    --medium_selector allOpen \
    --out_tabular model_components.csv \
    --out_log extraction.log \
    --tool_dir /opt/COBRAxy

# 2. Use extracted data for RAS analysis
ras_generator -td /opt/COBRAxy -rs Custom \
    -rl model_components.csv \
    -in expression_data.tsv -ra ras_scores.tsv

# 3. Apply RAS-derived constraints to the reaction bounds
ras_to_bounds -td /opt/COBRAxy -ms Custom -mo model_components.csv \
    -ir ras_scores.tsv -idop constrained_bounds/

# 4. Visualize results
marea -td /opt/COBRAxy -input_data ras_scores.tsv \
    -choice_map Custom -custom_map custom.svg -idop results/
```

## Quality Control

### Pre-extraction Validation
- Verify model file integrity and format
- Check SBML compliance for custom models
- Validate gene ID formats and coverage
- Confirm medium constraint specifications

### Post-extraction Checks
- **Completeness**: Verify all expected reactions extracted
- **Consistency**: Check stoichiometric balance
- **Annotations**: Validate gene-reaction associations
- **Formatting**: Confirm output file structure

### Data Validation

The commands below assume a tab-delimited file; change `-F` if the output uses another delimiter.

#### Reaction Balancing
```bash
# Flag reactions whose formula has no reaction arrow (-> or <->).
# This is only a sanity check of the formula column: a true mass-balance
# check needs metabolite formulas, which the table does not contain.
awk -F'\t' 'NR>1 && $3 !~ /->/ {print $1, $3}' model_data.csv
```

#### Gene Coverage
```bash
# Count reactions with GPR rules ("-" marks a reaction without a rule)
awk -F'\t' 'NR>1 && $2 != "" && $2 != "-" {count++} END {print count+0 " reactions with GPR"}' model_data.csv
```

#### Exchange Reactions
```bash
# List medium components
awk -F'\t' 'NR>1 && $7 == "TRUE" {print $1}' model_data.csv
```

## Tips and Best Practices

### Model Selection
- **ENGRO2**: Balanced coverage for human tissue analysis
- **HMRcore**: Fast processing for algorithm development
- **Recon**: Comprehensive analysis requiring computational resources
- **Custom**: Organism-specific or specialized models

### Gene Format Selection
- **Default**: Preserve original model annotations
- **HGNC_SYMBOL**: Human-readable gene names
- **ENSNG**: Stable identifiers for bioinformatics
- **ENTREZ**: Cross-database compatibility

### Output Format Optimization
- **CSV**: Lightweight, universal compatibility
- **XLSX**: Rich formatting, multiple sheets possible
- Choose based on downstream analysis requirements

### Performance Considerations
- Large models (Recon) may require substantial memory
- Gene format conversion adds processing time
- Consider batch processing for multiple extractions

## Troubleshooting

### Common Issues

**Model loading fails**
- Check file format and compression
- Verify SBML validity for custom models
- Ensure sufficient system memory

**Gene format conversion errors**
- Mapping tables may not cover all genes
- Original gene IDs retained when conversion fails
- Check log file for conversion statistics

**Empty output file**
- Model may contain no reactions
- Check model file integrity
- Verify tool directory configuration

### Error Messages

| Error | Cause | Solution |
|-------|-------|----------|
| "Model file not found" | Invalid file path | Check file location and permissions |
| "Unsupported format" | Invalid model format | Use SBML, JSON, MAT, or YML |
| "Gene mapping failed" | Missing gene conversion data | Use Default format or update mappings |
| "Memory allocation error" | Insufficient system memory | Use smaller model or increase memory |

### Performance Issues

**Slow processing**
- Large models require more time
- Gene conversion adds overhead
- Monitor system resource usage

**Memory errors**
- Reduce model size if possible
- Process in smaller batches
- Increase available system memory

**Output file corruption**
- Check disk space availability
- Verify file write permissions
- Monitor for system interruptions

## Advanced Usage

### Custom Gene Mapping

Advanced users can extend gene format conversion by modifying mapping files in the `local/mappings/` directory.

### Batch Extraction Script

```python
#!/usr/bin/env python3
import subprocess

models = ['ENGRO2', 'HMRcore', 'Recon']
formats = ['Default', 'HGNC_SYMBOL', 'ENSNG']

# Run one extraction per (model, gene format) pair.
# check=True raises CalledProcessError and stops the batch on the first failure.
for model in models:
    for fmt in formats:
        cmd = [
            'metabolicModel2Tabular',
            '--model', model,
            '--name', f'{model}_{fmt}',
            '--medium_selector', 'allOpen',
            '--gene_format', fmt,
            '--out_tabular', f'{model}_{fmt}.csv',
            '--out_log', f'{model}_{fmt}.log',
            '--tool_dir', '/opt/COBRAxy'
        ]
        subprocess.run(cmd, check=True)
```

### Database Integration

Export model data to databases:

```sql
-- Load CSV into PostgreSQL
CREATE TABLE model_reactions (
    reaction_id VARCHAR(50),
    gpr_rule TEXT,
    reaction_formula TEXT,
    lower_bound FLOAT,
    upper_bound FLOAT,
    objective_coefficient FLOAT,
    medium_member BOOLEAN,
    compartment VARCHAR(50),
    subsystem VARCHAR(100)
);

-- COPY reads the file on the database server, so give a path the server can access.
-- From psql, \copy reads a client-side file instead.
-- For tab-delimited output, add: DELIMITER E'\t'
COPY model_reactions FROM '/path/to/model_data.csv' WITH CSV HEADER;
```

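
Where no PostgreSQL server is available, the same table can be loaded into SQLite from Python. This is a minimal sketch under stated assumptions: `load_into_sqlite` is a hypothetical helper, the input is assumed to be tab-delimited with the column names from the Output Format section, and `medium_member` is stored as 0/1 because SQLite has no dedicated boolean type.

```python
import csv
import sqlite3

CREATE_SQL = """
CREATE TABLE IF NOT EXISTS model_reactions (
    reaction_id TEXT PRIMARY KEY,
    gpr_rule TEXT,
    reaction_formula TEXT,
    lower_bound REAL,
    upper_bound REAL,
    objective_coefficient REAL,
    medium_member INTEGER,
    compartment TEXT,
    subsystem TEXT
)
"""


def load_into_sqlite(csv_path: str, db_path: str = ":memory:",
                     delimiter: str = "\t") -> sqlite3.Connection:
    """Create the table and insert every row of the extracted file."""
    conn = sqlite3.connect(db_path)
    conn.execute(CREATE_SQL)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = [
            (r["Reaction_ID"], r["GPR_Rule"], r["Reaction_Formula"],
             float(r["Lower_Bound"]), float(r["Upper_Bound"]),
             float(r["Objective_Coefficient"]),
             int(r["Medium_Member"].strip().upper() == "TRUE"),
             r["Compartment"], r["Subsystem"])
            for r in csv.DictReader(handle, delimiter=delimiter)
        ]
    conn.executemany(
        "INSERT INTO model_reactions VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn
```

After loading, a query such as `SELECT reaction_id FROM model_reactions WHERE medium_member = 1` lists the medium components.
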
## See Also

- [RAS Generator](ras-generator.md) - Use extracted GPR rules for RAS computation
- [RPS Generator](rps-generator.md) - Use reaction formulas for RPS analysis
- [Custom Model Tutorial](../tutorials/custom-model-integration.md)
- [Gene Mapping Reference](../tutorials/gene-id-conversion.md)