comparison COBRAxy/docs/tools/metabolic-model-setting.md @ 492:4ed95023af20 draft

Uploaded
author francesco_lapi
date Tue, 30 Sep 2025 14:02:17 +0000
491:7a413a5ec566 492:4ed95023af20
1 # Metabolic Model Setting
2
3 Extract and organize metabolic model components into tabular format for analysis and integration.
4
5 ## Overview
6
7 Metabolic Model Setting (metabolicModel2Tabular) extracts key components from SBML metabolic models and generates comprehensive tabular summaries. This tool processes built-in or custom models, applies medium constraints, handles gene nomenclature conversion, and outputs structured data for downstream analysis.
8
9 ## Usage
10
11 ### Command Line
12
13 ```bash
14 metabolicModel2Tabular --model ENGRO2 \
15 --name ENGRO2 \
16 --medium_selector allOpen \
17 --gene_format Default \
18 --out_tabular model_data.csv \
19 --out_log extraction.log \
20 --tool_dir /path/to/COBRAxy
21 ```
22
23 ### Galaxy Interface
24
25 Select "Metabolic Model Setting" from the COBRAxy tool suite and configure model extraction parameters.
26
27 ## Parameters
28
29 ### Required Parameters
30
31 | Parameter | Flag | Description |
32 |-----------|------|-------------|
33 | Model Name | `--name` | Model identifier for output files |
34 | Medium Selector | `--medium_selector` | Medium configuration option |
35 | Output Tabular | `--out_tabular` | Output file path (CSV or XLSX) |
36 | Output Log | `--out_log` | Log file for processing information |
37 | Tool Directory | `--tool_dir` | COBRAxy installation directory |
38
39 ### Model Selection Parameters
40
41 | Parameter | Flag | Description | Default |
42 |-----------|------|-------------|---------|
43 | Built-in Model | `--model` | Pre-installed model (ENGRO2, Recon, HMRcore) | - |
44 | Custom Model | `--input` | Path to custom model file (SBML, JSON, MAT, or YML) | - |
45
46 **Note**: Provide either `--model` OR `--input`, not both.
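The either/or rule can be enforced before the tool is ever launched. The following is a minimal sketch, not part of COBRAxy: `build_command` and `BUILT_IN_MODELS` are hypothetical names introduced here, and only the flags documented in the tables above are used.

```python
from typing import List, Optional

# Built-in model names as listed in the table above.
BUILT_IN_MODELS = ("ENGRO2", "Recon", "HMRcore")


def build_command(name: str, out_tabular: str, out_log: str, tool_dir: str,
                  model: Optional[str] = None,
                  input_path: Optional[str] = None,
                  medium_selector: str = "allOpen",
                  gene_format: str = "Default") -> List[str]:
    """Hypothetical helper: assemble a metabolicModel2Tabular argument list."""
    # Exactly one model source is allowed: --model or --input, never both.
    if (model is None) == (input_path is None):
        raise ValueError("provide exactly one of model or input_path")
    if model is not None and model not in BUILT_IN_MODELS:
        raise ValueError(f"unknown built-in model: {model}")
    source = ["--model", model] if model is not None else ["--input", input_path]
    return ["metabolicModel2Tabular", *source,
            "--name", name,
            "--medium_selector", medium_selector,
            "--gene_format", gene_format,
            "--out_tabular", out_tabular,
            "--out_log", out_log,
            "--tool_dir", tool_dir]
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file paths.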
47
48 ### Optional Parameters
49
50 | Parameter | Flag | Description | Default |
51 |-----------|------|-------------|---------|
52 | Gene Format | `--gene_format` | Gene ID format conversion | Default |
53
54 ## Model Selection
55
56 ### Built-in Models
57
58 #### ENGRO2
59 - **Species**: Homo sapiens
60 - **Scope**: Genome-scale reconstruction
61 - **Reactions**: ~2,000 reactions
62 - **Metabolites**: ~1,500 metabolites
63 - **Coverage**: Comprehensive human metabolism
64
65 #### Recon
66 - **Species**: Homo sapiens
67 - **Scope**: Recon3D human reconstruction
68 - **Reactions**: ~10,000+ reactions
69 - **Metabolites**: ~5,000+ metabolites
70 - **Coverage**: Most comprehensive human model
71
72 #### HMRcore
73 - **Species**: Homo sapiens
74 - **Scope**: Core metabolic network
75 - **Reactions**: ~300 essential reactions
76 - **Metabolites**: ~200 core metabolites
77 - **Coverage**: Central carbon and energy metabolism
78
79 ### Custom Models
80
81 Supported formats for custom model import:
82 - **SBML**: Systems Biology Markup Language (.xml, .sbml)
83 - **JSON**: COBRApy JSON format (.json)
84 - **MAT**: MATLAB format (.mat)
85 - **YML**: YAML format (.yml, .yaml)
86 - **Compressed**: All formats support .gz, .zip, .bz2 compression
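A quick pre-flight check on the file name can catch an unsupported format before extraction starts. This sketch only inspects file extensions from the list above; `detect_model_format` is a hypothetical helper and does not reflect how COBRAxy itself detects formats.

```python
from pathlib import Path
from typing import Optional, Tuple

# Extensions taken from the supported-formats list above.
COMPRESSION_SUFFIXES = {".gz", ".zip", ".bz2"}
FORMAT_BY_SUFFIX = {
    ".xml": "SBML", ".sbml": "SBML",
    ".json": "JSON",
    ".mat": "MAT",
    ".yml": "YML", ".yaml": "YML",
}


def detect_model_format(path: str) -> Tuple[str, Optional[str]]:
    """Return (format, compression suffix or None) based on the file name only."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    compression = None
    # Peel off a trailing compression suffix such as ".gz" if present.
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        compression = suffixes.pop()
    if not suffixes or suffixes[-1] not in FORMAT_BY_SUFFIX:
        raise ValueError(f"Unsupported format: {path}")
    return FORMAT_BY_SUFFIX[suffixes[-1]], compression
```

For example, `detect_model_format("/data/custom_model.xml")` yields `("SBML", None)`, while a name such as `model.txt` raises `ValueError`.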
87
88 ## Medium Configuration
89
90 ### allOpen (Default)
91 - All exchange reactions unconstrained
92 - Maximum metabolic flexibility
93 - Suitable for general analysis
94
95 ### Custom Medium
96 Users can specify custom medium constraints through the Galaxy interface or by modifying the tool configuration.
97
98 ## Gene Format Options
99
100 | Format | Description | Example |
101 |--------|-------------|---------|
102 | Default | Original model gene IDs | As stored in model |
103 | ENSNG | Ensembl Gene IDs | ENSG00000139618 |
104 | HGNC_SYMBOL | HUGO Gene Symbols | BRCA2 |
105 | HGNC_ID | HUGO Gene Nomenclature Committee IDs | HGNC:1101 |
106 | ENTREZ | NCBI Entrez Gene IDs | 675 |
107
108 Gene format conversion uses internal mapping tables and may not cover all genes in custom models.
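To illustrate what incomplete coverage means in practice, the sketch below rewrites the gene identifiers inside a GPR rule using a mapping dictionary and reports the identifiers it could not map. It is an illustration under stated assumptions, not COBRAxy's implementation: `convert_gpr` is a hypothetical function, the mapping is supplied by the caller rather than read from the internal tables, and unmapped identifiers are left unchanged.

```python
import re
from typing import Dict, List, Tuple

# A token is any run of characters that is not whitespace or a parenthesis.
GENE_TOKEN = re.compile(r"[^\s()]+")
OPERATORS = {"and", "or"}


def convert_gpr(rule: str, mapping: Dict[str, str]) -> Tuple[str, List[str]]:
    """Replace gene IDs in a GPR rule; return the new rule and unmapped IDs."""
    unmapped: List[str] = []

    def replace(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token.lower() in OPERATORS:
            return token              # keep logical operators untouched
        if token in mapping:
            return mapping[token]
        unmapped.append(token)        # assumption: keep the original ID
        return token

    return GENE_TOKEN.sub(replace, rule), unmapped


# HGNC:1101 -> BRCA2 comes from the table above; UNKNOWN_GENE is a placeholder.
rule, missing = convert_gpr("(HGNC:1101 and UNKNOWN_GENE) or HGNC:1101",
                            {"HGNC:1101": "BRCA2"})
print(rule)     # (BRCA2 and UNKNOWN_GENE) or BRCA2
print(missing)  # ['UNKNOWN_GENE']
```

Collecting the unmapped identifiers makes it easy to estimate how much of a custom model the chosen gene format actually covers.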
109
110 ## Output Format
111
112 ### Tabular Summary File
113
114 The output contains comprehensive model information in CSV or XLSX format:
115
116 #### Column Structure
117 ```
118 Reaction_ID GPR_Rule Reaction_Formula Lower_Bound Upper_Bound Objective_Coefficient Medium_Member Compartment Subsystem
119 R00001 GENE1 or GENE2 A + B -> C + D -1000.0 1000.0 0.0 FALSE cytosol Glycolysis
120 R00002 GENE3 and GENE4 E <-> F -1000.0 1000.0 0.0 FALSE mitochondria TCA_Cycle
121 EX_glc_e - glc_e <-> -1000.0 1000.0 0.0 TRUE extracellular Exchange
122 ```
123
124 #### Data Fields
125
126 | Field | Description | Values |
127 |-------|-------------|---------|
128 | Reaction_ID | Unique reaction identifier | String |
129 | GPR_Rule | Gene-protein-reaction association | Logical expression |
130 | Reaction_Formula | Stoichiometric equation | Metabolites with coefficients |
131 | Lower_Bound | Minimum flux constraint | Numeric (typically -1000) |
132 | Upper_Bound | Maximum flux constraint | Numeric (typically 1000) |
133 | Objective_Coefficient | Biomass/objective weight | Numeric (0 or 1) |
134 | Medium_Member | Exchange reaction flag | TRUE/FALSE |
135 | Compartment | Subcellular location | String (for ENGRO2 only) |
136 | Subsystem | Metabolic pathway | String |
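The fields above map naturally onto typed records. The following sketch reads the table with Python's standard `csv` module and converts bounds to floats and `Medium_Member` to a boolean. It assumes a tab-delimited file with exactly the column names shown above; pass `delimiter=","` for comma-separated output. `load_model_table` is a hypothetical helper, and the sample rows are copied from the column-structure example.

```python
import csv
import io
from typing import Dict, List

NUMERIC_FIELDS = ("Lower_Bound", "Upper_Bound", "Objective_Coefficient")


def load_model_table(handle, delimiter: str = "\t") -> List[Dict[str, object]]:
    """Parse the tabular summary into dictionaries with typed values."""
    rows: List[Dict[str, object]] = []
    for raw in csv.DictReader(handle, delimiter=delimiter):
        row: Dict[str, object] = dict(raw)
        for key in NUMERIC_FIELDS:
            row[key] = float(raw[key])
        row["Medium_Member"] = raw["Medium_Member"].strip().upper() == "TRUE"
        rows.append(row)
    return rows


HEADER = ["Reaction_ID", "GPR_Rule", "Reaction_Formula", "Lower_Bound",
          "Upper_Bound", "Objective_Coefficient", "Medium_Member",
          "Compartment", "Subsystem"]
SAMPLE = "\n".join("\t".join(fields) for fields in [
    HEADER,
    ["R00001", "GENE1 or GENE2", "A + B -> C + D", "-1000.0", "1000.0",
     "0.0", "FALSE", "cytosol", "Glycolysis"],
    ["EX_glc_e", "-", "glc_e <->", "-1000.0", "1000.0",
     "0.0", "TRUE", "extracellular", "Exchange"],
])

rows = load_model_table(io.StringIO(SAMPLE))
medium = [r["Reaction_ID"] for r in rows if r["Medium_Member"]]
print(medium)  # ['EX_glc_e']
```

When reading a real file, open it with `open(path, newline="")` as recommended for the `csv` module.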
137
138 ## Examples
139
140 ### Extract Built-in Model Data
141
142 ```bash
143 # Extract ENGRO2 model with default settings
144 metabolicModel2Tabular --model ENGRO2 \
145 --name ENGRO2_extraction \
146 --medium_selector allOpen \
147 --gene_format Default \
148 --out_tabular ENGRO2_data.csv \
149 --out_log ENGRO2_log.txt \
150 --tool_dir /opt/COBRAxy
151 ```
152
153 ### Process Custom Model
154
155 ```bash
156 # Extract custom SBML model with gene conversion
157 metabolicModel2Tabular --input /data/custom_model.xml \
158 --name CustomModel \
159 --medium_selector allOpen \
160 --gene_format HGNC_SYMBOL \
161 --out_tabular custom_model_data.xlsx \
162 --out_log custom_extraction.log \
163 --tool_dir /opt/COBRAxy
164 ```
165
166 ### Extract Core Model for Quick Analysis
167
168 ```bash
169 # Extract HMRcore for rapid prototyping
170 metabolicModel2Tabular --model HMRcore \
171 --name CoreModel \
172 --medium_selector allOpen \
173 --gene_format ENSNG \
174 --out_tabular core_reactions.csv \
175 --out_log core_log.txt \
176 --tool_dir /opt/COBRAxy
177 ```
178
179 ### Batch Processing Multiple Models
180
181 ```bash
182 #!/bin/bash
183 models=("ENGRO2" "HMRcore" "Recon")
184 for model in "${models[@]}"; do
185 metabolicModel2Tabular --model "$model" \
186 --name "${model}_extract" \
187 --medium_selector allOpen \
188 --gene_format HGNC_SYMBOL \
189 --out_tabular "${model}_data.csv" \
190 --out_log "${model}_log.txt" \
191 --tool_dir /opt/COBRAxy
192 done
193 ```
194
195 ## Use Cases
196
197 ### Model Comparison
198 Extract multiple models to compare:
199 - Reaction coverage across different reconstructions
200 - Gene-reaction associations
201 - Pathway representation
202 - Metabolite compartmentalization
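
As a starting point for comparing reaction coverage, the sketch below computes the overlap between the `Reaction_ID` columns of two extracted tables. It assumes tab-delimited files; `reaction_ids` and `compare_coverage` are hypothetical helpers. Different reconstructions may use different identifier namespaces, so raw ID overlap should be read as a lower bound on shared reactions.

```python
import csv
import io
from typing import Dict, Set


def reaction_ids(handle, delimiter: str = "\t") -> Set[str]:
    """Collect the Reaction_ID column of an extracted table."""
    return {row["Reaction_ID"] for row in csv.DictReader(handle, delimiter=delimiter)}


def compare_coverage(ids_a: Set[str], ids_b: Set[str]) -> Dict[str, float]:
    """Summarise the overlap between two sets of reaction identifiers."""
    shared = ids_a & ids_b
    union = ids_a | ids_b
    return {
        "shared": len(shared),
        "only_a": len(ids_a - ids_b),
        "only_b": len(ids_b - ids_a),
        # Jaccard index: shared reactions over all distinct reactions.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Toy tables standing in for two extracted models.
table_a = io.StringIO("Reaction_ID\tGPR_Rule\nR1\tG1\nR2\tG2\n")
table_b = io.StringIO("Reaction_ID\tGPR_Rule\nR2\tG2\nR3\tG3\nR4\tG4\n")
summary = compare_coverage(reaction_ids(table_a), reaction_ids(table_b))
print(summary)
```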
203
204 ### Data Integration
205 Prepare model data for:
206 - Custom analysis pipelines
207 - Database integration
208 - Pathway annotation
209 - Cross-reference mapping
210
211 ### Quality Control
212 Validate model properties:
213 - Check reaction balancing
214 - Verify gene associations
215 - Assess network connectivity
216 - Identify missing annotations
217
218 ### Custom Analysis
219 Export structured data for:
220 - Network analysis (graph theory)
221 - Machine learning applications
222 - Statistical modeling
223 - Comparative genomics
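
For network analysis, the `Reaction_Formula` column has to be split into substrates and products first. The sketch below does this for the formula syntax shown in the column-structure example (`->` for irreversible, `<->` for reversible, terms joined by ` + `). The optional leading numeric coefficient (for example `2 A`) is an assumption about the notation, and `parse_formula` and `reaction_edges` are hypothetical helpers.

```python
import re
from typing import Dict, List, Tuple

ARROW = re.compile(r"<->|->")


def parse_side(side: str) -> Dict[str, float]:
    """Turn 'A + 2 B' into {'A': 1.0, 'B': 2.0}; an empty side gives {}."""
    metabolites: Dict[str, float] = {}
    for term in re.split(r"\s\+\s", side):
        term = term.strip()
        if not term:
            continue
        parts = term.split()
        coefficient, name = 1.0, term
        if len(parts) == 2:
            try:
                coefficient, name = float(parts[0]), parts[1]
            except ValueError:
                pass  # not a coefficient, keep the whole term as the name
        metabolites[name] = coefficient
    return metabolites


def parse_formula(formula: str) -> Tuple[Dict[str, float], Dict[str, float], bool]:
    """Return (substrates, products, reversible) for one reaction formula."""
    match = ARROW.search(formula)
    if match is None:
        raise ValueError(f"no reaction arrow in formula: {formula!r}")
    left, right = formula[:match.start()], formula[match.end():]
    return parse_side(left), parse_side(right), match.group(0) == "<->"


def reaction_edges(reaction_id: str, formula: str) -> List[Tuple[str, str]]:
    """Directed bipartite edges: substrate -> reaction -> product."""
    substrates, products, _ = parse_formula(formula)
    return ([(m, reaction_id) for m in substrates] +
            [(reaction_id, m) for m in products])


print(reaction_edges("R00001", "A + B -> C + D"))
```

The resulting edge list can be handed to a graph library or exported for Cytoscape; reversible reactions may need edges in both directions depending on the analysis.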
224
225 ## Integration Workflow
226
227 ### Downstream Tools
228
229 The extracted tabular data serves as input for:
230
231 #### COBRAxy Tools
232 - [RAS Generator](ras-generator.md) - Use extracted GPR rules
233 - [RPS Generator](rps-generator.md) - Use reaction formulas
234 - [RAS to Bounds](ras-to-bounds.md) - Use reaction bounds
235 - [MAREA](marea.md) - Use reaction annotations
236
237 #### External Analysis
238 - **R/Bioconductor**: Import CSV for pathway analysis
239 - **Python/pandas**: Load data for network analysis
240 - **MATLAB**: Process XLSX for modeling
241 - **Cytoscape**: Network visualization
242 - **Databases**: Populate reaction databases
243
244 ### Typical Pipeline
245
246 ```bash
247 # 1. Extract model components
248 metabolicModel2Tabular --model ENGRO2 --name ModelData --medium_selector allOpen \
249 --out_tabular model_components.csv --out_log model_components.log --tool_dir /opt/COBRAxy
250
251 # 2. Use extracted data for RAS analysis
252 ras_generator -td /opt/COBRAxy -rs Custom \
253 -rl model_components.csv \
254 -in expression_data.tsv -ra ras_scores.tsv
255
256 # 3. Apply constraints and sample fluxes
257 ras_to_bounds -td /opt/COBRAxy -ms Custom -mo model_components.csv \
258 -ir ras_scores.tsv -idop constrained_bounds/
259
260 # 4. Visualize results
261 marea -td /opt/COBRAxy -input_data ras_scores.tsv \
262 -choice_map Custom -custom_map custom.svg -idop results/
263 ```
264
265 ## Quality Control
266
267 ### Pre-extraction Validation
268 - Verify model file integrity and format
269 - Check SBML compliance for custom models
270 - Validate gene ID formats and coverage
271 - Confirm medium constraint specifications
272
273 ### Post-extraction Checks
274 - **Completeness**: Verify all expected reactions extracted
275 - **Consistency**: Check stoichiometric balance
276 - **Annotations**: Validate gene-reaction associations
277 - **Formatting**: Confirm output file structure
278
279 ### Data Validation
280
281 #### Reaction Balancing
282 ```bash
283 # List reactions whose formula has no arrow (syntax check only, not a mass-balance check; assumes tab-delimited output)
284 awk -F'\t' 'NR>1 && $3 !~ /<->|->/ {print $1, $3}' model_data.csv
285 ```
286
287 #### Gene Coverage
288 ```bash
289 # Count reactions with a GPR rule (an empty field or "-" means no rule)
290 awk -F'\t' 'NR>1 && $2 != "" && $2 != "-" {count++} END {print count+0 " reactions with GPR"}' model_data.csv
291 ```
292
293 #### Exchange Reactions
294 ```bash
295 # List medium components
296 awk -F'\t' 'NR>1 && $7 == "TRUE" {print $1}' model_data.csv
297 ```
298
299 ## Tips and Best Practices
300
301 ### Model Selection
302 - **ENGRO2**: Balanced coverage for human tissue analysis
303 - **HMRcore**: Fast processing for algorithm development
304 - **Recon**: Comprehensive analysis requiring computational resources
305 - **Custom**: Organism-specific or specialized models
306
307 ### Gene Format Selection
308 - **Default**: Preserve original model annotations
309 - **HGNC_SYMBOL**: Human-readable gene names
310 - **ENSNG**: Stable identifiers for bioinformatics
311 - **ENTREZ**: Cross-database compatibility
312
313 ### Output Format Optimization
314 - **CSV**: Lightweight, universal compatibility
315 - **XLSX**: Rich formatting, multiple sheets possible
316 - Choose based on downstream analysis requirements
317
318 ### Performance Considerations
319 - Large models (Recon) may require substantial memory
320 - Gene format conversion adds processing time
321 - Consider batch processing for multiple extractions
322
323 ## Troubleshooting
324
325 ### Common Issues
326
327 **Model loading fails**
328 - Check file format and compression
329 - Verify SBML validity for custom models
330 - Ensure sufficient system memory
331
332 **Gene format conversion errors**
333 - Mapping tables may not cover all genes
334 - Original gene IDs retained when conversion fails
335 - Check log file for conversion statistics
336
337 **Empty output file**
338 - Model may contain no reactions
339 - Check model file integrity
340 - Verify tool directory configuration
341
342 ### Error Messages
343
344 | Error | Cause | Solution |
345 |-------|-------|----------|
346 | "Model file not found" | Invalid file path | Check file location and permissions |
347 | "Unsupported format" | Invalid model format | Use SBML, JSON, MAT, or YML |
348 | "Gene mapping failed" | Missing gene conversion data | Use Default format or update mappings |
349 | "Memory allocation error" | Insufficient system memory | Use smaller model or increase memory |
350
351 ### Performance Issues
352
353 **Slow processing**
354 - Large models require more time
355 - Gene conversion adds overhead
356 - Monitor system resource usage
357
358 **Memory errors**
359 - Reduce model size if possible
360 - Process in smaller batches
361 - Increase available system memory
362
363 **Output file corruption**
364 - Check disk space availability
365 - Verify file write permissions
366 - Monitor for system interruptions
367
368 ## Advanced Usage
369
370 ### Custom Gene Mapping
371
372 Advanced users can extend gene format conversion by modifying mapping files in the `local/mappings/` directory.
373
374 ### Batch Extraction Script
375
376 ```python
377 #!/usr/bin env python3
378 import subprocess
379 import sys
380
381 models = ['ENGRO2', 'HMRcore', 'Recon']
382 formats = ['Default', 'HGNC_SYMBOL', 'ENSNG']
383
384 for model in models:
385 for fmt in formats:
386 cmd = [
387 'metabolicModel2Tabular',
388 '--model', model,
389 '--name', f'{model}_{fmt}',
390 '--medium_selector', 'allOpen',
391 '--gene_format', fmt,
392 '--out_tabular', f'{model}_{fmt}.csv',
393 '--out_log', f'{model}_{fmt}.log',
394 '--tool_dir', '/opt/COBRAxy'
395 ]
396 subprocess.run(cmd, check=True)
397 ```
398
399 ### Database Integration
400
401 Export model data to databases:
402
403 ```sql
404 -- Load CSV into PostgreSQL
405 CREATE TABLE model_reactions (
406 reaction_id VARCHAR(50),
407 gpr_rule TEXT,
408 reaction_formula TEXT,
409 lower_bound FLOAT,
410 upper_bound FLOAT,
411 objective_coefficient FLOAT,
412 medium_member BOOLEAN,
413 compartment VARCHAR(50),
414 subsystem VARCHAR(100)
415 );
416 -- COPY reads the file on the database server; for a client-side file use psql's \copy
417 COPY model_reactions FROM '/path/to/model_data.csv' WITH CSV HEADER;
418 ```
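
For a quick local test without a PostgreSQL server, the same table can be loaded into SQLite from Python. This sketch swaps in SQLite (via the standard `sqlite3` module) for the PostgreSQL example above, assumes a tab-delimited file with the documented column names, and stores `Medium_Member` as 0/1 because SQLite has no dedicated boolean type. `load_into_sqlite` is a hypothetical helper.

```python
import csv
import io
import sqlite3

SCHEMA = """
CREATE TABLE model_reactions (
    reaction_id TEXT,
    gpr_rule TEXT,
    reaction_formula TEXT,
    lower_bound REAL,
    upper_bound REAL,
    objective_coefficient REAL,
    medium_member INTEGER,
    compartment TEXT,
    subsystem TEXT
)
"""


def load_into_sqlite(conn: sqlite3.Connection, handle, delimiter: str = "\t") -> int:
    """Create the table, insert every row of the extract, return the row count."""
    conn.execute(SCHEMA)
    rows = [
        (r["Reaction_ID"], r["GPR_Rule"], r["Reaction_Formula"],
         float(r["Lower_Bound"]), float(r["Upper_Bound"]),
         float(r["Objective_Coefficient"]),
         1 if r["Medium_Member"].strip().upper() == "TRUE" else 0,
         r["Compartment"], r["Subsystem"])
        for r in csv.DictReader(handle, delimiter=delimiter)
    ]
    conn.executemany("INSERT INTO model_reactions VALUES (?,?,?,?,?,?,?,?,?)", rows)
    conn.commit()
    return len(rows)


# Two rows copied from the column-structure example, joined with tabs.
sample = "\n".join("\t".join(f) for f in [
    ["Reaction_ID", "GPR_Rule", "Reaction_Formula", "Lower_Bound", "Upper_Bound",
     "Objective_Coefficient", "Medium_Member", "Compartment", "Subsystem"],
    ["R00001", "GENE1 or GENE2", "A + B -> C + D", "-1000.0", "1000.0", "0.0",
     "FALSE", "cytosol", "Glycolysis"],
    ["EX_glc_e", "-", "glc_e <->", "-1000.0", "1000.0", "0.0",
     "TRUE", "extracellular", "Exchange"],
])
conn = sqlite3.connect(":memory:")
inserted = load_into_sqlite(conn, io.StringIO(sample))
exchange = conn.execute(
    "SELECT reaction_id FROM model_reactions WHERE medium_member = 1").fetchall()
print(inserted, exchange)
```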
419
420 ## See Also
421
422 - [RAS Generator](ras-generator.md) - Use extracted GPR rules for RAS computation
423 - [RPS Generator](rps-generator.md) - Use reaction formulas for RPS analysis
424 - [Custom Model Tutorial](../tutorials/custom-model-integration.md)
425 - [Gene Mapping Reference](../tutorials/gene-id-conversion.md)