comparison COBRAxy/docs/tools/metabolic-model-setting.md @ 492:4ed95023af20 draft
Uploaded
| author | francesco_lapi |
|---|---|
| date | Tue, 30 Sep 2025 14:02:17 +0000 |
| parents | |
| children |
comparison
| 491:7a413a5ec566 | 492:4ed95023af20 |
|---|---|
| 1 # Metabolic Model Setting | |
| 2 | |
| 3 Extract and organize metabolic model components into tabular format for analysis and integration. | |
| 4 | |
| 5 ## Overview | |
| 6 | |
| 7 Metabolic Model Setting (metabolicModel2Tabular) extracts key components from SBML metabolic models and generates comprehensive tabular summaries. This tool processes built-in or custom models, applies medium constraints, handles gene nomenclature conversion, and outputs structured data for downstream analysis. | |
| 8 | |
| 9 ## Usage | |
| 10 | |
| 11 ### Command Line | |
| 12 | |
| 13 ```bash | |
| 14 metabolicModel2Tabular --model ENGRO2 \ | |
| 15 --name ENGRO2 \ | |
| 16 --medium_selector allOpen \ | |
| 17 --gene_format Default \ | |
| 18 --out_tabular model_data.csv \ | |
| 19 --out_log extraction.log \ | |
| 20 --tool_dir /path/to/COBRAxy | |
| 21 ``` | |
| 22 | |
| 23 ### Galaxy Interface | |
| 24 | |
| 25 Select "Metabolic Model Setting" from the COBRAxy tool suite and configure model extraction parameters. | |
| 26 | |
| 27 ## Parameters | |
| 28 | |
| 29 ### Required Parameters | |
| 30 | |
| 31 | Parameter | Flag | Description | | |
| 32 |-----------|------|-------------| | |
| 33 | Model Name | `--name` | Model identifier for output files | | |
| 34 | Medium Selector | `--medium_selector` | Medium configuration option | | |
| 35 | Output Tabular | `--out_tabular` | Output file path (CSV or XLSX) | | |
| 36 | Output Log | `--out_log` | Log file for processing information | | |
| 37 | Tool Directory | `--tool_dir` | COBRAxy installation directory | | |
| 38 | |
| 39 ### Model Selection Parameters | |
| 40 | |
| 41 | Parameter | Flag | Description | Default | | |
| 42 |-----------|------|-------------|---------| | |
| 43 | Built-in Model | `--model` | Pre-installed model (ENGRO2, Recon, HMRcore) | - | | |
| 44 | Custom Model | `--input` | Path to custom SBML/JSON model file | - | | |
| 45 | |
| 46 **Note**: Provide either `--model` OR `--input`, not both. | |
| 47 | |
| 48 ### Optional Parameters | |
| 49 | |
| 50 | Parameter | Flag | Description | Default | | |
| 51 |-----------|------|-------------|---------| | |
| 52 | Gene Format | `--gene_format` | Gene ID format conversion | Default | | |
| 53 | |
| 54 ## Model Selection | |
| 55 | |
| 56 ### Built-in Models | |
| 57 | |
| 58 #### ENGRO2 | |
| 59 - **Species**: Homo sapiens | |
| 60 - **Scope**: Genome-scale reconstruction | |
| 61 - **Reactions**: ~2,000 reactions | |
| 62 - **Metabolites**: ~1,500 metabolites | |
| 63 - **Coverage**: Comprehensive human metabolism | |
| 64 | |
| 65 #### Recon | |
| 66 - **Species**: Homo sapiens | |
| 67 - **Scope**: Recon3D human reconstruction | |
| 68 - **Reactions**: ~10,000+ reactions | |
| 69 - **Metabolites**: ~5,000+ metabolites | |
| 70 - **Coverage**: Most comprehensive human model | |
| 71 | |
| 72 #### HMRcore | |
| 73 - **Species**: Homo sapiens | |
| 74 - **Scope**: Core metabolic network | |
| 75 - **Reactions**: ~300 essential reactions | |
| 76 - **Metabolites**: ~200 core metabolites | |
| 77 - **Coverage**: Central carbon and energy metabolism | |
| 78 | |
| 79 ### Custom Models | |
| 80 | |
| 81 Supported formats for custom model import: | |
| 82 - **SBML**: Systems Biology Markup Language (.xml, .sbml) | |
| 83 - **JSON**: COBRApy JSON format (.json) | |
| 84 - **MAT**: MATLAB format (.mat) | |
| 85 - **YML**: YAML format (.yml, .yaml) | |
| 86 - **Compressed**: All formats support .gz, .zip, .bz2 compression | |
| 87 | |
| 88 ## Medium Configuration | |
| 89 | |
| 90 ### allOpen (Default) | |
| 91 - All exchange reactions unconstrained | |
| 92 - Maximum metabolic flexibility | |
| 93 - Suitable for general analysis | |
| 94 | |
| 95 ### Custom Medium | |
| 96 Users can specify custom medium constraints through the Galaxy interface or by modifying the tool configuration. | |
| 97 | |
| 98 ## Gene Format Options | |
| 99 | |
| 100 | Format | Description | Example | | |
| 101 |--------|-------------|---------| | |
| 102 | Default | Original model gene IDs | As stored in model | | |
| 103 | ENSNG | Ensembl Gene IDs | ENSG00000139618 | | |
| 104 | HGNC_SYMBOL | HUGO Gene Symbols | BRCA2 | | |
| 105 | HGNC_ID | HUGO Gene Nomenclature Committee IDs | HGNC:1101 | |
| 106 | ENTREZ | NCBI Entrez Gene IDs | 675 | | |
| 107 | |
| 108 Gene format conversion uses internal mapping tables and may not cover all genes in custom models. | |
| 109 | |
| 110 ## Output Format | |
| 111 | |
| 112 ### Tabular Summary File | |
| 113 | |
| 114 The output contains comprehensive model information in CSV or XLSX format: | |
| 115 | |
| 116 #### Column Structure | |
| 117 ``` | |
| 118 Reaction_ID GPR_Rule Reaction_Formula Lower_Bound Upper_Bound Objective_Coefficient Medium_Member Compartment Subsystem | |
| 119 R00001 GENE1 or GENE2 A + B -> C + D -1000.0 1000.0 0.0 FALSE cytosol Glycolysis | |
| 120 R00002 GENE3 and GENE4 E <-> F -1000.0 1000.0 0.0 FALSE mitochondria TCA_Cycle | |
| 121 EX_glc_e - glc_e <-> -1000.0 1000.0 0.0 TRUE extracellular Exchange | |
| 122 ``` | |
| 123 | |
| 124 #### Data Fields | |
| 125 | |
| 126 | Field | Description | Values | | |
| 127 |-------|-------------|---------| | |
| 128 | Reaction_ID | Unique reaction identifier | String | | |
| 129 | GPR_Rule | Gene-protein-reaction association | Logical expression | | |
| 130 | Reaction_Formula | Stoichiometric equation | Metabolites with coefficients | | |
| 131 | Lower_Bound | Minimum flux constraint | Numeric (typically -1000) | | |
| 132 | Upper_Bound | Maximum flux constraint | Numeric (typically 1000) | | |
| 133 | Objective_Coefficient | Biomass/objective weight | Numeric (0 or 1) | | |
| 134 | Medium_Member | Exchange reaction flag | TRUE/FALSE | | |
| 135 | Compartment | Subcellular location | String (for ENGRO2 only) | | |
| 136 | Subsystem | Metabolic pathway | String | | |
| 137 | |
| 138 ## Examples | |
| 139 | |
| 140 ### Extract Built-in Model Data | |
| 141 | |
| 142 ```bash | |
| 143 # Extract ENGRO2 model with default settings | |
| 144 metabolicModel2Tabular --model ENGRO2 \ | |
| 145 --name ENGRO2_extraction \ | |
| 146 --medium_selector allOpen \ | |
| 147 --gene_format Default \ | |
| 148 --out_tabular ENGRO2_data.csv \ | |
| 149 --out_log ENGRO2_log.txt \ | |
| 150 --tool_dir /opt/COBRAxy | |
| 151 ``` | |
| 152 | |
| 153 ### Process Custom Model | |
| 154 | |
| 155 ```bash | |
| 156 # Extract custom SBML model with gene conversion | |
| 157 metabolicModel2Tabular --input /data/custom_model.xml \ | |
| 158 --name CustomModel \ | |
| 159 --medium_selector allOpen \ | |
| 160 --gene_format HGNC_SYMBOL \ | |
| 161 --out_tabular custom_model_data.xlsx \ | |
| 162 --out_log custom_extraction.log \ | |
| 163 --tool_dir /opt/COBRAxy | |
| 164 ``` | |
| 165 | |
| 166 ### Extract Core Model for Quick Analysis | |
| 167 | |
| 168 ```bash | |
| 169 # Extract HMRcore for rapid prototyping | |
| 170 metabolicModel2Tabular --model HMRcore \ | |
| 171 --name CoreModel \ | |
| 172 --medium_selector allOpen \ | |
| 173 --gene_format ENSNG \ | |
| 174 --out_tabular core_reactions.csv \ | |
| 175 --out_log core_log.txt \ | |
| 176 --tool_dir /opt/COBRAxy | |
| 177 ``` | |
| 178 | |
| 179 ### Batch Processing Multiple Models | |
| 180 | |
| 181 ```bash | |
| 182 #!/bin/bash | |
| 183 models=("ENGRO2" "HMRcore" "Recon") | |
| 184 for model in "${models[@]}"; do | |
| 185 metabolicModel2Tabular --model "$model" \ | |
| 186 --name "${model}_extract" \ | |
| 187 --medium_selector allOpen \ | |
| 188 --gene_format HGNC_SYMBOL \ | |
| 189 --out_tabular "${model}_data.csv" \ | |
| 190 --out_log "${model}_log.txt" \ | |
| 191 --tool_dir /opt/COBRAxy | |
| 192 done | |
| 193 ``` | |
| 194 | |
| 195 ## Use Cases | |
| 196 | |
| 197 ### Model Comparison | |
| 198 Extract multiple models to compare: | |
| 199 - Reaction coverage across different reconstructions | |
| 200 - Gene-reaction associations | |
| 201 - Pathway representation | |
| 202 - Metabolite compartmentalization | |
| 203 | |
| 204 ### Data Integration | |
| 205 Prepare model data for: | |
| 206 - Custom analysis pipelines | |
| 207 - Database integration | |
| 208 - Pathway annotation | |
| 209 - Cross-reference mapping | |
| 210 | |
| 211 ### Quality Control | |
| 212 Validate model properties: | |
| 213 - Check reaction balancing | |
| 214 - Verify gene associations | |
| 215 - Assess network connectivity | |
| 216 - Identify missing annotations | |
| 217 | |
| 218 ### Custom Analysis | |
| 219 Export structured data for: | |
| 220 - Network analysis (graph theory) | |
| 221 - Machine learning applications | |
| 222 - Statistical modeling | |
| 223 - Comparative genomics | |
| 224 | |
| 225 ## Integration Workflow | |
| 226 | |
| 227 ### Downstream Tools | |
| 228 | |
| 229 The extracted tabular data serves as input for: | |
| 230 | |
| 231 #### COBRAxy Tools | |
| 232 - [RAS Generator](ras-generator.md) - Use extracted GPR rules | |
| 233 - [RPS Generator](rps-generator.md) - Use reaction formulas | |
| 234 - [RAS to Bounds](ras-to-bounds.md) - Use reaction bounds | |
| 235 - [MAREA](marea.md) - Use reaction annotations | |
| 236 | |
| 237 #### External Analysis | |
| 238 - **R/Bioconductor**: Import CSV for pathway analysis | |
| 239 - **Python/pandas**: Load data for network analysis | |
| 240 - **MATLAB**: Process XLSX for modeling | |
| 241 - **Cytoscape**: Network visualization | |
| 242 - **Databases**: Populate reaction databases | |
| 243 | |
| 244 ### Typical Pipeline | |
| 245 | |
| 246 ```bash | |
| 247 # 1. Extract model components | |
| 248 metabolicModel2Tabular --model ENGRO2 --name ModelData --medium_selector allOpen \ | |
| 249 --out_tabular model_components.csv --out_log model_components.log --tool_dir /opt/COBRAxy | |
| 250 | |
| 251 # 2. Use extracted data for RAS analysis | |
| 252 ras_generator -td /opt/COBRAxy -rs Custom \ | |
| 253 -rl model_components.csv \ | |
| 254 -in expression_data.tsv -ra ras_scores.tsv | |
| 255 | |
| 256 # 3. Apply RAS-derived constraints to the model bounds | |
| 257 ras_to_bounds -td /opt/COBRAxy -ms Custom -mo model_components.csv \ | |
| 258 -ir ras_scores.tsv -idop constrained_bounds/ | |
| 259 | |
| 260 # 4. Visualize results | |
| 261 marea -td /opt/COBRAxy -input_data ras_scores.tsv \ | |
| 262 -choice_map Custom -custom_map custom.svg -idop results/ | |
| 263 ``` | |
| 264 | |
| 265 ## Quality Control | |
| 266 | |
| 267 ### Pre-extraction Validation | |
| 268 - Verify model file integrity and format | |
| 269 - Check SBML compliance for custom models | |
| 270 - Validate gene ID formats and coverage | |
| 271 - Confirm medium constraint specifications | |
| 272 | |
| 273 ### Post-extraction Checks | |
| 274 - **Completeness**: Verify all expected reactions extracted | |
| 275 - **Consistency**: Check stoichiometric balance | |
| 276 - **Annotations**: Validate gene-reaction associations | |
| 277 - **Formatting**: Confirm output file structure | |
| 278 | |
| 279 ### Data Validation | |
| 280 | |
| 281 #### Reaction Balancing | |
| 282 ```bash | |
| 283 # List reactions whose formula has no reaction arrow (assumes tab-delimited output; does not verify mass balance) | |
| 284 awk -F'\t' 'NR>1 && $3 !~ /->/ {print $1, $3}' model_data.csv | |
| 285 ``` | |
| 286 | |
| 287 #### Gene Coverage | |
| 288 ```bash | |
| 289 # Count reactions with a GPR rule (empty cells and "-" placeholders are skipped) | |
| 290 awk -F'\t' 'NR>1 && $2 != "" && $2 != "-" {count++} END {print (count+0) " reactions with GPR"}' model_data.csv | |
| 291 ``` | |
| 292 | |
| 293 #### Exchange Reactions | |
| 294 ```bash | |
| 295 # List medium components | |
| 296 awk -F'\t' 'NR>1 && $7 == "TRUE" {print $1}' model_data.csv | |
| 297 ``` | |
| 298 | |
| 299 ## Tips and Best Practices | |
| 300 | |
| 301 ### Model Selection | |
| 302 - **ENGRO2**: Balanced coverage for human tissue analysis | |
| 303 - **HMRcore**: Fast processing for algorithm development | |
| 304 - **Recon**: Comprehensive analysis requiring computational resources | |
| 305 - **Custom**: Organism-specific or specialized models | |
| 306 | |
| 307 ### Gene Format Selection | |
| 308 - **Default**: Preserve original model annotations | |
| 309 - **HGNC_SYMBOL**: Human-readable gene names | |
| 310 - **ENSNG**: Stable identifiers for bioinformatics | |
| 311 - **ENTREZ**: Cross-database compatibility | |
| 312 | |
| 313 ### Output Format Optimization | |
| 314 - **CSV**: Lightweight, universal compatibility | |
| 315 - **XLSX**: Rich formatting, multiple sheets possible | |
| 316 - Choose based on downstream analysis requirements | |
| 317 | |
| 318 ### Performance Considerations | |
| 319 - Large models (Recon) may require substantial memory | |
| 320 - Gene format conversion adds processing time | |
| 321 - Consider batch processing for multiple extractions | |
| 322 | |
| 323 ## Troubleshooting | |
| 324 | |
| 325 ### Common Issues | |
| 326 | |
| 327 **Model loading fails** | |
| 328 - Check file format and compression | |
| 329 - Verify SBML validity for custom models | |
| 330 - Ensure sufficient system memory | |
| 331 | |
| 332 **Gene format conversion errors** | |
| 333 - Mapping tables may not cover all genes | |
| 334 - Original gene IDs retained when conversion fails | |
| 335 - Check log file for conversion statistics | |
| 336 | |
| 337 **Empty output file** | |
| 338 - Model may contain no reactions | |
| 339 - Check model file integrity | |
| 340 - Verify tool directory configuration | |
| 341 | |
| 342 ### Error Messages | |
| 343 | |
| 344 | Error | Cause | Solution | | |
| 345 |-------|-------|----------| | |
| 346 | "Model file not found" | Invalid file path | Check file location and permissions | | |
| 347 | "Unsupported format" | Invalid model format | Use SBML, JSON, MAT, or YML | | |
| 348 | "Gene mapping failed" | Missing gene conversion data | Use Default format or update mappings | | |
| 349 | "Memory allocation error" | Insufficient system memory | Use smaller model or increase memory | | |
| 350 | |
| 351 ### Performance Issues | |
| 352 | |
| 353 **Slow processing** | |
| 354 - Large models require more time | |
| 355 - Gene conversion adds overhead | |
| 356 - Monitor system resource usage | |
| 357 | |
| 358 **Memory errors** | |
| 359 - Reduce model size if possible | |
| 360 - Process in smaller batches | |
| 361 - Increase available system memory | |
| 362 | |
| 363 **Output file corruption** | |
| 364 - Check disk space availability | |
| 365 - Verify file write permissions | |
| 366 - Monitor for system interruptions | |
| 367 | |
| 368 ## Advanced Usage | |
| 369 | |
| 370 ### Custom Gene Mapping | |
| 371 | |
| 372 Advanced users can extend gene format conversion by modifying mapping files in the `local/mappings/` directory. | |
| 373 | |
| 374 ### Batch Extraction Script | |
| 375 | |
| 376 ```python | |
| 377 #!/usr/bin/env python3 | |
| 378 import subprocess | |
| 379 import sys | |
| 380 | |
| 381 models = ['ENGRO2', 'HMRcore', 'Recon'] | |
| 382 formats = ['Default', 'HGNC_SYMBOL', 'ENSNG'] | |
| 383 | |
| 384 for model in models: | |
| 385 for fmt in formats: | |
| 386 cmd = [ | |
| 387 'metabolicModel2Tabular', | |
| 388 '--model', model, | |
| 389 '--name', f'{model}_{fmt}', | |
| 390 '--medium_selector', 'allOpen', | |
| 391 '--gene_format', fmt, | |
| 392 '--out_tabular', f'{model}_{fmt}.csv', | |
| 393 '--out_log', f'{model}_{fmt}.log', | |
| 394 '--tool_dir', '/opt/COBRAxy' | |
| 395 ] | |
| 396 subprocess.run(cmd, check=True) | |
| 397 ``` | |
| 398 | |
| 399 ### Database Integration | |
| 400 | |
| 401 Export model data to databases: | |
| 402 | |
| 403 ```sql | |
| 404 -- Load CSV into PostgreSQL | |
| 405 CREATE TABLE model_reactions ( | |
| 406 reaction_id VARCHAR(50), | |
| 407 gpr_rule TEXT, | |
| 408 reaction_formula TEXT, | |
| 409 lower_bound FLOAT, | |
| 410 upper_bound FLOAT, | |
| 411 objective_coefficient FLOAT, | |
| 412 medium_member BOOLEAN, | |
| 413 compartment VARCHAR(50), | |
| 414 subsystem VARCHAR(100) | |
| 415 ); | |
| 416 | |
| 417 COPY model_reactions FROM '/path/to/model_data.csv' WITH (FORMAT csv, HEADER true); | |
| 418 ``` | |
| 419 | |
| 420 ## See Also | |
| 421 | |
| 422 - [RAS Generator](ras-generator.md) - Use extracted GPR rules for RAS computation | |
| 423 - [RPS Generator](rps-generator.md) - Use reaction formulas for RPS analysis | |
| 424 - [Custom Model Tutorial](../tutorials/custom-model-integration.md) | |
| 425 - [Gene Mapping Reference](../tutorials/gene-id-conversion.md) |
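
The awk checks in the Data Validation section can also be sketched in Python with only the standard library. This is a minimal sketch and not part of COBRAxy: it assumes the exported table is tab-delimited and uses the column names from the Column Structure example (`Reaction_ID`, `GPR_Rule`, `Reaction_Formula`, `Medium_Member`). The function name `summarize_model_table` and the sample rows are hypothetical.

```python
import csv
import io


def summarize_model_table(handle, delimiter="\t"):
    """Summarize an exported model table read from an open text handle.

    Assumes the column names shown in the Column Structure example.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    summary = {"total": 0, "with_gpr": 0, "medium": [], "no_arrow": []}
    for row in reader:
        summary["total"] += 1
        gpr = (row.get("GPR_Rule") or "").strip()
        # Treat an empty cell or a "-" placeholder as "no GPR rule".
        if gpr not in ("", "-"):
            summary["with_gpr"] += 1
        # Collect reactions flagged as medium members.
        if (row.get("Medium_Member") or "").strip().upper() == "TRUE":
            summary["medium"].append(row["Reaction_ID"])
        # "<->" contains "->", so one substring test covers both arrows.
        if "->" not in (row.get("Reaction_Formula") or ""):
            summary["no_arrow"].append(row["Reaction_ID"])
    return summary


# Hypothetical sample mirroring the Column Structure example.
SAMPLE = (
    "Reaction_ID\tGPR_Rule\tReaction_Formula\tMedium_Member\n"
    "R00001\tGENE1 or GENE2\tA + B -> C + D\tFALSE\n"
    "EX_glc_e\t-\tglc_e <->\tTRUE\n"
)

print(summarize_model_table(io.StringIO(SAMPLE)))
```

For a real export, the same function could be called on a file opened with `open("model_data.csv", newline="")`. Like the awk one-liners, the arrow test only flags malformed formulas and says nothing about stoichiometric balance.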

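For the Model Comparison use case, reaction coverage across two exported tables can be compared with plain set operations. The following is a minimal sketch, assuming tab-delimited tables with a `Reaction_ID` column as in the Column Structure example. The helpers `reaction_ids` and `compare_coverage` and the in-memory sample tables are hypothetical stand-ins for real exports.

```python
import csv
import io


def reaction_ids(handle, delimiter="\t"):
    """Return the set of Reaction_ID values in an exported model table."""
    return {row["Reaction_ID"] for row in csv.DictReader(handle, delimiter=delimiter)}


def compare_coverage(ids_a, ids_b):
    """Split two reaction ID sets into shared and model-specific parts."""
    return {
        "shared": sorted(ids_a & ids_b),
        "only_a": sorted(ids_a - ids_b),
        "only_b": sorted(ids_b - ids_a),
    }


# Hypothetical sample tables; real input would be files opened with open(path, newline="").
TABLE_A = "Reaction_ID\tSubsystem\nR00001\tGlycolysis\nR00002\tTCA_Cycle\n"
TABLE_B = "Reaction_ID\tSubsystem\nR00002\tTCA_Cycle\nR00003\tGlycolysis\n"

coverage = compare_coverage(
    reaction_ids(io.StringIO(TABLE_A)),
    reaction_ids(io.StringIO(TABLE_B)),
)
print(coverage)
```

Comparing identifiers this way is only meaningful when both models share a reaction naming scheme. Models from different reconstructions may need an identifier mapping first.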