comparison COBRAxy/docs/tools/import-metabolic-model.md @ 542:fcdbc81feb45 draft
Uploaded
| author | francesco_lapi |
|---|---|
| date | Sun, 26 Oct 2025 19:27:41 +0000 |
| parents | |
| children | 73f2f7e2be17 |
| 541:fa93040a75af | 542:fcdbc81feb45 |
|---|---|
| 1 # Import Metabolic Model | |
| 2 | |
| 3 Import and extract metabolic model components into tabular format for analysis and integration. | |
| 4 | |
| 5 ## Overview | |
| 6 | |
| 7 Import Metabolic Model (importMetabolicModel) imports metabolic models from various formats (SBML, JSON, MAT, YAML) and extracts key components into comprehensive tabular summaries. This tool processes built-in or custom models, applies medium constraints, handles gene nomenclature conversion, and outputs structured data for downstream analysis. | |
| 8 | |
| 9 ## Usage | |
| 10 | |
| 11 ### Command Line | |
| 12 | |
| 13 ```bash | |
| 14 importMetabolicModel --model ENGRO2 \ | |
| 15 --name ENGRO2 \ | |
| 16 --medium_selector allOpen \ | |
| 17 --out_tabular model_data.csv \ | |
| 18 --out_log extraction.log \ | |
| 19 --tool_dir /path/to/COBRAxy/src | |
| 20 ``` | |
| 21 | |
| 22 ### Galaxy Interface | |
| 23 | |
| 24 Select "Import Metabolic Model" from the COBRAxy tool suite and configure model extraction parameters. | |
| 25 | |
| 26 ## Parameters | |
| 27 | |
| 28 ### Required Parameters | |
| 29 | |
| 30 | Parameter | Flag | Description | | |
| 31 |-----------|------|-------------| | |
| 32 | Model Name | `--name` | Model identifier for output files | | |
| 33 | Medium Selector | `--medium_selector` | Medium configuration option | | |
| 34 | Output Tabular | `--out_tabular` | Output file path (CSV or XLSX) | | |
| 35 | Output Log | `--out_log` | Log file for processing information | | |
| 36 | Tool Directory | `--tool_dir` | COBRAxy installation directory | | |
| 37 | |
| 38 ### Model Selection Parameters | |
| 39 | |
| 40 | Parameter | Flag | Description | Default | | |
| 41 |-----------|------|-------------|---------| | |
| 42 | Built-in Model | `--model` | Pre-installed model (ENGRO2, Recon, HMRcore) | - | | |
| 43 | Custom Model | `--input` | Path to custom model file (SBML, JSON, MAT, or YAML) | - | |
| 44 | |
| 45 **Note**: Provide either `--model` OR `--input`, not both. | |
| 46 | |
| 47 ### Optional Parameters | |
| 48 | |
| 49 | Parameter | Flag | Description | Default | | |
| 50 |-----------|------|-------------|---------| | |
| 51 | Custom Medium | `--custom_medium` | CSV file with medium constraints | - | | |
| 52 | |
| 53 ## Model Selection | |
| 54 | |
| 55 ### Built-in Models | |
| 56 | |
| 57 #### ENGRO2 | |
| 58 - **Species**: Homo sapiens | |
| 59 - **Scope**: Genome-scale reconstruction | |
| 60 - **Reactions**: ~2,000 reactions | |
| 61 - **Metabolites**: ~1,500 metabolites | |
| 62 - **Coverage**: Comprehensive human metabolism | |
| 63 | |
| 64 #### Recon | |
| 65 - **Species**: Homo sapiens | |
| 66 - **Scope**: Recon3D human reconstruction | |
| 67 - **Reactions**: ~10,000+ reactions | |
| 68 - **Metabolites**: ~5,000+ metabolites | |
| 69 - **Coverage**: Most comprehensive human model | |
| 70 | |
| 71 #### HMRcore | |
| 72 - **Species**: Homo sapiens | |
| 73 - **Scope**: Core metabolic network | |
| 74 - **Reactions**: ~300 essential reactions | |
| 75 - **Metabolites**: ~200 core metabolites | |
| 76 - **Coverage**: Central carbon and energy metabolism | |
| 77 | |
| 78 ### Custom Models | |
| 79 | |
| 80 Supported formats for custom model import: | |
| 81 - **SBML**: Systems Biology Markup Language (.xml, .sbml) | |
| 82 - **JSON**: COBRApy JSON format (.json) | |
| 83 - **MAT**: MATLAB format (.mat) | |
| 84 - **YAML**: YAML format (.yml, .yaml) | |
| 85 - **Compressed**: All formats support .gz, .zip, .bz2 compression | |
| 86 | |
| 87 ## Medium Configuration | |
| 88 | |
| 89 ### allOpen (Default) | |
| 90 - All exchange reactions unconstrained | |
| 91 - Maximum metabolic flexibility | |
| 92 - Suitable for general analysis | |
| 93 | |
| 94 ### Custom Medium | |
| 95 Users can specify custom medium constraints by providing a CSV file with exchange reaction bounds. | |
| 96 | |
| 97 ## Output Format | |
| 98 | |
| 99 ### Tabular Summary File | |
| 100 | |
| 101 The output contains comprehensive model information in CSV or XLSX format: | |
| 102 | |
| 103 #### Column Structure | |
| 104 ``` | |
| 105 Reaction_ID GPR_Rule Reaction_Formula Lower_Bound Upper_Bound Objective_Coefficient Medium_Member Compartment Subsystem | |
| 106 R00001 GENE1 or GENE2 A + B -> C + D -1000.0 1000.0 0.0 FALSE cytosol Glycolysis | |
| 107 R00002 GENE3 and GENE4 E <-> F -1000.0 1000.0 0.0 FALSE mitochondria TCA_Cycle | |
| 108 EX_glc_e - glc_e <-> -1000.0 1000.0 0.0 TRUE extracellular Exchange | |
| 109 ``` | |
| 110 | |
| 111 #### Data Fields | |
| 112 | |
| 113 | Field | Description | Values | | |
| 114 |-------|-------------|---------| | |
| 115 | Reaction_ID | Unique reaction identifier | String | | |
| 116 | GPR_Rule | Gene-protein-reaction association | Logical expression | | |
| 117 | Reaction_Formula | Stoichiometric equation | Metabolites with coefficients | | |
| 118 | Lower_Bound | Minimum flux constraint | Numeric (typically -1000) | | |
| 119 | Upper_Bound | Maximum flux constraint | Numeric (typically 1000) | | |
| 120 | Objective_Coefficient | Biomass/objective weight | Numeric (0 or 1) | | |
| 121 | Medium_Member | Exchange reaction flag | TRUE/FALSE | | |
| 122 | Compartment | Subcellular location | String (for ENGRO2 only) | | |
| 123 | Subsystem | Metabolic pathway | String | | |
| 124 | |
| 125 ## Examples | |
| 126 | |
| 127 ### Extract Built-in Model Data | |
| 128 | |
| 129 ```bash | |
| 130 # Extract ENGRO2 model with default settings | |
| 131 importMetabolicModel --model ENGRO2 \ | |
| 132 --name ENGRO2_extraction \ | |
| 133 --medium_selector allOpen \ | |
| 134 --out_tabular ENGRO2_data.csv \ | |
| 135 --out_log ENGRO2_log.txt \ | |
| 136 --tool_dir /opt/COBRAxy/src | |
| 137 ``` | |
| 138 | |
| 139 ### Process Custom Model | |
| 140 | |
| 141 ```bash | |
| 142 # Extract custom SBML model | |
| 143 importMetabolicModel --input /data/custom_model.xml \ | |
| 144 --name CustomModel \ | |
| 145 --medium_selector allOpen \ | |
| 146 --out_tabular custom_model_data.csv \ | |
| 147 --out_log custom_extraction.log \ | |
| 148 --tool_dir /opt/COBRAxy/src | |
| 149 ``` | |
| 150 | |
| 151 ### Extract Core Model for Quick Analysis | |
| 152 | |
| 153 ```bash | |
| 154 # Extract HMRcore for rapid prototyping | |
| 155 importMetabolicModel --model HMRcore \ | |
| 156 --name CoreModel \ | |
| 157 --medium_selector allOpen \ | |
| 158 --out_tabular core_reactions.csv \ | |
| 159 --out_log core_log.txt \ | |
| 160 --tool_dir /opt/COBRAxy/src | |
| 161 ``` | |
| 162 | |
| 163 ### Batch Processing Multiple Models | |
| 164 | |
| 165 ```bash | |
| 166 #!/bin/bash | |
| 167 models=("ENGRO2" "HMRcore" "Recon") | |
| 168 for model in "${models[@]}"; do | |
| 169 importMetabolicModel --model "$model" \ | |
| 170 --name "${model}_extract" \ | |
| 171 --medium_selector allOpen \ | |
| 172 --out_tabular "${model}_data.csv" \ | |
| 173 --out_log "${model}_log.txt" \ | |
| 174 --tool_dir /opt/COBRAxy/src | |
| 175 done | |
| 176 ``` | |
| 177 | |
| 178 ## Use Cases | |
| 179 | |
| 180 ### Model Comparison | |
| 181 Extract multiple models to compare: | |
| 182 - Reaction coverage across different reconstructions | |
| 183 - Gene-reaction associations | |
| 184 - Pathway representation | |
| 185 - Metabolite compartmentalization | |
| 186 | |
| 187 ### Data Integration | |
| 188 Prepare model data for: | |
| 189 - Custom analysis pipelines | |
| 190 - Database integration | |
| 191 - Pathway annotation | |
| 192 - Cross-reference mapping | |
| 193 | |
| 194 ### Quality Control | |
| 195 Validate model properties: | |
| 196 - Check reaction balancing | |
| 197 - Verify gene associations | |
| 198 - Assess network connectivity | |
| 199 - Identify missing annotations | |
| 200 | |
| 201 ### Custom Analysis | |
| 202 Export structured data for: | |
| 203 - Network analysis (graph theory) | |
| 204 - Machine learning applications | |
| 205 - Statistical modeling | |
| 206 - Comparative genomics | |
| 207 | |
| 208 ## Integration Workflow | |
| 209 | |
| 210 ### Downstream Tools | |
| 211 | |
| 212 The extracted tabular data serves as input for: | |
| 213 | |
| 214 #### COBRAxy Tools | |
| 215 - [RAS Generator](ras-generator.md) - Use extracted GPR rules | |
| 216 - [RPS Generator](rps-generator.md) - Use reaction formulas | |
| 217 - [RAS to Bounds](ras-to-bounds.md) - Use reaction bounds | |
| 218 - [MAREA](marea.md) - Use reaction annotations | |
| 219 | |
| 220 #### External Analysis | |
| 221 - **R/Bioconductor**: Import CSV for pathway analysis | |
| 222 - **Python/pandas**: Load data for network analysis | |
| 223 - **MATLAB**: Process XLSX for modeling | |
| 224 - **Cytoscape**: Network visualization | |
| 225 - **Databases**: Populate reaction databases | |
| 226 | |
| 227 ### Typical Pipeline | |
| 228 | |
| 229 ```bash | |
| 230 # 1. Extract model components | |
| 231 importMetabolicModel --model ENGRO2 --name ModelData \ | |
| 232 --medium_selector allOpen --out_tabular model_components.csv \ | |
| 233 --out_log model_components.log --tool_dir /opt/COBRAxy/src | |
| 234 | |
| 235 # 2. Use extracted data for RAS analysis | |
| 236 ras_generator -td /opt/COBRAxy/src -rs Custom \ | |
| 237 -rl model_components.csv \ | |
| 238 -in expression_data.tsv -ra ras_scores.tsv | |
| 239 | |
| 240 # 3. Apply RAS-derived constraints to reaction bounds | |
| 241 ras_to_bounds -td /opt/COBRAxy/src -ms Custom -mo model_components.csv \ | |
| 242 -ir ras_scores.tsv -idop constrained_bounds/ | |
| 243 | |
| 244 # 4. Visualize results | |
| 245 marea -td /opt/COBRAxy/src -input_data ras_scores.tsv \ | |
| 246 -choice_map Custom -custom_map custom.svg -idop results/ | |
| 247 ``` | |
| 248 | |
| 249 ## Quality Control | |
| 250 | |
| 251 ### Pre-extraction Validation | |
| 252 - Verify model file integrity and format | |
| 253 - Check SBML compliance for custom models | |
| 254 - Validate gene ID formats and coverage | |
| 255 - Confirm medium constraint specifications | |
| 256 | |
| 257 ### Post-extraction Checks | |
| 258 - **Completeness**: Verify all expected reactions extracted | |
| 259 - **Consistency**: Check stoichiometric balance | |
| 260 - **Annotations**: Validate gene-reaction associations | |
| 261 - **Formatting**: Confirm output file structure | |
| 262 | |
| 263 ### Data Validation | |
| 264 | |
| 265 #### Reaction Balancing | |
| 266 ```bash | |
| 267 # List reactions whose formula has no direction arrow (-> or <->); this flags malformed formulas, not mass balance | |
| 268 awk -F'\t' 'NR>1 && $3 !~ /->/ {print $1, $3}' model_data.csv | |
| 269 ``` | |
| 270 | |
| 271 #### Gene Coverage | |
| 272 ```bash | |
| 273 # Count reactions with GPR rules | |
| 274 awk -F'\t' 'NR>1 && $2 != "" && $2 != "-" {count++} END {print count+0 " reactions with GPR"}' model_data.csv | |
| 275 ``` | |
| 276 | |
| 277 #### Exchange Reactions | |
| 278 ```bash | |
| 279 # List medium components | |
| 280 awk -F'\t' 'NR>1 && $7 == "TRUE" {print $1}' model_data.csv | |
| 281 ``` | |
| 282 | |
| 283 ## Tips and Best Practices | |
| 284 | |
| 285 ### Model Selection | |
| 286 - **ENGRO2**: Balanced coverage for human tissue analysis | |
| 287 - **HMRcore**: Fast processing for algorithm development | |
| 288 - **Recon**: Comprehensive analysis requiring computational resources | |
| 289 - **Custom**: Organism-specific or specialized models | |
| 290 | |
| 291 ### Output Format Optimization | |
| 292 - **CSV**: Lightweight, universal compatibility | |
| 293 - Choose based on downstream analysis requirements | |
| 294 | |
| 295 ### Performance Considerations | |
| 296 - Large models (Recon) may require substantial memory | |
| 297 - Consider batch processing for multiple extractions | |
| 298 | |
| 299 ## Troubleshooting | |
| 300 | |
| 301 ### Common Issues | |
| 302 | |
| 303 **Model loading fails** | |
| 304 - Check file format and compression | |
| 305 - Verify SBML/JSON/MAT/YAML validity for custom models | |
| 306 - Ensure sufficient system memory | |
| 307 | |
| 308 **Empty output file** | |
| 309 - Model may contain no reactions | |
| 310 - Check model file integrity | |
| 311 - Verify tool directory configuration | |
| 312 | |
| 313 ### Error Messages | |
| 314 | |
| 315 | Error | Cause | Solution | | |
| 316 |-------|-------|----------| | |
| 317 | "Model file not found" | Invalid file path | Check file location and permissions | | |
| 318 | "Unsupported format" | Invalid model format | Use SBML, JSON, MAT, or YAML | | |
| 319 | "Memory allocation error" | Insufficient system memory | Use smaller model or increase memory | | |
| 320 | |
| 321 ### Performance Issues | |
| 322 | |
| 323 **Slow processing** | |
| 324 - Large models require more time | |
| 325 - Monitor system resource usage | |
| 326 | |
| 327 **Memory errors** | |
| 328 - Reduce model size if possible | |
| 329 - Process in smaller batches | |
| 330 - Increase available system memory | |
| 331 | |
| 332 **Output file corruption** | |
| 333 - Check disk space availability | |
| 334 - Verify file write permissions | |
| 335 - Monitor for system interruptions | |
| 336 | |
| 337 ## Advanced Usage | |
| 338 | |
| 339 ### Batch Extraction Script | |
| 340 | |
| 341 ```python | |
| 342 #!/usr/bin/env python3 | |
| 343 import subprocess | |
| 344 import sys | |
| 345 | |
| 346 models = ['ENGRO2', 'HMRcore', 'Recon'] | |
| 347 | |
| 348 for model in models: | |
| 349     cmd = [ | |
| 350         'importMetabolicModel', | |
| 351         '--model', model, | |
| 352         '--name', f'{model}_data', | |
| 353         '--medium_selector', 'allOpen', | |
| 354         '--out_tabular', f'{model}.csv', | |
| 355         '--out_log', f'{model}.log', | |
| 356         '--tool_dir', '/opt/COBRAxy/src' | |
| 357     ] | |
| 358     subprocess.run(cmd, check=True) | |
| 359 ``` | |
| 360 | |
| 361 ### Database Integration | |
| 362 | |
| 363 Export model data to databases: | |
| 364 | |
| 365 ```sql | |
| 366 -- Load CSV into PostgreSQL | |
| 367 CREATE TABLE model_reactions ( | |
| 368 reaction_id VARCHAR(50), | |
| 369 gpr_rule TEXT, | |
| 370 reaction_formula TEXT, | |
| 371 lower_bound FLOAT, | |
| 372 upper_bound FLOAT, | |
| 373 objective_coefficient FLOAT, | |
| 374 medium_member BOOLEAN, | |
| 375 compartment VARCHAR(50), | |
| 376 subsystem VARCHAR(100) | |
| 377 ); | |
| 378 | |
| 379 COPY model_reactions FROM '/path/to/model_data.csv' WITH (FORMAT csv, HEADER); | |
| 380 ``` | |
| 381 | |
| 382 ## See Also | |
| 383 | |
| 384 - [Export Metabolic Model](export-metabolic-model.md) - Export tabular data to model formats | |
| 385 - [RAS Generator](ras-generator.md) - Use extracted GPR rules for RAS computation | |
| 386 - [RPS Generator](rps-generator.md) - Use reaction formulas for RPS analysis | |
| 387 - [Custom Model Tutorial](/tutorials/custom-model-integration.md) | |
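
The data-validation checks in the removed file (gene coverage, medium members, formulas without an arrow) could also be run in Python rather than awk. The following is only a sketch under stated assumptions: the output is tab-delimited, it carries the column names shown under "Column Structure", and a missing GPR rule appears as an empty cell or `-`. The `SAMPLE` text and the `summarize` helper are hypothetical and not part of COBRAxy.

```python
import csv
import io

# Hypothetical sample mirroring the documented "Column Structure" rows;
# real importMetabolicModel output may use another delimiter or column set.
SAMPLE = (
    "Reaction_ID\tGPR_Rule\tReaction_Formula\tLower_Bound\tUpper_Bound\t"
    "Objective_Coefficient\tMedium_Member\tCompartment\tSubsystem\n"
    "R00001\tGENE1 or GENE2\tA + B -> C + D\t-1000.0\t1000.0\t0.0\tFALSE\tcytosol\tGlycolysis\n"
    "R00002\tGENE3 and GENE4\tE <-> F\t-1000.0\t1000.0\t0.0\tFALSE\tmitochondria\tTCA_Cycle\n"
    "EX_glc_e\t-\tglc_e <->\t-1000.0\t1000.0\t0.0\tTRUE\textracellular\tExchange\n"
)


def summarize(handle, delimiter="\t"):
    """Return simple post-extraction counts for a tabular model summary."""
    rows = list(csv.DictReader(handle, delimiter=delimiter))
    # Assumption: an empty cell or "-" means the reaction has no GPR rule.
    with_gpr = [r for r in rows if r["GPR_Rule"].strip() not in ("", "-")]
    # Medium members are flagged TRUE in the Medium_Member column.
    medium = [r["Reaction_ID"] for r in rows
              if r["Medium_Member"].strip().upper() == "TRUE"]
    # "<->" contains "->", so one substring test covers both arrow styles.
    no_arrow = [r["Reaction_ID"] for r in rows
                if "->" not in r["Reaction_Formula"]]
    return {
        "reactions": len(rows),
        "with_gpr": len(with_gpr),
        "medium": medium,
        "no_arrow": no_arrow,
    }


summary = summarize(io.StringIO(SAMPLE))
print(summary)
```

To run the same checks on a real file, pass an open file handle instead of the in-memory sample, and set `delimiter=","` if the output turns out to be comma-separated.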
