diff COBRAxy/docs/tools/import-metabolic-model.md @ 547:73f2f7e2be17 draft
Uploaded
| author | francesco_lapi |
|---|---|
| date | Tue, 28 Oct 2025 10:44:07 +0000 |
| parents | fcdbc81feb45 |
| children | |
--- a/COBRAxy/docs/tools/import-metabolic-model.md Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/tools/import-metabolic-model.md Tue Oct 28 10:44:07 2025 +0000
@@ -1,387 +1,142 @@
 # Import Metabolic Model

-Import and extract metabolic model components into tabular format for analysis and integration.
+Import and extract metabolic model components into tabular format.

 ## Overview

-Import Metabolic Model (importMetabolicModel) imports metabolic models from various formats (SBML, JSON, MAT, YAML) and extracts key components into comprehensive tabular summaries. This tool processes built-in or custom models, applies medium constraints, handles gene nomenclature conversion, and outputs structured data for downstream analysis.
+Import Metabolic Model extracts metabolic models from SBML/JSON/MAT/YAML formats into tabular summary for analysis.
+
+**Input**: Model file or built-in models
+**Output**: Tabular data (CSV/TSV)
+
+## Galaxy Interface

-## Usage
+In Galaxy: **COBRAxy → Import Metabolic Model**

-### Command Line
+1. Select built-in model or upload custom file
+2. Set model name and medium configuration
+3. Click **Run tool**
+
+## Command-line console

 ```bash
-importMetabolicModel --model ENGRO2 \
-    --name ENGRO2 \
-    --medium_selector allOpen \
-    --out_tabular model_data.csv \
-    --out_log extraction.log \
-    --tool_dir /path/to/COBRAxy/src
+# Import built-in model
+importMetabolicModel \
+  --model ENGRO2 \
+  --name ENGRO2 \
+  --medium_selector allOpen \
+  --out_tabular model_data.csv \
+  --out_log extraction.log
 ```

-### Galaxy Interface
-
-Select "Import Metabolic Model" from the COBRAxy tool suite and configure model extraction parameters.
-
 ## Parameters

-### Required Parameters
+### Model Selection

 | Parameter | Flag | Description |
 |-----------|------|-------------|
-| Model Name | `--name` | Model identifier for output files |
-| Medium Selector | `--medium_selector` | Medium configuration option |
-| Output Tabular | `--out_tabular` | Output file path (CSV or XLSX) |
-| Output Log | `--out_log` | Log file for processing information |
-| Tool Directory | `--tool_dir` | COBRAxy installation directory |
+| Built-in Model | `--model` | ENGRO2 or Recon |
+| Custom Model | `--input` | Path to SBML/JSON/MAT/YAML file |

-### Model Selection Parameters
+**Note**: Use either `--model` OR `--input`.
+
+
+### Required

-| Parameter | Flag | Description | Default |
-|-----------|------|-------------|---------|
-| Built-in Model | `--model` | Pre-installed model (ENGRO2, Recon, HMRcore) | - |
-| Custom Model | `--input` | Path to custom SBML/JSON model file | - |
+| Parameter | Flag | Description |
+|-----------|------|-------------|
+| Model Name | `--name` | Model identifier |
+| Medium Selector | `--medium_selector` | Medium configuration (use `allOpen`) |
+| Output Tabular | `--out_tabular` | Output file (CSV/XLSX) |
+| Output Log | `--out_log` | Log file |

-**Note**: Provide either `--model` OR `--input`, not both.
-
-### Optional Parameters
+### Optional

 | Parameter | Flag | Description | Default |
 |-----------|------|-------------|---------|
 | Custom Medium | `--custom_medium` | CSV file with medium constraints | - |
-
-## Model Selection
-
-### Built-in Models
+| Gene Format | `--gene_format` | Gene ID conversion: Default, ENSG, HGNC_ID, entrez_id | Default |

-#### ENGRO2
-- **Species**: Homo sapiens
-- **Scope**: Genome-scale reconstruction
-- **Reactions**: ~2,000 reactions
-- **Metabolites**: ~1,500 metabolites
-- **Coverage**: Comprehensive human metabolism
+## Built-in Models

-#### Recon
-- **Species**: Homo sapiens
-- **Scope**: Recon3D human reconstruction
-- **Reactions**: ~10,000+ reactions
-- **Metabolites**: ~5,000+ metabolites
-- **Coverage**: Most comprehensive human model
+- **ENGRO2**: ~500 reactions (recommended)
+- **Recon**: ~10,000 reactions (genome-wide)

-#### HMRcore
-- **Species**: Homo sapiens
-- **Scope**: Core metabolic network
-- **Reactions**: ~300 essential reactions
-- **Metabolites**: ~200 core metabolites
-- **Coverage**: Central carbon and energy metabolism
+See [Built-in Models](reference/built-in-models) for details.

-### Custom Models
+## Supported Formats

-Supported formats for custom model import:
-- **SBML**: Systems Biology Markup Language (.xml, .sbml)
-- **JSON**: COBRApy JSON format (.json)
-- **MAT**: MATLAB format (.mat)
-- **YML**: YAML format (.yml, .yaml)
-- **Compressed**: All formats support .gz, .zip, .bz2 compression
+- **Model formats**: SBML (.xml), JSON (.json), MAT (.mat), YAML (.yml)
+- **Compression**: .zip, .gz, .bz2 (e.g., `model.xml.gz`)

-## Medium Configuration
-
-### allOpen (Default)
-- All exchange reactions unconstrained
-- Maximum metabolic flexibility
-- Suitable for general analysis
-
-### Custom Medium
-Users can specify custom medium constraints by providing a CSV file with exchange reaction bounds.
+Compressed files are automatically detected and extracted.

 ## Output Format

-### Tabular Summary File
-
-The output contains comprehensive model information in CSV or XLSX format:
+**ENGRO2 model:**
+```
+ReactionID Formula GPR lower_bound upper_bound ObjectiveCoefficient Pathway_1 Pathway_2 InMedium TranslationIssues
+R00001 A + B -> C + D GENE1 or GENE2 -1000.0 1000.0 0.0 Glycolysis Central_Metabolism FALSE
+EX_glc_e glc_e <-> - -1000.0 1000.0 0.0 Exchange Transport TRUE
+```

-#### Column Structure
+**Other models (Recon):**
 ```
-Reaction_ID GPR_Rule Reaction_Formula Lower_Bound Upper_Bound Objective_Coefficient Medium_Member Compartment Subsystem
-R00001 GENE1 or GENE2 A + B -> C + D -1000.0 1000.0 0.0 FALSE cytosol Glycolysis
-R00002 GENE3 and GENE4 E <-> F -1000.0 1000.0 0.0 FALSE mitochondria TCA_Cycle
-EX_glc_e - glc_e <-> -1000.0 1000.0 0.0 TRUE extracellular Exchange
+ReactionID Formula GPR lower_bound upper_bound ObjectiveCoefficient InMedium TranslationIssues
+R00001 A + B -> C + D GENE1 or GENE2 -1000.0 1000.0 0.0 FALSE
+EX_glc_e glc_e <-> - -1000.0 1000.0 0.0 TRUE
 ```

-#### Data Fields
+**File Format Notes:**
+- Output can be **tab-separated** (CSV) or Excel (XLSX)
+- Contains all model information in tabular format
+- Can be edited and re-imported using Export Metabolic Model
+
+## Understanding Medium Composition

-| Field | Description | Values |
-|-------|-------------|---------|
-| Reaction_ID | Unique reaction identifier | String |
-| GPR_Rule | Gene-protein-reaction association | Logical expression |
-| Reaction_Formula | Stoichiometric equation | Metabolites with coefficients |
-| Lower_Bound | Minimum flux constraint | Numeric (typically -1000) |
-| Upper_Bound | Maximum flux constraint | Numeric (typically 1000) |
-| Objective_Coefficient | Biomass/objective weight | Numeric (0 or 1) |
-| Medium_Member | Exchange reaction flag | TRUE/FALSE |
-| Compartment | Subcellular location | String (for ENGRO2 only) |
-| Subsystem | Metabolic pathway | String |
+Exchange reactions with `InMedium = TRUE` represent nutrients in the medium:
+- **Lower bound**: Uptake rate (negative value, e.g., -10 = uptake 10 mmol/gDW/hr)
+- **Upper bound**: Secretion rate (positive value)
+
+Example:
+```
+EX_glc_e glc_e <-> - -10.0 1000.0 0.0 TRUE
+```
+Glucose uptake: 10 mmol/gDW/hr (lower bound = -10)
+
+More info: [COBRApy Media Documentation](https://cobrapy.readthedocs.io/en/latest/media.html)

 ## Examples

-### Extract Built-in Model Data
+### Extract Built-in Model

 ```bash
-# Extract ENGRO2 model with default settings
 importMetabolicModel --model ENGRO2 \
     --name ENGRO2_extraction \
     --medium_selector allOpen \
     --out_tabular ENGRO2_data.csv \
-    --out_log ENGRO2_log.txt \
-    --tool_dir /opt/COBRAxy/src
+    --out_log ENGRO2_log.txt
 ```

 ### Process Custom Model

 ```bash
-# Extract custom SBML model
-importMetabolicModel --input /data/custom_model.xml \
+importMetabolicModel --input custom_model.xml \
     --name CustomModel \
     --medium_selector allOpen \
-    --out_tabular custom_model_data.csv \
-    --out_log custom_extraction.log \
-    --tool_dir /opt/COBRAxy/src
-```
-
-### Extract Core Model for Quick Analysis
-
-```bash
-# Extract HMRcore for rapid prototyping
-importMetabolicModel --model HMRcore \
-    --name CoreModel \
-    --medium_selector allOpen \
-    --out_tabular core_reactions.csv \
-    --out_log core_log.txt \
-    --tool_dir /opt/COBRAxy/src
-```
-
-### Batch Processing Multiple Models
-
-```bash
-#!/bin/bash
-models=("ENGRO2" "HMRcore" "Recon")
-for model in "${models[@]}"; do
-    importMetabolicModel --model "$model" \
-        --name "${model}_extract" \
-        --medium_selector allOpen \
-        --out_tabular "${model}_data.csv" \
-        --out_log "${model}_log.txt" \
-        --tool_dir /opt/COBRAxy/src
-done
+    --out_tabular custom_data.csv \
+    --out_log custom_log.txt
 ```

-## Use Cases
-
-### Model Comparison
-Extract multiple models to compare:
-- Reaction coverage across different reconstructions
-- Gene-reaction associations
-- Pathway representation
-- Metabolite compartmentalization
-
-### Data Integration
-Prepare model data for:
-- Custom analysis pipelines
-- Database integration
-- Pathway annotation
-- Cross-reference mapping
-
-### Quality Control
-Validate model properties:
-- Check reaction balancing
-- Verify gene associations
-- Assess network connectivity
-- Identify missing annotations
-
-### Custom Analysis
-Export structured data for:
-- Network analysis (graph theory)
-- Machine learning applications
-- Statistical modeling
-- Comparative genomics
-
-## Integration Workflow
-
-### Downstream Tools
-
-The extracted tabular data serves as input for:
-
-#### COBRAxy Tools
-- [RAS Generator](ras-generator.md) - Use extracted GPR rules
-- [RPS Generator](rps-generator.md) - Use reaction formulas
-- [RAS to Bounds](ras-to-bounds.md) - Use reaction bounds
-- [MAREA](marea.md) - Use reaction annotations
-
-#### External Analysis
-- **R/Bioconductor**: Import CSV for pathway analysis
-- **Python/pandas**: Load data for network analysis
-- **MATLAB**: Process XLSX for modeling
-- **Cytoscape**: Network visualization
-- **Databases**: Populate reaction databases
-
-### Typical Pipeline
-
-```bash
-# 1. Extract model components
-importMetabolicModel --model ENGRO2 --name ModelData \
-    --out_tabular model_components.csv \
-    --tool_dir /opt/COBRAxy/src
-
-# 2. Use extracted data for RAS analysis
-ras_generator -td /opt/COBRAxy/src -rs Custom \
-    -rl model_components.csv \
-    -in expression_data.tsv -ra ras_scores.tsv
-
-# 3. Apply constraints and sample fluxes
-ras_to_bounds -td /opt/COBRAxy/src -ms Custom -mo model_components.csv \
-    -ir ras_scores.tsv -idop constrained_bounds/
-
-# 4. Visualize results
-marea -td /opt/COBRAxy/src -input_data ras_scores.tsv \
-    -choice_map Custom -custom_map custom.svg -idop results/
-```
-
-## Quality Control
-
-### Pre-extraction Validation
-- Verify model file integrity and format
-- Check SBML compliance for custom models
-- Validate gene ID formats and coverage
-- Confirm medium constraint specifications
-
-### Post-extraction Checks
-- **Completeness**: Verify all expected reactions extracted
-- **Consistency**: Check stoichiometric balance
-- **Annotations**: Validate gene-reaction associations
-- **Formatting**: Confirm output file structure
-
-### Data Validation
-
-#### Reaction Balancing
-```bash
-# Check for unbalanced reactions
-awk -F'\t' 'NR>1 && $3 !~ /\<->\|->/ {print $1, $3}' model_data.csv
-```
-
-#### Gene Coverage
-```bash
-# Count reactions with GPR rules
-awk -F'\t' 'NR>1 && $2 != "" {count++} END {print count " reactions with GPR"}' model_data.csv
-```
-
-#### Exchange Reactions
-```bash
-# List medium components
-awk -F'\t' 'NR>1 && $7 == "TRUE" {print $1}' model_data.csv
-```
-
-## Tips and Best Practices
-
-### Model Selection
-- **ENGRO2**: Balanced coverage for human tissue analysis
-- **HMRcore**: Fast processing for algorithm development
-- **Recon**: Comprehensive analysis requiring computational resources
-- **Custom**: Organism-specific or specialized models
-
-### Output Format Optimization
-- **CSV**: Lightweight, universal compatibility
-- Choose based on downstream analysis requirements
-
-### Performance Considerations
-- Large models (Recon) may require substantial memory
-- Consider batch processing for multiple extractions
-
 ## Troubleshooting

-### Common Issues
-
-**Model loading fails**
-- Check file format and compression
-- Verify SBML/JSON/MAT/YAML validity for custom models
-- Ensure sufficient system memory
-
-**Empty output file**
-- Model may contain no reactions
-- Check model file integrity
-- Verify tool directory configuration
-
-### Error Messages
-
-| Error | Cause | Solution |
-|-------|-------|----------|
-| "Model file not found" | Invalid file path | Check file location and permissions |
-| "Unsupported format" | Invalid model format | Use SBML, JSON, MAT, or YAML |
-| "Memory allocation error" | Insufficient system memory | Use smaller model or increase memory |
-
-### Performance Issues
-
-**Slow processing**
-- Large models require more time
-- Monitor system resource usage
-
-**Memory errors**
-- Reduce model size if possible
-- Process in smaller batches
-- Increase available system memory
-
-**Output file corruption**
-- Check disk space availability
-- Verify file write permissions
-- Monitor for system interruptions
-
-## Advanced Usage
-
-### Batch Extraction Script
-
-```python
-#!/usr/bin/env python3
-import subprocess
-import sys
-
-models = ['ENGRO2', 'HMRcore', 'Recon']
-
-for model in models:
-    cmd = [
-        'importMetabolicModel',
-        '--model', model,
-        '--name', f'{model}_data',
-        '--medium_selector', 'allOpen',
-        '--out_tabular', f'{model}.csv',
-        '--out_log', f'{model}.log',
-        '--tool_dir', '/opt/COBRAxy/src'
-    ]
-    subprocess.run(cmd, check=True)
-```
-
-### Database Integration
-
-Export model data to databases:
-
-```sql
--- Load CSV into PostgreSQL
-CREATE TABLE model_reactions (
-    reaction_id VARCHAR(50),
-    gpr_rule TEXT,
-    reaction_formula TEXT,
-    lower_bound FLOAT,
-    upper_bound FLOAT,
-    objective_coefficient FLOAT,
-    medium_member BOOLEAN,
-    compartment VARCHAR(50),
-    subsystem VARCHAR(100)
-);
-
-COPY model_reactions FROM 'model_data.csv' WITH CSV HEADER;
-```
+| Error | Solution |
+|-------|----------|
+| "Model file not found" | Check file path |
+| "Unsupported format" | Use SBML, JSON, MAT, or YAML |

 ## See Also

-- [Export Metabolic Model](export-metabolic-model.md) - Export tabular data to model formats
-- [RAS Generator](ras-generator.md) - Use extracted GPR rules for RAS computation
-- [RPS Generator](rps-generator.md) - Use reaction formulas for RPS analysis
-- [Custom Model Tutorial](/tutorials/custom-model-integration.md)
\ No newline at end of file
+- [Export Metabolic Model](reference/export-metabolic-model)
+- [RAS Generator](tools/ras-generator)
+- [RPS Generator](tools/rps-generator)
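
The revised "Understanding Medium Composition" text says that exchange reactions with `InMedium = TRUE` carry their uptake rate as a negative lower bound. A minimal Python sketch of reading that back out of the tabular output follows. It assumes a tab-separated file with the column names from the "Other models" example in the diff; the `SAMPLE` rows and the `medium_uptake` helper are hypothetical illustrations, not part of COBRAxy.

```python
import csv
import io

# Hypothetical sample mirroring the column layout shown in the new docs.
# Tab separation is an assumption based on the "tab-separated" file note.
SAMPLE = (
    "ReactionID\tFormula\tGPR\tlower_bound\tupper_bound\t"
    "ObjectiveCoefficient\tInMedium\tTranslationIssues\n"
    "R00001\tA + B -> C + D\tGENE1 or GENE2\t-1000.0\t1000.0\t0.0\tFALSE\t\n"
    "EX_glc_e\tglc_e <->\t-\t-10.0\t1000.0\t0.0\tTRUE\t\n"
)


def medium_uptake(handle):
    """Return {reaction_id: uptake rate} for rows flagged InMedium = TRUE."""
    uptake = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["InMedium"].strip().upper() == "TRUE":
            # Uptake is the negated lower bound (e.g. -10 -> 10 mmol/gDW/hr).
            uptake[row["ReactionID"]] = -float(row["lower_bound"])
    return uptake


rates = medium_uptake(io.StringIO(SAMPLE))
print(rates)
```

For a real run, the `io.StringIO(SAMPLE)` handle would presumably be swapped for `open("model_data.csv", newline="")`, adjusting the delimiter if the file turns out to be comma-separated.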
