diff COBRAxy/docs/tools/import-metabolic-model.md @ 547:73f2f7e2be17 draft

Uploaded
author francesco_lapi
date Tue, 28 Oct 2025 10:44:07 +0000
parents fcdbc81feb45
children
--- a/COBRAxy/docs/tools/import-metabolic-model.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/tools/import-metabolic-model.md	Tue Oct 28 10:44:07 2025 +0000
@@ -1,387 +1,142 @@
 # Import Metabolic Model
 
-Import and extract metabolic model components into tabular format for analysis and integration.
+Import and extract metabolic model components into tabular format.
 
 ## Overview
 
-Import Metabolic Model (importMetabolicModel) imports metabolic models from various formats (SBML, JSON, MAT, YAML) and extracts key components into comprehensive tabular summaries. This tool processes built-in or custom models, applies medium constraints, handles gene nomenclature conversion, and outputs structured data for downstream analysis.
+Import Metabolic Model reads a built-in model or a custom SBML/JSON/MAT/YAML file and extracts its components into a tabular summary for analysis.
+
+**Input**: Built-in model or custom model file  
+**Output**: Tabular data (tab-separated text or XLSX)
+
+## Galaxy Interface
 
-## Usage
+In Galaxy: **COBRAxy → Import Metabolic Model**
 
-### Command Line
+1. Select a built-in model or upload a custom model file
+2. Set the model name and medium configuration
+3. Click **Run tool**
+
+## Command Line
 
 ```bash
-importMetabolicModel --model ENGRO2 \
-                     --name ENGRO2 \
-                     --medium_selector allOpen \
-                     --out_tabular model_data.csv \
-                     --out_log extraction.log \
-                     --tool_dir /path/to/COBRAxy/src
+# Import built-in model
+importMetabolicModel \
+  --model ENGRO2 \
+  --name ENGRO2 \
+  --medium_selector allOpen \
+  --out_tabular model_data.csv \
+  --out_log extraction.log
 ```
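For scripted or batch runs, the same command can be assembled from Python. The sketch below is a hypothetical helper (`build_import_cmd` is not part of COBRAxy) that uses only the flags documented on this page and enforces the rule that exactly one of `--model` or `--input` is given. It returns an argument list suitable for `subprocess.run`.

```python
"""Hypothetical helper that assembles an importMetabolicModel command line.

Only flags documented on this page are used; the helper is not part of COBRAxy.
"""
import shlex
from typing import List, Optional


def build_import_cmd(
    name: str,
    out_tabular: str,
    out_log: str,
    model: Optional[str] = None,
    input_path: Optional[str] = None,
    medium_selector: str = "allOpen",
    custom_medium: Optional[str] = None,
    gene_format: Optional[str] = None,
) -> List[str]:
    # Exactly one model source is allowed: a built-in model or a custom file.
    if (model is None) == (input_path is None):
        raise ValueError("provide exactly one of model or input_path")
    cmd = ["importMetabolicModel"]
    if model is not None:
        cmd += ["--model", model]
    else:
        cmd += ["--input", input_path]
    cmd += [
        "--name", name,
        "--medium_selector", medium_selector,
        "--out_tabular", out_tabular,
        "--out_log", out_log,
    ]
    # Optional flags are appended only when a value is given.
    if custom_medium is not None:
        cmd += ["--custom_medium", custom_medium]
    if gene_format is not None:
        cmd += ["--gene_format", gene_format]
    return cmd


if __name__ == "__main__":
    cmd = build_import_cmd("ENGRO2", "model_data.csv", "extraction.log", model="ENGRO2")
    print(shlex.join(cmd))
    # To run it (requires COBRAxy on PATH): subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.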
 
-### Galaxy Interface
-
-Select "Import Metabolic Model" from the COBRAxy tool suite and configure model extraction parameters.
-
 ## Parameters
 
-### Required Parameters
+### Model Selection
 
 | Parameter | Flag | Description |
 |-----------|------|-------------|
-| Model Name | `--name` | Model identifier for output files |
-| Medium Selector | `--medium_selector` | Medium configuration option |
-| Output Tabular | `--out_tabular` | Output file path (CSV or XLSX) |
-| Output Log | `--out_log` | Log file for processing information |
-| Tool Directory | `--tool_dir` | COBRAxy installation directory |
+| Built-in Model | `--model` | ENGRO2 or Recon |
+| Custom Model | `--input` | Path to SBML/JSON/MAT/YAML file |
 
-### Model Selection Parameters
+**Note**: Provide either `--model` or `--input`, not both.
+
+
+### Required
 
-| Parameter | Flag | Description | Default |
-|-----------|------|-------------|---------|
-| Built-in Model | `--model` | Pre-installed model (ENGRO2, Recon, HMRcore) | - |
-| Custom Model | `--input` | Path to custom SBML/JSON model file | - |
+| Parameter | Flag | Description |
+|-----------|------|-------------|
+| Model Name | `--name` | Model identifier |
+| Medium Selector | `--medium_selector` | Medium configuration (use `allOpen`) |
+| Output Tabular | `--out_tabular` | Output file (CSV/XLSX) |
+| Output Log | `--out_log` | Log file |
 
-**Note**: Provide either `--model` OR `--input`, not both.
-
-### Optional Parameters
+### Optional
 
 | Parameter | Flag | Description | Default |
 |-----------|------|-------------|---------|
 | Custom Medium | `--custom_medium` | CSV file with medium constraints | - |
-
-## Model Selection
-
-### Built-in Models
+| Gene Format | `--gene_format` | Gene ID conversion: Default, ENSG, HGNC_ID, entrez_id | Default |
 
-#### ENGRO2
-- **Species**: Homo sapiens
-- **Scope**: Genome-scale reconstruction
-- **Reactions**: ~2,000 reactions
-- **Metabolites**: ~1,500 metabolites  
-- **Coverage**: Comprehensive human metabolism
+## Built-in Models
 
-#### Recon  
-- **Species**: Homo sapiens
-- **Scope**: Recon3D human reconstruction
-- **Reactions**: ~10,000+ reactions
-- **Metabolites**: ~5,000+ metabolites
-- **Coverage**: Most comprehensive human model
+- **ENGRO2**: ~500 reactions (recommended)
+- **Recon**: ~10,000 reactions (genome-scale)
 
-#### HMRcore
-- **Species**: Homo sapiens  
-- **Scope**: Core metabolic network
-- **Reactions**: ~300 essential reactions
-- **Metabolites**: ~200 core metabolites
-- **Coverage**: Central carbon and energy metabolism
+See [Built-in Models](reference/built-in-models) for details.
 
-### Custom Models
+## Supported Formats
 
-Supported formats for custom model import:
-- **SBML**: Systems Biology Markup Language (.xml, .sbml)
-- **JSON**: COBRApy JSON format (.json)
-- **MAT**: MATLAB format (.mat)  
-- **YML**: YAML format (.yml, .yaml)
-- **Compressed**: All formats support .gz, .zip, .bz2 compression
+- **Model formats**: SBML (.xml), JSON (.json), MAT (.mat), YAML (.yml)
+- **Compression**: .zip, .gz, .bz2 (e.g., `model.xml.gz`)
 
-## Medium Configuration
-
-### allOpen (Default)
-- All exchange reactions unconstrained
-- Maximum metabolic flexibility
-- Suitable for general analysis
-
-### Custom Medium
-Users can specify custom medium constraints by providing a CSV file with exchange reaction bounds.
+Compressed files are automatically detected and extracted.
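This page does not describe how the tool detects compression. Purely as an illustration, the sketch below shows one common approach: choose a decompressor from the file suffix using Python's standard `gzip`, `bz2`, and `zipfile` modules. The function names and the rule "read the first file in a ZIP archive" are assumptions, not COBRAxy's implementation.

```python
import bz2
import gzip
import zipfile
from pathlib import Path


def read_model_bytes(path: str) -> bytes:
    """Return the uncompressed bytes of a (possibly compressed) model file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".gz":
        with gzip.open(path, "rb") as fh:
            return fh.read()
    if suffix == ".bz2":
        with bz2.open(path, "rb") as fh:
            return fh.read()
    if suffix == ".zip":
        with zipfile.ZipFile(path) as zf:
            # Assumption: the archive holds a single model file; take the first one.
            members = [n for n in zf.namelist() if not n.endswith("/")]
            if not members:
                raise ValueError(f"{path} contains no files")
            return zf.read(members[0])
    # No recognised compression suffix: read the file unchanged.
    return Path(path).read_bytes()


def inner_model_suffix(path: str) -> str:
    """Format suffix once any compression suffix is removed (model.xml.gz -> .xml)."""
    p = Path(path)
    if p.suffix.lower() in {".gz", ".bz2", ".zip"}:
        p = p.with_suffix("")
    return p.suffix.lower()
```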
 
 ## Output Format
 
-### Tabular Summary File
-
-The output contains comprehensive model information in CSV or XLSX format:
+**ENGRO2 model:**
+```
+ReactionID	Formula	GPR	lower_bound	upper_bound	ObjectiveCoefficient	Pathway_1	Pathway_2	InMedium	TranslationIssues
+R00001	A + B -> C + D	GENE1 or GENE2	-1000.0	1000.0	0.0	Glycolysis	Central_Metabolism	FALSE	
+EX_glc_e	glc_e <->	-	-1000.0	1000.0	0.0	Exchange	Transport	TRUE	
+```
 
-#### Column Structure
+**Other models (Recon):**
 ```
-Reaction_ID	GPR_Rule	Reaction_Formula	Lower_Bound	Upper_Bound	Objective_Coefficient	Medium_Member	Compartment	Subsystem
-R00001	GENE1 or GENE2	A + B -> C + D	-1000.0	1000.0	0.0	FALSE	cytosol	Glycolysis
-R00002	GENE3 and GENE4	E <-> F	-1000.0	1000.0	0.0	FALSE	mitochondria	TCA_Cycle
-EX_glc_e	-	glc_e <->	-1000.0	1000.0	0.0	TRUE	extracellular	Exchange
+ReactionID	Formula	GPR	lower_bound	upper_bound	ObjectiveCoefficient	InMedium	TranslationIssues
+R00001	A + B -> C + D	GENE1 or GENE2	-1000.0	1000.0	0.0	FALSE	
+EX_glc_e	glc_e <->	-	-1000.0	1000.0	0.0	TRUE	
 ```
 
-#### Data Fields
+**File Format Notes:**
+- Output is **tab-separated** text (even when the file name ends in `.csv`) or Excel (XLSX)
+- One row per reaction, with the columns shown above
+- The table can be edited and converted back into a model with Export Metabolic Model
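Because the tabular output is plain tab-separated text, it can be loaded with Python's standard `csv` module. The sketch below assumes the column names shown in the examples above and treats an empty or `-` GPR cell as "no gene rule" (an assumption based on the sample rows). It parses a small inline sample so it runs on its own.

```python
import csv
import io
from typing import Dict, List


def load_model_table(text: str) -> List[Dict[str, str]]:
    """Parse the tab-separated table written by --out_tabular into one dict per reaction."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def has_gpr(row: Dict[str, str]) -> bool:
    # Assumption: an empty cell or "-" means the reaction has no gene rule.
    return row.get("GPR", "").strip() not in ("", "-")


SAMPLE = (
    "ReactionID\tFormula\tGPR\tlower_bound\tupper_bound\tObjectiveCoefficient\tInMedium\tTranslationIssues\n"
    "R00001\tA + B -> C + D\tGENE1 or GENE2\t-1000.0\t1000.0\t0.0\tFALSE\t\n"
    "EX_glc_e\tglc_e <->\t-\t-1000.0\t1000.0\t0.0\tTRUE\t\n"
)

rows = load_model_table(SAMPLE)
n_gpr = sum(has_gpr(r) for r in rows)
print(f"{len(rows)} reactions, {n_gpr} with a GPR rule")  # 2 reactions, 1 with a GPR rule
```

For a real output file, pass `Path("model_data.csv").read_text()` instead of the sample. XLSX output needs a spreadsheet reader such as pandas `read_excel` instead of `csv`.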
+
+## Understanding Medium Composition
 
-| Field | Description | Values |
-|-------|-------------|---------|
-| Reaction_ID | Unique reaction identifier | String |
-| GPR_Rule | Gene-protein-reaction association | Logical expression |
-| Reaction_Formula | Stoichiometric equation | Metabolites with coefficients |
-| Lower_Bound | Minimum flux constraint | Numeric (typically -1000) |
-| Upper_Bound | Maximum flux constraint | Numeric (typically 1000) |
-| Objective_Coefficient | Biomass/objective weight | Numeric (0 or 1) |
-| Medium_Member | Exchange reaction flag | TRUE/FALSE |
-| Compartment | Subcellular location | String (for ENGRO2 only) |
-| Subsystem | Metabolic pathway | String |
+Exchange reactions with `InMedium = TRUE` represent nutrients in the medium:
+- **Lower bound**: maximum uptake rate, given as a negative value (e.g. -10 allows uptake of up to 10 mmol/gDW/hr)
+- **Upper bound**: maximum secretion rate (positive value)
+
+Example:
+```
+EX_glc_e	glc_e <->	-	-10.0	1000.0	0.0	TRUE
+```
+Maximum glucose uptake: 10 mmol/gDW/hr (lower bound = -10)
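The bound convention above can be applied directly to the output table. This sketch (the function name `medium_uptake_limits` is hypothetical) keeps rows whose `InMedium` column is `TRUE` and reports each one's maximum uptake as the negated lower bound, so -10 becomes 10 mmol/gDW/hr. Column names are taken from the examples on this page.

```python
import csv
import io
from typing import Dict


def medium_uptake_limits(text: str) -> Dict[str, float]:
    """Map each medium exchange reaction to its maximum uptake (mmol/gDW/hr)."""
    limits: Dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        if row["InMedium"].strip().upper() != "TRUE":
            continue
        lb = float(row["lower_bound"])
        # A negative lower bound allows uptake; its magnitude is the uptake limit.
        limits[row["ReactionID"]] = max(0.0, -lb)
    return limits


SAMPLE = (
    "ReactionID\tFormula\tGPR\tlower_bound\tupper_bound\tObjectiveCoefficient\tInMedium\tTranslationIssues\n"
    "EX_glc_e\tglc_e <->\t-\t-10.0\t1000.0\t0.0\tTRUE\t\n"
    "R00001\tA + B -> C + D\tGENE1 or GENE2\t-1000.0\t1000.0\t0.0\tFALSE\t\n"
)

print(medium_uptake_limits(SAMPLE))  # {'EX_glc_e': 10.0}
```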
+
+More info: [COBRApy Media Documentation](https://cobrapy.readthedocs.io/en/latest/media.html)
 
 ## Examples
 
-### Extract Built-in Model Data
+### Extract Built-in Model
 
 ```bash
-# Extract ENGRO2 model with default settings
 importMetabolicModel --model ENGRO2 \
                      --name ENGRO2_extraction \
                      --medium_selector allOpen \
                      --out_tabular ENGRO2_data.csv \
-                     --out_log ENGRO2_log.txt \
-                     --tool_dir /opt/COBRAxy/src
+                     --out_log ENGRO2_log.txt
 ```
 
 ### Process Custom Model
 
 ```bash
-# Extract custom SBML model
-importMetabolicModel --input /data/custom_model.xml \
+importMetabolicModel --input custom_model.xml \
                      --name CustomModel \
                      --medium_selector allOpen \
-                     --out_tabular custom_model_data.csv \
-                     --out_log custom_extraction.log \
-                     --tool_dir /opt/COBRAxy/src
-```
-
-### Extract Core Model for Quick Analysis
-
-```bash  
-# Extract HMRcore for rapid prototyping
-importMetabolicModel --model HMRcore \
-                     --name CoreModel \
-                     --medium_selector allOpen \
-                     --out_tabular core_reactions.csv \
-                     --out_log core_log.txt \
-                     --tool_dir /opt/COBRAxy/src
-```
-
-### Batch Processing Multiple Models
-
-```bash
-#!/bin/bash
-models=("ENGRO2" "HMRcore" "Recon")
-for model in "${models[@]}"; do
-    importMetabolicModel --model "$model" \
-                         --name "${model}_extract" \
-                         --medium_selector allOpen \
-                         --out_tabular "${model}_data.csv" \
-                         --out_log "${model}_log.txt" \
-                         --tool_dir /opt/COBRAxy/src
-done
+                     --out_tabular custom_data.csv \
+                     --out_log custom_log.txt
 ```
 
-## Use Cases
-
-### Model Comparison
-Extract multiple models to compare:
-- Reaction coverage across different reconstructions  
-- Gene-reaction associations
-- Pathway representation
-- Metabolite compartmentalization
-
-### Data Integration
-Prepare model data for:
-- Custom analysis pipelines
-- Database integration
-- Pathway annotation
-- Cross-reference mapping
-
-### Quality Control
-Validate model properties:
-- Check reaction balancing
-- Verify gene associations
-- Assess network connectivity
-- Identify missing annotations
-
-### Custom Analysis
-Export structured data for:
-- Network analysis (graph theory)
-- Machine learning applications
-- Statistical modeling
-- Comparative genomics
-
-## Integration Workflow
-
-### Downstream Tools
-
-The extracted tabular data serves as input for:
-
-#### COBRAxy Tools
-- [RAS Generator](ras-generator.md) - Use extracted GPR rules
-- [RPS Generator](rps-generator.md) - Use reaction formulas
-- [RAS to Bounds](ras-to-bounds.md) - Use reaction bounds
-- [MAREA](marea.md) - Use reaction annotations
-
-#### External Analysis
-- **R/Bioconductor**: Import CSV for pathway analysis
-- **Python/pandas**: Load data for network analysis  
-- **MATLAB**: Process XLSX for modeling
-- **Cytoscape**: Network visualization
-- **Databases**: Populate reaction databases
-
-### Typical Pipeline
-
-```bash
-# 1. Extract model components
-importMetabolicModel --model ENGRO2 --name ModelData \
-                     --out_tabular model_components.csv \
-                     --tool_dir /opt/COBRAxy/src
-
-# 2. Use extracted data for RAS analysis
-ras_generator -td /opt/COBRAxy/src -rs Custom \
-              -rl model_components.csv \
-              -in expression_data.tsv -ra ras_scores.tsv
-
-# 3. Apply constraints and sample fluxes
-ras_to_bounds -td /opt/COBRAxy/src -ms Custom -mo model_components.csv \
-              -ir ras_scores.tsv -idop constrained_bounds/
-
-# 4. Visualize results
-marea -td /opt/COBRAxy/src -input_data ras_scores.tsv \
-      -choice_map Custom -custom_map custom.svg -idop results/
-```
-
-## Quality Control
-
-### Pre-extraction Validation
-- Verify model file integrity and format
-- Check SBML compliance for custom models
-- Validate gene ID formats and coverage
-- Confirm medium constraint specifications
-
-### Post-extraction Checks
-- **Completeness**: Verify all expected reactions extracted
-- **Consistency**: Check stoichiometric balance
-- **Annotations**: Validate gene-reaction associations
-- **Formatting**: Confirm output file structure
-
-### Data Validation
-
-#### Reaction Balancing
-```bash
-# Check for unbalanced reactions
-awk -F'\t' 'NR>1 && $3 !~ /\<->\|->/ {print $1, $3}' model_data.csv
-```
-
-#### Gene Coverage
-```bash
-# Count reactions with GPR rules  
-awk -F'\t' 'NR>1 && $2 != "" {count++} END {print count " reactions with GPR"}' model_data.csv
-```
-
-#### Exchange Reactions
-```bash
-# List medium components
-awk -F'\t' 'NR>1 && $7 == "TRUE" {print $1}' model_data.csv
-```
-
-## Tips and Best Practices
-
-### Model Selection
-- **ENGRO2**: Balanced coverage for human tissue analysis
-- **HMRcore**: Fast processing for algorithm development  
-- **Recon**: Comprehensive analysis requiring computational resources
-- **Custom**: Organism-specific or specialized models
-
-### Output Format Optimization
-- **CSV**: Lightweight, universal compatibility
-- Choose based on downstream analysis requirements
-
-### Performance Considerations
-- Large models (Recon) may require substantial memory
-- Consider batch processing for multiple extractions
-
 ## Troubleshooting
 
-### Common Issues
-
-**Model loading fails**
-- Check file format and compression
-- Verify SBML/JSON/MAT/YAML validity for custom models
-- Ensure sufficient system memory
-
-**Empty output file**
-- Model may contain no reactions
-- Check model file integrity
-- Verify tool directory configuration
-
-### Error Messages
-
-| Error | Cause | Solution |
-|-------|-------|----------|
-| "Model file not found" | Invalid file path | Check file location and permissions |
-| "Unsupported format" | Invalid model format | Use SBML, JSON, MAT, or YAML |
-| "Memory allocation error" | Insufficient system memory | Use smaller model or increase memory |
-
-### Performance Issues
-
-**Slow processing**
-- Large models require more time
-- Monitor system resource usage
-
-**Memory errors**
-- Reduce model size if possible
-- Process in smaller batches
-- Increase available system memory
-
-**Output file corruption**  
-- Check disk space availability
-- Verify file write permissions
-- Monitor for system interruptions
-
-## Advanced Usage
-
-### Batch Extraction Script
-
-```python
-#!/usr/bin/env python3
-import subprocess
-import sys
-
-models = ['ENGRO2', 'HMRcore', 'Recon']
-
-for model in models:
-    cmd = [
-        'importMetabolicModel',
-        '--model', model,
-        '--name', f'{model}_data',
-        '--medium_selector', 'allOpen',
-        '--out_tabular', f'{model}.csv',
-        '--out_log', f'{model}.log',
-        '--tool_dir', '/opt/COBRAxy/src'
-    ]
-    subprocess.run(cmd, check=True)
-```
-
-### Database Integration
-
-Export model data to databases:
-
-```sql
--- Load CSV into PostgreSQL
-CREATE TABLE model_reactions (
-    reaction_id VARCHAR(50),
-    gpr_rule TEXT,
-    reaction_formula TEXT,
-    lower_bound FLOAT,
-    upper_bound FLOAT,
-    objective_coefficient FLOAT,
-    medium_member BOOLEAN,
-    compartment VARCHAR(50),
-    subsystem VARCHAR(100)
-);
-
-COPY model_reactions FROM 'model_data.csv' WITH CSV HEADER;
-```
+| Error | Solution |
+|-------|----------|
+| "Model file not found" | Check file path |
+| "Unsupported format" | Use SBML, JSON, MAT, or YAML |
 
 ## See Also
 
-- [Export Metabolic Model](export-metabolic-model.md) - Export tabular data to model formats
-- [RAS Generator](ras-generator.md) - Use extracted GPR rules for RAS computation
-- [RPS Generator](rps-generator.md) - Use reaction formulas for RPS analysis
-- [Custom Model Tutorial](/tutorials/custom-model-integration.md)
\ No newline at end of file
+- [Export Metabolic Model](tools/export-metabolic-model)
+- [RAS Generator](tools/ras-generator)
+- [RPS Generator](tools/rps-generator)