diff COBRAxy/README.md @ 492:4ed95023af20 draft

Uploaded
author francesco_lapi
date Tue, 30 Sep 2025 14:02:17 +0000
parents a6e45049c1b9
children fd53d42348bd
--- a/COBRAxy/README.md	Mon Sep 29 15:34:59 2025 +0000
+++ b/COBRAxy/README.md	Tue Sep 30 14:02:17 2025 +0000
@@ -1,289 +1,313 @@
-<p align="center">
-	<img src="https://opencobra.github.io/cobrapy/_static/img/cobrapy_logo.png" alt="COBRApy logo" width="120"/>
-</p>
-
-# COBRAxy — Metabolic analysis and visualization toolkit (Galaxy-ready)
-
-COBRAxy (COBRApy in Galaxy) is a toolkit to compute, analyze, and visualize metabolism at the reaction level from transcriptomics and metabolomics data. It enables users to:
+<div align="center">
+  <img src="docs/_media/logo.png" alt="COBRAxy Logo" width="200"/>
+</div>
 
-- derive Reaction Activity Scores (RAS) from gene expression and Reaction Propensity Scores (RPS) from metabolite abundances,
-- integrate RAS into model bounds,
-- perform flux sampling with either CBS (constraint-based sampling) or OPTGP,
-- compute statistics (pFBA, FVA, sensitivity) and generate styled SVG/PDF metabolic maps,
-- run all tools as Galaxy wrappers or via CLI on any machine.
-
-It extends the MaREA 2 (Metabolic Reaction Enrichment Analysis) concept by adding sampling-based flux comparison and rich visualization. The repository ships both Python CLIs and Galaxy tool XMLs.
-
-## Table of contents
+# COBRAxy
 
-- Overview and features
-- Requirements
-- Installation (pip/conda)
-- Quick start (CLI)
-- Tools and usage
-	- custom_data_generator
-	- ras_generator (RAS)
-	- rps_generator (RPS)
-	- ras_to_bounds
-	- flux_simulation (CBS/OPTGP)
-	- marea (enrichment + maps)
-	- flux_to_map (maps from fluxes)
-	- marea_cluster (clustering auxiliaries)
-- Typical workflow
-- Input/output formats
-- Galaxy usage
-- Troubleshooting
-- Contributing
-- License and citations
-- Useful links
+A Python toolkit for metabolic flux analysis and visualization, with Galaxy integration.
+
+COBRAxy computes reaction-level scores from gene expression and metabolite data, samples fluxes from the resulting constrained models, and renders the results on styled pathway maps.
+Documentation: https://compbtbs.github.io/COBRAxy
+## Features
 
-## Overview and features
-
-COBRAxy builds on COBRApy to deliver end‑to‑end analysis from expression/metabolite data to flux statistics and map rendering:
-
-- RAS and RPS computation from tabular inputs
-- Bounds integration and model preparation
-- Flux sampling: CBS (GLPK backend) with automatic fallback to a COBRApy interface, or OPTGP
-- Flux statistics: mean/median/quantiles, pFBA, FVA, sensitivity
-- Map styling/export: SVG with optional PDF/PNG export
-- Ready-made Galaxy wrappers for all tools
-
-Bundled resources in `local/` include example models (ENGRO2, Recon), gene mappings, a default medium, and SVG maps.
-
-## Requirements
+- **Reaction Activity Scores (RAS)** from gene expression data
+- **Reaction Propensity Scores (RPS)** from metabolite abundance
+- **Flux sampling** with CBS or OptGP algorithms
+- **Statistical analysis** with pFBA, FVA, and sensitivity analysis
+- **Interactive maps** with SVG/PDF export and custom styling
+- **Galaxy tools** for web-based analysis
+- **Built-in models** including ENGRO2 and Recon
 
-- OS: Linux, macOS, or Windows (Linux recommended; Galaxy typically runs on Linux)
-- Python: 3.8.20 ≤ version < 3.12 (as per `setup.py`)
-- Python packages (installed automatically by `pip install .`):
-	- cobra==0.29.0, numpy==1.24.4, pandas==2.0.3, scipy==1.11, scikit-learn==1.3.2, seaborn==0.13.0
-	- matplotlib==3.7.3, lxml==5.2.2, cairosvg==2.7.1, svglib==1.5.1, pyvips==2.2.3, Pillow
-	- joblib==1.4.2, anndata==0.8.0, pydeseq2==0.5.1
-- Optional but recommended for CBS sampling performance:
-	- GLPK solver and Python bindings
-		- System library: glpk (e.g., Ubuntu: `apt-get install glpk-utils libglpk40`)
-		- Python: `swiglpk` (note: CBS falls back to a COBRApy interface if GLPK is unavailable)
-- For pyvips: system libvips (e.g., Ubuntu: `apt-get install libvips`)
+## Quick Start
 
-Notes:
-- If you hit system-level library errors for SVG/PDF/PNG conversion or vips, install the corresponding OS packages.
-- GPU is not required.
-
-## Installation
-
-Python virtual environment is strongly recommended.
-
-### Install from source (pip)
-
-1) Clone the repo and install:
+### Installation
 
 ```bash
 git clone https://github.com/CompBtBs/COBRAxy.git
 cd COBRAxy
-python3 -m venv .venv && source .venv/bin/activate
-pip install --upgrade pip
 pip install .
 ```
 
-This installs console entry points: `custom_data_generator`, `ras_generator`, `rps_generator`, `ras_to_bounds`, `flux_simulation`, `flux_to_map`, `marea`, `marea_cluster`.
-
-### Install with conda (alternative)
-
-```bash
-conda create -n cobraxy python=3.10 -y
-conda activate cobraxy
-pip install .
-# Optional system deps (Ubuntu): sudo apt-get install libvips libxml2 libxslt1.1 glpk-utils
-# Optional Python bindings for GLPK: pip install swiglpk
-```
-
-## Quick start (CLI)
-
-All tools provide `-h/--help` for details. Outputs are TSV/CSV and SVG/PDF files depending on the tool and flags.
-
-Example minimal flow (using built-in ENGRO2 model and provided assets):
+### Basic Workflow
 
 ```bash
-# 1) Generate rules/reactions/bounds/medium from a model (optional if using bundled ones)
-custom_data_generator \
-	-id local/models/ENGRO2.xml \
-	-mn ENGRO2.xml \
-	-orules out/ENGRO2_rules.tsv \
-	-orxns out/ENGRO2_reactions.tsv \
-	-omedium out/ENGRO2_medium.tsv \
-	-obnds out/ENGRO2_bounds.tsv
+# 1. Generate RAS from expression data
+ras_generator -td "$(pwd)" -in expression.tsv -ra ras_output.tsv -rs ENGRO2
+
+# 2. Generate RPS from metabolite data (optional)
+rps_generator -td "$(pwd)" -id metabolites.tsv -rp rps_output.tsv
+
+# 3. Create enriched pathway maps with statistical analysis
+marea -td "$(pwd)" -using_RAS true -input_data ras_output.tsv -choice_map ENGRO2 -gs true -idop base_maps
+
+# 4. Apply RAS constraints to model for flux simulation
+ras_to_bounds -td "$(pwd)" -ms ENGRO2 -ir ras_output.tsv -rs true -idop bounds_output
+
+# 5. Sample metabolic fluxes with constrained model
+flux_simulation -td "$(pwd)" -ms ENGRO2 -in bounds_output/*.tsv -a CBS -ns 1000 -idop flux_results
+
+# 6. Add flux data to enriched maps
+flux_to_map -td "$(pwd)" -if flux_results/*.tsv -mp base_maps/*.svg -idop final_maps
+```
+
+## Tools
 
-# 2) Compute RAS from expression data
-ras_generator \
-	-td $(pwd) \
-	-in my_expression.tsv \
-	-ra out/ras.tsv \
-	-rs ENGRO2
+| Tool | Purpose | Input | Output |
+|------|---------|--------|---------|
+| `metabolic_model_setting` | Extract model components | SBML model | Rules, reactions, bounds, medium |
+| `ras_generator` | Compute reaction activity scores | Gene expression data | RAS values |
+| `rps_generator` | Compute reaction propensity scores | Metabolite abundance | RPS values |
+| `marea` | Statistical pathway analysis | RAS and/or RPS data | Enrichment + base maps |
+| `ras_to_bounds` | Apply RAS constraints to model | RAS + SBML model | Constrained bounds |
+| `flux_simulation` | Sample metabolic fluxes | Constrained model | Flux distributions |
+| `flux_to_map` | Add fluxes to enriched maps | Flux samples + base maps | Final styled maps |
+| `marea_cluster` | Cluster analysis | Expression/flux data | Sample clusters |
+
+## Requirements
+
+- **Python**: 3.8-3.11
+- **OS**: Linux, macOS, Windows (Linux recommended)
+- **Dependencies**: Automatically installed via pip (COBRApy, pandas, numpy, etc.)
+
+**Optional system libraries** (libvips for map image conversion via pyvips, GLPK for faster CBS sampling):
+```bash
+# Ubuntu/Debian
+sudo apt-get install libvips libglpk40 glpk-utils
+
+# For Python GLPK bindings
+pip install swiglpk
+```
+
+## Data Flow
 
-# 3) Integrate RAS into bounds
-ras_to_bounds \
-	-td $(pwd) \
-	-ms ENGRO2 \
-	-ir out/ras.tsv \
-	-rs true \
-	-idop out/ras_bounds
+```
+Gene Expression    Metabolite Data    SBML Model
+      ↓                   ↓               ↓
+  RAS Generator      RPS Generator   Model Tables
+      ↓                   ↓               
+    RAS Values       RPS Values           
+    | ↓                   ↓               
+    | └─────────┬─────────┘               
+    |           ↓                         
+    |        MAREA                        
+    |    (Enrichment +                    
+    |     Base Maps)                      
+    ↓                
+    RAS Values  →  RAS to Bounds  ←── Model Tables
+                        ↓
+                  Constrained Model
+                        ↓
+                  Flux Simulation
+                        ↓
+                   Flux Samples
+                        ↓
+                   Flux to Map  ←── Maps (ENGRO2)
+                        ↓
+               Final Enriched Maps
+```
+
+## Built-in Models & Data
 
-# 4) Flux sampling (CBS)
-flux_simulation \
-	-td $(pwd) \
-	-ms ENGRO2 \
-	-in out/ras_bounds/sample1.tsv,out/ras_bounds/sample2.tsv \
-	-ni sample1,sample2 \
-	-a CBS -ns 500 -sd 0 -nb 1 \
-	-ot mean,median,quantiles \
-	-ota pFBA,FVA,sensitivity \
-	-idop out/flux
+COBRAxy includes ready-to-use resources:
+
+- **Models**: ENGRO2, Recon (human metabolism)
+- **Gene mappings**: HGNC, Ensembl, Entrez ID conversions
+- **Pathway maps**: Pre-styled SVG templates
+- **Medium compositions**: Standard growth conditions
+
+These resources are located in the `local/` directory.
+
+## Command Line Usage
+
+All tools support `--help` for detailed options. Key commands:
 
-# 5) Enrichment + map styling (RAS/RPS or fluxes)
-marea \
-	-td $(pwd) \
-	-using_RAS true -input_data out/ras.tsv \
-	-comparison manyvsmany -test ks \
-	-generate_svg true -generate_pdf true \
-	-choice_map ENGRO2 -idop out/maps
+### Generate RAS/RPS scores
+```bash
+# From gene expression
+ras_generator -td "$(pwd)" -in expression.tsv -ra ras_output.tsv -rs ENGRO2
+
+# From metabolite data
+rps_generator -td "$(pwd)" -id metabolites.tsv -rp rps_output.tsv
+```
+
+### Flux sampling
+```bash
+flux_simulation -td "$(pwd)" -ms ENGRO2 -in bounds/*.tsv -a CBS -ns 1000 -idop results/
+```
+
+### Statistical analysis & visualization
+```bash
+marea -td "$(pwd)" -using_RAS true -input_data ras.tsv -choice_map ENGRO2 -gs true -idop maps/
 ```
 
-## Tools and usage
+## Galaxy Integration
 
-Below is a high‑level summary of each CLI. Use `--help` for the full list of options.
-
-### 1) custom_data_generator
+COBRAxy provides Galaxy tool wrappers (`.xml` files) for web-based analysis:
 
-Generate model‑derived assets.
+- Upload data through Galaxy interface
+- Chain tools in visual workflows  
+- Share and reproduce analyses
+- Access via Galaxy ToolShed
 
-Required inputs:
-- `-id/--input`: model file (XML or JSON; gz/zip/bz2 also supported via extension)
-- `-mn/--name`: the original file name including extension (Galaxy renames files; this preserves the true format)
-- `-orules`, `-orxns`, `-omedium`, `-obnds`: output paths
+## Tutorials
 
-Outputs:
-- TSV with rules, reactions, exchange medium, and bounds.
+### Local Galaxy Installation
 
-### 2) ras_generator (Reaction Activity Scores)
-
-Compute RAS from a gene expression table.
+To set up a local Galaxy instance with COBRAxy tools:
 
-Key inputs:
-- `-td/--tool_dir`: repository root path (used to locate `local/` assets)
-- `-in/--input`: expression TSV (rows: genes; columns: samples)
-- `-rs/--rules_selector`: model/rules choice, e.g. `ENGRO2` or `Custom` with `-rl` and `-rn`
-- Optional: `-rl/--rule_list` custom rules TSV, `-rn/--rules_name` its original name/extension
-- Output: `-ra/--ras_output` TSV
+1. **Install Galaxy**:
+   ```bash
+   # Clone Galaxy repository
+   git clone -b release_23.1 https://github.com/galaxyproject/galaxy.git
+   cd galaxy
+   
+   # Install dependencies and start Galaxy
+   sh run.sh
+   ```
 
-### 3) rps_generator (Reaction Propensity Scores)
-
-Compute RPS from a metabolite abundance table.
-
-Key inputs:
-- `-td/--tool_dir`: repository root
-- `-id/--input`: metabolite TSV (rows: metabolites; columns: samples)
-- `-rc/--reaction_choice`: `default` or `custom` with `-cm/--custom` reactions TSV
-- Output: `-rp/--rps_output` TSV
-
-### 4) ras_to_bounds
-
-Integrate RAS into reaction bounds for a given model and medium.
+2. **Install COBRAxy tools**:
+   ```bash
+   # Add COBRAxy tools to Galaxy
+   mkdir -p tools/cobraxy
+   cp path/to/COBRAxy/Galaxy_tools/*.xml tools/cobraxy/
+   
+   # Update tool_conf.xml to include COBRAxy tools
+   # Add section in config/tool_conf.xml:
+   # <section id="cobraxy" name="COBRAxy">
+   #   <tool file="cobraxy/ras_generator.xml" />
+   #   <tool file="cobraxy/rps_generator.xml" />
+   #   <tool file="cobraxy/marea.xml" />
+   #   <!-- Add other tools -->
+   # </section>
+   ```
 
-Key inputs:
-- `-td/--tool_dir`: repository root
-- `-ms/--model_selector`: one of `ENGRO2` or `Custom` with `-mo/--model` and `-mn/--model_name`
-- Medium: `-mes/--medium_selector` (default `allOpen`) or `-meo/--medium` custom TSV
-- RAS: `-ir/--input_ras` and `-rs/--ras_selector` (true/false)
-- Output folder: `-idop/--output_path`
+3. **Galaxy Tutorial Resources**:
+   - [Galaxy Installation Guide](https://docs.galaxyproject.org/en/master/admin/)
+   - [Tool Development Tutorial](https://training.galaxyproject.org/training-material/topics/dev/)
+   - [Galaxy Admin Training](https://training.galaxyproject.org/training-material/topics/admin/)
+
+### Python Direct Usage
 
-Outputs:
-- One bounds TSV per sample in the RAS table.
+For programmatic use of COBRAxy tools in Python scripts:
 
-### 5) flux_simulation
-
-Flux sampling with CBS or OPTGP and downstream statistics.
+1. **Installation for Development**:
+   ```bash
+   # Clone and install in development mode
+   git clone https://github.com/CompBtBs/COBRAxy.git
+   cd COBRAxy
+   pip install -e .
+   ```
 
-Key inputs:
-- `-td/--tool_dir`
-- Model: `-ms/--model_selector` (ENGRO2 or Custom with `-mo`/`-mn`)
-- Bounds files: `-in` (comma‑separated list) and `-ni/--names` (comma‑separated sample names)
-- Algorithm: `-a CBS|OPTGP`; CBS uses GLPK if available and falls back to a COBRApy interface
-- Sampling params: `-ns/--n_samples`, `-th/--thinning` (OPTGP), `-nb/--n_batches`, `-sd/--seed`
-- Outputs: `-ot/--output_type` (mean,median,quantiles) and `-ota/--output_type_analysis` (pFBA,FVA,sensitivity)
-- Output path: `-idop/--output_path`
-
-Outputs:
-- Per‑sample or aggregated CSV/TSV with flux samples and statistics.
-
-### 6) marea
-
-Statistical enrichment and map styling for RAS and/or RPS groups with optional DESeq2‑style testing via `pydeseq2`.
-
-Key inputs:
-- `-td/--tool_dir`
-- Comparison: `-co manyvsmany|onevsrest|onevsmany`
-- Test: `-te ks|ttest_p|ttest_ind|wilcoxon|mw|DESeq`
-- Thresholds: `-pv`, `-adj` (FDR), `-fc`
-- Data: RAS `-using_RAS` plus `-input_data` or multiple datasets with names; similarly for RPS with `-using_RPS`
-- Map: `-choice_map HMRcore|ENGRO2|Custom` or `-custom_map` SVG
-- Output: `-gs/--generate_svg`, `-gp/--generate_pdf`, output dir `-idop`
-
-Outputs:
-- Styled SVG (and optional PDF/PNG) highlighting enriched reactions by color/width per your thresholds.
-
-### 7) flux_to_map
+2. **Python API Usage**:
+   ```python
+   import sys
+   import os
+   
+   # Add COBRAxy to Python path
+   sys.path.append('/path/to/COBRAxy')
+   
+   # Import tool modules
+   import ras_generator
+   import rps_generator
+   import flux_simulation
+   import marea
+   import ras_to_bounds
+   
+   # Set working directory
+   tool_dir = "/path/to/COBRAxy"
+   os.chdir(tool_dir)
+   
+   # Generate RAS scores
+   ras_args = [
+       '-td', tool_dir,
+       '-in', 'data/expression.tsv',
+       '-ra', 'output/ras_values.tsv',
+       '-rs', 'ENGRO2'
+   ]
+   ras_generator.main(ras_args)
+   
+   # Generate RPS scores (optional)
+   rps_args = [
+       '-td', tool_dir,
+       '-id', 'data/metabolites.tsv',
+       '-rp', 'output/rps_values.tsv'
+   ]
+   rps_generator.main(rps_args)
+   
+   # Create enriched pathway maps
+   marea_args = [
+       '-td', tool_dir,
+       '-using_RAS', 'true',
+       '-input_data', 'output/ras_values.tsv',
+       '-choice_map', 'ENGRO2',
+       '-gs', 'true',
+       '-idop', 'maps'
+   ]
+   marea.main(marea_args)
+   
+   # Apply RAS constraints to model
+   bounds_args = [
+       '-td', tool_dir,
+       '-ms', 'ENGRO2',
+       '-ir', 'output/ras_values.tsv',
+       '-rs', 'true',
+       '-idop', 'bounds'
+   ]
+   ras_to_bounds.main(bounds_args)
+   
+   # Sample metabolic fluxes
+   flux_args = [
+       '-td', tool_dir,
+       '-ms', 'ENGRO2',
+       '-in', 'bounds/bounds_output.tsv',
+       '-a', 'CBS',
+       '-ns', '1000',
+       '-idop', 'flux_results'
+   ]
+   flux_simulation.main(flux_args)
+   ```
 
-Like `marea`, but driven by fluxes instead of RAS/RPS. Accepts single or multiple flux datasets and produces styled maps.
-
-### 8) marea_cluster
-
-Convenience clustering utilities (k‑means, DBSCAN, hierarchical) for grouping samples; produces labels and optional plots.
+3. **Python Tutorial Resources**:
+   - [COBRApy Documentation](https://cobrapy.readthedocs.io/)
+   - [Metabolic Modeling with Python](https://opencobra.github.io/cobrapy/building_model.html)
+   - [Flux Sampling Tutorial](https://cobrapy.readthedocs.io/en/stable/sampling.html)
+   - [Jupyter Notebook Examples](examples/) (included in repository)
 
-## Typical workflow
-
-1. Prepare a model and generate its assets (optional if using bundled assets): `custom_data_generator`
-2. Compute RAS from expression: `ras_generator` (and/or compute RPS via `rps_generator`)
-3. Integrate RAS into bounds: `ras_to_bounds`
-4. Sample fluxes: `flux_simulation` with CBS or OPTGP
-5. Analyze and visualize: `marea` or `flux_to_map` to render SVG/PDF metabolic maps
-6. Optionally cluster or further analyze results: `marea_cluster`
+## Input/Output Formats
 
-## Input/output formats
-
-Unless otherwise stated, inputs are tab‑separated (TSV) text files with headers.
-
-- Expression (RAS): rows = genes (HGNC/Ensembl/symbol/Entrez supported), columns = samples
-- Metabolite table (RPS): rows = metabolites, columns = samples
-- Rules/Reactions: TSV with two columns: ReactionID, Rule/Reaction
-- Bounds: TSV with index = reaction IDs, columns = lower_bound, upper_bound
-- Medium: single‑column TSV listing exchange reactions
-- Flux samples/statistics: CSV/TSV with reactions as rows and samples/statistics as columns
-
-## Galaxy usage
-
-Each CLI has a corresponding Galaxy tool XML in the repository (e.g., `marea.xml`, `flux_simulation.xml`). Use `shed.yml` to publish to a Galaxy toolshed. The `local/` directory provides models, mappings, and maps for out‑of‑the‑box runs inside Galaxy.
+| Data Type | Format | Description |
+|-----------|---------|-------------|
+| Gene expression | TSV | Genes (rows) × Samples (columns) |
+| Metabolites | TSV | Metabolites (rows) × Samples (columns) |
+| Models | SBML | Standard metabolic model format |
+| Results | TSV/CSV | Tabular flux/score data |
+| Maps | SVG/PDF | Styled pathway visualizations |
 
 ## Troubleshooting
 
-- GLPK/CBS issues: if `swiglpk` or GLPK is missing, `flux_simulation` will attempt a COBRApy fallback. Install GLPK + `swiglpk` for best performance.
-- pyvips errors: install `libvips` on your system. Reinstall the `pyvips` wheel afterward if needed.
-- PDF/SVG conversions: ensure `cairosvg`, `svglib`, and system libraries (`libxml2`, `libxslt`) are installed.
-- Python version: stick to Python ≥3.8.20 and <3.12.
-- Memory/time: reduce `-ns` (samples) or `-nb` (batches); consider OPTGP if CBS is slow for your model.
+**Common issues:**
+
+- **Missing GLPK**: Install `glpk-utils` and `swiglpk` for optimal CBS performance
+- **SVG/PDF export errors**: Install the `libvips` system library (required by `pyvips`)
+- **Memory issues**: Reduce sampling count (`-ns`) or use fewer batches (`-nb`)
 
 ## Contributing
 
-Pull requests are welcome. Please:
-- keep changes focused and documented,
-- add concise docstrings/comments in English,
-- preserve public CLI parameters and file formats.
+Contributions are welcome. Please:
+- Follow existing code style
+- Add documentation for new features
+- Test with provided example data
+- Submit focused pull requests
 
-## License and citations
+## Citation
 
-This project is distributed under the MIT License. If you use COBRAxy in academic work, please cite COBRApy and MaREA, and reference this repository.
-
-## Useful links
+If you use COBRAxy in research, please cite:
+- [COBRApy](https://opencobra.github.io/cobrapy/) for core metabolic modeling
+- [MaREA](https://galaxyproject.org/use/marea4galaxy/) for enrichment methods
+- This repository for integrated workflow
 
-- COBRAxy Google Summer of Code 2024: https://summerofcode.withgoogle.com/programs/2024/projects/LSrCKfq7
-- COBRApy: https://opencobra.github.io/cobrapy/
-- MaREA4Galaxy: https://galaxyproject.org/use/marea4galaxy/
-- Galaxy project: https://usegalaxy.org/
+## Links
+
+- [COBRApy Documentation](https://opencobra.github.io/cobrapy/)
+- [Galaxy Project](https://usegalaxy.org/)
+- [GSoC 2024 Project](https://summerofcode.withgoogle.com/programs/2024/projects/LSrCKfq7)
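
Note on the `flux_simulation` examples in this revision: the previous README described `-in` as a comma-separated list of bounds files paired with `-ni` sample names, and stated that `ras_to_bounds` writes one bounds TSV per sample. The new examples pass a shell glob or a single file instead. If the comma-separated form still applies (an assumption; this diff does not show the CLI itself), the arguments could be assembled roughly as follows. The helper name `bounds_arguments` is hypothetical and not part of COBRAxy.

```python
from pathlib import Path
from typing import List


def bounds_arguments(bounds_dir: str) -> List[str]:
    """Build '-in'/'-ni' arguments from every TSV file in bounds_dir.

    Hypothetical helper, not part of COBRAxy. It assumes flux_simulation
    takes comma-separated bounds files ('-in') and sample names ('-ni'),
    as the previous README described.
    """
    # Sort so that file order and name order are deterministic and aligned.
    files = sorted(Path(bounds_dir).glob("*.tsv"))
    if not files:
        raise FileNotFoundError(f"no .tsv bounds files found in {bounds_dir!r}")
    paths = ",".join(str(f) for f in files)
    names = ",".join(f.stem for f in files)  # file name without '.tsv'
    return ["-in", paths, "-ni", names]
```

Under the same assumption, the returned list could replace the single `'-in', 'bounds/bounds_output.tsv'` pair in the `flux_args` list of the Python example, for instance via `*bounds_arguments('bounds')`.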