diff COBRAxy/README.md @ 456:a6e45049c1b9 draft

Uploaded
author francesco_lapi
date Fri, 12 Sep 2025 17:28:45 +0000
parents 7e703e546998
children 4ed95023af20
--- a/COBRAxy/README.md	Fri Sep 12 15:05:54 2025 +0000
+++ b/COBRAxy/README.md	Fri Sep 12 17:28:45 2025 +0000
@@ -1,11 +1,289 @@
-# Official repository for the COBRAxy toolset
-> COBRAxy (COBRApy in Galaxy) is a user-friendly tool that allows a user to user to characterize and to graphically compare simulated fluxomics coming from groups of samples with different transcriptional regulation of metabolism. 
-It extends the MaREA 2 (Metabolic Reaction Enrichment Analysis) tool that enables users to compare groups in terms of RAS and RPS only. The tool is available as plug-in for the widely-used Galaxy platform for comparative genomics and bioinformatics analyses.
+<p align="center">
+	<img src="https://opencobra.github.io/cobrapy/_static/img/cobrapy_logo.png" alt="COBRApy logo" width="120"/>
+</p>
+
+# COBRAxy — Metabolic analysis and visualization toolkit (Galaxy-ready)
+
+COBRAxy (COBRApy in Galaxy) is a toolkit to compute, analyze, and visualize metabolism at the reaction level from transcriptomics and metabolomics data. It enables users to:
+
+- derive Reaction Activity Scores (RAS) from gene expression and Reaction Propensity Scores (RPS) from metabolite abundances,
+- integrate RAS into model bounds,
+- perform flux sampling with either CBS (corner-based sampling) or OPTGP,
+- compute statistics (pFBA, FVA, sensitivity) and generate styled SVG/PDF metabolic maps,
+- run all tools as Galaxy wrappers or via CLI on any machine.
+
+It extends the MaREA 2 (Metabolic Reaction Enrichment Analysis) concept by adding sampling-based flux comparison and rich visualization. The repository ships both Python CLIs and Galaxy tool XMLs.
+
+## Table of contents
+
+- Overview and features
+- Requirements
+- Installation (pip/conda)
+- Quick start (CLI)
+- Tools and usage
+	- custom_data_generator
+	- ras_generator (RAS)
+	- rps_generator (RPS)
+	- ras_to_bounds
+	- flux_simulation (CBS/OPTGP)
+	- marea (enrichment + maps)
+	- flux_to_map (maps from fluxes)
+	- marea_cluster (clustering auxiliaries)
+- Typical workflow
+- Input/output formats
+- Galaxy usage
+- Troubleshooting
+- Contributing
+- License and citations
+- Useful links
+
+## Overview and features
+
+COBRAxy builds on COBRApy to deliver end‑to‑end analysis from expression/metabolite data to flux statistics and map rendering:
+
+- RAS and RPS computation from tabular inputs
+- Bounds integration and model preparation
+- Flux sampling: CBS (GLPK backend) with automatic fallback to a COBRApy interface, or OPTGP
+- Flux statistics: mean/median/quantiles, pFBA, FVA, sensitivity
+- Map styling/export: SVG with optional PDF/PNG export
+- Ready-made Galaxy wrappers for all tools
+
+Bundled resources in `local/` include example models (ENGRO2, Recon), gene mappings, a default medium, and SVG maps.
+
+## Requirements
+
+- OS: Linux, macOS, or Windows (Linux recommended; Galaxy typically runs on Linux)
+- Python: 3.8.20 ≤ version < 3.12 (as per `setup.py`)
+- Python packages (installed automatically by `pip install .`):
+	- cobra==0.29.0, numpy==1.24.4, pandas==2.0.3, scipy==1.11, scikit-learn==1.3.2, seaborn==0.13.0
+	- matplotlib==3.7.3, lxml==5.2.2, cairosvg==2.7.1, svglib==1.5.1, pyvips==2.2.3, Pillow
+	- joblib==1.4.2, anndata==0.8.0, pydeseq2==0.5.1
+- Optional but recommended for CBS sampling performance:
+	- GLPK solver and Python bindings
+		- System library: glpk (e.g., Ubuntu: `apt-get install glpk-utils libglpk40`)
+		- Python: `swiglpk` (note: CBS falls back to a COBRApy interface if GLPK is unavailable)
+- For pyvips: system libvips (e.g., Ubuntu: `apt-get install libvips`)
+
+Notes:
+- If you hit system-level library errors for SVG/PDF/PNG conversion or vips, install the corresponding OS packages.
+- GPU is not required.
+
+## Installation
+
+A Python virtual environment is strongly recommended.
+
+### Install from source (pip)
+
+1) Clone the repo and install:
+
+```bash
+git clone https://github.com/CompBtBs/COBRAxy.git
+cd COBRAxy
+python3 -m venv .venv && source .venv/bin/activate
+pip install --upgrade pip
+pip install .
+```
+
+This installs console entry points: `custom_data_generator`, `ras_generator`, `rps_generator`, `ras_to_bounds`, `flux_simulation`, `flux_to_map`, `marea`, `marea_cluster`.
+
+### Install with conda (alternative)
+
+```bash
+conda create -n cobraxy python=3.10 -y
+conda activate cobraxy
+pip install .  # run from the root of the cloned COBRAxy repository
+# Optional system deps (Ubuntu): sudo apt-get install libvips libxml2 libxslt1.1 glpk-utils
+# Optional Python bindings for GLPK: pip install swiglpk
+```
+
+## Quick start (CLI)
+
+All tools provide `-h/--help` for details. Outputs are TSV/CSV and SVG/PDF files depending on the tool and flags.
+
+Example minimal flow (using the built-in ENGRO2 model and the provided assets):
+
+```bash
+# 1) Generate rules/reactions/bounds/medium from a model (optional if using bundled ones)
+custom_data_generator \
+	-id local/models/ENGRO2.xml \
+	-mn ENGRO2.xml \
+	-orules out/ENGRO2_rules.tsv \
+	-orxns out/ENGRO2_reactions.tsv \
+	-omedium out/ENGRO2_medium.tsv \
+	-obnds out/ENGRO2_bounds.tsv
+
+# 2) Compute RAS from expression data
+ras_generator \
+	-td "$(pwd)" \
+	-in my_expression.tsv \
+	-ra out/ras.tsv \
+	-rs ENGRO2
+
+# 3) Integrate RAS into bounds
+ras_to_bounds \
+	-td "$(pwd)" \
+	-ms ENGRO2 \
+	-ir out/ras.tsv \
+	-rs true \
+	-idop out/ras_bounds
+
+# 4) Flux sampling (CBS)
+flux_simulation \
+	-td "$(pwd)" \
+	-ms ENGRO2 \
+	-in out/ras_bounds/sample1.tsv,out/ras_bounds/sample2.tsv \
+	-ni sample1,sample2 \
+	-a CBS -ns 500 -sd 0 -nb 1 \
+	-ot mean,median,quantiles \
+	-ota pFBA,FVA,sensitivity \
+	-idop out/flux
 
-## Useful links:
+# 5) Enrichment + map styling (RAS/RPS or fluxes)
+marea \
+	-td "$(pwd)" \
+	-using_RAS true -input_data out/ras.tsv \
+	-co manyvsmany -te ks \
+	-gs true -gp true \
+	-choice_map ENGRO2 -idop out/maps
+```
+
+## Tools and usage
+
+Below is a high‑level summary of each CLI. Use `--help` for the full list of options.
+
+### 1) custom_data_generator
+
+Generate model‑derived assets.
+
+Required inputs:
+- `-id/--input`: model file (XML or JSON; `.gz`, `.zip`, and `.bz2` compressed files are recognized by their extension)
+- `-mn/--name`: the original file name including extension (Galaxy renames files; this preserves the true format)
+- `-orules`, `-orxns`, `-omedium`, `-obnds`: output paths
+
+Outputs:
+- TSV with rules, reactions, exchange medium, and bounds.
+
+### 2) ras_generator (Reaction Activity Scores)
+
+Compute RAS from a gene expression table.
+
+Key inputs:
+- `-td/--tool_dir`: repository root path (used to locate `local/` assets)
+- `-in/--input`: expression TSV (rows: genes; columns: samples)
+- `-rs/--rules_selector`: model/rules choice, e.g. `ENGRO2` or `Custom` with `-rl` and `-rn`
+- Optional: `-rl/--rule_list` custom rules TSV, `-rn/--rules_name` its original name/extension
+- Output: `-ra/--ras_output` TSV
+
+### 3) rps_generator (Reaction Propensity Scores)
+
+Compute RPS from a metabolite abundance table.
+
+Key inputs:
+- `-td/--tool_dir`: repository root
+- `-id/--input`: metabolite TSV (rows: metabolites; columns: samples)
+- `-rc/--reaction_choice`: `default` or `custom` with `-cm/--custom` reactions TSV
+- Output: `-rp/--rps_output` TSV
+
+### 4) ras_to_bounds
+
+Integrate RAS into reaction bounds for a given model and medium.
+
+Key inputs:
+- `-td/--tool_dir`: repository root
+- `-ms/--model_selector`: one of `ENGRO2` or `Custom` with `-mo/--model` and `-mn/--model_name`
+- Medium: `-mes/--medium_selector` (default `allOpen`) or `-meo/--medium` custom TSV
+- RAS: `-ir/--input_ras` and `-rs/--ras_selector` (true/false)
+- Output folder: `-idop/--output_path`
+
+Outputs:
+- One bounds TSV per sample in the RAS table.
+
+### 5) flux_simulation
+
+Flux sampling with CBS or OPTGP and downstream statistics.
+
+Key inputs:
+- `-td/--tool_dir`: repository root
+- Model: `-ms/--model_selector` (ENGRO2 or Custom with `-mo`/`-mn`)
+- Bounds files: `-in` (comma‑separated list) and `-ni/--names` (comma‑separated sample names)
+- Algorithm: `-a CBS|OPTGP`; CBS uses GLPK if available and falls back to a COBRApy interface
+- Sampling params: `-ns/--n_samples`, `-th/--thinning` (OPTGP), `-nb/--n_batches`, `-sd/--seed`
+- Outputs: `-ot/--output_type` (mean,median,quantiles) and `-ota/--output_type_analysis` (pFBA,FVA,sensitivity)
+- Output path: `-idop/--output_path`
+
+Outputs:
+- Per‑sample or aggregated CSV/TSV with flux samples and statistics.
+
+### 6) marea
+
+Statistical enrichment and map styling for RAS and/or RPS groups with optional DESeq2‑style testing via `pydeseq2`.
+
+Key inputs:
+- `-td/--tool_dir`: repository root
+- Comparison: `-co manyvsmany|onevsrest|onevsmany`
+- Test: `-te ks|ttest_p|ttest_ind|wilcoxon|mw|DESeq`
+- Thresholds: `-pv` (p-value), `-adj` (FDR adjustment), `-fc` (fold change)
+- Data: enable RAS with `-using_RAS` and pass the table via `-input_data` (or pass multiple named datasets); RPS works the same way with `-using_RPS`
+- Map: `-choice_map HMRcore|ENGRO2|Custom` or `-custom_map` SVG
+- Output: `-gs/--generate_svg`, `-gp/--generate_pdf`, output dir `-idop`
+
+Outputs:
+- Styled SVG (and optional PDF/PNG) highlighting enriched reactions by color/width per your thresholds.
+
+### 7) flux_to_map
+
+Like `marea`, but driven by fluxes instead of RAS/RPS. Accepts single or multiple flux datasets and produces styled maps.
+
+### 8) marea_cluster
+
+Convenience clustering utilities (k‑means, DBSCAN, hierarchical) for grouping samples; produces labels and optional plots.
+
+## Typical workflow
+
+1. Prepare a model and generate its assets (optional if using bundled assets): `custom_data_generator`
+2. Compute RAS from expression: `ras_generator` (and/or compute RPS via `rps_generator`)
+3. Integrate RAS into bounds: `ras_to_bounds`
+4. Sample fluxes: `flux_simulation` with CBS or OPTGP
+5. Analyze and visualize: `marea` or `flux_to_map` to render SVG/PDF metabolic maps
+6. Optionally cluster or further analyze results: `marea_cluster`
+
+## Input/output formats
+
+Unless otherwise stated, inputs are tab‑separated (TSV) text files with headers.
+
+- Expression (RAS): rows = genes (HGNC/Ensembl/symbol/Entrez supported), columns = samples
+- Metabolite table (RPS): rows = metabolites, columns = samples
+- Rules/Reactions: TSV with two columns: ReactionID, Rule/Reaction
+- Bounds: TSV with index = reaction IDs, columns = lower_bound, upper_bound
+- Medium: single‑column TSV listing exchange reactions
+- Flux samples/statistics: CSV/TSV with reactions as rows and samples/statistics as columns
+
+## Galaxy usage
+
+Each CLI has a corresponding Galaxy tool XML in the repository (e.g., `marea.xml`, `flux_simulation.xml`). Use `shed.yml` to publish to a Galaxy toolshed. The `local/` directory provides models, mappings, and maps for out‑of‑the‑box runs inside Galaxy.
+
+## Troubleshooting
+
+- GLPK/CBS issues: if `swiglpk` or GLPK is missing, `flux_simulation` will attempt a COBRApy fallback. Install GLPK + `swiglpk` for best performance.
+- pyvips errors: install `libvips` on your system. Reinstall the `pyvips` wheel afterward if needed.
+- PDF/SVG conversions: ensure `cairosvg`, `svglib`, and system libraries (`libxml2`, `libxslt`) are installed.
+- Python version: stick to Python ≥3.8.20 and <3.12.
+- Memory/time: reduce `-ns` (samples) or `-nb` (batches); consider OPTGP if CBS is slow for your model.
+
+## Contributing
+
+Pull requests are welcome. Please:
+- keep changes focused and documented,
+- add concise docstrings/comments in English,
+- preserve public CLI parameters and file formats.
+
+## License and citations
+
+This project is distributed under the MIT License. If you use COBRAxy in academic work, please cite COBRApy and MaREA, and reference this repository.
+
+## Useful links
+
 - COBRAxy Google Summer of Code 2024: https://summerofcode.withgoogle.com/programs/2024/projects/LSrCKfq7
 - COBRApy: https://opencobra.github.io/cobrapy/
 - MaREA4Galaxy: https://galaxyproject.org/use/marea4galaxy/
 - Galaxy project: https://usegalaxy.org/
-
-## Documentation:
\ No newline at end of file
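
The hand-off between `ras_to_bounds` and `flux_simulation` described in the README above (one bounds TSV per sample, then comma-separated `-in` and `-ni` lists) can be scripted rather than typed by hand. The following is a minimal sketch, assuming the bounds files are named `<sample>.tsv` as in the Quick start; `list_bounds_files` and `list_sample_names` are hypothetical helpers and not part of COBRAxy.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with COBRAxy) that build the
# comma-separated lists expected by `flux_simulation -in` and `-ni`.

# Print every *.tsv path in a directory, joined by commas (glob order).
list_bounds_files() {
  local dir=$1 f out=""
  for f in "$dir"/*.tsv; do
    [ -e "$f" ] || continue        # skip the literal pattern if nothing matches
    out="${out:+$out,}$f"
  done
  printf '%s\n' "$out"
}

# Print the matching sample names: file names without directory and .tsv suffix.
# Assumption: each bounds file is named after its sample.
list_sample_names() {
  local dir=$1 f out=""
  for f in "$dir"/*.tsv; do
    [ -e "$f" ] || continue
    f=${f##*/}                     # strip the directory part
    out="${out:+$out,}${f%.tsv}"   # strip the extension
  done
  printf '%s\n' "$out"
}

# Demo on a throwaway directory; the command is only printed, not executed.
demo_dir=$(mktemp -d)
touch "$demo_dir/sample1.tsv" "$demo_dir/sample2.tsv"
echo "flux_simulation -in $(list_bounds_files "$demo_dir") -ni $(list_sample_names "$demo_dir")"
rm -rf "$demo_dir"
```

Both helpers walk the same glob, so the file list and the name list stay in the same order. Pointing them at the `-idop` folder used for `ras_to_bounds` yields values that can be substituted for the hand-written `-in` and `-ni` arguments in step 4 of the Quick start.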