Mercurial > repos > bimib > cobraxy
view COBRAxy/README.md @ 475:c91c502f12e9 draft
Uploaded
| author | luca_milaz |
|---|---|
| date | Mon, 22 Sep 2025 15:38:51 +0000 |
| parents | a6e45049c1b9 |
| children | 4ed95023af20 |
line wrap: on
line source
<p align="center"> <img src="https://opencobra.github.io/cobrapy/_static/img/cobrapy_logo.png" alt="COBRApy logo" width="120"/> </p> # COBRAxy — Metabolic analysis and visualization toolkit (Galaxy-ready) COBRAxy (COBRApy in Galaxy) is a toolkit to compute, analyze, and visualize metabolism at the reaction level from transcriptomics and metabolomics data. It enables users to: - derive Reaction Activity Scores (RAS) from gene expression and Reaction Propensity Scores (RPS) from metabolite abundances, - integrate RAS into model bounds, - perform flux sampling with either CBS (constraint-based sampling) or OPTGP, - compute statistics (pFBA, FVA, sensitivity) and generate styled SVG/PDF metabolic maps, - run all tools as Galaxy wrappers or via CLI on any machine. It extends the MaREA 2 (Metabolic Reaction Enrichment Analysis) concept by adding sampling-based flux comparison and rich visualization. The repository ships both Python CLIs and Galaxy tool XMLs. ## Table of contents - Overview and features - Requirements - Installation (pip/conda) - Quick start (CLI) - Tools and usage - custom_data_generator - ras_generator (RAS) - rps_generator (RPS) - ras_to_bounds - flux_simulation (CBS/OPTGP) - marea (enrichment + maps) - flux_to_map (maps from fluxes) - marea_cluster (clustering auxiliaries) - Typical workflow - Input/output formats - Galaxy usage - Troubleshooting - Contributing - License and citations - Useful links ## Overview and features COBRAxy builds on COBRApy to deliver end‑to‑end analysis from expression/metabolite data to flux statistics and map rendering: - RAS and RPS computation from tabular inputs - Bounds integration and model preparation - Flux sampling: CBS (GLPK backend) with automatic fallback to a COBRApy interface, or OPTGP - Flux statistics: mean/median/quantiles, pFBA, FVA, sensitivity - Map styling/export: SVG with optional PDF/PNG export - Ready-made Galaxy wrappers for all tools Bundled resources in `local/` include example models (ENGRO2, Recon), gene mappings, a default medium, and SVG maps. ## Requirements - OS: Linux, macOS, or Windows (Linux recommended; Galaxy typically runs on Linux) - Python: 3.8.20 ≤ version < 3.12 (as per `setup.py`) - Python packages (installed automatically by `pip install .`): - cobra==0.29.0, numpy==1.24.4, pandas==2.0.3, scipy==1.11, scikit-learn==1.3.2, seaborn==0.13.0 - matplotlib==3.7.3, lxml==5.2.2, cairosvg==2.7.1, svglib==1.5.1, pyvips==2.2.3, Pillow - joblib==1.4.2, anndata==0.8.0, pydeseq2==0.5.1 - Optional but recommended for CBS sampling performance: - GLPK solver and Python bindings - System library: glpk (e.g., Ubuntu: `apt-get install glpk-utils libglpk40`) - Python: `swiglpk` (note: CBS falls back to a COBRApy interface if GLPK is unavailable) - For pyvips: system libvips (e.g., Ubuntu: `apt-get install libvips`) Notes: - If you hit system-level library errors for SVG/PDF/PNG conversion or vips, install the corresponding OS packages. - GPU is not required. ## Installation Python virtual environment is strongly recommended. ### Install from source (pip) 1) Clone the repo and install: ```bash git clone https://github.com/CompBtBs/COBRAxy.git cd COBRAxy python3 -m venv .venv && source .venv/bin/activate pip install --upgrade pip pip install . ``` This installs console entry points: `custom_data_generator`, `ras_generator`, `rps_generator`, `ras_to_bounds`, `flux_simulation`, `flux_to_map`, `marea`, `marea_cluster`. ### Install with conda (alternative) ```bash conda create -n cobraxy python=3.10 -y conda activate cobraxy pip install . # Optional system deps (Ubuntu): sudo apt-get install libvips libxml2 libxslt1.1 glpk-utils # Optional Python bindings for GLPK: pip install swiglpk ``` ## Quick start (CLI) All tools provide `-h/--help` for details. Outputs are TSV/CSV and SVG/PDF files depending on the tool and flags. Example minimal flow (using built-in ENGRO2 model and provided assets): ```bash # 1) Generate rules/reactions/bounds/medium from a model (optional if using bundled ones) custom_data_generator \ -id local/models/ENGRO2.xml \ -mn ENGRO2.xml \ -orules out/ENGRO2_rules.tsv \ -orxns out/ENGRO2_reactions.tsv \ -omedium out/ENGRO2_medium.tsv \ -obnds out/ENGRO2_bounds.tsv # 2) Compute RAS from expression data ras_generator \ -td $(pwd) \ -in my_expression.tsv \ -ra out/ras.tsv \ -rs ENGRO2 # 3) Integrate RAS into bounds ras_to_bounds \ -td $(pwd) \ -ms ENGRO2 \ -ir out/ras.tsv \ -rs true \ -idop out/ras_bounds # 4) Flux sampling (CBS) flux_simulation \ -td $(pwd) \ -ms ENGRO2 \ -in out/ras_bounds/sample1.tsv,out/ras_bounds/sample2.tsv \ -ni sample1,sample2 \ -a CBS -ns 500 -sd 0 -nb 1 \ -ot mean,median,quantiles \ -ota pFBA,FVA,sensitivity \ -idop out/flux # 5) Enrichment + map styling (RAS/RPS or fluxes) marea \ -td $(pwd) \ -using_RAS true -input_data out/ras.tsv \ -comparison manyvsmany -test ks \ -generate_svg true -generate_pdf true \ -choice_map ENGRO2 -idop out/maps ``` ## Tools and usage Below is a high‑level summary of each CLI. Use `--help` for the full list of options. ### 1) custom_data_generator Generate model‑derived assets. Required inputs: - `-id/--input`: model file (XML or JSON; gz/zip/bz2 also supported via extension) - `-mn/--name`: the original file name including extension (Galaxy renames files; this preserves the true format) - `-orules`, `-orxns`, `-omedium`, `-obnds`: output paths Outputs: - TSV with rules, reactions, exchange medium, and bounds. ### 2) ras_generator (Reaction Activity Scores) Compute RAS from a gene expression table. Key inputs: - `-td/--tool_dir`: repository root path (used to locate `local/` assets) - `-in/--input`: expression TSV (rows: genes; columns: samples) - `-rs/--rules_selector`: model/rules choice, e.g. `ENGRO2` or `Custom` with `-rl` and `-rn` - Optional: `-rl/--rule_list` custom rules TSV, `-rn/--rules_name` its original name/extension - Output: `-ra/--ras_output` TSV ### 3) rps_generator (Reaction Propensity Scores) Compute RPS from a metabolite abundance table. Key inputs: - `-td/--tool_dir`: repository root - `-id/--input`: metabolite TSV (rows: metabolites; columns: samples) - `-rc/--reaction_choice`: `default` or `custom` with `-cm/--custom` reactions TSV - Output: `-rp/--rps_output` TSV ### 4) ras_to_bounds Integrate RAS into reaction bounds for a given model and medium. Key inputs: - `-td/--tool_dir`: repository root - `-ms/--model_selector`: one of `ENGRO2` or `Custom` with `-mo/--model` and `-mn/--model_name` - Medium: `-mes/--medium_selector` (default `allOpen`) or `-meo/--medium` custom TSV - RAS: `-ir/--input_ras` and `-rs/--ras_selector` (true/false) - Output folder: `-idop/--output_path` Outputs: - One bounds TSV per sample in the RAS table. ### 5) flux_simulation Flux sampling with CBS or OPTGP and downstream statistics. Key inputs: - `-td/--tool_dir` - Model: `-ms/--model_selector` (ENGRO2 or Custom with `-mo`/`-mn`) - Bounds files: `-in` (comma‑separated list) and `-ni/--names` (comma‑separated sample names) - Algorithm: `-a CBS|OPTGP`; CBS uses GLPK if available and falls back to a COBRApy interface - Sampling params: `-ns/--n_samples`, `-th/--thinning` (OPTGP), `-nb/--n_batches`, `-sd/--seed` - Outputs: `-ot/--output_type` (mean,median,quantiles) and `-ota/--output_type_analysis` (pFBA,FVA,sensitivity) - Output path: `-idop/--output_path` Outputs: - Per‑sample or aggregated CSV/TSV with flux samples and statistics. ### 6) marea Statistical enrichment and map styling for RAS and/or RPS groups with optional DESeq2‑style testing via `pydeseq2`. Key inputs: - `-td/--tool_dir` - Comparison: `-co manyvsmany|onevsrest|onevsmany` - Test: `-te ks|ttest_p|ttest_ind|wilcoxon|mw|DESeq` - Thresholds: `-pv`, `-adj` (FDR), `-fc` - Data: RAS `-using_RAS` plus `-input_data` or multiple datasets with names; similarly for RPS with `-using_RPS` - Map: `-choice_map HMRcore|ENGRO2|Custom` or `-custom_map` SVG - Output: `-gs/--generate_svg`, `-gp/--generate_pdf`, output dir `-idop` Outputs: - Styled SVG (and optional PDF/PNG) highlighting enriched reactions by color/width per your thresholds. ### 7) flux_to_map Like `marea`, but driven by fluxes instead of RAS/RPS. Accepts single or multiple flux datasets and produces styled maps. ### 8) marea_cluster Convenience clustering utilities (k‑means, DBSCAN, hierarchical) for grouping samples; produces labels and optional plots. ## Typical workflow 1. Prepare a model and generate its assets (optional if using bundled assets): `custom_data_generator` 2. Compute RAS from expression: `ras_generator` (and/or compute RPS via `rps_generator`) 3. Integrate RAS into bounds: `ras_to_bounds` 4. Sample fluxes: `flux_simulation` with CBS or OPTGP 5. Analyze and visualize: `marea` or `flux_to_map` to render SVG/PDF metabolic maps 6. Optionally cluster or further analyze results: `marea_cluster` ## Input/output formats Unless otherwise stated, inputs are tab‑separated (TSV) text files with headers. - Expression (RAS): rows = genes (HGNC/Ensembl/symbol/Entrez supported), columns = samples - Metabolite table (RPS): rows = metabolites, columns = samples - Rules/Reactions: TSV with two columns: ReactionID, Rule/Reaction - Bounds: TSV with index = reaction IDs, columns = lower_bound, upper_bound - Medium: single‑column TSV listing exchange reactions - Flux samples/statistics: CSV/TSV with reactions as rows and samples/statistics as columns ## Galaxy usage Each CLI has a corresponding Galaxy tool XML in the repository (e.g., `marea.xml`, `flux_simulation.xml`). Use `shed.yml` to publish to a Galaxy toolshed. The `local/` directory provides models, mappings, and maps for out‑of‑the‑box runs inside Galaxy. ## Troubleshooting - GLPK/CBS issues: if `swiglpk` or GLPK is missing, `flux_simulation` will attempt a COBRApy fallback. Install GLPK + `swiglpk` for best performance. - pyvips errors: install `libvips` on your system. Reinstall the `pyvips` wheel afterward if needed. - PDF/SVG conversions: ensure `cairosvg`, `svglib`, and system libraries (`libxml2`, `libxslt`) are installed. - Python version: stick to Python ≥3.8.20 and <3.12. - Memory/time: reduce `-ns` (samples) or `-nb` (batches); consider OPTGP if CBS is slow for your model. ## Contributing Pull requests are welcome. Please: - keep changes focused and documented, - add concise docstrings/comments in English, - preserve public CLI parameters and file formats. ## License and citations This project is distributed under the MIT License. If you use COBRAxy in academic work, please cite COBRApy and MaREA, and reference this repository. ## Useful links - COBRAxy Google Summer of Code 2024: https://summerofcode.withgoogle.com/programs/2024/projects/LSrCKfq7 - COBRApy: https://opencobra.github.io/cobrapy/ - MaREA4Galaxy: https://galaxyproject.org/use/marea4galaxy/ - Galaxy project: https://usegalaxy.org/
