changeset 550:4cf00f21f609 draft default tip
Uploaded
| author | francesco_lapi | 
|---|---|
| date | Mon, 03 Nov 2025 14:49:49 +0000 | 
| parents | 4c5fdcefce8e | 
| children | |
| files | COBRAxy/README.md COBRAxy/docs/_media/logoBLACK_GALAXY.png COBRAxy/docs/_media/logoBLACK_PURPLE.png COBRAxy/docs/_media/logoWHITE_GALAXY.png COBRAxy/docs/_media/logoWHITE_PURPLE.png COBRAxy/docs/reference/built-in-models.md COBRAxy/docs/troubleshooting.md COBRAxy/docs/tutorials/README.md COBRAxy/src/flux_to_map.xml COBRAxy/src/importMetabolicModel.xml COBRAxy/src/marea.xml | 
| diffstat | 11 files changed, 63 insertions(+), 310 deletions(-) |
```diff
--- a/COBRAxy/README.md	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/README.md	Mon Nov 03 14:49:49 2025 +0000
@@ -1,7 +1,7 @@
 <p align="center">
   <picture>
-    <source media="(prefers-color-scheme: dark)" srcset="docs/_media/logo-dark.png">
-    <source media="(prefers-color-scheme: light)" srcset="docs/_media/logo-light.png">
+    <source media="(prefers-color-scheme: dark)" srcset="docs/_media/logoWHITE_GALAXY.png">
+    <source media="(prefers-color-scheme: light)" srcset="docs/_media/logoBLACK_GALAXY.png">
     <img alt="COBRAxy Logo" src="docs/_media/logo-light.png" width="200">
   </picture>
 </p>
```
```diff
--- a/COBRAxy/docs/reference/built-in-models.md	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/docs/reference/built-in-models.md	Mon Nov 03 14:49:49 2025 +0000
@@ -4,21 +4,17 @@
 
 ## ENGRO2 (Recommended)
 
-**Best for**: General metabolic analysis
+**Best for**: Core metabolic analysis
 
-- ~2,000 reactions, ~1,500 metabolites, ~500 genes
-- Balanced coverage
-- Core metabolic pathways well-represented
-- **Use for**: Tissue profiling, disease comparisons, time-series analysis
+- ~500 reactions, ~400 metabolites, ~500 genes
+- Core metabolic model
 
 ## Recon (Comprehensive)
 
 **Best for**: Genome-wide studies
 
 - ~10,000 reactions, ~5,000 metabolites, ~2,000 genes
-- Most complete human metabolic network
-- Includes rare and specialized pathways
-- **Use for**: Comprehensive studies, rare diseases
+- Most complete human metabolic network, including all metabolic pathways
 
 ## Usage
 
```
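You can check the approximate sizes quoted for ENGRO2 and Recon directly against a model file. The sketch below uses only the Python standard library. It assumes an SBML Level 3 document that encodes genes as FBC `geneProduct` elements; a model that stores genes only in notes would report 0 genes. The function names are illustrative and are not part of COBRAxy.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def count_sbml_components(root: ET.Element) -> dict:
    """Count reactions, species (metabolites) and FBC gene products.

    Matching on local tag names keeps the sketch independent of the
    SBML level/version and of the FBC package version in use.
    """
    tags = Counter(local_name(el.tag) for el in root.iter())
    return {
        "reactions": tags["reaction"],
        "metabolites": tags["species"],
        "genes": tags["geneProduct"],
    }
```

Call it as `count_sbml_components(ET.parse("ENGRO2.xml").getroot())`; the file name is hypothetical. Matching local tag names instead of full namespace URIs gives up some strictness in exchange for working across SBML and FBC versions.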
````diff
--- a/COBRAxy/docs/troubleshooting.md	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/docs/troubleshooting.md	Mon Nov 03 14:49:49 2025 +0000
@@ -66,273 +66,42 @@
 conda install -c conda-forge glpk swiglpk
 ```
 
-**Problem**: SVG processing errors
-```bash
-# Install libvips for image processing
-# Ubuntu/Debian: sudo apt-get install libvips
-# macOS: brew install vips
-```
-
-## Data Format Issues
-
-### Gene Expression Problems
-
-**Problem**: "No computable scores" error
-```
-Cause: Gene IDs don't match between data and model
-Solution:
-1. Check gene ID format (HGNC vs symbols vs Ensembl)
-2. Verify first column contains gene identifiers
-3. Ensure tab-separated format
-4. Try different built-in model
-```
-
-**Problem**: Many "gene not found" warnings
-```python
-# Check gene overlap with model
-import pickle
-genes_dict = pickle.load(open('src/local/pickle files/ENGRO2_genes.p', 'rb'))
-model_genes = set(genes_dict['hugo_id'].keys())
-import pandas as pd
-data_genes = set(pd.read_csv('expression.tsv', sep='\t').iloc[:, 0])
-
-overlap = len(model_genes.intersection(data_genes))
-print(f"Gene overlap: {overlap}/{len(data_genes)} ({overlap/len(data_genes)*100:.1f}%)")
-```
-
-**Problem**: File format not recognized
-```tsv
-# Correct format - tab-separated:
-Gene_ID	Sample_1	Sample_2
-HGNC:5	10.5	11.2
-HGNC:10	3.2	4.1
+## Galaxy Tool Issues
 
-# Wrong - comma-separated or spaces will fail
-```
-
-### Model Issues
-
-**Problem**: Custom model not loading
-```
-Solution:
-1. Check TSV format with "GPR" column header
-2. Verify reaction IDs are unique
-3. Test GPR syntax (use 'and'/'or', proper parentheses)
-4. Check file permissions and encoding (UTF-8)
-```
-
-## Tool Execution Errors
-
+### Import Metabolic Model
 
-
-### File Path Problems
-
-**Problem**: "File not found" errors
-```python
-# Use absolute paths
-from pathlib import Path
-
-input_file = str(Path('expression.tsv').absolute())
-
-args = ['-in', input_file, ...]
-```
-
-**Problem**: Permission denied
+**Error message**:
 ```bash
-# Check write permissions
-ls -la output_directory/
-
-# Fix permissions
-chmod 755 output_directory/
-chmod 644 input_files/*
-```
-
-### Galaxy Integration Issues
-
-**Problem**: COBRAxy tools not appearing in Galaxy
-```xml
-<!-- Check tool_conf.xml syntax -->
-<section id="cobraxy" name="COBRAxy">
-  <tool file="cobraxy/ras_generator.xml" />
-</section>
-
-<!-- Verify file paths are correct -->
-ls tools/cobraxy/ras_generator.xml
-```
-
-**Problem**: Tool execution fails in Galaxy
-```
-Check Galaxy logs:
-- main.log: General Galaxy issues
-- handler.log: Job execution problems
-- uwsgi.log: Web server issues
-
-Common fixes:
-1. Restart Galaxy after adding tools
-2. Check Python environment has COBRApy installed
-3. Verify file permissions on tool files
-```
-
-
-
-**Problem**: Flux sampling hangs
-```bash
-# Check solver availability
-python -c "import cobra; print(cobra.Configuration().solver)"
-
-# Should show: glpk, cplex, or gurobi
-# Install GLPK if missing:
-pip install swiglpk
+Traceback (most recent call last):
+  File "/export/tool_deps/_conda/envs/mulled-v1-d3fef6bda7daedb89425f527672b54ab0a4be6cfe3c8725b7f8c0948e0c80773/lib/python3.11/site-packages/cobra/io/sbml.py", line 458, in read_sbml_model
+    return _sbml_to_model(doc, number=number, f_replace=f_replace, **kwargs)
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/export/tool_deps/_conda/envs/mulled-v1-d3fef6bda7daedb89425f527672b54ab0a4be6cfe3c8725b7f8c0948e0c80773/lib/python3.11/site-packages/cobra/io/sbml.py", line 563, in _sbml_to_model
+    raise CobraSBMLError("No SBML model detected in file.")
+cobra.io.sbml.CobraSBMLError: No SBML model detected in file.
 ```
 
-### Large Dataset Handling
+**Meaning:**
+The Import Metabolic Model tool cannot read the input file as a valid SBML model with FBC annotations.
 
-**Problem**: Cannot process large expression matrices
-```python
-# Process in chunks
-def process_large_dataset(expression_file, chunk_size=1000):
-    df = pd.read_csv(expression_file, sep='\t')
-    
-    for i in range(0, len(df), chunk_size):
-        chunk = df.iloc[i:i+chunk_size]
-        chunk_file = f'chunk_{i}.tsv'
-        chunk.to_csv(chunk_file, sep='\t', index=False)
-        
-        # Process chunk
-        ras_generator.main(['-in', chunk_file, ...])
-```
-
-## Output Validation
-
-### Unexpected Results
-
-**Problem**: All RAS values are zero or null
-```python
-# Debug gene mapping
-import pandas as pd
-ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0)
+**Suggested Action:**
+Verify that the input XML file is in proper SBML format and includes all necessary FBC annotations.
 
-# Check data quality
-print(f"Null percentage: {ras_df.isnull().sum().sum() / ras_df.size * 100:.1f}%")
-print(f"Zero percentage: {(ras_df == 0).sum().sum() / ras_df.size * 100:.1f}%")
 
-# Check expression data preprocessing
-expr_df = pd.read_csv('expression.tsv', sep='\t', index_col=0)
-print(f"Expression range: {expr_df.min().min():.2f} to {expr_df.max().max():.2f}")
-```
+### Flux simulation
 
-**Problem**: RAS values seem too high/low
-```
-Possible causes:
-1. Expression data not log-transformed
-2. Wrong normalization method
-3. Incorrect gene ID mapping
-4. GPR rule interpretation issues
-
-Solutions:
-1. Check expression data preprocessing
-2. Validate against known control genes
-3. Compare with published metabolic activity patterns
-```
-
-### Missing Pathway Maps
-
-**Problem**: MAREA generates no output maps
-```
-Debug steps:
-1. Check RAS input has non-null values
-2. Verify model choice matches RAS generation
-3. Check statistical significance thresholds
-4. Look at log files for specific errors
+**Error message**:
+```bash
+Execution aborted: wrong format of bounds dataset
 ```
 
-## Environment Issues
-
-### Conda/Virtual Environment Problems
-
-**Problem**: Tool import fails in virtual environment
-```bash
-# Activate environment properly
-source venv/bin/activate  # Linux/macOS
-# or
-venv\Scripts\activate  # Windows
-
-# Verify COBRAxy installation
-pip list | grep cobra
-python -c "import cobra; print('COBRApy version:', cobra.__version__)"
-```
-
-**Problem**: Version conflicts
-```bash
-# Create clean environment
-conda create -n cobraxy python=3.9
-conda activate cobraxy
-
-# Install COBRAxy fresh
-cd COBRAxy/src
-pip install -e .
-```
-
-### Cross-Platform Issues
-
-**Problem**: Windows path separator issues
-```python
-# Use pathlib for cross-platform paths
-from pathlib import Path
-
-# Instead of: '/path/to/file'
-# Use: str(Path('path') / 'to' / 'file')
-```
+**Meaning:**
+Flux simulation cannot read the bounds of the metabolic model for the constrained simulation problem (optimization or sampling).
+This usually happens if the input “Bound file(s): *” is incorrect. For example, it occurs when the **RasToBounds - Cell Class** file is passed instead of the collection of bound files named **"RAS to bounds"**.
 
-**Problem**: Line ending issues (Windows/Unix)
-```bash
-# Convert line endings if needed
-dos2unix input_file.tsv  # Unix
-unix2dos input_file.tsv  # Windows
-```
-
-## Debugging Strategies
-
-### Enable Detailed Logging
-
-```python
-import logging
-logging.basicConfig(level=logging.DEBUG)
-
-# Many tools accept log file parameter
-args = [..., '--out_log', 'detailed.log']
-```
-
-### Test with Small Datasets
-
-```python
-# Create minimal test case
-test_data = """Gene_ID	Sample1	Sample2
-HGNC:5	10.0	15.0
-HGNC:10	5.0	8.0"""
-
-with open('test_input.tsv', 'w') as f:
-    f.write(test_data)
-
-# Test basic functionality
-ras_generator.main(['-in', 'test_input.tsv', 
-                   '-ra', 'test_output.tsv', '-rs', 'ENGRO2'])
-```
-
-### Check Dependencies
-
-```python
-# Verify all required packages
-required_packages = ['cobra', 'pandas', 'numpy', 'scipy']
-
-for package in required_packages:
-    try:
-        __import__(package)
-        print(f"✓ {package}")
-    except ImportError:
-        print(f"✗ {package} - MISSING")
-```
+**Suggested Action:**
+Check the input files and ensure the correct bounds collection is used.
 
 ## Getting Help
 
@@ -368,38 +137,5 @@
 - Tried alternative models/parameters
 - Checked file formats and permissions
 
-## Prevention Tips
 
-### Best Practices
-
-1. **Use virtual environments** to avoid conflicts
-2. **Validate input data** before processing
-3. **Start with small datasets** for testing
-4. **Keep backups** of working configurations
-5. **Document successful workflows** for reuse
-6. **Test after updates** to catch regressions
-
-### Data Quality Checks
-
-```python
-def validate_expression_data(filename):
-    """Validate gene expression file format."""
-    df = pd.read_csv(filename, sep='\t')
-    
-    # Check basic format
-    assert df.shape[0] > 0, "Empty file"
-    assert df.shape[1] > 1, "Need at least 2 columns"
-    
-    # Check numeric data
-    numeric_cols = df.select_dtypes(include=[np.number]).columns
-    assert len(numeric_cols) > 0, "No numeric expression data"
-    
-    # Check for missing values
-    null_pct = df.isnull().sum().sum() / df.size * 100
-    if null_pct > 50:
-        print(f"Warning: {null_pct:.1f}% missing values")
-    
-    print(f"✓ File valid: {df.shape[0]} genes × {df.shape[1]-1} samples")
-```
-
-This troubleshooting guide covers the most common issues. For tool-specific problems, check the individual tool documentation pages.
\ No newline at end of file
+This troubleshooting guide covers the most common issues. For tool-specific problems, check the individual tool documentation pages.
````
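The new Import Metabolic Model entry attributes `CobraSBMLError: No SBML model detected in file.` to an input that is not valid SBML with FBC annotations. A local pre-check before uploading can rule out the most common causes. The sketch below uses only the standard library and assumes an uncompressed SBML Level 3 file. It checks for four things: well-formed XML, an `<sbml>` root, a `<model>` child, and some FBC-namespaced content. It is a heuristic, not validation; cobrapy's `cobra.io.validate_sbml_model` or libSBML's validator perform the full checks.

```python
import xml.etree.ElementTree as ET

# Namespace URI prefix shared by the FBC package versions
# (.../fbc/version1, .../fbc/version2, .../fbc/version3).
FBC_PREFIX = "{http://www.sbml.org/sbml/level3/version1/fbc/"


def _local(tag: str) -> str:
    """Strip the '{namespace-uri}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def sbml_precheck(xml_bytes: bytes) -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if _local(root.tag) != "sbml":
        problems.append("root element is not <sbml>")
    # The "No SBML model detected in file." message suggests libSBML
    # found no <model> element in the document.
    if not any(_local(child.tag) == "model" for child in root):
        problems.append("no <model> element under the root")
    # FBC content appears as namespaced element tags or attribute names.
    uses_fbc = any(
        el.tag.startswith(FBC_PREFIX)
        or any(name.startswith(FBC_PREFIX) for name in el.attrib)
        for el in root.iter()
    )
    if not uses_fbc:
        problems.append("no FBC-namespaced elements or attributes found")
    return problems
```

A typical call is `sbml_precheck(pathlib.Path("model.xml").read_bytes())`, with a hypothetical file name. Each returned message points at something to fix before re-uploading.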
```diff
--- a/COBRAxy/docs/tutorials/README.md	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/docs/tutorials/README.md	Mon Nov 03 14:49:49 2025 +0000
@@ -2,13 +2,27 @@
 
 Learn COBRAxy through hands-on tutorials for web-based analysis.
 
-## Available Tutorials
+To set up Galaxy and start using it for web-based analyses, see the [Galaxy Setup](tutorials/galaxy-setup)
+
+## Available Workflows
+
+This is a collection of GALAXY workflows illustrating different applications of the tool.
+The general repository is at the following link: [Galaxy workflows](http://marea4galaxy.cloud.ba.infn.it/galaxy/workflows/list_published).
+
+To use a workflow, click the "Import" button, and it will be added to your personal workflow page.
 
 | Tutorial | Description |
 |----------|-------------|
-| [Galaxy Setup](tutorials/galaxy-setup) | Set up Galaxy for web-based analysis |
-| | |
-| | |
+|[Flux Enrichment Analysis - separated datasets](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=a64417ff266b740e) | Creation of maps of the fluxes differently expressed between two conditions. One gene expression dataset different for each condition. |
+| [Flux Enrichment Analysis (sampling mean) - separated datasets](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=16e792953f5b45db) | Creation of maps of the fluxes differently expressed between two conditions. One gene expression dataset different for each condition. |
+| [Flux clustering (sampling mean) + Flux Enrichment Analys](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=c851ab275e52f8af) | Creation of maps of the fluxes, using one dataset differently expressed for each condition and its sample group specification|
+| [Flux Enrichment Analysis (pFBA) - separated datasets](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=bf0806da5b28c6d9) | Creation of maps of the fluxes differently expressed between two conditions. One gene expression dataset different for each condition. |
+| [Flux clustering (pFBA) + Flux Enrichment Analysis](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=be0a27b9edd0db03) | Creation of maps of the fluxes, using one dataset differently expressed for each condition and its sample group specification |
+| [RAS clustering + Reaction Enrichment Analysis](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=81991b32733a4fc4) | Creation of RAS maps, one single expression gene dataset and its sample group specification |
+| [Reaction Enrichment Analysis - unified datasets](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=0d16186aaff7cbfd) |Creation of RAS maps starting from an expression dataset and its corresponding classes. One gene expression dataset as input and its classes to compare. |
+| [Reaction Enrichment Analysis - separated datasets](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=290670ee50ab85f0) | Creation of RAS maps using the tool MaREA. Confrontation of two datasets that must be different from one another. |
+
+A more detailed description of the tools is available on the corresponding GALAXY page.
 
 ## Tutorial Data
 
```
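Besides the Import button, a published workflow can also be imported through the Galaxy API. This minimal sketch rests on three assumptions: BioBlend is installed, you hold an API key for the target server, and the `id` parameter in each published-workflow link is the stored-workflow ID that BioBlend's `WorkflowClient.import_shared_workflow` expects. Both helper functions are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def workflow_id_from_link(link: str) -> str:
    """Extract the `id` query parameter from a .../published/workflow?id=... link."""
    ids = parse_qs(urlparse(link).query).get("id")
    if not ids:
        raise ValueError(f"no 'id' parameter in {link!r}")
    return ids[0]


def import_published_workflow(galaxy_url: str, api_key: str, link: str) -> dict:
    """Import a published workflow into the account that owns `api_key`."""
    # Imported here so that workflow_id_from_link works without BioBlend.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url=galaxy_url, key=api_key)
    return gi.workflows.import_shared_workflow(workflow_id_from_link(link))
```

For example, call `import_published_workflow("http://marea4galaxy.cloud.ba.infn.it/galaxy", "<your API key>", link)` with any link from the table above.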
```diff
--- a/COBRAxy/src/flux_to_map.xml	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/src/flux_to_map.xml	Mon Nov 03 14:49:49 2025 +0000
@@ -248,11 +248,13 @@
 +---------------+-----------+
 
 
-**TIP**: If your dataset is not split into classes, use MaREA cluster analysis.
-
+**TIP**: If the user provides just one dataset for analysis:
+  -Use the Cluster Analysis tool to assign group labels when no prior division information is available
+  -provide an external group file specifying the assignment of each sample, if the group division is known a priori
 
 
 ]]>
     </help>
     <expand macro="citations" />
 </tool>
+
```
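When a single dataset is analysed with an external group file, every sample in the dataset needs an entry in that file. The minimal consistency check below makes three assumptions. Both files are tab-separated. The dataset follows the layout described for RAS datasets in marea.xml: the first column holds reaction IDs and the remaining header cells are sample IDs. The group file has a header row followed by `sample<TAB>group` rows; that layout is an assumption, not taken from the tool help.

```python
import csv


def check_group_file(dataset_lines, group_lines):
    """Compare dataset sample IDs with group-file assignments.

    Both arguments are iterables of text lines (e.g. open file handles).
    Returns (unlabelled, unknown): dataset samples missing from the group
    file, and group-file entries that match no dataset column.
    """
    header = next(csv.reader(dataset_lines, delimiter="\t"))
    samples = set(header[1:])  # first column holds reaction IDs

    rows = csv.reader(group_lines, delimiter="\t")
    next(rows)  # skip the group file's header row (assumed present)
    labelled = {row[0] for row in rows if row}

    return sorted(samples - labelled), sorted(labelled - samples)
```

Both returned lists should be empty before you run the analysis. For example, call `check_group_file(open("fluxes.tsv"), open("groups.tsv"))`, with hypothetical file names.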
```diff
--- a/COBRAxy/src/importMetabolicModel.xml	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/src/importMetabolicModel.xml	Mon Nov 03 14:49:49 2025 +0000
@@ -74,7 +74,7 @@
 
             <!-- Custom model -->
             <when value="Custom_model">
-                <param name="input" argument="--input" type="data" format="json,xml,sbml" label="Custom model file:" />
+                <param name="input" argument="--input" type="data" format="sbml,json,mat,yaml" label="Custom model file:" />
                 <conditional name="cond_medium">
                     <param name="medium_selector" argument="--medium_selector" type="select" label="Medium">
                         <option value="Default" selected="true">Default (custom model medium)</option>
@@ -136,7 +136,7 @@
 - one tabular file (.tabular) containing reaction IDs, reaction formula, GPR rules, reaction bounds, objective function coefficients, pathways in which the reaction is involved and a flag indicating whether the reaction is an exchange reaction (i.e., related to the growth medium).
 - a log file (.txt).
 
-**TIP 1**: Different input files can be used as the input model. The possible formats are XML (SBML), JSON, MAT or YAML (.yml).
+**TIP 1**: Different input files can be used as the input model. The possible formats are SBML, JSON, MAT or YAML (.yml).
 Supported compressed formats: .zip, .gz and .bz2.
 Filename must follow the pattern: {model_name}.{extension}.[zip|gz|bz2]
 More detail can be found at https://cobrapy.readthedocs.io/en/latest/io.html
@@ -153,3 +153,5 @@
 
 
 </tool>
+
+
```
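TIP 1 requires model files to be named `{model_name}.{extension}.[zip|gz|bz2]`. The sketch below splits such a name into its parts, which can catch a misnamed upload before the tool runs. The extension table maps SBML to `.xml` or `.sbml` and YAML to `.yml` or `.yaml`. It is an assumption based on the listed formats, not the tool's own parsing code.

```python
from pathlib import PurePath

# Assumed mapping from file extension to model format.
FORMATS = {".xml": "SBML", ".sbml": "SBML", ".json": "JSON",
           ".mat": "MAT", ".yml": "YAML", ".yaml": "YAML"}
COMPRESSIONS = {".zip", ".gz", ".bz2"}


def parse_model_filename(filename: str) -> tuple:
    """Split '{model_name}.{extension}[.zip|.gz|.bz2]' into its parts.

    Returns (model_name, format, compression_or_None); raises ValueError
    when the extension is not one of the listed formats.
    """
    path = PurePath(filename)
    compression = None
    if path.suffix.lower() in COMPRESSIONS:
        compression = path.suffix.lower().lstrip(".")
        path = path.with_suffix("")  # drop the compression suffix
    fmt = FORMATS.get(path.suffix.lower())
    if fmt is None:
        raise ValueError(f"unsupported model extension in {filename!r}")
    return path.stem, fmt, compression
```

A name such as `ENGRO2.gz` is rejected because the format extension is missing once the compression suffix is removed.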
```diff
--- a/COBRAxy/src/marea.xml	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/src/marea.xml	Mon Nov 03 14:49:49 2025 +0000
@@ -212,7 +212,7 @@
 
     <help>
     <![CDATA[
-What it does
+Overview
 -------------
 
 This tool analyzes and visualizes differences in the Reaction Activity Scores (RASs) of groups of samples, as computed by the Expression2RAS tool, of groups of samples.
@@ -220,7 +220,7 @@
 Accepted files are:
     - option 1) two or more RAS datasets, each referring to samples in a given group. The user can specify a label for each group (as e.g. "classA" and "classB");
     - option 2) one RAS dataset and one group-file specifying the group each sample belongs to.
-    
+
 RAS datasets format: tab-separated text files, reporting the RAS value of each reaction (row) for a given sample (column).
 Column header: sample ID.
 
@@ -325,7 +325,9 @@
 
 .. class:: infomark
 
-**TIP**: If your dataset is not split into classes, use MaREA cluster analysis.
+**TIP**: If the user provide just one dataset for analysis:
+  -Use the Cluster Analysis tool to assign group labels when no prior division information is available
+  -provide an external group file specifying the assignment of each sample, if the group division is known a priori
 
 .. class:: infomark
 
@@ -340,4 +342,5 @@
 ]]>
     </help>
     <expand macro="citations" />
-</tool>
\ No newline at end of file
+
+</tool>
```
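The two input options accepted by MaREA carry the same information: one RAS dataset plus a group file (option 2) can be split into one dataset per group (option 1). The minimal sketch below performs that conversion. It assumes tab-separated text in the layout the help describes (reactions as rows, sample IDs in the header). It also assumes the group file has already been loaded as a `{sample_id: group_label}` dictionary. The function name is illustrative.

```python
import csv
import io


def split_by_group(ras_text: str, sample_to_group: dict) -> dict:
    """Split one RAS table into one tab-separated table per group label.

    Columns whose sample ID has no group label are dropped.
    """
    rows = list(csv.reader(io.StringIO(ras_text), delimiter="\t"))
    header, body = rows[0], rows[1:]

    # Collect column indices per group, preserving the original column order.
    columns = {}
    for idx, sample in enumerate(header[1:], start=1):
        group = sample_to_group.get(sample)
        if group is not None:
            columns.setdefault(group, []).append(idx)

    tables = {}
    for group, idxs in columns.items():
        out = io.StringIO()
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for row in [header] + body:
            # Keep the reaction ID column plus this group's sample columns.
            writer.writerow([row[0]] + [row[i] for i in idxs])
        tables[group] = out.getvalue()
    return tables
```

Writing each returned string to its own file gives inputs in the option 1 layout, one per label (for example "classA" and "classB"). Empty RAS cells are passed through unchanged.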
