changeset 550:4cf00f21f609 draft default tip

Uploaded
author francesco_lapi
date Mon, 03 Nov 2025 14:49:49 +0000
parents 4c5fdcefce8e
children
files COBRAxy/README.md COBRAxy/docs/_media/logoBLACK_GALAXY.png COBRAxy/docs/_media/logoBLACK_PURPLE.png COBRAxy/docs/_media/logoWHITE_GALAXY.png COBRAxy/docs/_media/logoWHITE_PURPLE.png COBRAxy/docs/reference/built-in-models.md COBRAxy/docs/troubleshooting.md COBRAxy/docs/tutorials/README.md COBRAxy/src/flux_to_map.xml COBRAxy/src/importMetabolicModel.xml COBRAxy/src/marea.xml
diffstat 11 files changed, 63 insertions(+), 310 deletions(-)
--- a/COBRAxy/README.md	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/README.md	Mon Nov 03 14:49:49 2025 +0000
@@ -1,7 +1,7 @@
 <p align="center">
   <picture>
-    <source media="(prefers-color-scheme: dark)" srcset="docs/_media/logo-dark.png">
-    <source media="(prefers-color-scheme: light)" srcset="docs/_media/logo-light.png">
+    <source media="(prefers-color-scheme: dark)" srcset="docs/_media/logoWHITE_GALAXY.png">
+    <source media="(prefers-color-scheme: light)" srcset="docs/_media/logoBLACK_GALAXY.png">
     <img alt="COBRAxy Logo" src="docs/_media/logo-light.png" width="200">
   </picture>
 </p>
Binary file COBRAxy/docs/_media/logoBLACK_GALAXY.png has changed
Binary file COBRAxy/docs/_media/logoBLACK_PURPLE.png has changed
Binary file COBRAxy/docs/_media/logoWHITE_GALAXY.png has changed
Binary file COBRAxy/docs/_media/logoWHITE_PURPLE.png has changed
--- a/COBRAxy/docs/reference/built-in-models.md	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/docs/reference/built-in-models.md	Mon Nov 03 14:49:49 2025 +0000
@@ -4,21 +4,17 @@
 
 ## ENGRO2 (Recommended)
 
-**Best for**: General metabolic analysis
+**Best for**: Core metabolic analysis
 
-- ~2,000 reactions, ~1,500 metabolites, ~500 genes
-- Balanced coverage
-- Core metabolic pathways well-represented
-- **Use for**: Tissue profiling, disease comparisons, time-series analysis
+- ~500 reactions, ~400 metabolites, ~500 genes
+- Core metabolic model
 
 ## Recon (Comprehensive)
 
 **Best for**: Genome-wide studies
 
 - ~10,000 reactions, ~5,000 metabolites, ~2,000 genes
-- Most complete human metabolic network
-- Includes rare and specialized pathways
-- **Use for**: Comprehensive studies, rare diseases
+- Most complete human metabolic network, including all metabolic pathways
 
 ## Usage
 
--- a/COBRAxy/docs/troubleshooting.md	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/docs/troubleshooting.md	Mon Nov 03 14:49:49 2025 +0000
@@ -66,273 +66,42 @@
 conda install -c conda-forge glpk swiglpk
 ```
 
-**Problem**: SVG processing errors
-```bash
-# Install libvips for image processing
-# Ubuntu/Debian: sudo apt-get install libvips
-# macOS: brew install vips
-```
-
-## Data Format Issues
-
-### Gene Expression Problems
-
-**Problem**: "No computable scores" error
-```
-Cause: Gene IDs don't match between data and model
-Solution: 
-1. Check gene ID format (HGNC vs symbols vs Ensembl)
-2. Verify first column contains gene identifiers
-3. Ensure tab-separated format
-4. Try different built-in model
-```
-
-**Problem**: Many "gene not found" warnings
-```python
-# Check gene overlap with model
-import pickle
-genes_dict = pickle.load(open('src/local/pickle files/ENGRO2_genes.p', 'rb'))
-model_genes = set(genes_dict['hugo_id'].keys())
 
-import pandas as pd
-data_genes = set(pd.read_csv('expression.tsv', sep='\t').iloc[:, 0])
-
-overlap = len(model_genes.intersection(data_genes))
-print(f"Gene overlap: {overlap}/{len(data_genes)} ({overlap/len(data_genes)*100:.1f}%)")
-```
-
-**Problem**: File format not recognized
-```tsv
-# Correct format - tab-separated:
-Gene_ID	Sample_1	Sample_2
-HGNC:5	10.5	11.2
-HGNC:10	3.2	4.1
+## Galaxy Tool Issues
 
-# Wrong - comma-separated or spaces will fail
-```
-
-### Model Issues
-
-**Problem**: Custom model not loading
-```
-Solution:
-1. Check TSV format with "GPR" column header
-2. Verify reaction IDs are unique
-3. Test GPR syntax (use 'and'/'or', proper parentheses)
-4. Check file permissions and encoding (UTF-8)
-```
-
-## Tool Execution Errors
-
+### Import Metabolic Model
 
-
-### File Path Problems
-
-**Problem**: "File not found" errors
-```python
-# Use absolute paths
-from pathlib import Path
-
-input_file = str(Path('expression.tsv').absolute())
-
-args = ['-in', input_file, ...]
-```
-
-**Problem**: Permission denied
+**Error message**: 
 ```bash
-# Check write permissions
-ls -la output_directory/
-
-# Fix permissions
-chmod 755 output_directory/
-chmod 644 input_files/*
-```
-
-### Galaxy Integration Issues
-
-**Problem**: COBRAxy tools not appearing in Galaxy
-```xml
-<!-- Check tool_conf.xml syntax -->
-<section id="cobraxy" name="COBRAxy">
-  <tool file="cobraxy/ras_generator.xml" />
-</section>
-
-<!-- Verify file paths are correct -->
-ls tools/cobraxy/ras_generator.xml
-```
-
-**Problem**: Tool execution fails in Galaxy
-```
-Check Galaxy logs:
-- main.log: General Galaxy issues
-- handler.log: Job execution problems  
-- uwsgi.log: Web server issues
-
-Common fixes:
-1. Restart Galaxy after adding tools
-2. Check Python environment has COBRApy installed
-3. Verify file permissions on tool files
-```
-
-
-
-**Problem**: Flux sampling hangs
-```bash
-# Check solver availability
-python -c "import cobra; print(cobra.Configuration().solver)"
-
-# Should show: glpk, cplex, or gurobi
-# Install GLPK if missing:
-pip install swiglpk
+Traceback (most recent call last):
+  File "/export/tool_deps/_conda/envs/mulled-v1-d3fef6bda7daedb89425f527672b54ab0a4be6cfe3c8725b7f8c0948e0c80773/lib/python3.11/site-packages/cobra/io/sbml.py", line 458, in read_sbml_model
+    return _sbml_to_model(doc, number=number, f_replace=f_replace, **kwargs)
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/export/tool_deps/_conda/envs/mulled-v1-d3fef6bda7daedb89425f527672b54ab0a4be6cfe3c8725b7f8c0948e0c80773/lib/python3.11/site-packages/cobra/io/sbml.py", line 563, in _sbml_to_model
+    raise CobraSBMLError("No SBML model detected in file.")
+cobra.io.sbml.CobraSBMLError: No SBML model detected in file.
 ```
 
-### Large Dataset Handling
+**Meaning:**  
+The Import Metabolic Model tool cannot read the input file as a valid SBML model with FBC annotations.
 
-**Problem**: Cannot process large expression matrices
-```python
-# Process in chunks
-def process_large_dataset(expression_file, chunk_size=1000):
-    df = pd.read_csv(expression_file, sep='\t')
-    
-    for i in range(0, len(df), chunk_size):
-        chunk = df.iloc[i:i+chunk_size]
-        chunk_file = f'chunk_{i}.tsv'
-        chunk.to_csv(chunk_file, sep='\t', index=False)
-        
-        # Process chunk
-        ras_generator.main(['-in', chunk_file, ...])
-```
-
-## Output Validation
-
-### Unexpected Results
-
-**Problem**: All RAS values are zero or null
-```python
-# Debug gene mapping
-import pandas as pd
-ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0)
+**Suggested Action:**  
+Verify that the input XML file is valid SBML and includes the required FBC (Flux Balance Constraints) annotations.
 
-# Check data quality
-print(f"Null percentage: {ras_df.isnull().sum().sum() / ras_df.size * 100:.1f}%")
-print(f"Zero percentage: {(ras_df == 0).sum().sum() / ras_df.size * 100:.1f}%")
 
-# Check expression data preprocessing
-expr_df = pd.read_csv('expression.tsv', sep='\t', index_col=0)
-print(f"Expression range: {expr_df.min().min():.2f} to {expr_df.max().max():.2f}")
-```
+### Flux Simulation
 
-**Problem**: RAS values seem too high/low
-```
-Possible causes:
-1. Expression data not log-transformed
-2. Wrong normalization method
-3. Incorrect gene ID mapping
-4. GPR rule interpretation issues
-
-Solutions:
-1. Check expression data preprocessing
-2. Validate against known control genes
-3. Compare with published metabolic activity patterns
-```
-
-### Missing Pathway Maps
-
-**Problem**: MAREA generates no output maps
-```
-Debug steps:
-1. Check RAS input has non-null values
-2. Verify model choice matches RAS generation
-3. Check statistical significance thresholds
-4. Look at log files for specific errors
+**Error message**: 
+```bash
+Execution aborted: wrong format of bounds dataset
 ```
 
-## Environment Issues
-
-### Conda/Virtual Environment Problems
-
-**Problem**: Tool import fails in virtual environment
-```bash
-# Activate environment properly
-source venv/bin/activate  # Linux/macOS
-# or
-venv\Scripts\activate  # Windows
-
-# Verify COBRAxy installation
-pip list | grep cobra
-python -c "import cobra; print('COBRApy version:', cobra.__version__)"
-```
-
-**Problem**: Version conflicts
-```bash
-# Create clean environment
-conda create -n cobraxy python=3.9
-conda activate cobraxy
-
-# Install COBRAxy fresh
-cd COBRAxy/src
-pip install -e .
-```
-
-### Cross-Platform Issues
-
-**Problem**: Windows path separator issues
-```python
-# Use pathlib for cross-platform paths
-from pathlib import Path
-
-# Instead of: '/path/to/file'  
-# Use: str(Path('path') / 'to' / 'file')
-```
+**Meaning:**  
+The flux simulation tool cannot read the reaction bounds needed to set up the constrained problem (optimization or sampling).  
+This usually happens when the wrong dataset is passed to the “Bound file(s): *” input, for example when the **RasToBounds - Cell Class** file is selected instead of the collection of bound files named **"RAS to bounds"**.
 
-**Problem**: Line ending issues (Windows/Unix)
-```bash
-# Convert line endings if needed
-dos2unix input_file.tsv  # Unix
-unix2dos input_file.tsv  # Windows
-```
-
-## Debugging Strategies
-
-### Enable Detailed Logging
-
-```python
-import logging
-logging.basicConfig(level=logging.DEBUG)
-
-# Many tools accept log file parameter
-args = [..., '--out_log', 'detailed.log']
-```
-
-### Test with Small Datasets
-
-```python
-# Create minimal test case
-test_data = """Gene_ID	Sample1	Sample2
-HGNC:5	10.0	15.0
-HGNC:10	5.0	8.0"""
-
-with open('test_input.tsv', 'w') as f:
-    f.write(test_data)
-
-# Test basic functionality
-ras_generator.main(['-in', 'test_input.tsv', 
-                   '-ra', 'test_output.tsv', '-rs', 'ENGRO2'])
-```
-
-### Check Dependencies
-
-```python
-# Verify all required packages
-required_packages = ['cobra', 'pandas', 'numpy', 'scipy']
-
-for package in required_packages:
-    try:
-        __import__(package)
-        print(f"✓ {package}")
-    except ImportError:
-        print(f"✗ {package} - MISSING")
-```
+**Suggested Action:**  
+Check the input datasets and make sure the **"RAS to bounds"** collection is selected as the bounds input.
 
 ## Getting Help
 
@@ -368,38 +137,5 @@
 - Tried alternative models/parameters
 - Checked file formats and permissions
 
-## Prevention Tips
 
-### Best Practices
-
-1. **Use virtual environments** to avoid conflicts
-2. **Validate input data** before processing
-3. **Start with small datasets** for testing
-4. **Keep backups** of working configurations
-5. **Document successful workflows** for reuse
-6. **Test after updates** to catch regressions
-
-### Data Quality Checks
-
-```python
-def validate_expression_data(filename):
-    """Validate gene expression file format."""
-    df = pd.read_csv(filename, sep='\t')
-    
-    # Check basic format
-    assert df.shape[0] > 0, "Empty file"
-    assert df.shape[1] > 1, "Need at least 2 columns"
-    
-    # Check numeric data  
-    numeric_cols = df.select_dtypes(include=[np.number]).columns
-    assert len(numeric_cols) > 0, "No numeric expression data"
-    
-    # Check for missing values
-    null_pct = df.isnull().sum().sum() / df.size * 100
-    if null_pct > 50:
-        print(f"Warning: {null_pct:.1f}% missing values")
-    
-    print(f"✓ File valid: {df.shape[0]} genes × {df.shape[1]-1} samples")
-```
-
-This troubleshooting guide covers the most common issues. For tool-specific problems, check the individual tool documentation pages.
\ No newline at end of file
+This troubleshooting guide covers the most common issues. For tool-specific problems, check the individual tool documentation pages.
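The Import Metabolic Model error documented above is raised by COBRApy's `read_sbml_model` when no SBML model can be found in the uploaded file. As a rough local pre-check before uploading, the sketch below uses only Python's standard `xml.etree.ElementTree` to test three things: that the file is well-formed XML, that its root element is `<sbml>` with a `<model>` child, and that it contains at least one element or attribute in the SBML FBC namespace. `precheck_sbml` is a hypothetical helper, not the tool's own validation; it expects an uncompressed file, and passing it does not guarantee that COBRApy will accept the model.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace prefix shared by the SBML Level 3 FBC package versions
# (e.g. .../fbc/version1 and .../fbc/version2).
FBC_NS_PREFIX = "{http://www.sbml.org/sbml/level3/version1/fbc/"


def local_name(tag: str) -> str:
    """Return an ElementTree tag without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def precheck_sbml(path: str) -> List[str]:
    """Return problems found in an uncompressed SBML file (empty list if none)."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    if local_name(root.tag) != "sbml":
        return [f"root element is <{local_name(root.tag)}>, expected <sbml>"]

    problems = []
    if not any(local_name(child.tag) == "model" for child in root):
        problems.append("no <model> element inside <sbml>")

    # FBC content appears as elements or attributes in the fbc namespace,
    # e.g. fbc:required on <sbml> or fbc:lowerFluxBound on <reaction>.
    uses_fbc = any(
        elem.tag.startswith(FBC_NS_PREFIX)
        or any(key.startswith(FBC_NS_PREFIX) for key in elem.attrib)
        for elem in root.iter()
    )
    if not uses_fbc:
        problems.append("no FBC (Flux Balance Constraints) annotations found")
    return problems
```

An empty result only means these basic structural checks passed; COBRApy's own reader remains the final word.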
--- a/COBRAxy/docs/tutorials/README.md	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/docs/tutorials/README.md	Mon Nov 03 14:49:49 2025 +0000
@@ -2,13 +2,27 @@
 
 Learn COBRAxy through hands-on tutorials for web-based analysis.
 
-## Available Tutorials
+To set up Galaxy and start using it for web-based analyses, see the [Galaxy Setup](tutorials/galaxy-setup) guide.
+
+## Available Workflows
+
+This is a collection of Galaxy workflows illustrating different applications of COBRAxy.
+All published workflows are available at [Galaxy workflows](http://marea4galaxy.cloud.ba.infn.it/galaxy/workflows/list_published).
+
+To use a workflow, click its "Import" button; the workflow is then added to your personal workflows page.
 
 | Tutorial | Description |
 |----------|-------------|
-| [Galaxy Setup](tutorials/galaxy-setup) | Set up Galaxy for web-based analysis |
-|  |  |
-|  |  |
+| [Flux Enrichment Analysis - separated datasets](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=a64417ff266b740e) | Creates maps of the fluxes that differ between two conditions, using a separate gene expression dataset for each condition. |
+| [Flux Enrichment Analysis (sampling mean) - separated datasets](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=16e792953f5b45db) | Creates maps of the fluxes that differ between two conditions, using a separate gene expression dataset for each condition. |
+| [Flux clustering (sampling mean) + Flux Enrichment Analysis](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=c851ab275e52f8af) | Creates flux maps from a single gene expression dataset and its sample group specification. |
+| [Flux Enrichment Analysis (pFBA) - separated datasets](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=bf0806da5b28c6d9) | Creates maps of the fluxes that differ between two conditions, using a separate gene expression dataset for each condition. |
+| [Flux clustering (pFBA) + Flux Enrichment Analysis](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=be0a27b9edd0db03) | Creates flux maps from a single gene expression dataset and its sample group specification. |
+| [RAS clustering + Reaction Enrichment Analysis](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=81991b32733a4fc4) | Creates RAS maps from a single gene expression dataset and its sample group specification. |
+| [Reaction Enrichment Analysis - unified datasets](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=0d16186aaff7cbfd) | Creates RAS maps from a single gene expression dataset and its corresponding classes to compare. |
+| [Reaction Enrichment Analysis - separated datasets](http://marea4galaxy.cloud.ba.infn.it/galaxy/published/workflow?id=290670ee50ab85f0) | Creates RAS maps with MaREA by comparing two distinct datasets, one per condition. |
+
+A more detailed description of each tool is available on its Galaxy page.
 
 ## Tutorial Data
 
--- a/COBRAxy/src/flux_to_map.xml	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/src/flux_to_map.xml	Mon Nov 03 14:49:49 2025 +0000
@@ -248,11 +248,13 @@
 +---------------+-----------+
 
 
-**TIP**: If your dataset is not split into classes, use MaREA cluster analysis.
-
+**TIP**: If only one dataset is provided for the analysis, either:
+		- use the Cluster Analysis tool to assign group labels when no prior grouping is known, or
+		- provide an external group file specifying the group of each sample, if the grouping is known a priori.
 
 ]]>
 	</help>
 	<expand macro="citations" />
 
 </tool>
+
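The TIP above suggests supplying an external group file when the sample grouping is known in advance. A minimal sketch of writing such a file with Python's `csv` module follows. It assumes the input dataset is tab-separated with an ID column first and one column per sample (sample IDs in the header row), and it uses `SampleID`/`Class` as placeholder column names; both are assumptions, so check the tool's help for the exact group-file layout. `write_group_file` is a hypothetical helper, and it refuses to write anything if a sample in the dataset has no group.

```python
import csv
from typing import Dict, List, Sequence


def write_group_file(
    dataset_path: str,
    assignment: Dict[str, str],
    out_path: str,
    header: Sequence[str] = ("SampleID", "Class"),  # placeholder column names
) -> List[str]:
    """Write a tab-separated sample -> group file for every sample in dataset_path.

    Assumes the dataset's header row is: <ID column>, sample_1, sample_2, ...
    Returns the sample IDs in the order they were written.
    """
    with open(dataset_path, newline="") as f:
        samples = next(csv.reader(f, delimiter="\t"))[1:]  # skip the ID column

    # Every sample must be assigned, otherwise the group file is incomplete.
    missing = [s for s in samples if s not in assignment]
    if missing:
        raise ValueError(f"no group given for: {', '.join(missing)}")

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(header)
        for sample in samples:
            writer.writerow([sample, assignment[sample]])
    return samples
```

Reading the sample IDs from the dataset itself, rather than from the dict, keeps the group file in the same order as the dataset columns.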
--- a/COBRAxy/src/importMetabolicModel.xml	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/src/importMetabolicModel.xml	Mon Nov 03 14:49:49 2025 +0000
@@ -74,7 +74,7 @@
 
             <!-- Custom model -->
             <when value="Custom_model">
-                <param name="input" argument="--input" type="data" format="json,xml,sbml" label="Custom model file:" />
+                <param name="input" argument="--input" type="data" format="sbml,json,mat,yaml" label="Custom model file:" />
                 <conditional name="cond_medium">
                     <param name="medium_selector" argument="--medium_selector" type="select" label="Medium">
                         <option value="Default" selected="true">Default (custom model medium)</option>
@@ -136,7 +136,7 @@
 	- one tabular file (.tabular) containing reaction IDs, reaction formula, GPR rules, reaction bounds, objective function coefficients, pathways in which the reaction is involved and a flag indicating whether the reaction is an exchange reaction (i.e., related to the growth medium).
     - a log file (.txt).
 
-**TIP 1**: Different input files can be used as the input model. The possible formats are XML (SBML), JSON, MAT or YAML (.yml). 
+**TIP 1**: Different input files can be used as the input model. The possible formats are SBML, JSON, MAT or YAML (.yml). 
     Supported compressed formats: .zip, .gz and .bz2. Filename must follow the pattern: {model_name}.{extension}.[zip|gz|bz2]
     More detail can be found at https://cobrapy.readthedocs.io/en/latest/io.html
 
@@ -153,3 +153,5 @@
 </tool>
 
 
+
+
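TIP 1 in importMetabolicModel.xml asks for model filenames of the form `{model_name}.{extension}.[zip|gz|bz2]`. The sketch below checks a filename against that pattern and reports the model name, format and optional compression. The extension-to-format table (for example, treating `.xml` and `.sbml` as SBML and `.yml`/`.yaml` as YAML) is an assumption for illustration and is not taken from the tool's source; `parse_model_filename` is a hypothetical helper.

```python
from typing import Optional, Tuple

# Assumed extension -> format table (illustrative, not from the tool's source).
FORMAT_BY_EXTENSION = {
    "xml": "SBML",
    "sbml": "SBML",
    "json": "JSON",
    "mat": "MAT",
    "yml": "YAML",
    "yaml": "YAML",
}
COMPRESSIONS = {"zip", "gz", "bz2"}


def parse_model_filename(filename: str) -> Tuple[str, str, Optional[str]]:
    """Split '{model_name}.{extension}[.zip|.gz|.bz2]' into (name, format, compression)."""
    parts = filename.split(".")

    # An optional compression suffix comes last.
    compression = None
    if parts[-1].lower() in COMPRESSIONS:
        compression = parts.pop().lower()

    if len(parts) < 2:
        raise ValueError(f"{filename!r}: missing model format extension")
    extension = parts.pop().lower()
    name = ".".join(parts)  # model names may themselves contain dots
    if not name:
        raise ValueError(f"{filename!r}: empty model name")
    if extension not in FORMAT_BY_EXTENSION:
        raise ValueError(f"{filename!r}: unsupported model extension '.{extension}'")
    return name, FORMAT_BY_EXTENSION[extension], compression
```

For example, `parse_model_filename("Recon3D.json.gz")` yields `("Recon3D", "JSON", "gz")`, while a bare `model.gz` is rejected because the format extension is missing.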
--- a/COBRAxy/src/marea.xml	Wed Oct 29 11:09:38 2025 +0000
+++ b/COBRAxy/src/marea.xml	Mon Nov 03 14:49:49 2025 +0000
@@ -212,7 +212,7 @@
 	<help>
 	<![CDATA[
 
-What it does
+Overview
 -------------
 
 This tool analyzes and visualizes differences in the Reaction Activity Scores (RASs) of groups of samples, as computed by the Expression2RAS tool.
@@ -220,7 +220,7 @@
 Accepted files are: 
     - option 1) two or more RAS datasets, each referring to samples in a given group. The user can specify a label for each group (as e.g. "classA" and "classB");
     - option 2) one RAS dataset and one group-file specifying the group each sample belongs to.
-    
+
 RAS datasets format: tab-separated text files, reporting the RAS value of each reaction (row) for a given sample (column).
 
 Column header: sample ID.
@@ -325,7 +325,9 @@
 
 .. class:: infomark
 
-**TIP**: If your dataset is not split into classes, use MaREA cluster analysis.
+**TIP**: If only one dataset is provided for the analysis, either:
+		- use the Cluster Analysis tool to assign group labels when no prior grouping is known, or
+		- provide an external group file specifying the group of each sample, if the grouping is known a priori.
 
 .. class:: infomark
 
@@ -340,4 +342,5 @@
 ]]>
 	</help>
 	<expand macro="citations" />
-</tool>
\ No newline at end of file
+
+</tool>
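The marea.xml help above describes two accepted inputs: several RAS datasets (one per group), or one RAS dataset plus a group file. Following the stated RAS format (tab-separated, one row per reaction, one column per sample, sample IDs in the header), the sketch below converts the second form into the first by writing one dataset per group. It assumes the first column holds reaction IDs and that the group assignment has already been loaded into a `{sample_id: group}` dict; `split_by_group` and the `{out_prefix}_{group}.tsv` naming scheme are hypothetical.

```python
import csv
from typing import Dict, List


def split_by_group(ras_path: str, groups: Dict[str, str], out_prefix: str) -> Dict[str, str]:
    """Write one RAS dataset per group; return {group: output_path}."""
    with open(ras_path, newline="") as f:
        rows: List[List[str]] = list(csv.reader(f, delimiter="\t"))
    header = rows[0]

    # Column 0 is assumed to hold reaction IDs; every other column is a sample.
    columns_by_group: Dict[str, List[int]] = {}
    for idx, sample in enumerate(header[1:], start=1):
        if sample not in groups:
            raise ValueError(f"sample {sample!r} has no group")
        columns_by_group.setdefault(groups[sample], []).append(idx)

    paths = {}
    for group, columns in columns_by_group.items():
        path = f"{out_prefix}_{group}.tsv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f, delimiter="\t")
            for row in rows:  # header row included, so sample IDs are kept
                writer.writerow([row[0]] + [row[i] for i in columns])
        paths[group] = path
    return paths
```

Empty cells (missing RAS values) are copied through unchanged, so each output keeps the same reactions as the input.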