diff COBRAxy/docs/tools/ras-generator.md @ 547:73f2f7e2be17 draft

Uploaded
author francesco_lapi
date Tue, 28 Oct 2025 10:44:07 +0000
parents fcdbc81feb45
children
line wrap: on
line diff
--- a/COBRAxy/docs/tools/ras-generator.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/tools/ras-generator.md	Tue Oct 28 10:44:07 2025 +0000
@@ -1,294 +1,107 @@
 # RAS Generator
 
-Generate Reaction Activity Scores (RAS) from gene expression data and GPR (Gene-Protein-Reaction) rules.
+Compute Reaction Activity Scores (RAS) from gene expression data.
 
 ## Overview
 
-The RAS Generator computes metabolic reaction activity by:
-1. Mapping gene expression to reactions via GPR rules
-2. Applying logical operations (AND/OR) for enzyme complexes
-3. Producing activity scores for each reaction in each sample
+RAS Generator computes reaction activity scores by evaluating GPR rules with gene expression values.
+
+## Galaxy Interface
+
+In Galaxy: **COBRAxy → Expression2RAS**
 
-**Input**: Gene expression data + GPR rules  
-**Output**: Reaction activity scores (RAS)
+1. Select built-in model or upload custom GPR rules
+2. Upload gene expression data
+3. Click **Run tool**
+
+## Command-line console
+
+```bash
+ras_generator -rs ENGRO2 \
+              -in expression_data.tsv \
+              -ra ras_scores.tsv \
+              -ol ras_generation.log
+```
 
 ## Parameters
 
-### Required Parameters
-
-| Parameter | Short | Type | Description |
-|-----------|--------|------|-------------|
-| `--input` | `-in` | file | Gene expression dataset (TSV format) |
-| `--ras_output` | `-ra` | file | Output file for RAS values |
-| `--rules_selector` | `-rs` | choice | Built-in model (ENGRO2, Recon, HMRcore) |
-
-### Optional Parameters
-
-| Parameter | Short | Type | Default | Description |
-|-----------|--------|------|---------|-------------|
-| `--tool_dir` | `-td` | string | auto-detected | COBRAxy installation directory (automatically detected after pip install) |
-| `--none` | `-n` | boolean | true | Handle missing gene values |
-| `--model_upload` | `-rl` | file | - | Custom GPR rules file |
-| `--model_upload_name` | `-rn` | string | - | Custom model name |
-| `--out_log` | - | file | log.txt | Output log file |
-
-> **Note**: After installing COBRAxy via pip, the `--tool_dir` parameter is automatically detected and doesn't need to be specified.
+| Parameter | Flag | Description | Default |
+|-----------|------|-------------|---------|
+| Rules Selector | `-rs` | ENGRO2, Recon, or Custom | ENGRO2 |
+| Input Data | `-in` | Gene expression TSV file | - |
+| Output RAS | `-ra` | Output RAS scores file | - |
+| Output Log | `-ol` | Log file | - |
+| Custom Rules | `-rl` | Custom GPR rules file | - |
+| Gene Names | `-gn` | Gene ID type | HGNC_Symbol |
+| Remove Gene | `-rg` | Remove missing genes | true |
+| Ignore NaN | `--none` | Handle missing gene expression | true |
 
 ## Input Format
 
-### Gene Expression File
-```tsv
-Gene_ID	Sample_1	Sample_2	Sample_3	Sample_4
-HGNC:5	10.5	11.2	15.7	14.3
-HGNC:10	3.2	4.1	8.8	7.9
-HGNC:15	7.9	8.2	4.4	5.1
-HGNC:25	12.1	13.5	18.2	17.8
+Gene expression file (TSV):
+
+```
+Gene	Sample1	Sample2	Sample3
+ALDOA	125.5	98.3	142.7
+ENO1	85.2	110.4	95.8
+PFKM	200.3	185.6	210.1
 ```
 
-**Requirements**:
-- First column: Gene identifiers (HGNC, Ensembl, Entrez, etc.)
-- Subsequent columns: Expression values (numeric)
-- Header row with sample names
-- Tab-separated format
+**File Format Notes:**
+- Use **tab-separated** values (TSV)
+- First row must contain column headers (Gene, Sample names)
+- Gene names must match the selected gene ID type
+- Numeric values only for expression data
 
-### Custom GPR Rules File (Optional)
-```tsv
-Reaction_ID	GPR
-R_HEX1	HGNC:4922
-R_PGI	HGNC:8906
-R_PFK	HGNC:8877 or HGNC:8878
-R_ALDOA	HGNC:414 and HGNC:417
-```
+## GPR Rules
 
-## Algorithm Details
+- **AND**: All genes required
+- **OR**: Any gene sufficient
+- Example: `(GENE1 and GENE2) or GENE3`
 
-### GPR Rule Processing
+## NaN Handling
 
-**Gene Mapping**: Each gene in the expression data is mapped to reactions via GPR rules.
+The `--none` parameter controls how missing gene expression values are treated in GPR rules:
 
-**Logical Operations**:
-- **OR**: `Gene1 or Gene2` → `expr1 + expr2`
-- **AND**: `Gene1 and Gene2` → `min(expr1, expr2)`
+**When `--none true` (default):**
+- `(GENE1 and NaN)` → evaluated as `GENE1` value
+- `(GENE1 or NaN)` → evaluated as `GENE1` value
+- Missing genes don't block reaction activity calculation
 
-**Missing Gene Handling**:
-- `-n true`: Ignore missing genes in the GPR rules.
-- `-n false`: Missing genes cause reaction score to be NaN
-
-### RAS Computation
+**When `--none false` (strict mode):**
+- `(GENE1 and NaN)` → `NaN` (reaction cannot be evaluated)
+- `(GENE1 or NaN)` → `NaN` (reaction cannot be evaluated)
+- Any missing gene propagates NaN through the entire GPR expression
 
-**Example**:
-```
-GPR: (HGNC:5 and HGNC:10) or HGNC:15
-Expression: HGNC:5=10.5, HGNC:10=3.2, HGNC:15=7.9
-RAS = max(min(10.5, 3.2), 7.9) = max(3.2, 7.9) = 7.9
-```
+**Recommendation**: Use default (`true`) for datasets with incomplete gene coverage.
 
 ## Output Format
 
-### RAS Values File
-```tsv
-Reactions	Sample_1	Sample_2	Sample_3	Sample_4
-R_HEX1	8.5	9.2	12.1	11.3
-R_PGI	7.3	8.1	6.4	7.2
-R_PFK	15.2	16.8	20.1	18.9
-R_ALDOA	3.2	4.1	4.4	5.1
+```
+Reaction	Sample1	Sample2	Sample3
+R00001	125.5	98.3	142.7
+R00002	85.2	110.4	95.8
 ```
 
-**Format**:
-- First column: Reaction identifiers
-- Subsequent columns: RAS values for each sample
-- Missing values represented as "None"
+## Examples
 
-## Usage Examples
-
-### Command Line
+### Basic Usage
 
 ```bash
-# Basic usage with built-in model (after pip install)
-ras_generator \
-  -in expression_data.tsv \
-  -ra ras_output.tsv \
-  -rs ENGRO2
-
-# With custom model and strict missing gene handling
-ras_generator \
-  -in expression_data.tsv \
-  -ra ras_output.tsv \
-  -rl custom_rules.tsv \
-  -rn "CustomModel" \
-  -n false
-
-# Explicitly specify tool directory (only needed if not using pip install)
-ras_generator -td /path/to/COBRAxy \
-  -in expression_data.tsv \
-  -ra ras_output.tsv \
-  -rs ENGRO2
+ras_generator -rs ENGRO2 \
+              -in expression.tsv \
+              -ra ras_scores.tsv
 ```
 
-### Galaxy Usage
-
-1. Upload gene expression file to Galaxy
-2. Select **RAS Generator** from COBRAxy tools
-3. Configure parameters:
-   - **Input dataset**: Your expression file
-   - **Rule selector**: ENGRO2 (or other model)
-   - **Handle missing genes**: Yes/No
-4. Click **Execute**
-
-## Built-in Models
-
-### ENGRO2 (Recommended for most analyses)
-- **Scope**: Focused human metabolism
-- **Reactions**: ~500
-- **Genes**: ~500
-- **Use case**: Core metabolic analysis
-
-### Recon (Comprehensive analysis)
-- **Scope**: Complete human metabolism  
-- **Reactions**: ~10,000
-- **Genes**: ~2,000
-- **Use case**: Genome-wide metabolic studies
-
-## Gene ID Mapping
-
-COBRAxy supports multiple gene identifier formats:
-
-| Format | Example | Notes |
-|--------|---------|--------|
-| **HGNC ID** | HGNC:5 | Recommended, most stable |
-| **HGNC Symbol** | ALDOA | Human-readable but may change |
-| **Ensembl** | ENSG00000149925 | Version-specific |
-| **Entrez** | 226 | Numeric identifier |
-
-**Recommendation**: Use HGNC IDs for best compatibility and stability.
-
-
-
 ## Troubleshooting
 
-### Common Issues
-
-**"Gene not found" warnings**
-```
-Solution: Check gene ID format matches model expectations
-- Verify gene identifiers (HGNC vs symbols vs Ensembl)
-- Use gene mapping tools if needed
-- Set -n true to handle missing genes
-```
-
-**"No computable scores" error**
-```
-Solution: Insufficient gene overlap between data and model
-- Check gene ID format compatibility
-- Verify expression file format
-- Try different built-in model
-```
-
-**Empty output file**
-```
-Solution: Check input file format and permissions
-- Ensure TSV format with proper headers
-- Verify file paths are correct
-- Check write permissions for output directory
-```
-
-
-
-### Debug Mode
-
-Enable detailed logging:
-
-```bash
-ras_generator -td /path/to/COBRAxy \
-  -in expression_data.tsv \
-  -ra ras_output.tsv \
-  -rs ENGRO2 \
-  --out_log detailed_log.txt
-```
-
-Check log file for detailed error messages and processing statistics.
-
-## Validation
-
-### Check Output Quality
-
-```python
-import pandas as pd
-
-# Read RAS output
-ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0)
-
-# Basic statistics
-print(f"RAS matrix shape: {ras_df.shape}")
-print(f"Non-null values: {ras_df.count().sum()}")
-print(f"Value range: {ras_df.min().min():.2f} to {ras_df.max().max():.2f}")
-
-# Check for problematic reactions
-null_reactions = ras_df.isnull().all(axis=1).sum()
-print(f"Reactions with no data: {null_reactions}")
-```
-
+| Error | Solution |
+|-------|----------|
+| "Gene not found" | Check gene ID format |
+| "Invalid GPR" | Verify GPR rule syntax |
 
-## Integration with Other Tools
-
-### Downstream Analysis
-
-RAS output can be used with:
-
-- **[MAREA](marea.md)**: Statistical enrichment analysis
-- **[RAS to Bounds](ras-to-bounds.md)**: Flux constraint application
-- **[MAREA Cluster](marea-cluster.md)**: Sample clustering
-
-### Preprocessing Options
-
-Before RAS generation:
-- **Normalize** expression data (log2, quantile, etc.)
-- **Filter** low-expression genes
-- **Batch correct** if multiple datasets
-
-## Advanced Usage
-
-### Custom Model Integration
-
-```python
-# Create custom GPR rules
-custom_rules = {
-    'R_CUSTOM1': 'HGNC:5 and HGNC:10',
-    'R_CUSTOM2': 'HGNC:15 or HGNC:20'  
-}
+## See Also
 
-# Save as TSV
-import pandas as pd
-rules_df = pd.DataFrame(list(custom_rules.items()), 
-                       columns=['Reaction_ID', 'GPR'])
-rules_df.to_csv('custom_rules.tsv', sep='\t', index=False)
-
-# Use with RAS generator
-args = ['-rl', 'custom_rules.tsv', '-rn', 'CustomModel']
-```
-
-### Batch Processing
-
-```python
-# Process multiple expression files
-expression_files = ['data1.tsv', 'data2.tsv', 'data3.tsv']
-
-for i, exp_file in enumerate(expression_files):
-    output_file = f'ras_output_{i}.tsv'
-    
-    args = [
-        '-td', '/path/to/COBRAxy',
-        '-in', exp_file,
-        '-ra', output_file,
-        '-rs', 'ENGRO2'
-    ]
-    
-    ras_generator.main(args)
-    print(f"Processed {exp_file} → {output_file}")
-```
-
-## References
-
-- [COBRApy documentation](https://cobrapy.readthedocs.io/) - Underlying metabolic modeling
-- [GPR rules format](https://cobrapy.readthedocs.io/en/stable/getting_started.html#gene-protein-reaction-rules) - Standard format specification  
-- [HGNC database](https://www.genenames.org/) - Gene nomenclature standards
\ No newline at end of file
+- [RAS to Bounds](tools/ras-to-bounds)
+- [MAREA](tools/marea)
+- [Built-in Models](reference/built-in-models)