diff COBRAxy/docs/troubleshooting.md @ 492:4ed95023af20 draft
Uploaded
author | francesco_lapi |
---|---|
date | Tue, 30 Sep 2025 14:02:17 +0000 |
parents | |
children |
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/COBRAxy/docs/troubleshooting.md	Tue Sep 30 14:02:17 2025 +0000
@@ -0,0 +1,380 @@
+# Troubleshooting
+
+Common issues and solutions when using COBRAxy.
+
+## Installation Issues
+
+### Python Import Errors
+
+**Problem**: `ModuleNotFoundError: No module named 'cobra'`
+```bash
+# Solution: Install missing dependencies
+pip install cobra pandas numpy scipy
+
+# Or reinstall COBRAxy
+cd COBRAxy
+pip install -e .
+```
+
+**Problem**: `ImportError: No module named 'cobraxy'`
+```python
+# Solution: Add COBRAxy to Python path
+import sys
+sys.path.insert(0, '/path/to/COBRAxy')
+```
+
+### System Dependencies
+
+**Problem**: GLPK solver not found
+```bash
+# Ubuntu/Debian
+sudo apt-get install libglpk40 glpk-utils
+pip install swiglpk
+
+# macOS
+brew install glpk
+pip install swiglpk
+
+# Windows (using conda)
+conda install -c conda-forge glpk swiglpk
+```
+
+**Problem**: SVG processing errors
+```bash
+# Install libvips for image processing
+# Ubuntu/Debian: sudo apt-get install libvips
+# macOS: brew install vips
+```
+
+## Data Format Issues
+
+### Gene Expression Problems
+
+**Problem**: "No computable scores" error
+```
+Cause: Gene IDs don't match between data and model
+Solution:
+1. Check gene ID format (HGNC vs symbols vs Ensembl)
+2. Verify first column contains gene identifiers
+3. Ensure tab-separated format
+4. Try different built-in model
+```
+
+**Problem**: Many "gene not found" warnings
+```python
+# Check gene overlap with model
+import pickle
+genes_dict = pickle.load(open('local/pickle files/ENGRO2_genes.p', 'rb'))
+model_genes = set(genes_dict['hugo_id'].keys())
+
+import pandas as pd
+data_genes = set(pd.read_csv('expression.tsv', sep='\t').iloc[:, 0])
+
+overlap = len(model_genes.intersection(data_genes))
+print(f"Gene overlap: {overlap}/{len(data_genes)} ({overlap/len(data_genes)*100:.1f}%)")
+```
+
+**Problem**: File format not recognized
+```tsv
+# Correct format - tab-separated:
+Gene_ID	Sample_1	Sample_2
+HGNC:5	10.5	11.2
+HGNC:10	3.2	4.1
+
+# Wrong - comma-separated or spaces will fail
+```
+
+### Model Issues
+
+**Problem**: Custom model not loading
+```
+Solution:
+1. Check TSV format with "GPR" column header
+2. Verify reaction IDs are unique
+3. Test GPR syntax (use 'and'/'or', proper parentheses)
+4. Check file permissions and encoding (UTF-8)
+```
+
+## Tool Execution Errors
+
+
+
+### File Path Problems
+
+**Problem**: "File not found" errors
+```python
+# Use absolute paths
+from pathlib import Path
+
+tool_dir = str(Path('/path/to/COBRAxy').absolute())
+input_file = str(Path('expression.tsv').absolute())
+
+args = ['-td', tool_dir, '-in', input_file, ...]
+```
+
+**Problem**: Permission denied
+```bash
+# Check write permissions
+ls -la output_directory/
+
+# Fix permissions
+chmod 755 output_directory/
+chmod 644 input_files/*
+```
+
+### Galaxy Integration Issues
+
+**Problem**: COBRAxy tools not appearing in Galaxy
+```xml
+<!-- Check tool_conf.xml syntax -->
+<section id="cobraxy" name="COBRAxy">
+  <tool file="cobraxy/ras_generator.xml" />
+</section>
+
+<!-- Verify file paths are correct -->
+ls tools/cobraxy/ras_generator.xml
+```
+
+**Problem**: Tool execution fails in Galaxy
+```
+Check Galaxy logs:
+- main.log: General Galaxy issues
+- handler.log: Job execution problems
+- uwsgi.log: Web server issues
+
+Common fixes:
+1. Restart Galaxy after adding tools
+2. Check Python environment has COBRApy installed
+3. Verify file permissions on tool files
+```
+
+
+
+**Problem**: Flux sampling hangs
+```bash
+# Check solver availability
+python -c "import cobra; print(cobra.Configuration().solver)"
+
+# Should show: glpk, cplex, or gurobi
+# Install GLPK if missing:
+pip install swiglpk
+```
+
+### Large Dataset Handling
+
+**Problem**: Cannot process large expression matrices
+```python
+# Process in chunks
+def process_large_dataset(expression_file, chunk_size=1000):
+    df = pd.read_csv(expression_file, sep='\t')
+
+    for i in range(0, len(df), chunk_size):
+        chunk = df.iloc[i:i+chunk_size]
+        chunk_file = f'chunk_{i}.tsv'
+        chunk.to_csv(chunk_file, sep='\t', index=False)
+
+        # Process chunk
+        ras_generator.main(['-in', chunk_file, ...])
+```
+
+## Output Validation
+
+### Unexpected Results
+
+**Problem**: All RAS values are zero or null
+```python
+# Debug gene mapping
+import pandas as pd
+ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0)
+
+# Check data quality
+print(f"Null percentage: {ras_df.isnull().sum().sum() / ras_df.size * 100:.1f}%")
+print(f"Zero percentage: {(ras_df == 0).sum().sum() / ras_df.size * 100:.1f}%")
+
+# Check expression data preprocessing
+expr_df = pd.read_csv('expression.tsv', sep='\t', index_col=0)
+print(f"Expression range: {expr_df.min().min():.2f} to {expr_df.max().max():.2f}")
+```
+
+**Problem**: RAS values seem too high/low
+```
+Possible causes:
+1. Expression data not log-transformed
+2. Wrong normalization method
+3. Incorrect gene ID mapping
+4. GPR rule interpretation issues
+
+Solutions:
+1. Check expression data preprocessing
+2. Validate against known control genes
+3. Compare with published metabolic activity patterns
+```
+
+### Missing Pathway Maps
+
+**Problem**: MAREA generates no output maps
+```
+Debug steps:
+1. Check RAS input has non-null values
+2. Verify model choice matches RAS generation
+3. Check statistical significance thresholds
+4. Look at log files for specific errors
+```
+
+## Environment Issues
+
+### Conda/Virtual Environment Problems
+
+**Problem**: Tool import fails in virtual environment
+```bash
+# Activate environment properly
+source venv/bin/activate  # Linux/macOS
+# or
+venv\Scripts\activate  # Windows
+
+# Verify COBRAxy installation
+pip list | grep cobra
+python -c "import cobra; print('COBRApy version:', cobra.__version__)"
+```
+
+**Problem**: Version conflicts
+```bash
+# Create clean environment
+conda create -n cobraxy python=3.9
+conda activate cobraxy
+
+# Install COBRAxy fresh
+cd COBRAxy
+pip install -e .
+```
+
+### Cross-Platform Issues
+
+**Problem**: Windows path separator issues
+```python
+# Use pathlib for cross-platform paths
+from pathlib import Path
+
+# Instead of: '/path/to/file'
+# Use: str(Path('path') / 'to' / 'file')
+```
+
+**Problem**: Line ending issues (Windows/Unix)
+```bash
+# Convert line endings if needed
+dos2unix input_file.tsv  # Unix
+unix2dos input_file.tsv  # Windows
+```
+
+## Debugging Strategies
+
+### Enable Detailed Logging
+
+```python
+import logging
+logging.basicConfig(level=logging.DEBUG)
+
+# Many tools accept log file parameter
+args = [..., '--out_log', 'detailed.log']
+```
+
+### Test with Small Datasets
+
+```python
+# Create minimal test case
+test_data = """Gene_ID	Sample1	Sample2
+HGNC:5	10.0	15.0
+HGNC:10	5.0	8.0"""
+
+with open('test_input.tsv', 'w') as f:
+    f.write(test_data)
+
+# Test basic functionality
+ras_generator.main(['-td', tool_dir, '-in', 'test_input.tsv',
+                    '-ra', 'test_output.tsv', '-rs', 'ENGRO2'])
+```
+
+### Check Dependencies
+
+```python
+# Verify all required packages
+required_packages = ['cobra', 'pandas', 'numpy', 'scipy']
+
+for package in required_packages:
+    try:
+        __import__(package)
+        print(f"✓ {package}")
+    except ImportError:
+        print(f"✗ {package} - MISSING")
+```
+
+## Getting Help
+
+### Information to Include in Bug Reports
+
+When reporting issues, include:
+
+1. **System information**:
+   ```bash
+   python --version
+   pip list | grep cobra
+   uname -a  # Linux/macOS
+   ```
+
+2. **Complete error messages**: Copy full traceback
+3. **Input file format**: First few lines of input data
+4. **Command/parameters used**: Exact command or Python code
+5. **Expected vs actual behavior**: What should happen vs what happens
+
+### Community Resources
+
+- **GitHub Issues**: [Report bugs](https://github.com/CompBtBs/COBRAxy/issues)
+- **Discussions**: [Ask questions](https://github.com/CompBtBs/COBRAxy/discussions)
+- **COBRApy Community**: [General metabolic modeling help](https://github.com/opencobra/cobrapy)
+
+### Self-Help Checklist
+
+Before reporting issues:
+
+- ✅ Checked this troubleshooting guide
+- ✅ Verified installation completeness
+- ✅ Tested with built-in example data
+- ✅ Searched existing GitHub issues
+- ✅ Tried alternative models/parameters
+- ✅ Checked file formats and permissions
+
+## Prevention Tips
+
+### Best Practices
+
+1. **Use virtual environments** to avoid conflicts
+2. **Validate input data** before processing
+3. **Start with small datasets** for testing
+4. **Keep backups** of working configurations
+5. **Document successful workflows** for reuse
+6. **Test after updates** to catch regressions
+
+### Data Quality Checks
+
+```python
+def validate_expression_data(filename):
+    """Validate gene expression file format."""
+    df = pd.read_csv(filename, sep='\t')
+
+    # Check basic format
+    assert df.shape[0] > 0, "Empty file"
+    assert df.shape[1] > 1, "Need at least 2 columns"
+
+    # Check numeric data
+    numeric_cols = df.select_dtypes(include=[np.number]).columns
+    assert len(numeric_cols) > 0, "No numeric expression data"
+
+    # Check for missing values
+    null_pct = df.isnull().sum().sum() / df.size * 100
+    if null_pct > 50:
+        print(f"Warning: {null_pct:.1f}% missing values")
+
+    print(f"✓ File valid: {df.shape[0]} genes × {df.shape[1]-1} samples")
+```
+
+This troubleshooting guide covers the most common issues. For tool-specific problems, check the individual tool documentation pages.
\ No newline at end of file