Mercurial > repos > bimib > cobraxy
view COBRAxy/docs/troubleshooting.md @ 492:4ed95023af20 draft
Uploaded
author | francesco_lapi |
---|---|
date | Tue, 30 Sep 2025 14:02:17 +0000 |
parents | |
children |
line wrap: on
line source
# Troubleshooting Common issues and solutions when using COBRAxy. ## Installation Issues ### Python Import Errors **Problem**: `ModuleNotFoundError: No module named 'cobra'` ```bash # Solution: Install missing dependencies pip install cobra pandas numpy scipy # Or reinstall COBRAxy cd COBRAxy pip install -e . ``` **Problem**: `ImportError: No module named 'cobraxy'` ```python # Solution: Add COBRAxy to Python path import sys sys.path.insert(0, '/path/to/COBRAxy') ``` ### System Dependencies **Problem**: GLPK solver not found ```bash # Ubuntu/Debian sudo apt-get install libglpk40 glpk-utils pip install swiglpk # macOS brew install glpk pip install swiglpk # Windows (using conda) conda install -c conda-forge glpk swiglpk ``` **Problem**: SVG processing errors ```bash # Install libvips for image processing # Ubuntu/Debian: sudo apt-get install libvips # macOS: brew install vips ``` ## Data Format Issues ### Gene Expression Problems **Problem**: "No computable scores" error ``` Cause: Gene IDs don't match between data and model Solution: 1. Check gene ID format (HGNC vs symbols vs Ensembl) 2. Verify first column contains gene identifiers 3. Ensure tab-separated format 4. Try different built-in model ``` **Problem**: Many "gene not found" warnings ```python # Check gene overlap with model import pickle genes_dict = pickle.load(open('local/pickle files/ENGRO2_genes.p', 'rb')) model_genes = set(genes_dict['hugo_id'].keys()) import pandas as pd data_genes = set(pd.read_csv('expression.tsv', sep='\t').iloc[:, 0]) overlap = len(model_genes.intersection(data_genes)) print(f"Gene overlap: {overlap}/{len(data_genes)} ({overlap/len(data_genes)*100:.1f}%)") ``` **Problem**: File format not recognized ```tsv # Correct format - tab-separated: Gene_ID Sample_1 Sample_2 HGNC:5 10.5 11.2 HGNC:10 3.2 4.1 # Wrong - comma-separated or spaces will fail ``` ### Model Issues **Problem**: Custom model not loading ``` Solution: 1. Check TSV format with "GPR" column header 2. Verify reaction IDs are unique 3. Test GPR syntax (use 'and'/'or', proper parentheses) 4. Check file permissions and encoding (UTF-8) ``` ## Tool Execution Errors ### File Path Problems **Problem**: "File not found" errors ```python # Use absolute paths from pathlib import Path tool_dir = str(Path('/path/to/COBRAxy').absolute()) input_file = str(Path('expression.tsv').absolute()) args = ['-td', tool_dir, '-in', input_file, ...] ``` **Problem**: Permission denied ```bash # Check write permissions ls -la output_directory/ # Fix permissions chmod 755 output_directory/ chmod 644 input_files/* ``` ### Galaxy Integration Issues **Problem**: COBRAxy tools not appearing in Galaxy ```xml <!-- Check tool_conf.xml syntax --> <section id="cobraxy" name="COBRAxy"> <tool file="cobraxy/ras_generator.xml" /> </section> <!-- Verify file paths are correct --> ls tools/cobraxy/ras_generator.xml ``` **Problem**: Tool execution fails in Galaxy ``` Check Galaxy logs: - main.log: General Galaxy issues - handler.log: Job execution problems - uwsgi.log: Web server issues Common fixes: 1. Restart Galaxy after adding tools 2. Check Python environment has COBRApy installed 3. Verify file permissions on tool files ``` **Problem**: Flux sampling hangs ```bash # Check solver availability python -c "import cobra; print(cobra.Configuration().solver)" # Should show: glpk, cplex, or gurobi # Install GLPK if missing: pip install swiglpk ``` ### Large Dataset Handling **Problem**: Cannot process large expression matrices ```python # Process in chunks def process_large_dataset(expression_file, chunk_size=1000): df = pd.read_csv(expression_file, sep='\t') for i in range(0, len(df), chunk_size): chunk = df.iloc[i:i+chunk_size] chunk_file = f'chunk_{i}.tsv' chunk.to_csv(chunk_file, sep='\t', index=False) # Process chunk ras_generator.main(['-in', chunk_file, ...]) ``` ## Output Validation ### Unexpected Results **Problem**: All RAS values are zero or null ```python # Debug gene mapping import pandas as pd ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0) # Check data quality print(f"Null percentage: {ras_df.isnull().sum().sum() / ras_df.size * 100:.1f}%") print(f"Zero percentage: {(ras_df == 0).sum().sum() / ras_df.size * 100:.1f}%") # Check expression data preprocessing expr_df = pd.read_csv('expression.tsv', sep='\t', index_col=0) print(f"Expression range: {expr_df.min().min():.2f} to {expr_df.max().max():.2f}") ``` **Problem**: RAS values seem too high/low ``` Possible causes: 1. Expression data not log-transformed 2. Wrong normalization method 3. Incorrect gene ID mapping 4. GPR rule interpretation issues Solutions: 1. Check expression data preprocessing 2. Validate against known control genes 3. Compare with published metabolic activity patterns ``` ### Missing Pathway Maps **Problem**: MAREA generates no output maps ``` Debug steps: 1. Check RAS input has non-null values 2. Verify model choice matches RAS generation 3. Check statistical significance thresholds 4. Look at log files for specific errors ``` ## Environment Issues ### Conda/Virtual Environment Problems **Problem**: Tool import fails in virtual environment ```bash # Activate environment properly source venv/bin/activate # Linux/macOS # or venv\Scripts\activate # Windows # Verify COBRAxy installation pip list | grep cobra python -c "import cobra; print('COBRApy version:', cobra.__version__)" ``` **Problem**: Version conflicts ```bash # Create clean environment conda create -n cobraxy python=3.9 conda activate cobraxy # Install COBRAxy fresh cd COBRAxy pip install -e . ``` ### Cross-Platform Issues **Problem**: Windows path separator issues ```python # Use pathlib for cross-platform paths from pathlib import Path # Instead of: '/path/to/file' # Use: str(Path('path') / 'to' / 'file') ``` **Problem**: Line ending issues (Windows/Unix) ```bash # Convert line endings if needed dos2unix input_file.tsv # Unix unix2dos input_file.tsv # Windows ``` ## Debugging Strategies ### Enable Detailed Logging ```python import logging logging.basicConfig(level=logging.DEBUG) # Many tools accept log file parameter args = [..., '--out_log', 'detailed.log'] ``` ### Test with Small Datasets ```python # Create minimal test case test_data = """Gene_ID Sample1 Sample2 HGNC:5 10.0 15.0 HGNC:10 5.0 8.0""" with open('test_input.tsv', 'w') as f: f.write(test_data) # Test basic functionality ras_generator.main(['-td', tool_dir, '-in', 'test_input.tsv', '-ra', 'test_output.tsv', '-rs', 'ENGRO2']) ``` ### Check Dependencies ```python # Verify all required packages required_packages = ['cobra', 'pandas', 'numpy', 'scipy'] for package in required_packages: try: __import__(package) print(f"✓ {package}") except ImportError: print(f"✗ {package} - MISSING") ``` ## Getting Help ### Information to Include in Bug Reports When reporting issues, include: 1. **System information**: ```bash python --version pip list | grep cobra uname -a # Linux/macOS ``` 2. **Complete error messages**: Copy full traceback 3. **Input file format**: First few lines of input data 4. **Command/parameters used**: Exact command or Python code 5. **Expected vs actual behavior**: What should happen vs what happens ### Community Resources - **GitHub Issues**: [Report bugs](https://github.com/CompBtBs/COBRAxy/issues) - **Discussions**: [Ask questions](https://github.com/CompBtBs/COBRAxy/discussions) - **COBRApy Community**: [General metabolic modeling help](https://github.com/opencobra/cobrapy) ### Self-Help Checklist Before reporting issues: - ✅ Checked this troubleshooting guide - ✅ Verified installation completeness - ✅ Tested with built-in example data - ✅ Searched existing GitHub issues - ✅ Tried alternative models/parameters - ✅ Checked file formats and permissions ## Prevention Tips ### Best Practices 1. **Use virtual environments** to avoid conflicts 2. **Validate input data** before processing 3. **Start with small datasets** for testing 4. **Keep backups** of working configurations 5. **Document successful workflows** for reuse 6. **Test after updates** to catch regressions ### Data Quality Checks ```python def validate_expression_data(filename): """Validate gene expression file format.""" df = pd.read_csv(filename, sep='\t') # Check basic format assert df.shape[0] > 0, "Empty file" assert df.shape[1] > 1, "Need at least 2 columns" # Check numeric data numeric_cols = df.select_dtypes(include=[np.number]).columns assert len(numeric_cols) > 0, "No numeric expression data" # Check for missing values null_pct = df.isnull().sum().sum() / df.size * 100 if null_pct > 50: print(f"Warning: {null_pct:.1f}% missing values") print(f"✓ File valid: {df.shape[0]} genes × {df.shape[1]-1} samples") ``` This troubleshooting guide covers the most common issues. For tool-specific problems, check the individual tool documentation pages.