comparison COBRAxy/docs/troubleshooting.md @ 492:4ed95023af20 draft

Uploaded
author francesco_lapi
date Tue, 30 Sep 2025 14:02:17 +0000
parents
children fcdbc81feb45
# Troubleshooting

Common issues and solutions when using COBRAxy.

## Installation Issues

### Python Import Errors

**Problem**: `ModuleNotFoundError: No module named 'cobra'`
```bash
# Solution: Install missing dependencies
pip install cobra pandas numpy scipy

# Or reinstall COBRAxy
cd COBRAxy
pip install -e .
```

**Problem**: `ImportError: No module named 'cobraxy'`
```python
# Solution: Add COBRAxy to Python path
import sys
sys.path.insert(0, '/path/to/COBRAxy')
```

### System Dependencies

**Problem**: GLPK solver not found
```bash
# Ubuntu/Debian
sudo apt-get install libglpk40 glpk-utils
pip install swiglpk

# macOS
brew install glpk
pip install swiglpk

# Windows (using conda)
conda install -c conda-forge glpk swiglpk
```

**Problem**: SVG processing errors
```bash
# Install libvips for image processing
# Ubuntu/Debian: sudo apt-get install libvips
# macOS: brew install vips
```

## Data Format Issues

### Gene Expression Problems

**Problem**: "No computable scores" error
```
Cause: Gene IDs don't match between the data and the model
Solution:
1. Check the gene ID format (HGNC vs symbols vs Ensembl)
2. Verify that the first column contains gene identifiers
3. Ensure the file is tab-separated
4. Try a different built-in model
```
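
To make step 1 concrete, the following is a minimal sketch that guesses which identifier style dominates a list of gene IDs. The function name `guess_gene_id_format` and the regular expressions are illustrative assumptions, not part of COBRAxy; adjust the patterns to the identifiers in your data.

```python
import re
from collections import Counter

# Illustrative patterns only (assumptions, not COBRAxy's own rules).
ID_PATTERNS = {
    "HGNC ID": re.compile(r"^HGNC:\d+$"),
    "Ensembl": re.compile(r"^ENSG\d{11}(\.\d+)?$"),
}


def guess_gene_id_format(gene_ids):
    """Return the most common identifier style, or None for an empty input."""
    counts = Counter()
    for gene_id in gene_ids:
        label = "symbol/other"
        for name, pattern in ID_PATTERNS.items():
            if pattern.match(str(gene_id).strip()):
                label = name
                break
        counts[label] += 1
    return counts.most_common(1)[0][0] if counts else None


print(guess_gene_id_format(["HGNC:5", "HGNC:10", "A1BG"]))
```

If the style reported for the expression file differs from the one the chosen model expects, the identifiers probably need converting before RAS generation.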

**Problem**: Many "gene not found" warnings
```python
# Check gene overlap with model
import pickle

import pandas as pd

# Load the gene dictionary shipped with the built-in model
with open('local/pickle files/ENGRO2_genes.p', 'rb') as handle:
    genes_dict = pickle.load(handle)
model_genes = set(genes_dict['hugo_id'].keys())

# The first column of the expression file holds the gene identifiers
data_genes = set(pd.read_csv('expression.tsv', sep='\t').iloc[:, 0])

overlap = len(model_genes.intersection(data_genes))
print(f"Gene overlap: {overlap}/{len(data_genes)} ({overlap/len(data_genes)*100:.1f}%)")
```

**Problem**: File format not recognized
```tsv
# Correct format - tab-separated:
Gene_ID	Sample_1	Sample_2
HGNC:5	10.5	11.2
HGNC:10	3.2	4.1

# Wrong - comma-separated or spaces will fail
```
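
A quick way to see which separator a file actually uses is to count candidate characters in its header line. This is a rough sketch; `detect_delimiter` is a hypothetical helper, not a COBRAxy function.

```python
def detect_delimiter(header_line):
    """Guess the column separator from the first line of a file."""
    candidates = {"tab": "\t", "comma": ",", "semicolon": ";"}
    counts = {name: header_line.count(char) for name, char in candidates.items()}
    best = max(counts, key=counts.get)
    # No candidate present at all: columns are probably separated by spaces.
    return best if counts[best] > 0 else "whitespace/unknown"


print(detect_delimiter("Gene_ID\tSample_1\tSample_2"))
print(detect_delimiter("Gene_ID,Sample_1,Sample_2"))
```

To apply it to a real file, pass in the result of `open('expression.tsv').readline()`; anything other than `tab` suggests the file should be converted first.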

### Model Issues

**Problem**: Custom model not loading
```
Solution:
1. Check TSV format with "GPR" column header
2. Verify reaction IDs are unique
3. Test GPR syntax (use 'and'/'or', proper parentheses)
4. Check file permissions and encoding (UTF-8)
```
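
Step 3 can be partly automated. The sketch below checks a GPR string for unbalanced parentheses and for operators not written as lowercase `and`/`or`, assuming the lowercase spelling suggested above is what the loader expects. `check_gpr` is a hypothetical helper and does not reproduce COBRAxy's own parser.

```python
import re


def check_gpr(rule):
    """Return a list of problems found in a GPR rule string (empty if none)."""
    problems = []
    depth = 0
    for char in rule:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                problems.append("closing parenthesis without an opening one")
                break
    if depth > 0:
        problems.append("unclosed parenthesis")
    # Flag operators that are not the lowercase words 'and' / 'or'.
    for token in re.findall(r"\b(?:AND|OR|And|Or)\b|&&|\|\||[&|]", rule):
        problems.append(f"unexpected operator {token!r}; use 'and'/'or'")
    return problems


print(check_gpr("(HGNC:5 and HGNC:10) or HGNC:7"))
print(check_gpr("(HGNC:5 AND HGNC:10"))
```

Running it over the whole "GPR" column and printing only the non-empty results gives a short list of rules to inspect by hand.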

## Tool Execution Errors

### File Path Problems

**Problem**: "File not found" errors
```python
# Use absolute paths
from pathlib import Path

tool_dir = str(Path('/path/to/COBRAxy').absolute())
input_file = str(Path('expression.tsv').absolute())

args = ['-td', tool_dir, '-in', input_file, ...]
```

**Problem**: Permission denied
```bash
# Check write permissions
ls -la output_directory/

# Fix permissions
chmod 755 output_directory/
chmod 644 input_files/*
```
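
The same check can be done from Python before launching a tool, which also works where `ls` and `chmod` are unavailable. This is a small sketch using only the standard library; `check_access` is a hypothetical helper name.

```python
import os
import tempfile


def check_access(input_path, output_dir):
    """Report whether the input is readable and the output directory writable."""
    return {
        "input_readable": os.path.isfile(input_path)
        and os.access(input_path, os.R_OK),
        "output_writable": os.path.isdir(output_dir)
        and os.access(output_dir, os.W_OK | os.X_OK),
    }


# Demo on a temporary directory so the snippet runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = os.path.join(tmp_dir, "expression.tsv")
    with open(sample, "w") as handle:
        handle.write("Gene_ID\tSample_1\n")
    print(check_access(sample, tmp_dir))
```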

### Galaxy Integration Issues

**Problem**: COBRAxy tools not appearing in Galaxy
```xml
<!-- Check tool_conf.xml syntax -->
<section id="cobraxy" name="COBRAxy">
  <tool file="cobraxy/ras_generator.xml" />
</section>

<!-- Verify that the file path is correct, e.g. run: ls tools/cobraxy/ras_generator.xml -->
```
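
Both checks (well-formed XML and existing tool files) can be combined in a few lines of Python. This is a sketch under assumptions: `list_tool_files` is a hypothetical helper, and the `tools` root directory is taken from the `ls` path above and may differ in your Galaxy installation.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def list_tool_files(tool_conf_text, tools_root="tools"):
    """Parse tool_conf XML text and report whether each referenced file exists."""
    root = ET.fromstring(tool_conf_text)  # raises ET.ParseError on malformed XML
    report = {}
    for tool in root.iter("tool"):
        tool_path = Path(tools_root) / tool.get("file", "")
        report[str(tool_path)] = tool_path.is_file()
    return report


snippet = """<section id="cobraxy" name="COBRAxy">
  <tool file="cobraxy/ras_generator.xml" />
</section>"""
print(list_tool_files(snippet))
```

A `ParseError` points to a syntax problem in the configuration, while a `False` value points to a wrong or missing file path.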

**Problem**: Tool execution fails in Galaxy
```
Check Galaxy logs:
- main.log: General Galaxy issues
- handler.log: Job execution problems
- uwsgi.log: Web server issues

Common fixes:
1. Restart Galaxy after adding tools
2. Check Python environment has COBRApy installed
3. Verify file permissions on tool files
```

**Problem**: Flux sampling hangs
```bash
# Check solver availability
python -c "import cobra; print(cobra.Configuration().solver)"

# Should report a GLPK, CPLEX, or Gurobi solver interface
# Install GLPK if missing:
pip install swiglpk
```

### Large Dataset Handling

**Problem**: Cannot process large expression matrices
```python
# Process in chunks
import pandas as pd
import ras_generator  # assumes the COBRAxy tool modules are on the Python path

def process_large_dataset(expression_file, chunk_size=1000):
    df = pd.read_csv(expression_file, sep='\t')

    for i in range(0, len(df), chunk_size):
        chunk = df.iloc[i:i+chunk_size]
        chunk_file = f'chunk_{i}.tsv'
        chunk.to_csv(chunk_file, sep='\t', index=False)

        # Process chunk
        ras_generator.main(['-in', chunk_file, ...])
```
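
Note that splitting by rows puts different genes in different chunks. Because a RAS value combines the genes in a reaction's GPR rule, genes belonging to the same rule may end up in separate chunks. An alternative worth considering is to split by sample columns so that every chunk keeps all genes. The sketch below only computes the column groups; `sample_chunks` is a hypothetical helper.

```python
def sample_chunks(columns, chunk_size):
    """Yield column lists that keep the gene-ID column plus a group of samples."""
    gene_col, samples = columns[0], list(columns[1:])
    for start in range(0, len(samples), chunk_size):
        yield [gene_col] + samples[start:start + chunk_size]


header = ["Gene_ID", "Sample_1", "Sample_2", "Sample_3"]
for cols in sample_chunks(header, chunk_size=2):
    print(cols)
```

Each group can then be passed as `usecols` to `pd.read_csv`, so only those samples are loaded into memory at a time.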

## Output Validation

### Unexpected Results

**Problem**: All RAS values are zero or null
```python
# Debug gene mapping
import pandas as pd
ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0)

# Check data quality
print(f"Null percentage: {ras_df.isnull().sum().sum() / ras_df.size * 100:.1f}%")
print(f"Zero percentage: {(ras_df == 0).sum().sum() / ras_df.size * 100:.1f}%")

# Check expression data preprocessing
expr_df = pd.read_csv('expression.tsv', sep='\t', index_col=0)
print(f"Expression range: {expr_df.min().min():.2f} to {expr_df.max().max():.2f}")
```

**Problem**: RAS values seem too high/low
```
Possible causes:
1. Expression data not log-transformed
2. Wrong normalization method
3. Incorrect gene ID mapping
4. GPR rule interpretation issues

Solutions:
1. Check expression data preprocessing
2. Validate against known control genes
3. Compare with published metabolic activity patterns
```
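
For the first possible cause, a rough heuristic is to look at the largest expression value: log-scale data rarely exceeds a few tens, while raw counts or TPM values often reach the thousands. The sketch below encodes that idea; the helper name and the threshold of 50 are assumptions chosen for illustration, not values used by COBRAxy.

```python
def looks_log_transformed(values, max_threshold=50.0):
    """Heuristic check: True if the largest value is at most max_threshold."""
    # Drop missing entries; NaN is the only value not equal to itself.
    finite = [v for v in values if v is not None and v == v]
    if not finite:
        return None
    return max(finite) <= max_threshold


print(looks_log_transformed([10.5, 11.2, 3.2, 4.1]))
print(looks_log_transformed([1500.0, 20.0, 3.0]))
```

Treat the result as a hint only, and confirm it against how the data set was actually preprocessed.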

### Missing Pathway Maps

**Problem**: MAREA generates no output maps
```
Debug steps:
1. Check RAS input has non-null values
2. Verify model choice matches RAS generation
3. Check statistical significance thresholds
4. Look at log files for specific errors
```

## Environment Issues

### Conda/Virtual Environment Problems

**Problem**: Tool import fails in virtual environment
```bash
# Activate environment properly
source venv/bin/activate  # Linux/macOS
# or
venv\Scripts\activate     # Windows

# Verify COBRAxy installation
pip list | grep cobra
python -c "import cobra; print('COBRApy version:', cobra.__version__)"
```

**Problem**: Version conflicts
```bash
# Create clean environment
conda create -n cobraxy python=3.9
conda activate cobraxy

# Install COBRAxy fresh
cd COBRAxy
pip install -e .
```

### Cross-Platform Issues

**Problem**: Windows path separator issues
```python
# Use pathlib for cross-platform paths
from pathlib import Path

# Instead of: '/path/to/file'
# Use: str(Path('path') / 'to' / 'file')
```
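
To see what `pathlib` does with separators, the "pure" path classes can render the same path components for either platform regardless of where the code runs. The directory names below are placeholders.

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ("data", "expression.tsv")

# Build the same relative location under a POSIX and a Windows base directory.
posix_path = PurePosixPath("/home/user").joinpath(*parts)
windows_path = PureWindowsPath("C:/Users/user").joinpath(*parts)

print(posix_path)               # forward slashes
print(windows_path)             # backslashes
print(windows_path.as_posix())  # Windows path rendered with forward slashes
```

In normal use, plain `Path` picks the right flavour for the current operating system, so the same script works unchanged on both.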

**Problem**: Line ending issues (Windows/Unix)
```bash
# Convert line endings if needed
dos2unix input_file.tsv  # convert to Unix (LF) line endings
unix2dos input_file.tsv  # convert to Windows (CRLF) line endings
```
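
Before converting, it can help to confirm which line endings a file actually contains. This is a minimal sketch that works on raw bytes; `detect_line_endings` is a hypothetical helper.

```python
def detect_line_endings(data):
    """Classify the line endings found in a bytes object."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # bare LF, not part of a CRLF pair
    if crlf and lf:
        return "mixed"
    if crlf:
        return "CRLF (Windows)"
    if lf:
        return "LF (Unix)"
    return "none"


print(detect_line_endings(b"Gene_ID\tSample_1\r\nHGNC:5\t10.5\r\n"))
```

For a real file, read it in binary mode first, for example with `Path('input_file.tsv').read_bytes()`, so that Python does not translate the line endings while reading.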

## Debugging Strategies

### Enable Detailed Logging

```python
import logging
logging.basicConfig(level=logging.DEBUG)

# Many tools accept log file parameter
args = [..., '--out_log', 'detailed.log']
```

### Test with Small Datasets

```python
# Create minimal test case (columns must be separated by tabs)
test_data = """Gene_ID\tSample1\tSample2
HGNC:5\t10.0\t15.0
HGNC:10\t5.0\t8.0"""

with open('test_input.tsv', 'w') as f:
    f.write(test_data)

# Test basic functionality
# (assumes ras_generator is importable and tool_dir points to the COBRAxy directory)
ras_generator.main(['-td', tool_dir, '-in', 'test_input.tsv',
                    '-ra', 'test_output.tsv', '-rs', 'ENGRO2'])
```

### Check Dependencies

```python
# Verify all required packages
required_packages = ['cobra', 'pandas', 'numpy', 'scipy']

for package in required_packages:
    try:
        __import__(package)
        print(f"✓ {package}")
    except ImportError:
        print(f"✗ {package} - MISSING")
```

## Getting Help

### Information to Include in Bug Reports

When reporting issues, include:

1. **System information**:
   ```bash
   python --version
   pip list | grep cobra
   uname -a  # Linux/macOS
   ```

2. **Complete error messages**: Copy full traceback
3. **Input file format**: First few lines of input data
4. **Command/parameters used**: Exact command or Python code
5. **Expected vs actual behavior**: What should happen vs what happens
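
The system information from item 1 can also be gathered in Python, which helps on Windows where `uname` and `grep` are not available. This is a sketch using only the standard library; `collect_report_info` is a hypothetical helper, and the package list mirrors the dependencies named earlier in this guide.

```python
import platform
import sys
from importlib import metadata


def collect_report_info(packages=("cobra", "pandas", "numpy", "scipy")):
    """Collect Python, platform, and package versions for a bug report."""
    info = {"python": sys.version.split()[0], "platform": platform.platform()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


for key, value in collect_report_info().items():
    print(f"{key}: {value}")
```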

### Community Resources

- **GitHub Issues**: [Report bugs](https://github.com/CompBtBs/COBRAxy/issues)
- **Discussions**: [Ask questions](https://github.com/CompBtBs/COBRAxy/discussions)
- **COBRApy Community**: [General metabolic modeling help](https://github.com/opencobra/cobrapy)

### Self-Help Checklist

Before reporting issues:

- ✅ Checked this troubleshooting guide
- ✅ Verified installation completeness
- ✅ Tested with built-in example data
- ✅ Searched existing GitHub issues
- ✅ Tried alternative models/parameters
- ✅ Checked file formats and permissions

## Prevention Tips

### Best Practices

1. **Use virtual environments** to avoid conflicts
2. **Validate input data** before processing
3. **Start with small datasets** for testing
4. **Keep backups** of working configurations
5. **Document successful workflows** for reuse
6. **Test after updates** to catch regressions

### Data Quality Checks

```python
import numpy as np
import pandas as pd

def validate_expression_data(filename):
    """Validate gene expression file format."""
    df = pd.read_csv(filename, sep='\t')

    # Check basic format
    assert df.shape[0] > 0, "Empty file"
    assert df.shape[1] > 1, "Need at least 2 columns"

    # Check numeric data
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    assert len(numeric_cols) > 0, "No numeric expression data"

    # Check for missing values
    null_pct = df.isnull().sum().sum() / df.size * 100
    if null_pct > 50:
        print(f"Warning: {null_pct:.1f}% missing values")

    print(f"✓ File valid: {df.shape[0]} genes × {df.shape[1]-1} samples")
```

This troubleshooting guide covers the most common issues. For tool-specific problems, check the individual tool documentation pages.