comparison COBRAxy/docs/tools/flux-simulation.md @ 492:4ed95023af20 draft

Uploaded
author francesco_lapi
date Tue, 30 Sep 2025 14:02:17 +0000
parents
children fcdbc81feb45
491:7a413a5ec566 492:4ed95023af20
1 # Flux Simulation
2
3 Sample metabolic fluxes from constraint-based models with the CBS or OPTGP algorithm.
4
5 ## Overview
6
7 Flux Simulation samples metabolic flux distributions from constrained models. It supports two sampling algorithms (CBS and OPTGP), reports summary statistics of the sampled fluxes (mean, median, quantiles), and can optionally run pFBA, FVA, and sensitivity analysis.
8
9 ## Usage
10
11 ### Command Line
12
13 ```bash
14 flux_simulation -td /path/to/COBRAxy \
15 -ms ENGRO2 \
16 -in bounds1.tsv,bounds2.tsv \
17 -ni Sample1,Sample2 \
18 -a CBS \
19 -ns 1000 \
20 -nb 1 \
21 -sd 42 \
22 -ot mean,median,quantiles \
23 -ota pFBA,FVA,sensitivity \
24 -idop flux_results/
25 ```
26
27 ### Galaxy Interface
28
29 Select "Flux Simulation" from the COBRAxy tool suite and configure sampling parameters through the web interface.
30
31 ## Parameters
32
33 ### Required Parameters
34
35 | Parameter | Flag | Description |
36 |-----------|------|-------------|
37 | Tool Directory | `-td, --tool_dir` | Path to COBRAxy installation directory |
38 | Input Bounds | `-in, --input` | Comma-separated list of bounds files |
39 | Sample Names | `-ni, --names` | Comma-separated sample names |
40 | Algorithm | `-a, --algorithm` | Sampling algorithm (CBS or OPTGP) |
41 | Number of Samples | `-ns, --n_samples` | Samples per batch |
42 | Number of Batches | `-nb, --n_batches` | Number of sampling batches |
43 | Random Seed | `-sd, --seed` | Random seed for reproducibility |
44 | Output Types | `-ot, --output_type` | Flux statistics to compute |
45
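When the tool is driven from a script, the required flags in the table above can be assembled programmatically. The following is a minimal sketch, not part of COBRAxy: the helper name `build_flux_simulation_args` is hypothetical, and the check that each bounds file has exactly one sample name is an assumption based on the examples in this page.

```python
def build_flux_simulation_args(tool_dir, bounds_files, sample_names, algorithm,
                               n_samples, n_batches, seed, output_types):
    """Return an argv list for flux_simulation using only the documented flags."""
    # Assumption: one sample name per bounds file, in the same order.
    if len(bounds_files) != len(sample_names):
        raise ValueError("need exactly one sample name per bounds file")
    if algorithm not in ("CBS", "OPTGP"):
        raise ValueError("algorithm must be CBS or OPTGP")
    return [
        "flux_simulation",
        "-td", tool_dir,
        "-in", ",".join(bounds_files),    # comma-separated list of bounds files
        "-ni", ",".join(sample_names),    # comma-separated sample names
        "-a", algorithm,
        "-ns", str(n_samples),
        "-nb", str(n_batches),
        "-sd", str(seed),
        "-ot", ",".join(output_types),
    ]


args = build_flux_simulation_args(
    tool_dir="/path/to/COBRAxy",
    bounds_files=["bounds1.tsv", "bounds2.tsv"],
    sample_names=["Sample1", "Sample2"],
    algorithm="CBS",
    n_samples=1000,
    n_batches=1,
    seed=42,
    output_types=["mean", "median", "quantiles"],
)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run(args, check=True)`; building a list rather than a shell string avoids quoting problems with paths that contain spaces.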
46 ### Model Parameters
47
48 | Parameter | Flag | Description | Default |
49 |-----------|------|-------------|---------|
50 | Model Selector | `-ms, --model_selector` | Built-in model (ENGRO2, Custom) | ENGRO2 |
51 | Custom Model | `-mo, --model` | Path to custom SBML model | - |
52 | Model Name | `-mn, --model_name` | Custom model filename | - |
53
54 ### Sampling Parameters
55
56 | Parameter | Flag | Description | Default |
57 |-----------|------|-------------|---------|
58 | Algorithm | `-a, --algorithm` | CBS or OPTGP | - |
59 | Thinning | `-th, --thinning` | OPTGP thinning parameter | 100 |
60 | Samples | `-ns, --n_samples` | Samples per batch | - |
61 | Batches | `-nb, --n_batches` | Number of batches | - |
62 | Seed | `-sd, --seed` | Random seed | - |
63
64 ### Output Parameters
65
66 | Parameter | Flag | Description | Options / Default |
67 |-----------|------|-------------|---------|
68 | Output Types | `-ot, --output_type` | Flux statistics | mean,median,quantiles,fluxes |
69 | Analysis Types | `-ota, --output_type_analysis` | Additional analyses | pFBA,FVA,sensitivity |
70 | Output Path | `-idop, --output_path` | Results directory | flux_simulation/ |
71 | Output Log | `-ol, --out_log` | Log file path | - |
72
73 ## Algorithms
74
75 ### CBS (Corner-Based Sampling)
76
77 **Method**: Random objective function optimization
78 - Generates random linear combinations of reaction fluxes to use as objective functions
79 - Optimizes each objective with an LP solver (GLPK preferred, COBRApy fallback)
80 - Fast and memory-efficient
81 - Suitable for large models
82
83 **Advantages**:
84 - High performance with GLPK
85 - Good coverage of solution space
86 - Robust to model size
87
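The random-objective idea can be illustrated with a deliberately simplified toy. The sketch below is an assumption-laden illustration, not the COBRAxy implementation: it ignores the stoichiometric steady-state constraint and optimizes only over the reaction bounds, so no LP solver is needed. A real run would solve a full LP for every random objective. The function names are hypothetical.

```python
import random


def random_objective_corner(bounds, rng):
    """Maximize one random linear objective over a box of flux bounds.

    Toy simplification: with bounds only (no steady-state constraint), each
    flux is optimized independently, so the optimum is a corner of the box.
    """
    corner = {}
    for reaction, (lower, upper) in bounds.items():
        weight = rng.uniform(-1.0, 1.0)  # random objective coefficient
        # A positive weight pushes the flux to its upper bound,
        # a non-positive weight to its lower bound.
        corner[reaction] = upper if weight > 0 else lower
    return corner


def sample_corners(bounds, n_samples, seed):
    """Draw n_samples corners, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [random_objective_corner(bounds, rng) for _ in range(n_samples)]


toy_bounds = {
    "R00001": (-1000.0, 1250.5),
    "R00002": (-650.2, 1000.0),
    "R00003": (0.0, 2150.8),
}
corners = sample_corners(toy_bounds, n_samples=5, seed=42)
for corner in corners:
    print(corner)
```

Every sample produced this way sits on the boundary of the feasible region, which is why averaging many such samples is the usual way to summarize them.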
88 ### OPTGP (Optimized General Parallel Sampler)
89
90 **Method**: MCMC-based sampling
91 - Markov chain Monte Carlo sampler built on artificial centering hit-and-run (ACHR)
92 - Requires thinning to reduce autocorrelation between consecutive samples
93 - More computationally intensive than CBS
94 - Converges toward a uniform distribution over the feasible flux space
95
96 **Advantages**:
97 - Approximately uniform sampling once the chain has converged
98 - Well-established method
99 - Good for smaller models
100
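To make the role of thinning concrete, here is a minimal sketch of a plain hit-and-run sampler over a box of flux bounds. It is not OPTGP itself and omits the steady-state constraint; it only shows how a chain moves and how keeping every `thinning`-th point (the role of `-th`) reduces correlation between retained samples. All names are hypothetical.

```python
import random


def hit_and_run_box(lower, upper, n_samples, thinning, seed):
    """Plain hit-and-run inside a box, keeping every `thinning`-th point."""
    rng = random.Random(seed)
    n = len(lower)
    # Start from the center of the box, which is feasible.
    point = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
    kept = []
    for step in range(1, n_samples * thinning + 1):
        direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Find the range of t for which point + t * direction stays in the box.
        t_min, t_max = float("-inf"), float("inf")
        for p, d, lo, hi in zip(point, direction, lower, upper):
            if d == 0.0:
                continue
            a, b = (lo - p) / d, (hi - p) / d
            t_min = max(t_min, min(a, b))
            t_max = min(t_max, max(a, b))
        t = rng.uniform(t_min, t_max)
        # Clamp to the bounds to absorb floating-point rounding.
        point = [min(max(p + t * d, lo), hi)
                 for p, d, lo, hi in zip(point, direction, lower, upper)]
        if step % thinning == 0:
            kept.append(list(point))
    return kept


samples = hit_and_run_box([-1.0, 0.0], [1.0, 2.0],
                          n_samples=50, thinning=10, seed=42)
print(len(samples), samples[0])
```

With `thinning=10` the chain takes 500 steps to return 50 samples; a larger value costs more steps per retained sample but makes consecutive samples less dependent on each other.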
101 ## Input Formats
102
103 ### Bounds Files
104
105 Tab-separated format with reaction bounds:
106
107 ```
108 Reaction lower_bound upper_bound
109 R00001 -1000.0 1250.5
110 R00002 -650.2 1000.0
111 R00003 0.0 2150.8
112 ```
113
114 Multiple bounds files can be processed in a single run by passing comma-separated paths to `-in`, with one sample name per file in `-ni`.
115
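Before sampling, it can help to check that a bounds file parses and that no lower bound exceeds its upper bound. The following sketch assumes the tab-separated layout and column names shown above (`Reaction`, `lower_bound`, `upper_bound`); the function name `read_bounds` is hypothetical and not part of COBRAxy.

```python
import csv
import io


def read_bounds(handle):
    """Parse a tab-separated bounds table into {reaction: (lower, upper)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    bounds = {}
    for row in reader:
        lower = float(row["lower_bound"])
        upper = float(row["upper_bound"])
        if lower > upper:
            raise ValueError(
                f"{row['Reaction']}: lower bound {lower} exceeds upper bound {upper}"
            )
        bounds[row["Reaction"]] = (lower, upper)
    return bounds


# In-memory stand-in for a bounds file, using the values from the example above.
example = io.StringIO(
    "Reaction\tlower_bound\tupper_bound\n"
    "R00001\t-1000.0\t1250.5\n"
    "R00002\t-650.2\t1000.0\n"
    "R00003\t0.0\t2150.8\n"
)
bounds = read_bounds(example)
print(bounds)
```

For a real file, open it with `open(path, newline="")` and pass the handle to `read_bounds`.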
116 ### Custom Model File (Optional)
117
118 SBML format metabolic model compatible with COBRApy.
119
120 ## Output Formats
121
122 ### Flux Statistics
123
124 #### Mean Fluxes (`mean.csv`)
125 ```
126 Reaction Sample1 Sample2 Sample3
127 R00001 15.23 -8.45 22.1
128 R00002 0.0 12.67 -5.3
129 R00003 45.8 38.2 51.7
130 ```
131
132 #### Median Fluxes (`median.csv`)
133 ```
134 Reaction Sample1 Sample2 Sample3
135 R00001 14.1 -7.8 21.5
136 R00002 0.0 11.9 -4.8
137 R00003 44.2 37.1 50.3
138 ```
139
140 #### Quantiles (`quantiles.csv`)
141 ```
142 Reaction Sample1_q1 Sample1_q2 Sample1_q3 Sample2_q1 ...
143 R00001 10.5 14.1 18.7 -12.3 ...
144 R00002 -2.1 0.0 1.8 8.9 ...
145 R00003 38.9 44.2 49.8 32.1 ...
146 ```
147
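The three statistics above can be reproduced from raw samples with the standard library. This is a sketch under assumptions: `q1`, `q2`, and `q3` are taken to be the quartiles, and the interpolation method chosen here may differ from the one the tool uses, so small numerical differences are possible. The function name `summarize` is hypothetical.

```python
import statistics


def summarize(samples):
    """samples: {reaction: [sampled flux values]} -> per-reaction statistics."""
    summary = {}
    for reaction, values in samples.items():
        # Assumption: q1/q2/q3 are the 25th, 50th and 75th percentiles.
        q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[reaction] = {
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            "q1": q1,
            "q2": q2,
            "q3": q3,
        }
    return summary


toy_samples = {"R00001": [1.0, 2.0, 3.0, 4.0, 5.0]}
summary = summarize(toy_samples)
print(summary["R00001"])
```

Note that `q2` and the median coincide by definition, which is a quick sanity check when comparing `quantiles.csv` against `median.csv`.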
148 ### Additional Analyses
149
150 #### pFBA (`pFBA.csv`)
151 Parsimonious Flux Balance Analysis results:
152 ```
153 Reaction Sample1 Sample2 Sample3
154 R00001 12.5 -6.7 19.3
155 R00002 0.0 8.9 -3.2
156 R00003 41.2 35.8 47.9
157 ```
158
159 #### FVA (`FVA.csv`)
160 Flux Variability Analysis bounds:
161 ```
162 Reaction Sample1_min Sample1_max Sample2_min Sample2_max ...
163 R00001 -5.2 35.8 -25.3 8.7 ...
164 R00002 -8.9 8.9 0.0 28.4 ...
165 R00003 15.6 78.3 10.2 65.9 ...
166 ```
167
168 #### Sensitivity (`sensitivity.csv`)
169 Single reaction deletion effects:
170 ```
171 Reaction Sample1 Sample2 Sample3
172 R00001 0.98 0.95 0.97
173 R00002 1.0 0.87 1.0
174 R00003 0.23 0.19 0.31
175 ```
176
177 ## Examples
178
179 ### Basic CBS Sampling
180
181 ```bash
182 # Simple CBS sampling with statistics
183 flux_simulation -td /opt/COBRAxy \
184 -ms ENGRO2 \
185 -in sample1_bounds.tsv,sample2_bounds.tsv \
186 -ni Sample1,Sample2 \
187 -a CBS \
188 -ns 500 \
189 -nb 2 \
190 -sd 42 \
191 -ot mean,median \
192 -ota pFBA \
193 -idop cbs_results/
194 ```
195
196 ### Comprehensive OPTGP Analysis
197
198 ```bash
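199 # Full analysis with OPTGP (the glob is joined with commas because -in expects a comma-separated list; list -ni names in the same order as the files)
200 flux_simulation -td /opt/COBRAxy \
201 -ms ENGRO2 \
202 -in "$(printf '%s\n' bounds/*.tsv | paste -sd, -)" \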
203 -ni Sample1,Sample2,Sample3,Control1,Control2 \
204 -a OPTGP \
205 -th 200 \
206 -ns 1000 \
207 -nb 1 \
208 -sd 123 \
209 -ot mean,median,quantiles,fluxes \
210 -ota pFBA,FVA,sensitivity \
211 -idop comprehensive_analysis/ \
212 -ol sampling.log
213 ```
214
215 ### Custom Model Sampling
216
217 ```bash
218 # Use custom model with CBS
219 flux_simulation -td /opt/COBRAxy \
220 -ms Custom \
221 -mo models/tissue_specific.xml \
222 -mn tissue_specific.xml \
223 -in patient_bounds.tsv \
224 -ni PatientA \
225 -a CBS \
226 -ns 2000 \
227 -nb 5 \
228 -sd 456 \
229 -ot mean,quantiles \
230 -ota FVA,sensitivity \
231 -idop patient_analysis/
232 ```
233
234 ### Batch Processing Multiple Conditions
235
236 ```bash
237 # Process multiple experimental conditions
238 flux_simulation -td /opt/COBRAxy \
239 -ms ENGRO2 \
240 -in ctrl1.tsv,ctrl2.tsv,treat1.tsv,treat2.tsv \
241 -ni Control1,Control2,Treatment1,Treatment2 \
242 -a CBS \
243 -ns 800 \
244 -nb 3 \
245 -sd 789 \
246 -ot mean,median,fluxes \
247 -ota pFBA,FVA \
248 -idop batch_conditions/
249 ```
250
251 ## Algorithm Selection Guide
252
253 ### Choose CBS When:
254 - Large models (>1000 reactions)
255 - High sample throughput required
256 - GLPK solver available
257 - Memory constraints present
258
259 ### Choose OPTGP When:
260 - Theoretical sampling guarantees needed
261 - Smaller models (<500 reactions)
262 - Sufficient computational resources
263 - Publication-quality sampling required
264
265 ## Performance Optimization
266
267 ### CBS Optimization
268 - Install GLPK and swiglpk for maximum performance
269 - Increase batch number rather than samples per batch
270 - Monitor memory usage for large models
271
272 ### OPTGP Optimization
273 - Adjust thinning based on model size (100-500)
274 - Use parallel processing when available
275 - Consider warmup period for chain convergence
276
277 ### General Tips
278 - Use appropriate sample sizes (500-2000 per condition)
279 - Balance batches vs samples for memory management
280 - Set consistent random seeds for reproducibility
281
282 ## Quality Control
283
284 ### Convergence Assessment
285 - Compare statistics across batches
286 - Check for systematic trends in sampling
287 - Validate against known flux ranges
288
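Comparing statistics across batches can be as simple as looking at how far each batch mean strays from the average of all batch means. The sketch below works on the raw samples of a single reaction and is only an illustration; the function name and the choice of maximum absolute deviation as the spread measure are assumptions, not something the tool reports.

```python
import statistics


def batch_mean_spread(batches):
    """batches: one list of sampled flux values per batch, for ONE reaction.

    Returns the per-batch means and the largest absolute deviation of any
    batch mean from the average of the batch means.
    """
    means = [statistics.fmean(batch) for batch in batches]
    center = statistics.fmean(means)
    spread = max(abs(mean - center) for mean in means)
    return means, spread


means, spread = batch_mean_spread([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
print(means, spread)
```

A spread that is large relative to the flux range of the reaction, or batch means that drift steadily in one direction, suggests that more samples or more thinning may be needed.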
289 ### Statistical Validation
290 - Ensure adequate sample sizes (n≥100 recommended)
291 - Check for outliers and artifacts
292 - Validate against experimental flux data when available
293
294 ### Output Verification
295 - Confirm that mass balance constraints are satisfied
296 - Check thermodynamic consistency
297 - Verify biological plausibility of results
298
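The mass balance check amounts to verifying that the stoichiometric matrix times the flux vector is close to zero for every metabolite. A minimal sketch with a hypothetical three-reaction toy network (uptake of A, conversion of A to B, secretion of B) is shown below; in practice the matrix would come from the model, which is not covered here.

```python
def max_mass_balance_residual(stoichiometry, fluxes):
    """Largest |sum_j S[i][j] * v[j]| over all metabolites i."""
    return max(
        abs(sum(coef * flux for coef, flux in zip(row, fluxes)))
        for row in stoichiometry
    )


# Rows are metabolites (A, B); columns are reactions (uptake, A -> B, secretion).
toy_stoichiometry = [
    [1.0, -1.0, 0.0],   # A: produced by uptake, consumed by A -> B
    [0.0, 1.0, -1.0],   # B: produced by A -> B, consumed by secretion
]
balanced = max_mass_balance_residual(toy_stoichiometry, [2.0, 2.0, 2.0])
unbalanced = max_mass_balance_residual(toy_stoichiometry, [2.0, 1.5, 1.5])
print(balanced, unbalanced)
```

Because solvers work in floating point, the residual is compared against a small tolerance rather than against exactly zero.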
299 ## Integration Workflow
300
301 ### Upstream Tools
302 - [RAS to Bounds](ras-to-bounds.md) - Generate constrained bounds from RAS
303 - [Model Setting](metabolic-model-setting.md) - Extract model components
304
305 ### Downstream Tools
306 - [Flux to Map](flux-to-map.md) - Visualize flux distributions on metabolic maps
307 - [MAREA](marea.md) - Statistical analysis of flux differences
308
309 ### Typical Pipeline
310
311 ```bash
312 # 1. Generate sample-specific bounds
313 ras_to_bounds -td /opt/COBRAxy -ms ENGRO2 -ir ras.tsv -idop bounds/
314
315 # 2. Sample fluxes from constrained models
316 flux_simulation -td /opt/COBRAxy -ms ENGRO2 -in "$(printf '%s\n' bounds/*.tsv | paste -sd, -)" \
317 -ni Sample1,Sample2,Sample3 -a CBS -ns 1000 -nb 1 -sd 42 \
318 -ot mean,quantiles -ota pFBA,FVA -idop fluxes/
319
320 # 3. Visualize results on metabolic maps
321 flux_to_map -td /opt/COBRAxy -input_data_fluxes fluxes/mean.csv \
322 -choice_map ENGRO2 -idop flux_maps/
323 ```
324
325 ## Troubleshooting
326
327 ### Common Issues
328
329 **CBS sampling fails**
330 - GLPK installation issues → Install GLPK and swiglpk
331 - Model infeasibility → Check bounds constraints
332 - Memory errors → Reduce samples per batch
333
334 **OPTGP convergence problems**
335 - Poor mixing → Increase thinning parameter
336 - Slow convergence → Extend sampling time
337 - Chain stuck → Check model feasibility
338
339 **Output files missing**
340 - Insufficient disk space → Check available storage
341 - Permission errors → Verify write permissions
342 - Invalid sample names → Check naming conventions
343
344 ### Error Messages
345
346 | Error | Cause | Solution |
347 |-------|-------|----------|
348 | "GLPK solver failed" | Missing GLPK/swiglpk | Install GLPK libraries |
349 | "Model infeasible" | Over-constrained bounds | Relax constraints or check model |
350 | "Sampling timeout" | Insufficient time/resources | Reduce sample size or increase resources |
351
352 ### Performance Issues
353
354 **Slow sampling**
355 - Use CBS instead of OPTGP for speed
356 - Reduce model size if possible
357 - Increase system resources
358
359 **Memory errors**
360 - Lower samples per batch
361 - Process samples sequentially
362 - Use more efficient data formats
363
364 **Disk space issues**
365 - Monitor output file sizes
366 - Clean intermediate files
367 - Use compressed formats when possible
368
369 ## Advanced Usage
370
371 ### Custom Sampling Parameters
372
373 For fine-tuning sampling behavior, advanced users can modify:
374 - Objective function generation (CBS)
375 - MCMC parameters (OPTGP)
376 - Convergence criteria
377 - Output precision and format
378
379 ### Parallel Processing
380
381 ```bash
382 # Run four independent samplings as background jobs, one per input subset
383 for i in {1..4}; do
384 flux_simulation -td /opt/COBRAxy -ms ENGRO2 \
385 -in subset_${i}_bounds.tsv \
386 -ni Batch${i} -a CBS -ns 250 -nb 1 \
387 -sd $((42 + i)) -ot mean -idop batch_${i}/ &
388 done
389 wait
390 ```
391
392 ### Result Aggregation
393
394 Combine results from multiple simulation runs:
395
396 ```bash
397 # Merge statistics files
398 python merge_flux_results.py -i batch_*/mean.csv -o combined_mean.csv
399 ```
400
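One way such a merge step could look is sketched below. It is an illustration only: the function names are hypothetical, the tables are assumed to share a `Reaction` column as in the output examples above, and the delimiter is a parameter because the actual separator of the output files is an assumption here.

```python
import csv
import io


def merge_flux_tables(handles, delimiter="\t"):
    """Join several per-run tables on the Reaction column."""
    merged = {}    # reaction -> {sample column: value}
    columns = []   # sample columns in the order they were encountered
    for handle in handles:
        reader = csv.DictReader(handle, delimiter=delimiter)
        sample_columns = [name for name in reader.fieldnames if name != "Reaction"]
        for name in sample_columns:
            if name in columns:
                raise ValueError(f"duplicate sample column: {name}")
        columns.extend(sample_columns)
        for row in reader:
            merged.setdefault(row["Reaction"], {}).update(
                {name: row[name] for name in sample_columns}
            )
    return columns, merged


def write_flux_table(columns, merged, handle, delimiter="\t"):
    """Write the merged table; missing values are left empty."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["Reaction", *columns])
    for reaction in sorted(merged):
        writer.writerow([reaction, *(merged[reaction].get(name, "") for name in columns)])


# In-memory stand-ins for batch_1/mean.csv and batch_2/mean.csv.
run1 = io.StringIO("Reaction\tBatch1\nR00001\t1.5\nR00002\t0.0\n")
run2 = io.StringIO("Reaction\tBatch2\nR00001\t2.5\nR00002\t3.0\n")
columns, merged = merge_flux_tables([run1, run2])
out = io.StringIO()
write_flux_table(columns, merged, out)
print(out.getvalue())
```

Rejecting duplicate sample columns keeps two runs that used the same sample name from silently overwriting each other.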
401 ## See Also
402
403 - [RAS to Bounds](ras-to-bounds.md) - Generate input constraints
404 - [Flux to Map](flux-to-map.md) - Visualize flux results
405 - [CBS Algorithm Documentation](../tutorials/cbs-algorithm.md)
406 - [OPTGP Algorithm Documentation](../tutorials/optgp-algorithm.md)