comparison COBRAxy/docs/tools/flux-simulation.md @ 547:73f2f7e2be17 draft

Uploaded
author francesco_lapi
date Tue, 28 Oct 2025 10:44:07 +0000
parents fcdbc81feb45
children
comparison
546:01147e83f43c 547:73f2f7e2be17
1 # Flux Simulation 1 # Flux Simulation
2 2
3 Sample metabolic fluxes using constraint-based modeling with CBS or OPTGP algorithms. 3 Simulate flux distributions from constraint-based metabolic models using different optimization or sampling strategies.
4 4
5 ## Overview 5 ## Overview
6 6
7 Flux Simulation performs constraint-based sampling of metabolic flux distributions from constrained models. It supports two sampling algorithms (CBS and OPTGP) and provides comprehensive flux statistics including mean, median, quantiles, pFBA, FVA, and sensitivity analysis. 7 Two types of analysis are available:
8 - **flux optimization**
9 - **flux sampling**
10
11 For flux optimization, one of the following methods can be performed: parsimonious FBA (pFBA), Flux Variability Analysis (FVA), or biomass sensitivity analysis (single-reaction knockout).
12 The objective function, a linear combination of fluxes weighted by specific coefficients, depends on the provided metabolic network.
8 13
9 ## Usage 14 For flux sampling, one of the following methods can be performed: CBS (Corner-Based Sampling) or OPTGP (Improved Artificial Centering Hit-and-Run sampler).
10 15
11 ### Command Line 16 ## Galaxy Interface
17
18 In Galaxy: **COBRAxy → Flux Simulation**
19
20 1. Select model and upload bounds files
21 2. Choose algorithm (CBS/OPTGP) and sampling parameters
22 3. Click **Run tool**
23
24 ## Command-line console
12 25
13 ```bash 26 ```bash
14 flux_simulation -td /path/to/COBRAxy \ 27 flux_simulation -ms ENGRO2 \
15 -ms ENGRO2 \ 28 -in bounds/*.tsv \
16 -in bounds1.tsv,bounds2.tsv \ 29 -ni Sample1,Sample2,Sample3 \
17 -ni Sample1,Sample2 \
18 -a CBS \ 30 -a CBS \
19 -ns 1000 \ 31 -ns 1000 \
20 -nb 1 \ 32 -idop output/
21 -sd 42 \
22 -ot mean,median,quantiles \
23 -ota pFBA,FVA,sensitivity \
24 -idop flux_results/
25 ``` 33 ```
26
27 ### Galaxy Interface
28
29 Select "Flux Simulation" from the COBRAxy tool suite and configure sampling parameters through the web interface.
30 34
31 ## Parameters 35 ## Parameters
32 36
33 ### Required Parameters
34
35 | Parameter | Flag | Description |
36 |-----------|------|-------------|
37 | Tool Directory | `-td, --tool_dir` | Path to COBRAxy installation directory |
38 | Input Bounds | `-in, --input` | Comma-separated list of bounds files |
39 | Sample Names | `-ni, --names` | Comma-separated sample names |
40 | Algorithm | `-a, --algorithm` | Sampling algorithm (CBS or OPTGP) |
41 | Number of Samples | `-ns, --n_samples` | Samples per batch |
42 | Number of Batches | `-nb, --n_batches` | Number of sampling batches |
43 | Random Seed | `-sd, --seed` | Random seed for reproducibility |
44 | Output Types | `-ot, --output_type` | Flux statistics to compute |
45
46 ### Model Parameters
47
48 | Parameter | Flag | Description | Default | 37 | Parameter | Flag | Description | Default |
49 |-----------|------|-------------|---------| 38 |-----------|------|-------------|---------|
50 | Model Selector | `-ms, --model_selector` | Built-in model (ENGRO2, Custom) | ENGRO2 | 39 | Model Selector | `-ms` | ENGRO2, Recon, or Custom | ENGRO2 |
51 | Custom Model | `-mo, --model` | Path to custom SBML model | - | 40 | Input Format | `--model_and_bounds` | Separate files (true) or complete models (false) | true |
52 | Model Name | `-mn, --model_name` | Custom model filename | - | 41 | Input Bounds | `-in` | Bounds files | - |
53 42 | Name Input | `-ni` | Sample names (comma-separated) | - |
54 ### Sampling Parameters 43 | Algorithm | `-a` | CBS or OPTGP | CBS |
55 44 | Num Samples | `-ns` | Number of samples per batch | 1000 |
56 | Parameter | Flag | Description | Default | 45 | Num Batches | `-nb` | Number of batches | 1 |
57 |-----------|------|-------------|---------| 46 | Thinning | `-th` | OPTGP thinning parameter | 100 |
58 | Algorithm | `-a, --algorithm` | CBS or OPTGP | - | 47 | Output Type | `-ot` | mean, median, quantiles, fluxes | mean,median |
59 | Thinning | `-th, --thinning` | OPTGP thinning parameter | 100 | 48 | FVA Optimality | `--perc_opt` | Optimality fraction (0.0-1.0) | 0.90 |
60 | Samples | `-ns, --n_samples` | Samples per batch | - | 49 | Output Path | `-idop` | Output directory | flux_simulation/ |
61 | Batches | `-nb, --n_batches` | Number of batches | - |
62 | Seed | `-sd, --seed` | Random seed | - |
63
64 ### Output Parameters
65
66 | Parameter | Flag | Description | Options |
67 |-----------|------|-------------|---------|
68 | Output Types | `-ot, --output_type` | Flux statistics | mean,median,quantiles,fluxes |
69 | Analysis Types | `-ota, --output_type_analysis` | Additional analyses | pFBA,FVA,sensitivity |
70 | Output Path | `-idop, --output_path` | Results directory | flux_simulation/ |
71 | Output Log | `-ol, --out_log` | Log file path | - |
72 50
73 ## Algorithms 51 ## Algorithms
74 52
75 ### CBS (Constraint-Based Sampling) 53 ### CBS (Corner-Based Sampling)
76 54 - Random objective optimization
77 **Method**: Random objective function optimization 55 - Requires GLPK (recommended) or COBRApy solver
78 - Generates random linear combinations of reactions
79 - Optimizes using LP solver (GLPK preferred, COBRApy fallback)
80 - Fast and memory-efficient
81 - Suitable for large models 56 - Suitable for large models
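
The idea of optimizing a random objective to reach a corner of the flux space can be sketched on a toy problem. This is not COBRAxy's implementation: the sketch drops the steady-state constraints and keeps only box bounds, so maximizing a random linear objective reduces to pushing each flux to its lower or upper bound. The function name `random_corner` and the example bounds are hypothetical.

```python
import random


def random_corner(bounds, rng):
    """Return one vertex of a box-shaped flux space.

    bounds: dict mapping reaction ID -> (lower_bound, upper_bound).
    A random objective coefficient is drawn for each reaction. Maximizing
    the resulting linear objective over a box sends every flux to its
    upper bound when the coefficient is positive, otherwise to its lower
    bound.
    """
    corner = {}
    for reaction, (lower, upper) in bounds.items():
        coefficient = rng.uniform(-1.0, 1.0)
        corner[reaction] = upper if coefficient > 0 else lower
    return corner


# Hypothetical bounds, shaped like the rows of a bounds TSV file.
toy_bounds = {"R00001": (-125.0, 125.0), "R00002": (-65.0, 65.0)}
rng = random.Random(42)  # fixed seed so the sketch is reproducible
corners = [random_corner(toy_bounds, rng) for _ in range(5)]
```

In a real model the stoichiometric constraints couple the reactions, so each random objective has to be handed to an LP solver rather than solved reaction by reaction.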
82 57
83 **Advantages**: 58 ### OPTGP (MCMC Sampling)
84 - High performance with GLPK 59 - Markov Chain Monte Carlo
85 - Good coverage of solution space 60 - Uniform sampling guarantee
86 - Robust to model size 61 - Requires thinning parameter
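
Thinning keeps only every k-th state of the Markov chain, which reduces autocorrelation between the retained samples. A minimal sketch follows, with a bounded one-dimensional random walk standing in for the sampler. The walk, the helper `thin_chain`, and the numbers are illustrative assumptions, not the OPTGP implementation.

```python
import random


def thin_chain(chain, thinning):
    """Keep every `thinning`-th state of a chain (states k, 2k, 3k, ...)."""
    if thinning < 1:
        raise ValueError("thinning must be a positive integer")
    return chain[thinning - 1::thinning]


# Stand-in for an MCMC chain: a random walk clipped to [-125, 125].
rng = random.Random(0)
state, chain = 0.0, []
for _ in range(1000):
    state = min(125.0, max(-125.0, state + rng.uniform(-5.0, 5.0)))
    chain.append(state)

kept = thin_chain(chain, thinning=100)  # 1000 steps -> 10 retained states
```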
87 62
88 ### OPTGP (Optimal Growth Perturbation) 63 ## Input Modes
89 64
90 **Method**: MCMC-based sampling 65 The tool supports two different input formats:
91 - Markov Chain Monte Carlo with growth optimization
92 - Requires thinning to reduce autocorrelation
93 - More computationally intensive
94 - Better theoretical guarantees
95 66
96 **Advantages**: 67 ### Mode 1: Model + Bounds (default, `--model_and_bounds true`)
97 - Uniform sampling guarantee 68 Upload one base model + multiple bound files (one per sample/context):
98 - Well-established method 69 - Base model: Tabular file with reaction structure (from Import Metabolic Model)
99 - Good for smaller models 70 - Bounds: Individual TSV files with sample-specific constraints (from RAS to Bounds)
71 - Use when you have RAS-derived bounds for multiple samples
100 72
101 ## Input Formats 73 ### Mode 2: Multiple Complete Models (`--model_and_bounds false`)
74 Upload pre-built model files, each already containing integrated bounds:
75 - Each file is a complete tabular model with reaction structure + bounds
76 - Use when models are already prepared with specific constraints
77 - Useful for comparing different modelling scenarios
102 78
103 ### Bounds Files 79 ## Input Format
104 80
105 Tab-separated format with reaction bounds: 81 Bounds files (TSV):
106 82
107 ``` 83 ```
108 Reaction lower_bound upper_bound 84 reaction lower_bound upper_bound
109 R00001 -1000.0 1250.5 85 R00001 -125.0 125.0
110 R00002 -650.2 1000.0 86 R00002 -65.0 65.0
111 R00003 0.0 2150.8
112 ``` 87 ```
113 88
114 Multiple bounds files can be processed simultaneously by providing comma-separated paths. 89 **File Format Notes:**
90 - Use **tab-separated** values (TSV)
91 - Column headers must be: reaction, lower_bound, upper_bound
92 - Reaction IDs must match model reaction IDs
93 - Numeric values for bounds
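
The format notes above can be checked before running the tool. The following is a minimal sketch using only the Python standard library. The helper `read_bounds` is hypothetical and not part of COBRAxy, and the `lower_bound <= upper_bound` check is an added assumption rather than a documented requirement.

```python
import csv
import io

EXPECTED_HEADER = ["reaction", "lower_bound", "upper_bound"]


def read_bounds(handle, model_reactions=None):
    """Parse a bounds TSV and return {reaction: (lower, upper)}.

    Raises ValueError on a wrong header, a wrong column count,
    non-numeric bounds, lower > upper (assumed check), or reaction IDs
    missing from `model_reactions` when that set is provided.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header}")
    bounds = {}
    for line_number, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_number}: expected 3 columns")
        reaction = row[0]
        lower, upper = float(row[1]), float(row[2])  # ValueError if not numeric
        if lower > upper:
            raise ValueError(f"line {line_number}: lower bound exceeds upper bound")
        if model_reactions is not None and reaction not in model_reactions:
            raise ValueError(f"line {line_number}: unknown reaction {reaction}")
        bounds[reaction] = (lower, upper)
    return bounds


example = (
    "reaction\tlower_bound\tupper_bound\n"
    "R00001\t-125.0\t125.0\n"
    "R00002\t-65.0\t65.0\n"
)
bounds = read_bounds(io.StringIO(example), model_reactions={"R00001", "R00002"})
```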
115 94
116 ### Custom Model File (Optional) 95 ## Sampling Outputs
117 96
118 SBML format metabolic model compatible with COBRApy. 97 The tool can generate different types of output from flux sampling:
119 98
120 ## Output Formats 99 | Output Type | Description |
100 |-------------|-------------|
101 | **mean** | Mean flux across all samples |
102 | **median** | Median flux across all samples |
103 | **quantiles** | 25th, 50th, 75th percentiles |
104 | **fluxes** | Complete flux distributions (all samples, all reactions) |
121 105
122 ### Flux Statistics 106 **Note**: The `fluxes` output can be very large for many samples. Use summary statistics (mean/median/quantiles) unless you need the complete distribution.
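
To make the summary statistics concrete, the sketch below computes them for one reaction with Python's `statistics` module. The input layout, the `summarize` helper, and the sample values are assumptions for illustration. The tool's own quantile interpolation may differ from the `inclusive` method used here.

```python
import statistics


def summarize(samples):
    """Summarize sampled fluxes per reaction.

    samples: dict mapping reaction ID -> list of sampled flux values.
    Returns {reaction: {"mean", "median", "q1", "q2", "q3"}} where
    q1/q2/q3 are the 25th, 50th, and 75th percentiles.
    """
    summary = {}
    for reaction, values in samples.items():
        q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[reaction] = {
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            "q1": q1,
            "q2": q2,
            "q3": q3,
        }
    return summary


# Hypothetical sampled fluxes for a single reaction.
toy_samples = {"R00001": [10.0, 12.0, 14.0, 16.0, 18.0]}
stats = summarize(toy_samples)
```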
123 107
124 #### Mean Fluxes (`mean.csv`) 108 ## Optimization Methods
125 ```
126 Reaction Sample1 Sample2 Sample3
127 R00001 15.23 -8.45 22.1
128 R00002 0.0 12.67 -5.3
129 R00003 45.8 38.2 51.7
130 ```
131 109
132 #### Median Fluxes (`median.csv`) 110 As an alternative to sampling, the tool can perform optimization analyses:
133 ```
134 Reaction Sample1 Sample2 Sample3
135 R00001 14.1 -7.8 21.5
136 R00002 0.0 11.9 -4.8
137 R00003 44.2 37.1 50.3
138 ```
139 111
140 #### Quantiles (`quantiles.csv`) 112 | Method | Description | Output |
141 ``` 113 |--------|-------------|--------|
142 Reaction Sample1_q1 Sample1_q2 Sample1_q3 Sample2_q1 ... 114 | **FVA** | Flux Variability Analysis | Min/max flux ranges for each reaction |
143 R00001 10.5 14.1 18.7 -12.3 ... 115 | **pFBA** | Parsimonious FBA | Flux distribution with minimal total flux |
144 R00002 -2.1 0.0 1.8 8.9 ... 116 | **sensitivity** | Reaction knockout analysis | Biomass impact of single reaction deletions |
145 R00003 38.9 44.2 49.8 32.1 ...
146 ```
147 117
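
The pFBA criterion, minimal total flux among solutions that reach the same objective value, can be illustrated without a solver. The sketch below picks the most parsimonious entry from a hand-written list of candidate flux distributions. The candidates and helper names are hypothetical, and a real pFBA run solves a linear program instead of comparing a finite list.

```python
def total_flux(fluxes):
    """Sum of absolute flux values, the quantity pFBA minimizes."""
    return sum(abs(value) for value in fluxes.values())


def most_parsimonious(candidates):
    """Return the candidate flux distribution with the smallest total flux."""
    return min(candidates, key=total_flux)


# Two hypothetical distributions that reach the same biomass flux.
candidates = [
    {"R00001": 10.0, "R00002": -5.0, "biomass": 1.0},
    {"R00001": 4.0, "R00002": 0.0, "biomass": 1.0},
]
best = most_parsimonious(candidates)
```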
148 ### Additional Analyses 118 ### FVA Optimality Fraction
149 119
150 #### pFBA (`pFBA.csv`) 120 The `--perc_opt` parameter (default: 0.90) controls the optimality constraint for FVA:
151 Parsimonious Flux Balance Analysis results: 121 - **1.0**: Only optimal solutions (100% of maximum biomass)
152 ``` 122 - **0.90**: Allow suboptimal solutions (≥90% of maximum biomass)
153 Reaction Sample1 Sample2 Sample3 123 - **Lower values**: Explore broader flux ranges
154 R00001 12.5 -6.7 19.3
155 R00002 0.0 8.9 -3.2
156 R00003 41.2 35.8 47.9
157 ```
158 124
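
The effect of the optimality fraction can be sketched as a filter: only flux distributions whose biomass reaches `perc_opt` times the maximum count toward the reported range. The candidate list and the helpers `biomass_floor` and `flux_range` are assumptions for illustration. An actual FVA solves two linear programs per reaction rather than scanning a finite list.

```python
def biomass_floor(max_biomass, perc_opt=0.90):
    """Minimum biomass flux a solution must reach to be considered."""
    if not 0.0 <= perc_opt <= 1.0:
        raise ValueError("perc_opt must be between 0.0 and 1.0")
    return perc_opt * max_biomass


def flux_range(candidates, reaction, max_biomass, perc_opt=0.90):
    """Min/max flux of `reaction` over candidates above the biomass floor."""
    floor = biomass_floor(max_biomass, perc_opt)
    tolerance = 1e-9  # guard against floating-point rounding at the floor
    allowed = [c[reaction] for c in candidates if c["biomass"] >= floor - tolerance]
    return min(allowed), max(allowed)


# Hypothetical feasible flux distributions; the maximum biomass is 1.0.
candidates = [
    {"R00001": 10.0, "biomass": 1.00},
    {"R00001": 14.0, "biomass": 0.95},
    {"R00001": 30.0, "biomass": 0.80},
]
strict = flux_range(candidates, "R00001", max_biomass=1.0, perc_opt=1.0)
default = flux_range(candidates, "R00001", max_biomass=1.0, perc_opt=0.90)
loose = flux_range(candidates, "R00001", max_biomass=1.0, perc_opt=0.50)
```

Lowering the fraction admits more of the candidates, so the reported range can only stay the same or widen.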
159 #### FVA (`FVA.csv`) 125 ## Output
160 Flux Variability Analysis bounds:
161 ```
162 Reaction Sample1_min Sample1_max Sample2_min Sample2_max ...
163 R00001 -5.2 35.8 -25.3 8.7 ...
164 R00002 -8.9 8.9 0.0 28.4 ...
165 R00003 15.6 78.3 10.2 65.9 ...
166 ```
167 126
168 #### Sensitivity (`sensitivity.csv`) 127 - `mean.csv`: Mean flux values
169 Single reaction deletion effects: 128 - `median.csv`: Median flux values
170 ``` 129 - `quantiles.csv`: Flux quantiles (25%, 50%, 75%)
171 Reaction Sample1 Sample2 Sample3 130 - `fluxes/`: Complete flux distributions (if requested)
172 R00001 0.98 0.95 0.97 131 - `fva.csv`: FVA results (if requested)
173 R00002 1.0 0.87 1.0 132 - `pfba.csv`: pFBA results (if requested)
174 R00003 0.23 0.19 0.31 133 - `sensitivity.csv`: Knockout sensitivity analysis (if requested)
175 ``` 134 - `*.log`: Processing log
176 135
177 ## Examples 136 ## Examples
178 137
179 ### Basic CBS Sampling 138 ### Basic CBS Sampling
180 139
181 ```bash 140 ```bash
182 # Simple CBS sampling with statistics 141 flux_simulation -ms ENGRO2 \
183 flux_simulation -td /opt/COBRAxy \ 142 -in bounds/*.tsv \
184 -ms ENGRO2 \
185 -in sample1_bounds.tsv,sample2_bounds.tsv \
186 -ni Sample1,Sample2 \ 143 -ni Sample1,Sample2 \
187 -a CBS \ 144 -a CBS \
188 -ns 500 \ 145 -ns 1000 \
189 -nb 2 \ 146 -idop output/
190 -sd 42 \
191 -ot mean,median \
192 -ota pFBA \
193 -idop cbs_results/
194 ``` 147 ```
195 148
196 ### Comprehensive OPTGP Analysis 149 ### OPTGP Sampling
197 150
198 ```bash 151 ```bash
199 # Full analysis with OPTGP 152 flux_simulation -ms ENGRO2 \
200 flux_simulation -td /opt/COBRAxy \
201 -ms ENGRO2 \
202 -in bounds/*.tsv \ 153 -in bounds/*.tsv \
203 -ni Sample1,Sample2,Sample3,Control1,Control2 \ 154 -ni Sample1,Sample2 \
204 -a OPTGP \ 155 -a OPTGP \
156 -ns 1000 \
205 -th 200 \ 157 -th 200 \
206 -ns 1000 \ 158 -idop output/
207 -nb 1 \
208 -sd 123 \
209 -ot mean,median,quantiles,fluxes \
210 -ota pFBA,FVA,sensitivity \
211 -idop comprehensive_analysis/ \
212 -ol sampling.log
213 ``` 159 ```
214 160
215 ### Custom Model Sampling 161 ### Custom Model with CBS Sampling
216 162
217 ```bash 163 ```bash
218 # Use custom model with CBS 164 flux_simulation -ms Custom \
219 flux_simulation -td /opt/COBRAxy \ 165 -mo custom_model.xml \
220 -ms Custom \ 166 -in bounds/*.tsv \
221 -mo models/tissue_specific.xml \ 167 -ni Sample1 \
222 -mn tissue_specific.xml \
223 -in patient_bounds.tsv \
224 -ni PatientA \
225 -a CBS \ 168 -a CBS \
226 -ns 2000 \ 169 -ns 2000 \
227 -nb 5 \ 170 -idop output/
228 -sd 456 \
229 -ot mean,quantiles \
230 -ota FVA,sensitivity \
231 -idop patient_analysis/
232 ```
233
234 ### Batch Processing Multiple Conditions
235
236 ```bash
237 # Process multiple experimental conditions
238 flux_simulation -td /opt/COBRAxy \
239 -ms ENGRO2 \
240 -in ctrl1.tsv,ctrl2.tsv,treat1.tsv,treat2.tsv \
241 -ni Control1,Control2,Treatment1,Treatment2 \
242 -a CBS \
243 -ns 800 \
244 -nb 3 \
245 -sd 789 \
246 -ot mean,median,fluxes \
247 -ota pFBA,FVA \
248 -idop batch_conditions/
249 ```
250
251 ## Algorithm Selection Guide
252
253 ### Choose CBS When:
254 - Large models (>1000 reactions)
255 - High sample throughput required
256 - GLPK solver available
257 - Memory constraints present
258
259 ### Choose OPTGP When:
260 - Theoretical sampling guarantees needed
261 - Smaller models (<500 reactions)
262 - Sufficient computational resources
263 - Publication-quality sampling required
264
265 ## Performance Optimization
266
267 ### CBS Optimization
268 - Install GLPK and swiglpk for maximum performance
269 - Increase batch number rather than samples per batch
270 - Monitor memory usage for large models
271
272 ### OPTGP Optimization
273 - Adjust thinning based on model size (100-500)
274 - Use parallel processing when available
275 - Consider warmup period for chain convergence
276
277 ### General Tips
278 - Use appropriate sample sizes (500-2000 per condition)
279 - Balance batches vs samples for memory management
280 - Set consistent random seeds for reproducibility
281
282 ## Quality Control
283
284 ### Convergence Assessment
285 - Compare statistics across batches
286 - Check for systematic trends in sampling
287 - Validate against known flux ranges
288
289 ### Statistical Validation
290 - Ensure adequate sample sizes (n≥100 recommended)
291 - Check for outliers and artifacts
292 - Validate against experimental flux data when available
293
294 ### Output Verification
295 - Confirm mass balance constraints satisfied
296 - Check thermodynamic consistency
297 - Verify biological plausibility of results
298
299 ## Integration Workflow
300
301 ### Upstream Tools
302 - [RAS to Bounds](ras-to-bounds.md) - Generate constrained bounds from RAS
303 - [Import Metabolic Model](import-metabolic-model.md) - Extract model components
304
305 ### Downstream Tools
306 - [Flux to Map](flux-to-map.md) - Visualize flux distributions on metabolic maps
307 - [MAREA](marea.md) - Statistical analysis of flux differences
308
309 ### Typical Pipeline
310
311 ```bash
312 # 1. Generate sample-specific bounds
313 ras_to_bounds -td /opt/COBRAxy -ms ENGRO2 -ir ras.tsv -idop bounds/
314
315 # 2. Sample fluxes from constrained models
316 flux_simulation -td /opt/COBRAxy -ms ENGRO2 -in bounds/*.tsv \
317 -ni Sample1,Sample2,Sample3 -a CBS -ns 1000 \
318 -ot mean,quantiles -ota pFBA,FVA -idop fluxes/
319
320 # 3. Visualize results on metabolic maps
321 flux_to_map -td /opt/COBRAxy -input_data_fluxes fluxes/mean.csv \
322 -choice_map ENGRO2 -idop flux_maps/
323 ``` 171 ```
324 172
325 ## Troubleshooting 173 ## Troubleshooting
326 174
327 ### Common Issues 175 | Error | Solution |
328 176 |-------|----------|
329 **CBS sampling fails** 177 | "GLPK solver failed" | Install GLPK libraries |
330 - GLPK installation issues → Install GLPK and swiglpk 178 | "Model infeasible" | Check bounds constraints |
331 - Model infeasibility → Check bounds constraints
332 - Memory errors → Reduce samples per batch
333
334 **OPTGP convergence problems**
335 - Poor mixing → Increase thinning parameter
336 - Slow convergence → Extend sampling time
337 - Chain stuck → Check model feasibility
338
339 **Output files missing**
340 - Insufficient disk space → Check available storage
341 - Permission errors → Verify write permissions
342 - Invalid sample names → Check naming conventions
343
344 ### Error Messages
345
346 | Error | Cause | Solution |
347 |-------|-------|----------|
348 | "GLPK solver failed" | Missing GLPK/swiglpk | Install GLPK libraries |
349 | "Model infeasible" | Over-constrained bounds | Relax constraints or check model |
350 | "Sampling timeout" | Insufficient time/resources | Reduce sample size or increase resources |
351
352 ### Performance Issues
353
354 **Slow sampling**
355 - Use CBS instead of OPTGP for speed
356 - Reduce model size if possible
357 - Increase system resources
358
359 **Memory errors**
360 - Lower samples per batch
361 - Process samples sequentially
362 - Use more efficient data formats
363
364 **Disk space issues**
365 - Monitor output file sizes
366 - Clean intermediate files
367 - Use compressed formats when possible
368
369 ## Advanced Usage
370
371 ### Custom Sampling Parameters
372
373 For fine-tuning sampling behavior, advanced users can modify:
374 - Objective function generation (CBS)
375 - MCMC parameters (OPTGP)
376 - Convergence criteria
377 - Output precision and format
378
379 ### Parallel Processing
380
381 ```bash
382 # Split sampling across multiple cores/nodes
383 for i in {1..4}; do
384 flux_simulation -td /opt/COBRAxy -ms ENGRO2 \
385 -in subset_${i}_bounds.tsv \
386 -ni Batch${i} -a CBS -ns 250 \
387 -sd $((42 + i)) -idop batch_${i}/ &
388 done
389 wait
390 ```
391
392 ### Result Aggregation
393
394 Combine results from multiple simulation runs:
395
396 ```bash
397 # Merge statistics files
398 python merge_flux_results.py -i batch_*/mean.csv -o combined_mean.csv
399 ```
400 179
401 ## See Also 180 ## See Also
402 181
403 - [RAS to Bounds](ras-to-bounds.md) - Generate input constraints 182 - [RAS to Bounds](tools/ras-to-bounds)
404 - [Flux to Map](flux-to-map.md) - Visualize flux results 183 - [Flux to Map](tools/flux-to-map)
405 - [CBS Algorithm Documentation](/tutorials/cbs-algorithm.md) 184 - [Built-in Models](reference/built-in-models)
406 - [OPTGP Algorithm Documentation](/tutorials/optgp-algorithm.md)