comparison COBRAxy/docs/tools/flux-to-map.md @ 492:4ed95023af20 draft

Uploaded
author francesco_lapi
date Tue, 30 Sep 2025 14:02:17 +0000
parents
children fcdbc81feb45
491:7a413a5ec566 492:4ed95023af20
1 # Flux to Map
2
3 Visualize metabolic flux data on pathway maps with statistical analysis and color coding.
4
5 ## Overview
6
7 Flux to Map performs statistical analysis on flux distribution data and generates color-coded metabolic pathway maps. It compares flux values between sample groups and highlights significantly different reactions with appropriate colors and line weights.
8
9 ## Usage
10
11 ### Command Line
12
13 ```bash
14 flux_to_map -td /path/to/COBRAxy \
15 --input_data_fluxes flux_data.tsv \
16 --input_class_fluxes sample_groups.tsv \
17 --comparison manyvsmany \
18 --test ks \
19 -pv 0.05 \
20 -fc 1.5 \
21 --choice_map ENGRO2 \
22 --generate_svg true \
23 --generate_pdf true \
24 -idop flux_maps/
25 ```
26
27 ### Galaxy Interface
28
29 Select "Flux to Map" from the COBRAxy tool suite and configure flux analysis and visualization parameters.
30
31 ## Parameters
32
33 ### Required Parameters
34
35 | Parameter | Flag | Description |
36 |-----------|------|-------------|
37 | Tool Directory | `-td, --tool_dir` | Path to COBRAxy installation directory |
38
39 ### Data Input Parameters
40
41 | Parameter | Flag | Description | Default |
42 |-----------|------|-------------|---------|
43 | Flux Data | `-idf, --input_data_fluxes` | Flux values TSV file | - |
44 | Flux Classes | `-icf, --input_class_fluxes` | Sample group labels for fluxes | - |
45 | Multiple Flux Files | `-idsf, --input_datas_fluxes` | Multiple flux datasets (space-separated) | - |
46 | Flux Names | `-naf, --names_fluxes` | Names for multiple flux datasets | - |
47 | Analysis Option | `-op, --option` | Analysis mode (datasets or dataset_class) | - |
48
49 ### Statistical Parameters
50
51 | Parameter | Flag | Description | Default |
52 |-----------|------|-------------|---------|
53 | Comparison Type | `-co, --comparison` | Statistical comparison mode | manyvsmany |
54 | Statistical Test | `-te, --test` | Statistical test method | ks |
55 | P-Value Threshold | `-pv, --pValue` | Significance threshold | 0.1 |
56 | Adjusted P-values | `-adj, --adjusted` | Apply FDR correction | false |
57 | Fold Change | `-fc, --fChange` | Minimum fold change threshold | 1.5 |
58
59 ### Visualization Parameters
60
61 | Parameter | Flag | Description | Default |
62 |-----------|------|-------------|---------|
63 | Map Choice | `-mc, --choice_map` | Built-in metabolic map | HMRcore |
64 | Custom Map | `-cm, --custom_map` | Path to custom SVG map | - |
65 | Generate SVG | `-gs, --generate_svg` | Create SVG output | true |
66 | Generate PDF | `-gp, --generate_pdf` | Create PDF output | true |
67 | Color Map | `-colorm, --color_map` | Color scheme (jet, viridis) | - |
68 | Output Directory | `-idop, --output_path` | Results directory | result/ |
69
70 ### Advanced Parameters
71
72 | Parameter | Flag | Description | Default |
73 |-----------|------|-------------|---------|
74 | Output Log | `-ol, --out_log` | Log file path | - |
75 | Control Sample | `-on, --control` | Control group identifier | - |
76
77 ## Input Formats
78
79 ### Flux Data File
80
81 Tab-separated format with reactions as rows and samples as columns:
82
83 ```
84 Reaction Sample1 Sample2 Sample3 Control1 Control2
85 R00001 15.23 -8.45 22.1 12.8 14.2
86 R00002 0.0 12.67 -5.3 8.9 7.4
87 R00003 45.8 38.2 51.7 42.1 39.8
88 R00004 -12.4 -15.8 -9.2 -11.5 -13.1
89 ```
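As an illustration of the layout above, the following sketch parses a flux table into a nested dictionary using only the Python standard library. The function name `read_flux_table` is hypothetical and is not part of COBRAxy; it only assumes what the example shows, namely a header row followed by one reaction per line.

```python
import csv
import io


def read_flux_table(handle):
    """Parse a tab-separated flux table (reactions as rows, samples as columns).

    Hypothetical helper for illustration; not part of the COBRAxy API.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first column holds the reaction ID
    fluxes = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        reaction, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"{reaction}: expected {len(samples)} values, got {len(values)}"
            )
        fluxes[reaction] = dict(zip(samples, map(float, values)))
    return samples, fluxes


example = (
    "Reaction\tSample1\tSample2\tSample3\tControl1\tControl2\n"
    "R00001\t15.23\t-8.45\t22.1\t12.8\t14.2\n"
    "R00002\t0.0\t12.67\t-5.3\t8.9\t7.4\n"
)
samples, fluxes = read_flux_table(io.StringIO(example))
print(samples)
print(fluxes["R00001"]["Sample2"])
```

Raising on a row with the wrong number of values makes a misaligned export fail early rather than silently shifting samples.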
90
91 ### Sample Class File
92
93 Group assignment for statistical comparisons:
94
95 ```
96 Sample Class
97 Sample1 Treatment
98 Sample2 Treatment
99 Sample3 Treatment
100 Control1 Control
101 Control2 Control
102 ```
103
104 ### Multiple Dataset Format
105
106 When using multiple flux files, provide space-separated paths and corresponding names:
107
108 ```bash
109 -idsf "dataset1_flux.tsv dataset2_flux.tsv dataset3_flux.tsv"
110 -naf "Condition_A Condition_B Condition_C"
111 ```
112
113 ## Statistical Analysis
114
115 ### Comparison Types
116
117 #### manyvsmany
118 Compare all possible group pairs:
119 - Treatment vs Control
120 - Condition_A vs Condition_B
121 - Condition_A vs Condition_C
122 - Condition_B vs Condition_C
123
124 #### onevsrest
125 Compare each group against all others combined:
126 - Treatment vs (Control + Other)
127 - Control vs (Treatment + Other)
128
129 #### onevsmany
130 Compare one reference group against each other group:
131 - Control vs Treatment
132 - Control vs Condition_A
133 - Control vs Condition_B
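The three modes differ only in which group pairs are tested. The sketch below enumerates those pairs for a list of group names. The function names mirror the option values but are illustrative only, and the tool's internal ordering of pairs may differ.

```python
from itertools import combinations


def manyvsmany(groups):
    """Every unordered pair of groups."""
    return list(combinations(groups, 2))


def onevsrest(groups):
    """Each group against the pooled remainder."""
    return [(g, [other for other in groups if other != g]) for g in groups]


def onevsmany(groups, control):
    """One reference group against each other group in turn."""
    if control not in groups:
        raise ValueError(f"unknown control group: {control}")
    return [(control, g) for g in groups if g != control]


groups = ["Control", "Condition_A", "Condition_B"]
print(manyvsmany(groups))
print(onevsrest(groups))
print(onevsmany(groups, "Control"))
```

With n groups, `manyvsmany` yields n(n-1)/2 comparisons, `onevsrest` yields n, and `onevsmany` yields n-1, which is worth keeping in mind when judging the multiple-testing burden.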
134
135 ### Statistical Tests
136
137 | Test | Description | Best For |
138 |------|-------------|----------|
139 | `ks` | Kolmogorov-Smirnov | Non-parametric, distribution-free |
140 | `ttest_p` | Paired t-test | Related samples, normal distributions |
141 | `ttest_ind` | Independent t-test | Independent samples, normal distributions |
142 | `wilcoxon` | Wilcoxon signed-rank | Non-parametric paired comparisons |
143 | `mw` | Mann-Whitney U | Non-parametric independent comparisons |
144
145 ### Significance Assessment
146
147 Reactions are considered significant when:
148 1. **P-value** ≤ specified threshold (default: 0.1)
149 2. **Fold change** ≥ specified threshold (default: 1.5)
150 3. **FDR correction** (if enabled): the adjusted p-value, not the raw one, must still be ≤ the threshold
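The criteria above can be sketched as follows. This is a minimal illustration, not the tool's implementation. It assumes the Benjamini-Hochberg procedure for the FDR correction (the document does not say which method is used) and assumes each reaction carries a p-value and a fold change magnitude. The names `benjamini_hochberg` and `select_significant` are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_significant(results, p_threshold=0.1, fc_threshold=1.5, adjust=False):
    """results maps reaction ID -> (p-value, fold change magnitude)."""
    reactions = list(results)
    pvalues = [results[r][0] for r in reactions]
    if adjust:
        pvalues = benjamini_hochberg(pvalues)
    return [
        r
        for r, p in zip(reactions, pvalues)
        if p <= p_threshold and results[r][1] >= fc_threshold
    ]


results = {
    "R00001": (0.01, 2.0),
    "R00002": (0.04, 1.2),
    "R00003": (0.03, 3.0),
    "R00004": (0.50, 4.0),
}
print(select_significant(results, p_threshold=0.05))
print(select_significant(results, p_threshold=0.05, adjust=True))
```

In this toy input, `R00003` passes on its raw p-value but drops out once the correction is applied, which is the effect criterion 3 describes.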
151
152 ## Map Visualization
153
154 ### Built-in Maps
155
156 #### HMRcore (Default)
157 - **Scope**: Core human metabolic network
158 - **Reactions**: ~300 essential reactions
159 - **Coverage**: Central carbon, amino acid, lipid metabolism
160 - **Use Case**: General overview, publication figures
161
162 #### ENGRO2
163 - **Scope**: Extended human genome-scale reconstruction
164 - **Reactions**: ~2,000 reactions
165 - **Coverage**: Comprehensive metabolic network
166 - **Use Case**: Detailed analysis, specialized tissues
167
168 #### Custom Maps
169 User-provided SVG files with reaction elements:
170 ```xml
171 <rect id="R00001" class="reaction" fill="gray" stroke="black"/>
172 <path id="R00002" class="reaction" fill="gray" stroke="black"/>
173 ```
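Before running the tool with a custom map, it can help to confirm that the reaction IDs in the SVG match those in the flux data. The sketch below assumes, following the example above, that reaction elements carry `class="reaction"` and an `id` equal to the reaction ID. The helper name `reaction_ids_in_svg` is hypothetical.

```python
import xml.etree.ElementTree as ET


def reaction_ids_in_svg(svg_text):
    """Collect the id of every element whose class list includes 'reaction'."""
    root = ET.fromstring(svg_text)
    ids = set()
    for element in root.iter():
        if "reaction" in element.get("class", "").split():
            element_id = element.get("id")
            if element_id:
                ids.add(element_id)
    return ids


svg = """<svg xmlns="http://www.w3.org/2000/svg">
  <rect id="R00001" class="reaction" fill="gray" stroke="black"/>
  <path id="R00002" class="reaction" fill="gray" stroke="black"/>
  <text id="label1">Glycolysis</text>
</svg>"""

map_ids = reaction_ids_in_svg(svg)
data_ids = {"R00001", "R00003"}
print("in data but not on map:", sorted(data_ids - map_ids))
print("on map but not in data:", sorted(map_ids - data_ids))
```

`ET.fromstring` also raises `ParseError` on malformed XML, so the same call doubles as a basic structure check.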
174
175 ### Color Coding Scheme
176
177 #### Significance Colors
178 - **Red Gradient**: Significantly upregulated (positive fold change)
179 - **Blue Gradient**: Significantly downregulated (negative fold change)
180 - **Gray**: Not statistically significant
181 - **White**: No data available
182
183 #### Visual Elements
184 - **Line Width**: Proportional to fold change magnitude
185 - **Color Intensity**: Proportional to statistical significance (-log10 p-value)
186 - **Transparency**: Indicates confidence level
187
188 ### Color Maps
189
190 #### Jet (Default)
191 - High contrast color transitions
192 - Blue (low) → Green → Yellow → Red (high)
193 - Good for identifying extreme values
194
195 #### Viridis
196 - Perceptually uniform color scale
197 - Colorblind-friendly
198 - Purple (low) → Blue → Green → Yellow (high)
199
200 ## Output Files
201
202 ### Statistical Results
203 - `flux_statistics.tsv`: P-values, fold changes, test statistics for all reactions
204 - `significant_fluxes.tsv`: Only reactions meeting significance criteria
205 - `comparison_summary.txt`: Analysis parameters and summary statistics
206
207 ### Visualizations
208 - `flux_map.svg`: Interactive color-coded pathway map
209 - `flux_map.pdf`: High-resolution PDF (if requested)
210 - `flux_map.png`: Raster image (if requested)
211 - `legend.svg`: Color scale and statistical significance legend
212
213 ### Analysis Files
214 - `fold_changes.tsv`: Detailed fold change calculations
215 - `group_statistics.tsv`: Per-group summary statistics
216 - `comparison_matrix.tsv`: Pairwise comparison results
217
218 ## Examples
219
220 ### Basic Flux Comparison
221
222 ```bash
223 # Compare treatment vs control fluxes
224 flux_to_map -td /opt/COBRAxy \
225 -idf treatment_vs_control_fluxes.tsv \
226 -icf sample_groups.tsv \
227 -co manyvsmany \
228 -te ks \
229 -pv 0.05 \
230 -fc 2.0 \
231 -mc HMRcore \
232 -gs true \
233 -gp true \
234 -idop flux_comparison/
235 ```
236
237 ### Multiple Condition Analysis
238
239 ```bash
240 # Compare multiple experimental conditions
241 flux_to_map -td /opt/COBRAxy \
242 -idsf "cond1_flux.tsv cond2_flux.tsv cond3_flux.tsv" \
243 -naf "Control Treatment1 Treatment2" \
244 -co onevsrest \
245 -te wilcoxon \
246 -adj true \
247 -pv 0.01 \
248 -fc 1.8 \
249 -mc ENGRO2 \
250 -colorm viridis \
251 -idop multi_condition_flux/
252 ```
253
254 ### Custom Map Visualization
255
256 ```bash
257 # Use tissue-specific custom map
258 flux_to_map -td /opt/COBRAxy \
259 -idf liver_flux_data.tsv \
260 -icf liver_conditions.tsv \
261 -co manyvsmany \
262 -te ttest_ind \
263 -pv 0.05 \
264 -fc 1.5 \
265 -cm maps/liver_specific_map.svg \
266 -gs true \
267 -gp true \
268 -idop liver_flux_analysis/ \
269 -ol liver_analysis.log
270 ```
271
272 ### High-Throughput Analysis
273
274 ```bash
275 # Process multiple datasets with stringent criteria
276 flux_to_map -td /opt/COBRAxy \
277 -idsf "exp1.tsv exp2.tsv exp3.tsv exp4.tsv" \
278 -naf "Exp1 Exp2 Exp3 Exp4" \
279 -co manyvsmany \
280 -te ks \
281 -adj true \
282 -pv 0.001 \
283 -fc 3.0 \
284 -mc HMRcore \
285 -colorm jet \
286 -gs true \
287 -gp true \
288 -idop high_throughput_flux/
289 ```
290
291 ## Quality Control
292
293 ### Data Validation
294
295 #### Pre-analysis Checks
296 - Verify flux value distributions (check for outliers)
297 - Ensure sample names match between data and class files
298 - Validate reaction coverage across samples
299 - Check for missing values and their patterns
300
301 #### Statistical Validation
302 - Assess normality assumptions for parametric tests
303 - Verify adequate sample sizes per group (n≥3 recommended)
304 - Check variance homogeneity between groups
305 - Evaluate multiple testing burden
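Two of the checks above, matching sample names and adequate group sizes, are easy to automate. The following sketch is one possible way to do so. The function name `check_sample_design` and the report keys are hypothetical, and the default of 3 simply reflects the recommendation above.

```python
from collections import Counter


def check_sample_design(data_samples, sample_classes, min_group_size=3):
    """Compare flux-table sample names with the class file and count groups.

    data_samples: sample names from the flux table header.
    sample_classes: mapping of sample name -> group label from the class file.
    """
    data_set = set(data_samples)
    class_set = set(sample_classes)
    # Only samples that actually have flux data count toward group size.
    group_sizes = Counter(
        group for sample, group in sample_classes.items() if sample in data_set
    )
    return {
        "unlabeled_samples": sorted(data_set - class_set),
        "samples_without_data": sorted(class_set - data_set),
        "small_groups": sorted(
            g for g, n in group_sizes.items() if n < min_group_size
        ),
    }


report = check_sample_design(
    ["Sample1", "Sample2", "Sample3", "Control1", "Control2"],
    {
        "Sample1": "Treatment",
        "Sample2": "Treatment",
        "Sample3": "Treatment",
        "Control1": "Control",
        "Control2": "Control",
    },
)
print(report)
```

Run on the example files from the Input Formats section, this flags the `Control` group, which has only two samples.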
306
307 ### Result Interpretation
308
309 #### Biological Validation
310 - Compare results with known pathway activities
311 - Check for pathway coherence (related reactions should cluster)
312 - Validate against literature or experimental evidence
313 - Assess metabolic network connectivity
314
315 #### Technical Validation
316 - Compare results across different statistical tests
317 - Check sensitivity to parameter changes
318 - Validate fold change calculations
319 - Verify map element correspondence
320
321 ## Tips and Best Practices
322
323 ### Data Preparation
324 - **Normalization**: Ensure consistent flux units across samples
325 - **Filtering**: Remove reactions with excessive missing values (>50%)
326 - **Outlier Detection**: Identify and handle extreme flux values
327 - **Batch Effects**: Account for technical variation between experiments
328
329 ### Statistical Considerations
330 - Use FDR correction for multiple comparisons (`-adj true`)
331 - Choose appropriate statistical tests based on data distribution
332 - Consider effect size (fold change) alongside significance
333 - Validate results with independent datasets when possible
334
335 ### Visualization Optimization
336 - Select appropriate color maps for your audience
337 - Use high fold change thresholds (>2.0) for cleaner maps
338 - Export both SVG (editable) and PDF (publication) formats
339 - Include comprehensive legends and annotations
340
341 ### Performance Tips
342 - Use HMRcore for faster processing and clearer visualizations
343 - Reduce dataset size for initial exploratory analysis
344 - Process large datasets in batches if memory constrained
345 - Cache intermediate results for parameter optimization
346
347 ## Integration Workflow
348
349 ### Upstream Tools
350 - [Flux Simulation](flux-simulation.md) - Generate flux distributions for comparison
351 - [MAREA](marea.md) - Alternative analysis pathway for RAS/RPS data
352
353 ### Downstream Analysis
354 - Export results to statistical software (R, Python) for advanced analysis
355 - Integrate with pathway databases (KEGG, Reactome)
356 - Combine with other omics data for systems-level insights
357
358 ### Typical Pipeline
359
360 ```bash
361 # 1. Generate flux samples from constrained models
362 flux_simulation -td /opt/COBRAxy -ms ENGRO2 -in bounds/*.tsv \
363 -ni Sample1,Sample2,Control1,Control2 -a CBS \
364 -ot mean -idop fluxes/
365
366 # 2. Analyze and visualize flux differences
367 flux_to_map -td /opt/COBRAxy -idf fluxes/mean.csv \
368 -icf sample_groups.tsv -co manyvsmany -te ks \
369 -mc HMRcore -gs true -gp true -idop flux_maps/
370
371 # 3. Further analysis with custom scripts
372 python analyze_flux_results.py -i flux_maps/ -o final_results/
373 ```
374
375 ## Troubleshooting
376
377 ### Common Issues
378
379 **No significant reactions found**
380 - Relax (raise) the p-value threshold (`-pv 0.2`)
381 - Reduce fold change requirement (`-fc 1.2`)
382 - Check sample group definitions and sizes
383 - Verify flux data quality and normalization
384
385 **Map rendering problems**
386 - Check SVG map file integrity and format
387 - Verify reaction ID matching between data and map
388 - Ensure sufficient system memory for large maps
389 - Validate XML structure of custom maps
390
391 **Statistical test failures**
392 - Check data distribution assumptions
393 - Verify sufficient sample sizes per group
394 - Consider alternative non-parametric tests
395 - Examine variance patterns between groups
396
397 ### Error Messages
398
399 | Error | Cause | Solution |
400 |-------|-------|----------|
401 | "Map file not found" | Missing/invalid map path | Check file location and format |
402 | "No matching reactions" | ID mismatch between data and map | Verify reaction naming consistency |
403 | "Insufficient data" | Too few samples per group | Increase sample sizes or merge groups |
404 | "Memory allocation failed" | Large dataset/map combination | Reduce data size or increase system memory |
405
406 ### Performance Issues
407
408 **Slow processing**
409 - Use HMRcore instead of ENGRO2 for faster rendering
410 - Reduce dataset size for testing
411 - Process subsets of reactions separately
412 - Monitor system resource usage
413
414 **Large output files**
415 - Use compressed formats when possible
416 - Reduce map resolution for preliminary analysis
417 - Export only essential output formats
418 - Clean temporary files regularly
419
420 ## Advanced Usage
421
422 ### Custom Statistical Functions
423
424 Advanced users can implement custom statistical tests by modifying the analysis functions:
425
426 ```python
427 def custom_test(group1, group2):
428     # Custom statistical test implementation (your_test_function is a placeholder)
429     statistic, pvalue = your_test_function(group1, group2)
430     return statistic, pvalue
431 ```
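As one concrete instance of the template above, the sketch below implements a two-sided permutation test on the difference in group means, using only the standard library. It follows the same `(group1, group2) -> (statistic, pvalue)` shape. How such a function would be registered with the tool is not described here, so treat this as a standalone illustration; the name `permutation_test` and its parameters are assumptions.

```python
import random


def permutation_test(group1, group2, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n1, n2 = len(group1), len(group2)
    observed = abs(sum(group1) / n1 - sum(group2) / n2)
    pooled = list(group1) + list(group2)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2)
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    pvalue = (hits + 1) / (n_permutations + 1)
    return observed, pvalue


statistic, pvalue = permutation_test([1, 2, 3, 4, 5], [101, 102, 103, 104, 105])
print(statistic, pvalue)
```

A permutation test makes no distributional assumption, at the cost of a p-value resolution limited by the group sizes and the number of permutations.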
432
433 ### Batch Processing Script
434
435 Process multiple experiments systematically:
436
437 ```bash
438 #!/bin/bash
439 experiments=("exp1" "exp2" "exp3" "exp4")
440 for exp in "${experiments[@]}"; do
441 flux_to_map -td /opt/COBRAxy \
442 -idf "data/${exp}_flux.tsv" \
443 -icf "data/${exp}_classes.tsv" \
444 -co manyvsmany -te ks -pv 0.05 \
445 -mc HMRcore -gs true -gp true \
446 -idop "results/${exp}/"
447 done
448 ```
449
450 ### Result Aggregation
451
452 Combine results across multiple analyses:
453
454 ```bash
455 # Merge significant reactions across experiments
456 python merge_flux_results.py \
457 -i results/exp*/significant_fluxes.tsv \
458 -o combined_significant_reactions.tsv \
459 --method intersection
460 ```
461
462 ## See Also
463
464 - [Flux Simulation](flux-simulation.md) - Generate input flux distributions
465 - [MAREA](marea.md) - Alternative pathway analysis approach
466 - [Custom Map Creation Guide](../tutorials/custom-map-creation.md)
467 - [Statistical Methods Reference](../tutorials/statistical-methods.md)