view COBRAxy/docs/tools/flux-to-map.md @ 509:5956dcf94277 draft default tip

Uploaded
author francesco_lapi
date Wed, 01 Oct 2025 15:34:21 +0000
parents 4ed95023af20

# Flux to Map

Visualize metabolic flux data on pathway maps with statistical analysis and color coding.

## Overview

Flux to Map runs a statistical comparison of flux distributions between sample groups and renders the result on a metabolic pathway map. Reactions whose flux differs significantly between groups are highlighted by color and line width.

## Usage

### Command Line

```bash
flux_to_map -td /path/to/COBRAxy \
            --input_data_fluxes flux_data.tsv \
            --input_class_fluxes sample_groups.tsv \
            --comparison manyvsmany \
            --test ks \
            -pv 0.05 \
            -fc 1.5 \
            --choice_map ENGRO2 \
            --generate_svg true \
            --generate_pdf true \
            -idop flux_maps/
```

### Galaxy Interface

Select "Flux to Map" from the COBRAxy tool suite and configure flux analysis and visualization parameters.

## Parameters

### Required Parameters

| Parameter | Flag | Description |
|-----------|------|-------------|
| Tool Directory | `-td, --tool_dir` | Path to COBRAxy installation directory |

### Data Input Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Flux Data | `-idf, --input_data_fluxes` | Flux values TSV file | - |
| Flux Classes | `-icf, --input_class_fluxes` | Sample group labels for fluxes | - |
| Multiple Flux Files | `-idsf, --input_datas_fluxes` | Multiple flux datasets (space-separated) | - |
| Flux Names | `-naf, --names_fluxes` | Names for multiple flux datasets | - |
| Analysis Option | `-op, --option` | Analysis mode (datasets or dataset_class) | - |

### Statistical Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Comparison Type | `-co, --comparison` | Statistical comparison mode | manyvsmany |
| Statistical Test | `-te, --test` | Statistical test method | ks |
| P-Value Threshold | `-pv, --pValue` | Significance threshold | 0.1 |
| Adjusted P-values | `-adj, --adjusted` | Apply FDR correction | false |
| Fold Change | `-fc, --fChange` | Minimum fold change threshold | 1.5 |

### Visualization Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Map Choice | `-mc, --choice_map` | Built-in metabolic map | HMRcore |
| Custom Map | `-cm, --custom_map` | Path to custom SVG map | - |
| Generate SVG | `-gs, --generate_svg` | Create SVG output | true |
| Generate PDF | `-gp, --generate_pdf` | Create PDF output | true |
| Color Map | `-colorm, --color_map` | Color scheme (`jet` or `viridis`) | jet |
| Output Directory | `-idop, --output_path` | Results directory | result/ |

### Advanced Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Output Log | `-ol, --out_log` | Log file path | - |
| Control Sample | `-on, --control` | Control group identifier | - |

## Input Formats

### Flux Data File

Tab-separated format with reactions as rows and samples as columns:

```
Reaction	Sample1	Sample2	Sample3	Control1	Control2
R00001	15.23	-8.45	22.1	12.8	14.2
R00002	0.0	12.67	-5.3	8.9	7.4
R00003	45.8	38.2	51.7	42.1	39.8
R00004	-12.4	-15.8	-9.2	-11.5	-13.1
```
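
A file in this layout can be read with the Python standard library alone. The sketch below is illustrative and not part of COBRAxy: `read_flux_table` is a hypothetical helper name, and the inline text copies the first two rows of the example above.

```python
import csv
import io

def read_flux_table(handle):
    """Parse a flux TSV into (sample_names, {reaction_id: [flux, ...]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first column holds the reaction ID
    fluxes = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        fluxes[row[0]] = [float(value) for value in row[1:]]
    return samples, fluxes

# Inline copy of the header and first two rows of the example table
example = (
    "Reaction\tSample1\tSample2\tSample3\tControl1\tControl2\n"
    "R00001\t15.23\t-8.45\t22.1\t12.8\t14.2\n"
    "R00002\t0.0\t12.67\t-5.3\t8.9\t7.4\n"
)
samples, fluxes = read_flux_table(io.StringIO(example))
print(samples)
print(fluxes["R00002"])
```

To read a real file, pass `open("flux_data.tsv", newline="")` instead of the `StringIO` object.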

### Sample Class File

Group assignment for statistical comparisons:

```
Sample	Class
Sample1	Treatment
Sample2	Treatment
Sample3	Treatment
Control1	Control
Control2	Control
```
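
Mismatched sample names between the two files are easy to miss, so a quick check before running the tool can help. The following is a minimal sketch, not COBRAxy code: `check_sample_names` is a hypothetical helper, and stripping whitespace from names and labels is an assumption about how you want stray spaces handled.

```python
def check_sample_names(flux_samples, class_rows):
    """Return (missing_in_classes, missing_in_data) as sorted lists."""
    # Strip whitespace so a label with trailing spaces still matches
    classes = {name.strip(): label.strip() for name, label in class_rows}
    data = {name.strip() for name in flux_samples}
    missing_in_classes = sorted(data - set(classes))
    missing_in_data = sorted(set(classes) - data)
    return missing_in_classes, missing_in_data

flux_samples = ["Sample1", "Sample2", "Sample3", "Control1", "Control2"]
class_rows = [
    ("Sample1", "Treatment"),
    ("Sample2", "Treatment  "),
    ("Sample3", "Treatment"),
    ("Control1", "Control"),
]
missing_in_classes, missing_in_data = check_sample_names(flux_samples, class_rows)
print(missing_in_classes)  # samples with no class label
print(missing_in_data)     # labelled samples absent from the flux table
```

Here `Control2` has flux values but no class label, so it shows up in the first list.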

### Multiple Dataset Format

When using multiple flux files, provide space-separated paths and corresponding names:

```bash
-idsf "dataset1_flux.tsv dataset2_flux.tsv dataset3_flux.tsv"
-naf "Condition_A Condition_B Condition_C"
```

## Statistical Analysis

### Comparison Types

#### manyvsmany
Compare all possible group pairs:
- Treatment vs Control
- Condition_A vs Condition_B
- Condition_A vs Condition_C
- Condition_B vs Condition_C

#### onevsrest
Compare each group against all others combined:
- Treatment vs (Control + Other)
- Control vs (Treatment + Other)

#### onevsmany
Compare one reference group (the control, set with `-on, --control`) against each other group:
- Control vs Treatment
- Control vs Condition_A
- Control vs Condition_B
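
The three modes differ only in which group pairs they produce. The sketch below enumerates those pairs for a list of group names. It is an illustration of the descriptions above, not the tool's implementation, and `comparison_pairs` is a hypothetical function name.

```python
from itertools import combinations

def comparison_pairs(groups, mode, control=None):
    """List the (left, right) comparisons implied by each mode."""
    if mode == "manyvsmany":
        # Every unordered pair of groups
        return list(combinations(groups, 2))
    if mode == "onevsrest":
        # Each group against all remaining groups pooled together
        return [(g, "rest") for g in groups]
    if mode == "onevsmany":
        if control not in groups:
            raise ValueError("onevsmany needs a control group present in groups")
        return [(control, g) for g in groups if g != control]
    raise ValueError(f"unknown comparison mode: {mode}")

groups = ["Control", "Condition_A", "Condition_B"]
print(comparison_pairs(groups, "manyvsmany"))
print(comparison_pairs(groups, "onevsrest"))
print(comparison_pairs(groups, "onevsmany", control="Control"))
```

With `n` groups, `manyvsmany` yields `n(n-1)/2` comparisons, so the multiple-testing burden grows quickly as groups are added.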

### Statistical Tests

| Test | Description | Best For |
|------|-------------|----------|
| `ks` | Kolmogorov-Smirnov | Non-parametric, distribution-free |
| `ttest_p` | Paired t-test | Related samples, normal distributions |
| `ttest_ind` | Independent t-test | Independent samples, normal distributions |
| `wilcoxon` | Wilcoxon signed-rank | Non-parametric paired comparisons |
| `mw` | Mann-Whitney U | Non-parametric independent comparisons |

### Significance Assessment

Reactions are considered significant when:
1. **P-value** ≤ specified threshold (default: 0.1)
2. **Fold change** ≥ specified threshold (default: 1.5)
3. **FDR correction** (if enabled with `-adj true`): the adjusted p-value, rather than the raw one, still meets the threshold
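
The criteria above can be sketched as follows. This is an assumption-laden illustration rather than the tool's code: it uses the Benjamini-Hochberg procedure as the FDR correction, treats fold change as a positive ratio, and counts a decrease as significant when its reciprocal passes the threshold. The function names and example numbers are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def is_significant(pvalue, fold_change, p_threshold=0.1, fc_threshold=1.5):
    """Apply both thresholds; fold_change is assumed to be a positive ratio."""
    if fold_change <= 0:
        raise ValueError("this sketch expects a positive fold-change ratio")
    # Treat a decrease (ratio below 1) symmetrically with an increase
    magnitude = max(fold_change, 1.0 / fold_change)
    return pvalue <= p_threshold and magnitude >= fc_threshold

pvalues = [0.01, 0.04, 0.03, 0.20]      # made-up raw p-values
fold_changes = [2.0, 1.2, 0.5, 3.0]     # made-up fold-change ratios
adjusted = benjamini_hochberg(pvalues)
flags = [is_significant(p, fc) for p, fc in zip(adjusted, fold_changes)]
print(adjusted)
print(flags)
```

Note that the fourth reaction has the largest fold change but fails on p-value, and the second passes on p-value but fails on fold change: both criteria must hold.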

## Map Visualization

### Built-in Maps

#### HMRcore (Default)
- **Scope**: Core human metabolic network
- **Reactions**: ~300 essential reactions
- **Coverage**: Central carbon, amino acid, lipid metabolism
- **Use Case**: General overview, publication figures

#### ENGRO2
- **Scope**: Extended human genome-scale reconstruction
- **Reactions**: ~2,000 reactions
- **Coverage**: Comprehensive metabolic network
- **Use Case**: Detailed analysis, specialized tissues

#### Custom Maps
User-provided SVG files with reaction elements:
```xml
<rect id="R00001" class="reaction" fill="gray" stroke="black"/>
<path id="R00002" class="reaction" fill="gray" stroke="black"/>
```
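
Before using a custom map, it can be worth checking that its element IDs match the reaction IDs in your flux table. The sketch below assumes the convention shown above (reaction elements carry `class="reaction"` and an `id` equal to the reaction ID). `reaction_ids_in_svg` is a hypothetical helper and the SVG text is a toy example.

```python
import xml.etree.ElementTree as ET

def reaction_ids_in_svg(svg_text):
    """Collect the id of every element whose class list includes 'reaction'."""
    root = ET.fromstring(svg_text)
    ids = set()
    for element in root.iter():
        classes = element.get("class", "").split()
        if "reaction" in classes and element.get("id"):
            ids.add(element.get("id"))
    return ids

svg_text = """<svg xmlns="http://www.w3.org/2000/svg">
  <rect id="R00001" class="reaction" fill="gray" stroke="black"/>
  <path id="R00002" class="reaction" fill="gray" stroke="black"/>
  <text id="title">Example map</text>
</svg>"""

map_ids = reaction_ids_in_svg(svg_text)
flux_ids = {"R00001", "R00003"}  # reaction IDs taken from a flux table
print(sorted(flux_ids - map_ids))  # reactions with flux data but no map element
print(sorted(map_ids - flux_ids))  # map elements with no flux data
```

`ET.fromstring` also raises `ParseError` on malformed XML, which doubles as a basic integrity check for the map file.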

### Color Coding Scheme

#### Significance Colors
- **Red Gradient**: Significantly upregulated (positive fold change)
- **Blue Gradient**: Significantly downregulated (negative fold change)
- **Gray**: Not statistically significant
- **White**: No data available

#### Visual Elements
- **Line Width**: Proportional to fold change magnitude
- **Color Intensity**: Proportional to statistical significance (-log10 p-value)
- **Transparency**: Indicates confidence level
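
One way to picture how statistics turn into visual attributes is a small mapping function. Everything in the sketch below is hypothetical: the scaling constants, the caps, the use of log2 fold change, and the attribute names are illustrative choices, not the values COBRAxy applies.

```python
import math

def reaction_style(pvalue, log2_fold_change, max_width=10.0, max_neg_log_p=5.0):
    """Map one reaction's statistics to hypothetical SVG style values."""
    # Stroke width grows with |log2 fold change|, capped at max_width
    width = min(max_width, 1.0 + 2.0 * abs(log2_fold_change))
    # Intensity in [0, 1] grows with -log10(p), capped at max_neg_log_p
    neg_log_p = -math.log10(max(pvalue, 1e-300))
    intensity = min(1.0, neg_log_p / max_neg_log_p)
    hue = "red" if log2_fold_change > 0 else "blue"
    return {"stroke": hue, "stroke-width": width, "stroke-opacity": round(intensity, 3)}

style_up = reaction_style(0.001, 1.0)      # strong increase
style_down = reaction_style(0.05, -3.0)    # weaker evidence, larger decrease
print(style_up)
print(style_down)
```

Capping both scales keeps a single extreme reaction from visually swamping the rest of the map.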

### Color Maps

#### Jet (Default)
- High contrast color transitions
- Blue (low) → Green → Yellow → Red (high)
- Good for identifying extreme values

#### Viridis
- Perceptually uniform color scale
- Colorblind-friendly
- Purple (low) → Blue → Green → Yellow (high)

## Output Files

### Statistical Results
- `flux_statistics.tsv`: P-values, fold changes, test statistics for all reactions
- `significant_fluxes.tsv`: Only reactions meeting significance criteria
- `comparison_summary.txt`: Analysis parameters and summary statistics

### Visualizations
- `flux_map.svg`: Interactive color-coded pathway map
- `flux_map.pdf`: High-resolution PDF (if requested)
- `flux_map.png`: Raster image (if requested)
- `legend.svg`: Color scale and statistical significance legend

### Analysis Files
- `fold_changes.tsv`: Detailed fold change calculations
- `group_statistics.tsv`: Per-group summary statistics
- `comparison_matrix.tsv`: Pairwise comparison results

## Examples

### Basic Flux Comparison

```bash
# Compare treatment vs control fluxes
flux_to_map -td /opt/COBRAxy \
            -idf treatment_vs_control_fluxes.tsv \
            -icf sample_groups.tsv \
            -co manyvsmany \
            -te ks \
            -pv 0.05 \
            -fc 2.0 \
            -mc HMRcore \
            -gs true \
            -gp true \
            -idop flux_comparison/
```

### Multiple Condition Analysis

```bash
# Compare multiple experimental conditions
flux_to_map -td /opt/COBRAxy \
            -idsf "cond1_flux.tsv cond2_flux.tsv cond3_flux.tsv" \
            -naf "Control Treatment1 Treatment2" \
            -co onevsrest \
            -te wilcoxon \
            -adj true \
            -pv 0.01 \
            -fc 1.8 \
            -mc ENGRO2 \
            -colorm viridis \
            -idop multi_condition_flux/
```

### Custom Map Visualization

```bash
# Use tissue-specific custom map
flux_to_map -td /opt/COBRAxy \
            -idf liver_flux_data.tsv \
            -icf liver_conditions.tsv \
            -co manyvsmany \
            -te ttest_ind \
            -pv 0.05 \
            -fc 1.5 \
            -cm maps/liver_specific_map.svg \
            -gs true \
            -gp true \
            -idop liver_flux_analysis/ \
            -ol liver_analysis.log
```

### High-Throughput Analysis

```bash
# Process multiple datasets with stringent criteria
flux_to_map -td /opt/COBRAxy \
            -idsf "exp1.tsv exp2.tsv exp3.tsv exp4.tsv" \
            -naf "Exp1 Exp2 Exp3 Exp4" \
            -co manyvsmany \
            -te ks \
            -adj true \
            -pv 0.001 \
            -fc 3.0 \
            -mc HMRcore \
            -colorm jet \
            -gs true \
            -gp true \
            -idop high_throughput_flux/
```

## Quality Control

### Data Validation

#### Pre-analysis Checks
- Verify flux value distributions (check for outliers)
- Ensure sample names match between data and class files
- Validate reaction coverage across samples
- Check for missing values and their patterns

#### Statistical Validation
- Assess normality assumptions for parametric tests
- Verify adequate sample sizes per group (n≥3 recommended)
- Check variance homogeneity between groups
- Evaluate multiple testing burden

### Result Interpretation

#### Biological Validation
- Compare results with known pathway activities
- Check for pathway coherence (related reactions should cluster)
- Validate against literature or experimental evidence
- Assess metabolic network connectivity

#### Technical Validation
- Compare results across different statistical tests
- Check sensitivity to parameter changes
- Validate fold change calculations
- Verify map element correspondence

## Tips and Best Practices

### Data Preparation
- **Normalization**: Ensure consistent flux units across samples
- **Filtering**: Remove reactions with excessive missing values (>50%)
- **Outlier Detection**: Identify and handle extreme flux values
- **Batch Effects**: Account for technical variation between experiments
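
The filtering step can be done before the data reaches the tool. The sketch below drops reactions in which more than half of the values are missing. It assumes missing values are represented as `None` or NaN, and `filter_reactions` is a hypothetical helper name.

```python
import math

def filter_reactions(fluxes, max_missing_fraction=0.5):
    """Keep reactions whose fraction of missing values is within the limit."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    kept = {}
    for reaction, values in fluxes.items():
        missing = sum(1 for v in values if is_missing(v))
        if values and missing / len(values) <= max_missing_fraction:
            kept[reaction] = values
    return kept

fluxes = {
    "R00001": [15.23, -8.45, 22.1, 12.8],
    "R00002": [None, float("nan"), None, 8.9],   # 75% missing, dropped
    "R00003": [45.8, None, 51.7, None],          # exactly 50% missing, kept
}
kept = filter_reactions(fluxes)
print(sorted(kept))
```

The comparison is `<=`, so a reaction at exactly the 50% limit is kept. Tighten `max_missing_fraction` if groups are small.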

### Statistical Considerations
- Use FDR correction for multiple comparisons (`-adj true`)
- Choose appropriate statistical tests based on data distribution
- Consider effect size (fold change) alongside significance
- Validate results with independent datasets when possible

### Visualization Optimization
- Select appropriate color maps for your audience
- Use high fold change thresholds (>2.0) for cleaner maps
- Export both SVG (editable) and PDF (publication) formats
- Include comprehensive legends and annotations

### Performance Tips
- Use HMRcore for faster processing and clearer visualizations
- Reduce dataset size for initial exploratory analysis
- Process large datasets in batches if memory constrained
- Cache intermediate results for parameter optimization

## Integration Workflow

### Upstream Tools
- [Flux Simulation](flux-simulation.md) - Generate flux distributions for comparison
- [MAREA](marea.md) - Alternative analysis pathway for RAS/RPS data

### Downstream Analysis
- Export results to statistical software (R, Python) for advanced analysis
- Integrate with pathway databases (KEGG, Reactome)
- Combine with other omics data for systems-level insights

### Typical Pipeline

```bash
# 1. Generate flux samples from constrained models
flux_simulation -td /opt/COBRAxy -ms ENGRO2 -in bounds/*.tsv \
                -ni Sample1,Sample2,Control1,Control2 -a CBS \
                -ot mean -idop fluxes/

# 2. Analyze and visualize flux differences
flux_to_map -td /opt/COBRAxy -idf fluxes/mean.csv \
            -icf sample_groups.tsv -co manyvsmany -te ks \
            -mc ENGRO2 -gs true -gp true -idop flux_maps/

# 3. Further analysis with custom scripts
python analyze_flux_results.py -i flux_maps/ -o final_results/
```

## Troubleshooting

### Common Issues

**No significant reactions found**
- Relax the p-value threshold by raising it (e.g. `-pv 0.2`)
- Reduce fold change requirement (`-fc 1.2`)
- Check sample group definitions and sizes
- Verify flux data quality and normalization

**Map rendering problems**
- Check SVG map file integrity and format
- Verify reaction ID matching between data and map
- Ensure sufficient system memory for large maps
- Validate XML structure of custom maps

**Statistical test failures**
- Check data distribution assumptions
- Verify sufficient sample sizes per group
- Consider alternative non-parametric tests
- Examine variance patterns between groups

### Error Messages

| Error | Cause | Solution |
|-------|-------|----------|
| "Map file not found" | Missing/invalid map path | Check file location and format |
| "No matching reactions" | ID mismatch between data and map | Verify reaction naming consistency |
| "Insufficient data" | Too few samples per group | Increase sample sizes or merge groups |
| "Memory allocation failed" | Large dataset/map combination | Reduce data size or increase system memory |

### Performance Issues

**Slow processing**
- Use HMRcore instead of ENGRO2 for faster rendering
- Reduce dataset size for testing
- Process subsets of reactions separately
- Monitor system resource usage

**Large output files**
- Use compressed formats when possible
- Reduce map resolution for preliminary analysis
- Export only essential output formats
- Clean temporary files regularly

## Advanced Usage

### Custom Statistical Functions

Advanced users can implement custom statistical tests by modifying the analysis functions:

```python
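from scipy import stats

def custom_test(group1, group2):
    """Return (statistic, pvalue) for two sequences of flux values."""
    # Example only: Welch's t-test; swap in any two-sample test you need
    statistic, pvalue = stats.ttest_ind(group1, group2, equal_var=False)
    return statistic, pvalue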
```

### Batch Processing Script

Process multiple experiments systematically:

```bash
#!/bin/bash
experiments=("exp1" "exp2" "exp3" "exp4")
for exp in "${experiments[@]}"; do
    flux_to_map -td /opt/COBRAxy \
                -idf "data/${exp}_flux.tsv" \
                -icf "data/${exp}_classes.tsv" \
                -co manyvsmany -te ks -pv 0.05 \
                -mc HMRcore -gs true -gp true \
                -idop "results/${exp}/"
done
```

### Result Aggregation

Combine results across multiple analyses:

```bash
# Merge significant reactions across experiments
python merge_flux_results.py \
    -i results/exp*/significant_fluxes.tsv \
    -o combined_significant_reactions.tsv \
    --method intersection
```
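
The `intersection` method amounts to keeping the reactions that are significant in every experiment. A minimal sketch of that idea follows. It assumes the reaction ID is in the first column of each `significant_fluxes.tsv` and that each file has a header row. The function names and the inline tables are hypothetical.

```python
import csv
import io

def significant_ids(handle):
    """Read reaction IDs from the first column of a TSV, skipping the header."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # header row
    return {row[0] for row in reader if row}

def intersect_significant(handles):
    """Return reactions that appear in every input table."""
    id_sets = [significant_ids(h) for h in handles]
    return set.intersection(*id_sets) if id_sets else set()

# Toy stand-ins for results/exp1/... and results/exp2/...
exp1 = io.StringIO("Reaction\tpvalue\nR00001\t0.01\nR00003\t0.02\n")
exp2 = io.StringIO("Reaction\tpvalue\nR00003\t0.03\nR00004\t0.04\n")
common = intersect_significant([exp1, exp2])
print(sorted(common))
```

Replacing `set.intersection` with `set.union` gives the reactions significant in at least one experiment instead.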

## See Also

- [Flux Simulation](flux-simulation.md) - Generate input flux distributions
- [MAREA](marea.md) - Alternative pathway analysis approach
- [Custom Map Creation Guide](../tutorials/custom-map-creation.md)
- [Statistical Methods Reference](../tutorials/statistical-methods.md)