comparison COBRAxy/docs/getting-started.md @ 492:4ed95023af20 draft

Uploaded
author francesco_lapi
date Tue, 30 Sep 2025 14:02:17 +0000
491:7a413a5ec566 492:4ed95023af20
# Getting Started

Welcome to COBRAxy! This guide will help you get up and running with metabolic flux analysis.

## What is COBRAxy?

COBRAxy is a comprehensive toolkit for metabolic flux analysis that bridges the gap between omics data and biological insights. It provides:

- **Data Integration**: Combine gene expression and metabolite data
- **Metabolic Modeling**: Use constraint-based models for flux analysis
- **Visualization**: Generate interactive pathway maps
- **Statistical Analysis**: Perform enrichment and sensitivity analysis

## Core Concepts

### Reaction Activity Scores (RAS)
RAS estimate how active each metabolic reaction is from gene expression data. COBRAxy computes RAS by:
1. Mapping genes to reactions via GPR (Gene-Protein-Reaction) rules
2. Evaluating the AND/OR logic of each rule (AND for the subunits of an enzyme complex, OR for alternative isoenzymes)
3. Producing an activity score for each reaction in each sample
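
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not COBRAxy's implementation: the nested-tuple rule format, the `evaluate_gpr` helper, and the choice of `min` for AND and `sum` for OR are assumptions made for the example. Check the tool reference for the rule COBRAxy actually applies.

```python
# Hypothetical GPR representation: a gene ID string, or a tuple
# ("and" | "or", operand, operand, ...). Not COBRAxy's internal format.

def evaluate_gpr(rule, expression):
    """Return an activity score for one reaction in one sample."""
    if isinstance(rule, str):
        # Leaf: look up the gene's expression value.
        return expression[rule]
    operator, *operands = rule
    values = [evaluate_gpr(operand, expression) for operand in operands]
    if operator == "and":
        # Enzyme complex: limited by the least expressed subunit (assumed rule).
        return min(values)
    if operator == "or":
        # Isoenzymes: contributions add up (assumed rule).
        return sum(values)
    raise ValueError(f"unknown operator: {operator!r}")


# Expression values for a single sample (illustrative numbers).
sample = {"HGNC:5": 12.5, "HGNC:10": 3.2, "HGNC:15": 7.9}

# (HGNC:5 AND HGNC:10) OR HGNC:15
rule = ("or", ("and", "HGNC:5", "HGNC:10"), "HGNC:15")
ras = evaluate_gpr(rule, sample)
print(f"RAS = {ras:.2f}")
```

Repeating the call for every reaction and every sample column would give the reaction × sample score table described in step 3.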

### Reaction Propensity Scores (RPS)
RPS indicate metabolic preferences based on metabolite abundance. COBRAxy computes RPS by:
1. Mapping metabolites to the reactions that use them as substrates or products
2. Weighting them by stoichiometry and frequency
3. Computing propensity scores using log-normalized formulas
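
As a rough illustration of these steps, the sketch below scores a reaction from the abundances of its substrates, each weighted by its stoichiometric coefficient, and accumulates the result in log space. The function name, the dictionary layout, and the formula itself are assumptions for illustration only. COBRAxy's own weighting, including the frequency term, may differ.

```python
import math

def log_propensity(substrates, abundance):
    """Sum of stoichiometry-weighted log abundances for one reaction.

    substrates: metabolite name -> stoichiometric coefficient (hypothetical layout)
    abundance:  metabolite name -> measured abundance in one sample
    """
    return sum(coeff * math.log(abundance[name])
               for name, coeff in substrates.items())

# Abundances for a single sample (illustrative numbers).
sample = {"glucose": 100.5, "pyruvate": 45.2, "lactate": 23.9}

# Toy reaction consuming 1 pyruvate (cofactors omitted for brevity).
one_pyruvate = {"pyruvate": 1}
# Toy reaction consuming 2 lactate, to show the stoichiometric weighting.
two_lactate = {"lactate": 2}

score_a = log_propensity(one_pyruvate, sample)
score_b = log_propensity(two_lactate, sample)
print(f"{score_a:.3f} {score_b:.3f}")
```

Working in log space keeps the score additive across substrates, so a coefficient of 2 simply doubles that metabolite's contribution.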

### Flux Sampling
Sample feasible flux distributions using:
- **CBS (Corner-Based Sampling)**: Fast sampling of the corners (vertices) of the feasible flux region
- **OptGP (Optimized General Parallel sampler)**: Parallel sampling of the interior of the feasible flux region
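
To make "sampling feasible flux distributions" concrete, here is a plain hit-and-run sampler on a toy two-flux region (v1 ≥ 0, v2 ≥ 0, v1 + v2 ≤ 10, as if two parallel reactions shared an uptake limit of 10). It is neither of the two algorithms listed above. The region, function name, and parameters are invented for illustration, and real models also carry steady-state constraints that this sketch leaves out.

```python
import math
import random

# Toy feasible region written as A·v <= b (hypothetical example):
#   -v1 <= 0, -v2 <= 0, v1 + v2 <= 10
A = [(-1.0, 0.0), (0.0, -1.0), (1.0, 1.0)]
b = [0.0, 0.0, 10.0]

def hit_and_run(start, n_samples, rng):
    """Draw points from {v : A·v <= b} by repeated random line searches."""
    point = list(start)
    samples = []
    for _ in range(n_samples):
        # Pick a uniformly random direction on the unit circle.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        direction = (math.cos(angle), math.sin(angle))
        # Find the segment [t_min, t_max] of the line that stays feasible.
        t_min, t_max = -math.inf, math.inf
        for row, limit in zip(A, b):
            slope = row[0] * direction[0] + row[1] * direction[1]
            slack = limit - (row[0] * point[0] + row[1] * point[1])
            if slope > 1e-12:
                t_max = min(t_max, slack / slope)
            elif slope < -1e-12:
                t_min = max(t_min, slack / slope)
        # Move to a uniformly chosen point on that segment.
        step = rng.uniform(t_min, t_max)
        point = [point[0] + step * direction[0], point[1] + step * direction[1]]
        samples.append(tuple(point))
    return samples

samples = hit_and_run(start=(1.0, 1.0), n_samples=1000, rng=random.Random(0))
print(len(samples), "feasible flux vectors sampled")
```

Every sample satisfies the bounds by construction, which is the property that matters when the samples are later compared between conditions.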

## Analysis Workflows

COBRAxy supports two main analysis paths:

### 1. Enrichment Analysis Workflow
```text
# Generate activity (RAS) and propensity (RPS) scores
ras_generator → RAS values
rps_generator → RPS values

# Statistical enrichment analysis
marea → Enriched pathway maps
```

**Use when**: You want to identify significantly altered pathways and create publication-ready maps.

### 2. Flux Simulation Workflow
```text
# Apply constraints to model
ras_generator → RAS values
ras_to_bounds → Constrained model

# Sample flux distributions
flux_simulation → Flux samples
flux_to_map → Final visualizations
```

**Use when**: You want to predict metabolic flux distributions and study network-wide changes.

## Your First Analysis

Let's run a basic analysis with sample data:

### Step 1: Prepare Your Data

You'll need:
- **Gene expression data**: TSV file with genes (rows) × samples (columns)
- **Metabolic model**: SBML file, or one of the built-in models (ENGRO2, Recon)
- **Metabolite data** (optional): TSV file with metabolites (rows) × samples (columns)

### Step 2: Generate Activity Scores

```bash
# Generate RAS from expression data
# (quoting "$(pwd)" keeps paths containing spaces intact)
ras_generator -td "$(pwd)" \
  -in expression_data.tsv \
  -ra ras_output.tsv \
  -rs ENGRO2
```

### Step 3: Create Pathway Maps

```bash
# Generate enriched pathway maps from the RAS computed in Step 2
marea -td "$(pwd)" \
  -using_RAS true \
  -input_data ras_output.tsv \
  -choice_map ENGRO2 \
  -gs true \
  -idop pathway_maps
```
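
If you prefer to drive Steps 2 and 3 from a script, the two commands can be assembled as argument lists and passed to `subprocess.run`. The sketch below reuses only the flags shown in the two steps. The helper names (`ras_command`, `marea_command`, `run_pipeline`) are made up for this example, and it assumes both tools are available on your `PATH`.

```python
import subprocess
from pathlib import Path

def ras_command(workdir, expression, output, rules="ENGRO2"):
    """Argument list for the Step 2 command."""
    return ["ras_generator", "-td", str(workdir),
            "-in", str(expression), "-ra", str(output), "-rs", rules]

def marea_command(workdir, ras_file, out_dir, choice_map="ENGRO2"):
    """Argument list for the Step 3 command."""
    return ["marea", "-td", str(workdir), "-using_RAS", "true",
            "-input_data", str(ras_file), "-choice_map", choice_map,
            "-gs", "true", "-idop", str(out_dir)]

def run_pipeline(workdir=None):
    """Run both steps in order, stopping at the first failure."""
    workdir = Path.cwd() if workdir is None else Path(workdir)
    for command in (
        ras_command(workdir, "expression_data.tsv", "ras_output.tsv"),
        marea_command(workdir, "ras_output.tsv", "pathway_maps"),
    ):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)

# Inspect the Step 2 command without executing anything.
print(" ".join(ras_command(".", "expression_data.tsv", "ras_output.tsv")))
```

Passing a list rather than a shell string avoids quoting problems when the working directory contains spaces.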

### Step 4: View Results

Your analysis will generate:
- **RAS values**: `ras_output.tsv` - Activity scores for each reaction
- **Statistical maps**: `pathway_maps/` - SVG files with enrichment visualization
- **Log files**: Detailed execution logs for troubleshooting
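
A quick way to confirm that the run produced something is to count the rows of the RAS table and list the generated maps. The sketch below assumes that `ras_output.tsv` starts with a single header line and that the SVG files sit directly inside `pathway_maps/`. Both are assumptions about the output layout, and `summarize_outputs` is a hypothetical helper, not part of COBRAxy.

```python
from pathlib import Path

def summarize_outputs(ras_file, maps_dir):
    """Return (number of data rows in the RAS table, sorted SVG file names)."""
    with Path(ras_file).open() as handle:
        lines = handle.read().splitlines()
    # Skip the header line and ignore blank lines.
    n_rows = sum(1 for line in lines[1:] if line.strip())
    svg_names = sorted(path.name for path in Path(maps_dir).glob("*.svg"))
    return n_rows, svg_names

# Only report when the files from the steps above are actually present.
if Path("ras_output.tsv").exists() and Path("pathway_maps").is_dir():
    rows, maps = summarize_outputs("ras_output.tsv", "pathway_maps")
    print(f"{rows} scored rows, {len(maps)} SVG maps")
```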

## Built-in Models

COBRAxy includes ready-to-use metabolic models:

| Model | Organism | Reactions | Genes | Description |
|-------|----------|-----------|-------|-------------|
| **ENGRO2** | Human | ~2,000 | ~500 | Focused human metabolism model |
| **Recon** | Human | ~10,000 | ~2,000 | Comprehensive human metabolism |

Models are stored in the `local/` directory and include:
- SBML files
- GPR rules
- Gene mapping tables
- Pathway templates

## Data Formats

### Gene Expression Format
```tsv
Gene_ID	Sample_1	Sample_2	Sample_3
HGNC:5	12.5	8.3	15.7
HGNC:10	3.2	4.1	2.8
HGNC:15	7.9	11.2	6.4
```
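
Before running a tool, it can help to check that a file really follows this layout. The reader below is a small sketch using only the standard library. `read_expression_table` is a hypothetical helper, and the assumption that values are tab-separated floats with one header row comes from the example above rather than from a formal specification.

```python
import csv
import io

def read_expression_table(handle):
    """Parse a genes x samples TSV into {gene_id: {sample: value}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene_id, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"{gene_id}: expected {len(samples)} values, got {len(values)}")
        # float() raises ValueError on non-numeric cells, flagging bad input early.
        table[gene_id] = dict(zip(samples, map(float, values)))
    return table

example = (
    "Gene_ID\tSample_1\tSample_2\tSample_3\n"
    "HGNC:5\t12.5\t8.3\t15.7\n"
    "HGNC:10\t3.2\t4.1\t2.8\n"
)
table = read_expression_table(io.StringIO(example))
print(sorted(table))
```

For a file on disk, pass `open("expression_data.tsv", newline="")` instead of the `io.StringIO` object.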

### Metabolite Format
```tsv
Metabolite_ID	Sample_1	Sample_2	Sample_3
glucose	100.5	85.3	120.7
pyruvate	45.2	38.1	52.8
lactate	23.9	41.2	19.4
```

## Command Line vs Python API

COBRAxy offers two usage modes:

### Command Line (Quick Analysis)
```bash
# Simple command-line execution
ras_generator -td "$(pwd)" -in data.tsv -ra output.tsv -rs ENGRO2
```

### Python API (Programming)
```python
import ras_generator

# Call the main function with the same arguments as on the command line
ras_generator.main(['-td', '/path', '-in', 'data.tsv', '-ra', 'output.tsv', '-rs', 'ENGRO2'])
```

## Next Steps

Now that you understand the basics:

1. **[Quick Start Guide](quickstart.md)** - Complete walkthrough with example data
2. **[Python API Tutorial](tutorials/python-api.md)** - Learn programmatic usage
3. **[Tools Reference](tools/)** - Detailed documentation for each tool
4. **[Examples](examples/)** - Real-world analysis examples

## Need Help?

- **[Troubleshooting](troubleshooting.md)** - Common issues and solutions
- **[GitHub Issues](https://github.com/CompBtBs/COBRAxy/issues)** - Report bugs or ask questions
- **[Contributing](contributing.md)** - Help improve COBRAxy