annotate COBRAxy/docs/tools/ras-generator.md @ 538:fd53d42348bd draft

Uploaded
author francesco_lapi
date Sat, 25 Oct 2025 11:39:03 +0000
parents 4ed95023af20
children fcdbc81feb45
# RAS Generator

Generate Reaction Activity Scores (RAS) from gene expression data and GPR (Gene-Protein-Reaction) rules.

## Overview

The RAS Generator computes metabolic reaction activity by:
1. Mapping gene expression to reactions via GPR rules
2. Applying logical operations (AND for enzyme complexes, OR for alternative enzymes)
3. Producing activity scores for each reaction in each sample

**Input**: Gene expression data + GPR rules
**Output**: Reaction activity scores (RAS)

## Parameters

### Required Parameters

| Parameter | Short | Type | Description |
|-----------|--------|------|-------------|
| `--tool_dir` | `-td` | string | COBRAxy installation directory |
| `--input` | `-in` | file | Gene expression dataset (TSV format) |
| `--ras_output` | `-ra` | file | Output file for RAS values |
| `--rules_selector` | `-rs` | choice | Built-in model (ENGRO2, Recon, HMRcore) |

### Optional Parameters

| Parameter | Short | Type | Default | Description |
|-----------|--------|------|---------|-------------|
| `--none` | `-n` | boolean | true | Handle missing gene values |
| `--model_upload` | `-rl` | file | - | Custom GPR rules file |
| `--model_upload_name` | `-rn` | string | - | Custom model name |
| `--out_log` | - | file | log.txt | Output log file |

## Input Format

### Gene Expression File
```tsv
Gene_ID	Sample_1	Sample_2	Sample_3	Sample_4
HGNC:5	10.5	11.2	15.7	14.3
HGNC:10	3.2	4.1	8.8	7.9
HGNC:15	7.9	8.2	4.4	5.1
HGNC:25	12.1	13.5	18.2	17.8
```

**Requirements**:
- First column: Gene identifiers (HGNC, Ensembl, Entrez, etc.)
- Subsequent columns: Expression values (numeric)
- Header row with sample names
- Tab-separated format

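The requirements above can be checked before running the tool. The following is a minimal sketch, assuming pandas is available; `check_expression_table` and the in-memory sample data are hypothetical names used for illustration and are not part of COBRAxy.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for an expression file such as expression_data.tsv
example_tsv = (
    "Gene_ID\tSample_1\tSample_2\n"
    "HGNC:5\t10.5\t11.2\n"
    "HGNC:10\t3.2\t4.1\n"
)


def check_expression_table(handle):
    """Return a list of problems found in a tab-separated expression table."""
    # First column is read as the gene identifier index, header row as sample names
    df = pd.read_csv(handle, sep="\t", index_col=0)
    problems = []
    if df.shape[1] == 0:
        problems.append("no sample columns found (is the file tab-separated?)")
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    return problems


problems = check_expression_table(io.StringIO(example_tsv))
print(problems or "expression table looks well-formed")
```

To check a real file, pass its path instead of the `io.StringIO` object.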
### Custom GPR Rules File (Optional)
```tsv
Reaction_ID	GPR
R_HEX1	HGNC:4922
R_PGI	HGNC:8906
R_PFK	HGNC:8877 or HGNC:8878
R_ALDOA	HGNC:414 and HGNC:417
```

## Algorithm Details

### GPR Rule Processing

**Gene Mapping**: Each gene in the expression data is mapped to reactions via GPR rules.

**Logical Operations**:
- **OR**: `Gene1 or Gene2` → `expr1 + expr2`
- **AND**: `Gene1 and Gene2` → `min(expr1, expr2)`

**Missing Gene Handling**:
- `-n true`: Missing genes are ignored when evaluating the GPR rules.
- `-n false`: Missing genes cause the reaction score to be NaN.

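The two `-n` modes can be pictured with a small sketch for a single AND/OR group. This is one plausible reading of the rules above, not the tool's actual code: the `combine` helper is hypothetical, `None` stands for a gene absent from the expression data, and the handling of nested rules in COBRAxy may differ.

```python
import math


def combine(op, values, ignore_missing=True):
    """Combine the expression values of one AND/OR group.

    `values` holds one entry per gene; None marks a gene missing from the data.
    """
    present = [v for v in values if v is not None]
    if len(present) < len(values) and not ignore_missing:
        return math.nan  # strict mode (-n false): any missing gene gives NaN
    if not present:
        return math.nan  # nothing left to score
    # OR sums the operands, AND takes their minimum
    return sum(present) if op == "or" else min(present)


print(combine("and", [10.5, None], ignore_missing=True))
print(combine("and", [10.5, None], ignore_missing=False))
print(combine("or", [3.5, 4.0]))
```

With `ignore_missing=True` the missing operand is dropped and the remaining gene determines the score; with `ignore_missing=False` the same group evaluates to NaN.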
### RAS Computation

**Example**:
```
GPR: (HGNC:5 and HGNC:10) or HGNC:15
Expression: HGNC:5=10.5, HGNC:10=3.2, HGNC:15=7.9
RAS = min(10.5, 3.2) + 7.9 = 3.2 + 7.9 = 11.1
```

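Nested rules such as the one in this example can be evaluated with a small recursive parser. The sketch below follows the operations listed under Logical Operations (OR as sum, AND as minimum) and assumes that AND binds more tightly than OR and that every gene is present in the data. `evaluate_gpr` is a hypothetical helper, not the function used inside COBRAxy.

```python
import re

# Tokens are parentheses or any run of characters without spaces or parentheses
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def evaluate_gpr(rule, expression):
    """Evaluate a GPR rule: AND -> min, OR -> sum. Assumes all genes are present."""
    tokens = TOKEN.findall(rule)
    pos = 0

    def parse_or():
        nonlocal pos
        total = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            total += parse_and()
        return total

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            value = min(value, parse_atom())
        return value

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return expression[token]  # look up the gene's expression value

    return parse_or()


expression = {"HGNC:5": 10.5, "HGNC:10": 3.2, "HGNC:15": 7.9}
ras = evaluate_gpr("(HGNC:5 and HGNC:10) or HGNC:15", expression)
print(round(ras, 2))  # min(10.5, 3.2) + 7.9
```

A recursive-descent layout keeps the operator precedence explicit: `parse_or` calls `parse_and`, which calls `parse_atom`, so parenthesised groups are resolved first.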
## Output Format

### RAS Values File
```tsv
Reactions	Sample_1	Sample_2	Sample_3	Sample_4
R_HEX1	8.5	9.2	12.1	11.3
R_PGI	7.3	8.1	6.4	7.2
R_PFK	15.2	16.8	20.1	18.9
R_ALDOA	3.2	4.1	4.4	5.1
```

**Format**:
- First column: Reaction identifiers
- Subsequent columns: RAS values for each sample
- Missing values are represented as "None"

## Usage Examples

### Command Line

```bash
# Basic usage with built-in model
ras_generator -td /path/to/COBRAxy \
  -in expression_data.tsv \
  -ra ras_output.tsv \
  -rs ENGRO2

# With custom model and strict missing gene handling
ras_generator -td /path/to/COBRAxy \
  -in expression_data.tsv \
  -ra ras_output.tsv \
  -rl custom_rules.tsv \
  -rn "CustomModel" \
  -n false
```

### Python API

```python
import ras_generator

# Basic RAS generation
args = [
    '-td', '/path/to/COBRAxy',
    '-in', 'expression_data.tsv',
    '-ra', 'ras_output.tsv',
    '-rs', 'ENGRO2'
]

ras_generator.main(args)
```

### Galaxy Usage

1. Upload gene expression file to Galaxy
2. Select **RAS Generator** from COBRAxy tools
3. Configure parameters:
   - **Input dataset**: Your expression file
   - **Rule selector**: ENGRO2 (or other model)
   - **Handle missing genes**: Yes/No
4. Click **Execute**

## Built-in Models

### ENGRO2 (Recommended for most analyses)
- **Scope**: Focused human metabolism
- **Reactions**: ~500
- **Genes**: ~500
- **Use case**: Core metabolic analysis

### Recon (Comprehensive analysis)
- **Scope**: Complete human metabolism
- **Reactions**: ~10,000
- **Genes**: ~2,000
- **Use case**: Genome-wide metabolic studies

## Gene ID Mapping

COBRAxy supports multiple gene identifier formats:

| Format | Example | Notes |
|--------|---------|--------|
| **HGNC ID** | HGNC:5 | Recommended, most stable |
| **HGNC Symbol** | ALDOA | Human-readable but may change |
| **Ensembl** | ENSG00000149925 | Version-specific |
| **Entrez** | 226 | Numeric identifier |

**Recommendation**: Use HGNC IDs for best compatibility and stability.

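When it is unclear which identifier format an expression file uses, a quick pattern check on the first column can help. The sketch below is illustrative only: the regular expressions are assumptions based on the example values in the table above, not patterns taken from COBRAxy, and `guess_id_format` is a hypothetical helper.

```python
import re

# Illustrative patterns derived from the example identifiers in the table above
PATTERNS = [
    ("HGNC ID", re.compile(r"^HGNC:\d+$")),
    ("Ensembl", re.compile(r"^ENSG\d{11}(\.\d+)?$")),
    ("Entrez", re.compile(r"^\d+$")),
]


def guess_id_format(gene_id):
    """Return the name of the first matching format, else assume a gene symbol."""
    for name, pattern in PATTERNS:
        if pattern.match(gene_id):
            return name
    return "HGNC Symbol"  # fallback: anything else is treated as a symbol


for gene_id in ["HGNC:5", "ALDOA", "ENSG00000149925", "226"]:
    print(gene_id, "->", guess_id_format(gene_id))
```

Because the fallback treats every unmatched string as a symbol, this check only hints at the format and does not validate the identifiers themselves.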
## Troubleshooting

### Common Issues

**"Gene not found" warnings**
```
Solution: Check that the gene ID format matches model expectations
- Verify gene identifiers (HGNC vs symbols vs Ensembl)
- Use gene mapping tools if needed
- Set -n true to handle missing genes
```

**"No computable scores" error**
```
Solution: Insufficient gene overlap between data and model
- Check gene ID format compatibility
- Verify expression file format
- Try a different built-in model
```

**Empty output file**
```
Solution: Check input file format and permissions
- Ensure TSV format with proper headers
- Verify file paths are correct
- Check write permissions for output directory
```

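For the gene overlap issues listed above, it can help to measure how many genes named in the GPR rules actually appear in the expression data. The following is a minimal sketch under stated assumptions: rules are plain strings that use only `and`, `or`, and parentheses, and `genes_in_rule` and `gene_overlap` are hypothetical helpers rather than COBRAxy functions.

```python
import re

# Words that are operators rather than gene identifiers
OPERATORS = {"and", "or"}


def genes_in_rule(rule):
    """Extract the set of gene identifiers mentioned in one GPR rule."""
    tokens = re.findall(r"[^\s()]+", rule)
    return {t for t in tokens if t.lower() not in OPERATORS}


def gene_overlap(expression_genes, rules):
    """Return (shared, missing): model genes found in, and absent from, the data."""
    model_genes = set()
    for rule in rules.values():
        model_genes |= genes_in_rule(rule)
    shared = model_genes & set(expression_genes)
    return shared, model_genes - shared


rules = {"R_PFK": "HGNC:8877 or HGNC:8878", "R_ALDOA": "HGNC:414 and HGNC:417"}
shared, missing = gene_overlap(["HGNC:8877", "HGNC:414", "HGNC:5"], rules)
print(f"{len(shared)} of {len(shared) + len(missing)} model genes found")
print("missing:", sorted(missing))
```

A low fraction of shared genes usually points to a mismatch in identifier format rather than to genuinely absent genes.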
### Debug Mode

Write the log to a dedicated file:

```bash
ras_generator -td /path/to/COBRAxy \
  -in expression_data.tsv \
  -ra ras_output.tsv \
  -rs ENGRO2 \
  --out_log detailed_log.txt
```

Check the log file for detailed error messages and processing statistics.

## Validation

### Check Output Quality

```python
import pandas as pd

# Read RAS output; "None" marks missing values, so parse it as NaN
ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0, na_values=['None'])

# Basic statistics
print(f"RAS matrix shape: {ras_df.shape}")
print(f"Non-null values: {ras_df.count().sum()}")
print(f"Value range: {ras_df.min().min():.2f} to {ras_df.max().max():.2f}")

# Check for problematic reactions (no score in any sample)
null_reactions = ras_df.isnull().all(axis=1).sum()
print(f"Reactions with no data: {null_reactions}")
```

## Integration with Other Tools

### Downstream Analysis

RAS output can be used with:

- **[MAREA](marea.md)**: Statistical enrichment analysis
- **[RAS to Bounds](ras-to-bounds.md)**: Flux constraint application
- **[MAREA Cluster](marea-cluster.md)**: Sample clustering

### Preprocessing Options

Before RAS generation:
- **Normalize** expression data (log2, quantile, etc.)
- **Filter** low-expression genes
- **Batch correct** when combining multiple datasets

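Two of the preprocessing steps above, filtering and log2 normalization, can be sketched with pandas and NumPy. The cutoff value, the `+ 1` pseudocount, and the sample data are assumptions chosen for illustration; appropriate choices depend on the dataset, and batch correction is not covered here.

```python
import numpy as np
import pandas as pd

# Hypothetical expression matrix in the layout described under Input Format
expr = pd.DataFrame(
    {"Sample_1": [10.5, 3.2, 0.0], "Sample_2": [11.2, 4.1, 0.1]},
    index=pd.Index(["HGNC:5", "HGNC:10", "HGNC:15"], name="Gene_ID"),
)

min_mean = 1.0  # hypothetical cutoff on the mean expression across samples

# Filter: keep genes whose mean expression reaches the cutoff
filtered = expr[expr.mean(axis=1) >= min_mean]

# Normalize: log2 transform with a pseudocount of 1 to avoid log2(0)
log_expr = np.log2(filtered + 1)

# Print as TSV; pass a file path to to_csv to write an input file instead
print(log_expr.to_csv(sep="\t"))
```

The resulting table keeps the gene identifier column and sample headers, so it can be saved and passed to the tool through `-in`.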
## Advanced Usage

### Custom Model Integration

```python
import pandas as pd

# Create custom GPR rules
custom_rules = {
    'R_CUSTOM1': 'HGNC:5 and HGNC:10',
    'R_CUSTOM2': 'HGNC:15 or HGNC:20'
}

# Save as TSV
rules_df = pd.DataFrame(list(custom_rules.items()),
                        columns=['Reaction_ID', 'GPR'])
rules_df.to_csv('custom_rules.tsv', sep='\t', index=False)

# Custom-model flags for the RAS generator; combine with -td, -in and -ra
args = ['-rl', 'custom_rules.tsv', '-rn', 'CustomModel']
```

### Batch Processing

```python
import ras_generator

# Process multiple expression files
expression_files = ['data1.tsv', 'data2.tsv', 'data3.tsv']

for i, exp_file in enumerate(expression_files):
    output_file = f'ras_output_{i}.tsv'

    args = [
        '-td', '/path/to/COBRAxy',
        '-in', exp_file,
        '-ra', output_file,
        '-rs', 'ENGRO2'
    ]

    ras_generator.main(args)
    print(f"Processed {exp_file} → {output_file}")
```

## References

- [COBRApy documentation](https://cobrapy.readthedocs.io/) - Underlying metabolic modeling
- [GPR rules format](https://cobrapy.readthedocs.io/en/stable/getting_started.html#gene-protein-reaction-rules) - Standard format specification
- [HGNC database](https://www.genenames.org/) - Gene nomenclature standards