492
|
1 # RAS Generator
|
|
2
|
|
3 Generate Reaction Activity Scores (RAS) from gene expression data and GPR (Gene-Protein-Reaction) rules.
|
|
4
|
|
5 ## Overview
|
|
6
|
|
7 The RAS Generator computes metabolic reaction activity by:
|
|
8 1. Mapping gene expression to reactions via GPR rules
|
|
9 2. Applying logical operations (AND/OR) for enzyme complexes
|
|
10 3. Producing activity scores for each reaction in each sample
|
|
11
|
|
12 **Input**: Gene expression data + GPR rules
|
|
13 **Output**: Reaction activity scores (RAS)
|
|
14
|
|
15 ## Parameters
|
|
16
|
|
17 ### Required Parameters
|
|
18
|
|
19 | Parameter | Short | Type | Description |
|
|
20 |-----------|--------|------|-------------|
|
|
21 | `--tool_dir` | `-td` | string | COBRAxy installation directory |
|
|
22 | `--input` | `-in` | file | Gene expression dataset (TSV format) |
|
|
23 | `--ras_output` | `-ra` | file | Output file for RAS values |
|
|
24 | `--rules_selector` | `-rs` | choice | Built-in model (ENGRO2, Recon, HMRcore) |
|
|
25
|
|
26 ### Optional Parameters
|
|
27
|
|
28 | Parameter | Short | Type | Default | Description |
|
|
29 |-----------|--------|------|---------|-------------|
|
|
30 | `--none` | `-n` | boolean | true | Handle missing gene values |
|
|
31 | `--model_upload` | `-rl` | file | - | Custom GPR rules file |
|
|
32 | `--model_upload_name` | `-rn` | string | - | Custom model name |
|
|
33 | `--out_log` | - | file | log.txt | Output log file |
|
|
34
|
|
35 ## Input Format
|
|
36
|
|
37 ### Gene Expression File
|
|
38 ```tsv
|
|
39 Gene_ID Sample_1 Sample_2 Sample_3 Sample_4
|
|
40 HGNC:5 10.5 11.2 15.7 14.3
|
|
41 HGNC:10 3.2 4.1 8.8 7.9
|
|
42 HGNC:15 7.9 8.2 4.4 5.1
|
|
43 HGNC:25 12.1 13.5 18.2 17.8
|
|
44 ```
|
|
45
|
|
46 **Requirements**:
|
|
47 - First column: Gene identifiers (HGNC, Ensembl, Entrez, etc.)
|
|
48 - Subsequent columns: Expression values (numeric)
|
|
49 - Header row with sample names
|
|
50 - Tab-separated format
|
|
51
|
|
52 ### Custom GPR Rules File (Optional)
|
|
53 ```tsv
|
|
54 Reaction_ID GPR
|
|
55 R_HEX1 HGNC:4922
|
|
56 R_PGI HGNC:8906
|
|
57 R_PFK HGNC:8877 or HGNC:8878
|
|
58 R_ALDOA HGNC:414 and HGNC:417
|
|
59 ```
|
|
60
|
|
61 ## Algorithm Details
|
|
62
|
|
63 ### GPR Rule Processing
|
|
64
|
|
65 **Gene Mapping**: Each gene in the expression data is mapped to reactions via GPR rules.
|
|
66
|
|
67 **Logical Operations**:
|
|
68 - **OR**: `Gene1 or Gene2` → `max(expr1, expr2)` or `expr1 + expr2`
|
|
69 - **AND**: `Gene1 and Gene2` → `min(expr1, expr2)`
|
|
70
|
|
71 **Missing Gene Handling**:
|
|
72 - `-n true`: Missing genes treated as 0, OR operations continue
|
|
73 - `-n false`: Missing genes cause reaction score to be null
|
|
74
|
|
75 ### RAS Computation
|
|
76
|
|
77 For each reaction and sample:
|
|
78
|
|
79 1. **Parse GPR rule** into nested logical structure
|
|
80 2. **Replace gene names** with expression values
|
|
81 3. **Evaluate logical operations** recursively
|
|
82 4. **Assign RAS score** based on final result
|
|
83
|
|
84 **Example**:
|
|
85 ```
|
|
86 GPR: (HGNC:5 and HGNC:10) or HGNC:15
|
|
87 Expression: HGNC:5=10.5, HGNC:10=3.2, HGNC:15=7.9
|
|
88 RAS = max(min(10.5, 3.2), 7.9) = max(3.2, 7.9) = 7.9
|
|
89 ```
|
|
90
|
|
91 ## Output Format
|
|
92
|
|
93 ### RAS Values File
|
|
94 ```tsv
|
|
95 Reactions Sample_1 Sample_2 Sample_3 Sample_4
|
|
96 R_HEX1 8.5 9.2 12.1 11.3
|
|
97 R_PGI 7.3 8.1 6.4 7.2
|
|
98 R_PFK 15.2 16.8 20.1 18.9
|
|
99 R_ALDOA 3.2 4.1 4.4 5.1
|
|
100 ```
|
|
101
|
|
102 **Format**:
|
|
103 - First column: Reaction identifiers
|
|
104 - Subsequent columns: RAS values for each sample
|
|
105 - Missing values represented as "None"
|
|
106
|
|
107 ## Usage Examples
|
|
108
|
|
109 ### Command Line
|
|
110
|
|
111 ```bash
|
|
112 # Basic usage with built-in model
|
|
113 ras_generator -td /path/to/COBRAxy \
|
|
114 -in expression_data.tsv \
|
|
115 -ra ras_output.tsv \
|
|
116 -rs ENGRO2
|
|
117
|
|
118 # With custom model and strict missing gene handling
|
|
119 ras_generator -td /path/to/COBRAxy \
|
|
120 -in expression_data.tsv \
|
|
121 -ra ras_output.tsv \
|
|
122 -rl custom_rules.tsv \
|
|
123 -rn "CustomModel" \
|
|
124 -n false
|
|
125 ```
|
|
126
|
|
127 ### Python API
|
|
128
|
|
129 ```python
|
|
130 import ras_generator
|
|
131
|
|
132 # Basic RAS generation
|
|
133 args = [
|
|
134 '-td', '/path/to/COBRAxy',
|
|
135 '-in', 'expression_data.tsv',
|
|
136 '-ra', 'ras_output.tsv',
|
|
137 '-rs', 'ENGRO2'
|
|
138 ]
|
|
139
|
|
140 ras_generator.main(args)
|
|
141 ```
|
|
142
|
|
143 ### Galaxy Usage
|
|
144
|
|
145 1. Upload gene expression file to Galaxy
|
|
146 2. Select **RAS Generator** from COBRAxy tools
|
|
147 3. Configure parameters:
|
|
148 - **Input dataset**: Your expression file
|
|
149 - **Rule selector**: ENGRO2 (or other model)
|
|
150 - **Handle missing genes**: Yes/No
|
|
151 4. Click **Execute**
|
|
152
|
|
153 ## Built-in Models
|
|
154
|
|
155 ### ENGRO2 (Recommended for most analyses)
|
|
156 - **Scope**: Focused human metabolism
|
|
157 - **Reactions**: ~2,000
|
|
158 - **Genes**: ~500
|
|
159 - **Use case**: General metabolic analysis
|
|
160
|
|
161 ### Recon (Comprehensive analysis)
|
|
162 - **Scope**: Complete human metabolism
|
|
163 - **Reactions**: ~10,000
|
|
164 - **Genes**: ~2,000
|
|
165 - **Use case**: Detailed metabolic studies
|
|
166
|
|
167 ### HMRcore (Balanced option)
|
|
168 - **Scope**: Core human metabolism
|
|
169 - **Reactions**: ~5,000
|
|
170 - **Genes**: ~1,000
|
|
171 - **Use case**: Balanced coverage
|
|
172
|
|
173 ## Gene ID Mapping
|
|
174
|
|
175 COBRAxy supports multiple gene identifier formats:
|
|
176
|
|
177 | Format | Example | Notes |
|
|
178 |--------|---------|--------|
|
|
179 | **HGNC ID** | HGNC:5 | Recommended, most stable |
|
|
180 | **HGNC Symbol** | ALDOA | Human-readable but may change |
|
|
181 | **Ensembl** | ENSG00000149925 | Version-specific |
|
|
182 | **Entrez** | 226 | Numeric identifier |
|
|
183
|
|
184 **Recommendation**: Use HGNC IDs for best compatibility and stability.
|
|
185
|
|
186
|
|
187
|
|
188 ## Troubleshooting
|
|
189
|
|
190 ### Common Issues
|
|
191
|
|
192 **"Gene not found" warnings**
|
|
193 ```
|
|
194 Solution: Check gene ID format matches model expectations
|
|
195 - Verify gene identifiers (HGNC vs symbols vs Ensembl)
|
|
196 - Use gene mapping tools if needed
|
|
197 - Set -n true to handle missing genes gracefully
|
|
198 ```
|
|
199
|
|
200 **"No computable scores" error**
|
|
201 ```
|
|
202 Solution: Insufficient gene overlap between data and model
|
|
203 - Check gene ID format compatibility
|
|
204 - Verify expression file format
|
|
205 - Try different built-in model
|
|
206 ```
|
|
207
|
|
208 **Empty output file**
|
|
209 ```
|
|
210 Solution: Check input file format and permissions
|
|
211 - Ensure TSV format with proper headers
|
|
212 - Verify file paths are correct
|
|
213 - Check write permissions for output directory
|
|
214 ```
|
|
215
|
|
216
|
|
217
|
|
218 ### Debug Mode
|
|
219
|
|
220 Enable detailed logging:
|
|
221
|
|
222 ```bash
|
|
223 ras_generator -td /path/to/COBRAxy \
|
|
224 -in expression_data.tsv \
|
|
225 -ra ras_output.tsv \
|
|
226 -rs ENGRO2 \
|
|
227 --out_log detailed_log.txt
|
|
228 ```
|
|
229
|
|
230 Check log file for detailed error messages and processing statistics.
|
|
231
|
|
232 ## Validation
|
|
233
|
|
234 ### Check Output Quality
|
|
235
|
|
236 ```python
|
|
237 import pandas as pd
|
|
238
|
|
239 # Read RAS output
|
|
240 ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0)
|
|
241
|
|
242 # Basic statistics
|
|
243 print(f"RAS matrix shape: {ras_df.shape}")
|
|
244 print(f"Non-null values: {ras_df.count().sum()}")
|
|
245 print(f"Value range: {ras_df.min().min():.2f} to {ras_df.max().max():.2f}")
|
|
246
|
|
247 # Check for problematic reactions
|
|
248 null_reactions = ras_df.isnull().all(axis=1).sum()
|
|
249 print(f"Reactions with no data: {null_reactions}")
|
|
250 ```
|
|
251
|
|
252 ### Expected Results
|
|
253
|
|
254 - **Coverage**: 60-90% of reactions should have computable scores
|
|
255 - **Range**: RAS values typically 0-20 for log-transformed expression
|
|
256 - **Distribution**: Should reflect biological variation in your samples
|
|
257
|
|
258 ## Integration with Other Tools
|
|
259
|
|
260 ### Downstream Analysis
|
|
261
|
|
262 RAS output can be used with:
|
|
263
|
|
264 - **[MAREA](marea.md)**: Statistical enrichment analysis
|
|
265 - **[RAS to Bounds](ras-to-bounds.md)**: Flux constraint application
|
|
266 - **[MAREA Cluster](marea-cluster.md)**: Sample clustering
|
|
267
|
|
268 ### Preprocessing Options
|
|
269
|
|
270 Before RAS generation:
|
|
271 - **Normalize** expression data (log2, quantile, etc.)
|
|
272 - **Filter** low-expression genes
|
|
273 - **Batch correct** if multiple datasets
|
|
274
|
|
275 ## Advanced Usage
|
|
276
|
|
277 ### Custom Model Integration
|
|
278
|
|
279 ```python
|
|
280 # Create custom GPR rules
|
|
281 custom_rules = {
|
|
282 'R_CUSTOM1': 'HGNC:5 and HGNC:10',
|
|
283 'R_CUSTOM2': 'HGNC:15 or HGNC:20'
|
|
284 }
|
|
285
|
|
286 # Save as TSV
|
|
287 import pandas as pd
|
|
288 rules_df = pd.DataFrame(list(custom_rules.items()),
|
|
289 columns=['Reaction_ID', 'GPR'])
|
|
290 rules_df.to_csv('custom_rules.tsv', sep='\t', index=False)
|
|
291
|
|
292 # Use with RAS generator
|
|
293 args = ['-rl', 'custom_rules.tsv', '-rn', 'CustomModel']
|
|
294 ```
|
|
295
|
|
296 ### Batch Processing
|
|
297
|
|
298 ```python
|
|
299 # Process multiple expression files
|
|
300 expression_files = ['data1.tsv', 'data2.tsv', 'data3.tsv']
|
|
301
|
|
302 for i, exp_file in enumerate(expression_files):
|
|
303 output_file = f'ras_output_{i}.tsv'
|
|
304
|
|
305 args = [
|
|
306 '-td', '/path/to/COBRAxy',
|
|
307 '-in', exp_file,
|
|
308 '-ra', output_file,
|
|
309 '-rs', 'ENGRO2'
|
|
310 ]
|
|
311
|
|
312 ras_generator.main(args)
|
|
313 print(f"Processed {exp_file} → {output_file}")
|
|
314 ```
|
|
315
|
|
316 ## References
|
|
317
|
|
318 - [COBRApy documentation](https://cobrapy.readthedocs.io/) - Underlying metabolic modeling
|
|
319 - [GPR rules format](https://cobrapy.readthedocs.io/en/stable/getting_started.html#gene-protein-reaction-rules) - Standard format specification
|
|
320 - [HGNC database](https://www.genenames.org/) - Gene nomenclature standards |