annotate COBRAxy/docs/tools/ras-generator.md @ 509:5956dcf94277 draft default tip

Uploaded
author francesco_lapi
date Wed, 01 Oct 2025 15:34:21 +0000
# RAS Generator

Generate Reaction Activity Scores (RAS) from gene expression data and GPR (Gene-Protein-Reaction) rules.

## Overview

The RAS Generator computes metabolic reaction activity by:
1. Mapping gene expression to reactions via GPR rules
2. Applying logical operations (AND for enzyme complexes, OR for isoenzymes)
3. Producing activity scores for each reaction in each sample

**Input**: Gene expression data + GPR rules
**Output**: Reaction activity scores (RAS)

## Parameters

### Required Parameters

| Parameter | Short | Type | Description |
|-----------|--------|------|-------------|
| `--tool_dir` | `-td` | string | COBRAxy installation directory |
| `--input` | `-in` | file | Gene expression dataset (TSV format) |
| `--ras_output` | `-ra` | file | Output file for RAS values |
| `--rules_selector` | `-rs` | choice | Built-in model (ENGRO2, Recon, HMRcore) |

### Optional Parameters

| Parameter | Short | Type | Default | Description |
|-----------|--------|------|---------|-------------|
| `--none` | `-n` | boolean | true | Handle missing gene values |
| `--model_upload` | `-rl` | file | - | Custom GPR rules file |
| `--model_upload_name` | `-rn` | string | - | Custom model name |
| `--out_log` | - | file | log.txt | Output log file |

## Input Format

### Gene Expression File
```tsv
Gene_ID	Sample_1	Sample_2	Sample_3	Sample_4
HGNC:5	10.5	11.2	15.7	14.3
HGNC:10	3.2	4.1	8.8	7.9
HGNC:15	7.9	8.2	4.4	5.1
HGNC:25	12.1	13.5	18.2	17.8
```

**Requirements**:
- First column: Gene identifiers (HGNC, Ensembl, Entrez, etc.)
- Subsequent columns: Expression values (numeric)
- Header row with sample names
- Tab-separated format
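
The requirements above can be checked before running the tool. The following is a minimal sketch using pandas; the helper name `check_expression_table` and the inline sample data are illustrative assumptions and not part of COBRAxy.

```python
import io

import pandas as pd


def check_expression_table(source) -> pd.DataFrame:
    """Load a gene expression TSV and verify the layout described above.

    `source` is a path or a file-like object. Hypothetical helper, not a COBRAxy API.
    """
    # Tab-separated, header row with sample names, gene IDs in the first column.
    df = pd.read_csv(source, sep="\t", index_col=0)

    if df.shape[1] == 0:
        raise ValueError("no sample columns found; is the file tab-separated?")
    if df.index.has_duplicates:
        raise ValueError("duplicate gene identifiers in the first column")

    # Every sample column must hold numeric expression values.
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"non-numeric expression columns: {non_numeric}")
    return df


# Small in-memory example in the format shown above.
sample = io.StringIO(
    "Gene_ID\tSample_1\tSample_2\n"
    "HGNC:5\t10.5\t11.2\n"
    "HGNC:10\t3.2\t4.1\n"
)
expr = check_expression_table(sample)
print(expr.shape)
```

A file that was saved with spaces instead of tabs collapses into a single column and is rejected by the first check.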

### Custom GPR Rules File (Optional)
```tsv
Reaction_ID	GPR
R_HEX1	HGNC:4922
R_PGI	HGNC:8906
R_PFK	HGNC:8877 or HGNC:8878
R_ALDOA	HGNC:414 and HGNC:417
```
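
A custom rules file can be sanity-checked before it is passed to `-rl`. The sketch below assumes the two column names shown in the example above (`Reaction_ID`, `GPR`); the helper names `load_rules` and `parentheses_balanced` are hypothetical and not part of COBRAxy.

```python
import io

import pandas as pd


def parentheses_balanced(rule: str) -> bool:
    """Return True if every '(' in the rule is closed by a later ')'."""
    depth = 0
    for ch in rule:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0


def load_rules(source) -> dict[str, str]:
    """Read a two-column rules TSV into {reaction_id: gpr_rule}."""
    df = pd.read_csv(source, sep="\t", dtype=str)
    missing = {"Reaction_ID", "GPR"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if df["Reaction_ID"].duplicated().any():
        raise ValueError("duplicate reaction identifiers")

    # Reactions without a rule are kept with an empty string.
    rules = dict(zip(df["Reaction_ID"], df["GPR"].fillna("")))
    bad = [r for r, rule in rules.items() if not parentheses_balanced(rule)]
    if bad:
        raise ValueError(f"unbalanced parentheses in rules for: {bad}")
    return rules


rules = load_rules(io.StringIO(
    "Reaction_ID\tGPR\n"
    "R_PFK\tHGNC:8877 or HGNC:8878\n"
    "R_ALDOA\tHGNC:414 and HGNC:417\n"
))
print(rules)
```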

## Algorithm Details

### GPR Rule Processing

**Gene Mapping**: Each gene in the expression data is mapped to reactions via GPR rules.

**Logical Operations**:
- **OR**: `Gene1 or Gene2` → `max(expr1, expr2)` or, alternatively, the sum `expr1 + expr2`
- **AND**: `Gene1 and Gene2` → `min(expr1, expr2)`

**Missing Gene Handling**:
- `-n true`: Missing genes are treated as 0, so OR operations can still be evaluated
- `-n false`: A missing gene makes the reaction score null
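
The two missing-gene modes can be illustrated with a few lines of Python. This is only one reading of the description above, using `max` for OR; the function names are hypothetical and the actual COBRAxy implementation may differ, in particular in how it treats missing values inside OR branches.

```python
from typing import Optional

Score = Optional[float]  # None stands for a null score


def lookup(gene: str, expression: dict[str, float], handle_missing: bool) -> Score:
    """Expression value of a gene; a missing gene becomes 0.0 or None."""
    if gene in expression:
        return expression[gene]
    # Assumption: `-n true` maps to 0.0, `-n false` maps to a null value.
    return 0.0 if handle_missing else None


def combine_or(a: Score, b: Score) -> Score:
    """OR as max; a null operand makes the result null."""
    if a is None or b is None:
        return None
    return max(a, b)


def combine_and(a: Score, b: Score) -> Score:
    """AND as min; a null operand makes the result null."""
    if a is None or b is None:
        return None
    return min(a, b)


expression = {"HGNC:5": 10.5, "HGNC:10": 3.2}  # HGNC:99 is absent

lenient = combine_or(lookup("HGNC:5", expression, True),
                     lookup("HGNC:99", expression, True))
strict = combine_or(lookup("HGNC:5", expression, False),
                    lookup("HGNC:99", expression, False))
print(lenient, strict)  # 10.5 None
```

Note that under this reading an AND rule with a missing gene scores 0 in the lenient mode, because `min(x, 0)` is 0 for non-negative expression values.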

### RAS Computation

For each reaction and sample:

1. **Parse GPR rule** into nested logical structure
2. **Replace gene names** with expression values
3. **Evaluate logical operations** recursively
4. **Assign RAS score** based on final result

**Example**:
```
GPR: (HGNC:5 and HGNC:10) or HGNC:15
Expression: HGNC:5=10.5, HGNC:10=3.2, HGNC:15=7.9
RAS = max(min(10.5, 3.2), 7.9) = max(3.2, 7.9) = 7.9
```
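
The four steps above can be sketched as a small recursive-descent evaluator. This is an illustration, not the COBRAxy parser: it assumes that `and` binds more tightly than `or`, that OR is evaluated as `max`, and that every gene in the rule is present in the expression dictionary. The name `evaluate_gpr` is hypothetical.

```python
import re

# Tokens: parentheses, or any run of characters without whitespace/parentheses.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def evaluate_gpr(rule: str, expression: dict[str, float]) -> float:
    """Evaluate one GPR rule for one sample (AND = min, OR = max)."""
    tokens = TOKEN.findall(rule)
    pos = 0

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def parse_or():
        nonlocal pos
        value = parse_and()
        while peek() == "or":
            pos += 1
            value = max(value, parse_and())
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while peek() == "and":
            pos += 1
            value = min(value, parse_atom())
        return value

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of rule")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()  # nested sub-rule
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token == ")" or token.lower() in ("and", "or"):
            raise ValueError(f"unexpected token {token!r}")
        return expression[token]  # raises KeyError for a missing gene

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


expression = {"HGNC:5": 10.5, "HGNC:10": 3.2, "HGNC:15": 7.9}
print(evaluate_gpr("(HGNC:5 and HGNC:10) or HGNC:15", expression))  # 7.9
```

Parsing and evaluation are fused here for brevity; building an explicit tree first would allow the same rule to be evaluated across many samples without re-parsing.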

## Output Format

### RAS Values File
```tsv
Reactions	Sample_1	Sample_2	Sample_3	Sample_4
R_HEX1	8.5	9.2	12.1	11.3
R_PGI	7.3	8.1	6.4	7.2
R_PFK	15.2	16.8	20.1	18.9
R_ALDOA	3.2	4.1	4.4	5.1
```

**Format**:
- First column: Reaction identifiers
- Subsequent columns: RAS values for each sample
- Missing values are written as "None"

## Usage Examples

### Command Line

```bash
# Basic usage with built-in model
ras_generator -td /path/to/COBRAxy \
    -in expression_data.tsv \
    -ra ras_output.tsv \
    -rs ENGRO2

# With custom model and strict missing gene handling
ras_generator -td /path/to/COBRAxy \
    -in expression_data.tsv \
    -ra ras_output.tsv \
    -rl custom_rules.tsv \
    -rn "CustomModel" \
    -n false
```

### Python API

```python
import ras_generator

# Basic RAS generation
args = [
    '-td', '/path/to/COBRAxy',
    '-in', 'expression_data.tsv',
    '-ra', 'ras_output.tsv',
    '-rs', 'ENGRO2'
]

ras_generator.main(args)
```
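
When the tool is called from several places, the argument list can be assembled by a small helper. The sketch below only uses the flags listed in the parameter tables; the function `build_ras_args` and its keyword names are hypothetical conveniences, not part of the `ras_generator` module.

```python
from typing import Optional


def build_ras_args(tool_dir: str, input_path: str, output_path: str,
                   rules_selector: Optional[str] = None,
                   custom_rules: Optional[str] = None,
                   custom_name: Optional[str] = None,
                   handle_missing: Optional[bool] = None,
                   log_path: Optional[str] = None) -> list[str]:
    """Build the argument list for ras_generator.main() (hypothetical helper)."""
    args = ["-td", str(tool_dir), "-in", str(input_path), "-ra", str(output_path)]
    if rules_selector is not None:
        args += ["-rs", rules_selector]
    if custom_rules is not None:
        args += ["-rl", str(custom_rules)]
    if custom_name is not None:
        args += ["-rn", custom_name]
    if handle_missing is not None:
        # The command-line examples pass this flag as the literal true/false.
        args += ["-n", "true" if handle_missing else "false"]
    if log_path is not None:
        args += ["--out_log", str(log_path)]
    return args


args = build_ras_args("/path/to/COBRAxy", "expression_data.tsv",
                      "ras_output.tsv", rules_selector="ENGRO2")
print(args)
# The list can then be passed on: ras_generator.main(args)
```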

### Galaxy Usage

1. Upload gene expression file to Galaxy
2. Select **RAS Generator** from COBRAxy tools
3. Configure parameters:
   - **Input dataset**: Your expression file
   - **Rule selector**: ENGRO2 (or other model)
   - **Handle missing genes**: Yes/No
4. Click **Execute**

## Built-in Models

### ENGRO2 (Recommended for most analyses)
- **Scope**: Focused human metabolism
- **Reactions**: ~2,000
- **Genes**: ~500
- **Use case**: General metabolic analysis

### Recon (Comprehensive analysis)
- **Scope**: Complete human metabolism
- **Reactions**: ~10,000
- **Genes**: ~2,000
- **Use case**: Detailed metabolic studies

### HMRcore (Balanced option)
- **Scope**: Core human metabolism
- **Reactions**: ~5,000
- **Genes**: ~1,000
- **Use case**: Balanced coverage

## Gene ID Mapping

COBRAxy supports multiple gene identifier formats:

| Format | Example | Notes |
|--------|---------|--------|
| **HGNC ID** | HGNC:5 | Recommended, most stable |
| **HGNC Symbol** | ALDOA | Human-readable but may change |
| **Ensembl** | ENSG00000149925 | Version-specific |
| **Entrez** | 226 | Numeric identifier |

**Recommendation**: Use HGNC IDs for best compatibility and stability.
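
To see which identifier format an expression file actually uses, the first column can be classified with a few regular expressions. The patterns below are heuristics written for the four formats in the table (human Ensembl gene IDs are assumed to start with `ENSG`); the function names are illustrative and not part of COBRAxy.

```python
import re
from collections import Counter

# Order matters: the symbol pattern is the most permissive and is tried last.
PATTERNS = [
    ("HGNC ID", re.compile(r"^HGNC:\d+$")),
    ("Ensembl", re.compile(r"^ENSG\d{11}(\.\d+)?$")),
    ("Entrez", re.compile(r"^\d+$")),
    ("HGNC Symbol", re.compile(r"^[A-Za-z][A-Za-z0-9-]*$")),
]


def guess_id_format(gene_id: str) -> str:
    """Return the name of the first pattern that matches, else 'unknown'."""
    for name, pattern in PATTERNS:
        if pattern.match(gene_id):
            return name
    return "unknown"


def dominant_format(gene_ids) -> str:
    """Most frequent guessed format across a collection of identifiers."""
    counts = Counter(guess_id_format(g) for g in gene_ids)
    if not counts:
        return "unknown"
    return counts.most_common(1)[0][0]


for example in ["HGNC:5", "ALDOA", "ENSG00000149925", "226"]:
    print(example, "->", guess_id_format(example))
```

A mixed result from `dominant_format` is a hint that the file combines identifier types and may need to be mapped to a single format first.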

## Troubleshooting

### Common Issues

**"Gene not found" warnings**

Check that the gene ID format matches what the model expects:
- Verify gene identifiers (HGNC vs symbols vs Ensembl)
- Use gene mapping tools if needed
- Set `-n true` to handle missing genes gracefully

**"No computable scores" error**

The gene overlap between the data and the model is insufficient:
- Check gene ID format compatibility
- Verify the expression file format
- Try a different built-in model

**Empty output file**

Check the input file format and permissions:
- Ensure TSV format with proper headers
- Verify that file paths are correct
- Check write permissions for the output directory
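
For the first two issues it helps to measure how many genes referenced by the GPR rules are present in the expression data. The sketch below works on a plain `{reaction: rule}` dictionary and a collection of gene IDs; the function names and the example rules are illustrative assumptions.

```python
import re


def rule_genes(rules: dict[str, str]) -> set[str]:
    """Collect every gene identifier mentioned in a set of GPR rules."""
    genes = set()
    for rule in rules.values():
        # Split on whitespace and parentheses, then drop the operators.
        for token in re.findall(r"[^\s()]+", rule):
            if token.lower() not in ("and", "or"):
                genes.add(token)
    return genes


def overlap_report(expression_genes, rules: dict[str, str]) -> dict:
    """Summarise which model genes are present in the expression data."""
    model_genes = rule_genes(rules)
    found = model_genes & set(expression_genes)
    return {
        "model_genes": len(model_genes),
        "found": len(found),
        "missing": sorted(model_genes - found),
        "fraction_found": len(found) / len(model_genes) if model_genes else 0.0,
    }


# Illustrative rules; the gene IDs match the expression example in this document.
rules = {
    "R_A": "(HGNC:5 and HGNC:10) or HGNC:15",
    "R_B": "HGNC:20 or HGNC:25",
}
expression_genes = ["HGNC:5", "HGNC:10", "HGNC:15", "HGNC:25"]
report = overlap_report(expression_genes, rules)
print(report)
```

A fraction close to zero usually points to an identifier format mismatch rather than to genuinely missing genes.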

### Debug Mode

Direct the log to a dedicated file with `--out_log`:

```bash
ras_generator -td /path/to/COBRAxy \
    -in expression_data.tsv \
    -ra ras_output.tsv \
    -rs ENGRO2 \
    --out_log detailed_log.txt
```

Check the log file for detailed error messages and processing statistics.

## Validation

### Check Output Quality

```python
import pandas as pd

# Read RAS output; missing values are written as "None", so parse them as NaN
ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0, na_values=['None'])

# Basic statistics
print(f"RAS matrix shape: {ras_df.shape}")
print(f"Non-null values: {ras_df.count().sum()}")
print(f"Value range: {ras_df.min().min():.2f} to {ras_df.max().max():.2f}")

# Check for problematic reactions
null_reactions = ras_df.isnull().all(axis=1).sum()
print(f"Reactions with no data: {null_reactions}")
```

### Expected Results

- **Coverage**: Typically 60-90% of reactions should have computable scores
- **Range**: RAS values are typically 0-20 for log-transformed expression
- **Distribution**: Should reflect biological variation in your samples
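
The coverage figure can be computed directly from the output table. The sketch below defines coverage as the fraction of reactions with a score in at least one sample, which is an assumption about how the guideline above is meant; the data and the reaction `R_X` are made up for illustration.

```python
import io

import pandas as pd


def reaction_coverage(ras_df: pd.DataFrame) -> float:
    """Fraction of reactions with a score in at least one sample."""
    if ras_df.empty:
        return 0.0
    return float(ras_df.notna().any(axis=1).mean())


# Illustrative data in the output layout; R_X has no computable score.
ras_tsv = io.StringIO(
    "Reactions\tSample_1\tSample_2\n"
    "R_HEX1\t8.5\t9.2\n"
    "R_PGI\t7.3\tNone\n"
    "R_PFK\t15.2\t16.8\n"
    "R_X\tNone\tNone\n"
)
ras_df = pd.read_csv(ras_tsv, sep="\t", index_col=0, na_values=["None"])
coverage = reaction_coverage(ras_df)
print(f"Coverage: {coverage:.0%}")  # Coverage: 75%
```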

## Integration with Other Tools

### Downstream Analysis

RAS output can be used with:

- **[MAREA](marea.md)**: Statistical enrichment analysis
- **[RAS to Bounds](ras-to-bounds.md)**: Flux constraint application
- **[MAREA Cluster](marea-cluster.md)**: Sample clustering

### Preprocessing Options

Before RAS generation:
- **Normalize** expression data (log2, quantile, etc.)
- **Filter** low-expression genes
- **Batch correct** if combining multiple datasets
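
The first two preprocessing steps can be sketched with pandas and NumPy. The pseudocount of 1 and the mean threshold of 1.0 are arbitrary illustrative choices, the function name is hypothetical, and batch correction is not covered here.

```python
import numpy as np
import pandas as pd


def preprocess_expression(df: pd.DataFrame,
                          pseudocount: float = 1.0,
                          min_mean: float = 1.0) -> pd.DataFrame:
    """log2-transform expression values, then drop low-expression genes.

    Rows are genes, columns are samples, as in the input format above.
    """
    # log2(x + pseudocount) keeps zero counts finite.
    logged = np.log2(df + pseudocount)
    # Keep genes whose mean transformed expression reaches the threshold.
    keep = logged.mean(axis=1) >= min_mean
    return logged.loc[keep]


raw = pd.DataFrame(
    {"Sample_1": [0.0, 7.0], "Sample_2": [0.0, 15.0]},
    index=["GENE_LOW", "GENE_HIGH"],  # placeholder gene names
)
processed = preprocess_expression(raw)
print(processed)
# Save as TSV for the RAS Generator: processed.to_csv("expression_data.tsv", sep="\t")
```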

## Advanced Usage

### Custom Model Integration

```python
import pandas as pd
import ras_generator

# Create custom GPR rules
custom_rules = {
    'R_CUSTOM1': 'HGNC:5 and HGNC:10',
    'R_CUSTOM2': 'HGNC:15 or HGNC:20'
}

# Save as TSV
rules_df = pd.DataFrame(list(custom_rules.items()),
                        columns=['Reaction_ID', 'GPR'])
rules_df.to_csv('custom_rules.tsv', sep='\t', index=False)

# Use with RAS generator
args = [
    '-td', '/path/to/COBRAxy',
    '-in', 'expression_data.tsv',
    '-ra', 'ras_output.tsv',
    '-rl', 'custom_rules.tsv',
    '-rn', 'CustomModel'
]
ras_generator.main(args)
```

### Batch Processing

```python
import ras_generator

# Process multiple expression files
expression_files = ['data1.tsv', 'data2.tsv', 'data3.tsv']

for i, exp_file in enumerate(expression_files):
    output_file = f'ras_output_{i}.tsv'

    args = [
        '-td', '/path/to/COBRAxy',
        '-in', exp_file,
        '-ra', output_file,
        '-rs', 'ENGRO2'
    ]

    ras_generator.main(args)
    print(f"Processed {exp_file} → {output_file}")
```

## References

- [COBRApy documentation](https://cobrapy.readthedocs.io/) - Underlying metabolic modeling
- [GPR rules format](https://cobrapy.readthedocs.io/en/stable/getting_started.html#gene-protein-reaction-rules) - Standard format specification
- [HGNC database](https://www.genenames.org/) - Gene nomenclature standards