comparison COBRAxy/docs/tools/tabular-to-model.md @ 492:4ed95023af20 draft

Uploaded
author francesco_lapi
date Tue, 30 Sep 2025 14:02:17 +0000
491:7a413a5ec566 492:4ed95023af20
# Tabular to Metabolic Model

Convert tabular data (CSV/TSV) into COBRA metabolic models in various formats.

## Overview

Tabular to Metabolic Model (tabular2MetabolicModel) converts structured tabular data containing reaction information into fully functional COBRA metabolic models. This tool enables creation of custom models from spreadsheet data and supports multiple output formats including SBML, JSON, MATLAB, and YAML.

## Usage

### Command Line

```bash
tabular2MetabolicModel --input model_data.csv \
                       --format sbml \
                       --output custom_model.xml \
                       --out_log conversion.log \
                       --tool_dir /path/to/COBRAxy
```

### Galaxy Interface

Select "Tabular to Metabolic Model" from the COBRAxy tool suite and configure conversion parameters.

## Parameters

### Required Parameters

| Parameter | Flag | Description |
|-----------|------|-------------|
| Input File | `--input` | Tabular file (CSV/TSV) with model data |
| Output Format | `--format` | Model format (sbml, json, mat, yaml) |
| Output File | `--output` | Output model file path |
| Output Log | `--out_log` | Log file for conversion process |

### Optional Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Tool Directory | `--tool_dir` | COBRAxy installation directory | Current directory |

## Input Format

### Tabular Model Data

The input file holds one reaction per row. The example below uses every supported column; only the columns listed under Required Columns are mandatory:

```csv
Reaction_ID,GPR_Rule,Reaction_Formula,Lower_Bound,Upper_Bound,Objective_Coefficient,Medium_Member,Compartment,Subsystem
R00001,GENE1 or GENE2,A + B -> C + D,-1000.0,1000.0,0.0,FALSE,cytosol,Glycolysis
R00002,GENE3 and GENE4,E <-> F,-1000.0,1000.0,0.0,FALSE,mitochondria,TCA_Cycle
EX_glc_e,-,glc_e <->,-1000.0,1000.0,0.0,TRUE,extracellular,Exchange
BIOMASS,GENE5,0.5 A + 0.3 B -> 1 BIOMASS,0.0,1000.0,1.0,FALSE,cytosol,Biomass
```

### Required Columns

| Column | Description | Format |
|--------|-------------|--------|
| **Reaction_ID** | Unique reaction identifier | String |
| **Reaction_Formula** | Stoichiometric equation | Metabolite notation |
| **Lower_Bound** | Minimum flux constraint | Numeric |
| **Upper_Bound** | Maximum flux constraint | Numeric |

### Optional Columns

| Column | Description | Default |
|--------|-------------|---------|
| **GPR_Rule** | Gene-protein-reaction association | Empty string |
| **Objective_Coefficient** | Biomass/objective weight | 0.0 |
| **Medium_Member** | Exchange reaction flag | FALSE |
| **Compartment** | Subcellular location | Empty |
| **Subsystem** | Metabolic pathway | Empty |
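
The required and optional columns above can be checked before running the tool. The following is a minimal sketch using only the Python standard library; the helper name `load_rows` and the constants are hypothetical, and the code is not taken from COBRAxy itself. It rejects a file that lacks a required column and fills the documented defaults for optional ones.

```python
import csv
import io

# Column names come from the tables above; everything else is illustrative.
REQUIRED = ["Reaction_ID", "Reaction_Formula", "Lower_Bound", "Upper_Bound"]
OPTIONAL_DEFAULTS = {
    "GPR_Rule": "",
    "Objective_Coefficient": "0.0",
    "Medium_Member": "FALSE",
    "Compartment": "",
    "Subsystem": "",
}


def load_rows(handle, delimiter=","):
    """Read reaction rows, rejecting files that lack a required column."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for row in reader:
        for column, default in OPTIONAL_DEFAULTS.items():
            # Absent columns and empty cells both fall back to the default.
            if not row.get(column):
                row[column] = default
        rows.append(row)
    return rows


sample = io.StringIO(
    "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
    "R00001,A + B -> C + D,-1000.0,1000.0\n"
)
rows = load_rows(sample)
print(rows[0]["Medium_Member"])  # FALSE
```

For a TSV file, pass `delimiter="\t"`.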

## Output Formats

### SBML (Systems Biology Markup Language)
- **Format**: XML-based standard
- **Extension**: `.xml` or `.sbml`
- **Use Case**: Interoperability with other tools
- **Advantages**: Widely supported, standardized

### JSON (JavaScript Object Notation)
- **Format**: COBRApy native JSON
- **Extension**: `.json`
- **Use Case**: Python/COBRApy workflows
- **Advantages**: Human-readable, lightweight

### MATLAB (.mat)
- **Format**: MATLAB workspace format
- **Extension**: `.mat`
- **Use Case**: MATLAB COBRA Toolbox
- **Advantages**: Direct MATLAB compatibility

### YAML (YAML Ain't Markup Language)
- **Format**: Human-readable data serialization
- **Extension**: `.yml` or `.yaml`
- **Use Case**: Configuration and documentation
- **Advantages**: Most human-readable format
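
When working with these formats from Python, it can help to keep the format name, file extension, and COBRApy writer together in one table. The sketch below is an illustration only: `FORMATS`, `output_path`, and `write_model` are hypothetical names, and the mapping to `cobra.io` writers is an assumption about how one might export a model, not a description of how tabular2MetabolicModel is implemented.

```python
# Maps each --format value to a file extension and a cobra.io writer name.
FORMATS = {
    "sbml": (".xml", "write_sbml_model"),
    "json": (".json", "save_json_model"),
    "mat": (".mat", "save_matlab_model"),
    "yaml": (".yaml", "save_yaml_model"),
}


def output_path(stem, fmt):
    """Return the output file name for a format, e.g. 'model' + '.xml'."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}; use one of {sorted(FORMATS)}")
    return stem + FORMATS[fmt][0]


def write_model(model, stem, fmt):
    """Write a cobra.Model in the requested format and return the path."""
    import cobra.io  # imported lazily so output_path works without COBRApy

    path = output_path(stem, fmt)
    getattr(cobra.io, FORMATS[fmt][1])(model, path)
    return path


print(output_path("custom_model", "sbml"))  # custom_model.xml
```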

## Reaction Formula Syntax

### Standard Notation
```
# Irreversible reaction
A + B -> C + D

# Reversible reaction
A + B <-> C + D

# With stoichiometric coefficients
2 A + 3 B -> 1 C + 4 D

# Compartmentalized metabolites
glc_c + atp_c -> g6p_c + adp_c
```
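To make the notation concrete, here is a minimal parser sketch for formulas written this way. It assumes terms are separated by `+` surrounded by whitespace and that an optional numeric coefficient precedes each metabolite ID; the function names are hypothetical and the tool's own parser may behave differently.

```python
import re


def parse_side(side):
    """Turn 'A + 2 B' into {'A': 1.0, 'B': 2.0}; an empty side gives {}."""
    terms = {}
    for term in filter(None, re.split(r"\s+\+\s+", side.strip())):
        parts = term.split()
        if len(parts) == 2:
            coefficient, metabolite = float(parts[0]), parts[1]
        elif len(parts) == 1:
            coefficient, metabolite = 1.0, parts[0]
        else:
            raise ValueError(f"cannot parse term {term!r}")
        terms[metabolite] = terms.get(metabolite, 0.0) + coefficient
    return terms


def parse_formula(formula):
    """Return (stoichiometry, reversible); substrates get negative values."""
    if "<->" in formula:  # test this first because '<->' contains '->'
        left, right = formula.split("<->", 1)
        reversible = True
    elif "->" in formula:
        left, right = formula.split("->", 1)
        reversible = False
    else:
        raise ValueError(f"no arrow in formula {formula!r}")
    stoichiometry = {m: -c for m, c in parse_side(left).items()}
    for metabolite, coefficient in parse_side(right).items():
        stoichiometry[metabolite] = stoichiometry.get(metabolite, 0.0) + coefficient
    return stoichiometry, reversible


print(parse_formula("glc_e <->"))  # ({'glc_e': -1.0}, True)
```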

### Compartment Suffixes
- `_c`: Cytosol
- `_m`: Mitochondria
- `_e`: Extracellular
- `_r`: Endoplasmic reticulum
- `_x`: Peroxisome
- `_n`: Nucleus
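
The suffix convention can be turned into a small lookup, for example to fill the Compartment column from metabolite IDs. This is a sketch under the assumption that the suffix is whatever follows the last underscore; `compartment_of` is a hypothetical helper.

```python
# Suffix table copied from the list above.
COMPARTMENTS = {
    "c": "cytosol",
    "m": "mitochondria",
    "e": "extracellular",
    "r": "endoplasmic reticulum",
    "x": "peroxisome",
    "n": "nucleus",
}


def compartment_of(metabolite_id):
    """Return the compartment name for an ID such as 'glc_c', else None."""
    _, separator, suffix = metabolite_id.rpartition("_")
    if not separator:
        return None  # no underscore, so no compartment suffix
    return COMPARTMENTS.get(suffix)


print(compartment_of("glc_c"))  # cytosol
```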
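
### Exchange Reactions
```
# Exchange reactions are written as reversible with an empty right-hand side.
# Lower_Bound and Upper_Bound decide the direction:
# negative flux is uptake, positive flux is secretion.

# Import reaction (Lower_Bound < 0)
EX_glc_e: glc_e <->

# Export reaction (Upper_Bound > 0)
EX_co2_e: co2_e <->
```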
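Because both exchange reactions share the same `<->` formula, their direction comes from the bounds. The sketch below classifies an exchange reaction from its `Lower_Bound` and `Upper_Bound`, assuming the usual COBRA convention for a formula of the form `met_e <->` (negative flux is uptake, positive flux is secretion). The function name is hypothetical.

```python
def exchange_direction(lower_bound, upper_bound):
    """Describe what an exchange reaction 'met_e <->' is allowed to do."""
    if lower_bound > upper_bound:
        raise ValueError("lower bound exceeds upper bound")
    can_take_up = lower_bound < 0  # negative flux brings the metabolite in
    can_secrete = upper_bound > 0  # positive flux removes it from the system
    if can_take_up and can_secrete:
        return "uptake and secretion"
    if can_take_up:
        return "uptake only"
    if can_secrete:
        return "secretion only"
    return "blocked"


print(exchange_direction(-1000.0, 1000.0))  # uptake and secretion
print(exchange_direction(0.0, 1000.0))      # secretion only
```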

## GPR Rule Syntax

### Logical Operators
- **AND**: Gene products required together
- **OR**: Alternative gene products
- **Parentheses**: Grouping for complex logic

### Examples
```
# Single gene
GENE1

# Alternative genes (isozymes)
GENE1 or GENE2 or GENE3

# Required genes (complex)
GENE1 and GENE2

# Complex logic
(GENE1 and GENE2) or (GENE3 and GENE4)
```
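A rule written with lowercase `and`/`or` is also a valid Python boolean expression, so its logic can be illustrated with the standard `ast` module. This sketch assumes gene IDs are valid Python identifiers (purely numeric IDs would need a different tokenizer) and treats an empty rule or `-` as "no gene requirement", as in the example table. `gpr_is_satisfied` is a hypothetical helper, not part of COBRAxy.

```python
import ast


def gpr_is_satisfied(rule, expressed_genes):
    """Evaluate a GPR rule against a set of available gene IDs."""
    rule = rule.strip()
    if rule in ("", "-"):
        return True  # assumption: no rule means no gene requirement

    def evaluate(node):
        if isinstance(node, ast.BoolOp):
            values = [evaluate(value) for value in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            return node.id in expressed_genes
        raise ValueError(f"unsupported GPR element: {ast.dump(node)}")

    # mode="eval" parses a single expression; nothing is executed.
    return evaluate(ast.parse(rule, mode="eval").body)


rule = "(GENE1 and GENE2) or (GENE3 and GENE4)"
print(gpr_is_satisfied(rule, {"GENE3", "GENE4"}))  # True
print(gpr_is_satisfied(rule, {"GENE1", "GENE3"}))  # False
```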

## Examples

### Create Basic Model

```bash
# Convert simple CSV to SBML model
tabular2MetabolicModel --input simple_model.csv \
                       --format sbml \
                       --output simple_model.xml \
                       --out_log simple_conversion.log
```

### Multi-format Export

```bash
# Create models in all supported formats
formats=("sbml" "json" "mat" "yaml")
for fmt in "${formats[@]}"; do
    tabular2MetabolicModel --input comprehensive_model.csv \
                           --format "$fmt" \
                           --output "model.$fmt" \
                           --out_log "conversion_$fmt.log"
done
```

### Custom Model Creation

```bash
# Build tissue-specific model from curated data
tabular2MetabolicModel --input liver_reactions.tsv \
                       --format sbml \
                       --output liver_model.xml \
                       --out_log liver_model.log \
                       --tool_dir /opt/COBRAxy
```

### Model Integration Pipeline

```bash
# Extract existing model, modify, and recreate
metabolicModel2Tabular --model ENGRO2 --out_tabular base_model.csv

# Edit base_model.csv with custom reactions/constraints
# and save the result as modified_model.csv

# Create modified model
tabular2MetabolicModel --input modified_model.csv \
                       --format sbml \
                       --output custom_model.xml \
                       --out_log custom_creation.log
```

## Model Validation

### Automatic Checks

The tool performs validation during conversion:
- **Stoichiometric Balance**: Reaction mass balance
- **Metabolite Consistency**: Compartment assignments
- **Bound Validation**: Feasible constraint ranges
- **Objective Function**: Valid biomass reaction

### Post-conversion Validation

```python
import cobra
from cobra.manipulation import check_mass_balance

# Load and validate model
model = cobra.io.read_sbml_model('custom_model.xml')

# Check basic properties
print(f"Reactions: {len(model.reactions)}")
print(f"Metabolites: {len(model.metabolites)}")
print(f"Genes: {len(model.genes)}")

# Test model solvability
solution = model.optimize()
print(f"Growth rate: {solution.objective_value}")

# Validate mass balance (only meaningful when metabolites carry formulas)
unbalanced = check_mass_balance(model)
if unbalanced:
    print("Unbalanced reactions found:", unbalanced)
```

## Integration Workflow

### Upstream Data Sources

#### COBRAxy Tools
- [Metabolic Model Setting](metabolic-model-setting.md) - Extract tabular data for modification

#### External Sources
- **Databases**: KEGG, Reactome, BiGG
- **Literature**: Manually curated reactions
- **Spreadsheets**: User-defined custom models

### Downstream Applications

#### COBRAxy Analysis
- [RAS to Bounds](ras-to-bounds.md) - Apply constraints to custom model
- [Flux Simulation](flux-simulation.md) - Sample fluxes from custom model
- [MAREA](marea.md) - Analyze custom pathways

#### External Tools
- **COBRApy**: Python-based analysis
- **COBRA Toolbox**: MATLAB analysis
- **OptFlux**: Strain design
- **Escher**: Pathway visualization

### Typical Pipeline

```bash
# 1. Start with existing model data
metabolicModel2Tabular --model ENGRO2 \
                       --out_tabular base_reactions.csv

# 2. Modify/extend the reaction data
# Edit base_reactions.csv to add tissue-specific reactions
# and save the result as modified_reactions.csv

# 3. Create custom model
tabular2MetabolicModel --input modified_reactions.csv \
                       --format sbml \
                       --output tissue_model.xml \
                       --out_log tissue_creation.log

# 4. Apply RAS-derived bounds to the custom model
ras_to_bounds --model Custom --input tissue_model.xml \
              --ras_input tissue_expression.tsv \
              --idop tissue_bounds/

# 5. Perform flux analysis
flux_simulation --model Custom --input tissue_model.xml \
                --bounds tissue_bounds/*.tsv \
                --algorithm CBS --idop tissue_fluxes/
```

## Quality Control

### Input Data Validation

#### Pre-conversion Checks
- **Format Consistency**: Verify column headers and data types
- **Reaction Completeness**: Check for missing required fields
- **Stoichiometric Validity**: Validate reaction formulas
- **Bound Feasibility**: Ensure lower ≤ upper bounds

#### Common Data Issues
```bash
# These one-liners assume the column order of the example file
# (1 = Reaction_ID, 3 = Reaction_Formula, 4 = Lower_Bound, 5 = Upper_Bound)
# and that no field contains a quoted comma.

# Check for missing reaction IDs
awk -F',' 'NR>1 && ($1=="" || $1=="NA") {print "Empty ID in line " NR}' input.csv

# Validate reaction directions
awk -F',' 'NR>1 && $3 !~ /->|<->/ {print "Invalid formula: " $1 ", " $3}' input.csv

# Check bound consistency
awk -F',' 'NR>1 && $4>$5 {print "Invalid bounds: " $1 ", LB=" $4 " > UB=" $5}' input.csv
```
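The same three checks can be written against column names instead of positions, which also copes with quoted fields. This is a minimal standard-library sketch; `find_problems` is a hypothetical helper and the checks are deliberately as shallow as the `awk` versions.

```python
import csv
import io


def find_problems(handle, delimiter=","):
    """Yield (line_number, message) for rows that fail the basic checks."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    for row in reader:
        line = reader.line_num
        reaction_id = (row.get("Reaction_ID") or "").strip()
        if reaction_id in ("", "NA"):
            yield line, "empty reaction ID"
        formula = row.get("Reaction_Formula") or ""
        if "->" not in formula:  # '<->' also contains '->'
            yield line, f"no arrow in formula: {formula!r}"
        try:
            lower = float(row["Lower_Bound"])
            upper = float(row["Upper_Bound"])
        except (KeyError, TypeError, ValueError):
            yield line, "non-numeric bound"
            continue
        if lower > upper:
            yield line, f"lower bound {lower} exceeds upper bound {upper}"


sample = io.StringIO(
    "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
    "R1,A -> B,0,10\n"
    "R2,A = B,5,1\n"
)
for line, message in find_problems(sample):
    print(f"line {line}: {message}")
```

To check a real file, open it with `newline=""` as the `csv` module documentation recommends and pass the handle to `find_problems`.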

### Model Quality Assessment

#### Structural Properties
- **Network Connectivity**: Ensure realistic pathway structure
- **Compartmentalization**: Validate transport reactions
- **Exchange Reactions**: Verify medium composition
- **Biomass Function**: Check objective reaction completeness

#### Functional Testing
```python
import cobra

# Test model functionality
model = cobra.io.read_sbml_model('custom_model.xml')

# Check growth capability
growth = model.optimize().objective_value
print(f"Maximum growth rate: {growth}")

# Flux Variability Analysis; fraction_of_optimum=0.0 drops the optimality
# constraint so that zero-range reactions are blocked, not merely unused
fva_result = cobra.flux_analysis.flux_variability_analysis(
    model, fraction_of_optimum=0.0
)
tolerance = 1e-9
blocked_reactions = fva_result[
    (fva_result.minimum.abs() < tolerance) & (fva_result.maximum.abs() < tolerance)
]
print(f"Blocked reactions: {len(blocked_reactions)}")

# Essential gene analysis
essential_genes = cobra.flux_analysis.find_essential_genes(model)
print(f"Essential genes: {len(essential_genes)}")
```

## Tips and Best Practices

### Data Preparation
- **Consistent Naming**: Use systematic metabolite/reaction IDs
- **Compartment Notation**: Follow standard suffixes (`_c`, `_m`, `_e`)
- **Balanced Reactions**: Verify mass and charge balance
- **Realistic Bounds**: Use physiologically relevant constraints

### Model Design
- **Modular Structure**: Organize reactions by pathway/subsystem
- **Exchange Reactions**: Include all necessary transport processes
- **Biomass Function**: Define appropriate growth objective
- **Gene Associations**: Add GPR rules where available

### Format Selection
- **SBML**: Choose for maximum compatibility and sharing
- **JSON**: Use for COBRApy-specific workflows
- **MATLAB**: Select for COBRA Toolbox integration
- **YAML**: Pick for human-readable documentation

### Performance Optimization
- **Model Size**: Balance comprehensiveness with computational efficiency
- **Reaction Pruning**: Remove unnecessary or blocked reactions
- **Compartmentalization**: Minimize unnecessary compartments
- **Validation**: Test model properties before distribution

## Troubleshooting

### Common Issues

**Conversion fails with format error**
- Check CSV/TSV column headers and data consistency
- Verify reaction formula syntax
- Ensure numeric fields contain valid numbers

**Model is infeasible after conversion**
- Check reaction bounds for conflicts
- Verify exchange reaction setup
- Validate stoichiometric balance

**Missing metabolites or reactions**
- Confirm all required columns are present in the input
- Check for empty rows or malformed data
- Validate reaction formula parsing

### Error Messages

| Error | Cause | Solution |
|-------|-------|----------|
| "Input file not found" | Invalid file path | Check file location and permissions |
| "Unknown format" | Invalid output format | Use: sbml, json, mat, or yaml |
| "Formula parsing failed" | Malformed reaction equation | Check reaction formula syntax |
| "Model infeasible" | Conflicting constraints | Review bounds and exchange reactions |

### Performance Issues

**Slow conversion**
- Large input files require more processing time
- Complex GPR rules increase parsing overhead
- Monitor system memory usage

**Memory errors**
- Reduce model size or split into smaller files
- Increase available system memory
- Use more efficient data structures

**Output file corruption**
- Ensure sufficient disk space
- Check file write permissions
- Verify format-specific requirements

## Advanced Usage

### Batch Model Creation

```python
#!/usr/bin/env python3
import subprocess

import pandas as pd


def customize_for_tissue(base_data, tissue):
    """Placeholder: return the base table unchanged.

    Replace this with your own tissue-specific filtering or bound edits.
    """
    return base_data.copy()


# Create multiple tissue-specific models
tissues = ['liver', 'muscle', 'brain', 'heart']
# dtype=str keeps cells as text so values such as FALSE round-trip unchanged
base_data = pd.read_csv('base_model.csv', dtype=str)

for tissue in tissues:
    # Modify base data for tissue specificity
    tissue_data = customize_for_tissue(base_data, tissue)
    tissue_data.to_csv(f'{tissue}_model.csv', index=False)

    # Convert to SBML; check=True stops the loop if a conversion fails
    subprocess.run([
        'tabular2MetabolicModel',
        '--input', f'{tissue}_model.csv',
        '--format', 'sbml',
        '--output', f'{tissue}_model.xml',
        '--out_log', f'{tissue}_conversion.log'
    ], check=True)
```

### Model Merging

Combine multiple tabular files into comprehensive models. Appending rows this way only works when every file uses the same columns in the same order:

```bash
# Merge core metabolism with tissue-specific pathways
cat core_reactions.csv > combined_model.csv
tail -n +2 tissue_reactions.csv >> combined_model.csv
tail -n +2 disease_reactions.csv >> combined_model.csv

# Create merged model
tabular2MetabolicModel --input combined_model.csv \
                       --format sbml \
                       --output comprehensive_model.xml \
                       --out_log comprehensive_model.log
```
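Plain concatenation does not notice reordered columns or a reaction ID that appears in more than one file. The following sketch merges tables by column name and stops on duplicate IDs. It uses only the standard library; `merge_tables` and `write_table` are hypothetical helpers, and treating duplicates as an error is a design assumption rather than documented tool behavior.

```python
import csv
import io


def merge_tables(handles, key="Reaction_ID"):
    """Concatenate reaction tables, aligning columns by name."""
    fieldnames, merged, seen = [], [], set()
    for handle in handles:
        reader = csv.DictReader(handle)
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)  # keep first-seen column order
        for row in reader:
            if row[key] in seen:
                raise ValueError(f"duplicate {key}: {row[key]}")
            seen.add(row[key])
            merged.append(row)
    return fieldnames, merged


def write_table(fieldnames, rows, handle):
    """Write merged rows; columns missing from a row are left empty."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


core = io.StringIO(
    "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
    "R1,A -> B,0,1000\n"
)
tissue = io.StringIO(
    "Reaction_ID,Lower_Bound,Upper_Bound,Reaction_Formula,Subsystem\n"
    "R2,0,1000,B -> C,Custom\n"
)
columns, rows = merge_tables([core, tissue])
combined = io.StringIO()
write_table(columns, rows, combined)
print(combined.getvalue())
```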

### Model Versioning

Track model versions and changes:

```bash
# Version control for model development
git add model_v1.csv
git commit -m "Initial model version"

# Create versioned models
tabular2MetabolicModel --input model_v1.csv --format sbml \
                       --output model_v1.xml --out_log model_v1.log
tabular2MetabolicModel --input model_v2.csv --format sbml \
                       --output model_v2.xml --out_log model_v2.log

# Compare model versions
# (cobra_compare_models is a placeholder for a user-supplied comparison script)
cobra_compare_models model_v1.xml model_v2.xml
```
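Since the tabular files are the versioned source, one simple way to compare two versions is to diff them by reaction ID. The sketch below reports added, removed, and changed reactions using only the standard library; `load_by_id` and `diff_tables` are hypothetical names, and this compares the tables rather than the generated model files.

```python
import csv
import io


def load_by_id(handle, key="Reaction_ID"):
    """Index the rows of a reaction table by reaction ID."""
    return {row[key]: row for row in csv.DictReader(handle)}


def diff_tables(old, new):
    """Return (added IDs, removed IDs, {ID: changed column names})."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for reaction_id in sorted(old.keys() & new.keys()):
        columns = old[reaction_id].keys() | new[reaction_id].keys()
        fields = sorted(
            c for c in columns
            if old[reaction_id].get(c) != new[reaction_id].get(c)
        )
        if fields:
            changed[reaction_id] = fields
    return added, removed, changed


header = "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
v1 = load_by_id(io.StringIO(header + "R1,A -> B,0,1000\nR2,B -> C,0,1000\n"))
v2 = load_by_id(io.StringIO(header + "R1,A -> B,-1000,1000\nR3,C -> D,0,1000\n"))

added, removed, changed = diff_tables(v1, v2)
print(added)    # ['R3']
print(removed)  # ['R2']
print(changed)  # {'R1': ['Lower_Bound']}
```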

## See Also

- [Metabolic Model Setting](metabolic-model-setting.md) - Extract tabular data from existing models
- [RAS to Bounds](ras-to-bounds.md) - Apply constraints to custom models
- [Flux Simulation](flux-simulation.md) - Analyze custom models with flux sampling
- [Model Creation Tutorial](../tutorials/custom-model-creation.md)
- [COBRA Model Standards](../tutorials/cobra-model-standards.md)