view COBRAxy/docs/tools/marea-cluster.md @ 547:73f2f7e2be17 draft
Uploaded
| author | francesco_lapi |
|---|---|
| date | Tue, 28 Oct 2025 10:44:07 +0000 |
| parents | fcdbc81feb45 |
| children | |
# MAREA Cluster

Cluster analysis for metabolic data (RAS/RPS scores, flux distributions).

## Overview

MAREA Cluster performs unsupervised clustering on metabolic data using K-means, DBSCAN, or hierarchical clustering.

## Galaxy Interface

In Galaxy: **COBRAxy → Cluster Analysis**

1. Upload the metabolic data file
2. Select the clustering algorithm and its parameters
3. Click **Run tool**

## Command-line console

```bash
marea_cluster -in metabolic_data.tsv \
              -cy kmeans \
              -sc true \
              -k1 2 \
              -k2 10 \
              -idop output/
```

## Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Input Data | `-in` | Metabolic data TSV file | - |
| Algorithm | `-cy` | `kmeans`, `dbscan`, or `hierarchy` | `kmeans` |
| Scaling | `-sc` | Scale the data before clustering | `false` |
| K Min | `-k1` | Minimum number of clusters (K-means/hierarchy) | 2 |
| K Max | `-k2` | Maximum number of clusters (K-means/hierarchy) | 10 |
| Epsilon | `-ep` | DBSCAN neighborhood radius | 0.5 |
| Min Samples | `-ms` | DBSCAN minimum number of samples in a neighborhood | 5 |
| Elbow Plot | `-el` | Generate an elbow plot | `false` |
| Silhouette | `-si` | Compute silhouette scores | `false` |
| Output Path | `-idop` | Output directory | `marea_cluster/` |

## Input Format

```
Reaction  Sample1  Sample2  Sample3
R00001    1.25     0.85     1.42
R00002    0.65     1.35     0.72
```

**File Format Notes:**

- Use **tab-separated** (TSV) or **comma-separated** (CSV) values
- The first row must contain column headers (`Reaction`, then the sample names)
- Metabolic data columns must contain numeric values only
- Remove or impute missing values before clustering

## Algorithms

- **K-means**: fast, but requires the number of clusters in advance (given as a range with `-k1` and `-k2`)
- **DBSCAN**: density-based; does not need the number of clusters, labels outliers as noise, and finds irregularly shaped clusters (tuned with `-ep` and `-ms`)
- **Hierarchical**: tree-based; best suited to small datasets

## Output

- `clusters.tsv`: cluster assignment for each sample
- `silhouette_scores.tsv`: cluster quality metrics (silhouette scores, `-si`)
- `elbow_plot.svg`: elbow plot for choosing the optimal K (K-means, `-el`)
- `*.log`: processing log

## See Also

- [MAREA](tools/marea)
- [RAS Generator](tools/ras-generator)
- [Flux Simulation](tools/flux-simulation)
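## Sketch: reading and scaling the input

To make the **Input Format** rules and the `-sc` flag concrete, here is a minimal sketch that parses a reaction-by-sample table and turns each sample column into one feature vector, since `clusters.tsv` assigns clusters to samples. The helpers `load_samples` and `zscore` are hypothetical and not part of COBRAxy; treating `-sc` as per-reaction z-score standardization is an assumption, not documented behavior.

```python
import csv
import io
import statistics

# Hypothetical helpers for illustration only; not COBRAxy functions.

EXAMPLE = (
    "Reaction\tSample1\tSample2\tSample3\n"
    "R00001\t1.25\t0.85\t1.42\n"
    "R00002\t0.65\t1.35\t0.72\n"
)


def load_samples(text, delimiter="\t"):
    """Parse a reaction-by-sample table; return reaction IDs and per-sample vectors."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    header, body = rows[0], rows[1:]
    samples = header[1:]                    # first column holds reaction IDs
    reactions = [row[0] for row in body]
    # float() raises ValueError on empty or non-numeric cells, which enforces
    # the "numeric values only, no missing values" rules up front.
    matrix = [[float(value) for value in row[1:]] for row in body]
    # Transpose: samples are what gets clustered, so each sample becomes a
    # vector with one coordinate per reaction.
    vectors = {name: [row[j] for row in matrix] for j, name in enumerate(samples)}
    return reactions, vectors


def zscore(vectors):
    """Assumed reading of -sc: standardize each reaction across samples."""
    names = list(vectors)
    scaled = {name: [] for name in names}
    for column in zip(*(vectors[name] for name in names)):
        mean = statistics.fmean(column)
        sd = statistics.pstdev(column)
        for name, value in zip(names, column):
            # A constant reaction carries no information; map it to 0.
            scaled[name].append((value - mean) / sd if sd else 0.0)
    return scaled


reactions, vectors = load_samples(EXAMPLE)
print(reactions)           # ['R00001', 'R00002']
print(vectors["Sample2"])  # [0.85, 1.35]
```

For a CSV file the same function works with `delimiter=","`. Failing loudly on a non-numeric cell, rather than silently skipping it, keeps missing values from distorting distances later.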
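## Sketch: scanning K with inertia and silhouette

The `-k1`/`-k2` range together with `-el` and `-si` suggests evaluating several values of K. The sketch below runs plain Lloyd's K-means for each K, records the inertia (the quantity an elbow plot shows) and the mean silhouette coefficient. The functions `kmeans`, `silhouette`, and `scan_k` are hypothetical; the random initialization and the idea of picking K by the highest silhouette are assumptions, and the tool may choose K differently.

```python
import math
import random

# Hypothetical sketch of a K scan; not the COBRAxy implementation.


def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's algorithm with random initial centroids; returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an emptied cluster keeps its old centroid
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    # Inertia (within-cluster sum of squares) is what an elbow plot shows.
    inertia = sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient; a point alone in its cluster scores 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within its own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, label in zip(points, labels) if label == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def scan_k(points, k_min=2, k_max=10, seed=0):
    """One (k, inertia, silhouette) row per K, mimicking a -k1..-k2 scan."""
    rows = []
    for k in range(k_min, min(k_max, len(points) - 1) + 1):
        labels, inertia = kmeans(points, k, seed=seed)
        rows.append((k, inertia, silhouette(points, labels)))
    return rows


POINTS = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
for k, inertia, score in scan_k(POINTS, 2, 4):
    print(k, round(inertia, 3), round(score, 3))
```

Inertia almost always drops as K grows, so on its own it only shows where the curve bends; the silhouette coefficient penalizes splitting a compact group, which is why the two are usually read together.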
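## Sketch: how `-ep` and `-ms` drive DBSCAN

The DBSCAN entry under **Algorithms** relies on two parameters: a neighborhood radius (`-ep`) and a minimum neighborhood size (`-ms`). This minimal textbook DBSCAN shows how they interact. The `dbscan` function and the `NOISE = -1` label are hypothetical; counting the point itself toward `min_samples` follows the scikit-learn convention, and it is an assumption that the tool does the same.

```python
import math

NOISE = -1  # label for points not reachable from any core point


def dbscan(points, eps=0.5, min_samples=5):
    """Textbook DBSCAN; min_samples counts the point itself."""
    n = len(points)
    # Precompute eps-neighborhoods (O(n^2), fine for a sketch).
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    is_core = [len(nb) >= min_samples for nb in neighbours]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Grow a new cluster outward from an unlabeled core point.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbours[p]:
                if labels[q] is None:
                    labels[q] = cluster
                    if is_core[q]:  # only core points extend the cluster
                        frontier.append(q)
        cluster += 1
    # Anything never reached from a core point is noise.
    return [NOISE if label is None else label for label in labels]


POINTS = [[0, 0], [0, 0.3], [0.3, 0], [0.2, 0.2],
          [5, 5], [5, 5.3], [5.3, 5], [5.2, 5.2],
          [20, 20]]
print(dbscan(POINTS, eps=0.5, min_samples=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With the default `min_samples=5` no point in this toy set has enough neighbors, so every point becomes noise; on small sample sets `-ms` often needs lowering before DBSCAN finds any cluster at all.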
