| Boxplot: Transcriptome data | Boxplot: Proteome data | 
|---|---|
|  |  | 
| Transcript Fold-Change | Protein Fold-Change | 
|---|---|
|  |  | 
| PCA plot: Transcriptome data | PCA plot: Proteome data | 
|---|---|
|  |  | 
| Scatter plot between Proteome and Transcriptome Abundance |
|---|
|  |

Below we use a Cook's distance-based approach to identify influential observations.
Assuming a linear relationship between the proteome and transcriptome data, we fit a linear regression model.

| Parameter | Value | 
|---|---|
| Formula | PE_abundance~TE_abundance | 
| Coefficients | |
| (Intercept) | -0.06910598 (Pvalue: 1.220723e-05 ) | 
| TE_abundance | 0.1712395 (Pvalue: 4.168015e-10 ) | 
| Model parameters | |
| Residual standard error | 0.8363295 (2815 degrees of freedom) | 
| F-statistic | 39.31142 (on 1 and 2815 degrees of freedom) | 
| R-squared | 0.01377265 | 
| Adjusted R-squared | 0.0134223 | 
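
The fit summarized above can be reproduced in outline with ordinary least squares. The following is a minimal sketch in plain Python, not the tool's own implementation; the function name `fit_simple_ols` and the toy fold-change values are hypothetical and chosen only for illustration.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration; x plays the role of
    TE_abundance and y the role of PE_abundance.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    tss = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares
    if sxx == 0 or tss == 0:
        raise ValueError("x and y must each contain at least two distinct values")

    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    rss = sum(r ** 2 for r in residuals)  # residual sum of squares
    df_residual = n - 2  # two estimated coefficients
    r_squared = 1 - rss / tss

    return {
        "intercept": intercept,
        "slope": slope,
        "residuals": residuals,
        "df_residual": df_residual,
        "residual_se": math.sqrt(rss / df_residual),
        "r_squared": r_squared,
        "adj_r_squared": 1 - (1 - r_squared) * (n - 1) / df_residual,
        "f_statistic": float("inf") if rss == 0 else (tss - rss) / (rss / df_residual),
    }


# Toy data (made up, not taken from the report).
te_abundance = [-2.0, -1.0, 0.0, 1.0, 2.0]
pe_abundance = [-0.5, 0.1, -0.2, 0.4, 0.3]
fit = fit_simple_ols(te_abundance, pe_abundance)
print(fit["intercept"], fit["slope"], fit["r_squared"])
```

The values in the table are internally consistent: 2815 residual degrees of freedom imply 2817 observations, and 1 - (1 - 0.01377265) × 2816 / 2815 ≈ 0.0134223, which matches the reported adjusted R-squared.
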
| 1) Residuals vs Fitted plot | 2) Normal Q-Q plot of residuals | 
|---|---|
|  |  | 
| This plot checks the linearity assumption. A roughly horizontal line with no distinct pattern indicates a linear relationship. | This plot checks whether the residuals are normally distributed. Ideally, the residual points follow the straight dashed line without deviating much from it. | 
| Residuals from Regression | |
|---|---|
| Parameter | Value | 
| Mean Residual value | 1.942328e-17 | 
| Standard deviation (Residuals) | 0.836181 | 
| Total outliers (residual value more than 2 standard deviations from the mean) | 164 (Download these 164 data points with high residual values here) | 
| (Download the complete residuals data here) | |
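
The outlier rule in the table can be sketched as follows. This is an illustration under two assumptions that the report does not state explicitly: the rule is two-sided (large negative residuals count as well), and the standard deviation is the sample standard deviation with an n - 1 denominator. The second assumption is at least consistent with the reported numbers, since 0.8363295 × sqrt(2815 / 2816) ≈ 0.836181. The function name `flag_residual_outliers` and the toy residuals are hypothetical.

```python
import statistics


def flag_residual_outliers(residuals, n_sd=2.0):
    """Return indices of residuals more than n_sd standard deviations from the mean.

    Assumes a two-sided rule and the sample standard deviation (n - 1 denominator).
    """
    mean_r = statistics.fmean(residuals)
    sd_r = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r - mean_r) > n_sd * sd_r]


# Toy residuals (made up): only the last value is far from the rest.
toy_residuals = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.1, -0.1, 3.0]
outlier_indices = flag_residual_outliers(toy_residuals)
print(outlier_indices)
```
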
| 3) Residuals vs Leverage plot | 
|---|
|  | 
| This plot helps identify influential cases, that is, outliers or extreme values that might change the regression results depending on whether they are included in or excluded from the analysis. | 
Cook's distance measures the influence of each observation on the fitted values, that is, how much the fitted values change when that observation is left out.
As a common rule of thumb, observations with a Cook's distance greater than 4 times the mean Cook's distance may be classified as influential.

| Parameter | Value | 
|---|---|
| Mean Cook's distance | 0.0004875011 | 
| Total influential observations (Cook's distance > 4 * mean Cook's distance) | 115 | 
| Observations with Cook's distance < 4 * mean Cook's distance | 2702 | 
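
The influence filter described above can be sketched for a one-predictor regression using the standard closed form D_i = e_i² / (p · s²) × h_i / (1 - h_i)², where e_i is the residual, h_i the leverage, p = 2 the number of coefficients, and s² the residual mean square. This is a minimal sketch, not the tool's own code; the function names `cooks_distances` and `flag_influential` and the toy data are hypothetical.

```python
def cooks_distances(x, y):
    """Cook's distance of each observation for the fit y = a + b * x."""
    n = len(x)
    p = 2  # intercept and slope
    if n != len(y) or n <= p:
        raise ValueError("need more than two paired observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")

    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(r ** 2 for r in residuals) / (n - p)  # residual mean square
    if mse == 0:
        raise ValueError("perfect fit: Cook's distance is undefined")

    distances = []
    for xi, ri in zip(x, residuals):
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        distances.append(ri ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


def flag_influential(distances, multiplier=4.0):
    """Indices whose Cook's distance exceeds multiplier * mean Cook's distance."""
    threshold = multiplier * sum(distances) / len(distances)
    return [i for i, d in enumerate(distances) if d > threshold]


# Toy data (made up): the last point has high leverage and lies off the trend.
te = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 6.0]
pe = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, -2.0]
distances = cooks_distances(te, pe)
influential = flag_influential(distances)
print(influential)
```

In the report, the two counts add up to the full data set: 115 + 2702 = 2817 observations, the same number implied by the 2815 residual degrees of freedom.
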
| Scatterplot: Before removal | Scatterplot: After removal |
|---|---|
|  |  |

| Gene | Protein Log Fold-Change | Transcript Log Fold-Change | Cook's Distance | 
|---|---|---|---|
| CATHL2 | -1.960863 | 4.88565 | 0.1432189 | 
| CD177 | -4.173263 | 2.057499 | 0.06826605 | 
| CATHL1 | -0.9912973 | 4.835209 | 0.05767091 | 
| HP | 2.570727 | 3.885549 | 0.04680496 | 
| AZU1 | -2.226356 | -5.561874 | 0.03737565 | 
| ELANE | -2.732479 | -2.914936 | 0.03266198 | 
| PYGM | -0.06079228 | 6.071712 | 0.03242859 | 
| LTF | -2.4294 | 2.129742 | 0.02725017 | 
| ATP1A2 | 0.2871971 | 6.446299 | 0.01939256 | 
| C13H20orf194 | -5.640732 | -0.6697401 | 0.01852927 | 
| Heatmap of PE and TE abundance values (Hierarchical clustering) | Number of clusters to extract: 5 | 
|---|---|
|  | |
| Download the hierarchical cluster list | |
| K-means clustering | Number of clusters: 4 | 
|---|---|
|  | |
| Download the cluster list | |