QuanTP: Association between abundance ratios of transcript and protein


Input data summary




SAMPLE DISTRIBUTION

Boxplot: Transcriptome data | Boxplot: Proteome data

Sample-wise distribution (box plot) after averaging replicates

Boxplot: Transcriptome data | Boxplot: Proteome data

Distribution (Box plot) of log fold change

Boxplot: Transcriptome data | Boxplot: Proteome data


Download the complete fold change data: Transcript Fold-Change | Protein Fold-Change



PCA plot: Transcriptome data | PCA plot: Proteome data

CORRELATION


Scatter plot between Proteome and Transcriptome Abundance

Parameter                 Method 1                                Method 2                          Method 3
Correlation method        Pearson's product-moment correlation    Spearman's rank correlation rho   Kendall's rank correlation tau
Correlation coefficient   0.1173569                               0.1608612                         0.1093701

*Note: correlation is sensitive to outliers, so it is important to examine outliers and influential observations in the data.
A Cook's distance based approach is used later in this report (see INFLUENTIAL OBSERVATIONS) to identify such observations.
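
The three coefficients in the table above can be reproduced on any pair of abundance vectors. The following is a minimal sketch using only the Python standard library, not the tool's own implementation. The names `te` and `pe` and their values are hypothetical toy data, and the tie-corrected tau-b form of Kendall's coefficient is an assumption, since the report does not say which variant was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b (assumed variant), O(n^2) pair counting."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both terms
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt(
        (untied + tied_x_only) * (untied + tied_y_only)
    )


# Hypothetical toy values, not taken from the report.
te = [1.0, 2.0, 3.0, 4.0, 5.0]
pe = [2.0, 1.0, 4.0, 3.0, 5.0]
print(pearson(te, pe), spearman(te, pe), kendall_tau_b(te, pe))
```

The pair-counting loop is quadratic in the number of genes, which is acceptable for a few thousand observations but would be slow for much larger tables.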

REGRESSION ANALYSIS

Linear Regression model fit between Proteome and Transcriptome data

Assuming a linear relationship between the proteome and transcriptome data, we fit a linear regression model of protein abundance (PE_abundance) on transcript abundance (TE_abundance).

Parameter                   Value
Formula                     PE_abundance ~ TE_abundance
Coefficients
  (Intercept)               -0.06910598 (p-value: 1.220723e-05)
  TE_abundance              0.1712395 (p-value: 4.168015e-10)
Model parameters
  Residual standard error   0.8363295 (2815 degrees of freedom)
  F-statistic               39.31142 (on 1 and 2815 degrees of freedom)
  R-squared                 0.01377265
  Adjusted R-squared        0.0134223
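
To make the quantities in this table concrete, here is a minimal sketch of an ordinary least squares fit with a single predictor, written with the Python standard library. It is an illustration under stated assumptions, not the code that produced the report: the function name `fit_simple_linear` and the toy vectors are hypothetical, and p-values are omitted because they need a t-distribution that the standard library does not provide.

```python
from math import sqrt


def fit_simple_linear(x, y):
    """Least-squares fit of y = intercept + slope * x with summary statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(e * e for e in residuals)      # residual sum of squares
    sst = sum((b - my) ** 2 for b in y)      # total sum of squares
    df = n - 2                               # two estimated coefficients
    r_squared = 1 - sse / sst
    return {
        "intercept": intercept,
        "slope": slope,
        "residuals": residuals,
        "df": df,
        "residual_standard_error": sqrt(sse / df),
        "r_squared": r_squared,
        "adjusted_r_squared": 1 - (1 - r_squared) * (n - 1) / df,
        "f_statistic": (sst - sse) / (sse / df),
    }


# Hypothetical toy values, not taken from the report.
te_abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
pe_abundance = [2.0, 1.0, 4.0, 3.0, 5.0]
model = fit_simple_linear(te_abundance, pe_abundance)
print(model["slope"], model["intercept"], model["r_squared"])
```

With a single predictor, R-squared equals the square of Pearson's correlation coefficient. The report's values agree with this: 0.1173569 squared is approximately 0.0137726.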

Regression and diagnostics plots

1) Residuals vs Fitted plot

This plot checks the linear relationship assumption.
A roughly horizontal trend line with no distinct pattern in the residuals indicates a linear relationship.

2) Normal Q-Q plot of residuals

This plot checks whether the residuals are normally distributed.
Ideally, the residual points follow the straight dashed line, i.e., they do not deviate much from it.

Outliers based on the residuals from regression analysis

Residuals from Regression

Parameter                                                                  Value
Mean residual value                                                        1.942328e-17
Standard deviation (residuals)                                             0.836181
Total outliers (residual more than 2 standard deviations from the mean)   164

Download the 164 data points with high residual values | Download the complete residuals data
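
The outlier rule in this table can be sketched in a few lines. This is an illustration only: the function name `residual_outliers` and the toy residuals are hypothetical, and two points are assumptions rather than facts from the report, namely that the threshold applies to the absolute deviation from the mean (both tails) and that the sample standard deviation is used.

```python
from statistics import mean, stdev


def residual_outliers(residuals, n_sd=2.0):
    """Return indices of residuals more than n_sd standard deviations from the mean.

    Assumption: both tails are flagged (absolute deviation) and the sample
    standard deviation (n - 1 denominator) is used.
    """
    centre = mean(residuals)
    spread = stdev(residuals)
    return [i for i, e in enumerate(residuals) if abs(e - centre) > n_sd * spread]


# Hypothetical toy residuals, not taken from the report.
toy_residuals = [0.1, -0.2, 0.05, -0.1, 0.15, 3.0, -0.05, 0.0, 0.1, -0.15]
print(residual_outliers(toy_residuals))
```

Because the mean residual of a least-squares fit with an intercept is zero up to rounding (1.942328e-17 in the report), the rule in practice reduces to comparing each absolute residual with twice the residual standard deviation.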




3) Residuals vs Leverage plot

This plot helps identify influential cases, that is, outliers or extreme values.
Including or excluding them from the analysis might change the regression results.


INFLUENTIAL OBSERVATIONS

Cook's distance measures the influence of each observation on the fitted values, i.e., how much the fitted values change when that observation is left out.
By a common rule of thumb, observations with a Cook's distance greater than 4 times the mean Cook's distance may be classified as influential.


In the above plot, observations above the red line (4 * mean Cook's distance) are influential. Genes that are outliers could be important. These observations influence the correlation values and regression coefficients.

Parameter                                                                     Value
Mean Cook's distance                                                          0.0004875011
Total influential observations (Cook's distance > 4 * mean Cook's distance)   115
Observations with Cook's distance < 4 * mean Cook's distance                  2702
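
The influence rule summarised above can be illustrated for a regression with one predictor. The sketch below uses the standard closed form of Cook's distance, D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2, where e_i is the residual, h_i the leverage, s^2 the residual variance and p = 2 the number of coefficients. The function names and toy data are hypothetical, and this is not the tool's own implementation.

```python
def cooks_distances(x, y):
    """Cook's distance of every observation in a one-predictor least-squares fit."""
    n = len(x)
    p = 2  # intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - mx) ** 2 / sxx
        distances.append(e * e / (p * s2) * leverage / (1 - leverage) ** 2)
    return distances


def influential_indices(distances, multiplier=4.0):
    """Indices whose Cook's distance exceeds multiplier * mean Cook's distance."""
    cutoff = multiplier * sum(distances) / len(distances)
    return [i for i, d in enumerate(distances) if d > cutoff]


# Hypothetical toy data: the last point has high leverage and lies off the trend.
toy_te = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
toy_pe = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 7.9, 9.0, 5.0]
toy_distances = cooks_distances(toy_te, toy_pe)
print(influential_indices(toy_distances))
```

A cutoff relative to the mean adapts to the scale of the distances in each dataset. Other conventions exist, such as a fixed cutoff of 4/n, so the flagged set depends on which rule is chosen.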


Scatterplot: Before removal | Scatterplot: After removal

Correlation before removing influential observations

Parameter                 Method 1                                Method 2                          Method 3
Correlation method        Pearson's product-moment correlation    Spearman's rank correlation rho   Kendall's rank correlation tau
Correlation coefficient   0.1173569                               0.1608612                         0.1093701

Correlation after removing influential observations

Parameter                 Method 1                                Method 2                          Method 3
Correlation method        Pearson's product-moment correlation    Spearman's rank correlation rho   Kendall's rank correlation tau
Correlation coefficient   0.1334038                               0.1611936                         0.1082761


Download the complete list of influential observations | Download the complete list (after removing influential points)

Top 10 Influential observations (Cook's distance > 4 * mean Cook's distance)

Gene           Protein Log Fold-Change   Transcript Log Fold-Change   Cook's Distance
CATHL2         -1.960863                 4.88565                      0.1432189
CD177          -4.173263                 2.057499                     0.06826605
CATHL1         -0.9912973                4.835209                     0.05767091
HP             2.570727                  3.885549                     0.04680496
AZU1           -2.226356                 -5.561874                    0.03737565
ELANE          -2.732479                 -2.914936                    0.03266198
PYGM           -0.06079228               6.071712                     0.03242859
LTF            -2.4294                   2.129742                     0.02725017
ATP1A2         0.2871971                 6.446299                     0.01939256
C13H20orf194   -5.640732                 -0.6697401                   0.01852927



CLUSTER ANALYSIS


Heatmap of PE and TE abundance values (Hierarchical clustering)
Number of clusters to extract: 5
Download the hierarchical cluster list


K-means clustering
Number of clusters: 4
Download the cluster list
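
For readers unfamiliar with k-means, the following minimal sketch shows the basic iteration (Lloyd's algorithm) on pairs of transcript and protein values. It is an illustration under stated assumptions, not the tool's implementation: the function name `kmeans`, the toy points, and the choice of the first k points as starting centroids are all hypothetical simplifications. Practical implementations usually use several random starts.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (squared distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical (transcript, protein) pairs forming two well-separated groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
print(toy_labels)
```

The report extracts 4 clusters. The toy call uses k=2 only to keep the example easy to follow by hand.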

