view test-data/output.html @ 3:a8bedebab467 draft default tip
planemo upload commit 30548b09c6490f2f1f4d953a54c114d8e895fb5a
author | galaxyp |
---|---|
date | Wed, 06 Feb 2019 11:35:44 -0500 |
parents | 75faf9a89f5b |
children |
<html><head></head><body> <h1><u>QuanTP: Association between abundance ratios of transcript and protein</u></h1><hr/> <font><h3>Input data summary</h3></font> <ul> <li>Abbreviations used: PE (Proteome data) and TE (Transcriptome data) </li><br> <li>Input Proteome data dimension (Row Column): 2817 x 5 </li> <li>Input Transcriptome data dimension (Row Column): 2817 x 5 </li></ul><hr/> <h3 id=table_of_content>Table of Contents:</h3> <ul> <li><a href=#sample_dist>Sample distribution</a></li> <li><a href=#corr_data>Correlation</a></li> <li><a href=#regression_data>Regression analysis</a></li> <li><a href=#inf_obs>Influential observations</a></li> <li><a href=#cluster_data>Cluster analysis</a></li></ul><hr/> <h2 id="sample_dist"><font color=#ff0000>SAMPLE DISTRIBUTION</font></h2> <table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Boxplot: Transcriptome data</font></th><th><font color=#ffcc33>Boxplot: Proteome data</font></th></tr> <tr><td align=center> <img src="Box_TE_all_rep.png" width=500 height=500></td> <td align=center> <img src="Box_PE_all_rep.png" width=500 height=500></td></tr></table> <br><font color="#ff0000"><h3>Sample wise distribution (Box plot) after using mean on replicates </h3></font><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Boxplot: Transcriptome data</font></th><th><font color=#ffcc33>Boxplot: Proteome data</font></th></tr> <tr><td align=center> <img src="Box_TE_rep.png" width=500 height=500></td> <td align=center> <img src="Box_PE_rep.png" width=500 height=500></td></tr></table> <br><font color="#ff0000"><h3>Distribution (Box plot) of log fold change </h3></font><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Boxplot: Transcriptome data</font></th><th><font color=#ffcc33>Boxplot: Proteome data</font></th></tr> <tr><td align=center> <img 
src="Box_TE.png" width=500 height=500></td> <td align=center> <img src="Box_PE.png" width=500 height=500></td></tr></table> <br><br><font size=5><b><a href='PE_TE_logfold_pval.txt' target='_blank'>Download the complete fold change data here</a></b></font><br> <br><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Transcript Fold-Change</font></th><th><font color=#ffcc33>Protein Fold-Change</font></th></tr> <tr><td align=center> <img src="TE_volcano.png" width=600 height=600></td> <td align=center> <img src="PE_volcano.png" width=600 height=600></td></tr></table><br> <br><br><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>PCA plot: Transcriptome data</font></th><th><font color=#ffcc33>PCA plot: Proteome data</font></th></tr> <tr><td align=center> <img src="PCA_TE_all_rep.png" width=500 height=500></td> <td align=center> <img src="PCA_PE_all_rep.png" width=500 height=500></td></tr></table> <hr/><h2 id="corr_data"><font color=#ff0000>CORRELATION</font></h2> <br><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Scatter plot between Proteome and Transcriptome Abundance</font></th></tr> <tr><td align=center> <img src="TE_PE_scatter.png" width=800 height=800></td> <tr><td align=center> <table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Parameter</font></th><th><font color=#ffcc33>Method 1</font></th><th><font color=#ffcc33>Method 2</font></th><th><font color=#ffcc33>Method 3</font></th></tr> <tr><td>Correlation method</td><td> Pearson's product-moment correlation </td><td> Spearman's rank correlation rho </td><td> Kendall's rank correlation tau </td></tr> <tr><td>Correlation coefficient</td><td> 0.1173569 </td><td> 0.1608612 </td><td> 0.1093701 </td></tr> </table> <font color="red">*Note that 
<u>correlation</u> is <u>sensitive to outliers</u> in the data, so it is important to analyze outliers/influential observations in the data.<br> Below we use a <u>Cook's distance-based approach</u> to identify such influential observations.</font> </td></table><hr/><h2 id="regression_data"><font color=#ff0000>REGRESSION ANALYSIS</font></h2> <font><h3>Linear Regression model fit between Proteome and Transcriptome data</h3></font> <p>Assuming a linear relationship between Proteome and Transcriptome data, we fit a linear regression model.</p> <table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Parameter</font></th><th><font color=#ffcc33>Value</font></th></tr> <tr><td>Formula</td><td> PE_abundance~TE_abundance </td></tr> <tr><td colspan='2' align='center'> <b>Coefficients</b></td> </tr> <tr><td> (Intercept) </td><td> -0.06910598 (P-value: 1.220723e-05) </td></tr> <tr><td> TE_abundance </td><td> 0.1712395 (P-value: 4.168015e-10) </td></tr> <tr><td colspan='2' align='center'> <b>Model parameters</b></td> </tr> <tr><td>Residual standard error</td><td> 0.8363295 (2815 degrees of freedom)</td></tr> <tr><td>F-statistic</td><td> 39.31142 (on 1 and 2815 degrees of freedom)</td></tr> <tr><td>R-squared</td><td> 0.01377265 </td></tr> <tr><td>Adjusted R-squared</td><td> 0.0134223 </td></tr> </table> <font color='#ff0000'><h3>Regression and diagnostic plots</h3></font> <table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "><tr bgcolor="#7a0019"><th> <font color='#ffcc33'><h4>1) <u>Residuals vs Fitted plot</u></h4></font></th> <th><font color=#ffcc33><h4>2) <u>Normal Q-Q plot of residuals</u></h4></font></th></tr> <tr><td align=center><img src="PE_TE_lm_1.png" width=600 height=600></td><td align=center><img src="PE_TE_lm_2.png" width=600 height=600></td></tr> <tr><td align=center>This plot checks the assumption of a linear relationship.<br>If a horizontal line is observed without any distinct 
patterns, it indicates a linear relationship.</td> <td align=center>This plot checks whether the residuals are normally distributed.<br>Ideally, the residual points follow the straight dashed line, i.e., do not deviate much from it.</td></tr></table> <br><h2><font color=#ff0000>Outliers based on the residuals from regression analysis</font></h2> <table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th colspan=2><font color=#ffcc33>Residuals from Regression</font></th></tr> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Parameter</font></th><th><font color=#ffcc33>Value</font></th></tr> <tr><td>Mean Residual value</td><td> 1.942328e-17 </td></tr> <tr><td>Standard deviation (Residuals)</td><td> 0.836181 </td></tr> <tr><td>Total outliers (Residual value &gt; 2 standard deviations from the mean)</td><td> 164 <font size=4>(<b><a href=PE_TE_outliers_residuals.txt target="_blank">Download these 164 data points with high residual values here</a></b>)</font></td></tr> <tr><td colspan=2 align=center><font size=4>(<b><a href=PE_TE_abundance_residuals.txt target="_blank">Download the complete residuals data here</a></b>)</font></td></tr> </table><br><br> <br><br><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "><tr bgcolor="#7a0019"><th><font color=#ffcc33><h4>3) <u>Residuals vs Leverage plot</u></h4></font></th></tr> <tr><td align=center><img src="PE_TE_lm_5.png" width=600 height=600></td></tr> <tr><td align=center>This plot is useful to identify any influential cases, that is, outliers or extreme values.<br>They might influence the regression results upon inclusion or exclusion from the analysis.</td></tr></table><br> <hr/><h2 id="inf_obs"><font color=#ff0000>INFLUENTIAL OBSERVATIONS</font></h2> <p><b>Cook's distance</b> computes the influence of each data point/observation on the predicted outcome, i.e., 
this measures how much the observation influences the fitted values.<br>As a rule of thumb, observations with a <b>Cook's distance greater than 4 times the mean</b> may be classified as <b>influential</b>.</p> <img src="PE_TE_lm_cooksd.png" width=800 height=800> <br>In the above plot, observations above the red line (4 * mean Cook's distance) are influential. Genes that are outliers could be important. These observations influence the correlation values and regression coefficients.<br><br><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Parameter</font></th><th><font color=#ffcc33>Value</font></th></tr> <tr><td>Mean Cook's distance</td><td> 0.0004875011 </td></tr> <tr><td>Total influential observations (Cook's distance &gt; 4 * mean Cook's distance)</td><td> 115 </td></tr> <tr><td>Observations with Cook's distance &lt; 4 * mean Cook's distance</td><td> 2702 </td></tr> </table><br><br> <table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Scatterplot: Before removal</font></th><th><font color=#ffcc33>Scatterplot: After removal</font></th></tr> <tr><td align=center><!--<font color='#ff0000'><h3>Scatter plot between Proteome and Transcriptome Abundance</h3></font> --> <img src="TE_PE_scatter.png" width=600 height=600></td> <td align=center> <img src="AbundancePlot_scatter_without_outliers.png" width=600 height=600></td></tr> <tr><td> <table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Parameter</font></th><th><font color=#ffcc33>Method 1</font></th><th><font color=#ffcc33>Method 2</font></th><th><font color=#ffcc33>Method 3</font></th></tr> <tr><td>Correlation method</td><td> Pearson's product-moment correlation </td><td> Spearman's rank correlation rho </td><td> Kendall's rank correlation tau </td></tr> <tr><td>Correlation coefficient</td><td> 0.1173569 </td><td> 0.1608612 </td><td> 
0.1093701 </td></tr> </table> </td> <td><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Parameter</font></th><th><font color=#ffcc33>Method 1</font></th><th><font color=#ffcc33>Method 2</font></th><th><font color=#ffcc33>Method 3</font></th></tr> <tr><td>Correlation method</td><td> Pearson's product-moment correlation </td><td> Spearman's rank correlation rho </td><td> Kendall's rank correlation tau </td></tr> <tr><td>Correlation coefficient</td><td> 0.1334038 </td><td> 0.1611936 </td><td> 0.1082761 </td></tr> </table></td></tr></table> <br><br><font size=5><b><a href='PE_TE_influential_observation.txt' target='_blank'>Download the complete list of influential observations</a></b></font> <font size=5><b><a href='PE_TE_non_influential_observation.txt' target='_blank'>Download the complete list (After removing influential points)</a></b></font><br> <br><font color="brown"><h4>Top 10 Influential observations (Cook's distance > 4 * mean Cook's distance)</h4></font> <table border=1 cellspacing=0 cellpadding=5> <tr bgcolor="#7a0019"> <th><font color=#ffcc33>Gene</font></th><th><font color=#ffcc33>Protein Log Fold-Change</font></th><th><font color=#ffcc33>Transcript Log Fold-Change</font></th><th><font color=#ffcc33>Cook's Distance</font></th></tr> <tr> <td> CATHL2 </td> <td> -1.960863 </td> <td> 4.88565 </td> <td> 0.1432189 </td></tr> <tr> <td> CD177 </td> <td> -4.173263 </td> <td> 2.057499 </td> <td> 0.06826605 </td></tr> <tr> <td> CATHL1 </td> <td> -0.9912973 </td> <td> 4.835209 </td> <td> 0.05767091 </td></tr> <tr> <td> HP </td> <td> 2.570727 </td> <td> 3.885549 </td> <td> 0.04680496 </td></tr> <tr> <td> AZU1 </td> <td> -2.226356 </td> <td> -5.561874 </td> <td> 0.03737565 </td></tr> <tr> <td> ELANE </td> <td> -2.732479 </td> <td> -2.914936 </td> <td> 0.03266198 </td></tr> <tr> <td> PYGM </td> <td> -0.06079228 </td> <td> 6.071712 </td> <td> 0.03242859 </td></tr> <tr> <td> LTF </td> <td> -2.4294 
</td> <td> 2.129742 </td> <td> 0.02725017 </td></tr> <tr> <td> ATP1A2 </td> <td> 0.2871971 </td> <td> 6.446299 </td> <td> 0.01939256 </td></tr> <tr> <td> C13H20orf194 </td> <td> -5.640732 </td> <td> -0.6697401 </td> <td> 0.01852927 </td></tr> </table><br><br> <hr/><h2 id="cluster_data"><font color=#ff0000>CLUSTER ANALYSIS</font></h2> <br><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Heatmap of PE and TE abundance values (Hierarchical clustering)</font></th><th><font color=#ffcc33>Number of clusters to extract: 5 </font></th></tr> <tr><td align=center colspan="2"><img src="PE_TE_heatmap.png" width=800 height=800></td></tr> <tr><td colspan="2" align=center><font size=5><a href="PE_TE_hc_clusterpoints.txt" target="_blank"><b>Download the hierarchical cluster list</b></a></font></td></tr></table> <br><br><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>K-means clustering</font></th><th><font color=#ffcc33>Number of clusters: 4 </font></th></tr> <tr><td colspan="2" align=center><img src="PE_TE_kmeans.png" width=800 height=800></td></tr> <tr><td colspan="2" align=center><font size=5><a href="PE_TE_kmeans_clusterpoints.txt" target="_blank"><b>Download the cluster list</b></a></font></td></tr></table><br><hr/> <h3>Go To:</h3> <ul> <li><a href=#sample_dist>Sample distribution</a></li> <li><a href=#corr_data>Correlation</a></li> <li><a href=#regression_data>Regression analysis</a></li> <li><a href=#inf_obs>Influential observations</a></li> <li><a href=#cluster_data>Cluster analysis</a></li></ul> <br><a href=#>TOP</a></body></html>