view fastqc_report.Rmd @ 15:d1d20f341632 draft
fastqc_report v2.0.0
author | mingchen0919 |
---|---|
date | Thu, 19 Oct 2017 00:11:14 -0400 |
parents | 2efa46ce2c4c |
children | 1710b0e874f1 |
---
title: 'Short reads evaluation with [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)'
output:
  html_document:
    number_sections: true
    toc: true
    theme: cosmo
    highlight: tango
---

```{r setup, include=FALSE, warning=FALSE, message=FALSE}
# htmltools provides the `tags` helpers used for the report links below
library(htmltools)
knitr::opts_chunk$set(
  echo = ECHO
)
```

# FastQC Analysis

* Copy fastq files to job working directory

```{bash 'copy files'}
# READS is a comma-separated list of input paths
for f in $(echo READS | sed "s/,/ /g")
do
    cp "$f" ./
done
```

* Run fastqc

```{bash 'run fastqc'}
# iterate over the copied files with a glob instead of parsing `ls` output
for r in *.dat
do
    fastqc -o REPORT_DIR "$r" > /dev/null 2>&1
done
```

## Evaluation results

```{r 'html report links'}
# match only files ending in .html
html_file = list.files('REPORT_DIR', pattern = '\\.html$')
tags$ul(tags$li(tags$a(href=html_file, paste0('HTML report', opt$name))))
```

```{r 'extract fastqc_data.txt and summary.txt'}
# list all zip files
zip_file = list.files(path = 'REPORT_DIR', pattern = '\\.zip$')
unzip(paste0('REPORT_DIR/', zip_file), exdir = 'REPORT_DIR')
# fastqc unpacks into '<input file name>_fastqc/'
unzip_directory = paste0(basename(opt$reads), '_fastqc/')
fastqc_data_txt_path = paste0('REPORT_DIR/', unzip_directory, 'fastqc_data.txt')
summary_txt_path = paste0('REPORT_DIR/', unzip_directory, 'summary.txt')
```

```{r 'summary.txt'}
tags$ul(tags$li(tags$a(href=paste0(unzip_directory, 'summary.txt'), 'summary.txt')))
```

```{r 'fastqc_data.txt'}
tags$ul(tags$li(tags$a(href=paste0(unzip_directory, 'fastqc_data.txt'), 'fastqc_data.txt')))
```

# FastQC output visualization

## Overview

```{r}
# summary.txt is tab-separated: status, module name, file name;
# keep the module name and status, in that order
summary_txt = read.csv(summary_txt_path, header = FALSE, sep = '\t')[, 2:1]
names(summary_txt) = c('MODULE', 'PASS/FAIL')
knitr::kable(summary_txt)
```

## Summary by module {.tabset}

* Define a function to extract outputs for each module from fastqc output

```{r 'function definition'}
extract_data_module = function(fastqc_data, module_name) {
  f = readLines(fastqc_data)
  # module headers look like '>>Module name<TAB>status'; match them literally
  start_line = grep(paste0('>>', module_name), f, fixed = TRUE)[1]
  if (is.na(start_line)) {
    stop('Module not found in FastQC output: ', module_name)
  }
  end_module_lines = grep('>>END_MODULE', f, fixed = TRUE)
  # first END_MODULE marker after the module header
  end_line = end_module_lines[which(end_module_lines > start_line)[1]]
  module_data = f[(start_line + 1):(end_line - 1)]
  # parse the tab-separated block directly, without a temporary file
  read.csv(text = module_data, sep = '\t')
}
```

### Per base sequence quality

```{r}
pbsq = extract_data_module(fastqc_data_txt_path, 'Per base sequence quality')
knitr::kable(pbsq)
```

### Per tile sequence quality

```{r}
ptsq = extract_data_module(fastqc_data_txt_path, 'Per tile sequence quality')
knitr::kable(ptsq)
```

# Session Info

```{r 'session info'}
sessionInfo()
```