fastqc_report v2.0.0
author mingchen0919
date Thu, 19 Oct 2017 00:11:14 -0400

---
title: 'Short reads evaluation with [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)'
output:
    html_document:
      number_sections: true
      toc: true
      theme: cosmo
      highlight: tango
---

```{r setup, include=FALSE, warning=FALSE, message=FALSE}
# htmltools provides the `tags` helpers used to build the report links
library(htmltools)

knitr::opts_chunk$set(
  echo = ECHO
)
```


# FastQC analysis

* Copy the FASTQ files to the job working directory

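```{bash 'copy files'}
# the input is a comma-separated list of paths; split on commas only so that
# paths containing spaces stay intact
IFS=',' read -r -a reads <<< "READS"
for f in "${reads[@]}"
do
    cp "$f" ./
done
```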
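
To see why the way the comma-separated input is split matters, here is a small standalone sketch. It compares word splitting after turning commas into spaces with splitting on commas only via `IFS`. The paths in `reads` are made up for illustration, and the block is not part of the report pipeline.

```bash
#!/usr/bin/env bash
# Made-up stand-in for the comma-separated input; the first path contains a space.
reads='run 1/sample_a.fastq,run2/sample_b.fastq'

# Approach 1: turn commas into spaces and rely on shell word splitting.
naive=( $(echo "$reads" | sed 's/,/ /g') )

# Approach 2: split on commas only, leaving spaces inside each path intact.
IFS=',' read -r -a split <<< "$reads"

echo "word splitting:  ${#naive[@]} items"   # 3, the first path is broken in two
echo "comma splitting: ${#split[@]} items"   # 2, one item per path
```

With the second approach each array element can be passed to `cp` as `"$f"` without further escaping.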

* Run FastQC on every copied file

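```{bash 'run fastqc'}
# iterate over the glob directly instead of parsing `ls` output, and quote
# the file name so unusual characters do not split it
for r in *.dat
do
    fastqc -o REPORT_DIR "$r" > /dev/null 2>&1
done
```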

## Evaluation results

```{r 'html report links'}
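# match only file names that end in .html
html_file = list.files('REPORT_DIR', pattern = '\\.html$')
# a <ul> should contain <li> items, so wrap the link in one
tags$ul(tags$li(tags$a(href = html_file, paste('HTML report', opt$name))))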
```


```{r 'extract fastqc_data.txt and summary.txt'}
# locate the zip archive written by FastQC and unpack it next to the HTML report
zip_file = list.files(path = 'REPORT_DIR', pattern = '\\.zip$')
unzip(file.path('REPORT_DIR', zip_file), exdir = 'REPORT_DIR')

# the unpacked directory is named after the input file plus a '_fastqc' suffix
unzip_directory = paste0(basename(opt$reads), '_fastqc/')
fastqc_data_txt_path = paste0('REPORT_DIR/', unzip_directory, 'fastqc_data.txt')
summary_txt_path = paste0('REPORT_DIR/', unzip_directory, 'summary.txt')
```


```{r 'summary.txt'}
tags$ul(tags$li(tags$a(href = paste0(unzip_directory, 'summary.txt'), 'summary.txt')))
```


```{r 'fastqc_data.txt'}
tags$ul(tags$li(tags$a(href = paste0(unzip_directory, 'fastqc_data.txt'), 'fastqc_data.txt')))
```


# FastQC output visualization

## Overview

```{r}
# summary.txt has three tab-separated columns (status, module, file name);
# keep the module and its status, in that order
summary_txt = read.csv(summary_txt_path, header = FALSE, sep = '\t')[, 2:1]
names(summary_txt) = c('MODULE', 'STATUS')
knitr::kable(summary_txt)
```

## Summary by module {.tabset}

* Define a function that extracts the data table for a given module from `fastqc_data.txt`

```{r 'function definition'}
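extract_data_module = function(fastqc_data, module_name) {
  f = readLines(fastqc_data)
  # a module section starts at '>>' followed by its name and ends at '>>END_MODULE'
  start_line = grep(paste0('>>', module_name), f, fixed = TRUE)[1]
  if (is.na(start_line)) {
    stop('module not found in ', fastqc_data, ': ', module_name)
  }
  end_module_lines = grep('>>END_MODULE', f, fixed = TRUE)
  end_line = end_module_lines[end_module_lines > start_line][1]
  module_data = f[(start_line + 1):(end_line - 1)]
  # parse the tab-separated table from memory; its first line is the header
  read.csv(text = module_data, sep = '\t')
}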
```
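
The slicing logic can also be cross-checked from the shell. The sketch below is an `awk` analogue, not the R function itself. It assumes the layout the function relies on: a `>>Module name` header line, a tab-separated table, and a closing `>>END_MODULE` line. The sample file content and the helper name `extract_module` are made up for illustration.

```bash
#!/usr/bin/env bash
# Miniature stand-in for fastqc_data.txt; all values are made up.
sample=$(mktemp)
{
    printf '>>Basic Statistics\tpass\n'
    printf '#Measure\tValue\n'
    printf 'Total Sequences\t100\n'
    printf '>>END_MODULE\n'
    printf '>>Per base sequence quality\tpass\n'
    printf '#Base\tMean\n'
    printf '1\t32.0\n'
    printf '2\t31.5\n'
    printf '>>END_MODULE\n'
} > "$sample"

# Hypothetical helper: print the lines between a module header and its END_MODULE.
extract_module() {
    # $1: path to the data file, $2: module name
    awk -v name=">>$2" '
        index($0, name) == 1      { inside = 1; next }  # module header line
        inside && /^>>END_MODULE/ { exit }              # first END_MODULE after it
        inside                    { print }             # header row and data rows
    ' "$1"
}

module=$(extract_module "$sample" 'Per base sequence quality')
printf '%s\n' "$module"
rm -f "$sample"
```

For the sample above this prints the `#Base` header row followed by the two data rows, which is the same span the R function passes to `read.csv`.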

### Per base sequence quality

```{r}
pbsq = extract_data_module(fastqc_data_txt_path, 'Per base sequence quality')
knitr::kable(pbsq)
```

### Per tile sequence quality

```{r}
ptsq = extract_data_module(fastqc_data_txt_path, 'Per tile sequence quality')
knitr::kable(ptsq)
```



# Session Info

```{r 'session info'}
sessionInfo()
```