# HG changeset patch # User iuc # Date 1523130341 14400 # Node ID 4c7ab9995f9ed70f1c405ef5c7d25d1138254f62 # Parent d7725c5596ab8cd745b8105874a71dc26afeb346 planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/diffbind commit cc4c1c4131518b9cbf986a1f252767ff73ca938e diff -r d7725c5596ab -r 4c7ab9995f9e diffbind.R --- a/diffbind.R Tue Mar 20 04:51:25 2018 -0400 +++ b/diffbind.R Sat Apr 07 15:45:41 2018 -0400 @@ -4,8 +4,9 @@ Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") suppressPackageStartupMessages({ - library('getopt') - library('DiffBind') + library('getopt') + library('DiffBind') + library('rjson') }) options(stringAsfactors = FALSE, useFancyQuotes = FALSE) @@ -14,15 +15,19 @@ #get options, using the spec as defined by the enclosed list. #we read the options from the default: commandArgs(TRUE). spec = matrix(c( - 'verbose', 'v', 2, "integer", - 'help' , 'h', 0, "logical", + 'infile' , 'i', 1, "character", 'outfile' , 'o', 1, "character", + 'scorecol', 'n', 1, "integer", + 'lowerbetter', 'l', 1, "logical", + 'summits', 's', 1, "integer", + 'th', 't', 1, "double", + 'format', 'f', 1, "character", 'plots' , 'p', 2, "character", - 'infile' , 'i', 1, "character", - 'format', 'f', 1, "character", - 'th', 't', 1, "double", 'bmatrix', 'b', 0, "logical", - "rdaOpt", "r", 0, "logical" + "rdaOpt", "r", 0, "logical", + 'infoOpt' , 'a', 0, "logical", + 'verbose', 'v', 2, "integer", + 'help' , 'h', 0, "logical" ), byrow=TRUE, ncol=4); opt = getopt(spec); @@ -34,31 +39,81 @@ q(status=1); } -if ( !is.null(opt$plots) ) { - pdf(opt$plots) +parser <- newJSONParser() +parser$addData(opt$infile) +factorList <- parser$getObject() +filenamesIn <- unname(unlist(factorList[[1]][[2]])) +peaks <- filenamesIn[grepl("peaks.bed", filenamesIn)] +bams <- filenamesIn[grepl("bamreads.bam", filenamesIn)] +ctrls <- filenamesIn[grepl("bamcontrol.bam", filenamesIn)] + +# get the group and sample id from the peaks filenames +groups <- sapply(strsplit(peaks,"-"), `[`, 1) +samples <- sapply(strsplit(peaks,"-"), `[`, 2) + +if ( length(ctrls) != 0 ) { + sampleTable <- data.frame(SampleID=samples, + Condition=groups, + bamReads=bams, + bamControl=ctrls, + Peaks=peaks, + Tissue=samples, # using "Tissue" column to display ids as labels in PCA plot + stringsAsFactors=FALSE) +} else { + sampleTable <- data.frame(SampleID=samples, + Replicate=samples, + Condition=groups, + bamReads=bams, + Peaks=peaks, + Tissue=samples, + stringsAsFactors=FALSE) } -sample = dba(sampleSheet=opt$infile, peakFormat='bed') -sample_count = dba.count(sample) +sample = dba(sampleSheet=sampleTable, peakFormat='bed', scoreCol=opt$scorecol, bLowerScoreBetter=opt$lowerbetter) + +if ( !is.null(opt$summits) ) { + sample_count = dba.count(sample, summits=opt$summits) +} else { + sample_count = dba.count(sample) +} + sample_contrast = dba.contrast(sample_count, categories=DBA_CONDITION, minMembers=2) sample_analyze = dba.analyze(sample_contrast) -diff_bind = dba.report(sample_analyze) -orvals = dba.plotHeatmap(sample_analyze, contrast=1, correlations=FALSE) -dev.off() +diff_bind = dba.report(sample_analyze, th=opt$th) +# Generate plots +if ( !is.null(opt$plots) ) { + pdf(opt$plots) + orvals = dba.plotHeatmap(sample_analyze, contrast=1, correlations=FALSE, cexCol=0.8, th=opt$th) + dba.plotPCA(sample_analyze, contrast=1, th=opt$th, label=DBA_TISSUE, labelSize=0.3) + dba.plotMA(sample_analyze, th=opt$th) + dba.plotVolcano(sample_analyze, th=opt$th) + dba.plotBox(sample_analyze, th=opt$th) + dev.off() +} + +# Output differential binding sites resSorted <- diff_bind[order(diff_bind$FDR),] -write.table(as.data.frame(resSorted), file = opt$outfile, sep="\t", quote = FALSE, append=TRUE, row.names = FALSE, col.names = FALSE) +write.table(as.data.frame(resSorted), file = opt$outfile, sep="\t", quote = FALSE, append=TRUE, row.names = FALSE) # Output binding affinity scores if (!is.null(opt$bmatrix)) { bmat <- dba.peakset(sample_count, bRetrieve=TRUE, DataType=DBA_DATA_FRAME) - write.table(as.data.frame(bmat), file="bmatrix.tab", sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE) + write.table(as.data.frame(bmat), file="bmatrix.tab", sep="\t", quote=FALSE, row.names=FALSE) } -## Output RData file - +# Output RData file if (!is.null(opt$rdaOpt)) { save.image(file = "DiffBind_analysis.RData") } -sessionInfo() +# Output analysis info +if (!is.null(opt$infoOpt)) { + info <- "DiffBind_analysis_info.txt" + cat("dba.count Info\n\n", file=info, append = TRUE) + capture.output(sample, file=info, append=TRUE) + cat("\ndba.analyze Info\n\n", file=info, append = TRUE) + capture.output(sample_analyze, file=info, append=TRUE) + cat("\nSessionInfo\n\n", file=info, append = TRUE) + capture.output(sessionInfo(), file=info, append=TRUE) +} \ No newline at end of file diff -r d7725c5596ab -r 4c7ab9995f9e diffbind.xml --- a/diffbind.xml Tue Mar 20 04:51:25 2018 -0400 +++ b/diffbind.xml Sat Apr 07 15:45:41 2018 -0400 @@ -1,8 +1,9 @@ - + differential binding analysis of ChIP-Seq peak data bioconductor-diffbind r-getopt + r-rjson /dev/null | grep -v -i "WARNING: ") +echo $(R --version | grep version | grep -v GNU)", DiffBind version" $(R --vanilla --slave -e "library(DiffBind); cat(sessionInfo()\$otherPkgs\$DiffBind\$Version)" 2> /dev/null | grep -v -i "WARNING: ")", rjson version" $(R --vanilla --slave -e "library(rjson); cat(sessionInfo()\$otherPkgs\$rjson\$Version)" 2> /dev/null | grep -v -i "WARNING: ") ]]> - - - - - - - - - - - - + + + + + + + + + + + + + + - - + + + + + +
@@ -91,8 +135,9 @@ - - + + +
@@ -112,53 +157,43 @@ out['rdata'] + + out['rscript'] + + + out['analysis_info'] + - - - - - - - - - - - - - - - - - - + + + + + + - - - - - - - - + + + + - - - - - - - - - + + + + + + + + + diff -r d7725c5596ab -r 4c7ab9995f9e test-data/DiffBind_analysis.RData Binary file test-data/DiffBind_analysis.RData has changed diff -r d7725c5596ab -r 4c7ab9995f9e test-data/out_analysis_info.txt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/out_analysis_info.txt Sat Apr 07 15:45:41 2018 -0400 @@ -0,0 +1,83 @@ +dba.count Info + +4 Samples, 1394 sites in matrix (2141 total): + ID Tissue Condition Caller Intervals +1 MCF7_ER_1_bed MCF7_ER_1_bed Responsive macs 1556 +2 MCF7_ER_2_bed MCF7_ER_2_bed Responsive macs 1046 +3 BT474_ER_1_bed BT474_ER_1_bed Resistant macs 1080 +4 BT474_ER_2_bed BT474_ER_2_bed Resistant macs 1122 + +dba.analyze Info + +4 Samples, 1394 sites in matrix: + ID Tissue Condition Caller Intervals FRiP +1 MCF7_ER_1_bed MCF7_ER_1_bed Responsive counts 1394 0.38 +2 MCF7_ER_2_bed MCF7_ER_2_bed Responsive counts 1394 0.22 +3 BT474_ER_1_bed BT474_ER_1_bed Resistant counts 1394 0.27 +4 BT474_ER_2_bed BT474_ER_2_bed Resistant counts 1394 0.25 + +1 Contrast: + Group1 Members1 Group2 Members2 DB.DESeq2 +1 Responsive 2 Resistant 2 5 + +SessionInfo + +R version 3.4.1 (2017-06-30) +Platform: x86_64-apple-darwin14.5.0 (64-bit) +Running under: OS X El Capitan 10.11.6 + +Matrix products: default +BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib +LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib + +locale: +[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_US.UTF-8 + +attached base packages: +[1] parallel stats4 methods stats graphics grDevices utils +[8] datasets base + +other attached packages: + [1] bindrcpp_0.2 rjson_0.2.15 + [3] DiffBind_2.6.6 SummarizedExperiment_1.8.0 + [5] DelayedArray_0.4.1 matrixStats_0.52.2 + [7] Biobase_2.38.0 GenomicRanges_1.30.3 + [9] GenomeInfoDb_1.14.0 IRanges_2.12.0 +[11] S4Vectors_0.16.0 BiocGenerics_0.24.0 +[13] getopt_1.20.0 + +loaded via a namespace (and not attached): + [1] Category_2.44.0 bitops_1.0-6 bit64_0.9-5 + [4] RColorBrewer_1.1-2 progress_1.1.2 httr_1.3.1 + [7] Rgraphviz_2.22.0 tools_3.4.1 backports_1.0.5 + [10] R6_2.2.2 rpart_4.1-13 KernSmooth_2.23-15 + [13] Hmisc_4.0-3 DBI_0.8 lazyeval_0.2.1 + [16] colorspace_1.3-2 nnet_7.3-12 gridExtra_2.3 + [19] DESeq2_1.18.1 prettyunits_1.0.2 bit_1.1-12 + [22] compiler_3.4.1 sendmailR_1.2-1 graph_1.56.0 + [25] htmlTable_1.9 labeling_0.3 rtracklayer_1.38.0 + [28] caTools_1.17.1 scales_0.5.0 checkmate_1.8.2 + [31] BatchJobs_1.6 genefilter_1.60.0 RBGL_1.54.0 + [34] stringr_1.3.0 digest_0.6.12 Rsamtools_1.30.0 + [37] foreign_0.8-67 AnnotationForge_1.20.0 XVector_0.18.0 + [40] htmltools_0.3.6 base64enc_0.1-3 pkgconfig_2.0.1 + [43] limma_3.34.6 htmlwidgets_1.0 rlang_0.2.0 + [46] RSQLite_2.0 BBmisc_1.11 bindr_0.1.1 + [49] GOstats_2.44.0 hwriter_1.3.2 BiocParallel_1.12.0 + [52] gtools_3.5.0 acepack_1.4.1 dplyr_0.7.4 + [55] RCurl_1.95-4.8 magrittr_1.5 Formula_1.2-1 + [58] GO.db_3.5.0 GenomeInfoDbData_0.99.1 Matrix_1.2-12 + [61] Rcpp_0.12.15 munsell_0.4.3 stringi_1.1.6 + [64] edgeR_3.20.7 zlibbioc_1.24.0 gplots_3.0.1 + [67] fail_1.3 plyr_1.8.4 grid_3.4.1 + [70] blob_1.1.1 ggrepel_0.7.0 gdata_2.18.0 + [73] lattice_0.20-34 Biostrings_2.46.0 splines_3.4.1 + [76] GenomicFeatures_1.28.5 annotate_1.56.0 locfit_1.5-9.1 + [79] knitr_1.20 pillar_1.2.1 systemPipeR_1.12.0 + [82] geneplotter_1.56.0 biomaRt_2.34.2 glue_1.2.0 + [85] XML_3.98-1.6 ShortRead_1.36.0 latticeExtra_0.6-28 + [88] data.table_1.10.4 gtable_0.2.0 amap_0.8-14 + [91] assertthat_0.2.0 ggplot2_2.2.1 xtable_1.8-2 + [94] survival_2.40-1 tibble_1.4.2 pheatmap_1.0.8 + [97] GenomicAlignments_1.14.1 AnnotationDbi_1.40.0 memoise_1.1.0 +[100] cluster_2.0.6 brew_1.0-6 GSEABase_1.40.0 diff -r d7725c5596ab -r 4c7ab9995f9e test-data/out_binding.matrix --- a/test-data/out_binding.matrix Tue Mar 20 04:51:25 2018 -0400 +++ b/test-data/out_binding.matrix Sat Apr 07 15:45:41 2018 -0400 @@ -1,17 +1,18 @@ -chr18 111567 112005 29.4139375878664 19.9594576489093 137.615208000375 59.878372946728 -chr18 189223 189652 11.5554754809475 23.110950961895 19.9594576489093 12.6059732519427 -chr18 215232 216063 31.5149331298568 72.4843461986707 11.5554754809475 15.7574665649284 -chr18 311530 312172 54.6258840917518 43.0704086108043 17.8584621069189 11.5554754809475 -chr18 346464 347342 21.0099554199046 16.8079643359236 75.6358395116564 40.9694130688139 -chr18 356560 357362 57.7773774047375 53.5753863207566 11.5554754809475 14.7069687939332 -chr18 371110 372102 81.9388261376278 82.989323908623 8.40398216796182 9.45447993895705 -chr18 394600 396513 510.541916703681 438.05757050501 56.7268796337423 43.0704086108043 -chr18 399014 400382 558.864814169461 496.885445680743 156.524167878289 117.655750351465 -chr18 498906 500200 196.443083176108 181.736114382174 767.913870597511 278.381909313735 -chr18 503518 504552 194.342087634117 231.10950961895 99.7972882445466 61.9793684887184 -chr18 531672 532439 57.7773774047375 73.5348439696659 44.1209063817996 23.110950961895 -chr18 568036 569332 207.998558657055 221.655029679993 165.978647817246 157.574665649284 -chr18 589046 589875 117.655750351466 149.170683481322 50.4238930077709 72.4843461986707 +CHR START END MCF7_ER_1_bed MCF7_ER_2_bed BT474_ER_1_bed BT474_ER_2_bed +chr18 111567 112005 137.615208000375 59.878372946728 29.4139375878664 19.9594576489093 +chr18 189223 189652 19.9594576489093 12.6059732519427 11.5554754809475 23.110950961895 +chr18 215232 216063 11.5554754809475 15.7574665649284 31.5149331298568 72.4843461986707 +chr18 311530 312172 17.8584621069189 11.5554754809475 54.6258840917518 43.0704086108043 +chr18 346464 347342 75.6358395116564 40.9694130688139 21.0099554199046 16.8079643359236 +chr18 356560 357362 11.5554754809475 14.7069687939332 57.7773774047375 53.5753863207566 +chr18 371110 372102 8.40398216796182 9.45447993895705 81.9388261376278 82.989323908623 +chr18 394600 396513 56.7268796337423 43.0704086108043 510.541916703681 438.05757050501 +chr18 399014 400382 156.524167878289 117.655750351465 558.864814169461 496.885445680743 +chr18 498906 500200 767.913870597511 278.381909313735 196.443083176108 181.736114382174 +chr18 503518 504552 99.7972882445466 61.9793684887184 194.342087634117 231.10950961895 +chr18 531672 532439 44.1209063817996 23.110950961895 57.7773774047375 73.5348439696659 +chr18 568036 569332 165.978647817246 157.574665649284 207.998558657055 221.655029679993 +chr18 589046 589875 50.4238930077709 72.4843461986707 117.655750351466 149.170683481322 chr18 651923 652540 1.05049777099523 1.05049777099523 1.05049777099523 1.05049777099523 chr18 657092 657876 1.05049777099523 1.05049777099523 1.05049777099523 1.05049777099523 chr18 770235 770992 1.05049777099523 1.05049777099523 1.05049777099523 1.05049777099523 diff -r d7725c5596ab -r 4c7ab9995f9e test-data/out_diffbind.bed --- a/test-data/out_diffbind.bed Tue Mar 20 04:51:25 2018 -0400 +++ b/test-data/out_diffbind.bed Sat Apr 07 15:45:41 2018 -0400 @@ -1,5 +1,6 @@ -chr18 394600 396513 1914 * 7.15 7.89 5.55 2.35 7.06e-24 9.84e-21 -chr18 111567 112005 439 * 5.71 3.63 6.53 -2.89 1.27e-08 8.88e-06 -chr18 346464 347342 879 * 5 3.24 5.77 -2.52 6.51e-06 0.00303 -chr18 399014 400382 1369 * 7.62 8.05 7 1.04 1.04e-05 0.00364 -chr18 371110 372102 993 * 4.63 5.36 3.07 2.3 8.1e-05 0.0226 +seqnames start end width strand Conc Conc_Responsive Conc_Resistant Fold p.value FDR +chr18 394600 396513 1914 * 7.15 5.55 7.89 -2.35 7.06e-24 9.84e-21 +chr18 111567 112005 439 * 5.71 6.53 3.63 2.89 1.27e-08 8.88e-06 +chr18 346464 347342 879 * 5 5.77 3.24 2.52 6.51e-06 0.00303 +chr18 399014 400382 1369 * 7.62 7 8.05 -1.04 1.04e-05 0.00364 +chr18 371110 372102 993 * 4.63 3.07 5.36 -2.3 8.1e-05 0.0226 diff -r d7725c5596ab -r 4c7ab9995f9e test-data/out_plots.pdf Binary file test-data/out_plots.pdf has changed diff -r d7725c5596ab -r 4c7ab9995f9e test-data/out_rscript.txt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/out_rscript.txt Sat Apr 07 15:45:41 2018 -0400 @@ -0,0 +1,119 @@ +## Setup R error handling to go to stderr +options( show.error.messages=F, error = function () { cat( geterrmessage(), file=stderr() ); q( "no", 1, F ) } ) +# we need that to not crash galaxy with an UTF8 error on German LC settings. +Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") + +suppressPackageStartupMessages({ + library('getopt') + library('DiffBind') + library('rjson') +}) + +options(stringAsfactors = FALSE, useFancyQuotes = FALSE) +args <- commandArgs(trailingOnly = TRUE) + +#get options, using the spec as defined by the enclosed list. +#we read the options from the default: commandArgs(TRUE). +spec = matrix(c( + 'infile' , 'i', 1, "character", + 'outfile' , 'o', 1, "character", + 'scorecol', 'n', 1, "integer", + 'lowerbetter', 'l', 1, "logical", + 'summits', 's', 1, "integer", + 'th', 't', 1, "double", + 'format', 'f', 1, "character", + 'plots' , 'p', 2, "character", + 'bmatrix', 'b', 0, "logical", + "rdaOpt", "r", 0, "logical", + 'infoOpt' , 'a', 0, "logical", + 'verbose', 'v', 2, "integer", + 'help' , 'h', 0, "logical" +), byrow=TRUE, ncol=4); + +opt = getopt(spec); + +# if help was asked for print a friendly message +# and exit with a non-zero error code +if ( !is.null(opt$help) ) { + cat(getopt(spec, usage=TRUE)); + q(status=1); +} + +parser <- newJSONParser() +parser$addData(opt$infile) +factorList <- parser$getObject() +filenamesIn <- unname(unlist(factorList[[1]][[2]])) +peaks <- filenamesIn[grepl("peaks.bed", filenamesIn)] +bams <- filenamesIn[grepl("bamreads.bam", filenamesIn)] +ctrls <- filenamesIn[grepl("bamcontrol.bam", filenamesIn)] + +# get the group and sample id from the peaks filenames +groups <- sapply(strsplit(peaks,"-"), `[`, 1) +samples <- sapply(strsplit(peaks,"-"), `[`, 2) + +if ( length(ctrls) != 0 ) { + sampleTable <- data.frame(SampleID=samples, + Condition=groups, + bamReads=bams, + bamControl=ctrls, + Peaks=peaks, + Tissue=samples, # using "Tissue" column to display ids as labels in PCA plot + stringsAsFactors=FALSE) +} else { + sampleTable <- data.frame(SampleID=samples, + Replicate=samples, + Condition=groups, + bamReads=bams, + Peaks=peaks, + Tissue=samples, + stringsAsFactors=FALSE) +} + +sample = dba(sampleSheet=sampleTable, peakFormat='bed', scoreCol=opt$scorecol, bLowerScoreBetter=opt$lowerbetter) + +if ( !is.null(opt$summits) ) { + sample_count = dba.count(sample, summits=opt$summits) +} else { + sample_count = dba.count(sample) +} + +sample_contrast = dba.contrast(sample_count, categories=DBA_CONDITION, minMembers=2) +sample_analyze = dba.analyze(sample_contrast) +diff_bind = dba.report(sample_analyze, th=opt$th) + +# Generate plots +if ( !is.null(opt$plots) ) { + pdf(opt$plots) + orvals = dba.plotHeatmap(sample_analyze, contrast=1, correlations=FALSE, cexCol=0.8, th=opt$th) + dba.plotPCA(sample_analyze, contrast=1, th=opt$th, label=DBA_TISSUE, labelSize=0.3) + dba.plotMA(sample_analyze, th=opt$th) + dba.plotVolcano(sample_analyze, th=opt$th) + dba.plotBox(sample_analyze, th=opt$th) + dev.off() +} + +# Output differential binding sites +resSorted <- diff_bind[order(diff_bind$FDR),] +write.table(as.data.frame(resSorted), file = opt$outfile, sep="\t", quote = FALSE, append=TRUE, row.names = FALSE) + +# Output binding affinity scores +if (!is.null(opt$bmatrix)) { + bmat <- dba.peakset(sample_count, bRetrieve=TRUE, DataType=DBA_DATA_FRAME) + write.table(as.data.frame(bmat), file="bmatrix.tab", sep="\t", quote=FALSE, row.names=FALSE) +} + +# Output RData file +if (!is.null(opt$rdaOpt)) { + save.image(file = "DiffBind_analysis.RData") +} + +# Output analysis info +if (!is.null(opt$infoOpt)) { + info <- "DiffBind_analysis_info.txt" + cat("dba.count Info\n\n", file=info, append = TRUE) + capture.output(sample, file=info, append=TRUE) + cat("\ndba.analyze Info\n\n", file=info, append = TRUE) + capture.output(sample_analyze, file=info, append=TRUE) + cat("\nSessionInfo\n\n", file=info, append = TRUE) + capture.output(sessionInfo(), file=info, append=TRUE) +} \ No newline at end of file