# HG changeset patch # User sblanck # Date 1618238829 0 # Node ID 3fcbb8030fcc153a2164d686e4f100cfbcf22eee # Parent 94fc6ed13946bc8a186d833239bad96521703741 "planemo upload for repository https://github.com/sblanck/MPAgenomics4Galaxy/tree/master/mpagenomics_wrappers commit 40eda5ea3551e8b3bae32d0a8f405fe90ef22646-dirty" diff -r 94fc6ed13946 -r 3fcbb8030fcc extractCN.R --- a/extractCN.R Tue Jun 16 04:34:09 2020 -0400 +++ b/extractCN.R Mon Apr 12 14:47:09 2021 +0000 @@ -3,14 +3,16 @@ options( show.error.messages=F, error = function () { cat( geterrmessage(), file=stderr() ); q( "no", 1, F ) } ) # we need that to not crash galaxy with an UTF8 error on German LC settings. -loc <- Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") +# loc <- Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") library("optparse") +library("zip") ##### Read options option_list=list( make_option("--chrom",type="character",default=NULL, dest="chrom"), make_option("--input",type="character",default=NULL, dest="input"), + make_option("--zip",type="character",default=NULL, dest="zip"), make_option("--output",type="character",default=NULL, dest="output"), make_option("--new_file_path",type="character",default=NULL, dest="new_file_path"), make_option("--settings_type",type="character",default=NULL, dest="settings_type"), @@ -35,6 +37,7 @@ chrom=opt$chrom input=opt$input +zip=opt$zip tmp_dir=opt$new_file_path output=opt$output settingsType=opt$settings_type @@ -47,9 +50,24 @@ user=opt$userid library(MPAgenomics) -workdir=file.path(tmp_dir, "mpagenomics",user) +library(aroma.affymetrix) +library(R.utils) +#workdir=file.path(tmp_dir, "mpagenomics",user) +tmp_dir +tmp_dir=file.path(tmp_dir) +if (!dir.exists(tmp_dir)) + dir.create(tmp_dir, showWarnings = TRUE, recursive = TRUE) +setwd(tmp_dir) +# tmpzip=file.copy(from = zip,to=paste0(workdir,"/tmp.zip")) +# tmpzip +unzip(zipfile = zip,exdir = ".") +# if (file.exists(tmpzip)) { +# #Delete file if it exists +# file.remove(fn) +# } + +workdir=file.path(tmp_dir,user) setwd(workdir) - inputDataset=read.table(file=input,stringsAsFactors=FALSE) dataset=inputDataset[1,2] @@ -75,7 +93,7 @@ CN=getCopyNumberSignal(dataset,chromosome=chrom_vec, onlySNP=snp) } else { - CN=getCopyNumberSignal(dataset,chromosome=chrom_vec, normalTumorArray=tumorcsv, onlySNP=snp) + CN=getCopyNumberSignal(dataset,chromosome=chrom_vec, normalTumorArray=tumorcsv, onlySNP=snp) } } else { input_tmp <- strsplit(settingsType,",") @@ -109,7 +127,9 @@ } symFracB_global=data.frame(check.names = FALSE) - + tumorFile=read.csv(tumorcsv,header=TRUE) + tumor=tumorFile$tumor + input_vecstring=input_vecstring[which(input_vecstring %in% tumor)] for (currentFile in input_vecstring) { cat(paste0("extracting signal from ",currentFile,".\n")) currentSymFracB=data.frame() @@ -124,7 +144,11 @@ if (is.null(symFracB_global) || nrow(symFracB_global)==0) { symFracB_global=currentSymFracB } else { - symFracB_global=cbind(symFracB_global,currentFile=currentSymFracB[[3]]) + #symFracB_global=cbind(symFracB_global,currentFile=currentSymFracB[[3]]) + + symFracB_global=merge(symFracB_global,currentSymFracB[,c(3,4)],by="featureNames") + symFracB_global=symFracB_global[c(2:ncol(symFracB_global),1)] + symFracB_global=symFracB_global[order(symFracB_global$chromosome, symFracB_global$position),] } } names(symFracB_global)[names(symFracB_global)=="featureNames"] <- "probeName" @@ -163,8 +187,11 @@ } +if (dir.exists(workdir)) + system(paste0("rm -r ", workdir)) + if (outputlog){ sink(type="output") sink(type="message") close(sinklog) -} \ No newline at end of file +} diff -r 94fc6ed13946 -r 3fcbb8030fcc extractCN.xml --- a/extractCN.xml Tue Jun 16 04:34:09 2020 -0400 +++ b/extractCN.xml Mon Apr 12 14:47:09 2021 +0000 @@ -1,14 +1,17 @@ - + copy number or allele B fraction signal - mpagenomics - + + sblanck/mpagenomicsdependencies + + + @@ -66,14 +61,14 @@ - + - + + @@ -109,7 +104,6 @@ - @@ -144,8 +138,8 @@ - - + + outputlog == "TRUE" diff -r 94fc6ed13946 -r 3fcbb8030fcc filter.R --- a/filter.R Tue Jun 16 04:34:09 2020 -0400 +++ b/filter.R Mon Apr 12 14:47:09 2021 +0000 @@ -15,6 +15,7 @@ make_option("--nbcall",type="character",default=NULL, dest="nbcall"), make_option("--length",type="character",default=NULL, dest="length"), make_option("--probes",type="character",default=NULL, dest="probes"), + make_option("--settings_signal",type="character",default=NULL, dest="settings_signal"), make_option("--outputlog",type="character",default=NULL, dest="outputlog"), make_option("--log",type="character",default=NULL, dest="log") ); @@ -35,6 +36,7 @@ nbcall=opt$nbcall length=as.numeric(opt$length) probes=as.numeric(opt$probes) +signal=opt$settings_signal log=opt$log outputlog=opt$outputlog @@ -47,14 +49,20 @@ nbcall_tmp <- strsplit(nbcall,",") nbcall_vecstring <-unlist(nbcall_tmp) -nbcall_vecstring - library(MPAgenomics) -workdir=file.path(tmp_dir, "mpagenomics") +workdir=file.path(tmp_dir) +if (!dir.exists(workdir)) + dir.create(workdir, showWarnings = TRUE, recursive = TRUE) setwd(workdir) segcall = read.table(input, header = TRUE) -filtercall=filterSeg(segcall,length,probes,nbcall_vecstring) +if (signal=="fracB") { + segcall=cbind(segcall,calls=rep("normal",nrow(segcall))) + filtercall=filterSeg(segcall,length,probes,nbcall_vecstring) + filtercall=filtercall[,1:(ncol(filtercall)-1)] +} else { + filtercall=filterSeg(segcall,length,probes,nbcall_vecstring) +} #sink(output) #print(format(filtercall),row.names=FALSE) #sink() diff -r 94fc6ed13946 -r 3fcbb8030fcc filter.xml --- a/filter.xml Tue Jun 16 04:34:09 2020 -0400 +++ b/filter.xml Mon Apr 12 14:47:09 2021 +0000 @@ -1,5 +1,7 @@ - - mpagenomics + + + sblanck/mpagenomicsdependencies + @@ -18,13 +26,23 @@ - - - - - - - + + + + + + + + + + + + + + + + + @@ -34,7 +52,7 @@ - + outputlog == "TRUE" diff -r 94fc6ed13946 -r 3fcbb8030fcc preprocess.R --- a/preprocess.R Tue Jun 16 04:34:09 2020 -0400 +++ b/preprocess.R Mon Apr 12 14:47:09 2021 +0000 @@ -3,7 +3,7 @@ options( show.error.messages=F, error = function () { cat( geterrmessage(), file=stderr() ); q( "no", 1, F ) } ) # we need that to not crash galaxy with an UTF8 error on German LC settings. -loc <- Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") +#loc <- Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") library("optparse") @@ -24,6 +24,7 @@ make_option("--settingsType",type="character",default=NULL, dest="settingsType"), make_option("--outputgraph",type="character",default=NULL, dest="outputgraph"), make_option("--zipfigures",type="character",default=NULL, dest="zipfigures"), + make_option("--zipresults",type="character",default=NULL, dest="zipresults"), make_option("--outputlog",type="character",default=NULL, dest="outputlog"), make_option("--log",type="character",default=NULL, dest="log"), make_option("--user_id",type="character",default=NULL, dest="user_id"), @@ -55,6 +56,7 @@ settingsType=opt$settingsType outputGraph=opt$outputgraph zipfigures=opt$zipfigures +zipresults=opt$zipresults outputlog=opt$outputlog log=opt$log userId=opt$user_id @@ -129,6 +131,9 @@ library(MPAgenomics) +library(R.utils) +library(aroma.affymetrix) + setwd(workdir) if (outputlog){ @@ -143,13 +148,27 @@ } else { signalPreProcess(dataSetName=dataset, chipType=chip, dataSetPath=celPath,chipFilesPath=chipPath, normalTumorArray=tumor, path=workdir,createArchitecture=createArchitecture, savePlot=outputgraph, tags=tag) } +setwd(mpagenomicsDir) +library(zip) +zipr(zipresults,files=".") setwd(abs_fig_dir) +#abs_fig_dir files2zip <- dir(abs_fig_dir) -zip(zipfile = "figures.zip", files = files2zip) -file.rename("figures.zip",zipfigures) +zipr(zipfigures, files = files2zip) + summarydf=data.frame(celFileNameList,rep(dataSetName,length(celFileNameList)),rep(chipType,length(celFileNameList))) write.table(summarydf,file=summary,quote=FALSE,row.names=FALSE,col.names=FALSE,sep="\t") +if (dir.exists(mpagenomicsDir)) { + system(paste0("rm -r ", mpagenomicsDir)) + dir.create(mpagenomicsDir, showWarnings = TRUE, recursive = TRUE) + } + +if (dir.exists(dataDir)) { + system(paste0("rm -r ", dataDir)) + dir.create(dataDir, showWarnings = TRUE, recursive = TRUE) + } + if (outputlog){ sink(type="output") sink(type="message") diff -r 94fc6ed13946 -r 3fcbb8030fcc preprocess.xml --- a/preprocess.xml Tue Jun 16 04:34:09 2020 -0400 +++ b/preprocess.xml Mon Apr 12 14:47:09 2021 +0000 @@ -1,15 +1,14 @@ - - - - mpagenomics - - + + + sblanck/mpagenomicsdependencies + + @@ -72,7 +72,8 @@ doesn't occur. --> - + + outputgraph == "TRUE" diff -r 94fc6ed13946 -r 3fcbb8030fcc segcall.R --- a/segcall.R Tue Jun 16 04:34:09 2020 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,124 +0,0 @@ -#!/usr/bin/env Rscript -# setup R error handling to go to stderr -options( show.error.messages=F, error = function () { cat( geterrmessage(), file=stderr() ); q( "no", 1, F ) } ) - -# we need that to not crash galaxy with an UTF8 error on German LC settings. -loc <- Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") - -library("optparse") - -##### Read options -option_list=list( - make_option("--chrom",type="character",default=NULL, dest="chrom"), - make_option("--input",type="character",default=NULL, dest="input"), - make_option("--output",type="character",default=NULL, dest="output"), - make_option("--new_file_path",type="character",default=NULL, dest="new_file_path"), - make_option("--nbcall",type="character",default=NULL, dest="nbcall"), - make_option("--settingsType",type="character",default=NULL, dest="settingsType"), - make_option("--outputgraph",type="character",default=NULL, dest="outputgraph"), - make_option("--snp",type="character",default=NULL, dest="snp"), - make_option("--zipfigures",type="character",default=NULL, dest="zipfigures"), - make_option("--settingsTypeTumor",type="character",default=NULL, dest="settingsTypeTumor"), - make_option("--cellularity",type="character",default=NULL, dest="cellularity"), - make_option("--outputlog",type="character",default=NULL, dest="outputlog"), - make_option("--log",type="character",default=NULL, dest="log"), - make_option("--userid",type="character",default=NULL, dest="userid"), - make_option("--method",type="character",default=NULL, dest="method") -); - -opt_parser = OptionParser(option_list=option_list); -opt = parse_args(opt_parser); - -if(is.null(opt$input)){ - print_help(opt_parser) - stop("input required.", call.=FALSE) -} - -#loading libraries - -chrom=opt$chrom -datasetFile=opt$input -output=opt$output -tmp_dir=opt$new_file_path -nbcall=as.numeric(opt$nbcall) -settingsType=opt$settingsType -outputfigures=type.convert(opt$outputgraph) -snp=type.convert(opt$snp) -tumorcsv=opt$settingsTypeTumor -cellularity=as.numeric(opt$cellularity) -user=opt$userid -method=opt$method -log=opt$log -outputlog=opt$outputlog -outputgraph=opt$outputgraph -zipfigures=opt$zipfigures - -library(MPAgenomics) -workdir=file.path(tmp_dir, "mpagenomics",user) -setwd(workdir) - -if (grepl("all",tolower(chrom)) | chrom=="None") { - chrom_vec=c(1:25) - } else { - chrom_tmp <- strsplit(chrom,",") - chrom_vecstring <-unlist(chrom_tmp) - chrom_vec <- as.numeric(chrom_vecstring) - } - - -if (outputlog){ - sinklog <- file(log, open = "wt") - sink(sinklog ,type = "output") - sink(sinklog, type = "message") -} - - -inputDataset=read.table(file=datasetFile,stringsAsFactors=FALSE) -dataset=inputDataset[1,2] - -fig_dir = file.path("mpagenomics", user, "figures", dataset, "segmentation","CN") -abs_fig_dir = file.path(tmp_dir, fig_dir) - -if (outputgraph) { - if (dir.exists(abs_fig_dir)) { - system(paste0("rm -r ", abs_fig_dir)) - } -} - -if (settingsType == 'dataset') { - if (tumorcsv== "none") - { - segcall=cnSegCallingProcess(dataset,chromosome=chrom_vec, nclass=nbcall, savePlot=outputfigures,onlySNP=snp, cellularity=cellularity, method=method) - } else { - segcall=cnSegCallingProcess(dataset,chromosome=chrom_vec, normalTumorArray=tumorcsv, nclass=nbcall, savePlot=outputfigures,onlySNP=snp, cellularity=cellularity, method=method) - } -} else { - input_tmp <- strsplit(settingsType,",") - input_tmp_vecstring <-unlist(input_tmp) - input_vecstring = sub("^([^.]*).*", "\\1", input_tmp_vecstring) - if (tumorcsv== "none") - { - segcall=cnSegCallingProcess(dataset,chromosome=chrom_vec, listOfFiles=input_vecstring, nclass=nbcall, savePlot=outputfigures, onlySNP=snp, cellularity=cellularity, method=method) - } else { - segcall=cnSegCallingProcess(dataset,chromosome=chrom_vec, normalTumorArray=tumorcsv, listOfFiles=input_vecstring, nclass=nbcall, savePlot=outputfigures, onlySNP=snp, cellularity=cellularity, method=method) - } -} - - -write.table(format(segcall),output,row.names = FALSE, quote=FALSE, sep = "\t") - - -if (outputgraph) { - setwd(abs_fig_dir) - files2zip <- dir(abs_fig_dir) - zip(zipfile = "figures.zip", files = files2zip) - file.rename("figures.zip",zipfigures) -} - -if (outputlog){ - sink(type="output") - sink(type="message") - close(sinklog) -} -#write.fwf(segcall,output,rownames = FALSE, quote=FALSE, sep = "\t") - diff -r 94fc6ed13946 -r 3fcbb8030fcc segcall.xml --- a/segcall.xml Tue Jun 16 04:34:09 2020 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,212 +0,0 @@ - - of the normalized data - mpagenomics - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - outputgraph == "TRUE" - - - outputlog == "TRUE" - - - - - - -.. class:: warningmark - -Data normalization must be run with the Data Normalization tool prior to segmentation. Otherwise, the standalone version can be used to perform marker selection from matrices containing data normalized with tools different from the one proposed in this instance. - - ------ - -**What it does** -This tool segments the previously normalized profiles and labels segments found in the copy-number profiles. Otherwise, the standalone version can be used to perform segmentation from matrices containing data normalized with tools different from the one proposed in this instance. - -Outputs: - -*A tabular text file containing 7 columns which describe all the segments (1 line per segment):* - - - sampleNames: Names of the original .CEL files. - - chrom: Chromosome of the segment. - - chromStart: Starting position (in bp) of the segment. This position is not included in the segment. - - chromEnd: Ending position (in bp) of the segment. This position is included in the segment. - - probes: Number of probes in the segment. - - means: Mean of the segment. - - calls: Calling of the segment (”double loss”, ”loss”, ”normal”, ”gain” or ”amplification”). - -*A .zip file containing all the figures (optionnal)* - ------ - -**Normal-tumor study** - -In cases where normal (control) samples match to tumor samples, they are taken as references to extract copy number profile. In this case, a normal-tumor csv file must be provided : - - - The first column contains the names of the files corresponding to normal samples of the dataset. - - - The second column contains the names of the tumor samples files. - - - Column names of these two columns are respectively normal and tumor. - - - Columns are separated by a comma. - - - *Extensions of the files (.CEL for example) should be removed* - - - -**Example** - -Let 6 .cel files in the studied dataset (3 patients, each of them being represented by a couple of normal and tumor cel files.) :: - - patient1_normal.cel - patient1_tumor.cel - patient2_normal.cel - patient2_tumor.cel - patient3_normal.cel - patient3_tumor.cel - - -The csv file should look like this :: - - normal,tumor - patient1_normal,patient1_tumor - patient2_normal,patient2_tumor - patient3_normal,patient3_tumor - ------ - - -**Citation** - -If you use this tool please cite : - -`Q. Grimonprez, A. Celisse, M. Cheok, M. Figeac, and G. Marot. MPAgenomics : An R package for multi-patients analysis of genomic markers, 2014. Preprint <http://fr.arxiv.org/abs/1401.5035>`_ - -As segmentation is performed with PELT, please also cite `R. Killick, P. Fearnhead, and I. A. Eckley. Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500):1590–1598, 2012. <http://arxiv.org/abs/1101.1438>`_ - -As segmentation is performed by cghseg, please cite `Picard, F., Robin, S., Lavielle, M., Vaisse, C., and Daudin, J.-J. (2005). A statistical approach for array CGH data analysis. BMC Bioinformatics, 6(1):27. <http://www.ncbi.nlm.nih.gov/pubmed/15705208>`_ , -and also cite Rigaill, G. (2010). `Pruned dynamic programming for optimal multiple change-point detection. <http://arxiv.org/abs/1004.0887>`_ - -When using the labels of the segments, please cite CGHCall `M. A. van de Wiel, K. I. Kim, S. J. Vosse, W. N. van Wieringen, S. M. Wilting, and B. Ylstra. CGHcall: calling aberrations for array CGH tumor profiles. Bioinformatics, 23(7):892–894, 2007. <http://bioinformatics.oxfordjournals.org/content/23/7/892.abstract>`_ - - - diff -r 94fc6ed13946 -r 3fcbb8030fcc segmentFracB.R --- a/segmentFracB.R Tue Jun 16 04:34:09 2020 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,144 +0,0 @@ -#!/usr/bin/env Rscript -# setup R error handling to go to stderr -options( show.error.messages=F, error = function () { cat( geterrmessage(), file=stderr() ); q( "no", 1, F ) } ) - -# we need that to not crash galaxy with an UTF8 error on German LC settings. -loc <- Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") - -library("optparse") - -##### Read options -option_list=list( - make_option("--chrom",type="character",default=NULL, dest="chrom"), - make_option("--input",type="character",default=NULL, dest="input"), - make_option("--output",type="character",default=NULL, dest="output"), - make_option("--new_file_path",type="character",default=NULL, dest="new_file_path"), - make_option("--settings_type",type="character",default=NULL, dest="settingsType"), - make_option("--output_graph",type="character",default=NULL, dest="outputgraph"), - make_option("--zip_figures",type="character",default=NULL, dest="zipfigures"), - make_option("--settings_tumor",type="character",default=NULL, dest="settingsTypeTumor"), - make_option("--outputlog",type="character",default=NULL, dest="outputlog"), - make_option("--log",type="character",default=NULL, dest="log"), - make_option("--userid",type="character",default=NULL, dest="userid"), - make_option("--method",type="character",default=NULL, dest="method") -); - -opt_parser = OptionParser(option_list=option_list); -opt = parse_args(opt_parser); - -if(is.null(opt$input)){ - print_help(opt_parser) - stop("input required.", call.=FALSE) -} - -#loading libraries - -args<-commandArgs(TRUE) - -chrom=opt$chrom -datasetFile=opt$input -output=opt$output -tmp_dir=opt$new_file_path -input=opt$settingsType -outputfigures=type.convert(opt$outputgraph) -tumorcsv=opt$settingsTypeTumor -user=opt$userid -method=opt$method -log=opt$log -outputlog=opt$outputlog -outputgraph=opt$outputgraph -zipfigures=opt$zipfigures - -#chrom=opt$chrom -#datasetFile=opt$input -#output=opt$output -#tmp_dir=opt$new_file_path -#nbcall=as.numeric(opt$nbcall) -#settingsType=opt$settingsType -#outputfigures=type.convert(opt$outputgraph) -#snp=type.convert(opt$snp) -#tumorcsv=opt$settingsTypeTumor -#cellularity=as.numeric(opt$cellularity) -#user=opt$userid -#method=opt$method -#log=opt$log -#outputlog=opt$outputlog -#outputgraph=opt$outputgraph -#zipfigures=opt$zipfigures - -library(MPAgenomics) -workdir=file.path(tmp_dir, "mpagenomics",user) -setwd(workdir) - -if (grepl("all",tolower(chrom)) | chrom=="None") { - chrom_vec=c(1:25) -} else { - chrom_tmp <- strsplit(chrom,",") - chrom_vecstring <-unlist(chrom_tmp) - chrom_vec <- as.numeric(chrom_vecstring) -} - - -if (outputlog){ - sinklog <- file(log, open = "wt") - sink(sinklog ,type = "output") - sink(sinklog, type = "message") -} - - -inputDataset=read.table(file=datasetFile,stringsAsFactors=FALSE) -dataset=inputDataset[1,2] - - - -library(MPAgenomics) -workdir=file.path(tmp_dir, "mpagenomics",user) -setwd(workdir) - -if (grepl("all",tolower(chrom)) | chrom=="None") { - chrom_vec=c(1:25) -} else { - chrom_tmp <- strsplit(chrom,",") - chrom_vecstring <-unlist(chrom_tmp) - chrom_vec <- as.numeric(chrom_vecstring) -} - -fig_dir = file.path("mpagenomics", user, "figures", dataset, "segmentation","fracB") -abs_fig_dir = file.path(tmp_dir, fig_dir) - -if (outputgraph) { - if (dir.exists(abs_fig_dir)) { - system(paste0("rm -r ", abs_fig_dir)) - } -} - -if (input == 'dataset') { - segcall=segFracBSignal(dataset,chromosome=chrom_vec, normalTumorArray=tumorcsv, savePlot=outputfigures, method=method) - -} else { - input_tmp <- strsplit(input,",") - input_tmp_vecstring <-unlist(input_tmp) - input_vecstring = sub("^([^.]*).*", "\\1", input_tmp_vecstring) - segcall=segFracBSignal(dataset,chromosome=chrom_vec, normalTumorArray=tumorcsv, listOfFiles=input_vecstring, savePlot=outputfigures, method=method) - -} -write.table(segcall,output,row.names = FALSE, quote=FALSE, sep = "\t") - -if (outputgraph) { - setwd(abs_fig_dir) - files2zip <- dir(abs_fig_dir) - zip(zipfile = "figures.zip", files = files2zip) - file.rename("figures.zip",zipfigures) -} - -if (outputlog){ - sink(type="output") - sink(type="message") - close(sinklog) -} - -#sink(output) -#print(format(segcall)) -#sink() - - diff -r 94fc6ed13946 -r 3fcbb8030fcc segmentFracB.xml --- a/segmentFracB.xml Tue Jun 16 04:34:09 2020 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,178 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - outputgraph == "TRUE" - - - outputlog == "TRUE" - - - - - - -.. class:: warningmark - -Data normalization must be run (with the data normalization tool) prior to segmentation. - ------ - -**What it does** -This tool segments allele B fraction extracted from the previously normalized data. This tools works only on normal-tumor study. - -Outputs: - -*A tabular text file containing 6 columns which describe all the segment (1 line per segment):* - - - sampleNames: Name of the file. - - chrom: The chromosome of the segment. - - chromStart: The starting position (in bp) of the segment. This position is not included in the segment. - - chromEnd: The ending position (in bp) of the segment. This position is included in the segment. - - probes: Number of probes in the segment. - - means: Mean of the segment. - -*A .zip file containing all the figures (optionnal)* - ------ - -**Normal-tumor csv files** - -Normal-tumor csv file is required to segment Allele B fraction, because naive genotyping is based on normal samples : - - - The first column contains the names of the files corresponding to normal samples of the dataset. - - - The second column contains the names of the tumor samples files. - - - Column names of these two columns are respectively normal and tumor. - - - Columns are separated by a comma. - - - *Extensions of the files (.CEL for example) should be removed* - - - -**Example** - -Let 6 .cel files in the studied dataset (3 patients, each of them being represented by a couple of normal and tumor cel files.) :: - - patient1_normal.cel - patient1_tumor.cel - patient2_normal.cel - patient2_tumor.cel - patient3_normal.cel - patient3_tumor.cel - - -The csv file should look like this :: - - normal,tumor - patient1_normal,patient1_tumor - patient2_normal,patient2_tumor - patient3_normal,patient3_tumor - ------ - - -**Citation** - -If you use this tool please cite : - -`Q. Grimonprez, A. Celisse, M. Cheok, M. Figeac, and G. Marot. MPAgenomics : An R package for multi-patients analysis of genomic markers, 2014. Preprint <http://fr.arxiv.org/abs/1401.5035>`_ - -If segmentation is performed with PELT, please cite `R. Killick, P. Fearnhead, and I. A. Eckley. Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500):1590–1598, 2012. <http://arxiv.org/abs/1101.1438>`_ - -If segmentation is performed by cghseg, please cite `Picard, F., Robin, S., Lavielle, M., Vaisse, C., and Daudin, J.-J. (2005). A statistical approach for array CGH data analysis. BMC Bioinformatics, 6(1):27. <http://www.ncbi.nlm.nih.gov/pubmed/15705208>`_ , -and also cite Rigaill, G. (2010). `Pruned dynamic programming for optimal multiple change-point detection. <http://arxiv.org/abs/1004.0887>`_ - - - \ No newline at end of file diff -r 94fc6ed13946 -r 3fcbb8030fcc segmentation.R --- a/segmentation.R Tue Jun 16 04:34:09 2020 -0400 +++ b/segmentation.R Mon Apr 12 14:47:09 2021 +0000 @@ -3,7 +3,7 @@ options( show.error.messages=F, error = function () { cat( geterrmessage(), file=stderr() ); q( "no", 1, F ) } ) # we need that to not crash galaxy with an UTF8 error on German LC settings. -loc <- Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") +# loc <- Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") library("optparse") @@ -58,9 +58,13 @@ #signalType=args[8] library(MPAgenomics) -workdir=file.path(tmp_dir,"mpagenomics",userId) +workdir=file.path(tmp_dir) +if (!dir.exists(workdir)) + dir.create(workdir, showWarnings = TRUE, recursive = TRUE) setwd(workdir) +workdir + if (outputlog){ sinklog <- file(log, open = "wt") sink(sinklog ,type = "output") @@ -92,6 +96,21 @@ callobj= callingObject(copynumber=currentSeg$signal, segmented=currentSeg$segmented,chromosome=rep(chr,length(currentSeg$signal)), position=currentSeg$startPos,sampleNames=sample) currentCall=callingProcess(callobj,nclass=nbcall,cellularity=cellularity,verbose=TRUE) currentResult=currentCall$segment + if(outputgraph) + { + + currentPos=unlist(currentPositions) + figName <- sprintf("%s,%s", sample, chr); + pathname <- file.path(sprintf("%s.png", figName)); + png(filename = pathname, width = 1280, height = 480) + plot(NA,xlim=c(min(currentPos),max(currentPos)), ylim=c(0,6),xlab="Position", main=figName,ylab="CN", pch=".") + points(currentPos, unlist(currentSignal), pch="."); + for(i in 1:nrow(currentResult)) + lines(c(currentResult$chromStart[i],currentResult$chromEnd[i]),rep(currentResult$means[i],2),col="red",lwd=3) + dev.off() + + } + currentResult["sampleNames"]=c(rep(sample,length(currentCall$segment$chrom))) result=rbind(result,currentResult) } @@ -119,17 +138,38 @@ currentResult["chrom"]=c(rep(chr,length(currentSeg$segment$means))) currentResult["sampleNames"]=c(rep(sample,length(currentSeg$segment$means))) result=rbind(result,currentResult) - + if(outputgraph) + { + + currentPos=unlist(currentPositions) + figName <- sprintf("%s,%s", sample, chr); + pathname <- file.path(sprintf("%s.png", figName)); + png(filename = pathname, width = 1280, height = 480) + plot(NA,xlim=c(min(currentPos),max(currentPos)), ylim=c(0,1),xlab="Position", main=figName,ylab="CN", pch=".") + points(currentPos, unlist(currentSignal), pch="."); + print(currentResult) + for(i in 1:nrow(currentResult)) + lines(c(currentResult$start[i],currentResult$end[i]),rep(currentResult$means[i],2),col="red",lwd=3) + dev.off() + + } + + + } cat(paste0("OK\n")) } } finalResult=data.frame(sampleNames=result["sampleNames"],chrom=result["chrom"],chromStart=result["start"],chromEnd=result["end"],probes=result["points"],means=result["means"],stringsAsFactors=FALSE) + colnames(finalResult)=c("sampleNames","chrom","chromStart","chromEnd","probes","means") write.table(finalResult,output,row.names = FALSE, quote=FALSE, sep = "\t") } if (outputgraph){ - file.rename(file.path(tmp_dir,"mpagenomics",userId,"Rplots.pdf"), graph) + library(zip) + files2zip <- dir(pattern=".png") + zipr(graph, files = files2zip) + } if (outputlog){ diff -r 94fc6ed13946 -r 3fcbb8030fcc segmentation.xml --- a/segmentation.xml Tue Jun 16 04:34:09 2020 -0400 +++ b/segmentation.xml Mon Apr 12 14:47:09 2021 +0000 @@ -1,6 +1,8 @@ - + of a previously normalized signal - mpagenomics + + sblanck/mpagenomicsdependencies + outputlog == "TRUE" - + outputgraph == "TRUE" diff -r 94fc6ed13946 -r 3fcbb8030fcc selection.R --- a/selection.R Tue Jun 16 04:34:09 2020 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,135 +0,0 @@ -#!/usr/bin/env Rscript -# setup R error handling to go to stderr -options( show.error.messages=F, error = function () { cat( geterrmessage(), file=stderr() ); q( "no", 1, F ) } ) - -# we need that to not crash galaxy with an UTF8 error on German LC settings. -loc <- Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") - -library("optparse") - -##### Read options -option_list=list( - make_option("--chrom",type="character",default=NULL, dest="chrom"), - make_option("--input",type="character",default=NULL, dest="input"), - make_option("--output",type="character",default=NULL, dest="output"), - make_option("--new_file_path",type="character",default=NULL, dest="new_file_path"), - make_option("--response",type="character",default=NULL, dest="response"), - make_option("--settingsType",type="character",default=NULL, dest="settingsType"), - make_option("--outputgraph",type="character",default=NULL, dest="outputgraph"), - make_option("--settingsSnp",type="character",default=NULL, dest="settingsSnp"), - make_option("--settingsSignal",type="character",default=NULL, dest="settingsSignal"), - make_option("--settingsLoss",type="character",default=NULL, dest="settingsLoss"), - make_option("--pdffigures",type="character",default=NULL, dest="pdffigures"), - make_option("--folds",type="character",default=NULL, dest="folds"), - make_option("--outputlog",type="character",default=NULL, dest="outputlog"), - make_option("--log",type="character",default=NULL, dest="log"), - make_option("--userId",type="character",default=NULL, dest="userid"), - make_option("--settingsPackage",type="character",default=NULL, dest="settingsPackage") -); - -opt_parser = OptionParser(option_list=option_list); -opt = parse_args(opt_parser); - -if(is.null(opt$input)){ - print_help(opt_parser) - stop("input required.", call.=FALSE) -} - -#loading libraries - - -chrom=opt$chrom -dataset=opt$input -dataResponse=opt$response -output=opt$output -tmp_dir=opt$new_file_path -signal=opt$settingsSignal -settingsType=opt$settingsType -outputfigures=type.convert(opt$outputgraph) -snp=type.convert(opt$settingsSnp) -user=opt$userid -folds=as.numeric(opt$folds) -loss=opt$settingsLoss -log=opt$log -outputlog=opt$outputlog -outputgraph=opt$outputgraph -pdffigures=opt$pdffigures -package=opt$settingsPackage - - -library(MPAgenomics) -library(glmnet) -library(spikeslab) -library(lars) - -inputDataset=read.table(file=dataset,stringsAsFactors=FALSE) -input=inputDataset[1,2] -workdir=file.path(tmp_dir, "mpagenomics",user) -print(workdir) -setwd(workdir) - -if (grepl("all",tolower(chrom)) | chrom=="None") { - chrom_vec=c(1:25) - } else { - chrom_tmp <- strsplit(chrom,",") - chrom_vecstring <-unlist(chrom_tmp) - chrom_vec <- as.numeric(chrom_vecstring) - } - -if (outputlog){ - sinklog <- file(log, open = "wt") - sink(sinklog ,type = "output") - sink(sinklog, type = "message") -} - -if (settingsType == "tumor") { - if (signal=="CN") { - res=markerSelection(input,dataResponse, chromosome=chrom_vec, signal=signal, normalTumorArray=tumor, onlySNP=snp, loss=loss, plot=outputfigures, nbFolds=folds, pkg=package) - } else { - res=markerSelection(input,dataResponse, chromosome=chrom_vec,signal=signal,normalTumorArray=tumor, loss=loss, plot=outputfigures, nbFolds=folds,pkg=package) - } -} else { - if (signal=="CN") { - res=markerSelection(input,dataResponse, chromosome=chrom_vec, signal=signal, onlySNP=snp, loss=loss, plot=outputfigures, nbFolds=folds,pkg=package) - } else { - res=markerSelection(input,dataResponse, chromosome=chrom_vec, signal=signal, loss=loss, plot=outputfigures, nbFolds=folds,pkg=package) - } -} - -res - -df=data.frame() -list_chr=names(res) -markerSelected=FALSE - -for (i in list_chr) { - chr_data=res[[i]] - len=length(chr_data$markers.index) - if (len != 0) - { - markerSelected=TRUE - chrdf=data.frame(rep(i,len),chr_data$markers.position,chr_data$markers.index,chr_data$markers.names,chr_data$coefficient) - df=rbind(df,chrdf) - } -} - -if (outputgraph){ - file.rename(file.path(tmp_dir,"mpagenomics",user,"Rplots.pdf"), pdffigures) -} - -if (outputlog){ - sink(type="output") - sink(type="message") - close(sinklog) -} - -if (markerSelected) { - colnames(df) <- c("chr","position","index","names","coefficient") - #sink(output) - #print(format(df),row.names=FALSE) - #sink() - write.table(df,output,row.names = FALSE, quote = FALSE, sep = "\t") -} else - writeLines("no SNP selected", output) - - diff -r 94fc6ed13946 -r 3fcbb8030fcc selection.xml --- a/selection.xml Tue Jun 16 04:34:09 2020 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,235 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - outputgraph == "TRUE" - (settingsLoss['package'] != 'spikeslab') - - - outputlog == "TRUE" - - - - - - -.. class:: warningmark - -Data normalization must be run with the Data Normalization tool prior to SNPs selection. Otherwise, the standalone version can be used to perform marker selection from matrices containing data normalized with tools different from the one proposed in this instance. - ------ - -**What it does** - -This tool selects some relevant markers according to a response using penalized regressions. - -Output: - -A tabular text file containing 5 columns which describe all the selected SNPs (1 line per SNPs): - - - chr: Chromosome containing the selected SNP. - - position: Position of the selected SNP. - - index: Index of the selected SNP. - - names: Name of the selected SNP. - - coefficient: Regression coefficient of the selected SNP. - ------ - -**Data Response csv file** - -Data response csv file format: - - - The first column contains the names of the different files of the data-set. - - - The second column contains the response associated with each file. - - - Column names of these two columns are respectively files and response. - - - Columns are separated by a comma - - - *Extensions of the files (.CEL for example) should be removed* - - - -**Example** - -Let 3 .cel files in the studied dataset :: - - patient1.cel - patient2.cel - patient3.cel - -The csv file should look like this :: - - files,response - patient1,1.92145 - patient2,2.12481 - patient3,1.23545 - - ------ - -**Normal-tumor study** - -In cases where normal (control) samples match to tumor samples, they are taken as references to extract copy number profile. In this case, a normal-tumor csv file must be provided : - - - The first column contains the names of the files corresponding to normal samples of the dataset. - - - The second column contains the names of the tumor samples files. - - - Column names of these two columns are respectively normal and tumor. - - - Columns are separated by a comma. - - - *Extensions of the files (.CEL for example) should be removed* - - -**Example** - -Let 6 .cel files in the studied dataset (3 patients, each of them being represented by a couple of normal and tumor cel file.) :: - - patient1_normal.cel - patient1_tumor.cel - patient2_normal.cel - patient2_tumor.cel - patient3_normal.cel - patient3_tumor.cel - - -The csv file should look like this :: - - normal,tumor - patient1_normal,patient1_tumor - patient2_normal,patient2_tumor - patient3_normal,patient3_tumor - ------ - - - -**Citation** - -If you use this tool please cite : - -`Q. Grimonprez, A. Celisse, M. Cheok, M. Figeac, and G. Marot. MPAgenomics : An R package for multi-patients analysis of genomic markers, 2014. Preprint <http://fr.arxiv.org/abs/1401.5035>`_ - - - diff -r 94fc6ed13946 -r 3fcbb8030fcc selectionExtracted.R --- a/selectionExtracted.R Tue Jun 16 04:34:09 2020 -0400 +++ b/selectionExtracted.R Mon Apr 12 14:47:09 2021 +0000 @@ -50,7 +50,9 @@ #output=args[6] library(MPAgenomics) -workdir=file.path(tmp_dir, "mpagenomics") +workdir=file.path(tmp_dir) +if (!dir.exists(workdir)) + dir.create(workdir, showWarnings = TRUE, recursive = TRUE) setwd(workdir) if (outputlog){ @@ -70,7 +72,7 @@ index = match(listOfFile,rownames(CNsignalMatrix)) responseValueOrder=responseValue[index] -result=variableSelection(CNsignalMatrix,responseValueOrder,nbFolds=nbFolds,loss=loss,plot=TRUE) +result=variableSelection(CNsignalMatrix,responseValueOrder,nbFolds=nbFolds,loss=loss,plot=FALSE) CNsignalResult=CN[result$markers.index,(names(CN)%in% drops)] diff -r 94fc6ed13946 -r 3fcbb8030fcc selectionExtracted.xml --- a/selectionExtracted.xml Tue Jun 16 04:34:09 2020 -0400 +++ b/selectionExtracted.xml Mon Apr 12 14:47:09 2021 +0000 @@ -1,4 +1,7 @@ - + + + sblanck/mpagenomicsdependencies + of previously extracted signal