# HG changeset patch
# User davidvanzessen
# Date 1478608374 18000
# Node ID 372ccdcf0b2d5d3c4bc78be2d3c5d5dad27fdde4
# Parent  3968d04b5724c822a040276b1895a8b6af9de516
Uploaded

diff -r 3968d04b5724 -r 372ccdcf0b2d merge_and_filter.r
--- a/merge_and_filter.r	Mon Nov 07 06:52:53 2016 -0500
+++ b/merge_and_filter.r	Tue Nov 08 07:32:54 2016 -0500
@@ -113,9 +113,6 @@
 	filtering.steps = rbind(filtering.steps, c("After empty CDR2, FR3 filter", nrow(result)))
 }
 
-print(paste("Number of sequences in result after CDR/FR filtering:", nrow(result)))
-print(paste("Number of sequences in result after CDR/FR filtering:", nrow(result[!grepl("unmatched", result$best_match),])))
-
 if(empty.region.filter == "leader"){
 	result = result[!(grepl("n|N", result$FR1.IMGT.seq) | grepl("n|N", result$FR2.IMGT.seq) | grepl("n|N", result$FR3.IMGT.seq) | grepl("n|N", result$CDR1.IMGT.seq) | grepl("n|N", result$CDR2.IMGT.seq) | grepl("n|N", result$CDR3.IMGT.seq)),]
 } else if(empty.region.filter == "FR1"){
@@ -175,10 +172,6 @@
 		result$unique.def = paste(result$CDR2.IMGT.seq, result$FR3.IMGT.seq, result$CDR3.IMGT.seq)
 	}
 	
-	if(grepl("_c", filter.unique)){
-		result$unique.def = paste(result$unique.def, result$best_match)
-	}
-
 	if(grepl("keep", filter.unique)){
 		result$unique.def = paste(result$unique.def, result$best_match) #keep the unique sequences that are in multiple classes
 		result = result[!duplicated(result$unique.def),]
diff -r 3968d04b5724 -r 372ccdcf0b2d shm_csr.r
--- a/shm_csr.r	Mon Nov 07 06:52:53 2016 -0500
+++ b/shm_csr.r	Tue Nov 08 07:32:54 2016 -0500
@@ -430,8 +430,8 @@
 p = ggplot(dat.clss, aes(best_match, percentage_mutations))
 p = p + geom_point(aes(colour=best_match), position="jitter") + geom_boxplot(aes(middle=mean(percentage_mutations)), alpha=0.1, outlier.shape = NA)
 p = p + xlab("Subclass") + ylab("Frequency") + ggtitle("Frequency scatter plot") + theme(panel.background = element_rect(fill = "white", colour="black"), text = element_text(size=16, colour="black"))
-p = p + scale_fill_manual(values=c("IGA" = "blue4", "IGA1" = "lightblue1", "IGA2" = "blue4", "IGG" = "olivedrab3", "IGG1" = "olivedrab3", "IGG2" = "red", "IGG3" = "gold", "IGG4" = "darkred", "IGM" = "darkviolet", "all" = "blue4"))
-p = p + scale_colour_manual(values=c("IGA" = "blue4", "IGA1" = "lightblue1", "IGA2" = "blue4", "IGG" = "olivedrab3", "IGG1" = "olivedrab3", "IGG2" = "red", "IGG3" = "gold", "IGG4" = "darkred", "IGM" = "darkviolet", "all" = "blue4"))
+p = p + scale_fill_manual(values=c("IGA" = "blue4", "IGA1" = "lightblue1", "IGA2" = "blue4", "IGG" = "olivedrab3", "IGG1" = "olivedrab3", "IGG2" = "red", "IGG3" = "gold", "IGG4" = "darkred", "IGM" = "darkviolet", "IGE" = "darkorange", "all" = "blue4"))
+p = p + scale_colour_manual(values=c("IGA" = "blue4", "IGA1" = "lightblue1", "IGA2" = "blue4", "IGG" = "olivedrab3", "IGG1" = "olivedrab3", "IGG2" = "red", "IGG3" = "gold", "IGG4" = "darkred", "IGM" = "darkviolet", "IGE" = "darkorange", "all" = "blue4"))
 
 png(filename="scatter.png")
 print(p)
@@ -455,7 +455,7 @@
 
 p = ggplot(frequency_bins_data, aes(frequency_bins, frequency))
 p = p + geom_bar(aes(fill=best_match_class), stat="identity", position="dodge") + theme(panel.background = element_rect(fill = "white", colour="black"), text = element_text(size=16, colour="black"))
-p = p + xlab("Frequency ranges") + ylab("Frequency") + ggtitle("Mutation Frequencies by class") + scale_fill_manual(values=c("IGA" = "blue4", "IGG" = "olivedrab3", "IGM" = "black", "all" = "blue4"))
+p = p + xlab("Frequency ranges") + ylab("Frequency") + ggtitle("Mutation Frequencies by class") + scale_fill_manual(values=c("IGA" = "blue4", "IGG" = "olivedrab3", "IGM" = "darkviolet", "IGE" = "darkorange", "all" = "blue4"))
 
 png(filename="frequency_ranges.png")
 print(p)