# HG changeset patch
# User ufz
# Date 1753182564 0
# Node ID 3a7f73d638ba5b651438b2e9b88833a336bb8e6a
# Parent 315c2ed31af1d35b3a6af913eac8d1bff4b3b1ca
planemo upload for repository https://github.com/Helmholtz-UFZ/ufz-galaxy-tools/blob/main/tools/phi-toolkit commit 368e8a7322c9763c648637263d4695abc146be13
diff -r 315c2ed31af1 -r 3a7f73d638ba logo.jpg
Binary file logo.jpg has changed
diff -r 315c2ed31af1 -r 3a7f73d638ba phitk.xml
--- a/phitk.xml Wed Jun 04 17:36:40 2025 +0000
+++ b/phitk.xml Tue Jul 22 11:09:24 2025 +0000
@@ -1,47 +1,48 @@
+ - 0.1.0 + 0.2.0 0 - - + + - - + + - + - - - - - - - - - - - - - + + + + + + + + + + + + + - - - - - - - - - - - + + + + + + + + + + +
@@ -62,10 +63,15 @@
 r-base64 r-pdftools + /dev/null) + ]]> &2 echo "debug.log:" + && >&2 cat debug.log ]]>
@@ -163,7 +181,6 @@
-
@@ -200,13 +217,64 @@
- + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
@@ -244,4 +312,4 @@
-
\ No newline at end of file
+
diff -r 315c2ed31af1 -r 3a7f73d638ba report.Rmd
--- a/report.Rmd Wed Jun 04 17:36:40 2025 +0000
+++ b/report.Rmd Tue Jul 22 11:09:24 2025 +0000
@@ -25,17 +25,21 @@
 ```{r setup_env, include=FALSE}
 knitr::opts_chunk$set(echo = FALSE)
+knitr::opts_chunk$set(dev = "svglite") # set output device to svg
 cat("params$outdir:", params$outdir, "\n")
 ```
 ```{r setup_libraries, message=FALSE, warning=FALSE, echo=FALSE, results='asis'}
 # Define required packages
-required_packages <- c("tidyverse", "janitor", "here",
- "kableExtra", "gmoviz", "circlize",
- "GenomicRanges", "patchwork", "fs",
- "tools", "scales", "formattable",
- "pdftools", "base64")
+# Note: update version_command if changed here!
+required_packages <- c( + "tidyverse", "janitor", "here", + "kableExtra", "gmoviz", "circlize", + "GenomicRanges", "patchwork", "fs", + "tools", "scales", "formattable", + "pdftools", "base64" +) # Load required packages invisible(lapply(required_packages, library, character.only = TRUE)) @@ -45,180 +49,185 @@ log_file <- "debug.log" log_debug <- function(message) { - if (!exists("log_initialized") || !log_initialized) { - cat(paste0(Sys.time(), " - DEBUG: ", message, "\n"), file = log_file, append = FALSE) - assign("log_initialized", TRUE, envir = .GlobalEnv) - } else { - cat(paste0(Sys.time(), " - DEBUG: ", message, "\n"), file = log_file, append = TRUE) - } + if (!exists("log_initialized") || !log_initialized) { + cat(paste0(Sys.time(), " - DEBUG: ", message, "\n"), file = log_file, append = FALSE) + assign("log_initialized", TRUE, envir = .GlobalEnv) + } else { + cat(paste0(Sys.time(), " - DEBUG: ", message, "\n"), file = log_file, append = TRUE) + } } load_file <- function(path) { - log_debug(paste("Attempting to load:", path)) - if (file.exists(path)) { - ext <- tools::file_ext(path) - if (ext %in% c("tsv", "csv")) { - data <- read_delim(path, delim = ifelse(ext == "csv", ",", "\t"), show_col_types = FALSE) %>% clean_names - log_debug(paste("Loaded", nrow(data), "rows from", path)) - data - } else if (ext == "fna") { - data <- Biostrings::readDNAStringSet(path) - log_debug(paste("Loaded", length(data), "sequences from", path)) - data + log_debug(paste("Attempting to load:", path)) + if (file.exists(path)) { + ext <- tools::file_ext(path) + if (ext %in% c("tsv", "csv")) { + data <- read_delim(path, delim = ifelse(ext == "csv", ",", "\t"), show_col_types = FALSE) %>% janitor::clean_names() + log_debug(paste("Loaded", nrow(data), "rows from", path)) + data + } else if (ext == "fna") { + data <- Biostrings::readDNAStringSet(path) + log_debug(paste("Loaded", length(data), "sequences from", path)) + data + } else { + log_debug(paste("Skipping", path, "- 
unsupported file type")) + NULL + } } else { - log_debug(paste("Skipping", path, "- unsupported file type")) - NULL + log_debug(paste("File does not exist:", path)) + NULL } - } else { - log_debug(paste("File does not exist:", path)) - NULL - } } get_file_info <- function(path, loaded_data) { - log_debug(paste("Processing file info for:", path)) - if (file.exists(path)) { - ext <- tools::file_ext(path) - if (ext %in% c("tsv", "csv", "fna")) { - data <- loaded_data[[basename(path)]] - rows <- if(ext == "fna") length(data) else nrow(data) - tibble(exists = TRUE, rows = rows, size = file.size(path), path = path) + log_debug(paste("Processing file info for:", path)) + if (file.exists(path)) { + ext <- tools::file_ext(path) + if (ext %in% c("tsv", "csv", "fna")) { + data <- loaded_data[[basename(path)]] + rows <- if (ext == "fna") length(data) else nrow(data) + tibble::tibble(exists = TRUE, rows = rows, size = file.size(path), path = path) + } else { + tibble::tibble(exists = TRUE, rows = NA_integer_, size = file.size(path), path = path) + } } else { - tibble(exists = TRUE, rows = NA_integer_, size = file.size(path), path = path) + tibble::tibble(exists = FALSE, rows = NA_integer_, size = NA_real_, path = NA_character_) } - } else { - tibble(exists = FALSE, rows = NA_integer_, size = NA_real_, path = NA_character_) - } } process_genome_folder <- function(folder, host_analyses_dir, virus_analyses_dir) { - log_debug(paste("Processing folder:", folder)) - genome_name <- basename(folder) + log_debug(paste("Processing folder:", folder)) + genome_name <- basename(folder) - paths <- list( - genomad = file.path(host_analyses_dir, "genomad", genome_name, paste0(genome_name, "_summary"), paste0(genome_name, "_virus_summary.tsv")), - genomad_phages = file.path(host_analyses_dir, "genomad", genome_name, paste0(genome_name, "_summary"), paste0(genome_name, "_virus.fna")), - genomad_annotations = file.path(host_analyses_dir, "genomad", genome_name, paste0(genome_name, "_summary"), 
paste0(genome_name, "_virus_genes.tsv")), - defense_finder = file.path(host_analyses_dir, "defense-finder", genome_name, paste0(genome_name, "_defense_finder_systems.tsv")), - checkv = file.path(virus_analyses_dir, "checkv", genome_name, "quality_summary.tsv"), - iphop = file.path(virus_analyses_dir, "iphop", genome_name, "Host_prediction_to_genome_m90.csv"), - drep = file.path(virus_analyses_dir, "drep_compare", genome_name, "data_tables", "Cdb.csv"), - phatyp = file.path(virus_analyses_dir, "phatyp", genome_name, "phatyp.csv"), - abricate = file.path(virus_analyses_dir, "abricate", genome_name, paste0(genome_name, "_virus_vfdb.tsv")), - vibrant = file.path(virus_analyses_dir, "vibrant", genome_name, - paste0("VIBRANT_", genome_name, "_virus"), - paste0("VIBRANT_results_", genome_name, "_virus"), - paste0("VIBRANT_AMG_individuals_", genome_name, "_virus.tsv")) - ) + paths <- list( + genomad = file.path(host_analyses_dir, "genomad", genome_name, paste0(genome_name, "_summary"), paste0(genome_name, "_virus_summary.tsv")), + genomad_phages = file.path(host_analyses_dir, "genomad", genome_name, paste0(genome_name, "_summary"), paste0(genome_name, "_virus.fna")), + genomad_annotations = file.path(host_analyses_dir, "genomad", genome_name, paste0(genome_name, "_summary"), paste0(genome_name, "_virus_genes.tsv")), + defense_finder = file.path(host_analyses_dir, "defense-finder", genome_name, paste0(genome_name, "_defense_finder_systems.tsv")), + checkv = file.path(virus_analyses_dir, "checkv", genome_name, "quality_summary.tsv"), + iphop = file.path(virus_analyses_dir, "iphop", genome_name, "Host_prediction_to_genome_m90.csv"), + drep = file.path(virus_analyses_dir, "drep_compare", genome_name, "data_tables", "Cdb.csv"), + phatyp = file.path(virus_analyses_dir, "phatyp", genome_name, "phatyp.csv"), + abricate = file.path(virus_analyses_dir, "abricate", genome_name, paste0(genome_name, "_virus_vfdb.tsv")), + vibrant = file.path( + virus_analyses_dir, "vibrant", 
genome_name, + paste0("VIBRANT_", genome_name, "_virus"), + paste0("VIBRANT_results_", genome_name, "_virus"), + paste0("VIBRANT_AMG_individuals_", genome_name, "_virus.tsv") + ) + ) - loaded_data <- map(paths, load_file) - file_info <- map_dfr(paths, ~get_file_info(.x, loaded_data), .id = "file_type") + loaded_data <- map(paths, load_file) + file_info <- purrr::map_dfr(paths, ~ get_file_info(.x, loaded_data), .id = "file_type") - virus_count <- if(!is.null(loaded_data$genomad)) { - count <- sum(loaded_data$genomad$virus_score > 0.5, na.rm = TRUE) - log_debug(paste("Virus count:", count)) - count - } else { - log_debug("No genomad summary found, virus count set to 0") - 0 - } + virus_count <- if (!is.null(loaded_data$genomad)) { + count <- sum(loaded_data$genomad$virus_score > 0.5, na.rm = TRUE) + log_debug(paste("Virus count:", count)) + count + } else { + log_debug("No genomad summary found, virus count set to 0") + 0 + } - log_debug("Returning results from process_genome_folder") - list(file_info = file_info, virus_count = virus_count, loaded_data = loaded_data) + log_debug("Returning results from process_genome_folder") + list(file_info = file_info, virus_count = virus_count, loaded_data = loaded_data) } ``` ```{r compile_results, message=FALSE, warning=FALSE, echo=FALSE} compile_results <- function() { - base_dir <- params$outdir - log_debug(paste("Base directory:", base_dir)) - - host_analyses_dir <- file.path(base_dir, "host_analyses") - virus_analyses_dir <- file.path(base_dir, "virus_analyses") + base_dir <- params$outdir + log_debug(paste("Base directory:", base_dir)) - # List all sample-level directories from all tools under virus_analyses - tool_dirs <- list.dirs(virus_analyses_dir, full.names = TRUE, recursive = FALSE) - - genome_folders <- list.dirs(file.path(base_dir, "host_analyses", "genomad"), - full.names = TRUE, recursive = FALSE) - - # cat(length(genome_folders), "sample(s) processed\n") - - log_debug("Processing genome folders") - genome_data 
<- map(genome_folders, process_genome_folder, - host_analyses_dir = host_analyses_dir, - virus_analyses_dir = virus_analyses_dir) %>% - set_names(basename(genome_folders)) %>% - compact() + host_analyses_dir <- file.path(base_dir, "host_analyses") + virus_analyses_dir <- file.path(base_dir, "virus_analyses") - log_debug("Creating summary dataframe") - summary_df <- map_dfr(genome_data, ~{ - file_info <- .x$file_info - tibble( - Sample = basename(file_info$path[1]), - Virus_Count = .x$virus_count, - geNomad = file_info$exists[file_info$file_type == "genomad"], - CheckV = file_info$exists[file_info$file_type == "checkv"], - VIBRANT = file_info$exists[file_info$file_type == "vibrant"], - dRep = file_info$exists[file_info$file_type == "drep"], - iPHOP = file_info$exists[file_info$file_type == "iphop"], - PhaTYP = file_info$exists[file_info$file_type == "phatyp"], - Defense_Finder = file_info$exists[file_info$file_type == "defense_finder"], - geNomad_Path = file_info$path[file_info$file_type == "genomad"], - CheckV_Path = file_info$path[file_info$file_type == "checkv"], - VIBRANT_Path = file_info$path[file_info$file_type == "vibrant"], - dRep_Path = file_info$path[file_info$file_type == "drep"], - PhaTYP_Path = file_info$path[file_info$file_type == "phatyp"], - Defense_Finder_Path = file_info$path[file_info$file_type == "defense_finder"], - Virus_Contigs = ifelse(file_info$exists[file_info$file_type == "genomad_phages"], - file_info$rows[file_info$file_type == "genomad_phages"], - 0) + # List all sample-level directories from all tools under virus_analyses + tool_dirs <- list.dirs(virus_analyses_dir, full.names = TRUE, recursive = FALSE) + + genome_folders <- list.dirs(file.path(base_dir, "host_analyses", "genomad"), + full.names = TRUE, recursive = FALSE ) - }) %>% - mutate(across(ends_with("_Path"), ~ifelse(is.na(.), "Not available", as.character(.)))) + + # cat(length(genome_folders), "sample(s) processed\n") + + log_debug("Processing genome folders") + genome_data 
<- map(genome_folders, process_genome_folder, + host_analyses_dir = host_analyses_dir, + virus_analyses_dir = virus_analyses_dir + ) %>% + purrr::set_names(basename(genome_folders)) %>% + compact() - host_genomes_fasta <- list.files( - path = file.path(params$outdir, "genomes"), - pattern = "\\.fna$", - full.names = TRUE - ) - - host_genomes_paths <- tibble( - name = tools::file_path_sans_ext(basename(host_genomes_fasta)), - path = host_genomes_fasta - ) - - data_gtdbtk_host <- read_tsv( - file.path(params$outdir, "host_analyses/gtdbtk/gtdbtk.bac120.summary.tsv"), - show_col_types = FALSE - ) %>% clean_names() + log_debug("Creating summary dataframe") + summary_df <- purrr::map_dfr(genome_data, ~ { + file_info <- .x$file_info + tibble::tibble( + Sample = basename(file_info$path[1]), + Virus_Count = .x$virus_count, + geNomad = file_info$exists[file_info$file_type == "genomad"], + CheckV = file_info$exists[file_info$file_type == "checkv"], + VIBRANT = file_info$exists[file_info$file_type == "vibrant"], + dRep = file_info$exists[file_info$file_type == "drep"], + iPHOP = file_info$exists[file_info$file_type == "iphop"], + PhaTYP = file_info$exists[file_info$file_type == "phatyp"], + Defense_Finder = file_info$exists[file_info$file_type == "defense_finder"], + geNomad_Path = file_info$path[file_info$file_type == "genomad"], + CheckV_Path = file_info$path[file_info$file_type == "checkv"], + VIBRANT_Path = file_info$path[file_info$file_type == "vibrant"], + dRep_Path = file_info$path[file_info$file_type == "drep"], + PhaTYP_Path = file_info$path[file_info$file_type == "phatyp"], + Defense_Finder_Path = file_info$path[file_info$file_type == "defense_finder"], + Virus_Contigs = ifelse(file_info$exists[file_info$file_type == "genomad_phages"], + file_info$rows[file_info$file_type == "genomad_phages"], + 0 + ) + ) + }) %>% + dplyr::mutate(dplyr::across(ends_with("_Path"), ~ ifelse(is.na(.), "Not available", as.character(.)))) - data_checkm_host <- read_tsv( - 
file.path(params$outdir, "host_analyses/checkm2/quality_report.tsv"), - show_col_types = FALSE - ) %>% clean_names() + host_genomes_fasta <- list.files( + path = file.path(params$outdir, "genomes"), + pattern = "\\.fna$", + full.names = TRUE + ) - log_debug("Returning summary dataframe, genome data, and host data") + host_genomes_paths <- tibble::tibble( + name = tools::file_path_sans_ext(basename(host_genomes_fasta)), + path = host_genomes_fasta + ) + + data_gtdbtk_host <- read_tsv( + file.path(params$outdir, "host_analyses/gtdbtk/gtdbtk.bac120.summary.tsv"), + show_col_types = FALSE + ) %>% janitor::clean_names() + + data_checkm_host <- read_tsv( + file.path(params$outdir, "host_analyses/checkm2/quality_report.tsv"), + show_col_types = FALSE + ) %>% janitor::clean_names() - log_debug(paste("summary_df dimensions:", nrow(summary_df), "rows,", ncol(summary_df), "columns")) - log_debug(paste("summary_df column names:", paste(colnames(summary_df), collapse = ", "))) - log_debug(paste("genome_data length:", length(genome_data))) - log_debug(paste("genome_data names:", paste(names(genome_data), collapse = ", "))) - log_debug(paste("host_genomes_paths dimensions:", nrow(host_genomes_paths), "rows,", ncol(host_genomes_paths), "columns")) - log_debug(paste("host_genomes_paths column names:", paste(colnames(host_genomes_paths), collapse = ", "))) - log_debug(paste("data_gtdbtk_host dimensions:", nrow(data_gtdbtk_host), "rows,", ncol(data_gtdbtk_host), "columns")) - log_debug(paste("data_gtdbtk_host column names:", paste(colnames(data_gtdbtk_host), collapse = ", "))) - log_debug(paste("data_checkm_host dimensions:", nrow(data_checkm_host), "rows,", ncol(data_checkm_host), "columns")) - log_debug(paste("data_checkm_host column names:", paste(colnames(data_checkm_host), collapse = ", "))) - - list( - summary = summary_df, - genome_data = genome_data, - host_genomes_paths = host_genomes_paths, - data_gtdbtk_host = data_gtdbtk_host, - data_checkm_host = data_checkm_host - ) + 
log_debug("Returning summary dataframe, genome data, and host data") + + log_debug(paste("summary_df dimensions:", nrow(summary_df), "rows,", ncol(summary_df), "columns")) + log_debug(paste("summary_df column names:", paste(colnames(summary_df), collapse = ", "))) + log_debug(paste("genome_data length:", length(genome_data))) + log_debug(paste("genome_data names:", paste(names(genome_data), collapse = ", "))) + log_debug(paste("host_genomes_paths dimensions:", nrow(host_genomes_paths), "rows,", ncol(host_genomes_paths), "columns")) + log_debug(paste("host_genomes_paths column names:", paste(colnames(host_genomes_paths), collapse = ", "))) + log_debug(paste("data_gtdbtk_host dimensions:", nrow(data_gtdbtk_host), "rows,", ncol(data_gtdbtk_host), "columns")) + log_debug(paste("data_gtdbtk_host column names:", paste(colnames(data_gtdbtk_host), collapse = ", "))) + log_debug(paste("data_checkm_host dimensions:", nrow(data_checkm_host), "rows,", ncol(data_checkm_host), "columns")) + log_debug(paste("data_checkm_host column names:", paste(colnames(data_checkm_host), collapse = ", "))) + + list( + summary = summary_df, + genome_data = genome_data, + host_genomes_paths = host_genomes_paths, + data_gtdbtk_host = data_gtdbtk_host, + data_checkm_host = data_checkm_host + ) } ``` @@ -227,8 +236,8 @@ result <- compile_results() if (is.null(result)) { - log_debug("Main function execution failed") - stop("Main function execution failed") + log_debug("Main function execution failed") + stop("Main function execution failed") } summary_df <- result$summary @@ -238,263 +247,229 @@ data_checkm_host <- result$data_checkm_host log_debug("Data extracted successfully") -# Remove any extensions from names in data gtdbtk and checm2 -data_gtdbtk_host <- data_gtdbtk_host %>% - mutate(user_genome = str_remove(user_genome, "\\.[^.]+$")) - -data_checkm_host <- data_checkm_host %>% - mutate(name = str_remove(name, "\\.[^.]+$")) - result$summary <- result$summary %>% - mutate(Sample = 
str_remove(Sample, "_virus_summary.tsv")) + dplyr::mutate(Sample = stringr::str_remove(Sample, "_virus_summary.tsv")) ``` # Summary {.tabset .tabset-fade} -## Overview Table - This table provides sample-by-sample information on detected viruses and key host genome statistics. It includes taxonomy, virus count, genome quality classification, CheckM2 metrics (completeness and contamination), and genome assembly statistics such as size and N50. ```{r render_table, message=FALSE, warning=FALSE, echo=FALSE, results='asis'} data <- result$summary log_debug("Assigning checkm2 host data") -checkm_host_data <- data_checkm_host %>% clean_names() %>% - select(name, completeness, contamination, - contig_n50, genome_size) +checkm_host_data <- data_checkm_host %>% + janitor::clean_names() %>% + dplyr::select( + name, completeness, contamination, + contig_n50, genome_size + ) log_debug("Assigning GTDB-Tk host data") -gtdbtk_data <- data_gtdbtk_host %>% - select(user_genome, classification) +gtdbtk_data <- data_gtdbtk_host %>% + dplyr::select(user_genome, classification) log_debug("Defining color-blind friendly palette") cb_friendly_colors <- list( - green = "#009E73", - blue = "#0072B2", - orange = "#E69F00", - red = "#D55E00", - grey = "#999999" + green = "#009E73", + blue = "#0072B2", + orange = "#E69F00", + red = "#D55E00", + grey = "#999999" ) log_debug("Defining function to color cells") -color_cell <- function(values, color_true = cb_friendly_colors$green, +color_cell <- function(values, color_true = cb_friendly_colors$green, color_false = cb_friendly_colors$red) { - ifelse(values, - cell_spec("Yes", color = "white", bold = TRUE, background = color_true), - cell_spec("No", color = "white", bold = TRUE, background = color_false)) + ifelse(values, + kableExtra::cell_spec("Yes", color = "white", bold = TRUE, background = color_true), + kableExtra::cell_spec("No", color = "white", bold = TRUE, background = color_false) + ) } log_debug("Defining function to create bar plot") 
create_bar_plot <- function(values, max_value, color = cb_friendly_colors$grey) { - sapply(values, function(value) { - if(is.na(value) || !is.numeric(value)) { - return("N/A") - } - bar_width <- min(max(value, 0), max_value) / max_value * 100 - sprintf('
%.1f%%', - color, bar_width, value) - }) + sapply(values, function(value) { + if (is.na(value) || !is.numeric(value)) { + return("N/A") + } + bar_width <- min(max(value, 0), max_value) / max_value * 100 + sprintf( + '
%.1f%%', + color, bar_width, value + ) + }) } log_debug("Defining function to format large numbers") format_large_number <- function(x) { - sapply(x, function(value) { - if (is.na(value) || !is.numeric(value)) { - return("N/A") - } else if (value < 1000) { - return(as.character(value)) - } else if (value < 1e6) { - return(paste0(round(value / 1e3, 1), "K")) - } else if (value < 1e9) { - return(paste0(round(value / 1e6, 1), "M")) - } else { - return(paste0(round(value / 1e9, 1), "G")) - } - }) + sapply(x, function(value) { + if (is.na(value) || !is.numeric(value)) { + return("N/A") + } else if (value < 1000) { + return(as.character(value)) + } else if (value < 1e6) { + return(paste0(round(value / 1e3, 1), "K")) + } else if (value < 1e9) { + return(paste0(round(value / 1e6, 1), "M")) + } else { + return(paste0(round(value / 1e9, 1), "G")) + } + }) } log_debug("Defining function to extract last known taxonomy level") extract_last_known_taxonomy <- function(classification) { - if (is.na(classification) || classification == "") { + if (is.na(classification) || classification == "") { + return(list(level = "Unknown", name = "Unknown")) + } + + parts <- strsplit(classification, ";")[[1]] + for (i in length(parts):1) { + level <- sub("^[a-z]__", "", parts[i]) + if (level != "") { + prefix <- sub("__.*$", "", parts[i]) + return(list(level = prefix, name = level)) + } + } return(list(level = "Unknown", name = "Unknown")) - } - - parts <- strsplit(classification, ";")[[1]] - for (i in length(parts):1) { - level <- sub("^[a-z]__", "", parts[i]) - if (level != "") { - prefix <- sub("__.*$", "", parts[i]) - return(list(level = prefix, name = level)) - } - } - return(list(level = "Unknown", name = "Unknown")) } log_debug("Defining function to format taxonomy") format_taxonomy <- function(classification) { - result <- extract_last_known_taxonomy(classification) - if (result$level == "Unknown") { - return("Unknown") - } else if (result$level == "s") { - return(paste0("", 
result$name, "")) - } else { - genus <- str_replace_all(result$name, "_", " ") - return(paste0("", genus, " sp.")) - } + result <- extract_last_known_taxonomy(classification) + if (result$level == "Unknown") { + return("Unknown") + } else if (result$level == "s") { + return(paste0("", result$name, "")) + } else { + genus <- str_replace_all(result$name, "_", " ") + return(paste0("", genus, " sp.")) + } } log_debug("Defining function to calculate quality score and determine genome quality class") calculate_quality_score_and_class <- function(completeness, contamination) { - if (is.na(completeness) || is.na(contamination)) { - return(list( - score = cell_spec("N/A", color = "white", bold = TRUE, background = cb_friendly_colors$grey), - class = cell_spec("Unknown", color = "white", bold = TRUE, background = cb_friendly_colors$grey), - numeric_score = NA - )) - } - - quality_score <- completeness - (5 * contamination) - formatted_score <- sprintf("%.1f", quality_score) - - if (completeness > 90 && contamination < 5) { - class <- "High-quality draft" - color <- cb_friendly_colors$green - } else if (completeness >= 50 && contamination < 10) { - class <- "Medium-quality draft" - color <- cb_friendly_colors$blue - } else { - class <- "Low-quality draft" - color <- cb_friendly_colors$red - } - - list( - score = cell_spec(formatted_score, color = "white", bold = TRUE, background = color), - class = cell_spec(class, color = "white", bold = TRUE, background = color), - numeric_score = quality_score - ) + if (is.na(completeness) || is.na(contamination)) { + return(list( + score = kableExtra::cell_spec("N/A", color = "white", bold = TRUE, background = cb_friendly_colors$grey), + class = kableExtra::cell_spec("Unknown", color = "white", bold = TRUE, background = cb_friendly_colors$grey), + numeric_score = NA + )) + } + + quality_score <- completeness - (5 * contamination) + formatted_score <- sprintf("%.1f", quality_score) + + if (completeness > 90 && contamination < 5) { + class 
<- "High-quality draft" + color <- cb_friendly_colors$green + } else if (completeness >= 50 && contamination < 10) { + class <- "Medium-quality draft" + color <- cb_friendly_colors$blue + } else { + class <- "Low-quality draft" + color <- cb_friendly_colors$red + } + + list( + score = kableExtra::cell_spec(formatted_score, color = "white", bold = TRUE, background = color), + class = kableExtra::cell_spec(class, color = "white", bold = TRUE, background = color), + numeric_score = quality_score + ) } log_debug("Preparing the data") table_data <- data %>% - #mutate(Sample = basename(Sample) %>% trim_sample_name()) %>% - mutate(Sample = basename(Sample)) %>% - left_join(checkm_host_data, by = c("Sample" = "name")) %>% - left_join(gtdbtk_data, by = c("Sample" = "user_genome")) %>% - mutate( - quality_data = pmap(list(as.numeric(completeness), - as.numeric(contamination)), - calculate_quality_score_and_class), - Quality_Score = map_chr(quality_data, ~.$score), - Genome_Quality = map_chr(quality_data, ~.$class), - Quality_Score_Numeric = map_dbl(quality_data, ~.$numeric_score), - Virus_Count_Numeric = as.numeric(Virus_Count), - Virus_Count = cell_spec( - Virus_Count, - color = "white", - bold = TRUE, - background = case_when( - Virus_Count == 0 ~ cb_friendly_colors$red, - Virus_Count == 1 ~ cb_friendly_colors$blue, - Virus_Count > 1 ~ cb_friendly_colors$green - ) - ), - Completeness_Numeric = as.numeric(completeness), - Completeness = create_bar_plot(as.numeric(completeness), 100), - Contamination = create_bar_plot(as.numeric(contamination), 100), - `N50 (contigs)` = format_large_number(as.numeric(contig_n50)), - `Genome size (bp)` = format_large_number(as.numeric(genome_size)), - `GTDB Taxonomy` = sapply(classification, format_taxonomy) - ) %>% - mutate(`#` = row_number()) %>% - select(`#`, Sample, `GTDB Taxonomy`, Virus_Count, - Quality_Score, Genome_Quality, Completeness, Contamination, - `Genome size (bp)`, `N50 (contigs)`) + # dplyr::mutate(Sample = basename(Sample) 
%>% trim_sample_name()) %>% + dplyr::mutate(Sample = basename(Sample)) %>% + dplyr::left_join(checkm_host_data, by = c("Sample" = "name")) %>% + dplyr::left_join(gtdbtk_data, by = c("Sample" = "user_genome")) %>% + dplyr::mutate( + quality_data = purrr::pmap( + list( + as.numeric(completeness), + as.numeric(contamination) + ), + calculate_quality_score_and_class + ), + Quality_Score = purrr::map_chr(quality_data, ~ .$score), + Genome_Quality = purrr::map_chr(quality_data, ~ .$class), + Quality_Score_Numeric = purrr::map_dbl(quality_data, ~ .$numeric_score), + Virus_Count_Numeric = as.numeric(Virus_Count), + Virus_Count = kableExtra::cell_spec( + Virus_Count, + color = "white", + bold = TRUE, + background = dplyr::case_when( + Virus_Count == 0 ~ cb_friendly_colors$red, + Virus_Count == 1 ~ cb_friendly_colors$blue, + Virus_Count > 1 ~ cb_friendly_colors$green + ) + ), + Completeness_Numeric = as.numeric(completeness), + Completeness = create_bar_plot(as.numeric(completeness), 100), + Contamination = create_bar_plot(as.numeric(contamination), 100), + `N50 (contigs)` = format_large_number(as.numeric(contig_n50)), + `Genome size (bp)` = format_large_number(as.numeric(genome_size)), + `GTDB Taxonomy` = sapply(classification, format_taxonomy) + ) %>% + dplyr::mutate(`#` = row_number()) %>% + dplyr::select( + `#`, Sample, `GTDB Taxonomy`, Virus_Count, + Quality_Score, Genome_Quality, Completeness, Contamination, + `Genome size (bp)`, `N50 (contigs)` + ) log_debug("Creating the table") -kbl(table_data, escape = FALSE, - align = c("c", "l", "l", "c", rep("c", 2), rep("r", 2), rep("r", 2))) %>% - kable_paper(full_width = TRUE) %>% - column_spec(1, bold = TRUE, width = "2em") %>% - column_spec(2:3, bold = TRUE) %>% - column_spec(4:5, width = "5em") %>% - column_spec(6:7, width = "60px") %>% - column_spec(8:9, width = "4em") %>% - add_header_above(c(" " = 4, "Host Genome Quality" = 2, "CheckM Metrics" = 2, - "Statistics" = 2)) %>% - kable_styling(bootstrap_options = 
c("striped", "hover", "condensed", "responsive"), - font_size = 9, - html_font = "Arial", - position = "left") %>% - row_spec(0, bold = TRUE, color = "white", background = "#333333") %>% - row_spec(0, extra_css = "border-bottom: 2px solid #000000;") %>% - column_spec(9, extra_css = "border-right: 2px solid #000000;") %>% - scroll_box(width = "100%", height = "100%", - extra_css = "overflow-x: auto; border: 1px solid #ccc; border-radius: 4px;") +kableExtra::kbl(table_data, + escape = FALSE, + align = c("c", "l", "l", "c", rep("c", 2), rep("r", 2), rep("r", 2)) +) %>% + kableExtra::kable_paper(full_width = TRUE) %>% + kableExtra::column_spec(1, bold = TRUE, width = "2em") %>% + kableExtra::column_spec(2:3, bold = TRUE) %>% + kableExtra::column_spec(4:5, width = "5em") %>% + kableExtra::column_spec(6:7, width = "60px") %>% + kableExtra::column_spec(8:9, width = "4em") %>% + kableExtra::add_header_above(c( + " " = 4, "Host Genome Quality" = 2, "CheckM Metrics" = 2, + "Statistics" = 2 + )) %>% + kableExtra::kable_styling( + bootstrap_options = c("striped", "hover", "condensed", "responsive"), + font_size = 9, + html_font = "Arial", + position = "left" + ) %>% + kableExtra::row_spec(0, bold = TRUE, color = "white", background = "#333333") %>% + kableExtra::row_spec(0, extra_css = "border-bottom: 2px solid #000000;") %>% + kableExtra::column_spec(9, extra_css = "border-right: 2px solid #000000;") %>% + kableExtra::scroll_box( + width = "100%", height = "100%", + extra_css = "overflow-x: auto; border: 1px solid #ccc; border-radius: 4px;" + ) ``` -## Tools Documentation - -The following tools are utilized in this workflow. Each tool name below is a link to its respective documentation. - -**Host-analyses** - -- [**CheckM2 v1.1.0**](https://github.com/chklovski/CheckM2): Assesses the quality of the host. Most useful when working with assembled genomes. - -- [**GTDB-Tk v2.3.2**](https://ecogenomics.github.io/GTDBTk/index.html): Assigns a taxonomy to the host genome. 
- -- [**Defense-Finder v2.0.0, models 2.0.2**](https://ecogenomics.github.io/GTDBTk/index.html): Detects known anti-phage systems in the host. - -- [**geNomad v1.7.1**](https://portal.nersc.gov/genomad/): Predicts and annotates proviruses. - -**Virus-analyses** - -- [**CheckV v1.0.1**](https://pypi.org/project/checkv/): Evaluates the quality of viral genomes. - -- [**dRep v3.4.5**](https://drep.readthedocs.io/en/latest/): Compares viral genomes within the same host. - -- [**Abricate v1.0.1**](https://github.com/tseemann/abricate): Identifies virulence genes in the prophage genomes with the [VFDB database](https://www.mgc.ac.cn/VFs/). - -- [**iPHOP v1.3.3**](https://bitbucket.org/srouxjgi/iphop/src/main/): Predicts other potential hosts of viral genomes. - -- [**VIBRANT v1.2.1**](https://github.com/AnantharamanLab/VIBRANT): Used to identify Auxiliary Metabolic Genes in the prophages. - -## Workflow - -The workflow begins with the input of bacterial genomes by the user. These are processed by the **host-analyses** tools. Prophage prediction is -performed by **geNomad** only. Afterward, prophages identified by **geNomad** are processed by the **virus-analyses** tools. - -If more than one prophage is recovered in the same sample, **dRep** is used to compare and determine if the viruses are identical or different within the same host. - -*PLACEHOLDER FOR PIPELINE* - -## R Session Info - -Information about the R session used to render this markdown document. 
-
-```{r}
-sessionInfo()
-```
-
-
 # Results {.tabset .tabset-fade}
 ```{r}
 # Creating combined_unique object
 combined_unique <- bind_rows(
- checkm_host_data %>%
- # select(bin_id) %>%
- # dplyr::rename(Sample = bin_id),
- select(name) %>%
- dplyr::rename(Sample = name),
-
- data %>%
- #mutate(Sample = str_remove(Sample, "_virus_summary.tsv")) %>%
- select(Sample)
+ checkm_host_data %>%
+ # dplyr::select(bin_id) %>%
+ # dplyr::rename(Sample = bin_id),
+ dplyr::select(name) %>%
+ dplyr::rename(Sample = name),
+ data %>%
+ # dplyr::mutate(Sample = stringr::str_remove(Sample, "_virus_summary.tsv")) %>%
+ dplyr::select(Sample)
 ) %>%
- distinct(Sample) %>%
- arrange(Sample)
+ distinct(Sample) %>%
+ arrange(Sample)
 log_debug(paste("combined_unique samples:", paste(combined_unique$Sample, collapse = ", ")))
 ```
@@ -504,578 +479,621 @@
 ```{r main_workflow, fig.width=6, fig.height=6, out.height="100%", out.width='100%', dpi=300, fig.align='center', warning=FALSE, message=FALSE, results='asis'}
 # Process proviruses data
 process_proviruses <- function(data_genomad) {
- proviruses <- data_genomad %>%
- dplyr::filter(topology == "Provirus") %>%
- dplyr::mutate(contig = sub("\\|provirus_.*", "", seq_name)) %>% # take everything before "|provirus"
- dplyr::mutate(contig = paste0("c", as.numeric(factor(contig)))) %>% # map them to c_1, c_2, ...
- dplyr::select(seq_name, coordinates, length, contig, virus_score, n_hallmarks) - - proviruses <- proviruses %>% - tidyr::separate(coordinates, into = c("start", "end"), sep = "-") - - proviruses$start <- as.integer(proviruses$start) - proviruses$end <- as.integer(proviruses$end) - - proviruses_gr_features <- GRanges(seqnames = proviruses$contig, - ranges = IRanges(start = proviruses$start, - end = proviruses$end)) - proviruses_gr_features$length <- proviruses_gr_features %>% ranges %>% width - proviruses_gr_features$score <- as.numeric(proviruses$virus_score) - proviruses_gr_features$n_hallmarks <- as.numeric(proviruses$n_hallmarks) - - proviruses_gr_features$n_hallmarks_pos <- - abs(start(proviruses_gr_features) - end(proviruses_gr_features)) / 2 - - return(proviruses_gr_features) + proviruses <- data_genomad %>% + dplyr::filter(topology == "Provirus") %>% + dplyr::mutate(contig = sub("\\|provirus_.*", "", seq_name)) %>% # take everything before "|provirus" + dplyr::mutate(contig = paste0("c", as.numeric(factor(contig)))) %>% # map them to c_1, c_2, ... 
+ dplyr::select(seq_name, coordinates, length, contig, virus_score, n_hallmarks) + + proviruses <- proviruses %>% + tidyr::separate(coordinates, into = c("start", "end"), sep = "-") + + proviruses$start <- as.integer(proviruses$start) + proviruses$end <- as.integer(proviruses$end) + + proviruses_gr_features <- GRanges( + seqnames = proviruses$contig, + ranges = IRanges( + start = proviruses$start, + end = proviruses$end + ) + ) + proviruses_gr_features$length <- proviruses_gr_features %>% + ranges() %>% + width() + proviruses_gr_features$score <- as.numeric(proviruses$virus_score) + proviruses_gr_features$n_hallmarks <- as.numeric(proviruses$n_hallmarks) + + proviruses_gr_features$n_hallmarks_pos <- + abs(start(proviruses_gr_features) - end(proviruses_gr_features)) / 2 + + return(proviruses_gr_features) } plot_genome_ideogram <- function(genome_current, proviruses_gr_features) { - fasta_file_path <- file.path(params$outdir, "genomes", paste0(genome_current, ".fna")) - #cat(fasta_file_path, "\n\n") - genome_ideogram <- getIdeogramData(fasta_file = fasta_file_path) - - # Replace any seqlevel to c_1, c_2, c_3, ... 
- new_seqlevels <- paste0("c", seq_along(seqlevels(genome_ideogram))) - names(new_seqlevels) <- seqlevels(genome_ideogram) - genome_ideogram <- GenomeInfoDb::renameSeqlevels(genome_ideogram, new_seqlevels) - colours <- rep("#a58bc5", length(seqlevels(genome_ideogram))) - - par(mar = c(2, 2, 2, 2)) # minimal margins around the plot - - gmovizInitialise(genome_ideogram, - sector_colours = colours, - sector_border_colours = colours, - sector_labels = FALSE - ) - - for (i in 1:length(proviruses_gr_features)) { - name <- as.character(seqnames(proviruses_gr_features[i])) - start <- as.numeric(start(proviruses_gr_features[i])) - end <- as.numeric(end(proviruses_gr_features[i])) - region <- data.frame(start = start, end = end) - circos.genomicRect(seqnames = name, - region, - ytop = .5, - ybottom = 0, - track.index = 1, - sector.index = name, - border = "#e9d27d", - col = "#e9d27d") - } - - length <- as.numeric(proviruses_gr_features$length) - length <- ifelse(length > 1000000, - paste0(round(length/1000000, 2), "mb"), - paste0(round(length/1000, 2), "kb")) - labels <- paste0(as.character(seqnames(proviruses_gr_features)), " (", length, ")") - circos.labels(sectors = as.character(seqnames(proviruses_gr_features)), - x = as.numeric(start(proviruses_gr_features)), - labels, - facing = "clockwise") + fasta_file_path <- file.path(params$outdir, "genomes", paste0(genome_current, ".fna")) + # cat(fasta_file_path, "\n\n") + genome_ideogram <- gmoviz::getIdeogramData(fasta_file = fasta_file_path) + + # Replace any seqlevel to c_1, c_2, c_3, ... 
+ new_seqlevels <- paste0("c", seq_along(GenomeInfoDb::seqlevels(genome_ideogram))) + names(new_seqlevels) <- GenomeInfoDb::seqlevels(genome_ideogram) + genome_ideogram <- GenomeInfoDb::renameSeqlevels(genome_ideogram, new_seqlevels) + colours <- rep("#a58bc5", length(GenomeInfoDb::seqlevels(genome_ideogram))) + + par(mar = c(2, 2, 2, 2)) # minimal margins around the plot + + gmoviz::gmovizInitialise(genome_ideogram, + sector_colours = colours, + sector_border_colours = colours, + sector_labels = FALSE + ) + + for (i in 1:length(proviruses_gr_features)) { + name <- as.character(seqnames(proviruses_gr_features[i])) + start <- as.numeric(start(proviruses_gr_features[i])) + end <- as.numeric(end(proviruses_gr_features[i])) + region <- data.frame(start = start, end = end) + circlize::circos.genomicRect( + seqnames = name, + region, + ytop = .5, + ybottom = 0, + track.index = 1, + sector.index = name, + border = "#e9d27d", + col = "#e9d27d" + ) + } + + length <- as.numeric(proviruses_gr_features$length) + length <- ifelse(length > 1000000, + paste0(round(length / 1000000, 2), "mb"), + paste0(round(length / 1000, 2), "kb") + ) + labels <- paste0(as.character(seqnames(proviruses_gr_features)), " (", length, ")") + circlize::circos.labels( + sectors = as.character(seqnames(proviruses_gr_features)), + x = as.numeric(start(proviruses_gr_features)), + labels, + facing = "clockwise" + ) } process_sample <- function(sample, combined_unique, host_genomes_paths, genome_data) { - genome_current <- sample # Add this line - tryCatch({ - log_debug(paste("Starting to process sample:", sample)) - - # Check if sample exists in genome_data - if (!(sample %in% names(genome_data))) { - log_debug(paste("Sample", sample, "not found in genome_data")) - cat(paste("Error: Sample", sample, "not found in genome_data\n\n")) - return() - } - - cat(paste("## ", sample, "{.tabset .tabset-fade} \n\n")) + genome_current <- sample # Add this line + tryCatch( + { + log_debug(paste("Starting to process 
sample:", sample)) + + # Check if sample exists in genome_data + if (!(sample %in% names(genome_data))) { + log_debug(paste("Sample", sample, "not found in genome_data")) + cat(paste("Error: Sample", sample, "not found in genome_data\n\n")) + return() + } + + cat(paste("## ", sample, "{.tabset .tabset-fade} \n\n")) + + host_genome_path <- host_genomes_paths$path[host_genomes_paths$name == sample] + if (length(host_genome_path) == 0) { + log_debug(paste("Host genome path not found for sample:", sample)) + cat(paste("Error: Host genome path not found for sample", sample, "\n\n")) + return() + } + + host_genome_ideogram <- tryCatch( + { + gmoviz::getIdeogramData(fasta_file = host_genome_path) + }, + error = function(e) { + log_debug(paste("Error loading host genome ideogram for sample", sample, ":", conditionMessage(e))) + NULL + } + ) + + if (is.null(host_genome_ideogram)) { + cat(paste("Error: Unable to load host genome ideogram for sample", sample, "\n\n")) + return() + } + + sample_data <- genome_data[[sample]]$loaded_data + genomad_summary <- sample_data$genomad + genomad_annotation <- sample_data$genomad_annotations + checkv_data <- sample_data$checkv + defense_finder_data <- sample_data$defense_finder + abricate_data <- sample_data$abricate + iphop_data <- sample_data$iphop + vibrant_data <- sample_data$vibrant - host_genome_path <- host_genomes_paths$path[host_genomes_paths$name == sample] - if (length(host_genome_path) == 0) { - log_debug(paste("Host genome path not found for sample:", sample)) - cat(paste("Error: Host genome path not found for sample", sample, "\n\n")) - return() - } - - host_genome_ideogram <- tryCatch({ - getIdeogramData(fasta_file = host_genome_path) - }, error = function(e) { - log_debug(paste("Error loading host genome ideogram for sample", sample, ":", conditionMessage(e))) - NULL - }) - - if (is.null(host_genome_ideogram)) { - cat(paste("Error: Unable to load host genome ideogram for sample", sample, "\n\n")) - return() - } - - 
sample_data <- genome_data[[sample]]$loaded_data - genomad_summary <- sample_data$genomad - genomad_annotation <- sample_data$genomad_annotations - checkv_data <- sample_data$checkv - defense_finder_data <- sample_data$defense_finder - abricate_data <- sample_data$abricate - iphop_data <- sample_data$iphop - vibrant_data <- sample_data$vibrant - - cat("### Host Genome\n\n") + cat("### Host Genome\n\n") + + cat("**GTDB-Tk taxonomy**: \n\n") + data_gtdbtk_host %>% + dplyr::filter(user_genome == sample) %>% + dplyr::select(classification) %>% + kableExtra::kbl() %>% + kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% + kableExtra::kable_paper("striped", full_width = TRUE) %>% + kableExtra::scroll_box(width = "100%", height = "100%") %>% + cat() - cat("**GTDB-Tk taxonomy**: \n\n") - data_gtdbtk_host %>% filter(user_genome == sample) %>% - select(classification) %>% - kbl() %>% - kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% - kable_paper("striped", full_width = TRUE) %>% - scroll_box(width = "100%", height = "100%") %>% - cat() + # Cat checkm summary for this genome + cat("**CheckM2 Summary**:\n\n") + # checkm_summary <- data_checkm_host %>% dplyr::filter(`bin_id` == sample) + checkm_summary <- data_checkm_host %>% dplyr::filter(`name` == sample) + checkm_summary %>% + janitor::clean_names() %>% + # dplyr::select(number_contigs, n50_contigs, completeness, contamination, strain_heterogeneity) %>% + dplyr::select(total_contigs, contig_n50, completeness, contamination) %>% + kableExtra::kbl() %>% + kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% + kableExtra::kable_paper("striped", full_width = TRUE) %>% + kableExtra::scroll_box(width = "100%", height = "100%") %>% + cat() - # Cat checkm summary for this genome - cat("**CheckM2 Summary**:\n\n") - #checkm_summary <- data_checkm_host %>% filter(`bin_id` == sample) - checkm_summary 
<- data_checkm_host %>% filter(`name` == sample) - checkm_summary %>% clean_names %>% - #select(number_contigs, n50_contigs, completeness, contamination, strain_heterogeneity) %>% - select(total_contigs, contig_n50, completeness, contamination) %>% - kbl() %>% - kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% - kable_paper("striped", full_width = TRUE) %>% - scroll_box(width = "100%", height = "100%") %>% - cat() - - # Display defense-finder as a table - if (!is.null(defense_finder_data) && nrow(defense_finder_data) > 0) { - cat("**Defense-Finder Systems**:\n\n") + # Display defense-finder as a table + if (!is.null(defense_finder_data) && nrow(defense_finder_data) > 0) { + cat("**Defense-Finder Systems**:\n\n") + + defense_finder_data %>% + dplyr::select(sys_id, type, subtype, sys_beg, sys_end, protein_in_syst, genes_count, name_of_profiles_in_sys) %>% + kableExtra::kbl() %>% + kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% + kableExtra::kable_paper("striped", full_width = TRUE) %>% + kableExtra::scroll_box(width = "100%", height = "100%") %>% + cat() + } else { + cat("No Defense-Finder systems detected.\n\n") + } + + if (is.null(genomad_summary) || nrow(genomad_summary) == 0) { + q + log_debug(paste("No geNomad summary data found for sample:", sample)) + return() + } - defense_finder_data %>% - select(sys_id, type, subtype, sys_beg, sys_end, protein_in_syst, genes_count, name_of_profiles_in_sys) %>% - kbl() %>% - kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% - kable_paper("striped", full_width = TRUE) %>% - scroll_box(width = "100%", height = "100%") %>% - cat() - } else { - cat("No Defense-Finder systems detected.\n\n") - } - - if (is.null(genomad_summary) || nrow(genomad_summary) == 0) { - log_debug(paste("No geNomad summary data found for sample:", sample)) - return() - } - - if (length(seqlevels(host_genome_ideogram)) == 1) { - 
host_genome_size <- sum(width(host_genome_ideogram)) - } else { - virus_containing_contigs <- unique(sub("\\|.*", "", genomad_summary$seq_name)) - virus_containing_contigs <- paste0("c_", as.numeric(factor(virus_containing_contigs))) - filtered_host_genome <- subset_and_update_ideogram(host_genome_ideogram, virus_containing_contigs) - host_genome_size <- sum(width(filtered_host_genome)) - } - - # Process proviruses - proviruses_gr_features <- process_proviruses(genomad_summary) - - cat("**Genomad and CheckV Summary**:\n\n") - genomad_summary %>% - select(seq_name, taxonomy, topology, coordinates, length) %>% - left_join( - checkv_data %>% select(contig_id, gene_count, viral_genes, checkv_quality, miuvig_quality), - by = c("seq_name" = "contig_id")) %>% - select(seq_name, length, gene_count, viral_genes, checkv_quality, miuvig_quality, taxonomy, topology, coordinates) %>% - kbl() %>% - kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% - kable_paper("striped", full_width = TRUE) %>% - scroll_box(width = "100%", height = "100%") %>% - cat() - - cat("**Host Genome Ideogram with Phages**:\n\n") - plot_genome_ideogram(sample, proviruses_gr_features) - cat('In this circular plot, **"c"** indicates the contig, and the number that follows (e.g., **c1**) represents the contig number. 
+ if (length(GenomeInfoDb::seqlevels(host_genome_ideogram)) == 1) { + host_genome_size <- sum(width(host_genome_ideogram)) + } else { + virus_containing_contigs <- unique(sub("\\|.*", "", genomad_summary$seq_name)) + virus_containing_contigs <- paste0("c_", as.numeric(factor(virus_containing_contigs))) + filtered_host_genome <- subset_and_update_ideogram(host_genome_ideogram, virus_containing_contigs) + host_genome_size <- sum(width(filtered_host_genome)) + } + + # Process proviruses + proviruses_gr_features <- process_proviruses(genomad_summary) + + cat("**Genomad and CheckV Summary**:\n\n") + genomad_summary %>% + dplyr::select(seq_name, taxonomy, topology, coordinates, length) %>% + dplyr::left_join( + checkv_data %>% dplyr::select(contig_id, gene_count, viral_genes, checkv_quality, miuvig_quality), + by = c("seq_name" = "contig_id") + ) %>% + dplyr::select(seq_name, length, gene_count, viral_genes, checkv_quality, miuvig_quality, taxonomy, topology, coordinates) %>% + kableExtra::kbl() %>% + kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% + kableExtra::kable_paper("striped", full_width = TRUE) %>% + kableExtra::scroll_box(width = "100%", height = "100%") %>% + cat() + + cat("**Host Genome Ideogram with Phages**:\n\n") + plot_genome_ideogram(sample, proviruses_gr_features) + cat('In this circular plot, **"c"** indicates the contig, and the number that follows (e.g., **c1**) represents the contig number. 
If multiple contigs are present in the genome, each will be shown with a distinct label (e.g., **c1**, **c2**, etc.).\n\n') - cat("\n\n") + cat("\n\n") + + # Process phage genomes + cat("### Prophages {.tabset .tabset-fade} \n\n") + cat("**Select prophage to show: ** \n\n") + for (i in seq_len(nrow(genomad_summary))) { + log_debug(paste("Processing phage", i, "of", nrow(genomad_summary), "for sample", sample)) + process_phage(genomad_summary[i, ], genomad_summary, genomad_annotation, checkv_data, host_genome_size) + } + + # Plot dREP if applicable + if (nrow(genomad_summary) > 1) { + cat("### vOTUs\n\n") + plot_drep(sample, genomad_summary) + } - # Process phage genomes - cat("### Prophages {.tabset .tabset-fade} \n\n") - cat("**Select prophage to show: ** \n\n") - for (i in seq_len(nrow(genomad_summary))) { - log_debug(paste("Processing phage", i, "of", nrow(genomad_summary), "for sample", sample)) - process_phage(genomad_summary[i, ], genomad_summary, genomad_annotation, checkv_data, host_genome_size) - } - - # Plot dREP if applicable - if (nrow(genomad_summary) > 1) { - cat("### vOTUs\n\n") - plot_drep(sample, genomad_summary) - } - - # Creating table with Abricate data - if (nrow(abricate_data) > 0) { - cat("### Virulence Genes {.tabset .tabset-fade} \n\n") - cat("Screening of virulence genes present in the prophage contigs. \n\n") - abricate_data %>% select(-number_file) %>% - kbl() %>% - kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% - kable_paper("striped", full_width = TRUE) %>% - scroll_box(width = "100%", height = "100%") %>% cat() - cat("\n\n") - } - - # Creating table with iPHOP - if (nrow(iphop_data) > 0) { - cat("### Prophage-Host Prediction {.tabset .tabset-fade} \n\n") - cat("Prediction of potential hosts for the prophage contigs. 
\n\n") - iphop_data %>% - kbl() %>% - kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% - kable_paper("striped", full_width = TRUE) %>% - scroll_box(width = "100%", height = "100%") %>% cat() - cat("\n\n") - } - - # Creating table with VIBRANT AMGs - if (nrow(vibrant_data) > 0) { - cat("### AMG Predictions {.tabset .tabset-fade} \n\n") - cat("Prediction of auxiliary metabolic genes in the prophage contigs. \n\n") - vibrant_data %>% - kbl() %>% - kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% - kable_paper("striped", full_width = TRUE) %>% - scroll_box(width = "100%", height = "100%") %>% cat() - cat("\n\n") - } + # Creating table with Abricate data + if (nrow(abricate_data) > 0) { + cat("### Virulence Genes {.tabset .tabset-fade} \n\n") + cat("Screening of virulence genes present in the prophage contigs. \n\n") + abricate_data %>% + dplyr::select(-number_file) %>% + kableExtra::kbl() %>% + kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% + kableExtra::kable_paper("striped", full_width = TRUE) %>% + kableExtra::scroll_box(width = "100%", height = "100%") %>% + cat() + cat("\n\n") + } - log_debug(paste("Finished processing sample:", sample)) - }, error = function(e) { - log_debug(paste("Error in process_sample for", sample, ":", conditionMessage(e))) - cat(paste("Error processing sample", sample, ":", conditionMessage(e), "\n\n")) - }) + # Creating table with iPHOP + if (nrow(iphop_data) > 0) { + cat("### Prophage-Host Prediction {.tabset .tabset-fade} \n\n") + cat("Prediction of potential hosts for the prophage contigs. 
\n\n") + iphop_data %>% + kableExtra::kbl() %>% + kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% + kableExtra::kable_paper("striped", full_width = TRUE) %>% + kableExtra::scroll_box(width = "100%", height = "100%") %>% + cat() + cat("\n\n") + } + + # Creating table with VIBRANT AMGs + if (nrow(vibrant_data) > 0) { + cat("### AMG Predictions {.tabset .tabset-fade} \n\n") + cat("Prediction of auxiliary metabolic genes in the prophage contigs. \n\n") + vibrant_data %>% + kableExtra::kbl() %>% + kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% + kableExtra::kable_paper("striped", full_width = TRUE) %>% + kableExtra::scroll_box(width = "100%", height = "100%") %>% + cat() + cat("\n\n") + } + + log_debug(paste("Finished processing sample:", sample)) + }, + error = function(e) { + log_debug(paste("Error in process_sample for", sample, ":", conditionMessage(e))) + cat(paste("Error processing sample", sample, ":", conditionMessage(e), "\n\n")) + } + ) } process_phage <- function(virus, genomad_summary, genomad_annotation, checkv_data, host_genome_size) { - cat(paste("#### Phage ID:", virus$seq_name, " {.tabset .tabset-fade} \n\n")) + cat(paste("#### Phage ID:", virus$seq_name, " {.tabset .tabset-fade} \n\n")) + + current_contig <- sub("\\|.*", "", virus$seq_name) + + provirus_start <- as.numeric(sub(".*provirus_(\\d+)_\\d+", "\\1", virus$seq_name)) + provirus_end <- as.numeric(sub(".*provirus_\\d+_(\\d+)", "\\1", virus$seq_name)) + virus_length <- provirus_end - provirus_start + 1 + + current_contig_base <- sub("\\|provirus_.*", "", virus$seq_name) + current_provirus_range <- sub(".*\\|provirus_", "", virus$seq_name) + current_annotations <- genomad_annotation[grepl(paste0(current_contig_base, "\\|provirus_", current_provirus_range, "_"), + genomad_annotation$gene, + fixed = FALSE + ), ] %>% + dplyr::mutate(arrow_pos = ifelse(strand == -1, "start", "end")) - 
current_contig <- sub("\\|.*", "", virus$seq_name) - - provirus_start <- as.numeric(sub(".*provirus_(\\d+)_\\d+", "\\1", virus$seq_name)) - provirus_end <- as.numeric(sub(".*provirus_\\d+_(\\d+)", "\\1", virus$seq_name)) - virus_length <- provirus_end - provirus_start + 1 - - current_contig_base <- sub("\\|provirus_.*", "", virus$seq_name) - current_provirus_range <- sub(".*\\|provirus_", "", virus$seq_name) - current_annotations <- genomad_annotation[grepl(paste0(current_contig_base, "\\|provirus_", current_provirus_range, "_"), - genomad_annotation$gene, fixed = FALSE), ] %>% - mutate(arrow_pos = ifelse(strand == -1, "start", "end")) - - - cat("\n\n**Phage–Host Genome Ideogram:**\n\n") - - plot_phage_circos(virus, genomad_summary, current_annotations, virus_length, host_genome_size, provirus_start, provirus_end, checkv_data) - - cat("\n\n") - cat("\n\n**Genes Annotation (geNomad):**\n\n") - - current_annotations %>% - select(gene, length, marker, annotation_accessions, annotation_description) %>% - kbl() %>% - kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% - kable_paper() %>% - cat() - cat("\n\n") + + cat("\n\n**Phage-Host Genome Ideogram:**\n\n") + + plot_phage_circos(virus, genomad_summary, current_annotations, virus_length, host_genome_size, provirus_start, provirus_end, checkv_data) + + cat("\n\n") + cat("\n\n**Genes Annotation (geNomad):**\n\n") + + current_annotations %>% + dplyr::select(gene, length, marker, annotation_accessions, annotation_description) %>% + kableExtra::kbl() %>% + kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% + kableExtra::kable_paper() %>% + cat() + cat("\n\n") } plot_phage_circos <- function(virus, genomad_summary, current_annotations, virus_length, host_genome_size, provirus_start, provirus_end, checkv_data) { - tryCatch({ - log_debug("Starting plot_phage_circos function") - log_debug(paste("Current virus:", virus$seq_name)) - 
log_debug(paste("Virus length:", virus_length)) - log_debug(paste("Host genome size:", host_genome_size)) - log_debug(paste("Provirus start:", provirus_start)) - log_debug(paste("Provirus end:", provirus_end)) - - # Check for NA or invalid values in input parameters - if (is.na(virus_length) || virus_length <= 0) { - log_debug("Error: Invalid virus length") - return(NULL) - } - if (is.na(host_genome_size) || host_genome_size <= 0) { - log_debug("Error: Invalid host genome size") - return(NULL) - } - if (is.na(provirus_start) || provirus_start < 0) { - log_debug("Error: Invalid provirus start position") - return(NULL) - } - if (is.na(provirus_end) || provirus_end <= provirus_start) { - log_debug("Error: Invalid provirus end position") - return(NULL) - } - - # Extract contig information - current_contig <- sub("\\|.*", "", virus$seq_name) - log_debug(paste("Current contig:", current_contig)) - - contig_viruses <- genomad_summary[grepl(paste0("^", current_contig), genomad_summary$seq_name), ] - if (nrow(contig_viruses) == 0) { - log_debug("Error: No viruses found for the current contig") - return(NULL) - } - - contig_length <- max(as.numeric(sub(".*_(\\d+)$", "\\1", contig_viruses$seq_name))) - if (is.na(contig_length) || contig_length <= 0) { - contig_length <- virus_length # Use virus length as fallback if contig length is invalid - log_debug(paste("Using virus length as contig length:", contig_length)) - } else { - log_debug(paste("Contig length:", contig_length)) - } - - if (provirus_end > contig_length) { - log_debug("Error: Provirus end position exceeds contig length") - return(NULL) - } - - log_debug("Clearing circos") - circos.clear() - - log_debug("Setting circos parameters") - circos.par(start.degree = 180, gap.degree = 10, track.margin = c(0.01, 0.01)) - - main_color <- "#a58bc5" - zoom_color <- "#e9d27d" - - zoom_start <- (provirus_start / contig_length) * 100 - zoom_end <- (provirus_end / contig_length) * 100 - - log_debug(paste("Zoom start:", 
zoom_start)) - log_debug(paste("Zoom end:", zoom_end)) - - log_debug("Initializing circos") - circos.initialize(factors = c("Zoom", "Main"), xlim = c(0, 100)) - - format_genome_labels <- function(x) { - ifelse(x >= 1e6, paste0(round(x / 1e6, 2), " Mb"), - ifelse(x >= 1e3, paste0(round(x / 1e3, 2), " Kb"), - paste0(x, " bp"))) - } - - log_debug("Adding link") - tryCatch({ - circos.link("Main", c(zoom_start, zoom_end), "Zoom", c(0, 100), - rou1 = 0.8, - rou2 = 0.97, - h.ratio = 0.55, # width? - lty = 2, - lwd = 0.5, - h2 = 1, - col = "grey99", border = "grey80") - }, error = function(e) { - log_debug(paste("Error in circos.link:", e$message)) - }) - - log_debug("Adding zoom track") - circos.track(factors = "Zoom", ylim = c(0, 1), track.height = 0.15, - panel.fun = function(x, y) { - circos.rect(0, 0, 100, 1, col = zoom_color, border = NA) - axis_labels <- seq(0, virus_length, length.out = 6) - axis_positions <- seq(0, 100, length.out = 6) - circos.axis(h = "top", major.at = axis_positions, - labels = format_genome_labels(axis_labels), - labels.cex = 0.7, direction = "outside") + tryCatch( + { + log_debug("Starting plot_phage_circos function") + log_debug(paste("Current virus:", virus$seq_name)) + log_debug(paste("Virus length:", virus_length)) + log_debug(paste("Host genome size:", host_genome_size)) + log_debug(paste("Provirus start:", provirus_start)) + log_debug(paste("Provirus end:", provirus_end)) + + # Check for NA or invalid values in input parameters + if (is.na(virus_length) || virus_length <= 0) { + log_debug("Error: Invalid virus length", virus_length) + return(NULL) + } + if (is.na(host_genome_size) || host_genome_size <= 0) { + log_debug("Error: Invalid host genome size") + return(NULL) + } + if (is.na(provirus_start) || provirus_start < 0) { + log_debug("Error: Invalid provirus start position") + return(NULL) + } + if (is.na(provirus_end) || provirus_end <= provirus_start) { + log_debug("Error: Invalid provirus end position") + return(NULL) + } + + # 
Extract contig information + current_contig <- sub("\\|.*", "", virus$seq_name) + log_debug(paste("Current contig:", current_contig)) + + contig_viruses <- genomad_summary[grepl(paste0("^", current_contig), genomad_summary$seq_name), ] + if (nrow(contig_viruses) == 0) { + log_debug("Error: No viruses found for the current contig") + return(NULL) + } + + contig_length <- max(as.numeric(sub(".*_(\\d+)$", "\\1", contig_viruses$seq_name))) + if (is.na(contig_length) || contig_length <= 0) { + contig_length <- virus_length # Use virus length as fallback if contig length is invalid + log_debug(paste("Using virus length as contig length:", contig_length)) + } else { + log_debug(paste("Contig length:", contig_length)) + } + + if (provirus_end > contig_length) { + log_debug("Error: Provirus end position exceeds contig length") + return(NULL) + } + + log_debug("Clearing circos") + circlize::circos.clear() + + log_debug("Setting circos parameters") + circlize::circos.par(start.degree = 180, gap.degree = 10, track.margin = c(0.01, 0.01)) + + main_color <- "#a58bc5" + zoom_color <- "#e9d27d" + + zoom_start <- (provirus_start / contig_length) * 100 + zoom_end <- (provirus_end / contig_length) * 100 + + log_debug(paste("Zoom start:", zoom_start)) + log_debug(paste("Zoom end:", zoom_end)) + + log_debug("Initializing circos") + circlize::circos.initialize(factors = c("Zoom", "Main"), xlim = c(0, 100)) + + format_genome_labels <- function(x) { + ifelse(x >= 1e6, paste0(round(x / 1e6, 2), " Mb"), + ifelse(x >= 1e3, paste0(round(x / 1e3, 2), " Kb"), + paste0(x, " bp") + ) + ) + } + + log_debug("Adding link") + tryCatch( + { + circlize::circos.link("Main", c(zoom_start, zoom_end), "Zoom", c(0, 100), + rou1 = 0.8, + rou2 = 0.97, + h.ratio = 0.55, # width? 
+ lty = 2, + lwd = 0.5, + h2 = 1, + col = "grey99", border = "grey80" + ) + }, + error = function(e) { + log_debug(paste("Error in circlize::circos.link:", e$message)) + } + ) - for (i in 1:nrow(current_annotations)) { - gene_start <- current_annotations$start[i] - gene_end <- current_annotations$end[i] - arrow_start <- (gene_start - provirus_start) / virus_length * 100 - arrow_end <- (gene_end - provirus_start) / virus_length * 100 + log_debug("Adding zoom track") + circlize::circos.track( + factors = "Zoom", ylim = c(0, 1), track.height = 0.15, + panel.fun = function(x, y) { + circlize::circos.rect(0, 0, 100, 1, col = zoom_color, border = NA) + axis_labels <- seq(0, virus_length, length.out = 6) + axis_positions <- seq(0, 100, length.out = 6) + circlize::circos.axis( + h = "top", major.at = axis_positions, + labels = format_genome_labels(axis_labels), + labels.cex = 0.7, direction = "outside" + ) - circos.arrow(arrow_start, arrow_end, y1 = 0, y2 = 1, - arrow.head.width = 0.75, arrow.head.length = cm_x(0.1), - arrow.position = current_annotations$arrow_pos[i], - col = ifelse(is.na(current_annotations$annotation_description[i]), "grey", "#7fbfff"), - border = ifelse(is.na(current_annotations$annotation_description[i]), "grey20", "darkblue")) - } - }, bg.border = NA) + for (i in 1:nrow(current_annotations)) { + gene_start <- current_annotations$start[i] + gene_end <- current_annotations$end[i] + arrow_start <- (gene_start - provirus_start) / virus_length * 100 + arrow_end <- (gene_end - provirus_start) / virus_length * 100 - log_debug("Adding main track") - circos.track(factors = "Main", ylim = c(0, 1), track.height = 0.1, - panel.fun = function(x, y) { - circos.rect(xleft = 0, ybottom = 0, xright = 100, ytop = 1, col = main_color, border = NA) - - for (i in 1:nrow(contig_viruses)) { - virus_start <- as.numeric(sub(".*provirus_(\\d+)_\\d+", "\\1", contig_viruses$seq_name[i])) - virus_end <- as.numeric(sub(".*provirus_\\d+_(\\d+)", "\\1", contig_viruses$seq_name[i])) 
+ circlize::circos.arrow(arrow_start, arrow_end, + arrow.head.width = 0.75, arrow.head.length = cm_x(0.1), + arrow.position = current_annotations$arrow_pos[i], + col = ifelse(is.na(current_annotations$annotation_description[i]), "grey", "#7fbfff"), + border = ifelse(is.na(current_annotations$annotation_description[i]), "grey20", "darkblue") + ) + } + }, bg.border = NA + ) - virus_start_percent <- (virus_start / contig_length) * 100 - virus_end_percent <- (virus_end / contig_length) * 100 - - rect_color <- if (contig_viruses$seq_name[i] == virus$seq_name) zoom_color else adjustcolor(zoom_color, alpha.f = 0.7) + log_debug("Adding main track") + circlize::circos.track( + factors = "Main", ylim = c(0, 1), track.height = 0.1, + panel.fun = function(x, y) { + circlize::circos.rect(xleft = 0, ybottom = 0, xright = 100, ytop = 1, col = main_color, border = NA) - circos.rect(xleft = virus_start_percent, ybottom = 0, - xright = virus_end_percent, ytop = 1, - col = rect_color, border = NA) - } + for (i in 1:nrow(contig_viruses)) { + virus_start <- as.numeric(sub(".*provirus_(\\d+)_\\d+", "\\1", contig_viruses$seq_name[i])) + virus_end <- as.numeric(sub(".*provirus_\\d+_(\\d+)", "\\1", contig_viruses$seq_name[i])) + + virus_start_percent <- (virus_start / contig_length) * 100 + virus_end_percent <- (virus_end / contig_length) * 100 + + rect_color <- if (contig_viruses$seq_name[i] == virus$seq_name) zoom_color else adjustcolor(zoom_color, alpha.f = 0.7) - axis_labels <- seq(0, contig_length, length.out = 6) - axis_positions <- seq(0, 100, length.out = 6) - circos.axis(h = "top", major.at = axis_positions, - labels = format_genome_labels(axis_labels), - labels.cex = 0.7, direction = "outside") - }, bg.border = NA) - - log_debug("Locating phage positions") - phage_positions <- sapply(1:nrow(contig_viruses), function(i) { - virus_start <- as.numeric(sub(".*provirus_(\\d+)_\\d+", "\\1", contig_viruses$seq_name[i])) - virus_end <- as.numeric(sub(".*provirus_\\d+_(\\d+)", "\\1", 
contig_viruses$seq_name[i])) - ((virus_start + virus_end) / 2 / contig_length) * 100 - }) - - log_debug("Annotating names on phage positions") - - # Extract start and end positions from sequence names - start_positions <- as.numeric(sub(".*provirus_([0-9]+)_.*", "\\1", contig_viruses$seq_name)) - end_positions <- as.numeric(sub(".*provirus_[0-9]+_([0-9]+)", "\\1", contig_viruses$seq_name)) - - # Create phage labels with the desired format - phage_labels <- paste0(round(contig_viruses$length / 1e3, 2), " Kb") - - # Apply labels to circos plot - circos.labels( - sectors = "Main", - x = phage_positions, - labels = phage_labels, - facing = "reverse.clockwise", - niceFacing = TRUE, - col = "black", - cex = 0.6, - side = "inside", - connection_height = 0.02, - line_col = "gray" - ) - - center_x <- 50 - virus_name <- virus$seq_name - taxonomy <- virus$taxonomy - - log_debug("Adding taxonomy and virus name to the plot") - circos.text(x = center_x, y = -0.2, labels = taxonomy, + circlize::circos.rect( + xleft = virus_start_percent, ybottom = 0, + xright = virus_end_percent, ytop = 1, + col = rect_color, border = NA + ) + } + + axis_labels <- seq(0, contig_length, length.out = 6) + axis_positions <- seq(0, 100, length.out = 6) + circlize::circos.axis( + h = "top", major.at = axis_positions, + labels = format_genome_labels(axis_labels), + labels.cex = 0.7, direction = "outside" + ) + }, bg.border = NA + ) + + log_debug("Locating phage positions") + phage_positions <- sapply(1:nrow(contig_viruses), function(i) { + virus_start <- as.numeric(sub(".*provirus_(\\d+)_\\d+", "\\1", contig_viruses$seq_name[i])) + virus_end <- as.numeric(sub(".*provirus_\\d+_(\\d+)", "\\1", contig_viruses$seq_name[i])) + ((virus_start + virus_end) / 2 / contig_length) * 100 + }) + + log_debug("Annotating names on phage positions") + + # Extract start and end positions from sequence names + start_positions <- as.numeric(sub(".*provirus_([0-9]+)_.*", "\\1", contig_viruses$seq_name)) + end_positions <- 
as.numeric(sub(".*provirus_[0-9]+_([0-9]+)", "\\1", contig_viruses$seq_name)) + + # Create phage labels with the desired format + phage_labels <- paste0(round(contig_viruses$length / 1e3, 2), " Kb") + + # Apply labels to circos plot + circlize::circos.labels( + sectors = "Main", + x = phage_positions, + labels = phage_labels, + facing = "reverse.clockwise", + niceFacing = TRUE, + col = "black", + cex = 0.6, + side = "inside", + connection_height = 0.02, + line_col = "gray" + ) + + center_x <- 50 + virus_name <- virus$seq_name + taxonomy <- virus$taxonomy + + log_debug("Adding taxonomy and virus name to the plot") + circlize::circos.text( + x = center_x, y = -0.2, labels = taxonomy, sector.index = "Zoom", track.index = 1, facing = "bending.inside", niceFacing = TRUE, - adj = c(0.5, 0.7), cex = 0.8) - - checkv_info <- checkv_data[checkv_data$contig_id == virus$seq_name, ] - if (nrow(checkv_info) > 0) { - checkv_quality <- checkv_info$checkv_quality - gene_count <- checkv_info$gene_count - viral_genes <- checkv_info$viral_genes - host_genes <- checkv_info$host_genes - miuvig_quality <- checkv_info$miuvig_quality - completeness <- checkv_info$completeness - completeness_method <- checkv_info$completeness_method - contamination <- checkv_info$contamination - - circos.text( - x = center_x, y = -0.5, - labels = paste("CheckV Quality:", checkv_quality, " - miuvig Quality:", miuvig_quality), - sector.index = "Zoom", track.index = 2, - facing = "bending.inside", niceFacing = TRUE, - adj = c(0.5, 0), cex = 0.7 - ) - - circos.text( - x = center_x, y = -1.5, - labels = paste("Gene Count:", gene_count, " - Viral Genes:", viral_genes, " - Host Genes:", host_genes), - sector.index = "Zoom", track.index = 2, - facing = "bending.inside", niceFacing = TRUE, - adj = c(0.5, 0), cex = 0.7 - ) - - circos.text( - x = center_x, y = -2.5, - labels = paste("Completeness:", completeness, " - Contamination:", contamination), - sector.index = "Zoom", track.index = 2, - facing = 
"bending.inside", niceFacing = TRUE, - adj = c(0.5, 0), cex = 0.7 - ) - } - - log_debug("Adding legend") - # Add legend - legend("topright", - legend = c("Annotated gene", "Unknown gene"), - fill = c("#7fbfff", "grey"), - border = c("darkblue", "grey20"), - cex = 0.8, - bty = "n") - - log_debug("Clearing circos") - circos.clear() - log_debug("Finished plot_phage_circos function successfully") - }, error = function(e) { - log_debug(paste("Error in plot_phage_circos:", e$message)) - circos.clear() - }) + adj = c(0.5, 0.7), cex = 0.8 + ) + + checkv_info <- checkv_data[checkv_data$contig_id == virus$seq_name, ] + if (nrow(checkv_info) > 0) { + checkv_quality <- checkv_info$checkv_quality + gene_count <- checkv_info$gene_count + viral_genes <- checkv_info$viral_genes + host_genes <- checkv_info$host_genes + miuvig_quality <- checkv_info$miuvig_quality + completeness <- checkv_info$completeness + completeness_method <- checkv_info$completeness_method + contamination <- checkv_info$contamination + + circlize::circos.text( + x = center_x, y = -0.5, + labels = paste("CheckV Quality:", checkv_quality, " - miuvig Quality:", miuvig_quality), + sector.index = "Zoom", track.index = 2, + facing = "bending.inside", niceFacing = TRUE, + adj = c(0.5, 0), cex = 0.7 + ) + + circlize::circos.text( + x = center_x, y = -1.5, + labels = paste("Gene Count:", gene_count, " - Viral Genes:", viral_genes, " - Host Genes:", host_genes), + sector.index = "Zoom", track.index = 2, + facing = "bending.inside", niceFacing = TRUE, + adj = c(0.5, 0), cex = 0.7 + ) + + circlize::circos.text( + x = center_x, y = -2.5, + labels = paste("Completeness:", completeness, " - Contamination:", contamination), + sector.index = "Zoom", track.index = 2, + facing = "bending.inside", niceFacing = TRUE, + adj = c(0.5, 0), cex = 0.7 + ) + } + + log_debug("Adding legend") + # Add legend + legend("topright", + legend = c("Annotated gene", "Unknown gene"), + fill = c("#7fbfff", "grey"), + border = c("darkblue", 
"grey20"), + cex = 0.8, + bty = "n" + ) + + log_debug("Clearing circos") + circlize::circos.clear() + log_debug("Finished plot_phage_circos function successfully") + }, + error = function(e) { + log_debug(paste("Error in plot_phage_circos:", e$message)) + circlize::circos.clear() + } + ) } plot_drep <- function(sample, genomad_summary) { - drep_file_path <- file.path(params$outdir, "virus_analyses", "drep_compare", sample, "data_tables", "Cdb.csv") - drep_data <- read_csv(drep_file_path) %>% clean_names() - drep_data <- cbind(genomad_summary$seq_name, drep_data) - - cat("When more than 1 phage is detected in the host genome, we perform a clustering step using the tool dRep.\n\n") - cat("A threshold of 0.95 was applied to the ANI similarity index to define clusters of virus operational taxonomic units (vOTUs).") - - cat("\n\n**Final cluster designations**\n\n") - drep_data %>% - kbl() %>% - kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% - kable_paper("striped", full_width = TRUE) %>% - cat() - - # Insert the PDF plot - plot_path <- file.path(params$outdir, "virus_analyses", "drep_compare", sample, "figures", "Primary_clustering_dendrogram.pdf") - png_path <- file.path(params$outdir, "virus_analyses", "drep_compare", sample, "figures", "Primary_clustering_dendrogram.png") - - if (file.exists(plot_path)) { - pdftools::pdf_convert(plot_path, format = "png", filenames = png_path, verbose = FALSE, dpi=150) - base64_str <- base64enc::dataURI(file = png_path, mime = "image/png") - cat("**Primary clustering plot**\n\n") - cat(sprintf( - '
',base64_str - )) - } else { - cat("**No dRep clustering plot found.**\n\n") - } - + drep_file_path <- file.path(params$outdir, "virus_analyses", "drep_compare", sample, "data_tables", "Cdb.csv") + drep_data <- read_csv(drep_file_path) %>% janitor::clean_names() + drep_data <- cbind(genomad_summary$seq_name, drep_data) + + cat("When more than 1 phage is detected in the host genome, we perform a clustering step using the tool dRep.\n\n") + cat("A threshold of 0.95 was applied to the ANI similarity index to define clusters of virus operational taxonomic units (vOTUs).") + + cat("\n\n**Final cluster designations**\n\n") + drep_data %>% + kableExtra::kbl() %>% + kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% + kableExtra::kable_paper("striped", full_width = TRUE) %>% + cat() + + # Insert the PDF plot + plot_path <- file.path(params$outdir, "virus_analyses", "drep_compare", sample, "figures", "Primary_clustering_dendrogram.pdf") + png_path <- file.path(params$outdir, "virus_analyses", "drep_compare", sample, "figures", "Primary_clustering_dendrogram.png") + + if (file.exists(plot_path)) { + pdftools::pdf_convert(plot_path, format = "png", filenames = png_path, verbose = FALSE, dpi = 150) + base64_str <- base64enc::dataURI(file = png_path, mime = "image/png") + cat("**Primary clustering plot**\n\n") + cat(sprintf( + '
', base64_str + )) + } else { + cat("**No dRep clustering plot found.**\n\n") + } } subset_and_update_ideogram <- function(ideogram, contigs) { - filtered <- ideogram[seqnames(ideogram) %in% contigs] - seqlevels(filtered) <- contigs - seqinfo(filtered) <- seqinfo(filtered)[contigs] - filtered + filtered <- ideogram[seqnames(ideogram) %in% contigs] + GenomeInfoDb::seqlevels(filtered) <- contigs + seqinfo(filtered) <- seqinfo(filtered)[contigs] + filtered } render_all_samples <- function(test_mode = FALSE) { - if (test_mode) { - if (nrow(combined_unique) > 0) { - cat("**Select sample to show:** \n\n\n") - current_sample <- combined_unique$Sample[6] - process_sample(current_sample, combined_unique, host_genomes_paths, genome_data) + if (test_mode) { + if (nrow(combined_unique) > 0) { + cat("**Select sample to show:** \n\n\n") + current_sample <- combined_unique$Sample[6] + process_sample(current_sample, combined_unique, host_genomes_paths, genome_data) + } else { + print("No samples can be further analysed.") + } } else { - print("No samples can be further analysed.") + cat("**Select sample to show:** \n\n\n") + for (i in seq_len(nrow(combined_unique))) { + current_sample <- combined_unique$Sample[i] + process_sample(current_sample, combined_unique, host_genomes_paths, genome_data) + } } - } else { - cat("**Select sample to show:** \n\n\n") - for (i in seq_len(nrow(combined_unique))) { - current_sample <- combined_unique$Sample[i] - process_sample(current_sample, combined_unique, host_genomes_paths, genome_data) - } - } } # Execute the main function # Test mode processes one sample only render_all_samples(test_mode = F) ``` - -# Citation - -Cite this work: XXXXX - - diff -r 315c2ed31af1 -r 3a7f73d638ba test-data/checkm2.Quality_report-wext.tabular --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/checkm2.Quality_report-wext.tabular Tue Jul 22 11:09:24 2025 +0000 @@ -0,0 +1,23 @@ +Name Completeness Contamination Completeness_Model_Used Translation_Table_Used 
Coding_Density Contig_N50 Average_Gene_Length Genome_Size GC_Content Total_Coding_Sequences Total_Contigs Max_Contig_Length Additional_Notes +NC_000913.fasta 100.0 0.13 Neural Network (Specific Model) 11 0.876 4641652 314.20629775410976 4641652 0.51 4319 1 4641652 None +NC_002737.fasta 99.99 0.48 Neural Network (Specific Model) 11 0.856 1852433 298.1390765765766 1852433 0.39 1776 1 1852433 None +NC_003450.fasta 100.0 0.29 Neural Network (Specific Model) 11 0.872 3309401 313.8808205796158 3309401 0.54 3071 1 3309401 None +NC_008261.fasta 100.0 0.14 Neural Network (Specific Model) 11 0.837 3256683 316.0170197985412 3256683 0.28 2879 1 3256683 None +NC_009012.fasta 100.0 1.09 Neural Network (Specific Model) 11 0.849 3843301 320.7092407298411 3843301 0.39 3398 1 3843301 None +NC_012982.fasta 100.0 0.05 Neural Network (Specific Model) 11 0.897 3455622 328.879173290938 3455622 0.45 3145 1 3455622 None +NC_014008.fasta 100.0 0.02 Neural Network (Specific Model) 11 0.902 3750771 358.93038779402417 3750771 0.54 3146 1 3750771 None +NC_014168.fasta 99.99 0.03 Neural Network (Specific Model) 11 0.907 3157527 311.3860162601626 3157527 0.67 3075 1 3157527 None +NC_014211.fasta 14.07 0.01 Neural Network (Specific Model) 11 0.836 775354 308.61626248216834 775354 0.72 701 1 775354 None +NC_014212.fasta 99.99 0.16 Neural Network (Specific Model) 11 0.904 3249394 303.41267387944356 3249394 0.62 3235 1 3249394 None +NC_014363.fasta 99.7 0.35 Neural Network (Specific Model) 11 0.874 2051896 337.48281690140846 2051896 0.65 1775 1 2051896 None +NC_014364.fasta 99.98 1.88 Neural Network (Specific Model) 11 0.931 4653970 337.5459421641791 4653970 0.49 4288 1 4653970 None +NC_015761.fasta 100.0 0.14 Neural Network (Specific Model) 11 0.871 4460105 320.1726352185725 4460105 0.51 4049 1 4460105 None +NC_017033.fasta 99.99 0.04 Neural Network (Specific Model) 11 0.872 3603458 329.2016938519448 3603458 0.63 3188 1 3603458 None +NC_017095.fasta 99.95 2.67 Neural Network (Specific Model) 11 
0.916 2166381 329.75584286424663 2166381 0.39 2011 1 2166381 None +NC_018014.fasta 99.99 9.38 Neural Network (Specific Model) 11 0.891 5227858 357.07764759935674 5227858 0.6 4353 1 5227858 None +NC_018068.fasta 99.99 0.42 Neural Network (Specific Model) 11 0.824 4926837 296.0867001528718 4926837 0.42 4579 1 4926837 None +NC_018515.fasta 100.0 1.83 Neural Network (Specific Model) 11 0.834 4873567 302.3391304347826 4873567 0.42 4485 1 4873567 None +NC_019897.fasta 99.95 0.34 Neural Network (Specific Model) 11 0.874 4206343 318.1461139896373 4206343 0.61 3860 1 4206343 None +NC_019904.fasta 100.0 0.23 Neural Network (Specific Model) 11 0.869 5608040 351.0986827898942 5608040 0.45 4631 1 5608040 None +NC_019936.fasta 100.0 0.13 Neural Network (Specific Model) 11 0.896 4575057 320.7199437543942 4575057 0.63 4267 1 4575057 None +NC_021184.fasta 100.0 3.04 Neural Network (Specific Model) 11 0.823 4855529 289.67620906527867 4855529 0.45 4611 1 4855529 None diff -r 315c2ed31af1 -r 3a7f73d638ba test-data/checkm2.Quality_report.tabular --- a/test-data/checkm2.Quality_report.tabular Wed Jun 04 17:36:40 2025 +0000 +++ b/test-data/checkm2.Quality_report.tabular Tue Jul 22 11:09:24 2025 +0000 @@ -1,23 +1,23 @@ Name Completeness Contamination Completeness_Model_Used Translation_Table_Used Coding_Density Contig_N50 Average_Gene_Length Genome_Size GC_Content Total_Coding_Sequences Total_Contigs Max_Contig_Length Additional_Notes -NC_000913.fasta 100.0 0.13 Neural Network (Specific Model) 11 0.876 4641652 314.20629775410976 4641652 0.51 4319 1 4641652 None -NC_002737.fasta 99.99 0.48 Neural Network (Specific Model) 11 0.856 1852433 298.1390765765766 1852433 0.39 1776 1 1852433 None -NC_003450.fasta 100.0 0.29 Neural Network (Specific Model) 11 0.872 3309401 313.8808205796158 3309401 0.54 3071 1 3309401 None -NC_008261.fasta 100.0 0.14 Neural Network (Specific Model) 11 0.837 3256683 316.0170197985412 3256683 0.28 2879 1 3256683 None -NC_009012.fasta 100.0 1.09 Neural Network 
(Specific Model) 11 0.849 3843301 320.7092407298411 3843301 0.39 3398 1 3843301 None -NC_012982.fasta 100.0 0.05 Neural Network (Specific Model) 11 0.897 3455622 328.879173290938 3455622 0.45 3145 1 3455622 None -NC_014008.fasta 100.0 0.02 Neural Network (Specific Model) 11 0.902 3750771 358.93038779402417 3750771 0.54 3146 1 3750771 None -NC_014168.fasta 99.99 0.03 Neural Network (Specific Model) 11 0.907 3157527 311.3860162601626 3157527 0.67 3075 1 3157527 None -NC_014211.fasta 14.07 0.01 Neural Network (Specific Model) 11 0.836 775354 308.61626248216834 775354 0.72 701 1 775354 None -NC_014212.fasta 99.99 0.16 Neural Network (Specific Model) 11 0.904 3249394 303.41267387944356 3249394 0.62 3235 1 3249394 None -NC_014363.fasta 99.7 0.35 Neural Network (Specific Model) 11 0.874 2051896 337.48281690140846 2051896 0.65 1775 1 2051896 None -NC_014364.fasta 99.98 1.88 Neural Network (Specific Model) 11 0.931 4653970 337.5459421641791 4653970 0.49 4288 1 4653970 None -NC_015761.fasta 100.0 0.14 Neural Network (Specific Model) 11 0.871 4460105 320.1726352185725 4460105 0.51 4049 1 4460105 None -NC_017033.fasta 99.99 0.04 Neural Network (Specific Model) 11 0.872 3603458 329.2016938519448 3603458 0.63 3188 1 3603458 None -NC_017095.fasta 99.95 2.67 Neural Network (Specific Model) 11 0.916 2166381 329.75584286424663 2166381 0.39 2011 1 2166381 None -NC_018014.fasta 99.99 9.38 Neural Network (Specific Model) 11 0.891 5227858 357.07764759935674 5227858 0.6 4353 1 5227858 None -NC_018068.fasta 99.99 0.42 Neural Network (Specific Model) 11 0.824 4926837 296.0867001528718 4926837 0.42 4579 1 4926837 None -NC_018515.fasta 100.0 1.83 Neural Network (Specific Model) 11 0.834 4873567 302.3391304347826 4873567 0.42 4485 1 4873567 None -NC_019897.fasta 99.95 0.34 Neural Network (Specific Model) 11 0.874 4206343 318.1461139896373 4206343 0.61 3860 1 4206343 None -NC_019904.fasta 100.0 0.23 Neural Network (Specific Model) 11 0.869 5608040 351.0986827898942 5608040 0.45 4631 1 5608040 
None -NC_019936.fasta 100.0 0.13 Neural Network (Specific Model) 11 0.896 4575057 320.7199437543942 4575057 0.63 4267 1 4575057 None -NC_021184.fasta 100.0 3.04 Neural Network (Specific Model) 11 0.823 4855529 289.67620906527867 4855529 0.45 4611 1 4855529 None +NC_000913 100.0 0.13 Neural Network (Specific Model) 11 0.876 4641652 314.20629775410976 4641652 0.51 4319 1 4641652 None +NC_002737 99.99 0.48 Neural Network (Specific Model) 11 0.856 1852433 298.1390765765766 1852433 0.39 1776 1 1852433 None +NC_003450 100.0 0.29 Neural Network (Specific Model) 11 0.872 3309401 313.8808205796158 3309401 0.54 3071 1 3309401 None +NC_008261 100.0 0.14 Neural Network (Specific Model) 11 0.837 3256683 316.0170197985412 3256683 0.28 2879 1 3256683 None +NC_009012 100.0 1.09 Neural Network (Specific Model) 11 0.849 3843301 320.7092407298411 3843301 0.39 3398 1 3843301 None +NC_012982 100.0 0.05 Neural Network (Specific Model) 11 0.897 3455622 328.879173290938 3455622 0.45 3145 1 3455622 None +NC_014008 100.0 0.02 Neural Network (Specific Model) 11 0.902 3750771 358.93038779402417 3750771 0.54 3146 1 3750771 None +NC_014168 99.99 0.03 Neural Network (Specific Model) 11 0.907 3157527 311.3860162601626 3157527 0.67 3075 1 3157527 None +NC_014211 14.07 0.01 Neural Network (Specific Model) 11 0.836 775354 308.61626248216834 775354 0.72 701 1 775354 None +NC_014212 99.99 0.16 Neural Network (Specific Model) 11 0.904 3249394 303.41267387944356 3249394 0.62 3235 1 3249394 None +NC_014363 99.7 0.35 Neural Network (Specific Model) 11 0.874 2051896 337.48281690140846 2051896 0.65 1775 1 2051896 None +NC_014364 99.98 1.88 Neural Network (Specific Model) 11 0.931 4653970 337.5459421641791 4653970 0.49 4288 1 4653970 None +NC_015761 100.0 0.14 Neural Network (Specific Model) 11 0.871 4460105 320.1726352185725 4460105 0.51 4049 1 4460105 None +NC_017033 99.99 0.04 Neural Network (Specific Model) 11 0.872 3603458 329.2016938519448 3603458 0.63 3188 1 3603458 None +NC_017095 99.95 2.67 Neural 
Network (Specific Model) 11 0.916 2166381 329.75584286424663 2166381 0.39 2011 1 2166381 None +NC_018014 99.99 9.38 Neural Network (Specific Model) 11 0.891 5227858 357.07764759935674 5227858 0.6 4353 1 5227858 None +NC_018068 99.99 0.42 Neural Network (Specific Model) 11 0.824 4926837 296.0867001528718 4926837 0.42 4579 1 4926837 None +NC_018515 100.0 1.83 Neural Network (Specific Model) 11 0.834 4873567 302.3391304347826 4873567 0.42 4485 1 4873567 None +NC_019897 99.95 0.34 Neural Network (Specific Model) 11 0.874 4206343 318.1461139896373 4206343 0.61 3860 1 4206343 None +NC_019904 100.0 0.23 Neural Network (Specific Model) 11 0.869 5608040 351.0986827898942 5608040 0.45 4631 1 5608040 None +NC_019936 100.0 0.13 Neural Network (Specific Model) 11 0.896 4575057 320.7199437543942 4575057 0.63 4267 1 4575057 None +NC_021184 100.0 3.04 Neural Network (Specific Model) 11 0.823 4855529 289.67620906527867 4855529 0.45 4611 1 4855529 None diff -r 315c2ed31af1 -r 3a7f73d638ba test-data/gtdbtk.bac120-wext.summary --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/gtdbtk.bac120-wext.summary Tue Jul 22 11:09:24 2025 +0000 @@ -0,0 +1,23 @@ +user_genome classification closest_genome_reference closest_genome_reference_radius closest_genome_taxonomy closest_genome_ani closest_genome_af closest_placement_reference closest_placement_radius closest_placement_taxonomy closest_placement_ani closest_placement_af pplacer_taxonomy classification_method note other_related_references(genome_id,species_name,radius,ANI,AF) msa_percent translation_table red_value warnings +NC_000913.fasta d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli GCF_003697165.2 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli 96.74 0.856 GCF_000026225.1 95.0 
d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia fergusonii 91.54 0.56 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__ taxonomic classification defined by topology and ANI N/A GCF_000194175.1, s__Escherichia coli_F, 95.0, 95.46, 0.89; GCF_002965065.1, s__Escherichia sp002965065, 95.0, 94.52, 0.691; GCF_004211955.1, s__Escherichia sp004211955, 95.0, 93.12, 0.774; GCF_005843885.1, s__Escherichia sp005843885, 95.0, 92.76, 0.782; GCF_011881725.1, s__Escherichia coli_E, 95.0, 92.37, 0.807; GCF_029876145.1, s__Escherichia ruysiae, 95.0, 92.28, 0.788; GCF_014836715.1, s__Escherichia whittamii, 95.0, 91.78, 0.782; GCF_002900365.1, s__Escherichia marmotae, 95.0, 90.92, 0.738; GCF_000759775.1, s__Escherichia albertii, 95.0, 90.18, 0.68 98.47 11 N/A N/A +NC_002737.fasta d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__Streptococcus pyogenes GCF_002055535.1 95.0 d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__Streptococcus pyogenes 99.7 0.968 GCF_002055535.1 95.0 d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__Streptococcus pyogenes 99.7 0.968 d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_900459225.1, s__Streptococcus dysgalactiae, 95.0, 88.16, 0.456; GCF_900636575.1, s__Streptococcus canis, 95.0, 86.81, 0.468 98.31 11 N/A N/A +NC_003450.fasta d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__Corynebacterium glutamicum GCF_000011325.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__Corynebacterium glutamicum 
100.0 1.0 GCF_000011325.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__Corynebacterium glutamicum 100.0 1.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_002355155.1, s__Corynebacterium suranareeae, 95.0, 86.55, 0.531; GCF_001643015.1, s__Corynebacterium crudilactis, 95.0, 84.07, 0.363; GCF_001277995.1, s__Corynebacterium deserti, 95.0, 83.77, 0.275 96.43 11 N/A N/A +NC_008261.fasta d__Bacteria;p__Bacillota_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Sarcina;s__Sarcina perfringens GCF_000013285.1 95.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Sarcina;s__Sarcina perfringens 100.0 1.0 GCF_000013285.1 95.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Sarcina;s__Sarcina perfringens 100.0 1.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Sarcina;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_029258205.1, s__Sarcina sp029258205, 95.0, 92.75, 0.744; GCF_029267215.1, s__Sarcina sp029267215, 95.0, 84.03, 0.29 94.4 11 N/A N/A +NC_009012.fasta d__Bacteria;p__Bacillota_A;c__Clostridia;o__Acetivibrionales;f__Acetivibrionaceae;g__Hungateiclostridium;s__Hungateiclostridium thermocellum GCF_000015865.1 95.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Acetivibrionales;f__Acetivibrionaceae;g__Hungateiclostridium;s__Hungateiclostridium thermocellum 100.0 1.0 GCF_000015865.1 95.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Acetivibrionales;f__Acetivibrionaceae;g__Hungateiclostridium;s__Hungateiclostridium thermocellum 100.0 1.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Acetivibrionales;f__Acetivibrionaceae;g__Hungateiclostridium;s__ taxonomic 
classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_000521465.1, s__Hungateiclostridium straminisolvens, 95.0, 84.49, 0.429; GCF_004102745.1, s__Hungateiclostridium mesophilum, 95.0, 81.39, 0.257 94.22 11 N/A N/A +NC_012982.fasta d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;f__Hyphomonadaceae;g__Hirschia;s__Hirschia baltica GCF_000023785.1 95.0 d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;f__Hyphomonadaceae;g__Hirschia;s__Hirschia baltica 100.0 1.0 GCF_000023785.1 95.0 d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;f__Hyphomonadaceae;g__Hirschia;s__Hirschia baltica 100.0 1.0 d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;f__Hyphomonadaceae;g__Hirschia;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 98.25 11 N/A N/A +NC_014008.fasta d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Opitutales;f__Coraliomargaritaceae;g__Coraliomargarita;s__Coraliomargarita akajimensis GCF_000025905.1 95.0 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Opitutales;f__Coraliomargaritaceae;g__Coraliomargarita;s__Coraliomargarita akajimensis 100.0 1.0 GCF_000025905.1 95.0 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Opitutales;f__Coraliomargaritaceae;g__Coraliomargarita;s__Coraliomargarita akajimensis 100.0 1.0 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Opitutales;f__Coraliomargaritaceae;g__;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 90.76 11 N/A N/A +NC_014168.fasta d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Segniliparus;s__Segniliparus rotundus GCF_000092825.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Segniliparus;s__Segniliparus 
rotundus 100.0 1.0 GCF_000092825.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Segniliparus;s__Segniliparus rotundus 100.0 1.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Segniliparus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_000185725.2, s__Segniliparus rugosus, 95.0, 80.18, 0.178 96.56 11 N/A N/A +NC_014211.fasta d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Streptosporangiales;f__Streptosporangiaceae;g__Nocardiopsis;s__Nocardiopsis dassonvillei GCF_000092985.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Streptosporangiales;f__Streptosporangiaceae;g__Nocardiopsis;s__Nocardiopsis dassonvillei 100.0 1.0 GCF_000092985.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Streptosporangiales;f__Streptosporangiaceae;g__Nocardiopsis;s__Nocardiopsis dassonvillei 100.0 1.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Streptosporangiales;f__Streptosporangiaceae;g__Nocardiopsis;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_012396365.1, s__Nocardiopsis alborubida, 95.0, 94.89, 0.822; GCA_000340945.1, s__Nocardiopsis synnemataformans, 95.0, 94.67, 0.838; GCA_002529455.1, s__Nocardiopsis fusca, 95.0, 93.48, 0.743; GCF_000341065.1, s__Nocardiopsis halotolerans, 95.0, 89.3, 0.575; GCF_001905145.1, s__Nocardiopsis sp001905145, 95.0, 89.08, 0.554; GCF_008638415.1, s__Nocardiopsis sinuspersici, 95.0, 88.61, 0.531; GCF_008638365.1, s__Nocardiopsis quinghaiensis, 95.0, 88.23, 0.498; GCF_009830945.1, s__Nocardiopsis sp009830945, 95.0, 87.99, 0.442; GCA_937957845.1, s__Nocardiopsis sp937957845, 95.0, 87.38, 0.277; GCF_026642255.1, s__Nocardiopsis nanhaiensis_A, 95.0, 86.06, 0.436; GCF_030271535.1, s__Nocardiopsis sp030271535, 95.0, 84.81, 0.338; GCF_013410755.1, s__Nocardiopsis aegyptia, 95.0, 84.69, 0.277; 
GCF_018316655.1, s__Nocardiopsis changdeensis, 95.0, 84.65, 0.301; GCF_000341125.1, s__Nocardiopsis lucentensis, 95.0, 84.45, 0.226; GCF_001279585.1, s__Nocardiopsis sp001279585, 95.0, 84.4, 0.267; GCA_018388625.1, s__Nocardiopsis eucommiae, 95.0, 84.39, 0.264; GCF_014201115.1, s__Nocardiopsis metallicus, 95.0, 84.3, 0.268; GCF_003634495.1, s__Nocardiopsis sp003634495, 95.0, 84.18, 0.288; GCF_030766825.1, s__Nocardiopsis sp030766825, 95.0, 84.18, 0.286; GCF_000341085.1, s__Nocardiopsis ganjiahuensis, 95.0, 84.16, 0.28; GCF_030555055.1, s__Nocardiopsis sp030555055, 95.0, 84.06, 0.267; GCF_024134545.1, s__Nocardiopsis exhalans, 95.08, 84.04, 0.267; GCF_900141985.1, s__Nocardiopsis flavescens, 95.0, 83.82, 0.272; GCF_003386285.1, s__Nocardiopsis sp003386285, 95.0, 83.73, 0.264; GCF_014651695.1, s__Nocardiopsis terrae, 95.0, 83.6, 0.27; GCF_014203695.1, s__Nocardiopsis algeriensis, 95.0, 83.59, 0.238; GCF_020741345.1, s__Nocardiopsis listeri_A, 95.0, 83.53, 0.202; GCF_000341225.1, s__Nocardiopsis alba, 95.0, 83.47, 0.217; GCF_018207095.1, s__Nocardiopsis sp018207095, 95.0, 83.32, 0.229; GCF_028882275.1, s__Nocardiopsis sp028882275, 95.0, 83.25, 0.266; GCF_900143625.1, s__Nocardiopsis sp900143625, 95.0, 83.09, 0.229; GCF_000515115.1, s__Nocardiopsis sp000515115, 95.0, 83.02, 0.181; GCF_014892575.1, s__Nocardiopsis coralli, 95.0, 82.85, 0.213; GCF_000341265.1, s__Nocardiopsis prasina, 95.0, 82.63, 0.245; GCF_001942255.1, s__Nocardiopsis sp001942255, 95.0, 82.61, 0.248; GCF_000341005.1, s__Nocardiopsis alkaliphila, 95.0, 82.53, 0.186; GCF_001570765.1, s__Nocardiopsis listeri, 95.0, 82.26, 0.198; GCF_000341025.1, s__Nocardiopsis salina, 95.0, 82.22, 0.158; GCF_000341145.1, s__Nocardiopsis xinjiangensis, 95.0, 81.14, 0.168 26.24 11 N/A N/A +NC_014212.fasta d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Allomeiothermus;s__Allomeiothermus silvanus GCF_000092125.1 95.0 
d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Allomeiothermus;s__Allomeiothermus silvanus 100.0 0.999 GCF_000092125.1 95.0 d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Allomeiothermus;s__Allomeiothermus silvanus 100.0 0.999 d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Allomeiothermus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_003226535.1, s__Allomeiothermus sp003226535, 95.0, 87.73, 0.485 93.82 11 N/A N/A +NC_014363.fasta d__Bacteria;p__Actinomycetota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__Olsenella;s__Olsenella uli GCF_000143845.1 95.0 d__Bacteria;p__Actinomycetota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__Olsenella;s__Olsenella uli 100.0 1.0 GCF_000143845.1 95.0 d__Bacteria;p__Actinomycetota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__Olsenella;s__Olsenella uli 100.0 1.0 d__Bacteria;p__Actinomycetota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 95.47 11 N/A N/A +NC_014364.fasta d__Bacteria;p__Spirochaetota;c__Spirochaetia;o__DSM-16054;f__Sediminispirochaetaceae;g__Sediminispirochaeta;s__Sediminispirochaeta smaragdinae GCF_000143985.1 95.0 d__Bacteria;p__Spirochaetota;c__Spirochaetia;o__DSM-16054;f__Sediminispirochaetaceae;g__Sediminispirochaeta;s__Sediminispirochaeta smaragdinae 100.0 1.0 GCF_000143985.1 95.0 d__Bacteria;p__Spirochaetota;c__Spirochaetia;o__DSM-16054;f__Sediminispirochaetaceae;g__Sediminispirochaeta;s__Sediminispirochaeta smaragdinae 100.0 1.0 d__Bacteria;p__Spirochaetota;c__Spirochaetia;o__DSM-16054;f__Sediminispirochaetaceae;g__Sediminispirochaeta;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_000378205.1, 
s__Sediminispirochaeta bajacaliforniensis, 95.0, 94.41, 0.82 88.42 11 N/A N/A +NC_015761.fasta d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella bongori GCF_000252995.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella bongori 100.0 1.0 GCF_000252995.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella bongori 100.0 1.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_000006945.2, s__Salmonella enterica, 95.0, 90.34, 0.758; GCA_900478215.1, s__Salmonella houtenae, 95.0, 90.05, 0.755; GCF_008692785.1, s__Salmonella diarizonae, 95.0, 89.91, 0.746; GCF_008692845.1, s__Salmonella arizonae, 95.0, 89.51, 0.704 97.58 11 N/A N/A +NC_017033.fasta d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__Frateuria;s__Frateuria aurantia GCF_000242255.2 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__Frateuria;s__Frateuria aurantia 100.0 1.0 GCF_000242255.2 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__Frateuria;s__Frateuria aurantia 100.0 1.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 98.45 11 N/A N/A +NC_017095.fasta d__Bacteria;p__Thermotogota;c__Thermotogae;o__Thermotogales;f__Fervidobacteriaceae;g__Fervidobacterium;s__Fervidobacterium pennivorans GCF_000235405.2 95.0 
d__Bacteria;p__Thermotogota;c__Thermotogae;o__Thermotogales;f__Fervidobacteriaceae;g__Fervidobacterium;s__Fervidobacterium pennivorans 100.0 1.0 GCF_000235405.2 95.0 d__Bacteria;p__Thermotogota;c__Thermotogae;o__Thermotogales;f__Fervidobacteriaceae;g__Fervidobacterium;s__Fervidobacterium pennivorans 100.0 1.0 d__Bacteria;p__Thermotogota;c__Thermotogae;o__Thermotogales;f__Fervidobacteriaceae;g__Fervidobacterium;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_001644665.1, s__Fervidobacterium pennivorans_A, 95.0, 92.25, 0.827 90.82 11 N/A N/A +NC_018014.fasta d__Bacteria;p__Acidobacteriota;c__Terriglobia;o__Terriglobales;f__Acidobacteriaceae;g__Terriglobus;s__Terriglobus roseus GCF_000265425.1 95.0 d__Bacteria;p__Acidobacteriota;c__Terriglobia;o__Terriglobales;f__Acidobacteriaceae;g__Terriglobus;s__Terriglobus roseus 100.0 1.0 GCF_000265425.1 95.0 d__Bacteria;p__Acidobacteriota;c__Terriglobia;o__Terriglobales;f__Acidobacteriaceae;g__Terriglobus;s__Terriglobus roseus 100.0 1.0 d__Bacteria;p__Acidobacteriota;c__Terriglobia;o__Terriglobales;f__Acidobacteriaceae;g__Terriglobus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_900105625.1, s__Terriglobus roseus_B, 95.0, 82.2, 0.254 94.06 11 N/A N/A +NC_018068.fasta d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus acidiphilus GCF_000255115.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus acidiphilus 100.0 1.0 GCF_000255115.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus acidiphilus 100.0 1.0 
d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_002196705.1, s__Desulfosporosinus sp002196705, 95.0, 81.15, 0.156 93.92 11 N/A N/A +NC_018515.fasta d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus meridiei GCF_000231385.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus meridiei 100.0 1.0 GCF_000231385.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus meridiei 100.0 1.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_900100785.1, s__Desulfosporosinus hippei, 95.0, 94.11, 0.765; GCF_000765145.1, s__Desulfosporosinus sp000765145, 95.0, 92.48, 0.692 94.5 11 N/A N/A +NC_019897.fasta d__Bacteria;p__Bacillota;c__Bacilli;o__Paenibacillales;f__Paenibacillaceae;g__Thermobacillus;s__Thermobacillus xylanilyticus GCF_907165215.1 95.0 d__Bacteria;p__Bacillota;c__Bacilli;o__Paenibacillales;f__Paenibacillaceae;g__Thermobacillus;s__Thermobacillus xylanilyticus 97.81 0.815 GCF_907165215.1 95.0 d__Bacteria;p__Bacillota;c__Bacilli;o__Paenibacillales;f__Paenibacillaceae;g__Thermobacillus;s__Thermobacillus xylanilyticus 97.81 0.815 d__Bacteria;p__Bacillota;c__Bacilli;o__Paenibacillales;f__Paenibacillaceae;g__Thermobacillus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_002159085.1, s__Thermobacillus sp002159085, 95.0, 85.65, 0.466 95.97 11 N/A 
N/A +NC_019904.fasta d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Cyclobacteriaceae;g__Echinicola;s__Echinicola vietnamensis GCF_000325705.1 95.0 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Cyclobacteriaceae;g__Echinicola;s__Echinicola vietnamensis 100.0 1.0 GCF_000325705.1 95.0 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Cyclobacteriaceae;g__Echinicola;s__Echinicola vietnamensis 100.0 1.0 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Cyclobacteriaceae;g__Echinicola;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_005281475.1, s__Echinicola rosea, 95.0, 83.28, 0.288; GCF_006575665.1, s__Echinicola soli, 95.0, 81.29, 0.203; GCF_003260975.1, s__Echinicola strongylocentroti, 95.0, 80.38, 0.164 96.96 11 N/A N/A +NC_019936.fasta d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Stutzerimonas;s__Stutzerimonas stutzeri_AE GCF_000327065.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Stutzerimonas;s__Stutzerimonas stutzeri_AE 100.0 1.0 GCF_000327065.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Stutzerimonas;s__Stutzerimonas stutzeri_AE 100.0 1.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Stutzerimonas;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_003696315.1, s__Stutzerimonas songnenensis, 95.0, 93.2, 0.909; GCF_000661915.1, s__Stutzerimonas decontaminans, 95.0, 91.33, 0.803; GCF_900114065.1, s__Stutzerimonas kunmingensis, 96.6, 90.17, 0.753; GCF_024397475.1, s__Stutzerimonas xanthomarina_A, 97.0, 90.15, 0.746; GCA_007713455.1, s__Stutzerimonas sp007713455, 95.0, 90.03, 0.748; GCF_014764705.1, s__Stutzerimonas sp002692525, 96.82, 90.01, 0.736; 
GCA_003530955.1, s__Stutzerimonas sp003530955, 96.79, 89.98, 0.727; GCF_002929225.1, s__Stutzerimonas stutzeri_U, 97.04, 89.97, 0.743; GCF_000935215.1, s__Stutzerimonas stutzeri_AD, 96.98, 89.84, 0.723; GCF_000495915.1, s__Stutzerimonas chloritidismutans, 96.78, 89.84, 0.717; GCA_018823765.1, s__Stutzerimonas sp018823765, 97.02, 89.77, 0.71; GCA_003488145.1, s__Stutzerimonas sp003488145, 96.59, 89.74, 0.666; GCF_000341615.1, s__Stutzerimonas stutzeri_G, 95.0, 89.32, 0.697; GCF_002890795.1, s__Stutzerimonas stutzeri_AA, 95.0, 89.28, 0.735; GCF_024448335.1, s__Stutzerimonas frequens, 95.0, 88.75, 0.714; GCF_015291885.1, s__Stutzerimonas stutzeri_AC, 95.0, 88.59, 0.662; GCF_000219605.1, s__Stutzerimonas stutzeri, 95.0, 88.34, 0.673; GCF_002909485.1, s__Stutzerimonas stutzeri_AH, 95.0, 88.19, 0.688; GCF_000307775.2, s__Stutzerimonas stutzeri_B, 95.0, 86.95, 0.549; GCF_002890915.1, s__Stutzerimonas stutzeri_AF, 95.0, 85.91, 0.472; GCF_025966695.1, s__Stutzerimonas sp025966695, 95.0, 85.39, 0.44; GCA_000263395.1, s__Stutzerimonas stutzeri_C, 95.0, 85.38, 0.392; GCF_024448505.1, s__Stutzerimonas degradans, 96.08, 85.26, 0.426; GCF_021432765.1, s__Stutzerimonas phenolilytica, 95.0, 85.11, 0.435; GCF_015070855.1, s__Stutzerimonas lopnurensis, 95.0, 85.07, 0.41; GCF_021726475.1, s__Stutzerimonas oligotrophica, 95.0, 84.76, 0.419; GCF_000818015.1, s__Stutzerimonas balearica, 95.0, 83.88, 0.33; GCF_003696285.1, s__Stutzerimonas nitrititolerans, 95.0, 82.51, 0.26; GCF_018138085.1, s__Stutzerimonas stutzeri_AI, 95.0, 82.03, 0.245; GCF_005876855.1, s__Stutzerimonas nosocomialis, 95.0, 81.99, 0.242; GCF_019090095.1, s__Stutzerimonas stutzeri_AN, 95.0, 81.91, 0.245; GCF_013522825.1, s__Stutzerimonas stutzeri_AK, 95.0, 81.91, 0.2; GCA_002339675.1, s__Stutzerimonas stutzeri_O, 95.0, 81.89, 0.253; GCF_019355055.1, s__Stutzerimonas sp004331835, 95.0, 81.83, 0.222; GCF_024448955.1, s__Stutzerimonas stutzeri_AQ, 95.0, 81.79, 0.232; GCF_024448695.1, s__Stutzerimonas stutzeri_T, 95.0, 
81.71, 0.242; GCF_022810315.1, s__Stutzerimonas marianensis, 95.0, 81.66, 0.218; GCF_024448935.1, s__Stutzerimonas stutzeri_AO, 95.0, 81.57, 0.177; GCF_009789555.1, s__Stutzerimonas stutzeri_R, 95.0, 81.41, 0.22; GCA_022448005.1, s__Stutzerimonas sp022448005, 95.0, 81.35, 0.194; GCF_003325755.1, s__Stutzerimonas sp003325755, 95.0, 81.28, 0.206; GCA_004010935.1, s__Stutzerimonas sp004010935, 95.0, 81.27, 0.216; GCA_002387205.1, s__Stutzerimonas stutzeri_N, 95.0, 81.26, 0.195; GCF_024448985.1, s__Stutzerimonas stutzeri_AP, 95.0, 81.22, 0.203; GCF_000425625.1, s__Stutzerimonas azotifigens, 95.0, 81.18, 0.187; GCF_000756775.1, s__Stutzerimonas saudiphocaensis, 95.0, 81.13, 0.219; GCF_000952685.1, s__Stutzerimonas stutzeri_E, 95.0, 81.08, 0.216; GCF_000235745.1, s__Stutzerimonas stutzeri_H, 95.0, 81.07, 0.199; GCF_013522725.1, s__Stutzerimonas azotifigens_A, 95.0, 80.99, 0.203; GCA_900766265.1, s__Stutzerimonas sp900766265, 95.0, 80.94, 0.196; GCF_024448895.1, s__Stutzerimonas stutzeri_Q, 95.0, 80.91, 0.193; GCF_900129835.1, s__Stutzerimonas xanthomarina, 95.0, 80.89, 0.2; GCF_002890895.1, s__Stutzerimonas stutzeri_AB, 95.0, 80.87, 0.198; GCF_019880365.1, s__Stutzerimonas stutzeri_P, 95.0, 80.81, 0.187; GCA_002345575.1, s__Stutzerimonas stutzeri_S, 95.0, 80.74, 0.191; GCF_013620795.1, s__Stutzerimonas sp013620795, 95.0, 80.74, 0.187; GCF_024448715.1, s__Stutzerimonas stutzeri_AR, 95.0, 80.6, 0.18; GCF_000590475.1, s__Stutzerimonas stutzeri_D, 95.0, 80.4, 0.167; GCF_003696365.1, s__Stutzerimonas zhaodongensis, 95.0, 79.95, 0.152 97.72 11 N/A N/A +NC_021184.fasta d__Bacteria;p__Bacillota_B;c__Desulfotomaculia;o__Desulfotomaculales;f__Desulfallaceae;g__Sporotomaculum;s__Sporotomaculum gibsoniae GCF_000233715.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfotomaculia;o__Desulfotomaculales;f__Desulfallaceae;g__Sporotomaculum;s__Sporotomaculum gibsoniae 100.0 1.0 GCF_000233715.2 95.0 
d__Bacteria;p__Bacillota_B;c__Desulfotomaculia;o__Desulfotomaculales;f__Desulfallaceae;g__Sporotomaculum;s__Sporotomaculum gibsoniae 100.0 1.0 d__Bacteria;p__Bacillota_B;c__Desulfotomaculia;o__Desulfotomaculales;f__Desulfallaceae;g__Sporotomaculum;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 95.73 11 N/A N/A diff -r 315c2ed31af1 -r 3a7f73d638ba test-data/gtdbtk.bac120.summary --- a/test-data/gtdbtk.bac120.summary Wed Jun 04 17:36:40 2025 +0000 +++ b/test-data/gtdbtk.bac120.summary Tue Jul 22 11:09:24 2025 +0000 @@ -1,23 +1,23 @@ user_genome classification closest_genome_reference closest_genome_reference_radius closest_genome_taxonomy closest_genome_ani closest_genome_af closest_placement_reference closest_placement_radius closest_placement_taxonomy closest_placement_ani closest_placement_af pplacer_taxonomy classification_method note other_related_references(genome_id,species_name,radius,ANI,AF) msa_percent translation_table red_value warnings -NC_000913.fasta d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli GCF_003697165.2 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli 96.74 0.856 GCF_000026225.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia fergusonii 91.54 0.56 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__ taxonomic classification defined by topology and ANI N/A GCF_000194175.1, s__Escherichia coli_F, 95.0, 95.46, 0.89; GCF_002965065.1, s__Escherichia sp002965065, 95.0, 94.52, 0.691; GCF_004211955.1, s__Escherichia sp004211955, 95.0, 93.12, 0.774; GCF_005843885.1, s__Escherichia sp005843885, 95.0, 92.76, 0.782; GCF_011881725.1, s__Escherichia coli_E, 95.0, 92.37, 
0.807; GCF_029876145.1, s__Escherichia ruysiae, 95.0, 92.28, 0.788; GCF_014836715.1, s__Escherichia whittamii, 95.0, 91.78, 0.782; GCF_002900365.1, s__Escherichia marmotae, 95.0, 90.92, 0.738; GCF_000759775.1, s__Escherichia albertii, 95.0, 90.18, 0.68 98.47 11 N/A N/A -NC_002737.fasta d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__Streptococcus pyogenes GCF_002055535.1 95.0 d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__Streptococcus pyogenes 99.7 0.968 GCF_002055535.1 95.0 d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__Streptococcus pyogenes 99.7 0.968 d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_900459225.1, s__Streptococcus dysgalactiae, 95.0, 88.16, 0.456; GCF_900636575.1, s__Streptococcus canis, 95.0, 86.81, 0.468 98.31 11 N/A N/A -NC_003450.fasta d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__Corynebacterium glutamicum GCF_000011325.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__Corynebacterium glutamicum 100.0 1.0 GCF_000011325.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__Corynebacterium glutamicum 100.0 1.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_002355155.1, s__Corynebacterium suranareeae, 95.0, 86.55, 0.531; GCF_001643015.1, s__Corynebacterium crudilactis, 95.0, 84.07, 0.363; GCF_001277995.1, s__Corynebacterium deserti, 95.0, 83.77, 0.275 96.43 11 N/A N/A 
-NC_008261.fasta d__Bacteria;p__Bacillota_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Sarcina;s__Sarcina perfringens GCF_000013285.1 95.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Sarcina;s__Sarcina perfringens 100.0 1.0 GCF_000013285.1 95.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Sarcina;s__Sarcina perfringens 100.0 1.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Sarcina;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_029258205.1, s__Sarcina sp029258205, 95.0, 92.75, 0.744; GCF_029267215.1, s__Sarcina sp029267215, 95.0, 84.03, 0.29 94.4 11 N/A N/A -NC_009012.fasta d__Bacteria;p__Bacillota_A;c__Clostridia;o__Acetivibrionales;f__Acetivibrionaceae;g__Hungateiclostridium;s__Hungateiclostridium thermocellum GCF_000015865.1 95.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Acetivibrionales;f__Acetivibrionaceae;g__Hungateiclostridium;s__Hungateiclostridium thermocellum 100.0 1.0 GCF_000015865.1 95.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Acetivibrionales;f__Acetivibrionaceae;g__Hungateiclostridium;s__Hungateiclostridium thermocellum 100.0 1.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Acetivibrionales;f__Acetivibrionaceae;g__Hungateiclostridium;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_000521465.1, s__Hungateiclostridium straminisolvens, 95.0, 84.49, 0.429; GCF_004102745.1, s__Hungateiclostridium mesophilum, 95.0, 81.39, 0.257 94.22 11 N/A N/A -NC_012982.fasta d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;f__Hyphomonadaceae;g__Hirschia;s__Hirschia baltica GCF_000023785.1 95.0 d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;f__Hyphomonadaceae;g__Hirschia;s__Hirschia baltica 100.0 1.0 GCF_000023785.1 95.0 
d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;f__Hyphomonadaceae;g__Hirschia;s__Hirschia baltica 100.0 1.0 d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;f__Hyphomonadaceae;g__Hirschia;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 98.25 11 N/A N/A -NC_014008.fasta d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Opitutales;f__Coraliomargaritaceae;g__Coraliomargarita;s__Coraliomargarita akajimensis GCF_000025905.1 95.0 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Opitutales;f__Coraliomargaritaceae;g__Coraliomargarita;s__Coraliomargarita akajimensis 100.0 1.0 GCF_000025905.1 95.0 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Opitutales;f__Coraliomargaritaceae;g__Coraliomargarita;s__Coraliomargarita akajimensis 100.0 1.0 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Opitutales;f__Coraliomargaritaceae;g__;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 90.76 11 N/A N/A -NC_014168.fasta d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Segniliparus;s__Segniliparus rotundus GCF_000092825.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Segniliparus;s__Segniliparus rotundus 100.0 1.0 GCF_000092825.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Segniliparus;s__Segniliparus rotundus 100.0 1.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Segniliparus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_000185725.2, s__Segniliparus rugosus, 95.0, 80.18, 0.178 96.56 11 N/A N/A -NC_014211.fasta 
d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Streptosporangiales;f__Streptosporangiaceae;g__Nocardiopsis;s__Nocardiopsis dassonvillei GCF_000092985.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Streptosporangiales;f__Streptosporangiaceae;g__Nocardiopsis;s__Nocardiopsis dassonvillei 100.0 1.0 GCF_000092985.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Streptosporangiales;f__Streptosporangiaceae;g__Nocardiopsis;s__Nocardiopsis dassonvillei 100.0 1.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Streptosporangiales;f__Streptosporangiaceae;g__Nocardiopsis;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_012396365.1, s__Nocardiopsis alborubida, 95.0, 94.89, 0.822; GCA_000340945.1, s__Nocardiopsis synnemataformans, 95.0, 94.67, 0.838; GCA_002529455.1, s__Nocardiopsis fusca, 95.0, 93.48, 0.743; GCF_000341065.1, s__Nocardiopsis halotolerans, 95.0, 89.3, 0.575; GCF_001905145.1, s__Nocardiopsis sp001905145, 95.0, 89.08, 0.554; GCF_008638415.1, s__Nocardiopsis sinuspersici, 95.0, 88.61, 0.531; GCF_008638365.1, s__Nocardiopsis quinghaiensis, 95.0, 88.23, 0.498; GCF_009830945.1, s__Nocardiopsis sp009830945, 95.0, 87.99, 0.442; GCA_937957845.1, s__Nocardiopsis sp937957845, 95.0, 87.38, 0.277; GCF_026642255.1, s__Nocardiopsis nanhaiensis_A, 95.0, 86.06, 0.436; GCF_030271535.1, s__Nocardiopsis sp030271535, 95.0, 84.81, 0.338; GCF_013410755.1, s__Nocardiopsis aegyptia, 95.0, 84.69, 0.277; GCF_018316655.1, s__Nocardiopsis changdeensis, 95.0, 84.65, 0.301; GCF_000341125.1, s__Nocardiopsis lucentensis, 95.0, 84.45, 0.226; GCF_001279585.1, s__Nocardiopsis sp001279585, 95.0, 84.4, 0.267; GCA_018388625.1, s__Nocardiopsis eucommiae, 95.0, 84.39, 0.264; GCF_014201115.1, s__Nocardiopsis metallicus, 95.0, 84.3, 0.268; GCF_003634495.1, s__Nocardiopsis sp003634495, 95.0, 84.18, 0.288; GCF_030766825.1, s__Nocardiopsis sp030766825, 95.0, 84.18, 0.286; GCF_000341085.1, s__Nocardiopsis 
ganjiahuensis, 95.0, 84.16, 0.28; GCF_030555055.1, s__Nocardiopsis sp030555055, 95.0, 84.06, 0.267; GCF_024134545.1, s__Nocardiopsis exhalans, 95.08, 84.04, 0.267; GCF_900141985.1, s__Nocardiopsis flavescens, 95.0, 83.82, 0.272; GCF_003386285.1, s__Nocardiopsis sp003386285, 95.0, 83.73, 0.264; GCF_014651695.1, s__Nocardiopsis terrae, 95.0, 83.6, 0.27; GCF_014203695.1, s__Nocardiopsis algeriensis, 95.0, 83.59, 0.238; GCF_020741345.1, s__Nocardiopsis listeri_A, 95.0, 83.53, 0.202; GCF_000341225.1, s__Nocardiopsis alba, 95.0, 83.47, 0.217; GCF_018207095.1, s__Nocardiopsis sp018207095, 95.0, 83.32, 0.229; GCF_028882275.1, s__Nocardiopsis sp028882275, 95.0, 83.25, 0.266; GCF_900143625.1, s__Nocardiopsis sp900143625, 95.0, 83.09, 0.229; GCF_000515115.1, s__Nocardiopsis sp000515115, 95.0, 83.02, 0.181; GCF_014892575.1, s__Nocardiopsis coralli, 95.0, 82.85, 0.213; GCF_000341265.1, s__Nocardiopsis prasina, 95.0, 82.63, 0.245; GCF_001942255.1, s__Nocardiopsis sp001942255, 95.0, 82.61, 0.248; GCF_000341005.1, s__Nocardiopsis alkaliphila, 95.0, 82.53, 0.186; GCF_001570765.1, s__Nocardiopsis listeri, 95.0, 82.26, 0.198; GCF_000341025.1, s__Nocardiopsis salina, 95.0, 82.22, 0.158; GCF_000341145.1, s__Nocardiopsis xinjiangensis, 95.0, 81.14, 0.168 26.24 11 N/A N/A -NC_014212.fasta d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Allomeiothermus;s__Allomeiothermus silvanus GCF_000092125.1 95.0 d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Allomeiothermus;s__Allomeiothermus silvanus 100.0 0.999 GCF_000092125.1 95.0 d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Allomeiothermus;s__Allomeiothermus silvanus 100.0 0.999 d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Allomeiothermus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_003226535.1, s__Allomeiothermus sp003226535, 95.0, 87.73, 0.485 
93.82 11 N/A N/A -NC_014363.fasta d__Bacteria;p__Actinomycetota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__Olsenella;s__Olsenella uli GCF_000143845.1 95.0 d__Bacteria;p__Actinomycetota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__Olsenella;s__Olsenella uli 100.0 1.0 GCF_000143845.1 95.0 d__Bacteria;p__Actinomycetota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__Olsenella;s__Olsenella uli 100.0 1.0 d__Bacteria;p__Actinomycetota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 95.47 11 N/A N/A -NC_014364.fasta d__Bacteria;p__Spirochaetota;c__Spirochaetia;o__DSM-16054;f__Sediminispirochaetaceae;g__Sediminispirochaeta;s__Sediminispirochaeta smaragdinae GCF_000143985.1 95.0 d__Bacteria;p__Spirochaetota;c__Spirochaetia;o__DSM-16054;f__Sediminispirochaetaceae;g__Sediminispirochaeta;s__Sediminispirochaeta smaragdinae 100.0 1.0 GCF_000143985.1 95.0 d__Bacteria;p__Spirochaetota;c__Spirochaetia;o__DSM-16054;f__Sediminispirochaetaceae;g__Sediminispirochaeta;s__Sediminispirochaeta smaragdinae 100.0 1.0 d__Bacteria;p__Spirochaetota;c__Spirochaetia;o__DSM-16054;f__Sediminispirochaetaceae;g__Sediminispirochaeta;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_000378205.1, s__Sediminispirochaeta bajacaliforniensis, 95.0, 94.41, 0.82 88.42 11 N/A N/A -NC_015761.fasta d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella bongori GCF_000252995.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella bongori 100.0 1.0 GCF_000252995.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella bongori 100.0 1.0 
d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_000006945.2, s__Salmonella enterica, 95.0, 90.34, 0.758; GCA_900478215.1, s__Salmonella houtenae, 95.0, 90.05, 0.755; GCF_008692785.1, s__Salmonella diarizonae, 95.0, 89.91, 0.746; GCF_008692845.1, s__Salmonella arizonae, 95.0, 89.51, 0.704 97.58 11 N/A N/A -NC_017033.fasta d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__Frateuria;s__Frateuria aurantia GCF_000242255.2 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__Frateuria;s__Frateuria aurantia 100.0 1.0 GCF_000242255.2 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__Frateuria;s__Frateuria aurantia 100.0 1.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 98.45 11 N/A N/A -NC_017095.fasta d__Bacteria;p__Thermotogota;c__Thermotogae;o__Thermotogales;f__Fervidobacteriaceae;g__Fervidobacterium;s__Fervidobacterium pennivorans GCF_000235405.2 95.0 d__Bacteria;p__Thermotogota;c__Thermotogae;o__Thermotogales;f__Fervidobacteriaceae;g__Fervidobacterium;s__Fervidobacterium pennivorans 100.0 1.0 GCF_000235405.2 95.0 d__Bacteria;p__Thermotogota;c__Thermotogae;o__Thermotogales;f__Fervidobacteriaceae;g__Fervidobacterium;s__Fervidobacterium pennivorans 100.0 1.0 d__Bacteria;p__Thermotogota;c__Thermotogae;o__Thermotogales;f__Fervidobacteriaceae;g__Fervidobacterium;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_001644665.1, s__Fervidobacterium pennivorans_A, 95.0, 92.25, 0.827 90.82 11 N/A N/A 
-NC_018014.fasta d__Bacteria;p__Acidobacteriota;c__Terriglobia;o__Terriglobales;f__Acidobacteriaceae;g__Terriglobus;s__Terriglobus roseus GCF_000265425.1 95.0 d__Bacteria;p__Acidobacteriota;c__Terriglobia;o__Terriglobales;f__Acidobacteriaceae;g__Terriglobus;s__Terriglobus roseus 100.0 1.0 GCF_000265425.1 95.0 d__Bacteria;p__Acidobacteriota;c__Terriglobia;o__Terriglobales;f__Acidobacteriaceae;g__Terriglobus;s__Terriglobus roseus 100.0 1.0 d__Bacteria;p__Acidobacteriota;c__Terriglobia;o__Terriglobales;f__Acidobacteriaceae;g__Terriglobus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_900105625.1, s__Terriglobus roseus_B, 95.0, 82.2, 0.254 94.06 11 N/A N/A -NC_018068.fasta d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus acidiphilus GCF_000255115.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus acidiphilus 100.0 1.0 GCF_000255115.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus acidiphilus 100.0 1.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_002196705.1, s__Desulfosporosinus sp002196705, 95.0, 81.15, 0.156 93.92 11 N/A N/A -NC_018515.fasta d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus meridiei GCF_000231385.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus meridiei 100.0 1.0 GCF_000231385.2 95.0 
d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus meridiei 100.0 1.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_900100785.1, s__Desulfosporosinus hippei, 95.0, 94.11, 0.765; GCF_000765145.1, s__Desulfosporosinus sp000765145, 95.0, 92.48, 0.692 94.5 11 N/A N/A -NC_019897.fasta d__Bacteria;p__Bacillota;c__Bacilli;o__Paenibacillales;f__Paenibacillaceae;g__Thermobacillus;s__Thermobacillus xylanilyticus GCF_907165215.1 95.0 d__Bacteria;p__Bacillota;c__Bacilli;o__Paenibacillales;f__Paenibacillaceae;g__Thermobacillus;s__Thermobacillus xylanilyticus 97.81 0.815 GCF_907165215.1 95.0 d__Bacteria;p__Bacillota;c__Bacilli;o__Paenibacillales;f__Paenibacillaceae;g__Thermobacillus;s__Thermobacillus xylanilyticus 97.81 0.815 d__Bacteria;p__Bacillota;c__Bacilli;o__Paenibacillales;f__Paenibacillaceae;g__Thermobacillus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_002159085.1, s__Thermobacillus sp002159085, 95.0, 85.65, 0.466 95.97 11 N/A N/A -NC_019904.fasta d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Cyclobacteriaceae;g__Echinicola;s__Echinicola vietnamensis GCF_000325705.1 95.0 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Cyclobacteriaceae;g__Echinicola;s__Echinicola vietnamensis 100.0 1.0 GCF_000325705.1 95.0 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Cyclobacteriaceae;g__Echinicola;s__Echinicola vietnamensis 100.0 1.0 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Cyclobacteriaceae;g__Echinicola;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_005281475.1, s__Echinicola 
rosea, 95.0, 83.28, 0.288; GCF_006575665.1, s__Echinicola soli, 95.0, 81.29, 0.203; GCF_003260975.1, s__Echinicola strongylocentroti, 95.0, 80.38, 0.164 96.96 11 N/A N/A -NC_019936.fasta d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Stutzerimonas;s__Stutzerimonas stutzeri_AE GCF_000327065.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Stutzerimonas;s__Stutzerimonas stutzeri_AE 100.0 1.0 GCF_000327065.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Stutzerimonas;s__Stutzerimonas stutzeri_AE 100.0 1.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Stutzerimonas;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_003696315.1, s__Stutzerimonas songnenensis, 95.0, 93.2, 0.909; GCF_000661915.1, s__Stutzerimonas decontaminans, 95.0, 91.33, 0.803; GCF_900114065.1, s__Stutzerimonas kunmingensis, 96.6, 90.17, 0.753; GCF_024397475.1, s__Stutzerimonas xanthomarina_A, 97.0, 90.15, 0.746; GCA_007713455.1, s__Stutzerimonas sp007713455, 95.0, 90.03, 0.748; GCF_014764705.1, s__Stutzerimonas sp002692525, 96.82, 90.01, 0.736; GCA_003530955.1, s__Stutzerimonas sp003530955, 96.79, 89.98, 0.727; GCF_002929225.1, s__Stutzerimonas stutzeri_U, 97.04, 89.97, 0.743; GCF_000935215.1, s__Stutzerimonas stutzeri_AD, 96.98, 89.84, 0.723; GCF_000495915.1, s__Stutzerimonas chloritidismutans, 96.78, 89.84, 0.717; GCA_018823765.1, s__Stutzerimonas sp018823765, 97.02, 89.77, 0.71; GCA_003488145.1, s__Stutzerimonas sp003488145, 96.59, 89.74, 0.666; GCF_000341615.1, s__Stutzerimonas stutzeri_G, 95.0, 89.32, 0.697; GCF_002890795.1, s__Stutzerimonas stutzeri_AA, 95.0, 89.28, 0.735; GCF_024448335.1, s__Stutzerimonas frequens, 95.0, 88.75, 0.714; GCF_015291885.1, s__Stutzerimonas stutzeri_AC, 95.0, 88.59, 0.662; GCF_000219605.1, 
s__Stutzerimonas stutzeri, 95.0, 88.34, 0.673; GCF_002909485.1, s__Stutzerimonas stutzeri_AH, 95.0, 88.19, 0.688; GCF_000307775.2, s__Stutzerimonas stutzeri_B, 95.0, 86.95, 0.549; GCF_002890915.1, s__Stutzerimonas stutzeri_AF, 95.0, 85.91, 0.472; GCF_025966695.1, s__Stutzerimonas sp025966695, 95.0, 85.39, 0.44; GCA_000263395.1, s__Stutzerimonas stutzeri_C, 95.0, 85.38, 0.392; GCF_024448505.1, s__Stutzerimonas degradans, 96.08, 85.26, 0.426; GCF_021432765.1, s__Stutzerimonas phenolilytica, 95.0, 85.11, 0.435; GCF_015070855.1, s__Stutzerimonas lopnurensis, 95.0, 85.07, 0.41; GCF_021726475.1, s__Stutzerimonas oligotrophica, 95.0, 84.76, 0.419; GCF_000818015.1, s__Stutzerimonas balearica, 95.0, 83.88, 0.33; GCF_003696285.1, s__Stutzerimonas nitrititolerans, 95.0, 82.51, 0.26; GCF_018138085.1, s__Stutzerimonas stutzeri_AI, 95.0, 82.03, 0.245; GCF_005876855.1, s__Stutzerimonas nosocomialis, 95.0, 81.99, 0.242; GCF_019090095.1, s__Stutzerimonas stutzeri_AN, 95.0, 81.91, 0.245; GCF_013522825.1, s__Stutzerimonas stutzeri_AK, 95.0, 81.91, 0.2; GCA_002339675.1, s__Stutzerimonas stutzeri_O, 95.0, 81.89, 0.253; GCF_019355055.1, s__Stutzerimonas sp004331835, 95.0, 81.83, 0.222; GCF_024448955.1, s__Stutzerimonas stutzeri_AQ, 95.0, 81.79, 0.232; GCF_024448695.1, s__Stutzerimonas stutzeri_T, 95.0, 81.71, 0.242; GCF_022810315.1, s__Stutzerimonas marianensis, 95.0, 81.66, 0.218; GCF_024448935.1, s__Stutzerimonas stutzeri_AO, 95.0, 81.57, 0.177; GCF_009789555.1, s__Stutzerimonas stutzeri_R, 95.0, 81.41, 0.22; GCA_022448005.1, s__Stutzerimonas sp022448005, 95.0, 81.35, 0.194; GCF_003325755.1, s__Stutzerimonas sp003325755, 95.0, 81.28, 0.206; GCA_004010935.1, s__Stutzerimonas sp004010935, 95.0, 81.27, 0.216; GCA_002387205.1, s__Stutzerimonas stutzeri_N, 95.0, 81.26, 0.195; GCF_024448985.1, s__Stutzerimonas stutzeri_AP, 95.0, 81.22, 0.203; GCF_000425625.1, s__Stutzerimonas azotifigens, 95.0, 81.18, 0.187; GCF_000756775.1, s__Stutzerimonas saudiphocaensis, 95.0, 81.13, 0.219; 
GCF_000952685.1, s__Stutzerimonas stutzeri_E, 95.0, 81.08, 0.216; GCF_000235745.1, s__Stutzerimonas stutzeri_H, 95.0, 81.07, 0.199; GCF_013522725.1, s__Stutzerimonas azotifigens_A, 95.0, 80.99, 0.203; GCA_900766265.1, s__Stutzerimonas sp900766265, 95.0, 80.94, 0.196; GCF_024448895.1, s__Stutzerimonas stutzeri_Q, 95.0, 80.91, 0.193; GCF_900129835.1, s__Stutzerimonas xanthomarina, 95.0, 80.89, 0.2; GCF_002890895.1, s__Stutzerimonas stutzeri_AB, 95.0, 80.87, 0.198; GCF_019880365.1, s__Stutzerimonas stutzeri_P, 95.0, 80.81, 0.187; GCA_002345575.1, s__Stutzerimonas stutzeri_S, 95.0, 80.74, 0.191; GCF_013620795.1, s__Stutzerimonas sp013620795, 95.0, 80.74, 0.187; GCF_024448715.1, s__Stutzerimonas stutzeri_AR, 95.0, 80.6, 0.18; GCF_000590475.1, s__Stutzerimonas stutzeri_D, 95.0, 80.4, 0.167; GCF_003696365.1, s__Stutzerimonas zhaodongensis, 95.0, 79.95, 0.152 97.72 11 N/A N/A -NC_021184.fasta d__Bacteria;p__Bacillota_B;c__Desulfotomaculia;o__Desulfotomaculales;f__Desulfallaceae;g__Sporotomaculum;s__Sporotomaculum gibsoniae GCF_000233715.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfotomaculia;o__Desulfotomaculales;f__Desulfallaceae;g__Sporotomaculum;s__Sporotomaculum gibsoniae 100.0 1.0 GCF_000233715.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfotomaculia;o__Desulfotomaculales;f__Desulfallaceae;g__Sporotomaculum;s__Sporotomaculum gibsoniae 100.0 1.0 d__Bacteria;p__Bacillota_B;c__Desulfotomaculia;o__Desulfotomaculales;f__Desulfallaceae;g__Sporotomaculum;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 95.73 11 N/A N/A +NC_000913 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli GCF_003697165.2 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli 96.74 0.856 GCF_000026225.1 95.0 
d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia fergusonii 91.54 0.56 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__ taxonomic classification defined by topology and ANI N/A GCF_000194175.1, s__Escherichia coli_F, 95.0, 95.46, 0.89; GCF_002965065.1, s__Escherichia sp002965065, 95.0, 94.52, 0.691; GCF_004211955.1, s__Escherichia sp004211955, 95.0, 93.12, 0.774; GCF_005843885.1, s__Escherichia sp005843885, 95.0, 92.76, 0.782; GCF_011881725.1, s__Escherichia coli_E, 95.0, 92.37, 0.807; GCF_029876145.1, s__Escherichia ruysiae, 95.0, 92.28, 0.788; GCF_014836715.1, s__Escherichia whittamii, 95.0, 91.78, 0.782; GCF_002900365.1, s__Escherichia marmotae, 95.0, 90.92, 0.738; GCF_000759775.1, s__Escherichia albertii, 95.0, 90.18, 0.68 98.47 11 N/A N/A +NC_002737 d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__Streptococcus pyogenes GCF_002055535.1 95.0 d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__Streptococcus pyogenes 99.7 0.968 GCF_002055535.1 95.0 d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__Streptococcus pyogenes 99.7 0.968 d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_900459225.1, s__Streptococcus dysgalactiae, 95.0, 88.16, 0.456; GCF_900636575.1, s__Streptococcus canis, 95.0, 86.81, 0.468 98.31 11 N/A N/A +NC_003450 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__Corynebacterium glutamicum GCF_000011325.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__Corynebacterium glutamicum 100.0 1.0 
GCF_000011325.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__Corynebacterium glutamicum 100.0 1.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_002355155.1, s__Corynebacterium suranareeae, 95.0, 86.55, 0.531; GCF_001643015.1, s__Corynebacterium crudilactis, 95.0, 84.07, 0.363; GCF_001277995.1, s__Corynebacterium deserti, 95.0, 83.77, 0.275 96.43 11 N/A N/A +NC_008261 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Sarcina;s__Sarcina perfringens GCF_000013285.1 95.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Sarcina;s__Sarcina perfringens 100.0 1.0 GCF_000013285.1 95.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Sarcina;s__Sarcina perfringens 100.0 1.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Sarcina;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_029258205.1, s__Sarcina sp029258205, 95.0, 92.75, 0.744; GCF_029267215.1, s__Sarcina sp029267215, 95.0, 84.03, 0.29 94.4 11 N/A N/A +NC_009012 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Acetivibrionales;f__Acetivibrionaceae;g__Hungateiclostridium;s__Hungateiclostridium thermocellum GCF_000015865.1 95.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Acetivibrionales;f__Acetivibrionaceae;g__Hungateiclostridium;s__Hungateiclostridium thermocellum 100.0 1.0 GCF_000015865.1 95.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Acetivibrionales;f__Acetivibrionaceae;g__Hungateiclostridium;s__Hungateiclostridium thermocellum 100.0 1.0 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Acetivibrionales;f__Acetivibrionaceae;g__Hungateiclostridium;s__ taxonomic classification defined 
by topology and ANI topological placement and ANI have congruent species assignments GCF_000521465.1, s__Hungateiclostridium straminisolvens, 95.0, 84.49, 0.429; GCF_004102745.1, s__Hungateiclostridium mesophilum, 95.0, 81.39, 0.257 94.22 11 N/A N/A +NC_012982 d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;f__Hyphomonadaceae;g__Hirschia;s__Hirschia baltica GCF_000023785.1 95.0 d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;f__Hyphomonadaceae;g__Hirschia;s__Hirschia baltica 100.0 1.0 GCF_000023785.1 95.0 d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;f__Hyphomonadaceae;g__Hirschia;s__Hirschia baltica 100.0 1.0 d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;f__Hyphomonadaceae;g__Hirschia;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 98.25 11 N/A N/A +NC_014008 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Opitutales;f__Coraliomargaritaceae;g__Coraliomargarita;s__Coraliomargarita akajimensis GCF_000025905.1 95.0 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Opitutales;f__Coraliomargaritaceae;g__Coraliomargarita;s__Coraliomargarita akajimensis 100.0 1.0 GCF_000025905.1 95.0 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Opitutales;f__Coraliomargaritaceae;g__Coraliomargarita;s__Coraliomargarita akajimensis 100.0 1.0 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Opitutales;f__Coraliomargaritaceae;g__;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 90.76 11 N/A N/A +NC_014168 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Segniliparus;s__Segniliparus rotundus GCF_000092825.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Segniliparus;s__Segniliparus rotundus 100.0 1.0 GCF_000092825.1 95.0 
d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Segniliparus;s__Segniliparus rotundus 100.0 1.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Mycobacteriales;f__Mycobacteriaceae;g__Segniliparus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_000185725.2, s__Segniliparus rugosus, 95.0, 80.18, 0.178 96.56 11 N/A N/A +NC_014211 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Streptosporangiales;f__Streptosporangiaceae;g__Nocardiopsis;s__Nocardiopsis dassonvillei GCF_000092985.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Streptosporangiales;f__Streptosporangiaceae;g__Nocardiopsis;s__Nocardiopsis dassonvillei 100.0 1.0 GCF_000092985.1 95.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Streptosporangiales;f__Streptosporangiaceae;g__Nocardiopsis;s__Nocardiopsis dassonvillei 100.0 1.0 d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Streptosporangiales;f__Streptosporangiaceae;g__Nocardiopsis;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_012396365.1, s__Nocardiopsis alborubida, 95.0, 94.89, 0.822; GCA_000340945.1, s__Nocardiopsis synnemataformans, 95.0, 94.67, 0.838; GCA_002529455.1, s__Nocardiopsis fusca, 95.0, 93.48, 0.743; GCF_000341065.1, s__Nocardiopsis halotolerans, 95.0, 89.3, 0.575; GCF_001905145.1, s__Nocardiopsis sp001905145, 95.0, 89.08, 0.554; GCF_008638415.1, s__Nocardiopsis sinuspersici, 95.0, 88.61, 0.531; GCF_008638365.1, s__Nocardiopsis quinghaiensis, 95.0, 88.23, 0.498; GCF_009830945.1, s__Nocardiopsis sp009830945, 95.0, 87.99, 0.442; GCA_937957845.1, s__Nocardiopsis sp937957845, 95.0, 87.38, 0.277; GCF_026642255.1, s__Nocardiopsis nanhaiensis_A, 95.0, 86.06, 0.436; GCF_030271535.1, s__Nocardiopsis sp030271535, 95.0, 84.81, 0.338; GCF_013410755.1, s__Nocardiopsis aegyptia, 95.0, 84.69, 0.277; GCF_018316655.1, s__Nocardiopsis changdeensis, 
95.0, 84.65, 0.301; GCF_000341125.1, s__Nocardiopsis lucentensis, 95.0, 84.45, 0.226; GCF_001279585.1, s__Nocardiopsis sp001279585, 95.0, 84.4, 0.267; GCA_018388625.1, s__Nocardiopsis eucommiae, 95.0, 84.39, 0.264; GCF_014201115.1, s__Nocardiopsis metallicus, 95.0, 84.3, 0.268; GCF_003634495.1, s__Nocardiopsis sp003634495, 95.0, 84.18, 0.288; GCF_030766825.1, s__Nocardiopsis sp030766825, 95.0, 84.18, 0.286; GCF_000341085.1, s__Nocardiopsis ganjiahuensis, 95.0, 84.16, 0.28; GCF_030555055.1, s__Nocardiopsis sp030555055, 95.0, 84.06, 0.267; GCF_024134545.1, s__Nocardiopsis exhalans, 95.08, 84.04, 0.267; GCF_900141985.1, s__Nocardiopsis flavescens, 95.0, 83.82, 0.272; GCF_003386285.1, s__Nocardiopsis sp003386285, 95.0, 83.73, 0.264; GCF_014651695.1, s__Nocardiopsis terrae, 95.0, 83.6, 0.27; GCF_014203695.1, s__Nocardiopsis algeriensis, 95.0, 83.59, 0.238; GCF_020741345.1, s__Nocardiopsis listeri_A, 95.0, 83.53, 0.202; GCF_000341225.1, s__Nocardiopsis alba, 95.0, 83.47, 0.217; GCF_018207095.1, s__Nocardiopsis sp018207095, 95.0, 83.32, 0.229; GCF_028882275.1, s__Nocardiopsis sp028882275, 95.0, 83.25, 0.266; GCF_900143625.1, s__Nocardiopsis sp900143625, 95.0, 83.09, 0.229; GCF_000515115.1, s__Nocardiopsis sp000515115, 95.0, 83.02, 0.181; GCF_014892575.1, s__Nocardiopsis coralli, 95.0, 82.85, 0.213; GCF_000341265.1, s__Nocardiopsis prasina, 95.0, 82.63, 0.245; GCF_001942255.1, s__Nocardiopsis sp001942255, 95.0, 82.61, 0.248; GCF_000341005.1, s__Nocardiopsis alkaliphila, 95.0, 82.53, 0.186; GCF_001570765.1, s__Nocardiopsis listeri, 95.0, 82.26, 0.198; GCF_000341025.1, s__Nocardiopsis salina, 95.0, 82.22, 0.158; GCF_000341145.1, s__Nocardiopsis xinjiangensis, 95.0, 81.14, 0.168 26.24 11 N/A N/A +NC_014212 d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Allomeiothermus;s__Allomeiothermus silvanus GCF_000092125.1 95.0 d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Allomeiothermus;s__Allomeiothermus silvanus 100.0 0.999 
GCF_000092125.1 95.0 d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Allomeiothermus;s__Allomeiothermus silvanus 100.0 0.999 d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Allomeiothermus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_003226535.1, s__Allomeiothermus sp003226535, 95.0, 87.73, 0.485 93.82 11 N/A N/A +NC_014363 d__Bacteria;p__Actinomycetota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__Olsenella;s__Olsenella uli GCF_000143845.1 95.0 d__Bacteria;p__Actinomycetota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__Olsenella;s__Olsenella uli 100.0 1.0 GCF_000143845.1 95.0 d__Bacteria;p__Actinomycetota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__Olsenella;s__Olsenella uli 100.0 1.0 d__Bacteria;p__Actinomycetota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 95.47 11 N/A N/A +NC_014364 d__Bacteria;p__Spirochaetota;c__Spirochaetia;o__DSM-16054;f__Sediminispirochaetaceae;g__Sediminispirochaeta;s__Sediminispirochaeta smaragdinae GCF_000143985.1 95.0 d__Bacteria;p__Spirochaetota;c__Spirochaetia;o__DSM-16054;f__Sediminispirochaetaceae;g__Sediminispirochaeta;s__Sediminispirochaeta smaragdinae 100.0 1.0 GCF_000143985.1 95.0 d__Bacteria;p__Spirochaetota;c__Spirochaetia;o__DSM-16054;f__Sediminispirochaetaceae;g__Sediminispirochaeta;s__Sediminispirochaeta smaragdinae 100.0 1.0 d__Bacteria;p__Spirochaetota;c__Spirochaetia;o__DSM-16054;f__Sediminispirochaetaceae;g__Sediminispirochaeta;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_000378205.1, s__Sediminispirochaeta bajacaliforniensis, 95.0, 94.41, 0.82 88.42 11 N/A N/A +NC_015761 
d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella bongori GCF_000252995.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella bongori 100.0 1.0 GCF_000252995.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella bongori 100.0 1.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_000006945.2, s__Salmonella enterica, 95.0, 90.34, 0.758; GCA_900478215.1, s__Salmonella houtenae, 95.0, 90.05, 0.755; GCF_008692785.1, s__Salmonella diarizonae, 95.0, 89.91, 0.746; GCF_008692845.1, s__Salmonella arizonae, 95.0, 89.51, 0.704 97.58 11 N/A N/A +NC_017033 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__Frateuria;s__Frateuria aurantia GCF_000242255.2 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__Frateuria;s__Frateuria aurantia 100.0 1.0 GCF_000242255.2 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__Frateuria;s__Frateuria aurantia 100.0 1.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Xanthomonadales;f__Rhodanobacteraceae;g__;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 98.45 11 N/A N/A +NC_017095 d__Bacteria;p__Thermotogota;c__Thermotogae;o__Thermotogales;f__Fervidobacteriaceae;g__Fervidobacterium;s__Fervidobacterium pennivorans GCF_000235405.2 95.0 d__Bacteria;p__Thermotogota;c__Thermotogae;o__Thermotogales;f__Fervidobacteriaceae;g__Fervidobacterium;s__Fervidobacterium pennivorans 100.0 1.0 GCF_000235405.2 95.0 
d__Bacteria;p__Thermotogota;c__Thermotogae;o__Thermotogales;f__Fervidobacteriaceae;g__Fervidobacterium;s__Fervidobacterium pennivorans 100.0 1.0 d__Bacteria;p__Thermotogota;c__Thermotogae;o__Thermotogales;f__Fervidobacteriaceae;g__Fervidobacterium;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_001644665.1, s__Fervidobacterium pennivorans_A, 95.0, 92.25, 0.827 90.82 11 N/A N/A +NC_018014 d__Bacteria;p__Acidobacteriota;c__Terriglobia;o__Terriglobales;f__Acidobacteriaceae;g__Terriglobus;s__Terriglobus roseus GCF_000265425.1 95.0 d__Bacteria;p__Acidobacteriota;c__Terriglobia;o__Terriglobales;f__Acidobacteriaceae;g__Terriglobus;s__Terriglobus roseus 100.0 1.0 GCF_000265425.1 95.0 d__Bacteria;p__Acidobacteriota;c__Terriglobia;o__Terriglobales;f__Acidobacteriaceae;g__Terriglobus;s__Terriglobus roseus 100.0 1.0 d__Bacteria;p__Acidobacteriota;c__Terriglobia;o__Terriglobales;f__Acidobacteriaceae;g__Terriglobus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_900105625.1, s__Terriglobus roseus_B, 95.0, 82.2, 0.254 94.06 11 N/A N/A +NC_018068 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus acidiphilus GCF_000255115.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus acidiphilus 100.0 1.0 GCF_000255115.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus acidiphilus 100.0 1.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_002196705.1, s__Desulfosporosinus 
sp002196705, 95.0, 81.15, 0.156 93.92 11 N/A N/A +NC_018515 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus meridiei GCF_000231385.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus meridiei 100.0 1.0 GCF_000231385.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__Desulfosporosinus meridiei 100.0 1.0 d__Bacteria;p__Bacillota_B;c__Desulfitobacteriia;o__Desulfitobacteriales;f__Desulfitobacteriaceae;g__Desulfosporosinus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_900100785.1, s__Desulfosporosinus hippei, 95.0, 94.11, 0.765; GCF_000765145.1, s__Desulfosporosinus sp000765145, 95.0, 92.48, 0.692 94.5 11 N/A N/A +NC_019897 d__Bacteria;p__Bacillota;c__Bacilli;o__Paenibacillales;f__Paenibacillaceae;g__Thermobacillus;s__Thermobacillus xylanilyticus GCF_907165215.1 95.0 d__Bacteria;p__Bacillota;c__Bacilli;o__Paenibacillales;f__Paenibacillaceae;g__Thermobacillus;s__Thermobacillus xylanilyticus 97.81 0.815 GCF_907165215.1 95.0 d__Bacteria;p__Bacillota;c__Bacilli;o__Paenibacillales;f__Paenibacillaceae;g__Thermobacillus;s__Thermobacillus xylanilyticus 97.81 0.815 d__Bacteria;p__Bacillota;c__Bacilli;o__Paenibacillales;f__Paenibacillaceae;g__Thermobacillus;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_002159085.1, s__Thermobacillus sp002159085, 95.0, 85.65, 0.466 95.97 11 N/A N/A +NC_019904 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Cyclobacteriaceae;g__Echinicola;s__Echinicola vietnamensis GCF_000325705.1 95.0 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Cyclobacteriaceae;g__Echinicola;s__Echinicola vietnamensis 100.0 1.0 
GCF_000325705.1 95.0 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Cyclobacteriaceae;g__Echinicola;s__Echinicola vietnamensis 100.0 1.0 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Cyclobacteriaceae;g__Echinicola;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_005281475.1, s__Echinicola rosea, 95.0, 83.28, 0.288; GCF_006575665.1, s__Echinicola soli, 95.0, 81.29, 0.203; GCF_003260975.1, s__Echinicola strongylocentroti, 95.0, 80.38, 0.164 96.96 11 N/A N/A +NC_019936 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Stutzerimonas;s__Stutzerimonas stutzeri_AE GCF_000327065.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Stutzerimonas;s__Stutzerimonas stutzeri_AE 100.0 1.0 GCF_000327065.1 95.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Stutzerimonas;s__Stutzerimonas stutzeri_AE 100.0 1.0 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Stutzerimonas;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments GCF_003696315.1, s__Stutzerimonas songnenensis, 95.0, 93.2, 0.909; GCF_000661915.1, s__Stutzerimonas decontaminans, 95.0, 91.33, 0.803; GCF_900114065.1, s__Stutzerimonas kunmingensis, 96.6, 90.17, 0.753; GCF_024397475.1, s__Stutzerimonas xanthomarina_A, 97.0, 90.15, 0.746; GCA_007713455.1, s__Stutzerimonas sp007713455, 95.0, 90.03, 0.748; GCF_014764705.1, s__Stutzerimonas sp002692525, 96.82, 90.01, 0.736; GCA_003530955.1, s__Stutzerimonas sp003530955, 96.79, 89.98, 0.727; GCF_002929225.1, s__Stutzerimonas stutzeri_U, 97.04, 89.97, 0.743; GCF_000935215.1, s__Stutzerimonas stutzeri_AD, 96.98, 89.84, 0.723; GCF_000495915.1, s__Stutzerimonas chloritidismutans, 96.78, 89.84, 0.717; GCA_018823765.1, 
s__Stutzerimonas sp018823765, 97.02, 89.77, 0.71; GCA_003488145.1, s__Stutzerimonas sp003488145, 96.59, 89.74, 0.666; GCF_000341615.1, s__Stutzerimonas stutzeri_G, 95.0, 89.32, 0.697; GCF_002890795.1, s__Stutzerimonas stutzeri_AA, 95.0, 89.28, 0.735; GCF_024448335.1, s__Stutzerimonas frequens, 95.0, 88.75, 0.714; GCF_015291885.1, s__Stutzerimonas stutzeri_AC, 95.0, 88.59, 0.662; GCF_000219605.1, s__Stutzerimonas stutzeri, 95.0, 88.34, 0.673; GCF_002909485.1, s__Stutzerimonas stutzeri_AH, 95.0, 88.19, 0.688; GCF_000307775.2, s__Stutzerimonas stutzeri_B, 95.0, 86.95, 0.549; GCF_002890915.1, s__Stutzerimonas stutzeri_AF, 95.0, 85.91, 0.472; GCF_025966695.1, s__Stutzerimonas sp025966695, 95.0, 85.39, 0.44; GCA_000263395.1, s__Stutzerimonas stutzeri_C, 95.0, 85.38, 0.392; GCF_024448505.1, s__Stutzerimonas degradans, 96.08, 85.26, 0.426; GCF_021432765.1, s__Stutzerimonas phenolilytica, 95.0, 85.11, 0.435; GCF_015070855.1, s__Stutzerimonas lopnurensis, 95.0, 85.07, 0.41; GCF_021726475.1, s__Stutzerimonas oligotrophica, 95.0, 84.76, 0.419; GCF_000818015.1, s__Stutzerimonas balearica, 95.0, 83.88, 0.33; GCF_003696285.1, s__Stutzerimonas nitrititolerans, 95.0, 82.51, 0.26; GCF_018138085.1, s__Stutzerimonas stutzeri_AI, 95.0, 82.03, 0.245; GCF_005876855.1, s__Stutzerimonas nosocomialis, 95.0, 81.99, 0.242; GCF_019090095.1, s__Stutzerimonas stutzeri_AN, 95.0, 81.91, 0.245; GCF_013522825.1, s__Stutzerimonas stutzeri_AK, 95.0, 81.91, 0.2; GCA_002339675.1, s__Stutzerimonas stutzeri_O, 95.0, 81.89, 0.253; GCF_019355055.1, s__Stutzerimonas sp004331835, 95.0, 81.83, 0.222; GCF_024448955.1, s__Stutzerimonas stutzeri_AQ, 95.0, 81.79, 0.232; GCF_024448695.1, s__Stutzerimonas stutzeri_T, 95.0, 81.71, 0.242; GCF_022810315.1, s__Stutzerimonas marianensis, 95.0, 81.66, 0.218; GCF_024448935.1, s__Stutzerimonas stutzeri_AO, 95.0, 81.57, 0.177; GCF_009789555.1, s__Stutzerimonas stutzeri_R, 95.0, 81.41, 0.22; GCA_022448005.1, s__Stutzerimonas sp022448005, 95.0, 81.35, 0.194; GCF_003325755.1, 
s__Stutzerimonas sp003325755, 95.0, 81.28, 0.206; GCA_004010935.1, s__Stutzerimonas sp004010935, 95.0, 81.27, 0.216; GCA_002387205.1, s__Stutzerimonas stutzeri_N, 95.0, 81.26, 0.195; GCF_024448985.1, s__Stutzerimonas stutzeri_AP, 95.0, 81.22, 0.203; GCF_000425625.1, s__Stutzerimonas azotifigens, 95.0, 81.18, 0.187; GCF_000756775.1, s__Stutzerimonas saudiphocaensis, 95.0, 81.13, 0.219; GCF_000952685.1, s__Stutzerimonas stutzeri_E, 95.0, 81.08, 0.216; GCF_000235745.1, s__Stutzerimonas stutzeri_H, 95.0, 81.07, 0.199; GCF_013522725.1, s__Stutzerimonas azotifigens_A, 95.0, 80.99, 0.203; GCA_900766265.1, s__Stutzerimonas sp900766265, 95.0, 80.94, 0.196; GCF_024448895.1, s__Stutzerimonas stutzeri_Q, 95.0, 80.91, 0.193; GCF_900129835.1, s__Stutzerimonas xanthomarina, 95.0, 80.89, 0.2; GCF_002890895.1, s__Stutzerimonas stutzeri_AB, 95.0, 80.87, 0.198; GCF_019880365.1, s__Stutzerimonas stutzeri_P, 95.0, 80.81, 0.187; GCA_002345575.1, s__Stutzerimonas stutzeri_S, 95.0, 80.74, 0.191; GCF_013620795.1, s__Stutzerimonas sp013620795, 95.0, 80.74, 0.187; GCF_024448715.1, s__Stutzerimonas stutzeri_AR, 95.0, 80.6, 0.18; GCF_000590475.1, s__Stutzerimonas stutzeri_D, 95.0, 80.4, 0.167; GCF_003696365.1, s__Stutzerimonas zhaodongensis, 95.0, 79.95, 0.152 97.72 11 N/A N/A +NC_021184 d__Bacteria;p__Bacillota_B;c__Desulfotomaculia;o__Desulfotomaculales;f__Desulfallaceae;g__Sporotomaculum;s__Sporotomaculum gibsoniae GCF_000233715.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfotomaculia;o__Desulfotomaculales;f__Desulfallaceae;g__Sporotomaculum;s__Sporotomaculum gibsoniae 100.0 1.0 GCF_000233715.2 95.0 d__Bacteria;p__Bacillota_B;c__Desulfotomaculia;o__Desulfotomaculales;f__Desulfallaceae;g__Sporotomaculum;s__Sporotomaculum gibsoniae 100.0 1.0 d__Bacteria;p__Bacillota_B;c__Desulfotomaculia;o__Desulfotomaculales;f__Desulfallaceae;g__Sporotomaculum;s__ taxonomic classification defined by topology and ANI topological placement and ANI have congruent species assignments N/A 95.73 11 N/A N/A