# HG changeset patch
# User onnodg
# Date 1761033261 0
# Node ID 706b7acdb23078e366b5ff6475be8b3356bee5e4
# Parent ff68835adb2ba2f786c9e2ab28eb464dc4ab6f34
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_clusters_tool commit c2020ecc91cea0c8cf7439180cf796743c838b4d-dirty

diff -r ff68835adb2b -r 706b7acdb230 README.md
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.md Tue Oct 21 07:54:21 2025 +0000
@@ -0,0 +1,14 @@
+This script processes cluster output files from cd-hit-est for use in Galaxy.
+It extracts cluster information, associates taxa and e-values from annotation files,
+performs statistical calculations, and generates text and plot outputs
+summarizing similarity and taxonomic distributions.
+
+
+Main steps:
+1. Parse cd-hit-est cluster file and (optional) annotation file.
+2. Process each cluster to extract similarity, taxa, and e-value information.
+3. Aggregate results across clusters.
+4. Generate requested outputs: text summaries, plots, and Excel reports.
+
+
+Note: Uses a non-interactive matplotlib backend (Agg) for compatibility with Galaxy.
diff -r ff68835adb2b -r 706b7acdb230 cdhit_analysis.py
--- a/cdhit_analysis.py Mon Oct 20 12:27:31 2025 +0000
+++ b/cdhit_analysis.py Tue Oct 21 07:54:21 2025 +0000
@@ -1,14 +1,3 @@
-import argparse
-import os
-import re
-from collections import Counter, defaultdict
-from math import sqrt
-import pandas as pd
-import matplotlib
-
-matplotlib.use('Agg') # Non-interactive backend for Galaxy
-import matplotlib.pyplot as plt
-
 """
 This script processes cluster output files from cd-hit-est for use in Galaxy.
 It extracts cluster information, associates taxa and e-values from annotation files,
@@ -26,6 +15,16 @@
 
 Note: Uses a non-interactive matplotlib backend (Agg) for compatibility with Galaxy.
 """
+import argparse
+from collections import Counter, defaultdict
+import os
+import re
+import matplotlib.pyplot as plt
+import pandas as pd
+from math import sqrt
+import openpyxl
+
+
 
 def parse_arguments(args_list=None):
     """Parse command-line arguments for the script."""
diff -r ff68835adb2b -r 706b7acdb230 cdhit_analysis.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/cdhit_analysis.sh Tue Oct 21 07:54:21 2025 +0000
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+SCRIPTDIR=$(dirname "$(readlink -f "$0")")
+python $SCRIPTDIR"/cdhit_analysis.py" "$@"
+
+# sanity check
+printf "Conda env: %s\n" "$CONDA_DEFAULT_ENV"
+printf "Python version: %s\n" "$(python --version | awk '{print $2}')"
+printf "Matplotlib version: %s\n" "$(python -c 'import matplotlib; print(matplotlib.__version__)')"
+printf "Pandas version: %s\n" "$(python -c 'import pandas; print(pandas.__version__)')"
+printf "Openpyxl version: %s\n" "$(python -c 'import openpyxl; print(openpyxl.__version__)')"
+printf "Bash version: %s\n" "${BASH_VERSION}"
+printf "SCRIPTDIR: %s\n\n" "$SCRIPTDIR"
\ No newline at end of file
diff -r ff68835adb2b -r 706b7acdb230 cdhit_analysis.xml
--- a/cdhit_analysis.xml Mon Oct 20 12:27:31 2025 +0000
+++ b/cdhit_analysis.xml Tue Oct 21 07:54:21 2025 +0000
@@ -9,7 +9,7 @@