annotate pytorch_embedding.py @ 0:38333676a029 draft default tip

planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
author goeckslab
date Thu, 19 Jun 2025 23:33:23 +0000

"""
This module provides functionality to extract image embeddings using a
specified pretrained model from the torchvision library. It includes
functions to:
- List image files directly from a ZIP file without extraction.
- Apply model-specific preprocessing and transformations.
- Extract embeddings using various models.
- Save the resulting embeddings into a CSV file.

Modules required:
- argparse: For command-line argument parsing.
- os, csv, zipfile: For file handling (ZIP file reading, CSV writing).
- inspect: For inspecting model builder signatures (e.g. whether they
  accept a `weights` argument).
- logging: For writing progress and error messages to a log file.
- torch, torchvision: For loading pretrained models, preprocessing images
  (resizing, normalization), and extracting embeddings.
- PIL, cv2, numpy: For image loading and conversion, plus OpenCV-based
  contrast enhancement (CLAHE) and edge detection (Canny).
"""

import argparse
import csv
import inspect
import logging
import os
import zipfile
from inspect import signature

import cv2
import numpy as np
import torch
import torchvision.models as models
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms

# Configure logging
logging.basicConfig(
    filename="/tmp/ludwig_embeddings.log",
    filemode="a",
    format="%(asctime)s - %(levelname)s - %(message)s",
    level=logging.DEBUG,
)

# Create a cache directory in the current working directory
cache_dir = os.path.join(os.getcwd(), 'hf_cache')
try:
    os.makedirs(cache_dir, exist_ok=True)
    logging.info(f"Cache directory created: {cache_dir}, writable: {os.access(cache_dir, os.W_OK)}")
except OSError as e:
    logging.error(f"Failed to create cache directory {cache_dir}: {e}")
    raise

# Available models from torchvision: every callable in torchvision.models
# whose signature accepts a `weights` argument (the model builder functions,
# e.g. resnet50 or vit_b_16)
AVAILABLE_MODELS = {
    name: getattr(models, name)
    for name in dir(models)
    if callable(
        getattr(models, name)
    ) and "weights" in signature(getattr(models, name)).parameters
}

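
# Illustrative sketch, not part of the original tool: the AVAILABLE_MODELS
# filter above keeps every callable whose signature has a `weights`
# parameter. The hypothetical helper below applies the same
# inspect.signature check to an arbitrary namespace. It also skips
# callables for which signature() raises, which is an assumption added for
# safety; the original comprehension has no such guard.
def _example_discover_builders(namespace):
    """Return {name: obj} for the callables in `namespace` accepting `weights`."""
    from inspect import signature

    found = {}
    for name in dir(namespace):
        obj = getattr(namespace, name)
        if not callable(obj):
            continue
        try:
            params = signature(obj).parameters
        except (TypeError, ValueError):
            # Some built-in or C-implemented callables expose no signature.
            continue
        if "weights" in params:
            found[name] = obj
    return found

# Usage with a hypothetical stand-in for torchvision.models:
#     import types
#     ns = types.SimpleNamespace(
#         resnet_like=lambda weights=None: "model",
#         helper=lambda name: name,
#         CONSTANT=3,
#     )
#     sorted(_example_discover_builders(ns))  # ['resnet_like']

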
# Default resize and normalization settings for models
MODEL_DEFAULTS = {
    "default": {"resize": (224, 224), "normalize": (
        [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
    )},
    "efficientnet_b1": {"resize": (240, 240)},
    "efficientnet_b2": {"resize": (260, 260)},
    "efficientnet_b3": {"resize": (300, 300)},
    "efficientnet_b4": {"resize": (380, 380)},
    "efficientnet_b5": {"resize": (456, 456)},
    "efficientnet_b6": {"resize": (528, 528)},
    "efficientnet_b7": {"resize": (600, 600)},
    "inception_v3": {"resize": (299, 299)},
    "swin_b": {"resize": (224, 224), "normalize": (
        [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
    )},
    "swin_s": {"resize": (224, 224), "normalize": (
        [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
    )},
    "swin_t": {"resize": (224, 224), "normalize": (
        [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
    )},
    "vit_b_16": {"resize": (224, 224), "normalize": (
        [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
    )},
    "vit_b_32": {"resize": (224, 224), "normalize": (
        [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
    )},
}

# Fall back to the ImageNet mean/std for every model without its own entry
for model, settings in MODEL_DEFAULTS.items():
    if "normalize" not in settings:
        settings["normalize"] = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])


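# Illustrative sketch, not part of the original tool: resolving a model's
# entry with a fallback to the "default" entry is an assumption, because the
# lookup used later in this script is not shown here. The arithmetic is what
# torchvision's transforms.Normalize does to a tensor already scaled to
# [0, 1] (e.g. by transforms.ToTensor()): (x - mean[c]) / std[c] for each
# channel c. The helper name is hypothetical.
def _example_normalize_pixel(pixel, model_name, defaults):
    """Normalize one (R, G, B) pixel in [0, 1] with the model's mean/std."""
    settings = defaults.get(model_name, defaults["default"])
    mean, std = settings["normalize"]
    return tuple((x - m) / s for x, m, s in zip(pixel, mean, std))

# Usage:
#     _example_normalize_pixel((1.0, 0.5, 0.0), "vit_b_16", MODEL_DEFAULTS)
#     # -> (1.0, 0.0, -1.0), since vit_b_16 uses mean = std = 0.5

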
# Custom transform classes
class CLAHETransform:
    """Apply CLAHE (Contrast Limited Adaptive Histogram Equalization) to the
    grayscale version of a PIL image and return it as a 3-channel RGB image."""

    def __init__(self, clip_limit=2.0, tile_grid_size=(8, 8)):
        self.clahe = cv2.createCLAHE(
            clipLimit=clip_limit,
            tileGridSize=tile_grid_size
        )

    def __call__(self, img):
        img = np.array(img.convert("L"))
        img = self.clahe.apply(img)
        return Image.fromarray(img).convert("RGB")


class CannyTransform:
    """Replace a PIL image with the Canny edge map of its grayscale version,
    replicated to 3 RGB channels. `threshold1` and `threshold2` are the
    hysteresis thresholds passed to cv2.Canny."""

    def __init__(self, threshold1=100, threshold2=200):
        self.threshold1 = threshold1
        self.threshold2 = threshold2

    def __call__(self, img):
        img = np.array(img.convert("L"))
        edges = cv2.Canny(img, self.threshold1, self.threshold2)
        return Image.fromarray(edges).convert("RGB")


class RGBAtoRGBTransform:
    """Composite RGBA images onto an opaque white background and convert
    images in any other mode to RGB."""

    def __call__(self, img):
        if img.mode == "RGBA":
            background = Image.new("RGBA", img.size, (255, 255, 255, 255))
            img = Image.alpha_composite(background, img).convert("RGB")
        else:
            img = img.convert("RGB")
        return img


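# Illustrative sketch, not part of the original tool: RGBAtoRGBTransform uses
# Image.alpha_composite over an opaque white background. For each channel
# this is the "over" operation out = a * fg + (1 - a) * 255, with
# a = alpha / 255. The hypothetical helper below applies that formula to a
# single 8-bit RGBA pixel using integer round-to-nearest. Pillow's internal
# fixed-point rounding may differ slightly.
def _example_flatten_on_white(rgba):
    """Composite one (R, G, B, A) pixel onto white; return (R, G, B)."""
    r, g, b, a = rgba
    # (fg * a + 255 * (255 - a)) / 255, rounded to the nearest integer.
    return tuple((c * a + 255 * (255 - a) + 127) // 255 for c in (r, g, b))

# Usage:
#     _example_flatten_on_white((10, 20, 30, 255))  # (10, 20, 30): opaque
#     _example_flatten_on_white((0, 0, 0, 0))       # (255, 255, 255): transparent

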
def get_image_files_from_zip(zip_file):
    """Return the names of the image entries (.png, .jpg, .jpeg, .bmp, .gif,
    matched case-insensitively) in the ZIP file, without extracting it."""
    try:
        with zipfile.ZipFile(zip_file, "r") as zip_ref:
            file_list = [
                f for f in zip_ref.namelist() if f.lower().endswith(
                    (".png", ".jpg", ".jpeg", ".bmp", ".gif")
                )
            ]
            return file_list
    except zipfile.BadZipFile as exc:
        raise RuntimeError("Invalid ZIP file.") from exc
    except Exception as exc:
        raise RuntimeError("Error reading ZIP file.") from exc


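# Illustrative sketch, not part of the original tool: this applies the same
# case-insensitive extension filter to an in-memory archive and reads each
# matching entry's bytes with ZipFile.read, which never writes files to disk.
# zipfile.ZipFile accepts either a path or a file-like object, so an
# io.BytesIO buffer works. The helper name is hypothetical. The returned
# bytes could be wrapped in io.BytesIO and passed to PIL's Image.open.
def _example_list_and_read_images(zip_bytes):
    """Return (image names, {name: size in bytes}) for an in-memory ZIP."""
    import io
    import zipfile

    exts = (".png", ".jpg", ".jpeg", ".bmp", ".gif")
    with zipfile.ZipFile(io.BytesIO(zip_bytes), "r") as zf:
        names = [n for n in zf.namelist() if n.lower().endswith(exts)]
        sizes = {n: len(zf.read(n)) for n in names}
    return names, sizes

# Usage:
#     import io
#     import zipfile
#     buf = io.BytesIO()
#     with zipfile.ZipFile(buf, "w") as zf:
#         zf.writestr("imgs/a.PNG", b"abc")
#         zf.writestr("notes.txt", b"x")
#     _example_list_and_read_images(buf.getvalue())
#     # (['imgs/a.PNG'], {'imgs/a.PNG': 3})

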
def load_model(model_name, device):
    """Load the specified torchvision model with its default pretrained
    weights and replace its classification head with an identity layer, so
    that the forward pass returns feature embeddings."""
    if model_name not in AVAILABLE_MODELS:
        raise ValueError(
            f"Unsupported model: {model_name}. "
            f"Available models: {list(AVAILABLE_MODELS.keys())}"
        )
    try:
        if "weights" in inspect.signature(
                AVAILABLE_MODELS[model_name]).parameters:
            model = AVAILABLE_MODELS[model_name](weights="DEFAULT").to(device)
        else:
            model = AVAILABLE_MODELS[model_name]().to(device)
        logging.info("Model loaded")
    except Exception as e:
        logging.error(f"Failed to load model {model_name}: {e}")
        raise

    # Swap the final classification layer for an identity mapping.
    # ResNet-style models use `fc`; VGG, EfficientNet, MobileNet and DenseNet
    # use `classifier`; Swin uses `head`; torchvision's VisionTransformer
    # (vit_b_16, vit_b_32) uses `heads`.
    if hasattr(model, "fc"):
        model.fc = torch.nn.Identity()
    elif hasattr(model, "classifier"):
        model.classifier = torch.nn.Identity()
    elif hasattr(model, "head"):
        model.head = torch.nn.Identity()
    elif hasattr(model, "heads"):
        model.heads = torch.nn.Identity()

    model.eval()
38333676a029 planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
goeckslab
parents:
diff changeset
173 return model
38333676a029 planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
goeckslab
parents:
diff changeset
174
38333676a029 planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
goeckslab
parents:
diff changeset
175
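# Illustrative sketch (hypothetical helper, not called anywhere in this
# module): load_model() turns a classifier into a feature extractor by
# replacing the first attribute it finds among "fc", "classifier" and
# "head", checked in that order, with an identity layer. This helper
# reproduces only that lookup order on an arbitrary object, with the
# replacement value passed in, so the logic can be exercised without torch.
def _example_strip_classification_head(model, identity):
    """Replaces the first head attribute found; returns its name or None."""
    for attr in ("fc", "classifier", "head"):
        if hasattr(model, attr):
            setattr(model, attr, identity)
            return attr
    # No known head attribute: the object is left unchanged.
    return None

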
def write_csv(output_csv, list_embeddings, ludwig_format=False):
    """Writes embeddings to a CSV file, optionally in Ludwig format.

    Each entry of ``list_embeddings`` is ``[sample_name, v1, v2, ...]``.
    In Ludwig format the vector is written as a single space-separated
    ``embedding`` column; otherwise each component gets its own
    ``vector1`` ... ``vectorN`` column.
    """
    with open(output_csv, mode="w", encoding="utf-8", newline="") as csv_file:
        csv_writer = csv.writer(csv_file)
        if list_embeddings:
            if ludwig_format:
                header = ["sample_name", "embedding"]
                formatted_embeddings = []
                for embedding in list_embeddings:
                    sample_name = embedding[0]
                    vector = embedding[1:]
                    embedding_str = " ".join(map(str, vector))
                    formatted_embeddings.append([sample_name, embedding_str])
                csv_writer.writerow(header)
                csv_writer.writerows(formatted_embeddings)
                logging.info("CSV created in Ludwig format")
            else:
                header = ["sample_name"] + [f"vector{i + 1}" for i in range(
                    len(list_embeddings[0]) - 1
                )]
                csv_writer.writerow(header)
                csv_writer.writerows(list_embeddings)
                logging.info("CSV created")
        else:
            csv_writer.writerow(["sample_name", "embedding"] if ludwig_format
                                else ["sample_name"])
            logging.info("No valid images found. Empty CSV created.")


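# Illustrative sketch (hypothetical helper, not part of the tool's
# interface): in Ludwig format, write_csv() stores each vector as one
# space-separated string in the "embedding" column. A downstream script
# could turn that column back into floats with the standard csv module,
# assuming the two-column layout written above. str() of a Python float
# round-trips exactly, so the text encoding loses no precision.
def _example_read_ludwig_embeddings(csv_text):
    """Maps sample_name -> list of floats from Ludwig-format CSV text."""
    import csv
    import io

    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["sample_name"]: [float(v) for v in row["embedding"].split()]
        for row in reader
    }

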
def extract_embeddings(
        model_name,
        apply_normalization,
        zip_file,
        file_list,
        transform_type="rgb"):
    """Extracts embeddings from the images listed in ``file_list``.

    Images are read directly from ``zip_file`` and processed in batches
    with a DataLoader; if the DataLoader raises a RuntimeError, all images
    are reprocessed one at a time. Returns a list of
    ``[sample_name, v1, v2, ...]`` rows.
    """

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = load_model(model_name, device)
    model_settings = MODEL_DEFAULTS.get(model_name, MODEL_DEFAULTS["default"])
    resize = model_settings["resize"]
    normalize = model_settings.get("normalize", (
        [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
    ))

    # Define transform pipeline
    if transform_type == "grayscale":
        initial_transform = transforms.Grayscale(num_output_channels=3)
    elif transform_type == "clahe":
        initial_transform = CLAHETransform()
    elif transform_type == "edges":
        initial_transform = CannyTransform()
    elif transform_type == "rgba_to_rgb":
        initial_transform = RGBAtoRGBTransform()
    else:
        initial_transform = transforms.Lambda(lambda x: x.convert("RGB"))

    transform_list = [initial_transform,
                      transforms.Resize(resize),
                      transforms.ToTensor()]
    if apply_normalization:
        transform_list.append(transforms.Normalize(mean=normalize[0],
                                                   std=normalize[1]))
    transform = transforms.Compose(transform_list)

    class ImageDataset(Dataset):
        def __init__(self, zip_file, file_list, transform=None):
            self.zip_file = zip_file
            self.file_list = file_list
            self.transform = transform

        def __len__(self):
            return len(self.file_list)

        def __getitem__(self, idx):
            with zipfile.ZipFile(self.zip_file, "r") as zip_ref:
                with zip_ref.open(self.file_list[idx]) as file:
                    try:
                        image = Image.open(file)
                        if self.transform:
                            image = self.transform(image)
                        return image, os.path.basename(self.file_list[idx])
                    except Exception as e:
                        logging.warning(
                            "Skipping %s: %s", self.file_list[idx], e
                        )
                        return None, os.path.basename(self.file_list[idx])

    # Custom collate function: drop images that failed to load
    def collate_fn(batch):
        batch = [item for item in batch if item[0] is not None]
        if not batch:
            return None, None
        images, names = zip(*batch)
        return torch.stack(images), names

    list_embeddings = []
    with torch.inference_mode():
        try:
            # Try DataLoader with reduced resource usage
            dataset = ImageDataset(zip_file, file_list, transform=transform)
            dataloader = DataLoader(
                dataset,
                batch_size=16,  # Reduced for lower memory usage
                num_workers=1,  # Reduced to minimize shared memory
                shuffle=False,
                # Compare device.type: a torch.device never equals "cuda"
                pin_memory=device.type == "cuda",
                collate_fn=collate_fn,
            )
            for images, names in dataloader:
                if images is None:
                    continue
                images = images.to(device)
                embeddings = model(images).cpu().numpy()
                for name, embedding in zip(names, embeddings):
                    list_embeddings.append([name] + embedding.tolist())
        except RuntimeError as e:
            logging.warning(
                "DataLoader failed: %s. Falling back to sequential "
                "processing.", e
            )
            # Discard partial batch results so no image is written twice
            list_embeddings = []
            # Fall back to sequential processing, opening the archive once
            with zipfile.ZipFile(zip_file, "r") as zip_ref:
                for file in file_list:
                    with zip_ref.open(file) as img_file:
                        try:
                            image = Image.open(img_file)
                            image = transform(image)
                            input_tensor = image.unsqueeze(0).to(device)
                            embedding = model(
                                input_tensor
                            ).squeeze().cpu().numpy()
                            list_embeddings.append(
                                [os.path.basename(file)] + embedding.tolist()
                            )
                        except Exception as e:
                            logging.warning("Skipping %s: %s", file, e)

    return list_embeddings


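# Illustrative sketch (hypothetical, torch-free helper): collate_fn() inside
# extract_embeddings() drops samples whose image failed to load (returned
# as None by ImageDataset.__getitem__) and signals an all-failed batch
# with (None, None), which the batch loop skips. Images that fail to
# decode therefore get no row in the output CSV. This version mirrors that
# filtering but returns a plain list instead of calling torch.stack().
def _example_filter_failed_samples(batch):
    """Keeps (image, name) pairs whose image is not None."""
    kept = [item for item in batch if item[0] is not None]
    if not kept:
        return None, None
    images, names = zip(*kept)
    return list(images), names

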
38333676a029 planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
goeckslab
parents:
diff changeset
318 def main(zip_file, output_csv, model_name, apply_normalization=False,
38333676a029 planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
goeckslab
parents:
diff changeset
319 transform_type="rgb", ludwig_format=False):
38333676a029 planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
goeckslab
parents:
diff changeset
320 """Main entry point for processing the zip file and
38333676a029 planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
goeckslab
parents:
diff changeset
321 extracting embeddings."""
    file_list = get_image_files_from_zip(zip_file)
    logging.info("Image files listed from ZIP")

    list_embeddings = extract_embeddings(
        model_name,
        apply_normalization,
        zip_file,
        file_list,
        transform_type
    )
    logging.info("Embeddings extracted")
    write_csv(output_csv, list_embeddings, ludwig_format)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Extract image embeddings.")
    parser.add_argument(
        "--zip_file",
        required=True,
        help="Path to the ZIP file containing images."
    )
    parser.add_argument(
        "--model_name",
        required=True,
        choices=AVAILABLE_MODELS.keys(),
        help="Model for embedding extraction."
    )
    parser.add_argument(
        "--normalize",
        action="store_true",
        help="Whether to apply normalization."
    )
    parser.add_argument(
        "--transform_type",
        required=True,
        help="Image transformation type."
    )
    parser.add_argument(
        "--output_csv",
        required=True,
        help="Path to the output CSV file."
    )
    parser.add_argument(
        "--ludwig_format",
        action="store_true",
        help="Prepare the CSV file in Ludwig input format."
    )

    args = parser.parse_args()
    main(
        args.zip_file,
        args.output_csv,
        args.model_name,
        args.normalize,
        args.transform_type,
        args.ludwig_format
    )