Mercurial > repos > goeckslab > extract_embeddings
annotate pytorch_embedding.py @ 0:38333676a029 draft default tip
planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
author | goeckslab |
---|---|
date | Thu, 19 Jun 2025 23:33:23 +0000 |
"""
This module provides functionality to extract image embeddings
using a specified pretrained model from the torchvision library.
It includes functions to:
- List image files directly from a ZIP file without extraction.
- Apply model-specific preprocessing and transformations.
- Extract embeddings using various models.
- Save the resulting embeddings into a CSV file.

Modules required:
- argparse: For command-line argument parsing.
- os, csv, zipfile: For file handling (ZIP file reading, CSV writing).
- inspect: For inspecting function signatures and models.
- torch, torchvision: For loading and using pretrained models
  to extract embeddings.
- PIL, cv2: For image processing tasks such as resizing, normalization,
  and conversion.
"""

import argparse
import csv
import inspect
import logging
import os
import zipfile
from inspect import signature

import cv2
import numpy as np
import torch
import torchvision.models as models
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms

# Configure logging
logging.basicConfig(
    filename="/tmp/ludwig_embeddings.log",
    filemode="a",
    format="%(asctime)s - %(levelname)s - %(message)s",
    level=logging.DEBUG,
)

# Create a cache directory in the current working directory
cache_dir = os.path.join(os.getcwd(), 'hf_cache')
try:
    os.makedirs(cache_dir, exist_ok=True)
    logging.info(f"Cache directory created: {cache_dir}, writable: {os.access(cache_dir, os.W_OK)}")
except OSError as e:
    logging.error(f"Failed to create cache directory {cache_dir}: {e}")
    raise

# Available models from torchvision: every callable in torchvision.models
# whose signature accepts a "weights" keyword.
AVAILABLE_MODELS = {
    name: getattr(models, name)
    for name in dir(models)
    if callable(
        getattr(models, name)
    ) and "weights" in signature(getattr(models, name)).parameters
}
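
The signature-based filter used for `AVAILABLE_MODELS` can be tried without torchvision installed. The sketch below is only an illustration: `fake_models`, `resnet_like`, and `helper` are hypothetical stand-ins for the `torchvision.models` namespace and its builder functions, not part of the original file.

```python
from inspect import signature
from types import ModuleType


def resnet_like(weights=None, progress=True):
    """Hypothetical model builder that accepts a ``weights`` keyword."""
    return ("resnet_like", weights)


def helper(x):
    """Hypothetical callable without a ``weights`` parameter."""
    return x


# Stand-in for the torchvision.models module (an assumption for this demo).
fake_models = ModuleType("fake_models")
fake_models.resnet_like = resnet_like
fake_models.helper = helper
fake_models.VERSION = "0.0"  # non-callable attribute, skipped by callable()

# Same filter as in the listing: callable and accepts "weights".
discovered = {
    name: getattr(fake_models, name)
    for name in dir(fake_models)
    if callable(getattr(fake_models, name))
    and "weights" in signature(getattr(fake_models, name)).parameters
}

print(sorted(discovered))  # ['resnet_like']
```

Only `resnet_like` survives the filter, because `helper` lacks a `weights` parameter and `VERSION` is not callable.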

# Default resize and normalization settings for models
MODEL_DEFAULTS = {
    "default": {"resize": (224, 224), "normalize": (
        [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
    )},
    "efficientnet_b1": {"resize": (240, 240)},
    "efficientnet_b2": {"resize": (260, 260)},
    "efficientnet_b3": {"resize": (300, 300)},
    "efficientnet_b4": {"resize": (380, 380)},
    "efficientnet_b5": {"resize": (456, 456)},
    "efficientnet_b6": {"resize": (528, 528)},
    "efficientnet_b7": {"resize": (600, 600)},
    "inception_v3": {"resize": (299, 299)},
    "swin_b": {"resize": (224, 224), "normalize": (
        [0.5, 0.0, 0.5], [0.5, 0.5, 0.5]
    )},
    "swin_s": {"resize": (224, 224), "normalize": (
        [0.5, 0.0, 0.5], [0.5, 0.5, 0.5]
    )},
    "swin_t": {"resize": (224, 224), "normalize": (
        [0.5, 0.0, 0.5], [0.5, 0.5, 0.5]
    )},
    "vit_b_16": {"resize": (224, 224), "normalize": (
        [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
    )},
    "vit_b_32": {"resize": (224, 224), "normalize": (
        [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
    )},
}

# Fill in the ImageNet mean/std for entries that only specify a resize.
for model, settings in MODEL_DEFAULTS.items():
    if "normalize" not in settings:
        settings["normalize"] = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
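
The backfill loop above can be illustrated on a reduced table. This is a minimal sketch, assuming a lookup that falls back to the `"default"` entry for unlisted models; the helper name `settings_for` and the shortened `defaults` dict are hypothetical and do not appear in the original file.

```python
# ImageNet mean/std, as used for the "default" entry in the listing.
IMAGENET_NORM = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

# Reduced copy of the settings table, for illustration only.
defaults = {
    "default": {"resize": (224, 224), "normalize": IMAGENET_NORM},
    "inception_v3": {"resize": (299, 299)},
    "vit_b_16": {"resize": (224, 224),
                 "normalize": ([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])},
}

# Backfill: entries without "normalize" receive the ImageNet statistics.
for settings in defaults.values():
    settings.setdefault("normalize", IMAGENET_NORM)


def settings_for(model_name):
    """Hypothetical lookup: use the "default" entry for unlisted models."""
    return defaults.get(model_name, defaults["default"])


print(settings_for("inception_v3")["resize"])  # (299, 299)
print(settings_for("resnet50")["resize"])      # (224, 224)
```

After the loop every entry carries both keys, so downstream code can read `resize` and `normalize` without checking for missing values.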


# Custom transform classes
class CLAHETransform:
    def __init__(self, clip_limit=2.0, tile_grid_size=(8, 8)):
        self.clahe = cv2.createCLAHE(
            clipLimit=clip_limit,
            tileGridSize=tile_grid_size
        )

    def __call__(self, img):
        # CLAHE operates on a single channel, so convert to grayscale first.
        img = np.array(img.convert("L"))
        img = self.clahe.apply(img)
        return Image.fromarray(img).convert("RGB")


class CannyTransform:
    def __init__(self, threshold1=100, threshold2=200):
        self.threshold1 = threshold1
        self.threshold2 = threshold2

    def __call__(self, img):
        # Canny edge detection expects a single-channel 8-bit image.
        img = np.array(img.convert("L"))
        edges = cv2.Canny(img, self.threshold1, self.threshold2)
        return Image.fromarray(edges).convert("RGB")


class RGBAtoRGBTransform:
    def __call__(self, img):
        if img.mode == "RGBA":
            # Composite onto an opaque white background before dropping alpha.
            background = Image.new("RGBA", img.size, (255, 255, 255, 255))
            img = Image.alpha_composite(background, img).convert("RGB")
        else:
            img = img.convert("RGB")
        return img
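
The effect of compositing an RGBA pixel onto an opaque white background can be worked through with plain integer arithmetic. This is a sketch of the standard "over" blend for one pixel, not a reproduction of Pillow's internals; Pillow's own rounding may differ by one unit in some cases, and the function name `over_white` is hypothetical.

```python
def over_white(pixel):
    """Blend one (r, g, b, a) pixel over opaque white; return (r, g, b).

    Each channel is c * a/255 + 255 * (1 - a/255), computed with
    integers and rounded to the nearest value.
    """
    r, g, b, a = pixel

    def blend(c):
        return (c * a + 255 * (255 - a) + 127) // 255

    return (blend(r), blend(g), blend(b))


print(over_white((255, 0, 0, 255)))  # opaque red stays red: (255, 0, 0)
print(over_white((255, 0, 0, 0)))    # fully transparent becomes white: (255, 255, 255)
print(over_white((0, 0, 0, 128)))    # half-transparent black becomes mid gray: (127, 127, 127)
```

A white background means transparent regions end up white rather than black, which is what a plain `convert("RGB")` on an RGBA image would not guarantee.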


def get_image_files_from_zip(zip_file):
    """Returns a list of image file names in the ZIP file."""
    try:
        with zipfile.ZipFile(zip_file, "r") as zip_ref:
            file_list = [
                f for f in zip_ref.namelist() if f.lower().endswith(
                    (".png", ".jpg", ".jpeg", ".bmp", ".gif")
                )
            ]
        return file_list
    except zipfile.BadZipFile as exc:
        raise RuntimeError("Invalid ZIP file.") from exc
    except Exception as exc:
        raise RuntimeError("Error reading ZIP file.") from exc
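
Listing image members without extracting the archive can be tried on an in-memory ZIP. The sketch below assumes the same extension filter as the function above; `list_images` and the member names are made up for the demonstration.

```python
import io
import zipfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".gif")


def list_images(zip_source):
    """Return member names with an image extension (case-insensitive)."""
    with zipfile.ZipFile(zip_source, "r") as zf:
        return [n for n in zf.namelist() if n.lower().endswith(IMAGE_EXTENSIONS)]


# Build a small archive in memory; file contents are irrelevant here.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("cats/a.PNG", b"")
    zf.writestr("notes.txt", b"")
    zf.writestr("b.jpeg", b"")
buffer.seek(0)

names = list_images(buffer)
print(names)  # ['cats/a.PNG', 'b.jpeg']
```

Because only the archive's central directory is read, no image data is decompressed at this stage; `zipfile.ZipFile` accepts either a path or a file-like object.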


def load_model(model_name, device):
    """Loads a specified torchvision model and
    modifies it for feature extraction."""
    if model_name not in AVAILABLE_MODELS:
        raise ValueError(
            f"Unsupported model: {model_name}. "
            f"Available models: {list(AVAILABLE_MODELS.keys())}")
    try:
        if "weights" in inspect.signature(
                AVAILABLE_MODELS[model_name]).parameters:
            model = AVAILABLE_MODELS[model_name](weights="DEFAULT").to(device)
        else:
            model = AVAILABLE_MODELS[model_name]().to(device)
        logging.info("Model loaded")
    except Exception as e:
        logging.error(f"Failed to load model {model_name}: {e}")
        raise

    # Replace the classification head so the model outputs features.
    if hasattr(model, "fc"):
        model.fc = torch.nn.Identity()
    elif hasattr(model, "classifier"):
        model.classifier = torch.nn.Identity()
38333676a029
planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
goeckslab
parents:
diff
changeset
|
169 elif hasattr(model, "head"): |
38333676a029
planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
goeckslab
parents:
diff
changeset
|
170 model.head = torch.nn.Identity() |
38333676a029
planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
goeckslab
parents:
diff
changeset
|
171 |
38333676a029
planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
goeckslab
parents:
diff
changeset
|
172 model.eval() |
38333676a029
planemo upload for repository https://github.com/goeckslab/gleam.git commit f57ec1ad637e8299db265ee08be0fa9d4d829b93
goeckslab
parents:
diff
changeset
|
173 return model |
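The constructor call in load_model passes weights="DEFAULT" only when the model builder declares a `weights` parameter. The sketch below isolates that check. The two builders (`build_new_style`, `build_old_style`) and the `construct` helper are hypothetical stand-ins for torchvision constructors, chosen so the example runs with the standard library alone.

```python
import inspect


def build_new_style(weights=None):
    """Hypothetical builder that accepts a `weights` keyword."""
    return {"builder": "new", "weights": weights}


def build_old_style():
    """Hypothetical builder with no `weights` keyword."""
    return {"builder": "old", "weights": None}


def construct(builder):
    """Call `builder`, passing weights="DEFAULT" only if it is accepted."""
    if "weights" in inspect.signature(builder).parameters:
        return builder(weights="DEFAULT")
    return builder()


new_model = construct(build_new_style)
old_model = construct(build_old_style)
print(new_model)
print(old_model)
```

One limitation of this kind of check: a builder that accepts `weights` only through `**kwargs` does not list it by name, so it would take the no-argument path.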


def write_csv(output_csv, list_embeddings, ludwig_format=False):
    """Writes embeddings to a CSV file, optionally in Ludwig format."""
    with open(output_csv, mode="w", encoding="utf-8", newline="") as csv_file:
        csv_writer = csv.writer(csv_file)
        if list_embeddings:
            if ludwig_format:
                header = ["sample_name", "embedding"]
                formatted_embeddings = []
                for embedding in list_embeddings:
                    sample_name = embedding[0]
                    vector = embedding[1:]
                    embedding_str = " ".join(map(str, vector))
                    formatted_embeddings.append([sample_name, embedding_str])
                csv_writer.writerow(header)
                csv_writer.writerows(formatted_embeddings)
                logging.info("CSV created in Ludwig format")
            else:
                header = ["sample_name"] + [f"vector{i + 1}" for i in range(
                    len(list_embeddings[0]) - 1
                )]
                csv_writer.writerow(header)
                csv_writer.writerows(list_embeddings)
                logging.info("CSV created")
        else:
            csv_writer.writerow(["sample_name"] if not ludwig_format
                                else ["sample_name", "embedding"])
            logging.info("No valid images found. Empty CSV created.")
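With ludwig_format=True, write_csv stores each vector as a single space-separated string in the `embedding` column. The sketch below shows how a consumer might parse such a file back into floats. `parse_ludwig_csv` is a hypothetical helper that is not part of this module, and the sample rows are made up for illustration.

```python
import csv
import io


def parse_ludwig_csv(text):
    """Return {sample_name: [float, ...]} from Ludwig-style CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["sample_name"]: [float(v) for v in row["embedding"].split()]
        for row in reader
    }


# Made-up sample in the layout write_csv produces with ludwig_format=True.
sample = (
    "sample_name,embedding\n"
    "img1.png,0.5 1.0 -2.25\n"
    "img2.png,0.0 3.5 4.0\n"
)
parsed = parse_ludwig_csv(sample)
print(parsed)
```

In the default (non-Ludwig) layout each component has its own `vector1`, `vector2`, ... column, so no string splitting would be needed there.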


def extract_embeddings(
        model_name,
        apply_normalization,
        zip_file,
        file_list,
        transform_type="rgb"):
    """Extracts embeddings from images
    using batch processing or sequential fallback."""

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = load_model(model_name, device)
    model_settings = MODEL_DEFAULTS.get(model_name, MODEL_DEFAULTS["default"])
    resize = model_settings["resize"]
    normalize = model_settings.get("normalize", (
        [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
    ))

    # Define transform pipeline
    if transform_type == "grayscale":
        initial_transform = transforms.Grayscale(num_output_channels=3)
    elif transform_type == "clahe":
        initial_transform = CLAHETransform()
    elif transform_type == "edges":
        initial_transform = CannyTransform()
    elif transform_type == "rgba_to_rgb":
        initial_transform = RGBAtoRGBTransform()
    else:
        initial_transform = transforms.Lambda(lambda x: x.convert("RGB"))

    transform_list = [initial_transform,
                      transforms.Resize(resize),
                      transforms.ToTensor()]
    if apply_normalization:
        transform_list.append(transforms.Normalize(mean=normalize[0],
                                                   std=normalize[1]))
    transform = transforms.Compose(transform_list)

    class ImageDataset(Dataset):
        def __init__(self, zip_file, file_list, transform=None):
            self.zip_file = zip_file
            self.file_list = file_list
            self.transform = transform

        def __len__(self):
            return len(self.file_list)

        def __getitem__(self, idx):
            with zipfile.ZipFile(self.zip_file, "r") as zip_ref:
                with zip_ref.open(self.file_list[idx]) as file:
                    try:
                        image = Image.open(file)
                        if self.transform:
                            image = self.transform(image)
                        return image, os.path.basename(self.file_list[idx])
                    except Exception as e:
                        logging.warning(
                            "Skipping %s: %s", self.file_list[idx], e
                        )
                        return None, os.path.basename(self.file_list[idx])

    # Custom collate function
    def collate_fn(batch):
        batch = [item for item in batch if item[0] is not None]
        if not batch:
            return None, None
        images, names = zip(*batch)
        return torch.stack(images), names

    list_embeddings = []
    with torch.inference_mode():
        try:
            # Try DataLoader with reduced resource usage
            dataset = ImageDataset(zip_file, file_list, transform=transform)
            dataloader = DataLoader(
                dataset,
                batch_size=16,  # Reduced for lower memory usage
                num_workers=1,  # Reduced to minimize shared memory
                shuffle=False,
                # device is a torch.device, so test its type rather than
                # comparing the object itself to a string.
                pin_memory=device.type == "cuda",
                collate_fn=collate_fn,
            )
            for images, names in dataloader:
                if images is None:
                    continue
                images = images.to(device)
                embeddings = model(images).cpu().numpy()
                for name, embedding in zip(names, embeddings):
                    list_embeddings.append([name] + embedding.tolist())
        except RuntimeError as e:
            logging.warning(
                f"DataLoader failed: {e}. "
                "Falling back to sequential processing."
            )
            # Discard partial batch results so no image is recorded twice.
            list_embeddings = []
            # Fallback to sequential processing
            for file in file_list:
                with zipfile.ZipFile(zip_file, "r") as zip_ref:
                    with zip_ref.open(file) as img_file:
                        try:
                            image = Image.open(img_file)
                            image = transform(image)
                            input_tensor = image.unsqueeze(0).to(device)
                            embedding = model(
                                input_tensor
                            ).squeeze().cpu().numpy()
                            list_embeddings.append(
                                [os.path.basename(file)] + embedding.tolist()
                            )
                        except Exception as e:
                            logging.warning("Skipping %s: %s", file, e)

    return list_embeddings
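ImageDataset returns (None, name) for an image that fails to load, and collate_fn drops those entries before stacking the rest. The sketch below shows that filtering step in isolation. Plain Python lists stand in for tensors, so `torch.stack` is replaced by `list`; that substitution and the name `collate_skip_failed` are assumptions made only to keep the example dependency-free.

```python
def collate_skip_failed(batch):
    """Drop samples whose image is None; return (images, names) or (None, None)."""
    kept = [item for item in batch if item[0] is not None]
    if not kept:
        # Every sample in the batch failed; the caller is expected to skip it.
        return None, None
    images, names = zip(*kept)
    return list(images), names


mixed = [([0.1, 0.2], "a.png"), (None, "broken.png"), ([0.3, 0.4], "b.png")]
images, names = collate_skip_failed(mixed)
print(images, names)

all_failed = [(None, "x.png"), (None, "y.png")]
print(collate_skip_failed(all_failed))
```

Returning (None, None) for a fully failed batch is what lets the inference loop test `images is None` and continue with the next batch.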


def main(zip_file, output_csv, model_name, apply_normalization=False,
         transform_type="rgb", ludwig_format=False):
    """Main entry point for processing the zip file and
    extracting embeddings."""
    file_list = get_image_files_from_zip(zip_file)
    logging.info("Image files listed from ZIP")

    list_embeddings = extract_embeddings(
        model_name,
        apply_normalization,
        zip_file,
        file_list,
        transform_type
    )
    logging.info("Embeddings extracted")
    write_csv(output_csv, list_embeddings, ludwig_format)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Extract image embeddings.")
    parser.add_argument(
        "--zip_file",
        required=True,
        help="Path to the ZIP file containing images."
    )
    parser.add_argument(
        "--model_name",
        required=True,
        choices=AVAILABLE_MODELS.keys(),
        help="Model for embedding extraction."
    )
    parser.add_argument(
        "--normalize",
        action="store_true",
        help="Whether to apply normalization."
    )
    parser.add_argument(
        "--transform_type",
        required=True,
        help="Image transformation type."
    )
    parser.add_argument(
        "--output_csv",
        required=True,
        help="Path to the output CSV file"
    )
    parser.add_argument(
        "--ludwig_format",
        action="store_true",
        help="Prepare CSV file in Ludwig input format"
    )

    args = parser.parse_args()
    main(
        args.zip_file,
        args.output_csv,
        args.model_name,
        args.normalize,
        args.transform_type,
        args.ludwig_format
    )
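
The entry point passes six values to `main` by position, so the order of the parsed flags has to match the function signature exactly. The following is a minimal sketch, not part of the module, of how the same flags could be mapped to keyword arguments instead. The helper names `build_parser` and `main_kwargs` are hypothetical, and the `AVAILABLE_MODELS` keys shown here are placeholders, since the real registry is defined earlier in the file.

```python
import argparse

# Placeholder registry (assumption): the real AVAILABLE_MODELS is defined
# earlier in the module, and its keys may differ from these.
AVAILABLE_MODELS = {"resnet50": None, "vgg16": None}


def build_parser():
    """Hypothetical helper that declares the same flags as the entry point."""
    parser = argparse.ArgumentParser(description="Extract image embeddings.")
    parser.add_argument("--zip_file", required=True)
    parser.add_argument(
        "--model_name", required=True, choices=list(AVAILABLE_MODELS)
    )
    parser.add_argument("--normalize", action="store_true")
    parser.add_argument("--transform_type", required=True)
    parser.add_argument("--output_csv", required=True)
    parser.add_argument("--ludwig_format", action="store_true")
    return parser


def main_kwargs(argv):
    """Hypothetical helper: map parsed flags to main()'s parameter names."""
    args = build_parser().parse_args(argv)
    return {
        "zip_file": args.zip_file,
        "output_csv": args.output_csv,
        "model_name": args.model_name,
        # The flag is --normalize, but main() calls it apply_normalization.
        "apply_normalization": args.normalize,
        "transform_type": args.transform_type,
        "ludwig_format": args.ludwig_format,
    }


example = main_kwargs([
    "--zip_file", "images.zip",
    "--model_name", "resnet50",
    "--transform_type", "rgb",
    "--output_csv", "embeddings.csv",
    "--normalize",
])
print(example)
```

With a mapping like this, the call could be written as `main(**main_kwargs(sys.argv[1:]))`, which keeps working if the parameters of `main` are ever reordered. Both `store_true` flags default to `False` when omitted, which matches the defaults in the signature of `main`.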