Fix: Add transformers for videos and bounding boxes by ad-claw000 · Pull Request #720 · aperture-data/aperturedb-python

ad-claw000 · 2026-05-20T11:23:38Z

Summary

Adds VideoProperties and BoundingBoxProperties transformers to handle AddVideo, AddBoundingBox, and AddPolygon commands.
Updates CommonProperties to process these new annotation types.

Verification

Manually verified that the property setting works properly for bounding boxes and polygons in the transformer workflow.

Fixes #335

Copilot

Pull request overview

Adds transformer support for non-image ingestion artifacts (videos and shape annotations) so common/static properties and derived metadata can be applied consistently during transformer-based ingestion pipelines (Fixes #335).

Changes:

Added VideoProperties transformer to compute basic blob-derived video metadata and ensure _Video.adb_data_source is indexed.
Added BoundingBoxProperties transformer to attach annotation-related properties to AddBoundingBox and AddPolygon commands.
Extended Transformer/CommonProperties to detect and apply properties to AddVideo, AddBoundingBox, and AddPolygon, and fixed ImageProperties property initialization/blob mapping.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
aperturedb/transformers/video_properties.py	New transformer computing and attaching video metadata properties.
aperturedb/transformers/bounding_box_properties.py	New transformer attaching annotation properties to bounding boxes and polygons.
aperturedb/transformers/transformer.py	Tracks indices for `AddVideo`, `AddBoundingBox`, and `AddPolygon` commands.
aperturedb/transformers/image_properties.py	Fixes property dict initialization and corrects blob indexing logic.
aperturedb/transformers/common_properties.py	Applies common/static properties to images, videos, bboxes, and polygons.
aperturedb/transformers/init.py	Introduces package exports for transformers (but currently has critical import issues).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…rmers __init__.py

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

aperturedb/transformers/bounding_box_properties.py:35

This transformer iterates over _add_bounding_box_index / _add_polygon_index, which are computed from the first transaction only. If later transactions contain a different number/order of AddBoundingBox/AddPolygon commands (common for per-image variable annotation counts), some annotations won’t get annotation_source/annotation_mode applied, and transactions with fewer annotations can raise IndexError/KeyError (then skip the rest of the updates). Prefer iterating over the current transaction’s command list (x[0]) and applying properties to every command whose key is AddBoundingBox or AddPolygon.

            for ic in getattr(self, "_add_bounding_box_index", []):
                src_properties = x[0][ic]["AddBoundingBox"].setdefault(
                    "properties", {})
                if self.annotation_source:
                    src_properties["annotation_source"] = self.annotation_source
                if self.annotation_mode:
                    src_properties["annotation_mode"] = self.annotation_mode

            for ic in getattr(self, "_add_polygon_index", []):
                src_properties = x[0][ic]["AddPolygon"].setdefault(
                    "properties", {})
                if self.annotation_source:
                    src_properties["annotation_source"] = self.annotation_source
                if self.annotation_mode:
                    src_properties["annotation_mode"] = self.annotation_mode

…tion Removes cached add_bounding_box_index and add_polygon_index from the first transaction in Transformer, as these counts frequently vary per-item (e.g. COCO bounding boxes). CommonProperties and BoundingBoxProperties now iterate over the current transactions commands directly, avoiding IndexError when subsequent items have fewer annotations, and ensuring proper property application when they have more. Added a test case to explicitly check behavior with variable annotation counts.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

…ansaction

- Remove unused pytest import in test_Transformers.py - Remove caching of AddImage and AddVideo indices in Transformer.__init__ to handle variable per-item counts - Update CommonProperties and ImageProperties to scan for AddVideo dynamically instead of relying on cached indices - Remove unused _blob_index_map from ImageProperties and VideoProperties - Add unit test coverage for VideoProperties to ensure dynamic properties apply correctly

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

…ally in remaining embeddings

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

aperturedb/transformers/clip_pytorch_embeddings.py:36

The descriptorset is only created if the first transaction contains an AddImage. If self.data[0] has no images (but later items do), this transformer will still append AddDescriptor commands in getitem, but may never create the descriptor set in the DB, causing ingestion failures. Consider scanning the first few items for an AddImage blob (or lazily initializing on first AddImage in getitem).

        # Let's sample some data to figure out the descriptorset we need.
        sample_blob = None
        for i, c in enumerate(self.data[0][0]):
            if list(c.keys())[0] == "AddImage":
                blob_idx = self._blob_index.index(i)
                sample_blob = self.data[0][1][blob_idx]
                break

        if sample_blob is not None:
            sample = generate_embedding(sample_blob)
            utils = self.get_utils()
            utils.add_descriptorset(
                self.search_set_name, dim=len(sample) // 4, metric=["CS"])

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

…itialization

Addresses review feedback from Copilot

ad-claw000 · 2026-06-08T23:25:17Z

Added test coverage for __init__.py lazy imports to fully address the request for more testing. See commit 0ac0af5.

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

…port

luisremis

Summary

This PR successfully adds transformer support for AddVideo, AddBoundingBox, and AddPolygon commands. By iterating over the transaction's commands dynamically rather than relying on cached indices, it robustly handles variable per-item annotation counts and cleanly applies both common properties and specialized metadata.

Findings

The lazy evaluation of descriptor sets in CLIPPyTorchEmbeddings and FacenetPyTorchEmbeddings correctly addresses initialization failures while retaining backward compatibility.
Replacing sys.exit(1) with ImportError in clip.py and facenet.py improves modularity and testing.
Tracking blob_index dynamically alongside the command checks in video_properties.py, image_properties.py, and the embedding transformers is much safer for transactions with mixed blob counts.
Exception handling around property application properly catches errors (e.g., corrupt images) without killing the entire ingestion pipeline.
Tests provide extensive coverage for variable annotation lengths, exception scenarios, and lazy initialization retries.

Verdict

The logic is sound, backward compatibility is maintained, and CI checks are passing. The transition to dynamic transaction scanning is a solid improvement. Since I was asked for a general review without an explicit approval request, I am leaving these notes as a comment.

luisremis

add more testing. for instance, test and verify that initialization failure scenarios in clip/facenet embeddings are thoroughly asserted against different backend conditions.

ad-claw000 · 2026-06-10T10:15:53Z

Added unit tests to verify that initialization failure scenarios in clip/facenet embeddings are thoroughly asserted against backend conditions (transient exceptions during get_descriptorset_list), as requested. See commit 5f83f8a.

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

ad-claw000 · 2026-06-12T09:26:43Z

Addressed feedback from @copilot-pull-request-reviewer[bot]:

Updated BoundingBoxProperties to iterate directly over the current transaction's command list.
Added lazy initialization to CLIPPyTorchEmbeddings for AddImage in getitem.

Updated in commit 8c9555e.

ad-claw000 · 2026-06-13T09:26:25Z

Addressed feedback from @copilot-pull-request-reviewer[bot]:

Noted the reviewer request. The issue described was already resolved in a previous commit by iterating directly over the current transaction commands (x[0]). I have added an inline comment to document this logic for variable annotation counts to prevent future regressions.

Updated in commit 1b4ebeb

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

+                    src_properties["adb_image_id"] = str(
+                        src_properties["id"] if "id" in src_properties else uuid.uuid4().hex)


+                    src_properties["adb_video_id"] = str(
+                        src_properties["id"] if "id" in src_properties else uuid.uuid4().hex)


+                    try:
+                        if success or self.search_set_name in utils.get_descriptorset_list():
+                            self._descriptorset_initialized = True
+                    except Exception:
+                        pass


+                    try:
+                        if success or self.search_set_name in utils.get_descriptorset_list():
+                            self._descriptorset_initialized = True
+                    except Exception:
+                        pass


ad-claw000 · 2026-06-14T09:26:01Z

Addressed feedback in recent commit

ad-claw000 · 2026-06-15T09:29:12Z

Addressed feedback from @copilot-pull-request-reviewer:

Refactored imports in __init__.py to use lazy loading.
Removed duplicated loops and index caching from transformer.py and property transformers.
Implemented O(1) tracking for blob positions (blob_index += 1).
Added robust unit test coverage for VideoProperties and variable annotation counts.
Retained the pytest import as it is now utilized in the new tests.

Updated in commit 1b4ebeb

luisremis

Summary

This PR successfully adds transformer support for AddVideo, AddBoundingBox, and AddPolygon objects, extending the current indexing pipeline to non-image artifacts. It also refactors CommonProperties, ImageProperties, and the embedding transformers to iterate dynamically over transaction commands rather than relying on an index map cached from the first item.

Findings

Correctness:
- Iterating over each transaction's cmd_dict correctly handles heterogeneous batches where the number of annotations (e.g., bounding boxes) differs per item.
- The careful tracking of blob_index across blob-consuming commands (AddImage, AddDescriptor, AddVideo, AddBlob) ensures alignment between JSON commands and binary payload chunks even if an exception occurs mid-transaction.
- Hardcoding dim=512 for CLIP and Facenet models instead of deriving it from a sample embedding (len(sample) // 4) during initialization is a clean solution to avoid dependency on an initial image blob.
Maintainability & Performance:
- Adding __getattr__ into transformers/__init__.py properly implements lazy-loading of transformers, avoiding the penalty of importing heavy libraries (like torch and PyTorch model weights) unless explicitly invoked.
- Safely wrapping the dynamic calculations in try/except with logger.exception prevents ingestion failures when occasional corrupted blobs or malformed transactions occur.

Verdict

The code is solid, handles edge cases gracefully, addresses previous reviewer feedback accurately, and includes comprehensive test coverage. Looks good to go!

feat: Add transformers for videos and bounding boxes

5aceeb7

Copilot AI review requested due to automatic review settings May 20, 2026 11:23

Copilot started reviewing on behalf of ad-claw000 May 20, 2026 11:24 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

ad-claw000 added 2 commits May 20, 2026 11:49

Fix pre-commit failures and remove optional dependencies from transfo…

7c2d8ce

…rmers __init__.py

Fix: Address transformer PR review comments

6278058

Copilot AI review requested due to automatic review settings May 21, 2026 01:53

Copilot started reviewing on behalf of ad-claw000 May 21, 2026 01:54 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread aperturedb/transformers/transformer.py Outdated

Comment thread aperturedb/transformers/common_properties.py Outdated

Comment thread aperturedb/transformers/bounding_box_properties.py Outdated

ad-claw000 added 2 commits May 21, 2026 02:51

Fix pre-commit issues

6e8010b

Copilot AI review requested due to automatic review settings May 21, 2026 04:18

Copilot started reviewing on behalf of ad-claw000 May 21, 2026 04:19 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread test/test_Transformers.py

Comment thread aperturedb/transformers/transformer.py

Comment thread aperturedb/transformers/video_properties.py

ad-claw000 added 2 commits May 21, 2026 05:49

fix(transformers): evaluate image and video counts dynamically per tr…

ab6958d

…ansaction

Copilot AI review requested due to automatic review settings May 21, 2026 10:31

Copilot started reviewing on behalf of ad-claw000 May 21, 2026 10:31 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread aperturedb/transformers/transformer.py

ad-claw000 and others added 2 commits May 21, 2026 10:41

fix(transformers): remove unused _add_image_index and compute dynamic…

5c3f442

…ally in remaining embeddings

fix: apply autopep8 formatting

30721e5

Copilot AI review requested due to automatic review settings May 21, 2026 12:21

Copilot started reviewing on behalf of ad-claw000 May 21, 2026 12:22 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread aperturedb/transformers/facenet_pytorch_embeddings.py Outdated

Comment thread aperturedb/transformers/clip_pytorch_embeddings.py Outdated

Comment thread test/test_Transformers.py Outdated

Fix remaining unused variable and unused index review comments

d28a4fd

luisremis assigned ad-claw000 May 21, 2026

Merge branch 'develop' into fix/issue-335

32f1400

Copilot AI review requested due to automatic review settings May 22, 2026 18:54

Copilot started reviewing on behalf of luisremis May 22, 2026 18:55 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread aperturedb/transformers/facenet_pytorch_embeddings.py Outdated

Copilot AI review requested due to automatic review settings June 6, 2026 09:27

Copilot started reviewing on behalf of ad-claw000 June 6, 2026 09:27 View session

Copilot AI reviewed Jun 6, 2026

View reviewed changes

Comment thread aperturedb/transformers/clip_pytorch_embeddings.py

Comment thread aperturedb/transformers/facenet_pytorch_embeddings.py

OpenClaw Bot added 3 commits June 6, 2026 13:16

fix(transformers): avoid computing embeddings before descriptorset in…

7cff2ca

…itialization

fix: address review comments on PR #720

0ef0f7e

Addresses review feedback from Copilot

test: add coverage for transformers __init__.py

6dfdbed

Copilot AI review requested due to automatic review settings June 8, 2026 23:24

Copilot started reviewing on behalf of ad-claw000 June 8, 2026 23:25 View session

style: apply autopep8 formatting

0ac0af5

Copilot AI reviewed Jun 8, 2026

View reviewed changes

Comment thread aperturedb/transformers/clip_pytorch_embeddings.py Outdated

Comment thread aperturedb/transformers/facenet_pytorch_embeddings.py Outdated

Comment thread test/test_Transformers.py Outdated

fix(transformers): handle missing _ref and remove duplicate pytest im…

7ceee2e

…port

luisremis reviewed Jun 10, 2026

View reviewed changes

luisremis requested changes Jun 10, 2026

View reviewed changes

test: add backend error simulation for descriptorset initialization

5f83f8a

Copilot AI review requested due to automatic review settings June 10, 2026 10:15

Copilot started reviewing on behalf of ad-claw000 June 10, 2026 10:15 View session

Copilot AI reviewed Jun 10, 2026

View reviewed changes

Comment thread aperturedb/transformers/__init__.py

Comment thread aperturedb/transformers/video_properties.py Outdated

fix: address review comments on transformers init and properties

8c9555e

fix: address review comments on PR #720

1b4ebeb

Copilot AI review requested due to automatic review settings June 13, 2026 09:26

Copilot started reviewing on behalf of ad-claw000 June 13, 2026 09:26 View session

Copilot AI reviewed Jun 13, 2026

View reviewed changes

luisremis reviewed Jun 15, 2026

View reviewed changes

Merge branch 'develop' into fix/issue-335

4f2fd11

		src_properties["adb_image_id"] = str(
		src_properties["id"] if "id" in src_properties else uuid.uuid4().hex)

		src_properties["adb_video_id"] = str(
		src_properties["id"] if "id" in src_properties else uuid.uuid4().hex)

Conversation

ad-claw000 commented May 20, 2026

Summary

Verification

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

ad-claw000 commented Jun 8, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

luisremis left a comment

Choose a reason for hiding this comment

Summary

Findings

Verdict

Uh oh!

luisremis left a comment

Choose a reason for hiding this comment

Uh oh!

ad-claw000 commented Jun 10, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

ad-claw000 commented Jun 12, 2026