Feature/emt 391 improvements by olacognite · Pull Request #3049 · cognitedata/toolkit

olacognite · 2026-06-03T13:03:48Z

Description

We want to add the EM aliasing deployment pack #3005
This is a giant PR with some styling shortcomings and some dead code also. A claude summary of possible actions was given as follows:

Overall verdict

The functionality is well-tested (98% patch coverage) and cleanly isolated, but the code reads as if it were written in another language's idiom — Java/C#-style enterprise patterns transplanted onto a Python codebase that has its own, much lighter conventions. The single biggest risk for a team unfamiliar with the codebase is that future maintainers will have two competing styles to reconcile. None of this blocks merge given the feature flag, but several items are worth fixing before this becomes the template others copy.

Structural deviations from the codebase

This is the heart of it. The existing toolkit is pragmatic Python; this PR imports a different design culture. Concretely:

A dependency-injection container written by hand.

bootstrap/bootstrapper.py is ~10 provide_* factory functions wiring objects together with x or provide_default() fallbacks. This is a Spring/Guice idiom. Nothing else in the toolkit does this — collaborators construct objects directly. For the actual call path (one command, one facade), this entire file could be a couple of constructor calls.

Abstraction layers with exactly one implementation.

There are five ABCs in the new tree (RuleDefinitionRegistry, RulesDiscovery, AliasingKuiperBuilder, AliasingKuiperBuilderFactory, ExpressionComposer), each with a single Default*/Local* concrete class. A Builder, plus a BuilderFactory that does nothing but call the builder's constructor, plus a Facade over the factory, is three layers to do one thing. The toolkit's existing commands subclass ToolkitCommand and implement execute() — flat and direct. A reader coming from the rest of the repo will spend real time tracing facade → factory → builder → registry → composer to find where work actually happens.

Filesystem-glob + importlib rule discovery.

registry/rules_discovery.py globs rules/*.py and dynamically imports each module to find RuleDefinition subclasses. This is the only place in _cdf_tk that uses importlib.import_module for discovery. The codebase's established idiom for the same problem is subclasses() (used in resource_ios, builders, yaml_classes, etc.). The dynamic version is slower (cold-start is already a known toolkit pain point — see CDF-27851), harder to trace statically, and silently swallows import failures into log warnings, so a broken rule just vanishes from the registry instead of failing loudly. A plain explicit registry dict, or subclasses() after importing the package, would match convention and remove ~80 lines.

Single-string value objects.

RuleName and RuleDescription in rules/base.py wrap one str each with a post_init non-empty check. Per-rule context *Builder classes (e.g. CaseTransformationContextBuilder) do the same for one field. This is ceremony Python usually expresses with a validated field or a guard clause.

Custom exception hierarchy that bypasses the toolkit's.

io/errors.py defines YamlReadError / InvalidRuleFormatError deriving from bare Exception, and the builder/registry raise BuilderConstraintError, RuleDefinitionNotFoundError, and plain ValueError/FileNotFoundError. The CLI's top-level handler (_cdf.py:178) catches ToolkitError to render clean error messages. None of these inherit from ToolkitError, so a user pointing the command at a malformed YAML gets a raw Python traceback instead of a formatted toolkit error. This is a user-facing behavioral deviation, not just stylistic — worth fixing. The repo already has ToolkitFileNotFoundError, ToolkitValueError, etc. for exactly this.

Not using the toolkit's file I/O helpers.

YamlRulesReader.read_file takes a str path (rest of the codebase passes pathlib.Path) and uses raw open() + yaml.safe_load; the command writes output with Path.write_text. The toolkit has read_yaml_file, safe_read, safe_write, and yaml_safe_dump (used throughout modules.py and elsewhere) that handle encoding and error wrapping consistently.

Concrete bugs / dead code

Dead method:YamlRulesReader._validate_rules_key_exists (lines ~122–140) is defined but never called — it was superseded by _validate_root_structure. Remove it.
Dead computation: in aliasing_kuiper_builder.py._resolve_composite_rules, resolved_specs is built up (.append(...)) and then never used — only expanded_sub_rules feeds the result. Either it's leftover from a refactor or it signals an intended check that got dropped. Worth confirming with the author which.
Non-exhaustive match:CaseTransformationRuleDefinition.create_kuiper_macro matches UPPERCASE/LOWERCASE with no fallback. If a third CaseStrategy is ever added, expression is unbound → UnboundLocalError at runtime rather than a clear error. Add an else/case _.
Unused public surface: the Rule / RuleName / RuleDescription classes in rules/base.py aren't referenced anywhere in shipping code (the real flow uses the separate AliasingRule dataclass). Two parallel "rule" types in one feature is confusing; drop the unused one.
O(n²) duplicate check:_validate_rules does rule_names.count(name) inside a set comprehension over the same list. Trivial at current scale, but collections.Counter is the idiomatic one-liner.

Bump

Patch
Skip

Changelog

Added/Changed/Improved/Removed/Fixed

See commits, this does not go into the changelog anyway, since we are merging this into another PR branch.

…factory

gemini-code-assist · 2026-06-03T13:04:09Z

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

olacognite · 2026-06-03T13:09:20Z

/gemini review

github-actions · 2026-06-03T13:10:32Z

☂️ Code Coverage

current status: ✅

Overall Coverage

Statements	Covered	Coverage	Threshold	Status
43513	37490	86%	80%	🟢

New Files

No new covered files...

Modified Files

File	Coverage	Status
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/api/facade.py	100%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/assembly/aliasing_kuiper_builder.py	96%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/bootstrap/bootstrapper.py	100%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/io/errors.py	97%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/io/yaml_rules_reader.py	88%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/registry/registry.py	96%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/registry/rules_discovery.py	100%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/rules/base.py	100%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/rules/case_transformation.py	93%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/rules/character_substitution.py	93%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/rules/composite.py	98%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/rules/leading_zero_normalization.py	100%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/rules/prefix_suffix.py	100%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/rules/regex_substitution.py	100%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/aliasing/rules/value_expansion.py	96%	🟢
cognite_toolkit/_cdf_tk/commands/entity_matching/entity_matching.py	100%	🟢
TOTAL	97%	🟢

updated for commit: 6ddbed5 by action🐍

codecov · 2026-06-03T13:10:59Z

Codecov Report

❌ Patch coverage is 96.29630% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.15%. Comparing base (10e49e3) to head (6ddbed5).

Files with missing lines	Patch %	Lines
...ity_matching/aliasing/rules/case_transformation.py	0.00%	2 Missing ⚠️

Additional details and impacted files

@@                 Coverage Diff                 @@
##           feature/EMT-391    #3049      +/-   ##
===================================================
- Coverage            86.20%   86.15%   -0.05%     
===================================================
  Files                  493      489       -4     
  Lines                43774    43513     -261     
===================================================
- Hits                 37736    37490     -246     
+ Misses                6038     6023      -15

Files with missing lines	Coverage Δ
...tk/commands/entity_matching/aliasing/api/facade.py	`100.00% <100.00%> (ø)`
...ching/aliasing/assembly/aliasing_kuiper_builder.py	`96.20% <100.00%> (-0.39%)`	⬇️
...entity_matching/aliasing/bootstrap/bootstrapper.py	`100.00% <100.00%> (ø)`
..._tk/commands/entity_matching/aliasing/io/errors.py	`96.87% <100.00%> (+0.10%)`	⬆️
...s/entity_matching/aliasing/io/yaml_rules_reader.py	`87.50% <100.00%> (+5.50%)`	⬆️
...ands/entity_matching/aliasing/registry/registry.py	`95.65% <100.00%> (+0.19%)`	⬆️
...tity_matching/aliasing/registry/rules_discovery.py	`100.00% <100.00%> (+14.28%)`	⬆️
...tk/commands/entity_matching/aliasing/rules/base.py	`100.00% <ø> (ø)`
..._matching/aliasing/rules/character_substitution.py	`93.33% <ø> (-1.41%)`	⬇️
...mmands/entity_matching/aliasing/rules/composite.py	`98.03% <ø> (-0.27%)`	⬇️
... and 6 more

... and 16 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

gemini-code-assist

Code Review

This pull request refactors the entity matching aliasing module by replacing the builder factory pattern with a callable builder provider, simplifying rule discovery to a static list, and removing unused context builders and rule classes. Additionally, error classes now inherit from ToolkitValueError, and file operations have been updated to use safe_read and safe_write utilities. Feedback was provided regarding a lambda assignment in bootstrapper.py that lacks explicit type hints and violates PEP 8 (E731).

gemini-code-assist · 2026-06-03T13:12:00Z

+    if builder_provider is None:
+        registry = LocalRuleDefinitionRegistry.bootstrap(LocalRulesDiscovery.create())
+        composer = DefaultExpressionComposer(provide_aliasing_composition_config())
+        resolved_builder_provider = lambda: DefaultAliasingKuiperBuilder(registry=registry, composer=composer)


Assigning a lambda expression to a variable violates PEP 8 (E731) and lacks explicit type hints, which goes against the repository style guide's emphasis on strong typing and requiring type hints for all functions.\n\nUsing a nested helper function with explicit type hints is more idiomatic and type-safe.

def resolved_builder_provider() -> AliasingKuiperBuilder:\n return DefaultAliasingKuiperBuilder(registry=registry, composer=composer)

References

Strong Typing: Use type hints extensively with pyright. Required: All functions, methods, and class attributes must have type hints. ^(link)

@gemini-code-assist addressed now in ce59665

What do you think about the other changes and how they match the issues in the description?

The changes in commit ce59665 and the broader set of refactors effectively address the structural and stylistic concerns raised in the PR description.

Specifically, the following improvements align well with the toolkit's established patterns:

Simplification of DI and Abstraction Layers: Removing the AliasingKuiperBuilderFactory and the various *ContextBuilder classes significantly flattens the architecture, making the code much easier to trace and maintain. The move to a simple Callable provider for the builder is a pragmatic, idiomatic approach.

Explicit Rule Discovery: Replacing the dynamic importlib discovery with an explicit list of definitions in rules_discovery.py improves static analysis, removes potential runtime import failures, and aligns with the toolkit's preference for explicit registration.

Exception Hierarchy Alignment: Inheriting from ToolkitValueError ensures that errors are caught by the CLI's top-level handler, providing a consistent user experience compared to raw tracebacks.

Standardized I/O: Adopting pathlib.Path and the toolkit's safe_read/safe_write helpers resolves the inconsistencies with the rest of the codebase.

Code Cleanup: Removing the dead methods, unused Rule classes, and redundant computations cleans up the surface area of the feature significantly.

Overall, these changes reconcile the new feature with the existing toolkit design culture, making it much more maintainable for future contributors.

ronpal · 2026-06-04T07:19:01Z

        except FileNotFoundError as e:
            raise YamlReadError(
                "File not found",
-                file_path=file_path,
+                file_path=str(file_path),
            ) from e
        except yaml.YAMLError as e:
            raise YamlReadError(
                f"Invalid YAML syntax: {e!s}",
-                file_path=file_path,
+                file_path=str(file_path),
            ) from e
        except Exception as e:
            raise YamlReadError(
                f"Error reading file: {e!s}",
-                file_path=file_path,
+                file_path=str(file_path),


file_path.as_posix() ensures correct printing on Windows :)

Ola Liabøtrø added 11 commits June 3, 2026 14:01

remove YamlRulesReader._validate_rules_key_exists

a734c74

remove unused resolved_specs

f6044ba

raise valueError on unsupported case strategy

adddab8

remove Rule / RuleName RuleDescription, AliasingRule is the one we use

04613b7

Improve peformance of duplicate check

c2a9ee0

inherit from toolkit errors

aa81c04

reuse yaml reader utils

5bc6226

dynamic rule discovery -> explicit

3b4349b

simplify bootstrapper dependency injection

caf3399

simplify AliasingFaced, depend on builderProvider instead of removed …

3bb2725

…factory

remove context builders

e76d740

gemini-code-assist Bot reviewed Jun 3, 2026

View reviewed changes

typed default builder in bootstrapper

ce59665

ronpal reviewed Jun 4, 2026

View reviewed changes

use as_posix() in exceptions to print nicer on windows

acb7d28

olacognite requested a review from ronpal June 4, 2026 10:29

lint

6ddbed5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature/emt 391 improvements#3049

Feature/emt 391 improvements#3049
olacognite wants to merge 14 commits into
feature/EMT-391from
feature/EMT-391_improvements

olacognite commented Jun 3, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot commented Jun 3, 2026

Uh oh!

olacognite commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Uh oh!

olacognite Jun 3, 2026

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Uh oh!

ronpal Jun 4, 2026

Uh oh!

olacognite Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

olacognite commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Overall verdict

Structural deviations from the codebase

A dependency-injection container written by hand.

Abstraction layers with exactly one implementation.

Filesystem-glob + importlib rule discovery.

Single-string value objects.

Custom exception hierarchy that bypasses the toolkit's.

Not using the toolkit's file I/O helpers.

Concrete bugs / dead code

Bump

Changelog

Added/Changed/Improved/Removed/Fixed

Uh oh!

gemini-code-assist Bot commented Jun 3, 2026

Uh oh!

olacognite commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

☂️ Code Coverage

Overall Coverage

New Files

Modified Files

Uh oh!

codecov Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

olacognite Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

ronpal Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

olacognite Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

olacognite commented Jun 3, 2026 •

edited

Loading

github-actions Bot commented Jun 3, 2026 •

edited

Loading

codecov Bot commented Jun 3, 2026 •

edited

Loading