Feature/emt 391 improvements#3049
Conversation
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. |
|
/gemini review |
☂️ Code Coverage
Overall Coverage
New FilesNo new covered files... Modified Files
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/EMT-391 #3049 +/- ##
===================================================
- Coverage 86.20% 86.15% -0.05%
===================================================
Files 493 489 -4
Lines 43774 43513 -261
===================================================
- Hits 37736 37490 -246
+ Misses 6038 6023 -15
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review
This pull request refactors the entity matching aliasing module by replacing the builder factory pattern with a callable builder provider, simplifying rule discovery to a static list, and removing unused context builders and rule classes. Additionally, error classes now inherit from ToolkitValueError, and file operations have been updated to use safe_read and safe_write utilities. Feedback was provided regarding a lambda assignment in bootstrapper.py that lacks explicit type hints and violates PEP 8 (E731).
| if builder_provider is None: | ||
| registry = LocalRuleDefinitionRegistry.bootstrap(LocalRulesDiscovery.create()) | ||
| composer = DefaultExpressionComposer(provide_aliasing_composition_config()) | ||
| resolved_builder_provider = lambda: DefaultAliasingKuiperBuilder(registry=registry, composer=composer) |
There was a problem hiding this comment.
Assigning a lambda expression to a variable violates PEP 8 (E731) and lacks explicit type hints, which goes against the repository style guide's emphasis on strong typing and requiring type hints for all functions.\n\nUsing a nested helper function with explicit type hints is more idiomatic and type-safe.
def resolved_builder_provider() -> AliasingKuiperBuilder:\n return DefaultAliasingKuiperBuilder(registry=registry, composer=composer)References
- Strong Typing: Use type hints extensively with pyright. Required: All functions, methods, and class attributes must have type hints. (link)
There was a problem hiding this comment.
@gemini-code-assist addressed now in ce59665
What do you think about the other changes and how they match the issues in the description?
There was a problem hiding this comment.
The changes in commit ce59665 and the broader set of refactors effectively address the structural and stylistic concerns raised in the PR description.
Specifically, the following improvements align well with the toolkit's established patterns:
- Simplification of DI and Abstraction Layers: Removing the
AliasingKuiperBuilderFactoryand the various*ContextBuilderclasses significantly flattens the architecture, making the code much easier to trace and maintain. The move to a simpleCallableprovider for the builder is a pragmatic, idiomatic approach. - Explicit Rule Discovery: Replacing the dynamic
importlibdiscovery with an explicit list of definitions inrules_discovery.pyimproves static analysis, removes potential runtime import failures, and aligns with the toolkit's preference for explicit registration. - Exception Hierarchy Alignment: Inheriting from
ToolkitValueErrorensures that errors are caught by the CLI's top-level handler, providing a consistent user experience compared to raw tracebacks. - Standardized I/O: Adopting
pathlib.Pathand the toolkit'ssafe_read/safe_writehelpers resolves the inconsistencies with the rest of the codebase. - Code Cleanup: Removing the dead methods, unused
Ruleclasses, and redundant computations cleans up the surface area of the feature significantly.
Overall, these changes reconcile the new feature with the existing toolkit design culture, making it much more maintainable for future contributors.
| except FileNotFoundError as e: | ||
| raise YamlReadError( | ||
| "File not found", | ||
| file_path=file_path, | ||
| file_path=str(file_path), | ||
| ) from e | ||
| except yaml.YAMLError as e: | ||
| raise YamlReadError( | ||
| f"Invalid YAML syntax: {e!s}", | ||
| file_path=file_path, | ||
| file_path=str(file_path), | ||
| ) from e | ||
| except Exception as e: | ||
| raise YamlReadError( | ||
| f"Error reading file: {e!s}", | ||
| file_path=file_path, | ||
| file_path=str(file_path), |
There was a problem hiding this comment.
file_path.as_posix() ensures correct printing on Windows :)
Description
We want to add the EM aliasing deployment pack #3005
This is a giant PR with some styling shortcomings and some dead code also. A claude summary of possible actions was given as follows:
Overall verdict
The functionality is well-tested (98% patch coverage) and cleanly isolated, but the code reads as if it were written in another language's idiom — Java/C#-style enterprise patterns transplanted onto a Python codebase that has its own, much lighter conventions. The single biggest risk for a team unfamiliar with the codebase is that future maintainers will have two competing styles to reconcile. None of this blocks merge given the feature flag, but several items are worth fixing before this becomes the template others copy.
Structural deviations from the codebase
This is the heart of it. The existing toolkit is pragmatic Python; this PR imports a different design culture. Concretely:
A dependency-injection container written by hand.
bootstrap/bootstrapper.py is ~10 provide_* factory functions wiring objects together with x or provide_default() fallbacks. This is a Spring/Guice idiom. Nothing else in the toolkit does this — collaborators construct objects directly. For the actual call path (one command, one facade), this entire file could be a couple of constructor calls.
Abstraction layers with exactly one implementation.
There are five ABCs in the new tree (RuleDefinitionRegistry, RulesDiscovery, AliasingKuiperBuilder, AliasingKuiperBuilderFactory, ExpressionComposer), each with a single Default*/Local* concrete class. A Builder, plus a BuilderFactory that does nothing but call the builder's constructor, plus a Facade over the factory, is three layers to do one thing. The toolkit's existing commands subclass ToolkitCommand and implement execute() — flat and direct. A reader coming from the rest of the repo will spend real time tracing facade → factory → builder → registry → composer to find where work actually happens.
Filesystem-glob + importlib rule discovery.
registry/rules_discovery.py globs rules/*.py and dynamically imports each module to find RuleDefinition subclasses. This is the only place in _cdf_tk that uses importlib.import_module for discovery. The codebase's established idiom for the same problem is subclasses() (used in resource_ios, builders, yaml_classes, etc.). The dynamic version is slower (cold-start is already a known toolkit pain point — see CDF-27851), harder to trace statically, and silently swallows import failures into log warnings, so a broken rule just vanishes from the registry instead of failing loudly. A plain explicit registry dict, or subclasses() after importing the package, would match convention and remove ~80 lines.
Single-string value objects.
RuleName and RuleDescription in rules/base.py wrap one str each with a post_init non-empty check. Per-rule context *Builder classes (e.g. CaseTransformationContextBuilder) do the same for one field. This is ceremony Python usually expresses with a validated field or a guard clause.
Custom exception hierarchy that bypasses the toolkit's.
io/errors.py defines YamlReadError / InvalidRuleFormatError deriving from bare Exception, and the builder/registry raise BuilderConstraintError, RuleDefinitionNotFoundError, and plain ValueError/FileNotFoundError. The CLI's top-level handler (_cdf.py:178) catches ToolkitError to render clean error messages. None of these inherit from ToolkitError, so a user pointing the command at a malformed YAML gets a raw Python traceback instead of a formatted toolkit error. This is a user-facing behavioral deviation, not just stylistic — worth fixing. The repo already has ToolkitFileNotFoundError, ToolkitValueError, etc. for exactly this.
Not using the toolkit's file I/O helpers.
YamlRulesReader.read_file takes a str path (rest of the codebase passes pathlib.Path) and uses raw open() + yaml.safe_load; the command writes output with Path.write_text. The toolkit has read_yaml_file, safe_read, safe_write, and yaml_safe_dump (used throughout modules.py and elsewhere) that handle encoding and error wrapping consistently.
Concrete bugs / dead code
Bump
Changelog
Added/Changed/Improved/Removed/Fixed
See commits, this does not go into the changelog anyway, since we are merging this into another PR branch.