perf: Allow running as streaming node on the streaming engine #343

Open
Oliver Borchert (borchero) wants to merge 5 commits into main from streamin

@borchero Oliver Borchert (borchero) commented May 24, 2026

Motivation

I realized that, when running on the streaming engine, our current Rust plugin forces in-memory execution. In some simple benchmarks, lifting this restriction also improves execution speed on the streaming engine by 10-20%. I did not measure memory consumption but would expect a significant drop.

Example plan prior to this PR

[image: old]

Same plan after this PR

[image: new]

Copilot AI left a comment

Pull request overview

This PR aims to make dataframely’s Polars validation path compatible with Polars’ streaming engine by registering the Rust plugin expressions as element-wise and adjusting the “required validation” expression to avoid forcing in-memory execution.

Changes:

  • Register plugin functions as element-wise to allow streaming execution (is_elementwise=True).
  • Adjust the Rust all_rules_required success-path to return a scalar/broadcastable true mask (intended to be streaming-friendly).
  • Document streaming-engine behavior caveats for lazy validation (early abort and non-deterministic reported failure).
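
To illustrate why an element-wise check can run as a streaming node, here is a toy pure-Python sketch. It does not use dataframely or the Polars plugin API; `validate_batch`, `stream_validate`, `ValidationError`, and the non-negativity rule are all hypothetical names chosen for illustration. The assumption is that a streaming engine feeds data in batches and that an element-wise node only ever needs the current batch.

```python
from collections.abc import Iterable, Iterator


class ValidationError(Exception):
    """Raised as soon as any batch contains a failing row."""


def validate_batch(batch: list[int]) -> list[bool]:
    # Hypothetical rule: every value must be non-negative.
    failures = [i for i, value in enumerate(batch) if value < 0]
    if failures:
        raise ValidationError(
            f"rule 'non_negative' failed at row {failures[0]} of the current batch"
        )
    # Success path: conceptually a single `True` broadcast to the batch length.
    return [True] * len(batch)


def stream_validate(batches: Iterable[list[int]]) -> Iterator[list[int]]:
    # Element-wise: each batch is checked and forwarded on its own, so only
    # one batch needs to be held in memory at a time.
    for batch in batches:
        mask = validate_batch(batch)
        yield [value for value, keep in zip(batch, mask) if keep]
```

In this sketch, batches that were validated before a failing batch have already been passed downstream when the error is raised, which is one way to picture the "early abort" caveat mentioned above.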

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/polars_plugin/mod.rs | Changes all_rules_required’s empty-failure return value construction to support element-wise/streaming execution. |
| dataframely/_plugin.py | Updates plugin registration flags and docstring for all_rules_required to reflect streaming/element-wise behavior. |
| dataframely/schema.py | Adds documentation warning about lazy validation behavior under the streaming engine. |
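
The "non-deterministic reported failure" caveat can be pictured with a small simulation. This is only a sketch under the assumption that batches may finish in any order on a streaming engine; `first_reported_failure`, the batch layout, and the rule names are hypothetical and not part of dataframely.

```python
from typing import Optional


def first_reported_failure(
    batches: list[dict[str, bool]], completion_order: list[int]
) -> Optional[str]:
    """Return the first failing rule seen when batches finish in the given order."""
    for index in completion_order:
        for rule, passed in batches[index].items():
            if not passed:
                # Validation aborts at the first failure it encounters.
                return rule
    return None


# Two batches, each violating a different (hypothetical) rule.
batches = [
    {"primary_key": True, "min_length": False},
    {"primary_key": False, "min_length": True},
]

# The same data reports a different rule depending on which batch finishes first.
report_a = first_reported_failure(batches, [0, 1])
report_b = first_reported_failure(batches, [1, 0])
```

Here `report_a` and `report_b` differ even though the input data is identical, so the reported rule should not be relied upon when more than one rule fails.
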
Comments suppressed due to low confidence (1)

dataframely/_plugin.py:82

  • Docstring inconsistency: the function is now registered with is_elementwise=True and described as broadcasting to the input length, but the Returns: section still says “A scalar boolean expression.” Update the return description to say that the expression yields a per-row boolean mask (or a value that is broadcast to the input length).
    - It broadcasts the resulting boolean series to the length of the input. This allows
      element-wise evaluation and making this a non-blocking operation on the streaming
      engine.

    Args:
        rules: The rules to evaluate.
        schema_name: The name of the schema being validated. This is used to produce
            better error messages.
        null_is_valid: Whether to treat null values as valid (i.e., `true`).

    Returns:
        A scalar boolean expression.
    """

Comment thread src/polars_plugin/mod.rs
Comment thread dataframely/schema.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

codecov Bot commented May 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (b339296) to head (c79baae).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #343   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           56        56           
  Lines         3404      3427   +23     
=========================================
+ Hits          3404      3427   +23     
