
Support None comparisons for null expressions #1489

Open
zeel2104 wants to merge 1 commit into apache:main from zeel2104:fix-none-null-comparison


@zeel2104

Which issue does this PR close?

Closes #1483.

Rationale for this change

Comparing an expression to None with == currently builds a regular equality comparison against a null literal. That comparison follows SQL null semantics: it evaluates to null rather than true, so it matches no rows when used in a filter, not even rows whose value is null. This is surprising for Python users, especially since comparing against other scalar values works as expected and the equivalent .is_null() expression does return the expected rows.
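For reviewers less familiar with SQL null semantics, here is a minimal illustration of the behavior described above. It uses Python's built-in sqlite3 module rather than DataFusion, purely as a stand-in that follows the same rule for = NULL versus IS NULL; the table and column names are made up for the example.

```python
import sqlite3

# In-memory table with one nullable integer column: values 1, 2 and NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

# "a = NULL" evaluates to NULL for every row, so the filter keeps nothing.
eq_null = conn.execute("SELECT COUNT(*) FROM t WHERE a = NULL").fetchone()[0]

# "a IS NULL" is the predicate that actually matches the null row.
is_null = conn.execute("SELECT COUNT(*) FROM t WHERE a IS NULL").fetchone()[0]

print(eq_null, is_null)  # prints: 0 1
conn.close()
```

This is the gap between an equality comparison against a null literal and .is_null().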

What changes are included in this PR?

  • Special-case Expr.__eq__ so expr == None maps to expr.is_null()
  • Special-case Expr.__ne__ so expr != None maps to expr.is_not_null()
  • Add regression tests covering == None and != None on nullable integer and string columns
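The special-casing in the first two bullets can be sketched roughly as follows. This is a toy model, not the code in this PR: ToyExpr and its sql attribute are hypothetical names that only record a textual description of the expression, whereas the real Expr wraps DataFusion's expression type.

```python
from typing import Any


class ToyExpr:
    """Hypothetical stand-in for an expression node (not the real Expr)."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def is_null(self) -> "ToyExpr":
        return ToyExpr(f"({self.sql} IS NULL)")

    def is_not_null(self) -> "ToyExpr":
        return ToyExpr(f"({self.sql} IS NOT NULL)")

    def __eq__(self, rhs: Any) -> "ToyExpr":  # type: ignore[override]
        # None is routed to the null predicate instead of "= NULL".
        if rhs is None:
            return self.is_null()
        return ToyExpr(f"({self.sql} = {rhs!r})")

    def __ne__(self, rhs: Any) -> "ToyExpr":  # type: ignore[override]
        # None is routed to the not-null predicate instead of "!= NULL".
        if rhs is None:
            return self.is_not_null()
        return ToyExpr(f"({self.sql} != {rhs!r})")


a = ToyExpr("a")
print((a == None).sql)  # noqa: E711  prints: (a IS NULL)
print((a != None).sql)  # noqa: E711  prints: (a IS NOT NULL)
print((a == 1).sql)     # prints: (a = 1)
```

Comparisons against any other value still take the ordinary equality path, so only the None case changes behavior.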

Are there any user-facing changes?

Yes. Python users can now write col("a") == None and col("a") != None as shorthand for is_null() and is_not_null().

is None is not supported because Python identity checks cannot be overloaded.
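A small, library-independent illustration of why is None cannot be given the same treatment: Python consults __eq__ for ==, but the is operator compares object identity and never calls a user-defined method. AlwaysEqual is a made-up class for the example.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other: object) -> bool:
        return True


obj = AlwaysEqual()

eq_result = obj == None  # noqa: E711  __eq__ is called, so this is True
is_result = obj is None  # identity check, __eq__ is never consulted

print(eq_result, is_result)  # prints: True False
```

So col("a") is None will always evaluate to a plain False in Python before any expression is built.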

Member

@timsaucer left a comment


Very nice addition. Thank you for the submission! I have one minor recommendation.

Comment on lines +176 to +191
def test_relational_expr_none_uses_null_predicates():
    ctx = SessionContext()

    batch = pa.RecordBatch.from_arrays(
        [
            pa.array([1, 2, None]),
            pa.array(["alpha", None, "gamma"], type=pa.string_view()),
        ],
        names=["a", "b"],
    )
    df = ctx.create_dataframe([[batch]], name="batch_with_nulls")

    assert df.filter(col("a") == None).count() == 1  # noqa: E711
    assert df.filter(col("a") != None).count() == 2  # noqa: E711
    assert df.filter(col("b") == None).count() == 1  # noqa: E711
    assert df.filter(col("b") != None).count() == 2  # noqa: E711

I think you can just update the test_relational_expr to have some null values and incorporate this into the existing test.


Development

Successfully merging this pull request may close these issues.

Support None comparison for nulls
