Skip to content

Add static analysis tests for DB migration anti-patterns#64972

Draft
Dev-iL wants to merge 1 commit intoapache:mainfrom
Dev-iL:2604/MIG_rules
Draft

Add static analysis tests for DB migration anti-patterns#64972
Dev-iL wants to merge 1 commit intoapache:mainfrom
Dev-iL:2604/MIG_rules

Conversation

@Dev-iL
Copy link
Copy Markdown
Collaborator

@Dev-iL Dev-iL commented Apr 9, 2026

Summary

  • Add AST-based static analysis tests that detect 3 migration anti-patterns (MIG001/MIG002/MIG003) across all migration files
  • Add unit tests for disable_sqlite_fkeys, mysql_drop_foreignkey_if_exists, and ignore_sqlite_value_error with real database backends
  • Fix confirmed SQLite PRAGMA bugs in migrations 0100 and 0106
  • Add # noqa: MIG0XX suppression comments to 20 old migrations with acknowledged violations

related: #64876

Motivation

PR #64876 fixed a bug in migration 0097 where UPDATE statements before the disable_sqlite_fkeys context manager triggered SQLAlchemy autobegin, making PRAGMA foreign_keys=off a no-op on SQLite. The same pattern existed in migrations 0100 and 0106 but had no automated detection.

This PR adds two layers of protection:

  1. Static analysis tests (test_migration_patterns.py) — AST-based parametrized tests that scan every migration file for known anti-patterns:

    • MIG001: DML (UPDATE/INSERT/DELETE) via op.execute() before disable_sqlite_fkeys — triggers autobegin, making PRAGMA a no-op
    • MIG002: DDL (op.add_column, op.batch_alter_table, etc.) before disable_sqlite_fkeys — same autobegin issue
    • MIG003: DML via op.execute() without context.is_offline_mode() guard — produces data-dependent SQL in offline mode
  2. Utils unit tests (test_migration_utils.py) — tests for the 3 most-used migration utility functions using real database connections via pytest.mark.db_test.

Changes

New files

  • airflow-core/tests/unit/migrations/test_migration_patterns.py — 336 parametrized tests (3 rules x 112 migration files), with # noqa: MIG0XX suppression support and ruff-style rule documentation in the module docstring
  • airflow-core/tests/unit/migrations/test_migration_utils.py — 8 tests for disable_sqlite_fkeys (PRAGMA on/off on SQLite, no-op on others), mysql_drop_foreignkey_if_exists (MySQL-only, skipped elsewhere), and ignore_sqlite_value_error (ValueError suppression on SQLite)

Bug fixes

Suppressions

  • Added # noqa: MIG003 -- <reason> to 62 DML lines across 22 migration files where the DML is intentional and the migration is designed to run online only
  • Added external = ["MIG"] to [tool.ruff.lint] in pyproject.toml to prevent RUF100 from stripping the custom noqa comments

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: Claude Opus 4.6 following the guidelines


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@boring-cyborg boring-cyborg bot added area:db-migrations PRs with DB migration area:dev-tools backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch labels Apr 9, 2026
@Dev-iL Dev-iL changed the title Add static analysis tests for migration anti-patterns Add static analysis tests for DB migration anti-patterns Apr 9, 2026
Prevent migration bugs like apache#64876 (DML before disable_sqlite_fkeys
silently breaking SQLite PRAGMA) from reaching production by adding:

1. AST-based static analysis tests (test_migration_patterns.py) that
   detect three anti-patterns across all migration files:
   - MIG001: DML before disable_sqlite_fkeys (triggers autobegin)
   - MIG002: DDL before disable_sqlite_fkeys (triggers autobegin)
   - MIG003: DML without context.is_offline_mode() guard

2. Unit tests for migration utility functions (test_migration_utils.py)
   covering disable_sqlite_fkeys, mysql_drop_foreignkey_if_exists, and
   ignore_sqlite_value_error with real database backends.

3. Fixes confirmed SQLite PRAGMA bugs in migrations 0100 and 0106 by
   moving all operations inside the disable_sqlite_fkeys block.

4. Adds # noqa: MIG0XX suppression comments (with reasons) to 20 old
   migrations with acknowledged violations, and configures ruff to
   preserve them via external = ["MIG"].
try:
source = filepath.read_text()
except OSError as exc:
print(f"{filepath}: {exc}", file=sys.stderr)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rich give us better TUI

silent failures, particularly on SQLite where ``PRAGMA foreign_keys`` has specific
requirements about transaction state.

TODO: These checks are candidates for future Ruff rules (AIR category).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not? AIR rules are more for Airflow users instead of Airflow itself

batch_op.alter_column("col", nullable=True)


MIG003 -- DML without offline-mode guard
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we have lots and lots of exception for this one 🤔 is that expected?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned on Slack - I think it's justified to err on the side of false-positive in the case of migration issue detection. If you disagree, I can try to tighten this check or disable it. The question is - does it catch real issues we previously missed?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never encountered this one, so I can't really tell 🤔

I don't disagree, just want to make sure.

Copy link
Copy Markdown
Member

@Lee-W Lee-W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do another round. but these are the nits found



def _dialect() -> str:
assert settings.engine is not None # guaranteed by db_test marker
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert settings.engine is not None # guaranteed by db_test marker
if TYPE_CHECKING:
assert settings.engine is not None # guaranteed by db_test marker

I think we usually do it for assert like this

Comment on lines +55 to +56
if _dialect() != "sqlite":
pytest.skip("SQLite-specific test")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use pytest.mark.skipif

"""On non-SQLite backends, the context manager simply yields."""
if _dialect() == "sqlite":
pytest.skip("Non-SQLite backends only")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. then we probably don't need to make _dialect a fixture. _get_dialect might be a better name?


def test_drops_existing_fk(self):
"""On MySQL, an existing foreign key is dropped."""
if _dialect() != "mysql":
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Comment on lines +97 to +102
"CREATE TABLE IF NOT EXISTS _test_mig_child ("
" id INT PRIMARY KEY,"
" parent_id INT,"
" CONSTRAINT _test_mig_fk FOREIGN KEY (parent_id)"
" REFERENCES _test_mig_parent(id)"
")"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"CREATE TABLE IF NOT EXISTS _test_mig_child ("
" id INT PRIMARY KEY,"
" parent_id INT,"
" CONSTRAINT _test_mig_fk FOREIGN KEY (parent_id)"
" REFERENCES _test_mig_parent(id)"
")"
"""
CREATE TABLE IF NOT EXISTS _test_mig_child (
id INT PRIMARY KEY,
parent_id INT,
CONSTRAINT _test_mig_fk FOREIGN KEY (parent_id)
REFERENCES _test_mig_parent(id)
)"""


count = conn.execute(
text(
"SELECT COUNT(*) FROM information_schema.TABLE_CONSTRAINTS "
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same format as the above one would be easier to read

batch_op.alter_column("col", nullable=True)


MIG003 -- DML without offline-mode guard
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never encountered this one, so I can't really tell 🤔

I don't disagree, just want to make sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:db-migrations PRs with DB migration area:dev-tools backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants