Add static analysis tests for DB migration anti-patterns#64972
Add static analysis tests for DB migration anti-patterns#64972Dev-iL wants to merge 1 commit intoapache:mainfrom
Conversation
Prevent migration bugs like apache#64876 (DML before disable_sqlite_fkeys silently breaking SQLite PRAGMA) from reaching production by adding: 1. AST-based static analysis tests (test_migration_patterns.py) that detect three anti-patterns across all migration files: - MIG001: DML before disable_sqlite_fkeys (triggers autobegin) - MIG002: DDL before disable_sqlite_fkeys (triggers autobegin) - MIG003: DML without context.is_offline_mode() guard 2. Unit tests for migration utility functions (test_migration_utils.py) covering disable_sqlite_fkeys, mysql_drop_foreignkey_if_exists, and ignore_sqlite_value_error with real database backends. 3. Fixes confirmed SQLite PRAGMA bugs in migrations 0100 and 0106 by moving all operations inside the disable_sqlite_fkeys block. 4. Adds # noqa: MIG0XX suppression comments (with reasons) to 20 old migrations with acknowledged violations, and configures ruff to preserve them via external = ["MIG"].
| try: | ||
| source = filepath.read_text() | ||
| except OSError as exc: | ||
| print(f"{filepath}: {exc}", file=sys.stderr) |
| silent failures, particularly on SQLite where ``PRAGMA foreign_keys`` has specific | ||
| requirements about transaction state. | ||
|
|
||
| TODO: These checks are candidates for future Ruff rules (AIR category). |
There was a problem hiding this comment.
Probably not? AIR rules are more for Airflow users instead of Airflow itself
| batch_op.alter_column("col", nullable=True) | ||
|
|
||
|
|
||
| MIG003 -- DML without offline-mode guard |
There was a problem hiding this comment.
It seems we have lots and lots of exception for this one 🤔 is that expected?
There was a problem hiding this comment.
As mentioned on Slack - I think it's justified to err on the side of false-positive in the case of migration issue detection. If you disagree, I can try to tighten this check or disable it. The question is - does it catch real issues we previously missed?
There was a problem hiding this comment.
I have never encountered this one, so I can't really tell 🤔
I don't disagree, just want to make sure.
Lee-W
left a comment
There was a problem hiding this comment.
will do another round. but these are the nits found
|
|
||
|
|
||
| def _dialect() -> str: | ||
| assert settings.engine is not None # guaranteed by db_test marker |
There was a problem hiding this comment.
| assert settings.engine is not None # guaranteed by db_test marker | |
| if TYPE_CHECKING: | |
| assert settings.engine is not None # guaranteed by db_test marker |
I think we usually do it for assert like this
| if _dialect() != "sqlite": | ||
| pytest.skip("SQLite-specific test") |
| """On non-SQLite backends, the context manager simply yields.""" | ||
| if _dialect() == "sqlite": | ||
| pytest.skip("Non-SQLite backends only") | ||
|
|
There was a problem hiding this comment.
same here. then we probably don't need to make _dialect a fixture. _get_dialect might be a better name?
|
|
||
| def test_drops_existing_fk(self): | ||
| """On MySQL, an existing foreign key is dropped.""" | ||
| if _dialect() != "mysql": |
| "CREATE TABLE IF NOT EXISTS _test_mig_child (" | ||
| " id INT PRIMARY KEY," | ||
| " parent_id INT," | ||
| " CONSTRAINT _test_mig_fk FOREIGN KEY (parent_id)" | ||
| " REFERENCES _test_mig_parent(id)" | ||
| ")" |
There was a problem hiding this comment.
| "CREATE TABLE IF NOT EXISTS _test_mig_child (" | |
| " id INT PRIMARY KEY," | |
| " parent_id INT," | |
| " CONSTRAINT _test_mig_fk FOREIGN KEY (parent_id)" | |
| " REFERENCES _test_mig_parent(id)" | |
| ")" | |
| """ | |
| CREATE TABLE IF NOT EXISTS _test_mig_child ( | |
| id INT PRIMARY KEY, | |
| parent_id INT, | |
| CONSTRAINT _test_mig_fk FOREIGN KEY (parent_id) | |
| REFERENCES _test_mig_parent(id) | |
| )""" |
|
|
||
| count = conn.execute( | ||
| text( | ||
| "SELECT COUNT(*) FROM information_schema.TABLE_CONSTRAINTS " |
There was a problem hiding this comment.
same format as the above one would be easier to read
| batch_op.alter_column("col", nullable=True) | ||
|
|
||
|
|
||
| MIG003 -- DML without offline-mode guard |
There was a problem hiding this comment.
I have never encountered this one, so I can't really tell 🤔
I don't disagree, just want to make sure.
Summary
disable_sqlite_fkeys,mysql_drop_foreignkey_if_exists, andignore_sqlite_value_errorwith real database backends# noqa: MIG0XXsuppression comments to 20 old migrations with acknowledged violationsrelated: #64876
Motivation
PR #64876 fixed a bug in migration 0097 where
UPDATEstatements before thedisable_sqlite_fkeyscontext manager triggered SQLAlchemy autobegin, makingPRAGMA foreign_keys=offa no-op on SQLite. The same pattern existed in migrations 0100 and 0106 but had no automated detection.This PR adds two layers of protection:
Static analysis tests (
test_migration_patterns.py) — AST-based parametrized tests that scan every migration file for known anti-patterns:UPDATE/INSERT/DELETE) viaop.execute()beforedisable_sqlite_fkeys— triggers autobegin, making PRAGMA a no-opop.add_column,op.batch_alter_table, etc.) beforedisable_sqlite_fkeys— same autobegin issueop.execute()withoutcontext.is_offline_mode()guard — produces data-dependent SQL in offline modeUtils unit tests (
test_migration_utils.py) — tests for the 3 most-used migration utility functions using real database connections viapytest.mark.db_test.Changes
New files
airflow-core/tests/unit/migrations/test_migration_patterns.py— 336 parametrized tests (3 rules x 112 migration files), with# noqa: MIG0XXsuppression support and ruff-style rule documentation in the module docstringairflow-core/tests/unit/migrations/test_migration_utils.py— 8 tests fordisable_sqlite_fkeys(PRAGMA on/off on SQLite, no-op on others),mysql_drop_foreignkey_if_exists(MySQL-only, skipped elsewhere), andignore_sqlite_value_error(ValueError suppression on SQLite)Bug fixes
add_timetable_type): movedbatch_alter_table(add_column) andop.execute(UPDATE)inside thedisable_sqlite_fkeysblock — same pattern as the fix in fix(migrations): move UPDATEs inside disable_sqlite_fkeys in migration 0097 #64876add_partition_key): movedop.add_column("dag_run", ...)inside thedisable_sqlite_fkeysblockSuppressions
# noqa: MIG003 -- <reason>to 62 DML lines across 22 migration files where the DML is intentional and the migration is designed to run online onlyexternal = ["MIG"]to[tool.ruff.lint]inpyproject.tomlto preventRUF100from stripping the custom noqa commentsWas generative AI tooling used to co-author this PR?
Generated-by: Claude Opus 4.6 following the guidelines
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.