Skip to content

feat(python-notebook-migration): add database tables for Notebook Migration tool#5055

Open
zyratlo wants to merge 7 commits into
apache:mainfrom
zyratlo:migration-tool-database-tables
Open

feat(python-notebook-migration): add database tables for Notebook Migration tool#5055
zyratlo wants to merge 7 commits into
apache:mainfrom
zyratlo:migration-tool-database-tables

Conversation

@zyratlo
Copy link
Copy Markdown
Contributor

@zyratlo zyratlo commented May 13, 2026

What changes were proposed in this PR?

This PR adds two new tables to the database, notebook and workflow_notebook_mapping. These tables will be used in the new Python Notebook Migration tool to store the user-uploaded notebook and the generated mapping between the notebook and workflow.

Any related issues, documentation, discussions?

Closes #5054
The parent issue is #4301

Schema

image

How was this PR tested?

New tables were manually confirmed to be created successfully and usable.

Was this PR authored or co-authored using generative AI tooling?

No

@github-actions github-actions Bot added the ddl-change Changes to the TexeraDB DDL label May 13, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.02%. Comparing base (c435aa7) to head (914f6b5).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5055      +/-   ##
============================================
- Coverage     47.13%   46.02%   -1.11%     
+ Complexity     2344     2342       -2     
============================================
  Files          1042     1046       +4     
  Lines         39989    40037      +48     
  Branches       4260     4262       +2     
============================================
- Hits          18849    18429     -420     
- Misses        20015    20491     +476     
+ Partials       1125     1117       -8     
Flag Coverage Δ *Carryforward flag
access-control-service 39.53% <ø> (ø)
agent-service 33.74% <ø> (-0.03%) ⬇️ Carriedforward from 0a42fcd
amber 50.29% <ø> (-0.04%) ⬇️
computing-unit-managing-service 0.00% <ø> (ø)
config-service 0.00% <ø> (ø)
file-service 32.18% <ø> (ø)
frontend 35.18% <ø> (-2.64%) ⬇️ Carriedforward from 0a42fcd
python 90.50% <ø> (ø) Carriedforward from 0a42fcd
workflow-compiling-service 56.81% <ø> (ø)

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mengw15 mengw15 self-requested a review May 13, 2026 20:47
@mengw15
Copy link
Copy Markdown
Contributor

mengw15 commented May 18, 2026

Please check this pr: #4401, for database changes

@zyratlo
Copy link
Copy Markdown
Contributor Author

zyratlo commented May 18, 2026

Please check this pr: #4401, for database changes

I have addressed this issue

Copy link
Copy Markdown
Contributor

@mengw15 mengw15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments

Comment thread sql/updates/23.sql
Comment on lines +31 to +32
DROP TABLE IF EXISTS notebook CASCADE;
DROP TABLE IF EXISTS workflow_notebook_mapping CASCADE;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-run safety: the DROP TABLE IF EXISTS notebook CASCADE + DROP TABLE IF EXISTS workflow_notebook_mapping CASCADE at the top of the migration silently destroys data if this file is ever applied a second time (manual psql -f, a DB reset that forgets the Liquibase DATABASECHANGELOG, or a future edit that changes the MD5 and prompts a re-run). The other update files (21.sql, 22.sql, etc.) are pure ALTER / CREATE and don't carry this footgun. Could we drop the DROPs and use CREATE TABLE IF NOT EXISTS for the two new tables instead?

Comment thread sql/texera_ddl.sql
Comment on lines +441 to +447
CREATE TABLE IF NOT EXISTS notebook
(
nid SERIAL NOT NULL PRIMARY KEY,
wid INT NOT NULL,
notebook JSONB NOT NULL,
FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cardinality question: nothing in the schema prevents inserting two notebook rows for the same wid, but the parent issue (#4301, demo #5 "when the user reopens a workflow that was generated from a notebook, it will also reopen the notebook") reads like a 1:1 relationship. If a workflow is supposed to have at most one notebook, would a UNIQUE (wid) on notebook (or making wid the PK in place of nid) be safer than relying on application code to enforce it?

Comment thread sql/texera_ddl.sql
Comment on lines +441 to +447
CREATE TABLE IF NOT EXISTS notebook
(
nid SERIAL NOT NULL PRIMARY KEY,
wid INT NOT NULL,
notebook JSONB NOT NULL,
FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main read pattern for notebook seems to be "given a workflow, find its notebook" (e.g., reopening a workflow → load its notebook). Postgres doesn't auto-create an index on FK columns, so this lookup would currently sequential-scan the table. Worth a CREATE INDEX ON notebook(wid)? (If a UNIQUE(wid) is added per the previous comment, that already creates an index and this is moot.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ddl-change Changes to the TexeraDB DDL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Notebook Migration] 1. Add Database Tables

3 participants