workflow update for including embeddings as a parallel processing job with aggregation by shixiao-coder · Pull Request #599 · datacommonsorg/import

shixiao-coder · 2026-06-29T14:59:11Z

Also updates the config to read the embeddings template config from a YAML file.

The workflow was tested end to end in https://pantheon.corp.google.com/workflows/workflow/us-central1/spanner-ingestion-workflow/execution/bbbac16f-c86f-4f20-9e4f-9ea8e0bd1997/summary?e=13803378&mods=-monitoring_api_staging&project=datcom-ci

It correctly filtered the statistical variables and added embeddings to the test DB: https://pantheon.corp.google.com/spanner/instances/datcom-spanner-test/databases/dc-test-db


codacy-production · 2026-06-29T15:01:01Z

Not up to standards ⛔

🔴 Issues 1 minor

Alerts:
⚠ 1 issue (≤ 0 issues of at least minor severity)

Results:
1 new issue

Category Results

CodeStyle 1 minor


🟢 Metrics 12 complexity

Metric Results

Complexity 12


gemini-code-assist

Code Review

This pull request introduces support for embedding generation within the Spanner ingestion workflow, running embedding jobs in parallel with aggregation jobs. It adds configuration loading from a YAML specification file, along with corresponding unit tests and dependencies. The review feedback highlights three critical issues: the HTTP timeout in the Cloud Workflow exceeds the 1800-second limit, the gcsfs dependency is missing for reading GCS paths with pandas, and resolving the embedding spec path relative to the current working directory may fail in Cloud Run.
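The third issue, resolving the spec path against the current working directory, can be illustrated with a minimal sketch. It is not the code from this PR: the function name `resolve_spec_path`, the `base_dir` parameter, and the example paths are all hypothetical, and it assumes the spec may be given either as a `gs://` URI or as a local path.

```python
from pathlib import PurePosixPath


def resolve_spec_path(spec_path: str, base_dir: PurePosixPath) -> str:
    """Return a spec location that does not depend on the working directory.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # Remote URIs such as gs://bucket/spec.yaml are passed through untouched.
    if "://" in spec_path:
        return spec_path
    path = PurePosixPath(spec_path)
    # Absolute paths are already unambiguous.
    if path.is_absolute():
        return str(path)
    # Relative paths are anchored at base_dir instead of the process's cwd.
    return str(base_dir / path)
```

In a module such as `config.py`, one option would be to pass `Path(__file__).resolve().parent` as `base_dir`, so that the result stays the same regardless of the directory the container starts in.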


shixiao-coder added 2 commits June 29, 2026 10:58

workflow update for including embeddings as a parallel processing job with aggregation Also update the config to read the template config from a Yaml file for embeddings

1e84a7d

Merge branch 'master' into update-spanner-ingestion-workflow-for-embeddings

9601b42

shixiao-coder requested review from gmechali and vish-cs June 29, 2026 14:59

gemini-code-assist Bot reviewed Jun 29, 2026


Comment thread pipeline/workflow/spanner-ingestion-workflow.yaml

Comment thread pipeline/workflow/ingestion-helper/pyproject.toml

Comment thread pipeline/workflow/ingestion-helper/config.py

shixiao-coder added 2 commits June 30, 2026 10:31

Merge branch 'master' into update-spanner-ingestion-workflow-for-embeddings

2edac2b

Merge branch 'master' into update-spanner-ingestion-workflow-for-embeddings

74df7f8


workflow update for including embeddings as a parallel processing job with aggregation #599
shixiao-coder wants to merge 4 commits into datacommonsorg:master from shixiao-coder:update-spanner-ingestion-workflow-for-embeddings
