GML-2135 Release 1.4.2 by chengbiao-jin · Pull Request #45 · tigergraph/graphrag

chengbiao-jin · 2026-06-23T20:49:50Z

PR Type

Enhancement, Bug fix, Documentation

Description

Adds graph compatibility repair assistant
Reinstalls drifted GSQL queries safely
Hardens ingestion ID and chunk handling
Documents and bumps v1.4.2 release

Diagram Walkthrough

flowchart LR
  admin["KG Admin UI"]
  status["Migration status API"]
  repair["Migration apply API"]
  migrate["GSQL migration helpers"]
  tg["TigerGraph queries"]
  ingest["Ingestion pipeline"]
  ids["Normalized IDs and atomic chunks"]
  admin -- "checks" --> status
  status -- "hashes" --> migrate
  admin -- "repairs" --> repair
  repair -- "recreates" --> tg
  ingest -- "uses" --> ids

File Walkthrough

Relevant files

Enhancement

4 files

ui.py `Add migration status and repair endpoints`	+387/-12
util.py `Detect query drift and atomic upserts`	+115/-9
migrate.py `Add GSQL query migration utilities`	+224/-0
KGAdmin.tsx `Add compatibility check repair dialog`	+343/-6

Bug fix

8 files

graph_rag.py `Harden chunk streaming and batched loading`	+168/-98
supportai_ingest.py `Normalize IDs and log ingest failures`	+34/-13
workers.py `Atomic chunk writes with normalized IDs`	+28/-34
eventual_consistency_checker.py `Normalize entity relationship and chunk IDs`	+9/-7
workers.py `Normalize supportai worker chunk links`	+5/-4
util.py `Align ID normalization whitespace behavior`	+1/-4
IngestGraph.tsx `Report per-file upload failures safely`	+43/-34
StreamIds.gsql `Atomically claim unprocessed vertex IDs`	+15/-6

Error handling

2 files

supportai.py `Log server folder processing failures`	+1/-0
supportai.py `Return clearer ingestion endpoint errors`	+14/-4

Documentation

2 files

README.md `Add v1.4.2 release announcement`	+2/-0
CHANGELOG.md `Document v1.4.2 release changes`	+11/-0

Configuration changes

1 file

VERSION `Bump version to 1.4.2`	+1/-1

- Scan an existing graph for installed queries that have drifted from the shipped version or are missing
- Repair them in place without rebuilding the knowledge graph
- Refuse repair while a rebuild is in progress and run it under the per-graph lock
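The scan described above can be sketched as a comparison of normalized hashes between the shipped query bodies and the installed ones. This is a minimal illustration, not the code in `migrate.py`: the regexes, the `classify_queries` helper, and the three status labels are assumptions made for the example. It also folds `CREATE OR REPLACE` into `CREATE` before hashing, so a harmless header difference alone would not be reported as drift.

```python
import hashlib
import re

# Assumed patterns for this sketch; the project's own regexes may differ.
# Note: stripping "//" naively would also cut string literals containing "//".
_BLOCK_COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT_RE = re.compile(r"//[^\n]*")
_WHITESPACE_RE = re.compile(r"\s+")
_CREATE_OR_REPLACE_RE = re.compile(r"^CREATE\s+OR\s+REPLACE\b", re.IGNORECASE)


def normalize_gsql(body: str) -> str:
    """Strip comments, collapse whitespace, and canonicalize the CREATE header."""
    body = _BLOCK_COMMENT_RE.sub("", body)
    body = _LINE_COMMENT_RE.sub("", body)
    body = _WHITESPACE_RE.sub(" ", body).strip()
    # Treat "CREATE OR REPLACE ..." and "CREATE ..." as the same query text.
    return _CREATE_OR_REPLACE_RE.sub("CREATE", body)


def gsql_hash(body: str) -> str:
    """Short, stable fingerprint of the normalized query text."""
    return hashlib.sha256(normalize_gsql(body).encode()).hexdigest()[:16]


def classify_queries(shipped: dict[str, str], installed: dict[str, str]) -> dict[str, str]:
    """Label each shipped query as 'missing', 'drifted', or 'current'."""
    status: dict[str, str] = {}
    for name, body in shipped.items():
        if name not in installed:
            status[name] = "missing"
        elif gsql_hash(body) != gsql_hash(installed[name]):
            status[name] = "drifted"
        else:
            status[name] = "current"
    return status
```

Only queries labelled `missing` or `drifted` would need to be reinstalled; everything else can be left alone.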

- Reconcile chunks left unfinished by an interrupted run before processing new documents
- Write each chunk together with its content so cancellation can't leave chunks without content
- Normalize vertex IDs the same way across every ingest path so documents with spaces or mixed case in filenames stay consistent
- Re-create installed queries that have drifted from the shipped version on initialization
- Surface ingestion failures as clear errors instead of failing silently
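To illustrate why every ingest path needs the same ID normalization, the sketch below derives both the document ID and its chunk IDs through one function. `normalize_id` is a hypothetical stand-in: the actual rules of `util.process_id` are not shown in this PR, so the trimming, lowercasing, and whitespace handling here are assumptions.

```python
import re

_WHITESPACE_RE = re.compile(r"\s+")


def normalize_id(raw: str) -> str:
    """Hypothetical normalizer: trim, collapse whitespace runs to '_', lowercase."""
    return _WHITESPACE_RE.sub("_", raw.strip()).lower()


def chunk_id_for(raw_doc_id: str, index: int) -> str:
    """Build a chunk ID from the normalized document ID."""
    return normalize_id(f"{normalize_id(raw_doc_id)}_chunk_{index}")


def doc_chunk_edges(raw_doc_id: str, n_chunks: int) -> list[tuple[str, str]]:
    """Return (document_id, chunk_id) pairs that share one normalization."""
    doc_id = normalize_id(raw_doc_id)
    return [(doc_id, chunk_id_for(raw_doc_id, i)) for i in range(n_chunks)]
```

Because the normalizer is idempotent, a path that receives an already-normalized ID and a path that receives the raw filename produce the same vertex IDs, so edges always point at an existing document vertex.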

tg-pr-agent · 2026-06-23T20:52:19Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 Security concerns

Information disclosure: `graphrag/app/routers/supportai.py` now returns raw exception text to clients in HTTP 500 details for ingest preparation and ingestion failures. Those exception strings can include filesystem paths, TigerGraph responses, configuration details, or other internal diagnostics. Consider returning a generic client-facing message while logging the detailed exception server-side.

⚡ Recommended focus areas for review

Drift Detection

The GSQL hash compares the local file body directly against `SHOW QUERY` output after only comment/whitespace normalization. If TigerGraph canonicalizes `CREATE OR REPLACE` to `CREATE`, or otherwise changes harmless query boilerplate, every query may be reported as drifted and repeatedly reinstalled. Consider normalizing expected TG canonical forms before hashing.

    def _normalize_gsql(body: str) -> str:
        body = _BLOCK_COMMENT_RE.sub("", body)
        body = _LINE_COMMENT_RE.sub("", body)
        body = _WHITESPACE_RE.sub(" ", body).strip()
        return body

    def _gsql_hash(body: str) -> str:
        return hashlib.sha256(_normalize_gsql(body).encode()).hexdigest()[:16]

ID Mismatch

`v_id` is now only lowercased instead of passed through `util.process_id`, while chunk IDs and previous chunk links are normalized with `process_id`. Documents containing spaces, slashes, or parentheses can therefore create edges from an unnormalized document ID to normalized chunk IDs, which may not match the ingested document vertex.

    v_id = doc["v_id"].lower()
    # Use get_chunker for all types (including images)
    # For images, get_chunker returns SingleChunker which preserves markdown image references
    chunker = ecc_util.get_chunker(chunker_type, graphname=conn.graphname)
    # decode the text return from tigergraph as it was encoded when written into jsonl file for uploading
    chunks = chunker.chunk(doc["attributes"]["text"].encode('raw_unicode_escape').decode('unicode_escape'))
    # v_id / chunk_id derive from user document content.
    logger.debug(f"Chunking {v_id} into {len(chunks)} chunk(s)")
    for i, chunk in enumerate(chunks):
        chunk_id = util.process_id(f"{v_id}_chunk_{i}")

Scalability Risk

`stream_docs` and `stream_chunks` now ignore `ttl_batches` and fetch all unprocessed IDs in a single `StreamIds` call. On large graphs or after a long outage, this can produce very large query responses, timeouts, or high memory usage compared with the previous bounded batching behavior.

    logger.info("streaming docs (single-probe scan)")
    probe = await stream_ids(conn, "Document", 0, 1)
    n_docs = 0
    if probe.get("error"):
        logger.warning("stream_docs: StreamIds probe failed; nothing to stream")
    else:
        doc_ids_all = probe.get("ids") or []
        if not doc_ids_all:
            logger.info("stream_docs: no unprocessed Documents (epoch_processed == 0)")
        else:
            logger.info(
                f"stream_docs: {len(doc_ids_all)} unprocessed Document(s) to stream"
            )
            for d in doc_ids_all:
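One way to act on the information-disclosure concern raised in the guide is to log the full exception server-side and hand the client only a generic message plus a correlation ID. The sketch below is framework-neutral and hypothetical: `client_safe_error`, the payload shape, and the logger name are assumptions, not code from `supportai.py`.

```python
import logging
import uuid

logger = logging.getLogger("ingest")


def client_safe_error(exc: Exception, action: str) -> dict:
    """Log the real failure and return a response that hides internal details."""
    # A short random reference lets operators find the matching log entry.
    error_id = uuid.uuid4().hex[:12]
    # The traceback and exception text go to the server log only.
    logger.error("%s failed [error_id=%s]", action, error_id, exc_info=exc)
    return {
        "status_code": 500,
        "detail": f"{action} failed. Reference: {error_id}",
    }
```

In a router, the returned `status_code` and `detail` could be passed to whatever HTTP error type the framework provides, so filesystem paths and TigerGraph responses never leave the server.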

chengbiao-jin and others added 7 commits June 23, 2026 13:32

GML-2134 Report per-file upload failures in the ingestion dialog

7f077f7

GML-2135 Release 1.4.2

50ddb55

- Bump version to 1.4.2
- Update CHANGELOG and README releases
- Add .gitignore
- GML-2131 Honor configured GS/RESTPP ports at login

GML-2131: Respect configured TigerGraph ports during UI login (#44)

f9b27b6

Merge remote-tracking branch 'origin/release_1.4.2' into release_1.4.2

70cd65b

# Conflicts:
#	graphrag/app/routers/ui.py

GML-2131 Guard configured ports in role-resolution connection

efff260

- Omit gsPort/restppPort unless configured, matching the auth() path, so absent config falls back to pyTigerGraph defaults
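The port handling in this commit can be pictured as building the connection arguments so that `gsPort` and `restppPort` appear only when they are configured. This is a sketch under assumptions: `build_connection_kwargs` and the flat config mapping are hypothetical, and the resulting dict is meant to be passed on to the pyTigerGraph connection constructor, which then applies its own defaults for any omitted port.

```python
from typing import Any, Mapping


def build_connection_kwargs(
    host: str, graphname: str, config: Mapping[str, Any]
) -> dict[str, Any]:
    """Include port settings only when the config actually provides them."""
    kwargs: dict[str, Any] = {"host": host, "graphname": graphname}
    for key in ("gsPort", "restppPort"):
        value = config.get(key)
        # None or an empty string means "not configured": leave the key out
        # so the client library falls back to its own default.
        if value not in (None, ""):
            kwargs[key] = value
    return kwargs
```

Passing `None` explicitly would override the library default, which is why the keys are omitted rather than set to an empty value.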

GML-2133 Refine ingestion changes per review

a1d6ff8

- Restore partitioned batch streaming for documents and chunks
- Check vertex existence through the TigerGraph client instead of hand-built requests
- Use lowercase document ids consistently across the batch ingest path
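Partitioned batch streaming, as restored by this commit, can be sketched as one bounded request per partition instead of a single call that returns every unprocessed ID. The sketch assumes the last two arguments of the reviewed `stream_ids(conn, "Document", 0, 1)` call are the current batch index and the total batch count; `fetch_batch` is a hypothetical stand-in for that call, and skipping a failed partition is an illustrative choice.

```python
import asyncio
import logging
from typing import AsyncIterator, Awaitable, Callable

logger = logging.getLogger("stream")

# Hypothetical fetcher signature: (current_batch, ttl_batches) -> {"error": ..., "ids": [...]}
FetchBatch = Callable[[int, int], Awaitable[dict]]


async def stream_ids_in_batches(
    fetch_batch: FetchBatch, ttl_batches: int
) -> AsyncIterator[str]:
    """Yield unprocessed IDs one partition at a time to bound response size."""
    for current_batch in range(ttl_batches):
        result = await fetch_batch(current_batch, ttl_batches)
        if result.get("error"):
            # Skip a failed partition; a later run can pick it up again.
            logger.warning("batch %d/%d failed", current_batch, ttl_batches)
            continue
        for vertex_id in result.get("ids") or []:
            yield vertex_id


def collect_ids(fetch_batch: FetchBatch, ttl_batches: int) -> list[str]:
    """Synchronous helper that drains the stream (useful for quick checks)."""

    async def _run() -> list[str]:
        return [vid async for vid in stream_ids_in_batches(fetch_batch, ttl_batches)]

    return asyncio.run(_run())
```

Each request then returns roughly one `ttl_batches`-th of the backlog, which keeps response sizes and memory use bounded on large graphs.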

chengbiao-jin merged commit 8dfcadb into main Jun 23, 2026
1 check failed

chengbiao-jin deleted the release_1.4.2 branch June 23, 2026 22:38

chengbiao-jin merged 8 commits into main from release_1.4.2