CXH-1390 CXP-481 Support for Incremental Sync#120
CXH-1390 CXP-481 Support for Incremental Sync#120JavierCarnelli-ConductorOne wants to merge 8 commits into
Conversation
…n all examples All resource-level incremental_sync pagination blocks were missing the required primary_key field, which caused "primary key not found in query results" errors when detecting new or modified resources via event feed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nt feed First-run sources derive `since` from the default lookback window. The previous code unconditionally applied earliestEvent (which reflects the last processed event from another source) as a lower bound, pushing `since` forward past the lookback and permanently hiding events — including revokes — that occurred before the first cycle. sinceForSource now returns (time.Time, bool); earliestEvent is only applied when hasCommittedCursor is true. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix WordPress incremental_sync query missing ?<since> placeholder - Log debug message when event feed syncer creation fails instead of silently dropping the error - Refactor hasSince/hasID loops into queryUsesVar helper to reduce duplication - Validate IncrementalSyncConfig.default_lookback at startup - Fix event ID collisions for grants sharing the same (resource, principal, timestamp) by appending the pagination primary key - Fix SAP HANA YAML indentation: get and incremental_sync were siblings of user instead of children - Remove debug fmt.Println statements from event_feed.go Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Claude finished @JavierCarnelli-ConductorOne's task in 4m 19s —— View job PR Review: Incremental Sync Support
OverviewThis PR adds incremental sync (event feed) support to baton-sql, enabling efficient change detection via timestamp-based queries instead of full re-syncs. The implementation supports three event types:
The architecture is well-designed: each "source" (resource type or grant query) maintains its own independent cursor, and a round-robin scheduler processes one page per source per What works well
Issues foundMust-fix
Should-fix
Nits / suggestions
SummaryThe core design is solid — the multi-source cursor state machine, the separation of committed vs in-flight state, and the reuse of existing query infrastructure are all well-thought-out. The main gap is test coverage: the cursor state machine and validation logic have enough complexity to warrant dedicated unit tests before merging. The |
- Add resource_id field to ResourceIncrementalSync to decouple event
feed ID extraction from list.map.id, mirroring the existing pattern
in GrantsIncrementalSync; falls back to list.map.id when not set
- Add bcel.Env.Compile for startup CEL expression validation; use it
to validate resource_id at config validation time
- Fix grantRowKey to format int64/float64 primary keys with strconv
instead of fmt.Sprintf("%v") which produces scientific notation for
large IDs
- Document why earliestEvent is intentionally ignored in ListEvents
- Document why EventFeeds re-creates syncers separately from
ResourceSyncers and why this is safe
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Claude finished @JavierCarnelli-ConductorOne's task in 2m 8s —— View job PR Review: Incremental Sync Support
OverviewThis PR adds incremental sync (event feed) support to baton-sql. Instead of full re-syncs, the connector can now detect changes via timestamp-based SQL queries and emit Architecture AssessmentThe design is well-structured:
Issues1. No tests for new code (must-fix)
Key areas to test:
The validation additions in 2. Potential data loss:
|
|
Claude finished @JavierCarnelli-ConductorOne's task in 2m 59s —— View job PR Review: Incremental Sync Support
OverviewThis PR adds incremental sync (event feed) support to baton-sql. Instead of full re-syncs, the connector can now detect changes via timestamp-based SQL queries and emit Also adds ArchitectureThe design is solid:
The latest commit ( Issues1.
|
luisina-santos
left a comment
There was a problem hiding this comment.
I know you've done a lot of testing against a real sql database, but just mentioning: I would like to have test coverage for new event feed code
pkg/bsql/event_feed.go and event_pagination.go add ~500 lines of logic with no corresponding _test.go. The cursor advancement, source ordering, commitAndAdvance, and sinceForSource are all untested. At minimum, unit tests for getSources, commitAndAdvance, and unmarshalCursor would catch regressions.
There was a problem hiding this comment.
should we keep some comments as notes next to the fields/definitions?
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Claude finished @JavierCarnelli-ConductorOne's task in 2m 8s —— View job PR Review: Incremental Sync Support
OverviewThis PR adds incremental sync (event feed) support to baton-sql — a significant feature that enables efficient change detection via timestamp-based SQL queries instead of full re-syncs. The implementation supports three event source kinds: resource changes, grant changes, and grant revokes, each with independent cursors. Also adds The new test file ( Architecture AssessmentThe design is clean and well-thought-out:
Issues1.
|
ggreer
left a comment
There was a problem hiding this comment.
Noticed a few things. I'm testing the branch out now and might add some more comments after that.
| } | ||
|
|
||
| // IncrementalSyncConfig holds global settings for incremental sync behavior. | ||
| type IncrementalSyncConfig struct { |
There was a problem hiding this comment.
Technically, event feeds and partial syncs are separate features, though the ability to do partial syncs is only useful if you can get a change log to determine what should be partially synced. I'd call a lot of these IncrementalSyncConfig & IncrementalSync attributes something like "EventFeedConfig" & "EventFeed" to make it clearer as to what's going on.
There was a problem hiding this comment.
Also I think this section should contain the query needed by the event feed, since event feeds might not be a 1:1 mapping to resource types. For example, there could be a table called audit_log that contains rows for when both users and groups were added/deleted. I've seen this in a decent number of APIs and I expect to see it in database schemas too.
There was a problem hiding this comment.
I get your idea in both comments.
About the first one, the naming of the properties. I pictured this as a separate feature from any other Event Feed type implementation (such as LastLogIn or other similar).
Since we aim for this to be really loose and untied to any specific DB schemas, I think that having independent fields for each is convenient, cause we cannot be sure if there will be actually something like a "Log table", "Audit registry" or similar. If we name the properties like "Event Feed", we will be implying that it's used in more than just the incremental sync process, when we may need to develop some other specific configs for stuff like the LastLogIn or similar event dependent data.
About the second comment, I will relay on the same idea of we aiming for this to be really flexible for different DB schemas. We are likely to have just something like a date type column on each relational (or even entity) table that will be used to tell when something was modified (such as a permission grant or an update or some resource property). Keeping that in mind, it's not convenient to "join" or put in one config the Query to get the updated data, since this could be different for each resource.
I think that having it separated it's more dynamic and will adapt better for any type of arrangement in different customers DBs.
I get your point tho, I'm open to discuss further about this. Your approach may be convenient in some situations, so let me know if you don't agree!
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Claude finished @JavierCarnelli-ConductorOne's task in 3m 4s —— View job PR Review: Incremental Sync Support
OverviewThis PR adds incremental sync (event feed) support to baton-sql — a major feature enabling efficient change detection via timestamp-based SQL queries instead of full re-syncs. It also adds New files: The implementation supports three event source kinds per resource type:
Architecture AssessmentThe design is well-structured and shows careful thought about crash safety:
Issues1.
|
| return true, nil | ||
| }) | ||
| if err != nil { | ||
| return nil, "", time.Time{}, fmt.Errorf("baton-sql: failed to list grant changes: %w", err) |
There was a problem hiding this comment.
This always says "grant changes" even when isRevoke=true
There was a problem hiding this comment.
I think it's ok to say "grant changes" since that implies any modification on grants. Being a Grant or a Revoke
| syncers, err := c.config.GetSQLSyncers(ctx, c.db, c.dbEngine, c.celEnv) | ||
| if err != nil { | ||
| l := ctxzap.Extract(ctx) | ||
| l.Debug("baton-sql: failed to create syncers for event feed; incremental sync will be unavailable", zap.Error(err)) |
There was a problem hiding this comment.
Why not an err? A user who configured incremental sync would have no indication why it isn't working
There was a problem hiding this comment.
You mean why log a debug instead of an error?
We cannot return errors in this function since it's not supported and logging them would make the problem visible. Since we normally don't do that in connectors (for example, the common resource syncer in this connector returns nil without any logs) I thought on adding the debug log just in case we need some visibility here
ggreer
left a comment
There was a problem hiding this comment.
I'd like for the config to be tweaked so that we have fewer event feeds, but if that's too much trouble I think this can be approved. Feel free to slack me or whatever if you think that will result in this getting merged faster.
| Actions map[string]ActionConfig `yaml:"actions" json:"actions"` | ||
|
|
||
| // IncrementalSync holds global settings for incremental sync behavior. | ||
| IncrementalSync *IncrementalSyncConfig `yaml:"incremental_sync,omitempty" json:"incremental_sync,omitempty"` |
There was a problem hiding this comment.
Nitpick: I would call this section event_feeds or something similar, as it's possible to have event feeds unrelated to incremental sync, such as keeping track of when users logged in. We don't support that in baton-sql yet, but I'm sure we'll want it in the future.
There was a problem hiding this comment.
I get your point. However this particular section is to config the look back value for the Incremental Sync process if it's convenient to have a specific value for some reason.
I think it's ok to name it like this, since we may have situations where we need different look back values for different event feed processes, right?
Maybe for a process that tracks when users logged in we want some other value for a particular reason. In that we would need a specific field for it.
This type of configs can be useful in some specific situation, it's a handy tool but I guess it won't be commonly used.
If this doesn't make sense to you, we can discuss it further I could be wrong
| } | ||
|
|
||
| // IncrementalSyncConfig holds global settings for incremental sync behavior. | ||
| type IncrementalSyncConfig struct { |
There was a problem hiding this comment.
Also I think this section should contain the query needed by the event feed, since event feeds might not be a 1:1 mapping to resource types. For example, there could be a table called audit_log that contains rows for when both users and groups were added/deleted. I've seen this in a decent number of APIs and I expect to see it in database schemas too.
| Get *GetQuery `yaml:"get,omitempty" json:"get,omitempty"` | ||
|
|
||
| // IncrementalSync defines settings for detecting changed resources via the event feed. | ||
| IncrementalSync *ResourceIncrementalSync `yaml:"incremental_sync,omitempty" json:"incremental_sync,omitempty"` |
There was a problem hiding this comment.
I think this config should live in a separate section for event feeds, though the query to get a specific resource by id should remain in each resource type. That would allow us to specify fewer event feeds to keep track of changes. We have to emit a list_events task for every event feed, so the fewer event feeds we have in a connector, the better it should perform.
btipling
left a comment
There was a problem hiding this comment.
I think Geoff's comments about event feed make sense, there's a disconnect between the event_feed implementation and the incremental sync name in the customer facing config. However a search for incremental sync across our connector shows we have a lot of config options that refer to incremental sync so it is consistent. I don't have any other feedback for this connector. I think I would make a case directly with Geoff with the name. I'm not going override his objection. Whatever this PR ships will become a permanent feature on its config and we should get it right.
Description
Support for Incremental Sync included
Useful links: