Phase 3: Stored Procedure Parameter Substitution (embed: true) #3596
Open
prshri-msft wants to merge 11 commits into
Add 'embed' boolean property to ParameterMetadata for marking stored procedure parameters that should be automatically embedded via the EmbeddingService before being passed to the sproc.
Config & Schema:
- ParameterMetadata.cs: added Embed bool property (default false)
- dab.draft.schema.json: added 'embed' to parameter array items
Validation (RuntimeConfigValidator.cs):
- embed:true only valid on stored-procedure entities
- embed:true requires runtime.embeddings configured and enabled
- embed:true cannot coexist with a default value
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Core implementation of automatic text-to-vector substitution for embed:true parameters. When a user sends text for an embed parameter, DAB automatically embeds it via EmbeddingService and passes the vector to the stored procedure.
New file:
- ParameterEmbeddingHelper.cs: text -> TryEmbedAsync -> float[] -> JSON string substitution. Handles empty/null text (400), embedding failures (500).
DI wiring:
- SqlQueryEngine + SqlMutationEngine: added IEmbeddingService? (optional)
- QueryEngineFactory + MutationEngineFactory: pass service through to engines
Execution path:
- SqlQueryEngine.ExecuteAsync (REST + GraphQL): call helper before SqlExecuteStructure construction
- SqlMutationEngine.ExecuteAsync (REST POST): same
Metadata type override (Approach B):
- MsSqlMetadataProvider: for embed:true params where SQL reports VECTOR as varbinary/Byte[], override SystemType->String, DbType->String, SqlDbType->NVarChar. Follows existing DateTime override pattern. Gated by: embed:true AND Byte[] type. Normal varbinary params unaffected.
Verified: 13 tests (5 positive + 8 negative), including semantic search, cross-language, error cases, and non-embed entity regression check.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implements 4 fixes from independent code reviews (rubber-duck + agent-skills:code-reviewer):
Fix 1 (Float precision):
- Changed ToString format from 'G' (G7 default for Single, ~30% precision loss) to 'R' (round-trippable). Embeddings are precision-sensitive for cosine similarity.
Fix 2 (Non-MSSQL rejection):
- Added Rule 0 in ValidateEmbedParameters to reject embed:true on non-MSSQL data sources. The metadata type override only exists in MsSqlMetadataProvider, so PostgreSQL/MySQL/Cosmos would fail at runtime with confusing errors. Now caught at startup with clear message.
Fix 3 (Non-string input validation):
- Added explicit type check before embedding. Handles both System.String and System.Text.Json.JsonElement (DAB wraps body values in JsonElement). Rejects Number, Boolean, Array, Object with 400.
- Azure OpenAI's embedding API only accepts strings; being permissive at DAB level would silently embed garbage like 'System.Object[]' for arrays.
Fix 4 (Hot-reload null service hard-fail):
- Removed silent skip when _embeddingService is null but embed params exist. Helper now checks upfront for any embed params, then validates service is available and throws 503 if not.
- Prevents data integrity risk during hot-reload edge cases where embeddings config gets disabled while DAB is running.
Verified: 11 tests (3 positive + 7 negative + 1 GraphQL + 1 non-embed regression). All pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
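The input check described in Fix 3 could look roughly like the sketch below. It is illustrative only: `EmbedInputSketch` and `ExtractText` are hypothetical names, and `ArgumentException` stands in for whatever exception type DAB maps to a 400 response. Only the `System.Text.Json` calls are real APIs.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical stand-in for the helper's input check; not DAB's actual code.
public static class EmbedInputSketch
{
    // Returns the text to embed, or throws ArgumentException
    // (standing in here for an HTTP 400 response).
    public static string ExtractText(string paramName, object? value)
    {
        string? text = value switch
        {
            // Plain string supplied directly.
            string s => s,
            // JSON string wrapped in a JsonElement.
            JsonElement { ValueKind: JsonValueKind.String } e => e.GetString(),
            // JSON null and CLR null are both treated as "empty".
            JsonElement { ValueKind: JsonValueKind.Null } => null,
            null => null,
            // Number, Boolean, Array, Object, or any other CLR type is rejected.
            _ => throw new ArgumentException(
                $"Parameter '{paramName}' must be a string to be embedded.")
        };

        if (string.IsNullOrWhiteSpace(text))
        {
            throw new ArgumentException(
                $"Parameter '{paramName}' must not be null, empty, or whitespace.");
        }

        return text;
    }
}
```

Rejecting non-string kinds explicitly, rather than calling `ToString()` on whatever arrives, is what avoids embedding a type name such as 'System.Object[]' by accident.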
Addresses reviewer concern about silent failure when embed:true is misconfigured on a non-VECTOR parameter. Previously, embed:true on an NVARCHAR parameter would pass validation, bypass the metadata override (since it is not Byte[]), and silently produce empty/wrong results at request time.
In MsSqlMetadataProvider.FillSchemaForStoredProcedureAsync:
- Restructured the embed:true block to validate type FIRST, then override
- If embed:true is configured but the SystemType is not Byte[] (i.e., not a VECTOR-shaped param), throw at startup with clear error message
- Catches: NVARCHAR, INT, DATETIME, and other non-VECTOR types
Documentation updates:
- ParameterMetadata.cs: XML doc warns target sproc param must be VECTOR(N)
- dab.draft.schema.json: schema description includes the requirement
- MsSqlMetadataProvider.cs: comment explains the check rationale and the remaining VARBINARY edge case (loud SQL error at request time)
Verified with real misconfiguration: created test sproc with NVARCHAR(MAX) parameter, marked embed:true in config, DAB fails to start with clear error identifying the procedure, parameter, and the actual type vs expected.
Known remaining limitation: VECTOR(N) and varbinary(N) are indistinguishable in INFORMATION_SCHEMA.PARAMETERS. Real varbinary blob misuse would still pass this check but fail at SQL execution with an implicit conversion error. That failure is loud and clear, so it is accepted. Documented; sys.parameters-based detection filed as future enhancement.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses Important issues from second-round code review:
Fix 1 (G9 vs R for float precision):
- Microsoft docs: "For Single values, the R format specifier in some cases
fails to successfully round-trip the original value. We recommend that
you use the G9 format specifier instead."
- Changed ToString("R") to ToString("G9") for guaranteed round-trip
- ParameterEmbeddingHelper.cs
Fix 2 (Scope 503 to per-request):
- Previously: helper threw 503 if entity has any embed param + service is null
- Problem: requests that omit optional embed params shouldn't fail just
because the embedding service is unavailable
- Now: 503 only thrown when this specific request supplies a value for an
embed param AND service is null
- Removed misleading hot-reload comment (engines hold cached service ref)
- ParameterEmbeddingHelper.cs
Fix 3 (Validator continue after each rule):
- Previously: a single misconfigured embed param could record up to 4 errors
in HandleOrRecord-record mode (CLI validate)
- Now: each rule fires continue after recording, preventing redundant noise
- Last rule (Default) doesn't need continue
- RuntimeConfigValidator.cs
Fix 4 (Hoist data source lookup):
- Previously: GetDataSourceFromEntityName called per-parameter inside loop
- Now: called once per entity outside the parameter loop
- Cheap fix; obvious code smell removed
- RuntimeConfigValidator.cs
Verified: all positive and negative tests still pass after these changes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
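To make Fix 1 concrete, here is a small sketch of serializing a `float[]` with `G9` and `InvariantCulture`, plus a round-trip check. `VectorJsonSketch` and its members are hypothetical names chosen for illustration and are not taken from ParameterEmbeddingHelper.cs; only the `ToString`/`float.Parse` calls are real .NET APIs.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical illustration of G9 + InvariantCulture serialization.
public static class VectorJsonSketch
{
    // Serializes a vector as a JSON array string, e.g. for an NVARCHAR parameter.
    public static string ToJsonArray(float[] vector) =>
        "[" + string.Join(
            ",",
            vector.Select(v => v.ToString("G9", CultureInfo.InvariantCulture))) + "]";

    // True when the G9 text parses back to exactly the same Single value.
    public static bool RoundTrips(float value)
    {
        string text = value.ToString("G9", CultureInfo.InvariantCulture);
        float parsed = float.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
        return parsed == value;
    }
}
```

`InvariantCulture` matters as much as `G9`: under a culture that uses a comma as the decimal separator, the same call would produce text that is not a valid JSON number.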
Reverting two changes from Stage 3.7 that, on reflection, made the code worse:
Revert 1: Removed the continue statements after each validator rule.
- Why reverted: DAB's other validators in RuntimeConfigValidator.cs do NOT use this pattern. They HandleOrRecordException and continue checking all rules. Adding continue here was inconsistent with the codebase.
- The reviewer's "noisy errors" concern was an aesthetic preference, not a correctness issue. Collect-all-errors is the established DAB pattern and better UX for users (fix all problems in one pass vs iterate).
Revert 2: Moved the embeddingService null check back to the top of the helper.
- Why reverted: The per-request scoped check made the failure mode unpredictable. Same DAB instance might fail with 503 sometimes (when embed param supplied) and succeed other times (when omitted). Hard to debug.
- The reviewer's "over-broad 503" concern was theoretical. In practice, if embed:true is in your config, the embedding service should be available. Silently working when the service is missing creates a half-broken state that limps along instead of failing clearly.
- Original Stage 3.5 behavior (fail upfront if any embed params + no service) is simpler, more predictable, and matches the principle that misconfiguration should fail loud and fast.
Kept from Stage 3.7:
- G9 vs R for float precision (correct fix, Microsoft-documented)
- Hoisted GetDataSourceFromEntityName outside parameter loop (cheap, correct)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nale
Addresses two PR review comments from JimRoberts-MS on PR ajtiwari07#1:
- Comment ajtiwari07#2 (ParameterEmbeddingHelper.cs:157): "is there a limitation that only one parameter can be configured for embedding at a time? I think we'd want to do batching here to avoid multiple sequential waits on the api"
- Comment ajtiwari07#1 (RuntimeConfigValidator.cs:507): "can this limitation about default values be clarified a bit more?"
Changes:
1. ParameterEmbeddingHelper.cs: refactor to use TryEmbedBatchAsync (addresses ajtiwari07#2)
Restructured the substitution loop into 3 phases:
- COLLECT: validate each embed param value, gather (paramName, text) pairs (preserves per-param error specificity for type/null checks)
- BATCH: single TryEmbedBatchAsync call instead of N sequential TryEmbedAsync calls (saves ~(N-1) × API_LATENCY on cache miss)
- SUBSTITUTE: write each returned vector back into resolvedParams
Behavior preserved:
- Single-embed-param path is equivalent (batch of 1)
- All error status codes unchanged (400 for bad input, 500 for service failure, 503 for missing service)
- In-place mutation contract on resolvedParams unchanged
- G9 float format preserved
Defensive checks added:
- Length mismatch between requests and returned embeddings → 500
- Null/empty embedding for any individual param → 500
Type validation extracted into private ExtractTextValue(paramName, value) helper to flatten the nested if/else in the main loop.
Verified with 16 manual test cases across REST POST/GET and GraphQL, covering single-embed and multi-embed sprocs, positive and negative inputs.
2. ParameterEmbeddingHelper.cs: fix misleading internal comment (related to ajtiwari07#1's surrounding context)
The previous comment claimed "DAB's existing required-param validation handles missing required params later", but DAB's request validation for sprocs only checks for extra fields (not missing ones). The actual mechanism that catches missing required params is the SQL Server error "expects parameter X, which was not supplied", parsed by MsSqlDbExceptionParser into a 400 DatabaseInputError. Updated the comment to describe what actually happens.
3. RuntimeConfigValidator.cs: expand embed/default rule rationale (addresses ajtiwari07#1)
Replaced the one-line rationale for "Rule 3: embed:true with a default value is not supported" with a multi-paragraph explanation that:
- Leads with the conceptual UX point: an embed param represents user input (typically a search query); defaulting it would mean the server fabricates and embeds a query the user never typed. That isn't a sensible fallback.
- Notes that defaults on non-embed params of the same sproc remain supported (rule only fires for embed: true params).
- Briefly documents why, even setting aside the UX concern, supporting embed-defaults would be non-trivial (GraphQL schema literal-baking has no VECTOR type; REST/MCP defaults would be re-embedded every request; embedding-at-startup couples startup to provider availability).
- Documents the current observed behavior when a client forgets to supply an embed param (verified empirically): explicit null/empty → 400 BadRequest from the helper; field omitted → 400 DatabaseInputError from SQL via MsSqlDbExceptionParser. Both produce a clear, actionable client error.
- Notes that the rule can be lifted later if a real use case emerges.
No behavior changes beyond the batching itself. Validator rule unchanged; helper logic for missing-value handling unchanged; error message text unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
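The three phases described above can be sketched as follows. This is a simplified illustration, not the code in ParameterEmbeddingHelper.cs: `BatchSubstitutionSketch` is a hypothetical name, the `embedBatch` delegate is an assumed stand-in for `TryEmbedBatchAsync` (whose real signature is not shown in this PR text), and plain .NET exceptions stand in for the 400/500 responses.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the collect -> batch -> substitute pipeline.
public static class BatchSubstitutionSketch
{
    public static async Task SubstituteAsync(
        IReadOnlyList<string> embedParamNames,
        IDictionary<string, object?> resolvedParams,
        // Assumed stand-in for the embedding service's batch call.
        Func<IReadOnlyList<string>, CancellationToken, Task<float[][]?>> embedBatch,
        CancellationToken cancellationToken = default)
    {
        // COLLECT: validate each supplied embed param and gather (name, text) pairs.
        List<(string Name, string Text)> pending = new();
        foreach (string name in embedParamNames)
        {
            if (!resolvedParams.TryGetValue(name, out object? value))
            {
                continue; // param omitted by the client; nothing to embed
            }

            if (value is not string text || string.IsNullOrWhiteSpace(text))
            {
                throw new ArgumentException($"Parameter '{name}' requires non-empty text.");
            }

            pending.Add((name, text));
        }

        if (pending.Count == 0)
        {
            return; // no service call when nothing was supplied
        }

        // BATCH: one call for all texts instead of one call per parameter.
        List<string> texts = pending.Select(p => p.Text).ToList();
        float[][]? vectors = await embedBatch(texts, cancellationToken);
        if (vectors is null || vectors.Length != pending.Count)
        {
            throw new InvalidOperationException("Embedding batch failed or returned a mismatched count.");
        }

        // SUBSTITUTE: write each vector back in place as a JSON array string.
        for (int i = 0; i < pending.Count; i++)
        {
            if (vectors[i] is null || vectors[i].Length == 0)
            {
                throw new InvalidOperationException($"Empty embedding for parameter '{pending[i].Name}'.");
            }

            resolvedParams[pending[i].Name] = "[" + string.Join(
                ",",
                vectors[i].Select(v => v.ToString("G9", CultureInfo.InvariantCulture))) + "]";
        }
    }
}
```

Validating everything in the collect phase before the service call keeps per-parameter error messages specific and avoids paying for an embedding request that would be discarded anyway.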
Behavior-preserving refactors to make Phase 3's previously-untestable internal
logic accessible to the test project. No production behavior changes.
Three changes:
1. src/Core/Azure.DataApiBuilder.Core.csproj — add InternalsVisibleTo
New ItemGroup:
<InternalsVisibleTo Include="Azure.DataApiBuilder.Service.Tests" />
Lets the test project directly invoke 'internal' members of Core. This is
a new pattern in DAB (no existing usages of InternalsVisibleTo). Adopted
intentionally so we can test implementation-detail helpers without
expanding the production public API surface.
2. src/Core/Configurations/RuntimeConfigValidator.cs — visibility change
ValidateEmbedParameters: private → internal
Now testable from Service.Tests via the InternalsVisibleTo bridge above.
Added an XML <remarks> block explaining why it's internal rather than
private, and noting that external callers should still go through
ValidateConfigProperties.
3. src/Core/Services/MetadataProviders/MsSqlMetadataProvider.cs — extract helper
Pulled the inline embed type override + non-VECTOR rejection logic out
of FillSchemaForStoredProcedureAsync into a new internal static method:
internal static void ApplyEmbedTypeOverride(
ParameterDefinition parameterDefinition,
ParameterMetadata paramMetadata,
string schemaName,
string storedProcedureName,
string parameterName)
The body is byte-for-byte the same logic that was previously inline
(early-returns when Embed is false; throws if not Byte[]; otherwise
overrides SystemType/DbType/SqlDbType). All explanatory comments
preserved on the helper as XML <remarks>. The original call site is
replaced with a single call to ApplyEmbedTypeOverride.
Helper is internal static so tests can construct ParameterDefinition
instances and invoke it directly without needing a real metadata
provider or DB connection.
Why this commit lands separately from the actual tests
------------------------------------------------------
These three changes are pure refactors with no test code yet. Stage 4.2-4.4
add the new tests that depend on these refactors. Splitting the refactors
into their own commit keeps each commit focused and bisect-friendly:
this commit can be reverted independently if a future refactor needs to
reorganize the helper without disturbing test code.
Verification
------------
- dotnet build src/Core/Azure.DataApiBuilder.Core.csproj -c Release →
Build succeeded. 0 Warning(s). 0 Error(s).
- dotnet build src/Service.Tests/Azure.DataApiBuilder.Service.Tests.csproj
-c Release →
Build succeeded. 0 Warning(s). 0 Error(s).
(Confirms InternalsVisibleTo is honored — test project still builds
cleanly with no test code yet referencing the internals.)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
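For readers who want to picture the extracted helper, the sketch below mirrors the three steps the message describes (early return, Byte[] check, type override). `ParameterDefinitionSketch` is a hypothetical, trimmed-down stand-in for DAB's ParameterDefinition, and the signature is simplified relative to the one listed above; the enum values come from the real `System.Data` namespace.

```csharp
using System;
using System.Data;

// Hypothetical, trimmed-down stand-in for DAB's ParameterDefinition.
public sealed class ParameterDefinitionSketch
{
    public Type SystemType { get; set; } = typeof(object);
    public DbType? DbType { get; set; }
    public SqlDbType? SqlDbType { get; set; }
}

public static class EmbedTypeOverrideSketch
{
    // Simplified signature: the real helper also receives ParameterMetadata and a schema name.
    public static void Apply(
        ParameterDefinitionSketch definition,
        bool embed,
        string storedProcedureName,
        string parameterName)
    {
        // 1. Non-embed parameters are left exactly as the metadata reported them.
        if (!embed)
        {
            return;
        }

        // 2. VECTOR params surface as Byte[]; anything else is a misconfiguration.
        if (definition.SystemType != typeof(byte[]))
        {
            throw new InvalidOperationException(
                $"Parameter '{parameterName}' of '{storedProcedureName}' is marked embed:true " +
                $"but is reported as {definition.SystemType.Name}, not a VECTOR-shaped type.");
        }

        // 3. Route the parameter through the string pipeline.
        definition.SystemType = typeof(string);
        definition.DbType = System.Data.DbType.String;
        definition.SqlDbType = System.Data.SqlDbType.NVarChar;
    }
}
```

Because the method is static and touches only its arguments, a test can build a definition object and call it directly, which is the point of the extraction.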
Adds the first formal automated test coverage for Phase 3 production code.
New file: src/Service.Tests/UnitTests/ParameterEmbeddingHelperTests.cs
28 tests covering ParameterEmbeddingHelper.SubstituteEmbedParametersAsync, organized into 7 #region blocks (matching the EmbeddingServiceTests.cs "#region Batch Embedding Tests" pattern):
#region No-Op Cases (4 tests)
- NullConfigParams_ReturnsImmediately_NoServiceCall
- NoEmbedParams_ReturnsImmediately_NoServiceCall
- EmbedParamsConfiguredButNoneSupplied_ReturnsAfterCollect_NoServiceCall
- EmptyConfigParamsList_ReturnsImmediately_NoServiceCall
#region Service Availability (2 tests)
- NullService_WithEmbedParams_Throws503 (defense-in-depth)
- NullService_WithoutEmbedParams_NoThrow (backward compat)
#region Input Type Validation (8 tests)
- PlainString_AcceptsAndEmbeds
- JsonElementString_AcceptsAndEmbeds
- JsonElementNumber_Throws400
- JsonElementBoolean_Throws400
- JsonElementArray_Throws400
- JsonElementObject_Throws400
- JsonElementNull_Throws400AsEmpty
- NonStringNonJsonElementValue_Throws400
#region Empty And Whitespace Validation (3 tests)
- EmptyString_Throws400
- WhitespaceString_Throws400
- NullValue_Throws400
#region Batching Behavior (5 tests), covers Jim review comment ajtiwari07#2
- SingleEmbedParam_CallsBatchOnce_NotSequential
- MultipleEmbedParams_CallsBatchOnce_NotSequential ← key batching guarantee
- MixedEmbedAndNonEmbed_OnlyEmbedTextsBatched
- MultipleEmbedParams_OrderPreserved_BatchTextsMatchConfigOrder
- PartiallySuppliedEmbedParams_BatchesSubsetOnly
#region Batch Result Handling (4 tests)
- BatchSuccess_VectorsSubstitutedInResolvedParams
- BatchFailure_Throws500_WithAllParamNames
- BatchLengthMismatch_Throws500
- IndividualEmbeddingEmpty_Throws500_NamingFailedParam
#region Output Format And Cancellation (2 tests)
- VectorJson_UsesG9AndInvariantCulture ← validates locale-independent G9 float serialization
- CancellationToken_ForwardedToEmbeddingService
Implementation notes
--------------------
- Mocking pattern: Mock<IEmbeddingService> (Strict). Each test sets up the expected batch call and verifies post-conditions on resolvedParams. No database, no DI container. Pure unit tests.
- Helper factories at top of class (EmbedParam, NormalParam, JsonElementFrom, SetupBatch, VerifyBatchedExactlyOnce) keep individual test bodies focused on the behavior being tested.
- JsonElement construction uses JsonDocument.Parse(...).RootElement.Clone() so the parsed element survives after the source document is disposed.
- Float values used in expected-output assertions are powers of 1/2 (0.5, 0.25, 0.125) which are exactly representable in binary float and round-trip through G9 to the same string representation. The "G9 + InvariantCulture" test specifically uses non-exact values (0.1f, -0.2f, 0.0001234567f) and asserts on parsed-back values rather than exact strings to verify precision and locale independence.
- `#nullable enable` directive at the top of the file is required because the helper signature uses IDictionary<string, object?> and the test project is set to `<Nullable>disable</Nullable>` globally. Per-file enable is the smallest scoped change that lets us match the helper's signature without introducing project-wide nullable warnings.
Test results
------------
- dotnet build src/Service.Tests/Azure.DataApiBuilder.Service.Tests.csproj -c Release → Build succeeded. 0 Warning(s). 0 Error(s).
- dotnet test --filter "FullyQualifiedName~ParameterEmbeddingHelperTests" → Total tests: 28. Passed: 28. Failed: 0.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ests)
Adds focused unit tests for two pieces of Phase 3 production code that
were untested before Stage 4.1's refactors made them testable.
Files modified
--------------
1. src/Service.Tests/UnitTests/ConfigValidationUnitTests.cs (+10 tests)
New `#region Embed Parameters Validation` block targeting
`RuntimeConfigValidator.ValidateEmbedParameters` (now `internal` after
Stage 4.1; accessible via `[InternalsVisibleTo]`).
Test coverage by validation rule:
Rule 0 — embed:true requires MSSQL data source (2 [DataRow]s)
- Rule 0: embed:true on PostgreSQL → 503
- Rule 0: embed:true on MySQL → 503
Rule 1 — embed:true requires stored-procedure entity (2 [DataRow]s + 1)
- ValidateEmbedParameters_EmbedOnStoredProcedure_NoError (happy path)
- Rule 1: embed:true on Table → 503
- Rule 1: embed:true on View → 503
Rule 2 — embed:true requires embeddings configured (2 [DataRow]s)
- Rule 2: embeddings.enabled=false → 503
- Rule 2: embeddings section missing → 503
Rule 3 — embed:true cannot have a default value (2 tests)
- ValidateEmbedParameters_EmbedTrue_WithDefault_ThrowsConfigError
- ValidateEmbedParameters_EmbedTrue_WithoutDefault_NoError
Multi-entity message content (1 test)
- ValidateEmbedParameters_MultipleEntities_OneViolates_NamesViolatingEntityAndParam
(asserts the error names ONLY the offending entity/param,
not the healthy ones)
Helper methods (private static, scoped to the test class):
- BuildEmbeddingsEnabled() — valid EmbeddingsOptions for happy path
- BuildSprocEntity(...) — stored-procedure entity with one parameter
- BuildEntityWithSourceType(...) — table/view/sproc entity for Rule 1 tests
- BuildRuntimeConfigForEmbedTest(...) — assembles the runtime config
Uses [DataTestMethod] + [DataRow] where test shapes repeat (matches the
existing ValidateEmbeddingsOptions_BaseUrl pattern in this file).
2. src/Service.Tests/UnitTests/SqlMetadataProviderUnitTests.cs (+6 tests)
New `#region Embed Type Override` block targeting
`MsSqlMetadataProvider.ApplyEmbedTypeOverride` (the static helper extracted
in Stage 4.1; `internal` accessible via `[InternalsVisibleTo]`).
Test coverage:
Type override behavior (4 tests)
- ApplyEmbedTypeOverride_ByteArrayParam_WithEmbedTrue_OverridesToString
- ApplyEmbedTypeOverride_StringParam_WithEmbedTrue_ThrowsAtStartup
- ApplyEmbedTypeOverride_IntParam_WithEmbedTrue_ThrowsAtStartup
- ApplyEmbedTypeOverride_ByteArrayParam_WithEmbedFalse_NoChange
Edge cases (2 tests)
- ApplyEmbedTypeOverride_ByteArrayWithRequiredAndDefault_OnlyTypeMetadataChanges
(proves helper only mutates type fields, not Required/Default/etc.)
- ApplyEmbedTypeOverride_MultipleParams_OnlyEmbedTrueOnesOverridden
(multi-call scenario; proves no cross-call state)
Implementation notes
--------------------
- All 16 tests construct ParameterDefinition / ParameterMetadata / Entity /
RuntimeConfig instances directly. No DB connection, no DI container, no
full metadata-provider construction. Pure unit tests.
- Validator-test helper deliberately leaves embeddings as null when not
passed (NOT defaulted to enabled). Tests targeting Rule 2's "section
missing" path can pass null literally; tests targeting other rules
must pass BuildEmbeddingsEnabled() explicitly.
- Order of rule checks (Rule 0 → 1 → 2 → 3) is reflected in the test
configurations: tests targeting a specific rule satisfy all earlier
rules so the targeted rule fires first.
- StringAssert.Contains is used for error-message verification rather
than full-string equality, to allow the validator's error messages
to evolve without breaking tests.
Test results
------------
- dotnet build src/Service.Tests/Azure.DataApiBuilder.Service.Tests.csproj
-c Release → Build succeeded. 0 Warning(s). 0 Error(s).
- dotnet test --filter
"FullyQualifiedName~ApplyEmbedTypeOverride|FullyQualifiedName~ValidateEmbedParameters"
→ Total tests: 16. Passed: 16. Failed: 0.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two minor improvements to RuntimeConfigValidator.ValidateEmbedParameters based on review comments on PR ajtiwari07#1.
1. Add entity-level fast-path short-circuit (line ~452)
Skip entities whose parameters list contains no embed:true entry, before doing the data-source lookup and entering the inner param loop. Avoids GetDataSourceFromEntityName() and the inner foreach for the common case of entities whose params are all normal pass-through.
Before:
    foreach entity:
        if Parameters is null: continue
        lookup data source (work)
        foreach param:
            if !Embed: continue (work, repeated per-param)
            ... rules ...
After:
    foreach entity:
        if Parameters is null: continue
        if !Parameters.Any(p => p.Embed): continue (NEW fast-path)
        lookup data source
        foreach param:
            if !Embed: continue
            ... rules ...
The inner !Embed continue is left in place so Rule fields (param.Name etc.) are still scoped per-param when rules fire.
2. Add TODO comment near the MSSQL-only check (line ~477)
One-line // TODO: comment noting that PostgreSQL/MySQL could be supported once their metadata providers grow embed-aware type-override logic. Documents the intentional scope boundary for future contributors.
Behavior preserved
------------------
- All 4 validation rules still fire identically when embed:true params are present.
- Skipped entities (no embed:true params) produce the same observable behavior: the inner loop's per-param continue would have skipped them anyway. The change is pure performance; no rules are bypassed.
Verification
------------
- dotnet build src/Core/Azure.DataApiBuilder.Core.csproj -c Release → Build succeeded. 0 Warning(s). 0 Error(s).
- All 44 Phase 3 unit tests pass:
  dotnet test --filter "FullyQualifiedName~ParameterEmbeddingHelperTests |FullyQualifiedName~ValidateEmbedParameters |FullyQualifiedName~ApplyEmbedTypeOverride" → Passed: 44, Failed: 0.
- The 10 ValidateEmbedParameters tests in particular still cover all 4 rules; the fast-path change doesn't bypass any of them when an embed:true param is present.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
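A rough C# rendering of the "After" loop, combined with the collect-all-errors behavior the validator keeps, might look like this. Everything here is a hypothetical simplification: `ParamSketch`, `EntitySketch`, and `EmbedValidationSketch` are invented for illustration, errors are collected as strings rather than via DAB's exception handling, and the data source type is a plain string.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified config shapes for illustration only.
public sealed record ParamSketch(string Name, bool Embed, object? Default = null);
public sealed record EntitySketch(
    string Name,
    string DatabaseType,
    bool IsStoredProcedure,
    IReadOnlyList<ParamSketch>? Parameters);

public static class EmbedValidationSketch
{
    // Returns every rule violation found, rather than stopping at the first one.
    public static List<string> Validate(IEnumerable<EntitySketch> entities, bool embeddingsEnabled)
    {
        List<string> errors = new();
        foreach (EntitySketch entity in entities)
        {
            if (entity.Parameters is null)
            {
                continue;
            }

            // Fast path: skip entities with no embed:true parameter at all.
            if (!entity.Parameters.Any(p => p.Embed))
            {
                continue;
            }

            foreach (ParamSketch param in entity.Parameters)
            {
                if (!param.Embed)
                {
                    continue;
                }

                string where = $"entity '{entity.Name}', parameter '{param.Name}'";

                // Rule 0: MSSQL data sources only.
                if (entity.DatabaseType != "mssql") errors.Add($"{where}: embed requires MSSQL.");
                // Rule 1: stored-procedure entities only.
                if (!entity.IsStoredProcedure) errors.Add($"{where}: embed requires a stored procedure.");
                // Rule 2: embeddings must be configured and enabled.
                if (!embeddingsEnabled) errors.Add($"{where}: embed requires runtime.embeddings.");
                // Rule 3: no default value on an embed parameter.
                if (param.Default is not null) errors.Add($"{where}: embed cannot have a default.");
            }
        }

        return errors;
    }
}
```

With no `continue` between rules, a parameter that violates all four produces four messages, which is the collect-all-errors trade-off discussed in the earlier revert.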
Pull request overview
Adds “Phase 3” support for stored-procedure parameter substitution via embeddings by introducing an embed: true flag on sproc parameters. At runtime, DAB converts user-provided text into an embedding vector using IEmbeddingService and substitutes the serialized vector into the sproc parameter value before execution (MSSQL only), with startup validation and unit tests covering key behaviors.
Changes:
- Extend config/schema to support `parameters[].embed` and validate correct usage at startup (MSSQL + stored-proc only, embeddings enabled, no defaults).
- Add `ParameterEmbeddingHelper.SubstituteEmbedParametersAsync` and wire it into REST + GraphQL stored-procedure execution paths.
- Override MSSQL VECTOR parameter metadata (reported as `byte[]`) to flow through the string pipeline, plus add unit tests for type override and substitution logic.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/Service.Tests/UnitTests/SqlMetadataProviderUnitTests.cs | Adds unit tests for MSSQL VECTOR param type override behavior when embed: true is set. |
| src/Service.Tests/UnitTests/ParameterEmbeddingHelperTests.cs | New comprehensive unit tests for batching, validation, formatting, and cancellation in parameter embedding substitution. |
| src/Service.Tests/UnitTests/ConfigValidationUnitTests.cs | Adds unit tests for startup validation rules governing embed: true usage. |
| src/Core/Services/MetadataProviders/MsSqlMetadataProvider.cs | Applies and exposes ApplyEmbedTypeOverride to validate/override VECTOR-shaped parameters for embedding substitution. |
| src/Core/Services/Embeddings/ParameterEmbeddingHelper.cs | Implements the collect → batch → substitute pipeline to embed text parameters into vector JSON strings. |
| src/Core/Resolvers/SqlQueryEngine.cs | Wires embedding substitution into GraphQL sproc execution and REST sproc execution. |
| src/Core/Resolvers/SqlMutationEngine.cs | Wires embedding substitution into REST stored-procedure mutation execution. |
| src/Core/Resolvers/Factories/QueryEngineFactory.cs | Passes optional IEmbeddingService into SqlQueryEngine. |
| src/Core/Resolvers/Factories/MutationEngineFactory.cs | Passes optional IEmbeddingService into SqlMutationEngine. |
| src/Core/Configurations/RuntimeConfigValidator.cs | Adds ValidateEmbedParameters startup validation invoked during config validation. |
| src/Core/Azure.DataApiBuilder.Core.csproj | Adds InternalsVisibleTo to allow direct unit testing of new internal helpers. |
| src/Config/ObjectModel/ParameterMetadata.cs | Adds the Embed property and XML docs to parameter metadata. |
| schemas/dab.draft.schema.json | Adds embed to the JSON schema for stored-procedure parameter metadata. |
Comment on lines +34 to +38
    /// SQL Server's metadata system reports VECTOR(N) and varbinary indistinguishably,
    /// so DAB cannot detect this misconfiguration at startup. If embed:true is applied
    /// to a non-VECTOR parameter (e.g., NVARCHAR or VARBINARY), the request will fail
    /// at runtime with a SQL error or return semantically incorrect results.
    /// It is the developer's responsibility to ensure the sproc parameter is VECTOR(N).
    "default": { "type": ["string", "number", "boolean", "null"], "description": "Default value" },
    "description": { "type": "string", "description": "Parameter description. Since descriptions for multiple parameters are provided as a comma-separated string, individual parameter descriptions must not contain a comma (',')." }
    "description": { "type": "string", "description": "Parameter description. Since descriptions for multiple parameters are provided as a comma-separated string, individual parameter descriptions must not contain a comma (',')." },
    "embed": { "type": "boolean", "description": "When true, the parameter text is automatically converted to an embedding vector via the configured embedding service before being passed to the stored procedure. Requires runtime.embeddings to be configured. Only valid on stored-procedure entities. The target stored procedure parameter must be declared as VECTOR(N) — DAB cannot detect non-VECTOR misconfigurations at startup due to SQL Server metadata limitations.", "default": false }
Comment on lines +169 to +170
    throw new DataApiBuilderException(
    message: $"Failed to generate embeddings for parameter(s) {paramNames}.",
Comment on lines +62 to 65
    <InternalsVisibleTo Include="Azure.DataApiBuilder.Service.Tests" />
    </ItemGroup>

    <ItemGroup>
Comment on lines +428 to +429
    /// Validates that parameters with embed=true are only used on stored-procedure entities
    /// and that runtime.embeddings is configured when embed parameters are present.
Implements #3331 — Parameter Substitution.
Closes #3331.
What this PR adds
A new `embed: true` flag for stored-procedure parameters. When the flag is set, DAB:
- Calls `IEmbeddingService` (introduced in Enable internal text embedding API #3441) to convert text → vector embedding
- Substitutes the vector as the `@param` value passed to the stored procedure
- The stored procedure runs `VECTOR_DISTANCE` (or any other VECTOR-shaped logic) and returns results
This enables a single-call semantic-search API surface: the client sends free text, DAB handles the embedding, the database does the vector match.
Example
Config:
The target stored procedure must declare `@query_vector VECTOR(N)` (e.g., `VECTOR(1536)`):
Request:
Response: ranked products via the procedure's `VECTOR_DISTANCE` semantic match.
Implementation overview
- `embed` boolean property on `ParameterMetadata`; documented in JSON schema.
- `ValidateEmbedParameters` in `RuntimeConfigValidator` enforces 4 rules at startup: (0) MSSQL data sources only, (1) stored-procedure entities only, (2) `runtime.embeddings` must be configured + enabled, (3) cannot be combined with a `default` value.
- `SqlQueryEngine`, `SqlMutationEngine`, and their factories accept an optional `IEmbeddingService?`. Existing engines without embed params work unchanged (null service is fine when no embed params exist).
- `ParameterEmbeddingHelper.SubstituteEmbedParametersAsync`: three-phase Collect → Batch → Substitute pipeline. Uses `TryEmbedBatchAsync` so multi-embed-param sprocs make a single API call instead of N sequential ones.
- `MsSqlMetadataProvider` overrides VECTOR(N) sproc params (reported as `Byte[]` by SQL Server's `INFORMATION_SCHEMA.PARAMETERS`) to flow through DAB's String type pipeline. SQL Server auto-casts NVARCHAR → VECTOR at execution. Non-VECTOR params with `embed: true` are rejected at startup with a clear error.
- Vector floats are serialized using `G9` format with `InvariantCulture` to guarantee round-trip precision and locale independence.
Test coverage
44 new unit tests added across 3 files. All pass cleanly.
- `src/Service.Tests/UnitTests/ParameterEmbeddingHelperTests.cs` (new): `SubstituteEmbedParametersAsync`: no-op cases, service availability, input type validation (string + JsonElement variants), empty/whitespace rejection, single-and-multi batching, batch result handling (success/failure/length mismatch/empty individual vector), G9 + InvariantCulture output format, cancellation token forwarding
- `src/Service.Tests/UnitTests/ConfigValidationUnitTests.cs` (extended)
- `src/Service.Tests/UnitTests/SqlMetadataProviderUnitTests.cs` (extended): `ApplyEmbedTypeOverride` helper: Byte[]+embed:true overridden, non-Byte[]+embed:true throws at startup, embed:false untouched, multi-param scenarios
Run with:
Manual end-to-end verification
Beyond the unit tests, the feature was manually verified against a live Azure SQL database (with VECTOR(1536) sproc) and Azure OpenAI (`text-embedding-3-small`). 16+ scenarios across REST POST, REST GET, and GraphQL, covering positive cases (single + multi-embed-param sprocs), input validation negatives (empty, whitespace, non-string types), and missing-param cases.
Prior review history (while #3441 was still in flight): ajtiwari07/data-api-builder#1.