Add product features support on products.yml#3043

Merged
cotti merged 56 commits into main from feature/product-features on Apr 15, 2026

@cotti
Contributor

@cotti cotti commented Apr 6, 2026

This pull request introduces a new "features" system for products, allowing finer control over which subsystems each product participates in (such as release notes and public documentation). It enables products to be included in release notes without requiring them to be public documentation products, and updates the configuration, schema, and codebase to support this distinction. Documentation and validation logic are updated to explain and enforce the new behavior.

Features system for products

  • Added a features mapping to product definitions in products.yml and documented it in products.md, allowing products to opt in to public-reference (public docs) and/or release-notes (changelog) features. Products with no features mapping participate in all subsystems for backward compatibility.
  • Updated the schema and validation logic so that only products with public-reference enabled can be used in documentation frontmatter or as page references; products with only release-notes enabled can appear in changelogs but not in docs.
  • Updated changelog documentation to clarify that products with only the release-notes feature are valid for changelogs, and provided guidance for adding such products.
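
The defaulting rule above can be sketched as follows. This is a language-neutral illustration in Python, not the project's C# code. The names `resolve_features` and `KNOWN_FEATURES` are hypothetical, and the handling of a key omitted from an explicit mapping (treated as disabled here) is an assumption based on the "opt in" wording.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# The two feature keys named in the PR description.
KNOWN_FEATURES = {"public-reference", "release-notes"}


@dataclass(frozen=True)
class ProductFeatures:
    public_reference: bool
    release_notes: bool


def resolve_features(product_id: str,
                     features: Optional[Mapping[str, bool]]) -> ProductFeatures:
    """Resolve a product's optional features mapping (hypothetical helper)."""
    # No mapping at all: participate in every subsystem (backward compatible).
    if features is None:
        return ProductFeatures(public_reference=True, release_notes=True)

    # Reject unknown keys so typos fail loudly instead of being ignored.
    unknown = set(features) - KNOWN_FEATURES
    if unknown:
        raise ValueError(
            f"Product '{product_id}' has unknown feature key(s): {sorted(unknown)}"
        )

    # Assumption: a key omitted from an explicit mapping counts as disabled.
    return ProductFeatures(
        public_reference=bool(features.get("public-reference", False)),
        release_notes=bool(features.get("release-notes", False)),
    )
```

Under these assumptions, a product declaring only `release-notes` resolves to a changelog-only product, while a product with no mapping keeps both features.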

Codebase changes to support features

  • Introduced the ProductFeatures type and updated the Product and ProductsConfiguration records to track feature participation, including a new PublicReferenceProducts collection for fast lookups.
  • Updated YAML parsing and product lookup logic throughout the codebase to use the new PublicReferenceProducts collection where appropriate, ensuring only eligible products are used in documentation contexts.

Versioning system improvements

  • Added a none versioning system to support products that only participate in release notes and do not have public documentation or versions.
  • Updated product creation logic to assign the none versioning system to products that are not public-reference and have no explicit versioning.
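
A minimal sketch of that assignment rule, written in Python for illustration rather than taken from the C# change. `resolve_versioning` and `known_systems` are hypothetical names. The fallback from an explicit versioning value to the product ID mirrors the `kvp.Value.Versioning ?? kvp.Key` expression quoted in the review comment further down the thread.

```python
from typing import Optional, Set

NONE_VERSIONING = "none"  # stands in for the new "none" versioning system


def resolve_versioning(product_id: str,
                       explicit_versioning: Optional[str],
                       known_systems: Set[str],
                       public_reference: bool) -> Optional[str]:
    """Pick a versioning system for a product (hypothetical helper)."""
    # Look up the explicit versioning value, falling back to the product ID.
    candidate = explicit_versioning or product_id
    if candidate in known_systems:
        return candidate

    # Release-notes-only products do not need real versions.
    if not public_reference:
        return NONE_VERSIONING

    # Public-reference product with no resolvable versioning: left unresolved
    # in this sketch; the caller decides how to handle it.
    return None
```

The unresolved case for public-reference products is deliberately left as `None` here, since the description does not say how it is handled.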

Test and integration updates

  • Updated test helpers and integration tests to initialize the new PublicReferenceProducts property in ProductsConfiguration.

Documentation updates

  • Updated documentation to clarify the behavior of the new features system, including how substitutions and product references work, and added notes to help users configure products correctly for their intended use cases.

These changes make it possible to track internal tools and other non-public products in release notes without exposing them as documentation products, while maintaining backward compatibility for existing product definitions.

@cotti cotti self-assigned this Apr 6, 2026
@cotti cotti requested review from a team as code owners April 6, 2026 16:49
@cotti cotti requested a review from reakaleek April 6, 2026 16:49
@coderabbitai coderabbitai bot added the documentation and feature labels Apr 6, 2026
@coderabbitai
Contributor

coderabbitai bot commented Apr 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This PR adds per-product feature flags and a derived PublicReferenceProducts subset. Products may declare features.public-reference and/or features.release-notes; absence of features defaults to both enabled. Product creation resolves features, validates feature keys, and may assign VersioningSystem.None for products without public-reference. A PublicReferenceProducts frozen dictionary is exposed and used for substitution generation and frontmatter validation; changelog tooling and other consumers that need all products continue to use the full Products map. VersioningSystem.None and products YAML were updated accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant FS as FileSystem (config files)
    participant ConfigBuilder as ConfigurationFile.CreateProducts
    participant ProductsConfig as ProductsConfiguration
    participant Substitutions as ConfigurationFile (substitutions)
    participant FrontMatter as ProductConverter / YamlSerialization
    participant Changelog as docs-builder changelog

    FS->>ConfigBuilder: read products.yml (with features)
    ConfigBuilder->>ProductsConfig: build Products map
    ConfigBuilder->>ProductsConfig: derive PublicReferenceProducts (filter public-reference)
    ProductsConfig->>Substitutions: provide PublicReferenceProducts
    Substitutions->>Substitutions: generate product.{id} substitutions only for PublicReferenceProducts
    ProductsConfig->>FrontMatter: provide PublicReferenceProducts
    FrontMatter->>FrontMatter: validate frontmatter product IDs against PublicReferenceProducts
    ProductsConfig->>Changelog: provide Products (all configured products)
    Changelog->>Changelog: allow changelog entries for any product (including release-notes-only)
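
The split between the two product collections in the diagram can be illustrated with a short Python sketch. It is not the project's C# implementation: the function names and the dictionary shape are hypothetical, and `MappingProxyType` only stands in for the frozen dictionary mentioned in the walkthrough.

```python
from types import MappingProxyType
from typing import Dict, Iterable, List, Mapping


def derive_public_reference(products: Mapping[str, Dict[str, bool]]) -> Mapping[str, Dict[str, bool]]:
    """Return a read-only view holding only public-reference products."""
    subset = {pid: p for pid, p in products.items() if p["public_reference"]}
    return MappingProxyType(subset)


def unknown_frontmatter_products(ids: Iterable[str],
                                 public_reference_products: Mapping[str, object]) -> List[str]:
    """Frontmatter may only name public-reference products."""
    return [pid for pid in ids if pid not in public_reference_products]


def unknown_changelog_products(ids: Iterable[str],
                               all_products: Mapping[str, object]) -> List[str]:
    """Changelog entries may name any configured product."""
    return [pid for pid in ids if pid not in all_products]


products = {
    "elasticsearch": {"public_reference": True, "release_notes": True},
    "internal-tool": {"public_reference": False, "release_notes": True},
}
public = derive_public_reference(products)
```

With this data, `internal-tool` is rejected in frontmatter but accepted in a changelog entry, which is the distinction the PR introduces.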

Suggested labels

feature

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.90% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: introducing a features system for products in products.yml configuration. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the features system, codebase changes, versioning improvements, and documentation updates. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Elastic.Documentation.Configuration/Products/ProductExtensions.cs`:
- Around line 25-27: When creating products in CreateProducts
(ProductExtensions.cs), detect when
ResolveVersioningSystem(versionsConfiguration, kvp.Value.Versioning ?? kvp.Key)
returns null while the product is marked as public-reference
(kvp.Value.PublicReference == true) and throw a descriptive exception to fail
fast; keep allowing a null versioning system for release-notes-only products
(e.g., kvp.Value.ReleaseNotesOnly == true) so those still accept null. Update
both places where VersioningSystem is set (the block using
ResolveVersioningSystem at the top and the similar block at lines ~46-49) to
perform this null-check after calling ResolveVersioningSystem and throw a clear
error referencing the product key when public-reference is enabled. Ensure the
exception message includes the product identifier (kvp.Key) and the
invalid/missing versioning value to aid debugging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9a50b598-b075-49c7-974c-a01e100de1b8

📥 Commits

Reviewing files that changed from the base of the PR and between c4473c9 and f186f95.

📒 Files selected for processing (21)
  • config/changelog.example.yml
  • config/products.yml
  • docs/configure/site/products.md
  • docs/contribute/changelog.md
  • docs/syntax/frontmatter.md
  • src/Elastic.Documentation.Configuration/Builder/ConfigurationFile.cs
  • src/Elastic.Documentation.Configuration/Products/Product.cs
  • src/Elastic.Documentation.Configuration/Products/ProductExtensions.cs
  • src/Elastic.Markdown/Myst/FrontMatter/Products.cs
  • src/Elastic.Markdown/Myst/YamlSerialization.cs
  • tests-integration/Elastic.Assembler.IntegrationTests/TestHelpers.cs
  • tests-integration/Mcp.Remote.IntegrationTests/McpToolsIntegrationTestsBase.cs
  • tests/Elastic.ApiExplorer.Tests/TestHelpers.cs
  • tests/Elastic.Changelog.Tests/Changelogs/ChangelogTestBase.cs
  • tests/Elastic.Documentation.Build.Tests/TestHelpers.cs
  • tests/Elastic.Documentation.Configuration.Tests/CrossLinkRegistryTests.cs
  • tests/Elastic.Documentation.Configuration.Tests/DocumentInferrerServiceTests.cs
  • tests/Elastic.Documentation.Configuration.Tests/ProductFeaturesTests.cs
  • tests/Elastic.Documentation.Configuration.Tests/VersionInferenceTests.cs
  • tests/Elastic.Markdown.Tests/TestHelpers.cs
  • tests/authoring/Framework/Setup.fs

Comment thread src/Elastic.Documentation.Configuration/Products/ProductExtensions.cs Outdated
@cotti cotti force-pushed the feature/product-features branch from 7a01172 to f186f95 Compare April 6, 2026 17:37
@cotti cotti removed the documentation label Apr 6, 2026
@coderabbitai coderabbitai bot added the documentation label and removed the feature label Apr 6, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
src/Elastic.Documentation.Configuration/Products/ProductExtensions.cs (1)

24-27: ⚠️ Potential issue | 🟠 Major

Fail fast when public-reference product has unresolved versioning

At Line 26, null versioning is only handled for non-public-reference products. Public-reference products can still be created with VersioningSystem = null, which defers failure to later paths.

Suggested fix
 				var features = ResolveFeatures(kvp.Key, kvp.Value.Features);
 				var versioningSystem = ResolveVersioningSystem(versionsConfiguration, kvp.Value.Versioning ?? kvp.Key);

-				if (versioningSystem is null && !features.PublicReference)
-					versioningSystem = VersioningSystem.None;
+				if (versioningSystem is null)
+				{
+					if (!features.PublicReference)
+						versioningSystem = VersioningSystem.None;
+					else
+						throw new InvalidOperationException(
+							$"Product '{kvp.Key}' has invalid or missing versioning '{kvp.Value.Versioning ?? kvp.Key}' while 'public-reference' is enabled."
+						);
+				}

As per coding guidelines, "Fail fast by throwing exceptions early rather than hiding errors."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Elastic.Documentation.Configuration/Products/ProductExtensions.cs` around
lines 24 - 27, The code calls ResolveVersioningSystem(...) to populate
versioningSystem but only defaults null to VersioningSystem.None when
features.PublicReference is false; update the post-ResolveVersioningSystem check
so that if versioningSystem is null and features.PublicReference is true you
throw an exception (e.g., InvalidOperationException) with a clear message
including the product identifier (kvp.Key or kvp.Value.Name) to fail fast; keep
the existing branch that sets VersioningSystem.None when
features.PublicReference is false. Ensure this change is made in
ProductExtensions near the ResolveVersioningSystem(...) usage to prevent
creating public-reference products with unresolved versioning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 428f2b9c-f25b-4a6e-88ab-4318a4b85691

📥 Commits

Reviewing files that changed from the base of the PR and between f186f95 and 4d367e9.

📒 Files selected for processing (3)
  • src/Elastic.Documentation.Configuration/Products/ProductExtensions.cs
  • src/Elastic.Documentation.Configuration/Versions/VersionConfiguration.cs
  • tests/Elastic.Documentation.Configuration.Tests/ProductFeaturesTests.cs
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/Elastic.Documentation.Configuration.Tests/ProductFeaturesTests.cs

@coderabbitai coderabbitai bot added the feature label and removed the documentation label Apr 6, 2026
@coderabbitai coderabbitai bot added the documentation label and removed the feature label Apr 6, 2026
Contributor

@lcawl lcawl left a comment

The doc updates LGTM!

@coderabbitai coderabbitai bot added the feature label and removed the documentation label Apr 6, 2026
Mpdreamz and others added 27 commits April 15, 2026 11:52
ValidateRedirects now uses Configuration.IsExcluded (docset globs, folder
TOC excludes, include overrides) so deleted or renamed Markdown under
excluded trees does not require redirects.yml entries.

Adds ConfigurationFileExcludeTests for Elasticsearch-style exclude globs.

Made-with: Cursor
)

* Search: Add content_last_updated field for content-only change tracking

Add a new content_last_updated field to DocumentationDocument that only
advances when the page content (stripped_body) actually changes, ignoring
metadata-only changes like navigation reordering or mapping rollovers.

Uses a persistent lookup index (docs-{type}-content-dates-{env}) to
preserve timestamps across index rollovers. Content hashing normalizes
whitespace so reformatting doesn't trigger false updates.

Also updates the sitemap to use content_last_updated instead of
last_updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
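
A rough Python illustration of the content-only change tracking described in this commit message. The hash algorithm (SHA-256), the regex-based whitespace collapsing, and both function names are assumptions; the commit only states that hashing normalizes whitespace and that the timestamp advances only when the content changes.

```python
import hashlib
import re
from typing import Optional, Tuple


def content_hash(stripped_body: str) -> str:
    """Hash page content after collapsing whitespace (assumed normalization)."""
    normalized = re.sub(r"\s+", " ", stripped_body).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def resolve_content_last_updated(new_body: str,
                                 previous: Optional[Tuple[str, str]],
                                 now: str) -> Tuple[str, str]:
    """Return (hash, content_last_updated) for a page.

    `previous` is the (hash, timestamp) pair from the lookup, if any.
    The timestamp only advances when the normalized content changes.
    """
    new_hash = content_hash(new_body)
    if previous is not None and previous[0] == new_hash:
        return new_hash, previous[1]  # unchanged content keeps its date
    return new_hash, now  # new or changed content gets the current date
```
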

* Search: Fix JSON002 lint error in ContentDateLookup

Use JsonObject instead of raw string literal for index mapping
to satisfy the dotnet-format JSON002 rule.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Fix thread safety, JSON escaping, and unobserved exception

- Use ConcurrentDictionary for _existing and _changed since Resolve
  is called from Parallel.ForEachAsync
- Use JsonEncodedText.Encode for AOT-safe JSON escaping that handles
  control characters and Unicode
- Use Task.WhenAll to observe both tasks when running lookup load and
  orchestrator start in parallel

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Replace ContentDateLookup with Elasticsearch enrich policy

Move content_last_updated resolution from an in-memory drain-and-compare
approach to an Elasticsearch enrich policy + ingest pipeline. This
eliminates the startup memory overhead of loading the entire lookup index
into a ConcurrentDictionary and removes ~200 lines of hand-rolled PIT
pagination.

The ingest pipeline compares content hashes at index time via an enrich
processor and painless script. After indexing, the lookup index is synced
via reindex from the lexical index, which also implicitly cleans up
orphaned entries for deleted pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Harden ContentDateEnrichment error handling and refresh timing

Throw InvalidOperationException on setup failures (index creation, enrich
policy, pipeline) instead of logging warnings and continuing silently.
This ensures CI fails fast when infrastructure setup is broken rather than
indexing documents without content_last_updated.

Add an explicit index refresh between reindex and enrich policy execution
in SyncLookupIndexAsync so newly reindexed documents are visible when the
policy snapshots the lookup index.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Fix ingest timestamp access in content-date pipeline

Replace ctx._ingest.timestamp (not available in Painless script
processors) with a set processor using Mustache {{{_ingest.timestamp}}}.
The set processor pre-sets content_last_updated to the ingest timestamp,
and the script processor only overwrites it when the enrich lookup finds
a matching content hash.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Throw on failed async task start in ElasticsearchOperations

DeleteByQueryAsync, ReindexAsync, and UpdateByQueryAsync now throw
InvalidOperationException when PostAsyncTaskAsync returns null instead
of silently skipping the poll. These are wait-for-completion methods
where callers expect the operation to have succeeded on return.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enable autocomplete support for ai_questions by adding a SearchAsYouType
completion sub-field with synonym analyzers.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
updated-dependencies:
- dependency-name: MartinCostello.Logging.XUnit.v3
  dependency-version: 0.7.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
updated-dependencies:
- dependency-name: Elastic.Ingest.Elasticsearch
  dependency-version: 0.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…3099)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Changes API now queries and sorts by content_last_updated so that
metadata-only changes (nav reordering, mapping rollovers) no longer
surface in the feed. The API response shape is unchanged — lastUpdated
is still the JSON field name — keeping this non-breaking for consumers.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…3098)

* Search: Use staging index + alias swap for content date lookup sync

Replace the delete-then-reindex flow in ContentDateEnrichment with a
staging index + atomic alias swap to eliminate the window where the
lookup index is empty. If a reindex fails, the previous lookup data
remains intact behind the alias.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Harden staging index lifecycle error handling

Make RefreshIndexAsync throw on failure so SwapAliasAsync never runs
against an unrefreshed staging index. Distinguish 404 from transient
errors in ResolveBackingIndexAsync and fail deterministically when the
alias points at multiple indices. Add a GUID suffix to staging index
names for collision resistance under concurrent runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Search: Include synonym ID in synonym rule values

The synonym dictionary uses the first term as the key and remaining
terms as values. When publishing to Elasticsearch, only the values
were included in the synonym string, omitting the key term itself.
This meant e.g. "ilm" was the rule ID but not part of the synonym
set, so it wouldn't match "index lifecycle management".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Change synonyms from dictionary to flat list

The dictionary structure used the first term as the key and Skip(1) for
values, which artificially separated one synonym from its group. This
caused the key term to be omitted from synonym rules sent to
Elasticsearch.

Replace Dictionary<string, string[]> with IReadOnlyList<string[]> so all
terms in a synonym group are treated equally. The first term is still
used as the ES rule ID for readability but is no longer excluded from
the synonym string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
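
The flat-list behavior can be sketched in a few lines of Python. `build_synonym_rules` is a hypothetical name, and the `id`/`synonyms` dictionary shape is an assumption modeled on Elasticsearch synonym rules, not a copy of the project's code.

```python
from typing import Dict, List, Sequence


def build_synonym_rules(groups: Sequence[Sequence[str]]) -> List[Dict[str, str]]:
    """Turn flat synonym groups into rules.

    The first term is used as the rule ID for readability, but every term
    (including the first) is part of the synonym string.
    """
    return [
        {"id": group[0], "synonyms": ", ".join(group)}
        for group in groups
        if group  # skip empty groups
    ]


rules = build_synonym_rules([["ilm", "index lifecycle management"]])
```

Here `ilm` is both the rule ID and a member of the synonym set, which is what the earlier dictionary-based structure failed to do.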

* Add .superset to gitignore and remove from tracking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Fix F# test to use IReadOnlyList for synonyms

The synonyms type was changed from Dictionary<string, string[]> to
IReadOnlyList<string[]> in 5cea709 but the F# authoring test was
not updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Add missing final newline to SearchConfiguration.cs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The sitemap command was querying the lexical index, which does not
have content_last_updated populated. Switch to the semantic index
where the field is available.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enrich policies can't be updated or deleted while referenced by a
pipeline. Version the policy name with a SHA256 hash of its definition
so re-runs reuse the existing policy, and definition changes create a
new one alongside the old. After the pipeline is updated to reference
the new policy, old policies are cleaned up.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
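
A small Python sketch of the naming scheme this commit describes. The canonical-JSON serialization, the 8-character digest prefix, and both function names are assumptions made for illustration; the commit only says the policy name is versioned with a SHA256 hash of its definition.

```python
import hashlib
import json
from typing import Any, Iterable, List, Mapping


def versioned_policy_name(base_name: str, definition: Mapping[str, Any]) -> str:
    """Derive a policy name that changes only when the definition changes."""
    # Serialize deterministically so key order does not affect the hash.
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    return f"{base_name}-{digest}"


def stale_policies(existing_names: Iterable[str],
                   base_name: str,
                   current_name: str) -> List[str]:
    """List older versions of the policy that can be cleaned up once the
    pipeline references the current one."""
    prefix = base_name + "-"
    return [n for n in existing_names if n.startswith(prefix) and n != current_name]
```

Re-running with an unchanged definition yields the same name, so the existing policy is reused; a changed definition yields a new name that can coexist with the old policy until the pipeline is switched over.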
Made with ❤️️ by updatecli

Co-authored-by: elastic-observability-automation[bot] <180520183+elastic-observability-automation[bot]@users.noreply.github.com>
* Search: Simplify ai_questions prompt for search-friendly output

The current prompt generates overly complex questions that don't match
real user search behavior. Redesign the prompt to produce shorter,
simpler questions (3-10 words) suitable for autocomplete and semantic
search — e.g. "What is agent builder?" instead of "How do I import
external tools using Model Context Protocol?"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Fix contradictory guidance in ai_questions prompt

The prompt said "Avoid specific API names" but then used "What is the
bulk API?" as an example. Remove the API name restriction since we want
questions to reference feature/product names naturally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Add ai_autocomplete_questions field with simplified prompt

Restore the original ai_questions prompt and add a new
ai_autocomplete_questions field with a prompt targeting short, simple
questions (3-10 words) suitable for search bar autocomplete. Includes
lexical mapping with SearchAsYouType completion multi-field and semantic
text mapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Add completion suggest multi-field to ai_questions mappings

Add suggest completion multi-field to both ai_questions and
ai_autocomplete_questions, matching the approach in #3108.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Remove suggest multi-field from ai_questions mapping

Keep the suggest completion field only on ai_autocomplete_questions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Dependencies: Bump Elastic.Ingest.Elasticsearch to 0.41.1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Dependencies: Bump Elastic.Mapping to 0.41.1

Transitive dependency of Elastic.Ingest.Elasticsearch 0.41.1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: Update config/versions.yml apm-agent-go 2.7.7

Made with ❤️️ by updatecli

* chore: Update config/versions.yml terraform-google-edot-cf 0.1.3

Made with ❤️️ by updatecli

---------

Co-authored-by: elastic-observability-automation[bot] <180520183+elastic-observability-automation[bot]@users.noreply.github.com>
Set Assembler stack.next to the upcoming minor for staging validation
after feature freeze (Stack 9.4.0 GA).

Refs elastic/dev#3502

Made-with: Cursor
* Search: Add post-indexing content date resolution via update_by_query

HashedBulkUpdate uses bulk update actions (scripted upserts) which skip
Elasticsearch ingest pipelines, so content_last_updated was never set
during normal indexing. This adds a ResolveContentDatesAsync step that
runs _update_by_query with the enrichment pipeline after indexing
completes, and switches StopAsync to use read aliases instead of the
write target (which is removed after CompleteAsync).

Includes integration tests against a real Elasticsearch container
validating cold-start, date preservation, change detection, and the
bulk-update pipeline gap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Fix lint warnings in content date enrichment tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Felipe Cotti <felipe.cotti@elastic.co>
@cotti cotti merged commit f68d68b into main Apr 15, 2026
36 of 37 checks passed
@cotti cotti deleted the feature/product-features branch April 15, 2026 21:33

7 participants