
Develop -> Master merge, release v7.0.0 #282

Merged
mdorf merged 113 commits into master from develop
Apr 21, 2026

Conversation

mdorf (Member) commented Apr 21, 2026

Overview

This PR merges develop into master for the v7.0.0 release of ncbo/ontologies_linked_data.

This is a major release and includes the substantial synchronization and modernization work recently introduced into develop, including the large alignment effort with the AgroPortal codebase and the updates required for compatibility with the AgroPortal-based versions of goo and sparql-client.

Highlights

This release is primarily driven by the work introduced in:

Key changes include:

  • major synchronization of ontologies_linked_data with the AgroPortal implementation
  • support for schemaless Solr enabled by the updated GOO architecture
  • expanded triple-store backend compatibility, including Virtuoso and GraphDB
  • substantial refactoring of ontology submission processing
  • dynamic Solr schema support across several models
  • OAuth authentication and authorization improvements
  • infrastructure modernization, including Ruby 3.2, newer Minitest, and updated ActiveSupport compatibility
  • OntoPortal testkit integration and related CI/testing updates
  • BioPortal-specific improvements such as updated label generation logic

Prerequisites

This release assumes the AgroPortal-based replacements of the following repositories are already in place:

Notes

This PR is a release merge from develop into master. Version tagging and any follow-up release steps should be performed after the merge.

Post-merge

Tag master as:

  • v7.0.0

mdorf added 30 commits August 6, 2025 10:31
…copy the portal language label into the generic one
alexskr and others added 29 commits April 8, 2026 09:07
Chore: disable index_all_data by default during submission processing
This method was moved to SubmissionMetricsCalculator during a prior
refactoring but the original copy was left behind. No callers exist
in this repo or dependent projects (ontologies_api, ncbo_cron,
ncbo_annotator).
No CSV usage remains in this file after removal of metrics_for_submission.
The csv library is still required by ontology_submission.rb and
submission_metrics_calculator.rb where it is needed.
No external callers found in this repo or dependent projects.
Keeping the method for now pending further validation.
Verifies that class_count returns -1 gracefully when no metrics
exist in the triplestore and no CSV fallback is available.
The inner rescue in metrics_for_submission caught errors, logged a
minimal message, and returned nil. This masked the real error —
the caller (compute_metrics) would then fail with NoMethodError on
nil, and the outer rescue in process_metrics would log that
misleading error instead of the root cause.

process_metrics already handles errors properly: logs the real
exception with full backtrace and sets the METRICS error status.
The inner rescue was redundant and harmful.
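A minimal sketch of the failure mode described above. The class and method names here are illustrative stand-ins, not the actual ontologies_linked_data code:

```ruby
require 'logger'

# Hypothetical stand-in for the submission metrics flow.
class MetricsDemo
  def initialize(logger)
    @logger = logger
  end

  # Before: the inner rescue swallowed the root cause and returned nil,
  # so the caller later crashed with NoMethodError on nil.
  def metrics_for_submission_masked
    raise 'triplestore unreachable' # the real error
  rescue StandardError
    @logger.error('Unable to compute metrics') # minimal, misleading
    nil
  end

  # After: let the exception propagate to the outer handler.
  def metrics_for_submission
    raise 'triplestore unreachable'
  end

  # Outer handler (process_metrics in the real code): logs the real
  # exception with backtrace and sets the error status.
  def process_metrics
    metrics = metrics_for_submission
    metrics.size
  rescue StandardError => e
    @logger.error("#{e.class}: #{e.message}\n#{e.backtrace&.join("\n")}")
    :error_status_set # stands in for setting the METRICS error status
  end
end
```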
max_depth_fn was reading maxDepth from the CSV file generated by
owlapi_wrapper regardless of the flat flag. owlapi_wrapper has no
knowledge of BioPortal's flat designation, so it reports the real
tree depth. Now we short-circuit and return 0 for flat ontologies
before any CSV or SPARQL calculation.
class_count was falling back to reading metrics.csv from disk when
triplestore metrics were absent. This caused errors on API nodes
where the file does not exist or is missing for older submissions.
The API should always read metrics from the triplestore. The CSV
file should only be consumed during ontology parsing in ncbo_cron.
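The triplestore-only behavior might be sketched as follows; the method shape and `-1` sentinel follow the description above, but the signature is hypothetical:

```ruby
# Sketch: class_count reads only the triplestore metrics object and
# returns -1 when it is absent. No metrics.csv fallback on API nodes.
def class_count(metrics)
  return -1 if metrics.nil?   # no metrics in the triplestore
  metrics.fetch(:classes, -1) # never fall back to reading metrics.csv
end
```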
query_groupby_classes was called with rdfsSC=nil for flat ontologies,
producing invalid SPARQL (<> predicate). This was silently tolerated
by 4store but caused a SPARQL::Client::MalformedQuery error on
GraphDB, preventing the metrics status from being set.

The groupby_children results were already unused for flat ontologies
(the loop body was guarded by `unless is_flat`), so the query was
wasteful even when it didn't error. Moved the entire block inside
the `unless is_flat` guard.
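The guard change can be illustrated like this; `children_counts` and `query_groupby_classes` are hypothetical names standing in for the real metrics code:

```ruby
# Illustrative guard: skip the children-count query entirely for flat
# ontologies instead of issuing it with a nil predicate, which produced
# invalid SPARQL (<> predicate) and broke on GraphDB.
def children_counts(is_flat, rdfs_subclass)
  return {} if is_flat || rdfs_subclass.nil? # results were unused anyway
  query_groupby_classes(rdfs_subclass)
end

# Stand-in for the real SPARQL GROUP BY query helper.
def query_groupby_classes(_predicate)
  { 'http://example.org/A' => 3 }
end
```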
During term indexing, index_doc called retrieve_hierarchy_ids per class,
issuing iterative SPARQL queries level-by-level to collect ancestors.
For large ontologies (100K+ classes), this produced hundreds of thousands
of SPARQL round-trips.

Replace with a single paginated SPARQL query to fetch all parent-child
edges, then compute the transitive closure in memory using memoized BFS.
The precomputed ancestor map is stored as a class-level cache on
LinkedData::Models::Class for the duration of bulk indexing and cleared
in an ensure block afterward.
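The in-memory closure step might look like the sketch below, assuming the child-to-parents edge map has already been fetched by the single paginated SPARQL query; the class name is illustrative, not the actual cache on LinkedData::Models::Class:

```ruby
require 'set'

# Sketch: given a child => [parents] edge map, compute each class's
# full ancestor set with memoization. Handles shared ancestors
# (diamond inheritance) and guards against cycles.
class AncestorCache
  def initialize(parents_by_child)
    @parents = parents_by_child
    @memo = {}
  end

  def ancestors(id, visiting = {})
    return @memo[id] if @memo.key?(id)
    return Set.new if visiting[id] # cycle guard: stop re-entrant walks

    visiting[id] = true
    result = Set.new
    (@parents[id] || []).each do |parent|
      result << parent
      result.merge(ancestors(parent, visiting))
    end
    visiting.delete(id)
    @memo[id] = result
  end
end
```

Memoization makes each edge contribute to the closure once, so bulk indexing does one SPARQL fetch plus in-memory lookups instead of per-class traversals.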
Add test_ancestors_precompute.rb covering linear chain, diamond
inheritance, multiple roots, cycles, complex DAG, memoization, and
edge cases. All tests are pure in-memory, no triplestore required.

Add temporary per-class validation in the indexing loop that compares
precomputed ancestors against the old retrieve_hierarchy_ids SPARQL
traversal for every class. Logs warnings on mismatches. To be removed
once validated against production data.
Per-class ancestor validation is expensive (runs both old and new
for every class). Only enable it when explicitly requested via
OP_VALIDATE_ANCESTORS=1 so it does not slow down normal indexing.
When OP_VALIDATE_ANCESTORS=1 is set, log old vs new timing for each
class and whether ancestors matched or mismatched. Useful for comparing
SPARQL traversal cost against in-memory cache lookup.
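The opt-in gate could be sketched as below; `index_class` and `slow_sparql_ancestors` are hypothetical stand-ins for the indexing loop and the old retrieve_hierarchy_ids traversal:

```ruby
# Sketch: the expensive old-vs-new comparison only runs when the
# OP_VALIDATE_ANCESTORS=1 environment variable is set.
def validate_ancestors?
  ENV['OP_VALIDATE_ANCESTORS'] == '1'
end

def index_class(cls_id, ancestor_cache)
  cached = ancestor_cache.fetch(cls_id, [])
  if validate_ancestors?
    sparql = slow_sparql_ancestors(cls_id) # old per-class traversal
    warn "ancestor mismatch for #{cls_id}" unless sparql.sort == cached.sort
  end
  cached
end

# Stand-in for the old level-by-level SPARQL walk.
def slow_sparql_ancestors(_cls_id)
  []
end
```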
libxml-ruby v6 removed lib/xml.rb which provided `require 'xml'` and
mixed LibXML into the global namespace. Switch to `require 'libxml'`
and use the fully qualified LibXML::XML:: namespace.
Per PR review feedback, maxDepth should reflect the actual hierarchy
depth regardless of the flat flag. The flat flag is a UI/browsing
concern, not a statement about ontology structure.
Clean up metrics: dead code, flat maxDepth fix, remove CSV fallback
Fix missing label retry state leaking across ontology processing
Replace @old_ancestors_result/@new_ancestors_result instance variables
with local variables. Rename to sparql_ancestors/cached_ancestors for
clarity on what each represents.

Addresses mdorf's review feedback on PR #279.
The test was pinning broken behavior (GH-274) where SKOS submissions
without skos:Concept wrongly entered the retry path. PR #277 removed
the CSV class_count fallback, which indirectly fixed the trigger — the
SPARQL-reported class count now agrees with the actual empty result,
so total_pages stays 0 and the loop exits cleanly.

Rename the test and assert the correct outcome: processing completes,
RDF_LABELS is set, ERROR_RDF_LABELS is not, and requested_lang is
cleared. Closes GH-274.
Precompute ancestor hierarchy to speed up term indexing
  libxml-ruby v6 ships only lib/libxml-ruby.rb as a top-level
  loadable file. `require 'libxml'` raises LoadError, which broke
  `require 'ontologies_linked_data'` at load time.

  Follow-up to 71cb17a, and also fixes pre-existing breakage in
  parse_diff_file.rb that had the same require.
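The actual fix in these commits is a direct require change, but the version-dependent entry points could also be handled with a generic fallback helper like this sketch (`require_first` is hypothetical, not part of the codebase):

```ruby
# Sketch: try each candidate entry point in order until one loads,
# since different libxml-ruby versions ship different top-level files.
def require_first(*names)
  names.each do |name|
    begin
      require name
      return name
    rescue LoadError
      next
    end
  end
  raise LoadError, "could not load any of: #{names.join(', ')}"
end
```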
mdorf merged commit bcd3dd1 into master Apr 21, 2026
10 checks passed