Conversation
…onger occur, removed test
…attributes 5 by 5
…copy the portal language label into the generic one
…n that caused intermittent errors
Chore: disable index_all_data by default during submission processing
This method was moved to SubmissionMetricsCalculator during a prior refactoring but the original copy was left behind. No callers exist in this repo or dependent projects (ontologies_api, ncbo_cron, ncbo_annotator).
No CSV usage remains in this file after removal of metrics_for_submission. The csv library is still required by ontology_submission.rb and submission_metrics_calculator.rb, where it is needed.
No external callers found in this repo or dependent projects. Keeping the method for now pending further validation.
Verifies that class_count returns -1 gracefully when no metrics exist in the triplestore and no CSV fallback is available.
The inner rescue in metrics_for_submission caught errors, logged a minimal message, and returned nil. This masked the real error — the caller (compute_metrics) would then fail with NoMethodError on nil, and the outer rescue in process_metrics would log that misleading error instead of the root cause. process_metrics already handles errors properly: logs the real exception with full backtrace and sets the METRICS error status. The inner rescue was redundant and harmful.
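A minimal Ruby sketch of the anti-pattern described above; method names are hypothetical, not the real submission-processing API. The inner rescue converts the root cause into a nil that fails far from its origin, while removing it lets the outer handler see the real exception:

```ruby
# Hypothetical illustration of the masking rescue; names are invented
# for this sketch and do not match the real codebase.

def metrics_with_inner_rescue
  raise ArgumentError, 'root cause'  # simulate the real failure
rescue StandardError
  nil  # minimal log omitted; returning nil masks the error
end

def metrics_without_inner_rescue
  raise ArgumentError, 'root cause'  # now propagates to the outer handler
end

def compute_metrics(metrics)
  metrics.fetch(:classes)  # blows up on nil, far from the real failure
end

# With the inner rescue: the outer handler sees a misleading NoMethodError.
begin
  compute_metrics(metrics_with_inner_rescue)
rescue StandardError => e
  puts "outer handler saw: #{e.class}"  # NoMethodError
end

# Without it: the outer handler logs the actual root cause.
begin
  compute_metrics(metrics_without_inner_rescue)
rescue StandardError => e
  puts "outer handler saw: #{e.class}"  # ArgumentError
end
```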
max_depth_fn was reading maxDepth from the CSV file generated by owlapi_wrapper regardless of the flat flag. owlapi_wrapper has no knowledge of BioPortal's flat designation, so it reports the real tree depth. Now we short-circuit and return 0 for flat ontologies before any CSV or SPARQL calculation.
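The shape of the short-circuit can be sketched as follows; this is an illustration only, with a simplified signature, since the real method derives the depth from CSV or SPARQL:

```ruby
# Illustrative sketch: return 0 for flat ontologies before any
# CSV/SPARQL work. The computed_depth parameter stands in for the
# value the real code would read from owlapi_wrapper's metrics.csv.
def max_depth_fn(is_flat, computed_depth)
  return 0 if is_flat  # flat ontologies: no hierarchy to measure
  computed_depth       # otherwise use the CSV/SPARQL-derived depth
end

puts max_depth_fn(true, 7)   # 0
puts max_depth_fn(false, 7)  # 7
```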
class_count was falling back to reading metrics.csv from disk when triplestore metrics were absent. This caused errors on API nodes where the file does not exist or is missing for older submissions. The API should always read metrics from the triplestore. The CSV file should only be consumed during ontology parsing in ncbo_cron.
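A hedged sketch of the behavior after the change, with hypothetical identifiers (the real method lives on the submission model and queries the triplestore via Goo, not a Hash): class_count consults only the triplestore metrics and reports -1 when none exist, with no filesystem fallback.

```ruby
# Hypothetical shape of the triplestore-only lookup; no CSV read path.
def class_count(triplestore_metrics)
  return -1 if triplestore_metrics.nil?  # no metrics: report -1 gracefully
  triplestore_metrics.fetch(:classes, -1)
end

puts class_count(nil)            # -1
puts class_count(classes: 4231)  # 4231
```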
query_groupby_classes was called with rdfsSC=nil for flat ontologies, producing invalid SPARQL (<> predicate). This was silently tolerated by 4store but caused a SPARQL::Client::MalformedQuery error on GraphDB, preventing the metrics status from being set. The groupby_children results were already unused for flat ontologies (the loop body was guarded by `unless is_flat`), so the query was wasteful even when it didn't error. Moved the entire block inside the `unless is_flat` guard.
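The fix can be sketched as below; the method and return shape are illustrative, not the real API. Skipping the block entirely for flat ontologies means a nil rdfsSC is never interpolated into a SPARQL pattern:

```ruby
# Hypothetical sketch of the guard: build the groupby query only for
# non-flat ontologies. With rdfs_sc == nil, the interpolation below
# would produce an invalid `<>` predicate.
def grouped_children(is_flat, rdfs_sc)
  results = {}
  unless is_flat
    # the real code calls query_groupby_classes here
    results[:query] = "?child <#{rdfs_sc}> ?parent"
  end
  results
end

puts grouped_children(true, nil).inspect  # {} — query never built
puts grouped_children(false, 'http://www.w3.org/2000/01/rdf-schema#subClassOf')[:query]
```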
During term indexing, index_doc called retrieve_hierarchy_ids per class, issuing iterative SPARQL queries level-by-level to collect ancestors. For large ontologies (100K+ classes), this produced hundreds of thousands of SPARQL round-trips. Replace with a single paginated SPARQL query to fetch all parent-child edges, then compute the transitive closure in memory using memoized BFS. The precomputed ancestor map is stored as a class-level cache on LinkedData::Models::Class for the duration of bulk indexing and cleared in an ensure block afterward.
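The in-memory closure computation can be sketched as follows, assuming all parent edges have already been fetched by the single paginated query. Class and method names here are illustrative, not the actual LinkedData::Models::Class API:

```ruby
require 'set'

# In-memory sketch of the ancestor precomputation: parents_by_child
# maps a class ID to its direct parents (all edges fetched up front
# in the real code). Results are memoized so shared ancestors are
# expanded once per bulk-indexing run.
class AncestorCache
  def initialize(parents_by_child)
    @parents = parents_by_child
    @memo = {}
  end

  # Full transitive ancestor set of id, via BFS over the edge map.
  # The seen set makes cycles terminate instead of looping forever.
  def ancestors(id)
    @memo[id] ||= begin
      seen = Set.new
      queue = Array(@parents[id])
      until queue.empty?
        node = queue.shift
        next if seen.include?(node)
        seen << node
        if @memo.key?(node)
          seen.merge(@memo[node])  # reuse already-computed ancestors
        else
          queue.concat(Array(@parents[node]))
        end
      end
      seen
    end
  end
end

edges = { 'B' => ['A'], 'C' => ['A'], 'D' => %w[B C] }  # diamond
cache = AncestorCache.new(edges)
puts cache.ancestors('D').sort.join(',')  # A,B,C
```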
Add test_ancestors_precompute.rb covering linear chain, diamond inheritance, multiple roots, cycles, complex DAG, memoization, and edge cases. All tests run purely in memory; no triplestore is required. Add temporary per-class validation in the indexing loop that compares precomputed ancestors against the old retrieve_hierarchy_ids SPARQL traversal for every class, logging warnings on mismatches. To be removed once validated against production data.
Per-class ancestor validation is expensive (runs both old and new for every class). Only enable it when explicitly requested via OP_VALIDATE_ANCESTORS=1 so it does not slow down normal indexing.
When OP_VALIDATE_ANCESTORS=1 is set, log old vs new timing for each class and whether ancestors matched or mismatched. Useful for comparing SPARQL traversal cost against in-memory cache lookup.
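A sketch of the gating and timing log, under the assumption that the hook names below are hypothetical stand-ins (the real code compares retrieve_hierarchy_ids against the precomputed cache inside the indexing loop):

```ruby
require 'benchmark'
require 'stringio'

# Hypothetical validation hook, gated on the env var so normal
# indexing pays no cost.
def validate_ancestors?(env = ENV)
  env['OP_VALIDATE_ANCESTORS'] == '1'
end

def maybe_validate(class_id, sparql_fn, cached_fn, logger, env = ENV)
  return unless validate_ancestors?(env)
  sparql_ancestors = cached_ancestors = nil
  old_t = Benchmark.realtime { sparql_ancestors = sparql_fn.call(class_id) }
  new_t = Benchmark.realtime { cached_ancestors = cached_fn.call(class_id) }
  status = sparql_ancestors == cached_ancestors ? 'match' : 'MISMATCH'
  logger.puts format('%s old=%.4fs new=%.4fs %s', class_id, old_t, new_t, status)
end

log = StringIO.new
env = { 'OP_VALIDATE_ANCESTORS' => '1' }
maybe_validate('C1', ->(_) { %w[A B] }, ->(_) { %w[A B] }, log, env)
puts log.string  # e.g. "C1 old=0.0000s new=0.0000s match"
```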
libxml-ruby v6 removed lib/xml.rb which provided `require 'xml'` and mixed LibXML into the global namespace. Switch to `require 'libxml'` and use the fully qualified LibXML::XML:: namespace.
Per PR review feedback, maxDepth should reflect the actual hierarchy depth regardless of the flat flag. The flat flag is a UI/browsing concern, not a statement about ontology structure.
Clean up metrics: dead code, flat maxDepth fix, remove CSV fallback
Fix missing label retry state leaking across ontology processing
Replace @old_ancestors_result/@new_ancestors_result instance variables with local variables. Rename to sparql_ancestors/cached_ancestors for clarity on what each represents. Addresses mdorf's review feedback on PR #279.
The test was pinning broken behavior (GH-274) where SKOS submissions without skos:Concept wrongly entered the retry path. PR #277 removed the CSV class_count fallback, which indirectly fixed the trigger — the SPARQL-reported class count now agrees with the actual empty result, so total_pages stays 0 and the loop exits cleanly. Rename the test and assert the correct outcome: processing completes, RDF_LABELS is set, ERROR_RDF_LABELS is not, and requested_lang is cleared. Closes GH-274.
Precompute ancestor hierarchy to speed up term indexing
libxml-ruby v6 ships only lib/libxml-ruby.rb as a top-level loadable file. `require 'libxml'` raises LoadError, which broke `require 'ontologies_linked_data'` at load time. Follow-up to 71cb17a, and also fixes pre-existing breakage in parse_diff_file.rb that had the same require.
Overview
This PR merges `develop` into `master` for the v7.0.0 release of ncbo/ontologies_linked_data. This is a major release and includes the substantial synchronization and modernization work recently introduced into `develop`, including the large alignment effort with the AgroPortal codebase and the updates required for compatibility with the AgroPortal-based versions of `goo` and `sparql-client`.

Highlights
This release is primarily driven by the work introduced in:
Key changes include:
`ontologies_linked_data` with the AgroPortal implementation

Prerequisites
This release assumes the AgroPortal-based replacements of the following repositories are already in place:
Notes
This PR is a release merge from `develop` into `master`. Version tagging and any follow-up release steps should be performed after the merge.

Post-merge
Tag `master` as: