
Fix EBA-DEC-002 false positives on integer metrics with pure unit #111

Open
magnumog wants to merge 1 commit into Meaningful-Data:main from
magnumog:fix/eba-dec-002-integer-false-positives

magnumog commented May 1, 2026

Description

Fixes false-positive EBA-DEC-002 findings raised on integer-typed metrics that use the pure unit (e.g. COREP "number of exposures", ALM counters). In the XBRL-XML path, every such fact is currently flagged as a percentage with insufficient decimals, producing hundreds to thousands of spurious errors on real COREP / FINREP / ALM filings. The spurious errors also block the convert pipeline when --validate / --eba is enabled, preventing XBRL-XML → XBRL-CSV conversion of otherwise-valid filings.

Root cause: Fact.metric is stored in Clark notation ({http://www.eba.europa.eu/xbrl/crr/dict/met}qAZH) as returned by lxml, but _build_metric_type_map() keys the lookup on prefix notation (eba_met:qAZH) taken from the module. The two forms never match, so every lookup misses and every fact falls through to _infer_type_from_unit(), which classifies any xbrli:pure fact as a percentage, integer counters included.
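
The mismatch can be reproduced in isolation. The sketch below is not xbridge code: NSMAP and clark_to_prefixed are hypothetical stand-ins for whatever _normalize_metric_value() does internally, and the type map holds a single illustrative entry.

```python
# Illustrative only: NSMAP and clark_to_prefixed are hypothetical stand-ins,
# not the actual xbridge helpers.
NSMAP = {"http://www.eba.europa.eu/xbrl/crr/dict/met": "eba_met"}


def clark_to_prefixed(name: str, nsmap: dict[str, str]) -> str:
    """Convert '{namespace}local' to 'prefix:local'; pass other forms through."""
    if not name.startswith("{"):
        return name
    namespace, _, local = name[1:].partition("}")
    prefix = nsmap.get(namespace)
    return f"{prefix}:{local}" if prefix else name


# Type map keyed on prefix notation, as the module provides it.
type_map = {"eba_met:qAZH": "integer"}

clark = "{http://www.eba.europa.eu/xbrl/crr/dict/met}qAZH"

print(type_map.get(clark))                            # None: the keys never match
print(type_map.get(clark_to_prefixed(clark, NSMAP)))  # integer
```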

The fix adds a Fact.metric_qname property that exposes the metric in the same prefix form the module uses, and updates the four XML decimals rules to look up that property. For EBA-DEC-003 the unit-based fallback has been removed entirely: integer classification is taxonomy-driven or it does not happen.

Conformance to the EBA Filing Rules

Cross-checked against EBA Filing Rules v5.7 (24 November 2025), §2.18 "Interpretation of the decimals setting". The Accuracy Requirements table on p. 33 defines the allowed @decimals values by the metric's Data Type, not by its unit:

| Data Type | Decimals setting | Representation |
|---|---|---|
| Monetary | >= -3 (or >= -4, >= -6 for FP/ESG/P3/REM) | 42563.26 |
| Percentage | >= 4 | 0.1234 (= 12.34 %) |
| Integer | 0 | 126 |

Footnote 17 to the same rule is explicit that the classification is taxonomy-driven, not unit-driven:

N.B. Also applies to facts representing monetary values that are specified (via their primary item) to be reported as currency-less decimal values.

Since both Percentage and Integer metrics use xbrli:pure as their unit, the unit carries no information that can distinguish them — the validator must consult the taxonomy's primary-item classification. That is exactly what this patch now does: _build_metric_type_map() is derived from the module's Variable._attributes ($decimalsMonetary / $decimalsPercentage / $decimalsInteger / $decimalsDecimals), and each fact is resolved via its prefix-form QName. The previous code silently failed this lookup for every fact and then guessed from the unit — which is not a mechanism sanctioned anywhere in §2.18.
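
To make the consequence of the classification concrete, here is a minimal sketch of a per-data-type decimals check using the thresholds from the Accuracy Requirements table quoted above. The function name, the lowercase type labels, and the decision to ignore the module-specific monetary relaxations and non-numeric @decimals values are assumptions of this sketch, not the validator's actual code.

```python
# Thresholds copied from the §2.18 table; the module-specific monetary
# relaxations (-4 / -6) are deliberately left out of this sketch.
MIN_DECIMALS = {"monetary": -3, "percentage": 4}


def decimals_ok(data_type: str, decimals: int) -> bool:
    """Return True when `decimals` satisfies the table for `data_type`."""
    if data_type == "integer":
        return decimals == 0
    minimum = MIN_DECIMALS.get(data_type)
    # Data types not covered by the sketch are not judged.
    return True if minimum is None else decimals >= minimum


# An integer counter reported with decimals="0" on a pure unit:
print(decimals_ok("integer", 0))     # True: judged as what it is
print(decimals_ok("percentage", 0))  # False: judged as a percentage
```

The same fact passes or fails depending solely on which data type it is assigned, which is why the lookup has to succeed before any threshold is applied.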

Source: eba_filing_rules_v5.7_2025_11_24.pdf, §2.18 "Interpretation of the decimals setting", pp. 31–34.

Real-world impact

On production COREP ALM filings (EBA Framework 4.2, submission date 2026-03-31), this bug caused several hundred spurious EBA-DEC-002 findings per instance, concentrated on integer counters such as qAZH, qCCG, and qDGB. Every one of these facts was correctly reported with decimals="0" on a pure unit, in conformance with §2.18, but xbridge misclassified them as percentages and demanded decimals >= 4.

Downstream consequence: any XBRL-XML → XBRL-CSV conversion invoked with validation enabled (convert --validate / convert --eba, or the equivalent programmatic convert_instance(..., eba=True)) aborted with ValidationError before producing the CSV report package, even though the input instance was a valid EBA filing. After this fix, the same instances convert cleanly and the post-conversion CSV output is unaffected (no CSV-side rules were modified).

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Dependency update
  • Other (please describe):

Related Issues

Closes #
Related to #

Changes Made

  • src/xbridge/instance.py — added Fact.metric_qname property that returns the fact's metric normalised to prefix form (eba_met:qAZH) via the existing _normalize_metric_value() helper. The raw Fact.metric (Clark notation) is left unchanged for backward compatibility. Result is cached on the instance and invalidated on re-parse().
  • src/xbridge/validation/rules/eba_decimals.py
    • Updated the docstring of _build_metric_type_map() to state that its keys are the prefix form used by the module.
    • Introduced _lookup_metric_type() as the single resolution helper used by DEC-001 (monetary) and DEC-002 (percentage). Lookup order: module type_map via Fact.metric_qname, then unit-based inference as a fallback when no module is loaded.
    • Added a _logger.debug diagnostic when a module is loaded but a metric is not present in the type_map — surfaces data-quality / taxonomy-mismatch issues without spamming normal runs.
    • check_integer_decimals_xml now uses the taxonomy type_map only — the unit-based fallback that caused the false positives has been removed. Integer classification without a module is a no-op (matching the original intent of EBA-DEC-003).
    • check_realistic_decimals_xml switched to metric_qname for consistent finding context.
  • tests/test_eba_decimals.py — added 12 regression tests across five new classes:
    • TestEBADEC002IntegerMetricRegression — the exact failure mode (integer metric + pure unit + decimals="0" / "4"): must not trigger DEC-002 and, with a module loaded, triggers DEC-003 only when appropriate.
    • TestEBADEC002PercentageMetricWithModule — genuine percentage metrics still trigger DEC-002 when a module is loaded (confirms the prefix path works end-to-end, not just that Clark matches fail).
    • TestEBADEC001MonetaryMetricWithModule — monetary metrics still trigger DEC-001.
    • TestFactMetricNormalisation — verifies Fact.metric remains Clark-form, Fact.metric_qname is prefix-form and is cached per-fact.
    • TestLookupMetricTypeFallbackLogging — confirms the debug log is emitted when (and only when) a module is loaded and a metric is missing from it.
  • CHANGELOG.md — entry added under ## [Unreleased] → ### Fixed.
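
For orientation, the lookup order described for _lookup_metric_type() can be sketched roughly as follows. This is a simplified illustration, not the code in the patch: the signatures are invented, and the choice to fall through to unit inference after logging a missing metric is an assumption of the sketch.

```python
import logging
from typing import Optional

_logger = logging.getLogger(__name__)


def infer_type_from_unit(unit: Optional[str]) -> Optional[str]:
    """Unit-based guess: currency units are monetary, pure is percentage."""
    if unit is None:
        return None
    if unit.startswith("iso4217:"):
        return "monetary"
    if unit == "xbrli:pure":
        return "percentage"
    return None


def lookup_metric_type(
    metric_qname: str, unit: Optional[str], type_map: Optional[dict]
) -> Optional[str]:
    """Resolve a data type: taxonomy map first, unit inference second."""
    if type_map:
        metric_type = type_map.get(metric_qname)
        if metric_type is not None:
            return metric_type
        # A module is loaded but does not know this metric.
        _logger.debug("metric %s not found in module type_map", metric_qname)
    # Assumption of this sketch: fall back to the unit in both remaining cases.
    return infer_type_from_unit(unit)


type_map = {"eba_met:qAZH": "integer"}
print(lookup_metric_type("eba_met:qAZH", "xbrli:pure", type_map))  # integer
print(lookup_metric_type("eba_met:qAZH", "xbrli:pure", None))      # percentage
```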

Testing

Tests Added

  • Unit tests
  • Integration tests
  • Test coverage maintained or improved

Testing Performed

# Full suite
python -m pytest tests/ -q
# Targeted rule suite (includes 12 new regressions)
python -m pytest tests/test_eba_decimals.py -v
# Style/type gates
python -m ruff check src/xbridge/instance.py src/xbridge/validation/rules/eba_decimals.py tests/test_eba_decimals.py
python -m ruff format --check src/xbridge/instance.py src/xbridge/validation/rules/eba_decimals.py tests/test_eba_decimals.py

Test results:

  • All existing tests pass — 962 passed in the full suite.
  • New tests pass — 65 passed in test_eba_decimals.py (53 pre-existing + 12 new).

Documentation

  • Updated docstrings — _build_metric_type_map, new metric_qname property, new _lookup_metric_type helper, and check_integer_decimals_xml explain the prefix-keyed lookup and why the integer rule does not fall back to units.
  • Updated README.md
  • Updated documentation in docs/ — no user-facing API change (rule codes and semantics unchanged).
  • Updated CHANGELOG.md (added entry under "Unreleased")
  • No documentation needed for this change

Code Quality

  • Code follows the project's style guidelines (Ruff)
  • Ran ruff check and ruff format — clean.
  • Ran mypy type checking — no new errors introduced (pre-existing missing-stubs warnings for pandas / lxml unchanged).
  • Self-review of code completed
  • Comments added for complex/non-obvious code — especially around the intentional removal of the integer-rule unit fallback.
  • No new warnings generated

Breaking Changes

None.

Impact: Fact.metric retains its existing Clark-notation value; no callers are forced to migrate. Fact.metric_qname is a new additive property.

Migration guide: N/A.

Screenshots (if applicable)

N/A — validator-only change.

Checklist

  • My code follows the project's code style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published
  • I have updated the CHANGELOG.md

Additional Notes

Why not normalise Fact.metric itself to prefix form?
Doing so would silently change the value of a public attribute that callers (and downstream projects) may rely on — a behavioural breaking change with no visible signal. Adding a new metric_qname property is purely additive and lets callers opt in.

Why remove the unit-based fallback only in the integer rule?
For monetary and percentage facts, the unit (iso4217:*, xbrli:pure) is an adequate correctness signal even in the absence of a module — a monetary fact always has a currency unit and a percentage fact almost always has pure. No such signal exists for integer counters: they share the pure unit with percentages, which is exactly the ambiguity that produced this bug. Falling back to units there is guessing, and the guess is wrong for every integer metric. This is also consistent with EBA Filing Rules §2.18 footnote 17, which ties classification to the primary item (taxonomy data type), not the unit.
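
A minimal sketch of what "taxonomy-driven or it does not happen" means for the integer rule. The function names and the finding format are illustrative, not the actual check_integer_decimals_xml implementation.

```python
from typing import Optional


def is_integer_metric(metric_qname: str, type_map: Optional[dict]) -> bool:
    """Taxonomy-only classification: no module or no entry means 'not integer'."""
    if not type_map:
        return False
    return type_map.get(metric_qname) == "integer"


def check_integer_decimals(
    metric_qname: str, decimals: str, type_map: Optional[dict]
) -> Optional[str]:
    """Return a finding message, or None when the rule passes or does not apply."""
    if not is_integer_metric(metric_qname, type_map):
        return None  # no taxonomy classification, so the rule is a no-op
    if decimals != "0":
        return f"{metric_qname}: integer metric with decimals={decimals!r}, expected '0'"
    return None


type_map = {"eba_met:qAZH": "integer"}
print(check_integer_decimals("eba_met:qAZH", "0", type_map))  # None: conforms
print(check_integer_decimals("eba_met:qAZH", "4", type_map))  # finding message
print(check_integer_decimals("eba_met:qAZH", "4", None))      # None: no module loaded
```

Note that the unit never appears in the signature: nothing about xbrli:pure is consulted when deciding whether a metric is an integer.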

Reviewer Notes

Areas to focus on:

  • _lookup_metric_type() in eba_decimals.py: the single resolution helper for DEC-001 and DEC-002 (DEC-003 reads the type_map directly, with no fallback); confirm the fallback ordering matches your expectations.
  • Removal of the unit-based fallback in check_integer_decimals_xml — this is the one semantics change in the patch. Integer classification is now strictly taxonomy-driven.
  • Fact.metric_qname cache invalidation in Fact.parse() — ensures re-parse (if it ever happens) produces fresh values.
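
For reviewers looking at the cache invalidation, the pattern is roughly the one below. FactSketch is a hypothetical, heavily simplified stand-in for Fact: the real parse() works on an lxml element rather than a string, and the namespace table here is invented for the example.

```python
from typing import Optional


class FactSketch:
    """Hypothetical, simplified stand-in for xbridge's Fact."""

    NSMAP = {"http://www.eba.europa.eu/xbrl/crr/dict/met": "eba_met"}

    def __init__(self, raw_metric: str) -> None:
        self._metric_qname: Optional[str] = None
        self.parse(raw_metric)

    def parse(self, raw_metric: str) -> None:
        self.metric = raw_metric    # Clark notation, left untouched
        self._metric_qname = None   # drop any cached prefix form

    @property
    def metric_qname(self) -> str:
        """Prefix-form metric, computed on first access and then cached."""
        if self._metric_qname is None:
            namespace, _, local = self.metric[1:].partition("}")
            prefix = self.NSMAP.get(namespace)
            self._metric_qname = f"{prefix}:{local}" if prefix else self.metric
        return self._metric_qname


fact = FactSketch("{http://www.eba.europa.eu/xbrl/crr/dict/met}qAZH")
print(fact.metric_qname)  # eba_met:qAZH
fact.parse("{http://www.eba.europa.eu/xbrl/crr/dict/met}qCCG")
print(fact.metric_qname)  # eba_met:qCCG
```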

Questions for reviewers:

  • Are there any existing callers of Fact.metric outside the decimals rules that would benefit from migrating to metric_qname in a follow-up?
  • Do you want the missing-metric diagnostic promoted from DEBUG to INFO / a dedicated warning rule? Keeping it at DEBUG for now to avoid noise on filings that intentionally use out-of-module metrics.

Fact.metric is stored in Clark notation ({namespace}localname), but
_build_metric_type_map() was keyed on prefix notation (eba_met:qXYZ)
taken from the module. Every lookup missed, and the validator fell
back to unit-based inference which classifies every pure-unit fact as
a percentage — including integer counters such as qAZH, qCCG, qDGB.

- Add Fact.metric_qname property that exposes the metric in the same
  prefix form the module uses (via the existing _normalize_metric_value
  helper). Fact.metric is left unchanged for backward compatibility.
- Update DEC-001, DEC-002 and the realistic rule to use metric_qname.
- Update DEC-003 to use metric_qname AND remove the unit-based fallback:
  integer classification is taxonomy-driven or it does not happen. The
  unit (xbrli:pure) carries no signal that distinguishes integer from
  percentage metrics, and EBA Filing Rules §2.18 ties classification to
  the primary item, not the unit.
- Add debug log when a module is loaded but the metric is not in its
  type_map — surfaces data-quality / taxonomy-mismatch issues.
- Add 12 regression tests across 5 classes covering the failure mode,
  the positive paths (percentage + monetary still flagged correctly),
  metric_qname normalisation, and fallback logging.
- CHANGELOG entry under [Unreleased] → Fixed.

Tests: 962 passed in full suite; 65 passed in tests/test_eba_decimals.py
(53 pre-existing + 12 new). ruff check clean.