Feature: Rule-based Fact Engine #59
Open
izzet wants to merge 9 commits into
added 8 commits
June 24, 2026 11:39
…+ rules
Leaf additions for the facts feature, re-ported onto current main:
- types.py: AnalysisFact / FactWindow / FactScope / FactSeverity / FactProvenance / FactEnvelope(+Context) (analyzer.fact-envelope.v1), appended after Views; main's Output*/AnalysisResult types untouched.
- scoring.py: continuous slope-severity (normalize_slope) re-centered on the proportional baseline.
- configs/schemas/analyzer.fact-envelope.v1.schema.json + configs/fact_rules/*.yaml.
- meson.build packages scoring.py + the rule/schema data.
Nothing is wired into the analyzer yet, so view/HLM output is structurally identical to main (verified). Leaf unit tests pass (scoring + fact dataclass roundtrip).
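The fact dataclass roundtrip that the leaf tests exercise can be pictured with a minimal sketch. The class names AnalysisFact, FactSeverity, and FactEnvelope and the schema tag analyzer.fact-envelope.v1 come from the commit message; every field, the severity levels, and the to_json/from_json helpers are assumptions for illustration, not the contents of types.py.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List

SCHEMA = "analyzer.fact-envelope.v1"


class FactSeverity(str, Enum):
    # Hypothetical severity levels; the real enum may differ.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class AnalysisFact:
    # Hypothetical field set, chosen only to show the roundtrip.
    name: str
    source_view: str
    severity: FactSeverity
    value: float


@dataclass
class FactEnvelope:
    schema: str = SCHEMA
    facts: List[AnalysisFact] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses; the str-based enum
        # serializes as its plain string value.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "FactEnvelope":
        raw = json.loads(text)
        if raw.get("schema") != SCHEMA:
            raise ValueError(f"unexpected schema {raw.get('schema')!r}")
        facts = [
            AnalysisFact(
                name=f["name"],
                source_view=f["source_view"],
                severity=FactSeverity(f["severity"]),  # rebuild the enum member
                value=f["value"],
            )
            for f in raw["facts"]
        ]
        return cls(schema=raw["schema"], facts=facts)
```

Using a str-based enum keeps the JSON human-readable while still letting from_json reject unknown severity strings.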
fact_engine.py (FactEngine rule builder, MetricFactBuilder slope builder, FactEmitter aggregate/detail + TEMPORAL_VIEW_TYPES, FactPipeline) and fact_rules.py (rule compile + validation). Self-contained on types/scoring/fact_rules and driven by synthetic flat_views; not wired into the analyzer, so view/HLM output stays structurally identical to main (verified). Tests: test_fact_engine / test_fact_rules / test_aggregate_facts + test_metric_facts (22 pass; 2 pipeline tests deferred to stage 3 FactsConfig).
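A rule builder that compiles and validates rules up front, then evaluates them over flat views, might look like the sketch below. The rule keys (name, source_view, metric, op, threshold), the compile_rule and RuleError names, and the metric name used in the usage are hypothetical; the actual configs/fact_rules/*.yaml schema and the fact_rules.py API may differ.

```python
import operator
from typing import Callable, Dict, List

# Hypothetical rule schema; not the real fact_rules format.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
_REQUIRED = ("name", "source_view", "metric", "op", "threshold")

FlatViews = Dict[str, List[Dict]]


class RuleError(ValueError):
    """Raised when a rule fails validation at compile time."""


def compile_rule(rule: Dict) -> Callable[[FlatViews], List[Dict]]:
    # Validate once, so a malformed rule file fails at load time.
    missing = [k for k in _REQUIRED if k not in rule]
    if missing:
        raise RuleError(f"rule {rule.get('name', '?')!r} missing keys: {missing}")
    if rule["op"] not in _OPS:
        raise RuleError(f"rule {rule['name']!r} has unknown op {rule['op']!r}")
    cmp = _OPS[rule["op"]]
    threshold = float(rule["threshold"])
    view, metric = rule["source_view"], rule["metric"]

    def evaluate(flat_views: FlatViews) -> List[Dict]:
        # Emit one fact per row of the source view that satisfies the rule.
        return [
            {"fact": rule["name"], "view": view, "row": i, "value": row[metric]}
            for i, row in enumerate(flat_views.get(view, []))
            if metric in row and cmp(row[metric], threshold)
        ]

    return evaluate
```

Evaluating against a synthetic flat view, in the spirit of the commit's tests:

```python
views = {"epoch": [{"epoch": 1, "fetch_data_frac": 0.7}, {"epoch": 2, "fetch_data_frac": 0.2}]}
rule = {"name": "fetch_pressure", "source_view": "epoch",
        "metric": "fetch_data_frac", "op": ">", "threshold": 0.5}
facts = compile_rule(rule)(views)  # one fact, for epoch 1
```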
Add FactsConfig (enabled / eval_mode / eval_rule_file / emit_flat_views / emit_mode / strict_time_semantics / allow_mixed_time_aggregates) and a `facts` field on Config. Additive: the analyzer does not consume cfg.facts yet (stage 4 wires it), so Hydra composition and view/HLM output are unchanged (verified). Unblocks the 2 deferred FactPipeline.from_facts_config tests; the full fact-engine suite now has 27 passing tests.
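The shape of the new config can be sketched as a plain dataclass. The seven field names come from the commit message, and enabled=False matches the statement that facts are disabled by default. Everything else is an assumption: the field types, the remaining defaults, the allowed eval_mode values (inferred from the rule vs. metric builders), the allowed emit_mode values (inferred from FactEmitter aggregate/detail), and the __post_init__ validation. This is not the code in config.py.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed value sets, for illustration only.
_EVAL_MODES = {"rule", "metric"}
_EMIT_MODES = {"aggregate", "detail"}


@dataclass
class FactsConfig:
    enabled: bool = False                 # opt-in: facts are off by default
    eval_mode: str = "rule"               # assumed default
    eval_rule_file: Optional[str] = None  # path to a fact_rules YAML file
    emit_flat_views: bool = True          # assumed default
    emit_mode: str = "aggregate"          # assumed default
    strict_time_semantics: bool = False   # assumed default
    allow_mixed_time_aggregates: bool = False  # assumed default

    def __post_init__(self) -> None:
        # Reject unknown modes early instead of failing mid-analysis.
        if self.eval_mode not in _EVAL_MODES:
            raise ValueError(f"eval_mode must be one of {sorted(_EVAL_MODES)}, got {self.eval_mode!r}")
        if self.emit_mode not in _EMIT_MODES:
            raise ValueError(f"emit_mode must be one of {sorted(_EMIT_MODES)}, got {self.emit_mode!r}")


@dataclass
class Config:
    # Only the new field is shown; the real Config has many more.
    facts: FactsConfig = field(default_factory=FactsConfig)
```

Because the field uses default_factory, every Config gets its own disabled FactsConfig, which is what keeps the change additive.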
…s-off-safe)
Analyzer gains facts_config + fact_pipeline (built only when facts.enabled) and the _build_facts_config / _evaluate_analysis_facts / _materialize_output_artifacts methods; _analyze_hlm evaluates facts over the flat views and gates them by emit_flat_views. AnalysisResult gains analysis_facts + get_analysis_facts / iter_analysis_facts / to_fact_envelope. With facts disabled (the default), the pipeline is None and flat views pass through untouched, so view/HLM output is structurally identical to main (verified on the dlio trace: file_name 48x1794, proc_name 6x1782). The facts-on path (reader window column + output envelope) follows in 4b/4c.
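The facts-off guarantee comes from building the pipeline only when facts are enabled. A minimal sketch of that gating, assuming emit_flat_views decides whether flat views stay in the output (the commit's "gates them by emit_flat_views" could also be read differently). FactsSwitch and run_facts_stage are hypothetical names, not the Analyzer's _evaluate_analysis_facts.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

FlatViews = Dict[str, Any]
Pipeline = Callable[[FlatViews], List[Any]]


@dataclass
class FactsSwitch:
    # Hypothetical stand-in carrying only the two flags this sketch needs.
    enabled: bool = False
    emit_flat_views: bool = True


def run_facts_stage(
    flat_views: FlatViews,
    cfg: FactsSwitch,
    build_pipeline: Callable[[FactsSwitch], Pipeline],
) -> Tuple[FlatViews, List[Any]]:
    # The pipeline is constructed only when facts are enabled.
    pipeline = build_pipeline(cfg) if cfg.enabled else None
    if pipeline is None:
        # Facts off: return the very same object, so output is unchanged.
        return flat_views, []
    facts = pipeline(flat_views)
    # Assumption: emit_flat_views controls whether flat views are kept.
    return (flat_views if cfg.emit_flat_views else {}), facts
```

Returning the same object on the facts-off path (rather than a copy) is the cheapest way to guarantee structurally identical output.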
…=file)
Wire facts_config from Hydra into the analyzer (init_with_hydra passes hydra_config.facts) and add FileOutput (output=file), which writes the offline bundle (facts.jsonl + detail_view_*.parquet + raw_stats.json) that dfdiagnoser input=file consumes. Verified end-to-end on main's reader (dftracer-dlio, dftracer-utils 0.0.10): facts.enabled=true on the time_range view -> 84 analysis_facts -> FileOutput bundle -> diagnoser -> io_present finding. Facts-off remains structurally identical to main (file_name 48x1794, proc_name 6x1782). Streaming/window output (ZMQOutput/MofkaOutput) and the window view follow in stage 5.
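The JSON parts of the offline bundle can be sketched with the standard library alone. The file names facts.jsonl and raw_stats.json come from the commit; the write_fact_bundle/read_facts helpers and the fact dictionaries are hypothetical, and the detail_view_*.parquet files are left out because writing them would need pandas or pyarrow. This is not FileOutput itself.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def write_fact_bundle(facts: Iterable[Dict], raw_stats: Dict, out_dir: Optional[str] = None) -> Path:
    """Write facts.jsonl and raw_stats.json into out_dir (a fresh temp dir if None)."""
    root = Path(out_dir) if out_dir else Path(tempfile.mkdtemp(prefix="fact_bundle_"))
    root.mkdir(parents=True, exist_ok=True)
    with open(root / "facts.jsonl", "w", encoding="utf-8") as fh:
        for fact in facts:
            # One JSON object per line, so a consumer can stream the file.
            fh.write(json.dumps(fact, sort_keys=True) + "\n")
    (root / "raw_stats.json").write_text(json.dumps(raw_stats, indent=2), encoding="utf-8")
    # detail_view_*.parquet would be written here in a real bundle.
    return root


def read_facts(bundle_dir: Path) -> List[Dict]:
    """Read facts.jsonl back, skipping blank lines."""
    with open(Path(bundle_dir) / "facts.jsonl", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

JSON Lines suits the hand-off to a separate tool: the diagnoser side can read facts one line at a time without loading a single large document.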
…ptimizer chain
Add a Facts section to the README: the opt-in facts.enabled model (additive; default output unchanged), rule vs. metric builders, output=file bundle (facts.jsonl + detail_view_*.parquet + raw_stats.json), the full offline chain (dfanalyzer output=file -> dfdiagnoser input=file -> dfoptimizer), the time_range/window temporal axis note, and a facts.* config table.
…numbers)
Correct the optimizer invocation (python main.py --transport file, not python -m dfoptimizer), add the eval_rule_file flag, and cite the verified end-to-end run (reader_pressure time_range -> 76 facts -> finding persistence 39 -> 2 ActionPlans). Clarify that time_range is the offline axis and that epoch/window come via streaming.
…ule layer names)
Pair with the dftracer-utils distributed-scan epoch assignment:
- _postread_hlm_config passes epoch_query (the preset's epoch layer def) so the scan assigns per-pid epochs; add epoch/step to HLM_INT_INDEX_COLS so they index the HLM.
- Reconcile the dlio* fact rules with main's AILogging preset layer naming: fetch_iter -> fetch_data (main renamed the per-iteration fetch layer; no fetch_iter remains in the preset). source_view stays epoch.
Verified: offline view_types=[epoch] + shipped dlio.yaml -> fetch_pressure fact; facts-off structurally unchanged; time_range path unaffected (76 facts); 27 fact tests pass.
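The commit does not spell out how the dftracer-utils scan maps events to epochs. One plausible reading of "per-pid epochs" is sketched below under stated assumptions: each pid has its own list of epoch spans taken from the epoch layer, and an event gets the epoch whose half-open [start, end) interval on that pid contains its timestamp. The assign_epochs name, the (epoch, start, end) tuple layout, and the assumption that a pid's spans do not overlap are all hypothetical; this is not the dftracer-utils implementation.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence, Tuple

Span = Tuple[int, float, float]  # (epoch, start, end), end exclusive


def assign_epochs(
    events: Sequence[Tuple[int, float]],
    spans_by_pid: Dict[int, List[Span]],
) -> List[Optional[int]]:
    """Return the epoch for each (pid, ts) event, or None if no span covers it."""
    # Sort each pid's spans by start once and keep the starts for bisect.
    index = {}
    for pid, spans in spans_by_pid.items():
        ordered = sorted(spans, key=lambda s: s[1])
        index[pid] = ([s[1] for s in ordered], ordered)

    result: List[Optional[int]] = []
    for pid, ts in events:
        if pid not in index:
            result.append(None)  # no epoch layer seen for this pid
            continue
        starts, ordered = index[pid]
        i = bisect_right(starts, ts) - 1  # last span starting at or before ts
        if i >= 0 and ts < ordered[i][2]:
            result.append(ordered[i][0])
        else:
            result.append(None)  # ts falls in a gap between spans
    return result
```

Keeping the spans per pid, rather than global, lets ranks that start or finish an epoch at different times each get correct epoch labels.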
Codecov Report
❌ Patch coverage is …
Coverage diff (develop -> #59):
- Coverage: 26.37% -> 30.18% (+3.81%)
- Files: 27 -> 30 (+3)
- Lines: 3667 -> 3671 (+4)
- Hits: 967 -> 1108 (+141)
- Misses: 2700 -> 2563 (-137)
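As a quick consistency check on these numbers (plain arithmetic, no Codecov API involved): hits plus misses equals the line count on both sides, so coverage here reduces to hits divided by lines. The coverage_pct helper is a hypothetical name used only for this arithmetic.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Percent of tracked lines that are hit, rounded to two decimals like the report."""
    return round(100 * hits / lines, 2)


base = coverage_pct(967, 3667)   # develop
head = coverage_pct(1108, 3671)  # #59
delta = round(head - base, 2)    # change reported as +3.81%
```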
This pull request introduces a major new feature: integration of an "analysis facts" pipeline into DFAnalyzer, enabling machine-readable bottleneck signals that can be consumed by DFDiagnoser and DFOptimizer. This is an opt-in, configurable system for producing compact, actionable findings from analysis runs. The PR also adds the configuration and infrastructure needed to support this feature, including a sample ruleset for DLIO workloads.
The most important changes are:
Analysis Facts Pipeline Integration:
- Added the FactsConfig dataclass and corresponding configuration options to enable, configure, and control the emission of analysis facts. Facts can be generated in rule-based (YAML) or metric-based evaluation modes and are disabled by default (opt-in). (python/dftracer/analyzer/config.py)
- Updated the Analyzer class to initialize and run the facts pipeline when enabled, evaluating facts over flat views and including them in the analysis result. (python/dftracer/analyzer/analyzer.py, python/dftracer/analyzer/__init__.py, python/dftracer/analyzer/config.py)
Output and Configuration Enhancements:
- Added FileOutputConfig for writing output bundles, including facts and supporting files, for downstream consumption by DFDiagnoser and DFOptimizer. (python/dftracer/analyzer/config.py)
Ruleset Example:
- Added a sample ruleset (dlio-all.yaml) for fact generation, covering various bottleneck types and optimization opportunities. (python/dftracer/analyzer/configs/fact_rules/dlio-all.yaml, R1-R194)
Documentation:
- Updated README.md with detailed instructions and examples for using the analysis facts feature, including the end-to-end workflow and configuration options. (README.md, R119-R207)
These changes collectively enable DFAnalyzer to produce actionable, machine-readable findings for downstream diagnosis and optimization, with a flexible and extensible configuration system.