feat(telemetry): latency histograms for LLM request duration and TTFB (#463) (#782)
* feat(telemetry): latency histograms for LLM request duration and TTFB (#463)
Adds request duration and time-to-first-token (TTFB) latency histograms
via the plugin pattern established in #653. Includes custom OTel bucket
views sized for LLM latencies, backend telemetry field assertions across
all backends, and updated dev/published docs.
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
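The commit adds custom OTel bucket views sized for LLM latencies but does not list the boundaries here. The toy model below shows how an explicit-bucket histogram assigns recorded values to buckets. It assumes hypothetical second-scale boundaries (`LLM_LATENCY_BUCKETS_S` is illustrative, not Mellea's actual values) and follows the OpenTelemetry convention that each bucket is upper-inclusive.

```python
from __future__ import annotations

from bisect import bisect_left
from collections.abc import Sequence

# Hypothetical boundaries in seconds, spanning sub-second TTFB up to
# minute-long generations. Not the values used by the commit.
LLM_LATENCY_BUCKETS_S: tuple[float, ...] = (
    0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0,
)


class ExplicitBucketHistogram:
    """Toy model of an explicit-bucket histogram.

    Bucket i counts values in (boundaries[i-1], boundaries[i]]; the extra
    final bucket counts everything above the last boundary.
    """

    def __init__(self, boundaries: Sequence[float]) -> None:
        if list(boundaries) != sorted(set(boundaries)):
            raise ValueError("boundaries must be strictly increasing")
        self.boundaries = tuple(boundaries)
        self.counts = [0] * (len(self.boundaries) + 1)
        self.sum = 0.0
        self.count = 0

    def record(self, value: float) -> None:
        # bisect_left places a value equal to a boundary in that boundary's
        # bucket, which gives the upper-inclusive behaviour.
        self.counts[bisect_left(self.boundaries, value)] += 1
        self.sum += value
        self.count += 1
```

In the OpenTelemetry Python SDK, the equivalent configuration is a `View` whose `aggregation` is `ExplicitBucketHistogramAggregation(boundaries=...)`, passed to `MeterProvider(views=[...])`. The class above only illustrates the bucketing arithmetic.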
* fix(telemetry): address PR review feedback on latency histograms
- Fix TTFB measurement: capture the timestamp on the first chunk in both
the non-blocking drain loop and the blocking min-chunk loop, rather than
after both loops complete (previously this measured time-to-Nth-chunk,
not time-to-first-chunk)
- Add inline comment explaining that the hard-coded @@@stream@@@ string
avoids a circular import between core and backends
- Remove invalid enable_metrics() calls from plugin docstring examples;
metrics are enabled only via the MELLEA_METRICS_ENABLED env var
- Move _METRICS_PLUGIN_CLASSES to metrics_plugins.py alongside the
classes it describes; import it in metrics.py registration block
- Change LatencyMetricsPlugin priority from 50 to 51 so the plugins have
a distinct execution order
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
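The TTFB bug fixed above can be reproduced with a simulated clock. This sketch assumes each chunk carries its arrival time in milliseconds. The function `measure_ttfb_ms` and the `(arrival_ms, text)` chunk shape are hypothetical stand-ins, not Mellea's `astream()` code. Passing `fixed=False` reproduces the pre-fix behaviour, where the timestamp was taken only after both loops had finished.

```python
from __future__ import annotations

from collections.abc import Iterable

Chunk = tuple[int, str]  # (arrival time in ms on a simulated clock, text)


def measure_ttfb_ms(
    start_ms: int,
    drained: Iterable[Chunk],
    blocking: Iterable[Chunk],
    *,
    fixed: bool = True,
) -> int | None:
    """Return time-to-first-chunk in ms, or None if no chunk arrived."""
    first_ms: int | None = None
    last_ms: int | None = None

    # Non-blocking drain loop: chunks that were already queued.
    for arrival_ms, _text in drained:
        last_ms = arrival_ms
        if fixed and first_ms is None:
            first_ms = arrival_ms

    # Blocking loop: keep waiting until the minimum chunk count is reached.
    for arrival_ms, _text in blocking:
        last_ms = arrival_ms
        if fixed and first_ms is None:
            first_ms = arrival_ms

    # Pre-fix behaviour: one timestamp after both loops, i.e. at the last
    # chunk consumed (time-to-Nth-chunk).
    stamp = first_ms if fixed else last_ms
    return None if stamp is None else stamp - start_ms
```

Consider a stream whose chunks arrive at 350, 400 and 900 ms, all during the blocking loop. The fixed version reports 350 ms, while the pre-fix version reports 900 ms.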
* refactor: extract _record_ttfb helper to deduplicate TTFB logic
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
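Only the name `_record_ttfb` comes from the commit. The sketch below shows one way such a helper could deduplicate the logic. Its signature, `StreamState`, `consume` and the injectable `clock` are hypothetical. The key property is idempotence: both loops call the helper on every chunk, and only the first call records a value.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass
class StreamState:
    """Hypothetical per-request state: start time and measured TTFB."""

    start_s: float
    ttfb_ms: float | None = None


def _record_ttfb(state: StreamState, now_s: float) -> None:
    """Record TTFB on the first call only; later calls are no-ops."""
    if state.ttfb_ms is None:
        state.ttfb_ms = (now_s - state.start_s) * 1000.0


def consume(
    state: StreamState,
    drained: Iterable[str],
    blocking: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> list[str]:
    """Hypothetical stand-in for the two chunk loops sharing one helper."""
    received: list[str] = []
    for chunk in drained:  # non-blocking drain loop
        _record_ttfb(state, clock())
        received.append(chunk)
    for chunk in blocking:  # blocking min-chunk loop
        _record_ttfb(state, clock())
        received.append(chunk)
    return received
```

Because the helper is idempotent, neither loop needs to know whether the other one has already seen a chunk.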
---------
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
 - **Friendly Dependency Errors**: Wraps optional backend imports in `try/except ImportError` with a helpful message (e.g., "Please pip install mellea[hf]"). See `mellea/stdlib/session.py` for examples.
-- **Backend telemetry fields**: All backends must populate `mot.usage` (dict with `prompt_tokens`, `completion_tokens`, `total_tokens`), `mot.model` (str), and `mot.provider` (str) in their `post_processing()` method. Metrics are automatically recorded by `TokenMetricsPlugin`; don't add manual `record_token_usage_metrics()` calls.
+- **Backend telemetry fields**: All backends must populate `mot.usage` (dict with `prompt_tokens`, `completion_tokens`, `total_tokens`), `mot.model` (str), and `mot.provider` (str) in their `post_processing()` method. `mot.streaming` (bool) and `mot.ttfb_ms` (float | None) are set automatically in `astream()`; backends do not need to set them. Metrics are automatically recorded by `TokenMetricsPlugin` and `LatencyMetricsPlugin`; don't add manual `record_token_usage_metrics()` or `record_request_duration()` calls.
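The backend contract in the updated bullet can be sketched as follows. The field names and the division of responsibilities come from the docs change: backends fill `usage`, `model` and `provider` in `post_processing()`, while `astream()` sets `streaming` and `ttfb_ms`. `FakeMOT`, `ExampleBackend`, the response dict shape and `check_telemetry_fields` are hypothetical. The check is in the spirit of the commit's cross-backend telemetry assertions, whose actual form is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class FakeMOT:
    """Hypothetical stand-in for a model output thunk (telemetry fields only)."""

    usage: dict[str, int] | None = None
    model: str | None = None
    provider: str | None = None
    streaming: bool = False  # set by astream(), not by backends
    ttfb_ms: float | None = None  # set by astream(), not by backends


class ExampleBackend:
    """Hypothetical backend; the response dict shape is an assumption."""

    def post_processing(self, mot: FakeMOT, response: dict[str, Any]) -> None:
        u = response["usage"]
        prompt, completion = u["prompt_tokens"], u["completion_tokens"]
        mot.usage = {
            "prompt_tokens": prompt,
            "completion_tokens": completion,
            # Derive the total if the provider does not report it.
            "total_tokens": u.get("total_tokens", prompt + completion),
        }
        mot.model = response["model"]
        mot.provider = "example"


REQUIRED_USAGE_KEYS = {"prompt_tokens", "completion_tokens", "total_tokens"}


def check_telemetry_fields(mot: FakeMOT) -> list[str]:
    """Return contract violations; an empty list means the backend complies."""
    problems: list[str] = []
    if not isinstance(mot.usage, dict) or not REQUIRED_USAGE_KEYS.issubset(mot.usage):
        problems.append("usage must be a dict with prompt/completion/total token counts")
    if not isinstance(mot.model, str):
        problems.append("model must be a str")
    if not isinstance(mot.provider, str):
        problems.append("provider must be a str")
    return problems
```

The backend never touches `streaming` or `ttfb_ms`. Those fields keep their defaults until the streaming path sets them.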