Commit 78c5aab

feat: add OpenTelemetry metrics support (#553)
* refactor: move tracing implementation from __init__.py to tracing.py

  Refactored telemetry module structure to follow Python best practices:
  - Moved tracing implementation from __init__.py to new tracing.py module
  - Updated __init__.py to only contain imports and exports
  - Updated backend_instrumentation.py to import from tracing module

  This improves code organization and maintainability by separating implementation from the package interface.

  Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* feat: implement OpenTelemetry metrics module

  Implemented comprehensive metrics support using OpenTelemetry Metrics API:
  - Created mellea/telemetry/metrics.py with:
    - Lazy MeterProvider initialization (similar to tracing pattern)
    - Environment-based configuration (MELLEA_METRICS_ENABLED, MELLEA_METRICS_CONSOLE)
    - Zero overhead when disabled (no-op instrument classes)
    - Named meter: mellea.metrics
  - Instrument creation helpers:
    - create_counter() - for monotonically increasing values
    - create_histogram() - for value distributions
    - create_up_down_counter() - for values that can increase/decrease
  - Updated mellea/telemetry/__init__.py to export metrics functions

  Environment Variables:
  - MELLEA_METRICS_ENABLED (default: false) - Enable metrics collection
  - MELLEA_METRICS_CONSOLE (default: false) - Print metrics to console

  Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* style: fix import sorting in telemetry __init__.py

  Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* fix: update test fixtures to reload tracing submodule

  Fixes test failures after f449d9a moved tracing code to submodule. Test fixtures now reload mellea.telemetry.tracing to pick up env var changes. Also moved test file to test/telemetry/test_tracing.py to mirror source structure.

  Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* test: add unit tests for OpenTelemetry metrics module

  Comprehensive test coverage for metrics configuration, lazy initialization, instrument creation, no-op behavior, and functional operations.

  Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* fix: update test to use tracing submodule after refactor

  After moving tracing code from __init__.py to tracing.py, the test fixture needs to import from mellea.telemetry.tracing instead of mellea.telemetry.

  Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* refactor(telemetry): remove unused boundaries param and add exporter validation

  Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* refactor: use dynamic version for telemetry tracers/meters

  Replace hardcoded version strings with importlib.metadata.version()

  Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* fix: use eager initialization for metrics to avoid race condition

  Switched from lazy to eager initialization to match tracing module pattern and prevent potential threading issues.

  Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

---------

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
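The "zero overhead when disabled (no-op instrument classes)" item in the commit message can be illustrated with a small sketch. This is not the content of mellea/telemetry/metrics.py, which is not shown in this view: `NoOpCounter`, `RecordingCounter`, and `create_counter_sketch` are hypothetical names, and the accepted truthy values ("true", "1", "yes") are assumed to mirror the parsing that the removed tracing code applies to its own flags.

```python
import os


def _env_flag(name: str, default: str = "false") -> bool:
    """Interpret an environment variable as a boolean flag (assumed truthy set)."""
    return os.getenv(name, default).lower() in ("true", "1", "yes")


class NoOpCounter:
    """Hypothetical stand-in: accepts the usual arguments and does nothing."""

    def add(self, amount, attributes=None):
        return None


class RecordingCounter:
    """Toy 'real' counter that keeps a total per attribute set (illustration only)."""

    def __init__(self, name, unit=""):
        self.name = name
        self.unit = unit
        self.totals = {}

    def add(self, amount, attributes=None):
        # Use a sorted tuple of items so equal attribute dicts share one key.
        key = tuple(sorted((attributes or {}).items()))
        self.totals[key] = self.totals.get(key, 0) + amount


def create_counter_sketch(name, unit=""):
    """Return a working counter only when MELLEA_METRICS_ENABLED is truthy."""
    if _env_flag("MELLEA_METRICS_ENABLED"):
        return RecordingCounter(name, unit)
    return NoOpCounter()
```

Because both classes expose the same `add()` signature, call sites such as `counter.add(1, {"backend": "ollama"})` need no `if enabled:` guard; the disabled path costs one method call that returns immediately.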
1 parent 7393095 commit 78c5aab

7 files changed

Lines changed: 956 additions & 242 deletions

mellea/telemetry/__init__.py

Lines changed: 62 additions & 228 deletions
@@ -1,242 +1,76 @@
 """OpenTelemetry instrumentation for Mellea.

-This module provides two independent trace scopes:
-1. Application Trace (mellea.application) - User-facing operations
-2. Backend Trace (mellea.backend) - LLM backend interactions
-
-Follows OpenTelemetry Gen-AI semantic conventions:
-https://opentelemetry.io/docs/specs/semconv/gen-ai/
-
-Configuration via environment variables:
-- MELLEA_TRACE_APPLICATION: Enable/disable application tracing (default: false)
-- MELLEA_TRACE_BACKEND: Enable/disable backend tracing (default: false)
-- OTEL_EXPORTER_OTLP_ENDPOINT: OTLP endpoint for trace export
-- OTEL_SERVICE_NAME: Service name for traces (default: mellea)
+This package provides observability capabilities for Mellea through OpenTelemetry,
+enabling tracing and metrics collection for both application-level operations and
+backend LLM interactions.
+
+Package Structure:
+    - tracing: Distributed tracing with two independent scopes:
+        * Application traces (mellea.application): User-facing operations
+        * Backend traces (mellea.backend): LLM backend interactions
+    - metrics: Metrics collection for counters, histograms, and up-down counters
+    - backend_instrumentation: Automatic instrumentation for backend operations
+
+Configuration:
+    All telemetry features are opt-in via environment variables:
+
+    Tracing:
+    - MELLEA_TRACE_APPLICATION: Enable application tracing (default: false)
+    - MELLEA_TRACE_BACKEND: Enable backend tracing (default: false)
+    - OTEL_EXPORTER_OTLP_ENDPOINT: OTLP endpoint for trace export
+    - OTEL_SERVICE_NAME: Service name for traces (default: mellea)
+
+    Metrics:
+    - MELLEA_METRICS_ENABLED: Enable metrics collection (default: false)
+    - MELLEA_METRICS_CONSOLE: Print metrics to console (default: false)
+    - OTEL_EXPORTER_OTLP_ENDPOINT: OTLP endpoint for metric export (optional)
+    - OTEL_SERVICE_NAME: Service name for metrics (default: mellea)
+
+Dependencies:
+    OpenTelemetry packages are optional. If not installed, telemetry features
+    are gracefully disabled. Install with: pip install mellea[telemetry]
+
+Example:
+    from mellea.telemetry import trace_application, create_counter
+
+    # Trace application operations
+    @trace_application("my_operation")
+    def my_function():
+        pass
+
+    # Collect metrics
+    counter = create_counter("mellea.requests", unit="1")
+    counter.add(1, {"backend": "ollama"})
 """

-import os
-from contextlib import contextmanager
-from typing import Any
-
-# Try to import OpenTelemetry, but make it optional
-try:
-    from opentelemetry import trace
-    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
-    from opentelemetry.sdk.resources import Resource
-    from opentelemetry.sdk.trace import TracerProvider
-    from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
-    from opentelemetry.semconv.trace import SpanAttributes
-
-    _OTEL_AVAILABLE = True
-except ImportError:
-    _OTEL_AVAILABLE = False
-    # Provide dummy types for type hints
-    trace = None  # type: ignore
-    SpanAttributes = None  # type: ignore
-
-# Configuration from environment variables
-# Disable tracing if OpenTelemetry is not available
-_TRACE_APPLICATION_ENABLED = _OTEL_AVAILABLE and os.getenv(
-    "MELLEA_TRACE_APPLICATION", "false"
-).lower() in ("true", "1", "yes")
-_TRACE_BACKEND_ENABLED = _OTEL_AVAILABLE and os.getenv(
-    "MELLEA_TRACE_BACKEND", "false"
-).lower() in ("true", "1", "yes")
-_OTLP_ENDPOINT = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
-_SERVICE_NAME = os.getenv("OTEL_SERVICE_NAME", "mellea")
-_CONSOLE_EXPORT = os.getenv("MELLEA_TRACE_CONSOLE", "false").lower() in (
-    "true",
-    "1",
-    "yes",
+from .metrics import (
+    create_counter,
+    create_histogram,
+    create_up_down_counter,
+    is_metrics_enabled,
+)
+from .tracing import (
+    end_backend_span,
+    is_application_tracing_enabled,
+    is_backend_tracing_enabled,
+    set_span_attribute,
+    set_span_error,
+    start_backend_span,
+    trace_application,
+    trace_backend,
 )
-
-
-def _setup_tracer_provider():
-    """Set up the global tracer provider with OTLP exporter if configured."""
-    if not _OTEL_AVAILABLE:
-        return None
-
-    resource = Resource.create({"service.name": _SERVICE_NAME})  # type: ignore
-    provider = TracerProvider(resource=resource)  # type: ignore
-
-    # Add OTLP exporter if endpoint is configured
-    if _OTLP_ENDPOINT:
-        otlp_exporter = OTLPSpanExporter(endpoint=_OTLP_ENDPOINT)  # type: ignore
-        provider.add_span_processor(BatchSpanProcessor(otlp_exporter))  # type: ignore
-
-    # Add console exporter for debugging if enabled
-    # Note: Console exporter may cause harmless errors during test cleanup
-    if _CONSOLE_EXPORT:
-        try:
-            console_exporter = ConsoleSpanExporter()  # type: ignore
-            provider.add_span_processor(BatchSpanProcessor(console_exporter))  # type: ignore
-        except Exception:
-            # Silently ignore console exporter setup failures
-            pass
-
-    trace.set_tracer_provider(provider)  # type: ignore
-    return provider
-
-
-# Initialize tracer provider if any tracing is enabled
-_tracer_provider = None
-_application_tracer = None
-_backend_tracer = None
-
-if _OTEL_AVAILABLE and (_TRACE_APPLICATION_ENABLED or _TRACE_BACKEND_ENABLED):
-    _tracer_provider = _setup_tracer_provider()
-    # Create separate tracers for application and backend
-    _application_tracer = trace.get_tracer("mellea.application", "0.3.0")  # type: ignore
-    _backend_tracer = trace.get_tracer("mellea.backend", "0.3.0")  # type: ignore
-
-
-def is_application_tracing_enabled() -> bool:
-    """Check if application tracing is enabled."""
-    return _TRACE_APPLICATION_ENABLED
-
-
-def is_backend_tracing_enabled() -> bool:
-    """Check if backend tracing is enabled."""
-    return _TRACE_BACKEND_ENABLED
-
-
-@contextmanager
-def trace_application(name: str, **attributes: Any):
-    """Create an application trace span if application tracing is enabled.
-
-    Args:
-        name: Name of the span
-        **attributes: Additional attributes to add to the span
-
-    Yields:
-        The span object if tracing is enabled, otherwise a no-op context manager
-    """
-    if _TRACE_APPLICATION_ENABLED and _application_tracer is not None:
-        with _application_tracer.start_as_current_span(name) as span:  # type: ignore
-            for key, value in attributes.items():
-                if value is not None:
-                    _set_attribute_safe(span, key, value)
-            yield span
-    else:
-        yield None
-
-
-@contextmanager
-def trace_backend(name: str, **attributes: Any):
-    """Create a backend trace span if backend tracing is enabled.
-
-    Follows Gen-AI semantic conventions for LLM operations.
-
-    Args:
-        name: Name of the span
-        **attributes: Additional attributes to add to the span
-
-    Yields:
-        The span object if tracing is enabled, otherwise a no-op context manager
-    """
-    if _TRACE_BACKEND_ENABLED and _backend_tracer is not None:
-        with _backend_tracer.start_as_current_span(name) as span:  # type: ignore
-            # Set Gen-AI operation type
-            span.set_attribute("gen_ai.operation.name", name)
-
-            for key, value in attributes.items():
-                if value is not None:
-                    _set_attribute_safe(span, key, value)
-            yield span
-    else:
-        yield None
-
-
-def start_backend_span(name: str, **attributes: Any):
-    """Start a backend trace span without auto-closing (for async operations).
-
-    Use this when you need to manually control span lifecycle, such as for
-    async operations where the span should remain open until post-processing.
-
-    Args:
-        name: Name of the span
-        **attributes: Additional attributes to add to the span
-
-    Returns:
-        The span object if tracing is enabled, otherwise None
-    """
-    if _TRACE_BACKEND_ENABLED and _backend_tracer is not None:
-        span = _backend_tracer.start_span(name)  # type: ignore
-        # Set Gen-AI operation type
-        span.set_attribute("gen_ai.operation.name", name)
-
-        for key, value in attributes.items():
-            if value is not None:
-                _set_attribute_safe(span, key, value)
-        return span
-    return None
-
-
-def end_backend_span(span: Any) -> None:
-    """End a backend trace span.
-
-    Args:
-        span: The span object to end
-    """
-    if span is not None:
-        span.end()
-
-
-def _set_attribute_safe(span: Any, key: str, value: Any) -> None:
-    """Set an attribute on a span, handling type conversions.
-
-    Args:
-        span: The span object
-        key: Attribute key
-        value: Attribute value (will be converted to appropriate type)
-    """
-    if value is None:
-        return
-
-    # Handle different value types according to OpenTelemetry spec
-    if isinstance(value, bool):
-        span.set_attribute(key, value)
-    elif isinstance(value, int | float):
-        span.set_attribute(key, value)
-    elif isinstance(value, str):
-        span.set_attribute(key, value)
-    elif isinstance(value, list | tuple):
-        # Convert to list of strings
-        span.set_attribute(key, [str(v) for v in value])
-    else:
-        # Convert other types to string
-        span.set_attribute(key, str(value))
-
-
-def set_span_attribute(span: Any, key: str, value: Any) -> None:
-    """Set an attribute on a span if the span is not None.
-
-    Args:
-        span: The span object (may be None if tracing is disabled)
-        key: Attribute key
-        value: Attribute value
-    """
-    if span is not None and value is not None:
-        _set_attribute_safe(span, key, value)
-
-
-def set_span_error(span: Any, exception: Exception) -> None:
-    """Record an exception on a span if the span is not None.
-
-    Args:
-        span: The span object (may be None if tracing is disabled)
-        exception: The exception to record
-    """
-    if span is not None and _OTEL_AVAILABLE:
-        span.record_exception(exception)
-        span.set_status(trace.Status(trace.StatusCode.ERROR, str(exception)))  # type: ignore
-

 __all__ = [
+    "create_counter",
+    "create_histogram",
+    "create_up_down_counter",
+    "end_backend_span",
     "is_application_tracing_enabled",
     "is_backend_tracing_enabled",
+    "is_metrics_enabled",
     "set_span_attribute",
     "set_span_error",
+    "start_backend_span",
     "trace_application",
     "trace_backend",
 ]
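
The new module docstring applies `@trace_application("my_operation")` as a decorator, while the removed implementation defines `trace_application` with `@contextmanager`. The two usages are compatible, because the object returned by a `contextlib.contextmanager` function can also be used as a decorator. The sketch below shows this with a hypothetical `trace_sketch` stand-in that records events in a list instead of creating spans; whether tracing.py keeps the same shape is an assumption, since that file is not shown here.

```python
from contextlib import contextmanager

events = []  # records what happened, in order, for illustration


@contextmanager
def trace_sketch(name, **attributes):
    """Hypothetical stand-in for a tracing helper built on @contextmanager."""
    events.append(("start", name))
    try:
        # A real helper would yield a span; a plain dict is used here.
        yield {"name": name, **attributes}
    finally:
        events.append(("end", name))


# Usage 1: as a context manager, where the yielded object is available.
with trace_sketch("with_block", backend="ollama") as span:
    events.append(("body", span["name"]))


# Usage 2: as a decorator; the whole call is wrapped, but the yielded
# object is not passed to the function.
@trace_sketch("decorated")
def my_function():
    events.append(("body", "decorated"))
    return 42


result = my_function()
```

One practical difference: in the decorator form the function body has no handle on the span, so per-call attributes would have to be set another way.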

mellea/telemetry/backend_instrumentation.py

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@

 from typing import Any

-from ..telemetry import set_span_attribute, trace_backend
+from .tracing import set_span_attribute, trace_backend


 def get_model_id_str(backend: Any) -> str:
@@ -127,7 +127,7 @@ def start_generate_span(
     Returns:
         Span object or None if tracing is disabled
     """
-    from . import start_backend_span
+    from .tracing import start_backend_span

     model_id = get_model_id_str(backend)
     system_name = get_system_name(backend)
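
The commit message also notes that the hardcoded version string passed to `trace.get_tracer(...)` ("0.3.0" in the removed code) was replaced with `importlib.metadata.version()`. A minimal sketch of such a lookup follows. The helper name `package_version` and the "0.0.0" fallback are hypothetical, and the distribution name "mellea" is assumed from the `pip install mellea[telemetry]` hint in the docstring; the actual code in tracing.py and metrics.py is not shown in this view.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(dist_name: str, fallback: str = "0.0.0") -> str:
    """Return the installed version of dist_name, or fallback if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Raised when no installed distribution has this name,
        # for example when running from an uninstalled source checkout.
        return fallback


# Assumed distribution name; falls back when the package is not installed.
tracer_version = package_version("mellea")
```

The resulting string could then be passed wherever a version argument is expected, so the reported instrumentation version follows the installed package without manual updates.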
