
Commit c32c783

feat(llma): pass raw provider usage metadata for backend cost calculations (#411)
* feat: pass raw provider usage metadata for backend cost calculations

  Add a raw_usage field to the TokenUsage type to capture raw provider usage
  metadata (OpenAI, Anthropic, Gemini). This enables the backend to extract
  modality-specific token counts (text vs image vs audio) for accurate cost
  calculations.

  - Add raw_usage field to the TokenUsage TypedDict
  - Update all provider converters to capture raw usage:
    - OpenAI: capture response.usage and chunk usage
    - Anthropic: capture usage from message_start and message_delta events
    - Gemini: capture usage_metadata from responses and chunks
  - Pass raw usage as the $ai_usage property in PostHog events
  - Update merge_usage_stats to handle raw_usage in both modes
  - Add tests verifying $ai_usage is captured for all providers

  The backend will extract provider-specific details and delete $ai_usage
  after processing to avoid bloating event properties.

* fix: add serialize_raw_usage helper to ensure JSON serializability

  Address PR review feedback from @andrewm4894:

  1. **Serialization**: add a serialize_raw_usage() helper with a fallback chain:
     - .model_dump() for Pydantic models (OpenAI/Anthropic)
     - .to_dict() for protobuf-like objects
     - vars() for simple objects
     - str() as a last resort
     This ensures we never pass unserializable objects to the PostHog client.

  2. **Data loss prevention**: change from replacing to merging raw_usage in
     incremental mode. For Anthropic streaming, message_start has the input
     token details and message_delta has the output token details; merging
     preserves both instead of losing the input data.

  3. **Test coverage**: enhance tests to verify:
     - JSON serializability with json.dumps()
     - The expected structure of raw_usage dicts
     - Both non-streaming and streaming modes
     Also fix the Gemini test mocks to return proper dicts from model_dump().

* refactor: move raw_usage serialization from utils to converters

  Address PR feedback from @andrewm4894: serialize in converters, not utils.

  **Problem:** utils was receiving raw Pydantic/protobuf objects and
  serializing them, which meant provider-specific knowledge leaked into
  generic code.

  **Solution:** move serialization into the converters, where provider
  context exists:

  Converters (new):
  - OpenAI: serialize_raw_usage(response.usage) → dict
  - Anthropic: serialize_raw_usage(event.usage) → dict
  - Gemini: serialize_raw_usage(metadata) → dict

  Utils (simplified):
  - Just passes dicts through; no serialization needed
  - Merge operations work with dicts only

  **Benefits:**
  1. Type correctness: raw_usage is always Dict[str, Any]
  2. Separation of concerns: converters handle provider formats
  3. Fail fast: serialization errors surface in the converters, with context
  4. Cleaner abstraction: utils doesn't know about Pydantic/protobuf

  **Flow:** provider object → converter serializes → dict → utils → PostHog

* fix: add type annotation for current_raw to satisfy mypy

  Fix the mypy error "Need type annotation for 'current_raw'": extract the
  value first, then apply an explicit type annotation with a ternary
  conditional to satisfy mypy's type checker.

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent 1875b71 commit c32c783

8 files changed

Lines changed: 218 additions & 0 deletions
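Before the per-file diffs, a minimal sketch of the flow the commit message describes, from provider usage object to the $ai_usage event property. FakeUsage is a hypothetical stand-in for a provider SDK object, and to_ai_usage_property is a condensed stand-in for the serialize_raw_usage helper added in posthog/ai/utils.py below:

```python
from typing import Any, Dict, Optional


class FakeUsage:
    """Hypothetical stand-in for a provider SDK usage object (Pydantic-style)."""

    def model_dump(self) -> Dict[str, Any]:
        return {"input_tokens": 20, "output_tokens": 10}


def to_ai_usage_property(raw_usage: Any) -> Optional[Dict[str, Any]]:
    # Condensed version of the serialize_raw_usage fallback chain below
    if raw_usage is None:
        return None
    if isinstance(raw_usage, dict):
        return raw_usage
    if hasattr(raw_usage, "model_dump"):
        return raw_usage.model_dump()  # Pydantic models (OpenAI/Anthropic)
    return {"_raw": str(raw_usage)}  # last resort: keep something serializable


# The converter serializes the provider object; the dict then rides along
# on the PostHog event as $ai_usage for backend cost extraction.
event_properties = {"$ai_usage": to_ai_usage_property(FakeUsage())}
print(event_properties)  # {'$ai_usage': {'input_tokens': 20, 'output_tokens': 10}}
```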

posthog/ai/anthropic/anthropic_converter.py

Lines changed: 18 additions & 0 deletions
@@ -17,6 +17,7 @@
     TokenUsage,
     ToolInProgress,
 )
+from posthog.ai.utils import serialize_raw_usage
 
 
 def format_anthropic_response(response: Any) -> List[FormattedMessage]:

@@ -221,6 +222,12 @@ def extract_anthropic_usage_from_response(response: Any) -> TokenUsage:
     if web_search_count > 0:
         result["web_search_count"] = web_search_count
 
+    # Capture raw usage metadata for backend processing
+    # Serialize to dict here in the converter (not in utils)
+    serialized = serialize_raw_usage(response.usage)
+    if serialized:
+        result["raw_usage"] = serialized
+
     return result

@@ -247,6 +254,11 @@ def extract_anthropic_usage_from_event(event: Any) -> TokenUsage:
         usage["cache_read_input_tokens"] = getattr(
             event.message.usage, "cache_read_input_tokens", 0
         )
+        # Capture raw usage metadata for backend processing
+        # Serialize to dict here in the converter (not in utils)
+        serialized = serialize_raw_usage(event.message.usage)
+        if serialized:
+            usage["raw_usage"] = serialized
 
     # Handle usage stats from message_delta event
     if hasattr(event, "usage") and event.usage:

@@ -262,6 +274,12 @@ def extract_anthropic_usage_from_event(event: Any) -> TokenUsage:
         if web_search_count > 0:
             usage["web_search_count"] = web_search_count
 
+        # Capture raw usage metadata for backend processing
+        # Serialize to dict here in the converter (not in utils)
+        serialized = serialize_raw_usage(event.usage)
+        if serialized:
+            usage["raw_usage"] = serialized
+
     return usage
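To see why raw_usage is merged rather than replaced (see the merge_usage_stats change in posthog/ai/utils.py below): for Anthropic streaming, message_start carries the input-token details and message_delta the output-token details, so the two dicts must be combined. A sketch with illustrative values; field names follow Anthropic's usage payloads:

```python
# raw_usage from the message_start event (input side)
start_usage = {"input_tokens": 120, "cache_read_input_tokens": 40}
# raw_usage from the message_delta event (output side)
delta_usage = {"output_tokens": 85}

# Merging preserves both; replacing would drop the input-token details
merged = {**start_usage, **delta_usage}
assert merged == {
    "input_tokens": 120,
    "cache_read_input_tokens": 40,
    "output_tokens": 85,
}
```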

posthog/ai/gemini/gemini_converter.py

Lines changed: 7 additions & 0 deletions
@@ -12,6 +12,7 @@
     FormattedMessage,
     TokenUsage,
 )
+from posthog.ai.utils import serialize_raw_usage
 
 
 class GeminiPart(TypedDict, total=False):

@@ -487,6 +488,12 @@ def _extract_usage_from_metadata(metadata: Any) -> TokenUsage:
     if reasoning_tokens and reasoning_tokens > 0:
         usage["reasoning_tokens"] = reasoning_tokens
 
+    # Capture raw usage metadata for backend processing
+    # Serialize to dict here in the converter (not in utils)
+    serialized = serialize_raw_usage(metadata)
+    if serialized:
+        usage["raw_usage"] = serialized
+
     return usage
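Gemini's usage_metadata is not a Pydantic model, which is one reason the serialize_raw_usage fallback chain also tries to_dict(). A sketch with a hypothetical stand-in object (the real google-genai metadata exposes richer fields):

```python
from posthog.ai.utils import serialize_raw_usage


class FakeUsageMetadata:
    """Hypothetical stand-in for Gemini's usage_metadata object."""

    def to_dict(self):
        return {"prompt_token_count": 20, "candidates_token_count": 10}


# No model_dump() here, so the helper falls through to to_dict()
usage = serialize_raw_usage(FakeUsageMetadata())
assert usage == {"prompt_token_count": 20, "candidates_token_count": 10}
```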

posthog/ai/openai/openai_converter.py

Lines changed: 19 additions & 0 deletions
@@ -16,6 +16,7 @@
     FormattedTextContent,
     TokenUsage,
 )
+from posthog.ai.utils import serialize_raw_usage
 
 
 def format_openai_response(response: Any) -> List[FormattedMessage]:

@@ -429,6 +430,12 @@ def extract_openai_usage_from_response(response: Any) -> TokenUsage:
     if web_search_count > 0:
         result["web_search_count"] = web_search_count
 
+    # Capture raw usage metadata for backend processing
+    # Serialize to dict here in the converter (not in utils)
+    serialized = serialize_raw_usage(response.usage)
+    if serialized:
+        result["raw_usage"] = serialized
+
     return result

@@ -482,6 +489,12 @@ def extract_openai_usage_from_chunk(
             chunk.usage.completion_tokens_details.reasoning_tokens
         )
 
+        # Capture raw usage metadata for backend processing
+        # Serialize to dict here in the converter (not in utils)
+        serialized = serialize_raw_usage(chunk.usage)
+        if serialized:
+            usage["raw_usage"] = serialized
+
     elif provider_type == "responses":
         # For Responses API, usage is only in chunk.response.usage for completed events
         if hasattr(chunk, "type") and chunk.type == "response.completed":

@@ -516,6 +529,12 @@ def extract_openai_usage_from_chunk(
             if web_search_count > 0:
                 usage["web_search_count"] = web_search_count
 
+            # Capture raw usage metadata for backend processing
+            # Serialize to dict here in the converter (not in utils)
+            serialized = serialize_raw_usage(response_usage)
+            if serialized:
+                usage["raw_usage"] = serialized
+
     return usage
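What the raw payload buys the backend here: OpenAI's usage objects carry nested detail blocks that the normalized TokenUsage counters flatten away. An illustrative chat.completions usage dict; treat the exact keys as an assumption about the provider's current schema:

```python
raw_usage = {
    "prompt_tokens": 150,
    "completion_tokens": 40,
    "total_tokens": 190,
    # Modality/cache breakdowns the backend needs for accurate pricing:
    "prompt_tokens_details": {"cached_tokens": 100, "audio_tokens": 0},
    "completion_tokens_details": {"reasoning_tokens": 12, "audio_tokens": 0},
}
```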

posthog/ai/types.py

Lines changed: 1 addition & 0 deletions
@@ -64,6 +64,7 @@ class TokenUsage(TypedDict, total=False):
     cache_creation_input_tokens: Optional[int]
     reasoning_tokens: Optional[int]
     web_search_count: Optional[int]
+    raw_usage: Optional[Any]  # Raw provider usage metadata for backend processing
 
 
 class ProviderResponse(TypedDict, total=False):
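Since TokenUsage is declared with total=False, the new key is optional; after the refactor in this PR the converters guarantee that, when present, it holds a plain dict. A minimal usage sketch with illustrative values:

```python
from posthog.ai.types import TokenUsage

usage: TokenUsage = {
    "input_tokens": 20,
    "output_tokens": 10,
    # Always a plain dict by the time it leaves a converter
    "raw_usage": {"input_tokens": 20, "output_tokens": 10},
}
```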

posthog/ai/utils.py

Lines changed: 78 additions & 0 deletions
@@ -13,6 +13,54 @@
 from posthog.client import Client as PostHogClient
 
 
+def serialize_raw_usage(raw_usage: Any) -> Optional[Dict[str, Any]]:
+    """
+    Convert raw provider usage objects to JSON-serializable dicts.
+
+    Handles Pydantic models (OpenAI/Anthropic) and protobuf-like objects
+    (Gemini) with a fallback chain to ensure we never pass unserializable
+    objects to the PostHog client.
+
+    Args:
+        raw_usage: Raw usage object from the provider SDK
+
+    Returns:
+        Plain dict, or None if conversion fails
+    """
+    if raw_usage is None:
+        return None
+
+    # Already a dict
+    if isinstance(raw_usage, dict):
+        return raw_usage
+
+    # Try Pydantic model_dump() (OpenAI/Anthropic)
+    if hasattr(raw_usage, "model_dump") and callable(raw_usage.model_dump):
+        try:
+            return raw_usage.model_dump()
+        except Exception:
+            pass
+
+    # Try to_dict() (some protobuf objects)
+    if hasattr(raw_usage, "to_dict") and callable(raw_usage.to_dict):
+        try:
+            return raw_usage.to_dict()
+        except Exception:
+            pass
+
+    # Try __dict__ / vars() for simple objects
+    try:
+        return vars(raw_usage)
+    except Exception:
+        pass
+
+    # Last resort: convert to a string representation.
+    # This ensures we always return something rather than failing.
+    try:
+        return {"_raw": str(raw_usage)}
+    except Exception:
+        return None
+
+
 def merge_usage_stats(
     target: TokenUsage, source: TokenUsage, mode: str = "incremental"
 ) -> None:

@@ -60,6 +108,17 @@ def merge_usage_stats(
         current = target.get("web_search_count") or 0
         target["web_search_count"] = max(current, source_web_search)
 
+        # Merge raw_usage to avoid losing data from earlier events
+        # For Anthropic streaming: message_start has input tokens, message_delta has output
+        # Note: raw_usage is already serialized by converters, so it's a dict
+        source_raw_usage = source.get("raw_usage")
+        if source_raw_usage is not None and isinstance(source_raw_usage, dict):
+            current_raw_value = target.get("raw_usage")
+            current_raw: Dict[str, Any] = (
+                current_raw_value if isinstance(current_raw_value, dict) else {}
+            )
+            target["raw_usage"] = {**current_raw, **source_raw_usage}
+
     elif mode == "cumulative":
         # Replace with latest values (already cumulative)
         if source.get("input_tokens") is not None:

@@ -76,6 +135,9 @@ def merge_usage_stats(
             target["reasoning_tokens"] = source["reasoning_tokens"]
         if source.get("web_search_count") is not None:
             target["web_search_count"] = source["web_search_count"]
+        # Note: raw_usage is already serialized by converters, so it's a dict
+        if source.get("raw_usage") is not None:
+            target["raw_usage"] = source["raw_usage"]
 
     else:
         raise ValueError(f"Invalid mode: {mode}. Must be 'incremental' or 'cumulative'")

@@ -332,6 +394,11 @@ def call_llm_and_track_usage(
     if web_search_count is not None and web_search_count > 0:
         tag("$ai_web_search_count", web_search_count)
 
+    raw_usage = usage.get("raw_usage")
+    if raw_usage is not None:
+        # Already serialized by converters
+        tag("$ai_usage", raw_usage)
+
     if posthog_distinct_id is None:
         tag("$process_person_profile", False)

@@ -457,6 +524,11 @@ async def call_llm_and_track_usage_async(
     if web_search_count is not None and web_search_count > 0:
         tag("$ai_web_search_count", web_search_count)
 
+    raw_usage = usage.get("raw_usage")
+    if raw_usage is not None:
+        # Already serialized by converters
+        tag("$ai_usage", raw_usage)
+
     if posthog_distinct_id is None:
         tag("$process_person_profile", False)

@@ -594,6 +666,12 @@ def capture_streaming_event(
     ):
         event_properties["$ai_web_search_count"] = web_search_count
 
+    # Add raw usage metadata if present (all providers)
+    raw_usage = event_data["usage_stats"].get("raw_usage")
+    if raw_usage is not None:
+        # Already serialized by converters
+        event_properties["$ai_usage"] = raw_usage
+
     # Handle provider-specific fields
     if (
         event_data["provider"] == "openai"

posthog/test/ai/anthropic/test_anthropic.py

Lines changed: 20 additions & 0 deletions
@@ -1,3 +1,4 @@
+import json
 from unittest.mock import patch
 
 import pytest

@@ -306,6 +307,15 @@ def test_basic_completion(mock_client, mock_anthropic_response):
     assert props["$ai_http_status"] == 200
     assert props["foo"] == "bar"
     assert isinstance(props["$ai_latency"], float)
+    # Verify raw usage metadata is passed for backend processing
+    assert "$ai_usage" in props
+    assert props["$ai_usage"] is not None
+    # Verify it's JSON-serializable
+    json.dumps(props["$ai_usage"])
+    # Verify it has expected structure
+    assert isinstance(props["$ai_usage"], dict)
+    assert "input_tokens" in props["$ai_usage"]
+    assert "output_tokens" in props["$ai_usage"]

@@ -918,6 +928,16 @@ def test_streaming_with_tool_calls(mock_client, mock_anthropic_stream_with_tools):
     assert props["$ai_cache_read_input_tokens"] == 5
     assert props["$ai_cache_creation_input_tokens"] == 0
 
+    # Verify raw usage is captured in streaming mode (merged from events)
+    assert "$ai_usage" in props
+    assert props["$ai_usage"] is not None
+    # Verify it's JSON-serializable
+    json.dumps(props["$ai_usage"])
+    # Verify it has expected structure (merged from message_start and message_delta)
+    assert isinstance(props["$ai_usage"], dict)
+    assert "input_tokens" in props["$ai_usage"]
+    assert "output_tokens" in props["$ai_usage"]
+
 
 def test_async_streaming_with_tool_calls(mock_client, mock_anthropic_stream_with_tools):
     """Test that tool calls are properly captured in async streaming mode."""

posthog/test/ai/gemini/test_gemini.py

Lines changed: 55 additions & 0 deletions
@@ -1,3 +1,4 @@
+import json
 from unittest.mock import MagicMock, patch
 
 import pytest

@@ -34,6 +35,13 @@ def mock_gemini_response():
     # Ensure cache and reasoning tokens are not present (not MagicMock)
     mock_usage.cached_content_token_count = 0
     mock_usage.thoughts_token_count = 0
+    # Make model_dump() return a proper dict for serialization
+    mock_usage.model_dump.return_value = {
+        "prompt_token_count": 20,
+        "candidates_token_count": 10,
+        "cached_content_token_count": 0,
+        "thoughts_token_count": 0,
+    }
     mock_response.usage_metadata = mock_usage
 
     mock_candidate = MagicMock()

@@ -69,6 +77,13 @@ def mock_gemini_response_with_function_calls():
     mock_usage.candidates_token_count = 15
     mock_usage.cached_content_token_count = 0
     mock_usage.thoughts_token_count = 0
+    # Make model_dump() return a proper dict for serialization
+    mock_usage.model_dump.return_value = {
+        "prompt_token_count": 25,
+        "candidates_token_count": 15,
+        "cached_content_token_count": 0,
+        "thoughts_token_count": 0,
+    }
     mock_response.usage_metadata = mock_usage
 
     # Mock function call

@@ -117,6 +132,13 @@ def mock_gemini_response_function_calls_only():
     mock_usage.candidates_token_count = 12
     mock_usage.cached_content_token_count = 0
     mock_usage.thoughts_token_count = 0
+    # Make model_dump() return a proper dict for serialization
+    mock_usage.model_dump.return_value = {
+        "prompt_token_count": 30,
+        "candidates_token_count": 12,
+        "cached_content_token_count": 0,
+        "thoughts_token_count": 0,
+    }
     mock_response.usage_metadata = mock_usage
 
     # Mock function call

@@ -174,6 +196,15 @@ def test_new_client_basic_generation(
     assert props["foo"] == "bar"
     assert "$ai_trace_id" in props
     assert props["$ai_latency"] > 0
+    # Verify raw usage metadata is passed for backend processing
+    assert "$ai_usage" in props
+    assert props["$ai_usage"] is not None
+    # Verify it's JSON-serializable
+    json.dumps(props["$ai_usage"])
+    # Verify it has expected structure
+    assert isinstance(props["$ai_usage"], dict)
+    assert "prompt_token_count" in props["$ai_usage"]
+    assert "candidates_token_count" in props["$ai_usage"]
 
 
 def test_new_client_streaming_with_generate_content_stream(

@@ -810,6 +841,13 @@ def test_streaming_cache_and_reasoning_tokens(mock_client, mock_google_genai_client):
     chunk1_usage.candidates_token_count = 5
     chunk1_usage.cached_content_token_count = 30  # Cache tokens
     chunk1_usage.thoughts_token_count = 0
+    # Make model_dump() return a proper dict for serialization
+    chunk1_usage.model_dump.return_value = {
+        "prompt_token_count": 100,
+        "candidates_token_count": 5,
+        "cached_content_token_count": 30,
+        "thoughts_token_count": 0,
+    }
     chunk1.usage_metadata = chunk1_usage
 
     chunk2 = MagicMock()

@@ -819,6 +857,13 @@ def test_streaming_cache_and_reasoning_tokens(mock_client, mock_google_genai_client):
     chunk2_usage.candidates_token_count = 10
     chunk2_usage.cached_content_token_count = 30  # Same cache tokens
     chunk2_usage.thoughts_token_count = 5  # Reasoning tokens
+    # Make model_dump() return a proper dict for serialization
+    chunk2_usage.model_dump.return_value = {
+        "prompt_token_count": 100,
+        "candidates_token_count": 10,
+        "cached_content_token_count": 30,
+        "thoughts_token_count": 5,
+    }
     chunk2.usage_metadata = chunk2_usage
 
     mock_stream = iter([chunk1, chunk2])

@@ -848,6 +893,16 @@ def test_streaming_cache_and_reasoning_tokens(mock_client, mock_google_genai_client):
     assert props["$ai_cache_read_input_tokens"] == 30
     assert props["$ai_reasoning_tokens"] == 5
 
+    # Verify raw usage is captured in streaming mode (merged from chunks)
+    assert "$ai_usage" in props
+    assert props["$ai_usage"] is not None
+    # Verify it's JSON-serializable
+    json.dumps(props["$ai_usage"])
+    # Verify it has expected structure
+    assert isinstance(props["$ai_usage"], dict)
+    assert "prompt_token_count" in props["$ai_usage"]
+    assert "candidates_token_count" in props["$ai_usage"]
+
 
 def test_web_search_grounding(mock_client, mock_google_genai_client):
     """Test web search detection via grounding_metadata."""
