Summary
When using langchain-anthropic with extended prompt-caching enabled, Anthropic's API returns a cache_creation field inside the LLM output usage dict. This field is a nested dict (not an int), which breaks _parse_usage_model in two different ways depending on the SDK version.
Repro
Trigger any LLM call with a ChatAnthropic model that has prompt caching active. The raw usage from response.llm_output["usage"] contains:
{
    "input_tokens": 9454,
    "output_tokens": 380,
    "cache_read_input_tokens": 0,
    "cache_creation": {
        "ephemeral_1h_input_tokens": 0,
        "ephemeral_5m_input_tokens": 0,
    },
}
Impact by version
v2.x (≤ 2.60.10) — silent generation drop (critical)
The final filter in _parse_usage_model was:
usage_model = {k: v for k, v in usage_model.items() if v is not None and not isinstance(v, str)}
Because isinstance({"ephemeral_1h_input_tokens": 0, ...}, str) is False, the nested dict passes the filter and ends up in usage_details. Langfuse's UpdateGenerationBody.usageDetails is typed as Union[Dict[str, int], OpenAiCompletionUsageSchema, OpenAiResponseUsageSchema], so Pydantic validation rejects the nested dict and raises:
ValidationError: 7 validation errors for UpdateGenerationBody
usageDetails -> cache_creation
  value is not a valid integer (type=type_error.integer)
This error is silently caught inside the Langfuse ingestion queue, dropping the entire generation end() event. The result: every generation shows endTime=null and input=0 / output=0.
v4.x (current HEAD) — data silently discarded
The final filter was tightened to isinstance(v, int), which correctly prevents the crash. However, the cache_creation dict is silently dropped, so the cache-tier creation token counts are not stored at all.
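The difference between the two filters can be reproduced in isolation. The sketch below is not the SDK code: v2_filter and v4_filter are hypothetical names that wrap only the filter expressions quoted above, applied to the usage payload from the repro.

```python
# Usage payload from the repro section.
usage = {
    "input_tokens": 9454,
    "output_tokens": 380,
    "cache_read_input_tokens": 0,
    "cache_creation": {
        "ephemeral_1h_input_tokens": 0,
        "ephemeral_5m_input_tokens": 0,
    },
}


def v2_filter(usage_model):
    # Hypothetical wrapper around the v2.x filter: only None and str are removed.
    return {
        k: v
        for k, v in usage_model.items()
        if v is not None and not isinstance(v, str)
    }


def v4_filter(usage_model):
    # Hypothetical wrapper around the v4.x filter: only int values are kept.
    return {k: v for k, v in usage_model.items() if isinstance(v, int)}


v2_result = v2_filter(usage)
v4_result = v4_filter(usage)

# v2.x: the nested dict survives and later fails Dict[str, int] validation.
print("cache_creation" in v2_result)
# v4.x: the nested dict is dropped along with its per-tier counts.
print("cache_creation" in v4_result)
```

Under v2_filter the cache_creation key is still present as a dict; under v4_filter it is gone entirely, which matches the two failure modes described above.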
Expected behaviour
cache_creation should be handled the same way as input_token_details — flatten the per-tier values into individually named keys (e.g. cache_creation_ephemeral_1h_input_tokens, cache_creation_ephemeral_5m_input_tokens), and expose an aggregated cache_creation_input_tokens total for cost calculation.
Suggested fix
In _parse_usage_model, before the final isinstance(v, int) filter, add:
# Anthropic extended prompt caching: cache_creation is a dict keyed by cache tier
if "cache_creation" in usage_model and isinstance(usage_model["cache_creation"], dict):
    cache_creation = usage_model.pop("cache_creation")
    total = 0
    for tier_key, tier_val in cache_creation.items():
        if isinstance(tier_val, int):
            usage_model[f"cache_creation_{tier_key}"] = tier_val
            total += tier_val
    if total > 0:
        # Aggregate key mirrors the legacy cache_creation_input_tokens field
        usage_model.setdefault("cache_creation_input_tokens", total)
A unit test reproducing both the v2 crash case and the proper flattening is included in the linked PR.
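For readers who want to try the flattening outside the SDK, here is a minimal standalone sketch of the same logic. The function name flatten_cache_creation is hypothetical, and the non-zero token counts (100 and 20) are illustrative values, not taken from a real response. The final int-only filter stands in for the v4.x filter described above.

```python
from typing import Any, Dict


def flatten_cache_creation(usage_model: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: flatten a nested cache_creation dict into int keys."""
    result = dict(usage_model)  # work on a copy so the caller's dict is untouched
    cache_creation = result.get("cache_creation")
    if isinstance(cache_creation, dict):
        result.pop("cache_creation")
        total = 0
        for tier_key, tier_val in cache_creation.items():
            if isinstance(tier_val, int):
                result[f"cache_creation_{tier_key}"] = tier_val
                total += tier_val
        if total > 0:
            # Do not overwrite an aggregate the API already supplied.
            result.setdefault("cache_creation_input_tokens", total)
    # Stand-in for the final int-only filter.
    return {k: v for k, v in result.items() if isinstance(v, int)}


# Illustrative payload: the tier counts are made up for the example.
flattened = flatten_cache_creation(
    {
        "input_tokens": 9454,
        "output_tokens": 380,
        "cache_read_input_tokens": 0,
        "cache_creation": {
            "ephemeral_1h_input_tokens": 100,
            "ephemeral_5m_input_tokens": 20,
        },
    }
)
print(flattened)
```

With these inputs the result contains only int values: the two per-tier keys prefixed with cache_creation_, plus a cache_creation_input_tokens total of 120. When every tier is 0, as in the repro payload, the per-tier keys are still emitted but the aggregate is omitted because of the total > 0 guard.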
Environment
langchain-anthropic ≥ 0.3.x (models with tiered prompt caching, e.g. claude-haiku-4-5-20251001, claude-sonnet-4-6)
langfuse 2.60.10 exhibits the crash; langfuse 4.7.1 HEAD silently discards the data