fix(langchain): handle Anthropic cache_creation nested-dict in usage — causes silent generation drop on v2.x, data loss on v4.x #1697

@ASAD-BE18

Summary

When using langchain-anthropic with extended prompt caching enabled, Anthropic's API returns a cache_creation field inside the LLM output usage dict. This field is a nested dict (not an int), which breaks _parse_usage_model in two different ways depending on the Langfuse SDK version.

Repro

Trigger any LLM call with a ChatAnthropic model that has prompt caching active. The raw usage from response.llm_output["usage"] contains:

{
    "input_tokens": 9454,
    "output_tokens": 380,
    "cache_read_input_tokens": 0,
    "cache_creation": {
        "ephemeral_1h_input_tokens": 0,
        "ephemeral_5m_input_tokens": 0,
    },
}

Impact by version

v2.x (≤ 2.60.10) — silent generation drop (critical)

The final filter in _parse_usage_model was:

usage_model = {k: v for k, v in usage_model.items() if v is not None and not isinstance(v, str)}

isinstance({"ephemeral_1h_input_tokens": 0, ...}, str) is False, so the nested dict passes through into usage_details. Langfuse's UpdateGenerationBody.usageDetails is typed as Union[Dict[str, int], OpenAiCompletionUsageSchema, OpenAiResponseUsageSchema]. The Pydantic validation rejects the nested dict and raises:

ValidationError: 7 validation errors for UpdateGenerationBody
usageDetails -> cache_creation
  value is not a valid integer (type=type_error.integer)

This error is silently caught inside the Langfuse ingestion queue, which drops the entire generation end() event. As a result, every affected generation shows endTime=null and input=0 / output=0.

v4.x (current HEAD) — data silently discarded

The final filter was tightened to isinstance(v, int), which correctly prevents the crash. However, the cache_creation dict is silently dropped, so the cache-tier creation token counts are not stored at all.
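
The difference between the two filters can be reproduced without Langfuse installed. The sketch below applies both filter expressions quoted above to the payload from the repro; filter_v2 and filter_v4 are hypothetical names used only for illustration, not functions in the SDK.

```python
# Usage payload copied from the repro section.
raw_usage = {
    "input_tokens": 9454,
    "output_tokens": 380,
    "cache_read_input_tokens": 0,
    "cache_creation": {
        "ephemeral_1h_input_tokens": 0,
        "ephemeral_5m_input_tokens": 0,
    },
}


def filter_v2(usage):
    """v2.x-style final filter: drops only None and str values."""
    return {k: v for k, v in usage.items() if v is not None and not isinstance(v, str)}


def filter_v4(usage):
    """v4.x-style final filter: keeps only int values."""
    return {k: v for k, v in usage.items() if isinstance(v, int)}


v2_result = filter_v2(raw_usage)
v4_result = filter_v4(raw_usage)

# v2.x: the nested dict survives and later fails the Dict[str, int] validation.
print(type(v2_result["cache_creation"]).__name__)

# v4.x: the key is gone, so the per-tier counts are lost without any error.
print("cache_creation" in v4_result)
```

Neither filter raises on its own, which is why the problem only surfaces downstream (v2.x) or not at all (v4.x).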

Expected behaviour

cache_creation should be handled the same way as input_token_details: flatten the per-tier values into individually named keys (e.g. cache_creation_ephemeral_1h_input_tokens, cache_creation_ephemeral_5m_input_tokens), and expose an aggregated cache_creation_input_tokens total for cost calculation.

Suggested fix

In _parse_usage_model, before the final isinstance(v, int) filter, add:

# Anthropic extended prompt caching: cache_creation is a dict keyed by cache tier
if "cache_creation" in usage_model and isinstance(usage_model["cache_creation"], dict):
    cache_creation = usage_model.pop("cache_creation")
    total = 0
    for tier_key, tier_val in cache_creation.items():
        if isinstance(tier_val, int):
            usage_model[f"cache_creation_{tier_key}"] = tier_val
            total += tier_val
    if total > 0:
        # Aggregate key mirrors the legacy cache_creation_input_tokens field
        usage_model.setdefault("cache_creation_input_tokens", total)

A unit test reproducing both the v2 crash case and the proper flattening is included in the linked PR.
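
For readers who want to check the flattening behaviour in isolation, here is a minimal standalone sketch of the suggested fix followed by the v4.x int-only filter. It is not the test from the PR: flatten_cache_creation is a hypothetical helper name, and the non-zero token counts are invented sample values chosen so the aggregate is visible.

```python
def flatten_cache_creation(usage_model):
    """Hypothetical helper: apply the suggested flattening, then keep only ints."""
    usage = dict(usage_model)  # work on a copy so the caller's dict is untouched
    cache_creation = usage.get("cache_creation")
    if isinstance(cache_creation, dict):
        usage.pop("cache_creation")
        total = 0
        for tier_key, tier_val in cache_creation.items():
            if isinstance(tier_val, int):
                usage[f"cache_creation_{tier_key}"] = tier_val
                total += tier_val
        if total > 0:
            # Do not overwrite an aggregate the provider already supplied.
            usage.setdefault("cache_creation_input_tokens", total)
    return {k: v for k, v in usage.items() if isinstance(v, int)}


# Invented sample values (assumption): the repro payload has zeros in both tiers.
sample = {
    "input_tokens": 9454,
    "output_tokens": 380,
    "cache_read_input_tokens": 0,
    "cache_creation": {
        "ephemeral_1h_input_tokens": 100,
        "ephemeral_5m_input_tokens": 250,
    },
}

flat = flatten_cache_creation(sample)
for key in sorted(flat):
    print(key, flat[key])
```

Every value in the result is an int, so it should satisfy the Dict[str, int] shape that usageDetails expects, and the per-tier counts are preserved alongside the aggregate.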

Environment

  • langchain-anthropic ≥ 0.3.x (models with tiered prompt caching, e.g. claude-haiku-4-5-20251001, claude-sonnet-4-6)
  • langfuse 2.60.10 exhibits the crash; langfuse 4.7.1 HEAD silently discards the data
