[mirror] fix(block_cost_config): audit + correct stale LLM/block rates + migrate generic ReplicateModelBlock to COST_USD #5

Open
yashwant86 wants to merge 8 commits into mm-base-12912 from mm-pr-12912

Conversation


@yashwant86 yashwant86 commented Apr 26, 2026

Mirror of upstream Significant-Gravitas#12912 for benchmark. Do not merge.


Summary by MergeMonkey

  • Docs Updates:
    • Updated docstrings and comments for Replicate block cost tracking and LLM/block rate audits.
  • Fresh Additions:
    • Replicate block now emits provider_cost via predict_time metrics for accurate per-second billing instead of flat RUN charge.
  • Fixes & Patches:
    • Corrected stale LLM token rates (Grok, DeepSeek, Mistral, Kimi, Perplexity) to match current pricing.
    • Fixed Unreal Speech block to bill per-character USD instead of flat 5 credits.
    • Adjusted FAL video generator rate from 3 to 15 credits/second to match actual pricing.
    • Normalized COST_USD block margins (Jina, ZeroBounce, Unreal) to consistent 150 cr/$ baseline.
  • Maintenance:
    • Updated .secrets.baseline line numbers and timestamp.

majdyz added 8 commits April 24, 2026 22:35
PR Significant-Gravitas#12909 refresh set GPT-5 to 94/1500 cr/1M, which corresponds to a
$0.625/$5 provider rate — that's OpenAI's Batch API tier (50% off the
Sync rate). Most block calls go through the Sync API; the correct Standard
rate is $1.25/$10 per 1M, which at our 1.5x margin = 188/1500 cr/1M.

This was under-billing every GPT-5 call by 2x on input.

Source: https://openai.com/api/pricing — GPT-5 Standard pricing.
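The margin arithmetic above generalizes across the TOKEN_COST corrections in this PR: provider $/1M-token rate times a 150 cr/$ multiplier (100 cr/$ at cost with the 1.5x margin folded in). A minimal sketch; the half-up rounding convention is inferred from the corrected rates, not stated anywhere in the source:

```python
import math

CREDITS_PER_USD = 150  # assumed baseline: 100 cr/$ at cost x 1.5 margin

def credits_per_million(usd_per_million_tokens: float) -> int:
    """Convert a provider $/1M-token rate into platform credits per 1M tokens.

    Half-up rounding is an inference from the corrected rates in this PR
    (e.g. $1.25 -> 188, $0.15 -> 23), not confirmed by the source.
    """
    return math.floor(usd_per_million_tokens * CREDITS_PER_USD + 0.5)

# GPT-5 Standard (Sync): $1.25 in / $10.00 out per 1M -> 188 / 1500 cr
gpt5 = (credits_per_million(1.25), credits_per_million(10.0))

# DeepSeek v4-flash: $0.14 in / $0.28 out per 1M -> 21 / 42 cr
deepseek = (credits_per_million(0.14), credits_per_million(0.28))
```

The same formula reproduces the Mistral ($0.50/$1.50 -> 75/225) and Kimi K2-0905 ($0.60/$2.50 -> 90/375) corrections listed below.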
Beyond the GPT-5 Standard-vs-Batch fix, verified all TOKEN_COST entries
against each provider's current pricing page. Additional corrections:

- DEEPSEEK_CHAT: 42/63 -> 21/42 (provider unified deepseek-chat +
  deepseek-reasoner to deepseek-v4-flash $0.14/$0.28 in Sept 2025)
- DEEPSEEK_R1_0528: 82/329 -> 21/42 (same v4-flash routing)
- MISTRAL_LARGE_3: 300/900 -> 75/225 (Mistral dropped to $0.50/$1.50)
- MISTRAL_NEMO: 3/6 -> 23/23 (was severely under-billing; provider is
  $0.15 flat for both input and output)
- KIMI_K2_0905: 82/330 -> 90/375 (matches current K2-0905 $0.60/$2.50)
- META_LLAMA_4_MAVERICK: 30/90 -> 75/116 (Groq prices $0.50/$0.77;
  note Groq deprecated this 2026-02-20 — consider retiring enum)

Provider sources: openai.com/api/pricing, api-docs.deepseek.com,
mistral.ai/pricing, platform.kimi.ai/docs/pricing, groq.com/pricing.
Cross-verified via agent-browser for JS-rendered docs.x.ai + DeepSeek.

All 40 cost-pipeline unit tests pass.
Full audit against provider pricing pages uncovered 10 more stale
entries beyond the LLM token rates:

Under-billing (was losing money):
- AIVideoGeneratorBlock (FAL): SECOND 3 -> 15 cr/s
  (provider is $0.05-$0.30/s depending on tier; 3 cr only covered
  $0.02/s models)
- CreateTalkingAvatarVideoBlock (D-ID): RUN 15 -> 100 cr
  (D-ID charges $5.90/min; 15 cr was ~10x under for a median 10-sec
  clip at $0.98 real cost)
- Nano Banana Pro / Nano Banana 2 (3 blocks each): RUN 14 -> 21 cr
  (provider $0.14/image, 14 cr was under cost-of-goods)

Over-billing (normalizing margin to 1.5x baseline):
- IdeogramModelBlock default: RUN 16 -> 12 cr
- IdeogramModelBlock V_3: RUN 18 -> 14 cr
- AIImageEditorBlock FLUX_KONTEXT_MAX: RUN 20 -> 12 cr
- ValidateEmailsBlock (ZeroBounce): COST_USD 250 -> 150 cr/$
- SearchTheWebBlock (Jina): COST_USD 100 -> 150 cr/$
- GetLinkedinProfilePictureBlock: RUN 3 -> 1 cr

Tests updated to match new FAL 15 cr/s rate (was 3 cr/s in 2 tests).

Sources: replicate.com, fal.ai, d-id.com, ideogram.ai, zerobounce.net,
jina.ai. Cross-verified via agent-browser for JS-rendered docs.x.ai
(Grok prices already correct at 300/900 for Grok 4.20 @ $2/$6).
…k legacy doc

- KIMI_K2_5: 90/450 -> 66/300 (OpenRouter pass-through $0.44/$2)
- KIMI_K2_6: 143/600 -> 112/698 (OpenRouter pass-through $0.7448/$4.655)
- UnrealTextToSpeechBlock: RUN 5 cr -> COST_USD 150 cr/$. Block now
  computes USD from len(text) * $0.000016 (Unreal Speech $16/1M chars)
  and emits cost_usd via merge_stats. Long narrations no longer under-bill.
- Grok legacy (grok-3, grok-4-0709, grok-4-fast, grok-code-fast-1):
  rates were already correct at their launch pricing; added inline
  comment noting the docs.x.ai page no longer lists them publicly but
  the API + historical rates remain valid.
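The per-character Unreal Speech computation described above can be sketched as follows. The $16/1M-character rate and the 150 cr/$ conversion are from the commit; the function name and the ceil() billing step are assumptions:

```python
import math

UNREAL_USD_PER_1M_CHARS = 16  # Unreal Speech list price: $16 per 1M characters
CREDITS_PER_USD = 150         # platform COST_USD conversion (margin baked in)

def unreal_speech_cost(text: str) -> tuple[float, int]:
    """Return (provider_cost_usd, credits_billed) for one TTS request.

    Mirrors the migration described above; hypothetical helper, not the
    actual block code.
    """
    provider_cost = len(text) * UNREAL_USD_PER_1M_CHARS / 1_000_000
    credits = math.ceil(provider_cost * CREDITS_PER_USD)
    return provider_cost, credits

# A 12,345-char narration: ~$0.198 provider cost -> 30 cr,
# vs. the old flat 5 cr (a ~6x under-bill at this length).
cost, credits = unreal_speech_cost("x" * 12_345)
```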
…enRouter floor

ReplicateModelBlock takes ANY model ref as input. Flat 10 cr/run
was 10-500x under-billing long video/LLM runs ($1-$50+) and 20x
over-billing tiny SDXL. Block now uses predictions.async_create +
async_wait to read prediction.metrics.predict_time after completion,
emits (predict_time * $0.0014/s) as provider_cost, billed at
COST_USD 150 cr/$. $0.0014/s is the Nvidia L40S mid-tier rate where
most popular public models run.

Also: MISTRAL_LARGE_3 and MISTRAL_NEMO in TOKEN_COST are the safety
floor for OpenRouter-routed calls (ModelMetadata.provider =
'open_router'). Rates now match OpenRouter's pass-through pricing
instead of Mistral-direct's /v1/chat rates, which we never call.
Addresses Sentry bug prediction on MISTRAL_NEMO being 'higher than
actual cost from OpenRouter'.

- ReplicateModelBlock: RUN 10 -> COST_USD 150 cr/$ (dynamic billing)
- ReplicateFluxAdvancedModelBlock: unchanged (bounded to Flux models
  $0.04-$0.08, flat 10 cr stays within 1.25-2.5x margin)
- MISTRAL_LARGE_3: 75/225 -> 300/900 (OpenRouter $2/$6)
- MISTRAL_NEMO: 23/23 -> 5/5 (OpenRouter $0.035/$0.035)
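The post-completion handling described above (status check, then metered billing) can be sketched with a stand-in prediction object. The $0.0014/s rate, the status guard, and the skip-billing-on-empty-metrics behavior are from the commits; the dataclass and helper are hypothetical, not the actual block code:

```python
from dataclasses import dataclass
from typing import Any, Optional

_REPLICATE_USD_PER_SEC = 0.0014  # Nvidia L40S mid-tier rate (from the commit)

@dataclass
class FakePrediction:
    """Minimal stand-in for replicate's Prediction object (fields per the PR)."""
    status: str
    output: Any = None
    metrics: Optional[dict] = None

def settle(prediction: FakePrediction) -> tuple[Any, Optional[float]]:
    """Post-async_wait handling sketched from the commit messages.

    async_wait() returns normally even on a failed prediction, so status
    must be checked explicitly before reading output or billing. Returns
    (output, provider_cost_usd), with None cost when billing is skipped
    because metrics or predict_time are missing or zero.
    """
    if prediction.status in ("failed", "canceled"):
        raise RuntimeError(f"Replicate prediction {prediction.status}")
    predict_time = (prediction.metrics or {}).get("predict_time") or 0
    provider_cost = predict_time * _REPLICATE_USD_PER_SEC if predict_time else None
    return prediction.output, provider_cost
```

A 30-second run yields 30 * $0.0014 = $0.042 provider cost, which the COST_USD resolver then bills at 150 cr/$; a failed run raises before any billing instead of charging partial compute.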
…ling path

Adds 7 unit tests for the refactored run_model:
- Uses version= keyword when model_ref has ':' (pinned version)
- Uses model= keyword otherwise (unpinned 'owner/name')
- Emits provider_cost = predict_time * $0.0014/s via merge_stats
- async_wait is awaited before reading metrics
- Skips merge_stats when metrics missing OR predict_time is 0
  (avoids silent wallet-free leak if SDK quirks return empty metrics)
- Sanity-checks _REPLICATE_USD_PER_SEC is in the Replicate hardware
  tier range ($0.0005-$0.002/s)
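The first two tests above can be sketched against a hypothetical helper that picks the keyword argument for predictions.async_create. The ':'-means-pinned rule is from the test list; the helper name and the exact value passed for version= are assumptions:

```python
def create_kwargs(model_ref: str, inputs: dict) -> dict:
    """Pick version= for pinned 'owner/name:hash' refs, model= otherwise.

    Splitting the hash out of the ref is an assumption about the real
    implementation, not confirmed by the source.
    """
    if ":" in model_ref:
        return {"version": model_ref.split(":", 1)[1], "input": inputs}
    return {"model": model_ref, "input": inputs}

# Pinned ref -> version= keyword
pinned = create_kwargs("stability-ai/sdxl:abc123", {"prompt": "cat"})

# Unpinned ref -> model= keyword
unpinned = create_kwargs("stability-ai/sdxl", {"prompt": "cat"})
```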

SDK surface confirmed against installed replicate==* package:
- Predictions.async_create(model=, version=, input=) — matches
- Prediction.metrics is Optional[Dict] — matches
- Prediction.async_wait exists — matches
…lling

async_wait() returns normally regardless of prediction terminal status
— only async_run raises ModelError on 'failed'. Without an explicit
status check we'd bill partial compute time on a failed run, yield
empty output via extract_result(None), and hardcode 'status: succeeded'
hiding the failure.

Check prediction.status after async_wait and raise before merge_stats
so failures surface as exceptions (caught by run() and re-raised as
BlockExecutionError). Also guard against output=None on succeeded
predictions (type-narrowing for extract_result).

Addresses CodeRabbit critical on Significant-Gravitas#12912.
…ates

Three tests on CI were still asserting old values:
- UnrealTextToSpeech tests assumed provider_cost=len(text) with type
  'characters'. Updated to assert provider_cost=len(text)*$0.000016
  with type 'cost_usd' per the text_to_speech_block migration.
- ZeroBounce ValidateEmailsBlock cost_amount test assumed 250, now 150
  after the margin alignment in this PR.

bot-mergemonkey Bot commented Apr 26, 2026

Risk Assessment: CRITICAL · ~45 min review

Focus areas: Replicate block status/output validation order · LLM rate accuracy (DeepSeek, Kimi, Mistral, Grok) · COST_USD margin normalization (150 cr/$ baseline justification) · FAL video 5× rate increase verification

Assessment: Refactors billing logic for Replicate block and audits/corrects LLM+block rates across 10+ models.

Walkthrough

User calls ReplicateModelBlock.run_model() with a model reference and inputs. The block now parses the reference to determine if it's version-pinned (contains ':') and calls predictions.async_create with either version= or model= keyword. After awaiting async_wait(), it checks prediction.status for 'failed' or 'canceled' and raises if found. If metrics.predict_time exists and is non-zero, it emits provider_cost = predict_time * $0.0014/sec as cost_usd via merge_stats. Finally, it extracts and returns the output. The COST_USD resolver then bills ceil(provider_cost * 150) credits.
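The final billing step in the walkthrough, ceil(provider_cost * 150), can be written out as a one-line sketch (function name hypothetical):

```python
import math

def cost_usd_to_credits(provider_cost_usd: float, credits_per_usd: int = 150) -> int:
    """COST_USD resolver step from the walkthrough: ceil(cost_usd * 150 cr/$)."""
    return math.ceil(provider_cost_usd * credits_per_usd)

# A 30 s prediction at $0.0014/s -> $0.042 -> 7 credits
replicate_credits = cost_usd_to_credits(30 * 0.0014)
```

Note the asymmetry with the rate table: per-call USD costs round up (the platform never under-bills a metered call), while the static cr/1M token rates appear to round half-up.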

Changes

  • Replicate Block Cost Tracking Refactor (autogpt_platform/backend/backend/blocks/replicate/replicate_block.py, replicate_block_cost_test.py):
    Migrated ReplicateModelBlock from flat RUN billing to dynamic COST_USD via predictions.async_create + predict_time metrics. Emits provider_cost = predict_time * $0.0014/sec; handles version-pinned vs unpinned refs, status validation, and gracefully skips billing when metrics unavailable.
  • Unreal Speech Block Cost Migration (autogpt_platform/backend/backend/blocks/text_to_speech_block.py):
    Changed Unreal Speech billing from flat 5 credits to per-character USD ($0.000016/char). Block now emits provider_cost = len(text) * 0.000016 with cost_usd type for proportional billing.
  • Block Cost Configuration Audit (autogpt_platform/backend/backend/data/block_cost_config.py):
    Audited and corrected stale LLM rates (Grok, DeepSeek, Mistral, Kimi, Perplexity). Normalized COST_USD block margins to 150 cr/$ baseline (Jina 100→150, ZeroBounce 250→150, Unreal 5→150). Updated FAL video rate from 3 to 15 credits/second. Added detailed pricing comments for transparency.
  • Test Updates for Cost Changes (autogpt_platform/backend/backend/blocks/block_cost_tracking_test.py, autogpt_platform/backend/backend/data/block_cost_config_test.py, autogpt_platform/backend/backend/executor/block_usage_cost_test.py, autogpt_platform/backend/backend/copilot/tools/helpers_test.py):
    Updated test assertions to reflect new cost models: Unreal Speech character-based USD billing, FAL video 15 cr/s rate, ZeroBounce 150 cr/$ margin, and Replicate provider_cost emissions.
  • Secrets Baseline Maintenance (.secrets.baseline):
    Updated line number reference and timestamp due to code additions in replicate_block.py.

Sequence Diagram

sequenceDiagram
    participant User
    participant ReplicateBlock
    participant ReplicateClient
    participant Prediction
    participant StatsResolver
    User->>ReplicateBlock: run_model(model_ref, inputs, api_key)
    ReplicateBlock->>ReplicateBlock: Parse model_ref for ':'
    alt version-pinned
        ReplicateBlock->>ReplicateClient: predictions.async_create(version=...)
    else unpinned
        ReplicateBlock->>ReplicateClient: predictions.async_create(model=...)
    end
    ReplicateClient-->>Prediction: return Prediction object
    ReplicateBlock->>Prediction: async_wait()
    Prediction-->>ReplicateBlock: metrics populated
    ReplicateBlock->>ReplicateBlock: Check status
    alt status == 'failed' or 'canceled'
        ReplicateBlock-->>User: raise RuntimeError
    else status == 'succeeded'
        alt metrics.predict_time exists and > 0
            ReplicateBlock->>ReplicateBlock: merge_stats(provider_cost=predict_time*0.0014, cost_usd)
            ReplicateBlock->>StatsResolver: emit NodeExecutionStats
        end
        ReplicateBlock->>ReplicateBlock: extract_result(prediction.output)
        ReplicateBlock-->>User: return result
    end

Dig Deeper With Commands

  • /review <file-path> <function-optional>
  • /chat <file-path> "<question>"
  • /roast <file-path>

Runs only when explicitly triggered.
