feat(bench): add BFCL function-calling adapter #412

## Problem

The external-benchmark roadmap now selects BFCL as the cheapest public size-axis benchmark for the policy-vs-free-form contrast, but `@tangle-network/agent-bench` does not yet expose a BFCL adapter.

## Desired adapter

Add a fail-loud `bfcl` adapter under `bench/src/benchmarks/` that delegates to the official Berkeley Function Calling Leaderboard assets and evaluator wherever possible. The first useful scope is the deterministic function-call categories, which suit weak-vs-gold calibration and small-vs-large model comparisons.
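A minimal sketch of the fail-loud preflight, assuming the adapter is handed the BFCL project root explicitly. `BfclMode`, `preflightBfcl`, `BfclPreflightError`, and the `requiredEntries` list are hypothetical names, not existing agent-bench or BFCL APIs, and the sketch does not assume any particular layout inside the Gorilla/BFCL checkout.

```typescript
import { existsSync, statSync } from "node:fs";
import { resolve } from "node:path";

// Hypothetical mode flag: "fixture" exercises plumbing only, "official" needs real BFCL assets.
export type BfclMode = "fixture" | "official";

export class BfclPreflightError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BfclPreflightError";
  }
}

/**
 * Fail-loud preflight (sketch). In official mode the caller must point at a real
 * Gorilla/BFCL checkout or bfcl-eval project root; there is no silent fallback to fixtures.
 * `requiredEntries` is a hypothetical list of paths the adapter expects under the root.
 * Returns the resolved root in official mode, or null in fixture mode.
 */
export function preflightBfcl(
  mode: BfclMode,
  projectRoot: string | undefined,
  requiredEntries: readonly string[],
): string | null {
  if (mode === "fixture") return null; // fixture mode never touches official assets

  if (!projectRoot) {
    throw new BfclPreflightError(
      "official BFCL mode requires a Gorilla/BFCL checkout or bfcl-eval project root",
    );
  }

  const root = resolve(projectRoot);
  if (!existsSync(root) || !statSync(root).isDirectory()) {
    throw new BfclPreflightError(`BFCL project root not found: ${root}`);
  }

  // Report every missing entry at once so the failure message is actionable.
  const missing = requiredEntries.filter((entry) => !existsSync(resolve(root, entry)));
  if (missing.length > 0) {
    throw new BfclPreflightError(`BFCL project root ${root} is missing: ${missing.join(", ")}`);
  }
  return root;
}
```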

## Constraints

- Do not fabricate BFCL scores.
- Fixture mode may test adapter plumbing, but official mode must require the real Gorilla/BFCL checkout or `bfcl-eval` project root.
- Keep BFCL V4/V3 naming current: V4 is the latest official line, while V3 multi-turn/missing-function categories are the immediate research target if they remain the best fit.
- Expose through the existing `ADAPTERS` map; no runtime-loop changes.
- Add preflight/load/judge tests that fail loud without official assets and pass on fixtures.
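The `ADAPTERS` wiring and the fixture/official split from the constraints above can be sketched as follows. The `BenchmarkAdapter` interface, the `BfclTask` shape, and the `get_weather` fixture are hypothetical placeholders, not the real agent-bench types or BFCL data. Official-mode `load` and `judge` deliberately throw instead of returning anything score-like, since official results must come from the upstream BFCL evaluator.

```typescript
// Hypothetical shapes: the real ADAPTERS map and adapter interface in
// @tangle-network/agent-bench may differ; this only illustrates the wiring.
export type Mode = "fixture" | "official";

export interface BfclTask {
  id: string;
  category: string; // placeholder; not a real BFCL category name
  expectedCall: { name: string; args: Record<string, unknown> };
}

export interface JudgeResult {
  taskId: string;
  passed: boolean;
  source: "fixture"; // fixture results are plumbing checks, never BFCL scores
}

export interface BenchmarkAdapter<T> {
  name: string;
  load(mode: Mode): T[];
  judge(mode: Mode, task: T, output: unknown): JudgeResult;
}

// Illustrative fixture only; not taken from the BFCL dataset.
const FIXTURE_TASKS: BfclTask[] = [
  { id: "fixture-0", category: "fixture", expectedCall: { name: "get_weather", args: { city: "Berkeley" } } },
];

// Key-order-insensitive exact match of a single flat function call.
function sameCall(expected: BfclTask["expectedCall"], output: unknown): boolean {
  if (typeof output !== "object" || output === null) return false;
  const call = output as { name?: unknown; args?: unknown };
  if (call.name !== expected.name) return false;
  if (typeof call.args !== "object" || call.args === null) return false;
  const args = call.args as Record<string, unknown>;
  const keys = Object.keys(expected.args);
  if (Object.keys(args).length !== keys.length) return false;
  return keys.every((k) => JSON.stringify(args[k]) === JSON.stringify(expected.args[k]));
}

export const bfclAdapter: BenchmarkAdapter<BfclTask> = {
  name: "bfcl",
  load(mode) {
    if (mode === "official") {
      throw new Error("bfcl: official load must read real BFCL assets (not covered by this sketch)");
    }
    return FIXTURE_TASKS;
  },
  judge(mode, task, output) {
    if (mode === "official") {
      // Never fabricate a score: official judging must delegate to the BFCL evaluator.
      throw new Error("bfcl: official judging must delegate to the BFCL evaluator");
    }
    return { taskId: task.id, passed: sameCall(task.expectedCall, output), source: "fixture" };
  },
};

// Registration only; no runtime-loop changes are needed to expose the adapter.
export const ADAPTERS: Record<string, BenchmarkAdapter<unknown>> = {
  bfcl: bfclAdapter,
};
```

The fixture judge uses a simple exact match so plumbing tests get a deterministic pass/fail; it is not a substitute for BFCL's own checkers.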

## Research use

This enables the AppWorld + BFCL easy-subset-first tranche: AppWorld provides external validity through stateful code/API orchestration, while BFCL covers the size axis for function calling and the missing-function/free-form contrast.
