Releases: Azure/agentops

v0.5.3

22 Jun 14:52

v0.5.2

19 Jun 20:00
0d86a6a

What's new

Per-evaluator input_mapping for grey-box RAG evaluation.

Evaluator overrides in agentops.yaml now accept an optional input_mapping map that is merged onto the preset's default inputs. You only list the keys you want to change, and the rest of the preset is preserved.

This is the piece that connects the multi-field HTTP response capture shipped in v0.5.0 (response_fields + $response.<name>) to the RAG evaluators. You can now point a RAG evaluator at the live retrieved context captured from a JSON HTTP target, instead of static dataset context:

response_fields:
  context: context
evaluators:
  - name: GroundednessEvaluator
    input_mapping:
      context: $response.context
  - name: RetrievalEvaluator
    input_mapping:
      context: $response.context

The mapping applies to both explicitly listed overrides and auto-selected presets, and never mutates the shared evaluator catalog. A bare evaluator-name string (- GroundednessEvaluator) is still accepted as shorthand for { name: GroundednessEvaluator }, so existing configs keep working unchanged.
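
The merge behavior described above can be pictured with a minimal sketch. It is not the AgentOps implementation: the catalog contents, the default input names, and the `$prediction` token are hypothetical, and only the `$response.<name>` and `$row.<name>` tokens come from these notes.

```python
from copy import deepcopy

# Hypothetical preset catalog. The real AgentOps catalog and its default
# input names are not shown in these notes.
CATALOG = {
    "GroundednessEvaluator": {
        "input_mapping": {
            "query": "$row.input",
            "response": "$prediction",
            "context": "$row.context",
        }
    }
}


def resolve_evaluator(entry):
    """Return a per-run evaluator config with the override merged in."""
    # A bare string is shorthand for {"name": <string>}.
    if isinstance(entry, str):
        entry = {"name": entry}
    # Copy first so the shared catalog is never mutated.
    preset = deepcopy(CATALOG[entry["name"]])
    # Only the listed keys change; every other preset input is kept.
    preset["input_mapping"].update(entry.get("input_mapping", {}))
    return {"name": entry["name"], **preset}


merged = resolve_evaluator(
    {"name": "GroundednessEvaluator", "input_mapping": {"context": "$response.context"}}
)
print(merged["input_mapping"])
print(CATALOG["GroundednessEvaluator"]["input_mapping"]["context"])
```

The second print still shows the preset's original context source, which is the point of copying before merging.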

Install

pip install agentops-accelerator==0.5.2

Validation

  • ruff check clean.
  • Full unit suite: 1030 passed, 1 skipped (release build gate).

Full Changelog: v0.5.1...v0.5.2

v0.5.1

19 Jun 18:45

What's new

Rendered gate results in the GitHub Actions job summary. When AgentOps runs inside GitHub Actions, the gates now write their results straight to the workflow run summary, so reviewers read the report on the run page without downloading artifacts.

  • agentops eval run appends the full rendered report.md to the run summary.
  • agentops assert run and agentops redteam run append a concise pass/fail summary (suite, cases, pass rate, plus per-dimension and per-risk-category breakdowns).
  • Writes are best-effort and a no-op outside GitHub Actions, so local runs are unaffected.
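
For readers unfamiliar with job summaries: GitHub Actions exposes a per-step file through the GITHUB_STEP_SUMMARY environment variable, and Markdown appended to it shows up on the run page. The sketch below illustrates the best-effort, no-op-outside-Actions behavior described above. The function name is hypothetical and this is not the AgentOps code.

```python
import os


def append_to_job_summary(markdown: str) -> bool:
    """Append Markdown to the GitHub Actions job summary, if there is one."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False  # not running in GitHub Actions: do nothing
    try:
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(markdown.rstrip("\n") + "\n")
    except OSError:
        return False  # best-effort: a failed write must not fail the gate
    return True


# Returns False on a local machine, where the variable is not set.
print(append_to_job_summary("## AgentOps gate\n\nAll cases passed."))
```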

Generated workflows use Node24-ready action versions. The prompt-agent and watchdog workflow templates now pin actions/download-artifact@v7 instead of the Node20 @v4, so freshly generated pipelines no longer emit the "Node.js 20 actions are deprecated" warning. A regression guard checks every workflow template against the known Node20 action majors.
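
A regression guard of this kind can be as small as a scan of each template for pinned action majors. The sketch below is an assumption about the shape of such a check, not the test shipped in the repository. The deny list is hypothetical and contains only the one action these notes name.

```python
import re

# Hypothetical deny list: action -> major versions that still run on Node20.
# Only actions/download-artifact@v4 is named in these notes.
NODE20_MAJORS = {"actions/download-artifact": {"v4"}}

# Matches "uses: owner/repo@vN" and captures the action and its major.
USES = re.compile(r"uses:\s*([\w.-]+/[\w.-]+)@(v\d+)")


def find_node20_actions(workflow_text):
    """Return every pinned action in the text that is on the deny list."""
    hits = []
    for action, major in USES.findall(workflow_text):
        if major in NODE20_MAJORS.get(action, set()):
            hits.append(f"{action}@{major}")
    return hits


old_template = "steps:\n  - uses: actions/download-artifact@v4\n"
new_template = "steps:\n  - uses: actions/download-artifact@v7\n"
print(find_node20_actions(old_template))
print(find_node20_actions(new_template))
```

A test suite would run this over every generated template and assert that the result is empty.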

Upgrade notes

  • The rendered-report feature is automatic in CI. No config change is required. Pipelines that install AgentOps from @main or upgrade to v0.5.1 pick it up on the next run.
  • If you generated workflows with an earlier version, re-run agentops workflow generate (or bump the action majors by hand) to clear any remaining Node20 deprecation warnings.

Links

  • Render gate results in Actions summary + Node24 workflow templates: #331
  • Grey-box multi-field JSON capture (shipped in v0.5.0): #330

Validation

  • Release build job: ruff + full unit test suite passed.
  • Published to PyPI and verified on TestPyPI.

v0.5.0

19 Jun 17:25
112dfba

What changed

This release lets AgentOps score the retrieval an agent actually used at eval time, instead of only the static context stored in your dataset. It is the engine piece that unlocks RAG evaluators (Groundedness, Retrieval, Document Retrieval) against a live HTTP orchestrator.

Added

  • Grey-box retrieval capture for HTTP JSON targets. An HTTP target can now capture extra named fields from a JSON response through a response_fields map (name -> dot-path). Captured values are exposed to evaluator input_mapping as $response.<name> (for example $response.context, $response.retrieved_documents), and dataset columns can be referenced with $row.<name> (for example $row.qrels).
  • This lets RAG evaluators such as Groundedness, Retrieval, and Document Retrieval score the retrieval that was actually used during the run, rather than static dataset context.
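
To make the capture and token resolution concrete, here is a minimal sketch under stated assumptions. It is not the AgentOps engine: the function names, the response body, and the treatment of numeric path segments as list indexes are all illustrative.

```python
def get_path(data, dotted):
    """Walk a dot-path such as 'debug.docs.0' through dicts and lists."""
    current = data
    for part in dotted.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # assumption: digits index lists
        else:
            current = current[part]
    return current


def capture_fields(body, response_fields):
    """Apply a response_fields map (name -> dot-path) to a JSON body."""
    return {name: get_path(body, path) for name, path in response_fields.items()}


def resolve_token(token, captured, row):
    """Resolve $response.<name> and $row.<name> tokens."""
    if token.startswith("$response."):
        return captured[token[len("$response."):]]
    if token.startswith("$row."):
        return row[token[len("$row."):]]
    return token  # anything else is treated as a literal in this sketch


# Hypothetical orchestrator response and dataset row.
body = {"answer": "Paris", "debug": {"context": ["doc A", "doc B"]}}
row = {"input": "Capital of France?", "qrels": {"doc A": 1}}

captured = capture_fields(body, {"context": "debug.context"})
print(resolve_token("$response.context", captured, row))
print(resolve_token("$row.qrels", captured, row))
```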

Compatibility

  • Fully backward compatible. The primary prediction (response_field) and existing single-field behavior are unchanged when response_fields is not set. Multi-field capture applies to JSON-mode HTTP targets only.

Docs

  • HTTP target reference now documents response_fields with a grey-box example, and the bundle input-mapping reference documents the $response.<name> and $row.<name> tokens.

Install

pip install agentops-accelerator==0.5.0

Validation

  • Release pipeline build gate (ruff + full unit suite + package) passed; published to TestPyPI and PyPI.

v0.4.5

19 Jun 13:03
bbaa7ef

What this release is about

This release lets AgentOps run its safety and quality gates (ASSERT and Red Team) against a live HTTP orchestrator endpoint, not only model or deployment targets. With this, an HTTP agent such as the GPT-RAG orchestrator can be governed end to end in CI: evaluation, ASSERT policy checks, and Red Team, with the generated GitHub Actions and Azure DevOps pipelines running those gates automatically.

It also fixes a reasoning-model judge error that could fail the eval gate in CI.

Merged via #326 (feature), #327 (release lint hotfix), and #328 (release test hotfix).

Added

  • Governance gates for HTTP agents (ASSERT and Red Team). agentops assert run and agentops redteam run now work against a live HTTP orchestrator endpoint. Red Team wraps the HTTP endpoint as an SDK-compatible target and reuses the AgentOps HTTP mapping (request_field, response_mode, stream, custom headers). ASSERT resolves assert-ai inside the active virtual environment, accepts non-secret values from assert.env, can request an AAD token from the Azure CLI for local auth-disabled Azure AI resources, injects the GPT-5 max_completion_tokens shim only when configured, and materializes a runtime ASSERT config so committed configs no longer need absolute artifact paths.
  • Generated workflows run the ASSERT and Red Team gates. agentops workflow generate now installs the optional ASSERT/Red Team dependencies, runs those gates when assert: or redteam: is present in agentops.yaml, uploads their artifacts, and emits the corrected Red Team command quoting.

Fixed

  • Reasoning-model judges no longer fail the eval gate in CI. The generated GitHub Actions and Azure DevOps eval and Red Team steps now forward AZURE_OPENAI_MODEL_NAME, so AgentOps detects reasoning models (such as gpt-5-nano) and uses max_completion_tokens instead of max_tokens. This removes the judge 400 error that could break the eval gate when the judge deployment is a reasoning model.
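
The fix depends on knowing the model name, because reasoning models reject max_tokens and expect max_completion_tokens. A minimal sketch of that switch follows. The prefix list and function name are hypothetical; these notes name only gpt-5-nano and do not describe how AgentOps detects reasoning models.

```python
import os

# Hypothetical heuristic: prefixes treated as reasoning models in this sketch.
REASONING_PREFIXES = ("gpt-5", "o1", "o3")


def token_limit_kwargs(limit, model_name=None):
    """Pick the token-limit parameter name for the judge request."""
    # Fall back to the variable the generated workflows now forward.
    name = (model_name or os.environ.get("AZURE_OPENAI_MODEL_NAME", "")).lower()
    if name.startswith(REASONING_PREFIXES):
        return {"max_completion_tokens": limit}
    return {"max_tokens": limit}


print(token_limit_kwargs(800, "gpt-5-nano"))
print(token_limit_kwargs(800, "gpt-4o"))
```

When the variable is not forwarded, the name is empty and the sketch falls back to max_tokens, which is the failure mode the release fixes in CI.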

Upgrade notes

The CLI install spec in generated workflows can now target @main again. The HTTP governance code is on main as of this release, so projects that pinned a temporary feature branch can drop the pin. The HTTP agent tutorial no longer documents a temporary install pin.

Validated

  • A live GPT-RAG orchestrator project ran both generated pipelines green: the PR workflow (sandbox deploy candidate plus eval, ASSERT, and Red Team gates) and the dev deploy workflow (eval gate plus azd deploy), using OIDC and azd deploy only.
  • Unit suites for the generator and the ASSERT/Red Team runners pass.

v0.4.4

18 Jun 00:29

Evaluate streaming agents directly, no adapter

AgentOps http_json targets can now evaluate agents that stream their answers
(Server-Sent Events or raw text), not just agents that return a single JSON
body. This means you can point an eval straight at a streaming endpoint such as
the gpt-rag-orchestrator /orchestrator route without writing and hosting a
thin non-streaming adapter first.

Defaults are unchanged: existing http_json targets keep parsing a single JSON
response exactly as before (response_mode: json is the default and is
byte-for-byte compatible).

What you can do now

  • Set response_mode: sse or response_mode: text on an http_json target to
    read a streamed body and aggregate it into one answer for scoring.
  • Configure aggregation with an optional stream block:
    • text_field: dotted path to the token text when each SSE data: line is
      JSON (for example choices.0.delta.content).
    • done_marker: stop token, for example [DONE].
    • strip_leading_token: drop the first whitespace-delimited token. The
      gpt-rag orchestrator emits the conversation_id as the first chunk, so this
      removes it from the scored answer.
  • Use a non-bearer auth header for endpoints gated by a shared secret. Two new
    fields, auth_header_name (default Authorization) and
    auth_value_template (default Bearer {token}), let you send something like
    X-API-KEY: <secret> without putting the secret in agentops.yaml. The
    {token} placeholder is replaced by the value of the auth_header_env
    environment variable at run time.
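
The stream options in the list above can be sketched as a small aggregator over a fully read SSE body. This is an illustration under assumptions, not the AgentOps parser: the function names are hypothetical, numeric path segments are assumed to index lists, and event: error handling is left out.

```python
import json


def get_path(data, dotted):
    """Walk a dotted path; numeric parts index into lists (assumption)."""
    current = data
    for part in dotted.split("."):
        current = current[int(part)] if isinstance(current, list) else current[part]
    return current


def aggregate_sse(body, text_field=None, done_marker=None, strip_leading_token=False):
    """Join the data: lines of an SSE body into one answer string."""
    chunks = []
    for line in body.splitlines():
        if not line.startswith("data:"):
            continue  # skip event: lines, comments, and blank separators
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # SSE drops one space after the colon
        if done_marker is not None and payload.strip() == done_marker:
            break
        if text_field is not None:
            try:
                payload = get_path(json.loads(payload), text_field) or ""
            except (ValueError, KeyError, IndexError, TypeError):
                continue  # chunk without the text field
        chunks.append(payload)
    answer = "".join(chunks)
    if strip_leading_token:
        parts = answer.split(None, 1)  # drop first whitespace-delimited token
        answer = parts[1] if len(parts) > 1 else ""
    return answer


stream = "\n".join([
    'data: {"choices":[{"delta":{"content":"conv-123 "}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    "",
    "data: [DONE]",
])
print(aggregate_sse(stream, "choices.0.delta.content", "[DONE]", True))
```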

When a JSON parse fails on a text/event-stream response, the error now tells
you to set response_mode: sse or response_mode: text, so the misconfiguration
is obvious instead of a raw decode error.

Example: gpt-rag-orchestrator

target:
  kind: http_json
  url: https://<your-orchestrator>/orchestrator
  request_field: ask
  response_mode: text
  auth_header_name: X-API-KEY
  auth_value_template: "{token}"
  auth_header_env: ORCHESTRATOR_APP_APIKEY
  stream:
    strip_leading_token: true

Compatibility and transport

  • No change for existing targets. response_mode defaults to json.
  • Streaming reuses the same Python standard-library (urllib) transport and the
    same 3-try backoff as the JSON path. The body is read to completion (eval
    answers are bounded) and then aggregated, so retries stay robust.

Validation

  • Full unit suite: 991 passed, 1 skipped (14 net-new tests, no regressions).
  • New tests cover SSE/text aggregation, the leading-token strip, done_marker
    and event: error handling, the configurable auth header, and the
    response_mode validation gate.

New config fields

| Field | Default | Purpose |
| --- | --- | --- |
| response_mode | json | Selects the response parser: json, sse, or text. |
| stream.text_field | none | Dotted path to token text inside each JSON SSE data: line. |
| stream.done_marker | none | Token that ends the stream, e.g. [DONE]. |
| stream.strip_leading_token | false | Drops the first whitespace-delimited token (e.g. a conversation_id prefix). |
| auth_header_name | Authorization | Header used to send the auth value. |
| auth_value_template | Bearer {token} | Template for the auth value; {token} is replaced by the value of the auth_header_env environment variable. |

Full changelog: see CHANGELOG.md ([0.4.4]).

v0.4.3

17 Jun 20:46

[0.4.3] - 2026-06-17

Added

  • Prompt-agent tutorials no longer require manual portal copy/paste.
    agentops prompt pull reads the configured Foundry prompt agent
    (agent: name:version), validates that the Foundry definition is actually a
    prompt agent, and writes the reviewed Sandbox instructions to
    .agentops/prompts/<agent-name>.prompt.md by default. Before writing, the CLI
    prints the resolved agent, endpoint, endpoint source, and destination file so
    operators can catch the wrong environment early. Changed prompt files are
    protected by default and require --force to overwrite reviewed local edits.
    The command updates prompt_file in agentops.yaml unless
    --no-update-config is passed, and it can resolve the endpoint from
    --project-endpoint, agentops.yaml, AZURE_AI_FOUNDRY_PROJECT_ENDPOINT, or
    the active .azure/<env>/.env. The prompt-agent tutorial and packaged
    agentops-eval skill now use this command instead of a manual here-string.
    (#322)

Changed

  • agentops eval init now recommends evaluators from the agent and dataset
    shape. The azd bootstrap path now reuses the same AgentOps evaluator
    catalog as agentops eval run: free-form answer datasets get answer-quality
    checks, RAG-shaped datasets get groundedness / relevance / retrieval checks,
    and tool-use datasets get tool-call / intent / task-adherence checks while
    avoiding literal-answer similarity metrics. Explicit evaluators: entries in
    agentops.yaml still win. The CLI prints the recommendation source, detected
    signals, and selected azd built-ins before reporting the generated
    eval.yaml, so users can see why those evaluators were chosen.
    (#323)

v0.4.2

17 Jun 11:56
22bed3c

What's Changed

Full Changelog: v0.4.1...v0.4.2

v0.4.1

15 Jun 19:28
2e791ea

Note: v0.4.0 was tagged but its Release workflow failed in the
build step (see #311), so v0.4.0 was never published to PyPI or the
VS Code Marketplace. v0.4.1 supersedes it: it contains every change from
v0.4.0 plus the test-suite fix (#311) that unblocks the release pipeline.
Installing or upgrading from PyPI/Marketplace will pick up v0.4.1 directly.

What's Changed

  • chore(deps-dev): update azure-monitor-query requirement from <2.0,>=1.3 to >=1.3,<3.0 by @dependabot[bot] in #248
  • chore(deps-dev): update azure-mgmt-cognitiveservices requirement from <14.0,>=13.5 to >=13.5,<15.0 by @dependabot[bot] in #247
  • chore(deps-dev): update pandas requirement from <3.0,>=2.0 to >=2.0,<4.0 by @dependabot[bot] in #246
  • chore(deps): bump aiohttp from 3.13.5 to 3.14.0 by @dependabot[bot] in #231
  • chore(deps): bump idna from 3.11 to 3.15 by @dependabot[bot] in #168
  • feat(prompt-deploy): tag PR candidates as agentops:candidate=true by @placerda in #309
  • Release v0.4.1 by @placerda in #310
  • fix(tests): cherry-pick #311 to main to unblock Release by @placerda in #312

Full Changelog: v0.4.0...v0.4.1

v0.4.0

14 Jun 15:57
7a22c77

Added

  • agentops doctor detects missing OpenAI data-plane RBAC on the Foundry resource. A new security.missing_openai_data_plane_rbac check resolves the signed-in principal (via the oid claim of the access token used by DefaultAzureCredential) and lists role assignments at the Foundry account scope using azure-mgmt-authorization. When none of Cognitive Services OpenAI User, Cognitive Services OpenAI Contributor or Cognitive Services Contributor is granted (directly or inherited), Doctor surfaces a WARNING with the exact az role assignment create command pre-populated with the principal object id and Foundry account scope. (#228, #307)

Changed

  • agentops-pr workflow templates now auto-detect a committed baseline. Both the GitHub Actions and Azure DevOps PR templates emitted by agentops workflow generate wrap agentops eval run with a small bash guard: when .agentops/baseline/results.json exists in the consumer repo the step automatically passes --baseline .agentops/baseline/results.json; without the file the behaviour is unchanged. Deploy templates (dev/qa/prod) are untouched. (#155, #306)
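
The generated templates implement this guard in bash. The same decision is sketched here in Python for illustration only. The function name is hypothetical, and just the file path and the --baseline flag come from these notes.

```python
from pathlib import Path

BASELINE = Path(".agentops/baseline/results.json")


def eval_command(repo_root="."):
    """Build the eval command, adding --baseline only when the file exists."""
    cmd = ["agentops", "eval", "run"]
    if (Path(repo_root) / BASELINE).is_file():
        cmd += ["--baseline", BASELINE.as_posix()]
    return cmd


# Prints the plain command unless the current directory has a baseline.
print(" ".join(eval_command()))
```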

Fixed

  • agentops skills install --platform help text now lists cursor alongside copilot and claude. The cursor platform is fully implemented (registers rules in .cursor/rules/agentops.mdc); only the CLI help string was out of date. (#157, #305)