Releases · Azure/agentops

22 Jun 14:52

github-actions

v0.5.3

0a3bede

v0.5.3 Latest

Full Changelog: v0.5.2...v0.5.3

19 Jun 20:00

placerda

v0.5.2

0d86a6a

v0.5.2

What's new

Per-evaluator input_mapping for grey-box RAG evaluation.

Evaluator overrides in agentops.yaml now accept an optional input_mapping map that is merged onto the preset's default inputs. You only list the keys you want to change, and the rest of the preset is preserved.

This is the piece that connects the multi-field HTTP response capture shipped in v0.5.0 (response_fields + $response.<name>) to the RAG evaluators. You can now point a RAG evaluator at the live retrieved context captured from a JSON HTTP target, instead of static dataset context:

response_fields:
  context: context
evaluators:
  - name: GroundednessEvaluator
    input_mapping:
      context: $response.context
  - name: RetrievalEvaluator
    input_mapping:
      context: $response.context

The mapping applies to both explicitly listed overrides and auto-selected presets, and never mutates the shared evaluator catalog. A bare evaluator-name string (- GroundednessEvaluator) is still accepted as shorthand for { name: GroundednessEvaluator }, so existing configs keep working unchanged.
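The merge rules above (listed keys override, everything else comes from the preset, the shared catalog stays untouched, a bare name is shorthand) can be sketched as follows. This is an illustration, not AgentOps code: `PRESET_CATALOG`, its default inputs, and `resolve_override` are hypothetical names.

```python
import copy
from typing import Any, Dict, Mapping, Union

# Hypothetical preset catalog; AgentOps' real presets and default inputs may differ.
PRESET_CATALOG: Dict[str, Dict[str, Any]] = {
    "GroundednessEvaluator": {
        "input_mapping": {"query": "$row.query", "context": "$row.context"},
    },
}


def resolve_override(override: Union[str, Mapping[str, Any]],
                     catalog: Mapping[str, Mapping[str, Any]] = PRESET_CATALOG) -> Dict[str, Any]:
    """Merge an agentops.yaml evaluator override onto its preset (illustrative)."""
    if isinstance(override, str):
        override = {"name": override}  # bare-name shorthand
    name = override["name"]
    # Deep-copy so the shared catalog entry is never mutated.
    resolved = copy.deepcopy(dict(catalog[name]))
    mapping = dict(resolved.get("input_mapping", {}))
    mapping.update(override.get("input_mapping") or {})  # only listed keys change
    resolved["name"] = name
    resolved["input_mapping"] = mapping
    return resolved
```

Overriding only `context` keeps the preset's `query` entry, and a later lookup of the catalog still sees the original `$row.context`.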

Install

pip install agentops-accelerator==0.5.2

Validation

ruff check clean.
Full unit suite: 1030 passed, 1 skipped (release build gate).

Full Changelog: v0.5.1...v0.5.2

19 Jun 18:45

placerda

v0.5.1

dca5ca6

v0.5.1

What's new

Rendered gate results in the GitHub Actions job summary. When AgentOps runs inside GitHub Actions, the gates now write their results straight to the workflow run summary, so reviewers read the report on the run page without downloading artifacts.

agentops eval run appends the full rendered report.md to the run summary.
agentops assert run and agentops redteam run append a concise pass/fail summary (suite, cases, pass rate, plus per-dimension and per-risk-category breakdowns).
Writes are best-effort and a no-op outside GitHub Actions, so local runs are unaffected.
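GitHub Actions exposes the job-summary file through the `GITHUB_STEP_SUMMARY` environment variable, so a best-effort writer only needs to append to that file when it exists. A minimal sketch of that pattern, assuming nothing about how AgentOps itself implements it (`append_step_summary` is a hypothetical helper):

```python
import os


def append_step_summary(markdown: str) -> bool:
    """Append markdown to the GitHub Actions job summary; return True if written."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False  # not running inside GitHub Actions: no-op
    try:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(markdown.rstrip("\n") + "\n")
        return True
    except OSError:
        return False  # best-effort: a summary failure must not fail the gate
```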

Generated workflows use Node24-ready action versions. The prompt-agent and watchdog workflow templates now pin actions/download-artifact@v7 instead of the Node20 @v4, so freshly generated pipelines no longer emit the "Node.js 20 actions are deprecated" warning. A regression guard checks every workflow template against the known Node20 action majors.
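A regression guard of this kind can be as small as a regex over the `uses:` lines of each template. The sketch below is hypothetical: only `actions/download-artifact@v4` is named in these notes, so the deny-list holds just that entry; a real guard would list every Node20-era major it rejects.

```python
import re
from typing import List

# Assumed deny-list: action -> Node20-era majors. Only download-artifact@v4 comes from the notes.
NODE20_ACTION_MAJORS = {"actions/download-artifact": {4}}

USES_RE = re.compile(r"uses:\s*([\w.-]+/[\w.-]+)@v(\d+)")


def find_node20_pins(workflow_text: str) -> List[str]:
    """Return every `owner/action@vN` pin in a template that is on the deny-list."""
    hits = []
    for action, major in USES_RE.findall(workflow_text):
        if int(major) in NODE20_ACTION_MAJORS.get(action, set()):
            hits.append(f"{action}@v{major}")
    return hits
```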

Upgrade notes

The rendered-report feature is automatic in CI. No config change is required. Pipelines that install AgentOps from @main or upgrade to v0.5.1 pick it up on the next run.
If you generated workflows with an earlier version, re-run agentops workflow generate (or bump the action majors by hand) to clear any remaining Node20 deprecation warnings.

Validation

Release build job: ruff + full unit test suite passed.
Published to PyPI and verified on TestPyPI.

19 Jun 17:25

placerda

v0.5.0

112dfba

v0.5.0

What changed

This release lets AgentOps score the retrieval an agent actually used at eval time, instead of only the static context stored in your dataset. It is the engine piece that unlocks RAG evaluators (Groundedness, Retrieval, Document Retrieval) against a live HTTP orchestrator.

Added

Grey-box retrieval capture for HTTP JSON targets. An HTTP target can now capture extra named fields from a JSON response through a response_fields map (name -> dot-path). Captured values are exposed to evaluator input_mapping as $response.<name> (for example $response.context, $response.retrieved_documents), and dataset columns can be referenced with $row.<name> (for example $row.qrels).
This lets RAG evaluators such as Groundedness, Retrieval, and Document Retrieval score the retrieval that was actually used during the run, rather than static dataset context.
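Conceptually, capture walks each dot-path in response_fields through the parsed JSON body, and input_mapping tokens then look values up in either the captured fields or the dataset row. A minimal sketch under those assumptions; the helper names are hypothetical, and numeric path segments indexing into lists is an assumption borrowed from the `choices.0.delta.content` style used elsewhere in these notes:

```python
from typing import Any, Dict, Mapping


def get_dot_path(obj: Any, path: str) -> Any:
    """Walk a dot-path such as 'data.context' through dicts (and lists, by index)."""
    for part in path.split("."):
        obj = obj[int(part)] if isinstance(obj, list) else obj[part]
    return obj


def capture_fields(body: Mapping[str, Any], response_fields: Mapping[str, str]) -> Dict[str, Any]:
    """Apply a response_fields map (name -> dot-path) to a JSON response body."""
    return {name: get_dot_path(body, path) for name, path in response_fields.items()}


def resolve_token(token: str, captured: Mapping[str, Any], row: Mapping[str, Any]) -> Any:
    """Resolve $response.<name> / $row.<name>; anything else is passed through."""
    if token.startswith("$response."):
        return captured[token[len("$response."):]]
    if token.startswith("$row."):
        return row[token[len("$row."):]]
    return token
```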

Compatibility

Fully backward compatible. The primary prediction (response_field) and existing single-field behavior are unchanged when response_fields is not set. Multi-field capture applies to JSON-mode HTTP targets only.

Docs

HTTP target reference now documents response_fields with a grey-box example, and the bundle input-mapping reference documents the $response.<name> and $row.<name> tokens.

Install

pip install agentops-accelerator==0.5.0

Validation

Release pipeline build gate (ruff + full unit suite + package) passed; published to TestPyPI and PyPI.

19 Jun 13:03

placerda

v0.4.5

bbaa7ef

v0.4.5

What this release is about

This release lets AgentOps run its safety and quality gates (ASSERT and Red Team) against a live HTTP orchestrator endpoint, not only model or deployment targets. With this, an HTTP agent such as the GPT-RAG orchestrator can be governed end to end in CI: evaluation, ASSERT policy checks, and Red Team, with the generated GitHub Actions and Azure DevOps pipelines running those gates automatically.

It also fixes a reasoning-model judge error that could fail the eval gate in CI.

Merged via #326 (feature), #327 (release lint hotfix), and #328 (release test hotfix).

Added

Governance gates for HTTP agents (ASSERT and Red Team). agentops assert run and agentops redteam run now work against a live HTTP orchestrator endpoint. Red Team wraps the HTTP endpoint as an SDK-compatible target and reuses the AgentOps HTTP mapping (request_field, response_mode, stream, custom headers). ASSERT resolves assert-ai inside the active virtual environment, accepts non-secret values from assert.env, can request an AAD token from the Azure CLI for Azure AI resources that have local (key-based) authentication disabled, injects the GPT-5 max_completion_tokens shim only when configured, and materializes a runtime ASSERT config so committed configs no longer need absolute artifact paths.
Generated workflows run the ASSERT and Red Team gates. agentops workflow generate now installs the optional ASSERT/Red Team dependencies, runs those gates when assert: or redteam: is present in agentops.yaml, uploads their artifacts, and quotes the Red Team command correctly.

Fixed

Reasoning-model judges no longer fail the eval gate in CI. The generated GitHub Actions and Azure DevOps eval and Red Team steps now forward AZURE_OPENAI_MODEL_NAME, so AgentOps detects reasoning models (such as gpt-5-nano) and uses max_completion_tokens instead of max_tokens. This removes the judge 400 error that could break the eval gate when the judge deployment is a reasoning model.
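The fix works because the parameter choice depends on knowing the model name: without AZURE_OPENAI_MODEL_NAME the judge falls back to max_tokens, which reasoning models reject. A hypothetical sketch of that decision; the prefix list is an assumption (these notes only confirm gpt-5-nano), and `token_limit_kwargs` is not an AgentOps API:

```python
import os
from typing import Dict, Optional

# Hypothetical prefixes; the notes only confirm gpt-5-nano as a reasoning model.
REASONING_MODEL_PREFIXES = ("gpt-5", "o1", "o3", "o4")


def token_limit_kwargs(limit: int, model_name: Optional[str] = None) -> Dict[str, int]:
    """Pick the token-limit parameter name for a judge call."""
    # Fall back to the variable the generated workflows now forward.
    name = (model_name or os.environ.get("AZURE_OPENAI_MODEL_NAME", "")).lower()
    if name.startswith(REASONING_MODEL_PREFIXES):
        return {"max_completion_tokens": limit}
    # Without a recognizable model name, the check defaults to max_tokens.
    return {"max_tokens": limit}
```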

Upgrade notes

The CLI install spec in generated workflows can now target @main again. The HTTP governance code is on main as of this release, so projects that pinned a temporary feature branch can drop the pin. The HTTP agent tutorial no longer documents a temporary install pin.

Validated

A live GPT-RAG orchestrator project ran both generated pipelines green: the PR workflow (sandbox deploy candidate plus eval, ASSERT, and Red Team gates) and the dev deploy workflow (eval gate plus azd deploy), using OIDC and azd deploy only.
Unit suites for the generator and the ASSERT/Red Team runners pass.

18 Jun 00:29

placerda

v0.4.4

e2afa85

v0.4.4

Evaluate streaming agents directly, no adapter

AgentOps http_json targets can now evaluate agents that stream their answers
(Server-Sent Events or raw text), not just agents that return a single JSON
body. This means you can point an eval straight at a streaming endpoint such as
the gpt-rag-orchestrator /orchestrator route without writing and hosting a
thin non-streaming adapter first.

Defaults are unchanged: existing http_json targets keep parsing a single JSON
response exactly as before (response_mode: json is the default and is
byte-for-byte compatible).

What you can do now

Set response_mode: sse or response_mode: text on an http_json target to
read a streamed body and aggregate it into one answer for scoring.
Configure aggregation with an optional stream block:
- text_field: dotted path to the token text when each SSE data: line is
  JSON (for example choices.0.delta.content).
- done_marker: stop token, for example [DONE].
- strip_leading_token: drop the first whitespace-delimited token. The
  gpt-rag orchestrator emits the conversation_id as the first chunk, so this
  removes it from the scored answer.
Use a non-bearer auth header for endpoints gated by a shared secret. Two new
fields, auth_header_name (default Authorization) and
auth_value_template (default Bearer {token}), let you send something like
X-API-KEY: <secret> without putting the secret in agentops.yaml. The
{token} placeholder is replaced by the value of the auth_header_env
environment variable at run time.

When a JSON parse fails on a text/event-stream response, the error now tells
you to set response_mode: sse or response_mode: text, so the misconfiguration
is obvious instead of a raw decode error.
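The aggregation options above can be illustrated with a small function that collapses a fully read body into one answer. This is a sketch, not the AgentOps parser: `aggregate_stream` is hypothetical, it concatenates one `data:` line per token, skips JSON chunks that lack the text field (such as a role-only delta), and ignores `event:` lines, so it does not model the `event: error` handling mentioned below.

```python
import json
from typing import Any, Optional


def _dot_get(obj: Any, path: str) -> Any:
    # Numeric segments index into lists, e.g. "choices.0.delta.content".
    for part in path.split("."):
        obj = obj[int(part)] if isinstance(obj, list) else obj[part]
    return obj


def aggregate_stream(body: str, mode: str, text_field: Optional[str] = None,
                     done_marker: Optional[str] = None,
                     strip_leading_token: bool = False) -> str:
    """Collapse a fully read streamed body into one answer string."""
    if mode == "text":
        answer = body
    elif mode == "sse":
        pieces = []
        for line in body.splitlines():
            if not line.startswith("data:"):
                continue  # ignore event:, id:, comments and blank separators
            data = line[len("data:"):]
            if data.startswith(" "):
                data = data[1:]  # SSE drops a single space after the colon
            if done_marker is not None and data.strip() == done_marker:
                break
            if text_field:
                try:
                    piece = _dot_get(json.loads(data), text_field)
                except (KeyError, IndexError, TypeError):
                    continue  # chunk without text, e.g. a role-only delta
                if piece is not None:
                    pieces.append(str(piece))
            else:
                pieces.append(data)
        answer = "".join(pieces)
    else:
        raise ValueError(f"unsupported response_mode: {mode!r}")
    if strip_leading_token:
        # Drop the first whitespace-delimited token, e.g. a conversation_id prefix.
        parts = answer.split(maxsplit=1)
        answer = parts[1] if len(parts) > 1 else ""
    return answer
```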

Example: gpt-rag-orchestrator

target:
  kind: http_json
  url: https://<your-orchestrator>/orchestrator
  request_field: ask
  response_mode: text
  auth_header_name: X-API-KEY
  auth_value_template: "{token}"
  auth_header_env: ORCHESTRATOR_APP_APIKEY
  stream:
    strip_leading_token: true

Compatibility and transport

No change for existing targets. response_mode defaults to json.
Streaming reuses the same Python standard-library (urllib) transport and the
same 3-try backoff as the JSON path. The body is read to completion (eval
answers are bounded) and then aggregated, so retries stay robust.
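Reading the body to completion before parsing is what lets a streamed request be retried like a JSON one. The sketch below shows that shape with the standard-library urllib; the exponential 1s/2s schedule and the choice to not retry most 4xx errors are assumptions, and `post_with_retries` is a hypothetical helper (the `opener` and `sleep` parameters exist only to make it testable).

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def post_with_retries(req: urllib.request.Request, attempts: int = 3,
                      base_delay: float = 1.0,
                      opener: Callable = urllib.request.urlopen,
                      sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Send a request with exponential backoff, reading the full body before returning."""
    last_exc: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            with opener(req, timeout=60) as resp:
                return resp.read()  # read to completion, then aggregate
        except urllib.error.HTTPError as exc:
            if exc.code < 500 and exc.code != 429:
                raise  # client errors are not worth retrying
            last_exc = exc
        except (urllib.error.URLError, TimeoutError) as exc:
            last_exc = exc
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, then 2s with the defaults
    assert last_exc is not None
    raise last_exc
```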

Validation

Full unit suite: 991 passed, 1 skipped (14 net-new tests, no regressions).
New tests cover SSE/text aggregation, the leading-token strip, done_marker
and event: error handling, the configurable auth header, and the
response_mode validation gate.

New config fields

| Field | Default | Purpose |
| --- | --- | --- |
| `response_mode` | `json` | Selects the response parser: `json`, `sse`, or `text`. |
| `stream.text_field` | none | Dotted path to the token text inside each JSON SSE `data:` line. |
| `stream.done_marker` | none | Token that ends the stream, e.g. `[DONE]`. |
| `stream.strip_leading_token` | `false` | Drops the first whitespace-delimited token (e.g. a `conversation_id` prefix). |
| `auth_header_name` | `Authorization` | Header used to send the auth value. |
| `auth_value_template` | `Bearer {token}` | Template for the auth value; `{token}` is replaced by the value of the `auth_header_env` environment variable. |
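Put together, the three auth fields describe a simple header construction: read the secret from the environment variable named by auth_header_env, substitute it for `{token}`, and send it under auth_header_name. A hypothetical sketch (`build_auth_header` is not an AgentOps function):

```python
import os
from typing import Dict


def build_auth_header(auth_header_env: str,
                      auth_header_name: str = "Authorization",
                      auth_value_template: str = "Bearer {token}") -> Dict[str, str]:
    """Build one auth header from an env var, keeping the secret out of agentops.yaml."""
    token = os.environ.get(auth_header_env)
    if not token:
        raise RuntimeError(f"environment variable {auth_header_env} is not set")
    # str.replace instead of str.format, so other braces in the template stay literal.
    return {auth_header_name: auth_value_template.replace("{token}", token)}
```

With the gpt-rag-orchestrator example above, this yields `{"X-API-KEY": "<secret>"}`; with the defaults it yields a standard bearer header.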

Full changelog: see CHANGELOG.md ([0.4.4]).

17 Jun 20:46

placerda

v0.4.3

4fe391a

v0.4.3

[0.4.3] - 2026-06-17

Added

Prompt-agent tutorials no longer require manual portal copy/paste.
agentops prompt pull reads the configured Foundry prompt agent
(agent: name:version), validates that the Foundry definition is actually a
prompt agent, and writes the reviewed Sandbox instructions to
.agentops/prompts/<agent-name>.prompt.md by default. Before writing, the CLI
prints the resolved agent, endpoint, endpoint source, and destination file so
operators can catch the wrong environment early. Changed prompt files are
protected by default and require --force to overwrite reviewed local edits.
The command updates prompt_file in agentops.yaml unless
--no-update-config is passed, and it can resolve the endpoint from
--project-endpoint, agentops.yaml, AZURE_AI_FOUNDRY_PROJECT_ENDPOINT, or
the active .azure/<env>/.env. The prompt-agent tutorial and packaged
agentops-eval skill now use this command instead of a manual here-string.
(#322)
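The overwrite protection can be pictured as: identical content is a no-op, different content is refused unless forced. A minimal sketch under that reading; the default path shape `.agentops/prompts/<agent-name>.prompt.md` comes from the notes, while `prompt_path` and `write_prompt` are hypothetical helpers, not the CLI's internals.

```python
from pathlib import Path


def prompt_path(agent_name: str, root: Path = Path(".agentops/prompts")) -> Path:
    """Default destination: .agentops/prompts/<agent-name>.prompt.md."""
    return root / f"{agent_name}.prompt.md"


def write_prompt(path: Path, instructions: str, force: bool = False) -> str:
    """Write pulled instructions, refusing to clobber local edits unless forced."""
    if path.exists():
        if path.read_text(encoding="utf-8") == instructions:
            return "unchanged"
        if not force:
            raise FileExistsError(f"{path} has local edits; use force to overwrite")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(instructions, encoding="utf-8")
    return "written"
```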

Changed

agentops eval init now recommends evaluators from the agent and dataset
shape. The azd bootstrap path now reuses the same AgentOps evaluator
catalog as agentops eval run: free-form answer datasets get answer-quality
checks, RAG-shaped datasets get groundedness / relevance / retrieval checks,
and tool-use datasets get tool-call / intent / task-adherence checks while
avoiding literal-answer similarity metrics. Explicit evaluators: entries in
agentops.yaml still win. The CLI prints the recommendation source, detected
signals, and selected azd built-ins before reporting the generated
eval.yaml, so users can see why those evaluators were chosen.
(#323)
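The recommendation logic is described only by outcome, so the following is a rough illustration of shape-based selection, not the AgentOps catalog. The column names used as signals are assumptions, and the evaluator names are borrowed from the azure-ai-evaluation SDK purely for illustration.

```python
from typing import Iterable, List, Optional, Sequence


def recommend_evaluators(columns: Iterable[str],
                         explicit: Optional[Sequence[str]] = None) -> List[str]:
    """Pick evaluators from dataset shape; explicit configuration always wins."""
    if explicit:
        return list(explicit)  # explicit evaluators: entries in agentops.yaml win
    cols = set(columns)
    if cols & {"tool_calls", "tool_definitions"}:
        # Tool-use data: skip literal-answer similarity metrics.
        return ["ToolCallAccuracyEvaluator", "IntentResolutionEvaluator", "TaskAdherenceEvaluator"]
    if "context" in cols:
        # RAG-shaped data: groundedness / relevance / retrieval.
        return ["GroundednessEvaluator", "RelevanceEvaluator", "RetrievalEvaluator"]
    # Free-form answers: general answer-quality checks.
    return ["RelevanceEvaluator", "CoherenceEvaluator", "FluencyEvaluator"]
```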

17 Jun 11:56

placerda

v0.4.2

22bed3c

v0.4.2

What's Changed

Simplify README and make Foundry dependencies default by @placerda in #315
Release v0.4.2 by @placerda in #321

Full Changelog: v0.4.1...v0.4.2

Contributors

placerda

15 Jun 19:28

github-actions

v0.4.1

2e791ea

v0.4.1

Note: v0.4.0 was tagged but its Release workflow failed in the
build step (see #311), so v0.4.0 was never published to PyPI or the
VS Code Marketplace. v0.4.1 supersedes it: it contains every change from
v0.4.0 plus the test-suite fix (#311) that unblocks the release pipeline.
Installing or upgrading from PyPI/Marketplace will pick up v0.4.1 directly.

What's Changed

* chore(deps-dev): update azure-monitor-query requirement from <2.0,>=1.3 to >=1.3,<3.0 by @dependabot[bot] in #248
* chore(deps-dev): update azure-mgmt-cognitiveservices requirement from <14.0,>=13.5 to >=13.5,<15.0 by @dependabot[bot] in #247
* chore(deps-dev): update pandas requirement from <3.0,>=2.0 to >=2.0,<4.0 by @dependabot[bot] in #246
* chore(deps): bump aiohttp from 3.13.5 to 3.14.0 by @dependabot[bot] in #231
* chore(deps): bump idna from 3.11 to 3.15 by @dependabot[bot] in #168
* feat(prompt-deploy): tag PR candidates as agentops:candidate=true by @placerda in #309
* Release v0.4.1 by @placerda in #310
* fix(tests): cherry-pick #311 to main to unblock Release by @placerda in #312

Full Changelog: v0.4.0...v0.4.1

Contributors

placerda and dependabot

14 Jun 15:57

placerda

v0.4.0

7a22c77

v0.4.0

Added

agentops doctor detects missing OpenAI data-plane RBAC on the Foundry resource. A new security.missing_openai_data_plane_rbac check resolves the signed-in principal (via the oid claim of the access token used by DefaultAzureCredential) and lists role assignments at the Foundry account scope using azure-mgmt-authorization. When none of Cognitive Services OpenAI User, Cognitive Services OpenAI Contributor or Cognitive Services Contributor is granted (directly or inherited), Doctor surfaces a WARNING with the exact az role assignment create command pre-populated with the principal object id and Foundry account scope. (#228, #307)
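The two halves of this check are easy to sketch: read the `oid` claim from the access token's payload, then test whether any accepted role is among the principal's assignments. This sketch only decodes the signed-in user's own token (no signature verification) and leaves out the azure-mgmt-authorization listing; the helper names are hypothetical.

```python
import base64
import json
from typing import Iterable

ACCEPTED_ROLES = {
    "Cognitive Services OpenAI User",
    "Cognitive Services OpenAI Contributor",
    "Cognitive Services Contributor",
}


def oid_from_access_token(token: str) -> str:
    """Read the oid claim from a JWT payload (no signature verification)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["oid"]


def missing_data_plane_rbac(assigned_role_names: Iterable[str]) -> bool:
    """True when none of the accepted roles is assigned, directly or inherited."""
    return not (ACCEPTED_ROLES & set(assigned_role_names))
```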

Changed

agentops-pr workflow templates now auto-detect a committed baseline. Both the GitHub Actions and Azure DevOps PR templates emitted by agentops workflow generate wrap agentops eval run with a small bash guard: when .agentops/baseline/results.json exists in the consumer repo the step automatically passes --baseline .agentops/baseline/results.json; without the file the behaviour is unchanged. Deploy templates (dev/qa/prod) are untouched. (#155, #306)
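The generated templates implement this guard in bash; the same condition is shown below in Python only to make the behaviour explicit. The baseline path and the `--baseline` flag come from the notes; `eval_run_argv` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

BASELINE = Path(".agentops/baseline/results.json")


def eval_run_argv(repo_root: Path) -> List[str]:
    """Build the eval command, adding --baseline only when a baseline is committed."""
    argv = ["agentops", "eval", "run"]
    if (repo_root / BASELINE).is_file():
        argv += ["--baseline", BASELINE.as_posix()]
    return argv
```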

Fixed

agentops skills install --platform help text now lists cursor alongside copilot and claude. The cursor platform is fully implemented (registers rules in .cursor/rules/agentops.mdc); only the CLI help string was out of date. (#157, #305)
