Releases: Azure/agentops
v0.5.3
Full Changelog: v0.5.2...v0.5.3
v0.5.2
What's new
Per-evaluator input_mapping for grey-box RAG evaluation.
Evaluator overrides in agentops.yaml now accept an optional input_mapping map that is merged onto the preset's default inputs. You only list the keys you want to change, and the rest of the preset is preserved.
This is the piece that connects the multi-field HTTP response capture shipped in v0.5.0 (response_fields + $response.<name>) to the RAG evaluators. You can now point a RAG evaluator at the live retrieved context captured from a JSON HTTP target, instead of static dataset context:

response_fields:
  context: context
evaluators:
  - name: GroundednessEvaluator
    input_mapping:
      context: $response.context
  - name: RetrievalEvaluator
    input_mapping:
      context: $response.context

The mapping applies to both explicitly listed overrides and auto-selected presets, and never mutates the shared evaluator catalog. A bare evaluator-name string (- GroundednessEvaluator) is still accepted as shorthand for { name: GroundednessEvaluator }, so existing configs keep working unchanged.
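The merge semantics described above can be sketched in Python. This is a minimal illustration, not the AgentOps implementation: the `CATALOG` contents, the default mapping values, and the function names `normalize_override` and `resolve_evaluator` are all hypothetical.

```python
from copy import deepcopy

# Hypothetical preset catalog; the real catalog and its default keys may differ.
CATALOG = {
    "GroundednessEvaluator": {
        "input_mapping": {
            "query": "$row.query",
            "context": "$row.context",
        }
    }
}


def normalize_override(entry):
    """Accept a bare evaluator name or a {name, input_mapping} mapping."""
    if isinstance(entry, str):
        return {"name": entry}
    return dict(entry)


def resolve_evaluator(entry, catalog=CATALOG):
    """Return a copy of the preset with the override's input_mapping merged on top."""
    override = normalize_override(entry)
    preset = deepcopy(catalog[override["name"]])  # copy so the shared catalog stays intact
    preset["input_mapping"].update(override.get("input_mapping", {}))
    return {"name": override["name"], **preset}
```

Only the keys listed in the override change; every other default mapping carries over from the preset.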
Install
pip install agentops-accelerator==0.5.2
Validation
- ruff check clean.
- Full unit suite: 1030 passed, 1 skipped (release build gate).
Full Changelog: v0.5.1...v0.5.2
v0.5.1
What's new
Rendered gate results in the GitHub Actions job summary. When AgentOps runs inside GitHub Actions, the gates now write their results straight to the workflow run summary, so reviewers read the report on the run page without downloading artifacts.
- agentops eval run appends the full rendered report.md to the run summary.
- agentops assert run and agentops redteam run append a concise pass/fail summary (suite, cases, pass rate, plus per-dimension and per-risk-category breakdowns).
- Writes are best-effort and a no-op outside GitHub Actions, so local runs are unaffected.
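A best-effort summary write of this kind can be sketched as follows. GitHub Actions exposes the summary file path through the GITHUB_STEP_SUMMARY environment variable; whether AgentOps uses exactly this mechanism is an assumption, and the function name `append_to_job_summary` is hypothetical.

```python
import os


def append_to_job_summary(markdown, environ=None):
    """Append markdown to the GitHub Actions job summary; return True if written."""
    env = os.environ if environ is None else environ
    summary_path = env.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return False  # not running in GitHub Actions: do nothing
    try:
        with open(summary_path, "a", encoding="utf-8") as handle:
            handle.write(markdown.rstrip("\n") + "\n")
    except OSError:
        return False  # best-effort: a failed write must not fail the gate
    return True
```

Returning a boolean instead of raising keeps local runs and broken summary paths from affecting the gate result.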
Generated workflows use Node24-ready action versions. The prompt-agent and watchdog workflow templates now pin actions/download-artifact@v7 instead of the Node20 @v4, so freshly generated pipelines no longer emit the "Node.js 20 actions are deprecated" warning. A regression guard checks every workflow template against the known Node20 action majors.
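A regression guard along these lines might look like the sketch below. It is illustrative only: the `NODE20_MAJORS` table lists just the one action and major named in these notes, and `find_node20_actions` is a hypothetical name, not the project's test helper.

```python
import re

# Assumption: only the pair named in the release notes is listed here;
# the real guard presumably tracks more actions.
NODE20_MAJORS = {"actions/download-artifact": {"v4"}}

# Matches "uses: owner/repo@vN" and captures the action and its major version.
USES_RE = re.compile(r"uses:\s*([\w.-]+/[\w.-]+)@(v\d+)")


def find_node20_actions(template_text):
    """Return every 'action@major' in the template that is a known Node20 major."""
    hits = []
    for action, major in USES_RE.findall(template_text):
        if major in NODE20_MAJORS.get(action, set()):
            hits.append(f"{action}@{major}")
    return hits
```

A test can then assert that the function returns an empty list for each generated workflow template.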
Upgrade notes
- The rendered-report feature is automatic in CI. No config change is required. Pipelines that install AgentOps from @main or upgrade to v0.5.1 pick it up on the next run.
- If you generated workflows with an earlier version, re-run agentops workflow generate (or bump the action majors by hand) to clear any remaining Node20 deprecation warnings.
Links
- Render gate results in Actions summary + Node24 workflow templates: #331
- Grey-box multi-field JSON capture (shipped in v0.5.0): #330
Validation
- Release build job: ruff + full unit test suite passed.
- Published to PyPI and verified on TestPyPI.
v0.5.0
What changed
This release lets AgentOps score the retrieval an agent actually used at eval time, instead of only the static context stored in your dataset. It is the engine piece that unlocks RAG evaluators (Groundedness, Retrieval, Document Retrieval) against a live HTTP orchestrator.
Added
- Grey-box retrieval capture for HTTP JSON targets. An HTTP target can now capture extra named fields from a JSON response through a response_fields map (name -> dot-path). Captured values are exposed to evaluator input_mapping as $response.<name> (for example $response.context, $response.retrieved_documents), and dataset columns can be referenced with $row.<name> (for example $row.qrels).
- This lets RAG evaluators such as Groundedness, Retrieval, and Document Retrieval score the retrieval that was actually used during the run, rather than static dataset context.
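The capture-and-resolve flow can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical, numeric path segments indexing into lists is assumed, and treating any other token as a literal is a simplification.

```python
def get_by_dot_path(payload, path):
    """Walk a dotted path through dicts and lists; numeric segments index lists."""
    current = payload
    for segment in path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]
        else:
            current = current[segment]
    return current


def capture_fields(response_json, response_fields):
    """Apply a name -> dot-path map to one JSON response body."""
    return {name: get_by_dot_path(response_json, path) for name, path in response_fields.items()}


def resolve_token(token, captured, row):
    """Resolve $response.<name> and $row.<name>; other values pass through."""
    if token.startswith("$response."):
        return captured[token[len("$response."):]]
    if token.startswith("$row."):
        return row[token[len("$row."):]]
    return token  # treated as a literal in this sketch
```

For example, with a response body of `{"debug": {"context": [...]}}` and `response_fields = {"context": "debug.context"}`, the token `$response.context` resolves to the captured list.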
Compatibility
- Fully backward compatible. The primary prediction (response_field) and existing single-field behavior are unchanged when response_fields is not set. Multi-field capture applies to JSON-mode HTTP targets only.
Docs
- HTTP target reference now documents response_fields with a grey-box example, and the bundle input-mapping reference documents the $response.<name> and $row.<name> tokens.
Install
pip install agentops-accelerator==0.5.0
Validation
- Release pipeline build gate (ruff + full unit suite + package) passed; published to TestPyPI and PyPI.
v0.4.5
What this release is about
This release lets AgentOps run its safety and quality gates (ASSERT and Red Team) against a live HTTP orchestrator endpoint, not only model or deployment targets. With this, an HTTP agent such as the GPT-RAG orchestrator can be governed end to end in CI: evaluation, ASSERT policy checks, and Red Team, with the generated GitHub Actions and Azure DevOps pipelines running those gates automatically.
It also fixes a reasoning-model judge error that could fail the eval gate in CI.
Merged via #326 (feature), #327 (release lint hotfix), and #328 (release test hotfix).
Added
- Governance gates for HTTP agents (ASSERT and Red Team). agentops assert run and agentops redteam run now work against a live HTTP orchestrator endpoint. Red Team wraps the HTTP endpoint as an SDK-compatible target and reuses the AgentOps HTTP mapping (request_field, response_mode, stream, custom headers). ASSERT resolves assert-ai inside the active virtual environment, accepts non-secret values from assert.env, can request an AAD token from the Azure CLI for local auth-disabled Azure AI resources, injects the GPT-5 max_completion_tokens shim only when configured, and materializes a runtime ASSERT config so committed configs no longer need absolute artifact paths.
- Generated workflows run the ASSERT and Red Team gates. agentops workflow generate now installs the optional ASSERT/Red Team dependencies, runs those gates when assert: or redteam: is present in agentops.yaml, uploads their artifacts, and emits the corrected Red Team command quoting.
Fixed
- Reasoning-model judges no longer fail the eval gate in CI. The generated GitHub Actions and Azure DevOps eval and Red Team steps now forward AZURE_OPENAI_MODEL_NAME, so AgentOps detects reasoning models (such as gpt-5-nano) and uses max_completion_tokens instead of max_tokens. This removes the judge 400 error that could break the eval gate when the judge deployment is a reasoning model.
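The parameter switch can be illustrated with a short sketch. The prefix list and both function names are assumptions made for illustration; the release notes do not describe how AgentOps actually detects a reasoning model.

```python
# Assumption: prefix matching on the model name; the real detection rule
# is not described in the release notes.
REASONING_PREFIXES = ("gpt-5", "o1", "o3", "o4")


def is_reasoning_model(model_name):
    """Guess whether a model name refers to a reasoning model."""
    return bool(model_name) and model_name.lower().startswith(REASONING_PREFIXES)


def token_limit_kwargs(limit, environ):
    """Pick the token-limit parameter name from AZURE_OPENAI_MODEL_NAME."""
    model = environ.get("AZURE_OPENAI_MODEL_NAME", "")
    key = "max_completion_tokens" if is_reasoning_model(model) else "max_tokens"
    return {key: limit}
```

When the variable is not forwarded, the sketch falls back to max_tokens, which matches the failure mode the fix addresses: without the model name, a reasoning-model judge receives the wrong parameter.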
Upgrade notes
The CLI install spec in generated workflows can now target @main again. The HTTP governance code is on main as of this release, so projects that pinned a temporary feature branch can drop the pin. The HTTP agent tutorial no longer documents a temporary install pin.
Validated
- A live GPT-RAG orchestrator project ran both generated pipelines green: the PR workflow (sandbox deploy candidate plus eval, ASSERT, and Red Team gates) and the dev deploy workflow (eval gate plus azd deploy), using OIDC and azd deploy only.
- Unit suites for the generator and the ASSERT/Red Team runners pass.
v0.4.4
Evaluate streaming agents directly, no adapter
AgentOps http_json targets can now evaluate agents that stream their answers
(Server-Sent Events or raw text), not just agents that return a single JSON
body. This means you can point an eval straight at a streaming endpoint such as
the gpt-rag-orchestrator /orchestrator route without writing and hosting a
thin non-streaming adapter first.
Defaults are unchanged: existing http_json targets keep parsing a single JSON
response exactly as before (response_mode: json is the default and is
byte-for-byte compatible).
What you can do now
- Set response_mode: sse or response_mode: text on an http_json target to
  read a streamed body and aggregate it into one answer for scoring.
- Configure aggregation with an optional stream block:
  - text_field: dotted path to the token text when each SSE data: line is
    JSON (for example choices.0.delta.content).
  - done_marker: stop token, for example [DONE].
  - strip_leading_token: drop the first whitespace-delimited token. The
    gpt-rag orchestrator emits the conversation_id as the first chunk, so this
    removes it from the scored answer.
- Use a non-bearer auth header for endpoints gated by a shared secret. Two new
  fields, auth_header_name (default Authorization) and
  auth_value_template (default Bearer {token}), let you send something like
  X-API-KEY: <secret> without putting the secret in agentops.yaml. The
  {token} placeholder is replaced by the value of the auth_header_env
  environment variable at run time.
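How the three stream options listed above could combine is sketched below. This is an illustrative aggregation, not the AgentOps parser: the function names are hypothetical, and details such as skipping non-data lines and dropping one leading space after `data:` follow the general SSE convention rather than anything stated in these notes.

```python
import json


def get_by_dot_path(payload, path):
    """Walk a dotted path; numeric segments index into lists. None if missing."""
    current = payload
    try:
        for segment in path.split("."):
            current = current[int(segment)] if isinstance(current, list) else current[segment]
    except (KeyError, IndexError, ValueError, TypeError):
        return None
    return current


def aggregate_sse(body, text_field=None, done_marker=None, strip_leading_token=False):
    """Join the data: lines of a fully read SSE body into one answer string."""
    parts = []
    for line in body.splitlines():
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and event: lines
        data = line[len("data:"):]
        if data.startswith(" "):
            data = data[1:]  # drop the single space after the colon
        if done_marker is not None and data.strip() == done_marker:
            break  # stop token reached; ignore anything after it
        if text_field:
            data = get_by_dot_path(json.loads(data), text_field)
        parts.append(data or "")
    answer = "".join(parts)
    if strip_leading_token:
        pieces = answer.split(None, 1)  # remove the first whitespace-delimited token
        answer = pieces[1] if len(pieces) == 2 else ""
    return answer
```

Aggregating after the body is fully read keeps the parsing logic a pure function of the response text, which makes it easy to unit test.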
When a JSON parse fails on a text/event-stream response, the error now tells
you to set response_mode: sse or response_mode: text, so the misconfiguration
is obvious instead of a raw decode error.
Example: gpt-rag-orchestrator
target:
  kind: http_json
  url: https://<your-orchestrator>/orchestrator
  request_field: ask
  response_mode: text
  auth_header_name: X-API-KEY
  auth_value_template: "{token}"
  auth_header_env: ORCHESTRATOR_APP_APIKEY
  stream:
    strip_leading_token: true

Compatibility and transport
- No change for existing targets. response_mode defaults to json.
- Streaming reuses the same Python standard-library (urllib) transport and the
  same 3-try backoff as the JSON path. The body is read to completion (eval
  answers are bounded) and then aggregated, so retries stay robust.
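A three-attempt retry over urllib that reads the whole body before returning can be sketched as follows. The delay values, the exponential schedule, and the `fetch_with_retry` name with its `opener` and `sleep` parameters are assumptions for illustration; the notes state only that there are three tries with backoff.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=3, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch a URL, retrying on transport errors, and return the full body."""
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(url) as response:
                return response.read()  # read to completion before aggregating
        except (urllib.error.URLError, TimeoutError) as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # assumed exponential backoff
    raise last_error
```

Because the body is consumed inside the retry loop, a connection dropped mid-stream is retried as a whole request instead of leaving a partial answer to score.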
Validation
- Full unit suite: 991 passed, 1 skipped (14 net-new tests, no regressions).
- New tests cover SSE/text aggregation, the leading-token strip, done_marker
  and event: error handling, the configurable auth header, and the
  response_mode validation gate.
New config fields
| Field | Default | Purpose |
|---|---|---|
| response_mode | json | Selects the response parser: json, sse, or text. |
| stream.text_field | none | Dotted path to token text inside each JSON SSE data: line. |
| stream.done_marker | none | Token that ends the stream, e.g. [DONE]. |
| stream.strip_leading_token | false | Drops the first whitespace-delimited token (e.g. a conversation_id prefix). |
| auth_header_name | Authorization | Header used to send the auth value. |
| auth_value_template | Bearer {token} | Template for the auth value; {token} is replaced by auth_header_env. |
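The two auth fields in the table can be read as a simple template substitution. The sketch below assumes that behaviour; `build_auth_header` is a hypothetical name, and the real implementation may validate or escape values differently.

```python
def build_auth_header(environ, auth_header_env,
                      auth_header_name="Authorization",
                      auth_value_template="Bearer {token}"):
    """Build the auth header from a secret held in an environment variable."""
    token = environ[auth_header_env]  # raises KeyError if the secret is not exported
    # str.replace avoids str.format surprises if the template has other braces.
    return {auth_header_name: auth_value_template.replace("{token}", token)}
```

With the gpt-rag-orchestrator settings (auth_header_name X-API-KEY, auth_value_template "{token}"), the header carries the raw secret; with the defaults it carries a standard bearer token.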
Full changelog: see CHANGELOG.md ([0.4.4]).
v0.4.3
[0.4.3] - 2026-06-17
Added
- Prompt-agent tutorials no longer require manual portal copy/paste.
  agentops prompt pull reads the configured Foundry prompt agent
  (agent: name:version), validates that the Foundry definition is actually a
  prompt agent, and writes the reviewed Sandbox instructions to
  .agentops/prompts/<agent-name>.prompt.md by default. Before writing, the CLI
  prints the resolved agent, endpoint, endpoint source, and destination file so
  operators can catch the wrong environment early. Changed prompt files are
  protected by default and require --force to overwrite reviewed local edits.
  The command updates prompt_file in agentops.yaml unless
  --no-update-config is passed, and it can resolve the endpoint from
  --project-endpoint, agentops.yaml, AZURE_AI_FOUNDRY_PROJECT_ENDPOINT, or
  the active .azure/<env>/.env. The prompt-agent tutorial and packaged
  agentops-eval skill now use this command instead of a manual here-string.
  (#322)
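The endpoint lookup and the overwrite protection can be sketched as below. Several details are assumptions: that the four sources are tried in the order listed, the `project_endpoint` config key, the variable name read from the azd .env file, and both function names.

```python
from pathlib import Path

ENV_VAR = "AZURE_AI_FOUNDRY_PROJECT_ENDPOINT"


def resolve_endpoint(cli_value, config, environ, azd_env):
    """Return (endpoint, source) from the first candidate that has a value."""
    candidates = [
        ("--project-endpoint", cli_value),
        ("agentops.yaml", config.get("project_endpoint")),  # hypothetical key name
        (ENV_VAR, environ.get(ENV_VAR)),
        (".azure/<env>/.env", azd_env.get(ENV_VAR)),  # assumed variable name
    ]
    for source, value in candidates:
        if value:
            return value, source
    raise ValueError("no Foundry project endpoint configured")


def write_prompt(path, text, force=False):
    """Write the pulled prompt, refusing to clobber a locally edited file."""
    path = Path(path)
    if path.exists() and path.read_text(encoding="utf-8") != text and not force:
        raise FileExistsError(f"{path} has local changes; pass --force to overwrite")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
```

Returning the source alongside the endpoint is what makes it possible to print where the value came from before anything is written.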
Changed
- agentops eval init now recommends evaluators from the agent and dataset
  shape. The azd bootstrap path now reuses the same AgentOps evaluator
  catalog as agentops eval run: free-form answer datasets get answer-quality
  checks, RAG-shaped datasets get groundedness / relevance / retrieval checks,
  and tool-use datasets get tool-call / intent / task-adherence checks while
  avoiding literal-answer similarity metrics. Explicit evaluators: entries in
  agentops.yaml still win. The CLI prints the recommendation source, detected
  signals, and selected azd built-ins before reporting the generated
  eval.yaml, so users can see why those evaluators were chosen.
  (#323)
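The shape-based recommendation can be pictured with a toy classifier. Everything specific here is hypothetical: the column names used as signals, the check labels, and the function names are placeholders, since the notes do not say which signals AgentOps inspects.

```python
def recommend_checks(columns):
    """Map dataset columns to a family of checks (illustrative signals only)."""
    cols = set(columns)
    if cols & {"tool_calls", "tool_definitions"}:  # hypothetical tool-use signal
        return ["tool-call", "intent", "task-adherence"]
    if cols & {"context", "retrieved_documents"}:  # hypothetical RAG signal
        return ["groundedness", "relevance", "retrieval"]
    return ["answer-quality"]


def select_checks(columns, explicit=None):
    """Explicit evaluators win; otherwise fall back to the recommendation."""
    if explicit:
        return list(explicit), "agentops.yaml"
    return recommend_checks(columns), "recommendation"
```

Returning the source next to the selection mirrors the CLI behaviour of reporting why a set of evaluators was chosen.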
v0.4.2
v0.4.1
Note: v0.4.0 was tagged but its Release workflow failed in the build step
(see #311), so v0.4.0 was never published to PyPI or the VS Code Marketplace.
v0.4.1 supersedes it: it contains every change from v0.4.0 plus the
test-suite fix (#311) that unblocks the release pipeline. Installing or
upgrading from PyPI/Marketplace will pick up v0.4.1 directly.
What's Changed
- chore(deps-dev): update azure-monitor-query requirement from <2.0,>=1.3 to >=1.3,<3.0 by @dependabot[bot] in #248
- chore(deps-dev): update azure-mgmt-cognitiveservices requirement from <14.0,>=13.5 to >=13.5,<15.0 by @dependabot[bot] in #247
- chore(deps-dev): update pandas requirement from <3.0,>=2.0 to >=2.0,<4.0 by @dependabot[bot] in #246
- chore(deps): bump aiohttp from 3.13.5 to 3.14.0 by @dependabot[bot] in #231
- chore(deps): bump idna from 3.11 to 3.15 by @dependabot[bot] in #168
- feat(prompt-deploy): tag PR candidates as agentops:candidate=true by @placerda in #309
- Release v0.4.1 by @placerda in #310
- fix(tests): cherry-pick #311 to main to unblock Release by @placerda in #312

Full Changelog: v0.4.0...v0.4.1
v0.4.0
Added
- agentops doctor detects missing OpenAI data-plane RBAC on the Foundry resource. A new security.missing_openai_data_plane_rbac check resolves the signed-in principal (via the oid claim of the access token used by DefaultAzureCredential) and lists role assignments at the Foundry account scope using azure-mgmt-authorization. When none of Cognitive Services OpenAI User, Cognitive Services OpenAI Contributor or Cognitive Services Contributor is granted (directly or inherited), Doctor surfaces a WARNING with the exact az role assignment create command pre-populated with the principal object id and Foundry account scope. (#228, #307)
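The two halves of this check, reading the oid claim and testing the granted roles, can be sketched with the standard library alone. This is an illustration under assumptions: the function names are hypothetical, the token is decoded without signature verification (enough for reading a claim locally, not for trusting it), and the role assignments are passed in as plain role names instead of being fetched through azure-mgmt-authorization.

```python
import base64
import json

REQUIRED_ROLES = {
    "Cognitive Services OpenAI User",
    "Cognitive Services OpenAI Contributor",
    "Cognitive Services Contributor",
}


def oid_from_access_token(token):
    """Read the oid claim from a JWT payload without verifying the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["oid"]


def has_openai_data_plane_role(assigned_role_names):
    """True when at least one of the accepted roles is granted."""
    return bool(REQUIRED_ROLES & set(assigned_role_names))
```

When `has_openai_data_plane_role` returns False, the object id from `oid_from_access_token` is the value a remediation command would need as the assignee.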
Changed
- agentops-pr workflow templates now auto-detect a committed baseline. Both the GitHub Actions and Azure DevOps PR templates emitted by agentops workflow generate wrap agentops eval run with a small bash guard: when .agentops/baseline/results.json exists in the consumer repo the step automatically passes --baseline .agentops/baseline/results.json; without the file the behaviour is unchanged. Deploy templates (dev/qa/prod) are untouched. (#155, #306)
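The generated templates implement this guard in bash; the same decision is rendered in Python below purely to show the logic. The `eval_command` function and its `repo_root` parameter are hypothetical.

```python
from pathlib import Path

BASELINE = ".agentops/baseline/results.json"


def eval_command(repo_root="."):
    """Build the eval command, adding --baseline only when the file is committed."""
    command = ["agentops", "eval", "run"]
    if (Path(repo_root) / BASELINE).is_file():
        command += ["--baseline", BASELINE]
    return command
```

Repositories without a committed baseline get the plain command, so their PR workflow behaves exactly as before.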