Summary
The Azure AI Inference Python SDK (azure-ai-inference) is Microsoft's official client for accessing AI models through Azure AI Foundry serverless endpoints, GitHub Models, managed compute endpoints, and the Azure OpenAI Service. It exposes ChatCompletionsClient, EmbeddingsClient, and ImageEmbeddingsClient for executing inference against a broad catalog of models (Meta Llama 3.3, Mistral Large, DeepSeek-R1, Microsoft Phi-4, Cohere Command R, and others) that are hosted on Azure but are not accessible through the standard OpenAI Python client.
This repository has zero instrumentation for any azure-ai-inference execution surface — no integration directory, no wrapper, no patcher, no auto_instrument() support. Users who call ChatCompletionsClient.complete() or EmbeddingsClient.embed() directly get no Braintrust spans.
The SDK cannot be wrapped with wrap_openai() because ChatCompletionsClient is a distinct class with its own authentication (Azure API keys or Entra ID) and its own request/response types from the azure.ai.inference.models namespace. wrap_openai() requires an openai.OpenAI instance.
The Braintrust docs list "Azure AI Foundry" as a supported cloud provider, but this coverage is provided through the AI Proxy gateway (using an OpenAI client pointed at the Braintrust gateway URL), not through native azure-ai-inference SDK tracing. Users who follow Microsoft's official documentation for Azure AI Foundry (pip install azure-ai-inference) and call ChatCompletionsClient.complete() directly get zero Braintrust spans.
What needs to be instrumented
The azure-ai-inference package (v1.0.0b9) exposes these execution surfaces, none of which are instrumented:
Chat completions (highest priority)
| SDK Method |
Description |
Streaming |
ChatCompletionsClient.complete(messages, ...) |
Chat completions via Azure AI Foundry / GitHub Models |
No |
ChatCompletionsClient.complete(messages, stream=True, ...) |
Streaming chat completions |
StreamingChatCompletions iterator |
AsyncChatCompletionsClient.complete(...) |
Async chat completions |
No |
AsyncChatCompletionsClient.complete(..., stream=True) |
Async streaming chat completions |
AsyncStreamingChatCompletions |
Response shape: ChatCompletions with choices[0].message.content, choices[0].finish_reason, usage.prompt_tokens, usage.completion_tokens, usage.total_tokens, model, id. Mirrors the OpenAI response shape in structure but is a distinct Azure type.
Streaming: StreamingChatCompletions is an iterable of StreamingChatCompletionsUpdate objects with choices[0].delta.content. The integration must accumulate deltas and finalize the span when iteration completes.
Embeddings
| SDK Method |
Description |
EmbeddingsClient.embed(input, ...) |
Generate embeddings for a list of texts |
AsyncEmbeddingsClient.embed(input, ...) |
Async embeddings |
Return type: EmbeddingsResult with data[0].embedding (list of floats) and usage.prompt_tokens.
Implementation notes
Authentication: Uses Azure API key (AzureKeyCredential) or Entra ID (DefaultAzureCredential). VCR cassettes need api-key header sanitization.
Endpoint-per-model pattern: Unlike OpenAI where a single client accesses all models, each Azure AI Foundry deployment has its own endpoint URL. The model name is embedded in the endpoint or returned in the response. Span metadata should capture model from ChatCompletions.model.
GitHub Models support: The same ChatCompletionsClient is used for GitHub Models (with endpoint="https://models.inference.ai.azure.com" and a GitHub token). GitHub Models provides free-tier access to GPT-4o, Llama, Mistral, and others for prototyping.
Parameters relevant for span metadata: model (or inferred from endpoint), temperature, max_tokens, top_p, frequency_penalty, presence_penalty, seed, tools, response_format, stop.
No coverage in any instrumentation layer
- No integration directory (
py/src/braintrust/integrations/azure_ai_inference/)
- No wrapper function (e.g.
wrap_azure_ai_inference())
- No patcher in any existing integration
- No nox test session (
test_azure_ai_inference)
- No version entry in
py/src/braintrust/integrations/versioning.py
- No mention in
py/src/braintrust/integrations/__init__.py
A grep for azure.ai, azure-ai-inference, or azure_ai_inference across py/src/braintrust/ returns zero matches.
Braintrust docs status
unclear — The Braintrust AI providers page lists "Azure AI Foundry" as a supported cloud provider, but the integration is through the AI Proxy gateway (routing openai.AzureOpenAI or openai.OpenAI through the Braintrust gateway URL), not through a native azure-ai-inference SDK wrapper. Users following Microsoft's official azure-ai-inference quickstart docs get zero native Braintrust tracing.
Upstream references
Local repo files inspected
py/src/braintrust/integrations/ — no azure_ai_inference/ directory on main
py/src/braintrust/wrappers/ — no Azure AI Inference wrapper
py/noxfile.py — no test_azure_ai_inference session
py/pyproject.toml [tool.braintrust.matrix] — no azure-ai-inference entry
py/src/braintrust/integrations/__init__.py — Azure AI Inference not listed
py/src/braintrust/integrations/versioning.py — no Azure AI Inference version matrix
- Full repo grep for
azure.ai, azure-ai-inference, azure_ai_inference — zero matches in SDK source
Summary
The Azure AI Inference Python SDK (
azure-ai-inference) is Microsoft's official client for accessing AI models through Azure AI Foundry serverless endpoints, GitHub Models, managed compute endpoints, and the Azure OpenAI Service. It exposesChatCompletionsClient,EmbeddingsClient, andImageEmbeddingsClientfor executing inference against a broad catalog of models (Meta Llama 3.3, Mistral Large, DeepSeek-R1, Microsoft Phi-4, Cohere Command R, and others) that are hosted on Azure but are not accessible through the standard OpenAI Python client.This repository has zero instrumentation for any
azure-ai-inferenceexecution surface — no integration directory, no wrapper, no patcher, noauto_instrument()support. Users who callChatCompletionsClient.complete()orEmbeddingsClient.embed()directly get no Braintrust spans.The SDK cannot be wrapped with
wrap_openai()becauseChatCompletionsClientis a distinct class with its own authentication (Azure API keys or Entra ID) and its own request/response types from theazure.ai.inference.modelsnamespace.wrap_openai()requires anopenai.OpenAIinstance.The Braintrust docs list "Azure AI Foundry" as a supported cloud provider, but this coverage is provided through the AI Proxy gateway (using an OpenAI client pointed at the Braintrust gateway URL), not through native
azure-ai-inferenceSDK tracing. Users who follow Microsoft's official documentation for Azure AI Foundry (pip install azure-ai-inference) and callChatCompletionsClient.complete()directly get zero Braintrust spans.What needs to be instrumented
The
azure-ai-inferencepackage (v1.0.0b9) exposes these execution surfaces, none of which are instrumented:Chat completions (highest priority)
ChatCompletionsClient.complete(messages, ...)ChatCompletionsClient.complete(messages, stream=True, ...)StreamingChatCompletionsiteratorAsyncChatCompletionsClient.complete(...)AsyncChatCompletionsClient.complete(..., stream=True)AsyncStreamingChatCompletionsResponse shape:
ChatCompletionswithchoices[0].message.content,choices[0].finish_reason,usage.prompt_tokens,usage.completion_tokens,usage.total_tokens,model,id. Mirrors the OpenAI response shape in structure but is a distinct Azure type.Streaming:
StreamingChatCompletionsis an iterable ofStreamingChatCompletionsUpdateobjects withchoices[0].delta.content. The integration must accumulate deltas and finalize the span when iteration completes.Embeddings
EmbeddingsClient.embed(input, ...)AsyncEmbeddingsClient.embed(input, ...)Return type:
EmbeddingsResultwithdata[0].embedding(list of floats) andusage.prompt_tokens.Implementation notes
Authentication: Uses Azure API key (
AzureKeyCredential) or Entra ID (DefaultAzureCredential). VCR cassettes needapi-keyheader sanitization.Endpoint-per-model pattern: Unlike OpenAI where a single client accesses all models, each Azure AI Foundry deployment has its own endpoint URL. The model name is embedded in the endpoint or returned in the response. Span metadata should capture
modelfromChatCompletions.model.GitHub Models support: The same
ChatCompletionsClientis used for GitHub Models (withendpoint="https://models.inference.ai.azure.com"and a GitHub token). GitHub Models provides free-tier access to GPT-4o, Llama, Mistral, and others for prototyping.Parameters relevant for span metadata:
model(or inferred from endpoint),temperature,max_tokens,top_p,frequency_penalty,presence_penalty,seed,tools,response_format,stop.No coverage in any instrumentation layer
py/src/braintrust/integrations/azure_ai_inference/)wrap_azure_ai_inference())test_azure_ai_inference)py/src/braintrust/integrations/versioning.pypy/src/braintrust/integrations/__init__.pyA grep for
azure.ai,azure-ai-inference, orazure_ai_inferenceacrosspy/src/braintrust/returns zero matches.Braintrust docs status
unclear— The Braintrust AI providers page lists "Azure AI Foundry" as a supported cloud provider, but the integration is through the AI Proxy gateway (routingopenai.AzureOpenAIoropenai.OpenAIthrough the Braintrust gateway URL), not through a nativeazure-ai-inferenceSDK wrapper. Users following Microsoft's officialazure-ai-inferencequickstart docs get zero native Braintrust tracing.Upstream references
Local repo files inspected
py/src/braintrust/integrations/— noazure_ai_inference/directory onmainpy/src/braintrust/wrappers/— no Azure AI Inference wrapperpy/noxfile.py— notest_azure_ai_inferencesessionpy/pyproject.toml[tool.braintrust.matrix]— no azure-ai-inference entrypy/src/braintrust/integrations/__init__.py— Azure AI Inference not listedpy/src/braintrust/integrations/versioning.py— no Azure AI Inference version matrixazure.ai,azure-ai-inference,azure_ai_inference— zero matches in SDK source