Microsoft Agent Framework (Python) — Observability & Telemetry

agent-framework-core emits OpenTelemetry signals following the GenAI semantic conventions. Every agent run, model call, tool invocation, workflow executor, and edge group is a span; every chat completion emits a duration histogram and a token-usage histogram. Nothing is exported by default — you opt in by calling a helper, setting one env var, or wiring your own OTel providers.

Verified against agent-framework-core==1.2.2 (agent_framework.observability).

There are three setup options. Pick exactly one; mixing them causes duplicate providers.

1. configure_otel_providers() — batteries included

Auto-wires OTLP exporters from standard OTel env vars and enables instrumentation in one call.

```python
from agent_framework.observability import configure_otel_providers

configure_otel_providers()
```

Set the usual OTel env vars before calling:

```shell
export ENABLE_INSTRUMENTATION=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_SERVICE_NAME=my-agent
```

For local debugging, route everything to stdout:

```python
configure_otel_providers(enable_console_exporters=True)
```

2. Bring your own providers — just enable instrumentation

When your app already owns OTel setup (Azure Monitor, Datadog, Honeycomb agents), skip configure_otel_providers and call enable_instrumentation():

```python
from azure.monitor.opentelemetry import configure_azure_monitor
from agent_framework.observability import enable_instrumentation

configure_azure_monitor(connection_string=AZURE_MONITOR_CONNECTION)
enable_instrumentation()
```

3. Custom exporters, keeping the framework’s providers

Pass your own exporters and optional Views; framework-managed providers are still created:

```python
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from agent_framework.observability import configure_otel_providers, create_metric_views

configure_otel_providers(
    exporters=[
        OTLPSpanExporter(endpoint="http://otel-collector:4317"),
        OTLPLogExporter(endpoint="http://otel-collector:4317"),
    ],
    views=create_metric_views(),  # default GenAI + agent_framework views
)
```
| Span name | Source | Key attributes |
| --- | --- | --- |
| `invoke_agent` | `Agent.run()` | `gen_ai.agent.name`, `gen_ai.agent.id`, `gen_ai.conversation.id` |
| `chat` | Every chat-client call | `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons` |
| `execute_tool` | Each tool invocation | `gen_ai.tool.name`, `gen_ai.tool.call.id`, `gen_ai.tool.call.arguments`, `gen_ai.tool.call.result` |
| `workflow.build` / `workflow.run` | Workflow lifecycle | `workflow.name`, `workflow.id` |
| `executor.process` | Each executor run | `executor.id`, `executor.type` |
| `edge_group.process` | Each edge traversal | `edge_group.id`, `edge_group.type`, `edge_group.delivery_status` |
| `message.send` | Cross-executor messaging | `message.source_id`, `message.target_id`, `message.type` |

All attribute constants live in agent_framework.observability.OtelAttr — use them to build span processors and samplers rather than typing strings.

Two histograms emit automatically once instrumentation is on:

| Metric | Unit | Dimensions |
| --- | --- | --- |
| `gen_ai.client.operation.duration` | seconds | `gen_ai.operation.name`, `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.provider.name`, `error.type` |
| `gen_ai.client.token.usage` | tokens | Same, plus `gen_ai.token.type` (`input` or `output`) |

Plus agent_framework.function.invocation.duration per tool call.

Agent-framework forwards to the standard Python logging module. Configure it with the usual logging APIs (basicConfig, dictConfig, or per-logger levels):

```python
import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger("agent_framework").setLevel(logging.DEBUG)
```

When instrumentation is enabled, logs flow through OTel’s LoggingHandler and ship alongside traces/metrics.
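If your app configures logging declaratively, the same setup can be expressed with dictConfig; a minimal sketch in which the handler and logger names are illustrative:

```python
import logging.config

# Equivalent declarative setup: console handler at INFO,
# the framework's logger bumped to DEBUG.
logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler", "level": "INFO"},
    },
    "loggers": {
        "agent_framework": {"level": "DEBUG", "handlers": ["console"]},
    },
})
```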

Prompts and completions are redacted by default. Flip enable_sensitive_data=True to include full message bodies as span events — useful in dev, avoid in production.

```python
configure_otel_providers(enable_sensitive_data=True)
# or: export ENABLE_SENSITIVE_DATA=true
```

The AI Toolkit and Azure AI Foundry VS Code extensions listen on a local port for live traces. Point the framework at it:

```python
configure_otel_providers(vs_code_extension_port=4317)
# or: export VS_CODE_EXTENSION_PORT=4317
```

Open the “AI Toolkit: Traces” view — every agent run, tool call, and workflow edge shows up with inputs, outputs, and timing.

You can still wrap your own code in spans; they slot neatly into the framework’s tree.

```python
from opentelemetry import trace

tracer = trace.get_tracer("myapp.customer")

with tracer.start_as_current_span("process_order", attributes={"order.id": order_id}):
    response = await agent.run(f"Draft reply for order {order_id}")
```

For agent-level wrapping use AgentMiddleware instead — see the middleware page.

Use the framework’s tracer / meter helpers

agent_framework.observability.get_tracer() and get_meter() return providers that share the same configuration the framework uses internally. Calling them ensures your custom spans land alongside invoke_agent / chat / execute_tool and inherit the same resource attributes (service name, version):

```python
from agent_framework.observability import get_tracer, get_meter

tracer = get_tracer("myapp.pricing")
meter = get_meter("myapp.pricing")

cache_hits = meter.create_counter(
    "myapp.pricing.cache_hits",
    description="Pricing cache hit count",
    unit="{request}",
)

async def price_quote(sku: str, *, cache, agent):
    with tracer.start_as_current_span("price_quote", attributes={"sku": sku}):
        if cached := cache.get(sku):
            cache_hits.add(1, {"sku.tier": cached.tier})
            return cached
        return await agent.run(f"Price quote for {sku}")
```

Reading framework attributes with OtelAttr

OtelAttr is the single source of truth for every span attribute the framework emits. Use it to filter, sample, or post-process spans without typing string keys:

```python
from opentelemetry import trace
from opentelemetry.context import Context
from opentelemetry.sdk.trace import ReadableSpan, Span, SpanProcessor
from agent_framework.observability import OtelAttr

class TokenSpendProcessor(SpanProcessor):
    """Side-car processor that counts model spend per agent."""

    def __init__(self) -> None:
        self.totals: dict[str, int] = {}

    def on_start(self, span: Span, parent_context: Context | None = None) -> None:
        # No-op — counting happens once spans are finalised in `on_end`.
        return

    def on_end(self, span: ReadableSpan) -> None:
        if span.name != OtelAttr.CHAT_COMPLETION_OPERATION.value:
            return
        attrs = dict(span.attributes or {})
        agent = attrs.get(OtelAttr.AGENT_NAME.value, "unknown")
        in_tok = attrs.get(OtelAttr.INPUT_TOKENS.value, 0)
        out_tok = attrs.get(OtelAttr.OUTPUT_TOKENS.value, 0)
        self.totals[agent] = self.totals.get(agent, 0) + in_tok + out_tok

    def shutdown(self) -> None: ...

    def force_flush(self, timeout_millis: int = 30_000) -> bool:
        return True

# Register alongside the framework's exporters.
trace.get_tracer_provider().add_span_processor(TokenSpendProcessor())
```

Common attribute keys you’ll reach for (full list in OtelAttr):

| Use case | Attribute |
| --- | --- |
| Filter to one agent | `OtelAttr.AGENT_NAME` |
| Group by model | `OtelAttr.RESPONSE_MODEL`, `OtelAttr.REQUEST_MODEL` |
| Track conversation | `OtelAttr.CONVERSATION_ID` |
| Slice tool usage | `OtelAttr.TOOL_NAME`, `OtelAttr.TOOL_TYPE` |
| Workflow drilldown | `OtelAttr.WORKFLOW_NAME`, `OtelAttr.EXECUTOR_ID`, `OtelAttr.EDGE_GROUP_TYPE` |

OTel samplers run before exporters; pair them with configure_otel_providers(...) so the framework spans inherit your sampler:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from agent_framework.observability import enable_instrumentation

# 5% head-based sampling for high-volume production; always trace if upstream did.
sampler = ParentBased(root=TraceIdRatioBased(0.05))
provider = TracerProvider(sampler=sampler)
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

enable_instrumentation()  # do NOT call configure_otel_providers — your provider wins
```

For tail-based sampling (keep every trace that errored or exceeded a latency budget), use the OTel Collector’s tail_sampling processor — agent-framework spans expose error.type and the duration histograms as standard inputs.
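A minimal Collector fragment for that policy, keeping errored traces plus anything slower than 3 seconds (the threshold and wait values are illustrative):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow
        type: latency
        latency: {threshold_ms: 3000}
```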

A wrapper around your SpanExporter can drop spans before the backend ever sees them — useful when you want to capture model calls but not every internal edge_group.process:

```python
from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult
from agent_framework.observability import OtelAttr

class DropInternalEdgeSpans(SpanExporter):
    """Wrap any exporter to drop edge-group spans below a configurable threshold."""

    def __init__(self, inner: SpanExporter, *, min_duration_ms: float = 50.0) -> None:
        self._inner = inner
        self._min_ns = min_duration_ms * 1_000_000

    def export(self, spans):
        keep = [
            s for s in spans
            if s.name != OtelAttr.EDGE_GROUP_PROCESS_SPAN.value
            or (s.end_time - s.start_time) > self._min_ns
        ]
        return self._inner.export(keep) if keep else SpanExportResult.SUCCESS

    def force_flush(self, timeout_millis: int = 30_000) -> bool:
        # Forward flush so graceful shutdown drains the wrapped exporter.
        return self._inner.force_flush(timeout_millis)

    def shutdown(self) -> None:
        self._inner.shutdown()
```

Wrap your real exporter in BatchSpanProcessor(DropInternalEdgeSpans(real_exporter)).

create_metric_views() returns the framework’s default views. Append your own to drop attributes you don’t query on — high-cardinality dimensions (per-user IDs, request IDs) blow up storage costs:

```python
from opentelemetry.sdk.metrics.view import View
from agent_framework.observability import (
    OtelAttr,
    configure_otel_providers,
    create_metric_views,
)

views = create_metric_views() + [
    # Aggregate token usage without conversation_id — keep model + agent dimensions only.
    View(
        instrument_name=OtelAttr.LLM_TOKEN_USAGE.value,
        attribute_keys={
            OtelAttr.RESPONSE_MODEL.value,
            OtelAttr.AGENT_NAME.value,
            OtelAttr.TOKEN_TYPE.value,
        },
    ),
]
configure_otel_providers(views=views)
```

Two ways to attach business attributes — pick the one that matches lifetime:

  1. Per-agent.run(...) call — pass additional_properties={"gen_ai.conversation.id": req_id} so the conversation correlator uses your upstream request ID.
  2. Per-process resource attributes — set OTEL_RESOURCE_ATTRIBUTES=team=payments,env=prod,deployment.version=1.4.2 before launching your service. The framework attaches them to every span/metric/log automatically.
```python
import os

os.environ["OTEL_RESOURCE_ATTRIBUTES"] = "team=payments,env=prod,deployment.version=1.4.2"
os.environ["OTEL_SERVICE_NAME"] = "checkout-agent"

from agent_framework.observability import configure_otel_providers

configure_otel_providers()
```

For request-scoped attributes use a span processor that reads from a contextvars.ContextVar you set at the API boundary — that way every framework-emitted span inherits them without changing call sites.

The framework stamps gen_ai.conversation.id on every span in a single agent.run(...) call and across session-scoped runs. In your tracing UI, filter by that attribute to see the full conversation on one trace.

Pass a custom ID through additional_properties on Agent(...) to use your own correlation key (e.g. a request ID from upstream).

One-liner for Application Insights:

```python
from azure.monitor.opentelemetry import configure_azure_monitor
from agent_framework.observability import enable_instrumentation

configure_azure_monitor(connection_string="InstrumentationKey=...;IngestionEndpoint=...")
enable_instrumentation()
```

Then query in Application Insights:

  • All agent runs: `dependencies | where name == "invoke_agent"`
  • Slow chat calls: `dependencies | where name == "chat" | where duration > 3000 | summarize count() by tostring(customDimensions["gen_ai.response.model"])`
  • Tool-call volume: `dependencies | where name == "execute_tool" | summarize count() by tostring(customDimensions["gen_ai.tool.name"])`

Production checklist:
  • Set OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES (team, env, version).
  • Leave ENABLE_SENSITIVE_DATA off unless you’re scrubbing downstream.
  • Use OTLP gRPC (OTEL_EXPORTER_OTLP_PROTOCOL=grpc) to an in-cluster collector; batch there rather than from each pod.
  • Wire a LogRecordExporter too — tool stack traces and middleware failures land in logs, not spans.
  • Filter noise with create_metric_views() and pass extra View(instrument_name="...", aggregation=...) entries to cap cardinality on high-traffic deployments.
  • Disable the user-agent telemetry banner with AGENT_FRAMEWORK_USER_AGENT_TELEMETRY_DISABLED=true if corporate policy forbids it.

ENABLE_INSTRUMENTATION defaults to false. Without either env var or enable_instrumentation(), no signals are produced — even if OTel SDK is configured elsewhere.

To hard-disable mid-run, set OBSERVABILITY_SETTINGS.enable_instrumentation = False on the settings object in the same module, but prefer controlling this at startup.