
PydanticAI — Class Deep Dives Vol. 9


Ten class groups from the pydantic_ai 1.105.0 source covering: the complete wire-format anatomy of ModelRequest and ModelResponse; the three request-side message part types (SystemPromptPart, UserPromptPart, RetryPromptPart); the call-part family (BaseToolCallPart, ToolCallPart, NativeToolCallPart) with typed-subclass promotion and streaming helpers; the return-part family (BaseToolReturnPart, ToolReturnPart, NativeToolReturnPart) with multi-modal content splitting and outcome tracking; GraphAgentState run-state internals; two new native tool types (MCPServerTool and FileSearchTool); IncludeReturnSchemasToolset for automatic tool-return-schema injection; ToolChoice + ToolOrOutput for surgical tool/output mode control; and the ServiceTier + ThinkingLevel type aliases that centralise cross-provider configuration.


1. ModelRequest + ModelResponse — Wire-Format Message Anatomy


Module: pydantic_ai.messages
Import: from pydantic_ai import ModelRequest, ModelResponse

Every conversation between PydanticAI and a model is represented as a list[ModelMessage] where ModelMessage = ModelRequest | ModelResponse. Understanding these two dataclasses in depth lets you parse, inspect, replay, or surgically modify conversation history.

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Literal, Sequence

@dataclass(repr=False)
class ModelRequest:
    """A request generated by PydanticAI and sent to a model."""

    parts: Sequence[ModelRequestPart]
    # ModelRequestPart = SystemPromptPart | UserPromptPart | ToolReturnPart
    #                  | NativeToolReturnPart | RetryPromptPart | InstructionPart
    timestamp: datetime | None = None        # when the request was sent
    instructions: str | None = None          # rendered instruction string (for logging/debugging)
    kind: Literal['request'] = 'request'     # discriminator
    run_id: str | None = None                # UUID7 for the agent run
    conversation_id: str | None = None       # UUID7 spanning multiple runs
    metadata: dict[str, Any] | None = None   # app-only data, never sent to LLM

    @classmethod
    def user_text_prompt(
        cls,
        user_prompt: str,
        *,
        instructions: str | None = None,
    ) -> 'ModelRequest': ...

@dataclass(repr=False)
class ModelResponse:
    """A response from a model."""

    parts: Sequence[ModelResponsePart]
    # ModelResponsePart = TextPart | ThinkingPart | ToolCallPart | NativeToolCallPart
    #                   | FilePart | CompactionPart
    usage: RequestUsage = field(default_factory=RequestUsage)
    model_name: str | None = None              # e.g. 'gpt-4o-2024-08-06'
    timestamp: datetime = field(default_factory=now_utc)
    finish_reason: FinishReason | None = None  # 'stop' | 'tool_call' | 'length' | 'content_filter' | 'error'
    state: ModelResponseState | None = None    # 'complete' | 'incomplete' | 'interrupted'
    kind: Literal['response'] = 'response'     # discriminator
    run_id: str | None = None
    conversation_id: str | None = None

# FinishReason type alias
# Literal['stop', 'length', 'content_filter', 'tool_call', 'error']
# ModelResponseState type alias
# Literal['complete', 'incomplete', 'interrupted']
| Field | Type | Notes |
| --- | --- | --- |
| ModelRequest.parts | Sequence[ModelRequestPart] | Ordered mix of system prompts, user prompts, tool returns, retries |
| ModelRequest.instructions | str \| None | Rendered instruction string stored for debugging/logging only |
| ModelRequest.run_id | str \| None | UUID7 shared with RunContext.run_id and OTel span |
| ModelRequest.conversation_id | str \| None | UUID7 that spans multi-run conversations |
| ModelRequest.metadata | dict[str, Any] \| None | App-only; never sent to LLM |
| ModelResponse.finish_reason | FinishReason \| None | 'stop' = clean end; 'tool_call' = pending calls; 'length' = token limit hit; 'content_filter' = blocked |
| ModelResponse.state | ModelResponseState \| None | 'complete' = all parts arrived; 'incomplete' = interrupted mid-stream; 'interrupted' = cancelled |
| ModelResponse.usage | RequestUsage | Per-request tokens (input, output, cache hit/miss, audio, details) |
| ModelResponse.model_name | str \| None | Resolved model name as reported by the provider |

import asyncio
from pydantic_ai import Agent, capture_run_messages
from pydantic_ai.messages import (
    ModelRequest,
    ModelResponse,
    UserPromptPart,
    TextPart,
    ToolCallPart,
    ToolReturnPart,
)

agent = Agent('openai:gpt-4o-mini')

async def inspect_history():
    with capture_run_messages() as messages:
        result = await agent.run("What is 2 + 2?")
    for msg in messages:
        if isinstance(msg, ModelRequest):
            print(f"REQUEST run_id={msg.run_id} conv={msg.conversation_id}")
            for part in msg.parts:
                if isinstance(part, UserPromptPart):
                    print(f"  user: {part.content!r}")
        elif isinstance(msg, ModelResponse):
            print(f"RESPONSE finish={msg.finish_reason} model={msg.model_name}")
            for part in msg.parts:
                if isinstance(part, TextPart):
                    print(f"  text: {part.content!r}")
                elif isinstance(part, ToolCallPart):
                    print(f"  tool_call: {part.tool_name}({part.args_as_json_str()})")

asyncio.run(inspect_history())

Replaying a request with modified metadata

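from dataclasses import replace
from pydantic_ai.messages import ModelRequest

async def replay_with_metadata(agent, conversation_id: str):
    # load_messages_from_db is your own storage helper (not part of pydantic_ai)
    raw = load_messages_from_db(conversation_id=conversation_id)
    # Find the last request (usually not the final message) and tag it for replay
    idx = max(i for i, m in enumerate(raw) if isinstance(m, ModelRequest))
    tagged = replace(raw[idx], metadata={"replay": True, "analyst": "claude"})
    # Swap the tagged copy in at the same position so the history stays ordered
    history = raw[:idx] + [tagged] + raw[idx + 1:]
    return await agent.run("Continue", message_history=history)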

Serialisation with ModelMessagesTypeAdapter

import json
from pydantic_ai import ModelMessagesTypeAdapter
from pydantic_ai.messages import ModelRequest, ModelResponse
# Serialise
messages: list[ModelRequest | ModelResponse] = ...
blob = ModelMessagesTypeAdapter.dump_json(messages)
# Deserialise (discriminated union on `kind` field)
restored = ModelMessagesTypeAdapter.validate_json(blob)
# To a Python list for storage
records = ModelMessagesTypeAdapter.dump_python(messages, mode='json')

2. SystemPromptPart + UserPromptPart + RetryPromptPart — Request-Side Message Parts


Module: pydantic_ai.messages
Import: from pydantic_ai.messages import SystemPromptPart, UserPromptPart, RetryPromptPart

These three dataclasses represent the parts that can appear inside a ModelRequest. Each has a part_kind literal discriminator for serialisation.

@dataclass(repr=False)
class SystemPromptPart:
    content: str
    timestamp: datetime = field(default_factory=now_utc)
    dynamic_ref: str | None = None  # set when generated by @agent.system_prompt
    part_kind: Literal['system-prompt'] = 'system-prompt'

@dataclass(repr=False)
class UserPromptPart:
    content: str | Sequence[UserContent]
    # UserContent = str | TextContent | ImageUrl | AudioUrl | VideoUrl
    #             | DocumentUrl | BinaryContent | UploadedFile | CachePoint
    timestamp: datetime = field(default_factory=now_utc)
    part_kind: Literal['user-prompt'] = 'user-prompt'

@dataclass(repr=False)
class RetryPromptPart:
    content: list[pydantic_core.ErrorDetails] | str
    tool_name: str | None = None  # None for output validator retries
    tool_call_id: str = field(default_factory=generate_tool_call_id)
    timestamp: datetime = field(default_factory=now_utc)
    part_kind: Literal['retry-prompt'] = 'retry-prompt'

    def model_response(self) -> str: ...
    # Returns formatted error feedback string sent to the model
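
To make the role of the part_kind discriminator concrete, here is a minimal stdlib-only sketch of routing a serialised record back to the right class. The SystemPrompt, UserPrompt, PART_TYPES, and load_part names are hypothetical stand-ins for illustration, not pydantic_ai classes; the library itself is assumed to rely on Pydantic's discriminated-union support rather than a hand-written lookup like this.

```python
from dataclasses import asdict, dataclass
from typing import Literal


@dataclass
class SystemPrompt:  # hypothetical stand-in for SystemPromptPart
    content: str
    part_kind: Literal['system-prompt'] = 'system-prompt'


@dataclass
class UserPrompt:  # hypothetical stand-in for UserPromptPart
    content: str
    part_kind: Literal['user-prompt'] = 'user-prompt'


# Map each discriminator value to the class that owns it.
PART_TYPES = {'system-prompt': SystemPrompt, 'user-prompt': UserPrompt}


def load_part(record: dict):
    """Rebuild a part from a plain dict by looking up its part_kind."""
    cls = PART_TYPES[record['part_kind']]
    return cls(**record)


# Round trip: dataclass -> dict -> dataclass
records = [asdict(SystemPrompt('Be brief.')), asdict(UserPrompt('Hi'))]
restored = [load_part(r) for r in records]
```

Because the discriminator travels with the data, the loader never has to guess a part's type from the shape of its other fields.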

SystemPromptPart — when to construct manually


SystemPromptPart is normally created by the framework. You construct it manually when building ModelRequest objects for replay, testing, or direct API calls:

from pydantic_ai.messages import SystemPromptPart, UserPromptPart, ModelRequest

request = ModelRequest(
    parts=[
        SystemPromptPart(content="You are a helpful assistant."),
        UserPromptPart(content="Summarise this document."),
    ]
)

The dynamic_ref field is populated by the framework when a @agent.system_prompt function generates the part, allowing the OTel layer to attribute spans back to the source function.

UserPromptPart.content accepts either a plain string or a heterogeneous sequence of UserContent items, enabling rich multi-modal prompts:

from pydantic_ai.messages import UserPromptPart
from pydantic_ai import ImageUrl, BinaryContent, TextContent, CachePoint

# Plain text
part = UserPromptPart(content="Describe this image:")

# Multi-modal: text + image URL
part = UserPromptPart(content=[
    "Describe this image:",
    ImageUrl(url="https://example.com/chart.png"),
])

# Multi-modal: text + raw image bytes read from disk
with open("chart.png", "rb") as f:
    data = f.read()
part = UserPromptPart(content=[
    TextContent(content="Analyse this chart:"),
    BinaryContent(data=data, media_type="image/png"),
])

# With a cache-point marker for Anthropic prompt caching
part = UserPromptPart(content=[
    "Long static context...",
    CachePoint(),  # Insert cache boundary here
    "Dynamic question?",
])

RetryPromptPart — how retry feedback works


RetryPromptPart is generated by PydanticAI whenever validation fails or a ModelRetry exception is raised. The model_response() method formats the error into the string that is sent back to the model:

from pydantic_ai.messages import RetryPromptPart

# From a ModelRetry exception (string content)
retry = RetryPromptPart(
    content="The city name must be a real city. 'Faketown' is not valid.",
    tool_name="get_weather",
    tool_call_id="call_abc123",
)
print(retry.model_response())
# "The city name must be a real city. 'Faketown' is not valid.\n\nFix the errors and try again."

# From a Pydantic ValidationError (list[ErrorDetails] content)
retry = RetryPromptPart(
    content=[
        {
            "type": "missing",
            "loc": ("city",),
            "msg": "Field required",
            "input": {},
        }
    ],
    tool_name="get_weather",
)
print(retry.model_response())
# "1 validation error:\n```json\n[...]\n```\n\nFix the errors and try again."

# Output validator retry (tool_name=None strips redundant input from error details)
output_retry = RetryPromptPart(
    content="Output validation failed: value must be positive",
)

# Collecting the retries that occurred during a run
from pydantic_ai import capture_run_messages
from pydantic_ai.messages import ModelRequest

async def collect_retries(agent):
    with capture_run_messages() as messages:
        result = await agent.run("bad input")
    retries = [
        part
        for msg in messages
        if isinstance(msg, ModelRequest)
        for part in msg.parts
        if isinstance(part, RetryPromptPart)
    ]
    for r in retries:
        print(f"Tool: {r.tool_name!r} Feedback: {r.model_response()[:80]}")
    return retries
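
The feedback strings shown in the comments above can be approximated with a few lines of plain Python. This is a rough sketch inferred from those example outputs, not the library's implementation: format_retry and FENCE are hypothetical names, and pluralising "errors" for counts other than one is an assumption.

```python
import json

FENCE = '`' * 3  # built programmatically so this example does not nest code fences


def format_retry(content):
    """Approximate the retry feedback format (illustrative, not the library's code)."""
    if isinstance(content, str):
        # String content (e.g. from ModelRetry) is passed through unchanged.
        description = content
    else:
        # A list of error details is rendered as a JSON block with a count header.
        count = len(content)
        noun = 'validation error' if count == 1 else 'validation errors'
        body = json.dumps(content, indent=2)
        description = f'{count} {noun}:\n{FENCE}json\n{body}\n{FENCE}'
    return f'{description}\n\nFix the errors and try again.'


text_feedback = format_retry('Bad city.')
list_feedback = format_retry([{'type': 'missing', 'loc': ['city'], 'msg': 'Field required'}])
```

Both branches end with the same closing instruction, so the model always receives an explicit prompt to correct itself regardless of where the error originated.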

3. BaseToolCallPart + ToolCallPart + NativeToolCallPart — Call-Part Family


Module: pydantic_ai.messages
Import: from pydantic_ai.messages import BaseToolCallPart, ToolCallPart, NativeToolCallPart

When a model decides to call a tool it generates a ToolCallPart (for function tools) or NativeToolCallPart (for native tools such as web search). Both extend BaseToolCallPart which holds the shared fields.

@dataclass(repr=False)
class BaseToolCallPart:
    """Base class for all tool-call parts."""

    tool_name: str
    args: str | dict[str, Any] | None = None  # JSON string OR dict, depending on provider
    tool_call_id: str = field(default_factory=generate_tool_call_id)
    # Provider round-trip fields (only populated for native tools)
    tool_kind: ToolPartKind | None = None  # discriminator for typed subclasses
    id: str | None = None                  # provider-specific call ID (e.g. OpenAI Responses)
    provider_name: str | None = None       # required when id/provider_details is set
    provider_details: dict[str, Any] | None = None

    def args_as_dict(self, *, raise_if_invalid: bool = False) -> dict[str, Any]: ...
    def args_as_json_str(self) -> str: ...
    def has_content(self) -> bool: ...

@dataclass(repr=False)
class ToolCallPart(BaseToolCallPart):
    """A call to a user-defined function tool."""

    part_kind: Literal['tool-call'] = 'tool-call'

    @staticmethod
    def narrow_type(
        part: 'ToolCallPart',
        *,
        tool_kind: ToolPartKind | None = None,
    ) -> 'ToolCallPart': ...

@dataclass(repr=False)
class NativeToolCallPart(BaseToolCallPart):
    """A call to a native model tool (web search, code execution, etc.)."""

    part_kind: Literal['builtin-tool-call'] = 'builtin-tool-call'

    @staticmethod
    def narrow_type(
        part: 'NativeToolCallPart',
        *,
        tool_kind: ToolPartKind | None = None,
    ) -> 'NativeToolCallPart': ...

import asyncio
from pydantic_ai import capture_run_messages, Agent
from pydantic_ai.messages import ModelResponse, ToolCallPart, NativeToolCallPart

def my_tool(query: str) -> str:
    """Placeholder function tool used only for this example."""
    return f"result for {query}"

agent = Agent('openai:gpt-4o-mini', tools=[my_tool])

async def inspect_tool_calls():
    with capture_run_messages() as messages:
        await agent.run("Use the tool please")
    for msg in messages:
        if isinstance(msg, ModelResponse):
            for part in msg.parts:
                if isinstance(part, ToolCallPart):
                    # args may be a JSON string or a dict depending on provider
                    args_dict = part.args_as_dict()      # always a dict
                    args_json = part.args_as_json_str()  # always a JSON string
                    print(f"{part.tool_name}({args_dict}) id={part.tool_call_id}")
                elif isinstance(part, NativeToolCallPart):
                    print(f"native:{part.tool_name} provider={part.provider_name}")

asyncio.run(inspect_tool_calls())

For native tools with a stable cross-provider schema (currently tool_search), NativeToolCallPart can be promoted to a typed subclass whose args is a narrowed TypedDict:

from pydantic_ai.messages import NativeToolCallPart

raw_part: NativeToolCallPart = ...  # from a tool_search native call

# Automatic promotion happens during Pydantic deserialisation.
# For manual construction or testing, use narrow_type() and pass tool_kind explicitly:
narrowed = NativeToolCallPart.narrow_type(raw_part, tool_kind='tool-search')
# narrowed is now a NativeToolSearchCallPart with typed .args TypedDict

# If tool_kind is omitted, narrow_type() falls back to the part's own tool_kind:
narrowed = NativeToolCallPart.narrow_type(raw_part)  # uses part.tool_kind

Streaming delta accumulation with ToolCallPartDelta


During streaming, call arguments arrive incrementally as ToolCallPartDelta objects that you accumulate by appending to the args string:

from pydantic_ai.messages import PartDeltaEvent, ToolCallPart, ToolCallPartDelta

async def collect_tool_args(agent) -> str:
    # Accumulate streaming deltas
    accumulated_args = ""
    async for event in agent.run_stream_events("call the tool"):
        if isinstance(event, PartDeltaEvent) and isinstance(event.delta, ToolCallPartDelta):
            # args_delta may be a JSON string fragment or a dict; only strings concatenate
            if isinstance(event.delta.args_delta, str):
                accumulated_args += event.delta.args_delta
    return accumulated_args

# Default: graceful on malformed JSON
part = ToolCallPart(tool_name="my_tool", args='{"broken": ')
safe_dict = part.args_as_dict()
# Returns: {'INVALID_JSON': '{"broken": '} — safe to pass to a model retry

# Strict: re-raises ValueError on malformed JSON
try:
    strict_dict = part.args_as_dict(raise_if_invalid=True)
except ValueError:
    # handle truncated tool call (e.g. token limit hit)
    ...
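
The accumulate-then-parse flow described above can be tried without a model by feeding in hand-written fragments. The following is a stdlib-only sketch: accumulate_args and parse_args are hypothetical helpers, and the 'INVALID_JSON' fallback key simply mirrors the behaviour the example above attributes to args_as_dict().

```python
import json


def accumulate_args(deltas):
    """Concatenate string fragments of a tool call's JSON arguments."""
    return ''.join(d for d in deltas if d)


def parse_args(raw, *, raise_if_invalid=False):
    """Parse accumulated JSON, falling back to a marker dict when it is malformed."""
    try:
        return json.loads(raw) if raw else {}
    except json.JSONDecodeError:  # a subclass of ValueError
        if raise_if_invalid:
            raise
        return {'INVALID_JSON': raw}


# A complete stream parses cleanly; a truncated one yields the marker dict.
complete = parse_args(accumulate_args(['{"city": ', '"Par', 'is"}']))
truncated = parse_args(accumulate_args(['{"city": ', '"Par']))
```

Parsing only once the stream has ended avoids repeatedly attempting to decode partial JSON, which would fail on almost every intermediate fragment.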

4. BaseToolReturnPart + ToolReturnPart + NativeToolReturnPart — Return-Part Family


Module: pydantic_ai.messages
Import: from pydantic_ai.messages import BaseToolReturnPart, ToolReturnPart, NativeToolReturnPart

After a tool executes, its result is wrapped in a ToolReturnPart (function tools) or NativeToolReturnPart (native tools) and placed in the next ModelRequest. Both extend BaseToolReturnPart which has the rich content-handling logic.

@dataclass(repr=False)
class BaseToolReturnPart:
    """Base class for all tool-return parts."""

    tool_name: str
    content: ToolReturnContent  # str | dict | list | MultiModalContent | ...
    tool_call_id: str = field(default_factory=generate_tool_call_id)
    tool_kind: ToolPartKind | None = None
    metadata: Any = None  # app-only; never sent to LLM
    timestamp: datetime = field(default_factory=now_utc)
    outcome: Literal['success', 'failed', 'denied'] = 'success'

    # Content accessors
    def model_response_str(self) -> str: ...
    def model_response_object(self) -> dict[str, Any]: ...
    def content_items(
        self, *, mode: Literal['raw', 'str', 'jsonable'] = 'raw'
    ) -> list: ...

    @property
    def files(self) -> list[MultiModalContent]: ...

@dataclass(repr=False)
class ToolReturnPart(BaseToolReturnPart):
    """Result from a user-defined function tool."""

    part_kind: Literal['tool-return'] = 'tool-return'

    @staticmethod
    def narrow_type(
        part: 'ToolReturnPart',
        *,
        tool_kind: ToolPartKind | None = None,
    ) -> 'ToolReturnPart': ...

@dataclass(repr=False)
class NativeToolReturnPart(BaseToolReturnPart):
    """Result from a native model tool."""

    provider_name: str | None = None
    provider_details: dict[str, Any] | None = None
    part_kind: Literal['builtin-tool-return'] = 'builtin-tool-return'

    @staticmethod
    def narrow_type(
        part: 'NativeToolReturnPart',
        *,
        tool_kind: ToolPartKind | None = None,
    ) -> 'NativeToolReturnPart': ...

outcome field — tracking approval and failures

from pydantic_ai import capture_run_messages
from pydantic_ai.messages import ModelRequest, ToolReturnPart

# Normal success
part = ToolReturnPart(tool_name="search", content="Results...")
assert part.outcome == 'success'

# Denied by HITL approval
denied = ToolReturnPart(
    tool_name="delete_file",
    content="Tool call denied by operator.",
    outcome='denied',
)

# Failed execution
failed = ToolReturnPart(
    tool_name="execute_sql",
    content="ERROR: table 'users' does not exist",
    outcome='failed',
)

# Inspect from history (agent is an existing Agent instance)
async def find_denials(agent):
    with capture_run_messages() as messages:
        result = await agent.run("do something dangerous")
    return [
        part
        for msg in messages
        if isinstance(msg, ModelRequest)
        for part in msg.parts
        if isinstance(part, ToolReturnPart) and part.outcome == 'denied'
    ]

Tools can return images, audio, or documents alongside text. The BaseToolReturnPart family handles splitting multi-modal content from scalar data:

from pydantic_ai import Agent, RunContext, capture_run_messages
from pydantic_ai.messages import BinaryContent, ModelRequest, ToolReturnPart

agent = Agent('anthropic:claude-opus-4-5')

@agent.tool
async def generate_chart(ctx: RunContext[None], data: list[float]) -> list:
    # Return both a description and the chart image
    chart_bytes = create_chart(data)  # create_chart: your own rendering helper (not shown)
    return [
        "Here is the chart:",
        BinaryContent(data=chart_bytes, media_type="image/png"),
    ]

# Inspect the return part files after the run
async def inspect_chart_return():
    with capture_run_messages() as messages:
        result = await agent.run("Plot [1, 2, 3, 4, 5]")
    for msg in messages:
        if isinstance(msg, ModelRequest):
            for part in msg.parts:
                if isinstance(part, ToolReturnPart):
                    print(f"text: {part.model_response_str()!r}")
                    print(f"files: {len(part.files)} file(s)")
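
The idea of splitting multi-modal content from scalar data can be illustrated with a small stdlib-only sketch. FakeBinary and split_content are hypothetical names, and the real BaseToolReturnPart logic is assumed to handle more content types than this.

```python
from dataclasses import dataclass


@dataclass
class FakeBinary:  # hypothetical stand-in for BinaryContent
    data: bytes
    media_type: str


def split_content(content):
    """Separate file-like items from scalar data in a tool's return value."""
    # Treat a single value as a one-item list so both shapes share one code path.
    items = content if isinstance(content, list) else [content]
    files = [item for item in items if isinstance(item, FakeBinary)]
    data = [item for item in items if not isinstance(item, FakeBinary)]
    return data, files


data, files = split_content(['Here is the chart:', FakeBinary(b'\x89PNG', 'image/png')])
only_text, no_files = split_content('plain result')
```

Keeping the two lists separate lets a provider adapter send the text as the tool result proper and attach the files through whatever mechanism that provider supports.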

content_items for fine-grained serialisation

from pydantic_ai.messages import BinaryContent, ToolReturnPart

part = ToolReturnPart(
    tool_name="analyze",
    content=[{"score": 0.95}, BinaryContent(data=b"...", media_type="image/png")],
)

# Raw items (no serialisation)
raw = part.content_items(mode='raw')
# Serialise non-file items to strings; pass BinaryContent through unchanged
str_items = part.content_items(mode='str')
# Serialise non-file items to JSON-compatible Python objects
json_items = part.content_items(mode='jsonable')

NativeToolReturnPart.provider_details — round-trip data


Native tools like web search may embed provider_details that must be sent back to the same provider on the next turn. PydanticAI handles this automatically; you only need to be aware when building custom providers:

from pydantic_ai.messages import NativeToolReturnPart

# Constructed by model implementations — provider_name is mandatory when provider_details is set
return_part = NativeToolReturnPart(
    tool_name="web_search",
    content="Search result text...",
    provider_name="anthropic",
    provider_details={"search_result_id": "srq_123", "cache_control": {"type": "ephemeral"}},
)

5. GraphAgentState — Agent Run State Internals


Module: pydantic_ai._agent_graph
Import (internal): from pydantic_ai._agent_graph import GraphAgentState

GraphAgentState is the mutable state dataclass that the pydantic_graph runtime threads through UserPromptNode → ModelRequestNode → CallToolsNode on every step. Understanding it is key for building custom graph runners or interpreting low-level diagnostics.

@dataclasses.dataclass(kw_only=True)
class GraphAgentState:
    """State kept across the execution of the agent graph."""

    message_history: list[ModelMessage] = field(default_factory=list)
    usage: RunUsage = field(default_factory=RunUsage)
    output_retries_used: int = 0
    run_step: int = 0
    run_id: str = field(default_factory=lambda: str(uuid7()))
    conversation_id: str = field(default_factory=lambda: str(uuid7()))
    metadata: dict[str, Any] | None = None
    last_max_tokens: int | None = None
    last_model_request_parameters: ModelRequestParameters | None = None
    pending_messages: list[PendingMessage] = field(default_factory=list)

    def check_incomplete_tool_call(self) -> None: ...
    def consume_output_retry(
        self,
        max_output_retries: int,
        error: BaseException | None = None,
    ) -> None: ...

| Field | Purpose |
| --- | --- |
| message_history | Accumulated ModelRequest/ModelResponse list for the current run |
| usage | Aggregated RunUsage summed across all model calls so far |
| output_retries_used | Counter of output validator retries; checked against max_output_retries |
| run_step | Incremented on each ModelRequestNode execution; useful for observability |
| run_id | UUID7 for the current agent run; matches RunContext.run_id |
| conversation_id | UUID7 spanning multi-run conversations; matches RunContext.conversation_id |
| metadata | App-level metadata dict threaded through the run |
| last_max_tokens | Stored to produce accurate token-limit exceeded error messages |
| last_model_request_parameters | Last ModelRequestParameters for OTel span attributes |
| pending_messages | Internal queue for RunContext.enqueue() / AgentRun.enqueue() |

import asyncio
from pydantic_ai import Agent
from pydantic_ai._agent_graph import GraphAgentState

agent = Agent('openai:gpt-4o-mini')

async def track_state():
    async with agent.iter("What is the capital of France?") as run:
        async for node in run:
            # Access state via the graph run context
            state: GraphAgentState = run.ctx.state
            print(
                f"step={state.run_step} "
                f"msgs={len(state.message_history)} "
                f"tokens_so_far={state.usage.total_tokens}"
            )
    print(f"Final run_id: {state.run_id}")
    print(f"Conversation ID: {state.conversation_id}")

asyncio.run(track_state())

check_incomplete_tool_call() — detecting token-limit truncation


The framework calls this automatically, but you can call it yourself when inspecting saved state:

from pydantic_ai._agent_graph import GraphAgentState
from pydantic_ai.exceptions import IncompleteToolCall

# Load a saved state snapshot (load_state_snapshot is your own helper)
state = load_state_snapshot()
try:
    state.check_incomplete_tool_call()
except IncompleteToolCall as e:
    # Last model response was truncated mid-tool-call JSON
    print(f"Truncated tool call detected: {e}")
    # Increase max_tokens and re-run, or simplify the prompt

consume_output_retry() — retry budget enforcement

from pydantic_ai._agent_graph import GraphAgentState
from pydantic_ai.exceptions import UnexpectedModelBehavior

state = GraphAgentState()
# Simulates what CallToolsNode does when output validation fails
try:
    state.consume_output_retry(max_output_retries=3)
    state.consume_output_retry(max_output_retries=3)
    state.consume_output_retry(max_output_retries=3)
    state.consume_output_retry(max_output_retries=3)  # raises on the 4th call
except UnexpectedModelBehavior:
    print("Exceeded 3 output retries — abort")
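
The budget check itself is simple enough to sketch in isolation. The version below assumes the counter is incremented before it is compared with the allowance, which is one way to get the "raises on the 4th call" behaviour shown above; RetryBudget and RetryBudgetExceeded are hypothetical names, and the real method raises UnexpectedModelBehavior.

```python
from dataclasses import dataclass


class RetryBudgetExceeded(RuntimeError):
    """Hypothetical error type used only in this sketch."""


@dataclass
class RetryBudget:
    used: int = 0

    def consume(self, max_retries: int) -> None:
        # Count the attempt first, then compare against the allowance,
        # so max_retries=3 permits three retries and fails on the fourth.
        self.used += 1
        if self.used > max_retries:
            raise RetryBudgetExceeded(f'exceeded {max_retries} retries')


budget = RetryBudget()
for _ in range(3):
    budget.consume(max_retries=3)  # within budget

try:
    budget.consume(max_retries=3)  # fourth attempt
    exceeded = False
except RetryBudgetExceeded:
    exceeded = True
```

Storing the counter on the run state rather than on a node means the budget is shared across every step of a single run.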

6. MCPServerTool — Native MCP Server Integration


Module: pydantic_ai.native_tools
Import: from pydantic_ai import MCPServerTool

MCPServerTool is a native tool that tells a model to connect to an MCP server at the network level, offloading tool discovery and invocation entirely to the provider. This is distinct from MCPToolset (which manages MCP tool calls inside PydanticAI); MCPServerTool delegates execution directly to the provider’s native MCP support.

@dataclass(kw_only=True)
class MCPServerTool(AbstractNativeTool):
    """A native tool that allows your agent to use MCP servers.

    Supported by: OpenAI Responses, Anthropic, xAI
    """

    id: str                                  # unique identifier for this server
    url: str                                 # MCP server URL
    authorization_token: str | None = None   # Bearer token for auth
    description: str | None = None           # server description for the model
    allowed_tools: list[str] | None = None   # restrict which MCP tools are exposed
    headers: dict[str, str] | None = None    # custom HTTP headers
    kind: str = 'mcp_server'

    @property
    def unique_id(self) -> str:
        return f'mcp_server:{self.id}'

    @property
    def label(self) -> str:
        return f'MCP: {self.id}'

Provider support matrix (columns: OpenAI Responses, Anthropic, xAI; rows: url, authorization_token, description, allowed_tools, headers). An OpenAI connector_id is supplied via the url prefix.

from pydantic_ai import Agent
from pydantic_ai.capabilities import NativeTool
from pydantic_ai import MCPServerTool

agent = Agent(
    'openai-responses:gpt-4o',  # MCPServerTool needs the OpenAI Responses API
    capabilities=[
        NativeTool(
            MCPServerTool(
                id="my-db-mcp",
                url="https://my-mcp-server.example.com/mcp",
                authorization_token="sk-...",  # token only, without a "Bearer " prefix
                allowed_tools=["query_database", "list_tables"],
                description="Internal database MCP server",
            )
        )
    ],
)
result = agent.run_sync("List all tables in the database")
print(result.output)

from pydantic_ai import Agent, MCPServerTool
from pydantic_ai.capabilities import NativeTool

agent = Agent(
    'anthropic:claude-opus-4-5',
    capabilities=[
        NativeTool(
            MCPServerTool(
                id="search-mcp",
                url="https://search.example.com/mcp",
                authorization_token="sk-search-token",
            )
        ),
        NativeTool(
            MCPServerTool(
                id="calendar-mcp",
                url="https://calendar.example.com/mcp",
                authorization_token="sk-calendar-token",
                allowed_tools=["list_events", "create_event"],
            )
        ),
    ],
)

OpenAI Responses supports managed MCP servers via connector IDs. Pass the connector ID with the x-openai-connector: prefix:

# Managed connector
MCPServerTool(
    id="openai-managed-server",
    url="x-openai-connector:<your_connector_id>",
    allowed_tools=["search", "summarize"],
)

# Custom HTTP headers for an enterprise server
MCPServerTool(
    id="enterprise-mcp",
    url="https://internal.corp.com/mcp",
    headers={
        "X-Tenant-ID": "acme-corp",
        "X-Service-Account": "pydantic-ai-agent",
    },
    authorization_token="<service-account-token>",
)

| Aspect | MCPServerTool | MCPToolset |
| --- | --- | --- |
| Execution location | Provider’s infrastructure | Your Python process |
| Tool discovery | Provider handles it | PydanticAI fetches tool list at startup |
| Supported providers | OpenAI, Anthropic, xAI | All (provider-agnostic) |
| Observability | Via provider logs | Full PydanticAI OTel traces |
| HITL / approval | Provider-only | ApprovalRequiredToolset wrapper |
| Transport | Provider-managed | SSE, HTTP, stdio (configurable) |

7. FileSearchTool — Native RAG File Search

Module: pydantic_ai.native_tools
Import: from pydantic_ai import FileSearchTool

FileSearchTool gives the model access to a fully managed vector-search RAG system backed by the provider’s file storage infrastructure. It handles chunking, embedding generation, and context injection, requiring only file store IDs from you.

@dataclass(kw_only=True)
class FileSearchTool(AbstractNativeTool):
    """A native tool that allows your agent to search through uploaded files.

    Supported by: OpenAI Responses, Google (Gemini), xAI
    """

    file_store_ids: Sequence[str]
    # OpenAI: vector store IDs created via OpenAI API
    # Google: file search store names from Gemini Files API
    # xAI: collection IDs for xAI collections search
    kind: str = 'file_search'

OpenAI vector stores:

from openai import OpenAI

client = OpenAI()
# 1. Create a vector store
store = client.vector_stores.create(name="product-docs")
# 2. Upload files
with open("manual.pdf", "rb") as f:
    client.vector_stores.file_batches.upload_and_poll(
        vector_store_id=store.id,
        files=[("manual.pdf", f, "application/pdf")],
    )

Google Gemini Files API:

import google.generativeai as genai

genai.configure(api_key="...")
file_ref = genai.upload_file("docs/guide.pdf")
store_name = file_ref.name  # e.g. "files/abc123"

import asyncio
from pydantic_ai import Agent, FileSearchTool
from pydantic_ai.capabilities import NativeTool

VECTOR_STORE_ID = "vs_abc123"

agent = Agent(
    'openai-responses:gpt-4o',  # FileSearchTool needs the OpenAI Responses API
    capabilities=[
        NativeTool(
            FileSearchTool(file_store_ids=[VECTOR_STORE_ID])
        )
    ],
)

async def main():
    result = await agent.run(
        "What does the product manual say about warranty coverage?"
    )
    print(result.output)

asyncio.run(main())

# Multiple OpenAI vector stores
agent = Agent(
    'openai-responses:gpt-4o',
    capabilities=[
        NativeTool(
            FileSearchTool(
                file_store_ids=[
                    "vs_product_docs",
                    "vs_support_tickets",
                    "vs_legal_contracts",
                ]
            )
        )
    ],
)

# Google Gemini
agent = Agent(
    'google-gla:gemini-2.0-flash',
    capabilities=[
        NativeTool(
            FileSearchTool(
                file_store_ids=["files/abc123", "files/def456"]
            )
        )
    ],
)

# xAI collections
agent = Agent(
    'xai:grok-3',
    capabilities=[
        NativeTool(
            FileSearchTool(
                file_store_ids=["collection_id_1", "collection_id_2"]
            )
        )
    ],
)

Comparing FileSearchTool vs custom FunctionToolset RAG

| Aspect | FileSearchTool | Custom FunctionToolset RAG |
| --- | --- | --- |
| Infrastructure | Provider-managed | Your embedding DB + retrieval code |
| Cross-provider | OpenAI / Google / xAI only | Any model |
| Chunking strategy | Provider default | Fully configurable |
| Re-ranking | Provider-managed | Configurable |
| Latency | Provider-optimised | Your infrastructure |
| Cost transparency | Provider billing | Your embedding costs |

8. IncludeReturnSchemasToolset — Auto Return-Schema Injection


Module: pydantic_ai.toolsets.include_return_schemas
Import: from pydantic_ai import IncludeReturnSchemasToolset

IncludeReturnSchemasToolset is a PreparedToolset subclass that sets include_return_schema=True on every ToolDefinition that doesn’t already have an explicit setting. This exposes each tool’s return-type JSON schema to the model, so it knows the shape of the value a tool call will produce. That is useful for models that support structured tool outputs and for improving type safety in multi-step pipelines.

@dataclass(init=False)
class IncludeReturnSchemasToolset(PreparedToolset[AgentDepsT]):
    """A toolset that sets include_return_schema=True on all its tools.

    Wraps any AbstractToolset and injects include_return_schema=True
    into every ToolDefinition whose include_return_schema is still None.
    """

    def __init__(self, wrapped: AbstractToolset[AgentDepsT]) -> None: ...

Internally, it passes the wrapped toolset to PreparedToolset along with an async _include function that iterates over the tool definitions and calls dataclasses.replace(td, include_return_schema=True) for any td whose include_return_schema is None.
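
The replace-if-None step can be sketched with the standard library alone. ToolDef and include_return_schemas below are hypothetical stand-ins for ToolDefinition and the _include function, and the sketch is synchronous for brevity whereas the description above says the real function is async.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToolDef:  # hypothetical stand-in for ToolDefinition
    name: str
    include_return_schema: Optional[bool] = None


def include_return_schemas(tool_defs):
    """Default include_return_schema to True, leaving explicit settings alone."""
    return [
        replace(td, include_return_schema=True) if td.include_return_schema is None else td
        for td in tool_defs
    ]


defs = [ToolDef('add'), ToolDef('secret', include_return_schema=False)]
result = include_return_schemas(defs)
```

Using dataclasses.replace produces new objects instead of mutating the originals, so the wrapped toolset's own definitions remain unchanged, and a tool that explicitly opted out with False keeps that setting.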

from pydantic_ai import Agent
from pydantic_ai.toolsets import FunctionToolset
from pydantic_ai import IncludeReturnSchemasToolset
from pydantic import BaseModel

class WeatherReport(BaseModel):
    temperature_c: float
    condition: str
    humidity_percent: int

toolset = FunctionToolset()

@toolset.tool
def get_weather(city: str) -> WeatherReport:
    """Get current weather for a city."""
    return WeatherReport(temperature_c=22.5, condition="sunny", humidity_percent=45)

agent = Agent(
    'openai:gpt-4o',
    toolsets=[IncludeReturnSchemasToolset(toolset)],
)
result = agent.run_sync("What's the weather in Paris?")

from pydantic_ai import Agent, FilteredToolset, IncludeReturnSchemasToolset
from pydantic_ai.toolsets import FunctionToolset

admin_toolset = FunctionToolset()
# ... register admin tools ...
user_context_toolset = FilteredToolset(
    admin_toolset,
    filter_func=lambda ctx, td: td.name in ctx.deps["allowed_tools"],
)
# Include return schemas on the filtered set
typed_toolset = IncludeReturnSchemasToolset(user_context_toolset)
agent = Agent('openai:gpt-4o', toolsets=[typed_toolset])

include_return_schema=True is most useful when:

  1. Chaining tools — downstream tools use the typed output of upstream tools
  2. Structured output pipelines — you want the model to reason about data structure
  3. OpenAI Structured Outputs — the model’s response_format is json_schema and tools should match
# Without IncludeReturnSchemasToolset: tool return schema omitted from model request
# With IncludeReturnSchemasToolset: each ToolDefinition.json_schema includes a
# "return" key with the full Pydantic JSON schema for the return type
# Inspect the resulting tool definitions:
import asyncio

from pydantic_ai import IncludeReturnSchemasToolset
from pydantic_ai.models.test import TestModel
from pydantic_ai.tools import RunContext
from pydantic_ai.toolsets import FunctionToolset
from pydantic_ai.usage import RunUsage

base = FunctionToolset()


@base.tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b


wrapped = IncludeReturnSchemasToolset(base)


async def inspect():
    # Minimal ctx for inspection. Assumption: a TestModel and an empty RunUsage
    # are enough to construct a RunContext outside an agent run.
    ctx = RunContext(deps=None, model=TestModel(), usage=RunUsage())
    # get_tools() returns a mapping of tool name to toolset tool
    tools = await wrapped.get_tools(ctx)
    for name, tool in tools.items():
        print(name, "include_return_schema:", tool.tool_def.include_return_schema)
        # add include_return_schema: True


asyncio.run(inspect())

9. ToolChoice + ToolOrOutput — Tool Selection Control


Module: pydantic_ai.settings
Import: from pydantic_ai.settings import ToolChoice, ToolOrOutput

ToolChoice is a type alias that controls, per request, how the model selects between the available tools and output modes. ToolOrOutput is a dataclass that restricts which function tools the model may call while keeping output tools and text/image output available.

ToolChoiceScalar = Literal['none', 'required', 'auto']


@dataclass
class ToolOrOutput:
    """Restricts function tools while keeping output tools and text/image output available."""

    function_tools: list[str]  # names of the function tools the model may call


ToolChoice = ToolChoiceScalar | list[str] | ToolOrOutput | None
| Value | Behaviour |
| --- | --- |
| None | Default; model decides which tool to use (equivalent to 'auto') |
| 'auto' | Model may call any tool or produce text/output |
| 'required' | Model must call at least one tool before finishing |
| 'none' | Model must not call any tools; forces text/output response |
| list[str] | Model must call exactly one of these named function tools |
| ToolOrOutput(...) | Named function tools available plus output tools and text/image |
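The function-tool side of the table above can be sketched as a small lookup. This is a standalone illustration, not how pydantic_ai resolves tool_choice internally: the local ToolOrOutput mirrors the dataclass shown earlier, allowed_function_tools is a hypothetical helper, and output tools and text output are deliberately left out of scope.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToolOrOutput:
    """Local stand-in mirroring the dataclass shown earlier."""

    function_tools: list[str]


def allowed_function_tools(
    tool_choice: str | list[str] | ToolOrOutput | None,
    registered: list[str],
) -> list[str]:
    """Return the registered function tools the model may call (hypothetical helper)."""
    if isinstance(tool_choice, ToolOrOutput):
        # Named function tools only; output tools stay available separately.
        return [name for name in registered if name in tool_choice.function_tools]
    if isinstance(tool_choice, list):
        # Only the named function tools may be called.
        return [name for name in registered if name in tool_choice]
    if tool_choice == "none":
        return []
    # None, 'auto' and 'required' leave every registered function tool callable.
    return list(registered)


registered = ["search", "get_info", "delete_user"]
```

The difference between list[str] and ToolOrOutput does not show up in this helper's return value; per the table, it lies in whether output tools and text/image output remain available alongside the named function tools.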
from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

agent = Agent('openai:gpt-4o-mini')

# Force the model to always call a tool
result = agent.run_sync(
    "What is the weather in London?",
    model_settings=ModelSettings(tool_choice='required'),
)

# Allow only a specific tool
result = agent.run_sync(
    "Search for Python tutorials",
    model_settings=ModelSettings(tool_choice=['web_search']),
)

# Disable tools entirely (force text response)
result = agent.run_sync(
    "Tell me a joke",
    model_settings=ModelSettings(tool_choice='none'),
)

ToolOrOutput — mixing function tools with output


ToolOrOutput is useful when the model should be able to call only specific function tools or produce structured output, while the remaining registered tools stay unavailable:

from pydantic import BaseModel

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings, ToolOrOutput
from pydantic_ai.toolsets import FunctionToolset

toolset = FunctionToolset()


@toolset.tool
def search_knowledge_base(query: str) -> str:
    """Search internal KB."""
    return "relevant content..."


@toolset.tool
def escalate_to_human(reason: str) -> str:
    """Escalate the query to a human agent."""
    return "escalated"


class FinalAnswer(BaseModel):
    answer: str
    confidence: float


agent = Agent(
    'openai:gpt-4o',
    output_type=FinalAnswer,
    toolsets=[toolset],
)

# Allow only 'search_knowledge_base' as a function tool,
# but the model can still use the output tool to produce FinalAnswer
result = agent.run_sync(
    "What is our refund policy?",
    model_settings=ModelSettings(
        tool_choice=ToolOrOutput(function_tools=['search_knowledge_base'])
    ),
)

# Setting tool_choice from a capability
from pydantic_ai import Agent
from pydantic_ai.capabilities import AbstractCapability
from pydantic_ai.models import ModelRequestContext
from pydantic_ai.settings import ModelSettings


class ForceSearchCapability(AbstractCapability):
    """Forces the model to call search tools on the first step."""

    async def before_model_request(
        self,
        messages,
        info: ModelRequestContext,
    ) -> ModelSettings | None:
        if info.run_step == 0:
            return ModelSettings(tool_choice='required')
        return None


agent = Agent('openai:gpt-4o', capabilities=[ForceSearchCapability()])

# Choosing tool_choice per run from the dependencies
from pydantic_ai import Agent, RunContext
from pydantic_ai.settings import ModelSettings


async def get_model_settings(ctx: RunContext[dict]) -> ModelSettings | None:
    user_role = ctx.deps.get("role", "user")
    if user_role == "admin":
        # Admins can call any tool
        return ModelSettings(tool_choice='auto')
    else:
        # Regular users can only call read-only tools
        return ModelSettings(tool_choice=['search', 'get_info'])


agent = Agent(
    'openai:gpt-4o',
    model_settings=get_model_settings,  # callable form
)

10. ServiceTier + ThinkingLevel / ThinkingEffort — Cross-Provider Config Type Aliases


Module: pydantic_ai.settings
Import: from pydantic_ai.settings import ServiceTier, ThinkingLevel, ThinkingEffort

These type aliases centralise cross-provider configuration that would otherwise require provider-specific settings. All three are defined in pydantic_ai.settings and consumed via ModelSettings.

ServiceTier: TypeAlias = Literal['auto', 'default', 'flex', 'priority']
ThinkingEffort: TypeAlias = Literal['minimal', 'low', 'medium', 'high', 'xhigh']
ThinkingLevel: TypeAlias = bool | ThinkingEffort
# True → enable thinking with provider default effort
# False → disable thinking (silently ignored on always-on models)
# 'minimal' / 'low' / 'medium' / 'high' / 'xhigh' → specific effort level
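
A minimal sketch of how calling code might split a ThinkingLevel value into an on/off flag and an optional effort, following the three comments above. The aliases are redeclared locally so the snippet runs on its own, and split_thinking is a hypothetical helper, not part of pydantic_ai.

```python
from __future__ import annotations

from typing import Literal, Union

# Local copies of the aliases shown above, so the sketch is self-contained.
ThinkingEffort = Literal['minimal', 'low', 'medium', 'high', 'xhigh']
ThinkingLevel = Union[bool, ThinkingEffort]


def split_thinking(level: ThinkingLevel) -> tuple[bool, str | None]:
    """Split a ThinkingLevel into (enabled, effort).

    An effort of None means "use the provider's default effort".
    """
    if isinstance(level, bool):
        # True enables thinking with the default effort, False disables it.
        return level, None
    # Any effort string implies that thinking is enabled.
    return True, level
```

Checking for bool first keeps the two halves of the union apart: split_thinking(True) gives (True, None), while split_thinking('high') gives (True, 'high').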

ServiceTier — cross-provider billing tier control


ServiceTier provides a unified way to select a processing tier across providers that support tiered billing, without resorting to provider-specific settings:

| Value | OpenAI | Anthropic | Bedrock | Google Gemini API | Google Cloud |
| --- | --- | --- | --- | --- | --- |
| 'auto' | 'auto' | 'auto' | (omitted) | (omitted) | PT then on-demand |
| 'default' | 'default' | 'standard_only' | {'type': 'default'} | 'standard' | PT then on-demand |
| 'flex' | 'flex' | (omitted) | {'type': 'flex'} | 'flex' | PT then Flex PayGo |
| 'priority' | 'priority' | (omitted) | {'type': 'priority'} | 'priority' | PT then Priority PayGo |
from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Cost-optimised batch processing
agent_flex = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(service_tier='flex'),
)

# Low-latency customer-facing requests
agent_priority = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(service_tier='priority'),
)


# Adapt tier based on request priority
async def get_settings(ctx) -> ModelSettings | None:
    if ctx.deps.get("is_urgent"):
        return ModelSettings(service_tier='priority')
    return ModelSettings(service_tier='flex')


agent_dynamic = Agent('google-gla:gemini-2.0-flash', model_settings=get_settings)

Per-provider overrides (openai_service_tier, anthropic_service_tier, etc.) always take precedence over the unified service_tier when both are set.
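
The precedence rule can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the library's resolution code: TIER_MAP copies only the OpenAI and Anthropic columns of the table above, None stands in for "(omitted)", and resolve_service_tier is a hypothetical helper.

```python
from __future__ import annotations

# Unified tier -> provider value, copied from the OpenAI and Anthropic columns.
# None stands for "(omitted)": the field is left out of the provider request.
TIER_MAP: dict[str, dict[str, str | None]] = {
    "openai": {"auto": "auto", "default": "default", "flex": "flex", "priority": "priority"},
    "anthropic": {"auto": "auto", "default": "standard_only", "flex": None, "priority": None},
}


def resolve_service_tier(settings: dict, provider: str) -> str | None:
    """Pick the tier value to send to a provider (hypothetical helper)."""
    # A provider-specific setting such as openai_service_tier wins outright.
    override = settings.get(f"{provider}_service_tier")
    if override is not None:
        return override
    # Otherwise translate the unified service_tier, if one was given.
    unified = settings.get("service_tier")
    if unified is None:
        return None
    return TIER_MAP[provider][unified]
```

With settings of {'service_tier': 'flex', 'openai_service_tier': 'priority'} the OpenAI request would use 'priority', while the same unified 'flex' on Anthropic resolves to None and is simply omitted.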

ThinkingLevel — cross-provider extended thinking control


ThinkingLevel wraps both boolean on/off control and granular effort levels into a single field:

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Enable with provider default effort
agent = Agent(
    'anthropic:claude-opus-4-5',
    model_settings=ModelSettings(thinking=True),
)

# Disable thinking (no-op on always-on models like o1/o3)
agent = Agent(
    'openai:o1',
    model_settings=ModelSettings(thinking=False),  # silently ignored
)

# Specific effort level
agent = Agent(
    'anthropic:claude-opus-4-5',
    model_settings=ModelSettings(thinking='high'),
)
result = agent.run_sync("Prove Fermat's Last Theorem step by step")
print(result.output)

When an exact ThinkingEffort level isn’t supported by a provider, PydanticAI maps to the nearest available level:

| Effort | Anthropic | OpenAI (o-series) | Google (Gemini) |
| --- | --- | --- | --- |
| 'minimal' | low budget tokens | not supported → 'low' | dynamic |
| 'low' | low budget tokens | (omitted, default) | dynamic |
| 'medium' | medium budget tokens | medium reasoning_effort | dynamic |
| 'high' | high budget tokens | high reasoning_effort | high |
| 'xhigh' | max budget tokens | 'high' on providers without xhigh | high |
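
The nearest-level fallback can be sketched as a distance lookup over the ordered effort names. This is an illustration only: nearest_effort is a hypothetical helper, and breaking ties towards the lower effort is an assumption made here, not documented library behaviour.

```python
from __future__ import annotations

# Effort names in ascending order, as listed in ThinkingEffort.
EFFORT_ORDER = ["minimal", "low", "medium", "high", "xhigh"]


def nearest_effort(requested: str, supported: list[str]) -> str:
    """Return the supported effort closest to the requested one (hypothetical helper)."""
    if requested in supported:
        return requested
    want = EFFORT_ORDER.index(requested)
    # Scan candidates in ascending order; min() keeps the first minimum,
    # so a tie resolves to the lower effort (assumption).
    ranked = sorted(supported, key=EFFORT_ORDER.index)
    return min(ranked, key=lambda level: abs(EFFORT_ORDER.index(level) - want))


# A provider that only understands three levels, like the OpenAI column above.
three_levels = ["low", "medium", "high"]
```

Against three_levels, a request for 'minimal' resolves to 'low' and 'xhigh' resolves to 'high', matching the corresponding rows of the table.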
from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings


# Dynamic effort selection based on task complexity
async def adaptive_thinking(ctx) -> ModelSettings | None:
    complexity = ctx.deps.get("complexity_score", 0.5)
    if complexity > 0.8:
        effort = 'xhigh'
    elif complexity > 0.5:
        effort = 'high'
    elif complexity > 0.2:
        effort = 'medium'
    else:
        effort = 'low'
    return ModelSettings(thinking=effort)


agent = Agent(
    'anthropic:claude-opus-4-5',
    model_settings=adaptive_thinking,
)

Combining ServiceTier + ThinkingLevel for cost management

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# High-accuracy, higher-cost pipeline
accuracy_agent = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(
        service_tier='priority',
        thinking='xhigh',
    ),
)

# Cost-optimised background processing
batch_agent = Agent(
    'openai:gpt-4o-mini',
    model_settings=ModelSettings(
        service_tier='flex',
        thinking=False,
    ),
)

| # | Class(es) | Module | Key takeaways |
| --- | --- | --- | --- |
| 1 | ModelRequest + ModelResponse | pydantic_ai.messages | Wire-format anatomy; FinishReason/ModelResponseState type aliases; run_id/conversation_id threading; metadata never sent to LLM; ModelMessagesTypeAdapter for ser/de |
| 2 | SystemPromptPart + UserPromptPart + RetryPromptPart | pydantic_ai.messages | Three request-side part types; multi-modal UserContent in UserPromptPart; dynamic_ref for OTel attribution; RetryPromptPart.model_response() formats validation errors |
| 3 | BaseToolCallPart + ToolCallPart + NativeToolCallPart | pydantic_ai.messages | Call-part family; args_as_dict(raise_if_invalid=) graceful/strict JSON parsing; narrow_type() for typed subclass promotion; ToolCallPartDelta for streaming accumulation |
| 4 | BaseToolReturnPart + ToolReturnPart + NativeToolReturnPart | pydantic_ai.messages | Return-part family; outcome tracks success/failed/denied; content_items(mode=) for serialisation; files property for multi-modal extraction; provider_details round-trip |
| 5 | GraphAgentState | pydantic_ai._agent_graph | Run-state internals; run_id/conversation_id; check_incomplete_tool_call() for token-limit detection; consume_output_retry() budget enforcement; pending_messages queue |
| 6 | MCPServerTool | pydantic_ai.native_tools | Native MCP server (OpenAI/Anthropic/xAI); allowed_tools restriction; headers for enterprise auth; OpenAI connector ID via x-openai-connector: prefix; vs MCPToolset comparison |
| 7 | FileSearchTool | pydantic_ai.native_tools | Provider-managed RAG (OpenAI vector stores / Gemini Files API / xAI collections); file_store_ids parameter; zero-code chunking + embedding; vs custom RAG comparison |
| 8 | IncludeReturnSchemasToolset | pydantic_ai.toolsets | PreparedToolset subclass; auto-injects include_return_schema=True; composable with FilteredToolset; useful for structured tool-output pipelines |
| 9 | ToolChoice + ToolOrOutput | pydantic_ai.settings | ToolChoiceScalar ('auto'/'required'/'none') + list[str] + ToolOrOutput + None; ToolOrOutput.function_tools restricts function tools while allowing output tools; set via ModelSettings.tool_choice |
| 10 | ServiceTier + ThinkingLevel / ThinkingEffort | pydantic_ai.settings | Cross-provider type aliases; ServiceTier maps to provider-specific billing tiers; ThinkingLevel unifies bool + 5 effort levels; per-provider settings override unified field; combine for cost management |

All examples verified against pydantic-ai 1.105.0 source.