PydanticAI — Class Deep Dives Vol. 9


Ten class groups from the pydantic_ai 1.105.0 source covering: the complete wire-format anatomy of ModelRequest and ModelResponse; the three request-side message part types (SystemPromptPart, UserPromptPart, RetryPromptPart); the call-part family (BaseToolCallPart, ToolCallPart, NativeToolCallPart) with typed-subclass promotion and streaming helpers; the return-part family (BaseToolReturnPart, ToolReturnPart, NativeToolReturnPart) with multi-modal content splitting and outcome tracking; GraphAgentState run-state internals; two new native tool types (MCPServerTool and FileSearchTool); IncludeReturnSchemasToolset for automatic tool-return-schema injection; ToolChoice + ToolOrOutput for surgical tool/output mode control; and the ServiceTier + ThinkingLevel type aliases that centralise cross-provider configuration.

1. `ModelRequest` + `ModelResponse` — Wire-Format Message Anatomy

Module: pydantic_ai.messages
Import: from pydantic_ai import ModelRequest, ModelResponse

Every conversation between PydanticAI and a model is represented as a list[ModelMessage] where ModelMessage = ModelRequest | ModelResponse. Understanding these two dataclasses in depth lets you parse, inspect, replay, or surgically modify conversation history.

Class signatures

from collections.abc import Sequence
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Literal

@dataclass(repr=False)
class ModelRequest:
    """A request generated by PydanticAI and sent to a model."""

    parts: Sequence[ModelRequestPart]
    # ModelRequestPart = SystemPromptPart | UserPromptPart | ToolReturnPart
    #                  | NativeToolReturnPart | RetryPromptPart | InstructionPart

    timestamp: datetime | None = None         # when the request was sent
    instructions: str | None = None           # rendered instruction string (for logging/debugging)
    kind: Literal['request'] = 'request'      # discriminator
    run_id: str | None = None                 # UUID7 for the agent run
    conversation_id: str | None = None        # UUID7 spanning multiple runs
    metadata: dict[str, Any] | None = None    # app-only data, never sent to LLM

    @classmethod
    def user_text_prompt(
        cls,
        user_prompt: str,
        *,
        instructions: str | None = None,
    ) -> 'ModelRequest': ...


@dataclass(repr=False)
class ModelResponse:
    """A response from a model."""

    parts: Sequence[ModelResponsePart]
    # ModelResponsePart = TextPart | ThinkingPart | ToolCallPart | NativeToolCallPart
    #                   | FilePart | CompactionPart

    usage: RequestUsage = field(default_factory=RequestUsage)
    model_name: str | None = None             # e.g. 'gpt-4o-2024-08-06'
    timestamp: datetime = field(default_factory=now_utc)
    finish_reason: FinishReason | None = None # 'stop' | 'tool_calls' | 'length' | 'content_filter' | ...
    state: ModelResponseState | None = None   # 'complete' | 'incomplete' | 'interrupted'
    kind: Literal['response'] = 'response'   # discriminator
    run_id: str | None = None
    conversation_id: str | None = None

    # FinishReason type alias
    # Literal['stop', 'tool_calls', 'length', 'content_filter', 'other']

    # ModelResponseState type alias
    # Literal['complete', 'incomplete', 'interrupted']

Field reference

Field	Type	Notes
`ModelRequest.parts`	`Sequence[ModelRequestPart]`	Ordered mix of system prompts, user prompts, tool returns, retries
`ModelRequest.instructions`	`str | None`	Rendered instruction string stored for debugging/logging only
`ModelRequest.run_id`	`str | None`	UUID7 shared with `RunContext.run_id` and OTel span
`ModelRequest.conversation_id`	`str | None`	UUID7 that spans multi-run conversations
`ModelRequest.metadata`	`dict[str, Any] | None`	App-only; never sent to LLM
`ModelResponse.finish_reason`	`FinishReason | None`	`'stop'` = clean end; `'tool_calls'` = pending calls; `'length'` = token limit hit; `'content_filter'` = blocked
`ModelResponse.state`	`ModelResponseState | None`	`'complete'` = all parts arrived; `'incomplete'` = interrupted mid-stream; `'interrupted'` = cancelled
`ModelResponse.usage`	`RequestUsage`	Per-request tokens (input, output, cache hit/miss, audio, details)
`ModelResponse.model_name`	`str | None`	Resolved model name as reported by the provider

Parsing message history

import asyncio
from pydantic_ai import Agent, capture_run_messages
from pydantic_ai.messages import (
    ModelRequest,
    ModelResponse,
    UserPromptPart,
    TextPart,
    ToolCallPart,
    ToolReturnPart,
)

agent = Agent('openai:gpt-4o-mini')

async def inspect_history():
    with capture_run_messages() as messages:
        result = await agent.run("What is 2 + 2?")

    for msg in messages:
        if isinstance(msg, ModelRequest):
            print(f"REQUEST  run_id={msg.run_id}  conv={msg.conversation_id}")
            for part in msg.parts:
                if isinstance(part, UserPromptPart):
                    print(f"  user: {part.content!r}")
        elif isinstance(msg, ModelResponse):
            print(f"RESPONSE finish={msg.finish_reason}  model={msg.model_name}")
            for part in msg.parts:
                if isinstance(part, TextPart):
                    print(f"  text: {part.content!r}")
                elif isinstance(part, ToolCallPart):
                    print(f"  tool_call: {part.tool_name}({part.args_as_json_str()})")

asyncio.run(inspect_history())

Replaying a request with modified metadata

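from dataclasses import replace
from pydantic_ai import Agent
from pydantic_ai.messages import ModelRequest

agent = Agent('openai:gpt-4o-mini')

async def replay_with_metadata():
    # load_messages_from_db is a placeholder for your own storage layer
    raw = load_messages_from_db(conversation_id="abc-123")

    # Find the index of the last request and tag a copy of it for replay
    last_index = max(i for i, m in enumerate(raw) if isinstance(m, ModelRequest))
    tagged = replace(raw[last_index], metadata={"replay": True, "analyst": "claude"})

    # Swap the tagged copy in at the same position so no message is dropped
    history = raw[:last_index] + [tagged] + raw[last_index + 1:]
    return await agent.run("Continue", message_history=history)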

Serialisation with `ModelMessagesTypeAdapter`

import json
from pydantic_ai import ModelMessagesTypeAdapter
from pydantic_ai.messages import ModelRequest, ModelResponse

# Serialise
messages: list[ModelRequest | ModelResponse] = ...
blob = ModelMessagesTypeAdapter.dump_json(messages)

# Deserialise (discriminated union on `kind` field)
restored = ModelMessagesTypeAdapter.validate_json(blob)

# To a Python list for storage
records = ModelMessagesTypeAdapter.dump_python(messages, mode='json')
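
To make the role of the `kind` discriminator concrete, here is a minimal stdlib-only sketch of the same round trip. `RequestStub`, `ResponseStub`, `encode_messages`, and `decode_messages` are hypothetical stand-ins invented for illustration, not PydanticAI classes; the real adapter performs this dispatch through Pydantic validation and handles far more fields.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RequestStub:
    """Hypothetical stand-in for ModelRequest (not the real class)."""
    parts: list
    kind: str = "request"


@dataclass
class ResponseStub:
    """Hypothetical stand-in for ModelResponse (not the real class)."""
    parts: list
    model_name: Optional[str] = None
    kind: str = "response"


# Map each discriminator value to the class that should be rebuilt from it.
KIND_TO_CLASS = {"request": RequestStub, "response": ResponseStub}


def encode_messages(messages: list) -> str:
    """Dump the stand-in messages to a JSON string."""
    return json.dumps([asdict(m) for m in messages])


def decode_messages(blob: str) -> list:
    """Rebuild each message by dispatching on its `kind` field."""
    restored = []
    for record in json.loads(blob):
        cls = KIND_TO_CLASS[record["kind"]]
        restored.append(cls(**record))
    return restored


history = [
    RequestStub(parts=["What is 2 + 2?"]),
    ResponseStub(parts=["4"], model_name="example-model"),
]
roundtrip = decode_messages(encode_messages(history))
```

Because every record carries its own `kind`, a mixed list can be stored as one JSON array and decoded without any external type hints.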

2. `SystemPromptPart` + `UserPromptPart` + `RetryPromptPart` — Request-Side Message Parts

Module: pydantic_ai.messages
Import: from pydantic_ai.messages import SystemPromptPart, UserPromptPart, RetryPromptPart

These three dataclasses represent the parts that can appear inside a ModelRequest. Each has a part_kind literal discriminator for serialisation.

Class signatures

@dataclass(repr=False)
class SystemPromptPart:
    content: str
    timestamp: datetime = field(default_factory=now_utc)
    dynamic_ref: str | None = None   # set when generated by @agent.system_prompt
    part_kind: Literal['system-prompt'] = 'system-prompt'


@dataclass(repr=False)
class UserPromptPart:
    content: str | Sequence[UserContent]
    # UserContent = str | TextContent | ImageUrl | AudioUrl | VideoUrl
    #             | DocumentUrl | BinaryContent | UploadedFile | CachePoint
    timestamp: datetime = field(default_factory=now_utc)
    part_kind: Literal['user-prompt'] = 'user-prompt'


@dataclass(repr=False)
class RetryPromptPart:
    content: list[pydantic_core.ErrorDetails] | str
    tool_name: str | None = None        # None for output validator retries
    tool_call_id: str = field(default_factory=generate_tool_call_id)
    timestamp: datetime = field(default_factory=now_utc)
    part_kind: Literal['retry-prompt'] = 'retry-prompt'

    def model_response(self) -> str: ...
    # Returns formatted error feedback string sent to the model

`SystemPromptPart` — when to construct manually

SystemPromptPart is normally created by the framework. You construct it manually when building ModelRequest objects for replay, testing, or direct API calls:

from pydantic_ai.messages import SystemPromptPart, UserPromptPart, ModelRequest

request = ModelRequest(
    parts=[
        SystemPromptPart(content="You are a helpful assistant."),
        UserPromptPart(content="Summarise this document."),
    ]
)

The dynamic_ref field is populated by the framework when a @agent.system_prompt function generates the part, allowing the OTel layer to attribute spans back to the source function.

`UserPromptPart` — multi-modal content

content accepts a heterogeneous sequence of UserContent items, enabling rich multi-modal prompts:

from pydantic_ai.messages import UserPromptPart
from pydantic_ai import ImageUrl, BinaryContent, TextContent

# Plain text
part = UserPromptPart(content="Describe this image:")

# Multi-modal: text + image URL
part = UserPromptPart(content=[
    "Describe this image:",
    ImageUrl(url="https://example.com/chart.png"),
])

# Multi-modal: text + base64 image
with open("chart.png", "rb") as f:
    data = f.read()
part = UserPromptPart(content=[
    TextContent(content="Analyse this chart:"),
    BinaryContent(data=data, media_type="image/png"),
])

# With a cache-point marker for Anthropic prompt caching
from pydantic_ai import CachePoint
part = UserPromptPart(content=[
    "Long static context...",
    CachePoint(),           # Insert cache boundary here
    "Dynamic question?",
])

`RetryPromptPart` — how retry feedback works

RetryPromptPart is generated by PydanticAI whenever validation fails or a ModelRetry exception is raised. The model_response() method formats the error into the string that is sent back to the model:

from pydantic_ai.messages import RetryPromptPart
import pydantic_core

# From a ModelRetry exception (string content)
retry = RetryPromptPart(
    content="The city name must be a real city. 'Faketown' is not valid.",
    tool_name="get_weather",
    tool_call_id="call_abc123",
)
print(retry.model_response())
# "The city name must be a real city. 'Faketown' is not valid.\n\nFix the errors and try again."

# From a Pydantic ValidationError (list[ErrorDetails] content)
retry = RetryPromptPart(
    content=[
        {
            "type": "missing",
            "loc": ("city",),
            "msg": "Field required",
            "input": {},
        }
    ],
    tool_name="get_weather",
)
print(retry.model_response())
# "1 validation error:\n```json\n[...]\n```\n\nFix the errors and try again."

# Output validator retry (tool_name=None strips redundant input from error details)
output_retry = RetryPromptPart(
    content="Output validation failed: value must be positive",
)
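
The two feedback shapes in the comments above can be reproduced with a small stdlib-only sketch. `format_retry_feedback` is a hypothetical stand-in, not the library method: it assumes plain `json.dumps` formatting, whereas the real `model_response()` serialises `ErrorDetails` through Pydantic and may differ in detail.

```python
import json

FENCE = "`" * 3  # built at runtime so the marker does not appear literally here
SUFFIX = "\n\nFix the errors and try again."


def format_retry_feedback(content) -> str:
    """Hypothetical stand-in mirroring the two output shapes shown above."""
    if isinstance(content, str):
        # String content (for example from ModelRetry) is passed through as-is.
        description = content
    else:
        # A list of error details becomes a counted, JSON-formatted block.
        plural = "" if len(content) == 1 else "s"
        errors_json = json.dumps(content, indent=2)
        description = (
            f"{len(content)} validation error{plural}:\n"
            f"{FENCE}json\n{errors_json}\n{FENCE}"
        )
    return description + SUFFIX


text_feedback = format_retry_feedback("The city name must be a real city.")
list_feedback = format_retry_feedback(
    [{"type": "missing", "loc": ["city"], "msg": "Field required", "input": {}}]
)
```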

Inspecting retries in captured messages

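from pydantic_ai import capture_run_messages
from pydantic_ai.messages import RetryPromptPart, ModelRequest

# `agent` is an Agent defined elsewhere; run this inside an async function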

with capture_run_messages() as messages:
    result = await agent.run("bad input")

retries = [
    part
    for msg in messages
    if isinstance(msg, ModelRequest)
    for part in msg.parts
    if isinstance(part, RetryPromptPart)
]
for r in retries:
    print(f"Tool: {r.tool_name!r}  Feedback: {r.model_response()[:80]}")

3. `BaseToolCallPart` + `ToolCallPart` + `NativeToolCallPart` — Call-Part Family

Module: pydantic_ai.messages
Import: from pydantic_ai.messages import BaseToolCallPart, ToolCallPart, NativeToolCallPart

When a model decides to call a tool it generates a ToolCallPart (for function tools) or NativeToolCallPart (for native tools such as web search). Both extend BaseToolCallPart which holds the shared fields.

Class signatures

@dataclass(repr=False)
class BaseToolCallPart:
    """Base class for all tool-call parts."""

    tool_name: str
    args: str | dict[str, Any] | None = None   # JSON string OR dict, depending on provider
    tool_call_id: str = field(default_factory=generate_tool_call_id)

    # Provider round-trip fields (only populated for native tools)
    tool_kind: ToolPartKind | None = None       # discriminator for typed subclasses
    id: str | None = None                       # provider-specific call ID (e.g. OpenAI Responses)
    provider_name: str | None = None            # required when id/provider_details is set
    provider_details: dict[str, Any] | None = None

    def args_as_dict(self, *, raise_if_invalid: bool = False) -> dict[str, Any]: ...
    def args_as_json_str(self) -> str: ...
    def has_content(self) -> bool: ...


@dataclass(repr=False)
class ToolCallPart(BaseToolCallPart):
    """A call to a user-defined function tool."""
    part_kind: Literal['tool-call'] = 'tool-call'

    @staticmethod
    def narrow_type(
        part: 'ToolCallPart',
        *,
        tool_kind: ToolPartKind | None = None,
    ) -> 'ToolCallPart': ...


@dataclass(repr=False)
class NativeToolCallPart(BaseToolCallPart):
    """A call to a native model tool (web search, code execution, etc.)."""
    part_kind: Literal['builtin-tool-call'] = 'builtin-tool-call'

    @staticmethod
    def narrow_type(
        part: 'NativeToolCallPart',
        *,
        tool_kind: ToolPartKind | None = None,
    ) -> 'NativeToolCallPart': ...

Reading tool call arguments

from pydantic_ai import capture_run_messages, Agent
from pydantic_ai.messages import ModelResponse, ToolCallPart, NativeToolCallPart

agent = Agent('openai:gpt-4o-mini', tools=[my_tool])

with capture_run_messages() as messages:
    await agent.run("Use the tool please")

for msg in messages:
    if isinstance(msg, ModelResponse):
        for part in msg.parts:
            if isinstance(part, ToolCallPart):
                # args may be a JSON string or a dict depending on provider
                args_dict = part.args_as_dict()          # always a dict
                args_json = part.args_as_json_str()      # always a JSON string
                print(f"{part.tool_name}({args_dict})  id={part.tool_call_id}")
            elif isinstance(part, NativeToolCallPart):
                print(f"native:{part.tool_name}  provider={part.provider_name}")

Typed subclass promotion with `narrow_type`

For native tools with a stable cross-provider schema (currently tool_search), NativeToolCallPart can be promoted to a typed subclass whose args is a narrowed TypedDict:

from pydantic_ai.messages import NativeToolCallPart

raw_part: NativeToolCallPart = ...   # from a tool_search native call

# Automatic promotion happens during Pydantic deserialisation.
# For manual construction or testing, use narrow_type():
narrowed = NativeToolCallPart.narrow_type(raw_part, tool_kind='tool-search')
# narrowed is now a NativeToolSearchCallPart with typed .args TypedDict

# If tool_kind is omitted, narrow_type falls back to the part's own tool_kind:
narrowed = NativeToolCallPart.narrow_type(raw_part)  # uses part.tool_kind
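
The promotion idea can be sketched without the library. Everything below is a hypothetical stand-in (`CallStub`, `SearchCallStub`, `TYPED_SUBCLASSES`, `narrow`), and the registry lookup is an assumption about how such a promotion could work, not a description of PydanticAI's internals.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class CallStub:
    """Hypothetical stand-in for a generic native tool-call part."""
    tool_name: str
    args: Optional[dict] = None
    tool_kind: Optional[str] = None


@dataclass
class SearchCallStub(CallStub):
    """Hypothetical typed subclass chosen when tool_kind == 'tool-search'."""


# Registry from tool_kind discriminator to the typed subclass.
TYPED_SUBCLASSES = {"tool-search": SearchCallStub}


def narrow(part: CallStub, tool_kind: Optional[str] = None) -> Any:
    """Promote `part` to a typed subclass, or return it unchanged."""
    # An explicit tool_kind wins; otherwise fall back to the part's own value.
    kind = tool_kind if tool_kind is not None else part.tool_kind
    target = TYPED_SUBCLASSES.get(kind)
    if target is None or isinstance(part, target):
        return part
    values = {f.name: getattr(part, f.name) for f in fields(part)}
    values["tool_kind"] = kind
    return target(**values)


generic = CallStub(tool_name="tool_search", args={"query": "weather"})
promoted = narrow(generic, tool_kind="tool-search")
untouched = narrow(CallStub(tool_name="web_search"))
```

Parts with no registered kind are returned as-is, so callers can apply the promotion to every part without checking first.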

Streaming delta accumulation with `ToolCallPartDelta`

During streaming, call arguments arrive incrementally as ToolCallPartDelta objects. When args_delta is a string fragment, you accumulate it by appending to the args string:

from pydantic_ai.messages import PartDeltaEvent, ToolCallPartDelta

# Accumulate streaming deltas (`agent` is defined elsewhere; run inside an async function)
accumulated_args = ""
async for event in agent.run_stream_events("call the tool"):
    if isinstance(event, PartDeltaEvent) and isinstance(event.delta, ToolCallPartDelta):
        # args_delta may be a str fragment or a dict; only str fragments concatenate
        if isinstance(event.delta.args_delta, str):
            accumulated_args += event.delta.args_delta
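
The accumulation step can be tried offline with a stdlib-only sketch. The `fragments` list is hypothetical sample data standing in for streamed string deltas; the point is that the buffer only becomes valid JSON once the last fragment has arrived.

```python
import json

# Hypothetical argument fragments, in the order a provider might stream them.
fragments = ['{"city": "Par', 'is", "uni', 'ts": "metric"}']

accumulated = ""
parsed = None
for fragment in fragments:
    accumulated += fragment
    try:
        # Partial JSON raises until the final fragment arrives.
        parsed = json.loads(accumulated)
    except json.JSONDecodeError:
        continue
```

Attempting a parse after every fragment is the simplest approach; for long argument payloads it may be cheaper to parse once, after the stream ends.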

`args_as_dict` error handling

from pydantic_ai.messages import ToolCallPart

# Default: graceful on malformed JSON
part = ToolCallPart(tool_name="my_tool", args='{"broken": ')
safe_dict = part.args_as_dict()
# Returns: {'INVALID_JSON': '{"broken": '}  — safe to pass to a model retry

# Strict: re-raises ValueError on malformed JSON
try:
    strict_dict = part.args_as_dict(raise_if_invalid=True)
except ValueError:
    # handle truncated tool call (e.g. token limit hit)
    ...
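
The lenient and strict behaviours described above can be illustrated with a stdlib-only sketch. `parse_args` is a hypothetical stand-in, not the library method, and it assumes the raw string is meant to encode a JSON object.

```python
import json
from typing import Any


def parse_args(raw: str, *, raise_if_invalid: bool = False) -> Any:
    """Hypothetical stand-in for the lenient/strict parsing described above."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        if raise_if_invalid:
            raise  # JSONDecodeError is a subclass of ValueError
        # Lenient mode: wrap the raw text so the caller still gets a dict.
        return {"INVALID_JSON": raw}


lenient = parse_args('{"broken": ')
valid = parse_args('{"city": "Paris"}')

try:
    parse_args('{"broken": ', raise_if_invalid=True)
    strict_raised = False
except ValueError:
    strict_raised = True
```

Lenient mode suits retry loops, where the raw text can be echoed back to the model; strict mode suits callers that want to detect truncation immediately.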

4. `BaseToolReturnPart` + `ToolReturnPart` + `NativeToolReturnPart` — Return-Part Family

Module: pydantic_ai.messages
Import: from pydantic_ai.messages import BaseToolReturnPart, ToolReturnPart, NativeToolReturnPart

After a tool executes, its result is wrapped in a ToolReturnPart (function tools) or NativeToolReturnPart (native tools) and placed in the next ModelRequest. Both extend BaseToolReturnPart which has the rich content-handling logic.

Class signatures

@dataclass(repr=False)
class BaseToolReturnPart:
    """Base class for all tool-return parts."""

    tool_name: str
    content: ToolReturnContent     # str | dict | list | MultiModalContent | ...
    tool_call_id: str = field(default_factory=generate_tool_call_id)
    tool_kind: ToolPartKind | None = None
    metadata: Any = None           # app-only; never sent to LLM
    timestamp: datetime = field(default_factory=now_utc)
    outcome: Literal['success', 'failed', 'denied'] = 'success'

    # Content accessors
    def model_response_str(self) -> str: ...
    def model_response_object(self) -> dict[str, Any]: ...
    def content_items(
        self, *, mode: Literal['raw', 'str', 'jsonable'] = 'raw'
    ) -> list: ...
    @property
    def files(self) -> list[MultiModalContent]: ...


@dataclass(repr=False)
class ToolReturnPart(BaseToolReturnPart):
    """Result from a user-defined function tool."""
    part_kind: Literal['tool-return'] = 'tool-return'

    @staticmethod
    def narrow_type(
        part: 'ToolReturnPart',
        *,
        tool_kind: ToolPartKind | None = None,
    ) -> 'ToolReturnPart': ...


@dataclass(repr=False)
class NativeToolReturnPart(BaseToolReturnPart):
    """Result from a native model tool."""
    provider_name: str | None = None
    provider_details: dict[str, Any] | None = None
    part_kind: Literal['builtin-tool-return'] = 'builtin-tool-return'

    @staticmethod
    def narrow_type(
        part: 'NativeToolReturnPart',
        *,
        tool_kind: ToolPartKind | None = None,
    ) -> 'NativeToolReturnPart': ...

`outcome` field — tracking approval and failures

from pydantic_ai.messages import ToolReturnPart

# Normal success
part = ToolReturnPart(tool_name="search", content="Results...")
assert part.outcome == 'success'

# Denied by HITL approval
denied = ToolReturnPart(
    tool_name="delete_file",
    content="Tool call denied by operator.",
    outcome='denied',
)

# Failed execution
failed = ToolReturnPart(
    tool_name="execute_sql",
    content="ERROR: table 'users' does not exist",
    outcome='failed',
)

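# Inspect from history (`agent` is defined elsewhere; run inside an async function)
from pydantic_ai import capture_run_messages
from pydantic_ai.messages import ModelRequest

with capture_run_messages() as messages:
    result = await agent.run("do something dangerous")
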
denials = [
    part
    for msg in messages
    if isinstance(msg, ModelRequest)
    for part in msg.parts
    if isinstance(part, ToolReturnPart) and part.outcome == 'denied'
]

Tools can return images, audio, or documents alongside text. The BaseToolReturnPart family handles splitting multi-modal content from scalar data:

from pydantic_ai import Agent, RunContext, capture_run_messages
from pydantic_ai.messages import BinaryContent

agent = Agent('anthropic:claude-opus-4-5')

@agent.tool
async def generate_chart(ctx: RunContext[None], data: list[float]) -> list:
    # Return both a description and the chart image
    # (create_chart is a placeholder for your own helper that returns PNG bytes)
    chart_bytes = create_chart(data)
    return [
        "Here is the chart:",
        BinaryContent(data=chart_bytes, media_type="image/png"),
    ]

# Inspect the return part files after the run
with capture_run_messages() as messages:
    result = await agent.run("Plot [1, 2, 3, 4, 5]")

from pydantic_ai.messages import ModelRequest, ToolReturnPart
for msg in messages:
    if isinstance(msg, ModelRequest):
        for part in msg.parts:
            if isinstance(part, ToolReturnPart):
                print(f"text: {part.model_response_str()!r}")
                print(f"files: {len(part.files)} file(s)")

`content_items` for fine-grained serialisation

from pydantic_ai.messages import BinaryContent, ToolReturnPart

part = ToolReturnPart(
    tool_name="analyze",
    content=[{"score": 0.95}, BinaryContent(data=b"...", media_type="image/png")],
)

# Raw items (no serialisation)
raw = part.content_items(mode='raw')

# Serialize non-file items to strings; pass BinaryContent through unchanged
str_items = part.content_items(mode='str')

# Serialize non-file items to JSON-compatible Python objects
json_items = part.content_items(mode='jsonable')
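
A stdlib-only sketch of the three modes follows. `FileStub` and `content_items_sketch` are hypothetical stand-ins, and plain `json` is assumed for serialisation; the library's own serialisation rules may differ.

```python
import json
from dataclasses import dataclass
from typing import Any

VALID_MODES = {"raw", "str", "jsonable"}


@dataclass(frozen=True)
class FileStub:
    """Hypothetical stand-in for a file-like item such as BinaryContent."""
    data: bytes
    media_type: str


def content_items_sketch(content: list, mode: str = "raw") -> list:
    """Apply the three modes described above to a mixed content list."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    items: list[Any] = []
    for item in content:
        if mode == "raw" or isinstance(item, FileStub):
            items.append(item)  # files always pass through unchanged
        elif mode == "str":
            items.append(item if isinstance(item, str) else json.dumps(item))
        else:  # "jsonable": round-trip through JSON to get plain Python objects
            items.append(json.loads(json.dumps(item)))
    return items


chart = FileStub(data=b"\x89PNG", media_type="image/png")
mixed = [{"score": 0.95}, ("a", "b"), chart]

raw_items = content_items_sketch(mixed, mode="raw")
str_items = content_items_sketch(mixed, mode="str")
json_items = content_items_sketch(mixed, mode="jsonable")
file_items = [item for item in mixed if isinstance(item, FileStub)]
```

The tuple in `mixed` shows the difference between the modes: `'str'` yields a JSON string, while `'jsonable'` yields a plain list.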

`NativeToolReturnPart.provider_details` — round-trip data

Native tools like web search may embed provider_details that must be sent back to the same provider on the next turn. PydanticAI handles this automatically; you only need to be aware when building custom providers:

from pydantic_ai.messages import NativeToolReturnPart

# Constructed by model implementations — provider_name is mandatory when provider_details is set
return_part = NativeToolReturnPart(
    tool_name="web_search",
    content="Search result text...",
    provider_name="anthropic",
    provider_details={"search_result_id": "srq_123", "cache_control": {"type": "ephemeral"}},
)

5. `GraphAgentState` — Agent Run State Internals

Module: pydantic_ai._agent_graph
Import (internal): from pydantic_ai._agent_graph import GraphAgentState

GraphAgentState is the mutable state dataclass that the pydantic_graph runtime threads through UserPromptNode → ModelRequestNode → CallToolsNode on every step. Understanding it is key for building custom graph runners or interpreting low-level diagnostics.

Class signature

@dataclasses.dataclass(kw_only=True)
class GraphAgentState:
    """State kept across the execution of the agent graph."""

    message_history: list[ModelMessage] = field(default_factory=list)
    usage: RunUsage = field(default_factory=RunUsage)
    output_retries_used: int = 0
    run_step: int = 0
    run_id: str = field(default_factory=lambda: str(uuid7()))
    conversation_id: str = field(default_factory=lambda: str(uuid7()))
    metadata: dict[str, Any] | None = None
    last_max_tokens: int | None = None
    last_model_request_parameters: ModelRequestParameters | None = None
    pending_messages: list[PendingMessage] = field(default_factory=list)

    def check_incomplete_tool_call(self) -> None: ...
    def consume_output_retry(
        self,
        max_output_retries: int,
        error: BaseException | None = None,
    ) -> None: ...

Field reference

Field	Purpose
`message_history`	Accumulated `ModelRequest`/`ModelResponse` list for the current run
`usage`	Aggregated `RunUsage` summed across all model calls so far
`output_retries_used`	Counter of output validator retries; checked against `max_output_retries`
`run_step`	Incremented on each `ModelRequestNode` execution; useful for observability
`run_id`	UUID7 for the current agent run; matches `RunContext.run_id`
`conversation_id`	UUID7 spanning multi-run conversations; matches `RunContext.conversation_id`
`metadata`	App-level metadata dict threaded through the run
`last_max_tokens`	Stored to produce accurate token-limit exceeded error messages
`last_model_request_parameters`	Last `ModelRequestParameters` for OTel span attributes
`pending_messages`	Internal queue for `RunContext.enqueue()` / `AgentRun.enqueue()`

Accessing state from `Agent.iter()`

import asyncio
from pydantic_ai import Agent
from pydantic_ai._agent_graph import GraphAgentState

agent = Agent('openai:gpt-4o-mini')

async def track_state():
    async with agent.iter("What is the capital of France?") as run:
        async for node in run:
            # Access state via the graph run context
            state: GraphAgentState = run.ctx.state
            print(
                f"step={state.run_step} "
                f"msgs={len(state.message_history)} "
                f"tokens_so_far={state.usage.total_tokens}"
            )
    print(f"Final run_id: {state.run_id}")
    print(f"Conversation ID: {state.conversation_id}")

asyncio.run(track_state())

`check_incomplete_tool_call()` — detecting token-limit truncation

The framework calls this automatically, but you can call it yourself when inspecting saved state:

from pydantic_ai._agent_graph import GraphAgentState
from pydantic_ai.exceptions import IncompleteToolCall

# Load a saved state snapshot
state = load_state_snapshot()

try:
    state.check_incomplete_tool_call()
except IncompleteToolCall as e:
    # Last model response was truncated mid-tool-call JSON
    print(f"Truncated tool call detected: {e}")
    # Increase max_tokens and re-run, or simplify the prompt

`consume_output_retry()` — retry budget enforcement

from pydantic_ai._agent_graph import GraphAgentState
from pydantic_ai.exceptions import UnexpectedModelBehavior

state = GraphAgentState()

# Simulates what CallToolsNode does when output validation fails
try:
    state.consume_output_retry(max_output_retries=3)
    state.consume_output_retry(max_output_retries=3)
    state.consume_output_retry(max_output_retries=3)
    state.consume_output_retry(max_output_retries=3)  # raises on the 4th call
except UnexpectedModelBehavior:
    print("Exceeded 3 output retries — abort")
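
The budget check can be sketched with the standard library alone. `RetryStateStub` and `RetryBudgetExceeded` are hypothetical stand-ins. The sketch assumes the counter is incremented before it is compared with the limit, which is consistent with the "raises on the 4th call" comment above but is not confirmed from the source.

```python
from dataclasses import dataclass


class RetryBudgetExceeded(Exception):
    """Hypothetical stand-in for the error raised when retries run out."""


@dataclass
class RetryStateStub:
    """Hypothetical stand-in that tracks only the retry counter."""
    output_retries_used: int = 0

    def consume_output_retry(self, max_output_retries: int) -> None:
        # Count the attempt first, then compare against the budget.
        self.output_retries_used += 1
        if self.output_retries_used > max_output_retries:
            raise RetryBudgetExceeded(
                f"Exceeded maximum output retries ({max_output_retries})"
            )


state = RetryStateStub()
attempts_allowed = 0
try:
    for _ in range(10):
        state.consume_output_retry(max_output_retries=3)
        attempts_allowed += 1
except RetryBudgetExceeded:
    pass
```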

6. `MCPServerTool` — Native MCP Server Integration

Module: pydantic_ai.native_tools
Import: from pydantic_ai import MCPServerTool

MCPServerTool is a native tool that tells a model to connect to an MCP server at the network level, offloading tool discovery and invocation entirely to the provider. This is distinct from MCPToolset (which manages MCP tool calls inside PydanticAI); MCPServerTool delegates execution directly to the provider’s native MCP support.

Class signature

@dataclass(kw_only=True)
class MCPServerTool(AbstractNativeTool):
    """A native tool that allows your agent to use MCP servers.

    Supported by: OpenAI Responses, Anthropic, xAI
    """

    id: str                                    # unique identifier for this server
    url: str                                   # MCP server URL
    authorization_token: str | None = None     # Bearer token for auth
    description: str | None = None            # server description for the model
    allowed_tools: list[str] | None = None    # restrict which MCP tools are exposed
    headers: dict[str, str] | None = None     # custom HTTP headers

    kind: str = 'mcp_server'

    @property
    def unique_id(self) -> str:
        return f'mcp_server:{self.id}'

    @property
    def label(self) -> str:
        return f'MCP: {self.id}'

Provider support matrix

Feature	OpenAI Responses	Anthropic	xAI
`url`	✓	✓	✓
`authorization_token`	✓	✓	✓
`description`	✓	—	✓
`allowed_tools`	✓	✓	✓
`headers`	✓	—	✓
OpenAI connector_id	via `url` prefix	—	—

Basic usage

from pydantic_ai import Agent
from pydantic_ai.capabilities import NativeTool
from pydantic_ai import MCPServerTool

agent = Agent(
    'openai-responses:gpt-4o',  # MCPServerTool is listed as supported by OpenAI Responses
    capabilities=[
        NativeTool(
            MCPServerTool(
                id="my-db-mcp",
                url="https://my-mcp-server.example.com/mcp",
                authorization_token="Bearer sk-...",
                allowed_tools=["query_database", "list_tables"],
                description="Internal database MCP server",
            )
        )
    ],
)

result = agent.run_sync("List all tables in the database")
print(result.output)

Multiple MCP servers

from pydantic_ai import Agent, MCPServerTool
from pydantic_ai.capabilities import NativeTool

agent = Agent(
    'anthropic:claude-opus-4-5',
    capabilities=[
        NativeTool(
            MCPServerTool(
                id="search-mcp",
                url="https://search.example.com/mcp",
                authorization_token="sk-search-token",
            )
        ),
        NativeTool(
            MCPServerTool(
                id="calendar-mcp",
                url="https://calendar.example.com/mcp",
                authorization_token="sk-calendar-token",
                allowed_tools=["list_events", "create_event"],
            )
        ),
    ],
)

OpenAI connector ID pattern

OpenAI Responses supports managed MCP servers via connector IDs. Pass the connector ID with the x-openai-connector: prefix:

MCPServerTool(
    id="openai-managed-server",
    url="x-openai-connector:<your_connector_id>",
    allowed_tools=["search", "summarize"],
)
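
The prefix convention can be illustrated with a small stdlib-only helper. `split_connector_url` is hypothetical and only shows how a single `url` value can carry either meaning; how PydanticAI interprets the prefix internally is not shown in this document.

```python
CONNECTOR_PREFIX = "x-openai-connector:"


def split_connector_url(url: str) -> tuple:
    """Classify a url value as ('connector', id) or ('server', url)."""
    if url.startswith(CONNECTOR_PREFIX):
        return "connector", url[len(CONNECTOR_PREFIX):]
    return "server", url


# "connector_example" is a made-up ID used only for this illustration.
managed = split_connector_url("x-openai-connector:connector_example")
self_hosted = split_connector_url("https://my-mcp-server.example.com/mcp")
```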

Custom headers for enterprise auth

MCPServerTool(
    id="enterprise-mcp",
    url="https://internal.corp.com/mcp",
    headers={
        "X-Tenant-ID": "acme-corp",
        "X-Service-Account": "pydantic-ai-agent",
    },
    authorization_token="Bearer <service-account-token>",
)

Comparing `MCPServerTool` vs `MCPToolset`

Aspect	`MCPServerTool`	`MCPToolset`
Execution location	Provider’s infrastructure	Your Python process
Tool discovery	Provider handles it	PydanticAI fetches tool list at startup
Supported providers	OpenAI, Anthropic, xAI	All (provider-agnostic)
Observability	Via provider logs	Full PydanticAI OTel traces
HITL / approval	Provider-only	`ApprovalRequiredToolset` wrapper
Transport	Provider-managed	SSE, HTTP, stdio (configurable)

7. `FileSearchTool` — Native RAG File Search

Module: pydantic_ai.native_tools
Import: from pydantic_ai import FileSearchTool

FileSearchTool gives the model access to a fully managed vector-search RAG system backed by the provider’s file storage infrastructure. It handles chunking, embedding generation, and context injection, requiring only file store IDs from you.

Class signature

@dataclass(kw_only=True)
class FileSearchTool(AbstractNativeTool):
    """A native tool that allows your agent to search through uploaded files.

    Supported by: OpenAI Responses, Google (Gemini), xAI
    """

    file_store_ids: Sequence[str]
    # OpenAI: vector store IDs created via OpenAI API
    # Google: file search store names from Gemini Files API
    # xAI:    collection IDs for xAI collections search

    kind: str = 'file_search'

Provider-specific file store setup

OpenAI vector stores:

from openai import OpenAI

client = OpenAI()

# 1. Create a vector store
store = client.vector_stores.create(name="product-docs")

# 2. Upload files
with open("manual.pdf", "rb") as f:
    client.vector_stores.file_batches.upload_and_poll(
        vector_store_id=store.id,
        files=[("manual.pdf", f, "application/pdf")],
    )

Google Gemini Files API:

import google.generativeai as genai

genai.configure(api_key="...")
file_ref = genai.upload_file("docs/guide.pdf")
store_name = file_ref.name  # e.g. "files/abc123"

Using `FileSearchTool` with an agent

import asyncio
from pydantic_ai import Agent, FileSearchTool
from pydantic_ai.capabilities import NativeTool

VECTOR_STORE_ID = "vs_abc123"

agent = Agent(
    'openai-responses:gpt-4o',  # FileSearchTool is listed as supported by OpenAI Responses
    capabilities=[
        NativeTool(
            FileSearchTool(file_store_ids=[VECTOR_STORE_ID])
        )
    ],
)

async def main():
    result = await agent.run(
        "What does the product manual say about warranty coverage?"
    )
    print(result.output)

asyncio.run(main())

Multiple vector stores

agent = Agent(
    'openai:gpt-4o',
    capabilities=[
        NativeTool(
            FileSearchTool(
                file_store_ids=[
                    "vs_product_docs",
                    "vs_support_tickets",
                    "vs_legal_contracts",
                ]
            )
        )
    ],
)

Google Gemini integration

agent = Agent(
    'google-gla:gemini-2.0-flash',
    capabilities=[
        NativeTool(
            FileSearchTool(
                file_store_ids=["files/abc123", "files/def456"]
            )
        )
    ],
)

xAI collections search

agent = Agent(
    'xai:grok-3',
    capabilities=[
        NativeTool(
            FileSearchTool(
                file_store_ids=["collection_id_1", "collection_id_2"]
            )
        )
    ],
)

Comparing `FileSearchTool` vs custom `FunctionToolset` RAG

Aspect	`FileSearchTool`	Custom `FunctionToolset` RAG
Infrastructure	Provider-managed	Your embedding DB + retrieval code
Cross-provider	OpenAI / Google / xAI only	Any model
Chunking strategy	Provider default	Fully configurable
Re-ranking	Provider-managed	Configurable
Latency	Provider-optimised	Your infrastructure
Cost transparency	Provider billing	Your embedding costs

8. `IncludeReturnSchemasToolset` — Auto Return-Schema Injection

Module: pydantic_ai.toolsets.include_return_schemas
Import: from pydantic_ai import IncludeReturnSchemasToolset

IncludeReturnSchemasToolset is a PreparedToolset subclass that sets include_return_schema=True on every ToolDefinition that doesn’t already have an explicit include_return_schema setting. With the flag set, the tool’s return-type JSON schema is included in the tool definition sent to the model, so the model can see the shape of the data each tool returns. This helps with models that support structured tool outputs and improves type-safety in multi-step pipelines.

Class signature

@dataclass(init=False)
class IncludeReturnSchemasToolset(PreparedToolset[AgentDepsT]):
    """A toolset that sets include_return_schema=True on all its tools.

    Wraps any AbstractToolset and injects include_return_schema=True
    into every ToolDefinition whose include_return_schema is still None.
    """

    def __init__(self, wrapped: AbstractToolset[AgentDepsT]) -> None: ...

Internally it works by wrapping the wrapped toolset in a PreparedToolset with an async _include function that iterates over tool definitions and calls dataclasses.replace(td, include_return_schema=True) for any td where include_return_schema is None.
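The replace-if-unset rule described above can be sketched with the standard library alone. This is a minimal illustration, not the library's code: `ToolDef` and `include_return_schemas` are hypothetical stand-ins that model only the `include_return_schema` field.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDef:
    """Hypothetical stand-in for ToolDefinition; only the fields needed here."""

    name: str
    include_return_schema: bool | None = None  # None means "not set explicitly"


def include_return_schemas(tool_defs: list[ToolDef]) -> list[ToolDef]:
    """Return copies with include_return_schema=True wherever it is still None."""
    return [
        dataclasses.replace(td, include_return_schema=True)
        if td.include_return_schema is None
        else td
        for td in tool_defs
    ]


defs = [
    ToolDef("get_weather"),  # unset, so it is switched on
    ToolDef("internal_debug", include_return_schema=False),  # explicit False is kept
]
prepared = include_return_schemas(defs)
print([(td.name, td.include_return_schema) for td in prepared])
```

Because `dataclasses.replace` returns a new object, the wrapped toolset's original definitions are left untouched, and an explicit `False` is never overridden.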

Usage

from pydantic_ai import Agent
from pydantic_ai.toolsets import FunctionToolset
from pydantic_ai import IncludeReturnSchemasToolset
from pydantic import BaseModel

class WeatherReport(BaseModel):
    temperature_c: float
    condition: str
    humidity_percent: int

toolset = FunctionToolset()

@toolset.tool
def get_weather(city: str) -> WeatherReport:
    """Get current weather for a city."""
    return WeatherReport(temperature_c=22.5, condition="sunny", humidity_percent=45)

agent = Agent(
    'openai:gpt-4o',
    toolsets=[IncludeReturnSchemasToolset(toolset)],
)

result = agent.run_sync("What's the weather in Paris?")

Combining with `FilteredToolset` for RBAC

from pydantic_ai import Agent, FilteredToolset, IncludeReturnSchemasToolset
from pydantic_ai.toolsets import FunctionToolset

admin_toolset = FunctionToolset()
# ... register admin tools ...

user_context_toolset = FilteredToolset(
    admin_toolset,
    filter=lambda ctx, td: td.name in ctx.deps["allowed_tools"],
)

# Include return schemas on the filtered set
typed_toolset = IncludeReturnSchemasToolset(user_context_toolset)

agent = Agent('openai:gpt-4o', toolsets=[typed_toolset])

When to use

include_return_schema=True is most useful when:

Chaining tools — downstream tools use the typed output of upstream tools
Structured output pipelines — you want the model to reason about data structure
OpenAI Structured Outputs — the model’s response_format is json_schema and tools should match

# Without IncludeReturnSchemasToolset: tool return schema omitted from model request
# With IncludeReturnSchemasToolset: each ToolDefinition.json_schema includes a
#   "return" key with the full Pydantic JSON schema for the return type

# Inspect the resulting tool definitions:
from pydantic_ai.toolsets import FunctionToolset
from pydantic_ai import IncludeReturnSchemasToolset

base = FunctionToolset()

@base.tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

wrapped = IncludeReturnSchemasToolset(base)

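import asyncio
from pydantic_ai.models.test import TestModel
from pydantic_ai.tools import RunContext
from pydantic_ai.usage import RunUsage

async def inspect():
    # Minimal ctx for inspection; assumes RunContext needs only deps, model and usage.
    # TestModel avoids any network calls.
    ctx = RunContext(deps=None, model=TestModel(), usage=RunUsage())
    tools = await wrapped.get_tools(ctx)  # mapping of tool name -> toolset tool
    for name, tool in tools.items():
        print(name, "include_return_schema:", tool.tool_def.include_return_schema)
        # add include_return_schema: True

asyncio.run(inspect())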

9. `ToolChoice` + `ToolOrOutput` — Tool Selection Control

Module: pydantic_ai.settings
Import: from pydantic_ai.settings import ToolChoice, ToolOrOutput

ToolChoice is a type alias that controls how the model selects between available tools and output modes on a per-request basis. ToolOrOutput is a dataclass that restricts which function tools the model may call while keeping output tools and text/image output available.

Type aliases and class signature

ToolChoiceScalar = Literal['none', 'required', 'auto']

@dataclass
class ToolOrOutput:
    """Restricts function tools while keeping output tools and text/image output available."""
    function_tools: list[str]   # names of the function tools the model may call

ToolChoice = ToolChoiceScalar | list[str] | ToolOrOutput | None

`ToolChoice` value semantics

Value	Behaviour
`None`	Default; model decides which tool to use (equivalent to `'auto'`)
`'auto'`	Model may call any tool or produce text/output
`'required'`	Model must call at least one tool before finishing
`'none'`	Model must not call any tools; forces text/output response
`list[str]`	Model must call exactly one of these named function tools
`ToolOrOutput(...)`	Named function tools available plus output tools and text/image
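To make the table concrete, the sketch below resolves a `tool_choice` value to the function tools the model would be allowed to call. It is an illustration of the semantics listed above, not PydanticAI's implementation: `ToolOrOutput` is redefined locally so the block runs without the library, and `callable_function_tools` is a hypothetical helper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToolOrOutput:
    """Local stand-in mirroring the signature shown above."""

    function_tools: list[str]


def callable_function_tools(tool_choice, available: list[str]) -> list[str]:
    """Function tools the model may call under a given tool_choice value."""
    if tool_choice in (None, "auto", "required"):
        return list(available)  # every registered tool stays callable
    if tool_choice == "none":
        return []  # tools disabled; text/output only
    if isinstance(tool_choice, ToolOrOutput):
        names = tool_choice.function_tools
    else:
        names = tool_choice  # list[str] form
    return [name for name in available if name in names]


tools = ["search", "get_info", "delete_user"]
print(callable_function_tools("none", tools))
print(callable_function_tools(["search", "get_info"], tools))
print(callable_function_tools(ToolOrOutput(["search"]), tools))
```

The sketch covers only which function tools are callable. Whether a call is mandatory, and whether output tools stay available, is what separates the `list[str]` form from `ToolOrOutput` in the table.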

Setting `tool_choice` via `ModelSettings`

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Assumes tools such as `web_search` are registered on the agent (omitted here)
agent = Agent('openai:gpt-4o-mini')

# Force the model to always call a tool
result = agent.run_sync(
    "What is the weather in London?",
    model_settings=ModelSettings(tool_choice='required'),
)

# Allow only a specific tool
result = agent.run_sync(
    "Search for Python tutorials",
    model_settings=ModelSettings(tool_choice=['web_search']),
)

# Disable tools entirely (force text response)
result = agent.run_sync(
    "Tell me a joke",
    model_settings=ModelSettings(tool_choice='none'),
)

`ToolOrOutput` — mixing function tools with output

ToolOrOutput is useful when you want to allow the model to call specific function tools or produce structured output, without allowing all available tools:

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings, ToolOrOutput
from pydantic_ai.toolsets import FunctionToolset
from pydantic import BaseModel

toolset = FunctionToolset()

@toolset.tool
def search_knowledge_base(query: str) -> str:
    """Search internal KB."""
    return "relevant content..."

@toolset.tool
def escalate_to_human(reason: str) -> str:
    """Escalate the query to a human agent."""
    return "escalated"

class FinalAnswer(BaseModel):
    answer: str
    confidence: float

agent = Agent(
    'openai:gpt-4o',
    output_type=FinalAnswer,
    toolsets=[toolset],
)

# Allow only 'search_knowledge_base' as a function tool,
# but the model can still use the output tool to produce FinalAnswer
result = agent.run_sync(
    "What is our refund policy?",
    model_settings=ModelSettings(
        tool_choice=ToolOrOutput(function_tools=['search_knowledge_base'])
    ),
)

Using `tool_choice` in a capability hook

from pydantic_ai import Agent
from pydantic_ai.capabilities import AbstractCapability
from pydantic_ai.settings import ModelSettings
from pydantic_ai.models import ModelRequestContext

class ForceSearchCapability(AbstractCapability):
    """Forces the model to call search tools on the first step."""

    async def before_model_request(
        self,
        messages,
        info: ModelRequestContext,
    ) -> ModelSettings | None:
        if info.run_step == 0:
            return ModelSettings(tool_choice='required')
        return None

agent = Agent('openai:gpt-4o', capabilities=[ForceSearchCapability()])

Dynamic `tool_choice` based on context

from pydantic_ai import Agent, RunContext
from pydantic_ai.settings import ModelSettings

async def get_model_settings(ctx: RunContext[dict]) -> ModelSettings | None:
    user_role = ctx.deps.get("role", "user")
    if user_role == "admin":
        # Admins can call any tool
        return ModelSettings(tool_choice='auto')
    else:
        # Regular users can only call read-only tools
        return ModelSettings(tool_choice=['search', 'get_info'])

agent = Agent(
    'openai:gpt-4o',
    model_settings=get_model_settings,  # callable form
)

10. `ServiceTier` + `ThinkingLevel` / `ThinkingEffort` — Cross-Provider Config Type Aliases

Module: pydantic_ai.settings
Import: from pydantic_ai.settings import ServiceTier, ThinkingLevel, ThinkingEffort

These type aliases centralise cross-provider configuration that would otherwise require provider-specific settings. All three are defined in pydantic_ai.settings and consumed via ModelSettings.

Type alias definitions

ServiceTier: TypeAlias = Literal['auto', 'default', 'flex', 'priority']

ThinkingEffort: TypeAlias = Literal['minimal', 'low', 'medium', 'high', 'xhigh']

ThinkingLevel: TypeAlias = bool | ThinkingEffort
# True  → enable thinking with provider default effort
# False → disable thinking (silently ignored on always-on models)
# 'minimal' / 'low' / 'medium' / 'high' / 'xhigh' → specific effort level
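Since `ThinkingLevel` mixes booleans and strings, consumers usually need to split it into an on/off flag and an optional effort. The sketch below shows one way to do that. The aliases are re-declared locally so it runs standalone, and `normalise_thinking` is a hypothetical helper, not part of pydantic_ai.

```python
from __future__ import annotations

from typing import Literal, Optional, Tuple, Union, get_args

# Re-declared locally, mirroring the definitions above.
ThinkingEffort = Literal['minimal', 'low', 'medium', 'high', 'xhigh']
ThinkingLevel = Union[bool, ThinkingEffort]


def normalise_thinking(level: ThinkingLevel) -> Tuple[bool, Optional[str]]:
    """Split a ThinkingLevel into (enabled, effort); effort None = provider default."""
    if isinstance(level, bool):
        return level, None
    if level not in get_args(ThinkingEffort):
        raise ValueError(f"unknown thinking effort: {level!r}")
    return True, level


print(normalise_thinking(True))
print(normalise_thinking(False))
print(normalise_thinking('high'))
```

The `bool` check comes first so that `True` and `False` are never compared against the effort strings.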

`ServiceTier` — cross-provider billing tier control

ServiceTier provides a unified way to select a processing tier across providers that support tiered billing, without resorting to provider-specific settings:

Value	OpenAI	Anthropic	Bedrock	Google Gemini API	Google Cloud
`'auto'`	`'auto'`	`'auto'`	(omitted)	(omitted)	PT then on-demand
`'default'`	`'default'`	`'standard_only'`	`{'type': 'default'}`	`'standard'`	PT then on-demand
`'flex'`	`'flex'`	(omitted)	`{'type': 'flex'}`	`'flex'`	PT then Flex PayGo
`'priority'`	`'priority'`	(omitted)	`{'type': 'priority'}`	`'priority'`	PT then Priority PayGo

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Cost-optimised batch processing
agent_flex = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(service_tier='flex'),
)

# Low-latency customer-facing requests
agent_priority = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(service_tier='priority'),
)

# Adapt tier based on request priority
async def get_settings(ctx) -> ModelSettings | None:
    if ctx.deps.get("is_urgent"):
        return ModelSettings(service_tier='priority')
    return ModelSettings(service_tier='flex')

agent_dynamic = Agent('google-gla:gemini-2.0-flash', model_settings=get_settings)

Per-provider overrides (openai_service_tier, anthropic_service_tier, etc.) always take precedence over the unified service_tier when both are set.
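The precedence rule can be illustrated with a small resolver. This is a sketch under stated assumptions: settings are modelled as a plain dict, only the OpenAI and Anthropic columns of the table above are encoded, and `resolve_service_tier` is a hypothetical helper rather than a pydantic_ai function.

```python
from __future__ import annotations

# Subset of the mapping table above; None means the field is omitted from the request.
UNIFIED_TO_PROVIDER = {
    "openai": {"auto": "auto", "default": "default", "flex": "flex", "priority": "priority"},
    "anthropic": {"auto": "auto", "default": "standard_only", "flex": None, "priority": None},
}


def resolve_service_tier(settings: dict, provider: str):
    """Use the provider-specific value if set, else translate the unified one."""
    override = settings.get(f"{provider}_service_tier")
    if override is not None:
        return override  # per-provider setting wins
    unified = settings.get("service_tier")
    if unified is None:
        return None  # nothing requested
    return UNIFIED_TO_PROVIDER[provider][unified]


print(resolve_service_tier({"service_tier": "default"}, "anthropic"))
print(resolve_service_tier({"service_tier": "flex", "openai_service_tier": "priority"}, "openai"))
print(resolve_service_tier({"service_tier": "flex"}, "anthropic"))
```

In the second call the unified `'flex'` is ignored because `openai_service_tier` is set. In the third, Anthropic has no flex equivalent in the table, so the field is omitted.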

`ThinkingLevel` — cross-provider extended thinking control

ThinkingLevel wraps both boolean on/off control and granular effort levels into a single field:

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Enable with provider default effort
agent = Agent(
    'anthropic:claude-opus-4-5',
    model_settings=ModelSettings(thinking=True),
)

# Disable thinking (no-op on always-on models like o1/o3)
agent = Agent(
    'openai:o1',
    model_settings=ModelSettings(thinking=False),  # silently ignored
)

# Specific effort level
agent = Agent(
    'anthropic:claude-opus-4-5',
    model_settings=ModelSettings(thinking='high'),
)

result = agent.run_sync("Prove Fermat's Last Theorem step by step")
print(result.output)

Provider-level effort mapping

When an exact ThinkingEffort level isn’t supported by a provider, PydanticAI maps to the nearest available level:

Effort	Anthropic	OpenAI (o-series)	Google (Gemini)
`'minimal'`	low budget tokens	not supported → `'low'`	`dynamic`
`'low'`	low budget tokens	(omitted, default)	`dynamic`
`'medium'`	medium budget tokens	medium `reasoning_effort`	`dynamic`
`'high'`	high budget tokens	high `reasoning_effort`	`high`
`'xhigh'`	max budget tokens	→ `'high'` on providers without xhigh	`high`
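The "nearest available level" fallback can be sketched as a distance search over the ordered effort names. This illustrates the idea only and is not the library's mapping logic: `nearest_supported` is a hypothetical helper, and the tie-breaking rule (prefer the lower effort) is an assumption of this sketch.

```python
EFFORT_ORDER = ("minimal", "low", "medium", "high", "xhigh")


def nearest_supported(effort: str, supported: set) -> str:
    """Return the supported effort closest to the requested one in EFFORT_ORDER."""
    if not supported:
        raise ValueError("provider supports no effort levels")
    wanted = EFFORT_ORDER.index(effort)
    # Sort key: distance first, then position, so ties go to the lower effort
    # (an assumption of this sketch).
    return min(
        supported,
        key=lambda name: (abs(EFFORT_ORDER.index(name) - wanted), EFFORT_ORDER.index(name)),
    )


three_levels = {"low", "medium", "high"}
print(nearest_supported("minimal", three_levels))
print(nearest_supported("xhigh", three_levels))
print(nearest_supported("medium", three_levels))
```

With a provider that offers only low, medium and high, `'minimal'` resolves to `'low'` and `'xhigh'` resolves to `'high'`, matching the fallbacks listed in the table.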

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Dynamic effort selection based on task complexity
async def adaptive_thinking(ctx) -> ModelSettings | None:
    complexity = ctx.deps.get("complexity_score", 0.5)
    if complexity > 0.8:
        effort = 'xhigh'
    elif complexity > 0.5:
        effort = 'high'
    elif complexity > 0.2:
        effort = 'medium'
    else:
        effort = 'low'
    return ModelSettings(thinking=effort)

agent = Agent(
    'anthropic:claude-opus-4-5',
    model_settings=adaptive_thinking,
)

Combining `ServiceTier` + `ThinkingLevel` for cost management

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# High-accuracy, higher-cost pipeline
accuracy_agent = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(
        service_tier='priority',
        thinking='xhigh',
    ),
)

# Cost-optimised background processing
batch_agent = Agent(
    'openai:gpt-4o-mini',
    model_settings=ModelSettings(
        service_tier='flex',
        thinking=False,
    ),
)

Summary

#	Class(es)	Module	Key takeaways
1	`ModelRequest` + `ModelResponse`	`pydantic_ai.messages`	Wire-format anatomy; `FinishReason`/`ModelResponseState` type aliases; `run_id`/`conversation_id` threading; `metadata` never sent to LLM; `ModelMessagesTypeAdapter` for ser/de
2	`SystemPromptPart` + `UserPromptPart` + `RetryPromptPart`	`pydantic_ai.messages`	Three request-side part types; multi-modal `UserContent` in `UserPromptPart`; `dynamic_ref` for OTel attribution; `RetryPromptPart.model_response()` formats validation errors
3	`BaseToolCallPart` + `ToolCallPart` + `NativeToolCallPart`	`pydantic_ai.messages`	Call-part family; `args_as_dict(raise_if_invalid=)` graceful/strict JSON parsing; `narrow_type()` for typed subclass promotion; `ToolCallPartDelta` for streaming accumulation
4	`BaseToolReturnPart` + `ToolReturnPart` + `NativeToolReturnPart`	`pydantic_ai.messages`	Return-part family; `outcome` tracks success/failed/denied; `content_items(mode=)` for serialisation; `files` property for multi-modal extraction; `provider_details` round-trip
5	`GraphAgentState`	`pydantic_ai._agent_graph`	Run-state internals; `run_id`/`conversation_id`; `check_incomplete_tool_call()` for token-limit detection; `consume_output_retry()` budget enforcement; `pending_messages` queue
6	`MCPServerTool`	`pydantic_ai.native_tools`	Native MCP server (OpenAI/Anthropic/xAI); `allowed_tools` restriction; `headers` for enterprise auth; OpenAI connector ID via `x-openai-connector:` prefix; vs `MCPToolset` comparison
7	`FileSearchTool`	`pydantic_ai.native_tools`	Provider-managed RAG (OpenAI vector stores / Gemini Files API / xAI collections); `file_store_ids` parameter; zero-code chunking + embedding; vs custom RAG comparison
8	`IncludeReturnSchemasToolset`	`pydantic_ai.toolsets`	`PreparedToolset` subclass; auto-injects `include_return_schema=True`; composable with `FilteredToolset`; useful for structured tool-output pipelines
9	`ToolChoice` + `ToolOrOutput`	`pydantic_ai.settings`	`ToolChoiceScalar` (`'auto'`/`'required'`/`'none'`) + `list[str]` + `ToolOrOutput` + `None`; `ToolOrOutput.function_tools` restricts function tools while allowing output tools; set via `ModelSettings.tool_choice`
10	`ServiceTier` + `ThinkingLevel` / `ThinkingEffort`	`pydantic_ai.settings`	Cross-provider type aliases; `ServiceTier` maps to provider-specific billing tiers; `ThinkingLevel` unifies bool + 5 effort levels; per-provider settings override unified field; combine for cost management

All examples verified against pydantic-ai 1.105.0 source.
