PydanticAI: Testing with TestModel, FunctionModel & Overrides
Testing Agents
Verified against pydantic-ai==1.102.0. Source modules: pydantic_ai.models.test, pydantic_ai.models.function, pydantic_ai.agent.
PydanticAI ships two model implementations built for tests: TestModel (auto-generates tool calls + a response from JSON schema) and FunctionModel (you write the response-generating function). Combined with agent.override(...) and capture_run_messages, you can unit-test agents hermetically — no network, no API keys, deterministic.
Minimal runnable example
```python
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-5.2', system_prompt='Be terse.')

def test_greet():
    # The real model is swapped for TestModel only inside the with block
    with agent.override(model=TestModel()):
        out = agent.run_sync('Hi')
    assert isinstance(out.output, str)
```

agent.override(...) returns a context manager. Always enter it with a with statement (or hold a single instance and call __enter__ / __exit__ on that same object). The override reverts on exit.
TestModel — structural test double
Lives at pydantic_ai.models.test.TestModel (models/test.py:60). Given the agent’s tool schemas, it:
- Calls every tool once (unless you restrict it with call_tools=[...]).
- Synthesises tool arguments that match each tool’s JSON schema.
- Produces a final response: a string, or args that match the output tool’s schema.
Constructor (models/test.py:94):
| Argument | Type | Default | Purpose |
|---|---|---|---|
| call_tools | list[str] \| Literal['all'] | 'all' | Which tools to call. An empty list skips tools and goes straight to the output. |
| custom_output_text | str \| None | None | Force this string as the final text output. |
| custom_output_args | Any \| None | None | Force these args for the output tool (instead of schema-generated ones). |
| seed | int | 0 | Seed for the schema-driven arg generator. |
| model_name / profile / settings | various | various | Forwarded to the base Model. |
After the run, TestModel.last_model_request_parameters holds the final ModelRequestParameters — useful for asserting tools were offered in a given shape.
Asserting tool calls happened
```python
from pydantic_ai import Agent, RunContext
from pydantic_ai.messages import ToolCallPart
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-5.2')

@agent.tool
def lookup(ctx: RunContext[None], sku: str) -> dict:
    return {'sku': sku, 'price': 9.99}

def test_model_calls_lookup():
    tm = TestModel()
    with agent.override(model=tm):
        result = agent.run_sync('Price of SKU ABC?')
    # TestModel invokes every tool; verify via the message history
    tool_names = [
        p.tool_name
        for m in result.all_messages()
        for p in m.parts
        if isinstance(p, ToolCallPart)
    ]
    assert 'lookup' in tool_names
```

Forcing specific output
```python
tm = TestModel(custom_output_text='mocked reply')
with agent.override(model=tm):
    result = agent.run_sync('ignored')
assert result.output == 'mocked reply'
```

For agents with output_type=MyModel, set custom_output_args to a dict that matches the schema instead.
FunctionModel — write your own response
pydantic_ai.models.function.FunctionModel (models/function.py:45) lets you implement the model as a function. The function receives the full message history plus an AgentInfo describing the current step, and returns a ModelResponse.
```python
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage, ModelResponse, TextPart, UserPromptPart
from pydantic_ai.models.function import AgentInfo, FunctionModel

def echo(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
    # The last part of the latest request is the user prompt
    last = messages[-1].parts[-1]
    assert isinstance(last, UserPromptPart)
    return ModelResponse(parts=[TextPart(content=f'echo: {last.content}')])

agent = Agent(FunctionModel(echo))

result = agent.run_sync('hello')
assert result.output == 'echo: hello'
```

AgentInfo (models/function.py:219) exposes what the agent decided to send at this step:
```python
function_tools: list[ToolDefinition]
output_tools: list[ToolDefinition]
allow_text_output: bool
model_settings: ModelSettings | None
model_request_parameters: ModelRequestParameters
instructions: str | None
```
Use FunctionModel when you need branching behaviour (e.g. “first call returns a tool call, second returns the final answer”):
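```python
from pydantic_ai.messages import ModelMessage, ModelResponse, TextPart, ToolCallPart
from pydantic_ai.models.function import AgentInfo

def first_tool_then_answer(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
    calls = [p for m in messages for p in m.parts if isinstance(p, ToolCallPart)]
    if not calls:
        # First step: no tool call in the history yet, so request one
        return ModelResponse(parts=[ToolCallPart(tool_name='lookup', args={'sku': 'ABC'})])
    # Later steps: the tool has already run, so return the final answer
    return ModelResponse(parts=[TextPart('done')])
```

Streaming FunctionModel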
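Pass stream_function= (either alone or alongside function=). The stream function is an async generator that yields str chunks (text deltas), DeltaToolCalls, or DeltaThinkingCalls. Yielding a str produces a text delta, and yielding a DeltaToolCalls dict simulates a streaming tool call. With only stream_function set, the model serves streamed runs (run_stream) only. Pass function= as well if the same agent also needs run or run_sync. This example streams text: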
```python
import asyncio
from collections.abc import AsyncIterator

from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage
from pydantic_ai.models.function import AgentInfo, FunctionModel

# A simple stream function that yields a sentence word by word
async def stream_word_by_word(
    messages: list[ModelMessage], info: AgentInfo
) -> AsyncIterator[str]:
    sentence = 'The answer is forty-two.'
    for word in sentence.split():
        yield word + ' '

agent = Agent(FunctionModel(stream_function=stream_word_by_word))

async def test_streaming():
    chunks = []
    async with agent.run_stream('What is the answer?') as stream:
        # debounce_by=None emits every delta; the default batches deltas
        # that arrive close together, which could merge all words into one
        async for chunk in stream.stream_text(delta=True, debounce_by=None):
            chunks.append(chunk)
        final = await stream.get_output()

    assert final == 'The answer is forty-two. '
    assert len(chunks) > 1  # multiple streamed chunks received

asyncio.run(test_streaming())
```

For streaming tool calls, yield a DeltaToolCalls dict. See models/function.py for the exact type signature.
Agent.override(...) — swap parts per run/test
Defined at agent/__init__.py:1639. Temporarily replaces any of:
- model (a Model, a KnownModelName, or any str)
- deps
- toolsets, tools
- instructions
- model_settings, metadata, name
Returns a context manager — everything reverts on exit. Overrides are captured in contextvars, so they are safe under asyncio concurrency (each task sees its own overrides).
```python
# FakeDB is a stand-in for whatever deps type your agent declares
with agent.override(model=TestModel(), deps=FakeDB()):
    result = agent.run_sync('query')
```

pytest fixture
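```python
import pytest
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

@pytest.fixture
def overridden_agent(my_agent: Agent):
    # my_agent is assumed to be another fixture that returns your agent.
    # A name without the test_ prefix keeps the fixture distinct from tests.
    with my_agent.override(model=TestModel()):
        yield my_agent
```

capture_run_messages — test error paths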
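```python
import pytest
from pydantic import BaseModel
from pydantic_ai import Agent, capture_run_messages
from pydantic_ai.exceptions import UnexpectedModelBehavior
from pydantic_ai.messages import ModelResponse, ToolCallPart
from pydantic_ai.models.function import FunctionModel

class StrictSchema(BaseModel):
    count: int

def bad_output(messages, info):
    # Always call the output tool with args that fail StrictSchema validation
    return ModelResponse(parts=[ToolCallPart(info.output_tools[0].name, {'count': 'many'})])

def test_bad_output_surfaces_messages():
    agent = Agent(FunctionModel(bad_output), output_type=StrictSchema)
    with capture_run_messages() as msgs:
        with pytest.raises(UnexpectedModelBehavior):
            agent.run_sync('...')
    # Inspect what the model actually produced
    assert any(isinstance(p, ToolCallPart) for m in msgs for p in m.parts)
```

Only the first run* call inside the with block is captured, so don't loop runs inside one context.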
Snapshotting tool traffic
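```python
from pydantic_ai.messages import ModelMessagesTypeAdapter
from pydantic_ai.models.test import TestModel

def test_tool_flow(snapshot):
    with agent.override(model=TestModel()):
        result = agent.run_sync('go')
    snapshot.assert_match(
        ModelMessagesTypeAdapter.dump_json(result.all_messages(), indent=2),
        'tool_flow.json',
    )
```

snapshot.assert_match(value, name) is the pytest-snapshot fixture API. With syrupy, write assert value == snapshot instead. Messages carry timestamps and generated tool-call IDs, so normalise those fields before snapshotting, or the snapshot changes on every run.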
Feature comparison
| Feature | TestModel | FunctionModel |
|---|---|---|
| Auto-generates args | yes (from JSON schema + seed) | no — you build the ModelResponse |
| Calls every tool | yes (unless call_tools=[...]) | only if your function emits ToolCallPart |
| Deterministic | yes | yes |
| Good for | smoke tests, end-to-end schema check | protocol tests, multi-step scenarios, error injection |
| Streaming | yes (via TestStreamedResponse) | yes (pass stream_function=) |
Mocking real providers vs TestModel
| Approach | When to use |
|---|---|
| TestModel | You want schema-correct, deterministic behaviour. Fast. |
| FunctionModel | You need to control the exact ModelResponse. |
| Real model + pytest-vcr | Regression-test against the real provider. Slow and flaky, and recording cassettes needs real API keys. |
| respx / httpx_mock + real OpenAIModel | HTTP-layer testing. Flakier than FunctionModel. |
For most unit tests, prefer TestModel or FunctionModel. For contract tests, use real models in a CI nightly job.
Patterns
1. Test an agent’s contract without calling an LLM
```python
def test_pricing_tool_is_registered():
    tm = TestModel()
    with agent.override(model=tm):
        agent.run_sync('anything')
    # The recorded request parameters list every tool the agent offered
    names = [t.name for t in tm.last_model_request_parameters.function_tools]
    assert 'pricing' in names
```

2. Inject failing tools to test recovery
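```python
from pydantic_ai import ModelRetry, RunContext

@agent.tool
def fragile(ctx: RunContext[None]) -> str:
    # Only ModelRetry is fed back to the model as a retry prompt; a plain
    # exception such as ValueError would propagate straight out of run_sync.
    if ctx.retry == 0:
        raise ModelRetry('boom')
    return 'ok'

def test_retries_then_succeeds():
    with agent.override(model=TestModel()):
        result = agent.run_sync('do it')
    assert result.output  # agent recovered past the tool error
```

3. Drive a multi-turn protocol with FunctionModel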
```python
from pydantic_ai.messages import ModelResponse, TextPart, ToolCallPart

def script(messages, info):
    # Count earlier model responses to work out which step this is
    step = sum(1 for m in messages if m.kind == 'response')
    if step == 0:
        return ModelResponse(parts=[ToolCallPart('search', {'q': 'x'})])
    if step == 1:
        return ModelResponse(parts=[ToolCallPart('refine', {'doc_id': 1})])
    return ModelResponse(parts=[TextPart('final')])
```

This assumes the agent running FunctionModel(script) registers search and refine tools.

4. Assert ModelRetry is triggered by an output validator
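```python
import pytest
from pydantic_ai import ModelRetry, RunContext
from pydantic_ai.exceptions import UnexpectedModelBehavior
from pydantic_ai.models.test import TestModel

@agent.output_validator
async def must_be_uppercase(ctx: RunContext[None], out: str) -> str:
    if out != out.upper():
        raise ModelRetry('uppercase please')
    return out

def test_validator_retries():
    # TestModel returns 'hello' on every attempt, so the validator never passes
    with agent.override(model=TestModel(custom_output_text='hello')):
        with pytest.raises(UnexpectedModelBehavior):
            agent.run_sync('go')
```

Once the output_retries budget (which defaults to the agent's retries) is used up, the run raises UnexpectedModelBehavior. ModelRetry itself never escapes the run.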
5. Swap deps per test without rebuilding the agent
```python
# FakeDB is a stand-in for your own deps type
fake_db = FakeDB([{'sku': 'ABC', 'price': 9.99}])
with agent.override(deps=fake_db, model=TestModel()):
    result = agent.run_sync('price ABC')
```

Gotchas
- TestModel generates tool args from each tool's schema and seed (default 0), not from the prompt. The values are deterministic but arbitrary, so if you assert on them, pass an explicit seed (e.g. seed=42) to make that dependency visible.
- override leaks if you call __enter__ manually without a matching __exit__. Always use with.
- Async tests: prefer await agent.run(...) inside async def test_*. Don't mix run_sync with a running event loop.
- include_return_schema: TestModel does not honour tool return schemas unless the agent is set to include them; see IncludeReturnSchemasToolset.
Testing AgentInfo — assert what the agent offers the model
AgentInfo is passed to your FunctionModel function. Inspect it to write whitebox assertions about tool registration, output modes, and model settings:
```python
from dataclasses import dataclass

from pydantic import BaseModel
from pydantic_ai import Agent, RunContext
from pydantic_ai.messages import ModelMessage, ModelResponse, ToolCallPart
from pydantic_ai.models.function import AgentInfo, FunctionModel

class WeatherReport(BaseModel):
    city: str
    temperature_c: float
    conditions: str

@dataclass
class Deps:
    api_key: str

agent = Agent('test', output_type=WeatherReport, deps_type=Deps)

@agent.tool
def get_weather(ctx: RunContext[Deps], city: str) -> str:
    return f'Current weather in {city}: 22°C, sunny'

captured_info: AgentInfo | None = None

def capture_model(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
    global captured_info
    captured_info = info
    # Structured output must go through the output tool: this agent has
    # allow_text_output=False, so a plain TextPart would be rejected.
    return ModelResponse(parts=[ToolCallPart(
        info.output_tools[0].name,
        {'city': 'London', 'temperature_c': 22.0, 'conditions': 'sunny'},
    )])

def test_agent_offers_correct_tools():
    with agent.override(model=FunctionModel(capture_model)):
        agent.run_sync('What is the weather in London?', deps=Deps(api_key='key'))

    assert captured_info is not None
    tool_names = {t.name for t in captured_info.function_tools}
    assert 'get_weather' in tool_names

    # Verify output tools are configured
    assert len(captured_info.output_tools) > 0  # WeatherReport output tool

    # Verify allow_text_output behaviour
    assert captured_info.allow_text_output is False  # Structured output only

test_agent_offers_correct_tools()
print('All assertions passed.')
```

Testing structured output with TestModel
Use custom_output_args to inject specific structured output and verify your output validators:
```python
from pydantic import BaseModel
from pydantic_ai import Agent, ModelRetry, RunContext
from pydantic_ai.models.test import TestModel

class Invoice(BaseModel):
    amount: float
    currency: str
    description: str

agent = Agent('test', output_type=Invoice)

@agent.output_validator
async def validate_currency(ctx: RunContext[None], invoice: Invoice) -> Invoice:
    if invoice.currency not in ('USD', 'EUR', 'GBP'):
        raise ModelRetry(f'Unknown currency: {invoice.currency!r}. Use USD, EUR, or GBP.')
    return invoice

def test_valid_invoice():
    tm = TestModel(custom_output_args={
        'amount': 99.99,
        'currency': 'USD',
        'description': 'Monthly subscription',
    })
    with agent.override(model=tm):
        result = agent.run_sync('Generate an invoice.')
    assert result.output.amount == 99.99
    assert result.output.currency == 'USD'

test_valid_invoice()
```

Reference
- TestModel: models/test.py:60
- FunctionModel, AgentInfo, DeltaToolCalls: models/function.py
- Agent.override(...): agent/__init__.py:1639
- capture_run_messages(): _agent_graph.py:1791
- ModelMessagesTypeAdapter: messages.py:2034