PydanticAI: Testing with TestModel, FunctionModel & Overrides
Testing Agents
Verified against pydantic-ai==1.85.1 — source modules: pydantic_ai.models.test, pydantic_ai.models.function, pydantic_ai.agent.
PydanticAI ships two model implementations built for tests: TestModel (auto-generates tool calls + a response from JSON schema) and FunctionModel (you write the response-generating function). Combined with agent.override(...) and capture_run_messages, you can unit-test agents hermetically — no network, no API keys, deterministic.
Minimal runnable example
```python
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-5.2', system_prompt='Be terse.')


def test_greet():
    with agent.override(model=TestModel()):
        out = agent.run_sync('Hi')
        assert isinstance(out.output, str)
```

agent.override(...) returns a context manager — always use with (or hold a single instance and call __enter__ / __exit__ on that same object). The override reverts on exit.
TestModel — structural test double
Lives at pydantic_ai.models.test.TestModel (models/test.py:60). Given the agent’s tool schemas, it:
- Calls every tool once (unless you restrict with call_tools=[...]).
- Synthesises tool arguments that match each tool’s JSON schema.
- Produces a final response — a string, or args that match the output tool’s schema.
Constructor (models/test.py:94):
| Argument | Type | Default | Purpose |
|---|---|---|---|
| call_tools | list[str] \| Literal['all'] | 'all' | Which tools to call. Empty list = skip tools, go straight to output. |
| custom_output_text | str \| None | None | Force this string as the final text output. |
| custom_output_args | Any \| None | None | Force these args for the output tool (overrides schema-gen). |
| seed | int | 0 | Seed for the schema-driven arg generator. |
| model_name / profile / settings | — | — | Forwarded to the base Model. |
After the run, TestModel.last_model_request_parameters holds the final ModelRequestParameters — useful for asserting tools were offered in a given shape.
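A quick sketch combining these constructor arguments (the model string is arbitrary here, since TestModel never touches the network):

```python
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-5.2')

# Skip all tools, pin the schema-driven arg generator, and force the final text.
tm = TestModel(call_tools=[], seed=42, custom_output_text='fixed answer')


def test_constrained_test_model():
    with agent.override(model=tm):
        result = agent.run_sync('anything')
    assert result.output == 'fixed answer'
    # The parameters the agent sent on the last request are kept for inspection.
    assert tm.last_model_request_parameters is not None
```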
Asserting tool calls happened
```python
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-5.2')


@agent.tool
def lookup(ctx: RunContext[None], sku: str) -> dict:
    return {'sku': sku, 'price': 9.99}


def test_model_calls_lookup():
    tm = TestModel()
    with agent.override(model=tm):
        result = agent.run_sync('Price of SKU ABC?')
    # TestModel invokes every tool; verify via the messages
    tool_names = [
        p.tool_name
        for m in result.all_messages()
        for p in m.parts
        if getattr(p, 'part_kind', None) == 'tool-call'
    ]
    assert 'lookup' in tool_names
```

Forcing specific output
```python
tm = TestModel(custom_output_text='mocked reply')
with agent.override(model=tm):
    result = agent.run_sync('ignored')
assert result.output == 'mocked reply'
```

For agents with output_type=MyModel, set custom_output_args to a dict matching the schema.
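A sketch of that structured case (Price and its fields are illustrative; custom_output_args just has to match whatever schema your output_type defines):

```python
from pydantic import BaseModel

from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel


class Price(BaseModel):  # illustrative output schema
    sku: str
    amount: float


agent = Agent('openai:gpt-5.2', output_type=Price)


def test_forced_structured_output():
    tm = TestModel(custom_output_args={'sku': 'ABC', 'amount': 9.99})
    with agent.override(model=tm):
        result = agent.run_sync('ignored')
    assert result.output == Price(sku='ABC', amount=9.99)
```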
FunctionModel — write your own response
pydantic_ai.models.function.FunctionModel (models/function.py:45) lets you implement the model as a function. You get the full message history and metadata, and return a ModelResponse.
```python
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage, ModelResponse, TextPart
from pydantic_ai.models.function import AgentInfo, FunctionModel


def echo(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
    last = messages[-1].parts[-1]
    return ModelResponse(parts=[TextPart(content=f'echo: {last.content}')])


agent = Agent(FunctionModel(echo))

result = agent.run_sync('hello')
assert result.output == 'echo: hello'
```

AgentInfo (models/function.py:219) exposes what the agent decided to send this step:
- function_tools: list[ToolDefinition]
- output_tools: list[ToolDefinition]
- allow_text_output: bool
- model_settings: ModelSettings | None
- model_request_parameters: ModelRequestParameters
- instructions: str | None
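A small sketch that asserts on those fields from inside a FunctionModel function (the agent and its search tool are illustrative):

```python
from pydantic_ai import Agent, RunContext
from pydantic_ai.messages import ModelMessage, ModelResponse, TextPart
from pydantic_ai.models.function import AgentInfo, FunctionModel


def assert_tools_offered(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
    # The agent advertises its registered function tools to the model on every step.
    assert [t.name for t in info.function_tools] == ['search']
    assert info.allow_text_output
    return ModelResponse(parts=[TextPart(content='ok')])


agent = Agent(FunctionModel(assert_tools_offered))


@agent.tool
def search(ctx: RunContext[None], q: str) -> str:
    return 'result'


def test_tools_are_offered():
    assert agent.run_sync('hi').output == 'ok'
```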
Use FunctionModel when you need branching behaviour (e.g. “first call returns a tool call, second returns the final answer”):
```python
from pydantic_ai.messages import ToolCallPart


def first_tool_then_answer(messages, info):
    calls = [p for m in messages for p in m.parts if isinstance(p, ToolCallPart)]
    if not calls:
        return ModelResponse(parts=[ToolCallPart(tool_name='lookup', args={'sku': 'ABC'})])
    return ModelResponse(parts=[TextPart('done')])
```

Streaming FunctionModel
Pass stream_function= as well (either alone or with function=). The stream function is an async generator yielding DeltaToolCalls or strings — see models/function.py for the exact signature if you need true streaming tests.
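A minimal text-streaming sketch (assumes the stream_function= keyword described above plus agent.run_stream / StreamedRunResult.get_output() from current releases; check models/function.py for the DeltaToolCalls shape if you also stream tool calls):

```python
from collections.abc import AsyncIterator

from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage
from pydantic_ai.models.function import AgentInfo, FunctionModel


async def stream_hello(messages: list[ModelMessage], info: AgentInfo) -> AsyncIterator[str]:
    # Yield text deltas; the agent concatenates them into the final output.
    yield 'hel'
    yield 'lo'


agent = Agent(FunctionModel(stream_function=stream_hello))


async def test_streamed_text():
    async with agent.run_stream('hi') as result:
        assert await result.get_output() == 'hello'
```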
Agent.override(...) — swap parts per run/test
agent/__init__.py:1639. Temporarily replaces any of:
- model (a Model instance, a KnownModelName, or any str)
- deps
- toolsets, tools
- instructions
- model_settings, metadata, name
Returns a context manager — everything reverts on exit. Overrides are captured in contextvars, so they are safe under asyncio concurrency (each task sees its own overrides).
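A sketch of that guarantee (each coroutine installs its own override; outputs should not bleed between tasks):

```python
import asyncio

from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-5.2')


async def run_with(model: TestModel) -> str:
    with agent.override(model=model):
        return (await agent.run('hi')).output


async def test_concurrent_overrides():
    # Each task sees only the override it entered, thanks to contextvars.
    a, b = await asyncio.gather(
        run_with(TestModel(custom_output_text='A')),
        run_with(TestModel(custom_output_text='B')),
    )
    assert (a, b) == ('A', 'B')
```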
```python
with agent.override(model=TestModel(), deps=FakeDB()):
    result = agent.run_sync('query')
```

pytest fixture
```python
import pytest

from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel


@pytest.fixture
def test_agent(my_agent: Agent):
    with my_agent.override(model=TestModel()):
        yield my_agent
```

capture_run_messages — test error paths
```python
import pytest

from pydantic_ai import Agent, capture_run_messages
from pydantic_ai.exceptions import UnexpectedModelBehavior


def test_bad_output_surfaces_messages():
    agent = Agent(..., output_type=StrictSchema)
    with capture_run_messages() as msgs:
        with pytest.raises(UnexpectedModelBehavior):
            agent.run_sync('...')
    # Inspect what the model actually produced
    assert any(
        'tool-call' == getattr(p, 'part_kind', None)
        for m in msgs
        for p in m.parts
    )
```

Only the first run* call inside the with is captured — don’t loop runs inside one context.
Snapshotting tool traffic
```python
from pydantic_ai.messages import ModelMessagesTypeAdapter


def test_tool_flow(snapshot):
    with agent.override(model=TestModel()):
        result = agent.run_sync('go')
    snapshot.assert_match(
        ModelMessagesTypeAdapter.dump_json(result.all_messages(), indent=2),
        'tool_flow.json',
    )
```

Pair with syrupy or pytest-snapshot.
Feature comparison
| Feature | TestModel | FunctionModel |
|---|---|---|
| Auto-generates args | yes (from JSON schema + seed) | no — you build the ModelResponse |
| Calls every tool | yes (unless call_tools=[...]) | only if your function emits ToolCallPart |
| Deterministic | yes | yes |
| Good for | smoke tests, end-to-end schema check | protocol tests, multi-step scenarios, error injection |
| Streaming | yes (via TestStreamedResponse) | yes (pass stream_function=) |
Mocking real providers vs TestModel
| Approach | When to use |
|---|---|
| TestModel | You want schema-correct, deterministic behaviour. Fast. |
| FunctionModel | You need to control the exact ModelResponse. |
| Real model + pytest-vcr | Regression-test against the real provider. Slow, flaky, needs API keys on record. |
| respx / httpx_mock + real OpenAIModel | HTTP-layer testing. Flakier than FunctionModel. |
For most unit tests, prefer TestModel or FunctionModel. For contract tests, use real models in a CI nightly job.
Patterns
1. Test an agent’s contract without calling an LLM
```python
def test_pricing_tool_is_registered():
    tm = TestModel()
    with agent.override(model=tm):
        agent.run_sync('anything')
    names = [t.name for t in tm.last_model_request_parameters.function_tools]
    assert 'pricing' in names
```

2. Inject failing tools to test recovery
```python
from pydantic_ai import ModelRetry

calls = {'count': 0}


@agent.tool
def fragile(ctx):
    calls['count'] += 1
    if calls['count'] == 1:
        # A plain exception would propagate; ModelRetry asks the agent to retry the tool.
        raise ModelRetry('boom')
    return 'recovered'


def test_retries_then_succeeds():
    with agent.override(model=TestModel()):
        result = agent.run_sync('do it')
    assert result.output  # agent recovered past the tool error
```

3. Drive a multi-turn protocol with FunctionModel
```python
def script(messages, info):
    step = sum(1 for m in messages if m.kind == 'response')
    if step == 0:
        return ModelResponse(parts=[ToolCallPart('search', {'q': 'x'})])
    if step == 1:
        return ModelResponse(parts=[ToolCallPart('refine', {'doc_id': 1})])
    return ModelResponse(parts=[TextPart('final')])
```

4. Assert ModelRetry is triggered by an output validator
```python
from pydantic_ai import ModelRetry


@agent.output_validator
async def must_be_uppercase(ctx, out: str) -> str:
    if out != out.upper():
        raise ModelRetry('uppercase please')
    return out


def test_validator_retries():
    with agent.override(model=TestModel(custom_output_text='hello')):
        with pytest.raises(UnexpectedModelBehavior):
            agent.run_sync('go')
```

After output_retries attempts, ModelRetry bubbles up as UnexpectedModelBehavior.
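To keep such a test fast, the retry budget can be made explicit when constructing the agent (a sketch; output_retries is assumed to be the Agent constructor argument the note above refers to):

```python
# Allow a single output-validation retry before UnexpectedModelBehavior is raised.
agent = Agent('openai:gpt-5.2', output_retries=1)
```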
5. Swap deps per test without rebuilding the agent
```python
fake_db = FakeDB([{'sku': 'ABC', 'price': 9.99}])
with agent.override(deps=fake_db, model=TestModel()):
    result = agent.run_sync('price ABC')
```

Gotchas
- TestModel randomises tool args. Pin behaviour with seed=42 if you assert on them.
- override leaks if you manually __enter__ without __exit__. Always use with.
- Async tests: prefer await agent.run(...) inside async def test_* — don’t mix run_sync and a running event loop (see the sketch after this list).
- include_return_schema: TestModel does not honour tool return schemas unless the agent is set to include them; see IncludeReturnSchemasToolset.
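A minimal sketch of the async-test gotcha above (assumes pytest-asyncio is installed; with anyio, swap the marker accordingly):

```python
import pytest

from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-5.2')


@pytest.mark.asyncio
async def test_async_run():
    with agent.override(model=TestModel()):
        result = await agent.run('Hi')
    assert isinstance(result.output, str)
```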
Reference
- TestModel — models/test.py:60
- FunctionModel, AgentInfo, DeltaToolCall — models/function.py
- Agent.override(...) — agent/__init__.py:1639
- capture_run_messages() — _agent_graph.py:1791
- ModelMessagesTypeAdapter — messages.py:2034