PydanticAI: Model Providers & FallbackModel
Verified against pydantic-ai==1.85.1 — source modules: pydantic_ai.providers, pydantic_ai.models, pydantic_ai.models.fallback.
A PydanticAI Agent talks to an LLM through a Model backed by a Provider. The quickest way to wire one up is the 'provider:model-name' string; the full way is constructing SpecificModel(..., provider=SpecificProvider(...)) yourself. This page lists every prefix the installed source recognises and how to compose, gateway, or fall back between them.
Minimal runnable example

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-5.2')
print(agent.run_sync('Hello!').output)

# Same thing, constructed explicitly:
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider

agent = Agent(OpenAIChatModel('gpt-5.2', provider=OpenAIProvider(api_key='sk-...')))
```

The string 'openai:gpt-5.2' is parsed by infer_provider_class + infer_provider (providers/__init__.py:100, :234). The first token before : selects the provider; the remainder is the model name, verbatim.
Provider prefixes
Verified from providers/__init__.py:100:
| Prefix | Model class | Notes |
|---|---|---|
| openai | OpenAIChatModel | Default. OPENAI_API_KEY env var. |
| openai-chat | OpenAIChatModel | Forces the Chat Completions API. |
| openai-responses | OpenAIResponsesModel | Forces the Responses API (reasoning, built-in tools). |
| anthropic | AnthropicModel | ANTHROPIC_API_KEY. |
| google-gla | GoogleModel (Gemini API) | GEMINI_API_KEY. Formerly google. |
| google-vertex / vertexai | GoogleModel (Vertex AI) | Uses ADC / service-account credentials. |
| bedrock | BedrockConverseModel | AWS credentials resolution. |
| groq | GroqModel | GROQ_API_KEY. |
| mistral | MistralModel | MISTRAL_API_KEY. |
| cohere | CohereModel | COHERE_API_KEY. |
| xai | OpenAI-compatible xAI model | XAI_API_KEY. Supports XSearchTool. |
| grok | Deprecated alias of xai | Prefer xai:. |
| deepseek | OpenAI-compatible DeepSeek | |
| openrouter | OpenAI-compatible OpenRouter | Route to any OR model. |
| vercel | Vercel AI Gateway | |
| azure | Azure OpenAI | AzureProvider(endpoint=..., api_key=...). |
| cerebras | Cerebras | |
| moonshotai | Moonshot / Kimi | |
| fireworks | Fireworks AI | |
| together | Together AI | |
| heroku | Heroku Inference | |
| huggingface | HF Inference API | |
| ollama | OllamaModel (local) | OpenAI-chat-compatible, no API key. |
| github | GitHub Models | |
| litellm | LiteLLM gateway | |
| nebius, ovhcloud, alibaba, sambanova | OpenAI-compatible | Regional/cloud providers. |
| outlines | Outlines (Transformers, vLLM, …) | Local constrained decoding. |
| sentence-transformers | Embeddings only | pydantic_ai.embeddings. |
| voyageai | Embeddings only | pydantic_ai.embeddings. |
| gateway/<upstream> | Any upstream via Pydantic AI Gateway | e.g. 'gateway/openai:gpt-5.2'. |
The full list of KnownModelName literals (200+ entries) is in models/__init__.py. An unknown string with a known prefix still works — it’s passed through to the provider.
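The split itself is simple to picture. Below is an illustrative sketch of the pass-through behaviour described above, not the library's actual code — `split_model_string` and the abbreviated `KNOWN_PREFIXES` set are hypothetical names:

```python
# Illustrative sketch of 'provider:model-name' parsing — NOT pydantic-ai source.
KNOWN_PREFIXES = {'openai', 'anthropic', 'google-gla', 'ollama'}  # abbreviated

def split_model_string(spec: str) -> tuple[str, str]:
    """Split on the FIRST colon; the remainder is the model name, verbatim."""
    prefix, sep, model = spec.partition(':')
    if not sep or prefix not in KNOWN_PREFIXES:
        raise ValueError(f'unknown provider prefix in {spec!r}')
    return prefix, model

# Colons inside the model name survive untouched:
print(split_model_string('ollama:qwen2.5-coder:32b'))  # ('ollama', 'qwen2.5-coder:32b')
```

This is why an unknown model name with a known prefix still works: nothing after the first colon is validated at parse time.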
Explicit provider construction

Each provider accepts an api_key, a pre-built SDK client, or env-var fallback. Typical pattern:

```python
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.anthropic import AnthropicProvider

model = AnthropicModel(
    'claude-sonnet-4-6',
    provider=AnthropicProvider(api_key='...'),
)
agent = Agent(model)
```

Useful when you need to:

- Configure a custom httpx.AsyncClient (timeouts, proxies, retries).
- Share a single SDK client across many agents.
- Point at a self-hosted OpenAI-compatible endpoint (pass base_url= to OpenAIProvider).
OpenAI-compatible providers (OpenAIProvider(base_url='http://localhost:8000/v1', api_key='...')) unlock vLLM, LM Studio, oobabooga, or any homegrown server.
ModelSettings — provider-agnostic knobs

pydantic_ai.settings.ModelSettings is a TypedDict. Common fields verified in settings.py:

max_tokens, temperature, top_p, timeout, parallel_tool_calls, seed, presence_penalty, frequency_penalty, logit_bias, stop_sequences, extra_headers, extra_body, thinking (True / False / 'minimal' | 'low' | 'medium' | 'high' | 'xhigh').

```python
from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

agent = Agent(
    'openai:gpt-5.2',
    model_settings=ModelSettings(temperature=0.1, max_tokens=1024),
)
```

Provider-specific extensions (OpenAIChatModelSettings, AnthropicModelSettings, GoogleModelSettings) subclass it and add provider keys (e.g. openai_reasoning_effort, anthropic_thinking).
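Because it is a TypedDict with optional keys, partial settings dicts compose with ordinary dict merging. A minimal sketch of that semantics — SettingsSketch is a hypothetical stand-in with a few fields, not the real class:

```python
from typing import TypedDict

class SettingsSketch(TypedDict, total=False):
    # Hypothetical stand-in for a few ModelSettings fields.
    max_tokens: int
    temperature: float
    seed: int

defaults: SettingsSketch = {'temperature': 0.1, 'max_tokens': 1024}
per_run: SettingsSketch = {'temperature': 0.7}

# The later dict wins on key collisions, so per-run values override defaults.
merged: SettingsSketch = {**defaults, **per_run}
print(merged)  # {'temperature': 0.7, 'max_tokens': 1024}
```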
FallbackModel — wrap primaries with backups

pydantic_ai.models.fallback.FallbackModel accepts a default + one or more fallbacks and decides when to switch.

```python
from pydantic_ai import Agent
from pydantic_ai.models.fallback import FallbackModel
from pydantic_ai.exceptions import ModelAPIError

model = FallbackModel(
    'openai:gpt-5.2',
    'anthropic:claude-sonnet-4-6',
    'google-gla:gemini-embedding-001',  # won't ever run in practice, just showing multi-fallback
    fallback_on=(ModelAPIError,),
)
agent = Agent(model)
```

fallback_on — exception types and response predicates

FallbackModel.__init__ (models/fallback.py:81) accepts any of:
- A tuple of exception types: (ModelAPIError, RateLimitError)
- A single exception type: ModelAPIError
- A sync/async exception handler: def(exc) -> bool
- A sync/async response handler: def(resp: ModelResponse) -> bool
- A sequence mixing any of the above.
Handler type is auto-detected from the first parameter’s type hint. ModelResponse → response handler; anything else (or untyped) → exception handler.
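That kind of detection can be sketched with inspect. An illustrative version only — the real logic lives in models/fallback.py and also resolves string annotations; the ModelResponse class here is a local stand-in:

```python
import inspect

class ModelResponse:  # stand-in for pydantic_ai.messages.ModelResponse
    pass

def is_response_handler(fn) -> bool:
    """True when the first parameter is annotated as ModelResponse."""
    params = list(inspect.signature(fn).parameters.values())
    return bool(params) and params[0].annotation is ModelResponse

def on_response(resp: ModelResponse) -> bool:
    return False

def on_exception(exc) -> bool:  # untyped first param -> exception handler
    return True

print(is_response_handler(on_response), is_response_handler(on_exception))  # True False
```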
```python
from pydantic_ai.messages import ModelResponse

def weak_response(resp: ModelResponse) -> bool:
    # treat empty text as a failure worth switching on
    texts = [p.content for p in resp.parts if getattr(p, 'part_kind', None) == 'text']
    return not texts or all(not t.strip() for t in texts)

def is_rate_limit(exc) -> bool:
    return 'rate' in str(exc).lower()

model = FallbackModel(
    'openai:gpt-5.2',
    'anthropic:claude-sonnet-4-6',
    fallback_on=[weak_response, is_rate_limit, ModelAPIError],
)
```

Gotchas
- fallback_on=() (an empty tuple) raises UserError ("All exceptions will propagate"). Always supply at least one condition.
- Fallbacks do not stack usage costs; result.usage reflects whichever model finally succeeded. Track per-model cost via OpenTelemetry (see InstrumentationSettings).
- Exceptions from the last model propagate wrapped in FallbackExceptionGroup.
Gateway routing

Prefix a known provider with gateway/ to route it through the Pydantic AI Gateway:

```python
agent = Agent('gateway/openai:gpt-5.2')
# => uses the gateway provider, normalising to the upstream OpenAI profile
```

normalize_gateway_provider (providers/gateway.py) strips the prefix so model-profile lookups still resolve correctly.
Local models

```python
# Ollama (no key, OpenAI-chat compatible)
agent = Agent('ollama:llama3.1')

# Any OpenAI-compatible local server
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider

model = OpenAIChatModel(
    'qwen2.5-coder:32b',
    provider=OpenAIProvider(base_url='http://localhost:8000/v1', api_key='x'),
)
```

OpenAIProvider injects api_key='api-key-not-set' when you pass base_url without a key, which keeps the OpenAI SDK happy against local servers that don’t require auth.
Instrumenting model calls

```python
from pydantic_ai import Agent
from pydantic_ai.models.instrumented import InstrumentedModel, InstrumentationSettings
from pydantic_ai.models.openai import OpenAIChatModel

instrumented = InstrumentedModel(
    OpenAIChatModel('gpt-5.2'),
    options=InstrumentationSettings(event_mode='attributes'),
)
agent = Agent(instrumented)
```

Or globally: Agent.instrument_all(InstrumentationSettings(...)) (agent/__init__.py:844). See the production guide for Logfire / OTel wiring.
Patterns

1. Provider-level concurrency limit

```python
from pydantic_ai import limit_model_concurrency
from pydantic_ai.models.openai import OpenAIChatModel

model = limit_model_concurrency(OpenAIChatModel('gpt-5.2'), limit=8)
```

Enforces a maximum of 8 concurrent in-flight requests at the model layer.
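Under the hood a cap like this amounts to a semaphore around each request. An illustrative asyncio sketch, not the library's implementation — LimitedCaller is a hypothetical name:

```python
import asyncio

class LimitedCaller:
    """Caps in-flight calls, roughly what a model-layer concurrency limit does."""

    def __init__(self, limit: int):
        self._sem = asyncio.Semaphore(limit)
        self._active = 0
        self.peak = 0  # highest concurrency observed

    async def call(self, coro_fn):
        async with self._sem:  # blocks when `limit` calls are already in flight
            self._active += 1
            self.peak = max(self.peak, self._active)
            try:
                return await coro_fn()
            finally:
                self._active -= 1

async def fake_request():
    await asyncio.sleep(0.01)
    return 'ok'

async def main():
    limiter = LimitedCaller(limit=8)
    results = await asyncio.gather(*(limiter.call(fake_request) for _ in range(32)))
    return limiter.peak, results

peak, results = asyncio.run(main())
print(peak <= 8, len(results))  # True 32
```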
2. Region-aware fallback

```python
model = FallbackModel(
    'bedrock:us.anthropic.claude-sonnet-4-6',
    'bedrock:eu.anthropic.claude-sonnet-4-6',
    fallback_on=(ModelAPIError,),
)
```

3. Rate-limit-aware fallback with response sniff

```python
def empty_or_short(resp: ModelResponse) -> bool:
    for p in resp.parts:
        if getattr(p, 'part_kind', None) == 'text' and len(p.content) >= 20:
            return False
    return True

model = FallbackModel(
    'openai:gpt-5.2',
    'anthropic:claude-sonnet-4-6',
    fallback_on=[empty_or_short, ModelAPIError],
)
```

4. Self-hosted vLLM with a shared httpx client

```python
import httpx

shared = httpx.AsyncClient(timeout=60, limits=httpx.Limits(max_connections=50))
provider = OpenAIProvider(base_url='http://vllm:8000/v1', api_key='x', http_client=shared)
model = OpenAIChatModel('meta-llama/Llama-3.1-8B-Instruct', provider=provider)
```

5. Swap model per environment with agent.override

```python
if env == 'production':
    ctx = agent.override(model='openai:gpt-5.2')
elif env == 'canary':
    ctx = agent.override(model=FallbackModel('openai:gpt-5.2', 'anthropic:claude-sonnet-4-6'))
else:
    from pydantic_ai.models.test import TestModel
    ctx = agent.override(model=TestModel())

with ctx:
    result = agent.run_sync(prompt)
```

Reference

- infer_provider, infer_provider_class — providers/__init__.py:100, :234
- KnownModelName — models/__init__.py (near the top)
- FallbackModel — models/fallback.py:69
- InstrumentedModel, InstrumentationSettings — models/instrumented.py:78, :388
- ModelSettings base — settings.py:24
- limit_model_concurrency — models/concurrency.py