PydanticAI: Model Providers & FallbackModel

Verified against pydantic-ai==1.85.1 — source modules: pydantic_ai.providers, pydantic_ai.models, pydantic_ai.models.fallback.

A PydanticAI Agent talks to an LLM through a Model backed by a Provider. The quickest way to wire one up is the 'provider:model-name' string; the explicit way is constructing SpecificModel(..., provider=SpecificProvider(...)) yourself. This page lists every prefix the installed source recognises, then shows how to compose models, route them through a gateway, or fall back between them.

from pydantic_ai import Agent

agent = Agent('openai:gpt-5.2')
print(agent.run_sync('Hello!').output)

# Same thing, constructed explicitly:
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider

agent = Agent(OpenAIChatModel('gpt-5.2', provider=OpenAIProvider(api_key='sk-...')))

The string 'openai:gpt-5.2' is parsed by infer_provider_class + infer_provider (providers/__init__.py:100, :234). The token before the first ':' selects the provider; everything after it is the model name, passed through verbatim.
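
You can drive that resolution directly; a minimal sketch using the inferrer named above:

from pydantic_ai.providers import infer_provider

provider = infer_provider('groq')   # prefix -> configured Provider instance
print(type(provider).__name__)      # GroqProvider (requires GROQ_API_KEY to be set)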

Verified from providers/__init__.py:100:

Prefix | Model class | Notes
openai | OpenAIChatModel | Default. OPENAI_API_KEY env var.
openai-chat | OpenAIChatModel | Forces the Chat Completions API.
openai-responses | OpenAIResponsesModel | Forces the Responses API (reasoning, built-in tools).
anthropic | AnthropicModel | ANTHROPIC_API_KEY.
google-gla | GoogleModel (Gemini API) | GEMINI_API_KEY. Formerly google.
google-vertex / vertexai | GoogleModel (Vertex AI) | Uses ADC / service-account credentials.
bedrock | BedrockConverseModel | AWS credentials resolution.
groq | GroqModel | GROQ_API_KEY.
mistral | MistralModel | MISTRAL_API_KEY.
cohere | CohereModel | COHERE_API_KEY.
xai | OpenAI-compatible xAI model | XAI_API_KEY. Supports XSearchTool.
grok | Deprecated alias of xai | Prefer xai:.
deepseek | OpenAI-compatible DeepSeek
openrouter | OpenAI-compatible OpenRouter | Route to any OpenRouter model.
vercel | Vercel AI Gateway
azure | Azure OpenAI | AzureProvider(endpoint=..., api_key=...).
cerebras | Cerebras
moonshotai | Moonshot / Kimi
fireworks | Fireworks AI
together | Together AI
heroku | Heroku Inference
huggingface | HF Inference API
ollama | OllamaModel (local) | OpenAI-chat-compatible, no API key.
github | GitHub Models
litellm | LiteLLM gateway
nebius, ovhcloud, alibaba, sambanova | OpenAI-compatible | Regional/cloud providers.
outlines | Outlines (Transformers, vLLM, …) | Local constrained decoding.
sentence-transformers | Embeddings only | pydantic_ai.embeddings.
voyageai | Embeddings only | pydantic_ai.embeddings.
gateway/<upstream> | Any upstream via Pydantic AI Gateway | e.g. 'gateway/openai:gpt-5.2'.

The full list of KnownModelName literals (200+ entries) is in models/__init__.py. An unknown string with a known prefix still works — it’s passed through to the provider.
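
For example, a fine-tuned or newly released model id works as long as the prefix is known (the id below is hypothetical):

agent = Agent('openai:my-org-fine-tune-0001')  # hypothetical id, sent to OpenAI verbatim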

Each provider accepts an api_key, a pre-built SDK client, or env-var fallback. Typical pattern:

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.anthropic import AnthropicProvider

model = AnthropicModel(
    'claude-sonnet-4-6',
    provider=AnthropicProvider(api_key='...'),
)
agent = Agent(model)

Useful when you need to:

  • Configure a custom httpx.AsyncClient (timeouts, proxies, retries).
  • Share a single SDK client across many agents (see the sketch after this list).
  • Point at a self-hosted OpenAI-compatible endpoint (pass base_url= to OpenAIProvider).
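
A sketch of the shared-client pattern, assuming OpenAIProvider's openai_client parameter (one SDK client with a tuned httpx transport, reused by several agents):

import httpx
from openai import AsyncOpenAI
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider

# one SDK client, one connection pool, shared by every agent below
http_client = httpx.AsyncClient(timeout=30)
sdk_client = AsyncOpenAI(api_key='sk-...', http_client=http_client)
provider = OpenAIProvider(openai_client=sdk_client)

triage = Agent(OpenAIChatModel('gpt-5.2', provider=provider))
writer = Agent(OpenAIChatModel('gpt-5.2', provider=provider))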

OpenAI-compatible providers (OpenAIProvider(base_url='http://localhost:8000/v1', api_key='...')) unlock vLLM, LM Studio, oobabooga, or any homegrown server.

pydantic_ai.settings.ModelSettings is a TypedDict. Common fields verified in settings.py:

max_tokens, temperature, top_p, timeout, parallel_tool_calls, seed, presence_penalty, frequency_penalty, logit_bias, stop_sequences, extra_headers, extra_body, thinking (True / False / 'minimal' | 'low' | 'medium' | 'high' | 'xhigh').

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

agent = Agent(
    'openai:gpt-5.2',
    model_settings=ModelSettings(temperature=0.1, max_tokens=1024),
)

Provider-specific extensions (OpenAIChatModelSettings, AnthropicModelSettings, GoogleModelSettings) subclass it and add provider keys (e.g. openai_reasoning_effort, anthropic_thinking).
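
A sketch using the OpenAI subclass named above (the import path and the 'low' literal are assumptions; check the settings module for accepted values):

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModelSettings  # assumed location

agent = Agent(
    'openai:gpt-5.2',
    model_settings=OpenAIChatModelSettings(
        max_tokens=2048,                # inherited base ModelSettings field
        openai_reasoning_effort='low',  # provider-specific key (assumed literal)
    ),
)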

FallbackModel — wrap primaries with backups

pydantic_ai.models.fallback.FallbackModel accepts a default + one or more fallbacks and decides when to switch.

from pydantic_ai import Agent
from pydantic_ai.models.fallback import FallbackModel
from pydantic_ai.exceptions import ModelAPIError

model = FallbackModel(
    'openai:gpt-5.2',
    'anthropic:claude-sonnet-4-6',
    'google-gla:gemini-embedding-001',  # won't ever run in practice, just showing multi-fallback
    fallback_on=(ModelAPIError,),
)
agent = Agent(model)

fallback_on — exception types and response predicates

FallbackModel.__init__ (models/fallback.py:81) accepts any of:

  • A tuple of exception types: (ModelAPIError, RateLimitError)
  • A single exception type: ModelAPIError
  • A sync/async exception handler: def(exc) -> bool
  • A sync/async response handler: def(resp: ModelResponse) -> bool
  • A sequence mixing any of the above.

Handler type is auto-detected from the first parameter’s type hint. ModelResponse → response handler; anything else (or untyped) → exception handler.

from pydantic_ai.messages import ModelResponse

def weak_response(resp: ModelResponse) -> bool:
    # treat empty text as a failure worth switching on
    texts = [p.content for p in resp.parts if getattr(p, 'part_kind', None) == 'text']
    return not texts or all(not t.strip() for t in texts)

def is_rate_limit(exc) -> bool:
    return 'rate' in str(exc).lower()

model = FallbackModel(
    'openai:gpt-5.2',
    'anthropic:claude-sonnet-4-6',
    fallback_on=[weak_response, is_rate_limit, ModelAPIError],
)
  • fallback_on=() (empty tuple) raises UserError — “All exceptions will propagate”. Always supply at least one condition.
  • Fallbacks do not stack usage costs; result.usage reflects whichever model finally succeeded. Track per-model cost via OpenTelemetry (see InstrumentationSettings).
  • If every model fails, the collected exceptions propagate wrapped in FallbackExceptionGroup (see the sketch below).
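
A minimal sketch of handling that terminal failure:

from pydantic_ai.exceptions import FallbackExceptionGroup

try:
    result = agent.run_sync('Hello!')
except FallbackExceptionGroup as group:
    # one entry per model that was tried and rejected
    for exc in group.exceptions:
        print(f'{type(exc).__name__}: {exc}')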

Prefix a known provider with gateway/ to route it through the Pydantic AI Gateway:

agent = Agent('gateway/openai:gpt-5.2')
# => uses the gateway provider, normalising to the upstream OpenAI profile

normalize_gateway_provider (providers/gateway.py) strips the prefix so model-profile lookups still resolve correctly.

# Ollama (no key, OpenAI-chat compatible)
agent = Agent('ollama:llama3.1')

# Any OpenAI-compatible local server
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider

model = OpenAIChatModel(
    'qwen2.5-coder:32b',
    provider=OpenAIProvider(base_url='http://localhost:8000/v1', api_key='x'),
)

OpenAIProvider injects api_key='api-key-not-set' when you pass base_url without a key, which keeps the OpenAI SDK happy against local servers that don’t require auth.
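
So a keyless local endpoint needs nothing beyond base_url; a minimal sketch:

provider = OpenAIProvider(base_url='http://localhost:11434/v1')  # placeholder key injected automatically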

from pydantic_ai.models.instrumented import InstrumentedModel, InstrumentationSettings

instrumented = InstrumentedModel(
    OpenAIChatModel('gpt-5.2'),
    options=InstrumentationSettings(event_mode='attributes'),
)
agent = Agent(instrumented)

Or globally: Agent.instrument_all(InstrumentationSettings(...)) (agent/__init__.py:844). See the production guide for Logfire / OTel wiring.
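
As a sketch, the global form is a single call at startup:

from pydantic_ai import Agent
from pydantic_ai.models.instrumented import InstrumentationSettings

# instrument every agent created in this process
Agent.instrument_all(InstrumentationSettings(event_mode='attributes'))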

from pydantic_ai import limit_model_concurrency
model = limit_model_concurrency(OpenAIChatModel('gpt-5.2'), limit=8)

Enforces a max of 8 concurrent in-flight requests at the model layer.
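
A short sketch of what the limit means under fan-out, reusing the limited model from above:

import asyncio
from pydantic_ai import Agent

agent = Agent(model)  # the concurrency-limited model defined above

async def main() -> None:
    # 32 runs are scheduled concurrently, but at most 8 requests are in flight
    results = await asyncio.gather(*(agent.run('ping') for _ in range(32)))
    print(len(results))

asyncio.run(main())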

2. Cross-region Bedrock fallback

model = FallbackModel(
    'bedrock:us.anthropic.claude-sonnet-4-6',
    'bedrock:eu.anthropic.claude-sonnet-4-6',
    fallback_on=(ModelAPIError,),
)

3. Rate-limit-aware fallback with response sniff

def empty_or_short(resp: ModelResponse) -> bool:
    for p in resp.parts:
        if getattr(p, 'part_kind', None) == 'text' and len(p.content) >= 20:
            return False
    return True

model = FallbackModel(
    'openai:gpt-5.2',
    'anthropic:claude-sonnet-4-6',
    fallback_on=[empty_or_short, ModelAPIError],
)

4. Self-hosted vLLM with a shared httpx client

import httpx
shared = httpx.AsyncClient(timeout=60, limits=httpx.Limits(max_connections=50))
provider = OpenAIProvider(base_url='http://vllm:8000/v1', api_key='x', http_client=shared)
model = OpenAIChatModel('meta-llama/Llama-3.1-8B-Instruct', provider=provider)

5. Swap model per environment with agent.override

if env == 'production':
    ctx = agent.override(model='openai:gpt-5.2')
elif env == 'canary':
    ctx = agent.override(model=FallbackModel('openai:gpt-5.2', 'anthropic:claude-sonnet-4-6'))
else:
    from pydantic_ai.models.test import TestModel
    ctx = agent.override(model=TestModel())

with ctx:
    result = agent.run_sync(prompt)

Source references (pydantic-ai==1.85.1):

  • infer_provider, infer_provider_class — providers/__init__.py:100, :234
  • KnownModelName — models/__init__.py (near the top)
  • FallbackModel — models/fallback.py:69
  • InstrumentedModel, InstrumentationSettings — models/instrumented.py:78, :388
  • ModelSettings base — settings.py:24
  • limit_model_concurrency — models/concurrency.py