LLM providers and routing

Verified against crewai==1.14.3a2 (source: crewai/llm.py, crewai/llms/base_llm.py, crewai/llms/providers/).

The LLM factory does more than it appears to. It’s a __new__-time router that picks between a native provider SDK and the LiteLLM fallback based on the model name, and it’s stricter than older docs suggest.

from crewai import LLM
# Native provider (OpenAI) — uses `openai` SDK directly
gpt = LLM(model="openai/gpt-4o-mini", temperature=0.2)
# Provider prefix omitted — LLM infers from model name
gpt = LLM(model="gpt-4o-mini")
# Non-native → LiteLLM fallback (requires litellm installed)
mistral = LLM(model="mistral/mistral-large-latest")

If the model name is ambiguous, the first argument should include a recognised provider prefix; without one, LLM._infer_provider_from_model has to guess.

LLM(model=..., **kwargs)
┌─────────────────────────────────────┐
│ 1. kwargs has `provider=...`        │
│    → force that provider, native    │
└─────────────────────────────────────┘
                  │ no
┌─────────────────────────────────────┐
│ 2. "/" in model ("openai/gpt-4o")   │
│    → prefix looked up in            │
│      provider_mapping               │
│    → if model is in the native      │
│      constants → native SDK         │
│    → otherwise → LiteLLM            │
└─────────────────────────────────────┘
                  │ no
┌─────────────────────────────────────┐
│ 3. No "/" — infer from name pattern │
│    (gpt-*/claude-*/gemini-*…)       │
└─────────────────────────────────────┘
Native SDK? → provider module under crewai.llms.providers/
LiteLLM?    → litellm.completion(...)
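
For the explicit route (step 1), passing provider= bypasses inference entirely. A minimal sketch, assuming the kwarg accepts the canonical provider names from the table below:

from crewai import LLM

# Step 1 in the diagram: force the native OpenAI path even though the
# model string carries no prefix; no inference or LiteLLM fallback runs.
gpt = LLM(model="gpt-4o-mini", provider="openai")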

Native providers (source: SUPPORTED_NATIVE_PROVIDERS):

| Prefix | Canonical provider | SDK package |
| --- | --- | --- |
| openai | openai | openai |
| anthropic, claude | anthropic | anthropic |
| azure, azure_openai | azure | openai (Azure) |
| google, gemini | gemini | google-generativeai |
| bedrock, aws | bedrock | boto3 |
| openrouter | openrouter | openai-compatible |
| deepseek, ollama, ollama_chat, hosted_vllm, cerebras, dashscope | same | openai-compatible |

Anything else (mistral/, groq/, cohere/, custom prefixes) falls through to LiteLLM.
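
If you are unsure which path a given model string takes, the class of the constructed object is a quick tell: the __new__-time router described above should hand back a provider class from crewai.llms.providers for native models, while the LiteLLM fallback stays on the generic LLM. A sketch (assumes litellm is installed for the non-native model and that construction doesn’t validate credentials):

from crewai import LLM

for name in ("openai/gpt-4o-mini", "mistral/mistral-large-latest"):
    llm = LLM(model=name)
    # Native routing should surface as a class under crewai.llms.providers;
    # the LiteLLM fallback typically reports the base crewai.llm module.
    print(name, "->", type(llm).__module__, type(llm).__name__)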

LLM(model="openai/gpt-4o-mini", is_litellm=True)

Handy when you want LiteLLM’s uniform interface even for natively supported models (e.g. to reuse a LiteLLM router config).

The default LLM class extends BaseLLM with LiteLLM-specific knobs:

| Field | Type | Notes |
| --- | --- | --- |
| model | str | Required. provider/model or a bare name. |
| temperature | float \| None | |
| top_p, top_logprobs, logprobs | various | Standard sampling params. |
| max_tokens, max_completion_tokens | int \| float \| None | |
| response_format | JsonResponseFormat \| type[BaseModel] \| None | Provider-native structured output. |
| stop | list[str] | Stop sequences. |
| seed | int \| None | Deterministic sampling where supported. |
| presence_penalty, frequency_penalty, logit_bias | float / dict | OpenAI-style knobs. |
| base_url | str \| None | Inherited from BaseLLM. Used by native OpenAI-compatible paths (ollama/..., hosted_vllm/...). |
| api_base, api_version, api_key | str | LLM-specific. api_base is the LiteLLM-style endpoint override; both base_url and api_base are passed through, so either name works for the fallback path. |
| reasoning_effort | "none" \| "low" \| "medium" \| "high" | For o1/o3/o4 and compatible reasoning models. |
| thinking | Any | Anthropic extended thinking config. |
| stream | bool | Global default — Crew(stream=True) overrides per run. |
| callbacks | list[Any] | Passed to LiteLLM. |
| timeout | float \| int \| None | Per-call wall clock. |
| context_window_size | int | Override auto-detected context window. |

from crewai import LLM

o3 = LLM(model="openai/o3", reasoning_effort="medium")

claude = LLM(
    model="anthropic/claude-sonnet-4-6",
    thinking={"type": "enabled", "budget_tokens": 4096},
)
  • reasoning_effort maps onto OpenAI’s reasoning.effort.
  • thinking is passed through to Anthropic — the reasoning tokens show up in CrewOutput.token_usage.reasoning_tokens when enabled.
  • GPT-5 and the o-series intentionally ignore stop= in 1.14+; the router strips it so you don’t have to.
LLM(model="ollama/llama3.1:8b", base_url="http://localhost:11434")
LLM(model="hosted_vllm/my-finetune", api_base="http://vllm:8000/v1", api_key="EMPTY")

Both of these take the native OpenAI-compatible path — no LiteLLM needed.

from crewai import Agent, LLM

writer = Agent(
    role="Writer",
    goal="...",
    backstory="...",
    llm=LLM(model="openai/gpt-4o-mini"),
    function_calling_llm=LLM(model="openai/gpt-4o"),  # tool-call formatting on a smarter model
)

function_calling_llm is used only to format tool-call JSON arguments. Most models don’t need it; leave it unset unless your primary model struggles with JSON schemas.

Crew accepts three extra LLM slots:

| Field | Used by |
| --- | --- |
| manager_llm | Hierarchical process — spins up a default manager agent. |
| planning_llm | Crew-level planning=True planner. |
| chat_llm | crewai chat CLI against this crew. |

from crewai import Crew, Process, LLM

Crew(
    agents=[...],
    tasks=[...],
    process=Process.hierarchical,
    manager_llm=LLM(model="openai/gpt-4o"),
    planning=True,
    planning_llm=LLM(model="openai/gpt-4o-mini"),
)
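
chat_llm from the table above isn’t shown here because it only matters when you drive the crew through the crewai chat CLI; it is set the same way. A sketch:

Crew(
    agents=[...],
    tasks=[...],
    chat_llm=LLM(model="openai/gpt-4o-mini"),  # consulted only by `crewai chat`
)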

When you need a provider CrewAI doesn’t support (or you want to proxy through your own gateway), subclass BaseLLM:

from typing import Any

import httpx

from crewai import Agent
from crewai.llms.base_llm import BaseLLM


class MyGatewayLLM(BaseLLM):
    llm_type: str = "my_gateway"

    def call(self, messages: list[dict], **kwargs: Any) -> str:
        r = httpx.post(
            "https://gateway.internal/chat",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"model": self.model, "messages": messages, "temperature": self.temperature},
            timeout=60,
        )
        r.raise_for_status()
        return r.json()["text"]


agent = Agent(
    role="...", goal="...", backstory="...",
    llm=MyGatewayLLM(model="gw/llama-70b", api_key="..."),
)

Minimum surface: a call() method. Streaming and tool-use are opt-in — see the existing native providers under crewai/llms/providers/ for reference implementations.

import os

from crewai import Agent, LLM


def default_llm():
    if os.getenv("USE_LOCAL"):
        return LLM(model="ollama/llama3.1:8b", base_url="http://localhost:11434")
    return LLM(model="openai/gpt-4o-mini")


agent = Agent(role="...", goal="...", backstory="...", llm=default_llm())
Agent(
    role="Tool User",
    goal="...",
    backstory="...",
    llm=LLM(model="openai/gpt-4o-mini"),
    function_calling_llm=LLM(model="openai/gpt-4o"),
)

Only the tool-calling round uses the expensive model.

import time
from crewai import LLM

class Timing:
    def log_pre_api_call(self, model, messages, kwargs):
        self.t0 = time.perf_counter()

    def log_post_api_call(self, *args, **kwargs):
        print("call took", time.perf_counter() - self.t0, "s")

llm = LLM(model="openai/gpt-4o", callbacks=[Timing()])

Callbacks are passed through to LiteLLM — the standard LiteLLM callback API applies.
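
If you prefer LiteLLM’s own base class, the same hooks are available by subclassing its CustomLogger; a sketch, assuming litellm is installed (UsageLogger is a made-up name):

from crewai import LLM
from litellm.integrations.custom_logger import CustomLogger

class UsageLogger(CustomLogger):
    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        # response_obj is LiteLLM's ModelResponse for the completed call
        print("usage:", response_obj.usage)

llm = LLM(model="openai/gpt-4o", callbacks=[UsageLogger()])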

4. Structured output via provider-native mode

from pydantic import BaseModel
from crewai import LLM

class Plan(BaseModel):
    steps: list[str]

llm = LLM(model="openai/gpt-4o", response_format=Plan)

Works for OpenAI, Gemini 2, and Claude’s structured-tool mode. For other providers set Task.output_pydantic instead.
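
For a provider without native structured output, attaching the same model at the task level looks like this; a sketch reusing the Plan model above (the agent and task text are placeholders):

from crewai import Agent, Task, LLM

planner = Agent(
    role="Planner", goal="...", backstory="...",
    llm=LLM(model="mistral/mistral-large-latest"),  # LiteLLM-routed provider
)

plan_task = Task(
    description="Draft a rollout plan",
    expected_output="A numbered list of steps",
    agent=planner,
    output_pydantic=Plan,  # validated into Plan after the raw completion
)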

LLM(model="groq/llama3-70b", is_litellm=True, api_base=os.environ["LITELLM_PROXY"])

Forces LiteLLM so you benefit from its router, rate-limiting, and fallbacks.

  • Provider prefix is checked against a whitelist. Typos (openaii/...) fall through to LiteLLM, which then fails with a confusing error. Stick to the table above.
  • is_anthropic and stop — the router strips stop for GPT-5 / o-series models in 1.14; passing it elsewhere works as expected.
  • LiteLLM is optional. Without it installed, any non-native model raises ImportError. Install with pip install litellm or uv add 'crewai[litellm]'.
  • max_tokens vs max_completion_tokens. OpenAI o-series wants the latter; LiteLLM maps both. If you set both, the native provider path picks max_completion_tokens (see the sketch after this list).
  • Custom BaseLLM subclasses don’t inherit streaming — implement call_with_streaming / the streaming protocol yourself.
  • reasoning=True on Agent was renamed — use planning=True. The LLM’s reasoning_effort is separate and still valid.
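
For the token-limit gotcha above, a minimal sketch of the safe pattern, assuming the precedence described in that bullet:

from crewai import LLM

# o-series reasoning models: set only max_completion_tokens; the native
# path prefers it anyway if both are present.
o3 = LLM(model="openai/o3", max_completion_tokens=2048)

# Regular chat models: plain max_tokens works on either path.
gpt = LLM(model="openai/gpt-4o-mini", max_tokens=1024)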