LLM providers and routing
Verified against crewai==1.14.3a2 (source: `crewai/llm.py`, `crewai/llms/base_llm.py`, `crewai/llms/providers/`).
The LLM factory does more than it appears to: it's a `__new__`-time router that picks between a native provider SDK and the LiteLLM fallback based on the model name, and it's stricter than older docs suggest.
Minimal runnable example
```python
from crewai import LLM

# Native provider (OpenAI) — uses the `openai` SDK directly
gpt = LLM(model="openai/gpt-4o-mini", temperature=0.2)

# Provider prefix omitted — LLM infers the provider from the model name
gpt = LLM(model="gpt-4o-mini")

# Non-native → LiteLLM fallback (requires litellm installed)
mistral = LLM(model="mistral/mistral-large-latest")
```

The first argument must include a recognised provider prefix if the model name is ambiguous; otherwise `LLM._infer_provider_from_model` tries to guess.
How routing works
```
LLM(model=..., **kwargs)
           │
           ▼
┌─────────────────────────────────────┐
│ 1. kwargs has `provider=...`        │
│    → force that provider, native    │
└─────────────────────────────────────┘
           │ no
           ▼
┌─────────────────────────────────────┐
│ 2. "/" in model ("openai/gpt-4o")   │
│    → prefix looked up in            │
│      provider_mapping               │
│    → if model is in the native      │
│      constants → native SDK         │
│    → otherwise → LiteLLM            │
└─────────────────────────────────────┘
           │ no
           ▼
┌─────────────────────────────────────┐
│ 3. No "/" — infer from name pattern │
│    (gpt-*/claude-*/gemini-*…)       │
└─────────────────────────────────────┘
           │
           ▼
Native SDK? → provider module under crewai/llms/providers/
LiteLLM?    → litellm.completion(...)
```

Native providers (source: `SUPPORTED_NATIVE_PROVIDERS`):
| Prefix | Canonical provider | SDK package |
|---|---|---|
| `openai` | openai | `openai` |
| `anthropic`, `claude` | anthropic | `anthropic` |
| `azure`, `azure_openai` | azure | `openai` (Azure) |
| `google`, `gemini` | gemini | `google-generativeai` |
| `bedrock`, `aws` | bedrock | `boto3` |
| `openrouter` | openrouter | openai-compatible |
| `deepseek`, `ollama`, `ollama_chat`, `hosted_vllm`, `cerebras`, `dashscope` | same | openai-compatible |
Anything else (`mistral/`, `groq/`, `cohere/`, custom prefixes) falls through to LiteLLM.
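To make the decision order concrete, here is a minimal standalone sketch of the same three steps. This is illustrative only: the real logic lives in `LLM.__new__` and `LLM._infer_provider_from_model`, and the prefix set and name patterns below are copied from the table and diagram above rather than from the source constants.

```python
# Illustrative re-implementation of the routing order — not the library's code.
NATIVE_PREFIXES = {
    "openai", "anthropic", "claude", "azure", "azure_openai",
    "google", "gemini", "bedrock", "aws", "openrouter",
    "deepseek", "ollama", "ollama_chat", "hosted_vllm", "cerebras", "dashscope",
}


def route(model: str, provider: str | None = None, is_litellm: bool = False) -> str:
    if is_litellm:                      # explicit opt-out (see "Force LiteLLM" below)
        return "litellm"
    if provider is not None:            # step 1: explicit provider= wins, native path
        return f"native:{provider}"
    if "/" in model:                    # step 2: look the prefix up
        prefix = model.split("/", 1)[0]
        return f"native:{prefix}" if prefix in NATIVE_PREFIXES else "litellm"
    # step 3: no "/", infer the provider from the bare model name
    for pattern, prov in (("gpt-", "openai"), ("o1", "openai"), ("o3", "openai"),
                          ("claude", "anthropic"), ("gemini", "gemini")):
        if model.startswith(pattern):
            return f"native:{prov}"
    return "litellm"


assert route("openai/gpt-4o-mini") == "native:openai"
assert route("gpt-4o-mini") == "native:openai"
assert route("mistral/mistral-large-latest") == "litellm"
```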
Force LiteLLM
```python
LLM(model="openai/gpt-4o-mini", is_litellm=True)
```

Handy when you want LiteLLM's uniform interface even for natively supported models (e.g. to reuse a LiteLLM router config).
Constructor fields (litellm-backed LLM)
The default `LLM` class extends `BaseLLM` with LiteLLM-specific knobs:
| Field | Type | Notes |
|---|---|---|
| `model` | `str` | Required. `provider/model` or a bare name. |
| `temperature` | `float \| None` | |
| `top_p`, `top_logprobs`, `logprobs` | various | Standard sampling params. |
| `max_tokens`, `max_completion_tokens` | `int \| float \| None` | |
| `response_format` | `JsonResponseFormat \| type[BaseModel] \| None` | Provider-native structured output. |
| `stop` | `list[str]` | Stop sequences. |
| `seed` | `int \| None` | Deterministic sampling where supported. |
| `presence_penalty`, `frequency_penalty`, `logit_bias` | `float` / `dict` | OpenAI-style knobs. |
| `base_url` | `str \| None` | Inherited from `BaseLLM`. Used by native OpenAI-compatible paths (`ollama/...`, `hosted_vllm/...`). |
| `api_base`, `api_version`, `api_key` | `str` | `LLM`-specific. `api_base` is the LiteLLM-style endpoint override; both `base_url` and `api_base` are passed through, so either name works for the fallback path. |
| `reasoning_effort` | `"none" \| "low" \| "medium" \| "high"` | For o1/o3/o4 and compatible reasoning models. |
| `thinking` | `Any` | Anthropic extended thinking config. |
| `stream` | `bool` | Global default — `Crew(stream=True)` overrides per run. |
| `callbacks` | `list[Any]` | Passed to LiteLLM. |
| `timeout` | `float \| int \| None` | Per-call wall clock. |
| `context_window_size` | `int` | Override the auto-detected context window. |
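As a quick illustration of how these combine, a single instance with several knobs set (values are arbitrary, not recommendations):

```python
from crewai import LLM

llm = LLM(
    model="openai/gpt-4o",
    temperature=0.1,              # sampling
    max_completion_tokens=2048,   # preferred over max_tokens for o-series (see gotchas)
    seed=42,                      # deterministic sampling where supported
    stop=["\nObservation:"],      # stop sequences
    timeout=60,                   # per-call wall clock, seconds
    context_window_size=128_000,  # override auto-detection
)
```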
Reasoning models
```python
from crewai import LLM

o3 = LLM(model="openai/o3", reasoning_effort="medium")

claude = LLM(
    model="anthropic/claude-sonnet-4-6",
    thinking={"type": "enabled", "budget_tokens": 4096},
)
```

- `reasoning_effort` maps onto OpenAI's `reasoning.effort`.
- `thinking` is passed through to Anthropic — the reasoning tokens show up in `CrewOutput.token_usage.reasoning_tokens` when enabled (see the snippet after this list).
- GPT-5 and the o-series intentionally ignore `stop=` in 1.14+; the router strips it so you don't have to.
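A minimal check that extended thinking actually ran, assuming a `crew` already wired to the `claude` instance above (the wiring itself is left out):

```python
result = crew.kickoff()
# Non-zero when extended thinking was enabled and used.
print(result.token_usage.reasoning_tokens)
```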
Connecting to self-hosted / Ollama / vLLM
```python
LLM(model="ollama/llama3.1:8b", base_url="http://localhost:11434")

LLM(model="hosted_vllm/my-finetune", api_base="http://vllm:8000/v1", api_key="EMPTY")
```

Both of these take the native OpenAI-compatible path — no LiteLLM needed.
Per-agent vs per-task LLMs
```python
from crewai import Agent, LLM

writer = Agent(
    role="Writer",
    goal="...",
    backstory="...",
    llm=LLM(model="openai/gpt-4o-mini"),
    # tool-call formatting on a smarter model
    function_calling_llm=LLM(model="openai/gpt-4o"),
)
```

`function_calling_llm` is used only to format tool-call JSON arguments. Most models don't need it; leave it unset unless your primary model struggles with JSON schemas.
Chat / manager / planner LLMs
`Crew` accepts three extra LLM slots:
| Field | Used by |
|---|---|
| `manager_llm` | Hierarchical process — spins up a default manager agent. |
| `planning_llm` | Crew-level `planning=True` planner. |
| `chat_llm` | `crewai chat` CLI against this crew. |
```python
from crewai import Crew, Process, LLM

Crew(
    agents=[...],
    tasks=[...],
    process=Process.hierarchical,
    manager_llm=LLM(model="openai/gpt-4o"),
    planning=True,
    planning_llm=LLM(model="openai/gpt-4o-mini"),
)
```

Custom LLM — subclass BaseLLM
When you need a provider CrewAI doesn't support (or you want to proxy through your own gateway), subclass `BaseLLM`:
```python
from typing import Any

import httpx

from crewai import Agent
from crewai.llms.base_llm import BaseLLM


class MyGatewayLLM(BaseLLM):
    llm_type: str = "my_gateway"

    def call(self, messages: list[dict], **kwargs: Any) -> str:
        r = httpx.post(
            "https://gateway.internal/chat",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={
                "model": self.model,
                "messages": messages,
                "temperature": self.temperature,
            },
            timeout=60,
        )
        r.raise_for_status()
        return r.json()["text"]


agent = Agent(
    role="...",
    goal="...",
    backstory="...",
    llm=MyGatewayLLM(model="gw/llama-70b", api_key="..."),
)
```

Minimum surface: a `call()` method. Streaming and tool-use are opt-in — see the existing native providers under `crewai/llms/providers/` for reference implementations.
Patterns
1. Environment-driven provider switch
```python
import os

from crewai import Agent, LLM


def default_llm():
    if os.getenv("USE_LOCAL"):
        return LLM(model="ollama/llama3.1:8b", base_url="http://localhost:11434")
    return LLM(model="openai/gpt-4o-mini")


agent = Agent(role="...", goal="...", backstory="...", llm=default_llm())
```

2. Cheap primary, smart function-calling
```python
Agent(
    role="Tool User",
    goal="...",
    backstory="...",
    llm=LLM(model="openai/gpt-4o-mini"),
    function_calling_llm=LLM(model="openai/gpt-4o"),
)
```

Only the tool-calling round uses the expensive model.
3. Observed latency/cost with callbacks
```python
import time


class Timing:
    def log_pre_api_call(self, model, messages, kwargs):
        self.t0 = time.perf_counter()

    def log_post_api_call(self, *a, **k):
        print("call took", time.perf_counter() - self.t0, "s")


llm = LLM(model="openai/gpt-4o", callbacks=[Timing()])
```

Callbacks are passed through to LiteLLM — the standard LiteLLM callback API applies.
4. Structured output via provider-native mode
```python
from pydantic import BaseModel


class Plan(BaseModel):
    steps: list[str]


llm = LLM(model="openai/gpt-4o", response_format=Plan)
```

Works for OpenAI, Gemini 2, and Claude's structured-tool mode. For other providers set `Task.output_pydantic` instead (see the sketch below).
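For comparison, the task-level fallback might look like this (a sketch; `plan_agent` is a placeholder for any agent whose model lacks native structured output):

```python
from crewai import Task

plan_task = Task(
    description="Draft a step-by-step plan",
    expected_output="A JSON object with a `steps` list",
    agent=plan_agent,      # placeholder agent, not defined above
    output_pydantic=Plan,  # validated after generation instead of provider-natively
)
```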
5. LiteLLM routing config
```python
LLM(model="groq/llama3-70b", is_litellm=True, api_base=os.environ["LITELLM_PROXY"])
```

Forces LiteLLM so you benefit from its router, rate-limiting, and fallbacks.
Gotchas
- Provider prefix is checked against a whitelist. Typos (`openaii/...`) fall through to LiteLLM, which then fails with a confusing error. Stick to the table above.
- `is_anthropic` and `stop` — the router strips `stop` for GPT-5 / o-series models in 1.14; passing it elsewhere works as expected.
- LiteLLM is optional. Without it installed, any non-native model raises `ImportError`. Install with `pip install litellm` or `uv add 'crewai[litellm]'`.
- `max_tokens` vs `max_completion_tokens`. OpenAI o-series wants the latter; LiteLLM maps both. If you set both, the native provider path picks `max_completion_tokens` (see the sketch after this list).
- Custom `BaseLLM` subclasses don't inherit streaming — implement `call_with_streaming` / the streaming protocol yourself.
- `reasoning=True` on `Agent` was renamed — use `planning=True`. The LLM's `reasoning_effort` is separate and still valid.
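For instance, setting both limits on an o-series model (illustrative values):

```python
llm = LLM(model="openai/o3", max_tokens=1024, max_completion_tokens=4096)
# On the native OpenAI path, max_completion_tokens=4096 is what gets sent.
```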