Knowledge sources
Verified against crewai==1.14.3a2 (source:
crewai/knowledge/knowledge.py,crewai/knowledge/source/*.py,crewai/knowledge/knowledge_config.py).
Knowledge is a read-mostly vector store attached to an Agent or Crew. Unlike Memory it’s populated ahead of time and isn’t automatically written to during a run.
Minimal runnable example
Section titled “Minimal runnable example”from crewai import Agent, Crew, Taskfrom crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
faq = StringKnowledgeSource( content=( "Returns: accepted within 30 days, original packaging only.\n" "Shipping: free over $50 to the continental US.\n" ),)
support = Agent( role="Customer Support", goal="Answer customer questions using company policy", backstory="Careful agent who quotes policy verbatim.", knowledge_sources=[faq],)
task = Task( description="Customer asks: {question}", expected_output="Short answer citing policy.", agent=support,)
out = Crew(agents=[support], tasks=[task]).kickoff( inputs={"question": "Can I return an item after 6 weeks?"})print(out.raw)Built-in sources
Section titled “Built-in sources”All live under crewai.knowledge.source.*. They subclass BaseKnowledgeSource (or BaseFileKnowledgeSource for file-backed ones).
| Source | Import | Takes | Notes |
|---|---|---|---|
| String | StringKnowledgeSource | content: str | In-memory text. |
| Text file | TextFileKnowledgeSource | file_paths=[...] | Plain .txt, .md, etc. |
PDFKnowledgeSource | file_paths=[...] | Needs pdfplumber (pip install pdfplumber). | |
| CSV | CSVKnowledgeSource | file_paths=[...] | Concatenates rows as text. |
| JSON | JSONKnowledgeSource | file_paths=[...] | Recursively flattens to text. |
| Excel | ExcelKnowledgeSource | file_paths=[...] | Reads each sheet; needs openpyxl. |
| Docling | CrewDoclingSource | file_paths=[...] or URLs | Universal — PDF/DOCX/HTML/Markdown/AsciiDoc/images. Needs docling. |
Common fields on every source:
| Field | Type | Default | Notes |
|---|---|---|---|
chunk_size | int | 4000 | Character size per chunk. |
chunk_overlap | int | 200 | Overlap between chunks. |
metadata | dict | {} | Currently unused but preserved on the source. |
collection_name | str | None | None | Explicit collection name. Defaults to the agent role. |
File sources additionally accept file_paths=[...]. The singular file_path still works but is deprecated.
String source
Section titled “String source”from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
policy = StringKnowledgeSource(content=open("policy.txt").read(), chunk_size=1200, chunk_overlap=100)PDF / Text / CSV / JSON
Section titled “PDF / Text / CSV / JSON”from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSourcefrom crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSourcefrom crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSourcefrom crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource
pdfs = PDFKnowledgeSource(file_paths=["contracts/2024.pdf", "contracts/2025.pdf"])csvs = CSVKnowledgeSource(file_paths=["sales/q1.csv"])jsons = JSONKnowledgeSource(file_paths=["config/routes.json"])texts = TextFileKnowledgeSource(file_paths=["README.md"])Paths are resolved relative to knowledge/ in the project root unless absolute — CrewAI will look there first.
Excel source
Section titled “Excel source”from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource
xl = ExcelKnowledgeSource(file_paths=["quarterly.xlsx"])Each sheet becomes an inner dict keyed by sheet name; chunks tag the sheet in the metadata when available.
Docling — universal source
Section titled “Docling — universal source”from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
docs = CrewDoclingSource(file_paths=[ "https://arxiv.org/pdf/2305.15334", # URL "internal/handbook.pdf", "internal/handbook.docx",])Accepted formats per source: MD, ASCIIDOC, PDF, DOCX, HTML, images. If docling isn’t installed, importing the class raises immediately.
Custom sources
Section titled “Custom sources”Subclass BaseKnowledgeSource (or BaseFileKnowledgeSource for file I/O). Two methods to implement:
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
class NotionSource(BaseKnowledgeSource): database_id: str
def validate_content(self) -> None: if not self.database_id: raise ValueError("database_id required")
def add(self) -> None: pages = fetch_notion_pages(self.database_id) # your code for p in pages: self.chunks.extend(self._chunk_text(p.text)) self._save_documents()_chunk_text and _save_documents are inherited — you rarely need to override them.
Attaching knowledge
Section titled “Attaching knowledge”To an agent
Section titled “To an agent”from crewai import Agentagent = Agent( role="Legal Researcher", goal="Find relevant clauses", backstory="GC with 20 years of experience.", knowledge_sources=[pdfs, faq],)The collection name defaults to the agent’s role; storage lives at ./knowledge/<collection> (LanceDB).
To a whole crew
Section titled “To a whole crew”from crewai import Crewcrew = Crew( agents=[a, b], tasks=[t1, t2], knowledge_sources=[pdfs], # shared across all agents embedder={"provider": "openai", "config": {"model": "text-embedding-3-large"}},)Agents inherit the crew embedder if they don’t set their own.
Tuning retrieval — KnowledgeConfig
Section titled “Tuning retrieval — KnowledgeConfig”from crewai import Agentfrom crewai.knowledge.knowledge_config import KnowledgeConfig
agent = Agent( role="Researcher", goal="...", backstory="...", knowledge_sources=[pdfs], knowledge_config=KnowledgeConfig(results_limit=8, score_threshold=0.5),)results_limit(default5) — how many chunks the retriever returns.score_threshold(default0.6) — similarity floor; increase for precision, lower for recall.
Querying directly
Section titled “Querying directly”from crewai import Knowledgefrom crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
k = Knowledge( collection_name="kb", sources=[StringKnowledgeSource(content="Python was released in 1991.")],)k.add_sources()
hits = k.query(["when was python released?"], results_limit=3, score_threshold=0.6)for h in hits: print(h)query(queries, results_limit=5, score_threshold=0.6)— synchronous.aquery(...)— async equivalent.reset()— wipe the collection.
Patterns
Section titled “Patterns”1. Versioned company KB
Section titled “1. Versioned company KB”kb = CrewDoclingSource(file_paths=["knowledge/handbook-2026-q1.pdf"])crew = Crew(agents=[...], tasks=[...], knowledge_sources=[kb])New quarter → new file, rebuild the collection (call Knowledge.reset() then add_sources()).
2. Per-agent specialised sources
Section titled “2. Per-agent specialised sources”legal = Agent(role="Legal", ..., knowledge_sources=[legal_pdfs])finops = Agent(role="FinOps", ..., knowledge_sources=[sql_schema_txt, runbooks_md])Collections don’t collide — they’re isolated by role.
3. Tighten retrieval for a precise task
Section titled “3. Tighten retrieval for a precise task”Agent( role="Compliance", goal="Quote the exact paragraph", backstory="Paranoid reader.", knowledge_sources=[pdfs], knowledge_config=KnowledgeConfig(results_limit=3, score_threshold=0.75),)4. Reuse one Knowledge across crews
Section titled “4. Reuse one Knowledge across crews”from crewai import Knowledge
k = Knowledge(collection_name="shared-kb", sources=[pdfs, docs])k.add_sources() # build once
crew_a = Crew(..., knowledge=k)crew_b = Crew(..., knowledge=k)Pass knowledge= (a built Knowledge instance) instead of knowledge_sources= to skip re-indexing.
5. Mixed URL + local ingestion
Section titled “5. Mixed URL + local ingestion”CrewDoclingSource(file_paths=[ "https://developer.mozilla.org/en-US/docs/Web/API", "internal/onboarding.pdf",])Docling treats URLs and local paths the same as long as the format is supported.
Gotchas
Section titled “Gotchas”knowledge/is the default root. Relative paths are resolved there; absolute paths work everywhere.- Embedder needed. Either pass
embedder=on the agent/crew, or setOPENAI_API_KEYfor the default. PDFKnowledgeSourceuses pdfplumber, not the Docling path. If you need tables/images extracted, useCrewDoclingSource.- Chunking is naive by default — a sliding window over characters. Override
_chunk_textin a subclass if you need sentence-aware splits. file_pathis deprecated. Always usefile_paths=[...], even for a single file.- Knowledge is read-heavy. It doesn’t auto-update. For dynamic content, use Memory or rebuild the collection.
knowledge_storage=on an agent overrides the default LanceDB store. Pass a customBaseKnowledgeStorageif you need Qdrant/Pinecone/etc.