LlamaIndex Architecture and System Diagrams
LlamaIndex Architecture and System Diagrams
Section titled “LlamaIndex Architecture and System Diagrams”Comprehensive visual documentation of LlamaIndex components, workflows, and patterns.
Table of Contents
Section titled “Table of Contents”- Core Architecture
- Data Pipeline
- RAG Systems
- Agent Workflows
- Multi-Agent Systems
- Query Processing
- Index Types
- Memory and Context
- Advanced Patterns
Core Architecture
Section titled “Core Architecture”LlamaIndex System Overview
Section titled “LlamaIndex System Overview”┌─────────────────────────────────────────────────────────────┐│ Application Layer ││ (Agents, Query Engines, Chat Interfaces, APIs) │└────────────────────────┬────────────────────────────────────┘ │ ┌───────────────┼───────────────┐ │ │ │ ┌────▼────┐ ┌─────▼─────┐ ┌────▼────┐ │ Agents │ │ Query │ │ Chat │ │ (ReAct, │ │ Engines │ │Interface│ │Function)│ │(Router, │ │ │ └────┬────┘ │Sub-Qs) │ └────┬────┘ │ └─────┬─────┘ │ └────────────┬─┴──────────────┘ │ ┌────────────▼──────────────┐ │ RAG Pipeline & Tools │ │ (Retrievers, Rerankers, │ │ Postprocessors) │ └────────────┬──────────────┘ │ ┌────────────▼──────────────┐ │ Indexing & Storage │ │ (VectorStoreIndex, │ │ TreeIndex, ListIndex) │ └────────────┬──────────────┘ │ ┌────────────▼──────────────┐ │ Data Layer │ │ (Documents, Nodes, │ │ Loaders, Embeddings) │ └────────────┬──────────────┘ │ ┌────────────▼──────────────┐ │ External Integrations │ │ (LLMs, Vector Stores, │ │ Memory Systems) │ └──────────────────────────┘Component Communication
Section titled “Component Communication” ┌─────────────┐ │ User │ │ Query │ └──────┬──────┘ │ ┌──────▼──────────┐ │ Query Engine │ │ (Orchestrator) │ └──────┬──────────┘ │ ┌─────────┼─────────┐ │ │ │┌───▼──┐ ┌───▼──┐ ┌───▼──┐│Ret. │ │Post │ │Output││ │ │Proc. │ │Form. │└───┬──┘ └───┬──┘ └───┬──┘ │ │ │ └────┬───┴───┬────┘ │ │ ┌────▼──┐ ┌─▼────┐ │Vector │ │ LLM │ │Store │ │ │ └───────┘ └──────┘Data Pipeline
Section titled “Data Pipeline”Document Processing Pipeline
Section titled “Document Processing Pipeline”Input Sources │ ├─ PDFs ┐ ├─ Text Files │ ├─ Web Pages ├─▶ Data Loaders ┐ ├─ Databases │ (100+) │ ├─ APIs │ ├─▶ Documents └─ Custom │ │ (Structured │ │ Text + Meta) │ │ └──────────────────┘ │ ▼ ┌──────────────────┐ │ Text Splitting │ │ (Chunk Strategy) │ └──────┬───────────┘ │ ▼ ┌──────────────────┐ │ Create Nodes │ │ (Add Metadata) │ └──────┬───────────┘ │ ▼ ┌──────────────────┐ │ Generate │ │ Embeddings │ └──────┬───────────┘ │ ▼ ┌──────────────────┐ │ Store in Vector │ │ Store & Index │ └──────────────────┘Node Structure
Section titled “Node Structure”┌─────────────────────────────────┐│ TextNode │├─────────────────────────────────┤│ • node_id: "unique-id" ││ • text: "Content..." ││ • embedding: [0.1, 0.2, ...] ││ • metadata: { ││ page: 1, ││ source: "doc.pdf", ││ section: "intro" ││ } ││ • relationships: { ││ next: node_2, ││ source: doc_1 ││ } │└─────────────────────────────────┘RAG Systems
Section titled “RAG Systems”Complete RAG Pipeline
Section titled “Complete RAG Pipeline”┌────────────────────────┐│ Question/Query │└────────────┬───────────┘ │ ▼ ┌────────────────────┐ │ Query Processing │ │ (Expansion, │ │ Rewriting) │ └────────┬───────────┘ │ ▼ ┌────────────────────┐ │ Retrieval │ │ (Semantic Search │ │ + Ranking) │ └────────┬───────────┘ │ ┌────────▼───────────┐ │ Retrieved Context │ │ + User Query │ └────────┬───────────┘ │ ▼ ┌────────────────────┐ │ LLM Processing │ │ (Generation) │ └────────┬───────────┘ │ ▼ ┌────────────────────┐ │ Generated Response │ │ + Citations │ └────────────────────┘Retrieval Architecture
Section titled “Retrieval Architecture”User Query │ ▼Embedding → Vector │ ▼Vector Store Search (Similarity) │ ├─▶ Top-K Similar Nodes │ ▼Postprocessors │ ├─▶ Similarity Threshold Filter ├─▶ Reranking (LLM/Semantic) ├─▶ Deduplication └─▶ Metadata Filtering │ ▼Final Ranked Nodes │ ▼Context Building (with Metadata)Hybrid Search Architecture
Section titled “Hybrid Search Architecture”Query │ ├─ Vector Search Branch Keyword Search Branch │ │ │ │ ├─ Embedding Generation ├─ BM25 Tokenization │ │ │ │ │ │ │ ├─ Vector Store Lookup │ ├─ Inverted Index Lookup │ │ │ │ │ │ │ │ │ │ └─▶ Similarity Score │ │ └─▶ Relevance Score │ │ │ │ │ │ └─▶ Results Set A └─▶ Results Set B │ ├─ Merge Results │ (Weighted Combination) │ ▼Unified Ranked ResultsAgent Workflows
Section titled “Agent Workflows”ReAct Loop
Section titled “ReAct Loop”┌─────────────────────┐│ User Query/Goal │└──────────┬──────────┘ │ ▼┌─────────────────────┐│ OBSERVE ││ - Current State ││ - Available Tools ││ - Context │└──────────┬──────────┘ │ ▼┌─────────────────────┐│ THINK ││ LLM Reasons About ││ What To Do Next │└──────────┬──────────┘ │ ▼ ┌──────────────┐ │ Decision │ ├──────────────┤ │ Use Tool? │ └──┬────────┬──┘ │ │ YES NO │ │ ▼ ▼ ┌───────┐ ┌─────────────┐ │ ACT │ │ Generate │ │ │ │ Final │ │Execute│ │ Answer │ │Tool │ │ │ └───┬───┘ └─────────────┘ │ │ └─────┬──────┘ │ ▼ ┌─────────────┐ │ Loop Check │ ├─────────────┤ │ Max Iters? │ │ Done? │ └──┬─────┬────┘ │ │ YES NO │ │ │ └──▶ Loop Back │ to OBSERVE ▼ ┌─────────────┐ │ Return │ │ Response │ └─────────────┘Function Calling Agent
Section titled “Function Calling Agent”Query │ ▼LLM with Function Definitions │ ├─ Parse Available Functions │ ▼Parallel Function Calls (Async) │ ├─▶ Function 1 ──▶ Result 1 ├─▶ Function 2 ──▶ Result 2 └─▶ Function 3 ──▶ Result 3 │ ▼Aggregate Results │ ▼Final SynthesisTool Integration
Section titled “Tool Integration” Agent │ ┌─────────────┼─────────────┐ │ │ │ ┌───▼────┐ ┌───▼────┐ ┌───▼────┐ │ Tool 1 │ │ Tool 2 │ │ Tool 3 │ │ │ │ │ │ │ │┌──────┐│ │┌──────┐│ │┌──────┐│ ││Input ││ ││Input ││ ││Input ││ │└───┬──┘│ │└───┬──┘│ │└───┬──┘│ │ │ │ │ │ │ │ │ │ │ └─┬─┘ │ └─┬─┘ │ └─┬─┘ │ │ │ │ │ │ └──┬───┘ └──┬───┘ └──┬───┘ │ │ │ ├────────────┼────────────┤ │ │ │ ▼ ▼ ▼ Result1 Result2 Result3 │ │ │ └────────────┼────────────┘ │ ▼ Combined Results │ ▼ Agent ResponseMulti-Agent Systems
Section titled “Multi-Agent Systems”Multi-Agent Architecture
Section titled “Multi-Agent Architecture”┌───────────────────────────────────────┐│ Control Plane / Orchestrator ││ (Task Distribution, Coordination) │└────────────┬────────────────────────┬──┘ │ │ ┌────────▼────────┐ ┌────────▼────────┐ │ Message Queue │◀────▶│ State Manager │ │ (Redis/RabbitMQ)│ │ (Consensus) │ └────────┬────────┘ └────────┬────────┘ │ │ ┌────────┼─────────────────────────┼────────┐ │ │ │ │┌───▼──┐ ┌──▼────┐ ┌──▼───┐ ┌──▼──┐│Agent1│ │Agent 2│ ... Agent N │Monit.│ │Cache││ │ │ │ │ │ │ ││Spec │ │Special│ │ │ │ ││Task1 │ │Task2 │ │ │ │ │└──────┘ └───────┘ └──────┘ └─────┘Agent Communication Flow
Section titled “Agent Communication Flow”Agent A Agent B │ │ ├─▶ "I need data on topic X" │ │ │ │ ◀─ "Requesting from service" ───┤ │ │ │ ┌─ Message Queue (RabbitMQ) │ │ │ ├─ Topic: retrieve_data │ │ │ ├─ Priority: high │ │ │ └─ Callback_queue: agent_a │ │ │ │ │ └──────────────────────────────▶│ │ │ │ Process │ │ Data │ │ │ │ ◀────────────────────────────────┤ │ "Data ready at /storage/x" │ │ │ Continue Processing DoneQuery Processing
Section titled “Query Processing”Query Engine Selection Flow
Section titled “Query Engine Selection Flow”User Query │ ▼Router Engine │ ├─ Query Analysis │ ├─ Domain detection │ ├─ Complexity analysis │ └─ Intent extraction │ ├─ Engine Candidates │ ├─ Vector Search Engine │ ├─ Keyword Search Engine │ ├─ Hierarchical Engine │ └─ Hybrid Engine │ ▼LLM Selection Logic │ ├─ Score each engine ├─ Rank by relevance └─ Select best match │ ▼Selected Query Engine │ ▼Retrieval + Generation │ ▼ResponseSub-Question Decomposition
Section titled “Sub-Question Decomposition”Complex Query │ ▼LLM Decomposition │ ├─▶ Sub-Question 1 ├─▶ Sub-Question 2 ├─▶ Sub-Question 3 └─▶ Sub-Question 4 │ ├──────────────────┐ │ Parallel Query │ │ Execution │ │ (async/await) │ └──┬───────────────┘ │ ├─▶ [Results 1] ──────┐ ├─▶ [Results 2] ──────┼─▶ Results Pool ├─▶ [Results 3] ──────┤ └─▶ [Results 4] ──────┘ │ ▼ LLM Synthesis │ ▼ Final AnswerIndex Types
Section titled “Index Types”Index Comparison
Section titled “Index Comparison”┌──────────────────────────────────────────────────────┐│ Index Type Comparison │├──────────────────────────────────────────────────────┤│ ││ VectorStoreIndex ││ ├─ Fast semantic search ││ ├─ Requires embeddings ││ ├─ Memory: Moderate ││ └─ Use: General RAG, semantic queries ││ ││ ListIndex ││ ├─ Simple sequential retrieval ││ ├─ No embeddings needed ││ ├─ Memory: Low ││ └─ Use: Small datasets, similarity scoring ││ ││ TreeIndex ││ ├─ Hierarchical decomposition ││ ├─ Creates tree structure ││ ├─ Memory: Moderate-High ││ └─ Use: Complex documents, multi-level queries ││ ││ KeywordTableIndex ││ ├─ Keyword-based lookup ││ ├─ Inverted index structure ││ ├─ Memory: Low-Moderate ││ └─ Use: Keyword-heavy queries, exact matches ││ ││ SummaryIndex ││ ├─ Summary-based retrieval ││ ├─ LLM-generated summaries ││ ├─ Memory: High ││ └─ Use: Collection-level queries ││ │└──────────────────────────────────────────────────────┘Vector Store Integration
Section titled “Vector Store Integration”Index Creation │ ├─▶ Document Processing │ ├─ Chunking │ └─ Embedding │ ├─▶ Vector Generation │ ├─ OpenAI Embeddings │ ├─ HuggingFace Embeddings │ └─ Local Models │ ▼Vector Store Options │ ├─ Chroma (Local/Vector DB) ├─ Pinecone (Cloud) ├─ Weaviate (Distributed) ├─ Supabase (PostgreSQL+pgvector) ├─ Milvus (Scalable) └─ Many more... │ ▼Index Ready for QueriesMemory and Context
Section titled “Memory and Context”Chat Memory Buffer
Section titled “Chat Memory Buffer”┌──────────────────────────────────┐│ ChatMemoryBuffer │├──────────────────────────────────┤│ ││ Max Tokens: 2000 ││ ││ Turn 1: ││ ├─ User: "Hello" ││ └─ Assistant: "Hi there!" ││ ││ Turn 2: ││ ├─ User: "What did I ask?" ││ └─ Assistant: "You said hello" ││ ││ ... (recent context maintained)│ ││ Token Count: ││ ├─ Current: 1850 / 2000 ││ ├─ When full: Prune oldest ││ └─ Keep recent turns ││ │└──────────────────────────────────┘Memory Types
Section titled “Memory Types” Memory Systems │ ┌────────────────┼────────────────┐ │ │ │ ┌───▼──┐ ┌───▼──┐ ┌───▼──┐ │ Chat │ │Vector│ │Summary│ │Memory│ │Store │ │Memory │ │Buffer│ │Memory│ │ │ └──┬───┘ └──┬───┘ └──┬───┘ │ │ │ ├─ Last N ├─ Semantic ├─ Condensed │ messages │ search │ history │ │ over past │ ├─ Fast │ turns ├─ Compact │ retrieval │ │ storage │ ├─ Chroma, │ └─ Limited │ Pinecone, └─ LLM context │ Weaviate │ generated │ │ └─ Flexible └─ Summarized retentionAdvanced Patterns
Section titled “Advanced Patterns”Agentic RAG
Section titled “Agentic RAG”Query │ ▼Agent Reasoning │ ├─ Plan retrieval strategy ├─ Determine if retrieval needed └─ Plan reasoning steps │ ▼Conditional Retrieval │ ├─ Query 1: Search for context ├─ Evaluate results ├─ Query 2: Fetch specific details └─ Combine all retrieved data │ ▼LLM Reasoning + Generation │ ├─ Reason over retrieved data ├─ Apply multi-step logic ├─ Verify answers └─ Generate response │ ▼Response with VerificationSelf-Reflection Loop
Section titled “Self-Reflection Loop”Initial Response Generation │ ▼ ┌──────────────┐ │ Reflection │ │ Analysis │ │ (LLM) │ └──────┬───────┘ │ ┌──▼──┐ │Pass?│ └──┬──┘ │ ┌────┴────┐ │ │ YES NO │ │ ▼ ▼ Return ┌──────────────┐ │ Improvement │ │ Suggestions │ │ (LLM) │ └──────┬───────┘ │ ▼ ┌──────────────┐ │ Regenerate │ │ Response │ │ (Improved) │ └──────┬───────┘ │ ▼ ReturnTool Chaining
Section titled “Tool Chaining”Task │ ▼[Tool A] ──▶ Output A │ │ (Depends on A) ▼[Tool B] ──▶ Output B │ │ (Depends on B) ▼[Tool C] ──▶ Output C │ │ (Aggregate) ▼Final ResultRouter Pattern
Section titled “Router Pattern” Query │ ▼ Router Agent │ ┌──────┼──────┬──────┐ │ │ │ │ ▼ ▼ ▼ ▼ [Sales] [Support] [Technical] [Product] Agent Agent Agent Agent │ │ │ │ └──────┼──────┼──────┘ │ ▼ Route Query to Selected Agent │ ▼ ResponsePerformance Optimization
Section titled “Performance Optimization”Retrieval Optimization
Section titled “Retrieval Optimization”Query │ ▼Preprocessing (5ms) │ ├─ Normalization └─ Expansion │ ▼Embedding (10ms) │ ▼Vector Search (50ms) │ ├─ Fast approximate search ├─ Top-K candidates └─ Initial ranking │ ▼Reranking (20ms) │ ├─ Semantic reranking ├─ Relevance scoring └─ Final ranking │ ▼Generation (100ms) │ ▼Total: ~185msCaching Strategy
Section titled “Caching Strategy”Incoming Query │ ▼┌─────────────────┐│ Cache Check ││ (Fast) │└────┬────────────┘ │ ┌──▼──┐ │Hit? │ └──┬──┘ │ ┌──┴──┐ │ │ YES NO │ │ │ └─▶ Cache Miss │ │ │ ├─ Retrieve │ ├─ Generate │ ├─ Store in Cache │ │ ▼ ▼Return Cached Return GeneratedResponse Response + CacheThis diagrams document provides visual representations of all major LlamaIndex components and workflows. Each diagram illustrates the flow of data and control through different aspects of the framework.