Architecture Overview
Learn how Agent-Gantry is designed and how its components work together.
System Architecture
Agent-Gantry follows a layered architecture with clear separation of concerns:
┌─────────────────────────────────────────────────────────────────┐
│                        APPLICATION LAYER                        │
│           (Your Agent / LangChain / AutoGen / CrewAI)           │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                       AGENT-GANTRY FACADE                       │
│                       (AgentGantry Class)                       │
│ ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌───────┐ │
│ │    @with_    │  │  register()  │  │    sync()    │  │execute│ │
│ │semantic_tools│  │retrieve_tools│  │add_mcp_server│  │       │ │
│ └──────────────┘  └──────────────┘  └──────────────┘  └───────┘ │
└─────────────────────────┬───────────────────────────────────────┘
                          │
          ┌───────────────┼───────────────┬───────────────┐
          ▼               ▼               ▼               ▼
     ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
     │ Registry │    │  Router  │    │ Executor │    │   MCP    │
     │          │    │          │    │          │    │  Router  │
     └────┬─────┘    └────┬─────┘    └────┬─────┘    └────┬─────┘
          │               │               │               │
          └───────────────┼───────────────┴───────────────┘
                          │
          ┌───────────────┼───────────────┬───────────────┐
          ▼               ▼               ▼               ▼
     ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
     │  Vector  │    │ Embedder │    │ Reranker │    │   LLM    │
     │  Store   │    │          │    │          │    │  Client  │
     └──────────┘    └──────────┘    └──────────┘    └──────────┘
                          │
          ┌───────────────┼───────────────┬───────────────┐
          ▼               ▼               ▼               ▼
     ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
     │  Python  │    │   MCP    │    │   REST   │    │   A2A    │
     │ Functions│    │ Servers  │    │   APIs   │    │  Agents  │
     └──────────┘    └──────────┘    └──────────┘    └──────────┘
Core Components
1. AgentGantry (Facade)
Purpose: Main entry point providing a unified interface for all operations.
Responsibilities:
- Tool registration and lifecycle management
- Orchestrating semantic search
- Coordinating tool execution
- Managing MCP and A2A integrations
Key Methods:
- register() - Register tools
- sync() - Sync tools to vector store
- retrieve_tools() - Semantic tool search
- execute() - Execute tools with policies
- add_mcp_server() - Integrate MCP servers
- serve_mcp() - Serve as MCP server
Why a Facade?
The facade pattern simplifies the API by hiding complex subsystem interactions. Users interact with a single AgentGantry object rather than managing registry, router, executor, etc. separately.
2. Tool Registry
Purpose: Centralized storage and management of tool definitions.
Responsibilities:
- Store tool metadata and implementations
- Deduplication of tools
- Namespace management
- Tool lookup and retrieval
Key Features:
- Fast in-memory lookup by name/namespace
- Support for tool versioning
- Metadata indexing
- Namespace isolation
Implementation: agent_gantry/core/registry.py
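The lookup, deduplication, and namespace isolation listed above can be pictured with a small in-memory map. This is only a sketch under stated assumptions: `ToolEntry`, `InMemoryRegistry`, and their methods are hypothetical names for illustration and are not taken from agent_gantry/core/registry.py.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ToolEntry:
    """Hypothetical record for one registered tool."""
    name: str
    namespace: str
    description: str
    func: Callable[..., object]


class InMemoryRegistry:
    """Illustrative registry keyed by (namespace, name)."""

    def __init__(self) -> None:
        self._tools: dict[tuple[str, str], ToolEntry] = {}

    def register(self, entry: ToolEntry) -> bool:
        """Store a tool; return False if the same key is already present."""
        key = (entry.namespace, entry.name)
        if key in self._tools:
            return False  # deduplication: keep the first registration
        self._tools[key] = entry
        return True

    def get(self, name: str, namespace: str = "default") -> Optional[ToolEntry]:
        """Dictionary lookup by name within one namespace."""
        return self._tools.get((namespace, name))

    def list_namespace(self, namespace: str) -> list[ToolEntry]:
        """Return only the tools that belong to the given namespace."""
        return [tool for (ns, _), tool in self._tools.items() if ns == namespace]
```

Keying on the (namespace, name) pair is one simple way to let two namespaces hold tools with the same name without colliding.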
3. Semantic Router
Purpose: Intelligent tool selection using semantic similarity.
Responsibilities:
- Embed tool descriptions into vectors
- Perform similarity search against user queries
- Apply filters (namespace, tags, capabilities)
- Rerank results for relevance
Key Workflow:
1. User query → Embed query into vector
2. Vector search in tool database
3. Apply filters (tags, namespace, score threshold)
4. Optional: Rerank using LLM or cross-encoder
5. Return top-K tools
Implementation: agent_gantry/core/router.py
Why Semantic Search?
Keyword matching often fails for natural language queries whose wording differs from a tool's name or description. Semantic search uses embeddings to compare meaning, enabling queries like:
- “I need to get the current time” → matches the get_timestamp() tool
- “Calculate percentage” → matches the calculate() tool
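The core of the workflow (score every tool vector against the query vector, drop low scores, keep the top K) can be sketched with cosine similarity. This is an illustration, not the router's actual code: `cosine_similarity` and `top_k` are hypothetical helpers, and the two-dimensional vectors in the usage note are made up rather than produced by a real embedder.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: list[float],
    tool_vecs: dict[str, list[float]],
    k: int = 3,
    score_threshold: float = 0.0,
) -> list[tuple[str, float]]:
    """Rank tools by similarity to the query, filter by threshold, keep the best k."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in tool_vecs.items()]
    kept = [(name, score) for name, score in scored if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]
```

For example, with toy vectors `{"get_timestamp": [1.0, 0.0], "calculate": [0.0, 1.0]}` and a query vector `[0.9, 0.1]`, `top_k(..., k=1)` ranks `get_timestamp` first because the query points almost entirely along its axis.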
4. MCP Router (NEW in v0.1.2)
Purpose: Semantic selection of MCP servers before connecting.
Responsibilities:
- Store MCP server metadata
- Embed server descriptions into vectors
- Semantic search for relevant servers
- Health tracking for connected servers
Key Workflow:
1. Register MCP servers with metadata
2. Sync server metadata to vector store
3. User query → Find relevant servers
4. Connect only to selected servers
5. Discover and register tools from servers
Implementation: agent_gantry/core/mcp_router.py
Benefits:
- 🎯 Lazy loading - connect only when needed
- 🔒 Security - minimize attack surface
- ⚡ Performance - avoid unnecessary connections
- 📊 Health tracking - monitor server availability
5. Executor
Purpose: Secure, reliable tool execution with observability.
Responsibilities:
- Execute tool implementations
- Apply security policies and permissions
- Circuit breaker pattern for fault tolerance
- Retry logic with exponential backoff
- Telemetry and structured logging
Key Features:
Security:
- Capability-based permissions
- Input validation against tool schemas
- Sandboxing support (optional)
Reliability:
- Circuit breakers (open/half-open/closed states)
- Automatic retries with backoff
- Timeout protection
- Health metrics tracking
Observability:
- Structured logging of all executions
- Success/failure metrics
- Execution duration tracking
- Error categorization
Implementation: agent_gantry/core/executor.py
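Two of the reliability features above, timeout protection and retries with exponential backoff, can be combined in a few lines of asyncio. The following is a minimal sketch, assuming a hypothetical helper named `call_with_retry`; it does not reproduce the code in agent_gantry/core/executor.py, and the default values are arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_retry(
    func: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 3,
    base_delay: float = 0.1,
    timeout: float = 5.0,
) -> T:
    """Run func with a per-attempt timeout, retrying with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            # wait_for cancels the attempt if it exceeds the timeout.
            return await asyncio.wait_for(func(), timeout=timeout)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2*base_delay, 4*base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A real executor would likely retry only on errors known to be transient; catching every `Exception` keeps the sketch short.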
6. Adapters (Extensibility Layer)
Agent-Gantry uses the Adapter Pattern for all external integrations. This allows swapping implementations without changing core logic.
Vector Store Adapters
Purpose: Abstract vector storage and search.
Implementations:
- InMemoryVectorStore - Fast, ephemeral storage (default)
- LanceDBVectorStore - Local persistent storage with LanceDB
- QdrantVectorStore - Qdrant cloud/self-hosted
- ChromaVectorStore - Chroma DB integration
- PGVectorStore - PostgreSQL with pgvector extension
Protocol: agent_gantry/adapters/vector_stores/base.py
Key Methods:
- add() - Add tool embeddings
- search() - Vector similarity search
- delete() - Remove tools
- get_stored_fingerprints() - Change detection
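Change detection with fingerprints usually means hashing each tool definition and re-embedding only the tools whose hash differs from the stored one. How Agent-Gantry computes its fingerprints is not described here, so the sketch below is an assumption: `fingerprint` and `tools_needing_sync` are hypothetical helpers, and SHA-256 over canonical JSON is just one reasonable choice.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Hash a tool definition; sorted keys make the hash independent of key order."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def tools_needing_sync(tools: dict[str, dict], stored: dict[str, str]) -> list[str]:
    """Return names of tools that are new or whose definition has changed."""
    return [
        name
        for name, tool in tools.items()
        if stored.get(name) != fingerprint(tool)
    ]
```

With this approach a sync only pays the embedding cost for tools that were added or edited since the last run.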
Embedder Adapters
Purpose: Convert text to vector embeddings.
Implementations:
- SimpleEmbedder - Deterministic TF-IDF (no API required)
- NomicEmbedder - Nomic AI Matryoshka embeddings (768D, local)
- OpenAIEmbedder - OpenAI text-embedding-3-small/large
- SentenceTransformersEmbedder - HuggingFace models (local)
Protocol: agent_gantry/adapters/embedders/base.py
Key Methods:
- embed() - Single text to vector
- embed_batch() - Batch processing
- dimension - Embedding dimensionality
Choosing an Embedder:
| Embedder | Dimension | Quality | Speed | Cost |
|---|---|---|---|---|
| SimpleEmbedder | ~1000 | Low | Fast | Free |
| NomicEmbedder | 768 | High | Fast | Free (local) |
| OpenAIEmbedder | 1536/3072 | Highest | Medium | Paid API |
| SentenceTransformers | Varies | High | Slow (CPU) | Free (local) |
Recommendation: Use NomicEmbedder for production, SimpleEmbedder for testing.
Reranker Adapters (Optional)
Purpose: Improve search quality by reranking results.
Implementations:
- CohereReranker - Cohere Rerank API
- CrossEncoderReranker - Local sentence-transformers cross-encoder
When to use:
- Large tool libraries (100+ tools)
- When top-3 accuracy is critical
- When initial search returns many similar results
Trade-off: Reranking adds latency but improves accuracy.
Executor Adapters
Purpose: Different execution strategies for tools.
Implementations:
- DirectExecutor - Direct Python function calls (default)
- SandboxExecutor - Execute in isolated environment (future)
- MCPExecutor - Execute tools via MCP protocol
- HTTPExecutor - Call remote REST APIs
- A2AExecutor - Execute A2A agent skills
Data Flow
Tool Registration Flow
┌───────────────┐
│   @register   │
│   decorator   │
└───────┬───────┘
        │
        ▼
┌───────────────────┐
│ Extract metadata  │
│ (name, params,    │
│  description)     │
└───────┬───────────┘
        │
        ▼
┌───────────────────┐
│ Store in Registry │
│ (in-memory map)   │
└───────┬───────────┘
        │
        ▼
┌───────────────────┐
│ Mark as pending   │
│ for sync          │
└───────────────────┘
Semantic Search Flow
┌─────────────────┐
│   User Query    │
│  "get weather"  │
└────────┬────────┘
         │
         ▼
┌─────────────────────┐
│ Embed query using   │
│ embedder adapter    │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Vector search in    │
│ vector store        │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Apply filters       │
│ (namespace, tags)   │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Optional: Rerank    │
│ using LLM/reranker  │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Return top-K tools  │
│ (ToolDefinition[])  │
└─────────────────────┘
Tool Execution Flow
┌──────────────────┐
│     ToolCall     │
│   (name, args)   │
└────────┬─────────┘
         │
         ▼
┌─────────────────────┐
│ Validate arguments  │
│ against schema      │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Check permissions   │
│ & capabilities      │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Check circuit       │
│ breaker state       │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Execute via         │
│ executor adapter    │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Update health       │
│ metrics             │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Log telemetry       │
│ event               │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Return ToolResult   │
│ (output, metadata)  │
└─────────────────────┘
Design Principles
1. Async-First
All core operations are async. This enables:
- Non-blocking I/O for network calls
- Concurrent tool execution
- Better scalability
Pattern:
# All core methods are async
await gantry.sync()
tools = await gantry.retrieve_tools("query")
result = await gantry.execute(call)
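The benefit of async methods shows up when several I/O-bound calls run at once. The sketch below uses only `asyncio` from the standard library; `fake_tool` is a hypothetical stand-in for a network-bound tool call and is not part of Agent-Gantry.

```python
import asyncio


async def fake_tool(name: str, delay: float) -> str:
    """Stand-in for a network-bound tool call."""
    await asyncio.sleep(delay)  # yields control instead of blocking the thread
    return f"{name} done"


async def run_concurrently() -> list[str]:
    # gather() runs the three coroutines concurrently and returns
    # their results in the same order as the arguments.
    return await asyncio.gather(
        fake_tool("weather", 0.05),
        fake_tool("search", 0.05),
        fake_tool("calendar", 0.05),
    )


if __name__ == "__main__":
    print(asyncio.run(run_concurrently()))
```

Because the three waits overlap, the total time is close to the longest single delay rather than the sum of all three.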
2. Schema-First with Pydantic v2
Data models are defined using Pydantic before implementation:
- Type safety
- Automatic validation
- JSON serialization
- Clear contracts
Example:
from typing import Any
from pydantic import BaseModel, Field
class ToolDefinition(BaseModel):
    name: str
    description: str
    parameters: dict[str, Any]
    # ... validation happens automatically
3. Protocol-Based Adapters
External integrations use Python protocols (structural typing):
- No inheritance required
- Duck typing with type safety
- Easy to extend
Example:
from typing import Protocol
class VectorStoreAdapter(Protocol):
    async def add(self, tools: list[ToolDefinition]) -> None: ...
    async def search(self, query_vector: list[float], limit: int) -> list[tuple]: ...
Any class implementing these methods can be a vector store.
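Structural typing can be demonstrated with the standard library alone. In this sketch, `Embedder`, `LengthEmbedder`, and `NotAnEmbedder` are hypothetical names chosen for illustration; the point is that `LengthEmbedder` satisfies the protocol without inheriting from it.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Embedder(Protocol):
    """Anything with a matching embed() method counts as an Embedder."""

    def embed(self, text: str) -> list[float]: ...


class LengthEmbedder:
    """Does not inherit from Embedder, but has a compatible embed() method."""

    def embed(self, text: str) -> list[float]:
        return [float(len(text))]


class NotAnEmbedder:
    """Has no embed() method, so it does not satisfy the protocol."""


def describe(embedder: Embedder, text: str) -> list[float]:
    # A static type checker accepts LengthEmbedder here and rejects NotAnEmbedder.
    return embedder.embed(text)
```

Note that `isinstance()` against a `runtime_checkable` protocol only checks that the method exists, not its signature; full checking is left to a static type checker such as mypy.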
4. Separation of Concerns
Each component has a single responsibility:
- Registry: Storage only
- Router: Search and ranking only
- Executor: Execution and observability only
- AgentGantry: Orchestration only
5. Context-Local State
Agent-Gantry uses contextvars to keep the default instance thread-safe and async-safe:
from agent_gantry import set_default_gantry
set_default_gantry(gantry) # Context-local, safe for async
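To see why context-local state is safe under concurrency, consider how `contextvars` behaves across asyncio tasks. This sketch shows standard-library behaviour only; the variable name `_current` and the functions are hypothetical, and nothing here is taken from the implementation of `set_default_gantry`.

```python
import asyncio
import contextvars

# Hypothetical context variable; the default applies where nothing was set.
_current = contextvars.ContextVar("current_gantry", default="none")


async def worker(name: str) -> str:
    _current.set(name)      # visible only inside this task's context
    await asyncio.sleep(0)  # give the other task a chance to run
    return _current.get()   # still this task's own value


async def main() -> tuple[list[str], str]:
    # gather() wraps each coroutine in a task, and each task
    # runs in its own copy of the current context.
    results = await asyncio.gather(worker("a"), worker("b"))
    return list(results), _current.get()
```

Each worker reads back its own value, and the value seen in `main()` is unchanged, which is the isolation a module-level global would not provide.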
Extensibility Points
1. Custom Vector Store
class MyVectorStore:
    async def add(self, tools, embeddings): ...
    async def search(self, query_vector, limit): ...
    async def delete(self, tool_ids): ...

gantry = AgentGantry(vector_store=MyVectorStore())
2. Custom Embedder
class MyEmbedder:
    @property
    def dimension(self) -> int:
        return 768
    async def embed(self, text: str) -> list[float]: ...
    async def embed_batch(self, texts: list[str]) -> list[list[float]]: ...

gantry = AgentGantry(embedder=MyEmbedder())
3. Custom Executor
from typing import Callable
class MyExecutor:
    async def execute(self, call: ToolCall, func: Callable) -> ToolResult: ...

gantry = AgentGantry(executor=MyExecutor())
4. Custom Tool Provider
Implement a provider to import tools from external sources:
from agent_gantry.providers.base import ToolProvider
class MyProvider(ToolProvider):
    async def discover_tools(self) -> list[ToolDefinition]: ...
Performance Considerations
Vector Search Latency
- InMemoryVectorStore: <10ms for 1000 tools
- LanceDB: <50ms for 100k tools
- Qdrant/Chroma: Varies by network latency
Optimization: Use local storage (LanceDB) for low latency.
Embedding Latency
- SimpleEmbedder: <1ms (deterministic)
- NomicEmbedder: <20ms (local inference)
- OpenAI: 50-200ms (API call)
Optimization: Batch embeddings during sync, cache query embeddings.
Tool Execution
Execution time depends on tool implementation. Use:
- Timeouts to stop long-running tools
- Circuit breakers to isolate failing tools
- Async execution to run tools concurrently
Security Architecture
1. Capability-Based Permissions
Tools declare required capabilities:
@gantry.register(capabilities=["file_write"])
def write_file(path: str, content: str): ...
The executor checks permissions before execution.
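A capability check reduces to a set comparison between what a tool requires and what the caller has been granted. This is a minimal sketch under that assumption; `PermissionDenied` and `check_capabilities` are hypothetical names, not Agent-Gantry's API.

```python
class PermissionDenied(Exception):
    """Raised when a caller lacks a capability the tool requires."""


def check_capabilities(required: set[str], granted: set[str]) -> None:
    """Raise PermissionDenied unless every required capability is granted."""
    missing = required - granted
    if missing:
        raise PermissionDenied(f"missing capabilities: {sorted(missing)}")
```

For the `write_file` example, a caller granted only `{"file_read"}` would be rejected, while one granted `{"file_read", "file_write"}` would pass.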
2. Input Validation
All tool arguments are validated against a JSON Schema:
# Automatic validation
tool.parameters = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"]
}
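To make the check concrete, here is a hand-rolled validator covering only a small subset of JSON Schema (`required` plus four primitive types). It is an illustrative sketch, not the validation Agent-Gantry performs; `validate_args` is a hypothetical helper, and a production system would more likely rely on Pydantic or a full JSON Schema library.

```python
from typing import Any

# JSON Schema type names mapped to Python types (a deliberately small subset).
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
}


def validate_args(args: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of error messages; an empty list means the call is valid."""
    errors: list[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument: {name}")
    for name, value in args.items():
        expected = _JSON_TYPES.get(properties.get(name, {}).get("type", ""))
        if expected is None:
            continue  # unknown argument or unsupported type: skipped in this sketch
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and bool not in expected
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(f"argument {name!r} has the wrong type")
    return errors
```

Against the schema above, `{"city": "Oslo"}` yields no errors, an empty argument dict reports the missing `city`, and `{"city": 5}` reports a type mismatch.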
3. Circuit Breakers
Prevent cascading failures:
- Closed: Normal operation
- Open: Tool disabled after N failures
- Half-Open: Test recovery
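The three states can be modelled as a small state machine. The sketch below is one common formulation, not Agent-Gantry's implementation: the `CircuitBreaker` class, its parameter names, and the default thresholds are assumptions, and the injectable `clock` exists only to make the behaviour easy to test.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 3,
        recovery_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half-open"  # cooldown elapsed: allow a trial call
        return "open"

    def allow_request(self) -> bool:
        """Calls are rejected only while the breaker is open."""
        return self.state != "open"

    def record_success(self) -> None:
        """A successful call resets the breaker to closed."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        if self.state == "half-open":
            self._opened_at = self._clock()  # trial call failed: reopen
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

An executor would call `allow_request()` before running a tool and then report the outcome with `record_success()` or `record_failure()`.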
4. Sandboxing (Future)
Execute untrusted tools in isolated environments.
Next Steps
- Best Practices - Production deployment patterns
- API Reference - Detailed API documentation
- Configuration - Configuration options