Architecture Overview
TL;DR: Igris Overture is a high-performance routing platform that sits between your app and LLM providers, intelligently selecting the best provider for each request.
How It Works
Your Application
↓
Igris Overture API
↓
Intelligent Router
├─ Thompson Sampling (learns best provider)
├─ Semantic Routing (matches task to provider)
├─ Cost Optimization (cheapest quality option)
└─ Automatic Failover (if provider fails)
↓
Multiple LLM Providers
├─ OpenAI
├─ Anthropic
├─ Google Gemini
├─ DeepSeek
└─ Mistral
What Happens to Your Request
1. Your app makes a request to Igris Overture (same format as the OpenAI API)
2. The router analyzes the request and selects the best provider
3. The request goes to the selected provider (OpenAI, Anthropic, etc.)
4. The response comes back with metadata about cost, latency, and routing
5. Metrics are tracked for future optimization
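Because the request format matches OpenAI's, you can point an existing OpenAI client at Igris Overture and the whole lifecycle above happens behind one call. A minimal sketch in Python, assuming a hypothetical base URL (`https://api.igris-overture.example/v1`), an `IGRIS_API_KEY` environment variable, and a `model="auto"` routing alias, none of which are confirmed by this page:

```python
import os
from openai import OpenAI  # pip install openai

# Point the standard OpenAI client at Igris Overture instead of api.openai.com.
# The base URL and env var name below are illustrative placeholders.
client = OpenAI(
    base_url="https://api.igris-overture.example/v1",
    api_key=os.environ["IGRIS_API_KEY"],
)

response = client.chat.completions.create(
    model="auto",  # hypothetical alias: let the router pick provider and model
    messages=[{"role": "user", "content": "Summarize this release note."}],
)
print(response.choices[0].message.content)
# Routing metadata (cost, latency, chosen provider) comes back alongside the
# response; the exact field or header names are not specified on this page.
```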
Key Components
Intelligent Router
The brain of Igris Overture. It uses multiple algorithms to select the right provider for each request:
- Thompson Sampling: Learns which provider performs best for your workload
- Semantic Routing: Matches request types to optimal providers (e.g., code → DeepSeek, creative → Claude)
- Cost-Aware: Balances quality and cost to minimize spend
- Circuit Breakers: Detects provider failures and routes around them
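To make the Thompson Sampling step concrete, here is a minimal sketch of the general bandit technique: each provider gets a Beta distribution over its success rate, the router samples from each distribution and routes to the winner, and request outcomes update the counts. This illustrates the algorithm in general, not Igris Overture's actual implementation:

```python
import random

class ThompsonRouter:
    """Per-provider Beta(successes + 1, failures + 1) bandit."""

    def __init__(self, providers):
        self.stats = {p: {"success": 0, "failure": 0} for p in providers}

    def pick(self):
        # Sample a plausible success rate for each provider; route to the max.
        draws = {
            p: random.betavariate(s["success"] + 1, s["failure"] + 1)
            for p, s in self.stats.items()
        }
        return max(draws, key=draws.get)

    def record(self, provider, ok):
        self.stats[provider]["success" if ok else "failure"] += 1

router = ThompsonRouter(["openai", "anthropic", "deepseek"])
choice = router.pick()          # noisy early on, so traffic spreads out
router.record(choice, ok=True)  # feed back the outcome of the request
```

Early on the samples are noisy, so traffic spreads across providers (exploration); as evidence accumulates, the sampled rates concentrate and traffic shifts to the consistent winner (exploitation).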
Provider Integrations
Direct integrations with major LLM providers:
- OpenAI (GPT-4, GPT-3.5, GPT-4 Turbo)
- Anthropic (Claude 3 family)
- Google (Gemini Pro, Gemini Ultra)
- DeepSeek (Coder, Chat)
- Mistral (Large, Medium)
- Custom providers via API
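The registration schema for custom providers isn't shown on this page. As a rough sketch, a custom OpenAI-compatible endpoint might be described with an entry like the following (every field name here is hypothetical):

```python
# Hypothetical shape of a custom provider entry; the real schema may differ.
custom_provider = {
    "name": "in-house-llm",
    "base_url": "https://llm.internal.example/v1",  # OpenAI-compatible endpoint
    "api_key_env": "IN_HOUSE_LLM_KEY",
    "models": ["in-house-7b", "in-house-70b"],
    "capabilities": ["chat", "code"],  # hints for semantic routing
}
```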
Observability
Complete visibility into every request:
- Real-time cost tracking
- Latency monitoring
- Provider performance metrics
- Distributed tracing with correlation IDs
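For example, a correlation ID can be attached per request and routing metadata read back from the response headers. The header names below are illustrative placeholders, not documented names; the client setup mirrors the earlier sketch:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.igris-overture.example/v1",  # placeholder URL
    api_key=os.environ["IGRIS_API_KEY"],               # placeholder env var
)

# with_raw_response exposes HTTP headers alongside the parsed completion.
raw = client.chat.completions.with_raw_response.create(
    model="auto",  # hypothetical routing alias
    messages=[{"role": "user", "content": "ping"}],
    extra_headers={"X-Correlation-ID": "req-abc123"},  # trace this request end-to-end
)
completion = raw.parse()
print(raw.headers.get("X-Igris-Provider"))  # hypothetical: provider chosen
print(raw.headers.get("X-Igris-Cost-Usd"))  # hypothetical: per-request cost
```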
Advanced Features
Speculative Execution
Race multiple providers in parallel and use the fastest response:
- Up to 60% faster time-to-first-token
- Automatic cancellation of slower providers
- Cost-aware (stops if waste exceeds a threshold)
- Available on Growth+ tiers
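The racing pattern itself is easy to sketch: start all candidate calls concurrently, keep the first response, and cancel the rest. The sketch below stubs out the provider calls and shows only the control flow, not the platform's implementation:

```python
import asyncio

async def call_provider(name: str, delay: float) -> str:
    # Stand-in for a real provider call; delay simulates time-to-first-token.
    await asyncio.sleep(delay)
    return f"{name}: response"

async def race(providers):
    tasks = {asyncio.create_task(call_provider(n, d)) for n, d in providers}
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:  # cancel the slower providers to cap wasted spend
        task.cancel()
    return next(iter(done)).result()

print(asyncio.run(race([("openai", 0.8), ("anthropic", 0.3), ("mistral", 0.5)])))
```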
Council Mode
Send the same request to multiple providers and compare their responses:
- Detect hallucinations via cross-validation
- Get consensus answers for critical queries
- Quality scoring and best response selection
- Available on Scale tier
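A toy version of the consensus step: collect one answer per provider, take the majority answer, and report the agreement level. Real cross-validation would compare responses semantically (embeddings or a judge model); exact string matching here is just the simplest possible stand-in:

```python
from collections import Counter

def consensus(responses: dict[str, str]) -> tuple[str, float]:
    """Return the most common answer and the fraction of providers that agree."""
    counts = Counter(responses.values())
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(responses)

answers = {"openai": "42", "anthropic": "42", "deepseek": "41"}
best, agreement = consensus(answers)
print(best, agreement)  # "42" with 2/3 agreement; low agreement flags risk
```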
Multi-Tenancy
Full tenant isolation with BYOK (Bring Your Own Key):
- Each tenant uses their own provider API keys
- Per-tenant budgets and rate limits
- Usage tracking and billing
- Complete data isolation
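Conceptually, each request resolves to its tenant's own keys and limits before it is routed. A sketch of what a tenant record might hold (field names are hypothetical, not the actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Tenant:
    """Illustrative shape of per-tenant state; field names are hypothetical."""
    tenant_id: str
    provider_keys: dict = field(default_factory=dict)  # BYOK: tenant-owned keys
    monthly_budget_usd: float = 100.0
    requests_per_minute: int = 60

acme = Tenant(
    tenant_id="acme",
    provider_keys={"openai": "sk-...", "anthropic": "sk-ant-..."},
    monthly_budget_usd=500.0,
)
# The router resolves keys, budget, and rate limits from the tenant record on
# every request, so one tenant's keys, spend, and data never touch another's.
```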
Deployment Options
Cloud Hosted
We run it for you. Zero ops overhead.
- Fully managed infrastructure
- Automatic scaling
- 99.9% uptime SLA (Scale tier)
- Global edge locations
Self-Hosted
Run in your own infrastructure (Scale tier):
- Full control over deployment
- Deploy to AWS, GCP, Azure, or on-prem
- Bring your own observability stack
- Custom compliance requirements
Technology Highlights
Built for performance and reliability:
- High-throughput HTTP: Handles 10k+ requests per second
- Sub-10ms routing overhead: Minimal latency added
- Production-ready: Circuit breakers, retries, timeouts
- Secure: TLS encryption, API key auth, tenant isolation
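Circuit breakers come up twice above (routing and reliability), and the core pattern is small: after a run of consecutive failures the breaker opens and traffic routes around the provider; after a cooldown, requests are allowed through again to probe for recovery. A minimal sketch of the standard technique, not the platform's actual code:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; probe again after `cooldown` seconds."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, 0.0

    def available(self) -> bool:
        if self.failures < self.threshold:
            return True  # closed: traffic flows normally
        # open: allow traffic again only once the cooldown has elapsed
        return time.monotonic() - self.opened_at >= self.cooldown

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0  # a success closes the circuit
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip (or re-trip) the breaker

breaker = CircuitBreaker()
if breaker.available():
    pass  # call the provider, then breaker.record(ok=...) with the outcome
```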
Next Steps
- Quick Start - Get started in 5 minutes
- Routing Policies - Configure intelligent routing
- Observability - Monitor your requests
- Multi-Tenancy - Set up tenant isolation
Learn More
Want to dive deeper into the internals? Check out our GitHub repository for architecture diagrams, design docs, and implementation details.