Architecture Overview

TL;DR: Igris Overture is a high-performance routing platform that sits between your app and LLM providers, intelligently selecting the best provider for each request.

How It Works

Your Application
      ↓
Igris Overture API
      ↓
   Intelligent Router
   ├─ Thompson Sampling (learns best provider)
   ├─ Semantic Routing (matches task to provider)
   ├─ Cost Optimization (cheapest quality option)
   └─ Automatic Failover (if provider fails)
      ↓
Multiple LLM Providers
├─ OpenAI
├─ Anthropic
├─ Google Gemini
├─ DeepSeek
└─ Mistral

What Happens to Your Request

  1. Your app makes a request to Igris Overture using the same request format as the OpenAI API (see the example after this list)
  2. Router analyzes the request and selects the best provider
  3. Request goes to the selected provider (OpenAI, Anthropic, etc.)
  4. Response comes back with metadata about cost, latency, and routing
  5. Metrics are tracked for future optimization
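Here is what step 1 can look like in practice, as a minimal sketch using the OpenAI Python SDK. The base URL and the "auto" model alias below are illustrative assumptions, not documented values:

    # Minimal sketch: pointing the OpenAI Python SDK at Igris Overture.
    # The base URL and model alias are hypothetical; substitute the real values.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.igris-overture.example/v1",  # hypothetical endpoint
        api_key="YOUR_IGRIS_API_KEY",
    )

    response = client.chat.completions.create(
        model="auto",  # assumed alias that lets the router choose a provider
        messages=[{"role": "user", "content": "Summarize this ticket in one line."}],
    )
    print(response.choices[0].message.content)

Because the request format is OpenAI-compatible, switching an existing app over is typically just a base-URL and API-key change.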

Key Components

Intelligent Router

The brain of Igris Overture. It combines several algorithms to pick the right provider for each request:

  • Thompson Sampling: Learns which provider performs best for your workload (sketched after this list)
  • Semantic Routing: Matches request types to optimal providers (e.g., code → DeepSeek, creative → Claude)
  • Cost-Aware: Balances quality and cost to minimize spend
  • Circuit Breakers: Detects provider failures and routes around them
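To make the Thompson Sampling idea concrete, here is a toy Bernoulli-bandit version (an illustration, not Igris Overture's actual implementation). Each provider keeps a Beta posterior over its success rate; the router samples one value from each posterior and routes to the highest draw:

    # Toy Thompson Sampling over providers (illustrative only).
    # Each provider tracks (successes, failures); the posterior over its
    # success rate is Beta(successes + 1, failures + 1).
    import random

    class ThompsonRouter:
        def __init__(self, providers):
            self.stats = {p: [0, 0] for p in providers}  # [successes, failures]

        def pick(self):
            def draw(provider):
                s, f = self.stats[provider]
                return random.betavariate(s + 1, f + 1)
            return max(self.stats, key=draw)  # highest sampled success rate wins

        def record(self, provider, success):
            self.stats[provider][0 if success else 1] += 1

    router = ThompsonRouter(["openai", "anthropic", "deepseek"])
    choice = router.pick()
    router.record(choice, success=True)  # feed the outcome back into the posterior

Providers that keep succeeding are drawn more often, while under-sampled providers still receive occasional exploratory traffic, which is how the router keeps learning as workloads shift.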

Provider Integrations

Direct integrations with major LLM providers:

  • OpenAI (GPT-4, GPT-3.5, GPT-4 Turbo)
  • Anthropic (Claude 3 family)
  • Google (Gemini Pro, Gemini Ultra)
  • DeepSeek (Coder, Chat)
  • Mistral (Large, Medium)
  • Custom providers via API

Observability

Complete visibility into every request:

  • Real-time cost tracking
  • Latency monitoring
  • Provider performance metrics
  • Distributed tracing with correlation IDs (see the sketch after this list)
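As a sketch of how that metadata might be surfaced, the snippet below reads routing information off a raw HTTP response. Every header name here is an assumption for illustration; consult the API reference for the actual fields:

    # Hypothetical sketch: inspecting per-request routing metadata.
    # Header names are assumed, not documented.
    import requests

    resp = requests.post(
        "https://api.igris-overture.example/v1/chat/completions",  # hypothetical URL
        headers={"Authorization": "Bearer YOUR_IGRIS_API_KEY"},
        json={"model": "auto", "messages": [{"role": "user", "content": "ping"}]},
    )

    print(resp.headers.get("x-correlation-id"))  # tracing ID (assumed header name)
    print(resp.headers.get("x-provider"))        # provider that served the call (assumed)
    print(resp.headers.get("x-cost-usd"))        # per-request cost (assumed)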

Advanced Features

Speculative Execution

Race multiple providers in parallel and use the fastest response (illustrated below):

  • 60% faster time-to-first-token
  • Automatic cancellation of slower providers
  • Cost-aware (stops if waste exceeds threshold)
  • Available on Growth+ tiers
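The racing pattern itself is straightforward; here is a toy illustration with asyncio, where call_provider stands in for real provider clients (a production version must also abort the underlying HTTP calls and account for wasted spend):

    # Toy speculative execution: start several calls, keep the first to finish,
    # cancel the rest.
    import asyncio

    async def call_provider(name, latency):
        await asyncio.sleep(latency)  # stand-in for a real provider API call
        return f"{name}: response"

    async def race():
        tasks = [
            asyncio.create_task(call_provider("openai", 0.8)),
            asyncio.create_task(call_provider("anthropic", 0.5)),
        ]
        done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()  # stop paying for the slower providers
        return done.pop().result()

    print(asyncio.run(race()))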

Council Mode

Send requests to multiple providers and compare responses (sketched below):

  • Detect hallucinations via cross-validation
  • Get consensus answers for critical queries
  • Quality scoring and best response selection
  • Available on Scale tier
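As a simplified sketch of the cross-validation idea, the snippet below takes a majority vote over provider answers; real quality scoring is richer than an exact string match:

    # Toy council mode: ask several providers, take the majority answer.
    from collections import Counter

    answers = {
        "openai": "Paris",
        "anthropic": "Paris",
        "deepseek": "Lyon",
    }

    consensus, votes = Counter(answers.values()).most_common(1)[0]
    if votes > len(answers) / 2:
        print(f"Consensus: {consensus} ({votes}/{len(answers)} providers agree)")
    else:
        print("No consensus; flag for review")  # possible hallucination signal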

Multi-Tenancy

Full tenant isolation with BYOK (Bring Your Own Key); a hypothetical flow is sketched after this list:

  • Each tenant uses their own provider API keys
  • Per-tenant budgets and rate limits
  • Usage tracking and billing
  • Complete data isolation
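A hypothetical sketch of the BYOK flow is below; every route, header, and field name is an assumption for illustration, not the documented API:

    # Hypothetical BYOK flow: register a tenant's own provider key once,
    # then tag that tenant's traffic so it bills against those keys.
    import requests

    BASE = "https://api.igris-overture.example/v1"  # hypothetical endpoint

    # One-time setup: store the tenant's own OpenAI key (assumed admin route).
    requests.post(
        f"{BASE}/tenants/acme/keys",
        headers={"Authorization": "Bearer ADMIN_KEY"},
        json={"provider": "openai", "api_key": "sk-PLACEHOLDER"},
    )

    # Per request: traffic tagged with the tenant ID counts toward acme's
    # budget and rate limits and uses acme's provider keys.
    requests.post(
        f"{BASE}/chat/completions",
        headers={"Authorization": "Bearer ACME_TENANT_KEY", "x-tenant-id": "acme"},
        json={"model": "auto", "messages": [{"role": "user", "content": "hello"}]},
    )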

Deployment Options

Cloud Hosted

We run it for you. Zero ops overhead.

  • Fully managed infrastructure
  • Automatic scaling
  • 99.9% uptime SLA (Scale tier)
  • Global edge locations

Self-Hosted

Run in your own infrastructure (Scale tier):

  • Full control over deployment
  • Deploy to AWS, GCP, Azure, or on-prem
  • Bring your own observability stack
  • Custom compliance requirements

Technology Highlights

Built for performance and reliability:

  • High-throughput HTTP: Handles 10k+ requests per second
  • Sub-10ms routing overhead: Minimal latency added
  • Production-ready: Circuit breakers, retries, timeouts (circuit breaker sketched below)
  • Secure: TLS encryption, API key auth, tenant isolation
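To show the circuit-breaker pattern named above, here is an illustrative version (not Igris Overture's implementation): after a run of consecutive failures a provider is skipped for a cooldown window, then given a single trial request:

    # Illustrative circuit breaker: open after max_failures consecutive
    # failures, skip the provider during cooldown, then retry (half-open).
    import time

    class CircuitBreaker:
        def __init__(self, max_failures=3, cooldown=30.0):
            self.max_failures = max_failures
            self.cooldown = cooldown
            self.failures = 0
            self.opened_at = None

        def available(self):
            if self.opened_at is None:
                return True
            if time.monotonic() - self.opened_at >= self.cooldown:
                self.opened_at = None  # half-open: allow a trial request
                self.failures = 0
                return True
            return False

        def record(self, success):
            if success:
                self.failures = 0
                self.opened_at = None
            else:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.monotonic()  # open the circuit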

Learn More

Want to dive deeper into the internals? Check out our GitHub repository for architecture diagrams, design docs, and implementation details.

View on GitHub →