Architecture Overview

TL;DR: Igris Overture is a high-performance routing platform that sits between your app and LLM providers, intelligently selecting the best provider for each request.

How It Works

Your Application
      ↓
Igris Overture API
      ↓
   Intelligent Router
   ├─ Thompson Sampling (learns best provider)
   ├─ Semantic Routing (matches task to provider)
   ├─ Cost Optimization (cheapest quality option)
   └─ Automatic Failover (if provider fails)
      ↓
Multiple LLM Providers
├─ OpenAI
├─ Anthropic
├─ Google Gemini
├─ DeepSeek
└─ Mistral

What Happens to Your Request

  1. Your app makes a request to Igris Overture using the same request format as the OpenAI API (see the example after this list)
  2. Router analyzes the request and selects the best provider
  3. Request goes to the selected provider (OpenAI, Anthropic, etc.)
  4. Response comes back with metadata about cost, latency, and routing
  5. Metrics are tracked for future optimization
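Here is what step 1 can look like in practice, as a minimal sketch using the OpenAI Python SDK. The base URL and the "auto" model alias below are illustrative assumptions, not documented values:

    # Minimal sketch: pointing the OpenAI Python SDK at Igris Overture.
    # The base URL and model alias are hypothetical; substitute the real values.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.igris-overture.example/v1",  # hypothetical endpoint
        api_key="YOUR_IGRIS_API_KEY",
    )

    response = client.chat.completions.create(
        model="auto",  # assumed alias that lets the router choose a provider
        messages=[{"role": "user", "content": "Summarize this ticket in one line."}],
    )
    print(response.choices[0].message.content)

Because the request format is OpenAI-compatible, switching an existing app over is typically just a base-URL and API-key change.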

Key Components

Intelligent Router

The brain of Igris Overture. It combines several algorithms to pick the right provider for each request:

  • Thompson Sampling: Learns which provider performs best for your workload (sketched after this list)
  • Semantic Routing: Matches request types to optimal providers (e.g., code → DeepSeek, creative → Claude)
  • Cost-Aware: Balances quality and cost to minimize spend
  • Circuit Breakers: Detects provider failures and routes around them
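To make the Thompson Sampling idea concrete, here is a toy Bernoulli-bandit version (an illustration, not Igris Overture's actual implementation). Each provider keeps a Beta posterior over its success rate; the router samples one value from each posterior and routes to the highest draw:

    # Toy Thompson Sampling over providers (illustrative only).
    # Each provider tracks (successes, failures); the posterior over its
    # success rate is Beta(successes + 1, failures + 1).
    import random

    class ThompsonRouter:
        def __init__(self, providers):
            self.stats = {p: [0, 0] for p in providers}  # [successes, failures]

        def pick(self):
            def draw(provider):
                s, f = self.stats[provider]
                return random.betavariate(s + 1, f + 1)
            return max(self.stats, key=draw)  # highest sampled success rate wins

        def record(self, provider, success):
            self.stats[provider][0 if success else 1] += 1

    router = ThompsonRouter(["openai", "anthropic", "deepseek"])
    choice = router.pick()
    router.record(choice, success=True)  # feed the outcome back into the posterior

Providers that keep succeeding are drawn more often, while under-sampled providers still receive occasional exploratory traffic, which is how the router keeps learning as workloads shift.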

Provider Integrations

Direct integrations with major LLM providers:

  • OpenAI (GPT-4, GPT-3.5, GPT-4 Turbo)
  • Anthropic (Claude 3 family)
  • Google (Gemini Pro, Gemini Ultra)
  • DeepSeek (Coder, Chat)
  • Mistral (Large, Medium)
  • Custom providers via API

Observability

Complete visibility into every request:

  • Real-time cost tracking
  • Latency monitoring
  • Provider performance metrics
  • Distributed tracing with correlation IDs (see the sketch after this list)
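As a sketch of how that metadata might be surfaced, the snippet below reads routing information off a raw HTTP response. Every header name here is an assumption for illustration; consult the API reference for the actual fields:

    # Hypothetical sketch: inspecting per-request routing metadata.
    # Header names are assumed, not documented.
    import requests

    resp = requests.post(
        "https://api.igris-overture.example/v1/chat/completions",  # hypothetical URL
        headers={"Authorization": "Bearer YOUR_IGRIS_API_KEY"},
        json={"model": "auto", "messages": [{"role": "user", "content": "ping"}]},
    )

    print(resp.headers.get("x-correlation-id"))  # tracing ID (assumed header name)
    print(resp.headers.get("x-provider"))        # provider that served the call (assumed)
    print(resp.headers.get("x-cost-usd"))        # per-request cost (assumed)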

Advanced Features

Speculative Execution

Race multiple providers in parallel and use the fastest response (illustrated below):

  • 60% faster time-to-first-token
  • Automatic cancellation of slower providers
  • Cost-aware (stops if waste exceeds threshold)
  • Available on Growth+ tiers
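The racing pattern itself is straightforward; here is a toy illustration with asyncio, where call_provider stands in for real provider clients (a production version must also abort the underlying HTTP calls and account for wasted spend):

    # Toy speculative execution: start several calls, keep the first to finish,
    # cancel the rest.
    import asyncio

    async def call_provider(name, latency):
        await asyncio.sleep(latency)  # stand-in for a real provider API call
        return f"{name}: response"

    async def race():
        tasks = [
            asyncio.create_task(call_provider("openai", 0.8)),
            asyncio.create_task(call_provider("anthropic", 0.5)),
        ]
        done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()  # stop paying for the slower providers
        return done.pop().result()

    print(asyncio.run(race()))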

Council Mode

Send requests to multiple providers and compare responses (sketched below):

  • Detect hallucinations via cross-validation
  • Get consensus answers for critical queries
  • Quality scoring and best response selection
  • Available on Scale tier
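As a simplified sketch of the cross-validation idea, the snippet below takes a majority vote over provider answers; real quality scoring is richer than an exact string match:

    # Toy council mode: ask several providers, take the majority answer.
    from collections import Counter

    answers = {
        "openai": "Paris",
        "anthropic": "Paris",
        "deepseek": "Lyon",
    }

    consensus, votes = Counter(answers.values()).most_common(1)[0]
    if votes > len(answers) / 2:
        print(f"Consensus: {consensus} ({votes}/{len(answers)} providers agree)")
    else:
        print("No consensus; flag for review")  # possible hallucination signal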

Multi-Tenancy

Full tenant isolation with BYOK (Bring Your Own Key); a hypothetical flow is sketched after this list:

  • Each tenant uses their own provider API keys
  • Per-tenant budgets and rate limits
  • Usage tracking and billing
  • Complete data isolation
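A hypothetical sketch of the BYOK flow is below; every route, header, and field name is an assumption for illustration, not the documented API:

    # Hypothetical BYOK flow: register a tenant's own provider key once,
    # then tag that tenant's traffic so it bills against those keys.
    import requests

    BASE = "https://api.igris-overture.example/v1"  # hypothetical endpoint

    # One-time setup: store the tenant's own OpenAI key (assumed admin route).
    requests.post(
        f"{BASE}/tenants/acme/keys",
        headers={"Authorization": "Bearer ADMIN_KEY"},
        json={"provider": "openai", "api_key": "sk-PLACEHOLDER"},
    )

    # Per request: traffic tagged with the tenant ID counts toward acme's
    # budget and rate limits and uses acme's provider keys.
    requests.post(
        f"{BASE}/chat/completions",
        headers={"Authorization": "Bearer ACME_TENANT_KEY", "x-tenant-id": "acme"},
        json={"model": "auto", "messages": [{"role": "user", "content": "hello"}]},
    )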

Deployment Options

Cloud Hosted

We run it for you. Zero ops overhead.

  • Fully managed infrastructure
  • Automatic scaling
  • 99.9% uptime SLA (Scale tier)
  • Global edge locations

Self-Hosted

Run in your own infrastructure (Scale tier):

  • Full control over deployment
  • Deploy to AWS, GCP, Azure, or on-prem
  • Bring your own observability stack
  • Custom compliance requirements

Technology Highlights

Built for performance and reliability:

  • High-throughput HTTP: Handles 10k+ requests per second
  • Sub-10ms routing overhead: Minimal latency added
  • Production-ready: Circuit breakers, retries, timeouts (circuit breaker sketched below)
  • Secure: TLS encryption, API key auth, tenant isolation
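To show the circuit-breaker pattern named above, here is an illustrative version (not Igris Overture's implementation): after a run of consecutive failures a provider is skipped for a cooldown window, then given a single trial request:

    # Illustrative circuit breaker: open after max_failures consecutive
    # failures, skip the provider during cooldown, then retry (half-open).
    import time

    class CircuitBreaker:
        def __init__(self, max_failures=3, cooldown=30.0):
            self.max_failures = max_failures
            self.cooldown = cooldown
            self.failures = 0
            self.opened_at = None

        def available(self):
            if self.opened_at is None:
                return True
            if time.monotonic() - self.opened_at >= self.cooldown:
                self.opened_at = None  # half-open: allow a trial request
                self.failures = 0
                return True
            return False

        def record(self, success):
            if success:
                self.failures = 0
                self.opened_at = None
            else:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.monotonic()  # open the circuit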

Learn More

Want to dive deeper into the internals? Check out our GitHub repository for architecture diagrams, design docs, and implementation details.

View on GitHub →