Introduction
TL;DR: Igris Overture is a decision intelligence and routing control plane for AI workloads. It decides what to route, where to send it, and why — with trust-aware provider selection, cost/quality/latency optimization, and explainable decisions.
The Problem
You're using OpenAI. Then GPT-4 goes down. Or gets rate-limited. Or you realize you're burning $10k/month on requests that could run on cheaper models.
You add Anthropic as a backup. Now you're managing two APIs, two auth flows, two error handlers. Then you want to try Google Gemini for certain tasks...
Every provider you add multiplies this complexity. Igris Overture automates it away.
What Igris Overture Does
Overture is the decision intelligence layer that determines what to route, where to send it, and why. One API call becomes an explainable routing decision backed by trust verification and observability.
# Instead of this mess:
try:
    openai_response = openai.chat.completions.create(...)
except OpenAIError:
    try:
        anthropic_response = anthropic.messages.create(...)
    except AnthropicError:
        ...  # Give up or try another provider

# Do this:
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
# Igris Overture handles routing, failover, and optimization
What You Get
Drop-in Replacement:
- OpenAI-compatible API
- Change one URL, done
- Use any existing OpenAI SDK
Decision Intelligence:
- Thompson Sampling with cold-start protection
- Trust-aware provider selection (observed vs reported verification)
- Cost, quality, and latency optimization
- Explainable routing traces
Reliability:
- Circuit breaker and automatic failover
- Provider health monitoring
- Automatic retries with exponential backoff
Observability:
- 150+ metrics and comprehensive monitoring
- Per-request cost tracking
- Routing decision traces with rejection reasons
- Distributed tracing with correlation IDs
Quick Example
Before: Managing Multiple Providers
# Manually manage multiple providers
import os

import openai
import anthropic
import google.generativeai as genai

# Configure all providers
openai.api_key = os.getenv("OPENAI_API_KEY")
anthropic_client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Try OpenAI first
try:
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
except Exception:
    # Fall back to Anthropic
    try:
        response = anthropic_client.messages.create(
            model="claude-3-opus-20240229",
            max_tokens=1024,  # required by the Anthropic API
            messages=[{"role": "user", "content": prompt}],
        )
    except Exception:
        # Fall back to Google...
        response = genai.GenerativeModel("gemini-pro").generate_content(prompt)

# Manual cost tracking, no optimization, fragile error handling
After: One API, Automatic Routing
from openai import OpenAI

# Point to Igris Overture
client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="your-api-key",
)

# Make your request - Igris Overture handles everything
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
# Automatic routing, failover, cost optimization, and tracking

# Response includes metadata about the routing decision
print(response.metadata)
# {
#   "provider": "anthropic",
#   "routing_decision": "thompson-sampling",
#   "cost_usd": 0.00034,
#   "latency_ms": 187
# }
Result:
- One API instead of three
- Automatic failover (no try/except hell)
- Cost optimization (Thompson Sampling learns best provider)
- Complete observability (every request tracked)
Core Features
1. Thompson Sampling Routing
Bayesian multi-armed bandit algorithm that learns which provider is best for your workload.
How it works:
- Tracks success rate, latency, and cost for each provider
- Balances exploration (trying new providers) with exploitation (using the best one)
- Adapts in real-time as provider performance changes
Result: 30-40% cost reduction by automatically selecting cheaper providers when quality is equivalent.
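To make the mechanism concrete, here is a minimal sketch of Thompson Sampling over providers, using a Beta posterior on each provider's success rate. The provider names, priors, and binary reward are illustrative assumptions, not Overture's internals (which also factor in latency, cost, and cold-start protection).

import random

# Each provider keeps a Beta(successes + 1, failures + 1) posterior over its
# success rate. The +1/+1 uniform prior is a simple stand-in for cold-start
# handling.
stats = {
    "openai":    {"successes": 0, "failures": 0},
    "anthropic": {"successes": 0, "failures": 0},
    "google":    {"successes": 0, "failures": 0},
}

def pick_provider() -> str:
    # Sampling from each posterior (rather than taking the mean) is what
    # balances exploration of uncertain providers with exploitation of
    # known-good ones.
    draws = {
        name: random.betavariate(s["successes"] + 1, s["failures"] + 1)
        for name, s in stats.items()
    }
    return max(draws, key=draws.get)

def record_result(provider: str, success: bool) -> None:
    stats[provider]["successes" if success else "failures"] += 1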
2. Automatic Failover
# Primary provider fails
curl -X POST https://api.igrisinertial.com/v1/infer \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# Response shows automatic failover
{
  "choices": [...],
  "metadata": {
    "provider": "anthropic",       # Fallback provider used
    "primary_provider": "openai",  # Primary was unavailable
    "fallback": true,
    "latency_ms": 234
  }
}
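This failover is driven by the circuit breaker mentioned under Reliability: after repeated failures, a provider is skipped entirely until a cooldown elapses, rather than paying a timeout on every request. Below is a minimal sketch of the pattern, with thresholds chosen purely for illustration.

import time

class CircuitBreaker:
    """Illustrative circuit breaker: trip after N consecutive failures,
    then skip the provider until a cooldown elapses."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # monotonic time when the breaker tripped

    def allow(self) -> bool:
        # Closed, or open long enough to let a probe request through.
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()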
3. Semantic Routing
ML-powered classifier routes requests based on intent:
- Code generation → DeepSeek Coder
- Creative writing → Claude 3
- Math/reasoning → GPT-4
- General Q&A → GPT-3.5 Turbo
Result: Right provider for the task = better quality + lower cost.
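As a rough picture of how such a router fits together: classify intent, then look up the route. The keyword classifier below is a stand-in for Overture's ML model, and the route table simply mirrors the mapping above.

# Intent -> model table mirroring the mapping above.
ROUTES = {
    "code":     "deepseek-coder",
    "creative": "claude-3",
    "math":     "gpt-4",
    "general":  "gpt-3.5-turbo",
}

def classify_intent(prompt: str) -> str:
    # Stand-in for the ML classifier; keyword matching is for illustration
    # only.
    lowered = prompt.lower()
    if any(w in lowered for w in ("def ", "function", "compile")):
        return "code"
    if any(w in lowered for w in ("story", "poem")):
        return "creative"
    if any(w in lowered for w in ("prove", "solve", "equation")):
        return "math"
    return "general"

def route(prompt: str) -> str:
    return ROUTES[classify_intent(prompt)]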
4. Real-Time Cost Tracking
Every request returns cost breakdown:
{
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 42,
    "total_tokens": 57
  },
  "metadata": {
    "cost_usd": 0.00171,
    "provider": "openai",
    "model": "gpt-4"
  }
}
Track spending per tenant, per provider, per model in real-time.
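The arithmetic behind cost_usd is just token counts times per-model rates. The 0.00171 above is consistent with a flat $0.03 per 1K tokens across all 57 tokens; the rates in this sketch are illustrative, so check current provider pricing.

def cost_usd(prompt_tokens: int, completion_tokens: int,
             prompt_rate_per_1k: float, completion_rate_per_1k: float) -> float:
    """Per-request cost from token usage; rates are USD per 1K tokens."""
    return (prompt_tokens / 1000) * prompt_rate_per_1k \
         + (completion_tokens / 1000) * completion_rate_per_1k

# Reproduces the response above with an illustrative flat $0.03/1K rate:
# (15 + 42) tokens = 57 tokens -> 57/1000 * 0.03 = 0.00171
print(round(cost_usd(15, 42, 0.03, 0.03), 5))  # 0.00171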
Production Features
Multi-Tenancy with BYOK
Each tenant brings their own provider API keys:
- Encrypted storage
- Per-tenant routing policies
- Complete isolation (no cross-tenant data leakage)
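A rough sketch of what BYOK storage implies: each tenant's provider keys are encrypted at rest and resolved per request, scoped by tenant. This uses the cryptography package's Fernet purely for illustration; it is not Overture's actual key-management implementation.

from cryptography.fernet import Fernet  # pip install cryptography

# Illustrative only: encrypt each tenant's provider key at rest.
master = Fernet(Fernet.generate_key())  # in practice, from a KMS/secret store
vault = {}                              # (tenant_id, provider) -> ciphertext

def store_key(tenant_id: str, provider: str, api_key: str) -> None:
    vault[(tenant_id, provider)] = master.encrypt(api_key.encode())

def resolve_key(tenant_id: str, provider: str) -> str:
    # Lookups are scoped by tenant_id, so keys never cross tenants.
    return master.decrypt(vault[(tenant_id, provider)]).decode()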
Budget Enforcement
Set monthly budgets per tenant:
- Soft limit warning at 90%
- Hard limit block at 100% (HTTP 402 Payment Required)
- Real-time spend tracking
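The enforcement itself is a threshold check against real-time spend. A minimal sketch matching the limits above (warning at 90%, HTTP 402 at 100%):

def check_budget(spent_usd: float, budget_usd: float) -> tuple:
    """Return (http_status, note) for a request given current monthly spend."""
    if spent_usd >= budget_usd:
        return 402, "hard limit reached: request blocked"  # Payment Required
    if spent_usd >= 0.9 * budget_usd:
        return 200, "soft limit warning: over 90% of monthly budget"
    return 200, "ok"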
Observability
Comprehensive metrics:
- Request success rate, latency, and throughput
- Cost tracking by provider and model
- Routing decision insights
Distributed tracing:
- Every request gets a trace ID
- Correlate across providers, retries, failovers
Audit logs:
- Who made what request when
- Policy changes tracked
- Cost anomaly alerts
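On the client side, plugging into this tracing usually means sending a correlation ID and logging it with your own records. The X-Correlation-ID header name below is hypothetical (check the API Reference for the actual header); extra_headers is a standard argument in the OpenAI Python SDK.

import uuid
from openai import OpenAI

client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="your-api-key",
)

# Hypothetical header name - check the API Reference for the real one.
correlation_id = str(uuid.uuid4())
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Correlation-ID": correlation_id},
)
# Log the ID alongside your own records to correlate this request across
# providers, retries, and failovers.
print(correlation_id)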
Deployment Options
Cloud Hosted (All Tiers)
We run it for you. Zero ops overhead.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="your-api-key",
)
Self-Hosted (Scale Tier)
Run in your own infrastructure with full control:
- Deploy to AWS, GCP, Azure, or on-premises
- Bring your own monitoring and observability stack
- Custom compliance and security requirements
- Full access to source code
Getting Started
5-Minute Quick Start
1. Start a free trial (no credit card)
2. Point your OpenAI SDK to Igris Overture
3. Make requests - automatic routing enabled
4. Check the dashboard - see cost savings and routing decisions
Next Steps
- Quick Start Guide - Set up in 5 minutes
- SDK Usage - Go, Python, OpenAI-compatible
- API Reference - Complete API docs
Support
- Docs: github.com/Igris-inertial/docs
- GitHub: github.com/Igris-inertial/system
- Discord: Join community
- Email: support@igrisinertial.com