Introduction
TL;DR: Igris Overture is a decision intelligence and routing control plane for AI workloads. It decides what to route, where to send it, and why — with trust-aware provider selection, cost/quality/latency optimization, and explainable decisions.
The Problem
You're using OpenAI. Then GPT-4 goes down. Or gets rate-limited. Or you realize you're burning $10k/month on requests that could run on cheaper models.
You add Anthropic as a backup. Now you're managing two APIs, two auth flows, two error handlers. Then you want to try Google Gemini for certain tasks...
Every provider you add multiplies this complexity. Igris Overture automates it away.
What Igris Overture Does
Overture is the decision intelligence layer that determines what to route, where to send it, and why. One API call becomes an explainable routing decision backed by trust verification and observability.
# Instead of this mess:
try:
    openai_response = openai.chat.completions.create(...)
except OpenAIError:
    try:
        anthropic_response = anthropic.messages.create(...)
    except AnthropicError:
        ...  # Give up or try another provider

# Do this:
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
# Igris Overture handles routing, failover, and optimization
What You Get
Drop-in Replacement:
- OpenAI-compatible API
- Change one URL, done
- Use any existing OpenAI SDK
Decision Intelligence:
- Thompson Sampling with cold-start protection
- Trust-aware provider selection (observed vs reported verification)
- Cost, quality, and latency optimization
- Explainable routing traces
Reliability:
- Circuit breaker and automatic failover
- Provider health monitoring
- Automatic retries with exponential backoff
Observability:
- 150+ metrics and comprehensive monitoring
- Per-request cost tracking
- Routing decision traces with rejection reasons
- Distributed tracing with correlation IDs
Quick Example
Before: Managing Multiple Providers
# Manually manage multiple providers
import os

import openai
import anthropic
import google.generativeai as genai

# Configure all providers
openai.api_key = os.getenv("OPENAI_API_KEY")
anthropic_client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Try OpenAI first
try:
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
except Exception:
    # Fall back to Anthropic
    try:
        response = anthropic_client.messages.create(
            model="claude-3-opus-20240229",
            max_tokens=1024,  # required by the Anthropic API
            messages=[{"role": "user", "content": prompt}],
        )
    except Exception:
        # Fall back to Google...
        response = genai.GenerativeModel("gemini-pro").generate_content(prompt)

# Manual cost tracking, no optimization, fragile error handling
After: One API, Automatic Routing
from openai import OpenAI

# Point to Igris Overture
client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="your-api-key",
)

# Make your request - Igris Overture handles everything
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
# Automatic routing, failover, cost optimization, and tracking

# Response includes metadata about the routing decision
print(response.metadata)
# {
#   "provider": "anthropic",
#   "routing_decision": "thompson-sampling",
#   "cost_usd": 0.00034,
#   "latency_ms": 187
# }
Result:
- One API instead of three
- Automatic failover (no try/except hell)
- Cost optimization (Thompson Sampling learns best provider)
- Complete observability (every request tracked)
Core Features
1. Thompson Sampling Routing
Bayesian multi-armed bandit algorithm that learns which provider is best for your workload.
How it works:
- Tracks success rate, latency, and cost for each provider
- Balances exploration (trying new providers) with exploitation (using the best one)
- Adapts in real-time as provider performance changes
Result: 30-40% cost reduction by automatically selecting cheaper providers when quality is equivalent.
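To make the mechanism concrete, here is a minimal sketch of Thompson Sampling over providers, using a Beta posterior on each provider's success rate. The provider names, priors, and binary reward are illustrative assumptions, not Overture's internals (which also factor in latency, cost, and cold-start protection).

import random

# Each provider keeps a Beta(successes + 1, failures + 1) posterior over its
# success rate. The +1/+1 uniform prior is a simple stand-in for cold-start
# handling.
stats = {
    "openai":    {"successes": 0, "failures": 0},
    "anthropic": {"successes": 0, "failures": 0},
    "google":    {"successes": 0, "failures": 0},
}

def pick_provider() -> str:
    # Sampling from each posterior (rather than taking the mean) is what
    # balances exploration of uncertain providers with exploitation of
    # known-good ones.
    draws = {
        name: random.betavariate(s["successes"] + 1, s["failures"] + 1)
        for name, s in stats.items()
    }
    return max(draws, key=draws.get)

def record_result(provider: str, success: bool) -> None:
    stats[provider]["successes" if success else "failures"] += 1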
2. Automatic Failover
# Primary provider fails
curl -X POST https://api.igrisinertial.com/v1/infer \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# Response shows automatic failover
{
  "choices": [...],
  "metadata": {
    "provider": "anthropic",       # Fallback provider used
    "primary_provider": "openai",  # Primary was unavailable
    "fallback": true,
    "latency_ms": 234
  }
}
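This failover is driven by the circuit breaker mentioned under Reliability: after repeated failures, a provider is skipped entirely until a cooldown elapses, rather than paying a timeout on every request. Below is a minimal sketch of the pattern, with thresholds chosen purely for illustration.

import time

class CircuitBreaker:
    """Illustrative circuit breaker: trip after N consecutive failures,
    then skip the provider until a cooldown elapses."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # monotonic time when the breaker tripped

    def allow(self) -> bool:
        # Closed, or open long enough to let a probe request through.
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()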
3. Semantic Routing
ML-powered classifier routes requests based on intent:
- Code generation → DeepSeek Coder
- Creative writing → Claude 3
- Math/reasoning → GPT-4
- General Q&A → GPT-3.5 Turbo
Result: Right provider for the task = better quality + lower cost.
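As a rough picture of how such a router fits together: classify intent, then look up the route. The keyword classifier below is a stand-in for Overture's ML model, and the route table simply mirrors the mapping above.

# Intent -> model table mirroring the mapping above.
ROUTES = {
    "code":     "deepseek-coder",
    "creative": "claude-3",
    "math":     "gpt-4",
    "general":  "gpt-3.5-turbo",
}

def classify_intent(prompt: str) -> str:
    # Stand-in for the ML classifier; keyword matching is for illustration
    # only.
    lowered = prompt.lower()
    if any(w in lowered for w in ("def ", "function", "compile")):
        return "code"
    if any(w in lowered for w in ("story", "poem")):
        return "creative"
    if any(w in lowered for w in ("prove", "solve", "equation")):
        return "math"
    return "general"

def route(prompt: str) -> str:
    return ROUTES[classify_intent(prompt)]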
4. Real-Time Cost Tracking
Every request returns cost breakdown:
{
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 42,
    "total_tokens": 57
  },
  "metadata": {
    "cost_usd": 0.00171,
    "provider": "openai",
    "model": "gpt-4"
  }
}
Track spending per tenant, per provider, per model in real-time.
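The arithmetic behind cost_usd is just token counts times per-model rates. The 0.00171 above is consistent with a flat $0.03 per 1K tokens across all 57 tokens; the rates in this sketch are illustrative, so check current provider pricing.

def cost_usd(prompt_tokens: int, completion_tokens: int,
             prompt_rate_per_1k: float, completion_rate_per_1k: float) -> float:
    """Per-request cost from token usage; rates are USD per 1K tokens."""
    return (prompt_tokens / 1000) * prompt_rate_per_1k \
         + (completion_tokens / 1000) * completion_rate_per_1k

# Reproduces the response above with an illustrative flat $0.03/1K rate:
# (15 + 42) tokens = 57 tokens -> 57/1000 * 0.03 = 0.00171
print(round(cost_usd(15, 42, 0.03, 0.03), 5))  # 0.00171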
Production Features
Multi-Tenancy with BYOK
Each tenant brings their own provider API keys:
- Encrypted storage
- Per-tenant routing policies
- Complete isolation (no cross-tenant data leakage)
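A rough sketch of what BYOK storage implies: each tenant's provider keys are encrypted at rest and resolved per request, scoped by tenant. This uses the cryptography package's Fernet purely for illustration; it is not Overture's actual key-management implementation.

from cryptography.fernet import Fernet  # pip install cryptography

# Illustrative only: encrypt each tenant's provider key at rest.
master = Fernet(Fernet.generate_key())  # in practice, from a KMS/secret store
vault = {}                              # (tenant_id, provider) -> ciphertext

def store_key(tenant_id: str, provider: str, api_key: str) -> None:
    vault[(tenant_id, provider)] = master.encrypt(api_key.encode())

def resolve_key(tenant_id: str, provider: str) -> str:
    # Lookups are scoped by tenant_id, so keys never cross tenants.
    return master.decrypt(vault[(tenant_id, provider)]).decode()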
Budget Enforcement
Set monthly budgets per tenant:
- Soft limit warning at 90%
- Hard limit block at 100% (HTTP 402 Payment Required)
- Real-time spend tracking
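The enforcement itself is a threshold check against real-time spend. A minimal sketch matching the limits above (warning at 90%, HTTP 402 at 100%):

def check_budget(spent_usd: float, budget_usd: float) -> tuple:
    """Return (http_status, note) for a request given current monthly spend."""
    if spent_usd >= budget_usd:
        return 402, "hard limit reached: request blocked"  # Payment Required
    if spent_usd >= 0.9 * budget_usd:
        return 200, "soft limit warning: over 90% of monthly budget"
    return 200, "ok"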
Observability
Comprehensive metrics:
- Request success rate, latency, and throughput
- Cost tracking by provider and model
- Routing decision insights
Distributed tracing:
- Every request gets a trace ID
- Correlate across providers, retries, failovers
Audit logs:
- Who made what request when
- Policy changes tracked
- Cost anomaly alerts
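On the client side, plugging into this tracing usually means sending a correlation ID and logging it with your own records. The X-Correlation-ID header name below is hypothetical (check the API Reference for the actual header); extra_headers is a standard argument in the OpenAI Python SDK.

import uuid
from openai import OpenAI

client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="your-api-key",
)

# Hypothetical header name - check the API Reference for the real one.
correlation_id = str(uuid.uuid4())
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Correlation-ID": correlation_id},
)
# Log the ID alongside your own records to correlate this request across
# providers, retries, and failovers.
print(correlation_id)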
Deployment Options
Cloud Hosted (All Tiers)
We run it for you. Zero ops overhead.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="your-api-key",
)
Self-Hosted (Scale Tier)
Run in your own infrastructure with full control:
- Deploy to AWS, GCP, Azure, or on-premises
- Bring your own monitoring and observability stack
- Custom compliance and security requirements
- Full access to source code
Getting Started
5-Minute Quick Start
1. Start a free trial (no credit card)
2. Point your OpenAI SDK to Igris Overture
3. Make requests - automatic routing enabled
4. Check the dashboard - see cost savings and routing decisions
Next Steps
- Quick Start Guide - Set up in 5 minutes
- SDK Usage - Go, Python, OpenAI-compatible
- API Reference - Complete API docs
Support
- Docs: github.com/Igris-inertial/docs
- GitHub: github.com/Igris-inertial/system
- Discord: Join community
- Email: support@igrisinertial.com