Frequently Asked Questions
Get answers to common questions about features, pricing, deployment, and integration.
Most Asked Questions
How much does it cost?
- Trial: $0 for 14 days (no credit card)
- Develop: $149/mo for 500k requests
- Growth: $899/mo for 2M requests
- Scale: $2,999/mo for unlimited requests
Can I start for free?
Yes. 14-day free trial with all features unlocked. No credit card required.
How long does setup take?
5 minutes. Sign up, get your API key, change one line of code.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="sk-igris-YOUR_KEY",
)
Do I need to switch all my code at once?
No. Start with one endpoint or service. Gradual migration works perfectly.
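For example, here is a minimal sketch of gradual migration: one function routes through Igris Overture while legacy code keeps calling OpenAI directly (key values are placeholders).
from openai import OpenAI

# Existing direct integration, untouched during migration
legacy_client = OpenAI(api_key="sk-...")

# New client for the endpoints you migrate first
igris_client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="sk-igris-YOUR_KEY",
)

def summarize(text: str) -> str:
    # Migrated endpoint: requests now route through Igris Overture
    response = igris_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.choices[0].message.content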
What if a provider goes down?
Igris Overture automatically fails over to the next best provider in <30 seconds. Your requests never fail.
Getting Started
What is Igris Overture?
Igris Overture is an AI routing and cost optimization platform that sits between your app and multiple LLM providers. Instead of managing OpenAI, Anthropic, and Google separately, you make one API call, and Igris Overture routes it to the best provider based on cost, speed, and quality.
Think of it as a smart load balancer for AI.
How is this different from calling OpenAI directly?
Using OpenAI directly = one provider, one point of failure, manual cost management.
Using Igris Overture = automatic routing across OpenAI, Anthropic, Google, etc. with:
- 30-40% cost savings from intelligent provider selection
- Zero-downtime failover if OpenAI goes down
- Up to 60% faster responses with Speculative Execution (races multiple providers)
- Complete cost tracking with per-request breakdown
- One API instead of managing 3-5 provider integrations
Can I use my existing OpenAI code?
Yes—it's a 2-line change.
from openai import OpenAI
# Before:
# client = OpenAI(api_key="sk-...")
# After:
client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="sk-igris-YOUR_KEY",
)

# Everything else stays the same
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
Igris Overture is 100% OpenAI-compatible. Use any OpenAI SDK (Python, Node.js, Go, etc.) with no code changes beyond the base URL and API key.
Pricing & Billing
What's included in the 14-day trial?
The trial includes:
- 50,000 requests total
- All features unlocked including Speculative, Council, and Cognitive Advisor
- No credit card required
- No automatic charges at trial end
Do I need a credit card for the trial?
No! The trial requires no credit card. Simply sign up and start using all features immediately.
What happens after the trial ends?
Your account will be paused. You can upgrade to Develop, Growth, or Scale tier anytime to continue using Igris Overture.
Can I change tiers mid-month?
Yes! Upgrades apply immediately with prorated charges. Downgrades take effect at the next billing cycle with a 30-day grace period.
What are overage charges?
- Develop: $0.25 per 1,000 requests after 500k/month
- Growth: $0.20 per 1,000 requests after 2M/month
- Scale: No overage charges (unlimited)
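For example, a Develop account that uses 600,000 requests in a month pays $149 + (100 × $0.25) = $174, since the 100,000 overage requests are billed in blocks of 1,000.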
Is there an Enterprise tier?
No. Scale tier is our highest public tier with all features unlocked. For custom needs (air-gapped, white-label), contact sales@igrisinertial.com.
Can I get a discount for yearly billing?
No. We only offer monthly billing with simple cancellation. No yearly contracts.
Features
What is Thompson Sampling?
Thompson Sampling is a Bayesian multi-armed bandit algorithm that intelligently selects the best provider by balancing exploration (trying new providers) and exploitation (using the best-known provider).
It continuously learns from request outcomes (latency, cost, errors) and adapts routing decisions in real-time.
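As a rough illustration, here is a toy Thompson Sampling loop over success/failure counts. The production optimizer also weighs latency and cost and runs in Rust; this sketch only shows the core Beta-sampling idea.
import random

class Arm:
    """Tracks one provider's success/failure counts (a Beta posterior)."""
    def __init__(self, name: str):
        self.name = name
        self.successes = 1  # Beta(1, 1) uniform prior
        self.failures = 1

    def sample(self) -> float:
        return random.betavariate(self.successes, self.failures)

arms = [Arm("openai"), Arm("anthropic"), Arm("google")]

def pick_provider() -> Arm:
    # Draw once from each posterior; the highest draw wins. Uncertain arms
    # sometimes win (exploration), strong arms usually win (exploitation).
    return max(arms, key=lambda a: a.sample())

def record_outcome(arm: Arm, ok: bool) -> None:
    # Feed the observed outcome back into the posterior
    if ok:
        arm.successes += 1
    else:
        arm.failures += 1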
What is Speculative Execution?
Speculative Execution races multiple providers in parallel and delivers the fastest response. This reduces time-to-first-token (TTFT) by up to 60%.
Example: Request "gpt-4" → race OpenAI GPT-4, Anthropic Claude-3.5, Google Gemini-Pro → stream winner's response immediately.
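Conceptually, the race looks like the following asyncio sketch. The provider calls are stand-ins, not Igris's internal code.
import asyncio

async def call_provider(name: str, delay: float) -> str:
    # Stand-in for a real streaming call to a provider
    await asyncio.sleep(delay)
    return f"response from {name}"

async def race() -> str:
    tasks = [
        asyncio.create_task(call_provider("openai", 0.8)),
        asyncio.create_task(call_provider("anthropic", 0.5)),
        asyncio.create_task(call_provider("google", 1.1)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # drop the slower providers
    return done.pop().result()

print(asyncio.run(race()))  # -> "response from anthropic"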
Available in: Growth and Scale tiers
What is Council Mode?
Council Mode sends requests to multiple providers and evaluates responses for quality and consistency using an LLM judge. This improves answer quality by 15-20%.
Use cases: High-stakes decisions, hallucination detection, quality assurance
Available in: Growth and Scale tiers
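A rough sketch of the pattern against an OpenAI-compatible endpoint follows; the judge model, prompt, and output parsing are illustrative assumptions, not Igris internals.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="sk-igris-YOUR_KEY",
)

def council(question: str, models: list[str]) -> str:
    # Collect one answer per provider/model
    answers = [
        client.chat.completions.create(
            model=m, messages=[{"role": "user", "content": question}]
        ).choices[0].message.content
        for m in models
    ]
    # Ask a judge model to pick the best answer
    ballot = "\n\n".join(f"Answer {i + 1}:\n{a}" for i, a in enumerate(answers))
    verdict = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\n{ballot}\n\n"
                       "Reply with only the number of the most accurate answer.",
        }],
    )
    # A real implementation would validate the judge's output
    return answers[int(verdict.choices[0].message.content.strip()) - 1]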
What is Cognitive Advisor?
Cognitive Advisor is an AI-driven routing assistant that provides recommendations on provider selection, cost optimization, and routing policy adjustments based on your usage patterns.
Available in: Growth and Scale tiers
What is BYOK (Bring Your Own Key)?
BYOK allows you to use your own LLM provider API keys instead of Igris Overture's. Keys are:
- AES-256 encrypted in storage
- Isolated per tenant (no cross-tenant access)
- Rotatable with zero-downtime support
What providers are supported?
Cloud Providers:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude 3 Opus, Sonnet, Haiku)
- Google Gemini (Pro, Ultra)
- xAI (Grok)
- Mistral AI
- DeepSeek
Chinese Providers:
- Kimi (Moonshot)
- Qwen (Alibaba)
- Zai (MiniMax)
Custom: Add your own provider via generic provider adapter (Scale tier)
Can I use multiple providers simultaneously?
Yes! Igris Overture routes requests across all configured providers based on:
- Thompson Sampling scores
- Semantic routing classification
- Cost optimization goals
- Availability and health status
What is Semantic Routing?
Semantic Routing uses an ONNX-powered ML classifier to classify prompts into semantic categories (code generation, question answering, creative writing, etc.) and routes to the best provider for that category.
Accuracy: 92%+
Latency: <50ms
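To illustrate the classify-then-route idea with onnxruntime (the model path, labels, feature shape, and provider mapping below are hypothetical placeholders):
import numpy as np
import onnxruntime as ort

LABELS = ["code_generation", "question_answering", "creative_writing"]
BEST_PROVIDER = {
    "code_generation": "openai",
    "question_answering": "anthropic",
    "creative_writing": "google",
}

session = ort.InferenceSession("prompt_classifier.onnx")  # hypothetical path
input_name = session.get_inputs()[0].name

def route(features: np.ndarray) -> str:
    # features: shape (1, n) prompt representation for a batch of one.
    # Classify the prompt, then pick that category's preferred provider.
    logits = session.run(None, {input_name: features.astype(np.float32)})[0]
    return BEST_PROVIDER[LABELS[int(np.argmax(logits[0]))]]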
Technical
What architecture does Igris Overture use?
Igris Overture uses a hybrid polyglot architecture:
- Go - HTTP gateway, routing, middleware
- Rust - Thompson Sampling optimizer (FFI)
- Python - ML semantic classifier (gRPC)
What is the Rust FFI optimizer?
The Rust optimizer is a high-performance Thompson Sampling implementation called via FFI (Foreign Function Interface) from Go. It's 10-100x faster than Go for Beta distribution sampling at scale.
What databases are required?
- PostgreSQL - Tenant data, policies, usage tracking
- Redis - Caching, rate limiting, session storage
What monitoring is included?
- 150+ Prometheus metrics exposed at /metrics
- Distributed tracing with OpenTelemetry (trace IDs in every response)
- Structured JSON logging with trace correlation
- Grafana dashboards (templates provided)
Can I self-host Igris Overture?
Yes! Self-hosting is available on the Scale tier with:
- Kubernetes Helm charts
- Docker Compose configurations
- Full deployment documentation
What's the SLA?
- Develop: 99.0% uptime (informational)
- Growth: 99.5% uptime (enforced)
- Scale: 99.9% uptime (enforced, customizable)
How fast is Igris Overture?
Latency:
- API gateway P95: <95ms
- Inference P95: <450ms
- Thompson Sampling: <5ms
- Semantic classification: <50ms
Throughput:
- 1,200+ RPS per instance (validated)
- Horizontal scaling supported
Security
How are API keys stored?
Provider API keys are stored with AES-256-GCM encryption:
- 32-byte master key (environment variable)
- Encrypted key + IV + authentication tag
- Per-tenant isolation
- Never stored in plaintext
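To illustrate that storage format, here is a minimal AES-256-GCM sketch using Python's cryptography package; it mirrors the described scheme (ciphertext + IV + auth tag, tenant-bound) but is not Igris's actual code.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

master_key = os.urandom(32)   # in production: 32-byte key from an env variable
aesgcm = AESGCM(master_key)

nonce = os.urandom(12)        # unique IV per encryption
plaintext = b"sk-provider-key"
# encrypt() returns the ciphertext with the 16-byte auth tag appended;
# the associated data (AAD) binds the record to one tenant.
ciphertext = aesgcm.encrypt(nonce, plaintext, b"tenant-123")

# Decryption fails if the ciphertext, nonce, tag, or tenant AAD is tampered with
assert aesgcm.decrypt(nonce, ciphertext, b"tenant-123") == plaintext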
Is tenant data isolated?
Yes! Every tenant's data is:
- Logically separated at database level (row-level security)
- Scoped by tenant_id filtering on every query
- No cross-tenant access possible
What authentication methods are supported?
- JWT Tokens (HMAC-SHA256, 24h expiry)
- API Keys (SHA-256 hashed)
- RBAC (Role-Based Access Control)
- SSO (roadmap - not yet implemented)
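To illustrate the JWT parameters above (HMAC-SHA256, 24h expiry), here is a minimal issue/verify sketch with PyJWT; the secret and claims are placeholders.
import datetime
import jwt  # pip install PyJWT

SECRET = "change-me"

# Issue a token that expires 24 hours from now
token = jwt.encode(
    {
        "sub": "tenant-123",
        "exp": datetime.datetime.now(datetime.timezone.utc)
               + datetime.timedelta(hours=24),
    },
    SECRET,
    algorithm="HS256",
)

# Verify: raises an exception if the token is expired or the signature is invalid
claims = jwt.decode(token, SECRET, algorithms=["HS256"])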
Is data encrypted in transit?
Yes. All external connections use TLS 1.3 encryption.
Are audit logs available?
Yes, audit logs are available in Growth and Scale tiers:
- Policy changes
- Request traces
- Authentication events
- Cost changes
Integration
What SDKs are available?
Official SDKs:
- Go - internal/sdk/go/igris
- Python - internal/sdk/python
- OpenAI-compatible - Use any OpenAI SDK
Can I use the OpenAI Python SDK?
Yes! Simply change the base URL:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8081/v1",
    api_key="your-api-key",
)
What about streaming responses?
Streaming is fully supported via Server-Sent Events (SSE):
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in response:
    # The final chunk's delta may carry no content, so guard against None
    print(chunk.choices[0].delta.content or "", end="")
Can I use cURL?
Yes! Igris Overture is a standard HTTP API:
curl -X POST http://localhost:8081/v1/infer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Deployment
Where can I deploy Igris Overture?
- Cloud Hosted - Managed by Igris Overture (all tiers)
- Self-Hosted - Your infrastructure (Scale tier only):
  - AWS, GCP, Azure
  - On-premise Kubernetes
  - Docker Compose
What are the resource requirements?
Minimum (Development):
- API Gateway: 2 CPU cores, 4GB RAM
- PostgreSQL: 2 CPU cores, 4GB RAM
- Redis: 1 CPU core, 2GB RAM
Production (Scale tier):
- API Gateway: 4 CPU cores, 8GB RAM (auto-scaling)
- PostgreSQL: 4 CPU cores, 8GB RAM (with replicas)
- Redis: 2 CPU cores, 4GB RAM (cluster mode)
Can I use managed services?
Yes! Use managed services for:
- Database: AWS RDS, Google Cloud SQL, Azure Database
- Cache: AWS ElastiCache, Google Memorystore, Azure Cache for Redis
- Kubernetes: AWS EKS, Google GKE, Azure AKS
Support
How do I get support?
- Documentation: github.com/Igris-inertial/docs
- GitHub Issues: github.com/Igris-inertial/system/issues
- Email: support@igrisinertial.com
- Discord: Join our community
Growth & Scale tiers: Priority support + dedicated solutions engineer
What's the response time for support?
- Develop: 48 hours
- Growth: 24 hours (priority)
- Scale: 12 hours (priority) + dedicated engineer
Can I request features?
Yes! Submit feature requests via:
- GitHub Issues
- Email to support@igrisinertial.com
- Discord community
Troubleshooting
Why is my request failing?
Common causes:
- Invalid API key - Check your OPENAI_API_KEY or provider credentials
- Rate limit exceeded - Check tier limits in the dashboard
- Provider down - Igris Overture will auto-fallback to the next provider
- Invalid model name - Verify the model is supported by the provider
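If you want to handle these cases client-side, here is a minimal sketch using the OpenAI Python SDK's exception types:
from openai import (
    OpenAI,
    APIError,
    AuthenticationError,
    BadRequestError,
    RateLimitError,
)

client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="sk-igris-YOUR_KEY",
)

try:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
    )
except AuthenticationError:
    print("Invalid API key: check your credentials")
except RateLimitError:
    print("Rate limit exceeded: check tier limits in the dashboard")
except BadRequestError as e:
    print(f"Invalid request (e.g. unsupported model name): {e}")
except APIError as e:
    print(f"Upstream/provider error: {e}")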
How do I check system health?
curl http://localhost:8081/v1/health
How do I view metrics?
Access Prometheus metrics at:
http://localhost:8081/metrics
Or view in Grafana dashboard (templates provided).
Why is latency high?
Possible causes:
- Cold start - First request warms up connections
- Provider latency - Check provider status
- Network issues - Check connectivity to providers
- Semantic routing - ONNX classification adds ~50ms
Enable Speculative Execution (Growth+) to reduce latency by up to 60%.
Roadmap
What features are coming?
In Development:
- SSO authentication (OAuth2, SAML)
- Multi-region routing
- L2 in-memory cache
- Advanced alerting backend
Planned:
- Embeddings API support
- Fine-tuning endpoints
- Batch processing
- White-label deployment
Can I contribute?
Yes! Igris Overture is open source (MIT license). See CONTRIBUTING.md for guidelines.
Still Have Questions?
- Email: support@igrisinertial.com
- Discord: Join our community
- GitHub: Report an issue