Frequently Asked Questions

Get answers to common questions about features, pricing, deployment, and integration.


Most Asked Questions

How much does it cost?

  • Trial: $0 for 14 days (no credit card)
  • Develop: $149/mo for 500k requests
  • Growth: $899/mo for 2M requests
  • Scale: $2,999/mo for unlimited requests

View Full Pricing →

Can I start for free?

Yes. 14-day free trial with all features unlocked. No credit card required.

How long does setup take?

About 5 minutes. Sign up, get your API key, and change two lines of code.

client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="sk-igris-YOUR_KEY"
)

Do I need to switch all my code at once?

No. Start with a single endpoint or service; gradual migration is fully supported.

What if a provider goes down?

Igris Overture automatically fails over to the next best provider in under 30 seconds, so a single provider outage doesn't take down your requests.


Getting Started

What is Igris Overture?

Igris Overture is an AI routing and cost optimization platform that sits between your app and multiple LLM providers. Instead of managing OpenAI, Anthropic, Google separately, you make one API call—Igris Overture routes it to the best provider based on cost, speed, and quality.

Think of it as a smart load balancer for AI.

How is this different from calling OpenAI directly?

Using OpenAI directly = one provider, one point of failure, manual cost management.

Using Igris Overture = automatic routing across OpenAI, Anthropic, Google, etc. with:

  • 30-40% cost savings from intelligent provider selection
  • Zero-downtime failover if OpenAI goes down
  • Up to 60% faster responses with Speculative Execution (races multiple providers)
  • Complete cost tracking with per-request breakdown
  • One API instead of managing 3-5 provider integrations

Can I use my existing OpenAI code?

Yes. It's a two-line change.

from openai import OpenAI

# Before:
# client = OpenAI(api_key="sk-...")

# After:
client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="sk-igris-YOUR_KEY"
)

# Everything else stays the same
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

Igris Overture is 100% OpenAI-compatible. Use any OpenAI SDK (Python, Node.js, Go, etc.) with no code changes beyond the base URL and API key.


Pricing & Billing

What's included in the 14-day trial?

The trial includes:

  • 50,000 requests total
  • All features unlocked including Speculative, Council, and Cognitive Advisor
  • No credit card required
  • No automatic charges at trial end

Do I need a credit card for the trial?

No! The trial requires no credit card. Simply sign up and start using all features immediately.

What happens after the trial ends?

Your account will be paused. You can upgrade to Develop, Growth, or Scale tier anytime to continue using Igris Overture.

Can I change tiers mid-month?

Yes! Upgrades apply immediately with prorated charges. Downgrades take effect at the next billing cycle with a 30-day grace period.

What are overage charges?

  • Develop: $0.25 per 1,000 requests after 500k/month
  • Growth: $0.20 per 1,000 requests after 2M/month
  • Scale: No overage charges (unlimited)

Is there an Enterprise tier?

No. Scale tier is our highest public tier with all features unlocked. For custom needs (air-gapped, white-label), contact sales@igrisinertial.com.

Can I get a discount for yearly billing?

No. We only offer monthly billing with simple cancellation. No yearly contracts.


Features

What is Thompson Sampling?

Thompson Sampling is a Bayesian multi-armed bandit algorithm that intelligently selects the best provider by balancing exploration (trying new providers) and exploitation (using the best-known provider).

It continuously learns from request outcomes (latency, cost, errors) and adapts routing decisions in real-time.
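
The idea can be sketched in a few lines. This is a toy illustration of the algorithm, not the production Rust optimizer; the provider names and simulated success rates are made up for the demo.

```python
import random

class ThompsonRouter:
    def __init__(self, providers):
        # Beta(1, 1) prior for each provider: [successes, failures].
        self.stats = {p: [1, 1] for p in providers}

    def pick(self):
        # Sample a success rate from each provider's Beta posterior and
        # route to the highest draw (balances exploration and exploitation).
        samples = {p: random.betavariate(s, f) for p, (s, f) in self.stats.items()}
        return max(samples, key=samples.get)

    def record(self, provider, success):
        # Update the chosen provider's posterior with the observed outcome.
        self.stats[provider][0 if success else 1] += 1

router = ThompsonRouter(["openai", "anthropic", "google"])
for _ in range(1000):
    p = router.pick()
    # Simulate outcomes: one provider succeeds more often than the rest.
    router.record(p, random.random() < (0.95 if p == "openai" else 0.70))
```

Over time the sampler concentrates traffic on the provider with the best observed outcomes while still occasionally probing the others.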

What is Speculative Execution?

Speculative Execution races multiple providers in parallel and delivers the fastest response. This reduces time-to-first-token (TTFT) by up to 60%.

Example: Request "gpt-4" → race OpenAI GPT-4, Anthropic Claude-3.5, Google Gemini-Pro → stream winner's response immediately.

Available in: Growth and Scale tiers
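
The racing pattern looks roughly like this. The provider calls below are stand-in stubs with fake latencies, not real API calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_provider(name, latency_s):
    # Stub standing in for a real provider call.
    time.sleep(latency_s)
    return name, f"response from {name}"

def race(providers):
    # Submit all calls in parallel and take whichever finishes first.
    # Note: the context manager still waits for the stragglers to finish
    # on exit; a real gateway would cancel or abandon them.
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        futures = [pool.submit(fake_provider, n, lat) for n, lat in providers]
        winner = next(as_completed(futures))
        return winner.result()

name, text = race([("openai", 0.30), ("anthropic", 0.05), ("google", 0.15)])
```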

What is Council Mode?

Council Mode sends requests to multiple providers and evaluates responses for quality and consistency using an LLM judge. This improves answer quality by 15-20%.

Use cases: High-stakes decisions, hallucination detection, quality assurance

Available in: Growth and Scale tiers
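
A stripped-down version of the idea: query several providers and let a judge pick the best answer. Here the "judge" is a simple majority vote rather than an LLM, and the providers are stubs; one returns a simulated outlier.

```python
from collections import Counter

def council(prompt, providers):
    answers = {name: fn(prompt) for name, fn in providers.items()}
    # Stand-in judge: take the most common answer across providers.
    best, votes = Counter(answers.values()).most_common(1)[0]
    return best, votes, answers

providers = {
    "openai":    lambda p: "Paris",
    "anthropic": lambda p: "Paris",
    "google":    lambda p: "Lyon",   # simulated hallucination
}
answer, votes, _ = council("Capital of France?", providers)
```

Agreement across providers is what lets this pattern flag likely hallucinations: a lone outlier loses the vote.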

What is Cognitive Advisor?

Cognitive Advisor is an AI-driven routing assistant that provides recommendations on provider selection, cost optimization, and routing policy adjustments based on your usage patterns.

Available in: Growth and Scale tiers

What is BYOK (Bring Your Own Key)?

BYOK allows you to use your own LLM provider API keys instead of Igris Overture's. Keys are:

  • AES-256 encrypted in storage
  • Isolated per tenant (no cross-tenant access)
  • Rotatable with zero-downtime support

What providers are supported?

Cloud Providers:

  • OpenAI (GPT-4, GPT-3.5)
  • Anthropic (Claude 3 Opus, Sonnet, Haiku)
  • Google Gemini (Pro, Ultra)
  • xAI (Grok)
  • Mistral AI
  • DeepSeek

Chinese Providers:

  • Kimi (Moonshot)
  • Qwen (Alibaba)
  • Zai (MiniMax)

Custom: Add your own provider via generic provider adapter (Scale tier)

Can I use multiple providers simultaneously?

Yes! Igris Overture routes requests across all configured providers based on:

  • Thompson Sampling scores
  • Semantic routing classification
  • Cost optimization goals
  • Availability and health status

What is Semantic Routing?

Semantic Routing uses an ONNX-powered ML classifier to classify prompts into semantic categories (code generation, question answering, creative writing, etc.) and routes to the best provider for that category.

Accuracy: 92%+
Latency: <50ms
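
The production classifier is an ONNX ML model; the keyword lookup below is only a toy stand-in to show the category-to-provider mapping idea. The categories come from the text above, but the keyword lists and provider choices are illustrative.

```python
CATEGORY_KEYWORDS = {
    "code_generation":  ("def ", "function", "class ", "bug"),
    "creative_writing": ("story", "poem", "character"),
}
CATEGORY_PROVIDER = {
    "code_generation":    "anthropic",
    "creative_writing":   "openai",
    "question_answering": "google",  # fallback category
}

def classify(prompt):
    # Stand-in for the ONNX classifier: first matching keyword wins.
    text = prompt.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return category
    return "question_answering"

def route(prompt):
    return CATEGORY_PROVIDER[classify(prompt)]
```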


Technical

What architecture does Igris Overture use?

Igris Overture uses a hybrid polyglot architecture:

  • Go - HTTP gateway, routing, middleware
  • Rust - Thompson Sampling optimizer (FFI)
  • Python - ML semantic classifier (gRPC)

What is the Rust FFI optimizer?

The Rust optimizer is a high-performance Thompson Sampling implementation called via FFI (Foreign Function Interface) from Go. It's 10-100x faster than Go for Beta distribution sampling at scale.

What databases are required?

  • PostgreSQL - Tenant data, policies, usage tracking
  • Redis - Caching, rate limiting, session storage

What monitoring is included?

  • 150+ Prometheus metrics exposed at /metrics
  • Distributed tracing with OpenTelemetry (trace IDs in every response)
  • Structured JSON logging with trace correlation
  • Grafana dashboards (templates provided)

Can I self-host Igris Overture?

Yes! Self-hosting is available on the Scale tier with:

  • Kubernetes Helm charts
  • Docker Compose configurations
  • Full deployment documentation

What's the SLA?

  • Develop: 99.0% uptime (informational)
  • Growth: 99.5% uptime (enforced)
  • Scale: 99.9% uptime (enforced, customizable)

How fast is Igris Overture?

Latency:

  • API gateway P95: <95ms
  • Inference P95: <450ms
  • Thompson Sampling: <5ms
  • Semantic classification: <50ms

Throughput:

  • 1,200+ RPS per instance (validated)
  • Horizontal scaling supported

Security

How are API keys stored?

Provider API keys are stored with AES-256-GCM encryption:

  • 32-byte master key (environment variable)
  • Encrypted key + IV + authentication tag
  • Per-tenant isolation
  • Never stored in plaintext
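
The scheme above (master key, per-record IV, auth tag, tenant isolation) can be sketched with AES-256-GCM. This only illustrates the primitive; the service's actual key management and storage layout are internal. Requires the `cryptography` package.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

master_key = AESGCM.generate_key(bit_length=256)   # 32-byte master key
aead = AESGCM(master_key)

def seal(provider_key: bytes, tenant_id: str) -> bytes:
    iv = os.urandom(12)                            # fresh 96-bit IV per record
    # Binding tenant_id as associated data means a ciphertext copied into
    # another tenant's row fails authentication on decrypt.
    return iv + aead.encrypt(iv, provider_key, tenant_id.encode())

def unseal(blob: bytes, tenant_id: str) -> bytes:
    iv, ct = blob[:12], blob[12:]
    return aead.decrypt(iv, ct, tenant_id.encode())

blob = seal(b"sk-provider-secret", "tenant-42")
```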

Is tenant data isolated?

Yes! Every tenant's data is:

  • Logically separated at database level (row-level security)
  • Physically separated via tenant_id filtering
  • No cross-tenant access possible

What authentication methods are supported?

  • JWT Tokens (HMAC-SHA256, 24h expiry)
  • API Keys (SHA-256 hashed)
  • RBAC (Role-Based Access Control)
  • SSO (roadmap - not yet implemented)
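
To make the JWT scheme above concrete, here is a minimal HS256 sign/verify with the standard library only. A production service would use a vetted JWT library; the secret and claims here are demo values.

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> bytes:
    # JWT uses unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(claims: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = header + b"." + payload
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

def verify(token: str, secret: bytes) -> bool:
    signing_input, _, sig = token.rpartition(".")
    expected = b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(sig.encode(), expected)

secret = b"demo-secret"
token = sign({"sub": "tenant-42", "exp": int(time.time()) + 24 * 3600}, secret)
```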

Is data encrypted in transit?

Yes. All external connections use TLS 1.3 encryption.

Are audit logs available?

Yes, audit logs are available in Growth and Scale tiers:

  • Policy changes
  • Request traces
  • Authentication events
  • Cost changes

Integration

What SDKs are available?

Official SDKs:

  • Go - internal/sdk/go/igris
  • Python - internal/sdk/python
  • OpenAI-compatible - Use any OpenAI SDK

Can I use the OpenAI Python SDK?

Yes! Simply change the base URL:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8081/v1",
    api_key="your-api-key"
)

What about streaming responses?

Streaming is fully supported via Server-Sent Events (SSE):

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Can I use cURL?

Yes! Igris Overture is a standard HTTP API:

curl -X POST http://localhost:8081/v1/infer \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Deployment

Where can I deploy Igris Overture?

  • Cloud Hosted - Managed by Igris Overture (all tiers)
  • Self-Hosted - Your infrastructure (Scale tier only)
    • AWS, GCP, Azure
    • On-premise Kubernetes
    • Docker Compose

What are the resource requirements?

Minimum (Development):

  • API Gateway: 2 CPU cores, 4GB RAM
  • PostgreSQL: 2 CPU cores, 4GB RAM
  • Redis: 1 CPU core, 2GB RAM

Production (Scale tier):

  • API Gateway: 4 CPU cores, 8GB RAM (auto-scaling)
  • PostgreSQL: 4 CPU cores, 8GB RAM (with replicas)
  • Redis: 2 CPU cores, 4GB RAM (cluster mode)

Can I use managed services?

Yes! Use managed services for:

  • Database: AWS RDS, Google Cloud SQL, Azure Database
  • Cache: AWS ElastiCache, Google Memorystore, Azure Cache for Redis
  • Kubernetes: AWS EKS, Google GKE, Azure AKS

Support

How do I get support?

  • Documentation: github.com/Igris-inertial/docs
  • GitHub Issues: github.com/Igris-inertial/system/issues
  • Email: support@igrisinertial.com
  • Discord: Join our community

Growth & Scale tiers: Priority support + dedicated solutions engineer

What's the response time for support?

  • Develop: 48 hours
  • Growth: 24 hours (priority)
  • Scale: 12 hours (priority) + dedicated engineer

Can I request features?

Yes! Submit feature requests via:

  • GitHub Issues: github.com/Igris-inertial/system/issues
  • Email: support@igrisinertial.com
  • Discord: Join our community

Troubleshooting

Why is my request failing?

Common causes:

  1. Invalid API key - Check your OPENAI_API_KEY or provider credentials
  2. Rate limit exceeded - Check tier limits in dashboard
  3. Provider down - Igris Overture will auto-fallback to next provider
  4. Invalid model name - Verify model is supported by the provider
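
The auto-fallback behavior in cause 3 amounts to trying providers in order and moving on when one fails, roughly like this. The provider stubs and error types below are illustrative.

```python
def call_with_fallback(providers, prompt):
    errors = {}
    for name, fn in providers:
        try:
            return name, fn(prompt)
        except Exception as exc:   # real code would catch specific provider errors
            errors[name] = exc     # record the failure and try the next provider
    raise RuntimeError(f"all providers failed: {errors}")

def down(prompt):
    # Stub simulating an outage.
    raise ConnectionError("provider outage")

providers = [("openai", down), ("anthropic", lambda p: "ok: " + p)]
name, result = call_with_fallback(providers, "Hello")
```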

How do I check system health?

curl http://localhost:8081/v1/health

How do I view metrics?

Access Prometheus metrics at:

http://localhost:8081/metrics

Or view in Grafana dashboard (templates provided).

Why is latency high?

Possible causes:

  • Cold start - First request warms up connections
  • Provider latency - Check provider status
  • Network issues - Check connectivity to providers
  • Semantic routing - ONNX classification adds ~50ms

Enable Speculative Execution (Growth and Scale tiers) to reduce time-to-first-token by up to 60%.


Roadmap

What features are coming?

In Development:

  • SSO authentication (OAuth2, SAML)
  • Multi-region routing
  • L2 in-memory cache
  • Advanced alerting backend

Planned:

  • Embeddings API support
  • Fine-tuning endpoints
  • Batch processing
  • White-label deployment

Can I contribute?

Yes! Igris Overture is open source (MIT license). See CONTRIBUTING.md for guidelines.


Still Have Questions?

Email support@igrisinertial.com or join our Discord community.