Frequently Asked Questions
Get answers to common questions about features, pricing, deployment, and integration.
Most Asked Questions
How much does it cost?
- Trial: $0 for 14 days (no credit card)
- Develop: $149/mo for 500k requests
- Growth: $899/mo for 2M requests
- Scale: $2,999/mo for unlimited requests
Can I start for free?
Yes. 14-day free trial with all features unlocked. No credit card required.
How long does setup take?
5 minutes. Sign up, get your API key, change one line of code.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="sk-igris-YOUR_KEY",
)
Do I need to switch all my code at once?
No. Start with one endpoint or service. Gradual migration works perfectly.
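For example, here is a minimal sketch of gradual migration: one function routes through Igris Overture while legacy code keeps calling OpenAI directly (key values are placeholders).
from openai import OpenAI

# Existing direct integration, untouched during migration
legacy_client = OpenAI(api_key="sk-...")

# New client for the endpoints you migrate first
igris_client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="sk-igris-YOUR_KEY",
)

def summarize(text: str) -> str:
    # Migrated endpoint: requests now route through Igris Overture
    response = igris_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.choices[0].message.content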
What if a provider goes down?
Igris Overture automatically fails over to the next best provider in <30 seconds. Your requests never fail.
Getting Started
What is Igris Overture?
Igris Overture is an AI routing and cost optimization platform that sits between your app and multiple LLM providers. Instead of managing OpenAI, Anthropic, and Google separately, you make one API call, and Igris Overture routes it to the best provider based on cost, speed, and quality.
Think of it as a smart load balancer for AI.
How is this different from calling OpenAI directly?
Using OpenAI directly = one provider, one point of failure, manual cost management.
Using Igris Overture = automatic routing across OpenAI, Anthropic, Google, etc. with:
- 30-40% cost savings from intelligent provider selection
- Zero-downtime failover if OpenAI goes down
- Up to 60% faster responses with Speculative Execution (races multiple providers)
- Complete cost tracking with per-request breakdown
- One API instead of managing 3-5 provider integrations
Can I use my existing OpenAI code?
Yes—it's a 2-line change.
from openai import OpenAI
# Before:
# client = OpenAI(api_key="sk-...")
# After:
client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="sk-igris-YOUR_KEY",
)

# Everything else stays the same
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
Igris Overture is 100% OpenAI-compatible. Use any OpenAI SDK (Python, Node.js, Go, etc.) with no code changes beyond the base URL and API key.
Pricing & Billing
What's included in the 14-day trial?
The trial includes:
- 50,000 requests total
- All features unlocked including Speculative, Council, and Cognitive Advisor
- No credit card required
- No automatic charges at trial end
Do I need a credit card for the trial?
No! The trial requires no credit card. Simply sign up and start using all features immediately.
What happens after the trial ends?
Your account will be paused. You can upgrade to Develop, Growth, or Scale tier anytime to continue using Igris Overture.
Can I change tiers mid-month?
Yes! Upgrades apply immediately with prorated charges. Downgrades take effect at the next billing cycle with a 30-day grace period.
What are overage charges?
- Develop: $0.25 per 1,000 requests after 500k/month
- Growth: $0.20 per 1,000 requests after 2M/month
- Scale: No overage charges (unlimited)
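For example, a Develop account that uses 600,000 requests in a month pays $149 + (100 × $0.25) = $174, since the 100,000 overage requests are billed in blocks of 1,000.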
Is there an Enterprise tier?
No. Scale tier is our highest public tier with all features unlocked. For custom needs (air-gapped, white-label), contact sales@igrisinertial.com.
Can I get a discount for yearly billing?
No. We only offer monthly billing with simple cancellation. No yearly contracts.
Features
What is Thompson Sampling?
Thompson Sampling is a Bayesian multi-armed bandit algorithm that intelligently selects the best provider by balancing exploration (trying new providers) and exploitation (using the best-known provider).
It continuously learns from request outcomes (latency, cost, errors) and adapts routing decisions in real-time.
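As a rough illustration, here is a toy Thompson Sampling loop over success/failure counts. The production optimizer also weighs latency and cost and runs in Rust; this sketch only shows the core Beta-sampling idea.
import random

class Arm:
    """Tracks one provider's success/failure counts (a Beta posterior)."""
    def __init__(self, name: str):
        self.name = name
        self.successes = 1  # Beta(1, 1) uniform prior
        self.failures = 1

    def sample(self) -> float:
        return random.betavariate(self.successes, self.failures)

arms = [Arm("openai"), Arm("anthropic"), Arm("google")]

def pick_provider() -> Arm:
    # Draw once from each posterior; the highest draw wins. Uncertain arms
    # sometimes win (exploration), strong arms usually win (exploitation).
    return max(arms, key=lambda a: a.sample())

def record_outcome(arm: Arm, ok: bool) -> None:
    # Feed the observed outcome back into the posterior
    if ok:
        arm.successes += 1
    else:
        arm.failures += 1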
What is Speculative Execution?
Speculative Execution races multiple providers in parallel and delivers the fastest response. This reduces time-to-first-token (TTFT) by up to 60%.
Example: Request "gpt-4" → race OpenAI GPT-4, Anthropic Claude-3.5, Google Gemini-Pro → stream winner's response immediately.
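Conceptually, the race looks like the following asyncio sketch. The provider calls are stand-ins, not Igris's internal code.
import asyncio

async def call_provider(name: str, delay: float) -> str:
    # Stand-in for a real streaming call to a provider
    await asyncio.sleep(delay)
    return f"response from {name}"

async def race() -> str:
    tasks = [
        asyncio.create_task(call_provider("openai", 0.8)),
        asyncio.create_task(call_provider("anthropic", 0.5)),
        asyncio.create_task(call_provider("google", 1.1)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # drop the slower providers
    return done.pop().result()

print(asyncio.run(race()))  # -> "response from anthropic"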
Available in: Growth and Scale tiers
What is Council Mode?
Council Mode sends requests to multiple providers and evaluates responses for quality and consistency using an LLM judge. This improves answer quality by 15-20%.
Use cases: High-stakes decisions, hallucination detection, quality assurance
Available in: Growth and Scale tiers
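A rough sketch of the pattern against an OpenAI-compatible endpoint follows; the judge model, prompt, and output parsing are illustrative assumptions, not Igris internals.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="sk-igris-YOUR_KEY",
)

def council(question: str, models: list[str]) -> str:
    # Collect one answer per provider/model
    answers = [
        client.chat.completions.create(
            model=m, messages=[{"role": "user", "content": question}]
        ).choices[0].message.content
        for m in models
    ]
    # Ask a judge model to pick the best answer
    ballot = "\n\n".join(f"Answer {i + 1}:\n{a}" for i, a in enumerate(answers))
    verdict = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\n{ballot}\n\n"
                       "Reply with only the number of the most accurate answer.",
        }],
    )
    # A real implementation would validate the judge's output
    return answers[int(verdict.choices[0].message.content.strip()) - 1]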
What is Cognitive Advisor?
Cognitive Advisor is an AI-driven routing assistant that provides recommendations on provider selection, cost optimization, and routing policy adjustments based on your usage patterns.
Available in: Growth and Scale tiers
What is BYOK (Bring Your Own Key)?
BYOK allows you to use your own LLM provider API keys instead of Igris Overture's. Keys are:
- AES-256 encrypted in storage
- Isolated per tenant (no cross-tenant access)
- Rotatable with zero-downtime support
What providers are supported?
Cloud Providers:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude 3 Opus, Sonnet, Haiku)
- Google Gemini (Pro, Ultra)
- xAI (Grok)
- Mistral AI
- DeepSeek
Chinese Providers:
- Kimi (Moonshot)
- Qwen (Alibaba)
- Zai (MiniMax)
Custom: Add your own provider via generic provider adapter (Scale tier)
Can I use multiple providers simultaneously?
Yes! Igris Overture routes requests across all configured providers based on:
- Thompson Sampling scores
- Semantic routing classification
- Cost optimization goals
- Availability and health status
What is Semantic Routing?
Semantic Routing uses an ONNX-powered ML classifier to classify prompts into semantic categories (code generation, question answering, creative writing, etc.) and routes to the best provider for that category.
Accuracy: 92%+
Latency: <50ms
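To illustrate the classify-then-route idea with onnxruntime (the model path, labels, feature shape, and provider mapping below are hypothetical placeholders):
import numpy as np
import onnxruntime as ort

LABELS = ["code_generation", "question_answering", "creative_writing"]
BEST_PROVIDER = {
    "code_generation": "openai",
    "question_answering": "anthropic",
    "creative_writing": "google",
}

session = ort.InferenceSession("prompt_classifier.onnx")  # hypothetical path
input_name = session.get_inputs()[0].name

def route(features: np.ndarray) -> str:
    # features: shape (1, n) prompt representation for a batch of one.
    # Classify the prompt, then pick that category's preferred provider.
    logits = session.run(None, {input_name: features.astype(np.float32)})[0]
    return BEST_PROVIDER[LABELS[int(np.argmax(logits[0]))]]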
Technical
What architecture does Igris Overture use?
Igris Overture uses a hybrid polyglot architecture:
- Go - HTTP gateway, routing, middleware
- Rust - Thompson Sampling optimizer (FFI)
- Python - ML semantic classifier (gRPC)
What is the Rust FFI optimizer?
The Rust optimizer is a high-performance Thompson Sampling implementation called via FFI (Foreign Function Interface) from Go. It's 10-100x faster than Go for Beta distribution sampling at scale.
What databases are required?
- PostgreSQL - Tenant data, policies, usage tracking
- Redis - Caching, rate limiting, session storage
What monitoring is included?
- 150+ Prometheus metrics exposed at /metrics
- Distributed tracing with OpenTelemetry (trace IDs in every response)
- Structured JSON logging with trace correlation
- Grafana dashboards (templates provided)
Can I self-host Igris Overture?
Yes! Self-hosting is available on the Scale tier with:
- Kubernetes Helm charts
- Docker Compose configurations
- Full deployment documentation
What's the SLA?
- Develop: 99.0% uptime (informational)
- Growth: 99.5% uptime (enforced)
- Scale: 99.9% uptime (enforced, customizable)
How fast is Igris Overture?
Latency:
- API gateway P95: <95ms
- Inference P95: <450ms
- Thompson Sampling: <5ms
- Semantic classification: <50ms
Throughput:
- 1,200+ RPS per instance (validated)
- Horizontal scaling supported
Security
How are API keys stored?
Provider API keys are stored with AES-256-GCM encryption:
- 32-byte master key (environment variable)
- Encrypted key + IV + authentication tag
- Per-tenant isolation
- Never stored in plaintext
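To illustrate that storage format, here is a minimal AES-256-GCM sketch using Python's cryptography package; it mirrors the described scheme (ciphertext + IV + auth tag, tenant-bound) but is not Igris's actual code.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

master_key = os.urandom(32)   # in production: 32-byte key from an env variable
aesgcm = AESGCM(master_key)

nonce = os.urandom(12)        # unique IV per encryption
plaintext = b"sk-provider-key"
# encrypt() returns the ciphertext with the 16-byte auth tag appended;
# the associated data (AAD) binds the record to one tenant.
ciphertext = aesgcm.encrypt(nonce, plaintext, b"tenant-123")

# Decryption fails if the ciphertext, nonce, tag, or tenant AAD is tampered with
assert aesgcm.decrypt(nonce, ciphertext, b"tenant-123") == plaintext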
Is tenant data isolated?
Yes! Every tenant's data is:
- Logically separated at database level (row-level security)
- Scoped by tenant_id filtering on every query
- No cross-tenant access possible
What authentication methods are supported?
- JWT Tokens (HMAC-SHA256, 24h expiry)
- API Keys (SHA-256 hashed)
- RBAC (Role-Based Access Control)
- SSO (roadmap - not yet implemented)
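To illustrate the JWT parameters above (HMAC-SHA256, 24h expiry), here is a minimal issue/verify sketch with PyJWT; the secret and claims are placeholders.
import datetime
import jwt  # pip install PyJWT

SECRET = "change-me"

# Issue a token that expires 24 hours from now
token = jwt.encode(
    {
        "sub": "tenant-123",
        "exp": datetime.datetime.now(datetime.timezone.utc)
               + datetime.timedelta(hours=24),
    },
    SECRET,
    algorithm="HS256",
)

# Verify: raises an exception if the token is expired or the signature is invalid
claims = jwt.decode(token, SECRET, algorithms=["HS256"])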
Is data encrypted in transit?
Yes. All external connections use TLS 1.3 encryption.
Are audit logs available?
Yes, audit logs are available in Growth and Scale tiers:
- Policy changes
- Request traces
- Authentication events
- Cost changes
Integration
What SDKs are available?
Official SDKs:
- Go - internal/sdk/go/igris
- Python - internal/sdk/python
- OpenAI-compatible - Use any OpenAI SDK
Can I use the OpenAI Python SDK?
Yes! Simply change the base URL:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8081/v1",
    api_key="your-api-key",
)
What about streaming responses?
Streaming is fully supported via Server-Sent Events (SSE):
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in response:
    # The final chunk's delta may carry no content, so guard against None
    print(chunk.choices[0].delta.content or "", end="")
Can I use cURL?
Yes! Igris Overture is a standard HTTP API:
curl -X POST http://localhost:8081/v1/infer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Deployment
Where can I deploy Igris Overture?
- Cloud Hosted - Managed by Igris Overture (all tiers)
- Self-Hosted - Your infrastructure (Scale tier only):
  - AWS, GCP, Azure
  - On-premise Kubernetes
  - Docker Compose
What are the resource requirements?
Minimum (Development):
- API Gateway: 2 CPU cores, 4GB RAM
- PostgreSQL: 2 CPU cores, 4GB RAM
- Redis: 1 CPU core, 2GB RAM
Production (Scale tier):
- API Gateway: 4 CPU cores, 8GB RAM (auto-scaling)
- PostgreSQL: 4 CPU cores, 8GB RAM (with replicas)
- Redis: 2 CPU cores, 4GB RAM (cluster mode)
Can I use managed services?
Yes! Use managed services for:
- Database: AWS RDS, Google Cloud SQL, Azure Database
- Cache: AWS ElastiCache, Google Memorystore, Azure Cache for Redis
- Kubernetes: AWS EKS, Google GKE, Azure AKS
Support
How do I get support?
- Documentation: github.com/Igris-inertial/docs
- GitHub Issues: github.com/Igris-inertial/system/issues
- Email: support@igrisinertial.com
- Discord: Join our community
Growth & Scale tiers: Priority support + dedicated solutions engineer
What's the response time for support?
- Develop: 48 hours
- Growth: 24 hours (priority)
- Scale: 12 hours (priority) + dedicated engineer
Can I request features?
Yes! Submit feature requests via:
- GitHub Issues
- Email to support@igrisinertial.com
- Discord community
Troubleshooting
Why is my request failing?
Common causes:
- Invalid API key - Check your OPENAI_API_KEY or provider credentials
- Rate limit exceeded - Check tier limits in the dashboard
- Provider down - Igris Overture will auto-fallback to the next provider
- Invalid model name - Verify the model is supported by the provider
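If you want to handle these cases client-side, here is a minimal sketch using the OpenAI Python SDK's exception types:
from openai import (
    OpenAI,
    APIError,
    AuthenticationError,
    BadRequestError,
    RateLimitError,
)

client = OpenAI(
    base_url="https://api.igrisinertial.com/v1",
    api_key="sk-igris-YOUR_KEY",
)

try:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
    )
except AuthenticationError:
    print("Invalid API key: check your credentials")
except RateLimitError:
    print("Rate limit exceeded: check tier limits in the dashboard")
except BadRequestError as e:
    print(f"Invalid request (e.g. unsupported model name): {e}")
except APIError as e:
    print(f"Upstream/provider error: {e}")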
How do I check system health?
curl http://localhost:8081/v1/health
How do I view metrics?
Access Prometheus metrics at:
http://localhost:8081/metrics
Or view in Grafana dashboard (templates provided).
Why is latency high?
Possible causes:
- Cold start - First request warms up connections
- Provider latency - Check provider status
- Network issues - Check connectivity to providers
- Semantic routing - ONNX classification adds ~50ms
Enable Speculative Execution (Growth+) to reduce latency by up to 60%.
Roadmap
What features are coming?
In Development:
- SSO authentication (OAuth2, SAML)
- Multi-region routing
- L2 in-memory cache
- Advanced alerting backend
Planned:
- Embeddings API support
- Fine-tuning endpoints
- Batch processing
- White-label deployment
Can I contribute?
Yes! Igris Overture is open source (MIT license). See CONTRIBUTING.md for guidelines.
Still Have Questions?
- Email: support@igrisinertial.com
- Discord: Join our community
- GitHub: Report an issue