Observability
TL;DR: Complete visibility into every request with real-time metrics, distributed tracing, and cost tracking. Export to your existing monitoring tools.
What You Can Track
Igris Overture provides comprehensive observability:
- Request Metrics: Success rate, latency, throughput
- Cost Tracking: Per-request, per-provider, per-tenant costs
- Provider Performance: Which providers are fastest, cheapest, most reliable
- Routing Decisions: See why each request went to a specific provider
- Distributed Traces: Follow requests across the entire system
- Error Tracking: Detailed error rates and types
Metrics Endpoint
All metrics are exposed in Prometheus format at /metrics:
```bash
curl https://api.igrisinertial.com/metrics
```
This endpoint is compatible with:
- Prometheus
- Datadog
- Grafana Cloud
- New Relic
- Any Prometheus-compatible monitoring system
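The endpoint requires a token (see the API Reference at the end of this page). A minimal authenticated fetch looks like the following; the `Authorization: Bearer` scheme and the `IGRIS_METRICS_TOKEN` variable name are assumptions for illustration:

```bash
# Fetch raw Prometheus metrics. The bearer-token header scheme is an
# assumption -- substitute whatever auth your deployment actually uses.
curl -H "Authorization: Bearer $IGRIS_METRICS_TOKEN" \
  https://api.igrisinertial.com/metrics
```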
Key Metrics Available
Request Metrics
Track overall request health:
- Total requests: Count of all requests
- Success rate: Percentage of successful requests
- Request duration: Latency histograms (p50, p95, p99)
- Requests per second: Current throughput
Inference Metrics
LLM-specific metrics:
- Requests by provider: OpenAI, Anthropic, Google, etc.
- Requests by model: GPT-4, Claude 3, Gemini, etc.
- Token usage: Prompt tokens, completion tokens
- Cost per request: Real-time cost in USD
Routing Metrics
Understand routing decisions:
- Thompson Sampling scores: Which provider is winning
- Semantic routing classifications: Creative, analytical, coding, etc.
- Speculative execution: Race winners and latency improvements
- Circuit breaker status: Which providers are healthy/unhealthy
Distributed Tracing
Every request gets a unique trace ID for end-to-end visibility:
```json
{
  "id": "chatcmpl-abc123",
  "metadata": {
    "trace_id": "550e8400-e29b-41d4-a716-446655440000",
    "provider": "anthropic",
    "latency_ms": 187
  }
}
```
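To pull the trace ID out of a response for pasting into your tracing backend, a `jq` one-liner is enough. This sketch assumes an OpenAI-compatible `/v1/chat/completions` endpoint (suggested by the `chatcmpl-` ID above, but not confirmed here) and a hypothetical `IGRIS_API_KEY` variable:

```bash
# Send a request and print only the trace ID from the response metadata.
# The endpoint path and request body are illustrative.
curl -s https://api.igrisinertial.com/v1/chat/completions \
  -H "Authorization: Bearer $IGRIS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "ping"}]}' \
  | jq -r '.metadata.trace_id'
```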
Trace Spans
Each request creates spans for:
- HTTP Request (parent span)
- Authentication (validating API key)
- Rate Limiting (checking tenant limits)
- Routing Decision (Thompson Sampling or semantic routing)
- Provider Request (actual LLM API call)
- Cost Tracking (recording usage and cost)
Viewing Traces
Use trace IDs to search in your tracing system:
- Jaeger
- Zipkin
- Honeycomb
- Datadog APM
- New Relic
Example trace timeline:
```text
HTTP Request (total: 234ms)
├─ Auth (2ms)
├─ Rate Limit (1ms)
├─ Routing Decision (5ms)
│  └─ Thompson Sampling (4ms)
├─ Provider Request (187ms)
│  └─ Anthropic API (185ms)
└─ Cost Tracking (3ms)
```
Cost Tracking
Per-Request Cost
Every response includes cost breakdown:
```json
{
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 42,
    "total_tokens": 57
  },
  "metadata": {
    "cost_usd": 0.00171,
    "provider": "openai",
    "model": "gpt-4",
    "cost_breakdown": {
      "prompt_cost": 0.00045,
      "completion_cost": 0.00126
    }
  }
}
```
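As a sanity check, the two breakdown components should sum to `cost_usd`. A quick look with `jq` over a saved response (the `response.json` filename is illustrative):

```bash
# Print the reported cost next to the sum of its breakdown components.
# Compare by eye; floating-point sums can differ in the last digit.
jq '.metadata | {reported: .cost_usd,
                 summed: (.cost_breakdown.prompt_cost + .cost_breakdown.completion_cost)}' \
  response.json
```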
Aggregate Cost Metrics
Track spending over time:
```bash
# Get cost metrics by provider
curl "https://api.igrisinertial.com/v1/metrics/cost?group_by=provider"

# Get cost metrics by tenant
curl "https://api.igrisinertial.com/v1/metrics/cost?group_by=tenant"

# Get cost metrics by model
curl "https://api.igrisinertial.com/v1/metrics/cost?group_by=model"
```
Response:
```json
{
  "period": "today",
  "total_cost_usd": 456.78,
  "by_provider": {
    "openai": 234.56,
    "anthropic": 222.22
  },
  "by_model": {
    "gpt-4": 178.90,
    "gpt-3.5-turbo": 55.66,
    "claude-3-sonnet": 222.22
  }
}
```
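To see each provider's share of spend rather than raw dollars, you can post-process the same response with `jq` (assuming it is saved as `costs.json`):

```bash
# Convert per-provider totals into rounded percentages of total spend.
jq '.total_cost_usd as $total
    | .by_provider
    | map_values(. / $total * 100 | round)' costs.json
```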
Dashboards
Cloud Hosted Dashboards
If you're using cloud-hosted Igris Overture, dashboards are built in:
- Overview Dashboard
  - Requests per second
  - Success rate
  - Average latency
  - Total cost today
- Provider Performance
  - Latency by provider
  - Success rate by provider
  - Cost efficiency comparison
- Cost Analytics
  - Spend over time
  - Top spending tenants
  - Cost per provider/model
  - Budget alerts
- Routing Insights
  - Thompson Sampling scores
  - Provider selection distribution
  - Speculative execution wins
Self-Hosted Monitoring
For self-hosted deployments, export metrics to your existing stack:
Prometheus + Grafana:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'igris-overture'
    static_configs:
      - targets: ['api.igris.internal:8081']
    metrics_path: '/metrics'
    scrape_interval: 15s
```
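Before reloading Prometheus, it's worth validating the edited file with the `promtool` binary that ships with Prometheus:

```bash
# Validate the updated Prometheus configuration.
promtool check config prometheus.yml
```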
Datadog:
```yaml
# datadog.yaml
instances:
  - prometheus_url: http://api.igris.internal:8081/metrics
    namespace: igris
    metrics:
      - '*'
```
Grafana Cloud:
Use the Prometheus remote write endpoint to send metrics directly to Grafana Cloud.
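A minimal `remote_write` block for that setup is sketched below; the push URL, username, and password are placeholders that vary per Grafana Cloud stack:

```yaml
# prometheus.yml -- forward scraped metrics to Grafana Cloud.
remote_write:
  - url: https://prometheus-REGION.grafana.net/api/prom/push  # placeholder URL
    basic_auth:
      username: YOUR_GRAFANA_CLOUD_INSTANCE_ID
      password: YOUR_GRAFANA_CLOUD_API_KEY
```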
Alerts
Set up alerts for critical events:
Budget Alerts
Get notified when spending approaches limits:
```http
POST /v1/tenants/{tenant_id}/budget

{
  "monthly_budget_usd": 5000.00,
  "alert_threshold": 0.90,
  "notification_channels": ["email", "webhook"]
}
```
Performance Alerts
Monitor latency and error rates:
- High Latency: Alert when p95 latency > 2000ms (see the PromQL sketch after this list)
- High Error Rate: Alert when error rate > 5%
- Provider Failure: Alert when circuit breaker opens
- Rate Limit: Alert when approaching rate limits
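Each of these conditions maps onto a PromQL expression over the metrics shown in Example Queries below. For instance, the high-latency alert (assuming the histogram metric name used in those examples) could be:

```promql
# Fires when p95 request latency over the last 5 minutes exceeds 2 seconds.
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
```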
Custom Webhooks
Send alerts to your own systems:
```json
{
  "event": "high_latency_alert",
  "tenant_id": "tenant_abc123",
  "metric": "p95_latency_ms",
  "current_value": 2345,
  "threshold": 2000,
  "provider": "openai",
  "timestamp": "2025-11-30T12:00:00Z"
}
```
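To exercise a receiver before wiring up live alerts, save the payload above to a file and replay it by hand; the receiver URL and filename here are placeholders:

```bash
# Replay the sample alert payload against your webhook receiver.
curl -X POST https://alerts.example.com/igris-webhook \
  -H "Content-Type: application/json" \
  -d @sample_alert.json
```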
Log Integration
Structured Logging
All logs are JSON-formatted for easy parsing:
```json
{
  "timestamp": "2025-11-30T12:00:00Z",
  "level": "info",
  "message": "inference_request_completed",
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "tenant_id": "tenant_abc123",
  "provider": "anthropic",
  "model": "claude-3-sonnet",
  "latency_ms": 187,
  "cost_usd": 0.00034,
  "status": "success"
}
```
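Because each line is a self-contained JSON object, ad-hoc filtering works with `jq` alone. For example, to list non-successful requests from a log file (the `igris.log` filename is illustrative and assumes one JSON object per line):

```bash
# List trace IDs, providers, and latencies for non-successful requests.
jq -c 'select(.status != "success")
       | {trace_id, provider, latency_ms}' igris.log
```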
Log Aggregation
Compatible with standard log aggregation tools:
- ELK Stack (Elasticsearch, Logstash, Kibana)
- Loki + Grafana
- Splunk
- Datadog Logs
- CloudWatch Logs
Example Queries
Prometheus Queries
Requests per second:
```promql
rate(http_requests_total[5m])
```
Success rate:
```promql
sum(rate(http_requests_total{status=~"2.."}[5m]))
/
sum(rate(http_requests_total[5m]))
```
P95 latency:
```promql
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
```
Cost over the last hour (`increase` totals the counter over the window; `rate` would give a per-second figure instead):
```promql
sum(increase(cost_usd_total[1h]))
```
Best Practices
Monitoring
- Set up dashboards early - Don't wait until you have issues
- Monitor all three: latency, error rate, throughput
- Track costs daily - Catch unexpected spending quickly
- Use trace IDs to debug specific failed requests
Alerting
- Start with budget alerts - Most critical for cost control
- Alert on trends, not just thresholds (e.g., latency steadily increasing)
- Use different channels for different severity (email vs. PagerDuty)
- Test your alerts before going to production
Optimization
- Review provider performance weekly - Thompson Sampling adapts automatically, but a manual review catches slow drift
- Check for cost anomalies - Unusual spikes might indicate issues
- Monitor circuit breaker state - Frequent opens = provider reliability issues
- Track speculative execution waste - Should be under 30%
API Reference
GET /metrics
Prometheus-formatted metrics endpoint.
Response format: Prometheus exposition format
Access: Requires admin token or metrics-specific token
GET /v1/metrics/cost
Aggregate cost metrics.
Query parameters:
- `period`: `today`, `week`, or `month`
- `group_by`: `provider`, `model`, or `tenant`
- `tenant_id`: Filter by tenant (admin only)
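For example, to fetch today's spend grouped by provider (the bearer-token header and `IGRIS_ADMIN_TOKEN` variable are illustrative):

```bash
curl -H "Authorization: Bearer $IGRIS_ADMIN_TOKEN" \
  "https://api.igrisinertial.com/v1/metrics/cost?period=today&group_by=provider"
```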
Response:
```json
{
  "period": "today",
  "total_cost_usd": 456.78,
  "breakdown": {...}
}
```
GET /v1/metrics/latency
Latency percentiles.
Query parameters:
- `period`: `5m`, `1h`, or `1d`
- `provider`: Filter by provider
- `model`: Filter by model
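For example, the last hour's percentiles for a single provider (same caveat about the auth header as above):

```bash
curl -H "Authorization: Bearer $IGRIS_ADMIN_TOKEN" \
  "https://api.igrisinertial.com/v1/metrics/latency?period=1h&provider=anthropic"
```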
Response:
```json
{
  "period": "1h",
  "p50_ms": 234,
  "p95_ms": 456,
  "p99_ms": 678
}
```
Next Steps
- Multi-Tenancy - Per-tenant monitoring
- API Reference - Complete API docs
- Quick Start - Get started with Igris Overture