Observability

TL;DR: Complete visibility into every request with real-time metrics, distributed tracing, and cost tracking. Export to your existing monitoring tools.


What You Can Track

Igris Overture provides comprehensive observability:

  • Request Metrics: Success rate, latency, throughput
  • Cost Tracking: Per-request, per-provider, per-tenant costs
  • Provider Performance: Which providers are fastest, cheapest, most reliable
  • Routing Decisions: See why each request went to a specific provider
  • Distributed Traces: Follow requests across the entire system
  • Error Tracking: Detailed error rates and types

Metrics Endpoint

All metrics are exposed in Prometheus format at /metrics (the endpoint requires an admin or metrics-specific token; see the API Reference below):

curl -H "Authorization: Bearer $ADMIN_TOKEN" https://api.igrisinertial.com/metrics

This endpoint is compatible with:

  • Prometheus
  • Datadog
  • Grafana Cloud
  • New Relic
  • Any Prometheus-compatible monitoring system

Key Metrics Available

Request Metrics

Track overall request health:

  • Total requests: Count of all requests
  • Success rate: Percentage of successful requests
  • Request duration: Latency histograms (p50, p95, p99)
  • Requests per second: Current throughput

Inference Metrics

LLM-specific metrics:

  • Requests by provider: OpenAI, Anthropic, Google, etc.
  • Requests by model: GPT-4, Claude 3, Gemini, etc.
  • Token usage: Prompt tokens, completion tokens
  • Cost per request: Real-time cost in USD

Routing Metrics

Understand routing decisions:

  • Thompson Sampling scores: Which provider is winning
  • Semantic routing classifications: Creative, analytical, coding, etc.
  • Speculative execution: Race winners and latency improvements
  • Circuit breaker status: Which providers are healthy/unhealthy
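
The metric names below are the ones referenced in the Example Queries section; this is a hedged illustration of what the /metrics exposition output can look like, and the exact labels and series emitted may differ:

# HELP http_requests_total Total HTTP requests handled
# TYPE http_requests_total counter
http_requests_total{provider="anthropic",model="claude-3-sonnet",status="200"} 18423
# HELP http_request_duration_seconds Request latency histogram
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.25"} 15210
http_request_duration_seconds_bucket{le="0.5"} 17980
http_request_duration_seconds_bucket{le="+Inf"} 18423
# HELP cost_usd_total Accumulated spend in USD
# TYPE cost_usd_total counter
cost_usd_total{provider="anthropic",model="claude-3-sonnet"} 222.22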

Distributed Tracing

Every request gets a unique trace ID for end-to-end visibility:

{
  "id": "chatcmpl-abc123",
  "metadata": {
    "trace_id": "550e8400-e29b-41d4-a716-446655440000",
    "provider": "anthropic",
    "latency_ms": 187
  }
}
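
A minimal Python sketch of capturing that trace ID on the client side so failed requests can be looked up later; the endpoint path, auth scheme, and payload shape are assumptions based on the response example above:

import logging
import os

import requests

log = logging.getLogger("igris-client")

def chat(prompt: str) -> dict:
    # Call the gateway (OpenAI-compatible chat completions endpoint assumed).
    resp = requests.post(
        "https://api.igrisinertial.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['IGRIS_API_KEY']}"},
        json={"model": "gpt-4", "messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    body = resp.json()
    meta = body.get("metadata", {})
    # Log the trace ID next to the outcome so it can be pasted into Jaeger, Datadog, etc.
    log.info(
        "igris request trace_id=%s provider=%s latency_ms=%s http_status=%s",
        meta.get("trace_id"), meta.get("provider"), meta.get("latency_ms"), resp.status_code,
    )
    return body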

Trace Spans

Each request creates spans for:

  1. HTTP Request (parent span)
  2. Authentication (validating API key)
  3. Rate Limiting (checking tenant limits)
  4. Routing Decision (Thompson Sampling or semantic routing)
  5. Provider Request (actual LLM API call)
  6. Cost Tracking (recording usage and cost)

Viewing Traces

Use trace IDs to search in your tracing system:

  • Jaeger
  • Zipkin
  • Honeycomb
  • Datadog APM
  • New Relic

Example trace timeline:

HTTP Request (total: 234ms)
├─ Auth (2ms)
├─ Rate Limit (1ms)
├─ Routing Decision (5ms)
│  └─ Thompson Sampling (4ms)
├─ Provider Request (187ms)
│  └─ Anthropic API (185ms)
└─ Cost Tracking (3ms)

Cost Tracking

Per-Request Cost

Every response includes cost breakdown:

{
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 42,
    "total_tokens": 57
  },
  "metadata": {
    "cost_usd": 0.00171,
    "provider": "openai",
    "model": "gpt-4",
    "cost_breakdown": {
      "prompt_cost": 0.00045,
      "completion_cost": 0.00126
    }
  }
}
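
A short Python sketch of client-side spend tracking that simply sums the reported cost_usd across responses; it assumes responses shaped like the example above:

class CostMeter:
    """Accumulates the per-request cost reported by the gateway."""

    def __init__(self) -> None:
        self.total_usd = 0.0
        self.requests = 0

    def record(self, response: dict) -> None:
        self.total_usd += response.get("metadata", {}).get("cost_usd", 0.0)
        self.requests += 1

# Usage: call record() after each gateway response, e.g.
#   meter = CostMeter()
#   meter.record(chat_response)
#   print(f"session spend: ${meter.total_usd:.4f} over {meter.requests} requests")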

Aggregate Cost Metrics

Track spending over time:

# Get cost metrics by provider
curl https://api.igrisinertial.com/v1/metrics/cost?group_by=provider

# Get cost metrics by tenant
curl https://api.igrisinertial.com/v1/metrics/cost?group_by=tenant

# Get cost metrics by model
curl https://api.igrisinertial.com/v1/metrics/cost?group_by=model

Response:

{
  "period": "today",
  "total_cost_usd": 456.78,
  "by_provider": {
    "openai": 234.56,
    "anthropic": 222.22
  },
  "by_model": {
    "gpt-4": 178.90,
    "gpt-3.5-turbo": 55.66,
    "claude-3-sonnet": 222.22
  }
}

Dashboards

Cloud-Hosted Dashboards

If you're using the cloud-hosted Igris Overture, dashboards are built in:

  1. Overview Dashboard

    • Requests per second
    • Success rate
    • Average latency
    • Total cost today
  2. Provider Performance

    • Latency by provider
    • Success rate by provider
    • Cost efficiency comparison
  3. Cost Analytics

    • Spend over time
    • Top spending tenants
    • Cost per provider/model
    • Budget alerts
  4. Routing Insights

    • Thompson Sampling scores
    • Provider selection distribution
    • Speculative execution wins

View Dashboard →

Self-Hosted Monitoring

For self-hosted deployments, export metrics to your existing stack:

Prometheus + Grafana:

# prometheus.yml
scrape_configs:
  - job_name: 'igris-overture'
    static_configs:
      - targets: ['api.igris.internal:8081']
    metrics_path: '/metrics'
    scrape_interval: 15s

Datadog:

# conf.d/prometheus.d/conf.yaml (Datadog Agent Prometheus check)
instances:
  - prometheus_url: http://api.igris.internal:8081/metrics
    namespace: igris
    metrics:
      - '*'

Grafana Cloud:

Use the Prometheus remote write endpoint to send metrics directly to Grafana Cloud.
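
A hedged prometheus.yml sketch of that remote write setup; the push URL and credentials are placeholders you'd copy from your Grafana Cloud stack settings:

# prometheus.yml (additions)
remote_write:
  - url: https://<your-stack>.grafana.net/api/prom/push
    basic_auth:
      username: "<grafana-cloud-metrics-instance-id>"
      password: "<grafana-cloud-api-token>"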


Alerts

Set up alerts for critical events:

Budget Alerts

Get notified when spending approaches limits:

POST /v1/tenants/{tenant_id}/budget
{
  "monthly_budget_usd": 5000.00,
  "alert_threshold": 0.90,
  "notification_channels": ["email", "webhook"]
}
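
The same call as a curl command; the Authorization header and token variable are assumptions, and tenant_abc123 is a placeholder:

curl -X POST https://api.igrisinertial.com/v1/tenants/tenant_abc123/budget \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"monthly_budget_usd": 5000.00, "alert_threshold": 0.90, "notification_channels": ["email", "webhook"]}'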

Performance Alerts

Monitor latency and error rates:

  • High Latency: Alert when p95 latency > 2000ms
  • High Error Rate: Alert when error rate > 5%
  • Provider Failure: Alert when circuit breaker opens
  • Rate Limit: Alert when approaching rate limits
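
These thresholds map directly onto Prometheus alerting rules. A hedged sketch using the metric names from the Example Queries section (durations are in seconds, so 2000ms becomes 2):

# alerts.yml
groups:
  - name: igris-overture
    rules:
      - alert: HighP95Latency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "p95 latency above 2s"
      - alert: HighErrorRate
        expr: |
          1 - (
            sum(rate(http_requests_total{status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total[5m]))
          ) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "error rate above 5%"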

Custom Webhooks

Send alerts to your own systems:

{
  "event": "high_latency_alert",
  "tenant_id": "tenant_abc123",
  "metric": "p95_latency_ms",
  "current_value": 2345,
  "threshold": 2000,
  "provider": "openai",
  "timestamp": "2025-11-30T12:00:00Z"
}
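
A minimal Python (Flask) sketch of a receiver for that payload; the route path and the handling logic are assumptions, and Flask is just one convenient choice:

from flask import Flask, request

app = Flask(__name__)

@app.route("/igris/alerts", methods=["POST"])
def handle_alert():
    event = request.get_json(force=True)
    if event.get("event") == "high_latency_alert":
        # Forward to your paging/chat system here; printing keeps the sketch small.
        print(f"[ALERT] {event['provider']} p95={event['current_value']}ms "
              f"(threshold {event['threshold']}ms) tenant={event['tenant_id']}")
    # Acknowledge quickly; do any heavy processing asynchronously.
    return "", 204

if __name__ == "__main__":
    app.run(port=8000)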

Log Integration

Structured Logging

All logs are JSON-formatted for easy parsing:

{
  "timestamp": "2025-11-30T12:00:00Z",
  "level": "info",
  "message": "inference_request_completed",
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "tenant_id": "tenant_abc123",
  "provider": "anthropic",
  "model": "claude-3-sonnet",
  "latency_ms": 187,
  "cost_usd": 0.00034,
  "status": "success"
}
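
Because each log line is a standalone JSON object, standard tools can slice the stream. For example, a jq one-liner (the log file name is illustrative) that pulls out failed requests:

jq -c 'select(.status != "success")' igris.log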

Log Aggregation

Compatible with standard log aggregation tools:

  • ELK Stack (Elasticsearch, Logstash, Kibana)
  • Loki + Grafana
  • Splunk
  • Datadog Logs
  • CloudWatch Logs

Example Queries

Prometheus Queries

Requests per second:

rate(http_requests_total[5m])

Success rate:

sum(rate(http_requests_total{status=~"2.."}[5m]))
/
sum(rate(http_requests_total[5m]))

P95 latency:

histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

Cost per hour:

sum(increase(cost_usd_total[1h]))

Best Practices

Monitoring

  1. Set up dashboards early - Don't wait until you have issues
  2. Monitor all three: latency, error rate, throughput
  3. Track costs daily - Catch unexpected spending quickly
  4. Use trace IDs to debug specific failed requests

Alerting

  1. Start with budget alerts - Most critical for cost control
  2. Alert on trends, not just thresholds (e.g., latency increasing)
  3. Use different channels for different severity (email vs. PagerDuty)
  4. Test your alerts before going to production

Optimization

  1. Review provider performance weekly - Thompson Sampling adapts automatically, but a manual review catches longer-term shifts
  2. Check for cost anomalies - Unusual spikes might indicate issues
  3. Monitor circuit breaker state - Frequent opens = provider reliability issues
  4. Track speculative execution waste - Should be <30%

API Reference

GET /metrics

Prometheus-formatted metrics endpoint.

Response format: Prometheus exposition format

Access: Requires admin token or metrics-specific token

GET /v1/metrics/cost

Aggregate cost metrics.

Query parameters:

  • period: today, week, month
  • group_by: provider, model, tenant
  • tenant_id: Filter by tenant (admin only)
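
For example, today's spend grouped by provider:

curl "https://api.igrisinertial.com/v1/metrics/cost?period=today&group_by=provider"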

Response:

{
  "period": "today",
  "total_cost_usd": 456.78,
  "breakdown": {...}
}

GET /v1/metrics/latency

Latency percentiles.

Query parameters:

  • period: 5m, 1h, 1d
  • provider: Filter by provider
  • model: Filter by model

Response:

{
  "period": "1h",
  "p50_ms": 234,
  "p95_ms": 456,
  "p99_ms": 678
}
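
For example, to pull the last hour's percentiles for a single provider:

curl "https://api.igrisinertial.com/v1/metrics/latency?period=1h&provider=anthropic"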

Next Steps