Introduction

TL;DR: Igris Runtime is a licensed, governed execution engine for AI workloads. It runs workloads with secure defaults, enforced resource limits, deterministic execution envelopes, and telemetry-backed observability, keeping execution safe and auditable whether you deploy in the cloud, at the edge, or in an air-gapped environment.

Licensing Model: Runtime is licensed software with recurring maintenance fees, not a usage-metered SaaS. Pricing reflects deployment scope, policy enforcement capabilities, telemetry infrastructure, and ongoing support—not request volume.


The Problem

Your application depends on cloud AI providers. Then the network goes down. Or you're deploying to an edge location with spotty connectivity. Or you need to run in an air-gapped environment for security compliance.

The result: Your AI-powered features stop working exactly when users need them most.

Traditional solutions force you to choose: cloud-only (unreliable in degraded networks) or local-only (limited model quality). Igris Runtime gives you both.


What Igris Runtime Does

Runtime executes AI workloads with governance, safety, and observability. Every execution happens within a deterministic envelope with resource safety limits, telemetry capture, and signed execution contracts.

# Make a request - works with or without internet
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'

# With internet: Routes to GPT-4
# Without internet: Automatically falls back to local Phi-3 model
# User gets a response either way

What You Get

Governed Execution:

  • Secure-by-default execution environment
  • Resource safety limits (tool calls, execution time, output size)
  • Policy enforcement hooks
  • Signed execution envelopes with cryptographic verification

Telemetry & Observability:

  • Real-time execution telemetry
  • Configurable retention periods (7-day/30-day/90-day)
  • Execution graph capture (DAG per request)
  • Prometheus metrics and structured logging

Why Recurring Pricing:

Runtime is licensed software with recurring maintenance fees because deployment value scales with capabilities, not usage:

  • Deployment Scope: Single-node → Multi-runtime → Fleet-wide coordination and management
  • Policy Enforcement Engine: Continuous evolution of governance rules and safety limits
  • Telemetry Infrastructure: 7-day → 30-day → 90-day retention with storage, streaming, and export
  • Security & Updates: Priority patches, signed releases, and vulnerability remediation
  • Compliance Support: Audit-ready execution traces, tamper-evident logs, retention guarantees

Runtime does not meter requests or rate-limit execution. Pricing reflects the ongoing infrastructure required to maintain governance, security, and auditability at your deployment scale.

Offline-First Architecture:

  • Local LLM inference using on-device models (Phi-3, Mistral, Llama, etc.)
  • Automatic failover when cloud providers are unreachable
  • Works in air-gapped environments with zero cloud dependencies

Advanced AI Agents:

  • Reflection agents that critique and improve their own responses
  • Planning agents with chain-of-thought reasoning
  • Tool-calling agents that can execute HTTP requests, shell commands, and file operations
  • Multi-agent swarms for collaborative problem-solving

MCP Swarm Mode:

  • Peer-to-peer context sharing across multiple instances
  • Auto-discovery via mDNS (zero configuration)
  • Encrypted context storage with AES-256-GCM
  • Any instance can pick up where another left off

On-Device Training:

  • QLoRA fine-tuning directly on the device
  • Automatic training after N requests
  • Domain specialization without sending data off-device
  • Hot-swappable LoRA adapters

Robotics & Industrial Automation:

  • Control robots and autonomous systems with AI
  • Connect to ROS2 ecosystems for navigation and coordination
  • Deploy AI in safety-critical environments with compliance hooks
  • Manage fleets of edge devices from a central control plane

Advanced AI Capabilities:

  • Train models collaboratively across devices without sharing data
  • Automatically switch between models based on the task
  • Add human approval workflows for high-stakes decisions
  • Test your AI systems with chaos engineering before deployment

Quick Example

Before: Cloud-Only (Fails Offline)

import openai

def ask_cloud(prompt):
    # Requires internet connection
    try:
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content
    except Exception:
        # Application breaks when network is unavailable
        return "AI unavailable"

After: Cloud + Local Fallback

from openai import OpenAI

# Point to Igris Runtime
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-required-for-local"  # Auth optional
)

# Works online AND offline
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# Runtime automatically chooses:
# - GPT-4 via cloud (if internet available)
# - Local Phi-3 model (if offline)
# User always gets a response

Core Features

1. Local LLM Fallback

When all cloud providers fail or are unreachable, Runtime automatically switches to an on-device model.

How it works:

  • Download a GGUF model (Phi-3, Mistral, Llama 3, etc.)
  • Configure the local fallback path
  • Runtime tries cloud first, falls back to local if needed

Result: Your AI features keep working even without internet.

2. Reflection Agents

Self-improving AI that critiques and refines its own responses.

Generate → Critique → Regenerate loop:

  • Model generates an initial response
  • Reflection agent scores quality (0.0-1.0)
  • If score is below threshold, model regenerates with critique feedback
  • Repeats until quality threshold is met or max iterations reached

Result: Higher quality responses with minimal human intervention.

Usage:

curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "phi3", "mode": "reflection", "messages": [...]}'
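
The same mode switch works from any OpenAI-compatible client. Below is a minimal Python sketch, assuming the openai package and the local Runtime instance on port 8080 used throughout this page; extra_body is the SDK's standard way to attach extra JSON fields (here, mode) that are not part of the OpenAI schema.

from openai import OpenAI

# Point the standard OpenAI client at the local Runtime instance
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-required-for-local")

# extra_body forwards Runtime's "mode" field alongside the normal chat payload;
# swap in "planning", "tools", or "swarm" for the other agent modes below
response = client.chat.completions.create(
    model="phi3",
    messages=[{"role": "user", "content": "Explain the trade-offs of local inference."}],
    extra_body={"mode": "reflection"},
)

print(response.choices[0].message.content)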

3. Planning Agents

Chain-of-thought reasoning with Plan → Act → Observe → Reflect loops.

How it works:

  • Agent breaks down complex tasks into steps
  • Executes each step sequentially
  • Observes results and reflects on progress
  • Adjusts plan based on observations

Result: Better handling of multi-step tasks and complex reasoning.

Usage:

curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "phi3", "mode": "planning", "messages": [...]}'

4. Tool Use

Let your local LLM call external tools: HTTP APIs, shell commands, file operations.

Available tools:

  • HTTP: Make GET/POST requests to external APIs
  • Shell: Execute shell commands (sandboxed)
  • Filesystem: Read, write, and list files

Security:

  • Whitelisting for allowed domains, commands, and paths
  • Timeout and concurrency limits
  • Full audit trail

Usage:

curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "phi3", "mode": "tools", "messages": [...]}'

5. Multi-Agent Swarms

Multiple specialized agents collaborate on complex tasks.

How it works:

  • Dynamic role assignment (researcher, engineer, critic, synthesizer)
  • Each agent contributes their perspective
  • Synthesizer combines inputs into final answer
  • Consensus voting for high-stakes decisions

Result: Better quality on complex reasoning tasks.

Usage:

curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "phi3", "mode": "swarm", "messages": [...]}'

6. MCP Swarm Mode

Peer-to-peer context sharing across Runtime instances.

How it works:

  • Instances auto-discover each other via mDNS
  • Conversation context syncs in real-time
  • All context encrypted at rest (AES-256-GCM)
  • Any instance can answer questions using shared context

Use cases:

  • Edge AI with multiple nodes
  • High-availability deployments
  • Distributed AI workloads
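
For the high-availability case, a client only needs to retry against another peer, and the shared context described above means whichever node answers sees the same state. A minimal failover sketch, assuming the Python openai package; the peer addresses are hypothetical placeholders for two Runtime instances on the same network.

from openai import OpenAI, APIConnectionError

# Hypothetical addresses for two Runtime peers on the same LAN;
# swarm mode keeps their encrypted context stores in sync.
PEERS = ["http://10.0.0.11:8080/v1", "http://10.0.0.12:8080/v1"]

def ask(messages):
    for base_url in PEERS:
        client = OpenAI(base_url=base_url, api_key="not-required-for-local")
        try:
            return client.chat.completions.create(model="phi3", messages=messages)
        except APIConnectionError:
            continue  # peer unreachable, try the next one
    raise RuntimeError("no Runtime peer reachable")

reply = ask([{"role": "user", "content": "Summarize the current shared context."}])
print(reply.choices[0].message.content)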

7. On-Device Training

Fine-tune your local model based on usage patterns.

How it works:

  • Runtime logs prompts and responses locally
  • After N requests (default: 100), training triggers automatically
  • Creates a small LoRA adapter (< 64 MB)
  • New adapter loads automatically, improving future responses

Result: Model becomes specialized to your domain without sending data off-device.

Security:

  • All training data stays on device
  • Adapters encrypted with device-specific keys
  • No network calls during training

Production Features

Deployment Flexibility

Run anywhere:

  • Bare metal: Single binary deployment
  • Docker: Official images available
  • Kubernetes: Helm charts included
  • Edge devices: Raspberry Pi, Jetson Nano
  • systemd: Production-ready service files

Platform Support

  • x86_64: Intel/AMD processors
  • ARM64: Raspberry Pi 4/5, Jetson, Apple Silicon
  • Linux: Ubuntu, Debian, Alpine (musl)
  • macOS: Intel and Apple Silicon

Security

  • TLS support: Post-quantum crypto with AWS-LC
  • API authentication: Optional API key validation
  • Rate limiting: Token bucket per-provider
  • Encrypted storage: AES-256-GCM for sensitive data
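
When key validation and TLS are enabled, the only client-side change is the base URL and credential. A sketch under those assumptions; the hostname and the IGRIS_API_KEY environment variable are hypothetical, and the key is presented the same way as with any OpenAI-compatible endpoint.

import os
from openai import OpenAI

# Hypothetical TLS endpoint; the API key is only checked if key
# validation has been enabled on the Runtime instance.
client = OpenAI(
    base_url="https://runtime.example.internal:8080/v1",
    api_key=os.environ["IGRIS_API_KEY"],
)

response = client.chat.completions.create(
    model="phi3",
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)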

Observability

  • Prometheus metrics: /metrics endpoint
  • Health checks: /v1/health endpoint
  • Swagger UI: Interactive API documentation
  • Structured logging: JSON logs for production
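
Both endpoints answer plain HTTP GETs, so they drop straight into existing monitoring. A small sketch using the requests library, assuming the default local port from the earlier examples.

import requests

BASE = "http://localhost:8080"

# Liveness probe: a successful status here means the instance is up
health = requests.get(f"{BASE}/v1/health", timeout=5)
print("healthy:", health.ok)

# Prometheus scrape target: /metrics returns text-format metrics
# that any Prometheus-compatible collector can ingest
metrics = requests.get(f"{BASE}/metrics", timeout=5)
print(metrics.text.splitlines()[:5])  # peek at the first few metric lines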

Getting Started

3-Step Quick Start

  1. Download a local model

    cd igris-runtime
    ./download-model.sh
    
  2. Configure fallback

    {
      local_fallback: {
        enabled: true,
        model_path: "models/phi-3-mini-4k-instruct-q4.gguf"
      }
    }
    
  3. Run the server

    cargo run --release
    

That's it! Runtime is now serving requests with automatic local fallback.
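
To confirm the fallback path end to end, point any OpenAI-compatible client at the local endpoint, exactly as in the Quick Example above:

from openai import OpenAI

# Same client setup as the Quick Example above
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-required-for-local")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(response.choices[0].message.content)  # served by the cloud or the local fallback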
