Introduction
TL;DR: Igris Runtime is a licensed, governed execution engine for AI workloads. It runs workloads with secure defaults, enforced resource limits, deterministic execution envelopes, and telemetry-backed observability, keeping execution safe and auditable whether you deploy in the cloud, at the edge, or in air-gapped environments.
Licensing Model: Runtime is licensed software with recurring maintenance fees, not a usage-metered SaaS. Pricing reflects deployment scope, policy enforcement capabilities, telemetry infrastructure, and ongoing support—not request volume.
The Problem
Your application depends on cloud AI providers. Then the network goes down. Or you're deploying to an edge location with spotty connectivity. Or you need to run in an air-gapped environment for security compliance.
The result: Your AI-powered features stop working exactly when users need them most.
Traditional solutions force you to choose: cloud-only (unreliable in degraded networks) or local-only (limited model quality). Igris Runtime gives you both.
What Igris Runtime Does
Runtime executes AI workloads with governance, safety, and observability. Every execution happens within a deterministic envelope with resource safety limits, telemetry capture, and signed execution contracts.
```bash
# Make a request - works with or without internet
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'

# With internet: Routes to GPT-4
# Without internet: Automatically falls back to local Phi-3 model
# User gets a response either way
```
What You Get
Governed Execution:
- Secure-by-default execution environment
- Resource safety limits (tool calls, execution time, output size)
- Policy enforcement hooks
- Signed execution envelopes with cryptographic verification
Telemetry & Observability:
- Real-time execution telemetry
- Configurable retention periods (short/extended/90-day)
- Execution graph capture (DAG per request)
- Prometheus metrics and structured logging
Why Recurring Pricing:
Runtime is licensed software with recurring maintenance fees because deployment value scales with capabilities, not usage:
- Deployment Scope: Single-node → Multi-runtime → Fleet-wide coordination and management
- Policy Enforcement Engine: Continuous evolution of governance rules and safety limits
- Telemetry Infrastructure: 7-day → 30-day → 90-day retention with storage, streaming, and export
- Security & Updates: Priority patches, signed releases, and vulnerability remediation
- Compliance Support: Audit-ready execution traces, tamper-evident logs, retention guarantees
Runtime does not meter requests or rate-limit execution. Pricing reflects the ongoing infrastructure required to maintain governance, security, and auditability at your deployment scale.
Offline-First Architecture:
- Local LLM inference using on-device models (Phi-3, Mistral, Llama, etc.)
- Automatic failover when cloud providers are unreachable
- Works in air-gapped environments with zero cloud dependencies
Advanced AI Agents:
- Reflection agents that critique and improve their own responses
- Planning agents with chain-of-thought reasoning
- Tool-calling agents that can execute HTTP requests, shell commands, and file operations
- Multi-agent swarms for collaborative problem-solving
MCP Swarm Mode:
- Peer-to-peer context sharing across multiple instances
- Auto-discovery via mDNS (zero configuration)
- Encrypted context storage with AES-256-GCM
- Any instance can pick up where another left off
On-Device Training:
- QLoRA fine-tuning directly on the device
- Automatic training after N requests
- Domain specialization without sending data off-device
- Hot-swappable LoRA adapters
Robotics & Industrial Automation:
- Control robots and autonomous systems with AI
- Connect to ROS2 ecosystems for navigation and coordination
- Deploy AI in safety-critical environments with compliance hooks
- Manage fleets of edge devices from a central control plane
Advanced AI Capabilities:
- Train models collaboratively across devices without sharing data
- Automatically switch between models based on the task
- Add human approval workflows for high-stakes decisions
- Test your AI systems with chaos engineering before deployment
Quick Example
Before: Cloud-Only (Fails Offline)
```python
import openai

def handle_chat():
    # Requires internet connection
    try:
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": "Hello"}]
        )
        return response
    except Exception:
        # Application breaks when network is unavailable
        return error_response("AI unavailable")
```
After: Cloud + Local Fallback
```python
from openai import OpenAI

# Point to Igris Runtime
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-required-for-local"  # Auth optional
)

# Works online AND offline
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# Runtime automatically chooses:
# - GPT-4 via cloud (if internet available)
# - Local Phi-3 model (if offline)
# User always gets a response
```
Core Features
1. Local LLM Fallback
When all cloud providers fail or are unreachable, Runtime automatically switches to an on-device model.
How it works:
- Download a GGUF model (Phi-3, Mistral, Llama 3, etc.)
- Configure the local fallback path
- Runtime tries cloud first, falls back to local if needed
Result: Your AI features keep working even without internet.
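For intuition, here is a minimal sketch of the cloud-first, local-fallback decision in Python. The function names and the stubbed local call are illustrative assumptions, not Runtime's internal API:

```python
import requests

def run_local_model(messages):
    # Placeholder for on-device inference against the configured GGUF model.
    return {"choices": [{"message": {"role": "assistant", "content": "(local answer)"}}]}

def complete(messages, timeout=5.0):
    """Cloud first, local fallback - a conceptual sketch, not Runtime's routing code."""
    try:
        # Attempt the configured cloud provider first (auth omitted for brevity).
        r = requests.post(
            "https://api.openai.com/v1/chat/completions",
            json={"model": "gpt-4", "messages": messages},
            timeout=timeout,
        )
        r.raise_for_status()
        return r.json()
    except requests.RequestException:
        # Cloud provider unreachable or failing: serve the request on-device instead.
        return run_local_model(messages)
```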
2. Reflection Agents
Self-improving AI that critiques and refines its own responses.
Generate → Critique → Regenerate loop:
- Model generates an initial response
- Reflection agent scores quality (0.0-1.0)
- If score is below threshold, model regenerates with critique feedback
- Repeats until quality threshold is met or max iterations reached
Result: Higher quality responses with minimal human intervention.
Usage:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "phi3", "mode": "reflection", "messages": [...]}'
```
3. Planning Agents
Chain-of-thought reasoning with Plan → Act → Observe → Reflect loops.
How it works:
- Agent breaks down complex tasks into steps
- Executes each step sequentially
- Observes results and reflects on progress
- Adjusts plan based on observations
Result: Better handling of multi-step tasks and complex reasoning.
Usage:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "phi3", "mode": "planning", "messages": [...]}'
```
4. Tool Use
Let your local LLM call external tools: HTTP APIs, shell commands, file operations.
Available tools:
- HTTP: Make GET/POST requests to external APIs
- Shell: Execute shell commands (sandboxed)
- Filesystem: Read, write, and list files
Security:
- Whitelisting for allowed domains, commands, and paths
- Timeout and concurrency limits
- Full audit trail
Usage:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "phi3", "mode": "tools", "messages": [...]}'
```
5. Multi-Agent Swarms
Multiple specialized agents collaborate on complex tasks.
How it works:
- Dynamic role assignment (researcher, engineer, critic, synthesizer)
- Each agent contributes their perspective
- Synthesizer combines inputs into final answer
- Consensus voting for high-stakes decisions
Result: Better quality on complex reasoning tasks.
Usage:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "phi3", "mode": "swarm", "messages": [...]}'
```
6. MCP Swarm Mode
Peer-to-peer context sharing across Runtime instances.
How it works:
- Instances auto-discover each other via mDNS
- Conversation context syncs in real-time
- All context encrypted at rest (AES-256-GCM)
- Any instance can answer questions using shared context
Use cases:
- Edge AI with multiple nodes
- High-availability deployments
- Distributed AI workloads
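For a concrete sense of what "encrypted at rest with AES-256-GCM" means, here is a short example using Python's cryptography package. It demonstrates the primitive only; Runtime's key management and storage format are not shown here:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # 256-bit key; Runtime derives its own keys
aesgcm = AESGCM(key)
nonce = os.urandom(12)                     # 96-bit nonce, unique per message

context = b'{"conversation_id": "abc", "messages": []}'
ciphertext = aesgcm.encrypt(nonce, context, None)
assert aesgcm.decrypt(nonce, ciphertext, None) == context
```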
7. On-Device Training
Fine-tune your local model based on usage patterns.
How it works:
- Runtime logs prompts and responses locally
- After N requests (default: 100), training triggers automatically
- Creates a small LoRA adapter (< 64 MB)
- New adapter loads automatically, improving future responses
Result: Model becomes specialized to your domain without sending data off-device.
Security:
- All training data stays on device
- Adapters encrypted with device-specific keys
- No network calls during training
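The trigger logic described above amounts to a simple counter: log samples locally, train after N requests, then hot-swap the adapter. The sketch below is conceptual; the class and callback names are illustrative, not Runtime's implementation:

```python
TRAIN_EVERY_N = 100  # default trigger described above

class TrainingTrigger:
    def __init__(self, train_adapter, load_adapter):
        self.samples = []            # prompt/response pairs logged on-device
        self.train = train_adapter   # placeholder for the QLoRA training job
        self.load = load_adapter     # placeholder for hot-swapping the LoRA adapter

    def record(self, prompt, response):
        self.samples.append((prompt, response))
        if len(self.samples) >= TRAIN_EVERY_N:
            adapter_path = self.train(self.samples)  # runs entirely on-device
            self.load(adapter_path)                  # new adapter serves future requests
            self.samples.clear()
```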
Production Features
Deployment Flexibility
Run anywhere:
- Bare metal: Single binary deployment
- Docker: Official images available
- Kubernetes: Helm charts included
- Edge devices: Raspberry Pi, Jetson Nano
- systemd: Production-ready service files
Platform Support
- x86_64: Intel/AMD processors
- ARM64: Raspberry Pi 4/5, Jetson, Apple Silicon
- Linux: Ubuntu, Debian, Alpine (musl)
- macOS: Intel and Apple Silicon
Security
- TLS support: Post-quantum crypto with AWS LC
- API authentication: Optional API key validation
- Rate limiting: Token bucket per-provider
- Encrypted storage: AES-256-GCM for sensitive data
Observability
- Prometheus metrics: /metrics endpoint
- Health checks: /v1/health endpoint
- Swagger UI: Interactive API documentation
- Structured logging: JSON logs for production
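A quick way to exercise these endpoints from Python (the paths match the list above; the port assumes the default localhost:8080 used elsewhere on this page):

```python
import requests

BASE = "http://localhost:8080"

health = requests.get(f"{BASE}/v1/health", timeout=5)
print("health:", health.status_code, health.text)

metrics = requests.get(f"{BASE}/metrics", timeout=5)
# Prometheus exposition format: one metric per line
print("\n".join(metrics.text.splitlines()[:5]))
```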
Getting Started
3-Step Quick Start
1. Download a local model

```bash
cd igris-runtime
./download-model.sh
```

2. Configure fallback

```
{
  local_fallback: {
    enabled: true,
    model_path: "models/phi-3-mini-4k-instruct-q4.gguf"
  }
}
```

3. Run the server

```bash
cargo run --release
```
That's it! Runtime is now serving requests with automatic local fallback.
Next Steps
- Quick Start Guide - Complete setup in 10 minutes
- Local Models - Download and configure models
- Deployment - Docker, K8s, bare metal
- API Reference - Complete endpoint docs
Support
- Documentation: You're reading it!
- GitHub: github.com/Igris-inertial/system
- Issues: Report bugs and request features