Introduction
TL;DR: Igris Runtime is a licensed, governed execution engine for AI workloads. It runs workloads with secure defaults, enforced resource limits, deterministic execution envelopes, and telemetry-backed observability, keeping execution safe and auditable whether you deploy in the cloud, at the edge, or in air-gapped environments.
Licensing Model: Runtime is licensed software with recurring maintenance fees, not a usage-metered SaaS. Pricing reflects deployment scope, policy enforcement capabilities, telemetry infrastructure, and ongoing support—not request volume.
The Problem
Your application depends on cloud AI providers. Then the network goes down. Or you're deploying to an edge location with spotty connectivity. Or you need to run in an air-gapped environment for security compliance.
The result: Your AI-powered features stop working exactly when users need them most.
Traditional solutions force you to choose: cloud-only (unreliable in degraded networks) or local-only (limited model quality). Igris Runtime gives you both.
What Igris Runtime Does
Runtime executes AI workloads with governance, safety, and observability. Every execution happens within a deterministic envelope with resource safety limits, telemetry capture, and signed execution contracts.
```bash
# Make a request - works with or without internet
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'

# With internet: Routes to GPT-4
# Without internet: Automatically falls back to local Phi-3 model
# User gets a response either way
```
What You Get
Governed Execution:
- Secure-by-default execution environment
- Resource safety limits (tool calls, execution time, output size)
- Policy enforcement hooks
- Signed execution envelopes with cryptographic verification
Telemetry & Observability:
- Real-time execution telemetry
- Configurable retention periods (short/extended/90-day)
- Execution graph capture (DAG per request)
- Prometheus metrics and structured logging
Why Recurring Pricing:
Runtime is licensed software with recurring maintenance fees because deployment value scales with capabilities, not usage:
- Deployment Scope: Single-node → Multi-runtime → Fleet-wide coordination and management
- Policy Enforcement Engine: Continuous evolution of governance rules and safety limits
- Telemetry Infrastructure: 7-day → 30-day → 90-day retention with storage, streaming, and export
- Security & Updates: Priority patches, signed releases, and vulnerability remediation
- Compliance Support: Audit-ready execution traces, tamper-evident logs, retention guarantees
Runtime does not meter requests or rate-limit execution. Pricing reflects the ongoing infrastructure required to maintain governance, security, and auditability at your deployment scale.
Offline-First Architecture:
- Local LLM inference using on-device models (Phi-3, Mistral, Llama, etc.)
- Automatic failover when cloud providers are unreachable
- Works in air-gapped environments with zero cloud dependencies
Advanced AI Agents:
- Reflection agents that critique and improve their own responses
- Planning agents with chain-of-thought reasoning
- Tool-calling agents that can execute HTTP requests, shell commands, and file operations
- Multi-agent swarms for collaborative problem-solving
MCP Swarm Mode:
- Peer-to-peer context sharing across multiple instances
- Auto-discovery via mDNS (zero configuration)
- Encrypted context storage with AES-256-GCM
- Any instance can pick up where another left off
On-Device Training:
- QLoRA fine-tuning directly on the device
- Automatic training after N requests
- Domain specialization without sending data off-device
- Hot-swappable LoRA adapters
Robotics & Industrial Automation:
- Control robots and autonomous systems with AI
- Connect to ROS2 ecosystems for navigation and coordination
- Deploy AI in safety-critical environments with compliance hooks
- Manage fleets of edge devices from a central control plane
Advanced AI Capabilities:
- Train models collaboratively across devices without sharing data
- Automatically switch between models based on the task
- Add human approval workflows for high-stakes decisions
- Test your AI systems with chaos engineering before deployment
Quick Example
Before: Cloud-Only (Fails Offline)
```python
import openai

def handle_chat():
    # Requires internet connection
    try:
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": "Hello"}]
        )
        return response
    except Exception:
        # Application breaks when network is unavailable
        return error_response("AI unavailable")
```
After: Cloud + Local Fallback
```python
from openai import OpenAI

# Point to Igris Runtime
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-required-for-local"  # Auth optional
)

# Works online AND offline
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# Runtime automatically chooses:
# - GPT-4 via cloud (if internet available)
# - Local Phi-3 model (if offline)
# User always gets a response
```
Core Features
1. Local LLM Fallback
When all cloud providers fail or are unreachable, Runtime automatically switches to an on-device model.
How it works:
- Download a GGUF model (Phi-3, Mistral, Llama 3, etc.)
- Configure the local fallback path
- Runtime tries cloud first, falls back to local if needed
Result: Your AI features keep working even without internet.
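For intuition, here is a minimal sketch of the cloud-first, local-fallback decision in Python. The function names and the stubbed local call are illustrative assumptions, not Runtime's internal API:

```python
import requests

def run_local_model(messages):
    # Placeholder for on-device inference against the configured GGUF model.
    return {"choices": [{"message": {"role": "assistant", "content": "(local answer)"}}]}

def complete(messages, timeout=5.0):
    """Cloud first, local fallback - a conceptual sketch, not Runtime's routing code."""
    try:
        # Attempt the configured cloud provider first (auth omitted for brevity).
        r = requests.post(
            "https://api.openai.com/v1/chat/completions",
            json={"model": "gpt-4", "messages": messages},
            timeout=timeout,
        )
        r.raise_for_status()
        return r.json()
    except requests.RequestException:
        # Cloud provider unreachable or failing: serve the request on-device instead.
        return run_local_model(messages)
```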
2. Reflection Agents
Self-improving AI that critiques and refines its own responses.
Generate → Critique → Regenerate loop:
- Model generates an initial response
- Reflection agent scores quality (0.0-1.0)
- If score is below threshold, model regenerates with critique feedback
- Repeats until quality threshold is met or max iterations reached
Result: Higher quality responses with minimal human intervention.
Usage:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "phi3", "mode": "reflection", "messages": [...]}'
```
3. Planning Agents
Chain-of-thought reasoning with Plan → Act → Observe → Reflect loops.
How it works:
- Agent breaks down complex tasks into steps
- Executes each step sequentially
- Observes results and reflects on progress
- Adjusts plan based on observations
Result: Better handling of multi-step tasks and complex reasoning.
Usage:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "phi3", "mode": "planning", "messages": [...]}'
```
4. Tool Use
Let your local LLM call external tools: HTTP APIs, shell commands, file operations.
Available tools:
- HTTP: Make GET/POST requests to external APIs
- Shell: Execute shell commands (sandboxed)
- Filesystem: Read, write, and list files
Security:
- Whitelisting for allowed domains, commands, and paths
- Timeout and concurrency limits
- Full audit trail
Usage:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "phi3", "mode": "tools", "messages": [...]}'
```
5. Multi-Agent Swarms
Multiple specialized agents collaborate on complex tasks.
How it works:
- Dynamic role assignment (researcher, engineer, critic, synthesizer)
- Each agent contributes their perspective
- Synthesizer combines inputs into final answer
- Consensus voting for high-stakes decisions
Result: Better quality on complex reasoning tasks.
Usage:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "phi3", "mode": "swarm", "messages": [...]}'
```
6. MCP Swarm Mode
Peer-to-peer context sharing across Runtime instances.
How it works:
- Instances auto-discover each other via mDNS
- Conversation context syncs in real-time
- All context encrypted at rest (AES-256-GCM)
- Any instance can answer questions using shared context
Use cases:
- Edge AI with multiple nodes
- High-availability deployments
- Distributed AI workloads
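For a concrete sense of what "encrypted at rest with AES-256-GCM" means, here is a short example using Python's cryptography package. It demonstrates the primitive only; Runtime's key management and storage format are not shown here:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # 256-bit key; Runtime derives its own keys
aesgcm = AESGCM(key)
nonce = os.urandom(12)                     # 96-bit nonce, unique per message

context = b'{"conversation_id": "abc", "messages": []}'
ciphertext = aesgcm.encrypt(nonce, context, None)
assert aesgcm.decrypt(nonce, ciphertext, None) == context
```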
7. On-Device Training
Fine-tune your local model based on usage patterns.
How it works:
- Runtime logs prompts and responses locally
- After N requests (default: 100), training triggers automatically
- Creates a small LoRA adapter (< 64 MB)
- New adapter loads automatically, improving future responses
Result: Model becomes specialized to your domain without sending data off-device.
Security:
- All training data stays on device
- Adapters encrypted with device-specific keys
- No network calls during training
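The trigger logic described above amounts to a simple counter: log samples locally, train after N requests, then hot-swap the adapter. The sketch below is conceptual; the class and callback names are illustrative, not Runtime's implementation:

```python
TRAIN_EVERY_N = 100  # default trigger described above

class TrainingTrigger:
    def __init__(self, train_adapter, load_adapter):
        self.samples = []            # prompt/response pairs logged on-device
        self.train = train_adapter   # placeholder for the QLoRA training job
        self.load = load_adapter     # placeholder for hot-swapping the LoRA adapter

    def record(self, prompt, response):
        self.samples.append((prompt, response))
        if len(self.samples) >= TRAIN_EVERY_N:
            adapter_path = self.train(self.samples)  # runs entirely on-device
            self.load(adapter_path)                  # new adapter serves future requests
            self.samples.clear()
```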
Production Features
Deployment Flexibility
Run anywhere:
- Bare metal: Single binary deployment
- Docker: Official images available
- Kubernetes: Helm charts included
- Edge devices: Raspberry Pi, Jetson Nano
- systemd: Production-ready service files
Platform Support
- x86_64: Intel/AMD processors
- ARM64: Raspberry Pi 4/5, Jetson, Apple Silicon
- Linux: Ubuntu, Debian, Alpine (musl)
- macOS: Intel and Apple Silicon
Security
- TLS support: Post-quantum crypto with AWS LC
- API authentication: Optional API key validation
- Rate limiting: Token bucket per-provider
- Encrypted storage: AES-256-GCM for sensitive data
Observability
- Prometheus metrics: /metrics endpoint
- Health checks: /v1/health endpoint
- Swagger UI: Interactive API documentation
- Structured logging: JSON logs for production
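A quick way to exercise these endpoints from Python (the paths match the list above; the port assumes the default localhost:8080 used elsewhere on this page):

```python
import requests

BASE = "http://localhost:8080"

health = requests.get(f"{BASE}/v1/health", timeout=5)
print("health:", health.status_code, health.text)

metrics = requests.get(f"{BASE}/metrics", timeout=5)
# Prometheus exposition format: one metric per line
print("\n".join(metrics.text.splitlines()[:5]))
```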
Getting Started
3-Step Quick Start
1. Download a local model

```bash
cd igris-runtime
./download-model.sh
```

2. Configure fallback

```
{
  local_fallback: {
    enabled: true,
    model_path: "models/phi-3-mini-4k-instruct-q4.gguf"
  }
}
```

3. Run the server

```bash
cargo run --release
```
That's it! Runtime is now serving requests with automatic local fallback.
Next Steps
- Quick Start Guide - Complete setup in 10 minutes
- Local Models - Download and configure models
- Deployment - Docker, K8s, bare metal
- API Reference - Complete endpoint docs
Support
- Documentation: You're reading it!
- GitHub: github.com/Igris-inertial/system
- Issues: Report bugs and request features