Architecture
Understand how Igris Runtime works under the hood.
System Overview
Igris Runtime is built with Rust for maximum performance, reliability, and safety. It provides local AI inference with automatic cloud fallback.
┌─────────────────────────────────────────────────────┐
│                    USER REQUEST                     │
└──────────────────────────┬──────────────────────────┘
                           │
                ┌──────────▼──────────┐
                │     HTTP Server     │
                │  (Axum + OpenAPI)   │
                └──────────┬──────────┘
                           │
              ┌────────────▼────────────┐
              │     Request Router      │
              │  - Mode detection       │
              │  - Provider selection   │
              └────────────┬────────────┘
                           │
             ┌─────────────┴─────────────┐
             │                           │
      ┌──────▼──────┐             ┌──────▼──────┐
      │    Cloud    │             │    Local    │
      │  Providers  │             │     LLM     │
      │ (Optional)  │             │ (llama.cpp) │
      └──────┬──────┘             └──────┬──────┘
             │                           │
             └─────────────┬─────────────┘
                           │
                ┌──────────▼──────────┐
                │      Response       │
                └─────────────────────┘
Core Components
1. HTTP Server (Axum)
- OpenAI-compatible API
- Streaming support (SSE)
- CORS and middleware
- Swagger UI documentation
2. Request Router
Determines how to handle each request:
mode: "reflection"→ Reflection Agentmode: "planning"→ Planning Agentmode: "tools"→ Tool Agentmode: "swarm"→ Multi-Agent Swarm- Default → Direct inference with fallback
3. Local LLM Engine
- llama.cpp integration: Native performance
- GGUF model support: Phi-3, Mistral, Llama, etc.
- CPU/GPU: Automatic layer offloading
- KV cache: Prompt caching for speed
- Streaming: Token-by-token generation
4. Cloud Provider Layer (Optional)
- Parallel provider support
- Automatic failover
- Cost tracking
- Circuit breakers
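As a rough sketch of how a per-provider circuit breaker behaves (the struct and field names below are illustrative, not the runtime's actual types): after a run of failures the provider is skipped until a cooldown expires, then probed again.

```rust
use std::time::{Duration, Instant};

/// Illustrative circuit breaker: after `max_failures` consecutive errors a
/// provider is skipped until `cooldown` has elapsed.
struct CircuitBreaker {
    consecutive_failures: u32,
    max_failures: u32,
    opened_at: Option<Instant>,
    cooldown: Duration,
}

impl CircuitBreaker {
    fn allows_request(&self) -> bool {
        match self.opened_at {
            Some(opened) => opened.elapsed() >= self.cooldown, // half-open after cooldown
            None => true,
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    fn record_failure(&mut self) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.max_failures {
            self.opened_at = Some(Instant::now()); // open: stop sending traffic
        }
    }
}
```

Combined with automatic failover, an open breaker simply moves the request on to the next provider or to the local model.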
5. Robotics & Edge Components (Phase 2)
ROS2 Integration (igris-ros2):
- DDS pub/sub messaging
- Nav2 autonomous navigation
- Multi-robot coordination
Sensor & Actuator Tooling (igris-sensors):
- GPIO control (Raspberry Pi)
- Camera interfaces
- LIDAR integration
Safety & Certification (igris-safety):
- Watchdog timers
- Safety modes (Normal/FailSafe/EmergencyStop)
- ISO 26262 / IEC 61508 hooks
- Audit logging
Swarm Coordination (igris-swarm):
- Raft-style leader election
- Distributed task voting
- Conflict resolution
Fleet Management (igris-fleet):
- Centralized control via Overture
- Agent registration
- Real-time telemetry
- Config synchronization
6. Advanced Intelligence Components (Phase 3)
Federated Learning (igris-federated):
- Privacy-preserving collaborative learning
- QLoRA aggregation (4 strategies)
- Differential privacy
- Secure aggregation
Dynamic Model Manager (igris-model-manager):
- Task-based model selection
- Hot-swapping (< 200ms)
- LRU unloading
- Usage analytics
Human-in-the-Loop (igris-hitl):
- Approval workflows
- Auto-approve thresholds
- Context snapshots
- REST API integration
Simulation & Testing (igris-simulation):
- Virtual swarm environments
- Chaos engineering
- Benchmarking suite
- Gazebo/Isaac Sim integration
Request Flow
Standard Request
1. Request arrives at `/v1/chat/completions`
2. Router checks the `mode` parameter
3. If cloud providers are configured:
   a. Try cloud providers (with timeout)
   b. On failure, fall back to local
4. If no cloud is configured (or local-only mode is set):
   a. Use the local model directly
5. Generate response
6. Return to client
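In sketch form, steps 3-4 boil down to a timed cloud attempt with a local fallback. The function names below are placeholders, not the runtime's internal API:

```rust
use std::time::Duration;
use tokio::time::timeout;

// Placeholders for the real provider call and the llama.cpp path.
async fn call_cloud(_prompt: &str) -> Result<String, Box<dyn std::error::Error>> {
    Err("cloud unreachable".into())
}

async fn run_local(prompt: &str) -> String {
    format!("local completion for: {prompt}")
}

// Steps 3-4 of the flow above: cloud with a timeout, then local fallback.
async fn complete(prompt: &str, cloud_enabled: bool) -> String {
    if cloud_enabled {
        if let Ok(Ok(answer)) = timeout(Duration::from_secs(10), call_cloud(prompt)).await {
            return answer;
        }
    }
    run_local(prompt).await
}
```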
With Reflection
1. Request with `mode: "reflection"`
2. Generate initial response
3. Critique response (score 0.0-1.0)
4. If score < threshold:
   a. Regenerate with critique feedback
   b. Repeat steps 3-4
5. Return final improved response
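A minimal sketch of that critique-and-regenerate loop; all names are placeholders, and the real agent's prompts and scoring are not shown here:

```rust
// Illustrative reflection loop (placeholder names, not the real agent API).
async fn reflect(prompt: &str, threshold: f32, max_rounds: u32) -> String {
    let mut answer = generate(prompt).await;
    for _ in 0..max_rounds {
        let (score, critique) = critique_answer(prompt, &answer).await; // 0.0..=1.0
        if score >= threshold {
            break; // good enough, stop iterating
        }
        // Regenerate, feeding the critique back into the prompt.
        answer = generate(&format!("{prompt}\n\nRevise using this critique:\n{critique}")).await;
    }
    answer
}

async fn generate(prompt: &str) -> String {
    format!("draft answer for: {prompt}") // placeholder for model inference
}

async fn critique_answer(_prompt: &str, _answer: &str) -> (f32, String) {
    (0.5, "be more specific".to_string()) // placeholder critic
}
```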
With Tools
1. Request with `mode: "tools"`
2. Model generates tool calls
3. Runtime executes tools (HTTP/Shell/FS)
4. Feed results back to model
5. Model decides: more tools or final answer
6. Return result
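Sketched out, the loop looks roughly like this; the `ModelStep` enum and helper functions are assumptions for illustration, not the runtime's actual tool-calling types:

```rust
// Illustrative tool loop.
enum ModelStep {
    ToolCall { name: String, args: String },
    FinalAnswer(String),
}

async fn run_with_tools(prompt: &str, max_steps: u32) -> String {
    let mut transcript = prompt.to_string();
    for _ in 0..max_steps {
        match next_step(&transcript).await {
            ModelStep::ToolCall { name, args } => {
                // Execute the tool (HTTP / Shell / FS) and append the result
                // so the model can see it on the next turn.
                let result = execute_tool(&name, &args).await;
                transcript.push_str(&format!("\n[tool {name} -> {result}]"));
            }
            ModelStep::FinalAnswer(answer) => return answer,
        }
    }
    "step limit reached".to_string()
}

async fn next_step(_transcript: &str) -> ModelStep {
    ModelStep::FinalAnswer("done".to_string()) // placeholder for model inference
}

async fn execute_tool(_name: &str, _args: &str) -> String {
    "tool output".to_string() // placeholder executor
}
```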
Data Flow
Local Inference Path
Request → Prompt → llama.cpp → Tokens → Response
- No network calls
- All data stays on device
- Works 100% offline
Hybrid Path (Cloud + Local)
Request → Cloud API (timeout) → Local Fallback → Response
- Cloud first for quality
- Local fallback for reliability
- Transparent to user
Storage Architecture
Embedded Database (Redb)
- Purpose: Metadata, metrics, training data
- Type: ACID-compliant embedded DB
- Location: `igris.db`
- No external database required
Model Storage
- Format: GGUF files
- Location: `models/` directory
- Size: 2-8 GB per model
- Loading: Memory-mapped for efficiency
LoRA Adapters
- Format: GGUF adapters
- Location: `lora_adapters/` directory
- Size: 32-64 MB each
- Encryption: AES-256-GCM at rest
MCP Context Storage
- Format: Encrypted key-value store
- Location: `mcp_contexts.db`
- Encryption: AES-256-GCM
- Sync: Real-time via mDNS/multicast
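Both the LoRA adapters and the MCP context store follow the same at-rest pattern: encrypt the payload with AES-256-GCM before it touches disk. A minimal sketch using the `aes-gcm` crate follows; key management is simplified and not representative of the runtime's actual key handling:

```rust
use aes_gcm::{
    aead::{Aead, AeadCore, KeyInit, OsRng},
    Aes256Gcm, Key,
};

/// Encrypts a blob before writing it to disk; returns nonce + ciphertext.
/// Illustrative only: real key derivation/storage is out of scope here.
fn encrypt_at_rest(key_bytes: &[u8; 32], plaintext: &[u8]) -> Result<Vec<u8>, aes_gcm::aead::Error> {
    let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(key_bytes));
    let nonce = Aes256Gcm::generate_nonce(&mut OsRng); // 96-bit random nonce
    let ciphertext = cipher.encrypt(&nonce, plaintext)?;
    // Store the nonce alongside the ciphertext so decryption can recover it.
    let mut out = nonce.as_slice().to_vec();
    out.extend_from_slice(&ciphertext);
    Ok(out)
}
```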
Concurrency Model
Async Runtime (Tokio)
- Multi-threaded: Work-stealing scheduler
- Non-blocking I/O: Efficient resource usage
- Streaming: SSE via async streams
Inference Threading
- Model loading: One-time on startup
- Inference: Configurable threads (CPU cores)
- Tool execution: Semaphore-limited concurrency
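The semaphore-limited pattern is the standard Tokio one; a small sketch is below (the limit of 4 and the function names are arbitrary):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Bound concurrent tool executions; the permit is released when dropped.
async fn run_tool_bounded(limiter: Arc<Semaphore>, tool_input: String) -> String {
    let _permit = limiter.acquire_owned().await.expect("semaphore closed");
    // ... actual tool execution would happen here ...
    format!("ran tool with: {tool_input}")
}

#[tokio::main]
async fn main() {
    let limiter = Arc::new(Semaphore::new(4)); // at most 4 tools at once
    let handles: Vec<_> = (0..10)
        .map(|i| tokio::spawn(run_tool_bounded(limiter.clone(), format!("job {i}"))))
        .collect();
    for handle in handles {
        println!("{}", handle.await.unwrap());
    }
}
```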
Security Architecture
Input Validation
- Request size limits
- Parameter validation
- JSON schema enforcement
Tool Sandboxing
- Whitelisting: Domains, commands, paths
- Timeout enforcement
- Concurrent execution limits
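Conceptually, every tool call is checked against an allowlist before it runs. The sketch below assumes a hypothetical `ToolPolicy` shape rather than the real configuration format:

```rust
/// Illustrative allowlist for sandboxed tools; the real configuration
/// keys and shape may differ.
struct ToolPolicy {
    allowed_domains: Vec<String>,
    allowed_commands: Vec<String>,
}

impl ToolPolicy {
    /// Only HTTPS URLs on whitelisted domains are allowed through.
    fn allows_http(&self, url: &str) -> bool {
        self.allowed_domains.iter().any(|domain| {
            url.strip_prefix("https://")
                .map(|rest| rest.starts_with(domain.as_str()))
                .unwrap_or(false)
        })
    }

    /// Shell commands must match the whitelist exactly.
    fn allows_command(&self, cmd: &str) -> bool {
        self.allowed_commands.iter().any(|allowed| allowed == cmd)
    }
}
```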
Data Protection
- At rest: AES-256-GCM encryption
- In transit: Optional TLS
- Secrets: Environment variable substitution
Authentication
- Optional API key validation
- Rate limiting per client
- Request audit logging
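When a key is configured, a thin middleware can reject unauthenticated requests before they reach the router. A sketch assuming axum 0.7-style middleware; the environment variable name and bearer-token format are assumptions:

```rust
use axum::{extract::Request, http::StatusCode, middleware::Next, response::Response};

// Rejects requests without the expected bearer token; if no key is
// configured, authentication is effectively disabled.
async fn require_api_key(req: Request, next: Next) -> Result<Response, StatusCode> {
    let Ok(expected) = std::env::var("IGRIS_API_KEY") else {
        return Ok(next.run(req).await);
    };
    let authorized = req
        .headers()
        .get("authorization")
        .and_then(|value| value.to_str().ok())
        .map(|value| value == format!("Bearer {expected}"))
        .unwrap_or(false);
    if authorized {
        Ok(next.run(req).await)
    } else {
        Err(StatusCode::UNAUTHORIZED)
    }
}
```

Such a function would be attached to the router with `axum::middleware::from_fn`.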
Performance Optimizations
1. KV Cache
Reuses computed key-value pairs across requests:
{
  local_fallback: {
    prompt_cache_dir: "prompt_cache"
  }
}
Benefit: 2-3x faster for repeated prompts.
2. Batching
Processes multiple tokens at once:
{
  local_fallback: {
    batch_size: 512
  }
}
Benefit: Better throughput.
3. GPU Offloading
Moves computation to GPU:
{
  local_fallback: {
    n_gpu_layers: 32
  }
}
Benefit: 5-10x faster inference.
4. Speculative Execution
Races multiple providers in parallel:
{
  routing: {
    speculative: {
      max_providers: 3
    }
  }
}
Benefit: Lower latency (first to respond wins).
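Conceptually this is a race between provider futures in which the first success wins and the remaining futures are dropped. A sketch using `futures::future::select_ok` with placeholder provider calls:

```rust
use futures::future::select_ok;

// Placeholder provider call; in practice each would be a real provider request.
async fn ask_provider(name: &'static str, prompt: &str) -> Result<String, String> {
    Ok(format!("{name} answered: {prompt}"))
}

async fn speculative(prompt: &str) -> Result<String, String> {
    // Race the configured providers; the first Ok short-circuits the rest,
    // which are dropped (cancelled).
    let candidates = vec![
        Box::pin(ask_provider("openai", prompt)),
        Box::pin(ask_provider("anthropic", prompt)),
        Box::pin(ask_provider("local", prompt)),
    ];
    let (winner, _remaining) = select_ok(candidates).await?;
    Ok(winner)
}
```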
Deployment Patterns
Single Instance
┌──────────────┐
│   Runtime    │
│   Instance   │
│ (Standalone) │
└──────────────┘
Use case: Development, small deployments.
Load Balanced
     ┌───────────────┐
     │ Load Balancer │
     └───────┬───────┘
      ┌──────┴──────┐
 ┌────▼────┐   ┌────▼────┐
 │Runtime 1│   │Runtime 2│
 └─────────┘   └─────────┘
Use case: Scale horizontally for throughput.
MCP Swarm
┌──────────┐   ┌──────────┐
│Runtime 1 │◄─►│Runtime 2 │
└────┬─────┘   └─────┬────┘
     │   MCP Sync    │
     └───────┬───────┘
        ┌────▼────┐
        │Runtime 3│
        └─────────┘
Use case: Distributed context sharing.
Technology Stack
Core
- Language: Rust (1.75+)
- Async runtime: Tokio
- HTTP server: Axum
- Database: Redb
ML/AI
- Local inference: llama.cpp (via llama-cpp-rs)
- Model format: GGUF
- Fine-tuning: QLoRA via llama-finetune
- Federated learning: Custom implementation with differential privacy
- Model management: Dynamic selection and hot-swapping
Robotics & Edge
- ROS2: rclrs (Rust client library)
- Navigation: Nav2 action client
- GPIO: rppal (Raspberry Pi GPIO)
- Camera: opencv-rust bindings
- LIDAR: Custom point cloud processing
Networking
- MCP discovery: mDNS (mdns-sd)
- Context sync: UDP multicast
- Metrics: Prometheus (built-in)
- Fleet control: TLS-secured HTTP/2
Security
- Encryption: AES-256-GCM (aes-gcm)
- TLS: Rustls with AWS LC
- Hashing: SHA-256
- Safety certification: ISO 26262 / IEC 61508 hooks
- Audit logging: Tamper-evident records
Simulation & Testing
- Virtual environments: Custom simulation engine
- Chaos engineering: Failure injection framework
- Benchmarking: Performance testing suite
- ROS integration: Gazebo / Isaac Sim support
Comparison to Other Solutions
| Aspect | Igris Runtime | Ollama | LM Studio |
|---|---|---|---|
| Language | Rust | Go | JavaScript |
| Cloud fallback | ✅ Automatic | ❌ No | ❌ No |
| Agents | ✅ Reflection, Planning, Swarm | ❌ No | ❌ No |
| Tool use | ✅ HTTP, Shell, FS | ❌ No | ❌ No |
| MCP Swarm | ✅ P2P sync | ❌ No | ❌ No |
| Training | ✅ On-device QLoRA | ❌ No | ❌ No |
| API | OpenAI-compatible | OpenAI-compatible | OpenAI-compatible |
Future Architecture
Planned enhancements:
- Model hot-swapping: Switch models without restart
- Multi-model support: Run multiple models concurrently
- WebAssembly plugins: Extend functionality via WASM
- Distributed training: Share training across swarm
- Edge optimization: Further reduce binary size
Next Steps
- Configuration - Configure Runtime
- Deployment - Deploy to production
- Core Features - Explore advanced features