Architecture

Understand how Igris Runtime works under the hood.


System Overview

Igris Runtime is built with Rust for maximum performance, reliability, and safety. It provides local AI inference with automatic cloud fallback.

┌─────────────────────────────────────┐
│             USER REQUEST            │
└──────────────────┬──────────────────┘
                   │
         ┌─────────▼─────────┐
         │    HTTP Server    │
         │  (Axum + OpenAPI) │
         └─────────┬─────────┘
                   │
         ┌─────────▼──────────────┐
         │   Request Router       │
         │   - Mode detection     │
         │   - Provider selection │
         └─────────┬──────────────┘
                   │
          ┌────────┴────────┐
          │                 │
    ┌─────▼─────┐     ┌─────▼─────┐
    │   Cloud   │     │   Local   │
    │ Providers │     │    LLM    │
    │ (Optional)│     │(llama.cpp)│
    └─────┬─────┘     └─────┬─────┘
          │                 │
          └────────┬────────┘
                   │
          ┌────────▼────────┐
          │    Response     │
          └─────────────────┘

Core Components

1. HTTP Server (Axum)

  • OpenAI-compatible API
  • Streaming support (SSE)
  • CORS and middleware
  • Swagger UI documentation

2. Request Router

Determines how to handle each request, based on the optional mode field (see the example request after this list):

  • mode: "reflection" → Reflection Agent
  • mode: "planning" → Planning Agent
  • mode: "tools" → Tool Agent
  • mode: "swarm" → Multi-Agent Swarm
  • Default → Direct inference with fallback
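
For example, a request sent to /v1/chat/completions that should be handled by the Reflection Agent adds the mode field to an otherwise standard chat-completion body (the model value below is illustrative):

{
  "model": "local",
  "mode": "reflection",
  "messages": [
    { "role": "user", "content": "Summarize the swarm coordination design." }
  ],
  "stream": false
}

Omitting mode falls through to the default path: direct inference with fallback.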

3. Local LLM Engine

  • llama.cpp integration: Native performance
  • GGUF model support: Phi-3, Mistral, Llama, etc.
  • CPU/GPU: Automatic layer offloading
  • KV cache: Prompt caching for speed
  • Streaming: Token-by-token generation
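
The bullets above map onto a handful of local_fallback settings. n_gpu_layers, batch_size, and prompt_cache_dir are documented under Performance Optimizations below; the model_path key and file name are illustrative:

{
  local_fallback: {
    model_path: "models/phi-3-mini.gguf",  // illustrative path to a GGUF model
    n_gpu_layers: 32,                      // layers offloaded to the GPU (0 = CPU only)
    batch_size: 512,                       // tokens processed per batch
    prompt_cache_dir: "prompt_cache"       // KV / prompt cache directory
  }
}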

4. Cloud Provider Layer (Optional)

  • Parallel provider support
  • Automatic failover
  • Cost tracking
  • Circuit breakers
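
One possible shape for the provider configuration; the providers key, the provider names, and timeout_ms are illustrative rather than canonical:

{
  providers: [
    {
      name: "openai",                // illustrative provider entry
      api_key: "${OPENAI_API_KEY}",  // secret injected via environment variable substitution
      timeout_ms: 10000              // per-request timeout before failover kicks in
    },
    {
      name: "anthropic",
      api_key: "${ANTHROPIC_API_KEY}",
      timeout_ms: 10000
    }
  ]
}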

5. Robotics & Edge Components (Phase 2)

ROS2 Integration (igris-ros2):

  • DDS pub/sub messaging
  • Nav2 autonomous navigation
  • Multi-robot coordination

Sensor & Actuator Tooling (igris-sensors):

  • GPIO control (Raspberry Pi)
  • Camera interfaces
  • LIDAR integration

Safety & Certification (igris-safety):

  • Watchdog timers
  • Safety modes (Normal/FailSafe/EmergencyStop)
  • ISO 26262 / IEC 61508 hooks
  • Audit logging

Swarm Coordination (igris-swarm):

  • Raft-style leader election
  • Distributed task voting
  • Conflict resolution

Fleet Management (igris-fleet):

  • Centralized control via Overture
  • Agent registration
  • Real-time telemetry
  • Config synchronization

6. Advanced Intelligence Components (Phase 3)

Federated Learning (igris-federated):

  • Privacy-preserving collaborative learning
  • QLoRA aggregation (4 strategies)
  • Differential privacy
  • Secure aggregation

Dynamic Model Manager (igris-model-manager):

  • Task-based model selection
  • Hot-swapping (< 200ms)
  • LRU unloading
  • Usage analytics

Human-in-the-Loop (igris-hitl):

  • Approval workflows
  • Auto-approve thresholds
  • Context snapshots
  • REST API integration

Simulation & Testing (igris-simulation):

  • Virtual swarm environments
  • Chaos engineering
  • Benchmarking suite
  • Gazebo/Isaac Sim integration

Request Flow

Standard Request

1. Request arrives at /v1/chat/completions
2. Router checks mode parameter
3. If cloud configured:
   a. Try cloud providers (with timeout)
   b. On failure, fall back to the local model
4. If no cloud is configured, or the runtime is local-only:
   a. Use the local model directly
5. Generate response
6. Return to client

With Reflection

1. Request with mode: "reflection"
2. Generate initial response
3. Critique response (score 0.0-1.0)
4. If score < threshold:
   a. Regenerate with critique feedback
   b. Repeat steps 3-4
5. Return final improved response
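
The critique threshold and the iteration cap are the two knobs that matter here; the key names below are illustrative only:

{
  reflection: {
    score_threshold: 0.8,  // illustrative; regenerate while the critique score is below this
    max_iterations: 3      // illustrative; stop refining after this many critique rounds
  }
}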

With Tools

1. Request with mode: "tools"
2. Model generates tool calls
3. Runtime executes tools (HTTP/Shell/FS)
4. Feed results back to model
5. Model decides: more tools or final answer
6. Return result
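
Because the API is OpenAI-compatible, tool calls use the familiar function-calling shape; the tool name and arguments below are purely illustrative:

{
  "role": "assistant",
  "tool_calls": [
    {
      "id": "call_1",
      "type": "function",
      "function": {
        "name": "http_get",
        "arguments": "{\"url\": \"https://example.com/status\"}"
      }
    }
  ]
}

The runtime executes the call, appends the result as a tool message, and the loop in steps 3-5 repeats until the model returns a final answer instead of another tool call.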

Data Flow

Local Inference Path

Request → Prompt → llama.cpp → Tokens → Response
  • No network calls
  • All data stays on device
  • Works 100% offline

Hybrid Path (Cloud + Local)

Request → Cloud API (timeout) → Local Fallback → Response
  • Cloud first for quality
  • Local fallback for reliability
  • Transparent to user
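
In configuration terms, the hybrid path is a cloud provider plus a local_fallback block; the cloud_timeout_ms key is illustrative, standing in for whatever per-provider timeout is configured:

{
  routing: {
    cloud_timeout_ms: 10000               // illustrative; wait this long for the cloud before falling back
  },
  local_fallback: {
    model_path: "models/phi-3-mini.gguf"  // illustrative; local model used when the cloud times out or errors
  }
}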

Storage Architecture

Embedded Database (Redb)

  • Purpose: Metadata, metrics, training data
  • Type: ACID-compliant embedded DB
  • Location: igris.db
  • No external database required

Model Storage

  • Format: GGUF files
  • Location: models/ directory
  • Size: 2-8 GB per model
  • Loading: Memory-mapped for efficiency

LoRA Adapters

  • Format: GGUF adapters
  • Location: lora_adapters/ directory
  • Size: 32-64 MB each
  • Encryption: AES-256-GCM at rest

MCP Context Storage

  • Format: Encrypted key-value store
  • Location: mcp_contexts.db
  • Encryption: AES-256-GCM
  • Sync: Real-time via mDNS/multicast
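
Putting the pieces together, a typical data directory looks roughly like this (the file names under models/ and lora_adapters/ are illustrative):

igris.db                        # embedded Redb database: metadata, metrics, training data
mcp_contexts.db                 # encrypted MCP context store (AES-256-GCM)
models/
  phi-3-mini-4k-instruct.gguf   # illustrative GGUF model, memory-mapped at load time
lora_adapters/
  support-bot.gguf              # illustrative LoRA adapter, encrypted at rest
prompt_cache/                   # KV / prompt cache (see Performance Optimizations)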

Concurrency Model

Async Runtime (Tokio)

  • Multi-threaded: Work-stealing scheduler
  • Non-blocking I/O: Efficient resource usage
  • Streaming: SSE via async streams

Inference Threading

  • Model loading: One-time on startup
  • Inference: Configurable threads (CPU cores)
  • Tool execution: Semaphore-limited concurrency
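
A sketch of the relevant knobs; both key names here are illustrative stand-ins for the documented behaviour:

{
  local_fallback: {
    threads: 8               // illustrative; inference threads, typically one per physical core
  },
  tools: {
    max_concurrent_tools: 4  // illustrative; semaphore limit on parallel tool execution
  }
}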

Security Architecture

Input Validation

  • Request size limits
  • Parameter validation
  • JSON schema enforcement

Tool Sandboxing

  • Whitelisting: Domains, commands, paths
  • Timeout enforcement
  • Concurrent execution limits
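
An illustrative whitelist configuration; every key name here is an assumption, shown only to make the sandboxing model concrete:

{
  tools: {
    allowed_domains: ["api.example.com"],  // HTTP tool may only reach these hosts
    allowed_commands: ["ls", "cat"],       // Shell tool restricted to these binaries
    allowed_paths: ["/data/workspace"],    // FS tool confined to these directories
    timeout_secs: 30                       // hard per-tool execution timeout
  }
}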

Data Protection

  • At rest: AES-256-GCM encryption
  • In transit: Optional TLS
  • Secrets: Environment variable substitution

Authentication

  • Optional API key validation
  • Rate limiting per client
  • Request audit logging
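
A minimal sketch, with illustrative key and environment variable names:

{
  server: {
    api_key: "${IGRIS_API_KEY}",  // illustrative; when set, requests must present this key
    rate_limit_per_minute: 120    // illustrative; per-client request cap
  }
}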

Performance Optimizations

1. KV Cache

Reuses computed key-value pairs across requests:

{
  local_fallback: {
    prompt_cache_dir: "prompt_cache"
  }
}

Benefit: 2-3x faster for repeated prompts.

2. Batching

Processes multiple tokens at once:

{
  local_fallback: {
    batch_size: 512
  }
}

Benefit: Better throughput.

3. GPU Offloading

Moves computation to GPU:

{
  local_fallback: {
    n_gpu_layers: 32
  }
}

Benefit: 5-10x faster inference.

4. Speculative Execution

Races multiple providers in parallel:

{
  routing: {
    speculative: {
      max_providers: 3
    }
  }
}

Benefit: Lower latency (first to respond wins).


Deployment Patterns

Single Instance

┌───────────────┐
│    Runtime    │
│   Instance    │
│ (Standalone)  │
└───────────────┘

Use case: Development, small deployments.

Load Balanced

        ┌───────────────┐
        │ Load Balancer │
        └───────┬───────┘
         ┌──────┴──────┐
    ┌────▼────┐   ┌────▼────┐
    │Runtime 1│   │Runtime 2│
    └─────────┘   └─────────┘

Use case: Scale horizontally for throughput.

MCP Swarm

    ┌──────────┐   ┌──────────┐
    │Runtime 1 │◄─►│Runtime 2 │
    └────┬─────┘   └─────┬────┘
         │   MCP Sync    │
         └───────┬───────┘
            ┌────▼────┐
            │Runtime 3│
            └─────────┘

Use case: Distributed context sharing.


Technology Stack

Core

  • Language: Rust (1.75+)
  • Async runtime: Tokio
  • HTTP server: Axum
  • Database: Redb

ML/AI

  • Local inference: llama.cpp (via llama-cpp-rs)
  • Model format: GGUF
  • Fine-tuning: QLoRA via llama-finetune
  • Federated learning: Custom implementation with differential privacy
  • Model management: Dynamic selection and hot-swapping

Robotics & Edge

  • ROS2: rclrs (Rust client library)
  • Navigation: Nav2 action client
  • GPIO: rppal (Raspberry Pi GPIO)
  • Camera: opencv-rust bindings
  • LIDAR: Custom point cloud processing

Networking

  • MCP discovery: mDNS (mdns-sd)
  • Context sync: UDP multicast
  • Metrics: Prometheus (built-in)
  • Fleet control: TLS-secured HTTP/2

Security

  • Encryption: AES-256-GCM (aes-gcm)
  • TLS: Rustls with AWS LC
  • Hashing: SHA-256
  • Safety certification: ISO 26262 / IEC 61508 hooks
  • Audit logging: Tamper-evident records

Simulation & Testing

  • Virtual environments: Custom simulation engine
  • Chaos engineering: Failure injection framework
  • Benchmarking: Performance testing suite
  • ROS integration: Gazebo / Isaac Sim support

Comparison to Other Solutions

Aspect          | Igris Runtime                   | Ollama             | LM Studio
----------------|---------------------------------|--------------------|-------------------
Language        | Rust                            | Go                 | JavaScript
Cloud fallback  | ✅ Automatic                    | ❌ No              | ❌ No
Agents          | ✅ Reflection, Planning, Swarm  | ❌ No              | ❌ No
Tool use        | ✅ HTTP, Shell, FS              | ❌ No              | ❌ No
MCP Swarm       | ✅ P2P sync                     | ❌ No              | ❌ No
Training        | ✅ On-device QLoRA              | ❌ No              | ❌ No
API             | OpenAI-compatible               | OpenAI-compatible  | OpenAI-compatible

Future Architecture

Planned enhancements:

  • Model hot-swapping: Switch models without restart
  • Multi-model support: Run multiple models concurrently
  • WebAssembly plugins: Extend functionality via WASM
  • Distributed training: Share training across swarm
  • Edge optimization: Further reduce binary size

Next Steps