Architecture
Understand how Igris Runtime works under the hood.
System Overview
Igris Runtime is built with Rust for maximum performance, reliability, and safety. It provides local AI inference with automatic cloud fallback.
┌─────────────────────────────────────────────────────┐
│                    USER REQUEST                     │
└──────────────────────────┬──────────────────────────┘
                           │
                ┌──────────▼──────────┐
                │     HTTP Server     │
                │  (Axum + OpenAPI)   │
                └──────────┬──────────┘
                           │
              ┌────────────▼────────────┐
              │     Request Router      │
              │  - Mode detection       │
              │  - Provider selection   │
              └────────────┬────────────┘
                           │
             ┌─────────────┴─────────────┐
             │                           │
      ┌──────▼──────┐             ┌──────▼──────┐
      │    Cloud    │             │    Local    │
      │  Providers  │             │     LLM     │
      │ (Optional)  │             │ (llama.cpp) │
      └──────┬──────┘             └──────┬──────┘
             │                           │
             └─────────────┬─────────────┘
                           │
                ┌──────────▼──────────┐
                │      Response       │
                └─────────────────────┘
Core Components
1. HTTP Server (Axum)
- OpenAI-compatible API
- Streaming support (SSE)
- CORS and middleware
- Swagger UI documentation
2. Request Router
Determines how to handle each request:
mode: "reflection"→ Reflection Agentmode: "planning"→ Planning Agentmode: "tools"→ Tool Agentmode: "swarm"→ Multi-Agent Swarm- Default → Direct inference with fallback
3. Local LLM Engine
- llama.cpp integration: Native performance
- GGUF model support: Phi-3, Mistral, Llama, etc.
- CPU/GPU: Automatic layer offloading
- KV cache: Prompt caching for speed
- Streaming: Token-by-token generation
4. Cloud Provider Layer (Optional)
- Parallel provider support
- Automatic failover
- Cost tracking
- Circuit breakers
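As a rough sketch of how a per-provider circuit breaker behaves (the struct and field names below are illustrative, not the runtime's actual types): after a run of failures the provider is skipped until a cooldown expires, then probed again.

```rust
use std::time::{Duration, Instant};

/// Illustrative circuit breaker: after `max_failures` consecutive errors a
/// provider is skipped until `cooldown` has elapsed.
struct CircuitBreaker {
    consecutive_failures: u32,
    max_failures: u32,
    opened_at: Option<Instant>,
    cooldown: Duration,
}

impl CircuitBreaker {
    fn allows_request(&self) -> bool {
        match self.opened_at {
            Some(opened) => opened.elapsed() >= self.cooldown, // half-open after cooldown
            None => true,
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    fn record_failure(&mut self) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.max_failures {
            self.opened_at = Some(Instant::now()); // open: stop sending traffic
        }
    }
}
```

Combined with automatic failover, an open breaker simply moves the request on to the next provider or to the local model.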
5. Robotics & Edge Components (Phase 2)
ROS2 Integration (igris-ros2):
- DDS pub/sub messaging
- Nav2 autonomous navigation
- Multi-robot coordination
Sensor & Actuator Tooling (igris-sensors):
- GPIO control (Raspberry Pi)
- Camera interfaces
- LIDAR integration
Safety & Certification (igris-safety):
- Watchdog timers
- Safety modes (Normal/FailSafe/EmergencyStop)
- ISO 26262 / IEC 61508 hooks
- Audit logging
Swarm Coordination (igris-swarm):
- Raft-style leader election
- Distributed task voting
- Conflict resolution
Fleet Management (igris-fleet):
- Centralized control via Overture
- Agent registration
- Real-time telemetry
- Config synchronization
6. Advanced Intelligence Components (Phase 3)
Federated Learning (igris-federated):
- Privacy-preserving collaborative learning
- QLoRA aggregation (4 strategies)
- Differential privacy
- Secure aggregation
Dynamic Model Manager (igris-model-manager):
- Task-based model selection
- Hot-swapping (< 200ms)
- LRU unloading
- Usage analytics
Human-in-the-Loop (igris-hitl):
- Approval workflows
- Auto-approve thresholds
- Context snapshots
- REST API integration
Simulation & Testing (igris-simulation):
- Virtual swarm environments
- Chaos engineering
- Benchmarking suite
- Gazebo/Isaac Sim integration
Request Flow
Standard Request
1. Request arrives at `/v1/chat/completions`
2. Router checks the `mode` parameter
3. If cloud providers are configured:
   a. Try cloud providers (with timeout)
   b. On failure, fall back to local
4. If no cloud is configured (or local-only mode is set):
   a. Use the local model directly
5. Generate response
6. Return to client
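In sketch form, steps 3-4 boil down to a timed cloud attempt with a local fallback. The function names below are placeholders, not the runtime's internal API:

```rust
use std::time::Duration;
use tokio::time::timeout;

// Placeholders for the real provider call and the llama.cpp path.
async fn call_cloud(_prompt: &str) -> Result<String, Box<dyn std::error::Error>> {
    Err("cloud unreachable".into())
}

async fn run_local(prompt: &str) -> String {
    format!("local completion for: {prompt}")
}

// Steps 3-4 of the flow above: cloud with a timeout, then local fallback.
async fn complete(prompt: &str, cloud_enabled: bool) -> String {
    if cloud_enabled {
        if let Ok(Ok(answer)) = timeout(Duration::from_secs(10), call_cloud(prompt)).await {
            return answer;
        }
    }
    run_local(prompt).await
}
```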
With Reflection
1. Request with `mode: "reflection"`
2. Generate initial response
3. Critique response (score 0.0-1.0)
4. If score < threshold:
   a. Regenerate with critique feedback
   b. Repeat steps 3-4
5. Return final improved response
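A minimal sketch of that critique-and-regenerate loop; all names are placeholders, and the real agent's prompts and scoring are not shown here:

```rust
// Illustrative reflection loop (placeholder names, not the real agent API).
async fn reflect(prompt: &str, threshold: f32, max_rounds: u32) -> String {
    let mut answer = generate(prompt).await;
    for _ in 0..max_rounds {
        let (score, critique) = critique_answer(prompt, &answer).await; // 0.0..=1.0
        if score >= threshold {
            break; // good enough, stop iterating
        }
        // Regenerate, feeding the critique back into the prompt.
        answer = generate(&format!("{prompt}\n\nRevise using this critique:\n{critique}")).await;
    }
    answer
}

async fn generate(prompt: &str) -> String {
    format!("draft answer for: {prompt}") // placeholder for model inference
}

async fn critique_answer(_prompt: &str, _answer: &str) -> (f32, String) {
    (0.5, "be more specific".to_string()) // placeholder critic
}
```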
With Tools
1. Request with `mode: "tools"`
2. Model generates tool calls
3. Runtime executes tools (HTTP/Shell/FS)
4. Feed results back to model
5. Model decides: more tools or final answer
6. Return result
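Sketched out, the loop looks roughly like this; the `ModelStep` enum and helper functions are assumptions for illustration, not the runtime's actual tool-calling types:

```rust
// Illustrative tool loop.
enum ModelStep {
    ToolCall { name: String, args: String },
    FinalAnswer(String),
}

async fn run_with_tools(prompt: &str, max_steps: u32) -> String {
    let mut transcript = prompt.to_string();
    for _ in 0..max_steps {
        match next_step(&transcript).await {
            ModelStep::ToolCall { name, args } => {
                // Execute the tool (HTTP / Shell / FS) and append the result
                // so the model can see it on the next turn.
                let result = execute_tool(&name, &args).await;
                transcript.push_str(&format!("\n[tool {name} -> {result}]"));
            }
            ModelStep::FinalAnswer(answer) => return answer,
        }
    }
    "step limit reached".to_string()
}

async fn next_step(_transcript: &str) -> ModelStep {
    ModelStep::FinalAnswer("done".to_string()) // placeholder for model inference
}

async fn execute_tool(_name: &str, _args: &str) -> String {
    "tool output".to_string() // placeholder executor
}
```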
Data Flow
Local Inference Path
Request → Prompt → llama.cpp → Tokens → Response
- No network calls
- All data stays on device
- Works 100% offline
Hybrid Path (Cloud + Local)
Request → Cloud API (timeout) → Local Fallback → Response
- Cloud first for quality
- Local fallback for reliability
- Transparent to user
Storage Architecture
Embedded Database (Redb)
- Purpose: Metadata, metrics, training data
- Type: ACID-compliant embedded DB
- Location: `igris.db`
- No external database required
Model Storage
- Format: GGUF files
- Location: `models/` directory
- Size: 2-8 GB per model
- Loading: Memory-mapped for efficiency
LoRA Adapters
- Format: GGUF adapters
- Location: `lora_adapters/` directory
- Size: 32-64 MB each
- Encryption: AES-256-GCM at rest
MCP Context Storage
- Format: Encrypted key-value store
- Location: `mcp_contexts.db`
- Encryption: AES-256-GCM
- Sync: Real-time via mDNS/multicast
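Both the LoRA adapters and the MCP context store follow the same at-rest pattern: encrypt the payload with AES-256-GCM before it touches disk. A minimal sketch using the `aes-gcm` crate follows; key management is simplified and not representative of the runtime's actual key handling:

```rust
use aes_gcm::{
    aead::{Aead, AeadCore, KeyInit, OsRng},
    Aes256Gcm, Key,
};

/// Encrypts a blob before writing it to disk; returns nonce + ciphertext.
/// Illustrative only: real key derivation/storage is out of scope here.
fn encrypt_at_rest(key_bytes: &[u8; 32], plaintext: &[u8]) -> Result<Vec<u8>, aes_gcm::aead::Error> {
    let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(key_bytes));
    let nonce = Aes256Gcm::generate_nonce(&mut OsRng); // 96-bit random nonce
    let ciphertext = cipher.encrypt(&nonce, plaintext)?;
    // Store the nonce alongside the ciphertext so decryption can recover it.
    let mut out = nonce.as_slice().to_vec();
    out.extend_from_slice(&ciphertext);
    Ok(out)
}
```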
Concurrency Model
Async Runtime (Tokio)
- Multi-threaded: Work-stealing scheduler
- Non-blocking I/O: Efficient resource usage
- Streaming: SSE via async streams
Inference Threading
- Model loading: One-time on startup
- Inference: Configurable threads (CPU cores)
- Tool execution: Semaphore-limited concurrency
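The semaphore-limited pattern is the standard Tokio one; a small sketch is below (the limit of 4 and the function names are arbitrary):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Bound concurrent tool executions; the permit is released when dropped.
async fn run_tool_bounded(limiter: Arc<Semaphore>, tool_input: String) -> String {
    let _permit = limiter.acquire_owned().await.expect("semaphore closed");
    // ... actual tool execution would happen here ...
    format!("ran tool with: {tool_input}")
}

#[tokio::main]
async fn main() {
    let limiter = Arc::new(Semaphore::new(4)); // at most 4 tools at once
    let handles: Vec<_> = (0..10)
        .map(|i| tokio::spawn(run_tool_bounded(limiter.clone(), format!("job {i}"))))
        .collect();
    for handle in handles {
        println!("{}", handle.await.unwrap());
    }
}
```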
Security Architecture
Input Validation
- Request size limits
- Parameter validation
- JSON schema enforcement
Tool Sandboxing
- Whitelisting: Domains, commands, paths
- Timeout enforcement
- Concurrent execution limits
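Conceptually, every tool call is checked against an allowlist before it runs. The sketch below assumes a hypothetical `ToolPolicy` shape rather than the real configuration format:

```rust
/// Illustrative allowlist for sandboxed tools; the real configuration
/// keys and shape may differ.
struct ToolPolicy {
    allowed_domains: Vec<String>,
    allowed_commands: Vec<String>,
}

impl ToolPolicy {
    /// Only HTTPS URLs on whitelisted domains are allowed through.
    fn allows_http(&self, url: &str) -> bool {
        self.allowed_domains.iter().any(|domain| {
            url.strip_prefix("https://")
                .map(|rest| rest.starts_with(domain.as_str()))
                .unwrap_or(false)
        })
    }

    /// Shell commands must match the whitelist exactly.
    fn allows_command(&self, cmd: &str) -> bool {
        self.allowed_commands.iter().any(|allowed| allowed == cmd)
    }
}
```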
Data Protection
- At rest: AES-256-GCM encryption
- In transit: Optional TLS
- Secrets: Environment variable substitution
Authentication
- Optional API key validation
- Rate limiting per client
- Request audit logging
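When a key is configured, a thin middleware can reject unauthenticated requests before they reach the router. A sketch assuming axum 0.7-style middleware; the environment variable name and bearer-token format are assumptions:

```rust
use axum::{extract::Request, http::StatusCode, middleware::Next, response::Response};

// Rejects requests without the expected bearer token; if no key is
// configured, authentication is effectively disabled.
async fn require_api_key(req: Request, next: Next) -> Result<Response, StatusCode> {
    let Ok(expected) = std::env::var("IGRIS_API_KEY") else {
        return Ok(next.run(req).await);
    };
    let authorized = req
        .headers()
        .get("authorization")
        .and_then(|value| value.to_str().ok())
        .map(|value| value == format!("Bearer {expected}"))
        .unwrap_or(false);
    if authorized {
        Ok(next.run(req).await)
    } else {
        Err(StatusCode::UNAUTHORIZED)
    }
}
```

Such a function would be attached to the router with `axum::middleware::from_fn`.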
Performance Optimizations
1. KV Cache
Reuses computed key-value pairs across requests:
{
  local_fallback: {
    prompt_cache_dir: "prompt_cache"
  }
}
Benefit: 2-3x faster for repeated prompts.
2. Batching
Processes multiple tokens at once:
{
  local_fallback: {
    batch_size: 512
  }
}
Benefit: Better throughput.
3. GPU Offloading
Moves computation to GPU:
{
  local_fallback: {
    n_gpu_layers: 32
  }
}
Benefit: 5-10x faster inference.
4. Speculative Execution
Races multiple providers in parallel:
{
  routing: {
    speculative: {
      max_providers: 3
    }
  }
}
Benefit: Lower latency (first to respond wins).
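Conceptually this is a race between provider futures in which the first success wins and the remaining futures are dropped. A sketch using `futures::future::select_ok` with placeholder provider calls:

```rust
use futures::future::select_ok;

// Placeholder provider call; in practice each would be a real provider request.
async fn ask_provider(name: &'static str, prompt: &str) -> Result<String, String> {
    Ok(format!("{name} answered: {prompt}"))
}

async fn speculative(prompt: &str) -> Result<String, String> {
    // Race the configured providers; the first Ok short-circuits the rest,
    // which are dropped (cancelled).
    let candidates = vec![
        Box::pin(ask_provider("openai", prompt)),
        Box::pin(ask_provider("anthropic", prompt)),
        Box::pin(ask_provider("local", prompt)),
    ];
    let (winner, _remaining) = select_ok(candidates).await?;
    Ok(winner)
}
```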
Deployment Patterns
Single Instance
┌──────────────┐
│   Runtime    │
│   Instance   │
│ (Standalone) │
└──────────────┘
Use case: Development, small deployments.
Load Balanced
     ┌───────────────┐
     │ Load Balancer │
     └───────┬───────┘
      ┌──────┴──────┐
 ┌────▼────┐   ┌────▼────┐
 │Runtime 1│   │Runtime 2│
 └─────────┘   └─────────┘
Use case: Scale horizontally for throughput.
MCP Swarm
┌──────────┐   ┌──────────┐
│Runtime 1 │◄─►│Runtime 2 │
└────┬─────┘   └─────┬────┘
     │   MCP Sync    │
     └───────┬───────┘
        ┌────▼────┐
        │Runtime 3│
        └─────────┘
Use case: Distributed context sharing.
Technology Stack
Core
- Language: Rust (1.75+)
- Async runtime: Tokio
- HTTP server: Axum
- Database: Redb
ML/AI
- Local inference: llama.cpp (via llama-cpp-rs)
- Model format: GGUF
- Fine-tuning: QLoRA via llama-finetune
- Federated learning: Custom implementation with differential privacy
- Model management: Dynamic selection and hot-swapping
Robotics & Edge
- ROS2: rclrs (Rust client library)
- Navigation: Nav2 action client
- GPIO: rppal (Raspberry Pi GPIO)
- Camera: opencv-rust bindings
- LIDAR: Custom point cloud processing
Networking
- MCP discovery: mDNS (mdns-sd)
- Context sync: UDP multicast
- Metrics: Prometheus (built-in)
- Fleet control: TLS-secured HTTP/2
Security
- Encryption: AES-256-GCM (aes-gcm)
- TLS: Rustls with AWS LC
- Hashing: SHA-256
- Safety certification: ISO 26262 / IEC 61508 hooks
- Audit logging: Tamper-evident records
Simulation & Testing
- Virtual environments: Custom simulation engine
- Chaos engineering: Failure injection framework
- Benchmarking: Performance testing suite
- ROS integration: Gazebo / Isaac Sim support
Comparison to Other Solutions
| Aspect | Igris Runtime | Ollama | LM Studio |
|---|---|---|---|
| Language | Rust | Go | JavaScript |
| Cloud fallback | ✅ Automatic | ❌ No | ❌ No |
| Agents | ✅ Reflection, Planning, Swarm | ❌ No | ❌ No |
| Tool use | ✅ HTTP, Shell, FS | ❌ No | ❌ No |
| MCP Swarm | ✅ P2P sync | ❌ No | ❌ No |
| Training | ✅ On-device QLoRA | ❌ No | ❌ No |
| API | OpenAI-compatible | OpenAI-compatible | OpenAI-compatible |
Future Architecture
Planned enhancements:
- Model hot-swapping: Switch models without restart
- Multi-model support: Run multiple models concurrently
- WebAssembly plugins: Extend functionality via WASM
- Distributed training: Share training across swarm
- Edge optimization: Further reduce binary size
Next Steps
- Configuration - Configure Runtime
- Deployment - Deploy to production
- Core Features - Explore advanced features