Visualization & Monitoring

Real-time visualization and monitoring system for debugging and production observability.


Overview

The TreeVisualizer provides comprehensive observability into behavior tree execution:

  • Real-time tree state export - Complete tree structure with current status
  • Execution tracing - Tick-by-tick history of node executions
  • Metrics collection - Replan count, tick rate, LLM latency, failure rates
  • Replan tracking - Before/after subtree diffs for LLM replanning events
  • JSON export - Dashboard-ready format for WebSocket/REST integration
  • Performance optimized - <5ms export overhead, non-blocking

Quick Start

Basic Setup

use igris_btree::prelude::*;
use igris_btree::visualizer::{TreeVisualizer, VisualizerConfig};
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create visualizer
    let visualizer = Arc::new(TreeVisualizer::new());

    // Create executor with visualizer
    let executor = BTreeExecutor::new()
        .with_visualizer(visualizer.clone());

    // Build and execute tree
    let mut tree = Sequence::new("mission");
    let mut context = BTreeContext::new();

    let result = executor.execute(&mut tree, &mut context).await?;

    // Export final snapshot
    let snapshot = visualizer
        .export_snapshot(&tree, &context, result.tick_count)
        .await?;

    // Save to file
    let json = serde_json::to_string_pretty(&snapshot)?;
    std::fs::write("tree_snapshot.json", json)?;

    Ok(())
}

Configuration

VisualizerConfig

pub struct VisualizerConfig {
    /// Enable/disable visualization
    pub enabled: bool,

    /// Maximum trace entries to keep (rolling window)
    pub max_trace_entries: usize,

    /// Maximum replan events to keep
    pub max_replan_events: usize,

    /// Optional filter for blackboard keys
    pub blackboard_filter: Option<Vec<String>>,

    /// Export frequency (0 = every tick, N = every N ticks)
    pub export_frequency: u64,

    /// Compute diffs between snapshots
    pub compute_diffs: bool,
}
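The export_frequency comment can be read as a simple gating rule. The sketch below is one plausible interpretation, not the library's implementation: `should_export` is a hypothetical helper, and the assumption that "every N ticks" means "ticks divisible by N" is ours.

```rust
/// Hypothetical helper (not part of igris_btree): decides whether a
/// snapshot would be exported on the given tick.
///
/// Assumption: `export_frequency == 0` means "every tick", and `N > 0`
/// means "every tick whose number is a multiple of N".
pub fn should_export(tick: u64, export_frequency: u64) -> bool {
    match export_frequency {
        0 => true,
        n => tick % n == 0,
    }
}
```

With export_frequency set to 10, ticks 10, 20, 30 and so on would pass this check, while every tick in between would be skipped.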

Configuration Examples

High-Frequency Monitoring

For development and debugging:

let config = VisualizerConfig {
    enabled: true,
    max_trace_entries: 200,
    max_replan_events: 20,
    export_frequency: 0,  // Every tick
    compute_diffs: true,
    blackboard_filter: None,  // All keys
};

let visualizer = Arc::new(TreeVisualizer::with_config(config));

Production Optimized

For production monitoring:

let config = VisualizerConfig {
    enabled: true,
    max_trace_entries: 50,
    max_replan_events: 5,
    export_frequency: 10,  // Every 10 ticks
    compute_diffs: true,
    blackboard_filter: Some(vec![
        "mission_task".to_string(),
        "status".to_string(),
    ]),
};
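To illustrate what a blackboard filter like the one above does to exported data, here is a minimal allow-list sketch. `filter_blackboard` is a hypothetical helper, plain `String` values stand in for the JSON `Value` type, and treating `None` as "export all keys" follows the `// All keys` comment in the earlier example.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of igris_btree): returns only the
/// blackboard entries whose keys appear in the allow-list.
/// `None` means no filtering, so every key is kept.
pub fn filter_blackboard(
    blackboard: &HashMap<String, String>,
    filter: Option<&[String]>,
) -> HashMap<String, String> {
    match filter {
        None => blackboard.clone(),
        Some(keys) => blackboard
            .iter()
            .filter(|(key, _)| keys.contains(*key))
            .map(|(key, value)| (key.clone(), value.clone()))
            .collect(),
    }
}
```

Filtering keeps snapshots small and avoids leaking unrelated blackboard entries into dashboards.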

Debug Mode

For intensive debugging:

let config = VisualizerConfig {
    enabled: true,
    max_trace_entries: 1000,
    max_replan_events: 50,
    export_frequency: 0,
    compute_diffs: false,  // Save CPU
    blackboard_filter: None,
};

TreeSnapshot

Complete tree state at a point in time.

Structure

pub struct TreeSnapshot {
    /// Timestamp in milliseconds since epoch
    pub timestamp_ms: u64,

    /// Current tick count
    pub tick_count: u64,

    /// Root node with recursive children
    pub root: NodeSnapshot,

    /// Current blackboard state
    pub blackboard: HashMap<String, Value>,

    /// Execution trace (recent entries)
    pub execution_trace: Vec<ExecutionTraceEntry>,

    /// Replan events
    pub replan_events: Vec<ReplanEvent>,

    /// Aggregated metrics
    pub metrics: MetricsSummary,
}

Example JSON

{
  "timestamp_ms": 1770033271626,
  "tick_count": 5,
  "root": {
    "id": "root",
    "name": "Mission",
    "node_type": "Sequence",
    "status": "Success",
    "children": [...],
    "stats": {
      "tick_count": 5,
      "success_count": 5,
      "failure_count": 0,
      "avg_execution_ms": 0.8,
      "last_execution_ms": 0.7
    }
  },
  "blackboard": {
    "status": "completed",
    "mission_task": "Navigate to warehouse"
  },
  "execution_trace": [...],
  "replan_events": [],
  "metrics": {
    "total_replans": 0,
    "avg_tick_rate": 1250.0,
    "avg_llm_latency_ms": 0.0,
    "watchdog_triggers": 0,
    "total_ticks": 5,
    "failure_rate": 0.0,
    "total_execution_ms": 4.0
  }
}

Node Statistics

Each node tracks execution statistics.

NodeStats

pub struct NodeStats {
    /// Times this node was ticked
    pub tick_count: u64,

    /// Times returned Success
    pub success_count: u64,

    /// Times returned Failure
    pub failure_count: u64,

    /// Average execution time in ms
    pub avg_execution_ms: f64,

    /// Last execution time in ms
    pub last_execution_ms: f64,
}
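A running average such as avg_execution_ms can be maintained without storing every sample. The sketch below shows the standard incremental-mean update on a local copy of the struct; the `record` method and the `Outcome` enum are illustrative assumptions, and the library may compute its statistics differently.

```rust
/// Local stand-in mirroring the fields of NodeStats shown above.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct NodeStats {
    pub tick_count: u64,
    pub success_count: u64,
    pub failure_count: u64,
    pub avg_execution_ms: f64,
    pub last_execution_ms: f64,
}

/// Hypothetical result type used only by this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Success,
    Failure,
    Running,
}

impl NodeStats {
    /// Records one execution and updates the running average in O(1).
    pub fn record(&mut self, outcome: Outcome, duration_ms: f64) {
        self.tick_count += 1;
        match outcome {
            Outcome::Success => self.success_count += 1,
            Outcome::Failure => self.failure_count += 1,
            Outcome::Running => {}
        }
        // Incremental mean: avg_n = avg_(n-1) + (x_n - avg_(n-1)) / n
        self.avg_execution_ms +=
            (duration_ms - self.avg_execution_ms) / self.tick_count as f64;
        self.last_execution_ms = duration_ms;
    }
}
```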

Use Cases

  • Performance profiling: Identify slow nodes
  • Reliability tracking: Monitor failure rates
  • Execution patterns: Understand node behavior
  • Bottleneck detection: Find optimization opportunities

Execution Tracing

Track tick-by-tick execution history.

ExecutionTraceEntry

pub struct ExecutionTraceEntry {
    /// Tick number when executed
    pub tick: u64,

    /// Timestamp in milliseconds
    pub timestamp_ms: u64,

    /// Node identifier
    pub node_id: NodeId,

    /// Node name
    pub node_name: String,

    /// Result status
    pub status: NodeStatus,

    /// Execution duration in milliseconds
    pub duration_ms: f64,
}

Recording Executions

Executions are recorded automatically when the visualizer is attached to the executor:

// Automatic recording
let executor = BTreeExecutor::new()
    .with_visualizer(visualizer);

// Manual recording (if needed)
visualizer.record_execution(
    node_id,
    node_name,
    status,
    duration,
    tick
).await;

Example Trace

[
  {
    "tick": 1,
    "timestamp_ms": 1770033271626,
    "node_id": "root/0",
    "node_name": "init",
    "status": "Success",
    "duration_ms": 0.1
  },
  {
    "tick": 2,
    "timestamp_ms": 1770033271628,
    "node_id": "root/1",
    "node_name": "navigate",
    "status": "Running",
    "duration_ms": 1.5
  }
]
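A trace like the one above can be aggregated to find where time is spent. The following sketch sums durations per node and picks the largest total; `TraceEntry` is a reduced stand-in for ExecutionTraceEntry, and both functions are hypothetical helpers rather than library API.

```rust
use std::collections::HashMap;

/// Reduced stand-in for ExecutionTraceEntry, keeping only the fields
/// this sketch needs.
pub struct TraceEntry {
    pub node_id: String,
    pub duration_ms: f64,
}

/// Sums the execution time of every trace entry, grouped by node.
pub fn total_duration_by_node(trace: &[TraceEntry]) -> HashMap<String, f64> {
    let mut totals: HashMap<String, f64> = HashMap::new();
    for entry in trace {
        *totals.entry(entry.node_id.clone()).or_insert(0.0) += entry.duration_ms;
    }
    totals
}

/// Returns the node with the largest accumulated duration, if any.
pub fn slowest_node(trace: &[TraceEntry]) -> Option<(String, f64)> {
    total_duration_by_node(trace)
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

Because the trace is a rolling window, the result reflects only recent ticks, not the whole run.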

Replan Tracking

Monitor LLM replanning events.

ReplanEvent

pub struct ReplanEvent {
    /// When replan occurred
    pub timestamp_ms: u64,

    /// Tick number
    pub tick: u64,

    /// Node that triggered replan
    pub trigger_node_id: NodeId,

    /// Reason for replanning
    pub reason: String,

    /// Replan attempt number
    pub attempt: u32,

    /// Subtree before replanning (optional)
    pub before_subtree: Option<Value>,

    /// Subtree after replanning (optional)
    pub after_subtree: Option<Value>,
}

Recording Replans

visualizer.record_replan(
    trigger_node_id,
    "Primary action failed",
    attempt,
    before_subtree,
    after_subtree,
    tick
).await;

Example Event

{
  "timestamp_ms": 1770033275000,
  "tick": 15,
  "trigger_node_id": "root/adaptive/0",
  "reason": "Navigation failed - obstacle detected",
  "attempt": 1,
  "before_subtree": {
    "type": "Action",
    "name": "navigate_direct",
    "tool": "navigate",
    "args": {"path": "direct"}
  },
  "after_subtree": {
    "type": "Sequence",
    "name": "navigate_around",
    "children": [
      {"type": "Action", "name": "avoid_obstacle"},
      {"type": "Action", "name": "resume_navigation"}
    ]
  }
}

Metrics Collection

Aggregated metrics for performance monitoring.

MetricsSummary

pub struct MetricsSummary {
    /// Total replan events
    pub total_replans: u64,

    /// Average ticks per second
    pub avg_tick_rate: f64,

    /// Average LLM call latency in ms
    pub avg_llm_latency_ms: f64,

    /// Times watchdog was triggered
    pub watchdog_triggers: u64,

    /// Total ticks executed
    pub total_ticks: u64,

    /// Fraction of ticks that failed (0.0 to 1.0)
    pub failure_rate: f64,

    /// Total execution time in ms
    pub total_execution_ms: f64,
}

Updating Metrics

visualizer.update_metrics(
    tick_duration,
    llm_latency,
    watchdog_triggered
).await;

Example Metrics

{
  "total_replans": 3,
  "avg_tick_rate": 850.0,
  "avg_llm_latency_ms": 250.5,
  "watchdog_triggers": 0,
  "total_ticks": 42,
  "failure_rate": 0.05,
  "total_execution_ms": 49.4
}
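The example values are consistent with avg_tick_rate being total_ticks divided by total execution time in seconds (42 / 0.0494 s ≈ 850 ticks/sec). That relationship is inferred from the numbers, not confirmed by the library, so the sketch below should be read as one way such aggregates could be derived. `MetricsAccumulator` and its methods are hypothetical names.

```rust
/// Hypothetical accumulator (not part of igris_btree) showing how the
/// aggregate values could be derived from per-tick observations.
#[derive(Debug, Default)]
pub struct MetricsAccumulator {
    pub total_ticks: u64,
    pub failed_ticks: u64,
    pub total_execution_ms: f64,
}

impl MetricsAccumulator {
    /// Records one tick with its duration and whether it failed.
    pub fn record_tick(&mut self, duration_ms: f64, failed: bool) {
        self.total_ticks += 1;
        if failed {
            self.failed_ticks += 1;
        }
        self.total_execution_ms += duration_ms;
    }

    /// Ticks per second of execution time; 0.0 before any time is recorded.
    pub fn avg_tick_rate(&self) -> f64 {
        if self.total_execution_ms > 0.0 {
            self.total_ticks as f64 / (self.total_execution_ms / 1000.0)
        } else {
            0.0
        }
    }

    /// Fraction of ticks that failed, in the range 0.0 to 1.0.
    pub fn failure_rate(&self) -> f64 {
        if self.total_ticks == 0 {
            0.0
        } else {
            self.failed_ticks as f64 / self.total_ticks as f64
        }
    }
}
```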

Tree Diffs

Lightweight diffs to reduce bandwidth.

TreeDiff

pub struct TreeDiff {
    /// Nodes that changed status
    pub status_changes: Vec<StatusChange>,

    /// Newly added nodes
    pub added_nodes: Vec<NodeId>,

    /// Removed nodes
    pub removed_nodes: Vec<NodeId>,

    /// Blackboard changes
    pub blackboard_changes: HashMap<String, BlackboardChange>,
}

pub struct StatusChange {
    pub node_id: NodeId,
    pub old_status: NodeStatus,
    pub new_status: NodeStatus,
}

pub enum BlackboardChange {
    Added(Value),
    Modified { old: Value, new: Value },
    Removed(Value),
}

Computing Diffs

// Enable diffs in config
let config = VisualizerConfig {
    compute_diffs: true,
    ..Default::default()
};

// Get diff since last snapshot
let diff = visualizer.compute_diff(&current_snapshot).await;

Example Diff

{
  "status_changes": [
    {
      "node_id": "root/2",
      "old_status": "Running",
      "new_status": "Success"
    }
  ],
  "added_nodes": [],
  "removed_nodes": [],
  "blackboard_changes": {
    "status": {
      "Modified": {
        "old": "running",
        "new": "completed"
      }
    }
  }
}
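The blackboard_changes part of a diff can be produced by comparing two key-value maps. This is a minimal sketch of that comparison under simplifying assumptions: values are plain `String`s instead of JSON `Value`s, and `diff_blackboard` is a hypothetical function, not the library's compute_diff.

```rust
use std::collections::HashMap;

/// Local stand-in for BlackboardChange with String values.
#[derive(Debug, Clone, PartialEq)]
pub enum BlackboardChange {
    Added(String),
    Modified { old: String, new: String },
    Removed(String),
}

/// Compares two blackboard states and reports only the keys that differ.
pub fn diff_blackboard(
    old: &HashMap<String, String>,
    new: &HashMap<String, String>,
) -> HashMap<String, BlackboardChange> {
    let mut changes = HashMap::new();

    // Keys present in the new state: either added or possibly modified.
    for (key, new_value) in new {
        match old.get(key) {
            None => {
                changes.insert(key.clone(), BlackboardChange::Added(new_value.clone()));
            }
            Some(old_value) if old_value != new_value => {
                changes.insert(
                    key.clone(),
                    BlackboardChange::Modified {
                        old: old_value.clone(),
                        new: new_value.clone(),
                    },
                );
            }
            Some(_) => {} // unchanged, omitted from the diff
        }
    }

    // Keys that disappeared from the new state.
    for (key, old_value) in old {
        if !new.contains_key(key) {
            changes.insert(key.clone(), BlackboardChange::Removed(old_value.clone()));
        }
    }

    changes
}
```

Unchanged keys never appear in the result, which is what keeps a diff smaller than a full snapshot.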

Performance Characteristics

Export Performance

Measured timings on typical hardware:

  • Simple tree (5 nodes): ~0.2ms
  • Medium tree (20 nodes): ~1.5ms
  • Large tree (100 nodes): ~4.8ms
  • Target: <5ms for all trees

Memory Usage

  • Trace entries: ~200 bytes each (default max 100 = 20KB)
  • Replan events: ~500 bytes each (default max 10 = 5KB)
  • Node stats: ~50 bytes per node
  • Total overhead: <100KB for typical trees
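The per-item figures above can be turned into a quick estimate for a given configuration. The sketch below only multiplies out the approximate sizes listed in this section; `estimated_overhead_bytes` is a hypothetical helper, and real usage depends on string lengths and allocator behavior.

```rust
/// Hypothetical estimator (not part of igris_btree) based on the
/// approximate per-item sizes quoted in this section.
pub fn estimated_overhead_bytes(
    max_trace_entries: usize,
    max_replan_events: usize,
    node_count: usize,
) -> usize {
    const TRACE_ENTRY_BYTES: usize = 200; // ~200 bytes per trace entry
    const REPLAN_EVENT_BYTES: usize = 500; // ~500 bytes per replan event
    const NODE_STATS_BYTES: usize = 50; // ~50 bytes of stats per node

    max_trace_entries * TRACE_ENTRY_BYTES
        + max_replan_events * REPLAN_EVENT_BYTES
        + node_count * NODE_STATS_BYTES
}
```

For the defaults quoted above (100 trace entries, 10 replan events) and a 100-node tree, this comes to 20,000 + 5,000 + 5,000 = 30,000 bytes, well under the stated 100KB.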

CPU Overhead

  • Export: ~0.1% of tick time
  • Metrics update: <0.01ms
  • Trace recording: <0.01ms
  • Impact: Negligible on execution

Optimization Techniques

  1. Rolling windows: Limit history to prevent unbounded growth
  2. Diff computation: Send only changes
  3. Blackboard filtering: Export only relevant keys
  4. Lazy serialization: JSON created on demand
  5. Arc<RwLock<>>: Concurrent read access
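Techniques 1 and 5 combine naturally: a bounded buffer behind a shared read-write lock. The sketch below uses `std::sync::RwLock` and a `VecDeque` to show the idea; `RollingTrace` is a hypothetical type, and the library itself may use an async lock or a different container.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, RwLock};

/// Hypothetical bounded history buffer: once `capacity` entries are
/// stored, each push evicts the oldest entry.
#[derive(Clone)]
pub struct RollingTrace<T> {
    inner: Arc<RwLock<VecDeque<T>>>,
    capacity: usize,
}

impl<T: Clone> RollingTrace<T> {
    pub fn new(capacity: usize) -> Self {
        Self {
            inner: Arc::new(RwLock::new(VecDeque::with_capacity(capacity))),
            capacity,
        }
    }

    /// Appends an entry, dropping the oldest one when the window is full.
    pub fn push(&self, entry: T) {
        if self.capacity == 0 {
            return; // a zero-sized window keeps nothing
        }
        let mut buf = self.inner.write().unwrap();
        if buf.len() == self.capacity {
            buf.pop_front();
        }
        buf.push_back(entry);
    }

    /// Copies the current window; readers share the lock with each other
    /// and only wait while a writer is active.
    pub fn snapshot(&self) -> Vec<T> {
        self.inner.read().unwrap().iter().cloned().collect()
    }
}
```

Cloning a `RollingTrace` clones only the `Arc`, so the executor and any exporter can hold handles to the same buffer.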

Dashboard Integration

WebSocket Streaming

Real-time updates via WebSocket:

// Client-side example
const ws = new WebSocket('ws://localhost:8080/btree/visualize');

ws.onmessage = (event) => {
  const snapshot: TreeSnapshot = JSON.parse(event.data);

  // Update UI
  renderTree(snapshot.root);
  updateMetrics(snapshot.metrics);
  appendTrace(snapshot.execution_trace);
  highlightReplans(snapshot.replan_events);
};

REST API Polling

Periodic polling for snapshots:

async function pollSnapshot() {
  const response = await fetch('/api/btree/snapshot');
  const snapshot: TreeSnapshot = await response.json();
  return snapshot;
}

// Poll every second
setInterval(async () => {
  const snapshot = await pollSnapshot();
  updateVisualization(snapshot);
}, 1000);

Diff Endpoint

Efficient updates via diffs:

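let lastTimestamp = 0;

async function getDiff() {
  // Capture the time before the request so changes made while it is
  // in flight are picked up by the next poll
  const requestedAt = Date.now();

  const response = await fetch(
    `/api/btree/diff?since=${lastTimestamp}`
  );
  const diff: TreeDiff = await response.json();

  // Apply diff to current state
  applyDiff(diff);

  lastTimestamp = requestedAt;
}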

Visualization Components

Tree Graph

Recommended: D3.js hierarchical layout

import * as d3 from 'd3';

function renderTree(root: NodeSnapshot) {
  // Assumes an <svg id="tree"> element exists on the page
  const svg = d3.select('#tree');

  const hierarchy = d3.hierarchy(root);
  const treeLayout = d3.tree<NodeSnapshot>().size([800, 600]);

  const nodes = treeLayout(hierarchy);

  // Color by status
  const colorMap: Record<string, string> = {
    'Running': '#FFD700',   // Yellow
    'Success': '#00FF00',   // Green
    'Failure': '#FF0000',   // Red
    'Skipped': '#CCCCCC',   // Gray
  };

  // Render nodes at the positions computed by the layout
  svg.selectAll('circle')
    .data(nodes.descendants())
    .join('circle')
    .attr('cx', d => d.x)
    .attr('cy', d => d.y)
    .attr('r', 10)
    .attr('fill', d => colorMap[d.data.status]);

  // Render labels next to each node
  svg.selectAll('text')
    .data(nodes.descendants())
    .join('text')
    .attr('x', d => d.x + 14)
    .attr('y', d => d.y + 4)
    .text(d => d.data.name);
}

Execution Timeline

Horizontal timeline showing tick progression:

// nodeIndex, statusColor, and showDetails are application-supplied helpers
function renderTimeline(trace: ExecutionTraceEntry[]) {
  const svg = d3.select('#timeline');

  svg.selectAll('rect')
    .data(trace)
    .join('rect')
    .attr('x', d => d.tick * 20)
    .attr('y', d => nodeIndex(d.node_id) * 30)
    .attr('width', 18)
    .attr('height', 25)
    .attr('fill', d => statusColor(d.status))
    .on('mouseover', showDetails);
}

Metrics Dashboard

Real-time metrics display:

function updateMetrics(metrics: MetricsSummary) {
  document.getElementById('tick-rate').textContent =
    `${metrics.avg_tick_rate.toFixed(1)} ticks/sec`;

  document.getElementById('replans').textContent =
    `${metrics.total_replans}`;

  document.getElementById('failure-rate').textContent =
    `${(metrics.failure_rate * 100).toFixed(2)}%`;

  document.getElementById('llm-latency').textContent =
    `${metrics.avg_llm_latency_ms.toFixed(1)}ms`;
}

Production Monitoring

Alerting

Set up alerts for critical metrics:

// `alert` stands in for your own notification hook (pager, chat, log line)
if snapshot.metrics.failure_rate > 0.1 {
    alert(&format!("High failure rate: {:.1}%", snapshot.metrics.failure_rate * 100.0));
}

if snapshot.metrics.total_replans > 10 {
    alert(&format!("Excessive replans: {}", snapshot.metrics.total_replans));
}

if snapshot.metrics.watchdog_triggers > 0 {
    alert(&format!("Watchdog triggered {} times", snapshot.metrics.watchdog_triggers));
}
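The threshold checks can also be collected into a single function that returns messages instead of sending them, which makes the rules easy to unit test. This is a sketch under stated assumptions: `AlertMetrics` is a reduced stand-in for MetricsSummary, `evaluate_alerts` is a hypothetical helper, and the thresholds are the ones used above.

```rust
/// Reduced stand-in for MetricsSummary with only the alerting fields.
pub struct AlertMetrics {
    pub failure_rate: f64,
    pub total_replans: u64,
    pub watchdog_triggers: u64,
}

/// Returns one message per threshold that is currently exceeded.
pub fn evaluate_alerts(metrics: &AlertMetrics) -> Vec<String> {
    let mut alerts = Vec::new();

    if metrics.failure_rate > 0.1 {
        alerts.push(format!(
            "High failure rate: {:.1}%",
            metrics.failure_rate * 100.0
        ));
    }
    if metrics.total_replans > 10 {
        alerts.push(format!("Excessive replans: {}", metrics.total_replans));
    }
    if metrics.watchdog_triggers > 0 {
        alerts.push(format!(
            "Watchdog triggered {} times",
            metrics.watchdog_triggers
        ));
    }

    alerts
}
```

The caller decides how to deliver the returned messages, so the same rules can feed logs, a pager, or a dashboard banner.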

Logging

Export snapshots to logs:

use tracing::info;

info!(
    "Tree execution snapshot: ticks={}, status={:?}, replans={}",
    snapshot.tick_count,
    snapshot.root.status,
    snapshot.metrics.total_replans
);

Time-Series Export

Export to Prometheus, InfluxDB, etc.:

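// Prometheus example: a sketch using the `metrics` facade crate (0.22+ macro
// syntax), assuming a Prometheus exporter such as `metrics-exporter-prometheus`
// has been installed as the recorder
metrics::gauge!("btree_tick_rate").set(snapshot.metrics.avg_tick_rate);
// total_replans is already cumulative, so set the counter rather than add to it
metrics::counter!("btree_replans").absolute(snapshot.metrics.total_replans);
metrics::histogram!("btree_llm_latency_ms").record(snapshot.metrics.avg_llm_latency_ms);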

Example: Complete Monitoring Setup

use igris_btree::prelude::*;
use igris_btree::visualizer::{TreeVisualizer, VisualizerConfig};
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Configure visualizer
    let viz_config = VisualizerConfig {
        enabled: true,
        max_trace_entries: 100,
        max_replan_events: 10,
        export_frequency: 5,  // Every 5 ticks
        compute_diffs: true,
        blackboard_filter: Some(vec![
            "status".to_string(),
            "mission_task".to_string(),
        ]),
    };

    let visualizer = Arc::new(TreeVisualizer::with_config(viz_config));

    // Create executor
    let executor = BTreeExecutor::new()
        .with_max_ticks(100)
        .with_tracing(true)
        .with_visualizer(visualizer.clone());

    // Build tree
    let llm = Arc::new(MockLlmProvider::with_navigation_plan());
    let mut context = BTreeContext::new().with_llm(llm);

    let mut tree = Sequence::new("monitored_mission")
        .add_child(Box::new(SetBlackboard::new("init", "status", "starting")))
        .add_child(Box::new(LLMPlannerNode::new("planner", "task", "plan")))
        .add_child(Box::new(SubtreeLoader::new("executor", "plan")))
        .add_child(Box::new(SetBlackboard::new("done", "status", "completed")));

    // Execute with monitoring
    let result = executor.execute(&mut tree, &mut context).await?;

    // Export final snapshot
    let snapshot = visualizer
        .export_snapshot(&tree, &context, result.tick_count)
        .await?;

    // Display metrics
    println!("\n=== Execution Metrics ===");
    println!("Total Ticks: {}", snapshot.metrics.total_ticks);
    println!("Tick Rate: {:.2} ticks/sec", snapshot.metrics.avg_tick_rate);
    println!("Replans: {}", snapshot.metrics.total_replans);
    println!("Failure Rate: {:.2}%", snapshot.metrics.failure_rate * 100.0);
    println!("LLM Latency: {:.2}ms", snapshot.metrics.avg_llm_latency_ms);

    // Save snapshot
    let json = serde_json::to_string_pretty(&snapshot)?;
    std::fs::write("visualization_snapshot.json", json)?;

    println!("\n✓ Snapshot saved to visualization_snapshot.json");

    Ok(())
}

Summary

Key takeaways:

  • Real-time observability: Complete tree state export
  • Execution tracing: Tick-by-tick history
  • Metrics collection: Performance and reliability tracking
  • Replan tracking: Monitor LLM replanning events
  • Performance optimized: <5ms overhead
  • Dashboard ready: JSON export for WebSocket/REST
  • Production features: Alerting, logging, time-series export