phi-core

Simple, effective agent loop in Rust.

phi-core is a library for building LLM-powered agents that can use tools. It provides the core loop — prompt the model, execute tool calls, feed results back — and gets out of your way.

Philosophy

The loop is the product. An agent is just a loop: send messages to an LLM, get back text and tool calls, execute the tools, repeat until the model stops. phi-core implements this loop with streaming, cancellation, context management, and multi-provider support — so you don't have to.

Features

  • Streaming events — Real-time AgentEvent stream for UI updates (text deltas, thinking, tool execution)
  • Multi-provider — Anthropic, OpenAI, Google Gemini, Amazon Bedrock, Azure OpenAI, and any OpenAI-compatible API
  • Tool system — AgentTool trait with built-in coding tools (bash, file read/write/edit, search)
  • Context management — Automatic token estimation, tiered compaction (truncate tool outputs → summarize → drop old messages)
  • Execution limits — Max turns, tokens, and wall-clock time
  • Steering & follow-ups — Interrupt the agent mid-run or queue work for after it finishes
  • Cancellation — CancellationToken-based abort at any point
  • Builder pattern — Ergonomic BasicAgent struct with chainable configuration; Agent trait for polymorphism
  • Config-driven construction — TOML/JSON/YAML config → agent_from_config() → Arc<dyn Agent>
  • Session persistence — SessionRecorder materializes structured session/loop/turn records from events
  • Sub-agents — Delegate tasks to child agent loops via SubAgentTool
  • MCP integration — Connect to external tool servers via Model Context Protocol (stdio + HTTP)
  • Evaluational parallelism — Run N configs concurrently, select the best result via EvaluationStrategy

Ecosystem

phi-core is part of the LazyBouy ecosystem. It powers the agent backend for Phi applications.

Installation

Requirements

  • Rust 2021 edition (1.75+)
  • Tokio async runtime

Add to Cargo.toml

[dependencies]
phi-core = "0.8"

Dependencies

phi-core brings in these key dependencies automatically:

| Crate | Purpose |
|---|---|
| tokio | Async runtime (full features) |
| serde / serde_json | Serialization |
| reqwest | HTTP client for provider APIs |
| reqwest-eventsource | SSE streaming |
| async-trait | Async trait support |
| tokio-util | CancellationToken |
| thiserror | Error types |
| tracing | Logging |

Feature Flags

All providers and built-in tools are included by default. Optional features:

| Feature | Dependencies | Description |
|---|---|---|
| openapi | openapiv3, serde_yaml | Auto-generate tools from OpenAPI 3.0 specs |

Enable in Cargo.toml:

[dependencies]
phi-core = { version = "0.8", features = ["openapi"] }

Quick Start

Basic Example with Anthropic

use phi_core::{BasicAgent, AgentEvent, StreamDelta};
use phi_core::provider::ModelConfig;
use phi_core::tools::default_tools;

#[tokio::main]
async fn main() {
    let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
    let mut agent = BasicAgent::new(ModelConfig::anthropic(
        "claude-sonnet-4-20250514",
        "Claude Sonnet 4",
        &api_key,
    ))
    .with_system_prompt("You are a helpful coding assistant.")
    .with_tools(default_tools());

    let mut rx = agent.prompt("List the files in the current directory").await;

    while let Some(event) = rx.recv().await {
        match event {
            AgentEvent::MessageUpdate { delta, .. } => match delta {
                StreamDelta::Text { delta } => print!("{}", delta),
                StreamDelta::Thinking { delta } => print!("[thinking] {}", delta),
                _ => {}
            },
            AgentEvent::ToolExecutionStart { tool_name, .. } => {
                println!("\n→ Running tool: {}", tool_name);
            }
            AgentEvent::ToolExecutionEnd { tool_name, is_error, .. } => {
                if is_error {
                    println!("  ✗ {} failed", tool_name);
                } else {
                    println!("  ✓ {} done", tool_name);
                }
            }
            AgentEvent::AgentEnd { .. } => {
                println!("\n\nDone.");
            }
            _ => {}
        }
    }
}

Example with OpenAI-Compatible Provider

For OpenAI, xAI, Groq, or any compatible API, use ModelConfig::openai() or ModelConfig::local():

use phi_core::{BasicAgent, AgentEvent, StreamDelta};
use phi_core::provider::ModelConfig;
use phi_core::tools::default_tools;

#[tokio::main]
async fn main() {
    let api_key = std::env::var("OPENAI_API_KEY").unwrap();
    let mut agent = BasicAgent::new(ModelConfig::openai("gpt-4o", "GPT-4o", &api_key))
        .with_system_prompt("You are a helpful assistant.")
        .with_tools(default_tools());

    let mut rx = agent.prompt("What is 2 + 2?").await;

    while let Some(event) = rx.recv().await {
        match event {
            AgentEvent::MessageUpdate { delta, .. } => {
                if let StreamDelta::Text { delta } = delta {
                    print!("{}", delta);
                }
            }
            AgentEvent::AgentEnd { .. } => println!(),
            _ => {}
        }
    }
}

Real-Time Streaming

By default, agent.prompt() blocks until the loop finishes and returns a receiver with all events buffered. To consume events in real-time, use prompt_with_sender() with a caller-provided channel:

use phi_core::{BasicAgent, AgentEvent, StreamDelta};
use phi_core::provider::ModelConfig;
use phi_core::tools::default_tools;

#[tokio::main]
async fn main() {
    let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
    let mut agent = BasicAgent::new(ModelConfig::anthropic(
        "claude-sonnet-4-20250514",
        "Claude Sonnet 4",
        &api_key,
    ))
    .with_system_prompt("You are a helpful assistant.")
    .with_tools(default_tools());

    let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel();

    // Consume events in real-time on a separate task
    tokio::spawn(async move {
        while let Some(event) = rx.recv().await {
            match event {
                AgentEvent::MessageUpdate { delta, .. } => {
                    if let StreamDelta::Text { delta } = delta {
                        print!("{}", delta);
                    }
                }
                AgentEvent::AgentEnd { .. } => println!(),
                _ => {}
            }
        }
    });

    // This blocks until the loop finishes; state is restored automatically
    agent.prompt_with_sender("What is 2 + 2?", tx).await;

    // Agent is ready for another prompt immediately
    let _rx = agent.prompt("Follow up question").await;
}

Using the Low-Level API

For more control, use agent_loop() directly:

use phi_core::agent_loop::{agent_loop, AgentLoopConfig};
use phi_core::provider::ModelConfig;
use phi_core::types::*;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel();
    let cancel = CancellationToken::new();

    let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

    let mut context = AgentContext {
        system_prompt: "You are helpful.".into(),
        messages: Vec::new(),
        tools: phi_core::tools::default_tools(),
        ..Default::default()
    };

    let config = AgentLoopConfig {
        model_config: ModelConfig::anthropic(
            "claude-sonnet-4-20250514",
            "Claude Sonnet 4",
            &api_key,
        ),
        thinking_level: ThinkingLevel::Off,
        max_tokens: None,
        temperature: None,
        convert_to_llm: None,
        transform_context: None,
        get_steering_messages: None,
        get_follow_up_messages: None,
        context_config: None,
        execution_limits: None,
        cache_config: CacheConfig::default(),
        tool_execution: ToolExecutionStrategy::default(),
        retry_config: phi_core::RetryConfig::default(),
        before_turn: None,
        after_turn: None,
        on_error: None,
        input_filters: vec![],
        ..Default::default()
    };

    let prompts = vec![AgentMessage::Llm(Message::user("Hello!"))];
    let new_messages = agent_loop(prompts, &mut context, &config, tx, cancel).await;

    // Drain events
    while let Ok(_event) = rx.try_recv() {
        // handle events...
    }

    println!("Got {} new messages", new_messages.len());
}

The Agent Loop

The agent loop is the core of phi-core. It implements the fundamental cycle:

User prompt → LLM call → Tool execution → LLM call → ... → Final response

The agent_loop module contains the core loop logic in mod.rs and the evaluation sub-module for evaluational parallelism strategies.

How It Works

┌──────────────────────────────────────────────┐
│                  agent_loop()                │
│                                              │
│  1. Add prompts to context                   │
│  2. Emit AgentStart + TurnStart              │
│                                              │
│  ┌─────────── Inner Loop ──────────────┐     │
│  │  • Check steering messages          │     │
│  │  • Check execution limits           │     │
│  │  • Compact context (if configured)  │     │
│  │  • Stream LLM response              │     │
│  │  • Extract tool calls               │     │
│  │  • Execute tools (with steering)    │     │
│  │  • Emit TurnEnd                     │     │
│  │  • Continue if tool_calls or steer  │     │
│  └─────────────────────────────────────┘     │
│                                              │
│  3. Check follow-up messages                 │
│  4. If follow-ups exist, loop again          │
│  5. Emit AgentEnd                            │
└──────────────────────────────────────────────┘
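
The control flow in the diagram can be sketched with a mock model and the standard library only. This is an illustrative model of the cycle, not phi-core's implementation: `ModelReply`, `run_loop`, and the string-based transcript are hypothetical stand-ins, and steering, compaction, and events are left out.

```rust
use std::collections::VecDeque;

/// What the (mock) model returns for one turn. Hypothetical type.
struct ModelReply {
    text: String,
    tool_calls: Vec<String>,
}

/// Simplified agent loop: call the model, run any requested tools, repeat.
/// When the model stops asking for tools, queued follow-ups restart the loop.
fn run_loop(
    prompt: &str,
    mut model: impl FnMut(&[String]) -> ModelReply,
    mut run_tool: impl FnMut(&str) -> String,
    follow_ups: &mut VecDeque<String>,
    max_turns: usize,
) -> Vec<String> {
    let mut context = vec![format!("user: {prompt}")];
    let mut turns = 0;
    loop {
        // Inner loop: one iteration per turn (one model call).
        loop {
            // Execution limit: stop once the turn budget is spent.
            if turns >= max_turns {
                return context;
            }
            turns += 1;
            let reply = model(&context);
            context.push(format!("assistant: {}", reply.text));
            if reply.tool_calls.is_empty() {
                break; // the model produced a final response
            }
            for call in &reply.tool_calls {
                context.push(format!("tool[{call}]: {}", run_tool(call)));
            }
        }
        // Outer loop: continue only if follow-up work is queued.
        match follow_ups.pop_front() {
            Some(msg) => context.push(format!("user: {msg}")),
            None => return context,
        }
    }
}
```

With a mock model that requests one tool after each user message and one queued follow-up, the transcript holds two full prompt, tool, answer cycles before the function returns.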

Entry Points

agent_loop()

Starts a new agent run with prompt messages:

pub async fn agent_loop(
    prompts: Vec<AgentMessage>,
    context: &mut AgentContext,
    config: &AgentLoopConfig,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>

The prompts are added to context, then the loop runs. Returns all new messages generated during the run.

agent_loop_continue()

Resumes from existing context (e.g., after an error, retry, or branch):

pub async fn agent_loop_continue(
    context: &mut AgentContext,
    config: &AgentLoopConfig,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>

Preconditions: context.agent_id and context.session_id must be Some — the function panics with a descriptive message otherwise. In practice, any context that passed through agent_loop() at least once already has these set. When constructing a context manually (e.g., from a persisted snapshot), set them explicitly before calling this function.

The last message in context must also not be an assistant message.
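
Because a violated precondition panics, it can help to validate a manually built context before resuming. The check below is a sketch under assumptions: `ContextSnapshot` and `Role` are hypothetical simplifications of the real context type, and the non-empty requirement is taken from the parallel-mode notes later in this guide.

```rust
/// Hypothetical, simplified message role.
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
    ToolResult,
}

/// Hypothetical stand-in for the fields the preconditions refer to.
struct ContextSnapshot {
    agent_id: Option<String>,
    session_id: Option<String>,
    roles: Vec<Role>,
}

/// Returns Ok(()) when the snapshot satisfies the documented preconditions.
fn check_can_continue(ctx: &ContextSnapshot) -> Result<(), String> {
    if ctx.agent_id.is_none() {
        return Err("agent_id is not set".into());
    }
    if ctx.session_id.is_none() {
        return Err("session_id is not set".into());
    }
    match ctx.roles.last() {
        None => Err("context has no messages".into()),
        Some(Role::Assistant) => Err("last message is an assistant message".into()),
        Some(_) => Ok(()),
    }
}
```

Returning a `Result` lets the caller repair a restored snapshot (for example by setting the missing id) without crashing the process.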

AgentLoopConfig

pub struct AgentLoopConfig {
    /// REQUIRED — complete provider identity: model id, api_key, base_url, protocol, cost rates.
    pub model_config: ModelConfig,
    /// Optional override — bypasses ProviderRegistry, used for MockProvider in tests.
    pub provider_override: Option<Arc<dyn StreamProvider>>,
    pub config_id: Option<String>,
    pub thinking_level: ThinkingLevel,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub convert_to_llm: Option<ConvertToLlmFn>,
    pub transform_context: Option<TransformContextFn>,
    pub get_steering_messages: Option<GetMessagesFn>,
    pub get_follow_up_messages: Option<GetMessagesFn>,
    pub context_config: Option<ContextConfig>,
    pub execution_limits: Option<ExecutionLimits>,
    pub cache_config: CacheConfig,
    pub tool_execution: ToolExecutionStrategy,
    pub retry_config: RetryConfig,
    pub before_loop: Option<BeforeLoopFn>,
    pub after_loop: Option<AfterLoopFn>,
    pub before_turn: Option<BeforeTurnFn>,
    pub after_turn: Option<AfterTurnFn>,
    pub on_error: Option<OnErrorFn>,
    pub before_tool_execution: Option<BeforeToolExecutionFn>,
    pub after_tool_execution: Option<AfterToolExecutionFn>,
    pub before_tool_execution_update: Option<BeforeToolExecutionUpdateFn>,
    pub after_tool_execution_update: Option<AfterToolExecutionUpdateFn>,
    pub before_compaction_start: Option<BeforeCompactionStartFn>,
    pub after_compaction_end: Option<AfterCompactionEndFn>,
    pub input_filters: Vec<Arc<dyn InputFilter>>,
    pub first_turn_trigger: TurnTrigger,
    pub context_translation: Option<Arc<dyn ContextTranslationStrategy>>,
    pub prun_pending: Option<Arc<Mutex<Vec<PrunRequest>>>>,
}

| Field | Purpose |
|---|---|
| model_config | Required. Complete provider identity: model id, api_key, base_url, api protocol, cost rates, compat flags. The provider is resolved from model_config.api via ProviderRegistry. |
| provider_override | Custom Arc<dyn StreamProvider> — bypasses registry when Some. Used for MockProvider in tests or fully custom backends. |
| config_id | Optional stable identity for this config; auto-derived as "{provider_id}.{model_slug}[.thinking]" when None. Used as the middle segment of loop_id. |
| thinking_level | Off, Minimal, Low, Medium, High |
| convert_to_llm | Custom AgentMessage[] → Message[] conversion |
| transform_context | Pre-processing hook for context pruning |
| get_steering_messages | Returns user interruptions during tool execution |
| get_follow_up_messages | Returns queued work after agent would stop |
| context_config | Token budget and compaction settings |
| execution_limits | Max turns, tokens, duration |
| cache_config | Prompt caching behavior (see Prompt Caching) |
| tool_execution | Parallel, Sequential, or Batched (see Tools) |
| retry_config | Retry behavior for transient errors (see Retry) |
| before_loop | Called once before AgentStart; return false to abort the entire run (see Callbacks) |
| after_loop | Called once after AgentEnd with all new messages and accumulated usage (see Callbacks) |
| before_turn | Called before each LLM call; return false to abort (see Callbacks) |
| after_turn | Called after each turn with messages and usage (see Callbacks) |
| on_error | Called on StopReason::Error with the error string (see Callbacks) |
| before_tool_execution | Called before each tool call; return false to skip it (see Callbacks) |
| after_tool_execution | Called after each tool call completes (see Callbacks) |
| before_tool_execution_update | Called before each streaming tool update; return false to suppress the event (see Callbacks) |
| after_tool_execution_update | Called after each streaming tool update event (see Callbacks) |
| before_compaction_start | Called before compaction starts with (estimated_tokens, message_count); return false to skip compaction for this cycle (see Callbacks) |
| after_compaction_end | Called after compaction completes with (messages_before, messages_after, tokens_before, tokens_after) (see Callbacks) |
| input_filters | Input filters applied to user messages before the LLM call (see Tools) |
| first_turn_trigger | The TurnTrigger for the first TurnStart event; defaults to TurnTrigger::User, set to SubAgent by sub-agent callers |
| context_translation | Optional ContextTranslationStrategy for cross-provider compatibility — translates content types (e.g., Content::Thinking) when targeting a different provider (G8) |
| prun_pending | Shared state for PrunTool to communicate pruning requests to the loop; set automatically by with_prun_tool() |

0.9.0 — async lifecycle hooks. BeforeLoopFn, AfterLoopFn, BeforeTurnFn, AfterTurnFn, OnErrorFn, BeforeToolExecutionFn, AfterToolExecutionFn, BeforeCompactionStartFn, and AfterCompactionEndFn are now async — their function bodies return Pin<Box<dyn Future<Output = T> + Send>> (alias: HookFuture<'_, T>). Sync closure bodies migrate by wrapping in Box::pin(async move { ... }). Closures can now .await LLM calls and other async work directly without a tokio::task::block_in_place bridge.

Pre-existing-behaviour preservation note (phi-core 0.9.0): tool-update hooks stay sync. BeforeToolExecutionUpdateFn and AfterToolExecutionUpdateFn remain Arc<dyn Fn(&str, &str, &str) -> bool + Send + Sync> / Arc<dyn Fn(&str, &str, &str) + Send + Sync> respectively. Async-ifying them would cascade into the ToolUpdateFn callback type and every AgentTool::execute body that invokes ctx.on_update(...) — a materially wider migration than the 0.9.0 scope. The veto decision in BeforeToolExecutionUpdateFn is synchronous so the surrounding emit gate works without an .await at every streamed tool-update; consumers that need async work at update-time should dispatch via tokio::spawn(...) inside the sync closure body. Tracked under the CHANGELOG [Unreleased] "Forward markers" section for a future release. InputFilter::filter() is also now async fn via #[async_trait] — see Tools and the per-turn debug-capture surface at debugging.md.
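
The `Box::pin(async move { ... })` migration described in the 0.9.0 note can be sketched with the standard library alone. The alias and hook type below are assumptions modelled on the description above (`BeforeTurnHook` and its `usize` argument are hypothetical, not phi-core's real signature), and `poll_once` stands in for an async runtime only because this hook body never suspends.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Assumed shape of the alias mentioned above; the crate's real definition may differ.
type HookFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

// Hypothetical hook type: takes a turn number and resolves to `false` to abort.
type BeforeTurnHook = Arc<dyn Fn(usize) -> HookFuture<'static, bool> + Send + Sync>;

/// The old synchronous body: allow only the first three turns.
fn allow_turn(turn: usize) -> bool {
    turn < 3
}

/// The migrated hook: same logic, wrapped in `Box::pin(async move { ... })`.
fn make_before_turn_hook() -> BeforeTurnHook {
    Arc::new(|turn: usize| -> HookFuture<'static, bool> {
        Box::pin(async move { allow_turn(turn) })
    })
}

/// Waker that does nothing; sufficient for futures that complete on first poll.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once and returns its value if it is already ready.
fn poll_once<T>(mut fut: HookFuture<'_, T>) -> Option<T> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

The explicit closure return type matters: without it the closure returns a pinned box of the concrete async block, which does not coerce to the boxed trait object after the fact.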

Steering & Follow-Ups

Steering

Steering messages interrupt the agent between tool executions. When the agent is executing multiple tool calls from a single LLM response, steering is checked after each tool completes. If a steering message is found:

  1. The current tool finishes normally
  2. All remaining tool calls are skipped with is_error: true and "Skipped due to queued user message"
  3. The steering message is injected into context
  4. The loop continues with a new LLM call that sees the interruption

// While agent is running tools, redirect it:
agent.steer(AgentMessage::Llm(Message::user("Stop that. Instead, explain what you found.")));
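
The four steering steps can be modelled in a few lines. This is a sketch of the described behaviour, not the library's code: `ToolResult` and `execute_with_steering` are hypothetical names, and tools are represented as plain strings.

```rust
/// Hypothetical, simplified tool result.
#[derive(Debug, PartialEq)]
struct ToolResult {
    tool: String,
    output: String,
    is_error: bool,
}

/// Runs the tool calls from one model response, checking for a steering
/// message after each tool completes. Once a steering message arrives, the
/// remaining calls are recorded as skipped errors instead of being executed.
fn execute_with_steering(
    tool_calls: &[&str],
    mut run_tool: impl FnMut(&str) -> String,
    mut poll_steering: impl FnMut() -> Option<String>,
) -> (Vec<ToolResult>, Option<String>) {
    let mut results = Vec::new();
    let mut steer: Option<String> = None;
    for call in tool_calls {
        if steer.is_some() {
            results.push(ToolResult {
                tool: call.to_string(),
                output: "Skipped due to queued user message".to_string(),
                is_error: true,
            });
            continue;
        }
        // The current tool always finishes normally.
        results.push(ToolResult {
            tool: call.to_string(),
            output: run_tool(call),
            is_error: false,
        });
        // Steering is checked only between tool executions.
        steer = poll_steering();
    }
    // The caller injects `steer` into context before the next model call.
    (results, steer)
}
```

Every tool call still gets a result entry, so the next model call sees which calls ran and which were skipped.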

Follow-Ups

Follow-up messages are checked after the agent would normally stop (no more tool calls, no steering). If follow-ups exist, the loop continues with them as new input — the agent doesn't need to be re-prompted.

// Queue work for after the agent finishes its current task:
agent.follow_up(AgentMessage::Llm(Message::user("Now run the tests.")));
agent.follow_up(AgentMessage::Llm(Message::user("Then commit the changes.")));

Queue Modes

Both queues support two delivery modes:

| Mode | Behavior |
|---|---|
| QueueMode::OneAtATime | Delivers one message per turn (default) |
| QueueMode::All | Delivers all queued messages at once |

agent.set_steering_mode(QueueMode::All);
agent.set_follow_up_mode(QueueMode::OneAtATime);
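
The difference between the two modes comes down to how much of the queue is drained per check. A minimal sketch, assuming a simple FIFO queue (`Mode` and `deliver` are hypothetical names, not the library's types):

```rust
use std::collections::VecDeque;

/// Hypothetical mirror of the two delivery modes.
#[derive(Clone, Copy)]
enum Mode {
    OneAtATime,
    All,
}

/// Takes the messages that should be delivered on this check.
fn deliver(queue: &mut VecDeque<String>, mode: Mode) -> Vec<String> {
    match mode {
        // Pop only the oldest message; the rest wait for later checks.
        Mode::OneAtATime => queue.pop_front().into_iter().collect(),
        // Drain everything in FIFO order.
        Mode::All => queue.drain(..).collect(),
    }
}
```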

Queue Management

agent.clear_steering_queue();   // Drop all pending steers
agent.clear_follow_up_queue();  // Drop all pending follow-ups
agent.clear_all_queues();       // Drop everything

Low-Level API

When using agent_loop() directly, steering and follow-ups are provided via callback functions:

let config = AgentLoopConfig {
    get_steering_messages: Some(Box::new(|| {
        // Return Vec<AgentMessage> — checked between tool calls
        vec![]
    })),
    get_follow_up_messages: Some(Box::new(|| {
        // Return Vec<AgentMessage> — checked when agent would stop
        vec![]
    })),
    // Set model_config and any other fields here; the rest use defaults.
    ..Default::default()
};

Custom Compaction

By default, when context exceeds the token budget in ContextConfig, phi-core runs a 3-level compaction strategy: truncate tool outputs → summarize old turns → drop middle messages (legacy in-memory path via compact_messages()). When a Session is available, the modern system uses non-destructive CompactionBlock overlays — see compaction. You can replace this with your own CompactionStrategy.
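
The three tiers can be illustrated with a small standalone model. Everything here is an assumption made for illustration: the `Msg` type, the four-characters-per-token estimate, the character thresholds, and the "summary" tier, which simply shortens old assistant messages rather than producing a real summary. It does not reproduce `compact_messages()`.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Msg {
    User(String),
    Assistant(String),
    Tool(String),
}

impl Msg {
    fn text(&self) -> &str {
        match self {
            Msg::User(t) | Msg::Assistant(t) | Msg::Tool(t) => t.as_str(),
        }
    }
}

/// Rough estimate: about four characters per token (assumed heuristic).
fn estimate_tokens(msgs: &[Msg]) -> usize {
    msgs.iter().map(|m| m.text().chars().count() / 4 + 1).sum()
}

/// Shortens `text` to `keep` characters plus a marker, only if that saves space.
fn shorten(text: &mut String, keep: usize, marker: &str) {
    if text.chars().count() > keep + marker.len() {
        *text = text.chars().take(keep).collect::<String>() + marker;
    }
}

/// Applies each tier in turn and stops as soon as the budget is met.
fn compact(mut msgs: Vec<Msg>, budget: usize, keep_first: usize, keep_last: usize) -> Vec<Msg> {
    if estimate_tokens(&msgs) <= budget {
        return msgs;
    }
    // Tier 1: truncate tool outputs.
    for m in msgs.iter_mut() {
        if let Msg::Tool(t) = m {
            shorten(t, 40, " [truncated]");
        }
    }
    if estimate_tokens(&msgs) <= budget {
        return msgs;
    }
    // Tier 2: stand-in for summarization of old turns in the middle.
    let start = keep_first.min(msgs.len());
    let end = msgs.len().saturating_sub(keep_last).max(start);
    for m in &mut msgs[start..end] {
        if let Msg::Assistant(t) = m {
            shorten(t, 20, " [summarized]");
        }
    }
    if estimate_tokens(&msgs) <= budget {
        return msgs;
    }
    // Tier 3: drop the middle, keeping the first and last messages.
    msgs.drain(start..end);
    msgs
}
```

Ordering the tiers from least to most destructive means a mildly oversized context loses only bulky tool output, and whole messages are dropped only as a last resort.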

CompactionStrategy vs BlockCompactionStrategy

  • CompactionStrategy — Legacy in-memory approach. Destructive: it mutates the message list directly. Used when AgentContext.session is None (no session persistence).
  • BlockCompactionStrategy — New overlay approach. Non-destructive: it creates a CompactionBlock on the LoopRecord rather than altering the original messages. Used when AgentContext.session is Some (session-backed execution). Original messages remain authoritative for replay and branching.

Example of a custom CompactionStrategy:

use phi_core::context::{CompactionStrategy, ContextConfig, CompactionConfig, compact_messages};
use phi_core::types::*;
use std::sync::Arc;

struct MyCompaction;

impl CompactionStrategy for MyCompaction {
    fn compact(
        &self,
        messages: Vec<AgentMessage>,
        config: &ContextConfig,
    ) -> Vec<AgentMessage> {
        // Your logic here — then optionally delegate to the default:
        compact_messages(messages, config)
    }
}

// Modern pattern: set strategies via ContextConfig.compaction
let context_config = ContextConfig {
    compaction: CompactionConfig {
        // in_memory_strategy: used when AgentContext.session is None (sub-agents, tests)
        in_memory_strategy: Some(Arc::new(MyCompaction)),
        // block_strategy: used when AgentContext.session is Some (session-backed execution)
        // block_strategy: Some(Arc::new(MyBlockCompaction)),
        ..CompactionConfig::default()
    },
    ..ContextConfig::default()
};

let agent = BasicAgent::new(model_config)
    .with_context_config(context_config);

The in-memory strategy is called once per turn, right before the LLM call, whenever context_config is Some and AgentContext.session is None. When in_memory_strategy is None, DefaultCompaction (which wraps compact_messages()) is used automatically. When a session is present, block_strategy is used instead (defaulting to DefaultBlockCompaction).

Use Cases

Memory-aware compaction — Index messages into a vector store before they're dropped, so the agent can recall them later via a search tool:

// `MemoryStore` is a trait you define yourself (for example, a vector-store client);
// it is not shown here.
struct MemoryAwareCompaction {
    memory: Arc<dyn MemoryStore>,
}

impl CompactionStrategy for MemoryAwareCompaction {
    fn compact(
        &self,
        messages: Vec<AgentMessage>,
        config: &ContextConfig,
    ) -> Vec<AgentMessage> {
        // Keep a copy of the input so dropped messages can be identified afterwards.
        let compacted = compact_messages(messages.clone(), config);

        // Index what was dropped: anything in the input that is absent from the output.
        let dropped: Vec<_> = messages.iter()
            .filter(|m| !compacted.contains(m))
            .collect();
        if !dropped.is_empty() {
            self.memory.index(dropped);
        }

        compacted
    }
}

Semantic pointer compaction — Replace dropped messages with a marker so the agent knows context was lost:

struct SemanticPointerCompaction;

impl CompactionStrategy for SemanticPointerCompaction {
    fn compact(
        &self,
        messages: Vec<AgentMessage>,
        config: &ContextConfig,
    ) -> Vec<AgentMessage> {
        let compacted = compact_messages(messages.clone(), config);
        let dropped_count = messages.len() - compacted.len();

        if dropped_count == 0 {
            return compacted;
        }

        // Insert a marker after the first kept messages
        let mut result = compacted;
        let insert_at = config.compaction.keep_first_turns.min(result.len());
        result.insert(insert_at, AgentMessage::Extension(
            ExtensionMessage::new("compaction_marker", serde_json::json!({
                "dropped": dropped_count,
                "note": format!("{} earlier messages were compacted", dropped_count),
            }))
        ));
        result
    }
}

Priority-preserving compaction — Never drop messages containing important keywords:

struct PriorityPreservingCompaction {
    preserve_keywords: Vec<String>,
}

impl CompactionStrategy for PriorityPreservingCompaction {
    fn compact(
        &self,
        messages: Vec<AgentMessage>,
        config: &ContextConfig,
    ) -> Vec<AgentMessage> {
        // `is_priority` is a helper you define on this type (not shown), e.g. one
        // that checks a message's text against `preserve_keywords`.
        let (priority, normal): (Vec<_>, Vec<_>) = messages.into_iter()
            .partition(|m| self.is_priority(m));

        let mut compacted = compact_messages(normal, config);

        // Re-insert priority messages — they're never dropped.
        // Note: appending moves them to the end of the list; track original
        // positions instead if chronological order matters.
        for msg in priority {
            compacted.push(msg);
        }
        compacted
    }
}

Evaluational Parallelism

agent_loop_parallel runs the same prompt through multiple AgentLoopConfigs concurrently, evaluates the results with a pluggable EvaluationStrategy, and returns the winning branch. This is useful for multi-model comparison, A/B prompt testing, and selecting the best response among different reasoning approaches.

use phi_core::{agent_loop_parallel, PickFirstEvaluation, AgentContext, AgentLoopConfig};
use std::sync::Arc;

let result = agent_loop_parallel(
    prompts,
    base_context,           // cloned per branch; Arc tools shared
    vec![config_a, config_b],
    Arc::new(PickFirstEvaluation),
    tx,
    cancel,
).await;

// result.selected_context feeds directly into agent_loop_continue()
// result.selected_messages is the winning branch's output

See Evaluational Parallelism for the full guide including built-in strategies, the LLM judge, and session continuity.

Evaluational Parallelism

Evaluational parallelism runs the same prompt through multiple AgentLoopConfigs concurrently, evaluates the results with a pluggable strategy, and delivers the single best outcome. This lets you compare models, prompt variants, or reasoning settings in one call — then continue the session normally with the winner.

Overview

             ┌─ Config A ─► Branch A ─► response A ─┐
prompt ──────┤                                        ├─► Evaluate ─► selected response
             └─ Config B ─► Branch B ─► response B ─┘

Every branch receives an identical copy of the base context (message history, tools) and the same prompt. Branches run concurrently. After all branches finish, the EvaluationStrategy picks the winner and returns its context and messages.
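
The fan-out and selection shape can be sketched with scoped threads from the standard library. This is a conceptual model only: phi-core runs branches as async tasks, and `Ctx`, `Outcome`, `run_branches`, and the fake per-config reply are hypothetical.

```rust
use std::thread;

/// Hypothetical, simplified context: just a message transcript.
#[derive(Clone, Debug)]
struct Ctx {
    messages: Vec<String>,
}

/// Hypothetical result of one branch.
struct Outcome {
    index: usize,
    context: Ctx,
    tokens_used: usize,
}

/// Clones the base context once per config, runs all branches concurrently,
/// then lets `select` choose the winning outcome.
fn run_branches(
    base: &Ctx,
    prompt: &str,
    configs: &[&str],
    select: impl Fn(&[Outcome]) -> usize,
) -> Outcome {
    let mut outcomes: Vec<Outcome> = thread::scope(|s| {
        let handles: Vec<_> = configs
            .iter()
            .enumerate()
            .map(|(index, cfg)| {
                // Each branch starts from an identical, independent copy.
                let mut context = base.clone();
                s.spawn(move || {
                    context.messages.push(format!("user: {prompt}"));
                    // Stand-in for a model call made with this branch's config.
                    let reply = format!("[{cfg}] answer");
                    let tokens_used = reply.len();
                    context.messages.push(format!("assistant: {reply}"));
                    Outcome { index, context, tokens_used }
                })
            })
            .collect();
        // Evaluation happens only after every branch has finished.
        handles
            .into_iter()
            .map(|h| h.join().expect("branch panicked"))
            .collect()
    });
    let winner = select(&outcomes);
    outcomes.swap_remove(winner)
}
```

Because each branch owns its clone, the base context is untouched afterwards, and the winner's context already contains the prompt and the response.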

When to use evaluational parallelism vs. parallel sub-agents

| | Evaluational parallelism | Parallel sub-agents |
|---|---|---|
| Task structure | Same task, different configs | Different subtasks |
| Context shared | Yes (cloned base context) | No (isolated child contexts) |
| Result | One selected outcome | All results merged |
| Typical use | Multi-model comparison, A/B prompts | Divide-and-conquer work |

Entry point

pub async fn agent_loop_parallel(
    prompts: Vec<AgentMessage>,
    base_context: AgentContext,           // cloned once per config
    configs: Vec<AgentLoopConfig>,        // one per branch
    strategy: Arc<dyn EvaluationStrategy>,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> ParallelLoopResult

base_context is cloned once per config entry — tools are Arc-shared (zero copy); the message history is deep-cloned so branches start from identical state but diverge independently.

Minimal example

use phi_core::{agent_loop_parallel, PickFirstEvaluation, AgentContext, AgentLoopConfig};
use phi_core::provider::ModelConfig;
use phi_core::types::*; // AgentMessage, Message
use std::sync::Arc;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

// Argument order follows the Quick Start: model id, display name, API key.
let config_a = AgentLoopConfig {
    model_config: ModelConfig::anthropic("claude-opus-4-6", "claude-opus-4-6", "my-key"),
    ..AgentLoopConfig::default()
};
let config_b = AgentLoopConfig {
    model_config: ModelConfig::anthropic("claude-haiku-4-5", "claude-haiku-4-5", "my-key"),
    ..AgentLoopConfig::default()
};

let (tx, mut rx) = mpsc::unbounded_channel();
// `mut` so the winning context can later be passed as `&mut` to agent_loop_continue().
let mut result = agent_loop_parallel(
    vec![AgentMessage::Llm(Message::user("Explain quantum entanglement."))],
    AgentContext { system_prompt: "Be concise.".into(), ..Default::default() },
    vec![config_a, config_b],
    Arc::new(PickFirstEvaluation),  // or any EvaluationStrategy
    tx,
    CancellationToken::new(),
)
.await;

println!("Selected branch: {}", result.selected_index);
// Continue the session with the winning context
// agent_loop_continue(&mut result.selected_context, &next_config, tx, cancel).await;

ParallelLoopResult

pub struct ParallelLoopResult {
    pub selected_context: AgentContext,         // winning branch's full context
    pub selected_messages: Vec<AgentMessage>,   // messages produced by the winner
    pub selected_index: usize,                  // 0-based index into original configs
    pub all_outcomes: Vec<ParallelLoopOutcome>, // remaining (non-selected) outcomes
    pub total_usage: Usage,                     // all branch usages + evaluation usage
}

Feed selected_context directly into agent_loop_continue() to resume the session normally — parallel execution is a single-loop operation, not a special session mode.

Built-in strategies

TransparentEvaluation

Single-branch pass-through. Panics if more than one config is provided.

Use this when you want the parallel plumbing (events, ParallelLoopResult) for a single config — zero evaluation overhead.

Arc::new(TransparentEvaluation)

PickFirstEvaluation

Always selects index 0 regardless of content.

Deterministic, zero-cost. Useful for testing and debugging multi-branch setups where you only care about the first config's output.

Arc::new(PickFirstEvaluation)

TokenEfficientEvaluation

Selects the branch with the lowest total token usage.

Prefer when cost or latency matters more than response depth. The model that solved the task most concisely wins.

Arc::new(TokenEfficientEvaluation)

ElaborateEvaluation

Selects the branch with the highest total token usage.

Prefer when depth and thoroughness are the priority. The most verbose response wins — useful when you want the most comprehensive analysis.

Arc::new(ElaborateEvaluation)
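
Both token-based strategies reduce to a minimum or maximum over per-branch totals. A sketch under assumptions: `BranchUsage` is a hypothetical stand-in for the real usage type, and the tie-breaking rule (lowest index wins) is chosen for this example, since the documentation does not state how ties are resolved.

```rust
/// Hypothetical per-branch token counts.
#[derive(Debug, Clone, Copy)]
struct BranchUsage {
    input_tokens: u64,
    output_tokens: u64,
}

impl BranchUsage {
    fn total(&self) -> u64 {
        self.input_tokens + self.output_tokens
    }
}

/// Index of the branch with the lowest total usage (first one on ties).
fn pick_token_efficient(branches: &[BranchUsage]) -> Option<usize> {
    branches
        .iter()
        .enumerate()
        .min_by_key(|(_, u)| u.total())
        .map(|(i, _)| i)
}

/// Index of the branch with the highest total usage (first one on ties).
fn pick_elaborate(branches: &[BranchUsage]) -> Option<usize> {
    // `max_by_key` returns the last maximum, so iterate in reverse
    // to prefer the lowest index when totals are equal.
    branches
        .iter()
        .enumerate()
        .rev()
        .max_by_key(|(_, u)| u.total())
        .map(|(i, _)| i)
}
```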

LlmJudgeEvaluation

Uses a separate LLM call to evaluate which branch produced the best response.

use phi_core::LlmJudgeEvaluation;

Arc::new(LlmJudgeEvaluation {
    judge_config: AgentLoopConfig {
        // Argument order follows the Quick Start: model id, display name, API key.
        model_config: ModelConfig::anthropic("claude-opus-4-6", "claude-opus-4-6", "my-key"),
        context_config: Some(ContextConfig {
            max_context_tokens: 100_000,
            ..Default::default()
        }),
        ..AgentLoopConfig::default()
    },
    system_prompt: None, // use built-in judge prompt
})

agent_loop_continue mode

When prompts is empty, agent_loop_parallel routes each branch to agent_loop_continue instead of agent_loop. This lets you run parallel evaluation from an existing conversation context — the user query is already the last message in base_context.

// The user query is the last message in context (no new prompts to add).
let result = agent_loop_parallel(
    vec![],          // empty → agent_loop_continue mode
    base_context,    // must be non-empty and not end on an assistant message
    configs,
    strategy,
    tx,
    cancel,
)
.await;

Same preconditions as agent_loop_continue apply: base_context.messages must be non-empty and must not end on an assistant message.

original_context_len on ParallelLoopOutcome

Each outcome carries original_context_len: usize — the number of messages in the cloned context at the moment the branch was dispatched:

pub struct ParallelLoopOutcome {
    // ...
    pub original_context_len: usize,
}

context.messages[..original_context_len] is the shared base context all branches started from. Messages at [original_context_len..] are new messages produced by that branch.

Evaluation strategies use this field to extract the original user query and prior conversation history without separate bookkeeping, regardless of whether agent_loop or agent_loop_continue mode was used.

LLM Judge — prompt construction and comprehension criteria

What the judge sees

The judge receives only clean, relevant content:

  • Prior conversation context (new): the conversation history before the user query, formatted as a human-readable transcript. Tool call arguments and images are stripped — only Content::Text survives. Omitted from the prompt when empty.
  • Original query: text extracted from user messages in prompts (agent_loop mode), or from the last Message::User in context.messages[..original_context_len] (agent_loop_continue mode). Tool calls, images, and thinking are stripped.
  • Per-branch response: the text of the last Message::Assistant in each branch's new_messages. Tool calls, tool results, and intermediate multi-turn exchanges are stripped entirely — the judge evaluates outcomes, not reasoning traces.

Example judge prompt (with prior context):

Prior conversation context:
User: What is quantum mechanics?
Assistant: Quantum mechanics is the branch of physics that...

Original query:
Can you explain quantum entanglement in simple terms?

Response 1:
Quantum entanglement is when two particles share a quantum state...

Response 2:
Think of two magic dice...

Which response is best? Reply with ONLY the response number (e.g., "1" or "2").

Query extraction in agent_loop_continue mode

When prompts is empty, the judge cannot read the query directly from the prompts slice. It instead locates the last Message::User in outcome.context.messages[..original_context_len] and extracts its text content. Everything before that message becomes the prior conversation context.
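The lookup described here amounts to a reverse search over the base slice. The sketch below assumes a simplified two-variant `Msg` enum in place of phi-core's `Message`; `extract_query` is a hypothetical name used only for illustration.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    User(String),
    Assistant(String),
}

/// Returns (prior conversation context, query text) from the shared base
/// context, or None when the base contains no user message.
fn extract_query(base: &[Msg]) -> Option<(&[Msg], &str)> {
    // Index of the last user message in the base context.
    let idx = base.iter().rposition(|m| matches!(m, Msg::User(_)))?;
    match &base[idx] {
        // Everything before the query becomes the prior context.
        Msg::User(text) => Some((&base[..idx], text.as_str())),
        Msg::Assistant(_) => None, // not reachable: idx points at a User message
    }
}
```

The caller would pass `&messages[..original_context_len]` as `base`, so messages produced by a branch are never mistaken for the query.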

Judge's comprehension criteria

The judge can only make a fair comparison when it sees all N branch final responses simultaneously alongside the prior context and query. For this to work, the combined content must fit within the judge model's context window.

This condition — all content fitting in the judge's context at once — is called the judge's comprehension criteria.

The budget is derived automatically from judge_config.context_config.max_context_tokens (if set). About 20% of the budget is reserved for the system prompt, query framing, and overhead; the remaining 80% is allocated for prior context + branch responses combined.

When no context_config is set on judge_config, no compaction is applied (all content is passed through as-is).
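As a worked example of the budget arithmetic: with `max_context_tokens` set to 200,000, roughly 160,000 tokens remain for prior context plus branch responses. The helper below is a sketch with a hypothetical name, and the exact integer rounding is an assumption, since the text only says "about 20%".

```rust
/// Tokens available for prior context plus branch responses.
/// `None` means no limit is configured, so no compaction would apply.
fn content_budget(max_context_tokens: Option<usize>) -> Option<usize> {
    // Reserve about 20% for the system prompt, query framing, and overhead.
    max_context_tokens.map(|max| max * 80 / 100)
}
```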

2-iteration compaction strategy

When the combined content exceeds the budget, compaction is applied in two iterations:

Iteration 1 — compact prior context only, outputs intact

The prior conversation context is compacted through 3 progressive tiers while branch outputs are preserved verbatim:

  1. Tier 1 — tail truncation: keep only the last 80 lines of the context transcript.
  2. Tier 2 — paragraph summary: keep only the first paragraph and last paragraph (separated by ...).
  3. Tier 3 — hard char limit: truncate to a per-response char limit derived from the remaining budget, minimum 200 chars. The formula is max(200, (token_budget * 4) / n) where n is the number of texts being compacted and the * 4 factor converts from tokens to chars (1 token ~ 4 chars estimate).

After each tier, the combined token estimate is re-checked. If the budget is satisfied, the judge proceeds with the compacted context and intact outputs.
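The three tiers can be approximated with standard string operations. This is a sketch under stated assumptions, not the library's `compact_tier1/2/3`: the function names are hypothetical, paragraphs are assumed to be separated by blank lines, and the `...` separator is placed on its own line.

```rust
/// Tier 1: keep only the last `max_lines` lines of the text.
fn tier1_tail(text: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    let start = lines.len().saturating_sub(max_lines);
    lines[start..].join("\n")
}

/// Tier 2: keep the first and last paragraph, separated by "...".
fn tier2_ends(text: &str) -> String {
    let paras: Vec<&str> = text
        .split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .collect();
    match paras.as_slice() {
        [] => String::new(),
        [only] => only.to_string(),
        [first, .., last] => format!("{}\n...\n{}", first, last),
    }
}

/// Per-text char limit: max(200, (token_budget * 4) / n).
fn char_limit(token_budget: usize, n: usize) -> usize {
    ((token_budget * 4) / n.max(1)).max(200)
}

/// Tier 3: hard truncation that never cuts inside a character.
fn tier3_truncate(text: &str, limit: usize) -> String {
    text.chars().take(limit).collect()
}
```

For instance, a 1,000-token budget shared by two texts gives each a 2,000-char limit, while a 10-token budget shared by four texts falls back to the 200-char minimum.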

Iteration 2 — compact both context and outputs independently

If iteration 1 cannot satisfy the budget even at tier 3, the context stays at its most-compacted (tier-3) form and branch outputs are now compacted independently through the same tiered compaction pipeline (legacy compact_messages(); see compaction for the modern CompactionBlock system).

prior context (tier-3)  +  outputs (tier-1 → 2 → 3)  →  check budget after each tier

If the criteria still cannot be satisfied after iteration 2, a ProgressMessage warning is emitted to tx and the judge proceeds best-effort.
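The control flow of the two iterations can be sketched as follows. Everything here is illustrative: `compact_for_judge` and `est` are hypothetical names, the tiers are passed in as plain function pointers, and the token estimate uses the 1 token ~ 4 chars rule with rounding up as an assumption.

```rust
/// Rough token estimate: 1 token ~ 4 chars, rounded up.
fn est(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

fn total(context: &str, outputs: &[String]) -> usize {
    est(context) + outputs.iter().map(|o| est(o)).sum::<usize>()
}

/// Returns (context, outputs, fits_budget). Tiers are applied progressively.
fn compact_for_judge(
    context: &str,
    outputs: &[String],
    budget: usize,
    tiers: &[fn(&str) -> String],
) -> (String, Vec<String>, bool) {
    let mut ctx = context.to_string();
    let mut outs = outputs.to_vec();
    if total(&ctx, &outs) <= budget {
        return (ctx, outs, true);
    }
    // Iteration 1: compact only the prior context; outputs stay verbatim.
    for &tier in tiers {
        ctx = tier(&ctx);
        if total(&ctx, &outs) <= budget {
            return (ctx, outs, true);
        }
    }
    // Iteration 2: context stays at its most compacted form; compact outputs.
    for &tier in tiers {
        outs = outs.iter().map(|o| tier(o.as_str())).collect();
        if total(&ctx, &outs) <= budget {
            return (ctx, outs, true);
        }
    }
    // Budget still exceeded: the caller would warn and proceed best-effort.
    (ctx, outs, false)
}
```

The boolean in the return value marks the point where a warning would be emitted before the judge proceeds with whatever was produced.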

Why context is compacted first

Iteration 1 biases the judge towards seeing the complete, uncompacted branch outputs — the actual decision material. Prior conversation history is ancillary; trimming it first preserves the most important information for fair comparison.

Original responses are always preserved

Compaction only affects what the judge reads. The selected_messages field in ParallelLoopResult always contains the original, uncompacted winning branch response.

Setting the judge's context limit

Set judge_config.context_config.max_context_tokens to the judge model's context window size (in tokens). This enables the comprehension-criteria check:

context_config: Some(ContextConfig {
    max_context_tokens: 200_000, // Claude Opus 4.6 context window
    ..Default::default()
}),

Different judge models have different context windows — the limit is co-located with the model config that actually has the constraint.

Design decisions

original_context_len on outcome (not a separate parameter): The EvaluationStrategy trait receives only outcomes and prompts. Embedding original_context_len in each outcome avoids changing the trait signature and keeps all outcome data co-located. Since all branches share the same base context, the value is identical across outcomes, so reading it from outcomes[0] is idiomatic.

Same tier functions for context and output compaction: compact_tier1/2/3 were designed for document text but work equally well on a formatted conversation transcript. Reusing the same primitives minimises code surface and keeps compaction behaviour consistent.

Budget allocation, context is compacted first (iteration 1): Iteration 1 compacts only the prior context, keeping outputs intact. This preserves the complete branch responses, which are the actual decision material, while trimming ancillary history first. Outputs are only compacted in iteration 2, when compacting the context alone cannot satisfy the budget.

Session identity and loop IDs

All branches share the same session_id for traceability. Each branch gets a distinct loop_id following the format:

{session_id}.{config_segment}.{N}

where config_segment is derived from config.config_id (if set) or auto-derived as {provider}.{model-slug}[.thinking].

Example with two configs:

ses_abc123.anthropic.claude-opus-4-6.1
ses_abc123.anthropic.claude-haiku-4-5.2

The judge loop (if used) also runs in the same session:

ses_abc123.anthropic.claude-opus-4-6.3   ← judge's loop
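The ID format can be reproduced with two small helpers. Both function names are hypothetical, and the sketch takes the model slug as an input because the text does not say how the slug is derived from the model name.

```rust
/// Uses `config_id` when set, else `{provider}.{model-slug}[.thinking]`.
fn config_segment(
    config_id: Option<&str>,
    provider: &str,
    model_slug: &str,
    thinking: bool,
) -> String {
    match config_id {
        Some(id) => id.to_string(),
        None if thinking => format!("{}.{}.thinking", provider, model_slug),
        None => format!("{}.{}", provider, model_slug),
    }
}

/// Builds `{session_id}.{config_segment}.{N}`.
fn loop_id(session_id: &str, segment: &str, n: usize) -> String {
    format!("{}.{}.{}", session_id, segment, n)
}
```

With these helpers, session `ses_abc123`, provider `anthropic`, slug `claude-opus-4-6`, and `n = 1` yield the first ID in the example above.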

Observability

Two events bracket the entire parallel execution:

AgentEvent::ParallelLoopStart {
    session_id: String,
    loop_ids: Vec<String>,   // one per branch, in config order
    timestamp: DateTime<Utc>,
}

AgentEvent::ParallelLoopEnd {
    session_id: String,
    selected_loop_id: String,
    selected_config_index: usize,
    evaluation_usage: Usage,  // judge LLM usage (zero if no judge)
    timestamp: DateTime<Utc>,
}

Events from all branches are interleaved in tx. Demultiplex by loop_id from each branch's AgentStart event.

Session continuity

agent_loop_parallel is a single-loop operation. After it returns, call agent_loop_continue on result.selected_context to continue the session:

// `result` must be mutable because the follow-up loop borrows its context mutably.
let mut result = agent_loop_parallel(prompts, base_ctx, configs, strategy, tx, cancel).await;

// The session continues normally with the winning branch's context
let follow_up = agent_loop_continue(
    &mut result.selected_context,
    &next_config,
    tx2,
    cancel2,
)
.await;

Complete example — multi-model comparison with LLM judge

use phi_core::{
    agent_loop_parallel, agent_loop_continue,
    AgentContext, AgentLoopConfig, AgentMessage, AgentEvent, Message,
};
use phi_core::context::ContextConfig;
use phi_core::LlmJudgeEvaluation;
use phi_core::provider::ModelConfig;
use std::sync::Arc;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

#[tokio::main]
async fn main() {
    // API_KEY is assumed to be a string constant defined elsewhere in your crate.
    // Branch A: fast, cost-efficient model
    let config_a = AgentLoopConfig {
        model_config: ModelConfig::anthropic("claude-haiku-4-5", API_KEY, "claude-haiku-4-5"),
        ..AgentLoopConfig::default()
    };

    // Branch B: powerful model
    let config_b = AgentLoopConfig {
        model_config: ModelConfig::anthropic("claude-opus-4-6", API_KEY, "claude-opus-4-6"),
        ..AgentLoopConfig::default()
    };

    // Judge: evaluates which response is better
    let judge_config = AgentLoopConfig {
        model_config: ModelConfig::anthropic("claude-opus-4-6", API_KEY, "claude-opus-4-6"),
        context_config: Some(ContextConfig {
            max_context_tokens: 200_000,
            ..Default::default()
        }),
        ..AgentLoopConfig::default()
    };

    let (tx, mut rx) = mpsc::unbounded_channel::<AgentEvent>();
    let cancel = CancellationToken::new();

    let result = agent_loop_parallel(
        vec![AgentMessage::Llm(Message::user("What is the most important physics discovery of the 20th century?"))],
        AgentContext {
            system_prompt: "You are a knowledgeable assistant.".into(),
            ..Default::default()
        },
        vec![config_a, config_b],
        Arc::new(LlmJudgeEvaluation { judge_config, system_prompt: None }),
        tx,
        cancel,
    )
    .await;

    println!("Selected branch: {}", result.selected_index);
    println!("Total tokens used: {}", result.total_usage.total_tokens);

    // Collect and display the winning response
    for msg in &result.selected_messages {
        if let phi_core::AgentMessage::Llm(phi_core::Message::Assistant { content, .. }) = msg {
            for block in content {
                if let phi_core::Content::Text { text } = block {
                    println!("Response: {}", text);
                }
            }
        }
    }

    // Continue the session with the winner
    // let (tx2, _rx2) = mpsc::unbounded_channel();
    // agent_loop_continue(&mut result.selected_context, &next_config, tx2, cancel2).await;
}

Custom evaluation strategies

Implement EvaluationStrategy for custom evaluation logic:

use phi_core::{AgentEvent, AgentMessage, ParallelLoopOutcome, Usage};
use phi_core::{EvaluationDecision, EvaluationStrategy};
use async_trait::async_trait;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

struct LongestResponseEvaluation;

#[async_trait]
impl EvaluationStrategy for LongestResponseEvaluation {
    async fn evaluate(
        &self,
        _prompts: &[AgentMessage],
        outcomes: &[ParallelLoopOutcome],
        _tx: &mpsc::UnboundedSender<AgentEvent>,
        _cancel: CancellationToken,
    ) -> (EvaluationDecision, Usage) {
        let idx = outcomes
            .iter()
            .enumerate()
            .max_by_key(|(_, o)| {
                // Sum all text content lengths across new messages
                o.new_messages.iter().filter_map(|m| m.as_llm()).flat_map(|msg| {
                    if let phi_core::Message::Assistant { content, .. } = msg {
                        content.iter().filter_map(|c| {
                            if let phi_core::Content::Text { text } = c { Some(text.len()) } else { None }
                        }).collect::<Vec<_>>()
                    } else {
                        vec![]
                    }
                }).sum::<usize>()
            })
            .map(|(i, _)| i)
            // Fall back to the first branch when there are no outcomes to compare
            .unwrap_or(0);
        (EvaluationDecision::Select(idx), Usage::default())
    }
}

Messages & Events

Message Types

Message

The core LLM message type, tagged by role:

pub enum Message {
    User {
        content: Vec<Content>,
        timestamp: u64,
    },
    Assistant {
        content: Vec<Content>,
        stop_reason: StopReason,
        model: String,
        provider: String,
        usage: Usage,
        timestamp: u64,
        error_message: Option<String>,
    },
    ToolResult {
        tool_call_id: String,
        tool_name: String,
        content: Vec<Content>,
        is_error: bool,
        timestamp: u64,
        child_loop_id: Option<String>,  // set by sub-agent tools
    },
}

Create user messages easily:

let msg = Message::user("Hello, world!");

AgentMessage

Wraps Message with support for extension messages (UI-only, notifications, etc.):

pub enum AgentMessage {
    Llm(LlmMessage),
    Extension(ExtensionMessage),
}

pub struct LlmMessage {
    pub message: Message,
    /// Which turn produced this message. `None` for messages that predate
    /// turn tracking or are created outside the agent loop.
    pub turn_id: Option<TurnId>,
}

pub struct ExtensionMessage {
    pub role: String,
    pub kind: String,
    pub data: serde_json::Value,
}

Create extension messages with the convenience constructor:

let ext = ExtensionMessage::new("status_update", serde_json::json!({"status": "running"}));
let msg = AgentMessage::Extension(ext);

The kind field categorizes the extension (e.g., "status_update", "ui_event", "notification"). Use as_llm() to extract the Message if it's an LLM message. LlmMessage wraps a Message with an optional TurnId { loop_id, turn_index } for compaction tracking — this allows the compaction system to identify which turn produced each message. The default convert_to_llm function filters out Extension messages before sending to the provider.

All core message types implement Serialize, Deserialize, Clone, and PartialEq, enabling state persistence and test assertions.

Content

Each message contains Vec<Content>:

pub enum Content {
    Text { text: String },
    Image { data: String, mime_type: String },
    Thinking { thinking: String, signature: Option<String> },
    ToolCall { id: String, name: String, arguments: serde_json::Value },
}

An assistant message can contain multiple content blocks — e.g., thinking + text + tool calls.
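Because one message mixes block kinds, consumers usually filter by variant. The sketch below uses a local, simplified mirror of `Content` (with `arguments` as a plain `String` instead of `serde_json::Value`) so that it compiles without phi-core; `visible_text` and `tool_call_names` are hypothetical helpers.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Content {
    Text { text: String },
    Image { data: String, mime_type: String },
    Thinking { thinking: String, signature: Option<String> },
    ToolCall { id: String, name: String, arguments: String },
}

/// Joins only the Text blocks, skipping thinking, images, and tool calls.
fn visible_text(blocks: &[Content]) -> String {
    blocks
        .iter()
        .filter_map(|b| match b {
            Content::Text { text } => Some(text.as_str()),
            _ => None,
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Lists the names of the tools the assistant asked to call.
fn tool_call_names(blocks: &[Content]) -> Vec<&str> {
    blocks
        .iter()
        .filter_map(|b| match b {
            Content::ToolCall { name, .. } => Some(name.as_str()),
            _ => None,
        })
        .collect()
}
```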

The signature field on Content::Thinking is a cryptographic integrity token issued by the LLM provider (Anthropic calls it signature, OpenAI calls it encrypted_content, Gemini calls it thought_signature). It must be echoed back unmodified in multi-turn conversations — tampering or omitting it causes the provider to reject the request. It is None on providers that don't support extended thinking or on the first-turn generation.

StopReason

pub enum StopReason {
    Stop,              // Natural completion
    Length,            // Hit max tokens
    ToolUse,           // Wants to call tools
    Error,             // Provider error
    Aborted,           // Cancelled by user
    MaxTurns,          // Reached maximum allowed turns
    UserStop,          // Explicit user stop command
    Handoff,           // Handing off to a human operator
    GuardRail,         // Stopped by content moderation / safety filter
    ContextCompacted,  // Context was compacted to fit within limits
    Paused,            // Paused waiting for external input
}

Usage

Token usage from the provider:

pub struct Usage {
    pub input: u64,
    pub output: u64,
    pub cache_read: u64,
    pub cache_write: u64,
    pub total_tokens: u64,
}

AgentEvent

Events emitted during the agent loop for real-time UI updates:

| Event | When |
| --- | --- |
| AgentStart { agent_id, session_id, loop_id, parent_loop_id, continuation_kind, config_snapshot, timestamp } | Loop begins. loop_id is "{session_id}.{config_id}.{N}". parent_loop_id is Some for continuations and sub-agents. continuation_kind is a ContinuationKind (Initial for first loops, Default/Rerun/Branch/Compaction for continuations). config_snapshot is Option<LoopConfigSnapshot> capturing model/provider settings for the loop. |
| AgentEnd { messages, timestamp, rejection } | Loop finishes; rejection is Some when an InputFilter blocked input |
| TurnStart { turn_index, timestamp, triggered_by } | New LLM call starting; turn_index is 0-based, triggered_by is one of User, SubAgent, Continuation, or Branch |
| TurnEnd { message, timestamp, tool_results } | LLM call + tool execution complete |
| MessageStart { message } | A message is available |
| MessageUpdate { message, delta } | Streaming delta arrived |
| MessageEnd { message } | Message finalized |
| ToolExecutionStart { tool_call_id, tool_name, args } | Tool about to run |
| ToolExecutionUpdate { tool_call_id, tool_name, partial_result } | Tool progress |
| ToolExecutionEnd { tool_call_id, tool_name, result, is_error, child_loop_id } | Tool finished. child_loop_id is Some when the tool was a sub-agent; it identifies the child loop that ran. |
| ProgressMessage { tool_call_id, tool_name, text } | User-facing progress text from a tool |
| InputRejected { reason } | Input filter rejected the user's message |

StreamDelta

Deltas within MessageUpdate:

pub enum StreamDelta {
    Text { delta: String },
    Thinking { delta: String },
    ToolCallDelta { delta: String },
}
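A UI typically folds these deltas into running buffers. The sketch below mirrors `StreamDelta` locally so that it compiles on its own; the `Accumulated` struct and `apply` function are hypothetical, and treating `ToolCallDelta` as a fragment of the argument text is an assumption.

```rust
#[allow(dead_code)]
enum StreamDelta {
    Text { delta: String },
    Thinking { delta: String },
    ToolCallDelta { delta: String },
}

#[derive(Debug, Default, PartialEq)]
struct Accumulated {
    text: String,
    thinking: String,
    tool_call_args: String,
}

/// Appends one delta to the matching buffer.
fn apply(acc: &mut Accumulated, delta: &StreamDelta) {
    match delta {
        StreamDelta::Text { delta } => acc.text.push_str(delta),
        StreamDelta::Thinking { delta } => acc.thinking.push_str(delta),
        StreamDelta::ToolCallDelta { delta } => acc.tool_call_args.push_str(delta),
    }
}
```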

Agent State

The BasicAgent struct provides access to its current state:

// Check if the agent is currently streaming a response
if agent.is_streaming() {
    // Use steer() or follow_up() instead of prompt()
    agent.steer(AgentMessage::Llm(Message::user("New instruction")));
}

// Access the full message history
let messages: &[AgentMessage] = agent.messages();

// Check the last message
if let Some(last) = messages.last() {
    println!("Last message role: {}", last.role());
}

is_streaming() returns true from the moment prompt() or continue_loop() is called until the loop completes. While streaming, calling prompt() will panic; use steer() or follow_up() instead.

Tools

The AgentTool Trait

Every tool implements AgentTool:

#[async_trait]
pub trait AgentTool: Send + Sync {
    fn name(&self) -> &str;
    fn label(&self) -> &str;
    fn description(&self) -> &str;
    fn parameters_schema(&self) -> serde_json::Value;
    async fn execute(
        &self,
        params: serde_json::Value,
        ctx: ToolContext,
    ) -> Result<ToolResult, ToolError>;
}

| Method | Purpose |
| --- | --- |
| name() | Unique ID sent to LLM (e.g., "bash") |
| label() | Human-readable name for UI (e.g., "Run Command") |
| description() | Tells the LLM what the tool does |
| parameters_schema() | JSON Schema for the tool's parameters |
| execute() | Runs the tool, returns ToolResult or ToolError. Receives a ToolContext with cancellation, update, and progress callbacks. |

ToolContext

All execution context is bundled into a single struct, making the trait easier to extend in the future:

pub struct ToolContext {
    pub tool_call_id: String,
    pub tool_name: String,
    pub cancel: CancellationToken,
    pub on_update: Option<ToolUpdateFn>,
    pub on_progress: Option<ProgressFn>,
}

| Field | Purpose |
| --- | --- |
| tool_call_id | Unique ID for this tool call (for correlating events) |
| tool_name | Name of the tool being executed |
| cancel | Cancellation token; check ctx.cancel.is_cancelled() in long-running tools |
| on_update | Callback for streaming partial ToolResult updates to the UI. Carries structured data (ToolResult with content + details) and emits AgentEvent::ToolExecutionUpdate. Use when you need progress percentages, partial results, or structured metadata. |
| on_progress | Callback for lightweight text-only status messages. Takes a single String and emits AgentEvent::ProgressMessage. Use for simple human-readable status lines (e.g., "Compiling...", "Almost done..."). |

ToolContext implements Clone and Debug.

ToolResult

pub struct ToolResult {
    pub content: Vec<Content>,
    pub details: serde_json::Value,
}

The content is sent back to the LLM. The details field holds metadata (not sent to the LLM) for UI/logging.

ToolError

pub enum ToolError {
    Failed(String),
    NotFound(String),
    InvalidArgs(String),
    Cancelled,
}

Errors are converted to ToolResult with is_error: true and sent back to the LLM so it can recover.

Implementing a Custom Tool

use phi_core::types::*;
use async_trait::async_trait;

pub struct WeatherTool;

#[async_trait]
impl AgentTool for WeatherTool {
    fn name(&self) -> &str { "get_weather" }
    fn label(&self) -> &str { "Weather" }
    fn description(&self) -> &str {
        "Get current weather for a city."
    }

    fn parameters_schema(&self) -> serde_json::Value {
        serde_json::json!({
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["city"]
        })
    }

    async fn execute(
        &self,
        params: serde_json::Value,
        _ctx: ToolContext,
    ) -> Result<ToolResult, ToolError> {
        let city = params["city"].as_str()
            .ok_or(ToolError::InvalidArgs("missing city".into()))?;

        // Call weather API...
        Ok(ToolResult {
            content: vec![Content::Text {
                text: format!("Weather in {}: 72°F, sunny", city),
            }],
            details: serde_json::Value::Null,
        })
    }
}

Register custom tools alongside defaults:

use phi_core::tools::default_tools;

let mut tools = default_tools();
tools.push(Box::new(WeatherTool));
let agent = BasicAgent::new(model_config).with_tools(tools);

Error Handling

Return Err(ToolError) on failure, not Ok with error text. When a tool returns Err, the agent loop converts it to a Message::ToolResult with is_error: true and sends it to the LLM. The LLM sees the error and can self-correct — retry with different arguments, try a different approach, or explain the failure to the user.

async fn execute(&self, params: serde_json::Value, _ctx: ToolContext) -> Result<ToolResult, ToolError> {
    let path = params["path"].as_str()
        .ok_or(ToolError::InvalidArgs("missing 'path'".into()))?;

    let content = std::fs::read_to_string(path)
        .map_err(|e| ToolError::Failed(format!("Cannot read {}: {}", path, e)))?;

    Ok(ToolResult {
        content: vec![Content::Text { text: content }],
        details: serde_json::Value::Null,
    })
}

Exception: BashTool. The built-in BashTool returns Ok even on non-zero exit codes, with both stdout and stderr in the result. This is intentional — the LLM needs to see the actual error output (compilation errors, test failures, etc.) to diagnose and fix issues. Only truly exceptional failures (e.g., command not found, cancellation) return Err.

Tool Execution Flow

  1. LLM returns Content::ToolCall blocks in its response
  2. Agent loop emits ToolExecutionStart for each
  3. Tool's execute() is called with parsed arguments
  4. Result (or error) is wrapped in Message::ToolResult
  5. ToolExecutionEnd is emitted
  6. All tool results are added to context
  7. Loop continues with another LLM call
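Steps 2 through 6 above can be sketched with synchronous stand-ins. Nothing here is phi-core API: tools are plain function pointers, events are logged as strings, and `run_tool_calls` and `ToolResultMsg` are hypothetical names chosen for the illustration.

```rust
use std::collections::HashMap;

type ToolFn = fn(&str) -> Result<String, String>;

#[derive(Debug, PartialEq)]
struct ToolResultMsg {
    tool_call_id: String,
    text: String,
    is_error: bool,
}

/// Runs one round of tool calls and returns the results to add to context.
fn run_tool_calls(
    calls: &[(&str, &str, &str)], // (tool_call_id, tool_name, args)
    tools: &HashMap<&str, ToolFn>,
    events: &mut Vec<String>,
) -> Vec<ToolResultMsg> {
    let mut results = Vec::new();
    for &(id, name, args) in calls {
        // Step 2: announce the tool call.
        events.push(format!("ToolExecutionStart {}", name));
        // Step 3: execute the tool, or report that it does not exist.
        let outcome = match tools.get(name) {
            Some(&tool) => tool(args),
            None => Err(format!("tool not found: {}", name)),
        };
        // Step 4: wrap success and failure alike in a result message.
        let (text, is_error) = match outcome {
            Ok(t) => (t, false),
            Err(e) => (e, true),
        };
        // Step 5: announce completion.
        events.push(format!("ToolExecutionEnd {} is_error={}", name, is_error));
        results.push(ToolResultMsg { tool_call_id: id.to_string(), text, is_error });
    }
    // Step 6: the caller appends these to the context before the next LLM call.
    results
}
```

Errors travel the same path as successes, which is what lets the model see a failure and try again.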

Streaming Tool Output

Long-running tools can stream progress updates to the UI via the on_update callback. Each call emits a ToolExecutionUpdate event. Partial results are for UI/logging only — they are not sent to the LLM. Only the final ToolResult returned from execute() becomes part of the conversation.

The ToolUpdateFn type

pub type ToolUpdateFn = Arc<dyn Fn(ToolResult) + Send + Sync>;

Basic usage

Call on_update whenever you have progress to report:

use phi_core::types::*;
use async_trait::async_trait;

struct DataProcessorTool;

#[async_trait]
impl AgentTool for DataProcessorTool {
    // ... name, label, description, parameters_schema ...

    async fn execute(
        &self,
        params: serde_json::Value,
        ctx: ToolContext,
    ) -> Result<ToolResult, ToolError> {
        // fetch_rows and process_row are placeholders for your own logic
        let rows = fetch_rows(&params)?;
        let total = rows.len();

        for (i, row) in rows.iter().enumerate() {
            // Check for cancellation
            if ctx.cancel.is_cancelled() {
                return Err(ToolError::Cancelled);
            }

            process_row(row);

            // Stream progress every 100 rows
            if i % 100 == 0 {
                if let Some(cb) = &ctx.on_update {
                    cb(ToolResult {
                        content: vec![Content::Text {
                            text: format!("Processed {}/{} rows", i, total),
                        }],
                        details: serde_json::json!({"progress": i as f64 / total as f64}),
                    });
                }
            }
        }

        Ok(ToolResult {
            content: vec![Content::Text {
                text: format!("Processed all {} rows", total),
            }],
            details: serde_json::Value::Null,
        })
    }
}

Consuming updates in your UI

Updates arrive as AgentEvent::ToolExecutionUpdate events on the same event stream as all other agent events:

while let Some(event) = rx.recv().await {
    match event {
        AgentEvent::ToolExecutionStart { tool_name, .. } => {
            println!("⏳ {} started", tool_name);
        }
        AgentEvent::ToolExecutionUpdate { tool_name, partial_result, .. } => {
            // Show progress in your UI
            if let Some(Content::Text { text }) = partial_result.content.first() {
                println!("  📊 {}: {}", tool_name, text);
            }
        }
        AgentEvent::ToolExecutionEnd { tool_name, is_error, .. } => {
            println!("{} {}", if is_error { "❌" } else { "✅" }, tool_name);
        }
        AgentEvent::ProgressMessage { tool_name, text, .. } => {
            println!("  💬 {}: {}", tool_name, text);
        }
        _ => {}
    }
}

Progress Messages

In addition to on_update (which streams partial ToolResult values), tools can emit lightweight text-only progress messages via ctx.on_progress. These appear as AgentEvent::ProgressMessage events:

async fn execute(&self, params: serde_json::Value, ctx: ToolContext) -> Result<ToolResult, ToolError> {
    // on_progress is optional, so only call it when a listener is attached
    if let Some(progress) = &ctx.on_progress {
        progress("Starting analysis...".into());
    }

    // ... do work ...

    if let Some(progress) = &ctx.on_progress {
        progress("Almost done...".into());
    }

    Ok(ToolResult { /* ... */ })
}

Use on_progress for simple status text. Use on_update when you need structured data (progress percentages, partial results).

Guidelines

  • Call on_update as often as useful — there's no rate limit. The callback is synchronous and cheap.
  • Always check ctx.on_update.is_some() before building the ToolResult. If None, the loop isn't interested in updates (e.g., testing).
  • Use details for structured data: content is for human-readable text, while details can carry progress percentages, byte counts, etc.
  • Don't rely on updates reaching the LLM — they won't. Only the final return value is added to context.
  • Simple tools don't need it — if your tool completes in <1 second, just ignore ctx (prefix with _ctx to suppress the warning).
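The second guideline, skipping payload construction when nobody is listening, can be sketched with only the standard library. `UpdateFn` and `process` are hypothetical stand-ins: the callback takes a `String` here instead of a `ToolResult` so that the example compiles without phi-core.

```rust
use std::sync::Arc;

// Simplified stand-in for ToolUpdateFn.
type UpdateFn = Arc<dyn Fn(String) + Send + Sync>;

/// Processes `rows` items, reporting progress every 100 rows when a
/// listener is attached, and returns the final result text.
fn process(rows: usize, on_update: Option<&UpdateFn>) -> String {
    for i in 0..rows {
        // ... per-row work would go here ...
        if i % 100 == 0 {
            // The format! call only runs when a callback is present.
            if let Some(cb) = on_update {
                cb(format!("Processed {}/{} rows", i, rows));
            }
        }
    }
    format!("Processed all {} rows", rows)
}
```

Passing `None` runs the same loop with no update overhead, which is the situation in tests or headless runs.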

End-to-end example

Here's a complete example: a CLI agent with a deploy tool that streams progress. The human sees real-time output while the LLM only gets the final result.

use async_trait::async_trait;
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;
use phi_core::types::*;

/// A tool that deploys an app and streams each step.
struct DeployTool;

#[async_trait]
impl AgentTool for DeployTool {
    fn name(&self) -> &str { "deploy" }
    fn label(&self) -> &str { "Deploy App" }
    fn description(&self) -> &str { "Deploy the application to production." }
    fn parameters_schema(&self) -> serde_json::Value {
        serde_json::json!({
            "type": "object",
            "properties": {
                "env": { "type": "string", "description": "Target environment" }
            },
            "required": ["env"]
        })
    }

    async fn execute(
        &self,
        params: serde_json::Value,
        ctx: ToolContext,
    ) -> Result<ToolResult, ToolError> {
        let env = params["env"].as_str().unwrap_or("staging");

        let steps = ["Building image", "Running tests", "Pushing to registry", "Rolling out"];
        for (i, step) in steps.iter().enumerate() {
            if ctx.cancel.is_cancelled() {
                return Err(ToolError::Cancelled);
            }

            // Stream each step to the UI
            if let Some(cb) = &ctx.on_update {
                cb(ToolResult {
                    content: vec![Content::Text {
                        text: format!("[{}/{}] {}...", i + 1, steps.len(), step),
                    }],
                    details: serde_json::json!({
                        "step": i + 1,
                        "total": steps.len(),
                        "phase": step,
                    }),
                });
            }

            // Simulate work
            tokio::time::sleep(std::time::Duration::from_secs(2)).await;
        }

        // Only this final result is sent to the LLM
        Ok(ToolResult {
            content: vec![Content::Text {
                text: format!("Successfully deployed to {}", env),
            }],
            details: serde_json::json!({"env": env, "status": "success"}),
        })
    }
}

#[tokio::main]
async fn main() {
    let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
    let mut agent = BasicAgent::new(ModelConfig::anthropic(
        "claude-sonnet-4-20250514",
        "Claude Sonnet 4",
        &api_key,
    ))
    .with_system_prompt("You are a deployment assistant.")
    .with_tools(vec![Box::new(DeployTool)]);

    let mut rx = agent.prompt("Deploy to production").await;

    while let Some(event) = rx.recv().await {
        match event {
            // LLM text streaming
            AgentEvent::MessageUpdate {
                delta: StreamDelta::Text { delta }, ..
            } => print!("{}", delta),

            // Tool progress streaming
            AgentEvent::ToolExecutionStart { tool_name, .. } => {
                println!("\n🚀 Starting {}...", tool_name);
            }
            AgentEvent::ToolExecutionUpdate { partial_result, .. } => {
                if let Some(Content::Text { text }) = partial_result.content.first() {
                    println!("  {}", text);
                }
            }
            AgentEvent::ToolExecutionEnd { tool_name, is_error, .. } => {
                if is_error {
                    println!("  ❌ {} failed", tool_name);
                } else {
                    println!("  ✅ {} complete", tool_name);
                }
            }
            AgentEvent::ProgressMessage { text, .. } => {
                println!("  💬 {}", text);
            }

            AgentEvent::AgentEnd { .. } => break,
            _ => {}
        }
    }
}

Running this produces:

🚀 Starting deploy...
  [1/4] Building image...
  [2/4] Running tests...
  [3/4] Pushing to registry...
  [4/4] Rolling out...
  ✅ deploy complete
Successfully deployed to production. The deployment completed all 4 stages.

The human sees each step as it happens. The LLM only sees "Successfully deployed to production" and can continue the conversation from there.

How agents benefit

When an AI agent (like a coding assistant) uses phi-core, streaming tool output helps in two ways:

  1. Human oversight — The human watching the agent work sees real-time progress instead of waiting for a tool to finish. A bash command running cargo build can stream compiler output as it happens, so the human can interrupt early if something is wrong.

  2. Agent UIs — Tools like web dashboards, IDE extensions, or chat interfaces can render live progress bars, log tails, or status indicators. The details field in ToolResult carries structured data (progress percentage, byte counts, etc.) that UIs can render however they want.

The LLM itself doesn't see updates — it works with final results only. This is intentional: partial output would waste context tokens and confuse the model. The streaming is purely a human-facing feature.

Execution Strategies

When the LLM returns multiple tool calls in a single response (e.g., "read file A, read file B, run bash C"), ToolExecutionStrategy controls how they run. The enum derives Default, and Parallel carries the #[default] attribute:

#[derive(Default)]
pub enum ToolExecutionStrategy {
    Sequential,
    #[default]
    Parallel,
    Batched { size: usize },
}

| Strategy | Behavior |
| --- | --- |
| Sequential | One at a time. Steering checked between each tool. Use for debugging or tools with shared mutable state. |
| Parallel (#[default]) | All tool calls run concurrently via futures::future::join_all. Steering checked after all complete. Best latency for independent tools. |
| Batched { size: usize } | Run in groups of size. Steering checked between batches. Balances speed with human-in-the-loop control. |

Configuration

#![allow(unused)]
fn main() {
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;
use phi_core::types::ToolExecutionStrategy;

// Default — parallel (fastest)
let agent = BasicAgent::new(model_config.clone());

// Sequential (debug / shared state)
let agent = BasicAgent::new(model_config.clone())
    .with_tool_execution(ToolExecutionStrategy::Sequential);

// Batched — 3 at a time
let agent = BasicAgent::new(model_config.clone())
    .with_tool_execution(ToolExecutionStrategy::Batched { size: 3 });
}

When to use each

  • Parallel (default): Most tool calls are independent — file reads, searches, API calls. Running them concurrently can cut latency dramatically: three tools that each take 50 ms finish in roughly 50 ms instead of roughly 150 ms.
  • Sequential: When tools have side effects that depend on order, or when you need fine-grained steering control between each tool.
  • Batched: When you want parallelism but also want steering checkpoints. For example, Batched { size: 3 } runs 3 tools concurrently, checks for user interrupts, then runs the next 3.

Steering messages are always checked between execution units (between each tool in Sequential, after all tools in Parallel, between batches in Batched). If a user interrupts, remaining tools are skipped.
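The notion of an "execution unit" can be made concrete with a small standalone sketch. StrategySketch is a local mirror of the enum above so the block compiles on its own, and execution_units is a hypothetical helper, not part of phi-core; treating a batch size of 0 as 1 is an assumption.

```rust
/// Local mirror of the enum shown above, so the sketch compiles on its own.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, Default, PartialEq)]
enum StrategySketch {
    Sequential,
    #[default]
    Parallel,
    Batched { size: usize },
}

/// Splits `n_calls` tool-call indices into execution units.
/// Calls inside one unit run concurrently; steering is checked between units.
fn execution_units(n_calls: usize, strategy: StrategySketch) -> Vec<Vec<usize>> {
    if n_calls == 0 {
        return Vec::new();
    }
    let unit_size = match strategy {
        StrategySketch::Sequential => 1,
        StrategySketch::Parallel => n_calls,
        // Assumption: a batch size of 0 is treated as 1 to avoid empty chunks.
        StrategySketch::Batched { size } => size.max(1),
    };
    let indices: Vec<usize> = (0..n_calls).collect();
    indices.chunks(unit_size).map(|unit| unit.to_vec()).collect()
}
```

With seven tool calls, Batched { size: 3 } yields three units, so there are two steering checkpoints between units plus the one after the last unit; Parallel yields a single unit and therefore a single checkpoint after everything completes.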

Context Management

Long-running agents accumulate messages that can eventually exceed the model's context window. phi-core provides token tracking, overflow detection, tiered compaction, and execution limits.

The context module is split into sub-modules: token, config, tracker, compaction, strategy, compact_messages, execution, orchestration.

Token Estimation

Fast estimation without external tokenizer dependencies:

#![allow(unused)]
fn main() {
use phi_core::context::{estimate_tokens, message_tokens, total_tokens};

estimate_tokens("Hello world");          // ~3 tokens (chars / 4)
message_tokens(&agent_message);          // estimate for a single message
total_tokens(&messages);                 // estimate for all messages
}
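For intuition, the chars/4 heuristic can be reproduced in a few lines. This is a standalone sketch rather than the library's implementation: the function names shadow the ones above only for readability, messages are simplified to plain strings, and rounding up is an assumption chosen so that "Hello world" (11 characters) comes out at 3.

```rust
/// chars/4 heuristic. Rounding up is an assumption of this sketch.
fn estimate_tokens(text: &str) -> usize {
    let chars = text.chars().count();
    (chars + 3) / 4
}

/// Sums the estimate over a list of messages, simplified here to plain strings.
fn total_tokens(messages: &[&str]) -> usize {
    messages.iter().map(|m| estimate_tokens(m)).sum()
}
```

Counting characters rather than bytes keeps the estimate stable for non-ASCII text, at the cost of a full pass over the string.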

Context Tracking

ContextTracker combines real token counts from provider responses with estimation for new messages — more accurate than pure estimation:

#![allow(unused)]
fn main() {
use phi_core::context::ContextTracker;

let mut tracker = ContextTracker::new();

// After each assistant response, record the real usage:
tracker.record_usage(&assistant_usage, message_index);

// Get current context size (real usage + estimated trailing):
let tokens = tracker.estimate_context_tokens(agent.messages());

// After compaction, reset the tracker:
tracker.reset();
}

When no usage data is available, it falls back to chars/4 estimation.

Context Overflow Detection

When the context exceeds a model's window, providers return overflow errors. phi-core detects these automatically across all major providers.

HTTP-level detection

Providers that check before streaming (Google, Bedrock, Vertex) return ProviderError::ContextOverflow:

#![allow(unused)]
fn main() {
use phi_core::provider::ProviderError;

match agent.prompt("...").await {
    // The loop already handles this — but you can also match it:
    Err(ProviderError::ContextOverflow { message }) => {
        // Compact and retry
    }
    _ => {}
}
}

ProviderError::classify() auto-detects overflow from error messages covering Anthropic, OpenAI, Google, AWS Bedrock, xAI, Groq, OpenRouter, llama.cpp, LM Studio, MiniMax, Kimi, GitHub Copilot, and generic patterns.

Message-level detection

SSE-based providers (Anthropic, OpenAI) return overflow as a StopReason::Error message. Check with:

#![allow(unused)]
fn main() {
if message.is_context_overflow() {
    // Compact and retry
}
}

Handling overflow in your application

phi-core provides the detection and building blocks. Your application wires the compaction strategy:

#![allow(unused)]
fn main() {
// Proactive: check before each prompt
let tokens = tracker.estimate_context_tokens(agent.messages());
if tokens > context_window.saturating_sub(reserve) { // saturating_sub avoids underflow if reserve > context_window
    let compacted = compact_messages(agent.messages().to_vec(), &config);
    agent.replace_messages(compacted);
}

// Reactive: catch overflow errors
// ... on ContextOverflow or message.is_context_overflow():
//   compact, then retry with agent.continue_loop()
}

For LLM-based summarization (asking the model to summarize old messages), implement that in your application layer — phi-core provides replace_messages() and compact_messages() as building blocks.

ContextConfig

#![allow(unused)]
fn main() {
pub struct ContextConfig {
    pub max_context_tokens: usize,      // Default: 100,000
    pub system_prompt_tokens: usize,    // Default: 4,000
    pub compaction: CompactionConfig,   // Primary compaction settings

    // Custom token counter (serde-skipped). None → HeuristicTokenCounter (chars/4).
    pub token_counter: Option<Arc<dyn TokenCounter>>,

    // Legacy backward-compat fields (prefer CompactionConfig equivalents):
    pub keep_recent: usize,             // Default: 10
    pub keep_first: usize,             // Default: 2
    pub tool_output_max_lines: usize,  // Default: 50
}

pub struct CompactionConfig {
    // ── WHEN to compact ──
    pub compact_at_pct: f64,                     // Default: 0.90 (90%)
    pub compact_budget_threshold_pct: f64,       // Default: 0.05 (5%)
    pub compaction_scope: CompactionScope,       // Default: FixedCount(3)

    // ── HOW to compact ──
    pub keep_first_turns: usize,                 // Default: 2
    pub keep_recent_turns: usize,                // Default: 10
    pub max_summary_tokens: usize,               // Default: 2_000 (budget, not per-turn)
    pub tool_output_max_lines: usize,            // Default: 50
}
}

CompactionScope

Controls how many earlier loops are included in compaction and context loading:

Variant | Description
FixedCount(usize) | Compact a fixed number of earlier loops on the active chain. Default: FixedCount(3).
TokenBudget | Walk the chain backward, accumulating per-loop token estimates, and stop when max_context_tokens would be exceeded. Loops whose raw messages exceed the budget are still included — their compacted summaries will fit.

See compaction.md for full details on the non-destructive overlay model.

Tiered Compaction

compact_messages() tries each level in order, stopping as soon as messages fit the budget:

Level 1: Truncate Tool Outputs

Replaces long tool outputs with head + tail (keeping first N/2 and last N/2 lines). This is the cheapest level: it preserves conversation structure and typically saves 50-70% in coding sessions.
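A head + tail truncation of this kind might look like the sketch below. It is an illustration under stated assumptions, not phi-core's code: truncate_tool_output is a hypothetical name, the wording of the omission marker is invented, and inputs are left untouched when max_lines is below 2.

```rust
/// Keeps the first and last `max_lines / 2` lines and replaces the middle with a marker.
/// The marker text is hypothetical; the library's actual wording may differ.
fn truncate_tool_output(output: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    // Assumption: nothing to do for short outputs or degenerate limits.
    if lines.len() <= max_lines || max_lines < 2 {
        return output.to_string();
    }
    let half = max_lines / 2;
    let omitted = lines.len() - 2 * half;
    let mut kept: Vec<String> = lines[..half].iter().map(|l| l.to_string()).collect();
    kept.push(format!("[... {} lines omitted ...]", omitted));
    kept.extend(lines[lines.len() - half..].iter().map(|l| l.to_string()));
    kept.join("\n")
}
```

Keeping both ends reflects where build and test tools tend to put the useful information: the command echo at the top and the error summary at the bottom.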

Level 2: Summarize Old Turns

Keeps the last keep_recent messages in full detail. Older assistant messages are replaced with one-line summaries like "[Summary] [Assistant used 3 tool(s)]", and their tool results are dropped.

Level 3: Drop Middle Messages

Keeps keep_first messages from the start and keep_recent from the end, dropping everything in between. A marker message notes how many were removed.
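Level 3 amounts to a slice operation. The sketch below shows the idea with messages simplified to strings; drop_middle and the marker wording are hypothetical and only illustrate the behavior described above.

```rust
/// Keeps `keep_first` messages from the start and `keep_recent` from the end,
/// replacing everything in between with a single marker message.
fn drop_middle(messages: &[String], keep_first: usize, keep_recent: usize) -> Vec<String> {
    if messages.len() <= keep_first + keep_recent {
        return messages.to_vec(); // nothing in the middle to drop
    }
    let dropped = messages.len() - keep_first - keep_recent;
    let mut out = messages[..keep_first].to_vec();
    // Hypothetical marker text; it records how many messages were removed.
    out.push(format!("[{} messages removed]", dropped));
    out.extend_from_slice(&messages[messages.len() - keep_recent..]);
    out
}
```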

ExecutionLimits

Prevents runaway agents:

#![allow(unused)]
fn main() {
pub struct ExecutionLimits {
    pub max_turns: usize,              // Default: 50
    pub max_total_tokens: usize,       // Default: 1,000,000
    pub max_duration: Duration,        // Default: 600s (10 min)
    pub max_cost: Option<f64>,         // Default: None (no cost cap)
}
}

max_cost caps cumulative dollar cost for the run. Requires AgentLoopConfig.cost_config to be set — without pricing rates the accumulated cost is always 0.0 and this limit has no effect.

When a limit is reached, the agent stops with a message like "[Agent stopped: Max turns reached (50/50)]".
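The limit check itself is simple to picture. The following standalone sketch uses a local LimitsSketch struct (max_cost omitted for brevity) and a hypothetical stop_reason helper. Only the max-turns message format comes from the text above; the token and duration messages are invented for illustration.

```rust
use std::time::Duration;

/// Local stand-in for the limits shown above (max_cost omitted for brevity).
struct LimitsSketch {
    max_turns: usize,
    max_total_tokens: usize,
    max_duration: Duration,
}

/// Defaults copied from the struct comments above.
fn default_limits() -> LimitsSketch {
    LimitsSketch {
        max_turns: 50,
        max_total_tokens: 1_000_000,
        max_duration: Duration::from_secs(600),
    }
}

/// Returns a stop message once any limit is reached, or None to keep going.
/// Only the max-turns wording is taken from the text; the other two are hypothetical.
fn stop_reason(
    limits: &LimitsSketch,
    turns: usize,
    tokens: usize,
    elapsed: Duration,
) -> Option<String> {
    if turns >= limits.max_turns {
        return Some(format!(
            "[Agent stopped: Max turns reached ({}/{})]",
            turns, limits.max_turns
        ));
    }
    if tokens >= limits.max_total_tokens {
        return Some(format!(
            "[Agent stopped: Max tokens reached ({}/{})]",
            tokens, limits.max_total_tokens
        ));
    }
    if elapsed >= limits.max_duration {
        return Some(format!(
            "[Agent stopped: Max duration reached ({}s/{}s)]",
            elapsed.as_secs(),
            limits.max_duration.as_secs()
        ));
    }
    None
}
```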

Disabling Context Management

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(model_config)
    .without_context_management();
}

This sets both context_config and execution_limits to None.

Prompt Caching

phi-core automatically optimizes API costs through prompt caching. For providers that support it, stable content (system prompts, tool definitions, conversation history) is cached between turns, giving you up to 90% savings on input tokens.

How It Works

In a multi-turn agent loop, each request sends the full context: system prompt + tools + conversation history. Without caching, you pay full price for all of it every turn. With caching, the provider reuses previously processed prefixes.

Provider Support

Provider | Caching Type | Savings | Framework Action
Anthropic | Explicit (cache breakpoints) | 90% on hits | ✅ Auto-placed
OpenAI | Automatic (prompts of 1024+ tokens) | 50% on hits | None needed
Google Gemini | Implicit (automatic) | Varies | None needed
Azure OpenAI | Automatic (same as OpenAI) | 50% on hits | None needed
Amazon Bedrock | Not yet implemented | N/A | CacheConfig accepted but no breakpoints placed

What Gets Cached (Anthropic)

phi-core places up to 3 cache breakpoints automatically:

  1. System prompt — stable across all turns
  2. Tool definitions — rarely change between turns
  3. Conversation history — second-to-last message, so the growing prefix is cached

This means on a typical multi-turn conversation, only the latest user message and the new assistant response cost full price.

Configuration

Caching is enabled by default with automatic breakpoint placement. No configuration needed for optimal behavior.

Disable Caching

Use CacheStrategy::Disabled to turn off all cache breakpoint placement while keeping the config structure intact. Alternatively, set enabled: false on the CacheConfig master switch.

#![allow(unused)]
fn main() {
use phi_core::{BasicAgent, CacheConfig, CacheStrategy};
use phi_core::provider::ModelConfig;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

// Option 1: CacheStrategy::Disabled (preferred — explicit intent)
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_cache_config(CacheConfig {
    strategy: CacheStrategy::Disabled,
    ..Default::default()
});

// Option 2: Master switch (equivalent effect)
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_cache_config(CacheConfig {
    enabled: false,
    ..Default::default()
});
}

Fine-Grained Control

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_cache_config(CacheConfig {
    enabled: true,
    strategy: CacheStrategy::Manual {
        cache_system: true,
        cache_tools: true,
        cache_messages: false, // Don't cache conversation history
    },
});
}

Monitoring Cache Usage

Every Usage struct includes cache statistics:

#![allow(unused)]
fn main() {
// After a response:
let usage = message.usage(); // from assistant message
println!("Cache read: {} tokens", usage.cache_read);
println!("Cache write: {} tokens", usage.cache_write);
println!("Cache hit rate: {:.1}%", usage.cache_hit_rate() * 100.0);
}
  • cache_read — tokens served from cache (cheap)
  • cache_write — tokens written to cache (slightly more than base price)
  • cache_hit_rate() — fraction of input tokens from cache (0.0–1.0)

Cost Impact

For a typical 10-turn agent conversation with Anthropic Claude:

Without Caching | With Caching (auto)
~500K input tokens billed at full price | ~50K at full price + ~450K at 10% price
$2.50 (Sonnet) | $0.39 (Sonnet)

That's an 84% cost reduction with zero configuration.

Best Practices

  1. Keep system prompts stable — changing the system prompt between turns invalidates the cache
  2. Don't shuffle tools — tool order matters for cache prefix matching
  3. Let it work automatically — the default CacheStrategy::Auto is optimal for most use cases. The three strategies are Auto (recommended), Disabled (no breakpoints), and Manual (fine-grained control)
  4. Monitor cache_hit_rate() — if it's consistently low, check if your system prompt or tools are changing unexpectedly

Retry with Backoff

When an LLM provider returns a transient error — rate limit (HTTP 429) or network failure — phi-core automatically retries with exponential backoff and jitter. No configuration required; it works out of the box.

How it works

Request → Error? → Retryable? → Wait (backoff + jitter) → Retry → ...
                       ↓ No
                  Fail immediately
  1. The agent loop calls the provider
  2. If the provider returns a retryable error:
    • If a retry-after delay was provided (rate limits), use that
    • Otherwise, calculate delay: initial_delay × multiplier^(attempt-1) with ±20% jitter
    • Wait, then retry
  3. After max_retries attempts, the error propagates normally

What gets retried

Error Type | Retried? | Why
RateLimited (429) | ✅ Yes | Temporary — provider will accept requests again soon
Network | ✅ Yes | Transient — connection resets, timeouts, DNS failures
Auth (401/403) | ❌ No | Permanent — wrong API key won't fix itself
Api (400, etc.) | ❌ No | Permanent — bad request won't change on retry
Cancelled | ❌ No | User-initiated — respect the cancellation
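The classification above boils down to a two-variant match. The sketch below uses a local ErrorSketch enum whose variant names follow the table; the payload types are assumptions, and is_retryable is a hypothetical helper rather than a phi-core function.

```rust
/// Local stand-in for the error kinds in the table above. Payload types are assumed.
#[allow(dead_code)]
#[derive(Debug)]
enum ErrorSketch {
    RateLimited { retry_after_ms: Option<u64> },
    Network(String),
    Auth(String),
    Api(String),
    Cancelled,
}

/// Only transient failures are worth retrying; everything else fails immediately.
fn is_retryable(err: &ErrorSketch) -> bool {
    matches!(
        err,
        ErrorSketch::RateLimited { .. } | ErrorSketch::Network(_)
    )
}
```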

Default configuration

#![allow(unused)]
fn main() {
RetryConfig {
    max_retries: 3,          // Up to 3 retry attempts
    initial_delay_ms: 1000,  // 1 second before first retry
    backoff_multiplier: 2.0, // Double the delay each attempt
    max_delay_ms: 30_000,    // Cap at 30 seconds
}
}

With defaults, the retry delays are approximately:

  • Attempt 1: ~1s
  • Attempt 2: ~2s
  • Attempt 3: ~4s

(±20% jitter to avoid thundering herd when multiple agents hit the same provider)
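The delay arithmetic can be written out as a small pure function. This is a sketch under stated assumptions, not the library's code: RetrySketch mirrors the config fields above, retry_delay_ms is a hypothetical name, jitter is passed in as a fraction so the function stays deterministic, and applying the cap before the jitter is an assumption about ordering.

```rust
/// Local mirror of the retry settings shown above.
struct RetrySketch {
    max_retries: u32,
    initial_delay_ms: u64,
    backoff_multiplier: f64,
    max_delay_ms: u64,
}

/// Delay before retry number `attempt` (1-based), or None once retries are exhausted.
/// `jitter` is a fraction in [-0.2, 0.2] chosen by the caller, e.g. from a RNG.
fn retry_delay_ms(
    cfg: &RetrySketch,
    attempt: u32,
    retry_after_ms: Option<u64>,
    jitter: f64,
) -> Option<u64> {
    if attempt == 0 || attempt > cfg.max_retries {
        return None;
    }
    // A provider-supplied retry-after value wins over the calculated backoff.
    if let Some(ms) = retry_after_ms {
        return Some(ms);
    }
    // initial_delay × multiplier^(attempt-1)
    let base = cfg.initial_delay_ms as f64 * cfg.backoff_multiplier.powi(attempt as i32 - 1);
    // Assumption: the cap is applied first, then the jitter.
    let capped = base.min(cfg.max_delay_ms as f64);
    let jittered = capped * (1.0 + jitter.clamp(-0.2, 0.2));
    Some(jittered.round() as u64)
}
```

With the default settings and zero jitter this reproduces the 1 s, 2 s, 4 s sequence listed above; a jitter of +0.2 on the first retry gives 1.2 s.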

Configuration

Using the Agent builder

#![allow(unused)]
fn main() {
use phi_core::{BasicAgent, RetryConfig};
use phi_core::provider::ModelConfig;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

// Default — 3 retries, exponential backoff (recommended)
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
));

// Custom — more retries, longer initial delay
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_retry_config(RetryConfig {
    max_retries: 5,
    initial_delay_ms: 2000,
    backoff_multiplier: 2.0,
    max_delay_ms: 60_000,
});

// Disable retries entirely
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_retry_config(RetryConfig::none());
}

Using AgentLoopConfig directly

#![allow(unused)]
fn main() {
use phi_core::agent_loop::AgentLoopConfig;
use phi_core::RetryConfig;

let config = AgentLoopConfig {
    // ...other fields...
    retry_config: RetryConfig {
        max_retries: 3,
        initial_delay_ms: 1000,
        backoff_multiplier: 2.0,
        max_delay_ms: 30_000,
    },
    ..Default::default()
};
}

Rate limit headers

When a provider returns ProviderError::RateLimited { retry_after_ms: Some(5000) }, phi-core uses that exact delay instead of the calculated backoff. This respects the provider's guidance — if Anthropic says "retry after 5 seconds", we wait 5 seconds, not our own estimate.

If no retry_after_ms is provided, the exponential backoff kicks in.

Observability

Retry attempts are logged via tracing at the WARN level:

WARN Provider error (attempt 1/3), retrying in 1.1s: Rate limited, retry after 1000ms
WARN Provider error (attempt 2/3), retrying in 2.3s: Rate limited, retry after 2000ms

Subscribe to tracing events in your application to surface these in your UI:

#![allow(unused)]
fn main() {
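// Pick ONE of the two setups below: installing a second global subscriber panics.

// Simple logging (tracing-subscriber's fmt output goes to stdout by default)
tracing_subscriber::fmt::init();

// Or filter to just retries (needs tracing-subscriber's "env-filter" feature)
tracing_subscriber::fmt()
    .with_env_filter("phi_core::provider::retry=warn")
    .init();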
}

Design notes

  • Retry lives in the agent loop, not inside individual providers. One config controls all retry behavior.
  • Jitter prevents thundering herd: when many agents hit a rate limit simultaneously, jitter spreads their retries so they don't all retry at the same instant.
  • Cancellation is respected: if the user cancels while waiting for a retry, the loop exits immediately.
  • No retry on API errors: a malformed request will fail the same way every time. Retrying wastes time and tokens.

Skills

Skills extend an agent with domain expertise using the AgentSkills open standard. A skill is a directory containing a SKILL.md file with instructions the agent can load on demand.

How it works

Skills use progressive disclosure to manage context efficiently:

  1. Metadata (~100 tokens/skill) — name + description, always in the system prompt
  2. Instructions (<5k tokens) — SKILL.md body, loaded when the agent decides the skill is relevant
  3. Resources (unlimited) — scripts, references, assets — loaded only when needed

The agent decides when to activate a skill based on the description alone. No trigger engine needed.

Skill format

my-skill/
├── SKILL.md          # Required: YAML frontmatter + instructions
├── scripts/          # Optional: executable code
├── references/       # Optional: documentation loaded on demand
└── assets/           # Optional: templates, static resources

SKILL.md uses YAML frontmatter:

---
name: git
description: Git operations — commit, branch, merge, rebase. Use when the user mentions version control.
---

# Git Skill

## Workflow
1. Run `git status` first
2. Stage changes, write conventional commit messages
3. For merges, check for conflicts first

## Scripts
For complex diffs: `bash {baseDir}/scripts/diff_summary.sh`
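Reading the two metadata fields out of such a file does not need a full YAML parser for simple cases. The sketch below is a deliberately minimal, hypothetical parser (parse_frontmatter is not a phi-core function). It assumes single-line `key: value` pairs between the two `---` delimiters and ignores everything else YAML allows.

```rust
/// Extracts (name, description) from a SKILL.md string.
/// Assumes flat, single-line `key: value` pairs; this is not a general YAML parser.
fn parse_frontmatter(skill_md: &str) -> Option<(String, String)> {
    let mut lines = skill_md.lines();
    // The file must open with the frontmatter delimiter.
    if lines.next()?.trim() != "---" {
        return None;
    }
    let mut name: Option<String> = None;
    let mut description: Option<String> = None;
    for line in lines {
        if line.trim() == "---" {
            // Closing delimiter: both fields must have been seen.
            return Some((name?, description?));
        }
        if let Some((key, value)) = line.split_once(':') {
            match key.trim() {
                "name" => name = Some(value.trim().to_string()),
                "description" => description = Some(value.trim().to_string()),
                _ => {} // unknown keys are ignored
            }
        }
    }
    None // closing delimiter missing
}
```

Splitting on the first colon only means a description may itself contain colons without being cut short.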

Loading skills

#![allow(unused)]
fn main() {
use phi_core::SkillSet;
use std::path::PathBuf;

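// Load from multiple directories (later dirs override earlier on name conflict),
// so the workspace directory goes last. Note: PathBuf does not expand "~" by itself;
// resolve the home directory first if the loader does not do it for you.
let skills = SkillSet::load(&[PathBuf::from("~/.phi-core/skills"), PathBuf::from("./skills")]);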

// Or load from a single directory with a label
let workspace_skills = SkillSet::load_dir("./skills", "workspace");
}

Using with Agent

#![allow(unused)]
fn main() {
use phi_core::{BasicAgent, SkillSet};
use phi_core::provider::ModelConfig;
use std::path::PathBuf;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let skills = SkillSet::load(&[PathBuf::from("./skills")]);

let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_system_prompt("You are a coding assistant.")
.with_skills(skills)  // Appends skill index to system prompt
.with_tools(tools);
}

The agent's system prompt will include:

<available_skills>
  <skill>
    <name>git</name>
    <description>Git operations — commit, branch, merge, rebase.</description>
    <location>/path/to/skills/git/SKILL.md</location>
  </skill>
</available_skills>

When the agent encounters a task matching a skill, it reads the SKILL.md using the read_file tool and follows the instructions. No special infrastructure needed.
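The skill index shown above is plain string assembly. The sketch below reproduces that layout from a list of metadata records; SkillMeta and render_skill_index are hypothetical names, and the sketch performs no XML escaping, which a real implementation would likely need for descriptions containing `<` or `&`.

```rust
/// Hypothetical metadata record: just the three fields shown in the index above.
struct SkillMeta {
    name: String,
    description: String,
    location: String,
}

/// Renders the <available_skills> block. No XML escaping is done in this sketch.
fn render_skill_index(skills: &[SkillMeta]) -> String {
    let mut out = String::from("<available_skills>\n");
    for skill in skills {
        out.push_str("  <skill>\n");
        out.push_str(&format!("    <name>{}</name>\n", skill.name));
        out.push_str(&format!("    <description>{}</description>\n", skill.description));
        out.push_str(&format!("    <location>{}</location>\n", skill.location));
        out.push_str("  </skill>\n");
    }
    out.push_str("</available_skills>");
    out
}
```

Because only these three short fields are rendered per skill, the always-present cost stays near the ~100 tokens per skill mentioned earlier.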

Precedence

When loading from multiple directories, later directories take precedence. With ~/.phi-core/skills/ listed first and ./skills/ listed last, a skill in ./skills/ overrides the same-named skill in ~/.phi-core/skills/.

You can also merge skill sets explicitly:

#![allow(unused)]
fn main() {
let mut base = SkillSet::load_dir("/usr/share/phi-core/skills", "bundled")?;
let user = SkillSet::load_dir("~/.phi-core/skills", "user")?;
let workspace = SkillSet::load_dir("./skills", "workspace")?;

base.merge(user);
base.merge(workspace); // workspace wins on conflict
}
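The "later wins" rule behaves like inserting into a map keyed by skill name. The sketch below models a skill set as a HashMap from name to source label; skill_map and merge_skills are hypothetical helpers used only to illustrate the precedence, not the SkillSet API.

```rust
use std::collections::HashMap;

/// Builds a name -> source-label map for one directory (hypothetical helper).
fn skill_map(label: &str, names: &[&str]) -> HashMap<String, String> {
    names
        .iter()
        .map(|name| (name.to_string(), label.to_string()))
        .collect()
}

/// Merges `other` into `base`; on a name conflict the entry from `other` wins.
fn merge_skills(base: &mut HashMap<String, String>, other: HashMap<String, String>) {
    for (name, source) in other {
        base.insert(name, source);
    }
}
```

Merging bundled, then user, then workspace leaves a conflicting name pointing at the workspace copy, while names that exist only in the bundled set remain available.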

Compatibility

By following the AgentSkills standard, skills written for phi-core work with Claude Code, Codex CLI, Gemini CLI, Cursor, OpenCode, Goose, and any other compatible agent. Write once, use everywhere.

Design philosophy

Skills are deliberately simple:

  • No trigger engine — the LLM decides from descriptions
  • No compile-time registration — skills use existing tools (read_file, bash)
  • No plugin API — skills are just files
  • No runtime discovery — the skill index is built once at startup; SKILL.md bodies are read on demand through ordinary tools

If a skill needs a custom tool, it can provide an MCP server.

Sub-Agents

Sub-agents let a parent agent delegate tasks to child agent loops, each with their own system prompt, tools, and ModelConfig. The parent LLM invokes them like any other tool.

Overview

Parent Agent
├── prompt("Research X and implement Y")
│   ├── calls SubAgentTool("researcher", task="Research X")
│   │   └── child agent_loop() with read/search tools → returns findings
│   ├── calls SubAgentTool("coder", task="Implement Y based on findings")
│   │   └── child agent_loop() with edit/write tools → returns result
│   └── summarizes both results

Each sub-agent invocation starts a fresh conversation — no state leaks between calls.

Creating Sub-Agents

#![allow(unused)]
fn main() {
use phi_core::agents::SubAgentTool;
use phi_core::provider::ModelConfig;
use phi_core::tools;
use std::sync::Arc;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

let researcher = SubAgentTool::new(
    "researcher",
    ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key),
)
.with_description("Searches and reads files to gather information.")
.with_system_prompt("You are a research assistant. Be thorough and concise.")
.with_tools(vec![
    Arc::new(tools::ReadFileTool::new()),
    Arc::new(tools::SearchTool::new()),
])
.with_max_turns(10);
}

Registering on a Parent Agent

#![allow(unused)]
fn main() {
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let mut agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_system_prompt("You coordinate between sub-agents.")
.with_sub_agent(researcher)
.with_sub_agent(coder);
}

The parent sees sub-agents as regular tools. It decides when to delegate based on its system prompt.

Parallel Execution

When the parent LLM calls multiple sub-agents in a single response, they run concurrently (default Parallel strategy). Two sub-agents each taking 50ms complete in ~50ms total, not 100ms.

Configuration

Method | Purpose
with_description() | What the parent LLM sees (helps it decide when to delegate)
with_system_prompt() | The sub-agent's own instructions
with_provider_override(provider) | Bypass ProviderRegistry (primarily for tests)
with_tools() | Tools available to the sub-agent (accepts Vec<Arc<dyn AgentTool>>)
with_max_turns(N) | Turn limit (default: 10). Primary guard against runaway execution.
with_max_tokens(N) | Max tokens for LLM responses
with_thinking() | Enable extended thinking for the sub-agent
with_cache_config() | Prompt caching settings
with_tool_execution(strategy) | Tool execution strategy (Parallel, Sequential, Batched)
with_retry_config(config) | Retry configuration for transient errors
with_parent_loop_id(id: String) | Sets parent_loop_id on the child's AgentContext. The child's AgentStart event will carry this value, enabling parent→child ancestry tracing across the event stream.

Event Forwarding

When the parent provides an on_update callback (standard for all tools), sub-agent events are forwarded as ToolExecutionUpdate events. The parent's UI sees real-time progress from the child:

  • Text deltas from the sub-agent's LLM responses
  • Tool call notifications from the sub-agent's tool usage

When the child loop completes, the parent emits ToolExecutionEnd with child_loop_id: Some(loop_id) set to the child's loop_id. This lets you correlate ToolExecutionEnd on the parent side with AgentStart/AgentEnd on the child side when both event streams are consumed.

Design Decisions

  • Context isolation: Each invocation starts fresh. Sub-agents don't accumulate history across calls.
  • No nesting: Sub-agents are not given other SubAgentTools. This prevents infinite delegation chains.
  • Cancellation propagation: The parent's cancellation token is forwarded. Aborting the parent aborts all sub-agents.
  • Turn limiting: The default 10-turn limit prevents runaway execution. The parent's execution limits also apply to total wall-clock time.

Example

See examples/sub_agent.rs for a complete coordinator with researcher and coder sub-agents.

State Persistence

phi-core supports saving and restoring agent conversation state, enabling pause/resume workflows, state transfer between processes, and conversation checkpointing.

Save and Restore

#![allow(unused)]
fn main() {
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

// After running some conversation turns...
let json = agent.save_messages();
std::fs::write("conversation.json", &json)?;

// Later, in a new process:
let json = std::fs::read_to_string("conversation.json")?;
let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let mut agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_system_prompt("You are helpful.");

agent.restore_messages(&json)?;

// Continue the conversation — the agent sees the full history
let rx = agent.prompt("Follow up question").await;
}

Builder Initialization

For constructing an agent with pre-existing history:

#![allow(unused)]
fn main() {
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

let saved: Vec<AgentMessage> = serde_json::from_str(&json)?;
let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .with_messages(saved)
    .with_system_prompt("...");
}

JSON Format

Messages serialize as a JSON array. Each message is tagged by role:

[
  {
    "role": "user",
    "content": [{"type": "text", "text": "Hello"}],
    "timestamp": 1700000000000
  },
  {
    "role": "assistant",
    "content": [{"type": "text", "text": "Hi there!"}],
    "stopReason": "stop",
    "model": "claude-sonnet-4-20250514",
    "provider": "anthropic",
    "usage": {"input": 100, "output": 50, "cache_read": 0, "cache_write": 0, "total_tokens": 150},
    "timestamp": 1700000001000
  }
]

Extension messages use a nested structure:

{
  "role": "extension",
  "kind": "status_update",
  "data": {"status": "running"}
}

Context Tracking

ContextTracker and ExecutionTracker are runtime-only and not persisted. This is by design: both are created fresh on each agent_loop() invocation and operate on whatever messages are in context at that point. Restoring messages and calling prompt() works correctly without any special recalculation.

What's Serializable

Type | Serialize | Deserialize | PartialEq
Content | Yes | Yes | Yes
Message | Yes | Yes | Yes
AgentMessage | Yes | Yes | Yes
ExtensionMessage | Yes | Yes | Yes
Usage | Yes | Yes | Yes
StopReason | Yes | Yes | Yes
ToolResult | Yes | Yes | Yes
CacheConfig | Yes | Yes | Yes
ToolExecutionStrategy | Yes | Yes | Yes
ContextConfig | Yes | Yes | No
ExecutionLimits | Yes | Yes | No

Lifecycle Callbacks

phi-core provides four tiers of lifecycle callbacks that let you observe and control the agent loop without modifying its internals. Loop-level, turn-level, and tool-level callbacks are set on AgentLoopConfig (or via Agent builder methods). Session-level callbacks (before_task / after_task) are set on SessionRecorderConfig.

0.9.0 — async hook bodies. All loop-level, turn-level, and the non-update tool-level hooks below (plus the two compaction hooks) are now async. The on_* builders on BasicAgent accept closures whose bodies return Pin<Box<dyn Future<Output = T> + Send>> — wrap sync bodies in Box::pin(async move { ... }), or .await LLM and other async work directly. The two tool-update hooks (before_tool_execution_update / after_tool_execution_update) stay sync — see the note next to their sections for the rationale. CHANGELOG [0.9.0] § Migration carries the full mechanical recipe.

Tiers Overview

Tier | Hooks | Scope
Session-level | before_task, after_task | Once per session (on SessionRecorderConfig)
Loop-level | before_loop, after_loop | Once per agent_loop() / agent_loop_continue() call
Turn-level | before_turn, after_turn, on_error | Once per LLM call (every turn)
Tool-level | before_tool_execution, after_tool_execution, before_tool_execution_update, after_tool_execution_update | Once per tool call

Loop-Level Hooks

before_loop

Called once before AgentStart is emitted. Receives the current message history and an initial usage counter of 0. Return false to abort the entire run — AgentEnd is emitted with an empty message list and the loop exits immediately.

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
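    .on_before_loop(|messages, _usage| {
        println!("Starting run with {} existing messages", messages.len());
        // Async hook (0.9.0): return a boxed future that resolves to the verdict.
        Box::pin(async move { true }) // resolve to false to abort
    });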
}

after_loop

Called once after AgentEnd is emitted. Receives the new messages produced during the run and the accumulated Usage across all turns.

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
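    .on_after_loop(|new_messages, total_usage| {
        println!(
            "Run complete: {} new messages, {} total tokens",
            new_messages.len(),
            total_usage.total_tokens
        );
        // Async hook (0.9.0): the sync work is done, so return an empty boxed future.
        Box::pin(async move {})
    });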
}

Turn-Level Hooks

before_turn

Called before each LLM call. Receives the current message history and the turn number (0-indexed). Return false to abort the loop.

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
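    .on_before_turn(|messages, turn| {
        println!("Turn {} starting with {} messages", turn, messages.len());
        let keep_going = turn < 10; // Stop after 10 turns
        // Async hook (0.9.0): return a boxed future that resolves to the verdict.
        Box::pin(async move { keep_going })
    });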
}

after_turn

Called after each LLM response and tool execution. Receives the updated message history and the turn's token usage.

#![allow(unused)]
fn main() {
use std::sync::{Arc, Mutex};

// Counts tokens, not dollars, hence the name.
let total_tokens = Arc::new(Mutex::new(0u64));
let token_tracker = total_tokens.clone();

let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_after_turn(move |_messages, usage| {
        let mut total = token_tracker.lock().unwrap();
        *total += usage.input + usage.output;
        println!("Cumulative tokens: {}", *total);
        // Async hook (0.9.0): the sync work is done, so return an empty boxed future.
        Box::pin(async move {})
    });
}

on_error

Called when the LLM returns a StopReason::Error. Receives the error message string.

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
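    .on_error(|err| {
        eprintln!("LLM error: {}", err);
        // Log to monitoring, send alert, etc.
        // Async hook (0.9.0): return a boxed future; await async reporting inside it if needed.
        Box::pin(async move {})
    });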
}

Tool-Level Hooks

before_tool_execution

Called before each tool starts, after the ToolExecutionStart event would normally be emitted. Receives the tool name, call ID, and arguments. Return false to skip the tool: a ToolExecutionEnd with an error result is emitted and the tool's execute() is never called.

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
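    .on_before_tool_execution(|name, _call_id, _args| {
        println!("About to run tool: {}", name);
        // Resolve to false to block specific tools:
        let allowed = name != "bash"; // block bash, allow everything else
        // Async hook (0.9.0): return a boxed future that resolves to the verdict.
        Box::pin(async move { allowed })
    });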
}

after_tool_execution

Called after each tool finishes (after ToolExecutionEnd is emitted). Receives the tool name, call ID, and whether the result was an error.

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
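    .on_after_tool_execution(|name, call_id, is_error| {
        if is_error {
            eprintln!("Tool {} ({}) failed", name, call_id);
        }
        // Async hook (0.9.0): the sync work is done, so return an empty boxed future.
        Box::pin(async move {})
    });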
}

before_tool_execution_update (sync — see note below)

Called before each ToolExecutionUpdate event (streaming progress from a running tool). Return false to suppress the event — the tool keeps running and the final ToolResult is unaffected; only the intermediate streaming update is dropped.

Pre-existing-behaviour preservation note (phi-core 0.9.0). The two tool-update hooks (before_tool_execution_update / after_tool_execution_update) remain sync after the 0.9.0 async-trait migration. Async-ifying them would cascade into the ToolUpdateFn callback type and every AgentTool::execute body that invokes ctx.on_update(...) — materially wider than the 0.9.0 scope. The veto decision in before_tool_execution_update must be synchronous so the surrounding emit gate works without an .await suspension at every streamed tool-update. Async work at update-time should be dispatched via tokio::spawn(...) inside the sync closure body. Tracked in the CHANGELOG [Unreleased] "Forward markers" for a future release.

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_before_tool_execution_update(|name, call_id, text| {
        // Only forward updates for bash tool
        name == "bash"
    });
}

after_tool_execution_update

Called after each ToolExecutionUpdate event, only if it was not suppressed by before_tool_execution_update.

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_after_tool_execution_update(|name, call_id, text| {
        // e.g., log streaming updates to a file
    });
}

Script Callbacks

In addition to Rust closures, callbacks can be implemented as external shell or Python scripts. This allows non-Rust consumers to hook into the agent lifecycle without compiling Rust code.

Script callbacks are specified as command strings (e.g., "./scripts/on_task_start.sh" or "python3 scripts/after_turn.py"). The agent loop spawns the script as a subprocess, passing relevant context (such as session ID, turn number, or tool name) as environment variables or arguments. The script's exit code determines whether the action proceeds (0 = continue, non-zero = abort, for Before* hooks).

Script callbacks can be configured in the [callbacks] section of the config file or set programmatically via the Agent trait.

All callback tiers are wired in the script callback bridge. Loop-level (before_loop, after_loop), tool-level (before_tool_execution, after_tool_execution), compaction-level (before_compaction_start, after_compaction_end), and turn-level (before_turn, after_turn) hooks are all resolved from the [callbacks] config section and bridged to external scripts. The bridge passes hook context as JSON (message count, turn index, tool name, etc.) via stdin to the subprocess.
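
As a rough illustration of the exit-code contract described above, the sketch below spawns a command, pipes a JSON context string to its stdin, and maps exit status 0 to "continue". It is not phi-core's bridge: the function name run_script_hook, the use of sh -c to run the command string, and the shape of the JSON payload are assumptions made for this example, and it expects a Unix-like system with sh on the PATH.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Hypothetical helper, not part of phi-core.
/// Runs `command` through `sh -c` (an assumption), writes `context_json` to
/// its stdin, and returns true when the script exits with status 0
/// (continue) and false for any other exit status (abort).
fn run_script_hook(command: &str, context_json: &str) -> std::io::Result<bool> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(command)
        .stdin(Stdio::piped())
        .spawn()?;

    // Send the hook context, then drop the handle so the script sees EOF.
    // A script that exits without reading stdin closes the pipe early, so a
    // write error here is ignored rather than treated as a veto.
    if let Some(mut stdin) = child.stdin.take() {
        let _ = stdin.write_all(context_json.as_bytes());
    }

    Ok(child.wait()?.success())
}
```

A Before* hook would proceed only when this returns Ok(true). How spawn failures should be treated (veto or pass-through) is left open here because the text above does not say.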


Hook Ordering

The hooks fire in strict order relative to their paired events. This ordering is an invariant — it is enforced at runtime:

before_loop
  → AgentStart
    before_turn
      → TurnStart
        [MessageStart / MessageUpdate* / MessageEnd]
        [per tool call:]
          before_tool_execution
            → ToolExecutionStart
              (before_tool_execution_update → ToolExecutionUpdate → after_tool_execution_update)*
            ToolExecutionEnd →
          after_tool_execution
        [if context budget exceeded:]
          before_compaction_start
            → CompactionStarted
            CompactionEnded →
          after_compaction_end
      TurnEnd →
    after_turn
  AgentEnd →
after_loop

Short-Circuit Rules

| Hook returns false | Effect |
|---|---|
| before_loop | Aborts before AgentStart; emits AgentEnd(messages=[]) |
| before_turn | Skips turn; neither TurnStart nor TurnEnd is emitted |
| before_tool_execution | Skips tool; emits error ToolExecutionEnd without calling execute() |
| before_tool_execution_update | Suppresses ToolExecutionUpdate; tool keeps running; ToolResult unaffected |
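
The update-level rule can be modelled in a few lines of standalone Rust. This is a toy model of the gate, not the library's code: emit_update, run_fake_tool, and the string-based events are invented for the illustration. It shows the behaviour stated above: a vetoed update produces no event and no after-hook call, and the tool's final result is the same either way.

```rust
/// Models the gate around one streamed tool update (illustrative names).
/// Returns true when the update was emitted. `before` may veto it; `after`
/// runs only for updates that were actually emitted.
fn emit_update(
    text: &str,
    before: &dyn Fn(&str) -> bool,
    after: &mut dyn FnMut(&str),
    emitted: &mut Vec<String>,
) -> bool {
    if !before(text) {
        return false; // suppressed: no event, no after-hook
    }
    emitted.push(format!("ToolExecutionUpdate({text})"));
    after(text);
    true
}

/// Runs a fake tool that streams `updates` and then returns its result.
/// The result does not depend on which updates were suppressed.
fn run_fake_tool(
    updates: &[&str],
    before: &dyn Fn(&str) -> bool,
    after: &mut dyn FnMut(&str),
) -> (Vec<String>, String) {
    let mut emitted = Vec::new();
    for update in updates {
        emit_update(update, before, after, &mut emitted);
    }
    (emitted, format!("done after {} updates", updates.len()))
}
```

Calling run_fake_tool with a before closure that rejects one of three updates yields two emitted events and two after-hook calls, while the result string still reports all three updates.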

Steering Checkpoints

Steering messages (injected via the agent's steering queue) are checked at six specific points in the turn cycle. These checkpoints give the caller opportunities to redirect the agent mid-run without waiting for the current loop iteration to complete.

The Six Checkpoints

  1. Before turn -- After before_turn fires, before the LLM call. The steering message is prepended to the message history as a User message before the model sees it.
  2. After turn -- After the LLM response is received and after_turn fires. Steering is appended before the next turn begins.
  3. Between tool executions (Sequential) -- When tool_strategy = "sequential", the steering queue is checked between each individual tool call. This is the finest-grained checkpoint.
  4. Between batches (Batched) -- When tool_strategy = "batched", the steering queue is checked after each batch completes, before the next batch starts.
  5. After all tools (Parallel) -- When tool_strategy = "parallel", steering is checked once after all tool calls complete. No mid-batch interruption.
  6. On loop re-entry -- At the top of each loop iteration, before before_turn fires.

Per-Strategy Behavior

| Strategy | When steering is checked | Granularity |
|---|---|---|
| Sequential | Between each tool call | Per-tool |
| Batched | After each batch completes | Per-batch |
| Parallel | After all tools complete | Post-batch |

Checkpoints 1, 2, and 6 apply under every strategy. The strategy only affects when steering is checked during tool execution (checkpoints 3-5).

Why Mid-Stream and Mid-Tool Steering Is Not Supported

Steering is intentionally not checked:

  • During an LLM streaming response -- The SSE stream is atomic from the agent loop's perspective. Interrupting a partial response would produce an inconsistent message (partial assistant text with no stop reason). The model's response must complete or fail before steering can take effect.
  • During a single tool's execution -- A tool call is an atomic unit. Interrupting a bash command mid-execution or a file write mid-stream would leave the environment in an undefined state. The tool must return its ToolResult before steering is considered.

These boundaries are not limitations but invariants that keep the message history and environment consistent.

Hard Abort with CancellationToken

For cases where waiting for the next steering checkpoint is unacceptable (e.g., runaway tool, user-initiated cancel), CancellationToken provides a hard abort:

#![allow(unused)]
fn main() {
use tokio_util::sync::CancellationToken;

let cancel = CancellationToken::new();
let cancel_clone = cancel.clone();

// In another task:
cancel_clone.cancel(); // triggers immediate abort
}

When the token is cancelled:

  • The current LLM stream is dropped (partial response discarded)
  • Running tools are cancelled via their async cancellation
  • The loop emits AgentEnd with StopReason::Aborted
  • No further turns or tool calls are attempted

CancellationToken is a last resort. Prefer steering for graceful redirection; use cancellation only when the agent must stop immediately.


Combining Callbacks

All callbacks are optional and independent:

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_before_loop(|_msgs, _| true)
    .on_after_loop(|msgs, usage| {
        println!("Done: {} messages, {} tokens", msgs.len(), usage.total_tokens);
    })
    .on_before_turn(|_msgs, turn| turn < 20)
    .on_after_turn(|msgs, usage| {
        println!("Messages: {}, Tokens: {}/{}", msgs.len(), usage.input, usage.output);
    })
    .on_error(|err| eprintln!("Error: {}", err))
    .on_before_tool_execution(|name, _id, _args| {
        println!("Running: {}", name);
        true
    })
    .on_after_tool_execution(|name, _id, is_error| {
        println!("Tool {} finished (error={})", name, is_error);
    });
}

Using with AgentLoopConfig

For direct loop usage without the Agent wrapper:

#![allow(unused)]
fn main() {
use std::sync::Arc;
use phi_core::agent_loop::AgentLoopConfig;
use phi_core::provider::ModelConfig;

let config = AgentLoopConfig {
    model_config: ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key),
    // Loop-level
    before_loop: Some(Arc::new(|_msgs, _| true)),
    after_loop: Some(Arc::new(|msgs, usage| { /* log */ })),
    // Turn-level
    before_turn: Some(Arc::new(|_msgs, turn| turn < 5)),
    after_turn: Some(Arc::new(|_msgs, _usage| { /* log */ })),
    on_error: Some(Arc::new(|err| eprintln!("{}", err))),
    // Tool-level
    before_tool_execution: Some(Arc::new(|name, id, args| true)),
    after_tool_execution: Some(Arc::new(|name, id, is_error| {})),
    before_tool_execution_update: Some(Arc::new(|name, id, text| true)),
    after_tool_execution_update: Some(Arc::new(|name, id, text| {})),
    ..Default::default()
};
}

Sessions

A Session is a named container (keyed by session_id) that groups all LoopRecords belonging to one agent session. Sessions provide persistent, structured memory of every agent interaction — suitable for logging, replay, branching, and tracing agent-spawning chains.

The session module is split into sub-modules: model, recorder, storage, helpers.

Session (session_id)
├── LoopRecord (loop_id: A)       ← origin loop
│   ├── LoopRecord (loop_id: B)   ← continuation of A
│   └── LoopRecord (loop_id: C)   ← another continuation of A
│       ├── LoopRecord (loop_id: D)  ← parallel branch
│       └── LoopRecord (loop_id: E)  ← parallel branch (selected)
└── child_loop_refs → Session (sub-agent session)

Overview

| Concept | Description |
|---|---|
| Session | Container for all loops belonging to one session_id |
| LoopRecord | Complete record of one agent_loop / agent_loop_continue execution |
| LoopEvent | One event in a loop's ordered event stream |
| SessionRecorder | Stateful consumer that builds sessions from AgentEvent streams |

Relationship to loops

One session contains many loops. Loops within a session form a tree via parent_loop_id / children_loop_ids links. Parallel-evaluation branches form a sibling group linked by ParallelGroupRecord. Sub-agent loops are cross-session (different session_id) and connected via ChildLoopRef / SpawnRef instead.


Session Formation

A new Session is opened when SessionRecorder first encounters a session_id it has not seen before. Three scenarios produce a new session:

PerSessionId (default)

One Session per session_id. Maps naturally onto BasicAgent lifetime — one BasicAgent instance = one session for its entire lifetime.

#![allow(unused)]
fn main() {
let mut recorder = SessionRecorder::new(SessionRecorderConfig::default());
// Every event from a single BasicAgent feeds into one Session.
}

When to use: The default for most applications. No infrastructure needed.

InactivityTimeout

Opens a new session when the agent has been idle for longer than a configured threshold. Requires the caller to rotate session_id beforehand — the recorder detects the new session_id on the next AgentStart.

#![allow(unused)]
fn main() {
// In your agent orchestrator, before prompting:
if agent.check_and_rotate(Duration::from_secs(1800)).is_some() {
    println!("Started new session after 30 minutes idle");
}
}

When to use: Long-running assistants where each "conversation" should be a distinct session even if the BasicAgent object persists.

Explicit rotation

Call BasicAgent::new_session() directly to rotate immediately.

#![allow(unused)]
fn main() {
let new_id = agent.new_session();
// All subsequent loops belong to the new session.
}

When to use: At conversation boundaries you control explicitly (e.g. "clear chat" button, new document context).


LoopRecord Anatomy

Field table

| Field | Type | Description |
|---|---|---|
| loop_id | String | Unique id for this execution |
| session_id | String | Session this loop belongs to |
| agent_id | String | Agent that ran this loop |
| parent_loop_id | Option<String> | Preceding loop (same or different session) |
| continuation_kind | ContinuationKind | How this loop relates to its parent (Initial for first loops) |
| started_at | DateTime<Utc> | Timestamp from AgentStart |
| ended_at | Option<DateTime<Utc>> | Timestamp from AgentEnd |
| status | LoopStatus | Lifecycle state |
| rejection | Option<String> | Input-filter rejection reason (if any) |
| config | Option<LoopConfigSnapshot> | Model/provider that ran this loop |
| messages | Vec<AgentMessage> | All new messages produced (from AgentEnd) |
| turns | Vec<Turn> | Materialized turn records (one per LLM call-response cycle). Built from TurnStart/TurnEnd event pairs. Empty for old sessions or loops that ended before any turn completed. |
| usage | Usage | Token usage for this loop |
| metadata | Option<Value> | Caller-supplied metadata from AgentStart |
| events | Vec<LoopEvent> | Full ordered event stream |
| children_loop_ids | Vec<String> | Same-session child loops (parent→children) |
| child_loop_refs | Vec<ChildLoopRef> | Cross-session sub-agent spawn links |
| compaction_block | Option<CompactionBlock> | Non-destructive compaction overlay (see below) |
| parallel_group | Option<ParallelGroupRecord> | Parallel-evaluation group metadata |

LoopStatus lifecycle

┌─────────┐  AgentStart  ┌─────────┐ AgentEnd (no rejection)      ┌───────────┐
│ Pending ├─────────────►│ Running ├─────────────────────────────►│ Completed │
└─────────┘              └────┬────┘                              └───────────┘
                              │ AgentEnd (rejection Some)         ┌───────────┐
                              ├──────────────────────────────────►│ Rejected  │
                              │                                   └───────────┘
                              │ flush() before AgentEnd           ┌───────────┐
                              └──────────────────────────────────►│ Aborted   │
                                                                  └───────────┘

Pending is only used for parallel-evaluation branches: they are pre-registered when ParallelLoopStart arrives, before their individual AgentStart fires.
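
Read as a state machine, the diagram has four transitions. The sketch below encodes only those four, using a local stand-in enum rather than phi-core's own types; RecorderInput and next_status are names made up for the example. Transitions the diagram does not draw return None here, which says nothing about how the real recorder treats them. Loops outside parallel evaluation skip Pending and are created directly in Running, which this function does not model.

```rust
/// Local stand-in mirroring the status names in the diagram.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LoopStatus {
    Pending,
    Running,
    Completed,
    Rejected,
    Aborted,
}

/// Hypothetical inputs that move a loop between states.
enum RecorderInput<'a> {
    AgentStart,
    AgentEnd { rejection: Option<&'a str> },
    Flush,
}

/// Returns the next status for the transitions drawn in the diagram,
/// or None for any combination the diagram does not show.
fn next_status(current: LoopStatus, input: &RecorderInput<'_>) -> Option<LoopStatus> {
    use LoopStatus::*;
    match (current, input) {
        (Pending, RecorderInput::AgentStart) => Some(Running),
        (Running, RecorderInput::AgentEnd { rejection: None }) => Some(Completed),
        (Running, RecorderInput::AgentEnd { rejection: Some(_) }) => Some(Rejected),
        (Running, RecorderInput::Flush) => Some(Aborted),
        _ => None,
    }
}
```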

continuation_kind classification

| parent_loop_id | continuation_kind | Meaning |
|---|---|---|
| None | Initial | Fresh origin loop (agent_loop) |
| Same-session parent | Default | Regular continuation |
| Same-session parent | Rerun { tag } | Retry / error recovery |
| Same-session parent | Branch { tag } | Branch exploration |
| Different-session parent | Initial | Sub-agent loop (spawned by a tool) |
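
The table can be expressed as a single match. The following is a sketch under stated assumptions: ContinuationKind here is a simplified local copy (the Compaction variant is left out because the table does not classify it, and tag is assumed to be a String), LoopRole and classify are hypothetical names, and the caller is assumed to have already resolved which session the parent loop belongs to.

```rust
/// Hypothetical label for the "Meaning" column.
#[derive(Debug, PartialEq)]
enum LoopRole {
    Origin,
    Continuation,
    Retry,
    Branch,
    SubAgent,
}

/// Simplified local copy of the variants that appear in the table.
enum ContinuationKind {
    Initial,
    Default,
    Rerun { tag: String },
    Branch { tag: String },
}

/// Classifies a loop from its own session, its parent's session (None when
/// there is no parent), and its continuation kind. Combinations that the
/// table does not list yield None.
fn classify(
    session_id: &str,
    parent_session_id: Option<&str>,
    kind: &ContinuationKind,
) -> Option<LoopRole> {
    match (parent_session_id, kind) {
        (None, ContinuationKind::Initial) => Some(LoopRole::Origin),
        (Some(p), ContinuationKind::Initial) if p != session_id => Some(LoopRole::SubAgent),
        (Some(p), ContinuationKind::Default) if p == session_id => Some(LoopRole::Continuation),
        (Some(p), ContinuationKind::Rerun { .. }) if p == session_id => Some(LoopRole::Retry),
        (Some(p), ContinuationKind::Branch { .. }) if p == session_id => Some(LoopRole::Branch),
        _ => None,
    }
}
```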

LoopConfigSnapshot

LoopConfigSnapshot captures model identity and key configuration from the AgentLoopConfig that ran the loop:

#![allow(unused)]
fn main() {
pub struct LoopConfigSnapshot {
    pub model: String,                    // e.g. "claude-opus-4-6"
    pub provider: String,                 // e.g. "anthropic"
    pub config_id: Option<String>,        // from AgentLoopConfig.config_id
    pub name: Option<String>,             // model display name
    pub api: Option<ApiProtocol>,         // which API protocol was used
    pub base_url: Option<String>,         // provider base URL
    pub reasoning: Option<bool>,          // whether model supports reasoning/thinking
    pub context_window: Option<u32>,      // model context window size
    pub max_tokens: Option<u32>,          // max output tokens
    pub thinking_level: Option<ThinkingLevel>, // reasoning depth for this loop
    pub temperature: Option<f32>,         // sampling temperature
}
}

The first three fields (model, provider, config_id) are always populated. The remaining fields use Option with #[serde(skip_serializing_if = "Option::is_none")] so they only appear when set, keeping serialized output compact.

Why not store the full AgentLoopConfig? The full config contains API keys (in ModelConfig.api_key) and non-serialisable hook closures. Storing it would require stripping secrets and skipping closures for little extra value. LoopConfigSnapshot is sufficient for cost attribution, replay (the caller reconstructs the config), and identifying parallel branches (e.g. "haiku vs. opus").

Note: thinking_level and temperature were previously stored on the Session struct. They are now tracked per-loop in LoopConfigSnapshot, which more accurately reflects that these settings can vary between loops (e.g. across parallel evaluation branches with different configs).

events field

LoopRecord.events contains the AgentEvents recorded for the loop, in emission order, each tagged with a monotonic sequence counter.

MessageUpdate (streaming delta) events are excluded by default — they are 100–1 000× more numerous than final messages and are not needed for replay. Enable them with SessionRecorderConfig { include_streaming_events: true, .. }.

AgentEnd.messages is the authoritative message source for a loop. LoopRecord.messages is populated directly from it. Reconstructing messages from MessageStart/MessageEnd events would be fragile.

compaction_block field

LoopRecord.compaction_block holds a non-destructive compaction overlay. When present, the context loader uses this block instead of the raw messages field to reconstruct the agent's working context. The original messages remain authoritative for replay and branching — they are never mutated or discarded. This overlay model means compaction is always reversible: removing or replacing the CompactionBlock restores the original conversation without data loss.

Independently of compaction, each LoopRecord maintains both directions of the loop tree:

  • LoopRecord.parent_loop_id — child → parent (set at loop creation)
  • LoopRecord.children_loop_ids — parent → children (appended at AgentEnd)

This allows O(1) traversal in either direction without scanning the full loops vec.
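
To make the two link directions concrete, here is a small standalone model. It is not how Session stores its loops: the HashMap index, the LoopNode struct, and the link and path_to_root helpers are assumptions for the example. Only the two field names come from LoopRecord.

```rust
use std::collections::HashMap;

/// Toy record holding just the two tree links.
#[derive(Default)]
struct LoopNode {
    parent_loop_id: Option<String>,
    children_loop_ids: Vec<String>,
}

/// Registers `child_id` under `parent_id`, updating both directions.
fn link(nodes: &mut HashMap<String, LoopNode>, parent_id: &str, child_id: &str) {
    nodes.entry(child_id.to_string()).or_default().parent_loop_id =
        Some(parent_id.to_string());
    nodes
        .entry(parent_id.to_string())
        .or_default()
        .children_loop_ids
        .push(child_id.to_string());
}

/// Follows child -> parent links from `loop_id` up to the root, without
/// touching any loop that is not on that path.
fn path_to_root(nodes: &HashMap<String, LoopNode>, loop_id: &str) -> Vec<String> {
    let mut path = vec![loop_id.to_string()];
    let mut current = loop_id;
    while let Some(parent) = nodes.get(current).and_then(|n| n.parent_loop_id.as_deref()) {
        path.push(parent.to_string());
        current = parent;
    }
    path
}
```

With links A→B, A→C, and C→E, path_to_root for E visits E, C, A, and A's children list reads B, C. Each step is a single lookup in either direction.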


Loop Tree Navigation

Session provides four navigation methods, plus total_usage() for session-wide token totals:

#![allow(unused)]
fn main() {
// Root loops — no parent in this session.
session.root_loops();

// Direct same-session children of a loop.
session.children_of("loop-id-A");

// All parallel siblings (including the loop itself).
session.parallel_siblings("loop-id-branch-1");

// Lookup by id.
session.get_loop("loop-id-X");

// Cumulative token usage for the whole session.
session.total_usage();
}

Reconstructing a conversation thread

Follow the parent→child chain from a root:

#![allow(unused)]
fn main() {
fn print_thread(session: &Session, loop_id: &str, indent: usize) {
    if let Some(lr) = session.get_loop(loop_id) {
        println!("{:indent$}{loop_id}: {:?}", "", lr.status, indent = indent);
        for child_id in &lr.children_loop_ids {
            print_thread(session, child_id, indent + 2);
        }
    }
}

for root in session.root_loops() {
    print_thread(&session, &root.loop_id, 0);
}
}

Identifying branches

Branches share the same parent_loop_id and each has parallel_group set:

#![allow(unused)]
fn main() {
let branches: Vec<_> = session.parallel_siblings("branch-loop-id").collect();
let winner = branches.iter().find(|l| {
    l.parallel_group.as_ref().map(|pg| pg.is_selected).unwrap_or(false)
});
}

Cross-Session Sub-Agent Tracking

Sub-agents run with their own session_id. phi-core maintains bidirectional links between the parent session and the child session:

Parent Session                         Child Session
──────────────────────                 ──────────────────────────
LoopRecord (loop-P)                    Session
  child_loop_refs:                       parent_spawn_ref:
    ChildLoopRef {                         SpawnRef {
      tool_call_id: "call-1"               parent_session_id: "sess-P"
      tool_name: "sub_agent"               parent_loop_id: "loop-P"
      child_loop_id: "loop-C"              tool_call_id: "call-1"
      child_session_id: "sess-C"           tool_name: "sub_agent"
    }                                    }

Tracing a full spawn chain

#![allow(unused)]
fn main() {
// Load parent and child sessions from disk.
let parent = load_session("sess-P", dir)?;
let child = load_session("sess-C", dir)?;

// From parent: find all sub-agent spawns.
for lr in &parent.loops {
    for child_ref in &lr.child_loop_refs {
        println!("Tool {} spawned sub-agent loop {}",
            child_ref.tool_name, child_ref.child_loop_id);
    }
}

// From child: find the parent that triggered it.
if let Some(ref sr) = child.parent_spawn_ref {
    println!("This session was spawned by {} in session {}",
        sr.tool_name, sr.parent_session_id);
}
}

Why sub-agents get separate sessions

Sub-agents have clean identity boundaries — they can be loaded and analyzed independently of their parent. Embedding child data inside the parent session would bloat the parent record and couple two independent execution traces. The bidirectional ChildLoopRef / SpawnRef pair provides a complete spawn graph without that coupling.
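
One practical consequence of the bidirectional pair is that the two halves can be cross-checked after loading. The sketch below uses local structs with the field names from the diagram above; the String field types and the links_agree function are assumptions, not phi-core API.

```rust
/// Outbound half, stored on the parent LoopRecord (field names from the diagram).
#[allow(dead_code)]
struct ChildLoopRef {
    tool_call_id: String,
    tool_name: String,
    child_loop_id: String,
    child_session_id: String,
}

/// Inbound half, stored on the child Session (field names from the diagram).
struct SpawnRef {
    parent_session_id: String,
    parent_loop_id: String,
    tool_call_id: String,
    tool_name: String,
}

/// Hypothetical check: true when both halves describe the same spawn.
fn links_agree(
    parent_session_id: &str,
    parent_loop_id: &str,
    outbound: &ChildLoopRef,
    child_session_id: &str,
    inbound: &SpawnRef,
) -> bool {
    outbound.child_session_id == child_session_id
        && inbound.parent_session_id == parent_session_id
        && inbound.parent_loop_id == parent_loop_id
        && inbound.tool_call_id == outbound.tool_call_id
        && inbound.tool_name == outbound.tool_name
}
```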


Parallel Evaluation Groups

When agent_loop_parallel runs N branches, each branch gets its own LoopRecord. All N records are linked by ParallelGroupRecord:

#![allow(unused)]
fn main() {
pub struct ParallelGroupRecord {
    pub all_loop_ids: Vec<String>,       // all branch loop_ids in config order
    pub selected_loop_id: String,        // winner chosen by EvaluationStrategy
    pub selected_config_index: usize,    // 0-based index into original configs
    pub evaluation_usage: Usage,         // judge LLM tokens (zero if no judge)
    pub is_selected: bool,               // true only on the winner's record
}
}

LoopStatus::Pending is used before AgentStart arrives for each branch. ParallelLoopStart announces all loop_ids in advance, so the group can be registered immediately without retroactive wiring.
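
The field comments imply a relationship between the index and the id of the winner: if all_loop_ids is in config order and selected_config_index indexes the original configs, the two should name the same branch. The following sketch checks that on a trimmed-down local struct. ParallelGroup and both helper functions are illustrative names, and the relationship itself is inferred from the comments rather than stated by the library.

```rust
/// Trimmed-down local copy with only the fields used below.
struct ParallelGroup {
    all_loop_ids: Vec<String>,
    selected_loop_id: String,
    selected_config_index: usize,
}

/// True when the index and the id point at the same branch.
fn selection_is_consistent(group: &ParallelGroup) -> bool {
    group.all_loop_ids.get(group.selected_config_index) == Some(&group.selected_loop_id)
}

/// The value `is_selected` would take on the record stored for `own_loop_id`.
fn is_selected_for(group: &ParallelGroup, own_loop_id: &str) -> bool {
    group.selected_loop_id == own_loop_id
}
```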


SessionRecorder Usage

Wire the recorder to your agent's event channel:

#![allow(unused)]
fn main() {
use phi_core::session::{SessionRecorder, SessionRecorderConfig, save_session};
use phi_core::AgentEvent;
use std::path::Path;
use tokio::sync::mpsc;

let (tx, mut rx) = mpsc::unbounded_channel::<AgentEvent>();
let mut recorder = SessionRecorder::new(SessionRecorderConfig::default());

// Spawn a task to consume events from the channel.
tokio::spawn(async move {
    while let Some(event) = rx.recv().await {
        recorder.on_event(event);
    }
    // Channel closed — flush and persist.
    recorder.flush();
    for session in recorder.drain_completed() {
        save_session(&session, Path::new("./sessions")).unwrap();
    }
});

// Pass tx to agent_loop / agent_loop_continue / BasicAgent.
}

include_streaming_events

Enable only when you need to replay or audit the raw token stream:

#![allow(unused)]
fn main() {
SessionRecorderConfig {
    include_streaming_events: true,
    ..Default::default()
}
}

Storage implications: a single turn with extended thinking may produce thousands of MessageUpdate events. Each is a full clone of the accumulated message plus the delta.


Session Lifecycle Callbacks

SessionRecorderConfig supports two session-level callbacks for billing, audit, and metrics:

| Field | Type | Description |
|---|---|---|
| before_task | Option<BeforeTaskFn> | Arc<dyn Fn(&Session) -> bool>. Fires on the first AgentStart with a new session_id. Return false to reject. Use for session initialization, billing setup, or audit logging. |
| after_task | Option<AfterTaskFn> | Arc<dyn Fn(&Session)>. Fires in flush() when the session is finalized. Use for billing finalization, metrics emission, or cleanup. |

These are session-level (not loop-level) hooks. Unlike before_loop/after_loop on AgentLoopConfig which fire around each individual agent loop, before_task and after_task fire once per session lifecycle:

#![allow(unused)]
fn main() {
use std::sync::Arc;
use phi_core::session::{SessionRecorder, SessionRecorderConfig};

let config = SessionRecorderConfig {
    before_task: Some(Arc::new(|session: &phi_core::session::Session| -> bool {
        println!("Session started: {}", session.session_id);
        // Initialize billing, start audit trail, etc.
        true // return false to reject the session
    })),
    after_task: Some(Arc::new(|session| {
        println!("Session finalized: {} ({} loops)", session.session_id, session.loops.len());
        // Finalize billing, emit metrics, etc.
    })),
    ..Default::default()
};

let mut recorder = SessionRecorder::new(config);
}

Persistence API

| Function | Description |
|---|---|
| save_session(session, dir) | Write {dir}/{session_id}.json (atomic via tmp + rename) |
| load_session(session_id, dir) | Read {dir}/{session_id}.json |
| list_session_ids(dir) | List all .json filenames, newest first |
| load_sessions_for_agent(agent_id, dir) | Load all sessions matching agent_id |
| delete_session(session_id, dir) | Remove {dir}/{session_id}.json |

File format: pretty-printed JSON (serde_json::to_writer_pretty). Directory layout: flat — {dir}/{session_id}.json, no sub-directories, no index. Writes are atomic: the implementation writes to a temp file then renames over the target, so readers never observe a partially-written file.
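
The temp-file-then-rename pattern is a general technique and can be shown with the standard library alone. This is a generic sketch, not the body of save_session: write_session_file is a hypothetical name, the .json.tmp suffix is an assumption, and the contents are passed in as an already-serialized string.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Hypothetical helper: writes `contents` to `{dir}/{session_id}.json` by
/// first writing a sibling temp file and then renaming it over the target.
fn write_session_file(dir: &Path, session_id: &str, contents: &str) -> std::io::Result<()> {
    fs::create_dir_all(dir)?;
    let target = dir.join(format!("{session_id}.json"));
    let tmp = dir.join(format!("{session_id}.json.tmp"));

    let mut file = fs::File::create(&tmp)?;
    file.write_all(contents.as_bytes())?;
    file.sync_all()?; // push the bytes to disk before the rename exposes them
    drop(file);

    // The temp file sits in the same directory, so the rename stays on one
    // filesystem and replaces any existing target in a single step.
    fs::rename(&tmp, &target)
}
```

A reader that opens the target path sees either the previous complete file or the new complete file, which is the property the paragraph above describes.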

Pluggable store trait (0.7.0+)

For callers that want to swap the persistence backend (e.g. S3, SQLite, an in-memory fake for tests) or that need concurrent-writer safety, phi-core exposes a SessionStore async trait alongside the free functions:

#![allow(unused)]
fn main() {
use phi_core::session::{SessionStore, FileSystemSessionStore};

let store = FileSystemSessionStore::new("./sessions");
store.save(&session).await?;          // acquires fs2 exclusive lock
let loaded = store.load("sess-1").await?;
let ids    = store.list_ids().await?;
store.delete("sess-1").await?;
}

FileSystemSessionStore::save() takes an advisory fs2 exclusive lock on the target file before the atomic rename. Concurrent writers to the same session_id get back SessionError::Locked { session_id } instead of silently producing a corrupt file. Readers take a shared lock, so concurrent readers do not block one another.

The free save_session() / load_session() / etc. functions remain available and unchanged — use the trait when you need pluggability or contention safety.

When to call flush()

Call flush() before saving to finalize any loops that have not received AgentEnd yet (e.g. on process shutdown). Flushed loops get status Aborted.

#![allow(unused)]
fn main() {
recorder.flush();
let sessions = recorder.drain_completed();
for s in &sessions {
    save_session(s, Path::new("./sessions"))?;
}
}

Design Decisions

1. loop_id on every AgentEvent variant

Decision: Add loop_id: String to all 11 AgentEvent variants that lacked it.

Why: agent_loop_parallel interleaves branch events on one tx channel. Without loop_id on every event, TurnStart, ToolExecutionEnd, etc. cannot be reliably attributed to the correct branch LoopRecord. The only alternative — heuristically assigning events to the last-opened loop — produces incorrect records when two branches overlap.

Rejected alternative: Last-opened-loop heuristic. Rejected because parallel branches genuinely interleave; the heuristic would silently misattribute events.


2. LoopStatus::Pending for parallel branches

Decision: Pre-register LoopRecord { status: Pending } entries when ParallelLoopStart arrives, before their AgentStart events fire.

Why: ParallelLoopStart announces all loop_ids in advance. Pre-creating records lets the ParallelGroupRecord be registered immediately, so no retroactive wiring is needed when each branch's AgentStart arrives later.

Rejected alternative: Create LoopRecords only on AgentStart and retroactively set ParallelGroupRecord on ParallelLoopEnd. Rejected because it requires a second pass over all records and makes the group state inconsistent during the parallel execution window.


3. Messages from AgentEnd, not reconstructed from events

Decision: LoopRecord.messages is populated directly from AgentEnd.messages.

Why: AgentEnd.messages is the authoritative, ordered list of all messages produced by a loop. The LLM loop already assembles this — there is no value in re-assembling it from MessageStart/MessageEnd events in the recorder.

Rejected alternative: Reconstruct messages from streaming events. Rejected because it duplicates work, is fragile (missed events, ordering edge cases), and requires special handling for partial messages.


4. Bidirectional parent↔child within a session

Decision: Maintain both parent_loop_id (child→parent) and children_loop_ids (parent→children) on every LoopRecord.

Why: O(1) traversal in both directions without scanning the full loops vec. The recorder appends to parent.children_loop_ids when a loop's AgentEnd arrives and its parent_loop_id is in the same session.

Rejected alternative: Single-direction links + O(N) scan. Rejected because deep continuation trees (10+ loops) would incur O(N²) cost for common tree operations.


5. continuation_kind classifies loop origin

Decision: Reuse the existing ContinuationKind enum (Initial, Default, Rerun, Branch, Compaction) to classify loop relationships, supplemented by the parent_loop_id/session_id cross-session check.

Why: ContinuationKind is already threaded through AgentStart — no new enum is needed. The full classification table (origin / continuation / retry / branch / sub-agent) is derivable from (parent_loop_id, session_id, continuation_kind).

Rejected alternative: A dedicated LoopOrigin enum on LoopRecord. Rejected because it would duplicate information already present in the existing fields and require an additional mapping step in the recorder.


6. Sub-agents get their own session_id

Decision: Sub-agents always get their own session_id. The parent records ChildLoopRef (outbound); the child Session records SpawnRef (inbound).

Why: Clean agent identity boundaries — sub-agent sessions can be loaded and analyzed independently. The bidirectional link pair provides a complete spawn graph without coupling the parent and child session records.

Rejected alternative: Embed sub-agent loops inside the parent session. Rejected because a sub-agent may have many of its own continuations, parallel branches, and even nested sub-agents — treating it as a flat loop inside the parent session would obscure this structure.


7. SpawnRef on Session (not on LoopRecord)

Decision: The inbound cross-session spawn reference lives on Session.parent_spawn_ref, not on an individual LoopRecord.

Why: Sub-agent spawning is a session-level concern. The entire child session was triggered by one parent loop — the reference belongs at the session level, not on individual loop records within it. Placing it on a LoopRecord would require choosing which loop gets the ref (the first? the origin?) arbitrarily.

Rejected alternative: LoopRecord.parent_spawn_ref. Rejected because a sub-agent session may have multiple origin loops (e.g. after new_session()) and the spawn ref would be duplicated or placed inconsistently.


8. include_streaming_events: bool (default false)

Decision: MessageUpdate (streaming delta) events are excluded from LoopRecord.events by default.

Why: Streaming deltas are 100–1 000× more numerous than final messages and are not needed for replay or branching. The final message content in AgentEnd.messages is authoritative. Opt-in ensures that session files stay compact by default.

Rejected alternative: Always store all events. Rejected because a single session with a few extended-thinking turns could easily produce megabytes of delta events.


9. Flat file layout: {dir}/{session_id}.json

Decision: One JSON file per session. No index file, no sub-directories.

Why: Simplest observable format — files can be inspected directly with any JSON tool. list_session_ids is a directory listing. No index to maintain or synchronize.

Rejected alternative: Indexed layout (e.g. sessions/index.json + sessions/{id}.json). Rejected because the index requires atomic updates (write to two files) and can fall out of sync. An indexed layout can be added in a future iteration when query patterns (filtering, pagination) are clearer.

Context Compaction

Compaction manages context window pressure by creating non-destructive overlays on session history. Nothing is deleted or replaced — original messages remain authoritative in LoopRecord.messages.

How it works

When the context approaches the token budget, a CompactionBlock is created on the current LoopRecord. This block controls what gets loaded into context for subsequent LLM calls, replacing the raw messages with a compacted view.

CompactionBlock anatomy

A block has three sections:

┌─────────────────┬────────────────────────────┐
│ keep_first      │ Original turns, verbatim   │  Most recent loop only
│ (turns 0..1)    │ No modification            │
├─────────────────┼────────────────────────────┤
│ keep_compacted  │ Summarised one-liners      │  All loops
│ (turns 2..N-6)  │ ≤ max_summary_tokens       │
├─────────────────┼────────────────────────────┤
│ keep_recent     │ Tool outputs truncated     │  Most recent loop only
│ (turns N-5..N)  │ Rest unchanged             │
└─────────────────┴────────────────────────────┘

  • keep_first — verbatim turns from the start. Only for the most recent loop. Original messages in this range are used as-is.
  • keep_compacted — fully summarised middle section. For the most recent loop this is the gap between keep_first and keep_recent. For older loops this covers the entire loop.
  • keep_recent — recent turns with only tool outputs truncated. Only for the most recent loop.

When compaction fires

Compaction uses a percentage-based threshold:

headroom = compact_at_pct − (system_tokens / max_tokens) − (current_tokens / max_tokens)

Compaction fires when headroom < compact_budget_threshold_pct.

With defaults (100k max, 4k system, 90% ceiling, 5% threshold): fires when current tokens exceed ~81k.
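
To make the arithmetic concrete: with the defaults, headroom = 0.90 − 0.04 − (current / 100k), and compaction fires once that drops below 0.05, which happens when current exceeds roughly 81k tokens. A small sketch of the same check follows; `compaction_fires` is a hypothetical helper written for illustration, not a phi-core function.

```rust
/// Hypothetical helper mirroring the headroom formula in this section.
fn compaction_fires(
    max_tokens: u64,
    system_tokens: u64,
    current_tokens: u64,
    compact_at_pct: f64,
    compact_budget_threshold_pct: f64,
) -> bool {
    let max = max_tokens as f64;
    // headroom = ceiling - system share - current share
    let headroom = compact_at_pct
        - (system_tokens as f64 / max)
        - (current_tokens as f64 / max);
    headroom < compact_budget_threshold_pct
}
```

At 80k current tokens the headroom is 0.06 and nothing happens; at 82k it is 0.04 and compaction fires.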

Configuration

ContextConfig

ContextConfig {
    max_context_tokens: 100_000,   // Model's context window
    system_prompt_tokens: 4_000,   // Reserved for system prompt
    compaction: CompactionConfig { // Always present when limits are set
        // WHEN
        compact_at_pct: 0.90,
        compact_budget_threshold_pct: 0.05,
        compaction_scope: CompactionScope::FixedCount(3),
        // HOW
        keep_first_turns: 2,
        keep_recent_turns: 10,
        max_summary_tokens: 2_000,
        tool_output_max_lines: 50,
    },
}

Compaction is disabled entirely by setting context_config: None on AgentLoopConfig.

CompactionScope

Controls how many earlier loops are included in compaction and context loading:

  • FixedCount(n) — Compact a fixed number of earlier loops. Simple and predictable.
  • TokenBudget — Walk the chain backward, accumulating per-loop token estimates, and stop when max_context_tokens would be exceeded.

TokenBudget and exceeding the window

The TokenBudget scope can include loops whose raw messages exceed max_context_tokens. This is intentional: the compacted summaries of those loops will fit in the window, even though the originals did not. This enables richer context for expensive summarisation strategies (e.g. LLM summarisers) that compress large loops into compact representations that then fit within the budget.

For example, if a loop has 50k tokens of raw messages and the window is 100k, TokenBudget includes it in scope. The strategy's keep_compacted method produces a ~500 token summary of that loop, which fits easily.

Cross-loop compaction

When compaction fires, blocks are created for the current loop and earlier loops within the compaction_scope on the active chain.

The "active chain" is the linear path from root to current loop via parent_loop_id links:

  • Parallel branches — only the selected branch is on the chain. Unselected siblings get their own compaction if/when they become active.
  • Reruns — the rerun's parent points to the pre-rerun loop. Superseded runs are siblings, not ancestors.
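
Walking the active chain amounts to following parent links until the root is reached. The sketch below assumes a simplified `LoopNode` with only an id and a `parent_loop_id`; both the type and the `active_chain` function are illustrative stand-ins, not phi-core types.

```rust
use std::collections::HashMap;

/// Hypothetical minimal loop record: only the id and the parent link matter here.
struct LoopNode {
    id: String,
    parent_loop_id: Option<String>,
}

/// Follow parent_loop_id links from `current` back to the root and
/// return the loop ids root-first. Siblings are never visited.
fn active_chain(loops: &HashMap<String, LoopNode>, current: &str) -> Vec<String> {
    let mut chain = Vec::new();
    let mut cursor = loops.get(current);
    while let Some(node) = cursor {
        // Defensive stop if the parent links ever form a cycle.
        if chain.contains(&node.id) {
            break;
        }
        chain.push(node.id.clone());
        cursor = node
            .parent_loop_id
            .as_deref()
            .and_then(|parent| loops.get(parent));
    }
    chain.reverse();
    chain
}
```

Because only parent links are followed, unselected parallel branches and superseded reruns drop out naturally: they are siblings of a loop on the chain, never ancestors.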

Loading rule

When building context from session history:

  • Most recent loop: keep_first + keep_compacted + keep_recent
  • Earlier loops (within compaction_scope): only keep_compacted
  • Loops older than that: skipped entirely
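
The loading rule can be sketched as a function from chain position to what gets loaded. This assumes a FixedCount-style scope expressed as a plain number of earlier loops; `Load` and `load_plan` are hypothetical names used only for illustration.

```rust
/// What to load for one loop on the active chain (hypothetical type).
#[derive(Debug, PartialEq)]
enum Load {
    /// keep_first + keep_compacted + keep_recent
    Full,
    /// only keep_compacted
    CompactedOnly,
    /// outside the compaction scope
    Skip,
}

/// `chain_len` loops ordered root-first; the last one is the most recent.
/// `scope` is the number of earlier loops inside the compaction scope.
fn load_plan(chain_len: usize, scope: usize) -> Vec<Load> {
    (0..chain_len)
        .map(|i| {
            let age = chain_len - 1 - i; // 0 = most recent loop
            if age == 0 {
                Load::Full
            } else if age <= scope {
                Load::CompactedOnly
            } else {
                Load::Skip
            }
        })
        .collect()
}
```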

Custom strategies

Compaction strategies are fields on CompactionConfig, not on AgentLoopConfig. The dispatch logic in run.rs reads them from ctx_config.compaction:

  • in_memory_strategy — custom in-memory compaction strategy (used when session is None)
  • block_strategy — block-based compaction strategy (used when session is Some; falls back to DefaultBlockCompaction)

Implement BlockCompactionStrategy to customise any section.

As of phi-core 0.9.0, BlockCompactionStrategy is #[async_trait]-marked and all four methods are async fn — implementations can issue LLM calls inside keep_compacted / keep_recent without block_in_place workarounds:

use async_trait::async_trait;
use phi_core::{BlockCompactionStrategy, CompactionConfig, CompactedSection, TurnRange, TurnMap, DefaultBlockCompaction};
use phi_core::session::LoopRecord;

struct MyStrategy;

#[async_trait]
impl BlockCompactionStrategy for MyStrategy {
    async fn keep_first(&self, record: &LoopRecord, turn_map: &TurnMap, config: &CompactionConfig) -> Option<TurnRange> {
        DefaultBlockCompaction.keep_first(record, turn_map, config).await // delegate
    }

    async fn keep_recent(&self, record: &LoopRecord, turn_map: &TurnMap, config: &CompactionConfig) -> Option<CompactedSection> {
        DefaultBlockCompaction.keep_recent(record, turn_map, config).await // delegate
    }

    async fn keep_compacted(&self, record: &LoopRecord, turn_map: &TurnMap, config: &CompactionConfig, is_most_recent: bool) -> Option<CompactedSection> {
        // Custom LLM-based summarisation: issue LLM calls directly, no bridging needed.
        // `my_llm_summarize` is a placeholder for your own async summariser.
        my_llm_summarize(record, turn_map, config, is_most_recent).await
    }
}

Sync impls that don't .await anything migrate by adding #[async_trait::async_trait] + the async keyword on each method signature; the bodies remain unchanged. See the per-turn debug-capture surface in debugging.md for the canonical pattern to inspect what each compacted turn looked like to the model.

Set the custom strategy on CompactionConfig:

use std::sync::Arc;

let compaction_config = CompactionConfig {
    block_strategy: Some(Arc::new(MyStrategy)),
    ..Default::default()
};

Public APIs

Orchestration functions

  • compact_session_loops(session, loop_id, strategy, config, max_tokens) — Creates CompactionBlocks for the current loop and earlier loops within the configured scope. Mutates the session in place; caller persists to disk.
  • build_context_from_session(session, loop_id, config, max_tokens) — Builds a compacted context by walking the loop chain, loading from blocks where available and raw messages otherwise.

BasicAgent methods

  • compact_context_with_sender(&mut self, tx) — Standalone compaction with full event lifecycle: AgentStart(Compaction) → CompactionStarted → compact → CompactionEnded → AgentEnd. No-op if session or config is missing.
  • compact_context(&mut self) -> usize — Fire-and-forget compaction. Returns the number of loops that received new CompactionBlocks. Returns 0 if session or config is missing.

Events

Two events bracket compaction:

  • CompactionStarted { loop_id, estimated_tokens, message_count, timestamp }
  • CompactionEnded { loop_id, messages_before, messages_after, estimated_tokens_before, estimated_tokens_after, loops_compacted, timestamp }

For standalone compaction (compact_context_with_sender), these appear inside a dedicated LoopRecord with continuation_kind: Compaction.

TurnId tracking

Every message pushed during the agent loop carries a TurnId { loop_id, turn_index } identifying which turn produced it. This enables TurnMap::from_messages() to group messages by turn without replaying the event stream.

TurnId is stored on LlmMessage.turn_id and serialized as an optional turnId field alongside the existing message JSON. Old data without turnId deserializes with turn_id: None.
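
As a rough illustration of how turn-tagged messages can be grouped without replaying events, the sketch below buckets message indices by `turn_index`. It is an assumption about the general idea, not the actual `TurnMap::from_messages()` code; `Msg` and `group_by_turn` are hypothetical names, and skipping untagged legacy messages is a choice made for this sketch.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
struct TurnId {
    loop_id: String,
    turn_index: u32,
}

/// Hypothetical stand-in for a message; only the turn tag matters here.
struct Msg {
    turn_id: Option<TurnId>,
}

/// Group message indices by turn for one loop.
/// Messages without a turn_id (old data) are skipped in this sketch.
fn group_by_turn(messages: &[Msg], loop_id: &str) -> BTreeMap<u32, Vec<usize>> {
    let mut map: BTreeMap<u32, Vec<usize>> = BTreeMap::new();
    for (index, message) in messages.iter().enumerate() {
        if let Some(turn) = &message.turn_id {
            if turn.loop_id == loop_id {
                map.entry(turn.turn_index).or_default().push(index);
            }
        }
    }
    map
}
```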

Data model

Struct definitions

pub struct CompactionBlock {
    pub keep_first: Option<TurnRange>,            // verbatim turns from start (most recent loop only)
    pub keep_recent: Option<CompactedSection>,    // truncated tool outputs (most recent loop only)
    pub keep_compacted: Option<CompactedSection>, // summarised section (all loops)
    pub created_at: DateTime<Utc>,
}

pub struct TurnRange {
    pub start_turn: u32,  // inclusive, matches TurnId.turn_index
    pub end_turn: u32,    // inclusive
}

pub struct CompactedSection {
    pub range: TurnRange,
    pub messages: Vec<AgentMessage>,  // replacement messages for this range
}

pub struct TurnId {
    pub loop_id: String,
    pub turn_index: u32,
}

Serialization format

CompactionBlock on LoopRecord:

{
  "loop_id": "session123.model.1",
  "messages": [ ... ],
  "compaction_block": {
    "keep_first": { "startTurn": 0, "endTurn": 1 },
    "keep_compacted": {
      "range": { "startTurn": 2, "endTurn": 7 },
      "messages": [
        { "role": "user", "content": [{"type": "text", "text": "[Summary] User asked about X"}], "timestamp": 123 }
      ]
    },
    "keep_recent": {
      "range": { "startTurn": 8, "endTurn": 12 },
      "messages": [ ... ]
    },
    "createdAt": "2026-03-28T10:00:00Z"
  }
}

TurnId on LlmMessage:

{
  "role": "assistant",
  "content": [...],
  "stopReason": "stop",
  "model": "claude-sonnet-4-6",
  "provider": "anthropic",
  "usage": { ... },
  "timestamp": 123,
  "turnId": { "loopId": "session123.model.1", "turnIndex": 3 }
}

Old data without turnId deserializes as turn_id: None.

Invariants

  1. If keep_first is Some, keep_compacted must also be Some (there must be a middle to summarise).
  2. If keep_recent is Some, keep_compacted must also be Some.
  3. For older loops (not most recent), keep_first and keep_recent are always None.
  4. CompactedSection.range bounds must be within the loop's turn count.
  5. If a loop has a compaction_block, all older loops on the same chain must also have one.
  6. If a ToolCall content block is within a section's turn range, its corresponding ToolResult message must also be within the same section. Turn-based grouping (via TurnId) enforces this.
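
Invariants 1 to 4 are local to a single block and can be checked mechanically. The sketch below uses a simplified `Block` that stores only inclusive turn ranges; it is an illustrative validator under that assumption, and `Block`, `Range`, and `check_block` are hypothetical names rather than phi-core APIs.

```rust
/// Inclusive (start_turn, end_turn) pair; a simplification of TurnRange.
type Range = (u32, u32);

/// Simplified block that keeps only the ranges of each section (hypothetical type).
struct Block {
    keep_first: Option<Range>,
    keep_compacted: Option<Range>,
    keep_recent: Option<Range>,
}

/// Check invariants 1-4 for one block. `turn_count` is the loop's number of turns.
fn check_block(block: &Block, is_most_recent: bool, turn_count: u32) -> Result<(), String> {
    if block.keep_first.is_some() && block.keep_compacted.is_none() {
        return Err("invariant 1: keep_first requires keep_compacted".to_string());
    }
    if block.keep_recent.is_some() && block.keep_compacted.is_none() {
        return Err("invariant 2: keep_recent requires keep_compacted".to_string());
    }
    if !is_most_recent && (block.keep_first.is_some() || block.keep_recent.is_some()) {
        return Err("invariant 3: older loops carry only keep_compacted".to_string());
    }
    for &(start, end) in [block.keep_first, block.keep_compacted, block.keep_recent].iter().flatten() {
        if start > end || end >= turn_count {
            return Err(format!("invariant 4: range {}..={} is out of bounds", start, end));
        }
    }
    Ok(())
}
```

Invariants 5 and 6 span multiple loops or message contents and are not covered by this sketch.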

Summary budget semantics

max_summary_tokens is a token budget for the summarised output, not a per-turn limit. Strategies should aim to summarise ALL turns within this budget (e.g. shorter summaries or LLM-generated digests), not merely process turns until the budget runs out. DefaultBlockCompaction is a basic implementation that drops remaining turns when exhausted.

Backward compatibility

  • LoopRecord.compaction_block uses #[serde(default, skip_serializing_if = "Option::is_none")] — old records without the field deserialize as None.
  • LlmMessage.turn_id uses #[serde(default, skip_serializing_if = "Option::is_none")] — old messages without turnId deserialize as None.
  • The CompactionConfig field on ContextConfig uses #[serde(default)] — old configs get CompactionConfig::default().

Focused Compaction

Focused compaction extends the context compaction system with two features: focus messages that steer what the compaction summary emphasizes, and compaction instances that let you define named compaction configurations reusable across agent profiles.

Focus Message

The focus_message field on CompactionConfig is an optional string prepended to the compacted section before the LLM summarizes it. It tells the summarizer what to prioritize when condensing conversation history.

Without a focus message, compaction produces a generic summary. With one, the summary retains details relevant to your domain:

use phi_core::context::{ContextConfig, CompactionConfig};

let config = ContextConfig {
    max_context_tokens: 200_000,
    compaction: CompactionConfig {
        focus_message: Some(
            "Focus on specification details, API contracts, and architectural decisions.".to_string()
        ),
        ..Default::default()
    },
    ..Default::default()
};

The focus message does not change the compaction trigger logic (thresholds, turn counts). It only affects the content of the summarized middle section.

When to use a focus message

  • Domain-specific agents: An agent reviewing legal contracts should retain clause references, not general pleasantries.
  • Long coding sessions: Focus on file paths, function signatures, and design rationale so the agent can continue working after compaction.
  • Research agents: Preserve citations, data points, and methodology notes.

Compaction Instances

Compaction instances are named variations of the compaction defaults, declared with [[context.compaction.instances]] in the config file. Each instance uses the {{...}} ID reference protocol to declare its name and overrides specific fields from the parent [context.compaction] section. Fields not set on the instance fall through to the parent defaults.

Config example

# ── Context config (max_context_tokens lives on ContextConfig, not CompactionConfig) ──
[context]
max_context_tokens = 200000

# ── Compaction defaults ─────────────────────────────────────────
[context.compaction]
compact_at_pct = 0.85
compact_budget_threshold_pct = 0.05
keep_first_turns = 2
keep_recent_turns = 4
max_summary_tokens = 2000
tool_output_max_lines = 50
focus_message = "Retain key decisions and code changes."

# ── Named compaction instances ──────────────────────────────────
[[context.compaction.instances]]
id = "{{%coding%}}"
description = "Compaction tuned for coding tasks"
focus_message = "Focus on file paths, function signatures, and design rationale."
keep_recent_turns = 6
max_summary_tokens = 3000

[[context.compaction.instances]]
id = "{{%research%}}"
description = "Compaction tuned for research tasks"
focus_message = "Preserve citations, data sources, and methodology."
keep_first_turns = 3
max_summary_tokens = 4000

Referencing from an agent profile

Agent profiles reference a compaction instance via the compaction field, using the {{...}} ID protocol:

[agent.profile]
name = "coding-agent"
system_prompt = "You are an expert software engineer."
compaction = "{{compaction.coding}}"

[[agent.profile.instances]]
id = "{{%researcher%}}"
description = "A research-focused profile variant"
compaction = "{{compaction.research}}"

When the agent is constructed from config, the referenced compaction instance is resolved and its fields are merged with the compaction defaults to produce the final CompactionConfig.
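
The fall-through behaviour can be sketched as an Option-based merge: every field an instance leaves unset takes the parent value. The types below are a reduced, hypothetical model of the config (only four fields), and `resolve` is an illustrative name, not the function phi-core uses.

```rust
/// Reduced model of the resolved compaction settings (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Compaction {
    keep_first_turns: u32,
    keep_recent_turns: u32,
    max_summary_tokens: u32,
    focus_message: Option<String>,
}

/// Fields an instance may override; None means "use the parent default".
#[derive(Default)]
struct InstanceOverrides {
    keep_first_turns: Option<u32>,
    keep_recent_turns: Option<u32>,
    max_summary_tokens: Option<u32>,
    focus_message: Option<String>,
}

/// Merge an instance over the defaults, field by field.
fn resolve(defaults: &Compaction, instance: &InstanceOverrides) -> Compaction {
    Compaction {
        keep_first_turns: instance.keep_first_turns.unwrap_or(defaults.keep_first_turns),
        keep_recent_turns: instance.keep_recent_turns.unwrap_or(defaults.keep_recent_turns),
        max_summary_tokens: instance.max_summary_tokens.unwrap_or(defaults.max_summary_tokens),
        focus_message: instance
            .focus_message
            .clone()
            .or_else(|| defaults.focus_message.clone()),
    }
}
```

Applied to the config example, the coding instance would end up with keep_first_turns = 2 (inherited), keep_recent_turns = 6, and max_summary_tokens = 3000.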


Programmatic Usage

When building agents in Rust without a config file, focused compaction is set directly on CompactionConfig:

use phi_core::context::CompactionConfig;
use phi_core::agent_loop::AgentLoopConfig;
use phi_core::provider::ModelConfig;

// Read the API key from the environment before building the config.
let api_key = std::env::var("ANTHROPIC_API_KEY").expect("ANTHROPIC_API_KEY must be set");

let context = phi_core::context::ContextConfig {
    max_context_tokens: 200_000,
    compaction: CompactionConfig {
        compact_at_pct: 0.85,
        compact_budget_threshold_pct: 0.05,
        keep_first_turns: 2,
        keep_recent_turns: 6,
        max_summary_tokens: 3_000,
        tool_output_max_lines: 50,
        focus_message: Some(
            "Focus on file paths, function signatures, and design rationale.".to_string()
        ),
        ..Default::default()
    },
    ..Default::default()
};

let config = AgentLoopConfig {
    model_config: ModelConfig::anthropic("claude-sonnet-4-20250514", "Sonnet", &api_key),
    context_config: Some(context),
    ..Default::default()
};

Or via BasicAgent builder methods:

use phi_core::{BasicAgent, context::CompactionConfig};
use phi_core::provider::ModelConfig;

// Read the API key from the environment before building the agent.
let api_key = std::env::var("ANTHROPIC_API_KEY").expect("ANTHROPIC_API_KEY must be set");

let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Sonnet", &api_key))
    .with_context_config(phi_core::context::ContextConfig {
        max_context_tokens: 200_000,
        compaction: CompactionConfig {
            focus_message: Some("Retain specification details and API contracts.".to_string()),
            ..Default::default()
        },
        ..Default::default()
    });

Summary

| Feature | Purpose |
| --- | --- |
| focus_message | Steers compaction summarization toward domain-relevant content |
| [[context.compaction.instances]] | Named compaction configurations with {{...}} ID protocol |
| Profile compaction field | Links an agent profile to a specific compaction instance |

Context Translation

Context translation solves a fundamental problem in multi-provider agent systems: when an agent switches providers mid-session, content types from the original provider may be silently dropped or cause errors on the new provider. The ContextTranslationStrategy trait provides a read-only translation layer that produces temporary copies of messages, never modifying the canonical history.

Why it is needed

Different LLM providers support different content types. For example:

  • Anthropic emits Content::Thinking blocks (chain-of-thought reasoning)
  • OpenAI has no native thinking block format
  • Google/Bedrock do not support thinking blocks at all

Without translation, switching from Anthropic to OpenAI mid-session would cause thinking blocks to be silently dropped or rejected. The agent loses reasoning context it previously produced.

Design principles

The canonical Message format IS the master layout

phi-core's Message enum (User, Assistant, ToolResult) and Content enum (Text, Image, Thinking, ToolCall) define the canonical format. All providers parse into this format and all session history is stored in it. Translation happens only at the boundary, right before messages are sent to a provider.

Read-only translation

Translation produces temporary copies of the message slice. The original messages in LoopRecord.messages are never modified. This means:

  • Session persistence always stores the full-fidelity canonical format
  • Multiple providers can read the same history with different translations
  • No information is permanently lost

Lossless round-trip guarantee

Consider this scenario:

Turn 1-3: Anthropic (produces Content::Thinking blocks)
Turn 4:   Switch to OpenAI
Turn 5-6: Switch back to Anthropic

Here is what happens:

  1. Turns 1-3 are stored with full Content::Thinking blocks in canonical format.
  2. Turn 4: Before calling OpenAI, the translator converts Content::Thinking to Content::Text prefixed with [Reasoning]. OpenAI sees text, not thinking blocks. The canonical history is untouched.
  3. Turns 5-6: Back on Anthropic. The translator passes Content::Thinking through unchanged. Anthropic sees the original thinking blocks from turns 1-3 exactly as they were produced.

The original thinking blocks from turns 1-3 are never lost. They remain in the canonical history and are available whenever the session returns to a provider that supports them.


Content type translation rules

The DefaultContextTranslation implementation applies these rules per target provider:

Content::Thinking

| Target Provider | Translation |
| --- | --- |
| Anthropic | Kept as-is |
| OpenAI Completions | Converted to Content::Text with [Reasoning] prefix |
| OpenAI Responses | Converted to Content::Text with [Reasoning] prefix |
| Azure OpenAI | Converted to Content::Text with [Reasoning] prefix |
| Google Gemini | Dropped (unsupported) |
| Google Vertex | Dropped (unsupported) |
| Amazon Bedrock | Dropped (unsupported) |

All other content types

Content::Text, Content::Image, and Content::ToolCall pass through unchanged for all providers.

Message-level behavior

Only Message::Assistant messages are translated (since they are the only ones that carry provider-specific content types). Message::User and Message::ToolResult pass through unchanged.
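
The rules above can be sketched with a few local types. This is a reduced model for illustration only: `Content`, `Message`, and `Target` are simplified stand-ins for phi-core's enums (with three targets standing for the "keep", "convert", and "drop" groups), and the exact spacing after the [Reasoning] prefix is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Content {
    Text(String),
    Thinking(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(Vec<Content>),
    Assistant(Vec<Content>),
}

/// Simplified stand-in for ApiProtocol: one variant per rule group.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Target {
    Anthropic, // thinking kept as-is
    OpenAi,    // thinking converted to prefixed text
    Gemini,    // thinking dropped
}

fn translate_blocks(blocks: &[Content], target: Target) -> Vec<Content> {
    blocks
        .iter()
        .filter_map(|block| match (block, target) {
            (Content::Thinking(t), Target::OpenAi) => {
                Some(Content::Text(format!("[Reasoning] {}", t)))
            }
            (Content::Thinking(_), Target::Gemini) => None,
            _ => Some(block.clone()),
        })
        .collect()
}

/// Returns translated copies; the input slice is never modified.
fn translate(messages: &[Message], target: Target) -> Vec<Message> {
    messages
        .iter()
        .map(|message| match message {
            Message::Assistant(blocks) => Message::Assistant(translate_blocks(blocks, target)),
            other => other.clone(),
        })
        .collect()
}
```

Because `translate` takes a shared slice and returns a new vector, the same history can be translated for several targets in turn and still round-trip back to the original provider unchanged.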


The ContextTranslationStrategy trait

pub trait ContextTranslationStrategy: Send + Sync {
    /// Translate a slice of messages for the given target provider protocol.
    fn translate_for_provider(&self, messages: &[Message], target: ApiProtocol) -> Vec<Message>;
}

The trait receives the full message slice and the target ApiProtocol enum variant. It returns a new Vec<Message> with translations applied.

DefaultContextTranslation

The built-in implementation applies the content type rules described above. It is the default when no custom strategy is provided.

Custom strategies

Implement the trait to define custom translation logic:

use phi_core::provider::context_translation::{ContextTranslationStrategy, DefaultContextTranslation};
use phi_core::provider::model::ApiProtocol;
use phi_core::types::content::Message;

struct MyTranslation;

impl ContextTranslationStrategy for MyTranslation {
    fn translate_for_provider(&self, messages: &[Message], target: ApiProtocol) -> Vec<Message> {
        // Custom logic here, e.g. strip all images for text-only providers.
        // Fall back to the default for everything else.
        DefaultContextTranslation.translate_for_provider(messages, target)
    }
}

Usage

On AgentLoopConfig

Set the context_translation field to inject a strategy into the agent loop:

use std::sync::Arc;
use phi_core::agent_loop::AgentLoopConfig;
use phi_core::provider::context_translation::DefaultContextTranslation;
use phi_core::provider::ModelConfig;

// The environment variable name is an assumption; load the key however your application prefers.
let api_key = std::env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY must be set");

let config = AgentLoopConfig {
    model_config: ModelConfig::openai("gpt-4o", "GPT-4o", &api_key),
    context_translation: Some(Arc::new(DefaultContextTranslation)),
    ..Default::default()
};

When context_translation is Some, the loop calls translate_for_provider() on the message slice before each LLM call. When None, messages are passed to the provider as-is.

When to enable translation

Enable context translation when:

  • Your agent may switch providers mid-session (e.g., using different models for different tasks)
  • You are loading session history that was produced by a different provider
  • You are running parallel sub-agents on different providers that share context

If your agent always uses a single provider, translation is unnecessary.

Context Pruning

Context pruning is a model-directed mechanism for surgically removing irrelevant content from the working context during a run. Unlike compaction (which is threshold-triggered and bulk), pruning gives the model fine-grained control over what stays in the context window.

Philosophy

Context pruning saves context length (tokens in the context window), not monetary cost. The token cost of a pruned message has already been paid -- pruning cannot reclaim it. What pruning reclaims is space: room in the context window for the model to continue working without hitting the context limit.

Think of it as a researcher working through a stack of papers. The researcher freely explores tangential references, reads through lengthy tool outputs, and investigates dead ends. When a line of inquiry turns out to be irrelevant, the researcher sets those papers aside rather than keeping them on the desk. The desk has limited space; the filing cabinet does not. Pruning moves content from desk to cabinet.

This freedom to explore without anxiety about context length is the core value proposition. The model can request verbose tool outputs, try multiple approaches, and investigate broadly -- knowing it can prune the dead ends and keep only what matters for the current task.

Static vs In-Run Context

Every message in the context belongs to one of two streams:

  • user_context -- All User messages: the initial prompt, follow-ups, steering messages, and any user-injected content. These represent user intent.
  • inrun_context -- All model-generated content: Assistant messages, ToolCall content, and ToolResult messages. These are the model's working memory.

The system_prompt is separate from both streams. It is a dedicated field on AgentContext, always occupies the first position, and is never subject to pruning or compaction.

Pruning Rules

  • user_context is NEVER pruned. User intent is sacred. The model cannot discard the user's words, steering messages, or follow-up instructions.
  • inrun_context CAN be surgically pruned by the model using the PrunTool. The model decides what is no longer relevant and removes it.
  • system_prompt is never pruned. It is not part of either stream.

PrunTool Variants

phi-core provides two pruning operations, both invoked by the model as tool calls:

prun(tokens)

Silent removal. The model specifies a token budget to reclaim, and the oldest inrun_context entries (by timestamp) are removed from the working context until the budget is met.

  • Removed content is preserved in the session log -- nothing is lost permanently
  • Removed entries become invisible to the LLM on subsequent turns
  • The model sees a ToolResult confirming how many tokens were reclaimed
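
The oldest-first removal can be sketched as follows. This is a minimal model under stated assumptions: `InRun` is a hypothetical entry type carrying a per-entry token estimate, entries are marked as pruned rather than deleted (mirroring the fact that the session log keeps them), and removal stops as soon as the requested budget is met.

```rust
/// Hypothetical in-run entry with a token estimate.
struct InRun {
    timestamp: u64,
    tokens: u64,
    pruned: bool,
}

/// Mark the oldest live entries as pruned until at least `budget` tokens
/// are reclaimed. Returns (tokens_removed, pruned_timestamps).
fn prun(entries: &mut [InRun], budget: u64) -> (u64, Vec<u64>) {
    // Indices of live entries, oldest first.
    let mut order: Vec<usize> = (0..entries.len())
        .filter(|&i| !entries[i].pruned)
        .collect();
    order.sort_by_key(|&i| entries[i].timestamp);

    let mut reclaimed = 0;
    let mut pruned_timestamps = Vec::new();
    for i in order {
        if reclaimed >= budget {
            break;
        }
        entries[i].pruned = true;
        reclaimed += entries[i].tokens;
        pruned_timestamps.push(entries[i].timestamp);
    }
    (reclaimed, pruned_timestamps)
}
```

The returned pair corresponds to the information the model would see confirmed in the ToolResult: how many tokens were reclaimed and which entries were affected.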

prun_with_memo(tokens, memo)

Removal with summary. Same as prun, but the model provides a concise memo string that replaces the pruned content in the working context.

  • Each pruned message with a memo creates a separate PrunedMemo entry at its original timestamp, preserving chronological order
  • Useful when the pruned content contained decisions or conclusions the model wants to remember
  • The memo should be concise -- a few sentences, not a reproduction of the pruned content

Model Autonomy

The model decides which variant to use and when. Typical patterns:

  • Silent prune after exploring a dead end (e.g., reading a file that turned out to be irrelevant)
  • Memo prune after a productive investigation (e.g., "Investigated auth module: uses JWT with RS256, tokens expire after 1h, refresh handled in middleware")
  • No prune when all context remains relevant to the current task

Working Context Rebuild

Each turn, the working context sent to the LLM is rebuilt from scratch by build_working_context(), merging the two streams:

  1. Collect all user_context entries with their timestamps
  2. Collect all live inrun_context entries with their timestamps
  3. For each PrunedMemo entry, create a separate User message with the memo text at the entry's original timestamp
  4. Sort all collected entries by timestamp to preserve chronological order
  5. Prepend the system_prompt

The result is a coherent conversation history where:

  • User messages are always present
  • Pruned-silent entries are invisible (the conversation flows as if they never existed)
  • Each pruned-with-memo entry appears as a separate brief summary message at its original timestamp position, preserving the chronological position of the message it replaced
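
The rebuild steps can be sketched as a merge of two timestamped streams. The types here (`Entry`, `InRunEntry`, `InRunState`) and the string-labelled output are simplifications chosen for illustration; the real build_working_context() operates on phi-core message types.

```rust
#[derive(Debug, Clone, PartialEq)]
enum InRunState {
    Live,
    PrunedSilent,
    PrunedMemo(String),
}

/// Hypothetical user_context entry.
struct Entry {
    timestamp: u64,
    text: String,
}

/// Hypothetical inrun_context entry.
struct InRunEntry {
    timestamp: u64,
    text: String,
    state: InRunState,
}

fn build_working_context(system_prompt: &str, user: &[Entry], inrun: &[InRunEntry]) -> Vec<String> {
    let mut merged: Vec<(u64, String)> = Vec::new();
    // 1. All user entries are always included.
    for entry in user {
        merged.push((entry.timestamp, format!("user: {}", entry.text)));
    }
    for entry in inrun {
        match &entry.state {
            // 2. Live in-run entries are included as-is.
            InRunState::Live => merged.push((entry.timestamp, format!("inrun: {}", entry.text))),
            // 3. A memo replaces the pruned entry at its original timestamp.
            InRunState::PrunedMemo(memo) => {
                merged.push((entry.timestamp, format!("user (memo): {}", memo)))
            }
            // Silently pruned entries are invisible.
            InRunState::PrunedSilent => {}
        }
    }
    // 4. Chronological order (stable sort keeps insertion order on ties).
    merged.sort_by_key(|(timestamp, _)| *timestamp);
    // 5. The system prompt always comes first.
    let mut context = vec![format!("system: {}", system_prompt)];
    context.extend(merged.into_iter().map(|(_, line)| line));
    context
}
```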

Session Log Integrity

The session log (context.messages) records everything that happened during the run. Pruning never modifies the session log -- it only affects what the LLM sees in the working context.

PrunRecord

Each prune operation emits a PrunApplied event (recorded in LoopRecord.events by SessionRecorder) containing:

  • pruned_timestamps -- Vec<u64> of timestamps identifying the pruned messages
  • tokens_removed -- Total tokens reclaimed
  • messages_removed -- Number of messages pruned
  • memo -- Optional summary string (present only for prun_with_memo)

On session reload, the two context streams are reconstructed by walking LoopRecord.events to find PrunApplied events. The pruned_timestamps field identifies which messages were pruned. These messages are placed in the pruned state (PrunedSilent or PrunedMemo depending on whether memo is Some), and their memo (if any) is restored as a separate message at the correct chronological position.

Compaction Interaction

Pruning and compaction are complementary mechanisms that operate at different levels:

| Aspect | Pruning | Compaction |
| --- | --- | --- |
| Trigger | Model-directed (tool call) | Threshold-triggered (automatic) |
| Granularity | Surgical (specific messages) | Bulk (entire middle section) |
| Scope | inrun_context only | All messages in the compaction window |
| Preserved in | Session log + PrunRecord | CompactionBlock overlay |

After Compaction

When compaction fires, it summarizes a range of messages into a CompactionBlock. After compaction:

  • All surviving messages (the summary, kept-first, and kept-recent) become part of user_context -- they are treated as established context and are unprunable
  • New model-generated content after compaction starts a fresh inrun_context stream
  • The model can prune this new inrun_context as usual

This means compaction resets the pruning boundary. Content that was once prunable inrun_context, if it survives compaction, becomes permanent user_context.

Configuration

TOML

[tools]
enabled = ["bash", "read_file", "write_file", "edit_file", "search", "prun"]

Adding "prun" to the enabled tools list makes both prun and prun_with_memo available to the model. They are two operations exposed through a single tool registration.

Rust (Programmatic)

use phi_core::agents::BasicAgent;
use phi_core::provider::ModelConfig;

// Read the API key from the environment before building the agent.
let api_key = std::env::var("ANTHROPIC_API_KEY").expect("ANTHROPIC_API_KEY must be set");

let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Sonnet", &api_key))
    .with_default_tools()
    .with_prun_tool();  // enables context pruning

The with_prun_tool() builder method registers the PrunTool, making both pruning variants available. It can be combined with any other tool configuration.

Context pruning works best when compaction is also configured, providing both surgical (model-directed) and bulk (automatic) context management:

[tools]
enabled = ["bash", "read_file", "write_file", "edit_file", "search", "prun"]

[context]
max_context_tokens = 200000

[context.compaction]
compact_at_pct = 0.85
keep_first_turns = 2
keep_recent_turns = 4

With this setup, the model can prune irrelevant exploration results as it works, and compaction provides a safety net if the context still grows too large.

Configuration Guide

Define your entire agent in a config file — model, tools, compaction, limits — and construct it with two lines of Rust:

use phi_core::{parse_config_file, agent_from_config, Agent};
use std::path::Path;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = parse_config_file(Path::new("agent.toml"))?;
    let agent = agent_from_config(&config)?;

    // agent is Arc<dyn Agent> — ready to prompt
    println!("Agent model: {:?}", agent.model_config().unwrap().id);
    Ok(())
}

Overview

The configuration system replaces scattered Rust builder calls with a declarative config file. Instead of this:

let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Sonnet", &key))
    .with_system_prompt("You are a coding assistant.")
    .with_thinking(ThinkingLevel::High)
    .with_temperature(0.2)
    .with_execution_limits(ExecutionLimits { max_turns: 50, .. })
    .with_context_config(ContextConfig { .. });

You write a TOML file:

[agent]
system_prompt = "You are a coding assistant."

[agent.profile]
thinking_level = "high"
temperature = 0.2

[provider]
model = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[execution]
max_turns = 50

Three formats are supported: TOML (primary, Rust-idiomatic), JSON (programmatic generation), and YAML (human-friendly alternative).

Pipeline: Config file → parse_config_file() → AgentConfig struct → agent_from_config() → Arc<dyn Agent>


Quick Start

1. Create agent.toml:

[provider]
model = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

2. Load and use it:

use phi_core::{parse_config_file, agent_from_config, Agent};
use std::path::Path;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = parse_config_file(Path::new("agent.toml"))?;
    let agent = agent_from_config(&config)?;

    // The agent is an Arc<dyn Agent> wrapping a BasicAgent internally.
    // Access configuration through trait methods:
    let model = agent.model_config().unwrap();
    println!("Using model: {} via {}", model.id, model.provider);

    Ok(())
}

Only the [provider] section is required. Everything else has sensible defaults.


Config Formats

TOML

The primary format. Clean, readable, Rust-idiomatic.

use phi_core::config::{parse_config, ConfigFormat};

let toml_str = r#"
[provider]
model = "claude-sonnet-4-20250514"
api_key = "sk-..."
"#;
let config = parse_config(toml_str, ConfigFormat::Toml)?;

JSON

Useful when generating config programmatically.

use phi_core::config::{parse_config, ConfigFormat};

let json_str = r#"{ "provider": { "model": "gpt-4o", "api_key": "sk-...", "api": "openai" } }"#;
let config = parse_config(json_str, ConfigFormat::Json)?;

YAML

Human-friendly alternative.

use phi_core::config::{parse_config, ConfigFormat};

let yaml_str = "provider:\n  model: claude-sonnet-4-20250514\n  api_key: sk-...";
let config = parse_config(yaml_str, ConfigFormat::Yaml)?;

Auto-Detection

parse_config_file detects format from the file extension:

| Extension | Format |
| --- | --- |
| .toml | TOML |
| .json | JSON |
| .yaml, .yml | YAML |

parse_config_auto tries all formats in order (TOML → JSON → YAML) and returns the first successful parse.
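
Extension-based detection boils down to a small match on the file extension. The sketch below uses a local `Format` enum and a hypothetical `format_from_path` function; it illustrates the mapping in the table and makes no claim about how parse_config_file handles unknown extensions.

```rust
use std::path::Path;

/// Local stand-in for ConfigFormat (hypothetical type).
#[derive(Debug, PartialEq)]
enum Format {
    Toml,
    Json,
    Yaml,
}

/// Map a file extension to a format; None for unknown or missing extensions.
fn format_from_path(path: &Path) -> Option<Format> {
    match path.extension()?.to_str()? {
        "toml" => Some(Format::Toml),
        "json" => Some(Format::Json),
        "yaml" | "yml" => Some(Format::Yaml),
        _ => None,
    }
}
```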


Environment Variable Substitution

Any string field in the config can reference environment variables with ${VAR}:

[provider]
api_key = "${ANTHROPIC_API_KEY}"
base_url = "${CUSTOM_API_URL}"

[agent]
system_prompt = "Running in ${ENVIRONMENT} mode."

How it works:

  • Substitution happens before parsing (pre-parse text replacement)
  • Works in all three formats (TOML, JSON, YAML)
  • Missing variables produce ConfigError::MissingEnvVar
  • Malformed patterns like ${UNCLOSED are passed through literally
  • Empty ${} is passed through literally
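
The rules in this list can be sketched as a single pass over the raw config text. This is an illustrative implementation, not phi-core's: `substitute` and `SubstError` are hypothetical names, and the variable lookup is passed in as a closure so the behaviour can be exercised without touching the real environment.

```rust
#[derive(Debug, PartialEq)]
enum SubstError {
    MissingEnvVar(String),
}

/// Replace each `${VAR}` in `input` using `lookup`.
/// Unclosed patterns and the empty `${}` are copied through literally.
fn substitute(input: &str, lookup: &dyn Fn(&str) -> Option<String>) -> Result<String, SubstError> {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            // Empty `${}`: keep it as written.
            Some(0) => {
                out.push_str("${}");
                rest = &after[1..];
            }
            Some(end) => {
                let name = &after[..end];
                let value = lookup(name)
                    .ok_or_else(|| SubstError::MissingEnvVar(name.to_string()))?;
                out.push_str(&value);
                rest = &after[end + 1..];
            }
            // No closing brace: copy the remainder literally and stop.
            None => {
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    Ok(out)
}
```

In real use the lookup would typically be `&|name| std::env::var(name).ok()`. Because substitution runs on the raw text before parsing, the same function serves TOML, JSON, and YAML alike.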

Agent Profile

An AgentProfile is a reusable blueprint that defines default configuration. Multiple agent instances can share the same profile while overriding specific fields.

[agent.profile]
name = "coding-agent"
description = "An agent specialized for code generation and review"
system_prompt = "You are an expert software engineer."
thinking_level = "high"
temperature = 0.2
max_tokens = 16384
config_id = "coder"
skills = ["code-review", "debugging"]

System Prompt Resolution

The system prompt is resolved through a priority chain. The first non-empty value wins:

  1. [agent].system_prompt — explicit agent-level override (highest priority)
  2. Profile instance system_prompt — when an agent instance references a profile via {{agent_profile.name}}
  3. [agent.profile].system_prompt — base inline profile fallback
  4. Empty string (no system prompt)
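
As a rough illustration of the priority chain above, the following standalone sketch picks the first non-empty candidate. The function name and its three parameters are hypothetical; the real builder also resolves file: and {{...}} values, which this sketch leaves out.

```rust
/// Pick the first non-empty prompt in priority order:
/// agent-level, then profile instance, then base profile.
/// Falls back to an empty string when nothing is set.
pub fn resolve_system_prompt(
    agent_level: Option<&str>,
    profile_instance: Option<&str>,
    base_profile: Option<&str>,
) -> String {
    let candidates = [agent_level, profile_instance, base_profile];
    candidates
        .iter()
        .flatten() // skip unset levels
        .find(|s| !s.is_empty()) // skip levels set to ""
        .map(|s| s.to_string())
        .unwrap_or_default()
}
```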

Inline Text

The simplest form — write the prompt directly:

[agent.profile]
system_prompt = "You are an expert software engineer."

Agent-level overrides the profile:

[agent.profile]
system_prompt = "You are a general assistant."   # default from blueprint

[agent]
system_prompt = "You are a Python specialist."   # overrides the profile

File Reference (file: prefix)

Load the prompt from a file. Relative paths resolve from the agent's workspace directory:

[agent]
workspace = "workspace"

[agent.profile]
system_prompt = "file:system_prompt.md"        # resolves to workspace/system_prompt.md

Absolute paths are used as-is:

[agent.profile]
system_prompt = "file:/etc/phi/prompts/coder.md"

The file: prefix works at all levels: [agent].system_prompt, [agent.profile].system_prompt, and [[agent.profile.instances]].system_prompt.

Strategy Reference ({{...}} protocol)

For advanced multi-block prompt composition, reference a system prompt instance. This uses a 3-entity chain: strategy (block template) → prompt instance (block content) → agent reference.

# 1. Define the strategy template — block structure with ordering and size limits
[[system_prompt_strategy.instances]]
id = "{{coding_strategy}}"

[[system_prompt_strategy.instances.blocks]]
name = "identity"
order = 0
max_length = 2000

[[system_prompt_strategy.instances.blocks]]
name = "instructions"
order = 1
max_length = 3000

[[system_prompt_strategy.instances.blocks]]
name = "constraints"
order = 2
max_length = 1000

# 2. Define the prompt instance — fills content into the strategy's blocks
#    Block values can be inline text or file: references (relative to workspace)
[[system_prompt.instances]]
id = "{{coding_prompt}}"
description = "System prompt for coding agents"
type = "{{system_prompt_strategy.coding_strategy}}"
identity = "You are an expert software engineer at a fintech company."
instructions = "file:prompts/coding_instructions.md"
constraints = "Never modify production databases. Always write tests."

# 3. Reference the prompt instance from the agent
[agent]
system_prompt = "{{system_prompt.coding_prompt}}"
workspace = "workspace"

The builder resolves the chain: finds the prompt instance → finds its strategy → sorts blocks by order → resolves file: paths → truncates each block to max_length → joins with double newlines.
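
The assembly steps (sort, truncate, join) can be sketched without the library. Everything here is illustrative: `Block` and `compose_prompt` are hypothetical names, block content is assumed to be already loaded (file: resolution is omitted), `max_length` is assumed to count characters, and blocks without content are assumed to be skipped.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of one strategy block definition.
pub struct Block {
    pub name: &'static str,
    pub order: i32,
    pub max_length: Option<usize>,
}

/// Sort blocks by `order`, truncate each block's content to `max_length`
/// characters, and join the results with a blank line between blocks.
pub fn compose_prompt(blocks: &[Block], content: &HashMap<&str, &str>) -> String {
    let mut ordered: Vec<&Block> = blocks.iter().collect();
    ordered.sort_by_key(|b| b.order);
    let parts: Vec<String> = ordered
        .iter()
        // Assumption: a block with no content in the instance is skipped.
        .filter_map(|b| content.get(b.name).map(|text| (b, text)))
        .map(|(b, text)| match b.max_length {
            Some(max) => text.chars().take(max).collect::<String>(),
            None => text.to_string(),
        })
        .collect();
    parts.join("\n\n")
}
```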

See the Field Reference for [system_prompt_strategy] and [system_prompt] sections.

Profile Instance Override

When using named profile instances, the instance's system_prompt participates in resolution. The system_prompt field on a profile instance supports all three modes — inline text, file: path, or {{...}} reference to a system_prompt instance:

# ── System prompt strategy + instance (reusable prompt definition) ───
[[system_prompt_strategy.instances]]
id = "{{simple}}"

[[system_prompt_strategy.instances.blocks]]
name = "identity"
order = 0
max_length = 5000

[[system_prompt.instances]]
id = "{{coder_prompt}}"
type = "{{system_prompt_strategy.simple}}"
identity = "file:prompts/coder.md"

[[system_prompt.instances]]
id = "{{reviewer_prompt}}"
type = "{{system_prompt_strategy.simple}}"
identity = "file:prompts/reviewer.md"

# ── Profile instances reference system_prompt instances ──────────────
[agent.profile]
name = "base"
system_prompt = "You are a general assistant."   # base fallback

[[agent.profile.instances]]
id = "{{coder}}"
system_prompt = "{{system_prompt.coder_prompt}}"   # profile → system_prompt instance
temperature = 0.2
max_tokens = 16384

[[agent.profile.instances]]
id = "{{reviewer}}"
system_prompt = "{{system_prompt.reviewer_prompt}}" # profile → system_prompt instance
temperature = 0.1
max_tokens = 8192

# ── Agent instances reference profile instances ──────────────────────
[[agent.instances]]
name = "code-writer"
agent_profile = "{{agent_profile.coder}}"          # agent → profile → system_prompt

[[agent.instances]]
name = "code-reviewer"
agent_profile = "{{agent_profile.reviewer}}"       # agent → profile → system_prompt

[[agent.instances]]
name = "generalist"
# no agent_profile → uses base [agent.profile].system_prompt

Full reference chain: agent.instances → agent.profile.instances (via agent_profile) → system_prompt.instances (via system_prompt) → system_prompt_strategy.instances (via type). Each layer can override or inherit from the one above.

When an agent instance omits agent_profile, it is built using the base [agent.profile] directly (no instance override). The base profile's system_prompt, temperature, and other fields apply as defaults.

Workspace-relative Resolution

The file: prefix resolves relative to the agent's workspace directory. Each agent instance can set its own workspace, so the same file: reference resolves to different files per agent:

[agent.profile]
name = "base"

[[agent.profile.instances]]
id = "{{copywriter}}"
system_prompt = "file:system_prompt.md"   # same file ref, different workspace resolution
temperature = 0.7

[[agent.instances]]
name = "alpha-writer"
agent_profile = "{{agent_profile.copywriter}}"
workspace = "projects/alpha"              # reads projects/alpha/system_prompt.md

[[agent.instances]]
name = "beta-writer"
agent_profile = "{{agent_profile.copywriter}}"
workspace = "projects/beta"               # reads projects/beta/system_prompt.md

Workspace resolution order:

  1. [[agent.instances]].workspace — per-agent instance (highest priority)
  2. [agent].workspace — shared agent-level
  3. default_workspace — global default
  4. "." — current directory
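
A minimal sketch of how the resolution order and the file: prefix might combine, using only the standard library. `resolve_workspace` and `resolve_file_ref` are hypothetical helpers, not phi-core functions.

```rust
use std::path::{Path, PathBuf};

/// Pick the workspace by priority: instance, then agent, then global default,
/// then the current directory.
pub fn resolve_workspace(
    instance: Option<&str>,
    agent: Option<&str>,
    default_workspace: Option<&str>,
) -> PathBuf {
    PathBuf::from(instance.or(agent).or(default_workspace).unwrap_or("."))
}

/// Turn a `file:` value into a path. Returns `None` for values without the
/// prefix (inline text or `{{...}}` references).
pub fn resolve_file_ref(value: &str, workspace: &Path) -> Option<PathBuf> {
    let target = value.strip_prefix("file:")?;
    // `Path::join` returns the argument unchanged when it is absolute,
    // which matches the "absolute paths are used as-is" rule.
    Some(workspace.join(target))
}
```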

Thinking Level

Controls depth of model reasoning. Specified as a string in config:

| Config Value | Rust Enum | Description |
| --- | --- | --- |
| "off" | ThinkingLevel::Off | No chain-of-thought (default) |
| "minimal" | ThinkingLevel::Minimal | Lightweight reasoning |
| "low" | ThinkingLevel::Low | Some reasoning |
| "medium" | ThinkingLevel::Medium | Moderate reasoning |
| "high" | ThinkingLevel::High | Deep reasoning before responding |

Parsing is case-insensitive: "High", "HIGH", "high" all work.
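
For illustration, case-insensitive parsing of this kind can be written with `FromStr`. The `Level` enum below is a hypothetical local mirror of `ThinkingLevel`, and the error type is an assumption; the library reports invalid values through its own `ConfigError`.

```rust
use std::str::FromStr;

/// Hypothetical local mirror of `ThinkingLevel` (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Level {
    #[default]
    Off,
    Minimal,
    Low,
    Medium,
    High,
}

impl FromStr for Level {
    type Err = String;

    /// Accepts any casing of the five documented config values.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "off" => Ok(Level::Off),
            "minimal" => Ok(Level::Minimal),
            "low" => Ok(Level::Low),
            "medium" => Ok(Level::Medium),
            "high" => Ok(Level::High),
            other => Err(format!("invalid thinking_level: {other}")),
        }
    }
}
```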

Skills vs Tools

skills in the profile are skill names loaded via SkillSet from SKILL.md files (per the AgentSkills standard). They are NOT tools. See Skills for details.


Profile Instances

Profile instances are named variations of the profile blueprint. Each instance inherits the profile defaults and overrides specific fields. This lets you define a single profile and then create specialized variants without duplicating the entire configuration.

Use [[agent.profile.instances]] to define instances. Each instance requires an id field using the {{...}} ID reference protocol (see below). Instance fields override the corresponding profile defaults; any field not specified falls through to the profile value.

Agent instances reference a profile instance via the agent_profile field, using either a qualified reference ({{agent_profile.name}}) or an unqualified reference ({{name}}) if the name is unique across all namespaces.

Example

# ── Profile defaults ──────────────────────────────────────────
[agent.profile]
name = "coding-agent"
description = "An agent specialized for code tasks"
system_prompt = "You are an expert software engineer."
thinking_level = "high"
temperature = 0.2
max_tokens = 16384

# ── Profile instances (override specific fields) ─────────────
[[agent.profile.instances]]
id = "{{%coder%}}"
description = "A code generation specialist"
thinking_level = "high"
temperature = 0.2
max_tokens = 16384
config_id = "coder"

[[agent.profile.instances]]
id = "{{%reviewer%}}"
description = "A code review specialist"
thinking_level = "high"
temperature = 0.1
max_tokens = 8192
config_id = "reviewer"

# ── Agent instances referencing profile instances ─────────────
[[agent.instances]]
name = "code-writer"
agent_profile = "{{agent_profile.coder}}"
system_prompt = "You write clean, well-tested code. Follow existing patterns."

[[agent.instances]]
name = "code-reviewer"
agent_profile = "{{agent_profile.reviewer}}"
system_prompt = "You review code for bugs, security issues, and style violations."

The code-writer agent inherits all profile defaults and applies the coder instance overrides. The code-reviewer agent uses the reviewer instance, which sets a lower temperature and smaller token budget for more focused review output.


ID Reference Protocol

The {{...}} syntax is a lightweight reference protocol for linking configuration entities (providers, profile instances, sub-agents) by name. It appears in id fields (to declare an entity) and in reference fields like provider and agent_profile (to point to an entity).

Syntax

| Pattern | Meaning |
| --- | --- |
| {{type.name}} | Qualified reference, recreate if invoked |
| {{%type.name%}} | Qualified reference, no recreation if already exists |
| {{name}} | Unqualified reference (unique resolve), recreate if invoked |
| {{%name%}} | Unqualified reference, no recreation if already exists |
| {{#system_id#}} | Literal system ID, no recreation |

Namespaces

References are resolved within namespaces. The three namespaces are:

  • agent_profile -- Profile instances declared in [[agent.profile.instances]]
  • provider -- Provider instances declared in [[provider.instances]]
  • sub_agent -- Sub-agent instances declared in [[sub_agents.instances]]

Resolution

Qualified references ({{type.name}}) include the namespace prefix and always resolve unambiguously. Use these when multiple namespaces could contain the same name.

Unqualified references ({{name}}) omit the namespace. The system searches all namespaces and resolves the reference only if the name is unique. If multiple entities share the same name across namespaces, an unqualified reference is ambiguous and will produce an error.
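
The uniqueness rule for unqualified references can be sketched as a search across namespaces. This is a standalone illustration: `resolve_unqualified` and `ResolveError` are hypothetical names, and the namespaces are modelled as a plain map from namespace to declared names.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ResolveError {
    NotFound,
    /// Lists every namespace that declares the name, sorted for stable output.
    Ambiguous(Vec<String>),
}

/// Resolve an unqualified name to the single namespace that declares it.
pub fn resolve_unqualified(
    name: &str,
    namespaces: &HashMap<&str, Vec<&str>>,
) -> Result<String, ResolveError> {
    let mut hits: Vec<String> = namespaces
        .iter()
        .filter(|(_, names)| names.iter().any(|n| *n == name))
        .map(|(ns, _)| ns.to_string())
        .collect();
    hits.sort();
    match hits.len() {
        0 => Err(ResolveError::NotFound),
        1 => Ok(hits.remove(0)),
        _ => Err(ResolveError::Ambiguous(hits)),
    }
}
```

With a name declared in two namespaces, the caller has to switch to the qualified form, for example {{provider.coder}} instead of {{coder}}.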

Recreation Semantics

The % sigil controls whether an entity is recreated when referenced:

  • Without % ({{name}} or {{type.name}}): The entity is recreated each time it is resolved. Use this when you want fresh instances.
  • With % ({{%name%}} or {{%type.name%}}): The entity is reused if it already exists (matched by latest creation date). Use this for shared singletons like provider connections.

The {{#system_id#}} form references a literal system-generated ID and never triggers recreation.
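
To make the five syntactic forms concrete, here is a small parser sketch. It is not the library's parser: `Reference` and `parse_reference` are hypothetical, and the handling of malformed input (returning `None`) is an assumption.

```rust
/// Hypothetical parsed form of a `{{...}}` reference.
#[derive(Debug, PartialEq, Eq)]
pub enum Reference {
    /// `{{name}}`, `{{type.name}}`, and their `%...%` variants.
    Named {
        namespace: Option<String>,
        name: String,
        /// `true` when the `%` sigil is present (reuse instead of recreate).
        reuse: bool,
    },
    /// `{{#system_id#}}`
    SystemId(String),
}

/// Parse a reference string. Returns `None` if `s` is not wrapped in `{{ }}`
/// or the body is empty.
pub fn parse_reference(s: &str) -> Option<Reference> {
    let inner = s.strip_prefix("{{")?.strip_suffix("}}")?;
    if let Some(id) = inner.strip_prefix('#').and_then(|r| r.strip_suffix('#')) {
        return if id.is_empty() {
            None
        } else {
            Some(Reference::SystemId(id.to_string()))
        };
    }
    let (body, reuse) = match inner.strip_prefix('%').and_then(|r| r.strip_suffix('%')) {
        Some(b) => (b, true),
        None => (inner, false),
    };
    if body.is_empty() {
        return None;
    }
    let (namespace, name) = match body.split_once('.') {
        Some((ns, n)) => (Some(ns.to_string()), n.to_string()),
        None => (None, body.to_string()),
    };
    Some(Reference::Named { namespace, name, reuse })
}
```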

Usage in ID Fields

When declaring an entity, the id field establishes the entity's name within its namespace:

[[provider.instances]]
id = "{{%openai%}}"          # declares "openai" in the provider namespace
model = "gpt-4o"

Usage in Reference Fields

When referencing an entity from another section, use the reference syntax:

[[agent.instances]]
name = "my-agent"
provider = "{{provider.openai}}"       # qualified reference
agent_profile = "{{reviewer}}"         # unqualified (must be unique)

Provider Configuration

The [provider] section defines the LLM model, API credentials, and protocol.

[provider]
model = "claude-sonnet-4-20250514"    # Model ID sent to the API
api_key = "${ANTHROPIC_API_KEY}"      # API credential
api = "anthropic_messages"            # API protocol
provider = "anthropic"                # Provider name
name = "Claude Sonnet 4"             # Human-friendly display name
reasoning = true                      # Model supports thinking
context_window = 200000               # Context window in tokens
max_tokens = 8192                     # Default max output tokens

API Protocols

| Config Value | Aliases | Protocol |
| --- | --- | --- |
| "anthropic_messages" | "anthropic" | Anthropic Messages API |
| "openai_completions" | "openai" | OpenAI Chat Completions |
| "openai_responses" | (none) | OpenAI Responses API |
| "azure_openai_responses" | "azure" | Azure OpenAI |
| "google_generative_ai" | "google", "gemini" | Google Gemini |
| "google_vertex" | "vertex" | Google Vertex AI |
| "bedrock_converse_stream" | "bedrock" | Amazon Bedrock |

Default base URLs are set automatically per protocol when base_url is omitted:

  • Anthropic: https://api.anthropic.com
  • OpenAI: https://api.openai.com
  • Google: https://generativelanguage.googleapis.com
  • Others: empty (uses provider defaults)

Important: The API protocol is NOT auto-detected from the model name. If you set model = "gpt-4o", you must also set api = "openai" explicitly.
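
The alias table can be read as a simple normalization step. The sketch below is hypothetical (`canonical_api` is not a phi-core function) and only restates the values listed in the API Protocols table; note that a model name such as "gpt-4o" is not an accepted value.

```rust
/// Map an `api` config value or alias to its canonical protocol name.
/// Returns `None` for anything not listed in the API Protocols table.
pub fn canonical_api(value: &str) -> Option<&'static str> {
    match value {
        "anthropic_messages" | "anthropic" => Some("anthropic_messages"),
        "openai_completions" | "openai" => Some("openai_completions"),
        "openai_responses" => Some("openai_responses"),
        "azure_openai_responses" | "azure" => Some("azure_openai_responses"),
        "google_generative_ai" | "google" | "gemini" => Some("google_generative_ai"),
        "google_vertex" | "vertex" => Some("google_vertex"),
        "bedrock_converse_stream" | "bedrock" => Some("bedrock_converse_stream"),
        _ => None,
    }
}
```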

Cost Rates

Enable cost tracking by setting per-token rates:

[provider.cost]
input_per_million = 3.0       # $ per million input tokens
output_per_million = 15.0     # $ per million output tokens
cache_read_per_million = 0.3  # $ per million cache-read tokens
cache_write_per_million = 3.75

Cost is tracked automatically after each turn. Combine with [execution].max_cost to enforce a budget.
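
As a worked example of per-million rates, the sketch below computes the cost of a single turn. The struct and function names are hypothetical, and it assumes the four token counts are reported separately (cache tokens not also counted as input); the library's own accounting may differ.

```rust
/// Hypothetical mirror of the `[provider.cost]` rates (dollars per million tokens).
pub struct CostRates {
    pub input_per_million: f64,
    pub output_per_million: f64,
    pub cache_read_per_million: f64,
    pub cache_write_per_million: f64,
}

/// Hypothetical per-turn token counts.
pub struct TurnUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_read_tokens: u64,
    pub cache_write_tokens: u64,
}

/// Dollar cost of one turn: tokens / 1,000,000 * rate, summed over categories.
pub fn turn_cost(rates: &CostRates, usage: &TurnUsage) -> f64 {
    let cost = |tokens: u64, rate: f64| tokens as f64 / 1_000_000.0 * rate;
    cost(usage.input_tokens, rates.input_per_million)
        + cost(usage.output_tokens, rates.output_per_million)
        + cost(usage.cache_read_tokens, rates.cache_read_per_million)
        + cost(usage.cache_write_tokens, rates.cache_write_per_million)
}
```

With the rates shown above, 1,000,000 input tokens, 200,000 output tokens, and 500,000 cache-read tokens come to 3.00 + 3.00 + 0.15 = 6.15 dollars.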

Custom Headers

[provider]
model = "my-model"

[provider.headers]
"X-Custom-Header" = "value"
"Authorization" = "Bearer ${CUSTOM_TOKEN}"

Multiple Providers

Use [[provider.instances]] to define named providers alongside the default. Each instance uses the {{...}} ID reference protocol to declare its name in the provider namespace. The url field is an alias for base_url.

# Default provider — Anthropic (used unless overridden)
[provider]
model = "claude-sonnet-4-20250514"
name = "Claude Sonnet 4"
api_key = "${ANTHROPIC_API_KEY}"
api = "anthropic_messages"
provider = "anthropic"

[provider.cost]
input_per_million = 3.0
output_per_million = 15.0
cache_read_per_million = 0.3
cache_write_per_million = 3.75

# OpenAI
[[provider.instances]]
id = "{{%openai%}}"
description = "OpenAI GPT-4o provider"
name = "GPT-4o"
model = "gpt-4o"
api_key = "${OPENAI_API_KEY}"
api = "openai_completions"
url = "https://api.openai.com/v1"

# OpenRouter
[[provider.instances]]
id = "{{%openrouter%}}"
description = "OpenRouter multi-model gateway"
name = "OpenRouter"
model = "anthropic/claude-sonnet-4"
api_key = "${OPENROUTER_API_KEY}"
api = "openai_completions"
url = "https://openrouter.ai/api/v1"
provider = "openrouter"

# Google Gemini
[[provider.instances]]
id = "{{%gemini%}}"
description = "Google Gemini 2.5 Flash provider"
name = "Gemini 2.5 Flash"
model = "gemini-2.5-flash"
api_key = "${GOOGLE_API_KEY}"
api = "google_generative_ai"

# Ollama (local)
[[provider.instances]]
id = "{{%ollama%}}"
description = "Local Ollama instance for development"
name = "Ollama Llama 3.2"
model = "llama3.2"
api = "openai_completions"
url = "http://localhost:11434/v1"
api_key = "not-needed"
provider = "ollama"

Agent instances and sub-agents reference these via the ID protocol (e.g., provider = "{{provider.openai}}" or provider = "{{ollama}}" if unique).


Session Configuration

The [session] section controls session scope.

[session]
scope = "persistent"       # "ephemeral" (default) or "persistent"

Session Scope

| Value | Behavior |
| --- | --- |
| "ephemeral" | Session exists only in memory for the process lifetime (default) |
| "persistent" | Session data is written to a store and survives restarts |

Note: Setting scope = "persistent" declares intent but does not automatically configure a storage backend. The caller must set up session persistence using the session recorder.

Thinking level and temperature are configured per-loop via LoopConfigSnapshot (captured on each AgentStart event) rather than at the session level. Set them on the agent profile or AgentLoopConfig.


Tools

The [tools] section declares which tools the agent can use and how they execute.

[tools]
enabled = ["bash", "read_file", "write_file", "search"]
tool_strategy = "parallel"   # "sequential", "parallel", or "batched"
batch_size = 3               # Only used when strategy is "batched"

Tool Execution Strategies

| Strategy | Behavior |
| --- | --- |
| "sequential" | One tool at a time; checks steering queue between each |
| "parallel" | All tool calls concurrent; check steering after all complete (default) |
| "batched" | Run N concurrent, wait, check steering, next batch |

Context Pruning

Enable model-directed context pruning with with_prun_tool(). This lets the model surgically remove irrelevant in-run content (its own messages, tool calls, and tool results) from the working context to reclaim space in the context window. User messages are never pruned. See Context Pruning for details.

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(model_config)
    .with_default_tools()
    .with_prun_tool();
}

Or via config:

[tools]
enabled = ["bash", "read_file", "write_file", "prun"]

Registering Tools at Runtime

Tools are NOT instantiated from the config file. The config specifies tool names only. You must register tool instances after constructing the agent:

#![allow(unused)]
fn main() {
use phi_core::{parse_config_file, agent_from_config, Agent};
use phi_core::tools::{BashTool, ReadFileTool, WriteFileTool, SearchTool};
use std::path::Path;
use std::sync::Arc;

let config = parse_config_file(Path::new("agent.toml"))?;
// `mut` is required so that Arc::get_mut can borrow the Arc mutably
let mut agent = agent_from_config(&config)?;

// Get mutable access and register tools.
// Arc::get_mut returns None if any other reference to the Arc exists.
let agent_mut = Arc::get_mut(&mut agent).unwrap();
agent_mut.set_tools(vec![
    Arc::new(BashTool::default()),
    Arc::new(ReadFileTool::new()),
    Arc::new(WriteFileTool::new()),
    Arc::new(SearchTool::new()),
]);
}

Tool Registry

Instead of manually registering tools after construction, use agent_from_config_with_registry() to resolve tool names from the config automatically:

#![allow(unused)]
fn main() {
use phi_core::{parse_config_file, agent_from_config_with_registry, Agent};
use phi_core::tools::ToolRegistry;
use std::path::Path;

let config = parse_config_file(Path::new("agent.toml"))?;

// Create a registry with the 6 built-in tools
let registry = ToolRegistry::new().with_defaults();

// Tools listed in config.tools.enabled are resolved through the registry
let agent = agent_from_config_with_registry(&config, &registry)?;
}

The default registry includes all 6 built-in tools: bash, read_file, write_file, edit_file, list_files, search. You can also register custom tools:

#![allow(unused)]
fn main() {
let mut registry = ToolRegistry::new().with_defaults();
registry.register("my_tool", || Arc::new(MyCustomTool::new()));

let agent = agent_from_config_with_registry(&config, &registry)?;
}

Unknown tool names in tools.enabled are silently skipped. Use registry.contains(name) to check availability before construction if needed.


Context & Compaction

The [compaction] section controls automatic context management. When the conversation grows too long, compaction summarizes older messages to stay within the model's context window.

[compaction]
max_context_tokens = 200000     # Model's context window
system_prompt_tokens = 4000     # Tokens reserved for system prompt
compact_at_pct = 0.85           # Start measuring at 85% capacity
compact_budget_threshold_pct = 0.05  # Compact when < 5% headroom remains
keep_first_turns = 2            # Keep first 2 turns verbatim
keep_recent_turns = 4           # Keep last 4 turns verbatim
max_summary_tokens = 2000       # Token budget for the summarized middle
tool_output_max_lines = 50      # Truncate tool outputs to 50 lines

Compaction must be explicitly enabled by setting max_context_tokens. If omitted, compaction is disabled entirely.

How Compaction Works

  1. Before each LLM turn, the loop estimates current token usage
  2. If usage exceeds the trigger threshold, compaction fires
  3. First N turns are kept verbatim (preserves initial context)
  4. Middle turns are summarized (aggressive token reduction)
  5. Last M turns are kept verbatim (preserves recent history)
  6. Tool outputs in kept turns are truncated to tool_output_max_lines

See Context Compaction for the full algorithm.
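
The keep-first / summarize-middle / keep-recent split can be sketched as slice arithmetic. This is a simplified standalone illustration with hypothetical function names; it leaves out token estimation, the trigger threshold, and the summarization call itself.

```rust
/// Split turns into (kept head, middle to summarize, kept tail).
/// When the conversation is short, the head wins and the middle may be empty.
pub fn partition_turns<T>(
    turns: &[T],
    keep_first: usize,
    keep_recent: usize,
) -> (&[T], &[T], &[T]) {
    let head_end = keep_first.min(turns.len());
    // Never let the tail overlap the head.
    let tail_start = turns.len().saturating_sub(keep_recent).max(head_end);
    (
        &turns[..head_end],
        &turns[head_end..tail_start],
        &turns[tail_start..],
    )
}

/// Keep at most `max_lines` lines of a tool output.
pub fn truncate_lines(output: &str, max_lines: usize) -> String {
    output.lines().take(max_lines).collect::<Vec<_>>().join("\n")
}
```

With keep_first_turns = 2 and keep_recent_turns = 4, a ten-turn conversation keeps turns 1 and 2 and turns 7 through 10, and summarizes turns 3 through 6.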

Focused Compaction

The focus_message field steers what the compaction summary emphasizes. Compaction instances let you define named variations that agent profiles can reference.

[compaction]
max_context_tokens = 200000
focus_message = "Retain key decisions and code changes."

# Named compaction instances
[[compaction.instances]]
id = "{{%coding%}}"
focus_message = "Focus on file paths, function signatures, and design rationale."
keep_recent_turns = 6
max_summary_tokens = 3000

[[compaction.instances]]
id = "{{%research%}}"
focus_message = "Preserve citations, data sources, and methodology."
keep_first_turns = 3
max_summary_tokens = 4000

Profiles reference a compaction instance via compaction = "{{compaction.coding}}":

[agent.profile]
name = "coding-agent"
compaction = "{{compaction.coding}}"

See Focused Compaction for full details.


Execution Limits

The [execution] section sets safety guards that prevent runaway loops and budget overruns.

[execution]
max_turns = 50              # Maximum LLM turns (default: 50)
max_total_tokens = 1000000  # Total token budget (default: 1,000,000)
max_duration_secs = 600     # Wall-clock timeout in seconds (default: 600)
max_cost = 5.0              # Dollar cost cap (requires [provider.cost] rates)

Cost Tracking

Cost enforcement requires both cost rates and a budget:

[provider.cost]
input_per_million = 3.0
output_per_million = 15.0

[execution]
max_cost = 5.0   # Stop when accumulated cost reaches $5

Without cost rates (all zeros), max_cost has no effect. Token usage is always tracked regardless.

Retry Configuration

Automatic retry for transient provider errors (rate limits, network issues):

[execution.retry]
max_retries = 3           # Retry attempts (default: 3, 0 = disabled)
initial_delay_ms = 1000   # First retry delay in ms
backoff_multiplier = 2.0  # Exponential backoff multiplier
max_delay_ms = 30000      # Maximum delay cap

Only RateLimited and Network errors are retried. Invalid requests and context overflows fail immediately.
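
The four retry fields suggest a capped exponential backoff. The sketch below shows one plausible reading, with hypothetical names; whether the library adds jitter or honours provider retry hints is not stated here, so neither is modelled.

```rust
/// Hypothetical mirror of the `[execution.retry]` section.
pub struct RetryConfig {
    pub max_retries: u32,
    pub initial_delay_ms: u64,
    pub backoff_multiplier: f64,
    pub max_delay_ms: u64,
}

/// Delay before retry number `attempt` (0-based), or `None` once the
/// retry budget is exhausted.
pub fn retry_delay_ms(cfg: &RetryConfig, attempt: u32) -> Option<u64> {
    if attempt >= cfg.max_retries {
        return None;
    }
    // initial * multiplier^attempt, capped at max_delay_ms.
    let raw = cfg.initial_delay_ms as f64 * cfg.backoff_multiplier.powi(attempt as i32);
    Some((raw as u64).min(cfg.max_delay_ms))
}
```

With the values above, the delays would be 1000 ms, 2000 ms, and 4000 ms, after which the error is returned to the caller.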

Cache Configuration

Control prompt caching behavior:

[execution.cache]
enabled = true        # Master switch (default: true)
strategy = "auto"     # "auto" or "disabled"

Sub-Agents

Define sub-agents that run their own agent loops when invoked as tools:

[[sub_agents.instances]]
name = "researcher"
description = "Searches the web for information"
system_prompt = "You are a research assistant. Search thoroughly."
model = "claude-haiku-4-5-20251001"
max_turns = 10
tools = ["web_search"]

[[sub_agents.instances]]
name = "code_writer"
description = "Writes and edits code files"
system_prompt = "You are a code generation expert."
provider = "{{provider.openai}}"    # References a [[provider.instances]] entry via the ID protocol
max_turns = 20
tools = ["bash", "write_file"]

Sub-agents do NOT inherit the parent agent's configuration. Each sub-agent is fully independent — set all needed fields explicitly.


Multi-Agent Configurations

For complex setups, combine named providers with named agent instances:

# Providers
[provider]
model = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[[provider.instances]]
id = "{{%fast%}}"
name = "fast"
model = "claude-haiku-4-5-20251001"
api_key = "${ANTHROPIC_API_KEY}"

# Agent instances
[[agent.instances]]
name = "planner"
system_prompt = "You are an architect. Plan the approach."
provider = "{{provider.fast}}"

[[agent.instances]]
name = "executor"
system_prompt = "You are an implementer. Write the code."

Agent Workspace

The workspace field sets the working directory for an agent. Tools that interact with the filesystem (bash, file read/write, etc.) use this as their base path.

There are two levels of workspace configuration:

  • default_workspace (top-level config field): Sets the default workspace for all agents. If omitted, the current working directory is used.
  • workspace (per-agent field on [agent] or [[agent.instances]]): Overrides default_workspace for a specific agent.

default_workspace = "/home/user/projects"

[agent]
workspace = "/home/user/projects/my-app"   # overrides default_workspace for this agent

Callbacks & Hooks

The config schema accepts [callbacks] and [hooks] sections for lifecycle hooks:

[callbacks]
before_loop = "my_plugin::before_loop"
after_turn = "my_plugin::after_turn"
before_task = "./scripts/on_task_start.sh"
after_task = "python3 scripts/after_task.py"

[hooks]
transform_context = "my_plugin::transform"

Script-based callbacks (shell scripts, Python scripts) are supported. The agent spawns the script as a subprocess, passing context via environment variables. Exit code 0 means continue; non-zero aborts the action (for Before* hooks). WASM plugin loading for Rust-native callbacks is planned for Phase 2.

Session-Level Callbacks

before_task and after_task are session-level callbacks configured on SessionRecorderConfig:

  • before_task: Fires on the first AgentStart event with a new session_id. Use for task-level setup, metrics initialization, or audit logging.
  • after_task: Fires on flush(). Use for task-level teardown, billing, or summary generation.

Programmatic Hooks

To set hooks programmatically, use the Agent trait setter methods after construction:

#![allow(unused)]
fn main() {
// `mut` is required so that Arc::get_mut can borrow the Arc mutably
let mut agent = agent_from_config(&config)?;
let agent_mut = Arc::get_mut(&mut agent).unwrap();
agent_mut.set_before_loop(Some(Arc::new(|msgs, n| {
    println!("Loop starting with {} messages", msgs.len());
    true // return false to abort
})));
}

Complete Example

A full coding agent configuration covering the main sections:

# ── Agent identity ────────────────────────────────────────────
[agent]
system_prompt = "You are an expert software engineer."

[agent.profile]
name = "coding-agent"
description = "Full-featured coding assistant"
thinking_level = "high"
temperature = 0.2
max_tokens = 16384
config_id = "coder-v1"
skills = ["code-review"]

# ── Provider ──────────────────────────────────────────────────
[provider]
model = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"
reasoning = true
context_window = 200000

[provider.cost]
input_per_million = 3.0
output_per_million = 15.0
cache_read_per_million = 0.3
cache_write_per_million = 3.75

# ── Session ───────────────────────────────────────────────────
[session]
scope = "persistent"

# ── Tools ─────────────────────────────────────────────────────
[tools]
enabled = ["bash", "read_file", "write_file", "search", "edit_file"]
tool_strategy = "parallel"

# ── Context management ────────────────────────────────────────
[compaction]
max_context_tokens = 200000
system_prompt_tokens = 4000
compact_at_pct = 0.85
keep_first_turns = 2
keep_recent_turns = 4
max_summary_tokens = 2000
tool_output_max_lines = 50

# ── Execution limits ──────────────────────────────────────────
[execution]
max_turns = 100
max_total_tokens = 2000000
max_duration_secs = 1800
max_cost = 10.0

[execution.retry]
max_retries = 3
initial_delay_ms = 1000
backoff_multiplier = 2.0

[execution.cache]
enabled = true
strategy = "auto"

# ── Sub-agents ────────────────────────────────────────────────
[[sub_agents.instances]]
name = "researcher"
description = "Searches for information and documentation"
system_prompt = "Find relevant information. Be thorough."
model = "claude-haiku-4-5-20251001"
max_turns = 10
tools = ["web_search"]

Field Reference

[agent]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| system_prompt | string | None | Agent-level system prompt (overrides profile). Supports: inline text, file:path (relative to workspace), or {{...}} reference to a [[system_prompt.instances]] entry. |
| profile | table | (empty) | Profile blueprint (see below) |
| workspace | string | None | Workspace directory for file: resolution and tool paths |
| instances | array | [] | Named agent instances |

[agent.profile]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| profile_id | string | UUID | Unique profile identifier |
| name | string | None | Human-readable name |
| description | string | None | Profile description |
| system_prompt | string | None | Default system prompt. Supports: inline text, file:path, or {{...}} reference. |
| thinking_level | string | None | "off", "minimal", "low", "medium", "high" |
| temperature | float | None | LLM temperature (0.0-2.0) |
| max_tokens | integer | None | Max output tokens |
| config_id | string | None | Stable identity for loop_id generation |
| skills | array | [] | Skill names (SKILL.md, not tools) |
| instances | array | [] | Named profile instances (see ProfileInstanceSection) |

ProfileInstanceSection

Each entry in [[agent.profile.instances]]:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| id | string | required | {{...}} ID in the agent_profile namespace |
| description | string | None | Human-readable description of this variant |
| name | string | (from profile) | Override name |
| system_prompt | string | (from profile) | Override system prompt (supports inline, file:, or {{...}}) |
| thinking_level | string | (from profile) | Override thinking level |
| temperature | float | (from profile) | Override temperature |
| max_tokens | integer | (from profile) | Override max output tokens |
| config_id | string | None | Stable identity for loop_id generation |
| skills | array | (from profile) | Override skill names |

AgentInstanceSection

Each entry in [[agent.instances]]:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | "unnamed" | Instance name |
| agent_profile | string | None | Profile instance reference ({{...}} syntax) |
| profile | table | None | Inline profile override (not a reference) |
| system_prompt | string | None | Instance-specific system prompt |
| provider | string | (default provider) | Provider reference ({{...}} syntax) |
| workspace | string | None | Per-instance workspace directory (overrides [agent].workspace) |

[provider]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | "unknown" | Model ID sent to API |
| api_key | string | "" | API credential (supports ${VAR}) |
| api | string | "anthropic_messages" | API protocol |
| base_url | string | (per protocol) | API base URL (url is an accepted alias) |
| provider | string | "anthropic" | Provider name |
| name | string | model value | Display name |
| reasoning | bool | false | Supports thinking/reasoning |
| context_window | integer | 200000 | Context window tokens |
| max_tokens | integer | 8192 | Default max output tokens |

ProviderInstanceSection

Each entry in [[provider.instances]] accepts all fields from [provider] above, plus:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| id | string | None | {{...}} ID in the provider namespace |
| description | string | None | Human-readable description of this provider |
| url | string | None | Alias for base_url |

[provider.cost]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| input_per_million | float | 0.0 | Input token rate |
| output_per_million | float | 0.0 | Output token rate |
| cache_read_per_million | float | 0.0 | Cache read rate |
| cache_write_per_million | float | 0.0 | Cache write rate |

[session]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| scope | string | "ephemeral" | "ephemeral" or "persistent" |

[tools]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | array | [] | Tool names (resolved by caller) |
| tool_strategy | string | "parallel" | "sequential", "parallel", "batched" |
| batch_size | integer | 3 | Batch size for "batched" strategy |

[compaction]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_context_tokens | integer | None | Context window (must set to enable compaction) |
| system_prompt_tokens | integer | 4000 | Reserved system prompt tokens |
| compact_at_pct | float | 0.90 | Measurement threshold |
| compact_budget_threshold_pct | float | 0.05 | Compaction trigger |
| keep_first_turns | integer | 2 | Verbatim turns from start |
| keep_recent_turns | integer | 10 | Verbatim turns from end |
| max_summary_tokens | integer | 2000 | Summary token budget |
| tool_output_max_lines | integer | 50 | Tool output line cap |

[system_prompt_strategy]

Strategy templates define block structure for multi-block system prompts.

[[system_prompt_strategy.instances]]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| id | string | required | {{...}} ID for this strategy template |
| description | string | None | Human-readable description |
| blocks | array | [] | Block definitions (see below) |

[[system_prompt_strategy.instances.blocks]]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | required | Block name (e.g., "identity", "instructions", "constraints") |
| order | integer | 0 | Assembly order; lower values appear first in the composed prompt |
| max_length | integer | unlimited | Maximum character budget for this block |

[system_prompt]

Prompt instances fill content into a strategy's blocks.

[[system_prompt.instances]]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| id | string | required | {{...}} ID for this prompt instance |
| description | string | None | Human-readable description |
| type | string | None | {{...}} reference to a strategy instance (e.g., "{{system_prompt_strategy.coding}}") |
| (block names) | string | | Each block defined in the strategy gets a field here. Value is inline text or "file:path" (relative to workspace). |

Note: Block content fields use #[serde(flatten)] — they appear as top-level keys on the instance, not nested under a blocks table.

[execution]

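| Field | Type | Default | Description |
|-------|------|---------|-------------|
| max_turns | integer | 50 | Maximum LLM turns |
| max_total_tokens | integer | 1000000 | Total token budget |
| max_duration_secs | integer | 600 | Wall-clock timeout (seconds) |
| max_cost | float | None | Dollar cost cap |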

[execution.retry]

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| max_retries | integer | 3 | Retry attempts (0 = disabled) |
| initial_delay_ms | integer | 1000 | First retry delay (ms) |
| backoff_multiplier | float | 2.0 | Exponential backoff factor |
| max_delay_ms | integer | 30000 | Maximum delay cap (ms) |
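
The four retry fields combine into a capped exponential backoff. The sketch below shows one plausible reading of them; RetryPolicy and retry_delay are hypothetical names, and details such as jitter are not covered by this reference and are left out.

```rust
use std::time::Duration;

/// Hypothetical mirror of the [execution.retry] fields.
struct RetryPolicy {
    max_retries: u32,
    initial_delay_ms: u64,
    backoff_multiplier: f64,
    max_delay_ms: u64,
}

/// Delay before retry number `attempt` (0-based), or None when the
/// retry budget is exhausted.
fn retry_delay(policy: &RetryPolicy, attempt: u32) -> Option<Duration> {
    if attempt >= policy.max_retries {
        return None;
    }
    let raw = policy.initial_delay_ms as f64 * policy.backoff_multiplier.powi(attempt as i32);
    let capped = raw.min(policy.max_delay_ms as f64);
    Some(Duration::from_millis(capped as u64))
}
```

Under this reading, the defaults give delays of 1000 ms, 2000 ms, and 4000 ms before the loop gives up.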

[execution.cache]

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | bool | true | Master switch |
| strategy | string | "auto" | "auto" or "disabled" |

Error Handling

agent_from_config() and the parse functions return ConfigError:

| Variant | Cause | Fix |
|---------|-------|-----|
| Parse(msg) | Invalid TOML/JSON/YAML syntax | Check syntax; the message includes the parser error |
| MissingEnvVar { var } | ${VAR} references an unset env var | Set the variable or remove the reference |
| InvalidField { field, value, expected } | Invalid enum value (e.g., thinking_level = "extreme") | Use one of the expected values |
| Io(err) | File not found or not readable | Check file path and permissions |
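
To show where a MissingEnvVar error could come from, the following sketch expands ${VAR} references in a config string. It is not phi-core's parser: ExpandError and expand_env are hypothetical stand-ins, and the choice to copy an unterminated "${" through unchanged is an assumption.

```rust
/// Hypothetical stand-in for the MissingEnvVar variant described above.
#[derive(Debug, PartialEq)]
enum ExpandError {
    MissingEnvVar { var: String },
}

/// Replace every `${NAME}` in `input` using `lookup`; fail on the first
/// name that `lookup` cannot resolve. An unterminated `${` is copied as-is.
fn expand_env<F>(input: &str, lookup: F) -> Result<String, ExpandError>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let var = &after[..end];
                let value = lookup(var).ok_or_else(|| ExpandError::MissingEnvVar {
                    var: var.to_string(),
                })?;
                out.push_str(&value);
                rest = &after[end + 1..];
            }
            None => {
                // No closing brace: keep the remainder verbatim and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    Ok(out)
}
```

Passing `|name| std::env::var(name).ok()` as the lookup reads from the real process environment; taking the lookup as a parameter keeps the function easy to test.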

Common Mistakes

Forgetting to set the API protocol for non-Anthropic models:

# Wrong — defaults to anthropic_messages, fails at runtime
[provider]
model = "gpt-4o"
api_key = "${OPENAI_API_KEY}"

# Correct
[provider]
model = "gpt-4o"
api_key = "${OPENAI_API_KEY}"
api = "openai"

Setting max_cost without cost rates:

# max_cost is ignored — no rates to compute cost from
[execution]
max_cost = 5.0

# Correct — set rates AND budget
[provider.cost]
input_per_million = 3.0
output_per_million = 15.0

[execution]
max_cost = 5.0
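
The reason max_cost needs rates can be seen from the arithmetic. The sketch below is an illustration only: Rates, run_cost, and over_budget are hypothetical names, and the comparison used to decide when the cap is hit is an assumption.

```rust
/// Hypothetical mirror of the [provider.cost] rates (USD per million tokens).
struct Rates {
    input_per_million: f64,
    output_per_million: f64,
}

/// Dollar cost of a run given its token counts.
fn run_cost(rates: &Rates, input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * rates.input_per_million
        + (output_tokens as f64 / 1_000_000.0) * rates.output_per_million
}

/// True when the accumulated cost has reached the `max_cost` budget.
fn over_budget(cost: f64, max_cost: Option<f64>) -> bool {
    max_cost.map_or(false, |cap| cost >= cap)
}
```

With rates left at 0.0 the computed cost stays at zero no matter how many tokens are used, so a cap of 5.0 can never be reached.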

Expecting tools to be instantiated from config:

[tools]
enabled = ["bash", "file_read"]
# These are names only — you must call agent.set_tools() in Rust

Programmatic Usage

Using AgentConfig Directly

You can construct AgentConfig in Rust without a file:

#![allow(unused)]
fn main() {
use phi_core::agent_from_config;
use phi_core::config::schema::{AgentConfig, ProviderSection, ProfileSection, AgentSection};

let config = AgentConfig {
    provider: ProviderSection {
        model: Some("claude-sonnet-4-20250514".into()),
        api_key: Some(std::env::var("ANTHROPIC_API_KEY")?),
        ..Default::default()
    },
    agent: AgentSection {
        system_prompt: Some("You are helpful.".into()),
        profile: ProfileSection {
            thinking_level: Some("high".into()),
            ..Default::default()
        },
        ..Default::default()
    },
    ..Default::default()
};

let agent = agent_from_config(&config)?;
}

Mixing Config with Programmatic Overrides

After agent_from_config(), use Agent trait methods to add hooks, tools, or modify settings:

#![allow(unused)]
fn main() {
use phi_core::{parse_config_file, agent_from_config, Agent};
use std::path::Path;
use std::sync::Arc;

let config = parse_config_file(Path::new("agent.toml"))?;
let mut agent = agent_from_config(&config)?;

// Get mutable access to add tools and hooks. Arc::get_mut returns None once
// the Arc has been cloned, so do this before sharing the agent.
let a = Arc::get_mut(&mut agent).unwrap();
a.set_tools(vec![Arc::new(phi_core::tools::BashTool::default())]);
a.set_before_loop(Some(Arc::new(|msgs, _| {
    println!("Starting with {} messages", msgs.len());
    true
})));
}

Reading Config Through the Agent Trait

All configuration is accessible through Agent trait methods:

#![allow(unused)]
fn main() {
let agent = agent_from_config(&config)?;

// Config accessors (all have defaults)
agent.model_config();       // Option<&ModelConfig>
agent.profile();            // Option<&AgentProfile>
agent.system_prompt();      // &str
agent.thinking_level();     // ThinkingLevel
agent.temperature();        // Option<f32>
agent.max_tokens();         // Option<u32>
agent.context_config();     // Option<&ContextConfig>
agent.execution_limits();   // Option<&ExecutionLimits>
agent.cache_config();       // CacheConfig
agent.tool_execution();     // ToolExecutionStrategy
agent.retry_config();       // RetryConfig
agent.session();            // Option<&Session>
agent.build_config();       // Result<AgentLoopConfig, AgentBuildError>
                            // Default impl returns Err(MissingModelConfig)
                            // if model_config() returns None. BasicAgent's
                            // override always returns Ok(...).
}

MCP Integration

What is MCP?

The Model Context Protocol (MCP) is a JSON-RPC 2.0 protocol that lets AI agents discover and call tools from external servers. It defines a standard way for agents to connect to tool providers over two transports:

  • Stdio — spawn a child process, communicate via stdin/stdout (newline-delimited JSON)
  • HTTP — POST JSON-RPC requests to an HTTP endpoint
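
As a small illustration of the stdio framing, the sketch below builds a single newline-terminated JSON-RPC 2.0 request. The jsonrpc_request_line helper is hypothetical and builds the JSON by hand to stay dependency-free; real code would normally use a JSON library, and the request omits params for brevity.

```rust
/// Build one newline-delimited JSON-RPC 2.0 request with no params.
/// `method` is assumed to contain no characters that need JSON escaping.
fn jsonrpc_request_line(id: u64, method: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"{}\"}}\n",
        id, method
    )
}
```

Writing this line to the child's stdin and reading one line back from its stdout is the whole exchange for a method such as tools/list; the id lets the client match the response to the request.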

Connecting to MCP Servers

Stdio Transport

Use with_mcp_server_stdio() to spawn an MCP server process and register its tools:

use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("ANTHROPIC_API_KEY")?;
    let mut agent = BasicAgent::new(ModelConfig::anthropic(
        "claude-sonnet-4-20250514",
        "Claude Sonnet 4",
        &api_key,
    ))
    .with_system_prompt("You are a helpful assistant with file access.")
    .with_mcp_server_stdio(
        "npx",
        &["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
        None,
    )
    .await?;

    let rx = agent.prompt("List files in /tmp").await;
    // handle events...
    Ok(())
}

You can pass environment variables to the server process:

#![allow(unused)]
fn main() {
use std::collections::HashMap;

let mut env = HashMap::new();
env.insert("API_TOKEN".into(), "secret".into());

let agent = BasicAgent::new(model_config)
    .with_mcp_server_stdio("my-mcp-server", &["--port", "0"], Some(env))
    .await?;
}

HTTP Transport

For remote MCP servers exposed over HTTP:

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(model_config)
    .with_mcp_server_http("http://localhost:8080/mcp")
    .await?;
}

How MCP Tools Work

When you call with_mcp_server_stdio() or with_mcp_server_http(), phi-core:

  1. Connects to the MCP server and performs the initialize handshake
  2. Calls tools/list to discover available tools
  3. Wraps each MCP tool as an AgentTool via McpToolAdapter
  4. Adds them to the agent's tool list

MCP tools appear alongside built-in tools. The LLM sees them with their original names, descriptions, and JSON Schema parameters — it can call them just like any other tool.

Mixing Built-in and MCP Tools

#![allow(unused)]
fn main() {
use phi_core::tools::default_tools;

let agent = BasicAgent::new(model_config)
    .with_tools(default_tools())  // bash, read, write, edit, list, search
    .with_mcp_server_stdio("my-db-server", &[], None)
    .await?;
// Agent now has both built-in coding tools AND MCP database tools
}

Using the MCP Client Directly

For lower-level control, use McpClient directly:

#![allow(unused)]
fn main() {
use phi_core::mcp::{McpClient, McpToolAdapter};
use std::sync::Arc;
use tokio::sync::Mutex;

let client = McpClient::connect_stdio("my-server", &[], None).await?;
let tools = client.list_tools().await?;

for tool in &tools {
    println!("{}: {}", tool.name, tool.description.as_deref().unwrap_or(""));
}

// Call a tool directly
let result = client.call_tool("read_file", serde_json::json!({"path": "/tmp/test.txt"})).await?;

// Or wrap as AgentTool adapters
let client = Arc::new(Mutex::new(client));
let adapters = McpToolAdapter::from_client(client).await?;
}

Error Handling

MCP operations return McpError:

  • McpError::Transport — connection or I/O failure
  • McpError::Protocol — unexpected response format
  • McpError::JsonRpc — server returned a JSON-RPC error (code + message)
  • McpError::Serialization — JSON serialization/deserialization failure
  • McpError::Io — standard I/O error
  • McpError::ConnectionClosed — server process exited

When an MCP tool returns isError: true, the adapter converts it to a ToolError::Failed, which the agent loop sends back to the LLM with is_error: true so it can self-correct.

OpenAPI Tool Adapter

Auto-generate AgentTool implementations from OpenAPI 3.0 specs. Point an agent at a spec and it gets a callable tool for each operation that has an operationId (see Limitations below).

Feature-gated — add features = ["openapi"] to your Cargo.toml.

Quick Start

use phi_core::BasicAgent;
use phi_core::openapi::{OpenApiToolAdapter, OpenApiConfig, OperationFilter};
use phi_core::provider::ModelConfig;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("ANTHROPIC_API_KEY")?;
    let config = OpenApiConfig::new()
        .with_bearer_token("sk-...");

    let agent = BasicAgent::new(ModelConfig::anthropic(
        "claude-sonnet-4-20250514",
        "Claude Sonnet 4",
        &api_key,
    ))
    .with_system_prompt("You are an API assistant.")
    .with_openapi_file("petstore.yaml", config, &OperationFilter::All)
    .await?;

    Ok(())
}

Loading Specs

Three ways to load an OpenAPI spec:

#![allow(unused)]
fn main() {
// From a file
let agent = agent.with_openapi_file("spec.yaml", config, &filter).await?;

// From a URL
let agent = agent.with_openapi_url("https://api.example.com/openapi.json", config, &filter).await?;

// From a string (sync)
let agent = agent.with_openapi_spec(&spec_string, config, &filter)?;
}

Or create adapters directly for more control:

#![allow(unused)]
fn main() {
let adapters = OpenApiToolAdapter::from_str(&spec, config, &OperationFilter::All)?;
let tools: Vec<Box<dyn AgentTool>> = adapters.into_iter().map(|a| Box::new(a) as _).collect();
}

Configuration

OpenApiConfig controls auth, headers, timeouts, and response limits:

#![allow(unused)]
fn main() {
let config = OpenApiConfig::new()
    .with_base_url("https://api.staging.example.com") // Override spec's servers
    .with_bearer_token("sk-...")                       // Bearer auth
    .with_header("X-Custom", "value")                  // Extra headers
    .with_timeout_secs(60)                             // Request timeout
    .with_max_response_bytes(128 * 1024)               // Truncate large responses
    .with_name_prefix("github");                       // Tool names: github__listRepos
}

Authentication

#![allow(unused)]
fn main() {
// Bearer token
let config = OpenApiConfig::new().with_bearer_token("token");

// API key in a custom header
let config = OpenApiConfig::new().with_api_key("X-API-Key", "key-value");

// No auth
let config = OpenApiConfig::new(); // default
}

Filtering Operations

Most API specs have dozens or hundreds of operations. Use OperationFilter to select which ones become tools:

#![allow(unused)]
fn main() {
// All operations (default)
let filter = OperationFilter::All;

// Specific operations by ID
let filter = OperationFilter::ByOperationId(vec![
    "listRepos".into(),
    "getRepo".into(),
    "createIssue".into(),
]);

// All operations with a specific tag
let filter = OperationFilter::ByTag(vec!["repos".into()]);

// All operations under a path prefix
let filter = OperationFilter::ByPathPrefix("/repos".into());
}
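
The filter variants can be read as a predicate over operations. The sketch below uses local stand-ins (Operation, Filter, keep) rather than phi-core's types, and it assumes that ByTag keeps an operation when it shares at least one tag with the filter.

```rust
/// Hypothetical, simplified view of one operation in a spec.
struct Operation {
    operation_id: String,
    tags: Vec<String>,
    path: String,
}

/// Local stand-in for the filter variants shown above.
enum Filter {
    All,
    ByOperationId(Vec<String>),
    ByTag(Vec<String>),
    ByPathPrefix(String),
}

/// Decide whether `op` should become a tool under `filter`.
fn keep(filter: &Filter, op: &Operation) -> bool {
    match filter {
        Filter::All => true,
        Filter::ByOperationId(ids) => ids.contains(&op.operation_id),
        Filter::ByTag(tags) => op.tags.iter().any(|t| tags.contains(t)),
        Filter::ByPathPrefix(prefix) => op.path.starts_with(prefix.as_str()),
    }
}
```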

How It Works

Each OpenAPI operation becomes one AgentTool:

| AgentTool method | Mapped from |
|------------------|-------------|
| name() | operationId (with optional prefix) |
| label() | summary or operationId |
| description() | description or summary |
| parameters_schema() | Combined JSON Schema from path/query/header params + request body |

When the LLM calls a tool, the adapter:

  1. Substitutes path parameters in the URL (/pets/{petId} → /pets/123)
  2. Adds query parameters as ?key=value
  3. Adds header parameters
  4. Applies auth from config
  5. Sends the request body as JSON (if the operation has one)
  6. Returns the response text (with status code) to the LLM

Non-2xx responses are not treated as errors — they're returned as text so the LLM can reason about them and retry or adjust.
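
Steps 1 and 2 of the list above amount to simple string assembly. The following sketch shows the idea with a hypothetical build_url helper; it inserts values verbatim and skips percent-encoding, which a real adapter would need.

```rust
/// Fill `{name}` placeholders in `template` and append `query` as
/// `?key=value&...`. Values are inserted verbatim: percent-encoding is
/// left out of this sketch.
fn build_url(
    base: &str,
    template: &str,
    path_params: &[(&str, &str)],
    query: &[(&str, &str)],
) -> String {
    let mut path = template.to_string();
    for (name, value) in path_params.iter() {
        let placeholder = format!("{{{}}}", name);
        path = path.replace(placeholder.as_str(), value);
    }
    let mut url = format!("{}{}", base.trim_end_matches('/'), path);
    for (i, (key, value)) in query.iter().enumerate() {
        url.push(if i == 0 { '?' } else { '&' });
        url.push_str(key);
        url.push('=');
        url.push_str(value);
    }
    url
}
```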

Mixing with Other Tools

OpenAPI tools work alongside built-in tools and MCP tools:

#![allow(unused)]
fn main() {
use phi_core::tools::default_tools;

let agent = BasicAgent::new(model_config)
    .with_tools(default_tools())
    .with_openapi_file("github.yaml", github_config, &github_filter).await?
    .with_mcp_server_stdio("db-server", &[], None).await?;
}

Limitations (v1)

  • OpenAPI 3.0.x only (not 3.1.x)
  • JSON request/response bodies only (no multipart/form-data)
  • No OAuth2 or token refresh (pass tokens via OpenApiConfig)
  • Operations without operationId are skipped
  • Path-level $ref items are skipped

Providers Overview

phi-core supports multiple LLM providers through the StreamProvider trait and ApiProtocol dispatch. Callers never name a provider struct directly — ModelConfig is the single descriptor for every provider connection.

Supported Protocols

| ApiProtocol | Wire Format | Factory Method |
|-------------|-------------|----------------|
| AnthropicMessages | Anthropic Messages API | ModelConfig::anthropic(id, name, key) |
| OpenAiCompletions | OpenAI Chat Completions (15+ backends) | ModelConfig::openai(id, name, key) / ModelConfig::local(url, id, key) / ModelConfig::openrouter(id, key) |
| OpenAiResponses | OpenAI Responses API | Direct struct construction |
| AzureOpenAiResponses | Azure OpenAI Responses | Direct struct construction |
| GoogleGenerativeAi | Google Gemini API | ModelConfig::google(id, name, key) |
| GoogleVertex | Google Vertex AI | Direct struct construction |
| BedrockConverseStream | AWS Bedrock ConverseStream | Direct struct construction |

ApiProtocol Enum

#![allow(unused)]
fn main() {
pub enum ApiProtocol {
    AnthropicMessages,
    OpenAiCompletions,
    OpenAiResponses,
    AzureOpenAiResponses,
    GoogleGenerativeAi,
    GoogleVertex,
    BedrockConverseStream,
}
}

ModelConfig

ModelConfig is the single, complete description of a provider connection. Pass it to BasicAgent::new(), SubAgentTool::new(), or AgentLoopConfig.model_config:

#![allow(unused)]
fn main() {
pub struct ModelConfig {
    pub id: String,              // e.g. "gpt-4o" — model name sent to the API
    pub name: String,            // e.g. "GPT-4o" — display label for logging/UI
    pub api: ApiProtocol,        // Which wire protocol to use (dispatch key)
    pub provider: String,        // e.g. "openai" — logging label
    pub base_url: String,        // API endpoint (no trailing slash)
    pub api_key: String,         // Auth credential (sk-..., or "access_key:secret" for Bedrock)
    pub reasoning: bool,         // Supports thinking/reasoning
    pub context_window: u32,     // Context size in tokens
    pub max_tokens: u32,         // Default max output
    pub cost: CostConfig,        // Pricing per million tokens (0.0 = no tracking)
    pub headers: HashMap<String, String>,  // Extra HTTP headers
    pub compat: Option<OpenAiCompat>,      // Quirk flags (OpenAiCompletions only)
}
}

Factory methods (all accept api_key as the auth parameter):

#![allow(unused)]
fn main() {
let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let anthropic = ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key);

let openai_key = std::env::var("OPENAI_API_KEY").unwrap();
let openai = ModelConfig::openai("gpt-4o", "GPT-4o", &openai_key);

let gemini_key = std::env::var("GEMINI_API_KEY").unwrap();
let google = ModelConfig::google("gemini-2.0-flash", "Gemini 2.0 Flash", &gemini_key);

// Local server — pass empty string for api_key if unauthenticated
let local = ModelConfig::local("http://localhost:1234/v1", "my-model", "");

// OpenRouter — dedicated factory with correct compat flags
let or_key = std::env::var("OPENROUTER_API_KEY").unwrap();
let openrouter = ModelConfig::openrouter("anthropic/claude-sonnet-4", &or_key);
}

ProviderRegistry

Maps ApiProtocol → StreamProvider. The default registry includes all built-in providers:

#![allow(unused)]
fn main() {
let registry = ProviderRegistry::default();

// Use it to stream with any model
let result = registry.stream(&model_config, stream_config, tx, cancel).await?;
}

Custom registries (advanced — for adding a fully custom StreamProvider implementation):

#![allow(unused)]
fn main() {
use phi_core::provider::{ProviderRegistry, ApiProtocol};

let mut registry = ProviderRegistry::new();
registry.register(ApiProtocol::AnthropicMessages, my_custom_provider);
// Then pass to AgentLoopConfig... (most users should use provider_override instead)
}
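
The registry idea, a map from protocol key to provider trait object, can be sketched without any of phi-core's types. Everything below (Protocol, Provider, Echo, Registry) is a simplified, synchronous stand-in written for illustration; the real StreamProvider is async and streams events.

```rust
use std::collections::HashMap;

/// Local stand-in for a subset of the ApiProtocol enum.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Protocol {
    AnthropicMessages,
    OpenAiCompletions,
}

/// Simplified, synchronous stand-in for StreamProvider.
trait Provider {
    fn complete(&self, prompt: &str) -> String;
}

/// Toy provider that tags the prompt with its own label.
struct Echo(&'static str);

impl Provider for Echo {
    fn complete(&self, prompt: &str) -> String {
        format!("[{}] {}", self.0, prompt)
    }
}

/// Maps a protocol to the provider that speaks it.
#[derive(Default)]
struct Registry {
    providers: HashMap<Protocol, Box<dyn Provider>>,
}

impl Registry {
    fn register(&mut self, protocol: Protocol, provider: Box<dyn Provider>) {
        self.providers.insert(protocol, provider);
    }

    /// Dispatch on the protocol key; None when nothing is registered for it.
    fn complete(&self, protocol: Protocol, prompt: &str) -> Option<String> {
        self.providers.get(&protocol).map(|p| p.complete(prompt))
    }
}
```

Registering a second provider under the same key replaces the first, which is how a custom implementation would take over an existing protocol in this sketch.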

StreamProvider Trait

#![allow(unused)]
fn main() {
#[async_trait]
pub trait StreamProvider: Send + Sync {
    async fn stream(
        &self,
        config: StreamConfig,
        tx: mpsc::UnboundedSender<StreamEvent>,
        cancel: CancellationToken,
    ) -> Result<Message, ProviderError>;
}
}

All providers receive a StreamConfig, emit StreamEvents through the channel, and return the final Message.

OpenAPI Tool Adapter

In addition to LLM providers, phi-core can auto-generate tools from any OpenAPI 3.0 spec. This is a tool integration (not a provider), but it complements the provider system by letting agents call external APIs.

Enable with features = ["openapi"]. See the OpenAPI Tools guide for details.

Anthropic Provider

Handles the Anthropic Messages API with SSE streaming. Selected automatically when ModelConfig.api == ApiProtocol::AnthropicMessages.

Usage

#![allow(unused)]
fn main() {
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
));
}

Features

Streaming SSE

Uses reqwest-eventsource to parse Anthropic's SSE stream. Events handled:

  • message_start — Input token usage, cache stats
  • content_block_start — Text, thinking, or tool_use block
  • content_block_delta — Text, thinking, input JSON, or signature deltas
  • content_block_stop — Block complete
  • message_delta — Stop reason, output usage
  • message_stop — Stream complete

Extended Thinking

Set thinking_level to enable thinking with a token budget:

| Level | Budget Tokens |
|-------|---------------|
| Minimal | 128 |
| Low | 512 |
| Medium | 2,048 |
| High | 8,192 |

Thinking content is streamed as Content::Thinking with a cryptographic signature for verification.
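
For reference, the level-to-budget table can be written as a lookup. Level and thinking_budget are local, hypothetical names; the sketch covers only the four levels listed in the table and says nothing about any other variants ThinkingLevel may have.

```rust
/// Local stand-in for the four thinking levels in the table above.
#[derive(Clone, Copy, Debug)]
enum Level {
    Minimal,
    Low,
    Medium,
    High,
}

/// Token budget for each level, taken from the table.
fn thinking_budget(level: Level) -> u32 {
    match level {
        Level::Minimal => 128,
        Level::Low => 512,
        Level::Medium => 2_048,
        Level::High => 8_192,
    }
}
```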

Cache Control

Automatic prompt caching via cache_control markers:

  • System prompt: Always cached with {"type": "ephemeral"}
  • Second-to-last message: Gets cache_control on its last content block, creating a cache breakpoint

On repeated calls, the prefix up to the cache breakpoint can be served from the cache, so only the newest message is processed at the full input price.

Configuration

| Setting | Value |
|---------|-------|
| API URL | https://api.anthropic.com/v1/messages |
| API Version | 2023-06-01 |
| Auth Header | x-api-key |
| Default Max Tokens | 8,192 |

Environment Variables

| Variable | Purpose |
|----------|---------|
| ANTHROPIC_API_KEY | API key |

OpenAI Compatible Provider

One implementation (OpenAiCompatProvider) covers OpenAI, xAI, Groq, Cerebras, OpenRouter, Mistral, DeepSeek, and any other OpenAI Chat Completions-compatible API. The provider is selected automatically when ModelConfig.api == ApiProtocol::OpenAiCompletions.

Per-service behavior is controlled by OpenAiCompat flags stored in ModelConfig.compat.

Usage

#![allow(unused)]
fn main() {
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

// OpenAI
let api_key = std::env::var("OPENAI_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig::openai("gpt-4o", "GPT-4o", &api_key));

// OpenRouter
let or_key = std::env::var("OPENROUTER_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig::openrouter("anthropic/claude-sonnet-4", &or_key));

// Local server (LM Studio, Ollama, llama.cpp, vLLM)
let agent = BasicAgent::new(ModelConfig::local(
    "http://localhost:1234/v1",
    "my-model",
    "",  // empty string — most local servers don't require auth
));
}

OpenAiCompat Quirk Flags

Providers differ in behavior even though they share the same API:

#![allow(unused)]
fn main() {
pub struct OpenAiCompat {
    pub supports_store: bool,
    pub supports_developer_role: bool,
    pub supports_reasoning_effort: bool,
    pub supports_usage_in_streaming: bool,
    pub max_tokens_field: MaxTokensField,       // MaxTokens or MaxCompletionTokens
    pub requires_tool_result_name: bool,
    pub requires_assistant_after_tool_result: bool,
    pub thinking_format: ThinkingFormat,        // OpenAi, Xai, Qwen, or OpenRouter
}
}

Provider Presets

| Provider | ModelConfig factory | Key Differences |
|----------|---------------------|-----------------|
| OpenAI | ModelConfig::openai(id, name, key) | developer role, max_completion_tokens, store, reasoning_effort |
| OpenRouter | ModelConfig::openrouter(id, key) | developer role, max_tokens, OpenRouter thinking format |
| Local | ModelConfig::local(url, id, key) | Generic defaults, empty api_key OK |
| xAI (Grok) | Direct construction with OpenAiCompat::xai() | reasoning field for thinking |
| Groq | Direct construction with OpenAiCompat::groq() | Standard defaults |
| Cerebras | Direct construction with OpenAiCompat::cerebras() | Standard defaults |
| Mistral | Direct construction with OpenAiCompat::mistral() | max_tokens field |
| DeepSeek | Direct construction with OpenAiCompat::deepseek() | max_completion_tokens |

Adding a New Compatible Provider

  1. Add a constructor to OpenAiCompat:
#![allow(unused)]
fn main() {
impl OpenAiCompat {
    pub fn my_provider() -> Self {
        Self {
            supports_usage_in_streaming: true,
            // set flags as needed...
            ..Default::default()
        }
    }
}
}
  2. Create a ModelConfig that uses it:
#![allow(unused)]
fn main() {
use phi_core::provider::{ModelConfig, ApiProtocol, OpenAiCompat};

let config = ModelConfig {
    id: "my-model".into(),
    name: "My Model".into(),
    api: ApiProtocol::OpenAiCompletions,
    provider: "my-provider".into(),
    base_url: "https://api.myprovider.com/v1".into(),
    api_key: std::env::var("MY_API_KEY").unwrap_or_default(),
    compat: Some(OpenAiCompat::my_provider()),
    ..Default::default()
};
let agent = BasicAgent::new(config);
}

Thinking/Reasoning

The ThinkingFormat enum controls how reasoning content is parsed from streams:

  • ThinkingFormat::OpenAi — Uses reasoning_content field (most providers, default)
  • ThinkingFormat::Xai — Uses reasoning field (Grok)
  • ThinkingFormat::Qwen — Uses reasoning_content field (Qwen variant)
  • ThinkingFormat::OpenRouter — Uses reasoning_details array (OpenRouter extended thinking)

Auth

Uses Authorization: Bearer {api_key} header. Extra headers can be added via ModelConfig.headers.

Google Gemini Provider

Two providers for Google's Gemini models:

  • GoogleProvider — Google AI Studio (Generative AI API) via ApiProtocol::GoogleGenerativeAi
  • GoogleVertexProvider — Google Cloud Vertex AI via ApiProtocol::GoogleVertex

Google AI Studio

#![allow(unused)]
fn main() {
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

let api_key = std::env::var("GOOGLE_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig::google(
    "gemini-2.0-flash",
    "Gemini 2.0 Flash",
    &api_key,
));
}

API Details

  • Endpoint: {base_url}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={api_key}
  • Auth: API key as query parameter
  • Default base URL: https://generativelanguage.googleapis.com
  • Default context window: 1,000,000 tokens

Message Format

Google uses a different message format than OpenAI/Anthropic:

| phi-core | Google API |
|----------|------------|
| user role | user role |
| assistant role | model role |
| Content::Text | {"text": "..."} |
| Content::Image | {"inlineData": {...}} |
| Content::ToolCall | {"functionCall": {...}} |
| Message::ToolResult | {"functionResponse": {...}} |
| System prompt | systemInstruction field |
| Tools | tools[].functionDeclarations[] |

Streaming

Uses SSE format (alt=sse). Each chunk contains candidates with content.parts and optional usageMetadata.

Google Vertex AI

GoogleVertexProvider uses the same message format but with Vertex AI authentication and endpoints.

#![allow(unused)]
fn main() {
use phi_core::BasicAgent;
use phi_core::provider::{ModelConfig, ApiProtocol};

// Vertex AI uses OAuth2 Bearer tokens as the api_key
let access_token = get_access_token(); // your OAuth2 helper
let agent = BasicAgent::new(ModelConfig {
    id: "gemini-2.0-flash".into(),
    name: "Gemini 2.0 Flash (Vertex)".into(),
    api: ApiProtocol::GoogleVertex,
    provider: "google_vertex".into(),
    base_url: "https://us-central1-aiplatform.googleapis.com".into(),
    api_key: access_token,
    ..Default::default()
});
}
  • Protocol: ApiProtocol::GoogleVertex
  • Auth: OAuth2 / service account credentials (Bearer token in api_key)
  • Endpoint pattern: https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/google/models/{model}:streamGenerateContent

Amazon Bedrock Provider

Handles the AWS Bedrock ConverseStream API. Selected automatically when ModelConfig.api == ApiProtocol::BedrockConverseStream.

Usage

#![allow(unused)]
fn main() {
use phi_core::BasicAgent;
use phi_core::provider::{ModelConfig, ApiProtocol};

// With static credentials in api_key: "ACCESS_KEY:SECRET_KEY" or "ACCESS_KEY:SECRET_KEY:SESSION_TOKEN"
let creds = std::env::var("AWS_BEDROCK_CREDENTIALS").unwrap_or_default();
let agent = BasicAgent::new(ModelConfig {
    id: "anthropic.claude-3-sonnet-20240229-v1:0".into(),
    name: "Claude Sonnet (Bedrock)".into(),
    api: ApiProtocol::BedrockConverseStream,
    provider: "bedrock".into(),
    base_url: "https://bedrock-runtime.us-east-1.amazonaws.com".into(),
    api_key: creds, // "access_key:secret_key[:session_token]", or "" for IAM roles
    ..Default::default()
});
}

Authentication

The api_key field uses a colon-separated format:

{access_key_id}:{secret_access_key}
{access_key_id}:{secret_access_key}:{session_token}

For IAM roles (e.g., EC2 instance profiles, ECS task roles), pass an empty api_key and provide pre-computed Authorization headers via ModelConfig.headers.
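
The colon-separated format can be parsed with a bounded split. The sketch below is an assumption about how such parsing might look, not phi-core's code; AwsCredentials and parse_bedrock_key are hypothetical names.

```rust
/// Hypothetical parsed form of the colon-separated `api_key`.
#[derive(Debug, PartialEq)]
struct AwsCredentials {
    access_key_id: String,
    secret_access_key: String,
    session_token: Option<String>,
}

/// Parse "access:secret" or "access:secret:token". Returns None for an
/// empty string (the IAM-role case) or when a required part is missing.
fn parse_bedrock_key(api_key: &str) -> Option<AwsCredentials> {
    // splitn(3, ..) keeps any further colons inside the session token.
    let mut parts = api_key.splitn(3, ':');
    let access = parts.next().filter(|s| !s.is_empty())?;
    let secret = parts.next().filter(|s| !s.is_empty())?;
    Some(AwsCredentials {
        access_key_id: access.to_string(),
        secret_access_key: secret.to_string(),
        session_token: parts.next().map(|s| s.to_string()),
    })
}
```

Limiting the split to three parts means a session token that itself contains a colon is preserved intact.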

API Details

  • Endpoint: {base_url}/model/{model}/converse-stream
  • Default base URL: https://bedrock-runtime.us-east-1.amazonaws.com
  • Protocol: ApiProtocol::BedrockConverseStream

Message Format

Bedrock uses its own content block format:

| phi-core | Bedrock API |
|----------|-------------|
| Content::Text | {"text": "..."} |
| Content::Image | {"image": {"format": "...", "source": {"bytes": "..."}}} |
| Content::ToolCall | {"toolUse": {"toolUseId": "...", "name": "...", "input": ...}} |
| Message::ToolResult | {"toolResult": {"toolUseId": "...", "content": [...], "status": "success"}} |
| System prompt | system array of text blocks |
| Tools | toolConfig.tools[].toolSpec |
| Max tokens | inferenceConfig.maxTokens |

Stream Events

Bedrock's ConverseStream returns these event types:

  • contentBlockStart — New content block (text or tool use)
  • contentBlockDelta — Text or tool use input delta
  • contentBlockStop — Block complete
  • messageStop — Stop reason (end_turn, max_tokens, tool_use)
  • metadata — Token usage

Azure OpenAI Provider

Handles the OpenAI Responses API format with Azure-specific authentication and URL patterns. Selected automatically when ModelConfig.api == ApiProtocol::AzureOpenAiResponses.

Usage

#![allow(unused)]
fn main() {
use phi_core::BasicAgent;
use phi_core::provider::{ModelConfig, ApiProtocol};

let api_key = std::env::var("AZURE_OPENAI_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig {
    id: "gpt-4o".into(),
    name: "GPT-4o (Azure)".into(),
    api: ApiProtocol::AzureOpenAiResponses,
    provider: "azure_openai".into(),
    base_url: "https://my-resource.openai.azure.com/openai/deployments/my-deployment".into(),
    api_key,
    ..Default::default()
});
}

Authentication

Uses the api-key header (not Authorization: Bearer):

api-key: {your_api_key}

Additional headers can be set via ModelConfig.headers (e.g., for Azure AD Bearer tokens).

URL Format

https://{resource}.openai.azure.com/openai/deployments/{deployment}

Set this as ModelConfig.base_url. The provider appends /responses?api-version=2025-01-01-preview.
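
Put together, the request URL is the deployment base URL plus the suffix quoted above. The azure_responses_url helper below is hypothetical; trimming a trailing slash is a convenience added in this sketch, not documented provider behavior.

```rust
/// Append the Responses path and API version quoted above to a
/// deployment base URL, tolerating a trailing slash.
fn azure_responses_url(base_url: &str) -> String {
    format!(
        "{}/responses?api-version=2025-01-01-preview",
        base_url.trim_end_matches('/')
    )
}
```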

API Details

  • Protocol: ApiProtocol::AzureOpenAiResponses
  • Format: OpenAI Responses API (not Chat Completions)
  • Streaming: SSE with event types:
    • response.output_text.delta — Text content
    • response.function_call_arguments.start — Tool call start
    • response.function_call_arguments.delta — Tool call arguments
    • response.completed — Final usage data

Message Format

Uses the Responses API input format:

| phi-core | Azure Responses API |
|----------|---------------------|
| User message | {"role": "user", "content": "..."} |
| Assistant text | {"type": "message", "role": "assistant", "content": [{"type": "output_text", ...}]} |
| Tool call | {"type": "function_call", "call_id": "...", "name": "...", "arguments": "..."} |
| Tool result | {"type": "function_call_output", "call_id": "...", "output": "..."} |
| System prompt | instructions field |

Built-in Tools

phi-core ships with six coding-oriented tools. Get them all with default_tools():

#![allow(unused)]
fn main() {
use phi_core::tools::default_tools;
let tools = default_tools();
}

BashTool

Execute shell commands with timeout and output capture.

  • Name: bash
  • Parameters: command (string, required)

Configuration

#![allow(unused)]
fn main() {
pub struct BashTool {
    pub cwd: Option<String>,           // Working directory
    pub timeout: Duration,             // Default: 120s
    pub max_output_bytes: usize,       // Default: 256KB
    pub deny_patterns: Vec<String>,    // Blocked commands
    pub confirm_fn: Option<ConfirmFn>, // Confirmation callback
}
}

Default deny patterns: rm -rf /, rm -rf /*, mkfs, dd if=, fork bomb.
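
A deny list like this is typically checked before the command runs. The sketch below assumes plain substring matching, which this reference does not confirm; blocked_by is a hypothetical helper.

```rust
/// Return the first deny pattern found in `command`, if any.
/// Plain substring matching is an assumption of this sketch.
fn blocked_by<'a>(command: &str, deny_patterns: &'a [String]) -> Option<&'a str> {
    deny_patterns
        .iter()
        .map(|p| p.as_str())
        .find(|pattern| command.contains(*pattern))
}
```

Substring matching is deliberately blunt: under this assumption a pattern such as "rm -rf /" would also block "rm -rf /tmp/build", so a deny list is best treated as a safety net rather than a sandbox.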

Example

#![allow(unused)]
fn main() {
let bash = BashTool::default();
// Or customize:
let bash = BashTool {
    cwd: Some("/workspace".into()),
    timeout: Duration::from_secs(60),
    ..Default::default()
};
}

ReadFileTool

Read file contents with optional line range.

  • Name: read_file
  • Parameters: path (required), offset (optional, 1-indexed line), limit (optional, number of lines)
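
The offset and limit parameters describe a line window. A minimal sketch of that selection, using a hypothetical select_lines function and treating a missing or zero offset as the first line (an assumption), could look like this:

```rust
/// Select `limit` lines starting at the 1-indexed line `offset`.
/// Missing values mean "from the first line" and "to the end".
fn select_lines(text: &str, offset: Option<usize>, limit: Option<usize>) -> String {
    let skip = offset.unwrap_or(1).saturating_sub(1);
    let take = limit.unwrap_or(usize::MAX);
    text.lines()
        .skip(skip)
        .take(take)
        .collect::<Vec<_>>()
        .join("\n")
}
```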

Configuration

#![allow(unused)]
fn main() {
pub struct ReadFileTool {
    pub max_bytes: usize,              // Default: 1MB
    pub allowed_paths: Vec<String>,    // Path restrictions (empty = no restriction)
}
}

WriteFileTool

Write content to a file. Creates parent directories automatically.

  • Name: write_file
  • Parameters: path (required), content (required)

EditFileTool

Surgical search/replace edits. The most important tool for coding agents — instead of rewriting entire files, the agent specifies exact text to find and replace.

  • Name: edit_file
  • Parameters: path (required), old_text (required), new_text (required)

The old_text must match exactly, including whitespace and indentation.
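
The exact-match rule is easiest to see in code. The apply_edit function below is a hypothetical sketch, not the tool's implementation: it replaces the first occurrence only and reports a plain string error, both of which are assumptions.

```rust
/// Replace the first exact occurrence of `old_text` with `new_text`.
/// Fails when `old_text` is empty or does not occur in `source`.
fn apply_edit(source: &str, old_text: &str, new_text: &str) -> Result<String, String> {
    if old_text.is_empty() {
        return Err("old_text must not be empty".to_string());
    }
    if !source.contains(old_text) {
        return Err("old_text not found; check whitespace and indentation".to_string());
    }
    Ok(source.replacen(old_text, new_text, 1))
}
```

An old_text that uses a tab where the file uses four spaces fails to match, which is the kind of mismatch the rule above warns about.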

ListFilesTool

List files and directories with optional glob filtering.

  • Name: list_files
  • Parameters: path (optional, default: .), pattern (optional glob)

Configuration

#![allow(unused)]
fn main() {
pub struct ListFilesTool {
    pub max_results: usize,    // Default: 200
    pub timeout: Duration,     // Default: 10s
}
}

Uses find or fd for efficient traversal.

SearchTool

Search files using grep (or ripgrep if available).

  • Name: search
  • Parameters: pattern (required, regex), path (optional root directory)

Configuration

#![allow(unused)]
fn main() {
pub struct SearchTool {
    pub root: Option<String>,      // Root directory
    pub max_results: usize,        // Default: 50
    pub timeout: Duration,         // Default: 30s
}
}

Returns matching lines with file paths and line numbers.

PrunTool

Model-directed context pruning. Removes the oldest inrun_context entries (model-generated messages) from the working context to reclaim space in the context window. Pruned content is preserved in the session log.

  • Name: prun
  • Parameters: tokens (integer, required) -- approximate number of tokens to reclaim

The tool removes inrun_context entries oldest-first until the requested token budget is met. User messages are never affected. Returns a confirmation with the actual token count reclaimed.
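
The oldest-first removal described above can be sketched over a simplified context. Entry and prune_oldest are hypothetical; the real tool works on inrun_context entries and also preserves pruned content in the session log, which this sketch omits.

```rust
/// Hypothetical working-context entry.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    from_user: bool, // user messages are never pruned
    tokens: usize,
    text: String,
}

/// Remove model-generated entries oldest-first until at least `budget`
/// tokens are reclaimed (or none are left). Returns the tokens reclaimed.
fn prune_oldest(context: &mut Vec<Entry>, budget: usize) -> usize {
    let mut reclaimed = 0;
    // `retain` visits entries once, in order, so "oldest first" falls out
    // of the vector's ordering.
    context.retain(|entry| {
        if entry.from_user || reclaimed >= budget {
            true // keep
        } else {
            reclaimed += entry.tokens;
            false // drop
        }
    });
    reclaimed
}
```

Because whole entries are removed, the reclaimed count can exceed the request, which matches the tool returning the actual token count rather than echoing the requested one.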

Configuration

#![allow(unused)]
fn main() {
let agent = BasicAgent::new(model_config)
    .with_prun_tool();  // enables both prun and prun_with_memo
}

PrunWithMemoTool

Context pruning with a summary replacement. Same removal behavior as prun, but inserts a concise memo at the position of the earliest pruned message so the model retains key takeaways.

  • Name: prun_with_memo
  • Parameters: tokens (integer, required) -- approximate number of tokens to reclaim; memo (string, required) -- concise summary to retain in working context

The memo appears at the original timestamp of the earliest pruned message, preserving conversation chronology. Useful when pruned content contained decisions or conclusions worth remembering.
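
A minimal sketch of the memo placement, using hypothetical types (Item, prune_with_memo) and list position as a stand-in for the timestamp ordering described above:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Item {
    User(String),
    InRun { tokens: usize },
    Memo(String),
}

/// Prune oldest-first, then insert `memo` where the earliest pruned entry was.
/// Returns the tokens reclaimed; inserts nothing if nothing was pruned.
fn prune_with_memo(context: &mut Vec<Item>, budget: usize, memo: &str) -> usize {
    let mut reclaimed = 0;
    let mut first_pruned: Option<usize> = None;
    let mut i = 0;
    while i < context.len() && reclaimed < budget {
        let prunable = match &context[i] {
            Item::InRun { tokens } => Some(*tokens),
            _ => None,
        };
        match prunable {
            Some(tokens) => {
                reclaimed += tokens;
                context.remove(i);
                if first_pruned.is_none() {
                    first_pruned = Some(i); // entries before `i` never move
                }
            }
            None => i += 1,
        }
    }
    if let Some(pos) = first_pruned {
        context.insert(pos, Item::Memo(memo.to_string()));
    }
    reclaimed
}
```

Because entries before the first pruned index are never removed, inserting at that index keeps the memo in the chronological slot of the earliest pruned message.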

See Context Pruning for the full design.

Configuration

AgentLoopConfig

The main configuration for the agent loop:

#![allow(unused)]
fn main() {
pub struct AgentLoopConfig {
    /// REQUIRED — Complete provider identity: model id, api_key, base_url, protocol, compat flags, cost rates.
    pub model_config: ModelConfig,
    /// Custom provider override. When Some, bypasses ProviderRegistry. Use for MockProvider in tests.
    pub provider_override: Option<Arc<dyn StreamProvider>>,
    /// Stable config identity for loop_id generation.
    pub config_id: Option<String>,
    pub thinking_level: ThinkingLevel,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub convert_to_llm: Option<ConvertToLlmFn>,
    pub transform_context: Option<TransformContextFn>,
    pub get_steering_messages: Option<GetMessagesFn>,
    pub get_follow_up_messages: Option<GetMessagesFn>,
    /// Context config (includes CompactionConfig with strategies and token counter).
    pub context_config: Option<ContextConfig>,
    pub execution_limits: Option<ExecutionLimits>,
    pub cache_config: CacheConfig,
    pub tool_execution: ToolExecutionStrategy,
    pub retry_config: RetryConfig,
    // ── Lifecycle callbacks ──
    pub before_turn: Option<BeforeTurnFn>,
    pub after_turn: Option<AfterTurnFn>,
    pub before_loop: Option<BeforeLoopFn>,
    pub after_loop: Option<AfterLoopFn>,
    pub before_tool_execution: Option<BeforeToolExecutionFn>,
    pub after_tool_execution: Option<AfterToolExecutionFn>,
    pub before_tool_execution_update: Option<BeforeToolExecutionUpdateFn>,
    pub after_tool_execution_update: Option<AfterToolExecutionUpdateFn>,
    /// Compaction lifecycle callbacks (G1).
    pub before_compaction_start: Option<BeforeCompactionStartFn>,
    pub after_compaction_end: Option<AfterCompactionEndFn>,
    pub on_error: Option<OnErrorFn>,
    pub input_filters: Vec<Arc<dyn InputFilter>>,
    pub first_turn_trigger: TurnTrigger,
    /// Context translation strategy for cross-provider compatibility (G8).
    pub context_translation: Option<Arc<dyn ContextTranslationStrategy>>,
    /// Shared state for PrunTool to communicate pruning requests to the loop.
    pub prun_pending: Option<Arc<Mutex<Vec<PrunRequest>>>>,
}
}

Note: Compaction strategies (in_memory_strategy, block_strategy) are fields on CompactionConfig (inside ContextConfig), not on AgentLoopConfig. The token_counter for pluggable token counting is also on ContextConfig.

StreamConfig

Internal config passed to StreamProvider::stream(). All provider identity comes from model_config:

#![allow(unused)]
fn main() {
pub struct StreamConfig {
    /// REQUIRED — full provider identity: id, api_key, base_url, compat, cost.
    pub model_config: ModelConfig,
    pub system_prompt: String,
    pub messages: Vec<Message>,
    pub tools: Vec<ToolDefinition>,
    pub thinking_level: ThinkingLevel,
    pub max_tokens: Option<u32>,  // overrides model_config.max_tokens when Some
    pub temperature: Option<f32>,
    pub cache_config: CacheConfig,
}
}

ContextConfig

Controls context window management and compaction:

#![allow(unused)]
fn main() {
pub struct ContextConfig {
    pub max_context_tokens: usize,                          // Default: 100,000
    pub system_prompt_tokens: usize,                        // Default: 4,000
    pub compaction: CompactionConfig,                       // Full compaction policy (nested)
    pub token_counter: Option<Arc<dyn TokenCounter>>,       // Pluggable token counting (REQ-162)
    // Legacy fields (backward compat — use compaction.* instead):
    pub keep_recent: usize,                                 // Default: 10
    pub keep_first: usize,                                  // Default: 2
    pub tool_output_max_lines: usize,                       // Default: 50
}
}
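
To show what a pluggable counter might look like, here is a sketch built on the 4-characters-per-token heuristic that the project overview attributes to the default HeuristicTokenCounter. The trait below is a hypothetical stand-in (the real TokenCounter signature may differ), and rounding up is an assumption.

```rust
/// Hypothetical stand-in for the library's TokenCounter trait.
trait TokenCounter {
    fn count(&self, text: &str) -> usize;
}

/// Heuristic counter: roughly 4 characters per token.
struct FourCharsPerToken;

impl TokenCounter for FourCharsPerToken {
    fn count(&self, text: &str) -> usize {
        // Assumption: round up, so any non-empty text costs at least one token.
        (text.chars().count() + 3) / 4
    }
}

/// Estimate the total for a list of message texts with any counter.
fn estimate_total(counter: &dyn TokenCounter, messages: &[&str]) -> usize {
    messages.iter().map(|m| counter.count(m)).sum()
}
```

A tokenizer-backed counter would implement the same trait and could be swapped in without changing estimate_total.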

CompactionConfig

#![allow(unused)]
fn main() {
pub struct CompactionConfig {
    // WHEN to compact:
    pub compact_at_pct: f64,                                // Default: 0.90
    pub compact_budget_threshold_pct: f64,                  // Default: 0.05
    pub compaction_scope: CompactionScope,                  // Default: FixedCount(3)
    // HOW to compact:
    pub keep_first_turns: usize,                            // Default: 2
    pub keep_recent_turns: usize,                           // Default: 10
    pub max_summary_tokens: usize,                          // Default: 2,000
    pub tool_output_max_lines: usize,                       // Default: 50
    pub focus_message: Option<String>,                      // Guides summarization focus
    // Strategy objects (G5 — moved from AgentLoopConfig):
    pub in_memory_strategy: Option<Arc<dyn CompactionStrategy>>,
    pub block_strategy: Option<Arc<dyn BlockCompactionStrategy>>,
}
}
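
How these fields might be applied is sketched below. Both functions are hypothetical and the semantics are assumptions: compact_at_pct is read as a fraction of max_context_tokens, and tool output is truncated by keeping the first tool_output_max_lines lines. The library's actual rules may differ.

```rust
/// Assumption: compaction triggers once the estimate reaches
/// `compact_at_pct` of the context window.
fn should_compact(estimated_tokens: usize, max_context_tokens: usize, compact_at_pct: f64) -> bool {
    estimated_tokens as f64 >= max_context_tokens as f64 * compact_at_pct
}

/// Assumption: keep the first `max_lines` lines and note how many were dropped.
fn truncate_tool_output(output: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() <= max_lines {
        return output.to_string();
    }
    let omitted = lines.len() - max_lines;
    let mut kept = lines[..max_lines].join("\n");
    kept.push_str(&format!("\n[... {omitted} lines omitted ...]"));
    kept
}
```

With the documented defaults (100,000 tokens, 0.90), this reading would start compaction at about 90,000 estimated tokens.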

ExecutionLimits

Prevents runaway agents:

#![allow(unused)]
fn main() {
pub struct ExecutionLimits {
    pub max_turns: usize,              // Default: 50
    pub max_total_tokens: usize,       // Default: 1,000,000
    pub max_duration: Duration,        // Default: 600s
    pub max_cost: Option<f64>,         // Default: None (no cost cap)
}
}
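
A limit check of this kind could look like the sketch below. Limits mirrors the fields above, but Progress, LimitHit, and check are hypothetical names, and the order in which limits are tested is an assumption.

```rust
use std::time::Duration;

struct Limits {
    max_turns: usize,
    max_total_tokens: usize,
    max_duration: Duration,
    max_cost: Option<f64>,
}

/// Hypothetical running totals for one agent run.
struct Progress {
    turns: usize,
    total_tokens: usize,
    elapsed: Duration,
    cost: f64,
}

#[derive(Debug, PartialEq)]
enum LimitHit {
    Turns,
    Tokens,
    Duration,
    Cost,
}

/// Defaults as documented above: 50 turns, 1,000,000 tokens, 600s, no cost cap.
fn default_limits() -> Limits {
    Limits {
        max_turns: 50,
        max_total_tokens: 1_000_000,
        max_duration: Duration::from_secs(600),
        max_cost: None,
    }
}

/// Return the first limit that has been reached, if any.
fn check(limits: &Limits, p: &Progress) -> Option<LimitHit> {
    if p.turns >= limits.max_turns {
        return Some(LimitHit::Turns);
    }
    if p.total_tokens >= limits.max_total_tokens {
        return Some(LimitHit::Tokens);
    }
    if p.elapsed >= limits.max_duration {
        return Some(LimitHit::Duration);
    }
    match limits.max_cost {
        Some(cap) if p.cost >= cap => Some(LimitHit::Cost),
        _ => None,
    }
}
```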

ThinkingLevel

#![allow(unused)]
fn main() {
pub enum ThinkingLevel {
    Off,        // No thinking (default)
    Minimal,    // 128 tokens (Anthropic budget)
    Low,        // 512 tokens
    Medium,     // 2,048 tokens
    High,       // 8,192 tokens
}
}
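
The budgets in the comments above can be expressed as a lookup. This is a standalone re-declaration for illustration; budget_tokens is a hypothetical method name, and how each provider consumes the budget is outside the sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Default)]
enum ThinkingLevel {
    #[default]
    Off,
    Minimal,
    Low,
    Medium,
    High,
}

impl ThinkingLevel {
    /// Token budget per level, taken from the comments in the enum above.
    /// `None` means thinking is disabled.
    fn budget_tokens(self) -> Option<u32> {
        match self {
            ThinkingLevel::Off => None,
            ThinkingLevel::Minimal => Some(128),
            ThinkingLevel::Low => Some(512),
            ThinkingLevel::Medium => Some(2_048),
            ThinkingLevel::High => Some(8_192),
        }
    }
}
```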

CostConfig

Token pricing, expressed as cost per million tokens:

#![allow(unused)]
fn main() {
pub struct CostConfig {
    pub input_per_million: f64,
    pub output_per_million: f64,
    pub cache_read_per_million: f64,
    pub cache_write_per_million: f64,
}
}
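
As a worked example of per-million pricing, the sketch below computes the cost of one turn. The TokenUsage struct and turn_cost function are hypothetical, the rates in the usage note are made-up numbers, and the sketch assumes the four token counts do not overlap.

```rust
struct CostConfig {
    input_per_million: f64,
    output_per_million: f64,
    cache_read_per_million: f64,
    cache_write_per_million: f64,
}

/// Hypothetical token counts for one LLM call.
struct TokenUsage {
    input: u64,
    output: u64,
    cache_read: u64,
    cache_write: u64,
}

/// cost = sum(tokens_i * rate_i) / 1,000,000
fn turn_cost(rates: &CostConfig, usage: &TokenUsage) -> f64 {
    const MILLION: f64 = 1_000_000.0;
    (usage.input as f64 * rates.input_per_million
        + usage.output as f64 * rates.output_per_million
        + usage.cache_read as f64 * rates.cache_read_per_million
        + usage.cache_write as f64 * rates.cache_write_per_million)
        / MILLION
}
```

With illustrative rates of 3.0 per million input tokens and 15.0 per million output tokens, a call with 1,000,000 input and 100,000 output tokens costs 3.0 + 1.5 = 4.5 in whatever currency the rates use.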

API Reference

Top-Level Functions

agent_loop()

#![allow(unused)]
fn main() {
pub async fn agent_loop(
    prompts: Vec<AgentMessage>,
    context: &mut AgentContext,
    config: &AgentLoopConfig,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
}

Start an agent loop with new prompt messages. Returns all messages generated during the run.

agent_loop_continue()

#![allow(unused)]
fn main() {
pub async fn agent_loop_continue(
    context: &mut AgentContext,
    config: &AgentLoopConfig,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
}

Resume from existing context. The last message must not be an assistant message.

default_tools()

#![allow(unused)]
fn main() {
pub fn default_tools() -> Vec<Arc<dyn AgentTool>>
}

Returns: BashTool, ReadFileTool, WriteFileTool, EditFileTool, ListFilesTool, SearchTool.

Agent Trait

The runtime interface for all agent implementations. Program against this trait to remain independent of the specific implementation.

#![allow(unused)]
fn main() {
use phi_core::Agent;  // trait must be in scope to call trait methods
}

Trait methods cover:

  • Prompting: prompt, prompt_messages, prompt_with_sender, prompt_messages_with_sender, continue_loop, continue_loop_with_sender
  • State access: messages, is_streaming, agent_id, session_id, last_loop_id
  • Message mutation: clear_messages, append_message, replace_messages, save_messages, restore_messages, set_tools
  • Control: abort, reset
  • Steering/follow-up queues: steer, follow_up, clear_steering_queue, clear_follow_up_queue, clear_all_queues, set_steering_mode, set_follow_up_mode

The trait is object-safe: Box<dyn Agent> and &mut dyn Agent work for runtime polymorphism.

phi_core re-exports Agent at the crate root, so use phi_core::*; brings it into scope automatically.

BasicAgent Struct

The default in-memory Agent implementation. Owns a single linear message history, tool registry, and model configuration.

Construction

#![allow(unused)]
fn main() {
let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
));
}

| Signature | Description |
|---|---|
| BasicAgent::new(model_config: ModelConfig) -> Self | Create a new agent with the given model configuration |

Builder Methods

All return Self for chaining (unless noted as Result).

Core

| Method | Description |
|---|---|
| with_system_prompt(prompt) -> Self | Set the system prompt |
| with_thinking(level: ThinkingLevel) -> Self | Set thinking level (Off, Minimal, Low, Medium, High) |
| with_max_tokens(max: u32) -> Self | Set max output tokens |
| with_model_config(config: ModelConfig) -> Self | Replace the entire ModelConfig (id, api_key, base_url, compat, cost, etc.) |
| with_provider_override(provider: Arc<dyn StreamProvider>) -> Self | Bypass ProviderRegistry dispatch and use this provider directly (primarily for testing with MockProvider) |

Tools & Integrations

| Method | Description |
|---|---|
| with_tools(tools: Vec<Arc<dyn AgentTool>>) -> Self | Set tools (replaces existing) |
| with_sub_agent(sub: SubAgentTool) -> Self | Add a sub-agent tool |
| with_skills(skills: SkillSet) -> Self | Load skills and append their index to the system prompt |
| async with_mcp_server_stdio(command, args, env) -> Result<Self, McpError> | Connect to MCP server via stdio and add its tools |
| async with_mcp_server_http(url) -> Result<Self, McpError> | Connect to MCP server via HTTP and add its tools |
| async with_openapi_file(path, config, filter) -> Result<Self, OpenApiError> | Load tools from an OpenAPI spec file (requires openapi feature) |
| async with_openapi_url(url, config, filter) -> Result<Self, OpenApiError> | Fetch spec from URL and add tools (requires openapi feature) |
| with_openapi_spec(spec_str, config, filter) -> Result<Self, OpenApiError> | Parse spec string and add tools (requires openapi feature) |

Workspace & System Prompt

| Method | Description |
|---|---|
| with_workspace(path: impl Into<PathBuf>) -> Self | Set the agent's workspace directory |

Context & Limits

| Method | Description |
|---|---|
| with_context_config(config: ContextConfig) -> Self | Set context compaction config |
| with_execution_limits(limits: ExecutionLimits) -> Self | Set execution limits (max turns, tokens, duration) |
| with_compaction_strategy(strategy: impl CompactionStrategy) -> Self | Set a custom compaction strategy |
| without_context_management() -> Self | Disable automatic context compaction and execution limits |

Behavior

| Method | Description |
|---|---|
| with_messages(msgs: Vec<AgentMessage>) -> Self | Pre-load message history |
| with_cache_config(config: CacheConfig) -> Self | Set prompt caching configuration |
| with_tool_execution(strategy: ToolExecutionStrategy) -> Self | Set tool execution strategy (Parallel, Sequential, Batched) |
| with_retry_config(config: RetryConfig) -> Self | Set retry configuration |
| with_input_filter(filter: impl InputFilter) -> Self | Add an input filter (runs on user messages before LLM call) |
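
The three tool execution strategies differ in how the tool calls from one LLM response are grouped. The sketch below only plans the groups and runs nothing; the enum is a local re-declaration, plan is a hypothetical function, and treating each group as "run concurrently, then move to the next group" is an assumption about the library's behavior.

```rust
enum ToolExecutionStrategy {
    Sequential,
    Parallel,
    Batched { size: usize },
}

/// Split tool calls into groups; calls within a group would run together.
fn plan<'a>(calls: &[&'a str], strategy: &ToolExecutionStrategy) -> Vec<Vec<&'a str>> {
    let group_size = match strategy {
        ToolExecutionStrategy::Sequential => 1,
        ToolExecutionStrategy::Parallel => calls.len().max(1),
        // Guard against size 0, which `chunks` would reject with a panic.
        ToolExecutionStrategy::Batched { size } => (*size).max(1),
    };
    calls.chunks(group_size).map(|group| group.to_vec()).collect()
}
```

For three calls, Sequential yields three groups of one, Parallel yields one group of three, and Batched { size: 2 } yields a group of two followed by a group of one.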

Callbacks

| Method | Description |
|---|---|
| on_before_loop(f: Fn(&[AgentMessage], u64) -> bool) -> Self | Called once before AgentStart; return false to abort the entire run |
| on_after_loop(f: Fn(&[AgentMessage], &Usage)) -> Self | Called once after AgentEnd with all new messages and accumulated usage |
| on_before_turn(f: Fn(&[AgentMessage], usize) -> bool) -> Self | Called before each LLM call; return false to abort |
| on_after_turn(f: Fn(&[AgentMessage], &Usage)) -> Self | Called after each LLM response and tool execution |
| on_error(f: Fn(&str)) -> Self | Called when the LLM returns StopReason::Error |
| on_before_tool_execution(f: Fn(&str, &str, &Value) -> bool) -> Self | Called before each tool call (name, call_id, args); return false to skip |
| on_after_tool_execution(f: Fn(&str, &str, bool)) -> Self | Called after each tool call (name, call_id, is_error) |
| on_before_tool_execution_update(f: Fn(&str, &str, &str) -> bool) -> Self | Called before each streaming tool update (name, call_id, text); return false to suppress the event |
| on_after_tool_execution_update(f: Fn(&str, &str, &str)) -> Self | Called after each streaming tool update (name, call_id, text) |
| on_before_compaction_start(f: Fn(usize, usize) -> bool) -> Self | Called before compaction begins (estimated_tokens, message_count); return false to skip compaction |
| on_after_compaction_end(f: Fn(usize, usize, usize, usize)) -> Self | Called after compaction completes (messages_before, messages_after, tokens_before, tokens_after) |
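
The "return false to abort" convention can be shown with a toy loop. Everything here is hypothetical: BeforeTurnHook is a local alias modeled on the on_before_turn signature with String standing in for AgentMessage, and run_turns is not the library's loop.

```rust
use std::sync::Arc;

/// Hypothetical alias: (messages so far, turn index) -> continue?
type BeforeTurnHook = Arc<dyn Fn(&[String], usize) -> bool + Send + Sync>;

/// Toy loop: consult the hook before each turn and stop as soon as it
/// returns false. Returns the number of turns that actually ran.
fn run_turns(messages: &[String], max_turns: usize, before_turn: Option<&BeforeTurnHook>) -> usize {
    let mut completed = 0;
    for turn in 0..max_turns {
        if let Some(hook) = before_turn {
            if !hook(messages, turn) {
                break; // hook vetoed this turn
            }
        }
        completed += 1; // the LLM call and tool execution would happen here
    }
    completed
}
```

A hook such as Arc::new(|_msgs: &[String], turn: usize| turn < 2) lets two turns run and vetoes the third.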

Prompting

| Method | Description |
|---|---|
| async prompt(text) -> UnboundedReceiver<AgentEvent> | Send a text prompt, returns event stream |
| async prompt_messages(messages) -> UnboundedReceiver<AgentEvent> | Send messages as prompt |
| async prompt_with_sender(text, tx: UnboundedSender<AgentEvent>) | Send a text prompt, streaming events to a caller-provided sender for real-time consumption |
| async prompt_messages_with_sender(messages, tx) | Send messages, streaming events to a caller-provided sender |
| async continue_loop() -> UnboundedReceiver<AgentEvent> | Resume from current context with ContinuationKind::Default. continuation_kind on AgentStart is ContinuationKind (not Option). |
| async continue_loop_with_sender(tx: UnboundedSender<AgentEvent>, kind: ContinuationKind) | Resume from current context with an explicit continuation kind, streaming events to a caller-provided sender |

State Access

| Method | Description |
|---|---|
| messages() -> &[AgentMessage] | Get the full message history |
| is_streaming() -> bool | Whether the agent is currently running |
| agent_id() -> &str | Stable UUID assigned at construction; included in every AgentStart event |
| session_id() -> &str | Stable UUID assigned at construction; groups all loops from this Agent instance |
| last_loop_id() -> Option<&str> | The loop_id of the most recently started loop; None before first run |
| workspace() -> Option<&Path> | The agent's workspace directory, if set (Agent trait method) |

State Mutation

| Method | Description |
|---|---|
| set_tools(tools: Vec<Arc<dyn AgentTool>>) | Replace the tool set |
| clear_messages() | Clear all messages |
| append_message(msg: AgentMessage) | Add a message to history |
| replace_messages(msgs: Vec<AgentMessage>) | Replace all messages |
| save_messages() -> Result<String, serde_json::Error> | Serialize message history to JSON |
| restore_messages(json: &str) -> Result<(), serde_json::Error> | Restore message history from JSON |

Steering & Follow-Up Queues

| Method | Description |
|---|---|
| steer(msg: AgentMessage) | Queue a steering message (interrupts mid-tool-execution) |
| follow_up(msg: AgentMessage) | Queue a follow-up message (processed after agent finishes) |
| clear_steering_queue() | Clear pending steering messages |
| clear_follow_up_queue() | Clear pending follow-up messages |
| clear_all_queues() | Clear both queues |
| set_steering_mode(mode: QueueMode) | Set delivery mode: OneAtATime or All |
| set_follow_up_mode(mode: QueueMode) | Set delivery mode: OneAtATime or All |
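
The two delivery modes differ only in how much of the queue one read consumes: OneAtATime pops the first queued message, All drains the queue. A minimal sketch, with a local QueueMode re-declaration and a hypothetical take_queued helper:

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy)]
enum QueueMode {
    OneAtATime,
    All,
}

/// Take messages from the front of the queue according to the mode.
fn take_queued<T>(queue: &mut VecDeque<T>, mode: QueueMode) -> Vec<T> {
    match mode {
        // Pop only the oldest message (empty Vec if the queue is empty).
        QueueMode::OneAtATime => queue.pop_front().into_iter().collect(),
        // Drain everything, preserving arrival order.
        QueueMode::All => queue.drain(..).collect(),
    }
}
```

With three queued messages, a OneAtATime read returns the first and leaves two behind; a subsequent All read returns both and empties the queue.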

Control

| Method | Description |
|---|---|
| abort() | Cancel the current run via CancellationToken |
| reset() | Clear all state (messages, queues, streaming flag) |

Session Callback Types

| Type | Signature | Description |
|---|---|---|
| BeforeTaskFn | Arc<dyn Fn(&Session) -> bool + Send + Sync> | Called on first AgentStart with a new session_id. Parameter is the Session. Return false to reject. |
| AfterTaskFn | Arc<dyn Fn(&Session) + Send + Sync> | Called in flush() when the session is finalized. Parameter is the completed Session. |

These are set on SessionRecorderConfig and fire at the session level (not per-loop). See Sessions for usage.

Re-exports

The crate re-exports key types from lib.rs:

#![allow(unused)]
fn main() {
// Agent system
pub use agents::{Agent, AgentProfile, BasicAgent, QueueMode};
pub use agents::SubAgentTool;

// Agent loop
pub use agent_loop::{agent_loop, agent_loop_continue, agent_loop_parallel};
pub use agent_loop::evaluation::{
    ElaborateEvaluation, LlmJudgeEvaluation, PickFirstEvaluation,
    TokenEfficientEvaluation, TransparentEvaluation,
};

// Config-driven construction
pub use config::{
    agent_from_config, agent_from_config_with_registry, agents_from_config,
    parse_config, parse_config_file, AgentConfig, ConfigError, ConfigFormat,
};

// Context management
pub use context::{
    CompactionStrategy, CompactionConfig, CompactionScope, ContextConfig,
    DefaultCompaction, DefaultBlockCompaction, BlockCompactionStrategy,
    ContextTracker, CompactionBlock, CompactedSection, TurnMap, TurnRange,
    build_context_from_session, compact_session_loops,
};
pub use context::skills::SkillSet;

// Session persistence
pub use session::{
    Session, SessionRecorder, SessionRecorderConfig, SessionScope, SessionError,
    LoopRecord, LoopEvent, LoopStatus, Turn, LoopConfigSnapshot,
    ParallelGroupRecord, ChildLoopRef, SpawnRef, SessionFormation,
    save_session, load_session, list_session_ids, delete_session, load_sessions_for_agent,
};

// Provider
pub use provider::retry::RetryConfig;

// Types (glob re-export)
pub use types::*;  // Message, Content, AgentMessage, AgentEvent, Usage, LlmMessage,
                    // StopReason, StreamDelta, TurnTrigger, ThinkingLevel, CacheConfig, etc.
}

0.7.0 additions (reachable via module paths)

These symbols were added in 0.7.0 but are not (yet) part of the top-level glob. Import them via their module path:

#![allow(unused)]
fn main() {
// Session: trait-based pluggable store + atomic-write filesystem impl with
// advisory locks (fs2 exclusive lock; returns SessionError::Locked on contention).
use phi_core::session::{SessionStore, FileSystemSessionStore};

// Provider: credential refresh hook for long-running agents whose token expires
// mid-run. On ProviderError::Auth, the loop invalidates the cached credential
// and retries once before propagating.
use phi_core::provider::{CredentialProvider, StaticCredentialProvider};

// Provider: structured-output contract. JsonObject = free-form JSON;
// JsonSchema = strict schema (native where supported, tool-call emulation on
// Anthropic and Anthropic-on-Bedrock, SchemaMismatch on others).
use phi_core::provider::ResponseFormat;

// Agent: fallible build_config(). Default impl returns Err(MissingModelConfig)
// instead of panicking when model_config() is None.
use phi_core::agents::AgentBuildError;

// MCP: configurable per-request timeout (default 30s) on both stdio + HTTP
// transports. Use McpClientConfig with connect_stdio_with_config /
// connect_http_with_config.
use phi_core::mcp::{McpClientConfig, DEFAULT_REQUEST_TIMEOUT};
}

phi-core — Project Overview

1. Purpose Statement

phi-core is a Rust async library for building stateful, multi-turn LLM agents that can autonomously execute tools to accomplish tasks. The library solves the core engineering problems of agent construction: routing between many LLM provider APIs through a unified interface, running a prompt-then-tool-call loop until the model signals completion, streaming real-time events to UI consumers, and automatically managing context windows so conversations do not exceed model token limits. It is designed to be embedded as a dependency in application code — it provides no standalone binary, no HTTP server, and no user interface of its own.

2. Key Capabilities

| Capability | Source Location |
|---|---|
| Multi-turn conversation loop (prompt → LLM → tool call → repeat) | src/agent_loop/ |
| Support for 20+ LLM providers via 7 distinct API protocols | src/provider/ |
| Real-time event streaming over an async channel | src/types/ (AgentEvent), src/agent_loop/ |
| Parallel, sequential, or batched tool execution | src/agent_loop/:execute_tool_calls() |
| Context compaction via CompactionBlock overlays (legacy: tiered compact_messages()) | src/context/ (compaction is now modeled via CompactionBlock) |
| Built-in coding tools: bash execution, file read/write/edit, directory listing, grep search | src/tools/ |
| Sub-agent delegation: run an isolated child agent as a tool | src/agents/sub_agent.rs |
| Model Context Protocol (MCP) client for stdio and HTTP tool servers | src/mcp/ |
| AgentSkills system: load instruction sets from directory-based skill files | src/context/skills.rs |
| OpenAPI tool auto-generation from spec files or URLs (optional feature) | src/openapi/ |
| JSON serialization of entire conversation history for persistence | src/types/ (all types derive Serialize/Deserialize) |
| Exponential-backoff retry for rate-limit and network errors | src/provider/retry.rs |
| Prompt caching hints for compatible providers (Anthropic) | src/types/ (CacheConfig) |
| Extended thinking / reasoning mode | src/types/ (ThinkingLevel) |
| Lifecycle callbacks: before/after each turn, on error | src/agent_loop/ (BeforeTurnFn, AfterTurnFn, OnErrorFn) |
| Loop-level hooks: setup/teardown around each complete agent run | src/agent_loop/ (BeforeLoopFn, AfterLoopFn) |
| Tool-level hooks: intercept each tool execution and streaming update | src/agent_loop/ (BeforeToolExecutionFn, AfterToolExecutionFn, BeforeToolExecutionUpdateFn, AfterToolExecutionUpdateFn) |
| Agent identity: stable agent_id / session_id / loop_id for cross-loop traceability | src/agents/basic_agent.rs, src/types/ |
| Evaluational parallelism: agent_loop_parallel() runs N AgentLoopConfigs concurrently on the same prompt, evaluates results via the pluggable EvaluationStrategy trait, and delivers the best outcome. Built-in strategies: TransparentEvaluation, PickFirstEvaluation, TokenEfficientEvaluation, ElaborateEvaluation, LlmJudgeEvaluation (with iterative compaction to satisfy judge's comprehension criteria). ParallelLoopStart/ParallelLoopEnd events bracket execution. Session continuity: selected_context feeds directly into agent_loop_continue(). | src/agent_loop/ (agent_loop_parallel), src/agent_loop/evaluation.rs, src/types/ |
| Continuation kinds: Initial, Default, Rerun, Branch, Compaction variants for origin, retry, explore, and compaction semantics | src/types/ (ContinuationKind), src/agent_loop/ |
| Input filtering: moderation, PII redaction, injection detection | src/types/ (InputFilter) |
| User steering mid-run: inject messages between tool calls | src/agents/basic_agent.rs (steering queue), src/agent_loop/ |
| Follow-up work queuing: append more tasks after agent would stop | src/agents/basic_agent.rs (follow-up queue), src/agent_loop/ |
| Execution limits: max turns, max total tokens, max duration | src/context/ (ExecutionLimits, ExecutionTracker) |

3. Inputs & Outputs

Inputs

| Input | Format | Description |
|---|---|---|
| User prompt | Vec<AgentMessage> or String | Text (or multi-content) messages to start or continue a conversation |
| System prompt | String | Instruction set defining agent behavior, injected at each LLM call |
| Tool definitions | Vec<Arc<dyn AgentTool>> | Executable tools exposed to the LLM via JSON Schema |
| LLM provider config | ModelConfig | Single provider identity card: id, api_key, base_url, api: ApiProtocol, cost, compat. Factory methods: ModelConfig::anthropic(), ::openai(), ::local(), ::google(), ::openrouter(). Pass to BasicAgent::new() or AgentLoopConfig.model_config. |
| Steering messages | Vec<AgentMessage> via queue | User-injected messages that interrupt mid-run tool execution |
| Follow-up messages | Vec<AgentMessage> via queue | Queued tasks appended when the agent would otherwise stop |
| Context config | ContextConfig | Token budget, compaction parameters |
| Execution limits | ExecutionLimits | Max turns, tokens, duration, and an optional cost cap |
| Skill directories | Vec<Path> | Directories containing SKILL.md files |
| MCP server commands | Command string, args, env | Stdio or HTTP MCP server specifications |
| OpenAPI spec | File path, URL, or YAML/JSON string | API specs to auto-generate tools from |
| Cancellation token | CancellationToken | External abort signal |

Outputs

| Output | Format | Description |
|---|---|---|
| Agent event stream | UnboundedReceiver<AgentEvent> | Real-time stream of all events (text deltas, tool calls, results, errors) |
| Final messages | Vec<AgentMessage> | All new messages produced in the run (returned from agent_loop()) |
| Serialized conversation | JSON | Complete message history, serializable for persistence |
| Tool results | Embedded in AgentEvent::ToolExecutionEnd | Structured result of each tool call |
| Usage statistics | Usage struct per turn | Input/output/cache token counts per LLM call |

4. Actors & Use Cases

Application Developer

The primary consumer. Embeds phi-core as a library dependency.

| Use Case | How Triggered |
|---|---|
| Build a coding assistant | Create a BasicAgent, attach built-in tools, call agent.prompt("...") |
| Build a CLI REPL | Loop reading stdin, call agent.prompt(), render events (see examples/cli.rs) |
| Persist conversation across sessions | Call agent.save_messages() → JSON → agent.restore_messages() |
| Run a task autonomously with limits | Set ExecutionLimits, observe AgentEvent::AgentEnd |
| Interrupt a running agent | Call agent.steer(message) while event loop is running |
| Chain specialized agents | Attach SubAgentTool instances to a parent agent |
| Use third-party tools | Connect to an MCP server via agent.with_mcp_server_stdio() |
| Expose a REST API as tools | Load OpenAPI spec via agent.with_openapi_file() |

End User (via application)

Interacts through the application wrapping this library. Use cases match what the application exposes (e.g., CLI prompts in examples/cli.rs: /quit, /clear, /model).

LLM Provider

External service receiving structured HTTP requests. The library sends conversation history and tool schemas; the provider returns streaming token deltas and final messages. Providers never call back into the library.

MCP Server

External process exposing tools over the Model Context Protocol. The library connects as a client via stdio pipe or HTTP. The server exposes tool definitions that are adapted into AgentTool instances.

Sub-Agent

A child instance of the agent loop spawned internally when a SubAgentTool is called. Operates with its own fresh context and toolset. Results are returned to the parent as a ToolResult.

5. Constraints & Non-Goals

  • No built-in HTTP server. The library is embeddable only; serving the agent over HTTP requires external frameworks.
  • No user interface. UI rendering (text display, color, input handling) is the application's responsibility (see examples/cli.rs for a reference implementation).
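  • No authentication management. API keys must be supplied by the caller. The library does not fetch or rotate credentials on its own; the CredentialProvider hook added in 0.7.0 lets the caller supply a refreshed credential when a provider returns ProviderError::Auth.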
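  • Single event consumer per run. Each run delivers events over one channel: agent.prompt() returns a single UnboundedReceiver<AgentEvent>, and agent_loop() takes a single UnboundedSender<AgentEvent>. Fan-out to multiple consumers requires application-level bridging.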
  • No agent-to-agent networking. Sub-agents run in-process only. No remote agent delegation.
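  • No automatic persistence. Conversation state is held in memory during a run. Writing it to disk is the caller's decision; the library provides helpers for doing so (save_messages() / restore_messages(), save_session() / load_session(), and the FileSystemSessionStore added in 0.7.0).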
  • No built-in precision token counting. The default HeuristicTokenCounter uses 4 characters per token. A pluggable TokenCounter trait (src/context/token.rs) allows callers to supply a custom counter (e.g., tiktoken-based), but no precision implementation ships with the library.
  • No multi-modal generation. Images can be sent to the model (as Content::Image), but image generation is not supported.
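  • Limited structured output / JSON mode. ResponseFormat (added in 0.7.0) requests free-form JSON (JsonObject) or a strict schema (JsonSchema); strict schemas are native where the provider supports them, emulated via tool calls on Anthropic and Anthropic-on-Bedrock, and rejected with SchemaMismatch elsewhere. Without ResponseFormat, the library passes raw messages and enforcing structure is the caller's responsibility via the system prompt.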
  • Skipped tools on steering. When steering messages arrive mid-batch, remaining tool calls in that batch are skipped with an error result — their outputs are never computed. This is a documented behavior, not a bug.
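
The skip-on-steering behavior in the last bullet can be sketched for a sequential batch. ToolOutcome and run_batch are hypothetical names, the tool "execution" is a placeholder string, and checking for steering before every call is an assumption about where the library performs the check.

```rust
#[derive(Debug, PartialEq)]
enum ToolOutcome {
    Done(String),
    Skipped(String), // reported to the model as an error result
}

/// Run calls in order; once `steering_arrived` reports a pending message,
/// every remaining call is skipped without being executed.
fn run_batch(calls: &[&str], mut steering_arrived: impl FnMut() -> bool) -> Vec<ToolOutcome> {
    let mut results = Vec::with_capacity(calls.len());
    let mut interrupted = false;
    for call in calls {
        if !interrupted && steering_arrived() {
            interrupted = true;
        }
        if interrupted {
            results.push(ToolOutcome::Skipped(format!("{call}: skipped due to steering message")));
        } else {
            results.push(ToolOutcome::Done(format!("{call}: ok")));
        }
    }
    results
}
```

Each skipped call still gets a result entry, so every tool call in the batch has a matching result even though its output was never computed.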

6. Key Terminology Glossary

| Term | Definition |
|---|---|
| Agent | The runtime interface trait (src/agents/agent.rs). Program against this trait to remain independent of the specific implementation. BasicAgent (src/agents/basic_agent.rs) is the default in-memory implementation: it owns conversation history, tools, ModelConfig (provider identity + auth + cost), and configuration. Construction: BasicAgent::new(ModelConfig::anthropic(...)). The application-facing entry point. |
| Agent Loop | The recursive execution cycle (src/agent_loop/) that calls the LLM, processes tool calls, checks steering, and repeats until the LLM stops or limits are hit. |
| Turn | One complete LLM call plus the resulting tool executions. Bounded by TurnStart/TurnEnd events. Materialized as a Turn struct on LoopRecord.turns (src/session/model.rs). |
| Steering | A Vec<AgentMessage> injected into the running loop between tool executions. Used to redirect the agent mid-task without restarting it. |
| Follow-up | A Vec<AgentMessage> queued to be injected after the agent would naturally stop. Extends the run without creating a new agent_loop() call. |
| ModelConfig | The single, complete description of a provider connection (src/provider/model.rs). Fields: id (model name sent to API), name (display label), api: ApiProtocol (wire-protocol dispatch key), provider (logging label), base_url, api_key, cost: CostConfig, headers, compat: Option<OpenAiCompat>. Factory methods: anthropic(), openai(), local(), google(), openrouter(). Passed to BasicAgent::new(), SubAgentTool::new(), and AgentLoopConfig.model_config. |
| ApiProtocol | Enum that selects which HTTP wire format to use: AnthropicMessages, OpenAiCompletions, OpenAiResponses, AzureOpenAiResponses, GoogleGenerativeAi, GoogleVertex, BedrockConverseStream. Used by ProviderRegistry as a dispatch key. |
| StreamProvider | The trait (src/provider/traits.rs) that any LLM backend must implement. Has a single method stream() that takes a StreamConfig and sends StreamEvents. |
| AgentTool | The trait (src/types/) that any executable tool must implement. Methods: name(), label(), description(), parameters_schema(), execute(). |
| ToolContext | A struct passed to AgentTool::execute() containing the call ID, name, cancellation token, and optional progress callbacks. |
| AgentEvent | The streaming event enum emitted to the consumer during a run. Covers agent lifecycle, turn lifecycle, message streaming, and tool execution. |
| StreamDelta | A partial content update emitted during LLM streaming: Text, Thinking, or ToolCallDelta. |
| StopReason | Why the LLM ended its response. Variants: Stop (natural end), Length (token limit), ToolUse (returned tool calls), Error (failure), Aborted (cancellation), MaxTurns, UserStop, Handoff, GuardRail, ContextCompacted, Paused. |
| AgentMessage | The top-level message enum stored in the conversation history. Either Llm(LlmMessage) (sent to the LLM; LlmMessage wraps Message + optional TurnId for turn tracking) or Extension(ExtensionMessage) (app-only metadata). |
| Message | The LLM-protocol message enum: User, Assistant, or ToolResult. |
| Content | A single content block within a message: Text, Image (base64), Thinking, or ToolCall. |
| Usage | Token count metadata returned with each Assistant message: input, output, cache_read, cache_write, total_tokens. |
| ContextConfig | Configuration for the automatic context compaction: token budget, lines-to-keep per tool output, number of recent/first messages to preserve. |
| CompactionStrategy | A trait for customizing how messages are compacted when the token budget is exceeded. The default implementation uses 3 tiers. |
| CompactionBlock | The model used by the compaction system to represent compacted message regions. Replaces the previous inline approach in compact_messages() with a structured block-based representation. |
| ExecutionLimits | Hard caps on agent execution: max_turns, max_total_tokens, max_duration, max_cost: Option<f64>. When exceeded, the loop appends a system message and stops. |
| ToolExecutionStrategy | How multiple tool calls from one LLM response are dispatched: Sequential, Parallel (default), or Batched { size }. |
| CacheConfig / CacheStrategy | Controls prompt caching breakpoint placement for providers that support it (Anthropic). Strategies: Auto, Disabled, Manual. |
| ThinkingLevel | Controls extended reasoning depth: Off, Minimal, Low, Medium, High. Translated to provider-specific parameters. |
| AgentSkills | A directory-based system for loading instruction files (SKILL.md) that extend agent capabilities. Compatible with the AgentSkills open standard. |
| MCP | Model Context Protocol. A standard for tool servers that communicate over stdio or HTTP. The library acts as an MCP client. |
| SubAgentTool | An AgentTool implementation that, when called by the parent LLM, spawns a complete child agent_loop() with isolated context. |
| InputFilter | A synchronous trait applied to user text before the LLM call. Returns Pass, Warn(text) (appended to message), or Reject(reason) (aborts run). |
| ExtensionMessage | An AgentMessage variant that is not sent to the LLM. Used for application-specific metadata (UI state, notifications) stored in conversation history. |
| ContextTracker | Tracks context token usage using a hybrid of real provider-reported counts and local heuristic estimates for messages since the last report. |
| ProviderError | The error enum returned by StreamProvider::stream(). Variants: Api, Network, Auth, RateLimited, ContextOverflow, Cancelled, Other. |
| ToolDefinition | A schema-only description of a tool sent to the LLM (name, description, JSON Schema parameters). Does not include the execute function. |
| RetryConfig | Exponential-backoff configuration for retrying RateLimited and Network provider errors. |
| AgentLoopConfig | A flat configuration struct passed to agent_loop() / agent_loop_continue() bundling all behavioral settings. Required field: model_config: ModelConfig (provider identity, auth, cost rates). Optional provider_override: Option<Arc<dyn StreamProvider>> bypasses registry dispatch (used in tests). |
| QueueMode | Controls how queued messages (steering/follow-ups) are consumed per read. OneAtATime (default): pops only the first queued message. All: drains the entire queue at once. |
| McpContent | A content item returned by an MCP tool call. Variants: Text { text } and Image { data: base64, mimeType }. |
| OpenApiAuth | Authentication method for OpenAPI requests. Variants: None, Bearer(token), ApiKey { header, value }. Token/value is redacted in debug output. |
| OperationFilter | Controls which OpenAPI operations become tools. Variants: All, ByOperationId, ByTag, ByPathPrefix. Operations without an operationId are always skipped. |
| agent_id | A UUID v4 string generated once when BasicAgent::new() is called. Stable for the lifetime of the agent instance. Included in every AgentStart event to identify which agent produced the run. |
| session_id | A UUID v4 string generated once when BasicAgent::new() is called. Groups all loops (origin + continuations) that belong to one logical session. Stable for the lifetime of the agent instance. |
| loop_id | A string of the form "{session_id}.{config_id}.{N}" that uniquely identifies one agent_loop / agent_loop_continue call. The config_id segment is either caller-supplied or auto-derived from provider + model + thinking level. N is a per-config_id monotonic counter. Included in every AgentStart event. |
| ContinuationKind | Labels how an agent_loop or agent_loop_continue call relates to prior loops. Set on AgentContext.continuation_kind before calling. Variants: Initial (origin agent_loop call; the #[default]), Default (unspecified continuation), Rerun { tag } (retry the same scenario from an equivalent context), Branch { tag } (explore a different execution path), Compaction (context-compacted continuation). Tags are RFC 3339 UTC timestamps. Surfaced in AgentStart.continuation_kind. |
TurnTriggerIdentifies what caused a turn to begin. Emitted in TurnStart.triggered_by. Variants: User (first turn of an Initial continuation — i.e., origin agent_loop call), SubAgent (running as a sub-agent via SubAgentTool), Continuation (subsequent turns, tool round-trips, Default/Rerun continuations, and steering-injected turns; renamed from FollowUp), Branch (first turn of a ContinuationKind::Branch continuation).
BeforeLoopFn / AfterLoopFnLoop-level lifecycle hooks on AgentLoopConfig. BeforeLoopFn fires before AgentStart — return false to abort the run before it begins. AfterLoopFn fires after AgentEnd with the new messages and accumulated usage.
BeforeToolExecutionFn / AfterToolExecutionFnTool-level lifecycle hooks on AgentLoopConfig. BeforeToolExecutionFn fires before ToolExecutionStart — return false to skip the tool call. AfterToolExecutionFn fires after ToolExecutionEnd with the tool name, call ID, and error flag.
BeforeToolExecutionUpdateFn / AfterToolExecutionUpdateFnStreaming tool update hooks on AgentLoopConfig. Fire around each ToolExecutionUpdate event emitted when a tool calls ctx.on_update(partial). BeforeToolExecutionUpdateFn returns false to suppress the event (tool keeps running; final ToolResult is unaffected). AfterToolExecutionUpdateFn fires after the event if not suppressed.
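
The two QueueMode behaviors described in the glossary can be illustrated with a small sketch. The MessageQueue type and its push/read methods are hypothetical names chosen for this example; only the OneAtATime and All variants and their semantics come from the glossary entry.

```rust
use std::collections::VecDeque;

/// Mirrors the two glossary variants; the derive list is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)]
enum QueueMode {
    /// Pop only the first queued message per read.
    #[default]
    OneAtATime,
    /// Drain the entire queue in one read.
    All,
}

/// Hypothetical queue of steering/follow-up messages (plain strings here).
struct MessageQueue {
    mode: QueueMode,
    items: VecDeque<String>,
}

impl MessageQueue {
    fn new(mode: QueueMode) -> Self {
        Self { mode, items: VecDeque::new() }
    }

    fn push(&mut self, msg: &str) {
        self.items.push_back(msg.to_string());
    }

    /// One read, as a loop might perform it between turns.
    fn read(&mut self) -> Vec<String> {
        match self.mode {
            QueueMode::OneAtATime => self.items.pop_front().into_iter().collect(),
            QueueMode::All => self.items.drain(..).collect(),
        }
    }
}
```

With OneAtATime, two queued messages are delivered over two reads; with All, a single read returns both and leaves the queue empty.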

Architecture Overview

For detailed component specifications, trait signatures, sequence diagrams, and data models, see the full Architecture Spec. For formal algorithm descriptions, see Algorithms.

Layered Design

phi-core is organized as three conceptual layers within a single crate. Dependencies flow strictly downward — upper layers use lower layers, never the reverse.

┌─────────────────────────────────────────────┐
│  Layer 3: Orchestration          (planned)   │
│  Multi-agent, delegation, work modes         │
├─────────────────────────────────────────────┤
│  Layer 2: Agent + Providers                  │
│  Concrete providers, tools, retry, caching,  │
│  context management, MCP                     │
├─────────────────────────────────────────────┤
│  Layer 1: Core Loop                          │
│  agent_loop, types, traits                   │
│  Provider-agnostic. Tool-agnostic.           │
└─────────────────────────────────────────────┘

Layer 1: Core Loop

The pure agent loop. No opinions about LLMs, no built-in tools. Just the control flow.

Modules: types/, agent_loop/, provider/traits.rs

Owns:

  • agent_loop() / agent_loop_continue() — the loop itself
  • AgentTool trait — interface tools must implement
  • StreamProvider trait — interface providers must implement
  • AgentMessage, AgentEvent, StreamDelta — message & event types
  • AgentContext — system prompt + messages + tools
  • Tool execution strategies (parallel/sequential/batched)
  • Streaming tool output (ToolUpdateFn)
  • Steering & follow-up message injection

Does not own: Any concrete provider or tool implementation.
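
To make the separation concrete, here is a minimal synchronous sketch of a loop that knows only two interfaces. SketchProvider, SketchTool, Reply, and sketch_loop are hypothetical names for illustration; the real StreamProvider and AgentTool traits are asynchronous and carry more information than shown here.

```rust
use std::collections::VecDeque;

/// What a (simplified) model turn produces.
enum Reply {
    Text(String),
    ToolCall { name: String, args: String },
}

/// Hypothetical stand-in for the provider role: given history, produce a reply.
trait SketchProvider {
    fn respond(&mut self, history: &[String]) -> Reply;
}

/// Hypothetical stand-in for the tool role.
trait SketchTool {
    fn name(&self) -> &str;
    fn execute(&self, args: &str) -> String;
}

/// Prompt, call the model, run tools, repeat until the model answers
/// without a tool call or `max_turns` is reached.
fn sketch_loop(
    provider: &mut dyn SketchProvider,
    tools: &[Box<dyn SketchTool>],
    prompt: &str,
    max_turns: usize,
) -> Vec<String> {
    let mut history = vec![format!("user: {prompt}")];
    for _ in 0..max_turns {
        match provider.respond(&history) {
            Reply::Text(text) => {
                history.push(format!("assistant: {text}"));
                break; // no tool call: the model is done
            }
            Reply::ToolCall { name, args } => {
                history.push(format!("assistant: call {name}({args})"));
                let result = match tools.iter().find(|t| t.name() == name.as_str()) {
                    Some(tool) => tool.execute(&args),
                    None => format!("unknown tool: {name}"),
                };
                history.push(format!("tool: {result}"));
            }
        }
    }
    history
}

/// Scripted provider for the example: replays canned replies in order.
struct Scripted(VecDeque<Reply>);

impl SketchProvider for Scripted {
    fn respond(&mut self, _history: &[String]) -> Reply {
        self.0.pop_front().unwrap_or(Reply::Text(String::new()))
    }
}

/// Trivial tool that upper-cases its argument.
struct Upper;

impl SketchTool for Upper {
    fn name(&self) -> &str {
        "upper"
    }
    fn execute(&self, args: &str) -> String {
        args.to_uppercase()
    }
}
```

The loop never names a concrete provider or tool, which is the property Layer 1 is described as preserving.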

Layer 2: Agent + Providers

Batteries-included single-agent layer. Most users interact with this.

Modules: agents/, context/, provider/*.rs, tools/*.rs, mcp/*.rs

Adds on top of Layer 1:

  • Concrete providers — Anthropic, OpenAI-compat, Google, Azure, Bedrock, Vertex
  • Provider registry — dispatch by API protocol
  • Context translation — cross-provider content type compatibility (G8)
  • Prompt caching — automatic cache breakpoint placement
  • Retry with backoff — exponential, jitter, respects retry-after
  • Context management — token estimation, compaction, execution limits, cost tracking
  • AgentProfile + SystemPromptStrategy — reusable agent blueprints with multi-block prompt composition
  • Config-driven construction — TOML/JSON/YAML → agent_from_config() → Arc<dyn Agent>
  • Built-in tools — bash, read_file, write_file, edit_file, list_files, search, prun (context pruning)
  • Tool registry — name-based tool resolution from config
  • Session persistence — SessionRecorder materializes Turn structs from events
  • MCP client — stdio + HTTP transports, tool adapter
  • Agent trait — the runtime interface (prompting, state, control, ~40 methods)
  • BasicAgent struct — default in-memory implementation of Agent; stateful builder
  • SubAgentTool — delegates tasks to a child agent_loop() as a tool
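
As an illustration of the retry bullet above, the following sketch computes an exponential backoff delay and lets a server-provided retry-after hint take precedence. The field names (initial_delay, multiplier, max_delay) and the helper functions are assumptions made for this example, not the actual RetryConfig definition, and jitter is left out so the numbers stay deterministic.

```rust
use std::time::Duration;

/// Hypothetical field names; the real RetryConfig may differ.
struct RetryConfig {
    max_retries: u32,
    initial_delay: Duration,
    multiplier: u32,
    max_delay: Duration,
}

impl RetryConfig {
    /// Delay before retry number `attempt` (1-based), capped at `max_delay`.
    fn delay_for_attempt(&self, attempt: u32) -> Duration {
        let exponent = attempt.saturating_sub(1);
        let factor = self.multiplier.saturating_pow(exponent);
        self.initial_delay.saturating_mul(factor).min(self.max_delay)
    }

    /// Whether another retry is allowed after `attempt` retries so far.
    fn should_retry(&self, attempt: u32) -> bool {
        attempt < self.max_retries
    }
}

/// A retry-after hint from the provider overrides the computed backoff.
fn next_delay(cfg: &RetryConfig, attempt: u32, retry_after: Option<Duration>) -> Duration {
    retry_after.unwrap_or_else(|| cfg.delay_for_attempt(attempt))
}
```

With an initial delay of 500 ms, a multiplier of 2, and a 3 s cap, successive retries wait 500 ms, 1 s, 2 s, and then 3 s.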

Layer 3: Orchestration (planned)

Multi-agent coordination. Not yet implemented — the architecture is designed to support it when needed.

Planned capabilities:

  • Orchestrator struct — spawn, delegate, and coordinate multiple agents
  • Work modes:
    • Interactive — multi-turn, human in the loop (current default)
    • Autonomous — runs to completion without input (background tasks, CI)
    • Pipeline — input → output, chainable (scan → fix → verify)
    • Supervisor — delegates to other agents, synthesizes results
  • Fan-out — same task to multiple agents (different providers for diversity)
  • Pipeline chaining — output of agent A feeds input of agent B
  • Agent communication through the orchestrator event bus

Why not yet: Multi-agent orchestration adds complexity. The single-agent loop handles 95% of use cases. Layer 3 will be built when a concrete use case drives it, not speculatively.


Module Layout

phi-core/
├── src/
│   ├── lib.rs                     # Public re-exports
│   │
│   │── Layer 1: Core Loop ─────────────────────
│   ├── types/
│   │   ├── mod.rs                 # Re-exports, Message, AgentMessage
│   │   ├── content.rs             # Content enum (Text, Image, Thinking, ToolCall), StopReason
│   │   ├── extension.rs           # ExtensionMessage
│   │   ├── agent_message.rs       # AgentMessage enum, LlmMessage (Message + TurnId)
│   │   ├── usage.rs               # Usage, CacheConfig, CacheStrategy, ThinkingLevel
│   │   ├── tool.rs                # AgentTool trait, ToolDefinition, ToolContext
│   │   ├── event.rs               # AgentEvent enum, TurnTrigger, StreamDelta
│   │   ├── context.rs             # AgentContext, InRunEntry (2-stream pruning)
│   │   └── parallel.rs            # ToolExecutionStrategy
│   ├── agent_loop/
│   │   ├── core.rs                # agent_loop(), agent_loop_continue()
│   │   ├── run.rs                 # run_loop() — inner turn engine
│   │   ├── streaming.rs           # stream_assistant_response() — LLM call + retry
│   │   ├── tools.rs               # execute_tool_calls()
│   │   ├── config.rs              # AgentLoopConfig, callback type aliases
│   │   ├── helpers.rs             # Input filtering, message conversion
│   │   ├── parallel.rs            # agent_loop_parallel()
│   │   ├── evaluation.rs          # EvaluationStrategy trait + 5 built-in strategies
│   │   └── script_callback.rs     # ScriptCallback for shell/Python hooks
│   │
│   │── Layer 2: Agent + Providers ─────────────
│   ├── agents/
│   │   ├── agent.rs               # Agent trait (runtime interface, ~40 methods)
│   │   ├── basic_agent.rs         # BasicAgent struct (default in-memory impl)
│   │   ├── profile.rs             # AgentProfile struct
│   │   ├── system_prompt.rs       # SystemPromptStrategy, SystemPrompt, PromptBlockDef
│   │   └── sub_agent.rs           # SubAgentTool (child agent_loop as a tool)
│   ├── config/
│   │   ├── schema.rs              # AgentConfig + all TOML/JSON/YAML config sections
│   │   ├── builder.rs             # agent_from_config(), agents_from_config()
│   │   ├── parser.rs              # Multi-format parsing + env var substitution
│   │   └── reference.rs           # {{...}} ID reference protocol parser
│   ├── context/
│   │   ├── config.rs              # ContextConfig, CompactionConfig, CompactionScope
│   │   ├── compaction.rs          # CompactionBlock, CompactedSection
│   │   ├── compact_messages.rs    # compact_messages() — legacy tiered compaction
│   │   ├── strategy.rs            # CompactionStrategy, BlockCompactionStrategy traits
│   │   ├── orchestration.rs       # compact_session_loops(), build_context_from_session()
│   │   ├── execution.rs           # ExecutionLimits, ExecutionTracker
│   │   ├── tracker.rs             # ContextTracker (hybrid token counting)
│   │   ├── token.rs               # TokenCounter trait, HeuristicTokenCounter
│   │   └── skills.rs              # SkillSet (SKILL.md loader)
│   ├── session/
│   │   ├── model.rs               # Session, LoopRecord, Turn, LoopStatus
│   │   ├── recorder.rs            # SessionRecorder (event → session state machine)
│   │   ├── storage.rs             # save_session(), load_session(), list/delete
│   │   └── helpers.rs             # Internal utilities
│   ├── provider/
│   │   ├── traits.rs              # StreamProvider trait, StreamEvent, ProviderError
│   │   ├── model.rs               # ModelConfig, ApiProtocol, OpenAiCompat
│   │   ├── registry.rs            # ProviderRegistry (protocol → provider)
│   │   ├── retry.rs               # Retry with exponential backoff
│   │   ├── context_translation.rs # ContextTranslationStrategy (G8)
│   │   ├── anthropic.rs           # Anthropic Messages API
│   │   ├── openai_compat.rs       # OpenAI Chat Completions (15+ providers)
│   │   ├── openai_responses.rs    # OpenAI Responses API
│   │   ├── google.rs              # Google Generative AI
│   │   ├── google_vertex.rs       # Google Vertex AI
│   │   ├── bedrock.rs             # AWS Bedrock ConverseStream
│   │   ├── azure_openai.rs        # Azure OpenAI
│   │   ├── mock.rs                # Mock provider for testing
│   │   └── sse.rs                 # SSE utilities
│   ├── tools/
│   │   ├── bash.rs                # BashTool
│   │   ├── file.rs                # ReadFileTool, WriteFileTool
│   │   ├── edit.rs                # EditFileTool
│   │   ├── list.rs                # ListFilesTool
│   │   ├── search.rs              # SearchTool
│   │   ├── prun.rs                # PrunTool, PrunWithMemoTool (context pruning)
│   │   └── registry.rs            # ToolRegistry (name → factory)
│   ├── mcp/
│   │   ├── client.rs              # MCP client (stdio + HTTP)
│   │   ├── tool_adapter.rs        # McpToolAdapter (MCP tool → AgentTool)
│   │   ├── transport.rs           # Transport implementations
│   │   └── types.rs               # MCP protocol types
│   └── openapi/                   # (feature-gated: "openapi")
│       ├── adapter.rs             # OpenApiToolAdapter
│       └── types.rs               # OpenApiConfig, OperationFilter

Data Flow

                    ┌─────────────┐
                    │   Caller    │
                    └──────┬──────┘
                           │ prompt / prompt_messages
                    ┌──────▼──────┐
                    │ BasicAgent  │  Layer 2: stateful wrapper
                    │ (agents/)   │  Manages queues, tools, state
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │ agent_loop  │  Layer 1: core loop
                    │             │  Prompt → LLM → Tools → Repeat
                    └──┬───────┬──┘
                       │       │
              ┌────────▼──┐ ┌──▼────────┐
              │ Provider  │ │   Tools   │  Layer 2: implementations
              │ .stream() │ │ .execute()│
              └────────┬──┘ └──┬────────┘
                       │       │
              ┌────────▼──┐ ┌──▼────────┐
              │ LLM API   │ │ OS / FS   │
              │ (HTTP)    │ │ (shell)   │
              └───────────┘ └───────────┘

Events flow back via mpsc::UnboundedSender<AgentEvent>
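
The event path can be sketched with the standard library alone. This example substitutes std::sync::mpsc and a thread for the async unbounded channel named above, and the Event enum is a small hypothetical subset of the real event vocabulary.

```rust
use std::sync::mpsc;
use std::thread;

/// A small, hypothetical subset of the event vocabulary.
#[derive(Debug, PartialEq)]
enum Event {
    AgentStart,
    TextDelta(String),
    AgentEnd,
}

/// Runs a pretend loop on a worker thread and streams events back.
/// std's `channel` is unbounded as well, so `send` never blocks the producer.
fn run_and_collect(chunks: Vec<String>) -> Vec<Event> {
    let (tx, rx) = mpsc::channel::<Event>();

    let worker = thread::spawn(move || {
        // A send only fails if the receiver was dropped; ignore that here.
        let _ = tx.send(Event::AgentStart);
        for chunk in chunks {
            let _ = tx.send(Event::TextDelta(chunk));
        }
        let _ = tx.send(Event::AgentEnd);
        // `tx` is dropped here, which ends the receiver's iteration below.
    });

    // The caller (for example a UI) consumes events as they arrive.
    let events: Vec<Event> = rx.iter().collect();
    worker.join().expect("worker thread panicked");
    events
}
```

The producer never waits on the consumer, so a slow UI cannot stall the loop; the trade-off of an unbounded channel is that unread events accumulate in memory.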

How Providers Plug In

  1. Implement StreamProvider trait (Layer 1 interface)
  2. Register with ProviderRegistry under an ApiProtocol (Layer 2)
  3. Set ModelConfig.api to match that protocol
  4. The registry dispatches stream() calls to the right provider

Each provider translates between phi-core's Message/Content types and the provider's native API format. All providers emit StreamEvents through the channel for real-time updates.
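
The four steps above amount to a lookup table keyed by protocol. The sketch below shows that shape under simplifying assumptions: the Provider trait is synchronous, the ApiProtocol variants are illustrative, and Registry is a hypothetical stand-in for ProviderRegistry rather than its real API.

```rust
use std::collections::HashMap;

/// Illustrative protocol identifiers; the real enum may have other variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ApiProtocol {
    AnthropicMessages,
    OpenAiCompletions,
}

/// Simplified, synchronous stand-in for the provider interface.
trait Provider {
    fn stream(&self, prompt: &str) -> String;
}

/// Example provider that tags the prompt with its own label.
struct EchoProvider(&'static str);

impl Provider for EchoProvider {
    fn stream(&self, prompt: &str) -> String {
        format!("[{}] {}", self.0, prompt)
    }
}

/// Maps a protocol to the provider registered for it.
#[derive(Default)]
struct Registry {
    providers: HashMap<ApiProtocol, Box<dyn Provider>>,
}

impl Registry {
    /// Step 2: register a provider under a protocol.
    fn register(&mut self, api: ApiProtocol, provider: Box<dyn Provider>) {
        self.providers.insert(api, provider);
    }

    /// Step 4: dispatch on the protocol named by the model configuration.
    /// Returns None when nothing is registered for that protocol.
    fn stream(&self, api: ApiProtocol, prompt: &str) -> Option<String> {
        self.providers.get(&api).map(|p| p.stream(prompt))
    }
}
```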

How Tools Plug In

  1. Implement AgentTool trait (Layer 1 interface)
  2. Add to the tools vec (via default_tools() or custom)
  3. The agent loop converts tools to ToolDefinition (name, description, schema) for the LLM
  4. When the LLM returns Content::ToolCall, the loop finds the matching tool and calls execute()
  5. Results are wrapped in Message::ToolResult and added to context

Tools receive a CancellationToken child token — they should check it for cooperative cancellation during long operations.
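
Cooperative cancellation means the tool itself decides where it is safe to stop. The sketch below uses an Arc<AtomicBool> wrapper named CancelFlag as a stand-in for CancellationToken; CancelFlag, ToolOutcome, and process_items are hypothetical names for this example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Minimal stand-in for a cancellation token shared between loop and tool.
#[derive(Clone, Default)]
struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Outcome of a long-running tool body.
#[derive(Debug, PartialEq)]
enum ToolOutcome {
    Completed { processed: usize },
    Cancelled { processed: usize },
}

/// Processes items one by one and checks the flag between items,
/// so cancellation takes effect at the next safe point.
fn process_items(
    items: &[u32],
    cancel: &CancelFlag,
    mut on_item: impl FnMut(u32),
) -> ToolOutcome {
    let mut processed = 0;
    for &item in items {
        if cancel.is_cancelled() {
            return ToolOutcome::Cancelled { processed };
        }
        on_item(item);
        processed += 1;
    }
    ToolOutcome::Completed { processed }
}
```

A tool that never checks the flag keeps running until it finishes, which is why the check belongs inside any loop or long wait in the tool body.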

Design Principles

  • Layers are conceptual, not physical. One crate with clean module boundaries; no feature flags are needed to select a layer (only the optional openapi module is feature-gated).
  • Dependencies flow down. Layer 1 never imports from Layer 2. Layer 2 never imports from Layer 3.
  • Layer 1 is stable. The core loop and traits change rarely. New features are added in Layer 2 or 3.
  • Build what's needed. Layer 3 is designed but not implemented. It will be built when a use case demands it, not speculatively.
  • Simple over clever. A straightforward loop with good defaults beats an elegant abstraction nobody can debug.

First Principles: Core vs External

phi-core is a library, not a framework. These principles determine what belongs inside the crate and what should be built on top of it by consumers.

A feature belongs in phi-core if:

  1. All agents need it — every consumer would re-implement it independently. The agent loop, message types, event stream, and tool trait are universal primitives.
  2. Requires deep loop integration — it needs hooks inside the turn cycle that callbacks alone can't provide cleanly. Compaction, execution limits, and streaming are examples.
  3. Defines the contract — traits and interfaces that standardize how consumers extend the system. StreamProvider, AgentTool, CompactionStrategy, and InputFilter are extension contracts.
  4. Fragmentation risk — if consumers implement it differently, interoperability breaks. Session format, event vocabulary, and message types must be shared.
  5. Cross-cutting — it touches multiple modules and can't be layered on top without forking the crate.

A feature should be external if:

  1. Application-specific — workflows, domain tools, business logic, UI patterns.
  2. Infrastructure — databases, web servers, authentication, deployment, CI/CD.
  3. Opinionated — reasonable projects would choose differently. Vector databases, tracing backends, embedding models, and memory strategies are consumer choices.
  4. Implementable via existing extension points — it can be built cleanly using the traits and callbacks already in core. Permissions (via InputFilter + BeforeToolExecutionFn), model fallback chains (via custom StreamProvider), and observability backends (via AgentEvent stream) are examples.
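
As one example of building on an existing extension point, a permission policy can be expressed as a before-tool hook that returns false to skip a call. The hook shape below (tool name and raw arguments in, bool out) is an assumption for illustration; the real BeforeToolExecutionFn signature may differ, and allow_only and run_tool are hypothetical helpers.

```rust
use std::collections::HashSet;

/// Assumed hook shape: receives the tool name and raw arguments and
/// returns false to skip the call.
type BeforeToolHook = Box<dyn Fn(&str, &str) -> bool + Send + Sync>;

/// Builds a hook that only lets allow-listed tools run.
fn allow_only(allowed: &[&str]) -> BeforeToolHook {
    let allowed: HashSet<String> = allowed.iter().map(|s| s.to_string()).collect();
    Box::new(move |tool_name: &str, _args: &str| -> bool { allowed.contains(tool_name) })
}

/// Hypothetical dispatcher: consults the hook before running the tool body.
fn run_tool(hook: &BeforeToolHook, tool_name: &str, args: &str) -> Option<String> {
    if !hook(tool_name, args) {
        return None; // skipped by policy
    }
    Some(format!("ran {tool_name} with {args}"))
}
```

Because the policy lives entirely in consumer code, different applications can choose allow-lists, interactive prompts, or audit logging without any change to the core crate.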

Algorithms

This document has been split into smaller, maintainable files. See the algorithms/ directory for the full index.

For pseudocode conventions, see the README.

agent_loop (src/agent_loop/)

Purpose: Start a fresh agent run from new prompt messages. Preconditions: prompts is non-empty; context.messages may contain prior history. Postconditions: All input filters have run; AgentStart/AgentEnd are emitted; returns all new messages produced.

FUNCTION agent_loop(
  prompts: Vec<AgentMessage>,
  context: AgentContext,         // mutable
  config: AgentLoopConfig,
  tx: EventChannel<AgentEvent>,
  cancel: CancellationToken
) -> Vec<AgentMessage>

  // ── loop_id generation (must happen before before_loop so AgentEnd can carry it) ──
  IF context.loop_id is None THEN context.loop_id ← new_uuid() END IF

  // ── before_loop hook ────────────────────────────────────────────────────
  // Fires before AgentStart. Return false to abort before the loop begins.
  IF config.before_loop defined AND NOT before_loop(context.messages, 0) THEN
    EMIT AgentEnd(loop_id=context.loop_id, messages=[])
    RETURN []
  END IF

  // ── Identity write-back ──────────────────────────────────────────────────
  // agent_id / session_id are set by Agent::prompt_*. Direct callers may leave
  // them None; agent_loop generates and writes them back so that a subsequent
  // agent_loop_continue on the same context can inherit them without extra setup.
  IF context.agent_id is None THEN context.agent_id ← new_uuid() END IF
  IF context.session_id is None THEN context.session_id ← new_uuid() END IF

  EMIT AgentStart {
    agent_id:          context.agent_id,
    session_id:        context.session_id,
    loop_id:           context.loop_id,
    parent_loop_id:    None,    // None = origin call
    continuation_kind: Initial, // Initial = origin call (the #[default])
    config_snapshot:   Some(LoopConfigSnapshot from config),
    timestamp:         now()
  }

  // ── Input filtering ─────────────────────────────────────────────────────
  IF config.input_filters is non-empty THEN
    user_text ← JOIN all text from User messages in prompts

    warnings ← []
    FOR EACH filter IN config.input_filters
      MATCH filter.filter(user_text)
        CASE Pass     → continue
        CASE Warn(w)  → warnings.append(w)
        CASE Reject(reason) →
          EMIT InputRejected(reason)
          EMIT AgentEnd(messages=[])
          RETURN []
      END MATCH
    END FOR

    IF warnings is non-empty THEN
      warning_text ← JOIN ["[Warning: " + w + "]" FOR w IN warnings]
      // Append to last User message's content
      append Content::Text(warning_text) to last User message in prompts
    END IF
  END IF

  // ── Append prompts to context ────────────────────────────────────────────
  FOR EACH prompt IN prompts
    context.messages.append(prompt)
  END FOR

  new_messages ← copy of prompts

  EMIT TurnStart

  // Emit events for each incoming prompt
  FOR EACH prompt IN prompts
    EMIT MessageStart(prompt)
    EMIT MessageEnd(prompt)
  END FOR

  // Run the main loop
  loop_usage ← run_loop(context, new_messages, config, tx, cancel)

  EMIT AgentEnd(new_messages)

  // ── after_loop hook ──────────────────────────────────────────────────────
  // Fires after AgentEnd with the messages produced and accumulated usage.
  IF config.after_loop defined THEN after_loop(new_messages, loop_usage) END IF

  RETURN new_messages

END FUNCTION
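
The input-filtering step of the pseudocode can be sketched in Rust as follows. The FilterResult enum, the synchronous InputFilter trait shape, the two example filters, and the newline used to join warnings are all assumptions for illustration; only the Pass/Warn/Reject behavior and the "[Warning: ...]" wrapping come from the pseudocode.

```rust
/// Verdicts named as in the pseudocode above.
#[derive(Debug, Clone, PartialEq)]
enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

/// Simplified filter trait; the real InputFilter may differ.
trait InputFilter {
    fn filter(&self, text: &str) -> FilterResult;
}

/// Hypothetical filter: warns when the input is longer than a limit.
struct MaxLength(usize);

impl InputFilter for MaxLength {
    fn filter(&self, text: &str) -> FilterResult {
        if text.chars().count() > self.0 {
            FilterResult::Warn(format!("input exceeds {} characters", self.0))
        } else {
            FilterResult::Pass
        }
    }
}

/// Hypothetical filter: rejects input containing a blocked word.
struct BlockWord(&'static str);

impl InputFilter for BlockWord {
    fn filter(&self, text: &str) -> FilterResult {
        if text.contains(self.0) {
            FilterResult::Reject(format!("blocked word: {}", self.0))
        } else {
            FilterResult::Pass
        }
    }
}

/// Runs filters in order. Returns Err(reason) on the first rejection;
/// otherwise returns the text with bracketed warnings appended.
fn apply_filters(filters: &[Box<dyn InputFilter>], text: &str) -> Result<String, String> {
    let mut warnings = Vec::new();
    for f in filters {
        match f.filter(text) {
            FilterResult::Pass => {}
            FilterResult::Warn(w) => warnings.push(format!("[Warning: {w}]")),
            FilterResult::Reject(reason) => return Err(reason),
        }
    }
    if warnings.is_empty() {
        Ok(text.to_string())
    } else {
        Ok(format!("{text}\n{}", warnings.join("\n")))
    }
}
```

A rejection short-circuits the remaining filters, matching the early RETURN in the pseudocode, while warnings accumulate and travel with the prompt.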

agent_loop_continue (src/agent_loop/)

Purpose: Resume an agent run from existing context (no new prompts, continue from last user/tool-result message). Preconditions: context.messages is non-empty; last message is NOT an assistant message; context.agent_id and context.session_id are Some. Postconditions: Same as agent_loop.

FUNCTION agent_loop_continue(
  context: AgentContext,         // mutable
  config: AgentLoopConfig,
  tx: EventChannel<AgentEvent>,
  cancel: CancellationToken
) -> Vec<AgentMessage>

  [invariant: context.messages is non-empty]
  [invariant: context.messages.last().role != "assistant"]
  // Identity must carry over from the originating loop.
  // These are set by Agent::continue_loop_with_sender (or the direct caller who
  // bootstrapped the session). Silent UUID generation here would break traceability.
  [invariant: context.agent_id is Some]
  [invariant: context.session_id is Some]

  new_messages ← []

  // ── Classify existing messages into 2-stream model (if not already populated) ──
  IF context.user_context is empty AND context.inrun_context is empty THEN
    FOR EACH msg IN context.messages
      IF msg is User         → context.user_context.append(msg)
      IF msg is Assistant or ToolResult → context.inrun_context.append(Live(msg))
      // Extension messages go to neither stream
    END FOR
  END IF

  // ── before_loop hook ────────────────────────────────────────────────────
  IF config.before_loop defined AND NOT before_loop(context.messages, 0) THEN
    EMIT AgentEnd(messages=[])
    RETURN []
  END IF

  EMIT AgentStart {
    agent_id:          context.agent_id.unwrap(),
    session_id:        context.session_id.unwrap(),
    loop_id:           context.loop_id OR new_uuid(),
    parent_loop_id:    context.parent_loop_id,    // None for Default, Some for Rerun/Branch
    continuation_kind: context.continuation_kind,  // Default|Rerun|Branch|Compaction (ContinuationKind, not Option)
    config_snapshot:   Some(LoopConfigSnapshot from config),
    timestamp:         now()
  }

  loop_usage ← run_loop(context, new_messages, config, tx, cancel)

  EMIT AgentEnd(new_messages)

  // ── after_loop hook ──────────────────────────────────────────────────────
  IF config.after_loop defined THEN after_loop(new_messages, loop_usage) END IF

  RETURN new_messages

END FUNCTION
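
The classification step at the top of agent_loop_continue can be sketched as a small function. Msg, Streams, and classify are hypothetical names, and InRunEntry is reduced to its Live case; the routing rules (User to the user stream, Assistant and ToolResult to the in-run stream, Extension to neither) follow the pseudocode.

```rust
/// Simplified message kinds for the example.
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    User(String),
    Assistant(String),
    ToolResult(String),
    Extension(String),
}

/// Entry in the in-run stream; only the Live case is modelled here.
#[derive(Debug, Clone, PartialEq)]
enum InRunEntry {
    Live(Msg),
}

/// The two streams kept alongside the flat message history.
#[derive(Default)]
struct Streams {
    user_context: Vec<Msg>,
    inrun_context: Vec<InRunEntry>,
}

/// Populates both streams from a flat history, but only if both are still empty.
fn classify(messages: &[Msg], streams: &mut Streams) {
    if !streams.user_context.is_empty() || !streams.inrun_context.is_empty() {
        return; // already populated by an earlier loop
    }
    for msg in messages {
        match msg {
            Msg::User(_) => streams.user_context.push(msg.clone()),
            Msg::Assistant(_) | Msg::ToolResult(_) => {
                streams.inrun_context.push(InRunEntry::Live(msg.clone()))
            }
            // Extension messages go to neither stream.
            Msg::Extension(_) => {}
        }
    }
}
```

Running classify a second time on a populated Streams value is a no-op, which mirrors the guard in the pseudocode.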

run_loop (src/agent_loop/)

Purpose: The shared inner logic for both agent_loop and agent_loop_continue. Handles the outer follow-up loop and the inner per-turn loop (one LLM call plus its tool executions). Preconditions: Context contains at least one user message. Postconditions: new_messages contains all messages produced; the loop has exited cleanly or on limit/cancel/error.

FUNCTION run_loop(
  context: AgentContext,         // mutable
  new_messages: Vec<AgentMessage>,  // mutable accumulator
  config: AgentLoopConfig,
  tx: EventChannel<AgentEvent>,
  cancel: CancellationToken
) -> Usage  // accumulated usage across all turns

  first_turn ← true
  turn_number ← 0
  loop_usage ← Usage.default()
  tracker ← ExecutionTracker.new(config.execution_limits)  // optional

  // Drain any pending steering messages before starting
  pending ← config.get_steering_messages()  // or []

  // ── Outer loop: re-enters if follow-up messages arrive ──────────────────
  WHILE true
    IF cancel.is_cancelled THEN RETURN loop_usage END IF

    steering_after_tools ← null

    // ── Inner loop: runs once per turn (LLM call + tools) ─────────────────
    WHILE true
      IF cancel.is_cancelled THEN RETURN loop_usage END IF

      // Determine TurnTrigger for TurnStart event.
      // NOTE: context.continuation_kind is a ContinuationKind on AgentContext (not an Option).
      // Initial (the #[default]) marks the origin agent_loop call; any other variant
      // marks a continuation.
      //
      // Priority on the first turn:
      //   1. Branch continuation   → TurnTrigger::Branch   (explicit branch signal)
      //   2. Any other continuation (Default/Rerun/Compaction) → TurnTrigger::Continuation
      //      (the continuation itself is the follow-up, not a fresh user turn)
      //   3. Initial (origin agent_loop call) → config.first_turn_trigger
      //      (User for Agent::prompt, SubAgent for sub-agent callers)
      // Subsequent turns always use TurnTrigger::Continuation.
      IF first_turn THEN
        turn_trigger ←
          IF context.continuation_kind == Branch(..) THEN TurnTrigger::Branch
          ELSE IF context.continuation_kind != Initial   THEN TurnTrigger::Continuation
          ELSE config.first_turn_trigger
        first_turn ← false
      ELSE
        turn_trigger ← TurnTrigger::Continuation
      END IF

      EMIT TurnStart { turn_index: turn_number, triggered_by: turn_trigger }

      // Inject any pending (steering/follow-up) messages
      FOR EACH msg IN pending
        EMIT MessageStart(msg)
        EMIT MessageEnd(msg)
        context.messages.append(msg)
        new_messages.append(msg)
        context.user_context.append(msg)    // steering goes to user stream (never pruned)
      END FOR
      pending ← []

      // Check execution limits
      IF tracker.check_limits() is Some(reason) THEN
        limit_msg ← User message "[Agent stopped: {reason}]"
        EMIT MessageStart(limit_msg)
        EMIT MessageEnd(limit_msg)
        context.messages.append(limit_msg)
        new_messages.append(limit_msg)
        RETURN loop_usage
      END IF

      // Before-turn callback — abort if returns false
      IF config.before_turn is defined THEN
        IF NOT config.before_turn(context.messages, turn_number) THEN
          RETURN loop_usage
        END IF
      END IF
      turn_number ← turn_number + 1

      // Compact context if configured (strategies live in context_config.compaction)
      IF config.context_config is defined THEN
        ctx_config ← config.context_config
        IF tokens_exceed_threshold(context, ctx_config) THEN
          // before_compaction_start may veto compaction for this cycle
          proceed ← true
          IF config.before_compaction_start defined THEN
            proceed ← before_compaction_start(estimated_tokens, message_count)
          END IF
          IF proceed THEN
            EMIT CompactionStarted { ... }
            strategy ← ctx_config.compaction.in_memory_strategy OR DefaultCompaction
            context.messages ← strategy.compact(context.messages, ctx_config)
            EMIT CompactionEnded { ... }
            IF config.after_compaction_end defined THEN
              after_compaction_end(msgs_before, msgs_after, tokens_before, tokens_after)
            END IF
          END IF
        END IF
      END IF


      // ── LLM call ────────────────────────────────────────────────────────
      message ← AWAIT stream_assistant_response(context, config, tx, cancel)

      agent_msg ← message as AgentMessage
      context.messages.append(agent_msg)
      new_messages.append(agent_msg)
      context.inrun_context.append(Live(agent_msg))   // track in inrun stream (model-generated)

      // Accumulate usage for after_loop hook
      loop_usage ← loop_usage + message.usage

      // Handle error/abort stop reasons
      IF message.stop_reason == Error OR message.stop_reason == Aborted THEN
        IF message.stop_reason == Error AND config.on_error is defined THEN
          config.on_error(message.error_message OR "Unknown error")
        END IF
        IF config.after_turn is defined THEN
          config.after_turn(context.messages, message.usage)
        END IF
        EMIT TurnEnd(agent_msg, tool_results=[])
        RETURN loop_usage
      END IF

      // Extract tool calls from assistant content
      tool_calls ← [
        (id, name, arguments)
        FOR EACH content IN message.content
        IF content is ToolCall
      ]

      tool_results ← []

      IF tool_calls is non-empty THEN
        execution ← AWAIT execute_tool_calls(
          context.tools, tool_calls, tx, cancel,
          config.get_steering_messages, config.tool_execution
        )
        tool_results ← execution.tool_results
        steering_after_tools ← execution.steering_messages

        FOR EACH result IN tool_results
          am ← result as AgentMessage
          context.messages.append(am)
          new_messages.append(am)
          context.inrun_context.append(Live(am))   // track in inrun stream
        END FOR

        // Apply pending prun requests after tool execution (PrunTool stores requests during execute)
        IF config.prun_pending is defined THEN
          requests ← LOCK(config.prun_pending).drain()
          FOR EACH request IN requests
            apply_prun(context, request, tx)  // walks inrun_context backward, prunes Live entries
          END FOR
        END IF
      END IF

      // Record turn for limit tracking
      tracker.record_turn(message.usage.input + message.usage.output)

      // After-turn callback
      IF config.after_turn is defined THEN
        config.after_turn(context.messages, message.usage)
      END IF

      EMIT TurnEnd(agent_msg, tool_results)

      // Check for steering that arrived during tool execution
      IF steering_after_tools is non-empty THEN
        pending ← steering_after_tools
        CONTINUE inner loop
      END IF

      pending ← config.get_steering_messages()

      // Exit inner loop if no tool calls and no pending messages
      IF tool_calls is empty AND pending is empty THEN
        BREAK inner loop
      END IF

    END WHILE  // inner loop

    // Check for follow-up work
    follow_ups ← config.get_follow_up_messages()
    IF follow_ups is non-empty THEN
      pending ← follow_ups
      CONTINUE outer loop
    END IF

    BREAK outer loop

  END WHILE  // outer loop

  RETURN loop_usage

END FUNCTION
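
The TurnTrigger selection inside the inner loop is a pure function of three inputs, which makes it easy to sketch and test in isolation. The enums below reuse the variant names from the glossary, but their derives and the turn_trigger function are assumptions for this example.

```rust
/// Variant names follow the glossary; tags are kept as plain strings.
#[derive(Debug, Clone, PartialEq)]
enum ContinuationKind {
    Initial,
    Default,
    Rerun { tag: String },
    Branch { tag: String },
    Compaction,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TurnTrigger {
    User,
    SubAgent,
    Continuation,
    Branch,
}

/// Chooses the trigger reported in TurnStart.
/// First turn: Branch continuation maps to Branch, any other continuation maps to
/// Continuation, and Initial uses the configured first-turn trigger.
/// Every later turn is a Continuation.
fn turn_trigger(
    first_turn: bool,
    kind: &ContinuationKind,
    first_turn_trigger: TurnTrigger,
) -> TurnTrigger {
    if !first_turn {
        return TurnTrigger::Continuation;
    }
    match kind {
        ContinuationKind::Branch { .. } => TurnTrigger::Branch,
        ContinuationKind::Initial => first_turn_trigger,
        _ => TurnTrigger::Continuation,
    }
}
```

For example, the first turn of a Branch continuation reports Branch, while the second turn of the same loop reports Continuation regardless of how the loop started.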

stream_assistant_response (src/agent_loop/)

Purpose: Call the LLM with the current context, stream events to the channel, and return the final Message. Includes retry logic for transient errors. Preconditions: context.messages has at least one user message. Postconditions: Returns a complete Message::Assistant; events emitted include MessageStart, zero or more MessageUpdate, and MessageEnd.

FUNCTION stream_assistant_response(
  context: AgentContext,
  config: AgentLoopConfig,
  tx: EventChannel<AgentEvent>,
  cancel: CancellationToken
) -> Message

  // Build working context: merges user_context + live inrun_context + memos, sorted by timestamp.
  // Falls back to context.messages when prun streams are empty.
  base_messages ← context.build_working_context()

  // Apply optional context transform (e.g. for custom preprocessing)
  messages ← IF config.transform_context defined
              THEN config.transform_context(base_messages)
              ELSE base_messages

  // Filter to LLM-compatible messages (drop Extension messages)
  llm_messages ← IF config.convert_to_llm defined
                 THEN config.convert_to_llm(messages)
                 ELSE [m FOR m IN messages IF m is Llm variant]

  // Build tool schema list (schema only, no execute functions)
  tool_defs ← [
    ToolDefinition(name, description, parameters_schema)
    FOR EACH tool IN context.tools
  ]

  retry ← config.retry_config
  attempt ← 0

  // ── Retry loop ──────────────────────────────────────────────────────────
  WHILE true
    stream_config ← StreamConfig {
      model, system_prompt: context.system_prompt,
      messages: llm_messages, tools: tool_defs,
      thinking_level, api_key, max_tokens, temperature,
      model_config, cache_config
    }

    (stream_tx, stream_rx) ← new unbounded channel
    result ← AWAIT config.provider.stream(stream_config, stream_tx, cancel)

    MATCH result
      CASE Err(e) IF e.is_retryable()
                 AND attempt < retry.max_retries
                 AND NOT cancel.is_cancelled →
        attempt ← attempt + 1
        delay ← e.retry_after() OR retry.delay_for_attempt(attempt)
        log_retry(attempt, retry.max_retries, delay, e)
        AWAIT sleep(delay)
        CONTINUE  // retry

      CASE other →
        BREAK with (result, stream_rx)
    END MATCH
  END WHILE

  // ── Process streaming events ─────────────────────────────────────────────
  partial_message ← null

  FOR EACH stream_event IN stream_rx (drain available)
    MATCH stream_event
      CASE Start →
        placeholder ← empty Assistant message
        partial_message ← placeholder
        EMIT MessageStart(placeholder)

      CASE TextDelta(delta) →
        IF partial_message defined THEN
          EMIT MessageUpdate(partial_message, StreamDelta::Text(delta))
        END IF

      CASE ThinkingDelta(delta) →
        IF partial_message defined THEN
          EMIT MessageUpdate(partial_message, StreamDelta::Thinking(delta))
        END IF

      CASE ToolCallDelta(delta) →
        IF partial_message defined THEN
          EMIT MessageUpdate(partial_message, StreamDelta::ToolCallDelta(delta))
        END IF

      CASE Done(message) →
        am ← message as AgentMessage
        partial_message ← am
        // MessageStart was already emitted on Start
        EMIT MessageEnd(am)

      CASE Error(message) →
        am ← message as AgentMessage
        IF partial_message is null THEN
          EMIT MessageStart(am)
        END IF
        partial_message ← am
        EMIT MessageEnd(am)
    END MATCH
  END FOR

  // Return result
  MATCH result
    CASE Ok(msg) → RETURN msg
    CASE Err(e)  →
      RETURN Assistant {
        content: [Text("")],
        stop_reason: Error,
        model: config.model,
        provider: "unknown",
        usage: default,
        error_message: Some(e.to_string())
      }
  END MATCH

END FUNCTION

execute_tool_calls (src/agent_loop/)

Purpose: Dispatch a list of tool calls using the configured execution strategy.

Preconditions: tool_calls is non-empty.

Postconditions: Returns one ToolResult message per input tool call (in order); skipped tools produce error results.

FUNCTION execute_tool_calls(
  tools: Vec<AgentTool>,
  tool_calls: [(id, name, args)],
  tx: EventChannel<AgentEvent>,
  cancel: CancellationToken,
  get_steering: optional function,
  strategy: ToolExecutionStrategy
) -> ToolExecutionResult { tool_results, steering_messages }

  MATCH strategy

    CASE Sequential →
      RETURN AWAIT execute_sequential(tools, tool_calls, tx, cancel, get_steering)

    CASE Parallel →
      RETURN AWAIT execute_batch(tools, tool_calls, tx, cancel, get_steering)

    CASE Batched { size } →
      results ← []
      steering_messages ← null

      FOR EACH (batch_index, batch) IN enumerate(chunks(tool_calls, size))
        // Steering is checked here between batches, so none is passed down
        batch_result ← AWAIT execute_batch(tools, batch, tx, cancel, get_steering=null)
        results.extend(batch_result.tool_results)

        // Check steering between batches
        IF get_steering defined THEN
          steering ← get_steering()
          IF steering is non-empty THEN
            steering_messages ← steering
            // Skip remaining tool calls (none remain after the last batch)
            remaining_idx ← min((batch_index + 1) * size, tool_calls.count())
            FOR EACH (skip_id, skip_name, _) IN tool_calls[remaining_idx..]
              results.append(skip_tool_call(skip_id, skip_name, tx))
            END FOR
            BREAK
          END IF
        END IF
      END FOR

      RETURN { tool_results: results, steering_messages }

  END MATCH

END FUNCTION
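
The Batched branch above can be sketched in Rust as follows. This is a minimal, synchronous sketch under stated assumptions, not phi-core's implementation: run_batched, BatchedOutcome, and the string results are hypothetical stand-ins, and each chunk runs in a plain loop where the real code would await the whole batch concurrently.

```rust
/// Hypothetical outcome type: one result per call, plus any steering
/// messages that interrupted the run.
#[derive(Debug, PartialEq)]
struct BatchedOutcome {
    results: Vec<String>,
    steering: Option<Vec<String>>,
}

/// Runs `calls` in chunks of `size`, polling `get_steering` after each chunk.
/// `run_one` stands in for execute_single_tool (assumption for illustration).
fn run_batched(
    calls: &[&str],
    size: usize,
    mut run_one: impl FnMut(&str) -> String,
    mut get_steering: impl FnMut() -> Vec<String>,
) -> BatchedOutcome {
    let size = size.max(1); // slice::chunks panics on a chunk size of 0
    let mut results = Vec::with_capacity(calls.len());
    let mut steering = None;

    for (batch_index, batch) in calls.chunks(size).enumerate() {
        for &call in batch {
            results.push(run_one(call));
        }

        // Steering is only consulted between batches.
        let pending = get_steering();
        if !pending.is_empty() {
            steering = Some(pending);
            // Every call after this batch gets a "skipped" result so the
            // output still has one entry per input call.
            let remaining_idx = ((batch_index + 1) * size).min(calls.len());
            for skipped in &calls[remaining_idx..] {
                results.push(format!("skipped: {skipped}"));
            }
            break;
        }
    }

    BatchedOutcome { results, steering }
}
```

With five calls, a batch size of 2, and steering arriving after the first batch, the first two calls run and the remaining three are reported as skipped, so the result list keeps the same length and order as the input.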

execute_sequential (src/agent_loop/)

Purpose: Execute tool calls one at a time, checking for steering between each.

FUNCTION execute_sequential(
  tools, tool_calls, tx, cancel, get_steering
) -> ToolExecutionResult

  results ← []
  steering_messages ← null

  FOR EACH (index, (id, name, args)) IN enumerate(tool_calls)
    (result_msg, _) ← AWAIT execute_single_tool(tools, id, name, args, tx, cancel)
    results.append(result_msg)

    IF get_steering defined THEN
      steering ← get_steering()
      IF steering is non-empty THEN
        steering_messages ← steering
        // Skip remaining tool calls
        FOR EACH (skip_id, skip_name, _) IN tool_calls[index+1..]
          results.append(skip_tool_call(skip_id, skip_name, tx))
        END FOR
        BREAK
      END IF
    END IF
  END FOR

  RETURN { tool_results: results, steering_messages }

END FUNCTION

execute_batch (src/agent_loop/)

Purpose: Execute all tool calls in a batch concurrently, then check for steering.

FUNCTION execute_batch(
  tools, tool_calls, tx, cancel, get_steering
) -> ToolExecutionResult

  // Launch all tools concurrently
  futures ← [execute_single_tool(tools, id, name, args, tx, cancel)
             FOR EACH (id, name, args) IN tool_calls]

  batch_results ← AWAIT_ALL(futures)   // wait for all to complete
  results ← [msg FOR (msg, _) IN batch_results]

  // Check steering after all complete
  steering_messages ← null
  IF get_steering defined THEN
    steering ← get_steering()
    IF steering is non-empty THEN
      steering_messages ← steering
    END IF
  END IF

  RETURN { tool_results: results, steering_messages }

END FUNCTION

execute_single_tool (src/agent_loop/)

Purpose: Execute one tool call, emitting progress events and returning the result as a ToolResult message.

FUNCTION execute_single_tool(
  tools: Vec<AgentTool>,
  id: String, name: String, args: JSON,
  tx: EventChannel<AgentEvent>,
  cancel: CancellationToken,
  config: AgentLoopConfig   // for before/after_tool_execution* hooks
) -> (Message::ToolResult, is_error: bool)

  tool ← find tool WHERE tool.name() == name  // may be None

  // ── before_tool_execution hook ───────────────────────────────────────────
  // Return false to skip this tool call entirely.
  IF config.before_tool_execution defined THEN
    IF NOT before_tool_execution(name, id, args) THEN
      // Emit a skipped error result so the LLM knows the call did not run
      skip_result ← ToolResult{ content: [Text("Tool call skipped by before_tool_execution hook")], is_error: true }
      EMIT ToolExecutionEnd(id, name, skip_result, is_error=true, child_loop_id=None)
      msg ← Message::ToolResult{ ..., is_error: true }
      EMIT MessageStart(msg); EMIT MessageEnd(msg)
      RETURN (msg, true)
    END IF
  END IF

  EMIT ToolExecutionStart(tool_call_id=id, tool_name=name, args)

  // Build callbacks for streaming partial results.
  // Each on_update call runs through the before/after_tool_execution_update hooks.
  on_update ← callback(partial: ToolResult):
    // Extract text content for hooks
    text_content ← JOIN text blocks from partial.content
    // before_tool_execution_update — false suppresses the event
    emit ← IF config.before_tool_execution_update defined
               THEN before_tool_execution_update(name, id, text_content)
               ELSE true
    IF emit THEN
      EMIT ToolExecutionUpdate(id, name, partial_result=partial)
      // after_tool_execution_update — fires only when event was not suppressed
      IF config.after_tool_execution_update defined THEN
        after_tool_execution_update(name, id, text_content)
      END IF
    END IF
  on_progress ← callback that EMITS ProgressMessage(id, name, text)

  ctx ← ToolContext {
    tool_call_id: id,
    tool_name: name,
    cancel: cancel.child_token(),  // new child token, same lineage
    on_update: on_update,
    on_progress: on_progress
  }

  (result, is_error) ←
    IF tool found THEN
      MATCH AWAIT tool.execute(args, ctx)
        CASE Ok(r)  → (r, false)
        CASE Err(e) → (ToolResult{ content: [Text(e.to_string())] }, true)
      END MATCH
    ELSE
      (ToolResult{ content: [Text("Tool {name} not found")] }, true)
    END IF

  // child_loop_id is set by SubAgentTool; None for all other tools
  EMIT ToolExecutionEnd(id, name, result, is_error, child_loop_id: result.child_loop_id)

  // ── after_tool_execution hook ────────────────────────────────────────────
  IF config.after_tool_execution defined THEN
    after_tool_execution(name, id, is_error)
  END IF

  msg ← Message::ToolResult {
    tool_call_id: id, tool_name: name,
    content: result.content, is_error, timestamp: now_ms()
  }
  EMIT MessageStart(msg)
  EMIT MessageEnd(msg)

  RETURN (msg, is_error)

END FUNCTION

compact_messages (src/context/)

Note: The algorithm below describes the legacy in-memory compaction (compact_messages()). The current system uses a non-destructive overlay model via CompactionBlock / BlockCompactionStrategy. See compaction concept for the current design.

Purpose: Reduce context size using a 3-level strategy (Level 1 → 2 → 3) until messages fit the token budget.

Preconditions: messages is a complete conversation history.

Postconditions: Returns a subset/summary of messages with total_tokens(result) <= budget.

FUNCTION compact_messages(
  messages: Vec<AgentMessage>,
  config: ContextConfig
) -> Vec<AgentMessage>

  budget ← config.max_context_tokens - config.system_prompt_tokens

  // Already fits — return unchanged
  IF total_tokens(messages) <= budget THEN
    RETURN messages
  END IF

  // ── Level 1: Truncate verbose tool outputs ──────────────────────────────
  compacted ← level1_truncate_tool_outputs(messages, config.tool_output_max_lines)
  IF total_tokens(compacted) <= budget THEN
    RETURN compacted
  END IF

  // ── Level 2: Summarize old turns ────────────────────────────────────────
  compacted ← level2_summarize_old_turns(compacted, config.keep_recent)
  IF total_tokens(compacted) <= budget THEN
    RETURN compacted
  END IF

  // ── Level 3: Drop middle messages ───────────────────────────────────────
  RETURN level3_drop_middle(compacted, config, budget)

END FUNCTION

level1_truncate_tool_outputs (src/context/)

Note: The algorithm below describes the legacy in-memory compaction (compact_messages()). The current system uses a non-destructive overlay model via CompactionBlock / BlockCompactionStrategy. See compaction concept for the current design.

Purpose: Truncate long tool output text to head + tail, preserving message structure.

FUNCTION level1_truncate_tool_outputs(
  messages: Vec<AgentMessage>,
  max_lines: usize
) -> Vec<AgentMessage>

  RETURN [
    FOR EACH msg IN messages
      IF msg is ToolResult THEN
        // Truncate each Text content block
        new_content ← [
          FOR EACH content IN msg.content
            IF content is Text THEN
              Text { text: truncate_head_tail(content.text, max_lines) }
            ELSE
              content unchanged
            END IF
        ]
        ToolResult { ...msg, content: new_content }
      ELSE
        msg unchanged
      END IF
  ]

END FUNCTION

FUNCTION truncate_head_tail(text: String, max_lines: usize) -> String
  lines ← text.split_lines()
  IF lines.count() <= max_lines THEN
    RETURN text
  END IF

  head_count ← max_lines / 2
  tail_count ← max_lines - head_count
  omitted ← lines.count() - head_count - tail_count

  RETURN (
    lines[0..head_count].join("\n") +
    "\n\n[... {omitted} lines truncated ...]\n\n" +
    lines[lines.count()-tail_count..].join("\n")
  )
END FUNCTION
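
The head-plus-tail truncation above translates almost line for line into Rust. The sketch below is an illustration of the pseudocode, not the library's source; it assumes str::lines is an acceptable way to split the text, which drops a trailing newline.

```rust
/// Keeps the first `max_lines / 2` lines and the last
/// `max_lines - max_lines / 2` lines, replacing the middle with a marker.
fn truncate_head_tail(text: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() <= max_lines {
        return text.to_string();
    }

    let head_count = max_lines / 2;
    let tail_count = max_lines - head_count;
    let omitted = lines.len() - head_count - tail_count;

    format!(
        "{}\n\n[... {} lines truncated ...]\n\n{}",
        lines[..head_count].join("\n"),
        omitted,
        lines[lines.len() - tail_count..].join("\n")
    )
}
```

For an odd max_lines the extra line goes to the tail, so the most recent output (usually the part that matters for errors) is favored.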

level2_summarize_old_turns (src/context/)

Note: The algorithm below describes the legacy in-memory compaction (compact_messages()). The current system uses a non-destructive overlay model via CompactionBlock / BlockCompactionStrategy. See compaction concept for the current design.

Purpose: Keep the most recent keep_recent messages in full; replace older assistant-plus-tool-result groups with one-line summaries.

FUNCTION level2_summarize_old_turns(
  messages: Vec<AgentMessage>,
  keep_recent: usize
) -> Vec<AgentMessage>

  len ← messages.count()
  IF len <= keep_recent THEN RETURN messages END IF

  boundary ← len - keep_recent  // messages before this index are candidates

  result ← []
  i ← 0

  WHILE i < boundary
    msg ← messages[i]

    MATCH msg
      CASE Assistant(content) →
        // Build one-line summary
        short_texts ← [t FOR t IN text content IF t.len <= 200]
        tool_count  ← count of ToolCall blocks in content

        summary ←
          IF short_texts non-empty  → JOIN(short_texts)
          ELSE IF tool_count > 0    → "[Assistant used {tool_count} tool(s)]"
          ELSE                      → "[Assistant response]"

        result.append(User{ content: [Text("[Summary] {summary}")] })

        // Skip following ToolResult messages that belong to this turn
        i ← i + 1
        WHILE i < boundary AND messages[i] is ToolResult
          i ← i + 1
        END WHILE
        CONTINUE  // skip i++ below

      CASE ToolResult →
        // Skip orphaned tool results
        i ← i + 1
        CONTINUE

      CASE other →
        // Keep user messages and extension messages
        result.append(other)
    END MATCH

    i ← i + 1
  END WHILE

  // Append recent messages in full
  result.extend(messages[boundary..])
  RETURN result

END FUNCTION

level3_drop_middle (src/context/)

Note: The algorithm below describes the legacy in-memory compaction (compact_messages()). The current system uses a non-destructive overlay model via CompactionBlock / BlockCompactionStrategy. See compaction concept for the current design.

Purpose: Keep the first keep_first and last keep_recent messages; drop everything in between, inserting a marker.

FUNCTION level3_drop_middle(
  messages: Vec<AgentMessage>,
  config: ContextConfig,
  budget: usize
) -> Vec<AgentMessage>

  len ← messages.count()
  first_end   ← min(config.keep_first, len)
  recent_start ← len - min(config.keep_recent, len)  // saturating; avoids unsigned underflow

  IF first_end >= recent_start THEN
    // Not enough room to split — keep as many recent as fit
    RETURN keep_within_budget(messages, budget)
  END IF

  removed ← recent_start - first_end
  marker ← User { content: [Text("[Context compacted: {removed} messages removed to fit context window]")] }

  result ← messages[0..first_end] + [marker] + messages[recent_start..]

  IF total_tokens(result) > budget THEN
    RETURN keep_within_budget(result, budget)
  END IF

  RETURN result

END FUNCTION

FUNCTION keep_within_budget(messages, budget) -> Vec<AgentMessage>
  // Greedily keep most-recent messages that fit
  result ← []
  remaining ← budget

  FOR EACH msg IN REVERSE(messages)
    tokens ← message_tokens(msg)
    IF tokens > remaining THEN BREAK END IF
    remaining ← remaining - tokens
    result.prepend(msg)
  END FOR

  IF result.count() < messages.count() THEN
    removed ← messages.count() - result.count()
    result.prepend(User { content: [Text("[Context compacted: {removed} messages removed]")] })
  END IF

  RETURN result
END FUNCTION
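
A minimal Rust sketch of the keep_within_budget fallback, assuming messages are plain strings and the token cost is supplied by the caller. Both are simplifications for illustration; the real function works on AgentMessage values and message_tokens.

```rust
/// Greedily keeps the most recent messages whose combined cost fits `budget`.
/// `tokens` is a caller-supplied cost function (hypothetical parameter).
fn keep_within_budget(
    messages: &[String],
    budget: usize,
    tokens: impl Fn(&str) -> usize,
) -> Vec<String> {
    let mut remaining = budget;
    let mut kept_from = messages.len(); // index of the oldest kept message

    // Walk newest to oldest and stop at the first message that does not fit.
    for (i, msg) in messages.iter().enumerate().rev() {
        let cost = tokens(msg.as_str());
        if cost > remaining {
            break;
        }
        remaining -= cost;
        kept_from = i;
    }

    let removed = kept_from;
    let mut result = Vec::with_capacity(messages.len() - kept_from + 1);
    if removed > 0 {
        result.push(format!("[Context compacted: {removed} messages removed]"));
    }
    result.extend_from_slice(&messages[kept_from..]);
    result
}
```

As in the pseudocode, the marker itself is not charged against the budget, so the returned list can exceed the budget by the marker's own cost.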

estimate_tokens (src/context/)

Purpose: Fast heuristic token count for a text string.

FUNCTION estimate_tokens(text: String) -> usize
  RETURN ceil(text.byte_length() / 4)
  // Heuristic: ~4 UTF-8 bytes per token for English text.
  // Not precise — use tiktoken for exact counts.
END FUNCTION

FUNCTION content_tokens(content: Vec<Content>) -> usize
  total ← 0
  FOR EACH block IN content
    MATCH block
      CASE Text { text }          → total += estimate_tokens(text)
      CASE Image { data }         →
        raw_bytes ← data.base64_decoded_byte_length()
        // ~750 bytes per image token; floor 85, cap 16,000
        total += clamp(raw_bytes / 750, 85, 16_000)
      CASE Thinking { thinking }  → total += estimate_tokens(thinking)
      CASE ToolCall { name, args }→
        total += estimate_tokens(name) + estimate_tokens(args.to_string()) + 8
    END MATCH
  END FOR
  RETURN total
END FUNCTION

FUNCTION message_tokens(msg: AgentMessage) -> usize
  MATCH msg
    CASE Llm(User { content })            → RETURN content_tokens(content) + 4
    CASE Llm(Assistant { content })       → RETURN content_tokens(content) + 4
    CASE Llm(ToolResult { tool_name, content }) →
      RETURN content_tokens(content) + estimate_tokens(tool_name) + 8
    CASE Extension { data }               → RETURN estimate_tokens(data.to_string()) + 4
  END MATCH
END FUNCTION
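
The heuristics above reduce to a few lines of integer arithmetic. The sketch below assumes a simplified, hypothetical Block enum in place of the real Content type and covers only the Text, Image, and ToolCall cases. Note that ceil(bytes / 4) has to be computed as a ceiling division; plain integer division would round down.

```rust
/// Roughly 4 UTF-8 bytes per token, rounded up.
fn estimate_tokens(text: &str) -> usize {
    // str::len is a byte count; adding 3 before dividing gives the ceiling.
    (text.len() + 3) / 4
}

/// About 750 decoded bytes per image token, clamped to [85, 16_000].
fn image_tokens(raw_bytes: usize) -> usize {
    (raw_bytes / 750).clamp(85, 16_000)
}

/// Hypothetical, simplified stand-in for the library's content blocks.
enum Block {
    Text(String),
    Image { raw_bytes: usize },
    ToolCall { name: String, args_json: String },
}

fn content_tokens(blocks: &[Block]) -> usize {
    blocks
        .iter()
        .map(|block| match block {
            Block::Text(text) => estimate_tokens(text),
            Block::Image { raw_bytes } => image_tokens(*raw_bytes),
            // Fixed overhead of 8 for the tool-call framing, as above.
            Block::ToolCall { name, args_json } => {
                estimate_tokens(name) + estimate_tokens(args_json) + 8
            }
        })
        .sum()
}
```

Because the estimate counts bytes rather than characters, non-ASCII text is charged more per character than English text, which errs on the side of compacting early.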

4. Decision Logic

Tool Execution Strategy Dispatch

FUNCTION select_execution_strategy(strategy, tool_calls) -> ExecutionPath
  MATCH strategy
    CASE Sequential →
      // One at a time; check steering after each tool
      // Use when: tools have shared mutable state, need human-in-the-loop each step
      RETURN sequential_path

    CASE Parallel (default) →
      // All tools concurrently via join_all
      // Use when: tools are independent (most cases); lowest latency
      RETURN parallel_path

    CASE Batched { size } →
      // Groups of `size` concurrently; check steering between groups
      // Use when: tools are independent but human oversight between groups wanted
      RETURN batched_path(size)
  END MATCH
END FUNCTION

Compaction Level Selection

FUNCTION select_compaction_level(messages, config) -> CompactionAction
  budget ← config.max_context_tokens - config.system_prompt_tokens
  current ← total_tokens(messages)

  IF current <= budget          → RETURN NoCompaction
  ELSE IF level1 fits in budget → RETURN Level1 (truncate tool outputs)
  ELSE IF level2 fits in budget → RETURN Level2 (summarize old turns)
  ELSE                          → RETURN Level3 (drop middle)
END FUNCTION

StopReason Determination (in provider implementations)

FUNCTION determine_stop_reason(provider_stop_signal) -> StopReason
  MATCH provider_stop_signal
    CASE "end_turn" (Anthropic) | "stop" (OpenAI) | natural end → Stop
    CASE "max_tokens" (Anthropic) | "length" (OpenAI)           → Length
    CASE "tool_use" (Anthropic) | "tool_calls" (OpenAI)         → ToolUse
    CASE cancel token triggered                                  → Aborted
    CASE any provider error                                      → Error
  END MATCH
END FUNCTION
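
As a rough illustration, the mapping above can be written as a single match. The function name, the Option-based signature, the precedence given to cancellation, and the choice to map unrecognized signals to Error are all assumptions of this sketch; the source only lists the cases.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StopReason {
    Stop,
    Length,
    ToolUse,
    Aborted,
    Error,
}

/// `signal` is the provider's raw stop string, or None when the stream
/// simply ended. `cancelled` reflects the cancel token (checked first here,
/// which is an assumption of this sketch).
fn map_stop_signal(signal: Option<&str>, cancelled: bool) -> StopReason {
    if cancelled {
        return StopReason::Aborted;
    }
    match signal {
        Some("end_turn") | Some("stop") | None => StopReason::Stop,
        Some("max_tokens") | Some("length") => StopReason::Length,
        Some("tool_use") | Some("tool_calls") => StopReason::ToolUse,
        // Anything unrecognized is treated as an error in this sketch.
        Some(_) => StopReason::Error,
    }
}
```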

Input Filter Chain

FUNCTION apply_input_filters(filters, user_text) -> FilterChainResult
  warnings ← []

  FOR EACH filter IN filters
    MATCH filter.filter(user_text)
      CASE Pass     → continue
      CASE Warn(w)  → warnings.append(w)
      CASE Reject(r) →
        // First Reject wins — discards all accumulated warnings
        RETURN Rejected(r)
    END MATCH
  END FOR

  IF warnings non-empty THEN
    RETURN PassWithWarnings(warnings)
  END IF

  RETURN Pass
END FUNCTION
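
The filter chain above can be sketched with a small trait and two toy filters. The trait name InputFilter and the LengthWarning and BannedPhrase filters are hypothetical; the pseudocode only establishes a filter method and the Pass / Warn / Reject outcomes.

```rust
/// Verdict returned by a single filter.
enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

/// Combined verdict for the whole chain.
#[derive(Debug, PartialEq)]
enum FilterChainResult {
    Pass,
    PassWithWarnings(Vec<String>),
    Rejected(String),
}

/// Hypothetical trait name; only the `filter` method comes from the text.
trait InputFilter {
    fn filter(&self, text: &str) -> FilterResult;
}

fn apply_input_filters(filters: &[Box<dyn InputFilter>], text: &str) -> FilterChainResult {
    let mut warnings = Vec::new();
    for f in filters {
        match f.filter(text) {
            FilterResult::Pass => {}
            FilterResult::Warn(w) => warnings.push(w),
            // The first Reject wins and drops any warnings collected so far.
            FilterResult::Reject(r) => return FilterChainResult::Rejected(r),
        }
    }
    if warnings.is_empty() {
        FilterChainResult::Pass
    } else {
        FilterChainResult::PassWithWarnings(warnings)
    }
}

/// Toy filter: warns when the input is longer than `max` bytes.
struct LengthWarning {
    max: usize,
}

impl InputFilter for LengthWarning {
    fn filter(&self, text: &str) -> FilterResult {
        if text.len() > self.max {
            FilterResult::Warn(format!("input exceeds {} bytes", self.max))
        } else {
            FilterResult::Pass
        }
    }
}

/// Toy filter: rejects input containing a banned phrase.
struct BannedPhrase(&'static str);

impl InputFilter for BannedPhrase {
    fn filter(&self, text: &str) -> FilterResult {
        if text.contains(self.0) {
            FilterResult::Reject(format!("contains banned phrase: {}", self.0))
        } else {
            FilterResult::Pass
        }
    }
}
```

Filters run in registration order, so a rejecting filter placed later in the list still overrides warnings produced by earlier ones.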

Context Overflow Detection

FUNCTION detect_context_overflow(provider_error_or_message) -> bool

  // Path 1: HTTP error response
  IF error is ProviderError::ContextOverflow THEN RETURN true END IF

  // Path 2: SSE streaming error (Anthropic, OpenAI report overflow in-stream)
  IF message.stop_reason == Error
     AND message.error_message defined
     AND is_context_overflow_message(message.error_message)
  THEN RETURN true END IF

  RETURN false

  // Caller response: next turn will trigger compact_messages() if context_config set
END FUNCTION

3. Initialization & Lifecycle Sequences

Agent Construction (Builder Pattern)

SEQUENCE AgentConstruction
  1. BasicAgent::new(model_config: ModelConfig)
     - Stores model_config (provider identity: id, api_key, base_url, api protocol, cost rates)
     - Initializes messages = []
     - Initializes tools = []
     - Sets defaults: thinking = Off, tool_execution = Parallel, retry = default

  2. .with_system_prompt(text)
     - Stores system_prompt string

  3. .with_tools(vec)
     - Replaces or extends the tools list

  4. .with_context_config(config)
     - Enables automatic compaction before each turn

  5. .with_execution_limits(limits)
     - Enables turn/token/duration caps

  6. .with_skills(skill_set)
     - Appends skill XML index to system_prompt

  7. .with_mcp_server_stdio(cmd, args, env)     [async]
     - Spawns MCP subprocess
     - Calls initialize + tools/list over JSON-RPC
     - Wraps each discovered tool as McpToolAdapter (implements AgentTool)
     - Appends adapters to tools list

  8. .with_openapi_file/url/spec(...)           [async, feature-gated]
     - Parses OpenAPI spec
     - Generates one OpenApiToolAdapter per matching operation
     - Appends adapters to tools list

  9. Callbacks: .on_before_turn(f), .on_after_turn(f), .on_error(f)
     - Stores function pointers; run_loop invokes them before each turn,
       after each turn, and on error, respectively

  10. .with_input_filter(filter)
      - Appends to input_filters list

  11. .with_compaction_strategy(strategy)
      - Sets context_config.compaction.in_memory_strategy (custom compaction implementation)

END SEQUENCE

Agent Run Lifecycle

SEQUENCE AgentRun (invoked by agent.prompt("..."))
  1. Acquire run lock (ensure not already streaming)
     - is_streaming ← true
     - Create new CancellationToken

  2. Build AgentContext from current Agent state
     - Snapshot: system_prompt, messages (copy), tools

  3. Build AgentLoopConfig from current Agent config
     - Wire get_steering_messages → drain steering_queue
     - Wire get_follow_up_messages → drain follow_up_queue

  4. Create event channel (tx, rx)

  5. SPAWN async task: agent_loop(prompts, context, config, tx, cancel)

  6. Return rx to caller immediately (non-blocking)
     - Caller consumes events: AgentStart, TurnStart/End, MessageStart/Update/End,
       ToolExecutionStart/Update/End, ProgressMessage, AgentEnd

  7. When AgentEnd received or channel closes:
     - Merge new_messages into Agent.messages
     - is_streaming ← false
     - CancellationToken dropped

END SEQUENCE

Abort Lifecycle

SEQUENCE AgentAbort (invoked by agent.abort())
  1. IF cancel token exists THEN
       cancel.cancel()  // signals all child tokens
  2. Agent loop checks cancel.is_cancelled() at:
     - Start of each outer/inner loop iteration
     - In BashTool's tokio::select! race
     - In ReadFileTool/WriteFileTool/EditFileTool before each I/O op
  3. Loop exits cleanly at the next check point.
     Whether AgentEnd is emitted on abort depends on where in the loop
     cancellation is detected; Start/Done events from the provider may still
     arrive before cancellation is noticed, so consumers should also treat
     channel close as the end of the run.
END SEQUENCE

Message Persistence

SEQUENCE MessagePersistence
  Save:
    1. agent.save_messages() → serde_json::to_string(agent.messages)
    2. Caller writes JSON string to disk/storage

  Restore:
    1. Caller reads JSON string from disk/storage
    2. agent.restore_messages(json_str) → serde_json::from_str(json_str) → Vec<AgentMessage>
    3. Agent.messages ← deserialized messages
    4. Next agent.prompt() continues from restored history

  All types in AgentMessage tree derive Serialize + Deserialize.
  JSON format: array of untagged AgentMessage items;
    Llm variant: has "role" field ("user", "assistant", "toolResult")
    Extension variant: has "role" field "extension" + "kind" + "data"
END SEQUENCE


BasicAgent::new and BasicAgent::prompt (src/agents/basic_agent.rs)

Purpose: Construct a BasicAgent and start a run. These are the primary application-facing entry points.

FUNCTION BasicAgent::new(model_config: ModelConfig) -> BasicAgent
  RETURN BasicAgent {
    model_config: model_config,       // complete provider identity: id, api_key, base_url, api, cost
    system_prompt: "",
    thinking_level: Off,
    max_tokens: None,
    temperature: None,
    messages: [],
    tools: [],
    steering_queue: Arc(Mutex([])),
    follow_up_queue: Arc(Mutex([])),
    steering_mode: QueueMode::OneAtATime,
    follow_up_mode: QueueMode::OneAtATime,
    context_config: Some(ContextConfig::default()),
    execution_limits: Some(ExecutionLimits::default()),
    cache_config: CacheConfig::default(),
    tool_execution: Parallel,
    retry_config: RetryConfig::default(),
    before_turn: None,
    after_turn: None,
    on_error: None,
    input_filters: [],
    // compaction strategies are now inside context_config.compaction (G5)
    cancel: None,
    is_streaming: false
  }
END FUNCTION

FUNCTION Agent::prompt(text: String) -> UnboundedReceiver<AgentEvent>
  RETURN Agent::prompt_messages([AgentMessage::Llm(Message::user(text))])
END FUNCTION

FUNCTION Agent::prompt_messages(messages: Vec<AgentMessage>) -> UnboundedReceiver<AgentEvent>
  (tx, rx) ← new unbounded channel
  SPAWN Agent::prompt_messages_with_sender(messages, tx)
  RETURN rx
END FUNCTION

FUNCTION Agent::prompt_messages_with_sender(
  messages: Vec<AgentMessage>,
  tx: EventSender<AgentEvent>
) [async]

  // Guard: panics if already streaming
  ASSERT NOT self.is_streaming,
    "Agent is already streaming. Use steer() or follow_up()."

  self.is_streaming ← true
  self.cancel ← Some(CancellationToken::new())

  // Build context snapshot for this run
  context ← AgentContext {
    system_prompt: self.system_prompt.clone(),
    messages: self.messages.clone(),
    tools: self.tools  // borrowed
  }

  // Wire queue closures — capture Arc pointers
  steering_arc ← Arc::clone(self.steering_queue)
  followup_arc ← Arc::clone(self.follow_up_queue)

  config ← AgentLoopConfig {
    provider: self.provider,
    model: self.model,
    api_key: self.api_key,
    thinking_level: self.thinking_level,
    max_tokens: self.max_tokens,
    temperature: self.temperature,
    model_config: self.model_config,
    get_steering_messages: closure {
      LOCK(steering_arc)
      MATCH self.steering_mode
        CASE OneAtATime → IF queue non-empty THEN [queue.remove(0)] ELSE []
        CASE All        → queue.drain_all()
      UNLOCK
    },
    get_follow_up_messages: closure {
      LOCK(followup_arc)
      MATCH self.follow_up_mode
        CASE OneAtATime → IF queue non-empty THEN [queue.remove(0)] ELSE []
        CASE All        → queue.drain_all()
      UNLOCK
    },
    context_config: self.context_config,  // includes compaction strategies (G5)
    execution_limits: self.execution_limits,
    cache_config: self.cache_config,
    tool_execution: self.tool_execution,
    retry_config: self.retry_config,
    before_turn: self.before_turn,
    after_turn: self.after_turn,
    on_error: self.on_error,
    input_filters: self.input_filters
  }

  new_messages ← AWAIT agent_loop(messages, context, config, tx, self.cancel.unwrap())

  // Merge new messages back into Agent.messages
  self.messages.extend(new_messages)

  self.is_streaming ← false
  self.cancel ← None

END FUNCTION

5. Concurrency & Async Patterns

Parallel Tool Execution

PATTERN ParallelToolExecution
  // When Parallel strategy is used, all tool calls race concurrently.
  // This is safe because:
  //   1. Tools share no mutable state (each has its own ToolContext)
  //   2. Each ToolContext gets a child cancellation token (same lineage, independent trigger)
  //   3. The event channel (tx) is cloned into each ToolContext — Unbounded sends never block
  //   4. Results are collected in original order via join_all (preserves tool_call ordering)

  futures ← [execute_single_tool(id, name, args) FOR EACH (id, name, args) IN tool_calls]
  results ← AWAIT_ALL(futures)   // futures::join_all — waits for ALL, order preserved
  // Steering is checked AFTER all complete (cannot interrupt mid-batch in Parallel mode)

PATTERN SequentialToolExecution
  // Tools run one at a time; steering is checked after each.
  // Use when tools access shared resources (e.g., same file, same database row).

PATTERN BatchedToolExecution
  // Groups of N run in parallel; steering checked between groups.
  // Balances latency (N concurrent) with control (interrupt between groups).
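
The order-preserving property of the Parallel pattern can be demonstrated with the standard library alone. The sketch below uses std::thread::scope as a stand-in for futures::join_all on an async runtime; that swap is an assumption made to keep the example dependency-free, and run_all_in_order is a hypothetical name.

```rust
use std::thread;

/// Runs `work` on every input concurrently and returns the results in input
/// order, regardless of which worker finishes first.
fn run_all_in_order<T, R, F>(inputs: Vec<T>, work: F) -> Vec<R>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    let work = &work;
    thread::scope(|scope| {
        // Spawn one scoped thread per input; handles stay in input order.
        let handles: Vec<_> = inputs
            .into_iter()
            .map(|input| scope.spawn(move || work(input)))
            .collect();

        // Joining in spawn order is what preserves the ordering.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Even when the first input takes the longest, its result still comes back first, which mirrors why tool results line up with the original tool_calls under the Parallel strategy.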

Cancellation Token Propagation

PATTERN CancellationPropagation
  // CancellationToken forms a tree. Cancelling a parent cancels all children.

  Agent.cancel (root token)
    └── AgentLoop cancel (same token passed in)
          └── ToolContext.cancel (child_token() — inherits from parent)
                └── SubAgentTool: forwards parent cancel to child agent_loop()

  // Checks occur at:
  //   - Top of each loop iteration in run_loop (fast path)
  //   - tokio::select! in BashTool (races against timeout)
  //   - Explicit is_cancelled() checks in ReadFileTool, WriteFileTool, EditFileTool

  // Important: abort() on Agent cancels ALL in-progress tool calls simultaneously,
  // regardless of execution strategy.

Event Channel Architecture

PATTERN EventChannelArchitecture
  // One logical producer (AgentLoop, plus the tx clones held by its tool contexts), single consumer (caller).
  // Channel: tokio::sync::mpsc::unbounded_channel — never blocks sender.

  AgentLoop ──tx──→ UnboundedChannel ──rx──→ Application

  // Sub-agent events are NOT directly forwarded to parent channel.
  // SubAgentTool spawns a separate task to translate sub-agent events:
  //   AgentEvent::MessageUpdate(Text(delta)) → on_update(ToolResult{text:delta})
  //   AgentEvent::ProgressMessage{text}      → on_progress(text)
  // These are then emitted to the parent channel as ToolExecutionUpdate/ProgressMessage.

  // This means: parent sees sub-agent activity but via ToolExecutionUpdate wrappers,
  // NOT as nested AgentStart/AgentEnd/TurnStart/TurnEnd events.

Steering Queue Thread Safety

PATTERN SteeringQueueSafety
  // steering_queue and follow_up_queue are Arc<Mutex<Vec<AgentMessage>>>.

  // Write path (application thread):
  //   agent.steer(msg)     → LOCK(queue), queue.push(msg), UNLOCK
  //   agent.follow_up(msg) → LOCK(follow_up_queue), queue.push(msg), UNLOCK

  // Read path (agent loop task) — behavior depends on QueueMode:
  //   QueueMode::OneAtATime (default):
  //     LOCK(queue), msg = queue.remove(0) if non-empty, UNLOCK, return [msg] (or [] when empty)
  //     → delivers at most one message per check; rest remain for next check
  //   QueueMode::All:
  //     LOCK(queue), msgs = queue.drain_all(), UNLOCK, return msgs
  //     → delivers everything at once

  // Steering reads happen only between tool executions, and follow-up reads only when the
  // outer loop would otherwise exit, so a read never runs concurrently with another read.
  // No deadlock risk: lock is held for microseconds (no I/O inside lock).
  // No data race: Mutex guarantees exclusive access.

  // Queues are passed to AgentLoopConfig as closures capturing the Arc pointer,
  // so the external caller can enqueue messages from any thread at any time.
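
The two queue modes and the closure-over-Arc wiring can be sketched with std types. SharedQueue, make_reader, and enqueue are hypothetical names, and messages are plain strings here rather than AgentMessage values.

```rust
use std::sync::{Arc, Mutex};

#[derive(Clone, Copy)]
enum QueueMode {
    OneAtATime,
    All,
}

type SharedQueue = Arc<Mutex<Vec<String>>>;

/// Builds the read-side closure handed to the loop config. It owns a clone
/// of the Arc, so the application can keep pushing through its own handle.
fn make_reader(queue: SharedQueue, mode: QueueMode) -> impl Fn() -> Vec<String> {
    move || {
        // The lock is held only for the in-memory drain below (no I/O).
        let mut q = queue.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
        match mode {
            QueueMode::OneAtATime => {
                if q.is_empty() {
                    Vec::new()
                } else {
                    vec![q.remove(0)]
                }
            }
            QueueMode::All => std::mem::take(&mut *q),
        }
    }
}

/// Write side: callable from any thread that holds a clone of the Arc.
fn enqueue(queue: &SharedQueue, msg: &str) {
    queue
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
        .push(msg.to_string());
}
```

OneAtATime hands back at most the oldest message per check and leaves the rest queued; All empties the queue in a single check.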

delay_for_attempt (src/provider/retry.rs)

Purpose: Compute the sleep duration before a retry attempt using exponential backoff with jitter.

FUNCTION delay_for_attempt(config: RetryConfig, attempt: usize) -> Duration
  // attempt is 1-indexed
  base_ms ← config.initial_delay_ms * (config.backoff_multiplier ^ (attempt - 1))
  capped_ms ← min(base_ms, config.max_delay_ms)

  // ±20% uniform jitter: multiply by random value in [0.8, 1.2]
  jitter ← 0.8 + random_float_0_to_1() * 0.4
  delay_ms ← floor(capped_ms * jitter)

  RETURN Duration::from_ms(delay_ms)

  // Examples with defaults (initial=1000ms, multiplier=2.0, max=30000ms):
  //   attempt 1 → base=1000ms  → ~800–1200ms
  //   attempt 2 → base=2000ms  → ~1600–2400ms
  //   attempt 3 → base=4000ms  → ~3200–4800ms
END FUNCTION
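
A Rust sketch of the backoff formula above. To keep it deterministic and dependency-free, the random draw is passed in as unit_random (a value in [0, 1]); that parameter and the trimmed-down RetryConfig struct are assumptions of this sketch, not the library's signature.

```rust
use std::time::Duration;

/// Hypothetical subset of the retry configuration used by the formula.
struct RetryConfig {
    initial_delay_ms: u64,
    backoff_multiplier: f64,
    max_delay_ms: u64,
}

/// `attempt` is 1-indexed; `unit_random` is a caller-supplied draw in [0, 1].
fn delay_for_attempt(config: &RetryConfig, attempt: u32, unit_random: f64) -> Duration {
    let exponent = attempt.saturating_sub(1) as i32;
    let base_ms = config.initial_delay_ms as f64 * config.backoff_multiplier.powi(exponent);
    let capped_ms = base_ms.min(config.max_delay_ms as f64);

    // Plus or minus 20% jitter: scale by a factor in [0.8, 1.2].
    let jitter = 0.8 + unit_random.clamp(0.0, 1.0) * 0.4;
    Duration::from_millis((capped_ms * jitter).floor() as u64)
}
```

The cap is applied before the jitter, so with the defaults a late attempt can sleep for up to roughly 1.2 × max_delay_ms rather than exactly max_delay_ms.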

ProviderError::classify (src/provider/traits.rs)

Purpose: Map an HTTP error response to the correct ProviderError variant.

FUNCTION ProviderError::classify(status: u16, message: String) -> ProviderError

  IF is_context_overflow(status, message) THEN
    RETURN ContextOverflow { message }
  END IF

  IF status == 429 THEN
    RETURN RateLimited { retry_after_ms: None }
  END IF

  IF status == 401 OR status == 403 THEN
    RETURN Auth(message)
  END IF

  RETURN Api(message)

END FUNCTION

FUNCTION is_context_overflow(status: u16, message: String) -> bool
  // Some providers (Cerebras, Mistral) return 400/413 with empty body
  IF (status == 400 OR status == 413) AND message.trim() is empty THEN
    RETURN true
  END IF
  lower ← message.to_lowercase()
  RETURN any of OVERFLOW_PHRASES is a substring of lower

  // OVERFLOW_PHRASES includes:
  //   "prompt is too long"          (Anthropic)
  //   "input is too long"           (Bedrock)
  //   "exceeds the context window"  (OpenAI)
  //   "exceeds the maximum"         (Google)
  //   "maximum prompt length"       (xAI)
  //   "reduce the length of the messages" (Groq)
  //   "maximum context length"      (OpenRouter)
  //   "context length exceeded"     (generic)
  //   "too many tokens"             (generic)
  //   ... 15 phrases total

END FUNCTION
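
The classification order above can be sketched in Rust as follows. The enum is reduced to the four variants the pseudocode returns, and `OVERFLOW_PHRASES` contains only the nine phrases listed above (the source states there are 15 in total), so treat this as an illustration rather than the code in src/provider/traits.rs.

```rust
#[derive(Debug, PartialEq)]
pub enum ProviderError {
    ContextOverflow { message: String },
    RateLimited { retry_after_ms: Option<u64> },
    Auth(String),
    Api(String),
}

// Subset only: the nine phrases quoted in the pseudocode comments.
const OVERFLOW_PHRASES: &[&str] = &[
    "prompt is too long",
    "input is too long",
    "exceeds the context window",
    "exceeds the maximum",
    "maximum prompt length",
    "reduce the length of the messages",
    "maximum context length",
    "context length exceeded",
    "too many tokens",
];

fn is_context_overflow(status: u16, message: &str) -> bool {
    // Some providers return 400/413 with an empty body on overflow.
    if (status == 400 || status == 413) && message.trim().is_empty() {
        return true;
    }
    let lower = message.to_lowercase();
    OVERFLOW_PHRASES.iter().any(|p| lower.contains(*p))
}

impl ProviderError {
    pub fn classify(status: u16, message: String) -> ProviderError {
        // The overflow check runs first, so it wins over the status-code checks.
        if is_context_overflow(status, &message) {
            return ProviderError::ContextOverflow { message };
        }
        match status {
            429 => ProviderError::RateLimited { retry_after_ms: None },
            401 | 403 => ProviderError::Auth(message),
            _ => ProviderError::Api(message),
        }
    }
}
```

Checking for overflow before the status code means a 400 carrying an overflow phrase is reported as `ContextOverflow` rather than as a generic `Api` error.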

SubAgentTool::execute (src/agents/sub_agent.rs)

Purpose: Delegate a task to an isolated child agent loop, return its final text as a ToolResult.

Preconditions: params.task is a non-empty string.

Postconditions: Returns final assistant text from the child run; child context is discarded.

FUNCTION SubAgentTool::execute(
  params: JSON,
  ctx: ToolContext
) -> Result<ToolResult, ToolError>

  task ← params["task"] as String  // ERROR "Missing required 'task' parameter" if absent
  cancel ← ctx.cancel
  on_update ← ctx.on_update
  on_progress ← ctx.on_progress

  // Build fresh child context (no history carried over)
  child_context ← AgentContext {
    system_prompt: self.system_prompt,
    messages: [],              // isolated — starts empty
    tools: self.tools          // child has its own toolset (no SubAgentTool instances)
  }

  child_config ← AgentLoopConfig {
    provider: self.provider,
    model: self.model,
    api_key: self.api_key,
    thinking_level: self.thinking_level,
    max_tokens: self.max_tokens,
    execution_limits: {
      max_turns: self.max_turns,       // primary guard (default: 10)
      max_total_tokens: 1_000_000,     // generous fallback
      max_duration: 300s               // generous fallback
    },
    // No steering, no follow-ups, no input filters in sub-agents
    get_steering_messages: null,
    get_follow_up_messages: null,
    input_filters: [],
    ...other config from self
  }

  (event_tx, event_rx) ← new unbounded channel

  // Forward events to parent if callbacks are present
  IF on_update defined OR on_progress defined THEN
    forwarder ← SPAWN async task:
      WHILE event ← event_rx.recv()
        IF event is ProgressMessage { text } THEN
          on_progress(text)  // if defined
        END IF
        IF event is MessageUpdate { delta: Text(delta) } THEN
          on_update(ToolResult{ content: [Text(delta)] })
        END IF
        IF event is ToolExecutionStart { tool_name } THEN
          on_update(ToolResult{ content: [Text("[sub-agent calling tool: {tool_name}]")] })
        END IF
      END WHILE
  END IF

  prompt_msg ← AgentMessage::Llm(Message::User(task))
  new_messages ← AWAIT agent_loop([prompt_msg], child_context, child_config, event_tx, cancel)

  IF forwarder defined THEN AWAIT forwarder END IF

  // Extract final assistant text
  result_text ← extract_final_text(new_messages)

  RETURN Ok(ToolResult {
    content: [Text(result_text)],
    details: { sub_agent: self.tool_name, turns: new_messages.count() }
  })

END FUNCTION

FUNCTION extract_final_text(messages: Vec<AgentMessage>) -> String
  FOR EACH msg IN REVERSE(messages)
    IF msg is Assistant THEN
      texts ← [t FOR t IN msg.content IF t is Text]
      IF texts non-empty THEN
        RETURN JOIN(texts)
      END IF
    END IF
  END FOR
  RETURN "(sub-agent produced no text output)"
END FUNCTION
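
The reverse scan in extract_final_text can be sketched in Rust like this. The `Content` and `AgentMessage` enums are simplified stand-ins for phi-core's types, and joining the text parts with no separator is an assumption, since the pseudocode does not say what JOIN uses.

```rust
#[derive(Debug, Clone)]
pub enum Content {
    Text(String),
    ToolCall { name: String },
}

/// Simplified stand-in for phi-core's message type (assumption).
#[derive(Debug, Clone)]
pub enum AgentMessage {
    User(String),
    Assistant(Vec<Content>),
    ToolResult(String),
}

pub fn extract_final_text(messages: &[AgentMessage]) -> String {
    // Walk backwards so the most recent assistant message with text wins.
    for msg in messages.iter().rev() {
        if let AgentMessage::Assistant(content) = msg {
            let texts: Vec<&str> = content
                .iter()
                .filter_map(|c| match c {
                    Content::Text(t) => Some(t.as_str()),
                    _ => None,
                })
                .collect();
            if !texts.is_empty() {
                // Assumed separator: none (plain concatenation).
                return texts.concat();
            }
        }
    }
    "(sub-agent produced no text output)".to_string()
}
```

An assistant message that holds only tool calls is skipped, so the scan falls back to an earlier message that does contain text.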

BashTool::execute (src/tools/bash.rs)

Purpose: Execute a shell command, capture output, enforce safety.

Preconditions: params.command is present.

Postconditions: Returns Ok(ToolResult) even for non-zero exit codes (LLM needs the error to self-correct).

FUNCTION BashTool::execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>

  command ← params["command"] as String  // InvalidArgs if missing
  cancel ← ctx.cancel

  // Safety: check deny patterns (substring match)
  FOR EACH pattern IN self.deny_patterns
    IF command contains pattern THEN
      RETURN Err(Failed("Command blocked by safety policy: contains '{pattern}'"))
    END IF
  END FOR

  // Optional confirmation callback
  IF self.confirm_fn defined AND NOT self.confirm_fn(command) THEN
    RETURN Err(Failed("Command was not confirmed by the user."))
  END IF

  // Build subprocess: bash -c "{command}"
  cmd ← Command("bash", ["-c", command])
  IF self.cwd defined THEN cmd.current_dir(self.cwd) END IF
  cmd.stdout(piped), cmd.stderr(piped)

  // Race: cancellation vs timeout vs command completion
  spawn_result ← SELECT {
    cancel.cancelled()          → RETURN Err(Cancelled)
    sleep(self.timeout)         → RETURN Err(Failed("Command timed out after {N}s"))
    cmd.output()                → io_result  // Err if the process could not be spawned
  }

  output ← spawn_result  // Err(e) → RETURN Err(Failed("Failed to execute: {e}"))

  stdout ← output.stdout as utf8 (lossy)
  stderr ← output.stderr as utf8 (lossy)

  // Truncate at limit
  IF stdout.len > self.max_output_bytes THEN
    stdout ← stdout[0..max_output_bytes] + "\n... (output truncated)"
  END IF
  IF stderr.len > self.max_output_bytes THEN
    stderr ← stderr[0..max_output_bytes] + "\n... (output truncated)"
  END IF

  exit_code ← output.exit_code OR -1

  text ←
    IF stderr is empty THEN
      "Exit code: {exit_code}\n{stdout}"
    ELSE
      "Exit code: {exit_code}\nSTDOUT:\n{stdout}\nSTDERR:\n{stderr}"
    END IF

  // Always Ok — non-zero exit is NOT a ToolError
  RETURN Ok(ToolResult {
    content: [Text(text)],
    details: { exit_code, success: exit_code == 0 }
  })

END FUNCTION
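
The pure parts of this function (deny-pattern check, truncation, and result formatting) can be sketched in Rust without spawning a process. The helper names below are hypothetical. One deliberate difference from the pseudocode: the truncation backs up to a UTF-8 character boundary, because slicing a Rust `&str` at an arbitrary byte offset panics when the offset falls inside a multi-byte character.

```rust
/// Returns the first deny pattern contained in `command` (substring match).
pub fn blocked_by<'a>(command: &str, deny_patterns: &[&'a str]) -> Option<&'a str> {
    deny_patterns.iter().copied().find(|p| command.contains(*p))
}

/// Truncate to at most `max_bytes` bytes without splitting a UTF-8 character.
pub fn truncate_output(s: &str, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s.to_string();
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}\n... (output truncated)", &s[..end])
}

/// Build the text shown to the model; a missing exit code is reported as -1.
pub fn format_output(exit_code: Option<i32>, stdout: &str, stderr: &str) -> String {
    let code = exit_code.unwrap_or(-1);
    if stderr.is_empty() {
        format!("Exit code: {code}\n{stdout}")
    } else {
        format!("Exit code: {code}\nSTDOUT:\n{stdout}\nSTDERR:\n{stderr}")
    }
}
```

Substring deny patterns are a coarse filter: they block obvious cases but can be bypassed by rephrasing a command, which is presumably why the optional confirmation callback exists as a second gate.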

ReadFileTool::execute (src/tools/file.rs)

Purpose: Read a file's contents. Routes to binary (image) or text path based on extension.

FUNCTION ReadFileTool::execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>

  path ← params["path"] as String  // InvalidArgs if missing

  IF ctx.cancel.is_cancelled THEN RETURN Err(Cancelled) END IF

  metadata ← AWAIT fs.metadata(path)  // Err → Failed("Cannot access {path}: {e}")

  IF is_image_extension(path) THEN
    // ── Image path ────────────────────────────────────────────────────────
    IF metadata.size > 20MB THEN
      RETURN Err(Failed("Image too large"))
    END IF
    bytes ← AWAIT fs.read(path)
    data ← base64_encode(bytes)
    mime_type ← get_mime_type(path)
    RETURN Ok(ToolResult {
      content: [Image { data, mime_type }],
      details: { path, bytes: bytes.len() }
    })
  END IF

  // ── Text path ─────────────────────────────────────────────────────────
  IF metadata.size > self.max_bytes THEN
    RETURN Err(Failed("File too large. Use offset/limit for partial reads."))
  END IF

  content ← AWAIT fs.read_to_string(path)
  lines ← content.split_lines()
  total ← lines.count()

  offset ← params["offset"] as usize (1-indexed)  // optional, default: 1
  limit  ← params["limit"]  as usize               // optional, default: all

  (start, end) ← compute_range(offset, limit, total)

  // Line-numbered output: "   1 | first line"
  numbered ← ["{start+i+1:>4} | {line}" FOR (i, line) IN enumerate(lines[start..end])]

  header ←
    IF start > 0 OR end < total THEN
      "[Lines {start+1}-{end} of {total}]"
    ELSE
      "[{total} lines]"
    END IF

  RETURN Ok(ToolResult {
    content: [Text("{header}\n{numbered.join('\n')}")],
    details: { path }
  })

END FUNCTION
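
The text path's range handling and line numbering can be sketched in Rust as below. The pseudocode names compute_range but does not show its body, so the semantics here (1-indexed offset, 0-indexed half-open result clamped to the line count) are an assumption, and `render` is a hypothetical helper that covers only the formatting step.

```rust
/// Assumed semantics: `offset` is 1-indexed, `limit` caps the number of lines;
/// the result is a 0-indexed half-open range clamped to `total`.
pub fn compute_range(offset: Option<usize>, limit: Option<usize>, total: usize) -> (usize, usize) {
    let start = offset.unwrap_or(1).saturating_sub(1).min(total);
    let end = match limit {
        Some(n) => start.saturating_add(n).min(total),
        None => total,
    };
    (start, end)
}

/// Produce the header plus line-numbered body for already-loaded text.
pub fn render(content: &str, offset: Option<usize>, limit: Option<usize>) -> String {
    let lines: Vec<&str> = content.lines().collect();
    let total = lines.len();
    let (start, end) = compute_range(offset, limit, total);
    let numbered: Vec<String> = lines[start..end]
        .iter()
        .enumerate()
        .map(|(i, line)| format!("{:>4} | {}", start + i + 1, line))
        .collect();
    let header = if start > 0 || end < total {
        format!("[Lines {}-{} of {}]", start + 1, end, total)
    } else {
        format!("[{} lines]", total)
    };
    format!("{}\n{}", header, numbered.join("\n"))
}
```

Clamping both ends keeps an out-of-range offset from panicking; it simply yields an empty slice.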

EditFileTool::execute (src/tools/edit.rs)

Purpose: Make a surgical search-and-replace edit in an existing file.

Preconditions: File exists; old_text occurs exactly once in the file.

Postconditions: File on disk has exactly the one occurrence of old_text replaced by new_text.

FUNCTION EditFileTool::execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>

  path     ← params["path"]     as String  // InvalidArgs if missing
  old_text ← params["old_text"] as String  // InvalidArgs if missing
  new_text ← params["new_text"] as String  // InvalidArgs if missing

  IF ctx.cancel.is_cancelled THEN RETURN Err(Cancelled) END IF

  content ← AWAIT fs.read_to_string(path)
  // Err → Failed("Cannot read {path}. Use write_file to create new files.")

  match_count ← count of occurrences of old_text in content

  IF match_count == 0 THEN
    // Provide helpful fuzzy hint
    hint ← find_similar_text(content, old_text)
    IF hint defined THEN
      message ← "old_text not found in {path}.\n\nDid you mean:\n```\n{hint}\n```\n..."
    ELSE
      message ← "old_text not found in {path}.\n\nTip: Use read_file to see contents..."
    END IF
    RETURN Err(Failed(message))
  END IF

  IF match_count > 1 THEN
    RETURN Err(Failed(
      "old_text matches {match_count} locations. Include more context to make match unique."
    ))
  END IF

  // Replace exactly the first (and only) occurrence
  new_content ← content.replace_once(old_text, new_text)
  AWAIT fs.write(path, new_content)

  old_lines ← old_text.line_count()
  new_lines ← new_text.line_count()

  RETURN Ok(ToolResult {
    content: [Text("Replaced {old_lines} line(s) with {new_lines} line(s) in {path}")],
    details: { path, old_lines, new_lines }
  })

END FUNCTION

FUNCTION find_similar_text(content: String, target: String) -> Option<String>
  // Fuzzy hint: find the first line of target in the file
  target_trimmed ← target.trim()
  first_line ← target_trimmed.first_line().trim()
  IF first_line is empty THEN RETURN None END IF

  lines ← content.split_lines()
  FOR EACH (i, line) IN enumerate(lines)
    IF line contains first_line THEN
      end ← min(i + target_trimmed.line_count() + 1, lines.count())
      RETURN Some(lines[i..end].join("\n"))
    END IF
  END FOR

  RETURN None
END FUNCTION
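
The match-count rule and the fuzzy hint can be sketched in Rust on in-memory strings. `apply_edit` and `EditError` are hypothetical names for this example; file I/O, cancellation, and the exact error messages are left out.

```rust
#[derive(Debug, PartialEq)]
pub enum EditError {
    NotFound { hint: Option<String> },
    Ambiguous { matches: usize },
}

/// Replace `old_text` only when it occurs exactly once.
pub fn apply_edit(content: &str, old_text: &str, new_text: &str) -> Result<String, EditError> {
    // str::matches counts non-overlapping occurrences.
    match content.matches(old_text).count() {
        0 => Err(EditError::NotFound { hint: find_similar_text(content, old_text) }),
        1 => Ok(content.replacen(old_text, new_text, 1)),
        n => Err(EditError::Ambiguous { matches: n }),
    }
}

/// Locate the first line of `target` in `content` and return the lines around it.
pub fn find_similar_text(content: &str, target: &str) -> Option<String> {
    let target_trimmed = target.trim();
    let first_line = target_trimmed.lines().next()?.trim();
    if first_line.is_empty() {
        return None;
    }
    let lines: Vec<&str> = content.lines().collect();
    let i = lines.iter().position(|line| line.contains(first_line))?;
    // One extra line of context, clamped to the end of the file.
    let end = (i + target_trimmed.lines().count() + 1).min(lines.len());
    Some(lines[i..end].join("\n"))
}
```

Refusing ambiguous matches forces the caller to send more surrounding context, which is what makes the edit safe to apply without line numbers.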

SkillSet::format_for_prompt (src/context/skills.rs)

Purpose: Format all loaded skills as an XML index for injection into the system prompt.

Standard: Conforms to the AgentSkills open standard (agentskills.io/integrate-skills).

FUNCTION SkillSet::format_for_prompt() -> String

  IF self.skills is empty THEN RETURN "" END IF

  // Skills are sorted by name ascending
  sorted_skills ← sort(self.skills, by: skill.name)

  out ← "<available_skills>\n"

  FOR EACH skill IN sorted_skills
    out += "  <skill>\n"
    out += "    <name>"        + xml_escape(skill.name)                      + "</name>\n"
    out += "    <description>" + xml_escape(skill.description)               + "</description>\n"
    out += "    <location>"    + xml_escape(skill.file_path.to_string())     + "</location>\n"
    out += "  </skill>\n"
  END FOR

  out += "</available_skills>"
  RETURN out

  // xml_escape replaces: & → &amp;  < → &lt;  > → &gt;  " → &quot;  ' → &apos;

END FUNCTION

// Example output:
// <available_skills>
//   <skill>
//     <name>weather</name>
//     <description>Get current weather and forecasts.</description>
//     <location>/home/user/.skills/weather/SKILL.md</location>
//   </skill>
// </available_skills>
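
A Rust sketch of the escaping and formatting is shown below. The `Skill` struct is reduced to the three fields that are emitted, with `file_path` held as a `String` for simplicity; both choices are assumptions of this example.

```rust
pub struct Skill {
    pub name: String,
    pub description: String,
    pub file_path: String,
}

/// Escape the five XML special characters in a single pass.
pub fn xml_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

pub fn format_for_prompt(skills: &[Skill]) -> String {
    if skills.is_empty() {
        return String::new();
    }
    // Sort by name ascending so the index is stable across runs.
    let mut sorted: Vec<&Skill> = skills.iter().collect();
    sorted.sort_by(|a, b| a.name.cmp(&b.name));

    let mut out = String::from("<available_skills>\n");
    for skill in sorted {
        out.push_str("  <skill>\n");
        out.push_str(&format!("    <name>{}</name>\n", xml_escape(&skill.name)));
        out.push_str(&format!("    <description>{}</description>\n", xml_escape(&skill.description)));
        out.push_str(&format!("    <location>{}</location>\n", xml_escape(&skill.file_path)));
        out.push_str("  </skill>\n");
    }
    out.push_str("</available_skills>");
    out
}
```

Escaping character by character avoids the double-escaping that chained string replacements can cause if `&` is not handled first.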

SkillSet::load (src/context/skills.rs)

Purpose: Load skills from one or more directories. Later directories override earlier ones on name collision.

FUNCTION SkillSet::load(dirs: Vec<Path>) -> Result<SkillSet, SkillError>

  skill_map ← HashMap<String, Skill>  // key = skill name

  FOR EACH (index, dir) IN enumerate(dirs)
    IF dir does not exist THEN
      CONTINUE  // silently skip missing directories
    END IF

    source_label ← "dir:{index}"

    FOR EACH entry IN list_subdirectories(dir)
      skill_md_path ← entry.path / "SKILL.md"
      IF skill_md_path does not exist THEN
        CONTINUE
      END IF

      content ← read_to_string(skill_md_path)
      (name, description) ← parse_frontmatter(content)
      // Returns SkillError::InvalidFrontmatter or SkillError::MissingField on failure

      base_dir ← canonicalize(entry.path)
      file_path ← base_dir / "SKILL.md"

      skill ← Skill { name, description, file_path, base_dir, source: source_label }
      skill_map[name] ← skill  // later dirs OVERRIDE earlier on name collision
    END FOR
  END FOR

  skills ← sort(skill_map.values(), by: skill.name)
  RETURN Ok(SkillSet { skills })

END FUNCTION

FUNCTION parse_frontmatter(content: String) -> Result<(name, description), SkillError>
  // Content must start with "---"
  IF NOT content.trim_start().starts_with("---") THEN
    RETURN Err(InvalidFrontmatter)
  END IF

  // Find closing "---"
  yaml_block ← content between first "---" and next "\n---"
  IF no closing delimiter THEN
    RETURN Err(InvalidFrontmatter)
  END IF

  name ← ""
  description ← ""

  FOR EACH line IN yaml_block.lines()
    IF line.starts_with("name:") THEN
      name ← unquote(line.after("name:").trim())
    ELSE IF line.starts_with("description:") THEN
      description ← unquote(line.after("description:").trim())
    END IF
    // All other YAML fields silently ignored
  END FOR

  IF name is empty THEN RETURN Err(MissingField("name")) END IF
  IF description is empty THEN RETURN Err(MissingField("description")) END IF

  RETURN Ok((name, description))

  // unquote(): strips surrounding single or double quotes if present

END FUNCTION
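
The line-based frontmatter parser can be sketched in Rust with the standard library alone. The error enum is trimmed to the two variants the pseudocode mentions, and `MissingField` carries a `&'static str` here, which is an assumption about the payload type.

```rust
#[derive(Debug, PartialEq)]
pub enum SkillError {
    InvalidFrontmatter,
    MissingField(&'static str),
}

/// Strip one pair of matching single or double quotes, if present.
fn unquote(s: &str) -> &str {
    for q in ['"', '\''] {
        if s.len() >= 2 && s.starts_with(q) && s.ends_with(q) {
            return &s[1..s.len() - 1];
        }
    }
    s
}

pub fn parse_frontmatter(content: &str) -> Result<(String, String), SkillError> {
    // Content must start with "---" (leading whitespace allowed).
    let rest = content
        .trim_start()
        .strip_prefix("---")
        .ok_or(SkillError::InvalidFrontmatter)?;
    // The block ends at the next line that begins with "---".
    let close = rest.find("\n---").ok_or(SkillError::InvalidFrontmatter)?;

    let (mut name, mut description) = (String::new(), String::new());
    for line in rest[..close].lines() {
        if let Some(v) = line.strip_prefix("name:") {
            name = unquote(v.trim()).to_string();
        } else if let Some(v) = line.strip_prefix("description:") {
            description = unquote(v.trim()).to_string();
        }
        // All other fields are ignored.
    }

    if name.is_empty() {
        return Err(SkillError::MissingField("name"));
    }
    if description.is_empty() {
        return Err(SkillError::MissingField("description"));
    }
    Ok((name, description))
}
```

Since only top-level `name:` and `description:` lines are read, multi-line YAML values and indented keys are not recognized by this approach.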


ListFilesTool::execute (src/tools/list.rs)

Purpose: List files in a directory, with optional glob filtering and depth limit.

FUNCTION ListFilesTool::execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>

  path      ← params["path"]      as String  // optional; default: current directory
  pattern   ← params["pattern"]   as String  // optional glob filter, e.g. "*.rs"
  max_depth ← params["max_depth"] as usize   // optional; default: 3

  IF ctx.cancel.is_cancelled THEN RETURN Err(Cancelled) END IF

  // Build `find` command
  cmd ← "find {path} -maxdepth {max_depth} -type f"
  IF pattern defined THEN cmd += " -name '{pattern}'" END IF
  // Excluded paths (prepended to command):
  //   -not -path "*/target/*"
  //   -not -path "*/.git/*"
  //   -not -path "*/node_modules/*"

  SELECT {
    ctx.cancel.cancelled() → RETURN Err(Cancelled)
    sleep(self.timeout)    → RETURN Err(Failed("List timed out"))
    run(cmd)               → output
  }

  lines ← output.stdout.split_lines()

  truncated ← false
  IF lines.count() > self.max_results THEN
    lines ← lines[0..self.max_results]
    truncated ← true
  END IF

  text ← lines.join("\n")
  IF truncated THEN
    text += "\n... (truncated at {self.max_results} results)"
  END IF

  RETURN Ok(ToolResult {
    content: [Text(text)],
    details: { total: lines.count(), truncated }
  })

END FUNCTION

Defaults: max_results = 200, timeout = 10s


SearchTool::execute (src/tools/search.rs)

Purpose: Search file contents using regex via ripgrep (preferred) or grep (fallback).

FUNCTION SearchTool::execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>

  pattern        ← params["pattern"]        as String  // required; regex
  path           ← params["path"]           as String  // optional; default: self.root or cwd
  include        ← params["include"]        as String  // optional file glob, e.g. "*.rs"
  case_sensitive ← params["case_sensitive"] as bool    // optional; default: false

  IF ctx.cancel.is_cancelled THEN RETURN Err(Cancelled) END IF

  // Prefer ripgrep (rg) if available, fall back to grep
  IF rg_available() THEN
    cmd ← ["rg", "--line-number", "--no-heading",
            "--max-count={self.max_results}"]
    IF NOT case_sensitive THEN cmd += ["--ignore-case"] END IF
    IF include defined THEN cmd += ["--glob={include}"] END IF
    cmd += [pattern, path]
  ELSE
    cmd ← ["grep", "-r", "-n", "-m{self.max_results}"]
    IF NOT case_sensitive THEN cmd += ["-i"] END IF
    IF include defined THEN cmd += ["--include={include}"] END IF
    cmd += [pattern, path]
  END IF

  SELECT {
    ctx.cancel.cancelled() → RETURN Err(Cancelled)
    sleep(self.timeout)    → RETURN Err(Failed("Search timed out"))
    run(cmd)               → (exit_code, stdout, stderr)
  }

  // Exit code 1 = no matches found (not an error)
  IF exit_code == 1 AND stderr is empty THEN
    stdout ← ""
  END IF
  // Exit code 2+ or non-empty stderr = actual failure
  IF exit_code >= 2 OR (exit_code != 0 AND stderr non-empty) THEN
    RETURN Err(Failed(stderr))
  END IF

  lines ← stdout.split_lines()
  match_count ← lines.count()

  text ← stdout
  IF match_count >= self.max_results THEN
    text += "\n... (truncated at {self.max_results} matches)"
  END IF

  RETURN Ok(ToolResult {
    content: [Text(text)],
    details: { matches: match_count }
  })

END FUNCTION

Defaults: max_results = 50, timeout = 30s

Output format: {file}:{line_number}:{matched_line}
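
The argument construction and exit-code handling can be sketched in Rust as follows. `build_search_cmd`, `interpret`, and `SearchOutcome` are hypothetical names for this example; process spawning, cancellation, and the timeout are omitted.

```rust
/// Build the argv for ripgrep (`use_rg = true`) or grep (fallback).
pub fn build_search_cmd(
    use_rg: bool,
    pattern: &str,
    path: &str,
    include: Option<&str>,
    case_sensitive: bool,
    max_results: usize,
) -> Vec<String> {
    let mut cmd: Vec<String> = if use_rg {
        vec!["rg".into(), "--line-number".into(), "--no-heading".into(),
             format!("--max-count={max_results}")]
    } else {
        vec!["grep".into(), "-r".into(), "-n".into(), format!("-m{max_results}")]
    };
    if !case_sensitive {
        let flag = if use_rg { "--ignore-case" } else { "-i" };
        cmd.push(flag.to_string());
    }
    if let Some(glob) = include {
        cmd.push(if use_rg { format!("--glob={glob}") } else { format!("--include={glob}") });
    }
    cmd.push(pattern.to_string());
    cmd.push(path.to_string());
    cmd
}

#[derive(Debug, PartialEq)]
pub enum SearchOutcome {
    Matches(String),
    NoMatches,
    Failed(String),
}

/// Exit code 1 with empty stderr means "no matches", not an error.
pub fn interpret(exit_code: i32, stdout: &str, stderr: &str) -> SearchOutcome {
    if exit_code >= 2 || (exit_code != 0 && !stderr.is_empty()) {
        SearchOutcome::Failed(stderr.to_string())
    } else if exit_code == 1 {
        SearchOutcome::NoMatches
    } else {
        SearchOutcome::Matches(stdout.to_string())
    }
}
```

Passing the pattern as its own argv element, rather than interpolating it into a shell string, avoids shell quoting problems. Note that both `rg --max-count` and `grep -m` limit matches per file, so the total line count across many files can exceed max_results.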


McpClient::initialize (src/mcp/)

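Purpose: Perform the 3-step MCP handshake (client sends initialize, server replies with its capabilities and serverInfo, client sends notifications/initialized) to establish a session with a tool server.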

FUNCTION McpClient::connect_stdio(
  command: String,
  args: Vec<String>,
  env: Option<Map<String,String>>
) -> Result<McpClient, McpError>

  // Spawn child process
  process ← spawn_process(command, args, env,
    stdin=piped, stdout=piped, stderr=inherit)
  // McpError::Transport on spawn failure

  transport ← StdioTransport { process }
  client ← McpClient { transport: Arc(Mutex(transport)), server_info: None }

  AWAIT client.initialize()
  RETURN Ok(client)

END FUNCTION

FUNCTION McpClient::initialize() -> Result<ServerInfo, McpError>

  // Step 1: send initialize
  result ← AWAIT self.send_request("initialize", {
    protocolVersion: "2024-11-05",
    capabilities: {},
    clientInfo: { name: "phi-core", version: CARGO_PKG_VERSION }
  })
  // Deserialize result as InitializeResult { protocolVersion, capabilities, serverInfo }

  self.server_info ← Some(result.serverInfo)

  // Step 2: send notifications/initialized (no params)
  AWAIT self.send_request("notifications/initialized", None)
  // Server may ignore the response id for this notification

  RETURN Ok(result.serverInfo)

END FUNCTION

FUNCTION McpClient::send_request(method: String, params: Option<Value>) -> Result<Value, McpError>

  request ← JsonRpcRequest {
    jsonrpc: "2.0",
    id: ATOMIC_COUNTER.fetch_add(1),  // monotonically increasing from 1
    method,
    params
  }

  response ← AWAIT self.transport.send(request)

  IF response.error is Some THEN
    RETURN Err(JsonRpc { code: error.code, message: error.message })
  END IF

  IF response.result is None THEN
    RETURN Err(Protocol("Empty result"))
  END IF

  RETURN Ok(response.result)

END FUNCTION

FUNCTION McpClient::list_tools() -> Result<Vec<McpToolInfo>, McpError>
  result ← AWAIT self.send_request("tools/list", {})
  RETURN deserialize result.tools as Vec<McpToolInfo>
END FUNCTION

FUNCTION McpClient::call_tool(name: String, arguments: Value) -> Result<McpToolCallResult, McpError>
  result ← AWAIT self.send_request("tools/call", { name, arguments })
  RETURN deserialize result as McpToolCallResult
END FUNCTION
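
The request envelope and the response mapping in send_request can be sketched in Rust without a JSON crate. Building JSON with `format!` is an assumption made to keep the example self-contained (production code would more likely use a serializer), and `RequestBuilder`, `JsonRpcResponse`, and `into_result` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

pub struct RequestBuilder {
    next_id: AtomicU64,
}

impl RequestBuilder {
    pub fn new() -> Self {
        // IDs increase monotonically, starting from 1.
        Self { next_id: AtomicU64::new(1) }
    }

    /// `params_json` must already be valid JSON; `method` is assumed to
    /// contain no characters that need JSON escaping.
    pub fn request(&self, method: &str, params_json: Option<&str>) -> String {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        match params_json {
            Some(p) => format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{p}}}"#),
            None => format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}"}}"#),
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum McpError {
    JsonRpc { code: i64, message: String },
    Protocol(String),
}

/// Already-parsed response; `result` holds raw JSON text in this sketch.
pub struct JsonRpcResponse {
    pub result: Option<String>,
    pub error: Option<(i64, String)>,
}

pub fn into_result(resp: JsonRpcResponse) -> Result<String, McpError> {
    // An error object takes precedence over any result.
    if let Some((code, message)) = resp.error {
        return Err(McpError::JsonRpc { code, message });
    }
    resp.result
        .ok_or_else(|| McpError::Protocol("Empty result".to_string()))
}
```

An atomic counter lets concurrent callers obtain unique request IDs without holding the transport lock.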

OpenApiToolAdapter::execute (src/openapi/)

Purpose: Execute a single OpenAPI operation as an HTTP request.

FUNCTION OpenApiToolAdapter::execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>

  // Normalize params: null → {}; non-object → error
  IF params is null THEN params ← {} END IF
  IF params is NOT object THEN
    RETURN Ok(ToolResult { content: [Text("Error: params must be an object")] })
  END IF

  // ── Step 1: Substitute path parameters ────────────────────────────────────
  url_path ← self.info.path  // e.g. "/users/{userId}/posts/{postId}"
  FOR EACH param_name IN self.info.path_params
    value ← params[param_name]
    IF value is missing THEN
      RETURN Ok(ToolResult { content: [Text("Error: missing required path param '{param_name}'")] })
    END IF
    encoded ← percent_encode_rfc3986(value.to_string())
    url_path ← replace(url_path, "{" + param_name + "}", encoded)
  END FOR

  // ── Step 2: Build base URL ─────────────────────────────────────────────────
  url ← self.base_url + url_path

  // ── Step 3: Build HTTP request ────────────────────────────────────────────
  method ← parse_http_method(self.info.method)  // GET, POST, PUT, etc.
  request ← self.client.request(method, url)

  // Query parameters
  FOR EACH param_name IN self.info.query_params
    IF params[param_name] defined THEN
      request ← request.query(param_name, params[param_name].to_string())
    END IF
  END FOR

  // Header parameters
  FOR EACH param_name IN self.info.header_params
    IF params[param_name] defined THEN
      request ← request.header(param_name, params[param_name].to_string())
    END IF
  END FOR

  // Authentication
  MATCH self.config.auth
    CASE None           → (no-op)
    CASE Bearer(token)  → request ← request.bearer_auth(token)
    CASE ApiKey{header,value} → request ← request.header(header, value)
  END MATCH

  // Custom headers
  FOR EACH (key, value) IN self.config.custom_headers
    request ← request.header(key, value)
  END FOR

  // Request body (application/json only)
  IF self.info.has_body THEN
    body ← params["body"] OR params["_request_body"]
    IF body defined THEN
      request ← request.json(body)
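
Step 1 above (path parameter substitution) can be sketched in Rust as follows. The encoder keeps only the RFC 3986 unreserved characters and percent-encodes every other byte. `substitute_path` and its slice-based arguments are hypothetical; the sketch returns a `Result`, whereas the pseudocode reports the same error text inside an Ok(ToolResult).

```rust
/// Percent-encode everything except RFC 3986 unreserved characters
/// (ALPHA, DIGIT, '-', '.', '_', '~').
pub fn percent_encode_rfc3986(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Replace each `{name}` placeholder with its encoded argument value.
pub fn substitute_path(
    template: &str,
    path_params: &[&str],
    args: &[(&str, &str)],
) -> Result<String, String> {
    let mut url_path = template.to_string();
    for name in path_params {
        let value = args
            .iter()
            .find(|(k, _)| k == name)
            .map(|(_, v)| *v)
            .ok_or_else(|| format!("Error: missing required path param '{name}'"))?;
        url_path = url_path.replace(&format!("{{{name}}}"), &percent_encode_rfc3986(value));
    }
    Ok(url_path)
}
```

Encoding `/` inside a value matters here: without it, an argument such as `7/8` would add an extra path segment to the URL.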

Developer Conceptual Hierarchy

A developer-facing map of every concept in phi-core, centered on the Agent entity. Designed to enable a future UI layer. Every concept is tagged:

  • [EXISTS] = in code now
  • [PLANNED] = defined but not implemented
  • [CONCEPTUAL] = idea only


The Agent: Three Attributes + Skills

                              ┌──────────────────┐
                              │      AGENT       │
                              │   agent_id [E]   │
                              └───────┬──────────┘
                                      │
           ┌──────────────┬───────────┼───────────┬──────────────┐
           │              │           │           │              │
    ┌──────▼──────┐ ┌─────▼─────┐ ┌──▼───┐ ┌────▼─────┐ ┌──────▼──────┐
    │   Profile   │ │ Sessions  │ │Skills│ │   MCP    │ │Introspection│
    │    [E]      │ │   [E]     │ │ [E]  │ │   [E]    │ │    [C]      │
    │ personality │ │  (Tasks)  │ │      │ │connectors│ │   memory    │
    └──────┬──────┘ └─────┬─────┘ └──────┘ └──────────┘ └──────┬──────┘
           │              │                                     │
           │         ┌────▼─────┐                         ┌────▼─────┐
           │         │  Session │                         │  Memory  │
           │         │   [E]    │                         │   [C]    │
           │         └────┬─────┘                         ├──────────┤
           │              │                               │Episodic  │
           │         ┌────▼─────┐                         │Semantic  │
           │         │   Loop   │                         │Procedural│
           │         │   [E]    │                         └──────────┘
           │         └────┬─────┘
           │              │
           │         ┌────▼─────┐
           │         │   Turn   │
           │         │   [E]    │
           │         └────┬─────┘
           │              │
           │    ┌─────────┼──────────┐
           │    │         │          │
           │  ┌─▼──┐  ┌──▼───┐  ┌──▼──┐
           │  │Msg │  │ Tool │  │Delta│
           │  │[E] │  │ [E]  │  │ [E] │
           │  └────┘  └──────┘  └─────┘
           │
    ┌──────▼──────────────────────────────────────┐
    │            INDEPENDENT ENTITIES              │
    ├─────────────────────────────────────────────┤
    │  Provider [E]     Event [E]                 │
    │  Message [E]      Compaction [E]            │
    │  Configuration [E]                          │
    │  SystemPromptStrategy [E]                   │
    │  ContextTranslationStrategy [E]             │
    └─────────────────────────────────────────────┘

[E] = EXISTS    [P] = PLANNED    [C] = CONCEPTUAL

Model/Provider Fallback Hierarchy

Loop model (LoopConfigSnapshot)  →  Agent default model
         [EXISTS]                       [EXISTS]

Each loop captures its model config in LoopConfigSnapshot at AgentStart time. Session-level model override has been removed; the fallback is directly to the Agent's default model.


Entity Quick Reference

| Entity | Code Location | Status | Deep Dive |
| --- | --- | --- | --- |
| Agent | agents/basic_agent.rs | [EXISTS] | agent.md |
| Agent Profile | agents/profile.rs | [EXISTS] | agent.md |
| Session | session/model.rs | [EXISTS] | session.md |
| Loop (LoopRecord) | session/model.rs | [EXISTS] | loop.md |
| Turn | session/model.rs + event-pair | [EXISTS] events; [EXISTS] struct | turn.md |
| Message | types/content.rs | [EXISTS] | message.md |
| AgentMessage | types/agent_message.rs | [EXISTS] | message.md |
| Tool | types/tool.rs | [EXISTS] | tool.md |
| Provider | provider/model.rs | [EXISTS] | provider.md |
| Event | types/event.rs | [EXISTS] | event.md |
| Compaction | context/compaction.rs | [EXISTS] | compaction.md |
| Configuration | context/config.rs + agent_loop/config.rs | [EXISTS] | config.md |
| SystemPromptStrategy | trait + implementations | [EXISTS] | agent.md |
| ContextTranslationStrategy | provider/context_translation.rs | [EXISTS] | provider.md |
| Introspection / Memory | not in code | [CONCEPTUAL] | agent.md |
| Permissions | not in code | [CONCEPTUAL] | agent.md |

Callback Ownership

Callbacks live on the entity they observe:

| Callback | Owner | Status |
| --- | --- | --- |
| before_task / after_task | Session (SessionRecorderConfig) | [EXISTS] |
| before_loop / after_loop | Loop | [EXISTS] |
| on_error | Loop | [EXISTS] |
| before_turn / after_turn | Turn | [EXISTS] |
| before_tool_execution / after_tool_execution | Tool | [EXISTS] |
| before_tool_execution_update / after_tool_execution_update | Tool | [EXISTS] |
| before_compaction_start / after_compaction_end | Compaction | [EXISTS] |

Conceptual vs Code: Key Misalignments

These are places where the conceptual model differs from current code. They represent future refactoring opportunities:

| Concept | Status | Notes |
| --- | --- | --- |
| Agent Profile | [EXISTS] | AgentProfile struct in agents/profile.rs with profile_id, name, description, system_prompt, etc. |
| thinking_level on Session | Removed | Session-level thinking_level removed. Now captured per-loop in LoopConfigSnapshot. AgentProfile::resolve_thinking_level() removed. |
| temperature on Session | Removed | Session-level temperature removed. Now captured per-loop in LoopConfigSnapshot. AgentProfile::resolve_temperature() removed. |
| Session model | Removed | Session-level model_config removed. Model config is now captured per-loop in LoopConfigSnapshot. |
| Session scope | [EXISTS] | SessionScope::Ephemeral \| Persistent (G7). |
| SystemPromptStrategy | [EXISTS] | Trait + 3-entity model (strategy template → prompt instance → agent ref). file: and {{...}} resolution. |
| Compaction config | [EXISTS] | Strategies consolidated into CompactionConfig (G5). |
| before_task / after_task | [EXISTS] | On SessionRecorderConfig (G2). |
| ContextTranslationStrategy | [EXISTS] | Trait + DefaultContextTranslation in provider/context_translation.rs (G8). |
| Introspection | [CONCEPTUAL] | Memory extraction with 3 categories (episodic, semantic, procedural). Not in code. |
| Permissions | [CONCEPTUAL] | Include/exclude rules on Agent. Not in code. |

Core Gaps

Prioritized list of features that belong in phi-core (per First Principles) but are not yet implemented. Each gap is derived from [CONCEPTUAL] items in the entity specs.

Priority 1 — Small, High-Value — ALL IMPLEMENTED ✓

| ID | Feature | Status |
| --- | --- | --- |
| G1 | Compaction callbacks (before_compaction_start / after_compaction_end) | [EXISTS]: On AgentLoopConfig. |
| G3 | Agent Profile struct | [EXISTS]: AgentProfile in agents/profile.rs. |
| G4 | Session model override | Removed: Session.model_config removed. Model config now captured per-loop in LoopConfigSnapshot. |
| G7 | Session scope | [EXISTS]: SessionScope::Ephemeral \| Persistent. |
| G9 | Session task attributes | Removed: Session.thinking_level, Session.temperature, and Session.model_config moved to per-loop LoopConfigSnapshot. AgentProfile::resolve_thinking_level() and resolve_temperature() removed. |

Priority 2 — Medium Refactors

| ID | Feature | Why Core | Effort | Spec Ref |
| --- | --- | --- | --- | --- |
| G5 | Compaction config consolidation [EXISTS] | Compaction strategies (in_memory_strategy, block_strategy) are now fields on CompactionConfig, consolidating what was previously split across ContextConfig + AgentLoopConfig. | ~100 LOC | config.md, misalignment table above |
| G2 | Session-level callbacks (before_task / after_task) [EXISTS] | before_task and after_task callbacks now exist on SessionRecorderConfig. before_task fires on the first AgentStart with a new session_id; after_task fires on flush(). | ~80 LOC | callback ownership table above |
| G6 | SystemPromptStrategy trait [EXISTS] | The SystemPromptStrategy trait now exists with a compose(context) -> String method. Supports a 3-entity model: strategy template, prompt instance, profile ref. Full 5-layer composition is a future enhancement. | ~100 LOC | agent.md |

Priority 3 — Needs Design

| ID | Feature | Why Core | Effort | Spec Ref |
| --- | --- | --- | --- | --- |
| G8 | ContextTranslationStrategy [EXISTS] | ContextTranslationStrategy trait with DefaultContextTranslation. Read-only translation for cross-provider compatibility. | ~150 LOC | provider.md, misalignment table above |
| G10 | Tool Registry [EXISTS] | ToolRegistry maps config tool names to instances. 6 built-in tools registered. | ~200 LOC | config.md |

External — Not Core

These are explicitly not core gaps. They can be built on top of phi-core using existing extension points:

| Item | Extension Point |
| --- | --- |
| Introspection / Memory | External crate using G1 compaction callbacks + session data |
| Permissions | InputFilter + BeforeToolExecutionFn |
| Multi-agent orchestration | agent_loop / agent_loop_continue / agent_loop_parallel |
| Model fallback chains | Custom StreamProvider wrapping multiple providers |
| Observability backends | AgentEvent stream |
| Domain tools | AgentTool trait |

Deep Dive Files

Each entity has its own deep dive document in this folder:

  • agent.md — Agent Profile, Capabilities, Skills, MCP, Permissions, Introspection
  • session.md — Session (Task): identity, scope, formation, model, loops, input filters
  • loop.md — Loop (Iteration): model, turns, compaction, parallel groups, callbacks
  • turn.md — Turn (Step): trigger, messages, tool executions, streaming
  • message.md — Content, Message, AgentMessage, LlmMessage, ExtensionMessage
  • tool.md — AgentTool trait, ToolContext, execution strategies, callbacks
  • provider.md — ModelConfig, ApiProtocol, registry, ContextTranslationStrategy
  • event.md — AgentEvent lifecycle, StreamDelta, event flow
  • compaction.md — CompactionBlock, strategies, scope, callbacks
  • config.md — ContextConfig, ExecutionLimits, CacheConfig, AgentLoopConfig, hooks

Agent

The central entity in the system. An Agent combines a given identity (Agent Profile), capabilities (tools, skills, MCP connections), permissions, and introspection into a single runtime unit that executes Sessions (tasks).

The Agent trait defines the runtime interface (prompting, state access, control, steering queues). BasicAgent is the default in-memory implementation that owns conversation state, tools, and provider configuration.

Concept Overview

Agent
├── HEADER
│   ├── agent_id [EXISTS] — UUID, immutable
│   ├── Agent Profile [EXISTS] — AgentProfile struct (src/agents/profile.rs)
│   │   ├── profile_id [EXISTS] — distinct from agent_id; shareable across agents
│   │   ├── SystemPromptStrategy [EXISTS] — how system prompt is composed
│   │   │   └── static system_prompt string [EXISTS], file: prefix [EXISTS], {{...}} 3-entity chain [EXISTS]
│   │   ├── Agent Name [EXISTS] — Option<String> on AgentProfile
│   │   └── Agent Description [EXISTS] — Option<String> on AgentProfile
│   ├── Limits (Agent-level)
│   │   ├── context_config [EXISTS]
│   │   ├── execution_limits [EXISTS]
│   │   └── retry_config [EXISTS]
│   └── Default Model [EXISTS — BasicAgent.model_config]
│       └── Fallback when the Loop doesn't specify its own (Session-level override removed)
│
├── TAB: Sessions (Tasks) [EXISTS]
│   └── (drill-down: Session → Loop → Turn)
├── TAB: Capabilities [EXISTS as Vec<Arc<dyn AgentTool>>]
│   ├── Tools [EXISTS]  ├── Sub-agents [EXISTS]
│   ├── OpenAPI tools [EXISTS]  └── Built-in tools [EXISTS]
├── TAB: Skills [EXISTS as SkillSet; CONCEPTUAL as browsable tab]
├── TAB: MCP Connections [EXISTS]
├── TAB: Permissions [CONCEPTUAL]
│   ├── Include rules [CONCEPTUAL]  └── Exclude rules [CONCEPTUAL]
├── TAB: Introspection [CONCEPTUAL] — mandatory when scope = Persistent
│   ├── Episodic Memory [CONCEPTUAL]
│   ├── Semantic Memory [CONCEPTUAL]
│   ├── Procedural Memory [CONCEPTUAL]
│   ├── Identity Shaping [CONCEPTUAL]
│   └── Knowledge Base [CONCEPTUAL]
│
└── STATE (runtime) [EXISTS]
    ├── session_id [EXISTS]
    ├── messages [EXISTS]
    ├── queues [EXISTS]
    └── counters [EXISTS]

Field | Type | Status | Description
agent_id | String (UUID v4) | [EXISTS] | Stable identifier assigned at construction. Included in every AgentStart event. Immutable for the lifetime of the agent instance.
Agent Profile | AgentProfile | [EXISTS] | Reusable identity blueprint (src/agents/profile.rs). Separate struct from Agent; multiple agents can share one profile via config instances. Fields: profile_id, name, description, system_prompt, thinking_level, temperature, max_tokens, config_id, skills, workspace.
profile_id | String | [EXISTS] | Distinct from agent_id. Allows profile sharing across agents. Auto-generated UUID if not set.
SystemPromptStrategy | trait | [EXISTS] | Defines block structure for multi-block prompt composition (src/agents/system_prompt.rs). Uses a 3-entity model: strategy template (block definitions with order + max_length), prompt instance (content filling blocks, supports file: paths), agent reference (via {{system_prompt.name}}). See configuration guide.
system_prompt | Option<String> | [EXISTS] | System prompt string. Lives on AgentProfile.system_prompt. Supports inline text, file:path (relative to workspace), or {{...}} reference to a prompt instance. Resolution: agent > profile instance > base profile.
Agent Name | Option<String> | [EXISTS] | Human-readable name. Lives on AgentProfile.name.
Agent Description | Option<String> | [EXISTS] | Description of the agent's purpose. Lives on AgentProfile.description.
workspace | Option<PathBuf> | [EXISTS] | Working directory. Lives on AgentProfile as blueprint default; BasicAgent stores an agent-level override. Resolution: agent workspace > profile workspace > current directory.
model_config | ModelConfig | [EXISTS] | Default model for this agent. Falls back here when Session and Loop don't specify their own. Contains: model id, API key, base URL, API protocol, cost rates, context window size.
context_config | Option<ContextConfig> | [EXISTS] | Token budget and compaction policy. Agent-level limit.
execution_limits | Option<ExecutionLimits> | [EXISTS] | Max turns (50), max tokens (1M), max duration (10 min), cost tracking. Agent-level limit.
retry_config | RetryConfig | [EXISTS] | Retry policy for provider errors. Exponential backoff with jitter. Agent-level.
cache_config | CacheConfig | [EXISTS] | Prompt caching behavior (enabled/disabled, strategy: Auto/Disabled/Manual).
tool_execution | ToolExecutionStrategy | [EXISTS] | How tool calls are executed: Parallel (default), Sequential, Batched.
thinking_level | ThinkingLevel | [EXISTS] on Agent | Controls depth of model reasoning (Off/Minimal/Low/Medium/High). Agent default; per-loop values tracked in LoopConfigSnapshot.
temperature | Option<f32> | [EXISTS] on Agent | Sampling temperature. Agent default; per-loop values tracked in LoopConfigSnapshot.
max_tokens | Option<u32> | [EXISTS] | Max output tokens per response. None = use model default.
provider_override | Option<Arc<dyn StreamProvider>> | [EXISTS] | Escape hatch for test injection or custom providers. Bypasses ProviderRegistry dispatch.
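
The workspace resolution order in the table above (agent workspace > profile workspace > current directory) can be sketched as follows. This is a minimal illustration, not phi-core's code: ProfileSketch, AgentSketch, and resolve_workspace are hypothetical stand-ins that model only the workspace fields.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for AgentProfile; only `workspace` is modelled.
struct ProfileSketch {
    workspace: Option<PathBuf>,
}

/// Hypothetical stand-in for BasicAgent with its agent-level override.
struct AgentSketch {
    workspace: Option<PathBuf>,
    profile: ProfileSketch,
}

/// Applies the documented order:
/// agent workspace > profile workspace > current directory.
fn resolve_workspace(agent: &AgentSketch, current_dir: PathBuf) -> PathBuf {
    agent
        .workspace
        .clone()
        .or_else(|| agent.profile.workspace.clone())
        .unwrap_or(current_dir)
}
```

The current directory is passed in as a parameter so the sketch stays deterministic; a caller would typically supply std::env::current_dir().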

TAB: Sessions (Tasks) [EXISTS]

Sessions are the actions an agent performs. Each Session contains Loops (iterations) which contain Turns (steps). See session.md.

Field | Type | Status | Description
session_id | String (UUID v4) | [EXISTS] | Current session identifier. Rotatable via check_and_rotate.

TAB: Capabilities [EXISTS]

Registered tools available to the agent. Stored as Vec<Arc<dyn AgentTool>>.

Capability | Status | Description
Tools | [EXISTS] | Registered AgentTool implementations. Added via with_tools().
Sub-agents | [EXISTS] | Via SubAgentTool. Spawns child agent loops in separate sessions.
OpenAPI tools | [EXISTS] | Auto-generated from OpenAPI 3.0 spec via OpenApiToolAdapter. Feature-gated (openapi).
Built-in tools | [EXISTS] | Bash, File, Edit, Grep, ListDir, ReadFile.

TAB: Skills [EXISTS] as SkillSet; [CONCEPTUAL] as browsable tab

Declarative capabilities loaded from SKILL.md files with YAML frontmatter.

Field | Status | Description
SkillSet | [EXISTS] | Loaded via with_skills(). Discovery and loading from filesystem.
Skill discovery | [EXISTS] | Finds <name>/SKILL.md files.
Skill browsing / editing | [CONCEPTUAL] | Interactive skill management in a UI.

TAB: MCP Connections [EXISTS]

Model Context Protocol integration for external tool servers.

Field | Status | Description
MCP server connections | [EXISTS] | Stdio and HTTP transports via McpClient / McpTransport.
Discovered tools | [EXISTS] | Auto-registered from MCP server via McpToolAdapter. Transparent to agent loop.
MCP connection management | [CONCEPTUAL] | Browsable tab for managing connections in a UI.

TAB: Permissions [CONCEPTUAL]

Access control for agent actions. Not yet implemented.

Field | Status | Description
Include rules | [CONCEPTUAL] | Whitelist of allowed actions.
Exclude rules | [CONCEPTUAL] | Blacklist of denied actions.

TAB: Introspection [CONCEPTUAL]

Memory extraction from session logs and identity. Mandatory when Session scope is Persistent.

Memory Categories

Category | Status | Description
Episodic Memory | [CONCEPTUAL] | What happened in past sessions (events, conversations).
Semantic Memory | [CONCEPTUAL] | Distilled knowledge (facts, concepts, relationships).
Procedural Memory | [CONCEPTUAL] | Successful strategies learned over time (patterns, playbooks).

Memory Destinations

Destination | Status | Description
Identity Shaping | [CONCEPTUAL] | Memory feeds back to evolve the Agent Profile.
Knowledge Base | [CONCEPTUAL] | Searchable database for future use.

Agent State (Runtime) [EXISTS]

Mutable state that changes during execution.

Field | Type | Status | Description
session_id | String | [EXISTS] | Current session. Rotatable via check_and_rotate on inactivity timeout.
messages | Vec<AgentMessage> | [EXISTS] | Full conversation history (LLM + Extension messages).
steering_queue | Arc<Mutex<Vec<AgentMessage>>> | [EXISTS] | Mid-run interrupt messages. Drained per steering_mode (OneAtATime / All).
follow_up_queue | Arc<Mutex<Vec<AgentMessage>>> | [EXISTS] | Post-turn follow-up messages. Drained per follow_up_mode.
loop_counters | HashMap<String, usize> | [EXISTS] | Per-(session, config) monotonic counters for loop ID generation.
last_loop_id | Option<String> | [EXISTS] | Most recently started loop. Used for parent_loop_id in continuations.
last_active_at | Option<DateTime<Utc>> | [EXISTS] | Timestamp of last prompt call. Used by check_and_rotate for inactivity detection.
cancel | Option<CancellationToken> | [EXISTS] | Abort handle. Some during streaming, None otherwise.
is_streaming | bool | [EXISTS] | Guard against concurrent prompt() calls.
session | Option<Session> | [EXISTS] | Optional session for block-based compaction.
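
How the two queue modes differ when a queue is drained can be illustrated with a small sketch. The QueueMode variants come from the table above; drain_queue is a hypothetical helper, messages are plain strings rather than AgentMessage, and first-in-first-out delivery is an assumption.

```rust
/// Mirrors the documented QueueMode variants (OneAtATime / All).
#[derive(Clone, Copy, Debug, PartialEq)]
enum QueueMode {
    OneAtATime,
    All,
}

/// Hypothetical helper: removes and returns the messages delivered in one
/// drain step. Assumes FIFO order; the real queues hold AgentMessage values.
fn drain_queue(queue: &mut Vec<String>, mode: QueueMode) -> Vec<String> {
    match mode {
        // Nothing queued: deliver nothing.
        QueueMode::OneAtATime if queue.is_empty() => Vec::new(),
        // Deliver only the oldest message; the rest wait for a later step.
        QueueMode::OneAtATime => vec![queue.remove(0)],
        // Deliver everything and leave the queue empty.
        QueueMode::All => std::mem::take(queue),
    }
}
```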

Code Reference

File | What it contains
src/agents/agent.rs | Agent trait: runtime interface (~40 methods: prompting, state, control, steering queues, hook setters). QueueMode enum.
src/agents/basic_agent.rs | BasicAgent struct: default in-memory implementation. Builder pattern. All fields listed above.
src/agents/profile.rs | AgentProfile struct: reusable identity blueprint with profile_id, name, description, system_prompt, thinking_level, temperature, max_tokens, config_id, skills, workspace.
src/agents/system_prompt.rs | SystemPromptStrategy trait, SystemPrompt struct, PromptBlockDef, built-in strategies (Custom, Agent, Minimal). Compose logic with file: resolution.

Conceptual Notes

  • Agent Profile exists as a separate struct, AgentProfile (src/agents/profile.rs). It holds profile_id, name, description, and system_prompt among other fields, and keeping it separate from BasicAgent is what enables profile sharing across agents.
  • SystemPromptStrategy now exists as a trait with a compose(context) -> String method. It follows a 3-entity model: strategy template (the trait implementation), prompt instance (concrete prompt for a given context), profile ref (agent profile reference). Full 5-layer composition (base personality, task context, tool/skill index, memory context, turn-specific instructions) is future work. BasicAgent retains a static system_prompt string as a fallback.
  • thinking_level and temperature are Agent-level defaults. Per-loop values are captured in LoopConfigSnapshot on each LoopRecord. AgentProfile::resolve_thinking_level() and resolve_temperature() have been removed; resolution is now direct from AgentLoopConfig.
  • Introspection is the largest conceptual gap. It requires session log analysis, memory categorization (episodic/semantic/procedural), and feedback loops to Agent Profile evolution.
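
The three documented forms of the system_prompt value (inline text, a file: path, or a {{...}} reference) can be told apart with a small classifier. The sketch below is illustrative only: PromptSource and classify_prompt are hypothetical names, and the trimming rules are assumptions rather than phi-core's actual parsing.

```rust
/// Hypothetical classification of a raw system_prompt value.
#[derive(Debug, PartialEq)]
enum PromptSource<'a> {
    /// Plain inline prompt text.
    Inline(&'a str),
    /// Path after the `file:` prefix (documented as relative to the workspace).
    File(&'a str),
    /// Name inside a `{{...}}` reference to a prompt instance.
    Reference(&'a str),
}

/// Sketch only: decides which of the three forms a raw string uses.
fn classify_prompt(raw: &str) -> PromptSource<'_> {
    let trimmed = raw.trim();
    if let Some(path) = trimmed.strip_prefix("file:") {
        PromptSource::File(path.trim())
    } else if let Some(inner) = trimmed
        .strip_prefix("{{")
        .and_then(|rest| rest.strip_suffix("}}"))
    {
        PromptSource::Reference(inner.trim())
    } else {
        PromptSource::Inline(raw)
    }
}
```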

Session

A named container grouping all LoopRecords for one agent session. A Session represents a task the agent performs. It has identity, formation history, configuration, and contains an ordered sequence of Loops (iterations).

Sessions are created automatically by SessionRecorder when a new session_id first appears in an AgentStart event, or explicitly by the caller.

Concept Overview

Session [EXISTS]
├── HEADER
│   ├── session_id, agent_id [EXISTS]
│   ├── formation [EXISTS] — Explicit / FirstLoop / InactivityTimeout
│   ├── scope [EXISTS] — Ephemeral / Persistent (SessionScope enum)
│   ├── created_at, last_active_at [EXISTS]
│   ├── parent_spawn_ref [EXISTS] — cross-session link
│   ├── Task Name, Task Status [CONCEPTUAL]
│   └── Callbacks: before_task / after_task [EXISTS]
├── LINE ITEMS: Loops [EXISTS]
├── LINE ITEMS: Input Filters [EXISTS]
└── SUMMARY: total_usage(), loop_chain_to() [EXISTS]

HEADER

Field | Type | Status | Description
session_id | String | [EXISTS] | Stable identifier. Matches AgentStart.session_id. Generated as UUID v4 at BasicAgent::new().
agent_id | String | [EXISTS] | The agent that owns this session. Taken from the first AgentStart event.
formation | SessionFormation | [EXISTS] | How the session was created. See Formation section below.
scope | SessionScope | [EXISTS] | Ephemeral (default, in-memory only) or Persistent (session logs retained). Declared via config [session] scope = "persistent".
created_at | DateTime<Utc> | [EXISTS] | Timestamp of the first AgentStart event for this session.
last_active_at | DateTime<Utc> | [EXISTS] | Updated each time a new loop opens (on AgentStart). Reflects when the last loop started, not when it last had activity.
parent_spawn_ref | Option<SpawnRef> | [EXISTS] | Cross-session link when this session was spawned as a sub-agent. Points back to parent session, loop, tool call. Inverse of LoopRecord.child_loop_refs.
Task Name | String | [CONCEPTUAL] | Human-readable label for the task this session represents.
Task Status | enum | [CONCEPTUAL] | Status of the task (e.g., Pending, Running, Completed, Failed). Derived from loop statuses but would be a first-class field.

Formation [EXISTS]

How the session was initially created. Enum SessionFormation:

Variant | Status | Description
Explicit { timestamp } | [EXISTS] | Created by direct construction (tests, tooling). SessionRecorder never sets this.
FirstLoop { timestamp } | [EXISTS] | Created automatically when a new session_id first appeared in an AgentStart event.
InactivityTimeout { threshold_secs, previous_session_id, timestamp } | [EXISTS] | New session opened because the agent was idle longer than the threshold. Requires prior session_id rotation via BasicAgent::check_and_rotate.
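
The inactivity rule behind the InactivityTimeout variant can be sketched as below. This is not the signature of BasicAgent::check_and_rotate, which this document does not show: AgentClock, InactivityRotation, and rotate_if_idle are hypothetical, timestamps are plain seconds instead of DateTime<Utc>, and the new session id is supplied by the caller.

```rust
/// Hypothetical slice of agent state needed for rotation.
struct AgentClock {
    session_id: String,
    last_active_at: Option<u64>, // seconds; None before the first prompt
}

/// Mirrors the fields listed for the InactivityTimeout formation variant.
#[derive(Debug, PartialEq)]
struct InactivityRotation {
    threshold_secs: u64,
    previous_session_id: String,
    timestamp: u64,
}

/// Sketch: rotates the session id when the agent has been idle for longer
/// than `threshold_secs`, and reports what the new session's formation
/// would record. Returns None when no rotation is needed.
fn rotate_if_idle(
    agent: &mut AgentClock,
    now: u64,
    threshold_secs: u64,
    new_session_id: String,
) -> Option<InactivityRotation> {
    let last = agent.last_active_at?;
    if now.saturating_sub(last) <= threshold_secs {
        return None;
    }
    let previous_session_id = std::mem::replace(&mut agent.session_id, new_session_id);
    Some(InactivityRotation { threshold_secs, previous_session_id, timestamp: now })
}
```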

Callbacks [EXISTS]

Callbacks are configured on SessionRecorderConfig, not on the Session struct directly.

Callback | Type | Status | Description
before_task | Option<BeforeTaskFn> | [EXISTS] | Fires on the first AgentStart event with a new session_id. Blank by default.
after_task | Option<AfterTaskFn> | [EXISTS] | Fires on flush(). Blank by default.

LINE ITEMS: Loops (Iterations) [EXISTS]

Ordered list of all LoopRecords in this session, sorted by started_at.

Field | Type | Status | Description
loops | Vec<LoopRecord> | [EXISTS] | All completed and in-progress loop records. See loop.md.

Loop Tree Structure

The tree is implicit via parent_loop_id / children_loop_ids links:

  • Root loops -- parent_loop_id is None (or points to a loop in a different session for sub-agent roots).
  • Continuation chains -- parent_loop_id -> loop_id within the same session.
  • Parallel branches -- siblings sharing the same parent_loop_id, each with parallel_group set.
  • Sub-agent children -- in child_loop_refs on the parent loop (cross-session, not in loops vec).

LINE ITEMS: Input Filters [EXISTS]

Input filters validate user messages before the LLM is called. Stored on AgentLoopConfig.input_filters, conceptually a Session-level concern.

Field | Type | Status | Description
input_filters | Vec<Arc<dyn InputFilter>> | [EXISTS] | Each filter returns Pass, Warn, or Reject for a given message. Reject aborts the loop before any LLM call and emits InputRejected.
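
A filter chain with the Pass / Warn / Reject outcomes described above might look like the following sketch. The trait name matches the table, but the check method, the FilterResult payloads, the two example filters, and run_filters are all assumptions made for illustration.

```rust
use std::sync::Arc;

/// Hypothetical outcome type with the three documented results.
#[derive(Debug, PartialEq)]
enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

/// Sketch of the filter trait; the real method name and signature may differ.
trait InputFilter {
    fn check(&self, message: &str) -> FilterResult;
}

/// Example filter: rejects messages longer than a character budget.
struct MaxLength(usize);

impl InputFilter for MaxLength {
    fn check(&self, message: &str) -> FilterResult {
        if message.chars().count() > self.0 {
            FilterResult::Reject(format!("message exceeds {} characters", self.0))
        } else {
            FilterResult::Pass
        }
    }
}

/// Example filter: warns on blank input but lets it through.
struct WarnOnEmpty;

impl InputFilter for WarnOnEmpty {
    fn check(&self, message: &str) -> FilterResult {
        if message.trim().is_empty() {
            FilterResult::Warn("message is empty".to_string())
        } else {
            FilterResult::Pass
        }
    }
}

/// Runs filters in order. Warnings are collected; the first Reject stops
/// the chain, which corresponds to aborting before any LLM call.
fn run_filters(filters: &[Arc<dyn InputFilter>], message: &str) -> Result<Vec<String>, String> {
    let mut warnings = Vec::new();
    for filter in filters {
        match filter.check(message) {
            FilterResult::Pass => {}
            FilterResult::Warn(warning) => warnings.push(warning),
            FilterResult::Reject(reason) => return Err(reason),
        }
    }
    Ok(warnings)
}
```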

SUMMARY Methods [EXISTS]

Methods on the Session struct for querying and aggregating.

Method | Status | Description
total_usage() | [EXISTS] | Cumulative Usage across all loops. Sums input, output, reasoning, cache_read, cache_write, total_tokens.
loop_chain_to(target_loop_id) | [EXISTS] | Builds the linear chain of loop IDs from root to target by walking parent_loop_id links backward. Returns chronological order (root first). Handles parallel branches (only selected path) and reruns (only active ancestor chain).
root_loops() | [EXISTS] | Returns loops whose parent_loop_id is None or belongs to a different session.
children_of(loop_id) | [EXISTS] | Returns direct same-session children of a loop.
parallel_siblings(loop_id) | [EXISTS] | Returns all loops in the same parallel group.
get_loop(loop_id) | [EXISTS] | Look up a loop by ID.
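
The backward walk that loop_chain_to is described as performing can be sketched over a simplified record. LoopSketch is a hypothetical two-field stand-in for LoopRecord, the free function below is not the actual method on Session, and the cycle guard is an added safety assumption.

```rust
use std::collections::HashMap;

/// Hypothetical two-field stand-in for LoopRecord.
struct LoopSketch {
    loop_id: String,
    parent_loop_id: Option<String>,
}

/// Sketch: walks parent_loop_id links backward from the target, then
/// reverses so the root comes first. A parent that is not in `loops`
/// (for example, one in a different session) ends the walk, so that
/// loop is treated as the root. Returns None for an unknown target.
fn loop_chain_to(loops: &[LoopSketch], target_loop_id: &str) -> Option<Vec<String>> {
    let by_id: HashMap<&str, &LoopSketch> =
        loops.iter().map(|l| (l.loop_id.as_str(), l)).collect();
    let mut current = Some(*by_id.get(target_loop_id)?);
    let mut chain = Vec::new();
    while let Some(record) = current {
        // Guard against malformed data with a parent cycle.
        if chain.len() >= loops.len() {
            return None;
        }
        chain.push(record.loop_id.clone());
        current = record
            .parent_loop_id
            .as_deref()
            .and_then(|parent| by_id.get(parent).copied());
    }
    chain.reverse();
    Some(chain)
}
```

Because only ancestors are followed, sibling branches that share a parent never appear in the result, which matches the "only selected path" behaviour described in the table.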

Code Reference

File | What it contains
src/session/model.rs | Session struct, SessionFormation enum, SpawnRef struct, SessionError enum. All methods (total_usage, loop_chain_to, root_loops, children_of, parallel_siblings, get_loop).

Conceptual Notes

  • Session scope (Ephemeral vs Persistent) exists as the SessionScope enum. Sessions are Ephemeral by default; Persistent scope gates whether Introspection is required.
  • Model/thinking/temperature per-loop -- These settings are no longer on Session. They are tracked per-loop via LoopConfigSnapshot on each LoopRecord (see loop.md). The fallback hierarchy is Loop -> Agent default.
  • Task Name and Task Status would give sessions first-class task identity, enabling task dashboards and workflow tracking.
  • before_task / after_task callbacks now exist on SessionRecorderConfig. before_task fires on the first AgentStart with a new session_id; after_task fires on flush(). This mirrors the existing before_loop/after_loop and before_turn/after_turn callback pattern at the Session level.

Loop

A complete record of one agent-loop execution, stored as LoopRecord. Loops are the iterations within a Session. Each Loop contains Turns (steps), tracks its model/provider configuration, accumulates usage, and links to parent/child loops for tree navigation.

Loops are created by agent_loop (origin loops) or agent_loop_continue (continuation loops). The SessionRecorder materializes LoopRecord structs from the AgentStart / AgentEnd event pairs.

Concept Overview

Loop [EXISTS — LoopRecord]
├── HEADER
│   ├── loop_id [EXISTS] — "{session_id}.{config_segment}.{N}"
│   ├── status [EXISTS] — Pending/Running/Completed/Rejected/Aborted
│   ├── continuation_kind [EXISTS] — Initial/Default/Rerun/Branch/Compaction
│   ├── parent_loop_id [EXISTS]
│   ├── timing [EXISTS] — started_at, ended_at
│   ├── Model [EXISTS] — falls back: Loop → Agent default
│   ├── config [EXISTS] — LoopConfigSnapshot
│   ├── usage, compaction_block [EXISTS]
│   └── Callbacks: before_loop / after_loop / on_error [EXISTS]
├── LINE ITEMS: Turns [EXISTS as events and struct]
├── LINE ITEMS: Same-session children, Sub-agent spawns [EXISTS]
├── LINE ITEMS: Parallel group [EXISTS]
└── LINE ITEMS: Events [EXISTS]

HEADER

Field | Type | Status | Description
loop_id | String | [EXISTS] | Unique identifier. Format: "{session_id}.{config_segment}.{N}". The config_segment encodes which model/provider produced this loop. N is a monotonic counter per (session, config).
session_id | String | [EXISTS] | Session this loop belongs to.
agent_id | String | [EXISTS] | Agent that ran this loop.
status | LoopStatus | [EXISTS] | Lifecycle state: Pending, Running, Completed, Rejected, Aborted. See Status section below.
continuation_kind | ContinuationKind | [EXISTS] | How this loop relates to its parent. Initial for origin loops (agent_loop). Default for regular continuations. Rerun for retries. Branch for branch explorations. Compaction for standalone compaction passes.
parent_loop_id | Option<String> | [EXISTS] | The loop that directly preceded this one. None for origin loops. For sub-agent loops, points to the tool-call loop in a different session.
started_at | DateTime<Utc> | [EXISTS] | Timestamp from AgentStart.
ended_at | Option<DateTime<Utc>> | [EXISTS] | Timestamp from AgentEnd. None while running or pending.
rejection | Option<String> | [EXISTS] | Set when AgentEnd.rejection is Some (input filter blocked the run).
metadata | Option<serde_json::Value> | [EXISTS] | Opaque caller-supplied metadata from AgentStart (e.g., request id, trace ID).
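
The loop_id format and its per-(session, config) counter can be illustrated with a short sketch. LoopIdGenerator and session_of are hypothetical helpers; starting the counter at 1 and assuming that session ids contain no dots (true for UUIDs) are assumptions, not documented behaviour.

```rust
use std::collections::HashMap;

/// Hypothetical generator keyed by "{session_id}.{config_segment}".
#[derive(Default)]
struct LoopIdGenerator {
    counters: HashMap<String, usize>,
}

impl LoopIdGenerator {
    /// Builds "{session_id}.{config_segment}.{N}" with N increasing
    /// monotonically per (session, config). Assumption: N starts at 1.
    fn next_loop_id(&mut self, session_id: &str, config_segment: &str) -> String {
        let key = format!("{session_id}.{config_segment}");
        let counter = self.counters.entry(key.clone()).or_insert(0);
        *counter += 1;
        format!("{key}.{counter}")
    }
}

/// Recovers the session id as the prefix before the first dot.
/// Assumption: session ids themselves contain no dots.
fn session_of(loop_id: &str) -> Option<&str> {
    loop_id.split_once('.').map(|(session, _)| session)
}
```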

Model for this Loop [EXISTS]

The model/provider identity is captured as a lightweight snapshot, not the full config (which contains secrets and non-serializable closures).

Field | Type | Status | Description
config | Option<LoopConfigSnapshot> | [EXISTS] | Populated from AgentStart.config_snapshot or the first Message::Assistant seen. None if loop ended before any assistant message and no snapshot was provided.
config.model | String | [EXISTS] | Model id string (e.g., "claude-opus-4-6", "gpt-4o").
config.provider | String | [EXISTS] | Provider name (e.g., "anthropic", "openai").
config.config_id | Option<String> | [EXISTS] | Stable config identity from AgentLoopConfig.config_id. Matches the config_segment in loop_id.
config.name | Option<String> | [EXISTS] | Model display name.
config.api | Option<ApiProtocol> | [EXISTS] | Which API protocol was used (e.g., AnthropicMessages, OpenAiCompletions).
config.base_url | Option<String> | [EXISTS] | Provider base URL.
config.reasoning | Option<bool> | [EXISTS] | Whether this model supports reasoning/thinking.
config.context_window | Option<u32> | [EXISTS] | Context window size in tokens.
config.max_tokens | Option<u32> | [EXISTS] | Max output tokens per response.
config.thinking_level | Option<ThinkingLevel> | [EXISTS] | Reasoning depth level for this loop. Formerly a Session-level attribute; now per-loop.
config.temperature | Option<f32> | [EXISTS] | Sampling temperature. Formerly a Session-level attribute; now per-loop.

Model fallback hierarchy: Loop (AgentLoopConfig.model_config) -> Agent default (BasicAgent.model_config).

Usage [EXISTS]

Field | Type | Status | Description
usage | Usage | [EXISTS] | Token usage from AgentEnd.usage. Accumulated across all turns in this loop. Fields: input, output, reasoning, cache_read, cache_write, total_tokens.

Compaction [EXISTS]

Field | Type | Status | Description
compaction_block | Option<CompactionBlock> | [EXISTS] | Non-destructive compaction overlay. When Some, the context loader uses this block instead of raw messages. Original messages remain untouched.

Status [EXISTS]

Lifecycle state of a LoopRecord. Enum LoopStatus:

Pending -> Running -> Completed
                   -> Rejected
                   -> Aborted

Variant | Status | Description
Pending | [EXISTS] | Loop id appeared in ParallelLoopStart but AgentStart has not yet arrived. Only for parallel-evaluation branches.
Running | [EXISTS] | AgentStart was received; the loop is executing.
Completed | [EXISTS] | AgentEnd was received with no rejection.
Rejected | [EXISTS] | AgentEnd was received with rejection: Some(_). Input filter blocked the run.
Aborted | [EXISTS] | SessionRecorder::flush was called before AgentEnd arrived (e.g., process shutdown).
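
The lifecycle above reads naturally as a small state machine. In this sketch the LoopStatus variants mirror the documented enum, while Signal and next_status are hypothetical. Following the diagram, only a Running loop becomes Aborted on flush; what happens to a loop still Pending at flush time is not stated here, so the sketch leaves it unchanged.

```rust
/// Mirrors the documented LoopStatus variants.
#[derive(Clone, Copy, Debug, PartialEq)]
enum LoopStatus {
    Pending,
    Running,
    Completed,
    Rejected,
    Aborted,
}

/// Hypothetical inputs that drive a status change.
#[derive(Clone, Copy, Debug)]
enum Signal {
    AgentStart,
    AgentEnd { rejected: bool },
    Flush,
}

/// Sketch of the transitions in the diagram. Any pair not listed
/// leaves the status unchanged, so terminal states stay terminal.
fn next_status(current: LoopStatus, signal: Signal) -> LoopStatus {
    use LoopStatus::*;
    match (current, signal) {
        (Pending, Signal::AgentStart) => Running,
        (Running, Signal::AgentEnd { rejected: false }) => Completed,
        (Running, Signal::AgentEnd { rejected: true }) => Rejected,
        (Running, Signal::Flush) => Aborted,
        (other, _) => other,
    }
}
```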

Callbacks [EXISTS]

Callback | Status | Description
before_loop | [EXISTS] | Fires before AgentStart is emitted. Defined as BeforeLoopFn on AgentLoopConfig. Blank by default.
after_loop | [EXISTS] | Fires after AgentEnd is emitted. Defined as AfterLoopFn. Receives messages and usage. Blank by default.
on_error | [EXISTS] | Fires when StopReason::Error is encountered. Defined as OnErrorFn. Blank by default.

LINE ITEMS: Turns (Steps) [EXISTS] as events and struct

Turns exist as TurnStart / TurnEnd event pairs in the loop's event stream, and as materialized Turn structs on LoopRecord.turns. See turn.md.

Field | Type | Status | Description
turns | Vec<Turn> | [EXISTS] | Materialized turn records. Built by SessionRecorder from event pairs. Empty for old sessions (backward compat via #[serde(default)]).
(event-pair) | - | [EXISTS] | Each turn is also bounded by TurnStart and TurnEnd events in self.events.

LINE ITEMS: Same-session Children [EXISTS]

Field | Type | Status | Description
children_loop_ids | Vec<String> | [EXISTS] | Loop IDs of same-session child loops (continuations, reruns, branches). Parent->children direction. Does not include cross-session sub-agent children.

LINE ITEMS: Sub-agent Spawns (Cross-session) [EXISTS]

Field | Type | Status | Description
child_loop_refs | Vec<ChildLoopRef> | [EXISTS] | Cross-session links to sub-agent loops spawned by tool calls. Each entry has: tool_call_id, tool_name, child_loop_id, child_session_id.

ChildLoopRef fields:

Field | Type | Status | Description
tool_call_id | String | [EXISTS] | The ToolCall.id that triggered sub-agent execution.
tool_name | String | [EXISTS] | The tool name that performed the spawn.
child_loop_id | String | [EXISTS] | The sub-agent's AgentStart.loop_id.
child_session_id | String | [EXISTS] | The sub-agent's session. Extracted from child_loop_id prefix.

LINE ITEMS: Parallel Group [EXISTS]

Set when this loop was part of an evaluational-parallelism group (agent_loop_parallel).

Field | Type | Status | Description
parallel_group | Option<ParallelGroupRecord> | [EXISTS] | None for non-parallel loops.
all_loop_ids | Vec<String> | [EXISTS] | All branch loop IDs in config order.
selected_loop_id | String | [EXISTS] | The winning branch's loop ID.
selected_config_index | usize | [EXISTS] | 0-based index of the winner in the original configs.
evaluation_usage | Usage | [EXISTS] | Token usage from the judge LLM (zero for non-judge strategies).
is_selected | bool | [EXISTS] | true if this LoopRecord is the evaluation winner.

LINE ITEMS: Events [EXISTS]

Field | Type | Status | Description
events | Vec<LoopEvent> | [EXISTS] | Ordered event stream for this loop.

Each LoopEvent has:

Field | Type | Status | Description
sequence | u64 | [EXISTS] | Monotonic counter (0-based). Gaps indicate filtered events (e.g., streaming deltas when include_streaming_events is false).
event | AgentEvent | [EXISTS] | The original event. event.loop_id() matches this LoopRecord.loop_id.

Messages [EXISTS]

Field | Type | Status | Description
messages | Vec<AgentMessage> | [EXISTS] | All new messages produced by this loop, from AgentEnd.messages. Authoritative for replay and branching.

Loop Origin Classification

parent_loop_id | continuation_kind | Meaning
None | Initial | Fresh origin loop (agent_loop)
Some(p), same session | Default | Regular continuation
Some(p), same session | Rerun | Retry / error recovery
Some(p), same session | Branch | Branch exploration
Some(p), different session | Initial | Sub-agent loop (spawned by a tool)
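
The classification table maps directly onto a match expression. In the sketch below the ContinuationKind variants follow the documented enum; ParentSession, Origin, and classify are hypothetical names, and the caller is assumed to have already determined whether the parent loop lives in the same session. Combinations the table does not list, such as Compaction, fall into an Unlisted bucket rather than being guessed.

```rust
/// Mirrors the documented ContinuationKind variants.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ContinuationKind {
    Initial,
    Default,
    Rerun,
    Branch,
    Compaction,
}

/// Hypothetical: where the parent loop lives relative to this loop.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ParentSession {
    Same,
    Different,
}

/// Hypothetical labels for the "Meaning" column.
#[derive(Debug, PartialEq)]
enum Origin {
    FreshOrigin,
    RegularContinuation,
    Retry,
    BranchExploration,
    SubAgent,
    /// Combination not covered by the table.
    Unlisted,
}

/// Sketch: one arm per table row; `parent` is None when parent_loop_id is None.
fn classify(parent: Option<ParentSession>, kind: ContinuationKind) -> Origin {
    match (parent, kind) {
        (None, ContinuationKind::Initial) => Origin::FreshOrigin,
        (Some(ParentSession::Same), ContinuationKind::Default) => Origin::RegularContinuation,
        (Some(ParentSession::Same), ContinuationKind::Rerun) => Origin::Retry,
        (Some(ParentSession::Same), ContinuationKind::Branch) => Origin::BranchExploration,
        (Some(ParentSession::Different), ContinuationKind::Initial) => Origin::SubAgent,
        _ => Origin::Unlisted,
    }
}
```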

Code Reference

File | What it contains
src/session/model.rs | LoopRecord struct, LoopStatus enum, LoopConfigSnapshot struct, ChildLoopRef struct, ParallelGroupRecord struct, LoopEvent struct, OpenLoop struct.
src/agent_loop/run.rs | run_loop function: the core loop engine. Implements the outer loop (follow-ups) and inner loop (tool calls + steering). Accumulates Usage, fires turn events and hooks.

Conceptual Notes

  • Model fallback is Loop -> Agent default. Session no longer carries model/thinking/temperature fields; these are tracked per-loop in LoopConfigSnapshot.
  • Turns as a struct are materialized on LoopRecord.turns as Vec<Turn>. Built by SessionRecorder from TurnStart/TurnEnd event pairs. The flat messages field is kept independently for compaction and context building. Old sessions without turns deserialize with an empty vec.
  • LoopConfigSnapshot intentionally does not store the full AgentLoopConfig because it contains API keys and non-serializable hook closures. The snapshot captures model identity plus key parameters (thinking_level, temperature, context_window, max_tokens, etc.) for cost attribution, replay identification, parallel branch differentiation, and per-loop config tracking.

Turn

A single LLM call-and-response cycle within a Loop. One Loop may have many Turns: the initial response plus one per tool-call round-trip or steering message injection.

Status: Turn [EXISTS] as both a first-class struct (Turn on LoopRecord.turns) and as an event-pair (TurnStart / TurnEnd). The SessionRecorder materializes Turn structs from the event stream.

Concept Overview

Turn [EXISTS as struct on LoopRecord.turns; EXISTS as event-pair TurnStart/TurnEnd]
├── HEADER
│   ├── TurnId [EXISTS] — { loop_id, turn_index }
│   ├── triggered_by [EXISTS] — User/SubAgent/Continuation/Branch
│   ├── usage [EXISTS] — per-turn from TurnEnd
│   └── Callbacks: before_turn / after_turn [EXISTS]
└── LINE ITEMS: Actions
    ├── Messages [EXISTS] — Input (User) + Output (Assistant)
    ├── Tool Executions [EXISTS]
    └── Streaming [EXISTS] — MessageUpdate deltas

HEADER

Field | Type | Status | Description
TurnId | struct | [EXISTS] | Identifies the turn. Composed of loop_id: String and turn_index: u32. Carried on every LlmMessage produced during the turn.
turn_index | u32 | [EXISTS] | Zero-based index within the current loop (0 = first turn after AgentStart). Present on TurnStart and TurnEnd events.
triggered_by | TurnTrigger | [EXISTS] | What caused this turn to begin. See Trigger section below.
usage | Usage | [EXISTS] | Per-turn token usage. Carried on TurnEnd.usage. Fields: input, output, reasoning, cache_read, cache_write, total_tokens.
timestamp (start) | DateTime<Utc> | [EXISTS] | Wall-clock time when the turn began. On TurnStart.timestamp.
timestamp (end) | DateTime<Utc> | [EXISTS] | Wall-clock time when the turn completed (after all tool calls finished). On TurnEnd.timestamp.

TurnTrigger [EXISTS]

Identifies what caused a new turn to begin. Enum TurnTrigger:

Variant | Status | Description
User | [EXISTS] | First turn triggered by a user message (agent_loop).
SubAgent | [EXISTS] | This agent was invoked as a sub-agent by a parent agent.
Continuation | [EXISTS] | Continuation turn: tool round-trip, steering message, or Default / Rerun continuation.
Branch | [EXISTS] | First turn of a Branch continuation (agent_loop_continue with ContinuationKind::Branch). Subsequent turns within the same branched loop use Continuation.

Callbacks [EXISTS]

Callback | Status | Description
before_turn | [EXISTS] | Fires BEFORE TurnStart event is emitted. Defined as BeforeTurnFn on AgentLoopConfig. Receives (&[AgentMessage], usize) (messages, turn index). Returning false aborts the turn.
after_turn | [EXISTS] | Fires AFTER TurnEnd event is emitted. Defined as AfterTurnFn. Receives (&[AgentMessage], &Usage).

LINE ITEMS: Messages [EXISTS]

Messages produced and consumed during the turn.

Message Type | Direction | Status | Description
Input (User / Steering / Follow-up) | Into LLM | [EXISTS] | Injected after TurnStart. Includes initial prompt messages (first turn only), pending steering messages, and follow-up messages. Each emits MessageStart / MessageEnd events. All carry the current TurnId.
Output (Assistant) | From LLM | [EXISTS] | The LLM's streamed response. Emitted as MessageStart -> MessageUpdate (streaming deltas) -> MessageEnd. Carries StopReason, model, provider, usage. Pushed to context and new_messages with TurnId.

LINE ITEMS: Tool Executions [EXISTS]

Tool calls extracted from the assistant message's Content::ToolCall items.

Field | Status | Description
Tool calls | [EXISTS] | Extracted from Message::Assistant.content as (id, name, arguments) tuples.
ToolExecutionStart event | [EXISTS] | Emitted per tool call before execute(). Carries tool_call_id, tool_name, args.
ToolExecutionUpdate event | [EXISTS] | Emitted during execution for streaming partial results (via ctx.on_update). Not all tools emit these.
ToolExecutionEnd event | [EXISTS] | Emitted when tool finishes. Carries result, is_error, optional child_loop_id (for sub-agent tools).
ProgressMessage event | [EXISTS] | Plain text status updates from tools (via ctx.on_progress).
Tool results | [EXISTS] | Message::ToolResult messages appended to context with the current TurnId. Fed back to LLM in the next turn.
TurnEnd.tool_results | [EXISTS] | All tool result messages for this turn. Empty when no tool calls were made (StopReason::Stop).

LINE ITEMS: Streaming Deltas [EXISTS]

Incremental token-level updates from the LLM stream, carried on MessageUpdate events.

Variant | Status | Description
StreamDelta::Text { delta } | [EXISTS] | A text token fragment from the LLM's response.
StreamDelta::Thinking { delta } | [EXISTS] | A thinking/reasoning chunk (extended thinking mode only).
StreamDelta::ToolCallDelta { delta } | [EXISTS] | A fragment of JSON arguments for a tool call. Must be accumulated and parsed after MessageEnd.
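
A consumer of MessageUpdate events typically folds the deltas into buffers, as in this sketch. The StreamDelta variants follow the table, with String payloads as an assumption; Accumulated and accumulate are hypothetical. The sketch handles a single tool call only, and it stops at concatenation: parsing the buffered JSON would happen after MessageEnd, for example with serde_json.

```rust
/// Mirrors the documented variants; String payloads are an assumption.
enum StreamDelta {
    Text { delta: String },
    Thinking { delta: String },
    ToolCallDelta { delta: String },
}

/// Hypothetical buffers for one assistant message.
#[derive(Debug, Default, PartialEq)]
struct Accumulated {
    text: String,
    thinking: String,
    /// Raw JSON argument fragments, concatenated but not yet parsed.
    tool_call_json: String,
}

/// Sketch: folds deltas in arrival order into the three buffers.
fn accumulate(deltas: &[StreamDelta]) -> Accumulated {
    let mut acc = Accumulated::default();
    for delta in deltas {
        match delta {
            StreamDelta::Text { delta } => acc.text.push_str(delta),
            StreamDelta::Thinking { delta } => acc.thinking.push_str(delta),
            StreamDelta::ToolCallDelta { delta } => acc.tool_call_json.push_str(delta),
        }
    }
    acc
}
```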

Per-Turn Event Ordering

The following event ordering is strictly enforced on every iteration of the inner loop in run_loop:

before_turn hook  ->  TurnStart event
                  ->  [MessageStart/End for prompt/steering messages]
                  ->  [Compaction if threshold exceeded]
                  ->  [MessageStart -> MessageUpdate* -> MessageEnd for assistant response]
                  ->  [ToolExecutionStart -> ToolExecutionUpdate* -> ToolExecutionEnd for each tool]
                  ->  TurnEnd event
                  ->  after_turn hook

Code Reference

File | What it contains
src/agent_loop/run.rs | run_loop function: implements the turn cycle. TurnStart / TurnEnd event emission, before_turn / after_turn hook invocation, turn trigger determination, usage accumulation, tool call extraction and execution.
src/types/event.rs | TurnTrigger enum, AgentEvent::TurnStart and AgentEvent::TurnEnd variants, StreamDelta enum.
src/types/agent_message.rs | TurnId struct: { loop_id, turn_index }. Carried on LlmMessage.turn_id.
src/session/model.rs | Turn struct: materialized turn record on LoopRecord.turns. Fields: turn_id, triggered_by, usage, input_messages, output_message, tool_results, started_at, ended_at.
src/session/recorder.rs | SessionRecorder: builds Turn structs from TurnStart / MessageEnd / TurnEnd events.

Conceptual Notes

  • Turn as a first-class struct is implemented. The Turn struct on LoopRecord.turns contains: turn_id, triggered_by, usage, input_messages, output_message, tool_results, started_at, ended_at. Built by SessionRecorder from TurnStart/TurnEnd event pairs. The flat LoopRecord.messages is kept independently for backward compatibility and use by compaction/context building. Old sessions without turns deserialize with an empty vec via #[serde(default)].
  • Turn lifecycle is entirely within a single Loop. A turn never spans loops. The inner loop in run_loop continues when there are tool calls or pending steering messages; each iteration is one turn.
  • Execution limits are checked BEFORE before_turn fires, so hooks are not invoked for impossible turns. If a limit is reached, a system message ([Agent stopped: ...]) is emitted and the loop returns.
  • Compaction can occur within a turn (after TurnStart, before the LLM call), making a single turn potentially include a compaction event in its span.
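
The start-of-turn ordering described in these notes (limits are checked first, then before_turn, then TurnStart) can be sketched as follows. The default values come from the execution_limits description (50 turns, 1M tokens, 10 minutes); the field names, Progress, limit_reached, begin_turn, and the wording of the stop reason are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical field names; defaults follow the documented limits.
struct ExecutionLimits {
    max_turns: usize,
    max_total_tokens: u64,
    max_duration: Duration,
}

impl Default for ExecutionLimits {
    fn default() -> Self {
        Self {
            max_turns: 50,
            max_total_tokens: 1_000_000,
            max_duration: Duration::from_secs(600),
        }
    }
}

/// Hypothetical running totals for the current loop.
struct Progress {
    turns: usize,
    total_tokens: u64,
    elapsed: Duration,
}

/// Returns a short reason when any limit has been reached.
fn limit_reached(limits: &ExecutionLimits, p: &Progress) -> Option<&'static str> {
    if p.turns >= limits.max_turns {
        Some("max turns")
    } else if p.total_tokens >= limits.max_total_tokens {
        Some("max tokens")
    } else if p.elapsed >= limits.max_duration {
        Some("max duration")
    } else {
        None
    }
}

/// Sketch of the start of one turn. Limits are checked before the hook,
/// so before_turn never runs for a turn that cannot happen. Returns
/// true when TurnStart was emitted (recorded in `log`).
fn begin_turn(
    limits: &ExecutionLimits,
    progress: &Progress,
    before_turn: impl Fn(usize) -> bool,
    log: &mut Vec<String>,
) -> bool {
    if let Some(reason) = limit_reached(limits, progress) {
        log.push(format!("[Agent stopped: {reason}]"));
        return false;
    }
    log.push("before_turn hook".to_string());
    if !before_turn(progress.turns) {
        return false; // the hook vetoed the turn
    }
    log.push("TurnStart".to_string());
    true
}
```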

Message

The message entities form the communication substrate of the entire system. Messages flow through Agent, Session, Loop, and Turn. The type hierarchy separates atomic content blocks, conversation-level messages, agent-level routing envelopes, and token usage tracking.

Concept Overview

Message System [EXISTS]
├── Content [EXISTS] — Text / Image / Thinking / ToolCall
├── Message [EXISTS] — User / Assistant / ToolResult
├── AgentMessage [EXISTS] — Llm(LlmMessage) | Extension(ExtensionMessage)
├── LlmMessage [EXISTS] — Message + Option<TurnId>
├── StopReason [EXISTS] — Stop/Length/ToolUse/Error/Aborted/...
└── Usage [EXISTS] — input/output/reasoning/cache tokens

Content [EXISTS]

The atomic unit of all message payloads. Every message is composed of Vec<Content>. A single LLM turn can contain multiple content blocks (e.g., a Thinking block followed by Text, or Text followed by multiple ToolCalls).

Enum Content, tagged by "type" in JSON:

| Variant | Status | Fields | Description |
| --- | --- | --- | --- |
| Text | [EXISTS] | text: String | Plain string payload sent to/from the LLM. |
| Image | [EXISTS] | data: String, mime_type: String | Binary image encoded as base64 string (not a file path). LLMs receive image bytes inline. |
| Thinking | [EXISTS] | thinking: String, signature: Option<String> | Internal chain-of-thought from the LLM (e.g., Claude extended thinking). Visible in UI, never re-sent as content to LLM. signature is a cryptographic integrity token from the provider that must be echoed back unmodified in multi-turn conversations. |
| ToolCall | [EXISTS] | id: String, name: String, arguments: serde_json::Value | LLM's request to invoke a tool with structured JSON arguments. The id links to a corresponding ToolResult. |
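
As a rough illustration of how the content blocks compose, the sketch below mirrors the four variants with plain standard-library types. It is not the crate's definition: `arguments` is held as a raw JSON string to avoid external crates (the source uses serde_json::Value), and the helpers `visible_text` and `tool_call_ids` are hypothetical names introduced only for this example.

```rust
/// Simplified stand-in for the `Content` enum described above.
/// Assumption: `arguments` is a raw JSON string here; the source uses
/// `serde_json::Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum Content {
    Text { text: String },
    Image { data: String, mime_type: String },
    Thinking { thinking: String, signature: Option<String> },
    ToolCall { id: String, name: String, arguments: String },
}

/// Hypothetical helper: concatenate only the `Text` blocks.
pub fn visible_text(blocks: &[Content]) -> String {
    blocks
        .iter()
        .filter_map(|b| match b {
            Content::Text { text } => Some(text.as_str()),
            _ => None,
        })
        .collect::<Vec<_>>()
        .join("")
}

/// Hypothetical helper: collect the id of every `ToolCall` block so each
/// can later be matched to a tool result.
pub fn tool_call_ids(blocks: &[Content]) -> Vec<&str> {
    blocks
        .iter()
        .filter_map(|b| match b {
            Content::ToolCall { id, .. } => Some(id.as_str()),
            _ => None,
        })
        .collect()
}
```

A single assistant turn holding a Thinking block, a Text block, and a ToolCall would yield one string from `visible_text` and one id from `tool_call_ids`.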

Message [EXISTS]

The conversation-level message enum. Tagged by "role" in JSON. Each variant carries Vec<Content> plus role-specific metadata.

| Variant | Status | Fields | Description |
| --- | --- | --- | --- |
| User | [EXISTS] | content: Vec<Content>, timestamp: u64 | User turn. Mixed media supported (text + images). Timestamp is unix millis. Helper constructor: Message::user(text). |
| Assistant | [EXISTS] | content: Vec<Content>, stop_reason: StopReason, model: String, provider: String, usage: Usage, timestamp: u64, error_message: Option<String> | LLM's response, fully annotated. stop_reason tells why generation stopped. model/provider captured for cost tracking and multi-provider routing. Failed turns are persisted, not dropped. |
| ToolResult | [EXISTS] | tool_call_id: String, tool_name: String, content: Vec<Content>, is_error: bool, timestamp: u64 | Tool execution result returned to LLM. tool_call_id links back to the specific ToolCall in the assistant content. is_error: true means the LLM sees the failure and can recover/retry. |

Helper Methods on Message

| Method | Status | Description |
| --- | --- | --- |
| user(text) | [EXISTS] | Constructor for simple text user messages. |
| role() | [EXISTS] | Returns "user", "assistant", or "toolResult". |
| is_context_overflow() | [EXISTS] | Checks if an assistant message represents a context overflow error by inspecting error_message against known provider overflow patterns. |

StopReason [EXISTS]

Why an assistant message's generation stopped. Enum with camelCase serialization.

| Variant | Status | Description |
| --- | --- | --- |
| Stop | [EXISTS] | Natural end of generation. |
| Length | [EXISTS] | Max tokens reached. |
| ToolUse | [EXISTS] | LLM requested tool execution. |
| Error | [EXISTS] | Provider error during generation. |
| Aborted | [EXISTS] | Cancelled by caller. |
| MaxTurns | [EXISTS] | Maximum allowed turns reached. |
| UserStop | [EXISTS] | Stopped by explicit user command. |
| Handoff | [EXISTS] | Agent handing off to human operator. |
| GuardRail | [EXISTS] | Stopped by internal guardrail (content moderation, safety filter). |
| ContextCompacted | [EXISTS] | Context was compacted, potentially losing information. |
| Paused | [EXISTS] | Generation paused (waiting for external input). |

AgentMessage [EXISTS]

The agent loop's two-lane routing envelope. Decides whether content goes INTO the LLM context window or SIDEWAYS to the UI/app without consuming tokens.

Enum AgentMessage, untagged in JSON (discriminated by role field):

| Variant | Status | Description |
| --- | --- | --- |
| Llm(LlmMessage) | [EXISTS] | Enters the LLM context window. Serialized into the API request. |
| Extension(ExtensionMessage) | [EXISTS] | NEVER enters the context window. Only emitted as AgentEvents. For UI notifications, debug events, session metadata, progress markers. |

Key Design: One-way Conversion

Message -> AgentMessage::Llm exists via From<Message>. There is no path for ExtensionMessage to become an Llm variant. The type system enforces that UI-only content can never accidentally slip into the LLM context.
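
The one-way conversion can be sketched with simplified stand-in types. The shapes below are assumptions made for illustration (a single `User` variant, `TurnId` modelled as a plain integer, `data` as a string), and `llm_context` is a hypothetical helper. The point is that only a `From<Message>` impl exists, so nothing can turn an `ExtensionMessage` into the `Llm` lane.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    User { text: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct LlmMessage {
    pub message: Message,
    pub turn_id: Option<u64>, // assumption: TurnId modelled as an integer
}

#[derive(Debug, Clone, PartialEq)]
pub struct ExtensionMessage {
    pub kind: String,
    pub data: String, // assumption: JSON payload kept as a string
}

#[derive(Debug, Clone, PartialEq)]
pub enum AgentMessage {
    Llm(LlmMessage),
    Extension(ExtensionMessage),
}

// The only conversion into the Llm lane starts from a Message.
impl From<Message> for AgentMessage {
    fn from(message: Message) -> Self {
        AgentMessage::Llm(LlmMessage { message, turn_id: None })
    }
}

impl AgentMessage {
    /// Some(&Message) for the Llm lane, None for Extension.
    pub fn as_llm(&self) -> Option<&Message> {
        match self {
            AgentMessage::Llm(m) => Some(&m.message),
            AgentMessage::Extension(_) => None,
        }
    }
}

/// Hypothetical helper: the subset of history that would be serialized
/// into an API request.
pub fn llm_context(history: &[AgentMessage]) -> Vec<&Message> {
    history.iter().filter_map(|m| m.as_llm()).collect()
}
```

Filtering through `as_llm()` drops extension messages without any runtime flag, because the variant itself carries the routing decision.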

Methods on AgentMessage

| Method | Status | Description |
| --- | --- | --- |
| role() | [EXISTS] | Delegates to inner message's role. |
| as_llm() | [EXISTS] | Returns Option<&Message>. None for Extension. |
| turn_id() | [EXISTS] | Returns Option<&TurnId>. None for Extension. |
| with_turn_id(Option<TurnId>) | [EXISTS] | Sets turn_id on LLM messages. No-op for Extension. |

LlmMessage [EXISTS]

An LLM-bound message with optional turn tracking metadata. Wraps Message + Option<TurnId>.

| Field | Type | Status | Description |
| --- | --- | --- | --- |
| message | Message | [EXISTS] | The underlying conversation message. |
| turn_id | Option<TurnId> | [EXISTS] | Which turn produced this message. None for messages that predate turn tracking or are created outside the agent loop. |

Custom Serde (Flatten Pattern)

LlmMessage uses custom Serialize / Deserialize implementations to flatten into the same JSON shape as a bare Message with an optional turnId field injected. This maintains backward compatibility: old data without turnId deserializes as turn_id: None.

Why custom serde: #[serde(flatten)] does not work with serde's internally-tagged enums (#[serde(tag = "role")] on Message). Manual serialize/deserialize is the only way to achieve the flatten-into-Message pattern.

Constructors

| Method | Status | Description |
| --- | --- | --- |
| new(message) | [EXISTS] | Creates LlmMessage without turn tracking (turn_id: None). |
| with_turn(message, turn_id) | [EXISTS] | Creates LlmMessage with a specific TurnId. |

ExtensionMessage [EXISTS]

App-only message that never enters the LLM context window. Streamed as events for UI/app consumption.

| Field | Type | Status | Description |
| --- | --- | --- | --- |
| role | String | [EXISTS] | Always "extension". Acts as discriminator in untagged deserialization. Named role for consistency with Message but functions more like a type/category marker. |
| kind | String | [EXISTS] | Message category (e.g., "notification", "system", "debug"). App-specific. |
| data | serde_json::Value | [EXISTS] | Arbitrary JSON payload. Serialized from any impl Serialize via ExtensionMessage::new(). |

Usage [EXISTS]

Token metrics per turn or accumulated across loops/sessions.

| Field | Type | Status | Description |
| --- | --- | --- | --- |
| input | u64 | [EXISTS] | Input tokens consumed. |
| output | u64 | [EXISTS] | Output tokens generated. |
| reasoning | u64 | [EXISTS] | Reasoning/thinking tokens, a subset of output. Non-zero only for providers that report reasoning tokens separately (e.g., OpenAI o-series). Defaults to 0. |
| cache_read | u64 | [EXISTS] | Tokens served from prompt cache. |
| cache_write | u64 | [EXISTS] | Tokens written to prompt cache. |
| total_tokens | u64 | [EXISTS] | Total tokens (may differ from sum of above depending on provider reporting). |

Methods on Usage

| Method | Status | Description |
| --- | --- | --- |
| estimated_cost(&CostConfig) | [EXISTS] | Dollar cost calculation using per-million-token rates. reasoning tokens are already counted in output (no double-charge). |
| combine(&Usage) | [EXISTS] | Adds two Usage values (e.g., sum across parallel branches or multi-step loops). |
| cache_hit_rate() | [EXISTS] | Fraction of input tokens served from cache (0.0-1.0). Returns 0.0 if no input tokens processed. |
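
The arithmetic behind these methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's code: `combine` is assumed to return a new value, and `cache_hit_rate` is assumed to use `input + cache_read` as its denominator (the source only says "fraction of input tokens served from cache"). Reasoning tokens are deliberately left out of the cost sum because they are already part of `output`.

```rust
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Usage {
    pub input: u64,
    pub output: u64,
    pub reasoning: u64,
    pub cache_read: u64,
    pub cache_write: u64,
    pub total_tokens: u64,
}

#[derive(Debug, Clone, Copy, Default)]
pub struct CostConfig {
    pub input_per_million: f64,
    pub output_per_million: f64,
    pub cache_read_per_million: f64,
    pub cache_write_per_million: f64,
}

impl Usage {
    /// Field-wise sum (assumption: returns a new value).
    pub fn combine(&self, other: &Usage) -> Usage {
        Usage {
            input: self.input + other.input,
            output: self.output + other.output,
            reasoning: self.reasoning + other.reasoning,
            cache_read: self.cache_read + other.cache_read,
            cache_write: self.cache_write + other.cache_write,
            total_tokens: self.total_tokens + other.total_tokens,
        }
    }

    /// Assumption: hit rate = cache_read / (input + cache_read).
    pub fn cache_hit_rate(&self) -> f64 {
        let denom = self.input + self.cache_read;
        if denom == 0 {
            0.0
        } else {
            self.cache_read as f64 / denom as f64
        }
    }

    /// Per-million-token pricing; `reasoning` is not added again because
    /// it is a subset of `output`.
    pub fn estimated_cost(&self, cost: &CostConfig) -> f64 {
        (self.input as f64 * cost.input_per_million
            + self.output as f64 * cost.output_per_million
            + self.cache_read as f64 * cost.cache_read_per_million
            + self.cache_write as f64 * cost.cache_write_per_million)
            / 1_000_000.0
    }
}
```

With rates of 3.0 and 15.0 per million for input and output, one million tokens of each would cost 18.0 under this sketch.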

Where Usage Appears

| Location | Status | Description |
| --- | --- | --- |
| Message::Assistant.usage | [EXISTS] | Per-turn usage on the assistant message itself. |
| AgentEvent::TurnEnd.usage | [EXISTS] | Direct per-turn access without destructuring the message. |
| AgentEvent::AgentEnd.usage | [EXISTS] | Accumulated across all turns in a loop. |
| LoopRecord.usage | [EXISTS] | Captured from AgentEnd.usage. |
| Session.total_usage() | [EXISTS] | Summed across all loops. |

Code Reference

| File | What it contains |
| --- | --- |
| src/types/content.rs | Content enum (Text, Image, Thinking, ToolCall), Message enum (User, Assistant, ToolResult), StopReason enum, now_ms() helper. |
| src/types/agent_message.rs | TurnId struct, LlmMessage struct (with custom serde), AgentMessage enum, From<Message> impl. |
| src/types/extension.rs | ExtensionMessage struct. |
| src/types/usage.rs | Usage struct, CacheConfig struct, CacheStrategy enum, ThinkingLevel enum. |

Conceptual Notes

  • LlmMessage serde is a critical compatibility mechanism. Any future fields added to LlmMessage must maintain the flatten-into-Message JSON pattern. Do not use #[serde(flatten)] with Message.
  • ExtensionMessage naming: The role field is named for consistency with Message but functions as a type discriminator. A more accurate name would be type or category, but role enables consistent untagged serde deserialization across the AgentMessage enum.
  • StopReason includes several forward-looking variants (MaxTurns, UserStop, Handoff, GuardRail, ContextCompacted, Paused) adopted from other agentic frameworks. These exist as enum variants but may not yet be emitted by all code paths.
  • Usage.reasoning is a subset of output, not an additional charge. It is non-zero only for providers that report reasoning tokens separately (e.g., OpenAI o-series models).

Tool System

The tool system defines how agents interact with the external world. Every capability an agent has -- running shell commands, reading files, calling APIs, delegating to sub-agents -- is expressed as a tool implementing the AgentTool trait. The agent loop discovers tools by name from a registry, executes them with lifecycle events, and feeds results back to the LLM.

Concept Overview

Tool [EXISTS]
├── AgentTool trait [EXISTS] — name, label, description, parameters_schema, execute
├── ToolContext [EXISTS] — tool_call_id, tool_name, cancel, on_update, on_progress
├── ToolResult [EXISTS] — content, details, child_loop_id
├── ToolError [EXISTS] — Failed/NotFound/InvalidArgs/Cancelled
├── ToolExecutionStrategy [EXISTS] — Sequential/Parallel/Batched
├── SubAgentTool [EXISTS] — spawns child agent loop
├── Sources: Built-in [EXISTS] / OpenAPI [EXISTS] / MCP [EXISTS]
└── Callbacks: before/after_tool_execution, before/after_tool_execution_update [EXISTS]

AgentTool Trait [EXISTS]

The core extension point. Implement this trait to create custom tools.

| Method | Signature | Status | Description |
| --- | --- | --- | --- |
| name() | -> &str | [EXISTS] | Unique identifier used in LLM tool_use (e.g. "bash") |
| label() | -> &str | [EXISTS] | Human-readable label for UI display |
| description() | -> &str | [EXISTS] | Description sent to the LLM so it knows when/how to use the tool |
| parameters_schema() | -> serde_json::Value | [EXISTS] | JSON Schema for parameters; LLM uses this to format arguments |
| execute() | (params, ctx) -> Result<ToolResult, ToolError> | [EXISTS] | Execute the tool with LLM-chosen arguments and system-injected context |

Design: params (LLM input) and ctx (system environment) are deliberately separate parameters. params varies per call; ctx provides cancellation, streaming callbacks, and correlation IDs that are the same shape for every tool.
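
To make the params/ctx split concrete, here is a standard-library-only sketch of a trait with the same five methods and a toy implementation. Several things are assumptions made so the example stays self-contained: `execute` is synchronous, JSON values are plain strings, an `AtomicBool` stands in for `CancellationToken`, and `EchoTool` is a hypothetical tool, not one of the built-ins.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// System-provided context; same shape for every tool.
pub struct ToolContext {
    pub tool_call_id: String,
    pub tool_name: String,
    pub cancel: Arc<AtomicBool>, // stand-in for CancellationToken
}

#[derive(Debug, PartialEq)]
pub struct ToolResult {
    pub content: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum ToolError {
    InvalidArgs(String),
    Cancelled,
}

/// Simplified mirror of the trait described above (sync, string params).
pub trait AgentTool {
    fn name(&self) -> &str;
    fn label(&self) -> &str;
    fn description(&self) -> &str;
    fn parameters_schema(&self) -> String;
    fn execute(&self, params: &str, ctx: &ToolContext) -> Result<ToolResult, ToolError>;
}

/// Hypothetical tool that returns its input unchanged.
pub struct EchoTool;

impl AgentTool for EchoTool {
    fn name(&self) -> &str { "echo" }
    fn label(&self) -> &str { "Echo" }
    fn description(&self) -> &str { "Returns the input text unchanged." }
    fn parameters_schema(&self) -> String {
        r#"{"type":"object","properties":{"text":{"type":"string"}}}"#.to_string()
    }
    fn execute(&self, params: &str, ctx: &ToolContext) -> Result<ToolResult, ToolError> {
        // ctx carries the environment: honour cancellation first.
        if ctx.cancel.load(Ordering::SeqCst) {
            return Err(ToolError::Cancelled);
        }
        // params carries the LLM-chosen input: validate it.
        if params.trim().is_empty() {
            return Err(ToolError::InvalidArgs("empty input".to_string()));
        }
        Ok(ToolResult { content: vec![params.to_string()] })
    }
}
```

The tool body reads cancellation from `ctx` and data from `params`, so the same `ToolContext` type can serve every tool regardless of its argument schema.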


ToolContext [EXISTS]

Per-invocation context passed to execute(). Using a struct (rather than individual parameters) future-proofs the trait -- adding fields is non-breaking.

| Field | Type | Status | Description |
| --- | --- | --- | --- |
| tool_call_id | String | [EXISTS] | Unique ID for this invocation; correlates Start/Update/End events |
| tool_name | String | [EXISTS] | Name of the tool being invoked |
| cancel | CancellationToken | [EXISTS] | Check is_cancelled() in long-running tools; child token of the parent loop's token |
| on_update | Option<ToolUpdateFn> | [EXISTS] | Callback for streaming partial ToolResults (UI/logging only; not sent to LLM) |
| on_progress | Option<ProgressFn> | [EXISTS] | Callback for user-facing progress text (emits ProgressMessage events) |

Callback wiring: The agent loop creates on_update and on_progress closures that capture a cloned tx channel sender. When a tool calls on_update(partial), the closure pushes an AgentEvent::ToolExecutionUpdate into the channel. The tool never touches the event system directly.
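
The wiring can be sketched with `std::sync::mpsc` standing in for the Tokio channel. The types here are simplified assumptions (a one-variant `AgentEvent`, a boxed `Fn(String)` for `ToolUpdateFn`), and `make_on_update` and `run_tool` are hypothetical names used only for this illustration.

```rust
use std::sync::mpsc::Sender;

#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    ToolExecutionUpdate { tool_call_id: String, partial_result: String },
}

/// Assumption: the real alias may differ; a boxed closure is enough here.
pub type ToolUpdateFn = Box<dyn Fn(String) + Send>;

/// Loop side: build a closure that owns a cloned sender and the call id.
pub fn make_on_update(tx: &Sender<AgentEvent>, tool_call_id: &str) -> ToolUpdateFn {
    let tx = tx.clone();
    let tool_call_id = tool_call_id.to_string();
    Box::new(move |partial| {
        // A dropped receiver is ignored; updates are best-effort.
        let _ = tx.send(AgentEvent::ToolExecutionUpdate {
            tool_call_id: tool_call_id.clone(),
            partial_result: partial,
        });
    })
}

/// Tool side: only the callback is visible, never the channel.
pub fn run_tool(on_update: Option<&ToolUpdateFn>) -> String {
    for step in 1..=2 {
        if let Some(cb) = on_update {
            cb(format!("step {step}"));
        }
    }
    "done".to_string()
}
```

Because the closure captures the sender, a tool can be tested with `None` or with a recording closure and never needs to know how events are transported.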


ToolResult [EXISTS]

What a tool hands back to the runtime after execution.

| Field | Type | Status | Description |
| --- | --- | --- | --- |
| content | Vec<Content> | [EXISTS] | Tool output (text, images, etc.) |
| details | serde_json::Value | [EXISTS] | Freeform metadata (not sent to LLM) |
| child_loop_id | Option<String> | [EXISTS] | Set by SubAgentTool to the child loop's ID; None for regular tools |

Note: The runtime transforms struct ToolResult into Message::ToolResult by enriching it with correlation metadata (tool_call_id, tool_name, is_error, timestamp) before it enters the LLM conversation.


ToolError [EXISTS]

Error taxonomy for tool execution failures. Errors are converted to ToolResult with is_error=true so the LLM sees the failure and can self-correct.

| Variant | Display | Status |
| --- | --- | --- |
| Failed(String) | "{message}" | [EXISTS] |
| NotFound(String) | "Tool not found: {name}" | [EXISTS] |
| InvalidArgs(String) | "Invalid arguments: {message}" | [EXISTS] |
| Cancelled | "Cancelled" | [EXISTS] |
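
A minimal sketch of the error-to-result conversion, using the Display strings from the table. The `ToolResultMessage` struct and the `to_message` function are hypothetical simplifications (content is a list of strings and the timestamp is omitted); they are not the crate's types.

```rust
use std::fmt;

#[derive(Debug)]
pub enum ToolError {
    Failed(String),
    NotFound(String),
    InvalidArgs(String),
    Cancelled,
}

impl fmt::Display for ToolError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToolError::Failed(m) => write!(f, "{m}"),
            ToolError::NotFound(n) => write!(f, "Tool not found: {n}"),
            ToolError::InvalidArgs(m) => write!(f, "Invalid arguments: {m}"),
            ToolError::Cancelled => write!(f, "Cancelled"),
        }
    }
}

pub struct ToolResult {
    pub content: Vec<String>,
}

/// Simplified stand-in for Message::ToolResult (timestamp omitted).
#[derive(Debug, PartialEq)]
pub struct ToolResultMessage {
    pub tool_call_id: String,
    pub tool_name: String,
    pub content: Vec<String>,
    pub is_error: bool,
}

/// Enrich either outcome with correlation metadata; errors become
/// visible text with is_error = true so the model can react.
pub fn to_message(
    tool_call_id: &str,
    tool_name: &str,
    outcome: Result<ToolResult, ToolError>,
) -> ToolResultMessage {
    let (content, is_error) = match outcome {
        Ok(result) => (result.content, false),
        Err(err) => (vec![err.to_string()], true),
    };
    ToolResultMessage {
        tool_call_id: tool_call_id.to_string(),
        tool_name: tool_name.to_string(),
        content,
        is_error,
    }
}
```

Both branches produce the same message shape, which is what lets a failed call stay in the conversation instead of aborting the loop.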

ToolExecutionStrategy [EXISTS]

Controls how multiple tool calls from a single LLM response are executed. Set at agent construction time (not a per-turn LLM decision).

| Variant | Status | Behavior |
| --- | --- | --- |
| Sequential | [EXISTS] | One at a time; checks steering between each. Use for tools with shared mutable state |
| Parallel (default) | [EXISTS] | All concurrent via futures::future::join_all; checks steering after all complete. Best latency for independent tools |
| Batched { size } | [EXISTS] | N tools in parallel per batch; checks steering between batches. Balances speed with human-in-the-loop control |

Steering: The human-in-the-loop interrupt mechanism. Between tool executions (or batches), the loop checks whether the human has sent a new instruction, cancellation, or correction.
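
One way to picture the three strategies is as a grouping of tool-call indices, with a steering check after each group. The function below is a hypothetical planning helper written for this illustration; the actual loop runs each group concurrently rather than returning a plan.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ToolExecutionStrategy {
    Sequential,
    Parallel,
    Batched { size: usize },
}

/// Split `call_count` tool calls into groups. Calls inside a group would
/// run concurrently; steering would be checked after each group.
pub fn plan_batches(strategy: ToolExecutionStrategy, call_count: usize) -> Vec<Vec<usize>> {
    let indices: Vec<usize> = (0..call_count).collect();
    let group_size = match strategy {
        ToolExecutionStrategy::Sequential => 1,
        ToolExecutionStrategy::Parallel => call_count.max(1),
        // Guard against a zero batch size, which `chunks` would reject.
        ToolExecutionStrategy::Batched { size } => size.max(1),
    };
    indices.chunks(group_size).map(|c| c.to_vec()).collect()
}
```

Five calls under `Batched { size: 2 }` yield three groups, so steering is consulted after the second, the fourth, and the fifth call, whereas `Parallel` yields a single group.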


SubAgentTool [EXISTS]

A tool that delegates work to a child agent loop. When the parent LLM calls it, a fresh agent_loop() runs with its own system prompt, tools, and provider. The child loop's final text output is returned as the tool result.

| Attribute | Status | Description |
| --- | --- | --- |
| tool_name | [EXISTS] | Unique name for the sub-agent tool |
| tool_description | [EXISTS] | Description for the parent LLM |
| system_prompt | [EXISTS] | Child agent's system prompt |
| model_config | [EXISTS] | Child agent's model configuration |
| provider_override | [EXISTS] | Optional custom provider (testing) |
| tools | [EXISTS] | Tools available to the child agent |
| thinking_level | [EXISTS] | Thinking level for the child loop |

Design constraints: Sub-agents are NOT given other SubAgentTools (static depth prevention). Cancellation propagates from parent to child. Events stream back to the parent via on_update.


Built-in Tools [EXISTS]

Six tools returned by default_tools():

| Tool | File | Status | Description |
| --- | --- | --- | --- |
| BashTool | tools/bash.rs | [EXISTS] | Run shell commands |
| ReadFileTool | tools/file.rs | [EXISTS] | Read file contents |
| WriteFileTool | tools/file.rs | [EXISTS] | Write or overwrite a file |
| EditFileTool | tools/edit.rs | [EXISTS] | Precise text replacement within a file |
| ListFilesTool | tools/list.rs | [EXISTS] | List directory contents |
| SearchTool | tools/search.rs | [EXISTS] | Grep / content search across files |

OpenAPI Tools [EXISTS]

OpenApiToolAdapter parses an OpenAPI 3.0 spec and creates one AgentTool per operation. Each adapter makes HTTP requests to the API endpoint when executed. Feature-gated behind the openapi Cargo feature.

Factory methods: from_str, from_file, from_url, from_spec.


MCP Tools [EXISTS]

McpToolAdapter bridges MCP server tools to the AgentTool trait using the Adapter pattern. All adapters for the same server share one McpClient (via Arc<Mutex<McpClient>>). Name collision prevention uses an optional prefix namespace (e.g. "filesystem__read_file").
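
The prefix namespace amounts to a small string rule. The helper below is a hypothetical illustration that assumes a double-underscore separator, inferred from the "filesystem__read_file" example; the adapter's actual naming code may differ.

```rust
/// Hypothetical helper: apply an optional server prefix to a tool name.
/// Assumption: the separator is "__", as in "filesystem__read_file".
pub fn namespaced_tool_name(prefix: Option<&str>, tool_name: &str) -> String {
    match prefix {
        Some(p) if !p.is_empty() => format!("{p}__{tool_name}"),
        _ => tool_name.to_string(),
    }
}
```

Two servers that both expose `read_file` would then register under distinct names as long as their prefixes differ.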


Tool Callbacks [EXISTS]

Lifecycle hooks that fire around tool execution. All are Option<Arc<dyn Fn(...)>> on AgentLoopConfig.

| Hook | Signature | Status | Fires When |
| --- | --- | --- | --- |
| before_tool_execution | (tool_name, tool_call_id, args) -> bool | [EXISTS] | Before ToolExecutionStart; return false to skip the call |
| after_tool_execution | (tool_name, tool_call_id, is_error) | [EXISTS] | After ToolExecutionEnd |
| before_tool_execution_update | (tool_name, tool_call_id, text) -> bool | [EXISTS] | Before each ToolExecutionUpdate; return false to suppress the event |
| after_tool_execution_update | (tool_name, tool_call_id, text) | [EXISTS] | After each ToolExecutionUpdate (only if not suppressed) |

Hook ordering: Each before_* hook fires strictly before its paired event is emitted, and each after_* hook fires after it. When before_tool_execution returns false, no ToolExecutionStart/End events are emitted; a synthetic error ToolResult is sent to the LLM so it knows the call was skipped.
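
The skip path can be sketched as a small dispatch function. Everything here is a simplified assumption for illustration: arguments are strings, events are pushed to a `Vec` instead of a channel, the tool body is a placeholder, and the wording of the synthetic error text is invented for the example.

```rust
#[derive(Debug, PartialEq)]
pub enum Event {
    ToolExecutionStart(String),
    ToolExecutionEnd(String),
}

#[derive(Debug, PartialEq)]
pub struct ToolResultMessage {
    pub tool_call_id: String,
    pub content: String,
    pub is_error: bool,
}

/// Run one tool call, consulting the optional before-hook first.
pub fn dispatch(
    before: Option<&dyn Fn(&str, &str, &str) -> bool>,
    tool_name: &str,
    tool_call_id: &str,
    args: &str,
    events: &mut Vec<Event>,
) -> ToolResultMessage {
    if let Some(hook) = before {
        if !hook(tool_name, tool_call_id, args) {
            // Skipped: no Start/End events, but the model still gets a
            // result so the tool call is not left unanswered.
            return ToolResultMessage {
                tool_call_id: tool_call_id.to_string(),
                content: format!("Tool call '{tool_name}' was skipped"),
                is_error: true,
            };
        }
    }
    events.push(Event::ToolExecutionStart(tool_call_id.to_string()));
    let output = format!("ran {tool_name}"); // placeholder for execute()
    events.push(Event::ToolExecutionEnd(tool_call_id.to_string()));
    ToolResultMessage {
        tool_call_id: tool_call_id.to_string(),
        content: output,
        is_error: false,
    }
}
```

Returning a result on the skip path matters because most provider APIs expect every tool call in an assistant message to be answered.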


Code Reference

| Concept | File |
| --- | --- |
| AgentTool trait, ToolContext, ToolResult, ToolError, ToolExecutionStrategy | src/types/tool.rs |
| ToolUpdateFn, ProgressFn type aliases | src/types/tool.rs |
| Tool dispatch, execute_tool_calls, execute_single_tool, skip_tool_call | src/agent_loop/tools.rs |
| SubAgentTool | src/agents/sub_agent.rs |
| Built-in tools (BashTool, ReadFileTool, etc.) | src/tools/ |
| OpenApiToolAdapter | src/openapi/adapter.rs |
| McpToolAdapter | src/mcp/tool_adapter.rs |
| Tool callback type aliases (BeforeToolExecutionFn, etc.) | src/agent_loop/config.rs |
| ToolDefinition (schema sent to LLM, not executable) | src/provider/traits.rs |

Conceptual Notes

  • Tool Permission System [CONCEPTUAL] -- The plan includes an Agent-level Permissions tab with include/exclude rules for allowed/denied actions. This would gate tool execution at a higher level than the before_tool_execution hook.
  • Tool Result Streaming to LLM [CONCEPTUAL] -- Currently on_update partial results are UI-only. A future design could allow streaming tool results to the LLM mid-execution for real-time reasoning.
  • ToolDefinition vs AgentTool Split -- ToolDefinition (in provider/traits.rs) is the schema half sent to the LLM; AgentTool (in types/tool.rs) is the executable half. The agent loop bridges them: converts AgentTool to ToolDefinition before streaming, then matches ToolCall content back to AgentTool by name for execution.

Provider System

The provider system abstracts all LLM backends behind a single StreamProvider trait. The caller constructs a ModelConfig (the model's "identity card"), and the ProviderRegistry dispatches to the correct concrete provider at runtime. This design allows seamless switching between Anthropic, OpenAI, Google, Bedrock, Azure, and 15+ OpenAI-compatible providers without changing application code.

Concept Overview

Provider [EXISTS]
├── ModelConfig [EXISTS] — id, name, api, provider, base_url, api_key, cost
├── ApiProtocol [EXISTS] — 7 variants (Anthropic, OpenAI, Google, Bedrock, Azure, etc.)
├── CostConfig [EXISTS] — per-million rates
├── StreamProvider trait [EXISTS] — stream() method
├── ProviderRegistry [EXISTS] — dispatch by ApiProtocol
├── OpenAiCompat [EXISTS] — quirk flags for 15+ providers
└── ContextTranslationStrategy [EXISTS] — cross-provider content translation (G8, src/provider/context_translation.rs)

ModelConfig [EXISTS]

The single source of truth for a model's identity. Bundles everything a provider needs to make API calls.

| Field | Type | Status | Description |
| --- | --- | --- | --- |
| id | String | [EXISTS] | Model identifier sent to the API (e.g. "gpt-4o", "claude-sonnet-4-20250514") |
| name | String | [EXISTS] | Human-friendly display name (logging/UI; not sent to API) |
| api | ApiProtocol | [EXISTS] | Which wire protocol to use (dispatch key for ProviderRegistry) |
| provider | String | [EXISTS] | Provider name for logging (e.g. "openai", "anthropic") |
| base_url | String | [EXISTS] | Base URL for API requests (supports private deployments, proxies) |
| api_key | String | [EXISTS] | Authentication credential; defaults to empty string so configs can omit it |
| reasoning | bool | [EXISTS] | Whether this model supports extended thinking/reasoning |
| context_window | u32 | [EXISTS] | Max input tokens (used for compaction decisions) |
| max_tokens | u32 | [EXISTS] | Default max output tokens |
| cost | CostConfig | [EXISTS] | Token pricing for cost tracking (defaults to zero) |
| headers | HashMap<String, String> | [EXISTS] | Additional HTTP headers (e.g. API-version headers) |
| compat | Option<OpenAiCompat> | [EXISTS] | OpenAI quirk flags; None for non-OpenAI providers |

ApiProtocol [EXISTS]

The dispatch key that maps a model to its concrete StreamProvider implementation. Seven variants covering all supported backends.

| Variant | Provider File | Status | Covers |
| --- | --- | --- | --- |
| AnthropicMessages | anthropic.rs | [EXISTS] | Claude models |
| OpenAiCompletions | openai_compat.rs | [EXISTS] | OpenAI, Groq, Together, DeepSeek, Fireworks, Mistral, xAI, OpenRouter, etc. (15+) |
| OpenAiResponses | openai_responses.rs | [EXISTS] | OpenAI Responses API |
| AzureOpenAiResponses | azure_openai.rs | [EXISTS] | Azure OpenAI |
| GoogleGenerativeAi | google.rs | [EXISTS] | Gemini (Google AI Studio) |
| GoogleVertex | google_vertex.rs | [EXISTS] | Vertex AI |
| BedrockConverseStream | bedrock.rs | [EXISTS] | Amazon Bedrock (ConverseStream) |

CostConfig [EXISTS]

Token pricing per million tokens. Embedded in ModelConfig with #[serde(default)] fields, so callers who don't need cost tracking can omit it.

| Field | Type | Status | Description |
| --- | --- | --- | --- |
| input_per_million | f64 | [EXISTS] | Cost per million input tokens |
| output_per_million | f64 | [EXISTS] | Cost per million output tokens |
| cache_read_per_million | f64 | [EXISTS] | Cost per million cache-read tokens (default: 0.0) |
| cache_write_per_million | f64 | [EXISTS] | Cost per million cache-write tokens (default: 0.0) |

StreamProvider Trait [EXISTS]

The core abstraction every LLM backend implements. The rest of the codebase interacts only with &dyn StreamProvider -- it never knows which concrete backend is used at runtime.

| Method | Signature | Status | Description |
| --- | --- | --- | --- |
| provider_id() | -> &str | [EXISTS] | Short stable identifier (e.g. "anthropic"); used in loop_id construction |
| stream() | (config, tx, cancel) -> Result<Message, ProviderError> | [EXISTS] | Stream a completion; sends StreamEvents through tx in real time; returns final assembled Message |

Dual-output contract: The tx channel carries partial deltas for real-time UI updates. The return value carries the complete message after the stream ends. The loop cannot read its own output from the channel -- the return value is the protocol, the channel is the live feed.


ProviderRegistry [EXISTS]

Maps ApiProtocol to StreamProvider implementations. Factory + router.

| Method | Status | Description |
| --- | --- | --- |
| new() | [EXISTS] | Empty registry (no providers) |
| default() | [EXISTS] | All 7 built-in providers registered |
| register(protocol, provider) | [EXISTS] | Register a provider for a protocol (overwrites if exists) |
| get(protocol) | [EXISTS] | Look up provider by protocol |
| has(protocol) | [EXISTS] | Check if a provider is registered |
| protocols() | [EXISTS] | List all registered protocols |
| stream(model, config, tx, cancel) | [EXISTS] | Dispatch: looks up provider by model.api, delegates to provider.stream() |

Design: model (routing key) is separate from config (request payload). The registry routes on model.api, then passes config through unchanged.
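
The routing and the dual-output contract can be sketched together with standard-library types. This is an illustration under assumptions, not the crate's API: `stream` is synchronous, takes a plain prompt string in place of the request config, omits cancellation, and reports errors as strings. `EchoProvider` is a hypothetical provider that plays the role of a mock.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ApiProtocol {
    AnthropicMessages,
    OpenAiCompletions,
}

pub struct ModelConfig {
    pub id: String,
    pub api: ApiProtocol, // routing key
}

pub trait StreamProvider {
    fn provider_id(&self) -> &str;
    /// Sends deltas through `tx` and returns the assembled text.
    fn stream(&self, prompt: &str, tx: &Sender<String>) -> Result<String, String>;
}

pub struct ProviderRegistry {
    providers: HashMap<ApiProtocol, Arc<dyn StreamProvider>>,
}

impl ProviderRegistry {
    /// Empty registry, matching `new()` in the table above.
    pub fn new() -> Self {
        ProviderRegistry { providers: HashMap::new() }
    }
    pub fn register(&mut self, protocol: ApiProtocol, provider: Arc<dyn StreamProvider>) {
        self.providers.insert(protocol, provider); // overwrites if present
    }
    pub fn has(&self, protocol: ApiProtocol) -> bool {
        self.providers.contains_key(&protocol)
    }
    /// Route on `model.api`; pass the request through unchanged.
    pub fn stream(&self, model: &ModelConfig, prompt: &str, tx: &Sender<String>) -> Result<String, String> {
        let provider = self
            .providers
            .get(&model.api)
            .ok_or_else(|| format!("no provider registered for {:?}", model.api))?;
        provider.stream(prompt, tx)
    }
}

/// Hypothetical provider: streams each word as a delta.
pub struct EchoProvider;

impl StreamProvider for EchoProvider {
    fn provider_id(&self) -> &str { "echo" }
    fn stream(&self, prompt: &str, tx: &Sender<String>) -> Result<String, String> {
        let mut words = Vec::new();
        for word in prompt.split_whitespace() {
            let _ = tx.send(word.to_string()); // live feed for the UI
            words.push(word);
        }
        Ok(words.join(" ")) // the return value is what the loop consumes
    }
}
```

The caller gets the complete text from the return value and never has to drain its own channel, which is the separation the dual-output contract describes.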


OpenAiCompat Quirk Flags [EXISTS]

The "quirk matrix" for 15+ OpenAI-compatible providers. One openai_compat.rs provider reads these flags at runtime and branches accordingly, instead of maintaining separate provider files per quirk combination.

| Flag | Type | Status | Description |
| --- | --- | --- | --- |
| supports_store | bool | [EXISTS] | Supports the store parameter for conversation persistence |
| supports_developer_role | bool | [EXISTS] | Supports developer role (system-level instructions) |
| supports_reasoning_effort | bool | [EXISTS] | Supports reasoning_effort parameter |
| supports_usage_in_streaming | bool | [EXISTS] | Includes usage data in streaming responses (default: true) |
| max_tokens_field | MaxTokensField | [EXISTS] | Which field name to use: MaxTokens or MaxCompletionTokens |
| requires_tool_result_name | bool | [EXISTS] | Tool results must include a name field |
| requires_assistant_after_tool_result | bool | [EXISTS] | Must insert assistant message after tool results |
| thinking_format | ThinkingFormat | [EXISTS] | How thinking/reasoning content is formatted: OpenAi, Xai, Qwen, OpenRouter |

Factory methods for provider-specific flag combinations:

| Method | Status | Notes |
| --- | --- | --- |
| OpenAiCompat::openai() | [EXISTS] | store, developer role, reasoning effort, MaxCompletionTokens |
| OpenAiCompat::xai() | [EXISTS] | Grok thinking format |
| OpenAiCompat::groq() | [EXISTS] | Default with streaming usage |
| OpenAiCompat::cerebras() | [EXISTS] | Pure default (no deviations) |
| OpenAiCompat::openrouter() | [EXISTS] | Developer role, OpenRouter thinking format |
| OpenAiCompat::mistral() | [EXISTS] | MaxTokens field |
| OpenAiCompat::deepseek() | [EXISTS] | MaxCompletionTokens |
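
A sketch of how one provider might branch on such flags at request-building time. Only two of the flags are modelled, and `request_fields` is a hypothetical helper that returns key/value pairs instead of a JSON body. The `openai()` constructor follows the notes in the table above, and the field and role names (`max_tokens`, `max_completion_tokens`, `developer`, `system`) are the ones used by OpenAI-style chat APIs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MaxTokensField {
    MaxTokens,
    MaxCompletionTokens,
}

/// Reduced stand-in for the quirk-flag struct (two flags only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct OpenAiCompat {
    pub supports_developer_role: bool,
    pub max_tokens_field: MaxTokensField,
}

impl OpenAiCompat {
    /// Per the table: developer role and MaxCompletionTokens.
    pub fn openai() -> Self {
        OpenAiCompat {
            supports_developer_role: true,
            max_tokens_field: MaxTokensField::MaxCompletionTokens,
        }
    }
}

/// Hypothetical helper: pick role and limit-field names from the flags.
pub fn request_fields(compat: &OpenAiCompat, max_tokens: u32) -> Vec<(String, String)> {
    let system_role = if compat.supports_developer_role { "developer" } else { "system" };
    let limit_key = match compat.max_tokens_field {
        MaxTokensField::MaxTokens => "max_tokens",
        MaxTokensField::MaxCompletionTokens => "max_completion_tokens",
    };
    vec![
        ("system_role".to_string(), system_role.to_string()),
        (limit_key.to_string(), max_tokens.to_string()),
    ]
}
```

Adding support for another OpenAI-compatible backend then means choosing flag values, not writing a new provider file.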

Factory Methods on ModelConfig [EXISTS]

Convenience constructors for common providers.

| Method | Status | Protocol | Default context_window |
| --- | --- | --- | --- |
| ModelConfig::anthropic(id, name, api_key) | [EXISTS] | AnthropicMessages | 200,000 |
| ModelConfig::openai(id, name, api_key) | [EXISTS] | OpenAiCompletions | 128,000 |
| ModelConfig::google(id, name, api_key) | [EXISTS] | GoogleGenerativeAi | 1,000,000 |
| ModelConfig::local(base_url, model_id, api_key) | [EXISTS] | OpenAiCompletions | 128,000 |
| ModelConfig::openrouter(model_id, api_key) | [EXISTS] | OpenAiCompletions | 200,000 |

ProviderError [EXISTS]

Error taxonomy for provider failures. The agent loop uses this for retry/recovery decisions.

| Variant | Status | Retryable | Description |
| --- | --- | --- | --- |
| Api(String) | [EXISTS] | No | Non-transient API error (bad request, server error) |
| Network(String) | [EXISTS] | Yes | Transport failure (connection refused, timeout, TLS) |
| Auth(String) | [EXISTS] | No | 401/403 -- bad or missing API key |
| RateLimited { retry_after_ms } | [EXISTS] | Yes | 429 -- too many requests |
| ContextOverflow { message } | [EXISTS] | No (compact) | Input exceeds context window; caller should compact and retry |
| Cancelled | [EXISTS] | No | CancellationToken triggered |
| Other(String) | [EXISTS] | No | Catch-all |

Context overflow detection: Centralized in OVERFLOW_PHRASES covering 15+ provider-specific error strings. Both HTTP errors and SSE-embedded errors are classified.
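
The Retryable column can be read as a decision function. The sketch below encodes it directly; the `Recovery` enum and `recovery_for` are hypothetical names for this illustration, and `retry_after_ms` is assumed to be an `Option<u64>` (the source does not state its type).

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    Api(String),
    Network(String),
    Auth(String),
    RateLimited { retry_after_ms: Option<u64> }, // assumption: optional hint
    ContextOverflow { message: String },
    Cancelled,
    Other(String),
}

/// Hypothetical recovery decision derived from the table above.
#[derive(Debug, PartialEq)]
pub enum Recovery {
    RetryAfterMs(u64),
    CompactAndRetry,
    GiveUp,
}

pub fn recovery_for(err: &ProviderError, default_backoff_ms: u64) -> Recovery {
    match err {
        // Transient transport failures: retry with the default backoff.
        ProviderError::Network(_) => Recovery::RetryAfterMs(default_backoff_ms),
        // Rate limits: prefer the server's hint when one was given.
        ProviderError::RateLimited { retry_after_ms } => {
            Recovery::RetryAfterMs(retry_after_ms.unwrap_or(default_backoff_ms))
        }
        // Not retryable as-is: shrink the context first.
        ProviderError::ContextOverflow { .. } => Recovery::CompactAndRetry,
        ProviderError::Api(_)
        | ProviderError::Auth(_)
        | ProviderError::Cancelled
        | ProviderError::Other(_) => Recovery::GiveUp,
    }
}
```

Keeping context overflow as its own outcome reflects the "No (compact)" entry: the same request would fail again, but a compacted one may succeed.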


Code Reference

| Concept | File |
| --- | --- |
| ModelConfig, ApiProtocol, CostConfig, OpenAiCompat, MaxTokensField, ThinkingFormat | src/provider/model.rs |
| StreamProvider trait, StreamConfig, StreamEvent, ToolDefinition, ProviderError | src/provider/traits.rs |
| ProviderRegistry | src/provider/registry.rs |
| AnthropicProvider | src/provider/anthropic.rs |
| OpenAiCompatProvider | src/provider/openai_compat.rs |
| OpenAiResponsesProvider | src/provider/openai_responses.rs |
| AzureOpenAiProvider | src/provider/azure_openai.rs |
| GoogleProvider | src/provider/google.rs |
| GoogleVertexProvider | src/provider/google_vertex.rs |
| BedrockProvider | src/provider/bedrock.rs |
| RetryConfig | src/provider/retry.rs |
| MockProvider (testing) | src/provider/mock.rs |

Conceptual Notes

  • ContextTranslationStrategy [EXISTS] -- Trait in src/provider/context_translation.rs (G8). DefaultContextTranslation handles cross-provider content translation: Anthropic keeps Thinking blocks, OpenAI converts to Text with [Reasoning] prefix, Google/Bedrock drops Thinking. Set on AgentLoopConfig.context_translation.
  • Model fallback chain -- Model resolution follows: Loop (AgentLoopConfig.model_config) -> Session model override [EXISTS] (Session.model_config: Option<ModelConfig>) -> Agent default (BasicAgent.model_config).
  • provider_override -- AgentLoopConfig.provider_override: Option<Arc<dyn StreamProvider>> bypasses ProviderRegistry dispatch entirely. Used for testing with MockProvider or injecting custom provider implementations.
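
The cross-provider translation rule from the first note can be sketched as a pure function over content blocks. The types are reduced stand-ins, `Target` and `translate` are hypothetical names, and the exact prefix format ("[Reasoning] " with a trailing space) is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Content {
    Text(String),
    Thinking(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Target {
    Anthropic,
    OpenAi,
    Google,
    Bedrock,
}

/// Translate history content for the provider that will receive it.
pub fn translate(blocks: &[Content], target: Target) -> Vec<Content> {
    blocks
        .iter()
        .filter_map(|b| match (b, target) {
            // Anthropic keeps Thinking blocks as they are.
            (Content::Thinking(t), Target::Anthropic) => Some(Content::Thinking(t.clone())),
            // OpenAI receives them as prefixed text.
            (Content::Thinking(t), Target::OpenAi) => {
                Some(Content::Text(format!("[Reasoning] {t}")))
            }
            // Google and Bedrock drop them.
            (Content::Thinking(_), Target::Google | Target::Bedrock) => None,
            // All other content passes through unchanged.
            (other, _) => Some(other.clone()),
        })
        .collect()
}
```

A history recorded against one provider can then be replayed against another without sending block types the target would not accept.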

Event Lifecycle

AgentEvent is the runtime's event vocabulary -- it captures every significant happening in the agent loop that a UI, logger, or analysis consumer might react to. Events are emitted through an mpsc::UnboundedSender<AgentEvent> channel during execution and consumed by SessionRecorder (or any external subscriber) on the receiving end.

Concept Overview

Event [EXISTS]
├── AgentEvent [EXISTS] — 16 variants
│   ├── Session: AgentStart/End [EXISTS]
│   ├── Loop: ParallelLoopStart/End, CompactionStarted/Ended [EXISTS]
│   ├── Turn: TurnStart/End [EXISTS]
│   ├── Message: MessageStart/Update/End [EXISTS]
│   ├── Tool: ToolExecutionStart/Update/End, ProgressMessage [EXISTS]
│   └── Input: InputRejected [EXISTS]
├── StreamDelta [EXISTS] — Text/Thinking/ToolCallDelta
├── ContinuationKind [EXISTS] — Initial/Default/Rerun/Branch/Compaction
└── TurnTrigger [EXISTS] — User/SubAgent/Continuation/Branch

AgentEvent [EXISTS]

16 variants grouped by scope. Each variant carries a loop_id for correlation, except ParallelLoopStart/End, which carry session_id instead.

Session-scoped Events

| Variant | Status | Fields | Description |
| --- | --- | --- | --- |
| AgentStart | [EXISTS] | agent_id, session_id, loop_id, parent_loop_id, continuation_kind, config_snapshot, timestamp, metadata | Fires once when agent_loop() is entered, before any LLM call. continuation_kind: ContinuationKind (non-optional). config_snapshot: Option<LoopConfigSnapshot> carries model/provider identity. |
| AgentEnd | [EXISTS] | loop_id, messages, usage, timestamp, rejection | Fires once when agent_loop() exits; rejection is Some if an InputFilter blocked the input |

Loop-scoped Events

| Variant | Status | Fields | Description |
| --- | --- | --- | --- |
| ParallelLoopStart | [EXISTS] | session_id, loop_ids, timestamp | Emitted before parallel branch dispatch; lists all branch loop_ids |
| ParallelLoopEnd | [EXISTS] | session_id, selected_loop_id, selected_config_index, evaluation_usage, timestamp | Emitted after evaluation selects a winning branch |
| CompactionStarted | [EXISTS] | loop_id, estimated_tokens, message_count, timestamp | Emitted before compaction strategy runs |
| CompactionEnded | [EXISTS] | loop_id, messages_before, messages_after, estimated_tokens_before, estimated_tokens_after, loops_compacted, timestamp | Emitted after compaction completes |

Turn-scoped Events

| Variant | Status | Fields | Description |
| --- | --- | --- | --- |
| TurnStart | [EXISTS] | loop_id, turn_index, timestamp, triggered_by | Fires at the start of each LLM turn (one LLM call = one turn) |
| TurnEnd | [EXISTS] | loop_id, message, usage, timestamp, tool_results | Fires at the end of each LLM turn |

Message-scoped Events

| Variant | Status | Fields | Description |
| --- | --- | --- | --- |
| MessageStart | [EXISTS] | loop_id, message | New message created (assistant: when SSE stream opens; user/tool: immediately) |
| MessageUpdate | [EXISTS] | loop_id, message, delta | Streaming token/chunk; delta is the increment, message is the accumulator |
| MessageEnd | [EXISTS] | loop_id, message | Message fully complete; safe to persist |

Tool-scoped Events

| Variant | Status | Fields | Description |
| --- | --- | --- | --- |
| ToolExecutionStart | [EXISTS] | loop_id, tool_call_id, tool_name, args | Tool call begins (before execute()) |
| ToolExecutionUpdate | [EXISTS] | loop_id, tool_call_id, tool_name, partial_result | Mid-execution partial result (via ctx.on_update) |
| ToolExecutionEnd | [EXISTS] | loop_id, tool_call_id, tool_name, result, is_error, child_loop_id | Tool finished; child_loop_id is Some for sub-agent tools |
| ProgressMessage | [EXISTS] | loop_id, tool_call_id, tool_name, text | User-facing status text (via ctx.on_progress) |

Input-scoped Events

| Variant | Status | Fields | Description |
| --- | --- | --- | --- |
| InputRejected | [EXISTS] | loop_id, reason | InputFilter rejected the user's message; agent loop returns immediately |

Event Scoping (Bracket Relationships)

Events form a nested bracket structure:

AgentStart (+ config_snapshot)     -- session-scoped
  TurnStart                        -- turn-scoped (0-based index)
    MessageStart                   -- message-scoped (assistant message)
      MessageUpdate (N times)      -- streaming deltas
    MessageEnd
    ToolExecutionStart             -- tool-scoped (per tool call)
      ToolExecutionUpdate (0..N)   -- partial results
      ProgressMessage (0..N)       -- status text
    ToolExecutionEnd
    MessageStart                   -- message-scoped (tool result message)
    MessageEnd
  TurnEnd
  TurnStart                        -- next turn (tool round-trip)
    ...
  TurnEnd
AgentEnd                           -- session-scoped

For parallel evaluation:

ParallelLoopStart                  -- loop-scoped (lists all branch IDs)
  AgentStart (branch 1)            -- nested full lifecycle per branch
  AgentEnd (branch 1)
  AgentStart (branch 2)
  AgentEnd (branch 2)
ParallelLoopEnd                    -- loop-scoped (announces winner)
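
The bracket structure can be checked mechanically with a stack. The validator below is a hypothetical consumer-side sketch over a reduced set of event kinds. It assumes a single loop with sequential tool execution; with the Parallel strategy, tool events from different calls may interleave, and a strict stack check like this one would not apply.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    AgentStart,
    AgentEnd,
    TurnStart,
    TurnEnd,
    MessageStart,
    MessageUpdate,
    MessageEnd,
    ToolExecutionStart,
    ToolExecutionEnd,
}

/// True if every Start has a matching End in properly nested order and
/// every MessageUpdate sits directly inside an open message.
pub fn is_well_bracketed(events: &[Kind]) -> bool {
    let mut stack: Vec<Kind> = Vec::new();
    for &e in events {
        match e {
            Kind::AgentStart
            | Kind::TurnStart
            | Kind::MessageStart
            | Kind::ToolExecutionStart => stack.push(e),
            Kind::MessageUpdate => {
                if stack.last() != Some(&Kind::MessageStart) {
                    return false;
                }
            }
            Kind::AgentEnd | Kind::TurnEnd | Kind::MessageEnd | Kind::ToolExecutionEnd => {
                let expected = match e {
                    Kind::AgentEnd => Kind::AgentStart,
                    Kind::TurnEnd => Kind::TurnStart,
                    Kind::MessageEnd => Kind::MessageStart,
                    _ => Kind::ToolExecutionStart,
                };
                if stack.pop() != Some(expected) {
                    return false;
                }
            }
        }
    }
    stack.is_empty()
}
```

A check of this kind could be useful in tests for a custom event consumer, where a missing TurnEnd would otherwise go unnoticed.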

StreamDelta [EXISTS]

Incremental token-level updates from the LLM stream. Carried inside MessageUpdate events.

| Variant | Status | Description |
| --- | --- | --- |
| Text { delta } | [EXISTS] | A text token fragment |
| Thinking { delta } | [EXISTS] | A thinking/reasoning chunk (extended thinking mode only) |
| ToolCallDelta { delta } | [EXISTS] | A fragment of tool call argument JSON (accumulate until MessageEnd) |
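
On the consuming side, deltas are typically folded into buffers until the message ends. The `Accumulator` below is a hypothetical UI-side helper, assuming one tool call per message for simplicity; tool-call argument fragments are not valid JSON until the final fragment has arrived.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum StreamDelta {
    Text { delta: String },
    Thinking { delta: String },
    ToolCallDelta { delta: String },
}

/// Hypothetical consumer-side buffers, one per delta kind.
#[derive(Debug, Default, PartialEq)]
pub struct Accumulator {
    pub text: String,
    pub thinking: String,
    pub tool_args_json: String,
}

impl Accumulator {
    /// Append one increment to the matching buffer.
    pub fn apply(&mut self, delta: &StreamDelta) {
        match delta {
            StreamDelta::Text { delta } => self.text.push_str(delta),
            StreamDelta::Thinking { delta } => self.thinking.push_str(delta),
            StreamDelta::ToolCallDelta { delta } => self.tool_args_json.push_str(delta),
        }
    }
}
```

Parsing `tool_args_json` should wait for MessageEnd, since any prefix of the argument stream may be an incomplete JSON document.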

ContinuationKind [EXISTS]

How an agent_loop_continue call relates to the session's prior loops. Surfaced in AgentStart for observability.

| Variant | Status | Description |
| --- | --- | --- |
| Initial | [EXISTS] | First loop in a session via agent_loop(). The #[default] variant. |
| Default | [EXISTS] | Unspecified continuation; preserves original semantics |
| Rerun { tag } | [EXISTS] | Retry from equivalent state; tag is RFC 3339 UTC timestamp |
| Branch { tag } | [EXISTS] | Exploration of a different path from a branching point |
| Compaction | [EXISTS] | Standalone context-compaction pass; no LLM call |

TurnTrigger [EXISTS]

Identifies what caused a new turn to begin. Carried in TurnStart.

| Variant | Status | Description |
| --- | --- | --- |
| User | [EXISTS] | First turn triggered by a user message |
| SubAgent | [EXISTS] | Invoked as a sub-agent by a parent agent |
| Continuation | [EXISTS] | Continuation turn: tool round-trip, steering, or Default/Rerun continuation |
| Branch | [EXISTS] | First turn of a Branch continuation; subsequent turns use Continuation |

Event Flow

Producer: agent_loop (src/agent_loop/)
    |
    | mpsc::UnboundedSender<AgentEvent>
    v
Consumer: SessionRecorder (src/session/recorder.rs)
    |
    | on_event() dispatches by variant
    v
Storage: Session -> LoopRecord -> LoopEvent[]

The SessionRecorder consumes events and builds a structured tree:

  • AgentStart opens a LoopRecord (status: Running)
  • AgentEnd closes it (status: Completed or Rejected)
  • TurnEnd extracts config snapshots from assistant messages
  • ToolExecutionEnd records ChildLoopRef for sub-agent traceability
  • ParallelLoopEnd retroactively sets ParallelGroupRecord on all branch records
  • MessageUpdate events are optionally recorded (off by default; 100-1000x more numerous)
  • All other events append to LoopRecord.events as LoopEvent { sequence, event }
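
The recorder's dispatch can be sketched as a match over event variants. This is a reduced illustration, not the recorder's implementation: only four event kinds are modelled, `rejected` is a boolean stand-in for the `rejection` field, sequence numbers are assumed to be 0-based per loop, and AgentStart/AgentEnd are assumed to update the record without being appended to its event list.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    AgentStart { loop_id: String },
    MessageUpdate { loop_id: String },
    TurnEnd { loop_id: String },
    AgentEnd { loop_id: String, rejected: bool },
}

#[derive(Debug, PartialEq)]
pub enum LoopStatus {
    Running,
    Completed,
    Rejected,
}

pub struct LoopEvent {
    pub sequence: usize,
    pub event: AgentEvent,
}

pub struct LoopRecord {
    pub loop_id: String,
    pub status: LoopStatus,
    pub events: Vec<LoopEvent>,
}

pub struct SessionRecorder {
    pub include_streaming_events: bool,
    pub loops: Vec<LoopRecord>,
}

impl SessionRecorder {
    pub fn on_event(&mut self, event: AgentEvent) {
        match event {
            // Open a record in the Running state.
            AgentEvent::AgentStart { loop_id } => self.loops.push(LoopRecord {
                loop_id,
                status: LoopStatus::Running,
                events: Vec::new(),
            }),
            // Close the matching record.
            AgentEvent::AgentEnd { loop_id, rejected } => {
                if let Some(rec) = self.loops.iter_mut().find(|r| r.loop_id == loop_id) {
                    rec.status = if rejected { LoopStatus::Rejected } else { LoopStatus::Completed };
                }
            }
            // High-volume deltas are skipped unless explicitly enabled.
            AgentEvent::MessageUpdate { .. } if !self.include_streaming_events => {}
            // Everything else is appended with a per-loop sequence number.
            other => {
                let loop_id = match &other {
                    AgentEvent::MessageUpdate { loop_id } | AgentEvent::TurnEnd { loop_id } => {
                        loop_id.clone()
                    }
                    _ => return,
                };
                if let Some(rec) = self.loops.iter_mut().find(|r| r.loop_id == loop_id) {
                    let sequence = rec.events.len();
                    rec.events.push(LoopEvent { sequence, event: other });
                }
            }
        }
    }
}
```

With streaming events disabled, a loop that produced hundreds of MessageUpdate events would still store only its structural events, which keeps persisted records small.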

Code Reference

| Concept | File |
| --- | --- |
| AgentEvent, StreamDelta, ContinuationKind, TurnTrigger | src/types/event.rs |
| SessionRecorder, SessionRecorderConfig | src/session/recorder.rs |
| Event emission (AgentStart, TurnStart, MessageUpdate, etc.) | src/agent_loop/run.rs, src/agent_loop/streaming.rs |
| Tool lifecycle events (ToolExecutionStart/Update/End) | src/agent_loop/tools.rs |
| LoopRecord, LoopEvent, Session | src/session/model.rs |

Conceptual Notes

  • before_task / after_task callbacks [EXISTS] -- Session-level callbacks on SessionRecorderConfig (G2). BeforeTaskFn fires on first AgentStart with new session_id; AfterTaskFn fires on flush(). These are semantically session-scoped, unlike before_loop/after_loop which fire per-loop.
  • Session Scope [EXISTS] -- SessionScope enum (Ephemeral / Persistent) on the Session struct (G7). Set via config [session] scope = "persistent".
  • Error Events -- The current design uses StopReason::Error and the on_error callback for LLM errors. A dedicated AgentEvent::Error variant for more granular error reporting (tool failures, network issues, etc.) is noted as a potential improvement in the source comments.
  • Event Replay -- LoopRecord.events stores the full event stream (as Vec<LoopEvent>), enabling replay or analysis of past runs. SessionRecorderConfig.include_streaming_events controls whether the high-volume MessageUpdate deltas are included.

Compaction System

The compaction system manages context window pressure by summarizing, truncating, or dropping older conversation turns when the token count approaches the model's limit. Two strategies coexist: a legacy in-memory approach that rewrites the message array, and a modern block-based approach that creates non-destructive overlays on LoopRecords.

Concept Overview

Compaction [EXISTS]
├── CompactionBlock [EXISTS] — non-destructive overlay
│   └── keep_first, keep_compacted, keep_recent [EXISTS]
├── CompactionScope [EXISTS] — FixedCount(n) / TokenBudget
├── CompactionStrategy [EXISTS] — legacy in-memory
├── BlockCompactionStrategy [EXISTS] — modern overlay
├── TurnMap [EXISTS] — turn indices → message ranges
├── Callbacks: before/after compaction [EXISTS]
└── Config: consolidated in CompactionConfig [EXISTS]

CompactionBlock [EXISTS]

Non-destructive compaction overlay stored on LoopRecord alongside the original messages. When present, the context loader uses this block instead of raw messages. Three sections control what gets loaded into context.

Field | Type | Status | Description
keep_first | Option<TurnRange> | [EXISTS] | Turns kept verbatim from the start; only populated for the MOST RECENT loop
keep_compacted | Option<CompactedSection> | [EXISTS] | Fully summarised section; populated for ALL loops
keep_recent | Option<CompactedSection> | [EXISTS] | Recent turns with truncated tool outputs; only populated for the MOST RECENT loop
created_at | DateTime<Utc> | [EXISTS] | When this block was created

Loading logic:

  • Most recent loop: loads keep_first (original messages) + keep_compacted (summaries) + keep_recent (truncated)
  • Older loops: loads only keep_compacted (full-loop summary)
  • No compaction block: loads raw messages
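
The loading rules above can be sketched as follows. This is a hedged illustration with simplified stand-ins: messages are plain strings tagged with a turn index, and load_loop is a hypothetical helper, not a function from the crate.

```rust
// Inclusive range of turn indices, mirroring TurnRange in the text.
#[derive(Debug, Clone, PartialEq)]
struct TurnRange {
    start_turn: u32,
    end_turn: u32,
}

struct CompactedSection {
    range: TurnRange,
    messages: Vec<String>, // replacement messages for the range
}

struct CompactionBlock {
    keep_first: Option<TurnRange>,
    keep_compacted: Option<CompactedSection>,
    keep_recent: Option<CompactedSection>,
}

struct LoopRecord {
    // (turn_index, message text) pairs; an assumption for this sketch.
    messages: Vec<(u32, String)>,
    compaction: Option<CompactionBlock>,
}

// Hypothetical loader: returns the messages that would enter the context.
fn load_loop(record: &LoopRecord, is_most_recent: bool) -> Vec<String> {
    // No compaction block: load raw messages.
    let Some(block) = &record.compaction else {
        return record.messages.iter().map(|(_, m)| m.clone()).collect();
    };
    let mut out = Vec::new();
    // keep_first: original messages, most recent loop only.
    if is_most_recent {
        if let Some(range) = &block.keep_first {
            out.extend(
                record
                    .messages
                    .iter()
                    .filter(|(t, _)| *t >= range.start_turn && *t <= range.end_turn)
                    .map(|(_, m)| m.clone()),
            );
        }
    }
    // keep_compacted: summaries, loaded for every loop.
    if let Some(section) = &block.keep_compacted {
        out.extend(section.messages.iter().cloned());
    }
    // keep_recent: truncated recent turns, most recent loop only.
    if is_most_recent {
        if let Some(section) = &block.keep_recent {
            out.extend(section.messages.iter().cloned());
        }
    }
    out
}
```

The original messages stay on the record in every case, which is what makes the overlay non-destructive.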

Supporting Types

Type | Status | Description
TurnRange { start_turn, end_turn } | [EXISTS] | Inclusive range of turn indices within a loop
CompactedSection { range, messages } | [EXISTS] | A turn range plus the replacement messages for that range

CompactionScope [EXISTS]

Controls how many earlier loops are included in compaction and context loading.

Variant | Status | Description
FixedCount(usize) | [EXISTS] | Compact a fixed number of earlier loops on the active chain (default: 3)
TokenBudget | [EXISTS] | Walk backward, accumulating per-loop token estimates; stop when max_context_tokens would be exceeded

TokenBudget note: The scope can include loops whose raw messages EXCEED max_context_tokens. This is intentional -- the compacted summaries will fit even when originals don't, enabling richer context for LLM-based summarisation strategies.
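
A minimal sketch of how a scope might resolve to a concrete loop count. The function name matches resolve_scope only loosely: the signature here is an assumption, and the per-loop estimates are passed in as plain numbers because the text does not say whether they measure raw or compacted content.

```rust
enum CompactionScope {
    FixedCount(usize),
    TokenBudget,
}

/// Hypothetical resolver. `earlier_loop_tokens` holds one token estimate per
/// earlier loop on the active chain, ordered oldest to newest.
fn resolve_scope(
    scope: &CompactionScope,
    earlier_loop_tokens: &[usize],
    max_context_tokens: usize,
) -> usize {
    match scope {
        // Never report more loops than actually exist.
        CompactionScope::FixedCount(n) => (*n).min(earlier_loop_tokens.len()),
        CompactionScope::TokenBudget => {
            let mut total = 0usize;
            let mut count = 0usize;
            // Walk backward from the newest earlier loop.
            for tokens in earlier_loop_tokens.iter().rev() {
                if total.saturating_add(*tokens) > max_context_tokens {
                    break; // including this loop would exceed the budget
                }
                total += *tokens;
                count += 1;
            }
            count
        }
    }
}
```

For estimates of 500, 300, and 200 tokens (oldest to newest) and a budget of 600, the walk accepts the two newest loops and stops before the oldest.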


CompactionStrategy (Legacy) [EXISTS]

In-memory compaction that rewrites the message array. Used when AgentContext.session is None.

Method | Status | Description
compact(messages, config) -> Vec<AgentMessage> | [EXISTS] | Takes ownership of messages and returns a compacted version

DefaultCompaction [EXISTS]

The built-in implementation. Delegates to compact_messages() which applies 3-level reduction:

  1. Truncate tool outputs
  2. Summarize turns
  3. Drop middle

BlockCompactionStrategy (Modern) [EXISTS]

Creates non-destructive CompactionBlock overlays. Used when AgentContext.session is Some.

Method | Status | Description
keep_first(record, turn_map, config) -> Option<TurnRange> | [EXISTS] | Determine turns kept verbatim from start (most recent loop only)
keep_recent(record, turn_map, config) -> Option<CompactedSection> | [EXISTS] | Create recent section with truncated tool outputs (most recent loop only)
keep_compacted(record, turn_map, config, is_most_recent) -> Option<CompactedSection> | [EXISTS] | Create summarised section; for most recent: middle only; for older: entire loop
compact(record, config, is_most_recent) -> CompactionBlock | [EXISTS] | Default: assembles from the three methods above

DefaultBlockCompaction [EXISTS]

Stateless implementation. All parameters come from CompactionConfig.

Section | Behavior
keep_first | Returns turn range 0..keep_first_turns
keep_recent | Truncates tool outputs to tool_output_max_lines
keep_compacted | Per-turn one-liner summaries bounded by max_summary_tokens; drops remaining turns when budget exhausted

Limitation: DefaultBlockCompaction.keep_compacted is basic -- it drops turns that exceed the token budget rather than producing a holistic summary. More sophisticated strategies (e.g. LLM-based) should summarise ALL turns within the budget.
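
The budget behaviour described in the limitation can be illustrated with a small sketch. Both functions are hypothetical: estimate_tokens uses the rough 4-characters-per-token heuristic mentioned elsewhere in this document (rounding up is an assumption), and summarise_within_budget takes already-written one-line summaries as input.

```rust
// Rough heuristic: about 4 characters per token, rounded up (assumed rounding).
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Hypothetical helper: keeps per-turn summaries in order until the shared
/// budget is exhausted, then drops every remaining turn.
/// Returns the kept summaries and the number of dropped turns.
fn summarise_within_budget(
    turn_summaries: &[String],
    max_summary_tokens: usize,
) -> (Vec<String>, usize) {
    let mut kept = Vec::new();
    let mut used = 0usize;
    for summary in turn_summaries {
        let cost = estimate_tokens(summary);
        if used + cost > max_summary_tokens {
            break; // budget exhausted: this turn and all later ones are dropped
        }
        used += cost;
        kept.push(summary.clone());
    }
    let dropped = turn_summaries.len() - kept.len();
    (kept, dropped)
}
```

The dropped count is what a holistic strategy would avoid: instead of discarding later turns, it would compress all of them into the same budget.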


TurnMap [EXISTS]

Maps turn indices to message index ranges within a message array. Built from messages by grouping on TurnId.turn_index.

Method | Status | Description
from_messages(messages) -> TurnMap | [EXISTS] | Build from messages; messages without turn_id are their own group
turn_count() -> u32 | [EXISTS] | Number of turn groups
messages_for_range(range, all_msgs) -> &[AgentMessage] | [EXISTS] | Slice of messages belonging to a TurnRange
turn_msg_range(turn_index) -> Option<(usize, usize)> | [EXISTS] | Message index range for a single turn
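
A minimal sketch of the grouping step, using a stand-in message type. Two details are assumptions made for the example: ranges are stored half-open as (start, end), and turn_msg_range looks a group up by its position, which matches turn_index only when every message carries a turn id and turn indices start at 0.

```rust
// Stand-in message: only the turn index matters for grouping.
struct Msg {
    turn_index: Option<u32>,
    text: String,
}

struct TurnMap {
    // One half-open (start, end) message range per turn group.
    ranges: Vec<(usize, usize)>,
}

impl TurnMap {
    fn from_messages(messages: &[Msg]) -> Self {
        let mut ranges: Vec<(usize, usize)> = Vec::new();
        let mut prev: Option<u32> = None;
        for (i, m) in messages.iter().enumerate() {
            // A message joins the current group only if it has a turn index
            // equal to the previous message's; otherwise it starts a group.
            let same_turn = m.turn_index.is_some() && m.turn_index == prev;
            if same_turn && !ranges.is_empty() {
                let last = ranges.len() - 1;
                ranges[last].1 = i + 1;
            } else {
                ranges.push((i, i + 1));
            }
            prev = m.turn_index;
        }
        TurnMap { ranges }
    }

    fn turn_count(&self) -> u32 {
        self.ranges.len() as u32
    }

    // Looks up by group position (see the assumptions in the lead-in).
    fn turn_msg_range(&self, group: u32) -> Option<(usize, usize)> {
        self.ranges.get(group as usize).copied()
    }
}
```

Grouping on contiguous runs keeps ToolCall/ToolResult messages of the same turn inside one range, so a range boundary never falls inside a turn.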

Orchestration [EXISTS]

Cross-loop compaction coordination. The orchestrator resolves scope, then creates CompactionBlocks for the current loop and earlier loops within scope.

Function | Status | Description
compact_session_loops(session, current_loop_id, strategy, config, max_context_tokens) | [EXISTS] | Creates blocks: current loop gets all three sections; earlier loops get only keep_compacted
build_context_from_session(session, current_loop_id, config, max_context_tokens) | [EXISTS] | Walks the loop chain, loads from CompactionBlocks where available, raw messages otherwise
resolve_scope(session, chain, scope, max_context_tokens) | [EXISTS] | Resolves CompactionScope to a concrete count of earlier loops

CompactionConfig [EXISTS]

Full compaction policy -- controls both WHEN and HOW to compact.

WHEN to compact

Field | Type | Default | Status | Description
compact_at_pct | f64 | 0.90 | [EXISTS] | Fraction of max_context_tokens against which headroom is measured
compact_budget_threshold_pct | f64 | 0.05 | [EXISTS] | Minimum headroom fraction before compaction fires
compaction_scope | CompactionScope | FixedCount(3) | [EXISTS] | How many earlier loops to include

HOW to compact

Field | Type | Default | Status | Description
keep_first_turns | usize | 2 | [EXISTS] | Turns kept verbatim from start (most recent loop)
keep_recent_turns | usize | 10 | [EXISTS] | Turns kept from end (extended to turn boundary)
max_summary_tokens | usize | 2_000 | [EXISTS] | Token budget for summarised middle section
tool_output_max_lines | usize | 50 | [EXISTS] | Max lines per tool output in keep_recent section

Code Reference

Concept | File
CompactionBlock, TurnRange, CompactedSection, TurnMap | src/context/compaction.rs
CompactionStrategy, DefaultCompaction, BlockCompactionStrategy, DefaultBlockCompaction | src/context/strategy.rs
CompactionConfig, CompactionScope, ContextConfig | src/context/config.rs
compact_session_loops(), build_context_from_session(), resolve_scope() | src/context/orchestration.rs
compact_messages() (legacy in-memory) | src/context/compact_messages.rs
ContextTracker (token tracking) | src/context/tracker.rs
in_memory_strategy and block_strategy fields | src/context/config.rs (on CompactionConfig)

Conceptual Notes

  • before_compaction_start / after_compaction_end callbacks [EXISTS] -- Lifecycle hooks fire around compaction. before_compaction_start fires before compaction begins (for pre-compaction indexing or memory extraction) and after_compaction_end fires after compaction completes (for post-compaction verification). Both callbacks are unset by default.
  • Config consolidation [EXISTS] -- Compaction strategies (in_memory_strategy and block_strategy) are now fields on CompactionConfig, consolidating what was previously split across ContextConfig.compaction and AgentLoopConfig. The strategies no longer live on AgentLoopConfig; all compaction policy and strategy configuration is in one place.
  • LLM-based Summarisation -- DefaultBlockCompaction.keep_compacted is a basic per-turn one-liner generator. The BlockCompactionStrategy trait is designed for more sophisticated strategies that call an LLM to produce holistic digests of all turns within the max_summary_tokens budget.
  • Compaction Events [EXISTS] -- CompactionStarted and CompactionEnded events bracket compaction execution, providing estimated token counts before/after. These are consumed by SessionRecorder for observability.
  • Legacy vs Modern -- Two systems coexist: CompactionStrategy (legacy, in-memory, rewrites messages) is used when AgentContext.session is None; BlockCompactionStrategy (modern, non-destructive overlays) is used when session data is available. The legacy path is preserved for backward compatibility and simple stateless use cases.

Configuration

Configuration controls agent behavior at three levels: context management (ContextConfig), execution safety (ExecutionLimits), and the unified loop config (AgentLoopConfig) that bundles model, hooks, compaction, limits, caching, retry, and filters into a single borrowed struct for each agent_loop call.

Concept Overview

Configuration [EXISTS]
├── ContextConfig [EXISTS] — max_context_tokens + compaction policy
├── CompactionConfig [EXISTS] — WHEN (thresholds, scope) + HOW (keep settings)
├── ExecutionLimits [EXISTS] — max_turns/tokens/duration/cost
├── CacheConfig [EXISTS] — Auto/Disabled/Manual
├── AgentLoopConfig [EXISTS] — 20+ fields (model, hooks, limits, filters)
├── Callback hooks [EXISTS] — 12 hook types across turn/loop/tool/error
├── ThinkingLevel [EXISTS] — Off/Minimal/Low/Medium/High
└── InputFilter [EXISTS] — Pass/Warn/Reject

ContextConfig [EXISTS]

Model constraints plus compaction policy. When set on AgentLoopConfig, enables automatic context management.

Field | Type | Default | Status | Description
max_context_tokens | usize | 100_000 | [EXISTS] | Maximum context tokens (the model's context window)
system_prompt_tokens | usize | 4_000 | [EXISTS] | Tokens reserved for the system prompt
compaction | CompactionConfig | (see below) | [EXISTS] | Compaction policy -- always present when context limits are set
keep_recent | usize | 10 | [EXISTS] | Legacy field (use compaction.keep_recent_turns instead)
keep_first | usize | 2 | [EXISTS] | Legacy field (use compaction.keep_first_turns instead)
tool_output_max_lines | usize | 50 | [EXISTS] | Legacy field (use compaction.tool_output_max_lines instead)

CompactionConfig [EXISTS]

Full compaction policy -- controls both WHEN to compact and HOW to compact. Embedded in ContextConfig.compaction.

WHEN: Trigger Thresholds

Field | Type | Default | Status | Description
compact_at_pct | f64 | 0.90 | [EXISTS] | Fraction of max_context_tokens against which headroom is measured
compact_budget_threshold_pct | f64 | 0.05 | [EXISTS] | Minimum remaining headroom before compaction fires. With defaults (100k/4k): fires at ~81k tokens
compaction_scope | CompactionScope | FixedCount(3) | [EXISTS] | How many earlier loops to include: FixedCount(n) or TokenBudget
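
The "~81k tokens" figure can be reproduced with a short calculation. The exact formula is not given in this section, so the sketch below is one reading that happens to match: both percentages are applied to the usable window (max_context_tokens minus system_prompt_tokens). The struct and function names are hypothetical.

```rust
// Hypothetical bundle of the four values involved in the trigger.
struct TriggerInputs {
    max_context_tokens: usize,
    system_prompt_tokens: usize,
    compact_at_pct: f64,
    compact_budget_threshold_pct: f64,
}

/// Token count at which compaction would fire, under the assumed formula:
/// usable * compact_at_pct - usable * compact_budget_threshold_pct.
fn compaction_trigger_tokens(t: &TriggerInputs) -> usize {
    let usable = (t.max_context_tokens - t.system_prompt_tokens) as f64;
    let ceiling = usable * t.compact_at_pct; // 96_000 * 0.90 = 86_400
    let min_headroom = usable * t.compact_budget_threshold_pct; // 96_000 * 0.05 = 4_800
    (ceiling - min_headroom).round() as usize // 86_400 - 4_800 = 81_600
}
```

With the defaults (100_000, 4_000, 0.90, 0.05) this yields 81_600 tokens, consistent with the approximate figure in the table.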

HOW: Compaction Parameters

Field | Type | Default | Status | Description
keep_first_turns | usize | 2 | [EXISTS] | Turns kept verbatim from start (most recent loop only)
keep_recent_turns | usize | 10 | [EXISTS] | Turns kept from end; extended to turn boundary so ToolCall/ToolResult pairs are never split
max_summary_tokens | usize | 2_000 | [EXISTS] | Token budget for the summarised middle section (total, not per-turn)
tool_output_max_lines | usize | 50 | [EXISTS] | Max lines per tool output in keep_recent section

ExecutionLimits [EXISTS]

Safety net against runaway agent loops. Checked before each turn by ExecutionTracker.

Field | Type | Default | Status | Description
max_turns | usize | 50 | [EXISTS] | Maximum LLM turns (catches infinite tool-call loops)
max_total_tokens | usize | 1_000_000 | [EXISTS] | Maximum total tokens consumed across all turns
max_duration | Duration | 600s | [EXISTS] | Maximum wall-clock duration
max_cost | Option<f64> | None | [EXISTS] | Maximum cumulative dollar cost; requires model_config.cost rates to be set

ExecutionTracker [EXISTS]

Runtime state tracker that checks limits before each turn.

Field | Status | Description
limits | [EXISTS] | The ExecutionLimits being enforced
turns | [EXISTS] | Turn counter
tokens_used | [EXISTS] | Accumulated token count
cost_accumulated | [EXISTS] | Accumulated dollar cost
started_at | [EXISTS] | Instant when tracking began

When a limit is hit, check_limits() returns a reason string. The agent loop injects a "[Agent stopped: ...]" user message so the LLM (and user) can see what happened.
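
A minimal sketch of the limit check, using the field names and defaults from the tables above. The Option<String> return type, the record_turn helper, the wording of the reason strings, and the stop_message formatter are assumptions for illustration.

```rust
use std::time::{Duration, Instant};

struct ExecutionLimits {
    max_turns: usize,
    max_total_tokens: usize,
    max_duration: Duration,
    max_cost: Option<f64>,
}

impl Default for ExecutionLimits {
    fn default() -> Self {
        // Defaults as listed in the ExecutionLimits table.
        ExecutionLimits {
            max_turns: 50,
            max_total_tokens: 1_000_000,
            max_duration: Duration::from_secs(600),
            max_cost: None,
        }
    }
}

struct ExecutionTracker {
    limits: ExecutionLimits,
    turns: usize,
    tokens_used: usize,
    cost_accumulated: f64,
    started_at: Instant,
}

impl ExecutionTracker {
    fn new(limits: ExecutionLimits) -> Self {
        ExecutionTracker {
            limits,
            turns: 0,
            tokens_used: 0,
            cost_accumulated: 0.0,
            started_at: Instant::now(),
        }
    }

    // Hypothetical bookkeeping call made after each completed turn.
    fn record_turn(&mut self, tokens: usize, cost: f64) {
        self.turns += 1;
        self.tokens_used += tokens;
        self.cost_accumulated += cost;
    }

    // Returns a reason when any limit has been reached, otherwise None.
    fn check_limits(&self) -> Option<String> {
        if self.turns >= self.limits.max_turns {
            return Some(format!("max turns reached ({})", self.limits.max_turns));
        }
        if self.tokens_used >= self.limits.max_total_tokens {
            return Some(format!("max total tokens reached ({})", self.limits.max_total_tokens));
        }
        if self.started_at.elapsed() >= self.limits.max_duration {
            return Some(format!("max duration reached ({}s)", self.limits.max_duration.as_secs()));
        }
        if let Some(max_cost) = self.limits.max_cost {
            if self.cost_accumulated >= max_cost {
                return Some(format!("max cost reached (${max_cost:.2})"));
            }
        }
        None
    }
}

// Formats the user-visible message injected when the loop stops.
fn stop_message(reason: &str) -> String {
    format!("[Agent stopped: {reason}]")
}
```

Calling check_limits() before each turn, rather than after, means a run that reaches a limit never starts the turn that would exceed it.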


CacheConfig [EXISTS]

Controls prompt caching behavior for providers that support it.

Field | Type | Default | Status | Description
enabled | bool | true | [EXISTS] | Master switch for caching hints
strategy | CacheStrategy | Auto | [EXISTS] | How cache breakpoints are placed

CacheStrategy [EXISTS]

Variant | Status | Description
Auto | [EXISTS] | Automatic breakpoint placement (system prompt + tool defs + recent history)
Disabled | [EXISTS] | No caching
Manual { cache_system, cache_tools, cache_messages } | [EXISTS] | Fine-grained control over what gets cached

AgentLoopConfig [EXISTS]

All static settings for a single agent_loop / agent_loop_continue call. Borrowed (&AgentLoopConfig) throughout the loop -- never mutated. 20+ fields organized by concern.

Model & Provider

Field | Type | Status | Description
model_config | ModelConfig | [EXISTS] | Complete provider identity (model id, api_key, base_url, protocol, compat, cost)
provider_override | Option<Arc<dyn StreamProvider>> | [EXISTS] | Bypasses ProviderRegistry dispatch; for testing or custom providers
thinking_level | ThinkingLevel | [EXISTS] | Depth of model reasoning: Off, Minimal, Low, Medium, High
max_tokens | Option<u32> | [EXISTS] | Override model_config.max_tokens for this call
temperature | Option<f32> | [EXISTS] | Temperature override

Context Transformation

Field | Type | Status | Description
convert_to_llm | Option<ConvertToLlmFn> | [EXISTS] | Converts AgentMessage[] to Message[] before each LLM call
transform_context | Option<TransformContextFn> | [EXISTS] | Transforms full context before convert_to_llm (pruning, reordering, injection)

Steering & Follow-up

Field | Type | Status | Description
get_steering_messages | Option<GetMessagesFn> | [EXISTS] | Polled between tools for user interruptions
get_follow_up_messages | Option<GetMessagesFn> | [EXISTS] | Polled after agent finishes for queued work

Compaction

Field | Type | Status | Description
context_config | Option<ContextConfig> | [EXISTS] | Context window configuration; None disables compaction

Note: Compaction strategies have been consolidated into CompactionConfig (G5). See in_memory_strategy and block_strategy fields on CompactionConfig. The former compaction_strategy and block_compaction_strategy fields no longer exist on AgentLoopConfig.

Limits & Safety

Field | Type | Status | Description
execution_limits | Option<ExecutionLimits> | [EXISTS] | Max turns, tokens, duration, cost
cache_config | CacheConfig | [EXISTS] | Prompt caching configuration
tool_execution | ToolExecutionStrategy | [EXISTS] | Sequential, Parallel, or Batched
retry_config | RetryConfig | [EXISTS] | Exponential backoff with jitter for transient errors

Callback Hooks -- Turn Level

Field | Type | Status | Description
before_turn | Option<BeforeTurnFn> | [EXISTS] | (messages, turn_index) -> bool; return false to abort the turn
after_turn | Option<AfterTurnFn> | [EXISTS] | (messages, turn_usage)

Callback Hooks -- Loop Level

Field | Type | Status | Description
before_loop | Option<BeforeLoopFn> | [EXISTS] | (messages, loop_index) -> bool; return false to abort
after_loop | Option<AfterLoopFn> | [EXISTS] | (new_messages, accumulated_usage)
on_error | Option<OnErrorFn> | [EXISTS] | Called when LLM returns StopReason::Error

Callback Hooks -- Tool Level

Field | Type | Status | Description
before_tool_execution | Option<BeforeToolExecutionFn> | [EXISTS] | (tool_name, tool_call_id, args) -> bool; return false to skip
after_tool_execution | Option<AfterToolExecutionFn> | [EXISTS] | (tool_name, tool_call_id, is_error)
before_tool_execution_update | Option<BeforeToolExecutionUpdateFn> | [EXISTS] | (tool_name, tool_call_id, text) -> bool; return false to suppress
after_tool_execution_update | Option<AfterToolExecutionUpdateFn> | [EXISTS] | (tool_name, tool_call_id, text)

Input Filtering & Identity

Field | Type | Status | Description
input_filters | Vec<Arc<dyn InputFilter>> | [EXISTS] | Filters run in order; first Reject wins
first_turn_trigger | TurnTrigger | [EXISTS] | Trigger type for first TurnStart; default User, set to SubAgent by sub-agent callers
config_id | Option<String> | [EXISTS] | Stable identity for loop_id construction: "{session_id}.{config_id}.{N}"

Callback Hook Type Aliases [EXISTS]

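Every hook field is an Option; None means no hook (zero overhead). As the table shows, the lifecycle hooks are Arc<dyn Fn(...)>, while the three context and message-source aliases (ConvertToLlmFn, TransformContextFn, GetMessagesFn) are Box<dyn Fn(...)>.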

Type Alias | Signature | Status
ConvertToLlmFn | Box<dyn Fn(&[AgentMessage]) -> Vec<Message>> | [EXISTS]
TransformContextFn | Box<dyn Fn(Vec<AgentMessage>) -> Vec<AgentMessage>> | [EXISTS]
GetMessagesFn | Box<dyn Fn() -> Vec<AgentMessage>> | [EXISTS]
BeforeLoopFn | Arc<dyn Fn(&[AgentMessage], usize) -> bool> | [EXISTS]
AfterLoopFn | Arc<dyn Fn(&[AgentMessage], &Usage)> | [EXISTS]
BeforeTurnFn | Arc<dyn Fn(&[AgentMessage], usize) -> bool> | [EXISTS]
AfterTurnFn | Arc<dyn Fn(&[AgentMessage], &Usage)> | [EXISTS]
BeforeToolExecutionFn | Arc<dyn Fn(&str, &str, &serde_json::Value) -> bool> | [EXISTS]
AfterToolExecutionFn | Arc<dyn Fn(&str, &str, bool)> | [EXISTS]
BeforeToolExecutionUpdateFn | Arc<dyn Fn(&str, &str, &str) -> bool> | [EXISTS]
AfterToolExecutionUpdateFn | Arc<dyn Fn(&str, &str, &str)> | [EXISTS]
OnErrorFn | Arc<dyn Fn(&str)> | [EXISTS]
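
As a hedged illustration of how one of these aliases might be filled in, the sketch below builds a BeforeTurnFn from a closure. AgentMessage is a local stand-in, the alias is re-declared to match the table, and turn_cap_hook and should_run_turn are hypothetical helpers rather than crate functions.

```rust
use std::sync::Arc;

// Stand-in for the crate's message type.
struct AgentMessage {
    text: String,
}

// Re-declared here to match the signature listed in the table.
type BeforeTurnFn = Arc<dyn Fn(&[AgentMessage], usize) -> bool>;

// Hypothetical hook: allow turns only while turn_index is below a cap.
fn turn_cap_hook(max_turns: usize) -> BeforeTurnFn {
    // The explicit parameter types let the closure accept a slice of any lifetime.
    Arc::new(move |_messages: &[AgentMessage], turn_index: usize| turn_index < max_turns)
}

// How a loop might consult an optional hook: None means "always proceed".
fn should_run_turn(
    hook: &Option<BeforeTurnFn>,
    messages: &[AgentMessage],
    turn_index: usize,
) -> bool {
    hook.as_ref().map_or(true, |h| h(messages, turn_index))
}
```

Because the hook sits behind an Arc, cloning a config that holds it shares the same closure instead of copying captured state.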

InputFilter Trait [EXISTS]

Synchronous filter applied to user input before the LLM call. Intentionally synchronous for hot-path performance; use before_turn for async moderation.

Method | Status | Description
filter(text) -> FilterResult | [EXISTS] | Returns Pass, Warn(String), or Reject(String)

FilterResult [EXISTS]

Variant | Status | Description
Pass | [EXISTS] | Message passes unchanged
Warn(String) | [EXISTS] | Message passes; warning appended to context for LLM to see
Reject(String) | [EXISTS] | Message rejected; agent loop returns immediately with InputRejected event

Filters run in order. First Reject wins and discards accumulated warnings. Warn messages accumulate and are appended to the user message.
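
The ordering rule can be sketched as a small filter chain. FilterResult and the filter(text) method follow the tables above; the two example filters (FlagWord, MaxLength), the Outcome enum, and run_filters are hypothetical names introduced for the example.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

trait InputFilter {
    fn filter(&self, text: &str) -> FilterResult;
}

// Hypothetical filter: warns when the input mentions a flagged word.
struct FlagWord(&'static str);

impl InputFilter for FlagWord {
    fn filter(&self, text: &str) -> FilterResult {
        if text.contains(self.0) {
            FilterResult::Warn(format!("input mentions '{}'", self.0))
        } else {
            FilterResult::Pass
        }
    }
}

// Hypothetical filter: rejects inputs longer than a character limit.
struct MaxLength(usize);

impl InputFilter for MaxLength {
    fn filter(&self, text: &str) -> FilterResult {
        if text.chars().count() > self.0 {
            FilterResult::Reject(format!("input exceeds {} characters", self.0))
        } else {
            FilterResult::Pass
        }
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Accepted { warnings: Vec<String> },
    Rejected(String),
}

// Runs filters in order; the first Reject wins and drops collected warnings.
fn run_filters(filters: &[Arc<dyn InputFilter>], text: &str) -> Outcome {
    let mut warnings = Vec::new();
    for f in filters {
        match f.filter(text) {
            FilterResult::Pass => {}
            FilterResult::Warn(w) => warnings.push(w),
            FilterResult::Reject(reason) => return Outcome::Rejected(reason),
        }
    }
    Outcome::Accepted { warnings }
}
```

Returning early on Reject is what discards the accumulated warnings: they live only in the local vector and are never attached to a rejected message.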


ThinkingLevel [EXISTS]

Controls the depth of model reasoning before responding.

Variant | Status | Description
Off (default) | [EXISTS] | No thinking tokens; fastest and cheapest
Minimal | [EXISTS] | Lightest reasoning pass
Low | [EXISTS] | Shallow chain-of-thought
Medium | [EXISTS] | Balanced reasoning; default for most agentic workflows
High | [EXISTS] | Maximum reasoning budget; most expensive

Usage [EXISTS]

Token metrics per turn or accumulated.

Field | Type | Status | Description
input | u64 | [EXISTS] | Input tokens
output | u64 | [EXISTS] | Output tokens
reasoning | u64 | [EXISTS] | Reasoning tokens (subset of output; non-zero for OpenAI o-series)
cache_read | u64 | [EXISTS] | Tokens served from cache
cache_write | u64 | [EXISTS] | Tokens written to cache
total_tokens | u64 | [EXISTS] | Total tokens

Method | Status | Description
estimated_cost(cost_config) | [EXISTS] | Dollar cost from per-million-token rates
combine(other) | [EXISTS] | Sum two Usage values
cache_hit_rate() | [EXISTS] | Fraction of input tokens from cache (0.0-1.0)

Code Reference

Concept | File
ContextConfig, CompactionConfig, CompactionScope | src/context/config.rs
ExecutionLimits, ExecutionTracker | src/context/execution.rs
AgentLoopConfig and all callback type aliases | src/agent_loop/config.rs
Usage, CacheConfig, CacheStrategy, ThinkingLevel | src/types/usage.rs
InputFilter, FilterResult, EvaluationStrategy | src/types/parallel.rs
ToolExecutionStrategy | src/types/tool.rs
RetryConfig | src/provider/retry.rs

Conceptual Notes

  • before_task / after_task [EXISTS] -- Session-level callbacks on SessionRecorderConfig. BeforeTaskFn: Arc<dyn Fn(&Session) -> bool> fires on first AgentStart with a new session_id. AfterTaskFn: Arc<dyn Fn(&Session)> fires on flush().
  • before_compaction_start / after_compaction_end [EXISTS] -- Compaction lifecycle callbacks (G1) on AgentLoopConfig. before_compaction_start(estimated_tokens, message_count) -> bool fires before CompactionStarted. after_compaction_end(msgs_before, msgs_after, tokens_before, tokens_after) fires after CompactionEnded.
  • Per-loop config tracking [EXISTS] -- Model, thinking_level, temperature, and other config values are captured per-loop in LoopConfigSnapshot on each LoopRecord (and in AgentStart.config_snapshot). Session no longer carries model_config, thinking_level, or temperature fields. Fallback hierarchy: Loop -> Agent default.
  • Config streamlining [DONE] -- Compaction strategies (in_memory_strategy, block_strategy) have been consolidated into CompactionConfig, completing G5. The dispatch logic in run.rs reads them from ctx_config.compaction. AgentLoopConfig no longer carries strategy fields.
  • ParallelLoopOutcome / ParallelLoopResult -- Defined in src/types/parallel.rs, these types support evaluational parallelism where multiple branches run concurrently and an EvaluationStrategy selects the winner. Related to config because parallel configs produce multiple AgentLoopConfig instances.

phi-core — System Architecture

1. Component Map

Agent trait + BasicAgent (src/agents/)

Responsibility: Agent (trait, agents/agent.rs) defines the runtime interface — prompting, state access, control, and steering queues. BasicAgent (struct, agents/basic_agent.rs) is the default in-memory implementation: owns the conversation, tools, and ModelConfig (provider identity), and is the application-facing entry point. Construction: BasicAgent::new(ModelConfig::anthropic(...)). The optional provider_override field bypasses ProviderRegistry for custom or test providers. SubAgentTool (agents/sub_agent.rs) implements AgentTool to delegate tasks to a child agent_loop(). Public interface:

  • prompt(text) — Send a text prompt; returns an event stream receiver.
  • prompt_messages(messages) — Send one or more messages as a prompt; returns an event stream receiver.
  • prompt_with_sender(text, tx) — Send a text prompt, streaming events to a caller-provided sender.
  • continue_loop() — Resume from existing context with ContinuationKind::Default; returns an event stream receiver.
  • continue_loop_with_sender(tx, kind) — Resume with an explicit ContinuationKind (Default, Rerun { tag }, or Branch { tag }), streaming events to a caller-provided sender.
  • steer(msg) — Queue a message that will be injected mid-run between tool executions.
  • follow_up(msg) — Queue a message to be processed after the agent would otherwise stop.
  • abort() — Cancel the in-progress run by signalling the cancellation token.
  • reset() — Clear messages, queues, streaming state, and cancel token to return the agent to its initial state.
  • save_messages() — Serialize the current conversation to a JSON string.
  • restore_messages(json) — Replace the current conversation with messages deserialized from a JSON string.
  • with_skills(skill_set) — Load skills and append their XML index to the system prompt per the AgentSkills standard.
  • with_mcp_server_stdio(cmd, args, env) — Connect to an MCP server by spawning a child process and add its tools to the agent.
  • with_mcp_server_http(url) — Connect to an MCP server via HTTP and add its tools to the agent.
  • with_openapi_file(path, config, filter) — Load tools from an OpenAPI spec file and add them to the agent.
  • with_openapi_url(url, config, filter) — Fetch an OpenAPI spec from a URL and add its tools to the agent.
  • with_openapi_spec(spec_str, config, filter) — Parse an OpenAPI spec string (JSON or YAML) and add its tools to the agent.
  • new_session() — Immediately rotate to a new session_id; resets loop counters and last_loop_id; returns the new session id.
  • check_and_rotate(threshold) — Rotate to a new session if the agent has been idle longer than threshold since the last prompt_* call; returns Some(new_session_id) on rotation, None otherwise.

BasicAgent state relevant to session management:

Field | Type | Description
agent_id | String | Stable identifier across all sessions for this instance
session_id | String | Current session identifier; updated by new_session()
loop_counters | HashMap<String, usize> | Per-config loop counter; cleared on new_session()
last_loop_id | Option<String> | Most recent loop; cleared on new_session()
last_active_at | Option<DateTime<Utc>> | Timestamp of last prompt_* call; used by check_and_rotate()
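
A minimal sketch of how these fields might interact when loop ids are issued and a session rotates. It follows the "{session_id}.{config_id}.{N}" format described under AgentLoopConfig; the SessionState struct, the counter starting at 0, and the "session-K" id scheme are assumptions for illustration (the real session ids are not specified here).

```rust
use std::collections::HashMap;

// Hypothetical subset of the agent's session-related state.
struct SessionState {
    session_id: String,
    loop_counters: HashMap<String, usize>,
    last_loop_id: Option<String>,
    sessions_started: usize, // only used to make up new ids in this sketch
}

impl SessionState {
    fn new() -> Self {
        SessionState {
            session_id: "session-0".to_string(),
            loop_counters: HashMap::new(),
            last_loop_id: None,
            sessions_started: 1,
        }
    }

    // Builds "{session_id}.{config_id}.{N}" and advances the per-config counter.
    fn next_loop_id(&mut self, config_id: &str) -> String {
        let counter = self.loop_counters.entry(config_id.to_string()).or_insert(0);
        let loop_id = format!("{}.{}.{}", self.session_id, config_id, *counter);
        *counter += 1;
        self.last_loop_id = Some(loop_id.clone());
        loop_id
    }

    // Rotates to a fresh session: counters and last_loop_id are cleared.
    fn new_session(&mut self) -> String {
        self.session_id = format!("session-{}", self.sessions_started);
        self.sessions_started += 1;
        self.loop_counters.clear();
        self.last_loop_id = None;
        self.session_id.clone()
    }
}
```

Keeping one counter per config id means two configs running in the same session number their loops independently, and clearing the map on rotation restarts every sequence.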

AgentLoop (src/agent_loop/)

Responsibility: The core execution engine. Manages the turn loop, tool dispatch, steering injection, follow-up processing, and lifecycle event emission. Public interface:

  • agent_loop(prompts, context, config, tx, cancel) — Start an agent run from new prompt messages, applying input filters, emitting lifecycle events, and returning all new messages produced.
  • agent_loop_continue(context, config, tx, cancel) — Resume from existing context (no new prompts); used for retries after errors or mid-conversation continuation.
  • agent_loop_parallel(prompts, base_context, configs, strategy, tx, cancel) -> ParallelLoopResult — Run N AgentLoopConfigs concurrently and evaluate results via EvaluationStrategy. When prompts is non-empty, each branch uses agent_loop; when prompts is empty, each branch uses agent_loop_continue (the user query is already the last message in base_context). base_context is cloned per branch (tools Arc-shared; message history deep-copied). All branches share the same session_id; each gets a distinct loop_id. ParallelLoopOutcome.original_context_len marks the base/branch message boundary. Emits ParallelLoopStart/ParallelLoopEnd events. selected_context feeds into agent_loop_continue() for normal session resumption.
  • derive_config_segment(config) -> String (pub crate) — Derives the stable {config_segment} portion of a loop_id from config.config_id or provider/model/thinking fields.

EvaluationLoop (src/agent_loop/evaluation.rs)

Responsibility: Pluggable strategy for selecting among parallel loop outcomes. Decoupled from src/agent_loop/ to allow custom implementations without a circular dependency (trait is defined in src/types/; implementations live here). Public interface:

  • EvaluationStrategy (trait, defined in src/types/): evaluate(prompts, outcomes, tx, cancel) -> (EvaluationDecision, Usage)
  • EvaluationDecision (enum, defined in src/types/): Select(usize) -- 0-based index of the winning outcome.
  • ParallelLoopOutcome.original_context_len: usize — Number of messages in the cloned context at dispatch time. Allows strategies to split "original context" from "new branch output" messages without separate bookkeeping. Identical across all outcomes (same base context); outcomes[0] is the idiomatic source.
  • TransparentEvaluation — Single-branch pass-through; panics if > 1 outcome.
  • PickFirstEvaluation — Always selects index 0. Useful for testing.
  • TokenEfficientEvaluation — Selects the outcome with the lowest total token usage.
  • ElaborateEvaluation — Selects the outcome with the highest total token usage.
  • LlmJudgeEvaluation { judge_config, system_prompt } — Runs a separate LLM call to select the best branch. Supports both agent_loop mode (query from prompts) and agent_loop_continue mode (query extracted from last Message::User in context.messages[..original_context_len]). Includes prior conversation context in the judge prompt. Applies 2-iteration compaction: Iteration 1 compacts only prior context (3 tiers: tail-truncate → paragraph-summary → hard char limit), keeping outputs intact; Iteration 2 (if needed) compacts both context and outputs independently through the same tier pipeline. Budget derived from judge_config.context_config.max_context_tokens. Emits a ProgressMessage warning if comprehension criteria cannot be satisfied after iteration 2.
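
The two token-based strategies reduce to a minimum and a maximum over total token usage. The sketch below shows that selection with a stand-in outcome type; BranchOutcome and both function names are hypothetical, and how the real strategies break ties is not stated in this section.

```rust
// Stand-in for a parallel branch outcome: only total usage matters here.
struct BranchOutcome {
    total_tokens: u64,
}

// Index of the outcome with the lowest total token usage (None if empty).
// Iterator::min_by_key returns the first minimum on ties.
fn select_token_efficient(outcomes: &[BranchOutcome]) -> Option<usize> {
    outcomes
        .iter()
        .enumerate()
        .min_by_key(|(_, o)| o.total_tokens)
        .map(|(i, _)| i)
}

// Index of the outcome with the highest total token usage (None if empty).
// Iterator::max_by_key returns the last maximum on ties.
fn select_elaborate(outcomes: &[BranchOutcome]) -> Option<usize> {
    outcomes
        .iter()
        .enumerate()
        .max_by_key(|(_, o)| o.total_tokens)
        .map(|(i, _)| i)
}
```

The returned index corresponds to the 0-based value carried by EvaluationDecision::Select.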

ContextManager (src/context/)

Responsibility: Token estimation, tiered context compaction, and execution limit tracking. Public interface:

  • estimate_tokens(text) -- Rough token count heuristic: ~4 characters per token.
  • compact_messages(messages, config) -- Reduce message list to fit token budget using a tiered strategy: truncate tool outputs → summarize old turns → drop middle messages.
  • CompactionStrategy (trait) -- Interface for custom in-memory compaction logic; the default implementation (DefaultCompaction) delegates to the legacy compact_messages() cascade. The modern counterpart, BlockCompactionStrategy, produces CompactionBlock overlays instead.
  • ContextTracker -- Tracks context window usage by combining provider-reported token counts with local estimates for recent messages.
  • ExecutionTracker -- Tracks turns, cumulative tokens, accumulated cost, and elapsed time against configured limits; signals when any limit is exceeded.
  • ContextConfig -- Tuning knobs for compaction: token budget, system-prompt overhead, head/tail message preservation counts, per-tool-output line limit.
  • ExecutionLimits -- Hard caps on agent execution: max turns, max total tokens, max wall-clock duration, and an optional max cost.

ProviderRegistry (src/provider/registry.rs, src/provider/mod.rs)

Responsibility: Dispatches StreamConfig to the correct provider implementation based on model_config.api (an ApiProtocol). The registry is built inline on each agent_loop() call, with all built-in providers pre-registered and zero allocation. Public interface:

  • ProviderRegistry::default() — Pre-registers all 7 built-in providers; used automatically by agent_loop() when AgentLoopConfig.provider_override is None.
  • ProviderRegistry::new() — Create an empty registry for custom provider sets.
  • Provider resolution: model_config.api selects the wire-protocol handler; model_config fields (id, api_key, base_url, compat, etc.) differentiate services within the same protocol.

StreamProvider implementations (src/provider/)

Responsibility: Translate the unified StreamConfig into provider-specific HTTP requests and parse streaming responses back into StreamEvents. Providers: AnthropicProvider, OpenAiCompatProvider (15+ backends), OpenAiResponsesProvider, AzureOpenAiProvider, GoogleProvider, GoogleVertexProvider, BedrockProvider, MockProvider. Public interface:

  • StreamProvider::stream(config, tx, cancel) -> Result<Message, ProviderError> — stream a single LLM response.
  • StreamProvider::provider_id() -> &str — stable lowercase identifier for this provider (e.g. "anthropic", "openai", "google", "bedrock"). Used as the first segment of the auto-derived config_id in loop_id construction.

ToolSystem (src/tools/)

Responsibility: Built-in tool implementations. Each implements AgentTool. Tools: BashTool (shell execution), ReadFileTool (text + image files), WriteFileTool (create/overwrite), EditFileTool (surgical search/replace), ListFilesTool (directory listing), SearchTool (grep/ripgrep). Public interface:

  • default_tools() — Returns the standard built-in toolset: bash, read-file, write-file, edit-file, list-files, search.
  • AgentTool::name() — Unique tool identifier used in LLM tool-use calls and event correlation.
  • AgentTool::label() — Human-readable display name for UI.
  • AgentTool::description() — Free-text description sent to the LLM to explain when to use the tool.
  • AgentTool::parameters_schema() — JSON Schema object describing the tool's accepted parameters.
  • AgentTool::execute(params, ctx) — Run the tool with resolved parameters and a context carrying the cancellation token and progress callbacks.

SubAgentTool (src/agents/sub_agent.rs)

Responsibility: Implements AgentTool to delegate tasks to a child agent_loop() with isolated context, its own toolset, and a turn limit. The child gets its own agent_id, session_id, and loop_id; its parent_loop_id is linked back to the calling loop via with_parent_loop_id. Public interface:

  • SubAgentTool::new(name, model_config).with_*(...) — Construct a sub-agent tool with its own ModelConfig (provider identity), system prompt, toolset, and turn limit, then register it as an AgentTool.
  • SubAgentTool::with_provider_override(provider) — Bypass ProviderRegistry dispatch; used in tests to inject MockProvider.
  • SubAgentTool::with_parent_loop_id(loop_id) — Supply the parent loop's loop_id so the child AgentStart event carries parent_loop_id, enabling ancestry tracing across the event stream.

SkillSystem (src/context/skills.rs)

Responsibility: Loads SKILL.md files from one or more directories, parses YAML frontmatter, and formats them as an XML index injected into the system prompt. Public interface:

  • SkillSet::load(dirs) — Load skills from multiple directories; later entries override earlier ones on name conflict.
  • SkillSet::load_dir(dir, source) — Load skills from a single directory, tagging each with a source label.
  • SkillSet::merge(other) — Merge another SkillSet in; the other's skills override on name conflict.
  • SkillSet::format_for_prompt() — Render the skill list as an <available_skills> XML block ready for system-prompt injection.
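
The override-on-conflict rule and the prompt rendering can be sketched as below. This is a simplified stand-in, not the crate's code: the `Skill` fields are reduced, `file_path` is a `String` rather than a `PathBuf`, and the child element names inside `<available_skills>` (`skill`, `name`, `description`, `location`) as well as the `xml_escape` helper are assumptions made for illustration.

```rust
/// Simplified stand-in for the crate's Skill (only the fields used here).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Skill {
    name: String,
    description: String,
    file_path: String, // PathBuf in the data model; String keeps the sketch short
}

#[derive(Debug, Default)]
struct SkillSet {
    skills: Vec<Skill>,
}

/// Hypothetical helper: escape characters that would break XML text content.
fn xml_escape(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

impl SkillSet {
    /// Merge `other` in; on a name conflict the incoming skill wins.
    fn merge(&mut self, other: SkillSet) {
        for skill in other.skills {
            match self.skills.iter_mut().find(|s| s.name == skill.name) {
                Some(existing) => *existing = skill,
                None => self.skills.push(skill),
            }
        }
    }

    /// Render an <available_skills> block. Child element names are assumed.
    fn format_for_prompt(&self) -> String {
        let mut out = String::from("<available_skills>\n");
        for s in &self.skills {
            out.push_str(&format!(
                "  <skill>\n    <name>{}</name>\n    <description>{}</description>\n    <location>{}</location>\n  </skill>\n",
                xml_escape(&s.name),
                xml_escape(&s.description),
                xml_escape(&s.file_path)
            ));
        }
        out.push_str("</available_skills>");
        out
    }
}
```

Replacing in place keeps the position of the earlier entry, so the rendered index order stays stable when a later directory overrides a skill.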

McpClient (src/mcp/)

Responsibility: MCP client that connects to external tool servers over stdio or HTTP. Adapts discovered tools into AgentTool instances. Public interface:

  • McpClient::connect_stdio(cmd, args, env) — Spawn a child process, complete the JSON-RPC initialize handshake, and return a connected client.
  • McpClient::connect_http(url) — Connect to an HTTP-based MCP server and complete the initialize handshake.
  • McpToolAdapter::from_client(client) — Query the server for available tools and return one AgentTool adapter per tool.

OpenApiAdapter (src/openapi/, feature-gated)

Responsibility: Parses OpenAPI 3.x specs and generates one AgentTool per operation. Each tool makes an HTTP request to the spec's base URL. Public interface:

  • from_file(path, config, filter) — Parse an OpenAPI spec from a local file and return one tool adapter per matching operation.
  • from_url(url, config, filter) — Fetch an OpenAPI spec over HTTP and return one tool adapter per matching operation.
  • from_str(spec, config, filter) — Parse an OpenAPI spec from an in-memory string (auto-detects JSON vs YAML) and return one tool adapter per matching operation.

Availability: Only compiled when the openapi feature flag is enabled.

SessionStore (src/session/)

Responsibility: Persistent session layer. Records every AgentEvent into a structured tree of Session + LoopRecord objects, and provides both free-function and trait-based APIs for flat JSON-file persistence. Public interface:

  • SessionRecorder::new(config) — Create a recorder; call on_event(event) for every event on the agent's tx channel.
  • SessionRecorder::flush() — Finalize all open loops (status → Aborted) and move them into their sessions.
  • SessionRecorder::drain_completed() — Consume and return all completed sessions.
  • SessionRecorder::sessions() — Iterate all known sessions (completed + in-progress).
  • SessionRecorder::get_session(id) — Look up a session by session_id.
  • SessionRecorder::current_loop(id) — Look up an in-progress LoopRecord by loop_id.
  • save_session(session, dir) — Write {dir}/{session_id}.json (creates dir if needed). Atomic via tmp-file + rename.
  • load_session(session_id, dir) — Read {dir}/{session_id}.json.
  • list_session_ids(dir) — List all session ids in dir, newest first.
  • load_sessions_for_agent(agent_id, dir) — Load all sessions matching agent_id.
  • delete_session(session_id, dir) — Remove {dir}/{session_id}.json.
  • SessionStore trait — async save / load / list_ids / delete / list_for_agent for callers that want a pluggable store (custom backends, mocks). (Added 0.7.0)
  • FileSystemSessionStore::new(dir) — In-tree async impl of SessionStore. Adds an advisory fs2 exclusive lock on save (returns SessionError::Locked if a concurrent writer holds it). (Added 0.7.0)

File format: Pretty-printed JSON. Flat directory: one file per session, no index. Writes are atomic (tmp + rename) regardless of API surface used.
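
The tmp-file + rename pattern behind the atomic write can be sketched with the standard library alone. This is an illustration under assumptions: `save_session_atomic` is a hypothetical name, the `.json.tmp` suffix is invented, the JSON is passed in as a ready-made string, and the sketch is synchronous where the crate's store is async.

```rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};

/// Write `{dir}/{session_id}.json` so that readers never observe a partial file.
/// Hypothetical helper; the temporary file name is an assumption.
fn save_session_atomic(dir: &Path, session_id: &str, json: &str) -> std::io::Result<PathBuf> {
    // Create the target directory if it does not exist yet.
    fs::create_dir_all(dir)?;
    let final_path = dir.join(format!("{session_id}.json"));
    let tmp_path = dir.join(format!("{session_id}.json.tmp"));
    {
        // Write the full payload to the temporary file and flush it to disk.
        let mut file = fs::File::create(&tmp_path)?;
        file.write_all(json.as_bytes())?;
        file.sync_all()?;
    }
    // Rename within the same directory replaces the old file in one step.
    fs::rename(&tmp_path, &final_path)?;
    Ok(final_path)
}
```

Keeping the temporary file in the same directory as the target matters: a rename across file systems is not atomic.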

RetryEngine (src/provider/retry.rs)

Responsibility: Computes exponential-backoff delay with ±20% jitter. Classifies which errors are retryable. Public interface:

  • RetryConfig — Parameters for automatic retry: initial delay, backoff multiplier, max delay, max attempt count.
  • RetryConfig::delay_for_attempt(attempt) — Compute the sleep duration before attempt N using exponential backoff with ±20% jitter.
  • is_retryable() (on ProviderError) — Returns true only for RateLimited and Network variants; all other errors fail immediately.
  • retry_after() (on ProviderError) — Extracts the server-specified retry delay from a RateLimited { retry_after_ms: Some(...) } error, if present.
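
The classification rules above can be sketched as follows. Only `RateLimited { retry_after_ms }` and `Network` come from the text; the `Auth` and `Other` variants, the payload types, and the `Option<Duration>` return type of `retry_after` are assumptions standing in for the real `ProviderError`.

```rust
use std::time::Duration;

/// Simplified stand-in for ProviderError. `Auth` and `Other` are hypothetical
/// placeholders for "all other errors".
#[allow(dead_code)]
#[derive(Debug)]
enum ProviderError {
    RateLimited { retry_after_ms: Option<u64> },
    Network(String),
    Auth(String),
    Other(String),
}

impl ProviderError {
    /// Only rate limits and network failures are worth retrying.
    fn is_retryable(&self) -> bool {
        matches!(
            self,
            ProviderError::RateLimited { .. } | ProviderError::Network(_)
        )
    }

    /// Server-specified delay, if the provider sent one.
    fn retry_after(&self) -> Option<Duration> {
        match self {
            ProviderError::RateLimited { retry_after_ms: Some(ms) } => {
                Some(Duration::from_millis(*ms))
            }
            _ => None,
        }
    }
}
```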

2. Dependency Graph

graph TD
    App["Application Code"] --> Agent
    Agent --> AgentLoop["AgentLoop\nagent_loop/"]
    AgentLoop --> ContextManager["ContextManager\ncontext/"]
    AgentLoop --> ProviderRegistry["Provider\ntraits.rs / registry.rs"]
    AgentLoop --> ToolSystem["ToolSystem\ntools/"]
    AgentLoop --> RetryEngine["RetryEngine\nprovider/retry.rs"]
    ProviderRegistry --> Anthropic["AnthropicProvider"]
    ProviderRegistry --> OpenAI["OpenAiCompatProvider\n(15+ backends)"]
    ProviderRegistry --> OpenAIResp["OpenAiResponsesProvider"]
    ProviderRegistry --> Azure["AzureOpenAiProvider"]
    ProviderRegistry --> Google["GoogleProvider"]
    ProviderRegistry --> Vertex["GoogleVertexProvider"]
    ProviderRegistry --> Bedrock["BedrockProvider"]
    ProviderRegistry --> Mock["MockProvider\n(tests)"]
    Agent --> SkillSystem["SkillSystem\ncontext/skills.rs"]
    Agent --> McpClient["McpClient\nmcp/"]
    Agent --> OpenApiAdapter["OpenApiAdapter\nopenapi/ (feature)"]
    McpClient --> ToolSystem
    OpenApiAdapter --> ToolSystem
    SubAgent["SubAgentTool\nsub_agent.rs"] --> AgentLoop
    ToolSystem --> SubAgent
    Types["types/\n(shared types)"] --> Agent
    Types --> AgentLoop
    Types --> ToolSystem
    Types --> ProviderRegistry
    SessionStore["SessionStore\nsession/"] --> Types
    App --> SessionStore

3. Data Flow

3.1 Simple Text Prompt (no tool calls)

sequenceDiagram
    participant App
    participant Agent
    participant AgentLoop
    participant Provider
    participant EventCh as EventChannel

    App->>Agent: prompt("What is 2+2?")
    Agent->>AgentLoop: agent_loop(prompts, context, config, tx, cancel)
    AgentLoop->>EventCh: AgentStart
    AgentLoop->>EventCh: TurnStart
    AgentLoop->>EventCh: MessageStart (user)
    AgentLoop->>EventCh: MessageEnd (user)
    AgentLoop->>Provider: stream(StreamConfig)
    Provider-->>EventCh: StreamEvent::Start
    Provider-->>EventCh: StreamEvent::TextDelta x N
    Provider-->>EventCh: StreamEvent::Done(Message)
    AgentLoop->>EventCh: MessageStart (assistant placeholder)
    AgentLoop->>EventCh: MessageUpdate x N (deltas)
    AgentLoop->>EventCh: MessageEnd (assistant final)
    AgentLoop->>EventCh: TurnEnd
    AgentLoop->>EventCh: AgentEnd(messages)
    App->>EventCh: receives events via rx.recv()

3.2 Tool Call Cycle

sequenceDiagram
    participant AgentLoop
    participant Provider
    participant BashTool
    participant EventCh as EventChannel

    AgentLoop->>Provider: stream(config with tool defs)
    Provider-->>AgentLoop: Done(Message{stop_reason: ToolUse, content: [ToolCall{...}]})
    AgentLoop->>EventCh: TurnEnd(assistant message)
    AgentLoop->>AgentLoop: extract tool_calls from assistant content
    AgentLoop->>EventCh: ToolExecutionStart(id, name, args)
    AgentLoop->>BashTool: execute(params, ToolContext)
    BashTool-->>EventCh: ProgressMessage (via on_progress callback)
    BashTool-->>AgentLoop: Ok(ToolResult)
    AgentLoop->>EventCh: ToolExecutionEnd(id, name, result, is_error=false)
    AgentLoop->>EventCh: MessageStart(ToolResult message)
    AgentLoop->>EventCh: MessageEnd(ToolResult message)
    AgentLoop->>AgentLoop: append tool results to context.messages
    AgentLoop->>Provider: stream(config, now includes tool results)
    Provider-->>AgentLoop: Done(Message{stop_reason: Stop})
    AgentLoop->>EventCh: TurnEnd
    AgentLoop->>EventCh: AgentEnd
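
The cycle in this diagram reduces to a small loop: call the provider, stop unless the stop reason is ToolUse, otherwise run each tool and append its result before calling the provider again. The sketch below is synchronous and event-free, with heavily simplified types; `Msg`, `run_loop`, and the closure-based provider and tool runner are all hypothetical and only illustrate the control flow.

```rust
#[derive(Debug, Clone, PartialEq)]
enum StopReason {
    Stop,
    ToolUse,
}

/// Simplified message type; tool calls are (tool name, raw args) pairs.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    User(String),
    Assistant { text: String, tool_calls: Vec<(String, String)>, stop_reason: StopReason },
    ToolResult { tool_name: String, output: String, is_error: bool },
}

/// Run turns until the model stops asking for tools or `max_turns` is reached.
fn run_loop(
    prompt: &str,
    mut provider: impl FnMut(&[Msg]) -> Msg,
    mut run_tool: impl FnMut(&str, &str) -> Result<String, String>,
    max_turns: usize,
) -> Vec<Msg> {
    let mut messages = vec![Msg::User(prompt.to_string())];
    for _ in 0..max_turns {
        let reply = provider(&messages);
        messages.push(reply.clone());
        // Anything other than an assistant message ends the loop.
        let Msg::Assistant { tool_calls, stop_reason, .. } = reply else { break };
        if stop_reason != StopReason::ToolUse {
            break;
        }
        // Execute every requested tool and feed the results back as messages.
        for (name, args) in tool_calls {
            let (output, is_error) = match run_tool(&name, &args) {
                Ok(out) => (out, false),
                Err(err) => (err, true),
            };
            messages.push(Msg::ToolResult { tool_name: name, output, is_error });
        }
    }
    messages
}
```

A failed tool is reported back to the model as an error result rather than aborting the loop, which mirrors the is_error flag on ToolExecutionEnd.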

3.3 Context Compaction Trigger

sequenceDiagram
    participant AgentLoop
    participant ContextManager
    participant Provider

    AgentLoop->>ContextManager: compact(messages, config)
    ContextManager->>ContextManager: total_tokens(messages) > budget?
    alt Level 1 fits
        ContextManager-->>AgentLoop: truncated tool outputs
    else Level 2 fits
        ContextManager-->>AgentLoop: old turns summarized
    else Level 3
        ContextManager-->>AgentLoop: first + recent kept, middle dropped
    end
    AgentLoop->>Provider: stream(config with compacted messages)
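
Levels 1 and 3 of this flow are simple enough to sketch directly; Level 2 (summarization) is omitted. Both helpers are hypothetical: the truncation marker text and the choice to keep the head of a tool output are assumptions, and the real ContextManager works on message types and token estimates rather than plain strings and slices.

```rust
/// Level 1: keep at most `max_lines` lines of a tool output and note how
/// many were cut. The marker wording is an assumption.
fn truncate_tool_output(text: &str, max_lines: usize) -> String {
    let total = text.lines().count();
    if total <= max_lines {
        return text.to_string();
    }
    let mut out = text.lines().take(max_lines).collect::<Vec<_>>().join("\n");
    out.push_str(&format!("\n[... {} lines truncated ...]", total - max_lines));
    out
}

/// Level 3: keep the first `keep_first` and last `keep_recent` messages,
/// dropping everything in between.
fn drop_middle<T: Clone>(messages: &[T], keep_first: usize, keep_recent: usize) -> Vec<T> {
    if messages.len() <= keep_first + keep_recent {
        return messages.to_vec();
    }
    let mut out = messages[..keep_first].to_vec();
    out.extend_from_slice(&messages[messages.len() - keep_recent..]);
    out
}
```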

3.4 Sub-Agent Delegation

sequenceDiagram
    participant ParentLoop as Parent AgentLoop
    participant SubAgentTool
    participant ChildLoop as Child AgentLoop
    participant ChildProvider as Provider

    ParentLoop->>SubAgentTool: execute({task: "..."}, ToolContext)
    SubAgentTool->>SubAgentTool: build AgentContext with child identity<br/>(new agent_id, session_id, loop_id="{child_session}.sub.1",<br/>parent_loop_id = parent's loop_id)
    SubAgentTool->>ChildLoop: agent_loop([task_prompt], context, config, tx, cancel)
    ChildLoop->>ChildLoop: emit AgentStart{loop_id, parent_loop_id}
    ChildLoop->>ChildProvider: stream(...)
    ChildProvider-->>ChildLoop: streaming events
    ChildLoop-->>SubAgentTool: Vec<AgentMessage> (final messages)
    SubAgentTool->>SubAgentTool: extract_final_text(messages)
    SubAgentTool-->>ParentLoop: Ok(ToolResult{text, child_loop_id: Some(loop_id)})
    Note over ParentLoop: ToolExecutionEnd{child_loop_id} emitted<br/>→ parent stream records child ancestry

4. Data Models

Content

Entity: Content (enum)
  Variant Text:
    text: String               [the text content]
  Variant Image:
    data: String               [base64-encoded binary]
    mime_type: String          [e.g. "image/png", "image/jpeg"]
  Variant Thinking:
    thinking: String           [internal reasoning text]
    signature: Option<String>  [provider-specific thinking signature, optional]
  Variant ToolCall:
    id: String                 [unique call ID, e.g. UUID]
    name: String               [tool name matching AgentTool::name()]
    arguments: JSON            [parameter values matching tool's JSON Schema]

Serialization: tagged by "type" field ("text", "image", "thinking", "toolCall")

Message

Entity: Message (enum)
  Variant User:
    content: Vec<Content>      [usually a single Text block]
    timestamp: u64             [unix milliseconds]
  Variant Assistant:
    content: Vec<Content>      [text, thinking, tool call blocks]
    stop_reason: StopReason    [why the model stopped]
    model: String              [model ID returned by provider]
    provider: String           [provider name, e.g. "anthropic"]
    usage: Usage               [token counts for this turn]
    timestamp: u64             [unix milliseconds]
    error_message: Option<String>  [set when stop_reason == Error]
  Variant ToolResult:
    tool_call_id: String       [matches Content::ToolCall.id]
    tool_name: String          [matches Content::ToolCall.name]
    content: Vec<Content>      [tool output, usually a Text block]
    is_error: bool             [true if tool execution failed]
    timestamp: u64             [unix milliseconds]

Lifecycle: User messages are created by the caller. Assistant messages are
           created by the provider after streaming completes. ToolResult messages
           are created by the agent loop after tool execution.

AgentMessage

Entity: AgentMessage (enum, untagged)
  Variant Llm(LlmMessage)       [sent to the LLM; user/assistant/toolResult roles; LlmMessage wraps Message + Option<TurnId>]
  Variant Extension(ExtensionMessage)  [not sent to LLM; app-only metadata]

Note: stored in Agent.messages and AgentContext.messages
      Extension messages are filtered out before LLM calls

ExtensionMessage

Entity: ExtensionMessage
  role: String        [always "extension"]
  kind: String        [app-defined event type, e.g. "ui_update"]
  data: JSON          [arbitrary app-defined payload]

StopReason

Entity: StopReason (enum)
  Stop      -> model completed naturally
  Length    -> max_tokens limit hit
  ToolUse   -> model returned tool calls (loop must continue)
  Error     -> provider or streaming error occurred
  Aborted   -> cancellation token was triggered

Serialization: camelCase ("stop", "length", "toolUse", "error", "aborted")

Usage

Entity: Usage
  input: u64          [prompt tokens processed]
  output: u64         [completion tokens generated]
  cache_read: u64     [tokens served from prompt cache]
  cache_write: u64    [tokens written to prompt cache]
  total_tokens: u64   [sum, may be 0 if not reported]

Derived: cache_hit_rate() = cache_read / (input + cache_read + cache_write)
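
As a worked instance of the derived formula, a minimal sketch follows. The `f64` return type and the choice to return 0.0 when the denominator is zero are assumptions; the field names match the entity above.

```rust
#[allow(dead_code)]
#[derive(Debug, Default, Clone, Copy)]
struct Usage {
    input: u64,
    output: u64,
    cache_read: u64,
    cache_write: u64,
    total_tokens: u64,
}

impl Usage {
    /// cache_read / (input + cache_read + cache_write); 0.0 when nothing was processed.
    fn cache_hit_rate(&self) -> f64 {
        let denominator = self.input + self.cache_read + self.cache_write;
        if denominator == 0 {
            0.0
        } else {
            self.cache_read as f64 / denominator as f64
        }
    }
}
```

With input = 100, cache_read = 300, and cache_write = 100, the rate is 300 / 500 = 0.6.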

AgentEvent

Entity: AgentEvent (enum, #[serde(tag = "type")])

Every variant except ParallelLoopStart and ParallelLoopEnd carries loop_id: String
(AgentStart has it among its identity fields, as listed below), so that events from
concurrent parallel branches can be reliably attributed to the correct LoopRecord even
when they are interleaved on one tx channel.

  AgentStart {
    agent_id:          String                    [stable agent instance identifier]
    session_id:        String                    [groups all loops in one session]
    loop_id:           String                    ["{session_id}.{config_id}.{N}" — unique per call]
    parent_loop_id:    Option<String>            [None for origin calls; Some for continuations/sub-agents]
    continuation_kind: Option<ContinuationKind>  [None=origin; Some(Default/Rerun/Branch)=continuation]
    timestamp:         DateTime<Utc>
    metadata:          Option<JSON>
  }
  AgentEnd {
    loop_id:  String                         [← identifies the loop]
    messages: Vec<AgentMessage>              [all new messages produced by this loop]
    usage:    Usage
    timestamp: DateTime<Utc>
    rejection: Option<String>               [Some if input filter blocked the run]
  }
  TurnStart {
    loop_id:      String
    turn_index:   u32
    timestamp:    DateTime<Utc>
    triggered_by: TurnTrigger               [what caused this turn to begin]
  }
  TurnEnd {
    loop_id:      String
    message:      AgentMessage
    usage:        Usage
    timestamp:    DateTime<Utc>
    tool_results: Vec<Message>
  }
  MessageStart  { loop_id: String, message }            [message streaming began]
  MessageUpdate { loop_id: String, message, delta }     [content delta arrived]
  MessageEnd    { loop_id: String, message }            [message complete]
  ToolExecutionStart  { loop_id: String, tool_call_id, tool_name, args }
  ToolExecutionUpdate { loop_id: String, tool_call_id, tool_name, partial_result }
  ToolExecutionEnd {
    loop_id:       String
    tool_call_id:  String
    tool_name:     String
    result:        ToolResult
    is_error:      bool
    child_loop_id: Option<String>           [Some only when tool spawned a sub-agent loop]
  }
  ProgressMessage { loop_id: String, tool_call_id, tool_name, text }
  InputRejected   { loop_id: String, reason }           [input filter blocked the prompt]
  ParallelLoopStart {                                   [loop_id NOT on this variant]
    session_id: String
    loop_ids:   Vec<String>                 [one loop_id per branch, in config order]
    timestamp:  DateTime<Utc>
  }
  ParallelLoopEnd {                                     [loop_id NOT on this variant]
    session_id:             String
    selected_loop_id:       String
    selected_config_index:  usize
    evaluation_usage:       Usage
    timestamp:              DateTime<Utc>
  }
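
Attribution by loop_id can be sketched as a grouping step over an interleaved stream. The `Event` enum below is a three-variant stand-in for AgentEvent, and `make_loop_id` and `group_by_loop` are hypothetical helpers; the single-segment config_id values used in the usage note are purely illustrative.

```rust
use std::collections::BTreeMap;

/// Reduced stand-in for AgentEvent with only the fields needed here.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Event {
    TurnStart { loop_id: String, turn_index: u32 },
    ToolExecutionStart { loop_id: String, tool_name: String },
    ParallelLoopStart { session_id: String, loop_ids: Vec<String> },
}

impl Event {
    /// The loop this event belongs to; None for session-level variants.
    fn loop_id(&self) -> Option<&str> {
        match self {
            Event::TurnStart { loop_id, .. }
            | Event::ToolExecutionStart { loop_id, .. } => Some(loop_id.as_str()),
            Event::ParallelLoopStart { .. } => None,
        }
    }
}

/// Build an id in the documented "{session_id}.{config_id}.{N}" shape.
fn make_loop_id(session_id: &str, config_id: &str, n: u32) -> String {
    format!("{session_id}.{config_id}.{n}")
}

/// Split an interleaved stream into per-loop event lists.
fn group_by_loop(events: &[Event]) -> BTreeMap<String, Vec<Event>> {
    let mut groups: BTreeMap<String, Vec<Event>> = BTreeMap::new();
    for event in events {
        if let Some(id) = event.loop_id() {
            groups.entry(id.to_string()).or_default().push(event.clone());
        }
    }
    groups
}
```

For example, events from loops "s1.anthropic.1" and "s1.openai.1" arriving in any order on one channel end up in two separate lists, each preserving its own arrival order.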

StreamDelta

Entity: StreamDelta (enum)
  Text { delta: String }              [text content chunk]
  Thinking { delta: String }          [thinking content chunk]
  ToolCallDelta { delta: String }     [tool call argument chunk]

ToolContext

Entity: ToolContext
  tool_call_id: String               [for correlation with AgentEvent]
  tool_name: String                  [for correlation with AgentEvent]
  cancel: CancellationToken          [check is_cancelled() in long-running tools]
  on_update: Option<ToolUpdateFn>    [callback for streaming partial ToolResults]
  on_progress: Option<ProgressFn>    [callback for user-facing status text]

ContinuationKind

Entity: ContinuationKind (enum)
  Default                 [unspecified continuation — preserves legacy semantics]
  Rerun { tag: String }   [retry from an equivalent context; tag is RFC 3339 UTC timestamp]
  Branch { tag: String }  [explore a different path from a branching point; tag is RFC 3339 UTC timestamp]

Set on AgentContext.continuation_kind before calling agent_loop_continue().
Surfaced in AgentStart.continuation_kind (None = origin call).
TurnTrigger semantics:
  Default / Rerun → first turn uses TurnTrigger::Continuation
  Branch          → first turn uses TurnTrigger::Branch

TurnTrigger

Entity: TurnTrigger (enum)
  User      [first turn of an agent_loop() origin call with new user prompts]
  SubAgent  [first turn when running as a sub-agent via SubAgentTool]
  Continuation  [subsequent turns; tool round-trip, steering, or Default/Rerun continuation]
  Branch    [first turn of an agent_loop_continue(Branch) call]

Emitted in TurnStart.triggered_by.
Priority on first turn (run_loop):
  1. Branch continuation     → TurnTrigger::Branch
  2. Any other continuation  → TurnTrigger::Continuation
  3. Origin call             → config.first_turn_trigger (User or SubAgent)
Subsequent turns always use TurnTrigger::Continuation.
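
The first-turn priority order maps onto a single match. The enums mirror the entities above; `first_turn_trigger` is a hypothetical free function, and its second parameter stands in for config.first_turn_trigger.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ContinuationKind {
    Default,
    Rerun { tag: String },
    Branch { tag: String },
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum TurnTrigger {
    User,
    SubAgent,
    Continuation,
    Branch,
}

/// Pick the trigger for the first turn of a loop.
/// `origin_trigger` plays the role of config.first_turn_trigger.
fn first_turn_trigger(
    continuation: Option<&ContinuationKind>,
    origin_trigger: TurnTrigger,
) -> TurnTrigger {
    match continuation {
        // 1. Branch continuation wins.
        Some(ContinuationKind::Branch { .. }) => TurnTrigger::Branch,
        // 2. Default and Rerun both map to Continuation.
        Some(_) => TurnTrigger::Continuation,
        // 3. Origin call: use the configured trigger (User or SubAgent).
        None => origin_trigger,
    }
}
```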

ToolResult / ToolError

Entity: ToolResult
  content:       Vec<Content>    [tool output content blocks]
  details:       JSON            [structured metadata, not sent to LLM, e.g. exit_code]
  child_loop_id: Option<String>  [set by sub-agent tools; None for all other tools]

Entity: ToolError (enum)
  Failed(String)          [general execution failure]
  NotFound(String)        [tool name not in registry]
  InvalidArgs(String)     [parameter validation failed]
  Cancelled               [CancellationToken was triggered]

ContextConfig

Entity: ContextConfig
  max_context_tokens: usize     [default: 100,000; total budget including system prompt]
  system_prompt_tokens: usize   [default: 4,000; reserved for system prompt]
  keep_recent: usize            [default: 10; messages always kept in full at tail]
  keep_first: usize             [default: 2; messages always kept at head]
  tool_output_max_lines: usize  [default: 50; L1 compaction per-tool-output limit]

Effective budget = max_context_tokens - system_prompt_tokens
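
With the documented defaults the effective budget works out to 100,000 - 4,000 = 96,000 tokens. A minimal sketch, assuming a `Default` impl with those values and a hypothetical `effective_budget` method that saturates at zero instead of underflowing:

```rust
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct ContextConfig {
    max_context_tokens: usize,
    system_prompt_tokens: usize,
    keep_recent: usize,
    keep_first: usize,
    tool_output_max_lines: usize,
}

impl Default for ContextConfig {
    fn default() -> Self {
        Self {
            max_context_tokens: 100_000,
            system_prompt_tokens: 4_000,
            keep_recent: 10,
            keep_first: 2,
            tool_output_max_lines: 50,
        }
    }
}

impl ContextConfig {
    /// Tokens available for messages once the system prompt is reserved.
    /// Saturating subtraction is an assumption made to avoid underflow.
    fn effective_budget(&self) -> usize {
        self.max_context_tokens.saturating_sub(self.system_prompt_tokens)
    }
}
```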

ExecutionLimits / ExecutionTracker

Entity: ExecutionLimits
  max_turns: usize              [default: 50; LLM calls before forced stop]
  max_total_tokens: usize       [default: 1,000,000; cumulative token budget]
  max_duration: Duration        [default: 600s; wall-clock time limit]

Entity: ExecutionTracker (runtime state)
  limits: ExecutionLimits       [immutable config]
  turns: usize                  [incremented after each LLM call]
  tokens_used: usize            [cumulative; updated from provider Usage]
  started_at: Instant           [set on construction]
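
A sketch of how a tracker of this shape might enforce the three limits. The `record_turn` and `check` methods, the `LimitHit` enum, the order of the checks, and the use of `>=` comparisons are all assumptions; only the field names and defaults come from the entities above.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone)]
struct ExecutionLimits {
    max_turns: usize,
    max_total_tokens: usize,
    max_duration: Duration,
}

impl Default for ExecutionLimits {
    fn default() -> Self {
        Self {
            max_turns: 50,
            max_total_tokens: 1_000_000,
            max_duration: Duration::from_secs(600),
        }
    }
}

/// Hypothetical result type naming which limit was reached.
#[derive(Debug, PartialEq)]
enum LimitHit {
    MaxTurns,
    MaxTokens,
    MaxDuration,
}

struct ExecutionTracker {
    limits: ExecutionLimits,
    turns: usize,
    tokens_used: usize,
    started_at: Instant,
}

impl ExecutionTracker {
    fn new(limits: ExecutionLimits) -> Self {
        Self { limits, turns: 0, tokens_used: 0, started_at: Instant::now() }
    }

    /// Call after each LLM call with the tokens reported by the provider.
    fn record_turn(&mut self, tokens: usize) {
        self.turns += 1;
        self.tokens_used += tokens;
    }

    /// Report the first limit that has been reached, if any.
    fn check(&self) -> Option<LimitHit> {
        if self.turns >= self.limits.max_turns {
            Some(LimitHit::MaxTurns)
        } else if self.tokens_used >= self.limits.max_total_tokens {
            Some(LimitHit::MaxTokens)
        } else if self.started_at.elapsed() >= self.limits.max_duration {
            Some(LimitHit::MaxDuration)
        } else {
            None
        }
    }
}
```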

RetryConfig

Entity: RetryConfig
  max_retries: usize            [default: 3; 0 = no retries]
  initial_delay_ms: u64         [default: 1,000ms]
  backoff_multiplier: f64       [default: 2.0; exponential growth factor]
  max_delay_ms: u64             [default: 30,000ms; ceiling before jitter]
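
The backoff arithmetic can be sketched with these defaults. Several details are assumptions: attempts are treated as 1-based, the cap is applied before jitter (following the "ceiling before jitter" note), and jitter is passed in as a value in [-1.0, 1.0] so the sketch stays deterministic. The real delay_for_attempt(attempt) presumably draws that value from a random source.

```rust
use std::time::Duration;

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct RetryConfig {
    max_retries: usize,
    initial_delay_ms: u64,
    backoff_multiplier: f64,
    max_delay_ms: u64,
}

impl Default for RetryConfig {
    fn default() -> Self {
        Self {
            max_retries: 3,
            initial_delay_ms: 1_000,
            backoff_multiplier: 2.0,
            max_delay_ms: 30_000,
        }
    }
}

impl RetryConfig {
    /// Hypothetical deterministic variant of delay_for_attempt.
    /// `attempt` is 1-based; `jitter` in [-1.0, 1.0] scales the delay by up to 20%.
    fn delay_with_jitter(&self, attempt: u32, jitter: f64) -> Duration {
        // Exponent grows by one per attempt; clamp it to keep the math finite.
        let exponent = attempt.saturating_sub(1).min(63) as i32;
        let base = self.initial_delay_ms as f64 * self.backoff_multiplier.powi(exponent);
        // Apply the ceiling first, then spread the result by +/-20%.
        let capped = base.min(self.max_delay_ms as f64);
        let factor = 1.0 + 0.2 * jitter.clamp(-1.0, 1.0);
        Duration::from_millis((capped * factor).round() as u64)
    }
}
```

With the defaults and no jitter, attempts 1 through 6 wait 1 s, 2 s, 4 s, 8 s, 16 s, and 30 s (32 s capped at the ceiling).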

CacheConfig / CacheStrategy

Entity: CacheConfig
  enabled: bool                 [master switch; default: true]
  strategy: CacheStrategy

Entity: CacheStrategy (enum)
  Auto                          [provider places breakpoints automatically]
  Disabled                      [no caching hints sent]
  Manual {
    cache_system: bool          [cache system prompt]
    cache_tools: bool           [cache tool definitions]
    cache_messages: bool        [cache second-to-last message]
  }

StreamConfig (sent to provider)

Entity: StreamConfig
  model_config: ModelConfig     [REQUIRED — full provider identity: id, api_key, base_url, compat, cost]
  system_prompt: String
  messages: Vec<Message>        [LLM-only messages, Extension filtered out]
  tools: Vec<ToolDefinition>    [schema-only; no execute functions]
  thinking_level: ThinkingLevel
  max_tokens: Option<u32>       [overrides model_config.max_tokens when Some]
  temperature: Option<f32>
  cache_config: CacheConfig

Note: model identity (id, api_key, base_url, headers, compat) is accessed via
      model_config.id, model_config.api_key, etc. No top-level model or api_key fields.

ToolDefinition (sent to LLM)

Entity: ToolDefinition
  name: String              [matches AgentTool::name()]
  description: String       [matches AgentTool::description()]
  parameters: JSON          [JSON Schema object matching AgentTool::parameters_schema()]

Skill / SkillSet

Entity: Skill
  name: String              [from YAML frontmatter; skill identifier]
  description: String       [from YAML frontmatter; one-line capability summary]
  file_path: PathBuf        [absolute path to the SKILL.md file]
  base_dir: PathBuf         [absolute path to the skill's directory]
  source: String            [origin label: "dir:0", "dir:1", etc.]

Entity: SkillSet
  skills: Vec<Skill>

Lifecycle: Loaded from disk at startup via SkillSet::load(dirs).
           Formatted as XML via format_for_prompt() and appended to system prompt.
           Agent reads full SKILL.md on-demand when activating a skill via read_file tool.

QueueMode

Entity: QueueMode (enum) — controls steering/follow-up queue delivery

  OneAtATime   pop and return exactly one message per call (default)
  All          drain and return all queued messages at once

Used in: Agent.steering_mode, Agent.follow_up_mode
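
The two delivery modes differ only in how much of the queue one call consumes. A minimal sketch, assuming a `VecDeque` as the backing queue and a hypothetical `take_queued` helper:

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq)]
enum QueueMode {
    OneAtATime,
    All,
}

/// Remove and return queued messages according to `mode`.
fn take_queued<T>(queue: &mut VecDeque<T>, mode: QueueMode) -> Vec<T> {
    match mode {
        // Pop exactly one message (or none if the queue is empty).
        QueueMode::OneAtATime => queue.pop_front().into_iter().collect(),
        // Drain everything in FIFO order.
        QueueMode::All => queue.drain(..).collect(),
    }
}
```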

McpToolInfo / McpContent

Entity: McpToolInfo — tool metadata returned by MCP server
  name: String                  [tool identifier used in tools/call]
  description: Option<String>   [human-readable description; default empty string]
  inputSchema: JSON             [JSON Schema for the tool's parameters]

Entity: McpContent (enum) — content item in a tool call result
  Variant Text:
    type: "text"
    text: String
  Variant Image:
    type: "image"
    data: String    [base64-encoded]
    mimeType: String

Entity: McpToolCallResult
  content: Vec<McpContent>  [output from the tool]
  isError: bool             [true if the tool reported an error]

OpenApiConfig / OpenApiAuth / OperationFilter

Entity: OpenApiConfig — configuration for OpenAPI tool generation
  base_url: Option<String>          [overrides spec servers[0].url; trailing slash stripped]
  auth: OpenApiAuth                 [authentication method]
  custom_headers: Map<String,String> [extra headers added to every request]
  max_response_bytes: usize         [default: 65536 (64KB); response body truncation limit]
  timeout_secs: u64                 [default: 30; per-request timeout]
  name_prefix: Option<String>       [if set, tool names formatted as "{prefix}__{operationId}"]

Entity: OpenApiAuth (enum)
  None                              [no authentication]
  Bearer(token: String)             [Authorization: Bearer {token}]
  ApiKey { header: String, value: String }  [custom header: {header}: {value}]

Note: Bearer token and ApiKey value are redacted as "****" in debug output.
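
Redaction of this kind is typically done with a hand-written Debug impl instead of the derived one. The sketch below shows one way to do it; the exact rendering (for example `Bearer("****")`) is an assumption, not the crate's guaranteed output.

```rust
use std::fmt;

#[allow(dead_code)]
enum OpenApiAuth {
    None,
    Bearer(String),
    ApiKey { header: String, value: String },
}

impl fmt::Debug for OpenApiAuth {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OpenApiAuth::None => f.write_str("None"),
            // Never print the token itself.
            OpenApiAuth::Bearer(_) => f.debug_tuple("Bearer").field(&"****").finish(),
            // The header name is not secret; the value is.
            OpenApiAuth::ApiKey { header, .. } => f
                .debug_struct("ApiKey")
                .field("header", header)
                .field("value", &"****")
                .finish(),
        }
    }
}
```

Because the secret never reaches the formatter, it cannot leak through `{:?}` in logs or panic messages.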

Entity: OperationFilter (enum) — controls which API operations become tools
  All                               [include all operations that have an operationId]
  ByOperationId(Vec<String>)        [include only operations whose id is in the list]
  ByTag(Vec<String>)                [include operations tagged with any listed tag]
  ByPathPrefix(String)              [include operations whose path starts with the prefix]
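
The filter semantics can be sketched as a predicate over a parsed operation. The `Operation` struct and the `matches` method are hypothetical; the rule that operations without an operationId are skipped under every filter follows the Integration Contracts section.

```rust
/// Hypothetical, reduced view of one parsed OpenAPI operation.
struct Operation {
    operation_id: Option<String>,
    tags: Vec<String>,
    path: String,
}

#[allow(dead_code)]
enum OperationFilter {
    All,
    ByOperationId(Vec<String>),
    ByTag(Vec<String>),
    ByPathPrefix(String),
}

impl OperationFilter {
    /// True if `op` should become a tool under this filter.
    fn matches(&self, op: &Operation) -> bool {
        // Operations without an operationId are never turned into tools.
        let Some(id) = op.operation_id.as_deref() else {
            return false;
        };
        match self {
            OperationFilter::All => true,
            OperationFilter::ByOperationId(ids) => ids.iter().any(|i| i.as_str() == id),
            OperationFilter::ByTag(tags) => op.tags.iter().any(|t| tags.contains(t)),
            OperationFilter::ByPathPrefix(prefix) => op.path.starts_with(prefix.as_str()),
        }
    }
}
```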

Session / LoopRecord / SessionRecorder

Entity: Session
  session_id:      String
  agent_id:        String
  created_at:      DateTime<Utc>
  last_active_at:  DateTime<Utc>
  formation:       SessionFormation  [Explicit | FirstLoop | InactivityTimeout{..}]
  parent_spawn_ref: Option<SpawnRef> [set when this session was a sub-agent spawn]
  loops:           Vec<LoopRecord>   [ordered by started_at]

Methods: root_loops(), children_of(loop_id), parallel_siblings(loop_id),
         get_loop(loop_id), total_usage()

Entity: LoopRecord
  loop_id:             String
  session_id:          String
  agent_id:            String
  parent_loop_id:      Option<String>
  continuation_kind:   Option<ContinuationKind>
  started_at:          DateTime<Utc>
  ended_at:            Option<DateTime<Utc>>
  status:              LoopStatus          [Pending | Running | Completed | Rejected | Aborted]
  rejection:           Option<String>
  config:              Option<LoopConfigSnapshot>  [model, provider, config_id + name, api, base_url, reasoning, context_window, max_tokens, thinking_level, temperature]
  messages:            Vec<AgentMessage>   [from AgentEnd.messages — authoritative]
  usage:               Usage
  metadata:            Option<JSON>
  events:              Vec<LoopEvent>      [full event stream; MessageUpdate opt-in]
  children_loop_ids:   Vec<String>         [same-session direct children]
  child_loop_refs:     Vec<ChildLoopRef>   [cross-session sub-agent spawn links]
  parallel_group:      Option<ParallelGroupRecord>

Entity: ChildLoopRef — outbound cross-session link on the parent LoopRecord
  tool_call_id:    String
  tool_name:       String
  child_loop_id:   String
  child_session_id: String

Entity: SpawnRef — inbound cross-session link on the child Session
  parent_session_id: String
  parent_loop_id:    String
  tool_call_id:      String
  tool_name:         String

Entity: ParallelGroupRecord
  all_loop_ids:         Vec<String>   [all branch loop_ids in config order]
  selected_loop_id:     String
  selected_config_index: usize
  evaluation_usage:     Usage
  is_selected:          bool          [true only on the winner's LoopRecord]

Entity: SessionRecorderConfig
  formation_policy:         SessionFormationPolicy  [PerSessionId | InactivityTimeout{secs}]
  include_streaming_events: bool                    [default: false — excludes MessageUpdate]

5. Integration Contracts

Anthropic Messages API

  • Endpoint: https://api.anthropic.com/v1/messages
  • Auth (standard): x-api-key: {ANTHROPIC_API_KEY} + anthropic-version: 2023-06-01
  • Auth (OAuth): authorization: Bearer {TOKEN} + beta headers claude-code-20250219,oauth-2025-04-20,fine-grained-tool-streaming-2025-05-14; x-app: cli; anthropic-dangerous-direct-browser-access: true; user-agent: claude-cli/2.1.2
  • Request: POST JSON with model, system (array of text blocks), messages, tools, max_tokens (default 8192), stream: true
  • Response: Server-Sent Events stream; events: message_start, content_block_start, content_block_delta, message_delta, message_stop
  • Tool args: Streamed as InputJsonDelta text fragments; buffered in arguments["__partial_json"]; parsed as complete JSON on content_block_stop
  • Thinking: ThinkingLevel mapped to {type:"enabled", budget_tokens: N} — Minimal→128, Low→512, Medium→2048, High→8192
  • Prompt caching: cache_control: {type: "ephemeral"} placed at system/last-tool-def/second-to-last-message per CacheStrategy
  • Content format: {type: "text"|"image"|"thinking"|"tool_use"|"tool_result", ...}
  • Tool results: Role "user", type "tool_result", fields: tool_use_id, content, is_error
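
The thinking-budget mapping above is a plain lookup. In this sketch the `ThinkingLevel` enum lists only the four levels named in the text (the real enum may have more variants, such as an off state), and `anthropic_budget_tokens` is a hypothetical function name.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ThinkingLevel {
    Minimal,
    Low,
    Medium,
    High,
}

/// budget_tokens sent in {type: "enabled", budget_tokens: N}.
fn anthropic_budget_tokens(level: ThinkingLevel) -> u32 {
    match level {
        ThinkingLevel::Minimal => 128,
        ThinkingLevel::Low => 512,
        ThinkingLevel::Medium => 2048,
        ThinkingLevel::High => 8192,
    }
}
```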

OpenAI-Compatible APIs (Chat Completions)

  • Endpoints: https://api.openai.com/v1/chat/completions and 14+ compatible bases (xAI/Grok, Groq, Cerebras, Mistral, DeepSeek, etc.)
  • Auth: Authorization: Bearer {API_KEY}
  • Request: POST JSON with model, messages, tools, stream: true, stream_options: {include_usage: true}
  • max_tokens field name: "max_tokens" (most) or "max_completion_tokens" (OpenAI) — controlled by OpenAiCompat.max_tokens_field
  • System prompt: First message with role "system" or "developer" (OpenAI) — controlled by supports_developer_role
  • Thinking: reasoning_effort: "low"|"medium"|"high" if supports_reasoning_effort; response in delta.reasoning_content (OpenAI) or delta.reasoning (xAI)
  • Response: SSE stream; each chunk has choices[0].delta; tool args in delta.tool_calls[].function.arguments (incremental JSON string)

OpenAI Responses API

  • Endpoint: {base_url}/responses
  • Auth: Authorization: Bearer {OPENAI_API_KEY}
  • System prompt: "instructions" field (not "messages")
  • Message format: Different from Chat Completions — see Bedrock/Responses comparison below
  • Thinking: "reasoning": {effort: "low"|"medium"|"high"} field
  • SSE events: response.output_text.delta, response.reasoning.delta, response.function_call_arguments.start/delta/done, response.completed

Azure OpenAI

  • Endpoint: {base_url}/responses?api-version=2025-01-01-preview (base_url pattern: https://{resource}.openai.azure.com/openai/deployments/{deployment})
  • Auth: api-key: {AZURE_OPENAI_API_KEY} header (not Authorization: Bearer)
  • Request/Response: Same format as OpenAI Responses API

Google Generative AI (Gemini)

  • Endpoint: {base_url}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={API_KEY}
  • Auth: API key as URL query parameter ?key=; no Authorization header
  • System prompt: "systemInstruction": {parts: [{text: "..."}]}
  • Tools: Single object {functionDeclarations: [...]} wrapping all tool definitions
  • Contents: Role "user" or "model"; ToolResults sent as {role:"user", parts:[{functionResponse:{name, response:{result: text}}}]}
  • Tool args: Delivered complete in one event (no streaming deltas); tool IDs auto-generated as "google-fc-{index}"
  • Response parsing: Custom SSE parser (no off-the-shelf SSE library); splits on \n\n, extracts data: line
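
The split-and-extract step can be sketched in a few lines. This is a simplification under assumptions: `extract_sse_data` is a hypothetical name, the whole buffer is available at once (a streaming parser would carry incomplete events over to the next chunk), and line endings are plain \n.

```rust
/// Split a buffer into SSE events on blank lines and return the payload
/// of the first `data:` line of each event.
fn extract_sse_data(buffer: &str) -> Vec<String> {
    buffer
        .split("\n\n")
        .filter_map(|event| {
            event
                .lines()
                .find_map(|line| line.strip_prefix("data:"))
                .map(|payload| payload.trim().to_string())
        })
        .filter(|payload| !payload.is_empty())
        .collect()
}
```

Each returned string would then be parsed as one JSON chunk of the streamGenerateContent response.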

Google Vertex AI

  • Endpoint: https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/google/models/{model}:streamGenerateContent?alt=sse
  • Auth: Authorization: Bearer {OAUTH_TOKEN} (OAuth2, not API key in URL)
  • Request/Response: Identical to Google Generative AI; tool IDs generated as "vertex-fc-{index}"

Amazon Bedrock (ConverseStream)

  • Endpoint: {base_url}/model/{model}/converse-stream (base_url: https://bedrock-runtime.{region}.amazonaws.com)
  • Auth: Authorization: Bearer {token} or custom headers from model_config.headers; minimal SigV4 support
  • System prompt: "system" array: [{text: "..."}]
  • Tools: toolConfig.tools: [{toolSpec: {name, description, inputSchema: {json: schema}}}]
  • Tool results: {toolResult: {toolUseId, content: [...], status: "success"|"error"}}
  • Streaming format: Newline-delimited JSON (not standard SSE); events: contentBlockDelta, contentBlockStart, contentBlockStop, messageStop, metadata

Model Context Protocol (MCP)

  • Protocol: JSON-RPC 2.0

  • Message types:

    • Request: {jsonrpc:"2.0", id:u64, method:String, params:Option<Value>}
    • Response: {jsonrpc:"2.0", id:Option<u64>, result:Option<Value>, error:Option<{code:i64,message:String,data?}>}
    • Request IDs: auto-incremented AtomicU64 starting at 1
  • Initialization handshake (3 steps):

    1. Client sends initialize with {protocolVersion:"2024-11-05", capabilities:{}, clientInfo:{name:"phi-core",version:"<pkg>"}}
    2. Server responds with {protocolVersion, capabilities:{tools?,resources?,prompts?}, serverInfo:{name,version}}
    3. Client sends notifications/initialized notification (no params; server may ignore id)
  • Tool discovery: Client sends tools/list → server returns {tools: [{name, description?, inputSchema}]}

  • Tool execution: Client sends tools/call {name, arguments} → server returns {content:[{type:"text",text}|{type:"image",data,mimeType}], isError:bool}

  • Stdio transport: Spawns child process; newline-delimited JSON over stdin/stdout; tokio::sync::Mutex for concurrent access; shutdown: EOF on stdin then kill child

  • HTTP transport: POST JSON-RPC body to configured URL; stateless (no persistent connection)

  • Tool adapter: McpToolAdapter wraps McpToolInfo + Arc<Mutex<McpClient>>; optional prefix for namespace disambiguation ({prefix}__{name})

  • Error enum: Transport(String), Protocol(String), JsonRpc{code,message}, Serialization, Io, ConnectionClosed
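
The request-ID and tool-naming rules above can be sketched with the standard library alone. This is an illustrative sketch, not phi-core's actual code: the names RequestIds and namespaced_tool_name are hypothetical, and only the start-at-1 atomic counter and the {prefix}__{name} format come from the description above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper that hands out JSON-RPC request IDs: 1, 2, 3, ...
pub struct RequestIds {
    next: AtomicU64,
}

impl RequestIds {
    pub fn new() -> Self {
        // IDs start at 1, as described in the message-type notes.
        Self { next: AtomicU64::new(1) }
    }

    /// Returns the current ID and advances the counter atomically,
    /// so concurrent callers never receive the same ID.
    pub fn next_id(&self) -> u64 {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}

/// Builds the exposed tool name: `{prefix}__{name}` when a prefix is
/// configured, otherwise the bare tool name.
pub fn namespaced_tool_name(prefix: Option<&str>, name: &str) -> String {
    match prefix {
        Some(p) => format!("{p}__{name}"),
        None => name.to_string(),
    }
}
```

Two adapters that both expose a tool called search would then surface as, say, github__search and docs__search, which keeps their names distinct.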

OpenAPI

  • Spec formats: OpenAPI 3.x; auto-detected: first non-whitespace char { or [ → JSON, else YAML

  • Sources: from_file(path) (async read), from_url(url) (HTTP GET via reqwest), from_str(text) (in-memory)

  • Base URL resolution: config.base_url → spec.servers[0].url → error if neither set; trailing slashes stripped

  • Parameter classification:

    • Path parameters → URL {param} substitution (RFC 3986 percent-encoding); required
    • Query parameters → .query() chains; optional
    • Header parameters → .header() chains; optional
    • Cookie parameters → skipped (unsupported)
    • RequestBody (application/json only) → keyed as "body" (or "_request_body" on collision); required if requestBody.required
  • HTTP execution pipeline (per tool call):

    1. Validate params is object (or null treated as {})
    2. Substitute path params with percent-encoded values; error if any missing
    3. Build URL: {base_url}{path}
    4. Chain .query() for query params present in input
    5. Chain .header() for header params present in input
    6. Apply auth: Bearer → .bearer_auth(), ApiKey → .header(header, value), None → nothing
    7. Apply custom_headers
    8. If has_body: .json(params["body"])
    9. Send request; read full body text; truncate to max_response_bytes at UTF-8 boundary
    10. Return: "{METHOD} {URL} → {STATUS_CODE}\n\n{BODY}"
  • Operation filter: OperationFilter::All|ByOperationId|ByTag|ByPathPrefix; operations without operationId always skipped with warning

  • Tool naming: Default = operationId; with prefix = {prefix}__{operationId}
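
Several of the OpenAPI rules above are small pure functions, so they are easy to sketch without any HTTP client. The code below is a std-only illustration under stated assumptions: all function names are hypothetical, the error strings are invented, and the percent-encoder keeps only the RFC 3986 unreserved characters, which may be stricter than what phi-core does.

```rust
#[derive(Debug, PartialEq)]
pub enum SpecFormat {
    Json,
    Yaml,
}

/// First non-whitespace char `{` or `[` means JSON, anything else YAML.
pub fn detect_format(text: &str) -> SpecFormat {
    match text.trim_start().chars().next() {
        Some('{') | Some('[') => SpecFormat::Json,
        _ => SpecFormat::Yaml,
    }
}

/// Config value wins, then the spec's first server; trailing slashes stripped.
pub fn resolve_base_url(config: Option<&str>, spec_server: Option<&str>) -> Result<String, String> {
    config
        .or(spec_server)
        .map(|url| url.trim_end_matches('/').to_string())
        .ok_or_else(|| "no base URL in config or spec".to_string())
}

/// Percent-encodes every byte except RFC 3986 unreserved characters.
pub fn percent_encode(value: &str) -> String {
    let mut out = String::new();
    for b in value.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Replaces each `{name}` placeholder; errors if a placeholder has no value.
pub fn substitute_path(path: &str, params: &[(&str, &str)]) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = path;
    while let Some(start) = rest.find('{') {
        let close = rest[start..]
            .find('}')
            .ok_or_else(|| "unclosed '{' in path".to_string())?;
        let end = start + close;
        let name = &rest[start + 1..end];
        let value = params
            .iter()
            .find(|(key, _)| *key == name)
            .map(|(_, value)| *value)
            .ok_or_else(|| format!("missing path parameter: {name}"))?;
        out.push_str(&rest[..start]);
        out.push_str(&percent_encode(value));
        rest = &rest[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

/// Truncates to at most `max_bytes`, backing up to a UTF-8 char boundary.
pub fn truncate_utf8(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}
```

Backing up to a character boundary means the truncated body can be slightly shorter than max_response_bytes, but it never splits a multi-byte character.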

File System

  • Read: tokio::fs::read_to_string for text (max 1MB), tokio::fs::read for images (max 20MB)
  • Write: tokio::fs::write with automatic parent dir creation
  • Edit: Read → string replace (exact match, once) → write
  • List: Spawns find command via BashTool
  • Search: Spawns grep or rg command via BashTool
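
The Edit step (exact match, replaced once) can be sketched as a pure string function. This is an assumption-laden illustration: the name replace_once is hypothetical, "once" is read here as "first occurrence only", and returning an error when the text is absent or empty is a guess about desirable behaviour rather than documented phi-core behaviour.

```rust
/// Replaces the first exact occurrence of `old_text` with `new_text`.
/// Returns an error instead of silently writing an unchanged file.
pub fn replace_once(source: &str, old_text: &str, new_text: &str) -> Result<String, String> {
    if old_text.is_empty() {
        return Err("old_text must not be empty".to_string());
    }
    if !source.contains(old_text) {
        return Err("old_text not found in file".to_string());
    }
    // `replacen` with a count of 1 leaves later occurrences untouched.
    Ok(source.replacen(old_text, new_text, 1))
}
```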

Shell

  • Execution: tokio::process::Command::new("bash").arg("-c").arg(command)
  • Timeout: tokio::time::sleep with default 120s, configurable
  • Output capture: stdout + stderr piped, truncated at 256KB each
  • Safety: Deny patterns checked before execution (substring match)
  • Exit code: Returned in ToolResult.details.exit_code; tool always returns Ok (non-zero is not a ToolError)
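
The safety check and output cap above are both simple enough to sketch without spawning a process. The following is a minimal sketch, assuming hypothetical names (first_denied, cap_output) and caller-supplied deny patterns; phi-core's real pattern list is not shown in this document.

```rust
/// Per-stream cap from the description above: 256KB each for stdout and stderr.
pub const MAX_OUTPUT_BYTES: usize = 256 * 1024;

/// Returns the first deny pattern that occurs anywhere in the command
/// (plain substring match), or None if the command may run.
pub fn first_denied<'a>(command: &str, deny_patterns: &[&'a str]) -> Option<&'a str> {
    deny_patterns
        .iter()
        .copied()
        .find(|pattern| command.contains(*pattern))
}

/// Keeps at most `limit` bytes of captured output. Invalid UTF-8,
/// including a character split by the cut, is replaced lossily.
pub fn cap_output(raw: &[u8], limit: usize) -> String {
    let kept = &raw[..raw.len().min(limit)];
    String::from_utf8_lossy(kept).into_owned()
}
```

Substring matching is cheap but coarse: it will also reject harmless commands that merely mention a denied string, for example inside an echo argument.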

6. State Management

Agent-Level State (in Agent struct)

All fields on Agent:

| Field | Type | Notes |
| --- | --- | --- |
| system_prompt | String | Immutable once set; injected into every LLM call |
| model | String | Model identifier |
| api_key | String | API authentication key |
| thinking_level | ThinkingLevel | Off/Minimal/Low/Medium/High |
| max_tokens | Option<u32> | Max completion tokens |
| temperature | Option<f32> | Sampling temperature |
| model_config | Option<ModelConfig> | Provider-specific extras (base_url, headers, compat flags) |
| messages | Vec<AgentMessage> | Grows on each prompt() call; reset by reset(); replaced by restore_messages() |
| tools | Vec<Box<dyn AgentTool>> | Tool instances (heap-allocated trait objects) |
| provider | Box<dyn StreamProvider> | Boxed, not Arc; owned exclusively by Agent |
| steering_queue | Arc<Mutex<Vec<AgentMessage>>> | Written by steer(), drained by agent loop before each tool execution check |
| follow_up_queue | Arc<Mutex<Vec<AgentMessage>>> | Written by follow_up(), drained when agent loop would stop |
| steering_mode | QueueMode | Default: OneAtATime |
| follow_up_mode | QueueMode | Default: OneAtATime |
| context_config | Option<ContextConfig> | If None, context compaction is disabled |
| execution_limits | Option<ExecutionLimits> | If None, no hard limits enforced |
| cache_config | CacheConfig | Prompt caching hints (Anthropic) |
| tool_execution | ToolExecutionStrategy | Parallel (default), Sequential, or Batched |
| retry_config | RetryConfig | Backoff for RateLimited/Network errors |
| before_turn | Option<BeforeTurnFn> | Signature: fn(&[AgentMessage], turn_number: usize) -> bool; return false to abort |
| after_turn | Option<AfterTurnFn> | Signature: fn(&[AgentMessage], &Usage) |
| on_error | Option<OnErrorFn> | Signature: fn(&str) |
| input_filters | Vec<Arc<dyn InputFilter>> | Applied in order before LLM call |
| (compaction strategies) | (moved to ContextConfig.compaction) | in_memory_strategy and block_strategy fields on CompactionConfig (G5) |
| cancel | Option<CancellationToken> | Created when prompt() starts, consumed by abort() |
| is_streaming | bool | Set true on prompt() entry, false on exit |
| agent_id | String | UUID v4 generated once at Agent::new(); stable for the Agent's lifetime. Injected into every AgentContext built by this agent. |
| session_id | String | UUID v4 generated once at Agent::new(); groups all loops under one session. Stable for the Agent's lifetime. |
| loop_counters | HashMap<String, usize> | Per-"{session_id}.{config_id}" monotonic counter; incremented by next_loop_id() to produce the N component of loop_id. |
| last_loop_id | Option<String> | loop_id of the most recently started loop; set after each prompt_* or continue_loop_* call. Becomes parent_loop_id on the next continuation. |
| before_loop | Option<BeforeLoopFn> | Hook called once before AgentStart. Signature: fn(&[AgentMessage], loop_index: usize) -> bool; return false to abort before AgentStart. |
| after_loop | Option<AfterLoopFn> | Hook called once after AgentEnd. Signature: fn(&[AgentMessage], &Usage). |
| before_tool_execution | Option<BeforeToolExecutionFn> | Hook called before each ToolExecutionStart. Signature: fn(&str, &str, &JSON) -> bool (tool_name, call_id, args); return false to skip. |
| after_tool_execution | Option<AfterToolExecutionFn> | Hook called after each ToolExecutionEnd. Signature: fn(&str, &str, bool) (tool_name, call_id, is_error). |
| before_tool_execution_update | Option<BeforeToolExecutionUpdateFn> | Hook called before each ToolExecutionUpdate. Signature: fn(&str, &str, &str) -> bool (tool_name, call_id, text); return false to suppress the event. |
| after_tool_execution_update | Option<AfterToolExecutionUpdateFn> | Hook called after each ToolExecutionUpdate (only when not suppressed). Signature: fn(&str, &str, &str). |

Invariants:

  • assert!(!self.is_streaming) fires if prompt() is called while already running — callers must use steer() or follow_up() during active runs
  • cancel is always Some while is_streaming is true
  • messages must not end in an Assistant message before agent_loop_continue() is called
  • agent_id and session_id are always Some in any AgentContext built by Agent; direct callers of agent_loop_continue must also set them

AgentContext (per-run, passed into agent loop)

| State Element | Type | Description |
| --- | --- | --- |
| system_prompt | String | Immutable for the duration of the run |
| messages | Vec<AgentMessage> | Mutated in-place: prompts appended, assistant messages appended, tool results appended; may be replaced by compaction |
| tools | &[Box<dyn AgentTool>] | Immutable for the duration of the run |
| agent_id | Option<String> | Stable agent instance ID. Set by Agent::prompt_*; also written back by agent_loop when None. Required (non-None) for agent_loop_continue. |
| session_id | Option<String> | Stable session ID. Same lifecycle as agent_id. |
| loop_id | Option<String> | Per-call identifier of the form "{session_id}.{config_id}.{N}". Set by Agent before calling agent_loop/agent_loop_continue; falls back to UUID if None at loop entry. |
| parent_loop_id | Option<String> | loop_id of the loop this call continues from. None for origin calls. Set by Agent::continue_loop_with_sender to Agent.last_loop_id. |
| continuation_kind | Option<ContinuationKind> | How this call relates to prior loops. None for origin; Some(Default), Some(Rerun), or Some(Branch) for continuations. |

ExecutionTracker (per-run)

| State | Initial | Transitions |
| --- | --- | --- |
| turns | 0 | Incremented after each LLM call |
| tokens_used | 0 | Incremented by token count of each LLM response |
| started_at | Instant::now() | Immutable; compared against max_duration on each check |
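
The tracker state above maps onto a small struct. The sketch below is illustrative only: the default limits (50 turns, 1,000,000 tokens, 600s) are taken from REQ-012 in the roadmap, while the method names record_turn and exceeded, the reason strings, and the use of >= for comparisons are assumptions.

```rust
use std::time::{Duration, Instant};

#[derive(Clone, Debug)]
pub struct ExecutionLimits {
    pub max_turns: usize,
    pub max_total_tokens: usize,
    pub max_duration: Duration,
}

impl Default for ExecutionLimits {
    fn default() -> Self {
        Self {
            max_turns: 50,
            max_total_tokens: 1_000_000,
            max_duration: Duration::from_secs(600),
        }
    }
}

pub struct ExecutionTracker {
    pub limits: ExecutionLimits,
    pub turns: usize,
    pub tokens_used: usize,
    pub started_at: Instant,
}

impl ExecutionTracker {
    pub fn new(limits: ExecutionLimits) -> Self {
        Self { limits, turns: 0, tokens_used: 0, started_at: Instant::now() }
    }

    /// Called after each LLM call with that response's token count.
    pub fn record_turn(&mut self, response_tokens: usize) {
        self.turns += 1;
        self.tokens_used += response_tokens;
    }

    /// Returns a human-readable reason if any limit has been reached.
    pub fn exceeded(&self) -> Option<String> {
        if self.turns >= self.limits.max_turns {
            return Some(format!("max turns ({}) reached", self.limits.max_turns));
        }
        if self.tokens_used >= self.limits.max_total_tokens {
            return Some(format!("max tokens ({}) reached", self.limits.max_total_tokens));
        }
        if self.started_at.elapsed() >= self.limits.max_duration {
            return Some("max duration reached".to_string());
        }
        None
    }
}
```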

Steering/Follow-up Queue Modes

  • QueueMode::OneAtATime (default for both queues): on each read, lock mutex, pop the first message only, return as Vec of 1
  • QueueMode::All: on each read, lock mutex, drain all queued messages, return the full vec

Both queues are passed to AgentLoopConfig as closures (get_steering_messages, get_follow_up_messages) that capture the Arc<Mutex<>> pointer, enabling external callers to enqueue messages while the agent loop is running on another task.
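
A minimal sketch of the two read modes and the closure wiring, using only the standard library. The function names read_queue and make_reader are hypothetical, and std::sync::Mutex stands in for whichever mutex phi-core actually uses.

```rust
use std::sync::{Arc, Mutex};

#[derive(Clone, Copy, Default)]
pub enum QueueMode {
    #[default]
    OneAtATime,
    All,
}

/// Reads from the shared queue according to the mode:
/// OneAtATime pops only the first message, All drains everything.
pub fn read_queue<T>(queue: &Arc<Mutex<Vec<T>>>, mode: QueueMode) -> Vec<T> {
    let mut pending = queue.lock().unwrap();
    match mode {
        QueueMode::OneAtATime => {
            if pending.is_empty() {
                Vec::new()
            } else {
                vec![pending.remove(0)]
            }
        }
        QueueMode::All => pending.drain(..).collect(),
    }
}

/// Builds the kind of closure the loop config could hold: it captures
/// the Arc, so other tasks can keep enqueueing through their own clones.
pub fn make_reader<T>(queue: Arc<Mutex<Vec<T>>>, mode: QueueMode) -> impl Fn() -> Vec<T> {
    move || read_queue(&queue, mode)
}
```

Because the closure owns only a reference-counted pointer, a caller holding another clone of the Arc can push a steering message while the loop is mid-run, and the next read will see it.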

Event Hook Ordering

All hooks fire in a guaranteed strict order relative to their paired events. This ordering is enforced at runtime and is an invariant of the system:

before_loop → AgentStart
  before_turn → TurnStart
    [MessageStart/End for initial prompts — first turn of agent_loop() only]
    [MessageStart/End for injected steering messages]
    [LLM: MessageStart → MessageUpdate* → MessageEnd]
    [per tool call:]
      before_tool_execution → ToolExecutionStart
        (before_tool_execution_update → ToolExecutionUpdate → after_tool_execution_update)*
      ToolExecutionEnd → after_tool_execution
  TurnEnd → after_turn
  (repeat inner block for each follow-up / steering-triggered turn)
AgentEnd → after_loop

Short-circuit rules — hook returns false:

| Hook | When false is returned | Behaviour |
| --- | --- | --- |
| before_loop | Before AgentStart | Loop is aborted; AgentEnd { messages: [] } is emitted; function returns immediately |
| before_turn | Before TurnStart | Turn is skipped; TurnStart/TurnEnd are not emitted; AgentEnd is not guaranteed |
| before_tool_execution | Before ToolExecutionStart | Tool call is skipped; ToolExecutionStart/End are not emitted; a skipped error ToolResult is returned to the LLM |
| before_tool_execution_update | Before ToolExecutionUpdate | Event is suppressed; after_tool_execution_update is not called; tool keeps running and final ToolResult is unaffected |

7. Error Handling Strategy

Provider Errors (ProviderError)

| Error | Retryable | Handling |
| --- | --- | --- |
| RateLimited { retry_after_ms } | Yes | Exponential backoff; respects Retry-After header if present |
| Network(msg) | Yes | Exponential backoff |
| Auth(msg) | No | Propagated immediately as StopReason::Error message |
| Api(msg) | No | Propagated as StopReason::Error message |
| ContextOverflow { msg } | No | Detected on HTTP 400/413; triggers compaction on next turn (see below) |
| Cancelled | No | Loop exits cleanly, AgentEnd emitted |
| Other(msg) | No | Propagated as StopReason::Error message |
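
For the two retryable errors, the delay computation might look like the sketch below. The RetryConfig defaults (3 retries, 1,000 ms initial delay, multiplier 2.0, 30,000 ms cap) come from REQ-013; the exact formula, the function name next_delay_ms, and the choice to use retry_after_ms verbatim are assumptions, since the document only says "exponential backoff".

```rust
pub struct RetryConfig {
    pub max_retries: u32,
    pub initial_delay_ms: u64,
    pub backoff_multiplier: f64,
    pub max_delay_ms: u64,
}

impl Default for RetryConfig {
    fn default() -> Self {
        Self {
            max_retries: 3,
            initial_delay_ms: 1_000,
            backoff_multiplier: 2.0,
            max_delay_ms: 30_000,
        }
    }
}

/// Delay before retry number `attempt` (0-based), or None when the
/// retry budget is spent. A server-provided hint takes precedence.
pub fn next_delay_ms(cfg: &RetryConfig, attempt: u32, retry_after_ms: Option<u64>) -> Option<u64> {
    if attempt >= cfg.max_retries {
        return None;
    }
    if let Some(hint) = retry_after_ms {
        return Some(hint);
    }
    // initial * multiplier^attempt, capped at max_delay_ms.
    let raw = cfg.initial_delay_ms as f64 * cfg.backoff_multiplier.powi(attempt as i32);
    Some((raw as u64).min(cfg.max_delay_ms))
}
```

With the defaults this yields waits of 1s, 2s, and 4s before giving up; the 30s cap only matters when max_retries is raised.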

Context Overflow Recovery

  1. Provider returns HTTP 400/413 matching any of 15+ known overflow phrases.
  2. ProviderError::classify() returns ContextOverflow.
  3. The overflow may arrive as an HTTP error (caught in retry loop) or as a streaming error event (StreamEvent::Error with matching message), caught by Message::is_context_overflow().
  4. On the next turn, if context_config is set, compact_messages() is called before the LLM call.
  5. If no context_config is set, the error message is included in conversation history and the loop continues — the LLM may self-recover or the next turn will also fail.
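
Step 1 and step 2 amount to a status check plus a phrase scan. The sketch below shows the shape of such a classifier; it is not ProviderError::classify() itself. The two phrases are placeholders chosen for illustration (the real list of 15+ phrases is not reproduced in this document), and lower-casing the body before matching is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    ContextOverflow,
    Api,
}

/// Placeholder phrases for illustration only; not phi-core's actual list.
const OVERFLOW_PHRASES: &[&str] = &["context length", "too many tokens"];

/// Treats an HTTP 400/413 whose body mentions a known overflow phrase
/// as a context overflow; every other error stays a generic API error.
pub fn classify(status: u16, body: &str) -> ErrorKind {
    let lowered = body.to_lowercase();
    let size_related_status = status == 400 || status == 413;
    let mentions_overflow = OVERFLOW_PHRASES
        .iter()
        .any(|phrase| lowered.contains(*phrase));
    if size_related_status && mentions_overflow {
        ErrorKind::ContextOverflow
    } else {
        ErrorKind::Api
    }
}
```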

Tool Errors (ToolError)

  • Cancelled: Tool execution skipped; ToolResult content = "Skipped due to queued user message." with is_error: true
  • Failed(msg): Converted to ToolResult with error text; is_error: true; always returned to LLM so it can self-correct
  • InvalidArgs(msg): Same as Failed; LLM can retry with corrected parameters
  • NotFound(msg): Produced when tool name in ToolCall has no matching AgentTool; same handling as Failed
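
All four variants end up as an error-flagged result that goes back to the model. A minimal sketch of that conversion follows; ToolOutcome and error_to_outcome are hypothetical names, and the message prefixes for the non-cancelled variants are invented. Only the Cancelled text is quoted from the list above.

```rust
pub enum ToolError {
    Failed(String),
    NotFound(String),
    InvalidArgs(String),
    Cancelled,
}

/// Simplified stand-in for the ToolResult message sent back to the LLM.
#[derive(Debug, PartialEq)]
pub struct ToolOutcome {
    pub text: String,
    pub is_error: bool,
}

/// Every ToolError becomes a result with is_error set, so the model
/// sees the failure and can correct itself instead of the loop aborting.
pub fn error_to_outcome(err: &ToolError) -> ToolOutcome {
    let text = match err {
        ToolError::Cancelled => "Skipped due to queued user message.".to_string(),
        ToolError::Failed(msg) => format!("Tool failed: {msg}"),
        ToolError::InvalidArgs(msg) => format!("Invalid arguments: {msg}"),
        ToolError::NotFound(name) => format!("Tool not found: {name}"),
    };
    ToolOutcome { text, is_error: true }
}
```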

Input Filter Errors

  • Reject(reason): Emits AgentEvent::InputRejected, immediately emits AgentEvent::AgentEnd { messages: [] }, returns empty message list
  • Warn(msg): Warning text appended to last user message content; loop continues
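
The Reject and Warn paths can be sketched as a fold over the filter list. This is a simplified illustration: the trait shape follows REQ-022, but the two example filters (MaxLength, ShoutingWarning), the apply_filters helper, the use of Box instead of Arc, and the bracketed warning format are all assumptions.

```rust
pub enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

pub trait InputFilter {
    fn filter(&self, text: &str) -> FilterResult;
}

/// Hypothetical filter: rejects input longer than a byte limit.
pub struct MaxLength(pub usize);

impl InputFilter for MaxLength {
    fn filter(&self, text: &str) -> FilterResult {
        if text.len() > self.0 {
            FilterResult::Reject(format!("input exceeds {} bytes", self.0))
        } else {
            FilterResult::Pass
        }
    }
}

/// Hypothetical filter: warns when the input has letters but no lowercase.
pub struct ShoutingWarning;

impl InputFilter for ShoutingWarning {
    fn filter(&self, text: &str) -> FilterResult {
        let has_letters = text.chars().any(|c| c.is_alphabetic());
        let has_lowercase = text.chars().any(|c| c.is_lowercase());
        if has_letters && !has_lowercase {
            FilterResult::Warn("input is all uppercase".to_string())
        } else {
            FilterResult::Pass
        }
    }
}

/// Runs filters in order. The first Reject stops everything and returns
/// the reason; each Warn is appended to the outgoing user text.
pub fn apply_filters(filters: &[Box<dyn InputFilter>], text: &str) -> Result<String, String> {
    let mut outgoing = text.to_string();
    for filter in filters {
        match filter.filter(text) {
            FilterResult::Pass => {}
            FilterResult::Warn(msg) => {
                outgoing.push_str("\n[warning: ");
                outgoing.push_str(&msg);
                outgoing.push(']');
            }
            FilterResult::Reject(reason) => return Err(reason),
        }
    }
    Ok(outgoing)
}
```

In the real loop the Err branch would correspond to emitting InputRejected followed by AgentEnd with an empty message list.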

Execution Limit Exhaustion

  • When any limit is exceeded, a synthetic user message [Agent stopped: {reason}] is appended to context and emitted as events.
  • Loop returns immediately after appending the message.
  • No error is thrown; AgentEnd is emitted normally.

Before-Turn Abort

  • If before_turn callback returns false, the loop returns immediately with no AgentEnd emitted.
  • This is the only path where AgentEnd is not guaranteed.

Error Propagation Across Components

Provider → ProviderError → stream_assistant_response() → Message{stop_reason: Error}
                                                        ↓
                                            on_error callback invoked
                                                        ↓
                                        AgentEvent::TurnEnd emitted
                                                        ↓
                                           agent loop returns

Implementation Roadmap

Generated from: ../reference/glossary.md, ../specs/architecture.md, ../architecture/algorithms.md
Last updated: 2026-03-17
Paradigm: Language-agnostic / Implementation-independent

This roadmap defines six progressive stages of implementation derived from the reverse-engineered specification. Each level is a complete, testable stage. Complete and stabilize each level fully before advancing to the next.


Level 1 — Survive

Goal: The system can start, load configuration, initialize its core structures, and confirm it is alive. Nothing works end-to-end yet, but nothing crashes either.

Completion Criteria: A smoke test confirms the Agent can be constructed with a MockProvider, configured via builder methods, and all core data entities can be instantiated without error. No LLM call is required to pass Level 1.


Milestone 1.1 — Core Type System

  • REQ-001: Define the Content enum with four variants: Text { text }, Image { data: base64, mime_type }, Thinking { thinking, signature }, and ToolCall { id, name, arguments }. Serialized with a "type" discriminant field. (Source: [AR])

    • Depends on: —
    • Definition of Done: All four variants instantiate; round-trip JSON serialization produces the correct tagged shape.
  • REQ-002: Define the Message enum with three variants: User { content, timestamp }, Assistant { content, stop_reason, model, provider, usage, timestamp, error_message }, and ToolResult { tool_call_id, tool_name, content, is_error, timestamp }. (Source: [AR])

    • Depends on: REQ-001, REQ-005, REQ-006
    • Definition of Done: All three variants instantiate; serialization preserves the role field with values "user", "assistant", "toolResult".
  • REQ-003: Define AgentMessage as an untagged enum wrapping Llm(LlmMessage) and Extension(ExtensionMessage). (Source: [AR])

    • Depends on: REQ-002, REQ-004
    • Definition of Done: Both variants serialize/deserialize correctly; an Extension variant round-trips without loss.
  • REQ-004: Define ExtensionMessage with fields role: String (always "extension"), kind: String, and data: JSON. (Source: [AR])

    • Depends on: —
    • Definition of Done: Instantiates and serializes to {role:"extension", kind:"...", data:{...}}.
  • REQ-005: Define StopReason enum with variants Stop, Length, ToolUse, Error, Aborted. Serialized in camelCase. (Source: [AR])

    • Depends on: —
    • Definition of Done: All variants serialize to their documented camelCase strings.
  • REQ-006: Define Usage struct with fields input, output, cache_read, cache_write, total_tokens (all u64). Include a cache_hit_rate() derived method. (Source: [AR])

    • Depends on: —
    • Definition of Done: cache_hit_rate() returns cache_read / (input + cache_read + cache_write).
  • REQ-007: Define AgentEvent enum with all variants: AgentStart, AgentEnd { messages }, TurnStart, TurnEnd { message, tool_results }, MessageStart { message }, MessageUpdate { message, delta }, MessageEnd { message }, ToolExecutionStart { tool_call_id, tool_name, args }, ToolExecutionUpdate { tool_call_id, tool_name, partial_result }, ToolExecutionEnd { tool_call_id, tool_name, result, is_error }, ProgressMessage { tool_call_id, tool_name, text }, InputRejected { reason }. (Source: [AR])

    • Depends on: REQ-002, REQ-008
    • Definition of Done: All variants instantiate.
  • REQ-008: Define StreamDelta enum with variants Text { delta }, Thinking { delta }, ToolCallDelta { delta }. (Source: [AR])

    • Depends on: —
    • Definition of Done: All variants instantiate and carry their string payload.
  • REQ-009: Define ToolContext struct with fields tool_call_id, tool_name, cancel: CancellationToken, on_update: Option<ToolUpdateFn>, on_progress: Option<ProgressFn>. (Source: [AR])

    • Depends on: —
    • Definition of Done: Struct instantiates; callback fields accept closures/function pointers.
  • REQ-010: Define ToolResult { content: Vec<Content>, details: JSON } and ToolError enum with variants Failed(String), NotFound(String), InvalidArgs(String), Cancelled. (Source: [AR])

    • Depends on: REQ-001
    • Definition of Done: All variants instantiate; ToolError converts to a display string.
  • REQ-011: Define ContextConfig struct with fields and defaults: max_context_tokens (100,000), system_prompt_tokens (4,000), keep_recent (10), keep_first (2), tool_output_max_lines (50). (Source: [AR])

    • Depends on: —
    • Definition of Done: Default construction produces the documented default values.
  • REQ-012: Define ExecutionLimits struct with defaults max_turns (50), max_total_tokens (1,000,000), max_duration (600s); and ExecutionTracker runtime state with fields limits, turns, tokens_used, started_at. (Source: [AR])

    • Depends on: —
    • Definition of Done: ExecutionTracker::new(limits) initializes turns=0, tokens_used=0, started_at=now.
  • REQ-013: Define RetryConfig with defaults: max_retries (3), initial_delay_ms (1,000), backoff_multiplier (2.0), max_delay_ms (30,000). (Source: [AR])

    • Depends on: —
    • Definition of Done: Default construction produces documented defaults.
  • REQ-014: Define CacheConfig { enabled: bool, strategy: CacheStrategy } and CacheStrategy enum with variants Auto, Disabled, Manual { cache_system, cache_tools, cache_messages }. (Source: [AR])

    • Depends on: —
    • Definition of Done: All variants instantiate; default CacheConfig has enabled: true, strategy: Auto.
  • REQ-015: Define StreamConfig struct with fields model, system_prompt, messages: Vec<Message>, tools: Vec<ToolDefinition>, thinking_level, api_key, max_tokens, temperature, model_config, cache_config. (Source: [AR])

    • Depends on: REQ-014, REQ-016
    • Definition of Done: Struct instantiates with all optional fields as None.
  • REQ-016: Define ToolDefinition struct with fields name, description, parameters: JSON. (Source: [AR])

    • Depends on: —
    • Definition of Done: Struct instantiates and serializes to the expected JSON shape.
  • REQ-017: Define QueueMode enum with variants OneAtATime and All. (Source: [AR])

    • Depends on: —
    • Definition of Done: Both variants exist; default is OneAtATime.
  • REQ-018: All types in the AgentMessage tree derive Serialize and Deserialize. (Source: [OV])

    • Depends on: REQ-001 through REQ-017
    • Definition of Done: Full round-trip JSON serialization of a Vec<AgentMessage> containing all message types is lossless.
  • REQ-019: Define ThinkingLevel enum with variants Off, Minimal, Low, Medium, High. (Source: [OV])

    • Depends on: —
    • Definition of Done: All variants exist.
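
As a worked example for REQ-006, the derived cache_hit_rate() method could be written as below. The field list and the formula come from the requirement; returning 0.0 when the denominator is zero is an assumption, since the requirement does not say what happens before any tokens are counted.

```rust
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Usage {
    pub input: u64,
    pub output: u64,
    pub cache_read: u64,
    pub cache_write: u64,
    pub total_tokens: u64,
}

impl Usage {
    /// cache_read / (input + cache_read + cache_write).
    /// Assumed to be 0.0 when no input-side tokens were recorded.
    pub fn cache_hit_rate(&self) -> f64 {
        let denominator = self.input + self.cache_read + self.cache_write;
        if denominator == 0 {
            0.0
        } else {
            self.cache_read as f64 / denominator as f64
        }
    }
}
```

For instance, 100 fresh input tokens plus 300 tokens read from cache gives 300 / 400, a hit rate of 0.75.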

Milestone 1.2 — Core Traits

  • REQ-020: Define StreamProvider trait with a single method stream(config: StreamConfig, tx: EventSender, cancel: CancellationToken) -> Result<Message, ProviderError>. Define ProviderError enum with variants Api(String), Network(String), Auth(String), RateLimited { retry_after_ms: Option<u64> }, ContextOverflow { message: String }, Cancelled, Other(String). (Source: [AR])

    • Depends on: REQ-002, REQ-015
    • Definition of Done: Trait compiles; ProviderError variants all instantiate.
  • REQ-021: Define AgentTool trait with methods name() -> &str, label() -> &str, description() -> &str, parameters_schema() -> JSON, execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>. (Source: [AR])

    • Depends on: REQ-009, REQ-010
    • Definition of Done: Trait compiles; a minimal struct can implement it.
  • REQ-022: Define InputFilter trait with method filter(text: &str) -> FilterResult where FilterResult is Pass, Warn(String), or Reject(String). (Source: [OV])

    • Depends on: —
    • Definition of Done: Trait compiles; all three result variants exist.
  • REQ-023: Define CompactionStrategy trait with method compact(messages: Vec<AgentMessage>, config: ContextConfig) -> Vec<AgentMessage>. (Source: [AR])

    • Depends on: REQ-003, REQ-011
    • Definition of Done: Trait compiles; a struct can implement it.

Milestone 1.3 — Agent Struct Construction

  • REQ-024: Implement BasicAgent::new(model_config: ModelConfig) -> BasicAgent. Initialize all fields to documented defaults: messages = [], tools = [], thinking_level = Off, tool_execution = Parallel, steering_mode = OneAtATime, follow_up_mode = OneAtATime, context_config = Some(default), execution_limits = Some(default), retry_config = default, is_streaming = false, cancel = None. (Source: [PS])

    • Depends on: REQ-011 through REQ-017, REQ-019, REQ-020
    • Definition of Done: BasicAgent::new(ModelConfig::anthropic("m", "m", "k")) compiles and all fields have their documented defaults.
  • REQ-025: Implement builder methods: with_system_prompt(text), with_model_config(cfg), with_provider_override(provider), with_max_tokens(n), with_thinking(level). (Source: [PS])

    • Depends on: REQ-024
    • Definition of Done: Method chain BasicAgent::new(ModelConfig::anthropic("m", "m", "k")).with_system_prompt("x") compiles and all fields are set correctly.
  • REQ-026: Implement with_tools(vec), with_context_config(cfg), with_execution_limits(limits), with_retry_config(cfg), with_cache_config(cfg), with_tool_execution(strategy), with_steering_mode(mode), with_follow_up_mode(mode). (Source: [PS])

    • Depends on: REQ-024
    • Definition of Done: All builders set their respective fields; with_tools replaces (or extends) the tools list.
  • REQ-027: Initialize steering_queue and follow_up_queue as Arc<Mutex<Vec<AgentMessage>>> in BasicAgent::new. (Source: [AR])

    • Depends on: REQ-003, REQ-024
    • Definition of Done: Both queues are non-null, independently lockable, and start empty.

Milestone 1.4 — AgentContext and AgentLoopConfig

  • REQ-028: Define AgentContext struct with fields system_prompt: String, messages: Vec<AgentMessage>, tools: &[Box<dyn AgentTool>]. (Source: [AR])

    • Depends on: REQ-003, REQ-021
    • Definition of Done: Struct compiles; messages is mutable in-place during the loop.
  • REQ-029: Define AgentLoopConfig struct bundling all behavioral settings: provider, model, api_key, thinking_level, max_tokens, temperature, model_config, get_steering_messages: Option<Fn()>, get_follow_up_messages: Option<Fn()>, context_config, compaction_strategy, execution_limits, cache_config, tool_execution, retry_config, before_turn, after_turn, on_error, input_filters, transform_context, convert_to_llm. (Source: [OV])

    • Depends on: REQ-011 through REQ-017, REQ-023
    • Definition of Done: Struct compiles with all optional fields as None.

Milestone 1.5 — MockProvider and Smoke Test

  • REQ-030: Implement MockProvider that implements StreamProvider. Accepts a list of pre-configured responses to return in sequence. Returns a Message::Assistant with stop_reason: Stop and configurable text content. (Source: [AR])

    • Depends on: REQ-020
    • Definition of Done: MockProvider::new(vec![response1, response2]) returns each response in order when stream() is called; after exhausting the list, returns a default stop response.
  • REQ-031: Smoke test: construct Agent::new(MockProvider::new([])), configure with builder methods, verify all fields are set correctly, and confirm no panic occurs. (Source: [OV])

    • Depends on: REQ-024 through REQ-030
    • Definition of Done: Test passes with zero panics; all configured fields read back correctly.
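
The sequencing behaviour REQ-030 asks of MockProvider can be sketched with a queue behind a mutex. This is a deliberately reduced, synchronous stand-in: it does not implement the StreamProvider trait, MockResponse is a hypothetical text-only type, next_response is a hypothetical method name, and treating an empty string as the "default stop response" is an assumption.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical canned reply; a full mock would return Message::Assistant.
#[derive(Debug, Clone, PartialEq)]
pub struct MockResponse {
    pub text: String,
}

pub struct MockProvider {
    queue: Mutex<VecDeque<MockResponse>>,
}

impl MockProvider {
    pub fn new(responses: Vec<MockResponse>) -> Self {
        Self { queue: Mutex::new(responses.into()) }
    }

    /// Returns the scripted responses in order; once they are exhausted,
    /// falls back to a default (empty) stop response.
    pub fn next_response(&self) -> MockResponse {
        self.queue
            .lock()
            .unwrap()
            .pop_front()
            .unwrap_or(MockResponse { text: String::new() })
    }
}
```

The mutex lets the method take &self, which matches how a provider shared across calls would typically be used.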

Level 2 — Useful

Goal: The primary use cases from the spec work end-to-end on valid, well-formed inputs. An agent can accept a prompt, call an LLM, execute tool calls, and return a final response.

Completion Criteria: Every primary use case from ../reference/glossary.md executes successfully with valid inputs and a real (or mock) provider: single-turn text response, multi-turn tool call cycle, message persistence round-trip, and agent reset. The built-in coding tools all execute on valid inputs.


Milestone 2.1 — Event Channel Infrastructure

  • REQ-032: Implement an unbounded async event channel. The agent_loop holds the sender (tx); callers receive from the receiver (rx). The channel never blocks the sender. (Source: [AR])

    • Depends on: REQ-007
    • Definition of Done: Sender can emit 1,000 events without blocking; receiver drains them all in order.
  • REQ-033: Implement CancellationToken with methods new(), cancel(), is_cancelled() -> bool, child_token() -> CancellationToken. Cancelling a parent automatically cancels all children. (Source: [AR])

    • Depends on: —
    • Definition of Done: Cancelling a root token causes is_cancelled() to return true on both the root and any child tokens created from it.
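
The parent-to-child propagation in REQ-033 can be illustrated with a std-only token. This is a sketch of the semantics, not the type phi-core uses: it only supports polling via is_cancelled(), whereas an async implementation would also need a way to await cancellation.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

#[derive(Clone)]
pub struct CancellationToken {
    flag: Arc<AtomicBool>,
    parent: Option<Arc<CancellationToken>>,
}

impl CancellationToken {
    pub fn new() -> Self {
        Self { flag: Arc::new(AtomicBool::new(false)), parent: None }
    }

    /// Marks this token (and therefore all of its descendants) as cancelled.
    pub fn cancel(&self) {
        self.flag.store(true, Ordering::SeqCst);
    }

    /// True if this token or any ancestor has been cancelled.
    pub fn is_cancelled(&self) -> bool {
        self.flag.load(Ordering::SeqCst)
            || self.parent.as_ref().map_or(false, |p| p.is_cancelled())
    }

    /// Creates a child with its own flag that also observes this token.
    /// The stored clone shares the parent's flag through the Arc.
    pub fn child_token(&self) -> Self {
        Self {
            flag: Arc::new(AtomicBool::new(false)),
            parent: Some(Arc::new(self.clone())),
        }
    }
}
```

Cancellation flows downward only: cancelling a child leaves its parent running, which is what lets a single tool call be aborted without stopping the whole loop.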

Milestone 2.2 — Agent Prompt Entry Point

  • REQ-034: Implement Agent::prompt(text: String) -> EventReceiver as a thin wrapper that constructs a User message and delegates to prompt_messages. (Source: [PS])

    • Depends on: REQ-002, REQ-035
    • Definition of Done: agent.prompt("hello") returns a receiver immediately (non-blocking).
  • REQ-035: Implement Agent::prompt_messages_with_sender(messages, tx): set is_streaming = true, create CancellationToken, build AgentContext snapshot, build AgentLoopConfig (wiring queue closures), spawn agent_loop, merge returned messages into Agent.messages on completion, set is_streaming = false. (Source: [PS])

    • Depends on: REQ-027, REQ-028, REQ-029, REQ-033, REQ-036
    • Definition of Done: After the spawned task completes, agent.messages contains the new messages and is_streaming is false.

Milestone 2.3 — Agent Loop Core

  • REQ-036: Implement agent_loop: emit AgentStart, append prompts to context.messages, emit TurnStart/MessageStart/MessageEnd for each prompt, call run_loop, emit AgentEnd, return new messages. (Source: [PS])

    • Depends on: REQ-032, REQ-037
    • Definition of Done: With MockProvider, a single call emits AgentStart, at least one TurnStart/TurnEnd pair, and AgentEnd; returned messages include the input prompt and the assistant response.
  • REQ-037: Implement agent_loop_continue: emit AgentStart/TurnStart, call run_loop, emit AgentEnd. (Source: [PS])

    • Depends on: REQ-036
    • Definition of Done: Resumes from existing context without re-appending prompts.
  • REQ-038: Implement run_loop inner loop (happy path only: no steering, no follow-ups, no limits): call stream_assistant_response, append assistant message, extract tool calls, call execute_tool_calls, append tool results, loop until no more tool calls, then break. (Source: [PS])

    • Depends on: REQ-039, REQ-045, REQ-060
    • Definition of Done: With a MockProvider that returns one tool call then one Stop, run_loop executes the tool and calls the LLM a second time before stopping.
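
The happy-path control flow in REQ-038 can be sketched as a synchronous loop. Everything here is simplified for illustration: Reply, the string-based history, and the closure parameters are hypothetical stand-ins for Message, AgentMessage, the provider, and the tool executor, and there is no streaming, steering, or limit checking.

```rust
/// Hypothetical reduced model reply: either final text or one tool call.
pub enum Reply {
    Text(String),
    ToolCall { name: String, args: String },
}

/// Calls the model, runs any requested tool, feeds the result back, and
/// repeats until the model answers without a tool call.
/// Returns the number of LLM calls made.
pub fn run_loop(
    mut call_llm: impl FnMut(&[String]) -> Reply,
    mut run_tool: impl FnMut(&str, &str) -> String,
    history: &mut Vec<String>,
) -> usize {
    let mut turns = 0;
    loop {
        turns += 1;
        match call_llm(history.as_slice()) {
            Reply::ToolCall { name, args } => {
                // Record the assistant's request, then the tool's answer,
                // so the next LLM call sees both.
                history.push(format!("assistant: call {name}({args})"));
                let result = run_tool(&name, &args);
                history.push(format!("toolResult: {result}"));
            }
            Reply::Text(text) => {
                history.push(format!("assistant: {text}"));
                return turns;
            }
        }
    }
}
```

Scripting the model to return one tool call followed by plain text exercises the Definition of Done above: the tool runs once and the model is called a second time before the loop stops.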

Milestone 2.4 — LLM Streaming (Happy Path)

  • REQ-039: Implement stream_assistant_response (no retry): build StreamConfig from context and config, call provider.stream(), process stream events (Start → emit MessageStart; TextDelta/ThinkingDelta/ToolCallDelta → emit MessageUpdate; Done → emit MessageEnd; Error → emit MessageStart+MessageEnd), return final Message. (Source: [PS])

    • Depends on: REQ-007, REQ-008, REQ-015, REQ-020, REQ-032
    • Definition of Done: With MockProvider, caller receives MessageStart, one or more MessageUpdate with text deltas, and MessageEnd containing the complete assembled message.
  • REQ-040: Implement AnthropicProvider::stream: POST to https://api.anthropic.com/v1/messages with x-api-key + anthropic-version: 2023-06-01 headers, stream: true body; parse SSE events (message_start, content_block_start, content_block_delta, message_delta, message_stop); buffer InputJsonDelta tool-argument fragments; parse complete JSON on content_block_stop; emit StreamEvents. (Source: [AR])

    • Depends on: REQ-020, REQ-039
    • Definition of Done: Integration test with a real or stubbed Anthropic endpoint produces a correctly parsed Message::Assistant with usage stats.
  • REQ-041: Implement OpenAiCompatProvider::stream: POST to configured base URL + /chat/completions with Authorization: Bearer header, stream: true, stream_options: {include_usage: true}; parse SSE chunks choices[0].delta; accumulate tool-call argument strings; emit StreamEvents. (Source: [AR])

    • Depends on: REQ-020, REQ-039
    • Definition of Done: Correctly parses a streamed chat-completion response from any OpenAI-compatible endpoint.
  • REQ-042: Implement ProviderRegistry with new() (empty) and default() (pre-registers AnthropicProvider and OpenAiCompatProvider). ProviderRegistry itself implements StreamProvider, dispatching based on ApiProtocol or model prefix. (Source: [AR])

    • Depends on: REQ-040, REQ-041
    • Definition of Done: ProviderRegistry::default() can route a config to AnthropicProvider or OpenAiCompatProvider without manual dispatch.
  • REQ-043: Implement StopReason determination in each provider: map provider-specific stop signals to the unified StopReason enum ("end_turn"/"stop" → Stop; "max_tokens"/"length" → Length; "tool_use"/"tool_calls" → ToolUse; cancellation → Aborted; errors → Error). (Source: [PS])

    • Depends on: REQ-005, REQ-040, REQ-041
    • Definition of Done: Each stop signal string maps to exactly one StopReason variant.
  • REQ-044: Filter Extension messages out of AgentMessage history before building StreamConfig.messages. Only Llm(LlmMessage) variants are sent to the LLM (note: LlmMessage wraps Message + Option<TurnId>). (Source: [AR])

    • Depends on: REQ-003, REQ-015
    • Definition of Done: An AgentMessage::Extension present in context.messages does not appear in the StreamConfig sent to the provider.
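
The stop-signal mapping in REQ-043 is a plain lookup, sketched below. The six signal strings and five variants come from the requirement; the function name map_stop_signal, returning None for unrecognized signals, and the as_str helper (which spells out the camelCase names from REQ-005 by hand instead of deriving them) are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StopReason {
    Stop,
    Length,
    ToolUse,
    Error,
    Aborted,
}

impl StopReason {
    /// camelCase name, matching the serialized form described in REQ-005.
    pub fn as_str(self) -> &'static str {
        match self {
            StopReason::Stop => "stop",
            StopReason::Length => "length",
            StopReason::ToolUse => "toolUse",
            StopReason::Error => "error",
            StopReason::Aborted => "aborted",
        }
    }
}

/// Maps a provider's stop string to the unified enum. Aborted and Error
/// are not produced here: they come from cancellation and failures.
pub fn map_stop_signal(signal: &str) -> Option<StopReason> {
    match signal {
        "end_turn" | "stop" => Some(StopReason::Stop),
        "max_tokens" | "length" => Some(StopReason::Length),
        "tool_use" | "tool_calls" => Some(StopReason::ToolUse),
        _ => None,
    }
}
```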

Milestone 2.5 — Tool Execution (Happy Path)

  • REQ-045: Implement execute_tool_calls dispatching to the configured ToolExecutionStrategy. For Parallel (default), use execute_batch. (Source: [PS])

    • Depends on: REQ-046
    • Definition of Done: Multiple tool calls from one LLM response are dispatched concurrently; results arrive in original call order.
  • REQ-046: Implement execute_single_tool: find tool by name, emit ToolExecutionStart, build ToolContext with child cancel token and callbacks, call tool.execute(args, ctx), emit ToolExecutionEnd, construct Message::ToolResult, emit MessageStart/MessageEnd, return (ToolResult, is_error). (Source: [PS])

    • Depends on: REQ-007, REQ-009, REQ-010, REQ-021, REQ-033
    • Definition of Done: A registered tool is called; its result is wrapped in a ToolResult message; ToolExecutionStart and ToolExecutionEnd events are emitted.
  • REQ-047: Implement BashTool::execute (basic): extract command param, run bash -c {command}, capture stdout+stderr, construct text output ("Exit code: N\n{stdout}" or "Exit code: N\nSTDOUT:\n{stdout}\nSTDERR:\n{stderr}"), return Ok(ToolResult). (Source: [PS])

    • Depends on: REQ-010, REQ-021
    • Definition of Done: echo "hello" returns Ok(ToolResult) with text containing "Exit code: 0" and "hello".
  • REQ-048: Implement ReadFileTool::execute (basic text path): extract path param, read file to string, split into lines, apply optional offset/limit, produce line-numbered output with header, return Ok(ToolResult). (Source: [PS])

    • Depends on: REQ-010, REQ-021
    • Definition of Done: Reading a known text file returns numbered lines; partial reads with offset/limit return the correct slice with a range header.
  • REQ-049: Implement WriteFileTool::execute: extract path and content params, create parent directories as needed, write file, return Ok(ToolResult). (Source: [AR])

    • Depends on: REQ-010, REQ-021
    • Definition of Done: Writing to a path with non-existent parent directories succeeds; file is created on disk with correct content.
  • REQ-050: Implement EditFileTool::execute (basic): extract path, old_text, new_text; read file; replace the first occurrence of old_text with new_text; write back; return confirmation text. (Source: [PS])

    • Depends on: REQ-010, REQ-021
    • Definition of Done: A known substitution in an existing file is applied correctly; confirmation message reports old/new line counts.
  • REQ-051: Implement ListFilesTool::execute (basic): extract path, pattern, max_depth; build and run find command with exclusions for target/, .git/, node_modules/; return file paths as text. (Source: [PS])

    • Depends on: REQ-010, REQ-021
    • Definition of Done: Listing a known directory returns its files; excluded directories do not appear in results.
  • REQ-052: Implement SearchTool::execute (basic): extract pattern, path, include, case_sensitive; prefer rg, fall back to grep; return matching lines. (Source: [PS])

    • Depends on: REQ-010, REQ-021
    • Definition of Done: Searching for a known string in a known directory returns matching file paths and line content.
  • REQ-053: Implement default_tools() returning a Vec<Box<dyn AgentTool>> containing all six built-in tools: Bash, ReadFile, WriteFile, EditFile, ListFiles, Search. (Source: [AR])

    • Depends on: REQ-047 through REQ-052
    • Definition of Done: default_tools() returns exactly 6 tools with distinct names.
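
The two output shapes quoted in REQ-047 can be sketched as a pure formatting function. This is a minimal illustration, not the library's code: the function name `format_bash_output` is hypothetical, and the rule "use the short form when stderr is empty" is an assumption, since the requirement does not say when each shape applies.

```rust
/// Formats captured subprocess output in the two shapes quoted in REQ-047.
/// Assumption: the short form is chosen when stderr is empty.
pub fn format_bash_output(exit_code: i32, stdout: &str, stderr: &str) -> String {
    if stderr.is_empty() {
        // Short form: exit code followed directly by stdout.
        format!("Exit code: {exit_code}\n{stdout}")
    } else {
        // Long form: both streams, each under its own label.
        format!("Exit code: {exit_code}\nSTDOUT:\n{stdout}\nSTDERR:\n{stderr}")
    }
}
```

For `echo "hello"` with exit code 0 and empty stderr, the result contains both "Exit code: 0" and "hello", which is what the Definition of Done for REQ-047 checks.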

Milestone 2.6 — Context Compaction (Happy Path)

  • REQ-054: Implement estimate_tokens(text) -> usize using the heuristic ceil(byte_length / 4). (Source: [PS])

    • Depends on: —
    • Definition of Done: estimate_tokens("hello") returns 2 (5 bytes / 4, rounded up).
  • REQ-055: Implement content_tokens(content: Vec<Content>) -> usize and message_tokens(msg: AgentMessage) -> usize per the specified formulas (image tokens: clamp(raw_bytes/750, 85, 16000); per-message overhead: +4 for user/assistant, +8 for tool result). (Source: [PS])

    • Depends on: REQ-001, REQ-003, REQ-054
    • Definition of Done: Token counts match the specified formulas for each content type.
  • REQ-056: Implement compact_messages(messages, config) -> Vec<AgentMessage>: if under budget, return unchanged; else cascade through Level 1 → Level 2 → Level 3 until budget is satisfied. (Source: [PS])

    • Depends on: REQ-055, REQ-057, REQ-058, REQ-059
    • Definition of Done: compact_messages called on a history exceeding budget returns a smaller history with total_tokens <= budget.
  • REQ-057: Implement level1_truncate_tool_outputs: for each ToolResult message, truncate each Text content block to at most max_lines using head+tail preservation with an omission marker. (Source: [PS])

    • Depends on: REQ-003, REQ-054
    • Definition of Done: A 200-line tool output truncated to max_lines=50 produces a 50-line result with "[... N lines truncated ...]" marker.
  • REQ-058: Implement level2_summarize_old_turns: keep the last keep_recent messages in full; replace older assistant+tool-result groups with a single one-line summary user message. (Source: [PS])

    • Depends on: REQ-003, REQ-054
    • Definition of Done: Old assistant messages and their tool results are replaced by "[Summary] ..." user messages; recent messages are untouched.
  • REQ-059: Implement level3_drop_middle: keep keep_first head messages and keep_recent tail messages; replace the dropped middle with a marker message. Implement keep_within_budget fallback that greedily keeps the most-recent messages fitting the budget. (Source: [PS])

    • Depends on: REQ-003, REQ-054
    • Definition of Done: Result contains the first N and last M messages with a marker; total tokens fits the budget.
  • REQ-060: Integrate compact_messages call in run_loop before each LLM call when context_config is Some. (Source: [PS])

    • Depends on: REQ-038, REQ-056
    • Definition of Done: When configured, each LLM call is preceded by a compaction pass; when context_config is None, no compaction occurs.
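
The token heuristics and the Level 1 head+tail truncation above can be sketched with the standard library alone. The function names `image_tokens` and `truncate_head_tail` are hypothetical, and two details are assumptions: the marker line counts toward max_lines (so a 200-line input with max_lines=50 yields exactly 50 lines), and inputs with max_lines below 3 are returned unchanged.

```rust
/// Heuristic token estimate: ceil(byte_length / 4).
pub fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Image token estimate: clamp(raw_bytes / 750, 85, 16000).
pub fn image_tokens(raw_bytes: usize) -> usize {
    (raw_bytes / 750).clamp(85, 16_000)
}

/// Keeps the head and tail of `text` and replaces the middle with a marker.
/// Assumption: the marker occupies one of the `max_lines` output lines.
pub fn truncate_head_tail(text: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() <= max_lines || max_lines < 3 {
        return text.to_string();
    }
    let keep = max_lines - 1; // one line is reserved for the marker
    let head = (keep + 1) / 2; // head gets the extra line when `keep` is odd
    let tail = keep - head;
    let omitted = lines.len() - keep;

    let mut out: Vec<String> = lines[..head].iter().map(|s| s.to_string()).collect();
    out.push(format!("[... {omitted} lines truncated ...]"));
    out.extend(lines[lines.len() - tail..].iter().map(|s| s.to_string()));
    out.join("\n")
}
```

Under these assumptions, `estimate_tokens("hello")` is 2, and truncating 200 lines to 50 keeps 25 head lines, the marker, and 24 tail lines.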

Milestone 2.7 — Execution Limits

  • REQ-061: Implement ExecutionTracker::record_turn(tokens: usize) (increments turns and adds to tokens_used) and check_limits() -> Option<String> (returns a reason string if any limit is exceeded: turns, total tokens, or wall-clock duration). (Source: [AR])

    • Depends on: REQ-012
    • Definition of Done: check_limits() returns None when under all limits and Some("max turns exceeded") when over.
  • REQ-062: Integrate execution limit checking in run_loop: call tracker.check_limits() at the start of each inner loop iteration; if exceeded, append a synthetic User message "[Agent stopped: {reason}]", emit MessageStart/MessageEnd, and return. (Source: [PS])

    • Depends on: REQ-038, REQ-061
    • Definition of Done: An agent with max_turns=2 stops after exactly 2 LLM calls; the last message contains the stop reason.
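
A minimal sketch of the tracker described in REQ-061 follows. The `ExecutionLimits` struct, its field names, and the two reason strings other than "max turns exceeded" are assumptions made for illustration. The turn check uses `>=` so that an agent with max_turns=2 is stopped at the start of the third iteration, matching the Definition of Done for REQ-062.

```rust
use std::time::{Duration, Instant};

/// Hypothetical limits container; field names are assumptions.
pub struct ExecutionLimits {
    pub max_turns: Option<usize>,
    pub max_total_tokens: Option<usize>,
    pub max_duration: Option<Duration>,
}

pub struct ExecutionTracker {
    limits: ExecutionLimits,
    turns: usize,
    tokens_used: usize,
    started_at: Instant,
}

impl ExecutionTracker {
    pub fn new(limits: ExecutionLimits) -> Self {
        Self { limits, turns: 0, tokens_used: 0, started_at: Instant::now() }
    }

    /// Records one completed LLM turn and its token usage.
    pub fn record_turn(&mut self, tokens: usize) {
        self.turns += 1;
        self.tokens_used += tokens;
    }

    /// Returns a reason string once any configured limit has been reached.
    pub fn check_limits(&self) -> Option<String> {
        if matches!(self.limits.max_turns, Some(max) if self.turns >= max) {
            return Some("max turns exceeded".to_string());
        }
        if matches!(self.limits.max_total_tokens, Some(max) if self.tokens_used >= max) {
            return Some("max total tokens exceeded".to_string());
        }
        if matches!(self.limits.max_duration, Some(max) if self.started_at.elapsed() >= max) {
            return Some("max duration exceeded".to_string());
        }
        None
    }
}
```

The loop would call `check_limits()` before each LLM call and `record_turn()` after it, so the check always sees the count of completed turns.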

Milestone 2.8 — Message Persistence and Agent Control

  • REQ-063: Implement Agent::save_messages() -> String: serialize agent.messages to a JSON string. (Source: [OV])

    • Depends on: REQ-018
    • Definition of Done: save_messages() returns a valid JSON array; the string can be parsed back without error.
  • REQ-064: Implement Agent::restore_messages(json: &str): deserialize the JSON string into Vec<AgentMessage> and replace agent.messages. (Source: [OV])

    • Depends on: REQ-018, REQ-063
    • Definition of Done: After save_messages() → restore_messages(), the agent's message history is identical to the original.
  • REQ-065: Implement Agent::reset(): clear messages, drain both queues, cancel any active run, reset is_streaming to false, drop the cancel token. (Source: [AR])

    • Depends on: REQ-033
    • Definition of Done: After reset(), messages is empty, both queues are empty, and is_streaming is false.
  • REQ-066: Implement Agent::steer(msg: AgentMessage) (push to steering_queue) and Agent::follow_up(msg: AgentMessage) (push to follow_up_queue). (Source: [AR])

    • Depends on: REQ-027
    • Definition of Done: After steer(msg), the steering queue contains exactly that message and is safe to read from another thread.
  • REQ-067: Implement Agent::abort(): if a cancel token exists, call cancel() on it. (Source: [AR])

    • Depends on: REQ-033, REQ-035
    • Definition of Done: Calling abort() during an active run causes cancel.is_cancelled() to return true inside the running agent loop.

Level 3 — Smart

Goal: The system handles reality. Invalid inputs, missing data, external failures, and edge cases are all handled gracefully. Every [invariant] and ERROR branch from the pseudocode is implemented.

Completion Criteria: No unhandled exception can be triggered by a known class of bad input. All error paths from ../architecture/algorithms.md are covered: provider failures, tool errors, context overflow, execution limits, filter rejections, and cancellation.


Milestone 3.1 — Input Filter Chain

  • REQ-068: Implement the input filter chain at the start of agent_loop: join all Text content from User messages in prompts, run each registered InputFilter in order. (Source: [PS])

    • Depends on: REQ-022, REQ-036
    • Definition of Done: A filter registered via with_input_filter is called with the user's text before any LLM call.
  • REQ-069: On first Reject result, emit InputRejected { reason } then AgentEnd { messages: [] } and return an empty message list immediately. (Source: [PS])

    • Depends on: REQ-068
    • Definition of Done: A rejecting filter stops the run before the first LLM call; the caller's event stream contains InputRejected followed by AgentEnd.
  • REQ-070: Accumulate Warn results; after all filters pass, append all warning text as Content::Text to the last User message before it is appended to context. (Source: [PS])

    • Depends on: REQ-068
    • Definition of Done: A warning filter adds "[Warning: ...]" text to the user message; the run continues normally.
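
The filter chain in REQ-068 through REQ-070 reduces to a short fold over the registered filters. The sketch below is illustrative only: the trait method name `filter`, the `FilterResult` and `ChainOutcome` shapes, and the two example filters are assumptions, not the library's definitions.

```rust
/// Hypothetical result type for a single filter.
pub enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

/// Hypothetical trait shape; the real InputFilter may differ.
pub trait InputFilter {
    fn filter(&self, text: &str) -> FilterResult;
}

#[derive(Debug, PartialEq)]
pub enum ChainOutcome {
    Accepted { warnings: Vec<String> },
    Rejected { reason: String },
}

/// Runs filters in order: the first Reject stops the chain, Warn results accumulate.
pub fn run_filters(filters: &[Box<dyn InputFilter>], text: &str) -> ChainOutcome {
    let mut warnings = Vec::new();
    for filter in filters {
        match filter.filter(text) {
            FilterResult::Pass => {}
            FilterResult::Warn(w) => warnings.push(w),
            FilterResult::Reject(reason) => return ChainOutcome::Rejected { reason },
        }
    }
    ChainOutcome::Accepted { warnings }
}

/// Example filter: warns when the input is longer than `max_chars`.
pub struct LengthWarning {
    pub max_chars: usize,
}

impl InputFilter for LengthWarning {
    fn filter(&self, text: &str) -> FilterResult {
        if text.chars().count() > self.max_chars {
            FilterResult::Warn(format!("input exceeds {} characters", self.max_chars))
        } else {
            FilterResult::Pass
        }
    }
}

/// Example filter: rejects input containing a banned word.
pub struct BannedWord {
    pub word: String,
}

impl InputFilter for BannedWord {
    fn filter(&self, text: &str) -> FilterResult {
        if text.contains(&self.word) {
            FilterResult::Reject(format!("input contains banned word: {}", self.word))
        } else {
            FilterResult::Pass
        }
    }
}
```

On `Accepted`, the caller would append each warning as text to the last User message; on `Rejected`, it would emit InputRejected and then AgentEnd with an empty message list.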

Milestone 3.2 — Retry Engine

  • REQ-071: Implement delay_for_attempt(config, attempt) -> Duration: exponential backoff formula initial_delay_ms * (multiplier ^ (attempt - 1)), capped at max_delay_ms, multiplied by a uniform random jitter in [0.8, 1.2]. (Source: [PS])

    • Depends on: REQ-013
    • Definition of Done: With defaults, attempt 1 produces a duration in [800ms, 1200ms]; attempt 3 produces a duration in [3200ms, 4800ms].
  • REQ-072: Implement is_retryable() on ProviderError: returns true only for RateLimited and Network variants. (Source: [AR])

    • Depends on: REQ-020
    • Definition of Done: Auth, Api, ContextOverflow, Cancelled, Other all return false; RateLimited and Network return true.
  • REQ-073: Implement retry_after() on ProviderError: extracts retry_after_ms from RateLimited { retry_after_ms: Some(n) } if present; returns None otherwise. (Source: [AR])

    • Depends on: REQ-020
    • Definition of Done: ProviderError::RateLimited { retry_after_ms: Some(5000) }.retry_after() returns Some(Duration::from_millis(5000)).
  • REQ-074: Integrate retry loop into stream_assistant_response: on a retryable error, sleep for retry_after() if it returns Some, otherwise for delay_for_attempt(config, attempt), and retry up to max_retries times; stop retrying if cancel.is_cancelled(). (Source: [PS])

    • Depends on: REQ-039, REQ-071, REQ-072, REQ-073
    • Definition of Done: A RateLimited error causes the loop to wait and retry; after exhausting retries, the error is propagated as an Error stop reason.
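
The backoff formula from REQ-071 can be sketched as follows. To keep the example deterministic and free of external crates, the jitter factor is passed in as a parameter rather than drawn from a random number generator; a real implementation would sample it uniformly from [0.8, 1.2]. The `RetryConfig` field names and the default values (1000 ms initial delay and multiplier 2.0, inferred from the Definition of Done; a 30 s cap and 3 retries, which are pure assumptions) are illustrative.

```rust
use std::time::Duration;

/// Hypothetical config shape; field names follow the formula in REQ-071.
pub struct RetryConfig {
    pub initial_delay_ms: u64,
    pub multiplier: f64,
    pub max_delay_ms: u64,
    pub max_retries: u32,
}

impl Default for RetryConfig {
    fn default() -> Self {
        // 1000 ms and 2.0 reproduce the ranges in the Definition of Done;
        // the cap and retry count are assumptions.
        Self { initial_delay_ms: 1000, multiplier: 2.0, max_delay_ms: 30_000, max_retries: 3 }
    }
}

/// initial_delay_ms * multiplier^(attempt - 1), capped, then scaled by jitter.
/// `jitter` is clamped to [0.8, 1.2]; attempts are 1-based.
pub fn delay_for_attempt(config: &RetryConfig, attempt: u32, jitter: f64) -> Duration {
    let exponent = attempt.saturating_sub(1) as i32;
    let base = config.initial_delay_ms as f64 * config.multiplier.powi(exponent);
    let capped = base.min(config.max_delay_ms as f64);
    Duration::from_millis((capped * jitter.clamp(0.8, 1.2)).round() as u64)
}
```

With these defaults, attempt 3 has a base delay of 1000 × 2² = 4000 ms, so the jittered result falls in [3200 ms, 4800 ms].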

Milestone 3.3 — Provider Error Classification

  • REQ-075: Implement ProviderError::classify(status: u16, message: String) -> ProviderError: route to ContextOverflow first (empty body with status 400/413, or a matching overflow phrase, as detected by is_context_overflow), then RateLimited (429), then Auth (401/403), then Api. (Source: [PS])

    • Depends on: REQ-020
    • Definition of Done: HTTP 429 maps to RateLimited; HTTP 401 maps to Auth; "prompt is too long" in the body maps to ContextOverflow.
  • REQ-076: Implement is_context_overflow(status, message) -> bool: check for empty body with status 400/413 (Cerebras/Mistral pattern); check for any of 15+ documented overflow phrases (case-insensitive substring match). (Source: [PS])

    • Depends on: —
    • Definition of Done: All 15 documented overflow phrases are recognized; unrelated 400 errors with non-empty body are not misclassified.
  • REQ-077: Implement context overflow recovery: when the streaming error event contains a message matching overflow detection (Message::is_context_overflow()), treat it as an overflow on the next turn by triggering compact_messages (if context_config is set). (Source: [AR])

    • Depends on: REQ-056, REQ-075, REQ-076
    • Definition of Done: A mock that returns an overflow error on turn 1 causes compaction before turn 2.
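
The classification order in REQ-075 and REQ-076 can be sketched as below. The enum here is a reduced stand-in with assumed payload shapes, not the full ProviderError, and the phrase table contains only the one phrase quoted in this document; the remaining documented phrases are deliberately not reproduced.

```rust
/// Reduced stand-in for ProviderError; payload shapes are assumptions.
#[derive(Debug, PartialEq)]
pub enum ProviderError {
    ContextOverflow { message: String },
    RateLimited { retry_after_ms: Option<u64> },
    Auth { message: String },
    Api { status: u16, message: String },
}

/// Only the phrase quoted in the requirements is listed here.
const OVERFLOW_PHRASES: &[&str] = &["prompt is too long"];

/// Empty body with 400/413, or a case-insensitive phrase match.
pub fn is_context_overflow(status: u16, message: &str) -> bool {
    if (status == 400 || status == 413) && message.trim().is_empty() {
        return true;
    }
    let lower = message.to_lowercase();
    OVERFLOW_PHRASES.iter().any(|phrase| lower.contains(phrase))
}

/// Checks overflow first, then routes by HTTP status.
pub fn classify(status: u16, message: String) -> ProviderError {
    if is_context_overflow(status, &message) {
        return ProviderError::ContextOverflow { message };
    }
    match status {
        429 => ProviderError::RateLimited { retry_after_ms: None },
        401 | 403 => ProviderError::Auth { message },
        _ => ProviderError::Api { status, message },
    }
}
```

Checking overflow before the status match matters because providers report overflow under several status codes, including 400, which would otherwise fall through to Api.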

Milestone 3.4 — Tool Error Handling

  • REQ-078: On ToolError::Failed(msg) or ToolError::InvalidArgs(msg): convert to a ToolResult with content: [Text(msg)] and is_error: true; always return this to the LLM so it can self-correct. (Source: [AR])

    • Depends on: REQ-010, REQ-046
    • Definition of Done: A tool that returns Err(Failed("oops")) produces a ToolResult message with is_error: true and the text "oops".
  • REQ-079: On ToolError::NotFound(name): produce ToolResult { content: [Text("Tool {name} not found")], is_error: true }. (Source: [PS])

    • Depends on: REQ-046
    • Definition of Done: Requesting a non-existent tool name in a tool call produces a NotFound error result.
  • REQ-080: On ToolError::Cancelled: produce ToolResult { content: [Text("Skipped due to queued user message.")], is_error: true }. (Source: [AR])

    • Depends on: REQ-010, REQ-046
    • Definition of Done: A tool skipped due to steering produces the documented skipped message.
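
The three conversions in this milestone share one shape: every ToolError becomes a ToolResult with is_error set, so the model always sees the failure. The types below are reduced stand-ins with assumed field layouts, and `error_to_result` is a hypothetical helper name; the message strings are the ones quoted in REQ-078 through REQ-080.

```rust
/// Reduced stand-ins for the library types; shapes are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Content {
    Text(String),
}

#[derive(Debug)]
pub enum ToolError {
    Failed(String),
    InvalidArgs(String),
    NotFound(String),
    Cancelled,
}

#[derive(Debug, PartialEq)]
pub struct ToolResult {
    pub content: Vec<Content>,
    pub is_error: bool,
}

/// Converts any tool error into an error-flagged result for the LLM.
pub fn error_to_result(err: ToolError) -> ToolResult {
    let text = match err {
        ToolError::Failed(msg) | ToolError::InvalidArgs(msg) => msg,
        ToolError::NotFound(name) => format!("Tool {name} not found"),
        ToolError::Cancelled => "Skipped due to queued user message.".to_string(),
    };
    ToolResult { content: vec![Content::Text(text)], is_error: true }
}
```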

Milestone 3.5 — Error and Abort Stop Reason Handling

  • REQ-081: In run_loop, when the assistant message has stop_reason == Error: call on_error(error_message) if defined, call after_turn if defined, emit TurnEnd, return immediately. (Source: [PS])

    • Depends on: REQ-038, REQ-082
    • Definition of Done: A mock provider that returns an error stop reason causes the loop to exit; on_error is called with the message text.
  • REQ-082: In run_loop, when stop_reason == Aborted: call after_turn if defined, emit TurnEnd, return immediately. (Source: [PS])

    • Depends on: REQ-038
    • Definition of Done: Calling agent.abort() mid-run causes the loop to exit cleanly; TurnEnd is emitted.
  • REQ-083: Construct a synthetic error Message::Assistant on irrecoverable provider failure (after retry exhaustion): empty content, stop_reason: Error, error_message: Some(e.to_string()). (Source: [PS])

    • Depends on: REQ-002, REQ-039
    • Definition of Done: A provider that always fails produces an Assistant message with stop_reason: Error containing the provider's error text.

Milestone 3.6 — Sequential and Batched Tool Execution

  • REQ-084: Implement execute_sequential: execute tool calls one at a time; after each, check the steering queue; on non-empty steering, skip remaining tools with ToolError::Cancelled results and return steering messages. (Source: [PS])

    • Depends on: REQ-046, REQ-080
    • Definition of Done: With steering arriving after tool 1 of 3, tools 2 and 3 receive skipped error results; the steering message is returned for injection.
  • REQ-085: Implement execute_batch (Parallel): launch all tools concurrently via join_all; after all complete, check steering once; return steering if present. (Source: [PS])

    • Depends on: REQ-046
    • Definition of Done: Three parallel tools all complete; steering arriving before their completion is returned after all finish.
  • REQ-086: Implement Batched { size } dispatch: split tool calls into groups of size; run each group via execute_batch; check steering between groups; on steering, skip remaining groups with cancelled results. (Source: [PS])

    • Depends on: REQ-085
    • Definition of Done: With 5 tool calls, Batched { size: 2 } executes groups [1,2], [3,4], [5]; steering after group 1 skips groups 2 and 3.
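
The grouping and skip logic of Batched { size } can be sketched without an async runtime. This is a simplification under stated assumptions: each group is "executed" by recording an outcome in order, standing in for a concurrent execute_batch call, and the steering check is modeled as a caller-supplied closure. The names `Outcome` and `run_batched` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Executed(String),
    Skipped(String),
}

/// Splits `calls` into groups of `size`, checks steering after each group,
/// and marks every call in the remaining groups as skipped once steering arrives.
/// `steering_after_group` receives the 0-based index of the group that just ran.
pub fn run_batched(
    calls: &[&str],
    size: usize,
    mut steering_after_group: impl FnMut(usize) -> bool,
) -> Vec<Outcome> {
    let mut outcomes = Vec::with_capacity(calls.len());
    let mut steered = false;
    // `size.max(1)` guards against a zero group size, which `chunks` rejects.
    for (group_index, group) in calls.chunks(size.max(1)).enumerate() {
        if steered {
            outcomes.extend(group.iter().map(|c| Outcome::Skipped(c.to_string())));
            continue;
        }
        // A real implementation would run this group concurrently here.
        outcomes.extend(group.iter().map(|c| Outcome::Executed(c.to_string())));
        steered = steering_after_group(group_index);
    }
    outcomes
}
```

With five calls and size 2, `chunks` yields the groups [1,2], [3,4], [5]; if steering is observed after the first group, calls 3 through 5 come back as skipped.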

Milestone 3.7 — Steering and Follow-up Queue Integration

  • REQ-087: In run_loop, drain the steering queue at the start of the outer loop before the first inner-loop iteration. (Source: [PS])

    • Depends on: REQ-038
    • Definition of Done: Messages enqueued via steer() before prompt() is called are injected as the first pending messages.
  • REQ-088: After tool execution, if steering messages were captured, set them as pending and continue the inner loop (injecting them before the next LLM call). (Source: [PS])

    • Depends on: REQ-038, REQ-084, REQ-085
    • Definition of Done: A steering message injected during tool execution appears in context before the subsequent LLM call.
  • REQ-089: After the inner loop exits (no tool calls, no pending steering), check the follow-up queue; if non-empty, add follow-up messages to pending and continue the outer loop. (Source: [PS])

    • Depends on: REQ-038
    • Definition of Done: A follow-up message enqueued via follow_up() causes the agent to re-enter the loop rather than stopping.
  • REQ-090: Implement QueueMode::OneAtATime (pop exactly one message per read) and QueueMode::All (drain the entire queue per read). Both modes are thread-safe (mutex-protected). (Source: [AR])

    • Depends on: REQ-017, REQ-027
    • Definition of Done: OneAtATime leaves remaining messages in the queue; All empties it; both are safe to call from the agent loop while another thread pushes.
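
The two queue modes from REQ-090 differ only in how much a single read removes. A minimal mutex-protected sketch follows; the `MessageQueue` type and its method names are hypothetical, and the element type is left generic so the example does not depend on AgentMessage.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

#[derive(Clone, Copy)]
pub enum QueueMode {
    OneAtATime,
    All,
}

/// Hypothetical queue wrapper; share it across threads via Arc.
pub struct MessageQueue<T> {
    inner: Mutex<VecDeque<T>>,
    mode: QueueMode,
}

impl<T> MessageQueue<T> {
    pub fn new(mode: QueueMode) -> Self {
        Self { inner: Mutex::new(VecDeque::new()), mode }
    }

    pub fn push(&self, msg: T) {
        self.inner.lock().unwrap().push_back(msg);
    }

    /// OneAtATime pops at most one message; All drains the whole queue.
    pub fn read(&self) -> Vec<T> {
        let mut queue = self.inner.lock().unwrap();
        match self.mode {
            QueueMode::OneAtATime => queue.pop_front().into_iter().collect(),
            QueueMode::All => queue.drain(..).collect(),
        }
    }

    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}
```

Because `push` and `read` each take the lock for the duration of one operation, a producer thread can call `push` while the agent loop calls `read`.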

Milestone 3.8 — Lifecycle Callbacks

  • REQ-091: Call before_turn(messages, turn_number) -> bool at the start of each turn (before the LLM call). If it returns false, return from run_loop immediately without emitting AgentEnd. (Source: [PS])

    • Depends on: REQ-038
    • Definition of Done: A before_turn that returns false on turn 2 stops the loop after turn 1; AgentEnd is not emitted.
  • REQ-092: Call after_turn(messages, usage) after each LLM call and its tool executions, including on error/abort paths. (Source: [PS])

    • Depends on: REQ-038
    • Definition of Done: after_turn is called exactly once per turn, including when the turn ends in an error.
  • REQ-093: Call on_error(message: &str) when stop_reason == Error. (Source: [PS])

    • Depends on: REQ-081
    • Definition of Done: An error-returning provider invokes the on_error callback with the error message string.

Milestone 3.9 — Tool Safety and Edge Cases

  • REQ-094: BashTool: check each deny_pattern against the command (substring match) before execution; return Err(Failed("Command blocked...")) on match. (Source: [PS])

    • Depends on: REQ-047
    • Definition of Done: A command containing a deny pattern is rejected before any subprocess is spawned.
  • REQ-095: BashTool: race subprocess completion against a configurable timeout and the cancellation token; on timeout return Err(Failed("Command timed out after Ns")); on cancellation return Err(Cancelled). (Source: [PS])

    • Depends on: REQ-047
    • Definition of Done: sleep 300 with a 2s timeout produces a timeout error; cancellation produces Cancelled.
  • REQ-096: BashTool: truncate stdout and stderr independently at max_output_bytes (default 256KB) and append "\n... (output truncated)". (Source: [PS])

    • Depends on: REQ-047
    • Definition of Done: Output exceeding 256KB is truncated with the documented suffix.
  • REQ-097: BashTool: optional confirm_fn callback; if defined and returns false, return Err(Failed("Command was not confirmed by the user.")). (Source: [PS])

    • Depends on: REQ-047
    • Definition of Done: A rejecting confirm_fn prevents subprocess execution.
  • REQ-098: ReadFileTool: check file size before reading. Text files exceeding max_bytes (1MB): return Err(Failed("File too large. Use offset/limit...")). Image files exceeding 20MB: return Err(Failed("Image too large")). (Source: [PS])

    • Depends on: REQ-048
    • Definition of Done: Reading a file above the size limit returns the documented error without reading the file contents.
  • REQ-099: ReadFileTool: for image extensions, read file as bytes, base64-encode, detect MIME type from extension, return Content::Image. (Source: [PS])

    • Depends on: REQ-001, REQ-048
    • Definition of Done: Reading a .png file returns a ToolResult with Content::Image { data: base64, mime_type: "image/png" }.
  • REQ-100: ReadFileTool: check ctx.cancel.is_cancelled() before each I/O operation; return Err(Cancelled) if set. (Source: [PS])

    • Depends on: REQ-048
    • Definition of Done: Cancelling before a read returns Cancelled without touching the file.
  • REQ-101: EditFileTool: if old_text matches zero occurrences, attempt find_similar_text for a fuzzy hint; return Err(Failed("old_text not found... Did you mean: ...")). (Source: [PS])

    • Depends on: REQ-050
    • Definition of Done: An edit with wrong old_text returns a Failed error; if a similar line exists, the hint is included.
  • REQ-102: EditFileTool: if old_text matches more than one occurrence, return Err(Failed("old_text matches N locations. Include more context...")). (Source: [PS])

    • Depends on: REQ-050
    • Definition of Done: Attempting to replace ambiguous text returns a descriptive error with the match count.
  • REQ-103: EditFileTool: check ctx.cancel.is_cancelled() before each I/O operation. (Source: [PS])

    • Depends on: REQ-050
    • Definition of Done: Cancellation before read or write returns Err(Cancelled).
  • REQ-104: WriteFileTool: check ctx.cancel.is_cancelled() before writing. (Source: [AR])

    • Depends on: REQ-049
    • Definition of Done: Cancellation prevents the write from occurring.
  • REQ-105: ListFilesTool: race find execution against a timeout (default 10s) and the cancellation token; truncate results at max_results (default 200) with a truncation suffix. (Source: [PS])

    • Depends on: REQ-051
    • Definition of Done: Listing a directory with 500 files returns 200 with the truncation message.
  • REQ-106: SearchTool: fall back from rg to grep if ripgrep is not available on the system. Check ctx.cancel.is_cancelled() before execution. (Source: [PS])

    • Depends on: REQ-052
    • Definition of Done: Search succeeds on a system without rg installed; cancellation is respected.
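
Three of the guards above are pure string operations and can be sketched in isolation: the deny-pattern check (REQ-094), output truncation (REQ-096), and the unique-match rule for edits (REQ-101, REQ-102). The function names are hypothetical, the deny patterns used in any call are the caller's own, and the error strings are shortened placeholders rather than the full documented messages. Truncation backs up to a UTF-8 character boundary, which is an assumption about how byte limits are applied to text.

```rust
/// Returns the first deny pattern found in `command` (substring match).
pub fn first_denied<'a>(command: &str, deny_patterns: &[&'a str]) -> Option<&'a str> {
    deny_patterns.iter().copied().find(|p| command.contains(*p))
}

/// Truncates to at most `max_bytes`, never splitting a UTF-8 character.
pub fn truncate_output(output: &str, max_bytes: usize) -> String {
    if output.len() <= max_bytes {
        return output.to_string();
    }
    let mut end = max_bytes;
    while !output.is_char_boundary(end) {
        end -= 1; // index 0 is always a boundary, so this terminates
    }
    format!("{}\n... (output truncated)", &output[..end])
}

/// Applies an edit only when `old_text` occurs exactly once.
pub fn apply_edit(source: &str, old_text: &str, new_text: &str) -> Result<String, String> {
    if old_text.is_empty() {
        // Assumption: an empty search string is treated as an error.
        return Err("old_text must not be empty".to_string());
    }
    match source.matches(old_text).count() {
        0 => Err("old_text not found".to_string()),
        1 => Ok(source.replacen(old_text, new_text, 1)),
        n => Err(format!("old_text matches {n} locations")),
    }
}
```

Running the deny check before spawning anything is what lets the Definition of Done for REQ-094 hold: a blocked command never reaches the subprocess layer.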

Milestone 3.10 — Agent Invariants

  • REQ-107: In prompt_messages_with_sender, assert !self.is_streaming with a clear panic message before proceeding. (Source: [PS])

    • Depends on: REQ-035
    • Definition of Done: Calling prompt() while a run is active panics with a message directing the caller to use steer() or follow_up().
  • REQ-108: In agent_loop_continue, validate preconditions: context.messages is non-empty and the last message is not an Assistant variant. (Source: [PS])

    • Depends on: REQ-037
    • Definition of Done: Calling agent_loop_continue with an empty context or with a trailing assistant message returns an error or panics with a clear message.

Milestone 3.11 — Skill System

  • REQ-109: Implement SkillSet::load(dirs: Vec<Path>): iterate directories, skip missing ones silently, scan each for subdirectories containing SKILL.md, parse frontmatter, build a name-keyed map (later dirs override earlier on collision), return sorted SkillSet. (Source: [PS])

    • Depends on: REQ-110
    • Definition of Done: Loading two dirs where both contain a skill named "foo" results in the second dir's version being used.
  • REQ-110: Implement parse_frontmatter(content) -> (name, description): require content to begin with ---, extract YAML block up to next \n---, parse name: and description: lines, strip surrounding quotes, return Err(InvalidFrontmatter) or Err(MissingField) on failure. (Source: [PS])

    • Depends on: —
    • Definition of Done: Valid frontmatter parses correctly; missing name field returns a MissingField error; missing delimiters return InvalidFrontmatter.
  • REQ-111: Implement SkillSet::format_for_prompt(): emit <available_skills> XML block with one <skill> element per skill (sorted by name ascending), XML-escaping all string values; return empty string if no skills loaded. (Source: [PS])

    • Depends on: REQ-109
    • Definition of Done: Output is well-formed XML; special characters in skill names/descriptions are correctly escaped.
  • REQ-112: Implement SkillSet::load_dir(dir, source) and SkillSet::merge(other). (Source: [AR])

    • Depends on: REQ-109
    • Definition of Done: merge causes the other's skills to override on name conflict.
  • REQ-113: Implement Agent::with_skills(skill_set): call format_for_prompt() and append the XML block to self.system_prompt. (Source: [PS])

    • Depends on: REQ-111
    • Definition of Done: After with_skills(set), the agent's system prompt contains the <available_skills> XML block.
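
The frontmatter rules in REQ-110 can be sketched with plain string handling. This is a simplified line-based reader, not a YAML parser, and the `SkillError` enum shape (including the field name carried by MissingField) is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub enum SkillError {
    InvalidFrontmatter,
    MissingField(&'static str),
}

/// Strips surrounding whitespace and one layer of single or double quotes.
fn unquote(value: &str) -> String {
    value.trim().trim_matches(|c| c == '"' || c == '\'').to_string()
}

/// Extracts (name, description) from a SKILL.md body that starts with `---`.
pub fn parse_frontmatter(content: &str) -> Result<(String, String), SkillError> {
    let rest = content.strip_prefix("---").ok_or(SkillError::InvalidFrontmatter)?;
    let end = rest.find("\n---").ok_or(SkillError::InvalidFrontmatter)?;

    let mut name = None;
    let mut description = None;
    for line in rest[..end].lines() {
        let line = line.trim();
        if let Some(value) = line.strip_prefix("name:") {
            name = Some(unquote(value));
        } else if let Some(value) = line.strip_prefix("description:") {
            description = Some(unquote(value));
        }
    }

    Ok((
        name.ok_or(SkillError::MissingField("name"))?,
        description.ok_or(SkillError::MissingField("description"))?,
    ))
}
```

A file without the opening or closing `---` yields InvalidFrontmatter, while a well-delimited block lacking `name:` yields MissingField.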

Milestone 3.12 — MCP Client

  • REQ-114: Implement McpClient::connect_stdio(cmd, args, env): spawn subprocess with piped stdin/stdout; complete the 3-step initialize handshake; return Ok(McpClient). (Source: [PS])

    • Depends on: REQ-115, REQ-116
    • Definition of Done: Spawning a compliant MCP server subprocess results in a connected client; server_info is populated from the handshake.
  • REQ-115: Implement McpClient::send_request(method, params): construct a JSON-RPC 2.0 request with auto-incremented atomic ID, send over transport, receive response, return Err(JsonRpc{...}) on error field or Err(Protocol("Empty result")) on missing result. (Source: [PS])

    • Depends on: —
    • Definition of Done: A JSON-RPC response with an error field maps to McpError::JsonRpc; a valid result field is returned as Ok(value).
  • REQ-116: Implement McpClient::list_tools() and McpClient::call_tool(name, args). (Source: [PS])

    • Depends on: REQ-115
    • Definition of Done: list_tools() returns a parsed Vec<McpToolInfo>; call_tool() returns a parsed McpToolCallResult.
  • REQ-117: Implement McpToolAdapter implementing AgentTool: wraps McpToolInfo metadata and an Arc<Mutex<McpClient>>; execute() calls client.call_tool() and converts McpContent to Content variants. (Source: [AR])

    • Depends on: REQ-001, REQ-021, REQ-116
    • Definition of Done: An McpToolAdapter can be registered on an agent and called successfully in a tool-use turn.
  • REQ-118: Handle all McpError variants gracefully: Transport, Protocol, JsonRpc, Serialization, Io, ConnectionClosed all surface as ToolError::Failed with descriptive messages. (Source: [AR])

    • Depends on: REQ-117
    • Definition of Done: Each McpError variant produces a non-panicking ToolError::Failed with a message identifying the error type and context.
  • REQ-119: Implement Agent::with_mcp_server_stdio(cmd, args, env): call McpClient::connect_stdio, then McpToolAdapter::from_client, append resulting tool adapters to self.tools. (Source: [AR])

    • Depends on: REQ-114, REQ-117
    • Definition of Done: After with_mcp_server_stdio, the agent's tool list includes all tools reported by the MCP server.

Level 4 — Professional

Goal: The system is safe, observable, and maintainable. It can be operated with multiple provider backends, supports prompt caching and extended thinking, exposes useful observability hooks, and shuts down gracefully.

Completion Criteria: All 7 provider protocols are implemented. Prompt caching, thinking levels, structured logging, and security-sensitive fields are all handled. The cancellation tree propagates correctly to all I/O boundaries. The system is configurable for production use.


Milestone 4.1 — Full Provider Suite

  • REQ-120: Implement GoogleProvider::stream (Gemini API): POST to {base_url}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={API_KEY}; use custom SSE parser (split on \n\n, extract data: line); map tool calls from functionDeclarations; auto-generate tool IDs as "google-fc-{index}"; tool results as functionResponse parts. (Source: [AR])

    • Depends on: REQ-020
    • Definition of Done: A Gemini streaming response is parsed into the correct StreamEvents; tool IDs are auto-generated in the documented format.
  • REQ-121: Implement GoogleVertexProvider::stream (Vertex AI): identical wire format to Gemini; endpoint pattern https://{region}-aiplatform.googleapis.com/...; auth via Authorization: Bearer {OAUTH_TOKEN}; tool IDs as "vertex-fc-{index}". (Source: [AR])

    • Depends on: REQ-120
    • Definition of Done: Vertex request differs from Gemini only in endpoint and auth header.
  • REQ-122: Implement BedrockProvider::stream (ConverseStream API): endpoint {base_url}/model/{model}/converse-stream; newline-delimited JSON (not standard SSE); parse events contentBlockDelta, contentBlockStart, contentBlockStop, messageStop, metadata; tool spec format: toolSpec { inputSchema: { json: schema } }; tool result format: { toolResult: { toolUseId, content, status } }. (Source: [AR])

    • Depends on: REQ-020
    • Definition of Done: A Bedrock ndjson streaming response is correctly parsed; tool definitions and results are in the Bedrock-specific format.
  • REQ-123: Implement OpenAiResponsesProvider::stream (OpenAI Responses API): endpoint {base_url}/responses; system prompt in "instructions" field; SSE events response.output_text.delta, response.reasoning.delta, response.function_call_arguments.*, response.completed. (Source: [AR])

    • Depends on: REQ-020
    • Definition of Done: The Responses API wire format differs correctly from Chat Completions in system prompt field and event names.
  • REQ-124: Implement AzureOpenAiProvider::stream: endpoint {base_url}/responses?api-version=2025-01-01-preview; auth via api-key: {AZURE_OPENAI_API_KEY} header (not Authorization: Bearer); same request/response format as OpenAI Responses API. (Source: [AR])

    • Depends on: REQ-123
    • Definition of Done: Azure auth uses api-key header; base URL pattern https://{resource}.openai.azure.com/openai/deployments/{deployment} is supported.
  • REQ-125: Register all 7 providers (Anthropic, OpenAiCompat, OpenAiResponses, Azure, Google, Vertex, Bedrock) in ProviderRegistry::default(). (Source: [AR])

    • Depends on: REQ-042, REQ-120 through REQ-124
    • Definition of Done: ProviderRegistry::default() can dispatch to any of the 7 implementations based on protocol selection.
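
The custom SSE parsing described for the Gemini provider (split on \n\n, extract the data: line) and the generated tool ID format can be sketched as follows. Both function names are hypothetical. The parser assumes the buffer holds only complete events; a streaming implementation would also need to carry a partial trailing event over to the next read.

```rust
/// Splits a buffer of complete SSE events on blank lines and returns
/// the payload of the first `data:` line in each event.
pub fn sse_data_payloads(buffer: &str) -> Vec<&str> {
    buffer
        .split("\n\n")
        .filter_map(|event| event.lines().find_map(|line| line.strip_prefix("data:")))
        .map(str::trim)
        .filter(|payload| !payload.is_empty())
        .collect()
}

/// Generates a tool call ID in the "google-fc-{index}" format from REQ-120.
pub fn google_tool_id(index: usize) -> String {
    format!("google-fc-{index}")
}
```

Each returned payload would then be parsed as JSON and mapped to stream events; events without a data: line, such as comments, are skipped.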

Milestone 4.2 — Prompt Caching

  • REQ-126: Implement CacheStrategy::Auto: provider automatically places cache_control: { type: "ephemeral" } breakpoints at the system prompt, the last tool definition, and the second-to-last message. (Source: [AR])

    • Depends on: REQ-014, REQ-040
    • Definition of Done: In Anthropic requests, the three cache breakpoints appear in the correct positions when strategy: Auto.
  • REQ-127: Implement CacheStrategy::Manual { cache_system, cache_tools, cache_messages }: conditionally apply breakpoints per flag. Implement CacheStrategy::Disabled: no breakpoints emitted. (Source: [AR])

    • Depends on: REQ-126
    • Definition of Done: Each flag independently controls placement of its respective cache breakpoint.
  • REQ-128: Propagate Usage.cache_read and Usage.cache_write from Anthropic response metadata into Message::Assistant.usage. (Source: [AR])

    • Depends on: REQ-006, REQ-040
    • Definition of Done: Cache token counts from Anthropic are populated in the usage struct after a cached-hit response.

Milestone 4.3 — Extended Thinking

  • REQ-129: Map ThinkingLevel to Anthropic thinking parameter: Off → omit; Minimal → budget_tokens: 128; Low → 512; Medium → 2048; High → 8192. (Source: [AR])

    • Depends on: REQ-019, REQ-040
    • Definition of Done: Setting ThinkingLevel::Medium causes {type:"enabled", budget_tokens:2048} to appear in the Anthropic request.
  • REQ-130: Map ThinkingLevel to OpenAI-compat reasoning_effort parameter when supports_reasoning_effort flag is set: Minimal/Low → "low"; Medium → "medium"; High → "high". (Source: [AR])

    • Depends on: REQ-019, REQ-041
    • Definition of Done: ThinkingLevel::High with a reasoning-capable provider produces reasoning_effort: "high" in the request body.
  • REQ-131: Parse Thinking content blocks from streaming responses (Anthropic thinking type blocks; OpenAI delta.reasoning_content / xAI delta.reasoning); emit as StreamDelta::Thinking and store as Content::Thinking in the final message. (Source: [AR])

    • Depends on: REQ-001, REQ-008, REQ-040
    • Definition of Done: A streaming response containing thinking/reasoning content produces MessageUpdate events with StreamDelta::Thinking and the final Content::Thinking block in the assembled message.
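
The two ThinkingLevel mappings in REQ-129 and REQ-130 are small lookup tables. The function names below are hypothetical, and returning None for Off in the OpenAI-compatible case (meaning the parameter is omitted) is an assumption, since the requirement only lists Minimal through High.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ThinkingLevel {
    Off,
    Minimal,
    Low,
    Medium,
    High,
}

/// Anthropic: None means the thinking parameter is omitted from the request.
pub fn anthropic_budget_tokens(level: ThinkingLevel) -> Option<u32> {
    match level {
        ThinkingLevel::Off => None,
        ThinkingLevel::Minimal => Some(128),
        ThinkingLevel::Low => Some(512),
        ThinkingLevel::Medium => Some(2048),
        ThinkingLevel::High => Some(8192),
    }
}

/// OpenAI-compatible: value for `reasoning_effort`.
/// Assumption: Off omits the parameter.
pub fn openai_reasoning_effort(level: ThinkingLevel) -> Option<&'static str> {
    match level {
        ThinkingLevel::Off => None,
        ThinkingLevel::Minimal | ThinkingLevel::Low => Some("low"),
        ThinkingLevel::Medium => Some("medium"),
        ThinkingLevel::High => Some("high"),
    }
}
```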

Milestone 4.4 — MCP HTTP Transport

  • REQ-132: Implement McpClient::connect_http(url): POST JSON-RPC bodies to the configured URL (stateless, no persistent connection); complete the initialize handshake. (Source: [AR])

    • Depends on: REQ-115
    • Definition of Done: An HTTP-based MCP server can be connected to and queried for tools.
  • REQ-133: Implement Agent::with_mcp_server_http(url) builder. Support optional tool name prefix ({prefix}__{name}) for namespace disambiguation. (Source: [AR])

    • Depends on: REQ-117, REQ-132
    • Definition of Done: HTTP MCP tools appear in the agent's tool list; with a prefix configured, tool names are formatted as "{prefix}__{name}".
  • REQ-134: On MCP stdio transport shutdown, send EOF on stdin then kill the child process. (Source: [AR])

    • Depends on: REQ-114
    • Definition of Done: Dropping or closing the stdio MCP client terminates the child process cleanly.

Milestone 4.5 — Observability and Logging

  • REQ-135: Implement structured retry logging: when a retry occurs, log attempt number, max retries, delay, and the triggering error at an appropriate log level. (Source: [PS])

    • Depends on: REQ-074
    • Definition of Done: A retried request produces a structured log entry containing all four fields.
  • REQ-136: Implement ContextTracker: combine provider-reported token counts (from Usage) with local estimate_tokens for messages appended since the last provider report. Expose current_tokens() -> usize. (Source: [AR])

    • Depends on: REQ-054, REQ-055
    • Definition of Done: After a turn with known provider-reported usage, current_tokens() reflects the reported value; after additional messages are appended, it adds heuristic estimates.
  • REQ-137: Populate ToolResult.details with structured metadata per tool: BashTool → { exit_code, success }; ReadFileTool → { path }; WriteFileTool → { path }; EditFileTool → { path, old_lines, new_lines }; ListFilesTool → { total, truncated }; SubAgentTool → { sub_agent, turns }. (Source: [AR])

    • Depends on: REQ-047 through REQ-052
    • Definition of Done: ToolResult.details for a bash execution contains exit_code and success keys.
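
The accounting described in REQ-136 can be sketched with the standard library alone. The method names `record_usage` and `record_message` are hypothetical, and the local estimator assumes the four-characters-per-token heuristic with integer division; the real `estimate_tokens` may round differently.

```rust
/// Local heuristic assumed here: four characters per token, rounded down.
fn estimate_tokens(text: &str) -> usize {
    text.chars().count() / 4
}

/// Sketch of a tracker that trusts provider reports and estimates the rest.
#[derive(Debug, Default)]
pub struct ContextTracker {
    /// Token count from the most recent provider-reported usage.
    reported: usize,
    /// Heuristic estimate for messages appended since that report.
    pending_estimate: usize,
}

impl ContextTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Records a provider report (hypothetical name); it supersedes earlier estimates.
    pub fn record_usage(&mut self, total_tokens: usize) {
        self.reported = total_tokens;
        self.pending_estimate = 0;
    }

    /// Accounts for a message appended after the last report (hypothetical name).
    pub fn record_message(&mut self, text: &str) {
        self.pending_estimate += estimate_tokens(text);
    }

    /// Reported tokens plus the estimate for everything appended since.
    pub fn current_tokens(&self) -> usize {
        self.reported + self.pending_estimate
    }
}
```

Resetting the estimate on every provider report keeps heuristic error from accumulating across turns.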

Milestone 4.6 — Security

  • REQ-138: Redact sensitive OpenApiAuth credentials in debug output: Bearer(token) displays as Bearer("****"); ApiKey { value } displays as ApiKey { header: "...", value: "****" }. (Source: [AR])

    • Depends on: —
    • Definition of Done: Printing/logging an OpenApiAuth::Bearer("secret") value produces "****" instead of the actual token.
  • REQ-139: Implement the complete BashTool deny-pattern list (configurable; default list to be specified at implementation time based on the safety policy described in the spec). (Source: [PS])

    • Depends on: REQ-094
    • Definition of Done: A configurable list of deny patterns is applied; at least the patterns documented in the spec are included in the default list.
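
One way to meet REQ-138 is a hand-written `Debug` implementation instead of `#[derive(Debug)]`. The enum below is a simplified stand-in for `OpenApiAuth` that models only the two variants named above; the real type may carry more.

```rust
use std::fmt;

/// Simplified stand-in for the `OpenApiAuth` type named in REQ-138.
pub enum OpenApiAuth {
    Bearer(String),
    ApiKey { header: String, value: String },
}

impl fmt::Debug for OpenApiAuth {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The token itself is never passed to the formatter.
            OpenApiAuth::Bearer(_) => f.debug_tuple("Bearer").field(&"****").finish(),
            // The header name is not secret; only the value is masked.
            OpenApiAuth::ApiKey { header, .. } => f
                .debug_struct("ApiKey")
                .field("header", header)
                .field("value", &"****")
                .finish(),
        }
    }
}
```

Because the secret never reaches the formatter, `{:?}` and `{:#?}` are both covered, as is any log macro that relies on `Debug`.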

Milestone 4.7 — Graceful Cancellation

  • REQ-140: Implement CancellationToken::child_token(): creates a new token that is cancelled when the parent is cancelled. Each ToolContext receives a child token. (Source: [PS])

    • Depends on: REQ-033, REQ-046
    • Definition of Done: Calling agent.abort() (which cancels the root token) causes all active tool contexts' cancel.is_cancelled() to return true simultaneously.
  • REQ-141: SubAgentTool forwards the parent's cancel token to the child agent_loop(), so agent.abort() terminates sub-agents as well. (Source: [PS])

    • Depends on: REQ-033, REQ-140
    • Definition of Done: Aborting the parent agent cancels the sub-agent's run.
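
The parent-to-child semantics of REQ-140 can be illustrated with a polling-only token built on the standard library. This is a sketch under simplifying assumptions: it has no async wake-up and is not presented as phi-core's actual `CancellationToken`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// One node in the token tree: its own flag plus a link to its parent.
#[derive(Default)]
struct Node {
    cancelled: AtomicBool,
    parent: Option<Arc<Node>>,
}

/// Polling-only cancellation token (illustrative sketch).
#[derive(Clone, Default)]
pub struct CancellationToken {
    inner: Arc<Node>,
}

impl CancellationToken {
    pub fn new() -> Self {
        Self::default()
    }

    /// Creates a token that reports cancelled once this token (or any ancestor) is cancelled.
    pub fn child_token(&self) -> Self {
        Self {
            inner: Arc::new(Node {
                cancelled: AtomicBool::new(false),
                parent: Some(Arc::clone(&self.inner)),
            }),
        }
    }

    /// Cancels this token; descendants observe it, ancestors do not.
    pub fn cancel(&self) {
        self.inner.cancelled.store(true, Ordering::SeqCst);
    }

    /// Walks up the parent chain looking for a set flag.
    pub fn is_cancelled(&self) -> bool {
        let mut node = Some(&self.inner);
        while let Some(n) = node {
            if n.cancelled.load(Ordering::SeqCst) {
                return true;
            }
            node = n.parent.as_ref();
        }
        false
    }
}
```

Cancelling the root is visible to every descendant on its next check, while cancelling a child leaves the parent untouched. That asymmetry is what lets a single tool be stopped without aborting the whole run.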

Milestone 4.8 — Callbacks and Advanced Configuration

  • REQ-142: Implement on_update callback in ToolContext: when called, emits AgentEvent::ToolExecutionUpdate { tool_call_id, tool_name, partial_result } to the event channel. (Source: [AR])

    • Depends on: REQ-007, REQ-046
    • Definition of Done: A tool that calls ctx.on_update(partial) causes ToolExecutionUpdate events to appear in the stream before ToolExecutionEnd.
  • REQ-143: Implement on_progress callback in ToolContext: when called, emits AgentEvent::ProgressMessage { tool_call_id, tool_name, text }. (Source: [AR])

    • Depends on: REQ-007, REQ-046
    • Definition of Done: A tool that calls ctx.on_progress("working...") causes a ProgressMessage event in the stream.
  • REQ-144: Implement Agent::prompt_with_sender(text, tx): like prompt, but streams events to a caller-provided sender rather than creating a new channel. (Source: [AR])

    • Depends on: REQ-034
    • Definition of Done: Events are sent to the provided tx; the caller can multiplex one sender across multiple prompts.
  • REQ-145: Implement transform_context and convert_to_llm optional hooks on AgentLoopConfig. When set, stream_assistant_response calls them to preprocess messages before building StreamConfig. (Source: [PS])

    • Depends on: REQ-039
    • Definition of Done: A transform_context hook that adds a prefix message causes that message to appear in every LLM call.
  • REQ-146: Implement Agent::with_compaction_strategy(strategy) builder; when set, use the custom CompactionStrategy instead of the default tiered cascade. (Source: [AR])

    • Depends on: REQ-023, REQ-060
    • Definition of Done: A custom strategy that always returns an empty list causes the LLM to be called with no history.
  • REQ-147: Define ModelConfig struct with fields: base_url: Option<String>, headers: Map<String,String>, max_tokens_field: String (default "max_tokens"), supports_developer_role: bool, supports_reasoning_effort: bool. Apply in OpenAiCompatProvider. (Source: [AR])

    • Depends on: REQ-041
    • Definition of Done: Setting max_tokens_field: "max_completion_tokens" causes the OpenAI provider to use that key in the request body.
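
The `ModelConfig` shape from REQ-147 might look like the sketch below. `HashMap` stands in for the `Map<String,String>` named above, the `false` defaults for the two capability flags are assumptions, and `token_limit_entry` is a hypothetical helper showing how the configurable key would be chosen.

```rust
use std::collections::HashMap;

/// Sketch of the ModelConfig fields listed in REQ-147.
#[derive(Debug, Clone)]
pub struct ModelConfig {
    pub base_url: Option<String>,
    pub headers: HashMap<String, String>,
    pub max_tokens_field: String,
    pub supports_developer_role: bool,
    pub supports_reasoning_effort: bool,
}

impl Default for ModelConfig {
    fn default() -> Self {
        Self {
            base_url: None,
            headers: HashMap::new(),
            // Documented default key for the token limit.
            max_tokens_field: "max_tokens".to_string(),
            // Assumed defaults; the requirement does not state them.
            supports_developer_role: false,
            supports_reasoning_effort: false,
        }
    }
}

/// Hypothetical helper: the (key, value) pair a provider would place in the request body.
pub fn token_limit_entry(config: &ModelConfig, limit: u32) -> (String, u32) {
    (config.max_tokens_field.clone(), limit)
}
```

A caller targeting an API that expects `max_completion_tokens` overrides that one field and keeps the remaining defaults via struct update syntax (`..ModelConfig::default()`).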

Milestone 4.9 — Agent Identity and Event Hook Observability

  • REQ-180: Define ContinuationKind enum in types.rs with three variants: Default (unspecified continuation), Rerun { tag: String } (retry from equivalent context), Branch { tag: String } (different execution path). Tags are RFC 3339 UTC timestamps auto-generated at call time by the caller. (Source: [AR])

    • Depends on: —
    • Definition of Done: All three variants instantiate; Rerun { tag } and Branch { tag } round-trip through JSON serialization preserving the tag string.
  • REQ-181: Define TurnTrigger enum in types.rs with four variants: User (first turn of origin call), SubAgent (sub-agent invocation), Continuation (subsequent turns, tool round-trips, steering, Default/Rerun continuations), Branch (first turn of a Branch continuation). Add triggered_by: TurnTrigger field to AgentEvent::TurnStart. (Source: [AR])

    • Depends on: REQ-007
    • Definition of Done: TurnStart events carry the correct triggered_by value: origin calls emit User on turn 0; sub-agent invocations emit SubAgent on turn 0; Branch continuations emit Branch on turn 0; all other first turns and all subsequent turns emit Continuation.
  • REQ-182: Add before_loop: Option<BeforeLoopFn> and after_loop: Option<AfterLoopFn> to AgentLoopConfig. BeforeLoopFn fires before AgentStart — return false to abort the loop (emit AgentEnd { messages: [] } instead). AfterLoopFn fires after AgentEnd with the new messages and accumulated usage. Both are wired in agent_loop and agent_loop_continue. (Source: [AR])

    • Depends on: REQ-036, REQ-037
    • Definition of Done: A before_loop returning false stops the run before AgentStart; after_loop is called exactly once per loop call, after AgentEnd, with correct message and usage values.
  • REQ-183: Add before_tool_execution: Option<BeforeToolExecutionFn> and after_tool_execution: Option<AfterToolExecutionFn> to AgentLoopConfig. BeforeToolExecutionFn fires before ToolExecutionStart — return false to skip the tool (emit skipped error result). AfterToolExecutionFn fires after ToolExecutionEnd. (Source: [AR])

    • Depends on: REQ-046
    • Definition of Done: A before_tool_execution returning false for one tool causes that tool to be skipped with an error result; other tools in the same batch are unaffected. after_tool_execution is called exactly once per tool call.
  • REQ-184: Add before_tool_execution_update: Option<BeforeToolExecutionUpdateFn> and after_tool_execution_update: Option<AfterToolExecutionUpdateFn> to AgentLoopConfig. BeforeToolExecutionUpdateFn fires before each ToolExecutionUpdate — return false to suppress the event (tool keeps running, final ToolResult unaffected). AfterToolExecutionUpdateFn fires after the event when not suppressed. (Source: [AR])

    • Depends on: REQ-142
    • Definition of Done: Suppressing an update via before_tool_execution_update causes no ToolExecutionUpdate event to be emitted; after_tool_execution_update is not called for suppressed updates.
  • REQ-185: Enforce and document the event hook ordering invariant: before_loop → AgentStart … before_turn → TurnStart … before_tool_execution → ToolExecutionStart … (before_tool_execution_update → ToolExecutionUpdate → after_tool_execution_update)* … ToolExecutionEnd → after_tool_execution … TurnEnd → after_turn … AgentEnd → after_loop. No hook may fire out of this sequence. (Source: [AR])

    • Depends on: REQ-182, REQ-183, REQ-184
    • Definition of Done: An integration test with all hooks registered verifies they fire in the documented order for a multi-turn, multi-tool run.
  • REQ-186: Add fn provider_id(&self) -> &str as a required method on the StreamProvider trait (src/provider/traits.rs). Implement in all 7 providers: "anthropic", "openai", "openai_responses", "azure_openai", "google", "google_vertex", "bedrock". The MockProvider returns "mock". (Source: [AR])

    • Depends on: REQ-020
    • Definition of Done: All 8 StreamProvider implementations compile with provider_id() returning the documented string; existing tests pass unchanged.
  • REQ-187: Add config_id: Option<String> field to AgentLoopConfig. When None, Agent::next_loop_id() auto-derives the effective config ID as "{provider_id}.{model_slug}[.thinking]". When Some, the supplied value is used verbatim. Used as the middle segment of loop_id: "{session_id}.{config_id}.{N}". (Source: [AR])

    • Depends on: REQ-029, REQ-186
    • Definition of Done: Setting config_id: Some("my-config") causes loop_id to include "my-config" as its middle segment; leaving None produces an auto-derived segment from provider + model.
  • REQ-188: Add agent_id: String and session_id: String fields to Agent struct, both initialized to UUID v4 in Agent::new(). These are stable for the lifetime of the Agent instance and injected into every AgentContext built by Agent::prompt_* and continue_loop_*. (Source: [AR])

    • Depends on: REQ-024
    • Definition of Done: All AgentStart events emitted by a single Agent instance share the same agent_id and session_id values across multiple prompt() calls.
  • REQ-189: Add loop_counters: HashMap<String, usize> and last_loop_id: Option<String> to Agent. Implement Agent::next_loop_id(config) -> String: compute effective_config_id from config.config_id or auto-derivation; increment the per-"{session_id}.{effective_config_id}" counter; return "{session_id}.{effective_config_id}.{N}". Set last_loop_id after each prompt_* / continue_loop_* call. (Source: [AR])

    • Depends on: REQ-187, REQ-188
    • Definition of Done: Two agent_loop calls on the same agent with the same provider/model produce loop_id values ending in .1 and .2 respectively; different configs produce independent counters (both .1).
  • REQ-190: Add agent_id, session_id, loop_id, parent_loop_id, and continuation_kind fields to AgentContext. In agent_loop, generate and write back agent_id/session_id/loop_id if None at entry. parent_loop_id and continuation_kind remain whatever the caller set. (Source: [AR])

    • Depends on: REQ-028, REQ-180, REQ-189
    • Definition of Done: After agent_loop returns, context.agent_id, context.session_id, and context.loop_id are all Some; a subsequent agent_loop_continue on the same context can read them without regenerating.
  • REQ-191: In agent_loop_continue, assert context.agent_id.is_some() and context.session_id.is_some() with descriptive panic messages. Do not silently generate new UUIDs. (Source: [AR])

    • Depends on: REQ-037, REQ-190
    • Definition of Done: Calling agent_loop_continue with agent_id: None panics with a message referencing "agent_loop_continue requires context.agent_id to be set"; with both fields Some, the assertion passes.
  • REQ-192: Add agent_id: String, session_id: String, loop_id: String, parent_loop_id: Option<String>, and continuation_kind: Option<ContinuationKind> to AgentEvent::AgentStart. Emit these fields from both agent_loop and agent_loop_continue. parent_loop_id is None for origin calls; continuation_kind is None for origin calls and Some(...) for continuations. (Source: [AR])

    • Depends on: REQ-007, REQ-180, REQ-190, REQ-191
    • Definition of Done: AgentStart events from agent_loop have parent_loop_id: None and continuation_kind: None; events from agent_loop_continue carry the values set on AgentContext.
  • REQ-193: In run_loop, determine TurnTrigger for the first turn based on context.continuation_kind: Some(Branch { .. }) → TurnTrigger::Branch; any other Some(..) → TurnTrigger::Continuation; None → config.first_turn_trigger (default User; SubAgent for sub-agent callers). All subsequent turns use TurnTrigger::Continuation. Emit triggered_by in AgentEvent::TurnStart. (Source: [AR])

    • Depends on: REQ-038, REQ-181
    • Definition of Done: A Branch continuation emits TurnTrigger::Branch on turn 0 and TurnTrigger::Continuation on all subsequent turns; a Default continuation emits TurnTrigger::Continuation on all turns.
  • REQ-194: Add child_loop_id: Option<String> to both ToolResult and AgentEvent::ToolExecutionEnd. Sub-agent tools set ToolResult.child_loop_id to the child loop's loop_id after agent_loop completes. execute_single_tool propagates result.child_loop_id into ToolExecutionEnd. Non-sub-agent tools leave both fields None. (Source: [AR])

    • Depends on: REQ-010, REQ-046, REQ-148, REQ-190
    • Definition of Done: A ToolExecutionEnd event from a SubAgentTool call carries a non-None child_loop_id; the same loop_id appears in the child's AgentStart event.
  • REQ-195: Add SubAgentTool::with_parent_loop_id(loop_id: String) builder method. When set, the child AgentContext built inside execute() has parent_loop_id: Some(loop_id). The child's AgentStart event thus carries parent_loop_id, enabling ancestry tracing from child back to parent. (Source: [AR])

    • Depends on: REQ-148, REQ-190
    • Definition of Done: A sub-agent tool configured with with_parent_loop_id("parent.loop.1") emits a child AgentStart event with parent_loop_id: Some("parent.loop.1").
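
The loop-ID scheme from REQ-187 and REQ-189 can be sketched as follows. `LoopConfig` and `LoopIds` are hypothetical stand-ins for the relevant fields of `AgentLoopConfig` and `Agent`, and setting `last_loop_id` inside `next_loop_id` is a simplification of the documented behavior.

```rust
use std::collections::HashMap;

/// Stand-in for the config fields that feed loop-ID derivation (hypothetical).
pub struct LoopConfig {
    pub config_id: Option<String>,
    pub provider_id: String,
    pub model_slug: String,
    pub thinking: bool,
}

/// Stand-in for the identity state kept on the agent (hypothetical).
pub struct LoopIds {
    session_id: String,
    loop_counters: HashMap<String, usize>,
    pub last_loop_id: Option<String>,
}

impl LoopIds {
    pub fn new(session_id: &str) -> Self {
        Self {
            session_id: session_id.to_string(),
            loop_counters: HashMap::new(),
            last_loop_id: None,
        }
    }

    /// Uses config_id verbatim when set, else "{provider_id}.{model_slug}[.thinking]".
    fn effective_config_id(config: &LoopConfig) -> String {
        match &config.config_id {
            Some(id) => id.clone(),
            None => {
                let suffix = if config.thinking { ".thinking" } else { "" };
                format!("{}.{}{}", config.provider_id, config.model_slug, suffix)
            }
        }
    }

    /// Returns "{session_id}.{effective_config_id}.{N}", with N counted per config.
    pub fn next_loop_id(&mut self, config: &LoopConfig) -> String {
        let key = format!("{}.{}", self.session_id, Self::effective_config_id(config));
        let n = self.loop_counters.entry(key.clone()).or_insert(0);
        *n += 1; // counters start at 1, matching the ".1" / ".2" suffixes above
        let id = format!("{key}.{n}");
        self.last_loop_id = Some(id.clone());
        id
    }
}
```

Keying the counter on the session and config pair is what gives different configs independent sequences within the same session.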

Milestone 4.10 — Evaluational Parallelism

  • REQ-196: Migrate AgentContext.tools from Vec<Box<dyn AgentTool>> to Vec<Arc<dyn AgentTool>>. Add #[derive(Clone)] to AgentContext. Update Agent::set_tools, BasicAgent::with_tools, default_tools() return type, and all push sites in BasicAgent (sub-agent, openapi, mcp). Remove ArcToolWrapper from sub_agent.rs. (Implemented)

    • Depends on: REQ-028, REQ-046
    • Definition of Done: AgentContext: Clone; all existing tests pass; ArcToolWrapper deleted.
  • REQ-197: Add Usage::combine(&self, other: &Usage) -> Usage method for summing usage across branches. (Implemented)

    • Depends on: —
    • Definition of Done: usage_a.combine(&usage_b) returns a Usage with all fields summed.
  • REQ-198: Add ParallelLoopOutcome and ParallelLoopResult structs to types.rs. Add AgentEvent::ParallelLoopStart { session_id, loop_ids, timestamp } and AgentEvent::ParallelLoopEnd { session_id, selected_loop_id, selected_config_index, evaluation_usage, timestamp } variants to AgentEvent. (Implemented)

    • Depends on: REQ-190, REQ-197
    • Definition of Done: Both structs construct and the enum variants match correctly.
  • REQ-199: Define EvaluationDecision enum and EvaluationStrategy trait in types.rs. Trait method: evaluate(prompts, outcomes, tx, cancel) -> (EvaluationDecision, Usage). Placed in types.rs (not evaluation.rs) to avoid a circular dependency with agent_loop.rs. (Implemented)

    • Depends on: REQ-198
    • Definition of Done: Custom implementations compile by importing from crate::types or crate::evaluation.
  • REQ-200: Create src/agent_loop/evaluation.rs with five built-in EvaluationStrategy implementations: TransparentEvaluation (single-branch pass-through), PickFirstEvaluation (always index 0), TokenEfficientEvaluation (lowest total_tokens), ElaborateEvaluation (highest total_tokens), LlmJudgeEvaluation { judge_config, system_prompt }. (Implemented)

    • Depends on: REQ-199
    • Definition of Done: All five strategies implement EvaluationStrategy; unit tests pass for each.
  • REQ-201: LlmJudgeEvaluation — judge prompt construction: extract original query text from user messages in prompts only; extract final assistant text from each branch's new_messages (strip tool calls, tool results, intermediate turns). Build numbered judge prompt; run agent_loop with judge_config; parse first integer from reply; inherit session_id from branches for traceability. (Implemented)

    • Depends on: REQ-200
    • Definition of Done: Judge receives clean final responses, not raw tool traces; judge AgentStart has same session_id as branches.
  • REQ-202: LlmJudgeEvaluation — judge's comprehension criteria: all N branch final responses must fit in the judge model's context budget simultaneously. Apply iterative multi-tier compaction: tier 1 (last 80 lines), tier 2 (first+last paragraph), tier 3 (hard char limit derived from budget / N). Budget derives from judge_config.context_config.max_context_tokens (if set). Emit AgentEvent::ProgressMessage warning if criteria cannot be satisfied after tier 3. Selected winner always returns the original uncompacted messages. (Implemented)

    • Depends on: REQ-201
    • Definition of Done: With a tight context_config.max_context_tokens, compaction fires and a warning is emitted; selected output is the original branch content.
  • REQ-203: Add derive_config_segment(config: &AgentLoopConfig) -> String helper (pub crate) and run_parallel_branches(...) internal async function to agent_loop.rs. Add agent_loop_parallel(prompts, base_context, configs, strategy, tx, cancel) -> ParallelLoopResult public async function. Uses futures::future::join_all for branch concurrency (avoids 'static bound on AgentLoopConfig hooks). Per-branch forwarder task (tokio::spawn) captures usage from AgentEnd. (Implemented)

    • Depends on: REQ-196, REQ-199
    • Definition of Done: agent_loop_parallel with 2 configs runs both branches, emits ParallelLoopStart/ParallelLoopEnd, and returns correct selected_index.
  • REQ-204: Export evaluation module from lib.rs; re-export agent_loop_parallel and all five evaluation strategies at crate root. (Implemented)

    • Depends on: REQ-200, REQ-203
    • Definition of Done: use phi_core::{agent_loop_parallel, PickFirstEvaluation, LlmJudgeEvaluation} compiles.
  • REQ-205: agent_loop_parallel routes to agent_loop_continue when prompts is empty. (Implemented)

    • Depends on: REQ-203
    • Definition of Done: Calling agent_loop_parallel(vec![], ctx_with_user_msg, ...) dispatches each branch via agent_loop_continue and returns a valid ParallelLoopResult.
  • REQ-206: Add original_context_len: usize to ParallelLoopOutcome. (Implemented)

    • Depends on: REQ-198, REQ-205
    • Definition of Done: outcome.context.messages[..outcome.original_context_len] is the shared base context; [original_context_len..] are branch-produced messages.
  • REQ-207: LlmJudgeEvaluation extracts prior conversation context and query from context.messages[..original_context_len] in agent_loop_continue mode; includes formatted prior-context transcript in judge prompt. (Implemented)

    • Depends on: REQ-201, REQ-206
    • Definition of Done: When prompts is empty, the judge prompt contains "Prior conversation context:" and "Original query:" sections derived from the original context.
  • REQ-208: Replace single-pass output compaction with 2-iteration compact_for_judge: Iteration 1 compacts prior context only (outputs intact); Iteration 2 compacts both independently. (Implemented)

    • Depends on: REQ-202, REQ-207
    • Definition of Done: Under a tight token budget, outputs remain uncompacted as long as prior-context compaction alone can satisfy the criteria.
  • REQ-209: Update build_judge_user_message to include an optional prior-context section before the query. (Implemented)

    • Depends on: REQ-207
    • Definition of Done: Judge prompt includes "Prior conversation context:\n<transcript>" when prior context is non-empty; omitted when empty (fresh-session case).
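
A rough sketch of `Usage::combine` (REQ-197) and the selection rule behind TokenEfficientEvaluation (REQ-200) follows. The two-field `Usage` is an assumption, since the real struct may track more counters. `pick_token_efficient` is a hypothetical free function rather than the trait-based strategy, and breaking ties in favor of the earliest branch is also an assumption.

```rust
/// Assumed two-field usage record; the real Usage may carry more counters.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

impl Usage {
    pub fn total_tokens(&self) -> u64 {
        self.input_tokens + self.output_tokens
    }

    /// Field-wise sum, used to aggregate usage across branches.
    pub fn combine(&self, other: &Usage) -> Usage {
        Usage {
            input_tokens: self.input_tokens + other.input_tokens,
            output_tokens: self.output_tokens + other.output_tokens,
        }
    }
}

/// Index of the branch with the lowest total_tokens; None when there are no branches.
/// `min_by_key` returns the first minimum, so ties go to the earliest branch.
pub fn pick_token_efficient(branches: &[Usage]) -> Option<usize> {
    branches
        .iter()
        .enumerate()
        .min_by_key(|(_, usage)| usage.total_tokens())
        .map(|(index, _)| index)
}
```

Summing every branch's usage, not just the winner's, keeps the reported cost of a parallel run honest.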

Level 5 — Creative

Goal: The system surpasses the original. Sub-agent delegation, OpenAPI tool generation, advanced Anthropic protocol features, and all documented ambiguities are resolved with principled design decisions.

Completion Criteria: SubAgentTool works end-to-end; the OpenAPI adapter generates callable tools from a spec file; all [AMBIGUOUS] items have a documented resolution; performance benchmarks for parallel tool execution meet or exceed documented expectations.


Milestone 4.11 — Persistent Session Layer

  • REQ-210: Add loop_id: String to all AgentEvent variants that lacked it (AgentEnd, TurnStart, TurnEnd, MessageStart, MessageUpdate, MessageEnd, ToolExecutionStart, ToolExecutionUpdate, ToolExecutionEnd, ProgressMessage, InputRejected). Add Serialize, Deserialize to AgentEvent, ContinuationKind, TurnTrigger, StreamDelta. Thread loop_id through all emission sites in agent_loop.rs and evaluation.rs. (Source: [AR])

    • Depends on: REQ-007, REQ-114
    • Definition of Done: All AgentEvent variants carry loop_id; events from interleaved parallel branches can be unambiguously attributed to the correct LoopRecord.
  • REQ-211: Define Session, LoopRecord, LoopEvent, and LoopConfigSnapshot types in src/session/. Session contains an ordered Vec<LoopRecord>; LoopRecord holds identity fields (loop_id, session_id, agent_id), timing, status, messages (from AgentEnd.messages), usage, events, and tree links (children_loop_ids, parent_loop_id). LoopConfigSnapshot stores model, provider, config_id. (Source: [AR])

    • Depends on: REQ-210
    • Definition of Done: All types serialize/deserialize (JSON round-trip lossless); Session.total_usage() sums LoopRecord.usage across all loops.
  • REQ-212: Define ChildLoopRef and SpawnRef for bidirectional cross-session sub-agent tracking. ChildLoopRef is stored in LoopRecord.child_loop_refs (parent → child); SpawnRef is stored in Session.parent_spawn_ref (child → parent). Both carry tool_call_id, tool_name, and cross-session ids. (Source: [AR])

    • Depends on: REQ-211
    • Definition of Done: A parent session's LoopRecord.child_loop_refs can be used to load and link the child session.
  • REQ-213: Define ParallelGroupRecord and implement LoopStatus::Pending pre-registration in SessionRecorder. When ParallelLoopStart arrives, pre-create LoopRecord { status: Pending } for each branch loop_id so the group is registered before AgentStart fires for each branch. ParallelLoopEnd retroactively sets ParallelGroupRecord on all branch records. (Source: [AR])

    • Depends on: REQ-211
    • Definition of Done: After a parallel loop completes, all branch LoopRecords have parallel_group set; exactly one has is_selected = true.
  • REQ-214: Implement SessionRecorder with PerSessionId formation policy. on_event(event) routes events by loop_id: creates Session on first-seen session_id from AgentStart; closes LoopRecord on AgentEnd; appends bidirectional tree links; handles sub-agent SpawnRef enrichment from ToolExecutionEnd.child_loop_id. (Source: [AR])

    • Depends on: REQ-211, REQ-212, REQ-213
    • Definition of Done: test_session_recorder_single_loop, test_session_recorder_continuation, test_session_recorder_bidirectional_tree, test_session_recorder_continuation_kind all pass.
  • REQ-215: Add BasicAgent::new_session() and check_and_rotate(threshold) to BasicAgent. Add last_active_at: Option<DateTime<Utc>> field; update prompt_messages_with_sender to record it. new_session() rotates session_id, clears loop_counters and last_loop_id. (Source: [AR])

    • Depends on: REQ-214
    • Definition of Done: test_basic_agent_new_session and test_basic_agent_check_and_rotate pass.
  • REQ-216: Implement save_session, load_session, list_session_ids persistence API. File layout: {dir}/{session_id}.json (pretty-printed JSON, flat directory). list_session_ids returns ids sorted by modification time (newest first). (Source: [AR])

    • Depends on: REQ-211
    • Definition of Done: test_session_save_load_roundtrip and test_session_list_ids pass; saved files are valid, human-readable JSON.
  • REQ-217: Implement load_sessions_for_agent and delete_session. load_sessions_for_agent loads all sessions in dir and filters by agent_id. delete_session removes the file; returns SessionError::NotFound if absent. (Source: [AR])

    • Depends on: REQ-216
    • Definition of Done: test_session_delete passes; load_sessions_for_agent returns only sessions with the matching agent_id.
  • REQ-218: Implement Session tree navigation methods: root_loops(), children_of(loop_id), parallel_siblings(loop_id), get_loop(loop_id). Export all public session types from src/lib.rs. (Source: [AR])

    • Depends on: REQ-211
    • Definition of Done: test_session_recorder_parallel_group and test_session_recorder_bidirectional_tree exercise all navigation methods; all assertions pass.
  • REQ-219: Write docs/concepts/sessions.md documenting: Overview, Session Formation (three modes), LoopRecord Anatomy (field table, LoopStatus lifecycle, continuation_kind classification, LoopConfigSnapshot rationale), Loop Tree Navigation, Cross-Session Sub-Agent Tracking, Parallel Evaluation Groups, SessionRecorder usage with code example, Persistence API, and 9 Design Decisions (each with decision / why / rejected alternative). (Source: [AR])

    • Depends on: REQ-211 – REQ-218
    • Definition of Done: docs/concepts/sessions.md exists; covers all listed sections; code examples are syntactically valid Rust.
  • REQ-220: Update docs/specs/architecture.md: add SessionStore component section, add SessionStore to dependency graph, update AgentEvent variant table to document loop_id: String on all applicable variants, add Session/LoopRecord/SessionRecorder data model entries, add new_session() / check_and_rotate() / last_active_at to BasicAgent interface table. Update docs/specs/roadmap.md with this milestone. (Source: [AR])

    • Depends on: REQ-219
    • Definition of Done: Both spec files updated; all new types and methods are documented.
  • REQ-221: Fix SessionRecorder SpawnRef enrichment to handle the case where the child session has already been moved to completed before the parent's ToolExecutionEnd fires. Currently, ToolExecutionEnd only searches open_sessions for the child session to enrich parent_spawn_ref.tool_call_id / tool_name; if flush() was called between child AgentEnd and the parent's ToolExecutionEnd (e.g. periodic batch checkpointing in production), the child session is in completed and the enrichment is silently skipped — leaving tool_call_id: "" and tool_name: "" on the SpawnRef permanently. Fix by also searching completed sessions in the enrichment step, or by deferring child-session promotion to completed until the parent loop also closes. (Source: post-sprint review)

    • Depends on: REQ-214
    • Definition of Done: A test demonstrates that calling flush() between child AgentEnd and parent ToolExecutionEnd still produces a fully-enriched SpawnRef on the child session.
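
The tree navigation methods from REQ-218 can be sketched over a stripped-down record type. Only the identity and link fields are modeled here, and the return types (`Option<&LoopRecord>`, `Vec<&LoopRecord>`) are assumptions.

```rust
/// Reduced LoopRecord: only the fields needed for tree navigation.
pub struct LoopRecord {
    pub loop_id: String,
    pub parent_loop_id: Option<String>,
    pub children_loop_ids: Vec<String>,
}

/// Reduced Session: an ordered list of loop records.
pub struct Session {
    pub loops: Vec<LoopRecord>,
}

impl Session {
    /// Looks up a record by its loop_id.
    pub fn get_loop(&self, loop_id: &str) -> Option<&LoopRecord> {
        self.loops.iter().find(|record| record.loop_id == loop_id)
    }

    /// Records with no parent link, in session order.
    pub fn root_loops(&self) -> Vec<&LoopRecord> {
        self.loops
            .iter()
            .filter(|record| record.parent_loop_id.is_none())
            .collect()
    }

    /// Resolves the parent's children_loop_ids into records; unknown ids are skipped.
    pub fn children_of(&self, loop_id: &str) -> Vec<&LoopRecord> {
        match self.get_loop(loop_id) {
            Some(parent) => parent
                .children_loop_ids
                .iter()
                .filter_map(|id| self.get_loop(id))
                .collect(),
            None => Vec::new(),
        }
    }
}
```

A linear scan is adequate at this scale; a session holding many loops could add a `loop_id` index without changing these signatures.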

Milestone 5.1 — Sub-Agent Delegation

  • REQ-148: Implement SubAgentTool::execute: validate params["task"] is non-empty; build a fresh AgentContext (empty messages, own toolset); build AgentLoopConfig with max_turns guard (default 10), no steering/follow-ups, no input filters; spawn child agent_loop; await result; call extract_final_text. (Source: [PS])

    • Depends on: REQ-036, REQ-157
    • Definition of Done: A sub-agent tool registered on a parent agent completes a delegated task and returns the child agent's final text as a ToolResult.
  • REQ-149: Implement extract_final_text(messages) -> String: scan messages in reverse for the last Assistant message with Text content blocks; join and return them; fall back to "(sub-agent produced no text output)". (Source: [PS])

    • Depends on: REQ-002
    • Definition of Done: extract_final_text returns the text of the last assistant message; an all-tool-call assistant message returns the fallback string.
  • REQ-150: Sub-agent event forwarding: spawn a task that consumes child AgentEvents and forwards them to the parent channel as ToolExecutionUpdate (for MessageUpdate::Text) and ProgressMessage (for child ProgressMessage) events. (Source: [PS])

    • Depends on: REQ-007, REQ-148
    • Definition of Done: Parent event stream includes ToolExecutionUpdate events showing the sub-agent's text generation in real time.
  • REQ-151: Implement SubAgentTool builder: SubAgentTool::new(name, model_config).with_system_prompt(...).with_tools(...).with_max_turns(...).with_thinking(...). (Source: [AR])

    • Depends on: REQ-021, REQ-148
    • Definition of Done: A fully configured SubAgentTool can be added to a parent agent's tool list via with_tools.
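
The reverse scan in REQ-149 can be sketched with simplified message types. `Message` and `Content` below are reduced stand-ins for the real types, and joining multiple text blocks with a newline is an assumption, since the requirement does not name a separator.

```rust
/// Reduced content block: text or a tool call.
pub enum Content {
    Text(String),
    ToolCall { name: String },
}

/// Reduced message type covering the roles relevant to the scan.
pub enum Message {
    User(Vec<Content>),
    Assistant(Vec<Content>),
    ToolResult(String),
}

pub const NO_TEXT_FALLBACK: &str = "(sub-agent produced no text output)";

/// Returns the text of the last assistant message that contains any text blocks.
pub fn extract_final_text(messages: &[Message]) -> String {
    for message in messages.iter().rev() {
        if let Message::Assistant(blocks) = message {
            // Keep only text blocks; tool calls are ignored.
            let texts: Vec<&str> = blocks
                .iter()
                .filter_map(|block| match block {
                    Content::Text(text) => Some(text.as_str()),
                    _ => None,
                })
                .collect();
            if !texts.is_empty() {
                return texts.join("\n"); // separator is an assumption
            }
        }
    }
    NO_TEXT_FALLBACK.to_string()
}
```

An assistant message made up only of tool calls is skipped, so the scan continues to earlier messages and reaches the fallback only when no assistant text exists at all.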

Milestone 5.2 — OpenAPI Adapter (Feature-Gated)

  • REQ-152: Implement OpenApiAdapter::from_str(spec, config, filter): auto-detect JSON vs YAML (first non-whitespace char { or [ → JSON, else YAML); parse OpenAPI 3.x spec; resolve base URL; generate one OpenApiToolAdapter per matching operation. (Source: [AR])

    • Depends on: REQ-153, REQ-154, REQ-155, REQ-156
    • Definition of Done: A valid OpenAPI 3.x spec string (JSON and YAML both) produces one tool adapter per operation with an operationId.
  • REQ-153: Classify parameters: path → URL substitution with RFC 3986 percent-encoding; query → query string; header → request headers; cookie → skip with no error; requestBody (application/json only) → keyed as "body" (or "_request_body" on name collision). (Source: [AR])

    • Depends on: REQ-021
    • Definition of Done: Path parameters appear in the URL; query parameters appear in the query string; cookie parameters are silently ignored.
  • REQ-154: Implement the HTTP execution pipeline per tool call: validate params, substitute path params, build URL, chain query/header params, apply OpenApiAuth, apply custom_headers, optionally attach JSON body, send request, read body, truncate at max_response_bytes on a UTF-8 boundary, return "{METHOD} {URL} → {STATUS}\n\n{BODY}". (Source: [AR])

    • Depends on: REQ-021
    • Definition of Done: A POST to a test endpoint with path, query, and body params produces the documented return format.
  • REQ-155: Implement OperationFilter: All (include everything with an operationId); ByOperationId(ids) (include only listed IDs); ByTag(tags) (include operations tagged with any listed tag); ByPathPrefix(prefix) (include operations whose path starts with prefix). Operations without operationId always emit a warning and are skipped. (Source: [AR])

    • Depends on: REQ-152
    • Definition of Done: Each filter variant correctly includes/excludes operations; an operation without operationId logs a warning and is excluded regardless of filter.
  • REQ-156: Apply optional name_prefix from OpenApiConfig: tool name becomes "{prefix}__{operationId}" when set. (Source: [AR])

    • Depends on: REQ-152
    • Definition of Done: With name_prefix: Some("myapi"), the tool for operationId: "getUser" is named "myapi__getUser".
  • REQ-157: Implement from_file(path, config, filter) (async file read) and from_url(url, config, filter) (HTTP GET). (Source: [AR])

    • Depends on: REQ-152
    • Definition of Done: Both sources produce the same tool list as from_str on the same spec content.
  • REQ-158: Implement the Agent::with_openapi_file, Agent::with_openapi_url, and Agent::with_openapi_spec builders. Gate the entire openapi module behind an openapi feature flag. (Source: [AR])

    • Depends on: REQ-026, REQ-157
    • Definition of Done: Without the openapi feature, the code compiles successfully without the adapter; with it, all three builders are available.
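
Two small rules from this milestone lend themselves to a sketch: the JSON-versus-YAML detection in REQ-152 and the UTF-8-safe truncation in REQ-154. The names `SpecFormat`, `detect_format`, and `truncate_utf8` are hypothetical.

```rust
/// Detected serialization of an OpenAPI spec string.
#[derive(Debug, PartialEq, Eq)]
pub enum SpecFormat {
    Json,
    Yaml,
}

/// First non-whitespace char `{` or `[` means JSON; anything else is treated as YAML.
pub fn detect_format(spec: &str) -> SpecFormat {
    match spec.trim_start().chars().next() {
        Some('{') | Some('[') => SpecFormat::Json,
        _ => SpecFormat::Yaml,
    }
}

/// Truncates to at most `max_bytes`, backing up so a multi-byte character is never split.
pub fn truncate_utf8(body: &str, max_bytes: usize) -> &str {
    if body.len() <= max_bytes {
        return body;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    &body[..end]
}
```

Slicing a `&str` at a non-boundary index panics, which is why the truncation walks back to the nearest boundary instead of cutting at `max_bytes` directly.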

Milestone 5.3 — Advanced Anthropic Protocol

  • REQ-159: Implement Anthropic OAuth auth path: when model_config indicates OAuth, use Authorization: Bearer {TOKEN} header plus beta headers claude-code-20250219,oauth-2025-04-20,fine-grained-tool-streaming-2025-05-14, x-app: cli, anthropic-dangerous-direct-browser-access: true, user-agent: claude-cli/2.1.2. (Source: [AR])

    • Depends on: REQ-040
    • Definition of Done: An OAuth-configured provider sends all documented headers; standard API key auth sends the standard x-api-key header.
  • REQ-160: Implement Anthropic InputJsonDelta tool-argument streaming: buffer incremental InputJsonDelta text fragments in arguments["__partial_json"]; parse the complete accumulated string as JSON on content_block_stop. (Source: [AR])

    • Depends on: REQ-040
    • Definition of Done: A tool call streamed in 5 InputJsonDelta fragments produces a single, complete, parseable JSON arguments object.
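
The buffering step in REQ-160 can be sketched without a JSON dependency. This version keeps fragments in a map keyed by content-block index rather than in arguments["__partial_json"], which is a simplification. It also hands back the accumulated string at content_block_stop and leaves parsing to whichever JSON library the provider uses. `ToolArgBuffer` and its method names are hypothetical.

```rust
use std::collections::HashMap;

/// Accumulates streamed tool-argument fragments per content block (illustrative).
#[derive(Default)]
pub struct ToolArgBuffer {
    partial: HashMap<usize, String>,
}

impl ToolArgBuffer {
    /// Appends one InputJsonDelta fragment to the block's buffer.
    pub fn on_input_json_delta(&mut self, index: usize, fragment: &str) {
        self.partial.entry(index).or_default().push_str(fragment);
    }

    /// On content_block_stop, removes and returns the complete text for parsing.
    /// Returns None if no fragments were seen for this block.
    pub fn on_content_block_stop(&mut self, index: usize) -> Option<String> {
        self.partial.remove(&index)
    }
}
```

Individual fragments are not valid JSON on their own, so parsing is deferred until the block closes.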

Milestone 5.4 — Ambiguity Resolutions

  • REQ-161: [AMBIGUOUS] Standardize AgentEnd emission on abort: define and document whether AgentEnd is emitted when cancellation is detected at various checkpoints (start of loop, mid-stream, mid-tool). Implement a consistent policy. (Source: [PS])

    • Depends on: REQ-067, REQ-082
    • Definition of Done: The chosen policy is documented; behavior is consistent regardless of where in the loop cancellation is detected.
  • REQ-162: Define a TokenCounter trait in context/token.rs with HeuristicTokenCounter (chars/4) as the default implementation. Make it pluggable via ContextConfig.token_counter and thread it through all hot-path call sites. (Source: [OV])

    • Depends on: REQ-054
    • Definition of Done: A TokenCounter trait or injection point exists; the default implementation uses the 4-char heuristic; a precise implementation can be substituted via configuration.
  • REQ-163: [AMBIGUOUS] Define sub-agent error propagation: document what execute() returns when the child agent_loop produces only error/empty messages. Implement the extract_final_text fallback consistently. (Source: [PS])

    • Depends on: REQ-149
    • Definition of Done: The policy is documented; child agent error messages are reflected in the fallback text or surfaced as ToolError::Failed.
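
As an illustration of the injection point described in REQ-162, the sketch below shows one possible shape for a pluggable token counter. The method name count, the ContextConfigSketch struct, and the WordCounter example are assumptions made for this sketch. Rounding the chars/4 heuristic up is also an assumption.

```rust
/// Pluggable token counting. The method name and signature are assumptions.
pub trait TokenCounter: Send + Sync {
    fn count(&self, text: &str) -> usize;
}

/// Default heuristic: roughly four characters per token.
pub struct HeuristicTokenCounter;

impl TokenCounter for HeuristicTokenCounter {
    fn count(&self, text: &str) -> usize {
        // Assumption: count Unicode scalar values and round up, so any
        // non-empty text costs at least one token.
        (text.chars().count() + 3) / 4
    }
}

/// Hypothetical stand-in for the relevant part of ContextConfig.
pub struct ContextConfigSketch {
    pub token_counter: Box<dyn TokenCounter>,
}

impl Default for ContextConfigSketch {
    fn default() -> Self {
        Self {
            token_counter: Box::new(HeuristicTokenCounter),
        }
    }
}

/// Example of a caller-supplied counter; a real one might wrap a tokenizer.
pub struct WordCounter;

impl TokenCounter for WordCounter {
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}
```

Keeping the default behind a trait object means the zero-dependency heuristic stays the out-of-the-box behavior, while a caller who needs precision can substitute a counter through configuration alone.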

Level 6 — Boss

Goal: The system is exceptional. It is fully tested, scalable, developer-friendly, and operates as a platform with a clear public API contract and operational runbooks.

Completion Criteria: The system passes load tests at 10x expected tool concurrency. Full test coverage includes unit, integration, property-based, and end-to-end tests. Public API documentation is complete. Operational runbooks cover all known failure modes.


Milestone 6.1 — Full Test Suite

  • REQ-164: Unit tests for all three compaction levels (level1, level2, level3) including: no-op when under budget; exact budget boundary; message count edge cases (fewer messages than keep_recent/keep_first); correct ordering of head+marker+tail in level 3. (Source: [AR])

    • Depends on: REQ-056 through REQ-059
    • Definition of Done: All edge cases identified above have dedicated test cases that pass.
  • REQ-165: Property-based tests for compact_messages: for any valid (messages, config) input, total_tokens(compact_messages(messages, config)) <= budget. (Source: [AR])

    • Depends on: REQ-056
    • Definition of Done: 10,000 random test cases all satisfy the budget invariant without panic.
  • REQ-166: Unit tests for delay_for_attempt: verify exponential growth; verify jitter stays in [0.8, 1.2] range over 10,000 samples; verify max_delay_ms cap is respected. (Source: [AR])

    • Depends on: REQ-071
    • Definition of Done: All three assertions pass across the full retry range.
  • REQ-167: Integration tests for each of the 7 provider protocols using a mock HTTP server: correct request format, correct response parsing, correct StopReason mapping, correct tool-call extraction. (Source: [AR])

    • Depends on: REQ-040 through REQ-042, REQ-120 through REQ-124
    • Definition of Done: Each provider has at least one happy-path integration test and one error-path test using a local mock server.
  • REQ-168: Integration test for MCP stdio transport: spawn a minimal mock MCP server subprocess; verify initialize handshake, tool listing, and tool execution. (Source: [AR])

    • Depends on: REQ-114 through REQ-119
    • Definition of Done: The mock MCP server can be connected to, queried, and called; all three phases produce correct results.
  • REQ-169: End-to-end agent loop tests using MockProvider: test single-turn text response; multi-turn tool call cycle; steering injection mid-run; follow-up queue; execution limit enforcement; context compaction trigger; input filter rejection. (Source: [AR])

    • Depends on: REQ-036 through REQ-090
    • Definition of Done: All seven scenarios have a passing automated test.
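
To make the three checks in REQ-166 concrete, here is a sketch of a backoff function they could be written against. It is not the library's delay_for_attempt: RetryConfigSketch and its field names (other than max_delay_ms) are hypothetical, the random sample is passed in as a parameter so the function stays deterministic and testable, and applying the cap after jitter is an assumption.

```rust
/// Hypothetical subset of RetryConfig; only max_delay_ms is named in the text.
pub struct RetryConfigSketch {
    pub initial_delay_ms: u64,
    pub multiplier: f64,
    pub max_delay_ms: u64,
}

/// Exponential backoff with multiplicative jitter.
/// `unit` is a uniform sample in [0, 1]; it is mapped to a factor in [0.8, 1.2].
pub fn delay_for_attempt(cfg: &RetryConfigSketch, attempt: u32, unit: f64) -> u64 {
    // Exponential growth: initial * multiplier^attempt.
    let base = cfg.initial_delay_ms as f64 * cfg.multiplier.powi(attempt as i32);
    // Jitter factor in [0.8, 1.2].
    let jitter = 0.8 + 0.4 * unit.clamp(0.0, 1.0);
    // Assumption: the cap is applied after jitter, making it a hard upper bound.
    // Float-to-integer casts saturate, so very large attempts still hit the cap.
    let delayed = (base * jitter).round();
    (delayed as u64).min(cfg.max_delay_ms)
}
```

With the sample injected, a test can sweep unit across [0, 1] to check the jitter bounds, fix it at 0.5 to check pure exponential growth, and pass a large attempt number to check the cap.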

Milestone 6.2 — Load and Scale Testing

  • REQ-170: Load test: run 100 parallel agents each with 10 concurrent tool calls using MockProvider. Verify no data races, no deadlocks, correct result ordering, no memory leaks. (Source: [AR])

    • Depends on: REQ-045, REQ-085
    • Definition of Done: 1,000 total tool calls complete correctly with no panics and tool results are in original call order.
  • REQ-171: Load test: run a single agent for 1,000 turns with compaction enabled. Verify token estimates stay bounded; no unbounded memory growth; compaction fires when expected. (Source: [AR])

    • Depends on: REQ-056, REQ-060
    • Definition of Done: Memory usage stabilizes after compaction; no message protected by the keep_first/keep_recent invariants is dropped.
  • REQ-172: Memory profile: verify Agent.messages does not grow unboundedly in a long conversation with compaction enabled. (Source: [AR])

    • Depends on: REQ-056, REQ-060
    • Definition of Done: Message count stays within keep_first + keep_recent + small_constant after steady state is reached.
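
The "results in original call order" requirement of REQ-170 is easiest to satisfy by construction. The sketch below shows the idea with OS threads from the standard library; this is a stand-in chosen to keep the example self-contained, whereas a Tokio-based library would presumably use async tasks. The function name execute_parallel_ordered is hypothetical.

```rust
use std::thread;

/// Runs every call on its own thread and returns the results in call order,
/// regardless of which call finishes first.
pub fn execute_parallel_ordered<T, R, F>(calls: Vec<T>, run: F) -> Vec<R>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    thread::scope(|scope| {
        let run = &run;
        // Spawn everything first; handles are stored in call order.
        let handles: Vec<_> = calls
            .into_iter()
            .map(|call| scope.spawn(move || run(call)))
            .collect();
        // Join in the same order, so completion order does not affect output.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("tool thread panicked"))
            .collect()
    })
}
```

Because each result is read from the handle at its original index, no sorting or shared mutable result buffer is needed, which also removes one potential source of data races from the load test.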

Milestone 6.3 — Public API Contract and Documentation

  • REQ-173: Publish complete API reference documentation for all public types, traits, and functions with usage examples for each primary use case from ../reference/glossary.md. (Source: [OV])

    • Depends on: REQ-001 through REQ-163
    • Definition of Done: A developer with no prior context can build a working coding assistant and CLI REPL from the docs alone.
  • REQ-174: Document all 7 provider integration contracts: authentication method, endpoint pattern, request format, response parsing notes, any quirks (e.g., Bedrock ndjson, Google tool ID generation, Azure api-key header). (Source: [AR])

    • Depends on: REQ-040 through REQ-042, REQ-120 through REQ-124
    • Definition of Done: Each provider has a documentation page listing all fields from the integration contract table.
  • REQ-175: Write and publish working example implementations: (1) CLI REPL with /quit, /clear, /model commands; (2) coding assistant with all built-in tools; (3) multi-agent pipeline with SubAgentTool. (Source: [OV])

    • Depends on: REQ-053, REQ-148
    • Definition of Done: All three examples compile and run end-to-end; the CLI REPL handles all three slash commands.
  • REQ-176: Publish AgentSkills standard compliance documentation and MCP integration guide. (Source: [OV])

    • Depends on: REQ-109 through REQ-113, REQ-114 through REQ-119
    • Definition of Done: Both guides include a "getting started" section that results in a working integration.

Milestone 6.4 — Developer Tooling and Operational Readiness

  • REQ-177: Package and publish the library with proper semantic versioning. The openapi feature is opt-in. Document all feature flags. (Source: [AR])

    • Depends on: REQ-158
    • Definition of Done: Library installs as a dependency; openapi feature is absent from the default build; enabling it adds the adapter without breaking existing code.
  • REQ-178: CI pipeline: run unit tests, integration tests (with mock servers), and openapi-feature tests on every commit. Gate provider live tests behind API key secrets. (Source: [AR])

    • Depends on: REQ-164 through REQ-169
    • Definition of Done: CI passes on every commit; provider live tests run in a separate gated workflow.
  • REQ-179: Operational runbook covering: retry tuning (when to adjust RetryConfig); context overflow handling (choosing ContextConfig values); provider failover (switching providers on persistent failures); MCP server crash recovery; performance profiling guide. (Source: [AR])

    • Depends on: REQ-071 through REQ-077
    • Definition of Done: The runbook covers all five topics with actionable decision trees.

Requirement Index

| REQ | Description | Level | Milestone | Source | Depends On |
|---|---|---|---|---|---|
| REQ-001 | Content enum (Text, Image, Thinking, ToolCall) | 1 | 1.1 | [AR] | |
| REQ-002 | Message enum (User, Assistant, ToolResult) | 1 | 1.1 | [AR] | REQ-001, REQ-005, REQ-006 |
| REQ-003 | AgentMessage enum (Llm, Extension) | 1 | 1.1 | [AR] | REQ-002, REQ-004 |
| REQ-004 | ExtensionMessage struct | 1 | 1.1 | [AR] | |
| REQ-005 | StopReason enum | 1 | 1.1 | [AR] | |
| REQ-006 | Usage struct with cache_hit_rate() | 1 | 1.1 | [AR] | |
| REQ-007 | AgentEvent enum (all variants) | 1 | 1.1 | [AR] | REQ-002, REQ-008 |
| REQ-008 | StreamDelta enum | 1 | 1.1 | [AR] | |
| REQ-009 | ToolContext struct | 1 | 1.1 | [AR] | |
| REQ-010 | ToolResult and ToolError types | 1 | 1.1 | [AR] | REQ-001 |
| REQ-011 | ContextConfig struct with defaults | 1 | 1.1 | [AR] | |
| REQ-012 | ExecutionLimits and ExecutionTracker | 1 | 1.1 | [AR] | |
| REQ-013 | RetryConfig with defaults | 1 | 1.1 | [AR] | |
| REQ-014 | CacheConfig and CacheStrategy | 1 | 1.1 | [AR] | |
| REQ-015 | StreamConfig struct | 1 | 1.1 | [AR] | REQ-014, REQ-016 |
| REQ-016 | ToolDefinition struct | 1 | 1.1 | [AR] | |
| REQ-017 | QueueMode enum | 1 | 1.1 | [AR] | |
| REQ-018 | Full Serialize/Deserialize on AgentMessage tree | 1 | 1.1 | [OV] | REQ-001–017 |
| REQ-019 | ThinkingLevel enum | 1 | 1.1 | [OV] | |
| REQ-020 | StreamProvider trait and ProviderError enum | 1 | 1.2 | [AR] | REQ-002, REQ-015 |
| REQ-021 | AgentTool trait | 1 | 1.2 | [AR] | REQ-009, REQ-010 |
| REQ-022 | InputFilter trait | 1 | 1.2 | [OV] | |
| REQ-023 | CompactionStrategy trait | 1 | 1.2 | [AR] | REQ-003, REQ-011 |
| REQ-024 | Agent::new() with all field defaults | 1 | 1.3 | [PS] | REQ-011–017, REQ-019–020 |
| REQ-025 | Builder methods: system_prompt, model, api_key, etc. | 1 | 1.3 | [PS] | REQ-024 |
| REQ-026 | Builder methods: tools, context_config, limits, etc. | 1 | 1.3 | [PS] | REQ-024 |
| REQ-027 | Steering/follow-up queues as Arc<Mutex> | 1 | 1.3 | [AR] | REQ-003, REQ-024 |
| REQ-028 | AgentContext struct | 1 | 1.4 | [AR] | REQ-003, REQ-021 |
| REQ-029 | AgentLoopConfig struct | 1 | 1.4 | [OV] | REQ-011–017, REQ-023 |
| REQ-030 | MockProvider implementation | 1 | 1.5 | [AR] | REQ-020 |
| REQ-031 | Smoke test: Agent constructs without error | 1 | 1.5 | [OV] | REQ-024–030 |
| REQ-032 | Unbounded async event channel | 2 | 2.1 | [AR] | REQ-007 |
| REQ-033 | CancellationToken with child_token propagation | 2 | 2.1 | [AR] | |
| REQ-034 | Agent::prompt() entry point | 2 | 2.2 | [PS] | REQ-002, REQ-035 |
| REQ-035 | Agent::prompt_messages_with_sender() | 2 | 2.2 | [PS] | REQ-027–029, REQ-033, REQ-036 |
| REQ-036 | agent_loop() implementation | 2 | 2.3 | [PS] | REQ-032, REQ-037 |
| REQ-037 | agent_loop_continue() implementation | 2 | 2.3 | [PS] | REQ-036 |
| REQ-038 | run_loop() inner loop (happy path) | 2 | 2.3 | [PS] | REQ-039, REQ-045, REQ-060 |
| REQ-039 | stream_assistant_response() (no retry) | 2 | 2.4 | [PS] | REQ-007–008, REQ-015, REQ-020, REQ-032 |
| REQ-040 | AnthropicProvider::stream() | 2 | 2.4 | [AR] | REQ-020, REQ-039 |
| REQ-041 | OpenAiCompatProvider::stream() | 2 | 2.4 | [AR] | REQ-020, REQ-039 |
| REQ-042 | ProviderRegistry with default() | 2 | 2.4 | [AR] | REQ-040, REQ-041 |
| REQ-043 | StopReason determination in providers | 2 | 2.4 | [PS] | REQ-005, REQ-040–041 |
| REQ-044 | Filter Extension messages before LLM call | 2 | 2.4 | [AR] | REQ-003, REQ-015 |
| REQ-045 | execute_tool_calls() (Parallel dispatch) | 2 | 2.5 | [PS] | REQ-046 |
| REQ-046 | execute_single_tool() | 2 | 2.5 | [PS] | REQ-007, REQ-009–010, REQ-021, REQ-033 |
| REQ-047 | BashTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-048 | ReadFileTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-049 | WriteFileTool::execute() | 2 | 2.5 | [AR] | REQ-010, REQ-021 |
| REQ-050 | EditFileTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-051 | ListFilesTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-052 | SearchTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-053 | default_tools() returning all 6 tools | 2 | 2.5 | [AR] | REQ-047–052 |
| REQ-054 | estimate_tokens() heuristic | 2 | 2.6 | [PS] | |
| REQ-055 | content_tokens() and message_tokens() | 2 | 2.6 | [PS] | REQ-001, REQ-003, REQ-054 |
| REQ-056 | compact_messages() 3-tier cascade | 2 | 2.6 | [PS] | REQ-055, REQ-057–059 |
| REQ-057 | level1_truncate_tool_outputs() | 2 | 2.6 | [PS] | REQ-003, REQ-054 |
| REQ-058 | level2_summarize_old_turns() | 2 | 2.6 | [PS] | REQ-003, REQ-054 |
| REQ-059 | level3_drop_middle() and keep_within_budget() | 2 | 2.6 | [PS] | REQ-003, REQ-054 |
| REQ-060 | Integrate compaction in run_loop | 2 | 2.6 | [PS] | REQ-038, REQ-056 |
| REQ-061 | ExecutionTracker::record_turn() and check_limits() | 2 | 2.7 | [AR] | REQ-012 |
| REQ-062 | Execution limit enforcement in run_loop | 2 | 2.7 | [PS] | REQ-038, REQ-061 |
| REQ-063 | Agent::save_messages() | 2 | 2.8 | [OV] | REQ-018 |
| REQ-064 | Agent::restore_messages() | 2 | 2.8 | [OV] | REQ-018, REQ-063 |
| REQ-065 | Agent::reset() | 2 | 2.8 | [AR] | REQ-033 |
| REQ-066 | Agent::steer() and Agent::follow_up() | 2 | 2.8 | [AR] | REQ-027 |
| REQ-067 | Agent::abort() | 2 | 2.8 | [AR] | REQ-033, REQ-035 |
| REQ-068 | Input filter chain execution | 3 | 3.1 | [PS] | REQ-022, REQ-036 |
| REQ-069 | Reject → emit InputRejected + AgentEnd([]) | 3 | 3.1 | [PS] | REQ-068 |
| REQ-070 | Warn → append warning text to last user message | 3 | 3.1 | [PS] | REQ-068 |
| REQ-071 | delay_for_attempt() exponential backoff with jitter | 3 | 3.2 | [PS] | REQ-013 |
| REQ-072 | is_retryable() on ProviderError | 3 | 3.2 | [AR] | REQ-020 |
| REQ-073 | retry_after() on ProviderError | 3 | 3.2 | [AR] | REQ-020 |
| REQ-074 | Retry loop in stream_assistant_response | 3 | 3.2 | [PS] | REQ-039, REQ-071–073 |
| REQ-075 | ProviderError::classify() HTTP status routing | 3 | 3.3 | [PS] | REQ-020 |
| REQ-076 | is_context_overflow() phrase matching | 3 | 3.3 | [PS] | |
| REQ-077 | Context overflow recovery trigger | 3 | 3.3 | [AR] | REQ-056, REQ-075–076 |
| REQ-078 | ToolError::Failed/InvalidArgs → error ToolResult | 3 | 3.4 | [AR] | REQ-010, REQ-046 |
| REQ-079 | ToolError::NotFound → "Tool X not found" | 3 | 3.4 | [PS] | REQ-046 |
| REQ-080 | ToolError::Cancelled → "Skipped" ToolResult | 3 | 3.4 | [AR] | REQ-010, REQ-046 |
| REQ-081 | Error stop reason handling in run_loop | 3 | 3.5 | [PS] | REQ-038, REQ-082 |
| REQ-082 | Aborted stop reason handling in run_loop | 3 | 3.5 | [PS] | REQ-038 |
| REQ-083 | Synthetic error Message::Assistant on provider failure | 3 | 3.5 | [PS] | REQ-002, REQ-039 |
| REQ-084 | execute_sequential() with steering check | 3 | 3.6 | [PS] | REQ-046, REQ-080 |
| REQ-085 | execute_batch() (Parallel) with post-batch steering | 3 | 3.6 | [PS] | REQ-046 |
| REQ-086 | Batched { size } dispatch with inter-batch steering | 3 | 3.6 | [PS] | REQ-085 |
| REQ-087 | Drain steering queue at start of outer loop | 3 | 3.7 | [PS] | REQ-038 |
| REQ-088 | Inject steering messages into pending after tools | 3 | 3.7 | [PS] | REQ-038, REQ-084–085 |
| REQ-089 | Follow-up queue check re-enters outer loop | 3 | 3.7 | [PS] | REQ-038 |
| REQ-090 | QueueMode::OneAtATime and QueueMode::All | 3 | 3.7 | [AR] | REQ-017, REQ-027 |
| REQ-091 | before_turn callback with abort-if-false | 3 | 3.8 | [PS] | REQ-038 |
| REQ-092 | after_turn callback on every turn | 3 | 3.8 | [PS] | REQ-038 |
| REQ-093 | on_error callback on Error stop reason | 3 | 3.8 | [PS] | REQ-081 |
| REQ-094 | BashTool deny patterns | 3 | 3.9 | [PS] | REQ-047 |
| REQ-095 | BashTool timeout + cancellation race | 3 | 3.9 | [PS] | REQ-047 |
| REQ-096 | BashTool output truncation | 3 | 3.9 | [PS] | REQ-047 |
| REQ-097 | BashTool confirm_fn callback | 3 | 3.9 | [PS] | REQ-047 |
| REQ-098 | ReadFileTool size limits (1MB text, 20MB image) | 3 | 3.9 | [PS] | REQ-048 |
| REQ-099 | ReadFileTool image path (base64, MIME detection) | 3 | 3.9 | [PS] | REQ-001, REQ-048 |
| REQ-100 | ReadFileTool cancellation check | 3 | 3.9 | [PS] | REQ-048 |
| REQ-101 | EditFileTool zero-match error with fuzzy hint | 3 | 3.9 | [PS] | REQ-050 |
| REQ-102 | EditFileTool multiple-match error | 3 | 3.9 | [PS] | REQ-050 |
| REQ-103 | EditFileTool cancellation check | 3 | 3.9 | [PS] | REQ-050 |
| REQ-104 | WriteFileTool cancellation check | 3 | 3.9 | [AR] | REQ-049 |
| REQ-105 | ListFilesTool timeout + max_results truncation | 3 | 3.9 | [PS] | REQ-051 |
| REQ-106 | SearchTool rg→grep fallback + cancellation | 3 | 3.9 | [PS] | REQ-052 |
| REQ-107 | is_streaming guard in prompt_messages_with_sender | 3 | 3.10 | [PS] | REQ-035 |
| REQ-108 | agent_loop_continue precondition validation | 3 | 3.10 | [PS] | REQ-037 |
| REQ-109 | SkillSet::load() with collision handling | 3 | 3.11 | [PS] | REQ-110 |
| REQ-110 | parse_frontmatter() with error variants | 3 | 3.11 | [PS] | |
| REQ-111 | SkillSet::format_for_prompt() XML output | 3 | 3.11 | [PS] | REQ-109 |
| REQ-112 | SkillSet::load_dir() and SkillSet::merge() | 3 | 3.11 | [AR] | REQ-109 |
| REQ-113 | Agent::with_skills() builder | 3 | 3.11 | [PS] | REQ-111 |
| REQ-114 | McpClient::connect_stdio() with handshake | 3 | 3.12 | [PS] | REQ-115, REQ-116 |
| REQ-115 | McpClient::send_request() JSON-RPC 2.0 | 3 | 3.12 | [PS] | |
| REQ-116 | McpClient::list_tools() and call_tool() | 3 | 3.12 | [PS] | REQ-115 |
| REQ-117 | McpToolAdapter implementing AgentTool | 3 | 3.12 | [AR] | REQ-001, REQ-021, REQ-116 |
| REQ-118 | All McpError variants → ToolError::Failed | 3 | 3.12 | [AR] | REQ-117 |
| REQ-119 | Agent::with_mcp_server_stdio() builder | 3 | 3.12 | [AR] | REQ-114, REQ-117 |
| REQ-120 | GoogleProvider::stream() (Gemini API) | 4 | 4.1 | [AR] | REQ-020 |
| REQ-121 | GoogleVertexProvider::stream() (Vertex AI) | 4 | 4.1 | [AR] | REQ-120 |
| REQ-122 | BedrockProvider::stream() (ConverseStream) | 4 | 4.1 | [AR] | REQ-020 |
| REQ-123 | OpenAiResponsesProvider::stream() | 4 | 4.1 | [AR] | REQ-020 |
| REQ-124 | AzureOpenAiProvider::stream() | 4 | 4.1 | [AR] | REQ-123 |
| REQ-125 | All 7 providers in ProviderRegistry::default() | 4 | 4.1 | [AR] | REQ-042, REQ-120–124 |
| REQ-126 | CacheStrategy::Auto breakpoint placement | 4 | 4.2 | [AR] | REQ-014, REQ-040 |
| REQ-127 | CacheStrategy::Manual and Disabled | 4 | 4.2 | [AR] | REQ-126 |
| REQ-128 | Cache token counts in Usage | 4 | 4.2 | [AR] | REQ-006, REQ-040 |
| REQ-129 | ThinkingLevel → Anthropic thinking parameter | 4 | 4.3 | [AR] | REQ-019, REQ-040 |
| REQ-130 | ThinkingLevel → OpenAI reasoning_effort | 4 | 4.3 | [AR] | REQ-019, REQ-041 |
| REQ-131 | Parse Thinking content from streaming responses | 4 | 4.3 | [AR] | REQ-001, REQ-008, REQ-040 |
| REQ-132 | McpClient::connect_http() | 4 | 4.4 | [AR] | REQ-115 |
| REQ-133 | Agent::with_mcp_server_http() with prefix support | 4 | 4.4 | [AR] | REQ-117, REQ-132 |
| REQ-134 | MCP stdio shutdown (EOF + kill) | 4 | 4.4 | [AR] | REQ-114 |
| REQ-135 | Structured retry logging | 4 | 4.5 | [PS] | REQ-074 |
| REQ-136 | ContextTracker hybrid token tracking | 4 | 4.5 | [AR] | REQ-054–055 |
| REQ-137 | ToolResult.details per-tool metadata | 4 | 4.5 | [AR] | REQ-047–052 |
| REQ-138 | OpenApiAuth credential redaction in debug | 4 | 4.6 | [AR] | |
| REQ-139 | BashTool default deny-pattern list | 4 | 4.6 | [PS] | REQ-094 |
| REQ-140 | CancellationToken::child_token() propagation | 4 | 4.7 | [PS] | REQ-033, REQ-046 |
| REQ-141 | Sub-agent inherits parent cancel token | 4 | 4.7 | [PS] | REQ-033, REQ-140 |
| REQ-142 | on_update callback → ToolExecutionUpdate event | 4 | 4.8 | [AR] | REQ-007, REQ-046 |
| REQ-143 | on_progress callback → ProgressMessage event | 4 | 4.8 | [AR] | REQ-007, REQ-046 |
| REQ-144 | Agent::prompt_with_sender() | 4 | 4.8 | [AR] | REQ-034 |
| REQ-145 | transform_context/convert_to_llm hooks | 4 | 4.8 | [PS] | REQ-039 |
| REQ-146 | Agent::with_compaction_strategy() builder | 4 | 4.8 | [AR] | REQ-023, REQ-060 |
| REQ-147 | ModelConfig struct and application in OpenAiCompat | 4 | 4.8 | [AR] | REQ-041 |
| REQ-148 | SubAgentTool::execute() | 5 | 5.1 | [PS] | REQ-036, REQ-157 |
| REQ-149 | extract_final_text() | 5 | 5.1 | [PS] | REQ-002 |
| REQ-150 | Sub-agent event forwarding to parent channel | 5 | 5.1 | [PS] | REQ-007, REQ-148 |
| REQ-151 | SubAgentTool builder API | 5 | 5.1 | [AR] | REQ-021, REQ-148 |
| REQ-152 | OpenApiAdapter::from_str() JSON/YAML parsing | 5 | 5.2 | [AR] | REQ-153–156 |
| REQ-153 | OpenAPI parameter classification | 5 | 5.2 | [AR] | REQ-021 |
| REQ-154 | OpenAPI HTTP execution pipeline | 5 | 5.2 | [AR] | REQ-021 |
| REQ-155 | OperationFilter variants | 5 | 5.2 | [AR] | REQ-152 |
| REQ-156 | name_prefix tool naming | 5 | 5.2 | [AR] | REQ-152 |
| REQ-157 | from_file() and from_url() spec sources | 5 | 5.2 | [AR] | REQ-152 |
| REQ-158 | OpenAPI builders on Agent + feature flag | 5 | 5.2 | [AR] | REQ-026, REQ-157 |
| REQ-159 | Anthropic OAuth auth path | 5 | 5.3 | [AR] | REQ-040 |
| REQ-160 | Anthropic InputJsonDelta tool-arg streaming | 5 | 5.3 | [AR] | REQ-040 |
| REQ-161 | [AMBIGUOUS] AgentEnd on abort policy | 5 | 5.4 | [PS] | REQ-067, REQ-082 |
| REQ-162 | [AMBIGUOUS] TokenCounter abstraction point | 5 | 5.4 | [OV] | REQ-054 |
| REQ-163 | [AMBIGUOUS] Sub-agent error propagation policy | 5 | 5.4 | [PS] | REQ-149 |
| REQ-164 | Compaction algorithm unit tests | 6 | 6.1 | [AR] | REQ-056–059 |
| REQ-165 | Property-based tests: budget invariant | 6 | 6.1 | [AR] | REQ-056 |
| REQ-166 | Retry backoff unit tests | 6 | 6.1 | [AR] | REQ-071 |
| REQ-167 | Provider integration tests (mock HTTP server) | 6 | 6.1 | [AR] | REQ-040–042, REQ-120–124 |
| REQ-168 | MCP stdio integration test | 6 | 6.1 | [AR] | REQ-114–119 |
| REQ-169 | End-to-end agent loop tests (MockProvider) | 6 | 6.1 | [AR] | REQ-036–090 |
| REQ-170 | Load test: 100 parallel agents, 10 concurrent tools | 6 | 6.2 | [AR] | REQ-045, REQ-085 |
| REQ-171 | Load test: 1,000-turn single agent with compaction | 6 | 6.2 | [AR] | REQ-056, REQ-060 |
| REQ-172 | Memory profile: message growth is bounded | 6 | 6.2 | [AR] | REQ-056, REQ-060 |
| REQ-173 | Public API reference documentation | 6 | 6.3 | [OV] | REQ-001–163 |
| REQ-174 | Provider integration contract documentation | 6 | 6.3 | [AR] | REQ-040–042, REQ-120–124 |
| REQ-175 | Working example implementations | 6 | 6.3 | [OV] | REQ-053, REQ-148 |
| REQ-176 | AgentSkills + MCP integration guides | 6 | 6.3 | [OV] | REQ-109–119 |
| REQ-177 | Library packaging with feature flags | 6 | 6.4 | [AR] | REQ-158 |
| REQ-178 | CI pipeline with gated live tests | 6 | 6.4 | [AR] | REQ-164–169 |
| REQ-179 | Operational runbooks | 6 | 6.4 | [AR] | REQ-071–077 |
| REQ-180 | ContinuationKind enum (Default, Rerun { tag }, Branch { tag }) | 4 | 4.9 | [AR] | |
| REQ-181 | TurnTrigger enum (User, Continuation, SubAgent, Branch) | 4 | 4.9 | [AR] | |
| REQ-182 | before_loop/after_loop hooks on AgentLoopConfig | 4 | 4.9 | [AR] | REQ-029, REQ-036 |
| REQ-183 | before_tool_execution/after_tool_execution hooks on AgentLoopConfig | 4 | 4.9 | [AR] | REQ-029, REQ-046 |
| REQ-184 | before_tool_execution_update/after_tool_execution_update hooks | 4 | 4.9 | [AR] | REQ-142, REQ-183 |
| REQ-185 | Guaranteed event hook ordering invariant | 4 | 4.9 | [AR] | REQ-182–184, REQ-091–092 |
| REQ-186 | provider_id() -> &str required method on StreamProvider; implement in all 7 providers | 4 | 4.9 | [AR] | REQ-020, REQ-125 |
| REQ-187 | config_id: Option<String> on AgentLoopConfig; auto-derived when None | 4 | 4.9 | [AR] | REQ-029, REQ-186 |
| REQ-188 | agent_id/session_id UUID fields on Agent; stable for Agent lifetime | 4 | 4.9 | [AR] | REQ-024 |
| REQ-189 | loop_counters and last_loop_id on Agent; next_loop_id() helper | 4 | 4.9 | [AR] | REQ-024, REQ-187, REQ-188 |
| REQ-190 | agent_id, session_id, loop_id, parent_loop_id, continuation_kind on AgentContext; write-back in agent_loop | 4 | 4.9 | [AR] | REQ-028, REQ-180, REQ-188 |
| REQ-191 | Assert agent_id/session_id are Some in agent_loop_continue | 4 | 4.9 | [AR] | REQ-037, REQ-190 |
| REQ-192 | AgentStart event: agent_id, session_id, loop_id, parent_loop_id, continuation_kind fields | 4 | 4.9 | [AR] | REQ-007, REQ-180, REQ-190 |
| REQ-193 | TurnStart.triggered_by: TurnTrigger; Branch continuation uses Branch on first turn | 4 | 4.9 | [AR] | REQ-007, REQ-181, REQ-190 |
| REQ-194 | child_loop_id: Option<String> on ToolResult and ToolExecutionEnd; set by sub-agent tools | 4 | 4.9 | [AR] | REQ-010, REQ-007, REQ-148 |
| REQ-195 | SubAgentTool::with_parent_loop_id(loop_id) builder; child AgentContext includes parent_loop_id | 4 | 4.9 | [AR] | REQ-151, REQ-190 |

Known Ambiguities

Items marked [AMBIGUOUS] in the spec that require a design decision before implementation:

| ID | Description | Suggested Resolution | Level Introduced |
|---|---|---|---|
| AMB-001 | AgentEnd emission on abort — pseudocode says AgentEnd is NOT emitted on abort, but notes this may vary depending on where in the loop cancellation is detected (provider Start/Done events may still arrive). | Define a clear policy: AgentEnd is ALWAYS emitted when the loop exits, including on abort, so callers can rely on the channel always closing cleanly. Gate this by ensuring cancellation detection before the loop attempts to emit AgentEnd. | 5 |
| AMB-002 | Token counting precision — estimate_tokens uses a 4-chars-per-token heuristic explicitly noted as imprecise. No integration with tiktoken or similar is specified. | Introduce a TokenCounter trait (or function pointer) on ContextConfig that defaults to the 4-char heuristic but can be overridden by the caller. This keeps the default zero-dependency while enabling precision via injection. | 5 |
| AMB-003 | Sub-agent error propagation — when a child agent_loop produces only error or tool-only messages (no Text in the final assistant message), extract_final_text returns a fixed fallback string. It is unclear whether the calling tool should return Ok(ToolResult { fallback }) or Err(ToolError::Failed(...)). | Return Ok(ToolResult) with the fallback text always. If the sub-agent produced an error assistant message, include the error_message field in the fallback text so the parent LLM can see and react to it. | 5 |
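
The suggested resolution for AMB-003 could look roughly like the sketch below. The types are simplified stand-ins: FinalAssistantSketch is hypothetical, and the FALLBACK wording is invented for illustration because the actual fallback string is not given here. The function always yields text suitable for an Ok(ToolResult) and folds any error_message into the fallback.

```rust
/// Simplified stand-in for the final assistant message of a child loop.
pub struct FinalAssistantSketch {
    pub text_blocks: Vec<String>,
    pub error_message: Option<String>,
}

/// Hypothetical fallback wording; the real string is not specified here.
pub const FALLBACK: &str = "(sub-agent produced no text output)";

/// Returns the child's final text, or the fallback (with any error appended)
/// when the final assistant message carries no Text content.
pub fn extract_final_text(last: Option<&FinalAssistantSketch>) -> String {
    match last {
        // Normal case: the child ended with text.
        Some(msg) if !msg.text_blocks.is_empty() => msg.text_blocks.join("\n"),
        // Error case: surface the error so the parent model can react to it.
        Some(FinalAssistantSketch { error_message: Some(err), .. }) => {
            format!("{} Error: {}", FALLBACK, err)
        }
        // Tool-only or empty output: plain fallback.
        _ => FALLBACK.to_string(),
    }
}
```

Returning text in every case keeps the parent loop running, while the appended error still gives the parent model enough information to retry or change approach.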

Level Completion Checklist

  • Level 1 — Survive: All core types, traits, and the Agent struct initialize without error; smoke test passes.
  • Level 2 — Useful: Text prompt → LLM call → tool execution → final response works end-to-end; all 6 built-in tools execute on valid input; message persistence round-trips correctly.
  • Level 3 — Smart: Input filters, retry, provider error classification, tool errors, execution limits, steering/follow-up queues, lifecycle callbacks, tool safety guards, skill loading, and MCP client all handle their error paths without panicking.
  • Level 4 — Professional: All 7 provider protocols implemented; prompt caching and extended thinking integrated; cancellation propagates to all I/O; structured logging in place; ContextTracker accurate.
  • Level 5 — Creative: Sub-agent delegation works end-to-end; OpenAPI adapter generates callable tools; Anthropic OAuth and InputJsonDelta streaming are correct; all three ambiguities have documented resolutions and implementations.
  • Level 6 — Boss: All test suites pass (unit, property-based, integration, end-to-end, load); public API docs and examples are complete; CI runs automatically; operational runbooks are written.

Session & Loop Identity — Future Scenarios

Added: 2026-03-22

Status: Foundation implemented (loop_id, ContinuationKind, parent_loop_id, child_loop_id). The scenarios below build on this foundation but are out of scope for the initial change.

The current implementation covers:

  • loop_id derived from session_id + config_id + counter (config owns its identity)
  • ContinuationKind enum: Default, Rerun { tag }, Branch { tag }
  • parent_loop_id for ancestry tracking across reruns/branches
  • child_loop_id on ToolExecutionEnd for parent→sub-agent traceability
  • Asserts in agent_loop_continue requiring agent_id/session_id to be set
  • TurnTrigger::Branch fires on first turn of a Branch continuation
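
To illustrate the first item, a loop_id built from session_id + config_id + counter might be derived as in the sketch below. Everything concrete here is an assumption: the LoopIdsSketch type, the colon-separated format, and counters that start at 1 and are kept per config_id. Only the three inputs and the loop_counters/last_loop_id/next_loop_id names come from the requirements.

```rust
use std::collections::HashMap;

/// Hypothetical holder for the identity fields involved in loop_id derivation.
pub struct LoopIdsSketch {
    session_id: String,
    loop_counters: HashMap<String, u64>,
    last_loop_id: Option<String>,
}

impl LoopIdsSketch {
    pub fn new(session_id: impl Into<String>) -> Self {
        Self {
            session_id: session_id.into(),
            loop_counters: HashMap::new(),
            last_loop_id: None,
        }
    }

    /// Derives the next loop ID for a config and remembers it as the latest.
    pub fn next_loop_id(&mut self, config_id: &str) -> String {
        // Assumption: each config_id has its own counter, starting at 1.
        let counter = self.loop_counters.entry(config_id.to_string()).or_insert(0);
        *counter += 1;
        let n = *counter;
        // Assumption: the "session:config:counter" layout is illustrative only.
        let id = format!("{}:{}:{}", self.session_id, config_id, n);
        self.last_loop_id = Some(id.clone());
        id
    }

    pub fn last_loop_id(&self) -> Option<&str> {
        self.last_loop_id.as_deref()
    }
}
```

Under these assumptions, two configs running in the same session never collide, and last_loop_id gives a natural value for parent_loop_id on the next rerun or branch.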

Future: HITL Resume

Scenario: User cancels a loop mid-execution (via Agent::abort()), reviews the partial output, then resumes. The loop was aborted at some known message boundary.

Mechanism: Caller restores context.messages to the desired resume point, then calls agent_loop_continue(Rerun | Branch). The kind communicates intent:

  • Rerun — resume from the same point (same logical path, treat as a retry)
  • Branch — resume but with modifications (e.g., injected steering message, different system prompt, tweaked tool result) — a diverging path from the original

What needs to be built: A context.messages checkpoint API. The current Agent::messages() getter returns a slice; the caller needs to be able to snapshot and restore it. The save_messages / restore_messages methods on Agent already support this (JSON round-trip). The missing piece is a higher-level Agent::checkpoint() -> Checkpoint and Agent::restore(checkpoint) that bundle the full state (messages + loop_id + session_id) for clean HITL resume without manual field management.
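
A minimal sketch of the proposed checkpoint()/restore() pair follows. None of it exists yet, so every name is hypothetical: AgentSketch stands in for Agent, and messages are plain strings rather than AgentMessage values. It shows only the bundling idea, where messages and identity fields travel together.

```rust
/// Hypothetical bundle of the state needed for a clean HITL resume.
#[derive(Clone, Debug, PartialEq)]
pub struct Checkpoint {
    pub messages: Vec<String>, // stand-in for the real message type
    pub session_id: String,
    pub loop_id: Option<String>,
}

/// Simplified stand-in for the parts of Agent that a checkpoint captures.
pub struct AgentSketch {
    pub messages: Vec<String>,
    pub session_id: String,
    pub last_loop_id: Option<String>,
}

impl AgentSketch {
    /// Snapshot messages and identity fields in one value.
    pub fn checkpoint(&self) -> Checkpoint {
        Checkpoint {
            messages: self.messages.clone(),
            session_id: self.session_id.clone(),
            loop_id: self.last_loop_id.clone(),
        }
    }

    /// Roll the agent back to a previously taken checkpoint.
    pub fn restore(&mut self, checkpoint: Checkpoint) {
        self.messages = checkpoint.messages;
        self.session_id = checkpoint.session_id;
        self.last_loop_id = checkpoint.loop_id;
    }
}
```

After restore, the caller would continue with agent_loop_continue using Rerun or Branch as described above; restored loop_id is the natural candidate for parent_loop_id.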

Future: Checkpoint Restore

Scenario: Context is serialized to persistent storage (database, file) and later loaded for a new run — either by the same process after restart or by a different process instance.

Mechanism: Same as HITL resume at the loop level. The caller deserializes context.messages and sets the identity fields (agent_id, session_id, loop_id) to their original values, then calls agent_loop_continue(Branch). The parent_loop_id points to the last loop ID from the original session, maintaining the ancestry chain across process boundaries.

What needs to be built: A serializable AgentSnapshot type that captures everything needed to resume: messages, agent_id, session_id, last_loop_id, and any relevant config fields. AgentSnapshot::save(path) / AgentSnapshot::load(path) convenience methods. The snapshot does NOT include the provider config (API keys, base URLs) — those remain in the caller's environment.

Future: Parallel Exploration

Scenario: Multiple branches from the same checkpoint are run concurrently — e.g., A/B testing two different tool result injections, or evaluating three different system prompt variants on the same conversation prefix.

Mechanism: The caller snapshots the context at a branching point, then calls multiple agent_loop_continue(Branch) concurrently, each with a different modification to context.messages before the call. Each concurrent call produces an independent event stream with its own loop_id and parent_loop_id pointing to the same branch-point loop.

What needs to be built: No new primitives are needed — agent_loop_continue and AgentContext already support this. The caller is responsible for cloning the context and making independent calls. A higher-level Agent::explore_branches(Vec<BranchSpec>) -> Vec<Receiver<AgentEvent>> convenience method could simplify the pattern but is not required for correctness.

Concurrency note: Each branch needs its own AgentContext (owned), its own CancellationToken, and its own mpsc::UnboundedSender. tokio::spawn each agent_loop_continue call independently. The parent task collects results from all branch receivers.
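
The caller-side pattern can be sketched as follows. To stay self-contained, this uses std::thread and std::sync::mpsc in place of tokio::spawn and Tokio's unbounded channel, omits the per-branch CancellationToken, and replaces agent_loop_continue with a stub. ContextSketch, EventSketch, run_branch, and the branch loop_id format are all hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Simplified stand-in for AgentContext.
#[derive(Clone, Debug)]
pub struct ContextSketch {
    pub messages: Vec<String>,
    pub parent_loop_id: Option<String>,
}

/// Simplified stand-in for AgentEvent.
#[derive(Debug, PartialEq)]
pub enum EventSketch {
    Start { loop_id: String, parent_loop_id: Option<String> },
    End { message_count: usize },
}

/// Stub for agent_loop_continue(Branch): emits a start and an end event.
fn run_branch(loop_id: String, ctx: ContextSketch, tx: Sender<EventSketch>) {
    let _ = tx.send(EventSketch::Start {
        loop_id,
        parent_loop_id: ctx.parent_loop_id.clone(),
    });
    let _ = tx.send(EventSketch::End { message_count: ctx.messages.len() });
}

/// Starts one independent branch per modification; each gets its own cloned
/// context and its own channel, and all point at the same branch-point loop.
pub fn explore_branches(
    base: &ContextSketch,
    branch_point: &str,
    modifications: Vec<String>,
) -> Vec<Receiver<EventSketch>> {
    modifications
        .into_iter()
        .enumerate()
        .map(|(i, extra)| {
            let mut ctx = base.clone(); // owned context per branch
            ctx.messages.push(extra); // the per-branch modification
            ctx.parent_loop_id = Some(branch_point.to_string());
            let (tx, rx) = channel();
            let loop_id = format!("{}/branch-{}", branch_point, i);
            thread::spawn(move || run_branch(loop_id, ctx, tx));
            rx
        })
        .collect()
}
```

Because each branch owns its context and sender, the branches share no mutable state, and the base context is left untouched for further exploration.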

Future: Auto Origin/Continue Selection

Scenario: The caller wants to send a new message to the agent without knowing whether the current context requires an origin call (agent_loop) or a continuation (agent_loop_continue).

Mechanism: Inspect context.messages.last():

  • No messages → agent_loop (fresh start)
  • Last message is User or ToolResult → agent_loop_continue (already awaiting model response)
  • Last message is Assistant → agent_loop with new prompt (start new turn)

What needs to be built: An Agent::send(message) method (or similar) that encapsulates this logic. It would inspect the context state, build the appropriate call type, and dispatch. This trades explicit caller control for convenience and is opt-in.
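
The selection rule amounts to a single match on the last message. The sketch below uses hypothetical stand-in types (RoleSketch, Dispatch) and ignores any other message kinds the real context may hold, so it shows the decision only, not an Agent::send implementation.

```rust
/// Simplified stand-in for the kind of the last message in the context.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoleSketch {
    User,
    Assistant,
    ToolResult,
}

/// Which entry point a send-style helper would dispatch to.
#[derive(Debug, PartialEq, Eq)]
pub enum Dispatch {
    /// agent_loop on an empty context (fresh start).
    FreshStart,
    /// agent_loop_continue: the context is already awaiting a model response.
    Continue,
    /// agent_loop with the new prompt appended (start a new turn).
    NewTurn,
}

/// Picks the call type from the last message, following the three cases above.
pub fn select_dispatch(messages: &[RoleSketch]) -> Dispatch {
    match messages.last() {
        None => Dispatch::FreshStart,
        Some(RoleSketch::User) | Some(RoleSketch::ToolResult) => Dispatch::Continue,
        Some(RoleSketch::Assistant) => Dispatch::NewTurn,
    }
}
```

Keeping the decision in a pure function makes it easy to unit-test separately from the dispatch itself.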