phi-core
Simple, effective agent loop in Rust.
phi-core is a library for building LLM-powered agents that can use tools. It provides the core loop — prompt the model, execute tool calls, feed results back — and gets out of your way.
Philosophy
The loop is the product. An agent is just a loop: send messages to an LLM, get back text and tool calls, execute the tools, repeat until the model stops. phi-core implements this loop with streaming, cancellation, context management, and multi-provider support — so you don't have to.
Features
- Streaming events: real-time AgentEvent stream for UI updates (text deltas, thinking, tool execution)
- Multi-provider: Anthropic, OpenAI, Google Gemini, Amazon Bedrock, Azure OpenAI, and any OpenAI-compatible API
- Tool system: AgentTool trait with built-in coding tools (bash, file read/write/edit, search)
- Context management: automatic token estimation, tiered compaction (truncate tool outputs → summarize → drop old messages)
- Execution limits: max turns, tokens, and wall-clock time
- Steering & follow-ups: interrupt the agent mid-run or queue work for after it finishes
- Cancellation: CancellationToken-based abort at any point
- Builder pattern: ergonomic BasicAgent struct with chainable configuration; Agent trait for polymorphism
- Config-driven construction: TOML/JSON/YAML config → agent_from_config() → Arc<dyn Agent>
- Session persistence: SessionRecorder materializes structured session/loop/turn records from events
- Sub-agents: delegate tasks to child agent loops via SubAgentTool
- MCP integration: connect to external tool servers via Model Context Protocol (stdio + HTTP)
- Evaluational parallelism: run N configs concurrently, select the best result via EvaluationStrategy
Ecosystem
phi-core is part of the LazyBouy ecosystem. It powers the agent backend for Phi applications.
- Repository: github.com/LazyBouy/phi-core
- License: MIT
Installation
Requirements
- Rust 2021 edition (1.75+)
- Tokio async runtime
Add to Cargo.toml
[dependencies]
phi-core = "0.8"
Dependencies
phi-core brings in these key dependencies automatically:
| Crate | Purpose |
|---|---|
| tokio | Async runtime (full features) |
| serde / serde_json | Serialization |
| reqwest | HTTP client for provider APIs |
| reqwest-eventsource | SSE streaming |
| async-trait | Async trait support |
| tokio-util | CancellationToken |
| thiserror | Error types |
| tracing | Logging |
Feature Flags
All providers and built-in tools are included by default. Optional features:
| Feature | Dependencies | Description |
|---|---|---|
| openapi | openapiv3, serde_yaml | Auto-generate tools from OpenAPI 3.0 specs |
Enable in Cargo.toml:
[dependencies]
phi-core = { version = "0.8", features = ["openapi"] }
Quick Start
Basic Example with Anthropic
use phi_core::{BasicAgent, AgentEvent, StreamDelta};
use phi_core::provider::ModelConfig;
use phi_core::tools::default_tools;

#[tokio::main]
async fn main() {
    let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

    let mut agent = BasicAgent::new(ModelConfig::anthropic(
        "claude-sonnet-4-20250514",
        "Claude Sonnet 4",
        &api_key,
    ))
    .with_system_prompt("You are a helpful coding assistant.")
    .with_tools(default_tools());

    let mut rx = agent.prompt("List the files in the current directory").await;

    while let Some(event) = rx.recv().await {
        match event {
            AgentEvent::MessageUpdate { delta, .. } => match delta {
                StreamDelta::Text { delta } => print!("{}", delta),
                StreamDelta::Thinking { delta } => print!("[thinking] {}", delta),
                _ => {}
            },
            AgentEvent::ToolExecutionStart { tool_name, .. } => {
                println!("\n→ Running tool: {}", tool_name);
            }
            AgentEvent::ToolExecutionEnd { tool_name, is_error, .. } => {
                if is_error {
                    println!(" ✗ {} failed", tool_name);
                } else {
                    println!(" ✓ {} done", tool_name);
                }
            }
            AgentEvent::AgentEnd { .. } => {
                println!("\n\nDone.");
            }
            _ => {}
        }
    }
}
Example with OpenAI-Compatible Provider
For OpenAI, xAI, Groq, or any compatible API, use ModelConfig::openai() or ModelConfig::local():
use phi_core::{BasicAgent, AgentEvent, StreamDelta};
use phi_core::provider::ModelConfig;
use phi_core::tools::default_tools;

#[tokio::main]
async fn main() {
    let api_key = std::env::var("OPENAI_API_KEY").unwrap();

    let mut agent = BasicAgent::new(ModelConfig::openai("gpt-4o", "GPT-4o", &api_key))
        .with_system_prompt("You are a helpful assistant.")
        .with_tools(default_tools());

    let mut rx = agent.prompt("What is 2 + 2?").await;

    while let Some(event) = rx.recv().await {
        match event {
            AgentEvent::MessageUpdate { delta, .. } => {
                if let StreamDelta::Text { delta } = delta {
                    print!("{}", delta);
                }
            }
            AgentEvent::AgentEnd { .. } => println!(),
            _ => {}
        }
    }
}
Real-Time Streaming
By default, agent.prompt() blocks until the loop finishes and returns a receiver with all events buffered. To consume events in real-time, use prompt_with_sender() with a caller-provided channel:
use phi_core::{BasicAgent, AgentEvent, StreamDelta};
use phi_core::provider::ModelConfig;
use phi_core::tools::default_tools;

#[tokio::main]
async fn main() {
    let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

    let mut agent = BasicAgent::new(ModelConfig::anthropic(
        "claude-sonnet-4-20250514",
        "Claude Sonnet 4",
        &api_key,
    ))
    .with_system_prompt("You are a helpful assistant.")
    .with_tools(default_tools());

    let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel();

    // Consume events in real-time on a separate task
    tokio::spawn(async move {
        while let Some(event) = rx.recv().await {
            match event {
                AgentEvent::MessageUpdate { delta, .. } => {
                    if let StreamDelta::Text { delta } = delta {
                        print!("{}", delta);
                    }
                }
                AgentEvent::AgentEnd { .. } => println!(),
                _ => {}
            }
        }
    });

    // This blocks until the loop finishes; state is restored automatically
    agent.prompt_with_sender("What is 2 + 2?", tx).await;

    // Agent is ready for another prompt immediately
    let _rx = agent.prompt("Follow up question").await;
}
Using the Low-Level API
For more control, use agent_loop() directly:
use phi_core::agent_loop::{agent_loop, AgentLoopConfig};
use phi_core::provider::ModelConfig;
use phi_core::types::*;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel();
    let cancel = CancellationToken::new();
    let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

    let mut context = AgentContext {
        system_prompt: "You are helpful.".into(),
        messages: Vec::new(),
        tools: phi_core::tools::default_tools(),
        ..Default::default()
    };

    let config = AgentLoopConfig {
        model_config: ModelConfig::anthropic(
            "claude-sonnet-4-20250514",
            "Claude Sonnet 4",
            &api_key,
        ),
        thinking_level: ThinkingLevel::Off,
        max_tokens: None,
        temperature: None,
        convert_to_llm: None,
        transform_context: None,
        get_steering_messages: None,
        get_follow_up_messages: None,
        context_config: None,
        execution_limits: None,
        cache_config: CacheConfig::default(),
        tool_execution: ToolExecutionStrategy::default(),
        retry_config: phi_core::RetryConfig::default(),
        before_turn: None,
        after_turn: None,
        on_error: None,
        input_filters: vec![],
        ..Default::default()
    };

    let prompts = vec![AgentMessage::Llm(Message::user("Hello!"))];
    let new_messages = agent_loop(prompts, &mut context, &config, tx, cancel).await;

    // Drain events
    while let Ok(_event) = rx.try_recv() {
        // handle events...
    }

    println!("Got {} new messages", new_messages.len());
}
The Agent Loop
The agent loop is the core of phi-core. It implements the fundamental cycle:
User prompt → LLM call → Tool execution → LLM call → ... → Final response
The agent_loop module contains the core loop logic in mod.rs and the evaluation sub-module for evaluational parallelism strategies.
How It Works
┌──────────────────────────────────────────────┐
│ agent_loop() │
│ │
│ 1. Add prompts to context │
│ 2. Emit AgentStart + TurnStart │
│ │
│ ┌─────────── Inner Loop ──────────────┐ │
│ │ • Check steering messages │ │
│ │ • Check execution limits │ │
│ │ • Compact context (if configured) │ │
│ │ • Stream LLM response │ │
│ │ • Extract tool calls │ │
│ │ • Execute tools (with steering) │ │
│ │ • Emit TurnEnd │ │
│ │ • Continue if tool_calls or steer │ │
│ └─────────────────────────────────────┘ │
│ │
│ 3. Check follow-up messages │
│ 4. If follow-ups exist, loop again │
│ 5. Emit AgentEnd │
└──────────────────────────────────────────────┘
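The control flow in the diagram can be sketched without any phi-core types. Everything below is hypothetical (`Reply`, `run_loop`, and the closure-based model and tool are illustration-only names); it shows only the "continue while the model asks for tools" rule plus a turn limit, and it is not phi-core's actual implementation.

```rust
/// Hypothetical model reply: final text, or a request to run a named tool.
#[derive(Debug, Clone, PartialEq)]
enum Reply {
    Text(String),
    ToolCall { name: String, input: String },
}

/// Toy agent loop: call the model, run any requested tool, feed the result
/// back, and stop on plain text or once `max_turns` model calls were made.
/// Returns everything appended to the context after the prompt.
fn run_loop(
    prompt: &str,
    max_turns: usize,
    model: impl Fn(&[String]) -> Reply,
    tool: impl Fn(&str, &str) -> String,
) -> Vec<String> {
    let mut context = vec![format!("user: {prompt}")];
    let start = context.len();
    for _turn in 0..max_turns {
        match model(&context) {
            Reply::Text(text) => {
                context.push(format!("assistant: {text}"));
                break; // no tool calls, so the run is finished
            }
            Reply::ToolCall { name, input } => {
                let output = tool(name.as_str(), input.as_str());
                context.push(format!("tool_call: {name}({input})"));
                context.push(format!("tool_result: {output}"));
            }
        }
    }
    context.split_off(start)
}
```

Steering, follow-ups, compaction, and streaming are deliberately left out; in the real loop they hook in at the points listed in the diagram.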
Entry Points
agent_loop()
Starts a new agent run with prompt messages:
pub async fn agent_loop(
    prompts: Vec<AgentMessage>,
    context: &mut AgentContext,
    config: &AgentLoopConfig,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
The prompts are added to context, then the loop runs. Returns all new messages generated during the run.
agent_loop_continue()
Resumes from existing context (e.g., after an error, retry, or branch):
pub async fn agent_loop_continue(
    context: &mut AgentContext,
    config: &AgentLoopConfig,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
Preconditions: context.agent_id and context.session_id must be Some — the function panics with a descriptive message otherwise. In practice, any context that passed through agent_loop() at least once already has these set. When constructing a context manually (e.g., from a persisted snapshot), set them explicitly before calling this function.
The last message in context must also not be an assistant message.
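Because a violated precondition panics, a caller restoring a context from storage may want to validate it first. The sketch below uses stand-in types (`ContextSnapshot`, `Role`, and `check_can_continue` are hypothetical, not phi-core items) to show the checks described here, plus the non-empty requirement that the evaluational parallelism guide states for the same function.

```rust
/// Stand-in for the role of a message in a persisted context (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    User,
    Assistant,
    ToolResult,
}

/// Stand-in for the parts of a context that the preconditions look at.
struct ContextSnapshot {
    agent_id: Option<String>,
    session_id: Option<String>,
    roles: Vec<Role>,
}

/// Returns Ok(()) when resuming should be safe, or a description of the
/// first violated precondition.
fn check_can_continue(ctx: &ContextSnapshot) -> Result<(), String> {
    if ctx.agent_id.is_none() {
        return Err("agent_id is not set".to_string());
    }
    if ctx.session_id.is_none() {
        return Err("session_id is not set".to_string());
    }
    match ctx.roles.last() {
        None => Err("context has no messages".to_string()),
        Some(Role::Assistant) => Err("last message is an assistant message".to_string()),
        Some(_) => Ok(()),
    }
}
```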
AgentLoopConfig
pub struct AgentLoopConfig {
    /// REQUIRED: complete provider identity (model id, api_key, base_url, protocol, cost rates).
    pub model_config: ModelConfig,
    /// Optional override: bypasses ProviderRegistry, used for MockProvider in tests.
    pub provider_override: Option<Arc<dyn StreamProvider>>,
    pub config_id: Option<String>,
    pub thinking_level: ThinkingLevel,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub convert_to_llm: Option<ConvertToLlmFn>,
    pub transform_context: Option<TransformContextFn>,
    pub get_steering_messages: Option<GetMessagesFn>,
    pub get_follow_up_messages: Option<GetMessagesFn>,
    pub context_config: Option<ContextConfig>,
    pub execution_limits: Option<ExecutionLimits>,
    pub cache_config: CacheConfig,
    pub tool_execution: ToolExecutionStrategy,
    pub retry_config: RetryConfig,
    pub before_loop: Option<BeforeLoopFn>,
    pub after_loop: Option<AfterLoopFn>,
    pub before_turn: Option<BeforeTurnFn>,
    pub after_turn: Option<AfterTurnFn>,
    pub on_error: Option<OnErrorFn>,
    pub before_tool_execution: Option<BeforeToolExecutionFn>,
    pub after_tool_execution: Option<AfterToolExecutionFn>,
    pub before_tool_execution_update: Option<BeforeToolExecutionUpdateFn>,
    pub after_tool_execution_update: Option<AfterToolExecutionUpdateFn>,
    pub before_compaction_start: Option<BeforeCompactionStartFn>,
    pub after_compaction_end: Option<AfterCompactionEndFn>,
    pub input_filters: Vec<Arc<dyn InputFilter>>,
    pub first_turn_trigger: TurnTrigger,
    pub context_translation: Option<Arc<dyn ContextTranslationStrategy>>,
    pub prun_pending: Option<Arc<Mutex<Vec<PrunRequest>>>>,
}
| Field | Purpose |
|---|---|
| model_config | Required. Complete provider identity: model id, api_key, base_url, api protocol, cost rates, compat flags. The provider is resolved from model_config.api via ProviderRegistry. |
| provider_override | Custom Arc<dyn StreamProvider>; bypasses the registry when Some. Used for MockProvider in tests or fully custom backends. |
| config_id | Optional stable identity for this config; auto-derived as "{provider_id}.{model_slug}[.thinking]" when None. Used as the middle segment of loop_id. |
| thinking_level | Off, Minimal, Low, Medium, High |
| convert_to_llm | Custom AgentMessage[] → Message[] conversion |
| transform_context | Pre-processing hook for context pruning |
| get_steering_messages | Returns user interruptions during tool execution |
| get_follow_up_messages | Returns queued work after agent would stop |
| context_config | Token budget and compaction settings |
| execution_limits | Max turns, tokens, duration |
| cache_config | Prompt caching behavior (see Prompt Caching) |
| tool_execution | Parallel, Sequential, or Batched (see Tools) |
| retry_config | Retry behavior for transient errors (see Retry) |
| before_loop | Called once before AgentStart; return false to abort the entire run (see Callbacks) |
| after_loop | Called once after AgentEnd with all new messages and accumulated usage (see Callbacks) |
| before_turn | Called before each LLM call; return false to abort (see Callbacks) |
| after_turn | Called after each turn with messages and usage (see Callbacks) |
| on_error | Called on StopReason::Error with the error string (see Callbacks) |
| before_tool_execution | Called before each tool call; return false to skip it (see Callbacks) |
| after_tool_execution | Called after each tool call completes (see Callbacks) |
| before_tool_execution_update | Called before each streaming tool update; return false to suppress the event (see Callbacks) |
| after_tool_execution_update | Called after each streaming tool update event (see Callbacks) |
| before_compaction_start | Called before compaction starts with (estimated_tokens, message_count); return false to skip compaction for this cycle (see Callbacks) |
| after_compaction_end | Called after compaction completes with (messages_before, messages_after, tokens_before, tokens_after) (see Callbacks) |
| input_filters | Input filters applied to user messages before the LLM call (see Tools) |
| first_turn_trigger | The TurnTrigger for the first TurnStart event; defaults to TurnTrigger::User, set to SubAgent by sub-agent callers |
| context_translation | Optional ContextTranslationStrategy for cross-provider compatibility; translates content types (e.g., Content::Thinking) when targeting a different provider (G8) |
| prun_pending | Shared state for PrunTool to communicate pruning requests to the loop; set automatically by with_prun_tool() |
0.9.0: async lifecycle hooks.
BeforeLoopFn, AfterLoopFn, BeforeTurnFn, AfterTurnFn, OnErrorFn, BeforeToolExecutionFn, AfterToolExecutionFn, BeforeCompactionStartFn, and AfterCompactionEndFn are now async: their function bodies return Pin<Box<dyn Future<Output = T> + Send>> (alias: HookFuture<'_, T>). Sync closure bodies migrate by wrapping in Box::pin(async move { ... }). Closures can now .await LLM calls and other async work directly without a tokio::task::block_in_place bridge.
Pre-existing-behaviour preservation note (phi-core 0.9.0): tool-update hooks stay sync.
BeforeToolExecutionUpdateFn and AfterToolExecutionUpdateFn remain Arc<dyn Fn(&str, &str, &str) -> bool + Send + Sync> and Arc<dyn Fn(&str, &str, &str) + Send + Sync> respectively. Async-ifying them would cascade into the ToolUpdateFn callback type and every AgentTool::execute body that invokes ctx.on_update(...), a materially wider migration than the 0.9.0 scope. The veto decision in BeforeToolExecutionUpdateFn is synchronous so the surrounding emit gate works without an .await at every streamed tool-update; consumers that need async work at update-time should dispatch via tokio::spawn(...) inside the sync closure body. Tracked under the CHANGELOG [Unreleased] "Forward markers" section for a future release.
InputFilter::filter() is also now async fn via #[async_trait]; see Tools and the per-turn debug-capture surface at debugging.md.
Steering & Follow-Ups
Steering
Steering messages interrupt the agent between tool executions. When the agent is executing multiple tool calls from a single LLM response, steering is checked after each tool completes. If a steering message is found:
- The current tool finishes normally
- All remaining tool calls are skipped with is_error: true and "Skipped due to queued user message"
- The steering message is injected into context
- The loop continues with a new LLM call that sees the interruption
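The skip behaviour in these steps can be illustrated with a small, self-contained sketch. `ToolOutcome`, `execute_with_steering`, and the two closures are hypothetical stand-ins rather than phi-core APIs; the only details taken from the text are the check after each completed tool, the `is_error: true` flag, and the skip message.

```rust
/// Hypothetical record of what happened to one tool call.
#[derive(Debug, PartialEq)]
struct ToolOutcome {
    name: String,
    is_error: bool,
    output: String,
}

/// Runs tool calls in order and polls for steering after each completed
/// tool. Once a steering message arrives, the remaining calls are recorded
/// as skipped errors and the message is returned for injection into context.
fn execute_with_steering(
    tool_calls: &[&str],
    mut run_tool: impl FnMut(&str) -> String,
    mut poll_steering: impl FnMut() -> Option<String>,
) -> (Vec<ToolOutcome>, Option<String>) {
    let mut outcomes = Vec::new();
    let mut steering: Option<String> = None;
    for &name in tool_calls {
        if steering.is_some() {
            outcomes.push(ToolOutcome {
                name: name.to_string(),
                is_error: true,
                output: "Skipped due to queued user message".to_string(),
            });
            continue;
        }
        // The current tool always finishes before steering is checked.
        let output = run_tool(name);
        outcomes.push(ToolOutcome { name: name.to_string(), is_error: false, output });
        steering = poll_steering();
    }
    (outcomes, steering)
}
```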
// While agent is running tools, redirect it:
agent.steer(AgentMessage::Llm(Message::user("Stop that. Instead, explain what you found.")));
Follow-Ups
Follow-up messages are checked after the agent would normally stop (no more tool calls, no steering). If follow-ups exist, the loop continues with them as new input — the agent doesn't need to be re-prompted.
// Queue work for after the agent finishes its current task:
agent.follow_up(AgentMessage::Llm(Message::user("Now run the tests.")));
agent.follow_up(AgentMessage::Llm(Message::user("Then commit the changes.")));
Queue Modes
Both queues support two delivery modes:
| Mode | Behavior |
|---|---|
| QueueMode::OneAtATime | Delivers one message per turn (default) |
| QueueMode::All | Delivers all queued messages at once |
agent.set_steering_mode(QueueMode::All);
agent.set_follow_up_mode(QueueMode::OneAtATime);
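As an illustration of the difference between the two modes, here is a minimal sketch of draining a queue. `Mode` and `drain_queue` are hypothetical names, and the `VecDeque<String>` stands in for whatever queue type phi-core uses internally.

```rust
use std::collections::VecDeque;

/// Hypothetical mirror of the two delivery modes.
#[derive(Clone, Copy)]
enum Mode {
    OneAtATime,
    All,
}

/// Takes the messages to deliver this turn, leaving the rest queued.
fn drain_queue(queue: &mut VecDeque<String>, mode: Mode) -> Vec<String> {
    match mode {
        // At most one message leaves the queue per call.
        Mode::OneAtATime => queue.pop_front().into_iter().collect(),
        // Everything queued so far is delivered together, in order.
        Mode::All => queue.drain(..).collect(),
    }
}
```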
Queue Management
agent.clear_steering_queue();   // Drop all pending steers
agent.clear_follow_up_queue();  // Drop all pending follow-ups
agent.clear_all_queues();       // Drop everything
Low-Level API
When using agent_loop() directly, steering and follow-ups are provided via callback functions:
let config = AgentLoopConfig {
    get_steering_messages: Some(Box::new(|| {
        // Return Vec<AgentMessage>: checked between tool calls
        vec![]
    })),
    get_follow_up_messages: Some(Box::new(|| {
        // Return Vec<AgentMessage>: checked when agent would stop
        vec![]
    })),
    // ...
};
Custom Compaction
By default, when context exceeds the token budget in ContextConfig, phi-core runs a 3-level compaction strategy: truncate tool outputs → summarize old turns → drop middle messages (legacy in-memory path via compact_messages()). When a Session is available, the modern system uses non-destructive CompactionBlock overlays — see compaction. You can replace this with your own CompactionStrategy.
CompactionStrategy vs BlockCompactionStrategy
- CompactionStrategy: Legacy in-memory approach. Destructive: it mutates the message list directly. Used when AgentContext.session is None (no session persistence).
- BlockCompactionStrategy: New overlay approach. Non-destructive: it creates a CompactionBlock on the LoopRecord rather than altering the original messages. Used when AgentContext.session is Some (session-backed execution). Original messages remain authoritative for replay and branching.
Example of a custom CompactionStrategy:
use phi_core::context::{CompactionStrategy, ContextConfig, CompactionConfig, compact_messages};
use phi_core::types::*;
use std::sync::Arc;

struct MyCompaction;

impl CompactionStrategy for MyCompaction {
    fn compact(
        &self,
        messages: Vec<AgentMessage>,
        config: &ContextConfig,
    ) -> Vec<AgentMessage> {
        // Your logic here, then optionally delegate to the default:
        compact_messages(messages, config)
    }
}

// Modern pattern: set strategies via ContextConfig.compaction
let context_config = ContextConfig {
    compaction: CompactionConfig {
        // in_memory_strategy: used when AgentContext.session is None (sub-agents, tests)
        in_memory_strategy: Some(Arc::new(MyCompaction)),
        // block_strategy: used when AgentContext.session is Some (session-backed execution)
        // block_strategy: Some(Arc::new(MyBlockCompaction)),
        ..CompactionConfig::default()
    },
    ..ContextConfig::default()
};

let agent = BasicAgent::new(model_config)
    .with_context_config(context_config);
The in-memory strategy is called once per turn, right before the LLM call, whenever context_config is Some and AgentContext.session is None. When in_memory_strategy is None, DefaultCompaction (which wraps compact_messages()) is used automatically. When a session is present, block_strategy is used instead (defaulting to DefaultBlockCompaction).
Use Cases
Memory-aware compaction — Index messages into a vector store before they're dropped, so the agent can recall them later via a search tool:
struct MemoryAwareCompaction {
    memory: Arc<dyn MemoryStore>,
}

impl CompactionStrategy for MemoryAwareCompaction {
    fn compact(
        &self,
        messages: Vec<AgentMessage>,
        config: &ContextConfig,
    ) -> Vec<AgentMessage> {
        let compacted = compact_messages(messages.clone(), config);

        // Index what was dropped
        let dropped: Vec<_> = messages.iter()
            .filter(|m| !compacted.contains(m))
            .collect();
        if !dropped.is_empty() {
            self.memory.index(dropped);
        }

        compacted
    }
}
Semantic pointer compaction — Replace dropped messages with a marker so the agent knows context was lost:
struct SemanticPointerCompaction;

impl CompactionStrategy for SemanticPointerCompaction {
    fn compact(
        &self,
        messages: Vec<AgentMessage>,
        config: &ContextConfig,
    ) -> Vec<AgentMessage> {
        let compacted = compact_messages(messages.clone(), config);
        let dropped_count = messages.len() - compacted.len();

        if dropped_count == 0 {
            return compacted;
        }

        // Insert a marker after the first kept messages
        let mut result = compacted;
        let insert_at = config.compaction.keep_first_turns.min(result.len());
        result.insert(insert_at, AgentMessage::Extension(
            ExtensionMessage::new("compaction_marker", serde_json::json!({
                "dropped": dropped_count,
                "note": format!("{} earlier messages were compacted", dropped_count),
            }))
        ));
        result
    }
}
Priority-preserving compaction — Never drop messages containing important keywords:
struct PriorityPreservingCompaction {
    preserve_keywords: Vec<String>,
}

impl CompactionStrategy for PriorityPreservingCompaction {
    fn compact(
        &self,
        messages: Vec<AgentMessage>,
        config: &ContextConfig,
    ) -> Vec<AgentMessage> {
        // is_priority() is a helper you define on this struct, for example
        // a match of preserve_keywords against the message text.
        let (priority, normal): (Vec<_>, Vec<_>) = messages.into_iter()
            .partition(|m| self.is_priority(m));

        let mut compacted = compact_messages(normal, config);

        // Re-insert priority messages: they are never dropped.
        // They are appended at the end, so the original ordering is not preserved.
        for msg in priority {
            compacted.push(msg);
        }
        compacted
    }
}
Evaluational Parallelism
agent_loop_parallel runs the same prompt through multiple AgentLoopConfigs concurrently, evaluates the results with a pluggable EvaluationStrategy, and returns the winning branch. This is useful for multi-model comparison, A/B prompt testing, and selecting the best response among different reasoning approaches.
use phi_core::{agent_loop_parallel, PickFirstEvaluation, AgentContext, AgentLoopConfig};
use std::sync::Arc;

let result = agent_loop_parallel(
    prompts,
    base_context,            // cloned per branch; Arc tools shared
    vec![config_a, config_b],
    Arc::new(PickFirstEvaluation),
    tx,
    cancel,
).await;

// result.selected_context feeds directly into agent_loop_continue()
// result.selected_messages is the winning branch's output
See Evaluational Parallelism for the full guide including built-in strategies, the LLM judge, and session continuity.
Evaluational Parallelism
Evaluational parallelism runs the same prompt through multiple AgentLoopConfigs
concurrently, evaluates the results with a pluggable strategy, and delivers the single
best outcome. This lets you compare models, prompt variants, or reasoning settings in
one call — then continue the session normally with the winner.
Overview
┌─ Config A ─► Branch A ─► response A ─┐
prompt ──────┤ ├─► Evaluate ─► selected response
└─ Config B ─► Branch B ─► response B ─┘
Every branch receives an identical copy of the base context (message history, tools) and
the same prompt. Branches run concurrently. After all branches finish, the
EvaluationStrategy picks the winner and returns its context and messages.
When to use evaluational parallelism vs. parallel sub-agents
| | Evaluational parallelism | Parallel sub-agents |
|---|---|---|
| Task structure | Same task, different configs | Different subtasks |
| Context shared | Yes (cloned base context) | No (isolated child contexts) |
| Result | One selected outcome | All results merged |
| Typical use | Multi-model comparison, A/B prompts | Divide-and-conquer work |
Entry point
pub async fn agent_loop_parallel(
    prompts: Vec<AgentMessage>,
    base_context: AgentContext,          // cloned once per config
    configs: Vec<AgentLoopConfig>,       // one per branch
    strategy: Arc<dyn EvaluationStrategy>,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> ParallelLoopResult
base_context is cloned once per config entry — tools are Arc-shared (zero copy);
the message history is deep-cloned so branches start from identical state but diverge
independently.
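The cloning behaviour described here follows from how `Clone` works for `Arc` and `Vec`. The sketch below uses a hypothetical `BranchContext` with `String` stand-ins for messages and tools; it is not phi-core's `AgentContext`, only an illustration of shared tools versus independent message histories.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a context: owned messages, Arc-shared tools.
#[derive(Clone)]
struct BranchContext {
    messages: Vec<String>,
    tools: Vec<Arc<String>>,
}

/// Produces one context per branch. Cloning the Vec of messages copies every
/// message; cloning an Arc only bumps its reference count.
fn fork(base: &BranchContext, branches: usize) -> Vec<BranchContext> {
    (0..branches).map(|_| base.clone()).collect()
}
```

After forking, pushing a message onto one branch leaves the others untouched, while every branch still points at the same tool allocation.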
Minimal example
use phi_core::{agent_loop_parallel, PickFirstEvaluation, AgentContext, AgentLoopConfig};
use phi_core::provider::ModelConfig;
use std::sync::Arc;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

let config_a = AgentLoopConfig {
    model_config: ModelConfig::anthropic("claude-opus-4-6", "my-key", "claude-opus-4-6"),
    ..AgentLoopConfig::default()
};
let config_b = AgentLoopConfig {
    model_config: ModelConfig::anthropic("claude-haiku-4-5", "my-key", "claude-haiku-4-5"),
    ..AgentLoopConfig::default()
};

let (tx, mut rx) = mpsc::unbounded_channel();

let result = agent_loop_parallel(
    vec![AgentMessage::Llm(Message::user("Explain quantum entanglement."))],
    AgentContext {
        system_prompt: "Be concise.".into(),
        ..Default::default()
    },
    vec![config_a, config_b],
    Arc::new(PickFirstEvaluation), // or any EvaluationStrategy
    tx,
    CancellationToken::new(),
)
.await;

println!("Selected branch: {}", result.selected_index);

// Continue the session with the winning context
// agent_loop_continue(&mut result.selected_context, &next_config, tx, cancel).await;
ParallelLoopResult
pub struct ParallelLoopResult {
    pub selected_context: AgentContext,         // winning branch's full context
    pub selected_messages: Vec<AgentMessage>,   // messages produced by the winner
    pub selected_index: usize,                  // 0-based index into original configs
    pub all_outcomes: Vec<ParallelLoopOutcome>, // remaining (non-selected) outcomes
    pub total_usage: Usage,                     // all branch usages + evaluation usage
}
Feed selected_context directly into agent_loop_continue() to resume the session
normally — parallel execution is a single-loop operation, not a special session mode.
Built-in strategies
TransparentEvaluation
Single-branch pass-through. Panics if more than one config is provided.
Use this when you want the parallel plumbing (events, ParallelLoopResult) for a
single config — zero evaluation overhead.
Arc::new(TransparentEvaluation)
PickFirstEvaluation
Always selects index 0 regardless of content.
Deterministic, zero-cost. Useful for testing and debugging multi-branch setups where you only care about the first config's output.
Arc::new(PickFirstEvaluation)
TokenEfficientEvaluation
Selects the branch with the lowest total token usage.
Prefer when cost or latency matters more than response depth. The model that solved the task most concisely wins.
Arc::new(TokenEfficientEvaluation)
ElaborateEvaluation
Selects the branch with the highest total token usage.
Prefer when depth and thoroughness are the priority. The most verbose response wins — useful when you want the most comprehensive analysis.
Arc::new(ElaborateEvaluation)
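Both token-based strategies reduce to picking an index by total usage. A minimal sketch, assuming each branch's usage has already been summed into one number (`select_by_tokens` is a hypothetical helper, and how phi-core breaks ties is not stated here):

```rust
/// Returns the index of the branch with the lowest (or highest) total token
/// count, or None when there are no branches.
/// Tie-breaking in this sketch: the first minimum or the last maximum wins,
/// which is simply how the standard iterator adaptors behave.
fn select_by_tokens(total_tokens: &[u64], prefer_lowest: bool) -> Option<usize> {
    let indexed = total_tokens.iter().enumerate();
    if prefer_lowest {
        indexed.min_by_key(|&(_, &tokens)| tokens).map(|(i, _)| i)
    } else {
        indexed.max_by_key(|&(_, &tokens)| tokens).map(|(i, _)| i)
    }
}
```

With branch totals of 120, 80, and 300 tokens, the token-efficient choice is index 1 and the elaborate choice is index 2.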
LlmJudgeEvaluation
Uses a separate LLM call to evaluate which branch produced the best response.
use phi_core::LlmJudgeEvaluation;

Arc::new(LlmJudgeEvaluation {
    judge_config: AgentLoopConfig {
        model_config: ModelConfig::anthropic("claude-opus-4-6", "my-key", "claude-opus-4-6"),
        context_config: Some(ContextConfig {
            max_context_tokens: 100_000,
            ..Default::default()
        }),
        ..AgentLoopConfig::default()
    },
    system_prompt: None, // use built-in judge prompt
})
agent_loop_continue mode
When prompts is empty, agent_loop_parallel routes each branch to
agent_loop_continue instead of agent_loop. This lets you run parallel evaluation
from an existing conversation context — the user query is already the last message in
base_context.
// The user query is the last message in context (no new prompts to add).
let result = agent_loop_parallel(
    vec![],          // empty → agent_loop_continue mode
    base_context,    // must be non-empty and not end on an assistant message
    configs,
    strategy,
    tx,
    cancel,
)
.await;
Same preconditions as agent_loop_continue apply: base_context.messages must be
non-empty and must not end on an assistant message.
original_context_len on ParallelLoopOutcome
Each outcome carries original_context_len: usize — the number of messages in the
cloned context at the moment the branch was dispatched:
pub struct ParallelLoopOutcome {
    // ...
    pub original_context_len: usize,
}
context.messages[..original_context_len] is the shared base context all branches
started from. Messages at [original_context_len..] are new messages produced by
that branch.
Evaluation strategies use this field to extract the original user query and prior
conversation history without separate bookkeeping, regardless of whether
agent_loop or agent_loop_continue mode was used.
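In slice terms, the field is just a split point. A minimal sketch with `String` stand-ins for messages (`split_branch` is a hypothetical helper, not a phi-core function):

```rust
/// Splits a branch's messages into (shared base context, messages produced
/// by this branch). The cut is clamped so an out-of-range length cannot panic.
fn split_branch(messages: &[String], original_context_len: usize) -> (&[String], &[String]) {
    let cut = original_context_len.min(messages.len());
    messages.split_at(cut)
}
```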
LLM Judge — prompt construction and comprehension criteria
What the judge sees
The judge receives only clean, relevant content:
- Prior conversation context (new): the conversation history before the user query, formatted as a human-readable transcript. Tool call arguments and images are stripped; only Content::Text survives. Omitted from the prompt when empty.
- Original query: text extracted from user messages in prompts (agent_loop mode), or from the last Message::User in context.messages[..original_context_len] (agent_loop_continue mode). Tool calls, images, and thinking are stripped.
- Per-branch response: the text of the last Message::Assistant in each branch's new_messages. Tool calls, tool results, and intermediate multi-turn exchanges are stripped entirely: the judge evaluates outcomes, not reasoning traces.
Example judge prompt (with prior context):
Prior conversation context:
User: What is quantum mechanics?
Assistant: Quantum mechanics is the branch of physics that...
Original query:
Can you explain quantum entanglement in simple terms?
Response 1:
Quantum entanglement is when two particles share a quantum state...
Response 2:
Think of two magic dice...
Which response is best? Reply with ONLY the response number (e.g., "1" or "2").
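A prompt with this layout could be assembled as follows. This is a sketch that reproduces the example's section labels; `build_judge_prompt` is a hypothetical function, and the exact whitespace phi-core emits is an assumption.

```rust
/// Builds a judge prompt in the layout shown above. The prior-context
/// section is omitted when there is no prior conversation.
fn build_judge_prompt(prior_context: &str, query: &str, responses: &[&str]) -> String {
    let mut prompt = String::new();
    if !prior_context.trim().is_empty() {
        prompt.push_str(&format!(
            "Prior conversation context:\n{}\n\n",
            prior_context.trim()
        ));
    }
    prompt.push_str(&format!("Original query:\n{}\n\n", query.trim()));
    // Responses are numbered from 1 so the judge's reply maps to index - 1.
    for (i, response) in responses.iter().enumerate() {
        prompt.push_str(&format!("Response {}:\n{}\n\n", i + 1, response.trim()));
    }
    prompt.push_str(
        "Which response is best? Reply with ONLY the response number (e.g., \"1\" or \"2\").",
    );
    prompt
}
```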
Query extraction in agent_loop_continue mode
When prompts is empty, the judge cannot read the query directly from the prompts
slice. It instead locates the last Message::User in
outcome.context.messages[..original_context_len] and extracts its text content.
Everything before that message becomes the prior conversation context.
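The lookup described here amounts to a reverse search over the base slice. A minimal sketch with a two-variant stand-in for messages (`Msg` and `extract_query` are hypothetical; real messages carry richer content than plain strings):

```rust
/// Hypothetical stand-in for conversation messages.
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    User(String),
    Assistant(String),
}

/// Finds the last user message within the shared base context and returns
/// (prior conversation, query text). Returns None if no user message exists.
fn extract_query(messages: &[Msg], original_context_len: usize) -> Option<(&[Msg], &str)> {
    let base = &messages[..original_context_len.min(messages.len())];
    let pos = base.iter().rposition(|m| matches!(m, Msg::User(_)))?;
    match &base[pos] {
        Msg::User(text) => Some((&base[..pos], text.as_str())),
        Msg::Assistant(_) => None, // unreachable given the rposition predicate
    }
}
```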
Judge's comprehension criteria
The judge can only make a fair comparison when it sees all N branch final responses simultaneously alongside the prior context and query. For this to work, the combined content must fit within the judge model's context window.
This condition — all content fitting in the judge's context at once — is called the judge's comprehension criteria.
The budget is derived automatically from judge_config.context_config.max_context_tokens
(if set). About 20% of the budget is reserved for the system prompt, query framing, and
overhead; the remaining 80% is allocated for prior context + branch responses combined.
When no context_config is set on judge_config, no compaction is applied (all content
is passed through as-is).
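The budget arithmetic can be sketched as below. The function names are hypothetical, the 4-characters-per-token estimate is the one quoted later in this section, and treating "about 20%" as exactly one fifth is an assumption.

```rust
/// Rough token estimate: about 4 characters per token, rounded up.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Content budget: the context window minus a 20% reserve for the system
/// prompt, query framing, and overhead. None means "no limit configured".
fn content_budget(max_context_tokens: Option<usize>) -> Option<usize> {
    max_context_tokens.map(|max| max - max / 5)
}

/// True when the prior context plus all branch responses fit in the budget,
/// or when no budget is configured (content is passed through as-is).
fn meets_comprehension_criteria(
    max_context_tokens: Option<usize>,
    prior_context: &str,
    responses: &[&str],
) -> bool {
    match content_budget(max_context_tokens) {
        None => true,
        Some(budget) => {
            let used = estimate_tokens(prior_context)
                + responses.iter().map(|r| estimate_tokens(r)).sum::<usize>();
            used <= budget
        }
    }
}
```

For a 100,000-token judge window this leaves 80,000 tokens for prior context and branch responses combined.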
2-iteration compaction strategy
When the combined content exceeds the budget, compaction is applied in two iterations:
Iteration 1 — compact prior context only, outputs intact
The prior conversation context is compacted through 3 progressive tiers while branch outputs are preserved verbatim:
- Tier 1 (tail truncation): keep only the last 80 lines of the context transcript.
- Tier 2 (paragraph summary): keep only the first paragraph and last paragraph (separated by ...).
- Tier 3 (hard char limit): truncate to a per-response char limit derived from the remaining budget, minimum 200 chars. The formula is max(200, (token_budget * 4) / n), where n is the number of texts being compacted and the * 4 factor converts from tokens to chars (estimating 1 token ≈ 4 chars).
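The three tiers can be approximated with plain string operations. The functions below are a hypothetical sketch, not the library's `compact_tier1/2/3`; in particular, splitting paragraphs on blank lines and joining with a `...` line are assumptions about details the text leaves open.

```rust
/// Tier 1: keep only the last `max_lines` lines (80 in the text above).
fn tier1_tail(text: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    let start = lines.len().saturating_sub(max_lines);
    lines[start..].join("\n")
}

/// Tier 2: keep the first and last paragraph, separated by "...".
/// Paragraphs are assumed to be separated by blank lines.
fn tier2_first_last(text: &str) -> String {
    let paragraphs: Vec<&str> = text
        .split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .collect();
    match paragraphs.as_slice() {
        [] => String::new(),
        [only] => only.to_string(),
        [first, .., last] => format!("{first}\n...\n{last}"),
    }
}

/// Tier 3 limit: max(200, (token_budget * 4) / n) characters per text.
fn tier3_char_limit(token_budget: usize, n: usize) -> usize {
    (token_budget * 4 / n.max(1)).max(200)
}

/// Tier 3: hard truncation to `limit` characters (not bytes).
fn tier3_truncate(text: &str, limit: usize) -> String {
    text.chars().take(limit).collect()
}
```

For example, a 1,000-token budget shared by 4 texts gives each text 1,000 characters, while a 100-token budget shared by 4 texts falls back to the 200-character floor.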
After each tier, the combined token estimate is re-checked. If the budget is satisfied, the judge proceeds with the compacted context and intact outputs.
Iteration 2 — compact both context and outputs independently
If iteration 1 cannot satisfy the budget even at tier 3, the context stays at its most-
compacted (tier-3) form and branch outputs are now compacted independently through the
same tiered compaction pipeline (legacy compact_messages(); see compaction for the modern CompactionBlock system).
prior context (tier-3) + outputs (tier-1 → 2 → 3) → check budget after each tier
If the criteria still cannot be satisfied after iteration 2, a ProgressMessage warning
is emitted to tx and the judge proceeds best-effort.
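The control flow of the two iterations can be sketched as below. This is a simplified model under several assumptions: tiers are passed in as closures, each tier is applied to the original text rather than to the previous tier's result, and tokens are estimated as chars / 4. `fit_for_judge` is a hypothetical name:

```rust
/// chars/4 token estimate.
fn estimate_tokens(text: &str) -> usize {
    text.chars().count() / 4
}

fn combined_tokens(context: &str, outputs: &[String]) -> usize {
    estimate_tokens(context) + outputs.iter().map(|o| estimate_tokens(o)).sum::<usize>()
}

/// Returns (context, outputs, satisfied). `satisfied == false` is the
/// best-effort case in which a warning would be emitted.
pub fn fit_for_judge(
    context: &str,
    outputs: &[String],
    budget: usize,
    tiers: &[&dyn Fn(&str) -> String],
) -> (String, Vec<String>, bool) {
    let mut ctx = context.to_string();
    let mut outs = outputs.to_vec();
    if combined_tokens(&ctx, &outs) <= budget {
        return (ctx, outs, true);
    }
    // Iteration 1: compact the prior context only; outputs stay verbatim.
    for tier in tiers {
        ctx = tier(context);
        if combined_tokens(&ctx, &outs) <= budget {
            return (ctx, outs, true);
        }
    }
    // Iteration 2: context stays at its last tier; compact each output independently.
    for tier in tiers {
        outs = outputs.iter().map(|o| tier(o.as_str())).collect();
        if combined_tokens(&ctx, &outs) <= budget {
            return (ctx, outs, true);
        }
    }
    (ctx, outs, false)
}
```

The budget is re-checked after every tier, so the function returns as soon as the content fits and never compacts more than necessary.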
Why context is compacted first
Iteration 1 biases the judge towards seeing the complete, uncompacted branch outputs — the actual decision material. Prior conversation history is ancillary; trimming it first preserves the most important information for fair comparison.
Original responses are always preserved
Compaction only affects what the judge reads. The selected_messages field in
ParallelLoopResult always contains the original, uncompacted winning branch response.
Setting the judge's context limit
Set judge_config.context_config.max_context_tokens to the judge model's context window
size (in tokens). This enables the comprehension-criteria check:
```rust
context_config: Some(ContextConfig {
    max_context_tokens: 200_000, // Claude Opus 4.6 context window
    ..Default::default()
}),
```
Different judge models have different context windows — the limit is co-located with the model config that actually has the constraint.
Design decisions
original_context_len on outcome (not a separate parameter)
The EvaluationStrategy trait receives only outcomes and prompts. Embedding
original_context_len in each outcome avoids changing the trait signature and keeps all
outcome data co-located. Since all branches share the same base context, the value is
identical across outcomes — using outcomes[0] is idiomatic.
Same tier functions for context and output compaction
compact_tier1/2/3 were designed for document text but work equally well on a formatted
conversation transcript. Reusing the same primitives minimises code surface and keeps
compaction behaviour consistent.
Budget allocation: context gets priority (iteration 1)
Iteration 1 compacts only the prior context and keeps outputs intact. This preserves the complete branch responses (the actual decision material) while trimming ancillary history first. Outputs are compacted only in iteration 2, when compacting the context alone cannot satisfy the budget.
Session identity and loop IDs
All branches share the same session_id for traceability. Each branch gets a distinct
loop_id following the format:
{session_id}.{config_segment}.{N}
where config_segment is derived from config.config_id (if set) or auto-derived as
{provider}.{model-slug}[.thinking].
Example with two configs:
ses_abc123.anthropic.claude-opus-4-6.1
ses_abc123.anthropic.claude-haiku-4-5.2
The judge loop (if used) also runs in the same session:
ses_abc123.anthropic.claude-opus-4-6.3 ← judge's loop
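The ID format can be sketched as two small string helpers. The function names and parameters are hypothetical; only the resulting format comes from the description above:

```rust
/// Builds the config segment: the explicit config_id if set, otherwise
/// "{provider}.{model-slug}" with an optional ".thinking" suffix.
pub fn config_segment(
    config_id: Option<&str>,
    provider: &str,
    model_slug: &str,
    thinking: bool,
) -> String {
    match config_id {
        Some(id) => id.to_string(),
        None if thinking => format!("{provider}.{model_slug}.thinking"),
        None => format!("{provider}.{model_slug}"),
    }
}

/// Builds "{session_id}.{config_segment}.{N}".
pub fn loop_id(session_id: &str, segment: &str, n: usize) -> String {
    format!("{session_id}.{segment}.{n}")
}
```

Calling `loop_id("ses_abc123", &config_segment(None, "anthropic", "claude-opus-4-6", false), 1)` reproduces the first example ID above.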
Observability
Two events bracket the entire parallel execution:
```rust
AgentEvent::ParallelLoopStart {
    session_id: String,
    loop_ids: Vec<String>, // one per branch, in config order
    timestamp: DateTime<Utc>,
}

AgentEvent::ParallelLoopEnd {
    session_id: String,
    selected_loop_id: String,
    selected_config_index: usize,
    evaluation_usage: Usage, // judge LLM usage (zero if no judge)
    timestamp: DateTime<Utc>,
}
```
Events from all branches are interleaved in tx. Demultiplex by loop_id from each
branch's AgentStart event.
Session continuity
agent_loop_parallel is a single-loop operation. After it returns, call
agent_loop_continue on result.selected_context to continue the session:
```rust
// `mut` is required: agent_loop_continue borrows the selected context mutably.
let mut result = agent_loop_parallel(prompts, base_ctx, configs, strategy, tx, cancel).await;

// The session continues normally with the winning branch's context
let follow_up = agent_loop_continue(
    &mut result.selected_context,
    &next_config,
    tx2,
    cancel2,
)
.await;
```
Complete example — multi-model comparison with LLM judge
```rust
use phi_core::{
    agent_loop_parallel, agent_loop_continue,
    AgentContext, AgentLoopConfig, AgentMessage, AgentEvent, Message,
};
use phi_core::context::ContextConfig;
use phi_core::LlmJudgeEvaluation;
use phi_core::provider::ModelConfig;
use std::sync::Arc;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

// API_KEY is not defined in this snippet; supply your own Anthropic API key.

#[tokio::main]
async fn main() {
    // Branch A: fast, cost-efficient model
    let config_a = AgentLoopConfig {
        model_config: ModelConfig::anthropic("claude-haiku-4-5", API_KEY, "claude-haiku-4-5"),
        ..AgentLoopConfig::default()
    };

    // Branch B: powerful model
    let config_b = AgentLoopConfig {
        model_config: ModelConfig::anthropic("claude-opus-4-6", API_KEY, "claude-opus-4-6"),
        ..AgentLoopConfig::default()
    };

    // Judge: evaluates which response is better
    let judge_config = AgentLoopConfig {
        model_config: ModelConfig::anthropic("claude-opus-4-6", API_KEY, "claude-opus-4-6"),
        context_config: Some(ContextConfig {
            max_context_tokens: 200_000,
            ..Default::default()
        }),
        ..AgentLoopConfig::default()
    };

    let (tx, mut rx) = mpsc::unbounded_channel::<AgentEvent>();
    let cancel = CancellationToken::new();

    let result = agent_loop_parallel(
        vec![AgentMessage::Llm(Message::user(
            "What is the most important physics discovery of the 20th century?",
        ))],
        AgentContext {
            system_prompt: "You are a knowledgeable assistant.".into(),
            ..Default::default()
        },
        vec![config_a, config_b],
        Arc::new(LlmJudgeEvaluation { judge_config, system_prompt: None }),
        tx,
        cancel,
    )
    .await;

    println!("Selected branch: {}", result.selected_index);
    println!("Total tokens used: {}", result.total_usage.total_tokens);

    // Collect and display the winning response
    for msg in &result.selected_messages {
        if let phi_core::AgentMessage::Llm(phi_core::Message::Assistant { content, .. }) = msg {
            for block in content {
                if let phi_core::Content::Text { text } = block {
                    println!("Response: {}", text);
                }
            }
        }
    }

    // Continue the session with the winner (requires `let mut result` above)
    // let (tx2, _rx2) = mpsc::unbounded_channel();
    // agent_loop_continue(&mut result.selected_context, &next_config, tx2, cancel2).await;
}
```
Custom evaluation strategies
Implement EvaluationStrategy for custom evaluation logic:
```rust
use phi_core::{AgentEvent, AgentMessage, ParallelLoopOutcome, Usage};
use phi_core::{EvaluationDecision, EvaluationStrategy};
use async_trait::async_trait;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

struct LongestResponseEvaluation;

#[async_trait]
impl EvaluationStrategy for LongestResponseEvaluation {
    async fn evaluate(
        &self,
        _prompts: &[AgentMessage],
        outcomes: &[ParallelLoopOutcome],
        _tx: &mpsc::UnboundedSender<AgentEvent>,
        _cancel: CancellationToken,
    ) -> (EvaluationDecision, Usage) {
        let idx = outcomes
            .iter()
            .enumerate()
            .max_by_key(|(_, o)| {
                // Sum all text content lengths across new messages
                o.new_messages
                    .iter()
                    .filter_map(|m| m.as_llm())
                    .flat_map(|msg| {
                        if let phi_core::Message::Assistant { content, .. } = msg {
                            content
                                .iter()
                                .filter_map(|c| {
                                    if let phi_core::Content::Text { text } = c {
                                        Some(text.len())
                                    } else {
                                        None
                                    }
                                })
                                .collect::<Vec<_>>()
                        } else {
                            vec![]
                        }
                    })
                    .sum::<usize>()
            })
            .map(|(i, _)| i)
            .unwrap_or(0);

        (EvaluationDecision::Select(idx), Usage::default())
    }
}
```
Messages & Events
Message Types
Message
The core LLM message type, tagged by role:
```rust
pub enum Message {
    User {
        content: Vec<Content>,
        timestamp: u64,
    },
    Assistant {
        content: Vec<Content>,
        stop_reason: StopReason,
        model: String,
        provider: String,
        usage: Usage,
        timestamp: u64,
        error_message: Option<String>,
    },
    ToolResult {
        tool_call_id: String,
        tool_name: String,
        content: Vec<Content>,
        is_error: bool,
        timestamp: u64,
        child_loop_id: Option<String>, // set by sub-agent tools
    },
}
```
Create user messages easily:
```rust
let msg = Message::user("Hello, world!");
```
AgentMessage
Wraps Message with support for extension messages (UI-only, notifications, etc.):
```rust
pub enum AgentMessage {
    Llm(LlmMessage),
    Extension(ExtensionMessage),
}

pub struct LlmMessage {
    pub message: Message,
    /// Which turn produced this message. `None` for messages that predate
    /// turn tracking or are created outside the agent loop.
    pub turn_id: Option<TurnId>,
}

pub struct ExtensionMessage {
    pub role: String,
    pub kind: String,
    pub data: serde_json::Value,
}
```
Create extension messages with the convenience constructor:
```rust
let ext = ExtensionMessage::new("status_update", serde_json::json!({"status": "running"}));
let msg = AgentMessage::Extension(ext);
```
The kind field categorizes the extension (e.g., "status_update", "ui_event", "notification"). Use as_llm() to extract the Message if it's an LLM message. LlmMessage wraps a Message with an optional TurnId { loop_id, turn_index } for compaction tracking — this allows the compaction system to identify which turn produced each message. The default convert_to_llm function filters out Extension messages before sending to the provider.
All core message types implement Serialize, Deserialize, Clone, and PartialEq, enabling state persistence and test assertions.
Content
Each message contains Vec<Content>:
```rust
pub enum Content {
    Text { text: String },
    Image { data: String, mime_type: String },
    Thinking { thinking: String, signature: Option<String> },
    ToolCall { id: String, name: String, arguments: serde_json::Value },
}
```
An assistant message can contain multiple content blocks — e.g., thinking + text + tool calls.
The signature field on Content::Thinking is a cryptographic integrity token issued by the LLM provider (Anthropic calls it signature, OpenAI calls it encrypted_content, Gemini calls it thought_signature). It must be echoed back unmodified in multi-turn conversations — tampering or omitting it causes the provider to reject the request. It is None on providers that don't support extended thinking or on the first-turn generation.
StopReason
```rust
pub enum StopReason {
    Stop,             // Natural completion
    Length,           // Hit max tokens
    ToolUse,          // Wants to call tools
    Error,            // Provider error
    Aborted,          // Cancelled by user
    MaxTurns,         // Reached maximum allowed turns
    UserStop,         // Explicit user stop command
    Handoff,          // Handing off to a human operator
    GuardRail,        // Stopped by content moderation / safety filter
    ContextCompacted, // Context was compacted to fit within limits
    Paused,           // Paused waiting for external input
}
```
Usage
Token usage from the provider:
```rust
pub struct Usage {
    pub input: u64,
    pub output: u64,
    pub cache_read: u64,
    pub cache_write: u64,
    pub total_tokens: u64,
}
```
AgentEvent
Events emitted during the agent loop for real-time UI updates:
| Event | When |
|---|---|
| AgentStart { agent_id, session_id, loop_id, parent_loop_id, continuation_kind, config_snapshot, timestamp } | Loop begins. loop_id is "{session_id}.{config_id}.{N}". parent_loop_id is Some for continuations and sub-agents. continuation_kind is a ContinuationKind (Initial for first loops, Default/Rerun/Branch/Compaction for continuations). config_snapshot is Option<LoopConfigSnapshot> capturing model/provider settings for the loop. |
| AgentEnd { messages, timestamp, rejection } | Loop finishes; rejection is Some when an InputFilter blocked input |
| TurnStart { turn_index, timestamp, triggered_by } | New LLM call starting; turn_index is 0-based, triggered_by is one of User, SubAgent, Continuation, Branch |
| TurnEnd { message, timestamp, tool_results } | LLM call + tool execution complete |
| MessageStart { message } | A message is available |
| MessageUpdate { message, delta } | Streaming delta arrived |
| MessageEnd { message } | Message finalized |
| ToolExecutionStart { tool_call_id, tool_name, args } | Tool about to run |
| ToolExecutionUpdate { tool_call_id, tool_name, partial_result } | Tool progress |
| ToolExecutionEnd { tool_call_id, tool_name, result, is_error, child_loop_id } | Tool finished. child_loop_id is Some when the tool was a sub-agent; it identifies the child loop that ran. |
| ProgressMessage { tool_call_id, tool_name, text } | User-facing progress text from a tool |
| InputRejected { reason } | Input filter rejected the user's message |
StreamDelta
Deltas within MessageUpdate:
```rust
pub enum StreamDelta {
    Text { delta: String },
    Thinking { delta: String },
    ToolCallDelta { delta: String },
}
```
Agent State
The Agent struct provides access to its current state:
```rust
// Check if the agent is currently streaming a response
if agent.is_streaming() {
    // Use steer() or follow_up() instead of prompt()
    agent.steer(AgentMessage::Llm(Message::user("New instruction")));
}

// Access the full message history
let messages: &[AgentMessage] = agent.messages();

// Check the last message
if let Some(last) = messages.last() {
    println!("Last message role: {}", last.role());
}
```
The is_streaming() flag is true from the moment prompt() or continue_loop() is called until the loop completes. While streaming, calling prompt() will panic; use steer() or follow_up() instead.
Tools
The AgentTool Trait
Every tool implements AgentTool:
```rust
#[async_trait]
pub trait AgentTool: Send + Sync {
    fn name(&self) -> &str;
    fn label(&self) -> &str;
    fn description(&self) -> &str;
    fn parameters_schema(&self) -> serde_json::Value;

    async fn execute(
        &self,
        params: serde_json::Value,
        ctx: ToolContext,
    ) -> Result<ToolResult, ToolError>;
}
```
| Method | Purpose |
|---|---|
| name() | Unique ID sent to LLM (e.g., "bash") |
| label() | Human-readable name for UI (e.g., "Run Command") |
| description() | Tells the LLM what the tool does |
| parameters_schema() | JSON Schema for the tool's parameters |
| execute() | Runs the tool, returns ToolResult or ToolError. Receives a ToolContext with cancellation, update, and progress callbacks. |
ToolContext
All execution context is bundled into a single struct, making the trait easier to extend in the future:
```rust
pub struct ToolContext {
    pub tool_call_id: String,
    pub tool_name: String,
    pub cancel: CancellationToken,
    pub on_update: Option<ToolUpdateFn>,
    pub on_progress: Option<ProgressFn>,
}
```
| Field | Purpose |
|---|---|
| tool_call_id | Unique ID for this tool call (for correlating events) |
| tool_name | Name of the tool being executed |
| cancel | Cancellation token; check ctx.cancel.is_cancelled() in long-running tools |
| on_update | Callback for streaming partial ToolResult updates to the UI. Carries structured data (ToolResult with content + details) and emits AgentEvent::ToolExecutionUpdate. Use when you need progress percentages, partial results, or structured metadata. |
| on_progress | Callback for lightweight text-only status messages. Takes a single String and emits AgentEvent::ProgressMessage. Use for simple human-readable status lines (e.g., "Compiling...", "Almost done..."). |
ToolContext implements Clone and Debug.
ToolResult
```rust
pub struct ToolResult {
    pub content: Vec<Content>,
    pub details: serde_json::Value,
}
```
The content is sent back to the LLM. The details field holds metadata (not sent to the LLM) for UI/logging.
ToolError
```rust
pub enum ToolError {
    Failed(String),
    NotFound(String),
    InvalidArgs(String),
    Cancelled,
}
```
Errors are converted to ToolResult with is_error: true and sent back to the LLM so it can recover.
Implementing a Custom Tool
```rust
use phi_core::types::*;
use async_trait::async_trait;

pub struct WeatherTool;

#[async_trait]
impl AgentTool for WeatherTool {
    fn name(&self) -> &str { "get_weather" }
    fn label(&self) -> &str { "Weather" }
    fn description(&self) -> &str { "Get current weather for a city." }

    fn parameters_schema(&self) -> serde_json::Value {
        serde_json::json!({
            "type": "object",
            "properties": {
                "city": { "type": "string", "description": "City name" }
            },
            "required": ["city"]
        })
    }

    async fn execute(
        &self,
        params: serde_json::Value,
        _ctx: ToolContext,
    ) -> Result<ToolResult, ToolError> {
        let city = params["city"].as_str()
            .ok_or(ToolError::InvalidArgs("missing city".into()))?;

        // Call weather API...

        Ok(ToolResult {
            content: vec![Content::Text {
                text: format!("Weather in {}: 72°F, sunny", city),
            }],
            details: serde_json::Value::Null,
        })
    }
}
```
Register custom tools alongside defaults:
```rust
use phi_core::tools::default_tools;

let mut tools = default_tools();
tools.push(Box::new(WeatherTool));

let agent = BasicAgent::new(model_config).with_tools(tools);
```
Error Handling
Return Err(ToolError) on failure, not Ok with error text. When a tool returns Err, the agent loop converts it to a Message::ToolResult with is_error: true and sends it to the LLM. The LLM sees the error and can self-correct — retry with different arguments, try a different approach, or explain the failure to the user.
```rust
async fn execute(&self, params: serde_json::Value, _ctx: ToolContext) -> Result<ToolResult, ToolError> {
    let path = params["path"].as_str()
        .ok_or(ToolError::InvalidArgs("missing 'path'".into()))?;

    let content = std::fs::read_to_string(path)
        .map_err(|e| ToolError::Failed(format!("Cannot read {}: {}", path, e)))?;

    Ok(ToolResult {
        content: vec![Content::Text { text: content }],
        details: serde_json::Value::Null,
    })
}
```
Exception: BashTool. The built-in BashTool returns Ok even on non-zero exit codes, with both stdout and stderr in the result. This is intentional — the LLM needs to see the actual error output (compilation errors, test failures, etc.) to diagnose and fix issues. Only truly exceptional failures (e.g., command not found, cancellation) return Err.
Tool Execution Flow
- LLM returns Content::ToolCall blocks in its response
- Agent loop emits ToolExecutionStart for each
- Tool's execute() is called with parsed arguments
- Result (or error) is wrapped in Message::ToolResult
- ToolExecutionEnd is emitted
- All tool results are added to context
- Loop continues with another LLM call
Streaming Tool Output
Long-running tools can stream progress updates to the UI via the on_update callback. Each call emits a ToolExecutionUpdate event. Partial results are for UI/logging only — they are not sent to the LLM. Only the final ToolResult returned from execute() becomes part of the conversation.
The ToolUpdateFn type
```rust
pub type ToolUpdateFn = Arc<dyn Fn(ToolResult) + Send + Sync>;
```
Basic usage
Call on_update whenever you have progress to report:
```rust
use phi_core::types::*;
use async_trait::async_trait;

struct DataProcessorTool;

#[async_trait]
impl AgentTool for DataProcessorTool {
    // ... name, label, description, parameters_schema ...

    async fn execute(
        &self,
        params: serde_json::Value,
        ctx: ToolContext,
    ) -> Result<ToolResult, ToolError> {
        // fetch_rows and process_row stand for your own application logic
        let rows = fetch_rows(&params)?;
        let total = rows.len();

        for (i, row) in rows.iter().enumerate() {
            // Check for cancellation
            if ctx.cancel.is_cancelled() {
                return Err(ToolError::Cancelled);
            }

            process_row(row);

            // Stream progress every 100 rows
            if i % 100 == 0 {
                if let Some(cb) = &ctx.on_update {
                    cb(ToolResult {
                        content: vec![Content::Text {
                            text: format!("Processed {}/{} rows", i, total),
                        }],
                        details: serde_json::json!({"progress": i as f64 / total as f64}),
                    });
                }
            }
        }

        Ok(ToolResult {
            content: vec![Content::Text {
                text: format!("Processed all {} rows", total),
            }],
            details: serde_json::Value::Null,
        })
    }
}
```
Consuming updates in your UI
Updates arrive as AgentEvent::ToolExecutionUpdate events on the same event stream as all other agent events:
```rust
while let Some(event) = rx.recv().await {
    match event {
        AgentEvent::ToolExecutionStart { tool_name, .. } => {
            println!("⏳ {} started", tool_name);
        }
        AgentEvent::ToolExecutionUpdate { tool_name, partial_result, .. } => {
            // Show progress in your UI
            if let Some(Content::Text { text }) = partial_result.content.first() {
                println!("  📊 {}: {}", tool_name, text);
            }
        }
        AgentEvent::ToolExecutionEnd { tool_name, is_error, .. } => {
            println!("{} {}", if is_error { "❌" } else { "✅" }, tool_name);
        }
        AgentEvent::ProgressMessage { tool_name, text, .. } => {
            println!("  💬 {}: {}", tool_name, text);
        }
        _ => {}
    }
}
```
Progress Messages
In addition to on_update (which streams partial ToolResult values), tools can emit lightweight text-only progress messages via ctx.on_progress. These appear as AgentEvent::ProgressMessage events:
```rust
async fn execute(&self, params: serde_json::Value, ctx: ToolContext) -> Result<ToolResult, ToolError> {
    if let Some(progress) = &ctx.on_progress {
        progress("Starting analysis...".into());
    }

    // ... do work ...

    if let Some(progress) = &ctx.on_progress {
        progress("Almost done...".into());
    }

    Ok(ToolResult { /* ... */ })
}
```
Use on_progress for simple status text. Use on_update when you need structured data (progress percentages, partial results).
Guidelines
- Call on_update as often as useful; there's no rate limit. The callback is synchronous and cheap.
- Always check ctx.on_update.is_some() before building the ToolResult. If None, the loop isn't interested in updates (e.g., testing).
- Use details for structured data: content is for human-readable text, while details can carry progress percentages, byte counts, etc.
- Don't rely on updates reaching the LLM; they won't. Only the final return value is added to context.
- Simple tools don't need it. If your tool completes in <1 second, just ignore ctx (name the parameter _ctx to suppress the unused-variable warning).
End-to-end example
Here's a complete example: a CLI agent with a deploy tool that streams progress. The human sees real-time output while the LLM only gets the final result.
```rust
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;
use phi_core::types::*;
use async_trait::async_trait;

/// A tool that deploys an app and streams each step.
struct DeployTool;

#[async_trait]
impl AgentTool for DeployTool {
    fn name(&self) -> &str { "deploy" }
    fn label(&self) -> &str { "Deploy App" }
    fn description(&self) -> &str { "Deploy the application to production." }

    fn parameters_schema(&self) -> serde_json::Value {
        serde_json::json!({
            "type": "object",
            "properties": {
                "env": { "type": "string", "description": "Target environment" }
            },
            "required": ["env"]
        })
    }

    async fn execute(
        &self,
        params: serde_json::Value,
        ctx: ToolContext,
    ) -> Result<ToolResult, ToolError> {
        let env = params["env"].as_str().unwrap_or("staging");
        let steps = ["Building image", "Running tests", "Pushing to registry", "Rolling out"];

        for (i, step) in steps.iter().enumerate() {
            if ctx.cancel.is_cancelled() {
                return Err(ToolError::Cancelled);
            }

            // Stream each step to the UI
            if let Some(cb) = &ctx.on_update {
                cb(ToolResult {
                    content: vec![Content::Text {
                        text: format!("[{}/{}] {}...", i + 1, steps.len(), step),
                    }],
                    details: serde_json::json!({
                        "step": i + 1,
                        "total": steps.len(),
                        "phase": step,
                    }),
                });
            }

            // Simulate work
            tokio::time::sleep(std::time::Duration::from_secs(2)).await;
        }

        // Only this final result is sent to the LLM
        Ok(ToolResult {
            content: vec![Content::Text {
                text: format!("Successfully deployed to {}", env),
            }],
            details: serde_json::json!({"env": env, "status": "success"}),
        })
    }
}

#[tokio::main]
async fn main() {
    let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

    let mut agent = BasicAgent::new(ModelConfig::anthropic(
        "claude-sonnet-4-20250514",
        "Claude Sonnet 4",
        &api_key,
    ))
    .with_system_prompt("You are a deployment assistant.")
    .with_tools(vec![Box::new(DeployTool)]);

    let mut rx = agent.prompt("Deploy to production").await;

    while let Some(event) = rx.recv().await {
        match event {
            // LLM text streaming
            AgentEvent::MessageUpdate {
                delta: StreamDelta::Text { delta }, ..
            } => print!("{}", delta),

            // Tool progress streaming
            AgentEvent::ToolExecutionStart { tool_name, .. } => {
                println!("\n🚀 Starting {}...", tool_name);
            }
            AgentEvent::ToolExecutionUpdate { partial_result, .. } => {
                if let Some(Content::Text { text }) = partial_result.content.first() {
                    println!("   {}", text);
                }
            }
            AgentEvent::ToolExecutionEnd { tool_name, is_error, .. } => {
                if is_error {
                    println!("   ❌ {} failed", tool_name);
                } else {
                    println!("   ✅ {} complete", tool_name);
                }
            }
            AgentEvent::ProgressMessage { text, .. } => {
                println!("   💬 {}", text);
            }

            AgentEvent::AgentEnd { .. } => break,
            _ => {}
        }
    }
}
```
Running this produces:
🚀 Starting deploy...
[1/4] Building image...
[2/4] Running tests...
[3/4] Pushing to registry...
[4/4] Rolling out...
✅ deploy complete
Successfully deployed to production. The deployment completed all 4 stages.
The human sees each step as it happens. The LLM only sees "Successfully deployed to production" and can continue the conversation from there.
How agents benefit
When an AI agent (like a coding assistant) uses phi-core, streaming tool output helps in two ways:
- Human oversight: The human watching the agent work sees real-time progress instead of waiting for a tool to finish. A bash command running cargo build can stream compiler output as it happens, so the human can interrupt early if something is wrong.
- Agent UIs: Tools like web dashboards, IDE extensions, or chat interfaces can render live progress bars, log tails, or status indicators. The details field in ToolResult carries structured data (progress percentage, byte counts, etc.) that UIs can render however they want.
The LLM itself doesn't see updates — it works with final results only. This is intentional: partial output would waste context tokens and confuse the model. The streaming is purely a human-facing feature.
Execution Strategies
When the LLM returns multiple tool calls in a single response (e.g., "read file A, read file B, run bash C"), ToolExecutionStrategy controls how they run:
The enum is defined with #[derive(Default)] and Parallel carries the #[default] attribute:
```rust
pub enum ToolExecutionStrategy {
    Sequential,
    #[default]
    Parallel,
    Batched { size: usize },
}
```
| Strategy | Behavior |
|---|---|
| Sequential | One at a time. Steering checked between each tool. Use for debugging or tools with shared mutable state. |
| Parallel (#[default]) | All tool calls run concurrently via futures::join_all. Steering checked after all complete. Best latency for independent tools. |
| Batched { size: usize } | Run in groups of size. Steering checked between batches. Balances speed with human-in-the-loop control. |
Configuration
```rust
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;
use phi_core::types::ToolExecutionStrategy;

// Default: parallel (fastest)
let agent = BasicAgent::new(model_config.clone());

// Sequential (debug / shared state)
let agent = BasicAgent::new(model_config.clone())
    .with_tool_execution(ToolExecutionStrategy::Sequential);

// Batched: 3 at a time
let agent = BasicAgent::new(model_config.clone())
    .with_tool_execution(ToolExecutionStrategy::Batched { size: 3 });
```
When to use each
- Parallel (default): Most tool calls are independent, such as file reads, searches, and API calls. Running them concurrently can cut latency dramatically: three tools that each take 50 ms finish in about 50 ms instead of about 150 ms.
- Sequential: When tools have side effects that depend on order, or when you need fine-grained steering control between each tool.
- Batched: When you want parallelism but also want steering checkpoints. For example, Batched { size: 3 } runs 3 tools concurrently, checks for user interrupts, then runs the next 3.
Steering messages are always checked between execution units (between each tool in Sequential, after all tools in Parallel, between batches in Batched). If a user interrupts, remaining tools are skipped.
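The scheduling rule (run one execution unit, check for steering, skip the rest on interrupt) can be sketched without any async runtime. This is an illustration only: it uses OS threads via std::thread::scope where phi-core uses futures::join_all, and `Strategy`, `ToolCall`, and `run_tools` are hypothetical names:

```rust
use std::thread;

pub enum Strategy {
    Sequential,
    Parallel,
    Batched { size: usize },
}

/// A tool call reduced to a closure that produces its result text.
pub type ToolCall = Box<dyn FnOnce() -> String + Send>;

/// Runs `calls` one execution unit at a time and polls `interrupted` between
/// units. Returns one entry per call: `Some(result)` if it ran, `None` if skipped.
pub fn run_tools<F>(calls: Vec<ToolCall>, strategy: Strategy, mut interrupted: F) -> Vec<Option<String>>
where
    F: FnMut() -> bool,
{
    let total = calls.len();
    // Size of one execution unit under each strategy.
    let unit = match strategy {
        Strategy::Sequential => 1,
        Strategy::Parallel => total.max(1),
        Strategy::Batched { size } => size.max(1),
    };
    let mut results: Vec<Option<String>> = Vec::with_capacity(total);
    let mut pending = calls.into_iter();
    loop {
        let batch: Vec<ToolCall> = pending.by_ref().take(unit).collect();
        if batch.is_empty() {
            break;
        }
        // Run every call in the unit concurrently and wait for all of them.
        let outputs: Vec<String> = thread::scope(|s| {
            let handles: Vec<_> = batch.into_iter().map(|call| s.spawn(call)).collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("tool panicked"))
                .collect()
        });
        results.extend(outputs.into_iter().map(Some));
        // Steering checkpoint: remaining tools are skipped on interrupt.
        if interrupted() {
            break;
        }
    }
    results.resize(total, None);
    results
}
```

With five calls, Batched { size: 2 }, and an interrupt raised at the first checkpoint, the first two calls run and the remaining three are skipped.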
Context Management
Long-running agents accumulate messages until the conversation no longer fits in the model's context window. phi-core provides token tracking, overflow detection, tiered compaction, and execution limits.
The context module is split into sub-modules: token, config, tracker, compaction, strategy, compact_messages, execution, orchestration.
Token Estimation
Fast estimation without external tokenizer dependencies:
```rust
use phi_core::context::{estimate_tokens, message_tokens, total_tokens};

estimate_tokens("Hello world"); // ~3 tokens (chars / 4)
message_tokens(&agent_message); // estimate for a single message
total_tokens(&messages);        // estimate for all messages
```
Context Tracking
ContextTracker combines real token counts from provider responses with estimation for new messages — more accurate than pure estimation:
```rust
use phi_core::context::ContextTracker;

let mut tracker = ContextTracker::new();

// After each assistant response, record the real usage:
tracker.record_usage(&assistant_usage, message_index);

// Get current context size (real usage + estimated trailing):
let tokens = tracker.estimate_context_tokens(agent.messages());

// After compaction, reset the tracker:
tracker.reset();
```
When no usage data is available, it falls back to chars/4 estimation.
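The idea of combining real usage with estimates for trailing messages can be sketched as below. `TrackerSketch` is a hypothetical stand-in, not the real ContextTracker. It assumes messages are plain strings, that the recorded usage covers every message up to and including `message_index`, and that the chars/4 estimate rounds up:

```rust
/// chars/4 heuristic, rounded up (the rounding is an assumption of this sketch).
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

#[derive(Default)]
pub struct TrackerSketch {
    /// (real token count from the provider, index of the last message it covers)
    last_usage: Option<(usize, usize)>,
}

impl TrackerSketch {
    pub fn record_usage(&mut self, context_tokens: usize, message_index: usize) {
        self.last_usage = Some((context_tokens, message_index));
    }

    /// Real usage for the covered prefix plus estimates for trailing messages.
    /// Falls back to pure estimation when no usage has been recorded.
    pub fn estimate_context_tokens(&self, messages: &[String]) -> usize {
        match self.last_usage {
            Some((real, idx)) if idx < messages.len() => {
                real + messages[idx + 1..].iter().map(|m| estimate_tokens(m)).sum::<usize>()
            }
            _ => messages.iter().map(|m| estimate_tokens(m)).sum(),
        }
    }

    /// Call after compaction: recorded usage no longer matches the message list.
    pub fn reset(&mut self) {
        self.last_usage = None;
    }
}
```

Resetting after compaction matters because the recorded count describes a message list that no longer exists.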
Context Overflow Detection
When the context exceeds a model's window, providers return overflow errors. phi-core detects these automatically across all major providers.
HTTP-level detection
Providers that check before streaming (Google, Bedrock, Vertex) return ProviderError::ContextOverflow:
```rust
use phi_core::provider::ProviderError;

match agent.prompt("...").await {
    // The loop already handles this, but you can also match it:
    Err(ProviderError::ContextOverflow { message }) => {
        // Compact and retry
    }
    _ => {}
}
```
ProviderError::classify() auto-detects overflow from error messages covering Anthropic, OpenAI, Google, AWS Bedrock, xAI, Groq, OpenRouter, llama.cpp, LM Studio, MiniMax, Kimi, GitHub Copilot, and generic patterns.
Message-level detection
SSE-based providers (Anthropic, OpenAI) return overflow as a StopReason::Error message. Check with:
```rust
if message.is_context_overflow() {
    // Compact and retry
}
```
Handling overflow in your application
phi-core provides the detection and building blocks. Your application wires the compaction strategy:
```rust
// Proactive: check before each prompt
let tokens = tracker.estimate_context_tokens(agent.messages());
if tokens > context_window - reserve {
    let compacted = compact_messages(agent.messages().to_vec(), &config);
    agent.replace_messages(compacted);
}

// Reactive: catch overflow errors
// ... on ContextOverflow or message.is_context_overflow():
//     compact, then retry with agent.continue_loop()
```
For LLM-based summarization (asking the model to summarize old messages), implement that in your application layer — phi-core provides replace_messages() and compact_messages() as building blocks.
ContextConfig
```rust
pub struct ContextConfig {
    pub max_context_tokens: usize,    // Default: 100,000
    pub system_prompt_tokens: usize,  // Default: 4,000
    pub compaction: CompactionConfig, // Primary compaction settings
    // Custom token counter (serde-skipped). None → HeuristicTokenCounter (chars/4).
    pub token_counter: Option<Arc<dyn TokenCounter>>,
    // Legacy backward-compat fields (prefer CompactionConfig equivalents):
    pub keep_recent: usize,           // Default: 10
    pub keep_first: usize,            // Default: 2
    pub tool_output_max_lines: usize, // Default: 50
}

pub struct CompactionConfig {
    // ── WHEN to compact ──
    pub compact_at_pct: f64,               // Default: 0.90 (90%)
    pub compact_budget_threshold_pct: f64, // Default: 0.05 (5%)
    pub compaction_scope: CompactionScope, // Default: FixedCount(3)
    // ── HOW to compact ──
    pub keep_first_turns: usize,      // Default: 2
    pub keep_recent_turns: usize,     // Default: 10
    pub max_summary_tokens: usize,    // Default: 2_000 (budget, not per-turn)
    pub tool_output_max_lines: usize, // Default: 50
}
```
CompactionScope
Controls how many earlier loops are included in compaction and context loading:
| Variant | Description |
|---|---|
| FixedCount(usize) | Compact a fixed number of earlier loops on the active chain. Default: FixedCount(3). |
| TokenBudget | Walk the chain backward, accumulating per-loop token estimates, and stop when max_context_tokens would be exceeded. Loops whose raw messages exceed the budget are still included, because their compacted summaries will fit. |
See compaction.md for full details on the non-destructive overlay model.
Tiered Compaction
compact_messages() tries each level in order, stopping as soon as messages fit the budget:
Level 1: Truncate Tool Outputs
Replaces long tool outputs with head + tail (keeping first N/2 and last N/2 lines). This is the cheapest level: it preserves conversation structure and typically saves 50-70% in coding sessions.
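A minimal sketch of head + tail truncation follows. The function name `truncate_head_tail` and the marker text are invented for illustration; phi-core's actual marker and line accounting may differ.

```rust
/// Keeps the first and last `max_lines / 2` lines of a tool output and
/// replaces everything in between with a single marker line.
pub fn truncate_head_tail(output: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() <= max_lines {
        // Short outputs pass through untouched.
        return output.to_string();
    }
    let half = max_lines / 2;
    let omitted = lines.len() - 2 * half;
    let mut kept: Vec<String> = lines[..half].iter().map(|l| l.to_string()).collect();
    // Marker wording is hypothetical.
    kept.push(format!("[... {} lines truncated ...]", omitted));
    kept.extend(lines[lines.len() - half..].iter().map(|l| l.to_string()));
    kept.join("\n")
}
```

For a 10-line output and `max_lines = 4`, this keeps lines 1-2 and 9-10 and notes that 6 lines were dropped.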
Level 2: Summarize Old Turns
Keeps the last keep_recent messages in full detail. Older assistant messages are replaced with one-line summaries like "[Summary] [Assistant used 3 tool(s)]", and their tool results are dropped.
Level 3: Drop Middle Messages
Keeps keep_first messages from the start and keep_recent from the end, dropping everything in between. A marker message notes how many were removed.
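The Level 3 behavior can be sketched as below, using plain strings in place of real message types. The function name `drop_middle` and the marker wording are assumptions of this sketch.

```rust
/// Keeps `keep_first` messages from the start and `keep_recent` from the end,
/// replacing the dropped middle with one marker message.
pub fn drop_middle(messages: Vec<String>, keep_first: usize, keep_recent: usize) -> Vec<String> {
    let total = messages.len();
    if total <= keep_first + keep_recent {
        // Nothing to drop: the two kept ranges already cover everything.
        return messages;
    }
    let dropped = total - keep_first - keep_recent;
    let mut out: Vec<String> = messages[..keep_first].to_vec();
    // Marker wording is hypothetical.
    out.push(format!("[{} messages removed]", dropped));
    out.extend_from_slice(&messages[total - keep_recent..]);
    out
}
```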
ExecutionLimits
Prevents runaway agents:
pub struct ExecutionLimits {
    pub max_turns: usize,        // Default: 50
    pub max_total_tokens: usize, // Default: 1,000,000
    pub max_duration: Duration,  // Default: 600s (10 min)
    pub max_cost: Option<f64>,   // Default: None (no cost cap)
}
max_cost caps cumulative dollar cost for the run. Requires AgentLoopConfig.cost_config to be set — without pricing rates the accumulated cost is always 0.0 and this limit has no effect.
When a limit is reached, the agent stops with a message like "[Agent stopped: Max turns reached (50/50)]".
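As an illustration of how such limits might be evaluated between turns, here is a standalone sketch. `Limits` is a local mirror of the struct above and `check_limits` is a hypothetical helper; only the max-turns message format comes from the text, and the other messages are invented.

```rust
use std::time::Duration;

/// Local mirror of the limits described above (not the phi-core type).
pub struct Limits {
    pub max_turns: usize,
    pub max_total_tokens: usize,
    pub max_duration: Duration,
    pub max_cost: Option<f64>,
}

impl Default for Limits {
    fn default() -> Self {
        Limits {
            max_turns: 50,
            max_total_tokens: 1_000_000,
            max_duration: Duration::from_secs(600),
            max_cost: None,
        }
    }
}

/// Returns a stop message if any limit has been reached, otherwise None.
pub fn check_limits(
    limits: &Limits,
    turns: usize,
    total_tokens: usize,
    elapsed: Duration,
    cost: f64,
) -> Option<String> {
    if turns >= limits.max_turns {
        return Some(format!("[Agent stopped: Max turns reached ({}/{})]", turns, limits.max_turns));
    }
    // The remaining message texts are hypothetical.
    if total_tokens >= limits.max_total_tokens {
        return Some("[Agent stopped: Max tokens reached]".to_string());
    }
    if elapsed >= limits.max_duration {
        return Some("[Agent stopped: Max duration reached]".to_string());
    }
    match limits.max_cost {
        // With no cap configured, cost never stops the run.
        Some(cap) if cost >= cap => Some("[Agent stopped: Max cost reached]".to_string()),
        _ => None,
    }
}
```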
Disabling Context Management
let agent = BasicAgent::new(model_config)
    .without_context_management();
This sets both context_config and execution_limits to None.
Prompt Caching
phi-core automatically optimizes API costs through prompt caching. For providers that support it, stable content (system prompts, tool definitions, conversation history) is cached between turns, giving you up to 90% savings on input tokens.
How It Works
In a multi-turn agent loop, each request sends the full context: system prompt + tools + conversation history. Without caching, you pay full price for all of it every turn. With caching, the provider reuses previously processed prefixes.
Provider Support
| Provider | Caching Type | Savings | Framework Action |
|---|---|---|---|
| Anthropic | Explicit (cache breakpoints) | 90% on hits | ✅ Auto-placed |
| OpenAI | Automatic (>1024 tokens) | 50% on hits | None needed |
| Google Gemini | Implicit (automatic) | Varies | None needed |
| Azure OpenAI | Automatic (same as OpenAI) | 50% on hits | None needed |
| Amazon Bedrock | Not yet implemented | N/A | CacheConfig accepted but no breakpoints placed |
What Gets Cached (Anthropic)
phi-core places up to 3 cache breakpoints automatically:
- System prompt — stable across all turns
- Tool definitions — rarely change between turns
- Conversation history — second-to-last message, so the growing prefix is cached
This means on a typical multi-turn conversation, only the latest user message and the new assistant response cost full price.
Configuration
Caching is enabled by default with automatic breakpoint placement. No configuration needed for optimal behavior.
Disable Caching
Use CacheStrategy::Disabled to turn off all cache breakpoint placement while keeping the config structure intact. Alternatively, set enabled: false on the CacheConfig master switch.
use phi_core::{BasicAgent, CacheConfig, CacheStrategy};
use phi_core::provider::ModelConfig;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

// Option 1: CacheStrategy::Disabled (preferred, states the intent explicitly)
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_cache_config(CacheConfig {
    strategy: CacheStrategy::Disabled,
    ..Default::default()
});

// Option 2: Master switch (equivalent effect)
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_cache_config(CacheConfig {
    enabled: false,
    ..Default::default()
});
Fine-Grained Control
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_cache_config(CacheConfig {
    enabled: true,
    strategy: CacheStrategy::Manual {
        cache_system: true,
        cache_tools: true,
        cache_messages: false, // Don't cache conversation history
    },
});
Monitoring Cache Usage
Every Usage struct includes cache statistics:
// After a response:
let usage = message.usage(); // from assistant message
println!("Cache read: {} tokens", usage.cache_read);
println!("Cache write: {} tokens", usage.cache_write);
println!("Cache hit rate: {:.1}%", usage.cache_hit_rate() * 100.0);
- cache_read: tokens served from cache (cheap)
- cache_write: tokens written to cache (slightly more than base price)
- cache_hit_rate(): fraction of input tokens from cache (0.0–1.0)
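One plausible way to compute such a hit rate is sketched below. The `Usage` struct here is a local stand-in, and the denominator (uncached input plus cache reads plus cache writes) is an assumption; the real phi-core calculation may count input tokens differently.

```rust
/// Local stand-in for the usage counters (not the phi-core type).
pub struct Usage {
    pub input: u64,       // assumed: input tokens NOT served from cache
    pub output: u64,
    pub cache_read: u64,  // tokens served from cache
    pub cache_write: u64, // tokens written to cache
}

impl Usage {
    /// Fraction of all input-side tokens that came from cache, in 0.0..=1.0.
    /// Assumption: total input = input + cache_read + cache_write.
    pub fn cache_hit_rate(&self) -> f64 {
        let total_input = self.input + self.cache_read + self.cache_write;
        if total_input == 0 {
            0.0
        } else {
            self.cache_read as f64 / total_input as f64
        }
    }
}
```

Under this definition, a turn with 50 uncached input tokens and 450 cache-read tokens reports a hit rate of 0.9.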
Cost Impact
For a typical 10-turn agent conversation with Anthropic Claude:
| Without Caching | With Caching (auto) |
|---|---|
| ~500K input tokens billed at full price | ~50K at full price + ~450K at 10% price |
| $2.50 (Sonnet) | $0.39 (Sonnet) |
That's an 84% cost reduction with zero configuration.
Best Practices
- Keep system prompts stable — changing the system prompt between turns invalidates the cache
- Don't shuffle tools — tool order matters for cache prefix matching
- Let it work automatically: the default CacheStrategy::Auto is optimal for most use cases. The three strategies are Auto (recommended), Disabled (no breakpoints), and Manual (fine-grained control)
- Monitor cache_hit_rate(): if it's consistently low, check if your system prompt or tools are changing unexpectedly
Retry with Backoff
When an LLM provider returns a transient error — rate limit (HTTP 429) or network failure — phi-core automatically retries with exponential backoff and jitter. No configuration required; it works out of the box.
How it works
Request → Error? → Retryable? → Wait (backoff + jitter) → Retry → ...
↓ No
Fail immediately
- The agent loop calls the provider
- If the provider returns a retryable error:
  - If a retry-after delay was provided (rate limits), use that
  - Otherwise, calculate delay: initial_delay × multiplier^(attempt-1) with ±20% jitter
  - Wait, then retry
- After max_retries attempts, the error propagates normally
What gets retried
| Error Type | Retried? | Why |
|---|---|---|
| RateLimited (429) | ✅ Yes | Temporary: provider will accept requests again soon |
| Network | ✅ Yes | Transient: connection resets, timeouts, DNS failures |
| Auth (401/403) | ❌ No | Permanent: wrong API key won't fix itself |
| Api (400, etc.) | ❌ No | Permanent: bad request won't change on retry |
| Cancelled | ❌ No | User-initiated: respect the cancellation |
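The table above amounts to a small classification function. The sketch below uses a local `ProviderError` enum that mirrors the table; apart from `RateLimited { retry_after_ms }`, which appears in this document, the variant payloads are assumptions.

```rust
/// Local mirror of the error categories in the table (payloads are assumed).
pub enum ProviderError {
    RateLimited { retry_after_ms: Option<u64> },
    Network(String),
    Auth(String),
    Api(String),
    Cancelled,
}

/// Only rate limits and network failures are worth retrying.
pub fn is_retryable(err: &ProviderError) -> bool {
    matches!(
        err,
        ProviderError::RateLimited { .. } | ProviderError::Network(_)
    )
}
```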
Default configuration
RetryConfig {
    max_retries: 3,          // Up to 3 retry attempts
    initial_delay_ms: 1000,  // 1 second before first retry
    backoff_multiplier: 2.0, // Double the delay each attempt
    max_delay_ms: 30_000,    // Cap at 30 seconds
}
With defaults, the retry delays are approximately:
- Attempt 1: ~1s
- Attempt 2: ~2s
- Attempt 3: ~4s
(±20% jitter to avoid thundering herd when multiple agents hit the same provider)
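The delays above follow from initial_delay × multiplier^(attempt-1), capped at max_delay_ms. The sketch below reproduces that arithmetic with a local `RetryConfig` mirror. The helper names are hypothetical, and applying the cap before the jitter is an assumption of this sketch. Jitter takes a caller-supplied value in 0.0..=1.0 so the result stays deterministic; a real implementation would draw it from a random source.

```rust
/// Local mirror of the retry settings shown above (not the phi-core type).
pub struct RetryConfig {
    pub max_retries: u32,
    pub initial_delay_ms: u64,
    pub backoff_multiplier: f64,
    pub max_delay_ms: u64,
}

impl Default for RetryConfig {
    fn default() -> Self {
        RetryConfig { max_retries: 3, initial_delay_ms: 1000, backoff_multiplier: 2.0, max_delay_ms: 30_000 }
    }
}

/// Delay before retry number `attempt` (1-based), without jitter.
pub fn base_delay_ms(cfg: &RetryConfig, attempt: u32) -> u64 {
    let exponent = attempt.saturating_sub(1) as i32;
    let raw = cfg.initial_delay_ms as f64 * cfg.backoff_multiplier.powi(exponent);
    raw.min(cfg.max_delay_ms as f64) as u64
}

/// Applies ±20% jitter. `jitter_unit` in 0.0..=1.0 maps to a factor in 0.8..=1.2.
pub fn jittered_delay_ms(base_ms: u64, jitter_unit: f64) -> u64 {
    let factor = 0.8 + 0.4 * jitter_unit.clamp(0.0, 1.0);
    (base_ms as f64 * factor).round() as u64
}
```

With the defaults, attempts 1 to 3 give base delays of 1000, 2000, and 4000 ms, and a hypothetical sixth attempt would be capped at 30 000 ms rather than 32 000 ms.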
Configuration
Using the Agent builder
use phi_core::{BasicAgent, RetryConfig};
use phi_core::provider::ModelConfig;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

// Default: 3 retries, exponential backoff (recommended)
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
));

// Custom: more retries, longer initial delay
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_retry_config(RetryConfig {
    max_retries: 5,
    initial_delay_ms: 2000,
    backoff_multiplier: 2.0,
    max_delay_ms: 60_000,
});

// Disable retries entirely
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_retry_config(RetryConfig::none());
Using AgentLoopConfig directly
use phi_core::agent_loop::AgentLoopConfig;
use phi_core::RetryConfig;

let config = AgentLoopConfig {
    // ...other fields...
    retry_config: RetryConfig {
        max_retries: 3,
        initial_delay_ms: 1000,
        backoff_multiplier: 2.0,
        max_delay_ms: 30_000,
    },
    ..Default::default()
};
Rate limit headers
When a provider returns ProviderError::RateLimited { retry_after_ms: Some(5000) }, phi-core uses that exact delay instead of the calculated backoff. This respects the provider's guidance — if Anthropic says "retry after 5 seconds", we wait 5 seconds, not our own estimate.
If no retry_after_ms is provided, the exponential backoff kicks in.
Observability
Retry attempts are logged via tracing at the WARN level:
WARN Provider error (attempt 1/3), retrying in 1.1s: Rate limited, retry after 1000ms
WARN Provider error (attempt 2/3), retrying in 2.3s: Rate limited, retry after 2000ms
Subscribe to tracing events in your application to surface these in your UI:
// Pick ONE of the two setups below: a second init() call panics
// because the global subscriber is already set.

// Simple stderr logging
tracing_subscriber::fmt::init();

// Or filter to just retries (requires tracing-subscriber's "env-filter" feature)
tracing_subscriber::fmt()
    .with_env_filter("phi_core::provider::retry=warn")
    .init();
Design notes
- Retry lives in the agent loop, not inside individual providers. One config controls all retry behavior.
- Jitter prevents thundering herd: when many agents hit a rate limit simultaneously, jitter spreads their retries so they don't all retry at the same instant.
- Cancellation is respected: if the user cancels while waiting for a retry, the loop exits immediately.
- No retry on API errors: a malformed request will fail the same way every time. Retrying wastes time and tokens.
Skills
Skills extend an agent with domain expertise using the AgentSkills open standard. A skill is a directory containing a SKILL.md file with instructions the agent can load on demand.
How it works
Skills use progressive disclosure to manage context efficiently:
- Metadata (~100 tokens/skill) — name + description, always in the system prompt
- Instructions (<5k tokens) — SKILL.md body, loaded when the agent decides the skill is relevant
- Resources (unlimited) — scripts, references, assets — loaded only when needed
The agent decides when to activate a skill based on the description alone. No trigger engine needed.
Skill format
my-skill/
├── SKILL.md # Required: YAML frontmatter + instructions
├── scripts/ # Optional: executable code
├── references/ # Optional: documentation loaded on demand
└── assets/ # Optional: templates, static resources
SKILL.md uses YAML frontmatter:
---
name: git
description: Git operations — commit, branch, merge, rebase. Use when the user mentions version control.
---
# Git Skill
## Workflow
1. Run `git status` first
2. Stage changes, write conventional commit messages
3. For merges, check for conflicts first
## Scripts
For complex diffs: `bash {baseDir}/scripts/diff_summary.sh`
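To show what reading this format involves, here is a minimal sketch that splits a SKILL.md into its metadata and body. It handles only flat `key: value` lines and is not a real YAML parser; `SkillMeta` and `parse_skill_md` are hypothetical names, not phi-core's loader.

```rust
/// Metadata pulled from the frontmatter (hypothetical type for this sketch).
pub struct SkillMeta {
    pub name: String,
    pub description: String,
}

/// Splits a SKILL.md into (metadata, body). Returns None if the frontmatter
/// is missing, unterminated, or lacks `name` / `description`.
pub fn parse_skill_md(skill_md: &str) -> Option<(SkillMeta, String)> {
    let mut lines = skill_md.lines();
    if lines.next()?.trim() != "---" {
        return None;
    }
    let mut name = None;
    let mut description = None;
    let mut closed = false;
    for line in lines.by_ref() {
        if line.trim() == "---" {
            closed = true;
            break;
        }
        // Split at the first colon only, so values may contain colons.
        if let Some((key, value)) = line.split_once(':') {
            match key.trim() {
                "name" => name = Some(value.trim().to_string()),
                "description" => description = Some(value.trim().to_string()),
                _ => {}
            }
        }
    }
    if !closed {
        return None;
    }
    // Everything after the closing delimiter is the instruction body.
    let body = lines.collect::<Vec<_>>().join("\n");
    Some((SkillMeta { name: name?, description: description? }, body))
}
```

Only the metadata would need to stay in the system prompt; the body can be read later, which is the progressive disclosure described above.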
Loading skills
use phi_core::SkillSet;
use std::path::PathBuf;

// Load from multiple directories (later dirs override earlier on name conflict)
let skills = SkillSet::load(&[
    PathBuf::from("./skills"),
    PathBuf::from("~/.phi-core/skills"),
]);

// Or load from a single directory with a label
let workspace_skills = SkillSet::load_dir("./skills", "workspace");
Using with Agent
use phi_core::{BasicAgent, SkillSet};
use phi_core::provider::ModelConfig;
use std::path::PathBuf;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let skills = SkillSet::load(&[PathBuf::from("./skills")]);

let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_system_prompt("You are a coding assistant.")
.with_skills(skills) // Appends skill index to system prompt
.with_tools(tools);
The agent's system prompt will include:
<available_skills>
<skill>
<name>git</name>
<description>Git operations — commit, branch, merge, rebase.</description>
<location>/path/to/skills/git/SKILL.md</location>
</skill>
</available_skills>
When the agent encounters a task matching a skill, it reads the SKILL.md using the read_file tool and follows the instructions. No special infrastructure needed.
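The index shown above is plain text appended to the system prompt. A sketch of how such a block could be rendered follows; `SkillEntry`, `render_skill_index`, and the XML escaping step are assumptions of this example rather than phi-core's implementation.

```rust
/// One row of the skill index (hypothetical type for this sketch).
pub struct SkillEntry {
    pub name: String,
    pub description: String,
    pub location: String,
}

/// Escapes the characters that would break the surrounding tags.
fn escape_xml(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// Renders the <available_skills> block for the system prompt.
pub fn render_skill_index(skills: &[SkillEntry]) -> String {
    let mut out = String::from("<available_skills>\n");
    for skill in skills {
        out.push_str("<skill>\n");
        out.push_str(&format!("<name>{}</name>\n", escape_xml(&skill.name)));
        out.push_str(&format!("<description>{}</description>\n", escape_xml(&skill.description)));
        out.push_str(&format!("<location>{}</location>\n", escape_xml(&skill.location)));
        out.push_str("</skill>\n");
    }
    out.push_str("</available_skills>");
    out
}
```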
Precedence
When loading from multiple directories, later directories take precedence. A skill in ./skills/ overrides the same-named skill in ~/.phi-core/skills/.
You can also merge skill sets explicitly:
let mut base = SkillSet::load_dir("/usr/share/phi-core/skills", "bundled")?;
let user = SkillSet::load_dir("~/.phi-core/skills", "user")?;
let workspace = SkillSet::load_dir("./skills", "workspace")?;

base.merge(user);
base.merge(workspace); // workspace wins on conflict
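The "later wins" rule is the same behavior as inserting into a map keyed by skill name. The sketch below illustrates that with a hypothetical `Skill` type and `merge_skills` helper; it is not how SkillSet is necessarily implemented.

```rust
use std::collections::HashMap;

/// Minimal skill record: a name plus the label of the directory it came from.
pub struct Skill {
    pub name: String,
    pub source: String,
}

/// Merges layers in order; a later layer replaces a same-named earlier skill.
pub fn merge_skills(layers: Vec<Vec<Skill>>) -> HashMap<String, Skill> {
    let mut merged = HashMap::new();
    for layer in layers {
        for skill in layer {
            // insert() overwrites any existing entry with the same key.
            merged.insert(skill.name.clone(), skill);
        }
    }
    merged
}
```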
Compatibility
By following the AgentSkills standard, skills written for phi-core work with Claude Code, Codex CLI, Gemini CLI, Cursor, OpenCode, Goose, and any other compatible agent. Write once, use everywhere.
Design philosophy
Skills are deliberately simple:
- No trigger engine — the LLM decides from descriptions
- No compile-time registration — skills use existing tools (read_file, bash)
- No plugin API — skills are just files
- No runtime loading — loaded at startup, that's it
If a skill needs a custom tool, it can provide an MCP server.
Sub-Agents
Sub-agents let a parent agent delegate tasks to child agent loops, each with their own system prompt, tools, and ModelConfig. The parent LLM invokes them like any other tool.
Overview
Parent Agent
├── prompt("Research X and implement Y")
│ ├── calls SubAgentTool("researcher", task="Research X")
│ │ └── child agent_loop() with read/search tools → returns findings
│ ├── calls SubAgentTool("coder", task="Implement Y based on findings")
│ │ └── child agent_loop() with edit/write tools → returns result
│ └── summarizes both results
Each sub-agent invocation starts a fresh conversation — no state leaks between calls.
Creating Sub-Agents
use phi_core::agents::SubAgentTool;
use phi_core::provider::ModelConfig;
use phi_core::tools;
use std::sync::Arc;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

let researcher = SubAgentTool::new(
    "researcher",
    ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key),
)
.with_description("Searches and reads files to gather information.")
.with_system_prompt("You are a research assistant. Be thorough and concise.")
.with_tools(vec![
    Arc::new(tools::ReadFileTool::new()),
    Arc::new(tools::SearchTool::new()),
])
.with_max_turns(10);
Registering on a Parent Agent
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

let mut agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_system_prompt("You coordinate between sub-agents.")
.with_sub_agent(researcher)
.with_sub_agent(coder);
The parent sees sub-agents as regular tools. It decides when to delegate based on its system prompt.
Parallel Execution
When the parent LLM calls multiple sub-agents in a single response, they run concurrently (default Parallel strategy). Two sub-agents each taking 50ms complete in ~50ms total, not 100ms.
Configuration
| Method | Purpose |
|---|---|
| with_description() | What the parent LLM sees (helps it decide when to delegate) |
| with_system_prompt() | The sub-agent's own instructions |
| with_provider_override(provider) | Bypass ProviderRegistry (primarily for tests) |
| with_tools() | Tools available to the sub-agent (accepts Vec<Arc<dyn AgentTool>>) |
| with_max_turns(N) | Turn limit (default: 10). Primary guard against runaway execution. |
| with_max_tokens(N) | Max tokens for LLM responses |
| with_thinking() | Enable extended thinking for the sub-agent |
| with_cache_config() | Prompt caching settings |
| with_tool_execution(strategy) | Tool execution strategy (Parallel, Sequential, Batched) |
| with_retry_config(config) | Retry configuration for transient errors |
| with_parent_loop_id(id: String) | Sets parent_loop_id on the child's AgentContext. The child's AgentStart event will carry this value, enabling parent→child ancestry tracing across the event stream. |
Event Forwarding
When the parent provides an on_update callback (standard for all tools), sub-agent events are forwarded as ToolExecutionUpdate events. The parent's UI sees real-time progress from the child:
- Text deltas from the sub-agent's LLM responses
- Tool call notifications from the sub-agent's tool usage
When the child loop completes, the parent emits ToolExecutionEnd with child_loop_id: Some(loop_id) set to the child's loop_id. This lets you correlate ToolExecutionEnd on the parent side with AgentStart/AgentEnd on the child side when both event streams are consumed.
Design Decisions
- Context isolation: Each invocation starts fresh. Sub-agents don't accumulate history across calls.
- No nesting: Sub-agents are not given other SubAgentTools. This prevents infinite delegation chains.
- Cancellation propagation: The parent's cancellation token is forwarded. Aborting the parent aborts all sub-agents.
- Turn limiting: The default 10-turn limit prevents runaway execution. The parent's execution limits also apply to total wall-clock time.
Example
See examples/sub_agent.rs for a complete coordinator with researcher and coder sub-agents.
State Persistence
phi-core supports saving and restoring agent conversation state, enabling pause/resume workflows, state transfer between processes, and conversation checkpointing.
Save and Restore
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

// After running some conversation turns...
let json = agent.save_messages();
std::fs::write("conversation.json", &json)?;

// Later, in a new process:
let json = std::fs::read_to_string("conversation.json")?;
let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let mut agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_system_prompt("You are helpful.");
agent.restore_messages(&json)?;

// Continue the conversation: the agent sees the full history
let rx = agent.prompt("Follow up question").await;
Builder Initialization
For constructing an agent with pre-existing history:
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

let saved: Vec<AgentMessage> = serde_json::from_str(&json)?;
let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .with_messages(saved)
    .with_system_prompt("...");
JSON Format
Messages serialize as a JSON array. Each message is tagged by role:
[
{
"role": "user",
"content": [{"type": "text", "text": "Hello"}],
"timestamp": 1700000000000
},
{
"role": "assistant",
"content": [{"type": "text", "text": "Hi there!"}],
"stopReason": "stop",
"model": "claude-sonnet-4-20250514",
"provider": "anthropic",
"usage": {"input": 100, "output": 50, "cache_read": 0, "cache_write": 0, "total_tokens": 150},
"timestamp": 1700000001000
}
]
Extension messages use a nested structure:
{
"role": "extension",
"kind": "status_update",
"data": {"status": "running"}
}
Context Tracking
ContextTracker and ExecutionTracker are runtime-only and not persisted. This is by design — both are created fresh each agent_loop() invocation and operate on whatever messages are in context at that point. Restoring messages and calling prompt() works correctly without any special recalculation.
What's Serializable
| Type | Serialize | Deserialize | PartialEq |
|---|---|---|---|
| Content | Yes | Yes | Yes |
| Message | Yes | Yes | Yes |
| AgentMessage | Yes | Yes | Yes |
| ExtensionMessage | Yes | Yes | Yes |
| Usage | Yes | Yes | Yes |
| StopReason | Yes | Yes | Yes |
| ToolResult | Yes | Yes | Yes |
| CacheConfig | Yes | Yes | Yes |
| ToolExecutionStrategy | Yes | Yes | Yes |
| ContextConfig | Yes | Yes | No |
| ExecutionLimits | Yes | Yes | No |
Lifecycle Callbacks
phi-core provides four tiers of lifecycle callbacks that let you observe and control the agent loop without modifying its internals. Loop-level, turn-level, and tool-level callbacks are set on AgentLoopConfig (or via Agent builder methods). Session-level callbacks (before_task / after_task) are set on SessionRecorderConfig.
0.9.0: async hook bodies. All loop-level, turn-level, and the non-update tool-level hooks below (plus the two compaction hooks) are now async. The on_* builders on BasicAgent accept closures whose bodies return Pin<Box<dyn Future<Output = T> + Send>>. Wrap sync bodies in Box::pin(async move { ... }), or .await LLM and other async work directly. The two tool-update hooks (before_tool_execution_update / after_tool_execution_update) stay sync; see the note next to their sections for the rationale. The CHANGELOG [0.9.0] § Migration carries the full mechanical recipe.
Tiers Overview
| Tier | Hooks | Scope |
|---|---|---|
| Session-level | before_task, after_task | Once per session (on SessionRecorderConfig) |
| Loop-level | before_loop, after_loop | Once per agent_loop() / agent_loop_continue() call |
| Turn-level | before_turn, after_turn, on_error | Once per LLM call (every turn) |
| Tool-level | before_tool_execution, after_tool_execution, before_tool_execution_update, after_tool_execution_update | Once per tool call |
Loop-Level Hooks
before_loop
Called once before AgentStart is emitted. Receives the current message history and an initial usage counter of 0. Return false to abort the entire run — AgentEnd is emitted with an empty message list and the loop exits immediately.
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_before_loop(|messages, _usage| {
        println!("Starting run with {} existing messages", messages.len());
        true // return false to abort
    });
after_loop
Called once after AgentEnd is emitted. Receives the new messages produced during the run and the accumulated Usage across all turns.
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_after_loop(|new_messages, total_usage| {
        println!(
            "Run complete: {} new messages, {} total tokens",
            new_messages.len(),
            total_usage.total_tokens
        );
    });
Turn-Level Hooks
before_turn
Called before each LLM call. Receives the current message history and the turn number (0-indexed). Return false to abort the loop.
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_before_turn(|messages, turn| {
        println!("Turn {} starting with {} messages", turn, messages.len());
        turn < 10 // Stop after 10 turns
    });
after_turn
Called after each LLM response and tool execution. Receives the updated message history and the turn's token usage.
use std::sync::{Arc, Mutex};

let total_cost = Arc::new(Mutex::new(0u64));
let cost_tracker = total_cost.clone();

let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_after_turn(move |_messages, usage| {
        let mut cost = cost_tracker.lock().unwrap();
        *cost += usage.input + usage.output;
        println!("Cumulative tokens: {}", *cost);
    });
on_error
Called when the LLM returns a StopReason::Error. Receives the error message string.
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_error(|err| {
        eprintln!("LLM error: {}", err);
        // Log to monitoring, send alert, etc.
    });
Tool-Level Hooks
before_tool_execution
Called before each tool starts, after the ToolExecutionStart event would normally emit. Receives the tool name, call ID, and arguments. Return false to skip the tool — a ToolExecutionEnd with an error result is emitted and the tool's execute() is never called.
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_before_tool_execution(|name, call_id, _args| {
        println!("About to run tool: {}", name);
        // Return false to block specific tools:
        name != "bash" // block bash, allow everything else
    });
after_tool_execution
Called after each tool finishes (after ToolExecutionEnd is emitted). Receives the tool name, call ID, and whether the result was an error.
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_after_tool_execution(|name, call_id, is_error| {
        if is_error {
            eprintln!("Tool {} ({}) failed", name, call_id);
        }
    });
before_tool_execution_update (sync — see note below)
Called before each ToolExecutionUpdate event (streaming progress from a running tool). Return false to suppress the event — the tool keeps running and the final ToolResult is unaffected; only the intermediate streaming update is dropped.
Pre-existing-behaviour preservation note (phi-core 0.9.0). The two tool-update hooks (before_tool_execution_update / after_tool_execution_update) remain sync after the 0.9.0 async-trait migration. Async-ifying them would cascade into the ToolUpdateFn callback type and every AgentTool::execute body that invokes ctx.on_update(...), which is materially wider than the 0.9.0 scope. The veto decision in before_tool_execution_update must be synchronous so the surrounding emit gate works without an .await suspension at every streamed tool-update. Async work at update-time should be dispatched via tokio::spawn(...) inside the sync closure body. Tracked in the CHANGELOG [Unreleased] "Forward markers" for a future release.
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_before_tool_execution_update(|name, call_id, text| {
        // Only forward updates for bash tool
        name == "bash"
    });
after_tool_execution_update
Called after each ToolExecutionUpdate event, only if it was not suppressed by before_tool_execution_update.
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_after_tool_execution_update(|name, call_id, text| {
        // e.g., log streaming updates to a file
    });
Script Callbacks
In addition to Rust closures, callbacks can be implemented as external shell or Python scripts. This allows non-Rust consumers to hook into the agent lifecycle without compiling Rust code.
Script callbacks are specified as command strings (e.g., "./scripts/on_task_start.sh" or "python3 scripts/after_turn.py"). The agent loop spawns the script as a subprocess, passing relevant context (such as session ID, turn number, or tool name) as environment variables or arguments. The script's exit code determines whether the action proceeds (0 = continue, non-zero = abort, for Before* hooks).
Script callbacks can be configured in the [callbacks] section of the config file or set programmatically via the Agent trait.
All callback tiers are wired in the script callback bridge. Loop-level (before_loop, after_loop), tool-level (before_tool_execution, after_tool_execution), compaction-level (before_compaction_start, after_compaction_end), and turn-level (before_turn, after_turn) hooks are all resolved from the [callbacks] config section and bridged to external scripts. The bridge passes hook context as JSON (message count, turn index, tool name, etc.) via stdin to the subprocess.
Hook Ordering
The hooks fire in strict order relative to their paired events. This ordering is an invariant — it is enforced at runtime:
before_loop
→ AgentStart
before_turn
→ TurnStart
[MessageStart / MessageUpdate* / MessageEnd]
[per tool call:]
before_tool_execution
→ ToolExecutionStart
(before_tool_execution_update → ToolExecutionUpdate → after_tool_execution_update)*
ToolExecutionEnd →
after_tool_execution
[if context budget exceeded:]
before_compaction_start
→ CompactionStarted
CompactionEnded →
after_compaction_end
TurnEnd →
after_turn
AgentEnd →
after_loop
Short-Circuit Rules
| Hook returns false | Effect |
|---|---|
| before_loop | Aborts before AgentStart; emits AgentEnd(messages=[]) |
| before_turn | Skips turn; neither TurnStart nor TurnEnd is emitted |
| before_tool_execution | Skips tool; emits error ToolExecutionEnd without calling execute() |
| before_tool_execution_update | Suppresses ToolExecutionUpdate; tool keeps running; ToolResult unaffected |
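The loop-level and turn-level rules can be traced with a toy loop that records hook and event names in order. This is a synchronous sketch with a hypothetical `run_loop` function (the real hooks are async as of 0.9.0, and tool and compaction steps are omitted). It also assumes that AgentEnd and after_loop still fire after before_turn returns false, which the text above does not state explicitly.

```rust
/// Runs up to `max_turns` empty turns and returns the order of hooks/events.
pub fn run_loop(
    max_turns: usize,
    before_loop: impl Fn() -> bool,
    before_turn: impl Fn(usize) -> bool,
) -> Vec<&'static str> {
    let mut trace = Vec::new();

    trace.push("before_loop");
    if !before_loop() {
        // Abort before AgentStart; AgentEnd carries an empty message list.
        trace.push("AgentEnd(messages=[])");
        return trace;
    }
    trace.push("AgentStart");

    for turn in 0..max_turns {
        trace.push("before_turn");
        if !before_turn(turn) {
            // Turn skipped: neither TurnStart nor TurnEnd is emitted.
            break;
        }
        trace.push("TurnStart");
        // (LLM call and tool execution would happen here.)
        trace.push("TurnEnd");
        trace.push("after_turn");
    }

    // Assumption of this sketch: the run still closes normally.
    trace.push("AgentEnd");
    trace.push("after_loop");
    trace
}
```

Calling `run_loop(3, || true, |turn| turn < 1)` records one full turn, then a second before_turn with no TurnStart after it, followed by AgentEnd and after_loop.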
Steering Checkpoints
Steering messages (injected via the agent's steering queue) are checked at six specific points in the turn cycle. These checkpoints give the caller opportunities to redirect the agent mid-run without waiting for the current loop iteration to complete.
The Six Checkpoints
1. Before turn -- After before_turn fires, before the LLM call. The steering message is prepended to the message history as a User message before the model sees it.
2. After turn -- After the LLM response is received and after_turn fires. Steering is appended before the next turn begins.
3. Between tool executions (Sequential) -- When tool_strategy = "sequential", the steering queue is checked between each individual tool call. This is the finest-grained checkpoint.
4. Between batches (Batched) -- When tool_strategy = "batched", the steering queue is checked after each batch completes, before the next batch starts.
5. After all tools (Parallel) -- When tool_strategy = "parallel", steering is checked once after all tool calls complete. No mid-batch interruption.
6. On loop re-entry -- At the top of each loop iteration, before before_turn fires.
Per-Strategy Behavior
| Strategy | When steering is checked | Granularity |
|---|---|---|
| Sequential | Between each tool call | Per-tool |
| Batched | After each batch completes | Per-batch |
| Parallel | After all tools complete | Post-batch |
In all strategies, checkpoints 1, 2, and 6 always apply. The strategy only affects when steering is checked during tool execution (checkpoints 3-5).
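The per-strategy difference can be expressed as "after how many completed tool calls is the steering queue checked". The sketch below is illustrative only: `ToolStrategy` is a local enum, the `batch_size` field is hypothetical, and it reads "between" literally, so the check after the final tool or batch is left to the after-turn checkpoint.

```rust
/// Local stand-in for the tool execution strategies (batch_size is assumed).
pub enum ToolStrategy {
    Sequential,
    Batched { batch_size: usize },
    Parallel,
}

/// For `n_tools` tool calls in one turn, returns the completed-tool counts
/// at which steering is checked during the tool phase.
pub fn tool_phase_checkpoints(strategy: &ToolStrategy, n_tools: usize) -> Vec<usize> {
    match strategy {
        // Between each pair of consecutive tool calls.
        ToolStrategy::Sequential => (1..n_tools).collect(),
        // After each batch that is followed by another batch.
        ToolStrategy::Batched { batch_size } => {
            let size = (*batch_size).max(1);
            (1..)
                .map(|batch| batch * size)
                .take_while(|&done| done < n_tools)
                .collect()
        }
        // Once, after every tool call has completed.
        ToolStrategy::Parallel => {
            if n_tools == 0 {
                Vec::new()
            } else {
                vec![n_tools]
            }
        }
    }
}
```

For five tool calls, Sequential yields checks after tools 1 through 4, Batched with a batch size of 2 yields checks after tools 2 and 4, and Parallel yields a single check after tool 5.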
Why Mid-Stream and Mid-Tool Steering Is Not Supported
Steering is intentionally not checked:
- During an LLM streaming response -- The SSE stream is atomic from the agent loop's perspective. Interrupting a partial response would produce an inconsistent message (partial assistant text with no stop reason). The model's response must complete or fail before steering can take effect.
- During a single tool's execution -- A tool call is an atomic unit. Interrupting a bash command mid-execution or a file write mid-stream would leave the environment in an undefined state. The tool must return its ToolResult before steering is considered.
These boundaries are not limitations but invariants that keep the message history and environment consistent.
Hard Abort with CancellationToken
For cases where waiting for the next steering checkpoint is unacceptable (e.g., runaway tool, user-initiated cancel), CancellationToken provides a hard abort:
use tokio_util::sync::CancellationToken;

let cancel = CancellationToken::new();
let cancel_clone = cancel.clone();

// In another task:
cancel_clone.cancel(); // triggers immediate abort
When the token is cancelled:
- The current LLM stream is dropped (partial response discarded)
- Running tools are cancelled via their async cancellation
- The loop emits AgentEnd with StopReason::Aborted
- No further turns or tool calls are attempted
CancellationToken is a last resort. Prefer steering for graceful redirection; use cancellation only when the agent must stop immediately.
Combining Callbacks
All callbacks are optional and independent:
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key))
    .on_before_loop(|_msgs, _| true)
    .on_after_loop(|msgs, usage| {
        println!("Done: {} messages, {} tokens", msgs.len(), usage.total_tokens);
    })
    .on_before_turn(|_msgs, turn| turn < 20)
    .on_after_turn(|msgs, usage| {
        println!("Messages: {}, Tokens: {}/{}", msgs.len(), usage.input, usage.output);
    })
    .on_error(|err| eprintln!("Error: {}", err))
    .on_before_tool_execution(|name, _id, _args| {
        println!("Running: {}", name);
        true
    })
    .on_after_tool_execution(|name, _id, is_error| {
        println!("Tool {} finished (error={})", name, is_error);
    });
Using with AgentLoopConfig
For direct loop usage without the Agent wrapper:
    use std::sync::Arc;
    use phi_core::agent_loop::AgentLoopConfig;
    use phi_core::provider::ModelConfig;

    let config = AgentLoopConfig {
        model_config: ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key),
        // Loop-level
        before_loop: Some(Arc::new(|_msgs, _| true)),
        after_loop: Some(Arc::new(|msgs, usage| { /* log */ })),
        // Turn-level
        before_turn: Some(Arc::new(|_msgs, turn| turn < 5)),
        after_turn: Some(Arc::new(|_msgs, _usage| { /* log */ })),
        on_error: Some(Arc::new(|err| eprintln!("{}", err))),
        // Tool-level
        before_tool_execution: Some(Arc::new(|name, id, args| true)),
        after_tool_execution: Some(Arc::new(|name, id, is_error| {})),
        before_tool_execution_update: Some(Arc::new(|name, id, text| true)),
        after_tool_execution_update: Some(Arc::new(|name, id, text| {})),
        ..Default::default()
    };
Sessions
A Session is a named container (keyed by session_id) that groups all
LoopRecords belonging to one agent session. Sessions provide persistent,
structured memory of every agent interaction — suitable for logging, replay,
branching, and tracing agent-spawning chains.
The session module is split into sub-modules: model, recorder, storage, helpers.
Session (session_id)
├── LoopRecord (loop_id: A) ← origin loop
│ ├── LoopRecord (loop_id: B) ← continuation of A
│ └── LoopRecord (loop_id: C) ← another continuation of A
│ ├── LoopRecord (loop_id: D) ← parallel branch
│ └── LoopRecord (loop_id: E) ← parallel branch (selected)
└── child_loop_refs → Session (sub-agent session)
Overview
| Concept | Description |
|---|---|
| Session | Container for all loops belonging to one session_id |
| LoopRecord | Complete record of one agent_loop / agent_loop_continue execution |
| LoopEvent | One event in a loop's ordered event stream |
| SessionRecorder | Stateful consumer that builds sessions from AgentEvent streams |
Relationship to loops
One session contains many loops. Loops within a session form a tree via
parent_loop_id / children_loop_ids links. Parallel-evaluation branches
form a sibling group linked by ParallelGroupRecord. Sub-agent loops are
cross-session (different session_id) and connected via ChildLoopRef /
SpawnRef instead.
Session Formation
A new Session is opened when SessionRecorder first encounters a session_id
it has not seen before. Three scenarios produce a new session:
PerSessionId (default)
One Session per session_id. Maps naturally onto BasicAgent lifetime —
one BasicAgent instance = one session for its entire lifetime.
    let mut recorder = SessionRecorder::new(SessionRecorderConfig::default());
    // Every event from a single BasicAgent feeds into one Session.
When to use: The default for most applications. No infrastructure needed.
InactivityTimeout
Opens a new session when the agent has been idle for longer than a configured
threshold. Requires the caller to rotate session_id beforehand — the recorder
detects the new session_id on the next AgentStart.
    // In your agent orchestrator, before prompting:
    if agent.check_and_rotate(Duration::from_secs(1800)).is_some() {
        println!("Started new session after 30 minutes idle");
    }
When to use: Long-running assistants where each "conversation" should be a
distinct session even if the BasicAgent object persists.
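The idle check behind this scenario can be sketched with the standard library alone. The `IdleTracker` type, its fields, the counter-based id format, and the explicit `now` parameter are all assumptions made for illustration; phi-core's `check_and_rotate` takes only the threshold, and its internals may differ.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the idle-tracking state an agent might keep.
pub struct IdleTracker {
    pub session_id: String,
    pub last_activity: Instant,
    rotations: u32,
}

impl IdleTracker {
    pub fn new(session_id: &str, now: Instant) -> Self {
        Self {
            session_id: session_id.to_string(),
            last_activity: now,
            rotations: 0,
        }
    }

    /// Rotates the session id when the gap since the last activity exceeds
    /// `threshold`. Returns the new id on rotation, `None` otherwise.
    /// Taking `now` as a parameter keeps the sketch deterministic.
    pub fn check_and_rotate_at(&mut self, now: Instant, threshold: Duration) -> Option<String> {
        let idle = now.saturating_duration_since(self.last_activity);
        self.last_activity = now;
        if idle > threshold {
            self.rotations += 1;
            // Assumed id scheme; a real implementation would likely use a UUID.
            self.session_id = format!("session-{}", self.rotations);
            Some(self.session_id.clone())
        } else {
            None
        }
    }
}
```

Because the rotation happens before the next prompt, the recorder only ever sees the new `session_id` on the following `AgentStart`, which matches the detection rule described above.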
Explicit rotation
Call BasicAgent::new_session() directly to rotate immediately.
    let new_id = agent.new_session();
    // All subsequent loops belong to the new session.
When to use: At conversation boundaries you control explicitly (e.g. "clear chat" button, new document context).
LoopRecord Anatomy
Field table
| Field | Type | Description |
|---|---|---|
| loop_id | String | Unique id for this execution |
| session_id | String | Session this loop belongs to |
| agent_id | String | Agent that ran this loop |
| parent_loop_id | Option<String> | Preceding loop (same or different session) |
| continuation_kind | ContinuationKind | How this loop relates to its parent (Initial for first loops) |
| started_at | DateTime<Utc> | Timestamp from AgentStart |
| ended_at | Option<DateTime<Utc>> | Timestamp from AgentEnd |
| status | LoopStatus | Lifecycle state |
| rejection | Option<String> | Input-filter rejection reason (if any) |
| config | Option<LoopConfigSnapshot> | Model/provider that ran this loop |
| messages | Vec<AgentMessage> | All new messages produced (from AgentEnd) |
| turns | Vec<Turn> | Materialized turn records (one per LLM call-response cycle). Built from TurnStart/TurnEnd event pairs. Empty for old sessions or loops that ended before any turn completed. |
| usage | Usage | Token usage for this loop |
| metadata | Option<Value> | Caller-supplied metadata from AgentStart |
| events | Vec<LoopEvent> | Full ordered event stream |
| children_loop_ids | Vec<String> | Same-session child loops (parent→children) |
| child_loop_refs | Vec<ChildLoopRef> | Cross-session sub-agent spawn links |
| compaction_block | Option<CompactionBlock> | Non-destructive compaction overlay (see below) |
| parallel_group | Option<ParallelGroupRecord> | Parallel-evaluation group metadata |
LoopStatus lifecycle
AgentEnd (no rejection)
┌─────────┐ AgentStart ┌─────────┐ ───────────────────────► ┌───────────┐
│ Pending ├─────────────►│ Running │ │ Completed │
└─────────┘ └────┬────┘ AgentEnd (rejection Some) └───────────┘
│ ────────────────────────────► ┌──────────┐
│ │ Rejected │
│ flush() before AgentEnd └──────────┘
└─────────────────────────────► ┌─────────┐
│ Aborted │
└─────────┘
Pending is only used for parallel-evaluation branches: they are pre-registered
when ParallelLoopStart arrives, before their individual AgentStart fires.
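The lifecycle can be read as a small transition function. The sketch below mirrors the diagram with local stand-in enums (`Status`, `Signal`); these are illustrative names, not phi-core's types, and leaving every unlisted combination unchanged is an assumption, since the diagram does not say what happens in those cases.

```rust
/// Local mirror of the lifecycle states; not phi-core's `LoopStatus`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Running,
    Completed,
    Rejected,
    Aborted,
}

/// The three inputs that move a loop between states in the diagram.
#[derive(Debug, Clone, Copy)]
pub enum Signal {
    AgentStart,
    AgentEnd { rejected: bool },
    Flush,
}

/// Applies one signal to a status.
/// Combinations the diagram does not show leave the status unchanged (assumption).
pub fn next_status(current: Status, signal: Signal) -> Status {
    match (current, signal) {
        (Status::Pending, Signal::AgentStart) => Status::Running,
        (Status::Running, Signal::AgentEnd { rejected: false }) => Status::Completed,
        (Status::Running, Signal::AgentEnd { rejected: true }) => Status::Rejected,
        (Status::Running, Signal::Flush) => Status::Aborted,
        (other, _) => other,
    }
}
```

Completed, Rejected, and Aborted are terminal in this sketch: no signal moves a loop out of them.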
continuation_kind classification
| parent_loop_id | continuation_kind | Meaning |
|---|---|---|
| None | Initial | Fresh origin loop (agent_loop) |
| Same-session parent | Default | Regular continuation |
| Same-session parent | Rerun { tag } | Retry / error recovery |
| Same-session parent | Branch { tag } | Branch exploration |
| Different-session parent | Initial | Sub-agent loop (spawned by a tool) |
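The table translates directly into a match over the parent's session and the continuation kind. This is a minimal sketch: `Kind` is a local mirror of the variants listed in the table, and `parent_session_id` (the session that owns the parent loop) is a hypothetical input the caller would have to look up; neither is phi-core's actual API.

```rust
/// Local mirror of the continuation kinds that appear in the table.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Kind {
    Initial,
    Default,
    Rerun { tag: String },
    Branch { tag: String },
}

/// Classifies a loop from the session of its parent (if any), its own
/// session, and its continuation kind. Combinations the table does not
/// list are reported as "unclassified".
pub fn classify(parent_session_id: Option<&str>, session_id: &str, kind: &Kind) -> &'static str {
    match (parent_session_id, kind) {
        (None, Kind::Initial) => "origin loop",
        (Some(p), Kind::Initial) if p != session_id => "sub-agent loop",
        (Some(p), Kind::Default) if p == session_id => "continuation",
        (Some(p), Kind::Rerun { .. }) if p == session_id => "rerun",
        (Some(p), Kind::Branch { .. }) if p == session_id => "branch",
        _ => "unclassified",
    }
}
```

Note that `Initial` appears twice in the table; only the cross-session check on the parent distinguishes a fresh origin loop from a sub-agent loop.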
LoopConfigSnapshot
LoopConfigSnapshot captures model identity and key configuration from
the AgentLoopConfig that ran the loop:
    pub struct LoopConfigSnapshot {
        pub model: String,                         // e.g. "claude-opus-4-6"
        pub provider: String,                      // e.g. "anthropic"
        pub config_id: Option<String>,             // from AgentLoopConfig.config_id
        pub name: Option<String>,                  // model display name
        pub api: Option<ApiProtocol>,              // which API protocol was used
        pub base_url: Option<String>,              // provider base URL
        pub reasoning: Option<bool>,               // whether model supports reasoning/thinking
        pub context_window: Option<u32>,           // model context window size
        pub max_tokens: Option<u32>,               // max output tokens
        pub thinking_level: Option<ThinkingLevel>, // reasoning depth for this loop
        pub temperature: Option<f32>,              // sampling temperature
    }
The first three fields (model, provider, config_id) are always populated.
The remaining fields use Option with #[serde(skip_serializing_if = "Option::is_none")]
so they only appear when set, keeping serialized output compact.
Why not store the full AgentLoopConfig? The full config contains API keys
(in ModelConfig.api_key) and non-serialisable hook closures. Storing it would
require stripping secrets and skipping closures for little extra value.
LoopConfigSnapshot is sufficient for cost attribution, replay (the caller
reconstructs the config), and identifying parallel branches (e.g. "haiku vs. opus").
Note: thinking_level and temperature were previously stored on the Session struct. They are now tracked per-loop in LoopConfigSnapshot, which more accurately reflects that these settings can vary between loops (e.g. across parallel evaluation branches with different configs).
events field
LoopRecord.events contains every AgentEvent emitted during the loop, in
order, tagged with a monotonic sequence counter.
MessageUpdate (streaming delta) events are excluded by default — they are
100–1 000× more numerous than final messages and are not needed for replay.
Enable them with SessionRecorderConfig { include_streaming_events: true, .. }.
AgentEnd.messages is the authoritative message source for a loop.
LoopRecord.messages is populated directly from it. Reconstructing messages
from MessageStart/MessageEnd events would be fragile.
compaction_block field
LoopRecord.compaction_block holds a non-destructive compaction overlay. When
present, the context loader uses this block instead of the raw messages field
to reconstruct the agent's working context. The original messages remain
authoritative for replay and branching — they are never mutated or discarded.
This overlay model means compaction is always reversible: removing or replacing
the CompactionBlock restores the original conversation without data loss.
Bidirectional parent↔child links
Both directions of the loop tree are maintained:
- LoopRecord.parent_loop_id -- child → parent (set at loop creation)
- LoopRecord.children_loop_ids -- parent → children (appended at AgentEnd)
This allows O(1) traversal in either direction without scanning the full
loops vec.
Loop Tree Navigation
Session provides four navigation methods, plus total_usage() for session-wide token accounting:
    // Root loops: no parent in this session.
    session.root_loops();

    // Direct same-session children of a loop.
    session.children_of("loop-id-A");

    // All parallel siblings (including the loop itself).
    session.parallel_siblings("loop-id-branch-1");

    // Lookup by id.
    session.get_loop("loop-id-X");

    // Cumulative token usage for the whole session.
    session.total_usage();
Reconstructing a conversation thread
Follow the parent→child chain from a root:
    fn print_thread(session: &Session, loop_id: &str, indent: usize) {
        if let Some(lr) = session.get_loop(loop_id) {
            // Indent by depth, then print the loop id and its status.
            println!("{:indent$}{loop_id}: {:?}", "", lr.status, indent = indent);
            for child_id in &lr.children_loop_ids {
                print_thread(session, child_id, indent + 2);
            }
        }
    }

    for root in session.root_loops() {
        print_thread(&session, &root.loop_id, 0);
    }
Identifying branches
Branches share the same parent_loop_id and each has parallel_group set:
    let branches: Vec<_> = session.parallel_siblings("branch-loop-id").collect();
    let winner = branches.iter().find(|l| {
        l.parallel_group.as_ref().map(|pg| pg.is_selected).unwrap_or(false)
    });
Cross-Session Sub-Agent Tracking
Sub-agents run with their own session_id. phi-core maintains bidirectional
links between the parent session and the child session:
Parent Session                      Child Session
──────────────────────              ──────────────────────────
LoopRecord (loop-P)                 Session
  child_loop_refs:                    parent_spawn_ref:
    ChildLoopRef {                      SpawnRef {
      tool_call_id: "call-1"              parent_session_id: "sess-P"
      tool_name: "sub_agent"              parent_loop_id: "loop-P"
      child_loop_id: "loop-C"             tool_call_id: "call-1"
      child_session_id: "sess-C"          tool_name: "sub_agent"
    }                                   }
Tracing a full spawn chain
    // Load parent and child sessions from disk.
    let parent = load_session("sess-P", dir)?;
    let child = load_session("sess-C", dir)?;

    // From parent: find all sub-agent spawns.
    for lr in &parent.loops {
        for child_ref in &lr.child_loop_refs {
            println!("Tool {} spawned sub-agent loop {}", child_ref.tool_name, child_ref.child_loop_id);
        }
    }

    // From child: find the parent that triggered it.
    if let Some(ref sr) = child.parent_spawn_ref {
        println!("This session was spawned by {} in session {}", sr.tool_name, sr.parent_session_id);
    }
Why sub-agents get separate sessions
Sub-agents have clean identity boundaries — they can be loaded and analyzed
independently of their parent. Embedding child data inside the parent session
would bloat the parent record and couple two independent execution traces.
The bidirectional ChildLoopRef / SpawnRef pair provides a complete spawn
graph without that coupling.
Parallel Evaluation Groups
When agent_loop_parallel runs N branches, each branch gets its own
LoopRecord. All N records are linked by ParallelGroupRecord:
    pub struct ParallelGroupRecord {
        pub all_loop_ids: Vec<String>,    // all branch loop_ids in config order
        pub selected_loop_id: String,     // winner chosen by EvaluationStrategy
        pub selected_config_index: usize, // 0-based index into original configs
        pub evaluation_usage: Usage,      // judge LLM tokens (zero if no judge)
        pub is_selected: bool,            // true only on the winner's record
    }
LoopStatus::Pending is used before AgentStart arrives for each branch.
ParallelLoopStart announces all loop_ids in advance, so the group can be
registered immediately without retroactive wiring.
SessionRecorder Usage
Wire the recorder to your agent's event channel:
    use phi_core::session::{SessionRecorder, SessionRecorderConfig, save_session};
    use phi_core::AgentEvent;
    use std::path::Path;
    use tokio::sync::mpsc;

    let (tx, mut rx) = mpsc::unbounded_channel::<AgentEvent>();
    let mut recorder = SessionRecorder::new(SessionRecorderConfig::default());

    // Spawn a task to consume events from the channel.
    tokio::spawn(async move {
        while let Some(event) = rx.recv().await {
            recorder.on_event(event);
        }
        // Channel closed: flush and persist.
        recorder.flush();
        for session in recorder.drain_completed() {
            save_session(&session, Path::new("./sessions")).unwrap();
        }
    });

    // Pass tx to agent_loop / agent_loop_continue / BasicAgent.
include_streaming_events
Enable only when you need to replay or audit the raw token stream:
    SessionRecorderConfig {
        include_streaming_events: true,
        ..Default::default()
    }
Storage implications: a single turn with extended thinking may produce thousands
of MessageUpdate events. Each is a full clone of the accumulated message plus
the delta.
Session Lifecycle Callbacks
SessionRecorderConfig supports two session-level callbacks for billing, audit, and metrics:
| Field | Type | Description |
|---|---|---|
| before_task | Option<BeforeTaskFn> | Arc<dyn Fn(&Session) -> bool>. Fires on the first AgentStart with a new session_id. Return false to reject. Use for session initialization, billing setup, or audit logging. |
| after_task | Option<AfterTaskFn> | Arc<dyn Fn(&Session)>. Fires in flush() when the session is finalized. Use for billing finalization, metrics emission, or cleanup. |
These are session-level (not loop-level) hooks. Unlike before_loop/after_loop on AgentLoopConfig which fire around each individual agent loop, before_task and after_task fire once per session lifecycle:
    use std::sync::Arc;
    use phi_core::session::{SessionRecorder, SessionRecorderConfig};

    let config = SessionRecorderConfig {
        before_task: Some(Arc::new(|session: &phi_core::session::Session| -> bool {
            println!("Session started: {}", session.session_id);
            // Initialize billing, start audit trail, etc.
            true // return false to reject the session
        })),
        after_task: Some(Arc::new(|session| {
            println!("Session finalized: {} ({} loops)", session.session_id, session.loops.len());
            // Finalize billing, emit metrics, etc.
        })),
        ..Default::default()
    };
    let mut recorder = SessionRecorder::new(config);
Persistence API
| Function | Description |
|---|---|
| save_session(session, dir) | Write {dir}/{session_id}.json (atomic via tmp + rename) |
| load_session(session_id, dir) | Read {dir}/{session_id}.json |
| list_session_ids(dir) | List all .json filenames, newest first |
| load_sessions_for_agent(agent_id, dir) | Load all sessions matching agent_id |
| delete_session(session_id, dir) | Remove {dir}/{session_id}.json |
File format: pretty-printed JSON (serde_json::to_writer_pretty).
Directory layout: flat — {dir}/{session_id}.json, no sub-directories, no index.
Writes are atomic: the implementation writes to a temp file then renames over the
target, so readers never observe a partially-written file.
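The write-then-rename pattern can be sketched with the standard library. This is an illustration of the general technique, not phi-core's implementation: the function name `write_atomically` and the `.json.tmp` suffix are assumptions, and the rename is only atomic when the temp file and the target live on the same filesystem.

```rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};

/// Writes `json` to `{dir}/{session_id}.json` via a temp file and a rename,
/// so a reader sees either the old file or the complete new one.
/// Hypothetical helper; the temp-file naming is an assumption.
pub fn write_atomically(dir: &Path, session_id: &str, json: &str) -> std::io::Result<PathBuf> {
    fs::create_dir_all(dir)?;
    let target = dir.join(format!("{session_id}.json"));
    let tmp = dir.join(format!("{session_id}.json.tmp"));
    {
        // Write and sync the full payload before it becomes visible.
        let mut file = fs::File::create(&tmp)?;
        file.write_all(json.as_bytes())?;
        file.sync_all()?;
    }
    // Rename replaces the target in a single step on the same filesystem.
    fs::rename(&tmp, &target)?;
    Ok(target)
}
```

A fixed temp-file name like the one above is not safe when two writers target the same session at once, which is the gap the locking store described in the next section addresses.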
Pluggable store trait (0.7.0+)
For callers that want to swap the persistence backend (e.g. S3, SQLite, an
in-memory fake for tests) or that need concurrent-writer safety, phi-core
exposes a SessionStore async trait alongside the free functions:
    use phi_core::session::{SessionStore, FileSystemSessionStore};

    let store = FileSystemSessionStore::new("./sessions");
    store.save(&session).await?; // acquires fs2 exclusive lock
    let loaded = store.load("sess-1").await?;
    let ids = store.list_ids().await?;
    store.delete("sess-1").await?;
FileSystemSessionStore::save() takes an advisory fs2 exclusive lock on the
target file before the atomic rename. Concurrent writers to the same
session_id get back SessionError::Locked { session_id } instead of silently
producing a corrupt file. Readers take a shared lock, so concurrent readers
coexist with one another.
The free save_session() / load_session() / etc. functions remain available
and unchanged — use the trait when you need pluggability or contention safety.
When to call flush()
Call flush() before saving to finalize any loops that have not received
AgentEnd yet (e.g. on process shutdown). Flushed loops get status Aborted.
    recorder.flush();
    let sessions = recorder.drain_completed();
    for s in &sessions {
        save_session(s, Path::new("./sessions"))?;
    }
Design Decisions
1. loop_id on every AgentEvent variant
Decision: Add loop_id: String to all 11 AgentEvent variants that lacked it.
Why: agent_loop_parallel interleaves branch events on one tx channel.
Without loop_id on every event, TurnStart, ToolExecutionEnd, etc. cannot
be reliably attributed to the correct branch LoopRecord. The only alternative
— heuristically assigning events to the last-opened loop — produces incorrect
records when two branches overlap.
Rejected alternative: Last-opened-loop heuristic. Rejected because parallel branches genuinely interleave; the heuristic would silently misattribute events.
2. LoopStatus::Pending for parallel branches
Decision: Pre-register LoopRecord { status: Pending } entries when
ParallelLoopStart arrives, before their AgentStart events fire.
Why: ParallelLoopStart announces all loop_ids in advance.
Pre-creating records lets the ParallelGroupRecord be registered immediately,
so no retroactive wiring is needed when each branch's AgentStart arrives later.
Rejected alternative: Create LoopRecords only on AgentStart and
retroactively set ParallelGroupRecord on ParallelLoopEnd. Rejected because
it requires a second pass over all records and makes the group state inconsistent
during the parallel execution window.
3. Messages from AgentEnd, not reconstructed from events
Decision: LoopRecord.messages is populated directly from AgentEnd.messages.
Why: AgentEnd.messages is the authoritative, ordered list of all messages
produced by a loop. The LLM loop already assembles this — there is no value in
re-assembling it from MessageStart/MessageEnd events in the recorder.
Rejected alternative: Reconstruct messages from streaming events. Rejected because it duplicates work, is fragile (missed events, ordering edge cases), and requires special handling for partial messages.
4. Bidirectional parent↔child within a session
Decision: Maintain both parent_loop_id (child→parent) and
children_loop_ids (parent→children) on every LoopRecord.
Why: O(1) traversal in both directions without scanning the full loops vec.
The recorder appends to parent.children_loop_ids when a loop's AgentEnd
arrives and its parent_loop_id is in the same session.
Rejected alternative: Single-direction links + O(N) scan. Rejected because deep continuation trees (10+ loops) would incur O(N²) cost for common tree operations.
5. continuation_kind classifies loop origin
Decision: Reuse the existing ContinuationKind enum (Initial, Default, Rerun,
Branch, Compaction) to classify loop relationships, supplemented by the
parent_loop_id/session_id cross-session check.
Why: ContinuationKind is already threaded through AgentStart — no new
enum is needed. The full classification table (origin / continuation / retry /
branch / sub-agent) is derivable from (parent_loop_id, session_id, continuation_kind).
Rejected alternative: A dedicated LoopOrigin enum on LoopRecord. Rejected
because it would duplicate information already present in the existing fields and
require an additional mapping step in the recorder.
6. Sub-agents are separate sessions with bidirectional cross-session links
Decision: Sub-agents always get their own session_id. The parent records
ChildLoopRef (outbound); the child Session records SpawnRef (inbound).
Why: Clean agent identity boundaries — sub-agent sessions can be loaded and analyzed independently. The bidirectional link pair provides a complete spawn graph without coupling the parent and child session records.
Rejected alternative: Embed sub-agent loops inside the parent session. Rejected because a sub-agent may have many of its own continuations, parallel branches, and even nested sub-agents — treating it as a flat loop inside the parent session would obscure this structure.
7. SpawnRef on Session (not on LoopRecord)
Decision: The inbound cross-session spawn reference lives on Session.parent_spawn_ref,
not on an individual LoopRecord.
Why: Sub-agent spawning is a session-level concern. The entire child session
was triggered by one parent loop — the reference belongs at the session level,
not on individual loop records within it. Placing it on a LoopRecord would
require choosing which loop gets the ref (the first? the origin?) arbitrarily.
Rejected alternative: LoopRecord.parent_spawn_ref. Rejected because a
sub-agent session may have multiple origin loops (e.g. after new_session())
and the spawn ref would be duplicated or placed inconsistently.
8. include_streaming_events: bool (default false)
Decision: MessageUpdate (streaming delta) events are excluded from
LoopRecord.events by default.
Why: Streaming deltas are 100–1 000× more numerous than final messages and
are not needed for replay or branching. The final message content in AgentEnd.messages
is authoritative. Opt-in ensures that session files stay compact by default.
Rejected alternative: Always store all events. Rejected because a single session with a few extended-thinking turns could easily produce megabytes of delta events.
9. Flat file layout: {dir}/{session_id}.json
Decision: One JSON file per session. No index file, no sub-directories.
Why: Simplest observable format — files can be inspected directly with
any JSON tool. list_session_ids is a directory listing. No index to maintain
or synchronize.
Rejected alternative: Indexed layout (e.g. sessions/index.json + sessions/{id}.json).
Rejected because the index requires atomic updates (write to two files) and can
fall out of sync. An indexed layout can be added in a future iteration when
query patterns (filtering, pagination) are clearer.
Context Compaction
Compaction manages context window pressure by creating non-destructive overlays on session history. Nothing is deleted or replaced — original messages remain authoritative in LoopRecord.messages.
How it works
When the context approaches the token budget, a CompactionBlock is created on the current LoopRecord. This block controls what gets loaded into context for subsequent LLM calls, replacing the raw messages with a compacted view.
CompactionBlock anatomy
A block has three sections:
┌────────────────┬─────────────────────────────┐
│ keep_first     │ Original turns, verbatim    │  Most recent loop only
│ (turns 0..1)   │ No modification             │
├────────────────┼─────────────────────────────┤
│ keep_compacted │ Summarised one-liners       │  All loops
│ (turns 2..N-6) │ ≤ max_summary_tokens        │
├────────────────┼─────────────────────────────┤
│ keep_recent    │ Tool outputs truncated      │  Most recent loop only
│ (turns N-5..N) │ Rest unchanged              │
└────────────────┴─────────────────────────────┘
- keep_first -- verbatim turns from the start. Only for the most recent loop. Original messages in this range are used as-is.
- keep_compacted -- fully summarised middle section. For the most recent loop this is the gap between keep_first and keep_recent. For older loops this covers the entire loop.
- keep_recent -- recent turns with only tool outputs truncated. Only for the most recent loop.
When compaction fires
Compaction uses a percentage-based threshold:
headroom = compact_at_pct − (system_tokens / max_tokens) − (current_tokens / max_tokens)
Compaction fires when headroom < compact_budget_threshold_pct.
With defaults (100k max, 4k system, 90% ceiling, 5% threshold): fires when current tokens exceed ~81k.
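The threshold check is plain arithmetic and can be verified in a few lines. The function below is a sketch of the formula as stated; its name and parameter list are illustrative, not phi-core's API.

```rust
/// Returns true when compaction should fire under the headroom rule:
/// headroom = compact_at_pct - system/max - current/max,
/// firing when headroom < compact_budget_threshold_pct.
/// Illustrative helper; not part of phi-core.
pub fn should_compact(
    current_tokens: u64,
    system_tokens: u64,
    max_tokens: u64,
    compact_at_pct: f64,
    compact_budget_threshold_pct: f64,
) -> bool {
    let max = max_tokens as f64;
    let headroom = compact_at_pct
        - (system_tokens as f64 / max)
        - (current_tokens as f64 / max);
    headroom < compact_budget_threshold_pct
}
```

With the defaults, 80k current tokens leave a headroom of 0.90 − 0.04 − 0.80 = 0.06, which is above the 0.05 threshold, so nothing happens. At 82k the headroom drops to 0.04 and compaction fires, consistent with the ~81k crossover.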
Configuration
ContextConfig
    ContextConfig {
        max_context_tokens: 100_000,  // Model's context window
        system_prompt_tokens: 4_000,  // Reserved for system prompt
        compaction: CompactionConfig { // Always present when limits are set
            // WHEN
            compact_at_pct: 0.90,
            compact_budget_threshold_pct: 0.05,
            compaction_scope: CompactionScope::FixedCount(3),
            // HOW
            keep_first_turns: 2,
            keep_recent_turns: 10,
            max_summary_tokens: 2_000,
            tool_output_max_lines: 50,
        },
    }
Compaction is disabled entirely by setting context_config: None on AgentLoopConfig.
CompactionScope
Controls how many earlier loops are included in compaction and context loading:
- FixedCount(n) -- Compact a fixed number of earlier loops. Simple and predictable.
- TokenBudget -- Walk the chain backward, accumulating per-loop token estimates, and stop when max_context_tokens would be exceeded.
TokenBudget and exceeding the window
The TokenBudget scope can include loops whose raw messages exceed max_context_tokens. This is intentional: the compacted summaries of those loops will fit in the window, even though the originals did not. This enables richer context for expensive summarisation strategies (e.g. LLM summarisers) that compress large loops into compact representations that then fit within the budget.
For example, if a loop has 50k tokens of raw messages and the window is 100k, TokenBudget includes it in scope. The strategy's keep_compacted method produces a ~500 token summary of that loop, which fits easily.
Cross-loop compaction
When compaction fires, blocks are created for the current loop and earlier loops within the compaction_scope on the active chain.
The "active chain" is the linear path from root to current loop via parent_loop_id links:
- Parallel branches — only the selected branch is on the chain. Unselected siblings get their own compaction if/when they become active.
- Reruns — the rerun's parent points to the pre-rerun loop. Superseded runs are siblings, not ancestors.
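Resolving the active chain amounts to following `parent_loop_id` links from the current loop until no same-session parent remains. The sketch below uses a hypothetical `LoopNode` type and a `HashMap` index rather than phi-core's `Session`; it only illustrates the walk.

```rust
use std::collections::HashMap;

/// Minimal stand-in for the two LoopRecord fields the walk needs.
pub struct LoopNode {
    pub loop_id: String,
    pub parent_loop_id: Option<String>,
}

/// Returns the chain from root to `current` (root first).
/// The walk stops at a loop with no parent, at a parent that is not in
/// `loops` (e.g. a cross-session parent), or if a cycle is detected.
pub fn active_chain(loops: &HashMap<String, LoopNode>, current: &str) -> Vec<String> {
    let mut chain = Vec::new();
    let mut cursor = Some(current.to_string());
    while let Some(id) = cursor {
        let Some(node) = loops.get(&id) else { break };
        if chain.contains(&id) {
            break; // defensive: malformed data with a parent cycle
        }
        chain.push(id);
        cursor = node.parent_loop_id.clone();
    }
    chain.reverse();
    chain
}
```

Siblings never appear in the result because the walk only moves upward: an unselected parallel branch or a superseded rerun shares a parent with a loop on the chain but is not itself an ancestor.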
Loading rule
When building context from session history:
- Most recent loop: keep_first + keep_compacted + keep_recent
- Earlier loops (within compaction_scope): only keep_compacted
- Loops older than that: skipped entirely
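The loading rule can be expressed as a per-loop plan over the active chain. In this sketch the scope is modelled as a plain count of earlier loops (as with `FixedCount(n)`); `LoadPlan` and `plan_chain` are illustrative names, not phi-core's API.

```rust
/// What to load for one loop on the active chain.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LoadPlan {
    /// keep_first + keep_compacted + keep_recent
    Full,
    /// only keep_compacted
    CompactedOnly,
    /// outside the compaction scope
    Skip,
}

/// Returns one plan per loop, root first, for a chain of `chain_len` loops
/// where `earlier_in_scope` loops before the most recent one are in scope.
pub fn plan_chain(chain_len: usize, earlier_in_scope: usize) -> Vec<LoadPlan> {
    (0..chain_len)
        .map(|i| {
            // Distance from the most recent loop: 0 is the current loop.
            let distance = chain_len - 1 - i;
            if distance == 0 {
                LoadPlan::Full
            } else if distance <= earlier_in_scope {
                LoadPlan::CompactedOnly
            } else {
                LoadPlan::Skip
            }
        })
        .collect()
}
```

For a five-loop chain with a scope of three, the root is skipped, the next three loops contribute only their summaries, and the current loop is loaded in full.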
Custom strategies
Compaction strategies are fields on CompactionConfig, not on AgentLoopConfig. The dispatch logic in run.rs reads them from ctx_config.compaction:
- in_memory_strategy -- custom in-memory compaction strategy (used when session is None)
- block_strategy -- block-based compaction strategy (used when session is Some; falls back to DefaultBlockCompaction)
Implement BlockCompactionStrategy to customise any section.
As of phi-core 0.9.0, BlockCompactionStrategy is #[async_trait]-marked
and all four methods are async fn — implementations can issue LLM calls
inside keep_compacted / keep_recent without block_in_place workarounds:
    use async_trait::async_trait;
    use phi_core::{BlockCompactionStrategy, CompactionConfig, CompactedSection, TurnRange, TurnMap, DefaultBlockCompaction};
    use phi_core::session::LoopRecord;

    struct MyStrategy;

    #[async_trait]
    impl BlockCompactionStrategy for MyStrategy {
        async fn keep_first(&self, record: &LoopRecord, turn_map: &TurnMap, config: &CompactionConfig) -> Option<TurnRange> {
            DefaultBlockCompaction.keep_first(record, turn_map, config).await // delegate
        }

        async fn keep_recent(&self, record: &LoopRecord, turn_map: &TurnMap, config: &CompactionConfig) -> Option<CompactedSection> {
            DefaultBlockCompaction.keep_recent(record, turn_map, config).await // delegate
        }

        async fn keep_compacted(&self, record: &LoopRecord, turn_map: &TurnMap, config: &CompactionConfig, is_most_recent: bool) -> Option<CompactedSection> {
            // Custom LLM-based summarisation: issue LLM calls directly without bridging.
            // my_llm_summarize is a placeholder for your own async helper.
            my_llm_summarize(record, turn_map, config, is_most_recent).await
        }
    }
Sync impls that don't .await anything migrate by adding
#[async_trait::async_trait] + the async keyword on each method signature;
the bodies remain unchanged. See the per-turn debug-capture surface in
debugging.md for the canonical pattern to inspect what
each compacted turn looked like to the model.
Set the custom strategy on CompactionConfig:
    use std::sync::Arc;

    let compaction_config = CompactionConfig {
        block_strategy: Some(Arc::new(MyStrategy)),
        ..Default::default()
    };
Public APIs
Orchestration functions
- compact_session_loops(session, loop_id, strategy, config, max_tokens) -- Creates CompactionBlocks for the current loop and earlier loops within the configured scope. Mutates the session in place; caller persists to disk.
- build_context_from_session(session, loop_id, config, max_tokens) -- Builds a compacted context by walking the loop chain, loading from blocks where available and raw messages otherwise.
BasicAgent methods
- compact_context_with_sender(&mut self, tx) -- Standalone compaction with full event lifecycle: AgentStart(Compaction) → CompactionStarted → compact → CompactionEnded → AgentEnd. No-op if session or config is missing.
- compact_context(&mut self) -> usize -- Fire-and-forget compaction. Returns the number of loops that received new CompactionBlocks. Returns 0 if session or config is missing.
Events
Two events bracket compaction:
- CompactionStarted { loop_id, estimated_tokens, message_count, timestamp }
- CompactionEnded { loop_id, messages_before, messages_after, estimated_tokens_before, estimated_tokens_after, loops_compacted, timestamp }
For standalone compaction (compact_context_with_sender), these appear inside a dedicated LoopRecord with continuation_kind: Compaction.
TurnId tracking
Every message pushed during the agent loop carries a TurnId { loop_id, turn_index } identifying which turn produced it. This enables TurnMap::from_messages() to group messages by turn without replaying the event stream.
TurnId is stored on LlmMessage.turn_id and serialized as an optional turnId field alongside the existing message JSON. Old data without turnId deserializes with turn_id: None.
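Grouping by turn is then a single pass over the messages. The sketch below uses a hypothetical `Msg` type carrying only a turn index; skipping messages that have no turn id (old data) is an assumption, and the real `TurnMap::from_messages()` may treat them differently.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a message that may carry a turn index.
pub struct Msg {
    pub text: String,
    pub turn_index: Option<u32>,
}

/// Groups messages by turn index, preserving message order within a turn.
/// Messages without a turn index are skipped (assumption for this sketch).
pub fn group_by_turn(messages: &[Msg]) -> BTreeMap<u32, Vec<&Msg>> {
    let mut turns: BTreeMap<u32, Vec<&Msg>> = BTreeMap::new();
    for msg in messages {
        if let Some(idx) = msg.turn_index {
            turns.entry(idx).or_default().push(msg);
        }
    }
    turns
}
```

A `BTreeMap` keeps the turns sorted by index, so iterating the result visits turns in conversation order without a separate sort.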
Data model
Struct definitions
    pub struct CompactionBlock {
        pub keep_first: Option<TurnRange>,            // verbatim turns from start (most recent loop only)
        pub keep_recent: Option<CompactedSection>,    // truncated tool outputs (most recent loop only)
        pub keep_compacted: Option<CompactedSection>, // summarised section (all loops)
        pub created_at: DateTime<Utc>,
    }

    pub struct TurnRange {
        pub start_turn: u32, // inclusive, matches TurnId.turn_index
        pub end_turn: u32,   // inclusive
    }

    pub struct CompactedSection {
        pub range: TurnRange,
        pub messages: Vec<AgentMessage>, // replacement messages for this range
    }

    pub struct TurnId {
        pub loop_id: String,
        pub turn_index: u32,
    }
Serialization format
CompactionBlock on LoopRecord:
{
"loop_id": "session123.model.1",
"messages": [ ... ],
"compaction_block": {
"keep_first": { "startTurn": 0, "endTurn": 1 },
"keep_compacted": {
"range": { "startTurn": 2, "endTurn": 7 },
"messages": [
{ "role": "user", "content": [{"type": "text", "text": "[Summary] User asked about X"}], "timestamp": 123 }
]
},
"keep_recent": {
"range": { "startTurn": 8, "endTurn": 12 },
"messages": [ ... ]
},
"createdAt": "2026-03-28T10:00:00Z"
}
}
TurnId on LlmMessage:
{
"role": "assistant",
"content": [...],
"stopReason": "stop",
"model": "claude-sonnet-4-6",
"provider": "anthropic",
"usage": { ... },
"timestamp": 123,
"turnId": { "loopId": "session123.model.1", "turnIndex": 3 }
}
Old data without turnId deserializes as turn_id: None.
Invariants
- If keep_first is Some, keep_compacted must also be Some (there must be a middle to summarise).
- If keep_recent is Some, keep_compacted must also be Some.
- For older loops (not most recent), keep_first and keep_recent are always None.
- CompactedSection.range bounds must be within the loop's turn count.
- If a loop has a compaction_block, all older loops on the same chain must also have one.
- If a ToolCall content block is within a section's turn range, its corresponding ToolResult message must also be within the same section. Turn-based grouping (via TurnId) enforces this.
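The per-block invariants above can be expressed as a small validation routine. The sketch below uses simplified copies of the structs (replacement messages and created_at omitted); check_block is a hypothetical helper, and reading "within the loop's turn count" as end_turn < turn_count is an assumption. The cross-loop chain rule and the ToolCall/ToolResult pairing rule are not covered.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TurnRange {
    pub start_turn: u32, // inclusive
    pub end_turn: u32,   // inclusive
}

/// Simplified: the replacement messages are left out of this sketch.
pub struct CompactedSection {
    pub range: TurnRange,
}

pub struct CompactionBlock {
    pub keep_first: Option<TurnRange>,
    pub keep_recent: Option<CompactedSection>,
    pub keep_compacted: Option<CompactedSection>,
}

/// Check the single-block invariants for a loop with `turn_count` turns.
pub fn check_block(
    block: &CompactionBlock,
    turn_count: u32,
    is_most_recent_loop: bool,
) -> Result<(), String> {
    if block.keep_first.is_some() && block.keep_compacted.is_none() {
        return Err("keep_first requires keep_compacted".to_string());
    }
    if block.keep_recent.is_some() && block.keep_compacted.is_none() {
        return Err("keep_recent requires keep_compacted".to_string());
    }
    if !is_most_recent_loop && (block.keep_first.is_some() || block.keep_recent.is_some()) {
        return Err("older loops may only carry keep_compacted".to_string());
    }
    let ranges = [
        block.keep_first,
        block.keep_recent.as_ref().map(|s| s.range),
        block.keep_compacted.as_ref().map(|s| s.range),
    ];
    for range in ranges.iter().flatten() {
        // Assumption: turn indices are zero-based, so a valid end is < turn_count.
        if range.start_turn > range.end_turn || range.end_turn >= turn_count {
            return Err(format!(
                "range {}..={} is outside a loop of {} turns",
                range.start_turn, range.end_turn, turn_count
            ));
        }
    }
    Ok(())
}
```

With the ranges from the serialization example (0-1, 2-7, 8-12), the block passes for a most-recent loop of 13 turns and fails if the same block is attached to an older loop.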
Summary budget semantics
max_summary_tokens is a token budget for the summarised output, not a per-turn limit. Strategies should aim to summarise ALL turns within this budget (e.g. shorter summaries or LLM-generated digests), not merely process turns until the budget runs out. DefaultBlockCompaction is a basic implementation that drops remaining turns when exhausted.
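The difference between the two readings of the budget can be made concrete. The sketch below is illustrative only: both functions are hypothetical, and counting one token per whitespace-separated word is a deliberately crude assumption rather than phi-core's token estimator.

```rust
/// Crude token estimate: one token per whitespace-separated word (assumption).
fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// "Process until exhausted": keep whole turns while they fit, drop the rest.
/// This is roughly the basic behaviour the text attributes to the default strategy.
pub fn summarise_until_exhausted<'a>(turns: &[&'a str], max_summary_tokens: usize) -> Vec<&'a str> {
    let mut used = 0;
    let mut kept = Vec::new();
    for turn in turns {
        let cost = estimate_tokens(turn);
        if used + cost > max_summary_tokens {
            break;
        }
        used += cost;
        kept.push(*turn);
    }
    kept
}

/// "Cover all turns": give every turn an equal share of the budget and
/// shorten each one to fit, so no turn disappears entirely.
pub fn summarise_all_turns(turns: &[&str], max_summary_tokens: usize) -> Vec<String> {
    if turns.is_empty() {
        return Vec::new();
    }
    let share = max_summary_tokens / turns.len();
    turns
        .iter()
        .map(|t| t.split_whitespace().take(share).collect::<Vec<_>>().join(" "))
        .collect()
}
```

A real strategy would replace the word truncation with shorter summaries or an LLM-generated digest; the point is only that the second function spends the same budget across every turn.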
Backward compatibility
- LoopRecord.compaction_block uses #[serde(default, skip_serializing_if = "Option::is_none")]: old records without the field deserialize as None.
- LlmMessage.turn_id uses #[serde(default, skip_serializing_if = "Option::is_none")]: old messages without turnId deserialize as None.
- The CompactionConfig field on ContextConfig uses #[serde(default)]: old configs get CompactionConfig::default().
Focused Compaction
Focused compaction extends the context compaction system with two features: focus messages that steer what the compaction summary emphasizes, and compaction instances that let you define named compaction configurations reusable across agent profiles.
Focus Message
The focus_message field on CompactionConfig is an optional string prepended to the compacted section before the LLM summarizes it. It tells the summarizer what to prioritize when condensing conversation history.
Without a focus message, compaction produces a generic summary. With one, the summary retains details relevant to your domain:
use phi_core::context::{ContextConfig, CompactionConfig};

let config = ContextConfig {
    max_context_tokens: 200_000,
    compaction: CompactionConfig {
        focus_message: Some(
            "Focus on specification details, API contracts, and architectural decisions.".to_string()
        ),
        ..Default::default()
    },
    ..Default::default()
};
The focus message does not change the compaction trigger logic (thresholds, turn counts). It only affects the content of the summarized middle section.
When to use a focus message
- Domain-specific agents: An agent reviewing legal contracts should retain clause references, not general pleasantries.
- Long coding sessions: Focus on file paths, function signatures, and design rationale so the agent can continue working after compaction.
- Research agents: Preserve citations, data points, and methodology notes.
Compaction Instances
Compaction instances are named variations of the compaction defaults, declared with [[context.compaction.instances]] in the config file. Each instance declares its name in an id field using the {{...}} ID reference protocol and overrides specific fields from the parent [context.compaction] section. Fields not set on the instance fall through to the parent defaults.
Config example
# ── Context config (max_context_tokens lives on ContextConfig, not CompactionConfig) ──
[context]
max_context_tokens = 200000
# ── Compaction defaults ─────────────────────────────────────────
[context.compaction]
compact_at_pct = 0.85
compact_budget_threshold_pct = 0.05
keep_first_turns = 2
keep_recent_turns = 4
max_summary_tokens = 2000
tool_output_max_lines = 50
focus_message = "Retain key decisions and code changes."
# ── Named compaction instances ──────────────────────────────────
[[context.compaction.instances]]
id = "{{%coding%}}"
description = "Compaction tuned for coding tasks"
focus_message = "Focus on file paths, function signatures, and design rationale."
keep_recent_turns = 6
max_summary_tokens = 3000
[[context.compaction.instances]]
id = "{{%research%}}"
description = "Compaction tuned for research tasks"
focus_message = "Preserve citations, data sources, and methodology."
keep_first_turns = 3
max_summary_tokens = 4000
Referencing from an agent profile
Agent profiles reference a compaction instance via the compaction field, using the {{...}} ID protocol:
[agent.profile]
name = "coding-agent"
system_prompt = "You are an expert software engineer."
compaction = "{{compaction.coding}}"
[[agent.profile.instances]]
id = "{{%researcher%}}"
description = "A research-focused profile variant"
compaction = "{{compaction.research}}"
When the agent is constructed from config, the referenced compaction instance is resolved and its fields are merged with the compaction defaults to produce the final CompactionConfig.
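The fall-through merge can be pictured with optional override fields. This is a minimal sketch under stated assumptions: CompactionSettings, CompactionOverride, and merge are hypothetical names covering only a few of the fields, not phi-core's actual resolution code.

```rust
/// Resolved settings (a subset of the fields, for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct CompactionSettings {
    pub keep_first_turns: u32,
    pub keep_recent_turns: u32,
    pub max_summary_tokens: u32,
    pub focus_message: Option<String>,
}

/// A named instance: every field is optional, `None` means "inherit".
#[derive(Debug, Clone, Default)]
pub struct CompactionOverride {
    pub keep_first_turns: Option<u32>,
    pub keep_recent_turns: Option<u32>,
    pub max_summary_tokens: Option<u32>,
    pub focus_message: Option<String>,
}

/// Instance values win; unset fields fall through to the defaults.
pub fn merge(defaults: &CompactionSettings, instance: &CompactionOverride) -> CompactionSettings {
    CompactionSettings {
        keep_first_turns: instance.keep_first_turns.unwrap_or(defaults.keep_first_turns),
        keep_recent_turns: instance.keep_recent_turns.unwrap_or(defaults.keep_recent_turns),
        max_summary_tokens: instance.max_summary_tokens.unwrap_or(defaults.max_summary_tokens),
        focus_message: instance
            .focus_message
            .clone()
            .or_else(|| defaults.focus_message.clone()),
    }
}
```

Applied to the coding instance from the config example, keep_recent_turns, max_summary_tokens, and focus_message come from the instance while keep_first_turns stays at the default of 2.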
Programmatic Usage
When building agents in Rust without a config file, focused compaction is set directly on CompactionConfig:
use phi_core::context::CompactionConfig;
use phi_core::agent_loop::AgentLoopConfig;
use phi_core::provider::ModelConfig;

let context = phi_core::context::ContextConfig {
    max_context_tokens: 200_000,
    compaction: CompactionConfig {
        compact_at_pct: 0.85,
        compact_budget_threshold_pct: 0.05,
        keep_first_turns: 2,
        keep_recent_turns: 6,
        max_summary_tokens: 3_000,
        tool_output_max_lines: 50,
        focus_message: Some(
            "Focus on file paths, function signatures, and design rationale.".to_string()
        ),
        ..Default::default()
    },
    ..Default::default()
};

let config = AgentLoopConfig {
    model_config: ModelConfig::anthropic("claude-sonnet-4-20250514", "Sonnet", &api_key),
    context_config: Some(context),
    ..Default::default()
};
Or via BasicAgent builder methods:
use phi_core::{BasicAgent, context::CompactionConfig};
use phi_core::provider::ModelConfig;

let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Sonnet", &api_key))
    .with_context_config(phi_core::context::ContextConfig {
        max_context_tokens: 200_000,
        compaction: CompactionConfig {
            focus_message: Some("Retain specification details and API contracts.".to_string()),
            ..Default::default()
        },
        ..Default::default()
    });
Summary
| Feature | Purpose |
|---|---|
| focus_message | Steers compaction summarization toward domain-relevant content |
| [[context.compaction.instances]] | Named compaction configurations with {{...}} ID protocol |
| Profile compaction field | Links an agent profile to a specific compaction instance |
Context Translation
Context translation solves a fundamental problem in multi-provider agent systems: when an agent switches providers mid-session, content types from the original provider may be silently dropped or cause errors on the new provider. The ContextTranslationStrategy trait provides a read-only translation layer that produces temporary copies of messages, never modifying the canonical history.
Why it is needed
Different LLM providers support different content types. For example:
- Anthropic emits Content::Thinking blocks (chain-of-thought reasoning)
- OpenAI has no native thinking block format
- Google/Bedrock do not support thinking blocks at all
Without translation, switching from Anthropic to OpenAI mid-session would cause thinking blocks to be silently dropped or rejected. The agent loses reasoning context it previously produced.
Design principles
The canonical Message format IS the master layout
phi-core's Message enum (User, Assistant, ToolResult) and Content enum (Text, Image, Thinking, ToolCall) define the canonical format. All providers parse into this format and all session history is stored in it. Translation happens only at the boundary, right before messages are sent to a provider.
Read-only translation
Translation produces temporary copies of the message slice. The original messages in LoopRecord.messages are never modified. This means:
- Session persistence always stores the full-fidelity canonical format
- Multiple providers can read the same history with different translations
- No information is permanently lost
Lossless round-trip guarantee
Consider this scenario:
Turn 1-3: Anthropic (produces Content::Thinking blocks)
Turn 4: Switch to OpenAI
Turn 5-6: Switch back to Anthropic
Here is what happens:
- Turns 1-3 are stored with full Content::Thinking blocks in canonical format.
- Turn 4: Before calling OpenAI, the translator converts Content::Thinking to Content::Text prefixed with [Reasoning]. OpenAI sees text, not thinking blocks. The canonical history is untouched.
- Turns 5-6: Back on Anthropic. The translator passes Content::Thinking through unchanged. Anthropic sees the original thinking blocks from turns 1-3 exactly as they were produced.
The original thinking blocks from turns 1-3 are never lost. They remain in the canonical history and are available whenever the session returns to a provider that supports them.
Content type translation rules
The DefaultContextTranslation implementation applies these rules per target provider:
Content::Thinking
| Target Provider | Translation |
|---|---|
| Anthropic | Kept as-is |
| OpenAI Completions | Converted to Content::Text with [Reasoning] prefix |
| OpenAI Responses | Converted to Content::Text with [Reasoning] prefix |
| Azure OpenAI | Converted to Content::Text with [Reasoning] prefix |
| Google Gemini | Dropped (unsupported) |
| Google Vertex | Dropped (unsupported) |
| Amazon Bedrock | Dropped (unsupported) |
All other content types
Content::Text, Content::Image, and Content::ToolCall pass through unchanged for all providers.
Message-level behavior
Only Message::Assistant messages are translated (since they are the only ones that carry provider-specific content types). Message::User and Message::ToolResult pass through unchanged.
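The rules above amount to a small, non-mutating mapping over content blocks. The sketch below uses simplified stand-ins: this Content enum carries plain strings, Target is a hypothetical reduction of ApiProtocol to one variant per rule, and the exact prefix formatting ("[Reasoning] " followed by the text) is an assumption.

```rust
/// Simplified content blocks (the real enum carries richer payloads).
#[derive(Debug, Clone, PartialEq)]
pub enum Content {
    Text(String),
    Thinking(String),
    ToolCall(String),
}

/// Hypothetical reduction of the provider protocols to one variant per rule.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Target {
    Anthropic,
    OpenAi,
    Google,
    Bedrock,
}

/// Return a translated copy; the input slice is never modified.
pub fn translate(content: &[Content], target: Target) -> Vec<Content> {
    content
        .iter()
        .filter_map(|c| match (c, target) {
            // Kept as-is for the provider that understands thinking blocks.
            (Content::Thinking(t), Target::Anthropic) => Some(Content::Thinking(t.clone())),
            // Downgraded to prefixed text.
            (Content::Thinking(t), Target::OpenAi) => {
                Some(Content::Text(format!("[Reasoning] {t}")))
            }
            // Dropped where unsupported.
            (Content::Thinking(_), Target::Google | Target::Bedrock) => None,
            // Everything else passes through unchanged.
            (other, _) => Some(other.clone()),
        })
        .collect()
}
```

Because the function only reads the canonical history and returns a new Vec, translating for OpenAI and later for Anthropic from the same source yields the original thinking blocks again.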
The ContextTranslationStrategy trait
pub trait ContextTranslationStrategy: Send + Sync {
    /// Translate a slice of messages for the given target provider protocol.
    fn translate_for_provider(&self, messages: &[Message], target: ApiProtocol) -> Vec<Message>;
}
The trait receives the full message slice and the target ApiProtocol enum variant. It returns a new Vec<Message> with translations applied.
DefaultContextTranslation
The built-in implementation applies the content type rules described above. It is the default when no custom strategy is provided.
Custom strategies
Implement the trait to define custom translation logic:
use phi_core::provider::context_translation::{ContextTranslationStrategy, DefaultContextTranslation};
use phi_core::provider::model::ApiProtocol;
use phi_core::types::content::Message;

struct MyTranslation;

impl ContextTranslationStrategy for MyTranslation {
    fn translate_for_provider(&self, messages: &[Message], target: ApiProtocol) -> Vec<Message> {
        // Custom logic here, e.g. strip all images for text-only providers.
        // Fall back to the default for everything else.
        DefaultContextTranslation.translate_for_provider(messages, target)
    }
}
Usage
On AgentLoopConfig
Set the context_translation field to inject a strategy into the agent loop:
use std::sync::Arc;
use phi_core::agent_loop::AgentLoopConfig;
use phi_core::provider::context_translation::DefaultContextTranslation;
use phi_core::provider::ModelConfig;

let config = AgentLoopConfig {
    model_config: ModelConfig::openai("gpt-4o", "GPT-4o", &api_key),
    context_translation: Some(Arc::new(DefaultContextTranslation)),
    ..Default::default()
};
When context_translation is Some, the loop calls translate_for_provider() on the message slice before each LLM call. When None, messages are passed to the provider as-is.
When to enable translation
Enable context translation when:
- Your agent may switch providers mid-session (e.g., using different models for different tasks)
- You are loading session history that was produced by a different provider
- You are running parallel sub-agents on different providers that share context
If your agent always uses a single provider, translation is unnecessary.
Context Pruning
Context pruning is a model-directed mechanism for surgically removing irrelevant content from the working context during a run. Unlike compaction (which is threshold-triggered and bulk), pruning gives the model fine-grained control over what stays in the context window.
Philosophy
Context pruning saves context length (tokens in the context window), not monetary cost. The token cost of a pruned message has already been paid -- pruning cannot reclaim it. What pruning reclaims is space: room in the context window for the model to continue working without hitting the context limit.
Think of it as a researcher working through a stack of papers. The researcher freely explores tangential references, reads through lengthy tool outputs, and investigates dead ends. When a line of inquiry turns out to be irrelevant, the researcher sets those papers aside rather than keeping them on the desk. The desk has limited space; the filing cabinet does not. Pruning moves content from desk to cabinet.
This freedom to explore without anxiety about context length is the core value proposition. The model can request verbose tool outputs, try multiple approaches, and investigate broadly -- knowing it can prune the dead ends and keep only what matters for the current task.
Static vs In-Run Context
Every message in the context belongs to one of two streams:
- user_context -- All User messages: the initial prompt, follow-ups, steering messages, and any user-injected content. These represent user intent.
- inrun_context -- All model-generated content: Assistant messages, ToolCall content, and ToolResult messages. These are the model's working memory.
The system_prompt is separate from both streams. It is a dedicated field on AgentContext, always occupies the first position, and is never subject to pruning or compaction.
Pruning Rules
- user_context is NEVER pruned. User intent is sacred. The model cannot discard the user's words, steering messages, or follow-up instructions.
- inrun_context CAN be surgically pruned by the model using the PrunTool. The model decides what is no longer relevant and removes it.
- system_prompt is never pruned. It is not part of either stream.
PrunTool Variants
phi-core provides two pruning operations, both invoked by the model as tool calls:
prun(tokens)
Silent removal. The model specifies a token budget to reclaim, and the oldest inrun_context entries (by timestamp) are removed from the working context until the budget is met.
- Removed content is preserved in the session log -- nothing is lost permanently
- Removed entries become invisible to the LLM on subsequent turns
- The model sees a ToolResult confirming how many tokens were reclaimed
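The oldest-first selection described for prun(tokens) can be sketched as below. Entry is a hypothetical flattened record (the real streams hold messages, not token counts), and marking entries as pruned rather than deleting them reflects the statement that nothing is removed from the session log.

```rust
/// Hypothetical context entry; `is_user` marks user_context, which is never pruned.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub timestamp: u64,
    pub tokens: usize,
    pub is_user: bool,
    pub pruned: bool,
}

/// Mark the oldest live in-run entries as pruned until at least `budget`
/// tokens are reclaimed (or nothing prunable is left). Returns the tokens reclaimed.
pub fn prun(entries: &mut [Entry], budget: usize) -> usize {
    // Candidate indices: in-run entries that are still live.
    let mut order: Vec<usize> = (0..entries.len())
        .filter(|&i| !entries[i].is_user && !entries[i].pruned)
        .collect();
    // Oldest first, by timestamp.
    order.sort_by_key(|&i| entries[i].timestamp);

    let mut reclaimed = 0;
    for i in order {
        if reclaimed >= budget {
            break;
        }
        entries[i].pruned = true;
        reclaimed += entries[i].tokens;
    }
    reclaimed
}
```

The return value corresponds to the figure reported back to the model in the ToolResult; it can exceed the requested budget because whole entries are removed.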
prun_with_memo(tokens, memo)
Removal with summary. Same as prun, but the model provides a concise memo string that replaces the pruned content in the working context.
- Each pruned message with a memo creates a separate PrunedMemo entry at its original timestamp, preserving chronological order
- Useful when the pruned content contained decisions or conclusions the model wants to remember
- The memo should be concise -- a few sentences, not a reproduction of the pruned content
Model Autonomy
The model decides which variant to use and when. Typical patterns:
- Silent prune after exploring a dead end (e.g., reading a file that turned out to be irrelevant)
- Memo prune after a productive investigation (e.g., "Investigated auth module: uses JWT with RS256, tokens expire after 1h, refresh handled in middleware")
- No prune when all context remains relevant to the current task
Working Context Rebuild
Each turn, the working context sent to the LLM is rebuilt from scratch by build_working_context(), merging the two streams:
1. Collect all user_context entries with their timestamps
2. Collect all live inrun_context entries with their timestamps
3. For each PrunedMemo entry, create a separate User message with the memo text at the entry's original timestamp
4. Sort all collected entries by timestamp to preserve chronological order
5. Prepend the system_prompt
The result is a coherent conversation history where:
- User messages are always present
- Pruned-silent entries are invisible (the conversation flows as if they never existed)
- Each pruned-with-memo entry appears as a separate brief summary message at its original timestamp position, preserving the chronological position of the message it replaced
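The rebuild can be sketched as a merge of the two streams followed by a timestamp sort. The types and the string rendering below are hypothetical simplifications (the real function works on messages, and the "[memo]" marker is invented for readability); only the ordering logic is the point.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum InRunState {
    Live,
    PrunedSilent,
    PrunedMemo(String),
}

pub struct UserEntry {
    pub timestamp: u64,
    pub text: String,
}

pub struct InRunEntry {
    pub timestamp: u64,
    pub text: String,
    pub state: InRunState,
}

/// Rebuild the working context: system prompt first, then both streams
/// merged in chronological order. Each line is rendered as "role: text".
pub fn build_working_context(
    system_prompt: &str,
    user: &[UserEntry],
    inrun: &[InRunEntry],
) -> Vec<String> {
    let mut merged: Vec<(u64, String)> = Vec::new();
    for u in user {
        merged.push((u.timestamp, format!("user: {}", u.text)));
    }
    for e in inrun {
        match &e.state {
            InRunState::Live => merged.push((e.timestamp, format!("assistant: {}", e.text))),
            // Silently pruned entries are invisible to the model.
            InRunState::PrunedSilent => {}
            // A memo stands in for the pruned entry at its original position.
            InRunState::PrunedMemo(memo) => {
                merged.push((e.timestamp, format!("user: [memo] {memo}")))
            }
        }
    }
    merged.sort_by_key(|(ts, _)| *ts); // stable sort keeps insertion order on ties
    let mut out = vec![format!("system: {system_prompt}")];
    out.extend(merged.into_iter().map(|(_, line)| line));
    out
}
```

Since the inputs are only read, the session log stays intact and the working context can be regenerated on every turn.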
Session Log Integrity
The session log (context.messages) records everything that happened during the run. Pruning never modifies the session log -- it only affects what the LLM sees in the working context.
PrunRecord
Each prune operation emits a PrunApplied event (recorded in LoopRecord.events by SessionRecorder) containing:
- pruned_timestamps -- Vec<u64> of timestamps identifying the pruned messages
- tokens_removed -- Total tokens reclaimed
- messages_removed -- Number of messages pruned
- memo -- Optional summary string (present only for prun_with_memo)
On session reload, the two context streams are reconstructed by walking LoopRecord.events to find PrunApplied events. The pruned_timestamps field identifies which messages were pruned. These messages are placed in the pruned state (PrunedSilent or PrunedMemo depending on whether memo is Some), and their memo (if any) is restored as a separate message at the correct chronological position.
Compaction Interaction
Pruning and compaction are complementary mechanisms that operate at different levels:
| | Pruning | Compaction |
|---|---|---|
| Trigger | Model-directed (tool call) | Threshold-triggered (automatic) |
| Granularity | Surgical (specific messages) | Bulk (entire middle section) |
| Scope | inrun_context only | All messages in the compaction window |
| Preserved in | Session log + PrunRecord | CompactionBlock overlay |
After Compaction
When compaction fires, it summarizes a range of messages into a CompactionBlock. After compaction:
- All surviving messages (the summary, kept-first, and kept-recent) become part of user_context -- they are treated as established context and are unprunable
- New model-generated content after compaction starts a fresh inrun_context stream
- The model can prune this new inrun_context as usual
This means compaction resets the pruning boundary. Content that was once prunable inrun_context, if it survives compaction, becomes permanent user_context.
Configuration
TOML
[tools]
enabled = ["bash", "read_file", "write_file", "edit_file", "search", "prun"]
Adding "prun" to the enabled tools list makes both prun and prun_with_memo available to the model. They are two operations exposed through a single tool registration.
Rust (Programmatic)
use phi_core::agents::BasicAgent;
use phi_core::provider::ModelConfig;

let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Sonnet", &api_key))
    .with_default_tools()
    .with_prun_tool(); // enables context pruning
The with_prun_tool() builder method registers the PrunTool, making both pruning variants available. It can be combined with any other tool configuration.
Recommended Setup
Context pruning works best when compaction is also configured, providing both surgical (model-directed) and bulk (automatic) context management:
[tools]
enabled = ["bash", "read_file", "write_file", "edit_file", "search", "prun"]
[context]
max_context_tokens = 200000
[context.compaction]
compact_at_pct = 0.85
keep_first_turns = 2
keep_recent_turns = 4
With this setup, the model can prune irrelevant exploration results as it works, and compaction provides a safety net if the context still grows too large.
Configuration Guide
Define your entire agent in a config file — model, tools, compaction, limits — and construct it with two lines of Rust:
use phi_core::{parse_config_file, agent_from_config, Agent};
use std::path::Path;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = parse_config_file(Path::new("agent.toml"))?;
    let agent = agent_from_config(&config)?;
    // agent is an Arc<dyn Agent>, ready to prompt
    println!("Agent model: {:?}", agent.model_config().unwrap().id);
    Ok(())
}
Overview
The configuration system replaces scattered Rust builder calls with a declarative config file. Instead of this:
let agent = BasicAgent::new(ModelConfig::anthropic("claude-sonnet-4-20250514", "Sonnet", &key))
    .with_system_prompt("You are a coding assistant.")
    .with_thinking(ThinkingLevel::High)
    .with_temperature(0.2)
    .with_execution_limits(ExecutionLimits { max_turns: 50, ..Default::default() })
    .with_context_config(ContextConfig { ..Default::default() });
You write a TOML file:
[agent]
system_prompt = "You are a coding assistant."
[agent.profile]
thinking_level = "high"
temperature = 0.2
[provider]
model = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"
[execution]
max_turns = 50
Three formats are supported: TOML (primary, Rust-idiomatic), JSON (programmatic generation), and YAML (human-friendly alternative).
Pipeline: Config file → parse_config_file() → AgentConfig struct → agent_from_config() → Arc<dyn Agent>
Quick Start
1. Create agent.toml:
[provider]
model = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"
2. Load and use it:
use phi_core::{parse_config_file, agent_from_config, Agent};
use std::path::Path;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = parse_config_file(Path::new("agent.toml"))?;
    let agent = agent_from_config(&config)?;

    // The agent is an Arc<dyn Agent> wrapping a BasicAgent internally.
    // Access configuration through trait methods:
    let model = agent.model_config().unwrap();
    println!("Using model: {} via {}", model.id, model.provider);
    Ok(())
}
Only the [provider] section is required. Everything else has sensible defaults.
Config Formats
TOML (Recommended)
The primary format. Clean, readable, Rust-idiomatic.
use phi_core::config::{parse_config, ConfigFormat};

let toml_str = r#"
[provider]
model = "claude-sonnet-4-20250514"
api_key = "sk-..."
"#;
let config = parse_config(toml_str, ConfigFormat::Toml)?;
JSON
Useful when generating config programmatically.
let json_str = r#"{
    "provider": {
        "model": "gpt-4o",
        "api_key": "sk-...",
        "api": "openai"
    }
}"#;
let config = parse_config(json_str, ConfigFormat::Json)?;
YAML
Human-friendly alternative.
let yaml_str = "provider:\n model: claude-sonnet-4-20250514\n api_key: sk-...";
let config = parse_config(yaml_str, ConfigFormat::Yaml)?;
Auto-Detection
parse_config_file detects format from the file extension:
| Extension | Format |
|---|---|
| .toml | TOML |
| .json | JSON |
| .yaml, .yml | YAML |
parse_config_auto tries all formats in order (TOML → JSON → YAML) and returns the first successful parse.
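Extension-based detection as in the table can be sketched with std::path alone. Format and detect_format are hypothetical names, not phi-core's ConfigFormat API; returning None for unknown extensions and matching the extension case-sensitively are both assumptions.

```rust
use std::path::Path;

/// Hypothetical stand-in for the config format enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Format {
    Toml,
    Json,
    Yaml,
}

/// Map a file extension to a format, following the table above.
pub fn detect_format(path: &Path) -> Option<Format> {
    match path.extension()?.to_str()? {
        "toml" => Some(Format::Toml),
        "json" => Some(Format::Json),
        "yaml" | "yml" => Some(Format::Yaml),
        _ => None, // assumption: unknown extensions are not guessed here
    }
}
```

A caller that gets None could fall back to trying each parser in turn, which is the role the text gives to parse_config_auto.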
Environment Variable Substitution
Any string field in the config can reference environment variables with ${VAR}:
[provider]
api_key = "${ANTHROPIC_API_KEY}"
base_url = "${CUSTOM_API_URL}"
[agent]
system_prompt = "Running in ${ENVIRONMENT} mode."
How it works:
- Substitution happens before parsing (pre-parse text replacement)
- Works in all three formats (TOML, JSON, YAML)
- Missing variables produce ConfigError::MissingEnvVar
- Malformed patterns like ${UNCLOSED are passed through literally
- Empty ${} is passed through literally
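The substitution rules listed above can be sketched as a single pass over the raw config text. This is an illustration, not phi-core's implementation: SubstError stands in for ConfigError, and variables are looked up in a map rather than the process environment so the behaviour is easy to test.

```rust
use std::collections::HashMap;

/// Hypothetical error type mirroring the MissingEnvVar case.
#[derive(Debug, PartialEq)]
pub enum SubstError {
    MissingEnvVar(String),
}

/// Replace `${VAR}` patterns in `input` using `vars`.
pub fn substitute(input: &str, vars: &HashMap<String, String>) -> Result<String, SubstError> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            // Unclosed pattern: pass the remainder through literally.
            None => {
                out.push_str(&rest[start..]);
                return Ok(out);
            }
            Some(end) => {
                let name = &after[..end];
                if name.is_empty() {
                    // Empty `${}` is passed through literally.
                    out.push_str("${}");
                } else {
                    match vars.get(name) {
                        Some(value) => out.push_str(value),
                        None => return Err(SubstError::MissingEnvVar(name.to_string())),
                    }
                }
                rest = &after[end + 1..];
            }
        }
    }
    out.push_str(rest);
    Ok(out)
}
```

Because the replacement runs on the text before any TOML, JSON, or YAML parsing, the same function serves all three formats.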
Agent Profile
An AgentProfile is a reusable blueprint that defines default configuration. Multiple agent instances can share the same profile while overriding specific fields.
[agent.profile]
name = "coding-agent"
description = "An agent specialized for code generation and review"
system_prompt = "You are an expert software engineer."
thinking_level = "high"
temperature = 0.2
max_tokens = 16384
config_id = "coder"
skills = ["code-review", "debugging"]
System Prompt Resolution
The system prompt is resolved through a priority chain. The first non-empty value wins:
1. [agent].system_prompt: explicit agent-level override (highest priority)
2. Profile instance system_prompt: applies when an agent instance references a profile via {{agent_profile.name}}
3. [agent.profile].system_prompt: base inline profile fallback
4. Empty string (no system prompt)
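The priority chain reduces to "first non-empty candidate wins". A minimal sketch, assuming the three sources have already been read into optional strings (resolve_system_prompt is a hypothetical helper, and file: or {{...}} values are not expanded here):

```rust
/// Pick the system prompt from the three sources, highest priority first.
pub fn resolve_system_prompt(
    agent_level: Option<&str>,
    profile_instance: Option<&str>,
    base_profile: Option<&str>,
) -> String {
    for candidate in [agent_level, profile_instance, base_profile] {
        if let Some(text) = candidate {
            // An empty string counts as "not set" and falls through.
            if !text.is_empty() {
                return text.to_string();
            }
        }
    }
    String::new() // no system prompt
}
```

For example, an agent-level value of "You are a Python specialist." wins over a profile value of "You are a general assistant.", matching the override example in the Inline Text section.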
Inline Text
The simplest form — write the prompt directly:
[agent.profile]
system_prompt = "You are an expert software engineer."
Agent-level overrides the profile:
[agent.profile]
system_prompt = "You are a general assistant." # default from blueprint
[agent]
system_prompt = "You are a Python specialist." # overrides the profile
File Reference (file: prefix)
Load the prompt from a file. Relative paths resolve from the agent's workspace directory:
[agent]
workspace = "workspace"
[agent.profile]
system_prompt = "file:system_prompt.md" # resolves to workspace/system_prompt.md
Absolute paths are used as-is:
[agent.profile]
system_prompt = "file:/etc/phi/prompts/coder.md"
The file: prefix works at all levels: [agent].system_prompt, [agent.profile].system_prompt, and [[agent.profile.instances]].system_prompt.
Strategy Reference ({{...}} protocol)
For advanced multi-block prompt composition, reference a system prompt instance. This uses a 3-entity chain: strategy (block template) → prompt instance (block content) → agent reference.
# 1. Define the strategy template — block structure with ordering and size limits
[[system_prompt_strategy.instances]]
id = "{{coding_strategy}}"
[[system_prompt_strategy.instances.blocks]]
name = "identity"
order = 0
max_length = 2000
[[system_prompt_strategy.instances.blocks]]
name = "instructions"
order = 1
max_length = 3000
[[system_prompt_strategy.instances.blocks]]
name = "constraints"
order = 2
max_length = 1000
# 2. Define the prompt instance — fills content into the strategy's blocks
# Block values can be inline text or file: references (relative to workspace)
[[system_prompt.instances]]
id = "{{coding_prompt}}"
description = "System prompt for coding agents"
type = "{{system_prompt_strategy.coding_strategy}}"
identity = "You are an expert software engineer at a fintech company."
instructions = "file:prompts/coding_instructions.md"
constraints = "Never modify production databases. Always write tests."
# 3. Reference the prompt instance from the agent
[agent]
system_prompt = "{{system_prompt.coding_prompt}}"
workspace = "workspace"
The builder resolves the chain: finds the prompt instance → finds its strategy → sorts blocks by order → resolves file: paths → truncates each block to max_length → joins with double newlines.
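The ordering, truncation, and joining steps can be sketched as follows. Block and compose are hypothetical names; file: resolution is left out, max_length is assumed to count characters, and blocks with no content are assumed to be skipped.

```rust
use std::collections::HashMap;

/// One block of a strategy template.
pub struct Block {
    pub name: String,
    pub order: u32,
    pub max_length: usize, // assumption: measured in characters
}

/// Compose a prompt: sort blocks by `order`, truncate each block's content
/// to `max_length`, and join the results with double newlines.
pub fn compose(blocks: &[Block], content: &HashMap<String, String>) -> String {
    let mut sorted: Vec<&Block> = blocks.iter().collect();
    sorted.sort_by_key(|b| b.order);
    sorted
        .iter()
        .filter_map(|b| {
            content
                .get(&b.name)
                .map(|text| text.chars().take(b.max_length).collect::<String>())
        })
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

Truncating by characters rather than bytes avoids splitting a multi-byte character; whether the real builder measures max_length the same way is not stated in this guide.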
See the Field Reference for [system_prompt_strategy] and [system_prompt] sections.
Profile Instance Override
When using named profile instances, the instance's system_prompt participates in resolution. The system_prompt field on a profile instance supports all three modes — inline text, file: path, or {{...}} reference to a system_prompt instance:
# ── System prompt strategy + instance (reusable prompt definition) ───
[[system_prompt_strategy.instances]]
id = "{{simple}}"
[[system_prompt_strategy.instances.blocks]]
name = "identity"
order = 0
max_length = 5000
[[system_prompt.instances]]
id = "{{coder_prompt}}"
type = "{{system_prompt_strategy.simple}}"
identity = "file:prompts/coder.md"
[[system_prompt.instances]]
id = "{{reviewer_prompt}}"
type = "{{system_prompt_strategy.simple}}"
identity = "file:prompts/reviewer.md"
# ── Profile instances reference system_prompt instances ──────────────
[agent.profile]
name = "base"
system_prompt = "You are a general assistant." # base fallback
[[agent.profile.instances]]
id = "{{coder}}"
system_prompt = "{{system_prompt.coder_prompt}}" # profile → system_prompt instance
temperature = 0.2
max_tokens = 16384
[[agent.profile.instances]]
id = "{{reviewer}}"
system_prompt = "{{system_prompt.reviewer_prompt}}" # profile → system_prompt instance
temperature = 0.1
max_tokens = 8192
# ── Agent instances reference profile instances ──────────────────────
[[agent.instances]]
name = "code-writer"
agent_profile = "{{agent_profile.coder}}" # agent → profile → system_prompt
[[agent.instances]]
name = "code-reviewer"
agent_profile = "{{agent_profile.reviewer}}" # agent → profile → system_prompt
[[agent.instances]]
name = "generalist"
# no agent_profile → uses base [agent.profile].system_prompt
Full reference chain: agent.instances → agent.profile.instances (via agent_profile) → system_prompt.instances (via system_prompt) → system_prompt_strategy.instances (via type). Each layer can override or inherit from the one above.
When an agent instance omits agent_profile, it is built using the base [agent.profile] directly (no instance override). The base profile's system_prompt, temperature, and other fields apply as defaults.
Workspace-relative Resolution
The file: prefix resolves relative to the agent's workspace directory. Each agent instance can set its own workspace, so the same file: reference resolves to different files per agent:
[agent.profile]
name = "base"
[[agent.profile.instances]]
id = "{{copywriter}}"
system_prompt = "file:system_prompt.md" # same file ref, different workspace resolution
temperature = 0.7
[[agent.instances]]
name = "alpha-writer"
agent_profile = "{{agent_profile.copywriter}}"
workspace = "projects/alpha" # reads projects/alpha/system_prompt.md
[[agent.instances]]
name = "beta-writer"
agent_profile = "{{agent_profile.copywriter}}"
workspace = "projects/beta" # reads projects/beta/system_prompt.md
Workspace resolution order:
1. [[agent.instances]].workspace: per-agent instance (highest priority)
2. [agent].workspace: shared agent-level
3. default_workspace: global default
4. ".": current directory
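Workspace selection and file: resolution combine naturally with std::path. Both functions below are hypothetical helpers that sketch the documented order; they only compute paths and do not read any file.

```rust
use std::path::{Path, PathBuf};

/// Pick the workspace: instance, then agent-level, then global default, then ".".
pub fn resolve_workspace(
    instance: Option<&str>,
    agent: Option<&str>,
    default_workspace: Option<&str>,
) -> PathBuf {
    PathBuf::from(instance.or(agent).or(default_workspace).unwrap_or("."))
}

/// Resolve a `file:` value against the workspace.
/// Returns `None` when the value is not a file reference.
pub fn resolve_prompt_path(value: &str, workspace: &Path) -> Option<PathBuf> {
    let target = value.strip_prefix("file:")?;
    // `Path::join` keeps an absolute `target` as-is and appends a relative one.
    Some(workspace.join(target))
}
```

With workspace = "projects/alpha", the value "file:system_prompt.md" resolves to projects/alpha/system_prompt.md, as in the alpha-writer example.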
Thinking Level
Controls depth of model reasoning. Specified as a string in config:
| Config Value | Rust Enum | Description |
|---|---|---|
| "off" | ThinkingLevel::Off | No chain-of-thought (default) |
| "minimal" | ThinkingLevel::Minimal | Lightweight reasoning |
| "low" | ThinkingLevel::Low | Some reasoning |
| "medium" | ThinkingLevel::Medium | Moderate reasoning |
| "high" | ThinkingLevel::High | Deep reasoning before responding |
Parsing is case-insensitive: "High", "HIGH", "high" all work.
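Case-insensitive parsing of the config value can be sketched with FromStr. The enum below is a local stand-in that mirrors the table; the error type and message are assumptions, not phi-core's.

```rust
use std::str::FromStr;

/// Local mirror of the thinking levels from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum ThinkingLevel {
    #[default]
    Off,
    Minimal,
    Low,
    Medium,
    High,
}

impl FromStr for ThinkingLevel {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Lowercase first so "High", "HIGH", and "high" all match.
        match s.to_ascii_lowercase().as_str() {
            "off" => Ok(Self::Off),
            "minimal" => Ok(Self::Minimal),
            "low" => Ok(Self::Low),
            "medium" => Ok(Self::Medium),
            "high" => Ok(Self::High),
            other => Err(format!("unknown thinking level: {other}")),
        }
    }
}
```

With this in place, "HIGH".parse::<ThinkingLevel>() yields ThinkingLevel::High, and ThinkingLevel::default() is Off, matching the documented default.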
Skills vs Tools
skills in the profile are skill names loaded via SkillSet from SKILL.md files (per the AgentSkills standard). They are NOT tools. See Skills for details.
Profile Instances
Profile instances are named variations of the profile blueprint. Each instance inherits the profile defaults and overrides specific fields. This lets you define a single profile and then create specialized variants without duplicating the entire configuration.
Use [[agent.profile.instances]] to define instances. Each instance requires an id field using the {{...}} ID reference protocol (see below). Instance fields override the corresponding profile defaults; any field not specified falls through to the profile value.
Agent instances reference a profile instance via the agent_profile field, using either a qualified reference ({{agent_profile.name}}) or an unqualified reference ({{name}}) if the name is unique across all namespaces.
Example
# ── Profile defaults ──────────────────────────────────────────
[agent.profile]
name = "coding-agent"
description = "An agent specialized for code tasks"
system_prompt = "You are an expert software engineer."
thinking_level = "high"
temperature = 0.2
max_tokens = 16384
# ── Profile instances (override specific fields) ─────────────
[[agent.profile.instances]]
id = "{{%coder%}}"
description = "A code generation specialist"
thinking_level = "high"
temperature = 0.2
max_tokens = 16384
config_id = "coder"
[[agent.profile.instances]]
id = "{{%reviewer%}}"
description = "A code review specialist"
thinking_level = "high"
temperature = 0.1
max_tokens = 8192
config_id = "reviewer"
# ── Agent instances referencing profile instances ─────────────
[[agent.instances]]
name = "code-writer"
agent_profile = "{{agent_profile.coder}}"
system_prompt = "You write clean, well-tested code. Follow existing patterns."
[[agent.instances]]
name = "code-reviewer"
agent_profile = "{{agent_profile.reviewer}}"
system_prompt = "You review code for bugs, security issues, and style violations."
The code-writer agent inherits all profile defaults and applies the coder instance overrides. The code-reviewer agent uses the reviewer instance, which sets a lower temperature and smaller token budget for more focused review output.
ID Reference Protocol
The {{...}} syntax is a lightweight reference protocol for linking configuration entities (providers, profile instances, sub-agents) by name. It appears in id fields (to declare an entity) and in reference fields like provider and agent_profile (to point to an entity).
Syntax
| Pattern | Meaning |
|---|---|
| {{type.name}} | Qualified reference, recreate if invoked |
| {{%type.name%}} | Qualified reference, no recreation if already exists |
| {{name}} | Unqualified reference (unique resolve), recreate if invoked |
| {{%name%}} | Unqualified reference, no recreation if already exists |
| {{#system_id#}} | Literal system ID, no recreation |
Namespaces
References are resolved within namespaces. The three namespaces are:
- agent_profile -- Profile instances declared in [[agent.profile.instances]]
- provider -- Provider instances declared in [[provider.instances]]
- sub_agent -- Sub-agent instances declared in [[sub_agents.instances]]
Resolution
Qualified references ({{type.name}}) include the namespace prefix and always resolve unambiguously. Use these when multiple namespaces could contain the same name.
Unqualified references ({{name}}) omit the namespace. The system searches all namespaces and resolves the reference only if the name is unique. If multiple entities share the same name across namespaces, an unqualified reference is ambiguous and will produce an error.
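As an illustration of the uniqueness rule, here is a small sketch of unqualified resolution. It assumes a registry that maps each namespace to the names declared in it; `ResolveError` and `resolve_unqualified` are hypothetical names, and phi-core's real resolver and error types may look different.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch (not phi-core's ConfigError).
#[derive(Debug, PartialEq)]
enum ResolveError {
    NotFound(String),
    Ambiguous { name: String, namespaces: Vec<String> },
}

/// Search every namespace for `name`; succeed only if exactly one namespace declares it.
/// Returns the qualified form `namespace.name`.
fn resolve_unqualified(
    registry: &HashMap<&str, Vec<&str>>,
    name: &str,
) -> Result<String, ResolveError> {
    let mut hits: Vec<String> = registry
        .iter()
        .filter(|(_, names)| names.contains(&name))
        .map(|(namespace, _)| namespace.to_string())
        .collect();
    // HashMap iteration order is unspecified, so sort for a stable error message.
    hits.sort();
    match hits.len() {
        0 => Err(ResolveError::NotFound(name.to_string())),
        1 => Ok(format!("{}.{}", hits[0], name)),
        _ => Err(ResolveError::Ambiguous { name: name.to_string(), namespaces: hits }),
    }
}
```

A name declared in both the provider and agent_profile namespaces would come back as `Ambiguous`, which is the case where a qualified reference is needed.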
Recreation Semantics
The % sigil controls whether an entity is recreated when referenced:
- Without % ({{name}} or {{type.name}}): The entity is recreated each time it is resolved. Use this when you want fresh instances.
- With % ({{%name%}} or {{%type.name%}}): The entity is reused if it already exists (matched by latest creation date). Use this for shared singletons like provider connections.
The {{#system_id#}} form references a literal system-generated ID and never triggers recreation.
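To make the five forms concrete, the sketch below classifies a reference string. It is an illustration under assumptions rather than phi-core's parser: the `Reference` enum and `parse_reference` function are hypothetical, and names are assumed not to contain dots (the first dot is treated as the namespace separator).

```rust
/// Hypothetical parsed form of a `{{...}}` reference (not a phi-core type).
#[derive(Debug, PartialEq)]
enum Reference {
    /// `{{name}}`, `{{type.name}}`, and their `%` variants.
    /// `reuse` is true for the `%` forms (no recreation if the entity exists).
    Entity { namespace: Option<String>, name: String, reuse: bool },
    /// `{{#system_id#}}`: a literal system ID, never recreated.
    SystemId(String),
}

fn parse_reference(s: &str) -> Option<Reference> {
    // Strip the outer braces; anything else is not a reference.
    let inner = s.strip_prefix("{{")?.strip_suffix("}}")?;

    // `#...#` marks a literal system ID.
    if let Some(id) = inner.strip_prefix('#').and_then(|rest| rest.strip_suffix('#')) {
        return Some(Reference::SystemId(id.to_string()));
    }

    // `%...%` marks the reuse-if-exists variant.
    let (body, reuse) = match inner.strip_prefix('%').and_then(|rest| rest.strip_suffix('%')) {
        Some(body) => (body, true),
        None => (inner, false),
    };
    if body.is_empty() {
        return None;
    }

    // A dot separates the namespace from the name in qualified references.
    let (namespace, name) = match body.split_once('.') {
        Some((namespace, name)) => (Some(namespace.to_string()), name.to_string()),
        None => (None, body.to_string()),
    };
    Some(Reference::Entity { namespace, name, reuse })
}
```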
Usage in ID Fields
When declaring an entity, the id field establishes the entity's name within its namespace:
[[provider.instances]]
id = "{{%openai%}}" # declares "openai" in the provider namespace
model = "gpt-4o"
Usage in Reference Fields
When referencing an entity from another section, use the reference syntax:
[[agent.instances]]
name = "my-agent"
provider = "{{provider.openai}}" # qualified reference
agent_profile = "{{reviewer}}" # unqualified (must be unique)
Provider Configuration
The [provider] section defines the LLM model, API credentials, and protocol.
[provider]
model = "claude-sonnet-4-20250514" # Model ID sent to the API
api_key = "${ANTHROPIC_API_KEY}" # API credential
api = "anthropic_messages" # API protocol
provider = "anthropic" # Provider name
name = "Claude Sonnet 4" # Human-friendly display name
reasoning = true # Model supports thinking
context_window = 200000 # Context window in tokens
max_tokens = 8192 # Default max output tokens
API Protocols
| Config Value | Aliases | Protocol |
|---|---|---|
| "anthropic_messages" | "anthropic" | Anthropic Messages API |
| "openai_completions" | "openai" | OpenAI Chat Completions |
| "openai_responses" | | OpenAI Responses API |
| "azure_openai_responses" | "azure" | Azure OpenAI |
| "google_generative_ai" | "google", "gemini" | Google Gemini |
| "google_vertex" | "vertex" | Google Vertex AI |
| "bedrock_converse_stream" | "bedrock" | Amazon Bedrock |
Default base URLs are set automatically per protocol when base_url is omitted:
- Anthropic: https://api.anthropic.com
- OpenAI: https://api.openai.com
- Google: https://generativelanguage.googleapis.com
- Others: empty (uses provider defaults)
Important: The API protocol is NOT auto-detected from the model name. If you set model = "gpt-4o", you must also set api = "openai" explicitly.
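The alias table and the default base URLs can be read as two lookup functions. This is a sketch built only from the table and list above; `canonical_api` and `default_base_url` are hypothetical names, and mapping both OpenAI protocols to the OpenAI base URL is an assumption, since the list does not say which protocols count as "OpenAI".

```rust
/// Map a configured `api` value (canonical name or alias) to its canonical protocol name.
/// Returns None for unknown values; how phi-core reports those is not modelled here.
fn canonical_api(value: &str) -> Option<&'static str> {
    match value {
        "anthropic_messages" | "anthropic" => Some("anthropic_messages"),
        "openai_completions" | "openai" => Some("openai_completions"),
        "openai_responses" => Some("openai_responses"),
        "azure_openai_responses" | "azure" => Some("azure_openai_responses"),
        "google_generative_ai" | "google" | "gemini" => Some("google_generative_ai"),
        "google_vertex" | "vertex" => Some("google_vertex"),
        "bedrock_converse_stream" | "bedrock" => Some("bedrock_converse_stream"),
        _ => None,
    }
}

/// Default base URL used when `base_url` is omitted; empty means "provider default".
fn default_base_url(canonical: &str) -> &'static str {
    match canonical {
        "anthropic_messages" => "https://api.anthropic.com",
        // Assumption: both OpenAI protocols share the OpenAI default.
        "openai_completions" | "openai_responses" => "https://api.openai.com",
        "google_generative_ai" => "https://generativelanguage.googleapis.com",
        _ => "",
    }
}
```

Note that a model name such as "gpt-4o" is not an `api` value, which is why the protocol has to be set explicitly.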
Cost Rates
Enable cost tracking by setting per-token rates:
[provider.cost]
input_per_million = 3.0 # $ per million input tokens
output_per_million = 15.0 # $ per million output tokens
cache_read_per_million = 0.3 # $ per million cache-read tokens
cache_write_per_million = 3.75
Cost is tracked automatically after each turn. Combine with [execution].max_cost to enforce a budget.
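The arithmetic behind per-million rates is simple enough to sketch. The types and functions below (`CostRates`, `Usage`, `turn_cost`, `over_budget`) are hypothetical stand-ins, and the "stop once the cap is reached" comparison is an assumption about how a max_cost budget would be applied.

```rust
/// Hypothetical mirror of the [provider.cost] fields (dollars per million tokens).
struct CostRates {
    input_per_million: f64,
    output_per_million: f64,
    cache_read_per_million: f64,
    cache_write_per_million: f64,
}

/// Hypothetical per-turn token counts.
struct Usage {
    input: u64,
    output: u64,
    cache_read: u64,
    cache_write: u64,
}

/// Dollar cost of one turn: tokens / 1,000,000 * rate, summed over the four token classes.
fn turn_cost(rates: &CostRates, usage: &Usage) -> f64 {
    let dollars = |tokens: u64, per_million: f64| tokens as f64 / 1_000_000.0 * per_million;
    dollars(usage.input, rates.input_per_million)
        + dollars(usage.output, rates.output_per_million)
        + dollars(usage.cache_read, rates.cache_read_per_million)
        + dollars(usage.cache_write, rates.cache_write_per_million)
}

/// With no cap configured the budget check never fires.
fn over_budget(accumulated: f64, max_cost: Option<f64>) -> bool {
    max_cost.map_or(false, |cap| accumulated >= cap)
}
```

For example, 1,000,000 input tokens at $3 and 200,000 output tokens at $15 per million come to about $6, which would exceed a $5 cap.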
Custom Headers
[provider]
model = "my-model"
[provider.headers]
"X-Custom-Header" = "value"
"Authorization" = "Bearer ${CUSTOM_TOKEN}"
Multiple Providers
Use [[provider.instances]] to define named providers alongside the default. Each instance uses the {{...}} ID reference protocol to declare its name in the provider namespace. The url field is an alias for base_url.
# Default provider — Anthropic (used unless overridden)
[provider]
model = "claude-sonnet-4-20250514"
name = "Claude Sonnet 4"
api_key = "${ANTHROPIC_API_KEY}"
api = "anthropic_messages"
provider = "anthropic"
[provider.cost]
input_per_million = 3.0
output_per_million = 15.0
cache_read_per_million = 0.3
cache_write_per_million = 3.75
# OpenAI
[[provider.instances]]
id = "{{%openai%}}"
description = "OpenAI GPT-4o provider"
name = "GPT-4o"
model = "gpt-4o"
api_key = "${OPENAI_API_KEY}"
api = "openai_completions"
url = "https://api.openai.com/v1"
# OpenRouter
[[provider.instances]]
id = "{{%openrouter%}}"
description = "OpenRouter multi-model gateway"
name = "OpenRouter"
model = "anthropic/claude-sonnet-4"
api_key = "${OPENROUTER_API_KEY}"
api = "openai_completions"
url = "https://openrouter.ai/api/v1"
provider = "openrouter"
# Google Gemini
[[provider.instances]]
id = "{{%gemini%}}"
description = "Google Gemini 2.5 Flash provider"
name = "Gemini 2.5 Flash"
model = "gemini-2.5-flash"
api_key = "${GOOGLE_API_KEY}"
api = "google_generative_ai"
# Ollama (local)
[[provider.instances]]
id = "{{%ollama%}}"
description = "Local Ollama instance for development"
name = "Ollama Llama 3.2"
model = "llama3.2"
api = "openai_completions"
url = "http://localhost:11434/v1"
api_key = "not-needed"
provider = "ollama"
Agent instances and sub-agents reference these via the ID protocol (e.g., provider = "{{provider.openai}}" or provider = "{{ollama}}" if unique).
Session Configuration
The [session] section controls session scope.
[session]
scope = "persistent" # "ephemeral" (default) or "persistent"
Session Scope
| Value | Behavior |
|---|---|
| "ephemeral" | Session exists only in memory for the process lifetime (default) |
| "persistent" | Session data is written to a store and survives restarts |
Note: Setting scope = "persistent" declares intent but does not automatically configure a storage backend. The caller must set up session persistence using the session recorder.
Thinking level and temperature are configured per-loop via LoopConfigSnapshot (captured on each AgentStart event) rather than at the session level. Set them on the agent profile or AgentLoopConfig.
Tools
The [tools] section declares which tools the agent can use and how they execute.
[tools]
enabled = ["bash", "read_file", "write_file", "search"]
tool_strategy = "parallel" # "sequential", "parallel", or "batched"
batch_size = 3 # Only used when strategy is "batched"
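One way to think about the three tool_strategy values is as different ways of grouping a turn's tool calls, with the steering queue checked between groups. The sketch below shows only that grouping; `plan_batches` is a hypothetical helper, not part of phi-core, and real execution would run each group concurrently.

```rust
/// Split one turn's tool calls into groups that would run concurrently.
/// Steering would be checked after each group completes.
fn plan_batches<T: Clone>(calls: &[T], strategy: &str, batch_size: usize) -> Vec<Vec<T>> {
    let group_size = match strategy {
        "sequential" => 1,                // one call per group
        "batched" => batch_size.max(1),   // N calls per group
        _ => calls.len().max(1),          // "parallel": a single group with every call
    };
    calls.chunks(group_size).map(|group| group.to_vec()).collect()
}
```

With seven calls and batch_size = 3, "batched" yields groups of 3, 3, and 1, "sequential" yields seven groups, and "parallel" yields one.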
Tool Execution Strategies
| Strategy | Behavior |
|---|---|
| "sequential" | One tool at a time; checks steering queue between each |
| "parallel" | All tool calls concurrent; check steering after all complete (default) |
| "batched" | Run N concurrent, wait, check steering, next batch |
Context Pruning
Enable model-directed context pruning with with_prun_tool(). This lets the model surgically remove irrelevant in-run content (its own messages, tool calls, and tool results) from the working context to reclaim space in the context window. User messages are never pruned. See Context Pruning for details.
let agent = BasicAgent::new(model_config)
    .with_default_tools()
    .with_prun_tool();
Or via config:
[tools]
enabled = ["bash", "read_file", "write_file", "prun"]
Registering Tools at Runtime
Tools are NOT instantiated from the config file. The config specifies tool names only. You must register tool instances after constructing the agent:
use phi_core::{parse_config_file, agent_from_config, Agent};
use phi_core::tools::{BashTool, ReadFileTool, WriteFileTool, SearchTool};
use std::path::Path;
use std::sync::Arc;

let config = parse_config_file(Path::new("agent.toml"))?;
// `mut` is needed so that Arc::get_mut can borrow the handle mutably
let mut agent = agent_from_config(&config)?;

// Get exclusive mutable access (only succeeds while no other Arc clones exist) and register tools
let agent_mut = Arc::get_mut(&mut agent).unwrap();
agent_mut.set_tools(vec![
    Arc::new(BashTool::default()),
    Arc::new(ReadFileTool::new()),
    Arc::new(WriteFileTool::new()),
    Arc::new(SearchTool::new()),
]);
Tool Registry
Instead of manually registering tools after construction, use agent_from_config_with_registry() to resolve tool names from the config automatically:
use phi_core::{parse_config_file, agent_from_config_with_registry, Agent};
use phi_core::tools::ToolRegistry;
use std::path::Path;

let config = parse_config_file(Path::new("agent.toml"))?;

// Create a registry with the 6 built-in tools
let registry = ToolRegistry::new().with_defaults();

// Tools listed in config.tools.enabled are resolved through the registry
let agent = agent_from_config_with_registry(&config, &registry)?;
The default registry includes all 6 built-in tools: bash, read_file, write_file, edit_file, list_files, search. You can also register custom tools:
let mut registry = ToolRegistry::new().with_defaults();
registry.register("my_tool", || Arc::new(MyCustomTool::new()));

let agent = agent_from_config_with_registry(&config, &registry)?;
Unknown tool names in tools.enabled are silently skipped. Use registry.contains(name) to check availability before construction if needed.
Context & Compaction
The [compaction] section controls automatic context management. When the conversation grows too long, compaction summarizes older messages to stay within the model's context window.
[compaction]
max_context_tokens = 200000 # Model's context window
system_prompt_tokens = 4000 # Tokens reserved for system prompt
compact_at_pct = 0.85 # Start measuring at 85% capacity
compact_budget_threshold_pct = 0.05 # Compact when < 5% headroom remains
keep_first_turns = 2 # Keep first 2 turns verbatim
keep_recent_turns = 4 # Keep last 4 turns verbatim
max_summary_tokens = 2000 # Token budget for the summarized middle
tool_output_max_lines = 50 # Truncate tool outputs to 50 lines
Compaction must be explicitly enabled by setting max_context_tokens. If omitted, compaction is disabled entirely.
How Compaction Works
- Before each LLM turn, the loop estimates current token usage
- If usage exceeds the trigger threshold, compaction fires
- First N turns are kept verbatim (preserves initial context)
- Middle turns are summarized (aggressive token reduction)
- Last M turns are kept verbatim (preserves recent history)
- Tool outputs in kept turns are truncated to max_lines
See Context Compaction for the full algorithm.
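The keep-first / summarize-middle / keep-recent split and the line cap on tool output can be sketched with plain index arithmetic. This is an illustration under assumptions, not the library's algorithm: `partition_turns` and `truncate_lines` are hypothetical helpers, and the truncation marker text is invented for the example.

```rust
use std::ops::Range;

/// Split `total` turns into (kept first turns, summarized middle, kept recent turns).
/// The ranges never overlap, even when the conversation is shorter than the keep counts.
fn partition_turns(
    total: usize,
    keep_first: usize,
    keep_recent: usize,
) -> (Range<usize>, Range<usize>, Range<usize>) {
    let first_end = keep_first.min(total);
    let recent_start = total.saturating_sub(keep_recent).max(first_end);
    (0..first_end, first_end..recent_start, recent_start..total)
}

/// Keep at most `max_lines` lines of a tool output and note how many were dropped.
fn truncate_lines(output: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() <= max_lines {
        return output.to_string();
    }
    let mut kept = lines[..max_lines].join("\n");
    // Hypothetical marker text; phi-core's actual marker may differ.
    kept.push_str(&format!("\n[... {} more lines truncated]", lines.len() - max_lines));
    kept
}
```

With 10 turns, keep_first_turns = 2, and keep_recent_turns = 4, turns 0-1 and 6-9 stay verbatim and turns 2-5 form the summarized middle.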
Focused Compaction
The focus_message field steers what the compaction summary emphasizes. Compaction instances let you define named variations that agent profiles can reference.
[compaction]
max_context_tokens = 200000
focus_message = "Retain key decisions and code changes."
# Named compaction instances
[[compaction.instances]]
id = "{{%coding%}}"
focus_message = "Focus on file paths, function signatures, and design rationale."
keep_recent_turns = 6
max_summary_tokens = 3000
[[compaction.instances]]
id = "{{%research%}}"
focus_message = "Preserve citations, data sources, and methodology."
keep_first_turns = 3
max_summary_tokens = 4000
Profiles reference a compaction instance via compaction = "{{compaction.coding}}":
[agent.profile]
name = "coding-agent"
compaction = "{{compaction.coding}}"
See Focused Compaction for full details.
Execution Limits
The [execution] section sets safety guards that prevent runaway loops and budget overruns.
[execution]
max_turns = 50 # Maximum LLM turns (default: 50)
max_total_tokens = 1000000 # Total token budget (default: 1,000,000)
max_duration_secs = 600 # Wall-clock timeout in seconds (default: 600)
max_cost = 5.0 # Dollar cost cap (requires [provider.cost] rates)
Cost Tracking
Cost enforcement requires both cost rates and a budget:
[provider.cost]
input_per_million = 3.0
output_per_million = 15.0
[execution]
max_cost = 5.0 # Stop when accumulated cost reaches $5
Without cost rates (all zeros), max_cost has no effect. Token usage is always tracked regardless.
Retry Configuration
Automatic retry for transient provider errors (rate limits, network issues):
[execution.retry]
max_retries = 3 # Retry attempts (default: 3, 0 = disabled)
initial_delay_ms = 1000 # First retry delay in ms
backoff_multiplier = 2.0 # Exponential backoff multiplier
max_delay_ms = 30000 # Maximum delay cap
Only RateLimited and Network errors are retried. Invalid requests and context overflows fail immediately.
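The four retry fields describe a capped exponential backoff. A minimal sketch of the delay schedule follows, assuming a zero-based attempt counter and no jitter (the text above does not mention jitter); `retry_delay_ms` is a hypothetical helper.

```rust
/// Delay before retry number `attempt` (0 = first retry):
/// initial_delay_ms * backoff_multiplier^attempt, capped at max_delay_ms.
fn retry_delay_ms(
    attempt: u32,
    initial_delay_ms: u64,
    backoff_multiplier: f64,
    max_delay_ms: u64,
) -> u64 {
    let raw = initial_delay_ms as f64 * backoff_multiplier.powi(attempt as i32);
    // Float-to-integer casts saturate in Rust, so very large values stay in range.
    (raw as u64).min(max_delay_ms)
}
```

With the values shown above, the first three retries would wait 1000 ms, 2000 ms, and 4000 ms, and any later delay is capped at 30000 ms.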
Cache Configuration
Control prompt caching behavior:
[execution.cache]
enabled = true # Master switch (default: true)
strategy = "auto" # "auto" or "disabled"
Sub-Agents
Define sub-agents that run their own agent loops when invoked as tools:
[[sub_agents.instances]]
name = "researcher"
description = "Searches the web for information"
system_prompt = "You are a research assistant. Search thoroughly."
model = "claude-haiku-4-5-20251001"
max_turns = 10
tools = ["web_search"]
[[sub_agents.instances]]
name = "code_writer"
description = "Writes and edits code files"
system_prompt = "You are a code generation expert."
provider = "openai" # References a [[provider.instances]] by name
max_turns = 20
tools = ["bash", "write_file"]
Sub-agents do NOT inherit the parent agent's configuration. Each sub-agent is fully independent — set all needed fields explicitly.
Multi-Agent Configurations
For complex setups, combine named providers with named agent instances:
# Providers
[provider]
model = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"
[[provider.instances]]
name = "fast"
model = "claude-haiku-4-5-20251001"
api_key = "${ANTHROPIC_API_KEY}"
# Agent instances
[[agent.instances]]
name = "planner"
system_prompt = "You are an architect. Plan the approach."
provider = "fast"
[[agent.instances]]
name = "executor"
system_prompt = "You are an implementer. Write the code."
Agent Workspace
The workspace field sets the working directory for an agent. Tools that interact with the filesystem (bash, file read/write, etc.) use this as their base path.
There are two levels of workspace configuration:
- default_workspace (top-level config field): Sets the default workspace for all agents. If omitted, the current working directory is used.
- workspace (per-agent field on [agent.profile] or [[agent.instances]]): Overrides default_workspace for a specific agent.
default_workspace = "/home/user/projects"
[agent.profile]
workspace = "/home/user/projects/my-app" # overrides default_workspace for this agent
Callbacks & Hooks
The config schema accepts [callbacks] and [hooks] sections for lifecycle hooks:
[callbacks]
before_loop = "my_plugin::before_loop"
after_turn = "my_plugin::after_turn"
before_task = "./scripts/on_task_start.sh"
after_task = "python3 scripts/after_task.py"
[hooks]
transform_context = "my_plugin::transform"
Script-based callbacks (shell scripts, Python scripts) are supported. The agent spawns the script as a subprocess, passing context via environment variables. Exit code 0 means continue; non-zero aborts the action (for Before* hooks). WASM plugin loading for Rust-native callbacks is planned for Phase 2.
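The subprocess contract (context in environment variables, exit code 0 to continue) can be sketched with the standard library. This is an assumption-laden illustration for Unix-like systems: it runs the command line through `sh -c`, and both `run_script_hook` and the `PHI_HOOK` variable name are hypothetical, since the variables phi-core actually sets are not listed here.

```rust
use std::process::Command;

/// Run a script-based callback and report whether the agent should continue.
/// Returns Ok(true) when the script exits with code 0, Ok(false) otherwise.
fn run_script_hook(command_line: &str, env: &[(&str, &str)]) -> std::io::Result<bool> {
    let status = Command::new("sh")
        .arg("-c")
        .arg(command_line)
        // Context is handed to the script through environment variables.
        .envs(env.iter().copied())
        .status()?;
    Ok(status.success())
}
```

A caller would treat `Ok(false)` from a Before* hook as a request to abort the action.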
Session-Level Callbacks
before_task and after_task are session-level callbacks configured on SessionRecorderConfig:
- before_task: Fires on the first AgentStart event with a new session_id. Use for task-level setup, metrics initialization, or audit logging.
- after_task: Fires on flush(). Use for task-level teardown, billing, or summary generation.
Programmatic Hooks
To set hooks programmatically, use the Agent trait setter methods after construction:
// `mut` is needed so that Arc::get_mut can borrow the handle mutably
let mut agent = agent_from_config(&config)?;
let agent_mut = Arc::get_mut(&mut agent).unwrap();
agent_mut.set_before_loop(Some(Arc::new(|msgs, n| {
    println!("Loop starting with {} messages", msgs.len());
    true // return false to abort
})));
Complete Example
A full coding agent configuration using every section:
# ── Agent identity ────────────────────────────────────────────
[agent]
system_prompt = "You are an expert software engineer."
[agent.profile]
name = "coding-agent"
description = "Full-featured coding assistant"
thinking_level = "high"
temperature = 0.2
max_tokens = 16384
config_id = "coder-v1"
skills = ["code-review"]
# ── Provider ──────────────────────────────────────────────────
[provider]
model = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"
reasoning = true
context_window = 200000
[provider.cost]
input_per_million = 3.0
output_per_million = 15.0
cache_read_per_million = 0.3
cache_write_per_million = 3.75
# ── Session ───────────────────────────────────────────────────
[session]
scope = "persistent"
# ── Tools ─────────────────────────────────────────────────────
[tools]
enabled = ["bash", "read_file", "write_file", "search", "edit_file"]
tool_strategy = "parallel"
# ── Context management ────────────────────────────────────────
[compaction]
max_context_tokens = 200000
system_prompt_tokens = 4000
compact_at_pct = 0.85
keep_first_turns = 2
keep_recent_turns = 4
max_summary_tokens = 2000
tool_output_max_lines = 50
# ── Execution limits ──────────────────────────────────────────
[execution]
max_turns = 100
max_total_tokens = 2000000
max_duration_secs = 1800
max_cost = 10.0
[execution.retry]
max_retries = 3
initial_delay_ms = 1000
backoff_multiplier = 2.0
[execution.cache]
enabled = true
strategy = "auto"
# ── Sub-agents ────────────────────────────────────────────────
[[sub_agents.instances]]
name = "researcher"
description = "Searches for information and documentation"
system_prompt = "Find relevant information. Be thorough."
model = "claude-haiku-4-5-20251001"
max_turns = 10
tools = ["web_search"]
Field Reference
[agent]
| Field | Type | Default | Description |
|---|---|---|---|
| system_prompt | string | None | Agent-level system prompt (overrides profile). Supports: inline text, file:path (relative to workspace), or {{...}} reference to a [[system_prompt.instances]] entry. |
| profile | table | (empty) | Profile blueprint (see below) |
| workspace | string | None | Workspace directory for file: resolution and tool paths |
| instances | array | [] | Named agent instances |
[agent.profile]
| Field | Type | Default | Description |
|---|---|---|---|
| profile_id | string | UUID | Unique profile identifier |
| name | string | None | Human-readable name |
| description | string | None | Profile description |
| system_prompt | string | None | Default system prompt. Supports: inline text, file:path, or {{...}} reference. |
| thinking_level | string | None | "off", "minimal", "low", "medium", "high" |
| temperature | float | None | LLM temperature (0.0-2.0) |
| max_tokens | integer | None | Max output tokens |
| config_id | string | None | Stable identity for loop_id generation |
| skills | array | [] | Skill names (SKILL.md, not tools) |
| instances | array | [] | Named profile instances (see ProfileInstanceSection) |
ProfileInstanceSection
Each entry in [[agent.profile.instances]]:
| Field | Type | Default | Description |
|---|---|---|---|
| id | string | required | {{...}} ID in the agent_profile namespace |
| description | string | None | Human-readable description of this variant |
| name | string | (from profile) | Override name |
| system_prompt | string | (from profile) | Override system prompt (supports inline, file:, or {{...}}) |
| thinking_level | string | (from profile) | Override thinking level |
| temperature | float | (from profile) | Override temperature |
| max_tokens | integer | (from profile) | Override max output tokens |
| config_id | string | None | Stable identity for loop_id generation |
| skills | array | (from profile) | Override skill names |
AgentInstanceSection
Each entry in [[agent.instances]]:
| Field | Type | Default | Description |
|---|---|---|---|
| name | string | "unnamed" | Instance name |
| agent_profile | string | None | Profile instance reference ({{...}} syntax) |
| profile | table | None | Inline profile override (not a reference) |
| system_prompt | string | None | Instance-specific system prompt |
| provider | string | (default provider) | Provider reference ({{...}} syntax) |
| workspace | string | None | Per-instance workspace directory (overrides [agent].workspace) |
[provider]
| Field | Type | Default | Description |
|---|---|---|---|
| model | string | "unknown" | Model ID sent to API |
| api_key | string | "" | API credential (supports ${VAR}) |
| api | string | "anthropic_messages" | API protocol |
| base_url | string | (per protocol) | API base URL (url is an accepted alias) |
| provider | string | "anthropic" | Provider name |
| name | string | model value | Display name |
| reasoning | bool | false | Supports thinking/reasoning |
| context_window | integer | 200000 | Context window tokens |
| max_tokens | integer | 8192 | Default max output tokens |
ProviderInstanceSection
Each entry in [[provider.instances]] accepts all fields from [provider] above, plus:
| Field | Type | Default | Description |
|---|---|---|---|
| id | string | None | {{...}} ID in the provider namespace |
| description | string | None | Human-readable description of this provider |
| url | string | None | Alias for base_url |
[provider.cost]
| Field | Type | Default | Description |
|---|---|---|---|
| input_per_million | float | 0.0 | Input token rate |
| output_per_million | float | 0.0 | Output token rate |
| cache_read_per_million | float | 0.0 | Cache read rate |
| cache_write_per_million | float | 0.0 | Cache write rate |
[session]
| Field | Type | Default | Description |
|---|---|---|---|
| scope | string | "ephemeral" | "ephemeral" or "persistent" |
[tools]
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | array | [] | Tool names (resolved by caller) |
| tool_strategy | string | "parallel" | "sequential", "parallel", "batched" |
| batch_size | integer | 3 | Batch size for "batched" strategy |
[compaction]
| Field | Type | Default | Description |
|---|---|---|---|
| max_context_tokens | integer | None | Context window (must set to enable compaction) |
| system_prompt_tokens | integer | 4000 | Reserved system prompt tokens |
| compact_at_pct | float | 0.90 | Measurement threshold |
| compact_budget_threshold_pct | float | 0.05 | Compaction trigger |
| keep_first_turns | integer | 2 | Verbatim turns from start |
| keep_recent_turns | integer | 10 | Verbatim turns from end |
| max_summary_tokens | integer | 2000 | Summary token budget |
| tool_output_max_lines | integer | 50 | Tool output line cap |
[system_prompt_strategy]
Strategy templates define block structure for multi-block system prompts.
[[system_prompt_strategy.instances]]
| Field | Type | Default | Description |
|---|---|---|---|
| id | string | required | {{...}} ID for this strategy template |
| description | string | None | Human-readable description |
| blocks | array | [] | Block definitions (see below) |
[[system_prompt_strategy.instances.blocks]]
| Field | Type | Default | Description |
|---|---|---|---|
| name | string | required | Block name (e.g., "identity", "instructions", "constraints") |
| order | integer | 0 | Assembly order; lower appears first in the composed prompt |
| max_length | integer | unlimited | Maximum character budget for this block |
[system_prompt]
Prompt instances fill content into a strategy's blocks.
[[system_prompt.instances]]
| Field | Type | Default | Description |
|---|---|---|---|
| id | string | required | {{...}} ID for this prompt instance |
| description | string | None | Human-readable description |
| type | string | None | {{...}} reference to a strategy instance (e.g., "{{system_prompt_strategy.coding}}") |
| (block names) | string | — | Each block defined in the strategy gets a field here. Value is inline text or "file:path" (relative to workspace). |
Note: Block content fields use #[serde(flatten)] — they appear as top-level keys on the instance, not nested under a blocks table.
[execution]
| Field | Type | Default | Description |
|---|---|---|---|
| max_turns | integer | 50 | Maximum LLM turns |
| max_total_tokens | integer | 1000000 | Total token budget |
| max_duration_secs | integer | 600 | Wall-clock timeout (seconds) |
| max_cost | float | None | Dollar cost cap |
[execution.retry]
| Field | Type | Default | Description |
|---|---|---|---|
| max_retries | integer | 3 | Retry attempts (0 = disabled) |
| initial_delay_ms | integer | 1000 | First retry delay (ms) |
| backoff_multiplier | float | 2.0 | Exponential backoff factor |
| max_delay_ms | integer | 30000 | Maximum delay cap (ms) |
[execution.cache]
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Master switch |
| strategy | string | "auto" | "auto" or "disabled" |
Error Handling
agent_from_config() and the parse functions return ConfigError:
| Variant | Cause | Fix |
|---|---|---|
| Parse(msg) | Invalid TOML/JSON/YAML syntax | Check syntax; the message includes the parser error |
| MissingEnvVar { var } | ${VAR} references an unset env var | Set the variable or remove the reference |
| InvalidField { field, value, expected } | Invalid enum value (e.g., thinking_level = "extreme") | Use one of the expected values |
| Io(err) | File not found or not readable | Check file path and permissions |
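To show where a MissingEnvVar-style error comes from, here is a sketch of `${VAR}` substitution. It is not phi-core's implementation: `ExpandError` and `expand_vars` are hypothetical names, and the lookup is passed in as a closure so the example does not depend on the process environment.

```rust
/// Hypothetical error for this sketch, shaped like the MissingEnvVar variant above.
#[derive(Debug, PartialEq)]
enum ExpandError {
    MissingEnvVar { var: String },
}

/// Replace every `${VAR}` in `input` using `lookup`; fail on the first unset variable.
fn expand_vars(
    input: &str,
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<String, ExpandError> {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // An unterminated `${` is left in the output unchanged.
        let Some(len) = rest[start + 2..].find('}') else { break };
        let var = &rest[start + 2..start + 2 + len];
        out.push_str(&rest[..start]);
        match lookup(var) {
            Some(value) => out.push_str(&value),
            None => return Err(ExpandError::MissingEnvVar { var: var.to_string() }),
        }
        rest = &rest[start + 2 + len + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

In real use the closure could be `|name| std::env::var(name).ok()`.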
Common Mistakes
Forgetting to set the API protocol for non-Anthropic models:
# Wrong — defaults to anthropic_messages, fails at runtime
[provider]
model = "gpt-4o"
api_key = "${OPENAI_API_KEY}"
# Correct
[provider]
model = "gpt-4o"
api_key = "${OPENAI_API_KEY}"
api = "openai"
Setting max_cost without cost rates:
# max_cost is ignored — no rates to compute cost from
[execution]
max_cost = 5.0
# Correct — set rates AND budget
[provider.cost]
input_per_million = 3.0
output_per_million = 15.0
[execution]
max_cost = 5.0
Expecting tools to be instantiated from config:
[tools]
enabled = ["bash", "read_file"]
# These are names only — you must call agent.set_tools() in Rust
Programmatic Usage
Using AgentConfig Directly
You can construct AgentConfig in Rust without a file:
use phi_core::agent_from_config;
use phi_core::config::schema::{AgentConfig, ProviderSection, ProfileSection, AgentSection};

let config = AgentConfig {
    provider: ProviderSection {
        model: Some("claude-sonnet-4-20250514".into()),
        api_key: Some(std::env::var("ANTHROPIC_API_KEY")?),
        ..Default::default()
    },
    agent: AgentSection {
        system_prompt: Some("You are helpful.".into()),
        profile: ProfileSection {
            thinking_level: Some("high".into()),
            ..Default::default()
        },
        ..Default::default()
    },
    ..Default::default()
};

let agent = agent_from_config(&config)?;
Mixing Config with Programmatic Overrides
After agent_from_config(), use Agent trait methods to add hooks, tools, or modify settings:
use phi_core::{parse_config_file, agent_from_config, Agent};
use std::path::Path;
use std::sync::Arc;

let config = parse_config_file(Path::new("agent.toml"))?;
let mut agent = agent_from_config(&config)?;

// Get mutable access to add tools and hooks
let a = Arc::get_mut(&mut agent).unwrap();
a.set_tools(vec![Arc::new(phi_core::tools::BashTool::default())]);
a.set_before_loop(Some(Arc::new(|msgs, _| {
    println!("Starting with {} messages", msgs.len());
    true
})));
Reading Config Through the Agent Trait
All configuration is accessible through Agent trait methods:
let agent = agent_from_config(&config)?;

// Config accessors (all have defaults)
agent.model_config();      // Option<&ModelConfig>
agent.profile();           // Option<&AgentProfile>
agent.system_prompt();     // &str
agent.thinking_level();    // ThinkingLevel
agent.temperature();       // Option<f32>
agent.max_tokens();        // Option<u32>
agent.context_config();    // Option<&ContextConfig>
agent.execution_limits();  // Option<&ExecutionLimits>
agent.cache_config();      // CacheConfig
agent.tool_execution();    // ToolExecutionStrategy
agent.retry_config();      // RetryConfig
agent.session();           // Option<&Session>
agent.build_config();      // Result<AgentLoopConfig, AgentBuildError>
                           // Default impl returns Err(MissingModelConfig)
                           // if model_config() returns None. BasicAgent's
                           // override always returns Ok(...).
MCP Integration
What is MCP?
The Model Context Protocol (MCP) is a JSON-RPC 2.0 protocol that lets AI agents discover and call tools from external servers. It defines a standard way for agents to connect to tool providers over two transports:
- Stdio — spawn a child process, communicate via stdin/stdout (newline-delimited JSON)
- HTTP — POST JSON-RPC requests to an HTTP endpoint
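For a feel of what travels over the stdio transport, the sketch below builds one newline-delimited JSON-RPC 2.0 request. `jsonrpc_request_line` is a hypothetical helper, not phi-core's McpClient; it assumes the method name needs no JSON escaping and that any params are passed in already serialized.

```rust
/// Build one newline-delimited JSON-RPC 2.0 request, as written to an MCP server's stdin.
/// `params_json` must already be valid JSON; `method` is assumed to need no escaping.
fn jsonrpc_request_line(id: u64, method: &str, params_json: Option<&str>) -> String {
    let mut line = match params_json {
        Some(params) => format!(
            r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params}}}"#
        ),
        None => format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}"}}"#),
    };
    // One message per line: the trailing newline is the frame delimiter.
    line.push('\n');
    line
}
```

A tool discovery request would be `jsonrpc_request_line(1, "tools/list", None)`. Over the HTTP transport the same JSON object would be sent as a POST body instead.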
Connecting to MCP Servers
Stdio Transport
Use with_mcp_server_stdio() to spawn an MCP server process and register its tools:
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("ANTHROPIC_API_KEY")?;

    let mut agent = BasicAgent::new(ModelConfig::anthropic(
        "claude-sonnet-4-20250514",
        "Claude Sonnet 4",
        &api_key,
    ))
    .with_system_prompt("You are a helpful assistant with file access.")
    .with_mcp_server_stdio(
        "npx",
        &["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
        None,
    )
    .await?;

    let rx = agent.prompt("List files in /tmp").await;
    // handle events...
    Ok(())
}
You can pass environment variables to the server process:
use std::collections::HashMap;

let mut env = HashMap::new();
env.insert("API_TOKEN".into(), "secret".into());

let agent = BasicAgent::new(model_config)
    .with_mcp_server_stdio("my-mcp-server", &["--port", "0"], Some(env))
    .await?;
HTTP Transport
For remote MCP servers exposed over HTTP:
let agent = BasicAgent::new(model_config)
    .with_mcp_server_http("http://localhost:8080/mcp")
    .await?;
How MCP Tools Work
When you call with_mcp_server_stdio() or with_mcp_server_http(), phi-core:
- Connects to the MCP server and performs the initialize handshake
- Calls tools/list to discover available tools
- Wraps each MCP tool as an AgentTool via McpToolAdapter
- Adds them to the agent's tool list
MCP tools appear alongside built-in tools. The LLM sees them with their original names, descriptions, and JSON Schema parameters — it can call them just like any other tool.
Mixing Built-in and MCP Tools
use phi_core::tools::default_tools;

let agent = BasicAgent::new(model_config)
    .with_tools(default_tools()) // bash, read, write, edit, list, search
    .with_mcp_server_stdio("my-db-server", &[], None)
    .await?;
// Agent now has both built-in coding tools AND MCP database tools
Using the MCP Client Directly
For lower-level control, use McpClient directly:
use phi_core::mcp::{McpClient, McpToolAdapter};
use std::sync::Arc;
use tokio::sync::Mutex;

let client = McpClient::connect_stdio("my-server", &[], None).await?;
let tools = client.list_tools().await?;
for tool in &tools {
    println!("{}: {}", tool.name, tool.description.as_deref().unwrap_or(""));
}

// Call a tool directly
let result = client
    .call_tool("read_file", serde_json::json!({"path": "/tmp/test.txt"}))
    .await?;

// Or wrap as AgentTool adapters
let client = Arc::new(Mutex::new(client));
let adapters = McpToolAdapter::from_client(client).await?;
Error Handling
MCP operations return McpError:
- McpError::Transport -- connection or I/O failure
- McpError::Protocol -- unexpected response format
- McpError::JsonRpc -- server returned a JSON-RPC error (code + message)
- McpError::Serialization -- JSON serialization/deserialization failure
- McpError::Io -- standard I/O error
- McpError::ConnectionClosed -- server process exited
When an MCP tool returns isError: true, the adapter converts it to a ToolError::Failed, which the agent loop sends back to the LLM with is_error: true so it can self-correct.
OpenAPI Tool Adapter
Auto-generate AgentTool implementations from OpenAPI 3.0 specs. Point an agent at any API spec and it instantly gets callable tools for every operation.
Feature-gated: add features = ["openapi"] to your Cargo.toml.
Quick Start
use phi_core::BasicAgent;
use phi_core::openapi::{OpenApiToolAdapter, OpenApiConfig, OperationFilter};
use phi_core::provider::ModelConfig;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("ANTHROPIC_API_KEY")?;

    let config = OpenApiConfig::new()
        .with_bearer_token("sk-...");

    let agent = BasicAgent::new(ModelConfig::anthropic(
        "claude-sonnet-4-20250514",
        "Claude Sonnet 4",
        &api_key,
    ))
    .with_system_prompt("You are an API assistant.")
    .with_openapi_file("petstore.yaml", config, &OperationFilter::All)
    .await?;

    Ok(())
}
Loading Specs
Three ways to load an OpenAPI spec:
```rust
// From a file
let agent = agent.with_openapi_file("spec.yaml", config, &filter).await?;

// From a URL
let agent = agent.with_openapi_url("https://api.example.com/openapi.json", config, &filter).await?;

// From a string (sync)
let agent = agent.with_openapi_spec(&spec_string, config, &filter)?;
```
Or create adapters directly for more control:
```rust
let adapters = OpenApiToolAdapter::from_str(&spec, config, &OperationFilter::All)?;
let tools: Vec<Box<dyn AgentTool>> = adapters.into_iter().map(|a| Box::new(a) as _).collect();
```
Configuration
OpenApiConfig controls auth, headers, timeouts, and response limits:
```rust
let config = OpenApiConfig::new()
    .with_base_url("https://api.staging.example.com") // Override spec's servers
    .with_bearer_token("sk-...")                      // Bearer auth
    .with_header("X-Custom", "value")                 // Extra headers
    .with_timeout_secs(60)                            // Request timeout
    .with_max_response_bytes(128 * 1024)              // Truncate large responses
    .with_name_prefix("github");                      // Tool names: github__listRepos
```
Authentication
```rust
// Bearer token
let config = OpenApiConfig::new().with_bearer_token("token");

// API key in a custom header
let config = OpenApiConfig::new().with_api_key("X-API-Key", "key-value");

// No auth
let config = OpenApiConfig::new(); // default
```
Filtering Operations
Most API specs have dozens or hundreds of operations. Use OperationFilter to select which ones become tools:
```rust
// All operations (default)
let filter = OperationFilter::All;

// Specific operations by ID
let filter = OperationFilter::ByOperationId(vec![
    "listRepos".into(),
    "getRepo".into(),
    "createIssue".into(),
]);

// All operations with a specific tag
let filter = OperationFilter::ByTag(vec!["repos".into()]);

// All operations under a path prefix
let filter = OperationFilter::ByPathPrefix("/repos".into());
```
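To make the selection rules concrete, here is a minimal standalone sketch of how such a filter could be evaluated. The Operation struct, the local Filter enum, and filter_matches are hypothetical stand-ins written for illustration; they are not phi-core's types, and the "any listed tag matches" rule for ByTag is an assumption.

```rust
/// Hypothetical, simplified view of one operation parsed from a spec.
struct Operation {
    operation_id: String,
    tags: Vec<String>,
    path: String,
}

/// Local mirror of the filter variants described above (not phi-core's own type).
enum Filter {
    All,
    ByOperationId(Vec<String>),
    ByTag(Vec<String>),
    ByPathPrefix(String),
}

/// Returns true when `op` should be exposed as a tool under `filter`.
fn filter_matches(filter: &Filter, op: &Operation) -> bool {
    match filter {
        Filter::All => true,
        Filter::ByOperationId(ids) => ids.contains(&op.operation_id),
        // Assumption: an operation matches if it carries any of the listed tags.
        Filter::ByTag(tags) => op.tags.iter().any(|t| tags.contains(t)),
        Filter::ByPathPrefix(prefix) => op.path.starts_with(prefix.as_str()),
    }
}
```

Narrow filters keep the tool list short, which in turn keeps the tool definitions sent to the model small.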
How It Works
Each OpenAPI operation becomes one AgentTool:
| AgentTool method | Mapped from |
|---|---|
| name() | operationId (with optional prefix) |
| label() | summary or operationId |
| description() | description or summary |
| parameters_schema() | Combined JSON Schema from path/query/header params + request body |

When the LLM calls a tool, the adapter:
- Substitutes path parameters in the URL (/pets/{petId} → /pets/123)
- Adds query parameters as ?key=value
- Adds header parameters
- Applies auth from config
- Sends the request body as JSON (if the operation has one)
- Returns the response text (with status code) to the LLM
Non-2xx responses are not treated as errors — they're returned as text so the LLM can reason about them and retry or adjust.
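The first two request-building steps (path substitution and query parameters) can be sketched with the standard library alone. The build_url helper below is hypothetical and is not part of phi-core. It inserts values verbatim, so a real adapter would also need percent-encoding.

```rust
/// Fills `{name}` placeholders in `template` and appends query parameters.
/// Simplification: values are inserted verbatim, with no percent-encoding.
fn build_url(
    base_url: &str,
    template: &str,
    path_params: &[(&str, &str)],
    query: &[(&str, &str)],
) -> String {
    let mut path = template.to_string();
    for (name, value) in path_params {
        // "{{{}}}" renders as "{name}", the OpenAPI placeholder syntax.
        path = path.replace(&format!("{{{}}}", name), value);
    }

    let mut url = format!("{}{}", base_url.trim_end_matches('/'), path);
    for (i, (key, value)) in query.iter().enumerate() {
        url.push(if i == 0 { '?' } else { '&' });
        url.push_str(key);
        url.push('=');
        url.push_str(value);
    }
    url
}
```

For example, calling it with the template /pets/{petId}, the path parameter petId = 123, and the query parameter limit = 10 yields a URL ending in /pets/123?limit=10.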
Mixing with Other Tools
OpenAPI tools work alongside built-in tools and MCP tools:
```rust
use phi_core::tools::default_tools;

let agent = BasicAgent::new(model_config)
    .with_tools(default_tools())
    .with_openapi_file("github.yaml", github_config, &github_filter).await?
    .with_mcp_server_stdio("db-server", &[], None).await?;
```
Limitations (v1)
- OpenAPI 3.0.x only (not 3.1.x)
- JSON request/response bodies only (no multipart/form-data)
- No OAuth2 or token refresh (pass tokens via OpenApiConfig)
- Operations without operationId are skipped
- Path-level $ref items are skipped
Providers Overview
phi-core supports multiple LLM providers through the StreamProvider trait and ApiProtocol
dispatch. Callers never name a provider struct directly — ModelConfig is the single
descriptor for every provider connection.
Supported Protocols
| ApiProtocol | Wire Format | Factory Method |
|---|---|---|
| AnthropicMessages | Anthropic Messages API | ModelConfig::anthropic(id, name, key) |
| OpenAiCompletions | OpenAI Chat Completions (15+ backends) | ModelConfig::openai(id, name, key) / ModelConfig::local(url, id, key) / ModelConfig::openrouter(id, key) |
| OpenAiResponses | OpenAI Responses API | Direct struct construction |
| AzureOpenAiResponses | Azure OpenAI Responses | Direct struct construction |
| GoogleGenerativeAi | Google Gemini API | ModelConfig::google(id, name, key) |
| GoogleVertex | Google Vertex AI | Direct struct construction |
| BedrockConverseStream | AWS Bedrock ConverseStream | Direct struct construction |

ApiProtocol Enum
```rust
pub enum ApiProtocol {
    AnthropicMessages,
    OpenAiCompletions,
    OpenAiResponses,
    AzureOpenAiResponses,
    GoogleGenerativeAi,
    GoogleVertex,
    BedrockConverseStream,
}
```
ModelConfig
ModelConfig is the single, complete description of a provider connection. Pass it to
BasicAgent::new(), SubAgentTool::new(), or AgentLoopConfig.model_config:
```rust
pub struct ModelConfig {
    pub id: String,                       // e.g. "gpt-4o": model name sent to the API
    pub name: String,                     // e.g. "GPT-4o": display label for logging/UI
    pub api: ApiProtocol,                 // Which wire protocol to use (dispatch key)
    pub provider: String,                 // e.g. "openai": logging label
    pub base_url: String,                 // API endpoint (no trailing slash)
    pub api_key: String,                  // Auth credential (sk-..., or "access_key:secret" for Bedrock)
    pub reasoning: bool,                  // Supports thinking/reasoning
    pub context_window: u32,              // Context size in tokens
    pub max_tokens: u32,                  // Default max output
    pub cost: CostConfig,                 // Pricing per million tokens (0.0 = no tracking)
    pub headers: HashMap<String, String>, // Extra HTTP headers
    pub compat: Option<OpenAiCompat>,     // Quirk flags (OpenAiCompletions only)
}
```
Factory methods (all accept api_key as the auth parameter):
```rust
let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let anthropic = ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4", &api_key);

let openai_key = std::env::var("OPENAI_API_KEY").unwrap();
let openai = ModelConfig::openai("gpt-4o", "GPT-4o", &openai_key);

let gemini_key = std::env::var("GEMINI_API_KEY").unwrap();
let google = ModelConfig::google("gemini-2.0-flash", "Gemini 2.0 Flash", &gemini_key);

// Local server: pass an empty string for api_key if unauthenticated
let local = ModelConfig::local("http://localhost:1234/v1", "my-model", "");

// OpenRouter: dedicated factory with correct compat flags
let or_key = std::env::var("OPENROUTER_API_KEY").unwrap();
let openrouter = ModelConfig::openrouter("anthropic/claude-sonnet-4", &or_key);
```
ProviderRegistry
Maps ApiProtocol → StreamProvider. The default registry includes all built-in providers:
```rust
let registry = ProviderRegistry::default();

// Use it to stream with any model
let result = registry.stream(&model_config, stream_config, tx, cancel).await?;
```
Custom registries (advanced, for adding a fully custom StreamProvider implementation):
```rust
use phi_core::provider::{ProviderRegistry, ApiProtocol};

let mut registry = ProviderRegistry::new();
registry.register(ApiProtocol::AnthropicMessages, my_custom_provider);
// Then pass to AgentLoopConfig... (most users should use provider_override instead)
```
StreamProvider Trait
```rust
#[async_trait]
pub trait StreamProvider: Send + Sync {
    async fn stream(
        &self,
        config: StreamConfig,
        tx: mpsc::UnboundedSender<StreamEvent>,
        cancel: CancellationToken,
    ) -> Result<Message, ProviderError>;
}
```
All providers receive a StreamConfig, emit StreamEvents through the channel, and return the final Message.
OpenAPI Tool Adapter
In addition to LLM providers, phi-core can auto-generate tools from any OpenAPI 3.0 spec. This is a tool integration (not a provider), but it complements the provider system by letting agents call external APIs.
Enable with features = ["openapi"]. See the OpenAPI Tools guide for details.
Anthropic Provider
Handles the Anthropic Messages API with SSE streaming. Selected automatically when ModelConfig.api == ApiProtocol::AnthropicMessages.
Usage
```rust
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
));
```
Features
Streaming SSE
Uses reqwest-eventsource to parse Anthropic's SSE stream. Events handled:
- message_start: Input token usage, cache stats
- content_block_start: Text, thinking, or tool_use block
- content_block_delta: Text, thinking, input JSON, or signature deltas
- content_block_stop: Block complete
- message_delta: Stop reason, output usage
- message_stop: Stream complete
Extended Thinking
Set thinking_level to enable thinking with a token budget:
| Level | Budget Tokens |
|---|---|
| Minimal | 128 |
| Low | 512 |
| Medium | 2,048 |
| High | 8,192 |
Thinking content is streamed as Content::Thinking with a cryptographic signature for verification.
Cache Control
Automatic prompt caching via cache_control markers:
- System prompt: Always cached with {"type": "ephemeral"}
- Second-to-last message: Gets cache_control on its last content block, creating a cache breakpoint
This means on repeated calls, only the latest message is processed at full price.
Configuration
| Setting | Value |
|---|---|
| API URL | https://api.anthropic.com/v1/messages |
| API Version | 2023-06-01 |
| Auth Header | x-api-key |
| Default Max Tokens | 8,192 |
Environment Variables
| Variable | Purpose |
|---|---|
| ANTHROPIC_API_KEY | API key |
OpenAI Compatible Provider
One implementation (OpenAiCompatProvider) covers OpenAI, xAI, Groq, Cerebras, OpenRouter,
Mistral, DeepSeek, and any other OpenAI Chat Completions-compatible API. The provider is
selected automatically when ModelConfig.api == ApiProtocol::OpenAiCompletions.
Per-service behavior is controlled by OpenAiCompat flags stored in ModelConfig.compat.
Usage
```rust
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

// OpenAI
let api_key = std::env::var("OPENAI_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig::openai("gpt-4o", "GPT-4o", &api_key));

// OpenRouter
let or_key = std::env::var("OPENROUTER_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig::openrouter("anthropic/claude-sonnet-4", &or_key));

// Local server (LM Studio, Ollama, llama.cpp, vLLM)
let agent = BasicAgent::new(ModelConfig::local(
    "http://localhost:1234/v1",
    "my-model",
    "", // empty string: most local servers don't require auth
));
```
OpenAiCompat Quirk Flags
Different providers have behavioral differences even though they share the same API:
```rust
pub struct OpenAiCompat {
    pub supports_store: bool,
    pub supports_developer_role: bool,
    pub supports_reasoning_effort: bool,
    pub supports_usage_in_streaming: bool,
    pub max_tokens_field: MaxTokensField, // MaxTokens or MaxCompletionTokens
    pub requires_tool_result_name: bool,
    pub requires_assistant_after_tool_result: bool,
    pub thinking_format: ThinkingFormat,  // OpenAi, Xai, Qwen, or OpenRouter
}
```
Provider Presets
| Provider | ModelConfig factory | Key Differences |
|---|---|---|
| OpenAI | ModelConfig::openai(id, name, key) | developer role, max_completion_tokens, store, reasoning_effort |
| OpenRouter | ModelConfig::openrouter(id, key) | developer role, max_tokens, OpenRouter thinking format |
| Local | ModelConfig::local(url, id, key) | Generic defaults, empty api_key OK |
| xAI (Grok) | Direct construction with OpenAiCompat::xai() | reasoning field for thinking |
| Groq | Direct construction with OpenAiCompat::groq() | Standard defaults |
| Cerebras | Direct construction with OpenAiCompat::cerebras() | Standard defaults |
| Mistral | Direct construction with OpenAiCompat::mistral() | max_tokens field |
| DeepSeek | Direct construction with OpenAiCompat::deepseek() | max_completion_tokens |
Adding a New Compatible Provider
- Add a constructor to OpenAiCompat:
```rust
impl OpenAiCompat {
    pub fn my_provider() -> Self {
        Self {
            supports_usage_in_streaming: true,
            // set flags as needed...
            ..Default::default()
        }
    }
}
```
- Create a ModelConfig that uses it:
```rust
use phi_core::provider::{ModelConfig, ApiProtocol, OpenAiCompat};

let config = ModelConfig {
    id: "my-model".into(),
    name: "My Model".into(),
    api: ApiProtocol::OpenAiCompletions,
    provider: "my-provider".into(),
    base_url: "https://api.myprovider.com/v1".into(),
    api_key: std::env::var("MY_API_KEY").unwrap_or_default(),
    compat: Some(OpenAiCompat::my_provider()),
    ..Default::default()
};
BasicAgent::new(config)
```
Thinking/Reasoning
The ThinkingFormat enum controls how reasoning content is parsed from streams:
- ThinkingFormat::OpenAi: Uses reasoning_content field (most providers, default)
- ThinkingFormat::Xai: Uses reasoning field (Grok)
- ThinkingFormat::Qwen: Uses reasoning_content field (Qwen variant)
- ThinkingFormat::OpenRouter: Uses reasoning_details array (OpenRouter extended thinking)
Auth
Uses Authorization: Bearer {api_key} header. Extra headers can be added via ModelConfig.headers.
Google Gemini Provider
Two providers for Google's Gemini models:
- GoogleProvider: Google AI Studio (Generative AI API) via ApiProtocol::GoogleGenerativeAi
- GoogleVertexProvider: Google Cloud Vertex AI via ApiProtocol::GoogleVertex

Google AI Studio
```rust
use phi_core::BasicAgent;
use phi_core::provider::ModelConfig;

let api_key = std::env::var("GOOGLE_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig::google(
    "gemini-2.0-flash",
    "Gemini 2.0 Flash",
    &api_key,
));
```
API Details
- Endpoint: {base_url}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={api_key}
- Auth: API key as query parameter
- Default base URL: https://generativelanguage.googleapis.com
- Default context window: 1,000,000 tokens
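As an illustration of the endpoint pattern above, the following sketch assembles the streaming URL from its three parts. The gemini_stream_url function is a hypothetical helper for this guide, not a phi-core API. Trimming a trailing slash is an assumption made so that either form of base URL works.

```rust
/// Builds the AI Studio streaming endpoint from the documented pattern.
/// Hypothetical helper: the provider's internal construction may differ.
fn gemini_stream_url(base_url: &str, model: &str, api_key: &str) -> String {
    format!(
        "{}/v1beta/models/{}:streamGenerateContent?alt=sse&key={}",
        base_url.trim_end_matches('/'),
        model,
        api_key
    )
}
```

Because the key travels in the query string, it is worth keeping full request URLs out of logs.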
Message Format
Google uses a different message format than OpenAI/Anthropic:
| phi-core | Google API |
|---|---|
| user role | user role |
| assistant role | model role |
| Content::Text | {"text": "..."} |
| Content::Image | {"inlineData": {...}} |
| Content::ToolCall | {"functionCall": {...}} |
| Message::ToolResult | {"functionResponse": {...}} |
| System prompt | systemInstruction field |
| Tools | tools[].functionDeclarations[] |
Streaming
Uses SSE format (alt=sse). Each chunk contains candidates with content.parts and optional usageMetadata.
Google Vertex AI
GoogleVertexProvider uses the same message format but with Vertex AI authentication and endpoints.
```rust
use phi_core::BasicAgent;
use phi_core::provider::{ModelConfig, ApiProtocol};

// Vertex AI uses OAuth2 Bearer tokens as the api_key
let access_token = get_access_token(); // your OAuth2 helper
let agent = BasicAgent::new(ModelConfig {
    id: "gemini-2.0-flash".into(),
    name: "Gemini 2.0 Flash (Vertex)".into(),
    api: ApiProtocol::GoogleVertex,
    provider: "google_vertex".into(),
    base_url: "https://us-central1-aiplatform.googleapis.com".into(),
    api_key: access_token,
    ..Default::default()
});
```
- Protocol: ApiProtocol::GoogleVertex
- Auth: OAuth2 / service account credentials (Bearer token in api_key)
- Endpoint pattern: https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/google/models/{model}:streamGenerateContent
Amazon Bedrock Provider
Handles the AWS Bedrock ConverseStream API. Selected automatically when
ModelConfig.api == ApiProtocol::BedrockConverseStream.
Usage
```rust
use phi_core::BasicAgent;
use phi_core::provider::{ModelConfig, ApiProtocol};

// With static credentials in api_key: "ACCESS_KEY:SECRET_KEY" or "ACCESS_KEY:SECRET_KEY:SESSION_TOKEN"
let creds = std::env::var("AWS_BEDROCK_CREDENTIALS").unwrap_or_default();
let agent = BasicAgent::new(ModelConfig {
    id: "anthropic.claude-3-sonnet-20240229-v1:0".into(),
    name: "Claude Sonnet (Bedrock)".into(),
    api: ApiProtocol::BedrockConverseStream,
    provider: "bedrock".into(),
    base_url: "https://bedrock-runtime.us-east-1.amazonaws.com".into(),
    api_key: creds, // "access_key:secret_key[:session_token]", or "" for IAM roles
    ..Default::default()
});
```
Authentication
The api_key field uses a colon-separated format:
{access_key_id}:{secret_access_key}
{access_key_id}:{secret_access_key}:{session_token}
For IAM roles (e.g., EC2 instance profiles, ECS task roles), pass an empty api_key and provide
pre-computed Authorization headers via ModelConfig.headers.
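A minimal sketch of splitting the colon-separated api_key format follows. AwsCredentials and parse_bedrock_api_key are hypothetical names used only for illustration. Returning None for an empty or incomplete string is an assumption about how the IAM-role case could be signalled, not phi-core's confirmed behavior.

```rust
/// Hypothetical holder for the parsed credential parts.
#[derive(Debug, PartialEq)]
struct AwsCredentials {
    access_key_id: String,
    secret_access_key: String,
    session_token: Option<String>,
}

/// Parses "access:secret" or "access:secret:token".
/// Returns None when the key is empty (IAM-role case) or incomplete.
fn parse_bedrock_api_key(api_key: &str) -> Option<AwsCredentials> {
    // At most three parts; anything after the second colon is the session token.
    let mut parts = api_key.splitn(3, ':');
    let access_key_id = parts.next().filter(|s| !s.is_empty())?;
    let secret_access_key = parts.next().filter(|s| !s.is_empty())?;
    let session_token = parts.next().filter(|s| !s.is_empty()).map(str::to_string);

    Some(AwsCredentials {
        access_key_id: access_key_id.to_string(),
        secret_access_key: secret_access_key.to_string(),
        session_token,
    })
}
```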
API Details
- Endpoint: {base_url}/model/{model}/converse-stream
- Default base URL: https://bedrock-runtime.us-east-1.amazonaws.com
- Protocol: ApiProtocol::BedrockConverseStream
Message Format
Bedrock uses its own content block format:
| phi-core | Bedrock API |
|---|---|
| Content::Text | {"text": "..."} |
| Content::Image | {"image": {"format": "...", "source": {"bytes": "..."}}} |
| Content::ToolCall | {"toolUse": {"toolUseId": "...", "name": "...", "input": ...}} |
| Message::ToolResult | {"toolResult": {"toolUseId": "...", "content": [...], "status": "success"}} |
| System prompt | system array of text blocks |
| Tools | toolConfig.tools[].toolSpec |
| Max tokens | inferenceConfig.maxTokens |
Stream Events
Bedrock's ConverseStream returns these event types:
- contentBlockStart: New content block (text or tool use)
- contentBlockDelta: Text or tool use input delta
- contentBlockStop: Block complete
- messageStop: Stop reason (end_turn, max_tokens, tool_use)
- metadata: Token usage
Azure OpenAI Provider
Handles the OpenAI Responses API format with Azure-specific authentication and URL patterns.
Selected automatically when ModelConfig.api == ApiProtocol::AzureOpenAiResponses.
Usage
```rust
use phi_core::BasicAgent;
use phi_core::provider::{ModelConfig, ApiProtocol};

let api_key = std::env::var("AZURE_OPENAI_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig {
    id: "gpt-4o".into(),
    name: "GPT-4o (Azure)".into(),
    api: ApiProtocol::AzureOpenAiResponses,
    provider: "azure_openai".into(),
    base_url: "https://my-resource.openai.azure.com/openai/deployments/my-deployment".into(),
    api_key,
    ..Default::default()
});
```
Authentication
Uses the api-key header (not Authorization: Bearer):
api-key: {your_api_key}
Additional headers can be set via ModelConfig.headers (e.g., for Azure AD Bearer tokens).
URL Format
https://{resource}.openai.azure.com/openai/deployments/{deployment}
Set this as ModelConfig.base_url. The provider appends /responses?api-version=2025-01-01-preview.
API Details
- Protocol: ApiProtocol::AzureOpenAiResponses
- Format: OpenAI Responses API (not Chat Completions)
- Streaming: SSE with event types:
  - response.output_text.delta: Text content
  - response.function_call_arguments.start: Tool call start
  - response.function_call_arguments.delta: Tool call arguments
  - response.completed: Final usage data
Message Format
Uses the Responses API input format:
| phi-core | Azure Responses API |
|---|---|
| User message | {"role": "user", "content": "..."} |
| Assistant text | {"type": "message", "role": "assistant", "content": [{"type": "output_text", ...}]} |
| Tool call | {"type": "function_call", "call_id": "...", "name": "...", "arguments": "..."} |
| Tool result | {"type": "function_call_output", "call_id": "...", "output": "..."} |
| System prompt | instructions field |
Built-in Tools
phi-core ships with six coding-oriented tools. Get them all with default_tools():
```rust
use phi_core::tools::default_tools;

let tools = default_tools();
```
BashTool
Execute shell commands with timeout and output capture.
- Name: bash
- Parameters: command (string, required)

Configuration
```rust
pub struct BashTool {
    pub cwd: Option<String>,           // Working directory
    pub timeout: Duration,             // Default: 120s
    pub max_output_bytes: usize,       // Default: 256KB
    pub deny_patterns: Vec<String>,    // Blocked commands
    pub confirm_fn: Option<ConfirmFn>, // Confirmation callback
}
```
Default deny patterns: rm -rf /, rm -rf /*, mkfs, dd if=, fork bomb.
Example
```rust
let bash = BashTool::default();

// Or customize:
let bash = BashTool {
    cwd: Some("/workspace".into()),
    timeout: Duration::from_secs(60),
    ..Default::default()
};
```
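How deny_patterns are matched is not specified here. The sketch below assumes plain substring matching, which is the simplest reading of "blocked commands". The first_denied function is a hypothetical helper, not phi-core's implementation.

```rust
/// Returns the first deny pattern contained in `command`, if any.
/// Assumption: patterns are matched as plain substrings.
fn first_denied<'a>(command: &str, deny_patterns: &'a [String]) -> Option<&'a str> {
    deny_patterns
        .iter()
        .map(String::as_str)
        .find(|pattern| command.contains(*pattern))
}
```

Under this assumption a pattern such as rm -rf / would also reject rm -rf /tmp/build, since the shorter string is a prefix of the longer one. A confirmation callback like confirm_fn is one way to handle cases a blunt pattern list cannot distinguish.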
ReadFileTool
Read file contents with optional line range.
- Name: read_file
- Parameters: path (required), offset (optional, 1-indexed line), limit (optional, number of lines)

Configuration
```rust
pub struct ReadFileTool {
    pub max_bytes: usize,           // Default: 1MB
    pub allowed_paths: Vec<String>, // Path restrictions (empty = no restriction)
}
```
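The offset and limit parameters describe a 1-indexed line window. A minimal sketch of that selection, using a hypothetical select_lines helper, is shown below. Treating an offset of 0 the same as 1 is an assumption.

```rust
/// Selects up to `limit` lines starting at the 1-indexed line `offset`.
/// Missing `offset` means "from the first line"; missing `limit` means "to the end".
fn select_lines(content: &str, offset: Option<usize>, limit: Option<usize>) -> String {
    // Convert the 1-indexed offset to a 0-indexed skip count (0 is treated as 1).
    let start = offset.unwrap_or(1).saturating_sub(1);
    let selected: Vec<&str> = content
        .lines()
        .skip(start)
        .take(limit.unwrap_or(usize::MAX))
        .collect();
    selected.join("\n")
}
```

An offset past the end of the file yields an empty string here rather than an error. A real tool might prefer to report that case to the model.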
WriteFileTool
Write content to a file. Creates parent directories automatically.
- Name: write_file
- Parameters: path (required), content (required)
EditFileTool
Surgical search/replace edits. The most important tool for coding agents — instead of rewriting entire files, the agent specifies exact text to find and replace.
- Name: edit_file
- Parameters: path (required), old_text (required), new_text (required)
The old_text must match exactly, including whitespace and indentation.
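The exact-match requirement can be illustrated with a small in-memory sketch. The apply_edit function is hypothetical and operates on a string rather than a file. Replacing only the first occurrence and rejecting an empty old_text are assumptions, since the behavior for repeated or empty matches is not stated above.

```rust
/// Applies one search/replace edit to `content`.
/// Fails when `old_text` does not occur verbatim, whitespace included.
fn apply_edit(content: &str, old_text: &str, new_text: &str) -> Result<String, String> {
    if old_text.is_empty() {
        return Err("old_text must not be empty".to_string());
    }
    if !content.contains(old_text) {
        return Err("old_text not found (check whitespace and indentation)".to_string());
    }
    // Assumption: only the first occurrence is replaced.
    Ok(content.replacen(old_text, new_text, 1))
}
```

A tab-indented old_text will not match a space-indented line, which is the usual reason an edit is rejected. Returning the error text gives the model a chance to re-read the file and retry.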
ListFilesTool
List files and directories with optional glob filtering.
- Name: list_files
- Parameters: path (optional, default: .), pattern (optional glob)

Configuration
```rust
pub struct ListFilesTool {
    pub max_results: usize, // Default: 200
    pub timeout: Duration,  // Default: 10s
}
```
Uses find or fd for efficient traversal.
SearchTool
Search files using grep (or ripgrep if available).
- Name: search
- Parameters: pattern (required, regex), path (optional root directory)

Configuration
```rust
pub struct SearchTool {
    pub root: Option<String>, // Root directory
    pub max_results: usize,   // Default: 50
    pub timeout: Duration,    // Default: 30s
}
```
Returns matching lines with file paths and line numbers.
PrunTool
Model-directed context pruning. Removes the oldest inrun_context entries (model-generated messages) from the working context to reclaim space in the context window. Pruned content is preserved in the session log.
- Name: prun
- Parameters: tokens (integer, required), the approximate number of tokens to reclaim
The tool removes inrun_context entries oldest-first until the requested token budget is met. User messages are never affected. Returns a confirmation with the actual token count reclaimed.
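The oldest-first removal rule can be sketched over a simplified in-memory context. Origin, Entry, and prune below are hypothetical types and names for illustration, not phi-core's data model. The sketch also omits the session log that preserves pruned content.

```rust
/// Hypothetical marker for who produced a context entry.
#[derive(Debug, Clone, PartialEq)]
enum Origin {
    User,
    InRun,
}

/// Hypothetical context entry with a precomputed token estimate.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    origin: Origin,
    tokens: usize,
    text: String,
}

/// Removes in-run entries oldest-first until at least `budget` tokens are reclaimed.
/// User entries are never removed. Returns the tokens actually reclaimed.
fn prune(entries: &mut Vec<Entry>, budget: usize) -> usize {
    let mut reclaimed = 0;
    // `retain` visits entries in their original (oldest-first) order.
    entries.retain(|entry| {
        if reclaimed < budget && entry.origin == Origin::InRun {
            reclaimed += entry.tokens;
            false // drop this entry
        } else {
            true // keep user messages and everything after the budget is met
        }
    });
    reclaimed
}
```

Because whole entries are removed, the reclaimed total can exceed the request, which is why the tool reports the actual count.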
Configuration
```rust
let agent = BasicAgent::new(model_config)
    .with_prun_tool(); // enables both prun and prun_with_memo
```
PrunWithMemoTool
Context pruning with a summary replacement. Same removal behavior as prun, but inserts a concise memo at the position of the earliest pruned message so the model retains key takeaways.
- Name: prun_with_memo
- Parameters:
  - tokens (integer, required): approximate number of tokens to reclaim
  - memo (string, required): concise summary to retain in working context
The memo appears at the original timestamp of the earliest pruned message, preserving conversation chronology. Useful when pruned content contained decisions or conclusions worth remembering.
See Context Pruning for the full design.
Configuration
AgentLoopConfig
The main configuration for the agent loop:
```rust
pub struct AgentLoopConfig {
    /// REQUIRED: complete provider identity (model id, api_key, base_url, protocol, compat flags, cost rates).
    pub model_config: ModelConfig,
    /// Custom provider override. When Some, bypasses ProviderRegistry. Use for MockProvider in tests.
    pub provider_override: Option<Arc<dyn StreamProvider>>,
    /// Stable config identity for loop_id generation.
    pub config_id: Option<String>,
    pub thinking_level: ThinkingLevel,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub convert_to_llm: Option<ConvertToLlmFn>,
    pub transform_context: Option<TransformContextFn>,
    pub get_steering_messages: Option<GetMessagesFn>,
    pub get_follow_up_messages: Option<GetMessagesFn>,
    /// Context config (includes CompactionConfig with strategies and token counter).
    pub context_config: Option<ContextConfig>,
    pub execution_limits: Option<ExecutionLimits>,
    pub cache_config: CacheConfig,
    pub tool_execution: ToolExecutionStrategy,
    pub retry_config: RetryConfig,

    // ── Lifecycle callbacks ──
    pub before_turn: Option<BeforeTurnFn>,
    pub after_turn: Option<AfterTurnFn>,
    pub before_loop: Option<BeforeLoopFn>,
    pub after_loop: Option<AfterLoopFn>,
    pub before_tool_execution: Option<BeforeToolExecutionFn>,
    pub after_tool_execution: Option<AfterToolExecutionFn>,
    pub before_tool_execution_update: Option<BeforeToolExecutionUpdateFn>,
    pub after_tool_execution_update: Option<AfterToolExecutionUpdateFn>,
    /// Compaction lifecycle callbacks (G1).
    pub before_compaction_start: Option<BeforeCompactionStartFn>,
    pub after_compaction_end: Option<AfterCompactionEndFn>,
    pub on_error: Option<OnErrorFn>,

    pub input_filters: Vec<Arc<dyn InputFilter>>,
    pub first_turn_trigger: TurnTrigger,
    /// Context translation strategy for cross-provider compatibility (G8).
    pub context_translation: Option<Arc<dyn ContextTranslationStrategy>>,
    /// Shared state for PrunTool to communicate pruning requests to the loop.
    pub prun_pending: Option<Arc<Mutex<Vec<PrunRequest>>>>,
}
```
Note: Compaction strategies (in_memory_strategy, block_strategy) are fields on CompactionConfig (inside ContextConfig), not on AgentLoopConfig. The token_counter for pluggable token counting is also on ContextConfig.
StreamConfig
Internal config passed to StreamProvider::stream(). All provider identity comes from model_config:
```rust
pub struct StreamConfig {
    /// REQUIRED: full provider identity (id, api_key, base_url, compat, cost).
    pub model_config: ModelConfig,
    pub system_prompt: String,
    pub messages: Vec<Message>,
    pub tools: Vec<ToolDefinition>,
    pub thinking_level: ThinkingLevel,
    pub max_tokens: Option<u32>, // overrides model_config.max_tokens when Some
    pub temperature: Option<f32>,
    pub cache_config: CacheConfig,
}
```
ContextConfig
Controls context window management and compaction:
```rust
pub struct ContextConfig {
    pub max_context_tokens: usize,                    // Default: 100,000
    pub system_prompt_tokens: usize,                  // Default: 4,000
    pub compaction: CompactionConfig,                 // Full compaction policy (nested)
    pub token_counter: Option<Arc<dyn TokenCounter>>, // Pluggable token counting (REQ-162)

    // Legacy fields (backward compat; use compaction.* instead):
    pub keep_recent: usize,           // Default: 10
    pub keep_first: usize,            // Default: 2
    pub tool_output_max_lines: usize, // Default: 50
}
```
CompactionConfig
```rust
pub struct CompactionConfig {
    // WHEN to compact:
    pub compact_at_pct: f64,               // Default: 0.90
    pub compact_budget_threshold_pct: f64, // Default: 0.05
    pub compaction_scope: CompactionScope, // Default: FixedCount(3)

    // HOW to compact:
    pub keep_first_turns: usize,       // Default: 2
    pub keep_recent_turns: usize,      // Default: 10
    pub max_summary_tokens: usize,     // Default: 2,000
    pub tool_output_max_lines: usize,  // Default: 50
    pub focus_message: Option<String>, // Guides summarization focus

    // Strategy objects (G5, moved from AgentLoopConfig):
    pub in_memory_strategy: Option<Arc<dyn CompactionStrategy>>,
    pub block_strategy: Option<Arc<dyn BlockCompactionStrategy>>,
}
```
ExecutionLimits
Prevents runaway agents:
```rust
pub struct ExecutionLimits {
    pub max_turns: usize,        // Default: 50
    pub max_total_tokens: usize, // Default: 1,000,000
    pub max_duration: Duration,  // Default: 600s
    pub max_cost: Option<f64>,   // Default: None (no cost cap)
}
```
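To show how such limits might be checked between turns, here is a standalone sketch. Limits, Progress, and exceeded are hypothetical local definitions for illustration. Whether phi-core compares with >= or >, and in which order, is an assumption.

```rust
use std::time::Duration;

/// Local mirror of the limit fields shown above (not phi-core's own type).
struct Limits {
    max_turns: usize,
    max_total_tokens: usize,
    max_duration: Duration,
    max_cost: Option<f64>,
}

/// Hypothetical running totals for one agent run.
struct Progress {
    turns: usize,
    total_tokens: usize,
    elapsed: Duration,
    cost: f64,
}

/// Returns the name of the first limit that has been reached, if any.
fn exceeded(limits: &Limits, progress: &Progress) -> Option<&'static str> {
    if progress.turns >= limits.max_turns {
        return Some("max_turns");
    }
    if progress.total_tokens >= limits.max_total_tokens {
        return Some("max_total_tokens");
    }
    if progress.elapsed >= limits.max_duration {
        return Some("max_duration");
    }
    match limits.max_cost {
        // The cost cap applies only when one is configured.
        Some(cap) if progress.cost >= cap => Some("max_cost"),
        _ => None,
    }
}
```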
ThinkingLevel
```rust
pub enum ThinkingLevel {
    Off,     // No thinking (default)
    Minimal, // 128 tokens (Anthropic budget)
    Low,     // 512 tokens
    Medium,  // 2,048 tokens
    High,    // 8,192 tokens
}
```
CostConfig
Token pricing per million:
```rust
pub struct CostConfig {
    pub input_per_million: f64,
    pub output_per_million: f64,
    pub cache_read_per_million: f64,
    pub cache_write_per_million: f64,
}
```
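Per-million pricing converts to a run cost by dividing each token count by 1,000,000 and multiplying by the matching rate. The sketch below uses hypothetical Rates and TokenUsage structs and an estimate_cost helper. The rates in the usage note are illustrative numbers, not real provider pricing.

```rust
/// Local mirror of the per-million rate fields shown above.
struct Rates {
    input_per_million: f64,
    output_per_million: f64,
    cache_read_per_million: f64,
    cache_write_per_million: f64,
}

/// Hypothetical token counts accumulated over a run.
struct TokenUsage {
    input: u64,
    output: u64,
    cache_read: u64,
    cache_write: u64,
}

/// cost = sum over categories of (tokens / 1,000,000) * rate_per_million
fn estimate_cost(rates: &Rates, usage: &TokenUsage) -> f64 {
    let part = |tokens: u64, rate: f64| tokens as f64 / 1_000_000.0 * rate;
    part(usage.input, rates.input_per_million)
        + part(usage.output, rates.output_per_million)
        + part(usage.cache_read, rates.cache_read_per_million)
        + part(usage.cache_write, rates.cache_write_per_million)
}
```

With an input rate of 3.0 and an output rate of 15.0, a run that uses 1,000,000 input tokens and 500,000 output tokens costs 3.0 + 7.5 = 10.5. With all rates left at 0.0 the estimate is 0.0, matching the "no tracking" convention noted on ModelConfig.cost.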
API Reference
Top-Level Functions
agent_loop()
```rust
pub async fn agent_loop(
    prompts: Vec<AgentMessage>,
    context: &mut AgentContext,
    config: &AgentLoopConfig,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
```
Start an agent loop with new prompt messages. Returns all messages generated during the run.
agent_loop_continue()
```rust
pub async fn agent_loop_continue(
    context: &mut AgentContext,
    config: &AgentLoopConfig,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
```
Resume from existing context. The last message must not be an assistant message.
default_tools()
```rust
pub fn default_tools() -> Vec<Arc<dyn AgentTool>>
```
Returns: BashTool, ReadFileTool, WriteFileTool, EditFileTool, ListFilesTool, SearchTool.
Agent Trait
The runtime interface for all agent implementations. Program against this trait to remain independent of the specific implementation.
```rust
use phi_core::Agent; // trait must be in scope to call trait methods
```
Trait methods cover: prompting (prompt, prompt_messages, prompt_with_sender, prompt_messages_with_sender, continue_loop, continue_loop_with_sender), state access (messages, is_streaming, agent_id, session_id, last_loop_id), message mutation (clear_messages, append_message, replace_messages, save_messages, restore_messages, set_tools), control (abort, reset), and steering/follow-up queues (steer, follow_up, clear_steering_queue, clear_follow_up_queue, clear_all_queues, set_steering_mode, set_follow_up_mode).
The trait is object-safe: Box<dyn Agent> and &mut dyn Agent work for runtime polymorphism.
phi_core::* re-exports Agent, so use phi_core::* brings it into scope automatically.
BasicAgent Struct
The default in-memory Agent implementation. Owns a single linear message history, tool registry, and model configuration.
Construction
```rust
let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
));
```
| Signature | Description |
|---|---|
| BasicAgent::new(model_config: ModelConfig) -> Self | Create a new agent with the given model configuration |
Builder Methods
All return Self for chaining (unless noted as Result).
Core
| Method | Description |
|---|---|
| with_system_prompt(prompt) -> Self | Set the system prompt |
| with_thinking(level: ThinkingLevel) -> Self | Set thinking level (Off, Minimal, Low, Medium, High) |
| with_max_tokens(max: u32) -> Self | Set max output tokens |
| with_model_config(config: ModelConfig) -> Self | Replace the entire ModelConfig (id, api_key, base_url, compat, cost, etc.) |
| with_provider_override(provider: Arc<dyn StreamProvider>) -> Self | Bypass ProviderRegistry dispatch and use this provider directly (primarily for testing with MockProvider) |
Tools & Integrations
| Method | Description |
|---|---|
| with_tools(tools: Vec<Arc<dyn AgentTool>>) -> Self | Set tools (replaces existing) |
| with_sub_agent(sub: SubAgentTool) -> Self | Add a sub-agent tool |
| with_skills(skills: SkillSet) -> Self | Load skills and append their index to the system prompt |
| async with_mcp_server_stdio(command, args, env) -> Result<Self, McpError> | Connect to MCP server via stdio and add its tools |
| async with_mcp_server_http(url) -> Result<Self, McpError> | Connect to MCP server via HTTP and add its tools |
| async with_openapi_file(path, config, filter) -> Result<Self, OpenApiError> | Load tools from an OpenAPI spec file (requires openapi feature) |
| async with_openapi_url(url, config, filter) -> Result<Self, OpenApiError> | Fetch spec from URL and add tools (requires openapi feature) |
| with_openapi_spec(spec_str, config, filter) -> Result<Self, OpenApiError> | Parse spec string and add tools (requires openapi feature) |
Workspace & System Prompt
| Method | Description |
|---|---|
| with_workspace(path: impl Into<PathBuf>) -> Self | Set the agent's workspace directory |
Context & Limits
| Method | Description |
|---|---|
| with_context_config(config: ContextConfig) -> Self | Set context compaction config |
| with_execution_limits(limits: ExecutionLimits) -> Self | Set execution limits (max turns, tokens, duration) |
| with_compaction_strategy(strategy: impl CompactionStrategy) -> Self | Set a custom compaction strategy |
| without_context_management() -> Self | Disable automatic context compaction and execution limits |
Behavior
| Method | Description |
|---|---|
| `with_messages(msgs: Vec<AgentMessage>) -> Self` | Pre-load message history |
| `with_cache_config(config: CacheConfig) -> Self` | Set prompt caching configuration |
| `with_tool_execution(strategy: ToolExecutionStrategy) -> Self` | Set tool execution strategy (Parallel, Sequential, Batched) |
| `with_retry_config(config: RetryConfig) -> Self` | Set retry configuration |
| `with_input_filter(filter: impl InputFilter) -> Self` | Add an input filter (runs on user messages before LLM call) |
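To make the three tool execution strategies concrete, the sketch below groups the tool calls from one LLM response into "waves": calls within a wave could run concurrently, and waves run one after another. `Strategy` and `plan_waves` are hypothetical names for illustration only; the sketch plans the grouping and does not execute anything, and it is not how phi-core's `ToolExecutionStrategy` is implemented.

```rust
/// Hypothetical mirror of the three strategies named in the table.
#[derive(Debug, Clone, Copy)]
enum Strategy {
    Parallel,
    Sequential,
    Batched { size: usize },
}

/// Group tool-call names into waves; calls in a wave could run concurrently,
/// and waves run one after another.
fn plan_waves(calls: &[&str], strategy: Strategy) -> Vec<Vec<String>> {
    let width = match strategy {
        Strategy::Parallel => calls.len().max(1), // everything in one wave
        Strategy::Sequential => 1,                // one call per wave
        Strategy::Batched { size } => size.max(1),
    };
    calls
        .chunks(width)
        .map(|wave| wave.iter().map(|c| c.to_string()).collect())
        .collect()
}

fn main() {
    let calls = ["read_file", "search", "bash", "edit_file", "list_files"];
    for s in [Strategy::Parallel, Strategy::Sequential, Strategy::Batched { size: 2 }] {
        println!("{:?} -> {:?}", s, plan_waves(&calls, s));
    }
}
```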
Callbacks
| Method | Description |
|---|---|
| `on_before_loop(f: Fn(&[AgentMessage], u64) -> bool) -> Self` | Called once before AgentStart; return false to abort the entire run |
| `on_after_loop(f: Fn(&[AgentMessage], &Usage)) -> Self` | Called once after AgentEnd with all new messages and accumulated usage |
| `on_before_turn(f: Fn(&[AgentMessage], usize) -> bool) -> Self` | Called before each LLM call; return false to abort |
| `on_after_turn(f: Fn(&[AgentMessage], &Usage)) -> Self` | Called after each LLM response and tool execution |
| `on_error(f: Fn(&str)) -> Self` | Called when the LLM returns StopReason::Error |
| `on_before_tool_execution(f: Fn(&str, &str, &Value) -> bool) -> Self` | Called before each tool call (name, call_id, args); return false to skip |
| `on_after_tool_execution(f: Fn(&str, &str, bool)) -> Self` | Called after each tool call (name, call_id, is_error) |
| `on_before_tool_execution_update(f: Fn(&str, &str, &str) -> bool) -> Self` | Called before each streaming tool update (name, call_id, text); return false to suppress the event |
| `on_after_tool_execution_update(f: Fn(&str, &str, &str)) -> Self` | Called after each streaming tool update (name, call_id, text) |
| `on_before_compaction_start(f: Fn(usize, usize) -> bool) -> Self` | Called before compaction begins (estimated_tokens, message_count); return false to skip compaction |
| `on_after_compaction_end(f: Fn(usize, usize, usize, usize)) -> Self` | Called after compaction completes (messages_before, messages_after, tokens_before, tokens_after) |
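Several of these callbacks return `bool` to veto the step they guard. The sketch below illustrates that gating pattern for a before-tool hook. `BeforeToolFn` and `run_calls` are hypothetical, and the arguments are passed as a raw JSON string instead of the `&Value` shown in the table so that the example needs only the standard library.

```rust
/// Hypothetical hook type modeled on the `on_before_tool_execution` row:
/// (tool name, call id, raw JSON args) -> proceed?
type BeforeToolFn = Box<dyn Fn(&str, &str, &str) -> bool>;

/// Run each (name, call_id, args) call unless the hook vetoes it.
/// Returns the names of the tools that would actually be executed.
fn run_calls(calls: &[(&str, &str, &str)], before: Option<&BeforeToolFn>) -> Vec<String> {
    let mut executed = Vec::new();
    for &(name, call_id, args) in calls {
        // No hook installed means every call is allowed.
        let allowed = before.map_or(true, |hook| hook(name, call_id, args));
        if allowed {
            executed.push(name.to_string()); // a real loop would invoke the tool here
        }
    }
    executed
}

fn main() {
    // Veto any bash call whose arguments mention `rm -rf`.
    let guard: BeforeToolFn = Box::new(|name: &str, _id: &str, args: &str| -> bool {
        !(name == "bash" && args.contains("rm -rf"))
    });
    let calls = [
        ("read_file", "call_1", r#"{"path":"Cargo.toml"}"#),
        ("bash", "call_2", r#"{"command":"rm -rf /tmp/x"}"#),
    ];
    println!("{:?}", run_calls(&calls, Some(&guard)));
}
```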
Prompting
| Method | Description |
|---|---|
| `async prompt(text) -> UnboundedReceiver<AgentEvent>` | Send a text prompt, returns event stream |
| `async prompt_messages(messages) -> UnboundedReceiver<AgentEvent>` | Send messages as prompt |
| `async prompt_with_sender(text, tx: UnboundedSender<AgentEvent>)` | Send a text prompt, streaming events to a caller-provided sender for real-time consumption |
| `async prompt_messages_with_sender(messages, tx)` | Send messages, streaming events to a caller-provided sender |
| `async continue_loop() -> UnboundedReceiver<AgentEvent>` | Resume from current context with ContinuationKind::Default. The continuation_kind field on AgentStart is a plain ContinuationKind, not an Option. |
| `async continue_loop_with_sender(tx: UnboundedSender<AgentEvent>, kind: ContinuationKind)` | Resume from current context with an explicit continuation kind, streaming events to a caller-provided sender |
State Access
| Method | Description |
|---|---|
| `messages() -> &[AgentMessage]` | Get the full message history |
| `is_streaming() -> bool` | Whether the agent is currently running |
| `agent_id() -> &str` | Stable UUID assigned at construction; included in every AgentStart event |
| `session_id() -> &str` | Stable UUID assigned at construction; groups all loops from this Agent instance |
| `last_loop_id() -> Option<&str>` | The loop_id of the most recently started loop; None before first run |
| `workspace() -> Option<&Path>` | The agent's workspace directory, if set (Agent trait method) |
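The glossary later in this document describes a `loop_id` as a string of the form `{session_id}.{config_id}.{N}`, where `N` is a per-`config_id` monotonic counter. A rough sketch of such a generator follows. `LoopIds` and its methods are hypothetical, and starting the counter at 0 is an assumption; the source does not say where `N` starts.

```rust
use std::collections::HashMap;

/// Hypothetical generator for ids of the form "{session_id}.{config_id}.{N}".
struct LoopIds {
    session_id: String,
    counters: HashMap<String, u64>, // one counter per config_id
}

impl LoopIds {
    fn new(session_id: &str) -> Self {
        LoopIds { session_id: session_id.to_string(), counters: HashMap::new() }
    }

    /// Assumption: N starts at 0 and increases by one per call for a given config_id.
    fn next(&mut self, config_id: &str) -> String {
        let n = self.counters.entry(config_id.to_string()).or_insert(0);
        let id = format!("{}.{}.{}", self.session_id, config_id, *n);
        *n += 1;
        id
    }
}

fn main() {
    let mut ids = LoopIds::new("sess-1234");
    println!("{}", ids.next("anthropic-fast"));
    println!("{}", ids.next("anthropic-fast"));
    println!("{}", ids.next("openai-deep"));
}
```

Keeping one counter per `config_id` means ids for different configurations never collide, even when they share a session.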
State Mutation
| Method | Description |
|---|---|
| `set_tools(tools: Vec<Arc<dyn AgentTool>>)` | Replace the tool set |
| `clear_messages()` | Clear all messages |
| `append_message(msg: AgentMessage)` | Add a message to history |
| `replace_messages(msgs: Vec<AgentMessage>)` | Replace all messages |
| `save_messages() -> Result<String, serde_json::Error>` | Serialize message history to JSON |
| `restore_messages(json: &str) -> Result<(), serde_json::Error>` | Restore message history from JSON |
Steering & Follow-Up Queues
| Method | Description |
|---|---|
| `steer(msg: AgentMessage)` | Queue a steering message (interrupts mid-tool-execution) |
| `follow_up(msg: AgentMessage)` | Queue a follow-up message (processed after agent finishes) |
| `clear_steering_queue()` | Clear pending steering messages |
| `clear_follow_up_queue()` | Clear pending follow-up messages |
| `clear_all_queues()` | Clear both queues |
| `set_steering_mode(mode: QueueMode)` | Set delivery mode: OneAtATime or All |
| `set_follow_up_mode(mode: QueueMode)` | Set delivery mode: OneAtATime or All |
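The difference between the two delivery modes is easiest to see on a plain queue: `OneAtATime` hands over only the first pending message per read, while `All` drains everything. The sketch below models this with a `VecDeque`; `Mode` and `take` are hypothetical names and do not reflect how phi-core stores its queues.

```rust
use std::collections::VecDeque;

/// Hypothetical mirror of the two delivery modes.
#[derive(Clone, Copy)]
enum Mode {
    OneAtATime,
    All,
}

/// Take the messages that one read of the queue would deliver.
fn take(queue: &mut VecDeque<String>, mode: Mode) -> Vec<String> {
    match mode {
        // Pop only the first queued message (or nothing if the queue is empty).
        Mode::OneAtATime => queue.pop_front().into_iter().collect(),
        // Drain the entire queue at once.
        Mode::All => queue.drain(..).collect(),
    }
}

fn main() {
    let mut q: VecDeque<String> = ["stop", "use rg instead", "then run tests"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    println!("{:?}", take(&mut q, Mode::OneAtATime));
    println!("{:?}", take(&mut q, Mode::All));
    println!("remaining: {}", q.len());
}
```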
Control
| Method | Description |
|---|---|
| `abort()` | Cancel the current run via CancellationToken |
| `reset()` | Clear all state (messages, queues, streaming flag) |
Session Callback Types
| Type | Signature | Description |
|---|---|---|
| `BeforeTaskFn` | `Arc<dyn Fn(&Session) -> bool + Send + Sync>` | Called on first AgentStart with a new session_id. Parameter is the Session. Return false to reject. |
| `AfterTaskFn` | `Arc<dyn Fn(&Session) + Send + Sync>` | Called in flush() when the session is finalized. Parameter is the completed Session. |
These are set on SessionRecorderConfig and fire at the session level (not per-loop). See Sessions for usage.
Re-exports
The crate re-exports key types from lib.rs:
```rust
// Agent system
pub use agents::{Agent, AgentProfile, BasicAgent, QueueMode};
pub use agents::SubAgentTool;

// Agent loop
pub use agent_loop::{agent_loop, agent_loop_continue, agent_loop_parallel};
pub use agent_loop::evaluation::{
    ElaborateEvaluation, LlmJudgeEvaluation, PickFirstEvaluation,
    TokenEfficientEvaluation, TransparentEvaluation,
};

// Config-driven construction
pub use config::{
    agent_from_config, agent_from_config_with_registry, agents_from_config,
    parse_config, parse_config_file, AgentConfig, ConfigError, ConfigFormat,
};

// Context management
pub use context::{
    CompactionStrategy, CompactionConfig, CompactionScope, ContextConfig,
    DefaultCompaction, DefaultBlockCompaction, BlockCompactionStrategy,
    ContextTracker, CompactionBlock, CompactedSection, TurnMap, TurnRange,
    build_context_from_session, compact_session_loops,
};
pub use context::skills::SkillSet;

// Session persistence
pub use session::{
    Session, SessionRecorder, SessionRecorderConfig, SessionScope, SessionError,
    LoopRecord, LoopEvent, LoopStatus, Turn, LoopConfigSnapshot,
    ParallelGroupRecord, ChildLoopRef, SpawnRef, SessionFormation,
    save_session, load_session, list_session_ids, delete_session,
    load_sessions_for_agent,
};

// Provider
pub use provider::retry::RetryConfig;

// Types (glob re-export)
pub use types::*;
// Message, Content, AgentMessage, AgentEvent, Usage, LlmMessage,
// StopReason, StreamDelta, TurnTrigger, ThinkingLevel, CacheConfig, etc.
```
0.7.0 additions (reachable via module paths)
These symbols were added in 0.7.0 but are not (yet) part of the top-level glob. Import them via their module path:
```rust
// Session: trait-based pluggable store + atomic-write filesystem impl with
// advisory locks (fs2 exclusive lock; returns SessionError::Locked on contention).
use phi_core::session::{SessionStore, FileSystemSessionStore};

// Provider: credential refresh hook for long-running agents whose token expires
// mid-run. On ProviderError::Auth, the loop invalidates the cached credential
// and retries once before propagating.
use phi_core::provider::{CredentialProvider, StaticCredentialProvider};

// Provider: structured-output contract. JsonObject = free-form JSON;
// JsonSchema = strict schema (native where supported, tool-call emulation on
// Anthropic and Anthropic-on-Bedrock, SchemaMismatch on others).
use phi_core::provider::ResponseFormat;

// Agent: fallible build_config(). Default impl returns Err(MissingModelConfig)
// instead of panicking when model_config() is None.
use phi_core::agents::AgentBuildError;

// MCP: configurable per-request timeout (default 30s) on both stdio + HTTP
// transports. Use McpClientConfig with connect_stdio_with_config /
// connect_http_with_config.
use phi_core::mcp::{McpClientConfig, DEFAULT_REQUEST_TIMEOUT};
```
phi-core — Project Overview
1. Purpose Statement
phi-core is a Rust async library for building stateful, multi-turn LLM agents that can autonomously execute tools to accomplish tasks. The library solves the core engineering problems of agent construction: routing between many LLM provider APIs through a unified interface, running a prompt-then-tool-call loop until the model signals completion, streaming real-time events to UI consumers, and automatically managing context windows so conversations do not exceed model token limits. It is designed to be embedded as a dependency in application code — it provides no standalone binary, no HTTP server, and no user interface of its own.
2. Key Capabilities
| Capability | Source Location |
|---|---|
| Multi-turn conversation loop (prompt → LLM → tool call → repeat) | src/agent_loop/ |
| Support for 20+ LLM providers via 7 distinct API protocols | src/provider/ |
| Real-time event streaming over an async channel | src/types/ (AgentEvent), src/agent_loop/ |
| Parallel, sequential, or batched tool execution | src/agent_loop/:execute_tool_calls() |
| Context compaction via CompactionBlock overlays (legacy: tiered compact_messages()) | src/context/ (CompactionBlock; legacy compact_messages()) |
| Built-in coding tools: bash execution, file read/write/edit, directory listing, grep search | src/tools/ |
| Sub-agent delegation: run an isolated child agent as a tool | src/agents/sub_agent.rs |
| Model Context Protocol (MCP) client for stdio and HTTP tool servers | src/mcp/ |
| AgentSkills system: load instruction sets from directory-based skill files | src/context/skills.rs |
| OpenAPI tool auto-generation from spec files or URLs (optional feature) | src/openapi/ |
| JSON serialization of entire conversation history for persistence | src/types/ (all types derive Serialize/Deserialize) |
| Exponential-backoff retry for rate-limit and network errors | src/provider/retry.rs |
| Prompt caching hints for compatible providers (Anthropic) | src/types/ (CacheConfig) |
| Extended thinking / reasoning mode | src/types/ (ThinkingLevel) |
| Lifecycle callbacks: before/after each turn, on error | src/agent_loop/ (BeforeTurnFn, AfterTurnFn, OnErrorFn) |
| Loop-level hooks: setup/teardown around each complete agent run | src/agent_loop/ (BeforeLoopFn, AfterLoopFn) |
| Tool-level hooks: intercept each tool execution and streaming update | src/agent_loop/ (BeforeToolExecutionFn, AfterToolExecutionFn, BeforeToolExecutionUpdateFn, AfterToolExecutionUpdateFn) |
| Agent identity: stable agent_id / session_id / loop_id for cross-loop traceability | src/agents/basic_agent.rs, src/types/ |
| Evaluational parallelism: agent_loop_parallel() runs N AgentLoopConfigs concurrently on the same prompt, evaluates results via the pluggable EvaluationStrategy trait, and delivers the best outcome. Built-in strategies: TransparentEvaluation, PickFirstEvaluation, TokenEfficientEvaluation, ElaborateEvaluation, LlmJudgeEvaluation (with iterative compaction to satisfy judge's comprehension criteria). ParallelLoopStart/ParallelLoopEnd events bracket execution. Session continuity: selected_context feeds directly into agent_loop_continue(). | src/agent_loop/ (agent_loop_parallel), src/agent_loop/evaluation.rs, src/types/ |
| Continuation kinds: Initial, Default, Rerun, Branch, Compaction variants for origin, retry, explore, and compaction semantics | src/types/ (ContinuationKind), src/agent_loop/ |
| Input filtering: moderation, PII redaction, injection detection | src/types/ (InputFilter) |
| User steering mid-run: inject messages between tool calls | src/agents/basic_agent.rs (steering queue), src/agent_loop/ |
| Follow-up work queuing: append more tasks after agent would stop | src/agents/basic_agent.rs (follow-up queue), src/agent_loop/ |
| Execution limits: max turns, max total tokens, max duration | src/context/ (ExecutionLimits, ExecutionTracker) |
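The retry capability listed above uses exponential backoff. As an illustration of the general technique only, the sketch below computes a capped exponential delay and lets a server-provided retry-after hint take precedence. `Backoff`, its fields, and the precedence rule are assumptions for the example; they are not the fields or behavior of phi-core's `RetryConfig`, and jitter is omitted to keep the example deterministic.

```rust
use std::time::Duration;

/// Hypothetical settings; field names are illustrative, not phi-core's `RetryConfig`.
struct Backoff {
    base: Duration,
    factor: u32,
    cap: Duration,
}

impl Backoff {
    /// Delay before retry number `attempt` (0-based): base * factor^attempt, capped.
    /// Assumption: a server-provided retry-after hint, when present, wins.
    fn delay(&self, attempt: u32, retry_after: Option<Duration>) -> Duration {
        if let Some(hint) = retry_after {
            return hint;
        }
        // Saturating arithmetic avoids overflow for large attempt numbers.
        let multiplier = self.factor.saturating_pow(attempt);
        self.base.saturating_mul(multiplier).min(self.cap)
    }
}

fn main() {
    let backoff = Backoff {
        base: Duration::from_millis(500),
        factor: 2,
        cap: Duration::from_secs(8),
    };
    for attempt in 0..6 {
        println!("attempt {attempt}: wait {:?}", backoff.delay(attempt, None));
    }
}
```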
3. Inputs & Outputs
Inputs
| Input | Format | Description |
|---|---|---|
| User prompt | Vec<AgentMessage> or String | Text (or multi-content) messages to start or continue a conversation |
| System prompt | String | Instruction set defining agent behavior, injected at each LLM call |
| Tool definitions | Vec<Arc<dyn AgentTool>> | Executable tools exposed to the LLM via JSON Schema |
| LLM provider config | ModelConfig | Single provider identity card: id, api_key, base_url, api: ApiProtocol, cost, compat. Factory methods: ModelConfig::anthropic(), ::openai(), ::local(), ::google(), ::openrouter(). Pass to BasicAgent::new() or AgentLoopConfig.model_config. |
| Steering messages | Vec<AgentMessage> via queue | User-injected messages that interrupt mid-run tool execution |
| Follow-up messages | Vec<AgentMessage> via queue | Queued tasks appended when the agent would otherwise stop |
| Context config | ContextConfig | Token budget, compaction parameters |
| Execution limits | ExecutionLimits | Max turns, tokens, duration |
| Skill directories | Vec<PathBuf> | Directories containing SKILL.md files |
| MCP server commands | Command string, args, env | Stdio or HTTP MCP server specifications |
| OpenAPI spec | File path, URL, or YAML/JSON string | API specs to auto-generate tools from |
| Cancellation token | CancellationToken | External abort signal |
Outputs
| Output | Format | Description |
|---|---|---|
| Agent event stream | UnboundedReceiver<AgentEvent> | Real-time stream of all events (text deltas, tool calls, results, errors) |
| Final messages | Vec<AgentMessage> | All new messages produced in the run (returned from agent_loop()) |
| Serialized conversation | JSON | Complete message history, serializable for persistence |
| Tool results | Embedded in AgentEvent::ToolExecutionEnd | Structured result of each tool call |
| Usage statistics | Usage struct per turn | Input/output/cache token counts per LLM call |
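A consumer of the event stream typically loops over the receiver, accumulates text deltas, and reacts to tool results until the run ends. The sketch below shows that shape using `std::sync::mpsc` as a synchronous stand-in for the `UnboundedReceiver<AgentEvent>` named in the table. The `Event` enum is a hypothetical, heavily reduced model; the real `AgentEvent` variants and their fields are not reproduced here.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical, heavily reduced event type; the real `AgentEvent` has many more variants.
enum Event {
    TextDelta(String),
    ToolEnd { name: String, is_error: bool },
    AgentEnd,
}

/// Drain the receiver until `AgentEnd`, returning the assembled text and tool error count.
fn consume(rx: mpsc::Receiver<Event>) -> (String, usize) {
    let mut text = String::new();
    let mut tool_errors = 0;
    for event in rx {
        match event {
            Event::TextDelta(chunk) => text.push_str(&chunk),
            Event::ToolEnd { name, is_error } => {
                if is_error {
                    eprintln!("tool {name} failed");
                    tool_errors += 1;
                }
            }
            Event::AgentEnd => break,
        }
    }
    (text, tool_errors)
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // The producer thread plays the role of the agent loop.
    let producer = thread::spawn(move || {
        tx.send(Event::TextDelta("Hello, ".into())).unwrap();
        tx.send(Event::ToolEnd { name: "bash".into(), is_error: false }).unwrap();
        tx.send(Event::TextDelta("world".into())).unwrap();
        tx.send(Event::AgentEnd).unwrap();
    });
    let (text, errors) = consume(rx);
    producer.join().unwrap();
    println!("{text} ({errors} tool errors)");
}
```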
4. Actors & Use Cases
Application Developer
The primary consumer. Embeds phi-core as a library dependency.
| Use Case | How Triggered |
|---|---|
| Build a coding assistant | Create a BasicAgent, attach built-in tools, call agent.prompt("...") |
| Build a CLI REPL | Loop reading stdin, call agent.prompt(), render events (see examples/cli.rs) |
| Persist conversation across sessions | Call agent.save_messages() → JSON → agent.restore_messages() |
| Run a task autonomously with limits | Set ExecutionLimits, observe AgentEvent::AgentEnd |
| Interrupt a running agent | Call agent.steer(message) while event loop is running |
| Chain specialized agents | Attach SubAgentTool instances to a parent agent |
| Use third-party tools | Connect to an MCP server via agent.with_mcp_server_stdio() |
| Expose a REST API as tools | Load OpenAPI spec via agent.with_openapi_file() |
End User (via application)
Interacts through the application wrapping this library. Use cases match what the application exposes (e.g., CLI prompts in examples/cli.rs: /quit, /clear, /model).
LLM Provider
External service receiving structured HTTP requests. The library sends conversation history and tool schemas; the provider returns streaming token deltas and final messages. Providers never call back into the library.
MCP Server
External process exposing tools over the Model Context Protocol. The library connects as a client via stdio pipe or HTTP. The server exposes tool definitions that are adapted into AgentTool instances.
Sub-Agent
A child instance of the agent loop spawned internally when a SubAgentTool is called. Operates with its own fresh context and toolset. Results are returned to the parent as a ToolResult.
5. Constraints & Non-Goals
- No built-in HTTP server. The library is embeddable only; serving the agent over HTTP requires external frameworks.
- No user interface. UI rendering (text display, color, input handling) is the application's responsibility (see `examples/cli.rs` for a reference implementation).
- No authentication management. API keys must be supplied by the caller. The library does not fetch, rotate, or cache credentials.
- Single event consumer per run. `agent_loop()` returns a single `UnboundedReceiver<AgentEvent>`. Fan-out to multiple consumers requires application-level bridging.
- No agent-to-agent networking. Sub-agents run in-process only. No remote agent delegation.
- No persistent storage. Conversation state is held in memory. Serialization to disk is the caller's responsibility (the library provides `serialize`/`deserialize` helpers).
- No built-in precision token counting. The default `HeuristicTokenCounter` uses 4 characters per token. A pluggable `TokenCounter` trait (`src/context/token.rs`) allows callers to supply a custom counter (e.g., tiktoken-based), but no precision implementation ships with the library.
- No multi-modal generation. Images can be sent to the model (as `Content::Image`), but image generation is not supported.
- No structured output / JSON mode. The library passes raw messages; enforcing structured output is the caller's responsibility via system prompt.
- Skipped tools on steering. When steering messages arrive mid-batch, remaining tool calls in that batch are skipped with an error result — their outputs are never computed. This is a documented behavior, not a bug.
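The token-counting constraint above pairs a 4-characters-per-token heuristic with a pluggable counter trait. The sketch below illustrates that arrangement under stated assumptions: `CountTokens`, `FourCharsPerToken`, `WhitespaceWords`, and `estimate` are hypothetical names, the real `TokenCounter` signature may differ, and counting Unicode characters (not bytes) and rounding up are choices made for this example.

```rust
/// Hypothetical trait; phi-core's `TokenCounter` signature may differ.
trait CountTokens {
    fn count(&self, text: &str) -> usize;
}

/// Estimate tokens as ceil(chars / 4).
struct FourCharsPerToken;

impl CountTokens for FourCharsPerToken {
    fn count(&self, text: &str) -> usize {
        let chars = text.chars().count();
        (chars + 3) / 4 // integer ceiling division
    }
}

/// Alternative counter, to show that callers can swap implementations.
struct WhitespaceWords;

impl CountTokens for WhitespaceWords {
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Sum the estimate over a list of message texts.
fn estimate(counter: &dyn CountTokens, messages: &[&str]) -> usize {
    messages.iter().map(|m| counter.count(m)).sum()
}

fn main() {
    let messages = ["You are a helpful agent.", "List the files in src/"];
    println!("heuristic: {}", estimate(&FourCharsPerToken, &messages));
    println!("words:     {}", estimate(&WhitespaceWords, &messages));
}
```

A caller that needs exact counts would implement the trait on top of a real tokenizer and pass that implementation instead.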
6. Key Terminology Glossary
| Term | Definition |
|---|---|
| Agent | The runtime interface trait (src/agents/agent.rs). Program against this trait to stay independent of the specific implementation. BasicAgent (src/agents/basic_agent.rs) is the default in-memory implementation: it owns conversation history, tools, ModelConfig (provider identity + auth + cost), and configuration. Construction: BasicAgent::new(ModelConfig::anthropic(...)). The application-facing entry point. |
| Agent Loop | The recursive execution cycle (src/agent_loop/) that calls the LLM, processes tool calls, checks steering, and repeats until the LLM stops or limits are hit. |
| Turn | One complete LLM call plus the resulting tool executions. Bounded by TurnStart/TurnEnd events. Materialized as a Turn struct on LoopRecord.turns (src/session/model.rs). |
| Steering | A Vec<AgentMessage> injected into the running loop between tool executions. Used to redirect the agent mid-task without restarting it. |
| Follow-up | A Vec<AgentMessage> queued to be injected after the agent would naturally stop. Extends the run without creating a new agent_loop() call. |
| ModelConfig | The single, complete description of a provider connection (src/provider/model.rs). Fields: id (model name sent to API), name (display label), api: ApiProtocol (wire-protocol dispatch key), provider (logging label), base_url, api_key, cost: CostConfig, headers, compat: Option<OpenAiCompat>. Factory methods: anthropic(), openai(), local(), google(), openrouter(). Passed to BasicAgent::new(), SubAgentTool::new(), and AgentLoopConfig.model_config. |
| ApiProtocol | Enum that selects which HTTP wire format to use: AnthropicMessages, OpenAiCompletions, OpenAiResponses, AzureOpenAiResponses, GoogleGenerativeAi, GoogleVertex, BedrockConverseStream. Used by ProviderRegistry as a dispatch key. |
| StreamProvider | The trait (src/provider/traits.rs) that any LLM backend must implement. Has a single method stream() that takes a StreamConfig and sends StreamEvents. |
| AgentTool | The trait (src/types/) that any executable tool must implement. Methods: name(), label(), description(), parameters_schema(), execute(). |
| ToolContext | A struct passed to AgentTool::execute() containing the call ID, name, cancellation token, and optional progress callbacks. |
| AgentEvent | The streaming event enum emitted to the consumer during a run. Covers agent lifecycle, turn lifecycle, message streaming, and tool execution. |
| StreamDelta | A partial content update emitted during LLM streaming: Text, Thinking, or ToolCallDelta. |
| StopReason | Why the LLM ended its response. Variants: Stop (natural end), Length (token limit), ToolUse (returned tool calls), Error (failure), Aborted (cancellation), MaxTurns, UserStop, Handoff, GuardRail, ContextCompacted, Paused. |
| AgentMessage | The top-level message enum stored in the conversation history. Either Llm(LlmMessage) (sent to the LLM; LlmMessage wraps Message + optional TurnId for turn tracking) or Extension(ExtensionMessage) (app-only metadata). |
| Message | The LLM-protocol message enum: User, Assistant, or ToolResult. |
| Content | A single content block within a message: Text, Image (base64), Thinking, or ToolCall. |
| Usage | Token count metadata returned with each Assistant message: input, output, cache_read, cache_write, total_tokens. |
| ContextConfig | Configuration for the automatic context compaction: token budget, lines-to-keep per tool output, number of recent/first messages to preserve. |
| CompactionStrategy | A trait for customizing how messages are compacted when the token budget is exceeded. The default implementation uses 3 tiers: truncate tool outputs, then summarize, then drop old messages. |
| CompactionBlock | The model used by the compaction system to represent compacted message regions. Replaces the previous inline approach in compact_messages() with a structured block-based representation. |
| ExecutionLimits | Hard caps on agent execution: max_turns, max_total_tokens, max_duration, max_cost: Option<f64>. When exceeded, the loop appends a system message and stops. |
| ToolExecutionStrategy | How multiple tool calls from one LLM response are dispatched: Sequential, Parallel (default), or Batched { size }. |
| CacheConfig / CacheStrategy | Controls prompt caching breakpoint placement for providers that support it (Anthropic). Strategies: Auto, Disabled, Manual. |
| ThinkingLevel | Controls extended reasoning depth: Off, Minimal, Low, Medium, High. Translated to provider-specific parameters. |
| AgentSkills | A directory-based system for loading instruction files (SKILL.md) that extend agent capabilities. Compatible with the AgentSkills open standard. |
| MCP | Model Context Protocol. A standard for tool servers that communicate over stdio or HTTP. The library acts as an MCP client. |
| SubAgentTool | An AgentTool implementation that, when called by the parent LLM, spawns a complete child agent_loop() with isolated context. |
| InputFilter | A synchronous trait applied to user text before the LLM call. Returns Pass, Warn(text) (appended to message), or Reject(reason) (aborts run). |
| ExtensionMessage | An AgentMessage variant that is not sent to the LLM. Used for application-specific metadata (UI state, notifications) stored in conversation history. |
| ContextTracker | Tracks context token usage using a hybrid of real provider-reported counts and local heuristic estimates for messages since the last report. |
| ProviderError | The error enum returned by StreamProvider::stream(). Variants: Api, Network, Auth, RateLimited, ContextOverflow, Cancelled, Other. |
| ToolDefinition | A schema-only description of a tool sent to the LLM (name, description, JSON Schema parameters). Does not include the execute function. |
| RetryConfig | Exponential-backoff configuration for retrying RateLimited and Network provider errors. |
| AgentLoopConfig | A flat configuration struct passed to agent_loop() / agent_loop_continue() bundling all behavioral settings. Required field: model_config: ModelConfig (provider identity, auth, cost rates). Optional provider_override: Option<Arc<dyn StreamProvider>> bypasses registry dispatch (used in tests). |
| QueueMode | Controls how queued messages (steering/follow-ups) are consumed per read. OneAtATime (default): pops only the first queued message. All: drains the entire queue at once. |
| McpContent | A content item returned by an MCP tool call. Variants: Text { text } and Image { data: base64, mimeType }. |
| OpenApiAuth | Authentication method for OpenAPI requests. Variants: None, Bearer(token), ApiKey { header, value }. Token/value is redacted in debug output. |
| OperationFilter | Controls which OpenAPI operations become tools. Variants: All, ByOperationId, ByTag, ByPathPrefix. Operations without an operationId are always skipped. |
| agent_id | A UUID v4 string generated once when BasicAgent::new() is called. Stable for the lifetime of the Agent instance. Included in every AgentStart event to identify which agent produced the run. |
| session_id | A UUID v4 string generated once when BasicAgent::new() is called. Groups all loops (origin + continuations) that belong to one logical session. Stable for the lifetime of the Agent instance. |
| loop_id | A string of the form "{session_id}.{config_id}.{N}" that uniquely identifies one agent_loop / agent_loop_continue call. The config_id segment is either caller-supplied or auto-derived from provider + model + thinking level. N is a per-config_id monotonic counter. Included in every AgentStart event. |
| ContinuationKind | Labels how an agent_loop or agent_loop_continue call relates to prior loops. Set on AgentContext.continuation_kind before calling. Variants: Initial (origin agent_loop call; the #[default]), Default (unspecified continuation), Rerun { tag } (retry the same scenario from an equivalent context), Branch { tag } (explore a different execution path), Compaction (context-compacted continuation). Tags are RFC 3339 UTC timestamps. Surfaced in AgentStart.continuation_kind. |
| TurnTrigger | Identifies what caused a turn to begin. Emitted in TurnStart.triggered_by. Variants: User (first turn of an Initial continuation — i.e., origin agent_loop call), SubAgent (running as a sub-agent via SubAgentTool), Continuation (subsequent turns, tool round-trips, Default/Rerun continuations, and steering-injected turns; renamed from FollowUp), Branch (first turn of a ContinuationKind::Branch continuation). |
| BeforeLoopFn / AfterLoopFn | Loop-level lifecycle hooks on AgentLoopConfig. BeforeLoopFn fires before AgentStart — return false to abort the run before it begins. AfterLoopFn fires after AgentEnd with the new messages and accumulated usage. |
| BeforeToolExecutionFn / AfterToolExecutionFn | Tool-level lifecycle hooks on AgentLoopConfig. BeforeToolExecutionFn fires before ToolExecutionStart — return false to skip the tool call. AfterToolExecutionFn fires after ToolExecutionEnd with the tool name, call ID, and error flag. |
| BeforeToolExecutionUpdateFn / AfterToolExecutionUpdateFn | Streaming tool update hooks on AgentLoopConfig. Fire around each ToolExecutionUpdate event emitted when a tool calls ctx.on_update(partial). BeforeToolExecutionUpdateFn returns false to suppress the event (tool keeps running; final ToolResult is unaffected). AfterToolExecutionUpdateFn fires after the event if not suppressed. |
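The `InputFilter` entry in the glossary describes three outcomes: pass, warn (text appended to the message), and reject (run aborted). The sketch below shows one way such a filter chain could behave. `Verdict`, `Filter`, `SecretScanner`, and `apply` are hypothetical names for illustration; the real trait's method name and signature are not given in this document and may differ.

```rust
/// Hypothetical mirror of the three outcomes described in the glossary.
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Warn(String),
    Reject(String),
}

/// Hypothetical filter trait; phi-core's `InputFilter` signature may differ.
trait Filter {
    fn check(&self, text: &str) -> Verdict;
}

struct SecretScanner;

impl Filter for SecretScanner {
    fn check(&self, text: &str) -> Verdict {
        if text.contains("BEGIN PRIVATE KEY") {
            Verdict::Reject("input contains a private key".into())
        } else if text.to_lowercase().contains("password") {
            Verdict::Warn("[note: input may contain a credential]".into())
        } else {
            Verdict::Pass
        }
    }
}

/// Apply filters in order. Warnings are appended to the message; the first
/// rejection aborts with its reason.
fn apply(filters: &[&dyn Filter], text: &str) -> Result<String, String> {
    let mut out = text.to_string();
    for f in filters {
        match f.check(text) {
            Verdict::Pass => {}
            Verdict::Warn(note) => {
                out.push('\n');
                out.push_str(&note);
            }
            Verdict::Reject(reason) => return Err(reason),
        }
    }
    Ok(out)
}

fn main() {
    let scanner = SecretScanner;
    let filters: [&dyn Filter; 1] = [&scanner];
    for input in ["list files", "my password is hunter2", "-----BEGIN PRIVATE KEY-----"] {
        println!("{:?}", apply(&filters, input));
    }
}
```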
Architecture Overview
For detailed component specifications, trait signatures, sequence diagrams, and data models, see the full Architecture Spec. For formal algorithm descriptions, see Algorithms.
Layered Design
phi-core is organized as three conceptual layers within a single crate. Dependencies flow strictly downward — upper layers use lower layers, never the reverse.
┌─────────────────────────────────────────────┐
│ Layer 3: Orchestration (planned) │
│ Multi-agent, delegation, work modes │
├─────────────────────────────────────────────┤
│ Layer 2: Agent + Providers │
│ Concrete providers, tools, retry, caching, │
│ context management, MCP │
├─────────────────────────────────────────────┤
│ Layer 1: Core Loop │
│ agent_loop, types, traits │
│ Provider-agnostic. Tool-agnostic. │
└─────────────────────────────────────────────┘
Layer 1: Core Loop
The pure agent loop. No opinions about LLMs, no built-in tools. Just the control flow.
Modules: types/, agent_loop/, provider/traits.rs
Owns:
- `agent_loop()` / `agent_loop_continue()` — the loop itself
- `AgentTool` trait — interface tools must implement
- `StreamProvider` trait — interface providers must implement
- `AgentMessage`, `AgentEvent`, `StreamDelta` — message & event types
- `AgentContext` — system prompt + messages + tools
- Tool execution strategies (parallel/sequential/batched)
- Streaming tool output (`ToolUpdateFn`)
- Steering & follow-up message injection
Does not own: Any concrete provider or tool implementation.
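The control flow that Layer 1 owns can be sketched in a few dozen lines: call the model, run any requested tools, feed the results back, and repeat until the model answers without tool calls or a turn cap is hit. Everything below is a simplified, synchronous illustration. `Reply`, `Provider`, `Tool`, `run_loop`, `Scripted`, and `Upper` are hypothetical stand-ins, not the signatures of `StreamProvider`, `AgentTool`, or `agent_loop()`, and streaming, cancellation, and steering are left out.

```rust
/// What a model turn produces in this sketch: final text or tool requests.
enum Reply {
    Text(String),
    ToolCalls(Vec<(String, String)>), // (tool name, argument)
}

/// Hypothetical, synchronous stand-ins for the provider and tool traits.
trait Provider {
    fn complete(&mut self, history: &[String]) -> Reply;
}

trait Tool {
    fn name(&self) -> &str;
    fn execute(&self, arg: &str) -> String;
}

/// Prompt the model, run requested tools, feed results back, repeat.
fn run_loop(provider: &mut dyn Provider, tools: &[Box<dyn Tool>], prompt: &str, max_turns: usize) -> Vec<String> {
    let mut history = vec![format!("user: {prompt}")];
    for _ in 0..max_turns {
        match provider.complete(&history) {
            Reply::Text(text) => {
                history.push(format!("assistant: {text}"));
                break; // the model stopped on its own
            }
            Reply::ToolCalls(calls) => {
                for (name, arg) in calls {
                    let result = match tools.iter().find(|t| t.name() == name) {
                        Some(tool) => tool.execute(&arg),
                        None => format!("error: unknown tool {name}"),
                    };
                    history.push(format!("tool {name}: {result}"));
                }
            }
        }
    }
    history
}

/// Scripted provider: asks for one tool call, then answers.
struct Scripted {
    turn: usize,
}

impl Provider for Scripted {
    fn complete(&mut self, history: &[String]) -> Reply {
        self.turn += 1;
        if self.turn == 1 {
            Reply::ToolCalls(vec![("upper".into(), "done".into())])
        } else {
            Reply::Text(format!("saw {} messages", history.len()))
        }
    }
}

struct Upper;

impl Tool for Upper {
    fn name(&self) -> &str { "upper" }
    fn execute(&self, arg: &str) -> String { arg.to_uppercase() }
}

fn main() {
    let tools: Vec<Box<dyn Tool>> = vec![Box::new(Upper)];
    for line in run_loop(&mut Scripted { turn: 0 }, &tools, "shout done", 5) {
        println!("{line}");
    }
}
```

Because the loop only talks to the two traits, swapping the scripted provider for a real backend or adding tools does not change the control flow, which is the separation this layer is meant to provide.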
Layer 2: Agent + Providers
Batteries-included single-agent layer. Most users interact with this.
Modules: agents/, context/, provider/*.rs, tools/*.rs, mcp/*.rs
Adds on top of Layer 1:
- Concrete providers — Anthropic, OpenAI-compat, Google, Azure, Bedrock, Vertex
- Provider registry — dispatch by API protocol
- Context translation — cross-provider content type compatibility (G8)
- Prompt caching — automatic cache breakpoint placement
- Retry with backoff — exponential, jitter, respects retry-after
- Context management — token estimation, compaction, execution limits, cost tracking
- `AgentProfile` + `SystemPromptStrategy` — reusable agent blueprints with multi-block prompt composition
- Config-driven construction — TOML/JSON/YAML → `agent_from_config()` → `Arc<dyn Agent>`
- Built-in tools — bash, read_file, write_file, edit_file, list_files, search, prun (context pruning)
- Tool registry — name-based tool resolution from config
- Session persistence — `SessionRecorder` materializes `Turn` structs from events
- MCP client — stdio + HTTP transports, tool adapter
- `Agent` trait — the runtime interface (prompting, state, control, ~40 methods)
- `BasicAgent` struct — default in-memory implementation of `Agent`; stateful builder
- `SubAgentTool` — delegates tasks to a child `agent_loop()` as a tool
Layer 3: Orchestration (planned)
Multi-agent coordination. Not yet implemented — the architecture is designed to support it when needed.
Planned capabilities:
- `Orchestrator` struct — spawn, delegate, and coordinate multiple agents
- Work modes:
  - Interactive — multi-turn, human in the loop (current default)
  - Autonomous — runs to completion without input (background tasks, CI)
  - Pipeline — input → output, chainable (scan → fix → verify)
  - Supervisor — delegates to other agents, synthesizes results
  - Fan-out — same task to multiple agents (different providers for diversity)
- Pipeline chaining — output of agent A feeds input of agent B
- Agent communication through the orchestrator event bus
Why not yet: Multi-agent orchestration adds complexity. The single-agent loop handles 95% of use cases. Layer 3 will be built when a concrete use case drives it, not speculatively.
Module Layout
phi-core/
├── src/
│ ├── lib.rs # Public re-exports
│ │
│ │── Layer 1: Core Loop ─────────────────────
│ ├── types/
│ │ ├── mod.rs # Re-exports, Message, AgentMessage
│ │ ├── content.rs # Content enum (Text, Image, Thinking, ToolCall), StopReason
│ │ ├── extension.rs # ExtensionMessage
│ │ ├── agent_message.rs # AgentMessage enum, LlmMessage (Message + TurnId)
│ │ ├── usage.rs # Usage, CacheConfig, CacheStrategy, ThinkingLevel
│ │ ├── tool.rs # AgentTool trait, ToolDefinition, ToolContext
│ │ ├── event.rs # AgentEvent enum, TurnTrigger, StreamDelta
│ │ ├── context.rs # AgentContext, InRunEntry (2-stream pruning)
│ │ └── parallel.rs # ToolExecutionStrategy
│ ├── agent_loop/
│ │ ├── core.rs # agent_loop(), agent_loop_continue()
│ │ ├── run.rs # run_loop() — inner turn engine
│ │ ├── streaming.rs # stream_assistant_response() — LLM call + retry
│ │ ├── tools.rs # execute_tool_calls()
│ │ ├── config.rs # AgentLoopConfig, callback type aliases
│ │ ├── helpers.rs # Input filtering, message conversion
│ │ ├── parallel.rs # agent_loop_parallel()
│ │ ├── evaluation.rs # EvaluationStrategy trait + 5 built-in strategies
│ │ └── script_callback.rs # ScriptCallback for shell/Python hooks
│ │
│ │── Layer 2: Agent + Providers ─────────────
│ ├── agents/
│ │ ├── agent.rs # Agent trait (runtime interface, ~40 methods)
│ │ ├── basic_agent.rs # BasicAgent struct (default in-memory impl)
│ │ ├── profile.rs # AgentProfile struct
│ │ ├── system_prompt.rs # SystemPromptStrategy, SystemPrompt, PromptBlockDef
│ │ └── sub_agent.rs # SubAgentTool (child agent_loop as a tool)
│ ├── config/
│ │ ├── schema.rs # AgentConfig + all TOML/JSON/YAML config sections
│ │ ├── builder.rs # agent_from_config(), agents_from_config()
│ │ ├── parser.rs # Multi-format parsing + env var substitution
│ │ └── reference.rs # {{...}} ID reference protocol parser
│ ├── context/
│ │ ├── config.rs # ContextConfig, CompactionConfig, CompactionScope
│ │ ├── compaction.rs # CompactionBlock, CompactedSection
│ │ ├── compact_messages.rs # compact_messages() — legacy tiered compaction
│ │ ├── strategy.rs # CompactionStrategy, BlockCompactionStrategy traits
│ │ ├── orchestration.rs # compact_session_loops(), build_context_from_session()
│ │ ├── execution.rs # ExecutionLimits, ExecutionTracker
│ │ ├── tracker.rs # ContextTracker (hybrid token counting)
│ │ ├── token.rs # TokenCounter trait, HeuristicTokenCounter
│ │ └── skills.rs # SkillSet (SKILL.md loader)
│ ├── session/
│ │ ├── model.rs # Session, LoopRecord, Turn, LoopStatus
│ │ ├── recorder.rs # SessionRecorder (event → session state machine)
│ │ ├── storage.rs # save_session(), load_session(), list/delete
│ │ └── helpers.rs # Internal utilities
│ ├── provider/
│ │ ├── traits.rs # StreamProvider trait, StreamEvent, ProviderError
│ │ ├── model.rs # ModelConfig, ApiProtocol, OpenAiCompat
│ │ ├── registry.rs # ProviderRegistry (protocol → provider)
│ │ ├── retry.rs # Retry with exponential backoff
│ │ ├── context_translation.rs # ContextTranslationStrategy (G8)
│ │ ├── anthropic.rs # Anthropic Messages API
│ │ ├── openai_compat.rs # OpenAI Chat Completions (15+ providers)
│ │ ├── openai_responses.rs # OpenAI Responses API
│ │ ├── google.rs # Google Generative AI
│ │ ├── google_vertex.rs # Google Vertex AI
│ │ ├── bedrock.rs # AWS Bedrock ConverseStream
│ │ ├── azure_openai.rs # Azure OpenAI
│ │ ├── mock.rs # Mock provider for testing
│ │ └── sse.rs # SSE utilities
│ ├── tools/
│ │ ├── bash.rs # BashTool
│ │ ├── file.rs # ReadFileTool, WriteFileTool
│ │ ├── edit.rs # EditFileTool
│ │ ├── list.rs # ListFilesTool
│ │ ├── search.rs # SearchTool
│ │ ├── prun.rs # PrunTool, PrunWithMemoTool (context pruning)
│ │ └── registry.rs # ToolRegistry (name → factory)
│ ├── mcp/
│ │ ├── client.rs # MCP client (stdio + HTTP)
│ │ ├── tool_adapter.rs # McpToolAdapter (MCP tool → AgentTool)
│ │ ├── transport.rs # Transport implementations
│ │ └── types.rs # MCP protocol types
│ └── openapi/ # (feature-gated: "openapi")
│ ├── adapter.rs # OpenApiToolAdapter
│ └── types.rs # OpenApiConfig, OperationFilter
Data Flow
      ┌─────────────┐
      │   Caller    │
      └──────┬──────┘
             │ prompt / prompt_messages
      ┌──────▼──────┐
      │ BasicAgent  │  Layer 2: stateful wrapper
      │  (agents/)  │  Manages queues, tools, state
      └──────┬──────┘
             │
      ┌──────▼──────┐
      │ agent_loop  │  Layer 1: core loop
      │             │  Prompt → LLM → Tools → Repeat
      └──┬───────┬──┘
         │       │
┌────────▼──┐ ┌──▼────────┐
│ Provider  │ │   Tools   │  Layer 2: implementations
│ .stream() │ │ .execute()│
└────────┬──┘ └──┬────────┘
         │       │
┌────────▼──┐ ┌──▼────────┐
│ LLM API   │ │  OS / FS  │
│  (HTTP)   │ │  (shell)  │
└───────────┘ └───────────┘
Events flow back via mpsc::UnboundedSender<AgentEvent>
How Providers Plug In
- Implement the StreamProvider trait (Layer 1 interface)
- Register with ProviderRegistry under an ApiProtocol (Layer 2)
- Set ModelConfig.api to match that protocol
- The registry dispatches stream() calls to the right provider
Each provider translates between phi-core's Message/Content types and the provider's native API format. All providers emit StreamEvents through the channel for real-time updates.
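The register-then-dispatch steps above can be pictured with a small synchronous sketch. Everything in it is a simplified stand-in rather than phi-core's actual API: the real StreamProvider is async and streams StreamEvents through a channel, while the two-variant ApiProtocol, the Registry type, its register and dispatch methods, and the String return value are assumptions made only for illustration.

```rust
use std::collections::HashMap;

// Hypothetical, trimmed-down protocol enum (the real ApiProtocol is richer).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum ApiProtocol {
    AnthropicMessages,
    OpenAiCompat,
}

// Stand-in for the provider trait: synchronous, returning a String instead of
// streaming events, purely to keep the sketch self-contained.
trait StreamProvider {
    fn stream(&self, prompt: &str) -> String;
}

// Toy provider that tags the prompt with its own label.
struct EchoProvider(&'static str);

impl StreamProvider for EchoProvider {
    fn stream(&self, prompt: &str) -> String {
        format!("[{}] {}", self.0, prompt)
    }
}

// Hypothetical registry: maps a protocol to the provider that speaks it.
#[derive(Default)]
struct Registry {
    providers: HashMap<ApiProtocol, Box<dyn StreamProvider>>,
}

impl Registry {
    fn register(&mut self, api: ApiProtocol, provider: Box<dyn StreamProvider>) {
        self.providers.insert(api, provider);
    }

    // Dispatch on the protocol named by the model config; None if unregistered.
    fn dispatch(&self, api: ApiProtocol, prompt: &str) -> Option<String> {
        self.providers.get(&api).map(|p| p.stream(prompt))
    }
}
```

Keying the map by protocol rather than by model name is what lets many models share one provider implementation in this sketch.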
How Tools Plug In
- Implement the AgentTool trait (Layer 1 interface)
- Add to the tools vec (via default_tools() or custom)
- The agent loop converts tools to ToolDefinition (name, description, schema) for the LLM
- When the LLM returns Content::ToolCall, the loop finds the matching tool and calls execute()
- Results are wrapped in Message::ToolResult and added to context
Tools receive a CancellationToken child token — they should check it for cooperative cancellation during long operations.
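As a rough illustration of the lookup and cooperative-cancellation points above, here is a synchronous sketch. It does not use the real AgentTool trait or CancellationToken: the Tool trait, the CancelFlag type (an atomic flag standing in for a token), the CountLines tool, and the run_tool helper are all hypothetical names chosen for this example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Stand-in for a cancellation token: a shared flag the tool polls.
#[derive(Clone, Default)]
struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

// Hypothetical synchronous tool interface.
trait Tool {
    fn name(&self) -> &str;
    fn execute(&self, args: &str, cancel: &CancelFlag) -> Result<String, String>;
}

// A tool that works line by line and checks the flag before each step.
struct CountLines;

impl Tool for CountLines {
    fn name(&self) -> &str {
        "count_lines"
    }

    fn execute(&self, args: &str, cancel: &CancelFlag) -> Result<String, String> {
        let mut count = 0usize;
        for _line in args.lines() {
            if cancel.is_cancelled() {
                return Err(format!("cancelled after {count} lines"));
            }
            count += 1;
        }
        Ok(count.to_string())
    }
}

// Find the tool by name and run it; returns (text, is_error).
fn run_tool(tools: &[Box<dyn Tool>], name: &str, args: &str, cancel: &CancelFlag) -> (String, bool) {
    match tools.iter().find(|t| t.name() == name) {
        Some(tool) => match tool.execute(args, cancel) {
            Ok(text) => (text, false),
            Err(e) => (e, true),
        },
        None => (format!("Tool {name} not found"), true),
    }
}
```

An unknown tool name yields an error result instead of a panic, so the model can see the failure and adjust its next call.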
Design Principles
- Layers are conceptual, not physical. One crate with clean module boundaries; no feature flags are needed to separate the layers.
- Dependencies flow down. Layer 1 never imports from Layer 2. Layer 2 never imports from Layer 3.
- Layer 1 is stable. The core loop and traits change rarely. New features are added in Layer 2 or 3.
- Build what's needed. Layer 3 is designed but not implemented. It will be built when a use case demands it, not speculatively.
- Simple over clever. A straightforward loop with good defaults beats an elegant abstraction nobody can debug.
First Principles: Core vs External
phi-core is a library, not a framework. These principles determine what belongs inside the crate and what should be built on top of it by consumers.
A feature belongs in phi-core if:
- All agents need it — every consumer would re-implement it independently. The agent loop, message types, event stream, and tool trait are universal primitives.
- Requires deep loop integration — it needs hooks inside the turn cycle that callbacks alone can't provide cleanly. Compaction, execution limits, and streaming are examples.
- Defines the contract — traits and interfaces that standardize how consumers extend the system. StreamProvider, AgentTool, CompactionStrategy, and InputFilter are extension contracts.
- Fragmentation risk — if consumers implement it differently, interoperability breaks. Session format, event vocabulary, and message types must be shared.
- Cross-cutting — it touches multiple modules and can't be layered on top without forking the crate.
A feature should be external if:
- Application-specific — workflows, domain tools, business logic, UI patterns.
- Infrastructure — databases, web servers, authentication, deployment, CI/CD.
- Opinionated — reasonable projects would choose differently. Vector databases, tracing backends, embedding models, and memory strategies are consumer choices.
- Implementable via existing extension points — it can be built cleanly using the traits and callbacks already in core. Permissions (via InputFilter + BeforeToolExecutionFn), model fallback chains (via custom StreamProvider), and observability backends (via AgentEvent stream) are examples.
Algorithms
This document has been split into smaller, maintainable files. See the algorithms/ directory for the full index.
Quick Navigation
- Core Loop: agent-loop | run-loop | streaming | tool-execution
- Context: compaction | decision-logic
- Lifecycle: agent-lifecycle | concurrency
- Providers: retry | error-classification | sub-agent
- Tools: bash | file-tools | mcp | openapi
For pseudocode conventions, see the README.
agent_loop (src/agent_loop/)
Purpose: Start a fresh agent run from new prompt messages.
Preconditions: prompts is non-empty; context.messages may contain prior history.
Postconditions: All input filters have run; AgentStart/AgentEnd are emitted; returns all new messages produced.
FUNCTION agent_loop(
prompts: Vec<AgentMessage>,
context: AgentContext, // mutable
config: AgentLoopConfig,
tx: EventChannel<AgentEvent>,
cancel: CancellationToken
) -> Vec<AgentMessage>
// ── loop_id generation (must happen before before_loop so AgentEnd can carry it) ──
IF context.loop_id is None THEN context.loop_id ← new_uuid() END IF
// ── before_loop hook ────────────────────────────────────────────────────
// Fires before AgentStart. Return false to abort before the loop begins.
IF config.before_loop defined AND NOT before_loop(context.messages, 0) THEN
EMIT AgentEnd(loop_id=context.loop_id, messages=[])
RETURN []
END IF
// ── Identity write-back ──────────────────────────────────────────────────
// agent_id / session_id are set by Agent::prompt_*. Direct callers may leave
// them None; agent_loop generates and writes them back so that a subsequent
// agent_loop_continue on the same context can inherit them without extra setup.
IF context.agent_id is None THEN context.agent_id ← new_uuid() END IF
IF context.session_id is None THEN context.session_id ← new_uuid() END IF
EMIT AgentStart {
agent_id: context.agent_id,
session_id: context.session_id,
loop_id: context.loop_id,
parent_loop_id: None, // None = origin call
continuation_kind: Initial, // Initial = origin call (the #[default])
config_snapshot: Some(LoopConfigSnapshot from config),
timestamp: now()
}
// ── Input filtering ─────────────────────────────────────────────────────
IF config.input_filters is non-empty THEN
user_text ← JOIN all text from User messages in prompts
warnings ← []
FOR EACH filter IN config.input_filters
MATCH filter.filter(user_text)
CASE Pass → continue
CASE Warn(w) → warnings.append(w)
CASE Reject(reason) →
EMIT InputRejected(reason)
EMIT AgentEnd(messages=[])
RETURN []
END MATCH
END FOR
IF warnings is non-empty THEN
warning_text ← JOIN ["[Warning: " + w + "]" FOR w IN warnings]
// Append to last User message's content
append Content::Text(warning_text) to last User message in prompts
END IF
END IF
// ── Append prompts to context ────────────────────────────────────────────
FOR EACH prompt IN prompts
context.messages.append(prompt)
END FOR
new_messages ← copy of prompts
EMIT TurnStart
// Emit events for each incoming prompt
FOR EACH prompt IN prompts
EMIT MessageStart(prompt)
EMIT MessageEnd(prompt)
END FOR
// Run the main loop
loop_usage ← run_loop(context, new_messages, config, tx, cancel)
EMIT AgentEnd(new_messages)
// ── after_loop hook ──────────────────────────────────────────────────────
// Fires after AgentEnd with the messages produced and accumulated usage.
IF config.after_loop defined THEN after_loop(new_messages, loop_usage) END IF
RETURN new_messages
END FUNCTION
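The input-filtering step of agent_loop can be sketched in isolation. This is a minimal sketch, not the crate's InputFilter trait: the FilterResult enum mirrors the Pass / Warn / Reject cases in the pseudocode, but the plain-function Filter alias, the apply_filters helper, the two example filters, and the newline used to join warnings are assumptions.

```rust
// Outcome of a single filter, mirroring the Pass / Warn / Reject cases.
#[derive(Debug, PartialEq)]
enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

// Hypothetical filter signature: a plain function over the joined user text.
type Filter = fn(&str) -> FilterResult;

// Run every filter in order. The first Reject aborts with Err(reason);
// otherwise any warnings are formatted and joined into one annotation.
fn apply_filters(user_text: &str, filters: &[Filter]) -> Result<Option<String>, String> {
    let mut warnings = Vec::new();
    for filter in filters {
        match filter(user_text) {
            FilterResult::Pass => {}
            FilterResult::Warn(w) => warnings.push(format!("[Warning: {w}]")),
            FilterResult::Reject(reason) => return Err(reason),
        }
    }
    if warnings.is_empty() {
        Ok(None)
    } else {
        Ok(Some(warnings.join("\n")))
    }
}

// Example filter: reject anything that looks like an inline credential.
fn no_secrets(text: &str) -> FilterResult {
    if text.contains("password=") {
        FilterResult::Reject("input contains a credential".to_string())
    } else {
        FilterResult::Pass
    }
}

// Example filter: warn on long prompts but let them through.
fn length_hint(text: &str) -> FilterResult {
    if text.len() > 20 {
        FilterResult::Warn("long prompt".to_string())
    } else {
        FilterResult::Pass
    }
}
```

In the pseudocode, the Ok(Some(..)) text would be appended to the last user message, and an Err would trigger InputRejected followed by AgentEnd.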
agent_loop_continue (src/agent_loop/)
Purpose: Resume an agent run from existing context (no new prompts, continue from last user/tool-result message).
Preconditions: context.messages is non-empty; last message is NOT an assistant message; context.agent_id and context.session_id are Some.
Postconditions: Same as agent_loop.
FUNCTION agent_loop_continue(
context: AgentContext, // mutable
config: AgentLoopConfig,
tx: EventChannel<AgentEvent>,
cancel: CancellationToken
) -> Vec<AgentMessage>
[invariant: context.messages is non-empty]
[invariant: context.messages.last().role != "assistant"]
// Identity must carry over from the originating loop.
// These are set by Agent::continue_loop_with_sender (or the direct caller who
// bootstrapped the session). Silent UUID generation here would break traceability.
[invariant: context.agent_id is Some]
[invariant: context.session_id is Some]
new_messages ← []
// ── Classify existing messages into 2-stream model (if not already populated) ──
IF context.user_context is empty AND context.inrun_context is empty THEN
FOR EACH msg IN context.messages
IF msg is User → context.user_context.append(msg)
IF msg is Assistant or ToolResult → context.inrun_context.append(Live(msg))
// Extension messages go to neither stream
END FOR
END IF
// ── before_loop hook ────────────────────────────────────────────────────
IF config.before_loop defined AND NOT before_loop(context.messages, 0) THEN
EMIT AgentEnd(messages=[])
RETURN []
END IF
EMIT AgentStart {
agent_id: context.agent_id.unwrap(),
session_id: context.session_id.unwrap(),
loop_id: context.loop_id OR new_uuid(),
parent_loop_id: context.parent_loop_id, // None for Default, Some for Rerun/Branch
continuation_kind: context.continuation_kind, // Default|Rerun|Branch|Compaction (ContinuationKind, not Option)
config_snapshot: Some(LoopConfigSnapshot from config),
timestamp: now()
}
loop_usage ← run_loop(context, new_messages, config, tx, cancel)
EMIT AgentEnd(new_messages)
// ── after_loop hook ──────────────────────────────────────────────────────
IF config.after_loop defined THEN after_loop(new_messages, loop_usage) END IF
RETURN new_messages
END FUNCTION
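The 2-stream classification at the top of agent_loop_continue is easy to show on its own. The sketch below uses simplified stand-in types: Msg collapses the real message enums into four string-carrying variants, InRunEntry models only the Live case, and the Streams struct and classify function are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    User(String),
    Assistant(String),
    ToolResult(String),
    Extension(String),
}

// Simplified in-run entry: only the Live case from the pseudocode is modeled.
#[derive(Debug, Clone, PartialEq)]
enum InRunEntry {
    Live(Msg),
}

#[derive(Default)]
struct Streams {
    user_context: Vec<Msg>,
    inrun_context: Vec<InRunEntry>,
}

// Populate both streams from a flat history, but only when neither stream
// has been populated yet.
fn classify(messages: &[Msg], streams: &mut Streams) {
    if !streams.user_context.is_empty() || !streams.inrun_context.is_empty() {
        return; // already classified; leave existing state alone
    }
    for msg in messages {
        match msg {
            Msg::User(_) => streams.user_context.push(msg.clone()),
            Msg::Assistant(_) | Msg::ToolResult(_) => {
                streams.inrun_context.push(InRunEntry::Live(msg.clone()))
            }
            Msg::Extension(_) => {} // goes to neither stream
        }
    }
}
```

The early return is what makes the step idempotent: calling it again on an already classified context changes nothing.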
run_loop (src/agent_loop/)
Purpose: The shared inner logic for both agent_loop and agent_loop_continue. Handles the outer follow-up loop and the inner per-turn loop (one LLM call plus its tool executions per iteration).
Preconditions: Context contains at least one user message.
Postconditions: new_messages contains all messages produced; loop has exited cleanly or on limit/cancel/error.
FUNCTION run_loop(
context: AgentContext, // mutable
new_messages: Vec<AgentMessage>, // mutable accumulator
config: AgentLoopConfig,
tx: EventChannel<AgentEvent>,
cancel: CancellationToken
) -> Usage // accumulated usage across all turns
first_turn ← true
turn_number ← 0
loop_usage ← Usage.default()
tracker ← ExecutionTracker.new(config.execution_limits) // optional
// Drain any pending steering messages before starting
pending ← config.get_steering_messages() // or []
// ── Outer loop: re-enters if follow-up messages arrive ──────────────────
WHILE true
IF cancel.is_cancelled THEN RETURN loop_usage END IF
steering_after_tools ← null
// ── Inner loop: runs once per turn (LLM call + tools) ─────────────────
WHILE true
IF cancel.is_cancelled THEN RETURN loop_usage END IF
// Determine TurnTrigger for TurnStart event.
// NOTE: context.continuation_kind is Option<ContinuationKind> on AgentContext.
// None means Initial (first loop); Some(x) means a continuation.
// The pseudocode below abstracts this as direct ContinuationKind values.
//
// Priority on the first turn:
// 1. Branch continuation → TurnTrigger::Branch (explicit branch signal)
// 2. Any other continuation (Default/Rerun/Compaction) → TurnTrigger::Continuation
// (the continuation itself is the follow-up, not a fresh user turn)
// 3. Initial (origin agent_loop call) → config.first_turn_trigger
// (User for Agent::prompt, SubAgent for sub-agent callers)
// Subsequent turns always use TurnTrigger::Continuation.
IF first_turn THEN
turn_trigger ←
IF context.continuation_kind == Branch(..) THEN TurnTrigger::Branch
ELSE IF context.continuation_kind != Initial THEN TurnTrigger::Continuation
ELSE config.first_turn_trigger
first_turn ← false
ELSE
turn_trigger ← TurnTrigger::Continuation
END IF
EMIT TurnStart { turn_index: turn_number, triggered_by: turn_trigger }
// Inject any pending (steering/follow-up) messages
FOR EACH msg IN pending
EMIT MessageStart(msg)
EMIT MessageEnd(msg)
context.messages.append(msg)
new_messages.append(msg)
context.user_context.append(msg) // steering goes to user stream (never pruned)
END FOR
pending ← []
// Check execution limits
IF tracker.check_limits() is Some(reason) THEN
limit_msg ← User message "[Agent stopped: {reason}]"
EMIT MessageStart(limit_msg)
EMIT MessageEnd(limit_msg)
context.messages.append(limit_msg)
new_messages.append(limit_msg)
RETURN loop_usage
END IF
// Before-turn callback — abort if returns false
IF config.before_turn is defined THEN
IF NOT config.before_turn(context.messages, turn_number) THEN
RETURN loop_usage
END IF
END IF
turn_number ← turn_number + 1
// Compact context if configured (strategies live in context_config.compaction)
IF config.context_config is defined THEN
ctx_config ← config.context_config
IF tokens_exceed_threshold(context, ctx_config) THEN
IF config.before_compaction_start defined THEN
IF NOT before_compaction_start(estimated_tokens, message_count) THEN
SKIP compaction this cycle
END IF
END IF
EMIT CompactionStarted { ... }
strategy ← ctx_config.compaction.in_memory_strategy OR DefaultCompaction
context.messages ← strategy.compact(context.messages, ctx_config)
EMIT CompactionEnded { ... }
IF config.after_compaction_end defined THEN
after_compaction_end(msgs_before, msgs_after, tokens_before, tokens_after)
END IF
END IF
END IF
// ── LLM call ────────────────────────────────────────────────────────
message ← AWAIT stream_assistant_response(context, config, tx, cancel)
agent_msg ← message as AgentMessage
context.messages.append(agent_msg)
new_messages.append(agent_msg)
context.inrun_context.append(Live(agent_msg)) // track in inrun stream (model-generated)
// Accumulate usage for after_loop hook
loop_usage ← loop_usage + message.usage
// Handle error/abort stop reasons
IF message.stop_reason == Error OR message.stop_reason == Aborted THEN
IF message.stop_reason == Error AND config.on_error is defined THEN
config.on_error(message.error_message OR "Unknown error")
END IF
IF config.after_turn is defined THEN
config.after_turn(context.messages, message.usage)
END IF
EMIT TurnEnd(agent_msg, tool_results=[])
RETURN loop_usage
END IF
// Extract tool calls from assistant content
tool_calls ← [
(id, name, arguments)
FOR EACH content IN message.content
IF content is ToolCall
]
tool_results ← []
IF tool_calls is non-empty THEN
execution ← AWAIT execute_tool_calls(
context.tools, tool_calls, tx, cancel,
config.get_steering_messages, config.tool_execution
)
tool_results ← execution.tool_results
steering_after_tools ← execution.steering_messages
FOR EACH result IN tool_results
am ← result as AgentMessage
context.messages.append(am)
new_messages.append(am)
context.inrun_context.append(Live(am)) // track in inrun stream
END FOR
// Apply pending prun requests after tool execution (PrunTool stores requests during execute)
IF config.prun_pending is defined THEN
requests ← LOCK(config.prun_pending).drain()
FOR EACH request IN requests
apply_prun(context, request, tx) // walks inrun_context backward, prunes Live entries
END FOR
END IF
END IF
// Record turn for limit tracking
tracker.record_turn(message.usage.input + message.usage.output)
// After-turn callback
IF config.after_turn is defined THEN
config.after_turn(context.messages, message.usage)
END IF
EMIT TurnEnd(agent_msg, tool_results)
// Check for steering that arrived during tool execution
IF steering_after_tools is non-empty THEN
pending ← steering_after_tools
CONTINUE inner loop
END IF
pending ← config.get_steering_messages()
// Exit inner loop if no tool calls and no pending messages
IF tool_calls is empty AND pending is empty THEN
BREAK inner loop
END IF
END WHILE // inner loop
// Check for follow-up work
follow_ups ← config.get_follow_up_messages()
IF follow_ups is non-empty THEN
pending ← follow_ups
CONTINUE outer loop
END IF
BREAK outer loop
END WHILE // outer loop
RETURN loop_usage
END FUNCTION
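The TurnTrigger priority rules described in the comments of run_loop reduce to a small pure function. The sketch below is an assumption-laden simplification: both enums are modeled as plain unit variants (the pseudocode's Branch carries data, and continuation_kind may be an Option in the real type), and turn_trigger is a hypothetical helper name.

```rust
// Simplified: every variant is a unit variant in this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ContinuationKind {
    Initial,
    Default,
    Rerun,
    Branch,
    Compaction,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum TurnTrigger {
    User,
    SubAgent,
    Continuation,
    Branch,
}

// First turn: Branch wins, then any other continuation, then the configured
// trigger for an origin call. Every later turn is a Continuation.
fn turn_trigger(
    first_turn: bool,
    kind: ContinuationKind,
    first_turn_trigger: TurnTrigger,
) -> TurnTrigger {
    if !first_turn {
        return TurnTrigger::Continuation;
    }
    match kind {
        ContinuationKind::Branch => TurnTrigger::Branch,
        ContinuationKind::Initial => first_turn_trigger,
        _ => TurnTrigger::Continuation,
    }
}
```

Keeping the rule as a pure function of three inputs makes the priority order straightforward to unit test.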
stream_assistant_response (src/agent_loop/)
Purpose: Call the LLM with the current context, stream events to the channel, and return the final Message. Includes retry logic for transient errors.
Preconditions: context.messages has at least one user message.
Postconditions: Returns a complete Message::Assistant; events emitted include MessageStart, zero or more MessageUpdate, and MessageEnd.
FUNCTION stream_assistant_response(
context: AgentContext,
config: AgentLoopConfig,
tx: EventChannel<AgentEvent>,
cancel: CancellationToken
) -> Message
// Build working context: merges user_context + live inrun_context + memos, sorted by timestamp.
// Falls back to context.messages when prun streams are empty.
base_messages ← context.build_working_context()
// Apply optional context transform (e.g. for custom preprocessing)
messages ← IF config.transform_context defined
THEN config.transform_context(base_messages)
ELSE base_messages
// Filter to LLM-compatible messages (drop Extension messages)
llm_messages ← IF config.convert_to_llm defined
THEN config.convert_to_llm(messages)
ELSE [m FOR m IN messages IF m is Llm variant]
// Build tool schema list (schema only, no execute functions)
tool_defs ← [
ToolDefinition(name, description, parameters_schema)
FOR EACH tool IN context.tools
]
retry ← config.retry_config
attempt ← 0
// ── Retry loop ──────────────────────────────────────────────────────────
WHILE true
stream_config ← StreamConfig {
model, system_prompt: context.system_prompt,
messages: llm_messages, tools: tool_defs,
thinking_level, api_key, max_tokens, temperature,
model_config, cache_config
}
(stream_tx, stream_rx) ← new unbounded channel
result ← AWAIT config.provider.stream(stream_config, stream_tx, cancel)
MATCH result
CASE Err(e) IF e.is_retryable()
AND attempt < retry.max_retries
AND NOT cancel.is_cancelled →
attempt ← attempt + 1
delay ← e.retry_after() OR retry.delay_for_attempt(attempt)
log_retry(attempt, retry.max_retries, delay, e)
AWAIT sleep(delay)
CONTINUE // retry
CASE other →
BREAK with (result, stream_rx)
END MATCH
END WHILE
// ── Process streaming events ─────────────────────────────────────────────
partial_message ← null
FOR EACH stream_event IN stream_rx (drain available)
MATCH stream_event
CASE Start →
placeholder ← empty Assistant message
partial_message ← placeholder
EMIT MessageStart(placeholder)
CASE TextDelta(delta) →
IF partial_message defined THEN
EMIT MessageUpdate(partial_message, StreamDelta::Text(delta))
END IF
CASE ThinkingDelta(delta) →
IF partial_message defined THEN
EMIT MessageUpdate(partial_message, StreamDelta::Thinking(delta))
END IF
CASE ToolCallDelta(delta) →
IF partial_message defined THEN
EMIT MessageUpdate(partial_message, StreamDelta::ToolCallDelta(delta))
END IF
CASE Done(message) →
am ← message as AgentMessage
partial_message ← am
// MessageStart was already emitted on Start
EMIT MessageEnd(am)
CASE Error(message) →
am ← message as AgentMessage
IF partial_message is null THEN
EMIT MessageStart(am)
END IF
partial_message ← am
EMIT MessageEnd(am)
END MATCH
END FOR
// Return result
MATCH result
CASE Ok(msg) → RETURN msg
CASE Err(e) →
RETURN Assistant {
content: [Text("")],
stop_reason: Error,
model: config.model,
provider: "unknown",
usage: default,
error_message: Some(e.to_string())
}
END MATCH
END FUNCTION
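The retry decision inside stream_assistant_response (retry only when the error is retryable, attempts remain, and the run is not cancelled, preferring a server-provided delay over the computed one) can be sketched as follows. The field names base_delay and max_delay, the doubling formula, the simplified ProviderError struct, and the next_delay helper are assumptions; the source only states that retries use exponential backoff.

```rust
use std::time::Duration;

// Hypothetical retry settings; only max_retries appears in the pseudocode.
struct RetryConfig {
    max_retries: u32,
    base_delay: Duration,
    max_delay: Duration,
}

impl RetryConfig {
    // Assumed backoff formula: base * 2^(attempt - 1), capped at max_delay.
    fn delay_for_attempt(&self, attempt: u32) -> Duration {
        let exp = attempt.saturating_sub(1).min(16); // keep the shift bounded
        let factor = 1u32 << exp;
        self.base_delay.saturating_mul(factor).min(self.max_delay)
    }
}

// Simplified provider error: transient or not, plus an optional server hint.
struct ProviderError {
    retryable: bool,
    retry_after: Option<Duration>,
}

// Decide whether to retry and how long to wait; None means give up.
// `attempt` is the number of retries already made.
fn next_delay(
    err: &ProviderError,
    attempt: u32,
    cfg: &RetryConfig,
    cancelled: bool,
) -> Option<Duration> {
    if !err.retryable || attempt >= cfg.max_retries || cancelled {
        return None;
    }
    let next_attempt = attempt + 1;
    Some(err.retry_after.unwrap_or_else(|| cfg.delay_for_attempt(next_attempt)))
}
```

In this sketch a server hint is used as given and is not capped by max_delay; whether phi-core caps it is not stated in the source.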
execute_tool_calls (src/agent_loop/)
Purpose: Dispatch a list of tool calls using the configured execution strategy.
Preconditions: tool_calls is non-empty.
Postconditions: Returns one ToolResult message per input tool call (in order); skipped tools produce error results.
FUNCTION execute_tool_calls(
tools: Vec<AgentTool>,
tool_calls: [(id, name, args)],
tx: EventChannel<AgentEvent>,
cancel: CancellationToken,
get_steering: optional function,
strategy: ToolExecutionStrategy
) -> ToolExecutionResult { tool_results, steering_messages }
MATCH strategy
CASE Sequential →
RETURN execute_sequential(tools, tool_calls, tx, cancel, get_steering)
CASE Parallel →
RETURN execute_batch(tools, tool_calls, tx, cancel, get_steering)
CASE Batched { size } →
results ← []
steering_messages ← null
FOR EACH (batch_index, batch) IN enumerate(chunks(tool_calls, size))
batch_result ← AWAIT execute_batch(tools, batch, tx, cancel, get_steering=null)
results.extend(batch_result.tool_results)
// Check steering between batches
IF get_steering defined THEN
steering ← get_steering()
IF steering is non-empty THEN
steering_messages ← steering
// Skip remaining tool calls
remaining_idx ← (batch_index + 1) * size
FOR EACH (skip_id, skip_name, _) IN tool_calls[remaining_idx..]
results.append(skip_tool_call(skip_id, skip_name, tx))
END FOR
BREAK
END IF
END IF
END FOR
RETURN { tool_results: results, steering_messages }
END MATCH
END FUNCTION
execute_sequential (src/agent_loop/)
Purpose: Execute tool calls one at a time, checking for steering between each.
FUNCTION execute_sequential(
tools, tool_calls, tx, cancel, get_steering
) -> ToolExecutionResult
results ← []
steering_messages ← null
FOR EACH (index, (id, name, args)) IN enumerate(tool_calls)
(result_msg, _) ← AWAIT execute_single_tool(tools, id, name, args, tx, cancel)
results.append(result_msg)
IF get_steering defined THEN
steering ← get_steering()
IF steering is non-empty THEN
steering_messages ← steering
// Skip remaining tool calls
FOR EACH (skip_id, skip_name, _) IN tool_calls[index+1..]
results.append(skip_tool_call(skip_id, skip_name, tx))
END FOR
BREAK
END IF
END IF
END FOR
RETURN { tool_results: results, steering_messages }
END FUNCTION
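To make the skip-on-steering behaviour concrete, here is a synchronous sketch of the sequential path. It is not the crate's implementation: the ToolOutcome struct, the closure-based run and get_steering parameters, the (id, name) call tuples without arguments, and the wording of the skip message are all assumptions.

```rust
// Result for one tool call: the call id plus either output or a skip notice.
#[derive(Debug, PartialEq)]
struct ToolOutcome {
    id: String,
    text: String,
    is_error: bool,
}

// Run calls in order. After each call, poll for steering; if any arrived,
// mark every remaining call as skipped so each input still gets one result.
fn run_sequential(
    calls: &[(String, String)],          // (id, name)
    mut run: impl FnMut(&str) -> String, // executes one tool by name
    mut get_steering: impl FnMut() -> Vec<String>,
) -> (Vec<ToolOutcome>, Vec<String>) {
    let mut results = Vec::with_capacity(calls.len());
    for (index, (id, name)) in calls.iter().enumerate() {
        results.push(ToolOutcome {
            id: id.clone(),
            text: run(name),
            is_error: false,
        });
        let steering = get_steering();
        if !steering.is_empty() {
            for (skip_id, _skip_name) in &calls[index + 1..] {
                results.push(ToolOutcome {
                    id: skip_id.clone(),
                    text: "Skipped: steering message received".to_string(),
                    is_error: true,
                });
            }
            return (results, steering);
        }
    }
    (results, Vec::new())
}
```

Emitting a result for skipped calls keeps the one-result-per-tool-call postcondition intact, so each tool call in the assistant message still has a matching result.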
execute_batch (src/agent_loop/)
Purpose: Execute all tool calls in a batch concurrently, then check for steering.
FUNCTION execute_batch(
tools, tool_calls, tx, cancel, get_steering
) -> ToolExecutionResult
// Launch all tools concurrently
futures ← [execute_single_tool(tools, id, name, args, tx, cancel)
FOR EACH (id, name, args) IN tool_calls]
batch_results ← AWAIT_ALL(futures) // wait for all to complete
results ← [msg FOR (msg, _) IN batch_results]
// Check steering after all complete
steering_messages ← null
IF get_steering defined THEN
steering ← get_steering()
IF steering is non-empty THEN
steering_messages ← steering
END IF
END IF
RETURN { tool_results: results, steering_messages }
END FUNCTION
execute_single_tool (src/agent_loop/)
Purpose: Execute one tool call, emitting progress events and returning the result as a ToolResult message.
FUNCTION execute_single_tool(
tools: Vec<AgentTool>,
id: String, name: String, args: JSON,
tx: EventChannel<AgentEvent>,
cancel: CancellationToken,
config: AgentLoopConfig // for before/after_tool_execution* hooks
) -> (Message::ToolResult, is_error: bool)
tool ← find tool WHERE tool.name() == name // may be None
// ── before_tool_execution hook ───────────────────────────────────────────
// Return false to skip this tool call entirely.
IF config.before_tool_execution defined THEN
IF NOT before_tool_execution(name, id, args) THEN
// Emit a skipped error result so the LLM knows the call did not run
skip_result ← ToolResult{ content: [Text("Tool call skipped by before_tool_execution hook")], is_error: true }
EMIT ToolExecutionEnd(id, name, skip_result, is_error=true, child_loop_id=None)
msg ← Message::ToolResult{ ..., is_error: true }
EMIT MessageStart(msg); EMIT MessageEnd(msg)
RETURN (msg, true)
END IF
END IF
EMIT ToolExecutionStart(tool_call_id=id, tool_name=name, args)
// Build callbacks for streaming partial results.
// Each on_update call runs through the before/after_tool_execution_update hooks.
on_update ← callback(partial: ToolResult):
// Extract text content for hooks
text_content ← JOIN text blocks from partial.content
// before_tool_execution_update — false suppresses the event
emit ← IF config.before_tool_execution_update defined
THEN before_tool_execution_update(name, id, text_content)
ELSE true
IF emit THEN
EMIT ToolExecutionUpdate(id, name, partial_result=partial)
// after_tool_execution_update — fires only when event was not suppressed
IF config.after_tool_execution_update defined THEN
after_tool_execution_update(name, id, text_content)
END IF
END IF
on_progress ← callback that EMITS ProgressMessage(id, name, text)
ctx ← ToolContext {
tool_call_id: id,
tool_name: name,
cancel: cancel.child_token(), // new child token, same lineage
on_update: on_update,
on_progress: on_progress
}
(result, is_error) ←
IF tool found THEN
MATCH AWAIT tool.execute(args, ctx)
CASE Ok(r) → (r, false)
CASE Err(e) → (ToolResult{ content: [Text(e.to_string())] }, true)
END MATCH
ELSE
(ToolResult{ content: [Text("Tool {name} not found")] }, true)
END IF
// child_loop_id is set by SubAgentTool; None for all other tools
EMIT ToolExecutionEnd(id, name, result, is_error, child_loop_id: result.child_loop_id)
// ── after_tool_execution hook ────────────────────────────────────────────
IF config.after_tool_execution defined THEN
after_tool_execution(name, id, is_error)
END IF
msg ← Message::ToolResult {
tool_call_id: id, tool_name: name,
content: result.content, is_error, timestamp: now_ms()
}
EMIT MessageStart(msg)
EMIT MessageEnd(msg)
RETURN (msg, is_error)
END FUNCTION
compact_messages (src/context/)
Note: The algorithms below (compact_messages() and the level 1 to 3 helpers that follow) describe the legacy in-memory compaction. The current system uses a non-destructive overlay model via CompactionBlock / BlockCompactionStrategy. See compaction concept for the current design.
Purpose: Reduce context size using a 3-level strategy (Level 1 → 2 → 3) until messages fit the token budget.
Preconditions: messages is a complete conversation history.
Postconditions: Returns a subset/summary of messages with total_tokens(result) <= budget.
FUNCTION compact_messages(
messages: Vec<AgentMessage>,
config: ContextConfig
) -> Vec<AgentMessage>
budget ← config.max_context_tokens - config.system_prompt_tokens
// Already fits — return unchanged
IF total_tokens(messages) <= budget THEN
RETURN messages
END IF
// ── Level 1: Truncate verbose tool outputs ──────────────────────────────
compacted ← level1_truncate_tool_outputs(messages, config.tool_output_max_lines)
IF total_tokens(compacted) <= budget THEN
RETURN compacted
END IF
// ── Level 2: Summarize old turns ────────────────────────────────────────
compacted ← level2_summarize_old_turns(compacted, config.keep_recent)
IF total_tokens(compacted) <= budget THEN
RETURN compacted
END IF
// ── Level 3: Drop middle messages ───────────────────────────────────────
RETURN level3_drop_middle(compacted, config, budget)
END FUNCTION
level1_truncate_tool_outputs (src/context/)
Purpose: Truncate long tool output text to head + tail, preserving message structure.
FUNCTION level1_truncate_tool_outputs(
messages: Vec<AgentMessage>,
max_lines: usize
) -> Vec<AgentMessage>
RETURN [
FOR EACH msg IN messages
IF msg is ToolResult THEN
// Truncate each Text content block
new_content ← [
FOR EACH content IN msg.content
IF content is Text THEN
Text { text: truncate_head_tail(content.text, max_lines) }
ELSE
content unchanged
END IF
]
ToolResult { ...msg, content: new_content }
ELSE
msg unchanged
END IF
]
END FUNCTION
FUNCTION truncate_head_tail(text: String, max_lines: usize) -> String
lines ← text.split_lines()
IF lines.count() <= max_lines THEN
RETURN text
END IF
head_count ← max_lines / 2
tail_count ← max_lines - head_count
omitted ← lines.count() - head_count - tail_count
RETURN (
lines[0..head_count].join("\n") +
"\n\n[... {omitted} lines truncated ...]\n\n" +
lines[lines.count()-tail_count..].join("\n")
)
END FUNCTION
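A direct Rust rendering of the truncate_head_tail pseudocode might look like this. It is a sketch that follows the pseudocode line by line, not code copied from the crate.

```rust
// Keep the first max_lines / 2 lines and the remaining budget from the tail,
// replacing everything in between with a one-line marker.
fn truncate_head_tail(text: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() <= max_lines {
        return text.to_string();
    }
    let head_count = max_lines / 2;
    let tail_count = max_lines - head_count;
    let omitted = lines.len() - head_count - tail_count;
    format!(
        "{}\n\n[... {} lines truncated ...]\n\n{}",
        lines[..head_count].join("\n"),
        omitted,
        lines[lines.len() - tail_count..].join("\n"),
    )
}
```

With an odd max_lines the extra line goes to the tail, which tends to be where command output puts its final status or error.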
level2_summarize_old_turns (src/context/)
Purpose: Keep the most recent keep_recent messages in full; replace older assistant-plus-tool-result groups with one-line summaries.
FUNCTION level2_summarize_old_turns(
messages: Vec<AgentMessage>,
keep_recent: usize
) -> Vec<AgentMessage>
len ← messages.count()
IF len <= keep_recent THEN RETURN messages END IF
boundary ← len - keep_recent // messages before this index are candidates
result ← []
i ← 0
WHILE i < boundary
msg ← messages[i]
MATCH msg
CASE Assistant(content) →
// Build one-line summary
short_texts ← [t FOR t IN text content IF t.len <= 200]
tool_count ← count of ToolCall blocks in content
summary ←
IF short_texts non-empty → JOIN(short_texts)
ELSE IF tool_count > 0 → "[Assistant used {tool_count} tool(s)]"
ELSE → "[Assistant response]"
result.append(User{ content: [Text("[Summary] {summary}")] })
// Skip following ToolResult messages that belong to this turn
i ← i + 1
WHILE i < boundary AND messages[i] is ToolResult
i ← i + 1
END WHILE
CONTINUE // skip i++ below
CASE ToolResult →
// Skip orphaned tool results
i ← i + 1
CONTINUE
CASE other →
// Keep user messages and extension messages
result.append(other)
END MATCH
i ← i + 1
END WHILE
// Append recent messages in full
result.extend(messages[boundary..])
RETURN result
END FUNCTION
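The index bookkeeping in level2_summarize_old_turns is easier to follow in compilable form. The sketch below simplifies the data model: Msg is a stand-in enum, each assistant message carries a single text string plus a tool-call count instead of a list of content blocks, and an empty string plays the role of "no short text".

```rust
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    User(String),
    Assistant { text: String, tool_calls: usize },
    ToolResult(String),
}

// Summarize everything older than the last `keep_recent` messages.
fn summarize_old_turns(messages: &[Msg], keep_recent: usize) -> Vec<Msg> {
    if messages.len() <= keep_recent {
        return messages.to_vec();
    }
    let boundary = messages.len() - keep_recent;
    let mut result = Vec::new();
    let mut i = 0;
    while i < boundary {
        match &messages[i] {
            Msg::Assistant { text, tool_calls } => {
                let summary = if !text.is_empty() && text.len() <= 200 {
                    text.clone()
                } else if *tool_calls > 0 {
                    format!("[Assistant used {tool_calls} tool(s)]")
                } else {
                    "[Assistant response]".to_string()
                };
                result.push(Msg::User(format!("[Summary] {summary}")));
                i += 1;
                // Skip the tool results that belong to this assistant turn.
                while i < boundary && matches!(messages[i], Msg::ToolResult(_)) {
                    i += 1;
                }
            }
            Msg::ToolResult(_) => i += 1, // orphaned tool result: drop it
            other => {
                result.push(other.clone());
                i += 1;
            }
        }
    }
    // Recent messages are appended untouched.
    result.extend_from_slice(&messages[boundary..]);
    result
}
```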
level3_drop_middle (src/context/)
Purpose: Keep the first keep_first and last keep_recent messages; drop everything in between, inserting a marker.
FUNCTION level3_drop_middle(
messages: Vec<AgentMessage>,
config: ContextConfig,
budget: usize
) -> Vec<AgentMessage>
len ← messages.count()
first_end ← min(config.keep_first, len)
recent_start ← max(0, len - config.keep_recent)
IF first_end >= recent_start THEN
// Not enough room to split — keep as many recent as fit
RETURN keep_within_budget(messages, budget)
END IF
removed ← recent_start - first_end
marker ← User { content: [Text("[Context compacted: {removed} messages removed to fit context window]")] }
result ← messages[0..first_end] + [marker] + messages[recent_start..]
IF total_tokens(result) > budget THEN
RETURN keep_within_budget(result, budget)
END IF
RETURN result
END FUNCTION
FUNCTION keep_within_budget(messages, budget) -> Vec<AgentMessage>
// Greedily keep most-recent messages that fit
result ← []
remaining ← budget
FOR EACH msg IN REVERSE(messages)
tokens ← message_tokens(msg)
IF tokens > remaining THEN BREAK END IF
remaining ← remaining - tokens
result.prepend(msg)
END FOR
IF result.count() < messages.count() THEN
removed ← messages.count() - result.count()
result.prepend(User { content: [Text("[Context compacted: {removed} messages removed]")] })
END IF
RETURN result
END FUNCTION
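As a rough Rust sketch of the two functions above: `Msg` is a simplified stand-in for `AgentMessage`, `drop_middle` takes `keep_first` and `keep_recent` directly instead of a `ContextConfig`, and `message_tokens` reuses the bytes/4 heuristic plus 4 tokens of overhead. All three are assumptions made for illustration.

```rust
// Simplified stand-in for AgentMessage (assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Msg {
    role: &'static str,
    text: String,
}

fn msg(role: &'static str, text: &str) -> Msg {
    Msg { role, text: text.to_string() }
}

// ceil(bytes / 4) plus 4 tokens of per-message overhead.
fn message_tokens(m: &Msg) -> usize {
    (m.text.len() + 3) / 4 + 4
}

fn total_tokens(msgs: &[Msg]) -> usize {
    msgs.iter().map(message_tokens).sum()
}

// Greedily keep the most recent messages that fit, then prepend a marker.
fn keep_within_budget(msgs: &[Msg], budget: usize) -> Vec<Msg> {
    let mut remaining = budget;
    let mut kept = 0;
    for m in msgs.iter().rev() {
        let tokens = message_tokens(m);
        if tokens > remaining {
            break;
        }
        remaining -= tokens;
        kept += 1;
    }
    let removed = msgs.len() - kept;
    let mut result = Vec::with_capacity(kept + 1);
    if removed > 0 {
        result.push(msg("user", &format!("[Context compacted: {removed} messages removed]")));
    }
    result.extend_from_slice(&msgs[removed..]);
    result
}

fn drop_middle(msgs: &[Msg], keep_first: usize, keep_recent: usize, budget: usize) -> Vec<Msg> {
    let len = msgs.len();
    let first_end = keep_first.min(len);
    // saturating_sub avoids usize underflow when keep_recent > len.
    let recent_start = len.saturating_sub(keep_recent);
    if first_end >= recent_start {
        return keep_within_budget(msgs, budget);
    }
    let removed = recent_start - first_end;
    let mut result = msgs[..first_end].to_vec();
    result.push(msg(
        "user",
        &format!("[Context compacted: {removed} messages removed to fit context window]"),
    ));
    result.extend_from_slice(&msgs[recent_start..]);
    if total_tokens(&result) > budget {
        return keep_within_budget(&result, budget);
    }
    result
}
```

As in the pseudocode, the marker added by `keep_within_budget` is not itself charged against the budget.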
estimate_tokens (src/context/)
Purpose: Fast heuristic token count for a text string.
FUNCTION estimate_tokens(text: String) -> usize
RETURN ceil(text.byte_length() / 4)
// Heuristic: ~4 UTF-8 bytes per token for English text.
// Not precise — use tiktoken for exact counts.
END FUNCTION
FUNCTION content_tokens(content: Vec<Content>) -> usize
total ← 0
FOR EACH block IN content
MATCH block
CASE Text { text } → total += estimate_tokens(text)
CASE Image { data } →
raw_bytes ← data.base64_decoded_byte_length()
// ~750 bytes per image token; floor 85, cap 16,000
total += clamp(raw_bytes / 750, 85, 16_000)
CASE Thinking { thinking } → total += estimate_tokens(thinking)
CASE ToolCall { name, args }→
total += estimate_tokens(name) + estimate_tokens(args.to_string()) + 8
END MATCH
END FOR
RETURN total
END FUNCTION
FUNCTION message_tokens(msg: AgentMessage) -> usize
MATCH msg
CASE Llm(User { content }) → RETURN content_tokens(content) + 4
CASE Llm(Assistant { content }) → RETURN content_tokens(content) + 4
CASE Llm(ToolResult { tool_name, content }) →
RETURN content_tokens(content) + estimate_tokens(tool_name) + 8
CASE Extension { data } → RETURN estimate_tokens(data.to_string()) + 4
END MATCH
END FUNCTION
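The heuristics above translate almost directly into Rust. In this sketch the `Content` enum is a trimmed-down, hypothetical stand-in (the image variant carries a decoded byte length rather than base64 data, and tool-call arguments are a pre-serialized string).

```rust
// Roughly 4 UTF-8 bytes per token, rounded up.
fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

// Roughly 750 raw bytes per image token, clamped to [85, 16_000].
fn image_tokens(raw_bytes: usize) -> usize {
    (raw_bytes / 750).clamp(85, 16_000)
}

// Simplified content block (assumption for this sketch).
enum Content {
    Text(String),
    Image { raw_bytes: usize },
    Thinking(String),
    ToolCall { name: String, args_json: String },
}

fn content_tokens(content: &[Content]) -> usize {
    content
        .iter()
        .map(|block| match block {
            Content::Text(t) | Content::Thinking(t) => estimate_tokens(t),
            Content::Image { raw_bytes } => image_tokens(*raw_bytes),
            // Name + serialized arguments + 8 tokens of structural overhead.
            Content::ToolCall { name, args_json } => {
                estimate_tokens(name) + estimate_tokens(args_json) + 8
            }
        })
        .sum()
}
```

Because the estimate counts bytes rather than characters, non-ASCII text is charged more per character than English text.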
For pseudocode conventions, see the README.
4. Decision Logic
Tool Execution Strategy Dispatch
FUNCTION select_execution_strategy(strategy, tool_calls) -> ExecutionPath
MATCH strategy
CASE Sequential →
// One at a time; check steering after each tool
// Use when: tools have shared mutable state, need human-in-the-loop each step
RETURN sequential_path
CASE Parallel (default) →
// All tools concurrently via join_all
// Use when: tools are independent (most cases); lowest latency
RETURN parallel_path
CASE Batched { size } →
// Groups of `size` concurrently; check steering between groups
// Use when: tools are independent but human oversight between groups wanted
RETURN batched_path(size)
END MATCH
END FUNCTION
Compaction Level Selection
FUNCTION select_compaction_level(messages, config) -> CompactionAction
budget ← config.max_context_tokens - config.system_prompt_tokens
current ← total_tokens(messages)
IF current <= budget → RETURN NoCompaction
ELSE IF level1 fits in budget → RETURN Level1 (truncate tool outputs)
ELSE IF level2 fits in budget → RETURN Level2 (summarize old turns)
ELSE → RETURN Level3 (drop middle)
END FUNCTION
StopReason Determination (in provider implementations)
FUNCTION determine_stop_reason(provider_stop_signal) -> StopReason
MATCH provider_stop_signal
CASE "end_turn" (Anthropic) | "stop" (OpenAI) | natural end → Stop
CASE "max_tokens" (Anthropic) | "length" (OpenAI) → Length
CASE "tool_use" (Anthropic) | "tool_calls" (OpenAI) → ToolUse
CASE cancel token triggered → Aborted
CASE any provider error → Error
END MATCH
END FUNCTION
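A minimal Rust sketch of this mapping follows. Two details are assumptions not stated above: cancellation and provider errors are checked before the provider's signal, and an unrecognized or missing signal is treated as a natural end (`Stop`). The function names and parameters are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StopReason {
    Stop,
    Length,
    ToolUse,
    Aborted,
    Error,
}

// Maps the provider-specific stop strings listed above.
fn stop_reason_from_signal(signal: &str) -> Option<StopReason> {
    match signal {
        "end_turn" | "stop" => Some(StopReason::Stop),
        "max_tokens" | "length" => Some(StopReason::Length),
        "tool_use" | "tool_calls" => Some(StopReason::ToolUse),
        _ => None,
    }
}

// Assumed precedence: cancellation, then provider error, then the signal itself.
fn determine_stop_reason(signal: Option<&str>, cancelled: bool, had_error: bool) -> StopReason {
    if cancelled {
        return StopReason::Aborted;
    }
    if had_error {
        return StopReason::Error;
    }
    signal
        .and_then(stop_reason_from_signal)
        .unwrap_or(StopReason::Stop) // assumption: unknown or absent signal is a natural end
}
```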
Input Filter Chain
FUNCTION apply_input_filters(filters, user_text) -> FilterChainResult
warnings ← []
FOR EACH filter IN filters
MATCH filter.filter(user_text)
CASE Pass → continue
CASE Warn(w) → warnings.append(w)
CASE Reject(r) →
// First Reject wins — discards all accumulated warnings
RETURN Rejected(r)
END MATCH
END FOR
IF warnings non-empty THEN
RETURN PassWithWarnings(warnings)
END IF
RETURN Pass
END FUNCTION
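The filter chain can be sketched in Rust as below. The `InputFilter` trait name and the two example filters (`NoShouting`, `MaxLen`) are hypothetical; only the Pass/Warn/Reject semantics come from the pseudocode above.

```rust
#[derive(Debug, PartialEq)]
enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

#[derive(Debug, PartialEq)]
enum FilterChainResult {
    Pass,
    PassWithWarnings(Vec<String>),
    Rejected(String),
}

// Hypothetical trait name; the real trait may differ.
trait InputFilter {
    fn filter(&self, text: &str) -> FilterResult;
}

fn apply_input_filters(filters: &[Box<dyn InputFilter>], text: &str) -> FilterChainResult {
    let mut warnings = Vec::new();
    for f in filters {
        match f.filter(text) {
            FilterResult::Pass => {}
            FilterResult::Warn(w) => warnings.push(w),
            // First Reject wins and discards accumulated warnings.
            FilterResult::Reject(r) => return FilterChainResult::Rejected(r),
        }
    }
    if warnings.is_empty() {
        FilterChainResult::Pass
    } else {
        FilterChainResult::PassWithWarnings(warnings)
    }
}

// Example filter: warn when the input is entirely upper case.
struct NoShouting;
impl InputFilter for NoShouting {
    fn filter(&self, text: &str) -> FilterResult {
        if text.chars().any(|c| c.is_alphabetic()) && text == text.to_uppercase() {
            FilterResult::Warn("input is all caps".to_string())
        } else {
            FilterResult::Pass
        }
    }
}

// Example filter: reject inputs longer than a byte limit.
struct MaxLen(usize);
impl InputFilter for MaxLen {
    fn filter(&self, text: &str) -> FilterResult {
        if text.len() > self.0 {
            FilterResult::Reject("input too long".to_string())
        } else {
            FilterResult::Pass
        }
    }
}
```

Filter order matters: a warning raised by an earlier filter is lost if a later filter rejects.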
Context Overflow Detection
FUNCTION detect_context_overflow(provider_error_or_message) -> bool
// Path 1: HTTP error response
IF error is ProviderError::ContextOverflow THEN RETURN true END IF
// Path 2: SSE streaming error (Anthropic, OpenAI report overflow in-stream)
IF message.stop_reason == Error
AND message.error_message defined
AND is_context_overflow_message(message.error_message)
THEN RETURN true END IF
RETURN false
// Caller response: next turn will trigger compact_messages() if context_config set
END FUNCTION
3. Initialization & Lifecycle Sequences
Agent Construction (Builder Pattern)
SEQUENCE AgentConstruction
1. BasicAgent::new(model_config: ModelConfig)
- Stores model_config (provider identity: id, api_key, base_url, api protocol, cost rates)
- Initializes messages = []
- Initializes tools = []
- Sets defaults: thinking = Off, tool_execution = Parallel, retry = default
2. .with_system_prompt(text)
- Stores system_prompt string
3. .with_tools(vec)
- Replaces or extends the tools list
5. .with_context_config(config)
- Enables automatic compaction before each turn
6. .with_execution_limits(limits)
- Enables turn/token/duration caps
7. .with_skills(skill_set)
- Appends skill XML index to system_prompt
8. .with_mcp_server_stdio(cmd, args, env) [async]
- Spawns MCP subprocess
- Calls initialize + tools/list over JSON-RPC
- Wraps each discovered tool as McpToolAdapter (implements AgentTool)
- Appends adapters to tools list
9. .with_openapi_file/url/spec(...) [async, feature-gated]
- Parses OpenAPI spec
- Generates one OpenApiToolAdapter per matching operation
- Appends adapters to tools list
10. Callbacks: .on_before_turn(f), .on_after_turn(f), .on_error(f)
- Stores function pointers; called at appropriate points in run_loop
11. .with_input_filter(filter)
- Appends to input_filters list
12. .with_compaction_strategy(strategy)
- Sets context_config.compaction.in_memory_strategy (custom compaction implementation)
END SEQUENCE
Agent Run Lifecycle
SEQUENCE AgentRun (invoked by agent.prompt("..."))
1. Acquire run lock (ensure not already streaming)
- is_streaming ← true
- Create new CancellationToken
2. Build AgentContext from current Agent state
- Snapshot: system_prompt, messages (copy), tools
3. Build AgentLoopConfig from current Agent config
- Wire get_steering_messages → drain steering_queue
- Wire get_follow_up_messages → drain follow_up_queue
4. Create event channel (tx, rx)
5. SPAWN async task: agent_loop(prompts, context, config, tx, cancel)
6. Return rx to caller immediately (non-blocking)
- Caller consumes events: AgentStart, TurnStart/End, MessageStart/Update/End,
ToolExecutionStart/Update/End, ProgressMessage, AgentEnd
7. When AgentEnd received or channel closes:
- Merge new_messages into Agent.messages
- is_streaming ← false
- CancellationToken dropped
END SEQUENCE
Abort Lifecycle
SEQUENCE AgentAbort (invoked by agent.abort())
1. IF cancel token exists THEN
cancel.cancel() // signals all child tokens
2. Agent loop checks cancel.is_cancelled() at:
- Start of each outer/inner loop iteration
- In BashTool's tokio::select! race
- In ReadFileTool/WriteFileTool/EditFileTool before each I/O op
3. Loop exits cleanly at the next check point; AgentEnd is not guaranteed on abort
[AMBIGUOUS: AgentEnd may or may not be emitted depending on where
in the loop cancellation is detected; Start/Done events from the provider
may still arrive before cancellation is noticed]
END SEQUENCE
Message Persistence
SEQUENCE MessagePersistence
Save:
1. agent.save_messages() → serde_json::to_string(agent.messages)
2. Caller writes JSON string to disk/storage
Restore:
1. Caller reads JSON string from disk/storage
2. agent.restore_messages(json_str) → serde_json::from_str(json_str) → Vec<AgentMessage>
3. Agent.messages ← deserialized messages
4. Next agent.prompt() continues from restored history
All types in AgentMessage tree derive Serialize + Deserialize.
JSON format: array of untagged AgentMessage items;
Llm variant: has "role" field ("user", "assistant", "toolResult")
Extension variant: has "role" field "extension" + "kind" + "data"
END SEQUENCE
BasicAgent::new and BasicAgent::prompt (src/agents/basic_agent.rs)
Purpose: Construct a BasicAgent and start a run. These are the primary application-facing entry points.
FUNCTION BasicAgent::new(model_config: ModelConfig) -> BasicAgent
RETURN BasicAgent {
model_config: model_config, // complete provider identity: id, api_key, base_url, api, cost
system_prompt: "",
thinking_level: Off,
max_tokens: None,
temperature: None,
messages: [],
tools: [],
steering_queue: Arc(Mutex([])),
follow_up_queue: Arc(Mutex([])),
steering_mode: QueueMode::OneAtATime,
follow_up_mode: QueueMode::OneAtATime,
context_config: Some(ContextConfig::default()),
execution_limits: Some(ExecutionLimits::default()),
cache_config: CacheConfig::default(),
tool_execution: Parallel,
retry_config: RetryConfig::default(),
before_turn: None,
after_turn: None,
on_error: None,
input_filters: [],
// compaction strategies are now inside context_config.compaction (G5)
cancel: None,
is_streaming: false
}
END FUNCTION
FUNCTION Agent::prompt(text: String) -> UnboundedReceiver<AgentEvent>
RETURN Agent::prompt_messages([AgentMessage::Llm(Message::user(text))])
END FUNCTION
FUNCTION Agent::prompt_messages(messages: Vec<AgentMessage>) -> UnboundedReceiver<AgentEvent>
(tx, rx) ← new unbounded channel
SPAWN Agent::prompt_messages_with_sender(messages, tx)
RETURN rx
END FUNCTION
FUNCTION Agent::prompt_messages_with_sender(
messages: Vec<AgentMessage>,
tx: EventSender<AgentEvent>
) [async]
// Guard: panics if already streaming
ASSERT NOT self.is_streaming,
"Agent is already streaming. Use steer() or follow_up()."
self.is_streaming ← true
self.cancel ← Some(CancellationToken::new())
// Build context snapshot for this run
context ← AgentContext {
system_prompt: self.system_prompt.clone(),
messages: self.messages.clone(),
tools: self.tools // borrowed
}
// Wire queue closures — capture Arc pointers
steering_arc ← Arc::clone(self.steering_queue)
followup_arc ← Arc::clone(self.follow_up_queue)
config ← AgentLoopConfig {
provider: self.provider,
model: self.model,
api_key: self.api_key,
thinking_level: self.thinking_level,
max_tokens: self.max_tokens,
temperature: self.temperature,
model_config: self.model_config,
get_steering_messages: closure {
LOCK(steering_arc)
MATCH self.steering_mode
CASE OneAtATime → IF queue non-empty THEN [queue.remove(0)] ELSE []
CASE All → queue.drain_all()
UNLOCK
},
get_follow_up_messages: closure {
LOCK(followup_arc)
MATCH self.follow_up_mode
CASE OneAtATime → IF queue non-empty THEN [queue.remove(0)] ELSE []
CASE All → queue.drain_all()
UNLOCK
},
context_config: self.context_config, // includes compaction strategies (G5)
execution_limits: self.execution_limits,
cache_config: self.cache_config,
tool_execution: self.tool_execution,
retry_config: self.retry_config,
before_turn: self.before_turn,
after_turn: self.after_turn,
on_error: self.on_error,
input_filters: self.input_filters
}
new_messages ← AWAIT agent_loop(messages, context, config, tx, self.cancel.unwrap())
// Merge new messages back into Agent.messages
self.messages.extend(new_messages)
self.is_streaming ← false
self.cancel ← None
END FUNCTION
5. Concurrency & Async Patterns
Parallel Tool Execution
PATTERN ParallelToolExecution
// When Parallel strategy is used, all tool calls race concurrently.
// This is safe because:
// 1. Tools share no mutable state (each has its own ToolContext)
// 2. Each ToolContext gets a child cancellation token (same lineage, independent trigger)
// 3. The event channel (tx) is cloned into each ToolContext — Unbounded sends never block
// 4. Results are collected in original order via join_all (preserves tool_call ordering)
futures ← [execute_single_tool(id, name, args) FOR EACH (id, name, args) IN tool_calls]
results ← AWAIT_ALL(futures) // futures::join_all — waits for ALL, order preserved
// Steering is checked AFTER all complete (cannot interrupt mid-batch in Parallel mode)
PATTERN SequentialToolExecution
// Tools run one at a time; steering is checked after each.
// Use when tools access shared resources (e.g., same file, same database row).
PATTERN BatchedToolExecution
// Groups of N run in parallel; steering checked between groups.
// Balances latency (N concurrent) with control (interrupt between groups).
Cancellation Token Propagation
PATTERN CancellationPropagation
// CancellationToken forms a tree. Cancelling a parent cancels all children.
Agent.cancel (root token)
└── AgentLoop cancel (same token passed in)
└── ToolContext.cancel (child_token() — inherits from parent)
└── SubAgentTool: forwards parent cancel to child agent_loop()
// Checks occur at:
// - Top of each loop iteration in run_loop (fast path)
// - tokio::select! in BashTool (races against timeout)
// - Explicit is_cancelled() checks in ReadFileTool, WriteFileTool, EditFileTool
// Important: abort() on Agent cancels ALL in-progress tool calls simultaneously,
// regardless of execution strategy.
Event Channel Architecture
PATTERN EventChannelArchitecture
// Single producer (AgentLoop), single consumer (caller).
// Channel: tokio::sync::mpsc::unbounded_channel; sending never blocks.
AgentLoop ──tx──→ UnboundedChannel ──rx──→ Application
// Sub-agent events are NOT directly forwarded to parent channel.
// SubAgentTool spawns a separate task to translate sub-agent events:
// AgentEvent::MessageUpdate(Text(delta)) → on_update(ToolResult{text:delta})
// AgentEvent::ProgressMessage{text} → on_progress(text)
// These are then emitted to the parent channel as ToolExecutionUpdate/ProgressMessage.
// This means: parent sees sub-agent activity but via ToolExecutionUpdate wrappers,
// NOT as nested AgentStart/AgentEnd/TurnStart/TurnEnd events.
Steering Queue Thread Safety
PATTERN SteeringQueueSafety
// steering_queue and follow_up_queue are Arc<Mutex<Vec<AgentMessage>>>.
// Write path (application thread):
// agent.steer(msg) → LOCK(queue), queue.push(msg), UNLOCK
// agent.follow_up(msg) → LOCK(follow_up_queue), queue.push(msg), UNLOCK
// Read path (agent loop task) — behavior depends on QueueMode:
// QueueMode::OneAtATime (default):
// LOCK(queue), msg = queue.remove(0) IF queue non-empty, UNLOCK, return [msg] or []
// → delivers at most one message per check; rest remain for next check
// QueueMode::All:
// LOCK(queue), msgs = queue.drain_all(), UNLOCK, return msgs
// → delivers everything at once
// Read is called only between tool executions — never concurrently with another read.
// No deadlock risk: lock is held for microseconds (no I/O inside lock).
// No data race: Mutex guarantees exclusive access.
// Queues are passed to AgentLoopConfig as closures capturing the Arc pointer,
// so the external caller can enqueue messages from any thread at any time.
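The read path above can be sketched with the standard library alone. Here `String` stands in for `AgentMessage`, and `drain_queue` is a hypothetical helper name; the two `QueueMode` behaviors follow the pattern described above.

```rust
use std::sync::{Arc, Mutex};

#[derive(Clone, Copy)]
enum QueueMode {
    OneAtATime,
    All,
}

// Take messages from the shared queue according to the mode.
// The lock is held only for the in-memory removal, never across I/O.
fn drain_queue(queue: &Arc<Mutex<Vec<String>>>, mode: QueueMode) -> Vec<String> {
    let mut q = queue.lock().unwrap();
    let taken: Vec<String> = match mode {
        QueueMode::OneAtATime => {
            if q.is_empty() {
                Vec::new()
            } else {
                vec![q.remove(0)]
            }
        }
        QueueMode::All => q.drain(..).collect(),
    };
    taken
}
```

A writer on another thread only needs its own `Arc::clone` of the queue and a short `lock()` to push, which is why callers can enqueue while a run is in progress.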
delay_for_attempt (src/provider/retry.rs)
Purpose: Compute the sleep duration before a retry attempt using exponential backoff with jitter.
FUNCTION delay_for_attempt(config: RetryConfig, attempt: usize) -> Duration
// attempt is 1-indexed
base_ms ← config.initial_delay_ms * (config.backoff_multiplier ^ (attempt - 1))
capped_ms ← min(base_ms, config.max_delay_ms)
// ±20% uniform jitter: multiply by random value in [0.8, 1.2]
jitter ← 0.8 + random_float_0_to_1() * 0.4
delay_ms ← floor(capped_ms * jitter)
RETURN Duration::from_ms(delay_ms)
// Examples with defaults (initial=1000ms, multiplier=2.0, max=30000ms):
// attempt 1 → base=1000ms → ~800–1200ms
// attempt 2 → base=2000ms → ~1600–2400ms
// attempt 3 → base=4000ms → ~3200–4800ms
END FUNCTION
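A small Rust sketch of the backoff computation follows. To keep it deterministic and dependency-free, the random sample is passed in as `unit_random` (a value in [0, 1)); that parameter and the `u32` attempt type are assumptions of this sketch, and a real caller would draw the sample from an RNG.

```rust
use std::time::Duration;

struct RetryConfig {
    initial_delay_ms: u64,
    backoff_multiplier: f64,
    max_delay_ms: u64,
}

// `attempt` is 1-indexed; `unit_random` is a sample in [0, 1) supplied by the caller.
fn delay_for_attempt(config: &RetryConfig, attempt: u32, unit_random: f64) -> Duration {
    let exponent = attempt.saturating_sub(1) as i32;
    let base_ms = config.initial_delay_ms as f64 * config.backoff_multiplier.powi(exponent);
    let capped_ms = base_ms.min(config.max_delay_ms as f64);
    // Jitter factor in [0.8, 1.2): plus or minus 20% around the capped delay.
    let jitter = 0.8 + unit_random * 0.4;
    Duration::from_millis((capped_ms * jitter).floor() as u64)
}
```

With the defaults listed above, a sample of 0.5 yields the un-jittered delays (1000 ms, 2000 ms, 4000 ms, ...) until the 30000 ms cap takes over.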
ProviderError::classify (src/provider/traits.rs)
Purpose: Map an HTTP error response to the correct ProviderError variant.
FUNCTION ProviderError::classify(status: u16, message: String) -> ProviderError
IF is_context_overflow(status, message) THEN
RETURN ContextOverflow { message }
END IF
IF status == 429 THEN
RETURN RateLimited { retry_after_ms: None }
END IF
IF status == 401 OR status == 403 THEN
RETURN Auth(message)
END IF
RETURN Api(message)
END FUNCTION
FUNCTION is_context_overflow(status: u16, message: String) -> bool
// Some providers (Cerebras, Mistral) return 400/413 with empty body
IF (status == 400 OR status == 413) AND message.trim() is empty THEN
RETURN true
END IF
lower ← message.to_lowercase()
RETURN any of OVERFLOW_PHRASES is a substring of lower
// OVERFLOW_PHRASES includes:
// "prompt is too long" (Anthropic)
// "input is too long" (Bedrock)
// "exceeds the context window" (OpenAI)
// "exceeds the maximum" (Google)
// "maximum prompt length" (xAI)
// "reduce the length of the messages" (Groq)
// "maximum context length" (OpenRouter)
// "context length exceeded" (generic)
// "too many tokens" (generic)
// ... 15 phrases total
END FUNCTION
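The classification order above (overflow first, then 429, then 401/403, then a generic API error) can be sketched as follows. The phrase list here contains only the nine phrases shown above, not the full set, and the enum is a simplified stand-in for the real `ProviderError`.

```rust
#[derive(Debug, PartialEq)]
enum ProviderError {
    ContextOverflow { message: String },
    RateLimited { retry_after_ms: Option<u64> },
    Auth(String),
    Api(String),
}

// Only the phrases listed above; the full list is longer.
const OVERFLOW_PHRASES: &[&str] = &[
    "prompt is too long",
    "input is too long",
    "exceeds the context window",
    "exceeds the maximum",
    "maximum prompt length",
    "reduce the length of the messages",
    "maximum context length",
    "context length exceeded",
    "too many tokens",
];

fn is_context_overflow(status: u16, message: &str) -> bool {
    // Some providers return 400/413 with an empty body on overflow.
    if (status == 400 || status == 413) && message.trim().is_empty() {
        return true;
    }
    let lower = message.to_lowercase();
    OVERFLOW_PHRASES.iter().any(|p| lower.contains(*p))
}

impl ProviderError {
    fn classify(status: u16, message: String) -> ProviderError {
        if is_context_overflow(status, &message) {
            return ProviderError::ContextOverflow { message };
        }
        match status {
            429 => ProviderError::RateLimited { retry_after_ms: None },
            401 | 403 => ProviderError::Auth(message),
            _ => ProviderError::Api(message),
        }
    }
}
```

Checking for overflow before the status code matters because overflow errors arrive under several different status codes depending on the provider.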
SubAgentTool::execute (src/agents/sub_agent.rs)
Purpose: Delegate a task to an isolated child agent loop, return its final text as a ToolResult.
Preconditions: params.task is a non-empty string.
Postconditions: Returns final assistant text from the child run; child context is discarded.
FUNCTION SubAgentTool::execute(
params: JSON,
ctx: ToolContext
) -> Result<ToolResult, ToolError>
task ← params["task"] as String // ERROR "Missing required 'task' parameter" if absent
cancel ← ctx.cancel
on_update ← ctx.on_update
on_progress ← ctx.on_progress
// Build fresh child context (no history carried over)
child_context ← AgentContext {
system_prompt: self.system_prompt,
messages: [], // isolated — starts empty
tools: self.tools // child has its own toolset (no SubAgentTool instances)
}
child_config ← AgentLoopConfig {
provider: self.provider,
model: self.model,
api_key: self.api_key,
thinking_level: self.thinking_level,
max_tokens: self.max_tokens,
execution_limits: {
max_turns: self.max_turns, // primary guard (default: 10)
max_total_tokens: 1_000_000, // generous fallback
max_duration: 300s // generous fallback
},
// No steering, no follow-ups, no input filters in sub-agents
get_steering_messages: null,
get_follow_up_messages: null,
input_filters: [],
...other config from self
}
(event_tx, event_rx) ← new unbounded channel
// Forward events to parent if callbacks are present
IF on_update defined OR on_progress defined THEN
forwarder ← SPAWN async task:
WHILE event ← event_rx.recv()
IF event is ProgressMessage { text } THEN
on_progress(text) // if defined
END IF
IF event is MessageUpdate { delta: Text(delta) } THEN
on_update(ToolResult{ content: [Text(delta)] })
END IF
IF event is ToolExecutionStart { tool_name } THEN
on_update(ToolResult{ content: [Text("[sub-agent calling tool: {tool_name}]")] })
END IF
END WHILE
END IF
prompt_msg ← AgentMessage::Llm(Message::user(task))
new_messages ← AWAIT agent_loop([prompt_msg], child_context, child_config, event_tx, cancel)
IF forwarder defined THEN AWAIT forwarder END IF
// Extract final assistant text
result_text ← extract_final_text(new_messages)
RETURN Ok(ToolResult {
content: [Text(result_text)],
details: { sub_agent: self.tool_name, turns: new_messages.count() }
})
END FUNCTION
FUNCTION extract_final_text(messages: Vec<AgentMessage>) -> String
FOR EACH msg IN REVERSE(messages)
IF msg is Assistant THEN
texts ← [t FOR t IN msg.content IF t is Text]
IF texts non-empty THEN
RETURN JOIN(texts)
END IF
END IF
END FOR
RETURN "(sub-agent produced no text output)"
END FUNCTION
BashTool::execute (src/tools/bash.rs)
Purpose: Execute a shell command, capture output, enforce safety.
Preconditions: params.command is present.
Postconditions: Returns Ok(ToolResult) even for non-zero exit codes (LLM needs the error to self-correct).
FUNCTION BashTool::execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>
command ← params["command"] as String // InvalidArgs if missing
cancel ← ctx.cancel
// Safety: check deny patterns (substring match)
FOR EACH pattern IN self.deny_patterns
IF command contains pattern THEN
RETURN Err(Failed("Command blocked by safety policy: contains '{pattern}'"))
END IF
END FOR
// Optional confirmation callback
IF self.confirm_fn defined AND NOT self.confirm_fn(command) THEN
RETURN Err(Failed("Command was not confirmed by the user."))
END IF
// Build subprocess: bash -c "{command}"
cmd ← Command("bash", ["-c", command])
IF self.cwd defined THEN cmd.current_dir(self.cwd) END IF
cmd.stdout(piped), cmd.stderr(piped)
// Race: cancellation vs timeout vs command completion
result ← SELECT {
cancel.cancelled() → RETURN Err(Cancelled)
sleep(self.timeout) → RETURN Err(Failed("Command timed out after {N}s"))
cmd.output() → result // may be Err if spawn failed
}
output ← result // Err(io) → Err(Failed("Failed to execute: {e}"))
stdout ← output.stdout as utf8 (lossy)
stderr ← output.stderr as utf8 (lossy)
// Truncate at limit
IF stdout.len > self.max_output_bytes THEN
stdout ← stdout[0..max_output_bytes] + "\n... (output truncated)"
END IF
IF stderr.len > self.max_output_bytes THEN
stderr ← stderr[0..max_output_bytes] + "\n... (output truncated)"
END IF
exit_code ← output.exit_code OR -1
text ←
IF stderr is empty THEN
"Exit code: {exit_code}\n{stdout}"
ELSE
"Exit code: {exit_code}\nSTDOUT:\n{stdout}\nSTDERR:\n{stderr}"
END IF
// Always Ok — non-zero exit is NOT a ToolError
RETURN Ok(ToolResult {
content: [Text(text)],
details: { exit_code, success: exit_code == 0 }
})
END FUNCTION
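The pure, non-async parts of the routine above (deny-pattern check, truncation, result formatting) can be sketched in Rust as below. The helper names are hypothetical. One deliberate difference from the pseudocode: truncation backs up to a UTF-8 character boundary, since slicing a Rust `&str` mid-character would panic.

```rust
// Returns the first deny pattern found in the command, if any (substring match).
fn blocked_pattern<'a>(command: &str, deny_patterns: &'a [String]) -> Option<&'a str> {
    deny_patterns
        .iter()
        .map(String::as_str)
        .find(|p| command.contains(*p))
}

// Truncate to at most `max_bytes`, backing up to a char boundary (addition in this sketch).
fn truncate_output(s: &str, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s.to_string();
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}\n... (output truncated)", &s[..end])
}

// Build the text shown to the model; stderr is included only when non-empty.
fn format_result(exit_code: i32, stdout: &str, stderr: &str) -> String {
    if stderr.is_empty() {
        format!("Exit code: {exit_code}\n{stdout}")
    } else {
        format!("Exit code: {exit_code}\nSTDOUT:\n{stdout}\nSTDERR:\n{stderr}")
    }
}
```

Substring deny patterns are easy to reason about but also easy to evade (for example with extra whitespace), so they are best treated as a guard rail rather than a sandbox.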
ReadFileTool::execute (src/tools/file.rs)
Purpose: Read a file's contents. Routes to binary (image) or text path based on extension.
FUNCTION ReadFileTool::execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>
path ← params["path"] as String // InvalidArgs if missing
IF ctx.cancel.is_cancelled THEN RETURN Err(Cancelled) END IF
metadata ← AWAIT fs.metadata(path) // Err → Failed("Cannot access {path}: {e}")
IF is_image_extension(path) THEN
// ── Image path ────────────────────────────────────────────────────────
IF metadata.size > 20MB THEN
RETURN Err(Failed("Image too large"))
END IF
bytes ← AWAIT fs.read(path)
data ← base64_encode(bytes)
mime_type ← get_mime_type(path)
RETURN Ok(ToolResult {
content: [Image { data, mime_type }],
details: { path, bytes: bytes.len() }
})
END IF
// ── Text path ─────────────────────────────────────────────────────────
IF metadata.size > self.max_bytes THEN
RETURN Err(Failed("File too large. Use offset/limit for partial reads."))
END IF
content ← AWAIT fs.read_to_string(path)
lines ← content.split_lines()
total ← lines.count()
offset ← params["offset"] as usize (1-indexed) // optional, default: 1
limit ← params["limit"] as usize // optional, default: all
(start, end) ← compute_range(offset, limit, total)
// Line-numbered output: " 1 | first line"
numbered ← ["{start+i+1:>4} | {line}" FOR (i, line) IN enumerate(lines[start..end])]
header ←
IF start > 0 OR end < total THEN "[Lines {start+1}-{end} of {total}]"
ELSE "[{total} lines]"
RETURN Ok(ToolResult {
content: [Text("{header}\n{numbered.join('\n')}")],
details: { path }
})
END FUNCTION
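The text path's range and numbering logic might look like this in Rust. `compute_range` is not defined in the pseudocode above, so the version here is an assumption: it converts the 1-indexed `offset` to a 0-indexed start and clamps both ends to the line count. `render_lines` is a hypothetical name.

```rust
// Assumed behavior: 1-indexed offset, both ends clamped to `total`.
fn compute_range(offset: Option<usize>, limit: Option<usize>, total: usize) -> (usize, usize) {
    let start = offset.unwrap_or(1).saturating_sub(1).min(total);
    let end = match limit {
        Some(n) => start.saturating_add(n).min(total),
        None => total,
    };
    (start, end)
}

// Produce the header plus right-aligned, line-numbered output.
fn render_lines(content: &str, offset: Option<usize>, limit: Option<usize>) -> String {
    let lines: Vec<&str> = content.lines().collect();
    let total = lines.len();
    let (start, end) = compute_range(offset, limit, total);
    let numbered: Vec<String> = lines[start..end]
        .iter()
        .enumerate()
        .map(|(i, line)| format!("{:>4} | {}", start + i + 1, line))
        .collect();
    let header = if start > 0 || end < total {
        format!("[Lines {}-{} of {}]", start + 1, end, total)
    } else {
        format!("[{} lines]", total)
    };
    format!("{}\n{}", header, numbered.join("\n"))
}
```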
EditFileTool::execute (src/tools/edit.rs)
Purpose: Make a surgical search-and-replace edit in an existing file.
Preconditions: File exists; old_text occurs exactly once in the file.
Postconditions: File on disk has exactly the one occurrence of old_text replaced by new_text.
FUNCTION EditFileTool::execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>
path ← params["path"] as String // InvalidArgs if missing
old_text ← params["old_text"] as String // InvalidArgs if missing
new_text ← params["new_text"] as String // InvalidArgs if missing
IF ctx.cancel.is_cancelled THEN RETURN Err(Cancelled) END IF
content ← AWAIT fs.read_to_string(path)
// Err → Failed("Cannot read {path}. Use write_file to create new files.")
match_count ← count of occurrences of old_text in content
IF match_count == 0 THEN
// Provide helpful fuzzy hint
hint ← find_similar_text(content, old_text)
IF hint defined THEN
message ← "old_text not found in {path}.\n\nDid you mean:\n```\n{hint}\n```\n..."
ELSE
message ← "old_text not found in {path}.\n\nTip: Use read_file to see contents..."
END IF
RETURN Err(Failed(message))
END IF
IF match_count > 1 THEN
RETURN Err(Failed(
"old_text matches {match_count} locations. Include more context to make match unique."
))
END IF
// Replace exactly the first (and only) occurrence
new_content ← content.replace_once(old_text, new_text)
AWAIT fs.write(path, new_content)
old_lines ← old_text.line_count()
new_lines ← new_text.line_count()
RETURN Ok(ToolResult {
content: [Text("Replaced {old_lines} line(s) with {new_lines} line(s) in {path}")],
details: { path, old_lines, new_lines }
})
END FUNCTION
FUNCTION find_similar_text(content: String, target: String) -> Option<String>
// Fuzzy hint: find the first line of target in the file
target_trimmed ← target.trim()
first_line ← target_trimmed.first_line().trim()
IF first_line is empty THEN RETURN None END IF
lines ← content.split_lines()
FOR EACH (i, line) IN enumerate(lines)
IF line contains first_line THEN
end ← min(i + target_trimmed.line_count() + 1, lines.count())
RETURN Some(lines[i..end].join("\n"))
END IF
END FOR
RETURN None
END FUNCTION
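The exactly-once rule and the fuzzy hint can be sketched with standard string methods. `replace_unique` and `EditError` are hypothetical names; file I/O and cancellation are left out so the core logic stands alone.

```rust
#[derive(Debug, PartialEq)]
enum EditError {
    NotFound,
    Ambiguous(usize),
}

// Replace `old_text` only when it occurs exactly once (non-overlapping matches).
fn replace_unique(content: &str, old_text: &str, new_text: &str) -> Result<String, EditError> {
    match content.matches(old_text).count() {
        0 => Err(EditError::NotFound),
        1 => Ok(content.replacen(old_text, new_text, 1)),
        n => Err(EditError::Ambiguous(n)),
    }
}

// Hint: locate the first line of the target and return the surrounding lines.
fn find_similar_text(content: &str, target: &str) -> Option<String> {
    let target_trimmed = target.trim();
    let first_line = target_trimmed.lines().next()?.trim();
    if first_line.is_empty() {
        return None;
    }
    let lines: Vec<&str> = content.lines().collect();
    let span = target_trimmed.lines().count() + 1;
    lines.iter().position(|l| l.contains(first_line)).map(|i| {
        let end = (i + span).min(lines.len());
        lines[i..end].join("\n")
    })
}
```

Refusing ambiguous matches pushes the model to include more context, which is cheaper than recovering from an edit applied in the wrong place.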
SkillSet::format_for_prompt (src/context/skills.rs)
Purpose: Format all loaded skills as an XML index for injection into the system prompt.
Standard: Conforms to the AgentSkills open standard (agentskills.io/integrate-skills).
FUNCTION SkillSet::format_for_prompt() -> String
IF self.skills is empty THEN RETURN "" END IF
// Skills are sorted by name ascending
sorted_skills ← sort(self.skills, by: skill.name)
out ← "<available_skills>\n"
FOR EACH skill IN sorted_skills
out += " <skill>\n"
out += " <name>" + xml_escape(skill.name) + "</name>\n"
out += " <description>" + xml_escape(skill.description) + "</description>\n"
out += " <location>" + xml_escape(skill.file_path.to_string()) + "</location>\n"
out += " </skill>\n"
END FOR
out += "</available_skills>"
RETURN out
// xml_escape replaces: & → &amp;  < → &lt;  > → &gt;  " → &quot;  ' → &apos;
END FUNCTION
// Example output:
// <available_skills>
// <skill>
// <name>weather</name>
// <description>Get current weather and forecasts.</description>
// <location>/home/user/.skills/weather/SKILL.md</location>
// </skill>
// </available_skills>
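A compact Rust sketch of the formatter follows. `Skill` is reduced to three string fields, the escape table uses the five predefined XML entities, and the indentation inside the output is a choice made here; treat all three as assumptions rather than the library's exact output.

```rust
// Reduced skill record (assumption for this sketch).
struct Skill {
    name: String,
    description: String,
    file_path: String,
}

// Escape the five characters that have predefined XML entities.
fn xml_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

fn format_for_prompt(skills: &[Skill]) -> String {
    if skills.is_empty() {
        return String::new();
    }
    // Sort by name so the index is stable across runs.
    let mut sorted: Vec<&Skill> = skills.iter().collect();
    sorted.sort_by(|a, b| a.name.cmp(&b.name));
    let mut out = String::from("<available_skills>\n");
    for s in sorted {
        out.push_str("  <skill>\n");
        out.push_str(&format!("    <name>{}</name>\n", xml_escape(&s.name)));
        out.push_str(&format!("    <description>{}</description>\n", xml_escape(&s.description)));
        out.push_str(&format!("    <location>{}</location>\n", xml_escape(&s.file_path)));
        out.push_str("  </skill>\n");
    }
    out.push_str("</available_skills>");
    out
}
```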
SkillSet::load (src/context/skills.rs)
Purpose: Load skills from one or more directories. Later directories override earlier ones on name collision.
FUNCTION SkillSet::load(dirs: Vec<Path>) -> Result<SkillSet, SkillError>
skill_map ← HashMap<String, Skill> // key = skill name
FOR EACH (index, dir) IN enumerate(dirs)
IF dir does not exist THEN
CONTINUE // silently skip missing directories
END IF
source_label ← "dir:{index}"
FOR EACH entry IN list_subdirectories(dir)
skill_md_path ← entry.path / "SKILL.md"
IF skill_md_path does not exist THEN
CONTINUE
END IF
content ← read_to_string(skill_md_path)
(name, description) ← parse_frontmatter(content)
// Returns SkillError::InvalidFrontmatter or SkillError::MissingField on failure
base_dir ← canonicalize(entry.path)
file_path ← base_dir / "SKILL.md"
skill ← Skill { name, description, file_path, base_dir, source: source_label }
skill_map[name] ← skill // later dirs OVERRIDE earlier on name collision
END FOR
END FOR
skills ← sort(skill_map.values(), by: skill.name)
RETURN Ok(SkillSet { skills })
END FUNCTION
FUNCTION parse_frontmatter(content: String) -> Result<(name, description), SkillError>
// Content must start with "---"
IF NOT content.trim_start().starts_with("---") THEN
RETURN Err(InvalidFrontmatter)
END IF
// Find closing "---"
yaml_block ← content between first "---" and next "\n---"
IF no closing delimiter THEN
RETURN Err(InvalidFrontmatter)
END IF
name ← ""
description ← ""
FOR EACH line IN yaml_block.lines()
IF line.starts_with("name:") THEN
name ← unquote(line.after("name:").trim())
ELSE IF line.starts_with("description:") THEN
description ← unquote(line.after("description:").trim())
END IF
// All other YAML fields silently ignored
END FOR
IF name is empty THEN RETURN Err(MissingField("name")) END IF
IF description is empty THEN RETURN Err(MissingField("description")) END IF
RETURN Ok((name, description))
// unquote(): strips surrounding single or double quotes if present
END FUNCTION
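The line-based frontmatter parser above can be sketched in Rust without a YAML dependency. One assumption in this version: `unquote` strips quotes only when the opening and closing quote characters match.

```rust
#[derive(Debug, PartialEq)]
enum SkillError {
    InvalidFrontmatter,
    MissingField(&'static str),
}

// Strip one pair of matching surrounding quotes, if present.
fn unquote(s: &str) -> &str {
    let b = s.as_bytes();
    if b.len() >= 2 && (b[0] == b'"' || b[0] == b'\'') && b[b.len() - 1] == b[0] {
        &s[1..s.len() - 1]
    } else {
        s
    }
}

fn parse_frontmatter(content: &str) -> Result<(String, String), SkillError> {
    // Content must open with "---" and contain a closing "\n---".
    let rest = content
        .trim_start()
        .strip_prefix("---")
        .ok_or(SkillError::InvalidFrontmatter)?;
    let end = rest.find("\n---").ok_or(SkillError::InvalidFrontmatter)?;
    let yaml_block = &rest[..end];

    let (mut name, mut description) = ("", "");
    for line in yaml_block.lines() {
        if let Some(v) = line.strip_prefix("name:") {
            name = unquote(v.trim());
        } else if let Some(v) = line.strip_prefix("description:") {
            description = unquote(v.trim());
        }
        // All other fields are ignored.
    }
    if name.is_empty() {
        return Err(SkillError::MissingField("name"));
    }
    if description.is_empty() {
        return Err(SkillError::MissingField("description"));
    }
    Ok((name.to_string(), description.to_string()))
}
```

Because only lines that start with `name:` or `description:` are recognized, indented keys and multi-line YAML values are not picked up by this approach.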
ListFilesTool::execute (src/tools/list.rs)
Purpose: List files in a directory, with optional glob filtering and depth limit.
FUNCTION ListFilesTool::execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>
path ← params["path"] as String // optional; default: current directory
pattern ← params["pattern"] as String // optional glob filter, e.g. "*.rs"
max_depth ← params["max_depth"] as usize // optional; default: 3
IF ctx.cancel.is_cancelled THEN RETURN Err(Cancelled) END IF
// Build `find` command
cmd ← "find {path} -maxdepth {max_depth} -type f"
IF pattern defined THEN cmd += " -name '{pattern}'" END IF
// Excluded paths (prepended to command):
// -not -path "*/target/*"
// -not -path "*/.git/*"
// -not -path "*/node_modules/*"
SELECT {
ctx.cancel.cancelled() → RETURN Err(Cancelled)
sleep(self.timeout) → RETURN Err(Failed("List timed out"))
run(cmd) → output
}
lines ← output.stdout.split_lines()
truncated ← false
IF lines.count() > self.max_results THEN
lines ← lines[0..self.max_results]
truncated ← true
END IF
text ← lines.join("\n")
IF truncated THEN
text += "\n... (truncated at {self.max_results} results)"
END IF
RETURN Ok(ToolResult {
content: [Text(text)],
details: { total: lines.count(), truncated }
})
END FUNCTION
Defaults: max_results = 200, timeout = 10s
SearchTool::execute (src/tools/search.rs)
Purpose: Search file contents using regex via ripgrep (preferred) or grep (fallback).
FUNCTION SearchTool::execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>
pattern ← params["pattern"] as String // required; regex
path ← params["path"] as String // optional; default: self.root or cwd
include ← params["include"] as String // optional file glob, e.g. "*.rs"
case_sensitive ← params["case_sensitive"] as bool // optional; default: false
IF ctx.cancel.is_cancelled THEN RETURN Err(Cancelled) END IF
// Prefer ripgrep (rg) if available, fall back to grep
IF rg_available() THEN
cmd ← ["rg", "--line-number", "--no-heading",
"--max-count={self.max_results}"]
IF NOT case_sensitive THEN cmd += ["--ignore-case"] END IF
IF include defined THEN cmd += ["--glob={include}"] END IF
cmd += [pattern, path]
ELSE
cmd ← ["grep", "-r", "-n", "-m{self.max_results}"]
IF NOT case_sensitive THEN cmd += ["-i"] END IF
IF include defined THEN cmd += ["--include={include}"] END IF
cmd += [pattern, path]
END IF
SELECT {
ctx.cancel.cancelled() → RETURN Err(Cancelled)
sleep(self.timeout) → RETURN Err(Failed("Search timed out"))
run(cmd) → (exit_code, stdout, stderr)
}
// Exit code 1 = no matches found (not an error)
IF exit_code == 1 AND stderr is empty THEN
stdout ← ""
END IF
// Exit code 2+ or non-empty stderr = actual failure
IF exit_code >= 2 OR (exit_code != 0 AND stderr non-empty) THEN
RETURN Err(Failed(stderr))
END IF
lines ← stdout.split_lines()
match_count ← lines.count()
text ← stdout
IF match_count >= self.max_results THEN
text += "\n... (truncated at {self.max_results} matches)"
END IF
RETURN Ok(ToolResult {
content: [Text(text)],
details: { matches: match_count }
})
END FUNCTION
Defaults: max_results = 50, timeout = 30s
Output format: {file}:{line_number}:{matched_line}
McpClient::initialize (src/mcp/)
Purpose: Perform the 3-step MCP handshake to establish a session with a tool server.
FUNCTION McpClient::connect_stdio(
command: String,
args: Vec<String>,
env: Option<Map<String,String>>
) -> Result<McpClient, McpError>
// Spawn child process
process ← spawn_process(command, args, env,
stdin=piped, stdout=piped, stderr=inherit)
// McpError::Transport on spawn failure
transport ← StdioTransport { process }
client ← McpClient { transport: Arc(Mutex(transport)), server_info: None }
AWAIT client.initialize()
RETURN Ok(client)
END FUNCTION
FUNCTION McpClient::initialize() -> Result<ServerInfo, McpError>
// Step 1: send initialize
result ← AWAIT self.send_request("initialize", {
protocolVersion: "2024-11-05",
capabilities: {},
clientInfo: { name: "phi-core", version: CARGO_PKG_VERSION }
})
// Deserialize result as InitializeResult { protocolVersion, capabilities, serverInfo }
self.server_info ← Some(result.serverInfo)
// Step 2: send notifications/initialized (no params)
AWAIT self.send_request("notifications/initialized", None)
// Server may ignore the response id for this notification
RETURN Ok(result.serverInfo)
END FUNCTION
FUNCTION McpClient::send_request(method: String, params: Option<Value>) -> Result<Value, McpError>
request ← JsonRpcRequest {
jsonrpc: "2.0",
id: ATOMIC_COUNTER.fetch_add(1), // monotonically increasing from 1
method,
params
}
response ← AWAIT self.transport.send(request)
IF response.error is Some THEN
RETURN Err(JsonRpc { code: error.code, message: error.message })
END IF
IF response.result is None THEN
RETURN Err(Protocol("Empty result"))
END IF
RETURN Ok(response.result)
END FUNCTION
FUNCTION McpClient::list_tools() -> Result<Vec<McpToolInfo>, McpError>
result ← AWAIT self.send_request("tools/list", {})
RETURN deserialize result.tools as Vec<McpToolInfo>
END FUNCTION
FUNCTION McpClient::call_tool(name: String, arguments: Value) -> Result<McpToolCallResult, McpError>
result ← AWAIT self.send_request("tools/call", { name, arguments })
RETURN deserialize result as McpToolCallResult
END FUNCTION
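The id counter and the error mapping in send_request can be sketched in plain Rust. This is an illustrative sketch under simplifying assumptions: `JsonRpcResponse` here is a hypothetical stand-in that keeps `result` as a raw string, and the error variants are named after those in the pseudocode rather than copied from the crate.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Error variants named after those in the pseudocode above.
#[derive(Debug, PartialEq)]
enum McpError {
    JsonRpc { code: i64, message: String },
    Protocol(String),
}

/// Simplified response: `result` is a raw JSON string for illustration only.
struct JsonRpcResponse {
    result: Option<String>,
    error: Option<(i64, String)>,
}

/// Request ids start at 1 and increase monotonically.
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

fn next_request_id() -> u64 {
    // fetch_add returns the previous value, so the first id handed out is 1.
    NEXT_ID.fetch_add(1, Ordering::Relaxed)
}

/// Map a response onto Result in the same order as send_request:
/// an error object wins, then a missing result is a protocol error.
fn interpret_response(resp: JsonRpcResponse) -> Result<String, McpError> {
    if let Some((code, message)) = resp.error {
        return Err(McpError::JsonRpc { code, message });
    }
    resp.result
        .ok_or_else(|| McpError::Protocol("Empty result".to_string()))
}
```

One point worth keeping in mind when reading the handshake: in JSON-RPC 2.0 a notification is a request without an id, and the server sends no response to it, so a transport has to send notifications/initialized without waiting for a reply.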
For pseudocode conventions, see the README.
OpenApiToolAdapter::execute (src/openapi/)
Purpose: Execute a single OpenAPI operation as an HTTP request.
FUNCTION OpenApiToolAdapter::execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>
// Normalize params: null → {}; non-object → error
IF params is null THEN params ← {} END IF
IF params is NOT object THEN
RETURN Ok(ToolResult { content: [Text("Error: params must be an object")] })
END IF
// ── Step 1: Substitute path parameters ────────────────────────────────────
url_path ← self.info.path // e.g. "/users/{userId}/posts/{postId}"
FOR EACH param_name IN self.info.path_params
value ← params[param_name]
IF value is missing THEN
RETURN Ok(ToolResult { content: [Text("Error: missing required path param '{param_name}'")] })
END IF
encoded ← percent_encode_rfc3986(value.to_string())
url_path ← replace(url_path, "{" + param_name + "}", encoded)
END FOR
// ── Step 2: Build base URL ─────────────────────────────────────────────────
url ← self.base_url + url_path
// ── Step 3: Build HTTP request ────────────────────────────────────────────
method ← parse_http_method(self.info.method) // GET, POST, PUT, etc.
request ← self.client.request(method, url)
// Query parameters
FOR EACH param_name IN self.info.query_params
IF params[param_name] defined THEN
request ← request.query(param_name, params[param_name].to_string())
END IF
END FOR
// Header parameters
FOR EACH param_name IN self.info.header_params
IF params[param_name] defined THEN
request ← request.header(param_name, params[param_name].to_string())
END IF
END FOR
// Authentication
MATCH self.config.auth
CASE None → (no-op)
CASE Bearer(token) → request ← request.bearer_auth(token)
CASE ApiKey{header,value} → request ← request.header(header, value)
END MATCH
// Custom headers
FOR EACH (key, value) IN self.config.custom_headers
request ← request.header(key, value)
END FOR
// Request body (application/json only)
IF self.info.has_body THEN
body ← params["body"] OR params["_request_body"]
IF body defined THEN
request ← request.json(body)
Developer Conceptual Hierarchy
A developer-facing map of every concept in phi-core, centered on the Agent entity. Designed to enable a future UI layer. Every concept is tagged:
[EXISTS] = in code now | [PLANNED] = defined but not implemented | [CONCEPTUAL] = idea only
The Agent: Three Attributes + Skills
┌──────────────────┐
│ AGENT │
│ agent_id [E] │
└───────┬──────────┘
│
┌──────────────┬───────────┼───────────┬──────────────┐
│ │ │ │ │
┌──────▼──────┐ ┌─────▼─────┐ ┌──▼───┐ ┌────▼─────┐ ┌──────▼──────┐
│ Profile │ │ Sessions │ │Skills│ │ MCP │ │Introspection│
│ [E] │ │ [E] │ │ [E] │ │ [E] │ │ [C] │
│ personality │ │ (Tasks) │ │ │ │connectors│ │ memory │
└──────┬──────┘ └─────┬─────┘ └──────┘ └──────────┘ └──────┬──────┘
│ │ │
│ ┌────▼─────┐ ┌────▼─────┐
│ │ Session │ │ Memory │
│ │ [E] │ │ [C] │
│ └────┬─────┘ ├──────────┤
│ │ │Episodic │
│ ┌────▼─────┐ │Semantic │
│ │ Loop │ │Procedural│
│ │ [E] │ └──────────┘
│ └────┬─────┘
│ │
│ ┌────▼─────┐
│ │ Turn │
│ │ [E] │
│ └────┬─────┘
│ │
│ ┌─────────┼──────────┐
│ │ │ │
│ ┌─▼──┐ ┌──▼───┐ ┌──▼──┐
│ │Msg │ │ Tool │ │Delta│
│ │[E] │ │ [E] │ │ [E] │
│ └────┘ └──────┘ └─────┘
│
┌──────▼──────────────────────────────────────┐
│ INDEPENDENT ENTITIES │
├─────────────────────────────────────────────┤
│ Provider [E] Event [E] │
│ Message [E] Compaction [E] │
│ Configuration [E] │
│ SystemPromptStrategy [E] │
│ ContextTranslationStrategy [E] │
└─────────────────────────────────────────────┘
[E] = EXISTS [P] = PLANNED [C] = CONCEPTUAL
Model/Provider Fallback Hierarchy
Loop model (LoopConfigSnapshot) → Agent default model
[EXISTS] [EXISTS]
Each loop captures its model config in LoopConfigSnapshot at AgentStart time. Session-level model override has been removed; the fallback is directly to the Agent's default model.
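The two-level fallback can be expressed in a few lines. The sketch below uses a hypothetical `ModelRef` type in place of the real model configuration; only the resolution order (loop first, then agent default) comes from the text above.

```rust
/// Hypothetical stand-in for the model identity carried by a loop or an agent.
#[derive(Debug, Clone, PartialEq)]
struct ModelRef {
    provider: String,
    model: String,
}

/// Resolve the model for a loop: the loop's own model if set,
/// otherwise the agent default. There is no session-level step.
fn resolve_model<'a>(
    loop_model: Option<&'a ModelRef>,
    agent_default: &'a ModelRef,
) -> &'a ModelRef {
    loop_model.unwrap_or(agent_default)
}
```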
Entity Quick Reference
| Entity | Code Location | Status | Deep Dive |
|---|---|---|---|
| Agent | agents/basic_agent.rs | [EXISTS] | agent.md |
| Agent Profile | agents/profile.rs | [EXISTS] | agent.md |
| Session | session/model.rs | [EXISTS] | session.md |
| Loop (LoopRecord) | session/model.rs | [EXISTS] | loop.md |
| Turn | session/model.rs + event-pair | [EXISTS] events; [EXISTS] struct | turn.md |
| Message | types/content.rs | [EXISTS] | message.md |
| AgentMessage | types/agent_message.rs | [EXISTS] | message.md |
| Tool | types/tool.rs | [EXISTS] | tool.md |
| Provider | provider/model.rs | [EXISTS] | provider.md |
| Event | types/event.rs | [EXISTS] | event.md |
| Compaction | context/compaction.rs | [EXISTS] | compaction.md |
| Configuration | context/config.rs + agent_loop/config.rs | [EXISTS] | config.md |
| SystemPromptStrategy | trait + implementations | [EXISTS] | agent.md |
| ContextTranslationStrategy | provider/context_translation.rs | [EXISTS] | provider.md |
| Introspection / Memory | not in code | [CONCEPTUAL] | agent.md |
| Permissions | not in code | [CONCEPTUAL] | agent.md |
Callback Ownership
Callbacks live on the entity they observe:
| Callback | Owner | Status |
|---|---|---|
| before_task / after_task | Session (SessionRecorderConfig) | [EXISTS] |
| before_loop / after_loop | Loop | [EXISTS] |
| on_error | Loop | [EXISTS] |
| before_turn / after_turn | Turn | [EXISTS] |
| before_tool_execution / after_tool_execution | Tool | [EXISTS] |
| before_tool_execution_update / after_tool_execution_update | Tool | [EXISTS] |
| before_compaction_start / after_compaction_end | Compaction | [EXISTS] |
Conceptual vs Code: Key Misalignments
These are places where the conceptual model has differed from the code, with their current status. Rows still marked [CONCEPTUAL] represent future refactoring opportunities:
| Concept | Status | Notes |
|---|---|---|
| Agent Profile | [EXISTS] ✓ | AgentProfile struct in agents/profile.rs with profile_id, name, description, system_prompt, etc. |
| Session thinking_level | Removed | Session-level thinking_level removed. Now captured per-loop in LoopConfigSnapshot. AgentProfile::resolve_thinking_level() removed. |
| Session temperature | Removed | Session-level temperature removed. Now captured per-loop in LoopConfigSnapshot. AgentProfile::resolve_temperature() removed. |
| Session model override | Removed | Session-level model_config removed. Model config is now captured per-loop in LoopConfigSnapshot. |
| Session scope | [EXISTS] ✓ | SessionScope::Ephemeral / Persistent (G7). |
| SystemPromptStrategy | [EXISTS] ✓ | Trait + 3-entity model (strategy template → prompt instance → agent ref). file: and {{...}} resolution. |
| Compaction config consolidation | [EXISTS] ✓ | Strategies consolidated into CompactionConfig (G5). |
| Session-level callbacks (before_task / after_task) | [EXISTS] ✓ | On SessionRecorderConfig (G2). |
| ContextTranslationStrategy | [EXISTS] ✓ | Trait + DefaultContextTranslation in provider/context_translation.rs (G8). |
| Introspection | [CONCEPTUAL] | Memory extraction with 3 categories (episodic, semantic, procedural). Not in code. |
| Permissions | [CONCEPTUAL] | Include/exclude rules on Agent. Not in code. |
Core Gaps
Prioritized list of features that belong in phi-core (per First Principles). Each gap was derived from [CONCEPTUAL] items in the entity specs; the tables below record which gaps have since been implemented or removed.
Priority 1 — Small, High-Value — ALL IMPLEMENTED ✓
| ID | Feature | Status |
|---|---|---|
| G1 | Compaction callbacks (before_compaction_start / after_compaction_end) | [EXISTS] — On AgentLoopConfig. |
| G3 | Agent Profile struct | [EXISTS] — AgentProfile in agents/profile.rs. |
| G4 | Session model override | Removed — Session.model_config removed. Model config now captured per-loop in LoopConfigSnapshot. |
| G7 | Session scope | [EXISTS] (SessionScope::Ephemeral / Persistent) |
| G9 | Session task attributes | Removed — Session.thinking_level, Session.temperature, and Session.model_config moved to per-loop LoopConfigSnapshot. AgentProfile::resolve_thinking_level() and resolve_temperature() removed. |
Priority 2 — Medium Refactors
| ID | Feature | Why Core | Effort | Spec Ref |
|---|---|---|---|---|
| G5 | Compaction config consolidation [EXISTS] | Compaction strategies (in_memory_strategy, block_strategy) are now fields on CompactionConfig, consolidating what was previously split across ContextConfig + AgentLoopConfig. | ~100 LOC | config.md, misalignment table above |
| G2 | Session-level callbacks (before_task / after_task) [EXISTS] | before_task and after_task callbacks now exist on SessionRecorderConfig. before_task fires on the first AgentStart with a new session_id; after_task fires on flush(). | ~80 LOC | callback ownership table above |
| G6 | SystemPromptStrategy trait [EXISTS] | The SystemPromptStrategy trait now exists with a compose(context) -> String method. Supports a 3-entity model: strategy template, prompt instance, profile ref. Full 5-layer composition is a future enhancement. | ~100 LOC | agent.md |
Priority 3 — Needs Design
| ID | Feature | Why Core | Effort | Spec Ref |
|---|---|---|---|---|
| G8 | ContextTranslationStrategy [EXISTS] | ContextTranslationStrategy trait with DefaultContextTranslation. Read-only translation for cross-provider compatibility. | ~150 LOC | provider.md, misalignment table above |
| G10 | Tool Registry [EXISTS] | ToolRegistry maps config tool names to instances. 6 built-in tools registered. | ~200 LOC | config.md |
External — Not Core
These are explicitly not core gaps. They can be built on top of phi-core using existing extension points:
| Item | Extension Point |
|---|---|
| Introspection / Memory | External crate using G1 compaction callbacks + session data |
| Permissions | InputFilter + BeforeToolExecutionFn |
| Multi-agent orchestration | agent_loop / agent_loop_continue / agent_loop_parallel |
| Model fallback chains | Custom StreamProvider wrapping multiple providers |
| Observability backends | AgentEvent stream |
| Domain tools | AgentTool trait |
Deep Dive Files
Each entity has its own deep dive document in this folder:
- agent.md — Agent Profile, Capabilities, Skills, MCP, Permissions, Introspection
- session.md — Session (Task): identity, scope, formation, model, loops, input filters
- loop.md — Loop (Iteration): model, turns, compaction, parallel groups, callbacks
- turn.md — Turn (Step): trigger, messages, tool executions, streaming
- message.md — Content, Message, AgentMessage, LlmMessage, ExtensionMessage
- tool.md — AgentTool trait, ToolContext, execution strategies, callbacks
- provider.md — ModelConfig, ApiProtocol, registry, ContextTranslationStrategy
- event.md — AgentEvent lifecycle, StreamDelta, event flow
- compaction.md — CompactionBlock, strategies, scope, callbacks
- config.md — ContextConfig, ExecutionLimits, CacheConfig, AgentLoopConfig, hooks
Agent
The central entity in the system. An Agent combines a given identity (Agent Profile), capabilities (tools, skills, MCP connections), permissions, and introspection into a single runtime unit that executes Sessions (tasks).
The Agent trait defines the runtime interface (prompting, state access, control, steering queues). BasicAgent is the default in-memory implementation that owns conversation state, tools, and provider configuration.
Concept Overview
Agent
├── HEADER
│ ├── agent_id [EXISTS] — UUID, immutable
│ ├── Agent Profile [EXISTS] — AgentProfile struct (src/agents/profile.rs)
│ │ ├── profile_id [EXISTS] — distinct from agent_id; shareable across agents
│ │ ├── SystemPromptStrategy [EXISTS] — how system prompt is composed
│ │ │ └── static system_prompt string [EXISTS], file: prefix [EXISTS], {{...}} 3-entity chain [EXISTS]
│ │ ├── Agent Name [EXISTS] — Option<String> on AgentProfile
│ │ └── Agent Description [EXISTS] — Option<String> on AgentProfile
│ ├── Limits (Agent-level)
│ │ ├── context_config [EXISTS]
│ │ ├── execution_limits [EXISTS]
│ │ └── retry_config [EXISTS]
│ └── Default Model [EXISTS — BasicAgent.model_config]
│ └── Fallback when the Loop doesn't specify its own (no Session-level override)
│
├── TAB: Sessions (Tasks) [EXISTS]
│ └── (drill-down: Session → Loop → Turn)
├── TAB: Capabilities [EXISTS as Vec<Arc<dyn AgentTool>>]
│ ├── Tools [EXISTS] ├── Sub-agents [EXISTS]
│ ├── OpenAPI tools [EXISTS] └── Built-in tools [EXISTS]
├── TAB: Skills [EXISTS as SkillSet; CONCEPTUAL as browsable tab]
├── TAB: MCP Connections [EXISTS]
├── TAB: Permissions [CONCEPTUAL]
│ ├── Include rules [CONCEPTUAL] └── Exclude rules [CONCEPTUAL]
├── TAB: Introspection [CONCEPTUAL] — mandatory when scope = Persistent
│ ├── Episodic Memory [CONCEPTUAL] ├── Semantic Memory [CONCEPTUAL]
│ ├── Procedural Memory [CONCEPTUAL]
│ ├── Identity Shaping └── Knowledge Base
│
└── STATE (runtime) [EXISTS]
├── session_id [EXISTS] ├── messages [EXISTS]
├── queues [EXISTS] └── counters [EXISTS]
HEADER
| Field | Type | Status | Description |
|---|---|---|---|
| agent_id | String (UUID v4) | [EXISTS] | Stable identifier assigned at construction. Included in every AgentStart event. Immutable for the lifetime of the agent instance. |
| Agent Profile | AgentProfile | [EXISTS] | Reusable identity blueprint (src/agents/profile.rs). Separate struct from Agent — multiple agents can share one profile via config instances. Fields: profile_id, name, description, system_prompt, thinking_level, temperature, max_tokens, config_id, skills, workspace. |
| profile_id | String | [EXISTS] | Distinct from agent_id. Allows profile sharing across agents. Auto-generated UUID if not set. |
| SystemPromptStrategy | trait | [EXISTS] | Defines block structure for multi-block prompt composition (src/agents/system_prompt.rs). Uses a 3-entity model: strategy template (block definitions with order + max_length), prompt instance (content filling blocks, supports file: paths), agent reference (via {{system_prompt.name}}). See configuration guide. |
| system_prompt | Option<String> | [EXISTS] | System prompt string. Lives on AgentProfile.system_prompt. Supports inline text, file:path (relative to workspace), or {{...}} reference to a prompt instance. Resolution: agent > profile instance > base profile. |
| Agent Name | Option<String> | [EXISTS] | Human-readable name. Lives on AgentProfile.name. |
| Agent Description | Option<String> | [EXISTS] | Description of the agent's purpose. Lives on AgentProfile.description. |
| workspace | Option<PathBuf> | [EXISTS] | Working directory. Lives on AgentProfile as blueprint default; BasicAgent stores an agent-level override. Resolution: agent workspace > profile workspace > current directory. |
| model_config | ModelConfig | [EXISTS] | Default model for this agent. Used when a Loop doesn't specify its own (there is no Session-level override). Contains: model id, API key, base URL, API protocol, cost rates, context window size. |
| context_config | Option<ContextConfig> | [EXISTS] | Token budget and compaction policy. Agent-level limit. |
| execution_limits | Option<ExecutionLimits> | [EXISTS] | Max turns (50), max tokens (1M), max duration (10 min), cost tracking. Agent-level limit. |
| retry_config | RetryConfig | [EXISTS] | Retry policy for provider errors. Exponential backoff with jitter. Agent-level. |
| cache_config | CacheConfig | [EXISTS] | Prompt caching behavior (enabled/disabled, strategy: Auto/Disabled/Manual). |
| tool_execution | ToolExecutionStrategy | [EXISTS] | How tool calls are executed: Parallel (default), Sequential, Batched. |
| thinking_level | ThinkingLevel | [EXISTS] on Agent | Controls depth of model reasoning (Off/Minimal/Low/Medium/High). Agent default; per-loop values tracked in LoopConfigSnapshot. |
| temperature | Option<f32> | [EXISTS] on Agent | Sampling temperature. Agent default; per-loop values tracked in LoopConfigSnapshot. |
| max_tokens | Option<u32> | [EXISTS] | Max output tokens per response. None = use model default. |
| provider_override | Option<Arc<dyn StreamProvider>> | [EXISTS] | Escape hatch for test injection or custom providers. Bypasses ProviderRegistry dispatch. |
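The workspace resolution order in the table (agent workspace > profile workspace > current directory) maps directly onto `Option::or`. The function below is a hypothetical sketch of that order, not the crate's own code.

```rust
use std::path::PathBuf;

/// Pick the working directory: agent override first, then the profile
/// default, then the process's current directory.
fn resolve_workspace(
    agent_ws: Option<PathBuf>,
    profile_ws: Option<PathBuf>,
) -> std::io::Result<PathBuf> {
    match agent_ws.or(profile_ws) {
        Some(dir) => Ok(dir),
        // current_dir can fail (for example if the directory was removed),
        // so the error is passed to the caller.
        None => std::env::current_dir(),
    }
}
```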
TAB: Sessions (Tasks) [EXISTS]
Sessions are the actions an agent performs. Each Session contains Loops (iterations) which contain Turns (steps). See session.md.
| Field | Type | Status | Description |
|---|---|---|---|
| session_id | String (UUID v4) | [EXISTS] | Current session identifier. Rotatable via check_and_rotate. |
TAB: Capabilities [EXISTS]
Registered tools available to the agent. Stored as Vec<Arc<dyn AgentTool>>.
| Capability | Status | Description |
|---|---|---|
| Tools | [EXISTS] | Registered AgentTool implementations. Added via with_tools(). |
| Sub-agents | [EXISTS] | Via SubAgentTool. Spawns child agent loops in separate sessions. |
| OpenAPI tools | [EXISTS] | Auto-generated from OpenAPI 3.0 spec via OpenApiToolAdapter. Feature-gated (openapi). |
| Built-in tools | [EXISTS] | Bash, File, Edit, Grep, ListDir, ReadFile. |
TAB: Skills [EXISTS] as SkillSet; [CONCEPTUAL] as browsable tab
Declarative capabilities loaded from SKILL.md files with YAML frontmatter.
| Field | Status | Description |
|---|---|---|
| SkillSet | [EXISTS] | Loaded via with_skills(). Discovery and loading from filesystem. |
| Skill discovery | [EXISTS] | Finds <name>/SKILL.md files. |
| Skill browsing / editing | [CONCEPTUAL] | Interactive skill management in a UI. |
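To make the SKILL.md shape concrete, here is a small sketch of splitting a file into frontmatter and body. It assumes the frontmatter is delimited by `---` lines and only handles flat `key: value` pairs; it is not a YAML parser, and the `name` and `description` keys in the usage below are illustrative assumptions rather than a documented schema.

```rust
/// Split a SKILL.md document into (frontmatter, body).
/// Assumes the file starts with a `---` line and the frontmatter ends
/// at the next line that contains only `---`.
fn split_frontmatter(doc: &str) -> Option<(&str, &str)> {
    let rest = doc.strip_prefix("---\n")?;
    rest.split_once("\n---\n")
}

/// Look up a flat `key: value` entry in the frontmatter text.
/// A stand-in for real YAML parsing, for illustration only.
fn frontmatter_field<'a>(front: &'a str, key: &str) -> Option<&'a str> {
    front.lines().find_map(|line| {
        let (k, v) = line.split_once(':')?;
        (k.trim() == key).then(|| v.trim())
    })
}
```

For a file beginning with `---`, `name: rust-review`, `description: Review Rust code`, `---`, the first function returns the two key lines as frontmatter and everything after the closing delimiter as the body.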
TAB: MCP Connections [EXISTS]
Model Context Protocol integration for external tool servers.
| Field | Status | Description |
|---|---|---|
| MCP server connections | [EXISTS] | Stdio and HTTP transports via McpClient / McpTransport. |
| Discovered tools | [EXISTS] | Auto-registered from MCP server via McpToolAdapter. Transparent to agent loop. |
| MCP connection management | [CONCEPTUAL] | Browsable tab for managing connections in a UI. |
TAB: Permissions [CONCEPTUAL]
Access control for agent actions. Not yet implemented.
| Field | Status | Description |
|---|---|---|
| Include rules | [CONCEPTUAL] | Whitelist of allowed actions. |
| Exclude rules | [CONCEPTUAL] | Blacklist of denied actions. |
TAB: Introspection [CONCEPTUAL]
Memory extraction from session logs and identity. Mandatory when Session scope is Persistent.
Memory Categories
| Category | Status | Description |
|---|---|---|
| Episodic Memory | [CONCEPTUAL] | What happened in past sessions (events, conversations). |
| Semantic Memory | [CONCEPTUAL] | Distilled knowledge (facts, concepts, relationships). |
| Procedural Memory | [CONCEPTUAL] | Successful strategies learned over time (patterns, playbooks). |
Memory Destinations
| Destination | Status | Description |
|---|---|---|
| Identity Shaping | [CONCEPTUAL] | Memory feeds back to evolve the Agent Profile. |
| Knowledge Base | [CONCEPTUAL] | Searchable database for future use. |
Agent State (Runtime) [EXISTS]
Mutable state that changes during execution.
| Field | Type | Status | Description |
|---|---|---|---|
| session_id | String | [EXISTS] | Current session. Rotatable via check_and_rotate on inactivity timeout. |
| messages | Vec<AgentMessage> | [EXISTS] | Full conversation history (LLM + Extension messages). |
| steering_queue | Arc<Mutex<Vec<AgentMessage>>> | [EXISTS] | Mid-run interrupt messages. Drained per steering_mode (OneAtATime / All). |
| follow_up_queue | Arc<Mutex<Vec<AgentMessage>>> | [EXISTS] | Post-turn follow-up messages. Drained per follow_up_mode. |
| loop_counters | HashMap<String, usize> | [EXISTS] | Per-(session, config) monotonic counters for loop ID generation. |
| last_loop_id | Option<String> | [EXISTS] | Most recently started loop. Used for parent_loop_id in continuations. |
| last_active_at | Option<DateTime<Utc>> | [EXISTS] | Timestamp of last prompt call. Used by check_and_rotate for inactivity detection. |
| cancel | Option<CancellationToken> | [EXISTS] | Abort handle. Some during streaming, None otherwise. |
| is_streaming | bool | [EXISTS] | Guard against concurrent prompt() calls. |
| session | Option<Session> | [EXISTS] | Optional session for block-based compaction. |
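The two drain modes for the steering and follow-up queues can be sketched as below. `String` stands in for AgentMessage so the example stays self-contained, and taking the oldest message first in OneAtATime mode is an assumption, not something the table states.

```rust
use std::sync::{Arc, Mutex};

#[derive(Clone, Copy)]
enum QueueMode {
    OneAtATime,
    All,
}

/// Drain queued messages according to the mode.
/// `String` is a placeholder for AgentMessage.
fn drain_queue(queue: &Arc<Mutex<Vec<String>>>, mode: QueueMode) -> Vec<String> {
    let mut guard = queue.lock().expect("queue mutex poisoned");
    match mode {
        // Take only the oldest message; the rest wait for the next drain.
        QueueMode::OneAtATime if !guard.is_empty() => vec![guard.remove(0)],
        QueueMode::OneAtATime => Vec::new(),
        // Take everything, leaving an empty queue behind.
        QueueMode::All => std::mem::take(&mut *guard),
    }
}
```

Holding the lock only for the duration of the drain keeps producers (callers that steer the agent mid-run) from blocking on a long-running turn.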
Code Reference
| File | What it contains |
|---|---|
| src/agents/agent.rs | Agent trait: runtime interface (~40 methods: prompting, state, control, steering queues, hook setters). QueueMode enum. |
| src/agents/basic_agent.rs | BasicAgent struct: default in-memory implementation. Builder pattern. All fields listed above. |
| src/agents/profile.rs | AgentProfile struct: reusable identity blueprint with profile_id, name, description, system_prompt, thinking_level, temperature, max_tokens, config_id, skills, workspace. |
| src/agents/system_prompt.rs | SystemPromptStrategy trait, SystemPrompt struct, PromptBlockDef, built-in strategies (Custom, Agent, Minimal). Compose logic with file: resolution. |
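The three forms a system_prompt string can take (inline text, a file: path, or a {{...}} reference) can be told apart with simple prefix checks. The enum and function below are hypothetical names for illustration; how the crate actually resolves each form is not shown here.

```rust
/// The three forms a system_prompt value can take.
#[derive(Debug, PartialEq)]
enum PromptSource<'a> {
    /// Literal prompt text.
    Inline(&'a str),
    /// Path after the `file:` prefix, relative to the workspace.
    File(&'a str),
    /// Name inside `{{...}}`, referring to a prompt instance.
    Reference(&'a str),
}

fn classify_prompt(raw: &str) -> PromptSource<'_> {
    let trimmed = raw.trim();
    if let Some(path) = trimmed.strip_prefix("file:") {
        return PromptSource::File(path.trim());
    }
    if let Some(inner) = trimmed
        .strip_prefix("{{")
        .and_then(|rest| rest.strip_suffix("}}"))
    {
        return PromptSource::Reference(inner.trim());
    }
    PromptSource::Inline(raw)
}
```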
Conceptual Notes
- Agent Profile now exists as a separate struct. AgentProfile (src/agents/profile.rs) holds profile_id, name, description, and system_prompt, among other fields, which enables profile sharing across agents.
- SystemPromptStrategy now exists as a trait with a compose(context) -> String method. It follows a 3-entity model: strategy template (the trait implementation), prompt instance (concrete prompt for a given context), profile ref (agent profile reference). Full 5-layer composition (base personality, task context, tool/skill index, memory context, turn-specific instructions) is future work. BasicAgent retains a static system_prompt string as a fallback.
- thinking_level and temperature are Agent-level defaults. Per-loop values are captured in LoopConfigSnapshot on each LoopRecord. AgentProfile::resolve_thinking_level() and resolve_temperature() have been removed; resolution is now direct from AgentLoopConfig.
- Introspection is the largest conceptual gap. It requires session log analysis, memory categorization (episodic/semantic/procedural), and feedback loops to Agent Profile evolution.
Session
A named container grouping all LoopRecords for one agent session. A Session represents a task the agent performs. It has identity, formation history, configuration, and contains an ordered sequence of Loops (iterations).
Sessions are created automatically by SessionRecorder when a new session_id first appears in an AgentStart event, or explicitly by the caller.
Concept Overview
Session [EXISTS]
├── HEADER
│ ├── session_id, agent_id [EXISTS]
│ ├── formation [EXISTS] — Explicit / FirstLoop / InactivityTimeout
│ ├── scope [EXISTS] — Ephemeral / Persistent (SessionScope enum)
│ ├── created_at, last_active_at [EXISTS]
│ ├── parent_spawn_ref [EXISTS] — cross-session link
│ ├── Task Name, Task Status [CONCEPTUAL]
│ └── Callbacks: before_task / after_task [EXISTS]
├── LINE ITEMS: Loops [EXISTS]
├── LINE ITEMS: Input Filters [EXISTS]
└── SUMMARY: total_usage(), loop_chain_to() [EXISTS]
HEADER
| Field | Type | Status | Description |
|---|---|---|---|
| session_id | String | [EXISTS] | Stable identifier. Matches AgentStart.session_id. Generated as UUID v4 at BasicAgent::new(). |
| agent_id | String | [EXISTS] | The agent that owns this session. Taken from the first AgentStart event. |
| formation | SessionFormation | [EXISTS] | How the session was created. See Formation section below. |
| scope | SessionScope | [EXISTS] | Ephemeral (default, in-memory only) or Persistent (session logs retained). Declared via config [session] scope = "persistent". |
| created_at | DateTime<Utc> | [EXISTS] | Timestamp of the first AgentStart event for this session. |
| last_active_at | DateTime<Utc> | [EXISTS] | Updated each time a new loop opens (on AgentStart). Reflects when the last loop started, not when it last had activity. |
| parent_spawn_ref | Option<SpawnRef> | [EXISTS] | Cross-session link when this session was spawned as a sub-agent. Points back to parent session, loop, tool call. Inverse of LoopRecord.child_loop_refs. |
| Task Name | String | [CONCEPTUAL] | Human-readable label for the task this session represents. |
| Task Status | enum | [CONCEPTUAL] | Status of the task (e.g., Pending, Running, Completed, Failed). Derived from loop statuses but would be a first-class field. |
Formation [EXISTS]
How the session was initially created. Enum SessionFormation:
| Variant | Status | Description |
|---|---|---|
| Explicit { timestamp } | [EXISTS] | Created by direct construction (tests, tooling). SessionRecorder never sets this. |
| FirstLoop { timestamp } | [EXISTS] | Created automatically when a new session_id first appeared in an AgentStart event. |
| InactivityTimeout { threshold_secs, previous_session_id, timestamp } | [EXISTS] | New session opened because the agent was idle longer than the threshold. Requires prior session_id rotation via BasicAgent::check_and_rotate. |
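The inactivity rule behind InactivityTimeout can be illustrated with a small decision function. This is a sketch under assumptions: timestamps are plain Unix seconds instead of DateTime<Utc>, and `check_rotation` is a hypothetical free function, not the signature of BasicAgent::check_and_rotate.

```rust
/// Mirrors the SessionFormation variants listed above, with timestamps
/// simplified to Unix seconds to avoid external crates.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum SessionFormation {
    Explicit { timestamp: u64 },
    FirstLoop { timestamp: u64 },
    InactivityTimeout {
        threshold_secs: u64,
        previous_session_id: String,
        timestamp: u64,
    },
}

/// Decide whether a new session should be opened. Returns the formation
/// for the new session when the agent was idle longer than the threshold.
fn check_rotation(
    current_session_id: &str,
    last_active_at: Option<u64>,
    now: u64,
    threshold_secs: u64,
) -> Option<SessionFormation> {
    // No recorded activity yet means there is nothing to time out.
    let last = last_active_at?;
    if now.saturating_sub(last) > threshold_secs {
        Some(SessionFormation::InactivityTimeout {
            threshold_secs,
            previous_session_id: current_session_id.to_string(),
            timestamp: now,
        })
    } else {
        None
    }
}
```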
Callbacks [EXISTS]
Callbacks are configured on SessionRecorderConfig, not on the Session struct directly.
| Callback | Type | Status | Description |
|---|---|---|---|
| before_task | Option<BeforeTaskFn> | [EXISTS] | Fires on the first AgentStart event with a new session_id. Blank by default. |
| after_task | Option<AfterTaskFn> | [EXISTS] | Fires on flush(). Blank by default. |
LINE ITEMS: Loops (Iterations) [EXISTS]
Ordered list of all LoopRecords in this session, sorted by started_at.
| Field | Type | Status | Description |
|---|---|---|---|
| loops | Vec<LoopRecord> | [EXISTS] | All completed and in-progress loop records. See loop.md. |
Loop Tree Structure
The tree is implicit via parent_loop_id / children_loop_ids links:
- Root loops -- parent_loop_id is None (or points to a loop in a different session for sub-agent roots).
- Continuation chains -- parent_loop_id -> loop_id within the same session.
- Parallel branches -- siblings sharing the same parent_loop_id, each with parallel_group set.
- Sub-agent children -- in child_loop_refs on the parent loop (cross-session, not in loops vec).
LINE ITEMS: Input Filters [EXISTS]
Input filters validate user messages before the LLM is called. They are stored on AgentLoopConfig.input_filters, although conceptually they are a Session-level concern.
| Field | Type | Status | Description |
|---|---|---|---|
| input_filters | Vec<Arc<dyn InputFilter>> | [EXISTS] | Each filter returns Pass, Warn, or Reject for a given message. Reject aborts the loop before any LLM call and emits InputRejected. |
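The Pass / Warn / Reject behavior can be sketched as a short pipeline. The trait method name `check`, the string payloads, and the two example filters are hypothetical; only the three outcomes and the stop-on-Reject rule come from the table above.

```rust
use std::sync::Arc;

#[derive(Debug, PartialEq)]
enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

/// Hypothetical trait shape; the real InputFilter signature may differ.
trait InputFilter {
    fn check(&self, message: &str) -> FilterResult;
}

/// Run filters in order. Warnings are collected; the first Reject stops
/// evaluation so the caller can abort before any LLM call.
fn run_filters(filters: &[Arc<dyn InputFilter>], message: &str) -> Result<Vec<String>, String> {
    let mut warnings = Vec::new();
    for filter in filters {
        match filter.check(message) {
            FilterResult::Pass => {}
            FilterResult::Warn(w) => warnings.push(w),
            FilterResult::Reject(reason) => return Err(reason),
        }
    }
    Ok(warnings)
}

/// Example filter: reject messages longer than a byte limit.
struct MaxLen(usize);
impl InputFilter for MaxLen {
    fn check(&self, message: &str) -> FilterResult {
        if message.len() > self.0 {
            FilterResult::Reject(format!("message exceeds {} bytes", self.0))
        } else {
            FilterResult::Pass
        }
    }
}

/// Example filter: warn when a message mentions a password.
struct NoSecrets;
impl InputFilter for NoSecrets {
    fn check(&self, message: &str) -> FilterResult {
        if message.contains("password") {
            FilterResult::Warn("message mentions a password".to_string())
        } else {
            FilterResult::Pass
        }
    }
}
```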
SUMMARY Methods [EXISTS]
Methods on the Session struct for querying and aggregating.
| Method | Status | Description |
|---|---|---|
| total_usage() | [EXISTS] | Cumulative Usage across all loops. Sums input, output, reasoning, cache_read, cache_write, total_tokens. |
| loop_chain_to(target_loop_id) | [EXISTS] | Builds the linear chain of loop IDs from root to target by walking parent_loop_id links backward. Returns chronological order (root first). Handles parallel branches (only selected path) and reruns (only active ancestor chain). |
| root_loops() | [EXISTS] | Returns loops whose parent_loop_id is None or belongs to a different session. |
| children_of(loop_id) | [EXISTS] | Returns direct same-session children of a loop. |
| parallel_siblings(loop_id) | [EXISTS] | Returns all loops in the same parallel group. |
| get_loop(loop_id) | [EXISTS] | Look up a loop by ID. |
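The backward walk described for loop_chain_to can be sketched as follows. The `LoopRecord` here is a minimal stand-in with only two fields, and the `Option` return type is an assumption; the real method may report failures through SessionError instead.

```rust
use std::collections::HashMap;

/// Minimal stand-in for LoopRecord: only the fields needed for chain walking.
struct LoopRecord {
    loop_id: String,
    parent_loop_id: Option<String>,
}

/// Walk parent links backward from `target`, then reverse so the root
/// comes first. The walk stops at a parent that is not in this session
/// (for example a sub-agent root whose parent lives elsewhere).
fn loop_chain_to(loops: &[LoopRecord], target: &str) -> Option<Vec<String>> {
    let by_id: HashMap<&str, &LoopRecord> =
        loops.iter().map(|l| (l.loop_id.as_str(), l)).collect();
    let mut current = Some(*by_id.get(target)?);
    let mut chain = Vec::new();
    while let Some(rec) = current {
        // A chain longer than the number of loops means the data has a cycle.
        if chain.len() == loops.len() {
            return None;
        }
        chain.push(rec.loop_id.clone());
        current = rec
            .parent_loop_id
            .as_deref()
            .and_then(|parent| by_id.get(parent).copied());
    }
    chain.reverse();
    Some(chain)
}
```

Because only parent links are followed, parallel siblings of an ancestor never appear in the result; each target yields exactly one path back to its root.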
Code Reference
| File | What it contains |
|---|---|
| src/session/model.rs | Session struct, SessionFormation enum, SpawnRef struct, SessionError enum. All methods (total_usage, loop_chain_to, root_loops, children_of, parallel_siblings, get_loop). |
Conceptual Notes
- Session scope (Ephemeral vs Persistent) now exists as the SessionScope enum. Sessions are ephemeral by default. Persistent scope is what gates whether Introspection is required.
- Model/thinking/temperature per-loop -- These settings are no longer on Session. They are tracked per-loop via LoopConfigSnapshot on each LoopRecord (see loop.md). The fallback hierarchy is Loop -> Agent default.
- Task Name and Task Status would give sessions first-class task identity, enabling task dashboards and workflow tracking.
- before_task / after_task callbacks now exist on SessionRecorderConfig. before_task fires on the first AgentStart with a new session_id; after_task fires on flush(). This mirrors the existing before_loop/after_loop and before_turn/after_turn callback pattern at the Session level.
Loop
A complete record of one agent-loop execution, stored as LoopRecord. Loops are the iterations within a Session. Each Loop contains Turns (steps), tracks its model/provider configuration, accumulates usage, and links to parent/child loops for tree navigation.
Loops are created by agent_loop (origin loops) or agent_loop_continue (continuation loops). The SessionRecorder materializes LoopRecord structs from the AgentStart / AgentEnd event pairs.
Concept Overview
Loop [EXISTS — LoopRecord]
├── HEADER
│ ├── loop_id [EXISTS] — "{session_id}.{config_segment}.{N}"
│ ├── status [EXISTS] — Pending/Running/Completed/Rejected/Aborted
│ ├── continuation_kind [EXISTS] — Initial/Default/Rerun/Branch/Compaction
│ ├── parent_loop_id [EXISTS]
│ ├── timing [EXISTS] — started_at, ended_at
│ ├── Model [EXISTS] — falls back: Loop → Agent default
│ ├── config [EXISTS] — LoopConfigSnapshot
│ ├── usage, compaction_block [EXISTS]
│ └── Callbacks: before_loop / after_loop / on_error [EXISTS]
├── LINE ITEMS: Turns [EXISTS as events and struct]
├── LINE ITEMS: Same-session children, Sub-agent spawns [EXISTS]
├── LINE ITEMS: Parallel group [EXISTS]
└── LINE ITEMS: Events [EXISTS]
HEADER
| Field | Type | Status | Description |
|---|---|---|---|
| loop_id | String | [EXISTS] | Unique identifier. Format: "{session_id}.{config_segment}.{N}". The config_segment encodes which model/provider produced this loop. N is a monotonic counter per (session, config). |
| session_id | String | [EXISTS] | Session this loop belongs to. |
| agent_id | String | [EXISTS] | Agent that ran this loop. |
| status | LoopStatus | [EXISTS] | Lifecycle state: Pending, Running, Completed, Rejected, Aborted. See Status section below. |
| continuation_kind | ContinuationKind | [EXISTS] | How this loop relates to its parent. Initial for origin loops (agent_loop). Default for regular continuations. Rerun for retries. Branch for branch explorations. Compaction for standalone compaction passes. |
| parent_loop_id | Option<String> | [EXISTS] | The loop that directly preceded this one. None for origin loops. For sub-agent loops, points to the tool-call loop in a different session. |
| started_at | DateTime<Utc> | [EXISTS] | Timestamp from AgentStart. |
| ended_at | Option<DateTime<Utc>> | [EXISTS] | Timestamp from AgentEnd. None while running or pending. |
| rejection | Option<String> | [EXISTS] | Set when AgentEnd.rejection is Some (input filter blocked the run). |
| metadata | Option<serde_json::Value> | [EXISTS] | Opaque caller-supplied metadata from AgentStart (e.g., request id, trace ID). |
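The loop_id format and its per-(session, config) counter can be sketched with a HashMap. Two details below are assumptions made for the example: the counter key is the string "{session_id}.{config_segment}", and N starts at 1.

```rust
use std::collections::HashMap;

/// Hands out loop ids of the form "{session_id}.{config_segment}.{N}".
/// Key layout and the starting value of N are assumptions of this sketch.
#[derive(Default)]
struct LoopIdGen {
    loop_counters: HashMap<String, usize>,
}

impl LoopIdGen {
    fn next_loop_id(&mut self, session_id: &str, config_segment: &str) -> String {
        let key = format!("{session_id}.{config_segment}");
        // Each (session, config) pair gets its own monotonic counter.
        let n = self.loop_counters.entry(key.clone()).or_insert(0);
        *n += 1;
        format!("{key}.{n}")
    }
}
```

Keeping one counter per pair means two configs running in the same session never collide, and ids stay stable when a new config is introduced mid-session.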
Model for this Loop [EXISTS]
The model/provider identity is captured as a lightweight snapshot, not the full config (which contains secrets and non-serializable closures).
| Field | Type | Status | Description |
|---|---|---|---|
| config | Option<LoopConfigSnapshot> | [EXISTS] | Populated from AgentStart.config_snapshot or the first Message::Assistant seen. None if loop ended before any assistant message and no snapshot was provided. |
| config.model | String | [EXISTS] | Model id string (e.g., "claude-opus-4-6", "gpt-4o"). |
| config.provider | String | [EXISTS] | Provider name (e.g., "anthropic", "openai"). |
| config.config_id | Option<String> | [EXISTS] | Stable config identity from AgentLoopConfig.config_id. Matches the config_segment in loop_id. |
| config.name | Option<String> | [EXISTS] | Model display name. |
| config.api | Option<ApiProtocol> | [EXISTS] | Which API protocol was used (e.g., AnthropicMessages, OpenAiCompletions). |
| config.base_url | Option<String> | [EXISTS] | Provider base URL. |
| config.reasoning | Option<bool> | [EXISTS] | Whether this model supports reasoning/thinking. |
| config.context_window | Option<u32> | [EXISTS] | Context window size in tokens. |
| config.max_tokens | Option<u32> | [EXISTS] | Max output tokens per response. |
| config.thinking_level | Option<ThinkingLevel> | [EXISTS] | Reasoning depth level for this loop. Formerly a Session-level attribute; now per-loop. |
| config.temperature | Option<f32> | [EXISTS] | Sampling temperature. Formerly a Session-level attribute; now per-loop. |
Model fallback hierarchy: Loop (AgentLoopConfig.model_config) -> Agent default (BasicAgent.model_config).
Usage [EXISTS]
| Field | Type | Status | Description |
|---|---|---|---|
usage | Usage | [EXISTS] | Token usage from AgentEnd.usage. Accumulated across all turns in this loop. Fields: input, output, reasoning, cache_read, cache_write, total_tokens. |
Compaction [EXISTS]
| Field | Type | Status | Description |
|---|---|---|---|
compaction_block | Option<CompactionBlock> | [EXISTS] | Non-destructive compaction overlay. When Some, the context loader uses this block instead of raw messages. Original messages remain untouched. |
Status [EXISTS]
Lifecycle state of a LoopRecord. Enum LoopStatus:
Pending -> Running -> Completed
                   -> Rejected
                   -> Aborted
| Variant | Status | Description |
|---|---|---|
Pending | [EXISTS] | Loop id appeared in ParallelLoopStart but AgentStart has not yet arrived. Only for parallel-evaluation branches. |
Running | [EXISTS] | AgentStart was received; the loop is executing. |
Completed | [EXISTS] | AgentEnd was received with no rejection. |
Rejected | [EXISTS] | AgentEnd was received with rejection: Some(_). Input filter blocked the run. |
Aborted | [EXISTS] | SessionRecorder::flush was called before AgentEnd arrived (e.g., process shutdown). |
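The lifecycle can be read as a small state machine. The sketch below is an illustration under assumptions: Signal is a hypothetical stand-in for the recorder inputs (ParallelLoopStart, AgentStart, AgentEnd, and a flush call), and treating a flushed Pending loop as Aborted is a guess that the table does not confirm.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LoopStatus {
    Pending,
    Running,
    Completed,
    Rejected,
    Aborted,
}

/// Hypothetical stand-ins for the inputs that move a loop between states.
enum Signal {
    ParallelLoopStart,
    AgentStart,
    AgentEnd { rejected: bool },
    Flush,
}

/// Returns the next status, or None when the signal does not apply
/// to the current state (for example, a flush after completion).
fn next_status(current: Option<LoopStatus>, signal: &Signal) -> Option<LoopStatus> {
    use LoopStatus::*;
    match (current, signal) {
        // A parallel branch is announced before its AgentStart arrives.
        (None, Signal::ParallelLoopStart) => Some(Pending),
        (None | Some(Pending), Signal::AgentStart) => Some(Running),
        (Some(Running), Signal::AgentEnd { rejected: false }) => Some(Completed),
        (Some(Running), Signal::AgentEnd { rejected: true }) => Some(Rejected),
        // Assumption: any loop still open at flush time is marked Aborted.
        (Some(Pending | Running), Signal::Flush) => Some(Aborted),
        _ => None,
    }
}
```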
Callbacks [EXISTS]
| Callback | Status | Description |
|---|---|---|
before_loop | [EXISTS] | Fires before AgentStart is emitted. Defined as BeforeLoopFn on AgentLoopConfig. Blank by default. |
after_loop | [EXISTS] | Fires after AgentEnd is emitted. Defined as AfterLoopFn. Receives messages and usage. Blank by default. |
on_error | [EXISTS] | Fires when StopReason::Error is encountered. Defined as OnErrorFn. Blank by default. |
LINE ITEMS: Turns (Steps) [EXISTS] as events and struct
Turns exist as TurnStart / TurnEnd event pairs in the loop's event stream, and as materialized Turn structs on LoopRecord.turns. See turn.md.
| Field | Type | Status | Description |
|---|---|---|---|
turns | Vec<Turn> | [EXISTS] | Materialized turn records. Built by SessionRecorder from event pairs. Empty for old sessions (backward compat via #[serde(default)]). |
| (event-pair) | — | [EXISTS] | Each turn is also bounded by TurnStart and TurnEnd events in self.events. |
LINE ITEMS: Same-session Children [EXISTS]
| Field | Type | Status | Description |
|---|---|---|---|
children_loop_ids | Vec<String> | [EXISTS] | Loop IDs of same-session child loops (continuations, reruns, branches). Parent->children direction. Does not include cross-session sub-agent children. |
LINE ITEMS: Sub-agent Spawns (Cross-session) [EXISTS]
| Field | Type | Status | Description |
|---|---|---|---|
child_loop_refs | Vec<ChildLoopRef> | [EXISTS] | Cross-session links to sub-agent loops spawned by tool calls. Each entry has: tool_call_id, tool_name, child_loop_id, child_session_id. |
ChildLoopRef fields:
| Field | Type | Status | Description |
|---|---|---|---|
tool_call_id | String | [EXISTS] | The ToolCall.id that triggered sub-agent execution. |
tool_name | String | [EXISTS] | The tool name that performed the spawn. |
child_loop_id | String | [EXISTS] | The sub-agent's AgentStart.loop_id. |
child_session_id | String | [EXISTS] | The sub-agent's session. Extracted from child_loop_id prefix. |
LINE ITEMS: Parallel Group [EXISTS]
Set when this loop was part of an evaluational-parallelism group (agent_loop_parallel).
| Field | Type | Status | Description |
|---|---|---|---|
parallel_group | Option<ParallelGroupRecord> | [EXISTS] | None for non-parallel loops. |
all_loop_ids | Vec<String> | [EXISTS] | All branch loop IDs in config order. |
selected_loop_id | String | [EXISTS] | The winning branch's loop ID. |
selected_config_index | usize | [EXISTS] | 0-based index of the winner in the original configs. |
evaluation_usage | Usage | [EXISTS] | Token usage from the judge LLM (zero for non-judge strategies). |
is_selected | bool | [EXISTS] | true if this LoopRecord is the evaluation winner. |
LINE ITEMS: Events [EXISTS]
| Field | Type | Status | Description |
|---|---|---|---|
events | Vec<LoopEvent> | [EXISTS] | Ordered event stream for this loop. |
Each LoopEvent has:
| Field | Type | Status | Description |
|---|---|---|---|
sequence | u64 | [EXISTS] | Monotonic counter (0-based). Gaps indicate filtered events (e.g., streaming deltas when include_streaming_events is false). |
event | AgentEvent | [EXISTS] | The original event. event.loop_id() matches this LoopRecord.loop_id. |
Messages [EXISTS]
| Field | Type | Status | Description |
|---|---|---|---|
messages | Vec<AgentMessage> | [EXISTS] | All new messages produced by this loop, from AgentEnd.messages. Authoritative for replay and branching. |
Loop Origin Classification
parent_loop_id | continuation_kind | Meaning |
|---|---|---|
None | Initial | Fresh origin loop (agent_loop) |
Some(p), same session | Default | Regular continuation |
Some(p), same session | Rerun | Retry / error recovery |
Some(p), same session | Branch | Branch exploration |
Some(p), different session | Initial | Sub-agent loop (spawned by a tool) |
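The classification table maps directly onto a match. This is a minimal sketch: classify is a hypothetical function, and it takes the parent's session id directly rather than deriving it from parent_loop_id. Combinations not listed in the table (including Compaction) fall through to "unclassified".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContinuationKind {
    Initial,
    Default,
    Rerun,
    Branch,
    Compaction,
}

/// Classify a loop from its own session, its parent's session (if any),
/// and its continuation kind, following the table row by row.
fn classify(
    loop_session: &str,
    parent_session: Option<&str>,
    kind: ContinuationKind,
) -> &'static str {
    match (parent_session, kind) {
        (None, ContinuationKind::Initial) => "origin",
        (Some(p), ContinuationKind::Initial) if p != loop_session => "sub-agent",
        (Some(p), ContinuationKind::Default) if p == loop_session => "continuation",
        (Some(p), ContinuationKind::Rerun) if p == loop_session => "rerun",
        (Some(p), ContinuationKind::Branch) if p == loop_session => "branch",
        // Anything the table does not list, e.g. Compaction.
        _ => "unclassified",
    }
}
```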
Code Reference
| File | What it contains |
|---|---|
src/session/model.rs | LoopRecord struct, LoopStatus enum, LoopConfigSnapshot struct, ChildLoopRef struct, ParallelGroupRecord struct, LoopEvent struct, OpenLoop struct. |
src/agent_loop/run.rs | run_loop function — the core loop engine. Implements the outer loop (follow-ups) and inner loop (tool calls + steering). Accumulates Usage, fires turn events and hooks. |
Conceptual Notes
- Model fallback is Loop -> Agent default. Session no longer carries model/thinking/temperature fields; these are tracked per-loop in LoopConfigSnapshot.
- Turns as a struct are materialized on LoopRecord.turns as Vec<Turn>. Built by SessionRecorder from TurnStart/TurnEnd event pairs. The flat messages field is kept independently for compaction and context building. Old sessions without turns deserialize with an empty vec.
- LoopConfigSnapshot intentionally does not store the full AgentLoopConfig because it contains API keys and non-serializable hook closures. The snapshot captures model identity plus key parameters (thinking_level, temperature, context_window, max_tokens, etc.) for cost attribution, replay identification, parallel branch differentiation, and per-loop config tracking.
Turn
A single LLM call-and-response cycle within a Loop. One Loop may have many Turns: the initial response plus one per tool-call round-trip or steering message injection.
Status: Turn [EXISTS] as both a first-class struct (Turn on LoopRecord.turns) and as an event-pair (TurnStart / TurnEnd). The SessionRecorder materializes Turn structs from the event stream.
Concept Overview
Turn [EXISTS as struct on LoopRecord.turns; EXISTS as event-pair TurnStart/TurnEnd]
├── HEADER
│ ├── TurnId [EXISTS] — { loop_id, turn_index }
│ ├── triggered_by [EXISTS] — User/SubAgent/Continuation/Branch
│ ├── usage [EXISTS] — per-turn from TurnEnd
│ └── Callbacks: before_turn / after_turn [EXISTS]
└── LINE ITEMS: Actions
├── Messages [EXISTS] — Input (User) + Output (Assistant)
├── Tool Executions [EXISTS]
└── Streaming [EXISTS] — MessageUpdate deltas
HEADER
| Field | Type | Status | Description |
|---|---|---|---|
TurnId | struct | [EXISTS] | Identifies the turn. Composed of loop_id: String and turn_index: u32. Carried on every LlmMessage produced during the turn. |
turn_index | u32 | [EXISTS] | Zero-based index within the current loop (0 = first turn after AgentStart). Present on TurnStart and TurnEnd events. |
triggered_by | TurnTrigger | [EXISTS] | What caused this turn to begin. See Trigger section below. |
usage | Usage | [EXISTS] | Per-turn token usage. Carried on TurnEnd.usage. Fields: input, output, reasoning, cache_read, cache_write, total_tokens. |
timestamp (start) | DateTime<Utc> | [EXISTS] | Wall-clock time when the turn began. On TurnStart.timestamp. |
timestamp (end) | DateTime<Utc> | [EXISTS] | Wall-clock time when the turn completed (after all tool calls finished). On TurnEnd.timestamp. |
TurnTrigger [EXISTS]
Identifies what caused a new turn to begin. Enum TurnTrigger:
| Variant | Status | Description |
|---|---|---|
User | [EXISTS] | First turn triggered by a user message (agent_loop). |
SubAgent | [EXISTS] | This agent was invoked as a sub-agent by a parent agent. |
Continuation | [EXISTS] | Continuation turn: tool round-trip, steering message, or Default / Rerun continuation. |
Branch | [EXISTS] | First turn of a Branch continuation (agent_loop_continue with ContinuationKind::Branch). Subsequent turns within the same branched loop use Continuation. |
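One way to read the trigger rules is that only the first turn of a loop carries a distinctive trigger. The sketch below assumes this also holds for User and SubAgent loops (the table states it explicitly only for Branch). LoopEntry and trigger_for are hypothetical names for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TurnTrigger {
    User,
    SubAgent,
    Continuation,
    Branch,
}

/// How the loop was entered; a hypothetical summary type for this sketch.
#[derive(Debug, Clone, Copy)]
enum LoopEntry {
    UserPrompt,
    SubAgent,
    Continue,       // Default or Rerun continuation
    BranchContinue, // continuation with ContinuationKind::Branch
}

/// Pick the trigger for a turn from the loop entry and the turn index.
fn trigger_for(entry: LoopEntry, turn_index: u32) -> TurnTrigger {
    // Assumption: every turn after the first (tool round-trip, steering)
    // is reported as Continuation regardless of how the loop started.
    if turn_index > 0 {
        return TurnTrigger::Continuation;
    }
    match entry {
        LoopEntry::UserPrompt => TurnTrigger::User,
        LoopEntry::SubAgent => TurnTrigger::SubAgent,
        LoopEntry::Continue => TurnTrigger::Continuation,
        LoopEntry::BranchContinue => TurnTrigger::Branch,
    }
}
```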
Callbacks [EXISTS]
| Callback | Status | Description |
|---|---|---|
before_turn | [EXISTS] | Fires BEFORE TurnStart event is emitted. Defined as BeforeTurnFn on AgentLoopConfig. Receives (&[AgentMessage], usize) (messages, turn index). Returning false aborts the turn. |
after_turn | [EXISTS] | Fires AFTER TurnEnd event is emitted. Defined as AfterTurnFn. Receives (&[AgentMessage], &Usage). |
LINE ITEMS: Messages [EXISTS]
Messages produced and consumed during the turn.
| Message Type | Direction | Status | Description |
|---|---|---|---|
| Input (User / Steering / Follow-up) | Into LLM | [EXISTS] | Injected after TurnStart. Includes initial prompt messages (first turn only), pending steering messages, and follow-up messages. Each emits MessageStart / MessageEnd events. All carry the current TurnId. |
| Output (Assistant) | From LLM | [EXISTS] | The LLM's streamed response. Emitted as MessageStart -> MessageUpdate (streaming deltas) -> MessageEnd. Carries StopReason, model, provider, usage. Pushed to context and new_messages with TurnId. |
LINE ITEMS: Tool Executions [EXISTS]
Tool calls extracted from the assistant message's Content::ToolCall items.
| Field | Status | Description |
|---|---|---|
| Tool calls | [EXISTS] | Extracted from Message::Assistant.content as (id, name, arguments) tuples. |
ToolExecutionStart event | [EXISTS] | Emitted per tool call before execute(). Carries tool_call_id, tool_name, args. |
ToolExecutionUpdate event | [EXISTS] | Emitted during execution for streaming partial results (via ctx.on_update). Not all tools emit these. |
ToolExecutionEnd event | [EXISTS] | Emitted when tool finishes. Carries result, is_error, optional child_loop_id (for sub-agent tools). |
ProgressMessage event | [EXISTS] | Plain text status updates from tools (via ctx.on_progress). |
| Tool results | [EXISTS] | Message::ToolResult messages appended to context with the current TurnId. Fed back to LLM in the next turn. |
TurnEnd.tool_results | [EXISTS] | All tool result messages for this turn. Empty when no tool calls were made (StopReason::Stop). |
LINE ITEMS: Streaming Deltas [EXISTS]
Incremental token-level updates from the LLM stream, carried on MessageUpdate events.
| Variant | Status | Description |
|---|---|---|
StreamDelta::Text { delta } | [EXISTS] | A text token fragment from the LLM's response. |
StreamDelta::Thinking { delta } | [EXISTS] | A thinking/reasoning chunk (extended thinking mode only). |
StreamDelta::ToolCallDelta { delta } | [EXISTS] | A fragment of JSON arguments for a tool call. Must be accumulated and parsed after MessageEnd. |
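A consumer of MessageUpdate events typically folds the deltas into buffers and only interprets the tool-call arguments once the message has ended. The sketch below uses a simplified local StreamDelta with the three variants listed above. It assumes a single tool call per message, and the Accumulated struct is hypothetical.

```rust
/// Simplified local copy of the three documented variants.
enum StreamDelta {
    Text { delta: String },
    Thinking { delta: String },
    ToolCallDelta { delta: String },
}

/// Buffers filled while a message streams; hypothetical helper type.
#[derive(Debug, Default, PartialEq)]
struct Accumulated {
    text: String,
    thinking: String,
    /// Raw JSON argument text; only parse it after MessageEnd.
    tool_args_json: String,
}

/// Fold a sequence of deltas into the three buffers, preserving order.
fn accumulate(deltas: &[StreamDelta]) -> Accumulated {
    let mut acc = Accumulated::default();
    for d in deltas {
        match d {
            StreamDelta::Text { delta } => acc.text.push_str(delta),
            StreamDelta::Thinking { delta } => acc.thinking.push_str(delta),
            StreamDelta::ToolCallDelta { delta } => acc.tool_args_json.push_str(delta),
        }
    }
    acc
}
```

Individual ToolCallDelta fragments are not valid JSON on their own, which is why the buffer is kept as a plain string until the stream finishes.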
Per-Turn Event Ordering
The event ordering is strictly enforced on every iteration of the inner loop in run_loop:
before_turn hook -> TurnStart event
    -> [MessageStart/End for prompt/steering messages]
    -> [Compaction if threshold exceeded]
    -> [MessageStart -> MessageUpdate* -> MessageEnd for assistant response]
    -> [ToolExecutionStart -> ToolExecutionUpdate* -> ToolExecutionEnd for each tool]
    -> TurnEnd event
    -> after_turn hook
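The ordering can be sketched as a function that produces the trace for one turn. turn_trace is a hypothetical illustration, not part of run_loop. It omits prompt/steering messages and streaming updates, lists tool events sequentially (a parallel strategy may interleave them), and assumes that a before_turn hook returning false emits nothing further.

```rust
/// Produce the hook/event trace for a single turn, in documented order.
fn turn_trace(
    before_turn_allows: bool,
    compaction_needed: bool,
    tool_calls: &[&str],
) -> Vec<String> {
    let mut trace = vec!["before_turn".to_string()];
    if !before_turn_allows {
        // Assumption: an aborted turn emits no TurnStart/TurnEnd.
        return trace;
    }
    trace.push("TurnStart".to_string());
    if compaction_needed {
        trace.push("Compaction".to_string());
    }
    trace.push("MessageStart(assistant)".to_string());
    trace.push("MessageEnd(assistant)".to_string());
    for name in tool_calls {
        trace.push(format!("ToolExecutionStart({name})"));
        trace.push(format!("ToolExecutionEnd({name})"));
    }
    trace.push("TurnEnd".to_string());
    trace.push("after_turn".to_string());
    trace
}
```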
Code Reference
| File | What it contains |
|---|---|
src/agent_loop/run.rs | run_loop function — implements the turn cycle. TurnStart / TurnEnd event emission, before_turn / after_turn hook invocation, turn trigger determination, usage accumulation, tool call extraction and execution. |
src/types/event.rs | TurnTrigger enum, AgentEvent::TurnStart and AgentEvent::TurnEnd variants, StreamDelta enum. |
src/types/agent_message.rs | TurnId struct — { loop_id, turn_index }. Carried on LlmMessage.turn_id. |
src/session/model.rs | Turn struct — materialized turn record on LoopRecord.turns. Fields: turn_id, triggered_by, usage, input_messages, output_message, tool_results, started_at, ended_at. |
src/session/recorder.rs | SessionRecorder — builds Turn structs from TurnStart / MessageEnd / TurnEnd events. |
Conceptual Notes
- Turn as a first-class struct is implemented. The Turn struct on LoopRecord.turns contains: turn_id, triggered_by, usage, input_messages, output_message, tool_results, started_at, ended_at. Built by SessionRecorder from TurnStart/TurnEnd event pairs. The flat LoopRecord.messages is kept independently for backward compatibility and use by compaction/context building. Old sessions without turns deserialize with an empty vec via #[serde(default)].
- Turn lifecycle is entirely within a single Loop. A turn never spans loops. The inner loop in run_loop continues when there are tool calls or pending steering messages; each iteration is one turn.
- Execution limits are checked BEFORE before_turn fires, so hooks are not invoked for impossible turns. If a limit is reached, a system message ([Agent stopped: ...]) is emitted and the loop returns.
- Compaction can occur within a turn (after TurnStart, before the LLM call), making a single turn potentially include a compaction event in its span.
Message
The message entities form the communication substrate of the entire system. Messages flow through Agent, Session, Loop, and Turn. The type hierarchy separates atomic content blocks, conversation-level messages, agent-level routing envelopes, and token usage tracking.
Concept Overview
Message System [EXISTS]
├── Content [EXISTS] — Text / Image / Thinking / ToolCall
├── Message [EXISTS] — User / Assistant / ToolResult
├── AgentMessage [EXISTS] — Llm(LlmMessage) | Extension(ExtensionMessage)
├── LlmMessage [EXISTS] — Message + Option<TurnId>
├── StopReason [EXISTS] — Stop/Length/ToolUse/Error/Aborted/...
└── Usage [EXISTS] — input/output/reasoning/cache tokens
Content [EXISTS]
The atomic unit of all message payloads. Every message is composed of Vec<Content>. A single LLM turn can contain multiple content blocks (e.g., a Thinking block followed by Text, or Text followed by multiple ToolCalls).
Enum Content, tagged by "type" in JSON:
| Variant | Status | Fields | Description |
|---|---|---|---|
Text | [EXISTS] | text: String | Plain string payload sent to/from the LLM. |
Image | [EXISTS] | data: String, mime_type: String | Binary image encoded as base64 string (not a file path). LLMs receive image bytes inline. |
Thinking | [EXISTS] | thinking: String, signature: Option<String> | Internal chain-of-thought from the LLM (e.g., Claude extended thinking). Visible in UI, never re-sent as content to LLM. signature is a cryptographic integrity token from the provider that must be echoed back unmodified in multi-turn conversations. |
ToolCall | [EXISTS] | id: String, name: String, arguments: serde_json::Value | LLM's request to invoke a tool with structured JSON arguments. The id links to a corresponding ToolResult. |
Message [EXISTS]
The conversation-level message enum. Tagged by "role" in JSON. Each variant carries Vec<Content> plus role-specific metadata.
| Variant | Status | Fields | Description |
|---|---|---|---|
User | [EXISTS] | content: Vec<Content>, timestamp: u64 | User turn. Mixed media supported (text + images). Timestamp is unix millis. Helper constructor: Message::user(text). |
Assistant | [EXISTS] | content: Vec<Content>, stop_reason: StopReason, model: String, provider: String, usage: Usage, timestamp: u64, error_message: Option<String> | LLM's response, fully annotated. stop_reason tells why generation stopped. model/provider captured for cost tracking and multi-provider routing. Failed turns are persisted, not dropped. |
ToolResult | [EXISTS] | tool_call_id: String, tool_name: String, content: Vec<Content>, is_error: bool, timestamp: u64 | Tool execution result returned to LLM. tool_call_id links back to the specific ToolCall in the assistant content. is_error: true means the LLM sees the failure and can recover/retry. |
Helper Methods on Message
| Method | Status | Description |
|---|---|---|
user(text) | [EXISTS] | Constructor for simple text user messages. |
role() | [EXISTS] | Returns "user", "assistant", or "toolResult". |
is_context_overflow() | [EXISTS] | Checks if an assistant message represents a context overflow error by inspecting error_message against known provider overflow patterns. |
StopReason [EXISTS]
Why an assistant message's generation stopped. Enum with camelCase serialization.
| Variant | Status | Description |
|---|---|---|
Stop | [EXISTS] | Natural end of generation. |
Length | [EXISTS] | Max tokens reached. |
ToolUse | [EXISTS] | LLM requested tool execution. |
Error | [EXISTS] | Provider error during generation. |
Aborted | [EXISTS] | Cancelled by caller. |
MaxTurns | [EXISTS] | Maximum allowed turns reached. |
UserStop | [EXISTS] | Stopped by explicit user command. |
Handoff | [EXISTS] | Agent handing off to human operator. |
GuardRail | [EXISTS] | Stopped by internal guardrail (content moderation, safety filter). |
ContextCompacted | [EXISTS] | Context was compacted, potentially losing information. |
Paused | [EXISTS] | Generation paused (waiting for external input). |
AgentMessage [EXISTS]
The agent loop's two-lane routing envelope. Decides whether content goes INTO the LLM context window or SIDEWAYS to the UI/app without consuming tokens.
Enum AgentMessage, untagged in JSON (discriminated by role field):
| Variant | Status | Description |
|---|---|---|
Llm(LlmMessage) | [EXISTS] | Enters the LLM context window. Serialized into the API request. |
Extension(ExtensionMessage) | [EXISTS] | NEVER enters the context window. Only emitted as AgentEvents. For UI notifications, debug events, session metadata, progress markers. |
Key Design: One-way Conversion
Message -> AgentMessage::Llm exists via From<Message>. There is no path for ExtensionMessage to become an Llm variant. The type system enforces that UI-only content can never accidentally slip into the LLM context.
Methods on AgentMessage
| Method | Status | Description |
|---|---|---|
role() | [EXISTS] | Delegates to inner message's role. |
as_llm() | [EXISTS] | Returns Option<&Message>. None for Extension. |
turn_id() | [EXISTS] | Returns Option<&TurnId>. None for Extension. |
with_turn_id(Option<TurnId>) | [EXISTS] | Sets turn_id on LLM messages. No-op for Extension. |
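The two-lane design and the one-way conversion can be shown with simplified local types. In this sketch, Message carries plain text instead of Vec<Content>, the Llm variant wraps Message directly rather than LlmMessage, and llm_context is a hypothetical helper. Only the From<Message> impl and as_llm mirror names from the tables above.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Message {
    User { text: String },
    Assistant { text: String },
}

#[derive(Debug, Clone, PartialEq)]
struct ExtensionMessage {
    kind: String,
    data: String,
}

#[derive(Debug, Clone, PartialEq)]
enum AgentMessage {
    Llm(Message),
    Extension(ExtensionMessage),
}

// The only conversion into the Llm lane starts from a Message.
// No From<ExtensionMessage> producing Llm exists, so UI-only
// content cannot be routed into the context by accident.
impl From<Message> for AgentMessage {
    fn from(m: Message) -> Self {
        AgentMessage::Llm(m)
    }
}

impl AgentMessage {
    /// Some(&Message) for the Llm lane, None for Extension.
    fn as_llm(&self) -> Option<&Message> {
        match self {
            AgentMessage::Llm(m) => Some(m),
            AgentMessage::Extension(_) => None,
        }
    }
}

/// Hypothetical helper: keep only what would be sent to the model.
fn llm_context(history: &[AgentMessage]) -> Vec<&Message> {
    history.iter().filter_map(|m| m.as_llm()).collect()
}
```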
LlmMessage [EXISTS]
An LLM-bound message with optional turn tracking metadata. Wraps Message + Option<TurnId>.
| Field | Type | Status | Description |
|---|---|---|---|
message | Message | [EXISTS] | The underlying conversation message. |
turn_id | Option<TurnId> | [EXISTS] | Which turn produced this message. None for messages that predate turn tracking or are created outside the agent loop. |
Custom Serde (Flatten Pattern)
LlmMessage uses custom Serialize / Deserialize implementations to flatten into the same JSON shape as a bare Message with an optional turnId field injected. This maintains backward compatibility: old data without turnId deserializes as turn_id: None.
Why custom serde: #[serde(flatten)] does not work with serde's internally-tagged enums (#[serde(tag = "role")] on Message). Manual serialize/deserialize is the only way to achieve the flatten-into-Message pattern.
Constructors
| Method | Status | Description |
|---|---|---|
new(message) | [EXISTS] | Creates LlmMessage without turn tracking (turn_id: None). |
with_turn(message, turn_id) | [EXISTS] | Creates LlmMessage with a specific TurnId. |
ExtensionMessage [EXISTS]
App-only message that never enters the LLM context window. Streamed as events for UI/app consumption.
| Field | Type | Status | Description |
|---|---|---|---|
role | String | [EXISTS] | Always "extension". Acts as discriminator in untagged deserialization. Named role for consistency with Message but functions more like a type/category marker. |
kind | String | [EXISTS] | Message category (e.g., "notification", "system", "debug"). App-specific. |
data | serde_json::Value | [EXISTS] | Arbitrary JSON payload. Serialized from any impl Serialize via ExtensionMessage::new(). |
Usage [EXISTS]
Token metrics per turn or accumulated across loops/sessions.
| Field | Type | Status | Description |
|---|---|---|---|
input | u64 | [EXISTS] | Input tokens consumed. |
output | u64 | [EXISTS] | Output tokens generated. |
reasoning | u64 | [EXISTS] | Reasoning/thinking tokens — a subset of output. Non-zero only for providers that report reasoning tokens separately (OpenAI o-series). Defaults to 0. |
cache_read | u64 | [EXISTS] | Tokens served from prompt cache. |
cache_write | u64 | [EXISTS] | Tokens written to prompt cache. |
total_tokens | u64 | [EXISTS] | Total tokens (may differ from sum of above depending on provider reporting). |
Methods on Usage
| Method | Status | Description |
|---|---|---|
estimated_cost(&CostConfig) | [EXISTS] | Dollar cost calculation using per-million-token rates. reasoning tokens are already counted in output (no double-charge). |
combine(&Usage) | [EXISTS] | Adds two Usage values (e.g., sum across parallel branches or multi-step loops). |
cache_hit_rate() | [EXISTS] | Fraction of input tokens served from cache (0.0-1.0). Returns 0.0 if no input tokens processed. |
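The three methods can be sketched on a local Usage struct. Several details here are assumptions, not the library's definitions: combine returns a new value, cache_hit_rate divides cache_read by input + cache_read (providers differ on whether cached tokens are included in input), and CostConfig has only two hypothetical rate fields.

```rust
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct Usage {
    input: u64,
    output: u64,
    reasoning: u64,
    cache_read: u64,
    cache_write: u64,
    total_tokens: u64,
}

/// Hypothetical rates in dollars per million tokens.
struct CostConfig {
    input_per_million: f64,
    output_per_million: f64,
}

impl Usage {
    /// Field-wise sum of two usage records.
    fn combine(&self, other: &Usage) -> Usage {
        Usage {
            input: self.input + other.input,
            output: self.output + other.output,
            reasoning: self.reasoning + other.reasoning,
            cache_read: self.cache_read + other.cache_read,
            cache_write: self.cache_write + other.cache_write,
            total_tokens: self.total_tokens + other.total_tokens,
        }
    }

    /// Assumed definition: cache_read / (input + cache_read), 0.0 when empty.
    fn cache_hit_rate(&self) -> f64 {
        let denom = self.input + self.cache_read;
        if denom == 0 {
            0.0
        } else {
            self.cache_read as f64 / denom as f64
        }
    }

    /// reasoning is a subset of output, so it is not charged again.
    fn estimated_cost(&self, cost: &CostConfig) -> f64 {
        (self.input as f64 * cost.input_per_million
            + self.output as f64 * cost.output_per_million)
            / 1_000_000.0
    }
}
```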
Where Usage Appears
| Location | Status | Description |
|---|---|---|
Message::Assistant.usage | [EXISTS] | Per-turn usage on the assistant message itself. |
AgentEvent::TurnEnd.usage | [EXISTS] | Direct per-turn access without destructuring the message. |
AgentEvent::AgentEnd.usage | [EXISTS] | Accumulated across all turns in a loop. |
LoopRecord.usage | [EXISTS] | Captured from AgentEnd.usage. |
Session.total_usage() | [EXISTS] | Summed across all loops. |
Code Reference
| File | What it contains |
|---|---|
src/types/content.rs | Content enum (Text, Image, Thinking, ToolCall), Message enum (User, Assistant, ToolResult), StopReason enum, now_ms() helper. |
src/types/agent_message.rs | TurnId struct, LlmMessage struct (with custom serde), AgentMessage enum, From<Message> impl. |
src/types/extension.rs | ExtensionMessage struct. |
src/types/usage.rs | Usage struct, CacheConfig struct, CacheStrategy enum, ThinkingLevel enum. |
Conceptual Notes
- LlmMessage serde is a critical compatibility mechanism. Any future fields added to LlmMessage must maintain the flatten-into-Message JSON pattern. Do not use #[serde(flatten)] with Message.
- ExtensionMessage naming: The role field is named for consistency with Message but functions as a type discriminator. A more accurate name would be type or category, but role enables consistent untagged serde deserialization across the AgentMessage enum.
- StopReason includes several forward-looking variants (MaxTurns, UserStop, Handoff, GuardRail, ContextCompacted, Paused) adopted from other agentic frameworks. These exist as enum variants but may not yet be emitted by all code paths.
- Usage.reasoning is a subset of output, not an additional charge. It is non-zero only for OpenAI o-series models that report reasoning tokens separately.
Tool System
The tool system defines how agents interact with the external world. Every capability an agent has -- running shell commands, reading files, calling APIs, delegating to sub-agents -- is expressed as a tool implementing the AgentTool trait. The agent loop discovers tools by name from a registry, executes them with lifecycle events, and feeds results back to the LLM.
Concept Overview
Tool [EXISTS]
├── AgentTool trait [EXISTS] — name, label, description, parameters_schema, execute
├── ToolContext [EXISTS] — tool_call_id, tool_name, cancel, on_update, on_progress
├── ToolResult [EXISTS] — content, details, child_loop_id
├── ToolError [EXISTS] — Failed/NotFound/InvalidArgs/Cancelled
├── ToolExecutionStrategy [EXISTS] — Sequential/Parallel/Batched
├── SubAgentTool [EXISTS] — spawns child agent loop
├── Sources: Built-in [EXISTS] / OpenAPI [EXISTS] / MCP [EXISTS]
└── Callbacks: before/after_tool_execution, before/after_update [EXISTS]
AgentTool Trait [EXISTS]
The core extension point. Implement this trait to create custom tools.
| Method | Signature | Status | Description |
|---|---|---|---|
name() | -> &str | [EXISTS] | Unique identifier used in LLM tool_use (e.g. "bash") |
label() | -> &str | [EXISTS] | Human-readable label for UI display |
description() | -> &str | [EXISTS] | Description sent to the LLM so it knows when/how to use the tool |
parameters_schema() | -> serde_json::Value | [EXISTS] | JSON Schema for parameters; LLM uses this to format arguments |
execute() | (params, ctx) -> Result<ToolResult, ToolError> | [EXISTS] | Execute the tool with LLM-chosen arguments and system-injected context |
Design: params (LLM input) and ctx (system environment) are deliberately separate parameters. params varies per call; ctx provides cancellation, streaming callbacks, and correlation IDs that are the same shape for every tool.
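To make the params/ctx separation concrete, here is a deliberately simplified, synchronous sketch of the trait shape with a toy EchoTool. It is not the phi-core definition: the real trait takes JSON parameters and a richer ToolContext, and is presumably async. Params as a string map, the cancelled flag, and EchoTool are all assumptions for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in: LLM-chosen arguments as a string map.
type Params = HashMap<String, String>;

/// Simplified stand-in for the per-invocation context.
struct ToolContext {
    tool_call_id: String,
    cancelled: bool,
}

struct ToolResult {
    text: String,
}

#[derive(Debug, PartialEq)]
enum ToolError {
    InvalidArgs(String),
    Cancelled,
}

trait AgentTool {
    fn name(&self) -> &str;
    fn label(&self) -> &str;
    fn description(&self) -> &str;
    fn parameters_schema(&self) -> String;
    fn execute(&self, params: &Params, ctx: &ToolContext) -> Result<ToolResult, ToolError>;
}

/// Toy tool that returns its `text` argument unchanged.
struct EchoTool;

impl AgentTool for EchoTool {
    fn name(&self) -> &str { "echo" }
    fn label(&self) -> &str { "Echo" }
    fn description(&self) -> &str { "Returns the given text unchanged." }
    fn parameters_schema(&self) -> String {
        r#"{"type":"object","properties":{"text":{"type":"string"}},"required":["text"]}"#.to_string()
    }
    fn execute(&self, params: &Params, ctx: &ToolContext) -> Result<ToolResult, ToolError> {
        // ctx carries the system side: honor cancellation before doing work.
        if ctx.cancelled {
            return Err(ToolError::Cancelled);
        }
        // params carries the LLM side: validate what the model supplied.
        let text = params
            .get("text")
            .ok_or_else(|| ToolError::InvalidArgs("missing `text`".to_string()))?;
        Ok(ToolResult { text: text.clone() })
    }
}
```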
ToolContext [EXISTS]
Per-invocation context passed to execute(). Using a struct (rather than individual parameters) future-proofs the trait -- adding fields is non-breaking.
| Field | Type | Status | Description |
|---|---|---|---|
tool_call_id | String | [EXISTS] | Unique ID for this invocation; correlates Start/Update/End events |
tool_name | String | [EXISTS] | Name of the tool being invoked |
cancel | CancellationToken | [EXISTS] | Check is_cancelled() in long-running tools; child token of the parent loop's token |
on_update | Option<ToolUpdateFn> | [EXISTS] | Callback for streaming partial ToolResults (UI/logging only; not sent to LLM) |
on_progress | Option<ProgressFn> | [EXISTS] | Callback for user-facing progress text (emits ProgressMessage events) |
Callback wiring: The agent loop creates on_update and on_progress closures that capture a cloned tx channel sender. When a tool calls on_update(partial), the closure pushes an AgentEvent::ToolExecutionUpdate into the channel. The tool never touches the event system directly.
ToolResult [EXISTS]
What a tool hands back to the runtime after execution.
| Field | Type | Status | Description |
|---|---|---|---|
content | Vec<Content> | [EXISTS] | Tool output (text, images, etc.) |
details | serde_json::Value | [EXISTS] | Freeform metadata (not sent to LLM) |
child_loop_id | Option<String> | [EXISTS] | Set by SubAgentTool to the child loop's ID; None for regular tools |
Note: The runtime transforms struct ToolResult into Message::ToolResult by enriching it with correlation metadata (tool_call_id, tool_name, is_error, timestamp) before it enters the LLM conversation.
ToolError [EXISTS]
Error taxonomy for tool execution failures. Errors are converted to ToolResult with is_error=true so the LLM sees the failure and can self-correct.
| Variant | Display | Status |
|---|---|---|
Failed(String) | "{message}" | [EXISTS] |
NotFound(String) | "Tool not found: {name}" | [EXISTS] |
InvalidArgs(String) | "Invalid arguments: {message}" | [EXISTS] |
Cancelled | "Cancelled" | [EXISTS] |
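The Display strings in the table, and the conversion of a failure into an is_error result, can be sketched as follows. ToolResultMessage and to_message are hypothetical simplifications of Message::ToolResult and the runtime's conversion step. Only the variant names and Display formats are taken from the table.

```rust
use std::fmt;

#[derive(Debug)]
enum ToolError {
    Failed(String),
    NotFound(String),
    InvalidArgs(String),
    Cancelled,
}

impl fmt::Display for ToolError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToolError::Failed(msg) => write!(f, "{msg}"),
            ToolError::NotFound(name) => write!(f, "Tool not found: {name}"),
            ToolError::InvalidArgs(msg) => write!(f, "Invalid arguments: {msg}"),
            ToolError::Cancelled => write!(f, "Cancelled"),
        }
    }
}

/// Hypothetical, text-only stand-in for Message::ToolResult.
struct ToolResultMessage {
    tool_call_id: String,
    text: String,
    is_error: bool,
}

/// Turn a tool outcome into the message the LLM would see.
/// Errors become ordinary results flagged with is_error = true.
fn to_message(tool_call_id: &str, outcome: Result<String, ToolError>) -> ToolResultMessage {
    match outcome {
        Ok(text) => ToolResultMessage {
            tool_call_id: tool_call_id.to_string(),
            text,
            is_error: false,
        },
        Err(e) => ToolResultMessage {
            tool_call_id: tool_call_id.to_string(),
            text: e.to_string(),
            is_error: true,
        },
    }
}
```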
ToolExecutionStrategy [EXISTS]
Controls how multiple tool calls from a single LLM response are executed. Set at agent construction time (not a per-turn LLM decision).
| Variant | Status | Behavior |
|---|---|---|
Sequential | [EXISTS] | One at a time; checks steering between each. Use for tools with shared mutable state |
Parallel (default) | [EXISTS] | All concurrent via futures::join_all; checks steering after all complete. Best latency for independent tools |
Batched { size } | [EXISTS] | N tools in parallel per batch; checks steering between batches. Balances speed with human-in-the-loop control |
Steering: The human-in-the-loop interrupt mechanism. Between tool executions (or batches), the loop checks whether the human has sent a new instruction, cancellation, or correction.
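The three strategies differ only in how tool calls are grouped between steering checks. The sketch below models that grouping without any concurrency. plan_batches is a hypothetical helper, and each inner Vec represents a group that would run together before the next steering check.

```rust
enum ToolExecutionStrategy {
    Sequential,
    Parallel,
    Batched { size: usize },
}

/// Group tool calls into batches; steering would be checked between batches.
fn plan_batches<'a>(
    strategy: &ToolExecutionStrategy,
    calls: &[&'a str],
) -> Vec<Vec<&'a str>> {
    if calls.is_empty() {
        return Vec::new();
    }
    let size = match strategy {
        ToolExecutionStrategy::Sequential => 1,
        ToolExecutionStrategy::Parallel => calls.len(),
        // Guard against a zero batch size, which chunks() would reject.
        ToolExecutionStrategy::Batched { size } => (*size).max(1),
    };
    calls.chunks(size).map(|chunk| chunk.to_vec()).collect()
}
```

With five calls, Sequential yields five single-call groups, Parallel yields one group of five, and Batched { size: 2 } yields groups of two, two, and one.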
SubAgentTool [EXISTS]
A tool that delegates work to a child agent loop. When the parent LLM calls it, a fresh agent_loop() runs with its own system prompt, tools, and provider. The child loop's final text output is returned as the tool result.
| Attribute | Status | Description |
|---|---|---|
tool_name | [EXISTS] | Unique name for the sub-agent tool |
tool_description | [EXISTS] | Description for the parent LLM |
system_prompt | [EXISTS] | Child agent's system prompt |
model_config | [EXISTS] | Child agent's model configuration |
provider_override | [EXISTS] | Optional custom provider (testing) |
tools | [EXISTS] | Tools available to the child agent |
thinking_level | [EXISTS] | Thinking level for the child loop |
Design constraints: Sub-agents are NOT given other SubAgentTools (static depth prevention). Cancellation propagates from parent to child. Events stream back to the parent via on_update.
Built-in Tools [EXISTS]
Six tools returned by default_tools():
| Tool | File | Status | Description |
|---|---|---|---|
BashTool | tools/bash.rs | [EXISTS] | Run shell commands |
ReadFileTool | tools/file.rs | [EXISTS] | Read file contents |
WriteFileTool | tools/file.rs | [EXISTS] | Write or overwrite a file |
EditFileTool | tools/edit.rs | [EXISTS] | Precise text replacement within a file |
ListFilesTool | tools/list.rs | [EXISTS] | List directory contents |
SearchTool | tools/search.rs | [EXISTS] | Grep / content search across files |
OpenAPI Tools [EXISTS]
OpenApiToolAdapter parses an OpenAPI 3.0 spec and creates one AgentTool per operation. Each adapter makes HTTP requests to the API endpoint when executed. Feature-gated behind the openapi Cargo feature.
Factory methods: from_str, from_file, from_url, from_spec.
MCP Tools [EXISTS]
McpToolAdapter bridges MCP server tools to the AgentTool trait using the Adapter pattern. All adapters for the same server share one McpClient (via Arc<Mutex<McpClient>>). Name collision prevention uses an optional prefix namespace (e.g. "filesystem__read_file").
Tool Callbacks [EXISTS]
Lifecycle hooks that fire around tool execution. All are Option<Arc<dyn Fn(...)>> on AgentLoopConfig.
| Hook | Signature | Status | Fires When |
|---|---|---|---|
before_tool_execution | (tool_name, tool_call_id, args) -> bool | [EXISTS] | Before ToolExecutionStart; return false to skip the call |
after_tool_execution | (tool_name, tool_call_id, is_error) | [EXISTS] | After ToolExecutionEnd |
before_tool_execution_update | (tool_name, tool_call_id, text) -> bool | [EXISTS] | Before each ToolExecutionUpdate; return false to suppress the event |
after_tool_execution_update | (tool_name, tool_call_id, text) | [EXISTS] | After each ToolExecutionUpdate (only if not suppressed) |
Hook ordering: The before_* hooks fire strictly before their paired event is emitted, and the after_* hooks fire after it. When before_tool_execution returns false, no ToolExecutionStart/End events are emitted; a synthetic error ToolResult is sent to the LLM so it knows the call was skipped.
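The skip behavior of before_tool_execution can be sketched with a hook stored as Option<Arc<dyn Fn(...)>>. Everything besides the hook's argument order is an assumption for illustration: Outcome, run_tool_call, the string arguments, and the wording of the synthetic message are hypothetical, and the tool itself is not actually executed.

```rust
use std::sync::Arc;

/// Hook shape from the table: (tool_name, tool_call_id, args) -> bool.
/// Arguments are plain strings here; the real alias may differ.
type BeforeToolExecutionFn = Arc<dyn Fn(&str, &str, &str) -> bool + Send + Sync>;

#[derive(Debug, PartialEq)]
struct Outcome {
    events: Vec<String>,
    result_text: String,
    is_error: bool,
}

/// Run (or skip) one tool call, recording which events would be emitted.
fn run_tool_call(
    before: Option<&BeforeToolExecutionFn>,
    tool_name: &str,
    tool_call_id: &str,
    args: &str,
) -> Outcome {
    if let Some(hook) = before {
        if !hook(tool_name, tool_call_id, args) {
            // Vetoed: no Start/End events, only a synthetic error result
            // so the model learns the call did not run.
            return Outcome {
                events: Vec::new(),
                result_text: format!("Tool call `{tool_name}` was skipped"),
                is_error: true,
            };
        }
    }
    Outcome {
        events: vec![
            "ToolExecutionStart".to_string(),
            "ToolExecutionEnd".to_string(),
        ],
        result_text: format!("ran {tool_name}"),
        is_error: false,
    }
}
```

A hook such as `Arc::new(|name: &str, _id: &str, _args: &str| name != "bash")` would veto every bash call while letting other tools run.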
Code Reference
| Concept | File |
|---|---|
AgentTool trait, ToolContext, ToolResult, ToolError, ToolExecutionStrategy | src/types/tool.rs |
ToolUpdateFn, ProgressFn type aliases | src/types/tool.rs |
Tool dispatch, execute_tool_calls, execute_single_tool, skip_tool_call | src/agent_loop/tools.rs |
SubAgentTool | src/agents/sub_agent.rs |
Built-in tools (BashTool, ReadFileTool, etc.) | src/tools/ |
OpenApiToolAdapter | src/openapi/adapter.rs |
McpToolAdapter | src/mcp/tool_adapter.rs |
Tool callback type aliases (BeforeToolExecutionFn, etc.) | src/agent_loop/config.rs |
ToolDefinition (schema sent to LLM, not executable) | src/provider/traits.rs |
Conceptual Notes
- Tool Permission System [CONCEPTUAL] -- The plan includes an Agent-level Permissions tab with include/exclude rules for allowed/denied actions. This would gate tool execution at a higher level than the before_tool_execution hook.
- Tool Result Streaming to LLM [CONCEPTUAL] -- Currently on_update partial results are UI-only. A future design could allow streaming tool results to the LLM mid-execution for real-time reasoning.
- ToolDefinition vs AgentTool Split -- ToolDefinition (in provider/traits.rs) is the schema half sent to the LLM; AgentTool (in types/tool.rs) is the executable half. The agent loop bridges them: it converts AgentTool to ToolDefinition before streaming, then matches ToolCall content back to AgentTool by name for execution.
Provider System
The provider system abstracts all LLM backends behind a single StreamProvider trait. The caller constructs a ModelConfig (the model's "identity card"), and the ProviderRegistry dispatches to the correct concrete provider at runtime. This design allows seamless switching between Anthropic, OpenAI, Google, Bedrock, Azure, and 15+ OpenAI-compatible providers without changing application code.
Concept Overview
Provider [EXISTS]
├── ModelConfig [EXISTS] — id, name, api, provider, base_url, api_key, cost
├── ApiProtocol [EXISTS] — 7 variants (Anthropic, OpenAI, Google, Bedrock, Azure, etc.)
├── CostConfig [EXISTS] — per-million rates
├── StreamProvider trait [EXISTS] — stream() method
├── ProviderRegistry [EXISTS] — dispatch by ApiProtocol
├── OpenAiCompat [EXISTS] — quirk flags for 15+ providers
└── ContextTranslationStrategy [EXISTS] — cross-provider content translation (G8, src/provider/context_translation.rs)
ModelConfig [EXISTS]
The single source of truth for a model's identity. Bundles everything a provider needs to make API calls.
| Field | Type | Status | Description |
|---|---|---|---|
id | String | [EXISTS] | Model identifier sent to the API (e.g. "gpt-4o", "claude-sonnet-4-20250514") |
name | String | [EXISTS] | Human-friendly display name (logging/UI; not sent to API) |
api | ApiProtocol | [EXISTS] | Which wire protocol to use (dispatch key for ProviderRegistry) |
provider | String | [EXISTS] | Provider name for logging (e.g. "openai", "anthropic") |
base_url | String | [EXISTS] | Base URL for API requests (supports private deployments, proxies) |
api_key | String | [EXISTS] | Authentication credential; defaults to empty string so configs can omit it |
reasoning | bool | [EXISTS] | Whether this model supports extended thinking/reasoning |
context_window | u32 | [EXISTS] | Max input tokens (used for compaction decisions) |
max_tokens | u32 | [EXISTS] | Default max output tokens |
cost | CostConfig | [EXISTS] | Token pricing for cost tracking (defaults to zero) |
headers | HashMap<String, String> | [EXISTS] | Additional HTTP headers (e.g. API-version headers) |
compat | Option<OpenAiCompat> | [EXISTS] | OpenAI quirk flags; None for non-OpenAI providers |
ApiProtocol [EXISTS]
The dispatch key that maps a model to its concrete StreamProvider implementation. Seven variants covering all supported backends.
| Variant | Provider File | Status | Covers |
|---|---|---|---|
AnthropicMessages | anthropic.rs | [EXISTS] | Claude models |
OpenAiCompletions | openai_compat.rs | [EXISTS] | OpenAI, Groq, Together, DeepSeek, Fireworks, Mistral, xAI, OpenRouter, etc. (15+) |
OpenAiResponses | openai_responses.rs | [EXISTS] | OpenAI Responses API |
AzureOpenAiResponses | azure_openai.rs | [EXISTS] | Azure OpenAI |
GoogleGenerativeAi | google.rs | [EXISTS] | Gemini (Google AI Studio) |
GoogleVertex | google_vertex.rs | [EXISTS] | Vertex AI |
BedrockConverseStream | bedrock.rs | [EXISTS] | Amazon Bedrock (ConverseStream) |
CostConfig [EXISTS]
Token pricing per million tokens. Embedded in ModelConfig with #[serde(default)] fields, so callers who don't need cost tracking can omit it.
| Field | Type | Status | Description |
|---|---|---|---|
input_per_million | f64 | [EXISTS] | Cost per million input tokens |
output_per_million | f64 | [EXISTS] | Cost per million output tokens |
cache_read_per_million | f64 | [EXISTS] | Cost per million cache-read tokens (default: 0.0) |
cache_write_per_million | f64 | [EXISTS] | Cost per million cache-write tokens (default: 0.0) |
StreamProvider Trait [EXISTS]
The core abstraction every LLM backend implements. The rest of the codebase interacts only with &dyn StreamProvider -- it never knows which concrete backend is used at runtime.
| Method | Signature | Status | Description |
|---|---|---|---|
provider_id() | -> &str | [EXISTS] | Short stable identifier (e.g. "anthropic"); used in loop_id construction |
stream() | (config, tx, cancel) -> Result<Message, ProviderError> | [EXISTS] | Stream a completion; sends StreamEvents through tx in real time; returns final assembled Message |
Dual-output contract: The tx channel carries partial deltas for real-time UI updates. The return value carries the complete message after the stream ends. The loop cannot read its own output from the channel -- the return value is the protocol, the channel is the live feed.
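The dual-output contract can be illustrated with a small sketch. It uses std::sync::mpsc as a stand-in for the async channel, drops the cancel parameter, and invents simplified StreamEvent and Message types; treating send failures as non-fatal is also an assumption, not documented behavior.

```rust
use std::sync::mpsc::Sender;

/// Simplified stand-in for the real StreamEvent type.
#[derive(Debug, Clone, PartialEq)]
enum StreamEvent {
    TextDelta(String),
    Done,
}

/// Simplified stand-in for the real Message type.
#[derive(Debug, PartialEq)]
struct Message {
    text: String,
}

/// Hypothetical provider call: pushes each chunk into `tx` as it arrives
/// and returns the fully assembled message once the stream ends.
fn stream(chunks: &[&str], tx: &Sender<StreamEvent>) -> Result<Message, String> {
    let mut text = String::new();
    for chunk in chunks {
        // Live feed for the UI. Assumption: a dropped receiver does not
        // fail the call, because the return value is the source of truth.
        let _ = tx.send(StreamEvent::TextDelta(chunk.to_string()));
        text.push_str(chunk);
    }
    let _ = tx.send(StreamEvent::Done);
    // The protocol: the caller reads the final message from the return value.
    Ok(Message { text })
}
```

The caller keeps the receiver side for rendering and uses only the returned Message to extend the conversation history.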
ProviderRegistry [EXISTS]
Maps ApiProtocol to StreamProvider implementations. Factory + router.
| Method | Status | Description |
|---|---|---|
new() | [EXISTS] | Empty registry (no providers) |
default() | [EXISTS] | All 7 built-in providers registered |
register(protocol, provider) | [EXISTS] | Register a provider for a protocol (overwrites if exists) |
get(protocol) | [EXISTS] | Look up provider by protocol |
has(protocol) | [EXISTS] | Check if a provider is registered |
protocols() | [EXISTS] | List all registered protocols |
stream(model, config, tx, cancel) | [EXISTS] | Dispatch: looks up provider by model.api, delegates to provider.stream() |
Design: model (routing key) is separate from config (request payload). The registry routes on model.api, then passes config through unchanged.
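The routing idea can be sketched with a HashMap keyed by protocol. Everything below is a simplified illustration: the trait is synchronous, config is a plain string, only two protocol variants are shown, and the Echo provider is hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Two of the seven variants, enough to show dispatch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ApiProtocol {
    AnthropicMessages,
    OpenAiCompletions,
}

/// Synchronous, string-based simplification of the real trait.
trait StreamProvider {
    fn provider_id(&self) -> &str;
    fn stream(&self, config: &str) -> Result<String, String>;
}

/// Hypothetical provider that echoes its id and the request payload.
struct Echo(&'static str);

impl StreamProvider for Echo {
    fn provider_id(&self) -> &str {
        self.0
    }
    fn stream(&self, config: &str) -> Result<String, String> {
        Ok(format!("{}:{}", self.provider_id(), config))
    }
}

/// Only the routing key is modeled here.
struct ModelConfig {
    api: ApiProtocol,
}

struct ProviderRegistry {
    providers: HashMap<ApiProtocol, Arc<dyn StreamProvider>>,
}

impl ProviderRegistry {
    /// Empty registry, mirroring new() in the table above.
    fn new() -> Self {
        Self { providers: HashMap::new() }
    }

    /// Later registrations overwrite earlier ones for the same protocol.
    fn register(&mut self, protocol: ApiProtocol, provider: Arc<dyn StreamProvider>) {
        self.providers.insert(protocol, provider);
    }

    fn has(&self, protocol: ApiProtocol) -> bool {
        self.providers.contains_key(&protocol)
    }

    /// Routes on model.api and passes config through unchanged.
    fn stream(&self, model: &ModelConfig, config: &str) -> Result<String, String> {
        let provider = self
            .providers
            .get(&model.api)
            .ok_or_else(|| format!("no provider registered for {:?}", model.api))?;
        provider.stream(config)
    }
}
```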
OpenAiCompat Quirk Flags [EXISTS]
The "quirk matrix" for 15+ OpenAI-compatible providers. One openai_compat.rs provider reads these flags at runtime and branches accordingly, instead of maintaining separate provider files per quirk combination.
| Flag | Type | Status | Description |
|---|---|---|---|
supports_store | bool | [EXISTS] | Supports the store parameter for conversation persistence |
supports_developer_role | bool | [EXISTS] | Supports developer role (system-level instructions) |
supports_reasoning_effort | bool | [EXISTS] | Supports reasoning_effort parameter |
supports_usage_in_streaming | bool | [EXISTS] | Includes usage data in streaming responses (default: true) |
max_tokens_field | MaxTokensField | [EXISTS] | Which field name to use: MaxTokens or MaxCompletionTokens |
requires_tool_result_name | bool | [EXISTS] | Tool results must include a name field |
requires_assistant_after_tool_result | bool | [EXISTS] | Must insert assistant message after tool results |
thinking_format | ThinkingFormat | [EXISTS] | How thinking/reasoning content is formatted: OpenAi, Xai, Qwen, OpenRouter |
Factory methods for provider-specific flag combinations:
| Method | Status | Notes |
|---|---|---|
OpenAiCompat::openai() | [EXISTS] | store, developer role, reasoning effort, MaxCompletionTokens |
OpenAiCompat::xai() | [EXISTS] | Grok thinking format |
OpenAiCompat::groq() | [EXISTS] | Default with streaming usage |
OpenAiCompat::cerebras() | [EXISTS] | Pure default (no deviations) |
OpenAiCompat::openrouter() | [EXISTS] | Developer role, OpenRouter thinking format |
OpenAiCompat::mistral() | [EXISTS] | MaxTokens field |
OpenAiCompat::deepseek() | [EXISTS] | MaxCompletionTokens |
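As a rough illustration of how one provider can branch on quirk flags at runtime, the sketch below models two of the flags. The request_shape helper is hypothetical, and falling back to the "system" role when the developer role is unsupported is an assumption about the real provider's behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum MaxTokensField {
    MaxTokens,
    MaxCompletionTokens,
}

/// Only two of the flags are modeled here.
struct OpenAiCompat {
    max_tokens_field: MaxTokensField,
    supports_developer_role: bool,
}

/// Hypothetical helper: returns the role used for system-level
/// instructions and the JSON key used for the output-token limit.
fn request_shape(compat: &OpenAiCompat) -> (&'static str, &'static str) {
    // Assumption: providers without the developer role get "system".
    let role = if compat.supports_developer_role {
        "developer"
    } else {
        "system"
    };
    let limit_key = match compat.max_tokens_field {
        MaxTokensField::MaxTokens => "max_tokens",
        MaxTokensField::MaxCompletionTokens => "max_completion_tokens",
    };
    (role, limit_key)
}
```

Keeping the branches in one function is what lets a single openai_compat.rs serve every flag combination.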
Factory Methods on ModelConfig [EXISTS]
Convenience constructors for common providers.
| Method | Status | Protocol | Default context_window |
|---|---|---|---|
ModelConfig::anthropic(id, name, api_key) | [EXISTS] | AnthropicMessages | 200,000 |
ModelConfig::openai(id, name, api_key) | [EXISTS] | OpenAiCompletions | 128,000 |
ModelConfig::google(id, name, api_key) | [EXISTS] | GoogleGenerativeAi | 1,000,000 |
ModelConfig::local(base_url, model_id, api_key) | [EXISTS] | OpenAiCompletions | 128,000 |
ModelConfig::openrouter(model_id, api_key) | [EXISTS] | OpenAiCompletions | 200,000 |
ProviderError [EXISTS]
Error taxonomy for provider failures. The agent loop uses this for retry/recovery decisions.
| Variant | Status | Retryable | Description |
|---|---|---|---|
Api(String) | [EXISTS] | No | Non-transient API error (bad request, server error) |
Network(String) | [EXISTS] | Yes | Transport failure (connection refused, timeout, TLS) |
Auth(String) | [EXISTS] | No | 401/403 -- bad or missing API key |
RateLimited { retry_after_ms } | [EXISTS] | Yes | 429 -- too many requests |
ContextOverflow { message } | [EXISTS] | No (compact) | Input exceeds context window; caller should compact and retry |
Cancelled | [EXISTS] | No | CancellationToken triggered |
Other(String) | [EXISTS] | No | Catch-all |
Context overflow detection: Centralized in OVERFLOW_PHRASES covering 15+ provider-specific error strings. Both HTTP errors and SSE-embedded errors are classified.
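The Retryable column above maps naturally onto a match expression. The sketch below is an assumption-laden illustration: the Recovery enum and classify function are hypothetical, and the type of retry_after_ms (shown here as Option<u64>) is not confirmed by this document.

```rust
use std::time::Duration;

#[derive(Debug)]
enum ProviderError {
    Api(String),
    Network(String),
    Auth(String),
    // Assumption: the field is an optional millisecond count.
    RateLimited { retry_after_ms: Option<u64> },
    ContextOverflow { message: String },
    Cancelled,
    Other(String),
}

/// Hypothetical recovery decision derived from the table above.
#[derive(Debug, PartialEq)]
enum Recovery {
    Retry(Duration),
    CompactAndRetry,
    GiveUp,
}

fn classify(err: &ProviderError, default_backoff: Duration) -> Recovery {
    match err {
        // Transport failures are transient: retry after the default backoff.
        ProviderError::Network(_) => Recovery::Retry(default_backoff),
        // Honor the server's hint when one is present.
        ProviderError::RateLimited { retry_after_ms } => Recovery::Retry(
            retry_after_ms
                .map(Duration::from_millis)
                .unwrap_or(default_backoff),
        ),
        // Not retryable as-is: the caller compacts first, then retries.
        ProviderError::ContextOverflow { .. } => Recovery::CompactAndRetry,
        ProviderError::Api(_)
        | ProviderError::Auth(_)
        | ProviderError::Cancelled
        | ProviderError::Other(_) => Recovery::GiveUp,
    }
}
```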
Code Reference
| Concept | File |
|---|---|
ModelConfig, ApiProtocol, CostConfig, OpenAiCompat, MaxTokensField, ThinkingFormat | src/provider/model.rs |
StreamProvider trait, StreamConfig, StreamEvent, ToolDefinition, ProviderError | src/provider/traits.rs |
ProviderRegistry | src/provider/registry.rs |
AnthropicProvider | src/provider/anthropic.rs |
OpenAiCompatProvider | src/provider/openai_compat.rs |
OpenAiResponsesProvider | src/provider/openai_responses.rs |
AzureOpenAiProvider | src/provider/azure_openai.rs |
GoogleProvider | src/provider/google.rs |
GoogleVertexProvider | src/provider/google_vertex.rs |
BedrockProvider | src/provider/bedrock.rs |
RetryConfig | src/provider/retry.rs |
MockProvider (testing) | src/provider/mock.rs |
Conceptual Notes
- ContextTranslationStrategy [EXISTS] -- Trait in src/provider/context_translation.rs (G8). DefaultContextTranslation handles cross-provider content translation: Anthropic keeps Thinking blocks, OpenAI converts to Text with [Reasoning] prefix, Google/Bedrock drops Thinking. Set on AgentLoopConfig.context_translation.
- Model fallback chain -- Model resolution follows: Loop (AgentLoopConfig.model_config) -> Session model override [EXISTS] (Session.model_config: Option<ModelConfig>) -> Agent default (BasicAgent.model_config).
- provider_override -- AgentLoopConfig.provider_override: Option<Arc<dyn StreamProvider>> bypasses ProviderRegistry dispatch entirely. Used for testing with MockProvider or injecting custom provider implementations.
Event Lifecycle
AgentEvent is the runtime's event vocabulary -- it captures every significant happening in the agent loop that a UI, logger, or analysis consumer might react to. Events are emitted through an mpsc::UnboundedSender<AgentEvent> channel during execution and consumed by SessionRecorder (or any external subscriber) on the receiving end.
Concept Overview
Event [EXISTS]
├── AgentEvent [EXISTS] — 16 variants
│ ├── Session: AgentStart/End [EXISTS]
│ ├── Loop: ParallelLoopStart/End, CompactionStarted/Ended [EXISTS]
│ ├── Turn: TurnStart/End [EXISTS]
│ ├── Message: MessageStart/Update/End [EXISTS]
│ ├── Tool: ToolExecutionStart/Update/End, ProgressMessage [EXISTS]
│ └── Input: InputRejected [EXISTS]
├── StreamDelta [EXISTS] — Text/Thinking/ToolCallDelta
├── ContinuationKind [EXISTS] — Initial/Default/Rerun/Branch/Compaction
└── TurnTrigger [EXISTS] — User/SubAgent/Continuation/Branch
AgentEvent [EXISTS]
16 variants grouped by scope (the tables below list 2 session, 4 loop, 2 turn, 3 message, 4 tool, and 1 input variant). Each variant carries a loop_id for correlation (except ParallelLoopStart/End which use session_id).
Session-scoped Events
| Variant | Status | Fields | Description |
|---|---|---|---|
AgentStart | [EXISTS] | agent_id, session_id, loop_id, parent_loop_id, continuation_kind, config_snapshot, timestamp, metadata | Fires once when agent_loop() is entered, before any LLM call. continuation_kind: ContinuationKind (non-optional). config_snapshot: Option<LoopConfigSnapshot> carries model/provider identity. |
AgentEnd | [EXISTS] | loop_id, messages, usage, timestamp, rejection | Fires once when agent_loop() exits; rejection is Some if an InputFilter blocked the input |
Loop-scoped Events
| Variant | Status | Fields | Description |
|---|---|---|---|
ParallelLoopStart | [EXISTS] | session_id, loop_ids, timestamp | Emitted before parallel branch dispatch; lists all branch loop_ids |
ParallelLoopEnd | [EXISTS] | session_id, selected_loop_id, selected_config_index, evaluation_usage, timestamp | Emitted after evaluation selects a winning branch |
CompactionStarted | [EXISTS] | loop_id, estimated_tokens, message_count, timestamp | Emitted before compaction strategy runs |
CompactionEnded | [EXISTS] | loop_id, messages_before, messages_after, estimated_tokens_before, estimated_tokens_after, loops_compacted, timestamp | Emitted after compaction completes |
Turn-scoped Events
| Variant | Status | Fields | Description |
|---|---|---|---|
TurnStart | [EXISTS] | loop_id, turn_index, timestamp, triggered_by | Fires at the start of each LLM turn (one LLM call = one turn) |
TurnEnd | [EXISTS] | loop_id, message, usage, timestamp, tool_results | Fires at the end of each LLM turn |
Message-scoped Events
| Variant | Status | Fields | Description |
|---|---|---|---|
MessageStart | [EXISTS] | loop_id, message | New message created (assistant: when SSE stream opens; user/tool: immediately) |
MessageUpdate | [EXISTS] | loop_id, message, delta | Streaming token/chunk; delta is the increment, message is the accumulator |
MessageEnd | [EXISTS] | loop_id, message | Message fully complete; safe to persist |
Tool-scoped Events
| Variant | Status | Fields | Description |
|---|---|---|---|
ToolExecutionStart | [EXISTS] | loop_id, tool_call_id, tool_name, args | Tool call begins (before execute()) |
ToolExecutionUpdate | [EXISTS] | loop_id, tool_call_id, tool_name, partial_result | Mid-execution partial result (via ctx.on_update) |
ToolExecutionEnd | [EXISTS] | loop_id, tool_call_id, tool_name, result, is_error, child_loop_id | Tool finished; child_loop_id is Some for sub-agent tools |
ProgressMessage | [EXISTS] | loop_id, tool_call_id, tool_name, text | User-facing status text (via ctx.on_progress) |
Input-scoped Events
| Variant | Status | Fields | Description |
|---|---|---|---|
InputRejected | [EXISTS] | loop_id, reason | InputFilter rejected the user's message; agent loop returns immediately |
Event Scoping (Bracket Relationships)
Events form a nested bracket structure:
AgentStart (+ config_snapshot) -- session-scoped
TurnStart -- turn-scoped (0-based index)
MessageStart -- message-scoped (assistant message)
MessageUpdate (N times) -- streaming deltas
MessageEnd
ToolExecutionStart -- tool-scoped (per tool call)
ToolExecutionUpdate (0..N) -- partial results
ProgressMessage (0..N) -- status text
ToolExecutionEnd
MessageStart -- message-scoped (tool result message)
MessageEnd
TurnEnd
TurnStart -- next turn (tool round-trip)
...
TurnEnd
AgentEnd -- session-scoped
For parallel evaluation:
ParallelLoopStart -- loop-scoped (lists all branch IDs)
AgentStart (branch 1) -- nested full lifecycle per branch
AgentEnd (branch 1)
AgentStart (branch 2)
AgentEnd (branch 2)
ParallelLoopEnd -- loop-scoped (announces winner)
StreamDelta [EXISTS]
Incremental token-level updates from the LLM stream. Carried inside MessageUpdate events.
| Variant | Status | Description |
|---|---|---|
Text { delta } | [EXISTS] | A text token fragment |
Thinking { delta } | [EXISTS] | A thinking/reasoning chunk (extended thinking mode only) |
ToolCallDelta { delta } | [EXISTS] | A fragment of tool call argument JSON (accumulate until MessageEnd) |
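A consumer typically folds these deltas into running buffers. The sketch below is illustrative only: the Accumulated struct and accumulate function are hypothetical, and a single tool-argument buffer assumes one tool call per message.

```rust
#[derive(Debug, Clone)]
enum StreamDelta {
    Text { delta: String },
    Thinking { delta: String },
    ToolCallDelta { delta: String },
}

/// Hypothetical accumulator for one assistant message.
#[derive(Debug, Default, PartialEq)]
struct Accumulated {
    text: String,
    thinking: String,
    /// Raw JSON fragments; only parse once the message has ended.
    tool_args_json: String,
}

fn accumulate(deltas: &[StreamDelta]) -> Accumulated {
    let mut acc = Accumulated::default();
    for delta in deltas {
        match delta {
            StreamDelta::Text { delta } => acc.text.push_str(delta),
            StreamDelta::Thinking { delta } => acc.thinking.push_str(delta),
            // Fragments are not valid JSON on their own, so they are
            // concatenated verbatim and parsed later.
            StreamDelta::ToolCallDelta { delta } => acc.tool_args_json.push_str(delta),
        }
    }
    acc
}
```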
ContinuationKind [EXISTS]
How an agent_loop_continue call relates to the session's prior loops. Surfaced in AgentStart for observability.
| Variant | Status | Description |
|---|---|---|
Initial | [EXISTS] | First loop in a session via agent_loop(). The #[default] variant. |
Default | [EXISTS] | Unspecified continuation; preserves original semantics |
Rerun { tag } | [EXISTS] | Retry from equivalent state; tag is RFC 3339 UTC timestamp |
Branch { tag } | [EXISTS] | Exploration of a different path from a branching point |
Compaction | [EXISTS] | Standalone context-compaction pass; no LLM call |
TurnTrigger [EXISTS]
Identifies what caused a new turn to begin. Carried in TurnStart.
| Variant | Status | Description |
|---|---|---|
User | [EXISTS] | First turn triggered by a user message |
SubAgent | [EXISTS] | Invoked as a sub-agent by a parent agent |
Continuation | [EXISTS] | Continuation turn: tool round-trip, steering, or Default/Rerun continuation |
Branch | [EXISTS] | First turn of a Branch continuation; subsequent turns use Continuation |
Event Flow
Producer: agent_loop (src/agent_loop/)
|
| mpsc::UnboundedSender<AgentEvent>
v
Consumer: SessionRecorder (src/session/recorder.rs)
|
| on_event() dispatches by variant
v
Storage: Session -> LoopRecord -> LoopEvent[]
The SessionRecorder consumes events and builds a structured tree:
- AgentStart opens a LoopRecord (status: Running)
- AgentEnd closes it (status: Completed or Rejected)
- TurnEnd extracts config snapshots from assistant messages
- ToolExecutionEnd records ChildLoopRef for sub-agent traceability
- ParallelLoopEnd retroactively sets ParallelGroupRecord on all branch records
- MessageUpdate events are optionally recorded (off by default; 100-1000x more numerous)
- All other events append to LoopRecord.events as LoopEvent { sequence, event }
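The dispatch described above can be sketched with a reduced event set. The types below are simplified stand-ins, not the real ones: only four variants are modeled, rejection is a bool instead of an Option, and treating AgentStart/AgentEnd as status changes that are not appended to the event list is one reading of the rules above.

```rust
#[derive(Debug, Clone, PartialEq)]
enum AgentEvent {
    AgentStart { loop_id: String },
    AgentEnd { loop_id: String, rejected: bool },
    TurnStart { loop_id: String },
    MessageUpdate { loop_id: String },
}

#[derive(Debug, PartialEq)]
enum LoopStatus {
    Running,
    Completed,
    Rejected,
}

#[derive(Debug)]
struct LoopEvent {
    sequence: usize,
    event: AgentEvent,
}

#[derive(Debug)]
struct LoopRecord {
    loop_id: String,
    status: LoopStatus,
    events: Vec<LoopEvent>,
}

/// Hypothetical, much-reduced recorder.
struct Recorder {
    include_streaming_events: bool,
    loops: Vec<LoopRecord>,
}

fn loop_id(event: &AgentEvent) -> &str {
    match event {
        AgentEvent::AgentStart { loop_id }
        | AgentEvent::AgentEnd { loop_id, .. }
        | AgentEvent::TurnStart { loop_id }
        | AgentEvent::MessageUpdate { loop_id } => loop_id,
    }
}

impl Recorder {
    fn on_event(&mut self, event: AgentEvent) {
        let id = loop_id(&event).to_string();
        if let AgentEvent::AgentStart { .. } = event {
            // AgentStart opens a new record in the Running state.
            self.loops.push(LoopRecord {
                loop_id: id,
                status: LoopStatus::Running,
                events: Vec::new(),
            });
            return;
        }
        // Events for unknown loops are ignored in this sketch.
        let Some(record) = self.loops.iter_mut().find(|r| r.loop_id == id) else {
            return;
        };
        match event {
            AgentEvent::AgentEnd { rejected, .. } => {
                record.status = if rejected {
                    LoopStatus::Rejected
                } else {
                    LoopStatus::Completed
                };
            }
            // High-volume deltas are dropped unless explicitly enabled.
            AgentEvent::MessageUpdate { .. } if !self.include_streaming_events => {}
            other => {
                let sequence = record.events.len();
                record.events.push(LoopEvent { sequence, event: other });
            }
        }
    }
}
```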
Code Reference
| Concept | File |
|---|---|
AgentEvent, StreamDelta, ContinuationKind, TurnTrigger | src/types/event.rs |
SessionRecorder, SessionRecorderConfig | src/session/recorder.rs |
| Event emission (AgentStart, TurnStart, MessageUpdate, etc.) | src/agent_loop/run.rs, src/agent_loop/streaming.rs |
| Tool lifecycle events (ToolExecutionStart/Update/End) | src/agent_loop/tools.rs |
LoopRecord, LoopEvent, Session | src/session/model.rs |
Conceptual Notes
- before_task / after_task callbacks [EXISTS] -- Session-level callbacks on SessionRecorderConfig (G2). BeforeTaskFn fires on first AgentStart with new session_id; AfterTaskFn fires on flush(). These are semantically session-scoped, unlike before_loop/after_loop which fire per-loop.
- Session Scope [EXISTS] -- SessionScope enum (Ephemeral/Persistent) on the Session struct (G7). Set via config [session] scope = "persistent".
- Error Events -- The current design uses StopReason::Error and the on_error callback for LLM errors. A dedicated AgentEvent::Error variant for more granular error reporting (tool failures, network issues, etc.) is noted as a potential improvement in the source comments.
- Event Replay -- LoopRecord.events stores the full event stream (as Vec<LoopEvent>), enabling replay or analysis of past runs. SessionRecorderConfig.include_streaming_events controls whether the high-volume MessageUpdate deltas are included.
Compaction System
The compaction system manages context window pressure by summarizing, truncating, or dropping older conversation turns when the token count approaches the model's limit. Two strategies coexist: a legacy in-memory approach that rewrites the message array, and a modern block-based approach that creates non-destructive overlays on LoopRecords.
Concept Overview
Compaction [EXISTS]
├── CompactionBlock [EXISTS] — non-destructive overlay
│ └── keep_first, keep_compacted, keep_recent [EXISTS]
├── CompactionScope [EXISTS] — FixedCount(n) / TokenBudget
├── CompactionStrategy [EXISTS] — legacy in-memory
├── BlockCompactionStrategy [EXISTS] — modern overlay
├── TurnMap [EXISTS] — turn indices → message ranges
├── Callbacks: before/after compaction [EXISTS]
└── Config: consolidated in CompactionConfig [EXISTS]
CompactionBlock [EXISTS]
Non-destructive compaction overlay stored on LoopRecord alongside the original messages. When present, the context loader uses this block instead of raw messages. Three sections control what gets loaded into context.
| Field | Type | Status | Description |
|---|---|---|---|
keep_first | Option<TurnRange> | [EXISTS] | Turns kept verbatim from the start; only populated for the MOST RECENT loop |
keep_compacted | Option<CompactedSection> | [EXISTS] | Fully summarised section; populated for ALL loops |
keep_recent | Option<CompactedSection> | [EXISTS] | Recent turns with truncated tool outputs; only populated for the MOST RECENT loop |
created_at | DateTime<Utc> | [EXISTS] | When this block was created |
Loading logic:
- Most recent loop: loads keep_first (original messages) + keep_compacted (summaries) + keep_recent (truncated)
- Older loops: loads only keep_compacted (full-loop summary)
- No compaction block: loads raw messages
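The three loading rules above reduce to a single match. In this sketch the block's sections are pre-resolved message vectors rather than Option<TurnRange> / Option<CompactedSection>, and Msg, LoopRecord.compaction, and load_context are hypothetical names chosen for illustration.

```rust
/// Hypothetical message type.
#[derive(Debug, Clone, PartialEq)]
struct Msg(String);

/// Simplified block: each section already holds the messages to load.
struct CompactionBlock {
    keep_first: Vec<Msg>,
    keep_compacted: Vec<Msg>,
    keep_recent: Vec<Msg>,
}

struct LoopRecord {
    messages: Vec<Msg>,
    compaction: Option<CompactionBlock>,
}

/// Chooses what to load into context for one loop.
fn load_context(record: &LoopRecord, is_most_recent: bool) -> Vec<Msg> {
    match &record.compaction {
        // No overlay: fall back to the raw messages.
        None => record.messages.clone(),
        // Most recent loop: verbatim head, summarised middle, truncated tail.
        Some(block) if is_most_recent => block
            .keep_first
            .iter()
            .chain(&block.keep_compacted)
            .chain(&block.keep_recent)
            .cloned()
            .collect(),
        // Older loops: the full-loop summary only.
        Some(block) => block.keep_compacted.clone(),
    }
}
```

Because the raw messages stay on the record, removing the block restores the original context.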
Supporting Types
| Type | Status | Description |
|---|---|---|
TurnRange { start_turn, end_turn } | [EXISTS] | Inclusive range of turn indices within a loop |
CompactedSection { range, messages } | [EXISTS] | A turn range plus the replacement messages for that range |
CompactionScope [EXISTS]
Controls how many earlier loops are included in compaction and context loading.
| Variant | Status | Description |
|---|---|---|
FixedCount(usize) | [EXISTS] | Compact a fixed number of earlier loops on the active chain (default: 3) |
TokenBudget | [EXISTS] | Walk backward, accumulating per-loop token estimates, stop when max_context_tokens would be exceeded |
TokenBudget note: The scope can include loops whose raw messages EXCEED max_context_tokens. This is intentional -- the compacted summaries will fit even when originals don't, enabling richer context for LLM-based summarisation strategies.
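The backward walk for TokenBudget can be sketched as below. The function name and signature are hypothetical, and what the per-loop estimates measure is an assumption left to the caller; the sketch shows only the accumulate-and-stop logic.

```rust
/// Resolves a token budget to a count of earlier loops.
///
/// `per_loop_tokens` is ordered oldest to newest and excludes the
/// current loop. Returns how many of the newest entries fit.
fn resolve_token_budget(per_loop_tokens: &[usize], max_context_tokens: usize) -> usize {
    let mut total = 0usize;
    let mut count = 0usize;
    // Walk from the newest earlier loop toward the oldest.
    for &tokens in per_loop_tokens.iter().rev() {
        // Stop before the budget would be exceeded.
        if total.saturating_add(tokens) > max_context_tokens {
            break;
        }
        total += tokens;
        count += 1;
    }
    count
}
```

With estimates of [50, 30, 20] and a budget of 60, the walk accepts 20 and 30 and stops at 50, so two earlier loops are in scope.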
CompactionStrategy (Legacy) [EXISTS]
In-memory compaction that rewrites the message array. Used when AgentContext.session is None.
| Method | Status | Description |
|---|---|---|
compact(messages, config) -> Vec<AgentMessage> | [EXISTS] | Takes ownership of messages and returns a compacted version |
DefaultCompaction [EXISTS]
The built-in implementation. Delegates to compact_messages() which applies 3-level reduction:
- Truncate tool outputs
- Summarize turns
- Drop middle
BlockCompactionStrategy (Modern) [EXISTS]
Creates non-destructive CompactionBlock overlays. Used when AgentContext.session is Some.
| Method | Status | Description |
|---|---|---|
keep_first(record, turn_map, config) -> Option<TurnRange> | [EXISTS] | Determine turns kept verbatim from start (most recent loop only) |
keep_recent(record, turn_map, config) -> Option<CompactedSection> | [EXISTS] | Create recent section with truncated tool outputs (most recent loop only) |
keep_compacted(record, turn_map, config, is_most_recent) -> Option<CompactedSection> | [EXISTS] | Create summarised section; for most recent: middle only; for older: entire loop |
compact(record, config, is_most_recent) -> CompactionBlock | [EXISTS] | Default: assembles from the three methods above |
DefaultBlockCompaction [EXISTS]
Stateless implementation. All parameters come from CompactionConfig.
| Section | Behavior |
|---|---|
keep_first | Returns turn range 0..keep_first_turns |
keep_recent | Truncates tool outputs to tool_output_max_lines |
keep_compacted | Per-turn one-liner summaries bounded by max_summary_tokens; drops remaining turns when budget exhausted |
Limitation: DefaultBlockCompaction.keep_compacted is basic -- it drops turns that exceed the token budget rather than producing a holistic summary. More sophisticated strategies (e.g. LLM-based) should summarise ALL turns within the budget.
TurnMap [EXISTS]
Maps turn indices to message index ranges within a message array. Built from messages by grouping on TurnId.turn_index.
| Method | Status | Description |
|---|---|---|
from_messages(messages) -> TurnMap | [EXISTS] | Build from messages; messages without turn_id are their own group |
turn_count() -> u32 | [EXISTS] | Number of turn groups |
messages_for_range(range, all_msgs) -> &[AgentMessage] | [EXISTS] | Slice of messages belonging to a TurnRange |
turn_msg_range(turn_index) -> Option<(usize, usize)> | [EXISTS] | Message index range for a single turn |
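The grouping performed by from_messages can be sketched as a single pass over the messages. This is a simplified model: AgentMessage carries a bare Option<u32> instead of a TurnId, and the returned ranges are half-open (start, end_exclusive), which is an assumption about the real convention.

```rust
/// Simplified message: only the turn index matters for grouping.
struct AgentMessage {
    turn_index: Option<u32>,
}

/// Each group is (turn_index, start, end_exclusive) into the message array.
struct TurnMap {
    groups: Vec<(Option<u32>, usize, usize)>,
}

impl TurnMap {
    fn from_messages(messages: &[AgentMessage]) -> TurnMap {
        let mut groups: Vec<(Option<u32>, usize, usize)> = Vec::new();
        for (i, msg) in messages.iter().enumerate() {
            // A message extends the previous group only when both carry
            // the same turn index; messages without one stand alone.
            let extends_last = match (groups.last(), msg.turn_index) {
                (Some(&(Some(prev), _, _)), Some(cur)) => prev == cur,
                _ => false,
            };
            if extends_last {
                if let Some(last) = groups.last_mut() {
                    last.2 = i + 1;
                }
            } else {
                groups.push((msg.turn_index, i, i + 1));
            }
        }
        TurnMap { groups }
    }

    fn turn_count(&self) -> u32 {
        self.groups.len() as u32
    }

    fn turn_msg_range(&self, turn_index: u32) -> Option<(usize, usize)> {
        self.groups
            .iter()
            .find(|g| g.0 == Some(turn_index))
            .map(|g| (g.1, g.2))
    }
}
```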
Orchestration [EXISTS]
Cross-loop compaction coordination. The orchestrator resolves scope, then creates CompactionBlocks for the current loop and earlier loops within scope.
| Function | Status | Description |
|---|---|---|
compact_session_loops(session, current_loop_id, strategy, config, max_context_tokens) | [EXISTS] | Creates blocks: current loop gets all three sections; earlier loops get only keep_compacted |
build_context_from_session(session, current_loop_id, config, max_context_tokens) | [EXISTS] | Walks the loop chain, loads from CompactionBlocks where available, raw messages otherwise |
resolve_scope(session, chain, scope, max_context_tokens) | [EXISTS] | Resolves CompactionScope to a concrete count of earlier loops |
CompactionConfig [EXISTS]
Full compaction policy -- controls both WHEN and HOW to compact.
WHEN to compact
| Field | Type | Default | Status | Description |
|---|---|---|---|---|
compact_at_pct | f64 | 0.90 | [EXISTS] | Fraction of max_context_tokens at which headroom is measured |
compact_budget_threshold_pct | f64 | 0.05 | [EXISTS] | Minimum headroom fraction before compaction fires |
compaction_scope | CompactionScope | FixedCount(3) | [EXISTS] | How many earlier loops to include |
HOW to compact
| Field | Type | Default | Status | Description |
|---|---|---|---|---|
keep_first_turns | usize | 2 | [EXISTS] | Turns kept verbatim from start (most recent loop) |
keep_recent_turns | usize | 10 | [EXISTS] | Turns kept from end (extended to turn boundary) |
max_summary_tokens | usize | 2_000 | [EXISTS] | Token budget for summarised middle section |
tool_output_max_lines | usize | 50 | [EXISTS] | Max lines per tool output in keep_recent section |
Code Reference
| Concept | File |
|---|---|
CompactionBlock, TurnRange, CompactedSection, TurnMap | src/context/compaction.rs |
CompactionStrategy, DefaultCompaction, BlockCompactionStrategy, DefaultBlockCompaction | src/context/strategy.rs |
CompactionConfig, CompactionScope, ContextConfig | src/context/config.rs |
compact_session_loops(), build_context_from_session(), resolve_scope() | src/context/orchestration.rs |
compact_messages() (legacy in-memory) | src/context/compact_messages.rs |
ContextTracker (token tracking) | src/context/tracker.rs |
in_memory_strategy and block_strategy fields | src/context/config.rs (on CompactionConfig) |
Conceptual Notes
- before_compaction_start / after_compaction_end callbacks [EXISTS] -- Lifecycle hooks now fire around compaction. before_compaction_start fires before compaction begins (for pre-compaction indexing/memory extraction) and after_compaction_end fires after compaction completes (for post-compaction verification). Both are blank-by-default callbacks.
- Config consolidation [EXISTS] -- Compaction strategies (in_memory_strategy and block_strategy) are now fields on CompactionConfig, consolidating what was previously split across ContextConfig.compaction and AgentLoopConfig. The strategies no longer live on AgentLoopConfig; all compaction policy and strategy configuration is in one place.
- LLM-based Summarisation -- DefaultBlockCompaction.keep_compacted is a basic per-turn one-liner generator. The BlockCompactionStrategy trait is designed for more sophisticated strategies that call an LLM to produce holistic digests of all turns within the max_summary_tokens budget.
- Compaction Events [EXISTS] -- CompactionStarted and CompactionEnded events bracket compaction execution, providing estimated token counts before/after. These are consumed by SessionRecorder for observability.
- Legacy vs Modern -- Two systems coexist: CompactionStrategy (legacy, in-memory, rewrites messages) is used when AgentContext.session is None; BlockCompactionStrategy (modern, non-destructive overlays) is used when session data is available. The legacy path is preserved for backward compatibility and simple stateless use cases.
Configuration
Configuration controls agent behavior at three levels: context management (ContextConfig), execution safety (ExecutionLimits), and the unified loop config (AgentLoopConfig) that bundles model, hooks, compaction, limits, caching, retry, and filters into a single borrowed struct for each agent_loop call.
Concept Overview
Configuration [EXISTS]
├── ContextConfig [EXISTS] — max_context_tokens + compaction policy
├── CompactionConfig [EXISTS] — WHEN (thresholds, scope) + HOW (keep settings)
├── ExecutionLimits [EXISTS] — max_turns/tokens/duration/cost
├── CacheConfig [EXISTS] — Auto/Disabled/Manual
├── AgentLoopConfig [EXISTS] — 20+ fields (model, hooks, limits, filters)
├── Callback hooks [EXISTS] — 12 hook types across turn/loop/tool/error
├── ThinkingLevel [EXISTS] — Off/Minimal/Low/Medium/High
└── InputFilter [EXISTS] — Pass/Warn/Reject
ContextConfig [EXISTS]
Model constraints plus compaction policy. When set on AgentLoopConfig, enables automatic context management.
| Field | Type | Default | Status | Description |
|---|---|---|---|---|
max_context_tokens | usize | 100_000 | [EXISTS] | Maximum context tokens (the model's context window) |
system_prompt_tokens | usize | 4_000 | [EXISTS] | Tokens reserved for the system prompt |
compaction | CompactionConfig | (see below) | [EXISTS] | Compaction policy -- always present when context limits are set |
keep_recent | usize | 10 | [EXISTS] | Legacy field (use compaction.keep_recent_turns instead) |
keep_first | usize | 2 | [EXISTS] | Legacy field (use compaction.keep_first_turns instead) |
tool_output_max_lines | usize | 50 | [EXISTS] | Legacy field (use compaction.tool_output_max_lines instead) |
CompactionConfig [EXISTS]
Full compaction policy -- controls both WHEN to compact and HOW to compact. Embedded in ContextConfig.compaction.
WHEN: Trigger Thresholds
| Field | Type | Default | Status | Description |
|---|---|---|---|---|
compact_at_pct | f64 | 0.90 | [EXISTS] | Fraction of max_context_tokens below which headroom is measured |
compact_budget_threshold_pct | f64 | 0.05 | [EXISTS] | Minimum remaining headroom before compaction fires. With defaults (100k/4k): fires at ~81k tokens |
compaction_scope | CompactionScope | FixedCount(3) | [EXISTS] | How many earlier loops to include: FixedCount(n) or TokenBudget |
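The "~81k" figure can be reproduced with simple arithmetic. The formula below is one reading that matches the stated number, not a confirmed description of the implementation: it takes compact_at_pct of the window as the ceiling, subtracts the system prompt reservation, and then subtracts the minimum headroom computed from the full window. The function name is hypothetical.

```rust
/// Hypothetical helper: token count at which compaction would fire,
/// under the assumed formula
///   max * compact_at_pct - system_prompt_tokens - max * budget_threshold_pct
fn compaction_trigger_tokens(
    max_context_tokens: usize,
    system_prompt_tokens: usize,
    compact_at_pct: f64,
    compact_budget_threshold_pct: f64,
) -> usize {
    let ceiling = max_context_tokens as f64 * compact_at_pct;
    let min_headroom = max_context_tokens as f64 * compact_budget_threshold_pct;
    (ceiling - system_prompt_tokens as f64 - min_headroom)
        .max(0.0)
        .round() as usize
}
```

With the defaults this gives 100,000 x 0.90 = 90,000, minus 4,000 for the system prompt, minus 5,000 of required headroom, for a trigger point of 81,000 tokens.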
HOW: Compaction Parameters
| Field | Type | Default | Status | Description |
|---|---|---|---|---|
keep_first_turns | usize | 2 | [EXISTS] | Turns kept verbatim from start (most recent loop only) |
keep_recent_turns | usize | 10 | [EXISTS] | Turns kept from end; extended to turn boundary so ToolCall/ToolResult pairs are never split |
max_summary_tokens | usize | 2_000 | [EXISTS] | Token budget for the summarised middle section (total, not per-turn) |
tool_output_max_lines | usize | 50 | [EXISTS] | Max lines per tool output in keep_recent section |
ExecutionLimits [EXISTS]
Safety net against runaway agent loops. Checked before each turn by ExecutionTracker.
| Field | Type | Default | Status | Description |
|---|---|---|---|---|
max_turns | usize | 50 | [EXISTS] | Maximum LLM turns (catches infinite tool-call loops) |
max_total_tokens | usize | 1_000_000 | [EXISTS] | Maximum total tokens consumed across all turns |
max_duration | Duration | 600s | [EXISTS] | Maximum wall-clock duration |
max_cost | Option<f64> | None | [EXISTS] | Maximum cumulative dollar cost; requires model_config.cost rates to be set |
ExecutionTracker [EXISTS]
Runtime state tracker that checks limits before each turn.
| Field | Status | Description |
|---|---|---|
limits | [EXISTS] | The ExecutionLimits being enforced |
turns | [EXISTS] | Turn counter |
tokens_used | [EXISTS] | Accumulated token count |
cost_accumulated | [EXISTS] | Accumulated dollar cost |
started_at | [EXISTS] | Instant when tracking began |
When a limit is hit, check_limits() returns a reason string. The agent loop injects a "[Agent stopped: ...]" user message so the LLM (and user) can see what happened.
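A minimal sketch of how such a tracker could work, using the field names and defaults from the tables above. The `Option<String>` return type, the `record_turn` helper, the order of the checks, and the wording of each reason are assumptions made for illustration; only the "[Agent stopped: ...]" wrapper comes from the text.

```rust
use std::time::{Duration, Instant};

struct ExecutionLimits {
    max_turns: usize,
    max_total_tokens: usize,
    max_duration: Duration,
    max_cost: Option<f64>,
}

impl Default for ExecutionLimits {
    // Defaults taken from the ExecutionLimits table.
    fn default() -> Self {
        Self {
            max_turns: 50,
            max_total_tokens: 1_000_000,
            max_duration: Duration::from_secs(600),
            max_cost: None,
        }
    }
}

struct ExecutionTracker {
    limits: ExecutionLimits,
    turns: usize,
    tokens_used: usize,
    cost_accumulated: f64,
    started_at: Instant,
}

impl ExecutionTracker {
    fn new(limits: ExecutionLimits) -> Self {
        Self { limits, turns: 0, tokens_used: 0, cost_accumulated: 0.0, started_at: Instant::now() }
    }

    /// Hypothetical helper: account for one finished turn.
    fn record_turn(&mut self, tokens: usize, cost: f64) {
        self.turns += 1;
        self.tokens_used += tokens;
        self.cost_accumulated += cost;
    }

    /// Returns a reason when any limit is reached, `None` otherwise.
    fn check_limits(&self) -> Option<String> {
        let l = &self.limits;
        if self.turns >= l.max_turns {
            return Some(format!("max turns ({}) reached", l.max_turns));
        }
        if self.tokens_used >= l.max_total_tokens {
            return Some(format!("max total tokens ({}) reached", l.max_total_tokens));
        }
        if self.started_at.elapsed() >= l.max_duration {
            return Some(format!("max duration ({}s) reached", l.max_duration.as_secs()));
        }
        match l.max_cost {
            Some(max) if self.cost_accumulated >= max => {
                Some(format!("max cost (${max:.2}) reached"))
            }
            _ => None,
        }
    }
}

/// Wraps the reason in the user-visible marker described above.
fn stop_message(reason: &str) -> String {
    format!("[Agent stopped: {reason}]")
}
```

Calling `check_limits()` before each turn keeps the check out of the streaming path; the loop only pays for it once per LLM call.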
CacheConfig [EXISTS]
Controls prompt caching behavior for providers that support it.
| Field | Type | Default | Status | Description |
|---|---|---|---|---|
enabled | bool | true | [EXISTS] | Master switch for caching hints |
strategy | CacheStrategy | Auto | [EXISTS] | How cache breakpoints are placed |
CacheStrategy [EXISTS]
| Variant | Status | Description |
|---|---|---|
Auto | [EXISTS] | Automatic breakpoint placement (system prompt + tool defs + recent history) |
Disabled | [EXISTS] | No caching |
Manual { cache_system, cache_tools, cache_messages } | [EXISTS] | Fine-grained control over what gets cached |
AgentLoopConfig [EXISTS]
All static settings for a single agent_loop / agent_loop_continue call. Borrowed (&AgentLoopConfig) throughout the loop -- never mutated. 20+ fields organized by concern.
Model & Provider
| Field | Type | Status | Description |
|---|---|---|---|
model_config | ModelConfig | [EXISTS] | Complete provider identity (model id, api_key, base_url, protocol, compat, cost) |
provider_override | Option<Arc<dyn StreamProvider>> | [EXISTS] | Bypasses ProviderRegistry dispatch; for testing or custom providers |
thinking_level | ThinkingLevel | [EXISTS] | Depth of model reasoning: Off, Minimal, Low, Medium, High |
max_tokens | Option<u32> | [EXISTS] | Override model_config.max_tokens for this call |
temperature | Option<f32> | [EXISTS] | Temperature override |
Context Transformation
| Field | Type | Status | Description |
|---|---|---|---|
convert_to_llm | Option<ConvertToLlmFn> | [EXISTS] | Converts AgentMessage[] to Message[] before each LLM call |
transform_context | Option<TransformContextFn> | [EXISTS] | Transforms full context before convert_to_llm (pruning, reordering, injection) |
Steering & Follow-up
| Field | Type | Status | Description |
|---|---|---|---|
get_steering_messages | Option<GetMessagesFn> | [EXISTS] | Polled between tools for user interruptions |
get_follow_up_messages | Option<GetMessagesFn> | [EXISTS] | Polled after agent finishes for queued work |
Compaction
| Field | Type | Status | Description |
|---|---|---|---|
context_config | Option<ContextConfig> | [EXISTS] | Context window configuration; None disables compaction |
Note: Compaction strategies have been consolidated into CompactionConfig (G5). See the in_memory_strategy and block_strategy fields on CompactionConfig. The former compaction_strategy and block_compaction_strategy fields no longer exist on AgentLoopConfig.
Limits & Safety
| Field | Type | Status | Description |
|---|---|---|---|
execution_limits | Option<ExecutionLimits> | [EXISTS] | Max turns, tokens, duration, cost |
cache_config | CacheConfig | [EXISTS] | Prompt caching configuration |
tool_execution | ToolExecutionStrategy | [EXISTS] | Sequential, Parallel, or Batched |
retry_config | RetryConfig | [EXISTS] | Exponential backoff with jitter for transient errors |
Callback Hooks -- Turn Level
| Field | Type | Status | Description |
|---|---|---|---|
before_turn | Option<BeforeTurnFn> | [EXISTS] | (messages, turn_index) -> bool; return false to abort the turn |
after_turn | Option<AfterTurnFn> | [EXISTS] | (messages, turn_usage) |
Callback Hooks -- Loop Level
| Field | Type | Status | Description |
|---|---|---|---|
before_loop | Option<BeforeLoopFn> | [EXISTS] | (messages, loop_index) -> bool; return false to abort |
after_loop | Option<AfterLoopFn> | [EXISTS] | (new_messages, accumulated_usage) |
on_error | Option<OnErrorFn> | [EXISTS] | Called when LLM returns StopReason::Error |
Callback Hooks -- Tool Level
| Field | Type | Status | Description |
|---|---|---|---|
before_tool_execution | Option<BeforeToolExecutionFn> | [EXISTS] | (tool_name, tool_call_id, args) -> bool; return false to skip |
after_tool_execution | Option<AfterToolExecutionFn> | [EXISTS] | (tool_name, tool_call_id, is_error) |
before_tool_execution_update | Option<BeforeToolExecutionUpdateFn> | [EXISTS] | (tool_name, tool_call_id, text) -> bool; return false to suppress |
after_tool_execution_update | Option<AfterToolExecutionUpdateFn> | [EXISTS] | (tool_name, tool_call_id, text) |
Input Filtering & Identity
| Field | Type | Status | Description |
|---|---|---|---|
input_filters | Vec<Arc<dyn InputFilter>> | [EXISTS] | Filters run in order; first Reject wins |
first_turn_trigger | TurnTrigger | [EXISTS] | Trigger type for first TurnStart; default User, set to SubAgent by sub-agent callers |
config_id | Option<String> | [EXISTS] | Stable identity for loop_id construction: "{session_id}.{config_id}.{N}" |
Callback Hook Type Aliases [EXISTS]
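All hooks are optional function objects stored as Option<...>: the lifecycle callbacks are Arc<dyn Fn(...)>, while the three context and message hooks are listed as Box<dyn Fn(...)>. None means no hook (zero overhead).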
| Type Alias | Signature | Status |
|---|---|---|
ConvertToLlmFn | Box<dyn Fn(&[AgentMessage]) -> Vec<Message>> | [EXISTS] |
TransformContextFn | Box<dyn Fn(Vec<AgentMessage>) -> Vec<AgentMessage>> | [EXISTS] |
GetMessagesFn | Box<dyn Fn() -> Vec<AgentMessage>> | [EXISTS] |
BeforeLoopFn | Arc<dyn Fn(&[AgentMessage], usize) -> bool> | [EXISTS] |
AfterLoopFn | Arc<dyn Fn(&[AgentMessage], &Usage)> | [EXISTS] |
BeforeTurnFn | Arc<dyn Fn(&[AgentMessage], usize) -> bool> | [EXISTS] |
AfterTurnFn | Arc<dyn Fn(&[AgentMessage], &Usage)> | [EXISTS] |
BeforeToolExecutionFn | Arc<dyn Fn(&str, &str, &serde_json::Value) -> bool> | [EXISTS] |
AfterToolExecutionFn | Arc<dyn Fn(&str, &str, bool)> | [EXISTS] |
BeforeToolExecutionUpdateFn | Arc<dyn Fn(&str, &str, &str) -> bool> | [EXISTS] |
AfterToolExecutionUpdateFn | Arc<dyn Fn(&str, &str, &str)> | [EXISTS] |
OnErrorFn | Arc<dyn Fn(&str)> | [EXISTS] |
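As an illustration of how one of these aliases might be used, the sketch below builds a BeforeToolExecutionUpdateFn that suppresses progress text containing a marker string. The alias is redefined locally so the example compiles on its own; `should_forward` and `redacting_hook` are hypothetical helpers, and the real alias may carry extra bounds such as Send + Sync that the table does not show.

```rust
use std::sync::Arc;

// Alias copied from the table above; defined locally so the sketch stands alone.
type BeforeToolExecutionUpdateFn = Arc<dyn Fn(&str, &str, &str) -> bool>;

/// Hypothetical helper: decides whether a tool update is forwarded.
/// A missing hook (`None`) always forwards, matching "None means no hook".
fn should_forward(
    hook: &Option<BeforeToolExecutionUpdateFn>,
    tool_name: &str,
    tool_call_id: &str,
    text: &str,
) -> bool {
    match hook {
        Some(f) => f(tool_name, tool_call_id, text),
        None => true,
    }
}

/// Example hook: return false (suppress) when the text looks like it leaks a key.
fn redacting_hook() -> BeforeToolExecutionUpdateFn {
    Arc::new(|_tool_name: &str, _tool_call_id: &str, text: &str| !text.contains("API_KEY"))
}
```

Because the hook is an Arc, the same closure can be cloned cheaply into several configs, for example one per parallel branch.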
InputFilter Trait [EXISTS]
Synchronous filter applied to user input before the LLM call. Intentionally synchronous for hot-path performance; use before_turn for async moderation.
| Method | Status | Description |
|---|---|---|
filter(text) -> FilterResult | [EXISTS] | Returns Pass, Warn(String), or Reject(String) |
FilterResult [EXISTS]
| Variant | Status | Description |
|---|---|---|
Pass | [EXISTS] | Message passes unchanged |
Warn(String) | [EXISTS] | Message passes; warning appended to context for LLM to see |
Reject(String) | [EXISTS] | Message rejected; agent loop returns immediately with InputRejected event |
Filters run in order. First Reject wins and discards accumulated warnings. Warn messages accumulate and are appended to the user message.
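The ordering rules above can be sketched as a small fold over the filter list. The trait and enum mirror the tables in this section; `FilterOutcome`, `run_filters`, and the two example filters are hypothetical and exist only to show the "first Reject wins, warnings accumulate" behaviour.

```rust
use std::sync::Arc;

#[derive(Debug, PartialEq)]
enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

trait InputFilter {
    fn filter(&self, text: &str) -> FilterResult;
}

/// Hypothetical aggregate result of running every filter.
#[derive(Debug, PartialEq)]
enum FilterOutcome {
    Accepted { warnings: Vec<String> },
    Rejected(String),
}

/// Runs filters in order; the first Reject returns immediately and
/// drops any warnings collected so far.
fn run_filters(filters: &[Arc<dyn InputFilter>], text: &str) -> FilterOutcome {
    let mut warnings = Vec::new();
    for f in filters {
        match f.filter(text) {
            FilterResult::Pass => {}
            FilterResult::Warn(w) => warnings.push(w),
            FilterResult::Reject(reason) => return FilterOutcome::Rejected(reason),
        }
    }
    FilterOutcome::Accepted { warnings }
}

/// Example filter: warns when the input mentions TODO.
struct WarnOnTodo;
impl InputFilter for WarnOnTodo {
    fn filter(&self, text: &str) -> FilterResult {
        if text.contains("TODO") {
            FilterResult::Warn("input mentions TODO".to_string())
        } else {
            FilterResult::Pass
        }
    }
}

/// Example filter: rejects input longer than the given byte length.
struct MaxLen(usize);
impl InputFilter for MaxLen {
    fn filter(&self, text: &str) -> FilterResult {
        if text.len() > self.0 {
            FilterResult::Reject(format!("input exceeds {} bytes", self.0))
        } else {
            FilterResult::Pass
        }
    }
}
```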
ThinkingLevel [EXISTS]
Controls the depth of model reasoning before responding.
| Variant | Status | Description |
|---|---|---|
Off (default) | [EXISTS] | No thinking tokens; fastest and cheapest |
Minimal | [EXISTS] | Lightest reasoning pass |
Low | [EXISTS] | Shallow chain-of-thought |
Medium | [EXISTS] | Balanced reasoning; a common choice for agentic workflows (the enum default is Off) |
High | [EXISTS] | Maximum reasoning budget; most expensive |
Usage [EXISTS]
Token metrics per turn or accumulated.
| Field | Type | Status | Description |
|---|---|---|---|
input | u64 | [EXISTS] | Input tokens |
output | u64 | [EXISTS] | Output tokens |
reasoning | u64 | [EXISTS] | Reasoning tokens (subset of output; non-zero for OpenAI o-series) |
cache_read | u64 | [EXISTS] | Tokens served from cache |
cache_write | u64 | [EXISTS] | Tokens written to cache |
total_tokens | u64 | [EXISTS] | Total tokens |
| Method | Status | Description |
|---|---|---|
estimated_cost(cost_config) | [EXISTS] | Dollar cost from per-million-token rates |
combine(other) | [EXISTS] | Sum two Usage values |
cache_hit_rate() | [EXISTS] | Fraction of input tokens from cache (0.0-1.0) |
Code Reference
| Concept | File |
|---|---|
ContextConfig, CompactionConfig, CompactionScope | src/context/config.rs |
ExecutionLimits, ExecutionTracker | src/context/execution.rs |
AgentLoopConfig and all callback type aliases | src/agent_loop/config.rs |
Usage, CacheConfig, CacheStrategy, ThinkingLevel | src/types/usage.rs |
InputFilter, FilterResult, EvaluationStrategy | src/types/parallel.rs |
ToolExecutionStrategy | src/types/tool.rs |
RetryConfig | src/provider/retry.rs |
Conceptual Notes
- before_task / after_task [EXISTS] -- Session-level callbacks on SessionRecorderConfig. BeforeTaskFn: Arc<dyn Fn(&Session) -> bool> fires on first AgentStart with a new session_id. AfterTaskFn: Arc<dyn Fn(&Session)> fires on flush().
- before_compaction_start / after_compaction_end [EXISTS] -- Compaction lifecycle callbacks (G1) on AgentLoopConfig. before_compaction_start(estimated_tokens, message_count) -> bool fires before CompactionStarted. after_compaction_end(msgs_before, msgs_after, tokens_before, tokens_after) fires after CompactionEnded.
- Per-loop config tracking [EXISTS] -- Model, thinking_level, temperature, and other config values are captured per-loop in LoopConfigSnapshot on each LoopRecord (and in AgentStart.config_snapshot). Session no longer carries model_config, thinking_level, or temperature fields. Fallback hierarchy: Loop -> Agent default.
- Config streamlining [DONE] -- Compaction strategies (in_memory_strategy, block_strategy) have been consolidated into CompactionConfig, completing G5. The dispatch logic in run.rs reads them from ctx_config.compaction. AgentLoopConfig no longer carries strategy fields.
- ParallelLoopOutcome / ParallelLoopResult -- Defined in src/types/parallel.rs, these types support evaluational parallelism where multiple branches run concurrently and an EvaluationStrategy selects the winner. Related to config because parallel configs produce multiple AgentLoopConfig instances.
phi-core — System Architecture
1. Component Map
Agent trait + BasicAgent (src/agents/)
Responsibility: Agent (trait, agents/agent.rs) defines the runtime interface — prompting, state access, control, and steering queues. BasicAgent (struct, agents/basic_agent.rs) is the default in-memory implementation: owns the conversation, tools, and ModelConfig (provider identity), and is the application-facing entry point. Construction: BasicAgent::new(ModelConfig::anthropic(...)). The optional provider_override field bypasses ProviderRegistry for custom or test providers. SubAgentTool (agents/sub_agent.rs) implements AgentTool to delegate tasks to a child agent_loop().
Public interface:
- prompt(text) — Send a text prompt; returns an event stream receiver.
- prompt_messages(messages) — Send one or more messages as a prompt; returns an event stream receiver.
- prompt_with_sender(text, tx) — Send a text prompt, streaming events to a caller-provided sender.
- continue_loop() — Resume from existing context with ContinuationKind::Default; returns an event stream receiver.
- continue_loop_with_sender(tx, kind) — Resume with an explicit ContinuationKind (Default, Rerun { tag }, or Branch { tag }), streaming events to a caller-provided sender.
- steer(msg) — Queue a message that will be injected mid-run between tool executions.
- follow_up(msg) — Queue a message to be processed after the agent would otherwise stop.
- abort() — Cancel the in-progress run by signalling the cancellation token.
- reset() — Clear messages, queues, streaming state, and cancel token to return the agent to its initial state.
- save_messages() — Serialize the current conversation to a JSON string.
- restore_messages(json) — Replace the current conversation with messages deserialized from a JSON string.
- with_skills(skill_set) — Load skills and append their XML index to the system prompt per the AgentSkills standard.
- with_mcp_server_stdio(cmd, args, env) — Connect to an MCP server by spawning a child process and add its tools to the agent.
- with_mcp_server_http(url) — Connect to an MCP server via HTTP and add its tools to the agent.
- with_openapi_file(path, config, filter) — Load tools from an OpenAPI spec file and add them to the agent.
- with_openapi_url(url, config, filter) — Fetch an OpenAPI spec from a URL and add its tools to the agent.
- with_openapi_spec(spec_str, config, filter) — Parse an OpenAPI spec string (JSON or YAML) and add its tools to the agent.
- new_session() — Immediately rotate to a new session_id; resets loop counters and last_loop_id; returns the new session id.
- check_and_rotate(threshold) — Rotate to a new session if the agent has been idle longer than threshold since the last prompt_* call; returns Some(new_session_id) on rotation, None otherwise.
BasicAgent state relevant to session management:
| Field | Type | Description |
|---|---|---|
agent_id | String | Stable identifier across all sessions for this instance |
session_id | String | Current session identifier; updated by new_session() |
loop_counters | HashMap<String, usize> | Per-config loop counter; cleared on new_session() |
last_loop_id | Option<String> | Most recent loop; cleared on new_session() |
last_active_at | Option<DateTime<Utc>> | Timestamp of last prompt_* call; used by check_and_rotate() |
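The interplay between session_id, loop_counters, and last_loop_id can be sketched as follows. This is an illustration under assumptions rather than BasicAgent's code: `SessionIds` and `next_loop_id` are hypothetical names, counters are assumed to start at 1 (consistent with the "{child_session}.sub.1" example later in this document), and the caller supplies the new session id instead of the agent generating one.

```rust
use std::collections::HashMap;

/// Hypothetical subset of BasicAgent's session-related state.
struct SessionIds {
    session_id: String,
    loop_counters: HashMap<String, usize>,
    last_loop_id: Option<String>,
}

impl SessionIds {
    fn new(session_id: &str) -> Self {
        Self {
            session_id: session_id.to_string(),
            loop_counters: HashMap::new(),
            last_loop_id: None,
        }
    }

    /// Builds "{session_id}.{config_id}.{N}", bumping the per-config counter.
    fn next_loop_id(&mut self, config_id: &str) -> String {
        let n = self.loop_counters.entry(config_id.to_string()).or_insert(0);
        *n += 1;
        let id = format!("{}.{}.{}", self.session_id, config_id, n);
        self.last_loop_id = Some(id.clone());
        id
    }

    /// Rotates the session: counters and last_loop_id are cleared.
    fn new_session(&mut self, new_session_id: &str) {
        self.session_id = new_session_id.to_string();
        self.loop_counters.clear();
        self.last_loop_id = None;
    }
}
```

Keeping one counter per config_id means two configs running in the same session never collide on N.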
AgentLoop (src/agent_loop/)
Responsibility: The core execution engine. Manages the turn loop, tool dispatch, steering injection, follow-up processing, and lifecycle event emission.
Public interface:
- agent_loop(prompts, context, config, tx, cancel) — Start an agent run from new prompt messages, applying input filters, emitting lifecycle events, and returning all new messages produced.
- agent_loop_continue(context, config, tx, cancel) — Resume from existing context (no new prompts); used for retries after errors or mid-conversation continuation.
- agent_loop_parallel(prompts, base_context, configs, strategy, tx, cancel) -> ParallelLoopResult — Run N AgentLoopConfigs concurrently and evaluate results via EvaluationStrategy. When prompts is non-empty, each branch uses agent_loop; when prompts is empty, each branch uses agent_loop_continue (the user query is already the last message in base_context). base_context is cloned per branch (tools Arc-shared; message history deep-copied). All branches share the same session_id; each gets a distinct loop_id. ParallelLoopOutcome.original_context_len marks the base/branch message boundary. Emits ParallelLoopStart / ParallelLoopEnd events. selected_context feeds into agent_loop_continue() for normal session resumption.
- derive_config_segment(config) -> String (pub crate) — Derives the stable {config_segment} portion of a loop_id from config.config_id or provider/model/thinking fields.
EvaluationLoop (src/agent_loop/evaluation.rs)
Responsibility: Pluggable strategy for selecting among parallel loop outcomes. Decoupled from src/agent_loop/ to allow custom implementations without a circular dependency (trait is defined in src/types/; implementations live here).
Public interface:
- EvaluationStrategy (trait, defined in src/types/) — evaluate(prompts, outcomes, tx, cancel) -> (EvaluationDecision, Usage)
- EvaluationDecision (enum, defined in src/types/) — Select(usize) — 0-based index of the winning outcome.
- ParallelLoopOutcome.original_context_len: usize — Number of messages in the cloned context at dispatch time. Allows strategies to split "original context" from "new branch output" messages without separate bookkeeping. Identical across all outcomes (same base context); outcomes[0] is the idiomatic source.
- TransparentEvaluation — Single-branch pass-through; panics if > 1 outcome.
- PickFirstEvaluation — Always selects index 0. Useful for testing.
- TokenEfficientEvaluation — Selects the outcome with the lowest total token usage.
- ElaborateEvaluation — Selects the outcome with the highest total token usage.
- LlmJudgeEvaluation { judge_config, system_prompt } — Runs a separate LLM call to select the best branch. Supports both agent_loop mode (query from prompts) and agent_loop_continue mode (query extracted from last Message::User in context.messages[..original_context_len]). Includes prior conversation context in the judge prompt. Applies 2-iteration compaction: Iteration 1 compacts only prior context (3 tiers: tail-truncate → paragraph-summary → hard char limit), keeping outputs intact; Iteration 2 (if needed) compacts both context and outputs independently through the same tier pipeline. Budget derived from judge_config.context_config.max_context_tokens. Emits a ProgressMessage warning if comprehension criteria cannot be satisfied after iteration 2.
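The two token-based strategies reduce to picking an index by minimum or maximum usage. The sketch below shows that selection on a stripped-down stand-in; `OutcomeSketch` and both function names are hypothetical, and the tie-breaking behaviour (first minimum, last maximum, as given by the standard iterator adapters) is an assumption, since the text does not specify it.

```rust
/// Hypothetical stand-in for ParallelLoopOutcome: only the total token count.
struct OutcomeSketch {
    total_tokens: u64,
}

/// Index of the outcome with the lowest token usage, or None if empty.
fn select_token_efficient(outcomes: &[OutcomeSketch]) -> Option<usize> {
    outcomes
        .iter()
        .enumerate()
        .min_by_key(|(_, o)| o.total_tokens)
        .map(|(i, _)| i)
}

/// Index of the outcome with the highest token usage, or None if empty.
fn select_elaborate(outcomes: &[OutcomeSketch]) -> Option<usize> {
    outcomes
        .iter()
        .enumerate()
        .max_by_key(|(_, o)| o.total_tokens)
        .map(|(i, _)| i)
}
```

The returned index corresponds to the 0-based value carried by EvaluationDecision::Select.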
ContextManager (src/context/)
Responsibility: Token estimation, tiered context compaction, and execution limit tracking.
Public interface:
- estimate_tokens(text) — Rough token count heuristic: ~4 characters per token.
- compact_messages(messages, config) — Reduce message list to fit token budget using a tiered strategy: truncate tool outputs → summarize old turns → drop middle messages.
- CompactionStrategy (trait) — Interface for custom compaction logic; default implementation uses the tiered cascade (legacy: compact_messages(); modern: CompactionBlock overlays).
- ContextTracker — Tracks context window usage by combining provider-reported token counts with local estimates for recent messages.
- ExecutionTracker — Tracks turns, cumulative tokens, and elapsed time against configured limits; signals when any limit is exceeded.
- ContextConfig — Tuning knobs for compaction: token budget, system-prompt overhead, head/tail message preservation counts, per-tool-output line limit.
- ExecutionLimits — Hard caps on agent execution: max turns, max total tokens, max wall-clock duration.
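To make the first two ideas concrete, here is a minimal sketch of the ~4-characters-per-token heuristic and of the first compaction tier (truncating tool output to a line limit). Rounding up, keeping the head of the output rather than the tail, and the exact truncation marker are assumptions chosen for the example; `truncate_tool_output` is a hypothetical name.

```rust
/// Rough token estimate: about 4 characters per token, rounded up (assumed).
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Keeps at most `max_lines` lines of a tool output and notes how many
/// were dropped. The marker format is illustrative only.
fn truncate_tool_output(output: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() <= max_lines {
        return output.to_string();
    }
    let omitted = lines.len() - max_lines;
    let mut kept = lines[..max_lines].join("\n");
    kept.push_str(&format!("\n[... {omitted} lines truncated]"));
    kept
}
```

A character-count heuristic avoids pulling in a tokenizer per provider; the trade-off is that estimates drift for code-heavy or non-English text.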
ProviderRegistry (src/provider/registry.rs, src/provider/mod.rs)
Responsibility: Dispatches StreamConfig to the correct provider implementation based on model_config.api: ApiProtocol. Built inline per agent_loop() call; zero allocation for a registry with all built-in providers pre-registered.
Public interface:
- ProviderRegistry::default() — Pre-registers all 7 built-in providers; used automatically by agent_loop() when AgentLoopConfig.provider_override is None.
- ProviderRegistry::new() — Create an empty registry for custom provider sets.
- Provider resolution: model_config.api selects the wire-protocol handler; model_config fields (id, api_key, base_url, compat, etc.) differentiate services within the same protocol.
StreamProvider implementations (src/provider/)
Responsibility: Translate the unified StreamConfig into provider-specific HTTP requests and parse streaming responses back into StreamEvents.
Providers: AnthropicProvider, OpenAiCompatProvider (15+ backends), OpenAiResponsesProvider, AzureOpenAiProvider, GoogleProvider, GoogleVertexProvider, BedrockProvider, MockProvider.
Public interface:
- StreamProvider::stream(config, tx, cancel) -> Result<Message, ProviderError> — stream a single LLM response.
- StreamProvider::provider_id() -> &str — stable lowercase identifier for this provider (e.g. "anthropic", "openai", "google", "bedrock"). Used as the first segment of the auto-derived config_id in loop_id construction.
ToolSystem (src/tools/)
Responsibility: Built-in tool implementations. Each implements AgentTool.
Tools: BashTool (shell execution), ReadFileTool (text + image files), WriteFileTool (create/overwrite), EditFileTool (surgical search/replace), ListFilesTool (directory listing), SearchTool (grep/ripgrep).
Public interface:
- default_tools() — Returns the standard built-in toolset: bash, read-file, write-file, edit-file, list-files, search.
- AgentTool::name() — Unique tool identifier used in LLM tool-use calls and event correlation.
- AgentTool::label() — Human-readable display name for UI.
- AgentTool::description() — Free-text description sent to the LLM to explain when to use the tool.
- AgentTool::parameters_schema() — JSON Schema object describing the tool's accepted parameters.
- AgentTool::execute(params, ctx) — Run the tool with resolved parameters and a context carrying the cancellation token and progress callbacks.
SubAgentTool (src/agents/sub_agent.rs)
Responsibility: Implements AgentTool to delegate tasks to a child agent_loop() with isolated context, its own toolset, and a turn limit. The child gets its own agent_id, session_id, and loop_id; its parent_loop_id is linked back to the calling loop via with_parent_loop_id.
Public interface:
- SubAgentTool::new(name, model_config).with_*(...) — Construct a sub-agent tool with its own ModelConfig (provider identity), system prompt, toolset, and turn limit, then register it as an AgentTool.
- SubAgentTool::with_provider_override(provider) — Bypass ProviderRegistry dispatch; used in tests to inject MockProvider.
- SubAgentTool::with_parent_loop_id(loop_id) — Supply the parent loop's loop_id so the child AgentStart event carries parent_loop_id, enabling ancestry tracing across the event stream.
SkillSystem (src/context/skills.rs)
Responsibility: Loads SKILL.md files from one or more directories, parses YAML frontmatter, and formats them as an XML index injected into the system prompt.
Public interface:
- SkillSet::load(dirs) — Load skills from multiple directories; later entries override earlier ones on name conflict.
- SkillSet::load_dir(dir, source) — Load skills from a single directory, tagging each with a source label.
- SkillSet::merge(other) — Merge another SkillSet in; the other's skills override on name conflict.
- SkillSet::format_for_prompt() — Render the skill list as an <available_skills> XML block ready for system-prompt injection.
McpClient (src/mcp/)
Responsibility: MCP client that connects to external tool servers over stdio or HTTP. Adapts discovered tools into AgentTool instances.
Public interface:
- McpClient::connect_stdio(cmd, args, env) — Spawn a child process, complete the JSON-RPC initialize handshake, and return a connected client.
- McpClient::connect_http(url) — Connect to an HTTP-based MCP server and complete the initialize handshake.
- McpToolAdapter::from_client(client) — Query the server for available tools and return one AgentTool adapter per tool.
OpenApiAdapter (src/openapi/, feature-gated)
Responsibility: Parses OpenAPI 3.x specs and generates one AgentTool per operation. Each tool makes an HTTP request to the spec's base URL.
Public interface:
- from_file(path, config, filter) — Parse an OpenAPI spec from a local file and return one tool adapter per matching operation.
- from_url(url, config, filter) — Fetch an OpenAPI spec over HTTP and return one tool adapter per matching operation.
- from_str(spec, config, filter) — Parse an OpenAPI spec from an in-memory string (auto-detects JSON vs YAML) and return one tool adapter per matching operation.
Availability: Only compiled when the openapi feature flag is enabled.
SessionStore (src/session/)
Responsibility: Persistent session layer. Records every AgentEvent into a structured tree of Session + LoopRecord objects, and provides both free-function and trait-based APIs for flat JSON-file persistence.
Public interface:
- SessionRecorder::new(config) — Create a recorder; call on_event(event) for every event on the agent's tx channel.
- SessionRecorder::flush() — Finalize all open loops (status → Aborted) and move them into their sessions.
- SessionRecorder::drain_completed() — Consume and return all completed sessions.
- SessionRecorder::sessions() — Iterate all known sessions (completed + in-progress).
- SessionRecorder::get_session(id) — Look up a session by session_id.
- SessionRecorder::current_loop(id) — Look up an in-progress LoopRecord by loop_id.
- save_session(session, dir) — Write {dir}/{session_id}.json (creates dir if needed). Atomic via tmp-file + rename.
- load_session(session_id, dir) — Read {dir}/{session_id}.json.
- list_session_ids(dir) — List all session ids in dir, newest first.
- load_sessions_for_agent(agent_id, dir) — Load all sessions matching agent_id.
- delete_session(session_id, dir) — Remove {dir}/{session_id}.json.
- SessionStore trait — async save / load / list_ids / delete / list_for_agent for callers that want a pluggable store (custom backends, mocks). (Added 0.7.0)
- FileSystemSessionStore::new(dir) — In-tree async impl of SessionStore. Adds advisory fs2 exclusive lock on save (returns SessionError::Locked if a concurrent writer holds it). (Added 0.7.0)
File format: Pretty-printed JSON. Flat directory — one file per session, no index. Writes are atomic (tmp + rename) regardless of API surface used.
RetryEngine (src/provider/retry.rs)
Responsibility: Computes exponential-backoff delay with ±20% jitter. Classifies which errors are retryable.
Public interface:
- RetryConfig — Parameters for automatic retry: initial delay, backoff multiplier, max delay, max attempt count.
- RetryConfig::delay_for_attempt(attempt) — Compute the sleep duration before attempt N using exponential backoff with ±20% jitter.
- is_retryable() (on ProviderError) — Returns true only for RateLimited and Network variants; all other errors fail immediately.
- retry_after() (on ProviderError) — Extracts the server-specified retry delay from a RateLimited { retry_after_ms: Some(...) } error, if present.
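The backoff and classification rules can be sketched as below. Several details are assumptions: the field names, 0-based attempt numbering, capping before jitter is applied, the `should_retry` helper, and the catch-all `Other` variant. To keep the example deterministic and dependency-free, jitter is passed in as a value in [-1.0, 1.0] instead of being drawn from a random source, so the signature differs from the one-argument delay_for_attempt listed above.

```rust
use std::time::Duration;

/// Hypothetical field names for the parameters listed above.
struct RetryConfig {
    initial_delay: Duration,
    backoff_multiplier: f64,
    max_delay: Duration,
    max_attempts: u32,
}

/// RateLimited and Network come from the text; Other is a placeholder.
enum ProviderError {
    RateLimited { retry_after_ms: Option<u64> },
    Network(String),
    Other(String),
}

impl ProviderError {
    fn is_retryable(&self) -> bool {
        matches!(self, Self::RateLimited { .. } | Self::Network(_))
    }

    /// Server-specified delay, if the rate-limit error carried one.
    fn retry_after(&self) -> Option<Duration> {
        match self {
            Self::RateLimited { retry_after_ms: Some(ms) } => Some(Duration::from_millis(*ms)),
            _ => None,
        }
    }
}

impl RetryConfig {
    /// initial * multiplier^attempt, capped at max_delay, then scaled by
    /// up to ±20%. `jitter` is clamped to [-1.0, 1.0].
    fn delay_for_attempt(&self, attempt: u32, jitter: f64) -> Duration {
        let base = self.initial_delay.as_secs_f64() * self.backoff_multiplier.powi(attempt as i32);
        let capped = base.min(self.max_delay.as_secs_f64());
        let factor = 1.0 + 0.2 * jitter.clamp(-1.0, 1.0);
        Duration::from_secs_f64(capped * factor)
    }

    /// Hypothetical helper combining the attempt budget with classification.
    fn should_retry(&self, attempt: u32, err: &ProviderError) -> bool {
        attempt < self.max_attempts && err.is_retryable()
    }
}
```

Jitter spreads out retries from clients that failed at the same moment, which matters most for rate-limit errors.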
2. Dependency Graph
graph TD
App["Application Code"] --> Agent
Agent --> AgentLoop["AgentLoop\nagent_loop/"]
AgentLoop --> ContextManager["ContextManager\ncontext/"]
AgentLoop --> ProviderRegistry["Provider\ntraits.rs / registry.rs"]
AgentLoop --> ToolSystem["ToolSystem\ntools/"]
AgentLoop --> RetryEngine["RetryEngine\nprovider/retry.rs"]
ProviderRegistry --> Anthropic["AnthropicProvider"]
ProviderRegistry --> OpenAI["OpenAiCompatProvider\n(15+ backends)"]
ProviderRegistry --> OpenAIResp["OpenAiResponsesProvider"]
ProviderRegistry --> Azure["AzureOpenAiProvider"]
ProviderRegistry --> Google["GoogleProvider"]
ProviderRegistry --> Vertex["GoogleVertexProvider"]
ProviderRegistry --> Bedrock["BedrockProvider"]
ProviderRegistry --> Mock["MockProvider\n(tests)"]
Agent --> SkillSystem["SkillSystem\ncontext/skills.rs"]
Agent --> McpClient["McpClient\nmcp/"]
Agent --> OpenApiAdapter["OpenApiAdapter\nopenapi/ (feature)"]
McpClient --> ToolSystem
OpenApiAdapter --> ToolSystem
SubAgent["SubAgentTool\nsub_agent.rs"] --> AgentLoop
ToolSystem --> SubAgent
Types["types/\n(shared types)"] --> Agent
Types --> AgentLoop
Types --> ToolSystem
Types --> ProviderRegistry
SessionStore["SessionStore\nsession/"] --> Types
App --> SessionStore
3. Data Flow
3.1 Simple Text Prompt (no tool calls)
sequenceDiagram
participant App
participant Agent
participant AgentLoop
participant Provider
participant EventCh as EventChannel
App->>Agent: prompt("What is 2+2?")
Agent->>AgentLoop: agent_loop(prompts, context, config, tx, cancel)
AgentLoop->>EventCh: AgentStart
AgentLoop->>EventCh: TurnStart
AgentLoop->>EventCh: MessageStart (user)
AgentLoop->>EventCh: MessageEnd (user)
AgentLoop->>Provider: stream(StreamConfig)
Provider-->>EventCh: StreamEvent::Start
Provider-->>EventCh: StreamEvent::TextDelta x N
Provider-->>EventCh: StreamEvent::Done(Message)
AgentLoop->>EventCh: MessageStart (assistant placeholder)
AgentLoop->>EventCh: MessageUpdate x N (deltas)
AgentLoop->>EventCh: MessageEnd (assistant final)
AgentLoop->>EventCh: TurnEnd
AgentLoop->>EventCh: AgentEnd(messages)
App->>EventCh: receives events via rx.recv()
3.2 Tool Call Cycle
sequenceDiagram
participant AgentLoop
participant Provider
participant BashTool
participant EventCh as EventChannel
AgentLoop->>Provider: stream(config with tool defs)
Provider-->>AgentLoop: Done(Message{stop_reason: ToolUse, content: [ToolCall{...}]})
AgentLoop->>EventCh: TurnEnd(assistant message)
AgentLoop->>AgentLoop: extract tool_calls from assistant content
AgentLoop->>EventCh: ToolExecutionStart(id, name, args)
AgentLoop->>BashTool: execute(params, ToolContext)
BashTool-->>EventCh: ProgressMessage (via on_progress callback)
BashTool-->>AgentLoop: Ok(ToolResult)
AgentLoop->>EventCh: ToolExecutionEnd(id, name, result, is_error=false)
AgentLoop->>EventCh: MessageStart(ToolResult message)
AgentLoop->>EventCh: MessageEnd(ToolResult message)
AgentLoop->>AgentLoop: append tool results to context.messages
AgentLoop->>Provider: stream(config, now includes tool results)
Provider-->>AgentLoop: Done(Message{stop_reason: Stop})
AgentLoop->>EventCh: TurnEnd
AgentLoop->>EventCh: AgentEnd
3.3 Context Compaction Trigger
sequenceDiagram
participant AgentLoop
participant ContextManager
participant Provider
AgentLoop->>ContextManager: compact(messages, config)
ContextManager->>ContextManager: total_tokens(messages) > budget?
alt Level 1 fits
ContextManager-->>AgentLoop: truncated tool outputs
else Level 2 fits
ContextManager-->>AgentLoop: old turns summarized
else Level 3
ContextManager-->>AgentLoop: first + recent kept, middle dropped
end
AgentLoop->>Provider: stream(config with compacted messages)
3.4 Sub-Agent Delegation
sequenceDiagram
participant ParentLoop as Parent AgentLoop
participant SubAgentTool
participant ChildLoop as Child AgentLoop
participant ChildProvider as Provider
ParentLoop->>SubAgentTool: execute({task: "..."}, ToolContext)
SubAgentTool->>SubAgentTool: build AgentContext with child identity<br/>(new agent_id, session_id, loop_id="{child_session}.sub.1",<br/>parent_loop_id = parent's loop_id)
SubAgentTool->>ChildLoop: agent_loop([task_prompt], context, config, tx, cancel)
ChildLoop->>ChildLoop: emit AgentStart{loop_id, parent_loop_id}
ChildLoop->>ChildProvider: stream(...)
ChildProvider-->>ChildLoop: streaming events
ChildLoop-->>SubAgentTool: Vec<AgentMessage> (final messages)
SubAgentTool->>SubAgentTool: extract_final_text(messages)
SubAgentTool-->>ParentLoop: Ok(ToolResult{text, child_loop_id: Some(loop_id)})
Note over ParentLoop: ToolExecutionEnd{child_loop_id} emitted<br/>→ parent stream records child ancestry
4. Data Models
Content
Entity: Content (enum)
Variant Text:
text: String [the text content]
Variant Image:
data: String [base64-encoded binary]
mime_type: String [e.g. "image/png", "image/jpeg"]
Variant Thinking:
thinking: String [internal reasoning text]
signature: Option<String> [provider-specific thinking signature, optional]
Variant ToolCall:
id: String [unique call ID, e.g. UUID]
name: String [tool name matching AgentTool::name()]
arguments: JSON [parameter values matching tool's JSON Schema]
Serialization: tagged by "type" field ("text", "image", "thinking", "toolCall")
Message
Entity: Message (enum)
Variant User:
content: Vec<Content> [usually a single Text block]
timestamp: u64 [unix milliseconds]
Variant Assistant:
content: Vec<Content> [text, thinking, tool call blocks]
stop_reason: StopReason [why the model stopped]
model: String [model ID returned by provider]
provider: String [provider name, e.g. "anthropic"]
usage: Usage [token counts for this turn]
timestamp: u64 [unix milliseconds]
error_message: Option<String> [set when stop_reason == Error]
Variant ToolResult:
tool_call_id: String [matches Content::ToolCall.id]
tool_name: String [matches Content::ToolCall.name]
content: Vec<Content> [tool output, usually a Text block]
is_error: bool [true if tool execution failed]
timestamp: u64 [unix milliseconds]
Lifecycle: User messages are created by the caller. Assistant messages are
created by the provider after streaming completes. ToolResult messages
are created by the agent loop after tool execution.
AgentMessage
Entity: AgentMessage (enum, untagged)
Variant Llm(LlmMessage) [sent to the LLM; user/assistant/toolResult roles; LlmMessage wraps Message + Option<TurnId>]
Variant Extension(ExtensionMessage) [not sent to LLM; app-only metadata]
Note: stored in Agent.messages and AgentContext.messages
Extension messages are filtered out before LLM calls
ExtensionMessage
Entity: ExtensionMessage
role: String [always "extension"]
kind: String [app-defined event type, e.g. "ui_update"]
data: JSON [arbitrary app-defined payload]
StopReason
Entity: StopReason (enum)
Stop -> model completed naturally
Length -> max_tokens limit hit
ToolUse -> model returned tool calls (loop must continue)
Error -> provider or streaming error occurred
Aborted -> cancellation token was triggered
Serialization: camelCase ("stop", "length", "toolUse", "error", "aborted")
Usage
Entity: Usage
input: u64 [prompt tokens processed]
output: u64 [completion tokens generated]
cache_read: u64 [tokens served from prompt cache]
cache_write: u64 [tokens written to prompt cache]
total_tokens: u64 [sum, may be 0 if not reported]
Derived: cache_hit_rate() = cache_read / (input + cache_read + cache_write)
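The derived formula translates directly into code. The sketch below mirrors the fields listed in this entity and adds a field-wise combine; returning 0.0 when the denominator is zero is an assumption, as the text does not say how an empty Usage is handled.

```rust
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Usage {
    input: u64,
    output: u64,
    cache_read: u64,
    cache_write: u64,
    total_tokens: u64,
}

impl Usage {
    /// cache_read / (input + cache_read + cache_write), in 0.0..=1.0.
    fn cache_hit_rate(&self) -> f64 {
        let denom = self.input + self.cache_read + self.cache_write;
        if denom == 0 {
            0.0 // assumed behaviour for an empty Usage
        } else {
            self.cache_read as f64 / denom as f64
        }
    }

    /// Field-wise sum of two Usage values.
    fn combine(&self, other: &Usage) -> Usage {
        Usage {
            input: self.input + other.input,
            output: self.output + other.output,
            cache_read: self.cache_read + other.cache_read,
            cache_write: self.cache_write + other.cache_write,
            total_tokens: self.total_tokens + other.total_tokens,
        }
    }
}
```

For example, 200 uncached input tokens, 600 cache reads, and 200 cache writes give a hit rate of 600 / 1000 = 0.6.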
AgentEvent
Entity: AgentEvent (enum, #[serde(tag = "type")])
Every variant except ParallelLoopStart and ParallelLoopEnd carries
loop_id: String, so that events from concurrent parallel branches can be reliably
attributed to the correct LoopRecord even when they are interleaved on one tx channel.
AgentStart {
agent_id: String [stable agent instance identifier]
session_id: String [groups all loops in one session]
loop_id: String ["{session_id}.{config_id}.{N}" — unique per call]
parent_loop_id: Option<String> [None for origin calls; Some for continuations/sub-agents]
continuation_kind: Option<ContinuationKind> [None=origin; Some(Default/Rerun/Branch)=continuation]
timestamp: DateTime<Utc>
metadata: Option<JSON>
}
AgentEnd {
loop_id: String [← identifies the loop]
messages: Vec<AgentMessage> [all new messages produced by this loop]
usage: Usage
timestamp: DateTime<Utc>
rejection: Option<String> [Some if input filter blocked the run]
}
TurnStart {
loop_id: String
turn_index: u32
timestamp: DateTime<Utc>
triggered_by: TurnTrigger [what caused this turn to begin]
}
TurnEnd {
loop_id: String
message: AgentMessage
usage: Usage
timestamp: DateTime<Utc>
tool_results: Vec<Message>
}
MessageStart { loop_id: String, message } [message streaming began]
MessageUpdate { loop_id: String, message, delta } [content delta arrived]
MessageEnd { loop_id: String, message } [message complete]
ToolExecutionStart { loop_id: String, tool_call_id, tool_name, args }
ToolExecutionUpdate { loop_id: String, tool_call_id, tool_name, partial_result }
ToolExecutionEnd {
loop_id: String
tool_call_id: String
tool_name: String
result: ToolResult
is_error: bool
child_loop_id: Option<String> [Some only when tool spawned a sub-agent loop]
}
ProgressMessage { loop_id: String, tool_call_id, tool_name, text }
InputRejected { loop_id: String, reason } [input filter blocked the prompt]
ParallelLoopStart { [loop_id NOT on this variant]
session_id: String
loop_ids: Vec<String> [one loop_id per branch, in config order]
timestamp: DateTime<Utc>
}
ParallelLoopEnd { [loop_id NOT on this variant]
session_id: String
selected_loop_id: String
selected_config_index: usize
evaluation_usage: Usage
timestamp: DateTime<Utc>
}
StreamDelta
Entity: StreamDelta (enum)
Text { delta: String } [text content chunk]
Thinking { delta: String } [thinking content chunk]
ToolCallDelta { delta: String } [tool call argument chunk]
ToolContext
Entity: ToolContext
tool_call_id: String [for correlation with AgentEvent]
tool_name: String [for correlation with AgentEvent]
cancel: CancellationToken [check is_cancelled() in long-running tools]
on_update: Option<ToolUpdateFn> [callback for streaming partial ToolResults]
on_progress: Option<ProgressFn> [callback for user-facing status text]
ContinuationKind
Entity: ContinuationKind (enum)
Default [unspecified continuation — preserves legacy semantics]
Rerun { tag: String } [retry from an equivalent context; tag is RFC 3339 UTC timestamp]
Branch { tag: String } [explore a different path from a branching point; tag is RFC 3339 UTC timestamp]
Set on AgentContext.continuation_kind before calling agent_loop_continue().
Surfaced in AgentStart.continuation_kind (None = origin call).
TurnTrigger semantics:
Default / Rerun → first turn uses TurnTrigger::Continuation
Branch → first turn uses TurnTrigger::Branch
TurnTrigger
Entity: TurnTrigger (enum)
User [first turn of an agent_loop() origin call with new user prompts]
SubAgent [first turn when running as a sub-agent via SubAgentTool]
Continuation [subsequent turns; tool round-trip, steering, or Default/Rerun continuation]
Branch [first turn of an agent_loop_continue(Branch) call]
Emitted in TurnStart.triggered_by.
Priority on first turn (run_loop):
1. Branch continuation → TurnTrigger::Branch
2. Any other continuation → TurnTrigger::Continuation
3. Origin call → config.first_turn_trigger (User or SubAgent)
Subsequent turns always use TurnTrigger::Continuation.
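The first-turn priority rules above reduce to a single match. The sketch below redeclares the two enums for self-containment; the function name `first_turn_trigger` and the `configured` parameter (standing in for `config.first_turn_trigger`) are illustrative assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ContinuationKind {
    Default,
    Rerun { tag: String },
    Branch { tag: String },
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TurnTrigger {
    User,
    SubAgent,
    Continuation,
    Branch,
}

/// Picks the trigger for the first turn of a loop.
/// `configured` stands in for config.first_turn_trigger (User or SubAgent).
pub fn first_turn_trigger(
    kind: Option<&ContinuationKind>,
    configured: TurnTrigger,
) -> TurnTrigger {
    match kind {
        // 1. Branch continuation
        Some(ContinuationKind::Branch { .. }) => TurnTrigger::Branch,
        // 2. Any other continuation (Default or Rerun)
        Some(_) => TurnTrigger::Continuation,
        // 3. Origin call
        None => configured,
    }
}
```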
ToolResult / ToolError
Entity: ToolResult
content: Vec<Content> [tool output content blocks]
details: JSON [structured metadata, not sent to LLM, e.g. exit_code]
child_loop_id: Option<String> [set by sub-agent tools; None for all other tools]
Entity: ToolError (enum)
Failed(String) [general execution failure]
NotFound(String) [tool name not in registry]
InvalidArgs(String) [parameter validation failed]
Cancelled [CancellationToken was triggered]
ContextConfig
Entity: ContextConfig
max_context_tokens: usize [default: 100,000; total budget including system prompt]
system_prompt_tokens: usize [default: 4,000; reserved for system prompt]
keep_recent: usize [default: 10; messages always kept in full at tail]
keep_first: usize [default: 2; messages always kept at head]
tool_output_max_lines: usize [default: 50; L1 compaction per-tool-output limit]
Effective budget = max_context_tokens - system_prompt_tokens
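As a rough illustration of the defaults and the effective-budget formula, the sketch below uses the field names and default values listed above. The `effective_budget()` method name and the use of `saturating_sub` (to avoid underflow on misconfiguration) are assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ContextConfig {
    pub max_context_tokens: usize,
    pub system_prompt_tokens: usize,
    pub keep_recent: usize,
    pub keep_first: usize,
    pub tool_output_max_lines: usize,
}

impl Default for ContextConfig {
    fn default() -> Self {
        Self {
            max_context_tokens: 100_000,
            system_prompt_tokens: 4_000,
            keep_recent: 10,
            keep_first: 2,
            tool_output_max_lines: 50,
        }
    }
}

impl ContextConfig {
    /// Tokens left for messages once the system prompt reservation is removed.
    /// Assumption: clamps at zero rather than panicking on underflow.
    pub fn effective_budget(&self) -> usize {
        self.max_context_tokens.saturating_sub(self.system_prompt_tokens)
    }
}
```

With the defaults, the message budget works out to 100,000 - 4,000 = 96,000 tokens.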
ExecutionLimits / ExecutionTracker
Entity: ExecutionLimits
max_turns: usize [default: 50; LLM calls before forced stop]
max_total_tokens: usize [default: 1,000,000; cumulative token budget]
max_duration: Duration [default: 600s; wall-clock time limit]
Entity: ExecutionTracker (runtime state)
limits: ExecutionLimits [immutable config]
turns: usize [incremented after each LLM call]
tokens_used: usize [cumulative; updated from provider Usage]
started_at: Instant [set on construction]
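One way the tracker might enforce its limits is sketched below. The `record_turn` and `exceeded` method names, the reason strings, and the choice of `>=` comparisons are all assumptions made for illustration; only the fields and defaults come from the entity listing above.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone)]
pub struct ExecutionLimits {
    pub max_turns: usize,
    pub max_total_tokens: usize,
    pub max_duration: Duration,
}

impl Default for ExecutionLimits {
    fn default() -> Self {
        Self {
            max_turns: 50,
            max_total_tokens: 1_000_000,
            max_duration: Duration::from_secs(600),
        }
    }
}

#[derive(Debug)]
pub struct ExecutionTracker {
    pub limits: ExecutionLimits,
    pub turns: usize,
    pub tokens_used: usize,
    pub started_at: Instant,
}

impl ExecutionTracker {
    pub fn new(limits: ExecutionLimits) -> Self {
        Self { limits, turns: 0, tokens_used: 0, started_at: Instant::now() }
    }

    /// Called after each LLM call with the tokens reported for that call.
    pub fn record_turn(&mut self, tokens: usize) {
        self.turns += 1;
        self.tokens_used += tokens;
    }

    /// Returns a human-readable reason when any limit has been reached.
    pub fn exceeded(&self) -> Option<String> {
        if self.turns >= self.limits.max_turns {
            Some(format!("max turns ({}) reached", self.limits.max_turns))
        } else if self.tokens_used >= self.limits.max_total_tokens {
            Some(format!("token budget ({}) exhausted", self.limits.max_total_tokens))
        } else if self.started_at.elapsed() >= self.limits.max_duration {
            Some(format!("time limit ({}s) exceeded", self.limits.max_duration.as_secs()))
        } else {
            None
        }
    }
}
```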
RetryConfig
Entity: RetryConfig
max_retries: usize [default: 3; 0 = no retries]
initial_delay_ms: u64 [default: 1,000ms]
backoff_multiplier: f64 [default: 2.0; exponential growth factor]
max_delay_ms: u64 [default: 30,000ms; ceiling before jitter]
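The backoff schedule implied by these fields can be sketched as follows. The `delay_ms` method and its 0-based `attempt` parameter are assumptions; jitter is deliberately left out because its distribution is not specified here.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct RetryConfig {
    pub max_retries: usize,
    pub initial_delay_ms: u64,
    pub backoff_multiplier: f64,
    pub max_delay_ms: u64,
}

impl Default for RetryConfig {
    fn default() -> Self {
        Self {
            max_retries: 3,
            initial_delay_ms: 1_000,
            backoff_multiplier: 2.0,
            max_delay_ms: 30_000,
        }
    }
}

impl RetryConfig {
    /// Delay before retry number `attempt` (0-based), capped at max_delay_ms.
    /// Jitter, if any, would be applied by the caller after this ceiling.
    pub fn delay_ms(&self, attempt: u32) -> u64 {
        let raw = self.initial_delay_ms as f64 * self.backoff_multiplier.powi(attempt as i32);
        raw.min(self.max_delay_ms as f64) as u64
    }
}
```

With the defaults, successive delays are 1,000 ms, 2,000 ms, 4,000 ms, and so on until the 30,000 ms ceiling applies.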
CacheConfig / CacheStrategy
Entity: CacheConfig
enabled: bool [master switch; default: true]
strategy: CacheStrategy
Entity: CacheStrategy (enum)
Auto [provider places breakpoints automatically]
Disabled [no caching hints sent]
Manual {
cache_system: bool [cache system prompt]
cache_tools: bool [cache tool definitions]
cache_messages: bool [cache second-to-last message]
}
StreamConfig (sent to provider)
Entity: StreamConfig
model_config: ModelConfig [REQUIRED — full provider identity: id, api_key, base_url, compat, cost]
system_prompt: String
messages: Vec<Message> [LLM-only messages, Extension filtered out]
tools: Vec<ToolDefinition> [schema-only; no execute functions]
thinking_level: ThinkingLevel
max_tokens: Option<u32> [overrides model_config.max_tokens when Some]
temperature: Option<f32>
cache_config: CacheConfig
Note: model identity (id, api_key, base_url, headers, compat) is accessed via
model_config.id, model_config.api_key, etc. No top-level model or api_key fields.
ToolDefinition (sent to LLM)
Entity: ToolDefinition
name: String [matches AgentTool::name()]
description: String [matches AgentTool::description()]
parameters: JSON [JSON Schema object matching AgentTool::parameters_schema()]
Skill / SkillSet
Entity: Skill
name: String [from YAML frontmatter; skill identifier]
description: String [from YAML frontmatter; one-line capability summary]
file_path: PathBuf [absolute path to the SKILL.md file]
base_dir: PathBuf [absolute path to the skill's directory]
source: String [origin label: "dir:0", "dir:1", etc.]
Entity: SkillSet
skills: Vec<Skill>
Lifecycle: Loaded from disk at startup via SkillSet::load(dirs).
Formatted as XML via format_for_prompt() and appended to system prompt.
Agent reads full SKILL.md on-demand when activating a skill via read_file tool.
QueueMode
Entity: QueueMode (enum) — controls steering/follow-up queue delivery
OneAtATime pop and return exactly one message per call (default)
All drain and return all queued messages at once
Used in: Agent.steering_mode, Agent.follow_up_mode
McpToolInfo / McpContent
Entity: McpToolInfo — tool metadata returned by MCP server
name: String [tool identifier used in tools/call]
description: Option<String> [human-readable description; default empty string]
inputSchema: JSON [JSON Schema for the tool's parameters]
Entity: McpContent (enum) — content item in a tool call result
Variant Text:
type: "text"
text: String
Variant Image:
type: "image"
data: String [base64-encoded]
mimeType: String
Entity: McpToolCallResult
content: Vec<McpContent> [output from the tool]
isError: bool [true if the tool reported an error]
OpenApiConfig / OpenApiAuth / OperationFilter
Entity: OpenApiConfig — configuration for OpenAPI tool generation
base_url: Option<String> [overrides spec servers[0].url; trailing slash stripped]
auth: OpenApiAuth [authentication method]
custom_headers: Map<String,String> [extra headers added to every request]
max_response_bytes: usize [default: 65536 (64KB); response body truncation limit]
timeout_secs: u64 [default: 30; per-request timeout]
name_prefix: Option<String> [if set, tool names formatted as "{prefix}__{operationId}"]
Entity: OpenApiAuth (enum)
None [no authentication]
Bearer(token: String) [Authorization: Bearer {token}]
ApiKey { header: String, value: String } [custom header: {header}: {value}]
Note: Bearer token and ApiKey value are redacted as "****" in debug output.
Entity: OperationFilter (enum) — controls which API operations become tools
All [include all operations that have an operationId]
ByOperationId(Vec<String>) [include only operations whose id is in the list]
ByTag(Vec<String>) [include operations tagged with any listed tag]
ByPathPrefix(String) [include operations whose path starts with the prefix]
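The filter semantics can be illustrated with a small matcher. The `Operation` struct below is a hypothetical stand-in for a parsed spec operation, and `matches` is an assumed method name; operations without an `operationId` are rejected regardless of the filter, as described for the OpenAPI integration.

```rust
pub enum OperationFilter {
    All,
    ByOperationId(Vec<String>),
    ByTag(Vec<String>),
    ByPathPrefix(String),
}

/// Hypothetical, minimal view of one OpenAPI operation.
pub struct Operation {
    pub operation_id: Option<String>,
    pub tags: Vec<String>,
    pub path: String,
}

impl OperationFilter {
    /// True when the operation should become a tool.
    pub fn matches(&self, op: &Operation) -> bool {
        // Operations without an operationId are always skipped.
        let Some(id) = op.operation_id.as_deref() else {
            return false;
        };
        match self {
            OperationFilter::All => true,
            OperationFilter::ByOperationId(ids) => ids.iter().any(|i| i.as_str() == id),
            OperationFilter::ByTag(tags) => op.tags.iter().any(|t| tags.contains(t)),
            OperationFilter::ByPathPrefix(prefix) => op.path.starts_with(prefix.as_str()),
        }
    }
}
```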
Session / LoopRecord / SessionRecorder
Entity: Session
session_id: String
agent_id: String
created_at: DateTime<Utc>
last_active_at: DateTime<Utc>
formation: SessionFormation [Explicit | FirstLoop | InactivityTimeout{..}]
parent_spawn_ref: Option<SpawnRef> [set when this session was a sub-agent spawn]
loops: Vec<LoopRecord> [ordered by started_at]
Methods: root_loops(), children_of(loop_id), parallel_siblings(loop_id),
get_loop(loop_id), total_usage()
Entity: LoopRecord
loop_id: String
session_id: String
agent_id: String
parent_loop_id: Option<String>
continuation_kind: Option<ContinuationKind>
started_at: DateTime<Utc>
ended_at: Option<DateTime<Utc>>
status: LoopStatus [Pending | Running | Completed | Rejected | Aborted]
rejection: Option<String>
config: Option<LoopConfigSnapshot> [model, provider, config_id + name, api, base_url, reasoning, context_window, max_tokens, thinking_level, temperature]
messages: Vec<AgentMessage> [from AgentEnd.messages — authoritative]
usage: Usage
metadata: Option<JSON>
events: Vec<LoopEvent> [full event stream; MessageUpdate opt-in]
children_loop_ids: Vec<String> [same-session direct children]
child_loop_refs: Vec<ChildLoopRef> [cross-session sub-agent spawn links]
parallel_group: Option<ParallelGroupRecord>
Entity: ChildLoopRef — outbound cross-session link on the parent LoopRecord
tool_call_id: String
tool_name: String
child_loop_id: String
child_session_id: String
Entity: SpawnRef — inbound cross-session link on the child Session
parent_session_id: String
parent_loop_id: String
tool_call_id: String
tool_name: String
Entity: ParallelGroupRecord
all_loop_ids: Vec<String> [all branch loop_ids in config order]
selected_loop_id: String
selected_config_index: usize
evaluation_usage: Usage
is_selected: bool [true only on the winner's LoopRecord]
Entity: SessionRecorderConfig
formation_policy: SessionFormationPolicy [PerSessionId | InactivityTimeout{secs}]
include_streaming_events: bool [default: false — excludes MessageUpdate]
5. Integration Contracts
Anthropic Messages API
- Endpoint: `https://api.anthropic.com/v1/messages`
- Auth (standard): `x-api-key: {ANTHROPIC_API_KEY}` + `anthropic-version: 2023-06-01`
- Auth (OAuth): `authorization: Bearer {TOKEN}` + beta headers `claude-code-20250219`, `oauth-2025-04-20`, `fine-grained-tool-streaming-2025-05-14`; `x-app: cli`; `anthropic-dangerous-direct-browser-access: true`; `user-agent: claude-cli/2.1.2`
- Request: POST JSON with `model`, `system` (array of text blocks), `messages`, `tools`, `max_tokens` (default 8192), `stream: true`
- Response: Server-Sent Events stream; events: `message_start`, `content_block_start`, `content_block_delta`, `message_delta`, `message_stop`
- Tool args: Streamed as `InputJsonDelta` text fragments; buffered in `arguments["__partial_json"]`; parsed as complete JSON on `content_block_stop`
- Thinking: `ThinkingLevel` mapped to `{type:"enabled", budget_tokens: N}`: Minimal→128, Low→512, Medium→2048, High→8192
- Prompt caching: `cache_control: {type: "ephemeral"}` placed at system/last-tool-def/second-to-last-message per `CacheStrategy`
- Content format: `{type: "text"|"image"|"thinking"|"tool_use"|"tool_result", ...}`
- Tool results: Role "user", type "tool_result", fields: `tool_use_id`, `content`, `is_error`
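The thinking-level mapping listed above can be expressed as a lookup. This is a sketch only: the function name is hypothetical, and treating `Off` as "omit the thinking block" (`None`) is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ThinkingLevel {
    Off,
    Minimal,
    Low,
    Medium,
    High,
}

/// Returns the `budget_tokens` value for the Anthropic thinking block,
/// or None when thinking is off (assumed to mean the block is omitted).
pub fn anthropic_budget_tokens(level: ThinkingLevel) -> Option<u32> {
    match level {
        ThinkingLevel::Off => None,
        ThinkingLevel::Minimal => Some(128),
        ThinkingLevel::Low => Some(512),
        ThinkingLevel::Medium => Some(2048),
        ThinkingLevel::High => Some(8192),
    }
}
```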
OpenAI-Compatible APIs (Chat Completions)
- Endpoints: `https://api.openai.com/v1/chat/completions` and 14+ compatible bases (xAI/Grok, Groq, Cerebras, Mistral, DeepSeek, etc.)
- Auth: `Authorization: Bearer {API_KEY}`
- Request: POST JSON with `model`, `messages`, `tools`, `stream: true`, `stream_options: {include_usage: true}`
- max_tokens field name: `"max_tokens"` (most) or `"max_completion_tokens"` (OpenAI), controlled by `OpenAiCompat.max_tokens_field`
- System prompt: First message with role `"system"` or `"developer"` (OpenAI), controlled by `supports_developer_role`
- Thinking: `reasoning_effort: "low"|"medium"|"high"` if `supports_reasoning_effort`; response in `delta.reasoning_content` (OpenAI) or `delta.reasoning` (xAI)
- Response: SSE stream; each chunk has `choices[0].delta`; tool args in `delta.tool_calls[].function.arguments` (incremental JSON string)
OpenAI Responses API
- Endpoint: `{base_url}/responses`
- Auth: `Authorization: Bearer {OPENAI_API_KEY}`
- System prompt: `"instructions"` field (not `"messages"`)
- Message format: Different from Chat Completions; see Bedrock/Responses comparison below
- Thinking: `"reasoning": {effort: "low"|"medium"|"high"}` field
- SSE events: `response.output_text.delta`, `response.reasoning.delta`, `response.function_call_arguments.start/delta/done`, `response.completed`
Azure OpenAI
- Endpoint: `{base_url}/responses?api-version=2025-01-01-preview` (base_url pattern: `https://{resource}.openai.azure.com/openai/deployments/{deployment}`)
- Auth: `api-key: {AZURE_OPENAI_API_KEY}` header (not `Authorization: Bearer`)
- Request/Response: Same format as OpenAI Responses API
Google Generative AI (Gemini)
- Endpoint: `{base_url}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={API_KEY}`
- Auth: API key as URL query parameter `?key=`; no Authorization header
- System prompt: `"systemInstruction": {parts: [{text: "..."}]}`
- Tools: Single object `{functionDeclarations: [...]}` wrapping all tool definitions
- Contents: Role "user" or "model"; ToolResults sent as `{role:"user", parts:[{functionResponse:{name, response:{result: text}}}]}`
- Tool args: Delivered complete in one event (no streaming deltas); tool IDs auto-generated as `"google-fc-{index}"`
- Response parsing: Custom SSE parser (not standard library); splits on `\n\n`, extracts `data:` line
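The response-parsing rule above (split on blank lines, take the `data:` line) might look like the following. The function name `sse_data_payloads` is hypothetical, and the sketch assumes LF line endings and a fully buffered input; a real streaming parser would also need to carry incomplete events across chunks.

```rust
/// Extracts the payload of the first `data:` line from each SSE event.
/// Assumptions: events are separated by "\n\n" and the buffer is complete.
pub fn sse_data_payloads(buffer: &str) -> Vec<String> {
    buffer
        .split("\n\n")
        .filter_map(|event| {
            event
                .lines()
                .find_map(|line| line.strip_prefix("data:"))
                .map(|rest| rest.trim().to_string())
        })
        .filter(|payload| !payload.is_empty())
        .collect()
}
```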
Google Vertex AI
- Endpoint: `https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/google/models/{model}:streamGenerateContent?alt=sse`
- Auth: `Authorization: Bearer {OAUTH_TOKEN}` (OAuth2, not API key in URL)
- Request/Response: Identical to Google Generative AI; tool IDs generated as `"vertex-fc-{index}"`
Amazon Bedrock (ConverseStream)
- Endpoint: `{base_url}/model/{model}/converse-stream` (base_url: `https://bedrock-runtime.{region}.amazonaws.com`)
- Auth: `Authorization: Bearer {token}` or custom headers from `model_config.headers`; minimal SigV4 support
- System prompt: `"system"` array: `[{text: "..."}]`
- Tools: `toolConfig.tools`: `[{toolSpec: {name, description, inputSchema: {json: schema}}}]`
- Tool results: `{toolResult: {toolUseId, content: [...], status: "success"|"error"}}`
- Streaming format: Newline-delimited JSON (not standard SSE); events: `contentBlockDelta`, `contentBlockStart`, `contentBlockStop`, `messageStop`, `metadata`
Model Context Protocol (MCP)
- Protocol: JSON-RPC 2.0
- Message types:
  - Request: `{jsonrpc:"2.0", id:u64, method:String, params:Option<Value>}`
  - Response: `{jsonrpc:"2.0", id:Option<u64>, result:Option<Value>, error:Option<{code:i64,message:String,data?}>}`
  - Request IDs: auto-incremented `AtomicU64` starting at 1
- Initialization handshake (3 steps):
  - Client sends `initialize` with `{protocolVersion:"2024-11-05", capabilities:{}, clientInfo:{name:"phi-core",version:"<pkg>"}}`
  - Server responds with `{protocolVersion, capabilities:{tools?,resources?,prompts?}, serverInfo:{name,version}}`
  - Client sends `notifications/initialized` notification (no params; server may ignore id)
- Tool discovery: Client sends `tools/list` → server returns `{tools: [{name, description?, inputSchema}]}`
- Tool execution: Client sends `tools/call {name, arguments}` → server returns `{content:[{type:"text",text}|{type:"image",data,mimeType}], isError:bool}`
- Stdio transport: Spawns child process; newline-delimited JSON over stdin/stdout; `tokio::sync::Mutex` for concurrent access; shutdown: EOF on stdin then kill child
- HTTP transport: POST JSON-RPC body to configured URL; stateless (no persistent connection)
- Tool adapter: `McpToolAdapter` wraps `McpToolInfo` + `Arc<Mutex<McpClient>>`; optional `prefix` for namespace disambiguation (`{prefix}__{name}`)
- Error enum: `Transport(String)`, `Protocol(String)`, `JsonRpc{code,message}`, `Serialization`, `Io`, `ConnectionClosed`
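To make the request envelope and ID scheme concrete, here is a std-only sketch. `RequestIds` and `json_rpc_request` are hypothetical names; the JSON is assembled by hand only to keep the example dependency-free, and it assumes `method` needs no escaping and `params_json` is already valid JSON.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hands out request IDs starting at 1, as described above.
pub struct RequestIds(AtomicU64);

impl RequestIds {
    pub fn new() -> Self {
        Self(AtomicU64::new(1))
    }

    /// Returns the current ID and advances the counter.
    pub fn next_id(&self) -> u64 {
        self.0.fetch_add(1, Ordering::SeqCst)
    }
}

/// Builds a JSON-RPC 2.0 request line.
/// Assumptions: `method` contains no characters that need JSON escaping,
/// and `params_json`, when given, is a pre-serialized JSON value.
pub fn json_rpc_request(id: u64, method: &str, params_json: Option<&str>) -> String {
    match params_json {
        Some(p) => format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{p}}}"#),
        None => format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}"}}"#),
    }
}
```

A client would typically call `next_id()` once per request and write the resulting line, followed by a newline, to the stdio transport.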
OpenAPI
- Spec formats: OpenAPI 3.x; auto-detected: first non-whitespace char `{` or `[` → JSON, else YAML
- Sources: `from_file(path)` (async read), `from_url(url)` (HTTP GET via reqwest), `from_str(text)` (in-memory)
- Base URL resolution: `config.base_url` → `spec.servers[0].url` → error if neither set; trailing slashes stripped
- Parameter classification:
  - `Path` parameters → URL `{param}` substitution (RFC 3986 percent-encoding); required
  - `Query` parameters → `.query()` chains; optional
  - `Header` parameters → `.header()` chains; optional
  - `Cookie` parameters → skipped (unsupported)
  - `RequestBody` (application/json only) → keyed as `"body"` (or `"_request_body"` on collision); required if `requestBody.required`
- HTTP execution pipeline (per tool call):
  - Validate params is object (or null treated as `{}`)
  - Substitute path params with percent-encoded values; error if any missing
  - Build URL: `{base_url}{path}`
  - Chain `.query()` for query params present in input
  - Chain `.header()` for header params present in input
  - Apply auth: `Bearer` → `.bearer_auth()`, `ApiKey` → `.header(header, value)`, `None` → nothing
  - Apply `custom_headers`
  - If `has_body`: `.json(params["body"])`
  - Send request; read full body text; truncate to `max_response_bytes` at UTF-8 boundary
  - Return: `"{METHOD} {URL} → {STATUS_CODE}\n\n{BODY}"`
- Operation filter: `OperationFilter::All|ByOperationId|ByTag|ByPathPrefix`; operations without `operationId` always skipped with warning
- Tool naming: Default = `operationId`; with prefix = `{prefix}__{operationId}`
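Three of the smaller rules above (format auto-detection, UTF-8-safe truncation, and tool naming) are easy to get subtly wrong, so a sketch may help. All three function names are hypothetical and the code uses only the standard library.

```rust
#[derive(Debug, PartialEq)]
pub enum SpecFormat {
    Json,
    Yaml,
}

/// First non-whitespace char `{` or `[` means JSON; anything else is YAML.
pub fn detect_format(text: &str) -> SpecFormat {
    match text.trim_start().chars().next() {
        Some('{') | Some('[') => SpecFormat::Json,
        _ => SpecFormat::Yaml,
    }
}

/// Truncates to at most `max_bytes`, backing up to a UTF-8 char boundary
/// so a multi-byte character is never split.
pub fn truncate_utf8(body: &str, max_bytes: usize) -> &str {
    if body.len() <= max_bytes {
        return body;
    }
    let mut end = max_bytes;
    while !body.is_char_boundary(end) {
        end -= 1; // index 0 is always a boundary, so this terminates
    }
    &body[..end]
}

/// Default = operationId; with prefix = "{prefix}__{operationId}".
pub fn tool_name(prefix: Option<&str>, operation_id: &str) -> String {
    match prefix {
        Some(p) => format!("{p}__{operation_id}"),
        None => operation_id.to_string(),
    }
}
```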
File System
- Read: `tokio::fs::read_to_string` for text (max 1MB), `tokio::fs::read` for images (max 20MB)
- Write: `tokio::fs::write` with automatic parent dir creation
- Edit: Read → string replace (exact match, once) → write
- List: Spawns `find` command via BashTool
- Search: Spawns `grep` or `rg` command via BashTool
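The edit step (exact match, replaced once) could be sketched as a pure function over the file contents. `edit_once` is a hypothetical name, and the behaviour when the text occurs more than once (here: only the first occurrence is replaced) is an assumption; the surrounding read and write calls are omitted.

```rust
/// Replaces the first exact occurrence of `old` with `new`.
/// Assumptions: a missing or empty `old` is an error, and later
/// occurrences are left untouched.
pub fn edit_once(content: &str, old: &str, new: &str) -> Result<String, String> {
    if old.is_empty() {
        return Err("old text must not be empty".to_string());
    }
    if !content.contains(old) {
        return Err("old text not found".to_string());
    }
    Ok(content.replacen(old, new, 1))
}
```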
Shell
- Execution: `tokio::process::Command::new("bash").arg("-c").arg(command)`
- Timeout: `tokio::time::sleep` with default 120s, configurable
- Output capture: `stdout` + `stderr` piped, truncated at 256KB each
- Safety: Deny patterns checked before execution (substring match)
- Exit code: Returned in `ToolResult.details.exit_code`; tool always returns Ok (non-zero is not a ToolError)
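The substring-based safety check can be sketched in a few lines. The function name `denied_by` is hypothetical, and the patterns used in any call are the caller's own; no built-in pattern list is implied here.

```rust
/// Returns the first deny pattern that occurs as a substring of `command`,
/// or None when the command may run.
pub fn denied_by<'a>(command: &str, deny_patterns: &[&'a str]) -> Option<&'a str> {
    deny_patterns
        .iter()
        .copied()
        .find(|pattern| command.contains(*pattern))
}
```

Because this is a plain substring match, it is a coarse guard: a pattern such as `rm -rf /` also matches `rm -rf /tmp/build`, so patterns need to be chosen with that in mind.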
6. State Management
Agent-Level State (in Agent struct)
All fields on Agent:
| Field | Type | Notes |
|---|---|---|
| `system_prompt` | `String` | Immutable once set; injected into every LLM call |
| `model` | `String` | Model identifier |
| `api_key` | `String` | API authentication key |
| `thinking_level` | `ThinkingLevel` | Off/Minimal/Low/Medium/High |
| `max_tokens` | `Option<u32>` | Max completion tokens |
| `temperature` | `Option<f32>` | Sampling temperature |
| `model_config` | `Option<ModelConfig>` | Provider-specific extras (base_url, headers, compat flags) |
| `messages` | `Vec<AgentMessage>` | Grows on each prompt() call; reset by reset(); replaced by restore_messages() |
| `tools` | `Vec<Box<dyn AgentTool>>` | Tool instances (heap-allocated trait objects) |
| `provider` | `Box<dyn StreamProvider>` | Boxed, not Arc; owned exclusively by Agent |
| `steering_queue` | `Arc<Mutex<Vec<AgentMessage>>>` | Written by steer(), drained by agent loop before each tool execution check |
| `follow_up_queue` | `Arc<Mutex<Vec<AgentMessage>>>` | Written by follow_up(), drained when agent loop would stop |
| `steering_mode` | `QueueMode` | Default: OneAtATime |
| `follow_up_mode` | `QueueMode` | Default: OneAtATime |
| `context_config` | `Option<ContextConfig>` | If None, context compaction is disabled |
| `execution_limits` | `Option<ExecutionLimits>` | If None, no hard limits enforced |
| `cache_config` | `CacheConfig` | Prompt caching hints (Anthropic) |
| `tool_execution` | `ToolExecutionStrategy` | Parallel (default), Sequential, or Batched |
| `retry_config` | `RetryConfig` | Backoff for RateLimited/Network errors |
| `before_turn` | `Option<BeforeTurnFn>` | Signature: `fn(&[AgentMessage], turn_number: usize) -> bool`; return false to abort |
| `after_turn` | `Option<AfterTurnFn>` | Signature: `fn(&[AgentMessage], &Usage)` |
| `on_error` | `Option<OnErrorFn>` | Signature: `fn(&str)` |
| `input_filters` | `Vec<Arc<dyn InputFilter>>` | Applied in order before LLM call |
| (compaction strategies) | (moved to ContextConfig.compaction) | in_memory_strategy and block_strategy fields on CompactionConfig (G5) |
| `cancel` | `Option<CancellationToken>` | Created when prompt() starts, consumed by abort() |
| `is_streaming` | `bool` | Set true on prompt() entry, false on exit |
| `agent_id` | `String` | UUID v4 generated once at Agent::new(); stable for the Agent's lifetime. Injected into every AgentContext built by this agent. |
| `session_id` | `String` | UUID v4 generated once at Agent::new(); groups all loops under one session. Stable for the Agent's lifetime. |
| `loop_counters` | `HashMap<String, usize>` | Per-`"{session_id}.{config_id}"` monotonic counter; incremented by next_loop_id() to produce the N component of loop_id. |
| `last_loop_id` | `Option<String>` | loop_id of the most recently started loop; set after each `prompt_*` or `continue_loop_*` call. Becomes parent_loop_id on the next continuation. |
| `before_loop` | `Option<BeforeLoopFn>` | Hook called once before AgentStart. Signature: `fn(&[AgentMessage], loop_index: usize) -> bool`; return false to abort before AgentStart. |
| `after_loop` | `Option<AfterLoopFn>` | Hook called once after AgentEnd. Signature: `fn(&[AgentMessage], &Usage)`. |
| `before_tool_execution` | `Option<BeforeToolExecutionFn>` | Hook called before each ToolExecutionStart. Signature: `fn(&str, &str, &JSON) -> bool` (tool_name, call_id, args); return false to skip. |
| `after_tool_execution` | `Option<AfterToolExecutionFn>` | Hook called after each ToolExecutionEnd. Signature: `fn(&str, &str, bool)` (tool_name, call_id, is_error). |
| `before_tool_execution_update` | `Option<BeforeToolExecutionUpdateFn>` | Hook called before each ToolExecutionUpdate. Signature: `fn(&str, &str, &str) -> bool` (tool_name, call_id, text); return false to suppress the event. |
| `after_tool_execution_update` | `Option<AfterToolExecutionUpdateFn>` | Hook called after each ToolExecutionUpdate (only when not suppressed). Signature: `fn(&str, &str, &str)`. |
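The interaction between `session_id`, `loop_counters`, and the `"{session_id}.{config_id}.{N}"` format can be sketched as follows. `LoopIdGenerator` is a hypothetical stand-in for the relevant `Agent` fields, and starting N at 1 is an assumption; the table only states that the counter is monotonic.

```rust
use std::collections::HashMap;

/// Hypothetical holder for the two Agent fields involved in loop_id generation.
pub struct LoopIdGenerator {
    session_id: String,
    loop_counters: HashMap<String, usize>,
}

impl LoopIdGenerator {
    pub fn new(session_id: &str) -> Self {
        Self { session_id: session_id.to_string(), loop_counters: HashMap::new() }
    }

    /// Produces "{session_id}.{config_id}.{N}" with a per-key counter.
    /// Assumption: the first loop for a given key gets N = 1.
    pub fn next_loop_id(&mut self, config_id: &str) -> String {
        let key = format!("{}.{}", self.session_id, config_id);
        let counter = self.loop_counters.entry(key.clone()).or_insert(0);
        *counter += 1;
        let n = *counter;
        format!("{key}.{n}")
    }
}
```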
Invariants:
- `assert!(!self.is_streaming)` fires if `prompt()` is called while already running; callers must use `steer()` or `follow_up()` during active runs
- `cancel` is always `Some` while `is_streaming` is true
- `messages` must not end in an `Assistant` message before `agent_loop_continue()` is called
- `agent_id` and `session_id` are always `Some` in any `AgentContext` built by `Agent`; direct callers of `agent_loop_continue` must also set them
AgentContext (per-run, passed into agent loop)
| State Element | Type | Description |
|---|---|---|
| `system_prompt` | `String` | Immutable for the duration of the run |
| `messages` | `Vec<AgentMessage>` | Mutated in-place: prompts appended, assistant messages appended, tool results appended; may be replaced by compaction |
| `tools` | `&[Box<dyn AgentTool>]` | Immutable for the duration of the run |
| `agent_id` | `Option<String>` | Stable agent instance ID. Set by `Agent::prompt_*`; also written back by agent_loop when None. Required (non-None) for agent_loop_continue. |
| `session_id` | `Option<String>` | Stable session ID. Same lifecycle as agent_id. |
| `loop_id` | `Option<String>` | Per-call identifier of the form `"{session_id}.{config_id}.{N}"`. Set by Agent before calling agent_loop/agent_loop_continue; falls back to UUID if None at loop entry. |
| `parent_loop_id` | `Option<String>` | loop_id of the loop this call continues from. None for origin calls. Set by Agent::continue_loop_with_sender to Agent.last_loop_id. |
| `continuation_kind` | `Option<ContinuationKind>` | How this call relates to prior loops. None for origin; Some(Default/Rerun/Branch) for continuations. |
ExecutionTracker (per-run)
| State | Initial | Transitions |
|---|---|---|
| `turns` | 0 | Incremented after each LLM call |
| `tokens_used` | 0 | Incremented by token count of each LLM response |
| `started_at` | `Instant::now()` | Immutable; compared against max_duration on each check |
Steering/Follow-up Queue Modes
- `QueueMode::OneAtATime` (default for both queues): on each read, lock mutex, pop the first message only, return as `Vec` of 1
- `QueueMode::All`: on each read, lock mutex, drain all queued messages, return the full vec
Both queues are passed to AgentLoopConfig as closures (get_steering_messages, get_follow_up_messages) that capture the Arc<Mutex<>> pointer, enabling external callers to enqueue messages while the agent loop is running on another task.
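The two delivery modes can be sketched against a shared `Arc<Mutex<Vec<T>>>`. The free function `take_messages` is a hypothetical name standing in for the body of the closures described above, and `std::sync::Mutex` is used to keep the example dependency-free.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum QueueMode {
    OneAtATime,
    All,
}

/// Reads from a shared queue according to the delivery mode.
pub fn take_messages<T>(queue: &Arc<Mutex<Vec<T>>>, mode: QueueMode) -> Vec<T> {
    let mut guard = queue.lock().unwrap();
    let taken = match mode {
        // Pop only the oldest message, returned as a Vec of at most one.
        QueueMode::OneAtATime => {
            if guard.is_empty() {
                Vec::new()
            } else {
                vec![guard.remove(0)]
            }
        }
        // Drain everything that is currently queued.
        QueueMode::All => guard.drain(..).collect(),
    };
    taken
}
```

A producer on another task can hold a clone of the same `Arc` and push messages while the loop is running; each read then sees whatever has been queued so far.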
Event Hook Ordering
All hooks fire in a guaranteed strict order relative to their paired events. This ordering is enforced at runtime and is an invariant of the system:
before_loop → AgentStart
before_turn → TurnStart
[MessageStart/End for initial prompts — first turn of agent_loop() only]
[MessageStart/End for injected steering messages]
[LLM: MessageStart → MessageUpdate* → MessageEnd]
[per tool call:]
before_tool_execution → ToolExecutionStart
(before_tool_execution_update → ToolExecutionUpdate → after_tool_execution_update)*
ToolExecutionEnd → after_tool_execution
TurnEnd → after_turn
(repeat inner block for each follow-up / steering-triggered turn)
AgentEnd → after_loop
Short-circuit rules — hook returns false:
| Hook | When false is returned | Behaviour |
|---|---|---|
| `before_loop` | Before AgentStart | Loop is aborted; `AgentEnd { messages: [] }` is emitted; function returns immediately |
| `before_turn` | Before TurnStart | Turn is skipped; TurnStart/TurnEnd are not emitted; AgentEnd is not guaranteed |
| `before_tool_execution` | Before ToolExecutionStart | Tool call is skipped; ToolExecutionStart/End are not emitted; a skipped error ToolResult is returned to the LLM |
| `before_tool_execution_update` | Before ToolExecutionUpdate | Event is suppressed; after_tool_execution_update is not called; tool keeps running and final ToolResult is unaffected |
7. Error Handling Strategy
Provider Errors (ProviderError)
| Error | Retryable | Handling |
|---|---|---|
| `RateLimited { retry_after_ms }` | Yes | Exponential backoff; respects Retry-After header if present |
| `Network(msg)` | Yes | Exponential backoff |
| `Auth(msg)` | No | Propagated immediately as StopReason::Error message |
| `Api(msg)` | No | Propagated as StopReason::Error message |
| `ContextOverflow { msg }` | No | Detected on HTTP 400/413; triggers compaction on next turn (see below) |
| `Cancelled` | No | Loop exits cleanly, AgentEnd emitted |
| `Other(msg)` | No | Propagated as StopReason::Error message |
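The retry column of this table maps naturally onto a pair of helper methods. In the sketch below the variant payload types (for example `retry_after_ms: Option<u64>`) and the method names are assumptions; `backoff_ms` stands for whatever delay the retry configuration computes for the current attempt.

```rust
#[derive(Debug)]
pub enum ProviderError {
    RateLimited { retry_after_ms: Option<u64> },
    Network(String),
    Auth(String),
    Api(String),
    ContextOverflow { msg: String },
    Cancelled,
    Other(String),
}

impl ProviderError {
    /// Only rate limits and network failures are retried.
    pub fn is_retryable(&self) -> bool {
        matches!(self, ProviderError::RateLimited { .. } | ProviderError::Network(_))
    }

    /// Delay before the next attempt, or None for non-retryable errors.
    /// A server-provided Retry-After value takes precedence over backoff.
    pub fn retry_delay_ms(&self, backoff_ms: u64) -> Option<u64> {
        match self {
            ProviderError::RateLimited { retry_after_ms } => {
                Some(retry_after_ms.unwrap_or(backoff_ms))
            }
            ProviderError::Network(_) => Some(backoff_ms),
            _ => None,
        }
    }
}
```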
Context Overflow Recovery
- Provider returns HTTP 400/413 matching any of 15+ known overflow phrases.
- `ProviderError::classify()` returns `ContextOverflow`.
- The overflow may arrive as an HTTP error (caught in retry loop) or as a streaming error event (`StreamEvent::Error` with matching message), caught by `Message::is_context_overflow()`.
- On the next turn, if `context_config` is set, `compact_messages()` is called before the LLM call.
- If no `context_config` is set, the error message is included in conversation history and the loop continues; the LLM may self-recover or the next turn will also fail.
Tool Errors (ToolError)
- `Cancelled`: Tool execution skipped; `ToolResult` content = "Skipped due to queued user message." with `is_error: true`
- `Failed(msg)`: Converted to `ToolResult` with error text; `is_error: true`; always returned to LLM so it can self-correct
- `InvalidArgs(msg)`: Same as Failed; LLM can retry with corrected parameters
- `NotFound(msg)`: Produced when tool name in `ToolCall` has no matching `AgentTool`; same handling as Failed
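The conversion from a `ToolError` into an error result for the LLM might be sketched like this. The `ErrorResult` struct is a simplified, hypothetical stand-in for a text-only `ToolResult`, and the message prefixes for `Failed`, `NotFound`, and `InvalidArgs` are illustrative; only the `Cancelled` text is taken from the description above.

```rust
use std::fmt;

#[derive(Debug)]
pub enum ToolError {
    Failed(String),
    NotFound(String),
    InvalidArgs(String),
    Cancelled,
}

impl fmt::Display for ToolError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Prefixes below are illustrative, not the library's wording.
            ToolError::Failed(msg) => write!(f, "Tool failed: {msg}"),
            ToolError::NotFound(name) => write!(f, "Tool not found: {name}"),
            ToolError::InvalidArgs(msg) => write!(f, "Invalid arguments: {msg}"),
            ToolError::Cancelled => write!(f, "Skipped due to queued user message."),
        }
    }
}

/// Hypothetical text-only stand-in for a ToolResult returned to the LLM.
pub struct ErrorResult {
    pub text: String,
    pub is_error: bool,
}

/// Every ToolError becomes an error result so the model can self-correct.
pub fn error_to_result(err: &ToolError) -> ErrorResult {
    ErrorResult { text: err.to_string(), is_error: true }
}
```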
Input Filter Errors
- `Reject(reason)`: Emits `AgentEvent::InputRejected`, immediately emits `AgentEvent::AgentEnd { messages: [] }`, returns empty message list
- `Warn(msg)`: Warning text appended to last user message content; loop continues
Execution Limit Exhaustion
- When any limit is exceeded, a synthetic user message `[Agent stopped: {reason}]` is appended to context and emitted as events.
- Loop returns immediately after appending the message.
- No error is thrown; `AgentEnd` is emitted normally.
Before-Turn Abort
- If `before_turn` callback returns `false`, the loop returns immediately with no `AgentEnd` emitted.
- This is the only path where `AgentEnd` is not guaranteed.
Error Propagation Across Components
Provider → ProviderError → stream_assistant_response() → Message{stop_reason: Error}
↓
on_error callback invoked
↓
AgentEvent::TurnEnd emitted
↓
agent loop returns
Implementation Roadmap
Generated from: `../reference/glossary.md`, `../specs/architecture.md`, `../architecture/algorithms.md`

Last updated: 2026-03-17

Paradigm: Language-agnostic / Implementation-independent
This roadmap defines six progressive stages of implementation derived from the reverse-engineered specification. Each level is a complete, testable stage. Complete and stabilize each level fully before advancing to the next.
Level 1 — Survive
Goal: The system can start, load configuration, initialize its core structures, and confirm it is alive. Nothing works end-to-end yet, but nothing crashes either.
Completion Criteria: A smoke test confirms the Agent can be constructed with a MockProvider, configured via builder methods, and all core data entities can be instantiated without error. No LLM call is required to pass Level 1.
Milestone 1.1 — Core Type System
- REQ-001: Define the `Content` enum with four variants: `Text { text }`, `Image { data: base64, mime_type }`, `Thinking { thinking, signature }`, and `ToolCall { id, name, arguments }`. Serialized with a `"type"` discriminant field. (Source: [AR])
  - Depends on: —
  - Definition of Done: All four variants instantiate; round-trip JSON serialization produces the correct tagged shape.
- REQ-002: Define the `Message` enum with three variants: `User { content, timestamp }`, `Assistant { content, stop_reason, model, provider, usage, timestamp, error_message }`, and `ToolResult { tool_call_id, tool_name, content, is_error, timestamp }`. (Source: [AR])
  - Depends on: REQ-001, REQ-005, REQ-006
  - Definition of Done: All three variants instantiate; serialization preserves the `role` field with values `"user"`, `"assistant"`, `"toolResult"`.
- REQ-003: Define `AgentMessage` as an untagged enum wrapping `Llm(LlmMessage)` and `Extension(ExtensionMessage)`. (Source: [AR])
  - Depends on: REQ-002, REQ-004
  - Definition of Done: Both variants serialize/deserialize correctly; an `Extension` variant round-trips without loss.
- REQ-004: Define `ExtensionMessage` with fields `role: String` (always `"extension"`), `kind: String`, and `data: JSON`. (Source: [AR])
  - Depends on: —
  - Definition of Done: Instantiates and serializes to `{role:"extension", kind:"...", data:{...}}`.
- REQ-005: Define `StopReason` enum with variants `Stop`, `Length`, `ToolUse`, `Error`, `Aborted`. Serialized in camelCase. (Source: [AR])
  - Depends on: —
  - Definition of Done: All variants serialize to their documented camelCase strings.
- REQ-006: Define `Usage` struct with fields `input`, `output`, `cache_read`, `cache_write`, `total_tokens` (all `u64`). Include a `cache_hit_rate()` derived method. (Source: [AR])
  - Depends on: —
  - Definition of Done: `cache_hit_rate()` returns `cache_read / (input + cache_read + cache_write)`.
- REQ-007: Define `AgentEvent` enum with all variants: `AgentStart`, `AgentEnd { messages }`, `TurnStart`, `TurnEnd { message, tool_results }`, `MessageStart { message }`, `MessageUpdate { message, delta }`, `MessageEnd { message }`, `ToolExecutionStart { tool_call_id, tool_name, args }`, `ToolExecutionUpdate { tool_call_id, tool_name, partial_result }`, `ToolExecutionEnd { tool_call_id, tool_name, result, is_error }`, `ProgressMessage { tool_call_id, tool_name, text }`, `InputRejected { reason }`. (Source: [AR])
  - Depends on: REQ-002, REQ-008
  - Definition of Done: All variants instantiate.
- REQ-008: Define `StreamDelta` enum with variants `Text { delta }`, `Thinking { delta }`, `ToolCallDelta { delta }`. (Source: [AR])
  - Depends on: —
  - Definition of Done: All variants instantiate and carry their string payload.
- REQ-009: Define `ToolContext` struct with fields `tool_call_id`, `tool_name`, `cancel: CancellationToken`, `on_update: Option<ToolUpdateFn>`, `on_progress: Option<ProgressFn>`. (Source: [AR])
  - Depends on: —
  - Definition of Done: Struct instantiates; callback fields accept closures/function pointers.
- REQ-010: Define `ToolResult { content: Vec<Content>, details: JSON }` and `ToolError` enum with variants `Failed(String)`, `NotFound(String)`, `InvalidArgs(String)`, `Cancelled`. (Source: [AR])
  - Depends on: REQ-001
  - Definition of Done: All variants instantiate; `ToolError` converts to a display string.
- REQ-011: Define `ContextConfig` struct with fields and defaults: `max_context_tokens` (100,000), `system_prompt_tokens` (4,000), `keep_recent` (10), `keep_first` (2), `tool_output_max_lines` (50). (Source: [AR])
  - Depends on: —
  - Definition of Done: Default construction produces the documented default values.
- REQ-012: Define `ExecutionLimits` struct with defaults `max_turns` (50), `max_total_tokens` (1,000,000), `max_duration` (600s); and `ExecutionTracker` runtime state with fields `limits`, `turns`, `tokens_used`, `started_at`. (Source: [AR])
  - Depends on: —
  - Definition of Done: `ExecutionTracker::new(limits)` initializes `turns=0`, `tokens_used=0`, `started_at=now`.
- REQ-013: Define `RetryConfig` with defaults: `max_retries` (3), `initial_delay_ms` (1,000), `backoff_multiplier` (2.0), `max_delay_ms` (30,000). (Source: [AR])
  - Depends on: —
  - Definition of Done: Default construction produces documented defaults.
- REQ-014: Define `CacheConfig { enabled: bool, strategy: CacheStrategy }` and `CacheStrategy` enum with variants `Auto`, `Disabled`, `Manual { cache_system, cache_tools, cache_messages }`. (Source: [AR])
  - Depends on: —
  - Definition of Done: All variants instantiate; default `CacheConfig` has `enabled: true`, `strategy: Auto`.
-
REQ-015: Define
StreamConfigstruct with fieldsmodel,system_prompt,messages: Vec<Message>,tools: Vec<ToolDefinition>,thinking_level,api_key,max_tokens,temperature,model_config,cache_config. (Source: [AR])- Depends on: REQ-014, REQ-016
- Definition of Done: Struct instantiates with all optional fields as
None.
-
REQ-016: Define
ToolDefinitionstruct with fieldsname,description,parameters: JSON. (Source: [AR])- Depends on: —
- Definition of Done: Struct instantiates and serializes to the expected JSON shape.
-
REQ-017: Define
QueueModeenum with variantsOneAtATimeandAll. (Source: [AR])- Depends on: —
- Definition of Done: Both variants exist; default is
OneAtATime.
-
REQ-018: All types in the
AgentMessagetree deriveSerializeandDeserialize. (Source: [OV])- Depends on: REQ-001 through REQ-017
- Definition of Done: Full round-trip JSON serialization of a
Vec<AgentMessage>containing all message types is lossless.
-
REQ-019: Define
ThinkingLevelenum with variantsOff,Minimal,Low,Medium,High. (Source: [OV])- Depends on: —
- Definition of Done: All variants exist.
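As an illustration of REQ-006, the derived method on `Usage` could look roughly like the sketch below. The `f64` return type and the behaviour when the denominator is zero are assumptions; the requirement only fixes the ratio itself.

```rust
/// Token usage counters, mirroring the field list in REQ-006.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Usage {
    pub input: u64,
    pub output: u64,
    pub cache_read: u64,
    pub cache_write: u64,
    pub total_tokens: u64,
}

impl Usage {
    /// cache_read / (input + cache_read + cache_write).
    /// Assumption: returns 0.0 when the denominator is zero, to avoid a NaN.
    pub fn cache_hit_rate(&self) -> f64 {
        let denom = self.input + self.cache_read + self.cache_write;
        if denom == 0 {
            0.0
        } else {
            self.cache_read as f64 / denom as f64
        }
    }
}
```

With `input = 100` and `cache_read = 300`, the sketch yields a hit rate of 0.75.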
Milestone 1.2 — Core Traits
-
REQ-020: Define
StreamProvidertrait with a single methodstream(config: StreamConfig, tx: EventSender, cancel: CancellationToken) -> Result<Message, ProviderError>. DefineProviderErrorenum with variantsApi(String),Network(String),Auth(String),RateLimited { retry_after_ms: Option<u64> },ContextOverflow { message: String },Cancelled,Other(String). (Source: [AR])- Depends on: REQ-002, REQ-015
- Definition of Done: Trait compiles;
ProviderErrorvariants all instantiate.
-
REQ-021: Define
AgentTooltrait with methodsname() -> &str,label() -> &str,description() -> &str,parameters_schema() -> JSON,execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>. (Source: [AR])- Depends on: REQ-009, REQ-010
- Definition of Done: Trait compiles; a minimal struct can implement it.
-
REQ-022: Define
InputFiltertrait with methodfilter(text: &str) -> FilterResultwhereFilterResultisPass,Warn(String), orReject(String). (Source: [OV])- Depends on: —
- Definition of Done: Trait compiles; all three result variants exist.
-
REQ-023: Define
CompactionStrategytrait with methodcompact(messages: Vec<AgentMessage>, config: ContextConfig) -> Vec<AgentMessage>. (Source: [AR])- Depends on: REQ-003, REQ-011
- Definition of Done: Trait compiles; a struct can implement it.
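To make REQ-022 concrete, here is a minimal sketch of the `InputFilter` trait with two example filters and a helper that runs a chain of them. The `&self` receiver, the `MaxLength` and `SecretWarning` filters, and the `run_filters` helper are all hypothetical; the requirement only names the method and the three result variants.

```rust
/// Outcome of a single filter, per REQ-022.
#[derive(Debug, Clone, PartialEq)]
pub enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

/// Assumption: filters are called through a shared reference.
pub trait InputFilter {
    fn filter(&self, text: &str) -> FilterResult;
}

/// Hypothetical filter: rejects input longer than `max_len` bytes.
pub struct MaxLength {
    pub max_len: usize,
}

impl InputFilter for MaxLength {
    fn filter(&self, text: &str) -> FilterResult {
        if text.len() > self.max_len {
            FilterResult::Reject(format!("input exceeds {} bytes", self.max_len))
        } else {
            FilterResult::Pass
        }
    }
}

/// Hypothetical filter: warns when the text mentions "password".
pub struct SecretWarning;

impl InputFilter for SecretWarning {
    fn filter(&self, text: &str) -> FilterResult {
        if text.to_lowercase().contains("password") {
            FilterResult::Warn("input may contain a secret".to_string())
        } else {
            FilterResult::Pass
        }
    }
}

/// Runs filters in order. The first `Reject` stops the chain and is returned
/// as `Err`; otherwise all collected warnings are returned as `Ok`.
pub fn run_filters(filters: &[Box<dyn InputFilter>], text: &str) -> Result<Vec<String>, String> {
    let mut warnings = Vec::new();
    for f in filters {
        match f.filter(text) {
            FilterResult::Pass => {}
            FilterResult::Warn(w) => warnings.push(w),
            FilterResult::Reject(reason) => return Err(reason),
        }
    }
    Ok(warnings)
}
```

Returning warnings separately from the rejection reason keeps the caller free to decide how warnings are attached to the user message.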
Milestone 1.3 — Agent Struct Construction
-
REQ-024: Implement
BasicAgent::new(model_config: ModelConfig) -> BasicAgent. Initialize all fields to documented defaults:messages = [],tools = [],thinking_level = Off,tool_execution = Parallel,steering_mode = OneAtATime,follow_up_mode = OneAtATime,context_config = Some(default),execution_limits = Some(default),retry_config = default,is_streaming = false,cancel = None. (Source: [PS])- Depends on: REQ-011 through REQ-017, REQ-019, REQ-020
- Definition of Done:
BasicAgent::new(ModelConfig::anthropic("m", "m", "k"))compiles and all fields have their documented defaults.
-
REQ-025: Implement builder methods:
with_system_prompt(text),with_model_config(cfg),with_provider_override(provider),with_max_tokens(n),with_thinking(level). (Source: [PS])- Depends on: REQ-024
- Definition of Done: Method chain
BasicAgent::new(ModelConfig::anthropic("m", "m", "k")).with_system_prompt("x")compiles and all fields are set correctly.
-
REQ-026: Implement
with_tools(vec),with_context_config(cfg),with_execution_limits(limits),with_retry_config(cfg),with_cache_config(cfg),with_tool_execution(strategy),with_steering_mode(mode),with_follow_up_mode(mode). (Source: [PS])- Depends on: REQ-024
- Definition of Done: All builders set their respective fields;
with_toolsreplaces (or extends) the tools list.
-
REQ-027: Initialize
steering_queueandfollow_up_queueasArc<Mutex<Vec<AgentMessage>>>inBasicAgent::new. (Source: [AR])- Depends on: REQ-003, REQ-024
- Definition of Done: Both queues are non-null, independently lockable, and start empty.
Milestone 1.4 — AgentContext and AgentLoopConfig
-
REQ-028: Define
AgentContextstruct with fieldssystem_prompt: String,messages: Vec<AgentMessage>,tools: &[Box<dyn AgentTool>]. (Source: [AR])- Depends on: REQ-003, REQ-021
- Definition of Done: Struct compiles;
messagesis mutable in-place during the loop.
-
REQ-029: Define
AgentLoopConfigstruct bundling all behavioral settings:provider,model,api_key,thinking_level,max_tokens,temperature,model_config,get_steering_messages: Option<Fn()>,get_follow_up_messages: Option<Fn()>,context_config,compaction_strategy,execution_limits,cache_config,tool_execution,retry_config,before_turn,after_turn,on_error,input_filters,transform_context,convert_to_llm. (Source: [OV])- Depends on: REQ-011 through REQ-017, REQ-023
- Definition of Done: Struct compiles with all optional fields as
None.
Milestone 1.5 — MockProvider and Smoke Test
-
REQ-030: Implement
MockProviderthat implementsStreamProvider. Accepts a list of pre-configured responses to return in sequence. Returns aMessage::Assistantwithstop_reason: Stopand configurable text content. (Source: [AR])- Depends on: REQ-020
- Definition of Done:
MockProvider::new(vec![response1, response2])returns each response in order whenstream()is called; after exhausting the list, returns a default stop response.
-
REQ-031: Smoke test: construct
Agent::new(MockProvider::new([])), configure with builder methods, verify all fields are set correctly, and confirm no panic occurs. (Source: [OV])- Depends on: REQ-024 through REQ-030
- Definition of Done: Test passes with zero panics; all configured fields read back correctly.
Level 2 — Useful
Goal: The primary use cases from the spec work end-to-end on valid, well-formed inputs. An agent can accept a prompt, call an LLM, execute tool calls, and return a final response.
Completion Criteria: Every primary use case from ../reference/glossary.md executes
successfully with valid inputs and a real (or mock) provider: single-turn text
response, multi-turn tool call cycle, message persistence round-trip, and agent
reset. The built-in coding tools all execute on valid inputs.
Milestone 2.1 — Event Channel Infrastructure
-
REQ-032: Implement an unbounded async event channel. The
agent_loopholds the sender (tx); callers receive from the receiver (rx). The channel never blocks the sender. (Source: [AR])- Depends on: REQ-007
- Definition of Done: Sender can emit 1,000 events without blocking; receiver drains them all in order.
-
REQ-033: Implement
CancellationTokenwith methodsnew(),cancel(),is_cancelled() -> bool,child_token() -> CancellationToken. Cancelling a parent automatically cancels all children. (Source: [AR])- Depends on: —
- Definition of Done: Cancelling a root token causes `is_cancelled()` to return `true` on both the root and any child tokens created from it.
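The parent/child behaviour in REQ-033 can be sketched with the standard library alone. This is only an illustration of the semantics, not necessarily how phi-core implements it; an async build would more likely reuse an existing token type such as `tokio_util::sync::CancellationToken`, which offers the same four methods. The internal layout below (a flag plus an optional parent handle) is an assumption.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Minimal cancellation token. A child keeps a handle to its parent,
/// so cancelling an ancestor is observed by every descendant.
#[derive(Clone, Debug)]
pub struct CancellationToken {
    flag: Arc<AtomicBool>,
    parent: Option<Arc<CancellationToken>>,
}

impl CancellationToken {
    pub fn new() -> Self {
        Self { flag: Arc::new(AtomicBool::new(false)), parent: None }
    }

    /// Marks this token (and therefore all of its descendants) as cancelled.
    pub fn cancel(&self) {
        self.flag.store(true, Ordering::SeqCst);
    }

    /// True if this token or any ancestor has been cancelled.
    pub fn is_cancelled(&self) -> bool {
        self.flag.load(Ordering::SeqCst)
            || self.parent.as_ref().map_or(false, |p| p.is_cancelled())
    }

    /// Creates a child with its own flag; cancelling the child
    /// does not affect the parent.
    pub fn child_token(&self) -> CancellationToken {
        Self {
            flag: Arc::new(AtomicBool::new(false)),
            parent: Some(Arc::new(self.clone())),
        }
    }
}
```

Cancellation flows downward only: cancelling a child leaves the root untouched, which is what lets a single tool be aborted without stopping the whole run.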
Milestone 2.2 — Agent Prompt Entry Point
-
REQ-034: Implement
Agent::prompt(text: String) -> EventReceiveras a thin wrapper that constructs aUsermessage and delegates toprompt_messages. (Source: [PS])- Depends on: REQ-002, REQ-035
- Definition of Done:
agent.prompt("hello")returns a receiver immediately (non-blocking).
-
REQ-035: Implement
Agent::prompt_messages_with_sender(messages, tx): setis_streaming = true, createCancellationToken, buildAgentContextsnapshot, buildAgentLoopConfig(wiring queue closures), spawnagent_loop, merge returned messages intoAgent.messageson completion, setis_streaming = false. (Source: [PS])- Depends on: REQ-027, REQ-028, REQ-029, REQ-033, REQ-036
- Definition of Done: After the spawned task completes,
agent.messagescontains the new messages andis_streamingisfalse.
Milestone 2.3 — Agent Loop Core
-
REQ-036: Implement
agent_loop: emitAgentStart, append prompts tocontext.messages, emitTurnStart/MessageStart/MessageEndfor each prompt, callrun_loop, emitAgentEnd, return new messages. (Source: [PS])- Depends on: REQ-032, REQ-037
- Definition of Done: With
MockProvider, a single call emitsAgentStart, at least oneTurnStart/TurnEndpair, andAgentEnd; returned messages include the input prompt and the assistant response.
-
REQ-037: Implement
agent_loop_continue: emitAgentStart/TurnStart, callrun_loop, emitAgentEnd. (Source: [PS])- Depends on: REQ-036
- Definition of Done: Resumes from existing context without re-appending prompts.
-
REQ-038: Implement
run_loopinner loop (happy path only: no steering, no follow-ups, no limits): callstream_assistant_response, append assistant message, extract tool calls, callexecute_tool_calls, append tool results, loop until no more tool calls, then break. (Source: [PS])- Depends on: REQ-039, REQ-045, REQ-060
- Definition of Done: With a MockProvider that returns one tool call then one
Stop,run_loopexecutes the tool and calls the LLM a second time before stopping.
Milestone 2.4 — LLM Streaming (Happy Path)
-
REQ-039: Implement
stream_assistant_response(no retry): buildStreamConfigfrom context and config, callprovider.stream(), process stream events (Start→ emitMessageStart;TextDelta/ThinkingDelta/ToolCallDelta→ emitMessageUpdate;Done→ emitMessageEnd;Error→ emitMessageStart+MessageEnd), return finalMessage. (Source: [PS])- Depends on: REQ-007, REQ-008, REQ-015, REQ-020, REQ-032
- Definition of Done: With MockProvider, caller receives
MessageStart, one or moreMessageUpdatewith text deltas, andMessageEndcontaining the complete assembled message.
-
REQ-040: Implement
AnthropicProvider::stream: POST tohttps://api.anthropic.com/v1/messageswithx-api-key+anthropic-version: 2023-06-01headers,stream: truebody; parse SSE events (message_start,content_block_start,content_block_delta,message_delta,message_stop); bufferInputJsonDeltatool-argument fragments; parse complete JSON oncontent_block_stop; emitStreamEvents. (Source: [AR])- Depends on: REQ-020, REQ-039
- Definition of Done: Integration test with a real or stubbed Anthropic endpoint produces a correctly parsed
Message::Assistantwith usage stats.
-
REQ-041: Implement
OpenAiCompatProvider::stream: POST to configured base URL +/chat/completionswithAuthorization: Bearerheader,stream: true,stream_options: {include_usage: true}; parse SSE chunkschoices[0].delta; accumulate tool-call argument strings; emitStreamEvents. (Source: [AR])- Depends on: REQ-020, REQ-039
- Definition of Done: Correctly parses a streamed chat-completion response from any OpenAI-compatible endpoint.
-
REQ-042: Implement
ProviderRegistrywithnew()(empty) anddefault()(pre-registersAnthropicProviderandOpenAiCompatProvider).ProviderRegistryitself implementsStreamProvider, dispatching based onApiProtocolor model prefix. (Source: [AR])- Depends on: REQ-040, REQ-041
- Definition of Done:
ProviderRegistry::default()can route a config toAnthropicProviderorOpenAiCompatProviderwithout manual dispatch.
-
REQ-043: Implement `StopReason` determination in each provider: map provider-specific stop signals to the unified `StopReason` enum (`"end_turn"`/`"stop"` → `Stop`; `"max_tokens"`/`"length"` → `Length`; `"tool_use"`/`"tool_calls"` → `ToolUse`; cancellation → `Aborted`; errors → `Error`). (Source: [PS])
- Depends on: REQ-005, REQ-040, REQ-041
- Definition of Done: Each stop signal string maps to exactly one `StopReason` variant.
-
REQ-044: Filter
Extensionmessages out ofAgentMessagehistory before buildingStreamConfig.messages. OnlyLlm(LlmMessage)variants are sent to the LLM (note:LlmMessagewrapsMessage+Option<TurnId>). (Source: [AR])- Depends on: REQ-003, REQ-015
- Definition of Done: An
AgentMessage::Extensionpresent incontext.messagesdoes not appear in theStreamConfigsent to the provider.
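The stop-signal mapping from REQ-043 is small enough to show in full. The sketch below covers only the string signals; `Aborted` and `Error` come from cancellation and failures rather than from a provider string. The function name `map_stop_signal` and the `None` result for unknown signals are assumptions.

```rust
/// Unified stop reasons, as listed in REQ-005.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopReason {
    Stop,
    Length,
    ToolUse,
    Error,
    Aborted,
}

/// Maps a provider's raw stop signal to the unified enum.
/// Assumption: unknown signals yield `None`; how a provider
/// treats them is not specified in this document.
pub fn map_stop_signal(raw: &str) -> Option<StopReason> {
    match raw {
        "end_turn" | "stop" => Some(StopReason::Stop),
        "max_tokens" | "length" => Some(StopReason::Length),
        "tool_use" | "tool_calls" => Some(StopReason::ToolUse),
        _ => None,
    }
}
```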
Milestone 2.5 — Tool Execution (Happy Path)
-
REQ-045: Implement
execute_tool_callsdispatching to the configuredToolExecutionStrategy. ForParallel(default), useexecute_batch. (Source: [PS])- Depends on: REQ-046
- Definition of Done: Multiple tool calls from one LLM response are dispatched concurrently; results arrive in original call order.
-
REQ-046: Implement
execute_single_tool: find tool by name, emitToolExecutionStart, buildToolContextwith child cancel token and callbacks, calltool.execute(args, ctx), emitToolExecutionEnd, constructMessage::ToolResult, emitMessageStart/MessageEnd, return(ToolResult, is_error). (Source: [PS])- Depends on: REQ-007, REQ-009, REQ-010, REQ-021, REQ-033
- Definition of Done: A registered tool is called; its result is wrapped in a
ToolResultmessage;ToolExecutionStartandToolExecutionEndevents are emitted.
-
REQ-047: Implement
BashTool::execute(basic): extractcommandparam, runbash -c {command}, capture stdout+stderr, construct text output ("Exit code: N\n{stdout}"or"Exit code: N\nSTDOUT:\n{stdout}\nSTDERR:\n{stderr}"), returnOk(ToolResult). (Source: [PS])- Depends on: REQ-010, REQ-021
- Definition of Done:
echo "hello"returnsOk(ToolResult)with text containing"Exit code: 0"and"hello".
-
REQ-048: Implement
ReadFileTool::execute(basic text path): extractpathparam, read file to string, split into lines, apply optionaloffset/limit, produce line-numbered output with header, returnOk(ToolResult). (Source: [PS])- Depends on: REQ-010, REQ-021
- Definition of Done: Reading a known text file returns numbered lines; partial reads with
offset/limitreturn the correct slice with a range header.
-
REQ-049: Implement
WriteFileTool::execute: extractpathandcontentparams, create parent directories as needed, write file, returnOk(ToolResult). (Source: [AR])- Depends on: REQ-010, REQ-021
- Definition of Done: Writing to a path with non-existent parent directories succeeds; file is created on disk with correct content.
-
REQ-050: Implement
EditFileTool::execute(basic): extractpath,old_text,new_text; read file; replace the first occurrence ofold_textwithnew_text; write back; return confirmation text. (Source: [PS])- Depends on: REQ-010, REQ-021
- Definition of Done: A known substitution in an existing file is applied correctly; confirmation message reports old/new line counts.
-
REQ-051: Implement
ListFilesTool::execute(basic): extractpath,pattern,max_depth; build and runfindcommand with exclusions fortarget/,.git/,node_modules/; return file paths as text. (Source: [PS])- Depends on: REQ-010, REQ-021
- Definition of Done: Listing a known directory returns its files; excluded directories do not appear in results.
-
REQ-052: Implement
SearchTool::execute(basic): extractpattern,path,include,case_sensitive; preferrg, fall back togrep; return matching lines. (Source: [PS])- Depends on: REQ-010, REQ-021
- Definition of Done: Searching for a known string in a known directory returns matching file paths and line content.
-
REQ-053: Implement
default_tools()returning aVec<Box<dyn AgentTool>>containing all six built-in tools: Bash, ReadFile, WriteFile, EditFile, ListFiles, Search. (Source: [AR])- Depends on: REQ-047 through REQ-052
- Definition of Done:
default_tools()returns exactly 6 tools with distinct names.
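Two of the string-level details in this milestone can be sketched without any I/O: the output text built by the bash tool (REQ-047) and the first-occurrence replacement used by the edit tool (REQ-050). Both helper names are hypothetical. The rule that the `STDOUT:`/`STDERR:` form is used only when stderr is non-empty is an assumption, since the requirement lists both shapes without saying when each applies.

```rust
/// Builds the text returned by the bash tool.
/// Assumption: the two-section form is used only when stderr is non-empty.
pub fn format_bash_output(exit_code: i32, stdout: &str, stderr: &str) -> String {
    if stderr.is_empty() {
        format!("Exit code: {}\n{}", exit_code, stdout)
    } else {
        format!(
            "Exit code: {}\nSTDOUT:\n{}\nSTDERR:\n{}",
            exit_code, stdout, stderr
        )
    }
}

/// Replaces the first occurrence of `old_text` with `new_text`.
/// Returns `None` when `old_text` is empty or does not occur.
pub fn replace_first(content: &str, old_text: &str, new_text: &str) -> Option<String> {
    if old_text.is_empty() || !content.contains(old_text) {
        return None;
    }
    Some(content.replacen(old_text, new_text, 1))
}
```

Keeping these as pure functions makes the tool bodies easy to test separately from subprocess and filesystem handling.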
Milestone 2.6 — Context Compaction (Happy Path)
-
REQ-054: Implement
estimate_tokens(text) -> usizeusing the heuristicceil(byte_length / 4). (Source: [PS])- Depends on: —
- Definition of Done: `estimate_tokens("hello")` returns 2 (5 bytes / 4, rounded up).
-
REQ-055: Implement
content_tokens(content: Vec<Content>) -> usizeandmessage_tokens(msg: AgentMessage) -> usizeper the specified formulas (image tokens:clamp(raw_bytes/750, 85, 16000); per-message overhead: +4 for user/assistant, +8 for tool result). (Source: [PS])- Depends on: REQ-001, REQ-003, REQ-054
- Definition of Done: Token counts match the specified formulas for each content type.
-
REQ-056: Implement
compact_messages(messages, config) -> Vec<AgentMessage>: if under budget, return unchanged; else cascade through Level 1 → Level 2 → Level 3 until budget is satisfied. (Source: [PS])- Depends on: REQ-055, REQ-057, REQ-058, REQ-059
- Definition of Done:
compact_messagescalled on a history exceeding budget returns a smaller history withtotal_tokens <= budget.
-
REQ-057: Implement
level1_truncate_tool_outputs: for eachToolResultmessage, truncate eachTextcontent block to at mostmax_linesusing head+tail preservation with an omission marker. (Source: [PS])- Depends on: REQ-003, REQ-054
- Definition of Done: A 200-line tool output truncated to
max_lines=50produces a 50-line result with"[... N lines truncated ...]"marker.
-
REQ-058: Implement
level2_summarize_old_turns: keep the lastkeep_recentmessages in full; replace older assistant+tool-result groups with a single one-line summary user message. (Source: [PS])- Depends on: REQ-003, REQ-054
- Definition of Done: Old assistant messages and their tool results are replaced by
"[Summary] ..."user messages; recent messages are untouched.
-
REQ-059: Implement
level3_drop_middle: keepkeep_firsthead messages andkeep_recenttail messages; replace the dropped middle with a marker message. Implementkeep_within_budgetfallback that greedily keeps the most-recent messages fitting the budget. (Source: [PS])- Depends on: REQ-003, REQ-054
- Definition of Done: Result contains the first N and last M messages with a marker; total tokens fits the budget.
-
REQ-060: Integrate
compact_messagescall inrun_loopbefore each LLM call whencontext_configisSome. (Source: [PS])- Depends on: REQ-038, REQ-056
- Definition of Done: When configured, each LLM call is preceded by a compaction pass; when
context_configisNone, no compaction occurs.
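The token heuristic (REQ-054) and the Level 1 head+tail truncation (REQ-057) can be sketched as below. The helper name `truncate_lines`, the even head/tail split, and the choice to count the marker line towards `max_lines` are assumptions; the last one is made so that a 200-line input with `max_lines = 50` yields exactly 50 lines, matching the Definition of Done.

```rust
/// Heuristic from REQ-054: ceil(byte_length / 4).
pub fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Head+tail truncation with an omission marker.
/// Assumption: the marker counts as one of the `max_lines` output lines,
/// and the remaining budget is split between head and tail.
pub fn truncate_lines(text: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() <= max_lines {
        return text.to_string();
    }
    // Reserve one line for the marker.
    let keep = max_lines.saturating_sub(1);
    let head = keep / 2;
    let tail = keep - head;
    let omitted = lines.len() - keep;

    let mut out: Vec<String> = Vec::with_capacity(keep + 1);
    out.extend(lines[..head].iter().map(|l| l.to_string()));
    out.push(format!("[... {} lines truncated ...]", omitted));
    out.extend(lines[lines.len() - tail..].iter().map(|l| l.to_string()));
    out.join("\n")
}
```

Preserving both ends matters for tool output in particular: the head usually shows what was run, and the tail usually holds the error or final status.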
Milestone 2.7 — Execution Limits
-
REQ-061: Implement `ExecutionTracker::record_turn(tokens: usize)` (increments `turns` and adds to `tokens_used`) and `check_limits() -> Option<String>` (returns a reason string if any limit is exceeded: turns, total tokens, or wall-clock duration). (Source: [AR])
- Depends on: REQ-012
- Definition of Done: `check_limits()` returns `None` when under all limits and `Some("max turns exceeded")` when the turn limit is exceeded.
-
REQ-062: Integrate execution limit checking in
run_loop: calltracker.check_limits()at the start of each inner loop iteration; if exceeded, append a syntheticUsermessage"[Agent stopped: {reason}]", emitMessageStart/MessageEnd, and return. (Source: [PS])- Depends on: REQ-038, REQ-061
- Definition of Done: An agent with
max_turns=2stops after exactly 2 LLM calls; the last message contains the stop reason.
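A possible shape for the tracker described in REQ-012 and REQ-061 is sketched below. Two points are assumptions: a limit is treated as exceeded once the counter reaches it (`>=`), which is what makes `max_turns = 2` stop after exactly two LLM calls when the check runs at the start of each iteration; and the reason strings for the token and duration limits are hypothetical, since only `"max turns exceeded"` appears in the requirements.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone)]
pub struct ExecutionLimits {
    pub max_turns: usize,
    pub max_total_tokens: usize,
    pub max_duration: Duration,
}

impl Default for ExecutionLimits {
    /// Defaults from REQ-012.
    fn default() -> Self {
        Self {
            max_turns: 50,
            max_total_tokens: 1_000_000,
            max_duration: Duration::from_secs(600),
        }
    }
}

pub struct ExecutionTracker {
    pub limits: ExecutionLimits,
    pub turns: usize,
    pub tokens_used: usize,
    pub started_at: Instant,
}

impl ExecutionTracker {
    pub fn new(limits: ExecutionLimits) -> Self {
        Self { limits, turns: 0, tokens_used: 0, started_at: Instant::now() }
    }

    /// Records one completed turn and the tokens it consumed.
    pub fn record_turn(&mut self, tokens: usize) {
        self.turns += 1;
        self.tokens_used += tokens;
    }

    /// Returns a reason string when a limit has been reached.
    /// Assumption: `>=` comparison; the last two strings are hypothetical.
    pub fn check_limits(&self) -> Option<String> {
        if self.turns >= self.limits.max_turns {
            return Some("max turns exceeded".to_string());
        }
        if self.tokens_used >= self.limits.max_total_tokens {
            return Some("max total tokens exceeded".to_string());
        }
        if self.started_at.elapsed() >= self.limits.max_duration {
            return Some("max duration exceeded".to_string());
        }
        None
    }
}
```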
Milestone 2.8 — Message Persistence and Agent Control
-
REQ-063: Implement
Agent::save_messages() -> String: serializeagent.messagesto a JSON string. (Source: [OV])- Depends on: REQ-018
- Definition of Done:
save_messages()returns a valid JSON array; the string can be parsed back without error.
-
REQ-064: Implement
Agent::restore_messages(json: &str): deserialize the JSON string intoVec<AgentMessage>and replaceagent.messages. (Source: [OV])- Depends on: REQ-018, REQ-063
- Definition of Done: After
save_messages()→restore_messages(), the agent's message history is identical to the original.
-
REQ-065: Implement
Agent::reset(): clearmessages, drain both queues, cancel any active run, resetis_streamingtofalse, drop the cancel token. (Source: [AR])- Depends on: REQ-033
- Definition of Done: After
reset(),messagesis empty, both queues are empty, andis_streamingis false.
-
REQ-066: Implement
Agent::steer(msg: AgentMessage)(push tosteering_queue) andAgent::follow_up(msg: AgentMessage)(push tofollow_up_queue). (Source: [AR])- Depends on: REQ-027
- Definition of Done: After
steer(msg), the steering queue contains exactly that message and is safe to read from another thread.
-
REQ-067: Implement
Agent::abort(): if a cancel token exists, callcancel()on it. (Source: [AR])- Depends on: REQ-033, REQ-035
- Definition of Done: Calling
abort()during an active run causescancel.is_cancelled()to returntrueinside the running agent loop.
Level 3 — Smart
Goal: The system handles reality. Invalid inputs, missing data, external failures, and edge cases are all handled gracefully. Every `[invariant]` and `ERROR` branch from the pseudocode is implemented.
Completion Criteria: No unhandled exception can be triggered by a known class of bad input. All error paths from ../architecture/algorithms.md are covered: provider failures, tool errors, context overflow, execution limits, filter rejections, and cancellation.
Milestone 3.1 — Input Filter Chain
-
REQ-068: Implement the input filter chain at the start of
agent_loop: join allTextcontent fromUsermessages in prompts, run each registeredInputFilterin order. (Source: [PS])- Depends on: REQ-022, REQ-036
- Definition of Done: A filter registered via
with_input_filteris called with the user's text before any LLM call.
-
REQ-069: On first
Rejectresult, emitInputRejected { reason }thenAgentEnd { messages: [] }and return an empty message list immediately. (Source: [PS])- Depends on: REQ-068
- Definition of Done: A rejecting filter stops the run before the first LLM call; the caller's event stream contains
InputRejectedfollowed byAgentEnd.
-
REQ-070: Accumulate
Warnresults; after all filters pass, append all warning text asContent::Textto the lastUsermessage before it is appended to context. (Source: [PS])- Depends on: REQ-068
- Definition of Done: A warning filter adds
"[Warning: ...]"text to the user message; the run continues normally.
Milestone 3.2 — Retry Engine
-
REQ-071: Implement `delay_for_attempt(config, attempt) -> Duration`: exponential backoff formula `initial_delay_ms * (backoff_multiplier ^ (attempt - 1))`, capped at `max_delay_ms`, multiplied by a uniform random jitter in `[0.8, 1.2]`. (Source: [PS])
- Depends on: REQ-013
- Definition of Done: With defaults, attempt 1 produces a duration in `[800ms, 1200ms]`; attempt 3 produces a duration in `[3200ms, 4800ms]`.
-
REQ-072: Implement `is_retryable()` on `ProviderError`: returns `true` only for `RateLimited` and `Network` variants. (Source: [AR])
- Depends on: REQ-020
- Definition of Done: `Auth`, `Api`, `ContextOverflow`, `Cancelled`, `Other` all return `false`; `RateLimited` and `Network` return `true`.
-
REQ-073: Implement
retry_after()onProviderError: extractsretry_after_msfromRateLimited { retry_after_ms: Some(n) }if present; returnsNoneotherwise. (Source: [AR])- Depends on: REQ-020
- Definition of Done: `ProviderError::RateLimited { retry_after_ms: Some(5000) }.retry_after()` returns `Some(Duration::from_millis(5000))`.
-
REQ-074: Integrate retry loop into
stream_assistant_response: on a retryable error, sleep forretry_after() OR delay_for_attempt(attempt)and retry up tomax_retriestimes; stop retrying ifcancel.is_cancelled(). (Source: [PS])- Depends on: REQ-039, REQ-071, REQ-072, REQ-073
- Definition of Done: A
RateLimitederror causes the loop to wait and retry; after exhausting retries, the error is propagated as anErrorstop reason.
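The backoff formula from REQ-071 can be checked with a small deterministic sketch. It deviates from the stated signature in one labelled way: the jitter factor is passed in as a parameter instead of being drawn from a random source, so the example needs no external crate and its results are reproducible. A real implementation would sample `jitter` uniformly from `[0.8, 1.2]`.

```rust
use std::time::Duration;

#[derive(Debug, Clone)]
pub struct RetryConfig {
    pub max_retries: u32,
    pub initial_delay_ms: u64,
    pub backoff_multiplier: f64,
    pub max_delay_ms: u64,
}

impl Default for RetryConfig {
    /// Defaults from REQ-013.
    fn default() -> Self {
        Self {
            max_retries: 3,
            initial_delay_ms: 1_000,
            backoff_multiplier: 2.0,
            max_delay_ms: 30_000,
        }
    }
}

/// Exponential backoff, capped, then scaled by `jitter`.
/// `attempt` is 1-based. Assumption: `jitter` is supplied by the caller
/// (expected range 0.8..=1.2) rather than generated here.
pub fn delay_for_attempt(config: &RetryConfig, attempt: u32, jitter: f64) -> Duration {
    let exponent = attempt.saturating_sub(1) as i32;
    let raw_ms = config.initial_delay_ms as f64 * config.backoff_multiplier.powi(exponent);
    let capped_ms = raw_ms.min(config.max_delay_ms as f64);
    Duration::from_millis((capped_ms * jitter).round() as u64)
}
```

With the defaults, attempt 3 has a base delay of 1,000 × 2² = 4,000 ms, which the jitter range turns into the documented `[3200ms, 4800ms]` window.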
Milestone 3.3 — Provider Error Classification
-
REQ-075: Implement `ProviderError::classify(status: u16, message: String) -> ProviderError`: route to `ContextOverflow` first (empty body with status 400/413, or a matching overflow phrase), then `RateLimited` (429), then `Auth` (401/403), then `Api`. (Source: [PS])
- Depends on: REQ-020
- Definition of Done: HTTP 429 maps to `RateLimited`; HTTP 401 maps to `Auth`; "prompt is too long" in the body maps to `ContextOverflow`.
-
REQ-076: Implement
is_context_overflow(status, message) -> bool: check for empty body with status 400/413 (Cerebras/Mistral pattern); check for any of 15+ documented overflow phrases (case-insensitive substring match). (Source: [PS])- Depends on: —
- Definition of Done: All documented overflow phrases are recognized; unrelated 400 errors with a non-empty body are not misclassified.
-
REQ-077: Implement context overflow recovery: when the streaming error event contains a message matching overflow detection (
Message::is_context_overflow()), treat it as an overflow on the next turn by triggeringcompact_messages(ifcontext_configis set). (Source: [AR])- Depends on: REQ-056, REQ-075, REQ-076
- Definition of Done: A mock that returns an overflow error on turn 1 causes compaction before turn 2.
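The classification order in REQ-075 and the overflow check in REQ-076 might be sketched as follows. The phrase list is deliberately incomplete: only `"prompt is too long"` appears in this document, so the other documented phrases are not reproduced here. Setting `retry_after_ms` to `None` on a 429 is also an assumption, since header parsing is outside this sketch.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    Api(String),
    Network(String),
    Auth(String),
    RateLimited { retry_after_ms: Option<u64> },
    ContextOverflow { message: String },
    Cancelled,
    Other(String),
}

/// Illustrative subset; the full documented phrase list is not reproduced.
const OVERFLOW_PHRASES: &[&str] = &["prompt is too long"];

/// True for an empty body with status 400/413, or when the message
/// contains a known overflow phrase (case-insensitive).
pub fn is_context_overflow(status: u16, message: &str) -> bool {
    if (status == 400 || status == 413) && message.trim().is_empty() {
        return true;
    }
    let lower = message.to_lowercase();
    OVERFLOW_PHRASES.iter().any(|p| lower.contains(*p))
}

impl ProviderError {
    /// Overflow is checked first so that a 400 carrying an overflow
    /// phrase is not reported as a generic API error.
    pub fn classify(status: u16, message: String) -> ProviderError {
        if is_context_overflow(status, &message) {
            return ProviderError::ContextOverflow { message };
        }
        match status {
            429 => ProviderError::RateLimited { retry_after_ms: None },
            401 | 403 => ProviderError::Auth(message),
            _ => ProviderError::Api(message),
        }
    }
}
```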
Milestone 3.4 — Tool Error Handling
-
REQ-078: On
ToolError::Failed(msg)orToolError::InvalidArgs(msg): convert to aToolResultwithcontent: [Text(msg)]andis_error: true; always return this to the LLM so it can self-correct. (Source: [AR])- Depends on: REQ-010, REQ-046
- Definition of Done: A tool that returns
Err(Failed("oops"))produces aToolResultmessage withis_error: trueand the text"oops".
-
REQ-079: On
ToolError::NotFound(name): produceToolResult { content: [Text("Tool {name} not found")], is_error: true }. (Source: [PS])- Depends on: REQ-046
- Definition of Done: Requesting a non-existent tool name in a tool call produces a
NotFounderror result.
-
REQ-080: On
ToolError::Cancelled: produceToolResult { content: [Text("Skipped due to queued user message.")], is_error: true }. (Source: [AR])- Depends on: REQ-010, REQ-046
- Definition of Done: A tool skipped due to steering produces the documented skipped message.
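The three rules in this milestone reduce to one conversion from `ToolError` to an error result. In the sketch below, `ToolResultMessage` is a simplified, hypothetical stand-in for the real `ToolResult` message (a single text block plus the error flag), and `error_to_result` is an illustrative name.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ToolError {
    Failed(String),
    NotFound(String),
    InvalidArgs(String),
    Cancelled,
}

/// Simplified stand-in for the `ToolResult` message:
/// one text block plus the error flag.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolResultMessage {
    pub text: String,
    pub is_error: bool,
}

/// Converts any tool error into a result the LLM can read and react to.
pub fn error_to_result(err: ToolError) -> ToolResultMessage {
    let text = match err {
        // REQ-078: pass the tool's own message through unchanged.
        ToolError::Failed(msg) | ToolError::InvalidArgs(msg) => msg,
        // REQ-079
        ToolError::NotFound(name) => format!("Tool {} not found", name),
        // REQ-080
        ToolError::Cancelled => "Skipped due to queued user message.".to_string(),
    };
    ToolResultMessage { text, is_error: true }
}
```

Every branch sets `is_error: true`, so the model always sees the failure in context and can try a corrected call.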
Milestone 3.5 — Error and Abort Stop Reason Handling
-
REQ-081: In
run_loop, when the assistant message hasstop_reason == Error: callon_error(error_message)if defined, callafter_turnif defined, emitTurnEnd, return immediately. (Source: [PS])- Depends on: REQ-038, REQ-082
- Definition of Done: A mock provider that returns an error stop reason causes the loop to exit;
on_erroris called with the message text.
-
REQ-082: In
run_loop, whenstop_reason == Aborted: callafter_turnif defined, emitTurnEnd, return immediately. (Source: [PS])- Depends on: REQ-038
- Definition of Done: Calling
agent.abort()mid-run causes the loop to exit cleanly;TurnEndis emitted.
-
REQ-083: Construct a synthetic error
Message::Assistanton irrecoverable provider failure (after retry exhaustion): empty content,stop_reason: Error,error_message: Some(e.to_string()). (Source: [PS])- Depends on: REQ-002, REQ-039
- Definition of Done: A provider that always fails produces an
Assistantmessage withstop_reason: Errorcontaining the provider's error text.
Milestone 3.6 — Sequential and Batched Tool Execution
-
REQ-084: Implement
execute_sequential: execute tool calls one at a time; after each, check the steering queue; on non-empty steering, skip remaining tools withToolError::Cancelledresults and return steering messages. (Source: [PS])- Depends on: REQ-046, REQ-080
- Definition of Done: With steering arriving after tool 1 of 3, tools 2 and 3 receive skipped error results; the steering message is returned for injection.
-
REQ-085: Implement
execute_batch(Parallel): launch all tools concurrently viajoin_all; after all complete, check steering once; return steering if present. (Source: [PS])- Depends on: REQ-046
- Definition of Done: Three parallel tools all complete; steering arriving before their completion is returned after all finish.
-
REQ-086: Implement `Batched { size }` dispatch: split tool calls into groups of `size`; run each group via `execute_batch`; check steering between groups; on steering, skip remaining groups with cancelled results. (Source: [PS])
- Depends on: REQ-085
- Definition of Done: With 5 tool calls, `Batched { size: 2 }` executes groups [1,2], [3,4], [5]; steering after group 1 skips groups 2 and 3.
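The grouping and skip logic of `Batched { size }` can be illustrated with a synchronous sketch. It models only the control flow: real tool calls within a group would run concurrently, and the `steering_after` closure is a hypothetical stand-in for checking the steering queue after a group (0-based index) finishes.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Executed,
    Skipped,
}

/// Splits `call_count` tool calls into groups of `size` and returns,
/// per call, whether it ran or was skipped because steering arrived.
/// Assumption: `steering_after(group_index)` reports whether the
/// steering queue is non-empty once that group has completed.
pub fn run_batched<F>(call_count: usize, size: usize, mut steering_after: F) -> Vec<Outcome>
where
    F: FnMut(usize) -> bool,
{
    assert!(size > 0, "batch size must be at least 1");
    let calls: Vec<usize> = (0..call_count).collect();
    let mut outcomes = Vec::with_capacity(call_count);
    let mut steered = false;

    for (group_index, group) in calls.chunks(size).enumerate() {
        for _ in group {
            outcomes.push(if steered { Outcome::Skipped } else { Outcome::Executed });
        }
        // Steering is only checked between groups, never inside one.
        if !steered && steering_after(group_index) {
            steered = true;
        }
    }
    outcomes
}
```

Calling `run_batched(5, 2, |g| g == 0)` runs the first two calls and skips the remaining three, which matches the Definition of Done for REQ-086.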
Milestone 3.7 — Steering and Follow-up Queue Integration
-
REQ-087: In
run_loop, drain the steering queue at the start of the outer loop before the first inner-loop iteration. (Source: [PS])- Depends on: REQ-038
- Definition of Done: Messages enqueued via
steer()beforeprompt()is called are injected as the first pending messages.
-
REQ-088: After tool execution, if steering messages were captured, set them as
pendingand continue the inner loop (injecting them before the next LLM call). (Source: [PS])- Depends on: REQ-038, REQ-084, REQ-085
- Definition of Done: A steering message injected during tool execution appears in context before the subsequent LLM call.
-
REQ-089: After the inner loop exits (no tool calls, no pending steering), check the follow-up queue; if non-empty, add follow-up messages to
pendingand continue the outer loop. (Source: [PS])- Depends on: REQ-038
- Definition of Done: A follow-up message enqueued via
follow_up()causes the agent to re-enter the loop rather than stopping.
-
REQ-090: Implement `QueueMode::OneAtATime` (pop exactly one message per read) and `QueueMode::All` (drain the entire queue per read). Both modes are thread-safe (mutex-protected). (Source: [AR])
- Depends on: REQ-017, REQ-027
- Definition of Done: `OneAtATime` leaves remaining messages in the queue; `All` empties it; both are safe to call from the agent loop while another thread pushes.
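The two queue modes from REQ-090 can be sketched as a single read helper over the `Arc<Mutex<Vec<_>>>` queue shape from REQ-027. The helper name `read_queue` and the FIFO order for `OneAtATime` are assumptions; the requirement says only that exactly one message is popped per read.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum QueueMode {
    /// Default per REQ-017.
    #[default]
    OneAtATime,
    All,
}

/// Reads from a shared queue according to `mode`.
/// Assumption: `OneAtATime` takes the oldest message (FIFO).
pub fn read_queue<T>(queue: &Arc<Mutex<Vec<T>>>, mode: QueueMode) -> Vec<T> {
    // The lock is held only for the duration of the read, so other
    // threads can keep pushing between reads.
    let mut guard = queue.lock().unwrap();
    match mode {
        QueueMode::OneAtATime => {
            if guard.is_empty() {
                Vec::new()
            } else {
                vec![guard.remove(0)]
            }
        }
        QueueMode::All => guard.drain(..).collect(),
    }
}
```

Returning a `Vec` in both modes lets the loop treat the result uniformly as the list of pending messages to inject.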
Milestone 3.8 — Lifecycle Callbacks
-
REQ-091: Call
before_turn(messages, turn_number) -> boolat the start of each turn (before the LLM call). If it returnsfalse, return fromrun_loopimmediately without emittingAgentEnd. (Source: [PS])- Depends on: REQ-038
- Definition of Done: A
before_turnthat returnsfalseon turn 2 stops the loop after turn 1;AgentEndis not emitted.
-
REQ-092: Call
after_turn(messages, usage)after each LLM call and its tool executions, including on error/abort paths. (Source: [PS])- Depends on: REQ-038
- Definition of Done:
after_turnis called exactly once per turn, including when the turn ends in an error.
-
REQ-093: Call
on_error(message: &str)whenstop_reason == Error. (Source: [PS])- Depends on: REQ-081
- Definition of Done: An error-returning provider invokes the
on_errorcallback with the error message string.
Milestone 3.9 — Tool Safety and Edge Cases
- REQ-094: BashTool: check each deny_pattern against the command (substring match) before execution; return Err(Failed("Command blocked...")) on match. (Source: [PS])
  - Depends on: REQ-047
  - Definition of Done: A command containing a deny pattern is rejected before any subprocess is spawned.
- REQ-095: BashTool: race subprocess completion against a configurable timeout and the cancellation token; on timeout return Err(Failed("Command timed out after Ns")); on cancellation return Err(Cancelled). (Source: [PS])
  - Depends on: REQ-047
  - Definition of Done: sleep 300 with a 2s timeout produces a timeout error; cancellation produces Cancelled.
- REQ-096: BashTool: truncate stdout and stderr independently at max_output_bytes (default 256KB) and append "\n... (output truncated)". (Source: [PS])
  - Depends on: REQ-047
  - Definition of Done: Output exceeding 256KB is truncated with the documented suffix.
- REQ-097: BashTool: optional confirm_fn callback; if defined and returns false, return Err(Failed("Command was not confirmed by the user.")). (Source: [PS])
  - Depends on: REQ-047
  - Definition of Done: A rejecting confirm_fn prevents subprocess execution.
- REQ-098: ReadFileTool: check file size before reading. Text files exceeding max_bytes (1MB): return Err(Failed("File too large. Use offset/limit...")). Image files exceeding 20MB: return Err(Failed("Image too large")). (Source: [PS])
  - Depends on: REQ-048
  - Definition of Done: Reading a file above the size limit returns the documented error without reading the file contents.
- REQ-099: ReadFileTool: for image extensions, read file as bytes, base64-encode, detect MIME type from extension, return Content::Image. (Source: [PS])
  - Depends on: REQ-001, REQ-048
  - Definition of Done: Reading a .png file returns a ToolResult with Content::Image { data: base64, mime_type: "image/png" }.
- REQ-100: ReadFileTool: check ctx.cancel.is_cancelled() before each I/O operation; return Err(Cancelled) if set. (Source: [PS])
  - Depends on: REQ-048
  - Definition of Done: Cancelling before a read returns Cancelled without touching the file.
- REQ-101: EditFileTool: if old_text matches zero occurrences, attempt find_similar_text for a fuzzy hint; return Err(Failed("old_text not found... Did you mean: ...")). (Source: [PS])
  - Depends on: REQ-050
  - Definition of Done: An edit with wrong old_text returns a Failed error; if a similar line exists, the hint is included.
- REQ-102: EditFileTool: if old_text matches more than one occurrence, return Err(Failed("old_text matches N locations. Include more context...")). (Source: [PS])
  - Depends on: REQ-050
  - Definition of Done: Attempting to replace ambiguous text returns a descriptive error with the match count.
- REQ-103: EditFileTool: check ctx.cancel.is_cancelled() before each I/O operation. (Source: [PS])
  - Depends on: REQ-050
  - Definition of Done: Cancellation before read or write returns Err(Cancelled).
- REQ-104: WriteFileTool: check ctx.cancel.is_cancelled() before writing. (Source: [AR])
  - Depends on: REQ-049
  - Definition of Done: Cancellation prevents the write from occurring.
- REQ-105: ListFilesTool: race find execution against a timeout (default 10s) and the cancellation token; truncate results at max_results (default 200) with a truncation suffix. (Source: [PS])
  - Depends on: REQ-051
  - Definition of Done: Listing a directory with 500 files returns 200 with the truncation message.
- REQ-106: SearchTool: fall back from rg to grep if ripgrep is not available on the system. Check ctx.cancel.is_cancelled() before execution. (Source: [PS])
  - Depends on: REQ-052
  - Definition of Done: Search succeeds on a system without rg installed; cancellation is respected.
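Three of the checks above (REQ-094, REQ-096, REQ-102) are pure string operations and can be sketched without spawning any process. The function names blocked_by, truncate_output, and locate_unique, and the EditError enum, are hypothetical helpers invented for this illustration; only the substring-match rule, the truncation suffix, and the zero/one/many match distinction come from the requirements. Backing off to a UTF-8 character boundary before slicing is an added assumption so the sketch cannot panic.

```rust
/// REQ-094: return the first deny pattern contained in `command` (substring match).
fn blocked_by<'a>(command: &str, deny_patterns: &'a [&'a str]) -> Option<&'a str> {
    deny_patterns.iter().copied().find(|p| command.contains(*p))
}

const TRUNCATION_SUFFIX: &str = "\n... (output truncated)";

/// REQ-096: cap `output` at `max_output_bytes` and append the documented suffix.
fn truncate_output(output: &str, max_output_bytes: usize) -> String {
    if output.len() <= max_output_bytes {
        return output.to_string();
    }
    // Assumption: step back to a char boundary so slicing never panics.
    let mut end = max_output_bytes;
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}{}", &output[..end], TRUNCATION_SUFFIX)
}

/// Hypothetical error type standing in for the Failed(...) messages.
#[derive(Debug, PartialEq)]
enum EditError {
    NotFound,
    Ambiguous(usize),
}

/// REQ-101/REQ-102: an edit is only applied when `old_text` matches exactly once.
/// Returns the byte offset of the single match.
fn locate_unique(content: &str, old_text: &str) -> Result<usize, EditError> {
    match content.matches(old_text).count() {
        0 => Err(EditError::NotFound),
        1 => Ok(content.find(old_text).expect("one match was counted")),
        n => Err(EditError::Ambiguous(n)),
    }
}
```

Running the deny check before anything else keeps the Definition of Done for REQ-094 easy to test: a blocked command returns before any subprocess handle exists.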
Milestone 3.10 — Agent Invariants
- REQ-107: In prompt_messages_with_sender, assert !self.is_streaming with a clear panic message before proceeding. (Source: [PS])
  - Depends on: REQ-035
  - Definition of Done: Calling prompt() while a run is active panics with a message directing the caller to use steer() or follow_up().
- REQ-108: In agent_loop_continue, validate preconditions: context.messages is non-empty and the last message is not an Assistant variant. (Source: [PS])
  - Depends on: REQ-037
  - Definition of Done: Calling agent_loop_continue with an empty context or with a trailing assistant message returns an error or panics with a clear message.
Milestone 3.11 — Skill System
- REQ-109: Implement SkillSet::load(dirs: Vec<Path>): iterate directories, skip missing ones silently, scan each for subdirectories containing SKILL.md, parse frontmatter, build a name-keyed map (later dirs override earlier on collision), return sorted SkillSet. (Source: [PS])
  - Depends on: REQ-110
  - Definition of Done: Loading two dirs where both contain a skill named "foo" results in the second dir's version being used.
- REQ-110: Implement parse_frontmatter(content) -> (name, description): require content to begin with ---, extract YAML block up to next \n---, parse name: and description: lines, strip surrounding quotes, return Err(InvalidFrontmatter) or Err(MissingField) on failure. (Source: [PS])
  - Depends on: none
  - Definition of Done: Valid frontmatter parses correctly; missing name field returns a MissingField error; missing delimiters return InvalidFrontmatter.
- REQ-111: Implement SkillSet::format_for_prompt(): emit <available_skills> XML block with one <skill> element per skill (sorted by name ascending), XML-escaping all string values; return empty string if no skills loaded. (Source: [PS])
  - Depends on: REQ-109
  - Definition of Done: Output is well-formed XML; special characters in skill names/descriptions are correctly escaped.
- REQ-112: Implement SkillSet::load_dir(dir, source) and SkillSet::merge(other). (Source: [AR])
  - Depends on: REQ-109
  - Definition of Done: merge causes the other's skills to override on name conflict.
- REQ-113: Implement Agent::with_skills(skill_set): call format_for_prompt() and append the XML block to self.system_prompt. (Source: [PS])
  - Depends on: REQ-111
  - Definition of Done: After with_skills(set), the agent's system prompt contains the <available_skills> XML block.
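The parsing steps listed for REQ-110 can be sketched with the standard library alone. This is one possible reading of the requirement, not the crate's implementation: the FrontmatterError enum shape (in particular the field name carried by MissingField) and the unquote helper are assumptions, and the sketch only recognizes single-line name: and description: entries rather than full YAML.

```rust
/// Hypothetical error type; the spec names the variants but not their payloads.
#[derive(Debug, PartialEq)]
enum FrontmatterError {
    InvalidFrontmatter,
    MissingField(&'static str),
}

/// Strip one pair of matching surrounding quotes, if present.
fn unquote(value: &str) -> String {
    let v = value.trim();
    v.strip_prefix('"')
        .and_then(|s| s.strip_suffix('"'))
        .or_else(|| v.strip_prefix('\'').and_then(|s| s.strip_suffix('\'')))
        .unwrap_or(v)
        .to_string()
}

/// Returns (name, description) from a SKILL.md-style document.
fn parse_frontmatter(content: &str) -> Result<(String, String), FrontmatterError> {
    // The content must begin with the opening delimiter.
    let rest = content
        .strip_prefix("---")
        .ok_or(FrontmatterError::InvalidFrontmatter)?;
    // The block runs up to the next "\n---".
    let end = rest
        .find("\n---")
        .ok_or(FrontmatterError::InvalidFrontmatter)?;

    let mut name = None;
    let mut description = None;
    for line in rest[..end].lines() {
        let line = line.trim();
        if let Some(value) = line.strip_prefix("name:") {
            name = Some(unquote(value));
        } else if let Some(value) = line.strip_prefix("description:") {
            description = Some(unquote(value));
        }
    }

    let name = name.ok_or(FrontmatterError::MissingField("name"))?;
    let description = description.ok_or(FrontmatterError::MissingField("description"))?;
    Ok((name, description))
}
```

Checking the delimiters before the fields gives the two failure modes in the Definition of Done a fixed precedence: a document without delimiters is always InvalidFrontmatter, never MissingField.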
Milestone 3.12 — MCP Client
- REQ-114: Implement McpClient::connect_stdio(cmd, args, env): spawn subprocess with piped stdin/stdout; complete the 3-step initialize handshake; return Ok(McpClient). (Source: [PS])
  - Depends on: REQ-115, REQ-116
  - Definition of Done: Spawning a compliant MCP server subprocess results in a connected client; server_info is populated from the handshake.
- REQ-115: Implement McpClient::send_request(method, params): construct a JSON-RPC 2.0 request with auto-incremented atomic ID, send over transport, receive response, return Err(JsonRpc{...}) on error field or Err(Protocol("Empty result")) on missing result. (Source: [PS])
  - Depends on: none
  - Definition of Done: A JSON-RPC response with an error field maps to McpError::JsonRpc; a valid result field is returned as Ok(value).
- REQ-116: Implement McpClient::list_tools() and McpClient::call_tool(name, args). (Source: [PS])
  - Depends on: REQ-115
  - Definition of Done: list_tools() returns a parsed Vec<McpToolInfo>; call_tool() returns a parsed McpToolCallResult.
- REQ-117: Implement McpToolAdapter implementing AgentTool: wraps McpToolInfo metadata and an Arc<Mutex<McpClient>>; execute() calls client.call_tool() and converts McpContent to Content variants. (Source: [AR])
  - Depends on: REQ-001, REQ-021, REQ-116
  - Definition of Done: An McpToolAdapter can be registered on an agent and called successfully in a tool-use turn.
- REQ-118: Handle all McpError variants gracefully: Transport, Protocol, JsonRpc, Serialization, Io, ConnectionClosed all surface as ToolError::Failed with descriptive messages. (Source: [AR])
  - Depends on: REQ-117
  - Definition of Done: Each McpError variant produces a non-panicking ToolError::Failed with a message identifying the error type and context.
- REQ-119: Implement Agent::with_mcp_server_stdio(cmd, args, env): call McpClient::connect_stdio, then McpToolAdapter::from_client, append resulting tool adapters to self.tools. (Source: [AR])
  - Depends on: REQ-114, REQ-117
  - Definition of Done: After with_mcp_server_stdio, the agent's tool list includes all tools reported by the MCP server.
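The request/response handling in REQ-115 might look roughly like the sketch below. It deliberately avoids any JSON library: RpcResponse is a hypothetical, already-decoded view of the response, params are passed as pre-serialized JSON text, and the code/message fields on McpError::JsonRpc are an assumption (the requirement elides them). The method name is assumed to need no JSON escaping.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Subset of the error type; the JsonRpc payload fields are assumed.
#[derive(Debug, PartialEq)]
enum McpError {
    JsonRpc { code: i64, message: String },
    Protocol(String),
}

/// Hypothetical decoded response: raw JSON text of `result`, or (code, message).
struct RpcResponse {
    result: Option<String>,
    error: Option<(i64, String)>,
}

/// Auto-incrementing request IDs that are safe to share across threads.
struct RequestIds(AtomicU64);

impl RequestIds {
    fn new() -> Self {
        RequestIds(AtomicU64::new(1))
    }

    fn next(&self) -> u64 {
        self.0.fetch_add(1, Ordering::Relaxed)
    }
}

/// Build a JSON-RPC 2.0 request body. `params_json` must already be valid JSON.
fn build_request(ids: &RequestIds, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
        ids.next(),
        method,
        params_json
    )
}

/// Map a decoded response to the outcome described in REQ-115.
fn into_result(response: RpcResponse) -> Result<String, McpError> {
    if let Some((code, message)) = response.error {
        return Err(McpError::JsonRpc { code, message });
    }
    response
        .result
        .ok_or_else(|| McpError::Protocol("Empty result".to_string()))
}
```

An atomic counter is used so that the ID source can sit behind a shared reference, which fits the Arc<Mutex<McpClient>> arrangement described in REQ-117.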
Level 4 — Professional
Goal: The system is safe, observable, and maintainable. It can be operated with multiple provider backends, supports prompt caching and extended thinking, exposes useful observability hooks, and shuts down gracefully.
Completion Criteria: All 7 provider protocols are implemented. Prompt caching, thinking levels, structured logging, and security-sensitive fields are all handled. The cancellation tree propagates correctly to all I/O boundaries. The system is configurable for production use.
Milestone 4.1 — Full Provider Suite
- REQ-120: Implement GoogleProvider::stream (Gemini API): POST to {base_url}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={API_KEY}; use custom SSE parser (split on \n\n, extract data: line); map tool calls from functionDeclarations; auto-generate tool IDs as "google-fc-{index}"; tool results as functionResponse parts. (Source: [AR])
  - Depends on: REQ-020
  - Definition of Done: A Gemini streaming response is parsed into the correct StreamEvents; tool IDs are auto-generated in the documented format.
- REQ-121: Implement GoogleVertexProvider::stream (Vertex AI): identical wire format to Gemini; endpoint pattern https://{region}-aiplatform.googleapis.com/...; auth via Authorization: Bearer {OAUTH_TOKEN}; tool IDs as "vertex-fc-{index}". (Source: [AR])
  - Depends on: REQ-120
  - Definition of Done: Vertex request differs from Gemini only in endpoint and auth header.
- REQ-122: Implement BedrockProvider::stream (ConverseStream API): endpoint {base_url}/model/{model}/converse-stream; newline-delimited JSON (not standard SSE); parse events contentBlockDelta, contentBlockStart, contentBlockStop, messageStop, metadata; tool spec format: toolSpec { inputSchema: { json: schema } }; tool result format: { toolResult: { toolUseId, content, status } }. (Source: [AR])
  - Depends on: REQ-020
  - Definition of Done: A Bedrock ndjson streaming response is correctly parsed; tool definitions and results are in the Bedrock-specific format.
- REQ-123: Implement OpenAiResponsesProvider::stream (OpenAI Responses API): endpoint {base_url}/responses; system prompt in "instructions" field; SSE events response.output_text.delta, response.reasoning.delta, response.function_call_arguments.*, response.completed. (Source: [AR])
  - Depends on: REQ-020
  - Definition of Done: The Responses API wire format differs correctly from Chat Completions in system prompt field and event names.
- REQ-124: Implement AzureOpenAiProvider::stream: endpoint {base_url}/responses?api-version=2025-01-01-preview; auth via api-key: {AZURE_OPENAI_API_KEY} header (not Authorization: Bearer); same request/response format as OpenAI Responses API. (Source: [AR])
  - Depends on: REQ-123
  - Definition of Done: Azure auth uses api-key header; base URL pattern https://{resource}.openai.azure.com/openai/deployments/{deployment} is supported.
- REQ-125: Register all 7 providers (Anthropic, OpenAiCompat, OpenAiResponses, Azure, Google, Vertex, Bedrock) in ProviderRegistry::default(). (Source: [AR])
  - Depends on: REQ-042, REQ-120 through REQ-124
  - Definition of Done: ProviderRegistry::default() can dispatch to any of the 7 implementations based on protocol selection.
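The custom SSE parsing rule in REQ-120 (split on \n\n, extract the data: line) can be illustrated with a small buffer-draining helper. The names drain_sse_events and google_tool_id are hypothetical, and the choice to keep an incomplete trailing event in the buffer until more bytes arrive is an assumption about how the stream is fed.

```rust
/// Remove every complete event (terminated by a blank line) from `buffer`
/// and return the payload of each `data:` line. Any incomplete trailing
/// event stays in `buffer` for the next network chunk.
fn drain_sse_events(buffer: &mut String) -> Vec<String> {
    let mut payloads = Vec::new();
    while let Some(pos) = buffer.find("\n\n") {
        // Take the event plus its two-byte terminator out of the buffer.
        let event: String = buffer.drain(..pos + 2).collect();
        for line in event.lines() {
            if let Some(data) = line.strip_prefix("data:") {
                payloads.push(data.trim_start().to_string());
            }
        }
    }
    payloads
}

/// Tool-call IDs in the documented "google-fc-{index}" format.
fn google_tool_id(index: usize) -> String {
    format!("google-fc-{index}")
}
```

Each returned payload would then be handed to a JSON decoder; that step is omitted here to keep the sketch dependency-free.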
Milestone 4.2 — Prompt Caching
- REQ-126: Implement CacheStrategy::Auto: provider automatically places cache_control: { type: "ephemeral" } breakpoints at the system prompt, the last tool definition, and the second-to-last message. (Source: [AR])
  - Depends on: REQ-014, REQ-040
  - Definition of Done: In Anthropic requests, the three cache breakpoints appear in the correct positions when strategy: Auto.
- REQ-127: Implement CacheStrategy::Manual { cache_system, cache_tools, cache_messages }: conditionally apply breakpoints per flag. Implement CacheStrategy::Disabled: no breakpoints emitted. (Source: [AR])
  - Depends on: REQ-126
  - Definition of Done: Each flag independently controls placement of its respective cache breakpoint.
- REQ-128: Propagate Usage.cache_read and Usage.cache_write from Anthropic response metadata into Message::Assistant.usage. (Source: [AR])
  - Depends on: REQ-006, REQ-040
  - Definition of Done: Cache token counts from Anthropic are populated in the usage struct after a cached-hit response.
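One way to read REQ-126 and REQ-127 together is that every strategy resolves to three independent on/off decisions. The sketch below encodes that reading; the Breakpoints struct and both function names are hypothetical, and mapping cache_messages to the second-to-last message breakpoint is an assumption based on the word "respective" in the Definition of Done.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum CacheStrategy {
    Auto,
    Manual {
        cache_system: bool,
        cache_tools: bool,
        cache_messages: bool,
    },
    Disabled,
}

/// Hypothetical resolved view: which of the three breakpoints to emit.
#[derive(Debug, PartialEq)]
struct Breakpoints {
    system_prompt: bool,
    last_tool: bool,
    second_to_last_message: bool,
}

fn resolve_breakpoints(strategy: CacheStrategy) -> Breakpoints {
    match strategy {
        // Auto places all three breakpoints.
        CacheStrategy::Auto => Breakpoints {
            system_prompt: true,
            last_tool: true,
            second_to_last_message: true,
        },
        // Manual applies each breakpoint only when its flag is set.
        CacheStrategy::Manual { cache_system, cache_tools, cache_messages } => Breakpoints {
            system_prompt: cache_system,
            last_tool: cache_tools,
            second_to_last_message: cache_messages,
        },
        // Disabled emits none.
        CacheStrategy::Disabled => Breakpoints {
            system_prompt: false,
            last_tool: false,
            second_to_last_message: false,
        },
    }
}

/// Index of the second-to-last message, or None when fewer than two exist.
fn message_breakpoint_index(message_count: usize) -> Option<usize> {
    message_count.checked_sub(2)
}
```

Resolving the strategy once, before building the request, keeps the provider's serialization code free of per-strategy branching.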
Milestone 4.3 — Extended Thinking
- REQ-129: Map ThinkingLevel to Anthropic thinking parameter: Off → omit; Minimal → budget_tokens: 128; Low → 512; Medium → 2048; High → 8192. (Source: [AR])
  - Depends on: REQ-019, REQ-040
  - Definition of Done: Setting ThinkingLevel::Medium causes {type:"enabled", budget_tokens:2048} to appear in the Anthropic request.
- REQ-130: Map ThinkingLevel to OpenAI-compat reasoning_effort parameter when supports_reasoning_effort flag is set: Minimal/Low → "low"; Medium → "medium"; High → "high". (Source: [AR])
  - Depends on: REQ-019, REQ-041
  - Definition of Done: ThinkingLevel::High with a reasoning-capable provider produces reasoning_effort: "high" in the request body.
- REQ-131: Parse Thinking content blocks from streaming responses (Anthropic thinking type blocks; OpenAI delta.reasoning_content / xAI delta.reasoning); emit as StreamDelta::Thinking and store as Content::Thinking in the final message. (Source: [AR])
  - Depends on: REQ-001, REQ-008, REQ-040
  - Definition of Done: A streaming response containing thinking/reasoning content produces MessageUpdate events with StreamDelta::Thinking and the final Content::Thinking block in the assembled message.
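The two mappings in REQ-129 and REQ-130 are small enough to write out directly. The function names below are hypothetical; the budget values and effort strings are taken from the requirements. Treating Off as "omit the parameter" for reasoning_effort is an assumption, since REQ-130 does not list Off explicitly.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ThinkingLevel {
    Off,
    Minimal,
    Low,
    Medium,
    High,
}

/// REQ-129: Anthropic `budget_tokens`, or None when the parameter is omitted.
fn anthropic_budget_tokens(level: ThinkingLevel) -> Option<u32> {
    match level {
        ThinkingLevel::Off => None,
        ThinkingLevel::Minimal => Some(128),
        ThinkingLevel::Low => Some(512),
        ThinkingLevel::Medium => Some(2048),
        ThinkingLevel::High => Some(8192),
    }
}

/// REQ-130: OpenAI-compatible `reasoning_effort`, emitted only when the
/// model config sets `supports_reasoning_effort`.
fn reasoning_effort(level: ThinkingLevel, supports_reasoning_effort: bool) -> Option<&'static str> {
    if !supports_reasoning_effort {
        return None;
    }
    match level {
        // Assumption: Off omits the parameter entirely.
        ThinkingLevel::Off => None,
        ThinkingLevel::Minimal | ThinkingLevel::Low => Some("low"),
        ThinkingLevel::Medium => Some("medium"),
        ThinkingLevel::High => Some("high"),
    }
}
```

Returning Option from both functions lets the request builder skip the field when the result is None instead of serializing a null.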
Milestone 4.4 — MCP HTTP Transport
- REQ-132: Implement McpClient::connect_http(url): POST JSON-RPC bodies to the configured URL (stateless, no persistent connection); complete the initialize handshake. (Source: [AR])
  - Depends on: REQ-115
  - Definition of Done: An HTTP-based MCP server can be connected to and queried for tools.
- REQ-133: Implement Agent::with_mcp_server_http(url) builder. Support optional tool name prefix ({prefix}__{name}) for namespace disambiguation. (Source: [AR])
  - Depends on: REQ-117, REQ-132
  - Definition of Done: HTTP MCP tools appear in the agent's tool list; with a prefix configured, tool names are formatted as "{prefix}__{name}".
- REQ-134: On MCP stdio transport shutdown, send EOF on stdin then kill the child process. (Source: [AR])
  - Depends on: REQ-114
  - Definition of Done: Dropping or closing the stdio MCP client terminates the child process cleanly.
Milestone 4.5 — Observability and Logging
- REQ-135: Implement structured retry logging: when a retry occurs, log attempt number, max retries, delay, and the triggering error at an appropriate log level. (Source: [PS])
  - Depends on: REQ-074
  - Definition of Done: A retried request produces a structured log entry containing all four fields.
- REQ-136: Implement ContextTracker: combine provider-reported token counts (from Usage) with local estimate_tokens for messages appended since the last provider report. Expose current_tokens() -> usize. (Source: [AR])
  - Depends on: REQ-054, REQ-055
  - Definition of Done: After a turn with known provider-reported usage, current_tokens() reflects the reported value; after additional messages are appended, it adds heuristic estimates.
- REQ-137: Populate ToolResult.details with structured metadata per tool: BashTool → { exit_code, success }; ReadFileTool → { path }; WriteFileTool → { path }; EditFileTool → { path, old_lines, new_lines }; ListFilesTool → { total, truncated }; SubAgentTool → { sub_agent, turns }. (Source: [AR])
  - Depends on: REQ-047 through REQ-052
  - Definition of Done: ToolResult.details for a bash execution contains exit_code and success keys.
Milestone 4.6 — Security
- REQ-138: Redact sensitive OpenApiAuth credentials in debug output: Bearer(token) displays as Bearer("****"); ApiKey { value } displays as ApiKey { header: "...", value: "****" }. (Source: [AR])
  - Depends on: none
  - Definition of Done: Printing/logging an OpenApiAuth::Bearer("secret") value produces "****" instead of the actual token.
- REQ-139: Implement the complete BashTool deny-pattern list (configurable; default list to be specified at implementation time based on the safety policy described in the spec). (Source: [PS])
  - Depends on: REQ-094
  - Definition of Done: A configurable list of deny patterns is applied; at least the patterns documented in the spec are included in the default list.
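The redaction described in REQ-138 is typically done with a hand-written Debug implementation rather than #[derive(Debug)]. The sketch below assumes OpenApiAuth has exactly the two variants named in the requirement and that ApiKey carries header and value fields; the real enum may have more.

```rust
use std::fmt;

/// Assumed shape of the credential type from REQ-138.
enum OpenApiAuth {
    Bearer(String),
    ApiKey { header: String, value: String },
}

impl fmt::Debug for OpenApiAuth {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Never format the token itself; print a fixed mask instead.
            OpenApiAuth::Bearer(_) => f.debug_tuple("Bearer").field(&"****").finish(),
            // The header name is not secret, so it is shown; the value is masked.
            OpenApiAuth::ApiKey { header, .. } => f
                .debug_struct("ApiKey")
                .field("header", header)
                .field("value", &"****")
                .finish(),
        }
    }
}
```

Because the mask is produced inside the Debug implementation, any logging macro that formats the value with {:?} inherits the redaction without further changes.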
Milestone 4.7 — Graceful Cancellation
- REQ-140: Implement CancellationToken::child_token(): creates a new token that is cancelled when the parent is cancelled. Each ToolContext receives a child token. (Source: [PS])
  - Depends on: REQ-033, REQ-046
  - Definition of Done: Calling agent.abort() (which cancels the root token) causes all active tool contexts' cancel.is_cancelled() to return true simultaneously.
- REQ-141: SubAgentTool forwards the parent's cancel token to the child agent_loop(), so agent.abort() terminates sub-agents as well. (Source: [PS])
  - Depends on: REQ-033, REQ-140
  - Definition of Done: Aborting the parent agent cancels the sub-agent's run.
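The parent-to-child propagation in REQ-140 and REQ-141 can be illustrated with a minimal polling-only token. This CancelToken type is a hypothetical stand-in written for the example; it is not the crate's CancellationToken and has no async wait support. It only shows the tree property: cancelling a parent is visible through every descendant, while cancelling a child leaves the parent untouched.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical polling-only cancellation token.
#[derive(Clone)]
struct CancelToken {
    flag: Arc<AtomicBool>,
    parent: Option<Arc<CancelToken>>,
}

impl CancelToken {
    fn new() -> Self {
        CancelToken {
            flag: Arc::new(AtomicBool::new(false)),
            parent: None,
        }
    }

    /// Create a token that reports cancelled once this token (or any of its
    /// ancestors) is cancelled. The clone shares this token's flag.
    fn child_token(&self) -> Self {
        CancelToken {
            flag: Arc::new(AtomicBool::new(false)),
            parent: Some(Arc::new(self.clone())),
        }
    }

    fn cancel(&self) {
        self.flag.store(true, Ordering::SeqCst);
    }

    /// True if this token or any ancestor has been cancelled.
    fn is_cancelled(&self) -> bool {
        self.flag.load(Ordering::SeqCst)
            || self.parent.as_ref().map_or(false, |p| p.is_cancelled())
    }
}
```

In this arrangement a sub-agent would receive a child of the tool's token, so an abort on the root reaches tool contexts and sub-agent loops through the same chain.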
Milestone 4.8 — Callbacks and Advanced Configuration
- REQ-142: Implement on_update callback in ToolContext: when called, emits AgentEvent::ToolExecutionUpdate { tool_call_id, tool_name, partial_result } to the event channel. (Source: [AR])
  - Depends on: REQ-007, REQ-046
  - Definition of Done: A tool that calls ctx.on_update(partial) causes ToolExecutionUpdate events to appear in the stream before ToolExecutionEnd.
- REQ-143: Implement on_progress callback in ToolContext: when called, emits AgentEvent::ProgressMessage { tool_call_id, tool_name, text }. (Source: [AR])
  - Depends on: REQ-007, REQ-046
  - Definition of Done: A tool that calls ctx.on_progress("working...") causes a ProgressMessage event in the stream.
- REQ-144: Implement Agent::prompt_with_sender(text, tx): like prompt, but streams events to a caller-provided sender rather than creating a new channel. (Source: [AR])
  - Depends on: REQ-034
  - Definition of Done: Events are sent to the provided tx; the caller can multiplex one sender across multiple prompts.
- REQ-145: Implement transform_context and convert_to_llm optional hooks on AgentLoopConfig. When set, stream_assistant_response calls them to preprocess messages before building StreamConfig. (Source: [PS])
  - Depends on: REQ-039
  - Definition of Done: A transform_context hook that adds a prefix message causes that message to appear in every LLM call.
- REQ-146: Implement Agent::with_compaction_strategy(strategy) builder; when set, use the custom CompactionStrategy instead of the default tiered cascade. (Source: [AR])
  - Depends on: REQ-023, REQ-060
  - Definition of Done: A custom strategy that always returns an empty list causes the LLM to be called with no history.
- REQ-147: Define ModelConfig struct with fields: base_url: Option<String>, headers: Map<String,String>, max_tokens_field: String (default "max_tokens"), supports_developer_role: bool, supports_reasoning_effort: bool. Apply in OpenAiCompatProvider. (Source: [AR])
  - Depends on: REQ-041
  - Definition of Done: Setting max_tokens_field: "max_completion_tokens" causes the OpenAI provider to use that key in the request body.
Milestone 4.9 — Agent Identity and Event Hook Observability
- REQ-180: Define ContinuationKind enum in types.rs with three variants: Default (unspecified continuation), Rerun { tag: String } (retry from equivalent context), Branch { tag: String } (different execution path). Tags are RFC 3339 UTC timestamps auto-generated at call time by the caller. (Source: [AR])
  - Depends on: none
  - Definition of Done: All three variants instantiate; Rerun { tag } and Branch { tag } round-trip through JSON serialization preserving the tag string.
- REQ-181: Define TurnTrigger enum in types.rs with four variants: User (first turn of origin call), SubAgent (sub-agent invocation), Continuation (subsequent turns, tool round-trips, steering, Default/Rerun continuations), Branch (first turn of a Branch continuation). Add triggered_by: TurnTrigger field to AgentEvent::TurnStart. (Source: [AR])
  - Depends on: REQ-007
  - Definition of Done: TurnStart events carry the correct triggered_by value: origin calls emit User on turn 0; Branch continuations emit Branch on turn 0; all other first turns and all subsequent turns emit Continuation.
- REQ-182: Add before_loop: Option<BeforeLoopFn> and after_loop: Option<AfterLoopFn> to AgentLoopConfig. BeforeLoopFn fires before AgentStart; return false to abort the loop (emit AgentEnd { messages: [] } instead). AfterLoopFn fires after AgentEnd with the new messages and accumulated usage. Both are wired in agent_loop and agent_loop_continue. (Source: [AR])
  - Depends on: REQ-036, REQ-037
  - Definition of Done: A before_loop returning false stops the run before AgentStart; after_loop is called exactly once per loop call, after AgentEnd, with correct message and usage values.
- REQ-183: Add before_tool_execution: Option<BeforeToolExecutionFn> and after_tool_execution: Option<AfterToolExecutionFn> to AgentLoopConfig. BeforeToolExecutionFn fires before ToolExecutionStart; return false to skip the tool (emit skipped error result). AfterToolExecutionFn fires after ToolExecutionEnd. (Source: [AR])
  - Depends on: REQ-046
  - Definition of Done: A before_tool_execution returning false for one tool causes that tool to be skipped with an error result; other tools in the same batch are unaffected. after_tool_execution is called exactly once per tool call.
- REQ-184: Add before_tool_execution_update: Option<BeforeToolExecutionUpdateFn> and after_tool_execution_update: Option<AfterToolExecutionUpdateFn> to AgentLoopConfig. BeforeToolExecutionUpdateFn fires before each ToolExecutionUpdate; return false to suppress the event (tool keeps running, final ToolResult unaffected). AfterToolExecutionUpdateFn fires after the event when not suppressed. (Source: [AR])
  - Depends on: REQ-142
  - Definition of Done: Suppressing an update via before_tool_execution_update causes no ToolExecutionUpdate event to be emitted; after_tool_execution_update is not called for suppressed updates.
- REQ-185: Enforce and document the event hook ordering invariant: before_loop → AgentStart … before_turn → TurnStart … before_tool_execution → ToolExecutionStart … (before_tool_execution_update → ToolExecutionUpdate → after_tool_execution_update)* … ToolExecutionEnd → after_tool_execution … TurnEnd → after_turn … AgentEnd → after_loop. No hook may fire out of this sequence. (Source: [AR])
  - Depends on: REQ-182, REQ-183, REQ-184
  - Definition of Done: An integration test with all hooks registered verifies they fire in the documented order for a multi-turn, multi-tool run.
- REQ-186: Add fn provider_id(&self) -> &str as a required method on the StreamProvider trait (src/provider/traits.rs). Implement in all 7 providers: "anthropic", "openai", "openai_responses", "azure_openai", "google", "google_vertex", "bedrock". The MockProvider returns "mock". (Source: [AR])
  - Depends on: REQ-020
  - Definition of Done: All 8 StreamProvider implementations compile with provider_id() returning the documented string; existing tests pass unchanged.
- REQ-187: Add config_id: Option<String> field to AgentLoopConfig. When None, Agent::next_loop_id() auto-derives the effective config ID as "{provider_id}.{model_slug}[.thinking]". When Some, the supplied value is used verbatim. Used as the middle segment of loop_id: "{session_id}.{config_id}.{N}". (Source: [AR])
  - Depends on: REQ-029, REQ-186
  - Definition of Done: Setting config_id: Some("my-config") causes loop_id to include "my-config" as its middle segment; leaving None produces an auto-derived segment from provider + model.
- REQ-188: Add agent_id: String and session_id: String fields to Agent struct, both initialized to UUID v4 in Agent::new(). These are stable for the lifetime of the Agent instance and injected into every AgentContext built by Agent::prompt_* and continue_loop_*. (Source: [AR])
  - Depends on: REQ-024
  - Definition of Done: All AgentStart events emitted by a single Agent instance share the same agent_id and session_id values across multiple prompt() calls.
- REQ-189: Add loop_counters: HashMap<String, usize> and last_loop_id: Option<String> to Agent. Implement Agent::next_loop_id(config) -> String: compute effective_config_id from config.config_id or auto-derivation; increment the per-"{session_id}.{effective_config_id}" counter; return "{session_id}.{effective_config_id}.{N}". Set last_loop_id after each prompt_* / continue_loop_* call. (Source: [AR])
  - Depends on: REQ-187, REQ-188
  - Definition of Done: Two agent_loop calls on the same agent with the same provider/model produce loop_id values ending in .1 and .2 respectively; different configs produce independent counters (both .1).
- REQ-190: Add agent_id, session_id, loop_id, parent_loop_id, and continuation_kind fields to AgentContext. In agent_loop, generate and write back agent_id / session_id / loop_id if None at entry. parent_loop_id and continuation_kind remain whatever the caller set. (Source: [AR])
  - Depends on: REQ-028, REQ-180, REQ-189
  - Definition of Done: After agent_loop returns, context.agent_id, context.session_id, and context.loop_id are all Some; a subsequent agent_loop_continue on the same context can read them without regenerating.
- REQ-191: In agent_loop_continue, assert context.agent_id.is_some() and context.session_id.is_some() with descriptive panic messages. Do not silently generate new UUIDs. (Source: [AR])
  - Depends on: REQ-037, REQ-190
  - Definition of Done: Calling agent_loop_continue with agent_id: None panics with a message referencing "agent_loop_continue requires context.agent_id to be set"; with both fields Some, the assertion passes.
- REQ-192: Add agent_id: String, session_id: String, loop_id: String, parent_loop_id: Option<String>, and continuation_kind: Option<ContinuationKind> to AgentEvent::AgentStart. Emit these fields from both agent_loop and agent_loop_continue. parent_loop_id is None for origin calls; continuation_kind is None for origin calls and Some(...) for continuations. (Source: [AR])
  - Depends on: REQ-007, REQ-180, REQ-190, REQ-191
  - Definition of Done: AgentStart events from agent_loop have parent_loop_id: None and continuation_kind: None; events from agent_loop_continue carry the values set on AgentContext.
- REQ-193: In run_loop, determine TurnTrigger for the first turn based on context.continuation_kind: Branch(..) → TurnTrigger::Branch; any other Some(..) → TurnTrigger::Continuation; None → config.first_turn_trigger (default User; SubAgent for sub-agent callers). All subsequent turns use TurnTrigger::Continuation. Emit triggered_by in AgentEvent::TurnStart. (Source: [AR])
  - Depends on: REQ-038, REQ-181
  - Definition of Done: A Branch continuation emits TurnTrigger::Branch on turn 0 and TurnTrigger::Continuation on all subsequent turns; a Default continuation emits TurnTrigger::Continuation on all turns.
- REQ-194: Add child_loop_id: Option<String> to both ToolResult and AgentEvent::ToolExecutionEnd. Sub-agent tools set ToolResult.child_loop_id to the child loop's loop_id after agent_loop completes. execute_single_tool propagates result.child_loop_id into ToolExecutionEnd. Non-sub-agent tools leave both fields None. (Source: [AR])
  - Depends on: REQ-010, REQ-046, REQ-148, REQ-190
  - Definition of Done: A ToolExecutionEnd event from a SubAgentTool call carries a non-None child_loop_id; the same loop_id appears in the child's AgentStart event.
- REQ-195: Add SubAgentTool::with_parent_loop_id(loop_id: String) builder method. When set, the child AgentContext built inside execute() has parent_loop_id: Some(loop_id). The child's AgentStart event thus carries parent_loop_id, enabling ancestry tracing from child back to parent. (Source: [AR])
  - Depends on: REQ-148, REQ-190
  - Definition of Done: A sub-agent tool configured with with_parent_loop_id("parent.loop.1") emits a child AgentStart event with parent_loop_id: Some("parent.loop.1").
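The loop_id scheme (REQ-187, REQ-189) and the first-turn trigger rule (REQ-193) can be sketched together. LoopIds, derive_config_id, and turn_trigger are hypothetical names for this illustration; how model_slug is computed from a model name is not specified above, so it is taken as an input, and the thinking flag is modeled as a plain bool.

```rust
use std::collections::HashMap;

/// Hypothetical holder for the per-config loop counters kept on the agent.
struct LoopIds {
    session_id: String,
    counters: HashMap<String, usize>,
}

impl LoopIds {
    fn new(session_id: &str) -> Self {
        LoopIds { session_id: session_id.to_string(), counters: HashMap::new() }
    }

    /// Returns "{session_id}.{effective_config_id}.{N}", with N starting at 1
    /// and counted independently per effective config ID.
    fn next_loop_id(&mut self, effective_config_id: &str) -> String {
        let key = format!("{}.{}", self.session_id, effective_config_id);
        let counter = self.counters.entry(key.clone()).or_insert(0);
        *counter += 1;
        format!("{key}.{counter}")
    }
}

/// Auto-derived config ID: "{provider_id}.{model_slug}[.thinking]".
fn derive_config_id(provider_id: &str, model_slug: &str, thinking: bool) -> String {
    if thinking {
        format!("{provider_id}.{model_slug}.thinking")
    } else {
        format!("{provider_id}.{model_slug}")
    }
}

enum ContinuationKind {
    Default,
    Rerun { tag: String },
    Branch { tag: String },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum TurnTrigger {
    User,
    SubAgent,
    Continuation,
    Branch,
}

/// REQ-193: only the first turn (index 0) consults the continuation kind.
fn turn_trigger(
    turn_index: usize,
    continuation_kind: Option<&ContinuationKind>,
    first_turn_trigger: TurnTrigger,
) -> TurnTrigger {
    if turn_index > 0 {
        return TurnTrigger::Continuation;
    }
    match continuation_kind {
        Some(ContinuationKind::Branch { .. }) => TurnTrigger::Branch,
        Some(_) => TurnTrigger::Continuation,
        None => first_turn_trigger,
    }
}
```

Keying the counter on the session and config ID together is what lets two different configs on the same agent both start at .1, as the Definition of Done for REQ-189 requires.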
Milestone 4.10 — Evaluational Parallelism
- REQ-196: Migrate AgentContext.tools from Vec<Box<dyn AgentTool>> to Vec<Arc<dyn AgentTool>>. Add #[derive(Clone)] to AgentContext. Update Agent::set_tools, BasicAgent::with_tools, default_tools() return type, and all push sites in BasicAgent (sub-agent, openapi, mcp). Remove ArcToolWrapper from sub_agent.rs. (Implemented)
  - Depends on: REQ-028, REQ-046
  - Definition of Done: AgentContext: Clone; all existing tests pass; ArcToolWrapper deleted.
- REQ-197: Add Usage::combine(&self, other: &Usage) -> Usage method for summing usage across branches. (Implemented)
  - Depends on: none
  - Definition of Done: usage_a.combine(&usage_b) returns a Usage with all fields summed.
- REQ-198: Add ParallelLoopOutcome and ParallelLoopResult structs to types.rs. Add AgentEvent::ParallelLoopStart { session_id, loop_ids, timestamp } and AgentEvent::ParallelLoopEnd { session_id, selected_loop_id, selected_config_index, evaluation_usage, timestamp } variants to AgentEvent. (Implemented)
  - Depends on: REQ-190, REQ-197
  - Definition of Done: Both structs construct and the enum variants match correctly.
- REQ-199: Define EvaluationDecision enum and EvaluationStrategy trait in types.rs. Trait method: evaluate(prompts, outcomes, tx, cancel) -> (EvaluationDecision, Usage). Placed in types.rs (not evaluation.rs) to avoid a circular dependency with agent_loop.rs. (Implemented)
  - Depends on: REQ-198
  - Definition of Done: Custom implementations compile by importing from crate::types or crate::evaluation.
- REQ-200: Create src/agent_loop/evaluation.rs with five built-in EvaluationStrategy implementations: TransparentEvaluation (single-branch pass-through), PickFirstEvaluation (always index 0), TokenEfficientEvaluation (lowest total_tokens), ElaborateEvaluation (highest total_tokens), LlmJudgeEvaluation { judge_config, system_prompt }. (Implemented)
  - Depends on: REQ-199
  - Definition of Done: All five strategies implement EvaluationStrategy; unit tests pass for each.
- REQ-201: LlmJudgeEvaluation, judge prompt construction: extract original query text from user messages in prompts only; extract final assistant text from each branch's new_messages (strip tool calls, tool results, intermediate turns). Build numbered judge prompt; run agent_loop with judge_config; parse first integer from reply; inherit session_id from branches for traceability. (Implemented)
  - Depends on: REQ-200
  - Definition of Done: Judge receives clean final responses, not raw tool traces; judge AgentStart has same session_id as branches.
- REQ-202: LlmJudgeEvaluation, judge's comprehension criteria: all N branch final responses must fit in the judge model's context budget simultaneously. Apply iterative multi-tier compaction: tier 1 (last 80 lines), tier 2 (first+last paragraph), tier 3 (hard char limit derived from budget / N). Budget derives from judge_config.context_config.max_context_tokens (if set). Emit AgentEvent::ProgressMessage warning if criteria cannot be satisfied after tier 3. Selected winner always returns the original uncompacted messages. (Implemented)
  - Depends on: REQ-201
  - Definition of Done: With a tight context_config.max_context_tokens, compaction fires and a warning is emitted; selected output is the original branch content.
- REQ-203: Add derive_config_segment(config: &AgentLoopConfig) -> String helper (pub crate) and run_parallel_branches(...) internal async function to agent_loop.rs. Add agent_loop_parallel(prompts, base_context, configs, strategy, tx, cancel) -> ParallelLoopResult public async function. Uses futures::future::join_all for branch concurrency (avoids 'static bound on AgentLoopConfig hooks). Per-branch forwarder task (tokio::spawn) captures usage from AgentEnd. (Implemented)
  - Depends on: REQ-196, REQ-199
  - Definition of Done: agent_loop_parallel with 2 configs runs both branches, emits ParallelLoopStart / ParallelLoopEnd, and returns correct selected_index.
- REQ-204: Export evaluation module from lib.rs; re-export agent_loop_parallel and all five evaluation strategies at crate root. (Implemented)
  - Depends on: REQ-200, REQ-203
  - Definition of Done: use phi_core::{agent_loop_parallel, PickFirstEvaluation, LlmJudgeEvaluation} compiles.
- REQ-205: agent_loop_parallel routes to agent_loop_continue when prompts is empty. (Implemented)
  - Depends on: REQ-203
  - Definition of Done: Calling agent_loop_parallel(vec![], ctx_with_user_msg, ...) dispatches each branch via agent_loop_continue and returns a valid ParallelLoopResult.
- REQ-206: Add original_context_len: usize to ParallelLoopOutcome. (Implemented)
  - Depends on: REQ-198, REQ-205
  - Definition of Done: outcome.context.messages[..outcome.original_context_len] is the shared base context; [original_context_len..] are branch-produced messages.
- REQ-207: LlmJudgeEvaluation extracts prior conversation context and query from context.messages[..original_context_len] in agent_loop_continue mode; includes formatted prior-context transcript in judge prompt. (Implemented)
  - Depends on: REQ-201, REQ-206
  - Definition of Done: When prompts is empty, the judge prompt contains "Prior conversation context:" and "Original query:" sections derived from the original context.
- REQ-208: Replace single-pass output compaction with 2-iteration compact_for_judge: Iteration 1 compacts prior context only (outputs intact); Iteration 2 compacts both independently. (Implemented)
  - Depends on: REQ-202, REQ-207
  - Definition of Done: Under a tight token budget, outputs remain uncompacted as long as prior-context compaction alone can satisfy the criteria.
- REQ-209: Updated build_judge_user_message includes optional prior context section before the query. (Implemented)
  - Depends on: REQ-207
  - Definition of Done: Judge prompt includes "Prior conversation context:\n<transcript>" when prior context is non-empty; omitted when empty (fresh-session case).
Level 5 — Creative
Goal: The system surpasses the original. Sub-agent delegation, OpenAPI tool generation, and advanced Anthropic protocol features are implemented, and all documented ambiguities are resolved with principled design decisions.
Completion Criteria: SubAgentTool works end-to-end; the OpenAPI adapter generates callable tools from a spec file; all [AMBIGUOUS] items have a documented resolution; performance benchmarks for parallel tool execution meet or exceed documented expectations.
Milestone 4.11 — Persistent Session Layer
- REQ-210: Add loop_id: String to all AgentEvent variants that lacked it (AgentEnd, TurnStart, TurnEnd, MessageStart, MessageUpdate, MessageEnd, ToolExecutionStart, ToolExecutionUpdate, ToolExecutionEnd, ProgressMessage, InputRejected). Add Serialize, Deserialize to AgentEvent, ContinuationKind, TurnTrigger, StreamDelta. Thread loop_id through all emission sites in agent_loop.rs and evaluation.rs. (Source: [AR])
  - Depends on: REQ-007, REQ-114
  - Definition of Done: All AgentEvent variants carry loop_id; events from interleaved parallel branches can be unambiguously attributed to the correct LoopRecord.
- REQ-211: Define Session, LoopRecord, LoopEvent, and LoopConfigSnapshot types in src/session/. Session contains an ordered Vec<LoopRecord>; LoopRecord holds identity fields (loop_id, session_id, agent_id), timing, status, messages (from AgentEnd.messages), usage, events, and tree links (children_loop_ids, parent_loop_id). LoopConfigSnapshot stores model, provider, config_id. (Source: [AR])
  - Depends on: REQ-210
  - Definition of Done: All types serialize/deserialize (JSON round-trip lossless); Session.total_usage() sums LoopRecord.usage across all loops.
- REQ-212: Define ChildLoopRef and SpawnRef for bidirectional cross-session sub-agent tracking. ChildLoopRef is stored in LoopRecord.child_loop_refs (parent → child); SpawnRef is stored in Session.parent_spawn_ref (child → parent). Both carry tool_call_id, tool_name, and cross-session ids. (Source: [AR])
  - Depends on: REQ-211
  - Definition of Done: A parent session's LoopRecord.child_loop_refs can be used to load and link the child session.
- REQ-213: Define ParallelGroupRecord and implement LoopStatus::Pending pre-registration in SessionRecorder. When ParallelLoopStart arrives, pre-create LoopRecord { status: Pending } for each branch loop_id so the group is registered before AgentStart fires for each branch. ParallelLoopEnd retroactively sets ParallelGroupRecord on all branch records. (Source: [AR])
  - Depends on: REQ-211
  - Definition of Done: After a parallel loop completes, all branch LoopRecords have parallel_group set; exactly one has is_selected = true.
- REQ-214: Implement SessionRecorder with PerSessionId formation policy. on_event(event) routes events by loop_id: creates Session on first-seen session_id from AgentStart; closes LoopRecord on AgentEnd; appends bidirectional tree links; handles sub-agent SpawnRef enrichment from ToolExecutionEnd.child_loop_id. (Source: [AR])
  - Depends on: REQ-211, REQ-212, REQ-213
  - Definition of Done: test_session_recorder_single_loop, test_session_recorder_continuation, test_session_recorder_bidirectional_tree, test_session_recorder_continuation_kind all pass.
- REQ-215: Add BasicAgent::new_session() and check_and_rotate(threshold) to BasicAgent. Add last_active_at: Option<DateTime<Utc>> field; update prompt_messages_with_sender to record it. new_session() rotates session_id, clears loop_counters and last_loop_id. (Source: [AR])
  - Depends on: REQ-214
  - Definition of Done: test_basic_agent_new_session and test_basic_agent_check_and_rotate pass.
- REQ-216: Implement save_session, load_session, list_session_ids persistence API. File layout: {dir}/{session_id}.json (pretty-printed JSON, flat directory). list_session_ids returns ids sorted by modification time (newest first). (Source: [AR])
  - Depends on: REQ-211
  - Definition of Done: test_session_save_load_roundtrip and test_session_list_ids pass; saved files are valid, human-readable JSON.
- REQ-217: Implement load_sessions_for_agent and delete_session. load_sessions_for_agent loads all sessions in dir and filters by agent_id. delete_session removes the file; returns SessionError::NotFound if absent. (Source: [AR])
  - Depends on: REQ-216
  - Definition of Done: test_session_delete passes; load_sessions_for_agent returns only sessions with the matching agent_id.
- REQ-218: Implement Session tree navigation methods: root_loops(), children_of(loop_id), parallel_siblings(loop_id), get_loop(loop_id). Export all public session types from src/lib.rs. (Source: [AR])
  - Depends on: REQ-211
  - Definition of Done: test_session_recorder_parallel_group and test_session_recorder_bidirectional_tree exercise all navigation methods; all assertions pass.
- REQ-219: Write docs/concepts/sessions.md documenting: Overview, Session Formation (three modes), LoopRecord Anatomy (field table, LoopStatus lifecycle, continuation_kind classification, LoopConfigSnapshot rationale), Loop Tree Navigation, Cross-Session Sub-Agent Tracking, Parallel Evaluation Groups, SessionRecorder usage with code example, Persistence API, and 9 Design Decisions (each with decision / why / rejected alternative). (Source: [AR])
  - Depends on: REQ-211 – REQ-218
  - Definition of Done: docs/concepts/sessions.md exists; covers all listed sections; code examples are syntactically valid Rust.
- REQ-220: Update docs/specs/architecture.md: add SessionStore component section, add SessionStore to dependency graph, update AgentEvent variant table to document loop_id: String on all applicable variants, add Session/LoopRecord/SessionRecorder data model entries, add new_session()/check_and_rotate()/last_active_at to BasicAgent interface table. Update docs/specs/roadmap.md with this milestone. (Source: [AR])
  - Depends on: REQ-219
  - Definition of Done: Both spec files updated; all new types and methods are documented.
- REQ-221: Fix SessionRecorder SpawnRef enrichment to handle the case where the child session has already been moved to completed before the parent's ToolExecutionEnd fires. Currently, ToolExecutionEnd only searches open_sessions for the child session to enrich parent_spawn_ref.tool_call_id/tool_name. If flush() is called between the child's AgentEnd and the parent's ToolExecutionEnd (e.g. periodic batch checkpointing in production), the child session is already in completed and the enrichment is silently skipped, leaving tool_call_id: "" and tool_name: "" on the SpawnRef permanently. Fix by also searching completed sessions in the enrichment step, or by deferring child-session promotion to completed until the parent loop also closes. (Source: post-sprint review)
  - Depends on: REQ-214
  - Definition of Done: A test demonstrates that calling flush() between the child's AgentEnd and the parent's ToolExecutionEnd still produces a fully-enriched SpawnRef on the child session.
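The first fix option in REQ-221 (also searching completed sessions) might look like the sketch below. The `Recorder`, `Session`, and `SpawnRef` types here are heavily simplified stand-ins, and `enrich_spawn_ref` is a hypothetical method name; the real SessionRecorder carries far more state.

```rust
use std::collections::HashMap;

#[derive(Debug, Default, Clone, PartialEq)]
struct SpawnRef {
    tool_call_id: String,
    tool_name: String,
}

#[derive(Debug, Default)]
struct Session {
    parent_spawn_ref: Option<SpawnRef>,
}

/// Simplified stand-in for SessionRecorder (assumption: two id-keyed maps).
#[derive(Default)]
struct Recorder {
    open_sessions: HashMap<String, Session>,
    completed: HashMap<String, Session>,
}

impl Recorder {
    /// Simplified checkpoint: move every open session to `completed`.
    fn flush(&mut self) {
        self.completed.extend(self.open_sessions.drain());
    }

    /// Fill in the child's SpawnRef. Returns false when the child session
    /// (or its SpawnRef) cannot be found in either map.
    fn enrich_spawn_ref(&mut self, child_id: &str, tool_call_id: &str, tool_name: &str) -> bool {
        // Look in open sessions first, then fall back to completed ones.
        let session = if self.open_sessions.contains_key(child_id) {
            self.open_sessions.get_mut(child_id)
        } else {
            self.completed.get_mut(child_id)
        };
        match session.and_then(|s| s.parent_spawn_ref.as_mut()) {
            Some(spawn) => {
                spawn.tool_call_id = tool_call_id.to_string();
                spawn.tool_name = tool_name.to_string();
                true
            }
            None => false,
        }
    }
}
```

Returning a bool makes a skipped enrichment observable to the caller instead of silent, which is one way a regression test could detect the original bug.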
Milestone 5.1 — Sub-Agent Delegation
- REQ-148: Implement SubAgentTool::execute: validate params["task"] is non-empty; build a fresh AgentContext (empty messages, own toolset); build AgentLoopConfig with max_turns guard (default 10), no steering/follow-ups, no input filters; spawn child agent_loop; await result; call extract_final_text. (Source: [PS])
  - Depends on: REQ-036, REQ-157
  - Definition of Done: A sub-agent tool registered on a parent agent completes a delegated task and returns the child agent's final text as a ToolResult.
- REQ-149: Implement extract_final_text(messages) -> String: scan messages in reverse for the last Assistant message with Text content blocks; join and return them; fall back to "(sub-agent produced no text output)". (Source: [PS])
  - Depends on: REQ-002
  - Definition of Done: extract_final_text returns the text of the last assistant message; an all-tool-call assistant message returns the fallback string.
- REQ-150: Sub-agent event forwarding: spawn a task to consume child AgentEvents and forward them to parent channel as ToolExecutionUpdate (for MessageUpdate::Text) and ProgressMessage (for child ProgressMessage) events. (Source: [PS])
  - Depends on: REQ-007, REQ-148
  - Definition of Done: Parent event stream includes ToolExecutionUpdate events showing the sub-agent's text generation in real time.
- REQ-151: Implement SubAgentTool builder: SubAgentTool::new(name, model_config).with_system_prompt(...).with_tools(...).with_max_turns(...).with_thinking(...). (Source: [AR])
  - Depends on: REQ-021, REQ-148
  - Definition of Done: A fully configured SubAgentTool can be added to a parent agent's tool list via with_tools.
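The reverse scan described in REQ-149 can be illustrated with a small sketch. The `Message` and `Content` enums below are trimmed-down stand-ins for the crate's types, and joining text blocks with a newline is an assumption, since the requirement only says "join".

```rust
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Content {
    Text(String),
    ToolCall { name: String },
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Message {
    User(Vec<Content>),
    Assistant(Vec<Content>),
    ToolResult(Vec<Content>),
}

const FALLBACK: &str = "(sub-agent produced no text output)";

/// Return the text of the last assistant message that has Text blocks.
fn extract_final_text(messages: &[Message]) -> String {
    for msg in messages.iter().rev() {
        if let Message::Assistant(blocks) = msg {
            // Keep only Text blocks; tool calls carry no final text.
            let texts: Vec<&str> = blocks
                .iter()
                .filter_map(|c| match c {
                    Content::Text(t) => Some(t.as_str()),
                    _ => None,
                })
                .collect();
            if !texts.is_empty() {
                // Separator is an assumption (newline).
                return texts.join("\n");
            }
        }
    }
    FALLBACK.to_string()
}
```

An assistant message consisting only of tool calls is skipped, so a conversation with no assistant text at all yields the fallback string.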
Milestone 5.2 — OpenAPI Adapter (Feature-Gated)
- REQ-152: Implement OpenApiAdapter::from_str(spec, config, filter): auto-detect JSON vs YAML (first non-whitespace char { or [ → JSON, else YAML); parse OpenAPI 3.x spec; resolve base URL; generate one OpenApiToolAdapter per matching operation. (Source: [AR])
  - Depends on: REQ-153, REQ-154, REQ-155, REQ-156
  - Definition of Done: A valid OpenAPI 3.x spec string (JSON and YAML both) produces one tool adapter per operation with an operationId.
- REQ-153: Classify parameters: path → URL substitution with RFC 3986 percent-encoding; query → query string; header → request headers; cookie → skip with no error; requestBody (application/json only) → keyed as "body" (or "_request_body" on name collision). (Source: [AR])
  - Depends on: REQ-021
  - Definition of Done: Path parameters appear in the URL; query parameters appear in the query string; cookie parameters are silently ignored.
- REQ-154: Implement the HTTP execution pipeline per tool call: validate params, substitute path params, build URL, chain query/header params, apply OpenApiAuth, apply custom_headers, optionally attach JSON body, send request, read body, truncate at max_response_bytes on a UTF-8 boundary, return "{METHOD} {URL} → {STATUS}\n\n{BODY}". (Source: [AR])
  - Depends on: REQ-021
  - Definition of Done: A POST to a test endpoint with path, query, and body params produces the documented return format.
- REQ-155: Implement OperationFilter: All (include everything with an operationId); ByOperationId(ids) (include only listed IDs); ByTag(tags) (include operations tagged with any listed tag); ByPathPrefix(prefix) (include operations whose path starts with prefix). Operations without operationId always emit a warning and are skipped. (Source: [AR])
  - Depends on: REQ-152
  - Definition of Done: Each filter variant correctly includes/excludes operations; an operation without operationId logs a warning and is excluded regardless of filter.
- REQ-156: Apply optional name_prefix from OpenApiConfig: tool name becomes "{prefix}__{operationId}" when set. (Source: [AR])
  - Depends on: REQ-152
  - Definition of Done: With name_prefix: Some("myapi"), the tool for operationId: "getUser" is named "myapi__getUser".
- REQ-157: Implement from_file(path, config, filter) (async file read) and from_url(url, config, filter) (HTTP GET via HTTP client). (Source: [AR])
  - Depends on: REQ-152
  - Definition of Done: Both sources produce tool lists identical to those from from_str on the same spec content.
- REQ-158: Implement Agent::with_openapi_file, with_openapi_url, with_openapi_spec builders on Agent. Gate the entire openapi module behind an openapi feature flag. (Source: [AR])
  - Depends on: REQ-026, REQ-157
  - Definition of Done: Without the openapi feature, the code compiles successfully without the adapter; with it, all three builders are available.
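Three of the smaller rules in this milestone (format detection from REQ-152, prefixed tool names from REQ-156, and UTF-8-safe truncation from REQ-154) are easy to show in isolation. The function and enum names below are hypothetical and use only the standard library; they are a sketch of the stated rules, not the adapter's actual internals.

```rust
#[derive(Debug, PartialEq)]
enum SpecFormat {
    Json,
    Yaml,
}

/// JSON if the first non-whitespace character is `{` or `[`, else YAML.
fn detect_format(spec: &str) -> SpecFormat {
    match spec.trim_start().chars().next() {
        Some('{') | Some('[') => SpecFormat::Json,
        _ => SpecFormat::Yaml,
    }
}

/// Tool name is "{prefix}__{operationId}" when a prefix is configured.
fn tool_name(name_prefix: Option<&str>, operation_id: &str) -> String {
    match name_prefix {
        Some(prefix) => format!("{prefix}__{operation_id}"),
        None => operation_id.to_string(),
    }
}

/// Truncate to at most `max_response_bytes`, backing up to the nearest
/// character boundary so a multi-byte character is never split.
fn truncate_utf8(body: &str, max_response_bytes: usize) -> &str {
    if body.len() <= max_response_bytes {
        return body;
    }
    let mut end = max_response_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    &body[..end]
}
```

Backing up to a boundary means the result can be slightly shorter than the limit, but slicing never panics on multi-byte input.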
Milestone 5.3 — Advanced Anthropic Protocol
- REQ-159: Implement Anthropic OAuth auth path: when model_config indicates OAuth, use Authorization: Bearer {TOKEN} header plus beta headers claude-code-20250219, oauth-2025-04-20, fine-grained-tool-streaming-2025-05-14, x-app: cli, anthropic-dangerous-direct-browser-access: true, user-agent: claude-cli/2.1.2. (Source: [AR])
  - Depends on: REQ-040
  - Definition of Done: An OAuth-configured provider sends all documented headers; standard API key auth sends the standard x-api-key header.
- REQ-160: Implement Anthropic InputJsonDelta tool-argument streaming: buffer incremental InputJsonDelta text fragments in arguments["__partial_json"]; parse the complete accumulated string as JSON on content_block_stop. (Source: [AR])
  - Depends on: REQ-040
  - Definition of Done: A tool call streamed in 5 InputJsonDelta fragments produces a single, complete, parseable JSON arguments object.
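The buffering half of REQ-160 can be sketched without any JSON dependency. `ToolArgBuffer` is a hypothetical type that keeps the fragments in a plain String field rather than in arguments["__partial_json"]; the string returned by `finish` would then be handed to a JSON parser (for example serde_json::from_str) when content_block_stop arrives.

```rust
/// Accumulates streamed argument fragments for one tool call.
/// Hypothetical stand-in for the arguments["__partial_json"] slot.
#[derive(Default)]
struct ToolArgBuffer {
    partial_json: String,
}

impl ToolArgBuffer {
    /// Append one InputJsonDelta fragment. Fragments are not valid JSON
    /// on their own, so nothing is parsed at this stage.
    fn push_delta(&mut self, fragment: &str) {
        self.partial_json.push_str(fragment);
    }

    /// Called on content_block_stop: hand back the complete string and
    /// reset the buffer for the next tool call.
    fn finish(&mut self) -> String {
        std::mem::take(&mut self.partial_json)
    }
}
```

Deferring the parse until the block ends avoids having to handle fragments that break in the middle of a key, string, or number.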
Milestone 5.4 — Ambiguity Resolutions
- REQ-161: [AMBIGUOUS] Standardize AgentEnd emission on abort: define and document whether AgentEnd is emitted when cancellation is detected at various checkpoints (start of loop, mid-stream, mid-tool). Implement a consistent policy. (Source: [PS])
  - Depends on: REQ-067, REQ-082
  - Definition of Done: The chosen policy is documented; behavior is consistent regardless of where in the loop cancellation is detected.
- REQ-162: TokenCounter trait in context/token.rs with HeuristicTokenCounter (chars/4) as default. Pluggable via ContextConfig.token_counter. Threaded through all hot-path call sites. (Source: [OV])
  - Depends on: REQ-054
  - Definition of Done: A TokenCounter trait or injection point exists; the default implementation uses the 4-char heuristic; a precise implementation can be substituted via configuration.
- REQ-163: [AMBIGUOUS] Define sub-agent error propagation: document what execute() returns when the child agent_loop produces only error/empty messages. Implement the extract_final_text fallback consistently. (Source: [PS])
  - Depends on: REQ-149
  - Definition of Done: The policy is documented; child agent error messages are reflected in the fallback text or surfaced as ToolError::Failed.
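The injection point described in REQ-162 could take roughly the following shape. The trait method name `count`, the rounding direction of the chars/4 heuristic (rounded up here), and the `WhitespaceTokenCounter` alternative are all assumptions made for illustration.

```rust
/// Pluggable token estimator (method name is an assumption).
trait TokenCounter {
    fn count(&self, text: &str) -> usize;
}

/// Default heuristic: roughly one token per four characters.
struct HeuristicTokenCounter;

impl TokenCounter for HeuristicTokenCounter {
    fn count(&self, text: &str) -> usize {
        // Ceiling division; the rounding direction is an assumption.
        (text.chars().count() + 3) / 4
    }
}

/// Hypothetical substitute, standing in for a precise tokenizer.
struct WhitespaceTokenCounter;

impl TokenCounter for WhitespaceTokenCounter {
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Call sites depend only on the trait, so the counter can be swapped
/// through configuration without touching them.
fn estimate_total(counter: &dyn TokenCounter, messages: &[&str]) -> usize {
    messages.iter().map(|m| counter.count(m)).sum()
}
```

Passing the counter as a trait object keeps hot-path call sites unchanged when a more precise implementation is configured.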
Level 6 — Boss
Goal: The system is exceptional. It is fully tested, scalable, developer-friendly, and operates as a platform with a clear public API contract and operational runbooks.
Completion Criteria: The system passes load tests at 10x expected tool concurrency. Full test coverage includes unit, integration, property-based, and end-to-end tests. Public API documentation is complete. Operational runbooks cover all known failure modes.
Milestone 6.1 — Full Test Suite
- REQ-164: Unit tests for all three compaction levels (level1, level2, level3) including: no-op when under budget; exact budget boundary; message count edge cases (fewer messages than keep_recent/keep_first); correct ordering of head+marker+tail in level 3. (Source: [AR])
  - Depends on: REQ-056 through REQ-059
  - Definition of Done: All edge cases identified above have dedicated test cases that pass.
- REQ-165: Property-based tests for compact_messages: for any valid (messages, config) input, total_tokens(compact_messages(messages, config)) <= budget. (Source: [AR])
  - Depends on: REQ-056
  - Definition of Done: 10,000 random test cases all satisfy the budget invariant without panic.
- REQ-166: Unit tests for delay_for_attempt: verify exponential growth; verify jitter stays in [0.8, 1.2] range over 10,000 samples; verify max_delay_ms cap is respected. (Source: [AR])
  - Depends on: REQ-071
  - Definition of Done: All three assertions pass across the full retry range.
- REQ-167: Integration tests for each of the 7 provider protocols using a mock HTTP server: correct request format, correct response parsing, correct StopReason mapping, correct tool-call extraction. (Source: [AR])
  - Depends on: REQ-040 through REQ-042, REQ-120 through REQ-124
  - Definition of Done: Each provider has at least one happy-path integration test and one error-path test using a local mock server.
- REQ-168: Integration test for MCP stdio transport: spawn a minimal mock MCP server subprocess; verify initialize handshake, tool listing, and tool execution. (Source: [AR])
  - Depends on: REQ-114 through REQ-119
  - Definition of Done: The mock MCP server can be connected to, queried, and called; all three phases produce correct results.
- REQ-169: End-to-end agent loop tests using MockProvider: test single-turn text response; multi-turn tool call cycle; steering injection mid-run; follow-up queue; execution limit enforcement; context compaction trigger; input filter rejection. (Source: [AR])
  - Depends on: REQ-036 through REQ-090
  - Definition of Done: All seven scenarios have a passing automated test.
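The three properties REQ-166 tests (exponential growth, jitter bounded to [0.8, 1.2], and the max_delay_ms cap) are easier to assert when the jitter factor is injected rather than drawn inside the function. The sketch below takes that approach; the `base_delay_ms` and `multiplier` field names, the extra `jitter` parameter, and applying the cap after jitter are assumptions, not the crate's documented behavior.

```rust
use std::time::Duration;

/// Simplified retry settings; only max_delay_ms is named in the text,
/// the other two field names are assumptions.
struct RetryConfig {
    base_delay_ms: u64,
    multiplier: f64,
    max_delay_ms: u64,
}

/// Backoff delay for a zero-based attempt number. `jitter` is the random
/// factor supplied by the caller; it is clamped to [0.8, 1.2].
fn delay_for_attempt(cfg: &RetryConfig, attempt: u32, jitter: f64) -> Duration {
    let jitter = jitter.clamp(0.8, 1.2);
    // Exponential growth: base * multiplier^attempt.
    let raw = cfg.base_delay_ms as f64 * cfg.multiplier.powi(attempt as i32);
    // Cap after jitter so the result never exceeds max_delay_ms.
    let capped = (raw * jitter).min(cfg.max_delay_ms as f64);
    Duration::from_millis(capped.round() as u64)
}
```

With the factor injected, a test can pin the jitter to 1.0 to check growth exactly, then sweep random factors to check the bounds.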
Milestone 6.2 — Load and Scale Testing
- REQ-170: Load test: run 100 parallel agents each with 10 concurrent tool calls using MockProvider. Verify no data races, no deadlocks, correct result ordering, no memory leaks. (Source: [AR])
  - Depends on: REQ-045, REQ-085
  - Definition of Done: 1,000 total tool calls complete correctly with no panics and tool results are in original call order.
- REQ-171: Load test: run a single agent for 1,000 turns with compaction enabled. Verify token estimates stay bounded; no unbounded memory growth; compaction fires when expected. (Source: [AR])
  - Depends on: REQ-056, REQ-060
  - Definition of Done: Memory usage stabilizes after compaction; no message is dropped in violation of the keep_first/keep_recent invariants.
- REQ-172: Memory profile: verify Agent.messages does not grow unboundedly in a long conversation with compaction enabled. (Source: [AR])
  - Depends on: REQ-056, REQ-060
  - Definition of Done: Message count stays within keep_first + keep_recent + small_constant after steady state is reached.
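The "results are in original call order" check from REQ-170 relies on collecting results by call position rather than by completion time. The sketch below shows the idea with standard-library threads instead of the Tokio tasks the library uses; `run_tools_in_parallel` and its (name, sleep) inputs are hypothetical.

```rust
use std::thread;
use std::time::Duration;

/// Run each simulated tool call on its own thread and return results in
/// the order the calls were issued, regardless of which finishes first.
fn run_tools_in_parallel(calls: Vec<(String, u64)>) -> Vec<String> {
    // Spawn everything first so the calls actually overlap.
    let handles: Vec<_> = calls
        .into_iter()
        .map(|(name, sleep_ms)| {
            thread::spawn(move || {
                // Simulated work of varying duration.
                thread::sleep(Duration::from_millis(sleep_ms));
                format!("{name}: ok")
            })
        })
        .collect();

    // Joining in spawn order restores the original call order.
    handles
        .into_iter()
        .map(|h| h.join().expect("tool thread panicked"))
        .collect()
}
```

A load test can feed in calls with deliberately shuffled durations and assert that the output order still matches the input order.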
Milestone 6.3 — Public API Contract and Documentation
- REQ-173: Publish complete API reference documentation for all public types, traits, and functions with usage examples for each primary use case from ../reference/glossary.md. (Source: [OV])
  - Depends on: REQ-001 through REQ-163
  - Definition of Done: A developer with no prior context can build a working coding assistant and CLI REPL from the docs alone.
- REQ-174: Document all 7 provider integration contracts: authentication method, endpoint pattern, request format, response parsing notes, any quirks (e.g., Bedrock ndjson, Google tool ID generation, Azure api-key header). (Source: [AR])
  - Depends on: REQ-040 through REQ-042, REQ-120 through REQ-124
  - Definition of Done: Each provider has a documentation page listing all fields from the integration contract table.
- REQ-175: Write and publish working example implementations: (1) CLI REPL with /quit, /clear, /model commands; (2) coding assistant with all built-in tools; (3) multi-agent pipeline with SubAgentTool. (Source: [OV])
  - Depends on: REQ-053, REQ-148
  - Definition of Done: All three examples compile and run end-to-end; the CLI REPL handles all three slash commands.
- REQ-176: Publish AgentSkills standard compliance documentation and MCP integration guide. (Source: [OV])
  - Depends on: REQ-109 through REQ-113, REQ-114 through REQ-119
  - Definition of Done: Both guides include a "getting started" section that results in a working integration.
Milestone 6.4 — Developer Tooling and Operational Readiness
- REQ-177: Package and publish the library with proper semantic versioning. The openapi feature is opt-in. Document all feature flags. (Source: [AR])
  - Depends on: REQ-158
  - Definition of Done: Library installs as a dependency; openapi feature is absent from the default build; enabling it adds the adapter without breaking existing code.
- REQ-178: CI pipeline: run unit tests, integration tests (with mock servers), and openapi-feature tests on every commit. Gate provider live tests behind API key secrets. (Source: [AR])
  - Depends on: REQ-164 through REQ-169
  - Definition of Done: CI passes on every commit; provider live tests run in a separate gated workflow.
- REQ-179: Operational runbook covering: retry tuning (when to adjust RetryConfig); context overflow handling (choosing ContextConfig values); provider failover (switching providers on persistent failures); MCP server crash recovery; performance profiling guide. (Source: [AR])
  - Depends on: REQ-071 through REQ-077
  - Definition of Done: The runbook covers all five topics with actionable decision trees.
Requirement Index
| REQ | Description | Level | Milestone | Source | Depends On |
|---|---|---|---|---|---|
| REQ-001 | Content enum (Text, Image, Thinking, ToolCall) | 1 | 1.1 | [AR] | — |
| REQ-002 | Message enum (User, Assistant, ToolResult) | 1 | 1.1 | [AR] | REQ-001, REQ-005, REQ-006 |
| REQ-003 | AgentMessage enum (Llm, Extension) | 1 | 1.1 | [AR] | REQ-002, REQ-004 |
| REQ-004 | ExtensionMessage struct | 1 | 1.1 | [AR] | — |
| REQ-005 | StopReason enum | 1 | 1.1 | [AR] | — |
| REQ-006 | Usage struct with cache_hit_rate() | 1 | 1.1 | [AR] | — |
| REQ-007 | AgentEvent enum (all variants) | 1 | 1.1 | [AR] | REQ-002, REQ-008 |
| REQ-008 | StreamDelta enum | 1 | 1.1 | [AR] | — |
| REQ-009 | ToolContext struct | 1 | 1.1 | [AR] | — |
| REQ-010 | ToolResult and ToolError types | 1 | 1.1 | [AR] | REQ-001 |
| REQ-011 | ContextConfig struct with defaults | 1 | 1.1 | [AR] | — |
| REQ-012 | ExecutionLimits and ExecutionTracker | 1 | 1.1 | [AR] | — |
| REQ-013 | RetryConfig with defaults | 1 | 1.1 | [AR] | — |
| REQ-014 | CacheConfig and CacheStrategy | 1 | 1.1 | [AR] | — |
| REQ-015 | StreamConfig struct | 1 | 1.1 | [AR] | REQ-014, REQ-016 |
| REQ-016 | ToolDefinition struct | 1 | 1.1 | [AR] | — |
| REQ-017 | QueueMode enum | 1 | 1.1 | [AR] | — |
| REQ-018 | Full Serialize/Deserialize on AgentMessage tree | 1 | 1.1 | [OV] | REQ-001–017 |
| REQ-019 | ThinkingLevel enum | 1 | 1.1 | [OV] | — |
| REQ-020 | StreamProvider trait and ProviderError enum | 1 | 1.2 | [AR] | REQ-002, REQ-015 |
| REQ-021 | AgentTool trait | 1 | 1.2 | [AR] | REQ-009, REQ-010 |
| REQ-022 | InputFilter trait | 1 | 1.2 | [OV] | — |
| REQ-023 | CompactionStrategy trait | 1 | 1.2 | [AR] | REQ-003, REQ-011 |
| REQ-024 | Agent::new() with all field defaults | 1 | 1.3 | [PS] | REQ-011–017, REQ-019–020 |
| REQ-025 | Builder methods: system_prompt, model, api_key, etc. | 1 | 1.3 | [PS] | REQ-024 |
| REQ-026 | Builder methods: tools, context_config, limits, etc. | 1 | 1.3 | [PS] | REQ-024 |
| REQ-027 | Steering/follow-up queues as Arc<Mutex | 1 | 1.3 | [AR] | REQ-003, REQ-024 |
| REQ-028 | AgentContext struct | 1 | 1.4 | [AR] | REQ-003, REQ-021 |
| REQ-029 | AgentLoopConfig struct | 1 | 1.4 | [OV] | REQ-011–017, REQ-023 |
| REQ-030 | MockProvider implementation | 1 | 1.5 | [AR] | REQ-020 |
| REQ-031 | Smoke test: Agent constructs without error | 1 | 1.5 | [OV] | REQ-024–030 |
| REQ-032 | Unbounded async event channel | 2 | 2.1 | [AR] | REQ-007 |
| REQ-033 | CancellationToken with child_token propagation | 2 | 2.1 | [AR] | — |
| REQ-034 | Agent::prompt() entry point | 2 | 2.2 | [PS] | REQ-002, REQ-035 |
| REQ-035 | Agent::prompt_messages_with_sender() | 2 | 2.2 | [PS] | REQ-027–029, REQ-033, REQ-036 |
| REQ-036 | agent_loop() implementation | 2 | 2.3 | [PS] | REQ-032, REQ-037 |
| REQ-037 | agent_loop_continue() implementation | 2 | 2.3 | [PS] | REQ-036 |
| REQ-038 | run_loop() inner loop (happy path) | 2 | 2.3 | [PS] | REQ-039, REQ-045, REQ-060 |
| REQ-039 | stream_assistant_response() (no retry) | 2 | 2.4 | [PS] | REQ-007–008, REQ-015, REQ-020, REQ-032 |
| REQ-040 | AnthropicProvider::stream() | 2 | 2.4 | [AR] | REQ-020, REQ-039 |
| REQ-041 | OpenAiCompatProvider::stream() | 2 | 2.4 | [AR] | REQ-020, REQ-039 |
| REQ-042 | ProviderRegistry with default() | 2 | 2.4 | [AR] | REQ-040, REQ-041 |
| REQ-043 | StopReason determination in providers | 2 | 2.4 | [PS] | REQ-005, REQ-040–041 |
| REQ-044 | Filter Extension messages before LLM call | 2 | 2.4 | [AR] | REQ-003, REQ-015 |
| REQ-045 | execute_tool_calls() (Parallel dispatch) | 2 | 2.5 | [PS] | REQ-046 |
| REQ-046 | execute_single_tool() | 2 | 2.5 | [PS] | REQ-007, REQ-009–010, REQ-021, REQ-033 |
| REQ-047 | BashTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-048 | ReadFileTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-049 | WriteFileTool::execute() | 2 | 2.5 | [AR] | REQ-010, REQ-021 |
| REQ-050 | EditFileTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-051 | ListFilesTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-052 | SearchTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-053 | default_tools() returning all 6 tools | 2 | 2.5 | [AR] | REQ-047–052 |
| REQ-054 | estimate_tokens() heuristic | 2 | 2.6 | [PS] | — |
| REQ-055 | content_tokens() and message_tokens() | 2 | 2.6 | [PS] | REQ-001, REQ-003, REQ-054 |
| REQ-056 | compact_messages() 3-tier cascade | 2 | 2.6 | [PS] | REQ-055, REQ-057–059 |
| REQ-057 | level1_truncate_tool_outputs() | 2 | 2.6 | [PS] | REQ-003, REQ-054 |
| REQ-058 | level2_summarize_old_turns() | 2 | 2.6 | [PS] | REQ-003, REQ-054 |
| REQ-059 | level3_drop_middle() and keep_within_budget() | 2 | 2.6 | [PS] | REQ-003, REQ-054 |
| REQ-060 | Integrate compaction in run_loop | 2 | 2.6 | [PS] | REQ-038, REQ-056 |
| REQ-061 | ExecutionTracker::record_turn() and check_limits() | 2 | 2.7 | [AR] | REQ-012 |
| REQ-062 | Execution limit enforcement in run_loop | 2 | 2.7 | [PS] | REQ-038, REQ-061 |
| REQ-063 | Agent::save_messages() | 2 | 2.8 | [OV] | REQ-018 |
| REQ-064 | Agent::restore_messages() | 2 | 2.8 | [OV] | REQ-018, REQ-063 |
| REQ-065 | Agent::reset() | 2 | 2.8 | [AR] | REQ-033 |
| REQ-066 | Agent::steer() and Agent::follow_up() | 2 | 2.8 | [AR] | REQ-027 |
| REQ-067 | Agent::abort() | 2 | 2.8 | [AR] | REQ-033, REQ-035 |
| REQ-068 | Input filter chain execution | 3 | 3.1 | [PS] | REQ-022, REQ-036 |
| REQ-069 | Reject → emit InputRejected + AgentEnd([]) | 3 | 3.1 | [PS] | REQ-068 |
| REQ-070 | Warn → append warning text to last user message | 3 | 3.1 | [PS] | REQ-068 |
| REQ-071 | delay_for_attempt() exponential backoff with jitter | 3 | 3.2 | [PS] | REQ-013 |
| REQ-072 | is_retryable() on ProviderError | 3 | 3.2 | [AR] | REQ-020 |
| REQ-073 | retry_after() on ProviderError | 3 | 3.2 | [AR] | REQ-020 |
| REQ-074 | Retry loop in stream_assistant_response | 3 | 3.2 | [PS] | REQ-039, REQ-071–073 |
| REQ-075 | ProviderError::classify() HTTP status routing | 3 | 3.3 | [PS] | REQ-020 |
| REQ-076 | is_context_overflow() phrase matching | 3 | 3.3 | [PS] | — |
| REQ-077 | Context overflow recovery trigger | 3 | 3.3 | [AR] | REQ-056, REQ-075–076 |
| REQ-078 | ToolError::Failed/InvalidArgs → error ToolResult | 3 | 3.4 | [AR] | REQ-010, REQ-046 |
| REQ-079 | ToolError::NotFound → "Tool X not found" | 3 | 3.4 | [PS] | REQ-046 |
| REQ-080 | ToolError::Cancelled → "Skipped" ToolResult | 3 | 3.4 | [AR] | REQ-010, REQ-046 |
| REQ-081 | Error stop reason handling in run_loop | 3 | 3.5 | [PS] | REQ-038, REQ-082 |
| REQ-082 | Aborted stop reason handling in run_loop | 3 | 3.5 | [PS] | REQ-038 |
| REQ-083 | Synthetic error Message::Assistant on provider failure | 3 | 3.5 | [PS] | REQ-002, REQ-039 |
| REQ-084 | execute_sequential() with steering check | 3 | 3.6 | [PS] | REQ-046, REQ-080 |
| REQ-085 | execute_batch() (Parallel) with post-batch steering | 3 | 3.6 | [PS] | REQ-046 |
| REQ-086 | Batched { size } dispatch with inter-batch steering | 3 | 3.6 | [PS] | REQ-085 |
| REQ-087 | Drain steering queue at start of outer loop | 3 | 3.7 | [PS] | REQ-038 |
| REQ-088 | Inject steering messages into pending after tools | 3 | 3.7 | [PS] | REQ-038, REQ-084–085 |
| REQ-089 | Follow-up queue check re-enters outer loop | 3 | 3.7 | [PS] | REQ-038 |
| REQ-090 | QueueMode::OneAtATime and QueueMode::All | 3 | 3.7 | [AR] | REQ-017, REQ-027 |
| REQ-091 | before_turn callback with abort-if-false | 3 | 3.8 | [PS] | REQ-038 |
| REQ-092 | after_turn callback on every turn | 3 | 3.8 | [PS] | REQ-038 |
| REQ-093 | on_error callback on Error stop reason | 3 | 3.8 | [PS] | REQ-081 |
| REQ-094 | BashTool deny patterns | 3 | 3.9 | [PS] | REQ-047 |
| REQ-095 | BashTool timeout + cancellation race | 3 | 3.9 | [PS] | REQ-047 |
| REQ-096 | BashTool output truncation | 3 | 3.9 | [PS] | REQ-047 |
| REQ-097 | BashTool confirm_fn callback | 3 | 3.9 | [PS] | REQ-047 |
| REQ-098 | ReadFileTool size limits (1MB text, 20MB image) | 3 | 3.9 | [PS] | REQ-048 |
| REQ-099 | ReadFileTool image path (base64, MIME detection) | 3 | 3.9 | [PS] | REQ-001, REQ-048 |
| REQ-100 | ReadFileTool cancellation check | 3 | 3.9 | [PS] | REQ-048 |
| REQ-101 | EditFileTool zero-match error with fuzzy hint | 3 | 3.9 | [PS] | REQ-050 |
| REQ-102 | EditFileTool multiple-match error | 3 | 3.9 | [PS] | REQ-050 |
| REQ-103 | EditFileTool cancellation check | 3 | 3.9 | [PS] | REQ-050 |
| REQ-104 | WriteFileTool cancellation check | 3 | 3.9 | [AR] | REQ-049 |
| REQ-105 | ListFilesTool timeout + max_results truncation | 3 | 3.9 | [PS] | REQ-051 |
| REQ-106 | SearchTool rg→grep fallback + cancellation | 3 | 3.9 | [PS] | REQ-052 |
| REQ-107 | is_streaming guard in prompt_messages_with_sender | 3 | 3.10 | [PS] | REQ-035 |
| REQ-108 | agent_loop_continue precondition validation | 3 | 3.10 | [PS] | REQ-037 |
| REQ-109 | SkillSet::load() with collision handling | 3 | 3.11 | [PS] | REQ-110 |
| REQ-110 | parse_frontmatter() with error variants | 3 | 3.11 | [PS] | — |
| REQ-111 | SkillSet::format_for_prompt() XML output | 3 | 3.11 | [PS] | REQ-109 |
| REQ-112 | SkillSet::load_dir() and SkillSet::merge() | 3 | 3.11 | [AR] | REQ-109 |
| REQ-113 | Agent::with_skills() builder | 3 | 3.11 | [PS] | REQ-111 |
| REQ-114 | McpClient::connect_stdio() with handshake | 3 | 3.12 | [PS] | REQ-115, REQ-116 |
| REQ-115 | McpClient::send_request() JSON-RPC 2.0 | 3 | 3.12 | [PS] | — |
| REQ-116 | McpClient::list_tools() and call_tool() | 3 | 3.12 | [PS] | REQ-115 |
| REQ-117 | McpToolAdapter implementing AgentTool | 3 | 3.12 | [AR] | REQ-001, REQ-021, REQ-116 |
| REQ-118 | All McpError variants → ToolError::Failed | 3 | 3.12 | [AR] | REQ-117 |
| REQ-119 | Agent::with_mcp_server_stdio() builder | 3 | 3.12 | [AR] | REQ-114, REQ-117 |
| REQ-120 | GoogleProvider::stream() (Gemini API) | 4 | 4.1 | [AR] | REQ-020 |
| REQ-121 | GoogleVertexProvider::stream() (Vertex AI) | 4 | 4.1 | [AR] | REQ-120 |
| REQ-122 | BedrockProvider::stream() (ConverseStream) | 4 | 4.1 | [AR] | REQ-020 |
| REQ-123 | OpenAiResponsesProvider::stream() | 4 | 4.1 | [AR] | REQ-020 |
| REQ-124 | AzureOpenAiProvider::stream() | 4 | 4.1 | [AR] | REQ-123 |
| REQ-125 | All 7 providers in ProviderRegistry::default() | 4 | 4.1 | [AR] | REQ-042, REQ-120–124 |
| REQ-126 | CacheStrategy::Auto breakpoint placement | 4 | 4.2 | [AR] | REQ-014, REQ-040 |
| REQ-127 | CacheStrategy::Manual and Disabled | 4 | 4.2 | [AR] | REQ-126 |
| REQ-128 | Cache token counts in Usage | 4 | 4.2 | [AR] | REQ-006, REQ-040 |
| REQ-129 | ThinkingLevel → Anthropic thinking parameter | 4 | 4.3 | [AR] | REQ-019, REQ-040 |
| REQ-130 | ThinkingLevel → OpenAI reasoning_effort | 4 | 4.3 | [AR] | REQ-019, REQ-041 |
| REQ-131 | Parse Thinking content from streaming responses | 4 | 4.3 | [AR] | REQ-001, REQ-008, REQ-040 |
| REQ-132 | McpClient::connect_http() | 4 | 4.4 | [AR] | REQ-115 |
| REQ-133 | Agent::with_mcp_server_http() with prefix support | 4 | 4.4 | [AR] | REQ-117, REQ-132 |
| REQ-134 | MCP stdio shutdown (EOF + kill) | 4 | 4.4 | [AR] | REQ-114 |
| REQ-135 | Structured retry logging | 4 | 4.5 | [PS] | REQ-074 |
| REQ-136 | ContextTracker hybrid token tracking | 4 | 4.5 | [AR] | REQ-054–055 |
| REQ-137 | ToolResult.details per-tool metadata | 4 | 4.5 | [AR] | REQ-047–052 |
| REQ-138 | OpenApiAuth credential redaction in debug | 4 | 4.6 | [AR] | — |
| REQ-139 | BashTool default deny-pattern list | 4 | 4.6 | [PS] | REQ-094 |
| REQ-140 | CancellationToken::child_token() propagation | 4 | 4.7 | [PS] | REQ-033, REQ-046 |
| REQ-141 | Sub-agent inherits parent cancel token | 4 | 4.7 | [PS] | REQ-033, REQ-140 |
| REQ-142 | on_update callback → ToolExecutionUpdate event | 4 | 4.8 | [AR] | REQ-007, REQ-046 |
| REQ-143 | on_progress callback → ProgressMessage event | 4 | 4.8 | [AR] | REQ-007, REQ-046 |
| REQ-144 | Agent::prompt_with_sender() | 4 | 4.8 | [AR] | REQ-034 |
| REQ-145 | transform_context/convert_to_llm hooks | 4 | 4.8 | [PS] | REQ-039 |
| REQ-146 | Agent::with_compaction_strategy() builder | 4 | 4.8 | [AR] | REQ-023, REQ-060 |
| REQ-147 | ModelConfig struct and application in OpenAiCompat | 4 | 4.8 | [AR] | REQ-041 |
| REQ-148 | SubAgentTool::execute() | 5 | 5.1 | [PS] | REQ-036, REQ-157 |
| REQ-149 | extract_final_text() | 5 | 5.1 | [PS] | REQ-002 |
| REQ-150 | Sub-agent event forwarding to parent channel | 5 | 5.1 | [PS] | REQ-007, REQ-148 |
| REQ-151 | SubAgentTool builder API | 5 | 5.1 | [AR] | REQ-021, REQ-148 |
| REQ-152 | OpenApiAdapter::from_str() JSON/YAML parsing | 5 | 5.2 | [AR] | REQ-153–156 |
| REQ-153 | OpenAPI parameter classification | 5 | 5.2 | [AR] | REQ-021 |
| REQ-154 | OpenAPI HTTP execution pipeline | 5 | 5.2 | [AR] | REQ-021 |
| REQ-155 | OperationFilter variants | 5 | 5.2 | [AR] | REQ-152 |
| REQ-156 | name_prefix tool naming | 5 | 5.2 | [AR] | REQ-152 |
| REQ-157 | from_file() and from_url() spec sources | 5 | 5.2 | [AR] | REQ-152 |
| REQ-158 | OpenAPI builders on Agent + feature flag | 5 | 5.2 | [AR] | REQ-026, REQ-157 |
| REQ-159 | Anthropic OAuth auth path | 5 | 5.3 | [AR] | REQ-040 |
| REQ-160 | Anthropic InputJsonDelta tool-arg streaming | 5 | 5.3 | [AR] | REQ-040 |
| REQ-161 | [AMBIGUOUS] AgentEnd on abort policy | 5 | 5.4 | [PS] | REQ-067, REQ-082 |
| REQ-162 | [AMBIGUOUS] TokenCounter abstraction point | 5 | 5.4 | [OV] | REQ-054 |
| REQ-163 | [AMBIGUOUS] Sub-agent error propagation policy | 5 | 5.4 | [PS] | REQ-149 |
| REQ-164 | Compaction algorithm unit tests | 6 | 6.1 | [AR] | REQ-056–059 |
| REQ-165 | Property-based tests: budget invariant | 6 | 6.1 | [AR] | REQ-056 |
| REQ-166 | Retry backoff unit tests | 6 | 6.1 | [AR] | REQ-071 |
| REQ-167 | Provider integration tests (mock HTTP server) | 6 | 6.1 | [AR] | REQ-040–042, REQ-120–124 |
| REQ-168 | MCP stdio integration test | 6 | 6.1 | [AR] | REQ-114–119 |
| REQ-169 | End-to-end agent loop tests (MockProvider) | 6 | 6.1 | [AR] | REQ-036–090 |
| REQ-170 | Load test: 100 parallel agents, 10 concurrent tools | 6 | 6.2 | [AR] | REQ-045, REQ-085 |
| REQ-171 | Load test: 1,000-turn single agent with compaction | 6 | 6.2 | [AR] | REQ-056, REQ-060 |
| REQ-172 | Memory profile: message growth is bounded | 6 | 6.2 | [AR] | REQ-056, REQ-060 |
| REQ-173 | Public API reference documentation | 6 | 6.3 | [OV] | REQ-001–163 |
| REQ-174 | Provider integration contract documentation | 6 | 6.3 | [AR] | REQ-040–042, REQ-120–124 |
| REQ-175 | Working example implementations | 6 | 6.3 | [OV] | REQ-053, REQ-148 |
| REQ-176 | AgentSkills + MCP integration guides | 6 | 6.3 | [OV] | REQ-109–119 |
| REQ-177 | Library packaging with feature flags | 6 | 6.4 | [AR] | REQ-158 |
| REQ-178 | CI pipeline with gated live tests | 6 | 6.4 | [AR] | REQ-164–169 |
| REQ-179 | Operational runbooks | 6 | 6.4 | [AR] | REQ-071–077 |
| REQ-180 | ContinuationKind enum (Default, Rerun { tag }, Branch { tag }) | 4 | 4.9 | [AR] | — |
| REQ-181 | TurnTrigger enum (User, Continuation, SubAgent, Branch) | 4 | 4.9 | [AR] | — |
| REQ-182 | before_loop/after_loop hooks on AgentLoopConfig | 4 | 4.9 | [AR] | REQ-029, REQ-036 |
| REQ-183 | before_tool_execution/after_tool_execution hooks on AgentLoopConfig | 4 | 4.9 | [AR] | REQ-029, REQ-046 |
| REQ-184 | before_tool_execution_update/after_tool_execution_update hooks | 4 | 4.9 | [AR] | REQ-142, REQ-183 |
| REQ-185 | Guaranteed event hook ordering invariant | 4 | 4.9 | [AR] | REQ-182–184, REQ-091–092 |
| REQ-186 | provider_id() -> &str required method on StreamProvider; implement in all 7 providers | 4 | 4.9 | [AR] | REQ-020, REQ-125 |
| REQ-187 | config_id: Option<String> on AgentLoopConfig; auto-derived when None | 4 | 4.9 | [AR] | REQ-029, REQ-186 |
| REQ-188 | agent_id/session_id UUID fields on Agent; stable for Agent lifetime | 4 | 4.9 | [AR] | REQ-024 |
| REQ-189 | loop_counters and last_loop_id on Agent; next_loop_id() helper | 4 | 4.9 | [AR] | REQ-024, REQ-187, REQ-188 |
| REQ-190 | agent_id, session_id, loop_id, parent_loop_id, continuation_kind on AgentContext; write-back in agent_loop | 4 | 4.9 | [AR] | REQ-028, REQ-180, REQ-188 |
| REQ-191 | Assert agent_id/session_id are Some in agent_loop_continue | 4 | 4.9 | [AR] | REQ-037, REQ-190 |
| REQ-192 | AgentStart event: agent_id, session_id, loop_id, parent_loop_id, continuation_kind fields | 4 | 4.9 | [AR] | REQ-007, REQ-180, REQ-190 |
| REQ-193 | TurnStart.triggered_by: TurnTrigger; Branch continuation uses Branch on first turn | 4 | 4.9 | [AR] | REQ-007, REQ-181, REQ-190 |
| REQ-194 | child_loop_id: Option<String> on ToolResult and ToolExecutionEnd; set by sub-agent tools | 4 | 4.9 | [AR] | REQ-010, REQ-007, REQ-148 |
| REQ-195 | SubAgentTool::with_parent_loop_id(loop_id) builder; child AgentContext includes parent_loop_id | 4 | 4.9 | [AR] | REQ-151, REQ-190 |
Known Ambiguities
Items marked [AMBIGUOUS] in the spec that require a design decision
before implementation:
| ID | Description | Suggested Resolution | Level Introduced |
|---|---|---|---|
| AMB-001 | AgentEnd emission on abort — pseudocode says AgentEnd is NOT emitted on abort, but notes this may vary depending on where in the loop cancellation is detected (provider Start/Done events may still arrive). | Define a clear policy: AgentEnd is ALWAYS emitted when the loop exits, including on abort, so callers can rely on the channel always closing cleanly. Gate this by ensuring cancellation detection before the loop attempts to emit AgentEnd. | 5 |
| AMB-002 | Token counting precision — estimate_tokens uses a 4-chars-per-token heuristic explicitly noted as imprecise. No integration with tiktoken or similar is specified. | Introduce a TokenCounter trait (or function pointer) on ContextConfig that defaults to the 4-char heuristic but can be overridden by the caller. This keeps the default zero-dependency while enabling precision via injection. | 5 |
| AMB-003 | Sub-agent error propagation — when a child agent_loop produces only error or tool-only messages (no Text in the final assistant message), extract_final_text returns a fixed fallback string. It is unclear whether the calling tool should return Ok(ToolResult { fallback }) or Err(ToolError::Failed(...)). | Return Ok(ToolResult) with the fallback text always. If the sub-agent produced an error assistant message, include the error_message field in the fallback text so the parent LLM can see and react to it. | 5 |
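
The suggested resolution for AMB-002 can be sketched as follows. This is a minimal illustration, not the library's API: the `TokenCounter` trait shape, the `HeuristicCounter` and `WordCounter` types, and the `token_counter` field are assumptions, and `ContextConfig` here is a stand-in showing only that one field. Whether the real heuristic counts bytes or characters is not stated in the spec; the sketch assumes characters, rounded up.

```rust
/// Hypothetical trait: counts tokens for a piece of text.
pub trait TokenCounter: Send + Sync {
    fn count(&self, text: &str) -> usize;
}

/// Default counter: roughly 4 characters per token, rounded up (assumed rounding).
pub struct HeuristicCounter;

impl TokenCounter for HeuristicCounter {
    fn count(&self, text: &str) -> usize {
        // Count Unicode scalar values rather than bytes.
        (text.chars().count() + 3) / 4
    }
}

/// Stand-in for the real ContextConfig; only the injected counter is shown.
pub struct ContextConfig {
    pub token_counter: Box<dyn TokenCounter>,
}

impl Default for ContextConfig {
    fn default() -> Self {
        // Zero-dependency default, as the resolution proposes.
        Self { token_counter: Box::new(HeuristicCounter) }
    }
}

/// Illustrative override supplied by a caller: one token per whitespace-separated word.
pub struct WordCounter;

impl TokenCounter for WordCounter {
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}
```

A caller wanting tokenizer-accurate counts would implement the trait over its tokenizer of choice and pass it in the config, leaving the default path dependency-free.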
Level Completion Checklist
- Level 1 — Survive: All core types, traits, and the Agent struct initialize without error; smoke test passes.
- Level 2 — Useful: Text prompt → LLM call → tool execution → final response works end-to-end; all 6 built-in tools execute on valid input; message persistence round-trips correctly.
- Level 3 — Smart: Input filters, retry, provider error classification, tool errors, execution limits, steering/follow-up queues, lifecycle callbacks, tool safety guards, skill loading, and MCP client all handle their error paths without panicking.
- Level 4 — Professional: All 7 provider protocols implemented; prompt caching and extended thinking integrated; cancellation propagates to all I/O; structured logging in place; ContextTracker accurate.
- Level 5 — Creative: Sub-agent delegation works end-to-end; OpenAPI adapter generates callable tools; Anthropic OAuth and InputJsonDelta streaming are correct; all three ambiguities have documented resolutions and implementations.
- Level 6 — Boss: All test suites pass (unit, property-based, integration, end-to-end, load); public API docs and examples are complete; CI runs automatically; operational runbooks are written.
Session & Loop Identity — Future Scenarios
Added: 2026-03-22

Status: Foundation implemented (loop_id, ContinuationKind, parent_loop_id, child_loop_id). The scenarios below build on this foundation but are out of scope for the initial change.
The current implementation covers:
- loop_id derived from session_id + config_id + counter (config owns its identity)
- ContinuationKind enum: Default, Rerun { tag }, Branch { tag }
- parent_loop_id for ancestry tracking across reruns/branches
- child_loop_id on ToolExecutionEnd for parent→sub-agent traceability
- Asserts in agent_loop_continue requiring agent_id/session_id to be set
- TurnTrigger::Branch fires on first turn of a Branch continuation
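
The loop_id derivation listed above can be sketched as a per-config counter keyed by config_id. This is an assumption-laden illustration: the `LoopIds` struct and the `session:config:counter` string format are hypothetical, and only the field names loop_counters and last_loop_id and the helper name next_loop_id come from the requirements table.

```rust
use std::collections::HashMap;

/// Hypothetical holder for the identity state the Agent is described as owning.
pub struct LoopIds {
    session_id: String,
    loop_counters: HashMap<String, u64>,
    last_loop_id: Option<String>,
}

impl LoopIds {
    pub fn new(session_id: &str) -> Self {
        Self {
            session_id: session_id.to_string(),
            loop_counters: HashMap::new(),
            last_loop_id: None,
        }
    }

    /// Derives the next loop_id from session_id + config_id + counter.
    /// The ':' separator is an assumption; the source does not specify a format.
    pub fn next_loop_id(&mut self, config_id: &str) -> String {
        // Each config_id keeps its own counter, so a config owns its identity.
        let counter = self.loop_counters.entry(config_id.to_string()).or_insert(0);
        *counter += 1;
        let loop_id = format!("{}:{}:{}", self.session_id, config_id, *counter);
        self.last_loop_id = Some(loop_id.clone());
        loop_id
    }

    /// The most recently issued loop_id, usable as parent_loop_id for a rerun or branch.
    pub fn last_loop_id(&self) -> Option<&str> {
        self.last_loop_id.as_deref()
    }
}
```

Keying the counter by config_id means two configs running in the same session never collide, while repeated runs of one config yield an ordered sequence.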
Future: HITL Resume
Scenario: User cancels a loop mid-execution (via Agent::abort()), reviews the partial
output, then resumes. The loop was aborted at some known message boundary.
Mechanism: Caller restores context.messages to the desired resume point, then calls
agent_loop_continue(Rerun | Branch). The kind communicates intent:
- Rerun: resume from the same point (same logical path, treat as a retry)
- Branch: resume with modifications (e.g., injected steering message, different system prompt, tweaked tool result), a diverging path from the original
What needs to be built: A checkpoint API for context.messages. The current Agent::messages()
getter returns only a slice, but the save_messages / restore_messages methods on Agent already
let the caller snapshot and restore the message list (JSON round-trip). The missing piece
is a higher-level Agent::checkpoint() -> Checkpoint and Agent::restore(checkpoint) pair that
bundles the full state (messages + loop_id + session_id), so HITL resume works without manual
field management.
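
One possible shape for the proposed checkpoint/restore pair is sketched below. Nothing here exists in the library yet: `Checkpoint`, its fields, and the `Agent` and `Message` stand-ins (including the `new` and `push` helpers) are hypothetical and reduced to what the example needs.

```rust
/// Stand-in for the library's message type.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub text: String,
}

/// Hypothetical bundle of the state needed for a HITL resume.
#[derive(Debug, Clone, PartialEq)]
pub struct Checkpoint {
    pub messages: Vec<Message>,
    pub session_id: String,
    pub loop_id: Option<String>,
}

/// Minimal stand-in for Agent with only the fields a checkpoint touches.
pub struct Agent {
    messages: Vec<Message>,
    session_id: String,
    last_loop_id: Option<String>,
}

impl Agent {
    pub fn new(session_id: &str) -> Self {
        Self { messages: Vec::new(), session_id: session_id.to_string(), last_loop_id: None }
    }

    pub fn push(&mut self, role: &str, text: &str) {
        self.messages.push(Message { role: role.to_string(), text: text.to_string() });
    }

    pub fn messages(&self) -> &[Message] {
        &self.messages
    }

    /// Captures messages and identity together so the caller manages no fields by hand.
    pub fn checkpoint(&self) -> Checkpoint {
        Checkpoint {
            messages: self.messages.clone(),
            session_id: self.session_id.clone(),
            loop_id: self.last_loop_id.clone(),
        }
    }

    /// Rolls the agent back to a previously captured state.
    pub fn restore(&mut self, checkpoint: Checkpoint) {
        self.messages = checkpoint.messages;
        self.session_id = checkpoint.session_id;
        self.last_loop_id = checkpoint.loop_id;
    }
}
```

After an abort, the caller would restore the checkpoint taken at the chosen message boundary and then continue with Rerun or Branch as described above.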
Future: Checkpoint Restore
Scenario: Context is serialized to persistent storage (database, file) and later loaded for a new run — either by the same process after restart or by a different process instance.
Mechanism: Same as HITL resume at the loop level. The caller deserializes context.messages
and sets the identity fields (agent_id, session_id, loop_id) to their original values, then
calls agent_loop_continue(Branch). The parent_loop_id points to the last loop ID from the
original session, maintaining the ancestry chain across process boundaries.
What needs to be built: A serializable AgentSnapshot type that captures everything needed
to resume (messages, agent_id, session_id, last_loop_id, and any relevant config fields), plus
AgentSnapshot::save(path) / AgentSnapshot::load(path) convenience methods. The snapshot does
NOT include the provider config (API keys, base URLs); those remain in the caller's environment.
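
The restore mechanism can be illustrated by turning a loaded snapshot into a continuation context. This is a sketch under assumptions: the struct layouts, the `String` type for `tag`, and the `context_from_snapshot` helper are hypothetical, serialization is omitted entirely, and the new loop's own loop_id is left for the loop to write back.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ContinuationKind {
    Default,
    Rerun { tag: String },
    Branch { tag: String },
}

/// Hypothetical snapshot: state only, no provider config (API keys, base URLs).
pub struct AgentSnapshot {
    pub messages: Vec<String>,
    pub agent_id: String,
    pub session_id: String,
    pub last_loop_id: Option<String>,
}

/// Stand-in for AgentContext with the identity fields relevant to a restore.
pub struct AgentContext {
    pub messages: Vec<String>,
    pub agent_id: Option<String>,
    pub session_id: Option<String>,
    pub parent_loop_id: Option<String>,
    pub continuation_kind: ContinuationKind,
}

/// Builds the context for a Branch continuation from a loaded snapshot.
pub fn context_from_snapshot(snapshot: AgentSnapshot, tag: &str) -> AgentContext {
    AgentContext {
        messages: snapshot.messages,
        // Identity is restored to the original values, so the asserts in
        // agent_loop_continue (agent_id/session_id must be set) would pass.
        agent_id: Some(snapshot.agent_id),
        session_id: Some(snapshot.session_id),
        // The last loop of the original session becomes the parent,
        // which keeps the ancestry chain intact across processes.
        parent_loop_id: snapshot.last_loop_id,
        continuation_kind: ContinuationKind::Branch { tag: tag.to_string() },
    }
}
```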
Future: Parallel Exploration
Scenario: Multiple branches from the same checkpoint are run concurrently — e.g., A/B testing two different tool result injections, or evaluating three different system prompt variants on the same conversation prefix.
Mechanism: The caller snapshots the context at a branching point, then calls multiple
agent_loop_continue(Branch) concurrently, each with a different modification to context.messages
before the call. Each concurrent call produces an independent event stream with its own loop_id
and parent_loop_id pointing to the same branch-point loop.
What needs to be built: No new primitives are needed — agent_loop_continue and AgentContext
already support this. The caller is responsible for cloning the context and making independent calls.
A higher-level Agent::explore_branches(Vec<BranchSpec>) -> Vec<Receiver<AgentEvent>> convenience
method could simplify the pattern but is not required for correctness.
Concurrency note: Each branch needs its own AgentContext (owned), its own CancellationToken,
and its own mpsc::UnboundedSender. tokio::spawn each agent_loop_continue call independently.
The parent task collects results from all branch receivers.
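
The fan-out pattern can be sketched without any dependencies. To keep the example self-contained it substitutes std::thread::spawn and std::sync::mpsc for tokio::spawn and the unbounded channel, and omits the per-branch CancellationToken. `run_branch` is a placeholder for agent_loop_continue, and `BranchEvent`, `explore_branches`, and the simplified `AgentContext` are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Simplified stand-in for AgentContext; each branch owns its own clone.
#[derive(Debug, Clone)]
pub struct AgentContext {
    pub messages: Vec<String>,
    pub parent_loop_id: Option<String>,
}

/// Hypothetical event type standing in for AgentEvent.
#[derive(Debug, PartialEq)]
pub struct BranchEvent {
    pub parent_loop_id: Option<String>,
    pub last_message: String,
}

/// Placeholder for agent_loop_continue: emits one event and returns.
fn run_branch(ctx: AgentContext, tx: mpsc::Sender<BranchEvent>) {
    let event = BranchEvent {
        parent_loop_id: ctx.parent_loop_id.clone(),
        last_message: ctx.messages.last().cloned().unwrap_or_default(),
    };
    // The receiver may already be gone; a failed send is ignored here.
    let _ = tx.send(event);
}

/// Runs one branch per variant and collects events in variant order.
pub fn explore_branches(base: &AgentContext, variants: Vec<String>) -> Vec<BranchEvent> {
    let mut handles = Vec::new();
    let mut receivers = Vec::new();
    for variant in variants {
        // Clone the shared prefix, then apply this branch's modification.
        let mut ctx = base.clone();
        ctx.messages.push(variant);
        // One channel per branch keeps the event streams independent.
        let (tx, rx) = mpsc::channel();
        handles.push(thread::spawn(move || run_branch(ctx, tx)));
        receivers.push(rx);
    }
    // Each receiver drains until its sender is dropped at the end of its branch.
    let events: Vec<BranchEvent> = receivers.into_iter().flatten().collect();
    for handle in handles {
        handle.join().expect("branch thread panicked");
    }
    events
}
```

Because every branch starts from a clone, the base context is left untouched and all branches report the same parent_loop_id.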
Future: Auto Origin/Continue Selection
Scenario: The caller wants to send a new message to the agent without knowing whether the
current context requires an origin call (agent_loop) or a continuation (agent_loop_continue).
Mechanism: Inspect context.messages.last():
- No messages → agent_loop (fresh start)
- Last message is User or ToolResult → agent_loop_continue (already awaiting model response)
- Last message is Assistant → agent_loop with new prompt (start new turn)
What needs to be built: An Agent::send(message) method (or similar) that encapsulates
this logic. It would inspect the context state, build the appropriate call type, and dispatch.
This trades explicit caller control for convenience and is opt-in.
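
The selection rule reduces to a single match on the last message. The sketch below assumes a simplified `Message` enum and a hypothetical `Dispatch` result type; a real Agent::send would go on to invoke agent_loop or agent_loop_continue rather than return a tag.

```rust
/// Simplified stand-in for the library's message type.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    User(String),
    Assistant(String),
    ToolResult(String),
}

/// Hypothetical tag for which entry point the caller should use.
#[derive(Debug, PartialEq)]
pub enum Dispatch {
    /// Call agent_loop (fresh start, or a new turn after an assistant reply).
    Origin,
    /// Call agent_loop_continue (the context already awaits a model response).
    Continue,
}

/// Chooses the entry point by inspecting the last message in the context.
pub fn select_dispatch(messages: &[Message]) -> Dispatch {
    match messages.last() {
        None => Dispatch::Origin,
        Some(Message::User(_)) | Some(Message::ToolResult(_)) => Dispatch::Continue,
        Some(Message::Assistant(_)) => Dispatch::Origin,
    }
}
```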