phi-core — System Architecture

1. Component Map

Agent trait + BasicAgent (src/agents/)

Responsibility: Agent (trait, agents/agent.rs) defines the runtime interface — prompting, state access, control, and steering queues. BasicAgent (struct, agents/basic_agent.rs) is the default in-memory implementation: owns the conversation, tools, and ModelConfig (provider identity), and is the application-facing entry point. Construction: BasicAgent::new(ModelConfig::anthropic(...)). The optional provider_override field bypasses ProviderRegistry for custom or test providers. SubAgentTool (agents/sub_agent.rs) implements AgentTool to delegate tasks to a child agent_loop(). Public interface:

  • prompt(text) — Send a text prompt; returns an event stream receiver.
  • prompt_messages(messages) — Send one or more messages as a prompt; returns an event stream receiver.
  • prompt_with_sender(text, tx) — Send a text prompt, streaming events to a caller-provided sender.
  • continue_loop() — Resume from existing context with ContinuationKind::Default; returns an event stream receiver.
  • continue_loop_with_sender(tx, kind) — Resume with an explicit ContinuationKind (Default, Rerun { tag }, or Branch { tag }), streaming events to a caller-provided sender.
  • steer(msg) — Queue a message that will be injected mid-run between tool executions.
  • follow_up(msg) — Queue a message to be processed after the agent would otherwise stop.
  • abort() — Cancel the in-progress run by signalling the cancellation token.
  • reset() — Clear messages, queues, streaming state, and cancel token to return the agent to its initial state.
  • save_messages() — Serialize the current conversation to a JSON string.
  • restore_messages(json) — Replace the current conversation with messages deserialized from a JSON string.
  • with_skills(skill_set) — Load skills and append their XML index to the system prompt per the AgentSkills standard.
  • with_mcp_server_stdio(cmd, args, env) — Connect to an MCP server by spawning a child process and add its tools to the agent.
  • with_mcp_server_http(url) — Connect to an MCP server via HTTP and add its tools to the agent.
  • with_openapi_file(path, config, filter) — Load tools from an OpenAPI spec file and add them to the agent.
  • with_openapi_url(url, config, filter) — Fetch an OpenAPI spec from a URL and add its tools to the agent.
  • with_openapi_spec(spec_str, config, filter) — Parse an OpenAPI spec string (JSON or YAML) and add its tools to the agent.
  • new_session() — Immediately rotate to a new session_id; resets loop counters and last_loop_id; returns the new session id.
  • check_and_rotate(threshold) — Rotate to a new session if the agent has been idle longer than threshold since the last prompt_* call; returns Some(new_session_id) on rotation, None otherwise.

BasicAgent state relevant to session management:

FieldTypeDescription
agent_idStringStable identifier across all sessions for this instance
session_idStringCurrent session identifier; updated by new_session()
loop_countersHashMap<String, usize>Per-config loop counter; cleared on new_session()
last_loop_idOption<String>Most recent loop; cleared on new_session()
last_active_atOption<DateTime<Utc>>Timestamp of last prompt_* call; used by check_and_rotate()

AgentLoop (src/agent_loop/)

Responsibility: The core execution engine. Manages the turn loop, tool dispatch, steering injection, follow-up processing, and lifecycle event emission. Public interface:

  • agent_loop(prompts, context, config, tx, cancel) — Start an agent run from new prompt messages, applying input filters, emitting lifecycle events, and returning all new messages produced.
  • agent_loop_continue(context, config, tx, cancel) — Resume from existing context (no new prompts); used for retries after errors or mid-conversation continuation.
  • agent_loop_parallel(prompts, base_context, configs, strategy, tx, cancel) -> ParallelLoopResult — Run N AgentLoopConfigs concurrently and evaluate results via EvaluationStrategy. When prompts is non-empty, each branch uses agent_loop; when prompts is empty, each branch uses agent_loop_continue (the user query is already the last message in base_context). base_context is cloned per branch (tools Arc-shared; message history deep-copied). All branches share the same session_id; each gets a distinct loop_id. ParallelLoopOutcome.original_context_len marks the base/branch message boundary. Emits ParallelLoopStart/ParallelLoopEnd events. selected_context feeds into agent_loop_continue() for normal session resumption.
  • derive_config_segment(config) -> String (pub crate) — Derives the stable {config_segment} portion of a loop_id from config.config_id or provider/model/thinking fields.

EvaluationLoop (src/agent_loop/evaluation.rs)

Responsibility: Pluggable strategy for selecting among parallel loop outcomes. Decoupled from src/agent_loop/ to allow custom implementations without a circular dependency (trait is defined in src/types/; implementations live here). Public interface:

  • EvaluationStrategy (trait, defined in src/types/)evaluate(prompts, outcomes, tx, cancel) -> (EvaluationDecision, Usage)
  • EvaluationDecision (enum, defined in src/types/)Select(usize) — 0-based index of the winning outcome.
  • ParallelLoopOutcome.original_context_len: usize — Number of messages in the cloned context at dispatch time. Allows strategies to split "original context" from "new branch output" messages without separate bookkeeping. Identical across all outcomes (same base context); outcomes[0] is the idiomatic source.
  • TransparentEvaluation — Single-branch pass-through; panics if > 1 outcome.
  • PickFirstEvaluation — Always selects index 0. Useful for testing.
  • TokenEfficientEvaluation — Selects the outcome with the lowest total token usage.
  • ElaborateEvaluation — Selects the outcome with the highest total token usage.
  • LlmJudgeEvaluation { judge_config, system_prompt } — Runs a separate LLM call to select the best branch. Supports both agent_loop mode (query from prompts) and agent_loop_continue mode (query extracted from last Message::User in context.messages[..original_context_len]). Includes prior conversation context in the judge prompt. Applies 2-iteration compaction: Iteration 1 compacts only prior context (3 tiers: tail-truncate → paragraph-summary → hard char limit), keeping outputs intact; Iteration 2 (if needed) compacts both context and outputs independently through the same tier pipeline. Budget derived from judge_config.context_config.max_context_tokens. Emits a ProgressMessage warning if comprehension criteria cannot be satisfied after iteration 2.

ContextManager (src/context/)

Responsibility: Token estimation, tiered context compaction, and execution limit tracking. Public interface:

  • estimate_tokens(text) — Rough token count heuristic: ~4 characters per token.
  • compact_messages(messages, config) — Reduce message list to fit token budget using a tiered strategy: truncate tool outputs → summarize old turns → drop middle messages.
  • CompactionStrategy (trait) — Interface for custom compaction logic; default implementation uses the tiered cascade (legacy compact_messages(); modern: CompactionBlock overlays).
  • ContextTracker — Tracks context window usage by combining provider-reported token counts with local estimates for recent messages.
  • ExecutionTracker — Tracks turns, cumulative tokens, and elapsed time against configured limits; signals when any limit is exceeded.
  • ContextConfig — Tuning knobs for compaction: token budget, system-prompt overhead, head/tail message preservation counts, per-tool-output line limit.
  • ExecutionLimits — Hard caps on agent execution: max turns, max total tokens, max wall-clock duration.

ProviderRegistry (src/provider/registry.rs, src/provider/mod.rs)

Responsibility: Dispatches StreamConfig to the correct provider implementation based on model_config.api: ApiProtocol. Built inline per agent_loop() call; zero allocation for a registry with all built-in providers pre-registered. Public interface:

  • ProviderRegistry::default() — Pre-registers all 7 built-in providers; used automatically by agent_loop() when AgentLoopConfig.provider_override is None.
  • ProviderRegistry::new() — Create an empty registry for custom provider sets.
  • Provider resolution: model_config.api selects the wire-protocol handler; model_config fields (id, api_key, base_url, compat, etc.) differentiate services within the same protocol.

StreamProvider implementations (src/provider/)

Responsibility: Translate the unified StreamConfig into provider-specific HTTP requests and parse streaming responses back into StreamEvents. Providers: AnthropicProvider, OpenAiCompatProvider (15+ backends), OpenAiResponsesProvider, AzureOpenAiProvider, GoogleProvider, GoogleVertexProvider, BedrockProvider, MockProvider. Public interface:

  • StreamProvider::stream(config, tx, cancel) -> Result<Message, ProviderError> — stream a single LLM response.
  • StreamProvider::provider_id() -> &str — stable lowercase identifier for this provider (e.g. "anthropic", "openai", "google", "bedrock"). Used as the first segment of the auto-derived config_id in loop_id construction.

ToolSystem (src/tools/)

Responsibility: Built-in tool implementations. Each implements AgentTool. Tools: BashTool (shell execution), ReadFileTool (text + image files), WriteFileTool (create/overwrite), EditFileTool (surgical search/replace), ListFilesTool (directory listing), SearchTool (grep/ripgrep). Public interface:

  • default_tools() — Returns the standard built-in toolset: bash, read-file, write-file, edit-file, list-files, search.
  • AgentTool::name() — Unique tool identifier used in LLM tool-use calls and event correlation.
  • AgentTool::label() — Human-readable display name for UI.
  • AgentTool::description() — Free-text description sent to the LLM to explain when to use the tool.
  • AgentTool::parameters_schema() — JSON Schema object describing the tool's accepted parameters.
  • AgentTool::execute(params, ctx) — Run the tool with resolved parameters and a context carrying the cancellation token and progress callbacks.

SubAgentTool (src/agents/sub_agent.rs)

Responsibility: Implements AgentTool to delegate tasks to a child agent_loop() with isolated context, its own toolset, and a turn limit. The child gets its own agent_id, session_id, and loop_id; its parent_loop_id is linked back to the calling loop via with_parent_loop_id. Public interface:

  • SubAgentTool::new(name, model_config).with_*(...) — Construct a sub-agent tool with its own ModelConfig (provider identity), system prompt, toolset, and turn limit, then register it as an AgentTool.
  • SubAgentTool::with_provider_override(provider) — Bypass ProviderRegistry dispatch; used in tests to inject MockProvider.
  • SubAgentTool::with_parent_loop_id(loop_id) — Supply the parent loop's loop_id so the child AgentStart event carries parent_loop_id, enabling ancestry tracing across the event stream.

SkillSystem (src/context/skills.rs)

Responsibility: Loads SKILL.md files from one or more directories, parses YAML frontmatter, and formats them as an XML index injected into the system prompt. Public interface:

  • SkillSet::load(dirs) — Load skills from multiple directories; later entries override earlier ones on name conflict.
  • SkillSet::load_dir(dir, source) — Load skills from a single directory, tagging each with a source label.
  • SkillSet::merge(other) — Merge another SkillSet in; the other's skills override on name conflict.
  • SkillSet::format_for_prompt() — Render the skill list as an <available_skills> XML block ready for system-prompt injection.

McpClient (src/mcp/)

Responsibility: MCP client that connects to external tool servers over stdio or HTTP. Adapts discovered tools into AgentTool instances. Public interface:

  • McpClient::connect_stdio(cmd, args, env) — Spawn a child process, complete the JSON-RPC initialize handshake, and return a connected client.
  • McpClient::connect_http(url) — Connect to an HTTP-based MCP server and complete the initialize handshake.
  • McpToolAdapter::from_client(client) — Query the server for available tools and return one AgentTool adapter per tool.

OpenApiAdapter (src/openapi/, feature-gated)

Responsibility: Parses OpenAPI 3.x specs and generates one AgentTool per operation. Each tool makes an HTTP request to the spec's base URL. Public interface:

  • from_file(path, config, filter) — Parse an OpenAPI spec from a local file and return one tool adapter per matching operation.
  • from_url(url, config, filter) — Fetch an OpenAPI spec over HTTP and return one tool adapter per matching operation.
  • from_str(spec, config, filter) — Parse an OpenAPI spec from an in-memory string (auto-detects JSON vs YAML) and return one tool adapter per matching operation. Availability: Only compiled when the openapi feature flag is enabled.

SessionStore (src/session/)

Responsibility: Persistent session layer. Records every AgentEvent into a structured tree of Session + LoopRecord objects, and provides both free-function and trait-based APIs for flat JSON-file persistence. Public interface:

  • SessionRecorder::new(config) — Create a recorder; call on_event(event) for every event on the agent's tx channel.
  • SessionRecorder::flush() — Finalize all open loops (status → Aborted) and move them into their sessions.
  • SessionRecorder::drain_completed() — Consume and return all completed sessions.
  • SessionRecorder::sessions() — Iterate all known sessions (completed + in-progress).
  • SessionRecorder::get_session(id) — Look up a session by session_id.
  • SessionRecorder::current_loop(id) — Look up an in-progress LoopRecord by loop_id.
  • save_session(session, dir) — Write {dir}/{session_id}.json (creates dir if needed). Atomic via tmp-file + rename.
  • load_session(session_id, dir) — Read {dir}/{session_id}.json.
  • list_session_ids(dir) — List all session ids in dir, newest first.
  • load_sessions_for_agent(agent_id, dir) — Load all sessions matching agent_id.
  • delete_session(session_id, dir) — Remove {dir}/{session_id}.json.
  • SessionStore trait — async save / load / list_ids / delete / list_for_agent for callers that want a pluggable store (custom backends, mocks). (Added 0.7.0)
  • FileSystemSessionStore::new(dir) — In-tree async impl of SessionStore. Adds advisory fs2 exclusive lock on save (returns SessionError::Locked if a concurrent writer holds it). (Added 0.7.0) File format: Pretty-printed JSON. Flat directory — one file per session, no index. Writes are atomic (tmp + rename) regardless of API surface used.

RetryEngine (src/provider/retry.rs)

Responsibility: Computes exponential-backoff delay with ±20% jitter. Classifies which errors are retryable. Public interface:

  • RetryConfig — Parameters for automatic retry: initial delay, backoff multiplier, max delay, max attempt count.
  • RetryConfig::delay_for_attempt(attempt) — Compute the sleep duration before attempt N using exponential backoff with ±20% jitter.
  • is_retryable() (on ProviderError) — Returns true only for RateLimited and Network variants; all other errors fail immediately.
  • retry_after() (on ProviderError) — Extracts the server-specified retry delay from a RateLimited { retry_after_ms: Some(...) } error, if present.

2. Dependency Graph

graph TD
    App["Application Code"] --> Agent
    Agent --> AgentLoop["AgentLoop\nagent_loop/"]
    AgentLoop --> ContextManager["ContextManager\ncontext/"]
    AgentLoop --> ProviderRegistry["Provider\ntraits.rs / registry.rs"]
    AgentLoop --> ToolSystem["ToolSystem\ntools/"]
    AgentLoop --> RetryEngine["RetryEngine\nprovider/retry.rs"]
    ProviderRegistry --> Anthropic["AnthropicProvider"]
    ProviderRegistry --> OpenAI["OpenAiCompatProvider\n(15+ backends)"]
    ProviderRegistry --> OpenAIResp["OpenAiResponsesProvider"]
    ProviderRegistry --> Azure["AzureOpenAiProvider"]
    ProviderRegistry --> Google["GoogleProvider"]
    ProviderRegistry --> Vertex["GoogleVertexProvider"]
    ProviderRegistry --> Bedrock["BedrockProvider"]
    ProviderRegistry --> Mock["MockProvider\n(tests)"]
    Agent --> SkillSystem["SkillSystem\ncontext/skills.rs"]
    Agent --> McpClient["McpClient\nmcp/"]
    Agent --> OpenApiAdapter["OpenApiAdapter\nopenapi/ (feature)"]
    McpClient --> ToolSystem
    OpenApiAdapter --> ToolSystem
    SubAgent["SubAgentTool\nsub_agent.rs"] --> AgentLoop
    ToolSystem --> SubAgent
    Types["types/\n(shared types)"] --> Agent
    Types --> AgentLoop
    Types --> ToolSystem
    Types --> ProviderRegistry
    SessionStore["SessionStore\nsession/"] --> Types
    App --> SessionStore

3. Data Flow

3.1 Simple Text Prompt (no tool calls)

sequenceDiagram
    participant App
    participant Agent
    participant AgentLoop
    participant Provider
    participant EventCh as EventChannel

    App->>Agent: prompt("What is 2+2?")
    Agent->>AgentLoop: agent_loop(prompts, context, config, tx, cancel)
    AgentLoop->>EventCh: AgentStart
    AgentLoop->>EventCh: TurnStart
    AgentLoop->>EventCh: MessageStart (user)
    AgentLoop->>EventCh: MessageEnd (user)
    AgentLoop->>Provider: stream(StreamConfig)
    Provider-->>EventCh: StreamEvent::Start
    Provider-->>EventCh: StreamEvent::TextDelta x N
    Provider-->>EventCh: StreamEvent::Done(Message)
    AgentLoop->>EventCh: MessageStart (assistant placeholder)
    AgentLoop->>EventCh: MessageUpdate x N (deltas)
    AgentLoop->>EventCh: MessageEnd (assistant final)
    AgentLoop->>EventCh: TurnEnd
    AgentLoop->>EventCh: AgentEnd(messages)
    App->>EventCh: receives events via rx.recv()

3.2 Tool Call Cycle

sequenceDiagram
    participant AgentLoop
    participant Provider
    participant BashTool
    participant EventCh as EventChannel

    AgentLoop->>Provider: stream(config with tool defs)
    Provider-->>AgentLoop: Done(Message{stop_reason: ToolUse, content: [ToolCall{...}]})
    AgentLoop->>EventCh: TurnEnd(assistant message)
    AgentLoop->>AgentLoop: extract tool_calls from assistant content
    AgentLoop->>EventCh: ToolExecutionStart(id, name, args)
    AgentLoop->>BashTool: execute(params, ToolContext)
    BashTool-->>EventCh: ProgressMessage (via on_progress callback)
    BashTool-->>AgentLoop: Ok(ToolResult)
    AgentLoop->>EventCh: ToolExecutionEnd(id, name, result, is_error=false)
    AgentLoop->>EventCh: MessageStart(ToolResult message)
    AgentLoop->>EventCh: MessageEnd(ToolResult message)
    AgentLoop->>AgentLoop: append tool results to context.messages
    AgentLoop->>Provider: stream(config, now includes tool results)
    Provider-->>AgentLoop: Done(Message{stop_reason: Stop})
    AgentLoop->>EventCh: TurnEnd
    AgentLoop->>EventCh: AgentEnd

3.3 Context Compaction Trigger

sequenceDiagram
    participant AgentLoop
    participant ContextManager
    participant Provider

    AgentLoop->>ContextManager: compact(messages, config)
    ContextManager->>ContextManager: total_tokens(messages) > budget?
    alt Level 1 fits
        ContextManager-->>AgentLoop: truncated tool outputs
    else Level 2 fits
        ContextManager-->>AgentLoop: old turns summarized
    else Level 3
        ContextManager-->>AgentLoop: first + recent kept, middle dropped
    end
    AgentLoop->>Provider: stream(config with compacted messages)

3.4 Sub-Agent Delegation

sequenceDiagram
    participant ParentLoop as Parent AgentLoop
    participant SubAgentTool
    participant ChildLoop as Child AgentLoop
    participant ChildProvider as Provider

    ParentLoop->>SubAgentTool: execute({task: "..."}, ToolContext)
    SubAgentTool->>SubAgentTool: build AgentContext with child identity<br/>(new agent_id, session_id, loop_id="{child_session}.sub.1",<br/>parent_loop_id = parent's loop_id)
    SubAgentTool->>ChildLoop: agent_loop([task_prompt], context, config, tx, cancel)
    ChildLoop->>ChildLoop: emit AgentStart{loop_id, parent_loop_id}
    ChildLoop->>ChildProvider: stream(...)
    ChildProvider-->>ChildLoop: streaming events
    ChildLoop-->>SubAgentTool: Vec<AgentMessage> (final messages)
    SubAgentTool->>SubAgentTool: extract_final_text(messages)
    SubAgentTool-->>ParentLoop: Ok(ToolResult{text, child_loop_id: Some(loop_id)})
    Note over ParentLoop: ToolExecutionEnd{child_loop_id} emitted<br/>→ parent stream records child ancestry

4. Data Models

Content

Entity: Content (enum)
  Variant Text:
    text: String               [the text content]
  Variant Image:
    data: String               [base64-encoded binary]
    mime_type: String          [e.g. "image/png", "image/jpeg"]
  Variant Thinking:
    thinking: String           [internal reasoning text]
    signature: Option<String>  [provider-specific thinking signature, optional]
  Variant ToolCall:
    id: String                 [unique call ID, e.g. UUID]
    name: String               [tool name matching AgentTool::name()]
    arguments: JSON            [parameter values matching tool's JSON Schema]

Serialization: tagged by "type" field ("text", "image", "thinking", "toolCall")

Message

Entity: Message (enum)
  Variant User:
    content: Vec<Content>      [usually a single Text block]
    timestamp: u64             [unix milliseconds]
  Variant Assistant:
    content: Vec<Content>      [text, thinking, tool call blocks]
    stop_reason: StopReason    [why the model stopped]
    model: String              [model ID returned by provider]
    provider: String           [provider name, e.g. "anthropic"]
    usage: Usage               [token counts for this turn]
    timestamp: u64             [unix milliseconds]
    error_message: Option<String>  [set when stop_reason == Error]
  Variant ToolResult:
    tool_call_id: String       [matches Content::ToolCall.id]
    tool_name: String          [matches Content::ToolCall.name]
    content: Vec<Content>      [tool output, usually a Text block]
    is_error: bool             [true if tool execution failed]
    timestamp: u64             [unix milliseconds]

Lifecycle: User messages are created by the caller. Assistant messages are
           created by the provider after streaming completes. ToolResult messages
           are created by the agent loop after tool execution.

AgentMessage

Entity: AgentMessage (enum, untagged)
  Variant Llm(LlmMessage)       [sent to the LLM; user/assistant/toolResult roles; LlmMessage wraps Message + Option<TurnId>]
  Variant Extension(ExtensionMessage)  [not sent to LLM; app-only metadata]

Note: stored in Agent.messages and AgentContext.messages
      Extension messages are filtered out before LLM calls

ExtensionMessage

Entity: ExtensionMessage
  role: String        [always "extension"]
  kind: String        [app-defined event type, e.g. "ui_update"]
  data: JSON          [arbitrary app-defined payload]

StopReason

Entity: StopReason (enum)
  Stop      -> model completed naturally
  Length    -> max_tokens limit hit
  ToolUse   -> model returned tool calls (loop must continue)
  Error     -> provider or streaming error occurred
  Aborted   -> cancellation token was triggered

Serialization: camelCase ("stop", "length", "toolUse", "error", "aborted")

Usage

Entity: Usage
  input: u64          [prompt tokens processed]
  output: u64         [completion tokens generated]
  cache_read: u64     [tokens served from prompt cache]
  cache_write: u64    [tokens written to prompt cache]
  total_tokens: u64   [sum, may be 0 if not reported]

Derived: cache_hit_rate() = cache_read / (input + cache_read + cache_write)

AgentEvent

Entity: AgentEvent (enum, #[serde(tag = "type")])

Every variant except AgentStart, ParallelLoopStart, and ParallelLoopEnd now carries
loop_id: String so that events from concurrent parallel branches can be reliably
attributed to the correct LoopRecord even when they are interleaved on one tx channel.

  AgentStart {
    agent_id:          String                    [stable agent instance identifier]
    session_id:        String                    [groups all loops in one session]
    loop_id:           String                    ["{session_id}.{config_id}.{N}" — unique per call]
    parent_loop_id:    Option<String>            [None for origin calls; Some for continuations/sub-agents]
    continuation_kind: Option<ContinuationKind>  [None=origin; Some(Default/Rerun/Branch)=continuation]
    timestamp:         DateTime<Utc>
    metadata:          Option<JSON>
  }
  AgentEnd {
    loop_id:  String                         [← identifies the loop]
    messages: Vec<AgentMessage>              [all new messages produced by this loop]
    usage:    Usage
    timestamp: DateTime<Utc>
    rejection: Option<String>               [Some if input filter blocked the run]
  }
  TurnStart {
    loop_id:      String
    turn_index:   u32
    timestamp:    DateTime<Utc>
    triggered_by: TurnTrigger               [what caused this turn to begin]
  }
  TurnEnd {
    loop_id:      String
    message:      AgentMessage
    usage:        Usage
    timestamp:    DateTime<Utc>
    tool_results: Vec<Message>
  }
  MessageStart  { loop_id: String, message }            [message streaming began]
  MessageUpdate { loop_id: String, message, delta }     [content delta arrived]
  MessageEnd    { loop_id: String, message }            [message complete]
  ToolExecutionStart  { loop_id: String, tool_call_id, tool_name, args }
  ToolExecutionUpdate { loop_id: String, tool_call_id, tool_name, partial_result }
  ToolExecutionEnd {
    loop_id:       String
    tool_call_id:  String
    tool_name:     String
    result:        ToolResult
    is_error:      bool
    child_loop_id: Option<String>           [Some only when tool spawned a sub-agent loop]
  }
  ProgressMessage { loop_id: String, tool_call_id, tool_name, text }
  InputRejected   { loop_id: String, reason }           [input filter blocked the prompt]
  ParallelLoopStart {                                   [loop_id NOT on this variant]
    session_id: String
    loop_ids:   Vec<String>                 [one loop_id per branch, in config order]
    timestamp:  DateTime<Utc>
  }
  ParallelLoopEnd {                                     [loop_id NOT on this variant]
    session_id:             String
    selected_loop_id:       String
    selected_config_index:  usize
    evaluation_usage:       Usage
    timestamp:              DateTime<Utc>
  }

StreamDelta

Entity: StreamDelta (enum)
  Text { delta: String }              [text content chunk]
  Thinking { delta: String }          [thinking content chunk]
  ToolCallDelta { delta: String }     [tool call argument chunk]

ToolContext

Entity: ToolContext
  tool_call_id: String               [for correlation with AgentEvent]
  tool_name: String                  [for correlation with AgentEvent]
  cancel: CancellationToken          [check is_cancelled() in long-running tools]
  on_update: Option<ToolUpdateFn>    [callback for streaming partial ToolResults]
  on_progress: Option<ProgressFn>    [callback for user-facing status text]

ContinuationKind

Entity: ContinuationKind (enum)
  Default                 [unspecified continuation — preserves legacy semantics]
  Rerun { tag: String }   [retry from an equivalent context; tag is RFC 3339 UTC timestamp]
  Branch { tag: String }  [explore a different path from a branching point; tag is RFC 3339 UTC timestamp]

Set on AgentContext.continuation_kind before calling agent_loop_continue().
Surfaced in AgentStart.continuation_kind (None = origin call).
TurnTrigger semantics:
  Default / Rerun → first turn uses TurnTrigger::Continuation
  Branch          → first turn uses TurnTrigger::Branch

TurnTrigger

Entity: TurnTrigger (enum)
  User      [first turn of an agent_loop() origin call with new user prompts]
  SubAgent  [first turn when running as a sub-agent via SubAgentTool]
  Continuation  [subsequent turns; tool round-trip, steering, or Default/Rerun continuation]
  Branch    [first turn of an agent_loop_continue(Branch) call]

Emitted in TurnStart.triggered_by.
Priority on first turn (run_loop):
  1. Branch continuation     → TurnTrigger::Branch
  2. Any other continuation  → TurnTrigger::Continuation
  3. Origin call             → config.first_turn_trigger (User or SubAgent)
Subsequent turns always use TurnTrigger::Continuation.

ToolResult / ToolError

Entity: ToolResult
  content:       Vec<Content>    [tool output content blocks]
  details:       JSON            [structured metadata, not sent to LLM, e.g. exit_code]
  child_loop_id: Option<String>  [set by sub-agent tools; None for all other tools]

Entity: ToolError (enum)
  Failed(String)          [general execution failure]
  NotFound(String)        [tool name not in registry]
  InvalidArgs(String)     [parameter validation failed]
  Cancelled               [CancellationToken was triggered]

ContextConfig

Entity: ContextConfig
  max_context_tokens: usize     [default: 100,000; total budget including system prompt]
  system_prompt_tokens: usize   [default: 4,000; reserved for system prompt]
  keep_recent: usize            [default: 10; messages always kept in full at tail]
  keep_first: usize             [default: 2; messages always kept at head]
  tool_output_max_lines: usize  [default: 50; L1 compaction per-tool-output limit]

Effective budget = max_context_tokens - system_prompt_tokens

ExecutionLimits / ExecutionTracker

Entity: ExecutionLimits
  max_turns: usize              [default: 50; LLM calls before forced stop]
  max_total_tokens: usize       [default: 1,000,000; cumulative token budget]
  max_duration: Duration        [default: 600s; wall-clock time limit]

Entity: ExecutionTracker (runtime state)
  limits: ExecutionLimits       [immutable config]
  turns: usize                  [incremented after each LLM call]
  tokens_used: usize            [cumulative; updated from provider Usage]
  started_at: Instant           [set on construction]

RetryConfig

Entity: RetryConfig
  max_retries: usize            [default: 3; 0 = no retries]
  initial_delay_ms: u64         [default: 1,000ms]
  backoff_multiplier: f64       [default: 2.0; exponential growth factor]
  max_delay_ms: u64             [default: 30,000ms; ceiling before jitter]

CacheConfig / CacheStrategy

Entity: CacheConfig
  enabled: bool                 [master switch; default: true]
  strategy: CacheStrategy

Entity: CacheStrategy (enum)
  Auto                          [provider places breakpoints automatically]
  Disabled                      [no caching hints sent]
  Manual {
    cache_system: bool          [cache system prompt]
    cache_tools: bool           [cache tool definitions]
    cache_messages: bool        [cache second-to-last message]
  }

StreamConfig (sent to provider)

Entity: StreamConfig
  model_config: ModelConfig     [REQUIRED — full provider identity: id, api_key, base_url, compat, cost]
  system_prompt: String
  messages: Vec<Message>        [LLM-only messages, Extension filtered out]
  tools: Vec<ToolDefinition>    [schema-only; no execute functions]
  thinking_level: ThinkingLevel
  max_tokens: Option<u32>       [overrides model_config.max_tokens when Some]
  temperature: Option<f32>
  cache_config: CacheConfig

Note: model identity (id, api_key, base_url, headers, compat) is accessed via
      model_config.id, model_config.api_key, etc. No top-level model or api_key fields.

ToolDefinition (sent to LLM)

Entity: ToolDefinition
  name: String              [matches AgentTool::name()]
  description: String       [matches AgentTool::description()]
  parameters: JSON          [JSON Schema object matching AgentTool::parameters_schema()]

Skill / SkillSet

Entity: Skill
  name: String              [from YAML frontmatter; skill identifier]
  description: String       [from YAML frontmatter; one-line capability summary]
  file_path: PathBuf        [absolute path to the SKILL.md file]
  base_dir: PathBuf         [absolute path to the skill's directory]
  source: String            [origin label: "dir:0", "dir:1", etc.]

Entity: SkillSet
  skills: Vec<Skill>

Lifecycle: Loaded from disk at startup via SkillSet::load(dirs).
           Formatted as XML via format_for_prompt() and appended to system prompt.
           Agent reads full SKILL.md on-demand when activating a skill via read_file tool.

QueueMode

Entity: QueueMode (enum) — controls steering/follow-up queue delivery

  OneAtATime   pop and return exactly one message per call (default)
  All          drain and return all queued messages at once

Used in: Agent.steering_mode, Agent.follow_up_mode

McpToolInfo / McpContent

Entity: McpToolInfo — tool metadata returned by MCP server
  name: String                  [tool identifier used in tools/call]
  description: Option<String>   [human-readable description; default empty string]
  inputSchema: JSON             [JSON Schema for the tool's parameters]

Entity: McpContent (enum) — content item in a tool call result
  Variant Text:
    type: "text"
    text: String
  Variant Image:
    type: "image"
    data: String    [base64-encoded]
    mimeType: String

Entity: McpToolCallResult
  content: Vec<McpContent>  [output from the tool]
  isError: bool             [true if the tool reported an error]

OpenApiConfig / OpenApiAuth / OperationFilter

Entity: OpenApiConfig — configuration for OpenAPI tool generation
  base_url: Option<String>          [overrides spec servers[0].url; trailing slash stripped]
  auth: OpenApiAuth                 [authentication method]
  custom_headers: Map<String,String> [extra headers added to every request]
  max_response_bytes: usize         [default: 65536 (64KB); response body truncation limit]
  timeout_secs: u64                 [default: 30; per-request timeout]
  name_prefix: Option<String>       [if set, tool names formatted as "{prefix}__{operationId}"]

Entity: OpenApiAuth (enum)
  None                              [no authentication]
  Bearer(token: String)             [Authorization: Bearer {token}]
  ApiKey { header: String, value: String }  [custom header: {header}: {value}]

Note: Bearer token and ApiKey value are redacted as "****" in debug output.

Entity: OperationFilter (enum) — controls which API operations become tools
  All                               [include all operations that have an operationId]
  ByOperationId(Vec<String>)        [include only operations whose id is in the list]
  ByTag(Vec<String>)                [include operations tagged with any listed tag]
  ByPathPrefix(String)              [include operations whose path starts with the prefix]

Session / LoopRecord / SessionRecorder

Entity: Session
  session_id:      String
  agent_id:        String
  created_at:      DateTime<Utc>
  last_active_at:  DateTime<Utc>
  formation:       SessionFormation  [Explicit | FirstLoop | InactivityTimeout{..}]
  parent_spawn_ref: Option<SpawnRef> [set when this session was a sub-agent spawn]
  loops:           Vec<LoopRecord>   [ordered by started_at]

Methods: root_loops(), children_of(loop_id), parallel_siblings(loop_id),
         get_loop(loop_id), total_usage()

Entity: LoopRecord
  loop_id:             String
  session_id:          String
  agent_id:            String
  parent_loop_id:      Option<String>
  continuation_kind:   Option<ContinuationKind>
  started_at:          DateTime<Utc>
  ended_at:            Option<DateTime<Utc>>
  status:              LoopStatus          [Pending | Running | Completed | Rejected | Aborted]
  rejection:           Option<String>
  config:              Option<LoopConfigSnapshot>  [model, provider, config_id + name, api, base_url, reasoning, context_window, max_tokens, thinking_level, temperature]
  messages:            Vec<AgentMessage>   [from AgentEnd.messages — authoritative]
  usage:               Usage
  metadata:            Option<JSON>
  events:              Vec<LoopEvent>      [full event stream; MessageUpdate opt-in]
  children_loop_ids:   Vec<String>         [same-session direct children]
  child_loop_refs:     Vec<ChildLoopRef>   [cross-session sub-agent spawn links]
  parallel_group:      Option<ParallelGroupRecord>

Entity: ChildLoopRef — outbound cross-session link on the parent LoopRecord
  tool_call_id:    String
  tool_name:       String
  child_loop_id:   String
  child_session_id: String

Entity: SpawnRef — inbound cross-session link on the child Session
  parent_session_id: String
  parent_loop_id:    String
  tool_call_id:      String
  tool_name:         String

Entity: ParallelGroupRecord
  all_loop_ids:         Vec<String>   [all branch loop_ids in config order]
  selected_loop_id:     String
  selected_config_index: usize
  evaluation_usage:     Usage
  is_selected:          bool          [true only on the winner's LoopRecord]

Entity: SessionRecorderConfig
  formation_policy:         SessionFormationPolicy  [PerSessionId | InactivityTimeout{secs}]
  include_streaming_events: bool                    [default: false — excludes MessageUpdate]

5. Integration Contracts

Anthropic Messages API

  • Endpoint: https://api.anthropic.com/v1/messages
  • Auth (standard): x-api-key: {ANTHROPIC_API_KEY} + anthropic-version: 2023-06-01
  • Auth (OAuth): authorization: Bearer {TOKEN} + beta headers claude-code-20250219,oauth-2025-04-20,fine-grained-tool-streaming-2025-05-14; x-app: cli; anthropic-dangerous-direct-browser-access: true; user-agent: claude-cli/2.1.2
  • Request: POST JSON with model, system (array of text blocks), messages, tools, max_tokens (default 8192), stream: true
  • Response: Server-Sent Events stream; events: message_start, content_block_start, content_block_delta, message_delta, message_stop
  • Tool args: Streamed as InputJsonDelta text fragments; buffered in arguments["__partial_json"]; parsed as complete JSON on content_block_stop
  • Thinking: ThinkingLevel mapped to {type:"enabled", budget_tokens: N} — Minimal→128, Low→512, Medium→2048, High→8192
  • Prompt caching: cache_control: {type: "ephemeral"} placed at system/last-tool-def/second-to-last-message per CacheStrategy
  • Content format: {type: "text"|"image"|"thinking"|"tool_use"|"tool_result", ...}
  • Tool results: Role "user", type "tool_result", fields: tool_use_id, content, is_error

OpenAI-Compatible APIs (Chat Completions)

  • Endpoints: https://api.openai.com/v1/chat/completions and 14+ compatible bases (xAI/Grok, Groq, Cerebras, Mistral, DeepSeek, etc.)
  • Auth: Authorization: Bearer {API_KEY}
  • Request: POST JSON with model, messages, tools, stream: true, stream_options: {include_usage: true}
  • max_tokens field name: "max_tokens" (most) or "max_completion_tokens" (OpenAI) — controlled by OpenAiCompat.max_tokens_field
  • System prompt: First message with role "system" or "developer" (OpenAI) — controlled by supports_developer_role
  • Thinking: reasoning_effort: "low"|"medium"|"high" if supports_reasoning_effort; response in delta.reasoning_content (OpenAI) or delta.reasoning (xAI)
  • Response: SSE stream; each chunk has choices[0].delta; tool args in delta.tool_calls[].function.arguments (incremental JSON string)

OpenAI Responses API

  • Endpoint: {base_url}/responses
  • Auth: Authorization: Bearer {OPENAI_API_KEY}
  • System prompt: "instructions" field (not "messages")
  • Message format: Different from Chat Completions — see Bedrock/Responses comparison below
  • Thinking: "reasoning": {effort: "low"|"medium"|"high"} field
  • SSE events: response.output_text.delta, response.reasoning.delta, response.function_call_arguments.start/delta/done, response.completed

Azure OpenAI

  • Endpoint: {base_url}/responses?api-version=2025-01-01-preview (base_url pattern: https://{resource}.openai.azure.com/openai/deployments/{deployment})
  • Auth: api-key: {AZURE_OPENAI_API_KEY} header (not Authorization: Bearer)
  • Request/Response: Same format as OpenAI Responses API

Google Generative AI (Gemini)

  • Endpoint: {base_url}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={API_KEY}
  • Auth: API key as URL query parameter ?key=; no Authorization header
  • System prompt: "systemInstruction": {parts: [{text: "..."}]}
  • Tools: Single object {functionDeclarations: [...]} wrapping all tool definitions
  • Contents: Role "user" or "model"; ToolResults sent as {role:"user", parts:[{functionResponse:{name, response:{result: text}}}]}
  • Tool args: Delivered complete in one event (no streaming deltas); tool IDs auto-generated as "google-fc-{index}"
  • Response parsing: Custom SSE parser (not standard library); splits on \n\n, extracts data: line

Google Vertex AI

  • Endpoint: https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/google/models/{model}:streamGenerateContent?alt=sse
  • Auth: Authorization: Bearer {OAUTH_TOKEN} (OAuth2, not API key in URL)
  • Request/Response: Identical to Google Generative AI; tool IDs generated as "vertex-fc-{index}"

Amazon Bedrock (ConverseStream)

  • Endpoint: {base_url}/model/{model}/converse-stream (base_url: https://bedrock-runtime.{region}.amazonaws.com)
  • Auth: Authorization: Bearer {token} or custom headers from model_config.headers; minimal SigV4 support
  • System prompt: "system" array: [{text: "..."}]
  • Tools: toolConfig.tools: [{toolSpec: {name, description, inputSchema: {json: schema}}}]
  • Tool results: {toolResult: {toolUseId, content: [...], status: "success"|"error"}}
  • Streaming format: Newline-delimited JSON (not standard SSE); events: contentBlockDelta, contentBlockStart, contentBlockStop, messageStop, metadata

Model Context Protocol (MCP)

  • Protocol: JSON-RPC 2.0

  • Message types:

    • Request: {jsonrpc:"2.0", id:u64, method:String, params:Option<Value>}
    • Response: {jsonrpc:"2.0", id:Option<u64>, result:Option<Value>, error:Option<{code:i64,message:String,data?}>}
    • Request IDs: auto-incremented AtomicU64 starting at 1
  • Initialization handshake (3 steps):

    1. Client sends initialize with {protocolVersion:"2024-11-05", capabilities:{}, clientInfo:{name:"phi-core",version:"<pkg>"}}
    2. Server responds with {protocolVersion, capabilities:{tools?,resources?,prompts?}, serverInfo:{name,version}}
    3. Client sends notifications/initialized notification (no params; server may ignore id)
  • Tool discovery: Client sends tools/list → server returns {tools: [{name, description?, inputSchema}]}

  • Tool execution: Client sends tools/call {name, arguments} → server returns {content:[{type:"text",text}|{type:"image",data,mimeType}], isError:bool}

  • Stdio transport: Spawns child process; newline-delimited JSON over stdin/stdout; tokio::sync::Mutex for concurrent access; shutdown: EOF on stdin then kill child

  • HTTP transport: POST JSON-RPC body to configured URL; stateless (no persistent connection)

  • Tool adapter: McpToolAdapter wraps McpToolInfo + Arc<Mutex<McpClient>>; optional prefix for namespace disambiguation ({prefix}__{name})

  • Error enum: Transport(String), Protocol(String), JsonRpc{code,message}, Serialization, Io, ConnectionClosed

OpenAPI

  • Spec formats: OpenAPI 3.x; auto-detected: first non-whitespace char { or [ → JSON, else YAML

  • Sources: from_file(path) (async read), from_url(url) (HTTP GET via reqwest), from_str(text) (in-memory)

  • Base URL resolution: config.base_urlspec.servers[0].url → error if neither set; trailing slashes stripped

  • Parameter classification:

    • Path parameters → URL {param} substitution (RFC 3986 percent-encoding); required
    • Query parameters → .query() chains; optional
    • Header parameters → .header() chains; optional
    • Cookie parameters → skipped (unsupported)
    • RequestBody (application/json only) → keyed as "body" (or "_request_body" on collision); required if requestBody.required
  • HTTP execution pipeline (per tool call):

    1. Validate params is object (or null treated as {})
    2. Substitute path params with percent-encoded values; error if any missing
    3. Build URL: {base_url}{path}
    4. Chain .query() for query params present in input
    5. Chain .header() for header params present in input
    6. Apply auth: Bearer.bearer_auth(), ApiKey.header(header, value), None → nothing
    7. Apply custom_headers
    8. If has_body: .json(params["body"])
    9. Send request; read full body text; truncate to max_response_bytes at UTF-8 boundary
    10. Return: "{METHOD} {URL} → {STATUS_CODE}\n\n{BODY}"
  • Operation filter: OperationFilter::All|ByOperationId|ByTag|ByPathPrefix; operations without operationId always skipped with warning

  • Tool naming: Default = operationId; with prefix = {prefix}__{operationId}

File System

  • Read: tokio::fs::read_to_string for text (max 1MB), tokio::fs::read for images (max 20MB)
  • Write: tokio::fs::write with automatic parent dir creation
  • Edit: Read → string replace (exact match, once) → write
  • List: Spawns find command via BashTool
  • Search: Spawns grep or rg command via BashTool

Shell

  • Execution: tokio::process::Command::new("bash").arg("-c").arg(command)
  • Timeout: tokio::time::sleep with default 120s, configurable
  • Output capture: stdout + stderr piped, truncated at 256KB each
  • Safety: Deny patterns checked before execution (substring match)
  • Exit code: Returned in ToolResult.details.exit_code; tool always returns Ok (non-zero is not a ToolError)

6. State Management

Agent-Level State (in Agent struct)

All fields on Agent:

FieldTypeNotes
system_promptStringImmutable once set; injected into every LLM call
modelStringModel identifier
api_keyStringAPI authentication key
thinking_levelThinkingLevelOff/Minimal/Low/Medium/High
max_tokensOption<u32>Max completion tokens
temperatureOption<f32>Sampling temperature
model_configOption<ModelConfig>Provider-specific extras (base_url, headers, compat flags)
messagesVec<AgentMessage>Grows on each prompt() call; reset by reset(); replaced by restore_messages()
toolsVec<Box<dyn AgentTool>>Tool instances (heap-allocated trait objects)
providerBox<dyn StreamProvider>Boxed, not Arc; owned exclusively by Agent
steering_queueArc<Mutex<Vec<AgentMessage>>>Written by steer(), drained by agent loop before each tool execution check
follow_up_queueArc<Mutex<Vec<AgentMessage>>>Written by follow_up(), drained when agent loop would stop
steering_modeQueueModeDefault: OneAtATime
follow_up_modeQueueModeDefault: OneAtATime
context_configOption<ContextConfig>If None, context compaction is disabled
execution_limitsOption<ExecutionLimits>If None, no hard limits enforced
cache_configCacheConfigPrompt caching hints (Anthropic)
tool_executionToolExecutionStrategyParallel (default), Sequential, or Batched
retry_configRetryConfigBackoff for RateLimited/Network errors
before_turnOption<BeforeTurnFn>Signature: fn(&[AgentMessage], turn_number: usize) -> bool; return false to abort
after_turnOption<AfterTurnFn>Signature: fn(&[AgentMessage], &Usage)
on_errorOption<OnErrorFn>Signature: fn(&str)
input_filtersVec<Arc<dyn InputFilter>>Applied in order before LLM call
(compaction strategies)(moved to ContextConfig.compaction)in_memory_strategy and block_strategy fields on CompactionConfig (G5)
cancelOption<CancellationToken>Created when prompt() starts, consumed by abort()
is_streamingboolSet true on prompt() entry, false on exit
agent_idStringUUID v4 generated once at Agent::new(); stable for the Agent's lifetime. Injected into every AgentContext built by this agent.
session_idStringUUID v4 generated once at Agent::new(); groups all loops under one session. Stable for the Agent's lifetime.
loop_countersHashMap<String, usize>Per-"{session_id}.{config_id}" monotonic counter; incremented by next_loop_id() to produce the N component of loop_id.
last_loop_idOption<String>loop_id of the most recently started loop; set after each prompt_* or continue_loop_* call. Becomes parent_loop_id on the next continuation.
before_loopOption<BeforeLoopFn>Hook called once before AgentStart. Signature: fn(&[AgentMessage], loop_index: usize) -> bool; return false to abort before AgentStart.
after_loopOption<AfterLoopFn>Hook called once after AgentEnd. Signature: fn(&[AgentMessage], &Usage).
before_tool_executionOption<BeforeToolExecutionFn>Hook called before each ToolExecutionStart. Signature: fn(&str, &str, &JSON) -> bool (tool_name, call_id, args); return false to skip.
after_tool_executionOption<AfterToolExecutionFn>Hook called after each ToolExecutionEnd. Signature: fn(&str, &str, bool) (tool_name, call_id, is_error).
before_tool_execution_updateOption<BeforeToolExecutionUpdateFn>Hook called before each ToolExecutionUpdate. Signature: fn(&str, &str, &str) -> bool (tool_name, call_id, text); return false to suppress the event.
after_tool_execution_updateOption<AfterToolExecutionUpdateFn>Hook called after each ToolExecutionUpdate (only when not suppressed). Signature: fn(&str, &str, &str).

Invariants:

  • assert!(!self.is_streaming) fires if prompt() is called while already running — callers must use steer() or follow_up() during active runs
  • cancel is always Some while is_streaming is true
  • messages must not end in an Assistant message before agent_loop_continue() is called
  • agent_id and session_id are always Some in any AgentContext built by Agent; direct callers of agent_loop_continue must also set them

AgentContext (per-run, passed into agent loop)

State ElementTypeDescription
system_promptStringImmutable for the duration of the run
messagesVec<AgentMessage>Mutated in-place: prompts appended, assistant messages appended, tool results appended; may be replaced by compaction
tools&[Box<dyn AgentTool>]Immutable for the duration of the run
agent_idOption<String>Stable agent instance ID. Set by Agent::prompt_*; also written back by agent_loop when None. Required (non-None) for agent_loop_continue.
session_idOption<String>Stable session ID. Same lifecycle as agent_id.
loop_idOption<String>Per-call identifier of the form "{session_id}.{config_id}.{N}". Set by Agent before calling agent_loop/agent_loop_continue; falls back to UUID if None at loop entry.
parent_loop_idOption<String>loop_id of the loop this call continues from. None for origin calls. Set by Agent::continue_loop_with_sender to Agent.last_loop_id.
continuation_kindOption<ContinuationKind>How this call relates to prior loops. None for origin; Some(Default|Rerun|Branch) for continuations.

ExecutionTracker (per-run)

StateInitialTransitions
turns0Incremented after each LLM call
tokens_used0Incremented by token count of each LLM response
started_atInstant::now()Immutable; compared against max_duration on each check

Steering/Follow-up Queue Modes

  • QueueMode::OneAtATime (default for both queues): on each read, lock mutex, pop the first message only, return as Vec of 1
  • QueueMode::All: on each read, lock mutex, drain all queued messages, return the full vec

Both queues are passed to AgentLoopConfig as closures (get_steering_messages, get_follow_up_messages) that capture the Arc<Mutex<>> pointer, enabling external callers to enqueue messages while the agent loop is running on another task.

Event Hook Ordering

All hooks fire in a guaranteed strict order relative to their paired events. This ordering is enforced at runtime and is an invariant of the system:

before_loop → AgentStart
  before_turn → TurnStart
    [MessageStart/End for initial prompts — first turn of agent_loop() only]
    [MessageStart/End for injected steering messages]
    [LLM: MessageStart → MessageUpdate* → MessageEnd]
    [per tool call:]
      before_tool_execution → ToolExecutionStart
        (before_tool_execution_update → ToolExecutionUpdate → after_tool_execution_update)*
      ToolExecutionEnd → after_tool_execution
  TurnEnd → after_turn
  (repeat inner block for each follow-up / steering-triggered turn)
AgentEnd → after_loop

Short-circuit rules — hook returns false:

HookWhen false is returnedBehaviour
before_loopBefore AgentStartLoop is aborted; AgentEnd { messages: [] } is emitted; function returns immediately
before_turnBefore TurnStartTurn is skipped; TurnStart/TurnEnd are not emitted; AgentEnd is not guaranteed
before_tool_executionBefore ToolExecutionStartTool call is skipped; ToolExecutionStart/End are not emitted; a skipped error ToolResult is returned to the LLM
before_tool_execution_updateBefore ToolExecutionUpdateEvent is suppressed; after_tool_execution_update is not called; tool keeps running and final ToolResult is unaffected

7. Error Handling Strategy

Provider Errors (ProviderError)

ErrorRetryableHandling
RateLimited { retry_after_ms }YesExponential backoff; respects Retry-After header if present
Network(msg)YesExponential backoff
Auth(msg)NoPropagated immediately as StopReason::Error message
Api(msg)NoPropagated as StopReason::Error message
ContextOverflow { msg }NoDetected on HTTP 400/413; triggers compaction on next turn (see below)
CancelledNoLoop exits cleanly, AgentEnd emitted
Other(msg)NoPropagated as StopReason::Error message

Context Overflow Recovery

  1. Provider returns HTTP 400/413 matching any of 15+ known overflow phrases.
  2. ProviderError::classify() returns ContextOverflow.
  3. The overflow may arrive as an HTTP error (caught in retry loop) or as a streaming error event (StreamEvent::Error with matching message), caught by Message::is_context_overflow().
  4. On the next turn, if context_config is set, compact_messages() is called before the LLM call.
  5. If no context_config is set, the error message is included in conversation history and the loop continues — the LLM may self-recover or the next turn will also fail.

Tool Errors (ToolError)

  • Cancelled: Tool execution skipped; ToolResult content = "Skipped due to queued user message." with is_error: true
  • Failed(msg): Converted to ToolResult with error text; is_error: true; always returned to LLM so it can self-correct
  • InvalidArgs(msg): Same as Failed; LLM can retry with corrected parameters
  • NotFound(msg): Produced when tool name in ToolCall has no matching AgentTool; same handling as Failed

Input Filter Errors

  • Reject(reason): Emits AgentEvent::InputRejected, immediately emits AgentEvent::AgentEnd { messages: [] }, returns empty message list
  • Warn(msg): Warning text appended to last user message content; loop continues

Execution Limit Exhaustion

  • When any limit is exceeded, a synthetic user message [Agent stopped: {reason}] is appended to context and emitted as events.
  • Loop returns immediately after appending the message.
  • No error is thrown; AgentEnd is emitted normally.

Before-Turn Abort

  • If before_turn callback returns false, the loop returns immediately with no AgentEnd emitted.
  • This is the only path where AgentEnd is not guaranteed.

Error Propagation Across Components

Provider → ProviderError → stream_assistant_response() → Message{stop_reason: Error}
                                                        ↓
                                            on_error callback invoked
                                                        ↓
                                        AgentEvent::TurnEnd emitted
                                                        ↓
                                           agent loop returns