Implementation Roadmap
Generated from: ../reference/glossary.md, ../specs/architecture.md, ../architecture/algorithms.md
Last updated: 2026-03-17
Paradigm: Language-agnostic / Implementation-independent
This roadmap defines six progressive stages of implementation derived from the reverse-engineered specification. Each level is a complete, testable stage. Complete and stabilize each level fully before advancing to the next.
Level 1 — Survive
Goal: The system can start, load configuration, initialize its core structures, and confirm it is alive. Nothing works end-to-end yet, but nothing crashes either.
Completion Criteria: A smoke test confirms the Agent can be constructed with a MockProvider, configured via builder methods, and all core data entities can be instantiated without error. No LLM call is required to pass Level 1.
Milestone 1.1 — Core Type System
- REQ-001: Define the Content enum with four variants: Text { text }, Image { data: base64, mime_type }, Thinking { thinking, signature }, and ToolCall { id, name, arguments }. Serialized with a "type" discriminant field. (Source: [AR])
  - Depends on: —
  - Definition of Done: All four variants instantiate; round-trip JSON serialization produces the correct tagged shape.
- REQ-002: Define the Message enum with three variants: User { content, timestamp }, Assistant { content, stop_reason, model, provider, usage, timestamp, error_message }, and ToolResult { tool_call_id, tool_name, content, is_error, timestamp }. (Source: [AR])
  - Depends on: REQ-001, REQ-005, REQ-006
  - Definition of Done: All three variants instantiate; serialization preserves the role field with values "user", "assistant", "toolResult".
- REQ-003: Define AgentMessage as an untagged enum wrapping Llm(LlmMessage) and Extension(ExtensionMessage). (Source: [AR])
  - Depends on: REQ-002, REQ-004
  - Definition of Done: Both variants serialize and deserialize correctly; an Extension variant round-trips without loss.
- REQ-004: Define ExtensionMessage with fields role: String (always "extension"), kind: String, and data: JSON. (Source: [AR])
  - Depends on: —
  - Definition of Done: Instantiates and serializes to {role: "extension", kind: "...", data: {...}}.
- REQ-005: Define the StopReason enum with variants Stop, Length, ToolUse, Error, Aborted, serialized in camelCase. (Source: [AR])
  - Depends on: —
  - Definition of Done: All variants serialize to their camelCase strings ("stop", "length", "toolUse", "error", "aborted").
- REQ-006: Define the Usage struct with fields input, output, cache_read, cache_write, total_tokens (all u64). Include a derived method cache_hit_rate(). (Source: [AR])
  - Depends on: —
  - Definition of Done: cache_hit_rate() returns cache_read / (input + cache_read + cache_write).
- REQ-007: Define the AgentEvent enum with all variants: AgentStart, AgentEnd { messages }, TurnStart, TurnEnd { message, tool_results }, MessageStart { message }, MessageUpdate { message, delta }, MessageEnd { message }, ToolExecutionStart { tool_call_id, tool_name, args }, ToolExecutionUpdate { tool_call_id, tool_name, partial_result }, ToolExecutionEnd { tool_call_id, tool_name, result, is_error }, ProgressMessage { tool_call_id, tool_name, text }, InputRejected { reason }. (Source: [AR])
  - Depends on: REQ-002, REQ-008
  - Definition of Done: All variants instantiate.
- REQ-008: Define the StreamDelta enum with variants Text { delta }, Thinking { delta }, ToolCallDelta { delta }. (Source: [AR])
  - Depends on: —
  - Definition of Done: All variants instantiate and carry their string payload.
- REQ-009: Define the ToolContext struct with fields tool_call_id, tool_name, cancel: CancellationToken, on_update: Option<ToolUpdateFn>, on_progress: Option<ProgressFn>. (Source: [AR])
  - Depends on: —
  - Definition of Done: The struct instantiates; the callback fields accept closures or function pointers.
- REQ-010: Define ToolResult { content: Vec<Content>, details: JSON } and the ToolError enum with variants Failed(String), NotFound(String), InvalidArgs(String), Cancelled. (Source: [AR])
  - Depends on: REQ-001
  - Definition of Done: All variants instantiate; ToolError converts to a display string.
- REQ-011: Define the ContextConfig struct with fields and defaults: max_context_tokens (100,000), system_prompt_tokens (4,000), keep_recent (10), keep_first (2), tool_output_max_lines (50). (Source: [AR])
  - Depends on: —
  - Definition of Done: Default construction produces the documented default values.
- REQ-012: Define the ExecutionLimits struct with defaults max_turns (50), max_total_tokens (1,000,000), max_duration (600 s), and the ExecutionTracker runtime state with fields limits, turns, tokens_used, started_at. (Source: [AR])
  - Depends on: —
  - Definition of Done: ExecutionTracker::new(limits) initializes turns = 0, tokens_used = 0, started_at = now.
- REQ-013: Define RetryConfig with defaults max_retries (3), initial_delay_ms (1,000), backoff_multiplier (2.0), max_delay_ms (30,000). (Source: [AR])
  - Depends on: —
  - Definition of Done: Default construction produces the documented defaults.
- REQ-014: Define CacheConfig { enabled: bool, strategy: CacheStrategy } and the CacheStrategy enum with variants Auto, Disabled, Manual { cache_system, cache_tools, cache_messages }. (Source: [AR])
  - Depends on: —
  - Definition of Done: All variants instantiate; the default CacheConfig has enabled: true, strategy: Auto.
- REQ-015: Define the StreamConfig struct with fields model, system_prompt, messages: Vec<Message>, tools: Vec<ToolDefinition>, thinking_level, api_key, max_tokens, temperature, model_config, cache_config. (Source: [AR])
  - Depends on: REQ-014, REQ-016
  - Definition of Done: The struct instantiates with all optional fields as None.
- REQ-016: Define the ToolDefinition struct with fields name, description, parameters: JSON. (Source: [AR])
  - Depends on: —
  - Definition of Done: The struct instantiates and serializes to the expected JSON shape.
- REQ-017: Define the QueueMode enum with variants OneAtATime and All. (Source: [AR])
  - Depends on: —
  - Definition of Done: Both variants exist; the default is OneAtATime.
- REQ-018: All types in the AgentMessage tree derive Serialize and Deserialize. (Source: [OV])
  - Depends on: REQ-001 through REQ-017
  - Definition of Done: Full round-trip JSON serialization of a Vec<AgentMessage> containing all message types is lossless.
- REQ-019: Define the ThinkingLevel enum with variants Off, Minimal, Low, Medium, High. (Source: [OV])
  - Depends on: —
  - Definition of Done: All variants exist.
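For illustration only, here is a minimal std-only Rust sketch of two of the types above: Usage with cache_hit_rate() (REQ-006) and ContextConfig with its documented defaults (REQ-011). Serialization is omitted because it would need an external crate such as serde. Returning 0.0 when the denominator is zero is an assumption; the spec does not say what happens in that case.

```rust
/// Token accounting for one LLM call (REQ-006); field names follow the spec.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Usage {
    pub input: u64,
    pub output: u64,
    pub cache_read: u64,
    pub cache_write: u64,
    pub total_tokens: u64,
}

impl Usage {
    /// cache_read / (input + cache_read + cache_write).
    /// Assumption: returns 0.0 instead of NaN when no input tokens were recorded.
    pub fn cache_hit_rate(&self) -> f64 {
        let denom = self.input + self.cache_read + self.cache_write;
        if denom == 0 {
            0.0
        } else {
            self.cache_read as f64 / denom as f64
        }
    }
}

/// Context budget settings with the documented defaults (REQ-011).
#[derive(Debug, Clone, PartialEq)]
pub struct ContextConfig {
    pub max_context_tokens: usize,
    pub system_prompt_tokens: usize,
    pub keep_recent: usize,
    pub keep_first: usize,
    pub tool_output_max_lines: usize,
}

impl Default for ContextConfig {
    fn default() -> Self {
        ContextConfig {
            max_context_tokens: 100_000,
            system_prompt_tokens: 4_000,
            keep_recent: 10,
            keep_first: 2,
            tool_output_max_lines: 50,
        }
    }
}
```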
Milestone 1.2 — Core Traits
- REQ-020: Define the StreamProvider trait with a single method stream(config: StreamConfig, tx: EventSender, cancel: CancellationToken) -> Result<Message, ProviderError>. Define the ProviderError enum with variants Api(String), Network(String), Auth(String), RateLimited { retry_after_ms: Option<u64> }, ContextOverflow { message: String }, Cancelled, Other(String). (Source: [AR])
  - Depends on: REQ-002, REQ-015
  - Definition of Done: The trait compiles; all ProviderError variants instantiate.
- REQ-021: Define the AgentTool trait with methods name() -> &str, label() -> &str, description() -> &str, parameters_schema() -> JSON, execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>. (Source: [AR])
  - Depends on: REQ-009, REQ-010
  - Definition of Done: The trait compiles; a minimal struct can implement it.
- REQ-022: Define the InputFilter trait with method filter(text: &str) -> FilterResult, where FilterResult is Pass, Warn(String), or Reject(String). (Source: [OV])
  - Depends on: —
  - Definition of Done: The trait compiles; all three result variants exist.
- REQ-023: Define the CompactionStrategy trait with method compact(messages: Vec<AgentMessage>, config: ContextConfig) -> Vec<AgentMessage>. (Source: [AR])
  - Depends on: REQ-003, REQ-011
  - Definition of Done: The trait compiles; a struct can implement it.
Milestone 1.3 — Agent Struct Construction
- REQ-024: Implement BasicAgent::new(model_config: ModelConfig) -> BasicAgent. Initialize all fields to their documented defaults: messages = [], tools = [], thinking_level = Off, tool_execution = Parallel, steering_mode = OneAtATime, follow_up_mode = OneAtATime, context_config = Some(default), execution_limits = Some(default), retry_config = default, is_streaming = false, cancel = None. (Source: [PS])
  - Depends on: REQ-011 through REQ-017, REQ-019, REQ-020
  - Definition of Done: BasicAgent::new(ModelConfig::anthropic("m", "m", "k")) compiles and all fields have their documented defaults.
- REQ-025: Implement the builder methods with_system_prompt(text), with_model_config(cfg), with_provider_override(provider), with_max_tokens(n), with_thinking(level). (Source: [PS])
  - Depends on: REQ-024
  - Definition of Done: The method chain BasicAgent::new(ModelConfig::anthropic("m", "m", "k")).with_system_prompt("x") compiles and all fields are set correctly.
- REQ-026: Implement with_tools(vec), with_context_config(cfg), with_execution_limits(limits), with_retry_config(cfg), with_cache_config(cfg), with_tool_execution(strategy), with_steering_mode(mode), with_follow_up_mode(mode). (Source: [PS])
  - Depends on: REQ-024
  - Definition of Done: All builders set their respective fields; with_tools replaces (or extends) the tools list.
- REQ-027: Initialize steering_queue and follow_up_queue as Arc<Mutex<Vec<AgentMessage>>> in BasicAgent::new. (Source: [AR])
  - Depends on: REQ-003, REQ-024
  - Definition of Done: Both queues are non-null, independently lockable, and start empty.
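A reduced Rust sketch of the consuming-builder pattern and the two independent queues (REQ-025 to REQ-027). It is a stand-in under several assumptions: ModelConfig and most fields are omitted, AgentMessage is a placeholder newtype, and max_tokens: Option<u32> is a guessed type.

```rust
use std::sync::{Arc, Mutex};

// Simplified stand-ins for the spec's types; only what this sketch needs.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum ThinkingLevel { #[default] Off, Minimal, Low, Medium, High }

#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum QueueMode { #[default] OneAtATime, All }

/// Placeholder for AgentMessage (REQ-003); hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentMessage(pub String);

pub struct BasicAgent {
    pub system_prompt: String,
    pub max_tokens: Option<u32>,
    pub thinking_level: ThinkingLevel,
    pub steering_mode: QueueMode,
    pub steering_queue: Arc<Mutex<Vec<AgentMessage>>>,
    pub follow_up_queue: Arc<Mutex<Vec<AgentMessage>>>,
}

impl BasicAgent {
    pub fn new() -> Self {
        BasicAgent {
            system_prompt: String::new(),
            max_tokens: None,
            thinking_level: ThinkingLevel::Off,
            steering_mode: QueueMode::OneAtATime,
            // Two separate allocations, so each queue has its own lock (REQ-027).
            steering_queue: Arc::new(Mutex::new(Vec::new())),
            follow_up_queue: Arc::new(Mutex::new(Vec::new())),
        }
    }

    // Each builder takes `self` by value and returns it, so calls chain.
    pub fn with_system_prompt(mut self, text: impl Into<String>) -> Self {
        self.system_prompt = text.into();
        self
    }

    pub fn with_max_tokens(mut self, n: u32) -> Self {
        self.max_tokens = Some(n);
        self
    }

    pub fn with_thinking(mut self, level: ThinkingLevel) -> Self {
        self.thinking_level = level;
        self
    }

    pub fn with_steering_mode(mut self, mode: QueueMode) -> Self {
        self.steering_mode = mode;
        self
    }
}
```

Taking self by value lets the whole configuration be written as one expression starting from new().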
Milestone 1.4 — AgentContext and AgentLoopConfig
- REQ-028: Define the AgentContext struct with fields system_prompt: String, messages: Vec<AgentMessage>, tools: &[Box<dyn AgentTool>]. (Source: [AR])
  - Depends on: REQ-003, REQ-021
  - Definition of Done: The struct compiles; messages is mutated in place during the loop.
- REQ-029: Define the AgentLoopConfig struct bundling all behavioral settings: provider, model, api_key, thinking_level, max_tokens, temperature, model_config, get_steering_messages: Option<Fn()>, get_follow_up_messages: Option<Fn()>, context_config, compaction_strategy, execution_limits, cache_config, tool_execution, retry_config, before_turn, after_turn, on_error, input_filters, transform_context, convert_to_llm. (Source: [OV])
  - Depends on: REQ-011 through REQ-017, REQ-023
  - Definition of Done: The struct compiles with all optional fields as None.
Milestone 1.5 — MockProvider and Smoke Test
- REQ-030: Implement MockProvider, which implements StreamProvider. It accepts a list of pre-configured responses to return in sequence and returns a Message::Assistant with stop_reason: Stop and configurable text content. (Source: [AR])
  - Depends on: REQ-020
  - Definition of Done: MockProvider::new(vec![response1, response2]) returns each response in order when stream() is called; after the list is exhausted, it returns a default stop response.
- REQ-031: Smoke test: construct Agent::new(MockProvider::new([])), configure it with builder methods, verify all fields are set correctly, and confirm no panic occurs. (Source: [OV])
  - Depends on: REQ-024 through REQ-030
  - Definition of Done: The test passes with zero panics; all configured fields read back correctly.
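A synchronous, std-only sketch of the sequencing behavior required of MockProvider in REQ-030. AssistantReply is a hypothetical reduced stand-in for Message::Assistant, and this stream() takes no config, sender, or cancellation token. The empty text in the default reply is an assumption.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub enum StopReason { Stop, Length, ToolUse, Error, Aborted }

/// Hypothetical reduced stand-in for Message::Assistant: text plus stop reason.
#[derive(Debug, Clone, PartialEq)]
pub struct AssistantReply {
    pub text: String,
    pub stop_reason: StopReason,
}

/// Returns pre-configured replies in order, then a default Stop reply.
pub struct MockProvider {
    // Mutex so stream() can take &self, as a shared provider would.
    responses: Mutex<VecDeque<AssistantReply>>,
}

impl MockProvider {
    pub fn new(responses: Vec<AssistantReply>) -> Self {
        MockProvider { responses: Mutex::new(responses.into()) }
    }

    /// Simplified stand-in for StreamProvider::stream (no events, no cancellation).
    pub fn stream(&self) -> AssistantReply {
        self.responses
            .lock()
            .unwrap()
            .pop_front()
            .unwrap_or(AssistantReply { text: String::new(), stop_reason: StopReason::Stop })
    }
}
```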
Level 2 — Useful
Goal: The primary use cases from the spec work end-to-end on valid, well-formed inputs. An agent can accept a prompt, call an LLM, execute tool calls, and return a final response.
Completion Criteria: Every primary use case from ../reference/glossary.md executes successfully with valid inputs and a real (or mock) provider: single-turn text response, multi-turn tool call cycle, message persistence round-trip, and agent reset. The built-in coding tools all execute on valid inputs.
Milestone 2.1 — Event Channel Infrastructure
- REQ-032: Implement an unbounded async event channel. The agent_loop holds the sender (tx); callers receive from the receiver (rx). The channel never blocks the sender. (Source: [AR])
  - Depends on: REQ-007
  - Definition of Done: The sender can emit 1,000 events without blocking; the receiver drains them all in order.
- REQ-033: Implement CancellationToken with methods new(), cancel(), is_cancelled() -> bool, child_token() -> CancellationToken. Cancelling a parent automatically cancels all of its children. (Source: [AR])
  - Depends on: —
  - Definition of Done: Cancelling a root token causes is_cancelled() to return true on both the root and any child tokens created from it.
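One way to get the parent-to-child propagation required by REQ-033 with only the standard library is sketched below in Rust. Each token owns an atomic flag and keeps a clone of its parent, and is_cancelled() walks up the chain. The sketch is polling-only. An async runtime's token, for example tokio_util's CancellationToken, also lets tasks await cancellation.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Cancelled if this token, or any of its ancestors, has been cancelled.
#[derive(Clone, Debug)]
pub struct CancellationToken {
    flag: Arc<AtomicBool>,
    // Clones share the same Arc flag, so a parent's cancel() is visible here.
    parent: Option<Box<CancellationToken>>,
}

impl CancellationToken {
    pub fn new() -> Self {
        CancellationToken { flag: Arc::new(AtomicBool::new(false)), parent: None }
    }

    pub fn cancel(&self) {
        self.flag.store(true, Ordering::SeqCst);
    }

    pub fn is_cancelled(&self) -> bool {
        // A cancelled ancestor cancels this token too; the reverse is not true.
        self.flag.load(Ordering::SeqCst)
            || self.parent.as_ref().map_or(false, |p| p.is_cancelled())
    }

    pub fn child_token(&self) -> CancellationToken {
        CancellationToken {
            flag: Arc::new(AtomicBool::new(false)),
            parent: Some(Box::new(self.clone())),
        }
    }
}
```

Cancelling a child leaves its parent untouched. This asymmetry is what lets execute_single_tool (REQ-046) give each tool a child token without any one tool being able to abort the whole run.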
Milestone 2.2 — Agent Prompt Entry Point
- REQ-034: Implement Agent::prompt(text: String) -> EventReceiver as a thin wrapper that constructs a User message and delegates to prompt_messages. (Source: [PS])
  - Depends on: REQ-002, REQ-035
  - Definition of Done: agent.prompt("hello") returns a receiver immediately (non-blocking).
- REQ-035: Implement Agent::prompt_messages_with_sender(messages, tx): set is_streaming = true, create a CancellationToken, build an AgentContext snapshot, build the AgentLoopConfig (wiring the queue closures), spawn agent_loop, merge the returned messages into Agent.messages on completion, and set is_streaming = false. (Source: [PS])
  - Depends on: REQ-027, REQ-028, REQ-029, REQ-033, REQ-036
  - Definition of Done: After the spawned task completes, agent.messages contains the new messages and is_streaming is false.
Milestone 2.3 — Agent Loop Core
- REQ-036: Implement agent_loop: emit AgentStart, append the prompts to context.messages, emit TurnStart/MessageStart/MessageEnd for each prompt, call run_loop, emit AgentEnd, and return the new messages. (Source: [PS])
  - Depends on: REQ-032, REQ-038
  - Definition of Done: With MockProvider, a single call emits AgentStart, at least one TurnStart/TurnEnd pair, and AgentEnd; the returned messages include the input prompt and the assistant response.
- REQ-037: Implement agent_loop_continue: emit AgentStart/TurnStart, call run_loop, emit AgentEnd. (Source: [PS])
  - Depends on: REQ-036
  - Definition of Done: Resumes from the existing context without re-appending prompts.
- REQ-038: Implement the run_loop inner loop (happy path only: no steering, no follow-ups, no limits): call stream_assistant_response, append the assistant message, extract tool calls, call execute_tool_calls, append the tool results, and repeat until a response contains no tool calls. (Source: [PS])
  - Depends on: REQ-039, REQ-045
  - Definition of Done: With a MockProvider that returns one tool call and then one Stop, run_loop executes the tool and calls the LLM a second time before stopping.
Milestone 2.4 — LLM Streaming (Happy Path)
- REQ-039: Implement stream_assistant_response (no retry): build StreamConfig from the context and config, call provider.stream(), and process the stream events (Start → emit MessageStart; TextDelta/ThinkingDelta/ToolCallDelta → emit MessageUpdate; Done → emit MessageEnd; Error → emit MessageStart + MessageEnd); return the final Message. (Source: [PS])
  - Depends on: REQ-007, REQ-008, REQ-015, REQ-020, REQ-032
  - Definition of Done: With MockProvider, the caller receives MessageStart, one or more MessageUpdate events with text deltas, and a MessageEnd containing the complete assembled message.
- REQ-040: Implement AnthropicProvider::stream: POST to https://api.anthropic.com/v1/messages with the x-api-key and anthropic-version: 2023-06-01 headers and stream: true in the body; parse the SSE events (message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop); buffer InputJsonDelta tool-argument fragments; parse the complete JSON on content_block_stop; emit StreamEvents. (Source: [AR])
  - Depends on: REQ-020, REQ-039
  - Definition of Done: An integration test against a real or stubbed Anthropic endpoint produces a correctly parsed Message::Assistant with usage stats.
- REQ-041: Implement OpenAiCompatProvider::stream: POST to the configured base URL + /chat/completions with an Authorization: Bearer header, stream: true, and stream_options: {include_usage: true}; parse SSE chunks from choices[0].delta; accumulate tool-call argument strings; emit StreamEvents. (Source: [AR])
  - Depends on: REQ-020, REQ-039
  - Definition of Done: Correctly parses a streamed chat-completion response from any OpenAI-compatible endpoint.
- REQ-042: Implement ProviderRegistry with new() (empty) and default() (pre-registers AnthropicProvider and OpenAiCompatProvider). ProviderRegistry itself implements StreamProvider, dispatching based on ApiProtocol or model prefix. (Source: [AR])
  - Depends on: REQ-040, REQ-041
  - Definition of Done: ProviderRegistry::default() can route a config to AnthropicProvider or OpenAiCompatProvider without manual dispatch.
- REQ-043: Implement StopReason determination in each provider: map provider-specific stop signals to the unified StopReason enum ("end_turn"/"stop" → Stop; "max_tokens"/"length" → Length; "tool_use"/"tool_calls" → ToolUse; cancellation → Aborted; errors → Error). (Source: [PS])
  - Depends on: REQ-005, REQ-040, REQ-041
  - Definition of Done: Each stop signal string maps to exactly one StopReason variant.
- REQ-044: Filter Extension messages out of the AgentMessage history before building StreamConfig.messages. Only Llm(LlmMessage) variants are sent to the LLM (note: LlmMessage wraps Message + Option<TurnId>). (Source: [AR])
  - Depends on: REQ-003, REQ-015
  - Definition of Done: An AgentMessage::Extension present in context.messages does not appear in the StreamConfig sent to the provider.
Milestone 2.5 — Tool Execution (Happy Path)
- REQ-045: Implement execute_tool_calls, dispatching to the configured ToolExecutionStrategy. For Parallel (the default), use execute_batch. (Source: [PS])
  - Depends on: REQ-046
  - Definition of Done: Multiple tool calls from one LLM response are dispatched concurrently; results arrive in the original call order.
- REQ-046: Implement execute_single_tool: find the tool by name, emit ToolExecutionStart, build a ToolContext with a child cancel token and callbacks, call tool.execute(args, ctx), emit ToolExecutionEnd, construct a Message::ToolResult, emit MessageStart/MessageEnd, and return (ToolResult, is_error). (Source: [PS])
  - Depends on: REQ-007, REQ-009, REQ-010, REQ-021, REQ-033
  - Definition of Done: A registered tool is called; its result is wrapped in a ToolResult message; ToolExecutionStart and ToolExecutionEnd events are emitted.
- REQ-047: Implement BashTool::execute (basic): extract the command param, run bash -c {command}, capture stdout and stderr, construct the text output ("Exit code: N\n{stdout}" or "Exit code: N\nSTDOUT:\n{stdout}\nSTDERR:\n{stderr}"), and return Ok(ToolResult). (Source: [PS])
  - Depends on: REQ-010, REQ-021
  - Definition of Done: echo "hello" returns Ok(ToolResult) with text containing "Exit code: 0" and "hello".
- REQ-048: Implement ReadFileTool::execute (basic text path): extract the path param, read the file to a string, split it into lines, apply the optional offset/limit, produce line-numbered output with a header, and return Ok(ToolResult). (Source: [PS])
  - Depends on: REQ-010, REQ-021
  - Definition of Done: Reading a known text file returns numbered lines; partial reads with offset/limit return the correct slice with a range header.
- REQ-049: Implement WriteFileTool::execute: extract the path and content params, create parent directories as needed, write the file, and return Ok(ToolResult). (Source: [AR])
  - Depends on: REQ-010, REQ-021
  - Definition of Done: Writing to a path with non-existent parent directories succeeds; the file is created on disk with the correct content.
- REQ-050: Implement EditFileTool::execute (basic): extract path, old_text, new_text; read the file; replace the first occurrence of old_text with new_text; write it back; return confirmation text. (Source: [PS])
  - Depends on: REQ-010, REQ-021
  - Definition of Done: A known substitution in an existing file is applied correctly; the confirmation message reports old/new line counts.
- REQ-051: Implement ListFilesTool::execute (basic): extract path, pattern, max_depth; build and run a find command with exclusions for target/, .git/, node_modules/; return the file paths as text. (Source: [PS])
  - Depends on: REQ-010, REQ-021
  - Definition of Done: Listing a known directory returns its files; excluded directories do not appear in the results.
- REQ-052: Implement SearchTool::execute (basic): extract pattern, path, include, case_sensitive; prefer rg and fall back to grep; return the matching lines. (Source: [PS])
  - Depends on: REQ-010, REQ-021
  - Definition of Done: Searching for a known string in a known directory returns matching file paths and line content.
- REQ-053: Implement default_tools(), returning a Vec<Box<dyn AgentTool>> containing all six built-in tools: Bash, ReadFile, WriteFile, EditFile, ListFiles, Search. (Source: [AR])
  - Depends on: REQ-047 through REQ-052
  - Definition of Done: default_tools() returns exactly 6 tools with distinct names.
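For REQ-047, the output formatting can be separated from process handling so that it can be tested without a shell. The spec lists both output forms but does not say which applies when. The sketch below assumes the short form is used when stderr is empty and the labelled form otherwise. format_bash_output and run_bash are hypothetical names. run_bash uses std::process::Command and maps a signal-terminated process (no exit code) to -1, which is another assumption.

```rust
use std::process::Command;

/// Builds BashTool's text output (REQ-047).
/// Assumption: short form when stderr is empty, labelled form otherwise.
pub fn format_bash_output(exit_code: i32, stdout: &str, stderr: &str) -> String {
    if stderr.is_empty() {
        format!("Exit code: {}\n{}", exit_code, stdout)
    } else {
        format!("Exit code: {}\nSTDOUT:\n{}\nSTDERR:\n{}", exit_code, stdout, stderr)
    }
}

/// Runs `bash -c <command>` and formats the captured streams.
pub fn run_bash(command: &str) -> std::io::Result<String> {
    let out = Command::new("bash").arg("-c").arg(command).output()?;
    // code() is None if the process was killed by a signal; -1 is an assumption.
    let code = out.status.code().unwrap_or(-1);
    let stdout = String::from_utf8_lossy(&out.stdout);
    let stderr = String::from_utf8_lossy(&out.stderr);
    Ok(format_bash_output(code, &stdout, &stderr))
}
```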
Milestone 2.6 — Context Compaction (Happy Path)
- REQ-054: Implement estimate_tokens(text) -> usize using the heuristic ceil(byte_length / 4). (Source: [PS])
  - Depends on: —
  - Definition of Done: estimate_tokens("hello") returns 2 (5 bytes / 4, rounded up).
- REQ-055: Implement content_tokens(content: Vec<Content>) -> usize and message_tokens(msg: AgentMessage) -> usize per the specified formulas (image tokens: clamp(raw_bytes / 750, 85, 16000); per-message overhead: +4 for user/assistant, +8 for tool result). (Source: [PS])
  - Depends on: REQ-001, REQ-003, REQ-054
  - Definition of Done: Token counts match the specified formulas for each content type.
- REQ-056: Implement compact_messages(messages, config) -> Vec<AgentMessage>: if the history is under budget, return it unchanged; otherwise cascade through compaction Level 1 (REQ-057) → Level 2 (REQ-058) → Level 3 (REQ-059) until the budget is satisfied. (Source: [PS])
  - Depends on: REQ-055, REQ-057, REQ-058, REQ-059
  - Definition of Done: compact_messages called on a history exceeding the budget returns a smaller history with total_tokens <= budget.
- REQ-057: Implement level1_truncate_tool_outputs: for each ToolResult message, truncate each Text content block to at most max_lines (ContextConfig.tool_output_max_lines) using head+tail preservation with an omission marker. (Source: [PS])
  - Depends on: REQ-003, REQ-054
  - Definition of Done: A 200-line tool output truncated to max_lines = 50 produces a 50-line result with a "[... N lines truncated ...]" marker.
- REQ-058: Implement level2_summarize_old_turns: keep the last keep_recent messages in full; replace older assistant + tool-result groups with a single one-line summary user message. (Source: [PS])
  - Depends on: REQ-003, REQ-054
  - Definition of Done: Old assistant messages and their tool results are replaced by "[Summary] ..." user messages; recent messages are untouched.
- REQ-059: Implement level3_drop_middle: keep the first keep_first messages and the last keep_recent messages; replace the dropped middle with a marker message. Implement a keep_within_budget fallback that greedily keeps the most recent messages that fit the budget. (Source: [PS])
  - Depends on: REQ-003, REQ-054
  - Definition of Done: The result contains the first N and last M messages with a marker; the total token count fits the budget.
- REQ-060: Integrate the compact_messages call into run_loop before each LLM call when context_config is Some. (Source: [PS])
  - Depends on: REQ-038, REQ-056
  - Definition of Done: When configured, each LLM call is preceded by a compaction pass; when context_config is None, no compaction occurs.
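The arithmetic behind REQ-054, the image term of REQ-055, and the head+tail truncation of REQ-057 can be sketched in Rust as shown below. In the truncation helper, two choices are assumptions: the marker line counts toward max_lines, and the head gets the extra line when the split is uneven. Treating max_lines == 0 as "no limit" is also an assumption. With 200 lines and max_lines = 50, the helper keeps lines 1 to 25, a marker reporting 151 omitted lines, and lines 177 to 200.

```rust
/// ceil(byte_length / 4) (REQ-054); str::len() counts bytes, not chars.
pub fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// clamp(raw_bytes / 750, 85, 16000) for image content (REQ-055).
pub fn image_tokens(raw_bytes: usize) -> usize {
    (raw_bytes / 750).clamp(85, 16_000)
}

/// Head+tail truncation for compaction Level 1 (REQ-057).
pub fn truncate_head_tail(text: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    if max_lines == 0 || lines.len() <= max_lines {
        return text.to_string();
    }
    let keep = max_lines - 1; // one line is reserved for the marker
    let head = (keep + 1) / 2; // the head gets the extra line when `keep` is odd
    let tail = keep - head;
    let marker = format!("[... {} lines truncated ...]", lines.len() - keep);
    let mut out: Vec<&str> = lines[..head].to_vec();
    out.push(&marker);
    out.extend_from_slice(&lines[lines.len() - tail..]);
    out.join("\n")
}
```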
Milestone 2.7 — Execution Limits
- REQ-061: Implement ExecutionTracker::record_turn(tokens: usize) (increments turns and adds to tokens_used) and check_limits() -> Option<String> (returns a reason string if any limit is exceeded: turns, total tokens, or wall-clock duration). (Source: [AR])
  - Depends on: REQ-012
  - Definition of Done: check_limits() returns None when under all limits and Some("max turns exceeded") when over.
- REQ-062: Integrate execution limit checking into run_loop: call tracker.check_limits() at the start of each inner loop iteration; if a limit is exceeded, append a synthetic User message "[Agent stopped: {reason}]", emit MessageStart/MessageEnd, and return. (Source: [PS])
  - Depends on: REQ-038, REQ-061
  - Definition of Done: An agent with max_turns = 2 stops after exactly 2 LLM calls; the last message contains the stop reason.
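A sketch of REQ-012 and REQ-061 using std::time::Instant. The sketch compares with >= so that max_turns = 2 trips after the second recorded turn, matching the REQ-062 Definition of Done. That choice is an assumption, and so are the reason strings other than "max turns exceeded".

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone)]
pub struct ExecutionLimits {
    pub max_turns: usize,
    pub max_total_tokens: usize,
    pub max_duration: Duration,
}

impl Default for ExecutionLimits {
    fn default() -> Self {
        ExecutionLimits {
            max_turns: 50,
            max_total_tokens: 1_000_000,
            max_duration: Duration::from_secs(600),
        }
    }
}

pub struct ExecutionTracker {
    pub limits: ExecutionLimits,
    pub turns: usize,
    pub tokens_used: usize,
    pub started_at: Instant,
}

impl ExecutionTracker {
    pub fn new(limits: ExecutionLimits) -> Self {
        ExecutionTracker { limits, turns: 0, tokens_used: 0, started_at: Instant::now() }
    }

    pub fn record_turn(&mut self, tokens: usize) {
        self.turns += 1;
        self.tokens_used += tokens;
    }

    /// Returns the first limit reached, checked in the order turns, tokens, duration.
    pub fn check_limits(&self) -> Option<String> {
        if self.turns >= self.limits.max_turns {
            return Some("max turns exceeded".to_string());
        }
        if self.tokens_used >= self.limits.max_total_tokens {
            return Some("max total tokens exceeded".to_string()); // hypothetical wording
        }
        if self.started_at.elapsed() >= self.limits.max_duration {
            return Some("max duration exceeded".to_string()); // hypothetical wording
        }
        None
    }
}
```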
Milestone 2.8 — Message Persistence and Agent Control
- REQ-063: Implement Agent::save_messages() -> String: serialize agent.messages to a JSON string. (Source: [OV])
  - Depends on: REQ-018
  - Definition of Done: save_messages() returns a valid JSON array; the string can be parsed back without error.
- REQ-064: Implement Agent::restore_messages(json: &str): deserialize the JSON string into Vec<AgentMessage> and replace agent.messages. (Source: [OV])
  - Depends on: REQ-018, REQ-063
  - Definition of Done: After save_messages() → restore_messages(), the agent's message history is identical to the original.
- REQ-065: Implement Agent::reset(): clear messages, drain both queues, cancel any active run, reset is_streaming to false, and drop the cancel token. (Source: [AR])
  - Depends on: REQ-033
  - Definition of Done: After reset(), messages is empty, both queues are empty, and is_streaming is false.
- REQ-066: Implement Agent::steer(msg: AgentMessage) (push to steering_queue) and Agent::follow_up(msg: AgentMessage) (push to follow_up_queue). (Source: [AR])
  - Depends on: REQ-027
  - Definition of Done: After steer(msg), the steering queue contains exactly that message and is safe to read from another thread.
- REQ-067: Implement Agent::abort(): if a cancel token exists, call cancel() on it. (Source: [AR])
  - Depends on: REQ-033, REQ-035
  - Definition of Done: Calling abort() during an active run causes cancel.is_cancelled() to return true inside the running agent loop.
Level 3 — Smart
Goal: The system handles reality. Invalid inputs, missing data, external failures, and edge cases are all handled gracefully. Every [invariant] and ERROR branch from the pseudocode is implemented.
Completion Criteria: No unhandled exception can be triggered by a known class of bad input. All error paths from ../architecture/algorithms.md are covered: provider failures, tool errors, context overflow, execution limits, filter rejections, and cancellation.
Milestone 3.1 — Input Filter Chain
- REQ-068: Implement the input filter chain at the start of agent_loop: join all Text content from the User messages in the prompts and run each registered InputFilter in order. (Source: [PS])
  - Depends on: REQ-022, REQ-036
  - Definition of Done: A filter registered via with_input_filter is called with the user's text before any LLM call.
- REQ-069: On the first Reject result, emit InputRejected { reason }, then AgentEnd { messages: [] }, and return an empty message list immediately. (Source: [PS])
  - Depends on: REQ-068
  - Definition of Done: A rejecting filter stops the run before the first LLM call; the caller's event stream contains InputRejected followed by AgentEnd.
- REQ-070: Accumulate Warn results; after all filters pass, append all warning text as Content::Text to the last User message before it is appended to the context. (Source: [PS])
  - Depends on: REQ-068
  - Definition of Done: A warning filter adds "[Warning: ...]" text to the user message; the run continues normally.
Milestone 3.2 — Retry Engine
- REQ-071: Implement delay_for_attempt(config, attempt) -> Duration: exponential backoff initial_delay_ms * (backoff_multiplier ^ (attempt - 1)), capped at max_delay_ms, then multiplied by a uniform random jitter in [0.8, 1.2] (so the final delay can reach 1.2 × max_delay_ms). (Source: [PS])
  - Depends on: REQ-013
  - Definition of Done: With defaults, attempt 1 produces a duration in [800ms, 1200ms]; attempt 3 produces a duration in [3200ms, 4800ms].
- REQ-072: Implement is_retryable() on ProviderError: returns true only for the RateLimited and Network variants. (Source: [AR])
  - Depends on: REQ-020
  - Definition of Done: Auth, Api, ContextOverflow, Cancelled, and Other all return false; RateLimited and Network return true.
- REQ-073: Implement retry_after() on ProviderError: extracts retry_after_ms from RateLimited { retry_after_ms: Some(n) } if present; returns None otherwise. (Source: [AR])
  - Depends on: REQ-020
  - Definition of Done: ProviderError::RateLimited { retry_after_ms: Some(5000) }.retry_after() returns Some(Duration::from_millis(5000)).
- REQ-074: Integrate the retry loop into stream_assistant_response: on a retryable error, sleep for retry_after() if it is present, otherwise for delay_for_attempt(attempt), and retry up to max_retries times; stop retrying if cancel.is_cancelled(). (Source: [PS])
  - Depends on: REQ-039, REQ-071, REQ-072, REQ-073
  - Definition of Done: A RateLimited error causes the loop to wait and retry; after retries are exhausted, the error is propagated as an Error stop reason.
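A std-only Rust sketch of REQ-071 to REQ-073. To avoid depending on a random-number crate, the caller passes in the jitter factor, drawn uniformly from [0.8, 1.2]. That extra parameter is an assumption and not part of the documented signature.

```rust
use std::time::Duration;

#[derive(Debug, Clone)]
pub struct RetryConfig {
    pub max_retries: u32,
    pub initial_delay_ms: u64,
    pub backoff_multiplier: f64,
    pub max_delay_ms: u64,
}

impl Default for RetryConfig {
    fn default() -> Self {
        RetryConfig { max_retries: 3, initial_delay_ms: 1_000, backoff_multiplier: 2.0, max_delay_ms: 30_000 }
    }
}

/// initial * multiplier^(attempt - 1), capped at max_delay_ms, then scaled by jitter.
pub fn delay_for_attempt(config: &RetryConfig, attempt: u32, jitter: f64) -> Duration {
    let exp = attempt.saturating_sub(1) as i32;
    let base = config.initial_delay_ms as f64 * config.backoff_multiplier.powi(exp);
    let capped = base.min(config.max_delay_ms as f64);
    Duration::from_millis((capped * jitter).round() as u64)
}

#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    Api(String),
    Network(String),
    Auth(String),
    RateLimited { retry_after_ms: Option<u64> },
    ContextOverflow { message: String },
    Cancelled,
    Other(String),
}

impl ProviderError {
    /// Only rate limits and network failures are worth retrying (REQ-072).
    pub fn is_retryable(&self) -> bool {
        matches!(self, ProviderError::RateLimited { .. } | ProviderError::Network(_))
    }

    /// Server-suggested wait, if any (REQ-073).
    pub fn retry_after(&self) -> Option<Duration> {
        match self {
            ProviderError::RateLimited { retry_after_ms: Some(ms) } => Some(Duration::from_millis(*ms)),
            _ => None,
        }
    }
}
```

In the retry loop (REQ-074), retry_after() takes precedence when it is present, and delay_for_attempt() is the fallback.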
Milestone 3.3 — Provider Error Classification
- REQ-075: Implement ProviderError::classify(status: u16, message: String) -> ProviderError: route to ContextOverflow first (when is_context_overflow(status, message) is true: status 400/413 with an empty body, or a matching overflow phrase), then RateLimited (429), then Auth (401/403), and otherwise Api. (Source: [PS])
  - Depends on: REQ-020, REQ-076
  - Definition of Done: HTTP 429 maps to RateLimited; HTTP 401 maps to Auth; "prompt is too long" in the body maps to ContextOverflow.
- REQ-076: Implement is_context_overflow(status, message) -> bool: check for an empty body with status 400/413 (Cerebras/Mistral pattern), and check for any of the 15+ documented overflow phrases (case-insensitive substring match). (Source: [PS])
  - Depends on: —
  - Definition of Done: All documented overflow phrases are recognized; unrelated 400 errors with a non-empty body are not misclassified.
- REQ-077: Implement context overflow recovery: when the streaming error event contains a message matching overflow detection (Message::is_context_overflow()), treat it as an overflow and trigger compact_messages before the next turn (if context_config is set). (Source: [AR])
  - Depends on: REQ-056, REQ-075, REQ-076
  - Definition of Done: A mock that returns an overflow error on turn 1 causes compaction before turn 2.
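A sketch of the REQ-075/REQ-076 classification order. This roadmap names only "prompt is too long", so OVERFLOW_PHRASES below is a placeholder for the full documented list. Phrases must be stored in lowercase for the case-insensitive match to work. The ProviderError enum is reduced to the variants that classify() can produce.

```rust
/// Reduced enum: only the variants classify() returns.
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    Api(String),
    Auth(String),
    RateLimited { retry_after_ms: Option<u64> },
    ContextOverflow { message: String },
}

/// Placeholder subset of the documented overflow phrases (stored lowercase).
const OVERFLOW_PHRASES: &[&str] = &["prompt is too long"];

pub fn is_context_overflow(status: u16, message: &str) -> bool {
    // Empty body with 400/413 (Cerebras/Mistral pattern).
    if (status == 400 || status == 413) && message.is_empty() {
        return true;
    }
    let lower = message.to_lowercase();
    OVERFLOW_PHRASES.iter().any(|p| lower.contains(*p))
}

impl ProviderError {
    /// Overflow first, then 429, then 401/403, otherwise Api.
    pub fn classify(status: u16, message: String) -> ProviderError {
        if is_context_overflow(status, &message) {
            ProviderError::ContextOverflow { message }
        } else if status == 429 {
            ProviderError::RateLimited { retry_after_ms: None }
        } else if status == 401 || status == 403 {
            ProviderError::Auth(message)
        } else {
            ProviderError::Api(message)
        }
    }
}
```

Checking overflow first matters because some providers report overflow with a generic status such as 400 or 413. Testing status codes first would misroute those responses to Api.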
Milestone 3.4 — Tool Error Handling
- REQ-078: On ToolError::Failed(msg) or ToolError::InvalidArgs(msg): convert to a ToolResult with content: [Text(msg)] and is_error: true; always return this to the LLM so it can self-correct. (Source: [AR])
  - Depends on: REQ-010, REQ-046
  - Definition of Done: A tool that returns Err(Failed("oops")) produces a ToolResult message with is_error: true and the text "oops".
- REQ-079: On ToolError::NotFound(name): produce ToolResult { content: [Text("Tool {name} not found")], is_error: true }. (Source: [PS])
  - Depends on: REQ-046
  - Definition of Done: Requesting a non-existent tool name in a tool call produces a NotFound error result.
- REQ-080: On ToolError::Cancelled: produce ToolResult { content: [Text("Skipped due to queued user message.")], is_error: true }. (Source: [AR])
  - Depends on: REQ-010, REQ-046
  - Definition of Done: A tool skipped due to steering produces the documented skipped message.
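The three conversions in REQ-078 to REQ-080 fit in a single match. ToolResultMessage is a hypothetical reduced stand-in for Message::ToolResult that carries a single text block.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ToolError {
    Failed(String),
    NotFound(String),
    InvalidArgs(String),
    Cancelled,
}

/// Hypothetical reduced Message::ToolResult: one text block plus the error flag.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolResultMessage {
    pub tool_call_id: String,
    pub text: String,
    pub is_error: bool,
}

pub fn tool_error_to_result(tool_call_id: &str, err: &ToolError) -> ToolResultMessage {
    let text = match err {
        // Pass the message through unchanged so the LLM can self-correct.
        ToolError::Failed(msg) | ToolError::InvalidArgs(msg) => msg.clone(),
        ToolError::NotFound(name) => format!("Tool {} not found", name),
        ToolError::Cancelled => "Skipped due to queued user message.".to_string(),
    };
    // Every tool error is reported to the LLM with is_error set.
    ToolResultMessage { tool_call_id: tool_call_id.to_string(), text, is_error: true }
}
```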
Milestone 3.5 — Error and Abort Stop Reason Handling
- REQ-081: In run_loop, when the assistant message has stop_reason == Error: call on_error(error_message) if defined, call after_turn if defined, emit TurnEnd, and return immediately. (Source: [PS])
  - Depends on: REQ-038, REQ-082
  - Definition of Done: A mock provider that returns an error stop reason causes the loop to exit; on_error is called with the message text.
- REQ-082: In run_loop, when stop_reason == Aborted: call after_turn if defined, emit TurnEnd, and return immediately. (Source: [PS])
  - Depends on: REQ-038
  - Definition of Done: Calling agent.abort() mid-run causes the loop to exit cleanly; TurnEnd is emitted.
- REQ-083: Construct a synthetic error Message::Assistant on irrecoverable provider failure (after retries are exhausted): empty content, stop_reason: Error, error_message: Some(e.to_string()). (Source: [PS])
  - Depends on: REQ-002, REQ-039
  - Definition of Done: A provider that always fails produces an Assistant message with stop_reason: Error containing the provider's error text.
Milestone 3.6 — Sequential and Batched Tool Execution
- REQ-084: Implement execute_sequential: execute tool calls one at a time; after each, check the steering queue; if steering is non-empty, skip the remaining tools with ToolError::Cancelled results and return the steering messages. (Source: [PS])
  - Depends on: REQ-046, REQ-080
  - Definition of Done: With steering arriving after tool 1 of 3, tools 2 and 3 receive skipped error results; the steering message is returned for injection.
- REQ-085: Implement execute_batch (Parallel): launch all tools concurrently via join_all; after all complete, check steering once; return the steering messages if present. (Source: [PS])
  - Depends on: REQ-046
  - Definition of Done: Three parallel tools all complete; steering that arrives before they complete is returned after all finish.
- REQ-086: Implement Batched { size } dispatch: split tool calls into groups of size; run each group via execute_batch; check steering between groups; on steering, skip the remaining groups with cancelled results. (Source: [PS])
  - Depends on: REQ-085
  - Definition of Done: With 5 tool calls, Batched { size: 2 } executes groups [1,2], [3,4], [5]; steering after group 1 skips groups 2 and 3.
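A synchronous sketch of the Batched { size } control flow from REQ-086. run_group and steering_pending are hypothetical hooks standing in for execute_batch and the steering-queue check. Concurrency within a group is out of scope here. Treating size = 0 as 1 is an assumption.

```rust
/// Runs tool calls in groups; Ok(output) for executed calls, Err(text) for skipped ones.
pub fn execute_batched<F, S>(
    calls: &[String],
    size: usize,
    mut run_group: F,
    mut steering_pending: S,
) -> Vec<Result<String, String>>
where
    F: FnMut(&[String]) -> Vec<String>,
    S: FnMut() -> bool,
{
    let mut results = Vec::with_capacity(calls.len());
    let mut groups = calls.chunks(size.max(1));
    while let Some(group) = groups.next() {
        results.extend(run_group(group).into_iter().map(Ok));
        // Check steering between groups (REQ-086).
        if steering_pending() {
            // Every call in the remaining groups gets a cancelled result (REQ-080 text).
            for rest in groups.by_ref() {
                results.extend(rest.iter().map(|_| Err("Skipped due to queued user message.".to_string())));
            }
            break;
        }
    }
    results
}
```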
Milestone 3.7 — Steering and Follow-up Queue Integration
- REQ-087: In run_loop, drain the steering queue at the start of the outer loop, before the first inner-loop iteration. (Source: [PS])
  - Depends on: REQ-038
  - Definition of Done: Messages enqueued via steer() before prompt() is called are injected as the first pending messages.
- REQ-088: After tool execution, if steering messages were captured, set them as `pending` and continue the inner loop, injecting them before the next LLM call. (Source: [PS])
  - Depends on: REQ-038, REQ-084, REQ-085
  - Definition of Done: A steering message injected during tool execution appears in the context before the subsequent LLM call.
- REQ-089: After the inner loop exits (no tool calls, no pending steering), check the follow-up queue. If it is non-empty, add the follow-up messages to `pending` and continue the outer loop. (Source: [PS])
  - Depends on: REQ-038
  - Definition of Done: A follow-up message enqueued via `follow_up()` causes the agent to re-enter the loop rather than stopping.
- REQ-090: Implement `QueueMode::OneAtATime` (pop at most one message per read) and `QueueMode::All` (drain the entire queue per read). Both modes are thread-safe (mutex-protected). (Source: [AR])
  - Depends on: REQ-017, REQ-027
  - Definition of Done: `OneAtATime` leaves the remaining messages in the queue; `All` empties it; both are safe to call from the agent loop while another thread pushes.
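Here is a minimal sketch of the two `QueueMode` reads from REQ-090. It assumes messages are plain `String`s and that the queue is shared through `Arc<Mutex<VecDeque<_>>>`; `MessageQueue` is a hypothetical name. Holding the lock for the whole read is what makes `All` atomic with respect to a concurrent `steer()` or `follow_up()` push.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// How a queue read behaves (REQ-090).
#[derive(Clone, Copy, Debug)]
enum QueueMode {
    OneAtATime,
    All,
}

/// Hypothetical mutex-protected queue; clones share the same underlying queue.
#[derive(Clone, Default)]
struct MessageQueue {
    inner: Arc<Mutex<VecDeque<String>>>,
}

impl MessageQueue {
    fn push(&self, msg: impl Into<String>) {
        self.inner.lock().unwrap().push_back(msg.into());
    }

    /// OneAtATime pops at most one message; All drains the whole queue.
    fn read(&self, mode: QueueMode) -> Vec<String> {
        let mut queue = self.inner.lock().unwrap();
        match mode {
            QueueMode::OneAtATime => queue.pop_front().into_iter().collect(),
            QueueMode::All => queue.drain(..).collect(),
        }
    }

    fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}
```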
Milestone 3.8 — Lifecycle Callbacks
- REQ-091: Call `before_turn(messages, turn_number) -> bool` at the start of each turn, before the LLM call. If it returns `false`, return from `run_loop` immediately without emitting `AgentEnd`. (Source: [PS])
  - Depends on: REQ-038
  - Definition of Done: A `before_turn` that returns `false` on turn 2 stops the loop after turn 1; `AgentEnd` is not emitted.
- REQ-092: Call `after_turn(messages, usage)` after each LLM call and its tool executions, including on error and abort paths. (Source: [PS])
  - Depends on: REQ-038
  - Definition of Done: `after_turn` is called exactly once per turn, including when the turn ends in an error.
- REQ-093: Call `on_error(message: &str)` when `stop_reason == Error`. (Source: [PS])
  - Depends on: REQ-081
  - Definition of Done: An error-returning provider invokes the `on_error` callback with the error message string.
Milestone 3.9 — Tool Safety and Edge Cases
- REQ-094: `BashTool`: before execution, check each `deny_pattern` against the command (substring match). On a match, return `Err(Failed("Command blocked..."))`. (Source: [PS])
  - Depends on: REQ-047
  - Definition of Done: A command containing a deny pattern is rejected before any subprocess is spawned.
- REQ-095: `BashTool`: race subprocess completion against a configurable timeout and the cancellation token. On timeout, return `Err(Failed("Command timed out after {N}s"))`; on cancellation, return `Err(Cancelled)`. (Source: [PS])
  - Depends on: REQ-047
  - Definition of Done: `sleep 300` with a 2s timeout produces a timeout error; cancellation produces `Cancelled`.
- REQ-096: `BashTool`: truncate `stdout` and `stderr` independently at `max_output_bytes` (default 256KB) and append `"\n... (output truncated)"`. (Source: [PS])
  - Depends on: REQ-047
  - Definition of Done: Output exceeding 256KB is truncated with the documented suffix.
- REQ-097: `BashTool`: support an optional `confirm_fn` callback. If it is defined and returns `false`, return `Err(Failed("Command was not confirmed by the user."))`. (Source: [PS])
  - Depends on: REQ-047
  - Definition of Done: A rejecting `confirm_fn` prevents subprocess execution.
- REQ-098: `ReadFileTool`: check the file size before reading. Text files exceeding `max_bytes` (1MB) return `Err(Failed("File too large. Use offset/limit..."))`; image files exceeding 20MB return `Err(Failed("Image too large"))`. (Source: [PS])
  - Depends on: REQ-048
  - Definition of Done: Reading a file above the size limit returns the documented error without reading the file contents.
- REQ-099: `ReadFileTool`: for image extensions, read the file as bytes, base64-encode it, detect the MIME type from the extension, and return `Content::Image`. (Source: [PS])
  - Depends on: REQ-001, REQ-048
  - Definition of Done: Reading a `.png` file returns a `ToolResult` with `Content::Image { data: base64, mime_type: "image/png" }`.
- REQ-100: `ReadFileTool`: check `ctx.cancel.is_cancelled()` before each I/O operation, and return `Err(Cancelled)` if it is set. (Source: [PS])
  - Depends on: REQ-048
  - Definition of Done: Cancelling before a read returns `Cancelled` without touching the file.
- REQ-101: `EditFileTool`: if `old_text` matches zero occurrences, attempt `find_similar_text` for a fuzzy hint and return `Err(Failed("old_text not found... Did you mean: ..."))`. (Source: [PS])
  - Depends on: REQ-050
  - Definition of Done: An edit with the wrong `old_text` returns a `Failed` error; if a similar line exists, the hint is included.
- REQ-102: `EditFileTool`: if `old_text` matches more than one occurrence, return `Err(Failed("old_text matches N locations. Include more context..."))`. (Source: [PS])
  - Depends on: REQ-050
  - Definition of Done: Attempting to replace ambiguous text returns a descriptive error that includes the match count.
- REQ-103: `EditFileTool`: check `ctx.cancel.is_cancelled()` before each I/O operation. (Source: [PS])
  - Depends on: REQ-050
  - Definition of Done: Cancellation before the read or the write returns `Err(Cancelled)`.
- REQ-104: `WriteFileTool`: check `ctx.cancel.is_cancelled()` before writing. (Source: [AR])
  - Depends on: REQ-049
  - Definition of Done: Cancellation prevents the write from occurring.
- REQ-105: `ListFilesTool`: race `find` execution against a timeout (default 10s) and the cancellation token. Truncate results at `max_results` (default 200) and append a truncation suffix. (Source: [PS])
  - Depends on: REQ-051
  - Definition of Done: Listing a directory with 500 files returns 200 entries followed by the truncation message.
- REQ-106: `SearchTool`: fall back from `rg` to `grep` if ripgrep is not available on the system. Check `ctx.cancel.is_cancelled()` before execution. (Source: [PS])
  - Depends on: REQ-052
  - Definition of Done: Search succeeds on a system without `rg` installed; cancellation is respected.
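Three of the tool guards in this milestone are pure string logic and can be sketched without I/O. The std-only Rust helpers below illustrate the deny-pattern check (REQ-094), output truncation (REQ-096), and the zero/one/many match rule of `EditFileTool` (REQ-101, REQ-102). Several details are assumptions: the function names, the error wording after the documented prefixes, and the case-insensitive first-line hint, which is a hypothetical stand-in for `find_similar_text`.

```rust
/// Suffix appended when output is cut (REQ-096).
const TRUNCATION_SUFFIX: &str = "\n... (output truncated)";

/// REQ-094: plain substring match against each deny pattern, run before any spawn.
/// Wording after "Command blocked" is illustrative.
fn check_deny_patterns(command: &str, deny_patterns: &[&str]) -> Result<(), String> {
    match deny_patterns.iter().find(|p| command.contains(**p)) {
        Some(p) => Err(format!("Command blocked: matches deny pattern '{p}'")),
        None => Ok(()),
    }
}

/// REQ-096: cut to at most `max_bytes` bytes, backing off to a UTF-8 character
/// boundary, and append the suffix only if something was removed.
fn truncate_output(output: &str, max_bytes: usize) -> String {
    if output.len() <= max_bytes {
        return output.to_string();
    }
    let mut cut = max_bytes;
    while !output.is_char_boundary(cut) {
        cut -= 1; // terminates: index 0 is always a boundary
    }
    format!("{}{}", &output[..cut], TRUNCATION_SUFFIX)
}

/// REQ-101/102: an edit must match exactly once.
fn apply_edit(content: &str, old_text: &str, new_text: &str) -> Result<String, String> {
    if old_text.is_empty() {
        return Err("old_text must not be empty".to_string());
    }
    match content.matches(old_text).count() {
        0 => {
            // Hypothetical stand-in for find_similar_text: the first line containing
            // old_text's first line, compared case-insensitively.
            let needle = old_text.lines().next().unwrap_or("").trim().to_ascii_lowercase();
            let hint = content
                .lines()
                .find(|line| !needle.is_empty() && line.to_ascii_lowercase().contains(&needle));
            Err(match hint {
                Some(line) => format!("old_text not found. Did you mean: {}", line.trim()),
                None => "old_text not found.".to_string(),
            })
        }
        1 => Ok(content.replacen(old_text, new_text, 1)),
        n => Err(format!("old_text matches {n} locations. Include more context to make it unique.")),
    }
}
```

Truncation backs off to a character boundary so that a cut at exactly `max_bytes` never splits a multi-byte character. The spec does not say whether the suffix counts toward `max_output_bytes`; this sketch appends it on top. Note that `str::matches` counts non-overlapping occurrences only, and that plain substring matching also rejects benign commands that merely mention a pattern. For a deny list, that is the conservative failure mode.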
Milestone 3.10 — Agent Invariants
- REQ-107: In `prompt_messages_with_sender`, assert `!self.is_streaming` with a clear panic message before proceeding. (Source: [PS])
  - Depends on: REQ-035
  - Definition of Done: Calling `prompt()` while a run is active panics with a message directing the caller to use `steer()` or `follow_up()`.
- REQ-108: In `agent_loop_continue`, validate preconditions: `context.messages` is non-empty and the last message is not an `Assistant` variant. (Source: [PS])
  - Depends on: REQ-037
  - Definition of Done: Calling `agent_loop_continue` with an empty context or a trailing assistant message returns an error or panics with a clear message.
Milestone 3.11 — Skill System
- REQ-109: Implement `SkillSet::load(dirs: Vec<Path>)`:
  - iterate over the directories, silently skipping missing ones;
  - scan each for subdirectories containing `SKILL.md` and parse their frontmatter;
  - build a name-keyed map, where later directories override earlier ones on collision;
  - return a sorted `SkillSet`. (Source: [PS])
  - Depends on: REQ-110
  - Definition of Done: Loading two directories that both contain a skill named `"foo"` results in the second directory's version being used.
- REQ-110: Implement `parse_frontmatter(content) -> (name, description)`: require the content to begin with `---`, extract the YAML block up to the next `\n---`, parse the `name:` and `description:` lines, and strip surrounding quotes. Return `Err(InvalidFrontmatter)` or `Err(MissingField)` on failure. (Source: [PS])
  - Depends on: —
  - Definition of Done: Valid frontmatter parses correctly; a missing `name` field returns a `MissingField` error; missing delimiters return `InvalidFrontmatter`.
- REQ-111: Implement `SkillSet::format_for_prompt()`: emit an `<available_skills>` XML block with one `<skill>` element per skill (sorted by name, ascending), XML-escaping all string values. Return an empty string if no skills are loaded. (Source: [PS])
  - Depends on: REQ-109
  - Definition of Done: The output is well-formed XML; special characters in skill names and descriptions are correctly escaped.
- REQ-112: Implement `SkillSet::load_dir(dir, source)` and `SkillSet::merge(other)`. (Source: [AR])
  - Depends on: REQ-109
  - Definition of Done: `merge` causes the other set's skills to take precedence on name conflicts.
- REQ-113: Implement `Agent::with_skills(skill_set)`: call `format_for_prompt()` and append the XML block to `self.system_prompt`. (Source: [PS])
  - Depends on: REQ-111
  - Definition of Done: After `with_skills(set)`, the agent's system prompt contains the `<available_skills>` XML block.
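The following std-only sketch covers frontmatter parsing (REQ-110) and prompt formatting (REQ-111). It understands only single-line `key: value` pairs, not full YAML. The `<name>`/`<description>` child elements inside `<skill>` are an assumed layout, and the `MissingField` payload is hypothetical. Keying a `BTreeMap` by skill name delivers two properties at once: the later-directory-wins override of REQ-109 (`insert` replaces an existing entry) and name-sorted output.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum FrontmatterError {
    InvalidFrontmatter,
    MissingField(&'static str),
}

/// REQ-110: reads `name` and `description` from a leading `---` ... `\n---` block.
fn parse_frontmatter(content: &str) -> Result<(String, String), FrontmatterError> {
    let rest = content.strip_prefix("---").ok_or(FrontmatterError::InvalidFrontmatter)?;
    let end = rest.find("\n---").ok_or(FrontmatterError::InvalidFrontmatter)?;
    let block = &rest[..end];
    let field = |key: &str| -> Option<String> {
        block.lines().find_map(|line| {
            let value = line.trim().strip_prefix(key)?.strip_prefix(':')?.trim();
            // Strip one pair of matching surrounding quotes, if present.
            let unquoted = value
                .strip_prefix('"')
                .and_then(|v| v.strip_suffix('"'))
                .or_else(|| value.strip_prefix('\'').and_then(|v| v.strip_suffix('\'')))
                .unwrap_or(value);
            Some(unquoted.to_string())
        })
    };
    let name = field("name").ok_or(FrontmatterError::MissingField("name"))?;
    let description = field("description").ok_or(FrontmatterError::MissingField("description"))?;
    Ok((name, description))
}

/// Escapes the five XML special characters.
fn xml_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// REQ-111: BTreeMap iteration is already sorted by name; empty input yields "".
fn format_for_prompt(skills: &BTreeMap<String, String>) -> String {
    if skills.is_empty() {
        return String::new();
    }
    let mut out = String::from("<available_skills>\n");
    for (name, description) in skills {
        out.push_str(&format!(
            "  <skill>\n    <name>{}</name>\n    <description>{}</description>\n  </skill>\n",
            xml_escape(name),
            xml_escape(description)
        ));
    }
    out.push_str("</available_skills>");
    out
}
```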
Milestone 3.12 — MCP Client
- REQ-114: Implement `McpClient::connect_stdio(cmd, args, env)`: spawn the subprocess with piped stdin/stdout, complete the three-step initialize handshake (the `initialize` request, the server's response, and the `notifications/initialized` notification), and return `Ok(McpClient)`. (Source: [PS])
  - Depends on: REQ-115, REQ-116
  - Definition of Done: Spawning a compliant MCP server subprocess results in a connected client; `server_info` is populated from the handshake.
- REQ-115: Implement `McpClient::send_request(method, params)`: construct a JSON-RPC 2.0 request with an auto-incremented atomic ID, send it over the transport, and receive the response. Return `Err(JsonRpc{...})` if the response has an `error` field, or `Err(Protocol("Empty result"))` if `result` is missing. (Source: [PS])
  - Depends on: —
  - Definition of Done: A JSON-RPC response with an `error` field maps to `McpError::JsonRpc`; a valid `result` field is returned as `Ok(value)`.
- REQ-116: Implement `McpClient::list_tools()` and `McpClient::call_tool(name, args)`. (Source: [PS])
  - Depends on: REQ-115
  - Definition of Done: `list_tools()` returns a parsed `Vec<McpToolInfo>`; `call_tool()` returns a parsed `McpToolCallResult`.
- REQ-117: Implement `McpToolAdapter`, which implements `AgentTool`. It wraps `McpToolInfo` metadata and an `Arc<Mutex<McpClient>>`; its `execute()` calls `client.call_tool()` and converts `McpContent` to `Content` variants. (Source: [AR])
  - Depends on: REQ-001, REQ-021, REQ-116
  - Definition of Done: An `McpToolAdapter` can be registered on an agent and called successfully in a tool-use turn.
- REQ-118: Handle all `McpError` variants gracefully: `Transport`, `Protocol`, `JsonRpc`, `Serialization`, `Io`, and `ConnectionClosed` all surface as `ToolError::Failed` with descriptive messages. (Source: [AR])
  - Depends on: REQ-117
  - Definition of Done: Each `McpError` variant produces a non-panicking `ToolError::Failed` with a message identifying the error type and context.
- REQ-119: Implement `Agent::with_mcp_server_stdio(cmd, args, env)`: call `McpClient::connect_stdio`, then `McpToolAdapter::from_client`, and append the resulting tool adapters to `self.tools`. (Source: [AR])
  - Depends on: REQ-114, REQ-117
  - Definition of Done: After `with_mcp_server_stdio`, the agent's tool list includes all tools reported by the MCP server.
Level 4 — Professional
Goal: The system is safe, observable, and maintainable. It can be operated with multiple provider backends, supports prompt caching and extended thinking, exposes useful observability hooks, and shuts down gracefully.
Completion Criteria: All 7 provider protocols are implemented. Prompt caching, thinking levels, structured logging, and security-sensitive fields are all handled. The cancellation tree propagates correctly to all I/O boundaries. The system is configurable for production use.
Milestone 4.1 — Full Provider Suite
- REQ-120: Implement `GoogleProvider::stream` (Gemini API):
  - POST to `{base_url}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={API_KEY}`;
  - use a custom SSE parser that splits on `\n\n` and extracts the `data:` line;
  - send tool definitions as `functionDeclarations` and map `functionCall` parts back to tool calls;
  - auto-generate tool IDs as `"google-fc-{index}"`;
  - send tool results as `functionResponse` parts. (Source: [AR])
  - Depends on: REQ-020
  - Definition of Done: A Gemini streaming response is parsed into the correct `StreamEvent`s; tool IDs are auto-generated in the documented format.
- REQ-121: Implement `GoogleVertexProvider::stream` (Vertex AI): wire format identical to Gemini; endpoint pattern `https://{region}-aiplatform.googleapis.com/...`; auth via `Authorization: Bearer {OAUTH_TOKEN}`; tool IDs as `"vertex-fc-{index}"`. (Source: [AR])
  - Depends on: REQ-120
  - Definition of Done: A Vertex request differs from a Gemini request only in its endpoint and auth header.
- REQ-122: Implement `BedrockProvider::stream` (ConverseStream API):
  - endpoint `{base_url}/model/{model}/converse-stream`;
  - newline-delimited JSON (not standard SSE);
  - parse the `contentBlockDelta`, `contentBlockStart`, `contentBlockStop`, `messageStop`, and `metadata` events;
  - tool spec format `toolSpec { inputSchema: { json: schema } }`;
  - tool result format `{ toolResult: { toolUseId, content, status } }`. (Source: [AR])
  - Depends on: REQ-020
  - Definition of Done: A Bedrock ndjson streaming response is parsed correctly; tool definitions and results use the Bedrock-specific format.
- REQ-123: Implement `OpenAiResponsesProvider::stream` (OpenAI Responses API): endpoint `{base_url}/responses`; system prompt in the `"instructions"` field; SSE events `response.output_text.delta`, `response.reasoning.delta`, `response.function_call_arguments.*`, and `response.completed`. (Source: [AR])
  - Depends on: REQ-020
  - Definition of Done: The Responses API wire format differs from Chat Completions in the system prompt field and the event names.
- REQ-124: Implement `AzureOpenAiProvider::stream`: endpoint `{base_url}/responses?api-version=2025-01-01-preview`; auth via an `api-key: {AZURE_OPENAI_API_KEY}` header (not `Authorization: Bearer`); same request/response format as the OpenAI Responses API. (Source: [AR])
  - Depends on: REQ-123
  - Definition of Done: Azure auth uses the `api-key` header; the base URL pattern `https://{resource}.openai.azure.com/openai/deployments/{deployment}` is supported.
- REQ-125: Register all 7 providers (Anthropic, OpenAiCompat, OpenAiResponses, Azure, Google, Vertex, Bedrock) in `ProviderRegistry::default()`. (Source: [AR])
  - Depends on: REQ-042, REQ-120 through REQ-124
  - Definition of Done: `ProviderRegistry::default()` can dispatch to any of the 7 implementations based on protocol selection.
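The custom SSE parser described for REQ-120 comes down to buffering text until a blank line arrives. Below is a minimal std-only sketch. It assumes the caller has already decoded network chunks into UTF-8 and appends them to a single `String` buffer; `drain_sse_events` and `synthetic_tool_id` are hypothetical names. It reads only the first `data:` line of each event and does not handle `\r\n` line endings. A parser for the general SSE format would need both.

```rust
/// Extracts complete events from `buffer`, which accumulates raw network text.
/// Each event ends with a blank line ("\n\n"); the payload of its first `data:`
/// line is returned. An incomplete trailing event stays in `buffer`.
fn drain_sse_events(buffer: &mut String) -> Vec<String> {
    let mut payloads = Vec::new();
    while let Some(end) = buffer.find("\n\n") {
        // Remove the event, including its terminating blank line, from the buffer.
        let event: String = buffer.drain(..end + 2).collect();
        if let Some(data) = event.lines().find_map(|line| line.strip_prefix("data:")) {
            payloads.push(data.trim_start().to_string());
        }
    }
    payloads
}

/// Auto-generated tool-call IDs in the documented formats
/// ("google-fc-{index}" for Gemini, "vertex-fc-{index}" for Vertex).
fn synthetic_tool_id(prefix: &str, index: usize) -> String {
    format!("{prefix}-fc-{index}")
}
```

Keeping the unterminated tail in `buffer` means a JSON payload split across two network chunks is parsed only once it is complete.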
Milestone 4.2 — Prompt Caching
- REQ-126: Implement `CacheStrategy::Auto`: the provider automatically places `cache_control: { type: "ephemeral" }` breakpoints at the system prompt, the last tool definition, and the second-to-last message. (Source: [AR])
  - Depends on: REQ-014, REQ-040
  - Definition of Done: With `strategy: Auto`, the three cache breakpoints appear in the correct positions in Anthropic requests.
- REQ-127: Implement two more strategies. `CacheStrategy::Manual { cache_system, cache_tools, cache_messages }` applies each breakpoint only when its flag is set; `CacheStrategy::Disabled` emits no breakpoints. (Source: [AR])
  - Depends on: REQ-126
  - Definition of Done: Each flag independently controls placement of its respective cache breakpoint.
- REQ-128: Propagate `Usage.cache_read` and `Usage.cache_write` from Anthropic response metadata into `Message::Assistant.usage`. (Source: [AR])
  - Depends on: REQ-006, REQ-040
  - Definition of Done: After a cache-hit response, the cache token counts from Anthropic are populated in the usage struct.
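The breakpoint placement rules of REQ-126 and REQ-127 can be computed independently of request serialization. The sketch below rests on two assumptions. First, `Manual { cache_messages: true }` uses the same second-to-last-message position as `Auto`, which the spec does not state. Second, a system prompt is always present. `plan_breakpoints` and `Breakpoints` are hypothetical names.

```rust
#[derive(Clone, Copy)]
enum CacheStrategy {
    Auto,
    Manual { cache_system: bool, cache_tools: bool, cache_messages: bool },
    Disabled,
}

/// Where `cache_control: { type: "ephemeral" }` markers go in one request.
#[derive(Debug, Default, PartialEq)]
struct Breakpoints {
    system: bool,
    tool_index: Option<usize>,    // index of the last tool definition
    message_index: Option<usize>, // index of the second-to-last message
}

fn plan_breakpoints(strategy: CacheStrategy, num_tools: usize, num_messages: usize) -> Breakpoints {
    let (system, tools, messages) = match strategy {
        CacheStrategy::Auto => (true, true, true),
        CacheStrategy::Manual { cache_system, cache_tools, cache_messages } => {
            (cache_system, cache_tools, cache_messages)
        }
        CacheStrategy::Disabled => return Breakpoints::default(),
    };
    Breakpoints {
        system,
        // checked_sub yields None when there are too few tools/messages to mark.
        tool_index: if tools { num_tools.checked_sub(1) } else { None },
        message_index: if messages { num_messages.checked_sub(2) } else { None },
    }
}
```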
Milestone 4.3 — Extended Thinking
- REQ-129: Map `ThinkingLevel` to the Anthropic `thinking` parameter: `Off` → omitted; `Minimal` → `budget_tokens: 128`; `Low` → 512; `Medium` → 2048; `High` → 8192. (Source: [AR])
  - Depends on: REQ-019, REQ-040
  - Definition of Done: Setting `ThinkingLevel::Medium` causes `{type:"enabled", budget_tokens:2048}` to appear in the Anthropic request.
- REQ-130: When the `supports_reasoning_effort` flag is set, map `ThinkingLevel` to the OpenAI-compatible `reasoning_effort` parameter: `Minimal`/`Low` → `"low"`; `Medium` → `"medium"`; `High` → `"high"`. (Source: [AR])
  - Depends on: REQ-019, REQ-041
  - Definition of Done: `ThinkingLevel::High` with a reasoning-capable provider produces `reasoning_effort: "high"` in the request body.
- REQ-131: Parse `Thinking` content blocks from streaming responses (Anthropic `thinking`-type blocks; OpenAI `delta.reasoning_content` / xAI `delta.reasoning`). Emit them as `StreamDelta::Thinking` and store them as `Content::Thinking` in the final message. (Source: [AR])
  - Depends on: REQ-001, REQ-008, REQ-040
  - Definition of Done: A streaming response containing thinking/reasoning content produces `MessageUpdate` events with `StreamDelta::Thinking` and a final `Content::Thinking` block in the assembled message.
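The following is a direct transcription of the REQ-129 and REQ-130 tables as a sketch. Returning `Option` models "omit the parameter". Treating `Off` as omitted for OpenAI-compatible providers is an assumption, since REQ-130 does not list it.

```rust
#[derive(Clone, Copy, Debug)]
enum ThinkingLevel {
    Off,
    Minimal,
    Low,
    Medium,
    High,
}

/// Anthropic: `thinking: { type: "enabled", budget_tokens }`, omitted when Off.
fn anthropic_budget_tokens(level: ThinkingLevel) -> Option<u32> {
    match level {
        ThinkingLevel::Off => None,
        ThinkingLevel::Minimal => Some(128),
        ThinkingLevel::Low => Some(512),
        ThinkingLevel::Medium => Some(2048),
        ThinkingLevel::High => Some(8192),
    }
}

/// OpenAI-compatible `reasoning_effort`, sent only when the model supports it.
fn openai_reasoning_effort(level: ThinkingLevel, supports_reasoning_effort: bool) -> Option<&'static str> {
    if !supports_reasoning_effort {
        return None;
    }
    match level {
        ThinkingLevel::Off => None, // assumption: not listed in REQ-130
        ThinkingLevel::Minimal | ThinkingLevel::Low => Some("low"),
        ThinkingLevel::Medium => Some("medium"),
        ThinkingLevel::High => Some("high"),
    }
}
```

One point is worth verifying against current Anthropic documentation. The extended-thinking API has documented a minimum `budget_tokens` of 1,024, so the `Minimal` (128) and `Low` (512) values may need clamping or a different mapping.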
Milestone 4.4 — MCP HTTP Transport
- REQ-132: Implement `McpClient::connect_http(url)`: POST JSON-RPC bodies to the configured URL (stateless, no persistent connection) and complete the initialize handshake. (Source: [AR])
  - Depends on: REQ-115
  - Definition of Done: An HTTP-based MCP server can be connected to and queried for tools.
- REQ-133: Implement the `Agent::with_mcp_server_http(url)` builder. Support an optional tool name prefix (`{prefix}__{name}`) for namespace disambiguation. (Source: [AR])
  - Depends on: REQ-117, REQ-132
  - Definition of Done: HTTP MCP tools appear in the agent's tool list; with a prefix configured, tool names are formatted as `"{prefix}__{name}"`.
- REQ-134: On MCP stdio transport shutdown, close stdin (sending EOF) and then kill the child process. (Source: [AR])
  - Depends on: REQ-114
  - Definition of Done: Dropping or closing the stdio MCP client terminates the child process cleanly.
Milestone 4.5 — Observability and Logging
- REQ-135: Implement structured retry logging. When a retry occurs, log the attempt number, the maximum number of retries, the delay, and the triggering error at an appropriate log level. (Source: [PS])
  - Depends on: REQ-074
  - Definition of Done: A retried request produces a structured log entry containing all four fields.
- REQ-136: Implement `ContextTracker`: combine provider-reported token counts (from `Usage`) with local `estimate_tokens` heuristics for messages appended since the last provider report. Expose `current_tokens() -> usize`. (Source: [AR])
  - Depends on: REQ-054, REQ-055
  - Definition of Done: After a turn with known provider-reported usage, `current_tokens()` reflects the reported value; after additional messages are appended, it adds heuristic estimates for them.
- REQ-137: Populate `ToolResult.details` with structured metadata per tool:
  - `BashTool` → `{ exit_code, success }`
  - `ReadFileTool` → `{ path }`
  - `WriteFileTool` → `{ path }`
  - `EditFileTool` → `{ path, old_lines, new_lines }`
  - `ListFilesTool` → `{ total, truncated }`
  - `SubAgentTool` → `{ sub_agent, turns }`
  (Source: [AR])
  - Depends on: REQ-047 through REQ-052
  - Definition of Done: `ToolResult.details` for a bash execution contains `exit_code` and `success` keys.
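Here is a sketch of REQ-136's hybrid counting. It assumes `estimate_tokens` is a simple characters-divided-by-four heuristic, a hypothetical stand-in for the REQ-054/REQ-055 estimator. It also assumes each provider report covers the whole context sent so far, so a new report replaces the accumulated estimates rather than adding to them.

```rust
/// Hypothetical heuristic: about four characters per token, rounded up.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Hybrid count: last provider-reported total plus estimates for newer messages.
#[derive(Debug, Default)]
struct ContextTracker {
    reported: usize,
    pending_estimate: usize,
}

impl ContextTracker {
    /// A fresh provider report supersedes all earlier estimates.
    fn record_usage(&mut self, provider_total: usize) {
        self.reported = provider_total;
        self.pending_estimate = 0;
    }

    /// Messages appended after the last report are estimated locally.
    fn record_message(&mut self, text: &str) {
        self.pending_estimate += estimate_tokens(text);
    }

    fn current_tokens(&self) -> usize {
        self.reported + self.pending_estimate
    }
}
```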
Milestone 4.6 — Security
- REQ-138: Redact sensitive `OpenApiAuth` credentials in debug output: `Bearer(token)` displays as `Bearer("****")`; `ApiKey { value }` displays as `ApiKey { header: "...", value: "****" }`. (Source: [AR])
  - Depends on: —
  - Definition of Done: Printing or logging an `OpenApiAuth::Bearer("secret")` value produces `"****"` instead of the actual token.
- REQ-139: Implement the complete `BashTool` deny-pattern list. The list is configurable; its default contents are to be specified at implementation time, based on the safety policy described in the spec. (Source: [PS])
  - Depends on: REQ-094
  - Definition of Done: A configurable list of deny patterns is applied, and the default list includes at least the patterns documented in the spec.
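Deriving `Debug` would print credentials verbatim, so REQ-138 implies a hand-written `Debug` impl. Below is a minimal sketch that assumes `OpenApiAuth` has exactly the two variants named in REQ-138. Any `Display` impl or serializer used for logging would need the same treatment.

```rust
use std::fmt;

enum OpenApiAuth {
    Bearer(String),
    ApiKey { header: String, value: String },
}

// Manual Debug instead of #[derive(Debug)] so secrets never reach logs.
impl fmt::Debug for OpenApiAuth {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OpenApiAuth::Bearer(_) => f.debug_tuple("Bearer").field(&"****").finish(),
            OpenApiAuth::ApiKey { header, .. } => f
                .debug_struct("ApiKey")
                .field("header", header) // header name is not secret
                .field("value", &"****")
                .finish(),
        }
    }
}
```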
Milestone 4.7 — Graceful Cancellation
- REQ-140: Implement `CancellationToken::child_token()`, which creates a new token that is cancelled whenever its parent is cancelled. Each `ToolContext` receives a child token. (Source: [PS])
  - Depends on: REQ-033, REQ-046
  - Definition of Done: Calling `agent.abort()` (which cancels the root token) causes `cancel.is_cancelled()` to return `true` for every active tool context.
- REQ-141: `SubAgentTool` forwards the parent's cancel token to the child `agent_loop()`, so `agent.abort()` also terminates sub-agents. (Source: [PS])
  - Depends on: REQ-033, REQ-140
  - Definition of Done: Aborting the parent agent cancels the sub-agent's run.
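The parent/child semantics of REQ-140 can be sketched with shared atomic flags. Cancelling a token is visible to all of its descendants, while cancelling a child leaves its parent and siblings untouched. This polling-only version is illustrative. In an async Rust codebase, the `tokio-util` crate's `CancellationToken` provides `child_token()` with these semantics plus an awaitable `cancelled()` future.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Illustrative token: cancellation flows from parent to child, never upward.
#[derive(Clone, Default)]
struct CancellationToken {
    flag: Arc<AtomicBool>,
    parent: Option<Box<CancellationToken>>,
}

impl CancellationToken {
    fn cancel(&self) {
        self.flag.store(true, Ordering::SeqCst);
    }

    /// True if this token or any ancestor has been cancelled.
    fn is_cancelled(&self) -> bool {
        self.flag.load(Ordering::SeqCst)
            || self.parent.as_ref().map_or(false, |parent| parent.is_cancelled())
    }

    /// New token with its own flag; the cloned parent shares all ancestor flags.
    fn child_token(&self) -> CancellationToken {
        CancellationToken {
            flag: Arc::new(AtomicBool::new(false)),
            parent: Some(Box::new(self.clone())),
        }
    }
}
```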
Milestone 4.8 — Callbacks and Advanced Configuration
- REQ-142: Implement the `on_update` callback in `ToolContext`. When called, it emits `AgentEvent::ToolExecutionUpdate { tool_call_id, tool_name, partial_result }` to the event channel. (Source: [AR])
  - Depends on: REQ-007, REQ-046
  - Definition of Done: A tool that calls `ctx.on_update(partial)` causes `ToolExecutionUpdate` events to appear in the stream before `ToolExecutionEnd`.
- REQ-143: Implement the `on_progress` callback in `ToolContext`. When called, it emits `AgentEvent::ProgressMessage { tool_call_id, tool_name, text }`. (Source: [AR])
  - Depends on: REQ-007, REQ-046
  - Definition of Done: A tool that calls `ctx.on_progress("working...")` causes a `ProgressMessage` event in the stream.
- REQ-144: Implement `Agent::prompt_with_sender(text, tx)`: like `prompt`, but it streams events to a caller-provided sender instead of creating a new channel. (Source: [AR])
  - Depends on: REQ-034
  - Definition of Done: Events are sent to the provided `tx`; the caller can multiplex one sender across multiple prompts.
- REQ-145: Implement optional `transform_context` and `convert_to_llm` hooks on `AgentLoopConfig`. When set, `stream_assistant_response` calls them to preprocess messages before building the `StreamConfig`. (Source: [PS])
  - Depends on: REQ-039
  - Definition of Done: A `transform_context` hook that adds a prefix message causes that message to appear in every LLM call.
- REQ-146: Implement the `Agent::with_compaction_strategy(strategy)` builder. When set, use the custom `CompactionStrategy` instead of the default tiered cascade. (Source: [AR])
  - Depends on: REQ-023, REQ-060
  - Definition of Done: A custom strategy that always returns an empty list causes the LLM to be called with no history.
- REQ-147: Define the `ModelConfig` struct with fields `base_url: Option<String>`, `headers: Map<String,String>`, `max_tokens_field: String` (default `"max_tokens"`), `supports_developer_role: bool`, and `supports_reasoning_effort: bool`. Apply it in `OpenAiCompatProvider`. (Source: [AR])
  - Depends on: REQ-041
  - Definition of Done: Setting `max_tokens_field: "max_completion_tokens"` causes the OpenAI provider to use that key in the request body.
Milestone 4.9 — Agent Identity and Event Hook Observability
- REQ-180: Define the `ContinuationKind` enum in `types.rs` with three variants:
  - `Default`: an unspecified continuation;
  - `Rerun { tag: String }`: a retry from an equivalent context;
  - `Branch { tag: String }`: a different execution path.
  Tags are RFC 3339 UTC timestamps generated by the caller at call time. (Source: [AR])
  - Depends on: —
  - Definition of Done: All three variants instantiate; `Rerun { tag }` and `Branch { tag }` round-trip through JSON serialization, preserving the tag string.
- REQ-181: Define the `TurnTrigger` enum in `types.rs` with four variants:
  - `User`: the first turn of an origin call;
  - `SubAgent`: a sub-agent invocation;
  - `Continuation`: subsequent turns, tool round-trips, steering, and `Default`/`Rerun` continuations;
  - `Branch`: the first turn of a `Branch` continuation.
  Add a `triggered_by: TurnTrigger` field to `AgentEvent::TurnStart`. (Source: [AR])
  - Depends on: REQ-007
  - Definition of Done: `TurnStart` events carry the correct `triggered_by` value. Origin calls emit `User` on turn 0 (or `SubAgent` for sub-agent callers); `Branch` continuations emit `Branch` on turn 0; all other first turns and all subsequent turns emit `Continuation`.
- REQ-182: Add `before_loop: Option<BeforeLoopFn>` and `after_loop: Option<AfterLoopFn>` to `AgentLoopConfig`. Both are wired into `agent_loop` and `agent_loop_continue`. (Source: [AR])
  - `BeforeLoopFn` fires before `AgentStart`. Returning `false` aborts the loop, which emits `AgentEnd { messages: [] }` instead.
  - `AfterLoopFn` fires after `AgentEnd` with the new messages and the accumulated usage.
  - Depends on: REQ-036, REQ-037
  - Definition of Done: A `before_loop` returning `false` stops the run before `AgentStart`; `after_loop` is called exactly once per loop call, after `AgentEnd`, with the correct message and usage values.
- REQ-183: Add `before_tool_execution: Option<BeforeToolExecutionFn>` and `after_tool_execution: Option<AfterToolExecutionFn>` to `AgentLoopConfig`. (Source: [AR])
  - `BeforeToolExecutionFn` fires before `ToolExecutionStart`. Returning `false` skips the tool, which emits a skipped error result.
  - `AfterToolExecutionFn` fires after `ToolExecutionEnd`.
  - Depends on: REQ-046
  - Definition of Done: A `before_tool_execution` returning `false` for one tool causes that tool to be skipped with an error result, while other tools in the same batch are unaffected. `after_tool_execution` is called exactly once per tool call.
- REQ-184: Add `before_tool_execution_update: Option<BeforeToolExecutionUpdateFn>` and `after_tool_execution_update: Option<AfterToolExecutionUpdateFn>` to `AgentLoopConfig`. (Source: [AR])
  - `BeforeToolExecutionUpdateFn` fires before each `ToolExecutionUpdate`. Returning `false` suppresses the event; the tool keeps running and its final `ToolResult` is unaffected.
  - `AfterToolExecutionUpdateFn` fires after the event, when it is not suppressed.
  - Depends on: REQ-142
  - Definition of Done: Suppressing an update via `before_tool_execution_update` causes no `ToolExecutionUpdate` event to be emitted, and `after_tool_execution_update` is not called for suppressed updates.
- REQ-185: Enforce and document the event hook ordering invariant:
  `before_loop → AgentStart … before_turn → TurnStart … before_tool_execution → ToolExecutionStart … (before_tool_execution_update → ToolExecutionUpdate → after_tool_execution_update)* … ToolExecutionEnd → after_tool_execution … TurnEnd → after_turn … AgentEnd → after_loop`
  No hook may fire out of this sequence. (Source: [AR])
  - Depends on: REQ-182, REQ-183, REQ-184
  - Definition of Done: An integration test with all hooks registered verifies that they fire in the documented order for a multi-turn, multi-tool run.
- REQ-186: Add `fn provider_id(&self) -> &str` as a required method on the `StreamProvider` trait (`src/provider/traits.rs`). Implement it in all 7 providers: `"anthropic"`, `"openai"`, `"openai_responses"`, `"azure_openai"`, `"google"`, `"google_vertex"`, `"bedrock"`. The `MockProvider` returns `"mock"`. (Source: [AR])
  - Depends on: REQ-020
  - Definition of Done: All 8 `StreamProvider` implementations (the 7 providers plus `MockProvider`) compile with `provider_id()` returning the documented string; existing tests pass unchanged.
- REQ-187: Add a `config_id: Option<String>` field to `AgentLoopConfig`. When it is `None`, `Agent::next_loop_id()` auto-derives the effective config ID as `"{provider_id}.{model_slug}[.thinking]"`; when it is `Some`, the supplied value is used verbatim. The effective config ID forms the middle segment of `loop_id`: `"{session_id}.{config_id}.{N}"`. (Source: [AR])
  - Depends on: REQ-029, REQ-186
  - Definition of Done: Setting `config_id: Some("my-config")` causes `loop_id` to include `"my-config"` as its middle segment; leaving it `None` produces a segment auto-derived from the provider and model.
- REQ-188: Add `agent_id: String` and `session_id: String` fields to the `Agent` struct, both initialized to a UUID v4 in `Agent::new()`. They are stable for the lifetime of the `Agent` instance and are injected into every `AgentContext` built by `Agent::prompt_*` and `continue_loop_*`. (Source: [AR])
  - Depends on: REQ-024
  - Definition of Done: All `AgentStart` events emitted by a single `Agent` instance share the same `agent_id` and `session_id` values across multiple `prompt()` calls.
- REQ-189: Add `loop_counters: HashMap<String, usize>` and `last_loop_id: Option<String>` to `Agent`. Set `last_loop_id` after each `prompt_*`/`continue_loop_*` call. (Source: [AR])
  - Implement `Agent::next_loop_id(config) -> String`:
    - compute `effective_config_id` from `config.config_id`, or by auto-derivation;
    - increment the counter keyed by `"{session_id}.{effective_config_id}"`;
    - return `"{session_id}.{effective_config_id}.{N}"`.
  - Depends on: REQ-187, REQ-188
  - Definition of Done: Two `agent_loop` calls on the same agent with the same provider/model produce `loop_id` values ending in `.1` and `.2` respectively; different configs produce independent counters (both ending in `.1`).
- REQ-190: Add `agent_id`, `session_id`, `loop_id`, `parent_loop_id`, and `continuation_kind` fields to `AgentContext`. In `agent_loop`, generate `agent_id`/`session_id`/`loop_id` if they are `None` at entry and write them back to the context. `parent_loop_id` and `continuation_kind` keep whatever values the caller set. (Source: [AR])
  - Depends on: REQ-028, REQ-180, REQ-189
  - Definition of Done: After `agent_loop` returns, `context.agent_id`, `context.session_id`, and `context.loop_id` are all `Some`; a subsequent `agent_loop_continue` on the same context can read them without regenerating them.
- REQ-191: In `agent_loop_continue`, assert `context.agent_id.is_some()` and `context.session_id.is_some()` with descriptive panic messages. Do not silently generate new UUIDs. (Source: [AR])
  - Depends on: REQ-037, REQ-190
  - Definition of Done: Calling `agent_loop_continue` with `agent_id: None` panics with a message containing "agent_loop_continue requires context.agent_id to be set"; with both fields `Some`, the assertion passes.
- REQ-192: Add `agent_id: String`, `session_id: String`, `loop_id: String`, `parent_loop_id: Option<String>`, and `continuation_kind: Option<ContinuationKind>` to `AgentEvent::AgentStart`, and emit these fields from both `agent_loop` and `agent_loop_continue`. `parent_loop_id` is `None` for origin calls unless the caller sets it (as `SubAgentTool` does per REQ-195). `continuation_kind` is `None` for origin calls and `Some(...)` for continuations. (Source: [AR])
  - Depends on: REQ-007, REQ-180, REQ-190, REQ-191
  - Definition of Done: `AgentStart` events from a plain `agent_loop` call have `parent_loop_id: None` and `continuation_kind: None`; events from `agent_loop_continue` carry the values set on `AgentContext`.
- REQ-193: In `run_loop`, determine the `TurnTrigger` for the first turn from `context.continuation_kind`. All subsequent turns use `TurnTrigger::Continuation`. Emit the result as `triggered_by` in `AgentEvent::TurnStart`. (Source: [AR])
  - `Branch(..)` → `TurnTrigger::Branch`
  - any other `Some(..)` → `TurnTrigger::Continuation`
  - `None` → `config.first_turn_trigger` (default `User`; `SubAgent` for sub-agent callers)
  - Depends on: REQ-038, REQ-181
  - Definition of Done: A `Branch` continuation emits `TurnTrigger::Branch` on turn 0 and `TurnTrigger::Continuation` on all subsequent turns; a `Default` continuation emits `TurnTrigger::Continuation` on all turns.
- REQ-194: Add `child_loop_id: Option<String>` to both `ToolResult` and `AgentEvent::ToolExecutionEnd`. Sub-agent tools set `ToolResult.child_loop_id` to the child loop's `loop_id` after `agent_loop` completes, and `execute_single_tool` propagates `result.child_loop_id` into `ToolExecutionEnd`. Non-sub-agent tools leave both fields `None`. (Source: [AR])
  - Depends on: REQ-010, REQ-046, REQ-148, REQ-190
  - Definition of Done: A `ToolExecutionEnd` event from a `SubAgentTool` call carries a non-`None` `child_loop_id`, and the same `loop_id` appears in the child's `AgentStart` event.
- REQ-195: Add a `SubAgentTool::with_parent_loop_id(loop_id: String)` builder method. When it is set, the child `AgentContext` built inside `execute()` has `parent_loop_id: Some(loop_id)`. The child's `AgentStart` event therefore carries `parent_loop_id`, enabling ancestry tracing from child back to parent. (Source: [AR])
  - Depends on: REQ-148, REQ-190
  - Definition of Done: A sub-agent tool configured with `with_parent_loop_id("parent.loop.1")` emits a child `AgentStart` event with `parent_loop_id: Some("parent.loop.1")`.
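The identity rules of REQ-187, REQ-189, and REQ-193 are pure functions of configuration and counters. The std-only sketch below assumes a hypothetical `LoopConfig` subset of `AgentLoopConfig`. It also assumes a model-slug rule, which the spec does not define: any character outside `[A-Za-z0-9_-]` becomes `-`, so a `.` in a model name cannot be mistaken for a segment separator.

```rust
use std::collections::HashMap;

/// Hypothetical subset of AgentLoopConfig used for loop-ID derivation.
struct LoopConfig<'a> {
    config_id: Option<&'a str>,
    provider_id: &'a str,
    model: &'a str,
    thinking: bool,
}

#[derive(Clone, Debug, PartialEq)]
enum ContinuationKind {
    Default,
    Rerun { tag: String },
    Branch { tag: String },
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum TurnTrigger {
    User,
    SubAgent,
    Continuation,
    Branch,
}

/// REQ-187: explicit config_id wins; otherwise "{provider_id}.{model_slug}[.thinking]".
fn effective_config_id(config: &LoopConfig) -> String {
    if let Some(id) = config.config_id {
        return id.to_string();
    }
    // Assumed slug rule: keep [A-Za-z0-9_-], map everything else (including '.') to '-'.
    let slug: String = config
        .model
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '-' || c == '_' { c } else { '-' })
        .collect();
    let mut id = format!("{}.{}", config.provider_id, slug);
    if config.thinking {
        id.push_str(".thinking");
    }
    id
}

/// REQ-189: independent per-"{session_id}.{config_id}" counters.
fn next_loop_id(session_id: &str, counters: &mut HashMap<String, usize>, config: &LoopConfig) -> String {
    let key = format!("{}.{}", session_id, effective_config_id(config));
    let counter = counters.entry(key.clone()).or_insert(0);
    *counter += 1;
    format!("{}.{}", key, counter)
}

/// REQ-193: trigger for the 0-based `turn` of one run_loop call.
fn turn_trigger(turn: usize, continuation: Option<&ContinuationKind>, first_turn_trigger: TurnTrigger) -> TurnTrigger {
    if turn > 0 {
        return TurnTrigger::Continuation;
    }
    match continuation {
        Some(ContinuationKind::Branch { .. }) => TurnTrigger::Branch,
        Some(_) => TurnTrigger::Continuation,
        None => first_turn_trigger,
    }
}
```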
Milestone 4.10 — Evaluational Parallelism
- REQ-196: Migrate `AgentContext.tools` from `Vec<Box<dyn AgentTool>>` to `Vec<Arc<dyn AgentTool>>`. (Implemented)
  - Add `#[derive(Clone)]` to `AgentContext`.
  - Update `Agent::set_tools`, `BasicAgent::with_tools`, the `default_tools()` return type, and all push sites in `BasicAgent` (sub-agent, OpenAPI, MCP).
  - Remove `ArcToolWrapper` from `sub_agent.rs`.
  - Depends on: REQ-028, REQ-046
  - Definition of Done: `AgentContext: Clone`; all existing tests pass; `ArcToolWrapper` is deleted.
- REQ-197: Add a `Usage::combine(&self, other: &Usage) -> Usage` method for summing usage across branches. (Implemented)
  - Depends on: —
  - Definition of Done: `usage_a.combine(&usage_b)` returns a `Usage` with all fields summed.
- REQ-198: Add `ParallelLoopOutcome` and `ParallelLoopResult` structs to `types.rs`. Add two variants to `AgentEvent`: (Implemented)
  - `AgentEvent::ParallelLoopStart { session_id, loop_ids, timestamp }`
  - `AgentEvent::ParallelLoopEnd { session_id, selected_loop_id, selected_config_index, evaluation_usage, timestamp }`
  - Depends on: REQ-190, REQ-197
  - Definition of Done: Both structs construct, and the new enum variants match correctly.
- REQ-199: Define the `EvaluationDecision` enum and the `EvaluationStrategy` trait in `types.rs`, with the trait method `evaluate(prompts, outcomes, tx, cancel) -> (EvaluationDecision, Usage)`. They live in `types.rs` (not `evaluation.rs`) to avoid a circular dependency with `agent_loop.rs`. (Implemented)
  - Depends on: REQ-198
  - Definition of Done: Custom implementations compile when importing from either `crate::types` or `crate::evaluation`.
- REQ-200: Create `src/agent_loop/evaluation.rs` with five built-in `EvaluationStrategy` implementations: (Implemented)
  - `TransparentEvaluation`: single-branch pass-through;
  - `PickFirstEvaluation`: always index 0;
  - `TokenEfficientEvaluation`: lowest `total_tokens`;
  - `ElaborateEvaluation`: highest `total_tokens`;
  - `LlmJudgeEvaluation { judge_config, system_prompt }`.
  - Depends on: REQ-199
  - Definition of Done: All five strategies implement `EvaluationStrategy`; unit tests pass for each.
- REQ-201: `LlmJudgeEvaluation` judge prompt construction: (Implemented)
  - extract the original query text only from the user messages in `prompts`;
  - extract the final assistant text from each branch's `new_messages`, stripping tool calls, tool results, and intermediate turns;
  - build a numbered judge prompt, run `agent_loop` with `judge_config`, and parse the first integer from the reply;
  - inherit `session_id` from the branches for traceability.
  - Depends on: REQ-200
  - Definition of Done: The judge receives clean final responses, not raw tool traces; the judge's `AgentStart` has the same `session_id` as the branches.
- REQ-202: `LlmJudgeEvaluation` judge comprehension criterion: all N branch final responses must fit in the judge model's context budget simultaneously. The budget derives from `judge_config.context_config.max_context_tokens` (if set). Apply iterative multi-tier compaction: (Implemented)
  - tier 1 keeps the last 80 lines;
  - tier 2 keeps the first and last paragraphs;
  - tier 3 applies a hard character limit derived from budget / N.
  Emit an `AgentEvent::ProgressMessage` warning if the criterion still cannot be satisfied after tier 3. The selected winner always returns the original, uncompacted messages.
  - Depends on: REQ-201
  - Definition of Done: With a tight `context_config.max_context_tokens`, compaction fires and a warning is emitted; the selected output is the original branch content.
- REQ-203: Add the `derive_config_segment(config: &AgentLoopConfig) -> String` helper (`pub(crate)`) and the `run_parallel_branches(...)` internal async function to `agent_loop.rs`. Add the `agent_loop_parallel(prompts, base_context, configs, strategy, tx, cancel) -> ParallelLoopResult` public async function. It uses `futures::future::join_all` for branch concurrency, which avoids a `'static` bound on `AgentLoopConfig` hooks. A per-branch forwarder task (`tokio::spawn`) captures usage from `AgentEnd`. (Implemented)
  - Depends on: REQ-196, REQ-199
  - Definition of Done: `agent_loop_parallel` with 2 configs runs both branches, emits `ParallelLoopStart`/`ParallelLoopEnd`, and returns the correct `selected_index`.
- REQ-204: Export the `evaluation` module from `lib.rs`; re-export `agent_loop_parallel` and all five evaluation strategies at the crate root. (Implemented)
  - Depends on: REQ-200, REQ-203
  - Definition of Done: `use phi_core::{agent_loop_parallel, PickFirstEvaluation, LlmJudgeEvaluation};` compiles.
- REQ-205: `agent_loop_parallel` routes to `agent_loop_continue` when `prompts` is empty. (Implemented)
  - Depends on: REQ-203
  - Definition of Done: Calling `agent_loop_parallel(vec![], ctx_with_user_msg, ...)` dispatches each branch via `agent_loop_continue` and returns a valid `ParallelLoopResult`.
- REQ-206: Add `original_context_len: usize` to `ParallelLoopOutcome`. (Implemented)
  - Depends on: REQ-198, REQ-205
  - Definition of Done: `outcome.context.messages[..outcome.original_context_len]` is the shared base context; `outcome.context.messages[outcome.original_context_len..]` are the branch-produced messages.
- REQ-207: `LlmJudgeEvaluation` extracts the prior conversation context and the query from `context.messages[..original_context_len]` in `agent_loop_continue` mode, and includes a formatted prior-context transcript in the judge prompt. (Implemented)
  - Depends on: REQ-201, REQ-206
  - Definition of Done: When `prompts` is empty, the judge prompt contains `"Prior conversation context:"` and `"Original query:"` sections derived from the original context.
- REQ-208: Replace single-pass output compaction with a 2-iteration `compact_for_judge`: iteration 1 compacts the prior context only (outputs intact); iteration 2 compacts both independently. (Implemented)
  - Depends on: REQ-202, REQ-207
  - Definition of Done: Under a tight token budget, outputs remain uncompacted as long as prior-context compaction alone can satisfy the criteria.
- REQ-209: Update `build_judge_user_message` to include an optional prior-context section before the query. (Implemented)
  - Depends on: REQ-207
  - Definition of Done: The judge prompt includes `"Prior conversation context:\n<transcript>"` when prior context is non-empty and omits the section when it is empty (fresh-session case).
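The tiered judge-side compaction in REQ-202 can be sketched as a pure function over one branch's final response. This is a minimal sketch under stated assumptions, not the crate's implementation. The names `compact_for_budget` and `Tier` are hypothetical, as are splitting paragraphs on blank lines, measuring the budget in characters rather than tokens, and applying tiers 1 and 2 to the original text while tier 3 cuts the tier-2 result. The 80-line window and the budget / N per-branch limit come from the requirement.

```rust
/// Which compaction tier produced the returned text (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Tier {
    Unchanged,
    LastLines,
    FirstLastParagraph,
    HardLimit,
}

/// Tier 1: keep only the last `n` lines.
fn last_lines(text: &str, n: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    let start = lines.len().saturating_sub(n);
    lines[start..].join("\n")
}

/// Tier 2: keep the first and last paragraphs (paragraphs split on blank lines).
fn first_last_paragraph(text: &str) -> String {
    let paras: Vec<&str> = text
        .split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .collect();
    match paras.as_slice() {
        [] => String::new(),
        [only] => only.to_string(),
        [first, .., last] => format!("{first}\n\n{last}"),
    }
}

/// Tier 3: hard cut to `max_chars` characters (never splits a UTF-8 character).
fn hard_limit(text: &str, max_chars: usize) -> String {
    text.chars().take(max_chars).collect()
}

/// Compact one branch's final response so that `n_branches` responses fit
/// together in `budget_chars`. Returns the text and the tier that was applied.
pub fn compact_for_budget(text: &str, budget_chars: usize, n_branches: usize) -> (String, Tier) {
    let per_branch = budget_chars / n_branches.max(1);
    let fits = |s: &str| s.chars().count() <= per_branch;

    if fits(text) {
        return (text.to_string(), Tier::Unchanged);
    }
    let t1 = last_lines(text, 80);
    if fits(t1.as_str()) {
        return (t1, Tier::LastLines);
    }
    let t2 = first_last_paragraph(text);
    if fits(t2.as_str()) {
        return (t2, Tier::FirstLastParagraph);
    }
    (hard_limit(&t2, per_branch), Tier::HardLimit)
}
```

Because this sketch counts characters, tier 3 always fits. With a token budget, the caller would recheck the tier-3 result and, if it still overflows, emit the `ProgressMessage` warning that REQ-202 describes. Only the judge sees compacted text; the winning branch is returned uncompacted.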
Milestone 4.11 — Persistent Session Layer
- REQ-210: Add `loop_id: String` to all `AgentEvent` variants that lacked it (`AgentEnd`, `TurnStart`, `TurnEnd`, `MessageStart`, `MessageUpdate`, `MessageEnd`, `ToolExecutionStart`, `ToolExecutionUpdate`, `ToolExecutionEnd`, `ProgressMessage`, `InputRejected`). Add `Serialize, Deserialize` to `AgentEvent`, `ContinuationKind`, `TurnTrigger`, and `StreamDelta`. Thread `loop_id` through all emission sites in `agent_loop.rs` and `evaluation.rs`. (Source: [AR])
  - Depends on: REQ-007, REQ-114
  - Definition of Done: All `AgentEvent` variants carry `loop_id`; events from interleaved parallel branches can be unambiguously attributed to the correct `LoopRecord`.
- REQ-211: Define `Session`, `LoopRecord`, `LoopEvent`, and `LoopConfigSnapshot` types in `src/session/`. `Session` contains an ordered `Vec<LoopRecord>`. `LoopRecord` holds identity fields (`loop_id`, `session_id`, `agent_id`), timing, status, messages (from `AgentEnd.messages`), usage, events, and tree links (`children_loop_ids`, `parent_loop_id`). `LoopConfigSnapshot` stores `model`, `provider`, and `config_id`. (Source: [AR])
  - Depends on: REQ-210
  - Definition of Done: All types serialize/deserialize (lossless JSON round-trip); `Session.total_usage()` sums `LoopRecord.usage` across all loops.
- REQ-212: Define `ChildLoopRef` and `SpawnRef` for bidirectional cross-session sub-agent tracking. `ChildLoopRef` is stored in `LoopRecord.child_loop_refs` (parent → child); `SpawnRef` is stored in `Session.parent_spawn_ref` (child → parent). Both carry `tool_call_id`, `tool_name`, and cross-session IDs. (Source: [AR])
  - Depends on: REQ-211
  - Definition of Done: A parent session's `LoopRecord.child_loop_refs` can be used to load and link the child session.
- REQ-213: Define `ParallelGroupRecord` and implement `LoopStatus::Pending` pre-registration in `SessionRecorder`. When `ParallelLoopStart` arrives, pre-create `LoopRecord { status: Pending }` for each branch `loop_id` so the group is registered before `AgentStart` fires for each branch. `ParallelLoopEnd` retroactively sets `ParallelGroupRecord` on all branch records. (Source: [AR])
  - Depends on: REQ-211
  - Definition of Done: After a parallel loop completes, all branch `LoopRecord`s have `parallel_group` set; exactly one has `is_selected = true`.
- REQ-214: Implement `SessionRecorder` with the `PerSessionId` formation policy. `on_event(event)` routes events by `loop_id`: it creates a `Session` on the first-seen `session_id` from `AgentStart`, closes the `LoopRecord` on `AgentEnd`, appends bidirectional tree links, and handles sub-agent `SpawnRef` enrichment from `ToolExecutionEnd.child_loop_id`. (Source: [AR])
  - Depends on: REQ-211, REQ-212, REQ-213
  - Definition of Done: `test_session_recorder_single_loop`, `test_session_recorder_continuation`, `test_session_recorder_bidirectional_tree`, and `test_session_recorder_continuation_kind` all pass.
- REQ-215: Add `BasicAgent::new_session()` and `check_and_rotate(threshold)` to `BasicAgent`. Add a `last_active_at: Option<DateTime<Utc>>` field and update `prompt_messages_with_sender` to record it. `new_session()` rotates `session_id` and clears `loop_counters` and `last_loop_id`. (Source: [AR])
  - Depends on: REQ-214
  - Definition of Done: `test_basic_agent_new_session` and `test_basic_agent_check_and_rotate` pass.
- REQ-216: Implement the `save_session`, `load_session`, and `list_session_ids` persistence API. File layout: `{dir}/{session_id}.json` (pretty-printed JSON, flat directory). `list_session_ids` returns IDs sorted by modification time (newest first). (Source: [AR])
  - Depends on: REQ-211
  - Definition of Done: `test_session_save_load_roundtrip` and `test_session_list_ids` pass; saved files are valid, human-readable JSON.
- REQ-217: Implement `load_sessions_for_agent` and `delete_session`. `load_sessions_for_agent` loads all sessions in `dir` and filters by `agent_id`. `delete_session` removes the file and returns `SessionError::NotFound` if it is absent. (Source: [AR])
  - Depends on: REQ-216
  - Definition of Done: `test_session_delete` passes; `load_sessions_for_agent` returns only sessions with the matching `agent_id`.
- REQ-218: Implement the `Session` tree navigation methods `root_loops()`, `children_of(loop_id)`, `parallel_siblings(loop_id)`, and `get_loop(loop_id)`. Export all public session types from `src/lib.rs`. (Source: [AR])
  - Depends on: REQ-211
  - Definition of Done: `test_session_recorder_parallel_group` and `test_session_recorder_bidirectional_tree` exercise all navigation methods; all assertions pass.
- REQ-219: Write `docs/concepts/sessions.md` documenting: Overview, Session Formation (three modes), LoopRecord Anatomy (field table, `LoopStatus` lifecycle, `continuation_kind` classification, `LoopConfigSnapshot` rationale), Loop Tree Navigation, Cross-Session Sub-Agent Tracking, Parallel Evaluation Groups, `SessionRecorder` usage with a code example, Persistence API, and 9 Design Decisions (each with decision / why / rejected alternative). (Source: [AR])
  - Depends on: REQ-211 through REQ-218
  - Definition of Done: `docs/concepts/sessions.md` exists and covers all listed sections; its code examples are syntactically valid Rust.
- REQ-220: Update `docs/specs/architecture.md`: add a `SessionStore` component section, add `SessionStore` to the dependency graph, update the `AgentEvent` variant table to document `loop_id: String` on all applicable variants, add `Session`/`LoopRecord`/`SessionRecorder` data model entries, and add `new_session()`/`check_and_rotate()`/`last_active_at` to the BasicAgent interface table. Update `docs/specs/roadmap.md` with this milestone. (Source: [AR])
  - Depends on: REQ-219
  - Definition of Done: Both spec files are updated; all new types and methods are documented.
- REQ-221: Fix `SessionRecorder` `SpawnRef` enrichment to handle the case where the child session has already been moved to `completed` before the parent's `ToolExecutionEnd` fires. Currently, `ToolExecutionEnd` searches only `open_sessions` for the child session to enrich `parent_spawn_ref.tool_call_id`/`tool_name`. If `flush()` was called between the child's `AgentEnd` and the parent's `ToolExecutionEnd` (e.g. periodic batch checkpointing in production), the child session is already in `completed` and the enrichment is silently skipped, leaving `tool_call_id: ""` and `tool_name: ""` on the `SpawnRef` permanently. Fix by also searching `completed` sessions in the enrichment step, or by deferring child-session promotion to `completed` until the parent loop also closes. (Source: post-sprint review)
  - Depends on: REQ-214
  - Definition of Done: A test demonstrates that calling `flush()` between the child's `AgentEnd` and the parent's `ToolExecutionEnd` still produces a fully enriched `SpawnRef` on the child session.
Level 5 — Creative
Goal: The system surpasses the original. It adds sub-agent delegation, OpenAPI tool generation, and advanced Anthropic protocol features, and resolves all documented ambiguities with principled design decisions.
Completion Criteria: `SubAgentTool` works end-to-end; the OpenAPI adapter generates callable tools from a spec file; all [AMBIGUOUS] items have a documented resolution; performance benchmarks for parallel tool execution meet or exceed documented expectations.
Milestone 5.1 — Sub-Agent Delegation
- REQ-148: Implement `SubAgentTool::execute`: validate that `params["task"]` is non-empty; build a fresh `AgentContext` (empty messages, own toolset); build an `AgentLoopConfig` with a `max_turns` guard (default 10), no steering/follow-ups, and no input filters; spawn the child `agent_loop`; await the result; call `extract_final_text`. (Source: [PS])
  - Depends on: REQ-036, REQ-157
  - Definition of Done: A sub-agent tool registered on a parent agent completes a delegated task and returns the child agent's final text as a `ToolResult`.
- REQ-149: Implement `extract_final_text(messages) -> String`: scan messages in reverse for the last `Assistant` message with `Text` content blocks; join and return them; fall back to `"(sub-agent produced no text output)"`. (Source: [PS])
  - Depends on: REQ-002
  - Definition of Done: `extract_final_text` returns the text of the last assistant message; an all-tool-call assistant message returns the fallback string.
- REQ-150: Sub-agent event forwarding: spawn a task that consumes child `AgentEvent`s and forwards them to the parent channel as `ToolExecutionUpdate` events (for `MessageUpdate::Text`) and `ProgressMessage` events (for child `ProgressMessage`). (Source: [PS])
  - Depends on: REQ-007, REQ-148
  - Definition of Done: The parent event stream includes `ToolExecutionUpdate` events showing the sub-agent's text generation in real time.
- REQ-151: Implement the `SubAgentTool` builder: `SubAgentTool::new(name, model_config).with_system_prompt(...).with_tools(...).with_max_turns(...).with_thinking(...)`. (Source: [AR])
  - Depends on: REQ-021, REQ-148
  - Definition of Done: A fully configured `SubAgentTool` can be added to a parent agent's tool list via `with_tools`.
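REQ-149's reverse scan is small enough to pin down directly. This is a minimal sketch with two assumptions. First, the `Content` and `Message` types are heavily simplified stand-ins carrying only the fields the scan needs. Second, multiple text blocks are joined with a newline, which the requirement does not specify. The sketch reads "the last `Assistant` message with `Text` content blocks" literally, so an assistant turn containing only tool calls is skipped in favor of an earlier one that has text.

```rust
/// Simplified stand-ins for the roadmap's `Content` and `Message` types.
#[derive(Debug, Clone)]
pub enum Content {
    Text { text: String },
    ToolCall { id: String, name: String },
}

#[derive(Debug, Clone)]
pub enum Message {
    User { content: Vec<Content> },
    Assistant { content: Vec<Content> },
    ToolResult { tool_call_id: String, content: Vec<Content> },
}

pub const NO_TEXT_FALLBACK: &str = "(sub-agent produced no text output)";

/// Return the joined text of the last assistant message that has at least
/// one `Text` block, or the fallback string if no such message exists.
pub fn extract_final_text(messages: &[Message]) -> String {
    for msg in messages.iter().rev() {
        if let Message::Assistant { content } = msg {
            let texts: Vec<&str> = content
                .iter()
                .filter_map(|c| match c {
                    Content::Text { text } => Some(text.as_str()),
                    _ => None,
                })
                .collect();
            if !texts.is_empty() {
                // Assumption: multiple text blocks are joined with a newline.
                return texts.join("\n");
            }
        }
    }
    NO_TEXT_FALLBACK.to_string()
}
```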
Milestone 5.2 — OpenAPI Adapter (Feature-Gated)
- REQ-152: Implement `OpenApiAdapter::from_str(spec, config, filter)`: auto-detect JSON vs YAML (first non-whitespace character `{` or `[` → JSON, otherwise YAML); parse the OpenAPI 3.x spec; resolve the base URL; generate one `OpenApiToolAdapter` per matching operation. (Source: [AR])
  - Depends on: REQ-153, REQ-154, REQ-155, REQ-156
  - Definition of Done: A valid OpenAPI 3.x spec string (JSON and YAML both) produces one tool adapter per operation with an `operationId`.
- REQ-153: Classify parameters: `path` → URL substitution with RFC 3986 percent-encoding; `query` → query string; `header` → request headers; `cookie` → skipped with no error; `requestBody` (`application/json` only) → keyed as `"body"` (or `"_request_body"` on a name collision). (Source: [AR])
  - Depends on: REQ-021
  - Definition of Done: Path parameters appear in the URL; query parameters appear in the query string; cookie parameters are silently ignored.
- REQ-154: Implement the HTTP execution pipeline per tool call: validate params, substitute path params, build the URL, chain query/header params, apply `OpenApiAuth`, apply `custom_headers`, optionally attach the JSON body, send the request, read the body, truncate at `max_response_bytes` on a UTF-8 boundary, and return `"{METHOD} {URL} → {STATUS}\n\n{BODY}"`. (Source: [AR])
  - Depends on: REQ-021
  - Definition of Done: A POST to a test endpoint with path, query, and body params produces the documented return format.
- REQ-155: Implement `OperationFilter`: `All` (include everything with an `operationId`); `ByOperationId(ids)` (include only the listed IDs); `ByTag(tags)` (include operations tagged with any listed tag); `ByPathPrefix(prefix)` (include operations whose path starts with `prefix`). Operations without an `operationId` always emit a warning and are skipped. (Source: [AR])
  - Depends on: REQ-152
  - Definition of Done: Each filter variant correctly includes/excludes operations; an operation without an `operationId` logs a warning and is excluded regardless of filter.
- REQ-156: Apply the optional `name_prefix` from `OpenApiConfig`: when set, the tool name becomes `"{prefix}__{operationId}"`. (Source: [AR])
  - Depends on: REQ-152
  - Definition of Done: With `name_prefix: Some("myapi")`, the tool for `operationId: "getUser"` is named `"myapi__getUser"`.
- REQ-157: Implement `from_file(path, config, filter)` (async file read) and `from_url(url, config, filter)` (HTTP GET via the HTTP client). (Source: [AR])
  - Depends on: REQ-152
  - Definition of Done: Both sources produce the same tool list as `from_str` on the same spec content.
- REQ-158: Implement the `Agent::with_openapi_file`, `with_openapi_url`, and `with_openapi_spec` builders on `Agent`. Gate the entire `openapi` module behind an `openapi` feature flag. (Source: [AR])
  - Depends on: REQ-026, REQ-157
  - Definition of Done: Without the `openapi` feature, the crate compiles successfully without the adapter; with it, all three builders are available.
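Several of the small, pure rules in this milestone can be checked in isolation: format detection (REQ-152), path encoding (REQ-153), response truncation (REQ-154), and tool naming (REQ-156). The function names below are hypothetical helpers, not the adapter's actual API. The encoder keeps only the RFC 3986 unreserved characters literal, a conservative choice for a value substituted into a single path segment.

```rust
#[derive(Debug, PartialEq)]
pub enum SpecFormat {
    Json,
    Yaml,
}

/// REQ-152: a first non-whitespace `{` or `[` means JSON; anything else is YAML.
pub fn detect_format(spec: &str) -> SpecFormat {
    match spec.trim_start().chars().next() {
        Some('{') | Some('[') => SpecFormat::Json,
        _ => SpecFormat::Yaml,
    }
}

/// REQ-156: the optional prefix is joined to the operationId with a double underscore.
pub fn tool_name(name_prefix: Option<&str>, operation_id: &str) -> String {
    match name_prefix {
        Some(prefix) => format!("{prefix}__{operation_id}"),
        None => operation_id.to_string(),
    }
}

/// REQ-153: percent-encode a path parameter value byte by byte. Everything
/// except RFC 3986 unreserved characters (ALPHA / DIGIT / "-" / "." / "_" / "~")
/// is encoded, so a "/" inside a value cannot create a new path segment.
pub fn encode_path_param(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for &b in value.as_bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{b:02X}"));
        }
    }
    out
}

/// REQ-154: cut a response body to at most `max_bytes` bytes without
/// splitting a multi-byte UTF-8 character.
pub fn truncate_utf8(body: &str, max_bytes: usize) -> &str {
    if body.len() <= max_bytes {
        return body;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    &body[..end]
}
```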
Milestone 5.3 — Advanced Anthropic Protocol
- REQ-159: Implement the Anthropic OAuth auth path: when `model_config` indicates OAuth, use the `Authorization: Bearer {TOKEN}` header plus the beta headers `claude-code-20250219`, `oauth-2025-04-20`, and `fine-grained-tool-streaming-2025-05-14`, and the headers `x-app: cli`, `anthropic-dangerous-direct-browser-access: true`, and `user-agent: claude-cli/2.1.2`. (Source: [AR])
  - Depends on: REQ-040
  - Definition of Done: An OAuth-configured provider sends all documented headers; standard API-key auth sends the standard `x-api-key` header.
- REQ-160: Implement Anthropic `InputJsonDelta` tool-argument streaming: buffer incremental `InputJsonDelta` text fragments in `arguments["__partial_json"]`; on `content_block_stop`, parse the complete accumulated string as JSON. (Source: [AR])
  - Depends on: REQ-040
  - Definition of Done: A tool call streamed in 5 `InputJsonDelta` fragments produces a single, complete, parseable JSON `arguments` object.
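The header selection in REQ-159 reduces to a match on the auth mode. This is a minimal sketch with two assumptions. First, `AnthropicAuth` is a hypothetical enum, since the roadmap only says that `model_config` "indicates OAuth". Second, the three beta values are sent comma-separated in a single `anthropic-beta` header. Other headers the provider sends on every request are left out.

```rust
/// How the provider authenticates (hypothetical shape).
pub enum AnthropicAuth {
    ApiKey(String),
    OAuth(String),
}

/// Beta values listed in REQ-159.
const OAUTH_BETAS: [&str; 3] = [
    "claude-code-20250219",
    "oauth-2025-04-20",
    "fine-grained-tool-streaming-2025-05-14",
];

/// Build the auth-related request headers as (name, value) pairs.
pub fn auth_headers(auth: &AnthropicAuth) -> Vec<(&'static str, String)> {
    match auth {
        // Standard path: only the API key header.
        AnthropicAuth::ApiKey(key) => vec![("x-api-key", key.clone())],
        // OAuth path: bearer token plus the extra headers; no x-api-key.
        AnthropicAuth::OAuth(token) => vec![
            ("authorization", format!("Bearer {token}")),
            // Assumption: beta values travel comma-joined in one `anthropic-beta` header.
            ("anthropic-beta", OAUTH_BETAS.join(",")),
            ("x-app", "cli".to_string()),
            ("anthropic-dangerous-direct-browser-access", "true".to_string()),
            ("user-agent", "claude-cli/2.1.2".to_string()),
        ],
    }
}
```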
Milestone 5.4 — Ambiguity Resolutions
- REQ-161: [AMBIGUOUS] Standardize `AgentEnd` emission on abort: define and document whether `AgentEnd` is emitted when cancellation is detected at various checkpoints (start of loop, mid-stream, mid-tool). Implement a consistent policy. (Source: [PS])
  - Depends on: REQ-067, REQ-082
  - Definition of Done: The chosen policy is documented; behavior is consistent regardless of where in the loop cancellation is detected.
- REQ-162: Add a `TokenCounter` trait in `context/token.rs` with `HeuristicTokenCounter` (chars/4) as the default. It is pluggable via `ContextConfig.token_counter` and threaded through all hot-path call sites. (Source: [OV])
  - Depends on: REQ-054
  - Definition of Done: A `TokenCounter` trait or injection point exists; the default implementation uses the 4-char heuristic; a precise implementation can be substituted via configuration.
- REQ-163: [AMBIGUOUS] Define sub-agent error propagation: document what `execute()` returns when the child `agent_loop` produces only error or empty messages. Implement the `extract_final_text` fallback consistently. (Source: [PS])
  - Depends on: REQ-149
  - Definition of Done: The policy is documented; child agent error messages are reflected in the fallback text or surfaced as `ToolError::Failed`.
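The injection point in REQ-162 can be sketched as a trait object stored in the context configuration. The following are assumptions of this sketch, not the crate's definitions: the reduced `ContextConfig`, the `Arc<dyn TokenCounter>` field type, the `fits_budget` call site, and rounding the chars/4 heuristic up.

```rust
use std::sync::Arc;

/// Pluggable token estimation. `Send + Sync` so one counter can be shared
/// across concurrent loops (an assumption of this sketch).
pub trait TokenCounter: Send + Sync {
    fn count(&self, text: &str) -> usize;
}

/// Default heuristic: about four characters per token, rounded up.
pub struct HeuristicTokenCounter;

impl TokenCounter for HeuristicTokenCounter {
    fn count(&self, text: &str) -> usize {
        (text.chars().count() + 3) / 4
    }
}

/// Hypothetical reduced `ContextConfig` carrying the injection point.
pub struct ContextConfig {
    pub max_context_tokens: usize,
    pub token_counter: Arc<dyn TokenCounter>,
}

/// Example hot-path call site: it asks the configured counter instead of
/// calling the heuristic directly, so swapping counters needs no code change here.
pub fn fits_budget(config: &ContextConfig, texts: &[&str]) -> bool {
    let total: usize = texts.iter().map(|t| config.token_counter.count(t)).sum();
    total <= config.max_context_tokens
}
```

A precise, tokenizer-backed counter would be substituted the same way, by constructing `ContextConfig` with a different `Arc<dyn TokenCounter>`.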
Level 6 — Boss
Goal: The system is exceptional. It is fully tested, scalable, developer-friendly, and operates as a platform with a clear public API contract and operational runbooks.
Completion Criteria: The system passes load tests at 10x expected tool concurrency. Full test coverage includes unit, integration, property-based, and end-to-end tests. Public API documentation is complete. Operational runbooks cover all known failure modes.
Milestone 6.1 — Full Test Suite
- REQ-164: Unit tests for all three compaction levels (`level1`, `level2`, `level3`) including: no-op when under budget; exact budget boundary; message count edge cases (fewer messages than `keep_recent`/`keep_first`); correct ordering of head + marker + tail in level 3. (Source: [AR])
  - Depends on: REQ-056 through REQ-059
  - Definition of Done: All edge cases identified above have dedicated test cases that pass.
- REQ-165: Property-based tests for `compact_messages`: for any valid `(messages, config)` input, `total_tokens(compact_messages(messages, config)) <= budget`. (Source: [AR])
  - Depends on: REQ-056
  - Definition of Done: 10,000 random test cases all satisfy the budget invariant without panic.
- REQ-166: Unit tests for `delay_for_attempt`: verify exponential growth; verify that jitter stays in the `[0.8, 1.2]` range over 10,000 samples; verify that the `max_delay_ms` cap is respected. (Source: [AR])
  - Depends on: REQ-071
  - Definition of Done: All three assertions pass across the full retry range.
- REQ-167: Integration tests for each of the 7 provider protocols using a mock HTTP server: correct request format, correct response parsing, correct `StopReason` mapping, correct tool-call extraction. (Source: [AR])
  - Depends on: REQ-040 through REQ-042, REQ-120 through REQ-124
  - Definition of Done: Each provider has at least one happy-path integration test and one error-path test using a local mock server.
- REQ-168: Integration test for the MCP stdio transport: spawn a minimal mock MCP server subprocess; verify the initialize handshake, tool listing, and tool execution. (Source: [AR])
  - Depends on: REQ-114 through REQ-119
  - Definition of Done: The mock MCP server can be connected to, queried, and called; all three phases produce correct results.
- REQ-169: End-to-end agent loop tests using `MockProvider`: single-turn text response; multi-turn tool-call cycle; steering injection mid-run; follow-up queue; execution limit enforcement; context compaction trigger; input filter rejection. (Source: [AR])
  - Depends on: REQ-036 through REQ-090
  - Definition of Done: All seven scenarios have a passing automated test.
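REQ-166's three checks are easiest to write when the jitter factor is an argument rather than drawn inside `delay_for_attempt`. This is a minimal sketch with several assumptions. The `RetryConfig` fields `initial_delay_ms` and `backoff_multiplier` are hypothetical; only `max_delay_ms` appears in the roadmap. The cap is applied after jitter. A tiny 64-bit linear congruential generator stands in for a real random source, so the 10,000-sample check needs no external crates.

```rust
/// Hypothetical retry parameters; real `RetryConfig` field names may differ.
pub struct RetryConfig {
    pub initial_delay_ms: u64,
    pub backoff_multiplier: f64,
    pub max_delay_ms: u64,
}

/// Exponential backoff: initial * multiplier^attempt, scaled by `jitter`
/// (expected in [0.8, 1.2]), then capped at `max_delay_ms`.
pub fn delay_for_attempt(cfg: &RetryConfig, attempt: u32, jitter: f64) -> u64 {
    let base = cfg.initial_delay_ms as f64 * cfg.backoff_multiplier.powi(attempt as i32);
    let jittered = base * jitter;
    // Cap first, then round: the cap is a whole number, so rounding cannot exceed it.
    jittered.min(cfg.max_delay_ms as f64).round() as u64
}

/// Tiny deterministic PRNG (64-bit LCG) for reproducible sampling in tests.
pub struct Lcg(pub u64);

impl Lcg {
    /// Uniform f64 in [0, 1) built from the top 53 bits of the state.
    pub fn next_unit(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Jitter factor drawn uniformly from [0.8, 1.2].
    pub fn jitter(&mut self) -> f64 {
        0.8 + 0.4 * self.next_unit()
    }
}
```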
Milestone 6.2 — Load and Scale Testing
- REQ-170: Load test: run 100 parallel agents, each with 10 concurrent tool calls, using `MockProvider`. Verify no data races, no deadlocks, correct result ordering, and no memory leaks. (Source: [AR])
  - Depends on: REQ-045, REQ-085
  - Definition of Done: All 1,000 tool calls complete correctly with no panics, and tool results are in original call order.
- REQ-171: Load test: run a single agent for 1,000 turns with compaction enabled. Verify that token estimates stay bounded, memory does not grow without bound, and compaction fires when expected. (Source: [AR])
  - Depends on: REQ-056, REQ-060
  - Definition of Done: Memory usage stabilizes after compaction; compaction never drops messages protected by the `keep_first`/`keep_recent` invariants.
- REQ-172: Memory profile: verify that `Agent.messages` does not grow unboundedly in a long conversation with compaction enabled. (Source: [AR])
  - Depends on: REQ-056, REQ-060
  - Definition of Done: Message count stays within `keep_first + keep_recent + small_constant` after steady state is reached.
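The ordering guarantee REQ-170 checks comes from collecting results by call index rather than by completion time: results must appear in original call order even when calls finish out of order. Below is a minimal sketch of the harness shape. It uses scoped OS threads from the standard library as a stand-in for the async runtime and `MockProvider`. `fake_tool`, `run_calls_in_order`, `run_load_test`, and the result strings are hypothetical.

```rust
use std::thread;
use std::time::Duration;

/// Stand-in for a tool call: later calls sleep less, so they tend to finish
/// before earlier ones, which exercises the ordering guarantee.
fn fake_tool(agent: usize, call: usize) -> String {
    let delay_ms = (9 - call % 10) as u64;
    thread::sleep(Duration::from_millis(delay_ms));
    format!("agent{agent}-call{call}")
}

/// Run `calls` tool calls concurrently for one agent. Handles are joined in
/// spawn order, so completion order cannot reorder the results.
pub fn run_calls_in_order(agent: usize, calls: usize) -> Vec<String> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..calls)
            .map(|call| s.spawn(move || fake_tool(agent, call)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("tool thread panicked"))
            .collect()
    })
}

/// Run `agents` agents in parallel, each issuing `calls_per_agent` tool calls.
pub fn run_load_test(agents: usize, calls_per_agent: usize) -> Vec<Vec<String>> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..agents)
            .map(|agent| s.spawn(move || run_calls_in_order(agent, calls_per_agent)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("agent thread panicked"))
            .collect()
    })
}
```

Calling `run_load_test(100, 10)` reproduces the 1,000-call shape from the requirement. A smaller run keeps OS thread counts modest in quick checks.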
Milestone 6.3 — Public API Contract and Documentation
- REQ-173: Publish complete API reference documentation for all public types, traits, and functions, with usage examples for each primary use case from `../reference/glossary.md`. (Source: [OV])
  - Depends on: REQ-001 through REQ-163
  - Definition of Done: A developer with no prior context can build a working coding assistant and CLI REPL from the docs alone.
- REQ-174: Document all 7 provider integration contracts: authentication method, endpoint pattern, request format, response parsing notes, and any quirks (e.g., Bedrock ndjson, Google tool ID generation, Azure `api-key` header). (Source: [AR])
  - Depends on: REQ-040 through REQ-042, REQ-120 through REQ-124
  - Definition of Done: Each provider has a documentation page listing all fields from the integration contract table.
- REQ-175: Write and publish working example implementations: (1) a CLI REPL with `/quit`, `/clear`, and `/model` commands; (2) a coding assistant with all built-in tools; (3) a multi-agent pipeline with `SubAgentTool`. (Source: [OV])
  - Depends on: REQ-053, REQ-148
  - Definition of Done: All three examples compile and run end-to-end; the CLI REPL handles all three slash commands.
- REQ-176: Publish AgentSkills standard compliance documentation and an MCP integration guide. (Source: [OV])
  - Depends on: REQ-109 through REQ-113, REQ-114 through REQ-119
  - Definition of Done: Both guides include a "getting started" section that results in a working integration.
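For the CLI REPL example in REQ-175, slash-command handling is worth separating from the I/O loop so it can be unit-tested. This is a minimal sketch. `ReplInput`, `parse_repl_line`, and the `/model NAME` argument form are hypothetical, since the roadmap names only the three commands.

```rust
/// One line of REPL input, classified (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ReplInput {
    Quit,
    Clear,
    /// `/model` alone shows the current model; `/model NAME` switches to NAME.
    Model(Option<String>),
    Prompt(String),
    UnknownCommand(String),
    Empty,
}

pub fn parse_repl_line(line: &str) -> ReplInput {
    let line = line.trim();
    if line.is_empty() {
        return ReplInput::Empty;
    }
    // Anything not starting with '/' is a prompt for the agent.
    let Some(rest) = line.strip_prefix('/') else {
        return ReplInput::Prompt(line.to_string());
    };
    // Split into the command word and an optional argument.
    let mut parts = rest.splitn(2, char::is_whitespace);
    let cmd = parts.next().unwrap_or("");
    let arg = parts.next().map(str::trim).filter(|a| !a.is_empty());
    match cmd {
        "quit" => ReplInput::Quit,
        "clear" => ReplInput::Clear,
        "model" => ReplInput::Model(arg.map(str::to_string)),
        other => ReplInput::UnknownCommand(other.to_string()),
    }
}
```

The REPL loop would then match on the result: `Quit` breaks the loop and `Prompt` is sent to the agent. `Clear` could plausibly map onto `Agent::reset()` (REQ-065).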
Milestone 6.4 — Developer Tooling and Operational Readiness
- REQ-177: Package and publish the library with proper semantic versioning. The `openapi` feature is opt-in. Document all feature flags. (Source: [AR])
  - Depends on: REQ-158
  - Definition of Done: The library installs as a dependency; the `openapi` feature is absent from the default build; enabling it adds the adapter without breaking existing code.
- REQ-178: CI pipeline: run unit tests, integration tests (with mock servers), and `openapi`-feature tests on every commit. Gate provider live tests behind API key secrets. (Source: [AR])
  - Depends on: REQ-164 through REQ-169
  - Definition of Done: CI passes on every commit; provider live tests run in a separate gated workflow.
- REQ-179: Operational runbook covering: retry tuning (when to adjust `RetryConfig`); context overflow handling (choosing `ContextConfig` values); provider failover (switching providers on persistent failures); MCP server crash recovery; and a performance profiling guide. (Source: [AR])
  - Depends on: REQ-071 through REQ-077
  - Definition of Done: The runbook covers all five topics with actionable decision trees.
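The REQ-177 Definition of Done (the `openapi` feature is absent by default and additive when enabled) matches the usual Cargo feature pattern. The pattern has two parts. First, declare `openapi = []` under `[features]` in `Cargo.toml` without listing it in `default`. Second, gate code with `#[cfg(feature = "openapi")]`. This is a minimal sketch: the `Agent` here is a stripped-down stand-in, and the `with_openapi_spec` body is a placeholder, not the real builder.

```rust
/// Stripped-down stand-in for the real `Agent`.
pub struct Agent {
    tool_names: Vec<String>,
}

impl Agent {
    pub fn new() -> Self {
        Agent { tool_names: Vec::new() }
    }

    pub fn tool_count(&self) -> usize {
        self.tool_names.len()
    }

    /// Compiled only with `--features openapi`; default builds never see this
    /// method, so existing code is unaffected either way.
    #[cfg(feature = "openapi")]
    pub fn with_openapi_spec(mut self, _spec: &str) -> Self {
        // Placeholder: a real build would parse the spec and add one tool per operation.
        self.tool_names.push("openapi_tool".to_string());
        self
    }
}

/// Compile-time flag that tests or CI can assert on to confirm which build they run.
pub const OPENAPI_ENABLED: bool = cfg!(feature = "openapi");
```

CI (REQ-178) would build and test the crate twice, once with default features and once with `--features openapi`.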
Requirement Index
| REQ | Description | Level | Milestone | Source | Depends On |
|---|---|---|---|---|---|
| REQ-001 | Content enum (Text, Image, Thinking, ToolCall) | 1 | 1.1 | [AR] | — |
| REQ-002 | Message enum (User, Assistant, ToolResult) | 1 | 1.1 | [AR] | REQ-001, REQ-005, REQ-006 |
| REQ-003 | AgentMessage enum (Llm, Extension) | 1 | 1.1 | [AR] | REQ-002, REQ-004 |
| REQ-004 | ExtensionMessage struct | 1 | 1.1 | [AR] | — |
| REQ-005 | StopReason enum | 1 | 1.1 | [AR] | — |
| REQ-006 | Usage struct with cache_hit_rate() | 1 | 1.1 | [AR] | — |
| REQ-007 | AgentEvent enum (all variants) | 1 | 1.1 | [AR] | REQ-002, REQ-008 |
| REQ-008 | StreamDelta enum | 1 | 1.1 | [AR] | — |
| REQ-009 | ToolContext struct | 1 | 1.1 | [AR] | — |
| REQ-010 | ToolResult and ToolError types | 1 | 1.1 | [AR] | REQ-001 |
| REQ-011 | ContextConfig struct with defaults | 1 | 1.1 | [AR] | — |
| REQ-012 | ExecutionLimits and ExecutionTracker | 1 | 1.1 | [AR] | — |
| REQ-013 | RetryConfig with defaults | 1 | 1.1 | [AR] | — |
| REQ-014 | CacheConfig and CacheStrategy | 1 | 1.1 | [AR] | — |
| REQ-015 | StreamConfig struct | 1 | 1.1 | [AR] | REQ-014, REQ-016 |
| REQ-016 | ToolDefinition struct | 1 | 1.1 | [AR] | — |
| REQ-017 | QueueMode enum | 1 | 1.1 | [AR] | — |
| REQ-018 | Full Serialize/Deserialize on AgentMessage tree | 1 | 1.1 | [OV] | REQ-001–017 |
| REQ-019 | ThinkingLevel enum | 1 | 1.1 | [OV] | — |
| REQ-020 | StreamProvider trait and ProviderError enum | 1 | 1.2 | [AR] | REQ-002, REQ-015 |
| REQ-021 | AgentTool trait | 1 | 1.2 | [AR] | REQ-009, REQ-010 |
| REQ-022 | InputFilter trait | 1 | 1.2 | [OV] | — |
| REQ-023 | CompactionStrategy trait | 1 | 1.2 | [AR] | REQ-003, REQ-011 |
| REQ-024 | Agent::new() with all field defaults | 1 | 1.3 | [PS] | REQ-011–017, REQ-019–020 |
| REQ-025 | Builder methods: system_prompt, model, api_key, etc. | 1 | 1.3 | [PS] | REQ-024 |
| REQ-026 | Builder methods: tools, context_config, limits, etc. | 1 | 1.3 | [PS] | REQ-024 |
| REQ-027 | Steering/follow-up queues as Arc<Mutex | 1 | 1.3 | [AR] | REQ-003, REQ-024 |
| REQ-028 | AgentContext struct | 1 | 1.4 | [AR] | REQ-003, REQ-021 |
| REQ-029 | AgentLoopConfig struct | 1 | 1.4 | [OV] | REQ-011–017, REQ-023 |
| REQ-030 | MockProvider implementation | 1 | 1.5 | [AR] | REQ-020 |
| REQ-031 | Smoke test: Agent constructs without error | 1 | 1.5 | [OV] | REQ-024–030 |
| REQ-032 | Unbounded async event channel | 2 | 2.1 | [AR] | REQ-007 |
| REQ-033 | CancellationToken with child_token propagation | 2 | 2.1 | [AR] | — |
| REQ-034 | Agent::prompt() entry point | 2 | 2.2 | [PS] | REQ-002, REQ-035 |
| REQ-035 | Agent::prompt_messages_with_sender() | 2 | 2.2 | [PS] | REQ-027–029, REQ-033, REQ-036 |
| REQ-036 | agent_loop() implementation | 2 | 2.3 | [PS] | REQ-032, REQ-037 |
| REQ-037 | agent_loop_continue() implementation | 2 | 2.3 | [PS] | REQ-036 |
| REQ-038 | run_loop() inner loop (happy path) | 2 | 2.3 | [PS] | REQ-039, REQ-045, REQ-060 |
| REQ-039 | stream_assistant_response() (no retry) | 2 | 2.4 | [PS] | REQ-007–008, REQ-015, REQ-020, REQ-032 |
| REQ-040 | AnthropicProvider::stream() | 2 | 2.4 | [AR] | REQ-020, REQ-039 |
| REQ-041 | OpenAiCompatProvider::stream() | 2 | 2.4 | [AR] | REQ-020, REQ-039 |
| REQ-042 | ProviderRegistry with default() | 2 | 2.4 | [AR] | REQ-040, REQ-041 |
| REQ-043 | StopReason determination in providers | 2 | 2.4 | [PS] | REQ-005, REQ-040–041 |
| REQ-044 | Filter Extension messages before LLM call | 2 | 2.4 | [AR] | REQ-003, REQ-015 |
| REQ-045 | execute_tool_calls() (Parallel dispatch) | 2 | 2.5 | [PS] | REQ-046 |
| REQ-046 | execute_single_tool() | 2 | 2.5 | [PS] | REQ-007, REQ-009–010, REQ-021, REQ-033 |
| REQ-047 | BashTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-048 | ReadFileTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-049 | WriteFileTool::execute() | 2 | 2.5 | [AR] | REQ-010, REQ-021 |
| REQ-050 | EditFileTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-051 | ListFilesTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-052 | SearchTool::execute() (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-053 | default_tools() returning all 6 tools | 2 | 2.5 | [AR] | REQ-047–052 |
| REQ-054 | estimate_tokens() heuristic | 2 | 2.6 | [PS] | — |
| REQ-055 | content_tokens() and message_tokens() | 2 | 2.6 | [PS] | REQ-001, REQ-003, REQ-054 |
| REQ-056 | compact_messages() 3-tier cascade | 2 | 2.6 | [PS] | REQ-055, REQ-057–059 |
| REQ-057 | level1_truncate_tool_outputs() | 2 | 2.6 | [PS] | REQ-003, REQ-054 |
| REQ-058 | level2_summarize_old_turns() | 2 | 2.6 | [PS] | REQ-003, REQ-054 |
| REQ-059 | level3_drop_middle() and keep_within_budget() | 2 | 2.6 | [PS] | REQ-003, REQ-054 |
| REQ-060 | Integrate compaction in run_loop | 2 | 2.6 | [PS] | REQ-038, REQ-056 |
| REQ-061 | ExecutionTracker::record_turn() and check_limits() | 2 | 2.7 | [AR] | REQ-012 |
| REQ-062 | Execution limit enforcement in run_loop | 2 | 2.7 | [PS] | REQ-038, REQ-061 |
| REQ-063 | Agent::save_messages() | 2 | 2.8 | [OV] | REQ-018 |
| REQ-064 | Agent::restore_messages() | 2 | 2.8 | [OV] | REQ-018, REQ-063 |
| REQ-065 | Agent::reset() | 2 | 2.8 | [AR] | REQ-033 |
| REQ-066 | Agent::steer() and Agent::follow_up() | 2 | 2.8 | [AR] | REQ-027 |
| REQ-067 | Agent::abort() | 2 | 2.8 | [AR] | REQ-033, REQ-035 |
| REQ-068 | Input filter chain execution | 3 | 3.1 | [PS] | REQ-022, REQ-036 |
| REQ-069 | Reject → emit InputRejected + AgentEnd([]) | 3 | 3.1 | [PS] | REQ-068 |
| REQ-070 | Warn → append warning text to last user message | 3 | 3.1 | [PS] | REQ-068 |
| REQ-071 | delay_for_attempt() exponential backoff with jitter | 3 | 3.2 | [PS] | REQ-013 |
| REQ-072 | is_retryable() on ProviderError | 3 | 3.2 | [AR] | REQ-020 |
| REQ-073 | retry_after() on ProviderError | 3 | 3.2 | [AR] | REQ-020 |
| REQ-074 | Retry loop in stream_assistant_response | 3 | 3.2 | [PS] | REQ-039, REQ-071–073 |
| REQ-075 | ProviderError::classify() HTTP status routing | 3 | 3.3 | [PS] | REQ-020 |
| REQ-076 | is_context_overflow() phrase matching | 3 | 3.3 | [PS] | — |
| REQ-077 | Context overflow recovery trigger | 3 | 3.3 | [AR] | REQ-056, REQ-075–076 |
| REQ-078 | ToolError::Failed/InvalidArgs → error ToolResult | 3 | 3.4 | [AR] | REQ-010, REQ-046 |
| REQ-079 | ToolError::NotFound → "Tool X not found" | 3 | 3.4 | [PS] | REQ-046 |
| REQ-080 | ToolError::Cancelled → "Skipped" ToolResult | 3 | 3.4 | [AR] | REQ-010, REQ-046 |
| REQ-081 | Error stop reason handling in run_loop | 3 | 3.5 | [PS] | REQ-038, REQ-082 |
| REQ-082 | Aborted stop reason handling in run_loop | 3 | 3.5 | [PS] | REQ-038 |
| REQ-083 | Synthetic error Message::Assistant on provider failure | 3 | 3.5 | [PS] | REQ-002, REQ-039 |
| REQ-084 | execute_sequential() with steering check | 3 | 3.6 | [PS] | REQ-046, REQ-080 |
| REQ-085 | execute_batch() (Parallel) with post-batch steering | 3 | 3.6 | [PS] | REQ-046 |
| REQ-086 | Batched { size } dispatch with inter-batch steering | 3 | 3.6 | [PS] | REQ-085 |
| REQ-087 | Drain steering queue at start of outer loop | 3 | 3.7 | [PS] | REQ-038 |
| REQ-088 | Inject steering messages into pending after tools | 3 | 3.7 | [PS] | REQ-038, REQ-084–085 |
| REQ-089 | Follow-up queue check re-enters outer loop | 3 | 3.7 | [PS] | REQ-038 |
| REQ-090 | QueueMode::OneAtATime and QueueMode::All | 3 | 3.7 | [AR] | REQ-017, REQ-027 |
| REQ-091 | before_turn callback with abort-if-false | 3 | 3.8 | [PS] | REQ-038 |
| REQ-092 | after_turn callback on every turn | 3 | 3.8 | [PS] | REQ-038 |
| REQ-093 | on_error callback on Error stop reason | 3 | 3.8 | [PS] | REQ-081 |
| REQ-094 | BashTool deny patterns | 3 | 3.9 | [PS] | REQ-047 |
| REQ-095 | BashTool timeout + cancellation race | 3 | 3.9 | [PS] | REQ-047 |
| REQ-096 | BashTool output truncation | 3 | 3.9 | [PS] | REQ-047 |
| REQ-097 | BashTool confirm_fn callback | 3 | 3.9 | [PS] | REQ-047 |
| REQ-098 | ReadFileTool size limits (1MB text, 20MB image) | 3 | 3.9 | [PS] | REQ-048 |
| REQ-099 | ReadFileTool image path (base64, MIME detection) | 3 | 3.9 | [PS] | REQ-001, REQ-048 |
| REQ-100 | ReadFileTool cancellation check | 3 | 3.9 | [PS] | REQ-048 |
| REQ-101 | EditFileTool zero-match error with fuzzy hint | 3 | 3.9 | [PS] | REQ-050 |
| REQ-102 | EditFileTool multiple-match error | 3 | 3.9 | [PS] | REQ-050 |
| REQ-103 | EditFileTool cancellation check | 3 | 3.9 | [PS] | REQ-050 |
| REQ-104 | WriteFileTool cancellation check | 3 | 3.9 | [AR] | REQ-049 |
| REQ-105 | ListFilesTool timeout + max_results truncation | 3 | 3.9 | [PS] | REQ-051 |
| REQ-106 | SearchTool rg→grep fallback + cancellation | 3 | 3.9 | [PS] | REQ-052 |
| REQ-107 | is_streaming guard in prompt_messages_with_sender | 3 | 3.10 | [PS] | REQ-035 |
| REQ-108 | agent_loop_continue precondition validation | 3 | 3.10 | [PS] | REQ-037 |
| REQ-109 | SkillSet::load() with collision handling | 3 | 3.11 | [PS] | REQ-110 |
| REQ-110 | parse_frontmatter() with error variants | 3 | 3.11 | [PS] | — |
| REQ-111 | SkillSet::format_for_prompt() XML output | 3 | 3.11 | [PS] | REQ-109 |
| REQ-112 | SkillSet::load_dir() and SkillSet::merge() | 3 | 3.11 | [AR] | REQ-109 |
| REQ-113 | Agent::with_skills() builder | 3 | 3.11 | [PS] | REQ-111 |
| REQ-114 | McpClient::connect_stdio() with handshake | 3 | 3.12 | [PS] | REQ-115, REQ-116 |
| REQ-115 | McpClient::send_request() JSON-RPC 2.0 | 3 | 3.12 | [PS] | — |
| REQ-116 | McpClient::list_tools() and call_tool() | 3 | 3.12 | [PS] | REQ-115 |
| REQ-117 | McpToolAdapter implementing AgentTool | 3 | 3.12 | [AR] | REQ-001, REQ-021, REQ-116 |
| REQ-118 | All McpError variants → ToolError::Failed | 3 | 3.12 | [AR] | REQ-117 |
| REQ-119 | Agent::with_mcp_server_stdio() builder | 3 | 3.12 | [AR] | REQ-114, REQ-117 |
| REQ-120 | GoogleProvider::stream() (Gemini API) | 4 | 4.1 | [AR] | REQ-020 |
| REQ-121 | GoogleVertexProvider::stream() (Vertex AI) | 4 | 4.1 | [AR] | REQ-120 |
| REQ-122 | BedrockProvider::stream() (ConverseStream) | 4 | 4.1 | [AR] | REQ-020 |
| REQ-123 | OpenAiResponsesProvider::stream() | 4 | 4.1 | [AR] | REQ-020 |
| REQ-124 | AzureOpenAiProvider::stream() | 4 | 4.1 | [AR] | REQ-123 |
| REQ-125 | All 7 providers in ProviderRegistry::default() | 4 | 4.1 | [AR] | REQ-042, REQ-120–124 |
| REQ-126 | CacheStrategy::Auto breakpoint placement | 4 | 4.2 | [AR] | REQ-014, REQ-040 |
| REQ-127 | CacheStrategy::Manual and Disabled | 4 | 4.2 | [AR] | REQ-126 |
| REQ-128 | Cache token counts in Usage | 4 | 4.2 | [AR] | REQ-006, REQ-040 |
| REQ-129 | ThinkingLevel → Anthropic thinking parameter | 4 | 4.3 | [AR] | REQ-019, REQ-040 |
| REQ-130 | ThinkingLevel → OpenAI reasoning_effort | 4 | 4.3 | [AR] | REQ-019, REQ-041 |
| REQ-131 | Parse Thinking content from streaming responses | 4 | 4.3 | [AR] | REQ-001, REQ-008, REQ-040 |
| REQ-132 | McpClient::connect_http() | 4 | 4.4 | [AR] | REQ-115 |
| REQ-133 | Agent::with_mcp_server_http() with prefix support | 4 | 4.4 | [AR] | REQ-117, REQ-132 |
| REQ-134 | MCP stdio shutdown (EOF + kill) | 4 | 4.4 | [AR] | REQ-114 |
| REQ-135 | Structured retry logging | 4 | 4.5 | [PS] | REQ-074 |
| REQ-136 | ContextTracker hybrid token tracking | 4 | 4.5 | [AR] | REQ-054–055 |
| REQ-137 | ToolResult.details per-tool metadata | 4 | 4.5 | [AR] | REQ-047–052 |
| REQ-138 | OpenApiAuth credential redaction in debug | 4 | 4.6 | [AR] | — |
| REQ-139 | BashTool default deny-pattern list | 4 | 4.6 | [PS] | REQ-094 |
| REQ-140 | CancellationToken::child_token() propagation | 4 | 4.7 | [PS] | REQ-033, REQ-046 |
| REQ-141 | Sub-agent inherits parent cancel token | 4 | 4.7 | [PS] | REQ-033, REQ-140 |
| REQ-142 | on_update callback → ToolExecutionUpdate event | 4 | 4.8 | [AR] | REQ-007, REQ-046 |
| REQ-143 | on_progress callback → ProgressMessage event | 4 | 4.8 | [AR] | REQ-007, REQ-046 |
| REQ-144 | Agent::prompt_with_sender() | 4 | 4.8 | [AR] | REQ-034 |
| REQ-145 | transform_context/convert_to_llm hooks | 4 | 4.8 | [PS] | REQ-039 |
| REQ-146 | Agent::with_compaction_strategy() builder | 4 | 4.8 | [AR] | REQ-023, REQ-060 |
| REQ-147 | ModelConfig struct and application in OpenAiCompat | 4 | 4.8 | [AR] | REQ-041 |
| REQ-148 | SubAgentTool::execute() | 5 | 5.1 | [PS] | REQ-036, REQ-157 |
| REQ-149 | extract_final_text() | 5 | 5.1 | [PS] | REQ-002 |
| REQ-150 | Sub-agent event forwarding to parent channel | 5 | 5.1 | [PS] | REQ-007, REQ-148 |
| REQ-151 | SubAgentTool builder API | 5 | 5.1 | [AR] | REQ-021, REQ-148 |
| REQ-152 | OpenApiAdapter::from_str() JSON/YAML parsing | 5 | 5.2 | [AR] | REQ-153–156 |
| REQ-153 | OpenAPI parameter classification | 5 | 5.2 | [AR] | REQ-021 |
| REQ-154 | OpenAPI HTTP execution pipeline | 5 | 5.2 | [AR] | REQ-021 |
| REQ-155 | OperationFilter variants | 5 | 5.2 | [AR] | REQ-152 |
| REQ-156 | name_prefix tool naming | 5 | 5.2 | [AR] | REQ-152 |
| REQ-157 | from_file() and from_url() spec sources | 5 | 5.2 | [AR] | REQ-152 |
| REQ-158 | OpenAPI builders on Agent + feature flag | 5 | 5.2 | [AR] | REQ-026, REQ-157 |
| REQ-159 | Anthropic OAuth auth path | 5 | 5.3 | [AR] | REQ-040 |
| REQ-160 | Anthropic InputJsonDelta tool-arg streaming | 5 | 5.3 | [AR] | REQ-040 |
| REQ-161 | [AMBIGUOUS] AgentEnd on abort policy | 5 | 5.4 | [PS] | REQ-067, REQ-082 |
| REQ-162 | [AMBIGUOUS] TokenCounter abstraction point | 5 | 5.4 | [OV] | REQ-054 |
| REQ-163 | [AMBIGUOUS] Sub-agent error propagation policy | 5 | 5.4 | [PS] | REQ-149 |
| REQ-164 | Compaction algorithm unit tests | 6 | 6.1 | [AR] | REQ-056–059 |
| REQ-165 | Property-based tests: budget invariant | 6 | 6.1 | [AR] | REQ-056 |
| REQ-166 | Retry backoff unit tests | 6 | 6.1 | [AR] | REQ-071 |
| REQ-167 | Provider integration tests (mock HTTP server) | 6 | 6.1 | [AR] | REQ-040–042, REQ-120–124 |
| REQ-168 | MCP stdio integration test | 6 | 6.1 | [AR] | REQ-114–119 |
| REQ-169 | End-to-end agent loop tests (MockProvider) | 6 | 6.1 | [AR] | REQ-036–090 |
| REQ-170 | Load test: 100 parallel agents, 10 concurrent tools | 6 | 6.2 | [AR] | REQ-045, REQ-085 |
| REQ-171 | Load test: 1,000-turn single agent with compaction | 6 | 6.2 | [AR] | REQ-056, REQ-060 |
| REQ-172 | Memory profile: message growth is bounded | 6 | 6.2 | [AR] | REQ-056, REQ-060 |
| REQ-173 | Public API reference documentation | 6 | 6.3 | [OV] | REQ-001–163 |
| REQ-174 | Provider integration contract documentation | 6 | 6.3 | [AR] | REQ-040–042, REQ-120–124 |
| REQ-175 | Working example implementations | 6 | 6.3 | [OV] | REQ-053, REQ-148 |
| REQ-176 | AgentSkills + MCP integration guides | 6 | 6.3 | [OV] | REQ-109–119 |
| REQ-177 | Library packaging with feature flags | 6 | 6.4 | [AR] | REQ-158 |
| REQ-178 | CI pipeline with gated live tests | 6 | 6.4 | [AR] | REQ-164–169 |
| REQ-179 | Operational runbooks | 6 | 6.4 | [AR] | REQ-071–077 |
| REQ-180 | ContinuationKind enum (Default, Rerun { tag }, Branch { tag }) | 4 | 4.9 | [AR] | — |
| REQ-181 | TurnTrigger enum (User, Continuation, SubAgent, Branch) | 4 | 4.9 | [AR] | — |
| REQ-182 | before_loop/after_loop hooks on AgentLoopConfig | 4 | 4.9 | [AR] | REQ-029, REQ-036 |
| REQ-183 | before_tool_execution/after_tool_execution hooks on AgentLoopConfig | 4 | 4.9 | [AR] | REQ-029, REQ-046 |
| REQ-184 | before_tool_execution_update/after_tool_execution_update hooks | 4 | 4.9 | [AR] | REQ-142, REQ-183 |
| REQ-185 | Guaranteed event hook ordering invariant | 4 | 4.9 | [AR] | REQ-182–184, REQ-091–092 |
| REQ-186 | provider_id() -> &str required method on StreamProvider; implement in all 7 providers | 4 | 4.9 | [AR] | REQ-020, REQ-125 |
| REQ-187 | config_id: Option<String> on AgentLoopConfig; auto-derived when None | 4 | 4.9 | [AR] | REQ-029, REQ-186 |
| REQ-188 | agent_id/session_id UUID fields on Agent; stable for Agent lifetime | 4 | 4.9 | [AR] | REQ-024 |
| REQ-189 | loop_counters and last_loop_id on Agent; next_loop_id() helper | 4 | 4.9 | [AR] | REQ-024, REQ-187, REQ-188 |
| REQ-190 | agent_id, session_id, loop_id, parent_loop_id, continuation_kind on AgentContext; write-back in agent_loop | 4 | 4.9 | [AR] | REQ-028, REQ-180, REQ-188 |
| REQ-191 | Assert agent_id/session_id are Some in agent_loop_continue | 4 | 4.9 | [AR] | REQ-037, REQ-190 |
| REQ-192 | AgentStart event: agent_id, session_id, loop_id, parent_loop_id, continuation_kind fields | 4 | 4.9 | [AR] | REQ-007, REQ-180, REQ-190 |
| REQ-193 | TurnStart.triggered_by: TurnTrigger; Branch continuation uses Branch on first turn | 4 | 4.9 | [AR] | REQ-007, REQ-181, REQ-190 |
| REQ-194 | child_loop_id: Option<String> on ToolResult and ToolExecutionEnd; set by sub-agent tools | 4 | 4.9 | [AR] | REQ-010, REQ-007, REQ-148 |
| REQ-195 | SubAgentTool::with_parent_loop_id(loop_id) builder; child AgentContext includes parent_loop_id | 4 | 4.9 | [AR] | REQ-151, REQ-190 |
Known Ambiguities
Items marked [AMBIGUOUS] in the spec that require a design decision before implementation:
| ID | Description | Suggested Resolution | Level Introduced |
|---|---|---|---|
| AMB-001 | AgentEnd emission on abort: the pseudocode says AgentEnd is NOT emitted on abort, but notes this may vary depending on where in the loop cancellation is detected (provider Start/Done events may still arrive). | Define a clear policy: AgentEnd is ALWAYS emitted when the loop exits, including on abort, so callers can rely on the channel always closing cleanly. Implement this by routing every exit path, including a detected cancellation, through the single point where the loop emits AgentEnd. | 5 |
| AMB-002 | Token counting precision — estimate_tokens uses a 4-chars-per-token heuristic explicitly noted as imprecise. No integration with tiktoken or similar is specified. | Introduce a TokenCounter trait (or function pointer) on ContextConfig that defaults to the 4-char heuristic but can be overridden by the caller. This keeps the default zero-dependency while enabling precision via injection. | 5 |
| AMB-003 | Sub-agent error propagation — when a child agent_loop produces only error or tool-only messages (no Text in the final assistant message), extract_final_text returns a fixed fallback string. It is unclear whether the calling tool should return Ok(ToolResult { fallback }) or Err(ToolError::Failed(...)). | Return Ok(ToolResult) with the fallback text always. If the sub-agent produced an error assistant message, include the error_message field in the fallback text so the parent LLM can see and react to it. | 5 |
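
A minimal sketch of the AMB-002 resolution, assuming the counter is stored on ContextConfig as a trait object. The names `TokenCounter`, `CharHeuristic`, and `with_token_counter` are hypothetical, and only the counter field of ContextConfig is shown. Rounding up (so any non-empty text costs at least one token) is a choice of this sketch, not something the spec states.

```rust
use std::sync::Arc;

/// Hypothetical hook for AMB-002: estimates how many tokens a piece of text uses.
pub trait TokenCounter: Send + Sync {
    fn count(&self, text: &str) -> usize;
}

/// Default: the 4-characters-per-token heuristic, rounded up.
pub struct CharHeuristic;

impl TokenCounter for CharHeuristic {
    fn count(&self, text: &str) -> usize {
        // chars() counts Unicode scalar values, so multi-byte text is not over-counted.
        (text.chars().count() + 3) / 4
    }
}

/// Hypothetical slice of ContextConfig showing only the injectable counter.
pub struct ContextConfig {
    pub token_counter: Arc<dyn TokenCounter>,
}

impl Default for ContextConfig {
    fn default() -> Self {
        // Zero-dependency default: no tokenizer crate is required.
        Self { token_counter: Arc::new(CharHeuristic) }
    }
}

impl ContextConfig {
    /// Replace the default with a caller-supplied counter (e.g. tokenizer-backed).
    pub fn with_token_counter(mut self, counter: Arc<dyn TokenCounter>) -> Self {
        self.token_counter = counter;
        self
    }

    /// Single entry point that budget and compaction code would call.
    pub fn estimate_tokens(&self, text: &str) -> usize {
        self.token_counter.count(text)
    }
}
```

Keeping the heuristic as the default preserves the zero-dependency build, while a caller who needs precision only has to implement one method.
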
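
A sketch of the AMB-003 resolution under simplified types: `Content` and `Message` are trimmed to the fields this function reads, and `FALLBACK_TEXT` is a placeholder rather than the spec's actual fixed string. Joining multiple text blocks with newlines is also an assumption. The sub-agent tool would wrap the returned string in `Ok(ToolResult { .. })` in every case instead of returning `Err(ToolError::Failed(..))`.

```rust
/// Simplified stand-ins for the spec's Content and Message types.
pub enum Content {
    Text { text: String },
    ToolCall { id: String, name: String },
}

pub enum Message {
    User { content: Vec<Content> },
    Assistant { content: Vec<Content>, error_message: Option<String> },
}

/// Placeholder wording; not the spec's actual fallback string.
pub const FALLBACK_TEXT: &str = "(sub-agent produced no text output)";

/// Text of the last assistant message, or a fallback that surfaces error_message.
pub fn extract_final_text(messages: &[Message]) -> String {
    // Walk backwards to the most recent assistant message.
    let last_assistant = messages.iter().rev().find_map(|m| match m {
        Message::Assistant { content, error_message } => Some((content, error_message)),
        _ => None,
    });

    let Some((content, error_message)) = last_assistant else {
        return FALLBACK_TEXT.to_string();
    };

    // Keep only Text blocks; tool calls carry no user-facing answer.
    let texts: Vec<&str> = content
        .iter()
        .filter_map(|c| match c {
            Content::Text { text } => Some(text.as_str()),
            _ => None,
        })
        .collect();

    if !texts.is_empty() {
        return texts.join("\n");
    }
    // No text: fall back, but let the parent LLM see why the child failed.
    match error_message {
        Some(err) => format!("{} Error: {}", FALLBACK_TEXT, err),
        None => FALLBACK_TEXT.to_string(),
    }
}
```

Because the error text is folded into a successful result, the parent LLM sees the failure as ordinary tool output and can decide whether to retry, delegate differently, or report back.
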
Level Completion Checklist
- Level 1 — Survive: All core types, traits, and the Agent struct initialize without error; smoke test passes.
- Level 2 — Useful: Text prompt → LLM call → tool execution → final response works end-to-end; all 6 built-in tools execute on valid input; message persistence round-trips correctly.
- Level 3 — Smart: Input filters, retry, provider error classification, tool errors, execution limits, steering/follow-up queues, lifecycle callbacks, tool safety guards, skill loading, and MCP client all handle their error paths without panicking.
- Level 4 — Professional: All 7 provider protocols implemented; prompt caching and extended thinking integrated; cancellation propagates to all I/O; structured logging in place; `ContextTracker` accurate.
- Level 5 — Creative: Sub-agent delegation works end-to-end; OpenAPI adapter generates callable tools; Anthropic OAuth and `InputJsonDelta` streaming are correct; all three ambiguities have documented resolutions and implementations.
- Level 6 — Boss: All test suites pass (unit, property-based, integration, end-to-end, load); public API docs and examples are complete; CI runs automatically; operational runbooks are written.
Session & Loop Identity — Future Scenarios
Added: 2026-03-22 Status: Foundation implemented (loop_id, ContinuationKind, parent_loop_id, child_loop_id). The scenarios below build on this foundation but are out of scope for the initial change.
The current implementation covers:
- `loop_id` derived from `session_id + config_id + counter` (config owns its identity)
- `ContinuationKind` enum: `Default`, `Rerun { tag }`, `Branch { tag }`
- `parent_loop_id` for ancestry tracking across reruns/branches
- `child_loop_id` on `ToolExecutionEnd` for parent→sub-agent traceability
- Asserts in `agent_loop_continue` requiring `agent_id`/`session_id` to be set
- `TurnTrigger::Branch` fires on the first turn of a `Branch` continuation
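
The identity scheme listed above can be sketched as follows. The spec states only that `loop_id` is derived from `session_id`, `config_id`, and a counter; the `session:config:n` format and the per-config counter map in the hypothetical `LoopCounters` type (standing in for the Agent's `loop_counters` and `last_loop_id` fields) are assumptions. `first_turn_trigger` likewise assumes that non-Branch continuations report `TurnTrigger::Continuation`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ContinuationKind {
    Default,
    Rerun { tag: String },
    Branch { tag: String },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnTrigger {
    User,
    Continuation,
    SubAgent,
    Branch,
}

/// Hypothetical stand-in for the Agent's loop_counters / last_loop_id fields.
#[derive(Debug, Default)]
pub struct LoopCounters {
    counters: HashMap<String, u64>,
    pub last_loop_id: Option<String>,
}

impl LoopCounters {
    /// Derive the next loop_id; the "session:config:n" format is an assumption.
    pub fn next_loop_id(&mut self, session_id: &str, config_id: &str) -> String {
        // One counter per config, so each config owns its own sequence.
        let n = self.counters.entry(config_id.to_string()).or_insert(0);
        *n += 1;
        let id = format!("{session_id}:{config_id}:{n}");
        self.last_loop_id = Some(id.clone());
        id
    }
}

/// Trigger reported on the first turn of a continuation.
pub fn first_turn_trigger(kind: &ContinuationKind) -> TurnTrigger {
    match kind {
        ContinuationKind::Branch { .. } => TurnTrigger::Branch,
        ContinuationKind::Default | ContinuationKind::Rerun { .. } => TurnTrigger::Continuation,
    }
}
```
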
Future: HITL Resume
Scenario: User cancels a loop mid-execution (via Agent::abort()), reviews the partial
output, then resumes. The loop was aborted at some known message boundary.
Mechanism: Caller restores context.messages to the desired resume point, then calls
agent_loop_continue(Rerun | Branch). The kind communicates intent:
- `Rerun`: resume from the same point (same logical path, treated as a retry)
- `Branch`: resume with modifications (e.g., an injected steering message, a different system prompt, or a tweaked tool result), producing a path that diverges from the original
What needs to be built: A context.messages checkpoint API. The current Agent::messages()
getter returns a slice, but the caller needs to be able to snapshot and restore it. The save_messages
/ restore_messages methods on Agent already cover the messages themselves (JSON round-trip), but not
the identity fields. The missing piece is a higher-level Agent::checkpoint() -> Checkpoint and
Agent::restore(checkpoint) pair that bundles the full state (messages + loop_id + session_id) for a
clean HITL resume without manual field management.
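
A minimal sketch of the proposed checkpoint pair, assuming a stripped-down `Agent` that holds only the state the text names. Messages are plain strings to keep the example self-contained, and the `Checkpoint` field layout is an assumption; it stores `last_loop_id` so that a resumed loop can name its parent.

```rust
/// Everything needed for a clean HITL resume, bundled as one value.
#[derive(Debug, Clone, PartialEq)]
pub struct Checkpoint {
    pub messages: Vec<String>,
    pub session_id: String,
    pub last_loop_id: Option<String>,
}

/// Stripped-down stand-in for Agent: only the state a checkpoint must capture.
pub struct Agent {
    messages: Vec<String>,
    session_id: String,
    last_loop_id: Option<String>,
}

impl Agent {
    pub fn new(session_id: &str) -> Self {
        Self { messages: Vec::new(), session_id: session_id.to_string(), last_loop_id: None }
    }

    pub fn messages(&self) -> &[String] {
        &self.messages
    }

    /// Snapshot messages and identity together so they cannot drift apart.
    pub fn checkpoint(&self) -> Checkpoint {
        Checkpoint {
            messages: self.messages.clone(),
            session_id: self.session_id.clone(),
            last_loop_id: self.last_loop_id.clone(),
        }
    }

    /// Replace the current state wholesale with a previously taken checkpoint.
    pub fn restore(&mut self, cp: Checkpoint) {
        self.messages = cp.messages;
        self.session_id = cp.session_id;
        self.last_loop_id = cp.last_loop_id;
    }
}
```

After `restore`, the caller would choose `Rerun` to retry the same path, or edit the restored messages and choose `Branch`.
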
Future: Checkpoint Restore
Scenario: Context is serialized to persistent storage (database, file) and later loaded for a new run — either by the same process after restart or by a different process instance.
Mechanism: Same as HITL resume at the loop level. The caller deserializes context.messages
and sets the identity fields (agent_id, session_id, loop_id) to their original values, then
calls agent_loop_continue(Branch). The parent_loop_id points to the last loop ID from the
original session, maintaining the ancestry chain across process boundaries.
What needs to be built: A serializable AgentSnapshot type that captures everything needed
to resume: messages, agent_id, session_id, last_loop_id, and any relevant config fields.
AgentSnapshot::save(path) / AgentSnapshot::load(path) convenience methods. The snapshot does
NOT include the provider config (API keys, base URLs) — those remain in the caller's environment.
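
The restore step can be sketched as a pure function from a loaded snapshot to the identity fields of a `Branch` continuation. `AgentSnapshot` follows the field list above (minus config fields); `ResumeContext` is a hypothetical subset of `AgentContext`, with `agent_id`/`session_id` as `Option`s so that the existing `agent_loop_continue` assertions apply. `save`/`load` are omitted: they would most likely wrap a serialization crate such as serde, which this dependency-free sketch avoids.

```rust
/// Resume state; provider config (API keys, base URLs) is deliberately absent.
#[derive(Debug, Clone)]
pub struct AgentSnapshot {
    pub messages: Vec<String>,
    pub agent_id: String,
    pub session_id: String,
    pub last_loop_id: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ContinuationKind {
    Default,
    Rerun { tag: String },
    Branch { tag: String },
}

/// Hypothetical subset of AgentContext relevant to identity on resume.
#[derive(Debug)]
pub struct ResumeContext {
    pub messages: Vec<String>,
    pub agent_id: Option<String>,
    pub session_id: Option<String>,
    pub parent_loop_id: Option<String>,
    pub continuation_kind: ContinuationKind,
}

/// Restore original identity and point ancestry at the session's last loop,
/// so the chain survives a process restart.
pub fn resume_as_branch(snapshot: AgentSnapshot, tag: &str) -> ResumeContext {
    ResumeContext {
        messages: snapshot.messages,
        agent_id: Some(snapshot.agent_id),
        session_id: Some(snapshot.session_id),
        parent_loop_id: snapshot.last_loop_id,
        continuation_kind: ContinuationKind::Branch { tag: tag.to_string() },
    }
}
```
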
Future: Parallel Exploration
Scenario: Multiple branches from the same checkpoint are run concurrently — e.g., A/B testing two different tool result injections, or evaluating three different system prompt variants on the same conversation prefix.
Mechanism: The caller snapshots the context at a branching point, then calls multiple
agent_loop_continue(Branch) concurrently, each with a different modification to context.messages
before the call. Each concurrent call produces an independent event stream with its own loop_id
and parent_loop_id pointing to the same branch-point loop.
What needs to be built: No new primitives are needed — agent_loop_continue and AgentContext
already support this. The caller is responsible for cloning the context and making independent calls.
A higher-level Agent::explore_branches(Vec<BranchSpec>) -> Vec<Receiver<AgentEvent>> convenience
method could simplify the pattern but is not required for correctness.
Concurrency note: Each branch needs its own AgentContext (owned), its own CancellationToken,
and its own mpsc::UnboundedSender. tokio::spawn each agent_loop_continue call independently.
The parent task collects results from all branch receivers.
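
The fan-out pattern can be sketched with standard-library threads and `std::sync::mpsc` channels substituted for `tokio::spawn` and `mpsc::UnboundedSender`, so the example runs without dependencies. `run_branch` is a hypothetical placeholder for `agent_loop_continue(Branch)` that emits two string events, and an `Arc<AtomicBool>` stands in for `CancellationToken`. Each branch owns a cloned context, a cancel flag, and a sender; the parent keeps the matching receivers and cancel handles.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Arc;
use std::thread;

/// Hypothetical owned context: shared prefix plus one branch's modification.
#[derive(Clone)]
pub struct Context {
    pub messages: Vec<String>,
    pub parent_loop_id: String,
}

/// What the parent keeps per branch: its own cancel flag and event stream.
pub struct BranchHandle {
    pub cancel: Arc<AtomicBool>,
    pub events: Receiver<String>,
}

/// Placeholder for agent_loop_continue(Branch).
fn run_branch(ctx: Context, cancel: Arc<AtomicBool>, tx: Sender<String>) {
    // A real loop would check cancellation between turns and tool calls, not just once.
    if cancel.load(Ordering::Relaxed) {
        return;
    }
    let last = ctx.messages.last().cloned().unwrap_or_default();
    // send() fails only if the receiver was dropped, i.e. nobody is listening.
    let _ = tx.send(format!("AgentStart parent={}", ctx.parent_loop_id));
    let _ = tx.send(format!("AgentEnd last={last}"));
}

/// Spawn one independent branch per modification of the same checkpoint.
pub fn explore_branches(base: &Context, modifications: &[&str]) -> Vec<BranchHandle> {
    let mut handles = Vec::with_capacity(modifications.len());
    for m in modifications {
        let mut ctx = base.clone(); // each branch owns its context
        ctx.messages.push(m.to_string());
        let cancel = Arc::new(AtomicBool::new(false)); // not shared across branches
        let (tx, rx) = mpsc::channel(); // independent event stream
        let worker_cancel = Arc::clone(&cancel);
        thread::spawn(move || run_branch(ctx, worker_cancel, tx));
        handles.push(BranchHandle { cancel, events: rx });
    }
    handles
}
```

Because every branch has its own flag, setting `handles[i].cancel` to `true` would affect only branch `i`; the base context is never mutated.
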
Future: Auto Origin/Continue Selection
Scenario: The caller wants to send a new message to the agent without knowing whether the
current context requires an origin call (agent_loop) or a continuation (agent_loop_continue).
Mechanism: Inspect context.messages.last():
- No messages → `agent_loop` (fresh start)
- Last message is `User` or `ToolResult` → `agent_loop_continue` (already awaiting a model response)
- Last message is `Assistant` → `agent_loop` with the new prompt (start a new turn)
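
The selection rule above can be sketched as a small pure function, assuming the caller can reduce `context.messages` to the role of each message. `Role`, `Dispatch`, and `select_dispatch` are hypothetical names; a real `Agent::send` would then build and issue the corresponding call.

```rust
/// Role of each message in context.messages (simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
    ToolResult,
}

/// Which entry point a hypothetical Agent::send would dispatch to.
#[derive(Debug, PartialEq, Eq)]
pub enum Dispatch {
    /// Empty context: agent_loop as a fresh start.
    OriginFresh,
    /// Model response pending: agent_loop_continue.
    Continue,
    /// Previous turn finished: agent_loop with the new prompt.
    OriginNewTurn,
}

/// Decide origin vs. continuation purely from the last message.
pub fn select_dispatch(history: &[Role]) -> Dispatch {
    match history.last() {
        None => Dispatch::OriginFresh,
        Some(Role::User) | Some(Role::ToolResult) => Dispatch::Continue,
        Some(Role::Assistant) => Dispatch::OriginNewTurn,
    }
}
```
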
What needs to be built: An Agent::send(message) method (or similar) that encapsulates
this logic. It would inspect the context state, build the appropriate call type, and dispatch.
This trades explicit caller control for convenience and is opt-in.