phi-core — Project Overview

1. Purpose Statement

phi-core is a Rust async library for building stateful, multi-turn LLM agents that can autonomously execute tools to accomplish tasks. The library solves the core engineering problems of agent construction: routing between many LLM provider APIs through a unified interface, running a prompt-then-tool-call loop until the model signals completion, streaming real-time events to UI consumers, and automatically managing context windows so conversations do not exceed model token limits. It is designed to be embedded as a dependency in application code: it provides no standalone binary, no HTTP server, and no user interface of its own.
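
The prompt-then-tool-call loop described above can be illustrated with a small, synchronous sketch. It is not phi-core's implementation: ModelReply, Tool, run_loop, shout, and scripted_model are hypothetical names invented for this example, the "model" is a scripted function rather than a provider call, and the history is a list of plain strings.

```rust
use std::collections::HashMap;

/// What the mock model returns for one call: a final answer or a tool request.
/// Hypothetical type for illustration; not phi-core's message model.
enum ModelReply {
    Final(String),
    ToolCall { name: String, input: String },
}

/// A tool is modeled here as a plain function from text to text.
type Tool = fn(&str) -> String;

/// Runs prompt -> model -> tool -> repeat until the model returns a final
/// answer or `max_turns` model calls have been made.
fn run_loop(
    prompt: &str,
    tools: &HashMap<&str, Tool>,
    model: impl Fn(&[String]) -> ModelReply,
    max_turns: usize,
) -> Vec<String> {
    let mut history = vec![format!("user: {prompt}")];
    for _ in 0..max_turns {
        match model(&history) {
            ModelReply::Final(text) => {
                history.push(format!("assistant: {text}"));
                break;
            }
            ModelReply::ToolCall { name, input } => {
                // Unknown tools produce an error result instead of a panic,
                // so the model can see the failure on the next call.
                let output = match tools.get(name.as_str()) {
                    Some(tool) => tool(&input),
                    None => format!("error: unknown tool {name}"),
                };
                history.push(format!("tool[{name}]: {output}"));
            }
        }
    }
    history
}

/// Example tool: upper-cases its input.
fn shout(input: &str) -> String {
    input.to_uppercase()
}

/// Scripted stand-in for an LLM: requests one tool call, then finishes.
fn scripted_model(history: &[String]) -> ModelReply {
    if history.len() == 1 {
        ModelReply::ToolCall { name: "shout".to_string(), input: "hi".to_string() }
    } else {
        ModelReply::Final("done".to_string())
    }
}
```

Registering shout under the name "shout" and calling run_loop with scripted_model yields a three-entry history: the user prompt, the tool result, and the final assistant message.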

2. Key Capabilities

Capability | Source Location
Multi-turn conversation loop (prompt → LLM → tool call → repeat) | src/agent_loop/
Support for 20+ LLM providers via 7 distinct API protocols | src/provider/
Real-time event streaming over an async channel | src/types/ (AgentEvent), src/agent_loop/
Parallel, sequential, or batched tool execution | src/agent_loop/ (execute_tool_calls())
Context compaction via CompactionBlock overlays (legacy: tiered compact_messages()) | src/context/ (compaction is now modeled via CompactionBlock)
Built-in coding tools: bash execution, file read/write/edit, directory listing, grep search | src/tools/
Sub-agent delegation: run an isolated child agent as a tool | src/agents/sub_agent.rs
Model Context Protocol (MCP) client for stdio and HTTP tool servers | src/mcp/
AgentSkills system: load instruction sets from directory-based skill files | src/context/skills.rs
OpenAPI tool auto-generation from spec files or URLs (optional feature) | src/openapi/
JSON serialization of entire conversation history for persistence | src/types/ (all types derive Serialize/Deserialize)
Exponential-backoff retry for rate-limit and network errors | src/provider/retry.rs
Prompt caching hints for compatible providers (Anthropic) | src/types/ (CacheConfig)
Extended thinking / reasoning mode | src/types/ (ThinkingLevel)
Lifecycle callbacks: before/after each turn, on error | src/agent_loop/ (BeforeTurnFn, AfterTurnFn, OnErrorFn)
Loop-level hooks: setup/teardown around each complete agent run | src/agent_loop/ (BeforeLoopFn, AfterLoopFn)
Tool-level hooks: intercept each tool execution and streaming update | src/agent_loop/ (BeforeToolExecutionFn, AfterToolExecutionFn, BeforeToolExecutionUpdateFn, AfterToolExecutionUpdateFn)
Agent identity: stable agent_id / session_id / loop_id for cross-loop traceability | src/agents/basic_agent.rs, src/types/
Evaluational parallelism: agent_loop_parallel() runs N AgentLoopConfigs concurrently on the same prompt, evaluates results via the pluggable EvaluationStrategy trait, and delivers the best outcome. Built-in strategies: TransparentEvaluation, PickFirstEvaluation, TokenEfficientEvaluation, ElaborateEvaluation, LlmJudgeEvaluation (with iterative compaction to satisfy the judge's comprehension criteria). ParallelLoopStart/ParallelLoopEnd events bracket execution. Session continuity: selected_context feeds directly into agent_loop_continue(). | src/agent_loop/ (agent_loop_parallel), src/agent_loop/evaluation.rs, src/types/
Continuation kinds: Initial, Default, Rerun, Branch, Compaction variants for origin, retry, explore, and compaction semantics | src/types/ (ContinuationKind), src/agent_loop/
Input filtering: moderation, PII redaction, injection detection | src/types/ (InputFilter)
User steering mid-run: inject messages between tool calls | src/agents/basic_agent.rs (steering queue), src/agent_loop/
Follow-up work queuing: append more tasks after agent would stop | src/agents/basic_agent.rs (follow-up queue), src/agent_loop/
Execution limits: max turns, max total tokens, max duration | src/context/ (ExecutionLimits, ExecutionTracker)
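
As an illustration of the exponential-backoff retry capability listed above, the following sketch computes a capped, doubling delay per attempt. RetrySketch and its fields are hypothetical and do not mirror the actual RetryConfig in src/provider/retry.rs. The doubling factor and the cap are assumptions, and real implementations often add random jitter, which is omitted here.

```rust
use std::time::Duration;

/// Hypothetical retry settings; field names are illustrative only.
struct RetrySketch {
    base_delay: Duration,
    max_delay: Duration,
    max_attempts: u32,
}

impl RetrySketch {
    /// Delay before retry number `attempt` (0-based): base * 2^attempt,
    /// capped at `max_delay`. Returns None once attempts are exhausted.
    fn delay_for(&self, attempt: u32) -> Option<Duration> {
        if attempt >= self.max_attempts {
            return None;
        }
        // Saturate instead of overflowing for very large attempt numbers.
        let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
        let delay = self
            .base_delay
            .checked_mul(factor)
            .unwrap_or(self.max_delay)
            .min(self.max_delay);
        Some(delay)
    }
}
```

With a 500 ms base, a 4 s cap, and five attempts, the delays are 500 ms, 1 s, 2 s, 4 s, and 4 s, after which delay_for returns None and the caller would surface the error.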

3. Inputs & Outputs

Inputs

Input | Format | Description
User prompt | Vec<AgentMessage> or String | Text (or multi-content) messages to start or continue a conversation
System prompt | String | Instruction set defining agent behavior, injected at each LLM call
Tool definitions | Vec<Box<dyn AgentTool>> | Executable tools exposed to the LLM via JSON Schema
LLM provider config | ModelConfig | Single provider identity card: id, api_key, base_url, api: ApiProtocol, cost, compat. Factory methods: ModelConfig::anthropic(), ::openai(), ::local(), ::google(), ::openrouter(). Pass to BasicAgent::new() or AgentLoopConfig.model_config.
Steering messages | Vec<AgentMessage> via queue | User-injected messages that interrupt mid-run tool execution
Follow-up messages | Vec<AgentMessage> via queue | Queued tasks appended when the agent would otherwise stop
Context config | ContextConfig | Token budget, compaction parameters
Execution limits | ExecutionLimits | Max turns, tokens, duration
Skill directories | Vec<Path> | Directories containing SKILL.md files
MCP server commands | Command string, args, env | Stdio or HTTP MCP server specifications
OpenAPI spec | File path, URL, or YAML/JSON string | API specs to auto-generate tools from
Cancellation token | CancellationToken | External abort signal
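
To make the execution-limits input concrete, here is a minimal sketch of a tracker that checks turn, token, and duration caps. LimitsSketch and TrackerSketch are hypothetical stand-ins for ExecutionLimits and ExecutionTracker. Their fields, the use of Option for "no limit", and the "greater than or equal" comparison are assumptions, not the library's documented behavior.

```rust
use std::time::{Duration, Instant};

/// Hypothetical limits; `None` means "no cap" in this sketch.
struct LimitsSketch {
    max_turns: Option<usize>,
    max_total_tokens: Option<u64>,
    max_duration: Option<Duration>,
}

/// Accumulates usage for one run and reports the first limit that was hit.
struct TrackerSketch {
    limits: LimitsSketch,
    turns: usize,
    tokens: u64,
    started: Instant,
}

impl TrackerSketch {
    fn new(limits: LimitsSketch) -> Self {
        Self { limits, turns: 0, tokens: 0, started: Instant::now() }
    }

    /// Call once after each completed turn with that turn's token usage.
    fn record_turn(&mut self, tokens_used: u64) {
        self.turns += 1;
        self.tokens += tokens_used;
    }

    /// Returns the name of the first exceeded limit, if any.
    fn exceeded(&self) -> Option<&'static str> {
        if self.limits.max_turns.is_some_and(|m| self.turns >= m) {
            return Some("max_turns");
        }
        if self.limits.max_total_tokens.is_some_and(|m| self.tokens >= m) {
            return Some("max_total_tokens");
        }
        if self.limits.max_duration.is_some_and(|d| self.started.elapsed() >= d) {
            return Some("max_duration");
        }
        None
    }
}
```

A loop would call exceeded() before starting each turn and stop with an explanatory message when it returns Some.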

Outputs

Output | Format | Description
Agent event stream | UnboundedReceiver<AgentEvent> | Real-time stream of all events (text deltas, tool calls, results, errors)
Final messages | Vec<AgentMessage> | All new messages produced in the run (returned from agent_loop())
Serialized conversation | JSON | Complete message history, serializable for persistence
Tool results | Embedded in AgentEvent::ToolExecutionEnd | Structured result of each tool call
Usage statistics | Usage struct per turn | Input/output/cache token counts per LLM call
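
The event-stream output follows a producer/consumer pattern: the run sends events into a channel and the application drains the receiver. The sketch below shows that pattern with the standard library's blocking std::sync::mpsc channel and a thread, used here only as a stand-in for the async unbounded channel. EventSketch, spawn_run, and collect_text are hypothetical names, and the event variants are a simplified subset chosen for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Simplified, hypothetical event type; the real AgentEvent has more variants.
#[derive(Debug, PartialEq)]
enum EventSketch {
    TextDelta(String),
    ToolExecutionEnd { name: String, is_error: bool },
    AgentEnd,
}

/// Starts a fake "run" on a background thread and returns the receiving end.
fn spawn_run() -> mpsc::Receiver<EventSketch> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let events = [
            EventSketch::TextDelta("Hello, ".to_string()),
            EventSketch::ToolExecutionEnd { name: "bash".to_string(), is_error: false },
            EventSketch::TextDelta("world".to_string()),
            EventSketch::AgentEnd,
        ];
        for event in events {
            // Stop producing if the consumer dropped the receiver.
            if tx.send(event).is_err() {
                return;
            }
        }
    });
    rx
}

/// Consumes events until AgentEnd, concatenating the streamed text deltas.
fn collect_text(rx: mpsc::Receiver<EventSketch>) -> String {
    let mut out = String::new();
    for event in rx {
        match event {
            EventSketch::TextDelta(chunk) => out.push_str(&chunk),
            EventSketch::AgentEnd => break,
            // A UI would render tool results here; this sketch ignores them.
            EventSketch::ToolExecutionEnd { .. } => {}
        }
    }
    out
}
```

Calling collect_text(spawn_run()) reassembles the streamed deltas into a single string. An async consumer would follow the same shape with an awaited receive call in place of the blocking iterator.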

4. Actors & Use Cases

Application Developer

The primary consumer. Embeds phi-core as a library dependency.

Use Case | How Triggered
Build a coding assistant | Create Agent, attach built-in tools, call agent.prompt("...")
Build a CLI REPL | Loop reading stdin, call agent.prompt(), render events (see examples/cli.rs)
Persist conversation across sessions | Call agent.save_messages() → JSON → agent.restore_messages()
Run a task autonomously with limits | Set ExecutionLimits, observe AgentEvent::AgentEnd
Interrupt a running agent | Call agent.steer(message) while event loop is running
Chain specialized agents | Attach SubAgentTool instances to a parent agent
Use third-party tools | Connect to an MCP server via agent.with_mcp_server_stdio()
Expose a REST API as tools | Load OpenAPI spec via agent.with_openapi_file()

End User (via application)

Interacts through the application wrapping this library. Use cases match what the application exposes (e.g., CLI prompts in examples/cli.rs: /quit, /clear, /model).

LLM Provider

External service receiving structured HTTP requests. The library sends conversation history and tool schemas; the provider returns streaming token deltas and final messages. Providers never call back into the library.

MCP Server

External process exposing tools over the Model Context Protocol. The library connects as a client via stdio pipe or HTTP. The server exposes tool definitions that are adapted into AgentTool instances.

Sub-Agent

A child instance of the agent loop spawned internally when a SubAgentTool is called. Operates with its own fresh context and toolset. Results are returned to the parent as a ToolResult.

5. Constraints & Non-Goals

  • No built-in HTTP server. The library is embeddable only; serving the agent over HTTP requires external frameworks.
  • No user interface. UI rendering (text display, color, input handling) is the application's responsibility (see examples/cli.rs for a reference implementation).
  • No authentication management. API keys must be supplied by the caller. The library does not fetch, rotate, or cache credentials.
  • Single event consumer per run. agent_loop() returns a single UnboundedReceiver<AgentEvent>. Fan-out to multiple consumers requires application-level bridging.
  • No agent-to-agent networking. Sub-agents run in-process only. No remote agent delegation.
  • No persistent storage. Conversation state is held in memory. Serialization to disk is the caller's responsibility (the library provides serialize/deserialize helpers).
  • No built-in precision token counting. The default HeuristicTokenCounter uses 4 characters per token. A pluggable TokenCounter trait (src/context/token.rs) allows callers to supply a custom counter (e.g., tiktoken-based), but no precision implementation ships with the library.
  • No multi-modal generation. Images can be sent to the model (as Content::Image), but image generation is not supported.
  • No structured output / JSON mode. The library passes raw messages; enforcing structured output is the caller's responsibility via system prompt.
  • Skipped tools on steering. When steering messages arrive mid-batch, the remaining tool calls in that batch are skipped with an error result; their outputs are never computed. This is documented behavior, not a bug.
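
The token-counting constraint above can be illustrated with a pluggable counter trait and a 4-characters-per-token heuristic. TokenCounterSketch, FourCharsPerToken, WhitespaceCounter, and fits_budget are hypothetical names. The real TokenCounter trait's method signature is not shown in this document, and whether the library's heuristic counts characters or bytes, or rounds up, is an assumption here.

```rust
/// Hypothetical counter trait; the real TokenCounter signature may differ.
trait TokenCounterSketch {
    fn count(&self, text: &str) -> usize;
}

/// Heuristic: roughly one token per 4 characters, rounded up (an assumption).
struct FourCharsPerToken;

impl TokenCounterSketch for FourCharsPerToken {
    fn count(&self, text: &str) -> usize {
        (text.chars().count() + 3) / 4
    }
}

/// An alternative counter a caller might plug in: one token per word.
struct WhitespaceCounter;

impl TokenCounterSketch for WhitespaceCounter {
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Checks whether the messages fit a token budget under the given counter.
fn fits_budget(counter: &dyn TokenCounterSketch, messages: &[&str], budget: usize) -> bool {
    let total: usize = messages.iter().map(|m| counter.count(m)).sum();
    total <= budget
}
```

Because fits_budget takes a trait object, swapping the heuristic for a tokenizer-backed counter changes only the argument passed in, not the budgeting logic.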

6. Key Terminology Glossary

Term | Definition
Agent | The runtime interface trait (src/agents/agent.rs). Applications program against this trait to remain independent of the specific implementation. BasicAgent (src/agents/basic_agent.rs) is the default in-memory implementation: it owns conversation history, tools, ModelConfig (provider identity + auth + cost), and configuration. Construction: BasicAgent::new(ModelConfig::anthropic(...)). The application-facing entry point.
Agent Loop | The recursive execution cycle (src/agent_loop/) that calls the LLM, processes tool calls, checks steering, and repeats until the LLM stops or limits are hit.
Turn | One complete LLM call plus the resulting tool executions. Bounded by TurnStart/TurnEnd events. Materialized as a Turn struct on LoopRecord.turns (src/session/model.rs).
Steering | A Vec<AgentMessage> injected into the running loop between tool executions. Used to redirect the agent mid-task without restarting it.
Follow-up | A Vec<AgentMessage> queued to be injected after the agent would naturally stop. Extends the run without creating a new agent_loop() call.
ModelConfig | The single, complete description of a provider connection (src/provider/model.rs). Fields: id (model name sent to API), name (display label), api: ApiProtocol (wire-protocol dispatch key), provider (logging label), base_url, api_key, cost: CostConfig, headers, compat: Option<OpenAiCompat>. Factory methods: anthropic(), openai(), local(), google(), openrouter(). Passed to BasicAgent::new(), SubAgentTool::new(), and AgentLoopConfig.model_config.
ApiProtocol | Enum that selects which HTTP wire format to use: AnthropicMessages, OpenAiCompletions, OpenAiResponses, AzureOpenAiResponses, GoogleGenerativeAi, GoogleVertex, BedrockConverseStream. Used by ProviderRegistry as a dispatch key.
StreamProvider | The trait (src/provider/traits.rs) that any LLM backend must implement. Has a single method, stream(), that takes a StreamConfig and sends StreamEvents.
AgentTool | The trait (src/types/) that any executable tool must implement. Methods: name(), label(), description(), parameters_schema(), execute().
ToolContext | A struct passed to AgentTool::execute() containing the call ID, name, cancellation token, and optional progress callbacks.
AgentEvent | The streaming event enum emitted to the consumer during a run. Covers agent lifecycle, turn lifecycle, message streaming, and tool execution.
StreamDelta | A partial content update emitted during LLM streaming: Text, Thinking, or ToolCallDelta.
StopReason | Why the LLM ended its response. Variants: Stop (natural end), Length (token limit), ToolUse (returned tool calls), Error (failure), Aborted (cancellation), MaxTurns, UserStop, Handoff, GuardRail, ContextCompacted, Paused.
AgentMessage | The top-level message enum stored in the conversation history. Either Llm(LlmMessage) (sent to the LLM; LlmMessage wraps Message + optional TurnId for turn tracking) or Extension(ExtensionMessage) (app-only metadata).
Message | The LLM-protocol message enum: User, Assistant, or ToolResult.
Content | A single content block within a message: Text, Image (base64), Thinking, or ToolCall.
Usage | Token count metadata returned with each Assistant message: input, output, cache_read, cache_write, total_tokens.
ContextConfig | Configuration for automatic context compaction: token budget, lines to keep per tool output, number of recent/first messages to preserve.
CompactionStrategy | A trait for customizing how messages are compacted when the token budget is exceeded. The default implementation uses 3 tiers.
CompactionBlock | The model used by the compaction system to represent compacted message regions. Replaces the previous inline approach in compact_messages() with a structured block-based representation.
ExecutionLimits | Hard caps on agent execution: max_turns, max_total_tokens, max_duration, max_cost: Option<f64>. When exceeded, the loop appends a system message and stops.
ToolExecutionStrategy | How multiple tool calls from one LLM response are dispatched: Sequential, Parallel (default), or Batched { size }.
CacheConfig / CacheStrategy | Controls prompt caching breakpoint placement for providers that support it (Anthropic). Strategies: Auto, Disabled, Manual.
ThinkingLevel | Controls extended reasoning depth: Off, Minimal, Low, Medium, High. Translated to provider-specific parameters.
AgentSkills | A directory-based system for loading instruction files (SKILL.md) that extend agent capabilities. Compatible with the AgentSkills open standard.
MCP | Model Context Protocol. A standard for tool servers that communicate over stdio or HTTP. The library acts as an MCP client.
SubAgentTool | An AgentTool implementation that, when called by the parent LLM, spawns a complete child agent_loop() with isolated context.
InputFilter | A synchronous trait applied to user text before the LLM call. Returns Pass, Warn(text) (appended to message), or Reject(reason) (aborts run).
ExtensionMessage | An AgentMessage variant that is not sent to the LLM. Used for application-specific metadata (UI state, notifications) stored in conversation history.
ContextTracker | Tracks context token usage using a hybrid of real provider-reported counts and local heuristic estimates for messages since the last report.
ProviderError | The error enum returned by StreamProvider::stream(). Variants: Api, Network, Auth, RateLimited, ContextOverflow, Cancelled, Other.
ToolDefinition | A schema-only description of a tool sent to the LLM (name, description, JSON Schema parameters). Does not include the execute function.
RetryConfig | Exponential-backoff configuration for retrying RateLimited and Network provider errors.
AgentLoopConfig | A flat configuration struct passed to agent_loop() / agent_loop_continue() bundling all behavioral settings. Required field: model_config: ModelConfig (provider identity, auth, cost rates). Optional provider_override: Option<Arc<dyn StreamProvider>> bypasses registry dispatch (used in tests).
QueueMode | Controls how queued messages (steering/follow-ups) are consumed per read. OneAtATime (default): pops only the first queued message. All: drains the entire queue at once.
McpContent | A content item returned by an MCP tool call. Variants: Text { text } and Image { data: base64, mimeType }.
OpenApiAuth | Authentication method for OpenAPI requests. Variants: None, Bearer(token), ApiKey { header, value }. Token/value is redacted in debug output.
OperationFilter | Controls which OpenAPI operations become tools. Variants: All, ByOperationId, ByTag, ByPathPrefix. Operations without an operationId are always skipped.
agent_id | A UUID v4 string generated once when BasicAgent::new() is called. Stable for the lifetime of the agent instance. Included in every AgentStart event to identify which agent produced the run.
session_id | A UUID v4 string generated once when BasicAgent::new() is called. Groups all loops (origin + continuations) that belong to one logical session. Stable for the lifetime of the agent instance.
loop_id | A string of the form "{session_id}.{config_id}.{N}" that uniquely identifies one agent_loop / agent_loop_continue call. The config_id segment is either caller-supplied or auto-derived from provider + model + thinking level. N is a per-config_id monotonic counter. Included in every AgentStart event.
ContinuationKind | Labels how an agent_loop or agent_loop_continue call relates to prior loops. Set on AgentContext.continuation_kind before calling. Variants: Initial (origin agent_loop call; the #[default]), Default (unspecified continuation), Rerun { tag } (retry the same scenario from an equivalent context), Branch { tag } (explore a different execution path), Compaction (context-compacted continuation). Tags are RFC 3339 UTC timestamps. Surfaced in AgentStart.continuation_kind.
TurnTrigger | Identifies what caused a turn to begin. Emitted in TurnStart.triggered_by. Variants: User (first turn of an Initial continuation, i.e., the origin agent_loop call), SubAgent (running as a sub-agent via SubAgentTool), Continuation (subsequent turns, tool round-trips, Default/Rerun continuations, and steering-injected turns; renamed from FollowUp), Branch (first turn of a ContinuationKind::Branch continuation).
BeforeLoopFn / AfterLoopFn | Loop-level lifecycle hooks on AgentLoopConfig. BeforeLoopFn fires before AgentStart; returning false aborts the run before it begins. AfterLoopFn fires after AgentEnd with the new messages and accumulated usage.
BeforeToolExecutionFn / AfterToolExecutionFn | Tool-level lifecycle hooks on AgentLoopConfig. BeforeToolExecutionFn fires before ToolExecutionStart; returning false skips the tool call. AfterToolExecutionFn fires after ToolExecutionEnd with the tool name, call ID, and error flag.
BeforeToolExecutionUpdateFn / AfterToolExecutionUpdateFn | Streaming tool update hooks on AgentLoopConfig. They fire around each ToolExecutionUpdate event emitted when a tool calls ctx.on_update(partial). BeforeToolExecutionUpdateFn returns false to suppress the event (the tool keeps running; the final ToolResult is unaffected). AfterToolExecutionUpdateFn fires after the event if it was not suppressed.
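
The QueueMode entry in the glossary describes two ways of consuming queued steering or follow-up messages. A minimal sketch of those semantics follows. QueueModeSketch and take_queued are hypothetical names, and messages are plain strings rather than AgentMessage values.

```rust
use std::collections::VecDeque;

/// Hypothetical mirror of the two consumption modes described in the glossary.
#[derive(Clone, Copy)]
enum QueueModeSketch {
    /// Pop only the first queued message per read.
    OneAtATime,
    /// Drain the entire queue in one read.
    All,
}

/// Removes and returns the messages that one read would consume.
fn take_queued(queue: &mut VecDeque<String>, mode: QueueModeSketch) -> Vec<String> {
    match mode {
        // `pop_front` yields at most one message; an empty queue gives an empty Vec.
        QueueModeSketch::OneAtATime => queue.pop_front().into_iter().collect(),
        QueueModeSketch::All => queue.drain(..).collect(),
    }
}
```

With three queued messages, a OneAtATime read returns the first and leaves two behind, while an All read returns everything that remains and empties the queue.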