Context Management

Long-running agents accumulate messages until the conversation no longer fits in the model's context window. phi-core provides token tracking, overflow detection, tiered compaction, and execution limits to keep that under control.

The context module is split into sub-modules: token, config, tracker, compaction, strategy, compact_messages, execution, orchestration.

Token Estimation

Fast estimation without external tokenizer dependencies:

use phi_core::context::{estimate_tokens, message_tokens, total_tokens};

estimate_tokens("Hello world");          // ~3 tokens (chars / 4)
message_tokens(&agent_message);          // estimate for a single message
total_tokens(&messages);                 // estimate for all messages
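To make the chars/4 heuristic concrete, here is a minimal standalone sketch. The function names are hypothetical, and counting Unicode scalar values and rounding up are assumptions; phi-core's own estimator may count or round differently.

```rust
/// Hypothetical stand-in for a chars/4 heuristic (not phi-core's code).
/// Assumption: count Unicode scalar values and round up.
pub fn estimate_tokens_sketch(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Sums the per-text estimates for a whole conversation.
pub fn total_tokens_sketch(texts: &[&str]) -> usize {
    texts.iter().map(|t| estimate_tokens_sketch(t)).sum()
}
```

With these assumptions, "Hello world" (11 characters) comes out at 3 tokens, matching the approximate figure quoted above.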

Context Tracking

ContextTracker combines real token counts from provider responses with estimates for messages added since the last response, which is more accurate than estimation alone:

use phi_core::context::ContextTracker;

let mut tracker = ContextTracker::new();

// After each assistant response, record the real usage:
tracker.record_usage(&assistant_usage, message_index);

// Get current context size (real usage + estimated trailing):
let tokens = tracker.estimate_context_tokens(agent.messages());

// After compaction, reset the tracker:
tracker.reset();

When no usage data has been recorded, the tracker falls back to chars/4 estimation for the whole conversation.
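The hybrid approach can be sketched without phi-core as follows. TrackerSketch and its single field are hypothetical, and messages are simplified to plain strings; the sketch only illustrates "real count up to the last recorded message, plus an estimate for whatever follows it".

```rust
/// Hypothetical tracker sketch; names and fields are illustrative only.
#[derive(Default)]
pub struct TrackerSketch {
    /// (real token count reported by the provider, index of the last message it covers)
    last_usage: Option<(usize, usize)>,
}

impl TrackerSketch {
    pub fn record_usage(&mut self, context_tokens: usize, message_index: usize) {
        self.last_usage = Some((context_tokens, message_index));
    }

    pub fn estimate_context_tokens(&self, messages: &[String]) -> usize {
        // chars/4 heuristic, rounded up (an assumption).
        let estimate = |m: &String| (m.chars().count() + 3) / 4;
        match self.last_usage {
            // Real usage is known: only the trailing messages are estimated.
            Some((real, idx)) if idx < messages.len() => {
                real + messages[idx + 1..].iter().map(estimate).sum::<usize>()
            }
            // No usable data: estimate every message.
            _ => messages.iter().map(estimate).sum(),
        }
    }

    pub fn reset(&mut self) {
        self.last_usage = None;
    }
}
```

Resetting after compaction matters because the recorded count describes the conversation before it was compacted, and the stored index no longer points at the same message.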

Context Overflow Detection

When the context exceeds a model's window, providers return overflow errors. phi-core detects these automatically across all major providers.

HTTP-level detection

Providers that validate the request before streaming (Google, Bedrock, Vertex) reject it at the HTTP level, and phi-core surfaces this as ProviderError::ContextOverflow:

use phi_core::provider::ProviderError;

match agent.prompt("...").await {
    // The loop already handles this, but you can also match it:
    Err(ProviderError::ContextOverflow { message }) => {
        // Compact and retry
    }
    _ => {}
}

ProviderError::classify() auto-detects overflow from error messages covering Anthropic, OpenAI, Google, AWS Bedrock, xAI, Groq, OpenRouter, llama.cpp, LM Studio, MiniMax, Kimi, GitHub Copilot, and generic patterns.
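As a rough illustration of message-based classification, the sketch below does a case-insensitive substring match. The function name and the hint list are assumptions chosen for illustration; the patterns that ProviderError::classify() actually matches are not listed in this document.

```rust
/// Illustrative substrings only; not the list phi-core uses.
const OVERFLOW_HINTS: &[&str] = &[
    "maximum context length",
    "prompt is too long",
    "context window",
    "too many tokens",
];

/// Returns true when the error text looks like a context overflow.
pub fn looks_like_context_overflow(error_message: &str) -> bool {
    let lower = error_message.to_lowercase();
    OVERFLOW_HINTS.iter().any(|hint| lower.contains(hint))
}
```

Substring matching is deliberately loose: provider wording changes over time, so a generic fallback pattern tends to be more robust than exact string comparison.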

Message-level detection

SSE-based providers (Anthropic, OpenAI) return overflow as a StopReason::Error message. Check with:

if message.is_context_overflow() {
    // Compact and retry
}

Handling overflow in your application

phi-core provides the detection and building blocks. Your application wires the compaction strategy:

// Proactive: check before each prompt
let tokens = tracker.estimate_context_tokens(agent.messages());
if tokens > context_window - reserve {
    let compacted = compact_messages(agent.messages().to_vec(), &config);
    agent.replace_messages(compacted);
}

// Reactive: catch overflow errors
// ... on ContextOverflow or message.is_context_overflow():
//   compact, then retry with agent.continue_loop()

LLM-based summarization (asking the model to summarize old messages) belongs in your application layer; phi-core provides replace_messages() and compact_messages() as building blocks for it.
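The reactive path can be sketched as a bounded retry loop. Everything here is hypothetical (SketchError, send_with_compaction, the max_retries parameter, string messages); in a real application the send closure would call the agent and the compact closure would call compact_messages() or your own summarizer.

```rust
/// Hypothetical error type standing in for a provider error.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    ContextOverflow,
    Other(String),
}

/// Calls `send`; on overflow, calls `compact` and retries, at most `max_retries` times.
pub fn send_with_compaction<T>(
    messages: &mut Vec<String>,
    max_retries: usize,
    mut send: impl FnMut(&[String]) -> Result<T, SketchError>,
    mut compact: impl FnMut(&mut Vec<String>),
) -> Result<T, SketchError> {
    let mut attempts = 0;
    loop {
        match send(messages.as_slice()) {
            Err(SketchError::ContextOverflow) if attempts < max_retries => {
                attempts += 1;
                compact(&mut *messages);
            }
            // Success, a non-overflow error, or retries exhausted.
            other => return other,
        }
    }
}
```

Bounding the retries avoids looping forever when compaction cannot shrink the conversation enough, for example when the system prompt alone exceeds the window.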

ContextConfig

pub struct ContextConfig {
    pub max_context_tokens: usize,      // Default: 100,000
    pub system_prompt_tokens: usize,    // Default: 4,000
    pub compaction: CompactionConfig,   // Primary compaction settings

    // Custom token counter (serde-skipped). None → HeuristicTokenCounter (chars/4).
    pub token_counter: Option<Arc<dyn TokenCounter>>,

    // Legacy backward-compat fields (prefer CompactionConfig equivalents):
    pub keep_recent: usize,             // Default: 10
    pub keep_first: usize,              // Default: 2
    pub tool_output_max_lines: usize,   // Default: 50
}

pub struct CompactionConfig {
    // ── WHEN to compact ──
    pub compact_at_pct: f64,                     // Default: 0.90 (90%)
    pub compact_budget_threshold_pct: f64,       // Default: 0.05 (5%)
    pub compaction_scope: CompactionScope,       // Default: FixedCount(3)

    // ── HOW to compact ──
    pub keep_first_turns: usize,                 // Default: 2
    pub keep_recent_turns: usize,                // Default: 10
    pub max_summary_tokens: usize,               // Default: 2_000 (budget, not per-turn)
    pub tool_output_max_lines: usize,            // Default: 50
}
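As a worked example of the compact_at_pct trigger, the sketch below computes the threshold from the defaults. The function name is hypothetical, and two details are assumptions: that the percentage applies directly to max_context_tokens, and that reaching the threshold (not only exceeding it) triggers compaction.

```rust
/// Returns true once `used_tokens` reaches `compact_at_pct` of the window.
/// Assumption: the percentage is applied to `max_context_tokens` as-is.
pub fn should_compact(used_tokens: usize, max_context_tokens: usize, compact_at_pct: f64) -> bool {
    let threshold = (max_context_tokens as f64 * compact_at_pct) as usize;
    used_tokens >= threshold
}
```

With the defaults (100,000 tokens at 0.90), this sketch would trigger at 90,000 tokens.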

CompactionScope

Controls how many earlier loops are included in compaction and context loading:

| Variant | Description |
|---|---|
| FixedCount(usize) | Compact a fixed number of earlier loops on the active chain. Default: FixedCount(3). |
| TokenBudget | Walk the chain backward, accumulating per-loop token estimates, and stop when max_context_tokens would be exceeded. Loops whose raw messages exceed the budget are still included; their compacted summaries will fit. |

See compaction.md for full details on the non-destructive overlay model.
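One possible reading of the TokenBudget walk is sketched below. The function name and the flat slice of per-loop estimates are hypothetical, and each estimate is assumed to describe the form that would actually be loaded (for an oversized loop, its compacted summary rather than its raw messages).

```rust
/// Returns how many of the most recent loops fit within `budget` tokens.
/// `loop_tokens` is ordered oldest to newest.
pub fn loops_within_budget(loop_tokens: &[usize], budget: usize) -> usize {
    let mut total = 0usize;
    let mut count = 0usize;
    // Walk backward from the newest loop.
    for &tokens in loop_tokens.iter().rev() {
        // Stop before the budget would be exceeded.
        if total.saturating_add(tokens) > budget {
            break;
        }
        total += tokens;
        count += 1;
    }
    count
}
```

Walking newest-first means the most recent loops always win when the budget is tight, which usually matches what the agent needs next.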

Tiered Compaction

compact_messages() tries each level in order, stopping as soon as messages fit the budget:

Level 1: Truncate Tool Outputs

Replaces long tool outputs with their head and tail (the first N/2 and last N/2 lines). This is the cheapest level: it preserves conversation structure and typically reduces context size by 50-70% in coding sessions.
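A minimal sketch of head-plus-tail truncation follows. The function name and the marker text are assumptions, and N is taken as a max_lines parameter; phi-core's actual marker and edge-case handling may differ.

```rust
/// Keeps the first and last `max_lines / 2` lines and marks the gap.
/// The marker wording is hypothetical.
pub fn truncate_tool_output(output: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() <= max_lines {
        return output.to_string();
    }
    let half = max_lines / 2;
    let omitted = lines.len() - 2 * half;
    let mut kept: Vec<String> = lines[..half].iter().map(|l| l.to_string()).collect();
    kept.push(format!("[... {} lines truncated ...]", omitted));
    kept.extend(lines[lines.len() - half..].iter().map(|l| l.to_string()));
    kept.join("\n")
}
```

Keeping both ends is useful for build and test output, where the command echo sits at the top and the error summary at the bottom.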

Level 2: Summarize Old Turns

Keeps the last keep_recent messages in full detail. Older assistant messages are replaced with one-line summaries like "[Summary] [Assistant used 3 tool(s)]", and their tool results are dropped.
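The idea can be sketched with a simplified message type. The Msg enum and summarize_old_turns are hypothetical; only the summary wording is taken from the example above, and how phi-core summarizes assistant messages without tool calls is not covered here.

```rust
/// Simplified, hypothetical message type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Msg {
    User(String),
    Assistant { text: String, tool_calls: usize },
    ToolResult(String),
    Summary(String),
}

/// Keeps the last `keep_recent` messages intact; older assistant messages
/// become one-line summaries and older tool results are dropped.
pub fn summarize_old_turns(messages: Vec<Msg>, keep_recent: usize) -> Vec<Msg> {
    let cutoff = messages.len().saturating_sub(keep_recent);
    let mut out = Vec::with_capacity(messages.len());
    for (i, msg) in messages.into_iter().enumerate() {
        if i >= cutoff {
            out.push(msg);
            continue;
        }
        match msg {
            Msg::Assistant { tool_calls, .. } => out.push(Msg::Summary(format!(
                "[Summary] [Assistant used {} tool(s)]",
                tool_calls
            ))),
            Msg::ToolResult(_) => {} // dropped
            other => out.push(other),
        }
    }
    out
}
```

User messages are left untouched in this sketch, since they usually carry the task description the agent still needs.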

Level 3: Drop Middle Messages

Keeps keep_first messages from the start and keep_recent from the end, dropping everything in between. A marker message notes how many were removed.
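A sketch of this last-resort level, using plain strings as messages. The function name and the marker wording are assumptions, not phi-core's output.

```rust
/// Keeps `keep_first` messages from the start and `keep_recent` from the end,
/// replacing everything in between with a single marker (wording is hypothetical).
pub fn drop_middle(messages: Vec<String>, keep_first: usize, keep_recent: usize) -> Vec<String> {
    if messages.len() <= keep_first + keep_recent {
        return messages; // nothing to drop
    }
    let dropped = messages.len() - keep_first - keep_recent;
    let tail_start = messages.len() - keep_recent;
    let mut out: Vec<String> = messages[..keep_first].to_vec();
    out.push(format!("[{} messages removed]", dropped));
    out.extend_from_slice(&messages[tail_start..]);
    out
}
```

The marker tells the model that history is missing, so it does not assume the remaining messages are contiguous.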

ExecutionLimits

ExecutionLimits prevents runaway agents by capping turns, total tokens, wall-clock duration, and (optionally) cost:

pub struct ExecutionLimits {
    pub max_turns: usize,              // Default: 50
    pub max_total_tokens: usize,       // Default: 1,000,000
    pub max_duration: Duration,        // Default: 600s (10 min)
    pub max_cost: Option<f64>,         // Default: None (no cost cap)
}

max_cost caps the cumulative dollar cost of the run. It requires AgentLoopConfig.cost_config to be set: without pricing rates, the accumulated cost is always 0.0 and the limit has no effect.

When a limit is reached, the agent stops with a message like "[Agent stopped: Max turns reached (50/50)]".
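How such a check might look is sketched below. LimitsSketch, RunState, and check_limits are hypothetical; the defaults mirror the values listed above, and only the max-turns message wording comes from this document (the other messages are invented for illustration).

```rust
use std::time::Duration;

/// Stand-in mirroring the fields and defaults listed above.
pub struct LimitsSketch {
    pub max_turns: usize,
    pub max_total_tokens: usize,
    pub max_duration: Duration,
    pub max_cost: Option<f64>,
}

impl Default for LimitsSketch {
    fn default() -> Self {
        Self {
            max_turns: 50,
            max_total_tokens: 1_000_000,
            max_duration: Duration::from_secs(600),
            max_cost: None,
        }
    }
}

/// Hypothetical running totals for one agent run.
pub struct RunState {
    pub turns: usize,
    pub total_tokens: usize,
    pub elapsed: Duration,
    pub cost: f64,
}

/// Returns a stop message for the first limit that has been reached, if any.
pub fn check_limits(limits: &LimitsSketch, state: &RunState) -> Option<String> {
    let mut reason = None;
    if state.turns >= limits.max_turns {
        reason = Some(format!("Max turns reached ({}/{})", state.turns, limits.max_turns));
    } else if state.total_tokens >= limits.max_total_tokens {
        reason = Some(format!("Max tokens reached ({}/{})", state.total_tokens, limits.max_total_tokens));
    } else if state.elapsed >= limits.max_duration {
        reason = Some(format!("Max duration reached ({}s)", limits.max_duration.as_secs()));
    } else if let Some(cap) = limits.max_cost {
        // With no pricing configured, cost stays at 0.0 and this never fires.
        if state.cost >= cap {
            reason = Some(format!("Max cost reached ({:.2}/{:.2})", state.cost, cap));
        }
    }
    reason.map(|r| format!("[Agent stopped: {}]", r))
}
```

Checking the limits once per turn, before the next provider call, keeps the overhead negligible while still stopping the run promptly.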

Disabling Context Management

let agent = BasicAgent::new(model_config)
    .without_context_management();

This sets both context_config and execution_limits to None.