Architecture

New here? "Which pattern fits my use case?" routes you to the smallest piece that fixes your symptom. Most callers need only one of the two engines, not the whole pipeline.

contextweaver is structured around two cooperating engines that together solve the "context window problem" for tool-using AI agents.

Why context engineering matters

The discipline of context engineering (deciding what goes into a model's context window, when, and at what cost) has emerged as the lever that moves quality and latency once tool-use agents reach production scale. Even with 200K-token windows, dumping every tool schema and conversation turn into the prompt is expensive, adds latency, and degrades output quality as the model's effective attention thins. The lever is selective compilation (per-phase budgets, tool shortlisting, oversized-output firewalling), not raw window size.

contextweaver implements that lever as two cooperating engines — the Context Engine (eight-stage pipeline) and the Routing Engine (bounded DAG + beam search) — and treats determinism, dependency closure, and sensitivity filters as load-bearing invariants rather than nice-to-haves. For background on the term itself, see Atlan's What Is Context Engineering.

High-level overview

               ┌────────────────────────────┐
  Events ─────>│      Context Engine         │──> ContextPack (prompt)
               │  candidates → closure →     │
               │  sensitivity → firewall →   │
               │  score → dedup → select →   │
               │  render                     │
               └────────────────────────────┘
                          ▲ facts / episodes
               ┌──────────┴─────────────────┐
  Tools ──────>│      Routing Engine         │──> ChoiceCards
               │  Catalog → TreeBuilder →    │
               │  ChoiceGraph → Router       │
               └────────────────────────────┘

Package layout

Path	Responsibility
`types.py`	Core dataclasses and enums (`SelectableItem`, `ContextItem`, `Phase`, `ItemKind`)
`envelope.py`	Result types (`ResultEnvelope`, `BuildStats`, `ContextPack`, `ChoiceCard`, `HydrationResult`)
`diagnostics.py`	Versioned gateway event schema, JSONL/in-memory sinks, aggregate reports
`inspection.py`	Payload-safe offline context/routing/artifact reports
`config.py`	Configuration dataclasses (`ContextBudget`, `ContextPolicy`, `ScoringConfig`)
`protocols.py`	Protocol interfaces (`TokenEstimator`, `EventHook`, `Summarizer`, …)
`exceptions.py`	Custom exception hierarchy
`_utils.py`	Text similarity primitives (`tokenize`, `jaccard`, `TfIdfScorer`)
`serde.py`	Serialisation helpers for `to_dict` / `from_dict` patterns
`store/`	In-memory data stores (`EventLog`, `ArtifactStore`, `EpisodicStore`, `FactStore`)
`summarize/`	Rule engine and structured fact extraction
`context/`	Full context compilation pipeline
`routing/`	Catalog, DAG builder, beam-search router, card renderer
`adapters/`	MCP, FastMCP, and A2A protocol adapters
`__main__.py`	CLI entry point (`inspect` includes context/routing/artifact diagnostics)

Context Engine pipeline

The Context Engine compiles a phase-aware, budget-constrained prompt from the event log. The pipeline has eight stages:

generate_candidates — pull phase-relevant events from the event log into the initial candidate pool.
dependency_closure — if a selected item has a parent_id, bring the parent along even if it scored lower.
sensitivity_filter — drop or redact items whose sensitivity level meets or exceeds ContextPolicy.sensitivity_floor.
apply_firewall — tool results are stored out-of-band in the ArtifactStore and replaced with summarized/truncated text for prompt assembly.
score_candidates — rank candidates by recency, tag match, kind priority, and token cost.
deduplicate_candidates — remove near-duplicate items using Jaccard similarity over tokenised text.
select_and_pack — greedily pack the highest-scoring candidates into the token budget for the current phase.
render_context — assemble the final prompt string, grouped by section (facts, history, tool results), with BuildStats metadata.
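To make the deduplication stage concrete, here is a minimal standalone sketch of near-duplicate removal using Jaccard similarity over token sets. It does not call contextweaver: the `tokenize`, `jaccard`, and `deduplicate` functions below, the `(id, text)` item shape, and the 0.8 threshold are all assumptions made for illustration, and the library's own helpers in `_utils.py` may differ in signature and behaviour.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased alphanumeric tokens (a stand-in, not the library's tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, treating two empty sets as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(items: list[tuple[str, str]], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Keep an item only if it is not too similar to any item already kept."""
    kept: list[tuple[str, str]] = []
    kept_tokens: list[set[str]] = []
    for item_id, text in items:
        tokens = tokenize(text)
        if any(jaccard(tokens, seen) >= threshold for seen in kept_tokens):
            continue  # near-duplicate of an earlier item
        kept.append((item_id, text))
        kept_tokens.append(tokens)
    return kept


items = [
    ("e1", "The deploy failed with exit code 1"),
    ("e2", "the deploy failed with exit code 1!"),
    ("e3", "Database migration completed successfully"),
]
result = deduplicate(items)  # e2 is dropped as a near-duplicate of e1
```

Because the first occurrence wins, the outcome depends on input order. A pipeline that wants deterministic results would need to fix that order before this step.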

The pipeline owns BuildStats construction after selection. Candidate totals are captured before sensitivity filtering, while sensitivity, deduplication, kind-limit, and budget exclusions are attributed per item. This preserves the invariant included_count + dropped_count == total_candidates and keeps lifecycle hooks aligned with the returned statistics.
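The accounting invariant can be illustrated with a small standalone sketch. The `Candidate` and `Stats` classes and this simplified `select_and_pack` function are hypothetical stand-ins rather than contextweaver's own types, and the sketch models only sensitivity and budget exclusions. Sorting by descending score, then by ID, is one assumed way to get the deterministic tie-break listed among the design principles.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    id: str
    score: float
    tokens: int


@dataclass
class Stats:
    total_candidates: int = 0
    included_count: int = 0
    dropped_count: int = 0
    dropped_reasons: dict = field(default_factory=dict)  # item id -> reason


def select_and_pack(candidates, budget, sensitive_ids=frozenset()):
    # Capture the total before any filtering so the invariant can hold.
    stats = Stats(total_candidates=len(candidates))

    eligible = []
    for c in candidates:
        if c.id in sensitive_ids:
            stats.dropped_reasons[c.id] = "sensitivity"
        else:
            eligible.append(c)

    # Highest score first; equal scores are ordered by ID for determinism.
    ranked = sorted(eligible, key=lambda c: (-c.score, c.id))

    selected, used = [], 0
    for c in ranked:
        if used + c.tokens <= budget:
            selected.append(c)
            used += c.tokens
        else:
            stats.dropped_reasons[c.id] = "budget"

    stats.included_count = len(selected)
    stats.dropped_count = len(stats.dropped_reasons)
    return selected, stats


pool = [
    Candidate("b", 0.9, 40),
    Candidate("a", 0.9, 50),
    Candidate("c", 0.5, 30),
    Candidate("d", 0.8, 20),
]
selected, stats = select_and_pack(pool, budget=100, sensitive_ids={"d"})
```

Every candidate ends up either in `selected` or in `dropped_reasons` with exactly one reason, which is what makes `included_count + dropped_count == total_candidates` hold in this sketch.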

Routing Engine pipeline

The Routing Engine efficiently navigates large tool catalogs so the LLM never sees all tools at once:

Catalog — register and manage SelectableItem objects.
TreeBuilder — convert a flat item list into a bounded ChoiceGraph DAG using namespace grouping, Jaccard clustering, or alphabetical fallback.
Router — beam-search over the graph to find the top-k items most relevant to a user query.
ChoiceCards — render compact, LLM-friendly cards for the selected items (never includes full schemas).
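The routing steps above can be sketched as a toy beam search over a small hand-built graph. Everything here is an assumption made for illustration: the `GRAPH` layout, the token-overlap scoring in `overlap`, and the `beam_search` signature are not the library's API, and the real Router may score and prune differently.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, text: str) -> float:
    """Jaccard overlap between query tokens and a node's label tokens."""
    q, t = tokens(query), tokens(text)
    return len(q & t) / len(q | t) if (q | t) else 0.0


# Hypothetical graph: node -> (label text, child node names). Leaves are tools.
GRAPH = {
    "root": ("all tools", ["files", "network"]),
    "files": ("files read write disk", ["files.read", "files.write"]),
    "network": ("network http fetch request", ["net.get", "net.post"]),
    "files.read": ("read a file from disk", []),
    "files.write": ("write a file to disk", []),
    "net.get": ("fetch a url with http get", []),
    "net.post": ("send an http post request", []),
}


def beam_search(graph, query, beam_width=2, top_k=2):
    frontier, leaves = ["root"], {}
    while frontier:
        # Score every child of the current frontier; a dict merges shared children.
        scored = {
            child: overlap(query, graph[child][0])
            for node in frontier
            for child in graph[node][1]
        }
        # Keep the best beam_width children; ties are broken by name.
        best = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:beam_width]
        frontier = []
        for child, score in best:
            if graph[child][1]:
                frontier.append(child)  # internal node: expand next round
            else:
                leaves[child] = score   # leaf: candidate tool
    ranked = sorted(leaves.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_k]]


wide = beam_search(GRAPH, "read file from disk", beam_width=2)
narrow = beam_search(GRAPH, "read file from disk", beam_width=1)
```

A narrower beam visits fewer nodes but can miss tools under a branch that scored poorly at a higher level. That is the usual trade-off when bounding the search.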

Data stores

All stores are protocol-based with in-memory defaults:

EventLog — append-only log of ContextItem events.
ArtifactStore — blob storage for raw tool outputs intercepted by the firewall.
EpisodicStore — short episodic memory entries (keyed by episode ID).
FactStore — key-value fact entries persisted across turns.
StoreBundle — convenience wrapper grouping all four stores.
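As a rough illustration of how a protocol-based store and the firewall fit together, the sketch below defines a hypothetical `ArtifactStoreLike` protocol, an in-memory implementation, and a `firewall` helper that stores the raw output out-of-band and returns a truncated stand-in. The method names `put` and `get`, the reference string format, and the 40-character limit are assumptions, not the library's actual interface.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class ArtifactStoreLike(Protocol):
    """Hypothetical minimal interface; any object with these methods qualifies."""

    def put(self, ref: str, data: bytes) -> None: ...

    def get(self, ref: str) -> Optional[bytes]: ...


class InMemoryArtifacts:
    """In-memory default; it satisfies the protocol without inheriting from it."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, ref: str, data: bytes) -> None:
        self._blobs[ref] = data

    def get(self, ref: str) -> Optional[bytes]:
        return self._blobs.get(ref)


def firewall(store: ArtifactStoreLike, ref: str, raw: str, max_chars: int = 40) -> str:
    """Store the raw output out-of-band and return prompt-safe text."""
    store.put(ref, raw.encode("utf-8"))
    if len(raw) <= max_chars:
        return raw
    return f"{raw[:max_chars]}... [truncated; full output in {ref}]"


store = InMemoryArtifacts()
prompt_text = firewall(store, "artifact:1", "x" * 100)
```

Because the interface is structural, a custom backend (for example one backed by a database) only needs to provide the same methods to be accepted wherever the protocol is expected.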

Progressive disclosure

context/views.py provides a ViewRegistry that maps content-type patterns to view generators. When the firewall stores a large tool output as an artifact, the view system generates alternative representations (JSON subset, CSV summary, etc.) that the agent can drill down into without retrieving the full blob. drilldown_tool_spec() exposes drilldown as an agent-callable tool.
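The idea of mapping content-type patterns to view generators can be sketched as follows. `SketchViewRegistry`, its `register`, `views_for`, and `render` methods, and the two generator functions are hypothetical names invented for this example. They are not the interface of the real ViewRegistry, which may match patterns and name views differently.

```python
import fnmatch
import json


class SketchViewRegistry:
    """Hypothetical registry: (content-type pattern, view name) -> generator."""

    def __init__(self) -> None:
        self._views = []  # list of (pattern, view name, generator)

    def register(self, pattern, name, generator) -> None:
        self._views.append((pattern, name, generator))

    def views_for(self, content_type: str) -> list:
        """Names of all views whose pattern matches the content type."""
        return [
            name
            for pattern, name, _ in self._views
            if fnmatch.fnmatchcase(content_type, pattern)
        ]

    def render(self, content_type: str, name: str, blob: bytes) -> str:
        for pattern, view_name, generator in self._views:
            if view_name == name and fnmatch.fnmatchcase(content_type, pattern):
                return generator(blob)
        raise KeyError(f"no view {name!r} for {content_type!r}")


def json_keys(blob: bytes) -> str:
    """Compact view: only the top-level keys of a JSON object."""
    data = json.loads(blob)
    if not isinstance(data, dict):
        return type(data).__name__
    return ", ".join(sorted(data))


def head_lines(blob: bytes, n: int = 2) -> str:
    """Compact view: the first n lines of a text blob."""
    return "\n".join(blob.decode("utf-8").splitlines()[:n])


registry = SketchViewRegistry()
registry.register("application/json", "keys", json_keys)
registry.register("text/*", "head", head_lines)

blob = b'{"rows": [1, 2, 3], "count": 3}'
available = registry.views_for("application/json")
summary = registry.render("application/json", "keys", blob)
```

In this sketch the agent would first see only the list of available view names, then request one view at a time, so the full blob never has to enter the prompt.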

Design principles

Minimal core dependencies — a small, audited set (tiktoken, PyYAML, rank-bm25, mcp, jsonschema, typer, rich); Python ≥ 3.10.
Deterministic — tie-break by ID, sorted keys.
Protocol-based — all store and estimator interfaces are typing.Protocol, allowing custom implementations.
Async-first — the Context Engine exposes build() (async) with a build_sync() wrapper for synchronous callers.
Budget-aware — every build is constrained by the phase-specific token budget; BuildStats explains what was kept and what was dropped.
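The async-first principle above can be sketched as an async method with a thin synchronous wrapper. `SketchEngine` and the `"answer"` phase name are hypothetical, and wrapping with `asyncio.run` is only one assumed way to implement such a wrapper. The source does not say how build_sync() works internally.

```python
import asyncio


class SketchEngine:
    """Hypothetical engine showing an async entry point plus a sync wrapper."""

    async def build(self, phase: str) -> str:
        # Stand-in for awaiting stores, summarizers, or other async work.
        await asyncio.sleep(0)
        return f"[{phase}] compiled context"

    def build_sync(self, phase: str) -> str:
        # asyncio.run creates a fresh event loop, so this wrapper must not be
        # called from code that is already running inside an event loop.
        return asyncio.run(self.build(phase))


prompt = SketchEngine().build_sync("answer")
```

Callers already inside an event loop would `await engine.build(...)` directly, while scripts and synchronous frameworks would use the wrapper.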