Code-review bot
Production reference architecture for a pull-request review bot fronting ~24 analysis tools. Demonstrates how the context firewall carries the weight of a code-review workflow: large diff and grep outputs go straight to the artifact store while the prompt stays compact.
TL;DR
| What | Where |
|---|---|
| The script | examples/architectures/code_review_bot/main.py |
| The catalog | examples/architectures/code_review_bot/catalog.yaml |
| Captured output | examples/architectures/code_review_bot/OUTPUT.md |
| Local README | examples/architectures/code_review_bot/README.md |
Run it:
python examples/architectures/code_review_bot/main.py
(Or make architectures / make example.)
The shape
The bot walks a six-step review of a regression-introducing refactor in
payments/charge.py:
- "show me the diff of this pull request against main" —
git.diff(large output → firewall) - "grep for the symbol legacy_charge in the codebase" —
grep.symbol(large output → firewall) - "run the test suite for the changed module" —
test.run_module - "run mypy on the changed module to surface type errors" —
typecheck.module - "run ruff on the changed files and report style violations" —
lint.run - "post a review comment requesting changes on the regression" —
review.post_comment
For each step:
- The
Routernarrows 24 tools to a top-3 shortlist (top_k=3). - The bot picks one tool from the shortlist using an explicit intent map. That separation is the load-bearing pattern: contextweaver bounds the choice, the bot (or, in production, an LLM) makes the final selection.
- The tool is "called" against a mocked backend. Large outputs go through the firewall — the 28 KB diff dump and 2.5 KB grep result become 500-char summaries on the prompt while the raw bytes are parked in the artifact store.
- Persistent facts (
pr.target_file,pr.test_status,pr.type_errors) are written viaContextManager.add_fact_syncso they survive into the answer-phase prompt for every subsequent step.
What's load-bearing
| contextweaver feature | Used | What it does here |
|---|---|---|
Router.route(query) |
✅ | Narrows 24 tools → top-3 shortlist (top_k=3) |
| Bounded choice pattern | ✅ | Bot picks from the shortlist, not from the whole catalog |
TreeBuilder DAG |
✅ | One-shot graph build at startup; routes are sub-millisecond |
| Context firewall | ✅✅ | Compacts the ~28 KB diff dump and ~2.5 KB grep result down to ~500-char summaries before they touch the prompt |
| Artifact store | ✅ | Raw bytes stay addressable for drilldown; only summaries land in the prompt |
| Persistent facts | ✅ | Three fact keys survive across all six review steps |
| Tight per-phase budgets | ✅ | ContextBudget(route=1500, call=2500, interpret=2500, answer=3500) keeps the answer prompt small even after the firewall externalises the heavy bytes |
Why this architecture matters
Code-review bots are firewall-bound. Every step produces output that would saturate a model's context window if inlined: a typical PR diff is 10–50 KB, a grep for a renamed symbol returns dozens of hits, lint and typecheck pipelines emit hundreds of lines on a hot patch. Without a firewall, the prompt blows the budget by step 3 and the bot starts truncating mid-review.
contextweaver's Context Engine
handles this without a per-tool integration: the firewall fires on any
result exceeding firewall_threshold (2 KB default), parks the raw
bytes in the artifact store, injects a compact summary on the prompt,
and leaves the artifact handle so the LLM can request a drilldown when
needed.
What's intentionally not here
- Real git integration. Mock tool responses keep the example
deterministic and CI-friendly. A real deployment would wire
git.difftosubprocess.run(["git", "diff", "main"], ...)or to an MCP server. - LLM-based diff summarisation. The
review.summarize_difftool's canned response is a stand-in for what aSummarizerplugin (issue #26) would do. - Multi-PR session state. The fact store survives the in-process
review, but to persist across PRs you would swap the default
InMemoryFactStorefor aSqliteFactStore(issue #174) or a Mem0 / Zep adapter (issue #195).
Read next
- Slack ops bot — the first architecture in the series; demonstrates persistent facts and routing across many namespaces.
- Voice agent — the third architecture; demonstrates
the
asyncio.to_thread(mgr.build_sync, …)pattern for real-time pipelines. - The cookbook covers the individual primitives — routing, firewall, drilldown — used here.