Voice agent (Pipecat)

Production reference architecture for a real-time customer-service voice bot. Demonstrates the asyncio.to_thread(mgr.build_sync, …) pattern documented in the Pipecat integration guide, plus tight per-phase budgets that keep TTS responsive under a 300 ms latency bound.

TL;DR

What	Where
The script	`examples/architectures/voice_agent/main.py`
The catalog	`examples/architectures/voice_agent/catalog.yaml`
Captured output	`examples/architectures/voice_agent/OUTPUT.md`
Local README	`examples/architectures/voice_agent/README.md`
Companion guide	`docs/integration_pipecat.md`

Run it:

python examples/architectures/voice_agent/main.py

(Or make architectures / make example.)

For the optional Pipecat hook:

pip install 'contextweaver[voice]'

The shape

The bot walks a five-turn customer-service call (order chase, address update, callback scheduling):

"hi, can you look up order number A-481 for me" — orders.lookup
"what is the shipping tracking status for that order" — shipping.tracking
"can you change the delivery address to my new home" — shipping.update_address
"when is the next available delivery slot" — shipping.delivery_slot
"schedule a callback for me at 2pm tomorrow" — callback.schedule

For each turn:

The Router narrows 18 tools to a top-3 shortlist (top_k=3). Routing is sub-millisecond and runs on the audio thread.
The bot picks one tool from the shortlist using an explicit intent map. That separation is the load-bearing pattern: contextweaver bounds the choice, the bot (or, in production, an LLM) makes the final selection.
Every context build runs via asyncio.to_thread(mgr.build_sync, …) — keeping the audio event loop free while the prompt is assembled on a worker thread.
Persistent facts (customer.order_id, customer.shipping_address, customer.callback) survive across all five turns of the call.

What's load-bearing

contextweaver feature	Used	What it does here
`Router.route(query)`	✅	Narrows 18 tools → top-3 shortlist (`top_k=3`)
Bounded choice pattern	✅	Bot picks from the shortlist, not from the whole catalog
`asyncio.to_thread(mgr.build_sync, …)`	✅	Every context build runs on a worker thread so the audio pipeline event loop stays free
Tight per-phase budgets	✅	`ContextBudget(route=200, call=500, interpret=400, answer=1000)` keeps every prompt small enough for sub-300 ms TTS
Persistent facts	✅	Three fact keys survive across all five turns of the call

The async pattern in detail

contextweaver's context pipeline is sync (deterministic, no IO). For real-time pipelines you want it off the audio event loop. The canonical pattern, used in this example and documented in the Pipecat integration guide:

async def _async_build(mgr, *, phase, query):
    return await asyncio.to_thread(mgr.build_sync, phase=phase, query=query)

Why this works:

The _build pipeline (eight sync stages — see the architecture overview) is pure Python computation; no awaits, no IO.
Wrapping it in asyncio.to_thread hands it to the default executor so the event loop can continue draining audio frames.
The routing call (router.route(query)) is fast enough — typically sub-millisecond for catalogs in the 10–100 range — that you can keep it on the audio thread without measurable jitter.

What's intentionally not here

A live audio pipeline. The script is text-only; the per-turn output simulates STT. For a worked Pipecat FrameProcessor, see the Pipecat integration guide.
TTS latency measurement. The script prints "off-thread" timings for the context build, which is the part contextweaver controls. TTS / network IO is the model's / transport's responsibility.
Across-call session state. The fact store survives the in-process call. To persist across calls, swap the default InMemoryFactStore for a SqliteFactStore (issue #174) or a Mem0 / Zep adapter (issue #195).