Skip to content

LlamaIndex Integration

Pair contextweaver's bounded-choice routing, phase-specific context, and context firewall with LlamaIndex's ReActAgent so the LLM sees a focused shortlist of tools and a budgeted prompt instead of the entire catalogue and conversation history.

Why

A LlamaIndex ReActAgent doing function-calling over a large catalogue runs into three concrete problems:

  • Tool overload. Every tool's name + description goes into the system prompt. With 50+ tools that's thousands of tokens before the user even speaks.
  • Unbounded context. The agent's chat memory grows turn by turn; once a tool returns a multi-KB blob, every subsequent turn pays for it.
  • No phase awareness. The same prompt is used for "which tool?", "what arguments?", and "what's the final answer?" — they all need different things.

contextweaver fixes all three without forking LlamaIndex.

Prerequisites

pip install contextweaver llama-index llama-index-llms-openai
export OPENAI_API_KEY=sk-...

Architecture

User query
   │
   ▼
contextweaver Router            ← all tools registered in Catalog
   │ (top-k shortlist)
   ▼
LlamaIndex ReActAgent           ← receives only the shortlist as tools
   │ (function call)
   ▼
contextweaver Firewall          ← intercepts large results
   │ (summary + artifact handle)
   ▼
contextweaver ContextManager    ← phase-specific prompt compilation
   │ (budgeted ContextPack)
   ▼
LLM

You hook contextweaver in at two points: before tool selection to narrow the tool list, and after each tool call to firewall the raw result.

Minimal wiring

from llama_index.core.agent import ReActAgent
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

from contextweaver.context.manager import ContextManager
from contextweaver.routing.catalog import Catalog
from contextweaver.routing.router import Router
from contextweaver.routing.tree import TreeBuilder
from contextweaver.types import ContextItem, ItemKind, Phase, SelectableItem


def db_query(sql: str) -> str:
    """Execute a read-only SQL query and return JSON rows."""
    return '{"rows": [...]}'   # imagine a 5 KB response


def send_email(to: str, body: str) -> str:
    """Send an email to *to* with *body*."""
    return "ok"


# 1. Register every tool in contextweaver's Catalog as a SelectableItem.
catalog = Catalog()
for fn in (db_query, send_email):
    catalog.register(SelectableItem(
        id=fn.__name__,
        kind="tool",
        name=fn.__name__,
        description=(fn.__doc__ or "").strip().splitlines()[0],
        namespace=fn.__name__.split("_", 1)[0],
    ))
graph = TreeBuilder(max_children=8).build(catalog.all())
router = Router(graph, items=catalog.all(), top_k=3)

# 2. One ContextManager per session.
ctx_mgr = ContextManager()

# 3. Per-turn loop.
def respond(user_query: str, turn: int) -> str:
    # Ingest the user turn.
    ctx_mgr.ingest_sync(ContextItem(
        id=f"u{turn}", kind=ItemKind.user_turn, text=user_query,
    ))

    # Route to top-k tools (LlamaIndex never sees the full catalogue).
    routed = router.route(user_query)
    selected = [
        FunctionTool.from_defaults(fn=locals_lookup[tid])
        for tid in routed.candidate_ids
    ]

    # Hand the shortlist + budgeted prompt to LlamaIndex.  pack.prompt is a
    # plain string; LlamaIndex's chat_history expects a list[ChatMessage],
    # so we wrap it as a single SYSTEM message that primes the agent.
    pack = ctx_mgr.build_sync(phase=Phase.answer, query=user_query)
    agent = ReActAgent.from_tools(selected, llm=OpenAI(model="gpt-4"))
    response = agent.chat(
        user_query,
        chat_history=[ChatMessage(role=MessageRole.SYSTEM, content=pack.prompt)],
    )
    return str(response)

locals_lookup is whatever map your runtime uses to resolve a tool ID back to its Python implementation; the routing layer is intentionally just IDs and scores.

Firewalling tool results

ReActAgent exposes the underlying tool callable; wrap it so the raw output flows through the context firewall before LlamaIndex sees it:

from llama_index.core.tools import FunctionTool

def _firewalled(fn, tool_call_id: str):
    def wrapped(*args, **kwargs):
        raw = fn(*args, **kwargs)
        item, _envelope = ctx_mgr.ingest_tool_result_sync(
            tool_call_id=tool_call_id,
            raw_output=str(raw),
            tool_name=fn.__name__,
        )
        # item.text is the firewall summary; the raw bytes are in
        # ctx_mgr.artifact_store under item.artifact_ref.handle.
        return item.text
    wrapped.__name__ = fn.__name__
    wrapped.__doc__ = fn.__doc__
    return wrapped

tools = [
    FunctionTool.from_defaults(fn=_firewalled(db_query, "tc-1")),
    FunctionTool.from_defaults(fn=_firewalled(send_email, "tc-2")),
]

Inside the agent the LLM sees a compact summary; if it needs more detail it can ask for a drilldown (see the firewall + drilldown cookbook recipe).

Phase-specific budgets

You usually want different budgets per phase. Pass a ContextBudget to ContextManager once and call build_sync(phase=...) everywhere:

from contextweaver.config import ContextBudget

ctx_mgr = ContextManager(
    budget=ContextBudget(route=500, call=1200, interpret=1500, answer=3000),
)

LlamaIndex's ReActAgent doesn't have a built-in concept of phases, so the typical pattern is:

  • Phase.route — when selecting which tool to call (via router.route())
  • Phase.call — when assembling arguments (the agent already does this with the schema; use this phase if you build your own ReAct trace)
  • Phase.interpret — right after the tool returns, when the model decides what to do next
  • Phase.answer — when generating the final user-facing reply

Advanced patterns

  • Custom phase budgets for long RAG retrievals — bump interpret and answer budgets so a multi-KB chunk has room to land.
  • Episodic memory across sessions — store pack.stats.to_dict() and the final agent response in an EpisodicStore for the next session.
  • Fact extraction — the firewall pulls structured facts out of tool_result items by default; expose them via ResultEnvelope.facts to build a per-session knowledge base.
  • Custom retrieval backend — register a BM25 / fuzzy retriever via engine_registry when LlamaIndex is already producing embeddings and you'd rather route on those.

Troubleshooting

  • agent.chat() answers as if it has no memory. You probably didn't pass pack.prompt into the call. LlamaIndex won't pick it up from contextweaver implicitly — you compile the context, you inject it (as chat_history=[ChatMessage(role=MessageRole.SYSTEM, content=pack.prompt)], not as a bare string).
  • The firewall summary is too short. Override Summarizer on ContextManager(summarizer=...); the default is a 500-char truncation of the first paragraph, deliberately conservative.
  • The router skips a tool you know is relevant. Check result.scores — TF-IDF on short descriptions can lose to keyword collisions. Add tags or tweak the description, or use context_hints=[...].
  • Budget overrun. Inspect pack.stats after every build — the dropped_reasons map tells you exactly which stage rejected what.

See also