LlamaIndex
RAG-aware AI governance for LlamaIndex — automatic policy checks for every LLM call, tool invocation, and (uniquely) every retrieved document chunk before synthesis.
LlamaIndex Integration
Add one callback handler. Every LLM call, every tool invocation, and (uniquely for RAG) every retrieved document chunk is checked against your Palveron policies. PII in a source document never silently bleeds into your model's context window.
Installation
pip install palveron-llamaindex
Quickstart
from llama_index.core import VectorStoreIndex, Settings, SimpleDirectoryReader
from llama_index.core.callbacks import CallbackManager
from palveron_llamaindex import PalveronCallbackHandler
handler = PalveronCallbackHandler(api_key="pv_live_xxx")
Settings.callback_manager = CallbackManager([handler])
documents = SimpleDirectoryReader("./customer_files").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
# PII in any retrieved chunk → PalveronGovernanceError before synthesis
response = query_engine.query("Summarize the customer file")
What Is Checked
| Event | When | What | Why it matters |
|---|---|---|---|
| LLM | Before each LLM call | Prompt + chat history | Catches prompt injection / PII at the model boundary |
| RETRIEVE | After each retrieval | Every retrieved chunk individually | RAG-specific: catches PII in source documents before synthesis |
| FUNCTION_CALL | Before each tool runs | Tool name + arguments | Stops agents from invoking tools with sensitive args |
| AGENT_STEP | Each agent reasoning step (opt-in) | Step content | Audit-grade traceability for ReActAgent / function-calling agents |
The retrieval check is the differentiator. Every other adapter governs LLM I/O; LlamaIndex pipelines retrieve from a vector index that may contain PII that was never cleaned. We check it at the latest possible safe point — after retrieval, before synthesis.
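The ordering described above (retrieve, check each chunk individually, then synthesize) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the adapter's implementation: `Chunk`, `check_chunk`, `synthesize`, and `GovernanceError` are hypothetical names, and the regex is a toy stand-in for a policy decision that would normally come from the Palveron gateway.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a policy check: a toy pattern for SSN-like strings.
# A real decision would come from the governance gateway, not a local regex.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class GovernanceError(Exception):
    """Illustrative stand-in for PalveronGovernanceError."""


@dataclass
class Chunk:
    source: str  # e.g. the file the chunk was retrieved from
    text: str


def check_chunk(chunk: Chunk) -> str:
    """Return "BLOCKED" if the chunk matches the toy pattern, else "ALLOWED"."""
    return "BLOCKED" if SSN_PATTERN.search(chunk.text) else "ALLOWED"


def synthesize(question: str, chunks: list[Chunk]) -> str:
    """Check every retrieved chunk, then assemble the model context."""
    for chunk in chunks:
        if check_chunk(chunk) == "BLOCKED":
            # Raised before any chunk text is placed into the prompt.
            raise GovernanceError(f"PII detected in retrieved chunk {chunk.source}")
    context = "\n".join(chunk.text for chunk in chunks)
    return f"Question: {question}\nContext:\n{context}"
```

Because the loop finishes before the context string is built, a single offending chunk stops the whole synthesis step instead of being mixed into the prompt alongside clean chunks.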
Configuration
handler = PalveronCallbackHandler(
    api_key="pv_live_xxx",
    base_url="https://gateway.internal.company.com:8080",  # on-prem
    check_llm=True,            # verify LLM prompts (default)
    check_retrievals=True,     # verify retrieved chunks, RAG-specific (default)
    check_tools=True,          # verify tool inputs (default)
    check_agent_steps=False,   # verify agent steps (opt-in)
    fail_open=False,           # block on gateway outage (Enterprise default)
    metadata={"team": "support", "index": "customer_kb"},
)
Retrieval Post-Processor (Alternative)
If you only want governance on retrievals (no LLM-side checks), use the post-processor directly on a query engine:
from palveron_llamaindex import PalveronNodePostprocessor
query_engine = index.as_query_engine(
    node_postprocessors=[PalveronNodePostprocessor(api_key="pv_live_xxx")],
)
# Or drop chunks that needed modification instead of raising
query_engine = index.as_query_engine(
    node_postprocessors=[
        PalveronNodePostprocessor(api_key="pv_live_xxx", drop_modified=True),
    ],
)
Agent Tool
Register Palveron as an explicit tool so a ReActAgent can verify content during reasoning:
from llama_index.core.agent import ReActAgent
from palveron_llamaindex import palveron_verify_tool
# search_tool and email_tool are your own tools, defined elsewhere
agent = ReActAgent.from_tools(
    [
        palveron_verify_tool("pv_live_xxx"),
        search_tool,
        email_tool,
    ],
    verbose=True,
)
response = agent.chat("Draft a reply to John, then verify it before sending")
Behaviour on Decisions
| Decision | Behaviour |
|---|---|
| ALLOWED / PASSED | Call proceeds |
| MODIFIED | Call proceeds; PII-redaction logged. (For retrievals, the original chunk is retained; set drop_modified=True to filter instead.) |
| FLAGGED | Call proceeds; policy hit is logged |
| BLOCKED | Raises PalveronGovernanceError |
| PENDING_APPROVAL | Raises PalveronGovernanceError (queued for human review) |
| RATE_LIMITED | Raises PalveronGovernanceError (quota hit) |
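The table above amounts to a small dispatch rule: four decisions let the call proceed and three raise. The sketch below restates that rule in plain Python, including the `drop_modified` behaviour for retrieved chunks. It is an illustration only: `GovernanceError`, `enforce`, and `filter_chunks` are hypothetical names rather than the adapter's API, and raising `ValueError` on an unrecognised decision is an assumption of this sketch.

```python
from typing import Iterable, List, Tuple


class GovernanceError(Exception):
    """Illustrative stand-in for PalveronGovernanceError."""

    def __init__(self, decision: str):
        super().__init__(f"governance decision: {decision}")
        self.decision = decision


# Decisions grouped exactly as in the table above.
PROCEED = {"ALLOWED", "PASSED", "MODIFIED", "FLAGGED"}
RAISE = {"BLOCKED", "PENDING_APPROVAL", "RATE_LIMITED"}


def enforce(decision: str) -> None:
    """Return silently if the call may proceed; raise otherwise."""
    if decision in PROCEED:
        return
    if decision in RAISE:
        raise GovernanceError(decision)
    # Assumption of this sketch: unknown decisions are treated as a bug.
    raise ValueError(f"unknown decision: {decision}")


def filter_chunks(
    chunks: Iterable[Tuple[str, str]], drop_modified: bool = False
) -> List[str]:
    """Take (text, decision) pairs and return the texts passed on to synthesis."""
    kept = []
    for text, decision in chunks:
        enforce(decision)  # BLOCKED and similar decisions raise here
        if decision == "MODIFIED" and drop_modified:
            continue  # filter the chunk instead of retaining it
        kept.append(text)
    return kept
```

With `drop_modified=False`, a chunk marked MODIFIED stays in the result, matching the table's note that the original chunk is retained by default.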
Governance Records
print(f"Blocked: {handler.blocked_count}")
print(f"Trace IDs: {handler.trace_ids}")
for record in handler.records:
    print(f"{record.event} [{record.surface}]: {record.decision} ({record.latency_ms:.0f}ms)")
Error Handling
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from palveron_llamaindex import PalveronCallbackHandler, PalveronGovernanceError
handler = PalveronCallbackHandler(api_key="pv_live_xxx")
Settings.callback_manager = CallbackManager([handler])
# query_engine is built as in the Quickstart
try:
    response = query_engine.query("Find the file for SSN 123-45-6789")
except PalveronGovernanceError as e:
    print(e.decision)  # "BLOCKED"
    print(e.trace_id)  # "trc_abc123"
    print(e.reason)    # "PII detected in retrieved chunk customer_2024_03.txt"
Source Code
Open source (MIT): github.com/palveron/adapter-llamaindex.
Next Steps
- Create your own policies for your RAG pipelines
- Google ADK integration for multimodal agents
- Pydantic AI integration for type-safe agent governance