RAG-aware AI governance for LlamaIndex — automatic policy checks for every LLM call, tool invocation, and (uniquely) every retrieved document chunk before synthesis.

LlamaIndex Integration

Add one callback handler. Every LLM call, every tool invocation, and (uniquely for RAG) every retrieved document chunk is checked against your Palveron policies. PII in a source document never silently bleeds into your model's context window.

Installation

pip install palveron-llamaindex

Quickstart

from llama_index.core import VectorStoreIndex, Settings, SimpleDirectoryReader
from llama_index.core.callbacks import CallbackManager
from palveron_llamaindex import PalveronCallbackHandler

handler = PalveronCallbackHandler(api_key="pv_live_xxx")
Settings.callback_manager = CallbackManager([handler])

documents = SimpleDirectoryReader("./customer_files").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# PII in any retrieved chunk → PalveronGovernanceError before synthesis
response = query_engine.query("Summarize the customer file")

What Is Checked

Event	When	What	Why it matters
`LLM`	Before each LLM call	Prompt + chat history	Catches prompt injection / PII at the model boundary
`RETRIEVE`	After each retrieval	Every retrieved chunk individually	RAG-specific: catches PII in source documents before synthesis
`FUNCTION_CALL`	Before each tool runs	Tool name + arguments	Stops agents from invoking tools with sensitive args
`AGENT_STEP`	Each agent reasoning step (opt-in)	Step content	Audit-grade traceability for ReActAgent / function-calling agents

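The retrieval check is the differentiator. Every other adapter governs LLM I/O; a LlamaIndex pipeline additionally retrieves text from a vector index, and that index may contain PII that was never cleaned before ingestion. The handler checks each chunk at the latest possible safe point: after retrieval and before synthesis, so a violating chunk is caught before it reaches the model's context window.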
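To make the ordering concrete, here is a minimal, self-contained sketch of a check that sits between retrieval and synthesis. It does not call Palveron or LlamaIndex: `Chunk`, `GovernanceError`, `check_chunks`, `answer`, and the SSN regex are illustrative stand-ins, and real policy evaluation happens in the Palveron gateway rather than in a local pattern match.

```python
import re
from dataclasses import dataclass


class GovernanceError(Exception):
    """Stand-in for PalveronGovernanceError (hypothetical, for illustration)."""


@dataclass
class Chunk:
    source: str  # e.g. the file the chunk came from
    text: str


# Toy policy: anything shaped like a US Social Security number counts as PII.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def check_chunks(chunks):
    """Check every retrieved chunk individually; raise on the first violation."""
    for chunk in chunks:
        if SSN_PATTERN.search(chunk.text):
            raise GovernanceError(f"PII detected in retrieved chunk {chunk.source}")
    return chunks


def answer(query, retrieve, synthesize):
    chunks = retrieve(query)        # 1. retrieval
    safe = check_chunks(chunks)     # 2. governance check, after retrieval
    return synthesize(query, safe)  # 3. synthesis only sees checked chunks


# Made-up sample data.
clean = [Chunk("faq.txt", "Refunds take five business days.")]
dirty = clean + [Chunk("customer_2024_03.txt", "SSN on file: 123-45-6789")]
```

Because the check runs on each chunk rather than on the assembled prompt, the error can name the offending source document, which is the kind of detail the `reason` field in the Error Handling section reports.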

Configuration

handler = PalveronCallbackHandler(
    api_key="pv_live_xxx",
    base_url="https://gateway.internal.company.com:8080",  # on-prem
    check_llm=True,              # verify LLM prompts (default)
    check_retrievals=True,       # verify retrieved chunks — RAG-specific (default)
    check_tools=True,            # verify tool inputs (default)
    check_agent_steps=False,     # verify agent steps (opt-in)
    fail_open=False,             # block on gateway outage (Enterprise default)
    metadata={"team": "support", "index": "customer_kb"},
)
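The `fail_open` flag decides what happens when the gateway itself cannot be reached. The sketch below illustrates the general fail-open versus fail-closed trade-off in plain Python. `guarded_check`, `GatewayUnavailable`, `GovernanceError`, and the `"UNVERIFIED"` return value are hypothetical names, and it is an assumption here that `fail_open=True` lets the call through unverified.

```python
class GatewayUnavailable(Exception):
    """Hypothetical: the policy gateway could not be reached."""


class GovernanceError(Exception):
    """Stand-in for PalveronGovernanceError (hypothetical, for illustration)."""


def guarded_check(check, payload, fail_open=False):
    """Run check(payload) and decide what to do if the gateway is down."""
    try:
        return check(payload)
    except GatewayUnavailable as exc:
        if fail_open:
            # Availability over safety: the call proceeds without a verdict.
            return "UNVERIFIED"
        # Safety over availability: an outage blocks the call.
        raise GovernanceError("gateway unreachable; blocking call") from exc


def gateway_down(payload):
    """Simulates an outage for the example."""
    raise GatewayUnavailable("connection timed out")
```

With `fail_open=False`, `guarded_check(gateway_down, "prompt")` raises, which matches the "block on gateway outage" comment in the configuration above. Fail-closed is the safer choice for pipelines that touch customer data, at the cost of downtime whenever the gateway is unavailable.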

Retrieval Post-Processor (Alternative)

If you only want governance on retrievals (no LLM-side checks), use the post-processor directly on a query engine:

from palveron_llamaindex import PalveronNodePostprocessor

query_engine = index.as_query_engine(
    node_postprocessors=[PalveronNodePostprocessor(api_key="pv_live_xxx")],
)

# Or filter out chunks with a MODIFIED decision instead of keeping the original text
query_engine = index.as_query_engine(
    node_postprocessors=[
        PalveronNodePostprocessor(api_key="pv_live_xxx", drop_modified=True),
    ],
)

Agent Tool

from llama_index.core.agent import ReActAgent
from palveron_llamaindex import palveron_verify_tool

# search_tool and email_tool are your own tools, defined elsewhere
agent = ReActAgent.from_tools(
    [
        palveron_verify_tool("pv_live_xxx"),
        search_tool,
        email_tool,
    ],
    verbose=True,
)

response = agent.chat("Draft a reply to John, then verify it before sending")

Behaviour on Decisions

Decision	Behaviour
`PASSED`	Call proceeds
`MODIFIED`	Call proceeds; PII-redaction logged. (For retrievals, the original chunk is retained — set `drop_modified=True` to filter instead.)
`FLAGGED`	Call proceeds; policy hit is logged
`BLOCKED`	Raises `PalveronGovernanceError`
`PENDING_APPROVAL`	Raises `PalveronGovernanceError` (queued for human review)
HTTP 429 (rate limit)	Raises `PalveronGovernanceError` (quota hit)
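The table can be read as a small dispatch rule. The following sketch encodes it in plain Python, including the `drop_modified` option for retrievals. `enforce`, `filter_chunks`, and `GovernanceError` are hypothetical stand-ins rather than the adapter's internals, and the HTTP 429 row is left out because it is a transport-level condition, not a decision value.

```python
class GovernanceError(Exception):
    """Stand-in for PalveronGovernanceError (hypothetical, for illustration)."""

    def __init__(self, decision):
        super().__init__(f"call stopped by decision {decision}")
        self.decision = decision


RAISING = {"BLOCKED", "PENDING_APPROVAL"}
PROCEEDING = {"PASSED", "MODIFIED", "FLAGGED"}


def enforce(decision):
    """Return True if the call may proceed; raise for blocking decisions."""
    if decision in RAISING:
        raise GovernanceError(decision)
    if decision in PROCEEDING:
        return True
    raise ValueError(f"unknown decision: {decision}")


def filter_chunks(decided_chunks, drop_modified=False):
    """decided_chunks is a list of (chunk_text, decision) pairs."""
    kept = []
    for text, decision in decided_chunks:
        enforce(decision)  # BLOCKED / PENDING_APPROVAL raise here
        if decision == "MODIFIED" and drop_modified:
            continue       # filter the chunk instead of keeping the original
        kept.append(text)
    return kept


# Made-up sample data.
decided = [("a", "PASSED"), ("b", "MODIFIED"), ("c", "FLAGGED")]
```

Under these assumptions, `filter_chunks(decided)` keeps all three chunks, while `filter_chunks(decided, drop_modified=True)` keeps only `"a"` and `"c"`.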

Governance Records

print(f"Blocked: {handler.blocked_count}")
print(f"Trace IDs: {handler.trace_ids}")

for record in handler.records:
    print(f"{record.event} [{record.surface}]: {record.decision} ({record.latency_ms:.0f}ms)")
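For reporting, the records can be aggregated rather than printed one by one. This is a sketch under the assumption that each record exposes the four attributes used in the loop above (`event`, `surface`, `decision`, `latency_ms`). The `Record` dataclass, the `summarize` helper, and the sample values (including the `surface` strings) are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    """Stand-in with the attributes the loop above reads (hypothetical)."""
    event: str
    surface: str
    decision: str
    latency_ms: float


def summarize(records):
    """Count records per decision and find the slowest check."""
    by_decision = Counter(r.decision for r in records)
    slowest = max(records, key=lambda r: r.latency_ms, default=None)
    return by_decision, slowest


# Made-up sample records.
sample = [
    Record("LLM", "prompt", "PASSED", 42.0),
    Record("RETRIEVE", "chunk", "BLOCKED", 118.5),
    Record("FUNCTION_CALL", "tool_args", "FLAGGED", 37.2),
]

by_decision, slowest = summarize(sample)
```

If the real records behave as assumed, `summarize(handler.records)` would give a per-decision count suitable for a dashboard or a nightly audit log.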

Error Handling

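from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from palveron_llamaindex import PalveronCallbackHandler, PalveronGovernanceError

handler = PalveronCallbackHandler(api_key="pv_live_xxx")
Settings.callback_manager = CallbackManager([handler])

# query_engine is built as in the Quickstart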

try:
    response = query_engine.query("Find the file for SSN 123-45-6789")
except PalveronGovernanceError as e:
    print(e.decision)   # "BLOCKED"
    print(e.trace_id)   # "cktrace..."
    print(e.reason)     # "PII detected in retrieved chunk customer_2024_03.txt"

Source Code

Open source (MIT): github.com/palveron/adapter-llamaindex.

Next Steps

Create your own policies for your RAG pipelines
Google ADK integration for multimodal agents
Pydantic AI integration for type-safe agent governance
