LlamaIndex

RAG-aware AI governance for LlamaIndex — automatic policy checks for every LLM call, tool invocation, and (uniquely) every retrieved document chunk before synthesis.

LlamaIndex Integration

Add one callback handler. Every LLM call, every tool invocation, and (uniquely for RAG) every retrieved document chunk is checked against your Palveron policies. PII in a source document never silently bleeds into your model's context window.

Installation

pip install palveron-llamaindex

Quickstart

from llama_index.core import VectorStoreIndex, Settings, SimpleDirectoryReader
from llama_index.core.callbacks import CallbackManager
from palveron_llamaindex import PalveronCallbackHandler

handler = PalveronCallbackHandler(api_key="pv_live_xxx")
Settings.callback_manager = CallbackManager([handler])

documents = SimpleDirectoryReader("./customer_files").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# PII in any retrieved chunk → PalveronGovernanceError before synthesis
response = query_engine.query("Summarize the customer file")

What Is Checked

| Event | When | What | Why it matters |
| --- | --- | --- | --- |
| LLM | Before each LLM call | Prompt + chat history | Catches prompt injection / PII at the model boundary |
| RETRIEVE | After each retrieval | Every retrieved chunk individually | RAG-specific: catches PII in source documents before synthesis |
| FUNCTION_CALL | Before each tool runs | Tool name + arguments | Stops agents from invoking tools with sensitive args |
| AGENT_STEP | Each agent reasoning step (opt-in) | Step content | Audit-grade traceability for ReActAgent / function-calling agents |

The retrieval check is the differentiator. Every other adapter governs only LLM I/O, but a LlamaIndex pipeline also retrieves from a vector index that may contain PII that was never cleaned. Each retrieved chunk is checked at the latest possible safe point: after retrieval, before synthesis.
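
To make that ordering concrete, here is a minimal sketch of a per-chunk gate that runs after retrieval and before synthesis. It is not Palveron's implementation: `gate_chunks`, `looks_like_ssn`, `ChunkBlocked`, and `answer` are hypothetical names, and a single regex stands in for a real policy engine.

```python
import re
from typing import Callable, List

# Hypothetical stand-in policy: flags US-SSN-shaped strings only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class ChunkBlocked(Exception):
    """Hypothetical error raised when a retrieved chunk violates policy."""


def looks_like_ssn(text: str) -> bool:
    return SSN_PATTERN.search(text) is not None


def gate_chunks(
    chunks: List[str], violates: Callable[[str], bool] = looks_like_ssn
) -> List[str]:
    """Check every chunk individually; raise before anything reaches synthesis."""
    for position, chunk in enumerate(chunks):
        if violates(chunk):
            raise ChunkBlocked(f"policy violation in retrieved chunk {position}")
    return chunks


def answer(query: str, retrieve, synthesize) -> str:
    chunks = retrieve(query)               # 1. retrieval
    safe_chunks = gate_chunks(chunks)      # 2. per-chunk check
    return synthesize(query, safe_chunks)  # 3. synthesis sees only checked chunks
```

Because the gate sits between steps 1 and 3, a violating chunk stops the request before the model ever sees it, regardless of what the user's prompt contained.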

Configuration

handler = PalveronCallbackHandler(
    api_key="pv_live_xxx",
    base_url="https://gateway.internal.company.com:8080",  # on-prem
    check_llm=True,              # verify LLM prompts (default)
    check_retrievals=True,       # verify retrieved chunks — RAG-specific (default)
    check_tools=True,            # verify tool inputs (default)
    check_agent_steps=False,     # verify agent steps (opt-in)
    fail_open=False,             # block on gateway outage (Enterprise default)
    metadata={"team": "support", "index": "customer_kb"},
)
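
In deployments, the key and gateway URL usually come from the environment rather than from source code. The sketch below builds the keyword arguments shown above from environment variables. The variable names `PALVERON_API_KEY`, `PALVERON_BASE_URL`, and `PALVERON_FAIL_OPEN`, and the helper `handler_kwargs_from_env`, are assumptions for illustration; the adapter is not documented to read them itself.

```python
import os
from typing import Any, Dict, Mapping, Optional


def handler_kwargs_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Build PalveronCallbackHandler keyword arguments from environment variables.

    The variable names are hypothetical; only the keyword names (api_key,
    base_url, fail_open) come from the configuration example above.
    """
    env = os.environ if env is None else env
    kwargs: Dict[str, Any] = {"api_key": env["PALVERON_API_KEY"]}  # KeyError if unset
    if env.get("PALVERON_BASE_URL"):
        kwargs["base_url"] = env["PALVERON_BASE_URL"]
    # Only an explicit "true" enables fail-open, so a typo keeps the blocking default.
    kwargs["fail_open"] = (
        env.get("PALVERON_FAIL_OPEN", "false").strip().lower() == "true"
    )
    return kwargs
```

The result can then be passed along as `PalveronCallbackHandler(**handler_kwargs_from_env())`.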

Retrieval Post-Processor (Alternative)

If you only want governance on retrievals (no LLM-side checks), use the post-processor directly on a query engine:

from palveron_llamaindex import PalveronNodePostprocessor

query_engine = index.as_query_engine(
    node_postprocessors=[PalveronNodePostprocessor(api_key="pv_live_xxx")],
)

# Or drop chunks that needed modification instead of retaining them
query_engine = index.as_query_engine(
    node_postprocessors=[
        PalveronNodePostprocessor(api_key="pv_live_xxx", drop_modified=True),
    ],
)

Agent Tool

Register Palveron as an explicit tool so a ReActAgent can verify content during reasoning:

from llama_index.core.agent import ReActAgent
from palveron_llamaindex import palveron_verify_tool

# search_tool and email_tool stand for your own tools, defined elsewhere
agent = ReActAgent.from_tools(
    [
        palveron_verify_tool("pv_live_xxx"),
        search_tool,
        email_tool,
    ],
    verbose=True,
)

response = agent.chat("Draft a reply to John, then verify it before sending")

Behaviour on Decisions

| Decision | Behaviour |
| --- | --- |
| ALLOWED / PASSED | Call proceeds |
| MODIFIED | Call proceeds; the PII redaction is logged. For retrievals, the original chunk is retained; set drop_modified=True to filter it out instead. |
| FLAGGED | Call proceeds; policy hit is logged |
| BLOCKED | Raises PalveronGovernanceError |
| PENDING_APPROVAL | Raises PalveronGovernanceError (queued for human review) |
| RATE_LIMITED | Raises PalveronGovernanceError (quota hit) |

Governance Records

print(f"Blocked: {handler.blocked_count}")
print(f"Trace IDs: {handler.trace_ids}")

for record in handler.records:
    print(f"{record.event} [{record.surface}]: {record.decision} ({record.latency_ms:.0f}ms)")
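
For dashboards or test assertions, the per-call records can be rolled up by decision. The sketch below assumes each record exposes the `event`, `surface`, `decision`, and `latency_ms` attributes used in the loop above. The `Record` dataclass is a stand-in for illustration, not the adapter's own type, and `summarize` is a hypothetical helper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Record:
    """Stand-in with the attributes printed above; not the adapter's own class."""
    event: str
    surface: str
    decision: str
    latency_ms: float


def summarize(records: Iterable[Record]) -> Tuple[Dict[str, int], float]:
    """Return (count per decision, mean latency in ms) for a batch of records."""
    records = list(records)
    counts = Counter(r.decision for r in records)
    mean_latency = (
        sum(r.latency_ms for r in records) / len(records) if records else 0.0
    )
    return dict(counts), mean_latency
```

If the real records carry the same attributes, `summarize(handler.records)` should work unchanged.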

Error Handling

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from palveron_llamaindex import PalveronCallbackHandler, PalveronGovernanceError

handler = PalveronCallbackHandler(api_key="pv_live_xxx")
Settings.callback_manager = CallbackManager([handler])

try:
    response = query_engine.query("Find the file for SSN 123-45-6789")
except PalveronGovernanceError as e:
    print(e.decision)   # "BLOCKED"
    print(e.trace_id)   # "trc_abc123"
    print(e.reason)     # "PII detected in retrieved chunk customer_2024_03.txt"
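
In a user-facing application, the three raising decisions usually call for different messages. The sketch below branches on the `decision` attribute and includes the `trace_id` for support lookups. `GovernanceErrorStandIn` and `user_message` are hypothetical; the stand-in only mirrors the three attributes printed above so the example runs without the adapter installed.

```python
class GovernanceErrorStandIn(Exception):
    """Stand-in carrying the attributes shown above: decision, trace_id, reason."""

    def __init__(self, decision: str, trace_id: str, reason: str) -> None:
        super().__init__(reason)
        self.decision = decision
        self.trace_id = trace_id
        self.reason = reason


def user_message(error) -> str:
    """Map a governance error to a user-facing message.

    The reason is deliberately not echoed, since it may describe
    sensitive content (for example, which file contained PII).
    """
    if error.decision == "PENDING_APPROVAL":
        text = "This request is waiting for human review."
    elif error.decision == "RATE_LIMITED":
        text = "The policy quota is exhausted; please retry later."
    else:  # BLOCKED, or any decision this sketch does not know about
        text = "This request was blocked by policy."
    return f"{text} (reference: {error.trace_id})"
```

Inside the `except PalveronGovernanceError as e:` branch above, `user_message(e)` would produce the text to show, while `e.reason` goes to internal logs only.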

Source Code

Open source (MIT): github.com/palveron/adapter-llamaindex.
