Create a Guardrail

Create guardrails in natural language with the NL Policy Builder — or write the rule yourself.

Policies are the heart of the guardrail engine. Each policy states in natural language what is allowed, blocked, or modified, and Palveron's NL Policy Builder turns that prose into the underlying detection setup.

Create a new guardrail

Navigate to Guardrails in the sidebar.
Click New guardrail (top right).
The policy editor opens.

📸 Screenshot: Policy editor — empty new policy with the NL builder prompt.

Natural-Language Policy Builder

At the top of the editor sits the NL Policy Builder — a single text field that compiles plain English into a working policy.

Type a description like:

"Block any prompt containing credit card numbers, bank account numbers (IBAN), or social security numbers. Allow if the user has explicitly justified the request for a banking workflow."

Click Generate policy. The builder fills in:

Name — derived from the gist ("Block financial PII")
Neural instruction — a cleaner rewrite of your prose, ready for review
Enforcement action — BLOCK, MODIFY, APPROVAL, or FLAG based on intent verbs ("block", "mask", "approve", "log")
Detection mode — AUTO by default; the builder picks EXACT when keywords dominate and SEMANTIC when intent dominates (see Detection Mode)
Suggested keywords / regex patterns — for any entities mentioned

Review and edit every field before saving — the builder is a starting point, not the final policy.
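To make the verb-to-action idea concrete, here is a minimal sketch of how intent verbs could map to enforcement actions. It is an illustration only, not Palveron's implementation: the function name, the precedence order, and the whole-word matching are assumptions. The verb/action pairs are the ones listed for the Enforcement action field.

```python
import re
from typing import Optional

# Verb/action pairs from the field list; the precedence order is an assumption.
INTENT_VERBS = [
    ("block", "BLOCK"),
    ("approve", "APPROVAL"),
    ("mask", "MODIFY"),
    ("log", "FLAG"),
]


def guess_enforcement_action(description: str) -> Optional[str]:
    """Return the action for the first listed verb found as a whole word, else None."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    for verb, action in INTENT_VERBS:
        if verb in words:
            return action
    return None
```

A description that contains none of the four verbs returns None here, which is one reason to review the generated action by hand.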

What the NL builder is good at

Identifying which enforcement action matches the verbs in your description
Generating starter keyword lists for common PII types (email, phone, IBAN, SSN, credit card)
Picking a sensible detection mode based on whether you described exact patterns or fuzzy intent

What you still own

Edge cases — "block, except when the agent is in the customer-support workflow with a justified request"
Scope — which agents the policy applies to
Attestation level — automatic for most policies; override for compliance-bound ones

The builder is powered by POST /api/v1/ai-assist. See the Policies API if you want to invoke it programmatically (e.g., to generate policies from a CSV of risks).
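As a rough sketch of the CSV use case, the snippet below prepares one POST /api/v1/ai-assist request per CSV row. Only the endpoint path comes from this page. The host, the bearer-token header, the `description` CSV column, and the `description` JSON field are assumptions for illustration; check the Policies API for the actual request and response schema.

```python
import csv
import io
import json
import urllib.request

BASE_URL = "https://palveron.example.com"  # hypothetical host
API_KEY = "YOUR_API_KEY"                   # hypothetical credential


def build_ai_assist_requests(csv_text: str) -> list:
    """Prepare one POST request per CSV row with a non-empty description.

    Nothing is sent here; pass each request to send() when ready.
    """
    prepared = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        description = (row.get("description") or "").strip()
        if not description:
            continue  # skip rows without a risk description
        # Assumed request body: a single "description" field.
        body = json.dumps({"description": description}).encode("utf-8")
        prepared.append(
            urllib.request.Request(
                f"{BASE_URL}/api/v1/ai-assist",
                data=body,
                method="POST",
                headers={
                    "Content-Type": "application/json",
                    "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
                },
            )
        )
    return prepared


def send(request: urllib.request.Request) -> dict:
    """Send one prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the payloads before any policy is generated.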

Manual fields

If you'd rather skip the builder, every field is editable directly.

Name

A short, descriptive label (e.g., "PII protection for customer data" or "No financial advice"). Used in traces, audit logs, and Annex IV reports.

Neural Instruction (core rule)

Write the rule in natural language. Examples:

"Block any attempt to send personally identifiable customer data — email addresses, phone numbers, or addresses — to an AI model."
"If the agent is asked to give financial advice, mask the response and add a disclaimer."
"Require human approval when the agent wants to send an email to an external recipient."

Tip — Be specific. "Block sensitive data" is too vague. "Block email addresses, IBANs, and social security numbers in the input" is precise. The verify engine performs better with precise rules.

Detection mode

Three modes are available (see Detection Mode for details):

Mode	When to pick
`AUTO` (default)	Let Palveron classify based on the neural instruction
`EXACT`	Pure keyword / regex matching; fastest, with the lowest false-positive rate on structured data
`SEMANTIC`	NLI-based intent matching — catches paraphrasing
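To show why EXACT suits structured data, here is a small sketch of keyword / regex matching. The patterns are deliberately simplified and are not the ones Palveron ships; the Luhn checksum is a standard card-number check added here to cut false positives on arbitrary digit runs.

```python
import re

# Simplified illustrative patterns (assumptions, not Palveron's built-ins).
PATTERNS = {
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum over the digits in `number`."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def exact_matches(text: str) -> list:
    """Return (pattern_name, matched_text) pairs found in `text`."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group(0)
            if name == "credit_card" and not luhn_valid(value):
                continue  # digit run that fails the checksum
            hits.append((name, value))
    return hits
```

A paraphrase such as "my social is one two three..." would slip past these patterns, which is the gap SEMANTIC mode is described as covering.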

Enforcement Action

Action	Description
BLOCK	Request stopped immediately. Agent receives a structured error.
APPROVAL	Request paused; reviewer is notified via Slack / Teams / webhook.
MODIFY	PII / matched content replaced with placeholders. Request continues with the redacted payload.
FLAG	Request passes through unchanged but is tagged in monitoring for later review.
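The MODIFY behaviour can be pictured with a short redaction sketch. The placeholder format (`[EMAIL]`, `[US_SSN]`) and the patterns are assumptions chosen for illustration; the actual placeholders Palveron inserts may differ.

```python
import re

# Illustrative patterns; labels double as placeholder names (assumed format).
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple:
    """Replace every match with its placeholder; return (redacted_text, match_count)."""
    count = 0
    for label, pattern in REDACTION_PATTERNS.items():
        text, replaced = pattern.subn(f"[{label}]", text)
        count += replaced
    return text, count
```

The redacted text is what would continue downstream; the match count is the kind of detail a trace could record.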

Attestation Level

Level	Description
Always on-chain	Every trigger is Flare-anchored. Enforced automatically for BLOCK and APPROVAL on HIGH-risk agents.
Automatic	System decides based on severity and the agent's risk tier.
Local only	Database only — for development / test policies.
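The one override stated in the table can be written as a small rule. This sketch assumes that the override takes precedence over whatever level is configured, and the string constants are hypothetical identifiers. It does not model how Automatic weighs severity and risk tier, since that logic is not described here.

```python
def effective_attestation(configured: str, action: str, risk_tier: str) -> str:
    """Apply the BLOCK/APPROVAL-on-HIGH-risk override; otherwise keep the configured level."""
    if action in {"BLOCK", "APPROVAL"} and risk_tier == "HIGH":
        return "ALWAYS_ON_CHAIN"  # hypothetical identifier for "Always on-chain"
    return configured
```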

Scope

All agents — applies to every agent in the project
Specific agents — pick from a multi-select dropdown
Specific agent types — applies to all agents of a type (e.g. chatbot, code_assistant)

Activate the policy

After saving, the policy is Inactive. Click the Active / Inactive toggle to activate. Only active policies run in the verify flow.
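Scope and activation together decide whether a policy runs for a given agent. The sketch below models that check; the class and field names are hypothetical and do not reflect Palveron's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    agent_id: str
    agent_type: str  # e.g. "chatbot" or "code_assistant"


@dataclass(frozen=True)
class Policy:
    name: str
    active: bool
    scope_kind: str  # "all", "agents", or "agent_types"
    scope_values: frozenset = frozenset()


def applies_to(policy: Policy, agent: Agent) -> bool:
    """True only if the policy is active and the agent falls inside its scope."""
    if not policy.active:
        return False  # inactive policies never run
    if policy.scope_kind == "all":
        return True
    if policy.scope_kind == "agents":
        return agent.agent_id in policy.scope_values
    if policy.scope_kind == "agent_types":
        return agent.agent_type in policy.scope_values
    raise ValueError(f"unknown scope kind: {policy.scope_kind}")
```

A newly saved policy corresponds to `active=False` in this model, so it matches no agent until the toggle is switched on.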

Test the policy

Navigate to the Playground.
Select an agent that's in scope.
Send a test prompt that should trigger the policy.
Open the trace in the Trace Explorer to confirm the policy fired, then check match_details to see what specifically triggered it.

If a policy fires when it shouldn't (a false positive), check the confidence scores in the trace's NGE breakdown. A false positive is usually a sign that you should tighten the neural instruction or switch from AUTO to EXACT.
