Create a Guardrail
Create guardrails in natural language with the NL Policy Builder — or write the rule yourself.
Policies are the heart of the guardrail engine. They define in natural language what is allowed, blocked, or modified — and Palveron's NL Policy Builder turns plain prose into the underlying detection setup.
Create a new guardrail
- Navigate to Guardrails in the sidebar.
- Click New guardrail (top right).
- The policy editor opens.
📸 Screenshot: Policy editor — empty new policy with the NL builder prompt.
Natural-Language Policy Builder
At the top of the editor sits the NL Policy Builder — a single text field that compiles plain English into a working policy.
Type a description like:
"Block any prompt containing credit card numbers, bank account numbers (IBAN), or social security numbers. Allow if the user has explicitly justified the request for a banking workflow."
Click Generate policy. The builder fills in:
- Name: derived from the gist ("Block financial PII")
- Neural instruction: a cleaner rewrite of your prose, ready for review
- Enforcement action: BLOCK, MODIFY, APPROVAL, or FLAG, based on intent verbs ("block", "mask", "approve", "log")
- Detection mode: AUTO by default; the builder picks EXACT when keywords dominate and SEMANTIC when intent dominates (see Detection Mode)
- Suggested keywords / regex patterns: for any entities mentioned
Review and edit every field before saving — the builder is a starting point, not the final policy.
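As a rough illustration of how intent verbs could map to enforcement actions, here is a minimal sketch. The verb table and the `guess_action` helper are hypothetical and only mirror the four example verbs listed above; the real builder's logic is not documented here and is likely more nuanced.

```python
import re
from typing import Optional

# Hypothetical verb-to-action table, mirroring the example verbs in this guide.
VERB_TO_ACTION = {
    "block": "BLOCK",
    "mask": "MODIFY",
    "approve": "APPROVAL",
    "log": "FLAG",
}

def guess_action(description: str) -> Optional[str]:
    """Return the action for the first known intent verb, or None if none is found."""
    for word in re.findall(r"[a-z]+", description.lower()):
        if word in VERB_TO_ACTION:
            return VERB_TO_ACTION[word]
    return None

action = guess_action("Block any prompt containing credit card numbers.")
```

A description with no recognizable verb returns None in this sketch, which is one reason to review the Enforcement action field before saving.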
What the NL builder is good at
- Identifying which enforcement action matches the verbs in your description
- Generating starter keyword lists for common PII types (email, phone, IBAN, SSN, credit card)
- Picking a sensible detection mode based on whether you described exact patterns or fuzzy intent
What you still own
- Edge cases — "block, except when the agent is in the customer-support workflow with a justified request"
- Scope — which agents the policy applies to
- Attestation level — automatic for most policies; override for compliance-bound ones
The builder is powered by POST /api/v1/ai-assist — see the Policies API if you want to invoke it programmatically (e.g., to generate policies from a CSV of risks).
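For the CSV use case, a minimal sketch of preparing one request per risk might look like the following. Only the POST /api/v1/ai-assist path comes from this page; the host, the `description` payload field, the bearer-token header, and the `risk` column name are all assumptions, so check the Policies API for the actual request shape.

```python
import csv
import io
import json
import urllib.request

BASE_URL = "https://palveron.example.com"  # hypothetical host
AI_ASSIST_PATH = "/api/v1/ai-assist"       # endpoint named in this guide

def build_requests(csv_text: str, api_key: str) -> list:
    """Build one POST request per CSV row. Nothing is sent here."""
    requests = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed payload: a single "description" field holding the prose rule.
        body = json.dumps({"description": row["risk"]}).encode("utf-8")
        requests.append(
            urllib.request.Request(
                BASE_URL + AI_ASSIST_PATH,
                data=body,
                method="POST",
                headers={
                    "Content-Type": "application/json",
                    "Authorization": f"Bearer {api_key}",  # assumed auth scheme
                },
            )
        )
    return requests

RISKS_CSV = "risk\nBlock prompts containing IBANs\nMask phone numbers in responses\n"
reqs = build_requests(RISKS_CSV, api_key="test-key")
```

Each prepared request could then be sent with `urllib.request.urlopen(req)`. Generated policies should still be reviewed in the editor before they are activated.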
Manual fields
If you'd rather skip the builder, every field is editable directly.
Name
A short, descriptive label (e.g., "PII protection for customer data" or "No financial advice"). Used in traces, audit logs, and Annex IV reports.
Neural Instruction (core rule)
Write the rule in natural language. Examples:
- "Block any attempt to send personally identifiable customer data — email addresses, phone numbers, or addresses — to an AI model."
- "If the agent is asked to give financial advice, mask the response and add a disclaimer."
- "Require human approval when the agent wants to send an email to an external recipient."
Tip — Be specific. "Block sensitive data" is too vague. "Block email addresses, IBANs, and social security numbers in the input" is precise. The verify engine performs better with precise rules.
Detection mode
Three modes are available (see Detection Mode for the full picture):
| Mode | When to pick |
|---|---|
| AUTO (default) | Let Palveron classify based on the neural instruction |
| EXACT | Pure keyword / regex matching; fastest, with the lowest false-positive rate on structured data |
| SEMANTIC | NLI-based intent matching; catches paraphrasing |
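To make the EXACT versus SEMANTIC trade-off concrete, here is a small sketch of pattern-based matching. The regular expressions are simplified illustrations written for this example, not Palveron's built-in detectors, and `exact_matches` is a hypothetical helper.

```python
import re

# Simplified, illustrative patterns. Real detectors are typically stricter.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def exact_matches(text: str) -> dict:
    """Return every pattern hit, keyed by entity type."""
    hits = {}
    for name, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits

structured = exact_matches("Send to jane@example.com, IBAN DE89370400440532013000")
paraphrased = exact_matches("my email is jane at example dot com")
```

The structured input yields hits for both the email and the IBAN, while the paraphrased one yields none. That gap is the kind of case where intent-based matching is meant to help.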
Enforcement Action
| Action | Description |
|---|---|
| BLOCK | Request stopped immediately. Agent receives a structured error. |
| APPROVAL | Request paused; reviewer is notified via Slack / Teams / webhook. |
| MODIFY | PII / matched content replaced with placeholders. Request continues with the redacted payload. |
| FLAG | Request passes through unchanged but is tagged in monitoring for later review. |
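The four actions can be pictured as a small dispatch over the request payload. This is a sketch under stated assumptions: the `Outcome` type, the tag names, the `[SSN]` placeholder format, and the single SSN pattern are all hypothetical and stand in for whatever the engine actually does.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # illustrative pattern only

@dataclass
class Outcome:
    forwarded: bool              # does the request continue to the model?
    payload: Optional[str]       # what is forwarded, if anything
    tags: List[str] = field(default_factory=list)

def enforce(action: str, payload: str) -> Outcome:
    """Apply one enforcement action to a payload that matched a policy."""
    if action == "BLOCK":
        return Outcome(False, None, ["blocked"])           # stopped immediately
    if action == "APPROVAL":
        return Outcome(False, None, ["pending_approval"])  # paused for a reviewer
    if action == "MODIFY":
        return Outcome(True, SSN.sub("[SSN]", payload))    # continues, redacted
    if action == "FLAG":
        return Outcome(True, payload, ["flagged"])         # continues unchanged, tagged
    raise ValueError(f"unknown action: {action}")

modified = enforce("MODIFY", "SSN 123-45-6789")
```

Only MODIFY and FLAG let the request continue in this sketch, and only MODIFY changes what is forwarded.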
Attestation Level
| Level | Description |
|---|---|
| Always on-chain | Every trigger is Flare-anchored. Enforced automatically for BLOCK and APPROVAL on HIGH-risk agents. |
| Automatic | System decides based on severity and the agent's risk tier. |
| Local only | Database only — for development / test policies. |
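One way to read the rule in the first row is as an override applied on top of the configured level. The sketch below encodes only that documented rule; the level identifiers and the `effective_attestation` helper are hypothetical, and the logic behind the Automatic level is not modeled.

```python
def effective_attestation(configured: str, action: str, risk_tier: str) -> str:
    """Return the attestation level that applies after the HIGH-risk override."""
    # Documented rule: BLOCK and APPROVAL on HIGH-risk agents are always on-chain.
    if action in {"BLOCK", "APPROVAL"} and risk_tier == "HIGH":
        return "ALWAYS_ON_CHAIN"
    # Otherwise the configured level stands (identifiers here are assumptions).
    return configured

level = effective_attestation("LOCAL_ONLY", action="BLOCK", risk_tier="HIGH")
```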
Scope
- All agents: applies to every agent in the project
- Specific agents: pick from a multi-select dropdown
- Specific agent types: applies to all agents of a type (e.g., chatbot, code_assistant)
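The three scope options amount to a simple membership check. Here is a minimal sketch, assuming hypothetical `Scope` and `Agent` types; the field names are invented for illustration and do not reflect Palveron's data model.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class Agent:
    id: str
    type: str

@dataclass
class Scope:
    all_agents: bool = False
    agent_ids: Set[str] = field(default_factory=set)
    agent_types: Set[str] = field(default_factory=set)

def in_scope(scope: Scope, agent: Agent) -> bool:
    """True if the policy's scope covers this agent."""
    return (
        scope.all_agents
        or agent.id in scope.agent_ids
        or agent.type in scope.agent_types
    )

support_bot = Agent(id="a1", type="chatbot")
covered = in_scope(Scope(agent_types={"chatbot"}), support_bot)
```

When a policy does not fire in the Playground, checking that the selected agent falls inside the policy's scope is a reasonable first step.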
Activate the policy
After saving, the policy is Inactive. Click the Active / Inactive toggle to activate. Only active policies run in the verify flow.
Test the policy
- Navigate to the Playground.
- Select an agent that's in scope.
- Send a test prompt that should trigger the policy.
- Open the trace in the Trace Explorer to confirm the policy fired, and check match_details to see what specifically triggered it.
If a policy fires when it shouldn't (a false positive), check the confidence scores in the trace's NGE breakdown. A false positive is usually a sign to tighten the neural instruction or switch from AUTO to EXACT.