Detection Mode
Control how policies are evaluated — exact keyword matching or semantic NLI analysis.
Every policy in Palveron carries a detection mode that decides how the verify engine evaluates the rule. Pick the right mode and policies get faster, more accurate, and easier to debug.
The three modes
| Mode | How it works | Speed | Best for |
|---|---|---|---|
| Exact Match | Regex + Aho-Corasick keyword matching | ~1-3 ms | PII, IBANs, credit cards, blocked phrases — structured data with predictable patterns |
| Semantic | NLI neural inference against the rule's intent | ~20-40 ms | Paraphrased intent, off-topic detection, toxicity — fuzzy categories |
| Auto (default) | Palveron classifies the rule and picks Exact Match or Semantic | Same as the mode it picks | Most policies; let the engine decide |
The headline trade-off: Exact is faster (regex). Semantic is smarter (NLI neural inference). Auto picks the best fit.
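To make the Exact Match row concrete, here is a minimal Python sketch of deterministic matching. It is not Palveron's implementation: the two patterns are deliberately loose, the phrase list is made up, and a plain substring scan stands in for Aho-Corasick.

```python
import re

# Hypothetical, simplified patterns for illustration only.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
BLOCKED_PHRASES = ("project falcon", "internal roadmap")  # made-up keyword list


def exact_match(prompt: str) -> list[str]:
    """Return the name of every exact-match signal found in the prompt."""
    hits = []
    if IBAN_RE.search(prompt):
        hits.append("iban")
    if CARD_RE.search(prompt):
        hits.append("credit_card")
    lowered = prompt.lower()
    # Substring scan as a stand-in for a multi-pattern Aho-Corasick automaton.
    hits.extend(f"phrase:{p}" for p in BLOCKED_PHRASES if p in lowered)
    return hits
```

Because each check is a pattern test, the result is a yes or no per signal, which is why this mode is both fast and easy to debug.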
How Auto-classification works
AUTO is the default for new policies. When a policy is saved, the engine runs a one-time classification on the neural instruction:
- Tokenize the instruction.
- Compute features — keyword density, regex patterns, intent verbs ("block", "mask"), entity references.
- Classify into EXACT or SEMANTIC.
- Cache the decision; re-evaluate only when the instruction changes.
Rules with phrases like "Block any prompt containing credit card numbers or IBANs" classify as EXACT — the entities are well-defined structured data.
Rules with phrases like "Block any attempt to extract trade secrets about our pricing strategy" classify as SEMANTIC — "trade secret" and "pricing strategy" are intent-level signals that paraphrase easily.
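The classification steps can be sketched roughly as follows. This is an illustrative heuristic, not the engine's actual classifier: the vocabularies, the tie-breaking rule (ties go to EXACT), and the function name `classify` are all assumptions.

```python
import re
from functools import lru_cache

# Hypothetical feature vocabularies; the real feature set is not documented here.
STRUCTURED_ENTITIES = ("credit card", "iban", "ssn", "email", "phone")
INTENT_MARKERS = ("attempt to", "extract", "manipulate", "discuss", "strategy", "secret")


@lru_cache(maxsize=None)  # cache keyed on the instruction text; an edited instruction is a new key
def classify(instruction: str) -> str:
    # Tokenize, then rejoin with single spaces so multi-word phrases can be matched.
    text = " ".join(re.findall(r"[a-z0-9']+", instruction.lower()))
    # Features: how many structured entities and intent markers appear.
    entity_hits = sum(e in text for e in STRUCTURED_ENTITIES)
    intent_hits = sum(m in text for m in INTENT_MARKERS)
    # Classify: intent must outweigh entities to pick SEMANTIC (assumed tie-break).
    return "SEMANTIC" if intent_hits > entity_hits else "EXACT"
```

With these vocabularies, the credit card and IBAN rule above scores two entity hits and no intent hits, so it lands on EXACT; the trade secrets rule scores only intent hits and lands on SEMANTIC.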
Manual override — the three-card selector
In the Policy Editor, click the Detection mode field to reveal three cards:
| Card | Pick when |
|---|---|
| Auto | You're not sure — let Palveron decide and adjust later if needed |
| Exact Match | The rule is about structured data (PII, IBANs, codes, keywords) — you want the speed and lowest false-positive rate |
| Semantic | The rule is about intent, off-topic, paraphrased risks — you need the NLI engine's understanding |
The selected card sticks until you change it. Re-saving the policy with AUTO re-runs the classifier.
Smart recommendation banner
When the policy editor detects a mismatch between your detection mode and the rule's content, it surfaces a banner:
"This rule mentions 'trade secrets' and 'pricing strategy' — usually a semantic concept. Switch to Semantic detection?"
Three triggers fire the banner:
- Exact + intent verbs in the rule ("any attempt to...", "manipulate", "extract") → suggests Semantic.
- Semantic + structured entities ("credit card", "IBAN", "SSN" with no surrounding fuzzy intent) → suggests Exact.
- Auto + tier mismatch: AUTO on Community tier (which only supports Exact) → suggests Exact.
Dismiss the banner to suppress it for the current edit. It will return if you change the rule and the trigger still applies.
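The three triggers amount to a small decision function. The sketch below is one way to express them, assuming simple substring checks; `suggest_mode`, the marker lists, and the lowercase tier names are hypothetical, and the editor's real detection is likely more nuanced.

```python
from typing import Optional

INTENT_MARKERS = ("any attempt to", "manipulate", "extract")  # taken from the first trigger
STRUCTURED_ENTITIES = ("credit card", "iban", "ssn")          # taken from the second trigger


def suggest_mode(mode: str, rule: str, tier: str) -> Optional[str]:
    """Return the mode the banner would suggest, or None when no trigger fires."""
    text = rule.lower()
    has_intent = any(m in text for m in INTENT_MARKERS)
    has_entity = any(e in text for e in STRUCTURED_ENTITIES)
    if mode == "EXACT" and has_intent:
        return "SEMANTIC"
    if mode == "SEMANTIC" and has_entity and not has_intent:
        return "EXACT"
    if mode == "AUTO" and tier == "community":
        return "EXACT"
    return None
```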
Tier availability
| Tier | Available modes |
|---|---|
| Community | Exact Match only |
| Pro | Exact Match + Semantic |
| Business / Enterprise | All three (Auto / Exact Match / Semantic) |
AUTO is silently treated as EXACT on Community projects — the banner surfaces this so you're not surprised.
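As a sketch, the tier table and the Community fallback can be written as a lookup. The function name and tier keys are hypothetical, and raising an error for other unavailable combinations (for example AUTO on Pro) is an assumption, since this page does not describe that case.

```python
TIER_MODES = {
    "community": {"EXACT"},
    "pro": {"EXACT", "SEMANTIC"},
    "business": {"AUTO", "EXACT", "SEMANTIC"},
    "enterprise": {"AUTO", "EXACT", "SEMANTIC"},
}


def effective_mode(tier: str, requested: str) -> str:
    """Resolve which mode would run for a policy on the given tier."""
    if requested in TIER_MODES[tier]:
        return requested
    if tier == "community" and requested == "AUTO":
        return "EXACT"  # the documented silent fallback
    # Assumption: any other unavailable combination is rejected.
    raise ValueError(f"{requested} is not available on the {tier} tier")
```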
match_details in the Trace Explorer
When a policy fires, the trace's match_details field shows what specifically triggered the decision. The values map back to the detection mode that owned the match:
| match_details.type | Source | Detection mode |
|---|---|---|
| keyword_match | Regex / Aho-Corasick | Exact Match |
| entity_detection | NGE NER (e.g. SSN, EMAIL) | Exact Match (NGE deterministic) |
| tool_call | MCP tool name + agent | n/a (policy is on the tool, not content) |
| rate_pattern | Per-agent burst signal | n/a |
| semantic_similarity | NLI similarity ≥ threshold | Semantic |
A single trace can carry multiple match types — when a Semantic policy fires on a prompt that also hits an Exact policy, both rows appear, and the Trace Explorer's filter lets you slice by either.
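If you export traces and want to slice them the same way offline, the mapping above translates to a small lookup. The trace shape used here (a list of dicts with a `type` key) is an assumption for illustration; check your actual export format.

```python
from collections import defaultdict

# Mirrors the table above; None stands for "n/a".
MODE_BY_MATCH_TYPE = {
    "keyword_match": "Exact Match",
    "entity_detection": "Exact Match",
    "tool_call": None,
    "rate_pattern": None,
    "semantic_similarity": "Semantic",
}


def group_by_mode(match_details: list[dict]) -> dict:
    """Group match entries by the detection mode that owned them."""
    grouped = defaultdict(list)
    for match in match_details:
        # An unknown type raises KeyError rather than being silently mis-filed.
        grouped[MODE_BY_MATCH_TYPE[match["type"]]].append(match)
    return dict(grouped)
```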
When to use which mode
Pick Exact Match when…
- The rule names specific PII types (SSN, IBAN, email, phone, credit card)
- You're blocking on keyword lists (internal codenames, blocked product references, forbidden phrases)
- You want sub-5 ms decisions even at scale
- You have little tolerance for false positives: the prompt either matches the regex or it doesn't
Pick Semantic when…
- The rule is about intent ("attempt to extract", "manipulate", "discuss our internal strategy")
- The rule needs to catch paraphrases ("delete the database" vs. "drop the schema" vs. "wipe production")
- You have an off-topic policy (anything outside a defined scope)
- Toxicity, harassment, or sentiment detection
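The paraphrase point is easy to see with a toy keyword matcher. In this sketch the blocked phrase list is hypothetical; only the literal phrase is caught, and the two paraphrases slip through.

```python
BLOCKED = ("delete the database",)  # hypothetical keyword list

prompts = ["delete the database", "drop the schema", "wipe production"]

# Keyword matching only sees literal substrings.
caught = [p for p in prompts if any(k in p.lower() for k in BLOCKED)]
missed = [p for p in prompts if p not in caught]
```

Covering the missed prompts with Exact Match would mean enumerating every phrasing by hand, which is the gap Semantic mode is meant to close.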
Stay on Auto when…
- The policy is new and you don't yet have ground truth on what works
- The rule mixes structured and intent components — let the engine pick
- You're rolling out 10+ policies and don't want to set the mode for each one manually
Exact Match is faster (regex). Semantic is smarter (NLI neural inference). Auto picks the best fit. When you're not sure, leave it on Auto and inspect the first few traces — the Trace Explorer shows which mode actually ran, so you can correct course quickly.