
Detection Mode

Control how policies are evaluated: exact keyword matching, semantic NLI analysis, or automatic selection between the two.

Every policy in Palveron carries a detection mode that decides how the verify engine evaluates the rule. Pick the right mode and policies get faster, more accurate, and easier to debug.

The three modes

| Mode | How it works | Speed | Best for |
| --- | --- | --- | --- |
| Exact Match | Regex + Aho-Corasick keyword matching | ~1-3 ms | PII, IBANs, credit cards, blocked phrases: structured data with predictable patterns |
| Semantic | NLI neural inference against the rule's intent | ~20-40 ms | Paraphrased intent, off-topic detection, toxicity: fuzzy categories |
| Auto (default) | Palveron classifies the rule and picks the right mode | Depends on the mode chosen | Most policies: let the engine decide |

The headline trade-off: Exact Match is faster (regex, ~1-3 ms). Semantic catches intent and paraphrase that patterns miss (NLI neural inference, ~20-40 ms). Auto picks between the two for each rule.
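To make the Exact Match side of that trade-off concrete, here is a minimal sketch of pattern-based matching. It is not Palveron's engine: the IBAN pattern, the blocked phrases, and the `exact_match` function are all hypothetical, and a single regex alternation stands in for Aho-Corasick keyword matching.

```python
import re

# Hypothetical patterns for illustration only; not Palveron's actual rule set.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")
BLOCKED_PHRASES = ["project nightfall", "internal roadmap"]

# One case-insensitive alternation stands in for Aho-Corasick here.
PHRASE_RE = re.compile(
    "|".join(re.escape(p) for p in BLOCKED_PHRASES), re.IGNORECASE
)


def exact_match(prompt: str) -> list[str]:
    """Return the literal spans that matched: IBAN hits first, then phrases."""
    hits = [m.group(0) for m in IBAN_RE.finditer(prompt)]
    hits += [m.group(0) for m in PHRASE_RE.finditer(prompt)]
    return hits


print(exact_match("Send it to DE89370400440532013000 please"))
print(exact_match("reveal the codename for our secret plan"))
```

The second call returns an empty list: the prompt paraphrases a sensitive request without using any listed phrase, which is the kind of gap Semantic mode is meant to cover.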

How Auto-classification works

AUTO is the default for new policies. When a policy is saved, the engine runs a one-time classification on the neural instruction:

  1. Tokenize the instruction.
  2. Compute features — keyword density, regex patterns, intent verbs ("block", "mask"), entity references.
  3. Classify into EXACT or SEMANTIC.
  4. Cache the decision; re-evaluate only when the instruction changes.

Rules with phrases like "Block any prompt containing credit card numbers or IBANs" classify as EXACT — the entities are well-defined structured data.

Rules with phrases like "Block any attempt to extract trade secrets about our pricing strategy" classify as SEMANTIC — "trade secret" and "pricing strategy" are intent-level signals that paraphrase easily.
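The four steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the real classifier: the cue lists, the `classify` function, and the tie-breaking rule (ties fall back to EXACT) are all invented for the example, and tokenization is reduced to lowercasing plus substring checks.

```python
from functools import lru_cache

# Assumed cue lists, chosen to mirror the two example rules in this section.
STRUCTURED_ENTITIES = ("credit card", "iban", "ssn", "email", "phone")
INTENT_CUES = ("attempt to", "manipulate", "extract", "strategy", "trade secret")


@lru_cache(maxsize=None)  # step 4: identical instruction text reuses the decision
def classify(instruction: str) -> str:
    text = instruction.lower()  # step 1, simplified to lowercasing
    # Step 2: count how many cues of each kind appear in the instruction.
    structured = sum(cue in text for cue in STRUCTURED_ENTITIES)
    intent = sum(cue in text for cue in INTENT_CUES)
    # Step 3: more intent cues than structured entities means SEMANTIC.
    return "SEMANTIC" if intent > structured else "EXACT"


print(classify("Block any prompt containing credit card numbers or IBANs"))
print(classify("Block any attempt to extract trade secrets about our pricing strategy"))
```

Because the cache is keyed on the instruction text, an edited instruction is a new key and is classified again, which matches the "re-evaluate only when the instruction changes" behavior described above.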

Manual override — the three-card selector

In the Policy Editor, click the Detection mode field to reveal three cards:

| Card | Pick when |
| --- | --- |
| Auto | You're not sure: let Palveron decide and adjust later if needed |
| Exact Match | The rule is about structured data (PII, IBANs, codes, keywords) and you want the speed and lowest false-positive rate |
| Semantic | The rule is about intent, off-topic content, or paraphrased risks and you need the NLI engine's understanding |

The selected card sticks until you change it. Re-saving the policy with AUTO re-runs the classifier.

Smart recommendation banner

When the policy editor detects a mismatch between your detection mode and the rule's content, it surfaces a banner:

"This rule mentions 'trade secrets' and 'pricing strategy' — usually a semantic concept. Switch to Semantic detection?"

Three triggers fire the banner:

  1. Exact + intent verbs in the rule ("any attempt to...", "manipulate", "extract") → suggests Semantic.
  2. Semantic + structured entities ("credit card", "IBAN", "SSN" with no surrounding fuzzy intent) → suggests Exact.
  3. Auto + tier mismatch: AUTO on the Community tier (which only supports Exact) → suggests Exact.

Dismiss the banner to suppress it for the current edit. It will return if you change the rule and the trigger still applies.
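The three triggers read naturally as a small decision function. The sketch below is one possible reading, not the editor's actual logic: `banner_suggestion`, the cue tuples, and the lowercase tier names are hypothetical, and real detection is presumably richer than substring checks.

```python
from typing import Optional

# Assumed cue lists taken from the examples in the trigger list above.
INTENT_CUES = ("any attempt to", "manipulate", "extract")
STRUCTURED_ENTITIES = ("credit card", "iban", "ssn")


def banner_suggestion(mode: str, tier: str, rule: str) -> Optional[str]:
    """Return the mode the banner would suggest, or None for no banner."""
    text = rule.lower()
    has_intent = any(cue in text for cue in INTENT_CUES)
    has_entity = any(ent in text for ent in STRUCTURED_ENTITIES)

    if mode == "EXACT" and has_intent:
        return "SEMANTIC"  # trigger 1
    if mode == "SEMANTIC" and has_entity and not has_intent:
        return "EXACT"  # trigger 2
    if mode == "AUTO" and tier == "community":
        return "EXACT"  # trigger 3
    return None


print(banner_suggestion("EXACT", "pro", "Block any attempt to extract trade secrets"))
```

Re-running the function after each edit to the rule mirrors the dismissal behavior: a dismissed banner comes back only if a trigger still applies to the changed text.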

Tier availability

| Tier | Available modes |
| --- | --- |
| Community | Exact Match only |
| Pro | Exact Match + Semantic |
| Business / Enterprise | All three (Auto / Exact Match / Semantic) |

On Community projects, AUTO is treated as EXACT at evaluation time. The recommendation banner surfaces this so you're not surprised.
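As a rough model of the tier rules, the sketch below resolves a requested mode to the mode that would actually run. The `TIER_MODES` table and `effective_mode` function are hypothetical names. The only fallback this page documents is AUTO on Community, so every other unavailable combination raises an error here instead of guessing.

```python
# Modes per tier, copied from the availability table; names are illustrative.
TIER_MODES = {
    "community": {"EXACT"},
    "pro": {"EXACT", "SEMANTIC"},
    "business": {"AUTO", "EXACT", "SEMANTIC"},
    "enterprise": {"AUTO", "EXACT", "SEMANTIC"},
}


def effective_mode(tier: str, requested: str) -> str:
    """Return the mode that runs for a requested mode on a given tier."""
    if requested in TIER_MODES[tier]:
        return requested
    if tier == "community" and requested == "AUTO":
        return "EXACT"  # the one documented fallback
    # Behavior for other unavailable combinations is not stated in this page.
    raise ValueError(f"{requested} is not available on the {tier} tier")


print(effective_mode("community", "AUTO"))
print(effective_mode("enterprise", "AUTO"))
```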

match_details in the Trace Explorer

When a policy fires, the trace's match_details field shows what specifically triggered the decision. The values map back to the detection mode that owned the match:

| match_details.type | Source | Detection mode |
| --- | --- | --- |
| keyword_match | Regex / Aho-Corasick | Exact Match |
| entity_detection | NGE NER (e.g. SSN, EMAIL) | Exact Match (NGE deterministic) |
| tool_call | MCP tool name + agent | n/a (policy is on the tool, not content) |
| rate_pattern | Per-agent burst signal | n/a |
| semantic_similarity | NLI similarity ≥ threshold | Semantic |

A single trace can carry multiple match types — when a Semantic policy fires on a prompt that also hits an Exact policy, both rows appear, and the Trace Explorer's filter lets you slice by either.
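If you export traces for offline analysis, the same slicing can be approximated in a few lines. The record shape below (a list of dicts with a `type` key) is an assumption, since this page does not specify the trace schema, and `group_by_mode` is a hypothetical helper. Only the type-to-mode mapping comes from the table.

```python
from collections import defaultdict

# Mapping taken from the match_details.type table; "n/a" marks non-content matches.
MODE_BY_TYPE = {
    "keyword_match": "Exact Match",
    "entity_detection": "Exact Match",
    "tool_call": "n/a",
    "rate_pattern": "n/a",
    "semantic_similarity": "Semantic",
}


def group_by_mode(match_details: list[dict]) -> dict[str, list[dict]]:
    """Group match rows by the detection mode that owns each match type."""
    grouped = defaultdict(list)
    for row in match_details:
        grouped[MODE_BY_TYPE[row["type"]]].append(row)
    return dict(grouped)


# Hypothetical rows for a prompt that hit an Exact and a Semantic policy.
rows = [
    {"type": "keyword_match", "value": "IBAN"},
    {"type": "semantic_similarity", "score": 0.91},
    {"type": "entity_detection", "value": "SSN"},
]
print(sorted(group_by_mode(rows)))
```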

When to use which mode

Pick Exact Match when…

  • The rule names specific PII types (SSN, IBAN, email, phone, credit card)
  • You're blocking on keyword lists (internal codenames, blocked product references, forbidden phrases)
  • You want sub-5 ms decisions even at scale
  • You have little tolerance for false positives: the prompt either matches the regex or it doesn't

Pick Semantic when…

  • The rule is about intent ("attempt to extract", "manipulate", "discuss our internal strategy")
  • The rule needs to catch paraphrases ("delete the database" vs. "drop the schema" vs. "wipe production")
  • You have an off-topic policy (anything outside a defined scope)
  • The rule covers toxicity, harassment, or sentiment detection

Stay on Auto when…

  • The policy is new and you don't yet have ground truth on what works
  • The rule mixes structured and intent components — let the engine pick
  • You're rolling out 10+ policies and don't want to set the mode for each one manually

When you're not sure, leave the policy on Auto and inspect the first few traces. The Trace Explorer shows which mode actually ran, so you can correct course quickly.
