Neural Governance Engine
Run local ONNX inference for PII, injection, and intent detection — sub-50 ms, 85-95% LLM cost reduction.
The Neural Governance Engine (NGE) is Palveron's local inference layer. Five ONNX models run inside the gateway, turning most LLM-assist round-trips into sub-50 ms local calls.
Prompt → NGE pipeline (5 stages) → Decision
1. Regex: fast, deterministic
2. Aho-Corasick: multi-keyword matching
3. ONNX NER: entity extraction
4. NLI Contextual: semantic intent scoring
5. LLM-Assist: escalate borderline only

What NGE detects
| Signal | Source stage | Use case |
|---|---|---|
| Structured PII (SSN, IBAN, credit card, email, phone) | Regex | Strict, low-false-positive masking |
| Brand-specific keywords, custom denylists | Aho-Corasick | Internal product names, blocked phrases |
| Named entities (PERSON, ORG, LOC, EMAIL, SECRET, INJECTION) | ONNX NER × 2 | Context-aware PII detection beyond regex |
| Intent classification (injection, toxicity, off_topic, pii_density) | NLI Contextual | Catches paraphrased malicious intent |
| Edge cases | LLM-Assist | Last-resort cloud escalation for low-confidence local results |
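
As a rough illustration of the two deterministic rows above, the sketch below runs a regex pass for structured PII and a multi-keyword scan for a denylist. The patterns, the `DENYLIST` contents, and the function names are illustrative assumptions, not Palveron's actual rules, and the keyword scan uses plain substring search as a stand-in for a real Aho-Corasick automaton.

```python
import re

# Hypothetical patterns; the gateway's real rule set is not documented here.
PATTERNS = {
    "ssn_us": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

# Hypothetical denylist (internal product names, blocked phrases).
DENYLIST = ["project falcon", "internal only"]


def regex_stage(prompt: str) -> list:
    """Return one hit per structured-PII match, with character offsets."""
    hits = []
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(prompt):
            hits.append({"type": name, "start": m.start(), "end": m.end()})
    return hits


def keyword_stage(prompt: str) -> list:
    """Naive stand-in for Aho-Corasick: case-insensitive substring search."""
    lowered = prompt.lower()
    return [kw for kw in DENYLIST if kw in lowered]


prompt = "Mail jane@example.com about Project Falcon, SSN 123-45-6789."
print(regex_stage(prompt))
print(keyword_stage(prompt))
```

A production keyword stage would build the automaton once and scan each prompt in a single pass, which is what makes Aho-Corasick attractive for large denylists.
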
Engine modes
Configure under Settings → Security → Neural Governance Engine or via env var NGE_MODE:
| Mode | Behavior | When to use |
|---|---|---|
| `disabled` | Regex + LLM-assist only (pre-Sprint-54 behavior) | Benchmarking or fallback |
| `nge_local` | Local models only; never calls the cloud LLM | EU data residency, zero-egress; foundation for air-gapped deployment (on the roadmap) |
| `nge_fallback` (default) | Local-first; borderline cases escalate to LLM | Most customers; best balance |
| `llm_only` | Skip local, always call cloud LLM | Benchmarking accuracy upper bound |
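
The routing described in the table can be sketched as a small decision function. This is a minimal sketch under stated assumptions: the `BORDERLINE` band, the `regex_decided` flag, and the function names are hypothetical, and the rule used for `disabled` (escalate whenever regex alone did not settle the request) is a guess at the pre-NGE behavior rather than documented fact.

```python
import os
from enum import Enum


class Mode(str, Enum):
    DISABLED = "disabled"
    NGE_LOCAL = "nge_local"
    NGE_FALLBACK = "nge_fallback"
    LLM_ONLY = "llm_only"


# Hypothetical band: local scores inside it count as "borderline".
BORDERLINE = (0.35, 0.65)


def mode_from_env(env=os.environ) -> Mode:
    """Read NGE_MODE, falling back to the documented default."""
    return Mode(env.get("NGE_MODE", "nge_fallback"))


def should_call_llm(mode: Mode, local_score: float, regex_decided: bool = False) -> bool:
    """Decide whether a request escalates to the cloud LLM."""
    if mode is Mode.NGE_LOCAL:
        return False                # zero-egress: never call the cloud
    if mode is Mode.LLM_ONLY:
        return True                 # local models are skipped entirely
    if mode is Mode.DISABLED:
        return not regex_decided    # assumption: LLM-assist covers the rest
    low, high = BORDERLINE          # nge_fallback: escalate borderline only
    return low <= local_score <= high


print(should_call_llm(Mode.NGE_FALLBACK, 0.04))  # clear-cut local result
print(should_call_llm(Mode.NGE_FALLBACK, 0.50))  # borderline local result
```
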
Palveron's internal benchmarks (nge_fallback vs. llm_only) show:
- 85-95% reduction in LLM-assist calls
- Under 1 percentage point accuracy drop on a 50k-prompt evaluation set
- 6-10× latency improvement at the p50 (median)
Sensitivity presets
A single slider maps to internal thresholds across all five stages:
| Preset | Thresholds | Typical industry |
|---|---|---|
| Strict | Lower: more BLOCKs and MODIFYs | Healthcare, finance, government |
| Balanced (default) | Vendor-tuned | General enterprise |
| Tolerant | Higher: fewer interventions | Internal R&D, developer tooling |
Don't tune sensitivity per-policy — use presets at the project level and rely on per-policy enforcement actions (BLOCK, APPROVAL, ANONYMIZE, FLAG) to handle granularity.
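
To make the preset-to-threshold mapping concrete, here is a minimal sketch covering the four NLI dimensions only. The numeric thresholds are invented for illustration (the real values are vendor-tuned and span all five stages), and `over_threshold` is a hypothetical helper name.

```python
DIMENSIONS = ("injection", "toxicity", "off_topic", "pii_density")

# Hypothetical values: one number per preset, applied to every dimension.
PRESET_THRESHOLD = {"strict": 0.3, "balanced": 0.5, "tolerant": 0.7}


def over_threshold(scores: dict, preset: str = "balanced") -> list:
    """Return the dimensions whose score meets or exceeds the preset threshold."""
    threshold = PRESET_THRESHOLD[preset]
    return [dim for dim in DIMENSIONS if scores.get(dim, 0.0) >= threshold]


scores = {"injection": 0.04, "toxicity": 0.02, "off_topic": 0.45, "pii_density": 0.91}
for preset in PRESET_THRESHOLD:
    print(preset, over_threshold(scores, preset))
```

With these made-up numbers, the same scores trip `off_topic` only under the strict preset, which is the kind of difference worth checking before changing a project's preset.
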
Shadow Mode
When Shadow Mode is on, the engine evaluates every request but does not enforce. Traces show what would have happened; the request always passes through unchanged.
Use Shadow Mode to:
- A/B test a stricter sensitivity preset before rolling out
- Onboard a new agent type without false-positive blocks
- Validate NGE accuracy against a known-good corpus
Once the trace history looks right, turn Shadow Mode off under Settings → Security → Neural Governance Engine → Shadow mode to resume enforcement.
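
The evaluate-but-do-not-enforce behavior can be pictured with a small wrapper. This is only a sketch: the `Outcome` type, the `PASS` label, and `fake_evaluate` are hypothetical names introduced for illustration, not part of the gateway.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Outcome:
    enforced: str     # decision the gateway actually applies
    would_have: str   # decision recorded in the trace
    prompt: str       # prompt forwarded downstream


def run(prompt: str, evaluate: Callable[[str], Tuple[str, str]], shadow: bool) -> Outcome:
    """Evaluate every request; enforce only when shadow mode is off."""
    decision, rewritten = evaluate(prompt)
    if shadow:
        # Pass through unchanged, but keep the verdict for the trace.
        return Outcome("PASS", decision, prompt)
    return Outcome(decision, decision, rewritten)


def fake_evaluate(prompt: str) -> Tuple[str, str]:
    """Hypothetical evaluator: masks one hard-coded SSN."""
    if "123-45-6789" in prompt:
        return "MODIFIED", prompt.replace("123-45-6789", "[SSN]")
    return "PASS", prompt


print(run("SSN 123-45-6789", fake_evaluate, shadow=True))
print(run("SSN 123-45-6789", fake_evaluate, shadow=False))
```
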
Reading NGE results in traces
Open any trace in the Trace Explorer. NGE-driven traces include:
- Engine badge: `nge_local`, `nge_fallback (local)`, `nge_fallback (LLM)`, `llm_only`, `disabled`
- Confidence bars: per NLI dimension (`injection`, `toxicity`, `off_topic`, `pii_density`) with the active threshold line
- Entity highlights: color-coded inline annotations on the prompt (`PII`, `SECRET`, `INJECTION`, `TOXICITY`)
- Stage timing: milliseconds spent in each pipeline stage

| Stage | ms | Decision contribution |
|---|---|---|
| Regex | 0.3 | ssn match → MASK |
| Aho-Corasick | 0.4 | no hit |
| ONNX NER | 18.2 | PERSON × 1, EMAIL × 1 |
| NLI Contextual | 22.1 | injection: 0.04 (under 0.5) |
| LLM-Assist | — | not invoked |
| Total | 40.9 | ✓ MODIFIED |

NGE in the API response
POST /api/v1/verify returns NGE fields alongside the decision:

    {
      "decision": "MODIFIED",
      "engine": "nge_local",
      "nge_policy_scores": {
        "injection": 0.04, "toxicity": 0.02,
        "off_topic": 0.18, "pii_density": 0.91
      },
      "detected_entities": [
        { "type": "SSN", "start": 26, "end": 37, "confidence": 0.99 }
      ],
      "deterministic_results": { "regex_matches": ["ssn_us"] }
    }

See api/verify for the full response schema.
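
A client might consume these fields roughly as follows. The sketch parses the sample payload shown above and masks each detected span in a prompt. It assumes `start`/`end` are character offsets with an exclusive `end` (the 11-character span matches an SSN, but the source does not state the convention). The prompt text and the `mask_entities` helper are hypothetical.

```python
import json

# Sample payload copied from the response shown above.
RESPONSE = json.loads("""
{
  "decision": "MODIFIED",
  "engine": "nge_local",
  "nge_policy_scores": {
    "injection": 0.04, "toxicity": 0.02,
    "off_topic": 0.18, "pii_density": 0.91
  },
  "detected_entities": [
    { "type": "SSN", "start": 26, "end": 37, "confidence": 0.99 }
  ],
  "deterministic_results": { "regex_matches": ["ssn_us"] }
}
""")


def mask_entities(prompt: str, entities: list, min_confidence: float = 0.9) -> str:
    """Replace each confident span with a [TYPE] placeholder."""
    out = prompt
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if ent["confidence"] >= min_confidence:
            out = out[:ent["start"]] + f"[{ent['type']}]" + out[ent["end"]:]
    return out


# Hypothetical prompt laid out so the SSN sits at offsets 26..37.
prompt = "Please file this for SSN: 123-45-6789 today."
print(mask_entities(prompt, RESPONSE["detected_entities"]))
```
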
Language packs
Default packs: English + German. Each additional pack (Spanish, French, Italian, Portuguese, Dutch, Polish) adds one ONNX model (~250 MB) and loads on first use.
Enable extra packs under Settings → Security → Neural Governance Engine → Language packs. Multi-lingual prompts auto-route to the closest pack via the gateway's language detector.
Self-hosted considerations
On self-hosted deployments, set:

    NGE_MODE=nge_fallback
    NGE_MODELS_DIR=/app/models/nge

Models are distributed separately (they're ~4 GB total). Download:

    python scripts/download-nge-models.py --models-dir ./models/nge

Mount the directory into the gateway container (the Helm chart does this for you via a PVC). NGE warms up on boot: the gateway's /ready endpoint returns 200 only after every model has finished loading.
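
Deployment scripts often need to wait for that warm-up to finish. The sketch below polls `/ready` until it returns 200 or a deadline passes. Only the `/ready` path comes from this page. The host and port in the usage comment, the timeout values, and the helper names are assumptions to adjust for your environment.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str) -> int:
    """Return the HTTP status for a GET, or 0 if the gateway is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code   # non-2xx answer, e.g. while models are still loading
    except (urllib.error.URLError, OSError):
        return 0          # connection refused, DNS failure, timeout


def wait_until_ready(url, timeout_s=300.0, interval_s=2.0,
                     probe=http_status, sleep=time.sleep) -> bool:
    """Poll until the probe reports 200; give up once the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Usage (hypothetical address):
# wait_until_ready("http://localhost:8080/ready")
```

The `probe` and `sleep` parameters exist so the loop can be exercised in tests without a running gateway.
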
NGE scores are assistance signals, not legal verdicts. They inform the policy engine; the policy engine still owns the final decision. Treat them like a smoke alarm — informative, but the firefighter (your policy) decides what to do.