Kredete - Automated Guardrails


Automated Guardrails — AI Safety & Content Control Pipeline

Multi-layer guardrail system that protects every LLM interaction and is applied to every Prael conversation: input validation → content filtering → PII redaction → output sanitization → policy enforcement

Inbound sources

Request origins

Prael AI agent

Credit agent

Attribution agent

Intent agent

API gateway

Internal services

Input types

User text prompts

System prompts

Tool call results

Document uploads

Function outputs

Threat categories

Prompt injection

Jailbreak attempts

PII leakage

Harmful content

Off-topic requests

Volume metrics

Requests/day: 2.4M

Blocked: 0.3%

Flagged: 1.2%
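As a rough sense of scale, and assuming both percentages are shares of the 2.4M daily requests (the diagram does not say what they are measured against), the blocked and flagged rates work out to:

```latex
\begin{aligned}
N_{\text{blocked}} &= 2.4\times10^{6} \times 0.003 = 7{,}200 \ \text{requests/day} \\
N_{\text{flagged}} &= 2.4\times10^{6} \times 0.012 = 28{,}800 \ \text{requests/day}
\end{aligned}
```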

1 Input validation & pre-processing

Schema validation

Token length check

Character encoding

Format validation

Language detection
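A minimal sketch of the schema, encoding, format and token-length checks listed above. The message shape (`role`/`content`), the `MAX_TOKENS` budget and the 4-characters-per-token estimate are all assumptions for illustration; a real deployment would count tokens with the target model's tokenizer. Language detection is left out.

```python
from dataclasses import dataclass

MAX_TOKENS = 4096                           # hypothetical token budget
ALLOWED_ROLES = {"user", "system", "tool"}  # hypothetical message schema


@dataclass
class ValidationResult:
    ok: bool
    reason: str = ""


def approx_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4) if text else 0


def validate_message(msg: dict) -> ValidationResult:
    role, content = msg.get("role"), msg.get("content")
    # Schema validation: both fields present and of the right type.
    if not isinstance(role, str) or not isinstance(content, str):
        return ValidationResult(False, "schema: role and content must be strings")
    if role not in ALLOWED_ROLES:
        return ValidationResult(False, f"schema: unknown role {role!r}")
    # Character encoding: must be encodable as UTF-8 (rejects lone surrogates).
    try:
        content.encode("utf-8")
    except UnicodeEncodeError:
        return ValidationResult(False, "encoding: not valid UTF-8")
    # Format validation: no control characters other than common whitespace.
    if any(ord(c) < 32 and c not in "\n\r\t" for c in content):
        return ValidationResult(False, "format: control characters present")
    # Token length check against the budget.
    if approx_token_count(content) > MAX_TOKENS:
        return ValidationResult(False, "length: exceeds token budget")
    return ValidationResult(True)
```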

Injection detection

Prompt injection scan

Jailbreak classifier

Indirect injection

Context manipulation
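Signature matching is the simplest layer of injection detection. The sketch below uses hypothetical patterns and normalizes Unicode before matching so that full-width look-alike characters do not slip past. The same function can be run over tool-call results and uploaded documents to cover indirect injection. It is a complement to the jailbreak classifier named above, not a substitute for it.

```python
import re
import unicodedata

# Hypothetical signatures. Pattern lists only catch known phrasings, which is
# why they would sit alongside a trained jailbreak classifier.
INJECTION_PATTERNS = [
    r"ignore (?:all |any )?(?:previous|prior|above) instructions",
    r"disregard (?:the |your )?(?:system|previous) prompt",
    r"you are now in (?:developer|dan) mode",
    r"reveal (?:your|the) (?:system prompt|instructions)",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in INJECTION_PATTERNS]


def normalize(text: str) -> str:
    # NFKC folds full-width and other compatibility characters to plain
    # forms; collapsing whitespace defeats simple spacing tricks.
    return " ".join(unicodedata.normalize("NFKC", text).split())


def scan_for_injection(text: str) -> list[str]:
    """Return the signatures that match the normalized text."""
    cleaned = normalize(text)
    return [p.pattern for p in _COMPILED if p.search(cleaned)]
```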

Rate limiting

Per-user throttle

Per-session limit

Burst detection

Cost-based cap
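Per-user throttling, burst detection and a cost-based cap can all be expressed with a token bucket, a common technique for this (the diagram does not name the algorithm in use). In this sketch `RATE_PER_SEC`, `BURST` and the `cost` weighting are hypothetical values:

```python
import time
from typing import Optional


class TokenBucket:
    """Token bucket: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity     # largest burst that is let through
        self.tokens = capacity       # start full
        self.last: Optional[float] = None

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # `cost` lets expensive requests (e.g. long prompts) drain more budget.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


RATE_PER_SEC = 0.5   # hypothetical sustained rate per user
BURST = 5.0          # hypothetical burst allowance
_buckets: dict[str, TokenBucket] = {}


def allow_request(user_id: str, cost: float = 1.0,
                  now: Optional[float] = None) -> bool:
    # One bucket per user; keying by session ID instead gives a per-session limit.
    if user_id not in _buckets:
        _buckets[user_id] = TokenBucket(RATE_PER_SEC, BURST)
    return _buckets[user_id].allow(cost, now)
```

Passing an estimated token count or model cost as `cost` turns the same bucket into a spend cap rather than a request count.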

Authentication

JWT validation

Scope verification

Session integrity

Device fingerprint
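JWT validation and scope verification can be sketched with the standard library alone, assuming HS256 (shared-secret) tokens and an OAuth-style space-separated `scope` claim; the diagram states neither. Production code would normally rely on a maintained JWT library, and session integrity and device fingerprinting are not covered here. `mint_hs256` is a hypothetical helper included only to exercise the verifier.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


class TokenError(Exception):
    """Raised when a token fails any authentication check."""


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def mint_hs256(claims: dict, secret: bytes) -> str:
    """Test helper: issue an HS256 token for the given claims."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, required_scope: str,
                 now: Optional[float] = None) -> dict:
    """Check signature, expiry and scope; return the claims if all pass."""
    parts = token.split(".")
    if len(parts) != 3:
        raise TokenError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    # Pin the algorithm so "none" or an unexpected alg is never accepted.
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise TokenError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise TokenError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if not isinstance(claims.get("exp"), (int, float)) or now >= claims["exp"]:
        raise TokenError("expired or missing exp")
    # Scope verification: space-separated scopes, as in OAuth 2.0.
    if required_scope not in claims.get("scope", "").split():
        raise TokenError("missing scope")
    return claims
```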

2 Content classification & filtering

Toxicity detection

Hate speech

Violence / threats

Sexual content

Self-harm

Profanity filter

Topic boundaries

Finance (allowed)

Credit (allowed)

Payments (allowed)

Medical (redirect)

Legal advice (block)
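The topic boundaries amount to a small policy table. A sketch, assuming an upstream classifier (not shown) emits labels such as `credit` or `medical` (hypothetical names) and that any unlisted, off-topic label fails closed:

```python
from enum import Enum


class TopicAction(Enum):
    ALLOW = "allow"
    REDIRECT = "redirect"
    BLOCK = "block"


# Mirrors the topic boundaries listed above.
TOPIC_POLICY = {
    "finance": TopicAction.ALLOW,
    "credit": TopicAction.ALLOW,
    "payments": TopicAction.ALLOW,
    "medical": TopicAction.REDIRECT,
    "legal_advice": TopicAction.BLOCK,
}


def route_topic(topic: str) -> TopicAction:
    # Assumption: unlisted labels are blocked rather than allowed.
    return TOPIC_POLICY.get(topic, TopicAction.BLOCK)
```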

Financial compliance

Investment advice check

Guarantee language

Risk disclosure

Reg-E compliance

Fair lending check
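Guarantee-language and investment-advice checks can start as phrase rules over the drafted response. The phrase lists below are hypothetical, and keyword rules also fire on negations such as "not guaranteed", so this kind of check suits flagging better than outright blocking. Risk-disclosure, Reg-E and fair-lending checks need more context and are not shown.

```python
import re

# Hypothetical phrase lists; real rules would be owned by the compliance team.
CHECKS = {
    "guarantee_language": [
        r"\bguarantee[ds]?\b",
        r"\brisk[- ]free\b",
        r"\bcan(?:no|')t lose\b",
    ],
    "investment_advice": [
        r"\byou should (?:buy|sell|invest in)\b",
        r"\bI recommend (?:buying|selling|investing)\b",
    ],
}


def compliance_flags(response: str) -> dict[str, list[str]]:
    """Map each check that fired to the phrases it matched."""
    flags: dict[str, list[str]] = {}
    for name, patterns in CHECKS.items():
        hits = [m.group(0)
                for p in patterns
                for m in re.finditer(p, response, re.IGNORECASE)]
        if hits:
            flags[name] = hits
    return flags
```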

Intent classification

Legitimate query

Social engineering

Data extraction

System probing

Adversarial testing

3 PII detection & redaction

PII entity detection

SSN / national ID

Credit card numbers

Bank account details

Date of birth

Phone / email
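A fast regex pass covers the most regular PII entities. This sketch handles US SSNs, card numbers (confirmed with the Luhn checksum to cut false positives) and email addresses. Dates of birth, phone numbers and bank details vary more in format and are the kind of entity the NER model listed under model providers would handle. The patterns are illustrative, not exhaustive.

```python
import re

# US SSN in ddd-dd-dddd form, excluding never-issued area/group/serial values.
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to drop digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (entity_type, matched_text) pairs found by the regex pass."""
    found = [("SSN", m.group(0)) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        if luhn_valid(re.sub(r"\D", "", m.group(0))):
            found.append(("CARD", m.group(0)))
    found += [("EMAIL", m.group(0)) for m in EMAIL_RE.finditer(text)]
    return found
```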

Redaction strategy

Token replacement

Masking (***)

Synthetic substitution

Format-preserving hash

Context-aware redaction
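Three of the redaction strategies, sketched under stated assumptions. `keyed_digit_hash` is a keyed, deterministic hash that keeps the value's layout; it is not true format-preserving encryption (which would be reversible, as with NIST FF1). `REDACTION_KEY` is a placeholder for a vault-managed secret. Synthetic substitution and context-aware redaction are not shown.

```python
import hashlib
import hmac

REDACTION_KEY = b"hypothetical-key"  # placeholder; would live in a secrets vault


def token_replace(text: str, span: str, label: str, index: int) -> str:
    # Token replacement: swap the value for a placeholder such as [SSN_1].
    return text.replace(span, f"[{label}_{index}]")


def mask(value: str, keep_last: int = 4) -> str:
    # Masking: hide every digit except the last few, keeping separators.
    total_digits = sum(c.isdigit() for c in value)
    seen, out = 0, []
    for c in value:
        if c.isdigit():
            seen += 1
            out.append(c if seen > total_digits - keep_last else "*")
        else:
            out.append(c)
    return "".join(out)


def keyed_digit_hash(value: str, key: bytes = REDACTION_KEY) -> str:
    # Format-preserving *hash*: each digit is replaced by a digit taken from an
    # HMAC of the whole value, so equal inputs give equal outputs and the
    # layout (dashes, spaces) survives. Not reversible.
    digest = hmac.new(key, value.encode(), hashlib.sha256).digest()
    out, i = [], 0
    for c in value:
        if c.isdigit():
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(c)
    return "".join(out)
```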

Data residency

Region classification

GDPR data (EU only)

CCPA data (CA flag)

Cross-border block

Retention tagging

Audit trail

Redaction log

Original hash (vault)

Access record

Compliance timestamp

DPO notification
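An audit record can show what was redacted, by whom and when, without keeping the raw value. The sketch assumes the "original hash" is a keyed HMAC (a plain hash of a nine-digit SSN is easy to brute-force) and uses hypothetical field names and a placeholder `AUDIT_KEY`. DPO notification is out of scope.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

AUDIT_KEY = b"hypothetical-audit-key"  # placeholder; would be fetched from the vault


def redaction_log_entry(original: str, entity_type: str, actor: str,
                        region: str) -> dict:
    """Record a redaction event without storing the redacted value itself."""
    return {
        "entity_type": entity_type,
        # Keyed hash, so low-entropy values cannot be recovered by brute force.
        "original_hmac": hmac.new(AUDIT_KEY, original.encode(), hashlib.sha256).hexdigest(),
        "accessed_by": actor,
        "region": region,  # supports residency rules such as EU-only storage
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def to_log_line(entry: dict) -> str:
    # Stable key order keeps log lines easy to diff and index.
    return json.dumps(entry, sort_keys=True)
```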

4 Output sanitization & response control

Response validation

Hallucination check

Factual grounding

Source citation

Confidence scoring

Output filtering

PII leak scan

Internal data leak

Prompt echo check

System prompt leak
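System-prompt leaks and prompt echoes can be caught by measuring how much of the protected text reappears verbatim in a response. The sketch below uses word 8-gram overlap; `n` and `threshold` are hypothetical tuning values, and paraphrased leaks would need a semantic comparison instead.

```python
def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leaks_system_prompt(response: str, system_prompt: str,
                        n: int = 8, threshold: float = 0.2) -> bool:
    """Flag a response that reproduces a large share of the prompt's n-grams."""
    prompt_grams = word_ngrams(system_prompt, n)
    if not prompt_grams:          # prompt shorter than n words: nothing to match
        return False
    overlap = len(prompt_grams & word_ngrams(response, n)) / len(prompt_grams)
    return overlap >= threshold
```

Passing the user's own prompt as `system_prompt` turns the same function into a prompt-echo check.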

Compliance injection

Disclaimer append

Risk warnings

Regulatory notices

Not-advice caveat
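Compliance injection can be as simple as appending notices keyed by flags raised earlier in the pipeline. The flag names and disclaimer wording below are placeholders, not actual regulatory text:

```python
# Hypothetical disclaimer text keyed by upstream compliance flags.
DISCLAIMERS = {
    "investment_advice": "This is general information, not investment advice.",
    "guarantee_language": "Returns and approvals are not guaranteed.",
    "credit": "Credit decisions depend on eligibility and are subject to approval.",
}


def append_disclaimers(response: str, flags: set[str]) -> str:
    notes = [DISCLAIMERS[f] for f in sorted(flags) if f in DISCLAIMERS]
    # Skip notices already present so repeated passes stay idempotent.
    notes = [n for n in notes if n not in response]
    if not notes:
        return response
    return response.rstrip() + "\n\n" + "\n".join(notes)
```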

Response shaping

Tone calibration

Length enforcement

Format compliance

Brand voice check

5 Policy enforcement & escalation

Decision engine

Pass → serve response

Flag → log + serve

Modify → sanitize + serve

Block → reject + log
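The four outcomes form a severity ladder, so one natural way to combine many checks is strictest-wins. A sketch with hypothetical types; the diagram does not say how conflicting verdicts are resolved, nor whether modified responses are logged:

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Decision(IntEnum):
    # Ordered by severity so the strictest verdict compares highest.
    PASS = 0
    FLAG = 1
    MODIFY = 2
    BLOCK = 3


@dataclass
class CheckResult:
    check: str
    decision: Decision


def decide(results: list[CheckResult]) -> Decision:
    """Combine per-check verdicts; the strictest one wins."""
    return max((r.decision for r in results), default=Decision.PASS)


def apply_decision(decision: Decision, response: str,
                   sanitized: str) -> tuple[Optional[str], bool]:
    """Return (text to serve, or None if rejected; whether to log)."""
    if decision is Decision.PASS:
        return response, False
    if decision is Decision.FLAG:
        return response, True
    if decision is Decision.MODIFY:
        return sanitized, True   # assumption: modifications are logged too
    return None, True            # BLOCK: reject and log
```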

Escalation paths

Auto-escalate (severity 1)

Queue for review

Compliance alert

Security incident

Policy updates

Rule versioning

A/B policy testing

Hot reload (no deploy)

Rollback capability
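Rule versioning, hot reload and rollback can share one structure: an append-only list of rule-set versions plus a pointer to the active one. The sketch also shows deterministic A/B assignment by hashing the user ID. `PolicyStore`, `ab_bucket` and the rule keys are hypothetical.

```python
import hashlib
import threading
from typing import Any


class PolicyStore:
    """Keeps every published rule-set version; readers get a consistent snapshot."""

    def __init__(self, initial: dict[str, Any]) -> None:
        self._lock = threading.Lock()
        self._versions: list[dict[str, Any]] = [dict(initial)]
        self._active = 0

    def publish(self, rules: dict[str, Any]) -> int:
        # Hot reload: activate new rules at runtime, no redeploy needed.
        with self._lock:
            self._versions.append(dict(rules))
            self._active = len(self._versions) - 1
            return self._active

    def rollback(self, version: int) -> None:
        with self._lock:
            if not 0 <= version < len(self._versions):
                raise ValueError(f"unknown version {version}")
            self._active = version

    def current(self) -> tuple[int, dict[str, Any]]:
        with self._lock:
            return self._active, dict(self._versions[self._active])


def ab_bucket(user_id: str, treatment_share: float) -> str:
    """Deterministically assign a user to the treatment or control rule set."""
    h = int.from_bytes(hashlib.sha256(user_id.encode()).digest()[:8], "big")
    return "treatment" if h < int(treatment_share * 2**64) else "control"
```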

Reporting

Daily guardrail report

False positive rate

Trend analysis

Board compliance deck

Target: <0.01% false negatives

Guardrail config

Active rule sets

Core safety ✓

Financial compliance ✓

PII protection ✓

Topic boundaries ✓

Custom rules (12)

Model providers

OpenAI Moderation API

Anthropic safety layer

Kredete custom classifier

Regex patterns (fast)

NER model (PII)
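The provider list mixes cheap local checks (regex) with slower model calls. One common arrangement, assumed here rather than stated in the diagram, is a cascade that runs checks cheapest-first and stops at the first hit. The remote providers are represented by a local stub, since no real API call is shown:

```python
import re
from typing import Callable, Optional

# A check returns a reason string when it wants to stop the text, else None.
Check = Callable[[str], Optional[str]]


def regex_check(text: str) -> Optional[str]:
    # Fast, local pass: one hypothetical pattern for card-like digit runs.
    return "pattern:card" if re.search(r"\b\d(?:[ -]?\d){12,18}\b", text) else None


def classifier_stub(text: str) -> Optional[str]:
    # Placeholder for a slower call (moderation endpoint, custom classifier,
    # NER model); here it only looks for a single keyword.
    return "classifier:toxicity" if "idiot" in text.lower() else None


def run_cascade(text: str, checks: list[tuple[str, Check]],
                ran: Optional[list[str]] = None) -> Optional[str]:
    """Run checks cheapest-first and stop at the first one that fires."""
    for name, check in checks:
        if ran is not None:
            ran.append(name)       # record which stages actually executed
        reason = check(text)
        if reason is not None:
            return reason
    return None


CASCADE: list[tuple[str, Check]] = [("regex", regex_check), ("classifier", classifier_stub)]
```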

Performance

Latency: 18 ms

Accuracy: 99.7%

FP rate: 0.4%

Uptime: 99.99%

Compliance standards

NIST AI RMF

EU AI Act (Art. 14)

SOC 2 Type II

ISO 27001