Compliance evaluators run entirely within your environment. No data leaves your infrastructure.
PIIEvaluator
Scans LLM outputs for personally identifiable information using local regex
patterns plus checksum validation (Luhn, Verhoeff, Mod-97, Mod-11). Zero API
calls by default. Optional NER fallback via Presidio
when installed.
When to use: Any regulated deployment (healthcare, finance, legal,
government) where PII in model outputs is a compliance risk. Run it in CI/CD
so regressions are caught before production.
Passes when no PII is detected. Fails with a per-type breakdown.
from multivon_eval import PIIEvaluator
# Default — all jurisdictions, strict checksum validation, no NER.
suite.add_evaluators(PIIEvaluator())
# Healthcare with name detection (lazy-imports presidio_analyzer if installed).
suite.add_evaluators(PIIEvaluator(jurisdiction="hipaa", use_ner=True))
# India-specific with custom identifier overlay.
suite.add_evaluators(PIIEvaluator(
    jurisdiction="dpdp",
    patterns={"employee_id": r"EMP-\d{6}"},
))
# Reason field uses [REDACTED-TYPE] tokens; original output is never mutated.
suite.add_evaluators(PIIEvaluator(redact=True))
Parameters
| Parameter | Default | Description |
|---|---|---|
| jurisdiction | "all" | "hipaa", "gdpr", "dpdp", "ccpa", "pipeda", or "all". |
| patterns | None | Additional {name: regex} overlay. Merged after jurisdiction patterns. |
| redact | False | Reason field shows [REDACTED-TYPE] tokens instead of substrings. Original output is never mutated. |
| threshold | 1.0 | Pass threshold; by default, any PII match fails. |
| strict | True | Apply checksum validators (Luhn, Verhoeff, Mod-97, Mod-11, PAN structural, SSN structural) to drop false positives. Pass False to see raw regex hits. |
| use_ner | False | Lazy-import presidio_analyzer (if installed) to additionally catch PERSON, LOCATION, DATE_TIME. Partial coverage for HIPAA Safe Harbor categories regex can’t reach. Silent no-op when Presidio isn’t installed. |
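Before supplying a custom overlay, it can help to sanity-check the regex against known samples with plain `re`. The sketch below does not call multivon_eval at all; the helper name `find_matches` and the sample text are hypothetical, and it only illustrates how a `{name: regex}` mapping of the shape accepted by `patterns` behaves when scanned over text.

```python
import re

# Hypothetical overlay, in the same {name: regex} shape as the `patterns` parameter.
custom_patterns = {"employee_id": r"EMP-\d{6}"}


def find_matches(text: str, patterns: dict[str, str]) -> dict[str, list[str]]:
    """Return all non-overlapping matches per pattern name (illustration only)."""
    hits = {}
    for name, regex in patterns.items():
        found = re.findall(regex, text)
        if found:
            hits[name] = found
    return hits


sample = "Ticket raised by EMP-004217; escalated to EMP-99 (too short to match)."
print(find_matches(sample, custom_patterns))
```

A pattern that matches too eagerly here will also produce false positives once merged into the evaluator, so it is worth testing near-miss strings as well as true positives.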
Standards covered
Every pattern in the evaluator carries a citation to its source standard.
The tables below are exhaustive: if your jurisdiction needs an identifier
not listed here, supply it via patterns={...} (or open a PR; additions
are easy).
HIPAA Safe Harbor — 45 CFR § 164.514(b)(2)
Eighteen identifier categories. The evaluator covers thirteen via regex;
the remaining five (free-text names, geographic subdivisions, photographs,
biometrics, “other unique IDs”) need use_ner=True for partial coverage.
| # | Identifier | Detection |
|---|---|---|
| 1 | Names (with Patient, Mr, Dr etc. prefix) | Regex (context-led, high precision) |
| 1 | Names without prefix | NER (use_ner=True) |
| 2 | Street addresses | NER (use_ner=True); loose regex baseline |
| 2 | US ZIP codes (ZIP 94103, Zipcode 12345-6789) | Regex (context-anchored) |
| 3 | Dates of admission / discharge / death / birth | Regex (context-anchored) |
| 3 | Age > 89 (aged 92, years old: 95) | Regex |
| 4 | Telephone numbers | Regex (NANP + international) |
| 5 | Fax numbers | Regex (NANP shape, Fax: prefix) |
| 6 | Email addresses | Regex (RFC 5322 simplified) |
| 7 | Social Security Numbers | Regex + structural validator (drops 000-, 666-, 9xx-, all-same decoys) |
| 8 | Medical record numbers (4–15 digits) | Regex |
| 9 | Health plan beneficiary numbers (HPN, Group No, Policy No) | Regex |
| 10 | Account numbers (Acct, Account) | Regex |
| 11 | Certificate / license numbers (NPI, DEA, License, Cert) | Regex |
| 12 | Vehicle identifiers (VINs, 17-char) | Regex |
| 13 | Device identifiers (UDI, Device ID, Implant No) | Regex |
| 14 | Web URLs | Regex |
| 15 | IP addresses (IPv4) | Regex |
| 16 | Biometric identifiers | NER (use_ner=True); partial |
| 17 | Full-face photographs | Not text — must be screened upstream |
| 18 | Other unique identifying numbers | patterns={...} overlay |
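The SSN structural rules listed in row 7 can be sketched in a few lines. This is a minimal illustration, not the evaluator's actual code: the function name `ssn_passes_structural_check` is hypothetical, and it implements only the four rules named in the table (area 000, area 666, area 9xx, and all-same digits).

```python
import re

# Dashed SSN shape only; the evaluator may accept other layouts (assumption).
SSN_SHAPE = re.compile(r"(\d{3})-(\d{2})-(\d{4})")


def ssn_passes_structural_check(candidate: str) -> bool:
    """Return False for candidates that the listed structural rules would drop."""
    m = SSN_SHAPE.fullmatch(candidate)
    if not m:
        return False
    area, group, serial = m.groups()
    # Area numbers 000, 666 and 900-999 are excluded by the rules in the table.
    if area in ("000", "666") or area.startswith("9"):
        return False
    # All-same decoys such as 111-11-1111.
    if len(set(area + group + serial)) == 1:
        return False
    return True


for s in ("000-12-3456", "666-12-3456", "912-34-5678", "111-11-1111", "212-09-8765"):
    print(s, ssn_passes_structural_check(s))
```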
GDPR — Regulation (EU) 2016/679, Art.4(1)
National identification numbers across EU member states + base PII.
| Identifier | Country | Validator |
|---|---|---|
| NI Number (AB123456C) | UK | Structural (excludes D/F/I/Q/U/V prefixes) |
| NHS Number | UK | Mod-11 (10-digit, drops 3-3-4 grouping false positives) |
| DNI (12345678Z) | Spain | Letter (mod-23) when strict |
| NIE (X1234567L) | Spain | Letter (mod-23) when strict |
| Codice Fiscale (16 alphanumeric) | Italy | Structural |
| NIR / INSEE (15-digit Sécurité Sociale) | France | Structural |
| Steuer-IdNr (Steuer-IdNr: 12345678901) | Germany | Context-anchored |
| BSN (BSN: 12345678) | Netherlands | Context-anchored; 11-test optional |
| PESEL (11-digit, PESEL context) | Poland | Structural |
| Personnummer (YYMMDD-XXXX) | Sweden | Context-anchored |
| CPR (DDMMYY-XXXX) | Denmark | Context-anchored |
| PPSN (7 digits + 1-2 letters) | Ireland | Context-anchored |
| HETU (DDMMYY[-+A]NNNX) | Finland | Structural |
| EU VAT (country prefix + digits) | EU | Structural |
| IBAN | All | Mod-97 per ISO 13616 |
Article 9 special categories (race, religion, health data, sex life,
trade-union membership): these are content categories, not identifier
formats. Use a topic classifier or NER pipeline; the regex evaluator can’t
reach them.
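For reference, the Mod-97 check used for IBANs in the table above can be sketched as follows. This is the standard ISO 13616 procedure written as a standalone function; the name `iban_is_valid` is hypothetical and the sketch skips the per-country length checks a production validator would likely add.

```python
def iban_is_valid(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35), and require the result mod 97 to equal 1."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))  # widely used example IBAN
print(iban_is_valid("GB82 WEST 1234 5698 7654 33"))  # last digit altered
```

Because 97 is prime, any single-digit substitution changes the remainder, which is why a checksum filter like this removes most random digit strings that merely look like an IBAN.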
DPDP India — Act 22 of 2023
Indian government-issued identifiers + Indian PII formats.
| Identifier | Format | Validator |
|---|---|---|
| Aadhaar | UIDAI 12-digit | Verhoeff dihedral checksum |
| PAN | Income Tax [A-Z]{5}\d{4}[A-Z] | Structural — 4th char ∈ |
| GSTIN | \d{2}[A-Z]{5}\d{4}[A-Z][A-Z0-9]Z[A-Z0-9] | Structural |
| IFSC | [A-Z]{4}0[A-Z0-9]{6} | Structural |
| Voter ID (EPIC) | [A-Z]{3}\d{7} | Structural |
| India mobile (+91) | 10-digit, starts 6-9 | Structural |
| Driving License | State + RTO + year + serial | Structural |
| India Passport | Letter (A–P, R–W, Y) + 7 digits | Structural |
| Vehicle Registration | KA-01-AB-1234 style | Structural |
| Ration Card | Context-anchored | — |
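The Verhoeff checksum named for Aadhaar can be sketched as a standalone function. The tables below are the standard Verhoeff multiplication and permutation tables; the function name `verhoeff_is_valid` is hypothetical, and the sketch checks only the checksum, not Aadhaar-specific rules such as length.

```python
# Multiplication table of the dihedral group D5.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]

# Position-dependent permutation table (period 8).
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_is_valid(number: str) -> bool:
    """Return True when the digit string (check digit last) passes Verhoeff."""
    cleaned = number.replace("-", "").replace(" ", "")
    if not cleaned or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    c = 0
    # Process digits right to left; position 0 is the check digit.
    for i, ch in enumerate(reversed(cleaned)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


print(verhoeff_is_valid("2363"))            # textbook example: 236 + check digit 3
print(verhoeff_is_valid("1234-5678-9012"))  # Aadhaar-shaped decoy
```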
CCPA — Cal. Civ. Code § 1798.140(o)
| Identifier | Detection |
|---|---|
| Bank account / routing (context-anchored: account, acct, routing) | Regex |
| California Driver’s License ([A-Z]\d{7}) | Regex |
(Other CCPA categories — biometric data, geolocation, browsing history —
need pipeline-level controls; regex can’t cover them.)
PIPEDA (Canada)
Schedule 1 categories overlap entirely with the base PII set (name, email,
phone, address, SSN/SIN, financial). No Canada-specific identifier format
needs to be added.
Strict mode (default)
When strict=True (the default), regex hits are filtered through identity
validators before being reported. This dramatically cuts false positives:
| Identifier | Validator |
|---|---|
| Credit card (Visa/MC/Amex/Discover/JCB) | Luhn Mod-10 |
| Aadhaar | Verhoeff |
| IBAN | Mod-97 (ISO 13616) |
| NHS Number | Mod-11 |
| PAN India | Structural (holder-type) |
| SSN (US) | Structural (drops 000-, 666-, 9xx-, all-same decoys) |
| GSTIN | Structural |
Pass strict=False to see raw matches without validation — useful for
debugging and for jurisdictions where checksum specs aren’t published.
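To make the filtering step concrete, here is a minimal sketch of the Luhn Mod-10 check that the table lists for credit cards. The function name `luhn_is_valid` is hypothetical; this is the standard algorithm, not necessarily how the evaluator implements it.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True when the digit string passes the Luhn Mod-10 check."""
    cleaned = number.replace("-", "").replace(" ", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(cleaned)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_is_valid("4532-0151-1283-0366"))  # Visa-shaped, checksum holds
print(luhn_is_valid("4532-0151-1283-0367"))  # Visa-shaped, checksum fails
```

A regex alone would report both strings; a checksum filter keeps only the first, which is the kind of false-positive reduction strict mode is aiming for.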
Optional NER (use_ner=True)
When use_ner=True, the evaluator additionally invokes
presidio_analyzer on the output to
catch PERSON, LOCATION, DATE_TIME, NRP, ORGANIZATION, etc. — providing
partial coverage for HIPAA Safe Harbor categories that regex can’t reach
(unprefixed names, free-form addresses, biometrics references).
Presidio is an optional dependency:
pip install presidio_analyzer
python -m spacy download en_core_web_lg
When Presidio isn’t installed, use_ner=True is a silent no-op — the
evaluator just runs the regex/checksum pipeline.
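The "silent no-op" behaviour can be pictured with a small optional-dependency check. This is a sketch of the general pattern using the standard library's `importlib.util.find_spec`, not the evaluator's actual code; `ner_available` and `build_detectors` are hypothetical names.

```python
import importlib.util


def ner_available(module_name: str = "presidio_analyzer") -> bool:
    """Return True when the optional module can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def build_detectors(use_ner: bool, module_name: str = "presidio_analyzer") -> list[str]:
    """Always include the local pipeline; add NER only when requested and installed."""
    detectors = ["regex", "checksum"]
    if use_ner and ner_available(module_name):
        detectors.append("ner")
    return detectors


print(build_detectors(use_ner=False))
print(build_detectors(use_ner=True))  # includes "ner" only if Presidio is installed
```

Because the fallback is silent, a CI job that relies on NER coverage may want an explicit check like `ner_available()` so that a missing install fails loudly rather than quietly reducing coverage.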
Sample output
PII detected (3 type(s)):
patient_name: "John Smith" (1 match)
medical_record_number: "MRN 12345" (1 match)
email: "[email protected]" (1 match)
With strict=True, decoys like 1234-5678-9012 (not Verhoeff-valid
Aadhaar), 4532-0151-1283-0367 (Luhn-invalid Visa shape), and
123-45-6789 (test SSN) are dropped from the report.
SchemaEvaluator
Validates that LLM outputs conform to a Pydantic model or JSON Schema dict. Zero API calls — validation is purely local.
When to use: Structured output tasks — extraction, classification, API response generation — where you need per-field failure breakdowns rather than a binary pass/fail.
Passes when output is valid JSON matching the schema. Fails with per-field error messages.
from pydantic import BaseModel
from multivon_eval import SchemaEvaluator
class Summary(BaseModel):
    title: str
    score: float
    tags: list[str]
suite.add_evaluators(SchemaEvaluator(Summary))
# JSON Schema alternative
suite.add_evaluators(SchemaEvaluator({
    "type": "object",
    "required": ["title", "score"],
    "properties": {
        "title": {"type": "string"},
        "score": {"type": "number", "minimum": 0, "maximum": 1},
    }
}))
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| schema | type \| dict | required | Pydantic model class or JSON Schema dict |
| strict | bool | False | If True, extra fields not in the schema are also treated as failures |
| threshold | float | 1.0 | Minimum score to pass (default: any field error = fail) |
strict mode behavior: When strict=False (default), extra keys in the JSON output are ignored — only required fields and type constraints are checked. When strict=True, any key present in the output that is not declared in the schema is counted as a violation. Use strict mode when you need to enforce that the model doesn’t leak internal fields or hallucinate extra properties.
Strips markdown code fences automatically before parsing. For JSON Schema, scoring is proportional: score = max(0.0, 1.0 - errors/10). For Pydantic models, any validation error returns score 0.0.
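The two behaviours above can be sketched in isolation. The scoring function follows the stated formula directly; the fence-stripping regex is an assumption about what "strips markdown code fences" means in practice, and both function names are hypothetical.

```python
import json
import re

TICKS = "`" * 3  # built programmatically to avoid a literal fence in this example

# Assumed fence shape: opening fence with optional language tag, body, closing fence.
FENCE = re.compile(
    r"^\s*" + TICKS + r"[\w-]*[ \t]*\n(.*?)\n?[ \t]*" + TICKS + r"\s*$",
    re.DOTALL,
)


def strip_code_fence(raw: str) -> str:
    """Return the body of a fenced block, or the stripped input if unfenced."""
    m = FENCE.match(raw)
    return m.group(1) if m else raw.strip()


def json_schema_score(error_count: int) -> float:
    """Proportional score from the formula: max(0.0, 1.0 - errors/10)."""
    return max(0.0, 1.0 - error_count / 10)


fenced = TICKS + 'json\n{"title": "x", "score": 0.5}\n' + TICKS
print(json.loads(strip_code_fence(fenced)))
print(json_schema_score(3), json_schema_score(12))
```

With the default threshold of 1.0, any non-zero error count fails; lowering the threshold only changes the outcome for JSON Schema validation, since Pydantic errors score 0.0 regardless.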
Sample output:
Schema validation failed:
score: Input should be a valid number
tags: Field required
ComplianceReporter
Not an evaluator — a report writer. Produces tamper-evident NDJSON audit trails.
See the Compliance & Privacy guide for full documentation.
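As background on what "tamper-evident" can mean for an NDJSON log, one common technique is a hash chain, where each record stores the hash of the previous one. Whether ComplianceReporter works this way is not stated here; the sketch below is a generic illustration under that assumption, and the field names (`prev_hash`, `payload`, `hash`) and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record


def _digest(prev_hash: str, payload: dict) -> str:
    body = {"prev_hash": prev_hash, "payload": payload}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(log_lines: list[str], payload: dict) -> None:
    """Append one NDJSON line whose hash covers the previous line's hash."""
    prev_hash = json.loads(log_lines[-1])["hash"] if log_lines else GENESIS
    record = {"prev_hash": prev_hash, "payload": payload, "hash": _digest(prev_hash, payload)}
    log_lines.append(json.dumps(record, sort_keys=True))


def verify_chain(log_lines: list[str]) -> bool:
    """Recompute every hash; any edited, removed, or reordered line breaks the chain."""
    prev_hash = GENESIS
    for line in log_lines:
        record = json.loads(line)
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(record["prev_hash"], record["payload"]):
            return False
        prev_hash = record["hash"]
    return True


log: list[str] = []
append_record(log, {"evaluator": "PIIEvaluator", "passed": False})
append_record(log, {"evaluator": "SchemaEvaluator", "passed": True})
print(verify_chain(log))
```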