What multivon-eval is, mechanically
When you call ComplianceReporter.record(report), the library appends a JSON record to a local NDJSON file. The record contains:
| Field | What it captures |
|---|---|
| record_id | 12-char hex ID (truncated from a UUID4). Sufficient for cross-referencing within a suite’s log; pair with timestamp and suite_name for global uniqueness |
| timestamp | UTC ISO-8601 at record time |
| framework | "eu-ai-act" / "nist-ai-rmf" / "hipaa" / "none" |
| chain_version | Format version of the chained payload (currently 1) |
| prev_hash | SHA-256 of the previous record’s payload (or 64 zeros for the first) |
| summary | Pass/fail counts, error counts, pass rate, stability score, your tags |
| evaluator_results | Per-evaluator avg score + pass rate + control mappings |
| provenance | Package version, git SHA (with dirty flag), Python + OS, full SuiteLock with evaluator fingerprints, judge configs used, calibration entries hit, and the cases hash |
| record_hash | SHA-256 of the entire payload above (excluding this field) |
Source: multivon_eval/compliance.py:563–642.
The records are linked into a hash chain — deleting or editing any record mid-log invalidates every subsequent record’s prev_hash. See the audit-trail page for the algorithm and the verifier.
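The chaining described above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: the canonical serialization (json.dumps with sorted keys and compact separators), the helper names append_record and verify_chain, and the choice to treat prev_hash as the previous record's record_hash are all assumptions. The algorithm the library actually uses is documented on the audit-trail page.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # prev_hash of the first record: 64 zeros


def _payload_hash(payload: dict) -> str:
    # Assumption: canonical JSON (sorted keys, compact separators) before hashing.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_record(log: list, summary: dict) -> dict:
    """Build a chained record and append it to an in-memory log (hypothetical helper)."""
    payload = {
        "record_id": uuid.uuid4().hex[:12],  # 12-char hex, truncated UUID4
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "chain_version": 1,
        "prev_hash": log[-1]["record_hash"] if log else GENESIS_HASH,
        "summary": summary,
    }
    record = {**payload, "record_hash": _payload_hash(payload)}
    log.append(record)
    return record


def verify_chain(log: list) -> int:
    """Return the index of the first invalid record, or -1 if the chain is intact."""
    expected_prev = GENESIS_HASH
    for index, record in enumerate(log):
        payload = {k: v for k, v in record.items() if k != "record_hash"}
        if record["prev_hash"] != expected_prev:
            return index
        if _payload_hash(payload) != record["record_hash"]:
            return index
        expected_prev = record["record_hash"]
    return -1


log = []
for passed in (10, 9, 10):
    append_record(log, {"passed": passed, "failed": 10 - passed})

intact = verify_chain(log)          # chain as written
log[1]["summary"]["passed"] = 10    # tamper with the middle record
tampered = verify_chain(log)        # index of the first record that fails
```

In this sketch, rewriting record_hash on the edited record does not hide the change: the next record's prev_hash no longer matches, so the failure simply moves one position down the log.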
Data flow — where bytes go
Judge calls go to whichever endpoint you configure in JudgeConfig: Anthropic, OpenAI, Google, an on-prem vLLM/Ollama instance, or any OpenAI-compatible URL. If your DPIA precludes cloud judges, point JudgeConfig at a local model and no judge data leaves your environment either.
Implication for your DPIA / RoPA: for the eval workflow, your organization is both the data controller and the data processor. Multivon is not a sub-processor. If you use a cloud LLM judge, that vendor is the sub-processor for the judge call only; the rest of the eval (cases, outputs, audit log) never reaches them.
Frameworks mapped today
| Framework | Measurable controls | Process controls | Source |
|---|---|---|---|
| EU AI Act (Regulation (EU) 2024/1689) | 5 (Art. 9(2)(b), 10(2)(f-g), 10(5), 15(1), 15(2)) | 5 (Art. 11, 12, 13, 14, 15(4-5)) | compliance.py:163–180 |
| NIST AI RMF 1.0 | 5 (MEASURE 2.3, 2.5, 2.6, 2.10, 2.11) | 5 (GOVERN 1.1, MEASURE 2.7, 2.8, 2.9, MANAGE 4.1) | compliance.py:231–245 |
| HIPAA Security Rule (45 CFR §164.312) + Privacy Rule (§164.514(b)(2) Safe Harbor) | 4 — three Security Rule technical safeguards (§164.312(a), (b), (c)) + one Privacy Rule de-identification standard (§164.514(b)(2)) | 4 (§164.308, §164.310, §164.316, BAA) | compliance.py:299–316 |
With ComplianceReporter, every evaluator result is annotated with the controls it provides evidence for. The mappings are in _EU_AI_ACT_BY_EVALUATOR, _NIST_BY_EVALUATOR, and _HIPAA_BY_EVALUATOR; these dictionaries are auditable in the source. We list an evaluator against a control only when its output is direct evidence for that control. An auditor can re-derive every claim by reading the mapping tables and the evaluator implementations they reference.
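The annotation step can be pictured as a dictionary lookup. The sketch below is illustrative only: the evaluator names, the control assignments, the EvaluatorResult class, and the annotate helper are hypothetical and are not copied from _EU_AI_ACT_BY_EVALUATOR or the other tables in compliance.py.

```python
from dataclasses import dataclass, field

# Hypothetical mapping, for illustration only. The real tables live in compliance.py.
EXAMPLE_CONTROLS_BY_EVALUATOR = {
    "AccuracyEvaluator": ["Art. 15(1)"],
    "BiasEvaluator": ["Art. 10(2)(f-g)", "Art. 10(5)"],
}


@dataclass
class EvaluatorResult:
    name: str
    avg_score: float
    pass_rate: float
    controls: list = field(default_factory=list)


def annotate(results, mapping):
    """Attach control IDs to each result; unmapped evaluators get an empty list."""
    for result in results:
        result.controls = list(mapping.get(result.name, []))
    return results


results = annotate(
    [
        EvaluatorResult("AccuracyEvaluator", avg_score=0.91, pass_rate=0.95),
        EvaluatorResult("LatencyEvaluator", avg_score=0.80, pass_rate=1.0),
    ],
    EXAMPLE_CONTROLS_BY_EVALUATOR,
)
```

An evaluator that is absent from the mapping (LatencyEvaluator here) ends up with no controls, which mirrors the rule that an evaluator is listed against a control only when its output is direct evidence for it.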
Pre-flight coverage analysis
Before you run an eval suite against a regulated system, call reporter.coverage(suite) to see exactly which controls your evaluators exercise.
Source: compliance.py:791–821.
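Conceptually, a coverage check compares the controls a framework lists as measurable against the controls the suite's evaluators map to. The sketch below shows that set arithmetic under stated assumptions: the coverage_report function, its return shape, and the evaluator-to-control assignments are hypothetical and do not describe what reporter.coverage(suite) actually returns. The control IDs are the five measurable EU AI Act controls listed under "Frameworks mapped today".

```python
# Measurable EU AI Act controls, as listed in the framework table.
MEASURABLE_CONTROLS = [
    "Art. 9(2)(b)",
    "Art. 10(2)(f-g)",
    "Art. 10(5)",
    "Art. 15(1)",
    "Art. 15(2)",
]

# Hypothetical evaluator-to-control assignments, for illustration only.
EXAMPLE_MAPPING = {
    "AccuracyEvaluator": ["Art. 15(1)"],
    "BiasEvaluator": ["Art. 10(2)(f-g)", "Art. 10(5)"],
}


def coverage_report(suite_evaluators, mapping, measurable):
    """Split the measurable controls into covered and missing for a suite."""
    exercised = {
        control
        for name in suite_evaluators
        for control in mapping.get(name, [])
    }
    covered = [c for c in measurable if c in exercised]
    missing = [c for c in measurable if c not in exercised]
    return {
        "covered": covered,
        "missing": missing,
        "ratio": len(covered) / len(measurable),
    }


report = coverage_report(
    ["AccuracyEvaluator", "BiasEvaluator"],
    EXAMPLE_MAPPING,
    MEASURABLE_CONTROLS,
)
```

The "missing" list is the useful part before an audit: it names the measurable controls for which the suite would produce no evidence at all.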
What multivon-eval does NOT do
We are explicit about scope so a compliance buyer doesn’t discover the boundary in the middle of an audit.
- No certification. multivon-eval produces evidence; auditors decide whether evidence is sufficient. We do not issue certificates of conformity.
- No legal opinion. The Article and subcategory mappings are our best reading of the published frameworks. We are not a law firm. A regulatory question about your specific deployment should go to your legal counsel.
- No organizational governance. The process controls in each framework (training records, role assignments, incident response, third-party risk management, business associate agreements) require organizational measures — multivon-eval cannot produce them.
- No real-time monitoring. A ComplianceReporter records eval runs as you trigger them. Post-deployment monitoring (NIST MANAGE 4.1) requires you to call it from a scheduled job or production loop; the library doesn’t pull metrics itself.
- No PHI / PII handling promise beyond evaluator output. PIIEvaluator(jurisdiction="hipaa") regex-matches 13 of the 18 HIPAA Safe Harbor identifiers (MRN, NPI, DEA, license, device IDs, account numbers, certificate numbers, health-plan numbers, VINs, admission/discharge dates, fax, URLs). The 5 that regex cannot reliably detect (personal names, geographic subdivisions smaller than state, full-face photos, biometric identifiers, and arbitrary unique identifying numbers/characteristics) require upstream de-identification or human review. The evaluator does not redact PHI in transit, encrypt at rest, or enforce access control. Those are infrastructure concerns owned by the deploying team.
- No vendor-of-record relationship for the cloud judges. If you configure an OpenAI judge, OpenAI is your sub-processor for the judge call. multivon-eval does not proxy or wrap that relationship.
- No telemetry, no account, no callback. The library does not phone home. There is no cloud component.
When the Compliance Bundle helps
Everything described above is in the open-source library, free under Apache 2.0. The Compliance Bundle adds the human services around it: framework-mapping updates as regulations change, calibrated judge threshold packs per new model release, customer-branded auditor templates, a named technical contact with an SLA, and a legally reviewed attestation letter you can include in your compliance file. It is in early access; the page describes what it does and does not include today.
Quick links
- EU AI Act — Article-by-article coverage
- Audit trail — Hash chain mechanics + verifier
- Compliance Bundle — Scope + early-access status
- Sample audit package zip (5.5 KB) — what an auditor actually receives
- Security & data handling
- multivon_eval/compliance.py: full source for the reporter, control catalog, and verifier

