pdfhell is a single binary installed by pip install pdfhell (or via uvx pdfhell …). All subcommands are non-interactive — designed for CI and scripting.
pdfhell list-traps
Print every available trap family on stdout, one per line.
$ pdfhell list-traps
hidden_ocr_mismatch
footnote_override
split_table_across_pages
pdfhell make
Generate one trap PDF + its case JSON for inspection.
pdfhell make --trap <family> --seed <int> [--out <dir>]
| Flag | Default | Notes |
|---|---|---|
| --trap | required | One of the family names from pdfhell list-traps. |
| --seed | required | Integer seed. Same seed → byte-identical PDF + identical answer key. |
| --out | ./cases | Output directory. Created if missing. |
Writes <case_id>.pdf and <case_id>.json to --out. The JSON includes the expected answer, forbidden answers (trap-caught failure modes), and metadata.
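The byte-identical guarantee for --seed can be spot-checked by generating the same case into two directories and comparing file hashes. The sketch below shows one way to do that in Python. The helper names (sha256_of, differing_pdfs) are illustrative and not part of pdfhell. It compares only the PDFs, because the text above promises byte-identical PDFs but only an identical answer key for the JSON.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def differing_pdfs(dir_a, dir_b):
    """List PDF names that are missing from one directory or differ in content."""
    a = {p.name: sha256_of(p) for p in Path(dir_a).glob("*.pdf")}
    b = {p.name: sha256_of(p) for p in Path(dir_b).glob("*.pdf")}
    # A name present on only one side compares a digest against None and is reported.
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

After running pdfhell make twice with the same --trap and --seed but different --out directories, differing_pdfs should return an empty list if generation is deterministic.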
pdfhell build
Materialise a named suite to disk.
pdfhell build --suite <smoke|mini> --out <dir>
| Flag | Default | Notes |
|---|---|---|
| --suite | mini | smoke (3 cases) or mini (30 cases). |
| --out | ./cases/<suite> | Output directory. |
pdfhell run calls this automatically on first use, so you rarely need to invoke it directly.
pdfhell run — main entry point
Evaluate a vision model against a suite.
pdfhell run --model <provider>:<model>
[--suite smoke|mini]
[--cases-dir <dir>]
[--workers <n>]
[--out <path>]
[--junit <path>]
[--audit-pack <path>]
[--fail-threshold <0.0-1.0>]
[--quiet]
| Flag | Default | Notes |
|---|---|---|
| --model | required | provider:model. Providers: anthropic, openai, google. Examples: anthropic:claude-sonnet-4-6, openai:gpt-4o, google:gemini-2.5-flash. |
| --suite | mini | smoke or mini. |
| --cases-dir | ./cases/<suite> | Built on first use if missing. |
| --workers | 4 | Parallel API requests. |
| --out | runs/<suite>-<model>.json | Full report JSON. |
| --junit | (none) | Optional JUnit XML for CI dashboards (GitHub Actions, GitLab CI, Jenkins). |
| --audit-pack | (none) | Optional hash-chained ZIP: PDFs + answer keys + run JSON + JUnit + SHA-256 manifest + README. The artifact procurement teams need. |
| --fail-threshold | (none) | Float in [0.0, 1.0]. Exits non-zero if pass_rate is below this value (for CI gates). |
| --quiet | false | Suppress per-case progress; print summary only. |
API keys are read from environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY). pdfhell never reads them from disk or asks for them interactively.
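Because keys are never requested interactively, a CI script may want to check that the right variable is set before starting a run. The following is a minimal sketch, assuming each provider prefix maps to the variable with the matching name. That mapping is inferred from the names above rather than confirmed, and required_key_var and key_is_set are hypothetical helpers, not part of pdfhell.

```python
import os

# Assumed provider -> environment variable mapping, inferred from the variable names.
KEY_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def required_key_var(model):
    """Return the env var name expected for a provider:model string."""
    provider, sep, name = model.partition(":")
    if not sep or not name or provider not in KEY_ENV_VARS:
        raise ValueError(f"expected provider:model with a known provider, got {model!r}")
    return KEY_ENV_VARS[provider]


def key_is_set(model, environ=os.environ):
    """True when the env var for this model's provider is present and non-empty."""
    return bool(environ.get(required_key_var(model)))
```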
pdfhell report
Print a saved run’s summary.
pdfhell report runs/mini-anthropic-claude-sonnet-4-6.json
Useful for re-rendering a previous run’s headline without re-running the model.
Exit codes
| Code | Meaning |
|---|---|
| 0 | Run completed; if --fail-threshold was set, the threshold was met. |
| 1 | Run completed but pass_rate was below --fail-threshold. CI should treat as failure. |
| 2 | Bad arguments (unknown suite, unknown trap, missing required flag). |
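A script that wraps pdfhell run can branch on these exit codes. The sketch below assembles the command from the flags documented above and maps the return code to its meaning. build_run_command and run_and_explain are illustrative names, and any code outside the table is reported as undocumented rather than guessed at.

```python
import subprocess

# Meanings taken from the exit-code table above.
EXIT_MEANINGS = {
    0: "run completed; threshold met if one was set",
    1: "run completed but pass_rate was below --fail-threshold",
    2: "bad arguments",
}


def build_run_command(model, suite="mini", fail_threshold=None, junit=None):
    """Assemble a pdfhell run argv list from the documented flags."""
    cmd = ["pdfhell", "run", "--model", model, "--suite", suite]
    if fail_threshold is not None:
        if not 0.0 <= fail_threshold <= 1.0:
            raise ValueError("fail_threshold must be in [0.0, 1.0]")
        cmd += ["--fail-threshold", str(fail_threshold)]
    if junit is not None:
        cmd += ["--junit", junit]
    return cmd


def run_and_explain(cmd):
    """Run the command and return (exit_code, meaning)."""
    code = subprocess.run(cmd).returncode
    return code, EXIT_MEANINGS.get(code, "undocumented exit code")
```

Keeping command construction separate from execution lets the argument list be inspected or logged in CI before any API request is made.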
The --out JSON has this shape:
{
"model": "anthropic:claude-sonnet-4-6",
"suite": "mini",
"n": 30,
"pass_rate": 0.967,
"refused_rate": 0.0,
"per_trap_pass": {
"hidden_ocr_mismatch": 1.0,
"footnote_override": 0.9,
"split_table_across_pages": 1.0
},
"per_trap_fell_for_trap": { },
"cases": [
{
"case_id": "hidden_ocr_mismatch-1001",
"trap_family": "hidden_ocr_mismatch",
"correct": true,
"fell_for_trap": false,
"refused": false,
"expected": "$1,234.56",
"model_output": "$1,234.56",
"matched_expected": true,
"matched_forbidden": [],
"failure_mode": ""
}
]
}
per_trap_fell_for_trap is the diagnostic signal. A model that passes only 60% of a trap family with fell_for_trap=0.4 is failing entirely through the designed failure mode: every miss matched a forbidden answer, so the trap is working. A model at 60% with fell_for_trap=0 is failing by hallucinating something else, which is a different bug with a different fix.
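The same diagnostic can be recomputed from the cases array of a saved report. The sketch below splits each trap family's failures into trap-caught and other, using only the trap_family, correct, and fell_for_trap fields shown in the JSON above. It assumes a trap-caught case is always an incorrect one, and trap_diagnosis is an illustrative helper rather than part of pdfhell.

```python
from collections import defaultdict


def trap_diagnosis(report):
    """Per trap family: pass rate, trap-caught rate, and other-failure rate."""
    counts = defaultdict(lambda: {"n": 0, "passed": 0, "trapped": 0})
    for case in report["cases"]:
        c = counts[case["trap_family"]]
        c["n"] += 1
        if case["correct"]:
            c["passed"] += 1
        elif case["fell_for_trap"]:
            # Assumption: only incorrect cases count as trap-caught.
            c["trapped"] += 1
    result = {}
    for family, c in counts.items():
        failed = c["n"] - c["passed"]
        result[family] = {
            "pass_rate": c["passed"] / c["n"],
            "fell_for_trap": c["trapped"] / c["n"],
            "other_failures": (failed - c["trapped"]) / c["n"],
        }
    return result
```

A family where other_failures dominates points at a problem outside the trap's designed failure mode.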
pdfhell discover
Emit pdfhell’s machine-readable capability catalog as JSON on stdout. The output has the same shape an agent gets from the multivon-mcp eval_discover tool. It is exposed as a CLI so that callers that don’t speak MCP (Claude Code via Bash, shell scripts, CI gates planning a run) can pipe pdfhell discover --compact | jq ....
pdfhell discover # pretty-printed
pdfhell discover --compact # single-line JSON for piping
Output shape:
{
"package": "pdfhell",
"version": "0.1.3",
"traps": [
{"name": "hidden_ocr_mismatch", "example_question": "…", "example_expected_answer": "$18,900.25"},
…
],
"suites": [
{"name": "smoke", "version": "smoke-v1", "suite_hash": "8cb2f6ab", "total_cases": 3, "trap_seeds": {…}},
{"name": "mini", "version": "mini-v1", "suite_hash": "8ad87b8d", "total_cases": 30, "trap_seeds": {…}}
]
}
Use this when an agent needs to plan a run (e.g. “list the trap families before I call pdfhell_run”) without round-tripping through MCP.
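For callers that prefer Python to jq, the catalog can be loaded and reduced to the fields a planner usually needs. This is a minimal sketch: load_catalog and summarize_catalog are hypothetical helpers, and they rely only on the name and total_cases keys shown in the output shape above.

```python
import json
import subprocess


def load_catalog():
    """Run pdfhell discover --compact and parse its stdout as JSON."""
    proc = subprocess.run(
        ["pdfhell", "discover", "--compact"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


def summarize_catalog(catalog):
    """Reduce the catalog to trap names and per-suite case counts."""
    return {
        "traps": [trap["name"] for trap in catalog["traps"]],
        "suite_sizes": {s["name"]: s["total_cases"] for s in catalog["suites"]},
    }
```

Calling summarize_catalog(load_catalog()) requires pdfhell on PATH. summarize_catalog on its own works on any dict with the shape shown above.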
Scoring notes
pdfhell uses contains-match scoring (whitespace-tolerant, case-insensitive, with trailing-punctuation strip). One nuance worth knowing:
Currency-prefix tolerance. When the expected answer starts with a currency symbol ($, €, £, ¥, ₹) immediately before a digit, the matcher accepts the answer with or without the symbol. So expected = "$780,803.18" matches a model output of either "$780,803.18" or "780,803.18". This avoids false negatives on the split-table trap, where models often omit the $ even when the table column header includes it. Symmetric: an expected = "780,803.18" (no prefix) matches a model output of "$780,803.18" too.
Known limitation: short numeric-only answers can substring-match longer numbers ("18" matches "1875"). Pad your expected answers with the surrounding context (e.g. "$18.00" rather than "18") if you need stricter matching.
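The rules above can be approximated in a few lines of Python. This is a sketch of the described behaviour, not pdfhell's actual matcher: the normalization details (which punctuation is stripped, how whitespace is collapsed) are assumptions, and the function names are illustrative.

```python
CURRENCY_SYMBOLS = "$€£¥₹"


def normalize(text):
    """Lowercase, collapse whitespace runs, and strip trailing punctuation."""
    collapsed = " ".join(text.lower().split())
    return collapsed.rstrip(".,;:!?")


def strip_currency_prefix(text):
    """Drop a leading currency symbol only when a digit follows it."""
    if len(text) >= 2 and text[0] in CURRENCY_SYMBOLS and text[1].isdigit():
        return text[1:]
    return text


def matches(expected, output):
    """Contains-match with currency-prefix tolerance on the expected answer."""
    exp = normalize(expected)
    out = normalize(output)
    # The symmetric case (expected without a symbol, output with one) is
    # already covered by plain substring containment.
    return exp in out or strip_currency_prefix(exp) in out
```

Under this sketch, matches("18", "1875") is true, which reproduces the substring limitation, while matches("$18.00", "1875") is false.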