multivon-eval ships three Anthropic Agent Skills so Claude Code knows when to bootstrap an eval suite, when to gate a PR on it, and how to explain why a particular evaluator was picked. The skills are plain Markdown with YAML frontmatter — no DSL — and they live in multivon_eval/_skills/ inside the PyPI package.

What is an Agent Skill?

A skill is a directory containing a SKILL.md file that Claude Code auto-loads from ~/.claude/skills/. The frontmatter declares name, description, and allowed-tools; the body holds the instructions Claude Code reads when the description matches the user’s request. Without skills, Claude Code has to infer evaluator selection and CLI flags from your docs, and it often hallucinates command names. With skills, the tool’s own team writes the workflow once and every Claude Code session inherits it.
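To make the file layout concrete, here is a minimal sketch that splits a SKILL.md into its frontmatter fields and its body. The sample skill text, the `split_skill` helper, and the naive `key: value` parsing are all illustrative assumptions; real frontmatter is YAML, so a proper YAML parser would be more robust, and the shipped skills will have different contents.

```python
# Hypothetical SKILL.md contents; the field values are placeholders,
# not copied from the multivon-eval package.
SAMPLE_SKILL_MD = """\
---
name: example-skill
description: Hypothetical description used to match user requests.
allowed-tools: Bash, Read
---
Instructions Claude Code reads when the description matches.
"""


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening frontmatter delimiter")
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        raise ValueError("missing closing frontmatter delimiter") from None

    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            # Naive parsing: everything before the first colon is the key.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


fields, body = split_skill(SAMPLE_SKILL_MD)
print(sorted(fields))
print(body)
```

The description field is the part that matters for routing, since it is what Claude Code compares against the user's request.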

The bootstrap to audit to explain loop

The three skills form one iterative loop. You start cold and run eval-bootstrap, ask eval-explain to surface what just got picked, and then, on every subsequent PR, eval-audit gates the change against the suite that eval-bootstrap generated.

eval-bootstrap

Cold-start an eval suite from a product description plus sample traces. Emits eval_suite.py, seed_cases.jsonl, thresholds.yaml, DISCOVERY_REPORT.md.

eval-audit

Pre-ship gate on a PR diff. Runs only the cases that stress the changed surface. Blocks safety-class regressions at p < 0.05.

eval-explain

Three-sentence answer to “why did multivon pick this evaluator”. Reads DISCOVERY_REPORT.md plus the evaluator docstring.

Install

The skills ship inside the multivon-eval PyPI package (>= 0.9.8). One command writes them into ~/.claude/skills/.
pip install 'multivon-eval>=0.9.8'
multivon-eval install-skills
install-skills prefers a directory symlink into ~/.claude/skills/eval-bootstrap (and the two siblings) so a later pip install -U multivon-eval propagates SKILL.md edits without re-running the command. On Windows or filesystems that refuse directory symlinks, it falls back to a recursive copy and prints a note that you’ll need to re-run the command after upgrades.
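The symlink-first, copy-as-fallback behaviour described above can be sketched as follows. This is not the package's actual implementation: `link_or_copy` is a hypothetical helper, and the demo runs in a temporary directory rather than touching ~/.claude/skills/.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_or_copy(source: Path, target: Path) -> str:
    """Try a directory symlink; fall back to a recursive copy.

    Returns "symlink" or "copy" so the caller can tell the user whether
    a re-run is needed after upgrades. Does not handle an existing target.
    """
    try:
        os.symlink(source, target, target_is_directory=True)
        return "symlink"
    except (OSError, NotImplementedError):
        # E.g. Windows without symlink privileges, or a filesystem
        # that refuses directory symlinks.
        shutil.copytree(source, target)
        return "copy"


# Demo against throwaway directories standing in for the package
# and for ~/.claude/skills/.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "pkg" / "eval-bootstrap"
    source.mkdir(parents=True)
    (source / "SKILL.md").write_text("placeholder\n")

    skills_dir = Path(tmp) / "skills"
    skills_dir.mkdir()

    mode = link_or_copy(source, skills_dir / "eval-bootstrap")
    installed_ok = (skills_dir / "eval-bootstrap" / "SKILL.md").is_file()

print(mode, installed_ok)
```

A symlink always reflects the current package contents, whereas a copy is a snapshot, which is why the copy path comes with the reminder to re-run the command after upgrades.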
Two flags worth knowing:
--dry-run (flag)
Print what would happen (which source paths, which targets, symlink vs. copy) without touching the filesystem. Run this first if you’re unsure what ~/.claude/skills/ already contains.
--force (flag)
Replace existing entries at the three target paths. Without --force, the command refuses to overwrite anything already on disk and tells you which entries collided.
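One way to picture how the two flags interact is as a plan computed before anything is written. The sketch below is an assumption about the logic, not the command's real code: `plan_install` and the action labels "create", "replace", and "refuse" are hypothetical names. A dry run amounts to printing the plan and stopping there.

```python
import os
import tempfile
from pathlib import Path

SKILL_NAMES = ("eval-bootstrap", "eval-audit", "eval-explain")


def plan_install(skills_dir: Path, force: bool = False) -> dict[str, str]:
    """Decide, per skill, what an install would do without doing it."""
    plan: dict[str, str] = {}
    for name in SKILL_NAMES:
        target = skills_dir / name
        # lexists also catches dangling symlinks left by an old install.
        if not os.path.lexists(target):
            plan[name] = "create"
        elif force:
            plan[name] = "replace"
        else:
            plan[name] = "refuse"  # collision reported to the user
    return plan


# Demo: a temporary skills directory where eval-audit already exists.
with tempfile.TemporaryDirectory() as tmp:
    skills_dir = Path(tmp)
    (skills_dir / "eval-audit").mkdir()

    default_plan = plan_install(skills_dir)
    forced_plan = plan_install(skills_dir, force=True)

print(default_plan)
print(forced_plan)
```

Whether the real command refuses the whole run or only the colliding entries is not stated in the text above; the sketch reports collisions per entry.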

Auto-discovery flow

Once the skill directories exist under ~/.claude/skills/ (as symlinks or as copies), Claude Code auto-loads them at session start, with no config edit required. Skills load when a session starts, so the next session you launch picks them up. Each session’s tool list includes the skill names; Claude Code matches user phrases against each skill’s description and invokes the matching skill before answering. The auto-invoke triggers are spelled out on each skill’s detail page. Verify the install:
ls ~/.claude/skills/
# Expect: eval-audit  eval-bootstrap  eval-explain  (plus any others you have)
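A directory listing only shows that the entries exist. A slightly stricter check, sketched below, confirms that each one resolves to a directory containing a SKILL.md, which also catches dangling symlinks left behind after an uninstall. `missing_skills` is a hypothetical helper, not part of the multivon-eval CLI.

```python
from pathlib import Path

SKILL_NAMES = ("eval-audit", "eval-bootstrap", "eval-explain")


def missing_skills(skills_dir: Path) -> list[str]:
    """Return the skill names whose SKILL.md cannot be found.

    is_file() follows symlinks, so a dangling symlink counts as missing.
    """
    return [
        name
        for name in SKILL_NAMES
        if not (skills_dir / name / "SKILL.md").is_file()
    ]


if __name__ == "__main__":
    missing = missing_skills(Path.home() / ".claude" / "skills")
    print("all skills present" if not missing else f"missing: {missing}")
```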

Manual fallback

If you can’t run install-skills (older versions, or you want to vendor into a different directory), you can create the three symlinks by hand:
# Locate the installed package directory.
PKG_PATH=$(python -c "import multivon_eval, pathlib; print(pathlib.Path(multivon_eval.__file__).parent)")
mkdir -p ~/.claude/skills
# -n replaces an existing symlink instead of creating a new link inside the directory it points to.
ln -sfn "$PKG_PATH/_skills/eval-bootstrap" ~/.claude/skills/eval-bootstrap
ln -sfn "$PKG_PATH/_skills/eval-audit"     ~/.claude/skills/eval-audit
ln -sfn "$PKG_PATH/_skills/eval-explain"   ~/.claude/skills/eval-explain

What you can do once installed

Three concrete prompts that route to the three skills:
"Add evals to this project."

→ Routes to eval-bootstrap. Reads pyproject.toml to detect your
  LLM provider, finds sample traces (traces/, data/traces/, or
  prompts you), runs `multivon-eval bootstrap`, rewrites
  `stub_model` to call your real model, writes EVALS.md.
The three together close the loop from cold project to gated CI without you ever leaving Claude Code.