Claude Code Skills

multivon-eval ships three Anthropic Agent Skills so Claude Code knows when to bootstrap an eval suite, when to gate a PR on it, and how to explain why a particular evaluator was picked. The skills are plain Markdown with YAML frontmatter (no DSL), and they live in multivon_eval/_skills/ inside the PyPI package.

What is an Agent Skill?

A skill is a directory containing a SKILL.md file that Claude Code auto-loads from ~/.claude/skills/. The frontmatter declares name, description, and allowed-tools; the body holds the instructions Claude Code reads when the description matches the user’s request. Without skills, Claude Code has to infer evaluator selection and CLI flags from your docs, and it often hallucinates command names. With skills, the tool’s own team writes the workflow once and every Claude Code session inherits it.
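To make the file layout concrete, here is a minimal sketch that splits a SKILL.md string into its frontmatter fields and body. The three field names come from the paragraph above; the example values, the `split_frontmatter` helper, and the flat `key: value` parsing are assumptions made for illustration, not the shipped skill files or Claude Code's own loader.

```python
# Hypothetical SKILL.md content: field names are from the docs above,
# the values are invented for illustration.
EXAMPLE_SKILL_MD = """\
---
name: eval-explain
description: Explain why a particular evaluator was picked for this project.
allowed-tools: Read, Grep
---
Body: instructions Claude Code reads when the description matches.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, body).

    Handles only flat `key: value` lines, a small subset of YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("missing closing '---' frontmatter delimiter") from None

    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        fields[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


fields, body = split_frontmatter(EXAMPLE_SKILL_MD)
missing = {"name", "description", "allowed-tools"} - fields.keys()
print(fields["name"], "| missing fields:", sorted(missing))
```

A check like this can help when vendoring skills by hand, since a malformed frontmatter block is an easy mistake to miss.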

The bootstrap to audit to explain loop

The three skills form one iterative loop. Starting cold, you run eval-bootstrap, then ask eval-explain to surface what it just picked. On every subsequent PR, eval-audit gates the change against the suite that eval-bootstrap generated.

eval-bootstrap

Cold-start an eval suite from a product description plus sample traces. Emits eval_suite.py, seed_cases.jsonl, thresholds.yaml, DISCOVERY_REPORT.md.

eval-audit

Pre-ship gate on a PR diff. Runs only the cases that stress the changed surface. Blocks safety-class regressions at p < 0.05.

eval-explain

Three-sentence answer to “why did multivon pick this evaluator”. Reads DISCOVERY_REPORT.md plus the evaluator docstring.

Install

The skills ship inside the multivon-eval PyPI package (>= 0.9.8). One command writes them into ~/.claude/skills/.

pip install 'multivon-eval>=0.9.8'
multivon-eval install-skills

install-skills prefers to create directory symlinks at ~/.claude/skills/eval-bootstrap (and its two siblings) that point into the installed package, so a later pip install -U multivon-eval propagates SKILL.md edits without re-running the command. On Windows, or on filesystems that refuse directory symlinks, it falls back to a recursive copy and prints a note that you’ll need to re-run the command after upgrades.
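The symlink-first, copy-on-failure behavior described above can be sketched with the standard library. This is an illustration of the pattern only: `link_or_copy` is a hypothetical helper, not the function multivon-eval uses internally, and the demo runs in a temporary directory so nothing under ~/.claude is touched.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_or_copy(source: Path, target: Path) -> str:
    """Install `source` at `target`; return "symlink" or "copy".

    Assumes `target` does not exist yet; collision handling is out of scope.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.symlink(source, target, target_is_directory=True)
        return "symlink"
    except (OSError, NotImplementedError):
        # e.g. Windows without symlink privilege, or a filesystem
        # that refuses directory symlinks
        shutil.copytree(source, target)
        return "copy"


# Demo with a fake package layout in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "pkg" / "_skills" / "eval-bootstrap"
    src.mkdir(parents=True)
    (src / "SKILL.md").write_text("---\nname: eval-bootstrap\n---\n")

    dst = Path(tmp) / "skills" / "eval-bootstrap"
    mode = link_or_copy(src, dst)
    installed_ok = (dst / "SKILL.md").is_file()
    print(mode, installed_ok)
```

The trade-off is the one the text names: a symlink tracks package upgrades automatically, while a copy is a snapshot that goes stale until the install is repeated.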

Two flags worth knowing:

--dry-run (flag)

Print what would happen (which source paths, which targets, symlink vs copy) without touching the filesystem. Run this first if you’re unsure what ~/.claude/skills/ already contains.

--force (flag)

Replace existing entries at the three target paths. Without --force, the command refuses to overwrite anything already on disk and tells you which entries collided.
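The interaction between the two flags can be pictured as a planning step that inspects the targets and reports collisions before anything is written. The sketch below is an assumption about how such a step might look; `plan_install` and its output strings are hypothetical, and the source does not say whether the real command still installs the non-colliding entries when a collision is found.

```python
import tempfile
from pathlib import Path

SKILL_NAMES = ("eval-bootstrap", "eval-audit", "eval-explain")


def plan_install(source_root: Path, target_root: Path,
                 force: bool = False) -> tuple[list[str], list[str]]:
    """Return (planned actions, collisions) without touching the filesystem."""
    actions: list[str] = []
    collisions: list[str] = []
    for name in SKILL_NAMES:
        target = target_root / name
        # is_symlink() also catches dangling links, which exists() reports as absent.
        occupied = target.exists() or target.is_symlink()
        if occupied and not force:
            collisions.append(name)
            continue
        verb = "replace" if occupied else "create"
        actions.append(f"{verb} {target} -> {source_root / name}")
    return actions, collisions


# Demo: one of the three targets is already present.
with tempfile.TemporaryDirectory() as tmp:
    targets = Path(tmp) / "skills"
    (targets / "eval-audit").mkdir(parents=True)
    sources = Path(tmp) / "pkg" / "_skills"

    plain_actions, plain_collisions = plan_install(sources, targets)
    forced_actions, forced_collisions = plan_install(sources, targets, force=True)

print(plain_collisions)     # ['eval-audit']
print(len(forced_actions))  # 3
```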

Auto-discovery flow

Once the skill directories exist under ~/.claude/skills/ (as symlinks or copies), Claude Code loads them at session start. No config edit is needed; the next session you launch picks them up. Each session’s tool list includes the skill names; Claude Code matches user phrases against each skill’s description and invokes the matching skill before answering. The auto-invoke triggers are spelled out on each skill’s detail page. Verify the install:

ls ~/.claude/skills/
# Expect: eval-audit  eval-bootstrap  eval-explain  (plus any others you have)
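A directory listing shows the names but not whether each entry resolves to a readable SKILL.md; a dangling symlink would still appear in `ls`. The following sketch does the stricter check. `missing_skills` is a hypothetical helper, and the demo uses a temporary directory in place of the real skills folder.

```python
import tempfile
from pathlib import Path

EXPECTED_SKILLS = ("eval-audit", "eval-bootstrap", "eval-explain")


def missing_skills(skills_root: Path) -> list[str]:
    """Return expected skill names whose directory lacks a readable SKILL.md.

    is_file() follows symlinks, so a dangling link counts as missing.
    """
    return [name for name in EXPECTED_SKILLS
            if not (skills_root / name / "SKILL.md").is_file()]


# Demo: install two of the three skills into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("eval-audit", "eval-bootstrap"):
        (root / name).mkdir()
        (root / name / "SKILL.md").write_text(f"---\nname: {name}\n---\n")
    demo_missing = missing_skills(root)

print(demo_missing)  # ['eval-explain']
```

To check a real install, call `missing_skills(Path.home() / ".claude" / "skills")` and expect an empty list.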

Manual fallback

If you can’t run install-skills (older versions, or you want to vendor into a different directory), create the three symlinks by hand:

PKG_PATH=$(python -c "import multivon_eval, pathlib; print(pathlib.Path(multivon_eval.__file__).parent)")
mkdir -p ~/.claude/skills
# rm first: `ln -sf` against an existing symlinked DIRECTORY creates a
# nested link inside the target instead of replacing it.
rm -rf ~/.claude/skills/eval-bootstrap ~/.claude/skills/eval-audit ~/.claude/skills/eval-explain
ln -s "$PKG_PATH/_skills/eval-bootstrap" ~/.claude/skills/eval-bootstrap
ln -s "$PKG_PATH/_skills/eval-audit"     ~/.claude/skills/eval-audit
ln -s "$PKG_PATH/_skills/eval-explain"   ~/.claude/skills/eval-explain

What you can do once installed

Three concrete prompts that route to the three skills:

"Add evals to this project."

→ Routes to eval-bootstrap. Reads pyproject.toml to detect your
  LLM provider, finds sample traces (traces/, data/traces/, or
  prompts you), runs `multivon-eval bootstrap`, rewrites
  `stub_model` to call your real model, writes EVALS.md.

"Will this prompt change regress evals?"

→ Routes to eval-audit. Scopes the diff via
  `multivon_eval.attribution.scan`, identifies stressed cases,
  runs them with `python eval_suite.py --runs 3`, reports
  PASS / WARN / BLOCK with Wilson CI + paired-McNemar.

"Why did multivon recommend Faithfulness here?"

→ Routes to eval-explain. Reads DISCOVERY_REPORT.md, the
  evaluator's docstring, and an example seed case. Answers
  in exactly 3 sentences plus an illustrative case.

The three together close the loop from cold project to gated CI without you ever leaving Claude Code.
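For readers unfamiliar with the two statistics that eval-audit reports, here is a minimal sketch of a Wilson score interval for a pass rate and an exact two-sided McNemar test on paired before/after outcomes. The function names are illustrative and the choice of the exact binomial form of McNemar is an assumption; the source does not say which variant multivon-eval computes or how the results map to PASS / WARN / BLOCK beyond the p < 0.05 threshold for safety-class regressions.

```python
import math


def wilson_interval(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = passes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value from the discordant pairs.

    b = cases that passed before the change and fail after it,
    c = cases that failed before and pass after.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a change
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


low, high = wilson_interval(passes=8, n=10)
p_regression = mcnemar_exact(b=6, c=0)  # six cases newly failing, none newly passing
p_noise = mcnemar_exact(b=3, c=1)

print(f"8/10 pass rate, Wilson 95% CI: [{low:.3f}, {high:.3f}]")
print(f"b=6, c=0: p = {p_regression:.5f}")  # 0.03125, below 0.05
print(f"b=3, c=1: p = {p_noise:.3f}")       # 0.625, not significant
```

The paired test only looks at cases whose outcome flipped, which is why a handful of one-directional flips can cross the 0.05 line even in a small suite.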
