multivon-mcp is a Model Context Protocol server that exposes 19 evaluation tools to any MCP-compatible agent — Claude Desktop, Claude Code, Cursor, Cline, OpenCode. The agent calls them by name; no copy-paste, no python -c "...", no asking the agent to figure out the SDK calls.
When the agent is helping you build an LLM product, it can:
- Score a RAG output for hallucination without you writing the scaffolding
- Generate an adversarial PDF on demand to test your document AI
- Run the full pdfhell mini-suite against a model and analyse the results
- Produce a hash-chained audit pack for procurement diligence
- Discover the full evaluation capability catalog as JSON
Why an MCP server (not just a CLI)
The MCP layer matters because the agent is the user. A CLI is the right shape for humans driving terminals; an MCP server is the right shape for an LLM that needs to call the tool mid-edit, get JSON back, and condition its next action on the result. Three concrete wins:
- Tool discovery is free. eval_discover returns the full 44-evaluator catalog as JSON. The agent never has to parse markdown to know what's available.
- JSON results, not pretty-printed. Every tool returns {"score": 0.0-1.0, "passed": bool, "reason": str, "threshold": float}. The agent can branch on the score programmatically.
- Calibration is automatic. Each evaluator uses the calibrated threshold for the configured judge, so the agent never has to guess at the right cutoff.
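The result shape described above is what lets agent-side code branch without parsing prose. The following is a minimal sketch in Python, assuming a tool result arrives as a JSON string carrying exactly those four fields. The `parse_result` and `next_action` helpers, the action labels, and the 0.1 margin are hypothetical illustrations, not part of multivon-mcp.

```python
import json
from typing import TypedDict


class EvalResult(TypedDict):
    # Field names mirror the result shape described in the text.
    score: float
    passed: bool
    reason: str
    threshold: float


def parse_result(raw: str) -> EvalResult:
    """Parse a tool result and verify the four documented fields are present."""
    data = json.loads(raw)
    missing = {"score", "passed", "reason", "threshold"} - set(data)
    if missing:
        raise ValueError(f"result is missing fields: {sorted(missing)}")
    if not 0.0 <= data["score"] <= 1.0:
        raise ValueError(f"score out of range: {data['score']}")
    return data


def next_action(result: EvalResult, margin: float = 0.1) -> str:
    """Hypothetical policy: trust `passed`, and treat near-threshold failures as retryable."""
    if result["passed"]:
        return "continue"
    # Only the distance to the threshold is used, so no assumption is made
    # about whether a higher or lower score is better for a given evaluator.
    if abs(result["score"] - result["threshold"]) <= margin:
        return "retry"
    return "revise"


raw = '{"score": 0.65, "passed": false, "reason": "one unsupported claim", "threshold": 0.7}'
print(next_action(parse_result(raw)))
```

The policy reads the pass/fail decision from `passed` rather than re-deriving it from `score` and `threshold`, which keeps the calibrated cutoff on the server side.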
Install
Installing multivon-mcp pulls in multivon-eval, pdfhell, and the MCP SDK. Provider SDKs (anthropic, openai, google-genai) come along; bring your own API key in the environment.
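Since API keys are read from the environment, a quick pre-flight check can show which providers are usable before the server starts. This is a sketch only: the environment variable names below are the conventional ones for each provider SDK and are an assumption here, not something multivon-mcp is documented to require, and `configured_providers` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Assumed variable names; confirm them against each provider SDK's documentation.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google-genai": "GOOGLE_API_KEY",
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the providers whose assumed API-key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    found = configured_providers()
    print("providers with a key set:", found or "none")
```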
Next steps
- Configuration — wire it into Claude Desktop, Claude Code, Cursor, or Cline.
- Tool reference — every tool, every argument.
- Agent recipes — short patterns for common eval flows.

