Experiments

verikt’s claims are backed by experiments, not opinions. Each experiment isolates a single variable, measures the outcome, and reports what happened — including null results and invalid runs. We publish falsified hypotheses alongside validated ones.

Each experiment runs the same agent on the same task. Condition A: no guide — what most teams do today. Condition B: verikt guide output prepended to the prompt. Same model, one variable.
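As a rough illustration of the two conditions, the sketch below builds both prompts from one task so that the guide is the only difference. The function name, the task text, and the guide placeholder are hypothetical; this is not verikt's actual experiment harness.

```python
def build_prompts(task: str, guide: str) -> dict[str, str]:
    """Return one prompt per condition; only the guide differs between them."""
    return {
        "A": task,                  # Condition A: no guide
        "B": f"{guide}\n\n{task}",  # Condition B: guide output prepended
    }


# Hypothetical inputs for illustration only.
TASK = "Add an endpoint that cancels an order."
GUIDE = "<output of `verikt guide`>"  # placeholder, not real guide output

prompts = build_prompts(TASK, GUIDE)
```

Both prompts would then go to the same model with the same settings, so any difference in outcome can be attributed to the guide.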

100% architecture compliance with guide
22.2× less variance across runs
0 extra tokens (guide loads from cache)

The guide doesn’t make agents better on average — it makes them consistently correct.

Every experiment follows the same protocol: same agent, same task, one variable changed. Every run persists the full prompt, raw response, and verikt check output as artifacts. Full methodology →
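A minimal sketch of what persisting those three artifacts per run could look like. The directory layout, file names, and the `persist_run` helper are assumptions made for illustration, not the layout verikt's experiments actually use.

```python
import tempfile
from pathlib import Path


def persist_run(root: Path, condition: str, run_id: int,
                prompt: str, response: str, check_output: str) -> Path:
    """Write the three artifacts for one run into its own directory."""
    # Hypothetical layout: <root>/condition-<A|B>/run-<NNN>/
    run_dir = root / f"condition-{condition}" / f"run-{run_id:03d}"
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "prompt.txt").write_text(prompt, encoding="utf-8")
    (run_dir / "response.txt").write_text(response, encoding="utf-8")
    (run_dir / "check_output.txt").write_text(check_output, encoding="utf-8")
    return run_dir


# Example with placeholder content written to a temporary directory.
root = Path(tempfile.mkdtemp())
run_dir = persist_run(root, "B", 1,
                      prompt="full prompt text",
                      response="raw model response",
                      check_output="captured check output")
```

Keeping each run in its own directory means a result can be re-examined later without re-running the agent.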
Try it yourself: brew install diktahq/tap/verikt && verikt guide — then open your AI agent. Quick start →