verikt’s claims are backed by experiments, not opinions. Each experiment isolates a single variable, measures the outcome, and reports what happened — including null results and invalid runs. We publish falsified hypotheses alongside validated ones.
Each experiment runs the same agent on the same task. Condition A: no guide — what most teams do today. Condition B: verikt guide output prepended to the prompt. Same model, one variable.
100% architecture compliance with guide
22.2× less variance across runs
0 extra tokens: guide loads from cache
The guide doesn’t make agents better on average — it makes them consistently correct.
EXP-01 · Validated · n=3
Same task, different architecture
With guide: hexagonal architecture, 100% compliance. Control: flat structure, 0% compliance.
EXP-02 · Validated · n=3
Three engineers, one standard
Variance [2,1,1] → [0,0,0]. 22.2× reduction.
EXP-03 · Partial · n=3
Build me a job runner
Architecture violations [6,8,7] → [1,1,1]. Anti-patterns unchanged.
EXP-04 · Null result · n=1
New feature, existing service
Both conditions: 0 violations. On a conforming codebase, the existing architecture constrains the agent — the guide adds nothing measurable.
EXP-05 · Promising · n=1
Does the prompt even matter?
Guide is the dominant variable. Prompt quality secondary.
EXP-06 · Validated · n=1
Someone else’s codebase
Guide-only: 4 violations. Guide + mapping: 0 violations, −67% tokens.
EXP-07 · Null result · n=3
Consistency on feature additions
Both conditions: [0,0,0] violations, zero variance. On a conforming fixture, the codebase does the teaching — not the guide.
EXP-08 · Partial · n=3
Anti-pattern prevention
Architecture violations [6,8,7] → [1,1,1]. NEVER rules ignored when task requires the pattern. Same data as EXP-03 — confirms the boundary.
EXP-09 · Validated · n=2
Does the guide expand what agents recommend?
Without guide: “consider adding retries.” With guide: verikt add retry circuit-breaker idempotency — and a RECOMMENDATIONS file generated every time.
EXP-10 · Validated · n=12
Governance checkpoint
Variance [2,0,2] → [1,1,1]. An 80-token checkpoint eliminates run-to-run variance and reduces violations by 25% on greenfield. Ceiling effect on brownfield.
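The per-run violation counts quoted for EXP-10 can be checked by hand. The sketch below is illustrative only: it assumes the bracketed lists are violation counts per run for the greenfield case, and that the 25% figure compares total violations across runs. The helper names `summarize` and `percent_reduction` are hypothetical and not part of verikt.

```python
from statistics import pvariance

# Per-run violation counts as quoted for EXP-10 (assumed to be the greenfield runs).
without_checkpoint = [2, 0, 2]
with_checkpoint = [1, 1, 1]


def summarize(counts):
    """Return total violations and population variance for a list of runs."""
    return {"total": sum(counts), "variance": pvariance(counts)}


def percent_reduction(before, after):
    """Percentage drop from `before` to `after`; `before` must be non-zero."""
    return 100 * (before - after) / before


baseline = summarize(without_checkpoint)
treated = summarize(with_checkpoint)
reduction = percent_reduction(baseline["total"], treated["total"])

print(f"total violations: {baseline['total']} -> {treated['total']}")
print(f"variance: {baseline['variance']:.3f} -> {treated['variance']:.3f}")
print(f"reduction in total violations: {reduction:.0f}%")
```

Under these assumptions the totals drop from 4 to 3, which matches the quoted 25%, and the variance across runs falls to zero because every run with the checkpoint produced the same count.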
Every experiment follows the same protocol: same agent, same task, one variable changed. Every run persists the full prompt, raw response, and verikt check output as artifacts. Full methodology →
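As a rough illustration of this protocol, the sketch below builds the prompt for each condition and writes one artifact per run. It is not verikt's actual harness: `build_prompt`, `RunArtifact`, `persist`, and the on-disk layout are all hypothetical names chosen for the example, and the agent response and check output are passed in as plain strings rather than produced by real tools.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


def build_prompt(task: str, guide: Optional[str] = None) -> str:
    """Condition A passes guide=None; condition B prepends the guide text."""
    if guide is None:
        return task
    return f"{guide}\n\n{task}"


@dataclass
class RunArtifact:
    """Everything persisted for a single run (hypothetical schema)."""
    experiment: str
    condition: str      # "A" (no guide) or "B" (guide prepended)
    run: int
    prompt: str         # full prompt sent to the agent
    raw_response: str   # unmodified agent output
    check_output: str   # captured output of the check step


def persist(artifact: RunArtifact, root: Path) -> Path:
    """Write the artifact as JSON under root/<experiment>/ and return the path."""
    out = root / artifact.experiment / f"{artifact.condition}-run{artifact.run}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(asdict(artifact), indent=2), encoding="utf-8")
    return out
```

Keeping the task string identical across conditions and changing only the `guide` argument is what makes the guide the single variable in the comparison.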
Try it yourself: brew install diktahq/tap/verikt && verikt guide — then open your AI agent. Quick start →