Experiments

verikt’s claims are backed by experiments, not opinions. Each experiment isolates a single variable, measures the outcome, and reports what happened — including null results and invalid runs. We publish falsified hypotheses alongside validated ones.

Each experiment runs the same agent on the same task. Condition A: no guide — what most teams do today. Condition B: verikt guide output prepended to the prompt. Same model, one variable.
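The two conditions can be sketched as a single prompt builder, which makes the "one variable" claim concrete. This is a minimal illustration, not verikt's actual harness: the function name `build_prompt`, the blank-line separator, and the placeholder guide text are all assumptions.

```python
from typing import Optional


def build_prompt(task: str, guide: Optional[str] = None) -> str:
    """Return the prompt for one experimental condition.

    guide=None -> condition A (control: the task only)
    guide=str  -> condition B (guide text prepended to the same task)
    """
    if guide is None:
        return task
    # Assumption: guide and task are joined by a blank line.
    return f"{guide}\n\n{task}"


# Hypothetical inputs, for illustration only.
task = "Build me a job runner"
guide_text = "<output of `verikt guide` goes here>"

condition_a = build_prompt(task)
condition_b = build_prompt(task, guide_text)

# The task text is identical in both conditions; only the guide differs.
assert condition_b.endswith(condition_a)
```

Keeping the task string byte-identical across conditions is what lets a difference in outcome be attributed to the guide rather than to prompt wording.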

100% architecture compliance with guide

22.2× less variance across runs

0 extra tokens: guide loads from cache

Results

The guide doesn’t make agents better on average — it makes them consistently correct.

EXP-01 · Validated · n=3

Same task, different architecture

Guide: hexagonal, 100% compliance. Control: flat, 0% compliance.

EXP-02 · Validated · n=3

Three engineers, one standard

Variance [2,1,1] → [0,0,0]. 22.2× reduction.

EXP-03 · Partial · n=3

Build me a job runner

Architecture violations [6,8,7] → [1,1,1]. Anti-patterns unchanged.

EXP-04 · Null result · n=1

New feature, existing service

Both conditions: 0 violations. On a conforming codebase, the existing architecture constrains the agent — the guide adds nothing measurable.

EXP-05 · Promising · n=1

Does the prompt even matter?

Guide is the dominant variable. Prompt quality secondary.

EXP-06 · Validated · n=1

Someone else’s codebase

Guide-only: 4 violations. Guide + mapping: 0 violations, −67% tokens.

EXP-07 · Null result · n=3

Consistency on feature additions

Both conditions: [0,0,0] violations, zero variance. On a conforming fixture, the codebase does the teaching — not the guide.

EXP-08 · Partial · n=3

Anti-pattern prevention

Architecture violations [6,8,7] → [1,1,1]. NEVER rules ignored when task requires the pattern. Same data as EXP-03 — confirms the boundary.

EXP-09 · Validated · n=2

Does the guide expand what agents recommend?

Without guide: “consider adding retries.” With guide: verikt add retry circuit-breaker idempotency — and a RECOMMENDATIONS file generated every time.

EXP-10 · Validated · n=12

Governance checkpoint

Variance [2,0,2] → [1,1,1]. An 80-token checkpoint eliminates variance and reduces violations by 25% on greenfield. Ceiling effect on brownfield.
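The bracketed lists in the results above can be checked by hand. The sketch below treats each list as per-run violation counts, which is an assumption for EXP-10 (the page labels that list "Variance"), and it uses population variance, since the page does not say which estimator was used. The helper names are hypothetical.

```python
from statistics import mean, pvariance


def summarize(runs):
    """Mean and population variance of per-run violation counts."""
    return {"mean": mean(runs), "variance": pvariance(runs)}


def mean_reduction(control, treated):
    """Fractional drop in mean violations from control to treated."""
    return 1 - mean(treated) / mean(control)


# Per-run counts as reported in the result cards.
exp03_control, exp03_guide = [6, 8, 7], [1, 1, 1]
exp10_control, exp10_checkpoint = [2, 0, 2], [1, 1, 1]

for label, runs in [
    ("EXP-03 control", exp03_control),
    ("EXP-03 guide", exp03_guide),
    ("EXP-10 control", exp10_control),
    ("EXP-10 checkpoint", exp10_checkpoint),
]:
    print(label, summarize(runs))

# EXP-10: the mean falls from 4/3 to 1, a 25% reduction,
# and the variance of [1, 1, 1] is exactly zero.
print(f"EXP-10 mean reduction: {mean_reduction(exp10_control, exp10_checkpoint):.0%}")
```

Under these assumptions, the 25% figure for EXP-10 follows directly from the two lists, and any condition reporting identical counts in every run has zero variance.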

Methodology

Every experiment follows the same protocol: same agent, same task, one variable changed. Every run persists the full prompt, raw response, and verikt check output as artifacts. Full methodology →
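One way to persist the three artifacts per run is sketched below. The directory layout, file names, and the `persist_run` function are all hypothetical; the protocol above only states that the full prompt, the raw response, and the verikt check output are kept.

```python
import tempfile
from pathlib import Path


def persist_run(root: Path, experiment: str, condition: str, run: int,
                prompt: str, response: str, check_output: str) -> Path:
    """Write one run's artifacts under root/experiment/condition/run-NN/."""
    run_dir = root / experiment / condition / f"run-{run:02d}"
    run_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical file names: one file per artifact, stored verbatim.
    (run_dir / "prompt.txt").write_text(prompt, encoding="utf-8")
    (run_dir / "response.txt").write_text(response, encoding="utf-8")
    (run_dir / "check-output.txt").write_text(check_output, encoding="utf-8")
    return run_dir


if __name__ == "__main__":
    # Placeholder strings stand in for real prompt, response, and check output.
    with tempfile.TemporaryDirectory() as tmp:
        out = persist_run(Path(tmp), "EXP-03", "B", 1,
                          prompt="<full prompt>",
                          response="<raw response>",
                          check_output="<verikt check output>")
        print(sorted(p.name for p in out.iterdir()))
```

Storing the raw text rather than a summary means any run can be re-scored later without calling the agent again.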

Try it yourself: brew install diktahq/tap/verikt && verikt guide — then open your AI agent. Quick start →