EXP-06 — Someone Else's Codebase
The previous experiments all used services the agent built from scratch. This one tests what happens on a real, unfamiliar codebase the agent didn’t build — go-feature-flag, ~15k LOC of production Go.
What the agent produced
EXP-06d (flagship): guide-only vs guide + codebase mapping

Guide only               Guide + codebase mapping
─────────────────────    ──────────────────────────────
4 violations ✗           0 violations ✓ pass
baseline tokens          −67% tokens
baseline turns           −50% turns

Violations by type (guide-only condition)
Dependency violations: present
Architecture violations: present
verikt check: FAIL

Capabilities

Guide only               Guide + mapping
─────────────────────    ──────────────────────────────
rules without context    rules + where things live
agent explores first     agent acts immediately

Metrics (EXP-06d, n=1)
| Metric | Guide only | Guide + mapping |
|---|---|---|
| Violations | 4 | 0 |
| verikt check | fail | pass |
| Token usage | baseline | −67% |
| Agent turns | baseline | −50% |
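
The token and turn figures are reported relative to the guide-only baseline. As a minimal sketch of that arithmetic, the Go snippet below computes a relative change. The raw counts are hypothetical, picked only to reproduce the reported percentages; the source does not publish raw token or turn counts.

```go
package main

import "fmt"

// relativeChange returns the percentage change of treatment relative to
// baseline. A negative result means the treatment used less than the baseline.
func relativeChange(baseline, treatment float64) float64 {
	return (treatment - baseline) / baseline * 100
}

func main() {
	// Hypothetical raw counts, for illustration only (not measured values).
	baselineTurns, mappedTurns := 40.0, 20.0
	baselineTokens, mappedTokens := 300000.0, 99000.0

	fmt.Printf("turns:  %+.0f%%\n", relativeChange(baselineTurns, mappedTurns))
	fmt.Printf("tokens: %+.0f%%\n", relativeChange(baselineTokens, mappedTokens))
}
```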
Finding
The guide tells the agent the rules. The mapping tells it where things live.
Without the map, the agent works in an unfamiliar codebase by exploring — reading files, tracing imports, figuring out the structure before acting. That exploration costs tokens and turns, and the agent still guesses wrong about where to put new code.
With the map, the agent knows the structure before it starts. It acts immediately, and it acts correctly.
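
To make "where things live" concrete, here is one way a codebase mapping could be modeled. This is a sketch under assumptions: the `Mapping` type, the `WhereToPut` method, and the directory names are all hypothetical. They are not the format used in the experiment and not go-feature-flag's real layout.

```go
package main

import (
	"fmt"
	"strings"
)

// Mapping is a hypothetical shape for a codebase mapping: it records which
// directory holds each kind of code, so the agent does not have to discover
// the layout by reading files first.
type Mapping map[string]string // concern -> directory

// WhereToPut returns the directory for a concern (case-insensitive), or
// false when the mapping has no entry for it.
func (m Mapping) WhereToPut(concern string) (string, bool) {
	dir, ok := m[strings.ToLower(concern)]
	return dir, ok
}

func main() {
	// Illustrative entries only; these are not the fixture's real paths.
	m := Mapping{
		"http handler": "internal/api",
		"domain model": "internal/core",
		"storage":      "internal/store",
	}
	if dir, ok := m.WhereToPut("Storage"); ok {
		fmt.Println("new storage code goes in", dir)
	}
}
```

With a lookup like this available up front, placing new code becomes a read from the mapping instead of an exploration task.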
In this run (n=1), the guide alone was not enough for brownfield work: it produced 4 violations and a failing verikt check. Guide + mapping produced no violations and cut token usage by 67% and agent turns by 50%.
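
For readers unfamiliar with the term, a dependency violation is an import that crosses a layer boundary the rules forbid. The sketch below shows the general idea using Go's standard `go/parser` package. It is not how verikt is implemented, and the `forbidden` rule table and the `example.com/app/...` import paths are hypothetical.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// forbidden maps a layer (a hypothetical import-path prefix) to the
// prefixes that code in that layer must not import.
var forbidden = map[string][]string{
	"example.com/app/core": {"example.com/app/api", "example.com/app/store"},
}

// violations parses src as a Go file that belongs to pkgPath and returns
// every import that the forbidden table disallows for that package.
func violations(pkgPath, src string) ([]string, error) {
	f, err := parser.ParseFile(token.NewFileSet(), "file.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var out []string
	for layer, banned := range forbidden {
		if !strings.HasPrefix(pkgPath, layer) {
			continue
		}
		for _, imp := range f.Imports {
			// Import paths are stored as quoted string literals.
			path, err := strconv.Unquote(imp.Path.Value)
			if err != nil {
				return nil, err
			}
			for _, b := range banned {
				if strings.HasPrefix(path, b) {
					out = append(out, path)
				}
			}
		}
	}
	return out, nil
}

func main() {
	src := `package core

import (
	"fmt"
	"example.com/app/store"
)
`
	v, err := violations("example.com/app/core", src)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(v), "violation(s):", v)
}
```

Here the core layer imports the storage layer directly, which the hypothetical rule table disallows, so the check reports one violation.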
Sub-experiments
Four conditions were tested, escalating from stateless to fully agentic:
- EXP-06a: Broad stateless prompt, no tool access — null result (expected)
- EXP-06b: Scoped stateless prompt, no tool access — null result (expected)
- EXP-06c: Agentic with tools, guide vs no-guide
- EXP-06d: Agentic with tools, guide+mapping vs guide-only — flagship result above

Setup:
- Fixture: go-feature-flag v1.30.0 (~15k LOC production Go)
- Agent: claude-sonnet-4-6 with file tools
- Fixture delivery: tool-access (Mode C: agent reads codebase directly)
→ Experiment Methodology — reproduction instructions
→ EXP-07: Does variance reduction hold on feature additions?