
EXP-05 — Does the Prompt Even Matter?

A lazy 3-line prompt with the guide outperformed a detailed requirements spec without it. The guide is the dominant variable — prompt quality is secondary.

B > C: lazy+guide beat thorough+no guide
guide: dominant variable

2×2 design: lazy vs thorough prompt × no guide vs guide. Same inventory service task across all four conditions.

Architecture shape

Condition A (lazy, no guide)        Condition B (lazy, guide)
─────────────────────────           ──────────────────────────────
(root)/                             (root)/
internal/                           adapter/
main.go                             domain/
                                    port/
                                    service/
                                    cmd/

Condition C (thorough, no guide)    Condition D (thorough, guide)
─────────────────────────           ──────────────────────────────
(root)/                             (root)/
internal/                           adapter/
main.go                             domain/
                                    port/
                                    service/
                                    cmd/

Violations

Condition  Prompt    Guide  Hexagonal  Violations  Result
─────────────────────────────────────────────────────────────────
A          lazy      no     No         6           FAIL
B          lazy      yes    Yes        3           FAIL
C          thorough  no     No         7           FAIL
D          thorough  yes    Yes        1           FAIL

Capabilities wired

No-guide conditions (A, C)    Guide conditions (B, D)
─────────────────────────     ──────────────────────────────
(agent's discretion)          http-api    ✓ wired
                              mysql       ✓ wired
                              platform    ✓ wired
                              health      ⚠ missing
                              validation  ⚠ missing

The guide is the dominant variable. Prompt quality is secondary.

Both guide conditions (B, D) produced a hexagonal structure. Both no-guide conditions (A, C) produced a flat layout, even with a thorough prompt that explicitly said “no business logic in HTTP handlers.”
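To make the “no business logic in HTTP handlers” rule concrete, here is a minimal single-file sketch of the split the hexagonal layout implies. It is an illustration, not code from the experiment: the names `InventoryService`, `ProductRepository`, `ErrInvalidProduct`, `memoryRepo` and `postProduct`, and the validation rule itself, are assumptions. In a real project the pieces would live in separate packages (`domain/`, `port/`, `service/`, `adapter/`), and the in-memory repository only stands in for a database adapter.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Product is the domain type (would live in domain/).
type Product struct {
	Name     string `json:"name"`
	SKU      string `json:"sku"`
	Quantity int    `json:"quantity"`
}

// ErrInvalidProduct is a hypothetical domain error used for illustration.
var ErrInvalidProduct = errors.New("invalid product")

// ProductRepository is the outbound port (would live in port/).
type ProductRepository interface {
	Save(p Product) error
}

// InventoryService owns the business rules (would live in service/).
type InventoryService struct {
	repo ProductRepository
}

// AddProduct validates the product and stores it through the port.
func (s *InventoryService) AddProduct(p Product) error {
	if p.Name == "" || p.SKU == "" || p.Quantity < 0 {
		return ErrInvalidProduct
	}
	return s.repo.Save(p)
}

// memoryRepo is a stand-in adapter (would live in adapter/).
type memoryRepo struct{ items map[string]Product }

func (m *memoryRepo) Save(p Product) error {
	m.items[p.SKU] = p
	return nil
}

// handleAddProduct is the HTTP adapter: decode, delegate, map errors to status codes.
// It contains no validation or storage logic of its own.
func handleAddProduct(svc *InventoryService) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var p Product
		if err := json.NewDecoder(r.Body).Decode(&p); err != nil {
			http.Error(w, "bad JSON", http.StatusBadRequest)
			return
		}
		if err := svc.AddProduct(p); err != nil {
			if errors.Is(err, ErrInvalidProduct) {
				http.Error(w, err.Error(), http.StatusUnprocessableEntity)
				return
			}
			http.Error(w, "internal error", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusCreated)
	}
}

// postProduct sends one request through the handler and returns the status code.
func postProduct(body string) int {
	svc := &InventoryService{repo: &memoryRepo{items: map[string]Product{}}}
	req := httptest.NewRequest(http.MethodPost, "/products", strings.NewReader(body))
	rec := httptest.NewRecorder()
	handleAddProduct(svc)(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(postProduct(`{"name":"Widget","sku":"W-1","quantity":5}`)) // 201
	fmt.Println(postProduct(`{"name":"","sku":"W-2","quantity":5}`))       // 422
}
```

In a flat layout the same validation would typically sit inside the handler, which is the kind of coupling the thorough prompt tried to forbid.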

The thorough prompt without the guide (C=7 violations) produced more violations than the lazy prompt without it (A=6). Extra specificity didn’t help — the agent produced a different but equally non-conforming flat structure.

B (lazy+guide, 3 violations) outperformed C (thorough+no-guide, 7 violations) on architecture. Engineers don’t need to write architecture instructions in every prompt. They need the guide loaded.

The original hypothesis, that B≈C (lazy+guide comparable to thorough+no-guide), failed on this run: the gap was 4 violations (C=7 vs B=3), not ≤1. But a stronger finding emerged: the guide dominates regardless of prompt quality.
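The arithmetic behind these claims can be checked with a short sketch that computes the two main effects of the 2×2 design and the B vs C gap from the violation counts in the table. The type and function names are illustrative assumptions. With one run per condition these are descriptive differences only, not statistical estimates.

```go
package main

import "fmt"

// run is one cell of the 2x2 design with its reported violation count.
type run struct {
	thorough   bool
	guide      bool
	violations int
}

// runs copies the Violations table: one run per condition.
var runs = map[string]run{
	"A": {thorough: false, guide: false, violations: 6},
	"B": {thorough: false, guide: true, violations: 3},
	"C": {thorough: true, guide: false, violations: 7},
	"D": {thorough: true, guide: true, violations: 1},
}

// meanWhere averages violations over the runs selected by keep.
func meanWhere(keep func(run) bool) float64 {
	sum, n := 0, 0
	for _, r := range runs {
		if keep(r) {
			sum += r.violations
			n++
		}
	}
	return float64(sum) / float64(n)
}

// guideEffect is mean violations without the guide minus mean with it.
func guideEffect() float64 {
	return meanWhere(func(r run) bool { return !r.guide }) -
		meanWhere(func(r run) bool { return r.guide })
}

// promptEffect is mean violations with the lazy prompt minus mean with the thorough one.
func promptEffect() float64 {
	return meanWhere(func(r run) bool { return !r.thorough }) -
		meanWhere(func(r run) bool { return r.thorough })
}

// hypothesisGap is |B - C|, the quantity the original hypothesis expected to be <= 1.
func hypothesisGap() int {
	d := runs["B"].violations - runs["C"].violations
	if d < 0 {
		d = -d
	}
	return d
}

func main() {
	fmt.Println("guide effect:", guideEffect())   // 4.5
	fmt.Println("prompt effect:", promptEffect()) // 0.5
	fmt.Println("|B - C|:", hypothesisGap())      // 4
}
```

On this single run, loading the guide removes 4.5 violations on average while switching to the thorough prompt removes 0.5, which is the sense in which the guide dominates.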

  • Task: Build an inventory service — products with name, SKU, quantity
  • Lazy prompt: 3 lines — service name, fields, HTTP API
  • Thorough prompt: Full requirements, typed errors, storage behind interface, explicit “no business logic in handlers”
  • Agent: claude-sonnet-4-6
  • Runs: 1 per condition (needs 2 more)
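
For readers unfamiliar with two of the thorough prompt's requirements, typed errors and storage behind an interface, the following is a minimal sketch of what they could look like in Go for this task. It is not taken from the prompt or from any agent's output. `NotFoundError`, `ProductStore`, `memoryStore`, `quantityOf` and `isNotFound` are hypothetical names, and the in-memory map stands in for a real database.

```go
package main

import (
	"errors"
	"fmt"
)

// Product carries the three fields named in the task.
type Product struct {
	Name     string
	SKU      string
	Quantity int
}

// NotFoundError is a typed error that records which SKU was missing.
type NotFoundError struct{ SKU string }

func (e *NotFoundError) Error() string { return "product not found: " + e.SKU }

// ProductStore is the storage interface; callers never see the concrete backend.
type ProductStore interface {
	Get(sku string) (Product, error)
	Put(p Product) error
}

// memoryStore is an illustrative in-memory implementation of ProductStore.
type memoryStore struct{ items map[string]Product }

func newMemoryStore() *memoryStore {
	return &memoryStore{items: map[string]Product{}}
}

func (m *memoryStore) Get(sku string) (Product, error) {
	p, ok := m.items[sku]
	if !ok {
		return Product{}, &NotFoundError{SKU: sku}
	}
	return p, nil
}

func (m *memoryStore) Put(p Product) error {
	m.items[p.SKU] = p
	return nil
}

// quantityOf depends only on the interface, so the backend can be swapped freely.
func quantityOf(store ProductStore, sku string) (int, error) {
	p, err := store.Get(sku)
	if err != nil {
		return 0, fmt.Errorf("quantity lookup: %w", err)
	}
	return p.Quantity, nil
}

// isNotFound reports whether err wraps a *NotFoundError.
func isNotFound(err error) bool {
	var nf *NotFoundError
	return errors.As(err, &nf)
}

func main() {
	store := newMemoryStore()
	_ = store.Put(Product{Name: "Widget", SKU: "W-1", Quantity: 5})

	qty, err := quantityOf(store, "W-1")
	fmt.Println(qty, err) // 5 <nil>

	_, err = quantityOf(store, "missing")
	fmt.Println(isNotFound(err)) // true
}
```

Because the error keeps its type through wrapping, a caller such as an HTTP handler can distinguish "not found" from other failures without inspecting message strings.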

→ Experiment Methodology — reproduction instructions

→ Artifacts on GitHub

→ EXP-06: Does the guide help on someone else’s codebase?