mh-may11-501 lifecycle · best-of-N output vs hand-tuned
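Left: best-of-N=3 generator output, gated by the calibrated audit (composite ≥ 13, no dim < 3, no figurative), where composite = F + C + D + R. Right: hand-tuned v5 reference. 5 of 7 generator outputs shipped; 2 fell back to bento-icon. Average composite: 15.0 for the 5 shipped generator outputs vs 14.9 for all 7 hand-tuned references (14.8 on the same 5 items).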
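The audit gate and fallback described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the `Candidate` class, the `figurative` flag, and the `select` helper are hypothetical names, and the sketch assumes a single round in which the highest-composite passing candidate ships. The thresholds (composite ≥ 13, no dimension below 3) and the bento-icon fallback come from the summary.

```python
from __future__ import annotations
from dataclasses import dataclass

FALLBACK = "bento-icon"  # fallback named in the summary


@dataclass
class Candidate:
    # Per-dimension scores keyed "F", "C", "D", "R" as in the score lines.
    scores: dict[str, int]
    # Hypothetical flag: True if the audit marked the output as figurative.
    figurative: bool = False

    @property
    def composite(self) -> int:
        # Composite is the plain sum of the four dimension scores.
        return sum(self.scores.values())


def passes_audit(c: Candidate, min_composite: int = 13, min_dim: int = 3) -> bool:
    """Apply the three gate conditions from the summary."""
    return (
        c.composite >= min_composite
        and min(c.scores.values()) >= min_dim
        and not c.figurative
    )


def select(candidates: list[Candidate]) -> Candidate | str:
    """Return the best passing candidate, or the fallback if none pass."""
    passing = [c for c in candidates if passes_audit(c)]
    if not passing:
        return FALLBACK
    return max(passing, key=lambda c: c.composite)


# Illustrative N=3 round (scores invented for the example).
round_one = [
    Candidate({"F": 3, "C": 3, "D": 3, "R": 3}),  # composite 12: below threshold
    Candidate({"F": 5, "C": 5, "D": 2, "R": 5}),  # composite 17, but D < 3
    Candidate({"F": 3, "C": 3, "D": 3, "R": 5}),  # composite 14: passes
]
winner = select(round_one)
```

Note that the second candidate has the highest composite but is rejected by the per-dimension floor, which is why the gate checks each dimension and not only the sum.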
Problem · 1 — "Strategy without execution"
Generator
round 2 pass
F3·C3·D3·R5 → composite 14
Hand-tuned
F3·C3·D3·R4 → composite 13
Problem · 2 — "Fractional generalists"
Generator
round 1 pass
F4·C3·D3·R5 → composite 15
Hand-tuned
F4·C3·D4·R5 → composite 16
Problem · 3 — "Junior at the console"
Generator → bento fallback
fallback
best composite 12 (below the 13 threshold); fell back to bento-icon
Hand-tuned
F3·C3·D3·R4 → composite 13
Solution · 1 — "Single owner"
Generator → bento fallback
fallback
best composite 12 (below the 13 threshold); fell back to bento-icon
Hand-tuned
F4·C4·D4·R5 → composite 17 (top of run)
Solution · 2 — "ESP fluent"
Generator
round 1 pass
F4·C4·D3·R4 → composite 15
Hand-tuned
F3·C4·D3·R5 → composite 15
Solution · 3 — "Attribution by week two"
Generator
round 1 pass
F4·C4·D3·R5 → composite 16
Hand-tuned
F4·C3·D4·R4 → composite 15
Differentiation
Generator
round 1 pass
F4·C3·D3·R5 → composite 15
Hand-tuned
F3·C4·D3·R5 → composite 15
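As a quick arithmetic check, the headline averages can be recomputed from the composites listed for the seven items. The sketch below uses only the numbers shown in this comparison; the variable names are illustrative, and `None` marks the two generator fallbacks.

```python
# Composites in listed order: Problem 1-3, Solution 1-3, Differentiation.
generator = [14, 15, None, None, 15, 16, 15]  # None = fell back to bento-icon
hand_tuned = [13, 16, 13, 17, 15, 15, 15]

# Generator average covers shipped outputs only.
shipped = [g for g in generator if g is not None]
gen_avg = sum(shipped) / len(shipped)

# Hand-tuned average over all seven references.
hand_avg = sum(hand_tuned) / len(hand_tuned)

# Hand-tuned average restricted to the items where the generator shipped,
# which gives a like-for-like comparison.
paired = [h for g, h in zip(generator, hand_tuned) if g is not None]
paired_avg = sum(paired) / len(paired)

print(round(gen_avg, 1), round(hand_avg, 1), round(paired_avg, 1))
# prints: 15.0 14.9 14.8
```

The two fallback items are also the ones that include the highest hand-tuned composite (17), so the all-seven hand-tuned average is slightly higher than the like-for-like figure.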