mh-may11-501 lifecycle · best-of-N output vs hand-tuned

Left: best-of-N (N=3) generator output, gated by a calibrated audit (composite ≥ 13, no dim < 3, no figurative). Right: hand-tuned v5 reference. 5 of 7 generator outputs shipped; 2 fell back to bento-icon. Average composite: 15.0 across the 5 shipped generator outputs vs 14.9 across all 7 hand-tuned references.
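The audit gate and the two averages can be reproduced with a short sketch. This is an illustration under assumptions, not the pipeline's actual code: the function names (`parse`, `composite`, `passes_audit`, `select`) are hypothetical, the composite is taken to be the plain sum of the four dimension scores (which matches every row in this report), and `select` assumes the highest-scoring passing candidate wins, whereas the real generator may instead accept the first round that passes. The dimension letters F, C, D, R are used as-is because the report does not expand them.

```python
from statistics import mean

COMPOSITE_MIN = 13  # audit rule: composite >= 13
DIM_MIN = 3         # audit rule: no dimension below 3


def parse(card):
    """Turn a score string such as 'F3·C3·D3·R5' into {'F': 3, 'C': 3, ...}."""
    return {part[0]: int(part[1:]) for part in card.split("·")}


def composite(scores):
    """Assumed definition: the composite is the sum of the dimension scores."""
    return sum(scores.values())


def passes_audit(scores, figurative=False):
    """Apply the three audit conditions quoted in the header."""
    return (
        composite(scores) >= COMPOSITE_MIN
        and min(scores.values()) >= DIM_MIN
        and not figurative
    )


def select(candidates):
    """Hypothetical best-of-N pick: best passing candidate, or None.

    A None result is where the caller would fall back to bento-icon.
    """
    passing = [c for c in candidates if passes_audit(c)]
    return max(passing, key=composite) if passing else None


# Score strings copied from the cards in this report.
generator_shipped = [
    "F3·C3·D3·R5", "F4·C3·D3·R5", "F4·C4·D3·R4",
    "F4·C4·D3·R5", "F4·C3·D3·R5",
]
hand_tuned = [
    "F3·C3·D3·R4", "F4·C3·D4·R5", "F3·C3·D3·R4", "F4·C4·D4·R5",
    "F3·C4·D3·R5", "F4·C3·D4·R4", "F3·C4·D3·R5",
]

gen_avg = mean(composite(parse(c)) for c in generator_shipped)
hand_avg = mean(composite(parse(c)) for c in hand_tuned)
print(f"generator (shipped): {gen_avg:.1f}")  # 15.0
print(f"hand-tuned:          {hand_avg:.1f}")  # 14.9
```

Note that the two averages cover different sets: the generator figure excludes the two cards that fell back, while the hand-tuned figure includes all seven.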

Problem · 1 — "Strategy without execution"

Generator
round 2 pass
F3·C3·D3·R5 → composite 14
Hand-tuned
F3·C3·D3·R4 → composite 13

Problem · 2 — "Fractional generalists"

Generator
round 1 pass
F4·C3·D3·R5 → composite 15
Hand-tuned
F4·C3·D4·R5 → composite 16

Problem · 3 — "Junior at the console"

Generator → bento fallback
fallback
best composite 12 (below the 13 threshold); fell back to bento-icon
Hand-tuned
F3·C3·D3·R4 → composite 13

Solution · 1 — "Single owner"

Generator → bento fallback
fallback
best composite 12 (below the 13 threshold); fell back to bento-icon
Hand-tuned
F4·C4·D4·R5 → composite 17 (top of run)

Solution · 2 — "ESP fluent"

Generator
round 1 pass
F4·C4·D3·R4 → composite 15
Hand-tuned
F3·C4·D3·R5 → composite 15

Solution · 3 — "Attribution by week two"

Generator
round 1 pass
F4·C4·D3·R5 → composite 16
Hand-tuned
F4·C3·D4·R4 → composite 15

Differentiation

Generator
round 1 pass
F4·C3·D3·R5 → composite 15
Hand-tuned
F3·C4·D3·R5 → composite 15