WAC*
Summary
Stancil draws a parallel between how we evaluate AI models (standardized benchmarks), how we evaluate job candidates (interviews and tests), and how we evaluate baseball players (statistics like WAR), arguing that all three domains have learned that standardized tests fail to capture real-world performance. He suggests the real benchmark for any AI product—or employee—is now how much value they add above the baseline of Claude or similar AI tools connected to internal resources, coining the metric 'WAC' (Wins Above Claude).
Key Insight
The real measure of any AI product or knowledge worker is no longer performance on standardized benchmarks but value delivered above the increasingly capable and accessible baseline of general-purpose AI agents connected to your actual tools and context.
Spicy Quotes (click to share)
- 6
How do you create a standardized benchmark when context—the idiosyncratic stuff that is, almost by definition, impossible to standardize—is the critical thing that makes one tool useful and another tool unusable?
- 7
We don't benchmark employees, nor do we benchmark traditional software. Perhaps AI software will soon be the same. And instead, the next question that every product—and maybe every employee?—will have to answer, is, how much are you worth above C*?
- 6
Nobody cared about the scores or about which models performed the best against some arbitrary set of tasks. They cared about how well different products performed against some system that they'd hacked together over the last two weeks, and had integrated with their relevant internal tools and resources.
- 5
Could a harness be a web interface that encourages its users to write better prompts? Could it be a text box with useful suggestions for what to type? Could it just be a text that is tall enough to type in?
Tone
satirical, analytical
