The Word "Test" Has Eaten Engineering

Three different things now wear the same name, and we are losing the difference between them
David Proctor
Apr 28, 2026
In 2012, an engineer at Microsoft Bing got bored of waiting. A colleague had pitched a tiny tweak to how ad headlines were displayed. The idea sat in a backlog for months, too small to argue about, too speculative to ship. So the engineer did the lazy thing: he ran an A/B test. Within hours the new headline variation was producing abnormally high revenue, triggering a "too good to be true" alert. An analysis showed that the change had increased revenue by an astonishing 12%, which on an annual basis would come to more than $100 million in the United States alone, without hurting key user-experience metrics.
The story is told inside Microsoft as a parable about the power of experimentation. I read it as something else: the moment the word "test" quietly stopped meaning what it used to.
Once upon a time, a "test" was a thing you ran before you shipped. You wrote some code, you wrote some assertions about that code, you ran the assertions, and a green checkmark gave you permission to merge. Today, when an engineer says the word "test," they could mean any of three completely different activities — with completely different epistemologies, completely different failure modes, and completely different ethical implications.
Three Things, One Word
1. Verification
The classic unit test. A piece of code that asserts another piece of code does what its author intended. Cheap, deterministic, repeatable. This is what your CS professor meant.
2. Experiment
The A/B test — not a test of code at all, but a test of users. You ship two variants and let randomness decide which one survives. Microsoft, Amazon, Booking.com, Facebook, and Google each conduct more than 10,000 online controlled experiments annually.
3. Observation
Testing as a property of production itself. Feature flags, canary deploys, high-cardinality telemetry. Continuous, instrumented, observation-as-verification. The newest meaning, and the most interesting.
These three things are not variants of the same idea. They are different ideas wearing the same hat.
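To make the first two hats concrete, here is a minimal Python sketch. The function names and the sample numbers are invented for illustration and come from none of the companies mentioned here. The first half is a verification: a deterministic assertion about code. The second half is an experiment: a two-proportion z-test on what users did, whose answer is a probability rather than a pass or a fail.

```python
import math


def format_headline(title: str, first_line: str) -> str:
    """Hypothetical code under test: join an ad title and its first line."""
    return f"{title.strip()} - {first_line.strip()}"


def test_format_headline() -> None:
    # Verification: same input, same answer, on every run.
    assert format_headline(" Cheap flights ", "Book today") == "Cheap flights - Book today"


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Experiment: z statistic and two-sided p-value for B's conversion rate vs A's."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


test_format_headline()

# Made-up traffic: 20,000 users per arm, 1,000 vs 1,120 conversions.
z, p = two_proportion_z(conv_a=1000, n_a=20000, conv_b=1120, n_b=20000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The assertion either holds or it does not. The z-test only says how surprising the difference would be if the two variants were really the same, and it says nothing about whether the code behind either variant is correct.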
Everyone is Testing on You, All the Time
Representatives with experience in large-scale experimentation from thirteen different organizations — Airbnb, Amazon, Booking.com, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, Yandex, and Stanford University — were invited to the first Practical Online Controlled Experiments Summit. Together these organizations tested more than one hundred thousand experiment treatments last year.
100K+ experiments per year, across 13 major tech organizations combined
25K Booking.com tests per year, roughly 70 experiments per day on live users
10K+ experiments per company annually: Microsoft, Amazon, Google, and Facebook each exceed this threshold
This is sold as a triumph of empiricism, and in narrow ways it is. Jeff Bezos has spent two decades evangelizing the idea that "Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day."
But notice what an A/B "test" is actually testing. It is not testing whether the code works. It is testing whether a particular metric (clicks, conversions, time-on-page) moves up. The thing being verified is not correctness; it is the desirability of an outcome the company has chosen to optimize for. Booking.com at least knows this, which is why it maintains guardrail metrics such as cancellation rates, customer service contact rates, payment failures, refund friction, load times, and accessibility. Most companies running experiments do not have those guardrails. They have a primary metric and a stopwatch.
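As a rough sketch of what a guardrail adds to the decision, the Python below refuses to ship a variant whose primary metric improved if any guardrail metric got worse by more than an allowed margin. The metric names echo the list above. The `Guardrail` class, the thresholds, and the sample numbers are assumptions for illustration and do not describe Booking.com's actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    name: str
    max_relative_increase: float  # e.g. 0.02 lets the metric rise by at most 2%


def decide(primary_lift: float, control: dict[str, float], variant: dict[str, float],
           guardrails: list[Guardrail]) -> tuple[bool, list[str]]:
    """Return (ship?, breached guardrail names). Every metric here is 'lower is better'."""
    breached = []
    for g in guardrails:
        change = (variant[g.name] - control[g.name]) / control[g.name]
        if change > g.max_relative_increase:
            breached.append(g.name)
    return primary_lift > 0 and not breached, breached


# Illustrative thresholds and measurements, not real data.
guardrails = [
    Guardrail("cancellation_rate", 0.02),
    Guardrail("support_contact_rate", 0.02),
    Guardrail("page_load_ms", 0.05),
]
control = {"cancellation_rate": 0.100, "support_contact_rate": 0.030, "page_load_ms": 800.0}
variant = {"cancellation_rate": 0.112, "support_contact_rate": 0.030, "page_load_ms": 810.0}

ship, breached = decide(primary_lift=0.04, control=control, variant=variant,
                        guardrails=guardrails)
print(ship, breached)
```

In this made-up run the variant lifts the primary metric by 4% but pushes cancellations up by 12%, so the decision is not to ship. A real platform would presumably also test whether each guardrail movement is statistically meaningful instead of comparing raw point estimates.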
When you call this a "test," you are borrowing the moral authority of a unit test, the implication that something has been verified, and applying it to a process whose actual question is "did more people click?" These are not the same question.
The QA Role Didn't Die. It Got Rewritten.
Meanwhile, the original meaning of "test" — the boring, deterministic, pre-production kind — is going through its own quiet revolution. The dedicated QA department, the one whose entire job was to keep bad code from reaching users, is being dismantled.
The Indeed Story
In March 2023, Indeed laid off 15% of its 14,000-person staff. The layoffs resulted in the elimination of the QA role — most QA engineers were let go. Those not shown the door were given 6 months to pass an interview and transition to an SWE position.
What happened next is the depressingly predictable part: an anonymous software engineer told Gergely Orosz that since the layoffs, "The overall quality of tests has nosedived."
Industry-Wide Shifts
43% decline in manual testing since 2023
17% growth in QA engineering from 2023 to 2025
64% AI adoption in QA (actively using or building a roadmap)
The cynical reading is that companies used the AI moment as cover for cost-cutting they wanted to do anyway. A QA team of 10 could easily cost $1m per year. The more interesting reading is that the role of testing has migrated up the stack. It has stopped being a department and started being a property of the deploy pipeline, and increasingly, of the running system itself.
Where 'Test' Lives Now
"The only environment that matters is production. For the good of humanity, ditch the rest." (Charity Majors, CEO, Honeycomb)
It is a heresy, and it is also mostly true. If you run a distributed system in 2025, the staging environment is a fond fiction. The interesting failures — cardinality explosions, slow p99 drift, the photo that loads for 99.7% of users but not for the ones in São Paulo on Android 11 — are not catchable in any reasonable approximation of production.
So the third meaning of "test" is born: continuous, instrumented, observation-as-verification. Feature flags. Canary deploys. High-cardinality telemetry. "I hope that we look back in a couple years at the bad old days when we used to just ship code and wait to get paged."
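For readers who have not seen this third kind up close, here is a small Python sketch of its two moving parts: a percentage rollout behind a feature flag, and a canary check that compares observed error rates. The flag name, the thresholds, and the traffic numbers are hypothetical, and no particular vendor's API is implied.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: float) -> bool:
    """Deterministically place a user in the first `percent` of 10,000 hash buckets."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000
    return bucket < percent * 100


def canary_verdict(baseline_errors: int, baseline_requests: int,
                   canary_errors: int, canary_requests: int,
                   max_ratio: float = 1.5, min_requests: int = 500) -> str:
    """Compare observed error rates; the thresholds are illustrative assumptions."""
    if canary_requests < min_requests:
        return "wait"  # not enough traffic observed yet
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    if canary_rate > baseline_rate * max_ratio:
        return "rollback"
    return "promote"


# Expose roughly 5% of users to the flagged code path.
users = [f"user-{i}" for i in range(10_000)]
exposed = sum(in_rollout("new-photo-loader", u, percent=5.0) for u in users)
print(f"{exposed} of {len(users)} users see the flagged code path")

# Made-up telemetry: 0.2% baseline error rate vs 0.9% on the canary.
print(canary_verdict(baseline_errors=40, baseline_requests=20_000,
                     canary_errors=9, canary_requests=1_000))
```

Hashing the flag name together with the user ID keeps each user on the same side of the flag across requests, which is what makes the resulting telemetry comparable. Nothing in this sketch asserts that the code is correct. It only watches what happens and reacts.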
The Cost of Conflation
When "we tested it" can mean "the unit tests passed" *or* "the A/B variant won" *or* "we shipped it behind a flag and watched the dashboards," then "we tested it" means almost nothing. It is a phrase you use to end a meeting.
The fix is not nostalgia. It is not bringing back the dedicated QA department. It is, more modestly, a discipline of language. When you say "test," say which one. When you read "we A/B tested it," ask what the guardrail metrics were. When someone tells you they "test in production," ask who reads the dashboards at 3 a.m. and what they do when they see something weird.
Two Takeaways
For Engineers
Stop using "test" as an unmodified noun. Verification, experiment, observation: pick one. The word "test" on its own is now ambient noise.
For Everyone Else
The next time a product tells you it has been "thoroughly tested," remember that you may be the test. That is sometimes fine. It is sometimes not. The difference is worth a question.
Sources
References and further reading for The Word "Test" Has Eaten Engineering by David Proctor.
[1] The QA role is changing, but QA will never die
Qase (2024). https://qase.io/blog/the-qa-role-is-changing/
[2] Why QA Engineers Are Reinventing Themselves in 2025
Coders Stop / Medium (2025). https://medium.com/@coders.stop/why-qa-engineers-are-reinventing-themselves-in-2025-dcc1d67b117f
[3] Quality Assurance Across the Tech Industry
The Pragmatic Engineer (Gergely Orosz). https://newsletter.pragmaticengineer.com/p/qa-across-tech
[4] Will AI Replace QA Engineers?
Momentic (citing 2024/25 World Quality Report). https://momentic.ai/blog/will-ai-replace-qa-engineers
[5] Honeycomb's Charity Majors: Go Ahead, Test in Production
The New Stack. https://thenewstack.io/honeycombs-charity-majors-go-ahead-test-in-production/
[6] Test in production: Yes, you can (and you should)
O'Reilly Velocity / Charity Majors. https://conferences.oreilly.com/velocity/vl-ca/public/schedule/detail/75076.html
[7] Test in Production With Charity Majors
CoRecursive Podcast. https://corecursive.com/019-test-in-production-with-charity-majors/
[8] ExP Platform — Accelerating Innovation through Trustworthy Experimentation
Microsoft / Ron Kohavi et al. https://exp-platform.com/
[9] How A/B Testing Helps Microsoft and Why You Should Consider It Too
Rhys Kilian / Medium. https://medium.com/@rtkilian/how-a-b-testing-helps-microsoft-and-why-you-should-consider-it-too-c975f2922ffe
[10] How Many A/B Tests Should You Run Per Month?
Convert.com. https://www.convert.com/blog/a-b-testing/how-many-ab-tests-can-you-run/
[11] The Rise of Experimentation as The New Industry Standard
Statsig. https://www.statsig.com/blog/the-rise-of-experimentation
[12] Scaling A/B Testing: Inside Booking.com, Netflix & Microsoft's Experimentation
Venue Cloud. https://venue.cloud/news/insights/scaling-a-b-testing-inside-booking-com-netflix-microsoft-s-experimentation
[13] The Surprising Power of Online Experiments
Harvard Business Review (Kohavi & Thomke, 2017). https://hbr.org/2017/09/the-surprising-power-of-online-experiments