
Scott Liewehr is Global Vice President of Market Strategy & Growth at Sitecore and a CMS Critic contributor.
In my last piece for CMS Critic, I introduced the H2A→A2H framework and argued that the audience for your content has fundamentally changed now that LLMs are in the mix. AI now sits on both sides of the equation: one AI helps you create, and a second AI decides whether your brand shows up at all.
If you haven’t read it, the short version is this: you’re writing for the wrong audience. The invisible editor (the LLM) that mediates between your brand and your buyer rewards substance, structure, authority, and specificity, and ignores everything else.
This piece is about what happens next. Specifically, it’s about one of the most sacred practices in digital marketing and why I think it’s about to collapse under the weight of its own logic.
I’m talking about A/B testing.
Here’s the argument: if the primary consumer of your content is now an LLM, why are you still testing it on humans?
The entire experimentation infrastructure that marketing has built over the past 15 years assumes that human behavior is the metric that matters. You publish multiple versions of a page, split your traffic, measure which one earns more clicks, more time on page, more conversions. The humans tell you which version wins.
That makes perfect sense in a human-to-human world, when a human creates content and another human consumes it. It even makes sense in an H2A2H world, where AI helps the marketer create or scale the content, but the audience is still human. But in an H2A→A2H world, where an LLM is deciding whether your content even reaches the human on the other end, you’ve got a measurement problem. You’re measuring human reactions to determine what works, but the LLM is the one separating signal from noise before any human sees it.
Think about it this way. When a buyer asks an LLM to recommend a product, the model assembles an answer by evaluating sources across a range of factors, including the substance, structure, authority, and specificity of your brand content. It decides which brands to cite and which to ignore. If your brand doesn’t pass that filter, no human ever sees it. It doesn’t matter which headline variant tested better with your website visitors if the invisible editor never selected your brand in the first place.
The experimentation question has to change. The old question was: which version does my human audience prefer? The new question is: which version does the invisible editor cite, and does it carry my brand position faithfully when synthesized into an answer? Those are fundamentally different questions, and traditional A/B testing was never designed to ask the second one.
I said in my last article that in an H2A→A2H reality, the list doesn’t exist. There is only one single, synthesized answer. If you’re still optimizing for a list by testing on the humans reading it, you’re solving last decade’s problem.
There’s a secondary argument here that I think is worth sitting with.
When you build three versions of a landing page for multivariate testing, you believe in all three. The test is how you discover which performs best. But the discovery comes at a cost: two-thirds of your audience bounces, fails to convert, or disengages entirely. Those are missed opportunities: real prospects who encountered the weaker versions and moved on. The learning is valuable, but the tuition of missed pipeline is expensive.
That cost would be easier to absorb if the learning were quick. A/B and multivariate testing demand statistical significance, which demands large sample sizes. For brands with massive traffic, this can happen reasonably quickly. But for the mid-market company with 15,000 monthly visitors? Running a rigorous test is a struggle. The sample sizes are too small, the test duration stretches into weeks or months, and by the time you have a statistically valid result, the market context may have shifted. For a meaningful number of brands, traditional experimentation has always been more aspirational than practical, and those brands that can’t reach significance are left guessing.
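To put rough numbers on that, here’s a back-of-the-envelope calculation using the standard two-proportion sample-size approximation (two-sided test, α = 0.05, 80% power). The 3% baseline conversion rate and 20% relative lift are illustrative assumptions, not benchmarks:

```python
from math import ceil

# Per-variant sample size for a two-sided two-proportion z-test at
# alpha = 0.05 with 80% power (z_alpha/2 = 1.96, z_beta = 0.84).
Z_ALPHA, Z_BETA = 1.96, 0.84

def visitors_per_variant(baseline: float, relative_lift: float) -> int:
    p1, p2 = baseline, baseline * (1 + relative_lift)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    delta = p2 - p1
    return ceil((Z_ALPHA + Z_BETA) ** 2 * variance_sum / delta ** 2)

# Illustrative: 3% baseline conversion, detecting a 20% relative lift.
n = visitors_per_variant(baseline=0.03, relative_lift=0.20)
print(f"~{n:,} visitors per variant, ~{2 * n:,} total")
# Roughly 13.9k per variant (~28k total): with 15,000 monthly visitors
# split across two variants, that's about two months before the test
# can conclude, assuming every visitor even enters the test.
```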
The opportunity cost compounds on both ends. Large-audience brands pay a “learning tax” across a lot of visitors. Small-audience brands can’t afford to run the test at all. Neither situation is great, and both are artifacts of a model that requires real people as its crash-test dummies.
In an AI-native model, both problems disappear. You generate hundreds of content variations using AI, then test every one against thousands of simulated buyer personas – AI agents that behave like real buyers with real intent, real objections, and real decision criteria – before a single real person ever sees any of it. The AI generates. The AI evaluates. The AI selects. And your entire audience gets the optimal version from the first impression.
No humans interacted with the underperforming version. The mid-market brand with limited traffic runs the same caliber of test as the enterprise with millions of visitors. The learning happened upstream, in simulation, at machine speed, and prior to publishing.
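To make the shape of that workflow concrete, here’s a minimal sketch of the generate-evaluate-select loop. Everything in it (generate_variations, Persona, simulate_response) is a hypothetical placeholder standing in for a real content intelligence layer, not any vendor’s actual API:

```python
from dataclasses import dataclass
import random

@dataclass
class Persona:
    segment: str          # e.g. "mid-market CFO"
    intent: str           # e.g. "comparing CMS vendors"
    objections: list      # decision criteria the agent pushes back with

def generate_variations(brief: str, n: int) -> list:
    # Placeholder: in practice an LLM drafts n variants from the brief.
    return [f"{brief} (variant {i})" for i in range(n)]

def simulate_response(persona: Persona, content: str) -> float:
    # Placeholder: a simulated buyer scores the content from 0 to 1.
    # A real system would run the persona as an LLM agent with the
    # segment, intent, and objections baked into its instructions.
    return random.random()

def select_winner(brief: str, personas: list, n_variants: int = 200) -> str:
    variants = generate_variations(brief, n_variants)
    # Score every variant against every simulated persona, then ship
    # only the best average performer -- all before publishing.
    return max(
        variants,
        key=lambda v: sum(simulate_response(p, v) for p in personas) / len(personas),
    )
```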
The experimentation paradigm inverts: you’re testing for your audience, not on them.
At this point, a reasonable marketer might think: “Okay, but don’t I still need experimentation infrastructure to evaluate all those variations? You’ve moved the testing upstream, but someone still has to figure out which version wins.”
It’s a fair instinct, and it’s rooted in how we’ve thought about experimentation for years. But I think it reflects an assumption that doesn’t hold in an AI-native workflow.
When the AI system that generated 200 variations can also evaluate them against simulated buyer personas, score them for likely performance, and select the winner, the “experimentation” has collapsed into the content intelligence layer itself. It becomes a function of the orchestration system rather than a standalone capability. You don’t need a dedicated experimentation product sitting between your CMS and your audience when the intelligence that created the content can also judge it.
There’s also a scope issue. Traditional experimentation asks a narrow question: Which version gets more clicks, more conversions, more engagement? Those are optimization questions within a fixed frame. The AI-native model lets you question the frame itself. You can ask: Does this content earn citations from the invisible editor? Does it hold up when a buyer asks a follow-up question and the AI goes back to the well? Does it carry my brand position faithfully across different query contexts?
That’s a different category of evaluation entirely, and it requires content intelligence, not experimentation tooling.
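For the curious, here’s what a harness for those questions might look like in miniature. The ask_llm function, brand name, and queries are all hypothetical placeholders; the point is the shape of the evaluation, not the implementation:

```python
# Hypothetical harness for the new evaluation questions: does the content
# earn a citation, and does the citation survive a follow-up question?
BRAND = "AcmeCMS"
QUERIES = [
    "What's the best CMS for a mid-market retailer?",
    "Which platforms handle headless content delivery well?",
]
FOLLOW_UP = "Why that one over the alternatives?"

def ask_llm(prompt, history=None):
    # Placeholder: wire this to whichever model API you actually use.
    raise NotImplementedError

def citation_rate(brand, queries):
    held_up = 0
    for q in queries:
        answer = ask_llm(q)
        if brand.lower() in answer.lower():
            # Crude substring check; a real harness would also verify the
            # brand position is represented faithfully, not merely named.
            follow = ask_llm(FOLLOW_UP, history=[q, answer])
            held_up += brand.lower() in follow.lower()
    return held_up / len(queries)
```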
If you’re running a traditional experimentation program today, I’m not suggesting you shut it off tomorrow. There’s still value in testing live experiences for behavioral signals that simulated personas can’t fully replicate, at least for now. Human beings are unpredictable (I have teenagers; I can confirm this), and the gap between simulated behavior and real behavior is still real.
But the trajectory is clear, and the planning window is shorter than you might think. Here’s where I’d focus: start measuring whether your content earns citations from the invisible editor, not just how humans behave on the page; pilot simulated-persona evaluation upstream of publishing, alongside your live tests rather than instead of them; and take a hard look at whether a standalone experimentation product still earns its keep once your content intelligence layer can generate, score, and select.
This connects back to the H2A→A2H framework in a way that I think matters. The old experimentation model was built for an H2H world: a world where you published content for humans, tested it on humans, and optimized it for humans. The H2A→A2H world adds a new constituency that sits upstream of your human audience and makes editorial decisions about whether they see your content at all. Experimentation has to evolve to serve both.
The brands that figure this out won’t just have better conversion rates. They’ll stop paying the learning tax on their audience, they’ll unlock experimentation capabilities that were previously gated by traffic volume, and they’ll start optimizing for the audience that actually determines their visibility.
The crash-test dummies are about to be retired. And honestly, it’s about time.
May 20, 2026 – Amsterdam, Netherlands
Sponsored by Kontent.ai, RAISE Amsterdam brings together bold marketing and technology leaders to explore how agentic AI is reshaping the way ambitious brands create and deliver content, and what it takes to use it safely, responsibly, and with proper governance. Experience the ideas that are shaping the future and dive into big thinking, innovative strategies, and expert insights. Held at the Klein Canvas, Volkshotel, you'll hear from world-class speakers in dynamic sessions that will reveal how AI-powered content operations are accelerating production, improving governance, and driving creative impact. Limited spots are available, so book yours today.

June 10–11, 2026 – Copenhagen, Denmark
Join us in Copenhagen (or online) for the biggest Umbraco conference in the world – two full days of learning, genuine conversations, and the kind of inspiration that brings business leaders, developers, and digital creators together. Codegarden 2026 is packed with both business and tech content, from deep-dive workshops and advanced sessions to real-world case studies and strategy talks. You’ll leave with ideas, strategies, and knowledge you can put into practice immediately. Book your tickets today.
August 5–6, 2026 – Montreal, Canada
The best conferences create space for honest, experience-based conversations. Not sales pitches. Not hype. Just thoughtful exchanges between people who spend their days designing, building, running, and evolving digital experiences. CMS Connect brings together people who share real stories from their work and platforms and who are interested in learning from each other on how to make things better. Over two days in Montreal, you can expect practitioner-led talks grounded in experience, conversations about trade-offs, constraints, and decisions, and time to compare notes with peers facing similar challenges. Space is limited for this exclusive event, so book your seats today.

October 20–21, 2026 – Utrecht, Netherlands
Join us for the first annual edition of our prestigious international conference dedicated to making open source CMS better. This event is already being called the “missing gathering place” for the open source CMS community – an international conference with confirmed participants from Europe and North America. Be part of a friendly mix of digital leaders from notable open source CMS projects, agencies, and even a few industry analysts who get together to learn, network, and talk about what really matters when it comes to creating better open source CMS projects, now and for the foreseeable future. Book your tickets today.

September 30–October 1, 2026 – Amsterdam / October 27–28, 2026 – Chicago
Contentstack’s annual customer conference is the premier event for executives, marketing leaders, and developers to redefine their digital experience strategy. This is your opportunity to step out of the "status quo" and into "elite" status, learning exactly how the world’s most successful brands are using the technology you already own to do the impossible. Enjoy a full day of interactive workshops, certifications, and inspirational on-stage sessions designed to help you become an expert on cutting-edge digital strategies and how to turn Contentstack's CMS and adaptive personalization tools into your greatest competitive advantage. Book your seats today.