Introduction
Have you ever sat in a meeting where someone asked "So what's the actual revenue impact of all these experiments?"
Sure, your win rates look impressive on slides. But translating testing success into revenue impact? That's where most experimentation programs hit a wall because:
- Programs get stuck in the "win rate trap": "Your win rates look good, but the revenue impact isn't showing up."
- Complex experiments feel risky: "What if we invest resources and it doesn't pay off?"
- Scaling seems overwhelming: "How do top programs run 400+ experiments while we struggle with 30?"
- Getting buy-in is a constant battle: "Leadership wants results, but won't commit resources..."
Like you, we are convinced your testing program can deliver more value.
However, is your program even ready to scale?
- Are you measuring what matters?
- Is your infrastructure ready for more tests?
- Do you have real cross-team buy-in?
Try the full Digital Maturity Assessment to get your detailed readiness score.
The path forward: Our analysis of 127,000 experiments reveals that the gap between good and great experimentation programs isn't about tools or talent; it's about approach.
While quick wins and test velocity matter when starting out, mature programs need to move beyond simple win rates toward true program quality.
In this guide, we'll show you how mature programs do it: the numbers behind their experimentation programs, the metrics they use, and how you can scale too.
Ensure you're setting the right metrics
Here’s the truth about experimentation: Not all metrics are created equal.
Over 90% of experiments target 5 common metrics:
- CTA Clicks (34.8% of experiments, 1.4% impact)
- Revenue (28.2% of experiments, 0.4% impact)
- Checkout (16.2% of experiments, 0.7% impact)
- Registration (9.4% of experiments, 2.0% impact)
- Add-to-cart (4.2% of experiments, 1.2% impact)
However, the data shows that 3 of those top 5 metrics have relatively low expected impact.
Could you accidentally be ignoring metrics that can make a difference? For example, search optimization shows a 2.3% expected impact but is used in only 1.3% of experiments.
Effective experimentation metrics should be predictive, actionable, and holistic - capturing not just immediate results, but potential long-term value, customer behavior shifts, and strategic learning opportunities.
Here’s how to build your metrics framework:
1. Define your metric hierarchy
- Primary metrics: Direct revenue impact (conversion rate, AOV, revenue per visitor)
- Secondary metrics: Supporting indicators (add-to-cart rate, engagement time)
- Guardrail metrics: Protect user experience (page load time, error rates)
- Journey metrics: Track full customer path (cross-page behavior, return rates)
2. Map to customer journey stages
- Awareness: Time on site, page views, bounce rate
- Consideration: Product views, search usage, calculator interactions
- Decision: Cart adds, checkout starts, purchases
- Retention: Repeat visits, account logins, reorder rates
3. Set up proper measurement
- Configure metrics in the Experimentation dashboard
- Set up event tracking
- Establish statistical thresholds (95% confidence recommended)
- Define minimum sample size requirements
4. Align with business goals
- Revenue metrics: AOV, LTV, revenue per visitor
- Efficiency metrics: Conversion rate, bounce rate
- Experience metrics: Satisfaction score, support tickets
- Strategic metrics: Market share, competitive position
Review and adjust your metrics quarterly. As your program matures, shift focus from surface metrics (like button clicks) to compound metrics that reveal deeper insights (like lifetime value impact).
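To make the "minimum sample size" step above concrete, here is a minimal sketch of one common way to estimate it: the standard two-proportion approximation for a conversion-rate test. The 95% confidence level comes from the recommendation above; the 80% power default, the function name, and the example baseline and lift are assumptions for illustration, not settings from any specific experimentation platform.

```python
import math
from statistics import NormalDist


def min_sample_size_per_variation(baseline_rate, relative_lift,
                                  confidence=0.95, power=0.80):
    """Approximate visitors needed per variation for a two-sided test.

    baseline_rate: current conversion rate (e.g. 0.05 for 5%).
    relative_lift: smallest relative improvement worth detecting (e.g. 0.10).
    power: chance of detecting that lift if it is real (0.80 is an assumption).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    # z-scores for the chosen confidence level and statistical power
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Combined variance of the two conversion rates
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Hypothetical example: 5% baseline conversion, 10% relative lift
visitors_needed = min_sample_size_per_variation(0.05, 0.10)
print(visitors_needed)
```

With these example inputs the estimate lands at roughly 31,000 visitors per variation, which is why small expected lifts on low-traffic pages often need long test durations.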
More on how to choose experimentation metrics
But measuring impact is only half the battle. You need the right analytics infrastructure to turn these metrics into actionable insights.
The role of analytics
Analytics isn't just about interrogating data. It's about using data to support critical thinking.
However, most businesses can't scale because they can't keep their experimentation data in-house. Companies using warehouse-native analytics drive better experiment performance.
Here's how:
- Test directly against your warehouse metrics, from revenue to lifetime value, without complex data pipelines.
- Generate instant cohort insights without writing complex queries for every analysis.
- Run experiments across web, email, and CRM using one unified Stats Engine.
- Keep your sensitive data exactly where it belongs - in your warehouse.
- One source of truth means teams can focus on insights, not reconciling reports.
With metrics and analytics in place, you can focus on scaling your testing velocity.
Vijay Ganesan, VP of Software Engineering, discusses what warehouse-native analytics actually means.
Increase testing velocity
The median company runs 34 experiments per year. The top 3% run over 500. To be in the top 10%, you need to be running 200 experiments annually.
But here's what most articles won't tell you: Running more tests isn't enough. You need the right team and infrastructure to handle that velocity.
Why most programs hit a wall
- Never having enough developer resources
- Poor test prioritization
- No standardized process
- Insufficient QA
- Siloed teams
You can break through the wall by protecting your developer resources, because you will never have enough of them to build everything you want.
The highest expected impact occurs at 1-10 annual tests per engineer. Move beyond 30 and the expected impact drops by 87%.
Volume at the cost of quality can harm performance and the expected impact of your experiments.
We suggest one experiment per developer per two-week sprint. Score every test idea from 1-10 on Potential (revenue impact), Importance (strategic value), and Ease (resource needs). Multiply the scores to prioritize your roadmap.
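The scoring approach above can be sketched in a few lines. This is an illustrative example only: the `Idea` class, the `prioritize` helper, and the backlog entries are hypothetical names and numbers, not part of any product.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    potential: int   # revenue impact, 1-10
    importance: int  # strategic value, 1-10
    ease: int        # resource needs, 1-10 (higher = easier)

    def score(self) -> int:
        # Each dimension is scored 1-10, then the three are multiplied
        for value in (self.potential, self.importance, self.ease):
            if not 1 <= value <= 10:
                raise ValueError(f"scores must be between 1 and 10, got {value}")
        return self.potential * self.importance * self.ease


def prioritize(ideas):
    """Return ideas sorted from highest to lowest combined score."""
    return sorted(ideas, key=lambda idea: idea.score(), reverse=True)


# Hypothetical backlog
backlog = [
    Idea("Homepage hero", potential=4, importance=4, ease=8),    # 128
    Idea("Search redesign", potential=9, importance=8, ease=3),  # 216
    Idea("Checkout copy", potential=6, importance=5, ease=9),    # 270
]

for idea in prioritize(backlog):
    print(idea.name, idea.score())
```

Multiplying rather than adding means a very low score on any one dimension drags the whole idea down, so a high-potential test that is nearly impossible to build does not automatically top the roadmap.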
Next, start building your core team with an Experimentation Lead, Product Manager, and Developer. Then add specialists (Marketing, Design, Data Science) as your program grows.
Run more complex experiments
The highest-performing experiments make substantial code changes and test multiple variations simultaneously.
As per the data:
- 77% of experiments test only two variations (A/B)
- Yet tests with 4+ variations are 2.4x more likely to win
- They also deliver 27.4% higher uplifts
Great experiments boldly enhance user experience and stay open to various paths.
When scaling, the focus needs to shift to not just velocity, but also bigger changes and impact.
Here are the types of tests advanced experimentation programs are running:
- Multivariate testing: Test multiple variables simultaneously
- Server-side testing: More powerful, more flexible, more secure
- Feature flags: Safe releases and gradual rollouts
- Multi-armed bandit: Automated optimization at scale
Top programs are also using a framework to build and implement complex experiments.
1. Strategic foundation
- Start with server-side infrastructure
- Build feature flagging capabilities
- Establish clear success metrics
2. Smart design
- Test multiple variations (tests with 4+ variations are 2.4x more likely to win)
- Consider interaction effects
- Plan progressive rollouts
3. Risk management
- Monitor system performance
- Create clear rollback procedures
- Document learning patterns
Start with 20% of traffic and gradually increase based on monitoring results. Top programs typically reach full traffic within 2-3 deployment cycles.
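As a rough illustration of that ramp-up, here is a minimal sketch of a rollout rule. The 20% starting point comes from the guidance above; the intermediate 50% step, the rollback-to-zero behavior, and the names `ROLLOUT_STEPS` and `next_traffic_percent` are assumptions chosen for the example.

```python
# Assumed ramp: 20% -> 50% -> 100%, i.e. full traffic after two increases
ROLLOUT_STEPS = (20, 50, 100)


def next_traffic_percent(current_percent, guardrails_healthy):
    """Decide the traffic share for the next deployment cycle.

    guardrails_healthy: True if monitoring (load time, error rates, etc.)
    shows no regression during the current cycle.
    """
    if not guardrails_healthy:
        # Roll back completely when monitoring flags a problem
        return 0
    for step in ROLLOUT_STEPS:
        if step > current_percent:
            return step
    # Already at full traffic
    return ROLLOUT_STEPS[-1]


print(next_traffic_percent(0, True))    # 20
print(next_traffic_percent(20, True))   # 50
print(next_traffic_percent(50, True))   # 100
print(next_traffic_percent(50, False))  # 0
```

In practice the health signal would come from your guardrail metrics, and the step sizes would depend on how much traffic you need at each stage to detect problems.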
Personalization in motion
One-size-fits-all is no longer a viable digital marketing approach. You can't just push the same website recommendations to a broad audience.
Ask yourself: Does your experience even resonate with your audience?
Still, most digital businesses avoid personalization due to resource constraints, uncertainty about customer preferences, and the complexity of implementing tailored experiences.
Half of all experiments today use a personalization strategy. Personalized experiences generate 41% higher impact than general experiences.
When personalizing, keep in mind:
- WHAT: The change you want to make in the default digital experience
- WHO: The specific user or group you want to deliver it to
- WHY: Whether the change meets the original objective/goal
- WHERE: The platform you’ll use to create a personalized experience
Personalization examples:
- Send targeted offers to shoppers for their favorite products based on browsing behavior.
- Offer travel promotions for different locations based on the current weather or season.
- Show video content to viewers based on where they live and what they search for.
It's all about creating customer journeys that provide a comprehensive view from the customers' perspective.
Here's Nicola Ayan, VP, Solution Strategy, discussing where to get started with personalization.
Build a strong culture of experimentation
2025 is about proving your revenue impact to stakeholders
Running an experimentation program isn't just about launching tests; it's about building a culture that delivers real business impact.
1. Connect experiments to business impact
Every experiment should be traced back to business value. Build your goal tree to:
- Align teams on key metrics
- Show clear impact paths
- Guide experiment prioritization
- Track program success
2. Clear roles and ownership
Agreeing on roles and responsibilities is important. For this, you can use the RASCI model.
- R = Responsible = Owns the stage’s completion
- A = Accountable = The person "R" answers to, who is on the hook for the success of the stage
- S = Supportive = Can provide resources or support the completion of the stage
- C = Consulted = Has information and/or capability necessary to complete the stage
- I = Informed = Must be notified of completion of stage, but need not be consulted
3. Balance risk and reward
While senior leaders tend to play it safe with incremental wins (15% higher win rates), junior teams often drive bigger breakthroughs (27% higher uplifts).
The key question isn't just "Will it win?" but "What's the potential impact?"
Would you rather have tests that win 50% of the time but only deliver $100 in revenue? Or tests that win 10% of the time but drive million-dollar uplifts?
After all, every experiment delivers value, whether it's:
- Winning tests that drive revenue
- Losing tests that prevent harmful changes
- Inconclusive results that save resources from low-impact areas
So, build a culture that allows your teams to take the right risk when the opportunity occurs.
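The trade-off posed above (frequent small wins versus rare large ones) comes down to expected value per test: win rate multiplied by the payoff when a test wins. A quick sketch, assuming "million-dollar uplift" means exactly $1,000,000 and ignoring the cost of running each test:

```python
def expected_value_per_test(win_rate, payoff_when_won):
    """Average revenue per test: chance of winning times the payoff of a win."""
    return win_rate * payoff_when_won


# Figures from the question above; $1,000,000 is an assumed round number
safe_tests = expected_value_per_test(0.50, 100)
bold_tests = expected_value_per_test(0.10, 1_000_000)

print(f"Safe tests: ${safe_tests:,.0f} per test")
print(f"Bold tests: ${bold_tests:,.0f} per test")
```

Under these assumptions the safe tests average $50 each while the bold tests average $100,000 each, even though they lose nine times out of ten. That gap is why win rate alone is a poor proxy for program value.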
Here’s how DocuSign built their culture of experimentation.
Staying close to the changing priorities of the wider business is also essential for prioritizing your tests and growing your team. Avoid being siloed.
To learn more, check out this leadership guide to experimentation.
AI in experimentation
In the past, launching a new test was a slow, cumbersome process. Teams would spend months perfecting their hypotheses, worrying about sample sizes, and manually analyzing results. But with AI, that approach feels as outdated as dial-up modems.
Here's how AI is changing every step of the experimentation process:
1. Smarter test design
The days of simple A/B testing are over. AI now analyzes user behavior patterns, historical test data, and market trends to suggest tests with the highest potential impact. Think of it as having a data scientist who's analyzed millions of tests helping you decide what to test next.
As users interact with your experiments, AI tracks which variations they pause on, how long they engage with specific elements, and their browsing patterns. It even incorporates contextual intelligence - time of day, device type, geographic location, seasonal trends - to refine test designs in real time.
2. Beyond rules-based testing
Traditional experimentation relied heavily on if/then logic. If a user does X, show them variation Y. But as your testing program grows, accounting for all possible segments becomes impossible.
AI helps you move beyond these limitations and create personalized experiences, managing thousands of test variations simultaneously while optimizing for each user segment.
You can:
- Present the most relevant variations to individual users
- Reduce friction in the testing process
- Create more compelling, targeted experiences
- Optimize resource allocation automatically
3. Optimization
Instead of waiting weeks for results, AI continuously monitors test performance and adjusts traffic allocation. It analyzes comprehensive data points:
- Clickstream behavior
- Time spent on variations
- Mouse movement patterns
- Cross-test interactions
- Contextual information
This enables rapid iteration and learning, with AI predicting test outcomes more accurately. Overall, AI will help you:
- Quickly identify winning combinations
- Scale successful tests across segments
- Roll back unsuccessful changes instantly
- Maintain guardrails for user experience
You can even move beyond surface metrics to measure real business impact and deliver products with less risk. It's a system that continuously learns and optimizes.
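To give a feel for what "continuously adjusting traffic allocation" can look like, here is a small simulation using Thompson sampling, one common multi-armed bandit technique. The text above doesn't say which algorithm any particular platform uses, so treat this as a swapped-in illustration. The conversion rates are simulated, and the function name and parameters are hypothetical.

```python
import random


def thompson_allocate(true_rates, visitors, seed=0):
    """Simulate traffic allocation across variations with Thompson sampling.

    true_rates: simulated conversion rate of each variation (unknown in real life).
    Returns how many visitors each variation ended up receiving.
    """
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(visitors):
        # Draw a plausible conversion rate for each variation from its
        # Beta posterior, then send this visitor to the best-looking one
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1)
                   for i in range(len(true_rates))]
        arm = samples.index(max(samples))
        # Simulate whether this visitor converts
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [wins[i] + losses[i] for i in range(len(true_rates))]


# Hypothetical test: variation B truly converts better than variation A
traffic = thompson_allocate([0.05, 0.15], visitors=5000)
print(traffic)
```

Early on, traffic is split fairly evenly because little is known. As conversions accumulate, most visitors are routed to the stronger variation without anyone waiting for a fixed test end date.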
Michiel Dorjee, Director, Digital Marketing, discusses AI's impact on experimentation programs.
Making it all work together
Your experimentation program is like an engine where all parts need to work together. Here's your scaling checklist:
Before scaling, ensure you have:
- Metrics that matter: Beyond vanity metrics to real revenue impact
- The right team: Core players and specialists aligned on goals
- Advanced testing capabilities: From simple A/B to sophisticated multivariate testing (MVT)
- Smart personalization: Tailored experiences that drive engagement
- Strong culture: Teams empowered to take calculated risks
- AI acceleration: Automated insights and optimization
Scaling isn't just about running more tests. It's about running smarter tests that drive real business value.
Keep learning
Want to dive deeper?
Download our complete Evolution of Experimentation Report for insights from 127,000+ experiments and detailed frameworks for scaling your program.