
A Complete Guide to Designing, Running, and Interpreting Controlled Experiments (A/B Testing)

Smarter Problem Fixes with Split Testing

Businesses and organisations constantly face challenges that require quick yet reliable solutions. Whether you’re optimising a website, improving an app, or even testing a new marketing strategy, guesswork simply won’t cut it. Instead, controlled experiments provide a data-driven way to solve problems effectively. One of the most popular and practical forms of controlled experimentation is split testing or A/B testing.

Split testing, also known as A/B testing, is a method where two or more variations of a single element are compared to determine which performs better against a defined goal. This approach eliminates assumptions and focuses on measurable results, leading to smarter fixes and better decisions.

This comprehensive guide will take you through everything you need to know about designing, running, and interpreting controlled experiments using split testing. You’ll learn how to create testable hypotheses, choose your variants, decide on success metrics, understand sample sizes, analyse results, and avoid common pitfalls. We’ve even included practical prompts and a handy template so you can start applying these principles right away.

Let’s dive in.


Why Controlled Experiments and Split Testing?

Before we get into the specifics, it helps to understand why controlled experiments are so valuable.

Imagine you want to improve the conversion rate of a signup page. You might believe changing the call-to-action button colour from blue to green will help. But what if it doesn’t? Or worse, what if it actually reduces conversions?

A controlled experiment provides a structured way to answer this question with evidence rather than intuition.

Split testing works by randomly dividing your audience into groups. Group A sees the original version (known as the control), and Group B sees the new version (the variant). By tracking a predefined success metric (such as clicks, signups, or purchases), you can compare how each version performs statistically.

The real strength of this approach is its control of confounding factors. Since the groups are randomly assigned and run simultaneously, external influences (like time of day or sudden market changes) affect both groups equally. This means any difference in outcomes is very likely due to the change being tested.


Step 1: Designing Your Controlled Experiment

1. Define a Clear, Testable Hypothesis

The foundation of any good experiment is a hypothesis — a statement predicting how a change will affect an outcome. However, not all hypotheses are created equal.

A vague hypothesis might say something like:

“Visitors will like the new button.”

This isn’t testable because it’s unclear what “like” means or how it will be measured.

A testable hypothesis should clearly state:

  • What you are changing
  • What you expect to happen
  • How you will measure success

For example:

“Changing the call-to-action button colour from blue to green will increase click-through rates by at least 10% within two weeks.”

This hypothesis is specific, measurable, and time-bound — all great qualities that keep your experiment focused.

Prompt for practice:
Rewrite this vague hypothesis into a testable one: 
“Users will prefer the new layout.”

Example rewrite:
“Introducing a simplified homepage layout will increase the average session duration by 15% over 14 days.”


2. Identify Your Variants

Your experiment needs at least two versions to test: the control and one or more variants.

  • Control (Variant A): The current version of the element you’re testing.
  • Variant (Variant B): The modified version you want to test against the control.

Depending on your resources and objectives, you can test multiple variants at once (known as A/B/n testing), but keep in mind this requires larger sample sizes and more complex analysis.

Example:
Control: Original blue “Sign Up” button 
Variant: Green “Sign Up” button 


3. Choose Your Success Metric

Pick a metric that directly reflects the goal of your experiment. This is how you will measure whether your change had a positive effect.

Common success metrics include:

  • Click-through rate (CTR)
  • Conversion rate (number of users completing desired actions divided by total visitors)
  • Average session duration
  • Bounce rate
  • Revenue per visitor

Make sure the metric is quantifiable and relevant to your hypothesis.
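
If you track raw counts (clicks, conversions, revenue), these metrics are straightforward to compute yourself. Here is a minimal Python sketch; all the numbers are illustrative placeholders.

```python
# A minimal sketch of computing the metrics above from raw counts.
# All numbers are illustrative placeholders.
clicks, impressions = 420, 10_000
conversions, visitors = 95, 2_000
total_revenue = 3_800.0

ctr = clicks / impressions                 # click-through rate
conversion_rate = conversions / visitors   # completed actions / total visitors
revenue_per_visitor = total_revenue / visitors

print(f"CTR: {ctr:.1%}, conversion rate: {conversion_rate:.2%}, "
      f"revenue per visitor: ${revenue_per_visitor:.2f}")
```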


4. Determine Minimum Sample Size

A frequent stumbling block is not collecting enough data before making decisions. If your sample size is too small, your results might be inconclusive or misleading.

To calculate the minimum sample size, you need:

  • Your current baseline conversion rate (from historical data) 
  • The minimum detectable effect size (the smallest improvement you care about) 
  • Desired statistical significance level (often set at 95%) 
  • Desired power (probability of correctly detecting an effect when there is one, commonly 80%)

Fortunately, there are many online calculators that will do the maths once you input these values.

Prompt for practice:
Propose a minimum sample size for a test given 10,000 monthly visitors, a baseline conversion rate of 5%, and an expected improvement of 10%.

Example response:
With 10,000 visitors per month and a baseline conversion rate of 5%, detecting a 10% relative increase (from 5% to 5.5%) at 95% confidence and 80% power typically requires roughly 31,000 visitors per group, or about 62,000 in total, which works out to roughly six months of traffic split evenly between the control and variant groups.

This means you would run the test until each group reaches the required number of visitors, however long that takes at your traffic level. If that is too slow, aim to detect a larger improvement, which requires far fewer visitors.
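
If you would rather see the maths than trust a calculator, here is a minimal Python sketch of the standard two-proportion sample-size formula (the same calculation most online calculators perform). It assumes scipy is available and uses the figures from the example above.

```python
# A minimal sketch of the standard two-proportion sample-size formula,
# matching what most online calculators compute. scipy is assumed to be
# installed; the figures (5% baseline, 10% relative lift) come from the
# example above.
from math import sqrt, ceil
from scipy.stats import norm

def min_sample_size_per_group(baseline, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed in EACH group to detect the lift at the given
    significance level and power (two-sided test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)   # e.g. 1.96 for 95% confidence
    z_power = norm.ppf(power)           # e.g. 0.84 for 80% power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

n = min_sample_size_per_group(0.05, 0.10)
print(n)                     # roughly 31,000 per group
print(f"total: {2 * n}")     # roughly 62,000 visitors overall
```

Note how sensitive the result is to the minimum detectable effect: halving it roughly quadruples the number of visitors you need, which is why small improvements demand so much traffic.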


Step 2: Running Your Controlled Experiment

1. Randomly Assign Users to Variants

Proper randomisation ensures that users have an equal chance of seeing any variant. This prevents bias in your results.

Most split testing tools handle this automatically, often by randomising based on user sessions or cookies.
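
If you are rolling your own assignment rather than using a tool, a common approach is to hash a stable user identifier together with the experiment name. Here is a minimal sketch; the user ID and experiment name are illustrative placeholders.

```python
# A minimal sketch of deterministic, cookie/ID-based assignment, similar in
# spirit to what split-testing tools do. The IDs and experiment name are
# illustrative placeholders.
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("control", "variant_b")):
    """Hash the user ID together with the experiment name so each user always
    sees the same variant, independently of other experiments."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

print(assign_variant("user-12345", "signup-button-colour"))
```

Because the hash is deterministic, a returning user always lands in the same group, and salting with the experiment name keeps assignments independent across experiments.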


2. Keep the Experiment Running Long Enough

The temptation is to stop the test as soon as you see a positive (or negative) result. Resist this urge.

Statistical significance requires adequate data collected over a representative period. Short test durations may be influenced by anomalies or daily fluctuations.

A good rule of thumb is to run tests for at least one full business cycle — often one or two weeks — depending on your traffic volume.
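
Combining the sample-size estimate with your traffic gives a rough duration, which you can then round up to whole business cycles. A quick sketch with illustrative numbers:

```python
# A quick sketch: convert a required sample size into a test duration,
# then round up to full weeks so every weekday is represented equally.
# All numbers here are illustrative.
from math import ceil

required_total = 62_000          # control + variant, from the sample-size step
daily_visitors = 10_000 / 30     # roughly 333 visitors per day

days_needed = ceil(required_total / daily_visitors)
weeks_needed = ceil(days_needed / 7)
print(f"run for about {days_needed} days (~{weeks_needed} full weeks)")
```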


3. Avoid Interfering During the Test

Don’t make other unrelated changes to the tested pages or elements during the experiment. Doing so introduces confounding variables that make it impossible to interpret results accurately.


Step 3: Interpreting Your Controlled Experiment Results

1. Analyse Statistical Significance

Once the test concludes, analyse whether the observed differences are statistically significant — meaning they are unlikely to have occurred by chance.

If you used a split testing platform, this analysis is often done automatically, presenting results with p-values, confidence intervals, and conversion rates.

A p-value less than 0.05 (corresponding to 95% confidence) typically indicates significance.
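
If you are analysing the results yourself rather than relying on a platform, a two-proportion z-test is the usual starting point for conversion-style metrics. Here is a minimal sketch with made-up counts, assuming scipy is installed.

```python
# A minimal sketch of the significance check using a two-proportion z-test.
# The visitor and conversion counts below are illustrative, not real results.
from math import sqrt
from scipy.stats import norm

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for the difference in
    conversion rates between control (A) and variant (B)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return z, p_value

z, p = two_proportion_z_test(conv_a=1500, n_a=31000, conv_b=1700, n_b=31000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05 would indicate significance
```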


2. Consider Effect Size and Business Impact

Statistical significance alone is not enough. A very small improvement may be statistically significant with huge sample sizes but not meaningful for your business.

Look at the actual effect size (percentage improvement) and estimate its impact on revenue, user experience, or other relevant KPIs.
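
A rough back-of-the-envelope calculation is often enough to judge whether a statistically significant lift is worth shipping. The numbers below (traffic, rates, and revenue per conversion) are purely illustrative.

```python
# A quick sketch of estimating business impact from an observed lift.
# All figures here are illustrative assumptions.
baseline_rate = 0.050
variant_rate = 0.055
monthly_visitors = 10_000
revenue_per_conversion = 40.0    # assumed average order value

relative_lift = (variant_rate - baseline_rate) / baseline_rate
extra_conversions = monthly_visitors * (variant_rate - baseline_rate)
extra_revenue = extra_conversions * revenue_per_conversion

print(f"relative lift: {relative_lift:.0%}")                      # 10%
print(f"extra conversions per month: {extra_conversions:.0f}")    # 50
print(f"extra revenue per month: ${extra_revenue:,.0f}")          # $2,000
```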


3. Make Decisions Based on Data

If Variant B significantly outperforms Variant A, consider rolling out the change permanently.

If no significant difference emerges, you may conclude that the change had no impact and avoid unnecessary redesigns.


Practical Template for Your Split Tests

To organise your split testing process, try using this simple document template:

| Component | Description | Example |
| --- | --- | --- |
| Hypothesis | The clear, testable prediction you want to evaluate | Changing the signup button colour from blue to green will increase CTR by 10% within 14 days. |
| Variant A | The original/control version | Blue signup button |
| Variant B | The new/test version | Green signup button |
| Success Metric | How you will measure the outcome | Click-through rate (CTR) |
| Minimum Sample Size | Estimated required number of visitors/users | 20,000 visitors total over 2 weeks |

Using a structure like this keeps you focused and accountable throughout your experiment.


Common Pitfalls to Avoid

1. Stopping the Test Too Early

It is extremely tempting to declare a winner early when results look promising. This practice is called “peeking” and can easily lead to false positives.

Stopping early increases the chance of declaring an effect that does not exist. Always wait until your pre-calculated sample size or duration is complete.
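
If you want to convince yourself (or a stakeholder) why peeking is dangerous, a small simulation makes the point: even when the variant is identical to the control, repeatedly checking for significance "finds" a winner far more often than the nominal 5% error rate. Here is a sketch with made-up parameters, assuming numpy and scipy are available.

```python
# A small simulation (with made-up parameters) showing why peeking inflates
# false positives: both groups share the SAME 5% conversion rate, yet checking
# for significance every 1,000 visitors declares a "winner" far more often
# than the nominal 5% error rate.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def peeking_finds_false_winner(n_total=20_000, check_every=1_000, p=0.05, alpha=0.05):
    a = rng.random(n_total) < p          # control conversions (no real effect)
    b = rng.random(n_total) < p          # variant conversions (no real effect)
    for n in range(check_every, n_total + 1, check_every):
        pa, pb = a[:n].mean(), b[:n].mean()
        pool = (a[:n].sum() + b[:n].sum()) / (2 * n)
        se = np.sqrt(pool * (1 - pool) * 2 / n)
        if se > 0 and 2 * (1 - norm.cdf(abs(pb - pa) / se)) < alpha:
            return True                  # "winner" declared despite no real effect
    return False

runs = 500
false_positives = sum(peeking_finds_false_winner() for _ in range(runs))
print(f"false positive rate with peeking: {false_positives / runs:.0%}")
```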


2. Multiple Testing Without Adjustment

Running multiple tests or testing many variants at once increases the chance of finding at least one false positive just by luck.

If you must run multiple tests simultaneously, adjust your statistical significance thresholds accordingly through methods like the Bonferroni correction or control the false discovery rate.
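
As a minimal illustration, the Bonferroni correction simply divides your significance threshold by the number of tests. The p-values below are made up.

```python
# A minimal sketch of a Bonferroni correction across several simultaneous
# tests. The p-values are illustrative placeholders.
p_values = [0.04, 0.01, 0.20]
alpha = 0.05

bonferroni_alpha = alpha / len(p_values)   # 0.0167 per test
for i, p in enumerate(p_values, start=1):
    verdict = "significant" if p < bonferroni_alpha else "not significant"
    print(f"test {i}: p = {p:.2f} -> {verdict} at adjusted alpha {bonferroni_alpha:.3f}")
```

For larger batteries of tests, statsmodels' multipletests function offers this correction plus less conservative alternatives, such as Benjamini–Hochberg for controlling the false discovery rate.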


3. Ignoring User Segments or Context

Sometimes an experiment shows no overall effect but hides differing impacts in subgroups (for example, mobile vs desktop users).

Be careful when slicing data and avoid “data dredging” or post-hoc analysis without proper statistical controls.


4. Insufficient Sample Size

Small samples produce unreliable results. If traffic is limited, consider alternative study designs or longer test durations.


Final Thoughts

Controlled experiments using split testing provide a powerful framework for making better decisions by removing guesswork and allowing you to learn directly from your users’ behaviour.

By carefully designing your hypothesis, choosing meaningful metrics, gathering enough data, and analysing results rigorously, you can confidently fix problems and optimise outcomes in a smart, cost-effective way.

Remember to embrace a disciplined approach, avoid the common pitfalls, and use the practical template provided here as a foundation for your tests.

The next time you face a problem needing a fix, think “experiment first” rather than “guess and hope.” Your results will thank you.


Ready to start? Here’s a quick action plan:

  1. Pick a clear challenge or optimisation goal you want to address.
  2. Formulate a testable hypothesis using the template above.
  3. Identify your control and variant(s).
  4. Choose a quantifiable success metric.
  5. Calculate your minimum sample size using an online calculator.
  6. Set up your split test with a tool like GrowthBook, PostHog, Optimizely, or VWO (there is also Plerdy for less technical teams).
  7. Run the experiment without interference for the calculated duration.
  8. Analyse your results considering significance and business impact.
  9. Roll out winners or iterate further based on data insights.

Controlled experiments are not just for big companies; with the right approach, anyone can apply this methodology to make smarter, evidence-based improvements. Happy testing!
