A/B Test Planner

Calculate sample sizes and duration for statistically valid A/B tests

Traffic Settings

Traffic in test: 50%

Sample size per variant: 31,200
Total sample size: 62,400
Days required: 125 days

Test Summary

To detect a 10% relative change in your 5% baseline conversion rate with 80% power at 95% confidence (a 5% significance level), you need 31,200 visitors per variant (62,400 total). At 1,000 daily visitors with 50% in the test, this will take approximately 125 days.
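The figures in this summary can be approximately reproduced with the standard normal-approximation formula for comparing two proportions. The sketch below assumes a two-sided test and an unpooled variance; the calculator's own formula is not stated on this page, so treat this as one plausible method rather than the tool's implementation. The function and parameter names are illustrative.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, power=0.80, alpha=0.05):
    """Visitors needed per variant for a two-sided, two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)           # rate if the lift is real
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)     # unpooled variance (assumption)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_required(n_per_variant, daily_visitors, share_in_test, variants=2):
    """Days until every variant has collected its sample size."""
    visitors_in_test_per_day = daily_visitors * share_in_test
    return math.ceil(n_per_variant * variants / visitors_in_test_per_day)


n = sample_size_per_variant(baseline=0.05, relative_mde=0.10)
print(n, days_required(n, daily_visitors=1000, share_in_test=0.5))
```

With the inputs from the summary, the result lands very close to the 31,200 per variant shown above and gives the same 125 days. The small gap is presumably down to rounding or a slightly different formula variant.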

What is A/B Test Planning?

A/B test planning involves calculating the sample sizes and test duration needed to detect meaningful differences between variants with statistical confidence. Proper planning ensures your experiments are valid and avoids wasted time on underpowered tests.

This A/B test calculator helps you determine how many visitors you need and how long to run your test. Enter your baseline conversion rate, the minimum effect you want to detect, and your traffic levels to get the required sample size and duration.

Key Statistical Concepts

📊 Baseline Conversion Rate

Your current conversion rate before the test. Lower baselines require larger sample sizes to detect the same relative change.

🎯 Minimum Detectable Effect (MDE)

The smallest improvement you care about detecting. Smaller effects need larger samples. A 10% relative lift on a 5% baseline means a new rate of 5.5%, an absolute change of 0.5 percentage points.

💪 Statistical Power

Probability of detecting a real effect when it exists (avoiding false negatives, or Type II errors). 80% power is the standard choice: if the true effect is at least as large as your MDE, the test will detect it about 80% of the time.

✅ Significance Level

The false positive rate (Type I error) you are willing to accept: the chance of declaring a winner when there is no real difference. A 5% significance level (α = 0.05) corresponds to 95% confidence.
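Power and significance level are linked through the sample size. As a rough illustration, the sketch below estimates the power a test achieves for a given number of visitors per variant, using the same normal approximation for two proportions. The function name and the example sample sizes are assumptions made for illustration.

```python
from statistics import NormalDist


def achieved_power(n_per_variant, baseline, relative_mde, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    std_err = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant) ** 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability that the observed z-statistic clears the critical value
    # in the direction of the true effect (the opposite tail is negligible).
    return NormalDist().cdf(abs(p2 - p1) / std_err - z_alpha)


for n in (10_000, 31_200, 60_000):
    print(n, round(achieved_power(n, baseline=0.05, relative_mde=0.10), 3))
```

At 31,200 visitors per variant the estimate comes out at about 80%, matching the planned power. At 10,000 per variant it falls to roughly 35%, which is what "underpowered" means in practice.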

Sample Size Trade-offs

Parameter Change                        Effect on Sample Size
Smaller MDE (detect smaller effects)    ↑ Larger sample needed
Higher power (90% vs 80%)               ↑ Larger sample needed
Higher confidence (99% vs 95%)          ↑ Larger sample needed
Lower baseline conversion               ↑ Larger sample needed
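To get a feel for the size of each trade-off, the following sketch changes one parameter at a time against a reference scenario (5% baseline, 10% relative MDE, 80% power, 95% confidence). It uses the usual normal-approximation formula for two proportions, which may differ slightly from the calculator's own. The scenario values are illustrative choices.

```python
import math
from statistics import NormalDist


def sample_size(baseline, relative_mde, power=0.80, alpha=0.05):
    """Per-variant sample size, two-sided two-proportion z-test."""
    p1, p2 = baseline, baseline * (1 + relative_mde)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * variance / (p2 - p1) ** 2)


reference = sample_size(0.05, 0.10)
scenarios = {
    "smaller MDE (5% instead of 10%)": sample_size(0.05, 0.05),
    "higher power (90% instead of 80%)": sample_size(0.05, 0.10, power=0.90),
    "higher confidence (99% instead of 95%)": sample_size(0.05, 0.10, alpha=0.01),
    "lower baseline (2.5% instead of 5%)": sample_size(0.025, 0.10),
}

print(f"reference: {reference:,} per variant")
for label, n in scenarios.items():
    print(f"{label}: {n:,} per variant ({n / reference:.1f}x)")
```

The MDE has the strongest effect: required sample size scales with the inverse square of the absolute difference, so halving the MDE roughly quadruples the sample.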

A/B Testing Best Practices

  • Calculate before testing: Determine sample size upfront. Don't peek and stop early.
  • Run full duration: Stopping early inflates false positives. Complete the planned runtime.
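  • Be realistic about MDE: Detecting a 2% relative lift requires huge samples (roughly 750,000 visitors per variant at a 5% baseline with 80% power and 95% confidence). Focus on bigger bets.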
  • Account for weekly cycles: Run tests in full-week increments to capture day-of-week variation.
  • One primary metric: Decide your success metric upfront. If you evaluate several metrics, apply a multiple-comparison correction.
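The warning about peeking can be checked with a small simulation. The sketch below runs A/A tests (both arms share the same true rate, so every "significant" result is a false positive) and compares stopping at the first significant interim look against a single look at the planned end. The number of runs, looks, batch size, and the 20% conversion rate are arbitrary values chosen to keep the simulation fast.

```python
import random
from statistics import NormalDist


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z-statistic for two arms of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0, 1):
        return 0.0
    std_err = (2 * pooled * (1 - pooled) / n) ** 0.5
    return (conv_b - conv_a) / n / std_err


def simulate_aa_tests(runs=400, looks=5, batch=500, rate=0.2, alpha=0.05, seed=1):
    """Return (false-positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        for _ in range(looks):
            # Add one batch of visitors per arm, then test the cumulative data.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            significant = abs(z_stat(conv_a, conv_b, n)) > z_crit
            significant_at_any_look = significant_at_any_look or significant
        peeking_hits += significant_at_any_look
        final_hits += significant        # outcome of the last look only
    return peeking_hits / runs, final_hits / runs


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"stop at first significant look: {peeking_rate:.1%}")
print(f"single look at planned end:     {fixed_rate:.1%}")
```

Because the final look is itself one of the peeks, the peeking rate can never be lower than the single-look rate. With several interim looks it typically ends up well above the nominal 5%.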

Frequently Asked Questions

What if I can't wait that long?

Accept a larger MDE (only detect bigger effects), route more traffic into the test, or consider sequential testing methods that allow earlier stopping with proper corrections.
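If the duration is fixed, the question can be turned around: what is the smallest lift the test can detect in the time available? A minimal sketch, assuming two equally sized variants and the normal-approximation formula for two proportions, solves for the MDE by bisection. The function names and the four-week example are illustrative.

```python
import math
from statistics import NormalDist


def sample_size(baseline, relative_mde, power=0.80, alpha=0.05):
    """Per-variant sample size, two-sided two-proportion z-test."""
    p1, p2 = baseline, baseline * (1 + relative_mde)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * variance / (p2 - p1) ** 2)


def detectable_relative_mde(baseline, daily_visitors, share_in_test, days,
                            power=0.80, alpha=0.05):
    """Smallest relative lift detectable within the given number of days."""
    n_available = daily_visitors * share_in_test * days / 2   # two variants
    low = 1e-4
    high = (1 - baseline) / baseline    # lift that would push the rate to 100%
    for _ in range(60):
        # Required sample size shrinks as the lift grows, so bisection works.
        mid = (low + high) / 2
        if sample_size(baseline, mid, power, alpha) > n_available:
            low = mid
        else:
            high = mid
    return high


mde = detectable_relative_mde(0.05, daily_visitors=1000, share_in_test=0.5, days=28)
print(f"Detectable relative lift in 28 days: {mde:.1%}")
```

With the traffic from the summary above, a four-week test can only detect a relative lift of a bit more than 20%. The sketch does not handle the edge case where even the largest possible lift is undetectable with the available traffic.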

Why is my required sample so large?

Low baseline rates and small MDEs require large samples. At a 1% baseline, detecting a 10% relative lift needs roughly 160,000 visitors per variant, and detecting a 5% relative lift needs roughly 640,000 (80% power, 95% confidence). Consider testing bigger changes.

Can I test multiple variants?

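Yes, but every variant needs its own full sample, so the total sample size grows with the number of variants. You also need a multiple-comparison correction (Bonferroni or similar) to keep the overall false positive rate at your chosen level, which raises the per-variant sample size further.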
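As a rough illustration of how this adds up, the sketch below plans a test with one control and several treatments, assuming each treatment is compared only against the control and that a Bonferroni correction divides the significance level by the number of comparisons. The function names are illustrative, and Bonferroni is the most conservative of the common corrections.

```python
import math
from statistics import NormalDist


def sample_size(baseline, relative_mde, power=0.80, alpha=0.05):
    """Per-variant sample size, two-sided two-proportion z-test."""
    p1, p2 = baseline, baseline * (1 + relative_mde)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * variance / (p2 - p1) ** 2)


def plan_multivariant(baseline, relative_mde, treatments, power=0.80, alpha=0.05):
    """Return (per-arm sample size, total) for one control plus treatments."""
    comparisons = treatments                  # each treatment vs. the control
    adjusted_alpha = alpha / comparisons      # Bonferroni correction
    per_arm = sample_size(baseline, relative_mde, power, adjusted_alpha)
    return per_arm, per_arm * (treatments + 1)


for k in (1, 2, 3):
    per_arm, total = plan_multivariant(0.05, 0.10, treatments=k)
    print(f"{k} treatment(s): {per_arm:,} per arm, {total:,} total")
```

With a single treatment this reduces to the ordinary two-variant plan. Each added treatment increases both the number of arms and, through the stricter significance level, the size of every arm.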