What is A/B Test Planning?

A/B test planning involves calculating the sample sizes and test duration needed to detect meaningful differences between variants with statistical confidence. Proper planning ensures your experiments are valid and avoids wasted time on underpowered tests.

This A/B test calculator helps you determine how many visitors you need and how long to run your test. Enter your baseline conversion rate, the minimum effect you want to detect, and your traffic levels to get sample size and duration estimates.

Key Statistical Concepts

📊 Baseline Conversion Rate

Your current conversion rate before the test. Lower baselines require larger sample sizes to detect the same relative change.

🎯 Minimum Detectable Effect (MDE)

The smallest improvement you care about detecting. Smaller effects need larger samples. For example, a 10% relative lift on a 5% baseline means a new rate of 5.5%, an absolute change of 0.5 percentage points.

💪 Statistical Power

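Probability of detecting a real effect when it exists (avoiding false negatives, or Type II errors). 80% power is the conventional default: if the true effect is at least as large as your MDE, the test will detect it about 80% of the time.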

✅ Significance Level

The false-positive rate you are willing to accept. Testing at 95% significance (α = 0.05) means that when there is no real difference between variants, the test will still report one about 5% of the time (Type I error).
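The four concepts above combine in the standard normal-approximation formula for comparing two proportions. This is a common textbook approximation and is an assumption here; the page does not state which method the calculator itself uses.

```latex
n \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\,\left[\,p_1(1-p_1) + p_2(1-p_2)\,\right]}{(p_2 - p_1)^{2}},
\qquad p_2 = p_1\,(1 + \mathrm{MDE})
```

Here n is the sample size per variant, p_1 is the baseline conversion rate, MDE is the relative lift, α is one minus the significance level, and 1 − β is the power. For 95% significance (two-sided) and 80% power, the two z-values are about 1.96 and 0.84.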

Sample Size Trade-offs

Parameter Change	Effect on Sample Size
Smaller MDE (detect smaller effects)	↑ Larger sample needed
Higher power (90% vs 80%)	↑ Larger sample needed
Higher significance (99% vs 95%)	↑ Larger sample needed
Lower baseline conversion	↑ Larger sample needed
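The trade-offs in the table can be checked numerically. The sketch below assumes a two-sided test and the usual normal approximation for two proportions; the function name `sample_size_per_variant` and the scenario values are illustrative, not taken from the calculator.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, power=0.80, significance=0.95):
    """Visitors needed per variant (two-sided test, normal approximation)."""
    alpha = 1.0 - significance
    p1 = baseline
    p2 = baseline * (1.0 + relative_mde)  # expected rate if the lift is real
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Reference case: 5% baseline, 10% relative MDE, 80% power, 95% significance.
reference = sample_size_per_variant(0.05, 0.10)
scenarios = {
    "reference (5% baseline, 10% MDE)": reference,
    "smaller MDE (5%)": sample_size_per_variant(0.05, 0.05),
    "higher power (90%)": sample_size_per_variant(0.05, 0.10, power=0.90),
    "higher significance (99%)": sample_size_per_variant(0.05, 0.10, significance=0.99),
    "lower baseline (2%)": sample_size_per_variant(0.02, 0.10),
}
for label, n in scenarios.items():
    print(f"{label}: {n:,} per variant")
```

Under these assumptions the reference case comes out at roughly 31,000 visitors per variant, and each of the four changes pushes the requirement higher.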

A/B Testing Best Practices

Calculate before testing: Determine sample size upfront. Don't peek and stop early.
Run full duration: Stopping early inflates false positives. Complete the planned runtime.
Be realistic about MDE: Detecting a 2% relative lift requires very large samples at typical conversion rates. Focus on bigger bets.
Account for weekly cycles: Run tests in full-week increments to capture day-of-week variation.
One primary metric: Decide your success metric upfront. Testing several metrics requires a multiple-comparison correction.
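Turning a sample size into a runtime, and rounding it to full weeks as recommended above, can be sketched as follows. The function name `planned_duration_days` and its parameters are hypothetical, and the sketch assumes traffic is split evenly across variants and is roughly constant from day to day.

```python
from math import ceil


def planned_duration_days(sample_per_variant, daily_visitors, variants=2, traffic_share=1.0):
    """Days needed to reach the sample size, rounded up to whole weeks."""
    total_needed = sample_per_variant * variants
    # Only the share of traffic enrolled in the experiment counts.
    visitors_per_day = daily_visitors * traffic_share
    raw_days = ceil(total_needed / visitors_per_day)
    # Round up to full weeks to cover day-of-week variation.
    return ceil(raw_days / 7) * 7


days = planned_duration_days(sample_per_variant=31_000, daily_visitors=5_000)
print(days)  # 62,000 visitors at 5,000 per day is 13 days, rounded up to 14
```

Sending only half of the traffic into the test (`traffic_share=0.5`) doubles the raw runtime, which is why traffic allocation matters as much as the sample size itself.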

Frequently Asked Questions

What if I can't wait that long?

Accept a larger MDE (only detect bigger effects), use more traffic, or consider sequential testing methods that allow earlier stopping with proper corrections.
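To see what "accepting a larger MDE" means in practice, you can invert the sample size calculation and ask what lift a fixed sample can detect. The sketch below is a rough approximation that uses the baseline variance for both arms, so it slightly understates the true MDE; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_detectable_relative_mde(baseline, sample_per_variant, power=0.80, significance=0.95):
    """Approximate smallest relative lift detectable with a given sample per variant."""
    alpha = 1.0 - significance
    z_total = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    # Assumption: both arms have the baseline variance p(1 - p).
    absolute_mde = z_total * sqrt(2.0 * baseline * (1.0 - baseline) / sample_per_variant)
    return absolute_mde / baseline


mde = approx_detectable_relative_mde(baseline=0.05, sample_per_variant=10_000)
print(f"Detectable relative lift: about {mde:.0%}")
```

With a 5% baseline and 10,000 visitors per variant, this approximation gives a detectable relative lift of roughly 17%, so a test of that size is only worth running for changes expected to move the metric by that much or more.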

Why is my required sample so large?

Low baseline rates and small MDEs require large samples. A 1% baseline with a 5% relative lift (1.00% to 1.05%) needs roughly 640,000 visitors per variant at 95% significance and 80% power. Consider testing bigger changes.

Can I test multiple variants?

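Yes, but the total sample grows with the number of variants, because each variant needs its own full sample. You also need a multiple-comparison correction (Bonferroni or similar) to keep the overall false-positive rate at your chosen significance level, and that correction raises the per-variant sample further.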
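A minimal sketch of how multiple variants affect the plan, assuming each challenger is compared against a single control and a Bonferroni correction divides α evenly across those comparisons. The function name `multi_variant_plan` is hypothetical, and the formula is the usual two-sided normal approximation.

```python
from math import ceil
from statistics import NormalDist


def multi_variant_plan(baseline, relative_mde, challengers, power=0.80, significance=0.95):
    """Return (per-arm sample, total sample) with a Bonferroni-adjusted alpha."""
    # Bonferroni: split alpha across the challenger-vs-control comparisons.
    alpha = (1.0 - significance) / challengers
    p1 = baseline
    p2 = baseline * (1.0 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    per_arm = ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)
    # One control arm plus every challenger arm.
    return per_arm, per_arm * (challengers + 1)


for k in (1, 2, 3):
    per_arm, total = multi_variant_plan(0.05, 0.10, challengers=k)
    print(f"{k} challenger(s): {per_arm:,} per arm, {total:,} total")
```

The total grows faster than the number of arms because the stricter per-comparison α also increases the sample each arm needs.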
