A/B Test Planner
Calculate sample sizes and duration for statistically valid A/B tests
Test Summary
To detect a 10% relative change in your 5% baseline conversion rate with 80% power and 95% significance, you need 31,200 visitors per variant (62,400 total). At 1,000 daily visitors with 50% in test, this will take approximately 125 days.
What is A/B Test Planning?
A/B test planning involves calculating the sample sizes and test duration needed to detect meaningful differences between variants with statistical confidence. Proper planning ensures your experiments are valid and avoids wasted time on underpowered tests.
This A/B test calculator helps you determine how many visitors you need and how long to run your test. Input your baseline metrics, minimum effect you want to detect, and traffic levels to get accurate sample size requirements.
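The numbers in the summary above follow from the standard two-sided, two-proportion z-test sample size formula. A minimal sketch using only the Python standard library (the function name is illustrative, not part of the tool):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, power=0.80, alpha=0.05):
    """Two-sided two-proportion z-test sample size per variant."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)          # e.g. 5% -> 5.5%
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)                  # ~1.96 for 95% significance
    z_beta = z(power)                           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

n = sample_size_per_variant(0.05, 0.10)         # ~31,200 per variant
# Duration: 1,000 daily visitors, 50% in the test, split across 2 variants
days = ceil(2 * n / (1000 * 0.50))              # ~125 days
```

This reproduces the figures in the test summary above; calculators differ slightly (pooled vs. unpooled variance, rounding), so expect small deviations from other tools.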
Key Statistical Concepts
📊 Baseline Conversion Rate
Your current conversion rate before the test. Lower baselines require larger sample sizes to detect the same relative change.
🎯 Minimum Detectable Effect (MDE)
The smallest improvement you care about detecting. Smaller effects need larger samples. A 10% relative lift on 5% baseline = 5.5% new rate.
💪 Statistical Power
Probability of detecting a real effect when it exists (avoiding false negatives). 80% power is standard—you'll detect 80% of real effects.
✅ Significance Level
Confidence that a detected effect is real, not random chance. 95% significance = 5% chance of false positive (Type I error).
Sample Size Trade-offs
| Parameter Change | Effect on Sample Size |
|---|---|
| Smaller MDE (detect smaller effects) | ↑ Larger sample needed |
| Higher power (90% vs 80%) | ↑ Larger sample needed |
| Higher significance (99% vs 95%) | ↑ Larger sample needed |
| Lower baseline conversion | ↑ Larger sample needed |
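Each row of the table can be checked numerically against the same two-proportion formula. A small sketch (helper name is illustrative):

```python
from math import ceil
from statistics import NormalDist

def n_per_variant(baseline, rel_mde, power=0.80, alpha=0.05):
    # Two-sided two-proportion z-test sample size per variant
    z = NormalDist().inv_cdf
    p2 = baseline * (1 + rel_mde)
    var = baseline * (1 - baseline) + p2 * (1 - p2)
    return ceil((z(1 - alpha / 2) + z(power)) ** 2 * var / (p2 - baseline) ** 2)

base = n_per_variant(0.05, 0.10)                       # reference scenario
smaller_mde = n_per_variant(0.05, 0.05)                # halve the MDE
higher_power = n_per_variant(0.05, 0.10, power=0.90)   # 90% power
higher_sig = n_per_variant(0.05, 0.10, alpha=0.01)     # 99% significance
lower_baseline = n_per_variant(0.02, 0.10)             # 2% baseline
```

All four variations come out larger than the reference, matching the table. Note the MDE effect is roughly quadratic: halving the detectable effect nearly quadruples the required sample.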
A/B Testing Best Practices
- Calculate before testing: Determine sample size upfront. Don't peek and stop early.
- Run full duration: Stopping early inflates false positives. Complete the planned runtime.
- Be realistic about MDE: Detecting 2% lifts requires huge samples. Focus on bigger bets.
- Account for weekly cycles: Run tests in full-week increments to capture day-of-week variation.
- One primary metric: Decide your success metric upfront. Multiple metrics need correction.
Frequently Asked Questions
What if I can't wait that long?
Accept a larger MDE (only detect bigger effects), use more traffic, or consider sequential testing methods that allow earlier stopping with proper corrections.
Why is my required sample so large?
Low baseline rates and small MDEs require large samples. A 1% baseline trying to detect a 5% relative lift needs over 600,000 visitors per variant; even a 10% relative lift needs roughly 160,000. Consider testing bigger changes.
Can I test multiple variants?
Yes, but the per-variant sample size applies to each variant, so total traffic scales with the number of variants, and you should apply a multiple-comparison correction (Bonferroni or similar) to maintain your overall significance level.
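One common (and conservative) correction is Bonferroni: divide the significance level by the number of comparisons, which in turn increases the required sample per variant. A sketch, assuming three variants each compared against one control:

```python
from math import ceil
from statistics import NormalDist

def n_per_variant(p1, p2, power=0.80, alpha=0.05):
    # Two-sided two-proportion z-test sample size per variant
    z = NormalDist().inv_cdf
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z(1 - alpha / 2) + z(power)) ** 2 * var / (p2 - p1) ** 2)

k = 3                                  # variant-vs-control comparisons
alpha_corrected = 0.05 / k             # Bonferroni: alpha / comparisons
n_uncorrected = n_per_variant(0.05, 0.055)
n_corrected = n_per_variant(0.05, 0.055, alpha=alpha_corrected)
total = n_corrected * (k + 1)          # every variant plus the control
```

The corrected per-variant sample is noticeably larger than the uncorrected one, and the total multiplies across all arms, which is why multi-variant tests are much more expensive than simple A/B tests.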
