How long should I run an A/B test?

At minimum, run tests for complete business cycles (full weeks, to account for weekday vs weekend behavior) and until you reach your pre-calculated required sample size. Most reliable tests run for 2 to 4 weeks. Tests shorter than 7 days are rarely reliable because a single unusual day or traffic spike can dominate the data. Use a sample size calculator before starting and commit to the timeline regardless of early results.

What is the minimum sample size for a valid A/B test?

Sample size depends on your baseline conversion rate, the minimum lift you want to detect, your desired confidence level, and statistical power. For a landing page converting at 3% where you want to detect a 20% relative lift (to 3.6%), you need roughly 14,000 visitors per variant at 95% confidence (two-tailed) and 80% power. Online sample size calculators (Optimizely, Evan Miller) will give you the exact number for your inputs.
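As a rough sketch of what such calculators compute, the snippet below uses the standard normal-approximation formula for a two-sided test of two proportions. The function name and parameters are hypothetical, and individual tools may use slightly different formulas, so expect their numbers to differ somewhat.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, confidence=0.95, power=0.80):
    """Approximate visitors needed per variant (two-sided test, equal split).

    Assumes the normal approximation; names and defaults are illustrative.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    # Critical value for the two-sided significance level (1.96 at 95%).
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Critical value for the desired power (about 0.84 at 80%).
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # 3% baseline, 20% relative lift, 95% confidence, 80% power.
    print(sample_size_per_variant(0.03, 0.20))
```

Smaller lifts are much more expensive to detect: the required sample grows with the inverse square of the absolute difference between the two rates.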

Can I test more than two variants at once?

Yes, but you must adjust your significance threshold for multiple comparisons. Running five comparisons against the control, each with a 5% false positive rate, gives you roughly a 23% chance (1 - 0.95^5) that at least one false positive appears across the test. Use a Bonferroni correction (divide your alpha by the number of comparisons) or switch to a multi-armed bandit approach for tests with more than 3 to 4 variants.
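The arithmetic behind that figure can be sketched in a few lines. The function names are illustrative, and the calculation assumes the comparisons are independent.

```python
def family_wise_error_rate(alpha, comparisons):
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(alpha, comparisons):
    """Per-comparison threshold after a Bonferroni correction."""
    return alpha / comparisons


if __name__ == "__main__":
    uncorrected = family_wise_error_rate(0.05, 5)
    corrected_alpha = bonferroni_alpha(0.05, 5)
    corrected = family_wise_error_rate(corrected_alpha, 5)
    print(f"Uncorrected family-wise rate: {uncorrected:.3f}")
    print(f"Bonferroni per-comparison alpha: {corrected_alpha:.3f}")
    print(f"Corrected family-wise rate: {corrected:.3f}")
```

With the corrected threshold, the overall false positive rate stays just under the original 5%.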

What should I do if my test never reaches significance?

First, confirm you have reached the pre-calculated required sample size. If you have and the result is still not significant, the true effect is likely smaller than your minimum detectable effect. Either the change has no meaningful impact on conversion, or the lift is below the threshold worth optimizing for. Document the result, archive the test, and move to a higher-impact hypothesis.

A/B Test Significance Calculator

Determine statistical significance for conversion rate tests using a two-proportion z-test with p-value, confidence level, relative lift, and statistical power calculations.

Introduction

Calling a winner too early is one of the most expensive mistakes in conversion rate optimization. A variant that shows a 15% lift after 200 conversions might revert to baseline performance after 2,000 conversions, because the early data was noise, not signal. Evan Miller's writing on A/B testing statistics shows that stopping a test as soon as it looks significant pushes the false positive rate well above the nominal 5%, so teams end up declaring winners on insufficient data. Statistical significance is the safeguard against this. A result is statistically significant when the probability of seeing a difference at least this large, if there were truly no difference between the variants, falls below your chosen threshold, typically 5% (95% confidence level). This calculator takes your control and variant conversion data and returns the statistical significance, the p-value, and whether you have enough traffic to trust the result.

What This Calculator Does

This calculator takes the number of visitors and conversions for both your control (A) and variant (B) groups, then returns the statistical significance of the difference, the p-value, the relative lift in conversion rate, and whether you have reached the minimum sample size required to trust the result at your chosen confidence level. Use it before stopping any A/B test to confirm the result is reliable, not a random fluctuation.

The Formula

Z-Score = (p_B - p_A) / sqrt(p_pooled × (1 - p_pooled) × (1/n_A + 1/n_B))

The p-value is derived from the Z-score using the standard normal distribution (two-tailed).

p_A and p_B are the conversion rates for control and variant (conversions / visitors). p_pooled is the combined conversion rate across both groups (total conversions / total visitors). n_A and n_B are the sample sizes. The Z-score measures how many standard errors the observed difference is from zero. For a two-tailed test, an absolute Z-score above 1.96 corresponds to 95% confidence (p < 0.05), and above 2.58 corresponds to 99% confidence (p < 0.01). The p-value is the probability of seeing a difference this large or larger if there were truly no difference between A and B.
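The formula translates directly into code. The following is a minimal sketch using only the Python standard library, not the calculator's actual implementation. The function name is hypothetical, and the input numbers in the usage example are made up for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(visitors_a, conversions_a, visitors_b, conversions_b):
    """Return (z_score, two_tailed_p_value) for a pooled two-proportion z-test."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled rate: total conversions over total visitors.
    p_pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = math.sqrt(
        p_pooled * (1 - p_pooled) * (1 / visitors_a + 1 / visitors_b)
    )
    z = (p_b - p_a) / standard_error
    # Two-tailed: probability of a difference at least this large in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical data: 10,000 visitors per variant, 500 vs 560 conversions.
    z, p = two_proportion_z_test(10_000, 500, 10_000, 560)
    print(f"Z = {z:.3f}, p = {p:.4f}, significant at 95%: {p < 0.05}")
```

The sketch assumes both groups have at least one conversion and one non-conversion. If the pooled rate is 0 or 1, the standard error is zero and the test is undefined.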

Step-by-Step Example

Record control and variant data

Control (A): 4,200 visitors, 168 conversions = 4.0% conversion rate. Variant (B): 4,150 visitors, 199 conversions = 4.8% conversion rate. Relative lift: (4.8% - 4.0%) / 4.0% × 100 = 20% improvement.

Calculate pooled conversion rate

p_pooled = (168 + 199) / (4,200 + 4,150) = 367 / 8,350 = 4.395%.

Calculate Z-score

Standard error = sqrt(0.04395 × 0.95605 × (1/4,200 + 1/4,150)) = sqrt(0.04395 × 0.95605 × 0.000479) = sqrt(0.00002013) = 0.004487. Z = (0.04795 - 0.04000) / 0.004487 = 1.772. At Z = 1.772, the two-tailed p-value ≈ 0.076 (7.6%). This result is not statistically significant at the 95% confidence level.

Decide: continue or stop

Since p = 0.076 > 0.05, do not call a winner. Continue the test until you reach the required sample size. Detecting a 20% relative lift from a 4.0% baseline with 80% power at 95% confidence requires approximately 10,300 visitors per variant, so at roughly 4,200 visitors per variant this test is less than halfway there. Keep it running in full-week increments until each variant reaches that number.

Real-World Use Cases

E-commerce Product Page CTA Test

A retailer tests 'Add to Cart' vs 'Buy Now' on a product page that sends 3,000 visits per week to each variant. After two weeks (6,000 visits per variant), Control: 3.2% CVR, Variant: 3.9% CVR. Z-score: 2.07, p = 0.038. Statistically significant at 95%. Expected revenue lift: 6,000 weekly visitors × 0.7 percentage points additional CVR × $85 AOV = $3,570 additional weekly revenue. The team rolls out the winner.

Email Subject Line Testing

A brand sends a test to 5,000 subscribers: 2,500 receive subject A (28.4% open rate), 2,500 receive subject B (31.2% open rate). Z-score: 2.16, p = 0.030. Significant at 95%. The team then projects the absolute difference of 2.8 percentage points onto the full list of 50,000 subscribers, which represents 1,400 additional opens per campaign. At their $0.40 revenue per email open, this is worth $560 per send.

Landing Page Headline Test with Insufficient Sample

A SaaS company tests a landing page headline after 3 days. Control: 480 visitors, 19 conversions (3.96%). Variant: 465 visitors, 26 conversions (5.59%). Relative lift of 41% looks exciting. Z-score: 1.18, p = 0.24. Not significant. The team resists pressure to call a winner and continues the test for two more weeks. Final result after 3,800 per variant: the lift shrinks to 12%, far below what the early data suggested. On a baseline near 4%, a 12% relative lift at 3,800 visitors per variant gives a Z-score of only about 1.0, so the team still cannot call it a win and needs a much larger sample to confirm an effect that small.

Comparison

Confidence Level	Z-Score Threshold	p-value	Use Case
90%	1.645	< 0.10	Low-stakes tests, internal tools
95% (standard)	1.960	< 0.05	Most A/B tests, landing pages, CTAs
99%	2.576	< 0.01	Pricing pages, checkout flow, high-revenue changes
99.9%	3.291	< 0.001	Major site-wide rollouts, rebranding decisions
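The thresholds in the table are the two-tailed critical values of the standard normal distribution. As a small sketch (the function name is illustrative), they can be reproduced for any confidence level with the Python standard library:

```python
from statistics import NormalDist


def z_threshold(confidence):
    """Two-tailed critical Z value for a confidence level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99, 0.999):
        print(f"{level:.1%} confidence -> Z threshold {z_threshold(level):.3f}")
```

This is useful when a team settles on a confidence target that is not in the table, such as 97.5%.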

Common Mistakes to Avoid

Peeking at results and stopping when it looks significant. If you check an A/B test daily and stop as soon as p < 0.05 appears, your true false positive rate is no longer 5%. It is often 20% or more. Pre-commit to your sample size and end date before running the test, and do not act on results before reaching the predetermined endpoint.
Running tests on segments with different traffic sources. If your control gets 70% organic traffic and your variant gets 70% paid traffic, you are not testing the headline. You are testing the audience. Ensure randomization is truly random and that traffic composition is balanced across variants before drawing conclusions.
Confusing statistical significance with practical significance. A lift of 0.1 percentage points in conversion rate that reaches 99% statistical significance at 500,000 visits is mathematically real but may not be worth implementing if the development cost exceeds the revenue impact. Always translate the result into dollar impact before deciding to ship.
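The effect of peeking, the first mistake above, can be demonstrated with a small simulation. The sketch below runs A/A tests, where both variants share the same true conversion rate, so every significant result is a false positive. It compares stopping at the first significant interim look against evaluating only once at the planned end. All names and parameter values are illustrative assumptions, and the exact rates will vary with the seed and settings.

```python
import math
import random
from statistics import NormalDist


def is_significant(n_a, c_a, n_b, c_b, alpha=0.05):
    """Two-tailed pooled z-test for two proportions."""
    pooled = (c_a + c_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        return False  # No variation, so the test is undefined.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (c_b / n_b - c_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def false_positive_rates(rate=0.10, looks=10, visitors_per_look=100,
                         simulations=1000, seed=1):
    """Return (rate_with_peeking, rate_at_planned_end) for simulated A/A tests."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _sim in range(simulations):
        c_a = c_b = n = 0
        any_look_significant = False
        for _look in range(looks):
            # Both arms draw from the same true rate: there is no real effect.
            c_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            c_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            if is_significant(n, c_a, n, c_b):
                any_look_significant = True
        peeking_hits += any_look_significant
        final_hits += is_significant(n, c_a, n, c_b)
    return peeking_hits / simulations, final_hits / simulations


if __name__ == "__main__":
    peeking, final = false_positive_rates()
    print(f"False positive rate when stopping at first significant look: {peeking:.1%}")
    print(f"False positive rate when evaluating only at the end: {final:.1%}")
```

The rate for the single final evaluation should land near the nominal 5%, while the peeking rate comes out several times higher, which is why pre-committing to an endpoint matters.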

Frequently Asked Questions

Accuracy and Disclaimer

This calculator provides statistical significance estimates based on the conversion data you enter. Results assume a two-tailed z-test for proportions with independent samples. Statistical significance does not guarantee practical significance or long-term performance. Consult a data scientist or CRO specialist for complex experimental designs or high-stakes decisions.

Conclusion

Statistical significance tells you whether a result is unlikely to be due to chance. It does not tell you whether it is meaningful or permanent. A 2% lift in conversion rate that is 99% statistically significant can still be outweighed by implementation costs or segment effects. Always pair statistical significance with practical significance: is the lift large enough to matter? After confirming significance here, use the ROAS Calculator to quantify the revenue impact of the lift before committing to the change, and monitor the winning variant for 2 to 4 additional weeks post-launch to confirm the effect holds.

Related Marketing & Advertising Calculators

ROAS Calculator

Calculate return on ad spend, net profit after COGS, cost per conversion, and break-even ROAS for any paid advertising campaign using 2026 channel benchmarks.

Email Marketing ROI Calculator

Calculate email marketing ROI, revenue per email, subscriber lifetime value, and cost per subscriber from open rates, click rates, and conversion data using 2026 benchmarks.

Social Media Ad Budget Calculator

Estimate impressions, clicks, conversions, and expected reach from your social media ad budget using 2026 CPM and CTR benchmarks for Facebook, Instagram, TikTok, LinkedIn, YouTube, and X.

Content Marketing ROI Calculator

Measure content marketing ROI by comparing organic traffic value and lead generation against content production costs, tools, and team expenses with payback period analysis.

SEO Traffic Value Calculator

Calculate the dollar equivalent of your organic search traffic by applying industry-specific CPC rates to monthly visits, with branded and non-branded traffic segmentation.

Influencer Pricing Calculator

Estimate influencer rates by follower count, engagement rate, platform, content type, and niche with cost per engagement and CPM analysis using 2026 creator economy data.
