Statistical Power Calculator

Determine the statistical power of your study or calculate required sample size to achieve desired power. Supports Z-tests, T-tests, and Chi-square tests for post-hoc and a priori power analysis.

This statistical power calculator performs post-hoc power analysis to determine the probability of detecting a true effect given your sample size, effect size, and significance level. Use it to evaluate completed studies, assess the adequacy of pilot data, or optimize a study design. Power of 80% or higher is conventionally considered adequate.

Introduction

Published research is riddled with underpowered studies that failed to detect real effects and, paradoxically, with claimed discoveries that could not be reproduced. The Open Science Collaboration's 2015 replication project found that only 36% of replications of psychology findings produced statistically significant results, and replication effect sizes averaged about half of the originals. Inflated effect estimates from underpowered studies that happened to cross the significance threshold by chance are a commonly cited contributor. Statistical power is the probability that a study correctly rejects a false null hypothesis, and 80% is the conventional minimum for well-designed research. A study with 60% power has a 40% chance of missing a real effect entirely, which puts every resource invested at risk. Power is also used as a diagnostic after the fact: reviewers, meta-analysts, and journal editors increasingly calculate the power of published studies to evaluate whether negative results are truly negative or just inconclusive. This calculator computes statistical power for any combination of sample size, effect size, and significance level so researchers can evaluate study designs prospectively and audit published findings retrospectively.

What This Calculator Does

This calculator computes statistical power for two common designs: comparison of two independent means (t-test) and comparison of two proportions (z-test). Inputs include sample size per group (or total N for paired designs), expected effect size (Cohen's d for means, or raw proportions for proportions tests), and significance level (alpha). The tool outputs: power as a probability, the probability of Type II error (beta = 1 - power), the required sample size to achieve 80% and 90% power, and a sensitivity analysis showing power across a range of effect sizes for the given N.

The Formula

Two-means test: Power = Φ( |μ1 - μ2| / (σ × √(2/n)) - Z_(α/2) )
Two-proportions test: Power = Φ( |p1 - p2| / √(p̄(1 - p̄) × 2/n) - Z_(α/2) ), where p̄ = (p1 + p2)/2
Φ = standard normal cumulative distribution function; Z_(α/2) = two-tailed critical value (1.960 for alpha = 0.05); n = sample size per group

For the two-means t-test, power depends on the non-centrality parameter: the true difference in means divided by the standard error of the difference (which depends on the common standard deviation and the sample size per group). For proportions, the non-centrality parameter is the assumed true difference in proportions divided by a standard error based on the average proportion p̄. In both cases, power is the probability that the test statistic exceeds the critical value given the true effect. A larger N, a larger effect, or a higher alpha each increases power. Reducing alpha (e.g., from 0.05 to 0.01) substantially reduces power at a given N.
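The two formulas can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the calculator's actual implementation; the function names `power_two_means` and `power_two_proportions` are hypothetical, and the sketch assumes a two-tailed test with equal group sizes and ignores the negligible far rejection tail, as the formula does.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def power_two_means(mu1, mu2, sigma, n, alpha=0.05):
    """Approximate power for comparing two independent means (n per group)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)      # Z_(alpha/2)
    ncp = abs(mu1 - mu2) / (sigma * sqrt(2 / n))    # non-centrality parameter
    return STD_NORMAL.cdf(ncp - z_crit)


def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power for comparing two proportions (n per group)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2                           # average proportion
    ncp = abs(p1 - p2) / sqrt(p_bar * (1 - p_bar) * 2 / n)
    return STD_NORMAL.cdf(ncp - z_crit)


if __name__ == "__main__":
    # Same design evaluated at two significance levels:
    # the stricter alpha yields lower power at the same n.
    for alpha in (0.05, 0.01):
        p = power_two_means(mu1=105, mu2=100, sigma=15, n=100, alpha=alpha)
        print(f"alpha={alpha}: power={p:.3f}")
```

Because both functions reduce to Φ(non-centrality - critical value), anything that raises the non-centrality parameter (more participants, a larger difference, less variance) or lowers the critical value (a higher alpha) raises power.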

Step-by-Step Example

1. Enter study parameters

Example: a two-group experiment has already been conducted with 45 participants per group (N=90 total). Effect size the study was meant to detect: Cohen's d = 0.35 (small-to-medium). Significance level: alpha = 0.05 (two-tailed). Standard deviation: 15 units.

2. Calculate the non-centrality parameter

For Cohen's d = 0.35, true difference = 0.35 × 15 = 5.25 units. Standard error of difference: 15 × √(2/45) = 15 × 0.2108 = 3.162. Non-centrality: 5.25 / 3.162 = 1.660.

3. Determine power

Power = Φ(1.660 - 1.960) = Φ(-0.300) = 1 - Φ(0.300) = 1 - 0.618 = 0.382. This study has only 38% power — there was a 62% chance of missing the true effect. A non-significant result here is uninformative, not evidence of no effect.

4. Determine what sample size would have achieved 80% power

For d=0.35, alpha=0.05, 80% power: required n = 2 × [(1.960 + 0.842) / 0.35]² = 2 × [8.006]² = 2 × 64.1 = 128.2, which rounds up to 129 per group (258 total). With 45 per group, the study had 80% power only for effects of about Cohen's d = (1.960 + 0.842) × √(2/45) = 0.59, meaning it was adequately powered only to detect a medium-to-large effect.
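The four steps can be retraced in a short script. This is a sketch under the same normal approximation, using only Python's standard library; the variable names are illustrative, and the script recomputes the critical values instead of hard-coding 1.960 and 0.842.

```python
from math import ceil, sqrt
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution

# Step 1: study parameters from the example
n_per_group = 45
d = 0.35            # Cohen's d of interest
sd = 15.0           # common standard deviation
alpha = 0.05        # two-tailed significance level
target_power = 0.80

z_alpha = norm.inv_cdf(1 - alpha / 2)   # about 1.960
z_beta = norm.inv_cdf(target_power)     # about 0.842

# Step 2: non-centrality parameter
true_difference = d * sd
se_difference = sd * sqrt(2 / n_per_group)
ncp = true_difference / se_difference

# Step 3: power at the actual sample size
power = norm.cdf(ncp - z_alpha)

# Step 4: per-group n for the target power, rounded up to a whole person,
# and the smallest d the actual design detects with the target power
required_n = ceil(2 * ((z_alpha + z_beta) / d) ** 2)
min_detectable_d = (z_alpha + z_beta) * sqrt(2 / n_per_group)

print(f"non-centrality parameter: {ncp:.3f}")
print(f"power at n={n_per_group} per group: {power:.3f}")
print(f"required n per group for {target_power:.0%} power: {required_n}")
print(f"detectable d at {target_power:.0%} power: {min_detectable_d:.2f}")
```

Rounding the required sample size up rather than to the nearest integer keeps the design at or above the target power.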

Real-World Use Cases

Grant Application Reviewer Assessment

A study section reviewer for NIH receives a grant application reporting a pilot study with N=30 and p=0.08 (not significant). Using the power calculator with the effect size from the pilot (d=0.45) and N=15 per group, the reviewer finds power of about 23%. The non-significant pilot result is consistent with an underpowered study of a real medium effect, not only with no effect. The reviewer recommends funding the properly powered main study.

Systematic Review Power Audit

A meta-analyst reviewing 12 studies on a behavioral intervention calculates the power of each study to detect the pooled effect size (d=0.32) from the meta-analysis. She finds that 8 of 12 studies had power below 50%, explaining why 7 of those 8 reported non-significant results. The meta-analysis conclusion — that the intervention works despite the many negative individual studies — is supported by the power audit showing systematic underpowering rather than a true null effect.

A/B Test Post-Analysis for Tech Researcher

A data scientist ran an A/B test with 1,200 users per variant for 14 days. The conversion rates were 2.1% vs. 2.5%, a difference that was not statistically significant at alpha=0.05. Using the power calculator with p1=0.021, p2=0.025, N=1,200, the analysis reveals power of only about 10%: the test was badly underpowered to detect a 0.4 percentage point difference. The enrollment needed to achieve 80% power for this effect size is roughly 22,000 per variant.
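The A/B test figures can be checked with the two-proportions formula. The sketch below is illustrative only: the helper names `ab_test_power` and `ab_test_required_n` are hypothetical, and it assumes equal traffic per variant, a two-tailed test, and the normal approximation with the averaged proportion.

```python
from math import ceil, sqrt
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution


def ab_test_power(p1, p2, n, alpha=0.05):
    """Approximate power to detect p1 vs. p2 with n users per variant."""
    p_bar = (p1 + p2) / 2
    se = sqrt(p_bar * (1 - p_bar) * 2 / n)
    return norm.cdf(abs(p1 - p2) / se - norm.inv_cdf(1 - alpha / 2))


def ab_test_required_n(p1, p2, power=0.80, alpha=0.05):
    """Users per variant needed to reach the target power (rounded up)."""
    p_bar = (p1 + p2) / 2
    z_sum = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    return ceil(2 * p_bar * (1 - p_bar) * (z_sum / (p1 - p2)) ** 2)


if __name__ == "__main__":
    print(f"power at 1,200 per variant: {ab_test_power(0.021, 0.025, 1200):.3f}")
    print(f"n per variant for 80% power: {ab_test_required_n(0.021, 0.025)}")
```

Low base rates make small absolute differences expensive to detect: the standard error shrinks only with the square root of n, so halving the detectable difference requires roughly four times the traffic.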

Comparison

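Power at alpha = 0.05 (two-tailed), computed with the normal approximation:

Sample Size per Group | Cohen's d = 0.20 (Small) | Cohen's d = 0.50 (Medium) | Cohen's d = 0.80 (Large)
25  | 11% | 42%  | 81%
50  | 17% | 71%  | 98%
100 | 29% | 94%  | >99%
200 | 52% | >99% | >99%
400 | 81% | >99% | >99%
800 | 98% | >99% | >99%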
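A grid like this can be generated directly from the two-means formula. The following sketch assumes alpha = 0.05 (two-tailed) and the normal approximation; the function name `approx_power` is illustrative, and exact t-based software will give slightly different values, especially at the smallest sample sizes.

```python
from math import sqrt
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution
ALPHA = 0.05
Z_CRIT = norm.inv_cdf(1 - ALPHA / 2)  # two-tailed critical value


def approx_power(d, n_per_group):
    """Normal-approximation power for Cohen's d with n per group."""
    return norm.cdf(d * sqrt(n_per_group / 2) - Z_CRIT)


sample_sizes = [25, 50, 100, 200, 400, 800]
effect_sizes = [0.20, 0.50, 0.80]

# Power for every (sample size, effect size) combination
power_grid = {
    (n, d): approx_power(d, n) for n in sample_sizes for d in effect_sizes
}

print("n/group  " + "  ".join(f"d={d:.2f}" for d in effect_sizes))
for n in sample_sizes:
    cells = "  ".join(f"{power_grid[(n, d)]:6.1%}" for d in effect_sizes)
    print(f"{n:7d}  {cells}")
```

Reading down the small-effect column shows why studies of subtle effects need hundreds of participants per group before power approaches the 80% convention.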

Common Mistakes to Avoid

  • Interpreting a non-significant result from a low-powered study as evidence of no effect. A study with 40% power that finds p=0.12 has not shown the null hypothesis is true. It has shown only that it was not adequately designed to detect the effect if it exists. Stating 'there is no effect' based on a low-powered non-significant result is a fundamental statistical error that frequently appears in published literature.

  • Confusing the probability of detecting an effect with the probability the effect is real. Power addresses design adequacy, not the probability that the alternative hypothesis is true. A study with 95% power has a 95% chance of detecting the specified effect if it truly exists. It says nothing about whether the true effect is the specified size or exists at all.

  • Not reporting the power of negative studies. A null result from a 90%-powered study is meaningful evidence of a small or nonexistent effect. A null result from a 30%-powered study is nearly meaningless. Journals that publish negative results without power information force readers to make their own assessments, leading to misinterpretation of evidence.

  • Using post-hoc power calculated from the observed effect size. Using the observed effect from a completed study to calculate its power produces a circular calculation that is mathematically equivalent to reporting the p-value in a different form. The power calculation for study evaluation should use the effect size the study was designed to detect, not the observed effect from the data.
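The circularity in the last point can be made concrete. Under the normal approximation, power computed from the observed effect depends only on the p-value and alpha, so it adds no new information. The sketch below is illustrative: the function name `observed_power_from_p` is hypothetical, and it assumes a two-tailed test while ignoring the negligible far rejection tail.

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution


def observed_power_from_p(p_value, alpha=0.05):
    """'Observed power' implied by a two-tailed p-value alone."""
    z_observed = norm.inv_cdf(1 - p_value / 2)   # |z| that produced this p
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Treat the observed z as if it were the true non-centrality parameter
    return norm.cdf(z_observed - z_crit)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.12, 0.50):
        print(f"p = {p:.2f} -> observed power = {observed_power_from_p(p):.3f}")
```

A result sitting exactly at p = alpha always maps to observed power of 50%, and any non-significant result maps to observed power below 50%, regardless of the study's design.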

Accuracy and Disclaimer

Statistical power calculations in this tool use standard normal approximations suitable for large-sample inference. For small samples, exact methods or simulation-based power calculations may be more appropriate. Results are for planning and educational purposes only. For regulatory submissions, clinical trial protocols, or grant applications, power calculations should be reviewed by a qualified biostatistician using appropriate software.

Conclusion

Power calculations belong at both ends of a study: before enrollment to set the sample size target, and before submission to check whether negative results are truly negative rather than merely inconclusive because the study was underpowered. Use the Sample Size Calculator to work the problem in the opposite direction (required N for a given power target), and the Confidence Interval Calculator to translate power and sample size into the expected width of your result intervals.