What is the relationship between power, alpha, and sample size?

Power (1-beta), alpha, and N form an interdependent system: for a given effect size, fixing any two determines the third. Increasing N increases power at fixed alpha. Decreasing alpha (being more stringent) reduces power at fixed N. The 80% power convention with alpha=0.05 is a balance point, not a law. Some fields use 90% power with alpha=0.01, particularly in clinical trial contexts where Type I or Type II errors have serious consequences.
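The interdependence can be sketched in a few lines. This is a minimal illustration, assuming two equal-sized independent groups, a two-sided test, and the normal approximation; the function names power_from, n_from, and alpha_from are hypothetical, and the small contribution of the opposite rejection tail is ignored.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_from(n, alpha, d):
    """Approximate two-sided power for two groups of n each at effect size d."""
    return Z.cdf(d * sqrt(n / 2) - Z.inv_cdf(1 - alpha / 2))

def n_from(power, alpha, d):
    """Per-group sample size (unrounded) that yields the requested power."""
    return 2 * ((Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) / d) ** 2

def alpha_from(power, n, d):
    """Two-sided alpha at which n per group yields the requested power."""
    return 2 * (1 - Z.cdf(d * sqrt(n / 2) - Z.inv_cdf(power)))

d = 0.5                                   # assumed effect size
n = n_from(0.80, 0.05, d)                 # fix power and alpha, solve for N
print(round(n, 1))
print(round(power_from(n, 0.05, d), 3))   # fix N and alpha, recover power
print(round(alpha_from(0.80, n, d), 3))   # fix N and power, recover alpha
print(power_from(n, 0.01, d) < power_from(n, 0.05, d))  # stricter alpha costs power
```

Each function inverts the same relationship, which is why any two of the three quantities pin down the third once the effect size is fixed.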

What does Cohen's d mean and how do I estimate it?

Cohen's d is the standardized mean difference between two groups: (mean1 - mean2) / pooled standard deviation. Values of 0.2 are considered small (subtle effects, large N required), 0.5 medium (detectable with moderate N), and 0.8 large (easily detectable). For novel research with no pilot data, use the minimum meaningful difference divided by the expected standard deviation of your outcome measure, based on the measurement scale and expected variability.
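As a rough sketch of both routes to an effect size, the snippet below computes Cohen's d from two samples using the pooled standard deviation, and a planning value from a minimum meaningful difference. The data values and the helper names cohens_d and planning_d are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # variance() is the sample variance (n - 1 denominator)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

def planning_d(min_meaningful_difference, expected_sd):
    """Effect size for planning when no pilot data exist."""
    return min_meaningful_difference / expected_sd

# Illustrative numbers only
treated = [12.0, 15.0, 11.0, 14.0, 13.0]
control = [10.0, 12.0, 9.0, 11.0, 13.0]
print(round(cohens_d(treated, control), 2))   # 1.26
print(round(planning_d(5.0, 15.0), 2))        # 0.33
```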

What power level should I target for my study?

The conventional minimum is 80%, but many disciplines and journals now expect 90% power or an explicit justification for any deviation. For studies with high costs per participant, novel interventions with uncertain effect sizes, or confirmatory trials, 90% power is a common target. Exploratory studies may accept 80% when resources are constrained. IRB and funding review committees will expect power justification regardless of the level chosen.

How does multiple testing affect power?

When multiple hypotheses are tested, alpha is typically adjusted downward to control the family-wise error rate (Bonferroni: divide alpha by the number of tests). A lower alpha reduces power at fixed N. A study testing 5 outcomes with Bonferroni correction (alpha = 0.05 / 5 = 0.01 per test) needs a substantially larger N to maintain 80% power for each test than a single-outcome study at alpha=0.05. Power calculations for multi-endpoint studies require specifying the correction method.
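To make the cost of the correction concrete, the sketch below compares the per-group sample size for one test at alpha=0.05 with the per-test requirement under a Bonferroni correction for 5 tests. It assumes two equal groups, a two-sided test, the normal approximation, and an illustrative effect size of d = 0.5; n_per_group is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def n_per_group(d, alpha, power=0.80):
    """Per-group n for a two-sided, two-sample comparison (normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

d = 0.5        # assumed effect size for every outcome
tests = 5      # number of outcomes tested
single = n_per_group(d, alpha=0.05)
bonferroni = n_per_group(d, alpha=0.05 / tests)
print(single, bonferroni)   # 63 94
```

Under these assumptions the corrected design needs about half again as many participants per group to keep 80% power on each test.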

Statistical Power Calculator

Determine the statistical power of your study or calculate required sample size to achieve desired power. Supports Z-tests, T-tests, and Chi-square tests for post-hoc and a priori power analysis.

This statistical power calculator performs post-hoc power analysis to determine the probability of detecting a true effect given your sample size, effect size, and significance level. Use it to evaluate completed studies, assess the adequacy of pilot data, or optimize a study design. Power of 80% or higher is conventionally considered adequate.

Introduction

Published research is riddled with underpowered studies that failed to detect real effects and, paradoxically, with claimed discoveries that could not be reproduced. The Open Science Collaboration replication study found that only 36% of replication attempts in psychology produced a statistically significant result, and replication effect sizes averaged about half of the originals, a pattern commonly attributed in part to inflated effect estimates from underpowered original studies that happened to cross the significance threshold. Statistical power is the probability that a study correctly rejects a false null hypothesis, and 80% is conventionally treated as the minimum acceptable threshold for well-designed research. A study with 60% power has a 40% chance of missing a real effect entirely, wasting every resource invested. But power is also a post-hoc diagnostic: reviewers, meta-analysts, and journal editors increasingly calculate the power of published studies to evaluate whether negative results are truly negative or just inconclusive. This calculator computes statistical power for any combination of sample size, effect size, and significance level so researchers can evaluate study designs prospectively and audit published findings retrospectively.

What This Calculator Does

This calculator computes statistical power for two common designs: comparison of two independent means (t-test) and comparison of two proportions (z-test). Inputs include sample size per group (or total N for paired designs), expected effect size (Cohen's d for means, or raw proportions for proportions tests), and significance level (alpha). The tool outputs: power as a probability, the probability of Type II error (beta = 1 - power), the required sample size to achieve 80% and 90% power, and a sensitivity analysis showing power across a range of effect sizes for the given N.

The Formula

For the two-means t-test, power depends on the non-centrality parameter: the true difference in means divided by the standard error of the difference (which depends on the common standard deviation and sample size per group). For proportions, the non-centrality parameter is the assumed true difference in proportions divided by its standard error under those proportions. In both cases, power is the probability that the test statistic exceeds the critical value given the true effect. Higher N, larger effect, or higher alpha each increase power. Reducing alpha (e.g., from 0.05 to 0.01) substantially reduces power at a given N.
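The description above can be written out as a short sketch. It is not the calculator's own code: it assumes the normal approximation, a two-sided test, and an unpooled standard error for proportions, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def two_sided_power(ncp, alpha=0.05):
    """P(|test statistic| > critical value) when the statistic is N(ncp, 1)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(ncp - z_crit) + Z.cdf(-ncp - z_crit)

def ncp_two_means(mean_diff, sd, n1, n2):
    """True mean difference over the standard error of the difference."""
    return mean_diff / (sd * sqrt(1 / n1 + 1 / n2))

def ncp_two_proportions(p1, p2, n1, n2):
    """True difference in proportions over its (unpooled) standard error."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Illustrative inputs: difference of 5 units, SD 10 (d = 0.5), 64 per group
base = two_sided_power(ncp_two_means(5.0, 10.0, 64, 64))
more_n = two_sided_power(ncp_two_means(5.0, 10.0, 128, 128))
bigger_effect = two_sided_power(ncp_two_means(8.0, 10.0, 64, 64))
stricter_alpha = two_sided_power(ncp_two_means(5.0, 10.0, 64, 64), alpha=0.01)
print(f"baseline {base:.3f}, larger N {more_n:.3f}, "
      f"larger effect {bigger_effect:.3f}, alpha=0.01 {stricter_alpha:.3f}")

props = two_sided_power(ncp_two_proportions(0.30, 0.20, 150, 150))
print(f"proportions example {props:.3f}")
```

The second term in two_sided_power is the chance of rejecting in the wrong direction; it is negligible unless the effect is very small, which is why hand calculations usually drop it.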

Step-by-Step Example

Enter study parameters

Example: a two-group experiment has already been conducted with 45 participants per group (N=90 total). Effect size the study aimed to detect: Cohen's d = 0.35 (small-medium). Significance level: alpha = 0.05 (two-tailed). Standard deviation: 15 units.

Calculate the non-centrality parameter

For Cohen's d = 0.35, true difference = 0.35 × 15 = 5.25 units. Standard error of difference: 15 × √(2/45) = 15 × 0.2108 = 3.162. Non-centrality: 5.25 / 3.162 = 1.660.

Determine power

Power = Φ(1.660 - 1.960) = Φ(-0.300) = 1 - Φ(0.300) = 1 - 0.618 = 0.382. This study has only 38% power — there was a 62% chance of missing the true effect. A non-significant result here is uninformative, not evidence of no effect.
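The three steps can be checked with a short script. This sketch mirrors the hand calculation, using the normal approximation and only the upper rejection tail; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# Step 1: study parameters from the example
d, sd, n_per_group, alpha = 0.35, 15.0, 45, 0.05

# Step 2: non-centrality parameter
true_diff = d * sd                        # 5.25 units
se_diff = sd * sqrt(2 / n_per_group)      # about 3.162
ncp = true_diff / se_diff                 # about 1.660

# Step 3: power under the normal approximation
z_crit = Z.inv_cdf(1 - alpha / 2)         # about 1.960
power = Z.cdf(ncp - z_crit)

print(round(true_diff, 2), round(se_diff, 3), round(ncp, 3))
print(round(power, 3))                    # 0.382
```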

Determine what sample size would have achieved 80% power

For d=0.35, alpha=0.05, 80% power: required n = 2 × [(1.960 + 0.842) / 0.35]² = 2 × [8.006]² = 2 × 64.1 = 128.2, which rounds up to 129 per group (258 total). With 45 per group, the study had 80% power only for effects of about Cohen's d = (1.960 + 0.842) × √(2/45) = 0.59, meaning it was adequately powered only for a medium-to-large effect.
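Both directions of this calculation, the required sample size for a given effect and the smallest effect detectable at a given sample size, can be sketched as follows. The helper names are hypothetical, and the formulas assume two equal groups, a two-sided test, and the normal approximation.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def required_n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n, rounded up, for a two-sided two-sample comparison."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return ceil(2 * (z_sum / d) ** 2)

def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest Cohen's d detectable with the given power at n per group."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return z_sum * sqrt(2 / n_per_group)

print(required_n_per_group(0.35))              # 129
print(required_n_per_group(0.35, power=0.90))  # 172
print(round(min_detectable_d(45), 2))          # 0.59
```

Rounding up rather than to the nearest integer keeps the achieved power at or above the target.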

Real-World Use Cases

Grant Application Reviewer Assessment

A study section reviewer for NIH receives a grant application reporting a pilot study with N=30 and p=0.08 (not significant). Using the power calculator with the effect size from the pilot (d=0.45) and N=15 per group, the reviewer finds power of roughly 23%. The non-significant pilot result is consistent with an underpowered study detecting a real medium effect, not with no effect. The reviewer recommends funding the properly powered main study.

Systematic Review Power Audit

A meta-analyst reviewing 12 studies on a behavioral intervention calculates the power of each study to detect the pooled effect size (d=0.32) from the meta-analysis. She finds that 8 of 12 studies had power below 50%, explaining why 7 of those 8 reported non-significant results. The meta-analysis conclusion — that the intervention works despite the many negative individual studies — is supported by the power audit showing systematic underpowering rather than a true null effect.

A/B Test Post-Analysis for Tech Researcher

A data scientist ran an A/B test with 1,200 users per variant for 14 days. The conversion rates were 2.1% vs. 2.5%, a difference that was not statistically significant at alpha=0.05. Using the power calculator with p1=0.021, p2=0.025, and N=1,200 per variant, the analysis shows power of only about 10%: the test was badly underpowered to detect a 0.4 percentage point difference. Reaching 80% power for this effect size takes roughly 22,000 users per variant.
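For the A/B scenario, a two-proportion power check might look like the sketch below. It assumes a two-sided z-test with an unpooled standard error and the normal approximation, so other tools may differ slightly; variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

p1, p2 = 0.021, 0.025      # conversion rates of the two variants
n = 1200                   # users per variant
alpha, target_power = 0.05, 0.80

# Power at the observed sample size
var_sum = p1 * (1 - p1) + p2 * (1 - p2)
ncp = abs(p2 - p1) / sqrt(var_sum / n)
z_crit = Z.inv_cdf(1 - alpha / 2)
power = Z.cdf(ncp - z_crit) + Z.cdf(-ncp - z_crit)

# Users per variant needed for the target power
z_sum = z_crit + Z.inv_cdf(target_power)
n_needed = ceil(z_sum ** 2 * var_sum / (p2 - p1) ** 2)

print(f"power at n={n}: {power:.2f}")
print(f"n per variant for {target_power:.0%} power: {n_needed}")
```

With base rates this low, the variance of each proportion is large relative to a 0.4 percentage point difference, which is why the required sample runs into the tens of thousands.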

Comparison

Approximate power at two-sided alpha = 0.05 (two equal groups, normal approximation)
Sample Size per Group	Cohen's d = 0.20 (Small)	Cohen's d = 0.50 (Medium)	Cohen's d = 0.80 (Large)
25	11%	42%	81%
50	17%	71%	98%
100	29%	94%	>99%
200	52%	>99%	>99%
400	81%	>99%	>99%
800	98%	>99%	>99%
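A grid like this one can be regenerated with a few lines. The sketch assumes two equal groups, a two-sided test at alpha = 0.05, and the normal approximation; an exact t-based calculation gives somewhat lower values at small n.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power(d, n_per_group, alpha=0.05):
    """Two-sided power for two equal groups under the normal approximation."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n_per_group / 2)
    return Z.cdf(ncp - z_crit) + Z.cdf(-ncp - z_crit)

effect_sizes = (0.20, 0.50, 0.80)
print("n/group\t" + "\t".join(f"d={d:.2f}" for d in effect_sizes))
for n in (25, 50, 100, 200, 400, 800):
    cells = [f"{100 * power(d, n):.0f}%" for d in effect_sizes]
    print(f"{n}\t" + "\t".join(cells))
```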

Common Mistakes to Avoid

Interpreting a non-significant result from a low-powered study as evidence of no effect. A study with 40% power that finds p=0.12 has not shown the null hypothesis is true; it has shown only that it was not adequately designed to detect the effect if it exists. Stating 'there is no effect' on the basis of a low-powered non-significant result is a fundamental statistical error that frequently appears in published literature.
Confusing the probability of detecting an effect with the probability the effect is real. Power addresses design adequacy, not the probability that the alternative hypothesis is true. A study with 95% power has a 95% chance of detecting the specified effect if it truly exists. It says nothing about whether the true effect is the specified size or exists at all.
Not reporting the power of negative studies. A null result from a 90%-powered study is meaningful evidence of a small or nonexistent effect. A null result from a 30%-powered study is nearly meaningless. Journals that publish negative results without power information force readers to make their own assessments, leading to misinterpretation of evidence.
Using post-hoc power calculated from the observed effect size. Using the observed effect from a completed study to calculate its power produces a circular calculation that is mathematically equivalent to reporting the p-value in a different form. The power calculation for study evaluation should use the effect size the study was designed to detect, not the observed effect from the data.

Frequently Asked Questions

Accuracy and Disclaimer

Statistical power calculations in this tool use standard normal approximations suitable for large-sample inference. For small samples, exact methods or simulation-based power calculations may be more appropriate. Results are for planning and educational purposes only. For regulatory submissions, clinical trial protocols, or grant applications, power calculations should be reviewed by a qualified biostatistician using appropriate software.

Conclusion

Power calculations belong at both ends of a study: before enrollment to set the sample size target, and before submission to check whether negative results are truly negative or merely inconclusive because the study was underpowered. Use the Sample Size Calculator to work the problem in the opposite direction (required N for a given power target), and the Confidence Interval Calculator to translate power and sample size into the expected width of your result intervals.

