Sample Size and Statistical Power: Definition, Formulas, Calculations, and Worked Examples

Key Takeaways

  • Power analysis links four interlocking values: sample size, effect size, alpha, and power; fixing any three determines the fourth.
  • The right sample size formula depends on study design: correlational, cross-sectional, longitudinal, and experimental studies each use different equations and inputs.
  • A target power of 0.80 and alpha of 0.05 are conventional starting points, but longitudinal and cluster designs require upward adjustment for attrition and design effects.
  • Free tools such as G*Power, R, and Python can match the accuracy of paid software such as SPSS, SAS, Stata, and PASS for most standard designs.

Glossary of Key Terms

These terms recur throughout the guide; refer back to this list whenever a formula or example uses one of them.

Term | Definition
Statistical power | The probability that a study correctly detects a true effect; conventionally set at 0.80, meaning an 80% chance of detecting the effect if it exists.
Alpha (α) | The Type I error rate: the probability of concluding there is an effect when none exists; commonly set at 0.05.
Beta (β) | The Type II error rate: the probability of missing a true effect; power equals 1 minus beta.
Effect size | A standardized measure of the magnitude of a difference or relationship, independent of sample size; examples include Cohen’s d, r, and f.
Sample size (n) | The number of participants, observations, or units included in a study or in each study arm.
Null hypothesis | The default assumption that there is no effect, no difference, or no relationship between variables.
Confidence interval | A range of values, calculated from sample data, that is likely to contain the true population value at a stated probability, typically 95%.
Margin of error | The maximum expected difference between a sample estimate and the true population value, at a given confidence level.
Attrition rate | The proportion of enrolled participants who drop out of a study before it ends, particularly relevant to longitudinal designs.
Design effect (DEFF) | A multiplier applied to a simple random sample size to account for reduced precision caused by clustering or complex sampling.
A priori power analysis | A power calculation performed before data collection to determine the required sample size.
Post hoc power analysis | A power calculation performed after data collection; generally discouraged because it is mathematically tied to the p-value already obtained.

What Is Statistical Power and Why Does It Matter?

Statistical power is the probability that a study will detect a real effect if one truly exists in the population; a power of 0.80 means an 80% chance of finding a genuine effect and a 20% chance of missing it.

Power matters because an underpowered study wastes participant time, funding, and ethical goodwill on a result that cannot reliably distinguish a true effect from noise. An overpowered study, by contrast, may detect trivially small effects that have no practical importance, and it uses more resources than necessary.

Type I and Type II Errors

Every hypothesis test carries two possible errors, summarized below.

Decision | Null Hypothesis Is True | Null Hypothesis Is False
Reject the null | Type I error, false positive; probability = alpha | Correct decision; probability = power (1 minus beta)
Fail to reject the null | Correct decision; probability = 1 minus alpha | Type II error, false negative; probability = beta

The Four Building Blocks of Every Power Analysis

  • Sample size (n): the number of participants or observations.
  • Effect size: the magnitude of the difference, relationship, or association the study is designed to detect.
  • Alpha (α): the significance threshold, almost always 0.05 for a two-tailed test.
  • Power (1 minus β): the desired probability of detecting the effect, conventionally 0.80 or 0.90.

Any three of these four values can be used to solve for the fourth; sample size planning solves for n given the other three.
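The "any three determine the fourth" relationship can be sketched in a few lines of Python. This is a minimal illustration, assuming a two-group comparison of means and the normal-approximation formula n (per group) = 2 * [(z(1-α/2) + z(1-β)) / d]^2 that appears later in this guide; the function names are hypothetical, and dedicated software uses the exact t distribution, so its answers can differ by a participant or two.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def solve_n_per_group(d, alpha=0.05, power=0.80):
    """Solve for n per group given effect size, alpha, and power."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return ceil(2 * (z_sum / d) ** 2)


def solve_power(n_per_group, d, alpha=0.05):
    """Solve for two-tailed power given n per group, effect size, and alpha."""
    shift = d * sqrt(n_per_group / 2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # Probability of landing in either rejection region
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def solve_effect_size(n_per_group, alpha=0.05, power=0.80):
    """Solve for the smallest detectable d given n per group, alpha, and power."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return z_sum * sqrt(2 / n_per_group)


for d in (0.80, 0.50, 0.20):
    print(f"d = {d:.2f}: n per group = {solve_n_per_group(d)}")
print(f"power at n = 64, d = 0.50: {solve_power(64, 0.50):.3f}")
print(f"detectable d at n = 64: {solve_effect_size(64):.3f}")
```

Running the loop shows the pattern described throughout this guide: the required n climbs steeply as the effect size shrinks.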

How Do You Calculate the Sample Size You Need?

You calculate sample size by choosing a statistical test that matches your design, specifying alpha, power, and an expected effect size, then solving the test’s sample size formula or entering those values into software such as G*Power.

The General Logic Behind Every Sample Size Formula

Although formulas differ by test, most share the same structure: sample size increases as the desired power increases, as alpha becomes stricter and smaller, and as the expected effect size shrinks. In plain terms, smaller effects, stricter significance thresholds, and higher power targets all require more participants.

Effect Size Benchmarks by Cohen’s Conventions

When no prior data exist, Jacob Cohen’s conventional benchmarks provide a starting point; treat them as a last resort rather than a substitute for a pilot study or literature-based estimate.

Statistic | Small | Medium | Large
Cohen’s d, t test | 0.20 | 0.50 | 0.80
Pearson’s r, correlation | 0.10 | 0.30 | 0.50
Cohen’s f, ANOVA | 0.10 | 0.25 | 0.40
Cohen’s f², regression | 0.02 | 0.15 | 0.35
Cramer’s V, chi-square | 0.10 | 0.30 | 0.50

Sample Size and Power for Correlational Studies

Correlational studies test whether two continuous variables are associated, using Pearson’s r as the effect size. Sample size planning relies on the Fisher r-to-z transformation because r is not normally distributed.

C = 0.5 * ln[(1+r) / (1-r)]
n = [(z(1-α/2) + z(1-β)) / C]^2 + 3

Worked Example: Sleep Duration and Academic Performance

A researcher wants to test whether nightly sleep duration correlates with grade point average in undergraduates, expecting a small to medium correlation of r = 0.30, with alpha = 0.05, two-tailed, and power = 0.80.

  1. Convert r to Fisher’s z: C = 0.5 * ln(1.30/0.70) = 0.5 * ln(1.857) = 0.310
  2. Look up the z-values: z(1-α/2) = 1.96 for alpha = 0.05 two-tailed; z(1-β) = 0.84 for power = 0.80
  3. Apply the formula: n = [(1.96 + 0.84) / 0.310]^2 + 3 = (9.03)^2 + 3 = 81.6 + 3 = 84.6, which rounds up to n = 85
  4. Add a buffer for missing or unusable data; at an expected 15% loss, the target enrollment becomes roughly 100 participants

Result: recruit at least 85 participants for adequate power, and aim for approximately 100 to absorb incomplete responses.
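The worked example above can be reproduced with a short Python sketch of the Fisher r-to-z formula. The function name and the 15% loss buffer are illustrative choices taken from this example; exact methods in dedicated software may differ from this approximation by a participant or so.

```python
from math import ceil, log
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n to detect correlation r (two-tailed) via Fisher's z."""
    z = NormalDist()
    c = 0.5 * log((1 + r) / (1 - r))  # Fisher r-to-z transformation
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil((z_sum / c) ** 2 + 3)


n_required = n_for_correlation(0.30)
n_enroll = ceil(n_required / (1 - 0.15))  # buffer for 15% unusable data
print(n_required, n_enroll)
```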

Sample Size and Power for Cross-Sectional Studies

Cross-sectional studies typically estimate a prevalence, proportion, or mean at a single point in time; sample size planning here centers on precision, expressed as a margin of error, rather than on detecting a difference between groups.

n = [Z^2 * p(1-p)] / e^2

Worked Example: Estimating Hypertension Prevalence in Adults

A public health team wants to estimate the prevalence of hypertension among adults in a district, expecting prevalence near 35%, with a margin of error of 5 percentage points and 95% confidence.

  1. Set inputs: Z = 1.96 for 95% confidence; p = 0.35; e = 0.05
  2. Apply the formula: n = (1.96^2 * 0.35 * 0.65) / 0.05^2 = 0.874 / 0.0025 = 349.6, which rounds up to n = 350
  3. If the survey uses multistage cluster sampling rather than simple random sampling, multiply by a design effect; a typical DEFF of 1.5 raises the target to 525 respondents

Result: 350 respondents under simple random sampling, or approximately 525 under a multistage cluster design with a design effect of 1.5.
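The same precision-based calculation is easy to script. The sketch below assumes the formula n = [Z^2 * p(1-p)] / e^2 given above, with an optional design effect multiplier; the function and parameter names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_for_proportion(p, margin, confidence=0.95, deff=1.0):
    """Sample size to estimate a proportion p within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_srs = z ** 2 * p * (1 - p) / margin ** 2  # simple random sampling
    return ceil(n_srs * deff)                   # inflate for clustering


print(n_for_proportion(0.35, 0.05))            # simple random sample
print(n_for_proportion(0.35, 0.05, deff=1.5))  # cluster design, DEFF = 1.5
print(n_for_proportion(0.50, 0.05))            # worst case when p is unknown
```

Setting p = 0.50 gives the most conservative answer, because p(1-p) is largest at that value.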

Sample Size and Power for Longitudinal Studies

Longitudinal studies measure the same participants repeatedly over time, so sample size planning must account for two extra factors beyond a single-measurement design: the correlation between repeated measurements, which usually reduces the required n, and anticipated attrition, which increases it.

Worked Example: Tracking Cognitive Decline Over a Five-Year Period

A team plans to compare cognitive decline trajectories between two groups, measured at five annual time points, expecting a medium interaction effect, f = 0.25, with alpha = 0.05 and power = 0.80, and an average correlation of 0.50 between repeated measurements.

  1. Enter the design into a repeated measures power module, for example G*Power’s within-between interaction procedure for ANOVA, specifying two groups, five measurements, f = 0.25, correlation = 0.50, alpha = 0.05, and power = 0.80
  2. A design of this shape typically returns a baseline requirement of approximately 20 participants per group, or 40 participants in total, before accounting for dropout
  3. Adjust for attrition using n(adjusted) = n / (1 minus attrition rate); with 30% attrition expected over five years, 40 / 0.70 = 57.1, which rounds up to 58 participants, or 29 per group; rounding to 30 per group adds a small extra margin

Result: enroll approximately 30 participants per group, 60 in total, to retain adequate power after five years of expected attrition.
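The attrition adjustment in step 3 is a one-line calculation; a minimal sketch follows. The function name is hypothetical, and the baseline of 40 is simply the figure quoted in this example, not something the code derives.

```python
from math import ceil


def inflate_for_attrition(n_required, attrition_rate):
    """Enrollment needed so that n_required participants remain at the end."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    return ceil(n_required / (1 - attrition_rate))


total = inflate_for_attrition(40, 0.30)
per_group = ceil(total / 2)
print(total, per_group)
```

Note that dividing by (1 minus the attrition rate) is slightly larger than simply adding 30% to the baseline, because the dropouts are a share of the enrolled sample, not of the final one.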

Sample Size and Power for Experimental Studies

Experimental studies, including randomized controlled trials, compare outcomes between groups that receive different interventions. The most common planning scenario is a two-group comparison of means, using Cohen’s d as the effect size.

n (per group) = 2 * [(z(1-α/2) + z(1-β)) / d]^2

Worked Example: A Two-Arm Randomized Controlled Drug Trial

A pharmaceutical trial compares a new medication against placebo on a continuous symptom score, expecting a medium effect size of d = 0.50, with alpha = 0.05, two-tailed, and power = 0.80.

  1. Set inputs: z(1-α/2) = 1.96; z(1-β) = 0.84; d = 0.50
  2. Apply the formula: n = 2 * [(1.96 + 0.84) / 0.50]^2 = 2 * (5.6)^2 = 2 * 31.36 = 62.7, which rounds up to 63 per group
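  3. Use 64 participants per group, or 128 in total: the normal approximation above slightly understates n, and the exact calculation based on the t distribution, as used by G*Power and standard reference tables for d = 0.50, alpha = 0.05, and power = 0.80, gives 64 per group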
  4. Inflate further for expected dropout; at 10% attrition, 64 / 0.90 = 71.1, so recruit approximately 72 per group, 144 in total

Result: randomize approximately 64 participants per arm for the analyzable sample, and recruit closer to 72 per arm to allow for a 10% dropout rate.
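One way to sanity-check the 64-per-arm figure is a Monte Carlo simulation: generate many hypothetical trials with a true effect of d = 0.50, run a pooled two-sample t test on each, and count how often the result is significant. The sketch below uses only the Python standard library and rests on stated assumptions: normally distributed scores with a standard deviation of 1 in both arms, and a hard-coded critical value of 1.979 for 126 degrees of freedom taken from standard t tables.

```python
import random
from math import sqrt

N_PER_ARM = 64
T_CRIT = 1.979  # assumed two-tailed 5% critical t for df = 126 (table value)


def simulate_power(d=0.5, reps=2000, seed=1):
    """Estimate power as the share of simulated trials that reach significance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        drug = [rng.gauss(d, 1.0) for _ in range(N_PER_ARM)]
        placebo = [rng.gauss(0.0, 1.0) for _ in range(N_PER_ARM)]
        mean_d = sum(drug) / N_PER_ARM
        mean_p = sum(placebo) / N_PER_ARM
        var_d = sum((x - mean_d) ** 2 for x in drug) / (N_PER_ARM - 1)
        var_p = sum((x - mean_p) ** 2 for x in placebo) / (N_PER_ARM - 1)
        pooled_sd = sqrt((var_d + var_p) / 2)  # equal group sizes
        t_stat = (mean_d - mean_p) / (pooled_sd * sqrt(2 / N_PER_ARM))
        if abs(t_stat) > T_CRIT:
            hits += 1
    return hits / reps


estimated_power = simulate_power()
print(f"Estimated power with {N_PER_ARM} per arm: {estimated_power:.3f}")
```

The estimate should land close to 0.80, with some simulation noise; increasing reps narrows that noise at the cost of run time.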

How Many Participants Do You Need for Regression and Multivariate Models?

Multiple regression commonly follows Green’s rule of thumb: N is at least 50 + 8m to test the overall model, and N is at least 104 + m to test individual predictors, where m is the number of predictors; use whichever value is larger.

Example: a model with 5 predictors requires N of at least 50 + 8(5) = 90 to test the overall R-squared, and N of at least 104 + 5 = 109 to test individual regression coefficients reliably. Because 109 is the larger figure, the recommended sample size is approximately 110 participants.

  • For structural equation modeling, common guidance ranges from 10 to 20 cases per estimated parameter, with a practical floor of 200 to 400 cases for models of moderate complexity.
  • For logistic regression, the events-per-variable rule suggests at least 10 events of the less frequent outcome category per predictor entered into the model.
  • For multilevel or hierarchical models, sample size operates at two levels simultaneously; a common starting guideline is at least 30 higher-level units, such as clusters or schools, with 30 or more lower-level units each, though this varies with the intraclass correlation.
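Two of the rules of thumb above reduce to simple arithmetic, sketched here in Python. Green's rule follows the text directly. The events-per-variable function goes one step beyond the text: converting required events into a total N by dividing by the expected proportion of the less frequent outcome is an assumption of this sketch, and the function names are hypothetical.

```python
from math import ceil


def green_minimum_n(n_predictors):
    """Green's rule: the larger of 50 + 8m and 104 + m."""
    overall_model = 50 + 8 * n_predictors
    individual_predictors = 104 + n_predictors
    return max(overall_model, individual_predictors)


def epv_minimum_n(n_predictors, minority_proportion, events_per_variable=10):
    """Total N so the less frequent outcome supplies enough events (assumed conversion)."""
    events_needed = events_per_variable * n_predictors
    # round() guards against floating point noise before rounding up
    return ceil(round(events_needed / minority_proportion, 6))


print(green_minimum_n(5))      # multiple regression, 5 predictors
print(epv_minimum_n(5, 0.15))  # logistic regression, 15% event rate
```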

What Happens If Your Study Is Underpowered?

An underpowered study has a high chance of missing a true effect and, when it does find a significant result, tends to overestimate the size of that effect. Both problems distort the published research record.

  • Higher false negative rate: a real effect goes undetected, and the finding is reported as null even though the treatment or relationship works.
  • Inflated effect size estimates: significant results from small samples tend to overstate the true effect, a pattern known as the winner’s curse or effect size inflation.
  • Reduced reproducibility: underpowered findings that reach significance by chance are less likely to replicate in follow-up studies.
  • Wasted resources and ethical cost: participants take on risk or burden in a study that was never likely to answer its own question.

If a study turns out underpowered after the fact, the appropriate response is to report the observed effect size with its confidence interval, avoid a post hoc power calculation, and treat the finding as preliminary pending a properly powered replication.
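A small simulation can illustrate the effect size inflation described above. It is a sketch under simplifying assumptions, not a model of any real study: the true effect (d = 0.20) and group size (20 per group) are hypothetical, the standard deviation is treated as known, and significance is judged with a z test.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def winners_curse(true_d=0.20, n_per_group=20, reps=20000, alpha=0.05, seed=7):
    """Return (share of significant studies, mean estimate among them)."""
    rng = random.Random(seed)
    se = sqrt(2 / n_per_group)  # standard error of the estimated d, SD = 1
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    estimates = [rng.gauss(true_d, se) for _ in range(reps)]
    significant = [e for e in estimates if abs(e) / se > z_crit]
    return len(significant) / reps, fmean(significant)


power, mean_significant = winners_curse()
print(f"Share of studies reaching significance: {power:.3f}")
print(f"True d = 0.20; mean d among significant studies = {mean_significant:.2f}")
```

Only a small share of these simulated studies reach significance, and those that do report an average effect several times larger than the true value, which is the pattern the winner's curse describes.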

Common Mistakes That Undermine Power Analysis

  • Running a post hoc power calculation using the study’s own observed effect size, which is circular and adds no new information beyond the p-value already obtained.
  • Treating a pilot study’s effect size as a precise, fixed estimate rather than a rough, imprecise starting point with wide uncertainty.
  • Ignoring expected attrition in longitudinal designs, leading to an underpowered final sample even though the enrollment target looked adequate.
  • Ignoring clustering or multistage sampling and failing to apply a design effect, which understates the sample size needed for the desired precision.
  • Switching from a two-tailed to a one-tailed test after seeing preliminary data, purely to reduce the required sample size.
  • Applying a single generic rule of thumb, such as 30 participants per group, regardless of the statistical test, effect size, or design actually being used.
  • Failing to adjust for multiple comparisons when several outcomes or subgroup analyses are planned, which changes the effective alpha and the required sample size.
  • Assuming that a nonsignificant result with a small sample proves the absence of an effect, rather than acknowledging that the study may simply have lacked power.
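The multiple-comparisons point in the list above can be made concrete with a short sketch. It assumes a Bonferroni correction, chosen here only as one common and simple adjustment, applied to the two-group normal-approximation formula used elsewhere in this guide; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_bonferroni(d, n_tests, alpha=0.05, power=0.80):
    """n per group when alpha is split evenly across n_tests comparisons."""
    z = NormalDist()
    alpha_adjusted = alpha / n_tests  # Bonferroni: assumed adjustment method
    z_sum = z.inv_cdf(1 - alpha_adjusted / 2) + z.inv_cdf(power)
    return ceil(2 * (z_sum / d) ** 2)


for k in (1, 3, 5):
    print(f"{k} planned comparison(s): n per group = {n_per_group_bonferroni(0.5, k)}")
```

Each additional planned comparison tightens the per-test alpha and raises the required sample size, which is why the adjustment belongs in the planning stage rather than after data collection.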

Statistical Software for Power Analysis: Features and Pricing

Prices below are approximate USD list prices as of 2026 and vary by license type, institution, region, and current promotions; confirm current figures directly with each vendor before budgeting.

Software | Free or Paid | Approximate Price, USD | Best For
G*Power | Free | No cost | Standard a priori and post hoc power calculations across t tests, F tests, chi-square, and correlation
R (pwr, WebPower, simr packages) | Free | No cost | Flexible, scriptable power analysis, including simulation-based power for mixed and multilevel models
Python (statsmodels, pingouin) | Free | No cost | Power analysis integrated into a broader Python data science or machine learning workflow
JASP and jamovi (jpower module) | Free | No cost | Menu-driven power analysis for users who want a graphical interface without a programming language
IBM SPSS Statistics | Paid, subscription or perpetual | Approximately 105 per month or 1,188 per year for Base subscription; 3,830 one-time for a perpetual Base license | Teams already standardized on SPSS for the main data analysis, using Python or R extensions for power calculations
SAS, including PROC POWER and PROC GLMPOWER | Paid, quote-based; free academic option | Custom institutional quote; SAS OnDemand for Academics available at no cost for teaching and learning | Regulated environments such as pharmaceutical and clinical trial submissions that require SAS-validated output
Stata, with the built-in power command | Paid, perpetual or annual | Roughly 125 to 600 for student and academic perpetual licenses; roughly 225 to 925 or more per year for annual licenses, depending on edition | Applied researchers in economics, epidemiology, and social science who also use Stata for their main analysis
PASS by NCSS | Paid, perpetual or subscription | Typically in the four-figure range per named user; contact NCSS for a current quote | Dedicated, large-scale power and sample size planning across hundreds of specialized study designs, including clinical trials

G*Power

G*Power is a free, standalone application built specifically for power analysis and is the most widely cited tool in psychology, education, and health research methods sections.

  • Covers t tests, F tests, chi-square tests, z tests, correlation and regression, and several exact tests.
  • Offers five analysis types: a priori (solve for n), post hoc (solve for power), compromise, criterion, and sensitivity analysis.
  • Runs on Windows and macOS with no license fee and no usage restrictions.

R: pwr, pwr2, WebPower, and simr Packages

R offers several free packages that together cover almost every design discussed in this guide, from simple t tests to simulation-based power for mixed effects models.

  • pwr: covers the classic tests, including t tests, correlation, chi-square, ANOVA, and proportions, mirroring most of G*Power’s coverage in code form.
  • WebPower: extends coverage to structural equation modeling, multilevel and longitudinal models, and mediation analysis.
  • simr: uses simulation to estimate power for generalized linear mixed models, useful when no closed-form formula exists.

Python: statsmodels and pingouin

Python’s statsmodels.stats.power module and the pingouin package provide power and sample size functions that integrate directly into a pandas or NumPy based analysis pipeline, avoiding the need to switch tools.

IBM SPSS Statistics

Since version 27, SPSS Statistics has included a Power Analysis menu covering common tests such as means, proportions, correlations, and regression; for other designs, or in earlier versions, researchers typically run power calculations through SPSS’s Python or R integration, calling the same open source packages described above from within the SPSS environment.

SAS and SAS Studio

SAS/STAT includes PROC POWER and PROC GLMPOWER, which handle t tests, ANOVA, regression, proportions, correlation, and survival analysis with syntax-based input and detailed output tables and power curves; SAS OnDemand for Academics provides free, cloud-based access for students and instructors.

Stata

Stata has included a built-in power suite since Stata 13, accessed through the power, power twomeans, power oneway, and related commands, along with a menu-driven Power and Sample-Size interface for users who prefer not to type commands.

PASS by NCSS

PASS is a dedicated power and sample size package covering several hundred procedures, including many specialized clinical trial designs such as noninferiority, equivalence, group sequential, and survival analyses that are not readily available elsewhere.

Is Free Software as Accurate as Paid Software for Power Analysis?

Yes, for standard designs; G*Power, R, and Python implement the same published statistical formulas as SPSS, SAS, Stata, and PASS, and cross-validation studies routinely show matching results across tools for common tests such as t tests, ANOVA, and correlation.

Paid software earns its price mainly through breadth of specialized procedures, validated output for regulatory submission, vendor technical support, and a graphical interface; for a standard two-group comparison or a simple correlation, a free tool will return the identical sample size.

Step-by-Step Commands for a Basic Power Analysis in Each Tool

The table below walks through the same example, a two-sample t test with d = 0.50, alpha = 0.05, and power = 0.80, in five common tools.

Tool | Key Steps | Sample Command or Menu Path | Output
G*Power | Open the application; choose the test family and statistical test; select the a priori analysis type; enter effect size, alpha, and power; click Calculate | Test family: t tests; Statistical test: Means, two independent groups; Type of power analysis: A priori | Total sample size and sample size per group, plus a plotted power curve
R, pwr package | Install and load the package; call the matching function; pass effect size, significance level, and power; leave n as NULL to solve for it | install.packages("pwr"); library(pwr); pwr.t.test(d=0.5, sig.level=0.05, power=0.8, type="two.sample") | n per group printed to the console, typically as a decimal to round up
Python, statsmodels | Install the package; import the relevant power class; instantiate it; call solve_power with effect size, alpha, and power, leaving nobs1 unset | from statsmodels.stats.power import TTestIndPower; TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8) | A floating point sample size per group, rounded up in practice
Stata | Use the built-in power command with the twomeans subcommand; give the control-group mean, then specify the difference and standard deviation, or supply both means and a standard deviation | power twomeans 0, diff(0.5) sd(1) alpha(0.05) power(0.8) | A results table listing N1, N2, total N, and the achieved power
PASS by NCSS | Open the Two-Sample T-Test procedure under the Means menu; enter the effect size, alpha, and target power in the procedure window; click Calculate | Menu path: Design, Means, Two Independent Means, T-Test | A results table and an accompanying power curve chart, exportable to Word or PDF

How Should You Report Sample Size and Power in a Manuscript?

Report sample size and power in the methods section by stating the statistical test, the expected effect size and its source, alpha, the target power, the resulting sample size, and any adjustment made for attrition or clustering.

A complete, reviewer-ready statement typically includes all of the following elements.

  • The specific statistical test the power analysis was based on, for example an independent-samples t test or a repeated-measures ANOVA.
  • The expected effect size, with a citation or rationale for why that value was chosen, such as a prior study, a meta-analysis, or a pilot dataset.
  • Alpha and power, stated explicitly rather than assumed, for example alpha = 0.05 and power = 0.80.
  • The software or formula used to perform the calculation, including the version number where relevant.
  • The resulting required sample size, and the final planned enrollment after adjusting for attrition, non-response, or design effects.

Avoid reporting a post hoc power value calculated from the study’s own results; instead, if the study turned out underpowered, report the effect size with its 95% confidence interval and discuss the limitation directly.

Frequently Asked Questions

What sample size is considered statistically significant for a study?

No fixed sample size is inherently significant; significance depends on the p-value produced by the test, which is driven by sample size, effect size, and variability together, not by sample size alone.

How do I calculate sample size for a survey with a 95 percent confidence level?

Use n = [Z^2 * p(1-p)] / e^2, with Z = 1.96 for 95% confidence, p as the expected proportion (use 0.50 if unknown, since it maximizes the required n), and e as the acceptable margin of error, commonly 0.03 to 0.05.

What is a good statistical power value for a research study?

A power of 0.80 is the conventional minimum, meaning an 80% chance of detecting a true effect; high-stakes fields such as clinical trials and regulatory submissions often require 0.90 or higher.

Can a study have too large a sample size?

Yes; an excessively large sample can detect statistically significant effects that are too small to matter practically, and it also consumes more time, funding, and participant burden than the research question requires.

How does effect size affect the sample size needed for a t test?

Effect size and sample size move in opposite directions: a small effect size, such as Cohen’s d = 0.20, requires a much larger sample than a large effect size, such as d = 0.80, to reach the same power.

What is the minimum sample size for a correlation study?

For a typical two-tailed correlation with alpha = 0.05 and power = 0.80, the minimum sample size ranges from about 29 participants to detect a large correlation of r = 0.50, up to roughly 782 participants to detect a small correlation of r = 0.10.

Is G*Power really free, and is it accurate enough for a dissertation or journal submission?

Yes; G*Power carries no license fee and is peer-reviewed and validated in the methodological literature, making it acceptable for dissertations and journal articles across psychology, education, medicine, and the social sciences.

What happens if my study is underpowered after I have already collected the data?

Report the observed effect size with its confidence interval, avoid calculating post hoc power from the same data, and frame the result as preliminary evidence that should be confirmed with an adequately powered replication.
