How to choose the right statistical test

Marisha Fonseca
Jun 2, 2026

Choosing the right statistical test requires considering five key factors: your research question, study design, data characteristics, parametric assumptions, and the pairing structure of your data. While this guide covers the most common scenarios, remember that statistical testing is not one-size-fits-all, and context matters.

Before analyzing your data:

Clearly state your research question and primary outcome
Document your study design (independent vs. paired groups, number of measurements)
Assess your data for normality and outliers
Consider your sample size and resulting statistical power
Pre-specify whether you’ll use one-sided or two-sided tests
Consult the decision tables below to identify candidate tests
When in doubt, consider consulting with a biostatistician

Contents

Research Question
Study Design
Data Characteristics
Parametric vs. Nonparametric Tests
Paired vs. Unpaired / Repeated Measures Tests
Sample Size and Statistical Power
One-Sided vs. Two-Sided Tests
Test Selection Decision Table

Research Question

Start by clearly defining your research question, as it determines which statistical approach you’ll need.

Think about what you’re trying to investigate:

Are you looking for relationships between variables? For example, if you want to investigate the relationship between two continuous variables like blood pressure and heart rate in patients, you should consider using correlation analysis. This helps you understand whether these variables are positively or negatively related and the strength of that relationship.
Are you looking at the effect of an exposure? For instance, if you’re interested in whether exposure to X increases the likelihood of Y disease, you’d need to calculate an odds ratio that compares the odds of disease in the exposed group with the odds in the unexposed group.
Are you comparing outcomes between groups? If your research question is about whether a new treatment reduces pain levels compared to a control, you’ll need a comparison test rather than a correlation.
Are you predicting future outcomes? If you want to predict patient outcomes based on multiple variables, regression models would be appropriate.

Your research question directly shapes whether you should use descriptive statistics, correlation, regression, or group comparison tests.
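
As a rough illustration of the exposure question above, the sketch below computes an odds ratio and an approximate 95% confidence interval from a 2x2 table. The counts are hypothetical, the function name odds_ratio is an assumption made for this example, and the interval uses the common log-based (Woolf) approximation, which needs all four cells to be non-zero.

```python
import math
from statistics import NormalDist


def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases, confidence=0.95):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table."""
    a, b = exposed_cases, exposed_noncases
    c, d = unexposed_cases, unexposed_noncases
    # Odds of disease among the exposed divided by odds among the unexposed.
    estimate = (a * d) / (b * c)
    # Standard error of log(OR); only valid when every cell is > 0.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: 30 of 100 exposed and 15 of 100 unexposed developed the disease.
or_estimate, ci_low, ci_high = odds_ratio(30, 70, 15, 85)
print(f"OR = {or_estimate:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An odds ratio above 1 with an interval that excludes 1 would suggest higher odds of disease in the exposed group, subject to the usual caveats about confounding and study design.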

Study Design

Next, examine your study design carefully, as it dictates which statistical tests are valid for your data.

Key design considerations:

How many groups are you comparing? Tests for comparing two groups differ from tests for three or more groups. Two-group comparisons use t-tests or Mann-Whitney tests, while multiple-group comparisons require ANOVA or Kruskal-Wallis tests.
Are your groups independent or related? This distinction is critical and determines whether you use unpaired or paired tests (covered in detail in the Paired vs. Unpaired section below).
Are you measuring variables over time? If you’re conducting a longitudinal study with repeated measurements on the same subjects, you need repeated measures tests rather than standard tests.

Example:

Suppose you’re conducting a drug trial with multiple groups of patients and measuring the effect on pain levels at different time points. A two-way mixed ANOVA (often reported as a two-way repeated measures ANOVA) would be suitable. This allows you to examine the effect of the drug (between-subjects factor), the change over time (within-subjects factor), and whether the change over time differs between drug groups (the interaction).

Your study design determines the fundamental structure of the test you’ll use.

Data Characteristics

Before applying any test, you must evaluate several characteristics of your data.

Characteristics to assess:

What type of outcome are you measuring? Continuous outcomes (like tumor size or blood pressure) require different tests than categorical outcomes (like presence or absence of disease).
Is your data normally distributed? This is one of the most important questions, as it determines whether you can use parametric tests. If you’re studying the effects of a new treatment on a continuous outcome like tumor size reduction, check if your data is normally distributed. You can assess normality through histograms, Q-Q plots, or normality tests like Shapiro-Wilk.
Are there outliers or extreme values? These can violate assumptions of parametric tests and may push you toward nonparametric alternatives.

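Example:

If you’re comparing the proportion of patients who are alive at a fixed follow-up point across different treatments (categorical data), a chi-square test would be appropriate. This test helps determine if there is a significant association between the treatment received and the survival outcome. If you are instead analyzing the time until death, use the survival methods listed under time-to-event data in the decision tables.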

Understanding your data’s distribution is vital for choosing the correct test.
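
To make the chi-square example concrete, here is a minimal sketch that computes the Pearson chi-square statistic for a 2x2 table using only the Python standard library. The counts are hypothetical, the function name chi_square_2x2 is an assumption for this example, and no continuity correction is applied. In practice you would normally rely on a statistics package rather than hand-written code.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if treatment and outcome were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are treatments A and B, columns are alive / dead at follow-up.
observed = [[40, 10], [30, 20]]
chi2_stat, chi2_p = chi_square_2x2(observed)
print(f"chi-square = {chi2_stat:.3f}, p = {chi2_p:.4f}")
```

When expected counts are small (a common rule of thumb is below 5 in any cell), Fisher’s exact test is usually preferred over the chi-square approximation.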

Parametric vs. Nonparametric Tests

This is one of the most critical decisions in test selection, yet it’s often misunderstood.

Parametric tests

Parametric tests make assumptions about your data:

They assume your data comes from a normally distributed (Gaussian) population
They work with the actual values of your data
Common parametric tests: t-test, ANOVA, Pearson correlation, linear regression
Parametric tests are generally more powerful (better at detecting true effects) when their assumptions are met

When to use parametric tests:

Your data is approximately normally distributed
You have a reasonable sample size (generally 20+ observations per group)
Your outcome is measured on a continuous scale
You’ve verified normality through visual inspection (histogram/Q-Q plot) or statistical tests

Nonparametric tests

Nonparametric tests make fewer assumptions:

They don’t assume normal distribution
They work with the ranks or order of your data rather than actual values
Common nonparametric tests: Mann-Whitney U, Kruskal-Wallis, Spearman correlation, Wilcoxon signed-rank test
They are often called “distribution-free” tests

When to use nonparametric tests:

Your data is clearly not normally distributed (heavily skewed or bimodal)
You have very small sample sizes where normality is hard to assess
Your outcome is ordinal or ranked data (pain scores on a scale of 1-10, class rankings)
Some values are “off the scale” (too high or too low to measure exactly)
Your data contains extreme outliers that violate normality assumptions

Choosing between parametric and nonparametric tests:

Parametric Test	Nonparametric Alternative	When to use nonparametric
t-test (2 groups)	Mann-Whitney U test	Data not normally distributed
Paired t-test	Wilcoxon signed-rank test	Non-normal paired data
ANOVA (3+ groups)	Kruskal-Wallis test	Non-normal data with multiple groups
Repeated measures ANOVA	Friedman test	Non-normal repeated measures data
Pearson correlation	Spearman correlation	Non-normal or ranked variables

The choice between parametric and nonparametric tests depends on your data distribution. With large sample sizes, parametric tests are robust to violations of normality due to the central limit theorem. With small sample sizes, the choice matters much more, and nonparametric tests may be safer if you’re uncertain about normality.
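
The idea that nonparametric tests work with ranks rather than actual values can be sketched with Spearman correlation, which is simply Pearson correlation applied to ranked data. The example below uses only the standard library; the helper names and the dose/response numbers are hypothetical and chosen to include one extreme value.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation computed from the actual values."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: the response always increases with dose, with one extreme value.
dose = [1, 2, 3, 4, 5]
response = [1, 4, 9, 16, 100]
pearson_r = pearson(dose, response)
spearman_r = spearman(dose, response)
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_r:.3f}")
```

Because the ranks ignore how far the extreme value lies from the rest, Spearman reports a perfect monotonic relationship here, while Pearson is pulled down by the departure from a straight line.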

Paired vs. Unpaired / Repeated Measures Tests

This decision point confuses many researchers. You must determine whether your groups are independent or related.

Use unpaired (independent) tests when:

Each subject appears in only one group
The groups are completely separate with no matching or pairing
There is no logical connection between a value in one group and a specific value in another group
Examples: comparing treatment group vs. control group, comparing men’s blood pressure to women’s blood pressure

Use paired tests when:

The same subject is measured twice (before and after intervention)
Subjects have been matched on important variables (age, disease severity)
You have repeated measurements on the same subject over time
There is a natural pairing between observations
Examples: baseline vs. post-treatment measurements, measurements on twins, measurements on eyes (left eye vs. right eye of same person)

Common paired vs. unpaired test pairs:

Comparison Type	Unpaired/Independent Test	Paired/Dependent Test
Compare 2 groups, continuous normally distributed data	Unpaired t-test	Paired t-test
Compare 2 groups, non-normal or ordinal data	Mann-Whitney U test	Wilcoxon signed-rank test
Compare 2 groups, categorical data	Chi-square or Fisher’s exact test	McNemar’s test
Compare 3+ groups, continuous normally distributed data	One-way ANOVA	Repeated measures ANOVA
Compare 3+ groups, non-normal data	Kruskal-Wallis test	Friedman test

Why this distinction matters:

Paired tests are usually more powerful for detecting differences because each subject acts as their own control, which removes variability between subjects from the comparison. However, you can only use a paired test if your study design actually involves paired or repeated measurements. You cannot decide on pairing after collecting data based on the values you observed; the pairing must be predetermined by your study design.

Example scenario: If you measure patients’ pain levels before administering a treatment and again one week after treatment, you use a paired test (paired t-test or Wilcoxon) because the same person is measured twice. If you instead compare pain levels of one group receiving treatment to a different group receiving placebo, you use an unpaired test (unpaired t-test or Mann-Whitney) because different people are in each group.
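
The gain in power from pairing can be illustrated by computing both t statistics on the same numbers. This is a sketch using only the standard library, with hypothetical pain scores for eight patients; the function names are assumptions for this example, and it reports only the t statistics, not p-values.

```python
import math
from statistics import mean, stdev, variance


def paired_t(before, after):
    """t statistic for paired data, based on the within-subject differences."""
    diffs = [b - a for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def unpaired_t(group_a, group_b):
    """Welch's t statistic, treating the two sets of values as independent groups."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical pain scores for the same 8 patients before and after treatment.
before = [7.0, 6.0, 8.0, 5.0, 9.0, 7.0, 6.0, 8.0]
after = [6.0, 5.0, 6.0, 4.0, 8.0, 5.0, 5.0, 6.0]

t_paired = paired_t(before, after)
t_unpaired = unpaired_t(before, after)
print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

Every patient improves by 1 or 2 points, so the differences are very consistent and the paired statistic is large. Treating the same values as two independent groups mixes in the spread between patients and produces a much smaller statistic. The unpaired calculation is shown only for contrast; for a before-and-after design like this one, the paired test is the valid choice.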

Sample Size and Statistical Power

Sample size affects which tests you can use and how reliably they’ll detect real effects.

How sample size influences test choice:

Large samples (n > 30 per group): Parametric tests are robust even if data isn’t perfectly normal. Both parametric and nonparametric tests work well, though parametric tests are slightly more powerful.
Small samples (n < 20 per group): Nonparametric tests lose statistical power (ability to detect true differences). Parametric tests may be problematic if data isn’t normal, but with small samples it’s also hard to assess normality. This creates a dilemma with no perfect solution.
Very small samples (n < 10): You have limited power to detect effects no matter which test you use. This is when study design decisions (paired vs. unpaired) become especially important.

Practical implications for test selection:

With adequate sample size, you can usually choose the test based on data distribution
With small sample size, even nonparametric alternatives may lack power
Before data collection, conduct a power analysis to determine your required sample size
Power analysis also accounts for the effect size you expect to detect and your desired statistical significance level (usually 0.05)

The relationship between sample size and test assumptions:

Sample Size	If Data is Normal	If Data is Not Normal
Large (n > 30)	Use parametric test	Parametric test still works well
Small (n < 20)	Use parametric test	Consider nonparametric test, but power may be low
Very small (n < 10)	Parametric test only if normality is supported by prior knowledge; otherwise a nonparametric test may be safer	Nonparametric test, but expect low power

Statistical power refers to your test’s ability to detect a true effect if one exists. Higher power is better. Power depends on sample size, effect size, significance level, and the specific test used. If your sample size is small, consider that your study may not have sufficient power to detect meaningful differences, even if they exist.
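
A power analysis for comparing two independent group means can be sketched with the standard normal-approximation formula. The function name and defaults below are assumptions for this example; the effect size is a standardized mean difference (Cohen’s d), and the test is assumed to be two-sided. Dedicated power software based on the t distribution typically returns a slightly larger number.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {sample_size_per_group(d)} per group")
```

The required sample size grows quickly as the expected effect shrinks, which is why the effect size you plan for has to be justified before data collection.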

One-Sided vs. Two-Sided Tests

This decision must be made before collecting your data and affects your interpretation of results.

Two-sided tests (most common):

Test whether a difference exists in either direction
If comparing treatments A and B, tests whether A > B or B > A
More conservative approach
Recommended when you genuinely don’t know which direction differences might go
Calculation: the significance level (alpha) is split equally between the two tails of the distribution
Example: with alpha = 0.05, each tail gets 0.025

Use two-sided tests when:

You have no strong prior hypothesis about direction
You’d be interested in results regardless of which direction the difference goes
Following best practices (most journals and statisticians recommend two-sided tests)
You’re uncertain about the relationship between variables

One-sided tests (less common):

Test whether a difference exists in a specific, pre-specified direction
Must predict the direction before analyzing data
More powerful (easier to detect an effect if it exists in your predicted direction)
Cannot detect or report effects in the opposite direction
Calculation: uses the full significance level in one tail

Use one-sided tests only when:

You have a very strong theoretical reason to predict direction
You genuinely wouldn’t care about differences in the opposite direction
You’ve specified the direction in your protocol before data collection
You’re willing to ignore unexpected effects in the opposite direction

Critical caution:

Do not decide to use a one-sided test after looking at your data. This practice is a form of “p-hacking” and compromises the validity of your results. The direction must be specified in advance. If your data show an effect opposite to your prediction, you cannot reverse the direction of the test, or switch between one-sided and two-sided testing, to reach significance.

Example comparison:

Scenario	Appropriate Test	Rationale
Testing if drug A differs from drug B in any way	Two-sided	No prediction about which is better
Testing if new treatment reduces pain more than control	One-sided (if pre-specified)	Directional hypothesis specified in advance
Comparing multiple patient outcomes	Two-sided	Multiple outcomes mean some may go either way
Testing if your sample differs from a standard reference value	Two-sided	Unless you predicted direction before study

Unless you have a compelling reason and have pre-specified direction, use two-sided tests.
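
The relationship between one-sided and two-sided P-values can be sketched for a test statistic that follows a standard normal distribution. The function name and the z value of 1.8 are hypothetical, and the one-sided alternative is assumed to be “greater than”.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return (one-sided p for the 'greater' alternative, two-sided p)."""
    standard_normal = NormalDist()
    one_sided = 1 - standard_normal.cdf(z)
    two_sided = 2 * (1 - standard_normal.cdf(abs(z)))
    return one_sided, two_sided


# Effect in the predicted direction.
one_sided_p, two_sided_p = p_values_from_z(1.8)
print(f"z = 1.8: one-sided p = {one_sided_p:.4f}, two-sided p = {two_sided_p:.4f}")

# Same size of effect in the opposite direction.
opposite_one_sided_p, opposite_two_sided_p = p_values_from_z(-1.8)
print(f"z = -1.8: one-sided p = {opposite_one_sided_p:.4f}, two-sided p = {opposite_two_sided_p:.4f}")
```

In the predicted direction, the one-sided P-value is half the two-sided one, so the same data can cross the 0.05 threshold under one convention and not the other. In the opposite direction, the one-sided test gives a P-value close to 1, which is why the choice has to be fixed before seeing the data.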

Test Selection Decision Table

These tables guide you to appropriate tests based on your study characteristics. First identify your outcome type, then the number of groups, then your data distribution.

Outcome: Continuous Data (Normally Distributed)

Number of Groups	Study Design	Appropriate Test
1 group	Compare to hypothetical value	One-sample t-test
2 groups	Independent/unpaired	Unpaired t-test
2 groups	Paired/dependent	Paired t-test
3+ groups	Independent/unpaired	One-way ANOVA
3+ groups	Paired/repeated measures	Repeated measures ANOVA
Relationship between 2 variables	Correlation	Pearson correlation
Predicting Y from X	Regression	Linear regression

Outcome: Continuous Data (Not Normally Distributed or Ordinal/Ranked Data)

Number of Groups	Study Design	Appropriate Test
1 group	Compare to hypothetical value	One-sample Wilcoxon signed-rank test
2 groups	Independent/unpaired	Mann-Whitney U test
2 groups	Paired/dependent	Wilcoxon signed-rank test
3+ groups	Independent/unpaired	Kruskal-Wallis test
3+ groups	Paired/repeated measures	Friedman test
Relationship between 2 variables	Correlation	Spearman correlation
Predicting Y from X	Regression	Nonparametric regression

Outcome: Categorical Data (Binary or Multiple Categories)

Number of Groups	Study Design	Appropriate Test
1 group	Compare proportions to expected	Chi-square goodness-of-fit test
2 groups	Independent/unpaired	Chi-square test or Fisher’s exact test
2 groups	Paired/dependent	McNemar’s test
3+ groups	Independent/unpaired	Chi-square test
Association between variables	Relationship	Contingency tables or logistic regression

Outcome: Survival or Time-to-Event Data

Number of Groups	Study Design	Appropriate Test
1 group	Describe survival pattern	Kaplan-Meier survival curve
2 groups	Compare survival between groups	Log-rank test
3+ groups	Compare survival across groups	Log-rank test (overall comparison) or Cox proportional hazards regression
Multiple predictors	Predict survival	Cox proportional hazards regression

How to use these tables:

Identify your primary outcome type (continuous, categorical, or survival time)
Determine if your outcome is normally distributed (if continuous)
Count how many groups you’re comparing
Identify whether your groups are independent or paired
Find the intersection of these characteristics in the appropriate table
The cell shows the recommended test(s)
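
The lookup steps above can be sketched as a small function. This is only an illustration: it encodes the group-comparison rows for continuous and categorical outcomes from the tables in this article, and the key names (continuous_normal, continuous_nonnormal, categorical) and the function name suggest_test are assumptions made for this example. It is not a substitute for checking assumptions or consulting a statistician.

```python
# Group-comparison rows from the decision tables, keyed by
# (outcome type, number of groups, whether the groups are paired).
TESTS = {
    ("continuous_normal", "2", False): "Unpaired t-test",
    ("continuous_normal", "2", True): "Paired t-test",
    ("continuous_normal", "3+", False): "One-way ANOVA",
    ("continuous_normal", "3+", True): "Repeated measures ANOVA",
    ("continuous_nonnormal", "2", False): "Mann-Whitney U test",
    ("continuous_nonnormal", "2", True): "Wilcoxon signed-rank test",
    ("continuous_nonnormal", "3+", False): "Kruskal-Wallis test",
    ("continuous_nonnormal", "3+", True): "Friedman test",
    ("categorical", "2", False): "Chi-square test or Fisher's exact test",
    ("categorical", "2", True): "McNemar's test",
    ("categorical", "3+", False): "Chi-square test",
}


def suggest_test(outcome, n_groups, paired):
    """Return the candidate test from the tables, or None if no row matches."""
    if n_groups < 2:
        raise ValueError("this sketch only covers comparisons of 2 or more groups")
    group_key = "2" if n_groups == 2 else "3+"
    return TESTS.get((outcome, group_key, paired))


print(suggest_test("continuous_normal", 2, True))
print(suggest_test("continuous_nonnormal", 4, False))
```

A result of None means the tables have no row for that combination (for example, paired categorical data with three or more groups), which is a signal to seek statistical advice.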

This article was originally published on August 3, 2023, and updated on June 2, 2026.

Author

Marisha Fonseca

An editor at heart and perfectionist by disposition, providing solutions for journals, publishers, and universities in areas like alt-text writing and publication consultancy.
