How to choose the right statistical test



Choosing the right statistical test requires considering five key factors: your research question, study design, data characteristics, parametric assumptions, and the pairing structure of your data. While this guide covers the most common scenarios, remember that statistical testing is not one-size-fits-all, and context matters.

Before analyzing your data:

  • Clearly state your research question and primary outcome
  • Document your study design (independent vs. paired groups, number of measurements)
  • Assess your data for normality and outliers
  • Consider your sample size and resulting statistical power
  • Pre-specify whether you’ll use one-sided or two-sided tests
  • Consult the decision tables below to identify candidate tests
  • When in doubt, consider consulting with a biostatistician


 

Research Question

Start by clearly defining your research question, as it determines which statistical approach you’ll need.

Think about what you’re trying to investigate:

  • Are you looking for relationships between variables? For example, if you want to investigate the relationship between two continuous variables like blood pressure and heart rate in patients, you should consider using correlation analysis. This helps you understand whether these variables are positively or negatively related and the strength of that relationship.
  • Are you looking at the effect of an exposure? For instance, if you’re interested in whether exposure to X increases the likelihood of disease Y, you’d calculate an odds ratio comparing the exposed and unexposed groups.
  • Are you comparing outcomes between groups? If your research question is about whether a new treatment reduces pain levels compared to a control, you’ll need a comparison test rather than a correlation.
  • Are you predicting future outcomes? If you want to predict patient outcomes based on multiple variables, regression models would be appropriate.

Your research question directly shapes whether you should use descriptive statistics, correlation, regression, or group comparison tests.
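As an illustration of the exposure question above, here is a minimal sketch of an odds ratio with an approximate 95% confidence interval (Woolf’s log method). The function name and the counts are hypothetical, chosen only for illustration; in practice you would normally rely on a statistics package rather than hand-rolled code.

```python
import math

def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio with an approximate 95% confidence interval (Woolf's log method)."""
    a, b, c, d = exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases
    estimate = (a * d) / (b * c)
    # Standard error of log(OR); assumes every cell count is greater than zero.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper

# Hypothetical counts: 30 of 100 exposed and 15 of 100 unexposed people developed the disease.
estimate, lower, upper = odds_ratio(30, 70, 15, 85)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 suggests an association between exposure and disease; an interval that includes 1 does not.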

Study Design

Next, examine your study design carefully, as it dictates which statistical tests are valid for your data.

Key design considerations:

  • How many groups are you comparing? Tests for comparing two groups differ from tests for three or more groups. Two-group comparisons use t-tests or Mann-Whitney tests, while multiple-group comparisons require ANOVA or Kruskal-Wallis tests.
  • Are your groups independent or related? This distinction is critical and determines whether you use unpaired or paired tests (covered in detail in the Paired vs. Unpaired section below).
  • Are you measuring variables over time? If you’re conducting a longitudinal study with repeated measurements on the same subjects, you need repeated measures tests rather than standard tests.

Example:

Suppose you’re conducting a drug trial with multiple groups of patients and measuring the effect on pain levels at different time points. A two-way mixed ANOVA (often described as a two-way repeated measures ANOVA) would be suitable. This allows you to examine the effect of the drug (between-subjects factor), the change over time (within-subjects factor), and the interaction between the two.

Your study design determines the fundamental structure of the test you’ll use.

Data Characteristics

Before applying any test, you must evaluate several characteristics of your data.

Characteristics to assess:

  • What type of outcome are you measuring? Continuous outcomes (like tumor size or blood pressure) require different tests than categorical outcomes (like presence or absence of disease).
  • Is your data normally distributed? This is one of the most important questions, as it determines whether you can use parametric tests. If you’re studying the effects of a new treatment on a continuous outcome like tumor size reduction, check if your data is normally distributed. You can assess normality through histograms, Q-Q plots, or normality tests like Shapiro-Wilk.
  • Are there outliers or extreme values? These can violate assumptions of parametric tests and may push you toward nonparametric alternatives.
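A rough numeric screen can complement the plots mentioned above. The sketch below computes moment-based skewness and flags outliers with Tukey’s 1.5 × IQR rule. Both helper functions and the tumor-size values are hypothetical, and neither replaces a histogram, a Q-Q plot, or a formal test such as Shapiro-Wilk.

```python
import statistics

def sample_skewness(values):
    """Moment-based skewness: about 0 for symmetric data, positive for a long right tail."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [x for x in values if x < q1 - k * spread or x > q3 + k * spread]

# Hypothetical tumor sizes (cm) with one extreme value.
tumor_sizes = [2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 3.4, 9.8]
print("skewness:", round(sample_skewness(tumor_sizes), 2))
print("flagged outliers:", iqr_outliers(tumor_sizes))
```

Strong skewness or flagged values are a prompt to look more closely at the data, not an automatic reason to delete observations.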

Example

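If you’re comparing survival status at a fixed time point (alive vs. dead, a categorical outcome) between patients receiving different treatments, a chi-square test would be appropriate. This test helps determine whether there is a significant association between the treatment received and the survival outcome. If you instead analyze the time until death, use the survival methods listed in the decision tables below.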

Understanding your data’s distribution is vital for choosing the correct test.
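For the categorical example above, a 2×2 chi-square test can be sketched in a few lines. The counts are hypothetical, and the function uses the plain Pearson statistic without a continuity correction. When expected counts are small, Fisher’s exact test is the usual alternative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: rows are treatments, columns are alive / dead at follow-up.
statistic, p_value = chi_square_2x2(45, 15, 30, 30)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```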

Parametric vs. Nonparametric Tests

This is one of the most critical decisions in test selection, yet it’s often misunderstood.

Parametric tests

Parametric tests make assumptions about your data:

  • They assume your data comes from a normally distributed (Gaussian) population
  • They work with the actual values of your data
  • Common parametric tests: t-test, ANOVA, Pearson correlation, linear regression
  • Parametric tests are generally more powerful (better at detecting true effects) when their assumptions are met

When to use parametric tests:

  • Your data is approximately normally distributed
  • You have a reasonable sample size (generally 20+ observations per group)
  • Your outcome is measured on a continuous scale
  • You’ve verified normality through visual inspection (histogram/Q-Q plot) or statistical tests

Nonparametric tests

Nonparametric tests make fewer assumptions:

  • They don’t assume normal distribution
  • They work with the ranks or order of your data rather than actual values
  • Common nonparametric tests: Mann-Whitney U, Kruskal-Wallis, Spearman correlation, Wilcoxon signed-rank test
  • They are often called “distribution-free” tests

When to use nonparametric tests:

  • Your data is clearly not normally distributed (heavily skewed or bimodal)
  • You have very small sample sizes where normality is hard to assess
  • Your outcome is ordinal or ranked data (pain scores on a scale of 1-10, class rankings)
  • Some values are “off the scale” (too high or too low to measure exactly)
  • Your data contains extreme outliers that violate normality assumptions

Choosing between parametric and nonparametric tests:

Parametric Test | Nonparametric Alternative | When to Use Nonparametric
t-test (2 groups) | Mann-Whitney U test | Data not normally distributed
Paired t-test | Wilcoxon signed-rank test | Non-normal paired data
ANOVA (3+ groups) | Kruskal-Wallis test | Non-normal data with multiple groups
Repeated measures ANOVA | Friedman test | Non-normal repeated measures data
Pearson correlation | Spearman correlation | Non-normal or ranked variables

 

The choice between parametric and nonparametric tests depends on both your data distribution and your sample size. With large sample sizes, parametric tests are robust to violations of normality due to the central limit theorem. With small sample sizes, the choice matters much more, and nonparametric tests may be safer if you’re uncertain about normality.
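To see what “working with ranks” means in practice, the sketch below computes the Mann-Whitney U statistic by hand and shows that an extreme outlier leaves it unchanged, while it would shift a mean considerably. The data and function names are hypothetical, and the code returns only the statistic, not a p-value.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: its rank sum minus the smallest possible rank sum."""
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[: len(group_a)])
    return rank_sum_a - len(group_a) * (len(group_a) + 1) / 2

# Hypothetical scores; the second treated list swaps its largest value for an outlier.
control = [3, 4, 4, 5, 6]
treated = [5, 7, 8, 9, 10]
treated_with_outlier = [5, 7, 8, 9, 250]

print(mann_whitney_u(treated, control))
print(mann_whitney_u(treated_with_outlier, control))
```

Both calls give the same U because only the ordering of the values matters, not how far the largest value sits from the rest.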

Paired vs. Unpaired / Repeated Measures Tests

This decision point confuses many researchers. You must determine whether your groups are independent or related.

Use unpaired (independent) tests when:

  • Each subject appears in only one group
  • The groups are completely separate with no matching or pairing
  • There is no logical connection between a value in one group and a specific value in another group
  • Examples: comparing treatment group vs. control group, comparing men’s blood pressure to women’s blood pressure

Use paired tests when:

  • The same subject is measured twice (before and after intervention)
  • Subjects have been matched on important variables (age, disease severity)
  • You have repeated measurements on the same subject over time
  • There is a natural pairing between observations
  • Examples: baseline vs. post-treatment measurements, measurements on twins, measurements on eyes (left eye vs. right eye of same person)

Common paired vs. unpaired test pairs:

Comparison Type | Unpaired/Independent Test | Paired/Dependent Test
Compare 2 groups, continuous normally distributed data | Unpaired t-test | Paired t-test
Compare 2 groups, non-normal or ordinal data | Mann-Whitney U test | Wilcoxon signed-rank test
Compare 2 groups, categorical data | Chi-square or Fisher’s exact test | McNemar’s test
Compare 3+ groups, continuous normally distributed data | One-way ANOVA | Repeated measures ANOVA
Compare 3+ groups, non-normal data | Kruskal-Wallis test | Friedman test
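
For paired categorical data, McNemar’s test uses only the discordant pairs, that is, subjects whose category changed between the two measurements. The following is a minimal sketch without a continuity correction; the counts and parameter names are hypothetical.

```python
import math

def mcnemar(only_first_positive, only_second_positive):
    """McNemar chi-square (no continuity correction) from the two discordant-pair counts."""
    b, c = only_first_positive, only_second_positive
    statistic = (b - c) ** 2 / (b + c)
    # Chi-square upper tail with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical study: 15 patients were symptomatic only before treatment, 5 only after.
statistic, p_value = mcnemar(15, 5)
print(f"McNemar chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

Subjects whose status did not change contribute nothing to the statistic, which is why an unpaired chi-square test on the same data would answer a different question.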

 

Why this distinction matters:

Paired tests are usually more powerful for detecting differences because each subject serves as their own control, which removes between-subject variability from the comparison. However, you can only use a paired test if your study design actually involves paired or repeated measurements. You cannot decide pairing after collecting data based on the values you observed; the pairing must be predetermined by your study design.

Example scenario: If you measure patients’ pain levels before administering a treatment and again one week after treatment, you use a paired test (paired t-test or Wilcoxon) because the same person is measured twice. If you instead compare pain levels of one group receiving treatment to a different group receiving placebo, you use an unpaired test (unpaired t-test or Mann-Whitney) because different people are in each group.
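
The gain from pairing can be seen by computing both t statistics on the same before-and-after data. This is a sketch with hypothetical pain scores. The unpaired statistic is written in Welch’s form, which is an assumption on our part; with equal group sizes it equals the pooled-variance Student’s version.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic of the paired t-test: mean difference divided by its standard error."""
    differences = [a - b for a, b in zip(after, before)]
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return statistics.fmean(differences) / standard_error

def welch_t(group_a, group_b):
    """t statistic of the unpaired (Welch) t-test, which ignores any pairing."""
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error

# Hypothetical pain scores for six patients, before and one week after treatment.
before = [8, 6, 9, 4, 7, 5]
after = [6, 5, 7, 3, 6, 3]

print("paired t:  ", round(paired_t(before, after), 2))
print("unpaired t:", round(welch_t(after, before), 2))
```

Every patient improves by 1 or 2 points, so the differences are very consistent and the paired statistic is large in magnitude. The unpaired statistic is much smaller because the spread between patients swamps the average change. Running the unpaired version on genuinely paired data is shown here only for contrast.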

Sample Size and Statistical Power

Sample size affects which tests you can use and how reliably they’ll detect real effects.

How sample size influences test choice:

  • Large samples (n > 30 per group): Parametric tests are robust even if data isn’t perfectly normal. Both parametric and nonparametric tests work well, though parametric tests are slightly more powerful.
  • Small samples (n < 20 per group): Nonparametric tests lose statistical power (ability to detect true differences). Parametric tests may be problematic if data isn’t normal, but with small samples it’s also hard to assess normality. This creates a dilemma with no perfect solution.
  • Very small samples (n < 10): You have limited power to detect effects no matter which test you use. This is when study design decisions (paired vs. unpaired) become especially important.

Practical implications for test selection:

  • With adequate sample size, you can usually choose the test based on data distribution
  • With small sample size, even nonparametric alternatives may lack power
  • Before data collection, conduct a power analysis to determine your required sample size
  • Power analysis also accounts for the effect size you expect to detect and your desired statistical significance level (usually 0.05)

The relationship between sample size and test assumptions:

Sample Size | If Data Is Normal | If Data Is Not Normal
Large (n > 30) | Use parametric test | Parametric test still works well
Small (n < 20) | Use parametric test | Consider nonparametric test, but power may be low
Very small (n < 10) | Nonparametric test may be safer | Nonparametric test, but expect low power

Statistical power refers to your test’s ability to detect a true effect if one exists. Higher power is better. Power depends on sample size, effect size, significance level, and the specific test used. If your sample size is small, consider that your study may not have sufficient power to detect meaningful differences, even if they exist.
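
A basic power analysis for comparing two group means can be sketched with the normal approximation n = 2 × ((z for 1 − alpha/2 + z for power) / d)² per group, where d is the standardized effect size (Cohen’s d). The function name and defaults are assumptions for illustration. Dedicated power software uses the t distribution and typically returns slightly larger numbers.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two independent means.

    effect_size is the standardized difference (Cohen's d); the calculation uses
    the normal approximation, so treat the result as a lower bound.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {sample_size_per_group(d)} per group")
```

Smaller expected effects require far larger samples, which is why the expected effect size should be settled before data collection.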

One-Sided vs. Two-Sided Tests

This decision must be made before collecting your data and affects your interpretation of results.

Two-sided tests (most common):

  • Test whether a difference exists in either direction
  • If comparing treatments A and B, tests whether A > B or B > A
  • More conservative approach
  • Recommended when you genuinely don’t know which direction differences might go
  • Calculation: the significance level (alpha) is split equally between the two tails of the distribution
  • Example: with alpha = 0.05, each tail gets 0.025

Use two-sided tests when:

  • You have no strong prior hypothesis about direction
  • You’d be interested in results regardless of which direction the difference goes
  • Following best practices (most journals and statisticians recommend two-sided tests)
  • You’re uncertain about the relationship between variables

One-sided tests (less common):

  • Test whether a difference exists in a specific, pre-specified direction
  • Must predict the direction before analyzing data
  • More powerful (easier to detect an effect if it exists in your predicted direction)
  • Cannot detect or report effects in the opposite direction
  • Calculation: uses the full significance level in one tail

Use one-sided tests only when:

  • You have a very strong theoretical reason to predict direction
  • You genuinely wouldn’t care about differences in the opposite direction
  • You’ve specified the direction in your protocol before data collection
  • You’re willing to ignore unexpected effects in the opposite direction

Critical caution:

Do not decide to use a one-sided test after looking at your data. Doing so is a form of “p-hacking” and compromises the validity of your results. The direction must be specified in advance. If your data show an effect opposite to your prediction, you cannot reverse the direction of your one-sided test to reach significance; the result is non-significant for the hypothesis you pre-specified.

Example comparison:

Scenario | Appropriate Test | Rationale
Testing if drug A differs from drug B in any way | Two-sided | No prediction about which is better
Testing if new treatment reduces pain more than control | One-sided (if pre-specified) | Directional hypothesis specified in advance
Comparing multiple patient outcomes | Two-sided | Multiple outcomes mean some may go either way
Testing if your sample differs from a standard reference value | Two-sided | Unless you predicted direction before the study

Unless you have a compelling reason and have pre-specified direction, use two-sided tests.
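
The difference between the two approaches is easiest to see on a single test statistic. The sketch below assumes a standard normal (z) statistic and a one-sided hypothesis in the upper tail; the function name and the value 1.8 are hypothetical.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """Two-sided and upper-tail one-sided p-values for a standard normal statistic."""
    upper_tail = 1 - NormalDist().cdf(z)
    two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return two_sided, upper_tail

two_sided, one_sided = p_values_from_z(1.8)
print(f"z = 1.8:  two-sided p = {two_sided:.3f}, one-sided p = {one_sided:.3f}")

# The same effect in the opposite direction is far from significant one-sided.
two_sided_neg, one_sided_neg = p_values_from_z(-1.8)
print(f"z = -1.8: two-sided p = {two_sided_neg:.3f}, one-sided p = {one_sided_neg:.3f}")
```

With alpha = 0.05, the one-sided test calls z = 1.8 significant while the two-sided test does not. This is exactly why the choice has to be fixed before the data are seen.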

Test Selection Decision Table

These tables guide you to appropriate tests based on your study characteristics. First identify your outcome type, then the number of groups, then whether the groups are independent or paired.

Outcome: Continuous Data (Normally Distributed)

Number of Groups or Goal | Study Design | Appropriate Test
1 group | Compare to hypothetical value | One-sample t-test
2 groups | Independent/unpaired | Unpaired t-test
2 groups | Paired/dependent | Paired t-test
3+ groups | Independent/unpaired | One-way ANOVA
3+ groups | Paired/repeated measures | Repeated measures ANOVA
Relationship between 2 variables | Correlation | Pearson correlation
Predicting Y from X | Regression | Linear regression

 

Outcome: Continuous Data (Not Normally Distributed or Ordinal/Ranked Data)

Number of Groups or Goal | Study Design | Appropriate Test
1 group | Compare to hypothetical value | One-sample Wilcoxon signed-rank test
2 groups | Independent/unpaired | Mann-Whitney U test
2 groups | Paired/dependent | Wilcoxon signed-rank test
3+ groups | Independent/unpaired | Kruskal-Wallis test
3+ groups | Paired/repeated measures | Friedman test
Relationship between 2 variables | Correlation | Spearman correlation
Predicting Y from X | Regression | Nonparametric regression

 

Outcome: Categorical Data (Binary or Multiple Categories)

Number of Groups or Goal | Study Design | Appropriate Test
1 group | Compare proportions to expected | Chi-square goodness-of-fit test
2 groups | Independent/unpaired | Chi-square test or Fisher’s exact test
2 groups | Paired/dependent | McNemar’s test
3+ groups | Independent/unpaired | Chi-square test
Association between variables | Relationship | Contingency tables or logistic regression

 

Outcome: Survival or Time-to-Event Data

Number of Groups or Goal | Study Design | Appropriate Test
1 group | Describe survival pattern | Kaplan-Meier survival curve
2 groups | Compare survival between groups | Log-rank test
3+ groups | Compare survival across groups | Cox proportional hazards regression
Multiple predictors | Predict survival | Cox proportional hazards regression
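
To make the first row concrete, here is a minimal sketch of the Kaplan-Meier estimate for one group. At each event time, the running survival probability is multiplied by (1 − events / number at risk), and censored subjects leave the risk set without counting as events. The function name and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time where an event occurred.

    events[i] is True if subject i had the event at times[i], False if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for time, event in zip(times, events) if time == t and event)
        leaving = sum(1 for time in times if time == t)  # events plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up in months; False marks a censored observation.
times = [2, 3, 3, 5, 6, 8]
events = [True, True, False, True, False, True]

for t, s in kaplan_meier(times, events):
    print(f"month {t}: S(t) = {s:.3f}")
```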

 

How to use these tables:

  • Identify your primary outcome type (continuous, categorical, or survival time)
  • Determine if your outcome is normally distributed (if continuous)
  • Count how many groups you’re comparing
  • Identify whether your groups are independent or paired
  • Find the intersection of these characteristics in the appropriate table
  • The matching row shows the recommended test(s)
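
The lookup steps above can also be written as a small helper. This sketch encodes only the group-comparison rows of the tables; the key names and the function are hypothetical, and the result is a starting point for discussion rather than a final answer.

```python
# (outcome type, group bucket, paired?) -> test named in the decision tables
DECISION_TABLE = {
    ("continuous_normal", "1", False): "One-sample t-test",
    ("continuous_normal", "2", False): "Unpaired t-test",
    ("continuous_normal", "2", True): "Paired t-test",
    ("continuous_normal", "3+", False): "One-way ANOVA",
    ("continuous_normal", "3+", True): "Repeated measures ANOVA",
    ("nonnormal_or_ordinal", "1", False): "Wilcoxon signed-rank test",
    ("nonnormal_or_ordinal", "2", False): "Mann-Whitney U test",
    ("nonnormal_or_ordinal", "2", True): "Wilcoxon signed-rank test",
    ("nonnormal_or_ordinal", "3+", False): "Kruskal-Wallis test",
    ("nonnormal_or_ordinal", "3+", True): "Friedman test",
    ("categorical", "1", False): "Chi-square goodness-of-fit test",
    ("categorical", "2", False): "Chi-square test or Fisher's exact test",
    ("categorical", "2", True): "McNemar's test",
    ("categorical", "3+", False): "Chi-square test",
    ("survival", "1", False): "Kaplan-Meier survival curve",
    ("survival", "2", False): "Log-rank test",
    ("survival", "3+", False): "Cox proportional hazards regression",
}

def suggest_test(outcome, n_groups, paired=False):
    """Look up a candidate test; combinations not in the tables return a fallback message."""
    bucket = "3+" if n_groups >= 3 else str(n_groups)
    return DECISION_TABLE.get((outcome, bucket, paired), "No entry: consult a biostatistician")

print(suggest_test("continuous_normal", 2, paired=True))
print(suggest_test("nonnormal_or_ordinal", 4))
print(suggest_test("categorical", 3, paired=True))
```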

 

This article was originally published on August 3, 2023, and updated on June 2, 2026.

Author

Marisha Fonseca

An editor at heart and perfectionist by disposition, providing solutions for journals, publishers, and universities in areas like alt-text writing and publication consultancy.
