How to choose the right statistical test
Choosing the right statistical test requires considering five key factors: your research question, study design, data characteristics, parametric assumptions, and the pairing structure of your data. While this guide covers the most common scenarios, remember that statistical testing is not one-size-fits-all, and context matters.
Before analyzing your data:
- Clearly state your research question and primary outcome
- Document your study design (independent vs. paired groups, number of measurements)
- Assess your data for normality and outliers
- Consider your sample size and resulting statistical power
- Pre-specify whether you’ll use one-sided or two-sided tests
- Consult the decision tables below to identify candidate tests
- When in doubt, consider consulting with a biostatistician
Contents
- Research Question
- Study Design
- Data Characteristics
- Parametric vs. Nonparametric Tests
- Paired vs. Unpaired / Repeated Measures Tests
- Sample Size and Statistical Power
- One-Sided vs. Two-Sided Tests
- Test Selection Decision Table
Research Question
Start by clearly defining your research question, as it determines which statistical approach you’ll need.
Think about what you’re trying to investigate:
- Are you looking for relationships between variables? For example, if you want to investigate the relationship between two continuous variables like blood pressure and heart rate in patients, you should consider using correlation analysis. This helps you understand whether these variables are positively or negatively related and the strength of that relationship.
- Are you looking at the effect of an exposure? For instance, if you’re interested in whether exposure to X increases the likelihood of disease Y, you’d calculate an odds ratio, which compares the odds of disease in the exposed group with the odds in the unexposed group.
- Are you comparing outcomes between groups? If your research question is about whether a new treatment reduces pain levels compared to a control, you’ll need a comparison test rather than a correlation.
- Are you predicting future outcomes? If you want to predict patient outcomes based on multiple variables, regression models would be appropriate.
Your research question directly shapes whether you should use descriptive statistics, correlation, regression, or group comparison tests.
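As a minimal sketch of the exposure question, the odds ratio can be computed directly from a 2x2 table of counts. The function name `odds_ratio` and the counts are hypothetical, invented for illustration only; they do not come from any real study.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of disease among the exposed divided by odds among the unexposed."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed


# Hypothetical counts, invented for illustration only.
result = odds_ratio(exposed_cases=30, exposed_noncases=70,
                    unexposed_cases=15, unexposed_noncases=85)
print(round(result, 2))  # values above 1 suggest higher odds of disease with exposure
```

An odds ratio alone is only a point estimate; a real analysis would also report a confidence interval, typically from a statistics package.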
Study Design
Next, examine your study design carefully, as it dictates which statistical tests are valid for your data.
Key design considerations:
- How many groups are you comparing? Tests for comparing two groups differ from tests for three or more groups. Two-group comparisons use t-tests or Mann-Whitney tests, while multiple-group comparisons require ANOVA or Kruskal-Wallis tests.
- Are your groups independent or related? This distinction is critical and determines whether you use unpaired or paired tests (covered in detail in the Paired vs. Unpaired section below).
- Are you measuring variables over time? If you’re conducting a longitudinal study with repeated measurements on the same subjects, you need repeated measures tests rather than standard tests.
Example:
Suppose you’re conducting a drug trial with multiple groups of patients and measuring the effect on pain levels at different time points. A two-way repeated measures ANOVA (also called a mixed-design ANOVA when one factor is between subjects) would be suitable. This allows you to examine both the effect of the drug (between-subject factor) and the change over time (within-subject factor).
Your study design determines the fundamental structure of the test you’ll use.
Data Characteristics
Before applying any test, you must evaluate several characteristics of your data.
Characteristics to assess:
- What type of outcome are you measuring? Continuous outcomes (like tumor size or blood pressure) require different tests than categorical outcomes (like presence or absence of disease).
- Is your data normally distributed? This is one of the most important questions, as it determines whether you can use parametric tests. If you’re studying the effects of a new treatment on a continuous outcome like tumor size reduction, check if your data is normally distributed. You can assess normality through histograms, Q-Q plots, or normality tests like Shapiro-Wilk.
- Are there outliers or extreme values? These can violate assumptions of parametric tests and may push you toward nonparametric alternatives.
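One common way to screen for extreme values is Tukey's interquartile-range rule, sketched below with Python's standard library. The 1.5 x IQR cutoff is a widely used convention rather than something this guide prescribes, and the helper name `iqr_outliers` and the measurements are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements, invented for illustration only.
measurements = [4.1, 4.4, 4.6, 4.8, 5.0, 5.1, 5.3, 5.5, 5.8, 12.9]
flagged = iqr_outliers(measurements)
print(flagged)
```

A flagged value is a prompt to investigate (data entry error, real extreme response), not an automatic reason to delete it.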
Example
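If you’re comparing whether patients on different treatments were alive or dead at a fixed follow-up point (a categorical outcome), a chi-square test would be appropriate. This test helps determine whether there is a significant association between the treatment received and the survival outcome. If you instead analyze how long patients survive, use the time-to-event methods listed in the decision tables.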
Understanding your data’s distribution is vital for choosing the correct test.
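The chi-square example can be sketched for the simplest case, a 2x2 table, using only the standard library. This sketch assumes no continuity correction and relies on the fact that a chi-square tail probability with 1 degree of freedom can be written with the complementary error function. The function name and the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are treatments A and B, columns are survived / died.
statistic, p_value = chi_square_2x2(40, 10, 28, 22)
print(round(statistic, 2), round(p_value, 3))
```

For larger tables or small expected counts, a library routine such as `scipy.stats.chi2_contingency` or Fisher's exact test would be the more usual choice.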
Parametric vs. Nonparametric Tests
This is one of the most critical decisions in test selection, yet it’s often misunderstood.
Parametric tests
Parametric tests make assumptions about your data:
- They assume your data comes from a normally distributed (Gaussian) population
- They work with the actual values of your data
- Common parametric tests: t-test, ANOVA, Pearson correlation, linear regression
- Parametric tests are generally more powerful (better at detecting true effects) when their assumptions are met
When to use parametric tests:
- Your data is approximately normally distributed
- You have a reasonable sample size (generally 20+ observations per group)
- Your outcome is measured on a continuous scale
- You’ve verified normality through visual inspection (histogram/Q-Q plot) or statistical tests
Nonparametric tests
Nonparametric tests make fewer assumptions:
- They don’t assume normal distribution
- They work with the ranks or order of your data rather than actual values
- Common nonparametric tests: Mann-Whitney U, Kruskal-Wallis, Spearman correlation, Wilcoxon signed-rank test
- They are often called “distribution-free” tests
When to use nonparametric tests:
- Your data is clearly not normally distributed (heavily skewed or bimodal)
- You have very small sample sizes where normality is hard to assess
- Your outcome is ordinal or ranked data (pain scores on a scale of 1-10, class rankings)
- Some values are “off the scale” (too high or too low to measure exactly)
- Your data contains extreme outliers that violate normality assumptions
Choosing between parametric and nonparametric tests:
| Parametric Test | Nonparametric Alternative | When to use nonparametric |
| t-test (2 groups) | Mann-Whitney U test | Data not normally distributed |
| Paired t-test | Wilcoxon signed-rank test | Non-normal paired data |
| ANOVA (3+ groups) | Kruskal-Wallis test | Non-normal data with multiple groups |
| Repeated measures ANOVA | Friedman test | Non-normal repeated measures data |
| Pearson correlation | Spearman correlation | Non-normal or ranked variables |
The choice between parametric and nonparametric tests depends on your data distribution. With large sample sizes, parametric tests are robust to violations of normality due to the central limit theorem. With small sample sizes, the choice matters much more, and nonparametric tests may be safer if you’re uncertain about normality.
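To illustrate what "working with ranks" means in practice, the sketch below computes Spearman correlation as Pearson correlation applied to ranks, and shows how a single extreme value affects the two differently. The helper names (`pearson`, `ranks`, `spearman`) and the data are hypothetical and written from scratch for illustration.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y rises steadily with x, except for one extreme final value.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 5, 6, 8, 9, 11, 95]
print(round(pearson(x, y), 2), round(spearman(x, y), 2))
```

Because the extreme value only changes the size of the last observation and not its order, the rank-based coefficient stays at its maximum while the value-based coefficient drops noticeably.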
Paired vs. Unpaired / Repeated Measures Tests
This decision point confuses many researchers. You must determine whether your groups are independent or related.
Use unpaired (independent) tests when:
- Each subject appears in only one group
- The groups are completely separate with no matching or pairing
- There is no logical connection between a value in one group and a specific value in another group
- Examples: comparing treatment group vs. control group, comparing men’s blood pressure to women’s blood pressure
Use paired tests when:
- The same subject is measured twice (before and after intervention)
- Subjects have been matched on important variables (age, disease severity)
- You have repeated measurements on the same subject over time
- There is a natural pairing between observations
- Examples: baseline vs. post-treatment measurements, measurements on twins, measurements on eyes (left eye vs. right eye of same person)
Common paired vs. unpaired test pairs:
| Comparison Type | Unpaired/Independent Test | Paired/Dependent Test |
| Compare 2 groups, continuous normally distributed data | Unpaired t-test | Paired t-test |
| Compare 2 groups, non-normal or ordinal data | Mann-Whitney U test | Wilcoxon signed-rank test |
| Compare 2 groups, categorical data | Chi-square or Fisher’s exact test | McNemar’s test |
| Compare 3+ groups, continuous normally distributed data | One-way ANOVA | Repeated measures ANOVA |
| Compare 3+ groups, non-normal data | Kruskal-Wallis test | Friedman test |
Why this distinction matters:
Paired tests are usually more powerful for detecting differences because each subject serves as their own control, which removes between-subject variability from the comparison. However, you can only use a paired test if your study design actually involves paired or repeated measurements. You cannot decide pairing after collecting data based on the values you observed; the pairing must be predetermined by your study design.
Example scenario: If you measure patients’ pain levels before administering a treatment and again one week after treatment, you use a paired test (paired t-test or Wilcoxon) because the same person is measured twice. If you instead compare pain levels of one group receiving treatment to a different group receiving placebo, you use an unpaired test (unpaired t-test or Mann-Whitney) because different people are in each group.
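The power gain from pairing can be seen by computing both t statistics on the same numbers. The sketch below uses hypothetical before/after pain scores for eight patients and hand-written helpers (`paired_t`, `unpaired_t`); it stops at the t statistic because the standard library has no t distribution, so p-values would come from a statistics package (for example `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind`).

```python
import math
import statistics


def unpaired_t(a, b):
    """Student's t statistic with pooled variance (assumes equal variances)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def paired_t(first, second):
    """Paired t statistic: mean of the within-subject differences over its standard error."""
    diffs = [x - y for x, y in zip(first, second)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical pain scores for the same 8 patients, invented for illustration only.
before = [8, 6, 7, 5, 9, 4, 7, 6]
after = [7, 5, 5, 4, 8, 3, 6, 4]

t_paired = paired_t(before, after)
t_unpaired = unpaired_t(before, after)
print(round(t_paired, 2), round(t_unpaired, 2))
```

Every patient improves by one or two points, so the differences are very consistent and the paired statistic is large. Treating the same numbers as two unrelated groups buries that consistent shift under patient-to-patient variation. Running the unpaired version here is only for contrast; the design determines which test is valid.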
Sample Size and Statistical Power
Sample size affects which tests you can use and how reliably they’ll detect real effects.
How sample size influences test choice:
- Large samples (n > 30 per group): Parametric tests are robust even if data isn’t perfectly normal. Both parametric and nonparametric tests work well, though parametric tests are slightly more powerful.
- Small samples (n < 20 per group): Nonparametric tests lose statistical power (ability to detect true differences). Parametric tests may be problematic if data isn’t normal, but with small samples it’s also hard to assess normality. This creates a dilemma with no perfect solution.
- Very small samples (n < 10): You have limited power to detect effects no matter which test you use. This is when study design decisions (paired vs. unpaired) become especially important.
Practical implications for test selection:
- With adequate sample size, you can usually choose the test based on data distribution
- With small sample size, even nonparametric alternatives may lack power
- Before data collection, conduct a power analysis to determine your required sample size
- Power analysis also accounts for the effect size you expect to detect and your desired statistical significance level (usually 0.05)
The relationship between sample size and test assumptions:
| Sample Size | If Data is Normal | If Data is Not Normal |
| Large (n > 30) | Use parametric test | Parametric test still works well |
| Small (n < 20) | Use parametric test | Consider nonparametric test, but power may be low |
| Very small (n < 10) | Nonparametric test may be safer | Nonparametric test, but expect low power |
Statistical power refers to your test’s ability to detect a true effect if one exists. Higher power is better. Power depends on sample size, effect size, significance level, and the specific test used. If your sample size is small, consider that your study may not have sufficient power to detect meaningful differences, even if they exist.
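A rough power analysis for comparing two independent group means can be sketched with the normal approximation, using only the standard library. This assumes equal group sizes, a two-sided test, and an effect size expressed as a standardized mean difference (Cohen's d); the function name `n_per_group` is hypothetical. Dedicated tools based on the t distribution give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

The required sample size grows with the inverse square of the effect size, which is why small expected effects demand much larger studies.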
One-Sided vs. Two-Sided Tests
This decision must be made before collecting your data and affects your interpretation of results.
Two-sided tests (most common):
- Test whether a difference exists in either direction
- If comparing treatments A and B, tests whether A > B or B > A
- More conservative approach
- Recommended when you genuinely don’t know which direction differences might go
- Calculation: the significance level (alpha) is split evenly between the two tails of the distribution
- Example: with alpha = 0.05, each tail gets 0.025
Use two-sided tests when:
- You have no strong prior hypothesis about direction
- You’d be interested in results regardless of which direction the difference goes
- Following best practices (most journals and statisticians recommend two-sided tests)
- You’re uncertain about the relationship between variables
One-sided tests (less common):
- Test whether a difference exists in a specific, pre-specified direction
- Must predict the direction before analyzing data
- More powerful (easier to detect an effect if it exists in your predicted direction)
- Cannot detect or report effects in the opposite direction
- Calculation: uses the full significance level in one tail
Use one-sided tests only when:
- You have a very strong theoretical reason to predict direction
- You genuinely wouldn’t care about differences in the opposite direction
- You’ve specified the direction in your protocol before data collection
- You’re willing to ignore unexpected effects in the opposite direction
Critical caution:
Do not decide to use a one-sided test after looking at your data. Choosing the test direction after seeing the results is a form of “p-hacking” and compromises the validity of your results. The direction must be specified in advance. If your data show an effect opposite to your prediction, you cannot switch the direction of your one-sided test, or switch to a two-sided test, to reach significance.
Example comparison:
| Scenario | Appropriate Test | Rationale |
| Testing if drug A differs from drug B in any way | Two-sided | No prediction about which is better |
| Testing if new treatment reduces pain more than control | One-sided (if pre-specified) | Directional hypothesis specified in advance |
| Comparing multiple patient outcomes | Two-sided | Multiple outcomes mean some may go either way |
| Testing if your sample differs from a standard reference value | Two-sided | Unless you predicted direction before study |
Unless you have a compelling reason and have pre-specified direction, use two-sided tests.
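The difference between the two approaches can be sketched for a test statistic that follows a standard normal distribution (a z statistic). The helper name `p_values` and the example statistic of 1.8 are hypothetical; the one-sided test here assumes the pre-specified direction is "greater than".

```python
from statistics import NormalDist


def p_values(z):
    """Return (two-sided p, one-sided upper-tail p) for a z statistic."""
    upper_tail = 1 - NormalDist().cdf(z)
    two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return two_sided, upper_tail


# Effect in the predicted direction.
two_sided, one_sided = p_values(1.8)
print(round(two_sided, 3), round(one_sided, 3))

# Same size of effect, but opposite to the prediction.
two_sided_opp, one_sided_opp = p_values(-1.8)
print(round(two_sided_opp, 3), round(one_sided_opp, 3))
```

In the predicted direction, the one-sided p-value is half the two-sided one, so the same data can cross 0.05 under one test and not the other. In the opposite direction the one-sided p-value is close to 1, which is the price of committing to a direction in advance.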
Test Selection Decision Table
These tables guide you to appropriate tests based on your study characteristics. First identify your outcome type, then your data distribution (for continuous outcomes), then the number of groups and whether they are independent or paired.
Outcome: Continuous Data (Normally Distributed)
| Number of Groups | Study Design | Appropriate Test |
| 1 group | Compare to hypothetical value | One-sample t-test |
| 2 groups | Independent/unpaired | Unpaired t-test |
| 2 groups | Paired/dependent | Paired t-test |
| 3+ groups | Independent/unpaired | One-way ANOVA |
| 3+ groups | Paired/repeated measures | Repeated measures ANOVA |
| Relationship between 2 variables | Correlation | Pearson correlation |
| Predicting Y from X | Regression | Linear regression |
Outcome: Continuous Data (Not Normally Distributed or Ordinal/Ranked Data)
| Number of Groups | Study Design | Appropriate Test |
| 1 group | Compare to hypothetical value | One-sample Wilcoxon signed-rank test |
| 2 groups | Independent/unpaired | Mann-Whitney U test |
| 2 groups | Paired/dependent | Wilcoxon signed-rank test |
| 3+ groups | Independent/unpaired | Kruskal-Wallis test |
| 3+ groups | Paired/repeated measures | Friedman test |
| Relationship between 2 variables | Correlation | Spearman correlation |
| Predicting Y from X | Regression | Nonparametric regression |
Outcome: Categorical Data (Binary or Multiple Categories)
| Number of Groups | Study Design | Appropriate Test |
| 1 group | Compare proportions to expected | Chi-square goodness-of-fit test |
| 2 groups | Independent/unpaired | Chi-square test or Fisher’s exact test |
| 2 groups | Paired/dependent | McNemar’s test |
| 3+ groups | Independent/unpaired | Chi-square test |
| Association between variables | Relationship | Contingency tables or logistic regression |
Outcome: Survival or Time-to-Event Data
| Number of Groups | Study Design | Appropriate Test |
| 1 group | Describe survival pattern | Kaplan-Meier survival curve |
| 2 groups | Compare survival between groups | Log-rank test |
| 3+ groups | Compare survival across groups | Log-rank test (extends to 3+ groups) or Cox proportional hazards regression |
| Multiple predictors | Predict survival | Cox proportional hazards regression |
How to use these tables:
- Identify your primary outcome type (continuous, categorical, or survival time)
- Determine if your outcome is normally distributed (if continuous)
- Count how many groups you’re comparing
- Identify whether your groups are independent or paired
- Find the intersection of these characteristics in the appropriate table
- The cell shows the recommended test(s)
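The lookup steps above can be sketched as a small dictionary-based helper. This covers only the group-comparison rows for two or more groups, copied from the tables; the key names (`"continuous_normal"`, `"nonnormal_or_ordinal"`, `"categorical"`) and the function `recommend_test` are hypothetical labels chosen for this example.

```python
# Lookup keyed by (outcome type, group count, paired?); entries copied from the tables above.
TESTS = {
    ("continuous_normal", "2", False): "Unpaired t-test",
    ("continuous_normal", "2", True): "Paired t-test",
    ("continuous_normal", "3+", False): "One-way ANOVA",
    ("continuous_normal", "3+", True): "Repeated measures ANOVA",
    ("nonnormal_or_ordinal", "2", False): "Mann-Whitney U test",
    ("nonnormal_or_ordinal", "2", True): "Wilcoxon signed-rank test",
    ("nonnormal_or_ordinal", "3+", False): "Kruskal-Wallis test",
    ("nonnormal_or_ordinal", "3+", True): "Friedman test",
    ("categorical", "2", False): "Chi-square test or Fisher's exact test",
    ("categorical", "2", True): "McNemar's test",
    ("categorical", "3+", False): "Chi-square test",
}


def recommend_test(outcome, n_groups, paired):
    """Return the candidate test for a group comparison, or a fallback message."""
    if n_groups < 2:
        raise ValueError("this sketch only covers comparisons of 2 or more groups")
    group_key = "2" if n_groups == 2 else "3+"
    return TESTS.get((outcome, group_key, paired),
                     "not covered by the tables; consult a statistician")


print(recommend_test("continuous_normal", 2, paired=True))
print(recommend_test("nonnormal_or_ordinal", 4, paired=False))
```

A lookup like this only narrows the candidates; checking assumptions and sample size, as described in the earlier sections, still decides the final choice.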
This article was originally published on August 3, 2023, and updated on June 2, 2026.





