Correlation analysis: Types, when and how to conduct it


Marisha Fonseca
Jun 6, 2026

In this article, you’ll learn

What is correlation analysis?
What are correlation coefficients?
Types of correlations
When to use correlation analysis in biomedical research
Assumptions for correlation analysis
Pearson correlation vs Spearman correlation: When to use each
What is the difference between correlation and regression?
Data collection methods for correlational research
Confounding variables in correlation analysis
Uses of correlation analysis in biomedical research
Five key precautions for correlation analysis
How to report correlation results: Example format
Frequently Asked Questions

What is correlation analysis?

Correlation analysis is a statistical method that helps biomedical researchers uncover relationships between two or more variables in their data. It determines whether a connection or association exists between different factors, such as the correlation between two biomarkers, the relationship between a treatment and patient outcomes, or the interplay between genetic factors and disease risk.

The primary goal is to assess the strength and direction of the relationship between variables. Importantly, correlation analysis identifies whether variables are related, but it does NOT establish whether one variable causes changes in the other.

What are correlation coefficients?

Researchers use a correlation coefficient to quantify the relationship between variables. The most commonly used are:

Pearson correlation coefficient (r) – used for continuous data that is approximately normally distributed and linearly related
Spearman’s rank correlation coefficient (rho or ρ) – used for ordinal data, or for continuous data that does not meet the assumptions of Pearson’s r
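For reference, the standard textbook definitions of the two coefficients can be written as follows. The symbols are introduced here for illustration: x_i and y_i are the paired observations, x̄ and ȳ their means, n the number of pairs, and d_i the difference between the ranks of x_i and y_i. The shortcut formula for ρ holds only when there are no tied ranks; with ties, ρ is computed as Pearson's r on the ranks.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\qquad
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)}
```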

Both coefficients range from -1 to +1:

Correlation Coefficient Value	Meaning
+1.0	Perfect positive correlation: as one variable increases, the other always increases proportionally
+0.7 to +0.99	Strong positive correlation
+0.3 to +0.69	Moderate positive correlation
+0.01 to +0.29	Weak positive correlation
0	No correlation: no linear (Pearson) or monotonic (Spearman) relationship; the variables may still be related in another way
-0.29 to -0.01	Weak negative correlation
-0.3 to -0.69	Moderate negative correlation
-0.7 to -0.99	Strong negative correlation
-1.0	Perfect negative correlation: as one variable increases, the other always decreases proportionally

Note: These ranges are conventional guidelines and may vary by research field and context.
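As a minimal sketch, the guideline ranges in the table can be turned into a small helper. The function name classify_strength is hypothetical, and the cutoffs are simply the conventional ones listed above, so adjust them if your field uses different thresholds.

```python
def classify_strength(r: float) -> str:
    """Label a correlation coefficient using the guideline ranges in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # strength depends on the magnitude, not the sign
    if size == 1.0:
        return f"perfect {direction}"
    if size >= 0.7:
        return f"strong {direction}"
    if size >= 0.3:
        return f"moderate {direction}"
    return f"weak {direction}"


for value in (0.82, 0.35, -0.12, -0.7):
    print(value, "->", classify_strength(value))
```

Because the label is derived from the absolute value, the same cutoffs apply to positive and negative coefficients.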

Types of correlations

Correlations can be classified in three ways:

Positive and negative correlation

Type	Definition	Example
Positive correlation	As one variable increases, the other also increases	Blood pressure and risk of stroke; drug dose and therapeutic effect
Negative correlation	As one variable increases, the other decreases	Exercise frequency and body weight; treatment adherence and hospitalization rate
No correlation	Change in one variable is not associated with any consistent change in the other	Eye color and blood type

Linear and non-linear correlations

Type	Definition	Example
Linear correlation	Constant rate of change in one variable relative to another; appears as a straight line on a scatter plot	Height and weight; age and cholesterol level
Non-linear correlation	Inconsistent rate of change; the relationship changes across the range of values	Medication dose and therapeutic benefit (may plateau at higher doses); temperature and enzyme activity

Simple, multiple, and partial correlations

Type	Definition	When Used
Simple correlation	Examines relationship between only two variables	Smoking status and lung cancer risk
Multiple correlation	Examines relationship between one variable and a set of two or more other variables considered together	How rainfall, fertilizer quality, and sunlight together relate to crop yield
Partial correlation	Examines relationship between two variables while controlling for other variables	Relationship between age and disease severity while controlling for genetic factors and comorbidities

When to use correlation analysis in biomedical research

Use correlation analysis when:

You want to identify whether an association exists between variables without manipulating them
Variables cannot be controlled or manipulated for ethical, practical, or feasibility reasons (e.g., studying effects of disease on patients)
You need to understand natural relationships in real-world settings
You want to generate hypotheses for future experimental research
Your research question focuses on “Is there a relationship?” rather than “Does one variable cause the other?”

Do NOT use correlation analysis when:

Your goal is to establish causation
You can conduct an experimental study where you manipulate variables
You need to predict outcomes based on multiple independent variables (consider regression analysis instead)
Your data violates the assumptions required for your chosen correlation method

Assumptions for correlation analysis

The validity of correlation analysis depends on meeting specific statistical assumptions. The requirements differ based on your data type:

Assumptions for Pearson’s correlation (r)

Use Pearson correlation only when all of these are true:

Both variables are continuous and measured on an interval or ratio scale
Data follows a normal distribution for both variables
The relationship between variables is linear (check with scatter plot)
There are no significant outliers distorting the relationship
Sample size is adequate (generally n > 30 recommended)

Assumptions for Spearman correlation (ρ)

Use Spearman correlation when:

Data is ordinal, ranked, or continuous but non-normally distributed
The relationship is monotonic (consistently increasing or consistently decreasing) but not necessarily linear
You have smaller sample sizes or significant outliers
You prefer a non-parametric approach

Pearson correlation vs Spearman correlation: When to use each

Characteristic	Pearson (r)	Spearman (ρ)
Data type required	Continuous, interval or ratio scale	Ordinal, ranked, or continuous
Distribution required	Normal distribution	No normality requirement
Relationship type	Linear only	Monotonic (linear or non-linear)
Sensitivity to outliers	High (affected by extreme values)	Low (uses ranks, not raw values)
Sample size flexibility	Better with larger samples	Works with smaller samples
Parametric or non-parametric	Parametric	Non-parametric
Biomedical example	Correlation between body weight and blood pressure	Correlation between pain scale ranking and mobility rating
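The difference in the "Relationship type" row can be illustrated with a short sketch. The dose and response values below are made up for illustration, and the helper names (pearson, average_ranks, spearman) are hypothetical; Spearman's ρ is computed here as Pearson's r applied to ranks, with tied values given their average rank.

```python
import math


def pearson(x, y):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: response rises steadily with dose, but not in a straight line.
dose = [1, 2, 3, 4, 5, 6, 7, 8]
response = [d ** 3 for d in dose]

print(f"Pearson r    = {pearson(dose, response):.3f}")
print(f"Spearman rho = {spearman(dose, response):.3f}")
```

With these numbers the relationship is perfectly monotonic, so ρ reaches 1, while r stays below 1 because the points do not fall on a straight line.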

What is the difference between correlation and regression?

Correlation and regression both examine relationships between variables but serve different purposes:

Aspect	Correlation Analysis	Regression Analysis
Main purpose	Determine if relationship exists between variables	Predict value of dependent variable from independent variable(s)
Direction of relationship	Symmetric (no causal direction implied)	Directional (independent variable predicts dependent variable)
Type of variables	Both variables treated equally	Distinguishes between predictor and outcome
What it measures	Strength and direction of relationship	How much the outcome variable is expected to change for each unit change in a predictor
Prediction ability	Limited prediction capability	Designed for prediction
Complexity	Simple, two or more variables	Can handle multiple variables easily
Output	Correlation coefficient (r or ρ)	Regression equation; R-squared value
When to use	Exploratory analysis, hypothesis generation	When you want to predict outcomes

Example:

Correlation analysis asks “Do hours studied and exam scores relate?” Regression analysis asks “Can we predict exam score from hours studied, and by how much?”
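The two questions can be put side by side in a short sketch. The hours and scores below are invented for illustration, and the function name correlation_and_regression is hypothetical. It computes Pearson's r and a simple least-squares line from the same sums, which also shows that the regression slope equals r multiplied by the ratio of the two standard deviations.

```python
import math


def correlation_and_regression(x, y):
    """Return Pearson's r plus the least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)        # symmetric: same value if x and y swap
    slope = sxy / sxx                     # directional: change in y per unit of x
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical data for six students.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_scores = [52, 55, 61, 64, 70, 73]

r, slope, intercept = correlation_and_regression(hours_studied, exam_scores)
print(f"Correlation: r = {r:.2f}")
print(f"Regression: score = {intercept:.1f} + {slope:.2f} * hours")
print(f"Predicted score for 4.5 hours: {intercept + slope * 4.5:.1f}")
```

The coefficient answers "do they relate, and how strongly?", while the fitted line answers "by how much, and what score would we expect?".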

Data collection methods for correlational research

Since variables are not manipulated, data can be collected through multiple methods:

Method	How It Works	Advantages	Disadvantages	Example
Naturalistic observation	Observe and record variables in their natural setting without intervention	Captures real-world behavior; realistic results; no artificial conditions	Cannot control variables; time-consuming; risk of researcher bias	Observing medication adherence patterns in patients at a clinic
Surveys and questionnaires	Participants complete surveys about variables of interest	Large sample sizes possible; cost-effective; quick data collection	Response bias; poorly designed questions affect results; unrepresentative sample	Questionnaire correlating stress levels with sleep quality in healthcare workers
Archival/secondary data	Analyze existing records, databases, or historical data	Free or low-cost; large datasets; long-term trend data; no participant burden	May be incomplete or unreliable; limited control over what was measured; data may not match your research question exactly	Using hospital records to correlate hospital stay duration with infection rates over 5 years

Confounding variables in correlation analysis

A confounding variable is a third factor that influences both variables you are studying, creating a false or misleading correlation.

Example:

You find a strong positive correlation between ice cream sales and drowning deaths across months of the year. It appears ice cream causes drowning. However, the confounding variable is seasonal temperature. Warmer weather (temperature) causes both higher ice cream sales AND more people swimming, which increases drowning risk. Temperature is the true underlying cause.
A study finds correlation between coffee consumption and heart disease risk. However, the confounding variable may be smoking: people who drink more coffee are more likely to smoke, and smoking (not coffee) causes heart disease.

Why confounders matter:

When confounding variables exist, the observed correlation does not reflect the true relationship between your two variables of interest. This is one of the main reasons why correlation does not imply causation.

How to handle confounding variables:

Identify potential confounding variables in your study design
Measure confounding variables and report them
Use statistical methods like partial correlation to control for their effects
Acknowledge limitations in your results
Recommend further experimental research to establish causation
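To illustrate the partial correlation step, here is a minimal sketch using the ice cream example. All numbers are invented, and the function names are hypothetical. It uses the standard first-order formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)), which controls for a single confounder; controlling for several at once needs a more general method.

```python
import math


def pearson(x, y):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical monthly figures: both outcomes track temperature.
temperature = [10, 12, 14, 16, 18, 20, 22, 24]        # degrees Celsius
ice_cream_sales = [15, 15, 35, 35, 55, 55, 75, 75]    # thousands of units
drownings = [3, 5, 5, 7, 11, 13, 13, 15]              # deaths per month

raw_r = pearson(ice_cream_sales, drownings)
adjusted_r = partial_correlation(ice_cream_sales, drownings, temperature)
print(f"raw r = {raw_r:.2f}")
print(f"partial r controlling for temperature = {adjusted_r:.2f}")
```

In this made-up dataset the raw correlation is very strong, but it shrinks to near zero once temperature is held constant, which is the pattern expected when a confounder drives both variables.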

Uses of correlation analysis in biomedical research

Biomedical researchers employ correlation analysis for:

Investigating associations between risk factors and disease (smoking and lung cancer, blood pressure and heart disease severity)
Assessing relationships between biomarkers and disease progression
Exploring connections between genetic factors and disease risk
Identifying potential diagnostic or prognostic biomarkers
Understanding treatment response patterns
Examining relationships in health behavior studies
Analyzing relationships in epidemiological data
Generating hypotheses for further experimental investigation

By understanding these relationships, researchers can identify potential biomarkers, risk factors, or treatment strategies crucial for advancing disease understanding, optimizing patient care, and developing new therapies.

Five key precautions for correlation analysis

1. Direction matters

State the direction (positive or negative) in words whenever you describe a correlation, unless you report the coefficient itself, whose sign already shows the direction
Example: Say “a negative correlation exists” or “r = -0.52”, but not “a weak correlation” with no indication of direction

2. Be precise about strength

Report exact correlation coefficient values in your abstract (e.g., r = 0.68) rather than vague terms like “strong” or “weak”
In the Methods section, define your classification system: specify what ranges you consider strong, moderate, and weak (e.g., “r values of 0.7-1.0 were considered strong”)
Document whether your strength classifications follow conventional guidelines or are field-specific

3. Assumptions must be met

The type of data you have determines which correlation analysis you should run
For continuous, approximately normally distributed variables with a linear relationship → Pearson’s r
For ordinal data, or continuous variables that are not normally distributed → Spearman’s rho
Check and document that your data meets the required assumptions
Report any violations of assumptions and how you addressed them

4. Presentation accuracy matters

The “r” in Pearson’s r is always lowercase
The “ρ” in Spearman’s ρ is the Greek letter rho, NOT the English letter “p”
Always include the correlation coefficient value
Report p-values to indicate statistical significance (e.g., r = 0.65, p < 0.001)
Use correct statistical notation throughout your manuscript
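For context, the p-value reported alongside Pearson's r usually comes from a t-test of the null hypothesis that the true correlation is zero. Assuming the Pearson assumptions hold, the test statistic is as follows, where n is the number of paired observations (a symbol introduced here for illustration):

```latex
t = r \sqrt{\frac{n - 2}{1 - r^{2}}}, \qquad \text{degrees of freedom} = n - 2
```

For the same r, a larger sample gives a larger t, so a weak correlation can be statistically significant in a large sample. This is why the coefficient and the p-value should be reported together.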

5. Correlation does not imply causation

Just because two variables correlate does NOT mean one causes the other
Confounding variables may create the appearance of correlation
The relationship could be coincidental
Multiple competing explanations may exist for an observed correlation
Never draw causal conclusions from correlation analysis alone
Recommend controlled experimental studies to establish causation
Acknowledge alternative explanations for the correlation in your discussion

How to report correlation results: Example format

When presenting correlation findings in your biomedical paper:

Abstract example:

“A moderate positive correlation was found between patient age and disease severity (r = 0.54, p < 0.001, 95% CI [0.42, 0.64]).”
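A confidence interval like the one in this abstract example is commonly obtained with the Fisher z-transformation. The sketch below shows that approach; the function name pearson_ci is hypothetical, and the sample size of 150 is an assumption made purely for illustration, since the example does not state n.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, confidence=0.95):
    """Approximate confidence interval for Pearson's r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("the approximation needs more than 3 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    z = math.atanh(r)                      # Fisher transformation of r
    standard_error = 1 / math.sqrt(n - 3)  # approximate standard error of z
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = math.tanh(z - critical * standard_error)  # back-transform to r scale
    upper = math.tanh(z + critical * standard_error)
    return lower, upper


# n = 150 is a hypothetical sample size, not a value taken from the example.
low, high = pearson_ci(0.54, 150)
print(f"r = 0.54, 95% CI [{low:.2f}, {high:.2f}]")
```

The interval is not symmetric around r because the back-transformation compresses values as they approach ±1.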

Methods section example:

“Pearson correlation coefficients were calculated to assess the relationship between continuous variables. Correlation strength was classified by the absolute value of the coefficient as weak (|r| < 0.3), moderate (0.3 ≤ |r| < 0.7), or strong (|r| ≥ 0.7). Statistical significance was set at p < 0.05. Spearman’s rho was used for variables not meeting normality assumptions as assessed by Shapiro-Wilk test.”

Results section example:

“Systolic blood pressure showed a strong positive correlation with left ventricular mass (r = 0.72, p < 0.001). Body mass index was moderately correlated with insulin resistance (ρ = 0.58, p = 0.002), and this relationship remained significant when controlling for age as a confounding variable (partial r = 0.51, p = 0.008).”

Frequently Asked Questions

Q1: If I find a correlation, does that mean one variable causes the other?

A: No. Correlation identifies that a relationship exists between two variables, but it does not establish causation. Several explanations exist for any observed correlation: (1) Variable A causes Variable B, (2) Variable B causes Variable A, (3) a confounding third variable causes both, or (4) the correlation arose by chance.

For example, a study might find correlation between hospital admissions and ice cream sales, but seasonal temperature is the confounding variable causing both. Only controlled experimental studies can definitively establish causation.

Q2: How large should my sample size be to calculate a valid correlation?

A: A minimum sample size of 30 is generally recommended for Pearson correlation, though larger samples (n > 100) are preferred for more reliable estimates, especially if your data may have outliers. Spearman correlation can work with smaller samples. The required sample size depends on the expected correlation strength: smaller correlations require larger samples to detect. Use statistical power analysis software to calculate the specific sample size needed for your research based on your expected effect size and desired statistical power.
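The link between expected correlation strength and sample size can be sketched with a common approximation based on the Fisher z-transformation. The function name required_n is hypothetical, and the defaults (two-sided alpha of 0.05, power of 0.80) are assumptions; dedicated power analysis software may give slightly different numbers.

```python
import math
from statistics import NormalDist


def required_n(expected_r, alpha=0.05, power=0.80):
    """Approximate sample size to detect a correlation of expected_r (two-sided test)."""
    if expected_r == 0 or not -1 < expected_r < 1:
        raise ValueError("expected_r must be non-zero and strictly between -1 and +1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    effect = math.atanh(abs(expected_r))           # Fisher z of the expected r
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


for expected in (0.1, 0.3, 0.5):
    print(f"expected r = {expected}: n of about {required_n(expected)}")
```

Under these assumptions, detecting r = 0.3 needs roughly 85 participants, and the requirement grows quickly as the expected correlation gets smaller.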

Q3: What if my data is not normally distributed?

A: If your data violates the normality assumption required for Pearson correlation, use Spearman’s rho (rank correlation) instead. Spearman correlation does not assume normal distribution and is more robust to outliers since it ranks data rather than using raw values. Alternatively, you can transform your data (e.g., log transformation) to approach normality, though this changes interpretation. Always check assumptions and report which method you used and why.

Q4: How do I interpret a correlation coefficient of 0.35?

A: A correlation of r = 0.35 indicates a positive relationship that falls at the lower end of the moderate range (0.3 to 0.69) under the conventional guidelines above, although fields that use stricter cutoffs may describe it as weak. The interpretation also depends on context: in some biomedical fields, even a weak correlation may be clinically meaningful. Always report the exact value, the p-value (statistical significance), and the confidence interval rather than just stating “weak” or “strong.” Discuss what the correlation means in practical terms for your research.

Q5: Can I use correlation analysis to predict future patient outcomes?

A: Correlation analysis alone is not designed for prediction. While correlation identifies relationships, regression analysis is better suited for making predictions. Regression analysis builds an equation that estimates how changes in predictor variables relate to changes in an outcome variable. If your goal is prediction (e.g., “Can we predict patient recovery time from their initial severity score?”), use linear or logistic regression instead. Correlation analysis is better for exploratory research and hypothesis generation.

Author

Marisha Fonseca

An editor at heart and perfectionist by disposition, providing solutions for journals, publishers, and universities in areas like alt-text writing and publication consultancy.
