Heterogeneity and homogeneity of treatment effects: what biomedical researchers need to know

Marisha Fonseca
Apr 20, 2026

Heterogeneity and Homogeneity of Treatment Effects: Definition, Examples, and Tests

When a drug cuts blood pressure in half for one patient but does almost nothing for another, that gap is not necessarily noise. Understanding why treatments work differently across individuals, or confirming that they work the same way, is one of the most practically important questions in medicine, economics, public policy, and social science. This post unpacks the concepts of treatment effect heterogeneity and homogeneity, explains why they matter, and shows how researchers detect and interpret them.

Contents

What Is a Treatment Effect?
Homogeneity of Treatment Effects
Heterogeneity of Treatment Effects
How to Measure and Test Heterogeneity
Heterogeneity in Different Disciplines
Pitfalls and Common Mistakes
Practical Recommendations for Researchers
Summary

What Is a Treatment Effect?

Before diving into heterogeneity, it helps to pin down what a “treatment effect” actually means.

In causal inference, a treatment effect is the difference between the outcome a unit would experience if it received a treatment and the outcome that same unit would experience without it. Formally, for individual i:

Individual Treatment Effect (ITE) = Y(1)ᵢ − Y(0)ᵢ

where Y(1) is the potential outcome under treatment and Y(0) is the potential outcome under control.

Because we can never observe both states for the same person at the same time (the fundamental problem of causal inference), researchers typically estimate averages:

Estimand	Full Name	What It Answers
ATE	Average Treatment Effect	Effect for a randomly drawn unit from the full population
ATT	Average Treatment Effect on the Treated	Effect for those who actually received treatment
ATU	Average Treatment Effect on the Untreated	Effect for those who did not receive treatment
CATE	Conditional Average Treatment Effect	Effect for a subgroup defined by covariates X

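If treatment effects are homogeneous, the ATE, ATT, ATU, and every CATE coincide. The converse does not hold: in a randomized trial the ATE, ATT, and ATU are equal in expectation even when individual effects vary widely, so their agreement is not evidence of homogeneity. Divergence among them, or a CATE that varies substantially across subgroups, does indicate heterogeneous treatment effects.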
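Because a simulation can generate both potential outcomes for every unit, it is a convenient way to make these estimands concrete. The Python sketch below assumes a made-up data-generating process (an effect that grows with a single covariate `x`, and units with larger gains being more likely to take up treatment); it illustrates the definitions rather than any real study.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100_000

# Hypothetical data-generating process: one covariate x and an
# individual treatment effect that grows with x (heterogeneity).
x = rng.normal(size=n)
y0 = 1.0 + 0.5 * x + rng.normal(size=n)   # potential outcome without treatment
ite = 1.0 + 1.0 * x                        # individual treatment effect Y(1) - Y(0)
y1 = y0 + ite                              # potential outcome with treatment

# Assumed self-selection on gains: larger effects -> more likely to be treated.
p_treat = 1.0 / (1.0 + np.exp(-ite))
t = rng.random(n) < p_treat

ate = ite.mean()        # average over everyone
att = ite[t].mean()     # average over the treated
atu = ite[~t].mean()    # average over the untreated

# In real data only one potential outcome per unit is ever observed.
y_obs = np.where(t, y1, y0)
naive = y_obs[t].mean() - y_obs[~t].mean()

print(f"ATE={ate:.2f}  ATT={att:.2f}  ATU={atu:.2f}  naive difference={naive:.2f}")
```

Because units with larger gains select into treatment here, the ATT exceeds the ATE, which exceeds the ATU. Replacing the selection step with a coin flip (`t = rng.random(n) < 0.5`) makes all three agree in expectation even though individual effects still vary just as much. The naive difference in observed means is biased as well, because baseline outcomes `y0` also depend on `x`.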

Homogeneity of Treatment Effects

Definition

A treatment is said to have homogeneous effects when its causal impact is approximately constant across all units in the population. In other words, if you knew the ATE, you would have a pretty accurate prediction of the effect for any individual.

Assumptions That Imply Homogeneity

Homogeneity is rarely a natural feature of the world. It is more often an assumption researchers invoke, sometimes consciously, sometimes not:

Constant treatment effect assumption: Y(1)ᵢ − Y(0)ᵢ = τ for all i, where τ is a fixed scalar
No treatment-covariate interaction: the effect does not depend on age, gender, baseline risk, or any other observable
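Stable unit treatment value assumption (SUTVA): no spillovers, and only one version of treatment exists. Strictly, SUTVA does not imply constant effects; it guarantees only that each unit has well-defined potential outcomes, which any discussion of homogeneity presupposes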

When Homogeneity Is Plausible

Homogeneity is more defensible in some settings than others:

Physical and chemical interventions acting through a single well-understood mechanism (e.g., a specific enzyme inhibitor)
Highly standardized environments where the delivery of treatment does not vary (randomized lab experiments with tight protocols)
Short time horizons, before individual adaptation and learning have had time to differentiate responses

The Cost of Wrongly Assuming Homogeneity

Assuming homogeneity when effects are actually heterogeneous can lead to serious errors:

Overestimating benefits for some subgroups while masking harm in others
Over-reading underpowered subgroup analyses, whose “null” results are taken as evidence of homogeneity even when real variation exists
Inefficient policy targeting: spending resources on people who gain little from a program
Misleading meta-analyses that average away clinically important differences

Heterogeneity of Treatment Effects

Definition

Treatment effect heterogeneity (TEH) exists when the causal impact of an intervention varies systematically across individuals, subgroups, or contexts. This is the rule, not the exception, in most real-world settings.

Sources of Heterogeneity

Heterogeneity can arise from multiple, often interacting, sources:

Biological and individual characteristics

Genetic variation (e.g., in pharmacogenomics, CYP2D6 metabolizer status alters dose requirements for drugs metabolized by that enzyme)
Age, sex, body mass index, comorbidities
Baseline severity of the condition being treated

Behavioral and psychological factors

Adherence and compliance patterns
Baseline motivation or prior experience with similar interventions
Social support networks

Contextual and environmental factors

Geographic and institutional setting
Concurrent exposures (other medications, policies, economic shocks)
Timing and dose of treatment

Statistical and methodological sources

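Model misspecification or choice of scale can create apparent heterogeneity (an effect that is constant on the relative-risk scale varies on the absolute-risk scale whenever baseline risk varies)
Measurement error in moderating variables, which tends to blur real effect modification
Small subgroup sample sizes, which make subgroup estimates noisy enough to differ by chance alone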

Visualising Heterogeneity

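A simple conceptual picture helps. Imagine a forest plot where each row is a subgroup (women, men, lower-income and higher-income participants, and so on), showing that subgroup's estimated effect with its confidence interval, plus a final row for the overall estimate.

In such a plot the overall ATE can look respectable while the subgroup rows differ considerably, for example by sex and income. Reporting only the ATE would be profoundly misleading in that scenario.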
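The numbers behind such a forest plot are just subgroup-specific effect estimates with confidence intervals. The Python sketch below computes them for a simulated randomized trial; the subgroups echo the sex and income example, but the data, subgroup shares, and effect sizes are invented for illustration, and a plotting library would normally draw the rows.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n = 8_000

# Hypothetical RCT: randomized treatment plus two binary baseline characteristics.
treat = rng.random(n) < 0.5
female = rng.random(n) < 0.5
low_income = rng.random(n) < 0.4

# Assumed true effects: larger for women and for low-income participants.
effect = 0.2 + 0.4 * female + 0.5 * low_income
y = rng.normal(size=n) + treat * effect

def diff_in_means(mask):
    """Treated-minus-control mean difference with a normal-approximation 95% CI."""
    yt, yc = y[mask & treat], y[mask & ~treat]
    est = yt.mean() - yc.mean()
    se = np.sqrt(yt.var(ddof=1) / yt.size + yc.var(ddof=1) / yc.size)
    return est, est - 1.96 * se, est + 1.96 * se

rows = {
    "Overall": np.ones(n, dtype=bool),
    "Female": female,
    "Male": ~female,
    "Low income": low_income,
    "Higher income": ~low_income,
}
for label, mask in rows.items():
    est, lo, hi = diff_in_means(mask)
    print(f"{label:<14} {est:5.2f}  [{lo:5.2f}, {hi:5.2f}]")
```

Each row is a difference in means with a normal-approximation 95% interval. Comparing rows by eye is not a formal test of heterogeneity; the interaction test covered in the following section is.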

How to Measure and Test Heterogeneity

The Interaction Test (Effect Modification)

The most common approach in regression-based studies is to include an interaction term between the treatment indicator and a potential moderator:

Y = α + τ·T + β·X + γ·(T × X) + ε

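A statistically significant γ is evidence that the treatment effect varies with X, on the scale the model uses
γ measures effect modification: the change in the treatment effect for a one-unit increase in X (τ is then the treatment effect at X = 0, not the ATE, unless X is centered)
The approach is confirmatory only when the moderator is pre-specified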
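The interaction model can be fit with ordinary least squares. The sketch below uses only NumPy on simulated data with assumed true values (α = 1, τ = 0.5, β = 0.3, γ = 0.4) and classical homoskedastic standard errors; in practice a package such as statsmodels (formula `y ~ T * x`, which expands to both main effects plus their interaction) would do the same fit and offer robust standard errors.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 2_000

# Hypothetical randomized trial with a continuous moderator x (mean zero).
T = rng.integers(0, 2, size=n).astype(float)
x = rng.normal(size=n)
# Assumed truth: alpha = 1, tau = 0.5, beta = 0.3, gamma = 0.4
y = 1.0 + 0.5 * T + 0.3 * x + 0.4 * T * x + rng.normal(size=n)

# Design matrix for Y = alpha + tau*T + beta*X + gamma*(T*X) + error
X = np.column_stack([np.ones(n), T, x, T * x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical (homoskedastic) standard errors
resid = y - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

for name, b, s in zip(["alpha", "tau", "beta", "gamma"], coef, se):
    print(f"{name:<6} {b:6.3f}  (SE {s:.3f}, t = {b / s:5.2f})")
```

Because `x` has mean zero in this simulation, τ is also close to the ATE here; with an uncentered moderator it would instead be the effect at X = 0.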

Key limitations:

Typically examines one moderator at a time, so checking several requires multiple tests and a multiplicity correction
Assumes the moderating relationship has the specified functional form (linear, unless modeled otherwise)
Usually underpowered at typical trial sample sizes, especially when subgroups are small

Variance of Individual Treatment Effects

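With repeated measures, such as crossover designs in which each unit receives both treatment and control in different periods, the variance of individual treatment effects can sometimes be estimated directly, under assumptions such as no carryover. In an ordinary parallel-group trial it cannot be point-estimated, because the two potential outcomes are never observed together, but it can be bounded: if the outcome variance differs between arms, the effect cannot be constant. A large variance is direct evidence of heterogeneity even without identifying who benefits more.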
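In a two-arm trial a lower bound on the spread of individual effects follows from the arm-level outcome variances. Writing σ₁ and σ₀ for the outcome standard deviations under treatment and control and ρ for the unobservable correlation between the two potential outcomes, Var(Y(1) − Y(0)) = σ₁² + σ₀² − 2ρσ₁σ₀ ≥ (σ₁ − σ₀)², because ρ ≤ 1. The Python sketch below applies this plug-in bound to simulated data; the arm distributions are assumptions for illustration.

```python
import numpy as np

def ite_sd_lower_bound(y_treated, y_control):
    """Lower bound on the SD of individual treatment effects from a two-arm trial.

    Var(Y(1) - Y(0)) = s1^2 + s0^2 - 2*rho*s1*s0 >= (s1 - s0)^2,
    because the unobservable correlation rho is at most 1.
    """
    s1 = np.std(y_treated, ddof=1)
    s0 = np.std(y_control, ddof=1)
    return abs(s1 - s0)

rng = np.random.default_rng(seed=11)
# Hypothetical trial: treated outcomes are more dispersed than control outcomes.
y_control = rng.normal(loc=0.0, scale=1.0, size=5_000)
y_treated = rng.normal(loc=0.5, scale=1.5, size=5_000)

bound = ite_sd_lower_bound(y_treated, y_control)
print(f"SD of individual effects is at least about {bound:.2f}")
```

The bound is a point estimate, so in practice it should be accompanied by a test of equal variances or a bootstrap interval. A bound near zero does not establish homogeneity: effects can vary considerably while leaving the outcome spread unchanged.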

Quantile Treatment Effects

Rather than asking about the average effect, quantile treatment effects (QTE) ask: what is the treatment effect at the 10th percentile of the outcome distribution? The 50th? The 90th?

Quantile	Effect Size	Interpretation
10th percentile	−0.3	Lower tail shifts down; some units may be harmed
25th percentile	+0.1	Modest benefit for lower quartile
50th percentile	+0.8	Median unit benefits moderately
75th percentile	+1.9	Strong benefit for upper quartile
90th percentile	+3.2	Large effect for high responders

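A flat QTE profile is consistent with homogeneity; a steeply sloped one is evidence of considerable heterogeneity. Note that QTEs compare quantiles of two outcome distributions rather than following individual units: reading the 10th-percentile row as “these patients are harmed” requires a rank-invariance assumption (each unit would hold the same rank in the outcome distribution with or without treatment).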
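Sample QTEs are simply differences between the empirical quantiles of the treated and control outcomes. The Python sketch below computes them with `numpy.quantile` on simulated data in which treatment is assumed to shift the mean and widen the spread; the numbers it prints are not the ones in the table above.

```python
import numpy as np

def quantile_treatment_effects(y_treated, y_control, probs=(0.10, 0.25, 0.50, 0.75, 0.90)):
    """Difference between treated and control outcome quantiles at each probability."""
    q1 = np.quantile(y_treated, probs)
    q0 = np.quantile(y_control, probs)
    return dict(zip(probs, q1 - q0))

rng = np.random.default_rng(seed=5)
# Hypothetical trial: treatment shifts the mean and also widens the spread.
y_control = rng.normal(0.0, 1.0, size=20_000)
y_treated = rng.normal(0.8, 1.8, size=20_000)

qtes = quantile_treatment_effects(y_treated, y_control)
for p, qte in qtes.items():
    print(f"{round(p * 100):>2}th percentile: {qte:+.2f}")
```

Widening the spread produces the sloped profile described above: a negative difference in the lower tail and a large positive one in the upper tail. As with any QTE, this describes how the outcome distribution changes, not which particular units gained or lost.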

Machine Learning Approaches: CATE Estimation

Modern machine learning methods have transformed the study of treatment effect heterogeneity by searching for high-dimensional patterns in moderators without pre-specification.

Prominent methods include:

Causal Forests (Wager & Athey, 2018): Builds an ensemble of causal trees that partition the covariate space into regions of similar treatment response; provides honest, out-of-sample CATE estimates
X-Learner: A meta-learner that imputes missing potential outcomes and regresses treatment effects on covariates; particularly effective when treatment and control groups are unequal in size
Double/Debiased Machine Learning (DML): Uses cross-fitting and orthogonalization to estimate treatment effects while controlling for high-dimensional confounders
Bayesian Additive Regression Trees (BART): Nonparametric Bayesian approach that naturally quantifies uncertainty in CATE estimates
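The X-learner's three steps can be sketched compactly. The Python version below is an illustrative simplification, not a reference implementation: it uses linear least-squares base learners where real applications would use flexible learners such as random forests or boosting, and it plugs in a constant propensity score, which is only appropriate when treatment is randomized. The helper names `fit_linear` and `x_learner` are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ coef

def x_learner(X, y, t, propensity):
    """X-learner CATE estimates with linear base learners (illustrative simplification)."""
    X1, y1 = X[t == 1], y[t == 1]
    X0, y0 = X[t == 0], y[t == 0]

    # Stage 1: outcome model for each arm.
    mu1 = fit_linear(X1, y1)
    mu0 = fit_linear(X0, y0)

    # Stage 2: impute individual effects using the *other* arm's model,
    # then regress those imputed effects on covariates.
    d1 = y1 - mu0(X1)   # treated: observed Y(1) minus predicted Y(0)
    d0 = mu1(X0) - y0   # controls: predicted Y(1) minus observed Y(0)
    tau1 = fit_linear(X1, d1)
    tau0 = fit_linear(X0, d0)

    # Stage 3: combine the two effect models, weighted by the propensity g(x).
    g = propensity(X)
    return g * tau0(X) + (1 - g) * tau1(X)

rng = np.random.default_rng(seed=2)
n = 5_000
X = rng.normal(size=(n, 2))
t = (rng.random(n) < 0.1).astype(int)   # unequal arms: about 10% treated
true_cate = 1.0 + 2.0 * X[:, 0]         # hypothetical: first covariate modifies the effect
y = X @ np.array([0.5, -0.3]) + t * true_cate + rng.normal(size=n)

cate_hat = x_learner(X, y, t, propensity=lambda Z: np.full(len(Z), t.mean()))
print(f"correlation with true CATE: {np.corrcoef(cate_hat, true_cate)[0, 1]:.3f}")
```

The simulated trial deliberately treats only about 10% of units, the unequal-arms setting the X-learner targets: with a small propensity, the final weighting leans on `tau1`, which is built from the control-arm model `mu0` fit on the large control group.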

Trade-offs to keep in mind:

Approach	Strengths	Weaknesses
Interaction regression	Simple, interpretable, widely understood	Requires pre-specification, low power
Causal forests	Data-adaptive, honest inference	Computationally intensive, less interpretable
X-Learner	Efficient with unequal group sizes	Requires good nuisance model estimation
BART	Full posterior uncertainty, flexible	Slow for very large datasets

Heterogeneity in Different Disciplines

Clinical Medicine and Pharmacology

The concept of precision medicine is built almost entirely on treatment effect heterogeneity. The goal is to identify which patients benefit from which treatments:

Biomarker-stratified trials pre-specify a biological moderator (e.g., PD-L1 expression for immunotherapy)
Adaptive enrichment designs use interim analyses to shift enrollment toward subgroups showing larger effects
Pharmacogenomics maps genetic variation onto differential drug metabolism and response

A landmark example: HER2-positive breast cancer patients benefit enormously from trastuzumab, while HER2-negative patients gain essentially nothing. A single average effect across both groups obscures a treatment that is transformative for one subgroup and wasteful (with non-trivial toxicity) for another.

Economics and Public Policy

In program evaluation, heterogeneous treatment effects have enormous implications for policy targeting:

Job training programs may benefit low-skilled workers significantly but offer little to those already near the top of the wage distribution
Conditional cash transfer programs show very different effects on education outcomes by baseline poverty level and rural/urban setting
Minimum wage increases have heterogeneous effects across industries, firm sizes, and local labor market conditions

The Local Average Treatment Effect (LATE), which is estimated by instrumental variables, is itself a heterogeneity concept: it captures the ATE only for “compliers,” those whose treatment status is changed by the instrument.
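With a binary instrument, the LATE can be estimated by the Wald ratio: the instrument's effect on the outcome divided by its effect on treatment take-up. The Python sketch below simulates compliers, always-takers, and never-takers with assumed, deliberately different effects; it builds in the usual IV assumptions (random instrument, no defiers, and no direct effect of the instrument on the outcome), which real data must justify.

```python
import numpy as np

def wald_late(y, d, z):
    """Wald/IV estimate: reduced-form effect divided by first-stage effect."""
    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    return reduced_form / first_stage

rng = np.random.default_rng(seed=4)
n = 200_000
z = rng.integers(0, 2, size=n)   # randomized instrument (e.g. an offer)
kind = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
d = np.where(kind == "always", 1, np.where(kind == "never", 0, z))  # treatment taken

# Hypothetical heterogeneous effects: compliers gain 2, always-takers gain 5.
effect = np.where(kind == "complier", 2.0, np.where(kind == "always", 5.0, 0.0))
y = rng.normal(size=n) + d * effect

late = wald_late(y, d, z)
att = effect[d == 1].mean()   # average effect among everyone actually treated
print(f"Wald LATE: {late:.2f}   effect among all treated: {att:.2f}")
```

The Wald estimate recovers the compliers' effect (2 in this simulation), while the average effect among everyone who was treated is larger because it includes always-takers. Which population an estimate describes is itself a heterogeneity question.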

Education Research

Heterogeneity is pervasive in educational interventions:

Class-size reductions tend to benefit younger students and those from low-income backgrounds more than older or affluent peers
Tutoring programs show large effects for students with moderate skill gaps but smaller effects at the extremes (floor and ceiling effects)
Digital learning tools interact strongly with home internet access and parental support

Pitfalls and Common Mistakes

Even sophisticated researchers fall into traps when studying treatment effect heterogeneity. Watch for:

Data dredging/subgroup fishing: Running many subgroup analyses and reporting only significant ones dramatically inflates false discovery rates. Pre-registration and multiplicity correction are essential.
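Confirmatory bias in forest plots: Presenting a forest plot with 20 subgroups without formally testing the interactions invites spurious conclusions, because readers compare subgroup confidence intervals by eye (a subgroup whose interval excludes zero is not thereby shown to differ from one whose interval does not).
Ecological fallacy: A subgroup's average effect does not reveal any individual's effect, and individual-level heterogeneity need not appear as differences between observed subgroups, since effects can vary widely within subgroups whose averages are identical.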
Conflating statistical heterogeneity with clinical heterogeneity: A statistically significant interaction may be too small to matter clinically; a clinically meaningful difference may lack statistical power to be detected.
Ignoring treatment effect heterogeneity in meta-analyses: The I² statistic describes the proportion of variability among study estimates that reflects between-study heterogeneity rather than chance, but it does not explain why studies differ. Random-effects models account for heterogeneity statistically but do not resolve it scientifically.
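The standard heterogeneity statistics are short to compute. The Python sketch below implements Cochran's Q, I² = (Q − df)/Q truncated at zero, and the DerSimonian-Laird estimate of the between-study variance τ²; the three study estimates and variances in the example call are made up. Note that here τ² denotes between-study variance, not the constant treatment effect τ used earlier.

```python
def heterogeneity_stats(estimates, variances):
    """Cochran's Q, I^2 and DerSimonian-Laird tau^2 (requires at least two studies)."""
    w = [1.0 / v for v in variances]                         # inverse-variance weights
    pooled = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0            # share of variability beyond chance
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                             # between-study variance
    return {"pooled_fixed": pooled, "Q": q, "I2": i2, "tau2": tau2}

print(heterogeneity_stats([0.1, 0.3, 0.5], [0.01, 0.01, 0.01]))
```

For these inputs Q ≈ 8 on 2 degrees of freedom, so I² ≈ 0.75 and τ² ≈ 0.03: three quarters of the observed variability exceeds what chance would produce, yet nothing in these numbers says why the studies disagree. That still requires examining study-level moderators, for example with meta-regression.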

Practical Recommendations for Researchers

Before Data Collection

Pre-register hypotheses about moderators; distinguish confirmatory from exploratory subgroup analyses
Power the study to detect meaningful interactions, not just main effects: detecting an interaction as large as the main effect requires roughly 4× the sample size, and one half as large roughly 16×
Collect rich covariate data to enable post-hoc heterogeneity analysis
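The 4× figure follows from a short standard-error calculation. The Python sketch below encodes it for an assumed simplest case: a balanced two-arm trial, one binary moderator that splits the sample in half, a continuous outcome with common standard deviation σ, and the interaction measured as the difference between the two subgroup effects.

```python
def interaction_sample_size_multiplier(interaction_to_main_ratio):
    """Factor by which n must grow for the interaction test to match main-effect power.

    Balanced two-arm trial, binary moderator splitting the sample in half,
    outcome SD sigma, total sample size n:
      main effect = difference of two means (n/2 each):      SE = 2 * sigma / sqrt(n)
      interaction = difference of two subgroup effects
                    (four cells of n/4 each):                 SE = 4 * sigma / sqrt(n)
    The interaction SE is twice as large, so an equally large interaction needs
    2**2 = 4 times the sample size; a smaller one needs 4 / ratio**2 times.
    """
    return 4.0 / interaction_to_main_ratio ** 2

for ratio in (1.0, 0.5):
    print(ratio, interaction_sample_size_multiplier(ratio))
```

Real moderators are rarely split 50/50 and are often measured with error, both of which push the required sample size higher still.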

During Analysis

Report the ATE alongside CATEs; never report subgroup effects without the overall estimate
Use appropriate corrections for multiple comparisons (Bonferroni, Benjamini-Hochberg, or pre-specified hierarchical testing)
Visualize effect distributions, not just point estimates and confidence intervals
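The two most common corrections are easy to state in code. The Python sketch below implements Bonferroni and the Benjamini-Hochberg step-up procedure from scratch so the logic is visible; the ten p-values are hypothetical. In practice, `statsmodels.stats.multitest.multipletests` provides both (`method="bonferroni"` and `method="fdr_bh"`).

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= k * alpha / m; reject the k smallest p-values.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]

# Without correction, 20 independent tests of true nulls at alpha = 0.05
# yield at least one "significant" subgroup with probability 1 - 0.95**20.
print(f"{1 - 0.95 ** 20:.2f}")

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

The first line shows why correction matters: with 20 independent subgroup tests and no true heterogeneity, the chance of at least one nominally significant result is about 64%. On the example p-values, Bonferroni keeps one finding and Benjamini-Hochberg keeps two; neither rescues the cluster of values just under 0.05.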

When Reporting

Clearly distinguish pre-specified from exploratory subgroup analyses
Report effect sizes and confidence intervals for each subgroup, not just p-values
Discuss the direction of heterogeneity: is it quantitative (same direction, different magnitude) or qualitative (different directions, a.k.a. crossover interaction)?

Summary

Concept	Key Point
Homogeneous treatment effect	Constant causal impact across all units; often an assumption, rarely true
Heterogeneous treatment effect	Varying causal impact across individuals or subgroups; the norm in complex systems
CATE	The expected treatment effect conditional on observed covariates X
Effect modification	A covariate that systematically changes the magnitude or direction of the treatment effect
Interaction test	The standard regression tool for testing effect modification; confirmatory only when pre-specified, and often underpowered
Causal forests / ML	Data-adaptive methods for discovering heterogeneity in high-dimensional covariate spaces
Precision medicine	The applied pursuit of treatment effect heterogeneity in clinical settings

Treatment effect heterogeneity is not a statistical nuisance to be controlled away. It is a scientific fact about how complex interventions interact with a heterogeneous world. Embracing it rigorously, rather than averaging it away, is what separates research that merely answers “does this work on average?” from research that answers the far more useful question: “does this work for whom, under what conditions, and why?”

This article was originally published on February 14, 2023, and revised on April 20, 2026.

Author

Marisha Fonseca

An editor at heart and perfectionist by disposition, providing solutions for journals, publishers, and universities in areas like alt-text writing and publication consultancy.
