Heterogeneity and homogeneity of treatment effects: what biomedical researchers need to know


Heterogeneity and Homogeneity of Treatment Effects: Definition, Examples, and Tests

When a drug lowers blood pressure sharply in one patient but does almost nothing for another, that gap may not be noise. Understanding why treatments work differently across individuals, or confirming that they work the same way, is one of the most practically important questions in medicine, economics, public policy, and social science. This post unpacks the concepts of treatment effect heterogeneity and homogeneity, explains why they matter, and shows how researchers detect and interpret them.

 


What Is a Treatment Effect?

Before diving into heterogeneity, it helps to pin down what a “treatment effect” actually means.

In causal inference, a treatment effect is the difference in an outcome for a unit that received a treatment versus what that same unit would have experienced without it. Formally, for individual i:

Individual Treatment Effect (ITE) = Y(1)ᵢ − Y(0)ᵢ

where Y(1) is the potential outcome under treatment and Y(0) is the potential outcome under control.

Because we can never observe both states for the same person at the same time (the fundamental problem of causal inference), researchers typically estimate averages:

Estimand | Full Name | What It Answers
ATE | Average Treatment Effect | Effect for a randomly drawn unit from the full population
ATT | Average Treatment Effect on the Treated | Effect for those who actually received treatment
ATU | Average Treatment Effect on the Untreated | Effect for those who did not receive treatment
CATE | Conditional Average Treatment Effect | Effect for a subgroup defined by covariates X

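When treatment effects are homogeneous, the ATE, ATT, ATU, and every CATE coincide. When these averages diverge, or when the CATE varies substantially across subgroups, treatment effects are heterogeneous. Equality of the three averages alone does not establish homogeneity, though: in a randomized trial they coincide by design even when individual effects differ widely.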
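To make the estimands concrete, here is a minimal Python sketch on a hypothetical toy table in which both potential outcomes are known for every unit, something only a made-up example allows. The unit values, the binary covariate x, and the helper `mean_effect` are illustrative assumptions, not data from any study.

```python
from statistics import mean

# Hypothetical units: (x, treated, y0, y1). Both potential outcomes are
# visible only because this is a made-up example.
units = [
    # x = 1 units gain 4, and most of them are treated
    (1, 1, 10.0, 14.0),
    (1, 1, 11.0, 15.0),
    (1, 1, 12.0, 16.0),
    (1, 0, 10.0, 14.0),
    # x = 0 units gain 0, and most of them are untreated
    (0, 1, 9.0, 9.0),
    (0, 0, 8.0, 8.0),
    (0, 0, 9.0, 9.0),
    (0, 0, 10.0, 10.0),
]

def mean_effect(rows):
    """Average of the individual effects y1 - y0 over the given rows."""
    return mean(y1 - y0 for _, _, y0, y1 in rows)

ate = mean_effect(units)
att = mean_effect([u for u in units if u[1] == 1])
atu = mean_effect([u for u in units if u[1] == 0])
cate = {x: mean_effect([u for u in units if u[0] == x]) for x in (0, 1)}

print(f"ATE={ate}, ATT={att}, ATU={atu}, CATE={cate}")
```

In this toy table the averages diverge because units with larger gains were more likely to be treated.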

 

Homogeneity of Treatment Effects

Definition

A treatment is said to have homogeneous effects when its causal impact is approximately constant across all units in the population. In other words, if you knew the ATE, you would have a pretty accurate prediction of the effect for any individual.

Assumptions That Imply Homogeneity

Homogeneity is rarely a natural feature of the world. It is more often an assumption researchers invoke, sometimes consciously, sometimes not:

  • Constant treatment effect assumption: Y(1)ᵢ − Y(0)ᵢ = τ for all i, where τ is a fixed scalar
  • No treatment-covariate interaction: the effect does not depend on age, gender, baseline risk, or any other observable
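  • Stable unit treatment value assumption (SUTVA): no spillovers, and only one version of treatment exists. SUTVA does not imply homogeneity by itself, but it is needed for each unit's effect to be well defined; spillovers or hidden treatment versions can otherwise show up as apparent heterogeneity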

When Homogeneity Is Plausible

Homogeneity is more defensible in some settings than others:

  • Physical and chemical interventions acting through a single well-understood mechanism (e.g., a specific enzyme inhibitor)
  • Highly standardized environments where the delivery of treatment does not vary (randomized lab experiments with tight protocols)
  • Short time horizons where individual adaptation and learning cannot yet vary the response

The Cost of Wrongly Assuming Homogeneity

Assuming homogeneity when effects are actually heterogeneous can lead to serious errors:

  • Overestimating benefits for some subgroups while masking harm in others
  • Under-powered subgroup analyses that look “null” even when real variation exists
  • Inefficient policy targeting: spending resources on people who gain little from a program
  • Misleading meta-analyses that average away clinically important differences

 

Heterogeneity of Treatment Effects

Definition

Treatment effect heterogeneity (TEH) exists when the causal impact of an intervention varies systematically across individuals, subgroups, or contexts. This is the rule, not the exception, in most real-world settings.

Sources of Heterogeneity

Heterogeneity can arise from multiple, often interacting, sources:

Biological and individual characteristics

  • Genetic variation (e.g., in pharmacogenomics, CYP2D6 metabolizer status affects drug dosage needs)
  • Age, sex, body mass index, comorbidities
  • Baseline severity of the condition being treated

Behavioral and psychological factors

  • Adherence and compliance patterns
  • Baseline motivation or prior experience with similar interventions
  • Social support networks

Contextual and environmental factors

  • Geographic and institutional setting
  • Concurrent exposures (other medications, policies, economic shocks)
  • Timing and dose of treatment

Statistical and methodological sources

  • Model misspecification can create apparent heterogeneity
  • Measurement error in moderating variables
  • Small sample sizes inflating variance in subgroup estimates

Visualising Heterogeneity

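A simple conceptual picture helps. Imagine a forest plot where each row is a subgroup, showing that subgroup's estimated effect and its confidence interval.

[Figure: forest plot of subgroup treatment effects; not reproduced here.]

In the plotted example, the overall ATE looks respectable, but the subgroup rows show considerable variation by sex and income. Reporting only the ATE would be misleading in this scenario.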

 

How to Measure and Test Heterogeneity

The Interaction Test (Effect Modification)

The most common approach in regression-based studies is to include an interaction term between the treatment indicator and a potential moderator:

Y = α + τ·T + β·X + γ·(T × X) + ε

  • If γ differs significantly from zero, there is evidence that the treatment effect varies with X
  • γ is the estimate of effect modification by X: the change in the treatment effect per unit increase in X
  • This approach is confirmatory: it tests a pre-specified moderator

Key limitations:

  • Tests one moderator at a time; examining several moderators requires multiple tests with multiplicity correction
  • Assumes the treatment effect changes linearly with X, unless X is categorical or flexible terms are added
  • Severely underpowered at most trial sample sizes, especially when subgroups are small
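For a binary treatment and a binary moderator, the interaction coefficient γ in the regression above equals the difference between the two subgroup treatment effects. The sketch below simulates hypothetical data and tests that difference with a normal approximation. The data-generating values, the seed, and the helper `effect_and_var` are assumptions made for illustration, and the standard error lets each cell have its own variance, whereas classical OLS would pool them.

```python
import math
import random
from statistics import NormalDist, mean, variance

random.seed(0)

# Hypothetical data-generating process: the effect is 1.0 when x = 0 and
# 2.0 when x = 1, so the true interaction gamma is 1.0.
TRUE_TAU, TRUE_GAMMA = 1.0, 1.0
cells = {(t, x): [] for t in (0, 1) for x in (0, 1)}
for _ in range(4000):
    t, x = random.randint(0, 1), random.randint(0, 1)
    y = 0.5 * x + TRUE_TAU * t + TRUE_GAMMA * t * x + random.gauss(0, 1)
    cells[(t, x)].append(y)

def effect_and_var(x):
    """Treated-minus-control difference in means within stratum x,
    together with its estimated sampling variance."""
    treated, control = cells[(1, x)], cells[(0, x)]
    est = mean(treated) - mean(control)
    var = variance(treated) / len(treated) + variance(control) / len(control)
    return est, var

(eff0, var0), (eff1, var1) = effect_and_var(0), effect_and_var(1)

# With binary T and X the model is saturated, so the OLS coefficient on
# T*X equals this difference in subgroup effects.
gamma_hat = eff1 - eff0
se_gamma = math.sqrt(var0 + var1)
z = gamma_hat / se_gamma
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"gamma_hat={gamma_hat:.3f}, se={se_gamma:.3f}, z={z:.2f}, p={p_value:.4f}")
```

With a continuous moderator or additional covariates, a regression package would be the natural tool; the cell-means shortcut only works because every combination of T and X has its own mean here.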

Variance of Individual Treatment Effects

In experimental settings with repeated measures or clustered designs, it is sometimes possible to bound or estimate the variance of individual treatment effects directly. A large variance is direct evidence of heterogeneity even without identifying who benefits more.
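Even in a simple two-arm trial, where the correlation ρ between Y(1) and Y(0) cannot be observed, the spread of individual effects can be bounded. Since Var(Y(1) − Y(0)) = σ₁² + σ₀² − 2ρσ₁σ₀ and ρ lies between −1 and 1, the standard deviation of individual effects lies between |σ₁ − σ₀| and σ₁ + σ₀. The sketch below applies these simple bounds to hypothetical outcomes; the function name `ite_sd_bounds` is illustrative, sampling error is ignored, and sharper bounds exist.

```python
from statistics import stdev

def ite_sd_bounds(treated, control):
    """Bounds on the SD of individual effects from the two arms' outcome SDs.

    Var(Y1 - Y0) = s1^2 + s0^2 - 2*rho*s1*s0 and rho is not identified,
    so rho = +1 gives the lower bound and rho = -1 the upper bound.
    """
    s1, s0 = stdev(treated), stdev(control)
    return abs(s1 - s0), s1 + s0

# Hypothetical outcomes: the treated arm is twice as spread out as control.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
treated = [2.0, 4.0, 6.0, 8.0, 10.0]

lower, upper = ite_sd_bounds(treated, control)
print(f"SD of individual effects lies between {lower:.3f} and {upper:.3f}")
```

A lower bound clearly above zero, which arises when the arms have unequal variances, is incompatible with a constant effect. Equal variances are compatible with homogeneity but do not prove it.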

Quantile Treatment Effects

Rather than asking about the average effect, quantile treatment effects (QTE) ask: what is the treatment effect at the 10th percentile of the outcome distribution? The 50th? The 90th?

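Quantile | Effect Size | Interpretation
10th percentile | −0.3 | The lower tail of the outcome distribution shifts down under treatment
25th percentile | +0.1 | Small upward shift at the lower quartile
50th percentile | +0.8 | Moderate upward shift at the median
75th percentile | +1.9 | Strong upward shift at the upper quartile
90th percentile | +3.2 | Large upward shift in the upper tail

A flat QTE profile is consistent with homogeneity; a steeply sloped one suggests considerable heterogeneity. Note that a QTE compares the same quantile of the treated and control outcome distributions. It equals the effect on the units sitting at that quantile only if treatment preserves each unit's rank.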
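As a rough illustration of how a QTE profile is computed, the sketch below takes the difference between matching empirical quantiles of two arms. The outcome values and the helper names `quantile` and `qte` are hypothetical, and the interpolation rule is one common convention among several.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def qte(treated, control, q):
    """Quantile treatment effect: treated-arm quantile minus control-arm quantile."""
    return quantile(treated, q) - quantile(control, q)

# Hypothetical arms: treatment stretches the upper tail and leaves the
# bottom of the distribution unchanged.
control = [0.0, 1.0, 2.0, 3.0, 4.0]
treated = [0.0, 1.5, 3.0, 4.5, 6.0]

for q in (0.25, 0.5, 0.75):
    print(f"QTE at q={q}: {qte(treated, control, q):+.2f}")
```

In this toy example the QTE grows with the quantile, which is the sloped profile described above.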

Machine Learning Approaches: CATE Estimation

Modern machine learning methods have transformed the study of treatment effect heterogeneity by searching for high-dimensional patterns in moderators without pre-specification.

Prominent methods include:

  • Causal Forests (Wager & Athey, 2018): Builds an ensemble of causal trees that partition the covariate space into regions of similar treatment response; provides honest, out-of-sample CATE estimates
  • X-Learner: A meta-learner that imputes missing potential outcomes and regresses treatment effects on covariates; particularly effective when treatment and control groups are unequal in size
  • Double/Debiased Machine Learning (DML): Uses cross-fitting and orthogonalization to estimate treatment effects while controlling for high-dimensional confounders
  • Bayesian Additive Regression Trees (BART): Nonparametric Bayesian approach that naturally quantifies uncertainty in CATE estimates
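To give a feel for how a meta-learner works, here is a minimal X-learner sketch with deliberately simple ingredients: one covariate, straight-line base learners, and a constant propensity score. These are assumptions made so the example stays short; practical applications use flexible learners and an estimated propensity model. The data are hypothetical and noise-free, so the recovered CATE can be checked against the known effect τ(x) = 0.5 + x.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns a prediction function."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

# Hypothetical noise-free data: y0 = 1 + 2x, true effect tau(x) = 0.5 + x.
# Only every fourth unit is treated, so the arms are unequal in size.
xs = [i / 10 for i in range(40)]
ts = [1 if i % 4 == 0 else 0 for i in range(40)]
ys = [1 + 2 * x + t * (0.5 + x) for x, t in zip(xs, ts)]

x1 = [x for x, t in zip(xs, ts) if t == 1]
y1 = [y for y, t in zip(ys, ts) if t == 1]
x0 = [x for x, t in zip(xs, ts) if t == 0]
y0 = [y for y, t in zip(ys, ts) if t == 0]

# Stage 1: outcome models fitted separately in each arm.
mu1, mu0 = fit_line(x1, y1), fit_line(x0, y0)

# Stage 2: impute each unit's missing potential outcome, then model the
# imputed effects within each arm.
d1 = [y - mu0(x) for x, y in zip(x1, y1)]  # treated: observed minus predicted control
d0 = [mu1(x) - y for x, y in zip(x0, y0)]  # control: predicted treated minus observed
tau1, tau0 = fit_line(x1, d1), fit_line(x0, d0)

# Stage 3: blend the two effect models, weighting by the propensity score
# (here simply the treated share).
g = mean(ts)

def cate(x):
    return g * tau0(x) + (1 - g) * tau1(x)

print(f"CATE at x=1.0: {cate(1.0):.3f}, at x=3.0: {cate(3.0):.3f}")
```

Because few units are treated, the weight 1 − g falls mostly on the effect model built from the treated units, whose imputations rely on the better-estimated control outcome model.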

Trade-offs to keep in mind:

Approach | Strengths | Weaknesses
Interaction regression | Simple, interpretable, widely understood | Requires pre-specification, low power
Causal forests | Data-adaptive, honest inference | Computationally intensive, less interpretable
X-Learner | Efficient with unequal group sizes | Requires good nuisance model estimation
BART | Full posterior uncertainty, flexible | Slow for very large datasets

 

Heterogeneity in Different Disciplines

Clinical Medicine and Pharmacology

The concept of precision medicine is built almost entirely on treatment effect heterogeneity. The goal is to identify which patients benefit from which treatments:

  • Biomarker-stratified trials pre-specify a biological moderator (e.g., PD-L1 expression for immunotherapy)
  • Adaptive enrichment designs shift enrollment mid-trial toward subgroups showing larger effects
  • Pharmacogenomics maps genetic variation onto differential drug metabolism and response

A landmark example: HER2-positive breast cancer patients benefit enormously from trastuzumab, while HER2-negative patients gain essentially nothing. A single average effect across both groups obscures a treatment that is transformative for one subgroup and wasteful (with non-trivial toxicity) for another.

Economics and Public Policy

In program evaluation, heterogeneous treatment effects have enormous implications for policy targeting:

  • Job training programs may benefit low-skilled workers significantly but offer little to those already near the top of the wage distribution
  • Conditional cash transfer programs show very different effects on education outcomes by baseline poverty level and rural/urban setting
  • Minimum wage increases have heterogeneous effects across industries, firm sizes, and local labor market conditions

The Local Average Treatment Effect (LATE), which is estimated by instrumental variables, is itself a heterogeneity concept: it captures the ATE only for “compliers,” those whose treatment status is changed by the instrument.
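With a binary instrument, the LATE is commonly estimated by the Wald ratio: the instrument's effect on the outcome divided by its effect on treatment uptake. The sketch below uses a hypothetical eight-unit dataset built so that compliers gain 6 and the always-taker gains 2. The variable names z, d, y and the helper functions are illustrative assumptions.

```python
from statistics import mean

def arm_mean(values, z, arm):
    """Mean of values among units whose instrument value equals arm."""
    return mean(v for v, zi in zip(values, z) if zi == arm)

def wald_late(z, d, y):
    """Wald estimator: effect of z on the outcome y divided by the
    effect of z on treatment uptake d."""
    y_diff = arm_mean(y, z, 1) - arm_mean(y, z, 0)
    d_diff = arm_mean(d, z, 1) - arm_mean(d, z, 0)
    return y_diff / d_diff

# Hypothetical data with a binary instrument z (e.g. random encouragement).
# Each arm holds one always-taker, two compliers, and one never-taker.
# Compliers gain 6 from treatment; the always-taker gains 2.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [12.0, 16.0, 16.0, 5.0, 12.0, 10.0, 10.0, 5.0]

late = wald_late(z, d, y)
print(f"LATE (complier average effect) = {late}")
```

The estimate recovers the compliers' gain and says nothing about the always-taker's smaller gain, which is why a LATE need not equal the ATE when effects are heterogeneous.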

Education Research

Heterogeneity is pervasive in educational interventions:

  • Class-size reductions tend to benefit younger students and those from low-income backgrounds more than older or affluent peers
  • Tutoring programs show large effects for students with moderate skill gaps but smaller effects at the extremes (floor and ceiling effects)
  • Digital learning tools interact strongly with home internet access and parental support

 

Pitfalls and Common Mistakes

Even sophisticated researchers fall into traps when studying treatment effect heterogeneity. Watch for:

  • Data dredging/subgroup fishing: Running many subgroup analyses and reporting only significant ones dramatically inflates false discovery rates. Pre-registration and multiplicity correction are essential.
  • Confirmation bias in forest plots: Presenting a forest plot with 20 subgroups without formally testing the interaction invites spurious conclusions.
  • Ecological fallacy in reverse: Group-level heterogeneity does not necessarily imply individual-level heterogeneity, and vice versa.
  • Conflating statistical heterogeneity with clinical heterogeneity: A statistically significant interaction may be too small to matter clinically, and a study may lack the power to detect a difference that is clinically meaningful.
  • Ignoring treatment effect heterogeneity in meta-analyses: The I² statistic estimates the share of variation in study estimates that reflects between-study heterogeneity rather than chance (the between-study variance itself is τ²), but it does not explain why studies differ. Random-effects models account for heterogeneity statistically but do not resolve it scientifically.
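The meta-analytic quantities in the last bullet can be computed from study estimates and their standard errors. The sketch below calculates Cochran's Q, I², and the DerSimonian-Laird estimate of between-study variance for three hypothetical studies. The input numbers and the function name `heterogeneity_stats` are illustrative assumptions.

```python
def heterogeneity_stats(effects, std_errors):
    """Cochran's Q, I^2 (as a proportion) and the DerSimonian-Laird tau^2."""
    w = [1 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    # I^2: share of total variation beyond what chance alone would produce.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # tau^2: between-study variance on the scale of the effect estimates.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2

# Hypothetical study-level estimates, all with the same standard error.
effects = [0.0, 2.0, 4.0]
std_errors = [0.5, 0.5, 0.5]

q, i2, tau2 = heterogeneity_stats(effects, std_errors)
print(f"Q={q:.2f}, I2={i2:.0%}, tau2={tau2:.2f}")
```

I² is a proportion and depends on study precision, while τ² is on the scale of the effect itself, so the two can tell different stories about the same set of studies.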

 

Practical Recommendations for Researchers

Before Data Collection

  • Pre-register hypotheses about moderators; distinguish confirmatory from exploratory subgroup analyses
  • Power the study to detect meaningful interactions, not just main effects (in a balanced design, detecting an interaction as large as the main effect requires roughly 4× the sample size, and a smaller interaction requires more)
  • Collect rich covariate data to enable post-hoc heterogeneity analysis
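The sample-size penalty for interactions mentioned in the list above follows from standard errors. The sketch below assumes a balanced design with two arms, two equal-sized subgroups, and a common outcome standard deviation sigma; these assumptions and the function names are illustrative. The interaction is a difference between two subgroup effects, each estimated on half the sample, so its standard error is twice that of the main effect.

```python
import math

def se_main_effect(n, sigma=1.0):
    """SE of a difference in means with n/2 units per arm."""
    return math.sqrt(sigma ** 2 / (n / 2) + sigma ** 2 / (n / 2))

def se_interaction(n, sigma=1.0):
    """SE of the difference between two subgroup effects, n/4 units per cell."""
    return math.sqrt(4 * sigma ** 2 / (n / 4))

n = 400
ratio = se_interaction(n) / se_main_effect(n)
print(f"main-effect SE={se_main_effect(n):.3f}, interaction SE={se_interaction(n):.3f}")
print(f"SE ratio={ratio:.1f}, sample-size multiplier={ratio ** 2:.1f}")
```

Because standard errors shrink with the square root of n, doubling the standard error means quadrupling the sample to keep the same power for an effect of the same size.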

During Analysis

  • Report the ATE alongside CATEs; never report subgroup effects without the overall estimate
  • Use appropriate corrections for multiple comparisons (Bonferroni, Benjamini-Hochberg, or pre-specified hierarchical testing)
  • Visualize effect distributions, not just point estimates and confidence intervals
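Of the corrections named above, Benjamini-Hochberg is simple enough to sketch in a few lines. The p-values below are hypothetical, and the function name `benjamini_hochberg` is illustrative rather than taken from any particular library; Bonferroni is shown alongside for comparison.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values from four subgroup interaction tests.
p_values = [0.03, 0.20, 0.01, 0.02]

bh = benjamini_hochberg(p_values)
bonferroni = [p <= 0.05 / len(p_values) for p in p_values]
print("BH rejects:        ", bh)
print("Bonferroni rejects:", bonferroni)
```

Bonferroni controls the chance of any false positive and is stricter; Benjamini-Hochberg controls the expected share of false discoveries, which is often the more natural target in exploratory subgroup work.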

When Reporting

  • Clearly distinguish pre-specified from exploratory subgroup analyses
  • Report effect sizes and confidence intervals for each subgroup, not just p-values
  • Discuss the direction of heterogeneity: is it quantitative (same direction, different magnitude) or qualitative (different directions, a.k.a. crossover interaction)?

 

Summary

Concept | Key Point
Homogeneous treatment effect | Constant causal impact across all units; often an assumption, rarely true
Heterogeneous treatment effect | Varying causal impact across individuals or subgroups; the norm in complex systems
CATE | The expected treatment effect conditional on observed covariates X
Effect modification | Systematic change in the magnitude or direction of the treatment effect across levels of a covariate
Interaction test | The standard regression tool for testing effect modification; best pre-specified and often underpowered
Causal forests / ML | Data-adaptive methods for discovering heterogeneity in high-dimensional covariate spaces
Precision medicine | The applied pursuit of treatment effect heterogeneity in clinical settings

 

Treatment effect heterogeneity is not a statistical nuisance to be controlled away. It is a scientific fact about how complex interventions interact with a heterogeneous world. Embracing it rigorously, rather than averaging it away, is what separates research that merely answers “does this work on average?” from research that answers the far more useful question: “for whom does this work, under what conditions, and why?”

 

This article was originally published on February 14, 2023, and revised on April 20, 2026.

Author

Marisha Fonseca

An editor at heart and perfectionist by disposition, providing solutions for journals, publishers, and universities in areas like alt-text writing and publication consultancy.
