2026.06.08
2026.07.15

What Is a Cohort Study? Definition, Examples, How to Conduct

Contents

What is a cohort study?
Glossary of Key Terms
How a Cohort Study Works
Types of Cohort Studies
Landmark Cohort Studies in Biomedical Science
When to Use a Cohort Study
Cohort vs. Case-Control: When to Choose Which
Designing a Cohort Study: Key Considerations
Strengths and Limitations of Cohort Studies
Where Does a Cohort Study Sit in the Evidence Hierarchy?
Reporting Cohort Studies: STROBE
Critical Appraisal of a Cohort Study
Cohort Studies vs. Other Study Designs: A Summary
Key Takeaways
Frequently Asked Questions

A cohort study is one of the most powerful and widely used research designs in epidemiology and biomedical science. By following groups of people over time, researchers can uncover how exposures, from smoking habits to genetic mutations, influence the development of disease. From the landmark British Doctors Study that confirmed the link between cigarette smoking and lung cancer, to the multigenerational Framingham Heart Study still running today, cohort studies have fundamentally shaped our understanding of human health.

This guide explains what a cohort study is, how it works, when to use it, and how to critically evaluate one.

What is a cohort study?

A cohort study is a type of observational, longitudinal study that follows a group of individuals who share a common characteristic or exposure, known as the “cohort”, over a defined period of time. Researchers compare outcomes between those exposed to a risk factor and those not exposed, to determine whether the exposure is associated with the development of a particular disease or health outcome [1].

The word “cohort” itself derives from the Latin term for a unit of 300 to 600 Roman soldiers who marched together. The term was adopted by the epidemiology community in the 1930s to describe “a designated group which is followed or traced over a period of time” [2].

Crucially, in a cohort study, the investigator does not intervene. Participants are simply observed. This distinguishes cohort studies from clinical trials, where researchers assign participants to treatments.

Glossary of Key Terms

Term	Definition
Cohort	A group of individuals sharing a common characteristic or exposure who are followed together over time
Exposure	Any factor—biological, chemical, behavioural, or environmental—whose association with a health outcome is being investigated (e.g., smoking, a drug, a toxin)
Outcome	The health event or endpoint being measured, such as disease diagnosis, death, or change in health status
Prospective Cohort Study	A cohort study that follows participants forward in time from exposure assessment to outcome occurrence
Retrospective Cohort Study	A cohort study in which both exposure and outcome have already occurred; historical records are used to reconstruct the study
Incidence	The number of new cases of a disease or outcome arising in a population over a defined time period
Prevalence	The proportion of a population that has a disease or condition at a single point in time
Cumulative Incidence (Risk)	The proportion of individuals in a cohort who develop the outcome over the entire follow-up period
Incidence Rate (Incidence Density)	The number of new cases divided by total person-time at risk; reflects the speed at which new cases occur
Person-Time	The sum of the time each individual participant contributed to the study while free of the outcome; used as the denominator in incidence rate calculations
Risk Ratio (Relative Risk)	The ratio of the risk of the outcome in the exposed group to the risk in the unexposed group; a value >1 indicates increased risk
Risk Difference (Attributable Risk)	The absolute difference in incidence between the exposed and unexposed groups; reflects the excess risk attributable to the exposure
Confounding	Distortion of the exposure–outcome relationship by a third variable that is associated with both the exposure and the outcome (e.g., age, sex)
Attrition Bias	Systematic error introduced when participants who drop out of a study differ in meaningful ways from those who remain
Recall Bias	Error that arises when participants’ recollection of past exposures is influenced by their current disease status; less of a concern in prospective cohort studies
Loss to Follow-Up	The failure to collect outcome data on participants who leave or disengage from a study before it concludes
Temporality	The requirement that exposure must precede outcome in time; a key criterion for establishing causality and a strength of prospective cohort designs
Nested Case-Control Study	A case-control study conducted within an existing defined cohort, drawing both cases and controls from the same cohort population
STROBE	Strengthening the Reporting of Observational Studies in Epidemiology—an international checklist of standards for reporting observational studies including cohort studies
CASP	Critical Appraisal Skills Programme—provides structured checklists for evaluating the methodological quality of research studies, including cohort studies
Baseline	The starting point of a cohort study, at which participants are enrolled, exposure is assessed, and all members are confirmed free of the outcome of interest
Open (Dynamic) Cohort	A cohort in which new participants can enter and others can leave throughout the study period
Closed Cohort	A cohort in which all participants are fixed at the outset; no new members are added during follow-up

How a Cohort Study Works

The basic structure of a cohort study involves three stages [3]:

1. Establishing the cohort

The investigator selects participants based on a shared exposure or characteristic (e.g., smokers vs. non-smokers, workers in a chemical plant, individuals born in the same year)
Participants must be free of the outcome of interest at baseline; anyone already diagnosed with the disease being studied is excluded [4]
Two groups are defined: an exposed group and an unexposed (or less exposed) comparison group (similar to a control group)
The comparison group should be as similar as possible to the exposed group in every respect except the exposure itself

2. Following the cohort

Data on exposures are recorded at baseline via interviews, questionnaires, bioassays, clinical measurements, or medical records
The cohort is then monitored over time for the development of new cases of the outcome of interest [4]
Follow-up methods may include direct participant contact, routine health records, death registries, or hospital databases

3. Evaluating outcomes

Researchers calculate the incidence of disease in both the exposed and unexposed groups
Key measures include:

Measure	Definition	Formula
Cumulative Incidence (Risk)	Proportion of the cohort that develops the outcome	New cases ÷ population at risk at baseline
Incidence Rate (IR)	Speed at which new cases occur	New cases ÷ person-time at risk
Risk Ratio (Relative Risk)	Comparison of risk between groups	Risk in exposed ÷ risk in unexposed
Incidence Rate Ratio	Comparison of rates between groups	IR in exposed ÷ IR in unexposed
Risk Difference	Absolute difference in risk	Risk in exposed − risk in unexposed

A risk ratio greater than 1 indicates that the exposure is associated with an increased risk of the outcome, a value of 1 indicates no association, and a value less than 1 suggests a protective effect [4].
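The measures in the table above can be worked through with a small numerical example. The sketch below uses made-up counts for a hypothetical cohort; the `Group` class and all figures are illustrative assumptions, not data from any cited study.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Summary counts for one arm of a hypothetical cohort."""
    new_cases: int        # participants who developed the outcome during follow-up
    at_risk: int          # participants free of the outcome at baseline
    person_years: float   # total follow-up time contributed while at risk


def cumulative_incidence(g: Group) -> float:
    # New cases divided by the population at risk at baseline
    return g.new_cases / g.at_risk


def incidence_rate(g: Group) -> float:
    # New cases divided by person-time at risk
    return g.new_cases / g.person_years


# Illustrative numbers only (not taken from a real study).
exposed = Group(new_cases=30, at_risk=1000, person_years=9500.0)
unexposed = Group(new_cases=10, at_risk=1000, person_years=9800.0)

risk_ratio = cumulative_incidence(exposed) / cumulative_incidence(unexposed)
rate_ratio = incidence_rate(exposed) / incidence_rate(unexposed)
risk_difference = cumulative_incidence(exposed) - cumulative_incidence(unexposed)

print(f"Risk, exposed:        {cumulative_incidence(exposed):.3f}")
print(f"Risk, unexposed:      {cumulative_incidence(unexposed):.3f}")
print(f"Risk ratio:           {risk_ratio:.2f}")
print(f"Incidence rate ratio: {rate_ratio:.2f}")
print(f"Risk difference:      {risk_difference:.3f}")
```

In this invented example the risk ratio is about 3 while the risk difference is only 2 percentage points, which illustrates why relative and absolute measures are usually reported together.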

Types of Cohort Studies

There are several types of cohort study, each suited to different research questions [5]:

Type	Direction	Key Feature	Speed/Cost
Prospective	Forward in time	Exposure assessed at baseline; cohort followed into the future	Slow and expensive
Retrospective (Historical)	Backward in time	Both exposure and outcome already occurred; historical data used	Faster, less expensive
Concurrent	Real-time	Compares groups with differing exposures simultaneously	Moderate
Nested Case-Control	Within an existing cohort	Cases with outcome matched to controls from the same cohort	Efficient for rare outcomes

Prospective Cohort Studies

In a prospective design, participants are enrolled and their baseline exposures are measured before any outcomes have occurred. The cohort is then actively followed forward in time. This approach allows exposure to be documented with high accuracy, before any disease has influenced how participants recall or experience that exposure [4].

Example:

The ongoing Nurses’ Health Study, now in its third generation with approximately 280,000 participants across the USA, has prospectively followed nurses for over 40 years, revealing associations between obesity and cancer risk and between shift work and chronic disease [1].

Retrospective Cohort Studies

In a retrospective design, both the exposure and the outcome have already occurred by the time the study begins. The researcher uses historical records (medical charts, employment records, registries) to reconstruct the cohort and its exposure history, then links this to outcome data [4].

Example:

A researcher studying the long-term lung health of miners could use decades of occupational exposure records and hospital databases to compare respiratory outcomes between workers with high versus low silica dust exposure, without needing to wait years for outcomes to emerge.

Key trade-off:

Retrospective studies are faster and cheaper, but data completeness and quality are determined by records that were not designed with the study in mind, increasing the risk of missing or incomplete data [1].

Landmark Cohort Studies in Biomedical Science

Some of the most influential findings in medicine have emerged from large cohort studies:

Study	Year Started	Cohort	Key Findings
British Doctors Study	1951	~40,000 UK registered physicians	Definitively linked cigarette smoking to lung cancer and increased all-cause mortality [1]
Framingham Heart Study	1948	Residents of Framingham, Massachusetts (now 3rd generation)	Identified major cardiovascular disease risk factors; underpinned global prevention guidelines [1]
Nurses’ Health Study	1976	~280,000 US nurses (now 3rd generation)	Links between lifestyle factors, hormones, obesity, and cancer/chronic disease [1]
Swiss HIV Cohort Study	1988	HIV-positive individuals across Switzerland	Advancing understanding of HIV pathogenesis, treatment, immunology, and co-infections [3]
Millennium Cohort Study	2000	~18,000 children born in the UK (2000–2002)	Long-term effects of early childhood education, social factors, and family environment on health

When to Use a Cohort Study

A cohort study is the appropriate design when:

It is unethical or impractical to randomise participants to an exposure (e.g., you cannot randomly assign people to smoke cigarettes)
The exposure is rare, such as occupational exposure to a specific toxin; a cohort built around the exposure is then more efficient than a case-control design and allows multiple downstream outcomes to be examined [4]
You need to establish temporality, to confirm that the exposure definitely preceded the disease
You want to study multiple outcomes from a single exposure (e.g., the effects of smoking on lung cancer, heart disease, and stroke simultaneously)
The long-term natural history of a disease or risk factor needs to be characterised [5]
Studying incidence rates is the primary goal: cohort studies are uniquely well-suited to calculating disease incidence, whereas prevalence is better captured by cross-sectional studies

Cohort vs. Case-Control: When to Choose Which

Feature	Cohort Study	Case-Control Study
Starting point	Exposure status	Outcome/disease status
Direction	Exposure → outcome	Outcome → exposure (backwards)
Best for	Rare exposures, multiple outcomes	Rare outcomes, faster initial results
Temporality	Clearly established	Less certain
Incidence calculation	Yes	No (only odds ratios)
Risk of recall bias	Low (especially prospective)	Higher
Cost and time	High	Lower
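The "Incidence calculation" row above notes that case-control studies yield odds ratios rather than risks. As a rough illustration of how the two measures relate, the sketch below computes both from the same 2×2 counts. The counts and function names are hypothetical. It relies on the widely taught point that the odds ratio approximates the risk ratio only when the outcome is rare.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """a, b: exposed with / without the outcome; c, d: unexposed with / without."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio of the same 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical counts: a rare outcome and a common outcome.
rare_outcome = (30, 9970, 10, 9990)
common_outcome = (600, 400, 300, 700)

for label, table in [("rare", rare_outcome), ("common", common_outcome)]:
    print(f"{label:>6}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```

With the rare outcome the two measures nearly coincide; with the common outcome the odds ratio is noticeably further from 1 than the risk ratio, so an odds ratio should not be read as a risk ratio in that setting.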

Designing a Cohort Study: Key Considerations

Good cohort study design depends on five core elements [5]:

Selection of participants

Inclusion and exclusion criteria must be precisely defined
The cohort must be free of the outcome at baseline
The comparison group should differ from the exposed group only in exposure status

Measurement of exposures

Measurement tools must be valid and reliable
Exposure data should be collected consistently across the entire follow-up period
Biomarkers, clinical assessments, or validated questionnaires are preferable to self-report alone where feasible

Data collection and follow-up

A systematic follow-up schedule must be established from the outset
Strategies to minimise loss to follow-up are critical (e.g., regular contact, linkage to national registries)
Loss to follow-up is one of the most important threats to a cohort study’s validity [5]

Controlling for confounding

Confounders are variables that are associated with both the exposure and the outcome (e.g., age, sex, socioeconomic status, smoking)
Methods include: restriction at design stage, matching, stratification in analysis, or multivariable regression
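To make stratification concrete, the following sketch compares a crude risk ratio with a Mantel-Haenszel summary risk ratio across two age strata. The Mantel-Haenszel estimator is one common way to pool stratum-specific estimates and is chosen here as an assumption; the guide does not prescribe a particular method. The counts are invented so that age is related to both the exposure and the outcome.

```python
from typing import NamedTuple


class Stratum(NamedTuple):
    label: str
    exposed_cases: int
    exposed_total: int
    unexposed_cases: int
    unexposed_total: int


def crude_risk_ratio(strata: list[Stratum]) -> float:
    """Risk ratio after collapsing all strata into one table (ignores the confounder)."""
    a = sum(s.exposed_cases for s in strata)
    n1 = sum(s.exposed_total for s in strata)
    c = sum(s.unexposed_cases for s in strata)
    n0 = sum(s.unexposed_total for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata: list[Stratum]) -> float:
    """Mantel-Haenszel pooled risk ratio across strata."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        total = s.exposed_total + s.unexposed_total
        numerator += s.exposed_cases * s.unexposed_total / total
        denominator += s.unexposed_cases * s.exposed_total / total
    return numerator / denominator


# Hypothetical data: older people are more often exposed AND more often ill.
strata = [
    Stratum("younger", exposed_cases=4, exposed_total=200,
            unexposed_cases=16, unexposed_total=800),
    Stratum("older", exposed_cases=80, exposed_total=800,
            unexposed_cases=20, unexposed_total=200),
]

print(f"Crude RR:           {crude_risk_ratio(strata):.2f}")
print(f"Mantel-Haenszel RR: {mantel_haenszel_risk_ratio(strata):.2f}")
```

Here the crude estimate suggests more than a doubling of risk, while within each age stratum the exposed and unexposed groups have identical risks, so the adjusted estimate is 1. The apparent association is entirely due to age in this constructed example.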

Sample size and statistical power

Large sample sizes are needed, especially when the outcome is rare; there is no universal numerical threshold for “rare”, so the expected incidence should be stated explicitly
Sample size calculations must be performed prior to recruitment [4]

Strengths and Limitations of Cohort Studies

Strengths

Establishes temporality: Because participants are disease-free at baseline and followed forward in time, it is clear that exposure preceded outcome; this is a prerequisite for inferring causality [4]
Avoids recall bias: In prospective cohorts, exposure is recorded before the outcome occurs, so a participant’s health status cannot distort their recollection of exposure [4]
Allows calculation of absolute risk and incidence rates: Unlike case-control studies, cohort studies can directly calculate risk, incidence rate, and risk difference
Permits study of multiple outcomes: A single cohort can be used to study several different disease endpoints from one exposure [4]
Efficient for rare exposures: If the exposure itself is uncommon (e.g., workers in a uranium processing plant), a cohort built around that exposure is more efficient than waiting for rare cases to accumulate
Stronger causal inference than cross-sectional or case-control designs due to longitudinal follow-up [5]

Limitations

Time-consuming and expensive: Large prospective cohort studies may run for decades and require sustained funding and infrastructure [1]
Attrition bias: Participants who drop out of a study may systematically differ from those who remain. If sicker individuals leave early, findings may be skewed [1]
Not suitable for very rare outcomes: For diseases with very low incidence rates, an enormous sample size would be needed to detect a meaningful difference between groups [4]
Confounding: Despite best efforts, residual confounding from unmeasured variables can distort the apparent relationship between exposure and outcome
Retrospective limitations: When historical records are used, data quality is fixed; missing data cannot be recovered, and data fields may not have been collected with the study question in mind [1]
Hawthorne effect: Participants aware they are being studied may change their behaviour, biasing exposure or outcome measurement [1]
No randomisation: Without random assignment, the exposed and unexposed groups may differ in unmeasured ways, limiting causal claims compared to a randomised controlled trial

Where Does a Cohort Study Sit in the Evidence Hierarchy?

Cohort studies occupy an important position in the evidence pyramid. They rank below randomised controlled trials (RCTs), which are the gold standard for establishing causation, but above case-control studies, cross-sectional studies, and expert opinion. Well-conducted prospective cohort studies with large samples and long follow-up can provide near-experimental-quality evidence, particularly when:

The association is large and consistent across multiple studies
A biological mechanism is plausible
There is a clear dose-response relationship (i.e., greater exposure is associated with greater risk)

The global guidelines on cardiovascular disease prevention, smoking cessation, and cancer risk reduction are built substantially on the evidence from major cohort studies [1].

Reporting Cohort Studies: STROBE

When reporting the findings of a cohort study, researchers are advised to follow the STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) guidelines. The STROBE checklist covers all key elements of a cohort study report, including [1]:

Title and abstract
Background and objectives
Study design and setting
Eligibility criteria and sources/methods of participant selection
Variables and measurement
Handling of potential sources of bias
Statistical methods
Results: participant numbers, descriptive data, outcome data, main results
Discussion: key findings, limitations, interpretation, generalisability

STROBE is an international, expert-led initiative aimed at improving the transparency and completeness of observational study reporting, making studies easier to critically appraise.

Critical Appraisal of a Cohort Study

Reading a cohort study critically requires a systematic approach. The CASP (Critical Appraisal Skills Programme) Cohort Study Checklist provides a structured framework [5]. Key questions to ask include:

Study design and population

Is the research question clearly defined?
Was the cohort recruited in an acceptable way, and is it representative of the target population?
Were all participants disease-free at baseline?

Exposure and outcome measurement

Was the exposure accurately and reliably measured?
Were outcomes assessed objectively and were assessors blinded to exposure status where possible?
Were all relevant outcomes considered?

Follow-up and bias

Was the follow-up period long enough to observe the outcome?
What was the loss-to-follow-up rate, and were those who were lost systematically different from those who remained?
Were potential confounders identified and controlled for in the analysis?

Results and applicability

Are the results statistically significant? Are confidence intervals reported?
Are the results clinically meaningful as well as statistically significant?
Can the findings be applied to the local or target population?
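When checking whether confidence intervals are reported, it helps to know roughly how one is obtained. The sketch below computes a 95% confidence interval for a risk ratio using the standard large-sample approximation on the log scale. The counts are hypothetical, and a real analysis would normally use a statistics package and adjust for confounders.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int,
                  level: float = 0.95) -> tuple[float, float, float]:
    """Return (risk ratio, lower bound, upper bound) using the log-scale approximation."""
    rr = (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)
    # Approximate standard error of ln(RR)
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 30/1000 exposed and 10/1000 unexposed develop the outcome.
rr, lower, upper = risk_ratio_ci(30, 1000, 10, 1000)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 corresponds to statistical significance at the chosen level, while its width indicates how precisely the effect has been estimated.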

Cohort Studies vs. Other Study Designs: A Summary

Design	Intervention?	Time Direction	Causality Strength	Cost
RCT	Yes	Forward	Strongest	Very high
Prospective Cohort	No	Forward	Strong	High
Retrospective Cohort	No	Backward	Moderate-Strong	Moderate
Case-Control	No	Backward	Moderate	Lower
Cross-Sectional	No	Single time point	Weak	Low
Case Report/Series	No	Retrospective	Weakest	Very low

Key Takeaways

A cohort study is an observational, longitudinal study that follows a group of individuals — free of the outcome at baseline — to examine how an exposure influences the development of disease over time.
There are two main types: prospective studies follow participants forward from exposure to outcome, while retrospective studies use historical records to reconstruct both exposure and outcome after they have occurred.
Participants are selected on the basis of exposure status, not disease status — this distinguishes cohort studies from case-control studies, where selection starts from the outcome.
The most important measures produced by cohort studies are incidence rates, cumulative incidence, risk ratios, and risk differences — metrics that directly quantify the burden and relative magnitude of risk.
A critical strength of the prospective cohort design is its ability to establish temporality: because exposure is documented before any outcome occurs, researchers can be more confident that the exposure preceded, and potentially caused, the disease.
Cohort studies are particularly suited to rare exposures, multiple outcomes from a single exposure, and situations where randomisation is unethical or impractical.
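Major limitations include high cost and long duration, risk of attrition bias, potential for residual confounding, and unsuitability for very rare outcomes; recall bias is mainly a concern when past exposure is reconstructed from participants’ memory rather than recorded prospectively.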
Loss to follow-up is one of the most serious methodological threats; participants who leave a study early may be systematically different from those who remain, introducing bias.
In the evidence hierarchy, well-designed cohort studies rank below RCTs but above case-control and cross-sectional designs, and they have directly underpinned major global clinical guidelines — including those on cardiovascular disease prevention and smoking cessation.
Reporting should conform to the STROBE guidelines, and readers should use a structured tool such as the CASP Cohort Study Checklist to critically appraise the quality, validity, and applicability of cohort study findings.

Frequently Asked Questions

Can a cohort study prove causation?

Cohort studies can provide strong evidence for a causal relationship, but they cannot prove causation on their own. Because participants are not randomly assigned to exposures, unmeasured confounding can never be entirely ruled out. Researchers use the Bradford Hill criteria—a set of nine standards including strength of association, consistency, biological plausibility, dose-response relationship, and temporality—to judge how confidently a causal inference can be drawn from observational data. When multiple independent cohort studies converge on the same finding, and a plausible biological mechanism exists, the cumulative evidence can be compelling enough to inform clinical and public health policy.

What is the difference between an open and a closed cohort, and does it matter?

In a closed cohort, all participants are enrolled at a fixed point in time and no new members are added; everyone either completes the study or leaves it. In an open (or dynamic) cohort, individuals can enter and exit the cohort throughout the study period as long as they meet the eligibility criteria. This is much like a general practice patient list, where people register and deregister continuously. The choice matters analytically: closed cohorts are best analysed using cumulative incidence (risk), whereas open cohorts require incidence rate calculations using person-time, because participants contribute different amounts of observation time. Cohorts built on continuously updated registers function as open cohorts, whereas cohorts recruited during a fixed enrolment window, such as the UK Biobank, are closer to closed cohorts.
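As a minimal sketch of the person-time calculation used for open cohorts, the example below sums each participant's follow-up from their own entry date to their own exit date. The records are invented. Exit is assumed to be the earliest of outcome, loss to follow-up, or end of study.

```python
from datetime import date

# (entry date, exit date, developed the outcome?) for hypothetical participants.
# Participants enter and leave at different times, as in an open cohort.
records = [
    (date(2010, 1, 1), date(2015, 1, 1), False),
    (date(2011, 7, 1), date(2013, 7, 1), True),
    (date(2012, 1, 1), date(2015, 1, 1), False),
    (date(2013, 3, 1), date(2014, 3, 1), True),
]

DAYS_PER_YEAR = 365.25

# Person-time: the sum of each participant's own time at risk.
person_years = sum(
    (exit_date - entry_date).days for entry_date, exit_date, _ in records
) / DAYS_PER_YEAR

cases = sum(1 for _, _, had_outcome in records if had_outcome)
rate = cases / person_years

print(f"Person-years at risk: {person_years:.2f}")
print(f"Incidence rate: {rate * 1000:.1f} per 1,000 person-years")
```

Dividing cases by the number of participants instead would treat someone followed for one year the same as someone followed for five, which is why the person-time denominator is preferred here.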

How do researchers handle confounding in a cohort study?

Confounding can be addressed at both the design stage and the analysis stage. At the design stage, strategies include

restriction (limiting enrolment to a narrow group, e.g., only non-smokers, to eliminate smoking as a confounder),
matching (pairing each exposed participant with an unexposed participant of similar characteristics), and
selecting a well-matched comparison group.

During analysis, statistical techniques such as multivariable regression, stratification, and propensity score methods are used to adjust for measured confounders. However, residual confounding from variables that were not measured at all (e.g., unmeasured genetic predispositions) remains an irreducible limitation of any observational study.

What is a nested case-control study, and why embed it within a cohort?

A nested case-control study is a case-control study conducted within the boundaries of an already established cohort. As cases of the outcome develop during follow-up, they are identified and matched to a sample of controls drawn from the same cohort who have not yet developed the outcome. This design is particularly efficient when the measurement of a key exposure (such as a stored blood biomarker, a genetic assay, or a detailed dietary assessment) is expensive or technically demanding, and it would be impractical to perform it on every cohort member.

For example, a researcher studying whether serum vitamin D levels predict colorectal cancer risk might store baseline blood samples for all cohort participants but only assay samples from the cases and their matched controls, dramatically reducing cost while retaining much of the inferential power of the full cohort.
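The control selection described above can be sketched as follows. This is an illustrative implementation of risk-set (incidence density) sampling, one common way to choose controls in a nested case-control study. The cohort data, the function name `sample_risk_sets`, and the choice of two controls per case are all assumptions made for the example.

```python
import random

# Hypothetical cohort: participant id -> (years of follow-up, developed the outcome?)
cohort = {
    "P01": (2.0, True),
    "P02": (6.5, False),
    "P03": (3.5, True),
    "P04": (1.0, False),
    "P05": (7.0, False),
    "P06": (4.0, False),
    "P07": (6.0, True),
    "P08": (2.5, False),
}


def sample_risk_sets(cohort: dict[str, tuple[float, bool]],
                     controls_per_case: int,
                     seed: int = 0) -> dict[str, list[str]]:
    """For each case, draw controls from members still at risk at the case's event time."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    sampled: dict[str, list[str]] = {}
    for case_id, (event_time, is_case) in cohort.items():
        if not is_case:
            continue
        # Risk set: everyone else still under follow-up and outcome-free at event_time.
        # A participant who becomes a case later can still serve as a control now.
        risk_set = [
            pid for pid, (follow_up, _) in cohort.items()
            if pid != case_id and follow_up > event_time
        ]
        k = min(controls_per_case, len(risk_set))
        sampled[case_id] = rng.sample(risk_set, k)
    return sampled


controls = sample_risk_sets(cohort, controls_per_case=2)
for case_id, picked in controls.items():
    print(case_id, "->", picked)
```

Only the cases and their sampled controls would then need the expensive assay, which is the source of the cost saving.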

How large does a cohort study need to be, and how long should it run?

Neither question has a single answer: both depend on the specific research question. Sample size is determined by

the expected incidence of the outcome in the unexposed group,
the minimum effect size considered clinically meaningful,
the desired statistical power (typically 80% or higher), and
the significance threshold (usually α = 0.05).

For rare outcomes such as a specific cancer subtype, tens or even hundreds of thousands of participants may be needed. Duration is similarly driven by the biology of the disease: exposures that lead to rapidly developing outcomes (e.g., an acute infection) may require only months of follow-up, while studies of chronic diseases such as Alzheimer’s disease or atherosclerosis may require decades. Underestimating either sample size or follow-up duration is among the most common reasons cohort studies fail to detect a true association.
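The four inputs listed above can be combined in a simple calculation. The sketch below uses the textbook normal-approximation formula for comparing two proportions. It assumes equal-sized exposed and unexposed groups, a two-sided test, and no loss to follow-up, so it gives a lower bound for planning rather than a definitive figure. The function name and example values are hypothetical.

```python
import math
from statistics import NormalDist


def cohort_sample_size(p_unexposed: float, risk_ratio: float,
                       power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate participants needed PER GROUP to detect the given risk ratio."""
    p_exposed = p_unexposed * risk_ratio
    if not (0 < p_unexposed < 1 and 0 < p_exposed < 1):
        raise ValueError("risks must lie strictly between 0 and 1")
    if p_exposed == p_unexposed:
        raise ValueError("risk ratio must differ from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p_exposed * (1 - p_exposed) + p_unexposed * (1 - p_unexposed)
    n = (z_alpha + z_beta) ** 2 * variance / (p_exposed - p_unexposed) ** 2
    return math.ceil(n)


# Hypothetical scenarios: detect a doubling of risk for a 1% and a 0.1% outcome.
n_common = cohort_sample_size(p_unexposed=0.01, risk_ratio=2.0)
n_rare = cohort_sample_size(p_unexposed=0.001, risk_ratio=2.0)
print(f"Baseline risk 1%:   about {n_common:,} per group")
print(f"Baseline risk 0.1%: about {n_rare:,} per group")
```

Under these assumptions a tenfold drop in baseline risk raises the required sample roughly tenfold. In practice the result would be inflated further to allow for expected attrition.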

What ethical considerations are specific to cohort studies?

Several ethical issues are distinctive to the cohort design.

First, because participants are followed over long periods, ongoing informed consent (not just a single signature at enrolment) is considered best practice, especially if new data types (such as genetic samples or linkage to electronic health records) are added during the study.
Second, researchers face a duty of care when incidental findings are uncovered: if follow-up data reveal that a participant has an undiagnosed serious condition, there are obligations around disclosure that must be defined in advance in the study protocol.
Third, differential attrition raises equity concerns: if participants from lower socioeconomic or minority ethnic groups are more likely to drop out, the remaining cohort may not represent the broader population, and findings could inadvertently reinforce health inequalities.
Finally, long-term storage and future use of biological samples and personal data require robust governance frameworks and transparent data access policies to maintain participant trust throughout the study’s lifetime.

References

1. Barrett D, Noble H. What are cohort studies? Evid Based Nurs. 2019;22(4):95–96.
2. Alexander LK, Lopes B, Ricchetti-Masterson K, Yeatts KB. Cohort Studies. ERIC Notebook Series. 2nd ed. Chapel Hill (NC): University of North Carolina at Chapel Hill, Department of Epidemiology; 2015.
3. Setia MS. Methodology Series Module 1: Cohort Studies. Indian J Dermatol. 2016 Jan–Feb;61(1):21–25. doi: 10.4103/0019-5154.174011.
4. Song JW, Chung KC. Observational studies: cohort and case-control studies. Plast Reconstr Surg. 2010 Dec;126(6):2234–2242. doi: 10.1097/PRS.0b013e3181f44abc.
5. Critical Appraisal Skills Programme (CASP). What is a cohort study and why are they important? [Internet]. Oxford: CASP UK; 2023 [cited 2026 Jun 8]. Available from: https://casp-uk.net/what-is-a-cohort-study/
