Contents
- What is a cohort study?
- Glossary of Key Terms
- How a Cohort Study Works
- Types of Cohort Studies
- Landmark Cohort Studies in Biomedical Science
- When to Use a Cohort Study
- Cohort vs. Case-Control: When to Choose Which
- Designing a Cohort Study: Key Considerations
- Strengths and Limitations of Cohort Studies
- Where Does a Cohort Study Sit in the Evidence Hierarchy?
- Reporting Cohort Studies: STROBE
- Critical Appraisal of a Cohort Study
- Cohort Studies vs. Other Study Designs: A Summary
- Key Takeaways
- Frequently Asked Questions
A cohort study is one of the most powerful and widely used research designs in epidemiology and biomedical science. By following groups of people over time, researchers can uncover how exposures, from smoking habits to genetic mutations, influence the development of disease. From the landmark British Doctors Study that confirmed the link between cigarette smoking and lung cancer, to the multigenerational Framingham Heart Study still running today, cohort studies have fundamentally shaped our understanding of human health.
This guide explains what a cohort study is, how it works, when to use it, and how to critically evaluate one.
What is a cohort study?
A cohort study is an observational, longitudinal study that follows a group of individuals who share a common characteristic or exposure, known as the “cohort”, over a defined period of time. Researchers compare outcomes between those exposed to a risk factor and those who are not, to determine whether the exposure is associated with the development of a particular disease or health outcome [1].
The word “cohort” itself derives from the Latin term for a unit of 300 to 600 Roman soldiers who marched together. The term was adopted by the epidemiology community in the 1930s to describe “a designated group which is followed or traced over a period of time” [2].
Crucially, in a cohort study, the investigator does not intervene. Participants are simply observed. This distinguishes cohort studies from clinical trials, where researchers assign participants to treatments.
Glossary of Key Terms
| Term | Definition |
| Cohort | A group of individuals sharing a common characteristic or exposure who are followed together over time |
| Exposure | Any factor—biological, chemical, behavioural, or environmental—whose association with a health outcome is being investigated (e.g., smoking, a drug, a toxin) |
| Outcome | The health event or endpoint being measured, such as disease diagnosis, death, or change in health status |
| Prospective Cohort Study | A cohort study that follows participants forward in time from exposure assessment to outcome occurrence |
| Retrospective Cohort Study | A cohort study in which both exposure and outcome have already occurred; historical records are used to reconstruct the study |
| Incidence | The number of new cases of a disease or outcome arising in a population over a defined time period |
| Prevalence | The proportion of a population that has a disease or condition at a single point in time |
| Cumulative Incidence (Risk) | The proportion of individuals in a cohort who develop the outcome over the entire follow-up period |
| Incidence Rate (Incidence Density) | The number of new cases divided by total person-time at risk; reflects the speed at which new cases occur |
| Person-Time | The sum of the time each individual participant contributed to the study while free of the outcome; used as the denominator in incidence rate calculations |
| Risk Ratio (Relative Risk) | The ratio of the risk of the outcome in the exposed group to the risk in the unexposed group; a value >1 indicates increased risk |
| Risk Difference (Attributable Risk) | The absolute difference in incidence between the exposed and unexposed groups; reflects the excess risk attributable to the exposure |
| Confounding | Distortion of the exposure–outcome relationship by a third variable that is associated with both the exposure and the outcome (e.g., age, sex) |
| Attrition Bias | Systematic error introduced when participants who drop out of a study differ in meaningful ways from those who remain |
| Recall Bias | Error that arises when participants’ recollection of past exposures is influenced by their current disease status; less of a concern in prospective cohort studies |
| Loss to Follow-Up | The failure to collect outcome data on participants who leave or disengage from a study before it concludes |
| Temporality | The requirement that exposure must precede outcome in time; a key criterion for establishing causality and a strength of prospective cohort designs |
| Nested Case-Control Study | A case-control study conducted within an existing defined cohort, drawing both cases and controls from the same cohort population |
| STROBE | Strengthening the Reporting of Observational Studies in Epidemiology—an international checklist of standards for reporting observational studies including cohort studies |
| CASP | Critical Appraisal Skills Programme—provides structured checklists for evaluating the methodological quality of research studies, including cohort studies |
| Baseline | The starting point of a cohort study, at which participants are enrolled, exposure is assessed, and all members are confirmed free of the outcome of interest |
| Open (Dynamic) Cohort | A cohort in which new participants can enter and others can leave throughout the study period |
| Closed Cohort | A cohort in which all participants are fixed at the outset; no new members are added during follow-up |
How a Cohort Study Works
The basic structure of a cohort study involves three stages [3]:
1. Establishing the cohort
- The investigator selects participants based on a shared exposure or characteristic (e.g., smokers vs. non-smokers, workers in a chemical plant, individuals born in the same year)
- Participants must be free of the outcome of interest at baseline; anyone already diagnosed with the disease being studied is excluded [4]
- Two groups are defined: an exposed group and an unexposed (or less exposed) comparison group (similar to a control group)
- The comparison group should resemble the exposed group on all relevant factors except the exposure itself
2. Following the cohort
- Data on exposures are recorded at baseline via interviews, questionnaires, bioassays, clinical measurements, or medical records
- The cohort is then monitored over time for the development of new cases of the outcome of interest [4]
- Follow-up methods may include direct participant contact, routine health records, death registries, or hospital databases
3. Evaluating outcomes
- Researchers calculate the incidence of disease in both the exposed and unexposed groups
- Key measures include:
| Measure | Definition | Formula |
| Cumulative Incidence (Risk) | Proportion of the cohort that develops the outcome | New cases ÷ population at risk at baseline |
| Incidence Rate (IR) | Speed at which new cases of the outcome occur | New cases ÷ person-time at risk |
| Risk Ratio (Relative Risk) | Comparison of risk between groups | Risk in exposed ÷ risk in unexposed |
| Incidence Rate Ratio | Comparison of rates between groups | IR in exposed ÷ IR in unexposed |
| Risk Difference | Absolute difference in risk | Risk in exposed − risk in unexposed |
A risk ratio greater than 1 indicates that the exposure is associated with an increased risk of the outcome; a value below 1 suggests a protective association, and a value of 1 indicates no association [4].
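The measures in the table above can be sketched in a few lines of Python. This is a minimal illustration, not a validated analysis tool: the `Group` structure, the function names, and all counts are hypothetical and were invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Summary counts for one exposure group (hypothetical structure)."""
    new_cases: int       # participants who developed the outcome during follow-up
    at_risk: int         # participants free of the outcome at baseline
    person_years: float  # total follow-up time contributed while at risk


def cumulative_incidence(g: Group) -> float:
    """New cases divided by the population at risk at baseline."""
    return g.new_cases / g.at_risk


def incidence_rate(g: Group) -> float:
    """New cases divided by person-time at risk."""
    return g.new_cases / g.person_years


def risk_ratio(exposed: Group, unexposed: Group) -> float:
    """Risk in exposed divided by risk in unexposed."""
    return cumulative_incidence(exposed) / cumulative_incidence(unexposed)


def incidence_rate_ratio(exposed: Group, unexposed: Group) -> float:
    """Incidence rate in exposed divided by incidence rate in unexposed."""
    return incidence_rate(exposed) / incidence_rate(unexposed)


def risk_difference(exposed: Group, unexposed: Group) -> float:
    """Risk in exposed minus risk in unexposed."""
    return cumulative_incidence(exposed) - cumulative_incidence(unexposed)


# Invented counts, for illustration only.
exposed = Group(new_cases=30, at_risk=1000, person_years=9500.0)
unexposed = Group(new_cases=10, at_risk=1000, person_years=9800.0)

print(f"Risk (exposed):       {cumulative_incidence(exposed):.3f}")
print(f"Risk (unexposed):     {cumulative_incidence(unexposed):.3f}")
print(f"Risk ratio:           {risk_ratio(exposed, unexposed):.2f}")
print(f"Incidence rate ratio: {incidence_rate_ratio(exposed, unexposed):.2f}")
print(f"Risk difference:      {risk_difference(exposed, unexposed):.3f}")
```

With these invented counts the risk is 3% in the exposed group and 1% in the unexposed group, so the risk ratio is 3 and the risk difference is 2 percentage points. The incidence rate ratio differs slightly from the risk ratio because the two groups contributed different amounts of person-time.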
Types of Cohort Studies
There are several types of cohort study, each suited to different research questions [5]:
| Type | Direction | Key Feature | Speed/Cost |
| Prospective | Forward in time | Exposure assessed at baseline; cohort followed into the future | Slow and expensive |
| Retrospective (Historical) | Looks back at existing records (analysis still runs from exposure to outcome) | Both exposure and outcome already occurred; historical data used | Faster, less expensive |
| Concurrent | Real-time | Compares groups with differing exposures simultaneously | Moderate |
| Nested Case-Control | Within an existing cohort | Cases with outcome matched to controls from the same cohort | Efficient for rare outcomes |
Prospective Cohort Studies
In a prospective design, participants are enrolled and their baseline exposures are measured before any outcomes have occurred. The cohort is then actively followed forward in time. This approach allows exposure to be documented with high accuracy, before any disease has influenced how participants recall or experience that exposure [4].
Example:
The ongoing Nurses’ Health Study, now in its third generation with approximately 280,000 participants across the USA, has prospectively followed nurses for over 40 years, revealing associations between obesity and cancer risk and between shift work and chronic disease [1].
Retrospective Cohort Studies
In a retrospective design, both the exposure and the outcome have already occurred by the time the study begins. The researcher uses historical records (medical charts, employment records, registries) to reconstruct the cohort and its exposure history, then links this to outcome data [4].
Example:
A researcher studying the long-term lung health of miners could use decades of occupational exposure records and hospital databases to compare respiratory outcomes between workers with high versus low silica dust exposure, without needing to wait years for outcomes to emerge.
Key trade-off:
Retrospective studies are faster and cheaper, but data completeness and quality are determined by records that were not designed with the study in mind, increasing the risk of missing or incomplete data [1].
Landmark Cohort Studies in Biomedical Science
Some of the most influential findings in medicine have emerged from large cohort studies:
| Study | Year Started | Cohort | Key Findings |
| British Doctors Study | 1951 | ~40,000 UK registered physicians | Definitively linked cigarette smoking to lung cancer and increased all-cause mortality [1] |
| Framingham Heart Study | 1948 | Residents of Framingham, Massachusetts (now 3rd generation) | Identified major cardiovascular disease risk factors; underpinned global prevention guidelines [1] |
| Nurses’ Health Study | 1976 | ~280,000 US nurses (now 3rd generation) | Links between lifestyle factors, hormones, obesity, and cancer/chronic disease [1] |
| Swiss HIV Cohort Study | 1988 | HIV-positive individuals across Switzerland | Advancing understanding of HIV pathogenesis, treatment, immunology, and co-infections [3] |
| Millennium Cohort Study | 2000 | ~18,000 children born in the UK (2000–2002) | Long-term effects of early childhood education, social factors, and family environment on health |
When to Use a Cohort Study
A cohort study is the appropriate design when:
- It is unethical or impractical to randomise participants to an exposure (e.g., you cannot randomly assign people to smoke cigarettes)
- The exposure is rare, such as occupational exposure to a specific toxin; recruiting on exposure status is then more efficient than a case-control design and allows multiple downstream outcomes to be examined [4]
- You need to establish temporality, to confirm that the exposure definitely preceded the disease
- You want to study multiple outcomes from a single exposure (e.g., the effects of smoking on lung cancer, heart disease, and stroke simultaneously)
- The long-term natural history of a disease or risk factor needs to be characterised [5]
- Studying incidence rates is the primary goal: cohort studies are uniquely well-suited to calculating disease incidence, whereas prevalence is better captured by cross-sectional studies
Cohort vs. Case-Control: When to Choose Which
| Feature | Cohort Study | Case-Control Study |
| Starting point | Exposure status | Outcome/disease status |
| Direction | Exposure → outcome | Outcome → exposure (backwards) |
| Best for | Rare exposures, multiple outcomes | Rare outcomes, faster initial results |
| Temporality | Clearly established | Less certain |
| Incidence calculation | Yes | No (only odds ratios) |
| Risk of recall bias | Low (especially prospective) | Higher |
| Cost and time | High | Lower |
Designing a Cohort Study: Key Considerations
Good cohort study design depends on five core elements [5]:
Selection of participants
- Inclusion and exclusion criteria must be precisely defined
- The cohort must be free of the outcome at baseline
- The comparison group should differ from the exposed group only in exposure status
Measurement of exposures
- Measurement tools must be valid and reliable
- Exposure data should be collected consistently across the entire follow-up period
- Biomarkers, clinical assessments, or validated questionnaires are preferable to self-report alone where feasible
Data collection and follow-up
- A systematic follow-up schedule must be established from the outset
- Strategies to minimise loss to follow-up are critical (e.g., regular contact, linkage to national registries)
- Loss to follow-up is one of the most important threats to a cohort study’s validity [5]
Controlling for confounding
- Confounders are variables that are associated with both the exposure and the outcome (e.g., age, sex, socioeconomic status, smoking)
- Methods include: restriction at design stage, matching, stratification in analysis, or multivariable regression
Sample size and statistical power
- Large sample sizes are needed, especially when the outcome is rare (defined as fewer than 1 event per 1,000 person-years)
- Sample size calculations must be performed prior to recruitment [4]
Strengths and Limitations of Cohort Studies
Strengths
- Establishes temporality: Because participants are disease-free at baseline and followed forward in time, it is clear that exposure preceded outcome; this is a prerequisite for inferring causality [4]
- Avoids recall bias: In prospective cohorts, exposure is recorded before the outcome occurs, so a participant’s health status cannot distort their recollection of exposure [4]
- Allows calculation of absolute risk and incidence rates: Unlike case-control studies, cohort studies can directly calculate risk, incidence rate, and risk difference
- Permits study of multiple outcomes: A single cohort can be used to study several different disease endpoints from one exposure [4]
- Efficient for rare exposures: If the exposure itself is uncommon (e.g., workers in a uranium processing plant), a cohort built around that exposure is more efficient than waiting for rare cases to accumulate
- Stronger causal inference than cross-sectional or case-control designs due to longitudinal follow-up [5]
Limitations
- Time-consuming and expensive: Large prospective cohort studies may run for decades and require sustained funding and infrastructure [1]
- Attrition bias: Participants who drop out of a study may systematically differ from those who remain. If sicker individuals leave early, findings may be skewed [1]
- Not suitable for very rare outcomes: For diseases with very low incidence rates, an enormous sample size would be needed to detect a meaningful difference between groups [4]
- Confounding: Despite best efforts, residual confounding from unmeasured variables can distort the apparent relationship between exposure and outcome
- Retrospective limitations: When historical records are used, data quality is fixed; missing data cannot be recovered, and data fields may not have been collected with the study question in mind [1]
- Hawthorne effect: Participants aware they are being studied may change their behaviour, biasing exposure or outcome measurement [1]
- No randomisation: Without random assignment, the exposed and unexposed groups may differ in unmeasured ways, limiting causal claims compared to a randomised controlled trial
Where Does a Cohort Study Sit in the Evidence Hierarchy?
Cohort studies occupy an important position in the evidence pyramid. They rank below randomised controlled trials (RCTs), which are the gold standard for establishing causation, but above case-control studies, cross-sectional studies, and expert opinion. Well-conducted prospective cohort studies with large samples and long follow-up can provide near-experimental-quality evidence, particularly when:
- The association is large and consistent across multiple studies
- A biological mechanism is plausible
- There is a clear dose-response relationship (i.e., greater exposure is associated with greater risk)
The global guidelines on cardiovascular disease prevention, smoking cessation, and cancer risk reduction are built substantially on the evidence from major cohort studies [1].
Reporting Cohort Studies: STROBE
When reporting the findings of a cohort study, researchers are encouraged to follow the STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) guidelines. The STROBE checklist covers all key elements of a cohort study report, including [1]:
- Title and abstract
- Background and objectives
- Study design and setting
- Eligibility criteria and sources/methods of participant selection
- Variables and measurement
- Handling of potential sources of bias
- Statistical methods
- Results: participant numbers, descriptive data, outcome data, main results
- Discussion: key findings, limitations, interpretation, generalisability
STROBE is an international, expert-led initiative aimed at improving the transparency and completeness of observational study reporting, making studies easier to critically appraise.
Critical Appraisal of a Cohort Study
Reading a cohort study critically requires a systematic approach. The CASP (Critical Appraisal Skills Programme) Cohort Study Checklist provides a structured framework [5]. Key questions to ask include:
Study design and population
- Is the research question clearly defined?
- Was the cohort recruited in an acceptable way, and is it representative of the target population?
- Were all participants disease-free at baseline?
Exposure and outcome measurement
- Was the exposure accurately and reliably measured?
- Were outcomes assessed objectively and were assessors blinded to exposure status where possible?
- Were all relevant outcomes considered?
Follow-up and bias
- Was the follow-up period long enough to observe the outcome?
- What was the loss-to-follow-up rate, and were those who were lost systematically different from those who remained?
- Were potential confounders identified and controlled for in the analysis?
Results and applicability
- Are the results statistically significant? Are confidence intervals reported?
- Are the results clinically meaningful as well as statistically significant?
- Can the findings be applied to the local or target population?
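When a paper reports a risk ratio without a confidence interval, a reader can approximate one from the published counts. The sketch below uses the common log-transformation (Katz) approximation, which is one of several available methods and is not prescribed by the CASP checklist. The function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(cases_exp, n_exp, cases_unexp, n_unexp, level=0.95):
    """Risk ratio with an approximate confidence interval (log method).

    Assumes every count is greater than zero; the approximation is
    unreliable when the number of cases is very small.
    """
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # Standard error of ln(RR) under the log-normal approximation.
    se_log_rr = math.sqrt(
        1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp
    )
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Invented counts: 30 of 1,000 exposed and 10 of 1,000 unexposed develop the outcome.
rr, lower, upper = risk_ratio_ci(30, 1000, 10, 1000)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 corresponds to statistical significance at the chosen level. Whether an effect of that size is clinically meaningful remains a separate judgement.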
Cohort Studies vs. Other Study Designs: A Summary
| Design | Intervention? | Time Direction | Causality Strength | Cost |
| RCT | Yes | Forward | Strongest | Very high |
| Prospective Cohort | No | Forward | Strong | High |
| Retrospective Cohort | No | Backward | Moderate-Strong | Moderate |
| Case-Control | No | Backward | Moderate | Lower |
| Cross-Sectional | No | Single time point | Weak | Low |
| Case Report/Series | No | Retrospective | Weakest | Very low |
Key Takeaways
- A cohort study is an observational, longitudinal study that follows a group of individuals — free of the outcome at baseline — to examine how an exposure influences the development of disease over time.
- There are two main types: prospective studies follow participants forward from exposure to outcome, while retrospective studies use historical records to reconstruct both exposure and outcome after they have occurred.
- Participants are selected on the basis of exposure status, not disease status — this distinguishes cohort studies from case-control studies, where selection starts from the outcome.
- The most important measures produced by cohort studies are incidence rates, cumulative incidence, risk ratios, and risk differences — metrics that directly quantify the burden and relative magnitude of risk.
- A critical strength of the prospective cohort design is its ability to establish temporality: because exposure is documented before any outcome occurs, researchers can be more confident that the exposure preceded, and potentially caused, the disease.
- Cohort studies are particularly suited to rare exposures, multiple outcomes from a single exposure, and situations where randomisation is unethical or impractical.
- Major limitations include high cost and long duration, risk of attrition bias, potential for residual confounding, and unsuitability for very rare outcomes; recall bias is mainly a concern when exposure data are gathered retrospectively.
- Loss to follow-up is one of the most serious methodological threats; participants who leave a study early may be systematically different from those who remain, introducing bias.
- In the evidence hierarchy, well-designed cohort studies rank below RCTs but above case-control and cross-sectional designs, and they have directly underpinned major global clinical guidelines — including those on cardiovascular disease prevention and smoking cessation.
- Reporting should conform to the STROBE guidelines, and readers should use a structured tool such as the CASP Cohort Study Checklist to critically appraise the quality, validity, and applicability of cohort study findings.
Frequently Asked Questions
Can a cohort study prove causation?
Cohort studies can provide strong evidence for a causal relationship, but they cannot prove causation on their own. Because participants are not randomly assigned to exposures, unmeasured confounding can never be entirely ruled out. Researchers use the Bradford Hill criteria—a set of nine standards including strength of association, consistency, biological plausibility, dose-response relationship, and temporality—to judge how confidently a causal inference can be drawn from observational data. When multiple independent cohort studies converge on the same finding, and a plausible biological mechanism exists, the cumulative evidence can be compelling enough to inform clinical and public health policy.
What is the difference between an open and a closed cohort, and does it matter?
In a closed cohort, all participants are enrolled at a fixed point in time and no new members are added; everyone either completes the study or leaves it. In an open (or dynamic) cohort, individuals can enter and exit the cohort throughout the study period as long as they meet the eligibility criteria. This is much like a general practice patient list, where people register and deregister continuously. The choice matters analytically: closed cohorts are best analysed using cumulative incidence (risk), whereas open cohorts require incidence rate calculations using person-time, because participants contribute different amounts of observation time. In practice, many large epidemiological cohorts are analysed with person-time methods, because follow-up time varies between participants even when enrolment is fixed.
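To make the person-time idea concrete, the following sketch computes an incidence rate for a small open cohort in which participants enter and leave on different dates. The record layout, function names, and dates are all invented for illustration.

```python
from datetime import date


def person_years(entry: date, exit_: date) -> float:
    """Follow-up time contributed by one participant, in years."""
    return (exit_ - entry).days / 365.25


def open_cohort_incidence_rate(records):
    """Return (new cases, total person-years, incidence rate).

    Each record is (entry_date, exit_date, developed_outcome), where
    exit_date is the date of the outcome, of leaving the cohort, or of
    the end of the study, whichever came first.
    """
    total_py = sum(person_years(entry, exit_) for entry, exit_, _ in records)
    cases = sum(1 for _, _, event in records if event)
    return cases, total_py, cases / total_py


# Invented participants who joined and left at different times.
records = [
    (date(2010, 1, 1), date(2020, 1, 1), False),  # followed to end of study
    (date(2012, 1, 1), date(2016, 1, 1), True),   # developed the outcome
    (date(2015, 1, 1), date(2020, 1, 1), False),  # late entrant
    (date(2011, 1, 1), date(2013, 1, 1), True),   # developed the outcome
]

cases, total_py, rate = open_cohort_incidence_rate(records)
print(f"{cases} cases over {total_py:.1f} person-years")
print(f"Incidence rate: {rate * 1000:.1f} per 1,000 person-years")
```

Dividing the same two cases by the four people enrolled would ignore the fact that one participant was observed for ten years and another for only two, which is why person-time is used as the denominator.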
How do researchers handle confounding in a cohort study?
Confounding can be addressed at both the design stage and the analysis stage. At the design stage, strategies include
- restriction (limiting enrolment to a narrow group, e.g., only non-smokers, to eliminate smoking as a confounder),
- matching (pairing each exposed participant with an unexposed participant of similar characteristics), and
- selecting a well-matched comparison group.
During analysis, statistical techniques such as multivariable regression, stratification, and propensity score methods are used to adjust for measured confounders. However, residual confounding from variables that were not measured at all (e.g., unmeasured genetic predispositions) remains an irreducible limitation of any observational study.
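Stratification, one of the analysis-stage techniques mentioned above, can be illustrated with a Mantel-Haenszel pooled risk ratio. This is a sketch under simplifying assumptions: a single confounder (age group), invented counts, and hypothetical names throughout. It is not a substitute for a full regression analysis.

```python
from typing import NamedTuple


class Stratum(NamedTuple):
    """Counts within one level of the confounder (hypothetical layout)."""
    exposed_cases: int
    exposed_total: int
    unexposed_cases: int
    unexposed_total: int


def crude_risk_ratio(strata):
    """Risk ratio that ignores the confounder by pooling all strata."""
    a = sum(s.exposed_cases for s in strata)
    n1 = sum(s.exposed_total for s in strata)
    c = sum(s.unexposed_cases for s in strata)
    n0 = sum(s.unexposed_total for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    """Risk ratio adjusted for the stratifying variable (Mantel-Haenszel)."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s.exposed_total + s.unexposed_total
        numerator += s.exposed_cases * s.unexposed_total / n
        denominator += s.unexposed_cases * s.exposed_total / n
    return numerator / denominator


# Invented data: older participants are both more often exposed and at higher risk.
strata = [
    Stratum(exposed_cases=4, exposed_total=200, unexposed_cases=16, unexposed_total=800),   # younger
    Stratum(exposed_cases=80, exposed_total=800, unexposed_cases=20, unexposed_total=200),  # older
]

print(f"Crude RR:    {crude_risk_ratio(strata):.2f}")
print(f"Adjusted RR: {mantel_haenszel_risk_ratio(strata):.2f}")
```

In this constructed example the risk within each age group is identical for exposed and unexposed participants (2% among the younger, 10% among the older), so the adjusted risk ratio is 1. The crude risk ratio of about 2.3 arises entirely from the uneven age distribution, which is what confounding looks like in practice.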
What is a nested case-control study, and why embed it within a cohort?
A nested case-control study is a case-control study conducted within the boundaries of an already established cohort. As cases of the outcome develop during follow-up, they are identified and matched to a sample of controls drawn from the same cohort who have not yet developed the outcome. This design is particularly efficient when the measurement of a key exposure (such as a stored blood biomarker, a genetic assay, or a detailed dietary assessment) is expensive or technically demanding, and it would be impractical to perform it on every cohort member.
For example, a researcher studying whether serum vitamin D levels predict colorectal cancer risk might store baseline blood samples for all cohort participants but only assay samples from the cases and their matched controls, dramatically reducing cost while retaining much of the inferential power of the full cohort.
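The control-selection step can be sketched as risk-set (incidence density) sampling: for each case, controls are drawn from cohort members who were still under follow-up and outcome-free at the time the case occurred. The data layout and function name below are hypothetical, and real studies usually also match on factors such as age and sex, which this sketch omits.

```python
import random


def sample_risk_set_controls(cohort, controls_per_case=2, seed=0):
    """Pick controls for each case from members still at risk at the case's event time.

    cohort maps participant id -> (follow_up_years, developed_outcome).
    A participant who becomes a case later can still serve as a control
    for an earlier case, because they were outcome-free at that time.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    matched = {}
    for case_id, (case_time, is_case) in cohort.items():
        if not is_case:
            continue
        risk_set = [
            pid
            for pid, (follow_up, _) in cohort.items()
            if pid != case_id and follow_up > case_time
        ]
        k = min(controls_per_case, len(risk_set))
        matched[case_id] = rng.sample(risk_set, k)
    return matched


# Invented cohort: id -> (years of follow-up, developed the outcome?)
cohort = {
    1: (2.0, True),
    2: (5.0, False),
    3: (3.5, True),
    4: (6.0, False),
    5: (1.0, False),
    6: (4.0, False),
    7: (6.0, False),
    8: (5.5, True),
}

for case_id, controls in sample_risk_set_controls(cohort).items():
    print(f"case {case_id}: controls {sorted(controls)}")
```

Only the stored samples for the selected cases and controls would then need to be assayed, rather than those of all eight cohort members.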
How large does a cohort study need to be, and how long should it run?
Neither question has a single answer: both depend on the specific research question. Sample size is determined by
- the expected incidence of the outcome in the unexposed group,
- the minimum effect size considered clinically meaningful,
- the desired statistical power (typically 80% or higher), and
- the significance threshold (usually α = 0.05).
For rare outcomes such as a specific cancer subtype, tens or even hundreds of thousands of participants may be needed. Duration is similarly driven by the biology of the disease: exposures that lead to rapidly developing outcomes (e.g., an acute infection) may require only months of follow-up, while studies of chronic diseases such as Alzheimer’s disease or atherosclerosis may require decades. Underestimating the required sample size or follow-up duration is among the most common reasons cohort studies fail to detect a true association.
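The four inputs listed above can be combined in a standard normal-approximation formula for comparing two proportions. The sketch below assumes equal-sized exposed and unexposed groups, a two-sided test, and no loss to follow-up; the function name and example values are hypothetical, and a real study would normally inflate the result to allow for attrition.

```python
import math
from statistics import NormalDist


def participants_per_group(risk_unexposed, min_risk_ratio, power=0.80, alpha=0.05):
    """Approximate number of participants needed in each exposure group."""
    p0 = risk_unexposed
    p1 = risk_unexposed * min_risk_ratio  # risk implied in the exposed group
    if not (0 < p0 < 1 and 0 < p1 < 1) or p0 == p1:
        raise ValueError("risks must lie strictly between 0 and 1 and differ")
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Invented scenarios: detect a doubling of risk at 80% power and alpha = 0.05.
for baseline_risk in (0.01, 0.001):
    n = participants_per_group(baseline_risk, min_risk_ratio=2.0)
    print(f"baseline risk {baseline_risk}: about {n} per group")
```

Running the two scenarios shows how quickly the required cohort grows as the outcome becomes rarer, which is the reason very rare outcomes are often better studied with a case-control design.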
What ethical considerations are specific to cohort studies?
Several ethical issues are distinctive to the cohort design.
- First, because participants are followed over long periods, ongoing informed consent (not just a single signature at enrolment) is considered best practice, especially if new data types (such as genetic samples or linkage to electronic health records) are added during the study.
- Second, researchers face a duty of care when incidental findings are uncovered: if follow-up data reveal that a participant has an undiagnosed serious condition, there are obligations around disclosure that must be defined in advance in the study protocol.
- Third, differential attrition raises equity concerns: if participants from lower socioeconomic or minority ethnic groups are more likely to drop out, the remaining cohort may not represent the broader population, and findings could inadvertently reinforce health inequalities.
- Finally, long-term storage and future use of biological samples and personal data require robust governance frameworks and transparent data access policies to maintain participant trust throughout the study’s lifetime.
References
1. Barrett D, Noble H. What are cohort studies? Evid Based Nurs. 2019;22(4):95–96.
2. Alexander LK, Lopes B, Ricchetti-Masterson K, Yeatts KB. Cohort Studies. ERIC Notebook Series. 2nd ed. Chapel Hill (NC): University of North Carolina at Chapel Hill, Department of Epidemiology; 2015.
3. Setia MS. Methodology Series Module 1: Cohort Studies. Indian J Dermatol. 2016 Jan–Feb;61(1):21–25. doi: 10.4103/0019-5154.174011.
4. Song JW, Chung KC. Observational studies: cohort and case-control studies. Plast Reconstr Surg. 2010 Dec;126(6):2234–2242. doi: 10.1097/PRS.0b013e3181f44abc.
5. Critical Appraisal Skills Programme (CASP). What is a cohort study and why are they important? [Internet]. Oxford: CASP UK; 2023 [cited 2026 Jun 8]. Available from: https://casp-uk.net/what-is-a-cohort-study/
