Incidence and prevalence: How to calculate and report
Incidence and prevalence are the foundational measures of disease frequency in epidemiology. This guide covers what each means, how to calculate them correctly with formulas and worked examples, key differences, reporting best practices, and common pitfalls to avoid.
Introduction
In biomedical research, especially epidemiology, incidence and prevalence are indispensable tools for quantifying how common a disease is within a population. Though both describe disease frequency, they capture fundamentally different information:
- Incidence measures how often new cases arise.
- Prevalence measures how many people currently have the disease.
Using the wrong measure, or reporting either incorrectly, can mislead public health decisions and hinder reproducibility.
This article explains both concepts in depth, provides the formulas and step-by-step calculations researchers need, highlights the most important distinctions, and offers reporting guidance grounded in best practice.
Definitions: what do incidence and prevalence measure?
Incidence refers to the number of new cases of a disease occurring in a defined population over a specified time period.
It captures the risk of developing the disease and is the measure of choice when studying disease causes, risk factors, or the impact of preventive interventions.
Prevalence refers to the total number of existing cases (both new and old) in a population at a given point in time or over a defined period.
It reflects the overall burden of a disease and is particularly informative for planning healthcare resource allocation.
A helpful analogy: if a disease were water flowing through a pipe, incidence would be the rate at which water enters (the flow), and prevalence would be the total amount of water currently in the pipe (the stock).
Types of incidence and prevalence
Two types of incidence
Researchers should be aware that “incidence” is not a single measure; it comes in two distinct forms that are frequently confused:
Cumulative incidence (risk)
CI = New cases during the period ÷ Population at risk at the start of the period
Expressed as a proportion (e.g., 0.05 or 5%). Assumes the entire population is followed for the same period. Best for closed cohorts with complete follow-up.
Incidence rate (density)
IR = New cases ÷ Total person-time at risk
Expressed as cases per person-time unit (e.g., per 1,000 person-years). Accounts for variable follow-up. Essential when participants enter or leave the study at different times.
Person-time is the sum of all time periods during which each participant was at risk. For example, if 200 participants are followed for an average of 3.5 years each, the total person-time at risk is 700 person-years.
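The person-time calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the `FollowUp` record, the `incidence_rate` function, and the five-person cohort are hypothetical names and data invented for this example.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    years_at_risk: float      # enrollment to event, censoring, or study end
    developed_disease: bool   # True if the participant became a new case


def incidence_rate(records, per=1_000):
    """New cases divided by total person-time, scaled to `per` person-years."""
    person_years = sum(r.years_at_risk for r in records)
    if person_years <= 0:
        raise ValueError("total person-time must be positive")
    new_cases = sum(r.developed_disease for r in records)
    return new_cases / person_years * per


# Hypothetical cohort with unequal follow-up (illustrative data only).
cohort = [
    FollowUp(5.0, False),
    FollowUp(2.5, True),    # diagnosed at 2.5 years; contributes no time afterwards
    FollowUp(4.0, False),   # lost to follow-up at 4 years (censored)
    FollowUp(1.5, True),
    FollowUp(5.0, False),
]

total_person_years = sum(r.years_at_risk for r in cohort)
rate = incidence_rate(cohort)
print(f"{total_person_years} person-years, {rate:.1f} cases per 1,000 person-years")
```

Each participant contributes only the time they were actually at risk, so a case stops accruing person-time at diagnosis and a censored participant stops at the point of loss.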
Two types of prevalence
Point prevalence
PP = Existing cases at time T ÷ Total population at time T
A snapshot at a single point in time. Most commonly reported as a percentage or per 1,000/100,000 people.
Period prevalence
PP(period) = Cases present at any time during period ÷ Average population during period
Covers all cases that existed at any point during a defined window (e.g., one year). Includes both pre-existing cases and new ones that arose during the period.
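The difference between the two prevalence types becomes clearer when both are computed from the same records. The sketch below assumes each case is stored as an onset date plus an optional end date (recovery or death). The record layout, function names, and population figures are all hypothetical.

```python
from datetime import date

# Hypothetical case records: (onset date, end date or None if still ongoing).
cases = [
    (date(2022, 3, 1), None),
    (date(2023, 2, 10), date(2023, 5, 1)),
    (date(2023, 8, 20), None),
    (date(2021, 6, 5), date(2022, 11, 30)),
]


def is_case_on(onset, end, day):
    """True if the person had the disease on the given day."""
    return onset <= day and (end is None or end >= day)


def is_case_during(onset, end, start, stop):
    """True if the disease was present at any time between start and stop."""
    return onset <= stop and (end is None or end >= start)


def point_prevalence(cases, population, day):
    return sum(is_case_on(o, e, day) for o, e in cases) / population


def period_prevalence(cases, avg_population, start, stop):
    return sum(is_case_during(o, e, start, stop) for o, e in cases) / avg_population


point = point_prevalence(cases, population=1_000, day=date(2023, 7, 1))
period = period_prevalence(
    cases, avg_population=1_000, start=date(2023, 1, 1), stop=date(2023, 12, 31)
)
print(f"Point prevalence on 2023-07-01: {point:.3f}")
print(f"Period prevalence for 2023:     {period:.3f}")
```

In this toy dataset only one case is active on the snapshot date, while three were active at some point during the year, so the period figure is larger than the point figure.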
Formulas and worked examples
Calculating incidence rate: worked example
A prospective cohort study follows 4,000 diabetes-free adults. Over 5 years, 120 participants develop type 2 diabetes. Due to deaths and drop-outs, the total follow-up time accumulated is 18,500 person-years.
Incidence rate = 120 ÷ 18,500 = 0.00649 cases per person-year
To express per 1,000 person-years: 0.00649 × 1,000 = 6.49 cases per 1,000 person-years
Interpretation: For every 1,000 adults followed for one year, approximately 6.5 new cases of type 2 diabetes arise.
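To see why the person-time denominator matters here, the short Python sketch below recomputes the rate from the figures above and compares it with a naive calculation that assumes all 4,000 participants were followed for the full 5 years. The variable names are illustrative, and the naive figure is shown only as a contrast, not as a recommended estimate.

```python
new_cases = 120
cohort_size = 4_000
study_years = 5
observed_person_years = 18_500  # accounts for deaths and drop-outs

# Rate using the person-time actually accumulated.
rate_person_time = new_cases / observed_person_years * 1_000

# Naive rate: pretends everyone contributed the full 5 years.
naive_person_years = cohort_size * study_years
rate_naive = new_cases / naive_person_years * 1_000

# Crude 5-year proportion; equals cumulative incidence only if follow-up is complete.
crude_proportion = new_cases / cohort_size

print(f"Person-time rate: {rate_person_time:.2f} per 1,000 person-years")
print(f"Naive rate:       {rate_naive:.2f} per 1,000 person-years")
print(f"Crude proportion: {crude_proportion:.1%} over {study_years} years")
```

Because the naive denominator counts time that was never observed, it understates the rate in this example.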
Calculating prevalence: worked example
A national health survey finds that, on the survey date, 14.2 million of a country’s 68 million adults have diagnosed hypertension.
Point prevalence = 14,200,000 ÷ 68,000,000 = 0.209
Expressed as a percentage: 20.9%
Interpretation: Approximately 1 in 5 adults in this population is living with hypertension at this point in time.
The relationship between incidence and prevalence
Incidence and prevalence are mathematically linked through a relationship that is essential to understand when interpreting disease burden data:
Key epidemiological relationship
Prevalence ≈ Incidence Rate × Average Disease Duration
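This formula holds when the population is in a steady state (incidence and average duration are stable over time) and the disease is relatively uncommon, so that nearly everyone is still at risk. Its implications are profound: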
- High incidence + short duration = low prevalence. A disease that is common but quickly resolves (or is rapidly fatal) will have a low prevalence despite a high incidence. Seasonal influenza is a classic example.
- Low incidence + long duration = high prevalence. Chronic conditions like type 2 diabetes or HIV (in the era of antiretroviral therapy) accumulate large numbers of prevalent cases even if new cases each year are modest relative to the total population.
- Life-prolonging treatment increases prevalence. Paradoxically, if a new treatment dramatically reduces mortality from a disease without curing it, prevalence rises even if incidence remains unchanged, because people live longer with the condition.
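A small numerical sketch can make these three scenarios concrete. It relies on the standard steady-state result that prevalence odds, P ÷ (1 − P), equal incidence rate × average duration, which reduces to the simpler product when prevalence is low. The function names and all input values below are hypothetical, chosen only to illustrate the pattern.

```python
def approx_prevalence(incidence_rate, mean_duration):
    """Low-prevalence approximation: P is roughly IR x D."""
    return incidence_rate * mean_duration


def steady_state_prevalence(incidence_rate, mean_duration):
    """Steady-state form: P / (1 - P) = IR x D, solved for P."""
    product = incidence_rate * mean_duration
    return product / (1 + product)


# Hypothetical inputs: rate per person-year, mean duration in years.
scenarios = {
    "acute illness (high incidence, 2-week duration)": (0.10, 2 / 52),
    "chronic disease (low incidence, 20-year duration)": (0.005, 20),
    "same chronic disease, survival extended to 30 years": (0.005, 30),
}

results = {}
for label, (ir, duration) in scenarios.items():
    results[label] = (
        approx_prevalence(ir, duration),
        steady_state_prevalence(ir, duration),
    )
    approx, exact = results[label]
    print(f"{label}: approx {approx:.2%}, steady-state {exact:.2%}")
```

The two formulas agree closely for the acute illness and drift apart as prevalence grows, which is why the simple product is best treated as an approximation for uncommon conditions.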
Incidence vs. prevalence: key differences at a glance
| Feature | Incidence | Prevalence |
| --- | --- | --- |
| What it measures | New cases arising over a period | Existing cases at a point or period |
| Population denominator | Those at risk (disease-free at start) | Total population studied |
| Time dimension | Always requires a time period | Point (snapshot) or period |
| Primary use | Etiology, risk factors, causal inference | Healthcare planning, burden of disease |
| Best study design | Cohort study (prospective or retrospective) | Cross-sectional study |
| Affected by disease duration? | No | Yes; the longer a disease lasts, the higher its prevalence |
| Typical expression | Cases per 1,000 person-years | Percentage or cases per 100,000 |
Incidence, prevalence, and study design
The choice between measuring incidence and prevalence is tightly linked to study design. Using the wrong design for the intended measure is one of the most consequential methodological errors in epidemiological research.
Measuring incidence → use a cohort study
Cohort studies follow disease-free participants forward in time, making them ideal for tracking new case development. Prospective cohort studies are the gold standard for incidence; retrospective cohort studies use existing records. The key requirement is that participants must be at risk (i.e., free of the outcome) at enrollment.
Measuring prevalence → use a cross-sectional study
Cross-sectional studies assess both exposure and outcome simultaneously in a defined population at one point. They cannot distinguish new from existing cases, making them unsuitable for incidence estimation but ideal for quantifying point or period prevalence.
Why biomedical researchers calculate incidence and prevalence
- Quantifying disease burden: Incidence and prevalence provide the empirical foundation for understanding how widely a disease affects a population and how that burden is changing over time. This is essential for prioritising research funding and public health policy.
- Identifying risk factors: By comparing incidence rates across demographic subgroups or exposure categories, researchers can identify genetic, environmental, or behavioural factors associated with elevated risk.
- Monitoring trends: Tracking incidence and prevalence longitudinally allows researchers and public health agencies to detect emerging outbreaks, assess the effectiveness of population-level interventions, and model future disease trajectories.
- Evaluating interventions: Comparing incidence before and after an intervention such as a vaccination programme provides a direct measure of effectiveness. For instance, tracking iron-deficiency anaemia incidence in children before and after a nationwide school nutrition programme can quantify the programme’s impact.
- Allocating healthcare resources: Prevalence data, in particular, informs decisions about how many treatment facilities, healthcare workers, and medicines a health system needs. High-prevalence chronic conditions require very different resource planning than low-prevalence acute diseases.
Best practices in calculating and reporting incidence and prevalence
Because incidence and prevalence data can directly influence public health policy, rigorous analysis and transparent reporting are critical. The following best practices are consistent with reporting standards in major epidemiological journals:
- Clearly define the study population: Provide a comprehensive description including eligibility criteria, relevant demographics (age, sex, geography), and any exclusions. Vague population definitions make reported estimates non-comparable and non-replicable.
- State the time period explicitly: Always report the period over which incidence was measured, or the date(s) at which prevalence was assessed. A prevalence estimate without a reference date is uninterpretable.
- Choose the appropriate denominator: For incidence rates, use person-time at risk (not total headcount) whenever follow-up varies across participants. Using headcount as the denominator when follow-up times differ will systematically under- or over-estimate the true rate.
- Use appropriate units and multipliers: Express incidence rates per a meaningful multiplier (e.g., per 1,000 or 100,000 person-years) appropriate to the disease’s frequency. Rare diseases may require rates per million; common diseases per 1,000. Consistency with prior literature aids comparability.
- Report subgroup estimates where relevant: Disaggregate results by age, sex, ethnicity, or other meaningful strata. Pooled estimates can mask clinically important heterogeneity. For example, a single overall prevalence figure can conceal striking differences between the sexes.
- Always report confidence intervals: A point estimate without a confidence interval conveys no information about precision. Confidence intervals allow readers to assess statistical uncertainty and are required by virtually all major epidemiology journals.
- Justify your prevalence type: Explicitly state whether you are reporting point prevalence or period prevalence, and why that choice was appropriate for your research question.
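As an illustration of the confidence-interval recommendation above, the sketch below computes approximate 95% intervals for an incidence rate and for a prevalence proportion. It is one reasonable approach under stated assumptions, not the only accepted one: the rate interval uses a normal approximation on the log scale (standard error of the log rate = 1 ÷ √cases), and the proportion interval uses the Wilson score method. The function names and the 209-of-1,000 survey sample are hypothetical. Exact or model-based intervals may be preferable for small counts or complex survey designs.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal quantile


def rate_ci(cases, person_time, per=1_000, z=Z95):
    """Incidence rate with a log-scale normal-approximation interval."""
    if cases <= 0 or person_time <= 0:
        raise ValueError("cases and person_time must be positive")
    rate = cases / person_time
    factor = math.exp(z / math.sqrt(cases))  # exp(z * SE of log rate)
    return rate * per, rate / factor * per, rate * factor * per


def wilson_ci(cases, n, z=Z95):
    """Proportion with a Wilson score interval."""
    if n <= 0 or not 0 <= cases <= n:
        raise ValueError("need 0 <= cases <= n and n > 0")
    p = cases / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half, centre + half


# Incidence rate from the diabetes cohort example: 120 cases, 18,500 person-years.
rate, rate_lo, rate_hi = rate_ci(120, 18_500)
print(f"Rate: {rate:.2f} per 1,000 person-years (95% CI {rate_lo:.2f} to {rate_hi:.2f})")

# Hypothetical survey sample: 209 cases among 1,000 sampled adults.
prev, prev_lo, prev_hi = wilson_ci(209, 1_000)
print(f"Prevalence: {prev:.1%} (95% CI {prev_lo:.1%} to {prev_hi:.1%})")
```

Reporting the interval alongside the point estimate, with the method named, lets readers judge precision and reproduce the calculation.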
Common errors and pitfalls to avoid
- Using total population instead of population at risk: The denominator for incidence must exclude people who already have the disease at the start of the observation period. Including prevalent cases inflates the denominator and underestimates incidence.
- Ignoring censored observations: In studies with variable follow-up, participants who drop out or die from other causes are “censored.” Ignoring censoring and using simple headcounts rather than person-time will bias the incidence rate estimate.
- Confusing point and period prevalence: These are distinct measures and should not be treated interchangeably. Reporting period prevalence when your design captured only a snapshot, or vice versa, misrepresents what was actually measured.
- Mismatching the multiplier to disease frequency: Expressing a very rare disease as “0.00003 per person-year” rather than “3 per 100,000 person-years” makes results harder to interpret and compare. Align your multiplier to the scale conventional in your field.
- Overlooking competing risks: In studies of disease-specific incidence, participants who die from a different cause cannot develop the outcome of interest. Failing to account for competing risks overestimates the incidence of the index disease, particularly in older populations.
- Conflating incidence with prevalence in claims or registry data: Administrative datasets require careful definition of a “look-back” period to exclude pre-existing (prevalent) cases. Too short a look-back period misclassifies prevalent cases as incident, inflating incidence estimates.
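The look-back rule in the last pitfall can be sketched as a simple classification step. The example below assumes that each person has a known date from which they were observable in the dataset and a date of first recorded diagnosis. The function name, the labels, the 365-day default, and the member records are all hypothetical, and an appropriate look-back length depends on the disease and the data source.

```python
from datetime import date


def classify_first_diagnosis(observable_from, first_diagnosis, lookback_days=365):
    """Label a first recorded diagnosis as incident only if the person was
    observed without the diagnosis for at least `lookback_days` beforehand."""
    disease_free_days = (first_diagnosis - observable_from).days
    if disease_free_days < 0:
        raise ValueError("diagnosis date precedes the start of observation")
    if disease_free_days >= lookback_days:
        return "incident"
    return "possibly prevalent"


# Hypothetical members: (start of observable enrollment, first recorded diagnosis).
members = {
    "A": (date(2020, 1, 1), date(2022, 6, 15)),
    "B": (date(2022, 1, 1), date(2022, 2, 10)),   # diagnosed 40 days after enrolling
    "C": (date(2021, 3, 1), date(2022, 3, 1)),
}

labels = {key: classify_first_diagnosis(*dates) for key, dates in members.items()}
stricter = {
    key: classify_first_diagnosis(*dates, lookback_days=730)
    for key, dates in members.items()
}
print(labels)
print(stricter)
```

Lengthening the look-back window reclassifies borderline records as possibly prevalent, which lowers the incident count but reduces the risk of counting long-standing disease as new.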
Frequently asked questions
What is the difference between incidence and prevalence?
Incidence counts new cases of a disease arising in a population over a defined time period: it measures risk or rate of disease onset. Prevalence counts all existing cases (new and old) at a given point or period: it measures the overall burden. Incidence is dynamic; prevalence is a cumulative stock shaped by incidence, disease duration, and mortality.
Can prevalence ever be higher than incidence?
Yes, and for chronic diseases it almost always is. Because prevalence accumulates over time (existing cases persist), a disease that is only moderately incident but lasts for years or decades will build up a large prevalent pool. Type 2 diabetes is a clear example: annual incidence is a fraction of a percent, yet prevalence exceeds 10% in many countries.
How do you calculate incidence rate when follow-up times differ between participants?
Use person-time as the denominator. Sum the time each participant spent at risk (from enrollment to either the event, censoring, or end of study), then divide the number of new cases by this total person-time. Express the result per 1,000 or 100,000 person-years as appropriate. This approach correctly handles censored observations and variable entry dates.
Which study design is best for measuring incidence?
Cohort studies (both prospective and retrospective) are the standard design for measuring incidence, because participants are identified as disease-free at baseline and then followed over time to track new case development. Cross-sectional studies are not suitable for estimating incidence, as a single snapshot cannot distinguish newly arising cases from long-standing ones.
What is the prevalence–incidence relationship formula?
In a steady-state population, Prevalence ≈ Incidence Rate × Average Disease Duration. This means a disease’s prevalence is jointly determined by how often it arises (incidence) and how long affected individuals remain diseased (duration). Successful treatments that prolong survival but do not cure the disease increase duration, and therefore increase prevalence even without any change in incidence.




