Infographic: How to analyze count data in research

  • Marisha Fonseca

    An editor at heart and perfectionist by disposition, providing solutions for journals, publishers, and universities in areas like alt-text writing and publication consultancy.

    May 22, 2026


In this article, you’ll learn

  • What is count data?
  • Count Data vs. Other Data Types
  • Where is count data used in research?
  • Related Statistical Concepts Glossary
  • How to analyze count data?
  • Chi-Square Test
  • Fisher’s Exact Test
  • Wilcoxon Rank-Sum Test
  • Negative Binomial Regression
  • Poisson Regression
  • Comparison Table: Choosing the Right Test for Count Data
  • Assumptions and Violations of Count Data Tests
  • Step-by-Step How-To for Each Test
  • Limitations of Each Test
  • Reporting Results of Count Data Analysis in a Manuscript
  • Real-World Examples of Count Data Analysis by Research Field
  • Frequently Asked Questions

What is count data?

Count data refers to numerical values that represent the frequency or occurrence of discrete events. These events often involve the counting of specific entities, such as cells, disease cases, or genetic mutations. In the context of biomedical research, count data can be thought of as the number of times an event of interest occurs within a defined sample or population.

Count Data vs. Other Data Types

Understanding whether your data qualifies as count data is an important step before choosing a statistical test. Count data is often confused with other data types, which can lead to incorrect analysis choices.

What makes data “count data”?

  • It consists of non-negative integers (0, 1, 2, 3, …)
  • It represents the number of times a discrete event occurred
  • There is a meaningful lower bound of zero, but no fixed upper bound in most cases
  • Examples: number of hospital readmissions, number of mutations detected, number of adverse events
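The first three criteria above can be checked mechanically. The following is a minimal sketch, assuming the values arrive as a plain Python list; the function name `looks_like_count_data` and the sample values are hypothetical. Passing the check is necessary but not sufficient: a 1-5 Likert response would also pass, so the study design still has to confirm that each value is a genuine count of events.

```python
from numbers import Integral

def looks_like_count_data(values):
    """Return True if every value is a non-negative whole number."""
    for v in values:
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(v, bool):
            return False
        if isinstance(v, Integral):
            if v < 0:
                return False
        elif isinstance(v, float) and v.is_integer():
            if v < 0:
                return False
        else:
            return False
    return True

readmissions = [0, 2, 1, 0, 5]      # hypothetical counts per patient
body_weight = [70.5, 82.1, 64.0]    # hypothetical continuous measurements

print(looks_like_count_data(readmissions))
print(looks_like_count_data(body_weight))
```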

 

Count Data vs. Continuous Data

| Feature | Count Data | Continuous Data |
| --- | --- | --- |
| Values | Non-negative integers only | Any real number within a range |
| Examples | Number of tumour cells, number of infections | Blood pressure, body weight, temperature |
| Distribution | Poisson or negative binomial | Normal (Gaussian) or other continuous distributions |
| Appropriate tests | Chi-square, Poisson regression, negative binomial regression | t-test, ANOVA, linear regression |
| Can it be negative? | No | Yes (depending on the variable) |

Count Data vs. Ordinal Data

| Feature | Count Data | Ordinal Data |
| --- | --- | --- |
| Values | Actual numerical counts | Ranked categories (e.g., low, medium, high) |
| Examples | Number of relapses | Pain score on a 1-5 scale, disease severity rating |
| Mathematical operations | Addition and subtraction are meaningful | Ranking order is meaningful but differences between levels are not |
| Appropriate tests | Poisson or negative binomial regression | Ordinal logistic regression, Wilcoxon signed-rank test |

 

Count Data vs. Binary/Categorical Data

| Feature | Count Data | Binary/Categorical Data |
| --- | --- | --- |
| Values | Non-negative integers | Fixed categories (yes/no, group A/B/C) |
| Examples | Number of seizures per month | Whether a patient has a disease (yes/no) |
| Appropriate tests | Poisson regression, negative binomial regression | Chi-square, Fisher’s exact test, logistic regression |

 

Common Mistakes in Classifying Data Types

  • Treating count data as continuous and applying a t-test or linear regression, which can violate distributional assumptions and produce biased results
  • Categorising count data into groups (e.g., low/high) unnecessarily, which loses information
  • Confusing a Likert scale response (ordinal) with count data simply because both consist of integers
  • Applying chi-square to data with very small sample sizes instead of Fisher’s exact test

 

Where is count data used in research?

Count data is used throughout biomedical research. In epidemiology, researchers may count the number of individuals with a particular disease in a population. In genomics, scientists often count the occurrences of specific genetic variants, or the sequencing reads that map to each gene as a measure of its expression. In clinical research, counting adverse events or other patient outcomes is common.

Related Statistical Concepts Glossary

Before we dive into analyzing count data, let’s define some of the key terms you’re going to find in this article.

| Term | Definition | Example |
| --- | --- | --- |
| Count data | Non-negative integers representing how many times a discrete event occurred | Number of hospital visits, number of mutations, number of adverse events |
| Discrete distribution | A probability distribution describing outcomes that can only take specific, separate values (usually integers) | Poisson and negative binomial distributions, as opposed to the normal distribution |
| Poisson distribution | Models the number of events occurring in a fixed interval, assuming events occur independently at a constant average rate; the mean and variance are equal (both equal lambda) | Number of new infections per week in a stable epidemic |
| Negative binomial distribution | An extension of the Poisson distribution with an additional dispersion parameter that allows variance to exceed the mean | Used when count data is overdispersed, such as hospital readmissions in a high-risk population |
| Overdispersion | A condition where the variance of a count variable is greater than its mean, violating Poisson regression assumptions | A dataset of patient readmissions where a small number of patients account for a disproportionately high number of events |
| Zero-inflation | A condition where a dataset contains more zero counts than a standard Poisson or negative binomial model would predict | Species count surveys where most sites record no observations of a rare animal |
| Contingency table | A table displaying the frequency distribution of two or more categorical variables simultaneously | A 2×2 table showing disease status (yes/no) by smoking status (yes/no) |
| Non-parametric test | A statistical test that does not assume a specific distribution for the data | Wilcoxon rank-sum test used instead of a t-test when count data is skewed |
| Incidence rate ratio (IRR) | The exponentiated coefficient from a Poisson or negative binomial model; the ratio of the expected count for one group compared to a reference | IRR = 1.45 means the expected count is 45% higher in the exposed group than the reference group |
| Equidispersion | A condition where the mean and variance of a count variable are approximately equal, as assumed by the Poisson distribution | A Poisson-distributed variable with mean = 3 and variance ≈ 3 |
| Degrees of freedom | The number of values free to vary when calculating a statistic; for a chi-square test on a contingency table, df = (rows – 1) × (columns – 1) | A 2×2 contingency table has 1 degree of freedom |
| p-value | The probability of observing a test statistic at least as extreme as the one calculated, assuming the null hypothesis is true | p = 0.03 means that, if there were truly no association, a result at least this extreme would be expected about 3% of the time |
| Confidence interval (CI) | A range of values computed from the sample such that, over repeated sampling, a specified proportion (e.g., 95%) of such intervals would contain the true population parameter | 95% CI [1.12, 1.87] means the data are compatible, at the 95% confidence level, with a true IRR between 1.12 and 1.87 |
| AIC (Akaike Information Criterion) | A measure for comparing statistical models; lower values indicate a better balance between fit and complexity | Used to choose between Poisson and negative binomial regression; the model with lower AIC is preferred |
| Generalised linear model (GLM) | A framework extending linear regression to accommodate non-normal outcome variables, including counts | Poisson and negative binomial regression are both types of GLM |

How to analyze count data?

To analyze count data effectively, biomedical researchers rely on specialized statistical methods such as the chi-square test, Fisher’s exact test, the Wilcoxon rank-sum test, and Poisson or negative binomial regression. These approaches are designed for outcomes that are discrete and non-negative, which makes them suitable for count data. They help researchers identify patterns, relationships, and associations within the data.

Accurate analysis of count data is crucial in biomedicine, as it can provide insights into disease prevalence, the impact of genetic factors, and the effectiveness of treatments. Biomedical researchers may use count data to assess the success of a new drug in reducing the number of disease cases, to identify genes associated with a specific condition, or to monitor the progression of a disease.

Chi-Square Test

The chi-square test is one of the most widely used statistical tests for count data. It assesses whether there is a statistically significant association between two categorical variables by comparing the observed counts in each category to the counts that would be expected if there were no association.

  • Developed by Karl Pearson in 1900
  • Works on a contingency table of observed frequencies
  • Produces a chi-square statistic and a p-value
  • A p-value below 0.05 is conventionally taken to indicate a statistically significant association between the variables

Assumptions:

  • Data must be in the form of counts (frequencies), not percentages or proportions
  • Each observation must be independent
  • Expected frequency in each cell should be at least 5
  • Categories must be mutually exclusive

When NOT to use it:

  • When sample sizes are very small (use Fisher’s exact test instead)
  • When expected cell frequencies fall below 5
  • When the same subjects appear in more than one category

 

Fisher’s Exact Test

Fisher’s exact test is used to examine the association between two categorical variables, particularly when sample sizes are too small for the chi-square test to be reliable. Unlike the chi-square test, it calculates the exact probability of observing the data, rather than relying on an approximation.

  • Developed by Ronald Fisher in 1922
  • Most commonly applied to 2×2 contingency tables
  • Suitable for small samples where expected cell counts fall below 5
  • Computationally intensive for large datasets, which is why chi-square is preferred when samples are adequate

Assumptions:

  • The row and column totals (marginal totals) are fixed
  • Observations are independent
  • Data is categorical

When NOT to use it:

  • When sample sizes are large (chi-square is more appropriate and computationally practical)
  • When comparing more than two groups (extensions exist but are less common)

 

Wilcoxon Rank-Sum Test

The Wilcoxon rank-sum test (also known as the Mann-Whitney U test) is a non-parametric test used to compare count data between two independent groups when the data does not follow a normal distribution. Instead of comparing means, it compares the entire distribution of values between the two groups.

  • Non-parametric, meaning it does not assume that the data follow a specific distribution such as the normal
  • Ranks all observations from both groups combined, then compares the sum of ranks between groups
  • Appropriate when data is skewed or contains outliers
  • Produces a U statistic and a corresponding p-value

Assumptions:

  • The two groups are independent
  • Data is at least ordinal (can be ranked)
  • The distribution of both groups has the same shape (even if not normal)

When NOT to use it:

  • When data is normally distributed (a t-test would be more powerful)
  • When comparing more than two groups (use Kruskal-Wallis test instead)
  • When data is paired (use Wilcoxon signed-rank test instead)

 

Negative Binomial Regression

Negative binomial regression is an extension of Poisson regression designed to handle overdispersed count data, that is, data where the variance is greater than the mean. This situation is common in real-world biomedical and epidemiological datasets.

  • An extension of the generalized linear model (GLM) framework
  • Adds a dispersion parameter to account for extra variability in the data
  • Produces incidence rate ratios (IRR) or regression coefficients depending on how results are reported
  • More flexible than Poisson regression because it does not assume the mean and variance are equal
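The role of the dispersion parameter can be written out explicitly. The following assumes the common "NB2" parameterisation (the article does not say which form it has in mind), with μ the expected count and α the dispersion parameter; both symbols are notation introduced here for illustration.

```latex
\operatorname{Var}(Y) = \mu + \alpha\,\mu^{2}, \qquad \alpha \ge 0
```

As α approaches 0 the variance approaches the mean and the model reduces to Poisson regression. R’s glm.nb() reports the same quantity as theta = 1/α, so a small theta corresponds to strong overdispersion.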

Assumptions:

  • The outcome is a non-negative count variable
  • Observations are independent
  • There is overdispersion in the data (variance > mean)
  • The log of the expected count is a linear function of the predictor variables

When NOT to use it:

  • When data is not overdispersed (Poisson regression is more appropriate)
  • When there is a very high proportion of zero counts (consider zero-inflated negative binomial regression)

 

Poisson Regression

Poisson regression is a type of generalized linear model used to model count data that follows a Poisson distribution. It is used to examine the relationship between one or more predictor variables and a count outcome, and is particularly suitable for studying rates and frequencies.

  • Based on the Poisson distribution, in which events occur independently and at a constant average rate
  • The model estimates the log of the expected count as a linear function of predictors
  • Coefficients are typically reported as incidence rate ratios (IRR) after exponentiating
  • Can include an offset term to account for differences in exposure time or population size
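The points above can be summarised in a single model equation. This is a sketch in notation introduced here for illustration: Y_i is the count for observation i, x_{i1}, ..., x_{ik} are its predictors, and t_i is its exposure (time at risk or population size), which enters as an offset with its coefficient fixed at 1.

```latex
\log \mathbb{E}[Y_i \mid x_i] = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \log(t_i),
\qquad \mathrm{IRR}_j = e^{\beta_j}
```

When every observation has the same exposure, the log(t_i) term is a constant that is absorbed into the intercept, so the offset can be dropped.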

Assumptions:

  • The outcome is a non-negative count
  • Events occur independently
  • The mean and variance of the count outcome are approximately equal (equidispersion)
  • The log of the expected count changes linearly with predictors

When NOT to use it:

  • When variance is much greater than the mean (overdispersion); use negative binomial regression instead
  • When the data has a large number of zero counts; use zero-inflated Poisson or zero-inflated negative binomial models instead
  • When the outcome is a proportion or continuous variable

 

Comparison Table: Choosing the Right Test for Count Data

| Test | Data Type | Sample Size | Key Assumption | When NOT to Use | Common Software Functions |
| --- | --- | --- | --- | --- | --- |
| Chi-Square Test | Two categorical variables | Large (expected counts ≥ 5 per cell) | Independence of observations; expected cell count ≥ 5 | Small samples; expected counts < 5 | R: chisq.test() / Python: scipy.stats.chi2_contingency / SPSS: Crosstabs |
| Fisher’s Exact Test | Two categorical variables (typically 2×2) | Small | Fixed marginal totals; independent observations | Large samples (computationally impractical) | R: fisher.test() / Python: scipy.stats.fisher_exact / SPSS: Crosstabs (exact option) |
| Wilcoxon Rank-Sum Test | Continuous or count outcome; two independent groups | Any | Data can be ranked; same shape of distribution in both groups | Paired data; more than two groups; normally distributed data | R: wilcox.test() / Python: scipy.stats.mannwhitneyu / SPSS: Nonparametric Tests |
| Negative Binomial Regression | Count outcome with overdispersion | Moderate to large | Overdispersion (variance > mean); independent observations | Data is not overdispersed; excessive zeros | R: glm.nb() in MASS / Python: statsmodels NegativeBinomial / SPSS: Generalized Linear Models |
| Poisson Regression | Count outcome | Moderate to large | Equidispersion (mean = variance); independent events | Overdispersed data; excessive zeros | R: glm(…, family=poisson) / Python: statsmodels Poisson / SPSS: Generalized Linear Models |

 

Assumptions and Violations of Count Data Tests

Before running any statistical test for count data, researchers should verify that the key assumptions of the chosen test are met. Violations of these assumptions can lead to incorrect conclusions.

 

How to Test for Overdispersion (Poisson vs. Negative Binomial)

Overdispersion occurs when the variance in your count data is greater than the mean. Poisson regression assumes they are equal. If overdispersion is present and ignored, standard errors will be underestimated and p-values will be misleadingly small.

Ways to detect overdispersion:

  • Compare the mean and variance of your count variable as a first check; a variance substantially larger than the mean is a warning sign
  • Fit a Poisson regression model and examine the ratio of the residual deviance to the residual degrees of freedom; a value much greater than 1 suggests overdispersion
  • Use a formal dispersion test in R: dispersiontest() from the AER package
  • Fit both a Poisson and a negative binomial model and compare them using the Akaike Information Criterion (AIC); a lower AIC for the negative binomial model suggests overdispersion

What to do if overdispersion is detected:

  • Switch from Poisson regression to negative binomial regression
  • Alternatively, use quasi-Poisson regression, which adjusts standard errors without requiring a fully specified overdispersion model
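The first check in the list above (comparing the mean and variance) takes only a few lines. This is a minimal sketch using the Python standard library; the function name `dispersion_index` and the readmission counts are hypothetical. It looks at the raw outcome only and ignores covariates, so it is a screening step rather than a substitute for the model-based checks.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 under a Poisson model)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the index is undefined")
    return variance(counts) / m

# Hypothetical readmission counts: most patients have few, a handful have many
readmissions = [0, 0, 1, 0, 2, 0, 1, 9, 0, 12]
ratio = dispersion_index(readmissions)
print(f"variance/mean = {ratio:.2f}")
```

A ratio well above 1, as in this made-up sample, is the warning sign described above.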

 

What to Do When Count Data Has Excess Zeros

In many biomedical datasets, the number of zero counts is higher than what a Poisson or negative binomial distribution would predict. This is called zero-inflation.

Common scenarios:

  • Counting the number of adverse events in a low-risk population where most patients experience none
  • Counting mutations in a sample where many specimens have no mutations at all

How to detect zero-inflation:

  • Compare the observed proportion of zeros in your data to the proportion predicted by a fitted Poisson or negative binomial model
  • Use the rootogram (a graphical tool in R via the countreg or vcd package) to visualise the fit
  • Apply a formal test such as the Vuong test to compare a standard model against a zero-inflated alternative
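The first detection step can be sketched for the simplest case, an intercept-only Poisson model, where the predicted proportion of zeros is exp(-mean). The function name and the counts are hypothetical; with covariates in the model, the predicted proportion would instead be averaged over the fitted values.

```python
from math import exp
from statistics import mean

def zero_proportions(counts):
    """Return (observed, Poisson-expected) proportion of zeros."""
    observed = sum(1 for c in counts if c == 0) / len(counts)
    # P(Y = 0) for a Poisson distribution whose rate is the sample mean
    expected = exp(-mean(counts))
    return observed, expected

# Hypothetical adverse-event counts in a low-risk population
counts = [0, 0, 0, 0, 0, 0, 3, 4, 0, 5, 0, 6]
obs_zero, exp_zero = zero_proportions(counts)
print(f"observed zeros: {obs_zero:.2f}, Poisson-expected zeros: {exp_zero:.2f}")
```

An observed proportion far above the expected one suggests that a zero-inflated or hurdle model is worth considering.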

What to do:

  • Use a zero-inflated Poisson (ZIP) model if the base count process follows a Poisson distribution
  • Use a zero-inflated negative binomial (ZINB) model if there is also overdispersion
  • Use a hurdle model if the process generating zeros is conceptually distinct from the process generating non-zero counts

 

Normality Testing for the Wilcoxon Rank-Sum Test

The Wilcoxon rank-sum test is used when the normality assumption of a t-test cannot be met. Before deciding between the two, normality should be formally assessed.

Ways to check normality:

  • Visual methods: Q-Q plots and histograms are the most practical first step
  • Shapiro-Wilk test: recommended for small to moderate sample sizes (R: shapiro.test())
  • Kolmogorov-Smirnov test: more appropriate for larger samples
  • Anderson-Darling test: generally considered more powerful than Kolmogorov-Smirnov

Interpreting results:

  • A statistically significant result from a normality test (p < 0.05) suggests that the normality assumption is violated and that the Wilcoxon rank-sum test may be more appropriate
  • In large samples, normality tests are very sensitive and may flag minor, inconsequential deviations; visual inspection should accompany formal testing

 

Verifying Independence of Observations

All five tests covered on this page assume that observations are independent. Violation of this assumption is one of the most common errors in applied research.

  • Independence is violated when the same subject is measured more than once (repeated measures), when patients are clustered within hospitals or clinics, or when family members are included as separate observations
  • If observations are paired or matched, use paired equivalents such as the McNemar test (instead of chi-square) or the Wilcoxon signed-rank test (instead of Wilcoxon rank-sum)
  • If observations are clustered, consider mixed-effects models or generalised estimating equations (GEE)
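As an illustration of the paired alternative mentioned above, the McNemar test uses only the two discordant cells of a paired 2×2 table. This is a minimal sketch without a continuity correction; the argument names `b` and `c` and the counts are hypothetical. The p-value uses the fact that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x / 2)).

```python
from math import erfc, sqrt

def mcnemar_test(b, c):
    """McNemar chi-square (no continuity correction) from the two discordant counts."""
    if b + c == 0:
        raise ValueError("no discordant pairs")
    chi2 = (b - c) ** 2 / (b + c)
    p = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

# Hypothetical: 15 pairs changed from "yes" to "no", 5 changed from "no" to "yes"
chi2, p = mcnemar_test(b=15, c=5)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

When the number of discordant pairs is small, an exact binomial version of the test is usually preferred over this approximation.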

 

Step-by-Step How-To for Each Test

 

How to run a Chi-Square Test

  • Step 1: Organise your data into a contingency table showing the counts of each combination of categories.
  • Step 2: Calculate the expected frequency for each cell using the formula: Expected = (Row total x Column total) / Grand total
  • Step 3: Verify that all expected frequencies are at least 5. If not, use Fisher’s exact test.
  • Step 4: Run the test.

In R:

    • chisq.test(table(data$variable1, data$variable2))

In Python:

    • from scipy.stats import chi2_contingency
    • chi2, p, dof, expected = chi2_contingency(contingency_table)
  • Step 5: Interpret the output.
    • The chi-square statistic measures how far observed counts deviate from expected counts
    • The p-value tells you whether the association is statistically significant
    • Report degrees of freedom, chi-square value, and p-value: e.g., χ²(1) = 4.23, p = 0.04
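To make Steps 2 to 4 concrete, here is a minimal standard-library sketch that computes the expected frequencies, the Pearson chi-square statistic, and the p-value for a 2×2 table. The function name and the observed counts are hypothetical. No continuity correction is applied, so the result will differ slightly from the defaults of chisq.test() and chi2_contingency(), both of which apply Yates’ correction to 2×2 tables unless told otherwise.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected = (row total x column total) / grand total
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    if min(min(row) for row in expected) < 5:
        raise ValueError("expected count below 5; consider Fisher's exact test")
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    p = erfc(sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p, expected

observed = [[30, 20],   # exposed:   event, no event (hypothetical)
            [18, 32]]   # unexposed: event, no event (hypothetical)
stat, p, expected = chi_square_2x2(observed)
print(f"chi-square(1) = {stat:.2f}, p = {p:.3f}")
```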

 

How to run Fisher’s Exact Test

  • Step 1: Organise your data into a 2×2 contingency table.
  • Step 2: Confirm that sample sizes are small or that expected cell counts fall below 5.
  • Step 3: Run the test.

In R:

    • fisher.test(table(data$variable1, data$variable2))

In Python:

    • from scipy.stats import fisher_exact
    • oddsratio, pvalue = fisher_exact(contingency_table)
  • Step 4: Interpret the output.
    • The odds ratio describes the strength and direction of the association
    • The p-value indicates statistical significance
    • Report as: Fisher’s exact test, p = 0.03, OR = 2.5
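What "exact" means here can be shown directly: with the margins fixed, the top-left cell follows a hypergeometric distribution, and the two-sided p-value sums the probabilities of all tables that are no more likely than the observed one. The sketch below uses only the standard library; the function name and the table are hypothetical. It returns the simple sample odds ratio (a·d)/(b·c), which is what scipy.stats.fisher_exact reports by default, whereas R’s fisher.test() reports a conditional maximum likelihood estimate that can differ.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value and sample odds ratio for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one
    # (the small tolerance guards against floating-point ties)
    p_value = sum(prob(x) for x in range(lo, hi + 1)
                  if prob(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)

# Hypothetical small pilot study
odds_ratio, p_value = fisher_exact_2x2([[8, 2], [1, 5]])
print(f"OR = {odds_ratio:.1f}, p = {p_value:.3f}")
```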

 

How to run the Wilcoxon Rank-Sum Test

  • Step 1: Confirm that your data consists of two independent groups and that the normality assumption is not met.
  • Step 2: Run the test.

In R:

    • wilcox.test(outcome ~ group, data = data)

In Python:

    • from scipy.stats import mannwhitneyu
    • stat, p = mannwhitneyu(group1, group2, alternative='two-sided')
  • Step 3: Interpret the output.
    • The W statistic (or U statistic) reflects the difference in ranks between the two groups
    • A significant p-value indicates the distributions of the two groups differ
    • Report as: W = 345, p = 0.02
  • Step 4: Consider reporting the median and interquartile range (IQR) for each group alongside the test result, as these are more informative than means for non-normal data.
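The ranking idea behind the statistic can be sketched by hand. The function below pools both groups, assigns average ranks to ties (common in count data), and returns U for the first group; the function name and the abundance counts are hypothetical, and no p-value is computed. This U should correspond to the W reported by wilcox.test() and, in recent SciPy versions, to the statistic returned by mannwhitneyu for the first sample.

```python
def mann_whitney_u(group1, group2):
    """U statistic for group1 using midranks for ties (no p-value)."""
    combined = sorted(group1 + group2)
    # Average (mid) rank for each distinct value, ranks starting at 1
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    rank_sum = sum(ranks[v] for v in group1)
    n1 = len(group1)
    return rank_sum - n1 * (n1 + 1) / 2

habitat_a = [0, 1, 1, 3, 12]    # hypothetical insect counts
habitat_b = [2, 4, 5, 7, 30]
u1 = mann_whitney_u(habitat_a, habitat_b)
print(f"U = {u1}")
```

U counts the pairs in which a value from the first group exceeds a value from the second (ties count one half), so the two groups’ U values always add up to n1 × n2.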

 

How to run Negative Binomial Regression

  • Step 1: Confirm your outcome is a count variable and that overdispersion is present.
  • Step 2: Fit the model.

In R:

    • library(MASS)
    • model <- glm.nb(outcome ~ predictor1 + predictor2, data = data)
    • summary(model)

In Python:

    • import statsmodels.api as sm
    • X = sm.add_constant(X)  # add an intercept column; statsmodels does not add one automatically
    • model = sm.NegativeBinomial(y, X).fit()
    • print(model.summary())
  • Step 3: Interpret the output.
    • Exponentiate the coefficients to obtain incidence rate ratios (IRR): exp(coef)
    • An IRR greater than 1 indicates an increase in the expected count; less than 1 indicates a decrease
    • Report as: IRR = 1.45, 95% CI [1.12, 1.87], p = 0.005
  • Step 4: Check model fit by comparing AIC with a Poisson model and examining residual plots.
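The exponentiation in Step 3 applies to the confidence limits as well as the point estimate: the Wald interval is built on the log scale and then transformed. A minimal sketch, assuming a coefficient and standard error taken from a fitted model summary; the numbers are hypothetical and were chosen to land close to the reporting example above.

```python
from math import exp

def irr_with_ci(coef, se, z=1.96):
    """Exponentiate a log-scale coefficient and its Wald interval limits."""
    return exp(coef), exp(coef - z * se), exp(coef + z * se)

# Hypothetical coefficient and standard error from a fitted count model
irr, lower, upper = irr_with_ci(coef=0.3716, se=0.1308)
print(f"IRR = {irr:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Because the interval is symmetric on the log scale, it is asymmetric around the IRR itself; exponentiating the standard error directly would not give a valid interval.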

 

How to run Poisson Regression

  • Step 1: Confirm your outcome is a count variable and that the mean and variance are approximately equal.
  • Step 2: Fit the model.

In R:

    • model <- glm(outcome ~ predictor1 + predictor2, data = data, family = poisson)
    • summary(model)

In Python:

    • import statsmodels.api as sm
    • X = sm.add_constant(X)  # add an intercept column; statsmodels does not add one automatically
    • model = sm.Poisson(y, X).fit()
    • print(model.summary())
  • Step 3: If comparing rates across groups with different observation periods, include an offset:

In R:

    • model <- glm(outcome ~ predictor1 + offset(log(exposure)), data = data, family = poisson)
  • Step 4: Interpret the output.
    • Exponentiate coefficients to get IRRs
    • Report as: IRR = 0.78, 95% CI [0.65, 0.94], p = 0.008
  • Step 5: Test for overdispersion using dispersiontest() from the AER package. If overdispersion is detected, switch to negative binomial regression.
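What the offset in Step 3 achieves can be seen in the simplest case. For a Poisson model with a single two-level predictor and an offset of log(exposure), the exponentiated coefficient equals the ratio of the two crude event rates, which can be computed by hand as a sanity check on the fitted IRR. The function name and the figures below are hypothetical.

```python
def rate_ratio(events_exposed, time_exposed, events_ref, time_ref):
    """Ratio of crude event rates (events per unit of exposure)."""
    rate_exposed = events_exposed / time_exposed
    rate_ref = events_ref / time_ref
    return rate_exposed / rate_ref

# Hypothetical: 30 events over 200 person-years vs 45 events over 150 person-years
irr = rate_ratio(30, 200, 45, 150)
print(f"crude IRR = {irr:.2f}")
```

Comparing the raw counts alone (30 vs 45) would understate the difference here, because the reference group was observed for less time.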

 

Limitations of Each Test

| Test | Key Limitations |
| --- | --- |
| Chi-Square Test | Requires large sample sizes; unreliable when expected cell counts are less than 5; does not provide a measure of the strength of association on its own; sensitive to large sample sizes (may flag trivial associations as significant) |
| Fisher’s Exact Test | Computationally intensive for tables larger than 2×2 or for large datasets; assumes fixed marginal totals, which may not reflect the study design; does not generalise easily to multiple groups or covariates |
| Wilcoxon Rank-Sum Test | Less statistical power than a t-test when normality assumptions are actually met; only compares two groups (Kruskal-Wallis is needed for three or more); does not model the relationship between predictors and outcome; result can be difficult to interpret in practical terms |
| Negative Binomial Regression | Requires a larger sample size than simpler tests to estimate the dispersion parameter reliably; more complex to interpret and report than chi-square or Fisher’s; may still be inadequate if there is extreme zero-inflation |
| Poisson Regression | Strict equidispersion assumption is rarely met in practice; underestimates standard errors if overdispersion is ignored; does not handle excess zeros well without modification; regression coefficients require exponentiation to be interpretable as rate ratios |

 

Additional cross-cutting limitations to be aware of:

  • None of these tests account for confounding variables unless a regression model is used; chi-square and Fisher’s exact test in particular only assess the bivariate relationship between two variables
  • All tests assume independence of observations; clustering, repeated measures, or matched designs require different or additional analytical approaches
  • Statistical significance does not equal clinical or practical significance; a large sample can produce a significant p-value for a trivially small association
  • These tests do not distinguish between correlation and causation

 

Reporting Results of Count Data Analysis in a Manuscript

What to Include in the Methods Section

The Methods section should give readers enough information to evaluate and replicate the analysis. For count data, this means:

  • State clearly that the outcome variable is a count (e.g., “the primary outcome was the number of hospital readmissions per patient over 12 months”)
  • Specify the statistical test or model used and the justification for choosing it (e.g., “negative binomial regression was used to account for overdispersion in the outcome variable, confirmed by a dispersion test”)
  • Report how you checked key assumptions (normality, independence, overdispersion, zero-inflation)
  • State the significance threshold used (typically p < 0.05)
  • Name the statistical software and version (e.g., “All analyses were performed in R version 4.3.1 using the MASS package”)

 

How to Present Results

Regardless of which test is used, results should include:

  • The name of the statistical test used
  • The test statistic and its degrees of freedom (where applicable)
  • The exact p-value (not just “p < 0.05”)
  • Effect size or measure of association (chi-square: Cramér’s V; regression models: IRR with 95% confidence interval)
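For the chi-square effect size named above, Cramér’s V can be derived from quantities that are already being reported. A minimal sketch, assuming an uncorrected chi-square statistic; the function name is hypothetical, and the sample size of 100 is made up to accompany the χ²(1) = 4.23 example used elsewhere in this article.

```python
from math import sqrt

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V from a chi-square statistic, total sample size and table shape."""
    k = min(n_rows, n_cols) - 1
    if n <= 0 or k <= 0:
        raise ValueError("need n > 0 and at least a 2x2 table")
    return sqrt(chi2 / (n * k))

v = cramers_v(chi2=4.23, n=100, n_rows=2, n_cols=2)
print(f"Cramér's V = {v:.2f}")
```

V ranges from 0 (no association) to 1 (perfect association), which lets readers judge the strength of an association separately from its p-value.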

 

Descriptive statistics table (to present before the test results):

| Variable | Group A (n = XX) | Group B (n = XX) |
| --- | --- | --- |
| Count outcome, median (IQR) | XX (XX-XX) | XX (XX-XX) |
| Count outcome, mean (SD) | XX (XX) | XX (XX) |
| Proportion with zero count, n (%) | XX (XX%) | XX (XX%) |

 

Regression results table (for Poisson or negative binomial models):

Predictor IRR 95% CI p-value
Predictor 1 1.45 1.12 to 1.87 0.005
Predictor 2 0.78 0.61 to 0.99 0.042
Predictor 3 (reference) 1.00 — —

 

Chi-square or Fisher’s exact test results table:

| Outcome | Group A, n (%) | Group B, n (%) | Test statistic | p-value |
| --- | --- | --- | --- | --- |
| Event present | XX (XX%) | XX (XX%) | χ²(1) = 4.23 | 0.040 |
| Event absent | XX (XX%) | XX (XX%) | | |

 

Common Reviewer Feedback on Statistical Reporting of Count Data

Reviewers of biomedical manuscripts frequently raise the following issues when count data is analysed. Addressing these proactively will improve the chances of acceptance:

  • “The authors used a t-test on count data without checking distributional assumptions.”
    • Use and justify appropriate count data methods instead of defaulting to tests designed for continuous normally distributed data.
  • “Poisson regression was used but overdispersion was not assessed.”
    • Always test for overdispersion when using Poisson regression and report the result.
  • “The authors report p-values only; effect sizes and confidence intervals should be included.”
    • For regression models, always report IRR with 95% CI. For chi-square, include Cramér’s V or odds ratio.
  • “It is unclear why Fisher’s exact test was chosen over chi-square.”
    • State sample size or expected cell count as justification.
  • “The software used for analysis is not mentioned.”
    • Always name the software, version, and relevant packages.
  • “Confounders were not adjusted for in the analysis.”
    • If applicable, use multivariable regression models to adjust for known confounders rather than relying on bivariate tests alone.

 

 

Real-World Examples of Count Data Analysis by Research Field

Epidemiology

Count data is central to epidemiological research, where the frequency of disease occurrence is the primary outcome of interest.

  • Chi-square test: comparing the number of diabetes cases across different age groups in a cross-sectional survey
  • Poisson regression: modelling the number of new tuberculosis cases per 100,000 population as a function of socioeconomic indicators across regions
  • Negative binomial regression: examining the number of malaria episodes per child per year in a longitudinal cohort, where some children have many episodes and overdispersion is expected

 

Genomics and RNA-seq Analysis

Count data appears naturally in genomics, where the number of sequencing reads mapped to each gene or genomic feature must be analysed.

  • Negative binomial regression: the standard approach for differential gene expression analysis in RNA-seq data; used in tools such as DESeq2 and edgeR, both of which model read counts using a negative binomial distribution
  • Fisher’s exact test: testing whether a particular gene variant is more frequent in cases than controls in a genome-wide association study (GWAS) with small subgroup sizes
  • Wilcoxon rank-sum test: comparing gene expression counts between two treatment conditions when distributional assumptions are unclear

 

Clinical Trials

In clinical research, count outcomes arise frequently when measuring the frequency of events experienced by participants.

  • Chi-square test: comparing the proportion of patients who experienced at least one adverse event across treatment and placebo groups
  • Fisher’s exact test: assessing whether a rare serious adverse event occurred more frequently in one treatment arm in a small pilot trial
  • Negative binomial regression: modelling the number of disease exacerbations per patient over a 12-month follow-up period, accounting for patients with unusually high exacerbation rates

 

Public Health

Public health research relies on count data to monitor population-level outcomes and guide policy decisions.

  • Poisson regression: estimating the expected number of road traffic fatalities per million vehicle miles travelled as a function of speed limits and seatbelt legislation
  • Negative binomial regression: modelling emergency department visits per patient per year in a population with high variability in healthcare utilisation
  • Chi-square test: comparing vaccination uptake counts across different demographic groups to identify disparities

 

Ecology

In ecology, count data arises when researchers record the number of individuals of a species observed in a defined area or time window.

  • Poisson regression: modelling the number of bird species observed at a survey site as a function of habitat type and vegetation density
  • Zero-inflated Poisson or negative binomial regression: commonly needed in ecology because many survey sites record zero observations, particularly for rare or elusive species
  • Wilcoxon rank-sum test: comparing insect abundance counts between two habitat types when the data is highly skewed due to a few very high-density sites

 

Frequently Asked Questions

What is count data in statistics?

Count data refers to numerical data that represents the number of times a discrete event occurs within a defined unit of observation, such as a patient, a time period, or a geographic area. Count data values are always non-negative integers (0, 1, 2, 3, and so on). Examples include the number of hospital readmissions per patient, the number of gene mutations detected in a sample, or the number of adverse drug reactions reported in a clinical trial.

 

What is the difference between Poisson and negative binomial regression?

Both are used to model count outcomes, but they differ in a key assumption:

Feature | Poisson Regression | Negative Binomial Regression
Distributional assumption | Mean equals variance (equidispersion) | Variance exceeds mean (overdispersion)
Dispersion parameter | None | Estimated from data
Use case | Rare events with stable rates | Real-world data with extra variability
Risk of misuse | Underestimates standard errors if overdispersion is present | Unnecessary complexity if data is not overdispersed

If you are unsure which to use, fit both models and compare AIC values. Choose the model with the lower AIC.
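As a rough illustration of the AIC comparison, the sketch below fits intercept-only Poisson and negative binomial models to a small set of hypothetical counts using only the Python standard library. The data, the helper names (poisson_loglik, negbin_loglik), and the grid search over the dispersion parameter are assumptions made for this example. A real analysis with predictors would normally use a regression routine in R or statsmodels instead.

```python
from math import lgamma, log

def poisson_loglik(y, mu):
    """Log-likelihood of counts y under a Poisson distribution with mean mu."""
    return sum(k * log(mu) - mu - lgamma(k + 1) for k in y)

def negbin_loglik(y, mu, alpha):
    """Log-likelihood under a negative binomial with mean mu and
    variance mu + alpha * mu**2 (alpha is the dispersion parameter)."""
    r = 1.0 / alpha
    return sum(
        lgamma(k + r) - lgamma(r) - lgamma(k + 1)
        + r * log(r / (r + mu)) + k * log(mu / (r + mu))
        for k in y
    )

# Hypothetical counts, e.g. events per participant (not data from the article)
counts = [0, 0, 1, 1, 2, 2, 3, 5, 9, 17]

# With no predictors, the fitted mean of both models is the sample mean
mu_hat = sum(counts) / len(counts)

# Crude grid search for the dispersion parameter (1e-4 to 1e2, log-spaced)
alpha_grid = [10 ** (e / 100) for e in range(-400, 201)]
alpha_hat = max(alpha_grid, key=lambda a: negbin_loglik(counts, mu_hat, a))

# AIC = 2 * (number of estimated parameters) - 2 * log-likelihood
aic_poisson = 2 * 1 - 2 * poisson_loglik(counts, mu_hat)
aic_negbin = 2 * 2 - 2 * negbin_loglik(counts, mu_hat, alpha_hat)

preferred = "negative binomial" if aic_negbin < aic_poisson else "Poisson"
print(f"AIC Poisson = {aic_poisson:.1f}, AIC negative binomial = {aic_negbin:.1f}")
print("Lower AIC:", preferred)
```

For these strongly overdispersed counts, the negative binomial model has the lower AIC even after the penalty for its extra dispersion parameter.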

 

When should I use Fisher’s exact test instead of chi-square?

Use Fisher’s exact test when:

  • Your total sample size is small (generally fewer than 20 observations)
  • Any expected cell count in your contingency table is less than 5
  • You are working with a 2×2 table and cannot guarantee large expected frequencies

Use chi-square when:

  • All expected cell counts are 5 or greater
  • Your sample size is large enough to rely on the chi-square approximation
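The expected-count rule can be checked directly before choosing a test. The following is a minimal sketch in plain Python, assuming a hypothetical 2×2 table from a small pilot trial. The function names expected_counts and fisher_exact_2x2 are illustrative, not part of any library, and the two-sided p-value sums the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb

def expected_counts(table):
    """Expected cell counts under independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Add up every table that is at least as unlikely as the observed one
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-7)
    )

# Hypothetical data: rows = treatment arms, columns = adverse event yes / no
table = [[3, 7], [1, 9]]

expected = expected_counts(table)  # [[2.0, 8.0], [2.0, 8.0]]
use_fisher = any(cell < 5 for row in expected for cell in row)
p_value = fisher_exact_2x2(table) if use_fisher else None

print("Expected counts:", expected)
print("Use Fisher's exact test:", use_fisher)
if p_value is not None:
    print(f"Fisher's exact p = {p_value:.3f}")  # about 0.582 for this table
```

Here two expected counts are below 5, so the sketch selects Fisher's exact test rather than the chi-square approximation.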

 

Can I use a t-test on count data?

A t-test is generally not the best choice for count data because:

  • Count data is often skewed, particularly when counts are low, which violates the normality assumption of the t-test
  • Count data cannot be negative, while the normal distribution extends to negative infinity
  • The variance of count data often increases with the mean, which the t-test does not account for

Better alternatives:

  • For comparing two groups: Wilcoxon rank-sum test (non-parametric) or Poisson/negative binomial regression
  • Exception: with large samples where counts are high and approximately symmetric, the t-test may perform reasonably well in practice, but regression-based approaches are still preferred

 

What is overdispersion and how do I detect it?

Overdispersion occurs when the variance of a count outcome is greater than its mean (in a regression model, greater than the variance the model predicts for a given set of predictor values). Poisson regression assumes the variance and mean are equal, so ignoring overdispersion leads to underestimated standard errors, artificially narrow confidence intervals, and inflated type I error rates (false positives).

How to detect overdispersion:

  • Calculate the mean and variance of your count variable; a variance substantially larger than the mean is a warning sign
  • Fit a Poisson regression and compute the ratio of the residual deviance (or the Pearson chi-square statistic) to the residual degrees of freedom; values well above 1.0 indicate overdispersion
  • Run a formal test, such as the dispersiontest() function in R (AER package)
  • Compare AIC of a Poisson model versus a negative binomial model
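The first check in this list takes only a few lines. The sketch below uses Python's statistics module on hypothetical counts; the variable names and data are assumptions for illustration. With no predictors in the model, the variance-to-mean ratio computed here equals the Pearson chi-square statistic divided by its degrees of freedom (n − 1), so it serves as a simple stand-in for the regression-based ratio.

```python
from statistics import mean, variance

# Hypothetical counts, e.g. episodes per child per year (not data from the article)
counts = [0, 0, 1, 1, 2, 2, 3, 5, 9, 17]

m = mean(counts)          # sample mean
v = variance(counts)      # sample variance (n - 1 denominator)
dispersion_index = v / m  # close to 1 under a Poisson assumption

print(f"mean = {m}, variance = {v:.2f}, variance/mean = {dispersion_index:.2f}")

# A ratio far above 1 is a warning sign, not a formal test
if dispersion_index > 1:
    print("Variance exceeds the mean: consider checking for overdispersion formally.")
```

For these counts the mean is 4 and the variance is about 28.2, giving a ratio of about 7, which would prompt a formal test or a negative binomial model.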

 

How do I report count data results in a research paper?

  • Descriptive statistics: report median and interquartile range (IQR) for non-normally distributed count variables, or mean and standard deviation if the distribution is approximately symmetric
  • For chi-square: report χ²(df) = value, p = value, and Cramér’s V for effect size
  • For Fisher’s exact test: report the p-value and odds ratio with 95% confidence interval
  • For Wilcoxon rank-sum: report W or U statistic, p-value, and medians for each group
  • For regression models: report incidence rate ratios (IRR) with 95% confidence intervals and p-values for each predictor
  • Always state the statistical software used (e.g., R version 4.3.1, Python 3.11 with statsmodels 0.14)
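Poisson and negative binomial regression software usually returns coefficients on the log scale, so incidence rate ratios are obtained by exponentiating them. The following sketch shows the conversion for a single predictor. The coefficient and standard error are hypothetical values chosen for illustration, and the interval is a Wald-type 95% confidence interval using the normal critical value 1.96.

```python
from math import exp

# Hypothetical log-scale coefficient and standard error for one predictor
# (illustrative values, not results from the article)
coef, std_err = 0.262, 0.081

irr = exp(coef)                        # incidence rate ratio
ci_low = exp(coef - 1.96 * std_err)    # lower 95% limit
ci_high = exp(coef + 1.96 * std_err)   # upper 95% limit

report = f"IRR = {irr:.2f} (95% CI: {ci_low:.2f} to {ci_high:.2f})"
print(report)  # IRR = 1.30 (95% CI: 1.11 to 1.52)
```

An IRR of 1.30 would be read as a 30% higher event rate per one-unit increase in the predictor, holding the other predictors constant.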

 

This article was originally published on November 14, 2023, and updated on May 30, 2026.

Infographic describing 5 Popular Statistical Tests for Count Data

  1. Chi-Square Test: Assesses independence and association between categorical variables using observed and expected count comparisons. Example: To examine the relationship between smoking status (smoker, non-smoker) and lung cancer diagnosis (yes, no) among a group of patients.
  2. Fisher's Exact Test: Analyzes the association between two categorical variables, especially when sample sizes are small. Example: To compare the occurrence of rare adverse drug reactions (yes, no) between two different drug treatment groups in a small clinical trial.
  3. Wilcoxon Rank-Sum Test: Compares the distribution of count data between two groups when normality assumptions are violated. Example: To compare the counts of CD4+ T cells between patients receiving two different treatments for HIV when the data distribution is non-normal.
  4. Negative Binomial Regression: Handles overdispersed count data, accounting for extra variability often seen in real-world datasets. Example: To examine the association between the number of hospital readmissions and patient comorbidity in cardiac patients.
  5. Poisson Regression: Models count data with a Poisson distribution, suitable for studying associations and predicting counts. Example: To verify if the number of new COVID-19 cases in a region depends on vaccination rates and population density.
How to analyze count data



Full disclosure: Editage Insights is a product of Editage, a global provider of world-class scientific communication solutions. Editage Insights is funded by Editage and endorses services provided by Editage but is editorially independent. English Editing - Editage.com | 英文校正 – Editage.jp | 원어민영문교정 – Editage.co.kr | SCI英文论文发表 – Editage.cn | publicação de artigos – Editage.com.br | 編輯英文 – Editage.com.tw

Copyright Cactus Communications. All rights reserved.