What is ANOVA (Analysis of Variance)? Types, Assumptions, and Uses

Analysis of Variance, more commonly known as ANOVA, is one of the most widely used statistical techniques in scientific research. Whether you are comparing the effectiveness of three drug treatments, evaluating teaching methods across classrooms, or assessing crop yields across soil types, ANOVA provides a rigorous framework for determining whether observed differences between groups are statistically meaningful or simply due to chance.

This comprehensive guide covers what ANOVA is, how it works, its key types, the assumptions it rests on, and its applications in real-world research.

What is ANOVA?

ANOVA, or Analysis of Variance, is a statistical method used to compare the means of three or more groups simultaneously to determine whether at least one group mean is significantly different from the others. It does this by partitioning the total variability observed in a dataset into two components:

  • Between-group variance: variation attributable to the differences among group means
  • Within-group variance: variation due to individual differences inside each group (also called residual or error variance)

If the between-group variance is substantially larger than the within-group variance, ANOVA concludes that the group means are unlikely to all be equal. Despite its name, ANOVA analyzes variances to draw conclusions about means.

A Brief History of ANOVA

ANOVA was pioneered by the British statistician Sir Ronald A. Fisher at the Rothamsted Experimental Station in England. His first application appeared in 1921 in Studies in Crop Variation I, and the method became widely known after being published in his landmark 1925 book, Statistical Methods for Research Workers. Fisher’s work revolutionized experimental design by providing a systematic framework for partitioning total variation into meaningful components.

ANOVA vs. the t-Test: Why Not Just Use Multiple t-Tests?

A natural question arises: why not simply run multiple t-tests to compare all pairs of groups? The answer lies in Type I error inflation.

Comparison Method | Number of Tests | Cumulative Type I Error Risk
Multiple t-tests (3 groups) | 3 pairwise tests | ~14% (instead of 5%)
ANOVA | 1 omnibus test | 5% (controlled)
Multiple t-tests (6 groups) | 15 pairwise tests | ~54%

Each individual t-test carries a 5% chance of a false positive, and that risk compounds across tests: with m independent tests at α = 0.05, the probability of at least one false positive is 1 − (1 − 0.05)^m. ANOVA controls the experiment-wide error rate by evaluating all group differences in a single test.
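
The figures in the table can be reproduced with a few lines of Python. This is a minimal sketch that assumes the tests are independent (pairwise t-tests on shared groups are not strictly independent, so the result is an approximation); the function names are illustrative and not taken from any library.

```python
def num_pairwise_comparisons(num_groups: int) -> int:
    """Number of distinct pairs among num_groups groups."""
    return num_groups * (num_groups - 1) // 2


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


for groups in (3, 6):
    m = num_pairwise_comparisons(groups)
    print(f"{groups} groups: {m} tests, familywise error = {familywise_error_rate(m):.3f}")
# 3 groups: 3 tests, familywise error = 0.143
# 6 groups: 15 tests, familywise error = 0.537
```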

How Does ANOVA Work? The F-Test Explained

The core of ANOVA is the F-statistic (also called the F-ratio), which is calculated as:

F = Mean Square Between Groups (MSB) ÷ Mean Square Within Groups (MSW)

Step-by-Step: How ANOVA Calculates the F-Statistic

  1. Calculate the grand mean: the overall mean of all observations across all groups
  2. Compute the Sum of Squares Between (SSB): how much each group mean deviates from the grand mean, weighted by group size
  3. Compute the Sum of Squares Within (SSW): how much individual observations deviate from their own group mean
  4. Divide by degrees of freedom to get Mean Squares (MSB and MSW)
  5. Calculate F = MSB / MSW
  6. Compare F to the critical value from the F-distribution at the chosen significance level (typically α = 0.05)
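
Steps 1 to 5 can be sketched in plain Python. The crop-yield numbers below are hypothetical values chosen so the arithmetic is easy to follow by hand; they are not data from any study.

```python
from statistics import mean

# Hypothetical crop yields for three fertilizers (illustrative values only).
groups = {
    "A": [20, 21, 19, 22, 18],
    "B": [22, 23, 21, 24, 20],
    "C": [27, 28, 26, 29, 25],
}

all_values = [x for values in groups.values() for x in values]
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Step 1: grand mean of all observations
grand_mean = mean(all_values)

# Step 2: SSB, squared deviation of each group mean from the grand mean,
# weighted by group size
ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())

# Step 3: SSW, squared deviation of each observation from its own group mean
ssw = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

# Step 4: mean squares = sums of squares divided by degrees of freedom
msb = ssb / (k - 1)
msw = ssw / (n_total - k)

# Step 5: F-ratio
f_stat = msb / msw

print(f"SSB = {ssb:.1f}, SSW = {ssw:.1f}, MSB = {msb:.1f}, MSW = {msw:.1f}, F = {f_stat:.2f}")
# SSB = 130.0, SSW = 30.0, MSB = 65.0, MSW = 2.5, F = 26.00
```

Step 6, the comparison against the F-distribution, needs a statistics library or printed tables and is left out of this sketch.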

Interpreting the ANOVA Table

Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Value | p-Value
Between Groups | SSB | k − 1 | MSB = SSB/(k−1) | F = MSB/MSW | p
Within Groups | SSW | N − k | MSW = SSW/(N−k) | |
Total | SST | N − 1 | | |

Where k = number of groups, N = total number of observations
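
As a rough illustration, the table can be filled in from the two sums of squares using SciPy's F-distribution. The inputs (SSB = 130, SSW = 30, k = 3, N = 15) are assumed values for a hypothetical one-way design, and the `cell` helper is only there for printing.

```python
from scipy import stats

# Assumed inputs for a hypothetical design with k = 3 groups and N = 15.
ssb, ssw = 130.0, 30.0
k, n_total = 3, 15

df_between, df_within = k - 1, n_total - k
msb, msw = ssb / df_between, ssw / df_within
f_stat = msb / msw

# p-value: area of the F-distribution to the right of the observed F
p_value = stats.f.sf(f_stat, df_between, df_within)
# Critical value at alpha = 0.05 for the same degrees of freedom
f_crit = stats.f.ppf(0.95, df_between, df_within)


def cell(value, spec):
    """Format a number, or return an empty cell for a missing entry."""
    return format(value, spec) if value is not None else ""


rows = [
    ("Between Groups", ssb, df_between, msb, f_stat, p_value),
    ("Within Groups", ssw, df_within, msw, None, None),
    ("Total", ssb + ssw, n_total - 1, None, None, None),
]

print("Source | SS | df | MS | F | p")
for name, ss, df, ms, f_val, p_val in rows:
    print(" | ".join([name, cell(ss, ".1f"), str(df), cell(ms, ".2f"),
                      cell(f_val, ".2f"), cell(p_val, ".2g")]))
print(f"Critical F at alpha = 0.05: {f_crit:.2f}")
```

Because the observed F is far above the critical value, the null hypothesis would be rejected for these illustrative numbers.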

Null and Alternative Hypotheses in ANOVA

  • Null hypothesis (H₀): All group means are equal (μ₁ = μ₂ = μ₃ = … = μk)
  • Alternative hypothesis (H₁): At least one group mean is significantly different from the others

When the p-value is less than the chosen significance level (commonly 0.05), you reject the null hypothesis. This tells you that a difference exists somewhere among the groups, but not which specific groups differ. Identifying those requires post-hoc testing (see below).

Key Terminology in ANOVA

Term | Definition
Factor | The categorical independent variable (e.g., treatment type, diet group)
Level | The individual categories or conditions within a factor (e.g., Drug A, Drug B, Placebo)
Dependent variable | The continuous outcome being measured (e.g., blood pressure, test score)
Between-subjects | Different participants in each group
Within-subjects | The same participants measured across multiple conditions
Interaction effect | When the effect of one factor depends on the level of another
Main effect | The overall effect of one factor, averaged across levels of other factors
Residuals | The unexplained variation remaining after the model is fitted

Types of ANOVA

ANOVA is not a single test: it is a family of related methods. The appropriate type depends on the number of independent variables, whether the same subjects appear in multiple groups, and the nature of the experimental design.

One-Way ANOVA

One-way ANOVA is the simplest form. It tests whether the means of three or more groups differ on a single independent variable (factor).

When to use it:

  • You have one categorical independent variable with three or more levels
  • You have one continuous dependent variable
  • Groups are independent of each other

Example:

Testing whether three different fertilizers (A, B, C) produce different average crop yields.

Two-Way ANOVA

Two-way ANOVA examines the effects of two independent variables and the interaction between them on a single dependent variable. It answers three questions simultaneously:

  1. Does Factor A have a significant main effect?
  2. Does Factor B have a significant main effect?
  3. Is there a significant interaction between Factor A and Factor B?

Example:

Studying how both fertilizer type (A, B, C) and irrigation method (drip, flood) affect crop yield, including whether certain fertilizers work better with certain irrigation methods.
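
For a balanced design, the three questions map onto three F-ratios that share one error term. The sketch below uses NumPy and hypothetical yields (two replicate plots per fertilizer and irrigation combination); it assumes equal cell sizes and does not compute p-values.

```python
import numpy as np

# Hypothetical yields: axis 0 = fertilizer (A, B, C), axis 1 = irrigation
# (drip, flood), axis 2 = replicate plots. A balanced design is assumed.
yields = np.array([
    [[19, 21], [17, 19]],
    [[23, 25], [19, 21]],
    [[30, 32], [21, 23]],
], dtype=float)

a, b, n = yields.shape                   # levels of A, levels of B, replicates
grand = yields.mean()
fert_means = yields.mean(axis=(1, 2))    # marginal means per fertilizer
irrig_means = yields.mean(axis=(0, 2))   # marginal means per irrigation method
cell_means = yields.mean(axis=2)         # mean of each fertilizer x irrigation cell

# Main-effect sums of squares
ss_fert = b * n * np.sum((fert_means - grand) ** 2)
ss_irrig = a * n * np.sum((irrig_means - grand) ** 2)

# Interaction: what the cell means show beyond the two additive main effects
interaction = cell_means - fert_means[:, None] - irrig_means[None, :] + grand
ss_inter = n * np.sum(interaction ** 2)

# Error: variation of replicates around their own cell mean
ss_error = np.sum((yields - cell_means[:, :, None]) ** 2)

df_fert, df_irrig = a - 1, b - 1
df_inter = df_fert * df_irrig
df_error = a * b * (n - 1)
ms_error = ss_error / df_error

f_fert = (ss_fert / df_fert) / ms_error
f_irrig = (ss_irrig / df_irrig) / ms_error
f_inter = (ss_inter / df_inter) / ms_error

print(f"Fertilizer:  F({df_fert}, {df_error}) = {f_fert:.2f}")
print(f"Irrigation:  F({df_irrig}, {df_error}) = {f_irrig:.2f}")
print(f"Interaction: F({df_inter}, {df_error}) = {f_inter:.2f}")
```

With unequal cell sizes this simple partition no longer holds, and a model-based routine in a statistics package would be the safer route.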

Repeated Measures ANOVA

Repeated measures ANOVA is used when the same participants are measured multiple times, either across different time points or under different experimental conditions. Because multiple measurements from the same individual are correlated, standard ANOVA would violate the independence assumption.

Example:

Measuring blood pressure in the same patients at baseline, after 4 weeks, and after 8 weeks of a new therapy.

Important consideration:

Repeated measures ANOVA requires testing the sphericity assumption (equal variances of the differences between every pair of conditions). Violations are handled by applying the Greenhouse-Geisser or Huynh-Feldt correction to the degrees of freedom.
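
The sketch below shows how a one-factor repeated measures ANOVA separates stable between-patient differences from the error term. The blood pressure readings are hypothetical, sphericity is not tested here, and the "naive" F is included only to show what happens when the repeated structure is ignored.

```python
import numpy as np

# Hypothetical systolic blood pressure readings.
# Rows = patients, columns = baseline, week 4, week 8.
bp = np.array([
    [150, 144, 141],
    [160, 152, 150],
    [145, 141, 137],
    [155, 147, 148],
], dtype=float)

n_subj, k = bp.shape
grand = bp.mean()

ss_total = np.sum((bp - grand) ** 2)
ss_time = n_subj * np.sum((bp.mean(axis=0) - grand) ** 2)  # between time points
ss_subj = k * np.sum((bp.mean(axis=1) - grand) ** 2)       # between patients
ss_error = ss_total - ss_time - ss_subj                    # time x patient residual

df_time = k - 1
df_error = (k - 1) * (n_subj - 1)
f_rm = (ss_time / df_time) / (ss_error / df_error)

# Naive one-way ANOVA that wrongly treats all 12 readings as independent:
# patient-to-patient variability stays in the error term.
f_naive = (ss_time / df_time) / ((ss_total - ss_time) / (n_subj * k - k))

print(f"Repeated measures: F({df_time}, {df_error}) = {f_rm:.2f}")
print(f"Naive one-way:     F({df_time}, {n_subj * k - k}) = {f_naive:.2f}")
```

In this made-up example, removing patient-level variability makes the time effect far easier to detect than the naive analysis suggests.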

ANCOVA (Analysis of Covariance)

ANCOVA combines ANOVA with regression by including one or more continuous covariates: variables that are related to the dependent variable but are not the primary focus of the study. Including a covariate that is genuinely related to the outcome reduces unexplained variance and can increase statistical power.

Example:

Comparing test scores across teaching methods while controlling for students’ prior academic performance.

MANOVA (Multivariate Analysis of Variance)

MANOVA is an extension of ANOVA that handles two or more continuous dependent variables simultaneously. Rather than running separate ANOVAs for each outcome, MANOVA tests whether the group centroids (vectors of means) differ across groups.

Example:

Comparing the effects of three diets on both body weight and cholesterol levels simultaneously.

Nested ANOVA

In nested (hierarchical) ANOVA, the levels of one factor are nested within the levels of another. This structure appears when each subgroup belongs to exactly one larger group, so the two factors are not crossed: a given subgroup never appears under more than one level of the higher factor.

Example:

Comparing student performance across schools, where classrooms are nested within schools.

Mixed ANOVA (Split-Plot ANOVA)

Mixed ANOVA combines between-subject and within-subject factors in a single design. One or more factors vary between participants, while at least one factor is measured within participants.

Example:

Comparing depression scores (measured at three time points) in patients randomly assigned to either therapy or a control condition.

MANCOVA (Multivariate Analysis of Covariance)

MANCOVA combines the logic of MANOVA and ANCOVA by examining differences in two or more continuous dependent variables across groups while simultaneously controlling for one or more continuous covariates. It is the multivariate extension of ANCOVA, and the covariate-adjusted extension of MANOVA.

When to use it:

  • When you have multiple related dependent variables that should be analyzed together
  • When one or more continuous confounding variables need to be statistically controlled
  • When analyzing each outcome separately or ignoring the covariate would produce misleading results

Example:

Comparing the effects of three different rehabilitation programs on both muscle strength and functional mobility scores, while controlling for patients’ age and baseline fitness level.

Key assumptions (in addition to standard MANOVA assumptions):

  • The covariate(s) must be continuous and measured without error
  • The covariate should be linearly related to each dependent variable within each group
  • Homogeneity of regression slopes: the relationship between the covariate and the dependent variables must be consistent across all groups (tested before running MANCOVA)
  • The covariate should be independent of the treatment or grouping variable, ideally measured before the intervention

Practical note:

MANCOVA is particularly valuable in clinical and experimental research where pre-existing differences between groups (e.g., baseline severity scores, age, or body weight) could otherwise confound the comparison of outcomes. By partialling out covariate effects, it increases statistical power and produces cleaner estimates of group differences.

Summary: Choosing the Right ANOVA Type

ANOVA Type | Number of IVs | Same Subjects? | Number of DVs | Key Use Case
One-Way | 1 | No | 1 | Basic group comparison
Two-Way | 2 | No | 1 | Testing interactions between two factors
Three-Way | 3 | No | 1 | Complex factorial designs
Repeated Measures | 1+ | Yes | 1 | Longitudinal / within-subject designs
ANCOVA | 1+ | No | 1 | Controlling for a covariate
MANOVA | 1+ | No | 2+ | Multiple outcomes simultaneously
Nested | 2+ | No | 1 | Hierarchical/clustered data
Mixed | 1+ | Both | 1 | Between + within subject factors

Assumptions of ANOVA

ANOVA is a parametric test, meaning its validity depends on several statistical assumptions. Violating these assumptions can lead to incorrect conclusions. Before running ANOVA, researchers should verify the following:

1. Independence of Observations

Each observation must be independent of all others. This means:

  • Participants in one group should not influence participants in another group
  • No hidden relationships or clustering should exist among observations
  • Data must be collected using valid random sampling or randomization methods

How to check:

Assess the study design: if participants share environments (e.g., students in the same classroom), a nested or mixed model may be needed.

2. Normality of the Dependent Variable

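The values of the dependent variable should follow a normal (Gaussian) distribution within each group. Equivalently, the residuals (each observation minus its group mean) should be approximately normally distributed.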

How to check:

  • Shapiro-Wilk test (for small samples, n < 50)
  • Kolmogorov-Smirnov test
  • Q-Q plots (quantile-quantile plots)
  • Histograms of residuals

Practical note:

ANOVA can be relatively robust to moderate violations of normality, especially when group sizes are ≥15 observations.
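
A Shapiro-Wilk check on the residuals might look like the sketch below. It assumes SciPy is installed and uses hypothetical yield data; with only five observations per group, the test has little power, so a Q-Q plot remains a useful companion.

```python
from statistics import mean

from scipy import stats

# Hypothetical crop yields for three fertilizers (illustrative values only).
groups = {
    "A": [20, 21, 19, 22, 18],
    "B": [22, 23, 21, 24, 20],
    "C": [27, 28, 26, 29, 25],
}

# Residuals: each observation minus its own group mean
residuals = [x - mean(v) for v in groups.values() for x in v]

w_stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")

if p_value < 0.05:
    print("Evidence against normality: consider a transformation or a non-parametric test.")
else:
    print("No evidence against normality at the 5% level.")
```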

3. Homogeneity of Variance (Homoscedasticity)

The variance of the dependent variable should be approximately equal across all groups.

How to check:

  • Levene’s Test: most commonly used; robust to non-normality
  • Bartlett’s Test: more sensitive to non-normality
  • Visual inspection of residual plots

What to do if violated:

Consider Welch’s ANOVA (robust to unequal variances), Brown-Forsythe test, or data transformation (e.g., log transformation).
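
The two formal checks listed above are available in SciPy. In this sketch the third group is deliberately given a much larger spread, so both tests should flag unequal variances; the data are hypothetical. Note that `center="mean"` requests the classic Levene test, whereas SciPy's default (`center="median"`) is the more robust median-based variant.

```python
from scipy import stats

# Hypothetical data: group c is deliberately far more variable than a and b.
a = [20, 21, 19, 22, 18]
b = [22, 23, 21, 24, 20]
c = [27, 33, 21, 36, 18]

# Classic Levene test (absolute deviations from each group mean)
levene_stat, levene_p = stats.levene(a, b, c, center="mean")

# Bartlett's test (more powerful under normality, sensitive to non-normality)
bartlett_stat, bartlett_p = stats.bartlett(a, b, c)

print(f"Levene:   W = {levene_stat:.2f}, p = {levene_p:.4f}")
print(f"Bartlett: T = {bartlett_stat:.2f}, p = {bartlett_p:.4f}")

if levene_p < 0.05:
    print("Variances differ: consider Welch's ANOVA or a transformation.")
```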

4. Additional Requirements

  • Continuous Dependent Variable: must be measured on an interval or ratio scale
  • Categorical Independent Variable(s): divided into distinct groups or levels
  • Sphericity (Repeated Measures ANOVA only): tested via Mauchly’s Test; violations corrected with Greenhouse-Geisser or Huynh-Feldt correction

Summary of ANOVA Assumptions

Assumption | What It Means | How to Test
Independence | Observations are not related | Study design review
Normality | DV is normally distributed within groups | Shapiro-Wilk, Q-Q plots
Homogeneity of variance | Equal variances across groups | Levene’s test
Continuous DV | Outcome is interval/ratio scale | Variable measurement review
Sphericity (RM ANOVA only) | Equal variance of pairwise differences | Mauchly’s test

Post-Hoc Tests: Finding Where the Differences Lie

A significant ANOVA F-test tells you that at least one group mean is different, but it does not tell you which groups differ from each other. Answering that question requires post-hoc (multiple comparison) tests, which conduct pairwise comparisons while controlling for inflated Type I error.

Common Post-Hoc Tests and When to Use Them

Post-Hoc Test | Best Used When | Characteristics
Tukey’s HSD | Equal group sizes, equal variances | Most widely used; good balance of power and control
Bonferroni correction | Fewer planned comparisons | Very conservative; divides α by number of comparisons
Scheffé test | Unequal group sizes | Most conservative; suitable for complex contrasts
LSD (Fisher’s) | Only 3 groups and F is significant | Least conservative; higher risk of Type I error
Dunnett’s test | Comparing all groups to one control | Specifically designed for control group comparisons
Games-Howell | Unequal variances and/or unequal n | Does not assume equal variances
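
As one illustration, Bonferroni-adjusted pairwise comparisons can be sketched with SciPy's two-sample t-test. This is a simplification: each pair uses its own pooled variance rather than the ANOVA's within-group mean square, and the data are hypothetical. Recent SciPy releases also provide `scipy.stats.tukey_hsd` for Tukey's HSD.

```python
from itertools import combinations

from scipy import stats

# Hypothetical crop yields for three fertilizers (illustrative values only).
groups = {
    "A": [20, 21, 19, 22, 18],
    "B": [22, 23, 21, 24, 20],
    "C": [27, 28, 26, 29, 25],
}

alpha = 0.05
pairs = list(combinations(groups, 2))  # ("A", "B"), ("A", "C"), ("B", "C")

adjusted = {}
for g1, g2 in pairs:
    t_stat, p_raw = stats.ttest_ind(groups[g1], groups[g2])
    # Bonferroni: multiply each raw p-value by the number of comparisons
    # (equivalent to testing each raw p-value against alpha / m), capped at 1.
    p_adj = min(1.0, p_raw * len(pairs))
    adjusted[(g1, g2)] = p_adj
    verdict = "differ" if p_adj < alpha else "no significant difference"
    print(f"{g1} vs {g2}: t = {t_stat:.2f}, adjusted p = {p_adj:.4f} ({verdict})")
```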

Effect Size in ANOVA

Statistical significance alone does not convey the magnitude of a difference. Alongside the F-test, researchers should always report an effect size:

  • η² (eta-squared): Proportion of total variance explained by the factor; easy to compute but tends to overestimate in small samples
  • ω² (omega-squared): Less biased estimate of effect size; preferred for small samples
  • Partial η²: Used in factorial ANOVA; variance explained by one factor controlling for others
  • Cohen’s f: Standardized effect size; benchmarks: small = 0.10, medium = 0.25, large = 0.40
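
For a one-way design, these measures follow directly from the ANOVA table. The sketch below assumes illustrative sums of squares (SSB = 130, SSW = 30, k = 3, N = 15) and uses the standard one-way formulas.

```python
from math import sqrt

# Assumed values from a hypothetical one-way ANOVA table.
ssb, ssw = 130.0, 30.0
k, n_total = 3, 15

sst = ssb + ssw
msw = ssw / (n_total - k)

# Eta-squared: share of total variance attributed to the factor
eta_sq = ssb / sst

# Omega-squared: adjusts eta-squared downward for small-sample bias
omega_sq = (ssb - (k - 1) * msw) / (sst + msw)

# Cohen's f, derived from eta-squared
cohens_f = sqrt(eta_sq / (1 - eta_sq))

print(f"eta^2 = {eta_sq:.3f}, omega^2 = {omega_sq:.3f}, Cohen's f = {cohens_f:.2f}")
# eta^2 = 0.812, omega^2 = 0.769, Cohen's f = 2.08
```

The toy data produce an unrealistically large effect; the point is only that ω² comes out smaller than η² for the same table.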

Understanding effect size is important for evaluating the practical significance of a finding and for planning future studies, since the expected effect size is a key input to sample size and power calculations for ANOVA.

Non-Parametric Alternatives to ANOVA

When ANOVA assumptions are seriously violated and cannot be corrected, non-parametric alternatives should be considered:

Parametric Test | Non-Parametric Alternative | When to Use
One-Way ANOVA | Kruskal-Wallis test | Non-normal data, ordinal outcomes
Repeated Measures ANOVA | Friedman test | Non-normal within-subject data
Two-Way ANOVA | Aligned Rank Transform ANOVA | Non-normal data with factorial design
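
The first two alternatives are available in SciPy; the sketch below runs them on hypothetical data. Aligned Rank Transform ANOVA is not part of SciPy and is not shown.

```python
from scipy import stats

# Kruskal-Wallis: independent groups (hypothetical crop yields)
a = [20, 21, 19, 22, 18]
b = [22, 23, 21, 24, 20]
c = [27, 28, 26, 29, 25]
h_stat, p_kw = stats.kruskal(a, b, c)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.4f}")

# Friedman: the same four patients measured under three conditions.
# Each list is one time point; position i refers to the same patient.
baseline = [150, 160, 145, 155]
week4 = [144, 152, 141, 147]
week8 = [141, 150, 137, 148]
chi2_stat, p_fr = stats.friedmanchisquare(baseline, week4, week8)
print(f"Friedman: chi-square = {chi2_stat:.2f}, p = {p_fr:.4f}")
```

With only four patients, the chi-square approximation used for the Friedman p-value is rough, so the result should be read as indicative.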

How to Perform ANOVA: A Step-by-Step Guide

Step 1: Formulate Your Hypotheses

  • H₀: μ₁ = μ₂ = μ₃ (all group means are equal)
  • H₁: At least one μi ≠ μj (at least one pair of means differs)

Step 2: Choose the Appropriate ANOVA Type

Refer to the summary table above. Key questions: How many independent variables? Are participants the same across conditions? How many dependent variables? Are there any covariates?

Step 3: Check Assumptions

  • Verify independence from study design
  • Test normality within each group (Shapiro-Wilk)
  • Test homogeneity of variance (Levene’s test)
  • For repeated measures: run Mauchly’s test for sphericity

Step 4: Run the ANOVA

Software | Command / Function
R | aov(), lm(), ezANOVA()
Python (scipy) | scipy.stats.f_oneway()
SPSS | Analyze → Compare Means → One-Way ANOVA
SAS | PROC ANOVA, PROC GLM
Excel | Data Analysis ToolPak → ANOVA
GraphPad Prism | Built-in ANOVA wizard
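
For the Python route, a minimal one-way ANOVA with `scipy.stats.f_oneway()` might look like this. The yields are hypothetical, and the decision rule simply applies α = 0.05.

```python
from scipy import stats

# Hypothetical crop yields for three fertilizers (illustrative values only).
fertilizer_a = [20, 21, 19, 22, 18]
fertilizer_b = [22, 23, 21, 24, 20]
fertilizer_c = [27, 28, 26, 29, 25]

f_stat, p_value = stats.f_oneway(fertilizer_a, fertilizer_b, fertilizer_c)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"

# Degrees of freedom for reporting: k - 1 and N - k
k = 3
n_total = len(fertilizer_a) + len(fertilizer_b) + len(fertilizer_c)
print(f"F({k - 1}, {n_total - k}) = {f_stat:.2f}, p = {p_value:.5f} -> {decision}")
```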

Step 5: Interpret Results

  • If p < α (usually 0.05): Reject H₀; at least one group mean differs significantly
  • If p ≥ α: Fail to reject H₀; no significant difference detected
  • Examine the F-value and degrees of freedom
  • Report the effect size (η², ω², or partial η²)

Step 6: Run Post-Hoc Tests (if H₀ is rejected)

Choose the appropriate post-hoc test to identify which specific group pairs are significantly different.

Step 7: Report Results

A complete ANOVA report should include:

  • The F-statistic and degrees of freedom: F(df_between, df_within) = X.XX
  • The p-value
  • The effect size measure
  • Post-hoc test results (if applicable)
  • A visualization (box plot, bar chart with error bars, or interaction plot for two-way ANOVA)

Example of ANOVA reporting:

“A one-way ANOVA revealed a significant effect of fertilizer type on crop yield, F(2, 27) = 9.21, p = .001, η² = .41. Tukey’s post-hoc test indicated that Fertilizer C produced significantly higher yields than both Fertilizer A (p = .003) and Fertilizer B (p = .009), while Fertilizers A and B did not differ significantly (p = .72).”

Applications of ANOVA

ANOVA is applied across a broad range of disciplines wherever group comparisons are needed.

Clinical and Biomedical Research

  • Comparing efficacy of multiple drug doses or treatment regimens in randomized controlled trials
  • Evaluating patient outcomes (recovery time, symptom severity) across diagnostic groups
  • Analyzing the effect of different surgical techniques on postoperative outcomes
  • Repeated measures ANOVA for tracking biomarker changes over a trial timeline

Psychology and Social Sciences

  • Comparing cognitive performance across age groups or experimental conditions
  • Assessing the impact of different therapeutic interventions on mental health outcomes
  • Analyzing survey responses across demographic groups

Education

  • Evaluating whether different teaching methods produce different test scores
  • Comparing student performance across schools, curricula, or grade levels
  • Assessing the impact of class size or instructional time on achievement

Agriculture and Environmental Sciences

  • Testing the effect of different fertilizers, pesticides, or crop varieties on yield
  • Comparing environmental measurements (pollution levels, soil nutrients) across sites
  • Assessing the interaction of irrigation method and planting density on growth

Business, Marketing, and Consumer Research

  • A/B/C testing: comparing conversion rates across three or more website versions
  • Evaluating customer satisfaction scores across product categories or regions
  • Analyzing sales performance across sales teams, promotional strategies, or geographic markets
  • Testing whether pricing strategies differ in their effect on purchase intent

Quality Control and Manufacturing

  • Testing whether output quality differs across machines, shifts, or production batches
  • Comparing defect rates under different process conditions
  • Analyzing how temperature and pressure settings interact to affect product strength

ANOVA vs. Regression: What’s the Relationship?

ANOVA and linear regression are mathematically equivalent under the general linear model framework. Both minimize the sum of squared residuals and can be expressed as the same underlying model.

Feature | ANOVA | Regression
Independent variable | Categorical (factors/groups) | Continuous (or mixed)
Primary question | Do group means differ? | How does Y change with X?
Output focus | F-test for group effects | Regression coefficients (slopes)
Overlap | Can include covariates (ANCOVA) | Can include categorical predictors (dummy coding)
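
The equivalence can be checked numerically: fitting a least-squares regression with dummy-coded group membership yields the same F-statistic as a one-way ANOVA on the same data. The sketch uses NumPy and hypothetical yields, with group A as the reference category.

```python
import numpy as np

# Hypothetical crop yields; five plots per fertilizer.
y = np.array([20, 21, 19, 22, 18,
              22, 23, 21, 24, 20,
              27, 28, 26, 29, 25], dtype=float)
labels = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5)

# Design matrix: intercept plus dummy variables for B and C (A is the reference)
X = np.column_stack([
    np.ones_like(y),
    (labels == "B").astype(float),
    (labels == "C").astype(float),
])

# Ordinary least squares fit
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

ss_resid = np.sum(residuals ** 2)        # matches SSW in the ANOVA table
ss_total = np.sum((y - y.mean()) ** 2)   # matches SST
ss_model = ss_total - ss_resid           # matches SSB

df_model = X.shape[1] - 1
df_resid = len(y) - X.shape[1]
f_stat = (ss_model / df_model) / (ss_resid / df_resid)

print("Coefficients (mean of A, B minus A, C minus A):", np.round(coef, 3))
print(f"Regression F({df_model}, {df_resid}) = {f_stat:.2f}")
```

The intercept recovers the mean of the reference group and each slope is a difference between group means, which is why the two analyses agree.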

Common Mistakes and Pitfalls in ANOVA

Running Multiple t-Tests Instead of ANOVA

One of the most frequent errors in multi-group comparisons is conducting a series of independent t-tests rather than a single ANOVA. With three groups, this produces three pairwise comparisons; with five groups, ten. Each test carries its own 5% false positive rate, and these accumulate rapidly. This phenomenon is known as familywise error rate inflation. ANOVA was designed precisely to avoid this problem by evaluating all group differences within a single omnibus test, keeping the Type I error rate at the chosen significance level.

Skipping Assumption Checks

ANOVA is a parametric test and its results are only valid when its underlying assumptions are reasonably met. Many researchers proceed directly to the analysis without testing for normality (e.g., using the Shapiro-Wilk test or Q-Q plots) or homogeneity of variance (e.g., Levene’s test). In practice, moderate violations of normality are tolerable with larger group sizes, but serious violations, particularly of the equal variances assumption, can produce inflated or deflated F-statistics. When assumptions are not met, alternatives such as Welch’s ANOVA or non-parametric tests should be considered rather than ignored.

Omitting Post-Hoc Tests After a Significant Result

A significant ANOVA F-test only establishes that at least one group mean differs; it does not identify where the difference lies. Reporting a significant result without following up with an appropriate post-hoc test (such as Tukey’s HSD, Bonferroni correction, or Games-Howell) leaves the finding incomplete and hard to interpret. Conversely, running unplanned pairwise comparisons after a non-significant omnibus ANOVA (sometimes called “fishing” for differences) is also problematic, because it inflates the Type I error rate.

Misinterpreting a Non-Significant Result as Proof of Equality

Failing to reject the null hypothesis does not mean the groups are equivalent. A non-significant result may reflect insufficient statistical power due to a small sample size, high within-group variability, or a true effect that is too small to detect with the available data. Interpreting p > 0.05 as confirmation that no difference exists is a logical error. Researchers should report confidence intervals and effect sizes alongside the p-value to give a fuller picture of what the data do and do not rule out.

Ignoring Effect Size

Statistical significance is a function of both the true effect and the sample size. With a large enough sample, even a trivially small difference between groups will yield a significant p-value. Reporting only the F-statistic and p-value without an accompanying effect size measure (η², ω², or partial η²) makes it impossible to judge the practical or clinical relevance of the finding. Effect size should be reported as a matter of course, and its magnitude interpreted against established benchmarks such as Cohen’s f.

Using Standard ANOVA for Repeated Measures Data

When the same participants are measured at multiple time points or under multiple conditions, the observations within each participant are correlated. Applying a standard between-subjects ANOVA to such data violates the independence assumption and uses the wrong error term, so the F-statistic and p-value can be badly distorted. Stable differences between participants remain in the error term, which often masks genuine within-subject effects, while treating repeated measurements as independent replicates overstates the effective sample size. Repeated measures ANOVA, or a mixed-effects model for more complex designs, should be used instead. Additionally, the sphericity assumption specific to repeated measures designs should be tested with Mauchly’s test and corrected if violated.

Misinterpreting Interaction Effects in Factorial ANOVA

In two-way or higher ANOVA designs, a statistically significant interaction means that the effect of one factor depends on the level of another. A common mistake is to interpret or report main effects in isolation when a significant interaction is present. Doing so produces a misleading and incomplete picture. When an interaction is significant, main effects should only be interpreted through the lens of the interaction, and an interaction plot should always be inspected to understand the nature and direction of the effect.

Confusing Statistical and Practical Significance

A result can be statistically significant without being meaningful in any practical or clinical sense, and vice versa. This is particularly common in large datasets where even negligible group differences reach significance. Researchers should always contextualize their findings by asking whether the magnitude of the difference between group means would matter in the real world and not just whether it clears a statistical threshold.

Unequal Group Sizes Without Appropriate Adjustment

While ANOVA can technically handle unequal group sizes (unbalanced designs), this introduces complications that are often overlooked. In factorial ANOVA, unequal cell sizes mean that the factors are no longer orthogonal and the sum of squares cannot be cleanly partitioned. Researchers should generally use Type III (or Type II) sums of squares rather than the sequential Type I sums of squares that some software reports by default, because Type I results depend on the order in which factors are entered into the model. Failing to account for unbalanced designs can produce incorrect estimates of effects and misleading F-statistics.

Treating an Ordinal Dependent Variable as Continuous

ANOVA requires a continuous dependent variable measured on an interval or ratio scale. Applying ANOVA to ordinal outcome variables (such as individual Likert-scale responses treated as continuous) violates this assumption and can produce unreliable results. In such cases, a non-parametric alternative such as the Kruskal-Wallis test is more appropriate, unless there is strong theoretical justification for treating the ordinal variable as approximately continuous.

Frequently Asked Questions About ANOVA

Can ANOVA be used with only two groups?

ANOVA with two groups is mathematically valid and gives the same result as an independent samples t-test that assumes equal variances (F = t², with identical p-values). However, a t-test is simpler and preferred for two-group comparisons.
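
The F = t² relationship is easy to confirm with SciPy on two hypothetical groups; `ttest_ind` uses the pooled-variance (equal variances) form by default.

```python
from scipy import stats

# Two hypothetical groups (illustrative values only).
group_1 = [20, 21, 19, 22, 18]
group_2 = [22, 23, 21, 24, 20]

t_stat, p_t = stats.ttest_ind(group_1, group_2)  # pooled-variance t-test
f_stat, p_f = stats.f_oneway(group_1, group_2)   # one-way ANOVA with two groups

print(f"t = {t_stat:.3f}, t squared = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
print(f"p (t-test) = {p_t:.4f}, p (ANOVA) = {p_f:.4f}")
```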

What sample size is needed for ANOVA?

There is no fixed universal rule. A commonly quoted rule of thumb is a minimum of 20–30 observations per group, but the number actually required depends on the expected effect size, the number of groups, the significance level, and the desired power. A formal power analysis is strongly recommended.
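
A basic power calculation for a balanced one-way ANOVA can be sketched with SciPy's noncentral F-distribution. The function names are illustrative, and the sketch assumes equal group sizes and the usual noncentrality parameter λ = f² × N, where f is Cohen's f and N is the total sample size.

```python
from scipy import stats


def anova_power(n_per_group: int, k: int, cohens_f: float, alpha: float = 0.05) -> float:
    """Approximate power of a balanced one-way ANOVA with k groups."""
    n_total = n_per_group * k
    dfn, dfd = k - 1, n_total - k
    f_crit = stats.f.ppf(1 - alpha, dfn, dfd)   # rejection threshold under H0
    noncentrality = cohens_f ** 2 * n_total     # lambda = f^2 * N
    # Probability that F exceeds the threshold when the effect is real
    return float(stats.ncf.sf(f_crit, dfn, dfd, noncentrality))


def min_n_per_group(k: int, cohens_f: float, target_power: float = 0.80,
                    alpha: float = 0.05, max_n: int = 10_000) -> int:
    """Smallest per-group sample size that reaches the target power."""
    for n in range(2, max_n + 1):
        if anova_power(n, k, cohens_f, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within max_n")


# Medium effect (Cohen's f = 0.25), three groups, 80% power
n_needed = min_n_per_group(k=3, cohens_f=0.25)
print(f"About {n_needed} participants per group "
      f"(power = {anova_power(n_needed, 3, 0.25):.3f})")
```

For a medium effect the required group size comes out well above the 20–30 rule of thumb, which is why a formal calculation is preferable to a fixed number.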

What is a factorial ANOVA?

Any ANOVA with two or more categorical independent variables whose levels are crossed with one another is a factorial ANOVA. A two-way ANOVA is the simplest factorial design.

Is ANOVA one-tailed or two-tailed?

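The F-test in ANOVA is an upper-tailed test: only large values of F (between-group variance well above within-group variance) count as evidence against H₀, so the p-value is the area in the right tail of the F-distribution. The hypothesis itself is non-directional, however, because F is built from squared deviations and therefore detects differences among group means in any direction.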

When should I use Welch’s ANOVA instead of standard ANOVA?

When the assumption of homogeneity of variance (equal variances) is violated, Welch’s ANOVA provides a more robust alternative that does not assume equal variances across groups.

References

  1. Fisher RA. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd; 1925.
  2. St. John JC. The analysis of variance (ANOVA). Nutr Res Pract. 2010;4(5):432. PMID: 27406694.
  3. Nahm FS. Nonparametric statistical tests for the continuous data: the basic concept and the practical use. Korean J Anesthesiol. 2016;69(1):8–14. PMID: 26885295.
  4. Keselman HJ, Huberty CJ, Lix LM, et al. Statistical practices of educational researchers: an analysis of their ANOVA, MANOVA, and ANCOVA analyses. Rev Educ Res. 1998;68(3):350–386.
  5. McHugh ML. Multiple comparison analysis testing in ANOVA. Biochem Med (Zagreb). 2011;21(3):203–209. PMID: 22180578.
  6. Kim HY. Statistical notes for clinical researchers: one-way analysis of variance (ANOVA). Restor Dent Endod. 2014;39(1):74–77. PMID: 24516834.
  7. Gueorguieva R, Krystal JH. Move over ANOVA: progress in analyzing repeated-measures data. Arch Gen Psychiatry. 2004;61(3):310–317. PMID: 14993119.

This article was originally published on July 19, 2023, and updated on June 4, 2026.
