What Is an Umbrella Review? Definition, Methods, Tools, Examples

Glossary of Key Terms

Before diving in, here are the key terms you will encounter throughout this guide, defined concisely for quick reference.

UMBRELLA REVIEW (UR): A systematic review whose unit of analysis is other systematic reviews or meta-analyses. Also called ‘review of reviews’ or ‘overview of reviews.’
SYSTEMATIC REVIEW (SR): A comprehensive, reproducible synthesis of all available primary research on a defined question, using explicit pre-specified methods.
META-ANALYSIS (MA): A statistical technique used within a systematic review to mathematically pool results from multiple studies to produce a single summary effect estimate.
SRMA: Systematic Review and/or Meta-Analysis, the primary unit of inclusion in an umbrella review.
AMSTAR-2: A Measurement Tool to Assess Systematic Reviews (version 2). A 16-item validated checklist used to critically appraise the methodological quality of included SRMAs.
GRADE: Grading of Recommendations, Assessment, Development and Evaluations. A system for rating the certainty of evidence and the strength of clinical recommendations.
ROBIS: Risk Of Bias In Systematic reviews. A tool for assessing the risk of bias in systematic reviews, complementary to AMSTAR-2.
JBI: Joanna Briggs Institute. Publishes the primary methodological handbook for umbrella reviews (Chapter 10 of the JBI Manual for Evidence Synthesis).
PICO(S): Population, Intervention, Comparison, Outcome (Study design). The framework used to structure research questions in intervention-based reviews.
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses. The dominant reporting guideline for systematic reviews; for overviews of reviews, the dedicated reporting guideline is the PRIOR statement (Preferred Reporting Items for Overviews of Reviews).
HETEROGENEITY (I²): A statistical measure of variability in results across studies in a meta-analysis. High I² (>75%) signals that results differ substantially beyond chance.
CORRECTED COVERED AREA (CCA): A metric used to quantify the degree of overlap of primary studies across multiple meta-analyses included in an umbrella review.
PROSPERO: International Prospective Register of Systematic Reviews. The standard registry for pre-registering review protocols to increase transparency.
PREDICTION INTERVAL (PI): A range capturing where the true effect size would be expected in a new study. Wider PIs indicate higher uncertainty about the generalisability of an effect.
SMALL STUDY EFFECTS: The tendency for smaller studies to report larger effect sizes, potentially indicating publication bias. Assessed via Egger’s test and funnel plot asymmetry.
EXCESS SIGNIFICANCE BIAS: When the number of statistically significant studies in a meta-analysis exceeds the number expected given the studies’ statistical power, a marker of potential selective reporting.

What Is an Umbrella Review?

An umbrella review is a systematic review of previously published systematic reviews and/or meta-analyses. Where a typical systematic review synthesizes primary studies (randomized controlled trials, cohort studies, etc.), an umbrella review sits one level higher: its building blocks are themselves the products of evidence synthesis.

The name is apt: an umbrella review holds multiple systematic reviews under a single canopy, offering a panoramic view of an entire research landscape rather than a narrow cross-section of it.

CORE DEFINITION

An umbrella review is a systematic review whose unit of analysis is other systematic reviews or meta-analyses, aggregating findings from several reviews that address specific questions under a shared topic. Each umbrella review focuses on a broad condition or problem for which there are two or more potential interventions, exposures, or outcomes of interest.

It is also known by several other names in the literature:

  • Overview of reviews (the preferred Cochrane terminology)
  • Review of reviews
  • Meta-review
  • Summaries of systematic reviews
  • Syntheses of reviews

These terms are used interchangeably across different institutions and journals, though methodological purists sometimes make fine distinctions between them.

Why Umbrella Reviews Exist

The exponential growth of biomedical publishing has created a paradox: researchers now have access to more evidence than ever before, yet synthesizing it all has become increasingly impractical. As the number of systematic reviews has grown year on year, a further level of synthesis has become necessary.

  • Thousands of systematic reviews are published annually across medicine, psychology, and public health
  • Multiple competing SRMAs often exist on the same narrow question, reaching different conclusions
  • Clinicians, guideline developers, and policymakers need a single, authoritative synthesis and not a reading list of competing reviews
  • Health technology assessments that evaluate all management options for a condition benefit enormously from a single umbrella document
  • A field may have been split into focused populations or interventions across many reviews; an umbrella review restores coherence by bringing them together

THE LUMPING VS. SPLITTING PROBLEM

Umbrella reviews solve a structural challenge: research on a broad topic tends to be systematically reviewed in narrow slices (by subgroup, intervention variant, or outcome). The umbrella review restores the comprehensive picture by synthesizing those slices without having to re-examine thousands of primary studies.

When to Use an Umbrella Review

Not every research question warrants an umbrella review. The method is most appropriate under specific conditions.

Ideal Conditions for an Umbrella Review

  • Two or more high-quality systematic reviews already exist on the topic
  • The field has been well-covered by SRMAs but lacks an overarching synthesis
  • Multiple competing interventions or exposures exist for a single condition
  • Guideline developers need a broad, defensible evidence base quickly
  • The question is broad enough that a single SR would be unwieldy or uninformative
  • Decision-makers need a rapid orientation to the state of evidence across a complex topic
  • There is confusion or contradiction between existing SRMAs that a higher-level synthesis could clarify

When NOT to Use an Umbrella Review

  • Existing systematic reviews are absent, sparse, or of very low quality
  • The research question is narrow enough that a single SR would suffice
  • The topic area is too new for secondary evidence to exist
  • Subgroup analyses of individual patient data are needed (an umbrella review cannot do this)
  • The goal is to re-analyse primary data; umbrella reviews work only at the review level

Umbrella Reviews vs. Other Review Types

Review Type | Unit of Analysis | Scope | Question Type | Typical Output
Umbrella Review | Systematic reviews & meta-analyses | Very broad | Multiple interventions/exposures/outcomes | Panoramic evidence map; evidence grading
Systematic Review | Primary studies (RCTs, cohorts, etc.) | Narrow–moderate | Specific PICO question | Pooled or narrative synthesis
Meta-Analysis | Quantitative data from primary studies | Narrow | One or few outcomes | Pooled effect estimate (OR, RR, MD)
Scoping Review | Primary studies (any design) | Broad | Landscape mapping; concept clarification | Map of evidence; research gaps
Narrative Review | Primary studies (selective) | Variable | Background, educational | Expert-guided summary; not reproducible
Rapid Review | Primary studies or SRs | Narrow–moderate | Policy-urgent questions | Abbreviated synthesis with speed tradeoffs
Realist Review | Primary studies | Variable | What works for whom in what context? | Mechanistic explanations; theory-building

Umbrella Review vs. Systematic Review: Key Differences

Dimension | Systematic Review | Umbrella Review
Building block | Primary study | Systematic review or meta-analysis
Eligibility criteria | Focused and specific | Broader, but still pre-specified
Search terms | Disease/intervention keywords | Disease/topic + systematic review OR meta-analysis
Quality appraisal tool | Cochrane RoB, ROBINS-I, NOS, etc. | AMSTAR-2, ROBIS, JBI checklist
Statistical re-analysis | Meta-analysis of primary data | Re-run existing meta-analyses using standardized methods
Time to complete | 12–24 months (typical) | 6–10 weeks (professional teams)
Overlap concern | Duplicate data in pooled analyses | Same primary studies across multiple SRs (use CCA)
Evidence hierarchy position | High | Highest currently available

The Evidence Hierarchy

Umbrella reviews sit at the apex of the evidence pyramid. Understanding where they stand relative to other study designs clarifies their unique value.

From the top of the pyramid to its base:

  1. Umbrella Reviews
  2. Systematic Reviews & Meta-Analyses
  3. Randomized Controlled Trials (RCTs)
  4. Cohort & Prospective Studies
  5. Case-Control & Cross-Sectional Studies
  6. Case Reports & Case Series
  7. Expert Opinion & Editorials

Umbrella reviews represent one of the highest levels of evidence synthesis currently available. They do not directly include primary studies; instead, they synthesize the work of dozens or hundreds of individual systematic reviews. This means that their conclusions rest on a very broad base of underlying research.

Registering an Umbrella Review Prospectively

Step 1: Develop a Protocol Before Starting Review Work

A protocol should be finalized before study selection/screening begins (and at minimum, before data extraction starts). It typically includes:

  • Research question(s) in PICO/PECO format
  • Eligibility criteria for systematic reviews to be included
  • Search strategy and databases
  • Quality appraisal tool(s) (e.g., AMSTAR-2, ROBIS, JBI)
  • Data extraction and synthesis plan
  • Handling of overlapping primary studies across reviews

Step 2: Choose a Registry

Registry | Cost | Notes
PROSPERO (CRD, University of York) | Free | Eligible review types include systematic, rapid, umbrella, diagnostic accuracy, prognostic, and methodological reviews; scoping reviews are not eligible
OSF (Open Science Framework) | Free | Common alternative, especially for non-PROSPERO-eligible designs
INPLASY | Paid | Faster turnaround, broader eligibility
Research Registry | Paid | Alternative option
JBI Systematic Review Register | Free (JBI members) | Title registry for JBI-affiliated reviews only

PROSPERO is the most widely recognized and commonly used registry for umbrella reviews in health research.

Step 3: Timing of Registration

  • Registration should take place once the review protocol has been finalised, ideally before screening studies for inclusion begins.
  • Reviews are accepted for registration as long as they have not progressed beyond completion of data extraction.
  • Completed reviews are not accepted. Registration must occur before the review is finished.
  • Aim to register before data extraction begins where possible; earlier registration is better and more credible for accountability purposes.

Step 4: Register on PROSPERO

  1. Go to the PROSPERO website (CRD, University of York)
  2. Create an account
  3. Complete the online registration form, including:
    • Review title and team details
    • Anticipated start/completion dates
    • Condition/domain being studied
    • Comparator/exposure and outcomes
    • Search strategy summary
    • Quality appraisal tool(s) to be used
  4. Submit for review by PROSPERO administrators
  5. Receive a unique CRD registration number (format: CRD4YYYYxxxxxx)

Step 5: Cite the Registration

  • A unique CRD identifier issued at approval is used by journals, peer reviewers, and funders to verify the published review matches the registered plan.  
  • Include the registration number in:
    • The protocol publication (if published separately)
    • The final manuscript’s abstract and methods section

Step 6: Handling Amendments

  • Amendments are dated and version-tracked.  
  • Deviations from the registered protocol are described in the manuscript’s methods section.  

Example Registration Numbers (Real Umbrella Reviews)

Umbrella Review Topic | PROSPERO ID
Psychological interventions post-stroke/TIA | CRD42022375947
Ethnic diversity in RCTs | CRD42022325241
SNPs and lung cancer risk | CRD42020204685
PPI with children/families | CRD42024608935
Postpartum depression risk factors | CRD420251249033

Quick Checklist

  • Protocol drafted (PICO, eligibility, search, appraisal, synthesis plan)
  • Registration completed before screening/data extraction
  • Quality appraisal tool specified in advance (AMSTAR-2, ROBIS, JBI)
  • PROSPERO/OSF number obtained
  • Registration number cited in final publication
  • Any protocol deviations documented and explained

This is a general informational overview based on current registry guidance; if precise current PROSPERO eligibility criteria are critical to your submission, it’s worth double-checking directly on the PROSPERO website, as policies can be updated periodically.

How to Conduct an Umbrella Review: Step-by-Step Methodology

Conducting an umbrella review is a rigorous process. Below are the essential steps, drawn from the leading methodological guidance documents including the JBI Manual, the BMJ Medicine guidelines, and published frameworks from Fusar-Poli & Radua.

Confirm the Review Is Needed & Register a Protocol

Before beginning, verify that no equivalent umbrella review has already been published or is underway (check PROSPERO and published literature). Pre-specify your protocol and register it on PROSPERO. This prevents selective reporting and establishes transparency. Protocol registration is increasingly required by journals.

Define the Research Question (PICO/PECO Framework)

Clearly articulate what you are asking. For intervention reviews, use the PICO framework (Population, Intervention, Comparison, Outcome). For epidemiological umbrella reviews, define the population(s), risk factor(s) or exposure(s), and outcome(s). The scope should be broader than a typical SR but still precisely delimited.

Develop Explicit Eligibility Criteria

Specify which types of SRMAs will be included and excluded. Typical criteria address publication type, language and date restrictions, minimum quality thresholds (or the decision to include all regardless of quality), whether to include SRMAs of observational studies only or RCTs only or both, and the definition of relevant population, intervention/exposure, and outcomes.

Construct a Two-Part Search Algorithm

The search string has two components combined using boolean AND: (1) a study design filter identifying SRMAs (e.g., ‘systematic review*’ OR ‘meta-analys*’), and (2) a topic filter covering all relevant keywords, MeSH terms, and synonyms. Search multiple databases (MEDLINE, Embase, Cochrane Library, PsycINFO) and grey literature sources.
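
The two-part structure can be sketched in a few lines of Python. This is only an illustration: the helper name `build_search_string` and the topic terms are hypothetical, and real strategies also need database-specific syntax (field tags, MeSH/Emtree headings, and rules for truncation inside quoted phrases) that this sketch does not attempt.

```python
def build_search_string(design_terms, topic_terms):
    """Join a study-design filter and a topic filter with Boolean AND."""
    def or_block(terms):
        # Quote multi-word phrases so they are searched as a unit.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        return "(" + " OR ".join(quoted) + ")"
    return f"{or_block(design_terms)} AND {or_block(topic_terms)}"

# Part 1: study-design filter for SRMAs (terms taken from the text above).
design_filter = ["systematic review*", "meta-analys*"]
# Part 2: topic filter (hypothetical example topic).
topic_filter = ["stroke", "transient ischaemic attack", "psychological intervention*"]

query = build_search_string(design_filter, topic_filter)
print(query)
```

Keeping the two filters as separate lists makes it easy to reuse the design filter across databases while adapting only the topic block.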

Screen Literature Independently (Double Screening)

Two independent reviewers screen titles and abstracts, then full texts, using the pre-specified eligibility criteria. Disagreements are resolved through discussion or a third reviewer. Document exclusion reasons at the full-text stage and present results in a PRISMA flow diagram.

Manage Overlap Between SRMAs

When multiple SRMAs cover the same primary studies and outcomes, decide which to include. Common strategies: choose the most recent SRMA; choose the SRMA with the largest number of studies; choose the SRMA with the highest AMSTAR-2 quality; for epidemiological reviews, choose the SRMA with the most prospective studies. Calculate the Corrected Covered Area (CCA) to quantify overlap.
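
Whichever rule is chosen, it should be fixed in the protocol and applied mechanically. The sketch below shows one way to encode the four strategies listed above; the `Srma` record, its field names, and the example reviews are all hypothetical, and ties are broken simply by list order.

```python
from dataclasses import dataclass

@dataclass
class Srma:
    label: str
    year: int
    n_studies: int
    amstar2: str            # "High", "Moderate", "Low", or "Critically Low"
    n_prospective: int = 0  # number of prospective primary studies

AMSTAR2_RANK = {"Critically Low": 0, "Low": 1, "Moderate": 2, "High": 3}

# One sort key per pre-specified selection strategy.
STRATEGIES = {
    "recency": lambda s: s.year,
    "size": lambda s: s.n_studies,
    "quality": lambda s: AMSTAR2_RANK[s.amstar2],
    "prospective": lambda s: s.n_prospective,
}

def select_srma(candidates, strategy):
    """Pick one SRMA from a set of overlapping candidates using a single rule."""
    return max(candidates, key=STRATEGIES[strategy])

overlapping = [
    Srma("Review A", 2019, 24, "Moderate", 10),
    Srma("Review B", 2023, 18, "High", 6),
    Srma("Review C", 2021, 31, "Low", 15),
]
for rule in STRATEGIES:
    print(rule, "->", select_srma(overlapping, rule).label)
```

The example shows why the rule matters: recency and quality favour Review B here, while size and number of prospective studies favour Review C.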

Extract Data Using Standardized Forms

Two reviewers independently extract data. For each included SRMA, extract: number of included studies and total sample size; study-specific effect estimates and 95% confidence intervals; heterogeneity statistics (I², Cochran’s Q, tau²); any risk-of-bias or quality assessments reported within the SRMA; and descriptive conclusions for narrative reviews without meta-analysis.

Re-Run Meta-Analyses with Standardized Methods

Rather than simply reporting the pooled estimates as published, re-run each meta-analysis using standardized statistical models. This ensures comparability across all included SRMAs. Conduct consistency checks and assess heterogeneity and potential biases (small study effects, excess significance bias) uniformly.
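
As an illustration of what "standardized" re-analysis can look like, the sketch below pools study-level estimates with a DerSimonian-Laird random-effects model and reports Cochran's Q, tau² and I². The choice of DerSimonian-Laird is an assumption made for this example (the text above does not prescribe a specific estimator), and the input values are hypothetical log odds ratios.

```python
import math

def random_effects_dl(effects, std_errors):
    """DerSimonian-Laird random-effects pooling.

    effects: study estimates on an additive scale (e.g. log odds ratios)
    std_errors: the corresponding standard errors
    """
    k = len(effects)
    w = [1.0 / se**2 for se in std_errors]                  # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                           # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0     # I² in percent
    w_star = [1.0 / (se**2 + tau2) for se in std_errors]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    ci95 = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return {"pooled": pooled, "se": se_pooled, "ci95": ci95,
            "Q": q, "tau2": tau2, "I2": i2}

# Hypothetical log odds ratios and standard errors extracted from one SRMA.
log_or = [0.10, 0.35, 0.22, 0.48]
se = [0.12, 0.15, 0.10, 0.20]
result = random_effects_dl(log_or, se)
print({name: value for name, value in result.items() if name != "ci95"})
print("OR 95% CI:", tuple(round(math.exp(b), 2) for b in result["ci95"]))
```

Applying the same function to every included meta-analysis is what makes the resulting estimates and heterogeneity statistics comparable across SRMAs.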

Assess Methodological Quality of Included SRMAs

Apply validated appraisal tools, most commonly AMSTAR-2, to each included SRMA. Appraisal should be done independently by two reviewers, with consensus discussion for discrepancies. Summarize quality findings in a table.

Grade the Strength of Evidence

For intervention reviews: apply GRADE. For epidemiological umbrella reviews: use criteria assessing amount of evidence, statistical significance, heterogeneity, small study effects, and excess significance bias. Consider performing sensitivity analyses restricted to prospective studies to examine temporality of associations.

Report Results Transparently

Report in both tabular and graphical formats. Key elements include summary tables of all meta-analyses with key statistics, evidence grading tables, PRISMA flow diagrams, and optional visual plots (forest plots, Manhattan plots). Address contradictory conclusions across SRMAs explicitly.

Interpret Findings Carefully

Interpret with attention to confounding (for observational reviews), clinical relevance, external validity/generalizability, and the limitations of the included SRMAs. Causal claims require extreme caution. Discuss biological plausibility and cite supporting evidence from other methodologies (e.g., Mendelian randomisation studies).

What is AMSTAR-2?

AMSTAR-2 (A MeaSurement Tool to Assess systematic Reviews, version 2) is a 16-item checklist used to evaluate the methodological quality of systematic reviews, including those with or without meta-analysis. In an umbrella review (a review that synthesizes findings from multiple systematic reviews/meta-analyses on related topics), AMSTAR-2 is applied to each included systematic review to judge how trustworthy its conclusions are before they’re pooled or compared.

Why AMSTAR-2 Matters in Umbrella Reviews

  • Umbrella reviews combine evidence from many systematic reviews, often covering overlapping primary studies.
  • The overall conclusions are only as reliable as the weakest review included.
  • AMSTAR-2 helps flag reviews with serious methodological flaws so their findings can be interpreted with appropriate caution or downweighted.

The 16 Items

# | Domain | Critical?
1 | PICO components in research questions | No
2 | Pre-established protocol | Yes
3 | Explanation for study design selection | No
4 | Comprehensive literature search | Yes
5 | Study selection in duplicate | No
6 | Data extraction in duplicate | No
7 | List of excluded studies with justification | Yes
8 | Adequate description of included studies | No
9 | Satisfactory risk of bias (RoB) assessment | Yes
10 | Reporting of funding sources for included studies | No
11 | Appropriate meta-analysis methods | Yes
12 | Assessment of RoB impact on results | No
13 | Accounting for RoB when interpreting results | Yes
14 | Explanation of heterogeneity | No
15 | Investigation of publication bias | Yes
16 | Disclosure of conflicts of interest | No

Rating Each Review

Each item is rated as one of the following (the Partial Yes option is offered only for some items):

  • Yes
  • Partial Yes
  • No

Overall Confidence Ratings

Based on the pattern of weaknesses, with most weight given to the 7 critical domains, each systematic review receives an overall rating:

Rating | Criteria
High | No or one non-critical weakness
Moderate | More than one non-critical weakness
Low | One critical flaw, with or without non-critical weaknesses
Critically Low | More than one critical flaw
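
The rating rules above are mechanical enough to sketch in code. In this illustration the critical set follows the seven critical domains named by the AMSTAR-2 authors (items 2, 4, 7, 9, 11, 13 and 15). Counting only a "No" as a weakness, so that "Partial Yes" is not penalised, is an assumption of this sketch; review teams should pre-specify their own rule.

```python
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})

def amstar2_overall(ratings, critical_items=CRITICAL_ITEMS):
    """Map item ratings {item_number: "Yes" | "Partial Yes" | "No"} to overall confidence."""
    # Assumption: only "No" counts as a weakness; "Partial Yes" does not.
    flaws = [item for item, rating in ratings.items() if rating == "No"]
    critical = sum(1 for item in flaws if item in critical_items)
    non_critical = len(flaws) - critical
    if critical > 1:
        return "Critically Low"
    if critical == 1:
        return "Low"
    if non_critical > 1:
        return "Moderate"
    return "High"

all_yes = {item: "Yes" for item in range(1, 17)}
print(amstar2_overall(all_yes))                           # no weaknesses
print(amstar2_overall({**all_yes, 3: "No", 10: "No"}))    # two non-critical weaknesses
print(amstar2_overall({**all_yes, 7: "No"}))              # one critical flaw
print(amstar2_overall({**all_yes, 4: "No", 15: "No"}))    # two critical flaws
```

The AMSTAR-2 authors present these ratings as guidance rather than a score to be summed, so the function is best treated as a consistency check on rater judgments, not a replacement for them.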

How Umbrella Review Authors Use AMSTAR-2

  • Two independent raters typically apply AMSTAR-2 to each included review, resolving disagreements by consensus or a third reviewer.
  • Results are usually presented in a summary table showing each review’s score per item and its overall confidence rating.
  • Critically low quality reviews may be:
    • Excluded from the primary synthesis
    • Reported separately as supporting/contextual evidence
    • Used in sensitivity analyses
  • Confidence ratings often feed into the overall certainty of evidence (alongside tools like GRADE) when drawing umbrella-level conclusions.

Common Reporting Table in Umbrella Reviews

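Included Review | Item 2 | Item 4 | Item 7 | Item 9 | Item 11 | Item 13 | Item 15 | Overall Rating
Review A | Yes | Yes | Partial Yes | Yes | Yes | Yes | Yes | Moderate
Review B | No | No | No | Partial Yes | Yes | No | No | Critically Low

Only the seven critical items are shown. In this illustrative example, Review A’s Moderate rating reflects more than one weakness in non-critical items, which are not displayed.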

Key Limitation

AMSTAR-2 assesses how a review was conducted, not whether its findings are correct. A methodologically strong review can still report null or modest effects, and a flawed one can still report a true effect. It’s a quality lens, not a validity verdict on the underlying evidence itself.

What is GRADE?

GRADE (Grading of Recommendations Assessment, Development and Evaluation) is a framework for rating the certainty of evidence for a given outcome and, where applicable, the strength of recommendations. In an umbrella review, GRADE is applied to the body of evidence underlying each outcome reported across the included systematic reviews/meta-analyses, helping readers judge how much confidence to place in each summary effect.

Why GRADE Matters in Umbrella Reviews

  • Umbrella reviews often report many outcome-exposure or outcome-intervention associations.
  • Not all associations are equally trustworthy, even if statistically significant.
  • GRADE provides a standardized way to communicate how confident readers should be that the reported effect reflects the true effect.

Starting Point Based on Study Design

Body of Evidence | Starting Certainty
Randomized controlled trials (RCTs) | High
Observational studies (cohort, case-control) | Low

The 5 Domains That Can Downgrade Certainty

Domain | What It Assesses
Risk of bias | Methodological limitations in the primary studies underlying the reviews
Inconsistency | Unexplained heterogeneity in effect estimates across studies/reviews
Indirectness | Differences in population, intervention, comparator, or outcome from the question of interest
Imprecision | Wide confidence intervals or small sample sizes/event numbers
Publication bias | Evidence that studies with null/negative results are missing

3 Factors That Can Upgrade Certainty (Mainly for Observational Evidence)

Factor | Description
Large effect size | Strong or very strong magnitude of association
Dose-response gradient | Effect increases consistently with exposure level
Plausible confounding | Confounders would likely reduce, not create, the observed effect

Final Certainty Ratings

Rating | Symbol | Interpretation
High | ⊕⊕⊕⊕ | True effect is close to the estimated effect
Moderate | ⊕⊕⊕◯ | True effect is probably close to estimate, but could differ
Low | ⊕⊕◯◯ | True effect may be substantially different
Very Low | ⊕◯◯◯ | Very little confidence in the estimate
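
The start-then-adjust logic can be sketched as a small function. This is a simplification for illustration: it assumes each "serious" concern lowers certainty by one level and each "very serious" concern by two, takes upgrades as a simple count, and clamps the result to the four levels. Real GRADE judgments are made domain by domain and are not purely arithmetic.

```python
LEVELS = ["Very Low", "Low", "Moderate", "High"]
SEVERITY = {"not serious": 0, "serious": 1, "very serious": 2}

def grade_certainty(design, concerns, upgrades=0):
    """design: "rct" or "observational".
    concerns: {domain_name: "not serious" | "serious" | "very serious"}.
    upgrades: number of levels added for large effect, dose-response, etc.
    """
    start = {"rct": 3, "observational": 1}[design]     # High vs. Low starting point
    downgrades = sum(SEVERITY[level] for level in concerns.values())
    index = max(0, min(3, start - downgrades + upgrades))
    return LEVELS[index]

# RCT evidence with one serious concern.
print(grade_certainty("rct", {"inconsistency": "serious"}))
# RCT evidence with three serious concerns.
print(grade_certainty("rct", {"risk of bias": "serious",
                              "inconsistency": "serious",
                              "imprecision": "serious"}))
# Observational evidence with no concerns and a large effect.
print(grade_certainty("observational", {}, upgrades=1))
```

Recording the per-domain concerns as data, rather than only the final rating, also makes it straightforward to populate a summary-of-findings table later.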

How Umbrella Review Authors Apply GRADE

  • GRADE is often combined with other classification systems specific to umbrella reviews (e.g., evidence classes based on significance, sample size, and 95% prediction intervals), but it remains the most widely recognized certainty framework.
  • Each outcome/association is assessed individually, since certainty can vary even within the same umbrella review.
  • Findings are typically presented in a GRADE summary table alongside effect estimates.

Example Summary Table

Outcome | Effect Estimate (95% CI) | Risk of Bias | Inconsistency | Indirectness | Imprecision | Publication Bias | Overall Certainty
Outcome 1 | RR 1.45 (1.20–1.75) | Not serious | Serious | Not serious | Not serious | Not serious | Moderate
Outcome 2 | OR 0.88 (0.60–1.30) | Serious | Serious | Not serious | Serious | Not detected | Very Low

GRADE vs AMSTAR-2

  • AMSTAR-2 assesses the methodological quality of each included systematic review (the “wrapper”).
  • GRADE assesses the certainty of the evidence for each outcome (the underlying findings).
  • Used together, they give umbrella review readers a fuller picture: was the review conducted well, AND can we trust its reported effect?

Key Limitation

GRADE was originally developed for single systematic reviews informing clinical guidelines, so applying it within umbrella reviews requires adaptation. This is particularly relevant when you are summarizing across multiple overlapping reviews with shared primary studies, which can complicate judgments about imprecision and inconsistency.

Managing Overlap of Primary Studies in Umbrella Reviews

One of the most technically complex challenges in umbrella reviews is overlap: many systematic reviews on the same topic will have included some or all of the same primary studies. If left unaddressed, overlap can inflate the apparent evidence base and artificially narrow confidence intervals. This would make results look more precise than they actually are.

Why Overlap Matters

  • A single large, high-quality RCT included in five separate meta-analyses may have disproportionate influence on the overall umbrella review finding
  • Overlap that inflates sample sizes can produce misleadingly low p-values, increasing Type I error (false positive) risk
  • Contradictory conclusions between SRMAs on the same question may stem from differing eligibility criteria, search dates, or statistical methods

Strategies for Handling Overlap

Strategy | Description | Best Used When
Corrected Covered Area (CCA) | Quantifies the proportion of primary studies shared across SRMAs. CCA <0.05 = slight; 0.05–0.10 = moderate; 0.11–0.15 = high; >0.15 = very high overlap. | Always, as a reporting measure alongside any overlap decisions
Restrict by recency | Among overlapping SRMAs, include only the most recently published version | When the topic has evolved rapidly and newer SRMAs incorporate more updated evidence
Restrict by size | Prefer the SRMA with the most included primary studies | When comprehensiveness of coverage is the priority
Restrict by quality | Use AMSTAR-2 scores to select the methodologically highest-quality SRMA | When methodological rigour is the priority
Include all + CCA reporting | Include all SRMAs and quantify overlap using CCA; interpret results in light of the overlap magnitude | When comprehensiveness and transparency are prioritized; common in observational umbrella reviews
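
For readers who want to see the CCA calculation itself, the sketch below uses the usual citation-matrix formula, CCA = (N − r) / (r × c − r), where N is the total number of study inclusions across reviews (repeats counted), r is the number of unique primary studies, and c is the number of reviews. The review labels and study identifiers are hypothetical, and the handling of values falling between 0.10 and 0.11 is a choice made for this example.

```python
def corrected_covered_area(citation_matrix):
    """citation_matrix: {review_label: set of primary-study identifiers}."""
    c = len(citation_matrix)                                 # reviews (columns)
    r = len(set().union(*citation_matrix.values()))          # unique studies (rows)
    n = sum(len(studies) for studies in citation_matrix.values())  # all inclusions
    if c < 2 or r == 0:
        raise ValueError("CCA needs at least two reviews and one primary study")
    return (n - r) / (r * c - r)

def interpret_cca(cca):
    """Bands follow the thresholds quoted in the table above."""
    if cca < 0.05:
        return "slight"
    if cca <= 0.10:
        return "moderate"
    if cca <= 0.15:
        return "high"
    return "very high"

matrix = {
    "Review A": {"s1", "s2", "s3", "s4"},
    "Review B": {"s3", "s4", "s5"},
    "Review C": {"s4", "s6"},
}
cca = corrected_covered_area(matrix)
print(f"CCA = {cca:.2f} ({interpret_cca(cca)} overlap)")
```

In this example there are 9 inclusions of 6 unique studies across 3 reviews, so CCA = (9 − 6) / (18 − 6) = 0.25.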

Important clarification

An umbrella review does not combine or re-pool results from different meta-analyses in a grand statistical synthesis. It describes and grades the evidence from each SRMA separately. Therefore, overlap does not carry the same statistical risk as in a traditional meta-analysis, but it must still be acknowledged and managed for interpretive clarity.

Quality Assessment Tools for Umbrella Reviews

Evaluating the methodological quality of included SRMAs is essential to interpreting umbrella review findings. The quality of an umbrella review is ultimately contingent on the quality of its constituent SRMAs.

AMSTAR-2 (Primary Tool)

AMSTAR-2 is the most widely used tool for appraising systematic reviews in the context of umbrella reviews. It covers 16 items and distinguishes between critical and non-critical domains.

ROBIS for Umbrella Reviews

ROBIS (Risk of Bias in Systematic Reviews) is a tool for assessing the risk of bias in systematic reviews, rather than just the quality of their reporting. It was developed for systematic reviews in general; in an umbrella review, it is applied to each included review.

Structure: 3 Phases

Phase | Purpose
Phase 1 | Assess relevance of the review to the umbrella review’s question (optional)
Phase 2 | Identify concerns in 4 domains via signaling questions
Phase 3 | Judge overall risk of bias based on Phase 2

The 4 Domains (Phase 2)

Domain | Focus
1. Study eligibility criteria | Were inclusion criteria appropriate and clearly defined before conducting the review?
2. Identification & selection of studies | Was the search comprehensive and selection bias minimized?
3. Data collection & study appraisal | Were data extraction and risk-of-bias assessments of primary studies done appropriately?
4. Synthesis & findings | Were methods for synthesis, and interpretation of results, appropriate?

Signaling Question Responses

  • Yes
  • Probably Yes
  • Probably No
  • No
  • No Information

Overall Risk of Bias Judgment

Rating | Meaning
Low | Few or no concerns across domains
High | Concerns in one or more domains significantly affecting confidence
Unclear | Insufficient information to judge

Use in Umbrella Reviews

  • Each domain receives a low/high/unclear rating, then an overall risk of bias judgment is made for the review as a whole.
  • Often used as an alternative or complement to AMSTAR-2. Some umbrella reviews use both for triangulation.
  • Reviews rated high risk of bias may be flagged, excluded from primary synthesis, or interpreted cautiously.

ROBIS vs. AMSTAR-2

Feature | ROBIS | AMSTAR-2
Primary focus | Risk of bias | Methodological quality
Domains | 4 | 7 critical + 9 non-critical (16 total)
Overall rating | Low/High/Unclear | High/Moderate/Low/Critically Low
Common in | Public health, epidemiology | Health interventions broadly

JBI Critical Appraisal Checklist for Umbrella Reviews

The JBI (Joanna Briggs Institute) Critical Appraisal Checklist for Systematic Reviews and Research Syntheses is part of a suite of design-specific JBI tools, widely used in nursing, allied health, and JBI-affiliated reviews.

Checklist Items (11 Questions)

# | Question Focus
1 | Were the review questions clearly stated?
2 | Were inclusion criteria appropriate?
3 | Was the search strategy appropriate?
4 | Were sources/resources for studies adequate?
5 | Were criteria for appraising studies appropriate?
6 | Was critical appraisal conducted by ≥2 reviewers independently?
7 | Were methods to minimize errors in data extraction used?
8 | Were appropriate methods used to combine studies?
9 | Was likelihood of publication bias assessed?
10 | Were recommendations for policy/practice supported by data?
11 | Were specific directives for new research appropriate?

Response Options

  • Yes
  • No
  • Unclear
  • Not Applicable

Use in Umbrella Reviews

  • Particularly favored when the umbrella review itself follows JBI methodology (which has its own formal guidance for conducting umbrella reviews).
  • Results presented in a simple summary table (reviews × items, with Y/N/U/NA responses).
  • No formal overall “score” or cutoff; appraisal informs narrative judgment about including/weighting a review.
  • Often paired with JBI’s own umbrella review conduct guidelines, creating methodological consistency for JBI-affiliated authors.

JBI Checklist vs. AMSTAR-2

Feature | JBI Checklist | AMSTAR-2
Item count | 11 | 16
Overall rating system | None (narrative) | Yes (4-tier)
Discipline association | Nursing/allied health | General health sciences
Companion conduct guidance | JBI Umbrella Review Methodology | None specific

PRISMA Checklist for Umbrella Reviews

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) is not a quality or risk-of-bias appraisal tool. It is a reporting guideline ensuring transparency and completeness of how a review (including umbrella reviews) is described.

Key Distinction

Tool | What It Assesses
AMSTAR-2 / ROBIS / JBI | Methodological quality / risk of bias
PRISMA | Completeness and transparency of reporting

PRISMA Components

Component | Description
27-item checklist | Covers title, abstract, methods, results, discussion, funding
PRISMA flow diagram | Visualizes study identification, screening, exclusion, and inclusion
PRISMA extensions | Specialized versions (e.g., PRISMA-P for protocols). PRISMA itself has no umbrella-review extension; the separately developed PRIOR statement covers overviews of reviews of healthcare interventions, and PRISMA 2020 is also commonly adapted

Use in Umbrella Reviews

  • Authors use PRISMA to structure and report the umbrella review itself and not to appraise the included systematic reviews.
  • A completed PRISMA flow diagram documents how many systematic reviews were identified, screened, excluded (with reasons), and included.
  • Journals frequently require a PRISMA checklist submission alongside the manuscript.
  • Improves reproducibility and transparency, allowing readers to trace the selection process for included reviews.
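
Because the flow diagram is ultimately a chain of subtractions, it is worth checking that the numbers reconcile before submission. The following is a minimal sketch with hypothetical counts and exclusion reasons; the function name and stage labels are illustrative rather than taken from the PRISMA template.

```python
def prisma_flow(identified, duplicates, excluded_at_screening, full_text_exclusions):
    """Derive the counts for each flow-diagram stage.

    full_text_exclusions: {reason: count} for reviews excluded at full text.
    """
    records_screened = identified - duplicates
    full_texts_assessed = records_screened - excluded_at_screening
    included = full_texts_assessed - sum(full_text_exclusions.values())
    counts = {
        "identified": identified,
        "records_screened": records_screened,
        "full_texts_assessed": full_texts_assessed,
        "included": included,
    }
    if min(counts.values()) < 0:
        raise ValueError("Exclusions exceed the records available at some stage")
    return counts

# Hypothetical numbers for an umbrella review search.
flow = prisma_flow(
    identified=1240,
    duplicates=310,
    excluded_at_screening=845,
    full_text_exclusions={
        "not a systematic review": 40,
        "wrong population": 15,
        "no relevant outcome": 8,
    },
)
print(flow)
```

Keeping full-text exclusion reasons as a dictionary mirrors the requirement to report exclusions with reasons at that stage.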

Relationship to Quality Appraisal Tools

  • PRISMA compliance does not indicate methodological quality. A review can be well-reported but methodologically weak (low AMSTAR-2/high ROBIS risk), or vice versa.
  • PRISMA is therefore typically used alongside, not instead of, AMSTAR-2, ROBIS, or JBI tools in umbrella reviews.

Practical note

A common debate in umbrella reviews is whether to exclude low-quality SRMAs from the synthesis. Most methodologists recommend including all SRMAs regardless of quality (to avoid overestimating or underestimating effect sizes from incomplete data), while clearly reporting quality ratings and conducting sensitivity analyses that restrict findings to higher-quality reviews.

How to Grade the Evidence in an Umbrella Review

Quality assessment (is this SRMA well-conducted?) and evidence grading (how strong is the overall body of evidence?) are distinct steps that are often confused.

GRADE for Intervention Reviews

When the umbrella review addresses interventions, GRADE is the validated approach. GRADE evaluates certainty of evidence across five factors:

  • Risk of bias in the underlying studies
  • Inconsistency: unexplained variability across studies
  • Indirectness: applicability of evidence to the specific question
  • Imprecision: width of confidence intervals
  • Publication bias: likelihood of selective reporting

Evidence is rated as: High → Moderate → Low → Very Low certainty.
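
The rating logic can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the body of evidence starts at High (as GRADE does for randomized trials), each domain contributes 0, 1 (serious concern), or 2 (very serious concern) levels of downgrading, and upgrading factors are ignored. The function and domain names are hypothetical, and real GRADE judgments are made by reviewers, not computed mechanically.

```python
LEVELS = ["Very Low", "Low", "Moderate", "High"]
DOMAINS = (
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
)


def grade_certainty(downgrades: dict, start: str = "High") -> str:
    """Lower the starting certainty by the total number of downgrade levels.

    `downgrades` maps a domain name to 0, 1 or 2; missing domains count as 0.
    """
    unknown = set(downgrades) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    if any(v not in (0, 1, 2) for v in downgrades.values()):
        raise ValueError("each downgrade must be 0, 1 or 2")
    index = LEVELS.index(start) - sum(downgrades.values())
    return LEVELS[max(index, 0)]  # certainty cannot drop below Very Low


# Serious inconsistency and serious imprecision: two levels below High.
print(grade_certainty({"inconsistency": 1, "imprecision": 1}))
```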

Ioannidis Criteria for Epidemiological Reviews

For umbrella reviews of observational or epidemiological associations (risk factors, predictors), a widely used complementary framework evaluates each meta-analysis on:

Criterion | Description | Threshold (example)
Amount of evidence | Total number of cases or participants | ≥1,000 cases
Statistical significance | P-value for the summary effect estimate | P < 0.001
Heterogeneity | I² statistic across studies | I² < 50%
Prediction interval | Does the 95% PI exclude the null? | PI excludes 1.0 (or 0)
Small study effects | Egger’s test or funnel plot asymmetry | P > 0.10 for Egger’s test
Excess significance bias | More significant studies than expected | P > 0.10 for excess significance test

Based on these criteria, each association may be classified as: Convincing, Highly Suggestive, Suggestive, Weak, or Not Significant.
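
As an illustration, the example thresholds in the table can be checked programmatically for each meta-analysis. The Python sketch below only reports which criteria are met. It deliberately does not assign a tier, because the exact mapping from criteria to classes is defined by each umbrella review's protocol. The class, field names, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetaAnalysisSummary:
    """Re-analysed statistics for one association (field names are illustrative)."""
    n_cases: int
    p_value: float
    i_squared: float            # in percent
    prediction_interval: tuple  # (lower, upper) on the effect scale
    null_value: float           # 1.0 for ratio measures, 0.0 for differences
    egger_p: float
    excess_significance_p: float


def check_criteria(ma: MetaAnalysisSummary) -> dict:
    """Apply the example thresholds from the table; True means the criterion is met."""
    lower, upper = ma.prediction_interval
    return {
        "Amount of evidence": ma.n_cases >= 1000,
        "Statistical significance": ma.p_value < 0.001,
        "Heterogeneity": ma.i_squared < 50,
        "Prediction interval": not (lower <= ma.null_value <= upper),
        "Small study effects": ma.egger_p > 0.10,
        "Excess significance bias": ma.excess_significance_p > 0.10,
    }


# Hypothetical association, for illustration only.
example = MetaAnalysisSummary(
    n_cases=4200,
    p_value=0.0002,
    i_squared=62.0,
    prediction_interval=(0.95, 1.80),
    null_value=1.0,
    egger_p=0.34,
    excess_significance_p=0.51,
)
for criterion, met in check_criteria(example).items():
    print(f"{criterion}: {'met' if met else 'not met'}")
```

Reporting the individual criteria alongside the final class lets readers see why an association was downgraded.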

How to Report Umbrella Reviews

Transparent reporting is critical to umbrella review quality. Several reporting guidelines apply:

Reporting Standards for Umbrella Reviews

Guideline | Full Name | Applicable To
PRISMA-OvR | Preferred Reporting Items for Overviews of Reviews | All umbrella/overview reviews; the primary reporting standard
MOOSE | Meta-analysis Of Observational Studies in Epidemiology | Epidemiological umbrella reviews of observational data
PRISMA 2020 | Preferred Reporting Items for Systematic Reviews and Meta-Analyses | Baseline guidance applicable to the overall structure
GRADE SoF tables | Summary of Findings tables (GRADE format) | Presenting evidence certainty for intervention reviews
PROSPERO Protocol | International Prospective Register of Systematic Reviews | Pre-registration of protocol before data collection

Key Reporting Elements in Umbrella Review Results

For each included SRMA, report:

  • Total number of events or cases (binary outcomes) and total sample size
  • Number of included primary studies
  • Effect size metric used (OR, RR, HR, MD, SMD)
  • Meta-analysis model (fixed-effect vs. random-effects)
  • Summary effect estimate and 95% confidence interval
  • 95% prediction interval
  • Heterogeneity statistics (I², Cochran’s Q p-value, tau²)
  • Effect estimate from the largest single included study
  • Results of small study effects and excess significance tests
  • Overall evidence grade
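
Several of the statistics in this list come out of a single calculation. The Python sketch below shows one way to obtain the summary estimate, its 95% confidence interval, Cochran's Q, tau², and I² from study-level effects. It assumes the DerSimonian-Laird random-effects estimator, which is one common choice and is not prescribed by this guide, and it assumes effects are supplied on an additive scale (for example log odds ratios) with known variances. The function name and the input values are hypothetical. The prediction interval is left out because it needs a t-distribution quantile; dedicated meta-analysis software is the safer route for that.

```python
import math


def random_effects_summary(effects, variances):
    """DerSimonian-Laird random-effects pooling of study-level effects."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and mean, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    tau2 = max(0.0, (q - df) / c)                       # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # percent

    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    estimate = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    return {
        "estimate": estimate,
        "ci_low": estimate - 1.96 * se,
        "ci_high": estimate + 1.96 * se,
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical log odds ratios and variances from three primary studies.
summary = random_effects_summary([0.0, 0.3, 0.6], [0.04, 0.04, 0.04])
print({name: round(value, 3) for name, value in summary.items()})
```

Applying the same function to every included meta-analysis is what makes the re-analysis standardized: each association is pooled with the same model regardless of what the original SRMA used.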

Software & Tools for Umbrella Reviews

Tool | Purpose | Stage of Review
Rayyan | AI-assisted title/abstract screening; collaboration between reviewers | Screening
Covidence | Full-text screening, data extraction, conflict resolution | Screening & extraction
EPPI-Reviewer | Systematic review management; useful for large umbrella reviews | All stages
R (metafor package) | Statistical meta-analysis; heterogeneity and bias tests | Statistical analysis
Stata (meta suite) | Meta-analytic modelling; funnel plots; Egger’s test | Statistical analysis
RevMan | Cochrane’s review management tool; forest plots | Analysis & reporting
GROOVE tool | Graphical representation of overlap across systematic reviews | Overlap assessment
GRADEpro GDT | GRADE evidence profiling; Summary of Findings tables | Evidence grading
PROSPERO | Protocol registration registry | Pre-review planning
ATLAS.ti / NVivo | Qualitative data management (for narrative synthesis components) | Qualitative synthesis

Strengths & Limitations of Umbrella Reviews

Strengths

  • Provides the highest-level, most comprehensive overview of evidence on a broad topic in a single document
  • Efficient for decision-makers: saves time compared to reading dozens of individual SRMAs
  • Particularly valuable for health technology assessments evaluating all management options for a condition
  • Resolves the ‘lumping vs. splitting’ tension in research synthesis
  • Enables comparison of the strength of evidence across multiple interventions, exposures, or outcomes simultaneously
  • Can reveal contradictions between existing SRMAs and explain their sources
  • Standardized re-analysis of each meta-analysis corrects errors in published SRMAs that used inappropriate statistical models
  • Feasible in 6–10 weeks for professional teams, since primary data re-analysis is not required

Limitations

  • ‘Garbage in, garbage out’: An umbrella review is wholly dependent on the accuracy and rigour of the included SRMAs. If the underlying reviews are biased, the umbrella review inherits those biases.
  • Cannot fill evidence gaps: If a research area lacks systematic reviews, an umbrella review cannot be conducted.
  • No individual patient data (IPD) analysis: Because primary data are not re-examined, subgroup analyses at the participant level are not possible.
  • Overlap inflation: Even with CCA management, overlapping primary studies across SRMAs can create an illusion of more independent evidence than actually exists.
  • Difficulty in causal inference: For epidemiological umbrella reviews, confounding, reverse causality, and selection bias remain serious threats.
  • Clinical heterogeneity: Combining SRMAs that varied in their populations, interventions, comparators, and outcomes can make the overall synthesis clinically difficult to interpret.

Annotated Real-World Examples

The following examples illustrate how umbrella reviews function in practice — what questions they asked, what methods they used, and what their findings demonstrated.

EXAMPLE 1 · EPIDEMIOLOGY

Risk Factors for the Onset of Type 2 Diabetes Mellitus

  • What it asked: Which non-genetic factors are associated with developing type 2 diabetes, and how strong is the evidence for each?
  • Scope: Synthesized 142 epidemiological associations from multiple SRMAs. Population: individuals without T2DM at study baseline. Exposure: any non-genetic factor. Outcome: incident T2DM.
  • Overlap handling: When multiple SRMAs covered the same exposure-outcome pair, researchers selected the SRMA with the largest number of prospective studies, to preserve temporality of association (exposure before outcome).
  • Evidence grading: Applied Ioannidis criteria to each of the 142 associations, classifying each as convincing, highly suggestive, suggestive, weak, or not significant.
  • Visualization: Results were presented in a comprehensive table of all 142 associations with full statistics, plus a Manhattan plot (a visual borrowed from genomics) that made the panoramic pattern of evidence immediately readable.
  • Why it was useful: No single systematic review could have assessed 142 associations simultaneously. The umbrella review revealed which risk factors had convincing, consistent evidence and which were spurious — directly informing prevention guidelines.
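
The overlap-handling rule described in this example (keep, for each exposure-outcome pair, the SRMA with the most prospective studies) can be sketched in Python as follows. The record layout, identifiers, and counts are hypothetical and are not data from the diabetes review.

```python
def select_one_per_pair(srmas):
    """Keep one SRMA per (exposure, outcome): the one with the most prospective studies.

    Ties keep the first SRMA encountered; a real protocol should pre-specify
    its own tie-breaker (for example, the most recent search date).
    """
    best = {}
    for srma in srmas:
        key = (srma["exposure"], srma["outcome"])
        current = best.get(key)
        if current is None or srma["n_prospective"] > current["n_prospective"]:
            best[key] = srma
    return list(best.values())


# Hypothetical candidate SRMAs, for illustration only.
candidates = [
    {"id": "SRMA-1", "exposure": "processed meat", "outcome": "T2DM", "n_prospective": 9},
    {"id": "SRMA-2", "exposure": "processed meat", "outcome": "T2DM", "n_prospective": 14},
    {"id": "SRMA-3", "exposure": "physical activity", "outcome": "T2DM", "n_prospective": 22},
]
for srma in select_one_per_pair(candidates):
    print(srma["exposure"], "->", srma["id"])
```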

EXAMPLE 2 · PHARMACOLOGY / WOMEN’S HEALTH

Menopausal Hormone Therapy and Women’s Health

  • What it asked: What are the effects of menopausal hormone therapy (MHT) across a wide range of health outcomes: cardiovascular disease, fractures, cancer, cognition, and others?
  • Why an umbrella review was needed: Dozens of systematic reviews existed on individual outcomes of MHT. Clinicians and guideline developers needed a single document that assessed and compared the evidence across all relevant outcomes at once.
  • Methods: Included SRMAs on randomized and observational designs. Quality was assessed using AMSTAR-2. Evidence certainty was graded using GRADE, yielding High/Moderate/Low/Very Low ratings for each outcome.
  • Key value: The umbrella review showed, side-by-side, that MHT has moderate-certainty evidence of benefit for fracture prevention and hot flashes, but very low-certainty evidence for cognitive outcomes. This is a nuanced, actionable picture that no individual SR provided.

EXAMPLE 3 · PSYCHIATRY / MENTAL HEALTH

Umbrella Reviews in Early Psychosis

  • Context: Paolo Fusar-Poli and Joaquim Radua, the authors of the landmark ‘Ten Simple Rules for Conducting Umbrella Reviews’, applied umbrella review methodology extensively in psychiatry, particularly around risk factors for and interventions in early psychosis.
  • Evidence stratification: Each risk factor for psychosis transition was classified into evidence tiers, allowing clinicians to quickly identify factors with convincing vs. weak evidence — critical for clinical risk calculators and early intervention programs.
  • Good practice shown: These umbrella reviews pre-specified the protocol, defined variables of interest (such as transition to psychosis as a clear binary outcome), estimated a common effect size (OR) across all SRMAs, and reported heterogeneity and 95% prediction intervals for each association.

EXAMPLE 4 · NUTRITION SCIENCE

Diet-Associated Inflammation and 38 Chronic Disease Outcomes

  • What it asked: What is the strength of evidence linking dietary inflammatory potential to 38 different chronic disease outcomes?
  • Efficiency demonstrated: This umbrella review was completed in approximately one year and assessed 38 chronic disease outcomes. This is a scope that would be entirely infeasible if starting from primary studies.
  • Limitations acknowledged: The authors explicitly noted that the review could not capture associations not yet covered by published meta-analyses, and that individual patient data analyses were not possible within the umbrella review framework.

EXAMPLE 5 · INFECTIOUS DISEASE / PUBLIC HEALTH

Long COVID Prevalence and Risk Factors (Rapid Systematic Umbrella Review)

  • Context: As the COVID-19 pandemic generated a rapid explosion of primary studies and systematic reviews on Long COVID, an umbrella review approach allowed researchers to synthesize the evidence at the review level.
  • Unique use case: The umbrella review explicitly used the systematic review level to examine common biases and limitations across the field — demonstrating how umbrella reviews can serve a methodological surveillance function, not just a substantive evidence synthesis one.
  • What it found: 14 reviews covering 5–196 primary studies were included. Pooling was not performed; instead, a descriptive meta-synthesis of prevalence estimates, risk factors, and bias patterns was conducted. This example illustrates that umbrella reviews can be qualitative/narrative as well as statistical.

Key Takeaways

  • An umbrella review is a systematic review of systematic reviews and/or meta-analyses; it synthesizes evidence at the highest available level, one step above individual SRMAs in the evidence hierarchy.
  • It is also called an ‘overview of reviews,’ ‘meta-review,’ or ‘review of reviews’. These terms are largely interchangeable in the current literature, though Cochrane prefers ‘overview of reviews.’
  • Umbrella reviews are appropriate only when a research topic is already well-covered by existing SRMAs. They cannot substitute for a primary systematic review where SRMAs do not yet exist.
  • The unit of analysis is the SRMA, not the primary study. But researchers typically re-run each meta-analysis using standardized statistical methods rather than simply reporting the published pooled estimates.
  • The two-part search algorithm (study design filter + topic filter, combined with AND) is the defining methodological feature that differentiates an umbrella review search from a standard systematic review search.
  • Overlap (the same primary studies appearing in multiple included SRMAs) is a key methodological challenge. The Corrected Covered Area (CCA) is the standard metric to quantify and report overlap.
  • AMSTAR-2 is the dominant tool for appraising the methodological quality of included SRMAs; ROBIS and JBI checklists are also used. Quality assessment is distinct from evidence grading.
  • GRADE is used to grade certainty of evidence for intervention umbrella reviews; the Ioannidis criteria are widely used for epidemiological umbrella reviews.
  • Pre-registration on PROSPERO and reporting adherent to PRISMA-OvR are the expected standards for publication-ready umbrella reviews.
  • Umbrella reviews inherit the biases and limitations of the SRMAs they include. A well-conducted umbrella review of low-quality SRMAs will still yield unreliable conclusions.
  • Umbrella reviews work from the study-level results reported in SRMAs. They do not return to raw primary data and are not designed for individual patient data (IPD) subgroup analyses.
  • The efficiency advantage of umbrella reviews is substantial: professional teams can often complete one in 6–10 weeks, compared to 12–24 months for a full de novo systematic review.

Frequently Asked Questions

Can I do an umbrella review if some of the systematic reviews I find are low quality? Do I have to exclude them?

This is one of the most commonly debated questions in umbrella review practice. The general methodological consensus is: include all eligible SRMAs regardless of quality, but clearly report quality ratings and conduct sensitivity analyses that restrict findings to higher-quality reviews.

Excluding low-quality SRMAs a priori risks distorting the evidence base: either by overestimating or underestimating effect sizes from a selectively curated subset. Instead, assess quality using AMSTAR-2 (or ROBIS/JBI), present the ratings transparently in a table, and discuss how the inclusion of lower-quality reviews may have influenced your overall conclusions. Sensitivity analyses restricted to ‘high’ and ‘moderate’ AMSTAR-2 ratings are the appropriate way to test whether your conclusions hold when low-quality reviews are removed.
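
A sensitivity analysis of this kind amounts to repeating the summary on a filtered subset. The Python sketch below is a minimal illustration: it compares the range of summary estimates across all included SRMAs with the range among those rated high or moderate. The record layout, identifiers, and odds ratios are hypothetical.

```python
def restrict_by_quality(srmas, keep=("high", "moderate")):
    """Return only the SRMAs whose AMSTAR-2 rating is in `keep` (case-insensitive)."""
    return [s for s in srmas if s["amstar2"].strip().lower() in keep]


def estimate_range(srmas):
    """Smallest and largest summary estimate, or None if the list is empty."""
    values = [s["summary_or"] for s in srmas]
    return (min(values), max(values)) if values else None


# Hypothetical ratings and summary odds ratios, for illustration only.
srmas = [
    {"id": "SRMA-A", "amstar2": "High", "summary_or": 1.42},
    {"id": "SRMA-B", "amstar2": "Moderate", "summary_or": 1.31},
    {"id": "SRMA-C", "amstar2": "Low", "summary_or": 1.95},
    {"id": "SRMA-D", "amstar2": "Critically low", "summary_or": 2.10},
]
restricted = restrict_by_quality(srmas)

print("All reviews:", len(srmas), estimate_range(srmas))
print("High/moderate only:", len(restricted), estimate_range(restricted))
```

If the restricted range is much narrower or shifted, as in this made-up example, the discussion should say that the larger effects come mainly from lower-quality reviews.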

If multiple systematic reviews all include the same studies, isn’t the ‘umbrella review’ just inflating the evidence base and showing me one study multiple times?

This concern is valid and points to the overlap problem, which is the most technically complex challenge in umbrella reviews. However, there are two important clarifications.

First, umbrella reviews do not statistically pool the results of different meta-analyses into a grand combined estimate. Each SRMA is analysed and reported separately, so the inflation risk is interpretive, not arithmetical. Second, the standard approach is to calculate the Corrected Covered Area (CCA), whose values are conventionally classified as slight (0–5%), moderate (6–10%), high (11–15%), or very high (>15%). This metric helps readers understand how much of the apparent evidence is truly independent, a critical caveat that should be prominently reported.
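
For a concrete sense of the calculation, the Python sketch below computes CCA from a mapping of reviews to the primary studies they include. It assumes the commonly cited formula CCA = (N − r) / (r × c − r), where N is the total number of inclusions counted across all reviews, r is the number of unique primary studies, and c is the number of reviews. The formula is restated here as an assumption because this passage does not spell it out. The review and study identifiers are hypothetical, and the handling of values that fall between the published bands is also an assumption.

```python
def corrected_covered_area(reviews: dict) -> float:
    """Return CCA as a percentage.

    `reviews` maps a review identifier to the set of primary-study
    identifiers that the review includes.
    """
    c = len(reviews)
    if c < 2:
        raise ValueError("CCA needs at least two reviews")
    unique_studies = set().union(*reviews.values())
    r = len(unique_studies)
    if r == 0:
        raise ValueError("no primary studies supplied")
    n = sum(len(studies) for studies in reviews.values())
    return 100.0 * (n - r) / (r * c - r)


def overlap_band(cca_percent: float) -> str:
    """Map a CCA percentage to a descriptive band (boundary handling is assumed)."""
    if cca_percent <= 5:
        return "slight"
    if cca_percent <= 10:
        return "moderate"
    if cca_percent <= 15:
        return "high"
    return "very high"


# Hypothetical citation matrix: three reviews sharing some primary studies.
reviews = {
    "SRMA-A": {"s1", "s2", "s3"},
    "SRMA-B": {"s2", "s3", "s4"},
    "SRMA-C": {"s3", "s4", "s5"},
}
cca = corrected_covered_area(reviews)
print(f"CCA = {cca:.1f}% ({overlap_band(cca)})")
```

Here nine inclusions cover five unique studies across three reviews, so CCA = (9 − 5) / (15 − 5) = 40%, which falls in the very high band.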

Do I need a full team to conduct an umbrella review, or can I do it alone? Is it really faster than a systematic review?

Unlike a standard systematic review, which mandates double-screening and double-extraction by at least two independent reviewers, an umbrella review does not strictly require a team, according to some guidance documents. That said, best practice still calls for independent dual-screening to minimise selection bias, and independent data extraction for key statistics.

As for speed: yes, umbrella reviews are substantially faster than de novo systematic reviews. Professional teams can typically complete an umbrella review in 6–10 weeks. The time savings come from not having to search for, screen, and extract data from thousands of primary studies, only from a smaller pool of existing SRMAs. However, the statistical re-analysis step can add significant time for large umbrella reviews covering many associations.

Two of the systematic reviews I want to include reach completely opposite conclusions on the same question. What do I do?

Contradictory conclusions across SRMAs on the same question are not a failure. In fact, exposing and explaining these contradictions is one of the most valuable contributions an umbrella review can make.

Common explanations include: different eligibility criteria (populations, comparators, or outcome definitions); different search dates; different statistical models (fixed-effect vs. random-effects); different quality thresholds; and language or publication bias differences in search strategies.

Report the conflicting SRMAs side by side in your results tables, explain the likely sources of divergence in your discussion, and note what additional primary research would be needed to resolve the contradiction.

Can an umbrella review include non-intervention reviews, like reviews of prevalence, diagnostic accuracy, or qualitative studies?

Yes. Although most of the established methodology has been developed and applied in the intervention and epidemiological association contexts, umbrella reviews are not inherently restricted to these domains.

Umbrella reviews of observational/epidemiological SRMAs are now very common. Umbrella reviews of diagnostic accuracy SRMAs are feasible but require specialised statistical methods (bivariate/SROC models). Umbrella reviews including qualitative systematic reviews are possible but remain methodologically less standardised. The JBI Manual for Evidence Synthesis provides specific guidance for umbrella reviews incorporating qualitative evidence alongside quantitative SRMAs.

When an umbrella review and a single systematic review both exist on the same question, which should I cite in my paper or guideline?

When both exist, the umbrella review is generally preferred as the citation for evidence-based decision-making because it

  1. consolidates findings from multiple SRMAs,
  2. addresses the overlap problem,
  3. applies standardised quality assessment, and
  4. provides a more complete and reliable picture of the evidence than any single SRMA can.

The important caveat: an umbrella review is only as good as the quality and completeness of the SRMAs it synthesises. If the best available SRMA covers more recent trials than an older umbrella review, citing the updated SRMA alongside the umbrella review may be appropriate until an updated umbrella review is published.

References

  1. Ten simple rules for conducting umbrella reviews. https://pmc.ncbi.nlm.nih.gov/articles/PMC10270421/
  2. Types of Reviews: Umbrella Reviews. https://laneguides.stanford.edu/types-of-reviews/umbrella
  3. Umbrella reviews: a methodological guide. https://academic.oup.com/eurjcn/article/24/6/996/7974731
  4. How to Conduct Umbrella Review in Education? A Step-by-Step Methodological Guide Through a Case Study in Digital Diaries. https://journals.sagepub.com/doi/10.1177/20965311261421966
