Contents
- Glossary of Key Terms
- What Is an Umbrella Review?
- Why Umbrella Reviews Exist
- When to Use an Umbrella Review
- Umbrella Reviews vs. Other Review Types
- Umbrella Review vs. Systematic Review: Key Differences
- The Evidence Hierarchy
- Registering an Umbrella Review Prospectively
- How to Conduct an Umbrella Review: Step-by-Step Methodology
- What is AMSTAR-2?
- What is GRADE?
- Managing Overlap of Primary Studies in Umbrella Reviews
- Quality Assessment Tools for Umbrella Reviews
- How to Grade the Evidence in an Umbrella Review
- How to Report Umbrella Reviews
- Software & Tools for Umbrella Reviews
- Strengths & Limitations of Umbrella Reviews
- Annotated Real-World Examples
- Key Takeaways
- Frequently Asked Questions
- References
Glossary of Key Terms
Before diving in, here are the key terms you will encounter throughout this guide, defined concisely for quick reference.
| Term | Definition |
| UMBRELLA REVIEW (UR) | A systematic review whose unit of analysis is other systematic reviews or meta-analyses. Also called ‘review of reviews’ or ‘overview of reviews.’ |
| SYSTEMATIC REVIEW (SR) | A comprehensive, reproducible synthesis of all available primary research on a defined question, using explicit pre-specified methods. |
| META-ANALYSIS (MA) | A statistical technique used within a systematic review to mathematically pool results from multiple studies to produce a single summary effect estimate. |
| SRMA | Systematic Review and/or Meta-Analysis: the primary unit of inclusion in an umbrella review. |
| AMSTAR-2 | A Measurement Tool to Assess Systematic Reviews (version 2). A 16-item validated checklist used to critically appraise the methodological quality of included SRMAs. |
| GRADE | Grading of Recommendations Assessment, Development and Evaluation. A system for rating the certainty of evidence and the strength of clinical recommendations. |
| ROBIS | Risk Of Bias In Systematic reviews. A tool for assessing the risk of bias in systematic reviews, complementary to AMSTAR-2. |
| JBI | Joanna Briggs Institute. Publishes the primary methodological handbook for umbrella reviews (Chapter 10 of the JBI Manual for Evidence Synthesis). |
| PICO(S) | Population, Intervention, Comparison, Outcome (Study design). The framework used to structure research questions in intervention-based reviews. |
| PRISMA | Preferred Reporting Items for Systematic Reviews and Meta-Analyses. The dominant reporting guideline for systematic reviews; a separate guideline, PRIOR (Preferred Reporting Items for Overviews of Reviews), covers overviews of reviews of healthcare interventions. |
| HETEROGENEITY (I²) | A statistical measure of variability in results across studies in a meta-analysis. High I² (>75%) signals that results differ substantially beyond chance. |
| CORRECTED COVERED AREA (CCA) | A metric used to quantify the degree of overlap of primary studies across multiple meta-analyses included in an umbrella review. |
| PROSPERO | International Prospective Register of Systematic Reviews. The standard registry for pre-registering review protocols to increase transparency. |
| PREDICTION INTERVAL (PI) | A range within which the true effect in a new, similar study would be expected to fall. Wider PIs indicate higher uncertainty about the generalisability of an effect. |
| SMALL STUDY EFFECTS | The tendency for smaller studies to report larger effect sizes, potentially indicating publication bias. Assessed via Egger’s test and funnel plot asymmetry. |
| EXCESS SIGNIFICANCE BIAS | When the number of statistically significant studies in a meta-analysis exceeds the number expected given the studies’ statistical power, a marker of potential selective reporting. |
What Is an Umbrella Review?
An umbrella review is a systematic review of previously published systematic reviews and/or meta-analyses. Where a typical systematic review synthesizes primary studies (randomized controlled trials, cohort studies, etc.), an umbrella review sits one level higher: its building blocks are themselves the products of evidence synthesis.
The name is apt: an umbrella review holds multiple systematic reviews under a single canopy, offering a panoramic view of an entire research landscape rather than a narrow cross-section of it.
CORE DEFINITION
An umbrella review is a systematic review whose unit of analysis is other systematic reviews or meta-analyses, aggregating findings from several reviews that address specific questions under a shared topic. Each umbrella review focuses on a broad condition or problem for which there are two or more potential interventions, exposures, or outcomes of interest.
It is also known by several other names in the literature:
- Overview of reviews (the preferred Cochrane terminology)
- Review of reviews
- Meta-review
- Summaries of systematic reviews
- Syntheses of reviews
These terms are used interchangeably across different institutions and journals, though methodological purists sometimes make fine distinctions between them.
Why Umbrella Reviews Exist
The exponential growth of biomedical publishing has created a paradox: researchers now have access to more evidence than ever before, yet synthesizing it all is increasingly difficult. As the number of systematic reviews has grown year-on-year, a new level of synthesis has become necessary.
- Thousands of systematic reviews are published annually across medicine, psychology, and public health
- Multiple competing SRMAs often exist on the same narrow question, reaching different conclusions
- Clinicians, guideline developers, and policymakers need a single, authoritative synthesis rather than a reading list of competing reviews
- Health technology assessments that evaluate all management options for a condition benefit enormously from a single umbrella document
- A field may have been split into focused populations or interventions across many reviews; an umbrella review restores coherence by bringing them together
THE LUMPING VS. SPLITTING PROBLEM
Umbrella reviews solve a structural challenge: research on a broad topic tends to be systematically reviewed in narrow slices (by subgroup, intervention variant, or outcome). The umbrella review restores the comprehensive picture by synthesizing those slices without having to re-examine thousands of primary studies.
When to Use an Umbrella Review
Not every research question warrants an umbrella review. The method is most appropriate under specific conditions.
Ideal Conditions for an Umbrella Review
- Two or more high-quality systematic reviews already exist on the topic
- The field has been well-covered by SRMAs but lacks an overarching synthesis
- Multiple competing interventions or exposures exist for a single condition
- Guideline developers need a broad, defensible evidence base quickly
- The question is broad enough that a single SR would be unwieldy or uninformative
- Decision-makers need a rapid orientation to the state of evidence across a complex topic
- There is confusion or contradiction between existing SRMAs that a higher-level synthesis could clarify
When NOT to Use an Umbrella Review
- Existing systematic reviews are absent, sparse, or of very low quality
- The research question is narrow enough that a single SR would suffice
- The topic area is too new for secondary evidence to exist
- Subgroup analyses of individual patient data are needed (an umbrella review cannot do this)
- The goal is to re-analyse primary data; umbrella reviews work only at the review level
Umbrella Reviews vs. Other Review Types
| Review Type | Unit of Analysis | Scope | Question Type | Typical Output |
| Umbrella Review | Systematic reviews & meta-analyses | Very broad | Multiple interventions/exposures/outcomes | Panoramic evidence map; evidence grading |
| Systematic Review | Primary studies (RCTs, cohorts, etc.) | Narrow–moderate | Specific PICO question | Pooled or narrative synthesis |
| Meta-Analysis | Quantitative data from primary studies | Narrow | One or few outcomes | Pooled effect estimate (OR, RR, MD) |
| Scoping Review | Primary studies (any design) | Broad | Landscape mapping; concept clarification | Map of evidence; research gaps |
| Narrative Review | Primary studies (selective) | Variable | Background, educational | Expert-guided summary; not reproducible |
| Rapid Review | Primary studies or SRs | Narrow–moderate | Policy-urgent questions | Abbreviated synthesis with speed tradeoffs |
| Realist Review | Primary studies | Variable | What works for whom in what context? | Mechanistic explanations; theory-building |
Umbrella Review vs. Systematic Review: Key Differences
| Dimension | Systematic Review | Umbrella Review |
| Building block | Primary study | Systematic review or meta-analysis |
| Eligibility criteria | Focused and specific | Broader, but still pre-specified |
| Search terms | Disease/intervention keywords | Disease/topic + systematic review OR meta-analysis |
| Quality appraisal tool | Cochrane RoB, ROBINS-I, NOS, etc. | AMSTAR-2, ROBIS, JBI checklist |
| Statistical re-analysis | Meta-analysis of primary data | Re-run existing meta-analyses using standardized methods |
| Time to complete | 12–24 months (typical) | 6–10 weeks (professional teams) |
| Overlap concern | Duplicate data in pooled analyses | Same primary studies across multiple SRs (use CCA) |
| Evidence hierarchy position | High | Highest currently available |
The Evidence Hierarchy
Umbrella reviews sit at the apex of the evidence pyramid. Understanding where they stand relative to other study designs clarifies their unique value.
| Umbrella Reviews |
| Systematic Reviews & Meta-Analyses |
| Randomized Controlled Trials (RCTs) |
| Cohort & Prospective Studies |
| Case-Control & Cross-Sectional Studies |
| Case Reports & Case Series |
| Expert Opinion & Editorials |
Umbrella reviews represent one of the highest levels of evidence synthesis currently available. They do not directly include primary studies; instead, they synthesize the work of dozens or hundreds of individual systematic reviews. This means that their conclusions rest on a very broad base of underlying research.
Registering an Umbrella Review Prospectively
Step 1: Develop a Protocol Before Starting Review Work
A protocol should be finalized before study selection/screening begins (and at minimum, before data extraction starts). It typically includes:
- Research question(s) in PICO/PECO format
- Eligibility criteria for systematic reviews to be included
- Search strategy and databases
- Quality appraisal tool(s) (e.g., AMSTAR-2, ROBIS, JBI)
- Data extraction and synthesis plan
- Handling of overlapping primary studies across reviews
Step 2: Choose a Registry
PROSPERO is the most widely recognized and commonly used registry for umbrella reviews in health research.
Step 3: Timing of Registration
- Registration should take place once the review protocol has been finalised, ideally before screening studies for inclusion begins.
- Registration should be completed before data extraction begins; earlier registration is more credible for accountability purposes.
- PROSPERO's rules on how far a review may have progressed at the time of registration have changed over time, so confirm the current cut-off with the registry.
- Completed reviews are not accepted. Registration must occur before the review is finished.
Step 4: Register on PROSPERO
- Go to the PROSPERO website (CRD, University of York)
- Create an account
- Complete the online registration form, including:
- Review title and team details
- Anticipated start/completion dates
- Condition/domain being studied
- Comparator/exposure and outcomes
- Search strategy summary
- Quality appraisal tool(s) to be used
- Submit for review by PROSPERO administrators
- Receive a unique CRD registration number (typically of the form CRD42YYYYxxxxxx; recently issued numbers may be longer)
Step 5: Cite the Registration
- A unique CRD identifier issued at approval is used by journals, peer reviewers, and funders to verify the published review matches the registered plan.
- Include the registration number in:
- The protocol publication (if published separately)
- The final manuscript’s abstract and methods section
Step 6: Handling Amendments
- Amendments are dated and version-tracked.
- Deviations from the registered protocol are described in the manuscript’s methods section.
Example Registration Numbers (Real Umbrella Reviews)
| Umbrella Review Topic | PROSPERO ID |
| Psychological interventions post-stroke/TIA | CRD42022375947 |
| Ethnic diversity in RCTs | CRD42022325241 |
| SNPs and lung cancer risk | CRD42020204685 |
| PPI with children/families | CRD42024608935 |
| Postpartum depression risk factors | CRD420251249033 |
Quick Checklist
- Protocol drafted (PICO, eligibility, search, appraisal, synthesis plan)
- Registration completed before screening/data extraction
- Quality appraisal tool specified in advance (AMSTAR-2, ROBIS, JBI)
- PROSPERO/OSF number obtained
- Registration number cited in final publication
- Any protocol deviations documented and explained
This is a general informational overview based on current registry guidance; if precise current PROSPERO eligibility criteria are critical to your submission, it’s worth double-checking directly on the PROSPERO website, as policies can be updated periodically.
How to Conduct an Umbrella Review: Step-by-Step Methodology
Conducting an umbrella review is a rigorous process. Below are the essential steps, drawn from the leading methodological guidance documents including the JBI Manual, the BMJ Medicine guidelines, and published frameworks from Fusar-Poli & Radua.
Confirm the Review Is Needed & Register a Protocol
Before beginning, verify that no equivalent umbrella review has already been published or is underway (check PROSPERO and published literature). Pre-specify your protocol and register it on PROSPERO. This prevents selective reporting and establishes transparency. Protocol registration is increasingly required by journals.
Define the Research Question (PICO/PECO Framework)
Clearly articulate what you are asking. For intervention reviews, use the PICO framework (Population, Intervention, Comparison, Outcome). For epidemiological umbrella reviews, define the population(s), risk factor(s) or exposure(s), and outcome(s). The scope should be broader than a typical SR but still precisely delimited.
Develop Explicit Eligibility Criteria
Specify which types of SRMAs will be included and excluded. Typical criteria address publication type, language and date restrictions, minimum quality thresholds (or the decision to include all regardless of quality), whether to include SRMAs of observational studies only or RCTs only or both, and the definition of relevant population, intervention/exposure, and outcomes.
Construct a Two-Part Search Algorithm
The search string has two components combined using boolean AND: (1) a study design filter identifying SRMAs (e.g., ‘systematic review*’ OR ‘meta-analys*’), and (2) a topic filter covering all relevant keywords, MeSH terms, and synonyms. Search multiple databases (MEDLINE, Embase, Cochrane Library, PsycINFO) and grey literature sources.
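The two-part structure described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_search_query` and the topic terms are hypothetical, and the output is a generic boolean string. Each database has its own syntax for field tags, truncation, and phrase searching, so the result would need adapting before use.

```python
def build_search_query(design_terms, topic_terms):
    """Join a study-design filter and a topic filter with boolean AND."""

    def or_block(terms):
        # Quote multi-word phrases so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        return "(" + " OR ".join(quoted) + ")"

    return f"{or_block(design_terms)} AND {or_block(topic_terms)}"


# Part 1: study design filter (terms taken from the text above).
design = ["systematic review*", "meta-analys*"]
# Part 2: topic filter (hypothetical example terms).
topic = ["depression", "depressive disorder*"]

query = build_search_query(design, topic)
print(query)
```

Keeping the two filters as separate lists makes it easier to report each component in the protocol and to reuse the design filter across databases.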
Screen Literature Independently (Double Screening)
Two independent reviewers screen titles and abstracts, then full texts, using the pre-specified eligibility criteria. Disagreements are resolved through discussion or a third reviewer. Document exclusion reasons at the full-text stage and present results in a PRISMA flow diagram.
Manage Overlap Between SRMAs
When multiple SRMAs cover the same primary studies and outcomes, decide which to include. Common strategies: choose the most recent SRMA; choose the SRMA with the largest number of studies; choose the SRMA with the highest AMSTAR-2 quality; for epidemiological reviews, choose the SRMA with the most prospective studies. Calculate the Corrected Covered Area (CCA) to quantify overlap.
Extract Data Using Standardized Forms
Two reviewers independently extract data. For each included SRMA, extract: number of included studies and total sample size; study-specific effect estimates and 95% confidence intervals; heterogeneity statistics (I², Cochran’s Q, tau²); any risk-of-bias or quality assessments reported within the SRMA; and descriptive conclusions for narrative reviews without meta-analysis.
Re-Run Meta-Analyses with Standardized Methods
Rather than simply reporting the pooled estimates as published, re-run each meta-analysis using standardized statistical models. This ensures comparability across all included SRMAs. Conduct consistency checks and assess heterogeneity and potential biases (small study effects, excess significance bias) uniformly.
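As a rough illustration of what "standardized statistical models" can mean in practice, the sketch below re-pools one meta-analysis with a DerSimonian-Laird random-effects model and reports Cochran's Q, tau² and I². The choice of estimator is an assumption made here for illustration, not a method prescribed by the guidance documents, and the input data are hypothetical log risk ratios with their standard errors.

```python
import math
from statistics import NormalDist


def random_effects_dl(effects, std_errors, level=0.95):
    """Pool study effects with the DerSimonian-Laird random-effects estimator."""
    k = len(effects)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching inputs")

    variances = [se ** 2 for se in std_errors]
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the method-of-moments estimate of tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))

    z = NormalDist().inv_cdf(0.5 + level / 2)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - z * se_pooled, pooled + z * se_pooled),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical log risk ratios and standard errors from one included SRMA.
log_rr = [0.10, 0.35, 0.22, 0.48]
se = [0.12, 0.15, 0.09, 0.20]
result = random_effects_dl(log_rr, se)
print(f"Pooled RR: {math.exp(result['pooled']):.2f}, I2: {result['i2']:.1f}%")
```

Applying the same function to every included meta-analysis is what makes the resulting estimates and heterogeneity statistics comparable across SRMAs.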
Assess Methodological Quality of Included SRMAs
Apply validated appraisal tools, most commonly AMSTAR-2, to each included SRMA. Appraisal should be done independently by two reviewers, with consensus discussion for discrepancies. Summarize quality findings in a table.
Grade the Strength of Evidence
For intervention reviews: apply GRADE. For epidemiological umbrella reviews: use criteria assessing amount of evidence, statistical significance, heterogeneity, small study effects, and excess significance bias. Consider performing sensitivity analyses restricted to prospective studies to examine temporality of associations.
Report Results Transparently
Report in both tabular and graphical formats. Key elements include summary tables of all meta-analyses with key statistics, evidence grading tables, PRISMA flow diagrams, and optional visual plots (forest plots, Manhattan plots). Address contradictory conclusions across SRMAs explicitly.
Interpret Findings Carefully
Interpret with attention to confounding (for observational reviews), clinical relevance, external validity/generalizability, and the limitations of the included SRMAs. Causal claims require extreme caution. Discuss biological plausibility and cite supporting evidence from other methodologies (e.g., Mendelian randomisation studies).
What is AMSTAR-2?
AMSTAR-2 (A MeaSurement Tool to Assess systematic Reviews, version 2) is a 16-item checklist used to evaluate the methodological quality of systematic reviews, with or without meta-analysis. In an umbrella review, AMSTAR-2 is applied to each included systematic review to judge how trustworthy its conclusions are before they are synthesized or compared.
Why AMSTAR-2 Matters in Umbrella Reviews
- Umbrella reviews combine evidence from many systematic reviews, often covering overlapping primary studies.
- The overall conclusions are only as reliable as the weakest review included.
- AMSTAR-2 helps flag reviews with serious methodological flaws so their findings can be interpreted with appropriate caution or downweighted.
The 16 Items
| # | Domain | Critical? |
| 1 | PICO components in research questions | No |
| 2 | Pre-established protocol | Yes |
| 3 | Explanation for study design selection | No |
| 4 | Comprehensive literature search | Yes |
| 5 | Study selection in duplicate | No |
| 6 | Data extraction in duplicate | No |
| 7 | List of excluded studies with justification | Yes |
| 8 | Adequate description of included studies | No |
| 9 | Satisfactory risk of bias (RoB) assessment | Yes |
| 10 | Reporting of funding sources for included studies | No |
| 11 | Appropriate meta-analysis methods | Yes |
| 12 | Assessment of RoB impact on results | No |
| 13 | Accounting for RoB when interpreting results | Yes |
| 14 | Explanation of heterogeneity | No |
| 15 | Investigation of publication bias | Yes |
| 16 | Disclosure of conflicts of interest | No |
Rating Each Review
Each item is rated as:
- Yes
- Partial Yes
- No
Overall Confidence Ratings
Based on the pattern of weaknesses across the 7 critical domains, each systematic review receives an overall rating:
| Rating | Criteria |
| High | No or one non-critical weakness |
| Moderate | More than one non-critical weakness |
| Low | One critical flaw, with or without non-critical weaknesses |
| Critically Low | More than one critical flaw |
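The rating rules in the table above can be expressed as a small function. This is a sketch under stated assumptions: the critical items default to those named in the AMSTAR-2 publication (2, 4, 7, 9, 11, 13, 15), only a "No" rating is counted as a weakness (treating "Partial Yes" as acceptable is a simplification made here), and the function name is hypothetical. The overall rating is meant as a guide, so reviewers may still adjust it by judgment.

```python
# Critical items as named in the AMSTAR-2 publication; passed as a
# parameter so a review team can adapt the set if it justifies doing so.
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})


def amstar2_overall(ratings, critical_items=CRITICAL_ITEMS):
    """Map item ratings to an overall confidence rating.

    ratings: dict of item number (1-16) -> "Yes", "Partial Yes" or "No".
    Assumption: only "No" counts as a weakness.
    """
    weak_items = {item for item, rating in ratings.items() if rating == "No"}
    critical_flaws = len(weak_items & critical_items)
    non_critical = len(weak_items - critical_items)

    if critical_flaws > 1:
        return "Critically Low"
    if critical_flaws == 1:
        return "Low"
    if non_critical > 1:
        return "Moderate"
    return "High"


# Hypothetical review: everything "Yes" except two non-critical items.
review = {item: "Yes" for item in range(1, 17)}
review[3] = "No"
review[10] = "No"
print(amstar2_overall(review))
```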
How Umbrella Review Authors Use AMSTAR-2
- Two independent raters typically apply AMSTAR-2 to each included review, resolving disagreements by consensus or a third reviewer.
- Results are usually presented in a summary table showing each review’s score per item and its overall confidence rating.
- Critically low quality reviews may be:
- Excluded from the primary synthesis
- Reported separately as supporting/contextual evidence
- Used in sensitivity analyses
- Confidence ratings often feed into the overall certainty of evidence (alongside tools like GRADE) when drawing umbrella-level conclusions.
Common Reporting Table in Umbrella Reviews
| Included Review | Item 2 | Item 4 | Item 7 | Item 9 | Item 11 | Item 13 | Item 15 | Overall Rating |
| Review A | Yes | Yes | Partial Yes | Yes | Yes | Yes | Yes | Moderate |
| Review B | No | No | No | Partial Yes | Yes | No | No | Critically Low |
Key Limitation
AMSTAR-2 assesses how a review was conducted, not whether its findings are correct. A methodologically strong review can still report null or modest effects, and a flawed one can still report a true effect. It’s a quality lens, not a validity verdict on the underlying evidence itself.
What is GRADE?
GRADE (Grading of Recommendations Assessment, Development and Evaluation) is a framework for rating the certainty of evidence for a given outcome and, where applicable, the strength of recommendations. In an umbrella review, GRADE is applied to the body of evidence underlying each outcome reported across the included systematic reviews/meta-analyses. It helps readers judge how much confidence to place in each summary effect.
Why GRADE Matters in Umbrella Reviews
- Umbrella reviews often report many outcome-exposure or outcome-intervention associations.
- Not all associations are equally trustworthy, even if statistically significant.
- GRADE provides a standardized way to communicate how confident readers should be that the reported effect reflects the true effect.
Starting Point Based on Study Design
| Body of Evidence | Starting Certainty |
| Randomized controlled trials (RCTs) | High |
| Observational studies (cohort, case-control) | Low |
The 5 Domains That Can Downgrade Certainty
| Domain | What It Assesses |
| Risk of bias | Methodological limitations in the primary studies underlying the reviews |
| Inconsistency | Unexplained heterogeneity in effect estimates across studies/reviews |
| Indirectness | Differences in population, intervention, comparator, or outcome from the question of interest |
| Imprecision | Wide confidence intervals or small sample sizes/event numbers |
| Publication bias | Evidence that studies with null/negative results are missing |
3 Factors That Can Upgrade Certainty (Mainly for Observational Evidence)
| Factor | Description |
| Large effect size | Strong or very strong magnitude of association |
| Dose-response gradient | Effect increases consistently with exposure level |
| Plausible confounding | Confounders would likely reduce, not create, the observed effect |
Final Certainty Ratings
| Rating | Symbol | Interpretation |
| High | ⊕⊕⊕⊕ | True effect is close to the estimated effect |
| Moderate | ⊕⊕⊕◯ | True effect is probably close to estimate, but could differ |
| Low | ⊕⊕◯◯ | True effect may be substantially different |
| Very Low | ⊕◯◯◯ | Very little confidence in the estimate |
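The bookkeeping behind these tables (a starting level, minus downgrades, plus upgrades, bounded at both ends) can be sketched as follows. GRADE is a structured judgment rather than an arithmetic rule, so this is only a way of recording decisions already made. The function name and the encoding of concerns as integers (1 for serious, 2 for very serious) are assumptions made for illustration.

```python
LEVELS = ("Very Low", "Low", "Moderate", "High")
START = {"rct": 3, "observational": 1}  # index into LEVELS
DOMAINS = (
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
)


def grade_certainty(design, downgrades=None, upgrades=0):
    """Return the final certainty level for one outcome.

    design: "rct" or "observational".
    downgrades: dict of domain -> levels to subtract (1 = serious,
        2 = very serious); domains not listed count as "not serious".
    upgrades: total levels added for large effect, dose-response
        gradient, or plausible confounding.
    """
    if design not in START:
        raise ValueError(f"unknown design: {design}")
    downgrades = downgrades or {}
    unknown = set(downgrades) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")

    score = START[design] - sum(downgrades.values()) + upgrades
    score = max(0, min(len(LEVELS) - 1, score))  # clamp to Very Low..High
    return LEVELS[score]


# Hypothetical outcome from RCT evidence with serious inconsistency.
print(grade_certainty("rct", {"inconsistency": 1}))
```

Recording the per-domain decisions in this form also makes it straightforward to populate a summary-of-findings table for each outcome.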
How Umbrella Review Authors Apply GRADE
- Often combined with other classification systems specific to umbrella reviews (e.g., evidence classes based on significance, sample size, and 95% prediction intervals), but GRADE remains the most widely recognized certainty framework.
- Each outcome/association is assessed individually, since certainty can vary even within the same umbrella review.
- Findings are typically presented in a GRADE summary table alongside effect estimates.
Example Summary Table
| Outcome | Effect Estimate (95% CI) | Risk of Bias | Inconsistency | Indirectness | Imprecision | Publication Bias | Overall Certainty |
| Outcome 1 | RR 1.45 (1.20–1.75) | Not serious | Serious | Not serious | Not serious | Not serious | Moderate |
| Outcome 2 | OR 0.88 (0.60–1.30) | Serious | Serious | Not serious | Serious | Not detected | Very Low |
GRADE vs AMSTAR-2
- AMSTAR-2 assesses the methodological quality of each included systematic review (the “wrapper”).
- GRADE assesses the certainty of the evidence for each outcome (the underlying findings).
- Used together, they give umbrella review readers a fuller picture: was the review conducted well, AND can we trust its reported effect?
Key Limitation
GRADE was originally developed for single systematic reviews informing clinical guidelines, so applying it within umbrella reviews requires adaptation. This is particularly relevant when you are summarizing across multiple overlapping reviews with shared primary studies, which can complicate judgments about imprecision and inconsistency.
Managing Overlap of Primary Studies in Umbrella Reviews
One of the most technically complex challenges in umbrella reviews is overlap: many systematic reviews on the same topic will have included some or all of the same primary studies. If left unaddressed, overlap can inflate the apparent evidence base and artificially narrow confidence intervals. This would make results look more precise than they actually are.
Why Overlap Matters
- A single large, high-quality RCT included in five separate meta-analyses may have disproportionate influence on the overall umbrella review finding
- Overlap that inflates sample sizes can produce misleadingly low p-values, increasing Type I error (false positive) risk
- Contradictory conclusions between SRMAs on the same question may stem from differing eligibility criteria, search dates, or statistical methods
Strategies for Handling Overlap
| Strategy | Description | Best Used When |
| Corrected Covered Area (CCA) | Quantifies the proportion of primary studies shared across SRMAs. CCA <0.05 = slight; 0.05–0.10 = moderate; 0.11–0.15 = high; >0.15 = very high overlap. | Always — as a reporting measure alongside any overlap decisions |
| Restrict by recency | Among overlapping SRMAs, include only the most recently published version | When the topic has evolved rapidly and newer SRMAs incorporate more updated evidence |
| Restrict by size | Prefer the SRMA with the most included primary studies | When comprehensiveness of coverage is the priority |
| Restrict by quality | Use AMSTAR-2 scores to select the methodologically highest-quality SRMA | When methodological rigour is the priority |
| Include all + CCA reporting | Include all SRMAs and quantify overlap using CCA; interpret results in light of the overlap magnitude | When comprehensiveness and transparency are prioritized; common in observational umbrella reviews |
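To make the CCA concrete: it is computed from a citation matrix as CCA = (N − r) / (r × c − r), where N is the total number of study inclusions counted across all reviews (duplicates included), r is the number of unique primary studies, and c is the number of reviews. The sketch below assumes each review is represented as a set of primary-study identifiers; the review and study names are hypothetical, and the category cut-offs follow the table above, with values up to 0.10 treated as moderate.

```python
def corrected_covered_area(reviews):
    """Compute the CCA from a mapping of review -> set of primary study IDs."""
    c = len(reviews)
    unique_studies = set().union(*reviews.values()) if reviews else set()
    r = len(unique_studies)
    if c < 2 or r == 0:
        raise ValueError("need at least two reviews and one primary study")
    n = sum(len(studies) for studies in reviews.values())
    return (n - r) / (r * c - r)


def overlap_category(cca):
    """Label the overlap using the thresholds from the table above."""
    if cca < 0.05:
        return "slight"
    if cca <= 0.10:
        return "moderate"
    if cca <= 0.15:
        return "high"
    return "very high"


# Hypothetical citation data: three reviews sharing some primary studies.
reviews = {
    "Review A": {"s1", "s2", "s3"},
    "Review B": {"s2", "s3", "s4"},
    "Review C": {"s3", "s5"},
}
cca = corrected_covered_area(reviews)
print(f"CCA = {cca:.2f} ({overlap_category(cca)} overlap)")
```

Here N = 8, r = 5 and c = 3, so CCA = 3 / 10 = 0.30, which falls in the very high overlap band.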
Important clarification
An umbrella review does not combine or re-pool results from different meta-analyses in a grand statistical synthesis. It describes and grades the evidence from each SRMA separately. Therefore, overlap does not carry the same statistical risk as in a traditional meta-analysis, but it must still be acknowledged and managed for interpretive clarity.
Quality Assessment Tools for Umbrella Reviews
Evaluating the methodological quality of included SRMAs is essential to interpreting umbrella review findings. The quality of an umbrella review is ultimately contingent on the quality of its constituent SRMAs.
AMSTAR-2 (Primary Tool)
AMSTAR-2 is the most widely used tool for appraising systematic reviews in the context of umbrella reviews. It covers 16 items and distinguishes between critical and non-critical domains.
ROBIS for Umbrella Reviews
ROBIS (Risk of Bias in Systematic Reviews) is a tool developed to assess the risk of bias in systematic reviews, rather than just their reporting quality. In an umbrella review, it is applied to each included systematic review.
Structure: 3 Phases
| Phase | Purpose |
| Phase 1 | Assess relevance of the review to the umbrella review’s question (optional) |
| Phase 2 | Identify concerns in 4 domains via signaling questions |
| Phase 3 | Judge overall risk of bias based on Phase 2 |
The 4 Domains (Phase 2)
| Domain | Focus |
| 1. Study eligibility criteria | Were inclusion criteria appropriate and clearly defined before conducting the review? |
| 2. Identification & selection of studies | Was the search comprehensive and selection bias minimized? |
| 3. Data collection & study appraisal | Were data extraction and risk-of-bias assessments of primary studies done appropriately? |
| 4. Synthesis & findings | Were methods for synthesis, and interpretation of results, appropriate? |
Signaling Question Responses
- Yes
- Probably Yes
- Probably No
- No
- No Information
Overall Risk of Bias Judgment
| Rating | Meaning |
| Low | Few or no concerns across domains |
| High | Concerns in one or more domains significantly affecting confidence |
| Unclear | Insufficient information to judge |
Use in Umbrella Reviews
- Each domain receives a low/high/unclear rating, then an overall risk of bias judgment is made for the review as a whole.
- Often used as an alternative or complement to AMSTAR-2. Some umbrella reviews use both for triangulation.
- Reviews rated high risk of bias may be flagged, excluded from primary synthesis, or interpreted cautiously.
ROBIS vs. AMSTAR-2
| Feature | ROBIS | AMSTAR-2 |
| Primary focus | Risk of bias | Methodological quality |
| Domains | 4 | 7 critical + 9 non-critical (16 total) |
| Overall rating | Low/High/Unclear | High/Moderate/Low/Critically Low |
| Common in | Public health, epidemiology | Health interventions broadly |
JBI Critical Appraisal Checklist for Umbrella Reviews
The JBI (Joanna Briggs Institute) Critical Appraisal Checklist for Systematic Reviews and Research Syntheses is part of a suite of design-specific JBI tools, widely used in nursing, allied health, and JBI-affiliated reviews.
Checklist Items (11 Questions)
| # | Question Focus |
| 1 | Were the review questions clearly stated? |
| 2 | Were inclusion criteria appropriate? |
| 3 | Was the search strategy appropriate? |
| 4 | Were sources/resources for studies adequate? |
| 5 | Were criteria for appraising studies appropriate? |
| 6 | Was critical appraisal conducted by ≥2 reviewers independently? |
| 7 | Were methods to minimize errors in data extraction used? |
| 8 | Were appropriate methods used to combine studies? |
| 9 | Was likelihood of publication bias assessed? |
| 10 | Were recommendations for policy/practice supported by data? |
| 11 | Were specific directives for new research appropriate? |
Response Options
- Yes
- No
- Unclear
- Not Applicable
Use in Umbrella Reviews
- Particularly favored when the umbrella review itself follows JBI methodology (which has its own formal guidance for conducting umbrella reviews).
- Results presented in a simple summary table (reviews × items, with Y/N/U/NA responses).
- No formal overall “score” or cutoff; appraisal informs narrative judgment about including/weighting a review.
- Often paired with JBI’s own umbrella review conduct guidelines, creating methodological consistency for JBI-affiliated authors.
JBI Checklist vs. AMSTAR-2
| Feature | JBI Checklist | AMSTAR-2 |
| Item count | 11 | 16 |
| Overall rating system | None (narrative) | Yes (4-tier) |
| Discipline association | Nursing/allied health | General health sciences |
| Companion conduct guidance | JBI Umbrella Review Methodology | None specific |
PRISMA Checklist for Umbrella Reviews
PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) is not a quality or risk-of-bias appraisal tool. It is a reporting guideline ensuring transparency and completeness of how a review (including umbrella reviews) is described.
Key Distinction
| Tool | What It Assesses |
| AMSTAR-2 / ROBIS / JBI | Methodological quality / risk of bias |
| PRISMA | Completeness and transparency of reporting |
PRISMA Components
| Component | Description |
| 27-item checklist | Covers title, abstract, methods, results, discussion, funding |
| PRISMA flow diagram | Visualizes study identification, screening, exclusion, and inclusion |
| PRISMA extensions | Specialized versions (e.g., PRISMA-P for protocols; no dedicated umbrella-review version, but PRISMA 2020 is commonly adapted) |
Use in Umbrella Reviews
- Authors use PRISMA to structure and report the umbrella review itself and not to appraise the included systematic reviews.
- A completed PRISMA flow diagram documents how many systematic reviews were identified, screened, excluded (with reasons), and included.
- Journals frequently require a PRISMA checklist submission alongside the manuscript.
- Improves reproducibility and transparency, allowing readers to trace the selection process for included reviews.
Relationship to Quality Appraisal Tools
- PRISMA compliance does not indicate methodological quality. A review can be well reported but methodologically weak (a low AMSTAR-2 rating or a high ROBIS risk of bias), or vice versa.
- PRISMA is therefore typically used alongside, not instead of, AMSTAR-2, ROBIS, or JBI tools in umbrella reviews.
Practical note
A common debate in umbrella reviews is whether to exclude low-quality SRMAs from the synthesis. Most methodologists recommend including all SRMAs regardless of quality (to avoid overestimating or underestimating effect sizes from incomplete data), while clearly reporting quality ratings and conducting sensitivity analyses that restrict findings to higher-quality reviews.
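The include-everything-then-restrict approach described above can be expressed as a simple split of the appraised reviews. The sketch below assumes each SRMA has already been given one of the four AMSTAR-2 overall ratings; the class `AppraisedReview` and the function `sensitivity_sets` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# AMSTAR-2 overall confidence ratings, from best to worst.
AMSTAR2_RATINGS = ("high", "moderate", "low", "critically low")
HIGHER_QUALITY = {"high", "moderate"}


@dataclass
class AppraisedReview:
    name: str      # label for the SRMA (hypothetical)
    amstar2: str   # one of AMSTAR2_RATINGS


def sensitivity_sets(reviews):
    """Return (all eligible reviews, reviews kept in the sensitivity analysis)."""
    for review in reviews:
        if review.amstar2 not in AMSTAR2_RATINGS:
            raise ValueError(f"unknown AMSTAR-2 rating: {review.amstar2!r}")
    primary = list(reviews)  # main synthesis keeps every eligible SRMA
    restricted = [r for r in reviews if r.amstar2 in HIGHER_QUALITY]
    return primary, restricted


reviews = [
    AppraisedReview("SRMA 1", "high"),
    AppraisedReview("SRMA 2", "critically low"),
    AppraisedReview("SRMA 3", "moderate"),
]
primary, restricted = sensitivity_sets(reviews)
print(len(primary), [r.name for r in restricted])
```

Comparing conclusions drawn from `primary` and `restricted` shows whether the findings depend on the lower-quality reviews.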
How to Grade the Evidence in an Umbrella Review
Quality assessment (is this SRMA well-conducted?) and evidence grading (how strong is the overall body of evidence?) are distinct steps that are often confused.
GRADE for Intervention Reviews
When the umbrella review addresses interventions, GRADE is the validated approach. GRADE evaluates certainty of evidence across five factors:
- Risk of bias in the underlying studies
- Inconsistency: unexplained variability across studies
- Indirectness: applicability of evidence to the specific question
- Imprecision: width of confidence intervals
- Publication bias: likelihood of selective reporting
Evidence is rated as: High → Moderate → Low → Very Low certainty.
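As a rough illustration of how the five factors move a rating down the four-level scale, the bookkeeping can be sketched as below. This is only an arithmetic aid under stated assumptions: GRADE judgments are qualitative, the starting level of "high" assumes a body of randomized evidence, factors that can raise certainty are omitted, and the function name `grade_certainty` is hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = (
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
)


def grade_certainty(downgrades, start="high"):
    """Apply per-domain downgrades (0, 1 or 2 levels each) to a starting level.

    `downgrades` maps a domain name to the number of levels removed.
    The result never drops below "very low".
    """
    for domain, levels in downgrades.items():
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain!r}")
        if levels not in (0, 1, 2):
            raise ValueError("each domain can downgrade by 0, 1 or 2 levels")
    index = LEVELS.index(start) - sum(downgrades.values())
    return LEVELS[max(index, 0)]


# Serious risk of bias plus serious imprecision: two levels down from high.
print(grade_certainty({"risk_of_bias": 1, "imprecision": 1}))
```

In practice the reasons for each downgrade are recorded alongside the final rating, for example in the footnotes of a Summary of Findings table.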
Ioannidis Criteria for Epidemiological Reviews
For umbrella reviews of observational or epidemiological associations (risk factors, predictors), a widely used complementary framework evaluates each meta-analysis on:
| Criterion | Description | Threshold (example) |
| Amount of evidence | Total number of cases or participants | ≥1,000 cases |
| Statistical significance | P-value for the summary effect estimate | P < 0.001 |
| Heterogeneity | I² statistic across studies | I² < 50% |
| Prediction interval | Does the 95% PI exclude the null? | PI excludes 1.0 (or 0) |
| Small study effects | Egger’s test or funnel plot asymmetry | P > 0.10 for Egger’s test |
| Excess significance bias | More significant studies than expected | P > 0.10 for excess significance test |
Based on these criteria, each association may be classified as: Convincing, Highly Suggestive, Suggestive, Weak, or Not Significant.
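The six criteria in the table can be checked mechanically for each association. The sketch below uses the example thresholds from the table and only reports which criteria are met; it does not assign the intermediate tiers, because the exact rules for those vary between published umbrella reviews. The class `Association` and the function `check_criteria` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Association:
    cases: int            # total number of cases
    p_value: float        # p-value of the summary effect
    i_squared: float      # heterogeneity, in percent
    pi_low: float         # lower bound of the 95% prediction interval
    pi_high: float        # upper bound of the 95% prediction interval
    egger_p: float        # p-value of Egger's test
    excess_sig_p: float   # p-value of the excess significance test
    null_value: float = 1.0  # 1.0 for ratio measures, 0.0 for differences


def check_criteria(a):
    """Return a dict of criterion name -> True if the example threshold is met."""
    return {
        "amount_of_evidence": a.cases >= 1000,
        "statistical_significance": a.p_value < 0.001,
        "heterogeneity": a.i_squared < 50,
        "prediction_interval": not (a.pi_low <= a.null_value <= a.pi_high),
        "small_study_effects": a.egger_p > 0.10,
        "excess_significance": a.excess_sig_p > 0.10,
    }


# Made-up numbers for a single exposure-outcome association.
example = Association(cases=2500, p_value=1e-5, i_squared=35.0,
                      pi_low=1.10, pi_high=1.90, egger_p=0.40, excess_sig_p=0.25)
result = check_criteria(example)
print(result)
print("all criteria met:", all(result.values()))
```

An association meeting every criterion is a candidate for the strongest tier; how partial matches map onto the remaining tiers should follow the rules pre-specified in the review protocol.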
How to Report Umbrella Reviews
Transparent reporting is critical to umbrella review quality. Several reporting guidelines apply:
Reporting Standards for Umbrella Reviews
| Guideline | Full Name | Applicable To |
| PRIOR (PRISMA-OvR) | Preferred Reporting Items for Overviews of Reviews | All umbrella/overview reviews; the primary reporting standard |
| MOOSE | Meta-analysis Of Observational Studies in Epidemiology | Epidemiological umbrella reviews of observational data |
| PRISMA 2020 | Preferred Reporting Items for Systematic Reviews and Meta-Analyses | Baseline guidance applicable to the overall structure |
| GRADE SoF tables | Summary of Findings tables (GRADE format) | Presenting evidence certainty for intervention reviews |
| PROSPERO Protocol | International Prospective Register of Systematic Reviews | Pre-registration of protocol before data collection |
Key Reporting Elements in Umbrella Review Results
For each included SRMA, report:
- Total number of events or cases (binary outcomes) and total sample size
- Number of included primary studies
- Effect size metric used (OR, RR, HR, MD, SMD)
- Meta-analysis model (fixed-effect vs. random-effects)
- Summary effect estimate and 95% confidence interval
- 95% prediction interval
- Heterogeneity statistics (I², Cochran’s Q p-value, tau²)
- Effect estimate from the largest single included study
- Results of small study effects and excess significance tests
- Overall evidence grade
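Several of the statistics listed above (summary estimate, confidence interval, prediction interval, I², Cochran's Q, tau²) come out of one random-effects calculation. The sketch below uses the DerSimonian-Laird estimator, which is one common choice and an assumption here, not a method prescribed by this guide. Effects are assumed to be on an additive scale (for example log odds ratios), the function names are hypothetical, and the t critical value for the prediction interval (k − 2 degrees of freedom) has to be supplied by the caller because the Python standard library has no t-distribution quantile.

```python
import math
from statistics import NormalDist


def random_effects_summary(effects, variances, level=0.95):
    """DerSimonian-Laird random-effects pooling of per-study effects."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with one variance each")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the heterogeneity statistics derived from it.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "k": k, "pooled": pooled, "se": se,
        "ci": (pooled - z * se, pooled + z * se),
        "q": q, "tau2": tau2, "i2": i2,
    }


def prediction_interval(pooled, se, tau2, t_crit):
    """Prediction interval; t_crit is the t quantile with k - 2 degrees of freedom."""
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return pooled - half_width, pooled + half_width


# Three made-up studies (log odds ratios with their variances).
summary = random_effects_summary([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(summary)
# 12.706 is the two-sided 95% t critical value for 1 degree of freedom (k = 3).
print(prediction_interval(summary["pooled"], summary["se"], summary["tau2"], 12.706))
```

For ratio measures, exponentiate the pooled estimate and the interval bounds before reporting them. Dedicated packages such as those listed in the next section also provide the Q p-value and alternative tau² estimators.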
Software & Tools for Umbrella Reviews
| Tool | Purpose | Stage of Review |
| Rayyan | AI-assisted title/abstract screening; collaboration between reviewers | Screening |
| Covidence | Full-text screening, data extraction, conflict resolution | Screening & extraction |
| EPPI-Reviewer | Systematic review management; useful for large umbrella reviews | All stages |
| R (metafor package) | Statistical meta-analysis; heterogeneity and bias tests | Statistical analysis |
| Stata (meta suite) | Meta-analytic modelling; funnel plots; Egger’s test | Statistical analysis |
| RevMan | Cochrane’s review management tool; forest plots | Analysis & reporting |
| GROOVE tool | Graphical representation of overlap across systematic reviews | Overlap assessment |
| GRADEpro GDT | GRADE evidence profiling; Summary of Findings tables | Evidence grading |
| PROSPERO | Protocol registration registry | Pre-review planning |
| ATLAS.ti / NVivo | Qualitative data management (for narrative synthesis components) | Qualitative synthesis |
Strengths & Limitations of Umbrella Reviews
Strengths
- Provides the highest-level, most comprehensive overview of evidence on a broad topic in a single document
- Efficient for decision-makers: saves time compared to reading dozens of individual SRMAs
- Particularly valuable for health technology assessments evaluating all management options for a condition
- Resolves the ‘lumping vs. splitting’ tension in research synthesis
- Enables comparison of the strength of evidence across multiple interventions, exposures, or outcomes simultaneously
- Can reveal contradictions between existing SRMAs and explain their sources
- Standardized re-analysis of each meta-analysis can correct errors in published SRMAs that used inappropriate statistical models
- Feasible in 6–10 weeks for professional teams, since primary studies do not have to be searched, screened, and extracted from scratch
Limitations
- ‘Garbage in, garbage out’: An umbrella review is wholly dependent on the accuracy and rigour of the included SRMAs. If the underlying reviews are biased, the umbrella review inherits those biases.
- Cannot fill evidence gaps: If a research area lacks systematic reviews, an umbrella review cannot be conducted.
- No individual patient data (IPD) analysis: Because primary data are not re-examined, subgroup analyses at the participant level are not possible.
- Overlap inflation: Even with CCA management, overlapping primary studies across SRMAs can create an illusion of more independent evidence than actually exists.
- Difficulty in causal inference: For epidemiological umbrella reviews, confounding, reverse causality, and selection bias remain serious threats.
- Clinical heterogeneity: Combining SRMAs that varied in their populations, interventions, comparators, and outcomes can make the overall synthesis clinically difficult to interpret.
Annotated Real-World Examples
The following examples illustrate how umbrella reviews function in practice — what questions they asked, what methods they used, and what their findings demonstrated.
EXAMPLE 1 · EPIDEMIOLOGY
Risk Factors for the Onset of Type 2 Diabetes Mellitus
- What it asked: Which non-genetic factors are associated with developing type 2 diabetes, and how strong is the evidence for each?
- Scope: Synthesized 142 epidemiological associations from multiple SRMAs. Population: individuals without T2DM at study baseline. Exposure: any non-genetic factor. Outcome: incident T2DM.
- Overlap handling: When multiple SRMAs covered the same exposure-outcome pair, researchers selected the SRMA with the largest number of prospective studies, to preserve temporality of association (exposure before outcome).
- Evidence grading: Applied Ioannidis criteria to each of the 142 associations, classifying each as convincing, highly suggestive, suggestive, weak, or not significant.
- Visualization: Results were presented in a comprehensive table of all 142 associations with full statistics, plus a Manhattan plot (a visual borrowed from genomics) that made the panoramic pattern of evidence immediately readable.
- Why it was useful: No single systematic review could have assessed 142 associations simultaneously. The umbrella review revealed which risk factors had convincing, consistent evidence and which were spurious — directly informing prevention guidelines.
EXAMPLE 2 · PHARMACOLOGY / WOMEN’S HEALTH
Menopausal Hormone Therapy and Women’s Health
- What it asked: What are the effects of menopausal hormone therapy (MHT) across a wide range of health outcomes: cardiovascular disease, fractures, cancer, cognition, and others?
- Why an umbrella review was needed: Dozens of systematic reviews existed on individual outcomes of MHT. Clinicians and guideline developers needed a single document that assessed and compared the evidence across all relevant outcomes at once.
- Methods: Included SRMAs on randomized and observational designs. Quality was assessed using AMSTAR-2. Evidence certainty was graded using GRADE, yielding High/Moderate/Low/Very Low ratings for each outcome.
- Key value: The umbrella review showed, side-by-side, that MHT has moderate-certainty evidence of benefit for fracture prevention and hot flashes, but very low-certainty evidence for cognitive outcomes. This is a nuanced, actionable picture that no individual SR provided.
EXAMPLE 3 · PSYCHIATRY / MENTAL HEALTH
Umbrella Reviews in Early Psychosis
- Context: Paolo Fusar-Poli and Joaquim Radua, the authors of the landmark ‘Ten Simple Rules for Conducting Umbrella Reviews’, applied umbrella review methodology extensively in psychiatry, particularly around risk factors for and interventions in early psychosis.
- Evidence stratification: Each risk factor for psychosis transition was classified into evidence tiers, allowing clinicians to quickly identify factors with convincing vs. weak evidence — critical for clinical risk calculators and early intervention programs.
- Methods demonstrated: These umbrella reviews pre-specified the protocol, defined variables of interest such as transition to psychosis as a clear binary outcome, estimated a common effect size (OR) across all SRMAs, and reported the heterogeneity and 95% prediction intervals for each association.
EXAMPLE 4 · NUTRITION SCIENCE
Diet-Associated Inflammation and 38 Chronic Disease Outcomes
- What it asked: What is the strength of evidence linking dietary inflammatory potential to 38 different chronic disease outcomes?
- Efficiency demonstrated: This umbrella review was completed in approximately one year and assessed 38 chronic disease outcomes, a scope that would be infeasible if starting from primary studies.
- Limitations acknowledged: The authors explicitly noted that the review could not capture associations not yet covered by published meta-analyses, and that individual patient data analyses were not possible within the umbrella review framework.
EXAMPLE 5 · INFECTIOUS DISEASE / PUBLIC HEALTH
Long COVID Prevalence and Risk Factors (Rapid Systematic Umbrella Review)
- Context: As the COVID-19 pandemic generated a rapid explosion of primary studies and systematic reviews on Long COVID, an umbrella review approach allowed researchers to synthesize the evidence at the review level.
- Unique use case: The umbrella review explicitly used the systematic review level to examine common biases and limitations across the field — demonstrating how umbrella reviews can serve a methodological surveillance function, not just a substantive evidence synthesis one.
- What it found: 14 reviews covering 5–196 primary studies were included. Pooling was not performed; instead, a descriptive meta-synthesis of prevalence estimates, risk factors, and bias patterns was conducted. This example illustrates that umbrella reviews can be qualitative/narrative as well as statistical.
Key Takeaways
- An umbrella review is a systematic review of systematic reviews and/or meta-analyses; it synthesizes evidence at the highest available level, one step above individual SRMAs in the evidence hierarchy.
- It is also called an ‘overview of reviews,’ ‘meta-review,’ or ‘review of reviews’. These terms are largely interchangeable in the current literature, though Cochrane prefers ‘overview of reviews.’
- Umbrella reviews are appropriate only when a research topic is already well-covered by existing SRMAs. They cannot substitute for a primary systematic review where SRMAs do not yet exist.
- The unit of analysis is the SRMA, not the primary study. But researchers typically re-run each meta-analysis using standardized statistical methods rather than simply reporting the published pooled estimates.
- The two-part search algorithm (study design filter + topic filter, combined with AND) is the defining methodological feature that differentiates an umbrella review search from a standard systematic review search.
- Overlap (the same primary studies appearing in multiple included SRMAs) is a key methodological challenge. The Corrected Covered Area (CCA) is the standard metric to quantify and report overlap.
- AMSTAR-2 is the dominant tool for appraising the methodological quality of included SRMAs; ROBIS and JBI checklists are also used. Quality assessment is distinct from evidence grading.
- GRADE is used to grade certainty of evidence for intervention umbrella reviews; the Ioannidis criteria are widely used for epidemiological umbrella reviews.
- Pre-registration on PROSPERO and reporting adherent to PRISMA-OvR are the expected standards for publication-ready umbrella reviews.
- Umbrella reviews inherit the biases and limitations of the SRMAs they include. A well-conducted umbrella review of low-quality SRMAs will still yield unreliable conclusions.
- Umbrella reviews cannot generate new pooled effect sizes from primary data and are not designed for individual patient data (IPD) subgroup analyses.
- The efficiency advantage of umbrella reviews is substantial: professional teams can often complete one in 6–10 weeks, compared to 12–24 months for a full de novo systematic review.
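The two-part search noted in the takeaways above (a study design filter combined with a topic filter using AND) can be assembled programmatically. The sketch below only builds a generic Boolean string; the search terms are illustrative placeholders, real databases need their own field tags and syntax, and the function names `or_block` and `umbrella_query` are hypothetical.

```python
def or_block(terms):
    """Join terms with OR inside parentheses, quoting multi-word phrases."""
    if not terms:
        raise ValueError("each filter needs at least one term")
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def umbrella_query(design_terms, topic_terms):
    """Combine the study design filter and the topic filter with AND."""
    return or_block(design_terms) + " AND " + or_block(topic_terms)


# Placeholder terms; a real strategy would be developed with an information specialist.
design = ["systematic review", "meta-analysis"]
topic = ["type 2 diabetes", "T2DM"]
print(umbrella_query(design, topic))
```

Keeping the two filters as separate lists makes it easy to reuse the design filter across topics and to report each part of the strategy in the protocol.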
Frequently Asked Questions
Can I do an umbrella review if some of the systematic reviews I find are low quality? Do I have to exclude them?
This is one of the most commonly debated questions in umbrella review practice. The general methodological consensus is: include all eligible SRMAs regardless of quality, but clearly report quality ratings and conduct sensitivity analyses that restrict findings to higher-quality reviews.
Excluding low-quality SRMAs a priori risks distorting the evidence base, because effect sizes drawn from a selectively curated subset may be overestimated or underestimated. Instead, assess quality using AMSTAR-2 (or ROBIS/JBI), present the ratings transparently in a table, and discuss how the inclusion of lower-quality reviews may have influenced your overall conclusions. Sensitivity analyses restricted to ‘high’ and ‘moderate’ AMSTAR-2 ratings are the appropriate way to test whether your conclusions hold when low-quality reviews are removed.
If multiple systematic reviews all include the same studies, isn’t the ‘umbrella review’ just inflating the evidence base and showing me one study multiple times?
This concern is valid and points to the overlap problem, which is the most technically complex challenge in umbrella reviews. However, there are two important clarifications.
First, umbrella reviews do not statistically pool the results of different meta-analyses into a grand combined estimate. Each SRMA is analysed and reported separately, so the inflation risk is interpretive, not arithmetical. Second, the standard approach is to calculate the Corrected Covered Area (CCA), which expresses how often primary studies are counted more than once across the included reviews. CCA values are commonly classified as slight (0–5%), moderate (6–10%), high (11–15%), or very high (>15%). This metric helps readers understand how much of the apparent evidence is truly independent, a critical caveat that should be reported prominently.
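As a worked illustration, the CCA is usually computed from a citation matrix as (N − r) / (r × c − r), where N is the total number of primary-study inclusions counted across all reviews, r is the number of unique primary studies, and c is the number of reviews. The sketch below assumes each review is represented as a set of primary-study identifiers; the function names, the study labels, and the handling of values that fall exactly on a band boundary are assumptions made for this example.

```python
def corrected_covered_area(reviews):
    """CCA as a proportion, from a mapping of review -> set of primary study IDs."""
    c = len(reviews)
    if c < 2:
        raise ValueError("overlap needs at least two reviews")
    unique_studies = set().union(*reviews.values())
    r = len(unique_studies)
    if r == 0:
        raise ValueError("reviews contain no primary studies")
    n = sum(len(studies) for studies in reviews.values())  # inclusions incl. repeats
    return (n - r) / (r * c - r)


def overlap_band(cca_percent):
    """Map a CCA percentage onto the bands quoted in the text."""
    if cca_percent <= 5:
        return "slight"
    if cca_percent <= 10:
        return "moderate"
    if cca_percent <= 15:
        return "high"
    return "very high"


# Three made-up reviews sharing some primary studies.
reviews = {
    "Review A": {"s1", "s2", "s3"},
    "Review B": {"s2", "s3", "s4"},
    "Review C": {"s3", "s4", "s5"},
}
cca = corrected_covered_area(reviews)
print(f"CCA = {cca:.1%} ({overlap_band(cca * 100)})")
```

Here N = 9, r = 5 and c = 3, so the CCA is (9 − 5) / (15 − 5) = 40%, which falls in the very high band.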
Do I need a full team to conduct an umbrella review, or can I do it alone? Is it really faster than a systematic review?
Unlike a standard systematic review (which mandates double-screening and double-extraction by at least two independent reviewers), some guidance documents note that a team is not strictly required for umbrella reviews. That said, best practice still calls for independent dual-screening to minimise selection bias, and independent data extraction for key statistics.
As for speed: yes, umbrella reviews are substantially faster than de novo systematic reviews. Professional teams can typically complete an umbrella review in 6–10 weeks. The time savings come from not having to search for, screen, and extract data from thousands of primary studies, only from a smaller pool of existing SRMAs. However, the statistical re-analysis step can add significant time for large umbrella reviews covering many associations.
Two of the systematic reviews I want to include reach completely opposite conclusions on the same question. What do I do?
Contradictory conclusions across SRMAs on the same question are not a failure. In fact, exposing and explaining these contradictions is one of the most valuable contributions an umbrella review can make.
Common explanations include: different eligibility criteria (populations, comparators, or outcome definitions); different search dates; different statistical models (fixed-effect vs. random-effects); different quality thresholds; and language or publication bias differences in search strategies.
Report the conflicting SRMAs side by side in your results tables, explain the likely sources of divergence in your discussion, and note what additional primary research would be needed to resolve the contradiction.
Can an umbrella review include non-intervention reviews, like reviews of prevalence, diagnostic accuracy, or qualitative studies?
Yes. Although most of the established methodology has been developed and applied in the intervention and epidemiological association contexts, umbrella reviews are not inherently restricted to these domains.
Umbrella reviews of observational/epidemiological SRMAs are now very common. Umbrella reviews of diagnostic accuracy SRMAs are feasible but require specialised statistical methods (bivariate/SROC models). Umbrella reviews including qualitative systematic reviews are possible but remain methodologically less standardised. The JBI Manual for Evidence Synthesis provides specific guidance for umbrella reviews incorporating qualitative evidence alongside quantitative SRMAs.
When an umbrella review and a single systematic review both exist on the same question, which should I cite in my paper or guideline?
When both exist, the umbrella review is generally preferred as the citation for evidence-based decision-making because it
- consolidates findings from multiple SRMAs,
- addresses the overlap problem,
- applies standardised quality assessment, and
- provides a more complete and reliable picture of the evidence than any single SRMA can.
The important caveat: an umbrella review is only as good as the quality and completeness of the SRMAs it synthesises. If the best available SRMA covers more recent trials than an older umbrella review, citing the updated SRMA alongside the umbrella review may be appropriate until an updated umbrella review is published.
References
- Ten simple rules for conducting umbrella reviews. https://pmc.ncbi.nlm.nih.gov/articles/PMC10270421/
- Types of Reviews: Umbrella Reviews. https://laneguides.stanford.edu/types-of-reviews/umbrella
- Umbrella reviews: a methodological guide. https://academic.oup.com/eurjcn/article/24/6/996/7974731
- How to Conduct Umbrella Review in Education? A Step-by-Step Methodological Guide Through a Case Study in Digital Diaries. https://journals.sagepub.com/doi/10.1177/20965311261421966
