How to Conduct and Report a Systematic Review: Examples, Tools, Templates

Glossary of Key Terms

Term | Definition
Systematic Review | A structured synthesis of all available evidence on a defined research question, using pre-specified, reproducible methods to identify, select, appraise, and summarize studies.
Meta-Analysis | A statistical technique used within or alongside a systematic review to pool quantitative data from multiple studies into a single effect estimate.
PICO/PICOS | A framework for structuring a clinical research question: Population, Intervention, Comparison, Outcome, and (optionally) Study design.
PRISMA | Preferred Reporting Items for Systematic Reviews and Meta-Analyses. The leading reporting guideline for systematic reviews, comprising a checklist and flow diagram.
PROSPERO | International Prospective Register of Systematic Reviews. A publicly searchable registry for systematic review protocols, hosted by the University of York.
Protocol | A pre-specified plan documenting the rationale, objectives, methods, and analysis approach of a systematic review, ideally registered before the review begins.
Inclusion/Exclusion Criteria | Pre-defined rules that determine which studies are eligible for the review and which are not, based on features such as population, design, outcome, and date.
Grey Literature | Research produced outside of traditional academic publishing channels, including government reports, conference abstracts, dissertations, and unpublished studies.
Critical Appraisal | A systematic evaluation of a study’s validity, results, and relevance, using structured tools to assess risk of bias and methodological quality.
Risk of Bias | The degree to which flaws in study design, conduct, or reporting may distort the results away from the true effect.
Data Extraction | The process of systematically retrieving pre-specified information from each included study into a standardized form.
Narrative Synthesis | A qualitative approach to combining study findings using text and tables when statistical pooling is not appropriate.
Heterogeneity | Variability among studies in their design, populations, interventions, or results, which affects whether meta-analysis is appropriate.
Inter-Rater Reliability | A measure of agreement between two or more independent reviewers at screening or data extraction stages, commonly expressed using Cohen’s Kappa.
Publication Bias | The tendency for studies with positive or statistically significant results to be published more readily than studies with null or negative findings.
Scoping Review | A type of evidence synthesis that maps the available evidence on a broad topic, identifying key concepts and gaps without formal quality appraisal.
Umbrella Review | A review of reviews that synthesizes multiple systematic reviews on a related topic to provide a higher-level evidence overview.

Key Takeaways

  • A systematic review is the gold standard for synthesizing evidence: it uses pre-specified, reproducible methods to minimize bias and provide a reliable summary of a research question.
  • Register before you search: submitting a protocol to PROSPERO (or a comparable registry) before data collection begins increases transparency and reduces reporting bias.
  • Team composition matters: at least two independent reviewers are required for screening and data extraction; a specialist librarian should be involved in search design.
  • Use PICO(S) to define your question: a well-formed question anchors every downstream decision, from database selection to inclusion criteria.
  • Search comprehensively: multiple databases, grey literature, citation chaining, and trial registries are needed to approach completeness.
  • Document every decision: a PRISMA flow diagram and a clearly reasoned exclusion log allow readers to replicate and audit the review.
  • Critical appraisal is non-negotiable: quality assessment does not exclude studies automatically but informs how their findings are weighted in synthesis.
  • Choose the right synthesis method: meta-analysis is appropriate only when studies are sufficiently similar; narrative synthesis, or vote counting based on direction of effect rather than statistical significance, may be more honest alternatives.
  • Report transparently using PRISMA: the PRISMA checklist and flow diagram are the minimum standard; extensions exist for specific review types.
  • A systematic review takes time: from protocol development to publication, the process typically spans six months to two years.

What Is a Systematic Review?

A systematic review is a rigorous, pre-planned synthesis of all available evidence on a specific research question. Unlike a narrative or traditional literature review, every stage of a systematic review, from the search strategy to the inclusion decisions and quality assessments, is conducted using explicit, reproducible methods. This transparency is what distinguishes systematic reviews from other forms of evidence summary and elevates them to the top of the evidence hierarchy.

Systematic reviews serve several important purposes:

  • Summarizing what is known about a clinical, policy, or scientific question
  • Identifying gaps in the existing evidence base that warrant future research
  • Providing a foundation for clinical guidelines, public health recommendations, and policy decisions
  • Resolving apparent conflicts between individual studies
  • Reducing duplication of research effort by mapping existing work

How Does a Systematic Review Differ from Other Review Types?

A systematic review uses a registered protocol, exhaustive searching, and formal quality appraisal, whereas other review types may omit one or more of these. The table below compares the most common review formats.

Review Type | Question Focus | Quality Appraisal? | Typical Use Case
Systematic Review | Specific, narrow | Yes, formal | Guideline development, evidence synthesis
Meta-Analysis | Specific, narrow | Yes | Quantitative pooling of effect sizes
Scoping Review | Broad, exploratory | Usually not | Mapping a new field, identifying gaps
Rapid Review | Specific | Abbreviated | Time-sensitive policy decisions
Umbrella Review | Broad | Yes, of reviews | Synthesis of multiple systematic reviews
Narrative Review | Variable | No | Background sections, educational overviews

Is a Systematic Review the Right Choice for Your Project?

Yes, when you need to inform policy or practice with the strongest possible evidence. However, a systematic review is not always appropriate. Consider a systematic review when:

  • A clearly defined, answerable question exists
  • Multiple primary studies on the topic have likely been published
  • An evidence synthesis could meaningfully inform a decision
  • Sufficient time and team resources are available (typically six months to two years)

A scoping review, rapid review, or narrative review may be more appropriate when the field is emergent, the question is broad, or time constraints are severe.

Assembling the Review Team

Systematic reviews require multi-disciplinary expertise. No single person can reliably conduct a rigorous systematic review alone: independent duplication at key stages is both a methodological safeguard and a requirement of most reporting standards.

Recommended team composition:

Role | Core Responsibilities | Essential?
Principal Investigator / Lead Reviewer | Oversees the review, leads protocol development, makes final decisions on disagreements | Yes
Second Reviewer | Independently screens titles, abstracts, and full texts; independently extracts data; resolves inter-rater disagreements | Yes (minimum two reviewers required)
Subject Librarian or Information Specialist | Designs and executes the search strategy, selects databases, handles grey literature, peer-reviews the search (PRESS) | Strongly recommended
Statistician or Methodologist | Advises on meta-analysis methods, heterogeneity analysis, and sensitivity analyses | Required if meta-analysis planned
Content Expert(s) | Provides domain knowledge during protocol development, inclusion decisions, and interpretation | Recommended

All team members should agree on roles, timelines, and authorship criteria at the outset. Where there is genuine disagreement between reviewers that cannot be resolved through discussion, a third reviewer or arbitrator should be pre-specified in the protocol.

Formulating Your Research Question

A precise, answerable research question is the foundation of every subsequent decision in the review process, from which databases to search to which outcomes to measure. Vague questions lead to irreproducible searches and unmanageable result sets.

What is the PICO(S) Framework?

PICO(S) is the most widely used framework for structuring questions in health and social science reviews:

Letter | Element | Guiding Question
P | Population | Who are the participants? (e.g., adults with Type 2 diabetes, children aged 5-12)
I | Intervention | What is the exposure or intervention of interest? (e.g., metformin, cognitive behavioral therapy)
C | Comparison | What is the comparator? (e.g., placebo, usual care, alternative treatment)
O | Outcome | What are the outcomes of interest? (e.g., HbA1c reduction, quality of life, mortality)
S | Study Design | Which study designs will be included? (e.g., randomized controlled trials only, all observational designs)

Example

A well-formed PICO question might read: “In adults with chronic low back pain (P), does mindfulness-based stress reduction (I) compared with physiotherapy (C) reduce pain intensity and disability at 12 months (O) in randomized controlled trials (S)?”

Alternatives to PICOS

Alternative frameworks include PICo (for qualitative research, replacing Intervention with phenomenon of Interest), SPICE (Setting, Perspective, Intervention, Comparison, Evaluation), and SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type). The choice of framework should match the nature of the research question and the expected evidence base.

Qualitative & phenomenological: PICo

Letter | Element | Meaning / example
P | Population | Who: e.g. adults living with HIV
I | phenomenon of Interest | What experience or concept: e.g. stigma experiences
Co | Context | Where/when: e.g. rural sub-Saharan Africa

Best for: qualitative reviews exploring lived experience, attitudes, or meaning.

Health services & policy: SPICE

Letter | Element | Meaning / example
S | Setting | Context of delivery: e.g. primary care clinics
P | Perspective | Whose viewpoint: e.g. patients, clinicians
I | Intervention | What is being evaluated: e.g. telehealth consultations
C | Comparison | Versus what: e.g. in-person visits
E | Evaluation | How success is measured: e.g. patient satisfaction scores

Best for: service delivery, policy, or implementation questions where setting and stakeholder perspective are central.

Qualitative & mixed methods: SPIDER

Letter | Element | Meaning / example
S | Sample | Who: e.g. postpartum women
PI | Phenomenon of Interest | What experience: e.g. breastfeeding decisions
D | Design | Study design: e.g. interviews, focus groups
E | Evaluation | Outcome or construct: e.g. self-efficacy
R | Research type | Qualitative, quantitative, or mixed

Best for: qualitative or mixed-methods reviews; tends to return more specific (fewer irrelevant) results than PICOS for qualitative searches, at some cost to sensitivity.

Broad scoping & social science: ECLIPSE

Letter | Element | Meaning / example
E | Expectation | What improvement is sought: e.g. reduce waiting times
C | Client group | Who benefits: e.g. elderly patients
L | Location | Where: e.g. NHS outpatient departments
I | Impact | Desired change: e.g. improved throughput
P | Professionals | Who delivers it: e.g. triage nurses
SE | Service | Type of service: e.g. emergency care

Best for: health management, service improvement, and organizational research questions.

Diagnosis & test accuracy: PIRD

Letter | Element | Meaning / example
P | Population | Who: e.g. adults with suspected PE
I | Index test | Test under evaluation: e.g. D-dimer assay
R | Reference standard | Gold standard: e.g. CT pulmonary angiography
D | Diagnosis | Target condition: e.g. pulmonary embolism

Best for: diagnostic accuracy reviews; aligns with the QUADAS-2 appraisal tool.

Prognosis & prediction: PICOTS

Letter | Element | Meaning / example
P | Population | Who: e.g. post-MI patients
I | Intervention | Prognostic factor or treatment: e.g. statin therapy
C | Comparison | Comparator: e.g. no statin
O | Outcome | Event of interest: e.g. 5-year mortality
T | Timeframe | Follow-up period: e.g. 5 years post-discharge
S | Setting | Care context: e.g. community cardiology

Best for: prognosis reviews and intervention reviews where timing and setting are critical moderators.

Summary of PICOS Alternatives

  • PICo strips out the comparison element entirely, making it suitable for qualitative syntheses where there is no “intervention vs. control” logic.
  • SPICE foregrounds the stakeholder perspective and the setting, which makes it popular in health services and policy research.
  • SPIDER adds study design as an explicit element and uses “Sample” instead of “Population,” which tends to produce more specific, though less sensitive, searches in qualitative literature.
  • ECLIPSE is organized around organizational expectations and service delivery rather than clinical intervention, useful for management and improvement research.
  • PIRD is purpose-built for diagnostic test accuracy reviews and maps directly onto the QUADAS-2 appraisal tool.
  • PICOTS extends PICOS with a timeframe and a setting element, which is particularly useful when prognosis or follow-up duration is a core part of the question.

The practical rule: if your question involves an experience or meaning, reach for PICo or SPIDER; if it involves a diagnostic test, use PIRD; if it involves a service or organization, try SPICE or ECLIPSE; and if timing is central, add the T and S of PICOTS.

Checking for Existing Reviews

Before finalizing your question, search PROSPERO, the Cochrane Database of Systematic Reviews, and databases such as PubMed to determine whether a recent, high-quality systematic review on the same question already exists. If one does:

  • Consider whether an update is justified (new evidence, narrower scope, different population)
  • Clearly articulate how your review differs from existing work
  • Avoid contributing to research waste by duplicating recent high-quality reviews

Protocol Development and Registration

A protocol is a detailed, pre-specified plan for your systematic review. Writing and registering the protocol before searching begins is one of the most important steps in ensuring the integrity and transparency of the review.

Why Register a Protocol?

Registration reduces bias, increases transparency, and signals to the research community that your review is underway. A pre-registered protocol:

  • Prevents post-hoc changes to outcomes or methods that could inflate apparent significance
  • Allows readers and editors to identify deviations from the planned methodology
  • Reduces duplication by alerting others to ongoing reviews
  • Is increasingly required by journals as a condition of publication

Where and When to Register

PROSPERO (hosted by the Centre for Reviews and Dissemination, University of York) is the leading international registry for health-related systematic reviews. Registration must be completed before data extraction begins; ideally it should occur before screening starts. Other registries include OSF Registries (for social science and psychology), the Cochrane editorial management system (for Cochrane protocols), and field-specific registries.

What Should a Protocol Include?

A rigorous protocol should address the following elements:

  • Background and rationale for the review
  • Research question stated using PICO or equivalent framework
  • Eligibility criteria for inclusion and exclusion of studies
  • Databases to be searched and planned search strategy
  • Grey literature sources to be consulted
  • Process for study selection (number of reviewers, software to be used, handling of disagreements)
  • Data extraction template and planned data items
  • Risk of bias assessment tool and process
  • Synthesis approach (narrative synthesis or meta-analysis) and any planned subgroup or sensitivity analyses
  • Any planned assessment of publication bias

Any deviations from the registered protocol that occur during the review process should be transparently reported and justified in the final manuscript.

Sample PROSPERO registered protocol

A sample PROSPERO registered protocol for a systematic review in endocrinology is available for download.

The template covers all nine major sections of a PROSPERO-compliant protocol:

  1. Administrative details: title, registration, versioning, authors, funding, and conflicts of interest
  2. Background and rationale: the clinical problem, evidence gap, and stated objectives
  3. Research question (PICOS): each element populated for the GLP-1 RA vs. insulin in T2DM + CKD question
  4. Eligibility criteria: explicit inclusion and exclusion rules, with language and date policy
  5. Search strategy: databases, grey literature sources, and a sample MEDLINE string with MeSH and free-text terms
  6. Study selection: the two-stage screening process, software, disagreement handling, and inter-rater reliability
  7. Data extraction: form development, piloting, and a structured list of data items by category
  8. Risk of bias: RoB 2.0 for RCTs, ROBINS-I for observational studies, with traffic-light plot reporting
  9. Synthesis and reporting: narrative synthesis, meta-analysis conditions, heterogeneity thresholds, four pre-specified subgroups, three sensitivity analyses, publication bias assessment, GRADE, and dissemination plan

Section | PROSPERO field | Key content (this protocol) | Notes / guidance
1. Admin | Title, registration, authors, amendments, funding, COI | GLP-1 RA vs insulin in T2DM + CKD; PROSPERO pre-registration; multi-disciplinary team including librarian and statistician | Register before screening begins; log all amendments with date and rationale; declare all funding and conflicts
2. Intro | Rationale and objectives | 537 million T2DM worldwide; CKD complicates glycemic management; no existing SR directly comparing GLP-1 RA vs insulin in CKD 3–5; primary objective: HbA1c reduction at 6 and 12 months | Preliminary search of PROSPERO and Cochrane must precede this section; state the gap explicitly; list primary and secondary objectives separately
3a. PICOS | Eligibility criteria | P: Adults, T2DM, CKD stage 3a–5 non-dialysis (eGFR less than 60). I: Any GLP-1 RA (semaglutide, liraglutide, dulaglutide, exenatide, tirzepatide). C: Any insulin regimen (basal, basal-bolus, premixed). O: HbA1c; weight; eGFR; UACR; hypoglycemia; SAEs; mortality. S: RCTs and prospective cohorts; minimum 12 weeks; 2005 to present | Specify inclusion and exclusion criteria with equal precision; avoid post-hoc additions; state language and date policy explicitly
3b–c. Search | Information sources and search strategy | MEDLINE, Embase, CENTRAL, CINAHL, Web of Science; ClinicalTrials.gov, WHO ICTRP, ADA/EASD/ASN conference abstracts; backward and forward citation chasing; PRESS peer review of strategy | Minimum 3 databases; grey literature is mandatory to reduce publication bias; provide full strategy for at least one database verbatim; rerun search before submission
3d–f. Selection, extraction, RoB | Study selection, data collection, risk of bias | Dual independent screening in Covidence; Cohen’s Kappa reported; standardized extraction form piloted on 3 studies; RoB 2.0 for RCTs, ROBINS-I for cohorts; traffic-light plots via robvis | Two independent reviewers are non-negotiable; pre-specify arbitration process; pilot extraction form; do not exclude studies solely on RoB; report domain-level judgments
3g–4. Synthesis and reporting | Data synthesis, subgroups, sensitivity, GRADE, dissemination | Narrative synthesis (SWiM); random-effects meta-analysis if 2 or more homogeneous studies; I² thresholds defined; 5 pre-specified subgroups; 4 sensitivity analyses; funnel plots if 10 or more studies; GRADE SoF table; PRISMA 2020 reporting; open data deposit via OSF or Zenodo | Justify pooling decision prospectively; pre-specify all subgroup and sensitivity analyses to avoid data dredging; GRADE is expected by most journals; share data and code on publication
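
The protocol summary above refers to pre-defined I² thresholds for deciding whether to pool. As a rough illustration of what that statistic measures, the sketch below computes Cochran's Q and I² using inverse-variance weights. The function name and the input values are illustrative assumptions, not data from the sample protocol or from any real review.

```python
from typing import Sequence, Tuple


def cochran_q_and_i2(effects: Sequence[float],
                     variances: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, I2 in percent) for study effects with known variances."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q sums the weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of Q in excess of its degrees of freedom, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical effect sizes and variances for two studies.
q, i2 = cochran_q_and_i2([0.1, 0.5], [0.01, 0.01])
print(f"Q = {q:.2f}, I2 = {i2:.1f}%")
```

With only a handful of studies, I² is estimated imprecisely, so it is usually read alongside the clinical and methodological similarity of the studies rather than as a stand-alone rule.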

Conducting the Literature Search

The literature search is the engine of a systematic review. Its goal is to identify every study that might meet the inclusion criteria, regardless of where it was published, in what language, or whether its findings were positive or negative. A comprehensive, reproducible, and well-documented search is fundamental to the validity of the review’s conclusions.

Selecting Databases

No single database covers all relevant literature. A minimum of two to three major bibliographic databases should be searched; more are typically needed for a complete review. Recommended databases by field include:

Database | Primary Coverage | Relevant Fields
MEDLINE / PubMed | Biomedical and life sciences | Medicine, nursing, pharmacy, public health
Embase | Biomedical, pharmacological | Medicine, drug research, clinical trials
CINAHL | Nursing and allied health | Nursing, physiotherapy, occupational therapy
PsycINFO | Psychology and behavioral sciences | Mental health, cognitive science, education
Cochrane CENTRAL | Controlled trials | All clinical intervention research
Web of Science | Multidisciplinary | Science, social science, arts and humanities
Scopus | Multidisciplinary | Science, technology, social science
ERIC | Education | Pedagogy, educational policy, learning science

Designing the Search Strategy

The search strategy translates the PICO question into database-searchable syntax. Key principles include:

  • Use controlled vocabulary: Subject headings (MeSH in MEDLINE, EMTREE in Embase) capture concepts regardless of exact wording. Always combine subject headings with free-text keywords.
  • Apply Boolean operators: Use AND to combine different concepts (e.g., population AND intervention), and OR to combine synonyms and related terms within a concept.
  • Use truncation and wildcards: These retrieve multiple word endings from a root (e.g., “therap*” retrieves therapy, therapist, therapeutic).
  • Avoid over-restriction: Do not apply date limits, language limits, or study design filters unless specifically justified; these can systematically exclude relevant evidence.
  • Document every component: Record the exact search string run in each database, including the date of the search, to allow replication.
  • Seek peer review of the search: The Peer Review of Electronic Search Strategies (PRESS) checklist provides a framework for a second librarian or information specialist to review the strategy before it is run.
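
The block logic described above (OR within a concept, AND between concepts) can be sketched in a few lines of Python. This is a minimal illustration only: the helper names and the example terms are assumptions, and the output is a generic Boolean string rather than the exact syntax of any particular database, which would also need subject headings and field tags.

```python
from typing import Dict, List


def build_block(terms: List[str]) -> str:
    """Join the synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_search(concepts: Dict[str, List[str]]) -> str:
    """Combine the concept blocks with AND, in the order they were given."""
    return " AND ".join(build_block(terms) for terms in concepts.values())


# Hypothetical free-text terms for two concepts; a real strategy would be
# longer and would be developed with an information specialist.
concepts = {
    "population": ["chronic low back pain", "lumbago"],
    "intervention": ["mindfulness-based stress reduction", "MBSR", "mindful*"],
}
query = build_search(concepts)
print(query)
```

Keeping each concept as its own list makes it harder to place AND between synonyms by accident, and the dictionary doubles as a record of the terms used in each block.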

Searching Grey Literature and Supplementary Sources

Published, peer-reviewed studies represent only a portion of the available evidence. Failure to include grey literature can artificially inflate effect size estimates due to publication bias. Grey literature sources include:

  • Trial and study registries: ClinicalTrials.gov, WHO International Clinical Trials Registry Platform, EU Clinical Trials Register
  • Government and regulatory agency websites (e.g., CDC, FDA, EMA, NICE)
  • Conference proceedings and abstract books
  • Dissertations and theses (ProQuest Dissertations and Theses, EThOS)
  • Preprint servers (medRxiv, bioRxiv, PsyArXiv) for recent, un-peer-reviewed work
  • Google Scholar for supplemental discovery (not as a primary database due to lack of reproducible search syntax)
  • Reference lists of included studies (backward citation chasing)
  • Articles citing key included studies (forward citation chasing)
  • Contact with study authors or key experts to identify unpublished or ongoing work

Common Errors in Literature Search

Error | Why it matters | Example | How to avoid it
Searching too few databases | Each database has unique coverage; relying on one source misses a significant proportion of relevant studies | Searching only PubMed for a nursing intervention review, missing studies indexed exclusively in CINAHL | Search a minimum of three major databases plus at least one grey literature source; justify every database included and excluded in the methods section
Poorly constructed Boolean logic | Incorrect use of AND/OR inverts the intended search, either inflating results uncontrollably or excluding entire concept blocks | Using AND between synonyms for the same concept (e.g., “liraglutide AND semaglutide”) instead of OR, effectively limiting results to studies mentioning both drugs | Map each PICOS concept to a separate block; combine synonyms within a block using OR; combine blocks using AND; have a second person trace the logic before running
Missing controlled vocabulary | Free-text searching alone misses records indexed under a subject heading but not using your exact keywords in the title or abstract | Searching “heart attack” as free text but omitting the MeSH term “Myocardial Infarction” | Combine MeSH (MEDLINE), EMTREE (Embase), and CINAHL Subject Headings with free-text synonyms for every major concept; check the database thesaurus before finalizing the strategy
Inadequate synonym coverage | Concepts are expressed with multiple spellings, abbreviations, and regional variants; missing these creates gaps in retrieval | Searching “type 2 diabetes” but omitting “T2DM,” “NIDDM,” “non-insulin-dependent diabetes,” and “adult-onset diabetes” | Use truncation and wildcards (e.g., diabet*); consult existing systematic reviews on the topic to harvest synonyms; run the strategy past a subject librarian
Applying overly restrictive filters | Date limits, language filters, and study-design filters applied at the search stage silently exclude relevant records before a human ever screens them | Adding “English only” and “2015 to present” filters to the database search, missing older landmark studies and non-English trials | Apply filters at the screening stage rather than the search stage; if restrictions are unavoidable, justify them explicitly in the protocol and report them as a limitation
Neglecting grey literature | Unpublished or non-commercially published studies skew disproportionately toward null or negative results; omitting them inflates apparent effect sizes | Searching only peer-reviewed databases and missing a large government-funded trial reported only on ClinicalTrials.gov and in a conference abstract | Search trial registries (ClinicalTrials.gov, WHO ICTRP), regulatory databases, conference abstracts, and dissertations; contact study authors for unpublished data
Failing to peer-review the search strategy | Search strategies in published systematic reviews have error rates above 70%; logical, spelling, and truncation errors go undetected without independent review | A truncation symbol entered incorrectly (e.g., “diabet:” instead of “diabet*” in PubMed) silently retrieves zero results for an entire concept block without triggering an error message | Apply the PRESS 2015 checklist; have a second independent librarian or information specialist review every strategy before it is run; document the peer review as a supplementary file

Study Selection

Study selection is the process of applying your pre-specified eligibility criteria to the results of the literature search to arrive at the set of studies to be included in the review. It is conducted in two sequential stages.

Stage 1: Title and Abstract Screening

All records retrieved from the search are screened at the title and abstract level by at least two independent reviewers. Records are classified as include, exclude, or uncertain. Any record that cannot be confidently excluded at this stage should be retained for full-text review. Disagreements between reviewers should be resolved through discussion, with escalation to a third reviewer if consensus cannot be reached.

Stage 2: Full-Text Eligibility Assessment

Records retained from Stage 1 are retrieved in full text and assessed against the complete eligibility criteria. Reasons for exclusion at this stage must be recorded for each excluded study and reported in the PRISMA flow diagram. Records should be managed using dedicated software:

  • Covidence: The most widely used tool for systematic review management; supports dual screening, full-text review, and data extraction.
  • Rayyan: A free web-based tool suitable for title and abstract screening with a blinding feature.
  • EPPI-Reviewer: Supports complex reviews and machine-learning-assisted screening.
  • EndNote / Zotero: Reference management tools often used for deduplication before screening begins.
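
To make the deduplication step concrete, here is a minimal sketch of one common approach: match on DOI where present, otherwise on a normalized title plus year. It is not how any of the tools above work internally. The record fields (`title`, `year`, `doi`) and the sample records, including the DOIs, are made up for illustration.

```python
import re
from typing import Dict, List


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record per DOI, or per normalized title + year if no DOI."""
    seen = set()
    unique = []
    for rec in records:
        doi = rec.get("doi", "").strip().lower()
        if doi:
            key = ("doi", doi)
        else:
            key = ("title", normalize_title(rec.get("title", "")), rec.get("year", ""))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical records exported from two databases.
records = [
    {"title": "Mindfulness for Back Pain: A Trial.", "year": "2016", "doi": "10.1000/ABC"},
    {"title": "Mindfulness for back pain: a trial", "year": "2016", "doi": "10.1000/abc"},
    {"title": "Yoga for Back Pain", "year": "2017", "doi": ""},
    {"title": "Yoga for back pain.", "year": "2017"},
]
unique = deduplicate(records)
print(len(records) - len(unique), "duplicates removed")
```

A known limitation of this sketch is that a record with a DOI will not match a copy of the same paper exported without one, so automated deduplication is normally followed by a manual check. The number of duplicates removed feeds directly into the PRISMA flow diagram.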

Measuring Agreement Between Reviewers

Inter-rater reliability should be calculated and reported at each screening stage, most commonly using Cohen’s Kappa coefficient. Kappa values are interpreted as:

Kappa Value | Level of Agreement
Below 0.00 | Poor agreement
0.00 to 0.20 | Slight agreement
0.21 to 0.40 | Fair agreement
0.41 to 0.60 | Moderate agreement
0.61 to 0.80 | Substantial agreement
0.81 to 1.00 | Almost perfect agreement

A Kappa of 0.61 or above is generally considered acceptable for systematic review purposes, though the threshold depends on the consequences of misclassification for the specific review.
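
For readers who want to see how the coefficient is derived, the sketch below computes Cohen's Kappa for two reviewers as (observed agreement − chance agreement) / (1 − chance agreement). The function name and the ten screening decisions are illustrative assumptions; screening software typically reports this value for you.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's Kappa for two raters who labelled the same records."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of records")
    n = len(rater_a)
    # Proportion of records on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical title/abstract decisions for ten records.
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 3 + ["exclude"] * 6 + ["include"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))
```

In this made-up example the reviewers agree on 8 of 10 records, yet Kappa lands in the moderate band because much of that agreement would be expected by chance alone.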

The PRISMA Flow Diagram

The PRISMA flow diagram provides a visual audit trail of the study selection process. It must be included in the final report and should document:

  • Total records identified from each database and additional source
  • Number of duplicates removed
  • Number screened at title and abstract stage and number excluded
  • Number assessed for full-text eligibility and number excluded with reasons
  • Number of studies included in the final review
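
Because each box in the flow diagram follows arithmetically from the one before it, the counts can be checked with a few lines of code before the figure is drawn. The sketch below is a simplified model under stated assumptions: the class and field names are hypothetical, the numbers are invented, and it leaves out the "reports not retrieved" box that the PRISMA 2020 diagram also includes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PrismaCounts:
    identified: Dict[str, int]          # records per database or other source
    duplicates_removed: int
    excluded_title_abstract: int
    excluded_full_text: Dict[str, int]  # exclusion reason -> number of reports

    @property
    def total_identified(self) -> int:
        return sum(self.identified.values())

    @property
    def screened(self) -> int:
        return self.total_identified - self.duplicates_removed

    @property
    def full_text_assessed(self) -> int:
        return self.screened - self.excluded_title_abstract

    @property
    def included(self) -> int:
        return self.full_text_assessed - sum(self.excluded_full_text.values())

    def check(self) -> None:
        """Raise if any stage of the flow would contain a negative count."""
        for name in ("screened", "full_text_assessed", "included"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} is negative; recheck the counts")


# Invented numbers for illustration only.
flow = PrismaCounts(
    identified={"MEDLINE": 500, "Embase": 700, "Trial registries": 50},
    duplicates_removed=250,
    excluded_title_abstract=900,
    excluded_full_text={"Wrong population": 40, "Wrong study design": 35},
)
flow.check()
print(flow.screened, flow.full_text_assessed, flow.included)
```

Deriving the later counts from the earlier ones, rather than typing each number separately, makes it harder for the diagram and the exclusion log to drift out of step.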

Sample PRISMA Flow Diagram

Data Extraction

Data extraction is the systematic retrieval of pre-specified information from each included study. The goal is to capture all data needed for synthesis, quality appraisal, and reporting in a standardized, reproducible manner.

Designing the Data Extraction Form

A data extraction form (also called a data collection form or charting form) should be piloted on two to three included studies before being used in full. The form should be developed by the review team collaboratively and should capture the following categories of information:

Category | Example Data Items
Bibliographic | Authors, year, journal, country of publication, funding source
Study Design | Design type, randomization method, blinding, allocation concealment
Population | Sample size, age, sex/gender, diagnosis or condition, inclusion and exclusion criteria
Intervention / Exposure | Type, dose, duration, frequency, delivery setting, comparator details
Outcomes | Outcome measures, measurement tools, time points, follow-up duration
Results | Effect sizes, confidence intervals, p-values, sample sizes per group, attrition
Quality / Bias | Risk of bias domain ratings (from formal appraisal tool)

Data extraction should be performed independently by two reviewers, with discrepancies resolved through discussion or referral to a third reviewer. Where data are ambiguous, missing, or inconsistently reported, authors of the original studies should be contacted for clarification. All decisions should be documented.

Common Difficulties in Data Extraction & Next Steps

Difficulty | Why it occurs | How to resolve it
Missing or unreported data | Authors omit standard deviations, denominators, or subgroup breakdowns, particularly in older publications | Contact the corresponding author with a specific data request; use the standard error or confidence interval to back-calculate SD; note the gap transparently and assess its impact on synthesis
Inconsistent outcome definitions across studies | Different studies measure the same construct with different tools, time points, or thresholds (e.g., hypoglycemia defined as plasma glucose below 3.0 vs. 3.9 mmol/L) | Pre-specify how definitional variants will be handled in the protocol; group studies by outcome definition in the narrative synthesis; conduct sensitivity analyses restricted to studies using the same definition
Multiple publications from the same study | A single trial generates a primary paper, secondary analyses, and long-term follow-up reports, inflating apparent study counts and risking double-counting of participants | Link all reports to a single study record during deduplication; extract data from the most complete report and supplement with secondary papers; document all linked publications in the included-studies table
Discrepancies between two independent extractors | Reviewers interpret ambiguous text, tables, or figures differently, producing conflicting values for the same data item | Resolve through structured discussion referencing the original text; escalate to a pre-specified third reviewer if consensus is not reached; calculate and report inter-rater agreement after a pilot extraction exercise
Data presented only in figures or graphs | Numerical values are embedded in bar charts, Kaplan-Meier curves, or forest plots without accompanying tables | Use validated digitizing software (e.g., WebPlotDigitizer) to extract approximate values; document the extraction method and its inherent imprecision; contact authors for the underlying data
Unit or scale inconsistencies | Studies report the same outcome in different units (e.g., eGFR in mL/min vs. mL/min/1.73 m², HbA1c in % vs. mmol/mol) or use different Likert scale directions | Convert all values to a common unit before pooling using validated conversion formulas; document every conversion in the extraction form; flag studies where conversion introduces meaningful imprecision
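
Two of the fixes in the table above are simple arithmetic, and a short sketch may help. The Python below back-calculates a standard deviation from a standard error or a 95% confidence interval, and converts HbA1c from % (NGSP) to mmol/mol (IFCC) using the IFCC-NGSP master equation. The function names are illustrative, not part of any extraction tool. The confidence-interval formula assumes a single-group mean and a sample large enough for the normal approximation; small samples call for a t-distribution multiplier instead of 1.96.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """SD of a single-group mean from its standard error: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """SD from a confidence interval for a single-group mean.

    Assumes a large sample (normal approximation); z = 1.96 gives a 95% CI.
    """
    return math.sqrt(n) * (upper - lower) / (2 * z)


def hba1c_percent_to_mmol_mol(percent: float) -> float:
    """Convert HbA1c from NGSP % to IFCC mmol/mol (master equation)."""
    return 10.929 * (percent - 2.15)


# Illustrative values only, not taken from any real study.
print(sd_from_se(se=0.5, n=100))                   # 5.0
print(round(hba1c_percent_to_mmol_mol(7.0)))       # 53
```

Whichever route is used, record in the extraction form which values were reported and which were derived, so that derived values can be excluded in a sensitivity analysis.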

Critical Appraisal and Quality Assessment

Every included study must be assessed for methodological quality and risk of bias. Critical appraisal is not a gate-keeping mechanism to exclude imperfect studies; rather, it provides the information needed to weight evidence appropriately and identify sources of variability in the review findings.

What Tools Should Be Used for Critical Appraisal?

Tool selection should be matched to the study design and pre-specified in the protocol. The most widely used tools are:

Tool | Study Design | Key Domains Assessed
Cochrane RoB 2.0 | Randomized controlled trials | Randomization process, deviations from intended interventions, missing outcome data, outcome measurement, selection of the reported result
ROBINS-I | Non-randomized studies of interventions | Confounding, selection of participants, classification of interventions, deviations from intended interventions, missing data, outcome measurement, selection of the reported result
QUADAS-2 | Diagnostic accuracy studies | Patient selection, index test, reference standard, flow and timing
CASP Qualitative Checklist | Qualitative studies | Research design, sampling, data collection, reflexivity, ethical issues, rigor
Newcastle-Ottawa Scale | Cohort and case-control studies | Selection, comparability, exposure or outcome assessment
AMSTAR-2 | Systematic reviews (for umbrella reviews) | Protocol, search, study selection, data extraction, risk of bias, synthesis

How to Handle Studies with High Risk of Bias

High risk of bias does not automatically disqualify a study from inclusion. The appropriate response depends on the overall evidence base:

  • Include the study but note the limitations in the narrative synthesis
  • Conduct a sensitivity analysis excluding high-risk studies to assess their influence on pooled estimates
  • Use the Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework to formally downgrade the overall certainty of evidence when risk of bias is a concern across included studies

Synthesizing Results

Synthesis brings together the data extracted from individual studies to produce conclusions about the body of evidence as a whole. Every systematic review must include at least a narrative synthesis; quantitative synthesis (meta-analysis) is an additional option when studies are sufficiently similar.

Narrative Synthesis

Narrative synthesis uses structured text and tables, rather than statistical pooling, to summarize and integrate findings. It is appropriate when:

  • Studies are too heterogeneous in design, population, or outcome to be pooled
  • Outcomes are qualitative or not amenable to statistical combination
  • The number of included studies is too small for robust pooling

A rigorous narrative synthesis should not be a simple list of individual study results. It should actively compare and contrast studies, identify patterns and divergences, consider how context, population, or design differences may explain variation, and draw evidence-based conclusions about the overall state of knowledge.

When Is Meta-Analysis Appropriate?

Meta-analysis is appropriate when included studies are clinically and methodologically similar enough that pooling their results is a meaningful scientific act rather than a statistical convenience. Before conducting a meta-analysis, address the following:

  • Clinical homogeneity: Are the populations, interventions, comparators, and outcomes sufficiently similar?
  • Statistical heterogeneity: Use the I² statistic (values above 50% are commonly taken to indicate substantial heterogeneity) and Cochran’s Q (chi-squared) test to assess variability in effects between studies.
  • Model selection: A random-effects model is generally preferred when studies are drawn from different settings; a fixed-effect model assumes all studies estimate one true effect.
  • Subgroup analyses: Pre-specify any planned subgroup analyses (e.g., by age group, dose, or risk of bias) to explore sources of heterogeneity.
  • Sensitivity analyses: Assess the robustness of pooled estimates by repeating analyses with high-risk studies excluded, or using alternative statistical models.
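
The quantities in the checklist above can be sketched in a few lines of Python. This is a minimal illustration, not a substitute for RevMan, R, or Stata: it uses inverse-variance weights, Cochran's Q, I², and the DerSimonian-Laird estimator for between-study variance, which is one common random-effects choice among several. The function name and the input data are hypothetical.

```python
import math


def pool_effects(effects, std_errors):
    """Inverse-variance pooling with Cochran's Q, I², and DerSimonian-Laird tau²."""
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching effects and SEs")

    weights = [1.0 / se**2 for se in std_errors]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird estimate of between-study variance (truncated at zero).
    c = sum(weights) - sum(w**2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau² to each study's within-study variance.
    re_weights = [1.0 / (se**2 + tau2) for se in std_errors]
    random = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se_random = math.sqrt(1.0 / sum(re_weights))
    return {
        "fixed": fixed,
        "random": random,
        "se_random": se_random,
        "ci95": (random - 1.96 * se_random, random + 1.96 * se_random),
        "Q": q,
        "df": df,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical log odds ratios; the last study is rated high risk of bias.
effects = [-0.35, -0.10, -0.42, 0.05]
std_errors = [0.12, 0.20, 0.15, 0.25]
high_risk = [False, False, False, True]

all_studies = pool_effects(effects, std_errors)

# Sensitivity analysis: repeat the pooling with high-risk studies excluded.
low_risk_only = pool_effects(
    [y for y, hr in zip(effects, high_risk) if not hr],
    [s for s, hr in zip(std_errors, high_risk) if not hr],
)
print(round(all_studies["I2"], 1), round(all_studies["random"], 3))
print(round(low_risk_only["I2"], 1), round(low_risk_only["random"], 3))
```

If the pooled estimate shifts materially between the two runs, that sensitivity to high-risk studies belongs in the results and in the GRADE assessment.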

Assessing Publication Bias

Publication bias occurs when studies with positive or statistically significant results are more likely to be published than those with null findings, leading to an overestimation of effects in the published literature. Methods for assessing publication bias include:

  • Funnel plots: A scatter plot of effect size against a measure of study precision; asymmetry in the funnel suggests possible publication bias. At least 10 studies are generally needed for meaningful interpretation.
  • Egger’s test: A regression-based statistical test for funnel plot asymmetry.
  • Trim and fill method: An approach that estimates the number of missing studies and adjusts the pooled estimate accordingly.
  • Comprehensive grey literature searching: The most proactive safeguard against publication bias is a thorough search that includes unpublished and grey literature from the outset.
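
As an illustration of how Egger's test works, the sketch below regresses each study's standardized effect (effect divided by its standard error) on its precision (one divided by the standard error). An intercept far from zero points to funnel plot asymmetry. The function name is hypothetical, and the code returns only the intercept, slope, and t-statistic; obtaining a p-value requires a t-distribution with n − 2 degrees of freedom from a statistics library. The code runs with as few as three studies for arithmetic reasons only, which does not relax the ten-study guidance above.

```python
import math


def egger_regression(effects, std_errors):
    """Egger's regression: standardized effect (y/se) on precision (1/se)."""
    n = len(effects)
    if n != len(std_errors) or n < 3:
        raise ValueError("need at least three studies with matching effects and SEs")

    x = [1.0 / se for se in std_errors]                     # precision
    z = [y / se for y, se in zip(effects, std_errors)]      # standardized effect
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same standard error")

    slope = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z)) / sxx
    intercept = mean_z - slope * mean_x

    # Residual variance and standard error of the intercept (ordinary least squares).
    residuals = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    s2 = sum(r**2 for r in residuals) / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_x**2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_intercept, "t": t_stat, "df": n - 2}


# Hypothetical data in which smaller (less precise) studies report larger effects.
result = egger_regression([0.10, 0.22, 0.35, 0.90, 0.15],
                          [0.05, 0.10, 0.20, 0.40, 0.08])
print(result["intercept"], result["t"], result["df"])
```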

Assessing the Overall Certainty of Evidence Using GRADE

The GRADE (Grading of Recommendations Assessment, Development and Evaluation) framework provides a systematic approach for rating the overall certainty of a body of evidence for each outcome. Evidence begins at a rating based on study design and may be downgraded or upgraded:

GRADE Rating | Starting Point | Can Be Downgraded For
High | Randomized controlled trials | Risk of bias, inconsistency, indirectness, imprecision, publication bias
Moderate | Downgraded RCTs or upgraded observational studies | Same five factors
Low | Observational studies | Same five factors
Very Low | Downgraded observational studies or case series | Same five factors
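
The bookkeeping behind the table can be sketched as follows. GRADE ratings are structured judgments, not a formula, so this Python example only tracks the arithmetic of starting levels and level changes once reviewers have made those judgments. The function name, the domain keys, and the convention of passing upgrades as a single count are assumptions made for illustration.

```python
LEVELS = ["Very Low", "Low", "Moderate", "High"]
DOMAINS = {"risk_of_bias", "inconsistency", "indirectness",
           "imprecision", "publication_bias"}


def grade_certainty(design, downgrades=None, upgrades=0):
    """Return the certainty rating after applying reviewer-judged level changes.

    design: "rct" starts at High; any other design starts at Low.
    downgrades: mapping of domain -> levels to downgrade (0, 1, or 2).
    upgrades: total levels to upgrade, as judged by the review team.
    """
    downgrades = downgrades or {}
    unknown = set(downgrades) - DOMAINS
    if unknown:
        raise ValueError(f"unknown GRADE domain(s): {sorted(unknown)}")
    if any(v not in (0, 1, 2) for v in downgrades.values()):
        raise ValueError("each domain is downgraded by 0, 1, or 2 levels")

    start = LEVELS.index("High") if design == "rct" else LEVELS.index("Low")
    level = start - sum(downgrades.values()) + upgrades
    # Clamp to the four-level scale.
    return LEVELS[max(0, min(len(LEVELS) - 1, level))]


print(grade_certainty("rct", {"risk_of_bias": 1}))                   # Moderate
print(grade_certainty("rct", {"risk_of_bias": 1, "imprecision": 2})) # Very Low
print(grade_certainty("observational"))                              # Low
```

The reasons behind each downgrade still need to be written out per outcome, typically as footnotes to the Summary of Findings table.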

Reporting the Systematic Review

Clear, transparent reporting is essential so that readers can assess the validity of the review, replicate the methods, and apply the findings appropriately. The primary reporting standard for systematic reviews is the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement.

The PRISMA Statement

The PRISMA 2020 update comprises a 27-item checklist organized around the following sections of a systematic review report:

  • Title: Identify the document as a systematic review (and meta-analysis where applicable).
  • Abstract: Provide a structured summary covering background, objectives, data sources, eligibility criteria, risk of bias assessment, synthesis methods, results, limitations, conclusions, and registration details.
  • Introduction: Describe the rationale for the review and explicitly state the research question.
  • Methods: Detail eligibility criteria, information sources, search strategy (including the full search string for at least one database), selection process, data extraction process, risk of bias assessment methods, and synthesis methods.
  • Results: Present the PRISMA flow diagram, characteristics of included studies, risk of bias assessment results, and synthesized findings.
  • Discussion: Interpret findings in context, discuss limitations, and present conclusions.
  • Registration: Provide the protocol registration number and any deviations from the registered protocol.

PRISMA extensions are available for specific review types, including PRISMA-P (protocols), PRISMA-NMA (network meta-analysis), PRISMA-IPD (individual participant data), PRISMA-DTA (diagnostic test accuracy), and PRISMA-ScR (scoping reviews).

Key Sections of the Methods Chapter

The methods section is the most critical component for reproducibility. It should allow an independent team to replicate the review exactly. Checklist for a complete methods section:

  • Eligibility criteria stated explicitly (population, intervention, comparator, outcomes, study design, language, date range if applicable)
  • Complete search strategy for at least one database presented verbatim
  • All databases and supplementary sources listed, with dates searched
  • Screening process described (number of reviewers, software, handling of disagreements)
  • Data extraction process described (form used, number of reviewers, piloting)
  • Risk of bias tool named and process for assessment described
  • Synthesis approach justified (narrative synthesis or meta-analysis, with statistical methods specified)
  • Any planned subgroup analyses, sensitivity analyses, or assessment of publication bias described
  • GRADE assessment planned or reasons for exclusion of GRADE stated

Dissemination and Publication

Completing a systematic review without disseminating the findings wastes the effort invested and denies the research community access to a potentially important evidence synthesis. Dissemination strategies should be considered from the outset.

Choosing a Target Journal

Selection of a target journal should be guided by:

  • The subject area and audience most likely to use the findings (clinicians, policymakers, researchers)
  • Whether the journal accepts systematic reviews and has published similar work previously
  • Impact factor and indexing in major bibliographic databases
  • Open-access requirements of funders and institutions
  • Adherence to PRISMA as a condition of submission

Journals that specialize in systematic reviews include Systematic Reviews (BioMed Central), the Cochrane Database of Systematic Reviews, and Campbell Systematic Reviews (the journal of the Campbell Collaboration). Many field-specific journals also publish high-quality systematic reviews.

Handling Peer Review

Peer reviewers of systematic reviews commonly focus on:

  • Adequacy and comprehensiveness of the search strategy
  • Consistency of inclusion criteria application
  • Appropriateness of risk of bias tool selection and application
  • Justification of the synthesis approach
  • Transparency about deviations from the registered protocol

Responses to reviewers should address each comment systematically, referencing specific changes made in the manuscript with page and line numbers.

Common Journal Peer Reviewer Concerns for Systematic Reviews

Reviewer concern | Underlying reason | How to address it
Search strategy is incomplete or poorly documented | Reviewers cannot assess reproducibility if databases, date ranges, or the full search string are absent or vague | Provide the verbatim search string for every database searched as a supplementary file; list all databases, grey literature sources, and date ranges; include the PRESS peer review form
No protocol registration or late registration | Absence of a pre-registered protocol raises suspicion of outcome switching or post-hoc methodological decisions | Register in PROSPERO before screening begins; cite the registration number in the abstract and methods; explain and justify any deviations from the registered protocol transparently
Single reviewer used for screening or extraction | Using one reviewer introduces subjective bias that undermines the core methodological advantage of a systematic review | Report that two independent reviewers conducted all screening and extraction stages; provide Cohen’s Kappa at each stage; describe the arbitration process for disagreements
Risk of bias assessment is superficial or tool is mismatched | Applying the wrong tool (e.g., Newcastle-Ottawa for RCTs) or reporting only overall judgments without domain-level reasoning weakens appraisal credibility | Select and justify the tool for each study design (RoB 2.0 for RCTs, ROBINS-I for non-randomized studies of interventions); report domain-level judgments for every included study; present traffic-light plots
Meta-analysis conducted despite substantial heterogeneity | Pooling clinically or statistically heterogeneous studies produces a misleading average that may obscure more than it reveals | Report I² with confidence intervals and Cochran’s Q (chi-squared) test; justify the decision to pool or not pool; explore heterogeneity through pre-specified subgroup and sensitivity analyses; consider narrative synthesis if I² exceeds 75%
PRISMA flow diagram or checklist is incomplete | Missing counts, absent exclusion reasons, or an unpopulated checklist prevent readers from auditing the selection process | Provide a fully populated PRISMA 2020 flow diagram with reasons for every full-text exclusion; submit the completed PRISMA checklist as a supplementary file; map each checklist item to the manuscript page and line where it is addressed
Conclusions overreach the strength of the evidence | Authors draw strong recommendations from low-certainty evidence or small numbers of studies without acknowledging limitations | Apply the GRADE framework and explicitly state the certainty rating for each outcome in a Summary of Findings table; calibrate conclusion language to the evidence grade; dedicate a paragraph to limitations including publication bias
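
For the inter-rater agreement statistic mentioned in the table, a minimal sketch of Cohen's Kappa for two reviewers is shown below. It compares observed agreement with the agreement expected by chance from each reviewer's marginal decision rates. The function name and the decision data are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labelling the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of records")
    n = len(rater_a)

    # Proportion of records on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / n**2

    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions on 100 records.
reviewer_1 = ["include"] * 20 + ["exclude"] * 80
reviewer_2 = (["include"] * 15 + ["exclude"] * 5      # reviewer 1 said include
              + ["include"] * 10 + ["exclude"] * 70)  # reviewer 1 said exclude
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.57
```

Here the reviewers agree on 85% of records, but because most records are excluded by both, chance agreement is already 65%, which is why Kappa is reported instead of raw agreement.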

Other Dissemination Channels

Beyond journal publication, consider:

  • Conference presentations and posters to reach practitioners and researchers early
  • Policy briefs or lay summaries for non-specialist audiences
  • Preprint posting (e.g., medRxiv) to make findings available while under peer review
  • Sharing data extraction files and screening decisions as supplementary materials or on open repositories (OSF, Zenodo, Figshare) to facilitate replication and updating

Software, Tools, and Emerging Technologies

Recommended Software by Task

Task | Recommended Tools | Notes
Protocol registration | PROSPERO, OSF Registries | Free; PROSPERO is health-specific
Reference management and deduplication | EndNote, Zotero, Mendeley | Deduplication before importing into screening tools is essential
Screening and data extraction | Covidence, Rayyan, EPPI-Reviewer | Covidence integrates with Cochrane; Rayyan is free
Statistical analysis (meta-analysis) | RevMan 5, R (meta package), Stata (metan) | RevMan is free and used by Cochrane; R and Stata offer greater flexibility
Risk of bias assessment | RoB 2.0 web tool, RobotReviewer | RobotReviewer uses machine learning to assist RoB assessment
GRADE assessment | GRADEpro GDT | Free; produces Summary of Findings tables
PRISMA flow diagram | PRISMA2020 R package, Lucidchart, draw.io | PRISMA2020 package generates diagrams directly from screening counts
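
Whichever diagram tool is used, the screening counts should reconcile from one stage to the next before they are entered. The sketch below is a simplified consistency check in Python, assuming a single-stream flow with hypothetical field names; it is not the interface of the PRISMA2020 package, and it ignores the distinction PRISMA 2020 draws between reports and studies.

```python
from dataclasses import dataclass, field


@dataclass
class FlowCounts:
    records_identified: int
    duplicates_removed: int
    records_excluded: int            # excluded at title/abstract screening
    reports_not_retrieved: int
    fulltext_exclusions: dict = field(default_factory=dict)  # reason -> count

    @property
    def records_screened(self) -> int:
        return self.records_identified - self.duplicates_removed

    @property
    def reports_sought(self) -> int:
        return self.records_screened - self.records_excluded

    @property
    def reports_assessed(self) -> int:
        return self.reports_sought - self.reports_not_retrieved

    @property
    def reports_included(self) -> int:
        return self.reports_assessed - sum(self.fulltext_exclusions.values())

    def check(self) -> None:
        """Raise if any derived stage is negative, i.e. the counts cannot reconcile."""
        for name in ("records_screened", "reports_sought",
                     "reports_assessed", "reports_included"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} is negative; recheck the counts")


# Hypothetical counts for illustration.
flow = FlowCounts(1200, 200, 850, 10,
                  {"Wrong population": 80, "Wrong outcome": 40})
flow.check()
print(flow.records_screened, flow.reports_assessed, flow.reports_included)  # 1000 140 20
```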

The Role of Artificial Intelligence in Systematic Reviews

Artificial intelligence and machine learning tools are increasingly used to assist with systematic review tasks, particularly those that are repetitive and high-volume. Current evidence-supported applications include:

  • Title and abstract screening assistance: Machine learning classifiers can prioritize records most likely to be relevant, reducing the number of records a human reviewer must screen (known as active learning or technology-assisted review). These tools include ASReview, EPPI-Reviewer’s AI plugin, and Rayyan’s machine learning feature.
  • Risk of bias automation: Tools such as RobotReviewer use natural language processing to extract risk of bias signals from trial reports.
  • Data extraction assistance: Large language models are being evaluated for structured data extraction, though human verification remains essential.

AI tools do not replace the need for human judgment and dual review. Any use of AI in the review process should be transparently described in the methods section, including the tool used, how it was applied, and what human oversight was maintained.

Common Pitfalls and How to Avoid Them

Pitfall | Why It Matters | How to Avoid It
Poorly defined PICO question | Leads to unfocused searches, inconsistent eligibility decisions, and conclusions that cannot be applied | Spend time refining the question before any other step; pilot the criteria on sample records
Searching too few databases | Misses relevant studies, introducing selection bias | Search a minimum of three major databases plus grey literature sources
Errors in the search strategy | Published studies report errors in 70-90% of systematic review search strategies | Use PRESS peer review; involve an experienced librarian in strategy design
Single reviewer screening or extraction | Introduces subjective bias into study selection and data capture | Mandate dual review at every stage; calculate and report inter-rater agreement
No protocol or late registration | Allows post-hoc outcome switching and reduces credibility | Register in PROSPERO before the search begins
Inappropriate meta-analysis | Pooling clinically heterogeneous studies produces a misleading average | Assess clinical and statistical homogeneity before pooling; use narrative synthesis if heterogeneity is high
Failing to assess publication bias | Review conclusions may overestimate beneficial effects | Search grey literature; produce funnel plots when 10 or more studies are available
Incomplete PRISMA reporting | Prevents replication and reduces credibility with journals and readers | Complete the PRISMA checklist before submission; share it as a supplementary document

When and How to Update a Systematic Review

Evidence accumulates over time, and a systematic review that is current today may become outdated as new primary studies are published. Most systematic reviews should be considered for updating every two to five years, or sooner if:

  • A significant new trial or body of evidence has been published in the field
  • A new intervention, comparator, or outcome has become clinically relevant
  • The existing review’s conclusions have been challenged by subsequent evidence
  • Guidelines based on the review are being revised

The update process should follow the same rigorous methodology as the original review, with a new or updated protocol registered in PROSPERO. The update search is typically run from the date of the last search in the original review. Changes in methodology or scope between the original and updated review should be transparently documented.

Living systematic reviews represent a more resource-intensive model in which the review is continuously updated as new evidence becomes available, often using semi-automated screening and rapid publication cycles. This model is particularly relevant for fast-moving fields, such as emerging infectious diseases.

Frequently Asked Questions

Can I conduct a systematic review on my own?

Conducting a rigorous systematic review as a solo researcher is strongly discouraged and is inconsistent with most published standards. At minimum, a second independent reviewer is required for screening and data extraction to protect against subjective bias. That said, some institutions allow solo reviews for degree projects provided that methods are clearly documented, bias limitations are acknowledged, and an experienced supervisor or collaborator reviews key decisions.

Do I need PROSPERO registration even for a student project?

Registration is not legally mandated, but it is strongly recommended for any systematic review regardless of context. Journals increasingly require PROSPERO registration as a condition of submission. Even for a student project, registration demonstrates methodological rigor, prevents duplication of effort, and protects your work by establishing priority. Registration is free and typically takes less than an hour to complete.

How do I handle studies published in languages other than English?

Restricting a systematic review to English-language studies can introduce language bias, particularly in fields where important research is published in German, French, Spanish, or other languages. Best practice is to search without language restrictions, then translate or arrange translation for non-English studies that meet your eligibility criteria. If translation is not feasible, the language restriction must be transparently disclosed as a limitation.

What is the difference between a “reason for exclusion” and a “reason for inclusion”?

Inclusion is binary: a study either meets all eligibility criteria or it does not. Only one reason for exclusion needs to be recorded for each excluded full-text study (the primary or most decisive reason). A common error is recording multiple reasons per study; the convention is to apply criteria hierarchically and stop at the first unmet criterion. Reasons should be presented in aggregate in the PRISMA flow diagram, not listed individually for every excluded record.
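
The hierarchical convention described above can be sketched in Python: criteria are checked in a pre-specified order, the first unmet criterion becomes the single recorded reason, and the reasons are then tallied for the flow diagram. The criteria, their order, and the study fields shown here are hypothetical; a real review would take them from its protocol.

```python
from collections import Counter

# Hypothetical criteria, in the order pre-specified in the protocol.
CRITERIA = [
    ("Wrong population", lambda s: s["adult_population"]),
    ("Wrong intervention", lambda s: s["intervention_of_interest"]),
    ("Wrong study design", lambda s: s["randomized"]),
]


def primary_exclusion_reason(study, criteria=CRITERIA):
    """Return the first unmet criterion, or None if the study is included."""
    for reason, is_met in criteria:
        if not is_met(study):
            return reason  # stop at the first unmet criterion
    return None


def tally_exclusions(studies, criteria=CRITERIA):
    """Aggregate primary exclusion reasons for the PRISMA flow diagram."""
    reasons = (primary_exclusion_reason(s, criteria) for s in studies)
    return Counter(r for r in reasons if r is not None)


full_texts = [
    {"adult_population": True, "intervention_of_interest": True, "randomized": True},
    {"adult_population": False, "intervention_of_interest": False, "randomized": True},
    {"adult_population": True, "intervention_of_interest": True, "randomized": False},
]
print(tally_exclusions(full_texts))
```

The second study fails two criteria but is counted once, under "Wrong population", because that criterion comes first in the hierarchy.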

How many studies do I need to include for a valid systematic review?

There is no minimum number of included studies required for a valid systematic review. A review that identifies zero studies satisfying the eligibility criteria is informative: it reveals a gap in the evidence base. Reviews with very few studies can still be valid, though conclusions must be drawn cautiously. Meta-analysis requires a minimum of two studies, and funnel plot interpretation is unreliable with fewer than ten. The validity of a review depends on the rigor of its methods, not its yield.

Should I use ChatGPT or other AI tools to help screen studies?

General-purpose large language models such as ChatGPT are not validated for systematic review screening and should not replace human dual review. They may introduce systematic errors or hallucinate eligibility decisions. Purpose-built tools such as ASReview, Rayyan’s machine learning feature, and Cochrane’s RCT classifier are designed and evaluated for these tasks. Any AI assistance in the screening process must be fully described in the methods section, and human verification of AI-assisted decisions is non-negotiable.

What is the difference between risk of bias and quality assessment?

“Risk of bias” is the preferred contemporary term for what was historically called “quality assessment.” Risk of bias refers specifically to systematic error in study results due to flaws in design, conduct, or reporting. The term “quality” is broader and may encompass reporting quality, external validity, or methodological rigor. Modern tools assess risk of bias in specific domains: RoB 2.0 assigns a judgment of low risk, some concerns, or high risk, while ROBINS-I uses low, moderate, serious, or critical risk. A study can be well-reported but still have a high risk of bias, and vice versa.

How should I handle disagreements between co-reviewers that cannot be resolved by discussion?

Irresolvable disagreements between two reviewers should be escalated to a pre-specified third reviewer or arbitrator, who makes the final decision. This process should be described in the protocol and methods section before the review begins. In practice, persistent disagreements often signal ambiguity in the eligibility criteria themselves; revising and clarifying criteria during a calibration exercise before formal screening begins can substantially reduce downstream conflict.

What is the difference between a systematic review and a realist review?

A systematic review aggregates evidence to determine whether an intervention works, typically pooling quantitative data and ranking studies by design quality (RCTs preferred). A realist review asks why, for whom, and under what circumstances an intervention works, using theory-driven synthesis of mixed evidence. Systematic reviews control for context; realist reviews treat context as central. The output of a systematic review is usually an effect estimate; a realist review produces refined programme theory in the form of Context, Mechanism, Outcome (CMO) configurations.
