How to Do a Literature Search: A Practical Guide for Researchers
Contents
- What is a literature search?
- Why the Purpose of Your Research Changes Everything
- Step 1: Define Your Research Question
- Step 2: Identify Search Terms
- Step 3: Choose Your Databases
- Step 4: Build and Run Your Search Strategy
- Step 5: Apply Filters and Limits
- Step 6: Manage Your Results
- Step 7: Check for Missing Literature
- Step 8: Document and Report Your Search
- Common Mistakes to Avoid in Your Literature Search
- Summary: Matching Search Rigour to Research Purpose
- Frequently Asked Questions
What is a literature search?
A literature search is the systematic process of identifying, locating, and retrieving published works relevant to a research question. It is not the same as writing a literature review. The search comes first, and its quality determines everything that follows. A poorly executed search produces a biased, incomplete literature review, whereas a well-executed one supports a comprehensive and credible review. This guide walks you through the process step by step, with attention to how the purpose of your research shapes the depth and rigour of the search itself.
Why the Purpose of Your Research Changes Everything
Before running a single search, you must ask: Why am I doing this literature search? The answer fundamentally changes how comprehensive, documented, and reproducible your process needs to be.
| Research Type | Primary Goal | Search Depth | Documentation Required | Typical Databases |
| Doctoral Thesis/Dissertation | Establish originality; map field comprehensively | Exhaustive | High, must demonstrate thorough coverage | Multiple: PubMed, Embase, Scopus, Web of Science + grey literature |
| Original Research Article | Justify gap and rationale | Focused | Moderate, enough to contextualize | 2–3 core databases |
| Narrative Review | Synthesise themes and concepts | Broad, selective | Low-moderate | 4–5 databases, expert suggestion |
| Systematic Review/Meta-analysis | Answer a specific clinical/policy question | Exhaustive, reproducible | Detailed and mandatory, PRISMA flow required | All major databases + trial registries |
Doctoral Thesis or Dissertation
A doctoral literature search must demonstrate that you know your field well enough to identify a genuine gap. It is typically the broadest type of search, covering:
- Foundational literature going back to seminal works, sometimes decades
- Methodological literature: not just what has been studied, but how
- Grey literature: conference abstracts, preprints, institutional reports, theses from other institutions
- Adjacent disciplines: for example, a thesis on diabetes-related cognitive decline must cover both endocrinology and neuropsychology literature
The search is rarely conducted once. Doctoral candidates revisit and update the search as their research question sharpens, often running a final update search within 3–6 months of thesis submission.
Tools like R Discovery are particularly useful at this stage. R Discovery’s AI-powered feed learns from papers you mark as relevant and surfaces related work you may not have found through keyword searches alone, a significant advantage when you are trying to ensure breadth without becoming overwhelmed by volume. Its ability to recommend papers from across disciplines helps doctoral researchers working at the intersection of fields.
Original Research Article
When writing the introduction and discussion sections of an empirical paper, the literature search is focused rather than exhaustive. The goal is to:
- Establish that the specific question has not been adequately answered
- Cite the most current, high-quality evidence on key concepts
- Place your findings in context with comparable studies
A targeted search of 2–3 major databases (e.g., PubMed plus Scopus) with a limited date range (commonly the past 5–10 years, with key landmark studies regardless of date) is usually sufficient. R Discovery supports this workflow through its daily paper recommendations and citation-chasing features, which allow researchers to quickly identify the 20–40 most cited and most recent papers on a topic without running lengthy manual searches.
Narrative Review
A narrative review synthesises literature thematically and does not require a fully reproducible search protocol. However, it still requires a structured search to avoid the criticism of cherry-picking. Best practices include:
- Searching at least 4–5 databases
- Documenting the search terms used, even if not to PRISMA standards
- Being transparent about inclusion scope (e.g., English-language only, last 10 years)
- Using expert consultation or reference list scanning to supplement
R Discovery is well-suited to narrative review workflows. Its recommendation engine, trained on your reading history, helps surface thematically related papers, and its collections feature lets you organise papers by theme before synthesis begins.
Systematic Review or Meta-analysis
This is the most rigorous form of literature search. It must be:
- Pre-registered (on PROSPERO, for health-related reviews)
- Reproducible: another researcher running the same search should get the same results
- Comprehensive: missing key studies threatens the validity of pooled estimates
- Documented in full PRISMA format with a flow diagram
The search must cover multiple databases (typically a minimum of PubMed, Embase, and the Cochrane Library for biomedical topics), plus trial registries such as ClinicalTrials.gov and the WHO International Clinical Trials Registry Platform. Hand-searching key journals and contacting authors for unpublished data may also be required.
R Discovery plays a supporting rather than primary role here: it is ideal for keeping track of ongoing literature during a long review, flagging new publications that may need to be included in an updated search, and reading and annotating full texts during screening. Its ChatPDF feature allows you to quickly determine whether the study in question included the variables/outcomes you are interested in, before you dive into reading the full paper.
No literature search can be outsourced to AI completely, and the PRISMA guidelines (mandatory for systematic reviews and meta-analyses) make it clear that the researcher(s) must be in complete charge of the search process at every step.
Step 1: Define Your Research Question
Every literature search begins with a clearly formulated question. Structured frameworks help translate a broad topic into searchable concepts.
PICO (most common in clinical biomedicine):
| Element | Meaning | Example |
| P | Population/Problem | Adults with type 2 diabetes |
| I | Intervention | GLP-1 receptor agonists |
| C | Comparator | Metformin |
| O | Outcome | Glycaemic control, weight loss |
Other frameworks include:
- SPIDER (qualitative research): Sample, Phenomenon of Interest, Design, Evaluation, Research type
- PEO (qualitative/social): Population, Exposure, Outcome
- ECLIPSE (health policy): Expectation, Client group, Location, Impact, Professionals, SErvice
Step 2: Identify Search Terms
From your PICO (or equivalent framework), extract keywords for each concept. Then expand each keyword with:
- Synonyms: “myocardial infarction,” “heart attack,” “MI,” “AMI”
- British/American spelling variants: “haematology” vs “hematology”
- Acronyms and abbreviations: “T2DM,” “DM2,” “NIDDM” (non-insulin-dependent diabetes mellitus)
- Broader and narrower terms: from controlled vocabulary (see below)
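The expansion step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `or_group` and the term list are made up for the example, and real databases may apply their own quoting rules.

```python
def or_group(terms):
    """Join the variants of ONE concept with OR, quoting multi-word phrases."""
    seen = set()
    parts = []
    for term in terms:
        cleaned = term.strip()
        key = cleaned.lower()
        if not key or key in seen:
            # Skip blanks and case-insensitive duplicates.
            continue
        seen.add(key)
        # Multi-word phrases are quoted so they are searched as a phrase.
        parts.append(f'"{cleaned}"' if " " in cleaned else cleaned)
    return "(" + " OR ".join(parts) + ")"


# Hypothetical term list for the "myocardial infarction" concept.
mi_terms = ["myocardial infarction", "heart attack", "MI", "AMI", "Heart Attack"]
print(or_group(mi_terms))
```

Keeping each concept as its own list makes it easy to add a spelling variant or abbreviation later without rewriting the whole search string.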
Controlled Vocabulary vs Free Text
| Type | Examples | When to Use |
| Controlled vocabulary (MeSH, Emtree) | “Neoplasms,” “Antineoplastic Agents” | Precise retrieval in indexed databases (PubMed, Embase) |
| Free text / keywords | “cancer,” “tumour,” “anti-cancer drug” | Catches new terms, preprints, grey literature |
| Combined | Both, joined with OR | Best practice: maximises sensitivity |
In PubMed, MeSH (Medical Subject Headings) terms are assigned by indexers to each article. Searching “Heart Failure”[MeSH] retrieves all papers tagged with that heading, regardless of what word the authors used. Combine this with free-text keywords in the title and abstract fields using the [tiab] tag for maximum coverage.
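As a rough sketch of that combination, the snippet below assembles one concept block from a MeSH heading and free-text keywords. The helper name `pubmed_concept` is hypothetical, and "cardiac failure" is an illustrative synonym that does not come from this guide; only the [MeSH] and [tiab] tags are taken from the text above.

```python
def pubmed_concept(mesh_headings, free_text_terms):
    """Build one concept block: MeSH headings OR title/abstract keywords."""
    mesh = [f'"{heading}"[MeSH]' for heading in mesh_headings]
    tiab = [f'"{term}"[tiab]' for term in free_text_terms]
    return "(" + " OR ".join(mesh + tiab) + ")"


heart_failure = pubmed_concept(
    mesh_headings=["Heart Failure"],
    free_text_terms=["heart failure", "cardiac failure"],  # illustrative keywords
)
print(heart_failure)
```

The controlled-vocabulary term catches indexed papers regardless of the authors' wording, while the [tiab] terms catch records that have not been indexed yet.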
R Discovery simplifies this step for less experienced researchers by allowing natural-language topic input and automatically mapping it to relevant papers, effectively handling some of the synonym expansion behind the scenes.
Step 3: Choose Your Databases
| Database | Coverage | Best For |
| PubMed/MEDLINE | Biomedical, clinical, life sciences | First stop for all biomedical searches |
| Embase | Pharmacological, drug literature, European journals | Drug trials, adverse effects, European research |
| Cochrane Library | Systematic reviews, RCTs | Evidence-based clinical questions |
| Scopus | Multidisciplinary | Citation analysis, broad coverage |
| Web of Science | Multidisciplinary | Citation tracking, impact metrics |
| CINAHL | Nursing and allied health | Nursing, physiotherapy, nutrition |
| PsycINFO | Psychology, psychiatry | Mental health, behavioural science |
| ClinicalTrials.gov | Registered trials | Ongoing and unpublished trials (systematic reviews) |
| Google Scholar | Very broad, including grey literature | Supplementary; not for primary systematic searches |
R Discovery aggregates content from across many of these sources and adds AI-powered curation on top. For researchers at institutions with limited database subscriptions, R Discovery’s ability to surface open-access papers and indicate full-text availability is particularly valuable.
Step 4: Build and Run Your Search Strategy
A search strategy combines your terms using Boolean operators:
- OR: broadens the search; use within a concept (diabetes OR hyperglycaemia OR “high blood sugar”)
- AND: narrows the search; use between concepts (diabetes AND metformin AND “cardiovascular outcomes”)
- NOT: excludes terms (use sparingly; can inadvertently exclude relevant papers)
Other Search Techniques
- Truncation: diabet* retrieves diabetes, diabetic, diabetics, diabetologist
- Wildcards: wom?n retrieves woman and women (wildcard symbols vary by database; PubMed, for example, supports only the * truncation symbol)
- Phrase searching: “insulin resistance” in quotation marks retrieves the exact phrase
- Field tags: limit to title/abstract, MeSH term, author, journal, publication year
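To see what truncation and wildcards do to a word list, here is a small simulation using Python's standard `fnmatch` module. It only mimics the behaviour described above (`*` for any run of characters, `?` for exactly one character); each database defines its own symbols and rules, and the vocabulary is invented for the example.

```python
from fnmatch import fnmatchcase

# Invented vocabulary; "diabolic" and "womankind" are deliberate near-misses.
vocabulary = ["diabetes", "diabetic", "diabetologist", "diabolic",
              "woman", "women", "womankind"]


def matches(pattern, words):
    """Return the words matched by a pattern (* = any run, ? = one character)."""
    return [word for word in words if fnmatchcase(word, pattern)]


print(matches("diabet*", vocabulary))
print(matches("wom?n", vocabulary))
```

Running a pattern against a list of known terms like this is a quick way to spot truncation that is too aggressive before it inflates a real result set.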
Example Search Block (PubMed)
A search for GLP-1 receptor agonists and cardiovascular outcomes in type 2 diabetes might look like:
("Glucagon-Like Peptide-1 Receptor"[MeSH] OR "GLP-1 receptor agonist*"[tiab] OR
semaglutide[tiab] OR liraglutide[tiab] OR dulaglutide[tiab])
AND
("Diabetes Mellitus, Type 2"[MeSH] OR "type 2 diabetes"[tiab] OR "T2DM"[tiab])
AND
("Cardiovascular Diseases"[MeSH] OR "cardiovascular outcome*"[tiab] OR "MACE"[tiab]
OR "major adverse cardiac event*"[tiab])
This kind of multi-concept Boolean block is the backbone of a rigorous search. For systematic reviews, it is typically peer-reviewed by a medical librarian using the PRESS checklist.
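If you prefer to keep the strategy in a script, the block above can be assembled programmatically. The sketch below joins the three concept blocks with AND and builds a request URL for NCBI's E-utilities ESearch endpoint; it only constructs the URL and sends nothing. Check the current E-utilities documentation and usage policy before running real requests.

```python
from urllib.parse import urlencode

# Concept blocks from the example search, typed with straight quotes.
concepts = [
    '("Glucagon-Like Peptide-1 Receptor"[MeSH] OR "GLP-1 receptor agonist*"[tiab] '
    'OR semaglutide[tiab] OR liraglutide[tiab] OR dulaglutide[tiab])',
    '("Diabetes Mellitus, Type 2"[MeSH] OR "type 2 diabetes"[tiab] OR "T2DM"[tiab])',
    '("Cardiovascular Diseases"[MeSH] OR "cardiovascular outcome*"[tiab] '
    'OR "MACE"[tiab] OR "major adverse cardiac event*"[tiab])',
]

# OR lives inside each block; AND joins the blocks.
query = " AND ".join(concepts)

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
url = BASE + "?" + urlencode({"db": "pubmed", "term": query, "retmode": "json"})
print(url)
```

Storing the query as text in a script also gives you an exact, dated copy of the strategy to paste into your search documentation.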
What is the PRESS checklist?
PRESS stands for Peer Review of Electronic Search Strategies. It is a structured tool used to peer review electronic literature search strategies, particularly for systematic reviews and other evidence syntheses.
The PRESS 2015 checklist covers six elements:
- translation of the research question;
- Boolean and proximity operators;
- subject headings;
- text word searching;
- spelling, syntax, and line numbers; and
- limits and filters.
For most original articles and narrative reviews, PRESS is not required. It is considered best practice (and is increasingly required by journals) for systematic reviews and health technology assessments. Peer review of the search strategy is recommended at the protocol phase, before searches are conducted and study selection begins.
Step 5: Apply Filters and Limits
Filters should be applied after running the initial search, not before: applying them too early can introduce bias.
Commonly used filters:
- Date range: appropriate for rapidly changing fields (e.g., COVID-19 therapeutics) but not for conditions with longstanding evidence bases
- Language: English-only is common but introduces language bias in systematic reviews
- Study design: clinical trials, RCTs, meta-analyses (use with caution; not all study types are consistently indexed)
- Species: human vs. animal studies (important in basic science searches)
- Age group: paediatric, adult, elderly
- Publication type: exclude editorials, letters, conference abstracts (for some purposes)
For systematic reviews, language and date restrictions should be explicitly justified in the methods section.
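One way to keep filtering transparent is to apply each limit to the exported results and record how many records it removes. The following is a minimal sketch with invented records and a hypothetical helper, `apply_filter`; the field names are assumptions, not any database's export format.

```python
# Invented records standing in for an exported result set.
records = [
    {"title": "Trial A", "year": 2021, "language": "eng"},
    {"title": "Landmark study B", "year": 1998, "language": "eng"},
    {"title": "Essai C", "year": 2022, "language": "fre"},
]


def apply_filter(items, label, keep):
    """Apply one filter and report how many records it removed."""
    kept = [record for record in items if keep(record)]
    print(f"{label}: {len(items)} -> {len(kept)} (removed {len(items) - len(kept)})")
    return kept


after_date = apply_filter(records, "year >= 2015", lambda r: r["year"] >= 2015)
after_language = apply_filter(after_date, "English only", lambda r: r["language"] == "eng")
```

In this toy set the date limit drops the landmark study and the language limit drops the French trial, which is exactly the kind of loss the per-filter counts are meant to expose.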
Step 6: Manage Your Results
Once you have your results, you need to:
- Remove duplicates: the same paper often appears across multiple databases
- Screen titles and abstracts: based on pre-defined inclusion and exclusion criteria
- Retrieve and screen full texts: for papers that pass abstract screening
- Track decisions: especially for systematic reviews (required for PRISMA flow)
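Reference managers handle deduplication for you, but the underlying idea is simple enough to sketch. The example below assumes each record is a dictionary with optional `doi`, `title`, and `year` fields (hypothetical names) and treats two records as the same paper if their DOIs match or, when no DOI is present, if their normalised titles and years match.

```python
import re


def record_key(record):
    """Prefer the DOI as identity; fall back to normalised title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return ("title", title, record.get("year"))


def deduplicate(records):
    """Keep the first occurrence of each paper, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Invented export from four databases.
exported = [
    {"source": "PubMed", "doi": "10.1000/example.1", "title": "Drug X in type 2 diabetes", "year": 2020},
    {"source": "Embase", "doi": "10.1000/EXAMPLE.1", "title": "Drug X in Type 2 Diabetes.", "year": 2020},
    {"source": "Scopus", "doi": None, "title": "A cohort study: outcomes", "year": 2019},
    {"source": "CINAHL", "doi": "", "title": "A Cohort Study - Outcomes", "year": 2019},
]
unique = deduplicate(exported)
print(len(exported), "records ->", len(unique), "after deduplication")
```

This simple rule will not match a record that has a DOI against a copy that lacks one, so a manual check of near-duplicates is still worthwhile.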
Tools for Reference Management and Screening
| Tool | Best Use |
| Zotero | Free reference manager; browser plugin; good for small-medium projects |
| EndNote | Institutional standard; powerful deduplication; good for large systematic reviews |
| Mendeley | Reference manager with PDF annotation |
| Rayyan | Free, purpose-built for systematic review screening |
| Covidence | Gold standard for systematic review management; subscription required |
| R Discovery | AI-powered reading feed; collections; full-text PDF access; annotation; ideal for ongoing monitoring |
R Discovery deserves particular mention in the management phase. Its collections feature allows you to create named folders (e.g., “Included: full text reviewed,” “Excluded: wrong population”) and move papers between them. The built-in PDF reader with highlighting and note-taking means you can annotate directly within the platform without switching tools. For researchers who need to stay current with a field over months or years, R Discovery’s daily personalised feed ensures that newly published papers matching your interests are flagged automatically.
Step 7: Check for Missing Literature
Even after a thorough database search, important papers can be missed. Supplement your search with:
- Reference list scanning (backward citation searching, or snowballing): check the reference lists of key included papers
- Citation chasing (forward searching): find papers that have cited your key papers (use Scopus, Web of Science, or Google Scholar)
- Journal hand-searching: manually browse issues of the most relevant journals
- Grey literature searching: conference proceedings, preprint servers (bioRxiv, medRxiv), WHO, FDA, NICE documents
- Expert consultation: contact field experts to ask if key papers have been missed
- Contacting authors: for unpublished data in systematic reviews
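Reference list scanning and forward citation chasing are two directions through the same citation network. The toy example below uses an invented citation map, not real papers, to show how each direction produces candidates that the database search may have missed.

```python
# Invented citation map: each paper points to the papers in its reference list.
references = {
    "key_paper": ["seminal_1995", "method_2008"],
    "followup_2021": ["key_paper", "method_2008"],
    "review_2023": ["key_paper", "followup_2021"],
}


def backward(paper):
    """Reference list scanning: which papers does this paper cite?"""
    return set(references.get(paper, []))


def forward(paper):
    """Forward citation chasing: which papers cite this one?"""
    return {citing for citing, cited in references.items() if paper in cited}


already_found = {"key_paper", "method_2008"}
candidates = (backward("key_paper") | forward("key_paper")) - already_found
print(sorted(candidates))
```

Backward searching can only reach older work and forward searching only newer work, which is why the two are usually run together.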
R Discovery supports citation chasing through its “related papers” and “citing papers” features, which surface both older foundational papers and newer papers that have built on a work of interest, without requiring a separate search in Scopus or Web of Science. It also covers a considerable amount of grey literature, with more than 5 million preprints and 7.5 million patents in its database.
Step 8: Document and Report Your Search
For All Research Types
At minimum, record:
- The databases searched
- The date of the search
- The full search strategy (all terms and Boolean logic) for at least the primary database
- The number of results from each database
- The number of records after deduplication
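A plain structured log is often enough to capture these items. The sketch below records one entry per database and writes the log as JSON. The class name, field names, dates, search strings, and counts are all invented for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SearchLogEntry:
    database: str
    search_date: str  # ISO date on which the search was run
    strategy: str     # full search string, exactly as entered
    results: int      # number of records returned


# Invented entries for illustration only.
entries = [
    SearchLogEntry("PubMed", "2026-01-15",
                   '("Heart Failure"[MeSH] OR "heart failure"[tiab])', 412),
    SearchLogEntry("Embase", "2026-01-15",
                   "'heart failure'/exp OR 'heart failure':ti,ab", 530),
]

log = {
    "searches": [asdict(entry) for entry in entries],
    "total_before_deduplication": sum(entry.results for entry in entries),
    "records_after_deduplication": 701,  # invented figure
}
print(json.dumps(log, indent=2))
```

Saving a file like this alongside your exported results means the search can be rerun or updated later without relying on memory.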
For Systematic Reviews: PRISMA Flow Diagram
The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram is mandatory. It documents:
- Records identified from database searching and other sources
- Records after duplicate removal
- Records screened and excluded at title/abstract stage
- Full texts assessed and excluded (with reasons)
- Studies included in final synthesis
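The numbers in a flow diagram must add up from one box to the next, and a short script can check the arithmetic before you draw it. All counts and key names below are invented for illustration.

```python
# Invented counts for illustration only.
flow = {
    "identified_databases": 942,
    "identified_other": 18,
    "duplicates_removed": 259,
    "excluded_title_abstract": 610,
    "full_text_excluded": {
        "wrong population": 40,
        "wrong outcome": 25,
        "not retrievable": 6,
    },
}

identified = flow["identified_databases"] + flow["identified_other"]
screened = identified - flow["duplicates_removed"]
full_text_assessed = screened - flow["excluded_title_abstract"]
included = full_text_assessed - sum(flow["full_text_excluded"].values())

print(f"identified={identified}, screened={screened}, "
      f"full text assessed={full_text_assessed}, included={included}")

# A negative count means a number was mistyped somewhere upstream.
assert min(screened, full_text_assessed, included) >= 0
```

Checking the final figure against the actual number of studies in your synthesis is a quick way to catch a screening decision that was never recorded.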
Common Mistakes to Avoid in Your Literature Search
- Searching only one database: even PubMed alone misses a significant proportion of relevant literature
- Using only free-text keywords without MeSH terms: reduces sensitivity in indexed databases
- Applying date filters too early: can exclude landmark papers
- Not documenting the search: makes it impossible to update or reproduce
- Confusing sensitivity and specificity: a very specific search finds fewer, more precise results; a very sensitive search casts a wider net but retrieves more noise. Systematic reviews prioritise sensitivity; targeted searches for original articles can be more specific
- Searching once and stopping: literature searches for long projects (thesis, systematic review) must be updated before submission
- Over-relying on a single tool: even powerful platforms like R Discovery are best used alongside formal database searches, not as a replacement for them in rigorous review contexts
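The sensitivity trade-off in the list above can be made concrete with a small calculation. In this sketch, sensitivity is the share of all relevant papers that a search retrieves, and precision (the share of retrieved papers that are relevant) is used as the practical measure of how "specific" a search is. The paper identifiers are invented, and in practice the full set of relevant papers is rarely known in advance.

```python
def sensitivity(retrieved, relevant):
    """Share of all relevant papers that the search found (also called recall)."""
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Share of retrieved papers that are actually relevant."""
    return len(retrieved & relevant) / len(retrieved)


# Invented identifiers: p* are relevant papers, n* are noise.
relevant = {"p1", "p2", "p3", "p4"}
broad_search = {"p1", "p2", "p3", "p4", "n1", "n2", "n3", "n4", "n5", "n6"}
narrow_search = {"p1", "p2", "n1"}

for name, retrieved in [("broad", broad_search), ("narrow", narrow_search)]:
    print(f"{name}: sensitivity={sensitivity(retrieved, relevant):.2f}, "
          f"precision={precision(retrieved, relevant):.2f}")
```

The broad search finds every relevant paper at the cost of more noise, while the narrow search is cleaner but misses half of them.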
Summary: Matching Search Rigour to Research Purpose
| Feature | Doctoral Thesis | Original Article | Narrative Review | Systematic Review |
| Minimum databases | 4–6 | 2–3 | 4–5 | 5+ including trial registries |
| MeSH/controlled vocabulary | Recommended | Recommended | Optional | Mandatory |
| Grey literature | Yes | Optional | Optional | Yes |
| Date restrictions | Justified | Common | Common | Justified only |
| PRISMA flow | Optional | Not required | Not required | Mandatory |
| Pre-registration | Not required | Not required | Optional | Strongly recommended |
| Search updates | Yes (ongoing) | At submission | At submission | Yes, before final analysis |
| R Discovery role | Daily feed, collections, reading | Recommendations, reading | Recommendations, themes | Monitoring updates, reading |
A well-conducted literature search is an ongoing process of systematic discovery, careful documentation, and continuous updating. Whether you are a doctoral student mapping a field for the first time or a seasoned researcher conducting a meta-analysis, the principles remain the same: be systematic, be explicit, and let the evidence tell you what is there, not what you hoped to find.
Frequently Asked Questions
Can I use ChatGPT to find papers for my literature search?
Not as your primary search tool, and certainly not without verifying every single reference it generates. General-purpose large language models like ChatGPT do not search live databases; they generate text based on patterns in their training data. This means they can produce citations that look entirely plausible, with correct journal format, realistic author names, and believable titles, even though the papers do not actually exist. A Deakin University study found that when ChatGPT was used to write mental health literature reviews, roughly one in five citations were completely fabricated, and more than half of all citations were either fake or contained errors. A psychiatry-focused test found that of 35 references generated by ChatGPT, only two were real, 12 were similar to actual manuscripts with incorrect details, and the remaining 21 were plausible-sounding composites of multiple real papers.
AI tools purpose-built for academic literature—such as R Discovery—are a different matter. These connect to real paper databases and surface verified records. R Discovery in particular is designed for this workflow, with AI recommendations grounded in actual indexed literature rather than generated text. Use it; do not use general chatbots as a substitute for database searching.
How do I know when I have searched enough and when can I stop?
If you keep seeing the same references appear repeatedly across different searches and databases, you have likely reached critical mass and can stop the retrospective search because you have found the existing relevant articles on your topic.
More formally, you are looking for a state sometimes called search saturation. This is the point at which new searches are no longer returning papers you haven’t already seen. Practical indicators include:
- Running a new database search or variant search string yields fewer than 5% new unique results
- Reference list scanning of your included papers keeps pointing back to papers already in your pile
- Citation chasing forward from key papers surfaces no new relevant works
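The first indicator can be checked with simple set arithmetic. The sketch below assumes the 5% threshold is measured against the size of the new result set, which is one reasonable reading of the rule of thumb; the record identifiers are invented.

```python
def new_result_rate(known_ids, new_search_ids):
    """Fraction of a new search's results that were not already in the pool."""
    new_ids = set(new_search_ids)
    if not new_ids:
        return 0.0
    return len(new_ids - set(known_ids)) / len(new_ids)


# Invented pool of 200 records already collected.
known = {f"rec{i}" for i in range(1, 201)}
# A variant search returning rec141..rec203: 63 results, 3 of them unseen.
variant_search = [f"rec{i}" for i in range(141, 204)]

rate = new_result_rate(known, variant_search)
verdict = "likely saturated" if rate < 0.05 else "keep searching"
print(f"{rate:.1%} new results: {verdict}")
```

A low rate from one variant search is only a hint; it is more convincing when several different strings and databases all return mostly familiar records.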
For systematic reviews, the stopping point is defined by protocol and date: you run the search, record it, and then update it once before final submission. For doctoral work, the search is never fully “done”; a final update run within 3–6 months of submission is standard practice. R Discovery’s daily feed is particularly useful here because it passively monitors the literature and flags new publications so you don’t have to keep re-running manual searches from scratch.
What do I do when I can’t access the full text of a paper I need?
Paywalls are a genuine barrier, but there are several legitimate routes before giving up:
- Check for an open-access version first. Many authors self-archive their papers in institutional repositories or on ResearchGate. The browser extension Unpaywall automatically detects legal open-access copies of paywalled articles as you browse. Open Access Button is another tool that searches for freely available versions, or sends a request directly to the author if none is found.
- Check preprint servers. bioRxiv and medRxiv host preprint versions of many biomedical papers. A preprint is typically the manuscript as posted before peer review, so check it against the published abstract for changes before citing.
- Email the corresponding author. Authors are almost always willing to share a PDF of their own work on request. This is completely legal and usually fast.
- Use interlibrary loan (ILL). If you have institutional affiliation, your library can obtain papers from other institutions, usually within a few days.
- R Discovery surfaces open-access versions of papers and indicates full-text availability directly within its interface, reducing time spent hunting across multiple platforms.
How do I know if a paper I want to cite has been retracted or flagged for problems?
Practical steps for avoiding retracted, unreliable, or unethical research:
- Check the Retraction Watch Database: a searchable database of retracted papers, now integrated into reference managers including EndNote and Zotero, which can flag retracted papers automatically in your library.
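- Check PubMed directly: retracted papers carry a prominent retraction banner on the record and are indexed with the publication type “Retracted Publication”; the retraction notice itself is indexed as “Retraction of Publication.”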
- Use PubPeer: a post-publication peer review platform where concerns about papers are flagged and discussed, often before formal retraction. RedacTek is another tool that flags articles with high self-citation rates and other markers of potentially problematic papers, available as a Chrome extension for use during active searching.
- Be alert to red flags in the paper itself: unusual author combinations, figures that appear elsewhere online (you can check with a reverse image search such as Google Lens), and papers that appear highly polished without conveying actual data or insights (see also: 6 ways to spot an AI-generated medical paper).
- Look up unfamiliar journals: signs of a predatory or hijacked journal that does not properly peer review papers include an extremely broad scope (e.g., Journal of Medicine, Biology, Engineering, and Technology), unusually rapid publication, articles that look AI-generated, and articles that do not meet basic scientific standards.
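If you export records with their PubMed publication types, a retraction check can be automated. PubMed indexes retracted articles with the publication type "Retracted Publication"; everything else in this sketch (the record layout, the titles, the function name) is invented for illustration.

```python
# Invented library; "publication_types" mirrors PubMed's publication type field.
library = [
    {"title": "Study A", "publication_types": ["Journal Article"]},
    {"title": "Study B", "publication_types": ["Journal Article", "Retracted Publication"]},
    {"title": "Study C", "publication_types": ["Review"]},
]


def find_retracted(records):
    """Return titles of records indexed as retracted publications."""
    return [
        record["title"]
        for record in records
        if "Retracted Publication" in record.get("publication_types", [])
    ]


print(find_retracted(library))
```

A check like this only catches formal retractions that have already been indexed, so it complements rather than replaces the other steps in the list.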
My search is returning thousands of results. How do I manage this without reading everything?
A very large result set is usually a sign that your search strategy is too broad, not that you need to read everything. Narrow first, then read:
- Tighten the search by adding an additional AND concept, using more specific MeSH terms, or applying proximity operators (e.g., requiring that two terms appear within five words of each other rather than anywhere in the abstract)
- Screen titles first: a title pass through several thousand records takes far less time than it sounds; most can be excluded in 2–3 seconds
- Screen abstracts second: only for papers that passed the title screen (if you are doubtful after reading the abstract, use the ChatPDF feature in R Discovery to check if the paper really covers what you’re interested in)
- Read full texts last: only for papers that passed the abstract screen
For systematic reviews, this staged screening process (titles and abstracts first, then full texts) is mandatory and documented in the PRISMA flow. For other research types, even an informal version of this funnel prevents you from drowning in irrelevant material.
This article was originally published on October 5, 2023, and updated on May 14, 2026.





