2026.06.09

Research Data Management: How to Make a Data Management Plan (DMP)

Contents

Glossary of Key Terms
Key Takeaways
What Is Research Data Management?
The Research Data Lifecycle
The FAIR Principles: The Foundation of Good RDM
Step 1: Start with Policies, Ethics, and Legal Compliance
Step 2: Build a Sound Data Collection Strategy
Step 3: Organize Your Files and Folders
Step 4: Document Everything: Metadata and README Files
Step 5: Store Data Securely Using the 3-2-1 Rule
Step 6: Develop a Data Management Plan (DMP)
Step 7: Choose Open and Standard File Formats
Step 8: Share Your Data: The Open Science Imperative
Practical Implementation: Getting Started
Continuous Self-Monitoring
Frequently Asked Questions

Glossary of Key Terms

Term	Definition
Research Data Management (RDM)	The process of collecting, organizing, storing, securing, sharing, and preserving research data across all stages of the research lifecycle
Data Lifecycle	The sequence of stages data passes through: creation/collection → processing → analysis → storage → sharing → preservation/reuse
FAIR Principles	A framework stating that data should be Findable, Accessible, Interoperable, and Reusable
Data Management Plan (DMP)	A formal document outlining how data will be collected, stored, shared, and preserved throughout and after a research project
Metadata	Structured data that describes other data: for example, who created a file, when, using what method, and in what format
Data Repository	A digital archive where datasets can be deposited and made accessible to other researchers
Open File Format	A non-proprietary file format that can be opened without specific paid software and is more likely to remain accessible over time
Persistent Identifier (PID)	A long-lasting reference to a digital resource, such as a DOI (Digital Object Identifier), that makes data reliably citable and findable
De-identification	The process of removing or masking personally identifiable information from a dataset to protect participant privacy
Version Control	A system for tracking changes to files over time, allowing earlier versions to be retrieved
Electronic Lab Notebook (ELN)	A digital tool that replaces or supplements traditional paper lab notebooks, enabling structured, searchable documentation of research activities
Open Science	A movement that promotes transparent, reproducible, and openly accessible research, including open data, open methods, and open publications
Data Governance	The set of policies, roles, and processes that determine how data is managed, accessed, and protected within an organization
3-2-1 Backup Rule	A data backup strategy: maintain 3 copies of data, on 2 different storage media, with 1 copy stored offsite

Key Takeaways

Research Data Management is the process of providing appropriate labeling, storage, and access for data at all stages of a research project.
RDM encompasses all data-related activities across the entire data lifecycle: from collection and processing through to storage, sharing, and long-term preservation.
Well-organized research data can help researchers improve efficiency, meet funding and regulatory requirements, and optimize the potential of their research, leading to publication, funding, and collaboration opportunities.
The FAIR principles (Findable, Accessible, Interoperable, and Reusable) serve as the foundational framework for modern RDM and should guide every decision about how data is documented, stored, and shared.
A Data Management Plan (DMP) is not optional busywork: it is a living document that helps researchers anticipate needs, stay compliant, and maximize the long-term value of their data.
Open and standard file formats ensure that research data remain accessible and usable over time by avoiding dependencies on proprietary software that may not be supported in the future.
Data publication promotes transparency, credibility, long-term accessibility, reproducibility, and collaboration. Researchers stand to benefit personally through greater recognition of their work.
Continuous self-monitoring (setting regular intervals to review data practices, assess progress, and refine workflows) is just as important as the initial setup of any RDM system.

What Is Research Data Management?

Research data is changing in scale and complexity at a pace that was hard to imagine two decades ago. In the 20th century, it was common for a study or experiment to yield a single file, perhaps one data table. Today, a single research project often generates many files, created by multiple collaborators and frequently valuable for secondary use. Genomics experiments, for example, can generate multiple raw files per biological sample, plus layers of processed data.

Research Data Management is the process of providing appropriate labeling, storage, and access for data at all stages of a research project. It is not a single activity but a continuous discipline that spans the entire life of a research project and often extends well beyond it.

Many funding agencies now require a data management or data sharing plan to be submitted with grant applications. Many academic journals also require the submission of relevant data alongside manuscripts to promote open access and reproducibility. Early and attentive management at each step of the data lifecycle helps ensure the discoverability and longevity of your research.

Why Should Researchers Care?

Beyond compliance, good RDM makes practical sense:

Organized, well-documented data is simply easier to analyze
You can find your own files when you need them, sometimes years later
You avoid drowning in irrelevant or duplicate data
You protect against data loss from accidents, equipment failure, or staff turnover
You get credit for your data and avoid accusations of misconduct
You enable other researchers to build on your work, amplifying your impact

The Research Data Lifecycle

Understanding RDM starts with understanding the data lifecycle: the sequence of stages your data passes through from creation to long-term use. The lifecycle is typically represented as a cycle rather than a straight line, because data created in one project frequently becomes the raw material for future research.

Lifecycle Stage	What Happens	Key RDM Activities
Plan	Research design, DMP creation	Defining data needs, legal/ethical review, storage planning
Collect	Data generation or acquisition	Naming conventions, quality control, format selection
Process & Analyze	Cleaning, transforming, analyzing	Version control, documentation, code management
Store	Securing data during the project	Backup implementation, access controls, encryption
Share & Publish	Making data available	Repository selection, licensing, persistent identifiers
Preserve & Reuse	Long-term archiving	Format migration, metadata maintenance, enabling secondary use

The FAIR Principles: The Foundation of Good RDM

FAIR refers to the findability, accessibility, interoperability, and reusability of digital assets. Every step in research data management is closely connected to FAIR.

What Each Principle Means in Practice

Findable: Metadata and data should be easy to locate for both humans and computers. This means rich metadata, clear naming, and the use of persistent identifiers like DOIs.
Accessible: There should be clarity on how data can be retrieved, including what authentication or authorization is required. “Accessible” does not necessarily mean “open to everyone”; it means the access conditions are clearly defined.
Interoperable: Data should be structured in a way that enables it to work with other datasets, tools, and workflows. This is primarily achieved through the use of standard formats and shared vocabularies.
Reusable: Metadata and data should be sufficiently well-described so that others can reproduce, replicate, or build upon the work in different settings.

It is important to know that FAIR is applicable not only to data but also to metadata and relevant infrastructure.

Step 1: Start with Policies, Ethics, and Legal Compliance

Before collecting a single data point, researchers need to understand the landscape of rules that govern their work.

Applicable Policies

Countries, umbrella organizations, science foundations, professional societies, institutions, funding bodies, and project boards may all issue policies and guidelines. These inherently reflect best practices, outline legal issues, or offer suggestions for efficiency and resource management. Researchers should check for applicable policies and compliance requirements in their subject area by consulting funder websites, institutional research support offices, or DMP tools.

Ethical Regulations and Legislation

Relevant legislation typically governs data collection and sharing and is designed to safeguard individuals’ personal data. For example, under the European Union’s General Data Protection Regulation (GDPR), individuals have the right to know which of their information is collected, processed, and transmitted. Ethics committees review proposals for ethical compliance and aim to ensure the rights, safety, and well-being of participants.

Key considerations before starting data collection:

Does this project involve human participants? If so, IRB/ethics approval is likely required.
Does the data contain personally identifiable information (PII)? If so, GDPR, HIPAA, or local equivalents may apply.
Are there intellectual property considerations, for example, if collaborating with industry partners?
What data retention periods are required by your funder or institution?

Step 2: Build a Sound Data Collection Strategy

To reach well-founded conclusions, researchers require quality data. One of the first steps in conducting research is understanding how the research question translates into specific data needs (what data are required, what insights are expected, and in what way).

Reusing Existing Data

Researchers should first check published data to examine whether it can be integrated into the work. Reusing data can be a tremendous benefit and save significant resources, especially labor and material costs, as well as data retention costs in projects with high data volumes.

Collecting New Data

If new original data are required, the guiding principle is to collect as much as required, but no more than necessary. Running a sample size calculation before collecting data helps determine the minimum amount of data needed to answer the research question. Data collected beyond that minimum require additional resources for administration, processing, and storage.
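
As a minimal sketch of such a calculation, assume a comparison of means between two equal-sized groups, a two-sided test, a standardized effect size (Cohen's d), and the normal approximation. None of these choices comes from the text above, and the function name and defaults are illustrative; a real study should use a method matched to its design.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)  # round up: a fraction of a participant is not possible
```

With these assumptions, sample_size_per_group(0.5) returns 63 per group. Smaller expected effects raise the number quickly, which is why the calculation belongs before data collection rather than after.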

Ensuring Data Quality

Data quality is the degree to which the data at hand can meet their intended purpose while being error-free. The goal of data quality efforts is to ensure that the data depict the real-world entities they measure as comprehensively as possible. Best practices for data quality may vary by discipline. In social sciences, validation through triangulation is common, while in physics, calibration of instruments ensures accuracy.

Step 3: Organize Your Files and Folders

A systematic organization of files and working directories is key to efficient filing, navigation, and prompt file retrieval. Using the same filing scheme across projects and teams can simplify and accelerate interactions.

Folder Structure

Clean working directories have a logical and uniform structure: a standardized folder structure and depth, as well as default folder names. The project folder ideally contains all project files in its logically subdivided subfolders. Using a similar folder structure across projects can facilitate data retrieval and promote standardization while allowing for variations as necessary.

A typical folder hierarchy might look like this:

ProjectName/
├── raw_data/
├── processed_data/
├── analysis/
├── manuscripts/
├── protocols/
├── code/
└── admin/
    ├── DMP/
    └── ethics/
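
The hierarchy above can be created by a script rather than by hand, which helps keep it identical across projects. The following is a minimal sketch in Python, assuming that DMP/ and ethics/ are subfolders of admin/; the function name create_project_skeleton is illustrative.

```python
from pathlib import Path

# Subfolders from the example hierarchy; DMP/ and ethics/ are assumed
# to sit inside admin/.
SUBFOLDERS = [
    "raw_data", "processed_data", "analysis", "manuscripts",
    "protocols", "code", "admin/DMP", "admin/ethics",
]


def create_project_skeleton(root: Path) -> list[Path]:
    """Create the standard subfolders under root and return their paths."""
    created = []
    for name in SUBFOLDERS:
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing project
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling create_project_skeleton(Path("ProjectName")) builds the tree in the current working directory. Because existing folders are left untouched, the same call can be used to repair a project that has drifted from the standard layout.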

File Naming Conventions

Ensure the file name is descriptive, relevant, and allows for easy sorting and filtering. Incorporate dates, version numbers, and project numbers within the file name. A consistent file naming structure might include: [ProjectCode]_[DocumentType]_[Version]_[Date]. For example: TenR_Man_v01_2025-01-01.docx. Teams can add further elements before the version, such as a sequence number or author initials: TenR_Man_01_MJH_v01_2025-01-01.docx

Key rules for file naming:

Use ISO date format (YYYY-MM-DD) for easy chronological sorting
Avoid spaces: use underscores or hyphens instead
Avoid special characters that may cause issues across operating systems
Be consistent across your entire team
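
A convention is easier to follow when names are generated and checked by code. The sketch below covers only the four-part structure [ProjectCode]_[DocumentType]_[Version]_[Date]; names with extra elements would need an extended pattern. The function names and the exact characters allowed are assumptions for illustration.

```python
import re
from datetime import date

# [ProjectCode]_[DocumentType]_v[NN]_[YYYY-MM-DD].[ext]
# Only letters, digits, and hyphens are allowed inside each element, so
# spaces and special characters are rejected.
NAME_PATTERN = re.compile(
    r"^[A-Za-z0-9-]+_[A-Za-z0-9-]+_v\d{2}_\d{4}-\d{2}-\d{2}\.[A-Za-z0-9]+$"
)


def build_file_name(project: str, doc_type: str, version: int, day: date, ext: str) -> str:
    """Assemble a file name with a zero-padded version and an ISO-format date."""
    return f"{project}_{doc_type}_v{version:02d}_{day.isoformat()}.{ext}"


def is_valid_file_name(name: str) -> bool:
    """Return True if the name has the expected shape (the date itself is not validated)."""
    return NAME_PATTERN.match(name) is not None
```

For example, build_file_name("TenR", "Man", 1, date(2025, 1, 1), "docx") returns "TenR_Man_v01_2025-01-01.docx", and is_valid_file_name rejects a name such as "TenR Man v1 01.01.2025.docx".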

Version Control

Have a system in place to track file versions and activities. This will help you identify any changes made to the original file. If you are ever unsure about a change, you can go back and find out who made it and why.

Step 4: Document Everything: Metadata and README Files

Orderly and standardized documentation of both data and its collection method is key to understanding and using any kind of data. Data are usually not self-explanatory: with sufficient documentation, the work remains transparent and reproducible and is less likely to be misinterpreted.

What Metadata Should Include

Metadata comprises information on when data were created, by whom, and with which method. It may also include file sizes, formats, and languages. Common forms include README files, data dictionaries, or computer-readable XML/JSON files.

A README file should minimally contain:

README Element	Description
Explanation of data included	What the dataset contains and what it represents
Original purpose / project affiliation	The research question and project the data was collected for
Author(s) / Creator(s)	Who collected or generated the data
Date/period of data creation	When the data was collected or generated
Software or hardware requirements	What is needed to open or process the files
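
One way to keep README files consistent across datasets is to generate them from a template. The following is a minimal sketch that uses the five elements from the table; the function name and the plain-text layout are assumptions, not a standard.

```python
# Minimum README elements, taken from the table above.
README_ELEMENTS = [
    "Explanation of data included",
    "Original purpose / project affiliation",
    "Author(s) / Creator(s)",
    "Date/period of data creation",
    "Software or hardware requirements",
]


def render_readme(title: str, values: dict[str, str]) -> str:
    """Render a plain-text README; elements without a value are marked TODO."""
    lines = [title, "=" * len(title), ""]
    for element in README_ELEMENTS:
        lines.append(f"{element}:")
        lines.append(f"  {values.get(element, 'TODO')}")
        lines.append("")
    return "\n".join(lines)
```

Searching the output for TODO gives a quick check of which elements still need to be written before the dataset is shared.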

Data Dictionaries

For tabular data especially, a data dictionary is invaluable. It explains what each column or variable represents, the data type, the unit of measurement, and the range of valid values. This is critical for any collaborator, including your future self, who needs to work with the data later.
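
A data dictionary kept in machine-readable form can also be used to check the data itself. The sketch below is one possible approach; the column names, units, and ranges are hypothetical, and the dictionary structure is an assumption rather than an established standard.

```python
import csv
import io

# Hypothetical data dictionary: column name -> type, unit, and valid range.
DATA_DICTIONARY = {
    "participant_id": {"type": str, "unit": None, "range": None},
    "age": {"type": int, "unit": "years", "range": (18, 99)},
    "weight": {"type": float, "unit": "kg", "range": (30.0, 250.0)},
}


def validate_rows(csv_text: str, dictionary: dict) -> list[str]:
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(dictionary) - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, spec in dictionary.items():
            try:
                value = spec["type"](row[column])  # convert text to the declared type
            except (TypeError, ValueError):
                problems.append(
                    f"line {line_no}: {column}={row[column]!r} is not {spec['type'].__name__}"
                )
                continue
            if spec["range"] is not None:
                low, high = spec["range"]
                if not low <= value <= high:
                    problems.append(f"line {line_no}: {column}={value} outside {low}-{high}")
    return problems
```

An empty list means every row matches the declared types and ranges. Running a check like this whenever new data are added catches entry errors while they are still easy to trace.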

Step 5: Store Data Securely Using the 3-2-1 Rule

Storing files on a modern computer is easy. However, securing them over time and in a sustainable way, avoiding data corruption and data loss, requires following some simple principles.

What is the 3-2-1 Backup Rule?

Valuable data should be stored in accordance with the 3-2-1 backup rule: keep three separate instances of the data (the original and two backups) on two distinct storage media, such as a local copy on a laptop plus network storage, with one backup held offsite at a different location.

Backup Layer	Example
Copy 1 (Primary)	Working files on your personal computer or workstation
Copy 2 (Local backup)	Institutional network drive, automatically backed up
Copy 3 (Offsite backup)	Cloud storage (institutional or commercial) in a different geographic location
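
Having three copies only helps if the copies are intact. A common way to check this is to compare checksums. The following is a minimal sketch using SHA-256 from the Python standard library; the function names are illustrative, and the backup paths are assumed to be reachable as ordinary files (for example, through a mounted network drive or a synced cloud folder).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(primary: Path, backups: list[Path]) -> dict[str, bool]:
    """Map each backup path to whether it exists and is identical to the primary file."""
    reference = sha256_of(primary)
    return {
        str(backup): backup.is_file() and sha256_of(backup) == reference
        for backup in backups
    }
```

Any entry that maps to False points to a missing or altered copy that should be restored from one of the intact copies.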

Access Control and Security

Data security is not just about preventing loss: it is also about controlling who can access sensitive data. Key security practices include:

Setting appropriate access permissions for team members
Encrypting sensitive data, especially on portable drives
Using strong authentication (multi-factor authentication where possible)
Reviewing and revoking access when team members leave a project

Data Retention

It is best practice to retain research data well beyond the end of a project. Retention requirements vary by country, funder, data type, and subject domain; a minimum of 10 years is common, and some requirements extend to 25 years or more. Check your funder’s and institution’s specific requirements, and document them in your DMP.

Step 6: Develop a Data Management Plan (DMP)

A DMP essentially outlines data collection, storage, and sharing strategies while describing how privacy, security, and policy compliance are ensured. Usually drafted alongside the project outline, a DMP accompanies the project throughout its lifecycle, maximizing the data’s value and impact.

Core Questions a DMP Should Answer

DMP Category	Key Questions
Collection	What data will be re-used, collected, or created, and how?
Description	What formats and types will be collected? What hardware/software is required?
Standards	How will data be described to enable effective interpretation? Which metadata standards apply?
Policies, Legal & Ethics	Which policies and funder requirements apply? How will legal and ethical compliance be met?
Storage & Preservation	How will data be stored, secured, and preserved during and after the project?
Access	Who needs access during the project, and what authorization rules apply?
Sharing & Reuse	How will data be shared, and under what conditions or licenses?
Roles & Responsibilities	Who is responsible for each data-related step?
Budget	What financial implications arise from data storage, software, or publication?
Quality Control	How will data quality be ensured and monitored?

Tools to Help Create a DMP

Several free web-based tools can guide researchers through the DMP process and incorporate funder-specific templates:

DMPTool: widely used in the United States, with templates for NIH, NSF, and other funders
DMPonline: commonly used in the UK and Europe
RDMO: supports multiple European funder requirements and provides forms tailored to those requirements

A common pitfall is that researchers may create a DMP initially and then fail to regularly review and update it. This oversight can lead to consequences ranging from increased workload to data mismanagement.

Step 7: Choose Open and Standard File Formats

Open and standard file formats facilitate data handling within research groups and beyond, while ensuring that data remain readable and accessible over time. The practical recommendation: retain your original proprietary file if needed for active work, but always produce an open-format copy for archiving.

Recommended Formats by Data Type

Data Type	Recommended Open Formats
Tabular data / Statistics	CSV, plain text (UTF-8)
Text documents	PDF/A (archival), TXT, ODT
Images / Photographs	TIFF, PNG, JPEG 2000
Audio	FLAC, BWF (Broadcast Wave)
Video	MP4, MKV
Containers / Archives	ZIP, TAR
Scientific sequences (bioinformatics)	FASTA, FASTQ
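
Before archiving, it can help to list the files that are not yet in one of these formats. The sketch below checks file extensions only; the mapping from format names to extensions is an assumption for illustration, and an extension alone cannot confirm conformance (a .pdf file, for instance, is not necessarily PDF/A).

```python
from pathlib import Path

# Extensions assumed to correspond to the open formats in the table above.
# BWF files normally carry the .wav extension.
OPEN_EXTENSIONS = {
    ".csv", ".txt", ".pdf", ".odt", ".tif", ".tiff", ".png", ".jp2",
    ".flac", ".wav", ".mp4", ".mkv", ".zip", ".tar",
    ".fasta", ".fa", ".fastq", ".fq",
}


def files_needing_conversion(folder: Path) -> list[Path]:
    """List files under folder (recursively) whose extension is not in OPEN_EXTENSIONS."""
    return sorted(
        path for path in folder.rglob("*")
        if path.is_file() and path.suffix.lower() not in OPEN_EXTENSIONS
    )
```

The result is a worklist of files that need an open-format copy, not a verdict on the files themselves; the original proprietary versions can be kept alongside for active work.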

Step 8: Share Your Data: The Open Science Imperative

Data publication refers to making research datasets openly accessible for review and reuse. In the context of the reproducibility crisis, data sharing plays a key role in enabling research reproducibility. Research data publication promotes transparency, credibility, long-term accessibility, reproducibility, and collaboration.

What Data Should be Shared?

Publish	Do Not Publish
Unique datasets difficult to recreate	Test or pilot data
Data with high relevance to the scientific community	Discarded or erroneous data
Data that is anonymized or safely de-identified	Data with no medium- or long-term relevance
Complex or expensive-to-generate data	Data containing unresolvable personal identifiers

De-identifying Sensitive Data

When human subject data is involved, sharing requires careful de-identification. This includes removing direct identifiers such as names and addresses, aggregating variables such as grouping ages into ranges, suppressing rare values, or adding noise to geographic data. These practices help balance openness with privacy protections under regulations like HIPAA in the US or GDPR in the EU.
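
A minimal sketch of three of these techniques (removing direct identifiers, grouping ages into ranges, and suppressing rare values) is shown below. The column names, the 10-year age bands, and the suppression threshold are hypothetical. The code illustrates the mechanics only and does not by itself guarantee that a dataset is safe to share.

```python
from collections import Counter

# Hypothetical names of columns that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "address"}


def age_band(age: int, width: int = 10) -> str:
    """Group an age into a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(records: list[dict], rare_column: str, min_count: int = 2) -> list[dict]:
    """Drop direct identifiers, band the 'age' column, and suppress rare values.

    A value in rare_column is suppressed when fewer than min_count records share it.
    Every record is assumed to contain an integer 'age' field.
    """
    counts = Counter(record[rare_column] for record in records)
    cleaned = []
    for record in records:
        row = {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        row["age"] = age_band(row["age"])
        if counts[record[rare_column]] < min_count:
            row[rare_column] = "suppressed"
        cleaned.append(row)
    return cleaned
```

The input records are left unchanged, so the identifiable original can stay in restricted storage while only the cleaned copy is prepared for sharing.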

Choosing a Data Repository

Researchers can choose a general-purpose data repository or select a topic-specific repository. Common repositories include:

Zenodo: free, general-purpose, supported by CERN
Dryad: popular in the life and environmental sciences
Figshare: supports a wide variety of file types
Harvard Dataverse: widely used in the social sciences and humanities
Domain-specific repositories: for example, NCBI databases (genomics), ICPSR (social science), or the UK Data Archive

Licenses and Persistent Identifiers

Researchers should choose an appropriate license, such as a Creative Commons license for data or an MIT license for accompanying code, to specify how their outputs can be reused. As a rule of thumb, select a license that is as open as possible and imposes minimal restrictions. Data publications should be assigned a persistent identifier such as a DOI to enhance long-term findability and prevent isolation, in line with the FAIR principles.

Practical Implementation: Getting Started

Shifting to digital research data management can seem like a daunting task, but it doesn’t need to be. Many organizations and institutions provide research data management support, mostly through research services offices or research librarians.

A Realistic Starting Sequence

Before the project:

Review funder and institutional data policies
Seek ethics approval if required
Draft your DMP
Set up your folder structure and file naming convention

During the project:

Apply naming conventions consistently
Maintain metadata and documentation as you go
Back up data regularly using the 3-2-1 rule
Use version control for code and evolving data files

At the end of the project:

Prepare data and metadata for publication
Convert files to open formats
Choose a repository and deposit your data
Assign a DOI or other persistent identifier
Update your DMP to reflect what was actually done

Choosing a Digital Tool: Common Approaches

Approach	Pros	Cons
Paper notebooks	Zero cost, no setup	Extremely difficult to find, share, or back up data
Shared server folders	Familiar, low cost	Control is easily lost with multiple users; hard to search
Cloud storage (generic)	Accessible, low cost	Limited structure; depends on individuals to follow conventions
Electronic Lab Notebook (ELN)	Structured, searchable, version-controlled	Takes time to set up; may have licensing costs

Communicating Change Within Your Lab

Communication is key when changing existing practices within the lab. Make sure everyone understands why these changes are happening. This could mean involving lab members in decision making, listening to feedback from those who handle data day-to-day, and providing context and background on why switching to digital data management is important. If only some lab members follow the new practices, the benefits will be drastically reduced.

Continuous Self-Monitoring

Continuous self-monitoring involves setting regular intervals at which to evaluate progress within and beyond projects in order to spot potential problems and ascertain the overall effectiveness of the work. By performing such self-monitoring, researchers can potentially mitigate the risk of getting lost in detail and better visualize goals.

Regular review allows mapping of progress against strategies, schedules, budgets, and other metrics to keep work on track and allow for corrective actions. This can help tackle issues that might be minor today but can compound over time and require excessive resources in the long run.

Suggested checkpoints:

At project initiation: confirm DMP is complete and the team is aligned
At each major milestone: review whether data practices are being followed
Annually: review storage needs, access permissions, and policy changes
At project close: complete repository deposit and final documentation update

Frequently Asked Questions

Do I need a Data Management Plan even if my funder doesn’t require one?

Yes. A DMP is useful regardless of funder requirements because it forces you to think through data needs proactively, prevents costly corrections later, and ensures your team is aligned. Many researchers who draft DMPs voluntarily report that the process itself surfaces problems they hadn’t anticipated.

What’s the difference between a data repository and cloud storage like Google Drive or Dropbox?

Commercial cloud storage services are designed for active file access and sharing, not for long-term preservation or discoverability. A data repository assigns your dataset a persistent identifier (such as a DOI), indexes it for search, and ensures it remains accessible according to defined standards, sometimes for decades. Cloud storage accounts can be closed, reorganized, or have their terms changed without notice.

How do I handle data that belongs to multiple collaborators or institutions?

Establish a data governance agreement at the start of the project. This should clearly state who owns the data, who has access rights, how it can be shared or published, and what happens to the data if the collaboration ends or a team member moves to another institution. This is particularly important when collaborating across countries with different legal frameworks.

My lab generates very large datasets (terabytes of imaging or sequencing data). Does RDM still apply?

RDM applies especially to large datasets, which are harder to manage retroactively. For high-volume data, it is worth planning storage infrastructure and costs explicitly in your DMP. Some funders allow research data storage costs to be included in grant budgets. Domain-specific repositories often have infrastructure designed for large scientific datasets.

Is anonymized data always safe to share?

Not necessarily. Re-identification risks are real, especially with genomic data, rare disease data, or small geographic populations. Even data that has been processed to remove obvious identifiers can sometimes be cross-referenced with publicly available information to re-identify individuals. For sensitive data, consult your institution’s data protection officer or legal team before sharing, and consider whether controlled-access sharing is more appropriate.

How long should I keep my research data after a project ends?

This varies by discipline, country, funder, and data type. A common minimum is 10 years, but some funders, institutions, or regulatory frameworks require longer retention, in some cases 25 years or more. Clinical trial data often carries extended retention requirements. Check your specific funder’s policy and document the required retention period in your DMP.

What happens to research data when a PhD student or postdoc leaves the lab?

This is a common and underappreciated risk. Before a lab member departs, ensure that all data is properly documented, stored in a shared institutional location (not on a personal device), and that at least one remaining team member can access and understand it. Creating a “knowledge transfer file” or offboarding checklist is a best practice that many institutions now recommend.

Can I use AI tools to help with data documentation or organization?

AI tools can assist with tasks like generating README templates, drafting data dictionaries, or suggesting metadata fields, but the researcher remains responsible for the accuracy and completeness of all documentation. AI-generated metadata should always be reviewed against the actual data. Be cautious about uploading sensitive or confidential datasets to third-party AI services, as this may violate data governance agreements or privacy regulations.
