Open Data in Science - Embracing Open Data for Scientific Progress
Editor’s Note: This post was originally published in 2015 and has been refreshed for Open Access Week 2017.
Until a few years ago, sharing research data was a completely unimaginable idea; researchers were wary of being scooped, and publishing research findings was considered sufficient. But today, with the scholarly community focusing on open access, researchers and scientists have realized the benefits of and the need for making scientific data easily storable, accessible, and sharable. The concept of open access certainly changed the publishing landscape, and more and more institutions, publishers, funders, and researchers are now becoming convinced of the benefits of open data mandates. For example, the Open Data Institute in the UK identified over 250 companies that used open data to create innovative products and build their businesses; the publishing company Elsevier talked about “creating a sharing ecosystem for research data” and launched an open data pilot for about 40 journals; some researchers, too, have actively spoken in favor of data sharing, saying the benefits far outweigh the perceived risks. In this post, I will take a quick look at the data sharing policies various countries follow. Before that, a brief overview of what open data means and involves is in order.
More about open data in science
Free availability of research data (e.g., figures, tables, graphs, equations, calculations, supplementary material) aids scientific progress by enabling others to build upon existing research or conduct reproducibility studies, both of which are critical to scientific progress. However, a vast amount of scientific data is lost because researchers are unwilling to share it, because it is stored inappropriately, or because it is locked away behind complex paywall restrictions. This seems to be changing across the globe as more and more people accept the concept of open data. Open data, as defined by the Open Data Institute, is data that is freely available in a storable, accessible format for everyone, with a license that allows its free re-use, sharing, or distribution. Open data mandates are policy-level rules that require researchers to share their data based on institutional/funder regulations. For example, the US Office of Science and Technology Policy (OSTP) issued a memorandum on data sharing of federally funded research; private funders such as the Bill and Melinda Gates Foundation have set strict data sharing rules; the Engineering and Physical Sciences Research Council (EPSRC) in the UK follows one of the most talked about data sharing mandates; the Canadian government and the Austrian Science Fund, too, have open data policies in place. The list runs long. These policies differ in how they are introduced. Some agencies require researchers to share data after their manuscripts are published, while others require it right from the grant application stage. Most policies apply to publicly or federally funded research (where funding comes from taxpayers or the government).
The six stages of data sharing
Mark Hahnel, CEO and founder of Figshare, who labeled 2015 “the year of open data mandates,” identifies six stages of data sharing and observes that funders across the globe have passed the halfway mark of this six-stage route.
Open data mandates across the globe
Given the positive attitude towards data sharing, many governments and institutions in different parts of the world are implementing open data mandates to capitalize on the long-term benefits of data sharing through their own or external repositories. Let us take a look at some of these mandates.
- The UK
The UK has come to be known as one of the pioneers in implementing hard open data mandates that institutions or researchers cannot avoid. The EPSRC’s open research data policy came into effect on May 1, 2015. Setting new global standards for open data policies, the EPSRC decided to “investigate non-compliance” and “impose appropriate sanctions.” Interestingly, EPSRC’s stringent policies are based on those of Research Councils UK (RCUK), which considers research data public because it depends on public funding. Other funders in the UK that have data sharing policies in place are Wellcome Trust and the Medical Research Council (MRC) (both of which require researchers to prepare and share a comprehensive data sharing plan), BBSRC (primary data must be held for 10 years after completion of a research project), Nuffield Foundation (requires data to be deposited within a year of grant completion), Natural Environment Research Council (NERC, allows researchers to use their data exclusively for a maximum embargo period of 2 years from the end of data collection), National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs, requires articles to include a statement on how research materials such as data, samples, or models can be accessed by readers), and Cancer Research UK (data can be sent to the person requesting it without having to be placed in a repository, and data must be made available for sharing for at least 5 years after the end of the research grant).
- The US
In the US, the emphasis on open data comes right from the President’s office. The 2013 memorandum issued by the OSTP directs “each Federal agency with over $100 million in annual conduct of research and development expenditures to develop a plan to support increased public access to the results of research funded by the Federal Government.” According to the National Science Foundation (NSF), all work produced under NSF grants should be shared, including primary data, samples, physical collections, and other supporting materials. The National Institutes of Health (NIH) mandates that data should be shared as soon as manuscripts are accepted for publication. The Bill and Melinda Gates Foundation’s policy requires the researchers it funds to make their papers and underlying data sets available immediately upon publication and to make the research available for commercial reuse. Other institutions that mandate data sharing are the Gordon and Betty Moore Foundation (GBMF, requires all data to be archived within 6 months of data collection or DNA sequence determination) and the Howard Hughes Medical Institute (HHMI, requires data sharing immediately after publication because “A responsibility of authorship is to make available materials, databases, and software integral to the publication so that others may validate or falsify the results and extend them in new directions.”).
- Canada
The Canadian government is working on “a government-wide Open Science Implementation Plan” that includes launching open access publications and data from federally funded research, policy-level changes, and tools to help make data more accessible.
- Australia
The National Health and Medical Research Council (NHMRC) and the Australian Research Council (ARC) require the research they fund to be deposited in an open access institutional repository within 12 months of the date of publication. Several Australian universities, which host data repositories of varying sophistication, recently came together to launch their Open Data Collections.
- China
Chinese scientists have been finding it extremely difficult to access high-quality data from domestic research because of monopolistic government departments. In 2014, two Chinese funding agencies, the National Natural Science Foundation of China (NSFC) and the Chinese Academy of Sciences (CAS), issued a mandate requiring researchers to deposit manuscripts in online repositories and make them publicly accessible within 12 months of publication. According to a report published by Open Data Now, “it is citizens, nonprofits, and urban government leaders driving the movement for more data in China.”
- Japan
Although neither open access nor data sharing has been taken up at the policy level in Japan, a UNESCO report notes that universities across Japan host several government-funded repositories. The report mentions “5 OA policies, two of which are funders' mandates and three institutional OA policies.” In 2013, the Japanese Ministry of Education, Culture, Sports, Science and Technology issued a new repository mandate. How these measures are received and developed by Japan’s scholarly publishing community remains to be seen.
- Other countries
Some notable data sharing policies from other countries include those of the Austrian Science Fund (data should, if possible, be deposited in a way that allows it to be re-used/cited without restrictions), the Higher Education Authority (HEA) of Ireland, and Vetenskapsrådet, the Swedish Research Council, whose policy applies to the researchers it funds.
While some governments and institutions are fairly aggressive with data sharing principles and policies, others are still catching up. Even so, it is clear that the next few years are critical for countries imposing open data mandates, as they will determine the success of existing policies as well as the speed with which newer ones emerge. It is heartening to witness a general acceptance of the idea of open data in science: that scientific data should be freely available to all in the spirit of encouraging scientific discovery.
Note: This post only presents an overview of data sharing and the most prominent policies implemented by various countries. Other aspects of open data and data sharing, which merit attention as separate detailed discussions, are: the process of setting up data repositories, costs involved in making data open, exceptions to data sharing, various types of licenses, researchers’ attitudes toward these policies, the need for safeguarding publicly available data, reactions or responses to open data mandates, measurements of the success of these mandates, country-level differences in policies, and a comparison of world regions (e.g., Europe versus Asia).
Related reading: Open access and data sharing are exciting phenomena, and Editage Insights has published some posts on these topics. You might be interested in reading our previous post where we discuss the pros and cons of data sharing, and the post where we talk about researchers’ incentives behind data sharing. To understand this issue from an industry professional’s perspective, check out our interview series with Dr. Caroline Sutton. Some of the information in this post has been sourced from Mark Hahnel’s insightful blog.