STAP case revisited: Is mandatory data sharing the solution?
Nullius in verba, or "take no man’s word for it," comes from one of the Roman poet Horace’s many letters, collectively published as two books between 20 and 14 BC. Many centuries later, the import of this phrase has hit the scientific community with renewed vigor, against the backdrop of widespread concerns about the STAP methodology.
As noted in my previous post, a handful of the STAP authors from RIKEN have issued a protocol exchange document detailing ‘essential technical tips’ to reproduce STAP cells. The listed protocol goes into far more detail than the original methods section published in Nature. Its purport, as gathered by Paul Knoepfler, a leading stem cell scientist and blogger, is that the process of producing STAP cells is not as easy as it first might have seemed. There are specific conditions for isolating and treating the cells that need to be followed.
This protocol document was published on March 5th, almost 5 weeks after the publication of the STAP articles. Equipped with the newly released information, independent labs may or may not be able to replicate the original results, and the ruffled feathers of researchers worldwide may or may not be soothed. However it pans out, the STAP saga raises a fundamental question about scientific publishing practice: Should researchers make all the data needed to reproduce their study results available to peers, or, rather, should journals require authors to share raw data online?
This question is by no means new. It has long lurked on the fringes of discussions about acceptable publication practices, seldom as central as peer review efficacy or open access models. But the STAP case has suddenly and sharply drawn it into focus, making this an opportune time for us to take a look at it.
Last December, PLOS announced that to ‘best foster scientific progress, the underlying data should be made freely available for researchers to use, wherever this is legal and ethical.’ Proof of the significance of the announcement came in the form of two subsequent updates from PLOS in which it acknowledged the ‘flurry of interest’ and the ‘extraordinary outpouring of discussions on open data and its place in scientific publishing.’ BMJ, another votary of data sharing, encourages authors submitting to BMJ Open to share raw data. So does Springer Open. Leading international research funders such as the US National Institutes of Health, the UK Medical Research Council, and the Wellcome Trust have their respective mandates on the issue but, here too, the common thrust is toward more data sharing.
For years, logistical and technological archiving constraints led to data being withheld. But advances in information technology spurred a surge in databases able to hold large and varied datasets. Researchers welcomed the development, and universities showed their will by undertaking mammoth exercises to build repositories, as in the case of the University of Rochester, profiled in Nature. Yet none of this actually translated into a bigger bank of open and usable data. Why?
Researchers are too busy to make time to make their data available. Writing for the Scholarly Kitchen, David Crotty says that researchers value time at hand as their most important resource. They prize it over the perceived benefits of better access and more visibility.
Not all research fields are the same. What is accepted data-sharing custom in field A might be an unwelcome change in field B. Disciplines like genetics and bioinformatics are more receptive to data sharing than are most of the health sciences fields.
Extra personnel hours and tight research funds don’t go hand in hand. Not all research funding factors in the costs of data sharing, and researchers would much rather invest in additional experiments than expend resources curating data for a hypothetical good.
Time, money, and disciplinary culture are hardly the only peeves. Imagine the consequences if raw data sharing were enforced across the board for all research disciplines. Clinical investigators would have to make public confidential information about human subjects, which might in turn make obtaining consent a headache.
Some opine that sharing raw data as and when they are requested is a far better option than publicly sharing large volumes of data intended to cover every scientific request that might ever be made. As things stand, editors and reviewers are not loath to demand more data from authors if additional information is required to verify an investigation’s findings. As a famous quote by Carl Sagan goes, ‘Extraordinary claims require extraordinary evidence.’ The more seminal the finding, the more data needed to back it up.
What, then, does all of this say about the state of scientific publishing today? Is it a muddle? Is the recent tilt toward data sharing just a trend that will slacken with time? On the available evidence, it might be tricky to find definite answers, but whatever they may be, history is proof that science has always found a way to fix itself.