The irreproducibility problem is serious, but it is also misunderstood

Researchers are busy people, and Dr. Jonas Ranstam is perhaps the busiest of them all. Dr. Ranstam is, officially, the world’s most prolific peer reviewer, having reviewed as many as 661 papers in a year. In 2016, this medical statistician was the overall winner of the Sentinels of Science Awards initiated by Publons to recognize the efforts of reviewers. He was also acknowledged as one of the Top Reviewers for 2016 by Publons. I am honored to have this opportunity to talk to Dr. Ranstam about a range of topics from medical statistics to peer review.

Before retiring from full-time academia, Dr. Ranstam was affiliated with several institutions, including Lund University, Sweden, as Professor and senior lecturer of medical statistics. Currently, as a medical statistician, Dr. Ranstam acts as a statistical advisor to clinical and epidemiological investigators at academic and research institutions, hospitals, governmental agencies, and private companies. He also offers his expertise to Osteoarthritis and Cartilage (as deputy editor), the British Journal of Surgery (as statistical editor), and Acta Orthopaedica (as a statistics consultant), and serves as a statistical reviewer for several international scientific medical journals. He also maintains the Statistical Mistakes blog, which focuses on systematic reviews of statistical mistakes in medical research and presents references to literature describing how to avoid such mistakes.

In this first segment of the interview, Dr. Ranstam talks about a variety of topics – statistical methodology, the blog he maintains, the problem with disclosing the uncertainty of findings in medical research, and the irreproducibility crisis. He also reveals the common mistakes researchers make when presenting statistical data in their manuscripts.

Let’s begin by talking about your current profile. What do you do as an independent statistician/consultant?

I work with medical research problems, mainly in the area of clinical treatment research. For example, I participate in the development of study design in several research projects, and I write study protocols and statistical analysis plans. I analyze data and write research reports. I also review manuscripts, grant applications, and sometimes job applications. However, in contrast to my previous job as a university professor, I have very few administrative tasks and almost no teaching.

What led you to start your blog, Statistical Mistakes?

It started with a reference list for my own use. I often include references to published papers in my review comments to facilitate learning for the authors, and I wanted to have easy access to my list. Just keeping it in a Word document was not a good alternative as I usually work with different computers and at various locations. The simplest solution turned out to be the WordPress blog system.

I didn’t see it as a disadvantage that the list became public. I thought this could be useful also for others writing and reviewing manuscripts.

I am engaged in two other blogs as well: ArthroplastyWatch, an international collection of joint replacement safety alerts, and DRICKSVATTEN.BLOG, a national collection of local Swedish drinking water alerts.

On your blog, you mention that medical researchers are “ignorant about statistical methodology.” How can this change? How could a medical researcher or any researcher working with data and using statistical analyses be made more aware of the problem?

Yes, that is unfortunately true. Douglas Altman once wrote [Altman DG. Statistical reviewing for medical journals. Stat Med 1998;17:2661-2674] that “the majority of statistical analyses are performed by people with an inadequate understanding of statistical methods. They are then peer reviewed by people who are generally no more knowledgeable”.

The consequences of the statistical mistakes affect us all. Without them we could have had more effective treatments with fewer complications and lower costs. I believe that the main problem is that successful medical research requires understanding of stochastic phenomena, and most medical researchers tend to have a deterministic orientation.

Several attempts to improve the quality of medical research are being made. Statistical reviewing is, for example, considered increasingly important in many medical journals. The use of public trial registers and compliance with reporting checklists, such as CONSORT, PRISMA, and ARRIVE, has also become an integrated part of the requirements for having manuscripts accepted for publication.

During one of your presentations, you mentioned that “Many (if not all) authors severely underestimate the uncertainty of their findings.” Could you elaborate?

Medical research is mostly quantitative, i.e., it includes quantification of the findings’ sampling and measurement uncertainty. This uncertainty is usually expressed in terms of p-values and confidence intervals. Non-significant results are often considered too uncertain to be publishable.

It is, however, possible to give the impression that the uncertainty is lower than it is, even when p-values and confidence intervals are correctly calculated. For example, hypothesis-generating study results can be presented as if they had been confirmatory, and the effects of multiple testing can be ignored or corrected for in an inadequate manner. Such inadequacies are not necessarily intentional, but general methodological practice seems to have a tendency to produce research findings with systematically overrated empirical support. Given the importance of publishing in a “publish or perish” culture, this development should perhaps not come as a surprise.
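The multiple-testing point can be illustrated with a minimal sketch. It assumes independent tests and that every null hypothesis is true; the function names and the numbers (a 0.05 significance level and up to 20 tests) are chosen for illustration and do not come from the interview. Bonferroni is shown as one common correction, not necessarily the one a given study should use.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05  # illustrative significance level
    for n_tests in (1, 5, 20):
        uncorrected = familywise_error_rate(alpha, n_tests)
        per_test = bonferroni_threshold(alpha, n_tests)
        corrected = familywise_error_rate(per_test, n_tests)
        print(f"{n_tests:>2} tests: uncorrected {uncorrected:.3f}, "
              f"Bonferroni-corrected {corrected:.3f}")
```

Under these assumptions, 20 uncorrected tests at the 0.05 level give roughly a 64% chance of at least one spurious "significant" finding, even though each individual p-value is correctly calculated.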

In another presentation, you mention that journal editors are keen on publishing guidelines because guidelines generate citations. Could you please elaborate?

It has been discussed that some publication types, such as review articles and guidelines, generate more citations than other types and therefore have greater influence over a journal’s impact factor.

I don’t know how well this phenomenon has been studied, but I remember that when I started my career in medical statistics, the most cited publication in medical research was Sidney Siegel’s Nonparametric Statistics, a statistics textbook with guidelines on the use of distribution-free tests.

What role do data management, data storage, and data sharing play in medical statistics and biostatistics research?

My personal opinion is that the reproduction of results is important and necessary, but the discussions on open data and data sharing also seem a bit naive. Working with complex database structures and advanced statistical analyses presents many problems that shouldn’t be underestimated. Mistakes and misunderstandings in a statistical reanalysis can easily discredit sound research findings. I believe that public sharing of data needs to be combined with measures to avoid such problems.

In your opinion, how big is the irreproducibility problem facing science? How can it be addressed/fixed?

The irreproducibility problem is serious, but it is also misunderstood. Scientific development relies on the questioning of established truths; to reproduce results is an important part of this, and not succeeding is not necessarily a bad thing.

I believe that it is important to label studies correctly. Many studies are exploratory – the aim is to generate hypotheses. Such studies can be well planned and performed, but they can also be fishing expeditions with results that are mere speculations. The uncertainty of these findings cannot be reliably calculated, so why should the results be reproducible?

However, the results of confirmatory studies are also uncertain, but at a defined level, because such studies are designed and performed in a way that enables calculating the inferential uncertainty of their results. Nevertheless, some of these results can be expected to be false and fail to reproduce.
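A rough sketch of why some significant results from well-run confirmatory studies are still expected to be false: the share depends on the significance level, the statistical power, and how many of the tested hypotheses are actually true. All three inputs below are assumptions for illustration; the last one in particular is rarely known in practice.

```python
def false_share_among_significant(alpha: float, power: float, prior_true: float) -> float:
    """Expected share of statistically significant results that are false positives.

    alpha:      significance level (false positive rate when there is no effect)
    power:      probability of detecting an effect that really exists
    prior_true: assumed fraction of tested hypotheses that are actually true
    """
    true_positives = power * prior_true
    false_positives = alpha * (1.0 - prior_true)
    return false_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical scenarios: same alpha and power, different prior plausibility.
    for prior_true in (0.5, 0.1):
        share = false_share_among_significant(alpha=0.05, power=0.8, prior_true=prior_true)
        print(f"prior {prior_true:.1f}: {share:.1%} of significant results expected to be false")
```

With these illustrative inputs, about 6% of significant results are expected to be false when half of the tested hypotheses are true, and about 36% when only one in ten is true.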

Statistical mistakes, unfortunately, play a prominent role in many studies. Laboratory experiments, for example, often lack pre-specified endpoints and analysis plans, include multiple testing with inadequate use of multiplicity correction, and are based on correlated instead of independent observations. In addition, whether or not the assumptions underlying the statistical evaluation are fulfilled is often ignored. Other, equally severe mistakes are common in epidemiological studies.
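The cost of treating correlated observations as independent can be sketched with the standard design-effect formula for equally sized clusters, 1 + (m - 1) × ICC, where m is the number of observations per cluster and ICC is the intraclass correlation. The scenario below (60 measurements taken as 10 per animal from 6 animals) and the ICC values are hypothetical.

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation from within-cluster correlation (equal cluster sizes assumed)."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_observations: int, cluster_size: int, icc: float) -> float:
    """Number of independent observations carrying the same information."""
    return n_observations / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical experiment: 6 animals, 10 measurements each.
    n_observations, cluster_size = 60, 10
    for icc in (0.0, 0.5, 1.0):
        n_eff = effective_sample_size(n_observations, cluster_size, icc)
        print(f"ICC {icc:.1f}: 60 measurements behave like {n_eff:.1f} independent ones")
```

In this sketch, an ICC of 0.5 reduces 60 measurements to the equivalent of about 11 independent observations, so an analysis that assumes 60 independent observations would understate the uncertainty considerably.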

There is no simple way out of this mess, but statistical rigor is obviously necessary for a more rational use of our research resources.

In your experience as an author, reviewer, and editor, what are the most common mistakes authors make when presenting statistical data in their manuscripts? How can these mistakes be avoided?

The most common mistakes are, in my opinion, caused by the misunderstanding of p-values and statistical significance. These are measures related to uncertainty but are typically mistaken for tokens of importance.
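A small sketch of the difference between statistical significance and importance, assuming a two-group comparison of means with a known standard deviation (a z-test, chosen here only to keep the example dependency-free). The effect sizes and sample sizes are invented for illustration.

```python
import math


def z_for_mean_difference(diff_in_sd: float, n_per_group: int) -> float:
    """z statistic for a difference between two group means, expressed in SD units,
    with equal group sizes and a known common standard deviation."""
    return diff_in_sd * math.sqrt(n_per_group / 2.0)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # A negligible effect in a very large study versus a moderate effect in a small one.
    tiny_effect_large_study = two_sided_p(z_for_mean_difference(0.02, 100_000))
    moderate_effect_small_study = two_sided_p(z_for_mean_difference(0.5, 20))
    print(f"0.02 SD difference, 100000 per group: p = {tiny_effect_large_study:.2g}")
    print(f"0.50 SD difference, 20 per group:     p = {moderate_effect_small_study:.2g}")
```

Under these assumptions the negligible difference is highly significant while the moderate one is not, which is why a p-value on its own says little about how much an effect matters.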

Several recently published articles, including one from the American Statistical Association, have discussed these problems and proposed changes. One journal, BASP (Basic and Applied Social Psychology), has also banned the use of p-values and other statistical measures that form part of “null hypothesis significance testing”. However, ignoring inferential uncertainty just makes the situation worse.

That brings us to the end of this segment of the interview with Dr. Jonas Ranstam. In the next segment, Dr. Ranstam will talk about peer review in scholarly publishing. Stay tuned!
