Linking statistical significance to clinical importance of trial data: A paradigm shift
In evidence-based medicine, statistical information is critical for investigators to interpret observations and make treatment recommendations. A voice of dissent opposing over-reliance on p-value-based decision making, the widely accepted and widely practiced method for analyzing clinical trial data, is growing stronger in the research community. Several recent publications in reputed journals have questioned the popularity of the concept of “statistical significance.”
The p-value was introduced to statistics not as a definitive test but as a tool to judge whether evidence gathered from an experiment was likely to hold up when the experiment was repeated. In brief, p-values range from 0 to 1; the lower the value, the lower the probability of obtaining results at least as extreme through pure chance alone. Conventionally, a p-value of 0.05 is the threshold used to determine reliability and, consequently, publication-worthiness. This threshold, nevertheless, is arbitrary, and the p-value is essentially a pragmatic tool which, when combined with background knowledge, can lead to better scientific understanding. In fact, Regina Nuzzo, a professor at Gallaudet University, opines in her award-winning article that the magical 0.05 is a boundary too permeable to be taken seriously, as adding some extra data can change an effect from significant to non-significant.
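The permeability of the 0.05 boundary is easy to demonstrate numerically. The sketch below uses a one-sample z-test with entirely hypothetical numbers (means, sigma, and sample sizes are assumptions for illustration): a result that clears the threshold can drift back across it when a handful of extra observations nudge the sample mean.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def one_sample_z_p(sample_mean, null_mean, sigma, n):
    """One-sample z-test p-value, assuming a known sigma (illustrative only)."""
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    return two_sided_p(z)

# Hypothetical trial: observed mean 10.4 vs. null value 10.0, sigma = 2.0
p_before = one_sample_z_p(10.4, 10.0, 2.0, 100)   # p ~ 0.046, "significant"
# Twenty extra observations pull the running mean down slightly ...
p_after = one_sample_z_p(10.32, 10.0, 2.0, 120)   # p ~ 0.080, "non-significant"
```

Nothing about the underlying treatment changed between the two calls; only the verdict delivered by the 0.05 cut-off did.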
Over-reliance on the p-value to determine actual treatment effects has populated the biomedical literature with studies reporting statistically significant results while not accounting for factors that are essential to prove the clinical worth of a trial. This discordance stems from the fact that the binary boundary of statistical significance excludes crucial factors from the inferential process: the size of the treatment effect, treatment effects on secondary endpoints, the implications of these effects for the overall risk/benefit assessment, the biological plausibility of the effects, and the reproducibility and generalizability of the observation. In recent years, at least one academic journal, Basic and Applied Social Psychology, has banned the use of p-values. The decision may be jarring, but in high-impact journals, researchers such as Buyse et al. are publishing papers that challenge the interpretation of clinical data based solely on non-significant statistical results.
Clinically relevant changes are often identified by terms like minimally important changes (MIC) or minimal clinically important differences (MCID). Unfortunately, clinical significance is not well-defined in the context of objective measurement. But assessing the clinical significance of a study by means of statistical data would surely require one to think beyond the p-value.
The way forward to blend clinical importance with statistical significance
A dichotomous way of looking at the world of clinical trials, with results labeled either statistically ‘significant’ or ‘non-significant’, often distorts the broader interpretation of the data gathered. The magnitude and relative importance of an effect, expressed through the effect size and its confidence interval, are considered a more robust way of reporting clinical trial results.
Effect size: The simplified interpretation of a treatment’s effect as a ‘yes’ or ‘no’ may be attractive, but it is unrealistic for researchers in a non-binary world who want to measure the effect of a treatment, and its biological importance, on a continuous scale. Including the effect size in the analysis of clinical data is a telling way of assessing clinical significance. It reflects the magnitude of the difference in outcomes between groups; a larger effect size indicates a greater difference between the experimental and control groups and a more meaningful result for patients.
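One common effect-size measure for continuous outcomes is the standardized mean difference, Cohen’s d. The sketch below uses made-up trial numbers (all means, SDs, and arm sizes are assumptions) to show how two trials with the same raw mean difference can carry very different effect sizes once variability is taken into account:

```python
import math

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference (Cohen's d) using the pooled SD."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)

# Hypothetical arms: identical 2-point mean difference, different variability
d_large = cohens_d(52.0, 50.0, 2.0, 2.0, 60, 60)    # d = 1.0, a large effect
d_small = cohens_d(52.0, 50.0, 10.0, 10.0, 60, 60)  # d = 0.2, a small effect
```

A p-value alone would not distinguish these two situations as clearly as the effect size does.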
Confidence interval: The confidence interval is a method preferred by many researchers and endorsed by the Consolidated Standards of Reporting Trials (CONSORT) statement, as it conveys the level of uncertainty around the measure of effect. In other words, the upper and lower confidence limits allow one to infer, with a stated degree of confidence, that the true population effect lies between these two points. In addition to conveying whether a result is statistically significant, as the p-value does, it indicates the precision of the estimate.
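A 95% confidence interval for a difference in means can be sketched with the normal approximation. The numbers below (treatment benefit, SDs, arm sizes) are hypothetical, chosen only to show how the interval communicates both significance (whether it excludes zero) and precision (its width):

```python
import math

def ci95_diff(mean_diff, sd_t, sd_c, n_t, n_c):
    """95% CI for a difference in means (normal approximation, z = 1.96)."""
    se = math.sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
    z = 1.959964
    return (mean_diff - z * se, mean_diff + z * se)

# Hypothetical trial: 2.0-point benefit, SD 6.0 in each arm of 80 patients
low, high = ci95_diff(2.0, 6.0, 6.0, 80, 80)
# The interval excludes 0, so the result is significant at the 5% level,
# but its width (low is close to 0) shows how imprecise the estimate is.
```

A reader sees at a glance that the benefit could plausibly be near-negligible or nearly twice the point estimate, information a bare p < 0.05 hides.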
Bayesian approach: The problem with p-value-based inference reflects a logical fallacy known as the transposed conditional. Just as the probability that an anemic patient feels fatigued is not the same as the probability that a fatigued individual has anemia, a p-value below 0.05 conveying a difference between the intervention and control groups does not indicate the probability that the treatment actually works. To capture the fluidity and uncertainty of real-life scenarios, a paradigm shift in the analysis of clinical trial data has occurred with the Bayesian approach. It addresses research questions by mimicking a physician’s process of critical thinking: making decisions only after considering factors such as the prevalence of the disease and the patient’s demographics and symptoms, assessing the pre-test probability, and only then conducting a diagnostic test. Going by the increasing number of studies promoting this approach, such as the one by Bittl and He, Bayesian statistics seems better equipped than classical statistics to integrate statistical evidence with clinical significance.
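The anemia/fatigue analogy above can be made concrete with Bayes’ theorem. The probabilities below are invented for illustration (prevalence, and the rates of fatigue with and without anemia, are assumptions, not clinical figures):

```python
def post_prob(prior, p_finding_given_disease, p_finding_given_no_disease):
    """Bayes' theorem: P(disease | finding) from the pre-test probability."""
    num = p_finding_given_disease * prior
    return num / (num + p_finding_given_no_disease * (1 - prior))

# Hypothetical numbers: fatigue is common in anemia (0.9), but anemia is
# uncommon (prevalence 0.05) and fatigue also occurs without it (0.3)
p_anemia_given_fatigue = post_prob(0.05, 0.9, 0.3)   # ~0.14, far below 0.9
```

Even though 90% of anemic patients feel fatigued here, only about 14% of fatigued patients are anemic: swapping the two conditionals, as naive p-value reasoning does, gives a wildly wrong answer.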
While significance testing will always have its proponents, researchers should now start recognizing its pitfalls. When reporting clinical trial results, the best way to help readers assess their significance is to report every crucial detail of the study clearly and to share all the clinical knowledge available to the researchers.