Jo Røislien is an internationally known science communicator who reaches a wide audience through regular appearances on radio, on television, and in print. He lectures on the communication of complex topics, knowledge dissemination, and his own research. He made his television debut hosting the mathematics and statistics series Siffer ("Digits") on NRK 1, Norway's largest national broadcaster; the series was nominated for two 2012 Gullruten awards – Norway's annual television awards – for Best New Series and Best Lifestyle Series. He has since hosted multiple science series on Discovery Channel, and produced the short mathematics film Chasing the world's largest number through Bulldozer Film together with director Christian Holm-Glad.
Røislien has written numerous articles and texts on mathematics, including the book Number Stories (2013), which received "This Year's Most Beautiful Book Award" and was described as "a little gem of a statistics book" by The Journal of the Norwegian Medical Association.
He is a Norwegian mathematician, biostatistician, and medical researcher, and holds a PhD in geostatistics from the Department of Petroleum Engineering and Applied Geophysics at the Norwegian University of Science and Technology (NTNU). He has served as Research Advisor at Rikshospitalet University Hospital and as postdoctoral researcher in the Department of Biostatistics at the University of Oslo (UiO), and later as Senior Scientist at The Norwegian Air Ambulance Foundation. Røislien has contributed to numerous medical research projects as a statistical consultant and researcher, including collaborations with the Center for Morbid Obesity at Vestfold Hospital Trust, Sunnaas Rehabilitation Hospital, and the Norwegian Institute of Public Health. Currently, he is an Associate Professor at the Department of Health Sciences, University of Stavanger, with additional posts in the Department of Biostatistics and the Center for Addiction Research at the University of Oslo.
In this first segment of a three-part interview, Dr. Jo Røislien provides meaningful insight into the challenges of data and statistical analysis.
Could you tell us about your research interests and what drew you to them?
The ‘Methods’ chapter is the most important chapter in any research publication. Obviously. The conclusions are based directly on the results, and the results are a direct consequence of the methods applied. That’s why academic debates are rarely on results, but mostly on method. If the method does not hold, the results are of no value.
As a statistician I always side with methodology. Even though I have ended up in medical research, I was never really attracted to medicine when I was younger. I studied engineering, mathematics, informatics, statistics. Ten years ago, after finishing my doctorate in geostatistics and petroleum engineering, my little sister encouraged me to apply for a position as a statistical research advisor at Oslo University Hospital. I can still remember the first time I was sitting at my desk, working on some problem, when it suddenly struck me that the equations on the pieces of paper scattered around my desk actually represented life and death. It was powerful stuff. I never looked back.
I have collaborated in numerous medical research projects, on a wide range of medical and statistical topics. And slowly my main research interest has drifted towards the analysis of temporal data. Mainly the issue of how to properly analyze data where there are multiple temporal measurements for multiple individuals, but also the issue of modelling multiple layers of temporal effects simultaneously, such as the combination of a long-term non-linear increase, seasonal effects, and weekly effects. Problems like these have popped up in my collaborations with the Norwegian Institute of Public Health, the Norwegian Centre for Addiction Research, and the Norwegian Air Ambulance Foundation.
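The layered temporal structure described here – a slow non-linear trend plus seasonal and weekly cycles – can be sketched as a harmonic regression. The following is a minimal illustration on simulated data, not a reconstruction of Røislien's actual models; all data and coefficient values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three years of daily observations (simulated, hypothetical data).
days = np.arange(3 * 365, dtype=float)
trend = 0.002 * days + 1e-6 * days**2             # long-term non-linear increase
seasonal = 1.5 * np.sin(2 * np.pi * days / 365)   # yearly cycle
weekly = 0.5 * np.sin(2 * np.pi * days / 7)       # weekly cycle
y = 10 + trend + seasonal + weekly + rng.normal(0, 0.3, days.size)

# Design matrix: polynomial trend plus sine/cosine harmonics at both periods,
# so all temporal layers are estimated simultaneously in one least-squares fit.
X = np.column_stack([
    np.ones_like(days),
    days, days**2,
    np.sin(2 * np.pi * days / 365), np.cos(2 * np.pi * days / 365),
    np.sin(2 * np.pi * days / 7),   np.cos(2 * np.pi * days / 7),
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted coefficients recover the amplitudes of the simulated layers.
print(beta)
```

In practice one would use more flexible smoothers (e.g. splines in a generalized additive model) for the long-term trend, but the principle of decomposing a signal into simultaneous layers is the same.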
As the field of medical research has matured, data collection has also become more sophisticated. Not everything can be tested in a randomized controlled trial, so in order to establish causal relationships, study designs now tend to produce increasingly intricate data. There is a lot going on within this part of methodology research, as many of the statistical models needed for analyzing such data are either too simple, or simply non-existent. It's a great time to be a biostatistician.
Can you please provide an overview of the nature of data today in the biostatistics field?
Working as a statistician in the medical sciences is a gem. The types of problems are extremely varied, and so are the data one can encounter: lab studies with six mice, small randomized clinical trials with 50 people divided into one of two groups, large observational longitudinal studies of potential health risks with tens of temporal measurement points for each of thousands of individuals, registry studies with millions of individuals measured on hundreds of variables, analyses of radiological images of cancer cells in 2- and 3-dimensional geometric space, genomics data in 10,000 dimensions.
In the medical sciences data are rarely collected for the sake of merely collecting data. Usually we have a hypothesis we want to test, a question we want to find an answer to. So data collection is often very specific. And also often very time consuming. PhD students can often spend the first few years of their work collecting data, before they can even start to think about analyzing and actually getting any results. I am obviously pro the idea of increasing access to data, but it is easy to be pro when you are on the receiving end. When you have spent years of your life collecting data to answer a specific question you are curious about, only to have people call you conservative and selfish for not immediately making the data freely available to everybody, it isn’t equally obvious how to go about distributing research data fairly.
Collecting data can even be so challenging that just the ‘how to’ is a research field in itself. I currently collaborate with the Norwegian Air Ambulance Foundation on increasing knowledge in the field of pre-hospital medical care. And here collecting data is a core issue. When you land a helicopter as the first on site after a dramatic accident, your first action is not, and cannot be, to note down various possible confounding factors on a standardized form, or to measure a series of baseline values for various biomarkers in the blood of the patient. However, starting to measure these things after the patient arrives at the hospital means you have no information on the process the patient has gone through from first contact to hospital care. So how can you know what actions you took along the way actually helped?
The collaboration with various disciplines—engineers, molecular biologists, and so on—is a must in order to figure out what to measure and how to measure it, preferably often, maybe even continuously, in real time. This would dramatically increase the possibilities of figuring out how the body reacts to various interventions. We know that time is of the essence, but how, exactly?
What challenges or dangers do you see particularly with data mining in general for researchers?
Chance is a funny thing. There is a lot more structure in chance than we like to admit to ourselves. So I belong to the group of people who tend to raise a warning finger when the topic of data mining comes up. Statistical analysis is not about "finding patterns in data." There will always be patterns in data. Small patterns, large patterns, simple patterns, intricate patterns. And the more closely you look, the more patterns you will find.
The question is thus not whether you will find patterns in your data if you start digging, but whether the patterns you have found reflect some actual structure in the data, or are just due to chance. Statistical analysis is the act of separating actual patterns from random patterns. True association from mere chance. And, most importantly, statistics quantifies the degree of certainty with which you should trust your results. Or not.
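The claim that purely random data always contains patterns is easy to demonstrate with a short simulation (a generic illustration of the multiple-comparisons problem, not any specific study): generate a handful of completely unrelated noise variables, test all pairwise correlations, and some will come out "significant" at the conventional 5% level by chance alone.

```python
import numpy as np

rng = np.random.default_rng(42)

# 20 completely independent noise variables, 100 observations each.
n_vars, n_obs = 20, 100
data = rng.normal(size=(n_vars, n_obs))

# All pairwise correlations: 20 * 19 / 2 = 190 separate tests.
corr = np.corrcoef(data)
pairs = [(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)]

# Approximate two-sided 5% critical value for |r| via the Fisher
# z-transform: r_crit ≈ 1.96 / sqrt(n - 3), about 0.20 for n = 100.
critical = 1.96 / np.sqrt(n_obs - 3)
spurious = [(i, j) for i, j in pairs if abs(corr[i, j]) > critical]

# With 190 tests at the 5% level we expect roughly 9-10 "significant"
# correlations purely by chance, even though nothing is related.
print(len(spurious))
```

Scaling up the number of variables makes the effect worse, which is exactly the "paradox of plenty" discussed below: more data means more opportunities for chance to masquerade as structure.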
So when I was asked to give a TEDx talk in Oslo fall of 2013 on “The Paradox of Plenty,” this was my topic of choice. The paradox is that in large datasets you are destined to find more patterns, more intricate patterns, just by chance. At the same time we put more confidence in patterns found in large datasets just because we have large amounts of data.
You don't need many classes in statistics and scientific research methods before you realize that digging away in data can be even more harmful than doing nothing. The big challenge today is not the lack of data, but the lack of good analyses.
There is a recent research initiative in Norway called BIG INSIGHT with the aim of developing statistical methods for large datasets. I am really looking forward to seeing what they come up with.