What does big data mean for science?
Until a few decades ago, science was largely the product of three interrelated paradigms: experimental, theoretical, and computational. The advent of the Internet and modern computing massively expanded the capacity to collect, store, and share data. As a result, data generation has accelerated sharply: by some estimates, scientific data output is growing at an annual rate of about 30%, which means it doubles roughly every two to three years. This influx of data is described as “big data.”
Big data, also described as a data deluge or tsunami, comprises all data that arises from the Internet, smartphones, scientific studies, businesses, governments, and other sources. Academia has welcomed this data flood because it has opened several avenues for scientific exploration. In fact, many consider big data to be a fourth paradigm of science, one based on the collection and analysis of large amounts of data.
Big data has become a buzzword not only because of its quantity but also because of its uses. The availability of large volumes of data has changed the very nature of science. Scientific advances have become more data driven and data intensive. Across scientific disciplines, researchers see the potential for big data to become part of standard scientific methods and processes. Whereas earlier only fields such as high-energy particle physics or nuclear fusion relied on big data sets, the potential of big data is now being explored by researchers in many other disciplines, including biology, chemistry, astronomy, and genomics, to create better scientific models.
Dr. David Rossell, a statistician, provides examples of how science is benefiting from big data. According to him, big data offers unprecedented opportunities for personalized medicine, because the characterization of complex diseases at the molecular level can be combined with medical and treatment history and with diagnostic or imaging tests. The Large Hadron Collider, which records data 40 million times per second to test theories in physics, is another example of big data in use. Large data sets can also help manage cities and natural resources, study climate change, devise political strategies, and study how ideas spread. In addition, the availability of data across the globe has expanded the horizon of scientific studies, as researchers can share their data and collaborate with each other with greater ease.
While scientists view access to more data in a positive light, many statisticians believe that scientists should be wary of big data because it has downsides too. The primary challenge when working with big data is storing and managing it efficiently. Big data sets are too large and complex to be analyzed using traditional data processing methods. Moreover, big data can result in bad science if research is driven purely by data without careful consideration and analysis. Does big data mean good data? What challenges do researchers face while handling big data? These questions will be discussed in detail in another article.
You might like to read the interview with Dr. Jo Røislien, a well-known Norwegian mathematician, biostatistician, and researcher in medicine, in which he offers insight into the issues with big data and statistical analysis.