Misuse of bibliometric analysis shifts scientists' focus from their research to pursuing scores
Since 1983 David A. Pendlebury has been a citation analyst at Clarivate Analytics, formerly the Intellectual Property and Science business of Thomson Reuters. After undergraduate and graduate studies in ancient history, David began working as a translator and indexer at the Institute for Scientific Information (ISI), which was acquired by Thomson Reuters in 1992, and worked with ISI founder Eugene Garfield on his personal research projects. In 1987, he developed the Research section pages of the newspaper The Scientist. Two years later, he joined the company’s Research Services Group and helped launch the newsletter Science Watch. As a member of the Research group, he helped design and develop Clarivate Analytics Essential Science Indicators, a database that provides publication and citation data on performance and trends in global research. Additionally, David has extensive experience of working with federal agencies, academic institutions, corporations, and science publishers worldwide.
Understanding the impact of research is vital. Today, a rapidly changing digital scholarly publishing industry poses both new opportunities and challenges to those who assess research impact. In this interview, David throws light on bibliometrics and its uses, discusses his work at Clarivate Analytics, and talks about the Eugene Garfield Award for Innovation in Citation Analysis that Clarivate Analytics announced.
What are some of your main responsibilities as a Consultant at Clarivate Analytics? Also, if you could tell us a bit more about how you developed the Clarivate Analytics Essential Science Indicators database that would be great!
I focus on communicating the possibilities and power of citation analysis for information retrieval, research assessment, and science monitoring. Essential Science Indicators (ESI) was created by a team in the Research Department, led by Director Henry Small, at (then) Thomson Scientific around 2000. We were aiming to provide easy-to-access publication and citation statistics at the level of papers, people, institutions, nations, and journals for a 10-year period in 22 broad field categories. ESI also includes valuable data on research fronts, which are specialty areas that are identified through co-citation analysis of highly cited papers published in the last 5 years. Co-citation analysis treats publications as similar when they are frequently cited together. Henry pioneered research fronts derived from co-citation clustering and science mapping in the 1970s and 1980s. ESI data are updated every two months, making this database the most up-to-date window on key research activity available today. ESI is now part of the InCites platform, which also includes the Journal Citation Reports database with its impact factors. The InCites platform uses our Web of Science data to provide users with publication and citation data for multidimensional research assessment and benchmarking. So these tools and data are designed to provide insights on the structure and dynamics of research, to reveal significant and growing areas, to identify top performers, and to aid in information retrieval or discovery.
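For readers who want to see the idea of co-citation in concrete terms, here is a minimal Python sketch. It is an illustration only, not how ESI or its research fronts are actually computed; the paper labels and the function name `cocitation_counts` are hypothetical. It counts how often each pair of cited papers appears together in the same reference list.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how many citing papers cite each pair of papers together.

    reference_lists: iterable of reference lists, one per citing paper.
    Returns a Counter keyed by alphabetically sorted (paper_a, paper_b) pairs.
    """
    counts = Counter()
    for refs in reference_lists:
        # De-duplicate and sort so that each pair has one canonical key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: four citing papers and the papers they reference.
citing_papers = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
    "P4": ["A", "B", "D"],
}

counts = cocitation_counts(citing_papers.values())
strongest_pair, strength = counts.most_common(1)[0]
print(strongest_pair, strength)  # ('A', 'B') 3
```

In this toy data, papers A and B are cited together by three of the four citing papers, so they form the strongest co-citation link. Grouping such strongly linked pairs into clusters is a separate step that the sketch does not attempt.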
As part of your work, you undertake special projects such as predicting Nobel Prize winners. How do you go about this?
From the earliest days of Eugene Garfield’s work on the Science Citation Index, the differences between elite scientists, as represented by Nobel Prize winners, and so-called normal scientists were evident in our data. Garfield demonstrated in 1965 (with only a few years of data) that, on average, Nobel Prize winners published five times more and were cited 30 to 50 times more than the typical researcher. He also showed that Nobel laureates almost always produced one or more ‘Citation Classics’ – papers that rank by citations in the top 0.1% for their discipline. In the fields of science recognized by the Nobel Prize – Physiology or Medicine, Physics, Chemistry, and Economic Sciences – our researchers at Clarivate Analytics search for extremely highly cited papers (typically more than 2,000 citations) and examine these to identify their authors, the nature of the findings reported in these papers, and whether the research and the researchers have previously been recognized by the Nobel Prize. In many cases, we find this to be so and then go on to focus on the highly cited papers of scientists who have not yet received that exciting phone call from someone with a Swedish accent. There is no mystery about the correlation between highly cited papers and peer recognition in the form of top prizes: both reflect high esteem, the first arrived at quantitatively and the second qualitatively through the judgment of colleagues.
For the benefit of our readers, could you elaborate on the difference between bibliometrics and scientometrics?
Bibliometrics derives from the Greek words biblios and metron, meaning book or scroll and measurement. Thus, bibliometrics is the measurement of various aspects of publishing, whether books or journals. Early on it was an approach that librarians used to improve their collections by identifying so-called core journals and observing patterns of usage, thereby providing a scientific basis for collection development decisions. The word scientometrics was probably first introduced by the Russian polymath Vasily Nalimov, who spoke of naukometriya in the late 1960s. Nauk is the Russian word for science. So, bibliometrics applied to scientific research is scientometrics. Scientometric studies are far broader than analysis of science journals for libraries and also include research performance, innovation, science communication, field structure and dynamics, and policy matters including funding.
What, in your view, are the pros and cons of using bibliometrics?
Your question might be rephrased as “What is the use of measurement?” and “What are the dangers of measurement?” Let’s stipulate that measurement can be quite useful. The Kelvin dictum says we know more about what interests us if we can measure it and that without measurement our knowledge is meager. I would like to focus on the possible “cons.” These include: using incomplete or inaccurate data; employing measures that do not answer the question asked; relying on single and composite measures, which are insufficient to portray the many and different facets of research activity and impact; failing to use relative or normalized measures that ensure like-for-like comparisons; and believing that the data speak for themselves and can be used apart from interpretation by field experts.
Do you feel the bibliometrics approach has been misunderstood or used incorrectly by many stakeholders of science such as policy makers and granting committees? What are some of the common forms of misuse?
Yes, indeed, and it is distressing. Implementing a simplistic, single-measure system for evaluation and grant-making (such as the h-index or average impact factors), which I have unfortunately seen far too often, undermines confidence in the utility and worth of bibliometric analysis and changes the behavior of scientists, who then pursue scores rather than focus on their research. This is corrosive to science. One way to guard against such misuse is to ensure that citation analysis is a supplement to peer review and not a substitute for it. People determine substance and quality, whereas bibliometric measures are proxies or indications that suggest but do not prove significance or value.
What do you perceive as challenges for bibliometrics in the ever-evolving scientific landscape?
One is the demand for measures of influence beyond academic impact, that is, beyond the walls of the university. Of course, there has long been interest in tracing the impact of basic and applied research on innovation. Clarivate Analytics has used its Derwent patent data to measure innovation for more than 50 years. One important area of study currently is the analysis of scholarly literature cited by the most cited and most valuable patents, which reveals important linkages between academia and industry. More and more, universities want to demonstrate their contribution to the engine of economic growth to justify generous public funding of research. With the rise of social media, there is a hope to gather new indicators of research impact, especially social and cultural benefits deriving from research activity within universities. Altmetrics is the term that is generally used to describe several different types of data and potential indicators, such as usage measures, recommendations or bookmarks, news items, blogs, tweets, and many others. As indicators of impact these are heterogeneous in their meaning and significance. Altmetrics is one of the most active themes in scientometric research, but much more study is required to understand different altmetrics indicators (their nature, meaning, and dynamics) and whether they are related in any way to research impact, even as more broadly defined. For those that give insight into impact, there will be a need to normalize the indicators for age and field or topic, something that is only beginning. There is no prospect at the moment that altmetrics will overtake traditional metrics. Some may supplement traditional metrics, but it is very early days still.
I was intrigued by one of your quotes in a recent press release: “Careful analysis of publication and citation data represents a data-driven approach to science policymaking and funding and can be a key strategy for addressing weaknesses and building on strengths.” Could you elaborate? How is citation data used for policymaking and/or funding decisions?
One of the greatest benefits of bibliometric analysis of the literature is its top-down approach. It is possible to summarize vast amounts of information and determine key features in the research landscape that might not otherwise be seen or appreciated because of the usual bottom-up view that derives from more limited personal knowledge and experience in the context of peer review. Second, the characteristic distribution of citations, which is highly skewed, allows one to focus on the largest or tallest structures in the landscape and to do so quickly and efficiently. Of course, what is prominent in one area may be small in another owing to differences in average citations from one field to another. And one must always make adjustments for the time dimension because older papers will have had longer to accumulate citations than younger ones. Therefore, relative or normalized measures are required. What citation analysis can show is positive evidence of research impact in a field or specialty and how that impact is related to that of others, whether researchers, institutions, or nations. This evidence will provide a better understanding of influence or impact in context. Since not everything can be funded, the logical approach is promoting or funding those who have produced research that has proven to be influential. That is not to say, however, that only those with a record of high research impact according to citation indicators are deserving of support. As has been noted by many, “absence of evidence is not evidence of absence,” so one must make room for policy and funding decisions based on knowledge and intuition apart from quantitative measures of past successful performance. And this is especially so with respect to support for early career researchers.
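The point about relative or normalized measures can be illustrated with a short Python sketch. It assumes one simple form of normalization, dividing a paper's citation count by the mean count for papers of the same field and publication year; this is not presented as the specific indicator Clarivate uses. The data, field names, and the function `normalized_impact` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def normalized_impact(papers):
    """Divide each paper's citations by the mean for its field and year.

    papers: list of dicts with keys "id", "field", "year", "citations".
    Returns a dict mapping paper id to its normalized score, where 1.0
    means "cited exactly as often as the field-and-year average".
    """
    groups = defaultdict(list)
    for paper in papers:
        groups[(paper["field"], paper["year"])].append(paper["citations"])
    baselines = {key: mean(values) for key, values in groups.items()}

    scores = {}
    for paper in papers:
        baseline = baselines[(paper["field"], paper["year"])]
        # Guard against a group in which no paper has been cited at all.
        scores[paper["id"]] = paper["citations"] / baseline if baseline else 0.0
    return scores


# Hypothetical data: a low-citation field and a high-citation field.
papers = [
    {"id": "m1", "field": "mathematics", "year": 2015, "citations": 4},
    {"id": "m2", "field": "mathematics", "year": 2015, "citations": 6},
    {"id": "b1", "field": "biology", "year": 2015, "citations": 40},
    {"id": "b2", "field": "biology", "year": 2015, "citations": 60},
]

scores = normalized_impact(papers)
print(scores["m2"], scores["b2"])  # 1.2 1.2
```

The raw counts differ tenfold (6 versus 60), yet both papers sit 20% above the average for their own field and year, which is the like-for-like comparison the answer calls for. Because citation distributions are highly skewed, a real analysis would likely also use percentile-based indicators rather than relying on means alone.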
This is only part of a comprehensive data-driven approach to strengthening research capacity. The lifecycle of research is greater than the publication of papers and the subsequent citations they attract. Before publication there is peer review. Scientists devote enormous effort to improve the research record before publication. A university whose researchers are engaged in this activity should be recognized, even rewarded as part of an overall strategy for an institution’s ongoing program to ensure excellence. This is why Clarivate recently acquired Publons, the leading global platform for researchers to share, discuss, and receive recognition for peer review and editing of academic research (http://news.clarivate.com/2017-06-01-Clarivate-Analytics-acquires-market-leader-Publons-creating-the-definitive-publisher-independent-platform-for-accelerating-research-through-peer-review). Capturing and measuring this dimension of research activity expands an institution’s data and supports its decision-making.
What have been some of the most innovative developments in scientometrics in recent times?
I’ve mentioned a few, such as altmetrics and context or sentiment analysis made possible through access to full-text data. The analysis of funding data has become possible through capture of this information when it is presented in papers. Clarivate Analytics has indexed funding sources since August 2008, so we now have nearly a decade of data. Connecting funding sources to published papers and then to their impact as revealed by citations is a new frontier, and certainly funders want to know more about the outcomes and impact of their funding decisions. The desire to accelerate innovation on the part of industry, universities, and government and private funders has fueled more and more studies on interdisciplinary research, its features, nature, and potential to drive discoveries. Defining interdisciplinarity is challenging and can be viewed in different ways, especially as traditional field boundaries have less and less meaning. Nonetheless, studying retrospectively and prospectively how important findings emerge when knowledge from different domains is combined is, in my view, a fertile area in scientometrics. And somewhat related to that has been the growth in science mapping thanks to increased computer speed, storage, and the availability of software created by several academic groups that now permit easy, do-it-yourself visualizations of many types.
Could you share your experience setting up Science Watch? Do you have any interesting anecdotes to share?
In 1989, Henry Small asked me to create a monthly newsletter that featured short articles about research performance and trends in science using our data. We combined traditional science journalism with our publication and citation indicators. We tried to highlight developments overlooked by the mainstream press and to detect and highlight emerging trends. In each issue, we also featured an interview with a highly cited researcher and published lists of the top 10 hottest papers in Medicine, Biology, Physics, and Chemistry, along with commentary by field experts. We define hot papers as reports that are two years old or younger and that rank in the top 0.1% by citations for their field and time of publication. If you page through the old issues of Science Watch you will find interviews with many scientists who went on to receive a Nobel Prize and other top international awards. I think you would also say that our feature stories and hot papers lists provided a good summary of then-current science findings and trends. Ultimately, due to a change in editorial direction, the original ScienceWatch site was discontinued in 2015. Please note that archived content is still available, and the thread of citation-based analysis is still prominent in various online materials from Clarivate, including, for example, white papers and reports that examine national and regional research or focus on specific areas of inquiry.
Today, the scholarly publishing industry is rapidly undergoing a digital transformation. Data can now be stored in multiple formats on multiple platforms by multiple people. Does this accessibility complicate information retrieval? How can bibliometrics help information retrieval in the complex digital journal publishing space?
The progression from print to digital media is most welcome. It has and will continue to revolutionize information distribution, use, and analysis. I like to hold a book or journal in my hands and find printed material easier to read than from a monitor, but that is perhaps the only superior feature of print that comes to mind. Of course, there are adaptations needed to take advantage of the possibilities offered through the digital transformation. Digital Object Identifiers (DOIs) are essential, as are unique author and institutional identifiers, which are being increasingly adopted. Unique author identifiers such as ResearcherID or ORCID, when near universal, will greatly aid scientometric analysis by addressing the author name disambiguation problem. Full-text versions of papers are more readily available thanks to open access publishing. This permits the analysis of citing sentences that can reveal the context and sentiment of the citing occurrence. Differentiation of the “quality” of citations has been discussed for decades but is now technically possible on a large scale. When I say “quality” I mean discerning whether the reference is positive or supporting, negative or critical, or simply neutral. To accelerate this development, Clarivate Analytics recently announced a grant in support of ImpactStory’s oaDOI service, which delivers open-access full-text versions of publications over a free, fast, open API (http://news.clarivate.com/2017-06-23-Clarivate-Analytics-announces-landmark-partnership-with-Impactstory-to-make-open-access-content-easier-for-researchers-to-use). I suppose I should mention “big data” analysis, but this is a term understood differently by different people and is somewhat hyped. Certainly, however, full text of papers as well as their associated data sets will be mined to extract all sorts of new associations and connections. This mining is not limited to text but can draw on citations too. This is already happening.
All this sounds exciting. We have a slightly personal question for you. Eugene Garfield was one of the pioneers of scientometrics and you have worked closely with him for several years. Would you tell us about your experience?
It was a privilege and real honor to be an associate of his for more than 30 years. He was a mentor and a friend. Many think of him as a businessman or entrepreneur who designed and sold database products such as the Web of Science and Current Contents. But I consider him first and foremost a researcher who loved nothing more than analyzing and understanding the data he was harvesting to create these products. His scholarly contributions – not merely his creation of a citation index for the sciences – made him, along with Derek de Solla Price, a founding father of scientometrics. Oh, and did I mention he was a genius? He was that, of course, but he was also generous and kind-hearted. I do miss him.
Recently, Clarivate Analytics announced the Eugene Garfield Award for Innovation in Citation Analysis. Could you tell us more about this award?
Soon after Gene passed away in late February this year, Clarivate Analytics decided to establish an award in his honor and memory. Several of us involved in fashioning the award chose citation analysis as the core theme of his life’s work because the cited reference was the focal object around which the Science Citation Index was organized and which he studied in one form or another throughout seven decades. The award will support research projects that involve citation analysis but are not limited to studies of research performance alone; they may include analysis of the structure of science, science mapping, monitoring of trends, as well as the role of citations in information retrieval, which was Gene’s first interest. The initial award will be announced during a celebration of Gene’s life that will be held in Philadelphia on September 15-16 this year. There is a US$25,000 purse that goes with the award as well as access to Web of Science data to support a research project. We are asking for applications from early career researchers, those who received their Ph.D. no more than 10 years ago. The application form is brief and can be found at http://clarivate.com/eugene-garfield-award/. We will consider all applications received by July 21st.
Thank you, David, for this extremely insightful interview.