Video: AI Is Not a Replacement for Human Peer Reviewers, with Serge Horbach
AI is only an assistant, not a replacement! If you tend to expect AI to do the work for you rather than using it as an enhancement tool, you are in for a shock. In this video, Dr. Serge Horbach (Assistant Professor, Radboud University) talks about the good that AI brings to peer review, but also warns against its misuse.
Q: Where do you see the greatest potential for AI to support peer reviewers today?
A: Well, I must say that I actually think the greatest potential of AI in the editorial peer review process is currently not to support peer reviewers, but much more to conduct checks that are usually done by the editorial team. So, I think of checks related to research integrity, such as image duplication scanning or reference checking. Those kinds of research integrity checks are, I think, something where AI can very meaningfully provide support today. They might also include checks regarding the scope of the journal, that is, whether incoming manuscripts align with the journal’s scope. That, I think, is a clear and obvious use case for current AI tools to support the editorial process.
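To make that scope check concrete, here is a minimal, purely illustrative sketch of its simplest possible form. Everything in it is invented for illustration: real editorial tools would use trained classifiers or embedding similarity rather than the naive keyword overlap, threshold, and example texts shown here.

```python
# Purely hypothetical sketch: a naive keyword-overlap triage for journal scope.
# Real tools would use trained classifiers or embeddings; the scope text,
# abstract, and 0.15 threshold below are all invented for illustration.

import re

def keyword_set(text: str) -> set[str]:
    """Lowercase the text and keep alphabetic words longer than 3 characters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

def scope_overlap(abstract: str, journal_scope: str) -> float:
    """Jaccard overlap between the abstract's and the scope's keyword sets."""
    a, b = keyword_set(abstract), keyword_set(journal_scope)
    return len(a & b) / len(a | b) if a | b else 0.0

journal_scope = "Studies of scholarly communication, peer review, and research integrity."
abstract = ("We analyze how peer review practices shape research integrity "
            "in scholarly publishing.")

score = scope_overlap(abstract, journal_scope)
# A low score only flags the manuscript for a human desk check;
# the tool should never desk-reject on its own.
verdict = "flag for editor" if score < 0.15 else "likely in scope"
print(f"Scope overlap: {score:.2f} -> {verdict}")
```

Note that even this toy version embodies a concern raised later in the interview: whoever sets the threshold is quietly setting a quality standard.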
And then when you think of reviewers, I think the main potential currently is really to improve review reports, in the sense of making them perhaps more constructive or more helpful to the authors, rather than actually writing them from scratch or identifying the strengths and weaknesses in manuscripts. So, it’s more about editing peer review reports to make them more helpful.
Q: What are the limits of AI when it comes to evaluating manuscripts? Can it handle nuance, ethics, or context?
A: I think the only truly honest answer is that we don’t really know whether it can handle all these aspects in a proper way. Some research has been done, but it’s really unclear, I would say. And the technologies are obviously developing very quickly, so what they might not be able to do now, who knows what’s just around the corner and going to happen very soon. But in general, we know that these tools tend to be somewhat more positive than human reviewers. Of course, you can instruct them to be very critical and give negative reviews, but in general they tend to come up with fairly positive evaluations, which again points to this limit: maybe they’re not so suitable at the moment for identifying strengths and weaknesses, at least not as a basis for editorial decisions or recommendations. That is a clear limitation currently. Instead, they’re probably better suited to providing feedback or suggestions to improve the manuscript than to acting as a gatekeeping element in the process.
And something that we should be very aware of, not necessarily as a limit of AI but as a consequence of using these tools in the editorial or peer review process, is that they come to set quality standards. We’ve seen that very clearly with text duplication or plagiarism detection software, which in the end comes to define what originality really means. What makes us think that something is original, new scholarly writing? It is when the plagiarism detection software says that duplication is below this or that threshold. And the same will happen, if they are used uncritically, when we implement AI technologies, which then come to define quality standards in ways that are maybe not always so obvious. That is, again, not an inherent limitation of the technology, but it is a very serious consequence of implementing the technology in our systems.
Q: In your experience, what aspects of peer review are uniquely human and irreplaceable?
A: Well, we’ve touched on that a bit. I think it’s mainly this aspect of making value judgments: deciding what really matters, deciding what really is high-quality research, recognizing the beauty or the excellence of something. That is something AI technology inherently cannot do. It ultimately, fundamentally, just works from an input set of criteria or mimics past standards present in its training data. But independently identifying the core quality of a piece of work under review, that, I think, remains uniquely human.
Q: Should peer reviewers be trained to work alongside AI tools? What kind of literacy do they need?
A: Well, sure, if we get to the point where we ask reviewers to use AI tools in their work, then surely they would need training to do that properly; the efficacy of such tools greatly depends on the way in which they are used, and that is not self-evident. The output you get from these tools, and how to interpret it, is not self-evident in itself. Let’s again take the example of plagiarism detection tools, where it seems so obvious from the outside: you run the tool on a piece of text and you get a score for how much overlap there is with the database against which it was checked. But that percentage alone, as everyone who has worked in editorial roles knows, doesn’t say so much. It really requires you to interpret what that percentage means in the specific context of the manuscript that you’ve just run through the system. And the same will apply to using AI technologies for any task in peer review. The way in which we should interpret their output requires an understanding of how the tools work, of their inner workings. So yes, we would have to train reviewers to do that.
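To illustrate why a raw percentage “doesn’t say so much,” here is a small, purely hypothetical sketch. The match categories, numbers, and the idea of discounting “benign” overlap are invented for this example and do not mirror any real tool’s report format; the point is only that two manuscripts with identical raw scores can call for very different editorial responses.

```python
# Purely hypothetical sketch: the same raw similarity score can mean very
# different things. The category names, numbers, and 'benign' discounting
# below are invented and do not reflect any real tool's output.

def interpret_similarity(total_pct: float, matches: dict[str, float]) -> str:
    """Discount overlap that editors routinely treat as unproblematic
    (reference lists, quoted material, standard methods phrasing)."""
    benign = {"references", "quotations", "methods_boilerplate"}
    substantive = total_pct - sum(
        pct for section, pct in matches.items() if section in benign
    )
    if substantive <= 0:
        return f"{total_pct:.0f}% overlap, none of it substantive: no concern."
    return (f"{total_pct:.0f}% overlap, of which {substantive:.0f}% is "
            f"substantive: needs a human look.")

# Two manuscripts with the SAME raw score but very different meanings:
print(interpret_similarity(25.0, {"references": 15.0, "quotations": 10.0}))
print(interpret_similarity(25.0, {"results": 20.0, "references": 5.0}))
```

Both calls report 25% overlap, yet one is entirely references and quotations while the other overlaps in the results section; the interpretation, as Dr. Horbach notes, has to come from a human who knows the manuscript.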
But again, one should ask whether the reviewer’s workflow is the optimal place to implement these tools. Currently, I think other options would be to make this part of the editorial workflow: the integrity checks, the check for fit with the journal’s scope, and so on. Or even on the author side of the process, where authors could run their manuscript through an AI tool to get suggestions for improvement and do a bit of a pre-review check, to iron out up front any issues the tool flags. And for any actor using the tool in any way, that would require a brief or more elaborate training to do it properly. Then some have suggested a more elaborate use of AI tools in what they’ve called human-assisted peer review by AI, where things are really turned around: not the AI supporting human reviewers, but rather keeping the human in the loop to check the output of AI tools. That would surely require some training, but more fundamentally, I think that is something to be wary of.
Clearly these tools hold potential in terms of efficiency, maybe even quality. But we know from research in other areas, especially medicine, that there can be issues with implementations where humans check AI output. First, they might not really lead to efficiency gains: in the end, humans may still have to do the full review, the actual intellectual labor of reviewing the manuscript, in order to properly review the AI’s output about it. But also, more fundamentally, humans are apparently just not so good at spotting the mistakes that these AI tools make. That has been fairly thoroughly researched in the medical context, where such AI tools are used to interpret X-rays, MRI scans, or other scans of the human body, a task that was, of course, initially always done by humans. These AI tools tend to be fairly good at it, but obviously they sometimes make mistakes. And in experiments where humans were tasked to review the output of an AI tool and spot the mistakes, it turns out they’re just not so good at that; they even miss mistakes that they would never make themselves. In control groups where humans interpreted the same X-rays or MRI scans directly, they made far fewer mistakes than the AI tool did, even with human review of the AI output in place. So, for these kinds of implementations in editorial peer review too, I think we should be somewhat wary and acknowledge the limited capabilities of AI tools and humans alike.
Want to know if your paper is ready for peer review? Get your manuscript evaluated by expert reviewers using our Pre-Submission Peer Review Service.