When I was 19, I took a trip to the United States. I travelled across the country by bus, and was able to do a lot of reading as I moved from state to state. One of the books I read was The Effective Executive by Peter Drucker. A key message of the book is “know your business.” For example, if you think you’re in the oil business, you’re wrong. You’re actually in the energy business. If you’re in the postal business, you’re really in communications, and you’d better understand the impact of e-mail!
Assertions journals make about their content
What “business” is the scholarly publishing industry in? The content business? No. What business are journals in? I would say that journals are in the “assertions” business. Here are some of the assertions that journals make: This content is worth reading because it is novel and because it is original; These are the authors; These are the institutions; The content has been through peer review; The statistics support the conclusions, etc. If you think about the work that a journal does, it’s really collecting a group of assertions about the content. It’s the assertions that make the content valuable.
Historically, assertions have been implied by journal format. A journal didn’t need to explicitly say “this is the title,” “these are the authors,” “this is the abstract.” Instead, all of this was woven into the format that a journal used. Using format to communicate assertions has worked well for the past 350 years, dating all the way back to Henry Oldenburg’s Philosophical Transactions.
The need to validate assertions
But the assertion business is changing, and fast! To begin with, it’s increasingly easy to make false or inaccurate assertions. If format is all a journal is using to communicate assertions, anyone else can replicate that format, so a familiar layout no longer shows that the assertions behind it are sound. What about the quality of assertions made? If you happen to make an unsubstantiated assertion these days, there’s a whole industry, including outlets like Retraction Watch, exposing these kinds of failures. Format is no longer a guarantee of assertion quality.
Research funders are spending $1.6 trillion a year (about $50,000 a second), so understandably they want better tools to evaluate research output. They want higher quality, accurate, granular assertions. Those assertions must be machine readable, not just human readable. Return on research investment can be proven if assertions are more reliable.
From a journal perspective, is this an opportunity or a threat?
Let’s not think about content workflow. Rather let’s think about an assertion workflow. An author makes a number of assertions about what they submit to a journal: I’m an author. These are my co-authors. Here’s who funded the paper. Here are the methods I used. The data supports the results, etc. One of the things that the journal does is evaluate those assertions for accuracy. Then, the journal makes its own assertions: This work has been through peer review. This is original. Next, the journal boils the assertions down to published content. This is where it’s important to understand that the process of creating assertions is not the same as the process of creating format.
Tools journals use to validate assertions
Semantic tagging: This means taking a look at the process or, from a technical perspective, at the workflow infrastructure. How assertions are tagged can be critical to their usefulness in workflow. Take the word Brown, for example. Is this word a university? A name? A color? A street address? Let’s start with simple tagging. For example, <b> Brown </b> tells software applications that Brown should be displayed in bold. Still, that is format, which is not a reliable way to communicate assertions, because it doesn’t say anything about what Brown is.
Using semantic tagging, such as <author> Brown </author>, it is now clear what Brown is: an Author. However, because different publications might use different semantic tags to describe an author (e.g. Contributor, Author, or Article Author) there is still some ambiguity. What’s needed is an agreed way of tagging the author. This is where Document Type Definitions (DTDs) come in.
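The difference between a format tag and a semantic tag can be sketched in a few lines of Python. This is only an illustration, not any publisher's actual pipeline: the function name `find_authors` and the tag variants `author`, `contributor`, and `article-author` are hypothetical stand-ins for what different publications might use.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names that different publications might use for the same role.
AUTHOR_TAG_VARIANTS = {"author", "contributor", "article-author"}


def find_authors(xml_text: str) -> list:
    """Return the text of every element whose tag says 'this is an author'."""
    root = ET.fromstring(xml_text)
    return [
        element.text.strip()
        for element in root.iter()
        if element.tag in AUTHOR_TAG_VARIANTS and element.text
    ]


# A format tag says how Brown looks, so nothing can be learned about what Brown is.
print(find_authors("<p><b> Brown </b></p>"))
# A semantic tag says what Brown is.
print(find_authors("<p><author> Brown </author></p>"))
# A different publication's tag only works because it is listed in the variants.
print(find_authors("<p><contributor> Brown </contributor></p>"))
```

Having to maintain a list of variants is exactly the ambiguity described above: every consumer of the XML would need its own list unless publishers agree on one tag set.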
Formatting: The Journal Article Tag Suite (JATS) is one such DTD: an agreed way of tagging scholarly journal data. The software here knows that Brown is an author from reading the tags around the name. A style sheet can then be used to apply formatting ‘rules’, for example that anything tagged as an author should be represented in bold. Thus the author’s name is bolded and positioned in the correct area of the manuscript.
Formatting is nothing but presentation. And when there’s talk of formatting, XML cannot be far behind. It may be tempting to think of XML as just a ‘fancy’ and convenient way to do formatting. But it’s more than that. Because the tags record what each piece of content is rather than how it looks, the presentation can change without touching the content. Journals can change the styling ruleset to say, for example, that all author names should be shown in blue, and from that point on, that will happen.
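A small sketch of that separation follows, assuming a simplified JATS-style fragment in which the author is tagged with `contrib`, `name`, and `surname`. The function `render_authors` and the two style rules are hypothetical; a real journal would more likely use XSLT or CSS against complete JATS files.

```python
import xml.etree.ElementTree as ET

# Simplified JATS-style fragment: the tags say Brown is an author, not how to draw it.
JATS_FRAGMENT = """
<contrib-group>
  <contrib contrib-type="author">
    <name><surname>Brown</surname></name>
  </contrib>
</contrib-group>
"""

# Hypothetical styling rules; swapping one for the other leaves the XML untouched.
BOLD_RULE = "font-weight: bold"
BLUE_RULE = "color: blue"


def render_authors(xml_text: str, style_rule: str) -> list:
    """Apply one presentation rule to everything tagged as an author."""
    root = ET.fromstring(xml_text)
    rendered = []
    for contrib in root.iter("contrib"):
        if contrib.get("contrib-type") != "author":
            continue
        surname = contrib.findtext("name/surname", default="").strip()
        rendered.append(f'<span style="{style_rule}">{surname}</span>')
    return rendered


print(render_authors(JATS_FRAGMENT, BOLD_RULE))
print(render_authors(JATS_FRAGMENT, BLUE_RULE))
```

The same source fragment produces both outputs; only the rule passed in changes.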
Persistent identifiers: Still, to get the maximum value out of XML, persistent identifiers need to be used. Take ORCID and the author named Brown. An ORCID iD reliably tells the software which specific Brown the journal is referring to. Using API integrations, a journal can validate that it’s not just anyone asserting which Brown we’re talking about, but that an authoritative source (i.e. ORCID) stands behind the assertion. If an unsophisticated user manually types an ORCID iD into manuscript XML, that achieves nothing beyond adding a bit of text to the XML output, because the text-entered iD hasn’t been validated anywhere. The ‘right’ way to add an ORCID iD to XML is through an API call to the ORCID database. This empowers users themselves to validate their ORCID iD. Outputting a validated iD in XML is a reliable assertion that flows through the workflow.
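The part of this that can be shown without network access is the format check. ORCID iDs end in a check digit computed with the ISO 7064 MOD 11-2 algorithm, so software can at least reject a mistyped iD. Passing the check does not prove that the iD belongs to Brown; that still requires the author to authenticate with ORCID, which is not shown here. In the sketch below the function names are hypothetical, the iD is ORCID's published sample iD, and the `authenticated` attribute on `contrib-id` reflects my understanding of the JATS4R recommendation and should be checked against current JATS documentation.

```python
import xml.etree.ElementTree as ET


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Catch typos only; this says nothing about who owns the iD."""
    compact = orcid.replace("-", "")
    base, check = compact[:15], compact[15:]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == check.upper()


def contrib_id_xml(orcid: str, authenticated: bool) -> str:
    """Emit a contrib-id element that records whether the iD was authenticated."""
    if not is_well_formed_orcid(orcid):
        raise ValueError(f"not a well-formed ORCID iD: {orcid}")
    element = ET.Element(
        "contrib-id",
        {"contrib-id-type": "orcid",
         "authenticated": "true" if authenticated else "false"},
    )
    element.text = f"https://orcid.org/{orcid}"
    return ET.tostring(element, encoding="unicode")


# Typed in by hand: well formed, but nobody has vouched for it.
print(contrib_id_xml("0000-0002-1825-0097", authenticated=False))
# Returned by an authenticated ORCID sign-in (assumed to have happened elsewhere).
print(contrib_id_xml("0000-0002-1825-0097", authenticated=True))
```

Carrying the `authenticated` flag in the output is what lets downstream tools tell a validated assertion from a bit of typed text.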
This same approach can be taken to add more reliable assertions to workflow, for example by identifying an institution with which an author is associated. This is possible if the author and/or the institution has validated the author’s affiliation with, say, “Northeastern University” in China. This validation is possible through Ringgold institutional identifiers. Now the assertion that author Brown belongs to Northeastern University in China is validated and can persist in workflow. This type of assertion is valuable to funders since it helps them track their investment. But we can go further. Now we know who did the research and which institution(s) they belong to, but we do not know what they did. Here CRediT roles can assert the contribution, and the degree of contribution, in workflow. Open Funder Registry identifiers can confirm who funded the research, citations can be asserted using DOI linking, and so on. This logic of interconnected assertions can be applied to many aspects of journal workflow.
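One way to picture these interconnected assertions is as a list of records, each carrying the identifier scheme and the party that validated it. The sketch below is an assumption about how such records might be modelled, not a description of any real system: the `Assertion` class, its field names, the validator labels, and the bracketed placeholder identifiers are all hypothetical. "Formal analysis" is one of the published CRediT roles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    subject: str                        # who or what the assertion is about
    claim: str                          # what is being asserted
    value: str                          # human-readable value
    id_scheme: Optional[str] = None     # e.g. ORCID, Ringgold, CRediT
    identifier: Optional[str] = None    # persistent identifier (placeholders here)
    validated_by: Optional[str] = None  # who vouched for it, if anyone

    @property
    def is_validated(self) -> bool:
        """An assertion counts as validated only with an identifier and a validator."""
        return self.identifier is not None and self.validated_by is not None


def validated_only(assertions: list) -> list:
    """Keep the assertions that can safely persist through the workflow."""
    return [assertion for assertion in assertions if assertion.is_validated]


paper_assertions = [
    Assertion("Brown", "is the author", "Brown",
              "ORCID", "<orcid-id>", "author via ORCID"),
    Assertion("Brown", "is affiliated with", "Northeastern University",
              "Ringgold", "<ringgold-id>", "institution"),
    Assertion("Brown", "contributed as", "Formal analysis",
              "CRediT", "<credit-role-id>", "journal"),
    # Typed in by hand: no identifier, nobody has validated it.
    Assertion("paper", "was funded by", "<funder name>"),
]

for assertion in validated_only(paper_assertions):
    print(assertion.subject, assertion.claim, assertion.value, f"[{assertion.id_scheme}]")
```

The hand-typed funder line is dropped until it is tied to an identifier and a validator, for instance an Open Funder Registry entry.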
Tools to aid peer review: Speaking of journal workflow, let’s look at peer review. In peer review systems such as Editorial Manager, there are tools to support making and validating additional journal assertions: Has this content been plagiarized? Is this content novel? (Similarity Check, Meta) Are the citations accurate? (Reference linking) Have conflicts of interest been disclosed? Are the statistics accurate? (StatReviewer)
Once created, these assertions can be passed to other tools for other purposes such as searching for reviewers, acknowledging reviewer activity, and estimating and processing APCs. In other words, once you have valuable assertions, your content can “talk”!
In a world where “content wants to be free,” assertions represent an opportunity for publishers to add value and generate revenue. The growth of Open Access and Open Science underscores this point and accelerates the need for journals to think beyond “selling content.” This requires transitioning from content-based workflows to assertion-based workflows. The journal’s peer review system is an assertion management system that helps publishers grow and manage assertions via integration with persistent identifiers and taxonomies. Publisher marketing departments will need to focus their messages not just on offering “the best content” but on having “the best assertions.”
It is more important than ever for publishers to understand the beating heart of their business. The lessons from Peter Drucker’s book that I read while on the road in 1981 are more relevant than ever!
This article is based on a presentation from the Editorial Manager User Group Meeting 2016, which is also available as a video.