‘Papermill Alarm’ software flags potentially fake papers

September 23, 2022

The Papermill Alarm searches for similarities to text found in fake papers. Credit: Raimund Koch/Getty

A software tool that analyzes the titles and abstracts of scientific articles and detects texts similar to those found in fake articles is attracting interest from publishers.

The tool, called Papermill Alarm, was developed by Adam Day, director of scholarly data services firm Clear Skies in London, UK. Day says he ran all the titles listed in the PubMed citation database through the system and found that 1% of the articles currently listed contain text very similar to that of articles produced by paper mills – companies or individuals who fabricate scientific manuscripts to order. The Papermill Alarm does not say for certain whether an article is fabricated, but flags those that deserve further investigation.

Day says his analysis is not intended to estimate the prevalence of paper-mill articles among PubMed entries, because the tool can only recognize papers similar to those from known paper mills. Many other paper mills could exist, and legitimate papers could also be flagged for having similar wording, he says. “It’s like a fishing net. It’s not a fishing rod.”

Anna Abalkina, an economist at the Free University of Berlin who studies paper mills, says the scientific community will benefit from automated checks that can detect potentially fake papers.

Suspicious submissions

Many publishers already use software and other methods to help detect fraudulent activity and spot unwanted papers. Some manuscript processing systems can detect and report if many submissions come from the same computer, for example – a sign that a person or organization might be producing a large number of studies. But Day says his approach to text analysis is novel. Six publishers, including SAGE in Thousand Oaks, Calif., where Day works as a data scientist, have expressed interest in using Papermill Alarm to screen submitted manuscripts.

The tool uses a deep learning algorithm to compare the language used in manuscript titles and abstracts with that used in articles known to come from paper mills. The comparison is based on lists of paper-mill articles compiled by research integrity sleuths, including Elisabeth Bik and David Bimler (also known as Smut Clyde). The tool uses a traffic light system, assigning red flags to articles with many similarities to known paper-mill articles, orange flags to those with some similarities, and green flags to those with none.
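To make the traffic light idea concrete, here is a minimal sketch in Python. It is not Day's method: the article says only that Papermill Alarm uses a deep learning algorithm, so this example swaps in a plain bag-of-words cosine similarity as a stand-in. The function names and both thresholds are hypothetical, chosen purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical cut-offs: the article does not say how the real tool
# decides between red, orange and green.
RED_THRESHOLD = 0.6
ORANGE_THRESHOLD = 0.3


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag(text, known_mill_texts):
    """Return 'red', 'orange' or 'green' based on the closest known match.

    Stand-in for the real classifier: it looks only at the single most
    similar entry in the list of known paper-mill texts.
    """
    best = max(
        (cosine_similarity(text, known) for known in known_mill_texts),
        default=0.0,
    )
    if best >= RED_THRESHOLD:
        return "red"
    if best >= ORANGE_THRESHOLD:
        return "orange"
    return "green"
```

In this sketch, a title or abstract that closely matches an entry in the known list comes back red, a partial match comes back orange, and anything below the lower threshold comes back green. As with the real tool, a red flag here would mean only that the wording resembles known paper-mill text, not that the article is fake.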

Until now, there have been few estimates of the prevalence of paper-mill articles. A June report from the Committee on Publication Ethics in Eastleigh, UK, suggested that 2% of papers submitted to journals come from paper mills and said the problem “threatens to overwhelm the editorial processes of a significant number of journals”.

Even Day’s conclusion that 1% of published PubMed articles come from paper mills is “too high for comfort,” says Bimler. “These unwanted papers are cited. People use them to buttress their own bad ideas and support dead-end research programs,” he adds.

Bik says the actual number of paper-mill articles listed in PubMed could be even higher, but points out that their impact on science as a whole is likely low, as most of these articles are not highly cited or influential. “But it hurts the reputation of science and the trust we place in research papers,” she says.