The research archive ArXiv will ban authors for a year if they let AI do all the work

ArXiv, a widely used open repository for preprint research, is doing more to crack down on the careless use of large language models in scholarly papers.

Although papers are posted to the site before they are peer-reviewed, arXiv (pronounced “archive”) has become one of the main ways in which research circulates in fields such as computer science and mathematics, and the site itself has become a source of data on trends in scientific research.

ArXiv has already taken steps to combat a growing number of low-quality, AI-generated papers, for example by requiring first-time posters to get an endorsement from an established author. And after being hosted by Cornell for more than 20 years, the organization is becoming an independent nonprofit, which should allow it to raise more money to address issues like AI slop.

In arXiv’s latest move, Thomas Dietterich, the chair of its computer science section, wrote Thursday that “if a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, that means we can’t trust anything in the paper.”

That incontrovertible evidence could include things like “hallucinated references” and comments to or from an LLM, Dietterich said. If such evidence is found, a paper’s authors will face “a one-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted by a reputable peer-reviewed site.”

Note that this is not an outright ban on using LLMs, but rather an insistence that, as Dietterich put it, authors take “full responsibility” for the content, “regardless of how the content is generated.” So if researchers copy and paste “inappropriate language, plagiarized content, biased content, mistakes, errors, incorrect references or misleading content” directly from an LLM, then they are still responsible for it.

Dietterich told 404 Media that this will be a “one-strike” rule, but moderators must flag the issue and section chairs must confirm the evidence before the penalty is imposed. Authors will also be able to appeal the decision.

Recent peer-reviewed research has found that fabricated citations are on the rise in biomedical research, likely due to LLMs, though to be fair, scientists aren’t the only ones who have been caught using AI-generated citations.
