ICML Rejects 2% of Papers for Using LLMs in Peer Reviews

The International Conference on Machine Learning (ICML) has taken an unprecedented enforcement step, desk-rejecting roughly 2% of submitted papers after discovering that their authors had used large language models to write or augment the peer reviews they were assigned. This marks the first public, quantified action on this scale from a top-tier AI conference, and it directly enforces the conference's ban on LLM-assisted reviewing. The move reveals the operational challenges and ethical fault lines emerging as generative AI tools become ubiquitous in the academic workflow, forcing a reckoning on what constitutes authentic scholarly contribution.

ICML, one of the field's most prestigious venues, has disclosed that it rejected approximately 2% of submitted papers without sending them for full peer review. The sole reason: the authors used large language models like ChatGPT or Claude to write or substantively edit their assigned reviews of other papers. That conduct directly violated ICML's explicit policy, which states that "the use of LLMs to write or generate reviews is strictly prohibited."

What Happened: Policy Meets Enforcement

In a March 18 blog post titled "On Violations of LLM Review Policies," the ICML 2026 program chairs confirmed the desk rejections. The conference employs a combination of automated screening tools and manual verification by senior area chairs to detect policy violations. The post did not specify the exact number of papers, but a 2% rate applied to ICML's typical submission volume of over 6,000 papers implies that at least 120 submissions were rejected at the desk for this specific infraction.
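The back-of-the-envelope estimate above can be reproduced with a minimal Python sketch. The function name is hypothetical, the 2% rate and the 6,000-paper figure come from this article rather than from an official count, and the two larger volumes are purely illustrative assumptions included to show how the estimate scales.

```python
def estimate_desk_rejections(submissions: int, rate: float = 0.02) -> int:
    """Estimate desk rejections as rate * submissions, rounded to the nearest paper.

    Both inputs are assumptions: the article reports an approximate 2% rate
    and a submission volume of "over 6,000", not exact figures.
    """
    if submissions < 0:
        raise ValueError("submissions must be non-negative")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return round(submissions * rate)


if __name__ == "__main__":
    # 6,000 is the article's lower bound; 8,000 and 10,000 are illustrative only.
    for volume in (6_000, 8_000, 10_000):
        print(f"{volume:>6} submissions -> ~{estimate_desk_rejections(volume)} desk rejections")
```

Because the article gives the volume only as a lower bound, the result for 6,000 submissions is best read as a floor rather than a point estimate.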

The policy itself is not new; ICML and other conferences like NeurIPS and ICLR have had similar guidelines in place for the past two years. However, this is the first major public report of a top conference imposing sanctions on this scale. The blog post emphasizes that the policy exists to preserve the core values of peer review: human expertise, accountability, and confidential judgment. Using an LLM, the chairs argue, delegates these responsibilities to an opaque system and risks introducing bias, factual errors, or generic feedback that does not serve the scholarly community.

Why This Matters for AI and Academia

This enforcement action is a bellwether for academic publishing. It moves the debate about AI use from theoretical guidelines to concrete consequences, setting a precedent other venues will likely follow. The stakes are high: a desk rejection at a major conference can delay research dissemination, impact a researcher's publication record, and affect funding or career progression.

The incident highlights a fundamental tension. Researchers are encouraged to use AI as a tool in their research and writing, yet are prohibited from using it in the reciprocal service of evaluation. Conferences must now walk a fine line, fostering innovation while safeguarding processes they deem essential. This also places a new burden on program committees to develop reliable detection methods, an arms race that parallels concerns in education about AI-assisted cheating. The integrity of the entire peer-review system, already strained by volume, now faces a novel technological stress test.

The People and Competitive Context

The decision was enacted by the ICML 2026 program chairs, who operate under the guidance of the International Machine Learning Society. Their action reflects a growing consensus among senior researchers and society leadership that certain uses of AI corrode scholarly norms. Notably, this policy is largely championed by established academics and conference organizers, while some early-career researchers and those under high publication pressure may view LLMs as a productivity tool for managing relentless review requests.

The competitive context is crucial. Publications at conferences like ICML are the primary currency of career advancement in AI research. The pressure to publish is immense, and the review load is often overwhelming. The temptation to use an LLM to quickly summarize a paper or draft review comments is understandable, yet the policy frames it as an unacceptable shortcut that undermines the communal contract of peer review. This places the onus both on institutions to manage reviewer workload and on individuals to uphold ethical standards despite that pressure.

What Happens Next

Expect immediate ripple effects. Other top-tier AI conferences will likely point to ICML's action to justify and strengthen their own enforcement. The development of more sophisticated AI-text detection tools tailored for academic review will accelerate, though their accuracy and ethical use will be hotly debated. A key next signal to watch is whether any authors contest the rejections or call for more nuanced policies, such as allowing LLMs for grammar correction or formatting while prohibiting substantive generation.

Longer term, this may force a broader conversation about the structure of academic peer review itself. If the system is so overburdened that researchers seek AI assistance to participate, perhaps the model needs reform. Solutions could include formal recognition for reviewing, better incentives, or a shift toward delegated or staggered review processes. For now, ICML has drawn a bright line, making it clear that in the evaluation of research, the human must remain in the loop.