In January 2026, an investigation by GPTZero examined 4,841 papers accepted at NeurIPS 2025, the premier conference in artificial intelligence research. The findings stunned the academic community: at least 100 confirmed hallucinated citations across 53 papers slipped past three to five expert reviewers per submission. Around the same time, the University of Hong Kong reported a peer-reviewed paper that contained 20 fabricated AI references out of 61 total. These were not random errors. They were AI-generated phantom citations that looked legitimate, carried proper formatting, and named plausible authors who had never written the papers being cited.
If even peer reviewers at the world’s most rigorous conferences cannot catch AI-fabricated content, the implications for ordinary researchers using ChatGPT to prepare their papers are serious. AI tools can generate text that reads well, but reading well is not the same as meeting the standards of academic peer review. Using AI output without human verification puts your reputation, your paper, and sometimes your career at risk.
This guide explains exactly why ChatGPT cannot pass peer review, what specific failures journal editors and reviewers now look for, what current journal policies say about AI use, and why a trained human editor remains essential for academic work. The evidence comes from peer-reviewed studies and the policies of major academic publishers, current as of early 2026.
The Core Problem: ChatGPT Predicts, It Does Not Verify
The fundamental issue with ChatGPT and similar large language models is that they generate text by predicting what a plausible response would look like. Unlike a search engine, a model that is not connected to a retrieval tool does not search databases, verify facts, or check whether the sources it cites actually exist. When ChatGPT generates a citation, it produces text shaped like a citation, with no guarantee that a real source stands behind it.
For instance, a ChatGPT-generated reference can include:
- A plausible author name in a plausible field
- A title that sounds like a real paper
- A journal name that actually exists
- A volume, issue, and page range that follows realistic patterns
- A DOI formatted like a real DOI
Every element looks correct. None of it has to be real. The model also cannot reliably tell you which references it made up and which are accurate, because it does not distinguish between the two when generating them.
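To make the point about well-formed DOIs concrete, here is a minimal sketch of a shape check. The regular expression is an assumption: it is a commonly used pattern for modern DOIs, not an official definition of DOI syntax, and the sample DOI is invented for illustration. The check only confirms that a string is formatted like a DOI, which is exactly what a fabricated reference can satisfy.

```python
import re

# Assumption: a widely used pattern for modern DOIs (the prefix "10.", a
# 4-9 digit registrant code, a slash, then a suffix). It checks shape only.
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/[-._;()/:a-z0-9]+$", re.IGNORECASE)


def looks_like_doi(text: str) -> bool:
    """Return True if the string is formatted like a DOI.

    This says nothing about whether the DOI is registered or resolves.
    """
    return DOI_SHAPE.match(text.strip()) is not None


# Invented placeholder, not taken from any real reference.
made_up = "10.1234/example.2024.0042"

print(looks_like_doi(made_up))        # the shape check passes
print(looks_like_doi("see page 12"))  # the shape check fails
```

A passing shape check is therefore the start of verification, not the end of it. The only meaningful test is whether the identifier leads to a real record whose details match the citation.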
According to recent corpus studies, even the most advanced models (GPT-4o and Claude 3.7) still exhibit 15 to 20 percent hallucination rates on factual citation tasks. For niche or recent topics, those rates rise to 35 to 55 percent. In medical and legal domains, where precision matters most, hallucination rates can exceed 28 percent without grounding.
What Peer Reviewers Actually Look For
To understand why ChatGPT fails peer review, it helps to know what reviewers evaluate. Peer review is not just about checking grammar or formatting. A serious reviewer examines:
- Novelty and significance of the contribution. Does the work advance the field?
- Validity of the methodology. Are the methods appropriate, rigorous, and reproducible?
- Accuracy of the data analysis. Do the results follow logically from the methods?
- Strength of the argument. Does the discussion connect findings to the broader literature in a defensible way?
- Accuracy of the references. Do the cited sources exist and support the claims made?
- Integrity of the work. Is there evidence of plagiarism, data manipulation, or fabrication?
ChatGPT cannot reliably perform any of these checks on its own output. It can produce text that mimics each of them, but it cannot verify substance, accuracy, or originality. When asked, it also cannot reliably identify which parts of its own output are factual and which are hallucinated.
Five Specific Reasons ChatGPT Fails Peer Review
Here are the most common ways ChatGPT-generated content gets caught during peer review or, worse, slips past it and causes damage afterward.
1. Fabricated References
This is the single most documented failure. A 2023 study published in Nature Scientific Reports found that ChatGPT-3 produced 178 references for research proposals, of which 69 lacked valid DOIs and at least 41 were entirely fabricated. A 2025 study of mental health research found that 20 percent of ChatGPT citations were complete fabrications and that 45 percent of the real references contained errors in author names, journal names, or publication years.
In the published University of Hong Kong case, 20 of 61 references in a peer-reviewed paper turned out to be AI-generated fabrications. Reviewers did not catch them. A social media user did, after the paper had already appeared online.
If a reviewer or editor spots even one fabricated citation, the entire paper becomes suspect. Depending on the journal and the severity, the paper may then be rejected or withdrawn, the author may be barred from resubmitting for a period, and in some cases the author’s institution is notified.
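For a sense of scale, the counts quoted in this section can be turned into shares with a few lines of arithmetic. This is only a convenience sketch; the helper name `share` is made up for illustration, and the inputs are the figures already stated above.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# 2023 study: 178 generated references
print(share(69, 178))  # lacked valid DOIs -> 38.8
print(share(41, 178))  # entirely fabricated -> 23.0

# University of Hong Kong case: 61 references in the paper
print(share(20, 61))   # AI-generated fabrications -> 32.8
```

In other words, roughly one reference in three in the Hong Kong paper was fabricated, and about one in four in the 2023 study.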
2. Inaccurate Paraphrasing of Source Material
Even when ChatGPT cites real papers, it frequently misrepresents what those papers actually say. The tool generates plausible summaries that match the style of academic writing but do not necessarily reflect the source’s actual findings, methodology, or conclusions.
A reviewer who knows the cited literature spots this immediately. The misrepresentation can also persist into your final paper even after extensive editing, because you may not realize ChatGPT got the original source wrong.
3. Generic, Discipline-Blind Language
OpenAI trained ChatGPT on broad internet text. As a result, it produces writing that sounds academic in general but lacks the specific conventions of your discipline. It misses the standard phrasing patterns of biomedical writing, the structural conventions of qualitative research reporting, the precise hedging language expected in legal writing, and the specific terminology favored in computer science.
Reviewers in your field notice immediately when language does not match the conventions of the field. The paper reads as if someone outside the discipline wrote it, which weakens reviewer confidence in the underlying work.
4. Hollow Argument Structure
ChatGPT excels at producing text that sounds like a coherent argument without actually making one. Sentences flow well, transitions appear smooth, and the structure looks correct. However, when a reviewer examines the logical chain (premise A → premise B → conclusion C), the connections often fall apart. The argument is grammatically present but substantively absent.
ChatGPT also cannot evaluate whether your conclusion is actually supported by your data, because it does not understand your data. It only matches surface patterns.
5. Detection by AI Tools the Journals Now Use
Most major publishers now run submissions through AI detection tools before sending them for peer review. Elsevier, Springer Nature, Wiley, Taylor & Francis, and most ICMJE-compliant journals use these checks routinely. Several journals have also added AI-content disclosure requirements to their submission systems.
If your paper triggers a high AI-content score, editors may return the manuscript, flag it for special review, or reject it. For more on what AI detection tools look for and how to write so that legitimate human work does not get flagged, see our guide on how to avoid AI detection in academic writing ethically.
What Current Journal Policies Say About AI Use
Journal AI policies have tightened significantly since 2023. Here is where the major publishers stand in 2026.
| Publisher / Journal | Policy on AI Use |
|---|---|
| Nature / Nature Portfolio | AI tools cannot be authors. Use must be disclosed in Methods or Acknowledgements. AI cannot generate images for figures. |
| The Lancet | Follows ICMJE. AI cannot be an author. Authors take full responsibility for AI-assisted content. Use must be disclosed. |
| NEJM | AI tools cannot generate text in submissions. Limited use for language editing must be disclosed. |
| JAMA / AMA Network | AI use must be specified by name, version, and date. Authors guarantee accuracy of all content. |
| Elsevier journals | AI use for language editing acceptable with disclosure. AI cannot be an author or generate substantive content. |
| Springer Nature | AI use limited to language assistance. Substantive content generation prohibited. Required disclosure. |
| Wiley | Authors must disclose any AI use in writing or analysis. AI cannot be credited as an author. |
| ICMJE-compliant journals | All four ICMJE authorship criteria apply. AI cannot meet them. AI use must be disclosed in Methods or Acknowledgements. |
The pattern across these publishers is consistent: AI use must be disclosed, authors remain fully responsible for the content, and an AI tool never qualifies as an author. Several publishers go further and limit AI to language assistance, so substantive content generation risks breaching policy.
The Committee on Publication Ethics (COPE) likewise states explicitly that AI tools cannot be authors because they cannot take responsibility for the integrity of the work. Responsibility, accountability, and approval remain human functions.
What a Human Editor Does That ChatGPT Cannot
The differences between professional academic editing and AI tools are not subtle. They are structural. Here is what a trained human editor brings that no current AI can match.
1. Verification of Substance
A human editor checks whether your citations are real, whether your sources support your claims, and whether your reasoning holds up. AI generates content without verifying any of this.
2. Discipline-Specific Expertise
A trained academic editor in your field knows the conventions of biomedical writing, qualitative research, computer science, social science, or the humanities. They apply these conventions naturally. AI applies generic patterns regardless of discipline.
3. Preservation of Your Voice
A skilled editor improves your writing without flattening your authorial voice. AI tools standardize text into a recognizable register, which is exactly what detection tools now flag. Your voice is part of your scholarly identity, and losing it is a real cost.
4. Judgment About What to Keep
A human editor recognizes when an awkward sentence carries meaning that should not be smoothed away. AI removes awkwardness regardless of whether the underlying meaning gets compromised.
5. Accountability
A professional editor takes responsibility for the changes they recommend. They sign their work. AI tools do not, and cannot.
6. Ethical Boundaries
A trained editor knows where the line sits between editing (preserving authorship) and ghostwriting (replacing it). They stay on the ethical side. AI tools do not understand the line at all.
The Honest Verdict: ChatGPT Has Limited Uses, But Cannot Replace Editing
ChatGPT does have legitimate uses in academic work. Specifically:
- Brainstorming initial ideas for research questions
- Generating summary outlines you then verify and rewrite
- Translating short passages for understanding (not for final text)
- Suggesting alternative phrasings on small sections
Even these limited uses require careful verification and disclosure where journals require it.
What ChatGPT cannot do is replace professional editing for high-stakes academic work. The risk of fabrication, the absence of verification, the loss of voice, the failure to meet discipline-specific conventions, and the inability to take responsibility all combine to make AI tools insufficient for serious peer review preparation.
For more on the broader question of whether to invest in professional editing, see our guide on whether professional academic editing is worth it.
When ManuscriptLab Helps
Researchers who want their papers to clear peer review increasingly need both an honest AI assessment and a human editor. ManuscriptLab provides both, with subject-matter expert editors who hold advanced degrees in your field and tools that verify your paper meets current journal expectations.
The two services most directly relevant when concerns about AI content arise are AI Turnitin Report, which shows you exactly what AI detection tools flag in your paper before a journal does, and AI Reduction, where our editors rewrite flagged sections to read naturally while preserving your meaning and academic voice. If you want either service applied to your manuscript, contact our team at ManuscriptLab.
Frequently Asked Questions
Can ChatGPT write a paper that passes peer review? Not reliably. While AI-generated content has slipped past peer review in some documented cases, the failure rate is high. Journals are tightening AI policies and improving detection. Submitting AI-generated content is a serious career risk.
Will my paper get rejected if I used ChatGPT for grammar? Generally no, as long as you disclose the use according to journal policy. Most major publishers now allow AI for language assistance with disclosure. The line is whether the AI generated substantive content or just polished what you wrote.
Why do AI tools fabricate citations? Because they predict plausible text rather than verify factual accuracy. They generate references that look real because they match the patterns of real citations, but they do not check whether the cited sources actually exist.
How do I check if ChatGPT fabricated my citations? Search every citation in Google Scholar, PubMed, Web of Science, or Scopus using the title or DOI. If the source does not appear in any database, it is likely fabricated. Do not trust the DOI alone: open the actual source and confirm that its title, authors, and journal match the citation.
Do journals use AI detection on submissions? Yes. Most major publishers now run submissions through AI detection tools before peer review. Elsevier, Springer Nature, Wiley, and Taylor & Francis all use such checks routinely.
What happens if I am caught using AI without disclosure? Consequences depend on the journal and severity. They can range from rejection of the manuscript to a ban on future submissions, retraction of any published paper, and reporting to your institution. Disclosure is always the safer choice.
Is ChatGPT useful for any part of academic writing? Yes, with caveats. It can help with brainstorming, outline generation, and minor language polishing on text you have written and will verify. It cannot reliably generate citations or summarize literature, and it should not produce final text for submission.
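The citation check described in the FAQ above (look the reference up, then confirm the record matches what you cited) can be sketched as a small comparison step. This is a minimal illustration under stated assumptions, not a complete tool. The function names, the 0.9 similarity threshold, and the sample titles are all invented for this example, and the database lookup itself is left out, so you supply the title you found in Google Scholar, PubMed, Web of Science, or Scopus, or `None` if the search returned nothing.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return " ".join(cleaned.split())


def titles_match(cited: str, registered: str, threshold: float = 0.9) -> bool:
    """Compare two titles after normalization.

    The threshold is an arbitrary illustrative choice, not a standard value.
    """
    ratio = SequenceMatcher(
        None, normalize_title(cited), normalize_title(registered)
    ).ratio()
    return ratio >= threshold


def check_reference(cited_title: str, registered_title: Optional[str]) -> str:
    """Classify one reference given the title found in a database, if any."""
    if registered_title is None:
        return "not found: likely fabricated"
    if titles_match(cited_title, registered_title):
        return "found: title matches"
    return "found: title differs, check manually"


# Invented placeholder titles, used only to exercise the three outcomes.
print(check_reference("A Study of Sleep and Memory", None))
print(check_reference("A Study of Sleep and Memory", "A study of sleep and memory."))
print(check_reference("A Study of Sleep and Memory", "Crop yields in arid soils"))
```

A title match is still only a first pass. Author names, journal, and year can be wrong on a real reference, so the final check remains reading the source itself.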
Moving Forward
The evidence from NeurIPS 2025, the University of Hong Kong, and multiple peer-reviewed studies makes one thing clear: ChatGPT cannot meet the standards of academic peer review. It hallucinates citations, misrepresents sources, produces generic discipline-blind language, builds hollow argument structures, and triggers the detection tools journals now use. None of these failures will resolve themselves through better prompting or more careful use.
In summary, use AI tools only for limited language assistance and brainstorming. Verify every citation against real databases. Disclose AI use according to journal policy. For substantive editing, structural review, and final preparation for peer review, work with a trained academic editor who can verify substance, preserve your voice, and take responsibility for the work. Finally, never assume AI-generated content will pass peer review just because it reads well.
Do those things and your work will reach the standard peer review expects, with your reputation and your career intact.




