Diagnostic errors remain a pervasive problem in healthcare, and cognitive errors are a significant contributor (Graber et al., 2005). Despite decades of clinical reasoning curriculum reform and patient safety investment, diagnostic error rates have barely moved. Integrating AI into clinical reasoning education is one of the most discussed paths forward, but the research is clear that not all approaches work. Programs that give learners unstructured access to AI tools see no meaningful improvement in diagnostic performance. Programs that build structured, theory-grounded frameworks do.

This post summarizes what the evidence shows, identifies the approaches that work, and outlines a practical framework for education leaders building or revising an AI integration strategy.

What does the research show about AI and clinical reasoning?

Under controlled conditions, large language models consistently outperform clinicians on diagnostic tasks when given clean, curated case information (Kanjee et al., 2023; Eriksen et al., 2023; Cabral et al., 2023; Nori et al., 2025).

The critical caveat: real clinical environments don't look like controlled studies. Clinicians work with incomplete data, under time pressure, while managing many patients at once. AI performance on curated simulated cases doesn't translate automatically to the real clinical environment.

More relevant for education is a separate finding: AI plus clinician does not automatically outperform the clinician alone. Studies that gave physicians unstructured AI access during diagnostic tasks found no significant accuracy improvement (Goh et al., 2024; Brodeur et al., 2024). This maps directly onto what we know from the structured reflection literature: telling someone to "think about their thinking" without a framework produces no measurable benefit.

What can produce benefit is structured integration: AI engagement guided by expert-developed prompts, grounded in clinical reasoning theory. A real-world study across primary care clinics in Nairobi embedded AI in documentation workflows, flagging discordances in real time. The AI-assisted group outperformed controls across all steps of the diagnostic process, and AI intervention frequency decreased over time, suggesting clinicians were learning from the feedback rather than depending on it (Korom et al., 2025).

Why does unguided AI use fail to improve diagnostic performance?

Effective clinical reasoning training relies on two strategy types: knowledge organization (illness scripts, problem representation, diagnostic schemas) and structured reflection (deliberate, framework-guided review of the reasoning process) (Bowen, 2006; Schaye et al., 2019). Both require intentional scaffolding; neither improves through incidental exposure alone.

When AI is introduced without that scaffolding, learners have no framework for when to consult it, what to ask, or how to evaluate the output. There's also a structural risk: large language models tend toward sycophancy, agreeing with whatever hypothesis the user presents. A learner who shares their diagnosis before committing to it gets AI output that reflects their own framing back at them, reinforcing rather than challenging their reasoning (Goh et al., 2024).

The implication is direct: learners should commit to their own clinical reasoning before engaging AI, not after.

How should programs structure AI integration in the curriculum?

The emerging consensus from the structured reflection literature and early AI-plus-clinician research points toward a consistent framework:

Commit to independent reasoning first. Before any AI engagement, learners should formulate their own differential and management plan. This protects against sycophantic AI influence and preserves the productive struggle that drives reasoning development.

Use structured prompts, not open-ended access. Prompts grounded in reflection theory, such as "what findings support or oppose each diagnosis?" or "what would you expect to see that isn't present?", outperform blank chat interfaces. Prompt guidance should be developed by clinical reasoning and AI educators, and learners should be trained on best practices.
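To make this concrete, here is a minimal sketch of what a structured-prompt wrapper might look like in practice. It assumes an OpenAI-style Python client; the function name, model choice, and prompt wording are all illustrative, not an API from DDx or any tool named in this post.

    # Illustrative sketch only: wraps AI engagement in an expert-written
    # reflection prompt instead of a blank chat box. Assumes the OpenAI
    # Python client; prompt wording and all names here are hypothetical.
    from openai import OpenAI

    REFLECTION_TEMPLATE = (
        "The learner has already committed to this differential: {differential}\n"
        "Case summary: {case}\n"
        "For each diagnosis on the differential:\n"
        "1. List the findings that support it.\n"
        "2. List the findings that oppose it.\n"
        "3. Name one expected finding that is absent.\n"
        "Do not say which diagnosis you favor; return the analysis only."
    )

    def structured_reflection(client: OpenAI, case: str, differential: list[str]) -> str:
        """One framework-guided reflection pass, run after the learner commits."""
        prompt = REFLECTION_TEMPLATE.format(
            differential=", ".join(differential), case=case
        )
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

Note that the template takes the learner's already-committed differential as input and withholds the model's own favored diagnosis, forcing findings-level analysis. Both choices work against the sycophancy problem described earlier.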

Teach critical appraisal explicitly. AI output is not ground truth. Learners need instruction in evaluating responses, identifying hallucinations, and cross-referencing against trusted sources. The cyborg/centaur framing from Abdulnour et al. (2025) is useful here: know when you can rely on AI output directly (cyborg) and when you need to verify it before acting (centaur), just as you would a preliminary radiology report or automated ECG interpretation.

Maintain separate assessments. Programs need to assess human-alone reasoning and human-plus-AI reasoning independently. Conflating the two makes it impossible to identify deskilling (existing skills degrading through AI dependence) or never-skilling (foundational skills never developing in the first place) (Berzin & Topol, 2025).
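As a minimal illustration of why the two assessment streams must stay separate, a program could record each rubric score with its condition and watch for a widening gap. The data model, field names, and threshold below are hypothetical, for illustration only.

    # Illustrative sketch: tracking human-alone vs. human-plus-AI rubric
    # scores separately so deskilling patterns stay visible.
    from dataclasses import dataclass

    @dataclass
    class ReasoningAssessment:
        learner_id: str
        rubric_score: float  # 0-100 on the program's reasoning rubric
        ai_assisted: bool    # True = human-plus-AI condition

    def flag_possible_deskilling(history: list[ReasoningAssessment],
                                 window: int = 3, gap_threshold: float = 10.0) -> bool:
        """Flag declining human-alone scores that AI-assisted scores would mask."""
        alone = [a.rubric_score for a in history if not a.ai_assisted][-window:]
        assisted = [a.rubric_score for a in history if a.ai_assisted][-window:]
        if len(alone) < window or len(assisted) < window:
            return False  # not enough data in either condition yet
        gap = sum(assisted) / window - sum(alone) / window
        declining = alone[-1] < alone[0]
        return declining and gap > gap_threshold  # threshold is arbitrary here

Conflated scoring would average the two streams together, and the declining human-alone trend would disappear into the blend.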

Not all AI tools are equal. UpToDate Expert AI draws from expert-reviewed clinical content with transparent editorial oversight, giving it the strongest evidentiary foundation of the available options. OpenEvidence offers fast, free natural-language search, but with narrower, less transparent sourcing. General LLMs (GPT, Claude, Gemini) draw from the broader internet and can, where institutional resources allow, be integrated directly into the electronic health record. Teaching learners to distinguish these tools by evidence base, not just by interface, is a core AI competency.

How DDx supports structured AI integration

Preparing learners for a clinical environment where AI is present requires repeated, structured practice, not just classroom instruction. DDx by Sketchy provides medical, PA, NP, and GME programs with a clinical readiness platform built around that principle.

DDx goes beyond passive learning, delivering the tools faculty need to build and assess clinical reasoning at every stage.

  • Realistic, multi-role case simulation across clinical reasoning, communications, and technical skills
  • AI-powered interactivity that prompts learners to commit to their thinking before receiving feedback
  • Rubric-based assessments that give faculty longitudinal visibility into reasoning development
  • Faculty-facing analytics that identify performance gaps early enough to act on them

For programs building or revising an AI integration curriculum, DDx offers the case volume, expert-guided feedback, and structured AI training environment that clinical settings can't consistently provide.

Explore DDx for your program

Frequently asked questions

Does AI-assisted clinical reasoning training actually improve diagnostic performance? Structured AI integration (expert-developed prompts grounded in clinical reasoning theory) does show consistent improvement over human-alone performance in the research, with the most pronounced gains in the most diagnostically challenging cases. Unguided AI access, without instruction on how to use or evaluate it, produces no measurable improvement.

When should programs introduce AI tools in clinical training? After learners have developed baseline clinical reasoning competency — enough foundational skill to critically evaluate AI output rather than defer to it. Introducing AI too early creates never-skilling risk; never introducing it leaves learners unprepared for the clinical environment they'll graduate into.

How do programs prevent AI from undermining reasoning skill development? Require learners to commit to their own differential before consulting AI. Use structured prompts rather than open-ended access. And maintain separate assessments of human-alone and human-plus-AI reasoning to track whether foundational skills are developing as expected.

What data privacy rules apply when learners use AI in clinical settings? Patient-identifiable information cannot go into public AI tools. Programs need tiered guidance distinguishing institutional HIPAA-compliant systems from public tools, and should teach learners which tools are appropriate for which use cases before they encounter them in clinical settings.

More in this series

This is the second installment in our three-part series on AI and clinical reasoning education, created in collaboration with leading medical educators.

Part 1: What generative AI means for clinical reasoning education — Justin Choi, MD, MSc, Weill Cornell Medicine, on distributed cognition, adaptive expertise, and the cognitive frameworks programs need to understand before integrating AI into training. Read more here.

References

Abdulnour RE, Gin B, Boscardin CK. Educational strategies for clinical supervision of artificial intelligence use. N Engl J Med. 2025;393(8):786–797.

Berzin TM, Topol EJ. Preserving clinical skills in the age of AI assistance. Lancet. 2025;406(10513):1719.

Bowen JL. Educational strategies to promote clinical diagnostic reasoning. N Engl J Med. 2006;355(21):2217–2225.

Brodeur PG, Buckley TA, Kanjee Z, et al. Superhuman performance of a large language model on the reasoning tasks of a physician. arXiv preprint. 2024.

Cabral S, Restrepo D, Kanjee Z, et al. Clinical reasoning of a generative artificial intelligence model compared with physicians. JAMA. 2023.

Eriksen AV, Möller S, Ryg J. Use of GPT-4 to diagnose complex clinical cases. NEJM AI. 2023;1(1).

Goh E, Gallo R, Hom J, et al. Large language model influence on diagnostic reasoning: a randomized clinical trial. JAMA Netw Open. 2024.

Graber ML, Franklin N, Gordon R. Diagnostic error in internal medicine. Arch Intern Med. 2005;165(13):1493–1499.

Kanjee Z, Crowe B, Rodman A. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA. 2023;330(1):78–80.

Korom R, Kiptinness S, Adan N, et al. AI-based clinical decision support for primary care: a real-world study. arXiv preprint arXiv:2507.16947. 2025.

Nori H, Daswani M, Kelly C, et al. Sequential diagnosis with language models. arXiv preprint arXiv:2506.22405. 2025.

Schaye V, Eliasz KL, Janjigian M, Stern DT. Theory-guided teaching: implementation of a clinical reasoning curriculum in residents. Med Educ. 2019;53(12):1192–1199.

This post draws on research and curriculum work presented by Verity Schaye, MD, MHPE, Associate Professor of Medicine and Assistant Dean for Education in the Clinical Sciences at NYU Grossman School of Medicine.

Explore how AI-enabled clinical simulation can benefit your institution. Schedule a demo of DDx today.
