Diagnostic errors remain a pervasive problem in healthcare, and cognitive errors are a significant contributor (Graber et al., 2005). Despite decades of clinical reasoning curriculum reform and patient safety investment, diagnostic error rates have barely moved. Integrating AI into clinical reasoning education is one of the most discussed paths forward, but the research is clear that not all approaches work. Programs that give learners unstructured access to AI tools see no meaningful improvement in diagnostic performance. Programs that build structured, theory-grounded frameworks do.

This post summarizes what the evidence shows, identifies the approaches that work, and outlines a practical framework for education leaders building or revising an AI integration strategy.

What does the research show about AI and clinical reasoning?

Under controlled conditions, large language models consistently outperform clinicians on diagnostic tasks when given clean, curated case information (Kanjee et al., 2023; Eriksen et al., 2023; Cabral et al., 2023; Nori et al., 2025).

The critical caveat: real clinical environments don't look like controlled studies. Clinicians work with incomplete data, under time pressure, while managing many patients at once. AI performance on curated simulated cases doesn't translate automatically to the real clinical environment.

More relevant for education is a separate finding: AI plus clinician does not automatically outperform the clinician alone. Studies that gave physicians unstructured AI access during diagnostic tasks found no significant accuracy improvement (Goh et al., 2024; Brodeur et al., 2024). This maps directly onto what we know from the structured reflection literature: telling someone to "think about their thinking" without a framework produces no measurable benefit.

What can produce benefit is structured integration: AI engagement guided by expert-developed prompts, grounded in clinical reasoning theory. A real-world study across primary care clinics in Nairobi embedded AI in documentation workflows, flagging discordances in real time. The AI-assisted group outperformed controls across all steps of the diagnostic process, and AI intervention frequency decreased over time, suggesting clinicians were learning from the feedback rather than depending on it (Korom et al., 2025).

Why does unguided AI use fail to improve diagnostic performance?

Effective clinical reasoning training relies on two strategy types: knowledge organization (illness scripts, problem representation, diagnostic schemas) and structured reflection (deliberate, framework-guided review of the reasoning process) (Bowen, 2006; Schaye et al., 2019). Both require intentional scaffolding; neither improves through incidental exposure alone.

When AI is introduced without that scaffolding, learners have no framework for when to consult it, what to ask, or how to evaluate the output. There's also a structural risk: large language models tend toward sycophancy, agreeing with whatever hypothesis the user presents. A learner who shares their diagnosis before committing to it gets AI output that reflects their own framing back at them, reinforcing rather than challenging their reasoning (Goh et al., 2024).

The implication is direct: learners should commit to their own clinical reasoning before engaging AI, not after.

How should programs structure AI integration in the curriculum?

The emerging consensus from the structured reflection literature and early AI-plus-clinician research points toward a consistent framework:

Commit to independent reasoning first. Before any AI engagement, learners should formulate their own differential and management plan. This protects against sycophantic AI influence and preserves the productive struggle that drives reasoning development.

Use structured prompts, not open-ended access. Prompts grounded in reflection theory, such as "what findings support or oppose each diagnosis?" or "what would you expect to see that isn't present?", outperform blank chat interfaces. Prompt guidance should be developed by clinical reasoning and AI educators, and learners should be trained on best practices.
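To make this concrete, here is a minimal sketch of what a structured-prompt wrapper might look like in practice. It assumes an OpenAI-style Python client; the function name, model choice, and prompt wording are all illustrative, not an API from DDx or any tool named in this post.

    # Illustrative sketch only: wraps AI engagement in an expert-written
    # reflection prompt instead of a blank chat box. Assumes the OpenAI
    # Python client; prompt wording and all names here are hypothetical.
    from openai import OpenAI

    REFLECTION_TEMPLATE = (
        "The learner has already committed to this differential: {differential}\n"
        "Case summary: {case}\n"
        "For each diagnosis on the differential:\n"
        "1. List the findings that support it.\n"
        "2. List the findings that oppose it.\n"
        "3. Name one expected finding that is absent.\n"
        "Do not say which diagnosis you favor; return the analysis only."
    )

    def structured_reflection(client: OpenAI, case: str, differential: list[str]) -> str:
        """One framework-guided reflection pass, run after the learner commits."""
        prompt = REFLECTION_TEMPLATE.format(
            differential=", ".join(differential), case=case
        )
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

Note that the template takes the learner's already-committed differential as input and withholds the model's own favored diagnosis, forcing findings-level analysis. Both choices work against the sycophancy problem described earlier.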

Teach critical appraisal explicitly. AI output is not ground truth. Learners need instruction in evaluating responses, identifying hallucinations, and cross-referencing against trusted sources. The cyborg/centaur framing from Abdulnour et al. (2025) is useful here: know when you can rely on AI output directly (cyborg) and when you need to verify it before acting (centaur), just as you would a preliminary radiology report or automated ECG interpretation.

Maintain separate assessments. Programs need to assess human-alone reasoning and human-plus-AI reasoning independently. Conflating the two makes it impossible to identify deskilling (existing skills degrading through AI dependence) or never-skilling (foundational skills never developing in the first place) (Berzin & Topol, 2025).
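As a minimal illustration of why the two assessment streams must stay separate, a program could record each rubric score with its condition and watch for a widening gap. The data model, field names, and threshold below are hypothetical, for illustration only.

    # Illustrative sketch: tracking human-alone vs. human-plus-AI rubric
    # scores separately so deskilling patterns stay visible.
    from dataclasses import dataclass

    @dataclass
    class ReasoningAssessment:
        learner_id: str
        rubric_score: float  # 0-100 on the program's reasoning rubric
        ai_assisted: bool    # True = human-plus-AI condition

    def flag_possible_deskilling(history: list[ReasoningAssessment],
                                 window: int = 3, gap_threshold: float = 10.0) -> bool:
        """Flag declining human-alone scores that AI-assisted scores would mask."""
        alone = [a.rubric_score for a in history if not a.ai_assisted][-window:]
        assisted = [a.rubric_score for a in history if a.ai_assisted][-window:]
        if len(alone) < window or len(assisted) < window:
            return False  # not enough data in either condition yet
        gap = sum(assisted) / window - sum(alone) / window
        declining = alone[-1] < alone[0]
        return declining and gap > gap_threshold  # threshold is arbitrary here

Conflated scoring would average the two streams together, and the declining human-alone trend would disappear into the blend.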

Not all AI tools are equal. UpToDate Expert AI draws from expert-reviewed clinical content with transparent editorial oversight, giving it the strongest evidentiary foundation of the available options. OpenEvidence offers fast, free natural-language search, but with narrower, less transparent sourcing. General LLMs (GPT, Claude, Gemini) draw from the broader internet and can, where institutional resources allow, be integrated directly into the electronic health record. Teaching learners to distinguish these tools by evidence base, not just by interface, is a core AI competency.

How DDx supports structured AI integration

Preparing learners for a clinical environment where AI is present requires repeated, structured practice, not just classroom instruction. DDx by Sketchy provides medical, PA, NP, and GME programs with a clinical readiness platform built around that principle.

DDx goes beyond passive learning, delivering the tools faculty need to build and assess clinical reasoning at every stage.

  • Realistic, multi-role case simulation across clinical reasoning, communications, and technical skills
  • AI-powered interactivity that prompts learners to commit to their thinking before receiving feedback
  • Rubric-based assessments that give faculty longitudinal visibility into reasoning development
  • Faculty-facing analytics that identify performance gaps early enough to act on them

For programs building or revising an AI integration curriculum, DDx offers the case volume, expert-guided feedback, and structured AI training environment that clinical settings can't consistently provide.

Explore DDx for your program

Frequently asked questions

Does AI-assisted clinical reasoning training actually improve diagnostic performance? Structured AI integration (expert-developed prompts grounded in clinical reasoning theory) does show consistent improvement over human-alone performance in the research, with the most pronounced gains in the most diagnostically challenging cases. Unguided AI access, without instruction on how to use or evaluate it, produces no measurable improvement.

When should programs introduce AI tools in clinical training? After learners have developed baseline clinical reasoning competency — enough foundational skill to critically evaluate AI output rather than defer to it. Introducing AI too early creates never-skilling risk; never introducing it leaves learners unprepared for the clinical environment they'll graduate into.

How do programs prevent AI from undermining reasoning skill development? Require learners to commit to their own differential before consulting AI. Use structured prompts rather than open-ended access. And maintain separate assessments of human-alone and human-plus-AI reasoning to track whether foundational skills are developing as expected.

What data privacy rules apply when learners use AI in clinical settings? Patient-identifiable information cannot go into public AI tools. Programs need tiered guidance distinguishing institutional HIPAA-compliant systems from public tools, and should teach learners which tools are appropriate for which use cases before they encounter them in clinical settings.

More in this series

This is the second installment in our three-part series on AI and clinical reasoning education, created in collaboration with leading medical educators.

Part 1: What generative AI means for clinical reasoning education — Justin Choi, MD, MSc, Weill Cornell Medicine, on distributed cognition, adaptive expertise, and the cognitive frameworks programs need to understand before integrating AI into training. Read more here.

References

Abdulnour RE, Gin B, Boscardin CK. Educational strategies for clinical supervision of artificial intelligence use. N Engl J Med. 2025;393(8):786–797.

Berzin TM, Topol EJ. Preserving clinical skills in the age of AI assistance. Lancet. 2025;406(10513):1719.

Bowen JL. Educational strategies to promote clinical diagnostic reasoning. N Engl J Med. 2006;355(21):2217–2225.

Brodeur PG, Buckley TA, Kanjee Z, et al. Superhuman performance of a large language model on the reasoning tasks of a physician. arXiv preprint. 2024.

Cabral S, Restrepo D, Kanjee Z, et al. Clinical reasoning of a generative artificial intelligence model compared with physicians. JAMA. 2023.

Eriksen AV, Möller S, Ryg J. Use of GPT-4 to diagnose complex clinical cases. NEJM AI. 2023;1(1).

Goh E, Gallo R, Hom J, et al. Large language model influence on diagnostic reasoning: a randomized clinical trial. JAMA Netw Open. 2024.

Graber ML, Franklin N, Gordon R. Diagnostic error in internal medicine. Arch Intern Med. 2005;165(13):1493–1499.

Kanjee Z, Crowe B, Rodman A. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA. 2023;330(1):78–80.

Korom R, Kiptinness S, Adan N, et al. AI-based clinical decision support for primary care: a real-world study. arXiv preprint arXiv:2507.16947. 2025.

Nori H, Daswani M, Kelly C, et al. Sequential diagnosis with language models. arXiv preprint arXiv:2506.22405. 2025.

Schaye V, Eliasz KL, Janjigian M, Stern DT. Theory-guided teaching: implementation of a clinical reasoning curriculum in residents. Med Educ. 2019;53(12):1192–1199.

This post draws on research and curriculum work presented by Verity Schaye, MD, MHPE, Associate Professor of Medicine and Assistant Dean for Education in the Clinical Sciences at NYU Grossman School of Medicine.

Explore how AI-enabled clinical simulation can benefit your institution. Schedule a demo of DDx today.
