Generative AI can already produce differential diagnoses and estimate pre- and post-test probabilities at levels that match or exceed those of experienced clinicians in some studies (Goh et al., 2024; McDuff et al., 2025). The question for programs is not whether AI will change how learners develop clinical reasoning, but which aspects of that development need protecting.

This post summarizes what the evidence shows on AI and clinical reasoning, identifies the risks programs need to plan around, and outlines a practical framework for education leaders building or revising an AI integration strategy.

How does generative AI fit into clinical reasoning?

Dual process theory, which distinguishes System 1 (fast, intuitive) from System 2 (slow, analytic) thinking, has shaped clinical reasoning education for decades. Its limitation: it only describes what happens inside an individual clinician's mind.

A more useful framework is distributed cognition (DCog), developed by Edwin Hutchins (1995). His argument: decision-making is distributed across three components:

  • Individuals: clinicians, learners, and patients shaping reasoning through communication
  • Tools and artifacts: the EHR, diagnostic algorithms, and increasingly, AI
  • Environment: workflows, time constraints, and institutional culture

Generative AI is entering all three. Right now it functions as a tool that can assist or complete cognitive tasks with high efficiency and at greater scale than anything before it. But it is also beginning to behave as an agent: generating recommendations in ways that resemble collaboration. As AI embeds itself in clinical environments, it will shape the norms of practice — which means it will shape what programs need to teach.

What are the risks of AI-assisted reasoning for medical learners?

Deskilling and never-skilling

Berzin and Topol (2025) describe two trajectories for clinicians trained in an AI-integrated environment:

  • AI-enhanced practice: for those who develop the skills to use AI well
  • Deskilling or never-skilling: for those who don't

For today's learners, the greater risk is never-skilling: never building foundational reasoning processes because AI handles them before those skills develop.

Bias and noise

LLMs don't eliminate the cognitive biases that drive diagnostic error; they can amplify them:

  • LLMs display the same biases as human clinicians, sometimes at larger magnitudes, and prompting the model to avoid them did not reliably help (Wang & Redelmeier, 2024, 2025)
  • LLMs can generate different recommendations for patients from marginalized groups even when clinical data is identical (Omar et al., 2025)
  • Outputs vary between and within models because text generation is stochastic (Landon et al., 2025)

AI is not a neutral corrective for diagnostic error. Learners need to evaluate its outputs critically.

How can programs use AI to improve clinical reasoning assessment?

Most clinical reasoning assessments are one-off snapshots, not part of a coherent developmental picture. The emerging standard is programmatic assessment: longitudinal, integrative, triangulating multiple data sources over time. Torre and colleagues (2025) describe it as a necessary evolution. The obstacle has always been execution — most programs can't sustain the faculty time it requires.

LLMs are well-suited to this synthesis work: drawing on OSCE scores, workplace-based assessments, written evaluations, and knowledge tests to surface longitudinal patterns.

The model: AI synthesizes and drafts. Faculty verify, edit, and make high-stakes decisions.
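
As a minimal sketch of what that division of labor could look like in practice, assuming an OpenAI-style chat API: the record structure, prompt wording, and function names below are our illustration, not a production pipeline or any program's actual tooling.

    from dataclasses import dataclass
    from openai import OpenAI  # assumes the OpenAI Python SDK; any LLM client would work

    @dataclass
    class AssessmentRecord:
        source: str   # e.g., "OSCE", "workplace-based assessment", "knowledge test"
        date: str     # ISO date of the observation
        content: str  # score or narrative comment

    def draft_longitudinal_summary(learner_id: str, records: list[AssessmentRecord]) -> str:
        """Draft, not decide: the LLM synthesizes; faculty verify, edit, and judge."""
        # Order the evidence chronologically so longitudinal trends are visible.
        evidence = "\n".join(
            f"- [{r.date}] {r.source}: {r.content}"
            for r in sorted(records, key=lambda r: r.date)
        )
        prompt = (
            "You are assisting a clinical competency committee. Synthesize the "
            "assessment data below into a longitudinal summary of this learner's "
            "clinical reasoning: trends over time, agreement or conflict between "
            "data sources, and gaps needing more observation. Cite the specific "
            "data points behind every claim; do not infer beyond them.\n\n"
            f"Learner {learner_id}:\n{evidence}"
        )
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        # Label the output so it cannot be mistaken for a committee decision.
        return "DRAFT, pending faculty review\n\n" + response.choices[0].message.content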

What does adaptive expertise look like when AI handles the routine?

Expertise research distinguishes two forms:

  • Routine expertise: efficient application of established knowledge to familiar problems
  • Adaptive expertise: generating new solutions when familiar approaches fail

Clinical medicine consistently demands the latter. Justin Choi, whose research and curriculum work this post draws on, argues that AI can accelerate the path to routine expertise by offloading routine cognitive work, while supporting adaptive expertise through:

  • A second opinion when clinical certainty is low
  • A devil's advocate that surfaces gaps in reasoning (an example prompt follows this list)
  • A coaching tool within the Master Adaptive Learner framework (plan → learn → assess → adjust)
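
To make the devil's advocate role concrete, here is one sketch of a prompt template a learner might use after committing to a working diagnosis. The wording and the Python scaffolding are our illustration, not a validated instrument or part of Choi's curriculum.

    # Hypothetical "devil's advocate" prompt template; illustrative wording only.
    DEVILS_ADVOCATE_PROMPT = (
        "My working diagnosis is {diagnosis}, based on: {findings}.\n"
        "Argue against me. Which competing diagnoses am I neglecting, which "
        "findings do not fit, and what single piece of new data would most "
        "change my mind?"
    )

    # Example: a learner challenges their own pneumonia diagnosis.
    print(DEVILS_ADVOCATE_PROMPT.format(
        diagnosis="community-acquired pneumonia",
        findings="fever, productive cough, focal crackles, right lower lobe infiltrate",
    ))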

The goal is not learners who depend on AI, but learners who know when and how to use it well. Approximately 70–80% of diagnoses are still made through history and physical examination alone. Clinical judgment, humanism, and bedside skills remain a core obligation of clinical education.

Two priorities for programs

1. Shift the unit of analysis from the individual to the system. Programs will need to assess the human-AI dyad, including competencies like prompt engineering, verification of outputs, and appropriate oversight of automated recommendations.

2. Define the division of cognitive labor explicitly. AI is already being used to handle medication prescription renewals autonomously in some settings. Programs that engage with this question proactively will be better positioned to prepare learners for the environments they'll actually work in.

DDx by Sketchy provides structured, rubric-based case simulation that develops clinical reasoning, communication, and technical skills at scale, without proportional increases in faculty workload. If you're thinking through how AI intersects with clinical readiness training at your institution, we'd welcome the conversation.

See how programs are using DDx

Frequently asked questions

Q: What is the biggest risk of generative AI for medical education programs? Deskilling and never-skilling — the erosion or failure to develop foundational clinical reasoning skills when AI handles those cognitive tasks before learners have built them. Programs that prioritize foundational reasoning development alongside AI literacy are better protected against both.

Q: Can AI help with clinical reasoning assessment? Yes, particularly for programmatic assessment. LLMs can synthesize longitudinal learner data across OSCE scores, workplace-based assessments, and knowledge tests — surfacing patterns that are difficult to detect manually. The key is keeping faculty in the verification and decision-making role rather than treating AI output as a substitute for their judgment.

Q: How should programs teach learners to use AI in clinical reasoning? Three core competencies: prompting AI effectively, critically evaluating its outputs, and knowing when human reasoning should override an AI recommendation. These skills are distinct from foundational reasoning development — not substitutes for it.

Q: How do cognitive biases in LLMs affect clinical reasoning training? LLMs can display the same biases that affect human clinicians at greater magnitudes, and prompting the model to avoid them does not reliably correct the output. AI literacy must include critical appraisal of those outputs — not just fluency in using the tools.

References

Berzin TM, Topol EJ. Preserving clinical skills in the age of AI assistance. Lancet. 2025;406(10513):1719. doi:10.1016/S0140-6736(25)02075-6

Goh E, Gallo R, Hom J, et al. Large language model influence on diagnostic reasoning: a randomized clinical trial. JAMA Netw Open. 2024;7(10):e2440969. doi:10.1001/jamanetworkopen.2024.40969

Goh E, Gallo RJ, Strong E, et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat Med. 2025;31:1233-1238.

Hutchins E. Cognition in the Wild. MIT Press; 1995.

Landon S, Savage T, Greysen SR, et al. Variation in large language model recommendations in challenging inpatient management scenarios. J Gen Intern Med. Published online 2025.

McDuff D, Schaekermann M, Tu T, et al. Towards accurate differential diagnosis with large language models. Nature. 2025;642:451-457.

Omar M, Soffer S, Agbareia R, et al. Sociodemographic biases in medical decision making by large language models. Nat Med. 2025;31:1873-1881.

Torre D, Daniel M, Ratcliffe T, Durning SJ, Holmboe E, Schuwirth L. Programmatic assessment of clinical reasoning: new opportunities to meet an ongoing challenge. Teach Learn Med. 2025;37(3):403-411. doi:10.1080/10401334.2024.2333921

Wang J, Redelmeier DA. Cognitive biases and artificial intelligence. NEJM AI. 2024;1(12):AIcs2400639. doi:10.1056/AIcs2400639

Wang J, Redelmeier DA. Forewarning artificial intelligence about cognitive biases. Med Decis Making. 2025;45(7):913-916. doi:10.1177/0272989X251346788

More in this series

This is the first post in a three-part series on AI and clinical reasoning education, produced in partnership with leading medical educators.

Part 2: Teaching clinical reasoning in the age of AI: what the evidence shows — Verity Schaye, MD, MHPE, NYU Grossman School of Medicine, on structured AI integration, deskilling risk, and what the research actually supports. Read more here.

This post draws on research and curriculum work presented by Justin Choi, MD, MSc, Assistant Professor of Medicine at Weill Cornell Medicine.

Explore how AI-enabled clinical simulation can benefit your institution. Schedule a demo of DDx today.
