Summer of ’05

“Recurrent cellulitis,” said Ms. L.

“How many times have you had it?” I asked.

“At least 15 times in the last 2 years,” she said with deep resignation. “Do you think I need antibiotics again?”

I was a freshly minted second-year internal medicine resident seeing a patient new to our clinic, a spry 80-year-old woman with a history of chronic but well-controlled hypertension, and little else. Except, of course, this bothersome ‘bilateral cellulitis’ that was back again after multiple rounds of antibiotics.

I was stumped... Looking at her bright red and tender lower legs, I was inclined to prescribe that next round of antibiotics. Yet, something felt wrong, and the patient felt it, too.

She left the clinic that day without a prescription for antibiotics, but also without a diagnosis. My cluelessness was outweighed only by my curiosity to make a diagnosis and prevent future recurrences (1).

Clinical Reasoning: A ‘Disorienting Dilemma’

Despite a substantial increase in academic study and publications about clinical reasoning (CR) over the two decades since I first met Ms. L, teaching and assessing CR remains a disorienting dilemma in health professions education. Adult learning theorist Jack Mezirow defined a disorienting dilemma as a challenging experience that leads a person to question their fundamental assumptions about themselves and the world. CR fits this definition, both for learners trying to develop the skill and for faculty trying to teach and assess it, for the following reasons:

  • Lack of standardization: There is no consensus definition of CR or of its numerous, complex sub-competencies, and there is considerable variability across institutions in how CR is defined, taught, and assessed.
  • Challenges in unpacking CR: CR is, by its nature, a tacit cognitive process, which confounds supervisors’ ability to observe it directly. Learners often struggle to articulate these internal reasoning processes, particularly under the typical time pressures of clinical settings.
  • Growing pressures on faculty: Clinical faculty face increasing productivity demands that limit the time available to observe learners during patient encounters, in turn limiting opportunities to give feedback or to probe learners’ metacognitive processes.
  • Limited patient exposure: Many institutions anecdotally report that medical students and other health professions students are seeing fewer patients on average, limiting their opportunities to practice CR and receive faculty feedback. Exposure to patients with different medical conditions also varies considerably. Moreover, students predominantly see patients who have already undergone a workup and had a diagnostic label applied before the student ever sets eyes on them.

The Role of Virtual OSCEs

Providing learners with high-quality simulated patient encounters, particularly those incorporating augmented or artificial intelligence, offers one promising avenue for addressing key challenges in teaching and assessing CR. Such encounters:

  • Empower students and residents through more customizable and deliberate practice of CR skills that can be done at any time, in any place
  • Can be intentionally designed and sequenced to align with curricular learning, while layering in increasing diagnostic complexity as learners advance in their training
  • Serve as an indefatigable source of copious and frequent feedback on CR skills
  • Offer a mechanism to observe nearly all steps students and residents take in an encounter, while simultaneously tracking diagnostic accuracy and efficiency

Among the growing number of private-sector solutions seeking to harness AI for medical education, Sketchy’s DDx platform stands out for its unique interface featuring both AI-driven simulated patients and attending physicians.

In early 2025, I became aware of Sketchy’s DDx platform and its potential to address critical gaps in the CR education landscape. Since then, I have consulted with Sketchy to help elevate the educational rigor of DDx and to establish it as an evidence-based approach to teaching and assessing CR. To build a strong academic-industry relationship, Sketchy and I convened a several-day, in-person CR summit that engaged an expert group of medical educators. We believed this partnership was essential because the vast majority of academic health centers and medical and health professions schools are not positioned to invest the time or money needed to capitalize on AI technologies the way private industry can. Sketchy’s commitment to meaningful partnerships with academic physician leaders in developing DDx ensures that innovation remains grounded in authentic clinical reasoning experiences and responsive to the extant medical education literature.

A Literature Review

One of the major goals of the summit was to comprehensively review the scholarship on CR, capitalizing on the vast experience of the assembled medical educators with expertise in CR education and assessment, and then to develop an assessment tool to embed in the DDx platform.

Of the many exciting approaches to assessing clinical reasoning that have appeared in the medical education literature, we felt that two best aligned with DDx’s virtual OSCE platform: IDEA and ART. The IDEA framework was developed to assess trainees’ written documentation of simulated or actual patient interactions; it has demonstrated strong inter-rater reliability and is supported by a strong validity argument (2). Notably, the IDEA framework did not include narrative anchors for its 3-point Likert scale at the time of publication; detailed descriptors were developed later and also demonstrated strong inter-rater reliability (3).

The ART framework was developed primarily to assess trainees’ CR skills vis-à-vis their oral patient presentations in simulated or actual patient interactions. Interestingly, the authors also note that it can be used as a tool to foster direct observation of learners. The authors have subsequently demonstrated strong inter-rater reliability while providing a compelling argument for its validity (4, 5).

The impact of these frameworks in advancing the field of CR cannot be overstated: they are simultaneously deeply evidence-informed AND highly practical, making them feasible to implement in highly varied training programs. Yet each has limitations. IDEA requires that students/residents craft a patient note that is then assessed using the rubric. Note writing is rarely directly observed, and few notes are reviewed carefully. Additionally, a submitted learner note is often influenced by the CR of other, more senior team members or of other specialties that have already interacted with and documented on the patient. Thus, the CR demonstrated in the learner note may not reflect the unique clinical reasoning of the learner. From a psychometric perspective, using IDEA summatively would require reviewing several notes per learner to ensure reliability and fairness and to avoid bias in assessment. The sheer number of notes needing review could easily overwhelm even the most committed clinical faculty members.

The ART framework also requires learners to evaluate a patient and then craft an oral patient presentation. As with the IDEA framework, oral presentations are often influenced by the ideas and critical thinking of others. However, ART is more practical in terms of assessor time and energy: attending physicians and senior residents already listen to these presentations regularly, and incorporating ART ratings into a rounding workflow is far more feasible.

Upon reviewing these two frameworks, we found that ART more closely aligns with educator needs and the learner experience in DDx. However, key aspects of what makes DDx a unique educational and assessment experience would not be captured if we used the ART rubric as is.

Therefore, we created a novel rubric that incorporates aspects of ART, while expanding that framework to include assessment items specific to the virtual, simulated, AI-driven, data-rich experience provided in DDx. This framework is firmly rooted in the key elements of CR, including:

  • Early Hypothesis Generation: Generating an initial differential diagnosis based on provided patient demographics and presenting concerns
  • Focused and Hypothesis-Driven Data Gathering: Gathering a history and physical examination in an efficient and hypothesis-driven manner
  • Problem Representation and Refining the Differential Diagnosis: Articulating a complete problem representation as well as an updated, prioritized differential diagnosis after engaging in and processing information from data gathering steps
  • Diagnostic Justification: Providing a sound rationale for why some diagnoses are more likely than others
  • Evaluation and Management: Directing evaluation and treatment towards high-priority diagnoses in a manner that reflects high-value testing
  • Metacognition and Adaptability: Demonstrating the ability to reflect on one’s cognitive tendencies, emotional forces, and influences of the patient care setting while adjusting to feedback provided by an attending physician during the patient interaction

Many of these elements will be immediately familiar to those who use ART; however, some important additions are included that reflect the change in context from a post-encounter oral patient presentation to a directly observed simulated patient/attending physician encounter.

Returning to Ms. L

NASA scientists, building on Luft and Ingham’s work (6), described four kinds of uncertainty:

  1. things we know and recognize we know (Known Knowns)
  2. things we know we don’t know (Known Unknowns)
  3. things we know but aren’t aware we know (Unknown Knowns)
  4. things we neither know nor realize we don’t know (Unknown Unknowns).

Reflecting on my experience with Ms. L, as well as many other patients I have encountered in my career as an academic hospitalist, my uncertainty type was ‘Known Unknown’. I was aware that I did not know what the diagnosis was. This category of uncertainty can be a powerful motivator for learning. In my case, it led me to scour textbooks and journals for mimics of recurrent cellulitis, ultimately helping me unearth this patient’s unexpected diagnosis.

Bear with me a bit longer, and we will get to that diagnosis, I promise! Let’s consider the elements of clinical reasoning DDx reinforces for learners, based on my efforts as a trainee:

  • I formulated an initial differential diagnosis based on the patient’s demographics.
  • My interview was hypothesis-directed (for the most part) and yielded key information: despite not taking her last two antibiotic prescriptions, her symptoms hadn’t progressed, she hadn’t developed a fever, and the ‘cellulitis’ resolved over the same time course as it typically did when antibiotics had been prescribed.
  • I revised that differential as I gathered more data (although I could not include the ultimate diagnosis in my list because I was not aware of it!).
  • Finally, I come to the importance of metacognition and adaptability in the CR process. Taking a diagnostic pause and considering the information available to me, while incorporating the patient’s experience and explanatory model for her illness (‘this is definitely not cellulitis’, she said), made me feel more comfortable having Ms. L leave without an antibiotic prescription, even though her legs looked identical to those of countless other patients whom I had diagnosed with cellulitis.

Ms. L’s case illustrates the complexity of CR and how its different subcomponents integrate when students, residents, and other health care providers attempt to determine a patient’s diagnosis. This complexity contributes to physician diagnostic errors, which in turn remain a key factor leading to patient harm. When Ms. L’s labs came back later that week, I did not expect her platelet count to be 900,000 per microliter. It was the only abnormality in the screening labs I had ordered, and while it was highly relevant to her ultimate diagnosis, it still didn’t provide me with any more clarity at the time. That clarity didn’t come until I happened upon a textbook chapter on myeloproliferative disorders. Erythromelalgia, a condition associated with essential thrombocythemia, better explained the high platelet count and the recurrent bouts of ‘cellulitis’ that improved with time, not antibiotics.

This case had a happy conclusion. Ms. L’s erythromelalgia ceased after she started aspirin therapy. In this case, it was not diagnostic accuracy that saved the day, but rather diagnostic humility.

Stepping back, I believe academic medicine’s current understanding of clinical reasoning also lies within the ‘Known Unknown.’ Just as I knew Ms. L didn’t have cellulitis while not knowing what she did have, medical educators recognize that we lack a perfect model for teaching and assessing CR. This collective uncertainty is a call to action—to learn, innovate, and refine. New technologies that harness AI, like DDx, provide a novel path forward to help us understand how learners develop clinical reasoning and how best to nurture their growth as diagnosticians.

References

  1. The case of Ms. L was previously published in a non-peer-reviewed format: Cassese T, Dhaliwal G. Clinical Problem Solving Exercise: iRounds. Doximity.com. March 16, 2012.
  2. Baker EA, Ledford CH, Fogg L, Way DP, Park YS. The IDEA Assessment Tool: Assessing the Reporting, Diagnostic Reasoning, and Decision-Making Skills Demonstrated in Medical Students' Hospital Admission Notes. Teach Learn Med. 2015;27(2):163-73. doi: 10.1080/10401334.2015.1011654. PMID: 25893938.
  3. Schaye V, Miller L, Kudlowitz D, Chun J, Burk-Rafel J, Cocks P, Guzman B, Aphinyanaphongs Y, Marin M. Development of a Clinical Reasoning Documentation Assessment Tool for Resident and Fellow Admission Notes: a Shared Mental Model for Feedback. J Gen Intern Med. 2022 Feb;37(3):507-512. doi: 10.1007/s11606-021-06805-6. Epub 2021 May 4. PMID: 33945113; PMCID: PMC8858363.
  4. Thammasitboon S, Rencic JJ, Trowbridge RL, Olson APJ, Sur M, Dhaliwal G. The Assessment of Reasoning Tool (ART): structuring the conversation between teachers and learners. Diagnosis (Berl). 2018 Nov 27;5(4):197-203. doi: 10.1515/dx-2018-0052. PMID: 30407911.
  5. Thammasitboon S, Sur M, Rencic JJ, Dhaliwal G, Kumar S, Sundaram S, Krishnamurthy P. Psychometric validation of the reconstructed version of the assessment of reasoning tool. Med Teach. 2021 Feb;43(2):168-173. doi: 10.1080/0142159X.2020.1830960. Epub 2020 Oct 17. PMID: 33073665.
  6. To learn more about the Johari window or the four-box model of uncertainty, you can start with: (a) https://stephen.fm/known-unknown-matrix/ and (b) https://www.proteanpreparedness.consulting/articles/posts/the-known-the-unknowns-the-johari-window/

Explore how AI-enabled clinical simulation can benefit your institution. Schedule a demo of DDx today.
