Executive Summary
This article summarizes the research paper "Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models", which examines how ChatGPT performs on the USMLE, a comprehensive US medical licensing examination.
Quote: “medical students often spend approximately 300–400 hours of dedicated study time in preparation for this exam”
3 Key Takeaways
💡 Demonstrated the potential of large language models (LLMs) in medical education.
💡 Showed that LLMs can perform at or near the passing threshold for medical licensing exams without specialized training.
💡 Highlighted the potential of LLMs to generate novel insights that can assist human learners.
Let’s dive deeper into the research:
Research
Artificial Intelligence (AI) has been making waves in various industries, and healthcare is no exception.
Recommended: Will GPT-4 Save Millions in Healthcare? Radiologists Replaced By Fine-Tuned LLMs
A recent study published in PLOS Digital Health has shed light on the potential of ChatGPT in the realm of medical education.

ChatGPT, developed by OpenAI, was put to the test on the United States Medical Licensing Exam (USMLE), a comprehensive three-step standardized testing program covering all topics in a physician’s fund of knowledge.
💡 The results were impressive, with ChatGPT performing at or near the passing threshold for all three exams without any specialized training or reinforcement.
The study revealed two major themes: the rising accuracy of ChatGPT and its potential to generate novel insights that can assist human learners in a medical education setting.

Rising Accuracy of ChatGPT
ChatGPT’s performance on the USMLE was noteworthy. The AI model achieved over 50% accuracy across all examinations, exceeding 60% in some analyses. The USMLE pass threshold is approximately 60%, so ChatGPT is now approaching the passing range. This is a significant achievement considering that the AI model received no specialized training or reinforcement for the USMLE.
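To make the accuracy-versus-threshold comparison concrete, here is a minimal sketch of how multiple-choice answers might be scored against an answer key and compared to an approximate 60% pass threshold. The answer data below is purely illustrative and is not taken from the study; the study's actual grading methodology was more involved (physician adjudication of open-ended and multiple-choice responses).

```python
# Hypothetical sketch: score multiple-choice answers against a key and
# compare the resulting accuracy to an approximate pass threshold (~60%).
# All question/answer data here is made up for illustration.

PASS_THRESHOLD = 0.60  # approximate USMLE pass threshold cited above

def score(model_answers: dict, answer_key: dict) -> float:
    """Return the fraction of questions answered correctly."""
    correct = sum(1 for q, a in model_answers.items() if answer_key.get(q) == a)
    return correct / len(answer_key)

# Illustrative data (not from the paper)
answer_key    = {"q1": "B", "q2": "D", "q3": "A", "q4": "C", "q5": "E"}
model_answers = {"q1": "B", "q2": "D", "q3": "C", "q4": "C", "q5": "E"}

accuracy = score(model_answers, answer_key)
print(f"accuracy = {accuracy:.0%}, passing = {accuracy >= PASS_THRESHOLD}")
```

With the toy data above, 4 of 5 answers match the key, giving 80% accuracy, which clears the assumed 60% threshold. The study reported per-exam accuracies exceeding 50% across all three Steps, exceeding 60% in some analyses.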

Interestingly, ChatGPT’s accuracy was lowest for Step 1, followed by Step 2CK, followed by Step 3. This mirrors the subjective difficulty and objective performance for real-world test takers on Step 1, which is collectively regarded as the most difficult exam of the series.
Potential for AI-Assisted Human Learning
Beyond its impressive performance, ChatGPT also demonstrated a high level of concordance and insight in its explanations. This means that the AI model was able to generate responses that were internally consistent and provided novel insights, which could be beneficial for human learners.

In fact, the study found that at least one significant insight was present in approximately 90% of ChatGPT’s outputs. This suggests that ChatGPT could potentially teach medicine by surfacing novel and nonobvious concepts that may not be in learners’ sphere of awareness.
Recommended: GPT-4 is Out! A New Language Model on Steroids
Future Implications
The results of this study are promising and suggest that AI, specifically large language models like ChatGPT, could play a significant role in medical education in the future. The rising accuracy of these models, combined with their ability to generate novel insights, could make them valuable tools for teaching and learning.

Moreover, the study also highlighted the potential for AI to assist with traditionally onerous writing tasks in clinical practice, such as composing appeal letters to payors, simplifying radiology reports to facilitate patient comprehension, and even brainstorming when faced with nebulous and diagnostically challenging cases.
As AI continues to evolve and mature, it’s likely that its impact on medical education and clinical practice will only grow. This study provides a glimpse into that future, demonstrating the potential of AI to enhance the delivery of individualized, compassionate, and scalable healthcare.
You can read the full research paper here:
Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models by Tiffany H. Kung, Morgan Cheatham, Arielle Medenilla, Czarina Sillos, Lorie De Leon, Camille Elepaño, Maria Madriaga, Rimel Aggabao, Giezel Diaz-Candido, James Maningo, Victor Tseng. Published on 2022-12-20. Cited by 339.
Also, join us to stay up to date with the latest tech developments by downloading one of our coding cheat sheets: