Artificial Intelligence Surpasses Doctors in Precision Diagnosis of Eye Disorders

A study led by the University of Cambridge and the University of Birmingham, in collaboration with other colleagues around the world, has unveiled unique findings showcasing the capabilities of the Artificial Intelligence (AI) model GPT-4 in revolutionising the assessment of eye problems.

The study, published in the journal PLOS Digital Health, tested a “large language model” called GPT-4 against medical professionals at different career stages: junior doctors without specialisation, trainee eye doctors, and expert eye doctors. Each was shown a series of 87 patient scenarios involving a specific eye problem and asked to give a diagnosis or treatment advice by selecting from four options.

In the assessment, GPT-4 performed markedly better than unspecialised junior doctors, whose level of specialist eye knowledge is comparable to that of general practitioners. GPT-4’s scores were close to those of both trainee and expert eye doctors, although the top-performing doctors scored higher.

The researchers say that large language models are not expected to replace healthcare professionals, but that they could improve healthcare as part of the clinical workflow. They suggest that advanced large language models such as GPT-4 could be useful for providing eye-related advice, diagnoses, and management suggestions in well-controlled settings, such as triaging patients, or where access to specialist healthcare professionals is limited.

With the rapid pace of AI capabilities being developed, our study highlights the potential of AI to complement and enhance modern medicine, particularly when it comes to healthcare for patients with eye disorders, and providing benefit to patients who may have limited access to specialists.

Dr Darren Ting, senior author of the study, Birmingham Health Partners (BHP) Fellow and Consultant Ophthalmologist, University of Birmingham

GPT-4 and GPT-3.5, or ‘Generative Pre-trained Transformers’, are trained on datasets containing hundreds of billions of words from articles, books, and other internet sources. They are two examples of large language models; others in wide use include Pathways Language Model 2 (PaLM 2) and Large Language Model Meta AI 2 (LLaMA 2). In the study, GPT-3.5, PaLM 2, and LLaMA were also given the same set of questions, and GPT-4 gave more accurate responses than any of them.

GPT-4 powers the online chatbot ChatGPT, which provides bespoke responses to user queries. In recent months, ChatGPT has attracted considerable attention in medicine for achieving passing grades in medical school exams and for giving responses to patient queries that were more accurate and more empathetic than those of human doctors.

The field of AI large language models is moving rapidly. Since the study was conducted, more advanced models have been released, which may be even closer to the level of expert eye doctors.