OpenAI’s GPT-4 has performed remarkably well against eye doctors, according to a new report. The assessments generated by the LLM (Large Language Model) beat those of non-specialist junior doctors and matched those of trainees. The Microsoft-backed Gen AI (Generative Artificial Intelligence) even came close to matching expert eye medics.
GPT-4 answers ophthalmology assessment MCQs
A study published in the journal PLOS Digital Health demonstrates how Gen AI LLMs could help in the medical field. Speaking about the results, Arun Thirunavukarasu, the lead author of the paper, said,
“What this work shows is that the knowledge and reasoning ability of these large language models in an eye health context is now almost indistinguishable from experts. We are seeing the ability to answer quite complicated questions.”
He was referring to GPT-4’s ability to answer MCQs (Multiple Choice Questions) about ophthalmology. In all, the study reportedly posed 87 MCQs to GPT-4. Five expert ophthalmologists, three trainee ophthalmologists, and two non-specialist junior doctors answered the same questions.
👁️ #AI is much better than non-specialist doctors at assessing eye problems and providing advice, Cambridge researchers have found.
GPT-4 could triage patients and decide which #ophthalmology issues are emergencies that need immediate attention👇 https://t.co/nX9OYQb1XR
— Cambridge University (@Cambridge_Uni) April 18, 2024
The researchers built the questionnaire from a textbook used to test trainees on everything from light sensitivity to lesions. It is interesting to note that the contents of the textbook aren’t freely available in the public domain. Hence, the researchers believe it is unlikely that the material was included in the LLMs’ training data, making the test a fairer measure of the models’ ability.
During the study, the researchers gave ChatGPT, running either GPT-4 or GPT-3.5, three attempts to answer each question definitively. If it failed to commit to an answer, they marked the response as “null”.
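For readers curious how such a retry-and-null grading protocol might work in practice, here is a minimal Python sketch. The study’s actual evaluation harness is not public, so `ask_model` and the four-option answer set are purely illustrative assumptions.

```python
# Illustrative sketch only: the study's real evaluation code is not public.
# ask_model is a hypothetical callable standing in for a ChatGPT query;
# it should return the model's chosen option (e.g. "A") or any other
# string if the reply is not a definitive answer.

VALID_OPTIONS = {"A", "B", "C", "D"}  # assumed answer format

def grade_question(ask_model, question, correct_option, max_attempts=3):
    """Give the model up to three attempts to answer definitively.

    Returns True/False for a graded answer, or None ("null") if the
    model never commits to a single option, mirroring the protocol
    described above.
    """
    for _ in range(max_attempts):
        answer = ask_model(question)
        if answer in VALID_OPTIONS:
            return answer == correct_option
    return None  # marked "null" after three indefinite responses
```

Returning None rather than False keeps unanswered questions distinguishable from wrong answers when the scores are tallied.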
GPT-4 beats some eye doctors but can’t match experts yet
Across the 87 patient-scenario questions, GPT-4 reportedly outperformed the junior doctors and came close to the specialists. Specifically, GPT-4 answered 60 of the 87 questions correctly (about 69 percent), while the junior doctors averaged just 37 correct answers.
Trainee ophthalmologists came very close with an average of 59.7 correct answers. Barring one expert, who correctly answered 56 MCQs, the specialists averaged 66.4 right answers.
Comparatively, PaLM 2 managed to get 49 answers correct, GPT-3.5 got only 42, and LLaMA trailed the pack with just 28 correctly answered MCQs.
It is important to note that the study was conducted in mid-2023, so current versions of these LLMs have likely become considerably better at understanding and answering complex queries.
The health industry would undoubtedly benefit from ChatGPT, Gemini, and other Gen AI platforms. However, some medical experts have cautioned against relying on Gen AI to diagnose patients, stating that such platforms “lack nuance”. Some of the researchers likewise warned that the risk of inaccuracy remains concerningly high.