ChatGPT gave better medical advice than real doctors in blind study

Thursday, 04 May 2023 03:09

When it comes to answering medical questions, can ChatGPT do a better job than human doctors?

It appears to be possible, according to the results of a new study published in JAMA Internal Medicine, led by researchers from the University of California San Diego.

The researchers compiled a random sample of nearly 200 medical questions that patients posted on Reddit, a popular social discussion website, for doctors to answer. Next, they entered the questions into ChatGPT (OpenAI’s artificial intelligence chatbot) and recorded its responses.

A panel of health care professionals then evaluated both sets of responses for quality and empathy.

For nearly 80% of the answers, the chatbot won out over the real doctors.

"Our panel of health care professionals preferred ChatGPT four to one over physicians," said lead researcher Dr. John W. Ayers, PhD, vice chief of innovation in the Division of Infectious Diseases and Global Public Health at the University of California San Diego.

AI language models could help relieve message burden, doctor says

One of the biggest problems facing today’s health care providers is that they're overburdened with messages from patients, Ayers said.

"With the rise in online remote care, doctors now see their patients first via their inboxes — and the messages just keep piling up," he said in an interview with Fox News Digital.

The influx of messages could lead to higher levels of provider burnout, Ayers believes.

"Burnout is already at an all-time high — nearly two out of every three physicians report being burned out in their jobs, and we want to solve that problem," he said.

Yet there are millions of patients who are either getting no answers or unsatisfactory ones, he added.

Thinking of how artificial intelligence might help, Ayers and his team turned to Reddit to demonstrate how ChatGPT could present a possible solution to the backlog of patient questions facing providers.

Reddit has a "medical questions" community (a "subreddit" called r/AskDocs) with nearly 500,000 members. People post questions, and vetted health care professionals provide public responses.

The questions are wide-ranging, with people asking for opinions on cancer scans, dog bites, miscarriages, vaccines and many other medical topics.

One poster worried he might die after swallowing a toothpick. Another posted explicit photos and wondered if she’d contracted a sexually transmitted disease. Someone else sought help with feelings of impending doom and imminent death.

"These are real questions from real patients and real responses from real doctors," Ayers said.

"We took those same questions and put them into ChatGPT — then put them head to head with the doctors’ answers."

Doctors rated responses on quality, empathy

After randomly selecting the questions and answers, the researchers presented them to health care professionals who are actively seeing patients.

They were not told which responses were provided by ChatGPT and which were provided by doctors.

First, the researchers asked them to judge the quality of the information in the message.

When assessing quality, there are multiple attributes to consider, Ayers said. "It could be accuracy, readability, comprehensiveness or responsiveness," he told Fox News Digital.

Next, the evaluators were asked to judge empathy.

"It's not just what you say, but how you say it," Ayers said. "Does the response have empathy and make patients feel that their voice is heard?"

ChatGPT was three times more likely to give a response that was very good or good compared to physicians, he told Fox News Digital. The chatbot was 10 times more likely to give a response that was either empathetic or very empathetic compared to physicians.

It’s not that the doctors don’t have empathy for their patients, Ayers said — it’s that they’re overburdened with messages and don’t always have the time to communicate it.

"An AI model has infinite processing power compared to a doctor," he explained. "Doctors have resource constraints, so even though they're empathetic toward their patient, they often zero in on the most probable response and move on."

ChatGPT, with its limitless time and resources, might offer a holistic response of all the considerations that doctors are sampling, Ayers said.

Vince Lynch, AI expert and CEO of IV.AI in Los Angeles, California, reviewed the study and was not surprised by the findings.

"The way AI answers questions is often curated so that it presents its answers in a highly positive and empathetic way," he told Fox News Digital. "The AI even goes beyond well-written, boilerplate answers, with sentiment analysis being run on the answer to ensure that the most positive answers are delivered."

An AI system also uses something called "reinforcement learning," Lynch explained, which is when it tests different ways of answering a question until it finds the best answer for its audience.

"So, when you compare an AI answering a question to a medical professional, the AI actually has far more experience than any given doctor in relation to appearing empathetic, when in reality it is just mimicking empathetic language in the scenario of medical advice," he said.

The length of the responses could have also played a part in the scores they received, pointed out Dr. Justin Norden, a digital health and AI expert and a professor at Stanford University in California, who was not involved in the study.

"Length in a response is important for people perceiving quality and empathy," Norden told Fox News Digital. "Overall, the AI responses were almost double in length compared with the physician responses. Further, when physicians did write longer responses, they were preferred at higher rates."

Simply requesting physicians to write longer responses in the future is not a sustainable option, Norden added.

"Patient messaging volumes are going up, and physicians simply do not have time," he said. "This paper showcases how we might be able to address this, and it potentially could be very effective."

AI answers could be ‘elevated’ by real doctors

Rather than replacing doctors’ guidance, Ayers is suggesting ChatGPT could act as a starting point for physicians, helping them field large volumes of messages more quickly.

"The AI could draft an initial response, then the medical team or physician would evaluate it, correct any misinformation, improve the response and [tailor it] to the patient," Ayers said.

It’s a strategy that he refers to as "precision messaging."

He said, "Doctors will spend less time writing and more time dealing with the heart of medicine and elevating that communication channel."

"This will be a game changer for the patients that we serve, helping to improve population health and potentially saving lives," Ayers predicted.

Based on the study’s findings, he believes physicians should start implementing AI language models in a way that presents minimal risk.

"People are going to use it with or without us," he said — noting that patients are already turning to ChatGPT on their own to get "canned messages."

Some players in the space are already moving to implement ChatGPT-based models. Epic, the health care software company, recently announced it is teaming up with Microsoft to integrate GPT-4 into its electronic health record software.

Potential benefits balanced by unknown risks

Ayers said he is aware people will be concerned about the lack of regulation in the AI space.

"We typically think about regulations in terms of stop signs and guard rails — typically, regulators step in after something bad has happened and try to prevent it from happening again, but that doesn't have to be the case here," he told Fox News Digital.

"I don't know what the stop signs and guard rails necessarily should be," he said. "But I do know that regulators could set what the goal line is, meaning the AI would have to be demonstrated to improve patient outcomes in order to be implemented."

One potential risk Norden flagged is whether patients’ perceptions would change if they knew the responses were written or aided by AI.

He cited a previous study focused on mental health support, which found that AI messages were far preferred to human ones.

"Interestingly, once the messages were disclosed as being written by AI, the support felt by the receiver of these messages disappeared," he said.

"A worry I have is that in the future, people will not feel any support through a message, as patients may assume it will be written by AI."

Tinglong Dai, professor of operations management and business analytics at the Johns Hopkins Carey Business School in Baltimore, Maryland, expressed concern about the study’s ability to represent real scenarios.

"It is important to note that the setting of the study may not accurately reflect real-world medical practice," he told Fox News Digital.

"In reality, physicians are paid to provide medical advice and have significant liabilities as a result of that advice. The claim that AI will replace doctors is premature and exaggerated."

Study highlights ‘new territory’ for AI in health care

While there are numerous unknowns, many experts seem to agree this is a first-of-its-kind study that could have far-reaching implications.

"Overall, this study highlights the new territory we are moving into for health care — AI being able to perform at the physician level for certain written tasks," said Norden.

"When physicians are suffering from record levels of burnout, you see why Epic and partners are already planning to incorporate these tools into patient messaging."

Fox News