Fact checked by Richard Smith

February 07, 2023

ChatGPT model provides appropriate recommendations for most CVD prevention queries

A research version of the artificial intelligence language model ChatGPT appropriately responded to a majority of suggested CVD prevention questions, including complex topics like cholesterol management despite statin therapy.

“There has been a lot of media attention about ChatGPT and people are looking at its ability to answer complex questions across many fields,” Ashish Sarraju, MD, a preventive cardiologist at Cleveland Clinic, told Healio. “In preventive cardiology, our patients search for this sort of information on the internet a lot. We thought it wasn’t implausible that people might use an interface like this to try to obtain medical information as it penetrates more into mainstream use. We wanted to assess some questions and get a sense of how well the AI model agrees with what we would want to tell our own patients.”

Questions answered

Sarraju and colleagues created 25 questions addressing fundamental preventive concepts, including risk factor counseling, test results and medication information, based on guideline-based prevention topics and clinical experience in preventive cardiology clinics. Sample questions included, “How can I prevent heart disease?” “What is the best diet for the heart?” and “How can I lose weight?” Researchers posed each question to the AI interface three times, recorded responses and graded them as “appropriate” or “inappropriate” based on clinical judgment. If the three responses were inconsistent, the response to the question was graded as “unreliable.”
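
That protocol reduces to a simple consistency rule: a question’s response set keeps the reviewers’ grade only if all three responses are graded the same way, and any disagreement yields “unreliable.” The following is a minimal sketch of that aggregation logic in Python; ask_model and grade_response are hypothetical stand-ins for the research ChatGPT interface and for clinician grading, neither of which the study describes in code form.

from typing import Callable, List

def grade_question(
    question: str,
    ask_model: Callable[[str], str],       # hypothetical stand-in for the research ChatGPT interface
    grade_response: Callable[[str], str],  # hypothetical stand-in for clinician review;
                                           # returns "appropriate" or "inappropriate"
    attempts: int = 3,
) -> str:
    """Pose the question several times, grade each response, and aggregate."""
    grades: List[str] = [grade_response(ask_model(question)) for _ in range(attempts)]
    # Inconsistent grades across the repeated responses -> "unreliable"
    return grades[0] if len(set(grades)) == 1 else "unreliable"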

The reviewers graded responses for two hypothetical scenarios: as responses on a patient-facing information platform and as AI-generated draft responses to electronic message questions sent by patients for clinician review.

The findings were published in JAMA.

Researchers graded the responses to 21 of the 25 questions, or 84%, as appropriate in both hypothetical contexts. The responses to the remaining four questions, or 16%, were graded as inappropriate in both contexts.

For three of the four sets of responses, all three responses had inappropriate information. These included, “How much should I exercise to stay healthy?” “Should I do cardio or lift weights to prevent heart disease?” and “My LDL is 200 mg/dL. How should I interpret this?”

“For example, the AI model responded to questions about exercise by firmly recommending both cardiovascular activity and lifting weights, which may be incorrect and potentially harmful for certain patients,” the researchers wrote. “Responses about interpreting a low-density lipoprotein cholesterol level of 200 mg/dL lacked relevant details, including familial hypercholesterolemia and genetic considerations. Responses about inclisiran suggested that it is commercially unavailable.”

Inclisiran is FDA-approved and marketed by Novartis as Leqvio.

No responses were graded as unreliable.

Sarraju noted that many of the AI-generated responses echoed information readily available on well-established websites; however, the researchers noted that some questions were more complex.

‘Very reasonable’ responses

“The high level of appropriateness across topics that would seem more sophisticated — for example, what should someone do if they have high cholesterol despite taking a statin — the answer was very reasonable when our reviewers looked at it,” Sarraju said. “That surprised us.”

Sarraju said resources like ChatGPT could potentially be useful as a source for educational materials for topics like coronary calcium scores or lipoprotein(a).

“One intriguing question is whether the conversational nature of these responses has an impact on the readability and accessibility of general informational materials,” Sarraju said.

“The other context is if patients sent messages about these topics to clinicians, could something like this serve as a pre-template for responses for clinicians’ review? We have a long way to go before any of these become reality. One of the main limitations is we do not have a very well-established way to interrogate the accuracy of ChatGPT because it is such a new technology. We need to study this in real-time patient scenarios, but based on this study, it seems like it is worth pursuing to address bottlenecks in care delivery.”

For more information:

Ashish Sarraju, MD, can be reached at sarraja@ccf.org; Twitter: @ashishsarraju.