April 27, 2024

Natural language processing can inform real-time MDRO screening

Key takeaways:

  • Natural language processing can analyze patient history and physical notes.
  • A study found that it may be able to help hospitals make MDRO screening decisions.

HOUSTON — Natural language processing could reduce the need for manual chart reviews to determine a patient’s risk for carrying or being infected with multidrug-resistant organisms, according to a study.

The study, presented at the Society for Healthcare Epidemiology of America Spring Conference, tested a natural language processing (NLP) algorithm’s ability to analyze clinician notes for terms related to long-term care facility (LTCF) exposure, which is a risk factor for carriage of multidrug-resistant organisms (MDROs).

Data derived from Goodman K, et al. Abstract 14. Presented at: Society for Healthcare Epidemiology of America Spring Conference; April 16-19; Houston.

Randomly sampling 1,020 adult admissions across the 12-facility University of Maryland hospital system between 2016 and 2021, Katherine Goodman, PhD, JD, assistant professor in the department of epidemiology and public health at the University of Maryland School of Medicine, and colleagues manually reviewed each patient’s history and physical (H&P) notes for mention of LTCF exposure.

Of the 1,020 patients, 7% had H&P notes that documented LTCF exposure, meaning either exposure in the preceding 90 days or classification as an LTCF resident. Goodman and colleagues trained a machine learning model to identify LTCF exposure from the H&P notes, which it did with a C-statistic of 0.89 (95% CI: 0.80-0.98), indicating that it discriminated well between notes with and without documented exposure.

In clinician H&P notes, the researchers found that the most important predictor words for LTCF exposure were “rehab,” “place,” “status,” “EGD” — for esophagogastroduodenoscopy — “dementia” and “facility.”
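For readers unfamiliar with the C-statistic, the following is a minimal illustrative sketch, not the study's model. It assumes a toy scorer that simply counts how many of the reported predictor words appear in a note, with equal weights, and then computes the C-statistic as the share of exposed/unexposed note pairs that the score ranks correctly. The example notes, the function names `term_score` and `c_statistic`, and the equal weighting are all hypothetical.

```python
import re

# Predictor words reported in the article; equal weighting is an assumption.
PREDICTOR_TERMS = {"rehab", "place", "status", "egd", "dementia", "facility"}


def term_score(note: str) -> int:
    """Count the distinct predictor terms that appear in a free-text note."""
    tokens = set(re.findall(r"[a-z]+", note.lower()))
    return len(tokens & PREDICTOR_TERMS)


def c_statistic(scores, labels):
    """Share of (exposed, unexposed) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one exposed and one unexposed note")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Invented notes for illustration only (not study data); 1 = LTCF exposure.
notes = [
    ("Admitted from rehab facility, history of dementia.", 1),
    ("Lives at home with spouse, presents with chest pain.", 0),
    ("Transferred from skilled nursing facility after fall.", 1),
    ("Mental status normal, no recent hospitalizations.", 0),
]
scores = [term_score(text) for text, _ in notes]
labels = [label for _, label in notes]
auc = c_statistic(scores, labels)
print(auc)  # 0.875
```

A value of 0.5 would mean the score ranks pairs no better than chance, and 1.0 would mean every exposed note scores higher than every unexposed one. The fourth invented note shows why single words are imperfect: "status" appears in a note with no LTCF exposure.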

We spoke with Goodman at the conference to find out how useful this algorithm and others could be for clinicians.

Healio: What is natural language processing?

Goodman: Natural language processing is an approach of using automated techniques, statistical and machine learning-based classifiers to analyze and extract information from free text data.

Most traditional models can’t function on what's called unstructured data, so that's “free text notes.” Normally, in most models, it has to be what’s called structured data. So, if you're analyzing data from an EHR, it's things that are extractable — such as sex: male, one, female, zero — they're in a structured, machine-readable format. Natural language processing is using automated techniques to be able to actually analyze free text data that traditionally had to be done by a human.
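The contrast Goodman describes can be sketched in a few lines. This is a hypothetical illustration only: the `Admission` record, its field names, and the phrase list are assumptions, not any real EHR schema or the study's method. A structured field can be read directly, while the same kind of fact in a free-text note has to be found by some form of text analysis, here a simple pattern match.

```python
import re
from dataclasses import dataclass


@dataclass
class Admission:
    sex_male: int  # structured field: 1 = male, 0 = female
    hp_note: str   # unstructured free-text history and physical note


# Hypothetical phrase list; a real system would need a validated vocabulary.
LTCF_PATTERN = re.compile(
    r"\b(nursing home|long[- ]term care|rehab(ilitation)? facility)\b",
    re.IGNORECASE,
)


def mentions_ltcf(note: str) -> bool:
    """Return True if the note contains one of the assumed LTCF phrases."""
    return LTCF_PATTERN.search(note) is not None


admission = Admission(
    sex_male=0,
    hp_note="Pt transferred from Nursing Home after a fall.",
)

# Structured data: a direct lookup of a coded value.
is_male = admission.sex_male == 1

# Unstructured data: the information must be extracted from the text.
ltcf_flag = mentions_ltcf(admission.hp_note)
```

A pattern match like this is far cruder than a trained classifier. It would, for example, also flag a note saying the patient "denies nursing home stay," which is one reason statistical and machine learning approaches are used instead.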

Healio: What did this algorithm do?

Goodman: A lot of our work is focused on patients coming into a hospital carrying highly drug resistant bacteria silently in their guts. They will be asymptomatic, but they can transmit those bacteria to other patients on the floor and pose a risk to them. Or, they themselves, if they get abdominal surgery or become immunocompromised, could have that silent carriage bloom into a full-blown infection.

There are many reasons that actually knowing whether a patient is carrying these bacteria is valuable, but it's not practical to screen everyone at hospitals. So, we try to develop targeted screening algorithms that identify the highest risk patients, and we can just screen them.

The most important risk factors for coming into the hospital and carrying these bacteria are things that happen to you before you reach the hospital. One of the absolute most important being, were you recently or are you currently a resident of [a] long-term care facility, where there's a lot of antibiotic use and a lot of other exposures that drive pressure and select for these organisms? But shockingly, in effectively no electronic health record systems in the country is there a structured data field for whether patients have recent long-term care facility exposure.

Healio: But that is recorded in the text notes from doctors and nurses?

Goodman: Yes, it’s in the notes [from] when you come in and they do your intake history — not always, but that’s the most likely place that it will be documented. Or it'll be mentioned incidentally. There's no way to get that meaningful information [from the notes] out of this effectively “locked” data without something like NLP or a human. We've traditionally had humans do chart review, but it's very time consuming and it's not practical for a targeted algorithm that needs to fire right when that patient hits the hospital to say, “Yep, high risk, screen them for drug resistant bacteria.”

Healio: Who would use a system like this?

Goodman: In an ideal world ... it would pop up in the EHR, when the patient's either in the ED about to get admitted or when they come to the floor. It would say, “screen,” and then you would do a rectal swab and send it to the lab.

We could see value in this for infection preventionists and hospital infection control committees, who are often doing this type of work, even for outbreak investigations when they look back at the chart to ascertain risk factors that were missed or were meaningful. You could also see value from these types of automated tools for that type of retrospective work. And certainly, for research. This comes up time and again in our research on drug-resistant bacteria. You need to know if a patient in a long-term care facility was exposed. Right now, the only way we can do that in research is if we have a medical student or a research assistant — or the investigator themselves — going through every patient's chart and looking for that information.

Healio: What’s the next step in developing this for real-world use?

Goodman: We are going to evaluate some other classifiers. We got very good performance even from what is a very simple classifier, but you do have to have comparisons. We are going to validate and apply the same process at an outside health care system that’s not the University of Maryland medical system. And then, really, the elephant in the room is that the period in which this work started, large language models and generative AI came on the scene. They excel at working with natural language, or free text unstructured data. So, we are now starting the piloting work on actually performing the same process on the same notes here and head-to-head to a large language model generative AI.