June 01, 2022

Q&A: Machine learning models identify patients with long COVID

NIH researchers using machine learning models identified key characteristics of people who have or likely have long COVID.

The findings were published in The Lancet Digital Health.

“The most powerful predictors in these models are outpatient clinic utilization after acute COVID-19, patient age, dyspnea and other diagnosis and medication features that are readily available in the electronic health record,” Emily R. Pfaff, PhD, MS, a research assistant professor in the department of medicine at the University of North Carolina School of Medicine, and colleagues wrote. “The model is transparent and reproducible and can be widely deployed in individual health care systems to enable local research recruitment or secondary data analysis.”

The researchers used the National COVID Cohort Collaborative’s (N3C) EHR repository to develop three XGBoost machine learning models. The models were created to identify potential patients with long COVID based on data from 597 patients who visited a long COVID clinic as well as data from 97,995 adult patients who had COVID-19. The N3C has amassed over 8 million EHRs.

Overall, non-hospitalized patients who received care at a long COVID clinic were disproportionately women, and hospitalized patients with long COVID were disproportionately Black and more likely to have a comorbidity. Pfaff and colleagues reported that the models identified potential long COVID cases “with high accuracy.” Although the models could not definitively categorize key features of long COVID, recurring trends included post-COVID-19 respiratory symptoms and associated treatments, non-respiratory symptoms, preexisting risk factors for greater acute COVID-19 severity and proxies for hospitalization.

“These models are not designed to be used in the care setting — rather, they are designed for use in big data environments, where it is necessary to identify large numbers of patients who look like they may have long COVID,” Pfaff told Healio.

Healio spoke with Pfaff to learn more about development of the models and what they revealed about long COVID.

Healio: Can you describe the basics of the models you developed?

Pfaff: Because specialty care for long COVID is only available for a subset of patients with long COVID, it’s important that we don’t rely exclusively on those patients who have received specialty care to learn about long COVID in the general population. At the same time, we cannot research long COVID without knowing who long COVID patients are, and the clinical patterns they share. Our models are therefore intended to identify patients in EHR data who “look like” patients seen by physicians for long COVID. Algorithms like this allow us to apply patterns to find patients in big data resources who might otherwise escape our (human) notice, if they have not been expressly “labeled” as having long COVID using a diagnostic code. Once identified algorithmically, patients can potentially be contacted for participation in clinical trials, or their deidentified data can be used for retrospective analysis.

Healio: How do your models differ from other methods of identifying long COVID?

Pfaff: The definition of long COVID is very much in flux; one physician’s method of identifying whether a patient has long COVID may differ greatly from another’s. Our method is specifically designed to operate in a big data environment, where millions of patients and their clinical features can be considered at once, rather than one by one in a clinic. It is by no means designed to be “better” or “more accurate” than a physician’s judgment; rather, it is intended to apply, at a very large scale, patterns that a clinician might notice. This scale, as well as the geographic and demographic diversity of the patient population in our repository, makes our approach unique.

Healio: Given that patients with long COVID warrant special care, do you think this kind of care is accessible?

Pfaff: It is clear that specialty care for long COVID is not available to as many people as could benefit from that care. In addition to the scarcity of specialists in this new area of study, access to care in general is an ongoing issue that affects long COVID as well. If you cannot take time off from work or find childcare, or if you lack insurance or reliable transportation, it is nearly impossible to seek and obtain the level of care necessary to help patients with this debilitating condition. Care access is a critical piece of the long COVID puzzle.

Healio: How has your understanding of long COVID changed since the formation of the models?

Pfaff: In October of 2021, a diagnosis code for long COVID was released for general use by clinicians (ICD-10 code U09.9). This means that our model is no longer limited to data from patients who sought long COVID specialty care, as this diagnosis code is used in many clinical contexts. Since the original paper was published, we have used data from patients with this new code to further refine the model. In this refined model, though shortness of breath and other respiratory issues are still very important features, we also see prominent non-respiratory features moving towards the top of the list, like fatigue, malaise, heart palpitations, gastrointestinal symptoms and pain. Long COVID is clearly much more than lingering respiratory symptoms from the acute COVID infection and should be treated as such.

Healio: What are the next steps for this research?

Pfaff: We will continue to refine the model as described above, particularly as more and more data become available over time. This work leads nicely into further work attempting to cluster long COVID patients into distinct subphenotypes, which may warrant different treatment.

References:

Pfaff ER, et al. Lancet Digit Health. 2022;doi:10.1016/S2589-7500(22)00048-6.

Scientists identify characteristics to better define long COVID. https://ncats.nih.gov/news/releases/2022/scientists-identify-characteristics-to-better-define-long-COVID. Published May 17, 2022. Accessed May 26, 2022.