March 16, 2020

Safety, reliability crucial in AI development for ECG readings

The use of artificial intelligence has been a hot topic in cardiology for the past few years. For example, deep neural networks can be used to analyze ECG tracings and may be more accurate than human experts.

Even with these advantages, there may be some hesitation about relying on this technology completely. In a recent research letter published in Nature Medicine, researchers developed a way to construct smoothed adversarial examples for single-lead ECGs. They found that when subtle adversarial perturbations that are indistinguishable to the human eye were added to ECG tracings, the misdiagnosis rate of the deep learning algorithm was 74%.

Healio spoke with senior author Lior Jankelson, MD, PhD, director of the cardiovascular genetics program at the Heart Rhythm Center at NYU Langone Health, to learn more about how professionals should develop machine learning systems with safety and reliability in mind while also understanding this technology’s potential.

Question: Why was it so important to conduct this research?

Lior Jankelson, MD, PhD, director of the cardiovascular genetics program at the Heart Rhythm Center at NYU Langone Health

Answer: This research stems from the vast expansion in the use of machine learning, artificial intelligence, deep learning and other names for these new, exciting technologies, which enable many medical applications that were impossible until recently and are now taking over almost every field in medicine, particularly cardiology.

Given that immense growth in interest, this is an important study because it describes, for the first time, a potentially very important hurdle that could call for further research and development.

Q: What do the findings add to the knowledge base?

A: Essentially what we showed in this work is that we can easily generate perturbed examples of ECG tracings that look identical to the human reader, even to the expert electrophysiologist. A human reviewer would not see any difference between the original ECG tracing and the modified ECG tracing, because we added a very subtle ... intentional noise. Yet this perturbed ECG completely disrupts the best-in-class deep learning algorithm for ECG interpretation.
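To make the idea of a smooth, low-amplitude perturbation concrete, here is a minimal sketch in Python. It is not the method from the Nature Medicine letter: the "classifier" is a hypothetical linear detector, the tracing is a synthetic sine wave rather than a real ECG, and every name and parameter (`gaussian_kernel`, `smoothed_perturbation`, `eps`, the kernel width) is an assumption chosen for illustration. The sketch only shows the general recipe of taking an adversarial direction, smoothing it with a Gaussian kernel so it has no abrupt jumps, and capping its amplitude.

```python
import numpy as np

def gaussian_kernel(width: int, sigma: float) -> np.ndarray:
    """Return a normalized 1-D Gaussian kernel with `width` taps."""
    offsets = np.arange(width) - (width - 1) / 2.0
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()

def predict(w: np.ndarray, x: np.ndarray) -> int:
    """Toy linear detector (an assumption): class 1 if w @ x > 0, else class 0."""
    return int(w @ x > 0)

def smoothed_perturbation(w, label, eps, kernel):
    """Signed-gradient direction against the true label, smoothed and capped at eps."""
    # For a linear score w @ x, the gradient with respect to x is w itself.
    direction = -np.sign(w) if label == 1 else np.sign(w)
    # Gaussian smoothing removes the square-wave jumps of the raw sign pattern.
    smooth = np.convolve(direction, kernel, mode="same")
    # Rescale so that no sample moves by more than eps.
    return eps * smooth / np.max(np.abs(smooth))

# Synthetic "tracing": a large 4-cycle wave plus a faint 6-cycle component.
n = 400
t = np.arange(n) / n
x = np.sin(2 * np.pi * 4 * t) + 0.03 * np.cos(2 * np.pi * 6 * t)

# Hypothetical detector that keys on the faint 6-cycle component.
w = 0.05 * np.cos(2 * np.pi * 6 * t)

eps = 0.05  # maximum change per sample, 5% of the main wave's amplitude
delta = smoothed_perturbation(w, label=1, eps=eps, kernel=gaussian_kernel(25, 4.0))

clean_pred = predict(w, x)
adv_pred = predict(w, x + delta)
print("clean prediction:", clean_pred, "| perturbed prediction:", adv_pred)
```

In this toy setup the perturbation never changes a sample by more than `eps`, yet it is aligned with exactly the faint component the detector relies on, which is the kind of brittleness described in the interview. Real deep networks and real ECGs are far more complex, so this should be read only as an intuition aid.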

Q: What are the advantages of constructing smoothed adversarial examples for single-lead ECGs?

A: In its essence, the paper describes a soft point or a potential disadvantage of deep learning algorithms in the sense that they’re very brittle and sensitive to these very fine perturbations that are indistinguishable to the human reader but are destructive to the algorithm. The importance of that is that before we deploy these algorithms in an unsupervised manner — let’s say we put them in millions of wearable ECG recorders that are used by users and getting interpreted by a machine learning algorithm — we have to make sure that these algorithms are resistant to the types of adversarial examples that we’ve shown in that paper.

Q: In your paper, you and your colleagues wrote, “These findings question the safety of using deep learning in analyzing ECGs at a scale where millions of tests may be run every week by widespread consumer devices.” What needs to be done to make this process safer, especially since we’ve been hearing a lot about the use of deep learning lately?

A: As of now, I would basically stick to the prevailing recommendation, which is that every ECG, if it’s done for a medical purpose — for instance, if it’s done for the purpose of finding an arrhythmia or deciding on initiation or cessation of treatment — every decision like that should be taken based on a human expert interpretation of the ECG based on personal review of the tracing rather than by counting on a machine to do that. Again, this is not to say that machine learning algorithms are not correct. They’re very accurate, but there’s still work to be done until we can ensure that they could be completely independent and resistant to various types of perturbations.

Q: Do you think there will be a time when, eventually, machine learning can analyze these ECGs and they themselves can determine what a patient has without human analysis?

A: In reality, that stage has already passed. There have been multiple publications of machine learning algorithms that interpret ECGs in a better and more accurate way than even the best of the best of experts.

Yet these algorithms may be vulnerable to problems that are basically not a problem for a human being. This is what we’re showing in the paper. We’re designing a perturbation that poses no problem for the human reader, because we simply ignore it.

There is a delicate interplay between accuracy and robustness. For now, we can design algorithms that are very accurate. Next, we need to make sure that we understand exactly the robustness boundaries of deep learning and manage the characteristics that can upset this balance between accuracy and robustness. Until we do that, we should probably stick to having every ECG validated by a physician or expert. There is little question that in the future, we will have to depend on completely automated classification, so we had better get to work.

Q: What further research is needed in this area?

A: We need to come up with better, enhanced algorithms that take into account the specific sensitivities unique to deep learning models, such as adversarial examples, and we need to make the algorithms resistant to these examples. There are many ways to do that, and we are actually exploring some these days.

One conceptual way would be to train these algorithms in the presence of potential perturbations, ie, potential adversarial examples. This approach can be challenging because we would need to anticipate each potential “noise” the algorithm may encounter. Another way would be to combine machine learning algorithms with traditional physiologically driven classifiers. On one hand, we get the benefit of the accuracy of the neural network and other machine learning methodologies, and on the other hand, we get at least something of the physiological reasoning and rationale that we as humans apply when we read ECGs. – by Darlene Dobkowski
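The first approach mentioned above, training in the presence of adversarial examples, can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the authors' algorithm: the model is plain logistic regression, the data are two noise-free synthetic points with one strong feature and 200 faint ones, and the function names (`fgsm`, `train`, `accuracy`) and settings (`eps`, `lr`, `steps`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(X, y, w, eps):
    """Worst-case perturbation of size eps per feature for a linear model (labels +1/-1)."""
    # Moving each sample by eps against y * sign(w) lowers its margin the most.
    return X - eps * y[:, None] * np.sign(w)[None, :]

def train(X, y, eps=0.0, lr=0.1, steps=300):
    """Logistic regression by gradient descent; eps > 0 trains on perturbed inputs."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # Adversarial training: regenerate the perturbed inputs for the current weights.
        X_used = fgsm(X, y, w, eps) if eps > 0 else X
        margins = y * (X_used @ w)
        # Gradient of mean log(1 + exp(-margin)) with respect to w.
        grad = -(sigmoid(-margins) * y) @ X_used / len(y)
        w -= lr * grad
    return w

def accuracy(X, y, w):
    return float(np.mean(np.sign(X @ w) == y))

# Toy data (an assumption): one strong feature and 200 faint ones, all tied to the label.
pattern = np.concatenate([[1.0], np.full(200, 0.1)])
X = np.vstack([pattern, -pattern])
y = np.array([1.0, -1.0])
eps = 0.2

w_std = train(X, y)            # standard training on clean inputs
w_rob = train(X, y, eps=eps)   # adversarial training

acc_std_clean = accuracy(X, y, w_std)
acc_std_adv = accuracy(fgsm(X, y, w_std, eps), y, w_std)
acc_rob_clean = accuracy(X, y, w_rob)
acc_rob_adv = accuracy(fgsm(X, y, w_rob, eps), y, w_rob)
print(acc_std_clean, acc_std_adv, acc_rob_clean, acc_rob_adv)
```

In this toy setting, the standard model leans on the many faint features and is flipped by the perturbation, while the adversarially trained model learns to rely on the strong feature and is not. As the interview notes, the hard part in practice is anticipating which perturbations a deployed algorithm will actually meet; this sketch assumes the perturbation type is known in advance.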

Reference:

Han X, et al. Nat Med. 2020;doi:10.1038/s41591-020-0791-x.

For more information:

Lior Jankelson, MD, PhD, can be reached at lior.jankelson@nyumc.org.

Disclosures: The authors report no relevant financial disclosures.