Understanding artificial intelligence bias in pulmonology
Key takeaways:
- There are statistical/computational, human and systemic biases in AI.
- Harms from bias propagate quickly in AI because of the learning, automated nature of these systems.
- Understanding the data and assumptions behind a specific AI algorithm can help reduce potential bias.
Last year, I partnered with colleagues at the National Institute of Standards and Technology, or NIST, to examine the challenges represented by artificial intelligence bias and identify some preliminary guidance for addressing them.
In the publication we produced, NIST Special Publication 1270, we identified three broad sources of artificial intelligence (AI) bias: statistical or computational, human and systemic.
Statistical or computational biases arise when there are issues with data quality or representativeness, or due to the “flattening” that occurs during mathematical abstraction needed to create a computational model. Until recently, most attempts to address AI bias have been focused on this type of bias. Modifying data selection practices to address representativeness and/or using statistical debiasing techniques before, during and after model development are common approaches. Although there has been significant progress, statistical and computational debiasing methods for AI currently remain an open and active area of research.
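As a simple illustration of one common pre-processing approach, the Python sketch below reweights records from an under-represented group so that each group contributes comparably when a model is fit. The dataset, group labels and weighting rule are synthetic placeholders for illustration, not a recommendation for any particular clinical dataset or a description of the methods in the NIST publication.

```python
# Minimal sketch of pre-processing reweighting, assuming a tabular dataset
# with a known (hypothetical) group attribute; all data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1_000

# Synthetic cohort in which group "B" is under-represented (10% of records).
group = rng.choice(["A", "B"], size=n, p=[0.9, 0.1])
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * (group == "B") + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Weight each record inversely to its group's frequency so both groups
# contribute comparably during fitting (one simple debiasing heuristic).
counts = {g: int(np.sum(group == g)) for g in np.unique(group)}
weights = np.array([n / (len(counts) * counts[g]) for g in group])

model = LogisticRegression()
model.fit(X, y, sample_weight=weights)  # scikit-learn estimators accept per-sample weights
```

Reweighting is only one of many techniques; in-training constraints and post-hoc threshold adjustments are other options, and none of them addresses human or systemic bias on its own.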
Bias arising from human and systemic sources has been largely overlooked until recently. Human bias, which can have cognitive and/or social dimensions, influences decision-making and may lead to systematic errors, even when no prejudice or bias is intended. Strategies for multistakeholder engagement and attention to diversity, equity and inclusion throughout the AI lifecycle can help to mitigate human biases.
Identification of systemic biases, which tend to be either historical, societal or institutional in nature, is important for key decisions in data selection and model design, as well as interpreting model results and understanding the limits to their generalization.
Understanding biases and their sources is fundamental for managing them in a way that lessens harmful or disparate impacts. Although it is not possible to directly control all sources of bias or to totally eliminate bias, amplification and propagation of harmful biases can be avoided by adopting integrated strategies to address biases from all three sources. That involves framing AI as a sociotechnical system, so that AI is built and deployed in a way that is consistent with societal values of fairness, equity and transparency.
Human factor challenges
AI systems often are characterized as “autonomous” or “black box.” Such framing makes it easy to overlook that people are involved in key decisions throughout an AI system lifecycle.
In addition, people and communities are impacted by the output of AI systems, sometimes in ways that ripple through society with little opportunity for redress. Human factor challenges are related to how people, with their human and systemic biases, impact the AI system and how people and communities are impacted by the AI system.
A persistent human factor challenge related to AI bias is that AI systems often are built and tested to optimize performance under idealized conditions for a task or use case. Without explicit consideration of the context where the system will be used, design and development become disconnected from the intentions and values of users, and potential impacts receive little attention. This disconnect contributes to biased output when an otherwise “accurate” system is deployed in settings for which it was not designed.
A second area of human factor challenge concerns effective use of human-in-the-loop at all stages of the AI lifecycle. Human-in-the-loop is sometimes referenced optimistically as a catch-all solution for faulty or flawed AI system decisions. However, to date, guidance and support for human-in-the-loop configurations have been sporadic, and few configurations have been validated broadly in production systems. Also, while use of multidisciplinary teams and multistakeholder engagement are key strategies for addressing human bias, there is no current guidance or model for implementing these strategies in the AI lifecycle.
Finally, human factor challenges arise concerning the ability to interpret and validate the output of AI systems. Unless AI system operators and users are involved in design and development, there may be difficulties with evaluating often complex system outputs in the setting where the system is deployed. End-users and impacted individuals or communities may be presented with information about “how” an AI system works rather than “why,” which affects validation of output as well as interpretability. Lack of transparency about decision-making by various AI actors across the AI lifecycle can also impact the ability to interpret and validate system output, and to seek redress when that output is biased.
Misconceptions, limitations, disparities in health care
AI systems are valuable tools precisely because they can make sense of information more quickly and consistently than humans. Thus, there is a general tendency to perceive AI and other computationally based technologies as fairer, more objective and less biased than human decision-making. However, AI systems seek to model or inform decisions about complex social phenomena.
These phenomena involve important and often complex qualitative factors related to human behavior and sociocultural dynamics, which can vary according to the setting (context). These factors and their contextuality are difficult to operationalize in mathematical models, so they are often overlooked or ignored.
When AI systems are developed according to idealized standards for performance and optimization, without taking into account context, human factors and societal values, important dimensions that impact human lives are lost. AI systems are deployed within trusted institutions and high-stakes settings such as hiring, criminal justice or health care, often under the misconception that they can yield more objective decisions. The reality is that AI systems can and do incorporate negative biases, whether purposeful, inadvertent or emergent. The learning and automated nature of many AI systems allows harms to propagate more quickly, extensively and systematically than human and societal biases on their own.
For health care in particular, any societal and historical biases that create barriers to health care access or contribute to inadequate treatment will be reflected in past data used to train AI for health care use cases. Unless addressed specifically in design and monitoring of systems, those biases will be amplified by algorithms and perpetuated in decisions about who receives adequate care for disease prevention and treatment.
AI in pulmonology
In pulmonology, AI technologies are being used in conjunction with imaging for classifying and screening disease, detecting lung abnormalities for specific disease states and predicting need for preventive treatment. Other AI use cases in pulmonology include cytopathology, assessment of lung physiology and respiratory function, and categorization of respiratory events in polysomnography.
It would be prudent to assume that any of these use cases may be impacted by bias. AI algorithms are trained on past data, and past data reflect the societal and institutional biases that are present when the data were collected. Even decisions about which data to collect and which data to use to train and test a model are subject to explicit and implicit assumptions that incorporate human and systemic biases. Data used to train the model may not be representative of the patient population, introducing statistical bias.
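As a concrete, deliberately simplified illustration, the sketch below compares a few demographic proportions in a hypothetical training cohort against a clinic's patient population and flags large gaps for review. The category names, proportions and 10-point threshold are invented for illustration only.

```python
# Minimal sketch: compare training-cohort demographics against the local
# patient population. All category names, proportions and the 10-point
# review threshold are hypothetical placeholders.
training_cohort = {"age_65_plus": 0.22, "female": 0.38, "rural": 0.05}
clinic_population = {"age_65_plus": 0.41, "female": 0.52, "rural": 0.30}

for category, clinic_share in clinic_population.items():
    train_share = training_cohort.get(category, 0.0)
    gap = clinic_share - train_share
    flag = "REVIEW" if abs(gap) > 0.10 else "ok"
    print(f"{category:>12}: training {train_share:.0%} vs clinic {clinic_share:.0%}"
          f" (gap {gap:+.0%}) {flag}")
```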
To help mitigate bias, pulmonologists can adopt practices to gain as much information as possible about the data and assumptions used in model development. For example, if an AI algorithm is being used to evaluate CT scans to classify fibrotic lung disease in a selected patient population, is the algorithm trained on scans that are representative of that population? What geographic, demographic and socioeconomic differences in the presentation of fibrotic lung disease may or may not be operationalized in the algorithm? Once the limitations and potential biases of applying a particular AI algorithm are understood and documented, it is possible to employ debiasing or mitigation strategies to lessen disparate impacts for individuals, groups and communities.
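One practical step along these lines is a local subgroup audit before relying on an algorithm's output. The sketch below, which uses synthetic stand-in predictions rather than output from any real model, computes sensitivity and specificity separately for hypothetical patient subgroups so that performance gaps become visible.

```python
# Minimal sketch of a local subgroup audit. The subgroups, labels and
# "predictions" are synthetic stand-ins, not output from any real model.
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
n = 500
subgroup = rng.choice(["urban", "rural"], size=n, p=[0.7, 0.3])
labels = rng.integers(0, 2, size=n)                               # 1 = fibrotic disease present
predictions = np.where(rng.random(n) < 0.85, labels, 1 - labels)  # stand-in classifier output

for g in np.unique(subgroup):
    mask = subgroup == g
    tn, fp, fn, tp = confusion_matrix(labels[mask], predictions[mask], labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    print(f"{g:>6}: sensitivity {sensitivity:.2f}, specificity {specificity:.2f}, n={mask.sum()}")
```

A consistent gap between subgroups in an audit like this is a signal to document the limitation and apply mitigation before the output informs care decisions.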
Sociotechnical risk-conscious AI lifecycle
I have mentioned the importance of taking into account the context, human factors and societal values of the setting where AI is used. Adopting a sociotechnical risk-conscious approach to AI throughout its lifecycle allows us to consider these factors. At each stage of the lifecycle, taking into account the impact on people and the planet allows us to make decisions and adopt strategies that can mitigate bias and other harmful impacts. This is a shift from current AI lifecycle approaches, which tend to focus only on optimizing system performance.
A very accurate AI system can still be very biased. A sociotechnical risk-conscious approach actively incorporates insight into systemic and human factor sources of bias, which enables better choices with respect to data, proxies and models. Context is taken into account in selecting data and in testing and validating models. The approach also enables multistakeholder engagement throughout the lifecycle, from design and development through operations and monitoring.
Currently, the FDA has guiding principles for good machine learning practice for medical device development, and these principles align closely with the sociotechnical risk-conscious approach: attention to human factors, better data practices, testing and validation within conditions for intended use, and ongoing risk monitoring.
References:
- Chauhan NK, et al. Lung India. 2022;doi:10.4103/lungindia.lungindia_692_21.
- Schwartz R, et al. NIST Special Publication 1270. 2022;doi:10.6028/nist.sp.1270.
For more information:
Lori A. Perine, MS, is managing principal at InterpreTech Services, associate researcher of Trustworthy AI at the National Institute of Standards and Technology and PhD candidate at University of Maryland College of Information Studies. She can be reached at lperine@umd.edu.