Artificial intelligence (AI)—broadly defined as the science of developing machines that can simulate human thinking—offers great potential to improve health care and reduce costs. For example, medical professionals and health care systems are starting to use AI to facilitate faster diagnoses, predict the likelihood of a particular patient outcome, and help conduct research on new treatments. But like any medical product, these software tools can also pose risks. The algorithms that underpin AI software may have hidden biases that lead to disparate outcomes for different patient groups, potentially reinforcing long-standing inequities.
Many health-focused AI software products fall outside the authority of the Food and Drug Administration (FDA), but since 2015 the agency has approved or cleared more than 100 medical devices that rely on AI, including a handful used to screen COVID-19 patients. The potential algorithmic biases posed by such products can affect health equity, a principle suggesting that disparities in health outcomes caused by factors such as race, income, or geography should be addressed and prevented.
Still, despite a growing awareness of these risks, FDA’s review of AI products does not explicitly consider health equity. The agency could do so under its current authorities and has a range of regulatory tools to help ensure AI-enabled products do not reproduce or amplify existing inequalities within the health care system.
Origins of AI and its unique bias risks
Bias in AI-enabled products can occur for a variety of reasons. Sometimes the datasets used to develop and test an algorithm may be incomplete or unrepresentative, which could lead to poor performance in a more diverse, real-world context. For example, in a 2019 paper, researchers at DeepMind Technologies Limited, which was acquired by Google in 2014, highlighted some limitations of an AI algorithm created to predict which patients at U.S. Department of Veterans Affairs hospitals were most likely to experience a decline in kidney function. Because men represented about 94% of patients in the training data, the algorithm proved less effective when tested on women. At the time, the model had not been implemented in clinical care, and DeepMind acknowledged that additional training and validation would be necessary before it could be.
In other cases, an algorithm may be targeting the wrong thing entirely. For example, researchers found in 2019 that a software program intended to identify patients with complex health care needs disproportionately flagged White patients for additional interventions in one hospital, although Black patients tended to be sicker. The algorithm underpinning the software used past health care costs to predict future health care needs, but because Black patients often face greater barriers to accessing care, their overall health care costs tend to be lower. That meant the algorithm was less likely to predict that they would have more complex health care needs. This type of bias, which stems more from the assumptions that go into algorithm design than from the data alone, can be difficult to detect.
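The cost-as-proxy problem described above can be illustrated with a toy example. The sketch below is not the algorithm from the 2019 study; the group labels "A" and "B", the cohort, and the dollar figures are all hypothetical, chosen only to show how ranking patients by past spending can under-select a group whose spending is suppressed by barriers to care, even when illness burden is identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    group: str               # hypothetical group label, "A" or "B"
    chronic_conditions: int  # stand-in for true health need
    past_cost: float         # past spending, the proxy the score relies on


def build_cohort():
    """Two groups with identical illness burden but unequal past spending."""
    # Assumption (illustrative only): group B faces access barriers, so each
    # condition generates half the recorded cost that it does for group A.
    cost_per_condition = {"A": 1000.0, "B": 500.0}
    return [
        Patient(group, n, n * cost_per_condition[group])
        for group in ("A", "B")
        for n in range(1, 6)
    ]


def flag_top(patients, score, k):
    """Flag the k patients with the highest score for extra care."""
    return sorted(patients, key=score, reverse=True)[:k]


def share_of_group(flagged, group):
    """Fraction of flagged patients who belong to the given group."""
    return sum(p.group == group for p in flagged) / len(flagged)


cohort = build_cohort()
by_cost = flag_top(cohort, lambda p: p.past_cost, k=4)
by_need = flag_top(cohort, lambda p: p.chronic_conditions, k=4)

print("Group B share when ranking by cost:", share_of_group(by_cost, "B"))
print("Group B share when ranking by need:", share_of_group(by_need, "B"))
```

In this made-up cohort, group B makes up half of the flagged patients when the ranking uses illness burden but only a quarter when it uses past cost, and the one group B patient flagged by cost is among the sickest in the cohort. Real data would be far noisier, but the mechanism is the same.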
Unlike many traditional medical devices, such as hip implants that are prescribed or used on a per-patient basis, software products tend to be embedded within an institution’s broader information technology infrastructure and run automatically in the background for all patients—beyond the control of individual providers. Depending on their purpose, such products may affect the care of any patient treated at a particular clinic or hospital, not just those with a particular condition. This increases the potential scope of impact if the product proves biased in some way.
Premarket review and clear labeling can reduce bias risk
During premarket review, FDA can help mitigate the risks of bias by routinely breaking down the data that AI software developers submit by demographic subgroup, including sex, age, race, and ethnicity. This would help gauge how the product performed in those populations and whether there were differences in effectiveness or safety based on these characteristics. The agency also could choose to reject a product’s application if it determined, based on the subgroup analysis, that the risks of approval outweighed the benefits.
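To make the idea of a subgroup analysis concrete, the following is a minimal sketch, assuming a product that returns a yes/no result that can be compared against a confirmed diagnosis. It does not represent FDA's actual review procedure; the function name `subgroup_metrics`, the record layout, and the example data are hypothetical.

```python
from collections import defaultdict


def subgroup_metrics(records):
    """Compute sensitivity and specificity separately for each subgroup.

    records: iterable of (group, y_true, y_pred) tuples, where y_true is the
    confirmed diagnosis and y_pred is the product's output, both 0 or 1.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        if y_true == 1:
            key = "tp" if y_pred == 1 else "fn"
        else:
            key = "fp" if y_pred == 1 else "tn"
        counts[group][key] += 1

    results = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[group] = {
            "n": positives + negatives,
            # None marks a rate that cannot be computed for this subgroup.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return results


# Hypothetical validation results: (subgroup, confirmed diagnosis, prediction)
EXAMPLE_RECORDS = [
    ("female", 1, 1), ("female", 1, 0), ("female", 0, 0), ("female", 0, 0),
    ("male", 1, 1), ("male", 1, 1), ("male", 0, 0), ("male", 0, 1),
]

for group, metrics in subgroup_metrics(EXAMPLE_RECORDS).items():
    print(group, metrics)
```

In this invented dataset the tool catches every true case among men but only half among women, a gap that a single pooled accuracy figure would hide. With subgroups this small, any real estimate would be highly uncertain, so an actual analysis would also need to report sample sizes and confidence intervals.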
Today, FDA cannot require software developers to submit subpopulation-specific data as part of their device applications, though it encourages them to do so. The agency released guidance in 2016 on how to collect and submit such data, and how to report disparities in subgroup outcomes. However, it is not clear how often this data is submitted, and public disclosure of this information remains limited.
Given these concerns, clear labeling about potential disparities in product performance also could help to promote health equity. For example, if an applicant seeks FDA approval for an AI-enabled tool to detect skin cancer by analyzing patient images, but has not tested it on a racially diverse group of images, the agency could require the developer to note this omission in the labeling. This would alert potential users that the product could be inaccurate for some patient populations and may lead to disparities in care or outcomes. Providers can then take steps to mitigate that risk or avoid using the product.
Within FDA, the Office of Women’s Health and the Office of Minority Health and Health Equity (OMHHE) could develop guidance to drive product-review divisions to consider health equity as part of their analysis of AI-enabled devices. In June, OMHHE took an encouraging step by launching the Enhance Equity Initiative, which aims to improve diversity in the clinical data that the agency uses to inform its decisions and to incorporate a broader range of voices in the regulatory process.
The Center for Devices and Radiological Health, which is responsible for approving medical devices, can also diversify the range of voices at the table to support more equitable policies through its ongoing Patient Science and Engagement Initiative. Following an October 2020 public meeting on AI and machine learning in medical devices, FDA published an updated action plan that emphasized the need to increase transparency and build public trust in these products. The agency committed to considering patient and stakeholder input as it works to advance AI oversight, adding that continued public engagement is crucial for the success of such products.
AI can help patients from historically underserved populations by lowering costs and increasing efficiency in an overburdened health system. But the potential for bias must be considered when developing and reviewing these devices to ensure that the opposite does not occur. By analyzing subpopulation-specific data, calling out potential disparities on product labels, and pushing internally for the prioritization of equity in its review process, FDA can prevent potentially biased products from entering the market and help ensure that all patients receive the high-quality care they deserve.
Liz Richardson leads The Pew Charitable Trusts’ health care products project.