As a growing number of artificial intelligence tools make their way into patient care, healthcare technology experts are calling for vendors to more clearly explain how the tools were developed and get that information into clinicians’ hands.
The Food and Drug Administration, which oversees AI that meets the criteria for a medical device, has cleared more than 200 of what it calls AI- and machine learning-enabled medical devices in recent years—including AI tools that can triage patients based on radiology images, identify eye disease from images of the back of the eye and detect abnormal lesions found during endoscopies. There are also numerous AI algorithms used in clinical care that the agency doesn’t regulate, such as those deemed low risk to patients.
Many AI products released to date advertise that they're designed to support, not replace, a clinician's judgment.
But without a clear window into how to interpret an algorithm’s results, that’s hard to do—often making doctors nervous about trusting an AI application’s recommendations.
“One of the reasons why trust is a challenge is because these algorithms are a bit of a black box,” said Dr. Edmondo Robinson, senior vice president and chief digital officer at Tampa, Florida-based Moffitt Cancer Center.
It’s not always clear which variables an AI algorithm is analyzing or how it’s reaching its conclusions, and there’s emerging evidence that AI can integrate harmful biases if not carefully monitored.
Having information on how an algorithm was developed and validated is one way to garner trust among clinicians, so they have a better idea of what to do with the results and how generalizable they are. But AI developers don’t always disclose that information. Many algorithms used in imaging, for example, haven’t publicly shared validation studies with datasets of more than 1,000 patients, and some haven’t posted any validation data, according to a study published in Academic Radiology last year.
“We understand that the actual algorithms themselves are proprietary,” Robinson said. “But how did you develop it, and what were your approaches to that development?”
Today, those are questions Moffitt Cancer Center asks AI developers it’s considering purchasing tools from. But having that information proactively disclosed in a standard way would streamline procurement and make it easier to share with clinicians.
The healthcare industry is still figuring out what information would be most helpful for AI developers to report, as well as how to encourage wider adoption of those practices.
A label like “Nutrition Facts” displayed on food is one possible solution gaining traction, with proponents including tech leaders at Mayo Clinic and Duke University. Other suggestions include federal regulation, guidelines from accreditation groups and individual hospitals taking up the mantle as AI purchasers.
For AI to become widely adopted, the healthcare industry will need to sort out expectations for how the technology is integrated into care—from what product information is made available to how clinicians are trained to use it.
The healthcare industry is “building those ecosystems as we are starting to use it,” said Suresh Balu, program director for the Duke Institute for Health Innovation and associate dean for innovation and partnership at the Duke School of Medicine. “It’s like having a car but not … the gas stations.”
Taking a cue from 'Nutrition Facts'
The health innovation hub at Duke University School of Medicine in Durham, North Carolina, has instituted its own internal process for how to get information into the hands of clinicians.
In years past, staff at the Duke Institute for Health Innovation would regularly field questions about how their AI projects were built and how well they stacked up against other tools, said Dr. Mark Sendak, population health and data science lead at the institute. So they decided to start proactively putting answers to common questions on a one-page “Model Facts” label, modeled after “Nutrition Facts.”
That transparency is part of “building trust and accountability,” Sendak said. “It’s being able to provide people answers to their questions.”
The label includes the algorithm’s intended outcome, target population and possible risks of incorporating the AI results into clinical decisions. It’s paired with additional documents for people interested in more detailed information, such as longer instruction manual-style guides.
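To make the idea concrete, here is a minimal sketch of how a hospital might represent such a label as structured data. The field names and the example model are hypothetical illustrations based on the items described above (intended outcome, target population, risks), not Duke's actual format.

```python
from dataclasses import dataclass, field

@dataclass
class ModelFactsLabel:
    """Hypothetical one-page label for a clinical AI model."""
    model_name: str
    intended_outcome: str          # what the algorithm is meant to predict
    target_population: str         # patients the model was developed for
    risks: list = field(default_factory=list)  # known risks of acting on outputs
    validation_site: str = ""      # labels may differ per deployment site

    def summary(self) -> str:
        """One-line header a clinician could read at a glance."""
        return (f"{self.model_name}: predicts {self.intended_outcome} "
                f"for {self.target_population}")

# Illustrative example only; the model name and values are invented.
label = ModelFactsLabel(
    model_name="SepsisRisk-v2",
    intended_outcome="risk of sepsis within 6 hours",
    target_population="adult inpatients",
    risks=["false alarms may contribute to alert fatigue"],
)
print(label.summary())
```

Keeping the label as structured data rather than free text would also make it easier to regenerate per site, which matters because, as noted below, an algorithm's behavior can vary across organizations.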
It has received positive feedback from Duke Health clinicians, and Sendak and Balu are interested in seeing how it could scale. They’re developing frameworks that other hospitals could use to create their own product labels and instruction manuals.
That documentation would look different based on the stakeholder, since business unit leaders interested in deploying and supporting a tool will want different information than clinicians using it.
But although “Nutrition Facts” provided the inspiration for “Model Facts,” AI isn’t as simple as food.
While the nutrients in a can of soup remain stable over time, an AI tool could change how it makes decisions as it ingests more data, and its accuracy could vary across hospitals. That means a new label would need to be developed for each organization where an algorithm is deployed and regularly updated.
It’s also not clear who or what type of organization would encourage adoption of such a label.
Sendak said he hopes to see regulators “raise the bar” on what’s required for AI developers to share about their algorithms, which the FDA is moving toward. For AI algorithms not regulated by the agency, this type of reporting framework could be supported by organizations like accrediting bodies encouraging hospitals to assemble this information as a best practice.
The FDA late last year convened a public workshop on the transparency of AI- and ML-enabled medical devices, in part seeking input on what information would be helpful for manufacturers to provide in labeling. The agency already requires certain labeling for medical devices but could expand upon that for products with AI components.
The FDA will use input gathered during that workshop to inform its approach, according to an emailed statement from the agency.
“The agency acknowledges that artificial intelligence- and machine learning-enabled medical devices have unique considerations that necessitate a proactive, patient-centered approach to their development and utilization that takes into account issues including usability, equity, trust and accountability,” the statement reads, pointing to an AI action plan the FDA released last year. “Promoting transparency is a key aspect of a patient-centered regulatory approach, and we believe this is especially important for AI/ML-enabled medical devices, which are heavily data-driven, may incorporate algorithms exhibiting a degree of opacity, and may learn and change over time.”
Trade groups, like the Consumer Technology Association, have released their own standards on reporting and other best practices for trustworthy AI. Researchers have published reporting guidelines for AI developers, but a research paper released last year found that even some of the most commonly used algorithms report on just half of those recommendations.
The FDA already has labeling requirements for medical devices, which can be applied to medical AI cleared by the agency, said Zach Rothstein, senior vice president for technology and regulatory affairs at the Advanced Medical Technology Association, a medical device trade group.
The FDA could add requirements for the type of information that’s needed on AI device labels. But it doesn’t make much sense to require separate, additional labels for the AI itself, he said.
“At its core, the FDA has existing labeling regulations that are very easily applicable to any type of new technology that might exist within a medical device,” Rothstein said. While AI presents new questions from a technological perspective, requiring separate labels for a specific type of technology wouldn’t be consistent with how the FDA typically regulates products, he added.
Rothstein urged hospitals and health systems to teach clinicians about what AI is, so they’re more comfortable using tools that incorporate the technology.
Dr. John Halamka, president of Rochester, Minnesota-based Mayo Clinic’s tech and big data effort, Mayo Clinic Platform, has been a vocal advocate of “Nutrition Facts”-style labels for AI. Such a label could help give an AI algorithm “credibility,” so that clinicians understand whether a recommendation is relevant for their patients, he said.
He would also like to see AI developers share information on data they used to train algorithms, including race, ethnicity, gender, age and other demographics, as well as performance characteristics—like false positives and false negatives—and whether those vary by subpopulation.
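The subpopulation breakdown Halamka describes can be computed directly from a model's predictions and known outcomes. The sketch below, with invented toy data, shows one way to tally false-positive and false-negative rates per demographic group; in practice this would run over a hospital's own validation set.

```python
from collections import defaultdict

def rates_by_subgroup(records):
    """Compute false-positive and false-negative rates per demographic group.

    `records` is a list of (group, y_true, y_pred) tuples with binary labels.
    Returns {group: {"fpr": ..., "fnr": ...}}; a rate is None when undefined.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, y_true, y_pred in records:
        if y_true == 0:
            counts[group]["fp" if y_pred == 1 else "tn"] += 1
        else:
            counts[group]["tp" if y_pred == 1 else "fn"] += 1
    out = {}
    for group, c in counts.items():
        neg, pos = c["fp"] + c["tn"], c["fn"] + c["tp"]
        out[group] = {
            "fpr": c["fp"] / neg if neg else None,
            "fnr": c["fn"] / pos if pos else None,
        }
    return out

# Toy data: error rates differ noticeably between the two groups.
records = [
    ("A", 0, 0), ("A", 0, 1), ("A", 1, 1), ("A", 1, 1),
    ("B", 0, 1), ("B", 0, 1), ("B", 1, 0), ("B", 1, 1),
]
print(rates_by_subgroup(records))
```

A gap like the one between groups A and B in the toy data is exactly the kind of disparity a disclosure label could surface before a clinician relies on the tool.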
“We of course would hope that the training set is similar to the patient in front of you right now,” Halamka said. “But how could you know?”
Ultimately, Halamka said he would like to see testing labs set up to evaluate AI, so that information is assessed by a public- or private-sector third party.
But a potential mountain of information has led some experts to gravitate toward longer labels, saying a “nutrition label”—while appealing—will have trouble capturing AI’s complexity.
The 2021 research paper—a pre-print that hasn’t been peer-reviewed—identified 15 AI reporting guidelines proposed by researchers, which include items related to use case, product development and fairness. If an AI developer reported the information suggested in each of the frameworks, they would be reporting 220 items in total.
“If we put 220 items on a ‘nutrition label,’ it’ll be too damn long,” said Nigam Shah, a co-author on the study and professor of medicine and co-director of the Center for AI in Medicine and Imaging at Stanford University in Palo Alto, California. The industry needs to coalesce around which variables are most important to include, he said.
He suggested thinking about labels less like “Nutrition Facts” that outline all the ingredients in a product and more like a package insert that comes with pharmaceuticals. Just like package inserts walk through a drug’s indications, dosage and side effects, a similar AI label could explain the algorithm’s intended use, in addition to how it was made.
A longer package insert template would also leave room for information that’s useful for different people involved in the procurement process, like clinicians, IT staff and others.
That said, Shah isn’t convinced yet that a pharmaceutical-style label is the best approach. It’s just one idea he wants to see the healthcare industry consider.
There’s the potential for information overload, particularly if dozens of variables make it difficult for providers to sift through the label and identify what they actually need.
“How many patients and doctors read that thin leaflet that comes with every prescription?” Shah said. “It’s fine to report all of this … but what are we going to do with it?”
Shah said Stanford Health Care is working on setting up a virtual testing environment, so that researchers can run AI algorithms on historical medical data held at the health system. That way, the organization can assess whether the tool works as expected with its patients and get a sense of whether a deployment would pay off.
It’s basically a “try before you buy” scenario, Shah said of the process, which he calls a virtual model deployment.
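At its simplest, that kind of retrospective check scores a candidate model against past cases whose outcomes are already known. The sketch below is a bare-bones illustration of the idea, not Stanford's actual system; the lactate-threshold "model" and the records are invented for the example.

```python
def retrospective_accuracy(model, historical_records):
    """Score a candidate model against historical cases with known outcomes.

    `model` is any callable mapping a feature dict to a binary prediction;
    `historical_records` is a list of (features, outcome) pairs.
    """
    correct = sum(1 for features, outcome in historical_records
                  if model(features) == outcome)
    return correct / len(historical_records)

# Hypothetical stand-in for a vendor algorithm: flag high lactate values.
candidate = lambda f: 1 if f["lactate"] > 2.0 else 0

# Invented historical cases with recorded outcomes.
history = [
    ({"lactate": 3.1}, 1),
    ({"lactate": 1.2}, 0),
    ({"lactate": 2.5}, 0),   # model disagrees with the recorded outcome
    ({"lactate": 0.9}, 0),
]
print(retrospective_accuracy(candidate, history))  # 0.75
```

A real deployment test would go further, breaking performance out by subpopulation and weighing the cost of false alarms, but even this simple score tells a buyer whether the tool behaves sensibly on its own patients before any contract is signed.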
The role of the hospital
Dr. Atul Butte, chief data scientist at University of California Health, said he’d suggest hospitals take a page from how they already think about prescription drugs.
Today, hospitals typically have a group—such as a pharmacy and therapeutics committee—that oversees the organization’s drug formulary, taking on tasks like medication use evaluations, adverse drug event monitoring and medication safety efforts. That approach could work for AI algorithms, too.
Butte suggested hospitals set up a committee focused on algorithmic stewardship to oversee the inventory of AI algorithms deployed at the organization. Such committees—composed of stakeholders like chief information officers, chief medical informatics officers, medical specialists and staff dedicated to health equity—would determine whether to adopt new algorithms and routinely monitor algorithms’ performance.
Just like accrediting bodies such as the Joint Commission require medication use evaluations for hospital accreditation, a similar process could be established for evaluating algorithm use.
“Instead of inventing—or re-inventing—a whole review process, why not borrow what we already do?” Butte said.
That stewardship process could go hand in hand with a pharmaceutical-style label that explains the subpopulations an algorithm was trained on, outcomes from clinical trials and even what types of equipment and software the tool pairs well with—for example, if there’s a developer that’s only tested its image analysis software on X-rays from certain vendors.
“I can see a complicated label being needed some day,” Butte said. “It’s going to have to be sophisticated.”
That’s part of the role hospitals can take on to ensure clinicians are given high-quality AI tools that they know how to use as part of patient care.
Regulators, AI developers, hospital governance committees and clinician users all have a shared responsibility to monitor AI and check that it’s working as expected, said Suchi Saria, professor and director of the Machine Learning and Healthcare Lab at Johns Hopkins University and CEO of Bayesian Health, a company that develops clinical decision-support AI.
Hospitals and AI vendors should empower clinicians to report when an AI recommendation is different from their judgment so they can assess whether there’s a reason why, she said.
“They each have a role to play in making sure there’s end-to-end oversight,” Saria said, including measuring performance over time. “It’s not on one body alone.”