To read the medical literature, you might think that AI is taking over medicine. It can detect cancers in images earlier, find heart problems invisible to cardiologists, and predict organ dysfunction hours before it becomes dangerous for hospital patients.
But most AI models described in reviews — and touted in press releases — are never used in the clinic. And the rare exceptions that do reach patients have fallen far short of their revolutionary billing.
On Wednesday, a group of teaching hospitals, government agencies and private companies unveiled a plan to change that. The group, billing itself as the Coalition for Health AI, called for the creation of independent testing bodies and a national registry of clinical algorithms to allow doctors and patients to assess their suitability and performance, and to eliminate the biases that so often skew their results.
“We don’t have the tools today to understand whether machine learning algorithms and these new technologies being deployed are good or bad for patients,” said John Halamka, president of Mayo Clinic Platform. The only way to change that, he said, is to study their impacts more rigorously and make the results transparent, so users can understand the benefits and risks.
Like many such documents, the coalition’s blueprint is only a proclamation — a set of principles and recommendations that are eloquently articulated but easily ignored. The group hopes its many members will help spark a national conversation and concrete steps to begin governing the use of AI in medicine. Its blueprint was developed with input from Microsoft and Google, MITRE Corp., universities such as Stanford, Duke and Johns Hopkins, and government agencies such as the Food and Drug Administration, National Institutes of Health and the Centers for Medicare & Medicaid Services.
Even with some level of buy-in from these organizations, the hardest part of the job remains to be done. The coalition needs to come to a consensus on ways to measure the usability, reliability, security, and fairness of an AI tool. It will also need to establish the test labs and registry, determine which parties will host and maintain them, and convince AI developers to cooperate with new oversight and increased transparency that could conflict with their business interests.
As things stand, there are few benchmarks that hospitals can use to help test algorithms or understand how well they will work on their patients. Healthcare systems have largely been left to their own devices to sort through the complex legal and ethical questions posed by AI systems and how to implement and monitor them.
“Ultimately, each device should ideally be calibrated and tested locally at each new site,” said Suchi Saria, professor of machine learning and healthcare at Johns Hopkins University, who helped create the plan. “And there should be a way to monitor and adjust performance over time. That’s key to really evaluating safety and quality.”
A hospital’s ability to perform these tasks should not be determined by the size of its budget or access to data science teams typically only found in the largest academic centers, experts said. The coalition is calling for the creation of multiple labs across the country to allow developers to test their algorithms on more diverse data sets and check them for bias. This would ensure that an algorithm based on California data could be tested on patients in Ohio, New York and Louisiana, for example. Currently, many algorithm developers – especially those located in academic institutions – are building AI tools on their own data, which limits their applicability to other regions and patient populations.
“Only by creating these communities can you do the kind of training and tuning needed to get to where we need to be, which is the AI that serves us all,” said Brian Anderson, chief medical officer of digital health at MITRE. “If all we have are researchers training their AIs on patients in the Bay Area or patients in the Upper Midwest, and not cross-training, I think that would be a very sorry state.”
The coalition is also discussing the idea of creating an accreditation body that would certify an algorithm’s suitability for use on a given task or set of tasks. This would help provide some level of quality assurance, so that the appropriate uses and potential side effects of an algorithm could be understood and disclosed.
“We need to establish that AI-guided decision-making is useful,” said Nigam Shah, professor of biomedical informatics at Stanford. This requires going beyond evaluations of an algorithm’s mathematical performance to study whether it actually improves outcomes for patients and clinical users.
“We need a mindset shift from admiring the algorithm’s output and how beautiful it is to saying, ‘Okay, let’s put some elbow grease into incorporating this into our work system and see what happens,’” Shah said. “We need to quantify utility rather than just performance.”
This story is part of a series examining the use of artificial intelligence in healthcare and the practices of sharing and analyzing patient data. It is funded by the Gordon and Betty Moore Foundation.