The market for artificial intelligence in health care is about as transparent as a brick wall. New tools achieve impressive results in published studies. But it is often difficult to compare them head to head with similar products, or tell whether they will work on different kinds of patients.

A new company is promising to change that — if it can convince AI developers to expose their products to more rigorous testing.

Called Dandelion Health, the New York-based firm is launching a first-of-its-kind public service to evaluate AI products on independent data designed to root out weaknesses and reveal bias. The company said Wednesday its initial pilot program, to begin next month, will focus on testing algorithms that use electrocardiograms to predict heart conditions.

Irving Loh, a cardiologist and AI expert at the Ventura Heart Institute, a private practice in California, said the company’s goal of becoming a public utility for AI validation is laudable, especially to address issues of racial, ethnic, and geographic bias. But he added that its success depends on getting enough data, from enough sources, to adequately assess whether a given algorithm will work across all those different groups.

“It may still under-represent African Americans and Asian populations, depending on exactly which site sources they are using and the demographics of those populations,” said Loh, who is not involved in the company.

Dandelion’s executives said they have compiled de-identified data on 10 million patients from three health systems: Sharp HealthCare in San Diego, Sanford Health in South Dakota, and Dallas-based Texas Health Resources. Sanford is the largest rural provider in the country, while city-based Sharp and Texas Health Resources have large concentrations of Black, Hispanic, and Asian patients.

“We were really careful to choose systems that were as far apart as possible on a lot of different dimensions that we cared about,” said Ziad Obermeyer, a co-founder and chief scientific officer of the company.

Dandelion aims to fill a gaping void in the market for health AI tools by becoming an independent certifier of quality. The data needed for such testing is expensive and difficult to obtain, and no independent third party currently performs it. The Food and Drug Administration reviews some AI products, but its evaluations offer no assurance to potential customers that a specific tool will work in their systems and patient populations. That lack of clarity has made it harder for AI companies to win adoption of their products and to convince insurers and other parties that they’re worth the cost.

Dandelion is structured as a for-profit and has raised seed funding from a trio of venture capital firms. Its pilot on EKG algorithms is being carried out with funding from the Gordon and Betty Moore Foundation, which also supports STAT’s reporting on artificial intelligence. The company does not plan to charge AI developers during the pilot, but hopes that future customers — from major drug companies to startups — will pay for access to its data.

Elliott Green, a co-founder and chief executive of the company, said Dandelion’s most urgent focus is getting buy-in from a broad array of AI developers and customers interested in building confidence in the usefulness of AI tools.

“This EKG validation is the beginning of that process of being able to say, ‘I don’t want any of these products to be used on patients unless they’ve been validated,’” he said.

The EKG, a cheap and commonly performed test, is an especially hot area of algorithm development. AI can analyze an EKG’s waveforms to identify potentially deadly conditions, such as a weak heart pump or hypertrophic cardiomyopathy. A variety of organizations are developing EKG algorithms, including Apple and health systems such as Mayo Clinic and Cedars-Sinai.

Dandelion’s executives said the results of the testing would remain private unless the developer of the AI tool wants to share them publicly. While that might undermine efforts to create transparency in the early going, the company’s hope is that its testing will become a commonly used benchmark of quality.

“It plugs into this culture in computer science of the public leaderboard,” Obermeyer said, noting that computer vision models are often compared based on their performance on ImageNet data. “I think that’s where we’re heading eventually.”

The company will begin accepting EKG algorithms for testing on July 15, and expects the initial pilot phase to run for three months. It may eventually expand beyond EKGs, to test algorithms developed with radiology images, clinical notes, and other types of data.

This story is part of a series examining the use of artificial intelligence in health care and practices for exchanging and analyzing patient data. It is supported with funding from the Gordon and Betty Moore Foundation.

Original article by Casey Moss