Pushing back against the idea that artificial intelligence in sleep care must be inscrutable to be useful.
By Sree Roy
A worry permeates the sleep subspecialty that the steady advance of artificial intelligence (AI) will lead to a dystopian future where black-box algorithms dominate, rendering sleep physicians unable to explain the results appearing on their magic screens. In this scenario, the physician tells the patient, “My computer says you have sleep apnea.” He might shrug his shoulders while clicking to the next field, before continuing: “It recommends positional therapy. Here’s your prescription. Have a great day.”
A different view is that sleep physicians of the future will understand the AI’s outputs and even find them intuitive. Sleep AI’s strength lies in its superhuman speed, notably at pattern recognition and algorithm-based tasks: it reaches the same conclusions humans would, just more quickly (and occasionally arrives at better conclusions, but we’ll get to that).
EnsoData co-founder Sam Rusk describes how instinctual an AI model’s outputs can feel for sleep specialists. Its EnsoTherapy offering, for example, predicts and prioritizes the patients who need help with CPAP adherence, and is designed as a time-saver, not to change the paradigm. “Who is at the top of the list is often very intuitive,” says Rusk, the company’s chief AI officer. The AI creates a worklist categorizing each patient’s potential problems, such as mask leak, decreasing usage, and residual apneas—the same data a sleep tech would naturally check on.
Physicians should never be in the position of signing prescriptions based on the conclusions of an inscrutable digital oracle. They should instead think, “I understand why the AI says that, and I would eventually have come to the same conclusion, given enough time.” Though people often conflate AI with a black box, the two terms are not synonymous.
The Rogue AI
Data scientist Jon S. Agustsson, PhD, encountered a situation that illustrates why it is important for humans to retain an understanding of sleep biosignals. While validating an AI model’s performance in sleep study scoring, his team took a model that had performed excellently on US sleep studies and tested it on a dataset from Europe. They expected it to do well with this dataset too. Instead, out poured gibberish. “What came out was total nonsense,” says Agustsson, vice president of AI and data science at Nox Medical.
The model hadn’t gone rogue in some science fiction sense. It had just encountered an environmental variable no one had anticipated. As it turns out, in the United States, artifacts related to the electricity grid register as 60 Hertz on EEGs, and the model had learned to filter out this noise during training. European power grids, however, operate at 50 Hertz. The model had interpreted the unfamiliar 50 Hertz signals as meaningful data rather than artifacts. “So the 50 Hertz was all over the place,” Agustsson says.
For an AI developer, one lesson is to ensure each model is validated on data representative of the market in which it will be released. Another, because not every erroneous output can be predicted in advance, is to facilitate transparency. “If we had presented this as a black-box model and no one had been able to see what was going on, they would just be like, ‘Those are really strange numbers,’ but have no idea why,” Agustsson says. It’s only when physicians can review and challenge the data that trust develops. “That’s how to build confidence in the output of the models,” he says.
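For readers who work with the signals directly, the mains-frequency mismatch in this anecdote can be illustrated with a small check that reports whether a recording carries more power at 50 or 60 hertz. This is a minimal sketch, not the method Agustsson's team used: the function names, the synthetic test signal, and the 200-samples-per-second rate are all assumptions made for illustration. It uses the Goertzel algorithm, a standard way to measure signal power at a single frequency.

```python
import math


def goertzel_power(samples, sample_rate_hz, target_hz):
    """Estimate signal power at target_hz with the Goertzel algorithm."""
    n = len(samples)
    k = round(n * target_hz / sample_rate_hz)  # nearest DFT bin to target_hz
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of the DFT coefficient for bin k.
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def dominant_mains_hz(samples, sample_rate_hz, candidates=(50.0, 60.0)):
    """Return whichever candidate mains frequency carries more power."""
    powers = {hz: goertzel_power(samples, sample_rate_hz, hz) for hz in candidates}
    return max(powers, key=powers.get)


def mains_matches_training(samples, sample_rate_hz, trained_on_hz=60.0):
    """Hypothetical pre-check: does the recording match the training grid?"""
    return dominant_mains_hz(samples, sample_rate_hz) == trained_on_hz


def synthetic_eeg(mains_hz, sample_rate_hz=200, seconds=2):
    """Illustrative signal only: a 10 Hz rhythm plus mains interference."""
    n = sample_rate_hz * seconds
    return [
        math.sin(2.0 * math.pi * 10.0 * t / sample_rate_hz)
        + 0.5 * math.sin(2.0 * math.pi * mains_hz * t / sample_rate_hz)
        for t in range(n)
    ]


if __name__ == "__main__":
    european = synthetic_eeg(50.0)
    print("Dominant mains frequency:", dominant_mains_hz(european, 200))
    print("Matches a 60 Hz-trained model:", mains_matches_training(european, 200))
```

A check of this kind does not fix a model, but it surfaces the reason for strange output in terms a human reviewer can inspect, which is the kind of transparency Agustsson describes.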
When AI Sees Beyond Human Limitations
Sleep AI expert Ankit A. Parekh, PhD, is on a team that is developing an AI model to classify sleep stages more effectively via a training method known as self-supervision, which means the AI learns relevant clinical features from biosignals without using human-labeled outcomes. But does that imply it’s going to classify sleep stages differently from humans?
“I would say it’s almost like a human would,” says Parekh, an assistant professor of medicine in the Division of Pulmonary, Critical Care, and Sleep Medicine, and an assistant professor in the Department of AI and Human Health at the Icahn School of Medicine at Mount Sinai. The small differences that do arise can be linked to humans’ subjective biases, usage of published scoring guidelines, and other exposures and opinions to which the AI model is immune.
For example, Parekh says, the model may classify an epoch as N1 sleep while a human scorer classifies it as wake. The human is constrained by technical rules in published scoring handbooks. But the AI may be discerning subtle periods that it understands are light sleep. “The discrepancies that we get with the human may actually highlight some new information,” Parekh says.
The key insight is that AI doesn’t just replicate human judgment—it can transcend the subjective biases, fatigue, and cognitive limitations that inevitably influence us.
Such discrepancies become dangerous to the sleep subspecialty only when physicians cannot trace the reasoning behind the AI recommendations or lack access to the raw information that would reveal obvious errors. The antidote to “rogue” AI isn’t less artificial intelligence—it’s more transparency and physician education about how these systems work and where they might fail.
ID 22154177 © Almagami | Dreamstime.com