But translating hundreds of different sleep trackers into a common language is a challenge. A team has taken on the task, which could ultimately lead to device-agnostic screening for obstructive sleep apnea.
By Sree Roy
A paradox bedevils the sleep medicine subspecialty: patients are sleeping next to more sensors than ever before, yet the vast majority of obstructive sleep apnea (OSA) cases remain hidden. Consumer sleep tracking data typically exists in proprietary silos, making it difficult for clinicians to interpret or compare across different device ecosystems.
“The reason it stays hidden is not this shortage of data; it’s that the data currently doesn’t speak a common language,” says Elie Gottlieb, PhD, head of applied sleep science at Sleep.ai and co-investigator of a new study presented at SLEEP 2026 that begins to bridge this gap.
The investigation, “Machine learning-based prediction of sleep apnea using objective sleep data from 138 consumer sleep technologies,” developed a “translation layer” known as a cross-device harmonization framework, allowing the underlying physiological signal to become portable across the consumer ecosystem. Ultimately, a screening model could flag OSA risk regardless of which brand of ring, watch, nearable, or software a consumer chooses.
Sleep Data Dialects
A barrier to using consumer sleep technology in a clinical setting is heterogeneity. Every manufacturer employs different hardware—ranging from wrist-based accelerometers and photoplethysmography to bedside radio frequency and sonar—and processes that data through proprietary, “black box” algorithms.
Gottlieb notes that even the vocabulary varies. For example, Apple may categorize a certain stage as “core sleep,” while other devices use “light sleep,” and clinical polysomnography refers to N1 and N2 stages.
“A doctor that’s looking at wearable sleep data has no reliable way of knowing how much to trust it, apart from pulling up a publication on the performance evaluation of that versus polysomnography, which they typically don’t have time to do,” Gottlieb says. “They also don’t know how to compare it to another patient on a different device.”
Most validation studies have focused on a single device in a controlled setting. While some researchers have conducted head-to-head comparisons of multiple sleep trackers, the Sleep.ai study scales this logic across 138 different devices and apps. The dataset included 19,431 users and approximately 4.3 million nights of sleep data contributed through Apple HealthKit.
Translating, or ‘Harmonizing,’ Data Dialects
To support scalable, device-agnostic OSA screening, the investigators first had to “harmonize” the data. Gottlieb describes this process as a translation where each device’s “dialect” of sleep is mapped onto a common, validated scale.
The framework uses a proprietary anchor: Sleep.ai’s non-contact measurement technology (radio frequency and sonar) that has been validated against PSG in more than 14 publications. When data from a third-party device—such as a Fitbit, Oura ring, or Garmin watch—enters the system, the machine learning model identifies how that specific device systematically differs from the reference anchor.
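The article does not describe the model's internals. As a rough illustration of what learning a device's systematic difference from the anchor could look like, here is a minimal sketch that assumes paired nights (the same sleeper recorded by both the device and the reference) and a simple additive bias. The function names, data layout, and numbers are all hypothetical, and the actual framework is a machine learning model that is presumably far richer than this.

```python
from statistics import mean

def fit_device_offset(paired_nights):
    """Estimate a device's average bias relative to the reference anchor.

    paired_nights: list of (device_minutes, reference_minutes) tuples for
    nights recorded by both systems (hypothetical data layout).
    """
    return mean(device - reference for device, reference in paired_nights)

def harmonize(device_minutes, offset):
    """Map a device reading onto the reference scale by removing its bias."""
    return device_minutes - offset

# Made-up total sleep time readings, in minutes.
paired = [(430, 410), (455, 440), (400, 375)]
offset = fit_device_offset(paired)    # this device over-reports by 20 minutes on average
corrected = harmonize(445, offset)    # a new night from the same device, on the anchor scale
print(offset, corrected)
```

A single offset only captures a constant bias. A device that over-reports more on long nights than on short ones would need at least a slope term as well, which is one reason a learned model is a more plausible fit for real data.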
The model also performs “gap-filling.” Because some devices only report binary sleep-wake data while others provide full staging, the model intelligently estimates missing metrics based on millions of nights of parallel recordings. Gottlieb emphasizes that this is not generative AI “hallucinating” data, but rather a focused expert model that flags any inferred values as “estimated” to ensure clinical transparency.
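The transparency idea can be sketched as a value that always travels with a flag saying whether it was measured or inferred. The example below is an assumption-laden illustration, not the study's method: the `Metric` type, the `fill_deep_sleep` function, and the fixed 20% deep sleep fraction are hypothetical placeholders standing in for estimates the real model learns from parallel recordings.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Metric:
    value: float
    estimated: bool  # True when the value was inferred rather than measured

def fill_deep_sleep(total_sleep_min: float,
                    deep_sleep_min: Optional[float],
                    typical_deep_fraction: float = 0.2) -> Metric:
    """Return deep sleep minutes, flagging the value when it is inferred.

    typical_deep_fraction is a placeholder constant (an assumption for
    this sketch), not a figure from the study.
    """
    if deep_sleep_min is not None:
        return Metric(deep_sleep_min, estimated=False)
    return Metric(total_sleep_min * typical_deep_fraction, estimated=True)

staged = fill_deep_sleep(420, 75)      # device reports full staging
binary = fill_deep_sleep(420, None)    # device reports sleep/wake only
print(staged)
print(binary)
```

Keeping the flag on the value itself, rather than in a separate log, means any downstream report can show a clinician which numbers were inferred.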
Model Performance
The investigators’ best model achieved an area under the curve (AUC) of 0.77. In screening terms, this means that given one randomly chosen person with OSA and one without, the model ranks the person with OSA as higher risk roughly three out of four times.
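The ranking interpretation of AUC can be checked directly by counting pairs. The sketch below uses made-up risk scores chosen to land on 0.75, not data from the study, and `pairwise_auc` is an illustrative helper rather than anything the investigators describe.

```python
def pairwise_auc(scores_with_osa, scores_without_osa):
    """AUC as the share of (OSA, non-OSA) pairs ranked correctly; ties count half."""
    wins = 0.0
    for pos in scores_with_osa:
        for neg in scores_without_osa:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_with_osa) * len(scores_without_osa))

# Made-up risk scores for illustration only.
with_osa = [0.9, 0.7, 0.6, 0.4]
without_osa = [0.8, 0.5, 0.3, 0.2]
print(pairwise_auc(with_osa, without_osa))  # 0.75
```

Of the 16 possible pairs, 12 are ranked correctly, which gives 0.75. An AUC of 0.5 would mean the model ranks pairs no better than a coin flip.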
While an AUC of 0.77 is considered an acceptable level of discrimination for a first-generation population-level screening tool, Gottlieb suggests the number may actually be a conservative “floor” rather than a “ceiling.” The reason is the inherent difficulty of labeling a large-scale dataset from self-reported clinical diagnoses: some people labeled in the “non-sleep apnea” group have OSA and “just don’t know it yet,” he says.
In this scenario, if the model correctly identifies the “signature” of OSA in a user who has not yet been diagnosed, the analysis marks it as a false positive, thereby penalizing the model for being accurate. Gottlieb believes this contamination deflates the measured performance and that prospective validation against PSG—which the team is currently planning—will likely yield higher precision.
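A toy example shows how this kind of label contamination can pull a measured score down. The six people and their scores below are invented for illustration: the model ranks every true OSA case above every non-case, but one true case is undiagnosed and so is labeled “non-OSA.”

```python
def auc(pos, neg):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    return sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

# (model risk score, truly has OSA, self-reports a diagnosis); made-up values.
people = [
    (0.9, True, True),
    (0.8, True, False),   # undiagnosed: has OSA but is labeled "non-OSA"
    (0.7, True, True),
    (0.3, False, False),
    (0.2, False, False),
    (0.1, False, False),
]

def split(label_index):
    """Split scores into positives and negatives using the chosen label column."""
    pos = [p[0] for p in people if p[label_index]]
    neg = [p[0] for p in people if not p[label_index]]
    return pos, neg

auc_true = auc(*split(1))       # scored against the true condition
auc_reported = auc(*split(2))   # scored against self-reported labels
print(auc_true, auc_reported)   # 1.0 0.875
```

The model's scores are identical in both cases; only the labels change. How large the effect is in the real dataset depends on how many undiagnosed cases it contains, which the article does not quantify.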
Sleep Instability as a ‘Fingerprint’ of OSA
One of the study’s more significant findings for sleep physicians is the importance of longitudinal sleep patterns over nightly averages. While age and gender remain the most powerful baseline predictors of OSA risk, the sleep-based features that rose to the top of the machine learning model were related to sleep instability.
Key non-demographic predictors included:
- Fragmentation of sleep
- Frequent awakenings
- Sleep onset latency and wake after sleep onset
- Night-to-night variability
Gottlieb explains that because OSA is a disorder of repeated interruptions, it leaves a distinct physiological footprint. However, the severity of these interruptions fluctuates based on sleeping position, alcohol consumption, nasal congestion, and REM cycles.
“A person with sleep apnea doesn’t just have worse sleep on average. They tend to have more inconsistent sleep,” Gottlieb says. “Some nights are bad, some are less bad, and that instability is itself a fingerprint. This is why focusing only on averages can be misleading. You can have someone whose average numbers look broadly acceptable, but underneath that average is a wide swing… and that swing is the tell.”
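The point about averages can be made concrete with two invented sleepers whose nightly wake after sleep onset (WASO) values share the same mean but differ sharply in spread. The numbers are made up, and the sample standard deviation is just one simple variability measure; the article does not say which variability features the model uses.

```python
from statistics import mean, stdev

# Made-up WASO values in minutes, one per night.
steady_sleeper = [38, 42, 40, 41, 39, 40]
unstable_sleeper = [10, 75, 20, 65, 15, 55]

for name, nights in [("steady", steady_sleeper), ("unstable", unstable_sleeper)]:
    # Same mean for both sleepers; only the night-to-night spread differs.
    print(name, "mean:", mean(nights), "night-to-night SD:", round(stdev(nights), 1))
```

Both sleepers average 40 minutes, so a report built on averages alone would treat them as equivalent. The standard deviation is about 1.4 minutes for the first and about 28.3 for the second.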
This shift toward longitudinal patterns leverages the unique advantage of consumer wearables: the ability to collect hundreds of nights of data in a patient’s own bed, rather than a single-night snapshot in a lab.
Integrating Consumer Data into Clinical Workflows
For sleep specialists, the goal of a common framework is not to replace diagnostic studies but to improve the “funnel” of patients entering the clinic. Currently, clinicians often set aside consumer data because there is no trustworthy way to interpret it or compare it to other patients.
A device-agnostic framework could enable:
- Consistent Interpretation: Mapping every device to a validated scale so a physician knows what a “deep sleep” number actually represents.
- Longitudinal Comparison: Comparing a patient to themselves over time, even if they switch from a wrist-based wearable to a bedside sensor.
- Upstream Screening: Identifying high-risk individuals from data already being collected and routing them toward proper diagnostic testing sooner.
“I’d put one clear boundary around all of this. None of it replaces the clinician or the diagnostic study,” Gottlieb says. “What a common framework does is turn a chaotic pile of incompatible consumer data into a consistent, longitudinal input that a physician can actually fold into their judgment.”
Future Consumer Sleep Tech Studies
While the study supports the potential of consumer sleep technology for early OSA identification, the researchers acknowledge that this is a “meaningful step, not a finish line.” The next phase of the research involves rigorous prospective validation against gold-standard PSG in a sleep lab setting.
Conducting such a study with 138 devices simultaneously is logistically impossible, so the team plans to focus on a subset of the most common wearable devices. They are also exploring improved labeling strategies, such as using the NoSAS screener instead of relying solely on participant-reported diagnoses.
The study was supported by Sleep.ai, and the modeling was driven by a data science team including Luke Gahan, Alice Lynch, and Eduardo Parkinson de Castro, in collaboration with Nathaniel Watson, MD, MSc, from the University of Washington.
As the industry moves toward real-world applications, Gottlieb envisions the harmonization and screening capabilities being integrated into business-to-business platforms. This would allow the technology to sit inside various tools and services across the sleep and health ecosystem, rather than being confined to a single consumer app.
“The tools to start closing the screening and subsequent diagnostic gap for sleep apnea may already be sitting on people’s wrists, fingers, and bedside tables,” Gottlieb says.