By Sree Roy

I paused several times while reading the final report published by the Agency for Healthcare Research and Quality (AHRQ) about the effectiveness of CPAP devices on long-term clinically important outcomes in patients with obstructive sleep apnea (OSA). The breaks weren’t because of its size—though, at almost 200 pages, the systematic review is lengthy—but to avert my eyes, as if that could somehow shield me from the grim assessment laid out by the authors.

Spoiler alert: The report, Long-Term Health Outcomes in Obstructive Sleep Apnea: A Systematic Review of Comparative Studies Evaluating Positive Airway Pressure and the Validity of Breathing Measures as Surrogate Outcomes, does not answer the main clinical question it asks. Rather, for each outcome—cardiovascular disease, driving accidents, diabetes, mental health, cognitive function, quality of life, and many others—it details why the evidence is inadequate, over and over again. 

“These conclusions do not imply that CPAP has been proven to be ineffective to reduce these health outcomes,” it states. “The conclusions were in large part based on imprecise, nonsignificant effect sizes. It is unclear whether the failure to find an effect of CPAP treatment on long-term health outcomes is related to a lack of power, insufficient followup duration, or due to an actual lack of effect of CPAP.” All-cause mortality, it says, is the one and only long-term outcome for which there may be an evidence-based case for CPAP, but even this conclusion is based on only a low standard of evidence in which the authors have “limited confidence.” They write, “Based on the patients who were included in the eligible studies, this conclusion may be most applicable to older adults and may pertain most to longer-term followup.”

While disappointing, this non-answer did not strike me as particularly damning. The report provides direction for “future well-conducted, well-reported studies” that would allow more definitive conclusions on the clinical effect of CPAP for adults with OSA, determine who might benefit most from long-term CPAP treatment, and evaluate the validity of intermediate measures as potential surrogate or mediator measures of long-term health outcomes. That path seemed hopeful, if time-consuming and expensive. If only that were all the report said. But this determination only scratches the surface of the technology assessment, and the detailed analysis is much more horrifying.

I should pause here to explain the background of the AHRQ report. The Centers for Medicare & Medicaid Services (CMS) nominated the topic to the AHRQ. CMS places greater emphasis on health outcomes actually experienced by patients, such as quality of life and morbidity and mortality, and less emphasis on outcomes that patients do not directly experience, such as surrogate outcomes and laboratory responses. In case you’re wondering, the apnea-hypopnea index (AHI), the most commonly used metric in clinical sleep practice to diagnose patients with OSA and evaluate its severity, is considered a “laboratory measure.”

The resulting 182-page analysis is intended as a medical reference “to help healthcare decision makers make well-informed decisions and thereby improve the quality of healthcare services,” it states. It reviews both randomized controlled trials and nonrandomized comparative studies published between January 2010 and March 22, 2021.

The published studies with the largest number of participants were nonrandomized comparative studies. I grimaced after reading this assessment: “These were often undertaken because CPAP efficacy…was presumed and placebo treatment was thought to be inappropriate and perhaps unethical…These types of studies are not, however, a substitute for [randomized controlled trials]…Presumptive treatment with ineffective therapy is no benefit to patients and is perhaps unethical.” Ouch.

I also cringed when I read that only 40% of studies explicitly reported apnea and hypopnea definitions and that, among those, the definitions were inconsistent. This is despite these same studies claiming the use of “American Academy of Sleep Medicine (AASM) criteria.” Some defined apnea as 100% airflow cessation; others accepted airflow reductions as low as 75% (well within the definition of hypopnea used by most studies). Among studies that reported hypopnea criteria, almost half required at least a 50% reduction in airflow, and about half allowed a reduction of at least 30%.

Neither the AHRQ paper’s authors nor I, a reader who has reported on sleep medicine for a decade, could determine why. The AHRQ paper states, “We acknowledge that it is not clear to us whether AASM is allowing too much leeway for each sleep center to define criteria as they see fit or if polysomnographic technologists and sleep physicians are misinterpreting or misapplying the criteria.”

The authors write, “For most studies, it would be very difficult for an outside researcher or clinician to replicate how AHI or oxygen desaturation index (ODI), among other various parameters, were defined and/or to determine which patients would be eligible for study inclusion.” Ouch again. The authors note that small differences in thresholds for apneas, hypopneas, and oxygen desaturation can have large effects on the person’s estimated AHI (or ODI) value—and “thus their potential eligibility for CPAP treatment” (and inclusion in these studies).
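To make that sensitivity concrete, here is a minimal sketch in Python. It is not drawn from the report: the event values and the six hours of sleep are invented, and scoring is simplified to airflow reduction alone, whereas real scoring rules also weigh factors such as event duration and oxygen desaturation. AHI is calculated as the number of apneas and hypopneas per hour of sleep, and the function name `estimate_ahi` is a hypothetical one chosen for this illustration.

```python
# Hypothetical night of sleep: each number is the percent reduction in airflow
# for one candidate breathing event. These values are invented for illustration.
airflow_reductions = [95, 80, 60, 55, 45, 40, 35, 35, 32, 30, 28, 20] * 4  # 48 events
hours_of_sleep = 6.0  # assumed total sleep time


def estimate_ahi(reductions, hours, hypopnea_threshold):
    """Count events at or above the threshold and divide by hours of sleep.

    Simplification: any event meeting the hypopnea threshold is counted,
    so apneas (larger reductions) are included automatically.
    """
    events = sum(1 for r in reductions if r >= hypopnea_threshold)
    return events / hours


ahi_at_30 = estimate_ahi(airflow_reductions, hours_of_sleep, 30)
ahi_at_50 = estimate_ahi(airflow_reductions, hours_of_sleep, 50)

print(f"AHI with a 30% threshold: {ahi_at_30:.1f} events/hour")  # 6.7
print(f"AHI with a 50% threshold: {ahi_at_50:.1f} events/hour")  # 2.7
```

In this made-up example, the same recording lands above or below the commonly used cutoff of 5 events per hour depending only on which hypopnea threshold the scorer applies.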

The assessment authors also express concern that many of the nonrandomized comparative studies, particularly those that conducted analyses of adherent participants only, may be subject to inherent biases related to self-selection of who chooses (or is chosen for) CPAP, who is adherent with using the device, and different reasons for poor adherence. “Adherence is not a random event,” they note. “Overall, we believe it is likely that studies comparing CPAP use (adherence) with nonuse (nonadherence) may be biased toward increased effectiveness of CPAP.”

Bizarrely, the studies that do provide direct comparisons between CPAP use and nonuse in patients with OSA indicate no real differences between adherent user and intention-to-treat analyses, and it is unclear whether this is due to a lack of power to indicate a difference in effect or a real lack of difference. “If there is, in fact, no difference in effect between adherent and nonadherent CPAP users, this may suggest that either any benefit seen with CPAP use is actually not due to CPAP itself, but to some other behavior or action by CPAP users (for example, increased communication with the sleep clinic) or that even the ‘low dose’ of CPAP achieved by nonadherent users is effective,” the authors conjecture.

Data on long-term health outcomes comparing CPAP with other active treatments are, sadly, also sparse. “None was designed as a noninferiority or equivalence trial regarding health outcomes,” the authors find. “We conclude that the [randomized controlled trials] do not provide evidence of a difference in effect between auto- versus fixed CPAP on functional status, or between CPAP and [mandibular advancement devices] on depression and anxiety symptoms, [quality of life], or functional status. There is insufficient or no comparative evidence for other comparisons and outcomes.” Sigh.

The analysis includes a dive into the US Food and Drug Administration (FDA) database for its CPAP records. The authors found 163 CPAP devices used to treat adults with sleep apnea. But the findings aren’t reassuring here either. Most FDA 510(k) premarket notification records cite other previously cleared CPAP devices to support claims of equivalence. In fact, almost all ultimately refer back to four CPAP devices.

Unfortunately, the available data did not reference clinical studies or unpublished data submitted to the FDA that may have supported the manufacturers’ claims. “Many of these clearances relied on engineering performance or short studies with unvalidated sleep apnea parameters. It is notable that these same devices could be used for noninferiority studies for other sleep apnea technologies,” the authors write. 

The report also found no answers regarding potential modifiers of the effect of CPAP, that is, whether different CPAP features or settings (such as a ramp or a minimum tidal volume per breath) affect outcomes.

Overall, I found the AHRQ report to be disheartening. This feeling was made worse because I know the final report underwent peer review and public comment prior to its release.

For those in a position to take action on the findings, the AHRQ offers the following tips for future studies. 

  • Studies should be powered to evaluate potential differential effects of CPAP based on such factors as baseline AHI (or other validated surrogate metrics), patient symptoms, and other patient characteristics. 
  • Studies need to allow readers to fully understand patient eligibility (such as how OSA was defined, how sleep and breathing measures were measured, and what thresholds and other criteria were applied to each measure). 
  • Studies are needed to assess the validity of AHI, other breathing measures, and sleepiness scores as intermediate or surrogate measures for long-term health outcomes. Ideally, studies should assess and compare multiple intermediate or surrogate measures. 
  • Existing studies or databases may have sufficient data to allow exploratory analyses that can subsequently be further evaluated in rigorous trials.


Balk EM, Adam GP, Cao W, et al. Long-term health outcomes in obstructive sleep apnea: A systematic review of comparative studies evaluating positive airway pressure and validity of breathing measures as surrogate outcomes. Project ID: SLPT0919. (Prepared by the Brown Evidence-based Practice Center under Contract No. 290-2015-00002-I/Task Order No. 75Q80119F32017.) Agency for Healthcare Research and Quality. 2022 Dec 1. Available at:

I recommend you read the full report for yourself. Email [email protected] with your thoughts to potentially be published in our Perspectives column.

Photo 44602934 © Andrey Popov |