AASM to Certify Autoscoring Software for Sleep Stage Accuracy

A pilot certification program by the American Academy of Sleep Medicine could build confidence in sleep study autoscoring software and lower interrater variability. It could also impact sleep tech jobs.

By Alyx Arnett

As autoscoring software is increasingly being used in American Academy of Sleep Medicine (AASM)-accredited facilities, the AASM has launched a two-year certification pilot program to verify software performance.

“While autoscoring software requires FDA [Food and Drug Administration] clearance, we found that the process that these companies undergo to confirm the software’s performance was highly variable and without a clear definition of what acceptable autoscoring performance really is,” says Steve Van Hout, MBA, CAE, executive director of the AASM.

For the pilot, the AASM had 11 qualified individuals score over 100 sleep studies from US sleep labs. The privately obtained studies were chosen by AASM staff for their diversity and representation of real-world conditions.

Only stage scoring is being evaluated in the pilot program. Autoscoring software makers must pay a $7,500 fee to apply for the certification. According to Van Hout, sleep facilities and patients can be confident that any software achieving certification performs as well as or better than human scorers when evaluating sleep stages.

Building Trust in Autoscoring Software

Some sleep professionals hesitate to trust a machine for sleep scoring due to experiences with autoscoring software that underperformed. “Until the past couple of years, the computer technology to make autoscoring software that scores like a human just hasn’t existed,” says Fred Turkington, product management lead at EnsoData, maker of an AI-assisted sleep autoscoring solution that plans to apply for the new AASM certification.

Glenna Labelle, director of sales and marketing at sleep diagnostics company Cerebra, which also plans to apply for the certification, agrees. “There have been, in the past, a few autoscoring systems that did not do a great job, that were not FDA-approved, that were not published in peer-reviewed journals. This may have given a bad perception in the industry,” she says. “This resistance should be mitigated by the AASM pilot certification program for which only highly validated software will be eligible to apply.”

A certification program such as AASM’s could help build trust in the industry, allowing companies to demonstrate their software’s reliability, says Kishan Kishan, co-founder of Neurobit, maker of an AI-powered automatic sleep scoring system that plans to apply for AASM certification. “If you look at the level of trust…it’s very low because people trust another human for an answer,” Kishan says. “So we need better credibility by a very neutral stakeholder like AASM who can give a certification on that.” Such validation, he says, is important not only for sleep medicine but also for any other field that’s adopting AI as a medical device. “This is a perfect step of adoption of this kind of technology in the sleep market,” Kishan says.

Minimizing Interrater Variability

Autoscoring programs also could help reduce interrater variability in staging sleep. A study co-authored by Cerebra scientific founder Magdy Younes, MD, PhD, found it difficult for technicians to obtain agreement in over 85% of epochs, on average, for five-stage sleep scoring, even between expert scorers.¹

This is due to a number of epochs that are difficult to classify, causing techs to consider accepting “either of two, or even three, scoring options. Because scorers are obliged to make a decision, agreement (or lack thereof) in such epochs is left to chance,” according to the study.

Digitally obtained information about sleep depth, delta duration, spindles, and K-complexes can significantly reduce interrater variability in sleep staging by eliminating the guesswork in scoring epochs, according to the study.

A separate study concluded that probabilities of the sleep stages determined by artificial intelligence provide an “excellent” estimate of ambiguity among manual scoring.²

“You want the system to be reliable regardless of whoever the manual scorer is you compare it to. It should be good for everybody because everybody may manually score slightly different,” says Younes. Autoscoring programs, then, could aid in providing more consistent and reproducible scoring results.

Garnering Deeper Insights from Sleep Studies

Autoscoring programs also hold the potential to provide deeper patient insights by leveraging currently unavailable or underutilized data. Anuja Bandyopadhyay, MD, who chairs the artificial intelligence in sleep medicine committee at AASM, highlights the abundance of untapped physiologic data within sleep studies that could aid in a better understanding of sleep architecture—identifying risk factors for conditions such as heart failure or Alzheimer’s disease—and offer a more comprehensive understanding of sleep’s overall impact on health.

“It’s very hard for humans to be able to score and get through all that amount of data, but I see AI being able to do that,” Bandyopadhyay says. “In turn, I think that’s something which is going to really add value to sleep studies. It’s going to make us, of course, get more and more sleep studies done, but it’s going to help us serve the patients better. And it’s going to help improve patient care.”

According to the American Association of Sleep Technologists (AAST) 2021 Workforce Survey Report, automated scoring technologies (37%) and AI/machine learning scoring technologies (36%) were among the top-cited challenges and trends anticipated in the next three to five years. Nearly one-third (30%) of respondents believe that artificial intelligence will be used in scoring technologies and for identifying the probability of a sleep disorder.³

“Innovation is inevitable and good for both sleep medicine and sleep technology. Sleep technologists should not see AI technology as a threat, but as a tool to harness in the care of patients,” says Emerson Kerr, MBA, RRT, RPSGT, FAAST, president-elect at AAST.

Changing the Role of the Sleep Tech

But, of course, some sleep professionals do see AI as a threat, particularly worrying that it could render sleep techs obsolete. Younes, who’s also a distinguished professor emeritus at the University of Manitoba in Canada, says sleep techs will continue to be essential. He maintains that even the best autoscoring systems need to be edited. “Digital systems are not perfect, and they do need to be reviewed, as we currently recommend,” says Younes. “So, taking the report from the digital system and sending it directly to the doctor, I don’t think that is a good idea.”

Others believe autoscoring is needed due to the current shortage of technicians. Andrea Ramberg, MS, CCSH, RPSGT, president of the Board of Registered Polysomnographic Technologists, notes that the number of individuals taking certification tests is dropping, while the number of job openings is rising. Last year, 496 candidates passed the RPSGT exam in the US⁴; meanwhile, more than 1,000 sleep tech positions are open, according to Indeed.com.

Autoscoring software is more likely to change the role of the sleep tech than eliminate it, says Ramberg. With autoscoring handling the heavy lifting for sleep scoring, more sleep clinicians can fill the “desperately needed” roles in front of patients. “Let the AI technology handle the data, and allow the techs to handle the patient care,” Ramberg says.

According to Bandyopadhyay, the sleep lab at the Riley Hospital for Children at Indiana University Health is booked six months in advance. Autoscoring programs could aid in reducing waiting periods by enabling a single technician to attend to multiple patients, while also reducing the risk of technician burnout, she says.

She sees the technician serving in more of a supervisory role, overseeing the autoscoring process as the machine progresses through it. “I can see it as a very collaborative effort between the autoscoring and the technician,” says Bandyopadhyay, who’s also an assistant professor of clinical pediatrics in the department of pediatrics at Indiana University School of Medicine.

Wider acceptance of autoscoring technology could also create new opportunities for sleep facilities. According to Labelle, it could free technicians from time-consuming tasks, giving them the opportunity to explore different avenues, such as expanding home sleep testing programs, focusing on daytime administrative tasks, expanding patient care, or conducting additional quality assurance tasks.

Beyond the Autoscoring Pilot Program

The next step in AASM’s autoscoring certification program is to evaluate all sleep parameters required in the AASM Manual for the Scoring of Sleep and Associated Events. The academy also intends to grow the number of sleep studies evaluated in the certification process and increase the number of scorers used in determining appropriate performance measures.

As autoscoring software continually improves, the bar for minimum acceptable performance in the certification program may need to move higher, says AASM’s Van Hout, which could incentivize more autoscoring manufacturers to improve their scoring performance.

References

1. Younes M, Hanly PJ. Minimizing interrater variability in staging sleep by use of computer-derived features. J Clin Sleep Med. 2016 Oct 15;12(10):1347-56.

2. Bakker JP, Ross M, Cerny A, et al. Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep. 2023 Feb 2;46(2).

3. AAST. 2021 workforce survey report. 2021 Aug. Available at https://www.aastweb.org/Portals/0/Docs/News/AAST%202021%20Workforce%20Survey%20Report.pdf.

4. BRPT. 2022 year in review: The BRPT annual overview. 2023 Apr. Available at https://www.brpt.org/wp-content/uploads/BRPT-Annual-Report-2022-FINAL-.pdf.