Stanford researchers built SleepFM, an AI model trained on 585,000 hours of overnight sleep recordings. From a single night, it forecasts the risk of 130 conditions, including dementia, heart failure and stroke.

Spend a night in a sleep lab and you leave behind an enormous amount of data. Electrodes on your scalp, straps across your chest, a clip on your finger, and sensors at your nose and legs all stream signal for hours. A technician later scores the night into stages and flags a few breathing pauses, and then most of the raw signal is filed away and never looked at again. That waste has always bothered sleep scientists. Polysomnography, the formal name for this kind of overnight recording, is the gold standard for studying sleep, yet the recordings are messy, inconsistent from lab to lab, and hard to compare at scale.
A team at Stanford, writing in Nature Medicine, decided to treat all that discarded signal as a language worth learning. They built an AI model called SleepFM and trained it on more than 585,000 hours of overnight recordings from roughly 65,000 people across several cohorts. The result is a system that, from one night of sleep, estimates a person's future risk for 130 different medical conditions.
SleepFM is what researchers call a foundation model. Rather than being built for one narrow task, it learns general-purpose representations first and gets pointed at specific questions later. The training trick here is contrastive learning, a method that teaches the model which slices of physiological signal belong together and which do not. Sleep labs use different montages of sensors, so the authors designed their approach to swallow multiple recording configurations rather than choke on them. What comes out is a compact numerical fingerprint of a night that captures both the physiology and the timing of how sleep unfolds.
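For readers who want to see the idea in code, here is a minimal Python sketch of a generic contrastive objective known as InfoNCE. It is an illustration of the general technique, not the authors' actual training loss. The function names, the toy two-dimensional embeddings, and the temperature value are all assumptions made up for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(anchors, candidates, temperature=0.1):
    """Mean InfoNCE loss.

    anchors[i] and candidates[i] are assumed to be embeddings of the same
    slice of the night from two different sensor groups. Every other
    candidate acts as a negative for anchor i.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the true match.
        total += log_norm - logits[i]
    return total / len(anchors)

# Toy embeddings (hypothetical): three slices, two sensor groups.
brain = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
breathing_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Rotate the list so that no slice lines up with its true partner.
breathing_shuffled = breathing_aligned[1:] + breathing_aligned[:1]

loss_aligned = info_nce(brain, breathing_aligned)
loss_shuffled = info_nce(brain, breathing_shuffled)
```

In this toy setup the loss is lower when matching slices point in similar directions than when the pairing is scrambled, which is the pressure that pushes a model to learn which signals belong together.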
The headline claim is prediction. For 130 conditions, SleepFM reached a C-Index of at least 0.75. The C-Index measures how well a model ranks people by who will develop a condition sooner: a score of 0.5 is a coin flip and 1.0 is perfect. For all-cause mortality the model hit 0.84. For dementia it reached 0.85. Myocardial infarction landed at 0.81, heart failure at 0.80, chronic kidney disease at 0.79, and both stroke and atrial fibrillation at 0.78. These are risk forecasts drawn from a single overnight recording, not diagnoses, but the breadth is what stands out. The same night of sleep carries signal about the brain, the heart, and the kidneys at once.
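The C-Index is simple enough to compute by hand on a small example. The sketch below follows the standard textbook definition (often called Harrell's C): among pairs of people where it is known who developed the condition first, it counts how often the model gave the earlier case the higher risk score. The function name and the four-person dataset are invented for illustration and have nothing to do with the study's data or code.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs that the risk scores rank correctly.

    times  - follow-up time for each person
    events - True if the condition was observed, False if follow-up
             ended without it (censored)
    risks  - model risk scores; higher should mean earlier onset
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair needs an observed event to anchor it
        for j in range(n):
            # Person i is known to have developed the condition
            # before person j's follow-up time.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical follow-up data for four people.
times = [2, 4, 6, 8]                 # years until event or end of follow-up
events = [True, True, False, True]   # the third person is censored
risks = [0.9, 0.4, 0.5, 0.1]         # made-up model scores

c_index = concordance_index(times, events, risks)
# 0.8: four of the five comparable pairs are ranked correctly
```

The one miss is the second person, who developed the condition at year four but was scored as lower risk than the third person, who was still healthy at year six. A model that gave everyone the same score would land at exactly 0.5.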
A model that only works on the data it grew up with is not much use. So the team tested SleepFM on the Sleep Heart Health Study, a dataset deliberately held out of pretraining. It held up there, which suggests the representations are not just memorized quirks of the training cohorts.
The researchers also checked whether their sprawling model could still do the everyday work sleep labs actually care about. On sleep staging, the task of labeling each chunk of the night as wake, REM, or one of the non-REM stages, SleepFM posted mean F-scores between 0.70 and 0.78, competitive with dedicated tools like U-Sleep and YASA that were built for that one job. It reached 0.87 accuracy on sleep apnea presence and 0.69 on apnea severity. So the same system that forecasts long-range disease risk can also handle the routine scoring that currently eats up technician hours.
Some restraint is in order. A high C-Index tells you the model ranks people well, but it does not tell you when a disease will strike or whether anything can be done about it. Prediction is not the same as understanding cause, and a night of sleep that flags elevated dementia risk offers no treatment on its own. The cohorts, while large, are made up of people who agreed to formal sleep studies, a group that skews toward those already suspected of having sleep problems. Whether the forecasts hold up in the general population, across ages and ethnicities, is a question for prospective testing rather than this retrospective analysis.
Still, the shift in framing is worth sitting with. For decades a sleep study answered one narrow question, usually about snoring or apnea, and then the data went cold. This work treats that same night as a dense physiological readout of the whole body. The signal was always there. Nobody had taught a machine to read the whole page.