Quantified Self Experiments / Accuracy - Withings Sleep Analyzer vs Dreem 2 EEG sleep staging

Abstract

What did i do?

I've measured my sleep data by using Withings Sleep Analyzer 2.0 and Dreem 2 EEG.

How did i do it?

I've used confusion matrix to reveal agreement between 2 devices.

What did i learn?

Withings Sleep Analyzer v2 sleep staging looks better than Oura, but in a moderate agreement (~68%) with EEG device. I'll ignore withings sleep staging since i need a good sensitivity for variations in sleep.
Here is final agreement (diagonal line) if you dont have time to read:

Introduction

Sleep stages data may reveal some sleep disorders and may be used for sleep quality and quantity asessment. Withings sleep algorithms determine sleep stages by using non-eeg data (heart rate by ballistocardiography, movement, breathing rate) and their accuracy is questionable.

The purpose of this experiment (n=1) is to compared hypnogram data between under mattress device and eeg headband.

Materials & Methods

Participants

Adult male anthropometrics was described in previous article.

Experimental design

From 2020-08-27 to 2021-09-21 sleep quality and quantity was assessed by Dreem 2 EEG Headband which was validated against gold standard PSG. At same time Withings Sleep Analyzer Tracking Mat v2.0 was put under matress and sleep data was collected. There were 37 nights with data from both devices.

Results

To compare sleep accuracy i've decided to use confused matrix approach. Hypnograms were compared on same resolutions. There were total 301 hours of data and i've decided not to build CI's because of large dataset.
Lets build a simplified confusion matrix plot:

How to read confusion matrix? Columns belongs to Withings, rows belongs to Dreem, diagonal is agreement between both devices. In a 1st column we can see that 60% of Withings DEEP sleep was recognized as DEEP sleep by Dreem 2, 35% of Withings DEEP was recognized as LIGHT by Dreem 2 and only 2.4% of Withings DEEP was actually REM. Sum of each column is a 100%.

Cohen's Kappa is 0.51 which is a moderate agreement:

Here is more detailed confusion matrix if anybody interested:

Prediction is Dreem 2 and Target is Withings 2.0.

In contrast to Oura ring, Withings does not over/underestimate sleep stages if we look at proportions:

Withings Dreem Difference
DEEP 21.8% 19.8% slight
LIGHT 47.4% 49.1% slight
REM 24.2% 24.9% slight
AWAKE 6.7% 6.2% slight

Percentage base is Time in Bed = DEEP + LIGHT + REM. + AWAKE.

Anyway, even average data is impressing and catch sleep composition proportions - this data of less interest because does not require stages predicted at correct time as at previous chart. Abscense of variations in the presence with low agreement is confusing me.

Discussion

This data analysis suggests poor accuracy of sleep staging from Withings Sleep Mat v2. Only half of awake time was detected, meaning total sleep time (DEEP + LIGHT + REM), which is of large interest, is also incorrect.

In contrast, averaged sleep composition results look very impressive.

I'll ignore and will not use Withings sleep staging data and TST in my experiments / data analysis. In a present time i cant find a non-eeg devices with acceptable sleep staging.

Data availability & Information

Welcome for questions, suggestions and critics in comments below.

Original unmodified (exported) raw data for withings is here and for dreem is here.

library(stringr)
library(lubridate)
library(jsonlite)
library(dplyr)
library(plyr)
library(cvms)
library(e1071)
library(vcd)
library(caret)
library(modeest)
library(ggplot2)
library(iterators)
library(parallel)
library(doParallel)
registerDoParallel(cores=24)

dreem <- read.csv("https://blog.kto.to/uploads/dreem-v3.csv", skip = 5, sep = ';', header = TRUE)
withings <- read.csv("https://blog.kto.to/uploads/raw_bed_sleep-state.csv")

#process dreem hypnogram
dreem <- dreem[!is.na(dreem$Type),]
dreem$Hypnogram <- str_replace(dreem$Hypnogram, "\\[", "")
dreem$Hypnogram <- str_replace(dreem$Hypnogram, "\\]", "")
dreem$Hypnogram <- str_replace_all(dreem$Hypnogram, "WAKE", "4")
dreem$Hypnogram <- str_replace_all(dreem$Hypnogram, "REM", "3")
dreem$Hypnogram <- str_replace_all(dreem$Hypnogram, "Light", "2")
dreem$Hypnogram <- str_replace_all(dreem$Hypnogram, "Deep", "1")
dreem$HypnogramList <- str_split(dreem$Hypnogram, ",")
dreem$datetime <- ymd_hms(dreem$Start.Time)
dreems <- dreem

results <- foreach(i=1:nrow(dreems), .combine='rbind', .multicombine=TRUE, .packages = "lubridate") %dopar% {
  ddf = NULL
  for (j in 1:length(dreems$HypnogramList[i][[1]]))
  {
    stage <- as.numeric(dreems$HypnogramList[i][[1]][j])
    datetime <- dreems$datetime[i] + 30 * (j - 1)
    res <- c(datetime = datetime, stage = stage)
    ddf = rbind(ddf, res)
  }
  return(ddf)
}

dreem_data <- as.data.frame(results)
row.names(dreem_data) <- NULL
summary(dreem_data)

#process withings hypnogram
withings$datetime <- ymd_hms(withings$start)
withings <- withings[withings$datetime > "2020-08-01",]
withings$value <- str_replace(withings$value, "\\[", "")
withings$value <- str_replace(withings$value, "\\]", "")
withings$value <- str_replace_all(withings$value, "0", "4")
withings$value <- str_replace_all(withings$value, "2", "5")
withings$value <- str_replace_all(withings$value, "1", "2")
withings$value <- str_replace_all(withings$value, "5", "1")
withings$HypnogramList <- str_split(withings$value, ",")

withings_results <- foreach(i=1:nrow(withings), .combine='rbind', .multicombine=TRUE, .packages = "lubridate") %dopar% {
  wdf = NULL
  for (j in 1:length(withings$HypnogramList[i][[1]]))
  {
    stage <- as.numeric(withings$HypnogramList[i][[1]][j])
    for(k in 1:2) #downsample to 30-sec like dreem
    {
      datetime <- withings$datetime[i] + 60 * (j - 1) + (k - 1) * 30
      res <- c(datetime = datetime, stage = stage)
      wdf = rbind(wdf, res)
    }
  }
  return(wdf)
}

withings_data <- as.data.frame(withings_results)
row.names(withings_data) <- NULL
summary(withings_data)

#round datas for both hypnograms
period <-  "30s"
withings_data$dt <- as.POSIXct(withings_data$datetime, origin="1970-01-01")
withings_data$period <- round_date(withings_data$dt, period)
dreem_data$dt <- as.POSIXct(dreem_data$datetime, origin="1970-01-01")
dreem_data$period <- round_date(dreem_data$dt, period)

#merge data
merged_hypnogram <- inner_join(as.data.frame(withings_data[,c("period","stage")]), as.data.frame(dreem_data[,c("period","stage")]) , by = c("period" = "period"))
colnames(merged_hypnogram) <- c("datetime","withings","dreem")

merged_hypnogram <- merged_hypnogram[!is.na(merged_hypnogram$withings),]
merged_hypnogram <- merged_hypnogram[!is.na(merged_hypnogram$dreem),]

merged_hypnogram$dreem[merged_hypnogram$dreem == 1] <- "DEEP";
merged_hypnogram$dreem[merged_hypnogram$dreem == 2] <- "LIGHT";
merged_hypnogram$dreem[merged_hypnogram$dreem == 3] <- "REM";
merged_hypnogram$dreem[merged_hypnogram$dreem == 4] <- "AWAKE";

merged_hypnogram$withings[merged_hypnogram$withings == 1] <- "DEEP";
merged_hypnogram$withings[merged_hypnogram$withings == 2] <- "LIGHT";
merged_hypnogram$withings[merged_hypnogram$withings == 3] <- "REM";
merged_hypnogram$withings[merged_hypnogram$withings == 4] <- "AWAKE";

category_order <- c("DEEP", "LIGHT", "REM", "AWAKE")
merged_hypnogram$withings = factor(merged_hypnogram$withings, levels = category_order)
merged_hypnogram$dreem = factor(merged_hypnogram$dreem, levels = category_order)

cm <- confusion_matrix(targets = as.factor(merged_hypnogram$withings), predictions = as.factor(merged_hypnogram$dreem))

Kappa(as.matrix(cm$Table[[1]], rownames = F))
plot_confusion_matrix(cm$`Confusion Matrix`[[1]],
                      place_x_axis_above = T,
                      add_row_percentages = F,
                      add_col_percentages = T,
                      rotate_y_text = F,
                      class_order = c("AWAKE", "REM", "LIGHT", "DEEP"),
                      add_sums = T)

cm <- table(merged_hypnogram$withings,merged_hypnogram$dreem)
cm <- cm / rowSums(cm)
cm <- as.data.frame(cm, stringsAsFactors = TRUE)
cm$Var2 <- factor(cm$Var2, rev(levels(cm$Var2)))

ggplot(cm, aes(Var1, Var2, fill = round(100*Freq,1))) +
  geom_tile() +
  geom_text(aes(label = paste(" ", round(100*Freq,1),"%")),  size=6) +
  scale_x_discrete(expand = c(0, 0),position = 'top') +
  scale_y_discrete(expand = c(0, 0),position = 'left') +
  scale_fill_gradient(low = "white", high = "#3575b5") +
  labs(x = "Withings Sleep 2.0", y = "Dreem 2", fill = "Agreement, %") +
  theme(legend.title = element_text(size = 12, margin = margin(0, 20, 10, 0)),
        axis.title.x = element_text(margin = margin(20, 20, 20, 20), size = 18),
        axis.title.y = element_text(margin = margin(0, 20, 0, 10), size = 18))

Statistical analysis

RStudio version 1.3.959 and R version 4.0.2. Cohen's Kappa interpretation reference