I've measured my sleep data by using Fitbit Charge 4 and Dreem 2 EEG.
I've used confusion matrix to reveal agreement between 2 devices.
Fitbit Charge 4 sleep staging looks better than Withings and Oura with moderate agreement ~73% to EEG device.
Here is final agreement (diagonal line) if you dont have time to read:
Sleep stages data may reveal some sleep disorders and may be used for sleep quality and quantity asessment. Fitbit Charge 4 sleep algorithms determine sleep stages by using non-eeg data (hr, hrv, accelerometer, skin temperature) and accuracy is questionable.
The purpose of this experiment (n=1) is to compared hypnogram data between wristworn device and eeg headband.
Adult male anthropometrics was described in previous article.
From 2020-08-27 to 2021-09-21 sleep quality and quantity was assessed by Dreem 2 EEG Headband which was validated against gold standard PSG. At same time Fitbit Charge 4 was used and sleep data was collected. There were 36 nights with data from both devices.
To compare sleep accuracy i've decided to build confusion matrix. Hypnograms were compared on same resolutions. There were total 287 hours of data and i've decided not to build confidence intervals because of large dataset.
Lets build a simplified confusion matrix plot:
How to read confusion matrix? Columns belongs to Fitbit Charge 4, rows belongs to Dreem, diagonal is agreement between both devices. In a 1st column we can see that 75% of Fitbit DEEP sleep was recognized as DEEP sleep by Dreem 2, 24% of Fitbit DEEP was recognized as LIGHT by Dreem 2 and only 1% of Fitbit DEEP was actually REM. Sum of each column is a 100%.
Cohen's Kappa is 0.59 which is a moderate agreement:
Here is more detailed confusion matrix if anyone interested:
Prediction is Dreem 2 and Target is Fitbit Charge 4.
In contrast to Oura ring, and like a Withings Sleep Mat - Fitbit Charge 4 does not over/underestimate sleep stages if we look at proportions:
Fitbit Charge 4 | Dreem | Difference | |
DEEP | 19.9% | 19.9% | none |
LIGHT | 52% | 49% | slight |
REM | 21.5% | 24.7% | slight |
AWAKE | 6.7% | 6.5% | none |
Percentage base is Time in Bed = DEEP + LIGHT + REM. + AWAKE.
Anyway, even average data is impressing and catch sleep composition proportions - this data of less interest because does not require stages predicted at correct time as at previous chart.
This data analysis suggests moderate accuracy of sleep staging from Fitbit Charge 4. But only half of awake time was detected at right time, meaning total sleep time (DEEP + LIGHT + REM), which is of large interest, is incorrect.
Compared to Oura and Withings devices Fitbit Charge 4 accuracy is impressive, but still far from EEG.
I'll ignore and will not use Fitbit Charge sleep staging data and TST in my experiments / data analysis (even 75% agreement isnt enough sensitivity). In a present time i cant find a non-eeg devices with acceptable sleep staging.
Welcome for questions, suggestions and critics in comments below.
Original unmodified (exported) raw data for fitbit is here and for dreem is here.
library(mise)
mise()
library(stringr)
library(lubridate)
library(jsonlite)
library(dplyr)
library(plyr)
library(cvms)
library(e1071)
library(vcd)
library(caret)
library(modeest)
library(ggplot2)
library(iterators)
library(parallel)
library(doParallel)
registerDoParallel(cores=24)
dreem <- read.csv("https://blog.kto.to/uploads/dreem-v3.csv", skip = 5, sep = ';', header = TRUE)
rawfitbit <- bind_rows(
fromJSON("https://blog.kto.to/uploads/fitbit/sleep-2021-05-10.json"),
fromJSON("https://blog.kto.to/uploads/fitbit/sleep-2021-06-09.json"),
fromJSON("https://blog.kto.to/uploads/fitbit/sleep-2021-07-09.json"),
fromJSON("https://blog.kto.to/uploads/fitbit/sleep-2021-08-08.json"),
fromJSON("https://blog.kto.to/uploads/fitbit/sleep-2021-09-07.json")
)
#process dreem hypnogram
dreem <- dreem[!is.na(dreem$Type),]
dreem$Hypnogram <- str_replace(dreem$Hypnogram, "\\[", "")
dreem$Hypnogram <- str_replace(dreem$Hypnogram, "\\]", "")
dreem$Hypnogram <- str_replace_all(dreem$Hypnogram, "WAKE", "4")
dreem$Hypnogram <- str_replace_all(dreem$Hypnogram, "REM", "3")
dreem$Hypnogram <- str_replace_all(dreem$Hypnogram, "Light", "2")
dreem$Hypnogram <- str_replace_all(dreem$Hypnogram, "Deep", "1")
dreem$HypnogramList <- str_split(dreem$Hypnogram, ",")
dreem$datetime <- ymd_hms(dreem$Start.Time)
dreems <- dreem
results <- foreach(i=1:nrow(dreems), .combine='rbind', .multicombine=TRUE, .packages = "lubridate") %dopar% {
ddf = NULL
for (j in 1:length(dreems$HypnogramList[i][[1]]))
{
stage <- as.numeric(dreems$HypnogramList[i][[1]][j])
datetime <- dreems$datetime[i] + 30 * (j - 1)
res <- c(datetime = datetime, stage = stage)
ddf = rbind(ddf, res)
}
return(ddf)
}
dreem_data <- as.data.frame(results)
row.names(dreem_data) <- NULL
summary(dreem_data)
#process fitbit hypnogram
fitbit <- data.frame(rawfitbit)
fitbit$datetime <- ymd_hms(fitbit$startTime)
fitbits <- fitbit[1,]
fitbits <- fitbit
fitbit_results <- foreach(i=1:nrow(fitbits), .combine='rbind', .multicombine=TRUE, .packages = "lubridate") %dopar% {
fdf = NULL
data <- as.data.frame(fitbit$levels$data[i])
for(j in 1:nrow(data))
{
stage <- data$level[j]
for(k in 1:round(data$seconds[j]/30)) #downsample to 30-sec like dreem
{
datetime <- ymd_hms(data$dateTime[j]) + (k - 1)*30
res <- c(datetime = datetime, stage = stage)
fdf = rbind(fdf, res)
}
}
return(fdf)
}
fitbit_data <- as.data.frame(fitbit_results)
row.names(fitbit_data) <- NULL
fitbit_data$stage <- str_replace_all(fitbit_data$stage, "wake", "4")
fitbit_data$stage <- str_replace_all(fitbit_data$stage, "rem", "3")
fitbit_data$stage <- str_replace_all(fitbit_data$stage, "light", "2")
fitbit_data$stage <- str_replace_all(fitbit_data$stage, "deep", "1")
fitbit_data$stage <- as.numeric(fitbit_data$stage)
fitbit_data$datetime <- as.numeric(fitbit_data$datetime) - 7*3600
summary(fitbit_data)
#round datas for both hypnograms
period <- "30s"
fitbit_data$dt <- as.POSIXct(fitbit_data$datetime, origin="1970-01-01 00:00:00")
fitbit_data$period <- round_date(fitbit_data$dt, period)
dreem_data$dt <- as.POSIXct(dreem_data$datetime, origin="1970-01-01")
dreem_data$period <- round_date(dreem_data$dt, period)
#merge data
merged_hypnogram <- inner_join(as.data.frame(fitbit_data[,c("period","stage")]), as.data.frame(dreem_data[,c("period","stage")]) , by = c("period" = "period"))
colnames(merged_hypnogram) <- c("datetime","fitbit","dreem")
merged_hypnogram <- merged_hypnogram[!is.na(merged_hypnogram$fitbit),]
merged_hypnogram <- merged_hypnogram[!is.na(merged_hypnogram$dreem),]
merged_hypnogram$dreem[merged_hypnogram$dreem == 1] <- "DEEP";
merged_hypnogram$dreem[merged_hypnogram$dreem == 2] <- "LIGHT";
merged_hypnogram$dreem[merged_hypnogram$dreem == 3] <- "REM";
merged_hypnogram$dreem[merged_hypnogram$dreem == 4] <- "AWAKE";
merged_hypnogram$fitbit[merged_hypnogram$fitbit == 1] <- "DEEP";
merged_hypnogram$fitbit[merged_hypnogram$fitbit == 2] <- "LIGHT";
merged_hypnogram$fitbit[merged_hypnogram$fitbit == 3] <- "REM";
merged_hypnogram$fitbit[merged_hypnogram$fitbit == 4] <- "AWAKE";
category_order <- c("DEEP", "LIGHT", "REM", "AWAKE")
merged_hypnogram$fitbit = factor(merged_hypnogram$fitbit, levels = category_order)
merged_hypnogram$dreem = factor(merged_hypnogram$dreem, levels = category_order)
cm <- confusion_matrix(targets = as.factor(merged_hypnogram$fitbit), predictions = as.factor(merged_hypnogram$dreem))
Kappa(as.matrix(cm$Table[[1]], rownames = F))
plot_confusion_matrix(cm$`Confusion Matrix`[[1]],
place_x_axis_above = T,
add_row_percentages = F,
add_col_percentages = T,
rotate_y_text = F,
class_order = c("AWAKE", "REM", "LIGHT", "DEEP"),
add_sums = T)
cm <- table(merged_hypnogram$fitbit,merged_hypnogram$dreem)
cm <- cm / rowSums(cm)
cm <- as.data.frame(cm, stringsAsFactors = TRUE)
cm$Var2 <- factor(cm$Var2, rev(levels(cm$Var2)))
ggplot(cm, aes(Var1, Var2, fill = round(100*Freq,1))) +
geom_tile() +
geom_text(aes(label = paste(" ", round(100*Freq,1),"%")), size=6) +
scale_x_discrete(expand = c(0, 0),position = 'top') +
scale_y_discrete(expand = c(0, 0),position = 'left') +
scale_fill_gradient(low = "white", high = "#3575b5") +
labs(x = "Fitbit Charge 4", y = "Dreem 2", fill = "Agreement, %") +
theme(legend.title = element_text(size = 12, margin = margin(0, 20, 10, 0)),
axis.title.x = element_text(margin = margin(20, 20, 20, 20), size = 18),
axis.title.y = element_text(margin = margin(0, 20, 0, 10), size = 18))
RStudio version 1.3.959 and R version 4.0.2. Cohen's Kappa interpretation reference