Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to perform correlation between time-series of unequal frequencies

Tags:

r

time-series

I measured room temperature every minute for 36 minutes and skin temperature 32 times per second for the same time period. I have 35 repeats of the experiment labelled (ID). I need to be able to look at the correlation but, the samples are of unequal sizes.

Data:

I have a data.frame df1 with room temperature measured every minute and another data.frame df2 with skin temperature measured 32 times per second. I have 36 minutes worth of data. In addition there is another column called ID which shows the experiment number (1-35) but I don't know how to represent this in the following example data. So technically I'm looking for correlation for each SkinTemp vs RoomTemp based on ID.

    df1 <- data.frame(
        roomTemp = rnorm(1*36),
    )

   df2 <- data.frame(
        skinTemp = rnorm(32*60*36),
        )

I tried doing:

Data <- data.frame(
  Y=c(df1,df2),
  Variable =factor(rep(c("RoomTemp", "SkinTemp"), times=c(length(df1), length(df2))))
)

cor(Data$Y~Data$Variable)

but that doesn't seem to work.

like image 767
HCAI Avatar asked Dec 11 '22 12:12

HCAI


1 Answers

A rolling join or interpolation might be helpful for imputing roomTemp for times when skinTemp was measured. Below are examples of both. The first section is an update to deal with multiple IDs, followed by the original answer for the case of a single ID.

Modification for multiple IDs

This update addresses the case of data with multiple IDs where we want to either interpolate or do a rolling join separately for each ID.

library(data.table)
library(reshape2)
library(dplyr)
library(purrr)
library(ggplot2)
theme_set(theme_classic(base_size=16))

First, we'll create fake autocorrelated data for two separate IDs:

set.seed(395)
df1 <- data.frame(roomTemp = c(cumsum(rnorm(1*36)), cumsum(rnorm(1*36))),
                  ID = rep(c("A","B"), each=36))
df2 <- data.frame(skinTemp = c(cumsum(rnorm(32*60*36,0,0.01)),
                               cumsum(rnorm(32*60*36,0,0.01))),
                  ID = rep(c("A","B"), each=32*60*36))

Now we add a time column, but in this case I've also added a shift in df1, so that no df1 measurement happens at the same time as a df2 measurement, just to make the answer more general.

# Add time column
df1$time = rep(0:(0.5*nrow(df1)-1)*60 + 0.0438,2)
df2$time = rep(0:(0.5*nrow(df2)-1)/32, 2)

Convert the data frames to data tables. This time, we make ID a key column in addition to time so that the rolling join will occur separately for each ID.

# Convert data frames to data tables
setDT(df1)
setDT(df2)

# Make ID and time key columns in both data frames (for joining)
setkey(df1, ID, time)
setkey(df2, ID, time)

# Rolling join roomTemp to nearest time value of skinTemp
df2 = df1[df2, roll="nearest"]

# Rename rolling joined room temperature column
names(df2)[grep("roomTemp", names(df2))] = "roomTempRoll"

To add the interpolated roomTemp by ID, I've used map_df from the purrr package. map_df operates separately on each ID. approx takes care of the interpolation. In the original answer I used approxfun to create an approximation function first, but here I've just done the interpolation directly in a single step. map_df returns a data frame, but we just need the y column, which has the interpolated values of roomTemp, so I've extracted those at the end of the dplyr function chain and assigned them to roomTempInterp in df2.

# Add interpolated room temperature by ID
df2$roomTempInterp = unique(df2$ID) %>% 
  map_df(~ approx(df1$time[df1$ID==.x], df1$roomTemp[df1$ID==.x], 
                  xout=df2$time[df2$ID==.x]), .id="ID") %>% .$y

In the plot below, we facet by ID so that we can see the imputed temperature values separately for each ID.

# Plot so we can see what the rolling joined room temperature and 
#  interpolated room temperature look like
ggplot(melt(df2, id.var=c("ID", "time")), aes(time, value, colour=variable)) +
  geom_line(size=0.7) +
  geom_point(data=df1, aes(time, roomTemp), colour="black") +
  facet_grid(ID ~ .)

enter image description here

Here's one way to get the correlations by ID:

df2 %>% group_by(ID) %>%
  summarise(r_interp = cor(skinTemp, roomTempInterp, use="pairwise.complete.obs"),
            r_roll = cor(skinTemp, roomTempRoll, use="pairwise.complete.obs"))
      ID    r_interp      r_roll
1      A -0.04853998 -0.02993207
2      B -0.53993960 -0.53092150

Original Answer

First, I modified the sample data frames to add some autocorrelation, since that seemed a bit closer to your real experiment and makes visualization easier.

library(data.table)
library(reshape2)
library(dplyr)
library(ggplot2)
theme_set(theme_classic(base_size=16))

# Fake data with autocorrelation
set.seed(395)
df1 <- data.frame(roomTemp = cumsum(rnorm(1*36)))
df2 <- data.frame(skinTemp = cumsum(rnorm(32*60*36,0,0.01)))

Now add a time column. You can work with actual datetime columns, but here I've just gone with numeric columns denominated in seconds.

# Add time column
df1$time = 0:(nrow(df1)-1)*60
df2$time = 0:(nrow(df2)-1)/32

For interpolation, we need a function that will interpolate room temperatures at the times when skin temperature is measured in between the room temperature measurements. approxfun performs linear interpolation between points. You can also use splinefun in a similar way to interpolate using splines.

# Function to interpolate room temperature between measurements
roomTempInterp = approxfun(df1$time, df1$roomTemp)

Convert the data frames to data tables in order to use data.table's rolling join functionality.

# Convert data frames to data tables
setDT(df1)
setDT(df2)

# Make time a key column in both data frames (for joining)
setkey(df1, time)
setkey(df2, time)

Now perform a rolling join to the nearest time value.

# Rolling join roomTemp to nearest time value of skinTemp
df2 = df1[df2, roll="nearest"]

# Rename rolling joined room temperature column
names(df2)[grep("roomTemp", names(df2))] = "roomTempRoll"

Merge original roomTemp measurements from df1 into df2.

df2 = df1[df2, ]  # Equivalent to dplyr: df2 = left_join(df2, df1)

Add the interpolated room temperature using the function we created above.

# Add interpolated room temperature
df2$roomTempInterp = roomTempInterp(df2$time)

The interpolation method seems more realistic to me, especially if we can assume roomTemp changes relatively smoothly and monotonically between measurements. Below are the first 10 rows of df2, which includes the original df2 data plus the new roomTempRoll and roomTempInterp columns and the original roomTemp measurements from df1. You can now use this data frame to assess correlation and other relationships between roomTemp and skinTemp.

    roomTemp    time roomTempRoll     skinTemp roomTempInterp
 1: -1.21529 0.00000     -1.21529 -0.006511475      -1.215290
 2:       NA 0.03125     -1.21529 -0.014058076      -1.215531
 3:       NA 0.06250     -1.21529 -0.017741690      -1.215773
 4:       NA 0.09375     -1.21529 -0.030211177      -1.216014
 5:       NA 0.12500     -1.21529 -0.027105225      -1.216255
 6:       NA 0.15625     -1.21529 -0.035784295      -1.216497
 7:       NA 0.18750     -1.21529 -0.031319748      -1.216738
 8:       NA 0.21875     -1.21529 -0.033758959      -1.216979
 9:       NA 0.25000     -1.21529 -0.040667384      -1.217220
10:       NA 0.28125     -1.21529 -0.026291442      -1.217462

Below is a plot so you can see what the rolling join and interpolated values look like. The black dots mark the original roomTemp measurements.

ggplot(melt(df2 %>% select(-roomTemp), id.var="time"), aes(time, value, colour=variable)) +
  geom_line(size=1) +
  geom_point(data=df2, aes(time, roomTemp), colour="black")

enter image description here

like image 185
eipi10 Avatar answered Dec 28 '22 08:12

eipi10