I measured room temperature every minute for 36 minutes and skin temperature 32 times per second for the same time period. I have 35 repeats of the experiment labelled (ID). I need to be able to look at the correlation but, the samples are of unequal sizes.
Data:
I have a data.frame df1
with room temperature measured every minute and another data.frame df2
with skin temperature measured 32 times per second. I have 36 minutes worth of data. In addition there is another column called ID which shows the experiment number (1-35) but I don't know how to represent this in the following example data. So technically I'm looking for correlation for each SkinTemp vs RoomTemp based on ID.
df1 <- data.frame(
roomTemp = rnorm(1*36),
)
df2 <- data.frame(
skinTemp = rnorm(32*60*36),
)
I tried doing:
Data <- data.frame(
Y=c(df1,df2),
Variable =factor(rep(c("RoomTemp", "SkinTemp"), times=c(length(df1), length(df2))))
)
cor(Data$Y~Data$Variable)
but that doesn't seem to work.
A rolling join or interpolation might be helpful for imputing roomTemp
for times when skinTemp
was measured. Below are examples of both. The first section is an update to deal with multiple IDs, followed by the original answer for the case of a single ID.
This update addresses the case of data with multiple IDs where we want to either interpolate or do a rolling join separately for each ID.
library(data.table)
library(reshape2)
library(dplyr)
library(purrr)
library(ggplot2)
theme_set(theme_classic(base_size=16))
First, we'll create fake autocorrelated data for two separate IDs:
set.seed(395)
df1 <- data.frame(roomTemp = c(cumsum(rnorm(1*36)), cumsum(rnorm(1*36))),
ID = rep(c("A","B"), each=36))
df2 <- data.frame(skinTemp = c(cumsum(rnorm(32*60*36,0,0.01)),
cumsum(rnorm(32*60*36,0,0.01))),
ID = rep(c("A","B"), each=32*60*36))
Now we add a time column, but in this case I've also added a shift in df1
, so that no df1
measurement happens at the same time as a df2
measurement, just to make the answer more general.
# Add time column
df1$time = rep(0:(0.5*nrow(df1)-1)*60 + 0.0438,2)
df2$time = rep(0:(0.5*nrow(df2)-1)/32, 2)
Convert the data frames to data tables. This time, we make ID
a key column in addition to time
so that the rolling join will occur separately for each ID
.
# Convert data frames to data tables
setDT(df1)
setDT(df2)
# Make ID and time key columns in both data frames (for joining)
setkey(df1, ID, time)
setkey(df2, ID, time)
# Rolling join roomTemp to nearest time value of skinTemp
df2 = df1[df2, roll="nearest"]
# Rename rolling joined room temperature column
names(df2)[grep("roomTemp", names(df2))] = "roomTempRoll"
To add the interpolated roomTemp
by ID
, I've used map_df
from the purrr
package. map_df
operates separately on each ID
. approx
takes care of the interpolation. In the original answer I used approxfun
to create an approximation function first, but here I've just done the interpolation directly in a single step. map_df
returns a data frame, but we just need the y
column, which has the interpolated values of roomTemp
, so I've extracted those at the end of the dplyr
function chain and assigned them to roomTempInterp
in df2
.
# Add interpolated room temperature by ID
df2$roomTempInterp = unique(df2$ID) %>%
map_df(~ approx(df1$time[df1$ID==.x], df1$roomTemp[df1$ID==.x],
xout=df2$time[df2$ID==.x]), .id="ID") %>% .$y
In the plot below, we facet by ID
so that we can see the imputed temperature values separately for each ID
.
# Plot so we can see what the rolling joined room temperature and
# interpolated room temperature look like
ggplot(melt(df2, id.var=c("ID", "time")), aes(time, value, colour=variable)) +
geom_line(size=0.7) +
geom_point(data=df1, aes(time, roomTemp), colour="black") +
facet_grid(ID ~ .)
Here's one way to get the correlations by ID
:
df2 %>% group_by(ID) %>%
summarise(r_interp = cor(skinTemp, roomTempInterp, use="pairwise.complete.obs"),
r_roll = cor(skinTemp, roomTempRoll, use="pairwise.complete.obs"))
ID r_interp r_roll 1 A -0.04853998 -0.02993207 2 B -0.53993960 -0.53092150
First, I modified the sample data frames to add some autocorrelation, since that seemed a bit closer to your real experiment and makes visualization easier.
library(data.table)
library(reshape2)
library(dplyr)
library(ggplot2)
theme_set(theme_classic(base_size=16))
# Fake data with autocorrelation
set.seed(395)
df1 <- data.frame(roomTemp = cumsum(rnorm(1*36)))
df2 <- data.frame(skinTemp = cumsum(rnorm(32*60*36,0,0.01)))
Now add a time column. You can work with actual datetime columns, but here I've just gone with numeric columns denominated in seconds.
# Add time column
df1$time = 0:(nrow(df1)-1)*60
df2$time = 0:(nrow(df2)-1)/32
For interpolation, we need a function that will interpolate room temperatures at the times when skin temperature is measured in between the room temperature measurements. approxfun
performs linear interpolation between points. You can also use splinefun
in a similar way to interpolate using splines.
# Function to interpolate room temperature between measurements
roomTempInterp = approxfun(df1$time, df1$roomTemp)
Convert the data frames to data tables in order to use data.table
's rolling join functionality.
# Convert data frames to data tables
setDT(df1)
setDT(df2)
# Make time a key column in both data frames (for joining)
setkey(df1, time)
setkey(df2, time)
Now perform a rolling join to the nearest time value.
# Rolling join roomTemp to nearest time value of skinTemp
df2 = df1[df2, roll="nearest"]
# Rename rolling joined room temperature column
names(df2)[grep("roomTemp", names(df2))] = "roomTempRoll"
Merge original roomTemp
measurements from df1
into df2
.
df2 = df1[df2, ] # Equivalent to dplyr: df2 = left_join(df2, df1)
Add the interpolated room temperature using the function we created above.
# Add interpolated room temperature
df2$roomTempInterp = roomTempInterp(df2$time)
The interpolation method seems more realistic to me, especially if we can assume roomTemp
changes relatively smoothly and monotonically between measurements. Below are the first 10 rows of df2
, which includes the original df2
data plus the new roomTempRoll
and roomTempInterp
columns and the original roomTemp
measurements from df1
. You can now use this data frame to assess correlation and other relationships between roomTemp
and skinTemp
.
roomTemp time roomTempRoll skinTemp roomTempInterp 1: -1.21529 0.00000 -1.21529 -0.006511475 -1.215290 2: NA 0.03125 -1.21529 -0.014058076 -1.215531 3: NA 0.06250 -1.21529 -0.017741690 -1.215773 4: NA 0.09375 -1.21529 -0.030211177 -1.216014 5: NA 0.12500 -1.21529 -0.027105225 -1.216255 6: NA 0.15625 -1.21529 -0.035784295 -1.216497 7: NA 0.18750 -1.21529 -0.031319748 -1.216738 8: NA 0.21875 -1.21529 -0.033758959 -1.216979 9: NA 0.25000 -1.21529 -0.040667384 -1.217220 10: NA 0.28125 -1.21529 -0.026291442 -1.217462
Below is a plot so you can see what the rolling join and interpolated values look like. The black dots mark the original roomTemp
measurements.
ggplot(melt(df2 %>% select(-roomTemp), id.var="time"), aes(time, value, colour=variable)) +
geom_line(size=1) +
geom_point(data=df2, aes(time, roomTemp), colour="black")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With