I want to interpolate within groups in a dataframe. This will give me an arbitrary number of intermediate points for each group within a dataframe.
I have a dataframe like:
OldDataFrame <- data.frame(ID = c(1,1,1,2,2,2),
time = c(1,2,3,1,2,3),
Var1 = c(-0.6 , 0.2, -0.8 , 1.6 , 0.3 , -0.8),
Var2 = c(0.5 , 0.7, 0.6 , -0.3 , 1.5 , 0.4) )
I want to get a function something like this:
TimeInterpolateByGroup <- function(DataFrame,
GroupingVariable,
TimeVariable,
TimeInterval){
#Something Here
}
It would be handy if I did not have to specify which columns to interpolate, and it could operate automatically on every numeric column, like numcolwise in plyr.
I could then apply it like this:
NewDataFrame = TimeInterpolateByGroup(DataFrame = OldDataFrame,
GroupingVariable = "ID",
TimeVariable = "time",
TimeInterval = 0.25)
to get the NewDataFrame as:
NewDataFrame = data.frame(ID = c( 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2 ),
time = c( 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3 ),
Var1 = c( -0.6, -0.4, -0.2, 0, 0.2, -0.05, -0.3, -0.55, -0.8, 1.6, 1.275, 0.95, 0.625, 0.3, 0.025, -0.25, -0.525, -0.8 ),
Var2 = c( 0.5, 0.55, 0.6, 0.65, 0.7, 0.675, 0.65, 0.625, 0.6, -0.3, 0.15, 0.6, 1.05, 1.5, 1.225, 0.95, 0.675, 0.4 ))
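The expected values above are plain linear interpolation between neighbouring time points. As a minimal sketch (base R only, checking just the Var1 column for ID 1), approx() reproduces them up to floating-point tolerance; the variable names here are illustrative and not part of the question:

```r
# Observed times and Var1 values for ID 1, copied from OldDataFrame
times <- c(1, 2, 3)
var1  <- c(-0.6, 0.2, -0.8)

# Target grid: every 0.25 time units between the first and last observation
out_times <- seq(from = 1, to = 3, by = 0.25)

# approx() does linear interpolation by default and returns a list with x and y
interp <- approx(x = times, y = var1, xout = out_times)

# Values listed for ID 1 / Var1 in the expected NewDataFrame
expected <- c(-0.6, -0.4, -0.2, 0, 0.2, -0.05, -0.3, -0.55, -0.8)
print(all.equal(interp$y, expected))
```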
The plyr-type approach in "Interpolate variables on subsets of dataframe" seems to be in the right direction, but its example is confusing and it cannot produce an arbitrary number of intermediate interpolation points. This matters for the animation application (see below), where I am not sure how many intermediate time points I will need to get a smooth animation. Some other answers use a time series approach, but that would not allow segmenting by group.
I also considered using a longitudinal data package but that seems unnecessarily complicated for what should be a simple problem.
I want an x-y plot of Var1 against Var2, with one point per ID at time = 1. Then I want to use the animate package to see the points move as time increases. To do this smoothly I need the coordinates for all intermediate points in time.
I'm fairly sure that the code below gives the correct answer, except for a tiny level of numerical imprecision due to the use of the approx() function. The basic idea is to use ddply to split and combine data frames, and approx to do the interpolation.
library(plyr)
# time_interpolate is a helper function for TimeInterpolateByGroup
# that operates on each of the groups. In the input to this function,
# the GroupingVariable column of the data frame should be single-valued.
# The function returns a (probably longer) data frame, with estimated
# values for the times specified in the output_times array.
time_interpolate <- function(data_frame,
GroupingVariable,
time_var,
output_times) {
input_times <- data_frame[, time_var]
exclude_vars <- c(time_var, GroupingVariable)
value_vars <- setdiff(colnames(data_frame), exclude_vars)
output_df <- data.frame(rep(data_frame[1,GroupingVariable], length(output_times)), output_times)
colnames(output_df) <- c(GroupingVariable, time_var)
for (value_var in value_vars) {
output_df[,value_var] <- approx(input_times, data_frame[, value_var], output_times)$y
}
return(output_df)
}
# A test for time_interpolate
time_interpolate(OldDataFrame[1:3,], "ID" , "time", seq(from=1, to=3, by=0.25))
TimeInterpolateByGroup <- function(DataFrame,
GroupingVariable,
TimeVariable,
TimeInterval){
min_time <- min(DataFrame[, TimeVariable])
max_time <- max(DataFrame[, TimeVariable])
output_times <- seq(from=min_time, to=max_time, by=TimeInterval)
ddply(DataFrame,
GroupingVariable,
time_interpolate,
GroupingVariable=GroupingVariable,
time_var=TimeVariable,
output_times=output_times)
}
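For comparison, the same split, interpolate and combine idea can be sketched in base R without plyr. This is only a sketch under a few assumptions: the helper name interpolate_by_group is hypothetical, every group has at least two distinct time points (approx needs them), and the time grid is built per group rather than from the global minimum and maximum used above. It also restricts itself to numeric columns, which is what the question asks for.

```r
# Hypothetical helper, not part of the answer above: base-R split/lapply/rbind
interpolate_by_group <- function(df, group_var, time_var, time_interval) {
  # Interpolate every numeric column except the grouping and time columns
  is_num <- vapply(df, is.numeric, logical(1))
  value_vars <- setdiff(names(df)[is_num], c(group_var, time_var))

  pieces <- lapply(split(df, df[[group_var]]), function(g) {
    # Assumption: the grid spans this group's own time range
    out_times <- seq(from = min(g[[time_var]]),
                     to = max(g[[time_var]]),
                     by = time_interval)
    out <- data.frame(rep(g[[group_var]][1], length(out_times)), out_times)
    names(out) <- c(group_var, time_var)
    for (v in value_vars) {
      out[[v]] <- approx(g[[time_var]], g[[v]], xout = out_times)$y
    }
    out
  })

  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

OldDataFrame <- data.frame(ID = c(1, 1, 1, 2, 2, 2),
                           time = c(1, 2, 3, 1, 2, 3),
                           Var1 = c(-0.6, 0.2, -0.8, 1.6, 0.3, -0.8),
                           Var2 = c(0.5, 0.7, 0.6, -0.3, 1.5, 0.4))

NewDataFrame <- interpolate_by_group(OldDataFrame, "ID", "time", 0.25)
print(NewDataFrame)
```

Because the grid is per group here, groups with different time ranges do not get NA rows outside their own range, which may or may not be what an animation needs.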
You may also use na.approx from the zoo package.
library(zoo)
my_fun <- function(DataFrame, GroupingVariable, TimeVariable, TimeInterval){
do.call(rbind, by(DataFrame, DataFrame[ , GroupingVariable], function(dat){
# Name the grid column after TimeVariable so that merge() joins on it
tt <- setNames(data.frame(seq(from = min(dat[ , TimeVariable]),
to = max(dat[ , TimeVariable]),
by = TimeInterval)), TimeVariable)
dat2 <- merge(tt, dat, all.x = TRUE)
na.approx(dat2)
}))
}
my_fun(OldDataFrame, "ID", "time", 0.25)