I've got two data frames containing timeseries (with time coded as numeric, rather than time objects; and time is unsorted). I'd like to normalize a response variable in one data frame to a response variable in another data frame. The problem is that the timepoints in the two data frames aren't quite equivalent. So, I'll need to merge the two data frames by the approximate match of the two time columns.
The data look like this:
df1 <- structure(list(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16)), .Names = c("t1", "y1"), row.names = c(NA, -4L), class = "data.frame")
df2 <- structure(list(t2 = c(0.9, 4.1), y2 = structure(1:2, .Label = c("a", "b"), class = "factor")), .Names = c("t2", "y2"), row.names = c(NA, -2L), class = "data.frame")
The result should look like this:
t1 y1 y2
1 1 a
4 16 b
Seems like approx or approxfun would be useful, but I can't quite see how to do it.
You can do this easily with na.approx from zoo:
library(zoo)
Data <- merge(df1, df2, by.x="t1", by.y="t2", all=TRUE)
Data$y1 <- na.approx(Data$y1, na.rm=FALSE, rule=2)
na.omit(Data)
# t1 y1 y2
# 1 0.9 1 a
# 6 4.1 16 b
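To see why this works, it may help to look at the intermediate frame. The following is a minimal base R sketch, assuming df1 and df2 are rebuilt with data.frame() so the snippet runs on its own (the name merged is only illustrative). The outer merge keeps every time point from both frames, and y1 is missing exactly on the rows that came from df2; those are the gaps na.approx fills.

```r
# Same data as in the question, rebuilt with data.frame() for self-containment
df1 <- data.frame(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16))
df2 <- data.frame(t2 = c(0.9, 4.1), y2 = factor(c("a", "b")))

# Outer join on the time columns: all time points from both frames are kept,
# sorted by the key, with NA wherever one frame has no value.
merged <- merge(df1, df2, by.x = "t1", by.y = "t2", all = TRUE)

merged$t1          # the six time points in ascending order
is.na(merged$y1)   # TRUE only on the two rows contributed by df2
```

Because both gaps sit at the ends of the series, rule=2 fills them with the nearest observed y1 value.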
You could do this with approx too:
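Data <- merge(df1, df2, by.x="t1", by.y="t2", all=TRUE)
y1.na <- is.na(Data$y1)
# interpolate over row positions; xout is given explicitly so the output lines
# up with the rows of Data (n= alone would only span the non-NA range)
idx <- seq_along(Data$y1)
Data$y1[y1.na] <- approx(idx, Data$y1, xout=idx, rule=2)$y[y1.na]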
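Since the question mentions approxfun, here is a sketch of that route as well, assuming the frames are rebuilt with data.frame() (the names f and res are only illustrative). It interpolates y1 as a function of time rather than row position and evaluates it at the time points of df2; rule = 2 clamps times outside the observed range to the nearest end value. Note that this interpolates rather than picking the nearest row, so the two agree here only because both t2 values fall outside the range of t1.

```r
df1 <- data.frame(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16))
df2 <- data.frame(t2 = c(0.9, 4.1), y2 = factor(c("a", "b")))

# Build an interpolating function y1 = f(t); approxfun orders x internally,
# so the unsorted t1 column is fine.
f <- approxfun(df1$t1, df1$y1, rule = 2)

# Evaluate at df2's time points and attach the result to y2
res <- data.frame(t2 = df2$t2, y1 = f(df2$t2), y2 = df2$y2)
res
```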
@JoshuaUlrich provided a nice way to do this if you want the final result to include everything from df2 and you don't care if the t1 column has the values from t2.
However, if you want to avoid these things and continue in the vein suggested by @BrandonBertelsen, you might define a custom rounding function and then apply it to the merge column of the second data.frame. For example:
# define a more precise rounding function that meets your needs.
# e.g., this one rounds values in x to their nearest multiple of h
gen.round <- function(x, h) {
  ifelse(x %% h > (h/2), h + h * (x %/% h), -(h + h * (-x %/% h)))
}
# make a new merge function that uses gen.round to round the merge column
# in the second data.frame
merge.approx <- function(x, y, by.x, by.y, h, ...) {
  y <- within(y, assign(by.y, gen.round(get(by.y), h)))
  merge(x, y, by.x=by.x, by.y=by.y, ...)
}
merge.approx(df1, df2, by.x='t1', by.y='t2', h =.5)
# t1 y1 y2
# 1 1 1 a
# 2 4 16 b
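If choosing a rounding grid h is awkward, a different technique is a plain nearest-neighbour lookup. This is only a sketch in base R, assuming the frames are rebuilt with data.frame(); the names nearest and result are hypothetical. For each time in df2 it finds the row of df1 whose t1 is closest and attaches y2 to that row.

```r
df1 <- data.frame(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16))
df2 <- data.frame(t2 = c(0.9, 4.1), y2 = factor(c("a", "b")))

# For each time in df2, the index of the closest time in df1
# (which.min returns the first index when there is a tie).
nearest <- vapply(df2$t2,
                  function(t) which.min(abs(df1$t1 - t)),
                  integer(1))

# Pick the matched rows of df1 and attach y2; t1 keeps its original values
result <- cbind(df1[nearest, ], y2 = df2$y2)
rownames(result) <- NULL
result
```

This keeps the t1 values from df1 and needs no grid size, but unlike the rounding approach it matches every t2 no matter how far away the nearest t1 is. If distant matches should be dropped, a tolerance check on abs(df1$t1[nearest] - df2$t2) could be added.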