Merge data frames by approximate column values

Tags: r

I've got two data frames containing time series (time is coded as numeric rather than as time objects, and is unsorted). I'd like to normalize a response variable in one data frame to a response variable in the other. The problem is that the timepoints in the two data frames aren't quite equivalent, so I need to merge the two data frames by approximate match of the two time columns.

The data look like this:

df1 <- structure(list(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16)), .Names = c("t1", "y1"), row.names = c(NA, -4L), class = "data.frame")
df2 <- structure(list(t2 = c(0.9, 4.1), y2 = structure(1:2, .Label = c("a", "b"), class = "factor")), .Names = c("t2", "y2"), row.names = c(NA, -2L), class = "data.frame")

The result should look like this:

t1  y1    y2
 1   1    a
 4  16    b

Seems like approx or approxfun would be useful, but I can't quite see how to do it.

Drew Steen asked Oct 16 '12 19:10



2 Answers

You can do this easily with na.approx from zoo:

library(zoo)
Data <- merge(df1, df2, by.x="t1", by.y="t2", all=TRUE)
Data$y1 <- na.approx(Data$y1, na.rm=FALSE, rule=2)
na.omit(Data)
#    t1 y1 y2
# 1 0.9  1  a
# 6 4.1 16  b

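You could do this with approx too, interpolating y1 against t1 at the rows where it is missing:

Data <- merge(df1, df2, by.x="t1", by.y="t2", all=TRUE)
y1.na <- is.na(Data$y1)
# approx drops the NA pairs, then interpolates at the missing timepoints;
# rule=2 carries the nearest endpoint value beyond the range of t1
Data$y1[y1.na] <- approx(Data$t1, Data$y1, xout=Data$t1[y1.na], rule=2)$y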
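If the merged frame itself is not needed, the same idea can be sketched without merge(): build an interpolating function from df1 with approxfun and evaluate it at df2's timepoints. This is only a minimal base R sketch. The names f and res are illustrative, and rule = 2 (carry the nearest endpoint value outward) is an assumption about what should happen outside the range of t1.

```r
df1 <- data.frame(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16))
df2 <- data.frame(t2 = c(0.9, 4.1), y2 = factor(c("a", "b")))

# approxfun orders the (t1, y1) pairs internally, so unsorted t1 is fine.
# rule = 2 is an assumption: outside range(t1), use the nearest endpoint value.
f <- approxfun(df1$t1, df1$y1, rule = 2)

# Evaluate y1 at df2's timepoints and keep df2's own columns alongside.
res <- data.frame(t2 = df2$t2, y1 = f(df2$t2), y2 = df2$y2)
res
```

Note that for a t2 lying between two t1 values, y1 here is a linearly interpolated value rather than an observed one, and the time column holds t2 rather than t1.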

Joshua Ulrich answered Oct 15 '22 03:10


@JoshuaUlrich provided a nice way to do this if you want the final result to include everything from df2 and you don't care if the t1 column has the values from t2.

However, if you want to avoid these things and continue in the vein suggested by @BrandonBertelsen, you might define a custom rounding function and then apply it to the merge column of the second data.frame. For example:

# define a rounding function that meets your needs.
# e.g., this one rounds values in x to their nearest multiple of h
# (exact ties follow round()'s half-to-even rule)
gen.round <- function(x, h) {
    h * round(x / h)
}

# make a new merge function that uses gen.round to round the merge column 
# in the second data.frame
merge.approx <- function(x, y, by.x, by.y, h, ...) {
    y <- within(y, assign(by.y, gen.round(get(by.y), h)))
    merge(x, y, by.x=by.x, by.y=by.y, ...)
}

merge.approx(df1, df2, by.x='t1', by.y='t2', h =.5)

  t1 y1 y2
1  1  1  a
2  4 16  b
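
The rounding approach depends on picking h so that every t2 rounds onto a value that actually occurs in t1. A different technique, offered here only as a hedged sketch and not taken from either answer, is to match each row of df2 to its nearest t1 directly and drop matches that are too far away. The helper nearest_idx and the tolerance tol = 0.25 are hypothetical choices for illustration.

```r
df1 <- data.frame(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16))
df2 <- data.frame(t2 = c(0.9, 4.1), y2 = factor(c("a", "b")))

# Hypothetical helper: for each value in `target`, return the index of the
# closest value in `pool` (the first one wins if several are equally close).
nearest_idx <- function(target, pool) {
  vapply(target, function(t) which.min(abs(pool - t)), integer(1))
}

idx <- nearest_idx(df2$t2, df1$t1)

# Assumed tolerance: discard matches whose time gap exceeds tol.
tol <- 0.25
keep <- abs(df1$t1[idx] - df2$t2) <= tol

# Take the matched rows of df1 and attach y2 from the corresponding df2 rows.
out <- cbind(df1[idx[keep], , drop = FALSE], y2 = df2$y2[keep])
out <- out[order(out$t1), ]
rownames(out) <- NULL
out
# t1 = 1, 4; y1 = 1, 16; y2 = a, b
```

This keeps the original t1 and y1 values, as in the desired result, at the cost of comparing every t2 against every t1, which is fine for small data frames.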

Matthew Plourde answered Oct 15 '22 03:10