Merge data frames by approximate column values

Tags: r

I've got two data frames containing time series (time is coded as numeric rather than as time objects, and it is unsorted). I'd like to normalize a response variable in one data frame to a response variable in the other. The problem is that the time points in the two data frames aren't quite equal, so I need to merge the two data frames by approximate matches between the two time columns.

The data look like this:

df1 <- structure(list(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16)), .Names = c("t1", "y1"), row.names = c(NA, -4L), class = "data.frame")
df2 <- structure(list(t2 = c(0.9, 4.1), y2 = structure(1:2, .Label = c("a", "b"), class = "factor")), .Names = c("t2", "y2"), row.names = c(NA, -2L), class = "data.frame")

The result should look like this:

t1  y1    y2
 1   1    a
 4  16    b

Seems like approx or approxfun would be useful, but I can't quite see how to do it.

asked Oct 16 '12 19:10

Drew Steen

2 Answers

You can do this easily with na.approx from zoo:

library(zoo)
Data <- merge(df1, df2, by.x="t1", by.y="t2", all=TRUE)
Data$y1 <- na.approx(Data$y1, na.rm=FALSE, rule=2)
na.omit(Data)
#    t1 y1 y2
# 1 0.9  1  a
# 6 4.1 16  b

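You could do this with approx from base R too:

Data <- merge(df1, df2, by.x="t1", by.y="t2", all=TRUE)
y1.na <- is.na(Data$y1)
# interpolate along row position; xout requests values at exactly the NA rows
# (approx drops the NA points before interpolating; rule=2 extends the end values)
Data$y1[y1.na] <- approx(seq_along(Data$y1), Data$y1, xout=which(y1.na), rule=2)$y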
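
The snippets above fill y1 along row position in the merged frame, which works here because the missing values sit at the two ends. As a rough sketch, assuming you would rather interpolate along the time values themselves, base R's approx can be evaluated directly at the t2 times. The names y1_at_t2 and result are illustrative only, and the data frames are rebuilt with data.frame() so the block runs on its own:

```r
# Same data as in the question, rebuilt so this block is self-contained
df1 <- data.frame(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16))
df2 <- data.frame(t2 = c(0.9, 4.1), y2 = factor(c("a", "b")))

# Linearly interpolate y1 as a function of t1 at each t2.
# approx() orders the x values internally, so unsorted t1 is fine.
# rule = 2 returns the nearest end value for t2 outside range(t1).
y1_at_t2 <- approx(x = df1$t1, y = df1$y1, xout = df2$t2, rule = 2)$y

# Illustrative result: one row per df2 observation
result <- data.frame(t2 = df2$t2, y1 = y1_at_t2, y2 = df2$y2)
result
```

Note that this keeps the t2 values in the time column, like the merge-based versions, and that a t2 falling between two t1 values gets a linearly interpolated y1 rather than the y1 of the nearest time point.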

answered Oct 15 '22 03:10

Joshua Ulrich

@JoshuaUlrich provided a nice way to do this if you want the final result to include everything from df2 and you don't care if the t1 column has the values from t2.

However, if you want to avoid both of these and continue in the vein suggested by @BrandonBertelsen, you could define a custom rounding function and apply it to the merge column of the second data.frame. For example:

# define a more precise rounding function that meets your needs.
# e.g., this one rounds values in x to their nearest multiple of h
gen.round <- function(x, h) {
    # h * (x %/% h) is the largest multiple of h at or below x; move up one
    # multiple when the remainder exceeds h/2, otherwise stay on that multiple
    # (so values that are already exact multiples of h are left unchanged)
    ifelse(x %% h > (h/2), h + h * (x %/% h), h * (x %/% h))
}

# make a new merge function that uses gen.round to round the merge column
# in the second data.frame
merge.approx <- function(x, y, by.x, by.y, h, ...) {
    y <- within(y, assign(by.y, gen.round(get(by.y), h)))
    merge(x, y, by.x=by.x, by.y=by.y, ...)
}

merge.approx(df1, df2, by.x='t1', by.y='t2', h =.5)
#   t1 y1 y2
# 1  1  1  a
# 2  4 16  b
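
Rounding only matches when the values in the first data.frame already sit on the grid defined by h. As a minimal sketch of an alternative, assuming each row of df2 should simply be paired with the df1 row whose time is closest, you can look up the nearest t1 for every t2. The names tol, nearest, gap, keep and result are hypothetical, and tol is an assumed cutoff for how far apart two time points may be and still count as a match:

```r
# Same data as in the question, rebuilt so this block is self-contained
df1 <- data.frame(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16))
df2 <- data.frame(t2 = c(0.9, 4.1), y2 = factor(c("a", "b")))

tol <- 0.25  # hypothetical tolerance; pick one that suits your sampling

# For each t2, the row index of the closest t1 (first one wins on ties)
nearest <- vapply(df2$t2, function(t) which.min(abs(df1$t1 - t)), integer(1))

# Distance between each t2 and its matched t1; drop matches beyond tol
gap  <- abs(df1$t1[nearest] - df2$t2)
keep <- gap <= tol

# Matched df1 rows with the corresponding y2 attached
result <- data.frame(df1[nearest[keep], ], y2 = df2$y2[keep])
rownames(result) <- NULL
result
```

This keeps the original t1 values in the output and does not require t1 to be sorted. It compares every t2 against every t1, so for long series a sorted lookup such as findInterval may be worth considering instead.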

answered Oct 15 '22 03:10

Matthew Plourde
