Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Partial merge in R

Tags:

r

Before I end up at -7 like Two by two matching between dataframes in r did, let me say I've already read the following pages:

  • How to copy row from one data.frame in to another [R]
  • R How to update a column in data.frame using values from another data.frame
  • Using one data.frame to update another

Actually, the last one is really similar to what I want, but not the same, because my columns are different

I have two dataframes, let's say:

> d <- data.frame(year=c(2004,2004,2006),month = c(1,5,3), height = c(1000,2000,3000) )
> d
  year month height
1 2004     1   1000
2 2004     5   2000
3 2006     3   3000

> e <- data.frame(year=c(2004),month=c(5), height = c(9999))
> e
  year month height
1 2004     5   9999

Obviously the real data is longer than this.

I want to merge the values from e into d

Try raw merge:

> merge(d,e)
[1] year   month  height
<0 rows> (or 0-length row.names)

Ok. So add "by":

> merge(d,e,by=c("year","month"))
  year month height.x height.y
1 2004     5     2000     9999

Ok, it did an inner join, and got rid of all the original data in d. So try left outer join:

> merge(d,e,by=c("year","month"),all.x = T)
  year month height.x height.y
1 2004     1     1000       NA
2 2004     5     2000     9999
3 2006     3     3000       NA

It did a join, and it's correct as per outer-join definition, but it didn't do what I want, which is to update the values in d from the values in e. What I really want is more like an sql update:

for (year,month,height) in e:
    update d set d.height=e.height where d.year = e.year and d.month = e.month

ie the results I want would be:

> magic(d,e)
  year month height
1 2004     1   1000
2 2004     5   9999
3 2006     3   3000

Of course, I could just write a bunch of for loops, but I'm hoping there is some vectorized way of doing this?

Edit: my example had only one key column, but my real problem has two. Updated the example to reflect this.

like image 342
Hugh Perkins Avatar asked Jan 16 '23 06:01

Hugh Perkins


1 Answers

You can use data.table

edit noting both e and d have a key defined by month and year

library(data.table)
DD <- as.data.table(d)
DE <- as.data.table(e)

setkey(DD,  year, month)
setkey(DE,  year, month)

DD[DE, height := i.height]

note that I have prefixed height with i. to ensure that it is reading the height value from the i component.

If you read the introduction to data.table vignette, you will quickly understand the relationship between rownames and data.table keys!

like image 156
mnel Avatar answered Jan 26 '23 13:01

mnel