
Use unique rows from data.frame to subset another data.frame

I have a data.frame v, and I would like to use its unique rows

# v
  DAY MONTH YEAR
1   1     1 2000
2   1     1 2000
3   2     2 2000
4   2     2 2000
5   2     3 2001

to subset a data.frame w.

# w
  DAY MONTH YEAR V1 V2 V3
1   1     1 2000  1  2  3
2   1     1 2000  3  2  1
3   2     2 2000  2  3  1
4   2     2 2001  1  2  3
5   3     4 2001  3  2  1

The result is the data.frame vw, which keeps only the rows of w whose (DAY, MONTH, YEAR) combination matches a unique row of v, with one row per combination.

# vw
  DAY MONTH YEAR V1 V2 V3
1   1     1 2000  1  2  3
2   2     2 2000  2  3  1

Right now I am using the code below, where I merge the data.frames and then use ddply to pick only the unique (first) instance of each row. This works, but it will become cumbersome if I have to write V1=x$V1[1], etc. for every variable in the ddply part of the code. Is there a way to take the first instance of (DAY, MONTH, YEAR) together with the rest of the columns on that row?

Or is there another way to approach the problem of using unique rows from one data.frame to subset another data.frame?

v <- structure(list(DAY = c(1L, 1L, 2L, 2L, 2L), MONTH = c(1L, 1L, 
2L, 2L, 3L), YEAR = c(2000L, 2000L, 2000L, 2000L, 2001L)), .Names = c("DAY", 
"MONTH", "YEAR"), class = "data.frame", row.names = c(NA, -5L
))

w <- structure(list(DAY = c(1L, 1L, 2L, 2L, 3L), MONTH = c(1L, 1L, 
2L, 2L, 4L), YEAR = c(2000L, 2000L, 2000L, 2001L, 2001L), V1 = c(1L, 
3L, 2L, 1L, 3L), V2 = c(2L, 2L, 3L, 2L, 2L), V3 = c(3L, 1L, 1L, 
3L, 1L)), .Names = c("DAY", "MONTH", "YEAR", "V1", "V2", "V3"
), class = "data.frame", row.names = c(NA, -5L))

vw_example <- structure(list(DAY = 1:2, MONTH = 1:2, YEAR = c(2000L, 2000L), 
    V1 = 1:2, V2 = 2:3, V3 = c(3L, 1L)), .Names = c("DAY", "MONTH", 
"YEAR", "V1", "V2", "V3"), class = "data.frame", row.names = c(NA, 
-2L))

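library(plyr)  # provides ddply() and .()

# Inner join on the key columns; duplicated keys in v multiply the matching rows of w
wv_inter <- merge(v, w, by=c("DAY","MONTH","YEAR"))

# Keep the first value of every column within each (DAY, MONTH, YEAR) group
vw <- ddply(wv_inter,.(DAY, MONTH, YEAR),function(x) data.frame(DAY=x$DAY[1],MONTH=x$MONTH[1],YEAR=x$YEAR[1], V1=x$V1[1], V2=x$V2[1], V3=x$V3[1]))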
Asked by nofunsally, Mar 21 '23 06:03

1 Answer

In base R, I would take unique of v first before merging. The merge command will by default merge on common column names, so by is unnecessary here.

vw <- merge(unique(v), w)
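
As a hedged sketch of what this merge returns on the example data: it keeps every row of w whose key occurs in v, so the two w rows sharing the key (1, 1, 2000) both survive. If only the first row per key is wanted, as in the expected vw from the question, one possible base R option is to drop repeated keys from w with duplicated() before merging. The names keys, vw_all and w_first are my own, and the de-duplication step is an assumption about the intent rather than part of the answer above.

```r
# Example data from the question, rebuilt with data.frame()
v <- data.frame(DAY   = c(1L, 1L, 2L, 2L, 2L),
                MONTH = c(1L, 1L, 2L, 2L, 3L),
                YEAR  = c(2000L, 2000L, 2000L, 2000L, 2001L))
w <- data.frame(DAY   = c(1L, 1L, 2L, 2L, 3L),
                MONTH = c(1L, 1L, 2L, 2L, 4L),
                YEAR  = c(2000L, 2000L, 2000L, 2001L, 2001L),
                V1    = c(1L, 3L, 2L, 1L, 3L),
                V2    = c(2L, 2L, 3L, 2L, 2L),
                V3    = c(3L, 1L, 1L, 3L, 1L))

keys <- c("DAY", "MONTH", "YEAR")  # hypothetical helper name

# Every row of w whose key appears in v; repeated keys in w are all kept
vw_all <- merge(unique(v), w)

# Assumption: only the first row of w per key is wanted.
# duplicated() marks later repeats of a key, so negating it keeps the first.
w_first <- w[!duplicated(w[keys]), ]
vw <- merge(unique(v), w_first)

print(vw_all)
print(vw)
```

On this data vw_all has three rows and vw has two, so which form is right depends on whether repeated keys in w should be collapsed.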

With your approach (take the first row from each combination), I think you could do (untested):

vw <- ddply(wv_inter,.(DAY, MONTH, YEAR),function(x) x[1,])
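
The "first row from each combination" idea can also be sketched without plyr, which may help when checking the untested line above. This is a minimal illustration on the question's data, assuming the merged frame is built with merge(v, w) as in the question. The names keys, vw_base and vw_plyr are my own, and the plyr comparison only runs if that package happens to be installed.

```r
# Example data from the question, rebuilt with data.frame()
v <- data.frame(DAY   = c(1L, 1L, 2L, 2L, 2L),
                MONTH = c(1L, 1L, 2L, 2L, 3L),
                YEAR  = c(2000L, 2000L, 2000L, 2000L, 2001L))
w <- data.frame(DAY   = c(1L, 1L, 2L, 2L, 3L),
                MONTH = c(1L, 1L, 2L, 2L, 4L),
                YEAR  = c(2000L, 2000L, 2000L, 2001L, 2001L),
                V1    = c(1L, 3L, 2L, 1L, 3L),
                V2    = c(2L, 2L, 3L, 2L, 2L),
                V3    = c(3L, 1L, 1L, 3L, 1L))

keys <- c("DAY", "MONTH", "YEAR")  # hypothetical helper name

# Merge without de-duplicating v first: repeated keys multiply the rows
wv_inter <- merge(v, w, by = keys)

# Base R: keep the first row of each key combination, all columns at once
vw_base <- wv_inter[!duplicated(wv_inter[keys]), ]
rownames(vw_base) <- NULL

# Optional comparison with the plyr version, only if plyr is available
if (requireNamespace("plyr", quietly = TRUE)) {
  vw_plyr <- plyr::ddply(wv_inter, keys, function(x) x[1, ])
  print(vw_plyr)
}

print(vw_base)
```

Either way, taking whole rows avoids listing V1, V2, V3 one by one, which was the concern in the question.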
Answered by Blue Magister, Apr 06 '23 01:04