Use unique rows from data.frame to subset another data.frame

Question

I have a data.frame v whose unique rows I would like to use

# v
  DAY MONTH YEAR
1   1     1 2000
2   1     1 2000
3   2     2 2000
4   2     2 2000
5   2     3 2001

to subset a data.frame w.

# w
  DAY MONTH YEAR V1 V2 V3
1   1     1 2000  1  2  3
2   1     1 2000  3  2  1
3   2     2 2000  2  3  1
4   2     2 2001  1  2  3
5   3     4 2001  3  2  1

The result should be the data.frame vw, in which only the rows of w that match the unique rows of v (on DAY, MONTH, YEAR) remain.

# vw
  DAY MONTH YEAR V1 V2 V3
1   1     1 2000  1  2  3
2   2     2 2000  2  3  1

Right now I am using the code below, where I merge the data.frames and then use ddply to pick only the unique (first) instance of each row. This works, but it will become cumbersome if I have to include V1=x$V1[1], etc. for all of my variables in the ddply part of the code. Is there a way to use the first instance of (DAY, MONTH, YEAR) and keep the rest of the columns on that row?

Or is there another way to approach the problem of using unique rows from one data.frame to subset another data.frame?

v <- structure(list(DAY = c(1L, 1L, 2L, 2L, 2L), MONTH = c(1L, 1L, 
2L, 2L, 3L), YEAR = c(2000L, 2000L, 2000L, 2000L, 2001L)), .Names = c("DAY", 
"MONTH", "YEAR"), class = "data.frame", row.names = c(NA, -5L
))

w <- structure(list(DAY = c(1L, 1L, 2L, 2L, 3L), MONTH = c(1L, 1L, 
2L, 2L, 4L), YEAR = c(2000L, 2000L, 2000L, 2001L, 2001L), V1 = c(1L, 
3L, 2L, 1L, 3L), V2 = c(2L, 2L, 3L, 2L, 2L), V3 = c(3L, 1L, 1L, 
3L, 1L)), .Names = c("DAY", "MONTH", "YEAR", "V1", "V2", "V3"
), class = "data.frame", row.names = c(NA, -5L))

vw_example <- structure(list(DAY = 1:2, MONTH = 1:2, YEAR = c(2000L, 2000L), 
    V1 = 1:2, V2 = 2:3, V3 = c(3L, 1L)), .Names = c("DAY", "MONTH", 
"YEAR", "V1", "V2", "V3"), class = "data.frame", row.names = c(NA, 
-2L))

library(plyr)
wv_inter <- merge(v, w, by=c("DAY","MONTH","YEAR"))

# Keep the first row of each (DAY, MONTH, YEAR) group from the merged result
vw <- ddply(wv_inter,.(DAY, MONTH, YEAR),function(x) data.frame(DAY=x$DAY[1],MONTH=x$MONTH[1],YEAR=x$YEAR[1], V1=x$V1[1], V2=x$V2[1], V3=x$V3[1]))

Blue Magister · Accepted Answer

In base R, I would take unique of v first before merging. The merge command will by default merge on common column names, so by is unnecessary here.

vw <- merge(unique(v), w)
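One thing worth checking on the sample data: merge keeps every row of w whose key appears in v, so duplicate keys in w survive the merge. The sketch below rebuilds v and w with data.frame() and, assuming "first" means first in w's existing row order, drops repeated keys from w before merging. The names keys, vw_all and w_first are only illustrative.

```r
# Sample data copied from the question
v <- data.frame(DAY   = c(1L, 1L, 2L, 2L, 2L),
                MONTH = c(1L, 1L, 2L, 2L, 3L),
                YEAR  = c(2000L, 2000L, 2000L, 2000L, 2001L))
w <- data.frame(DAY   = c(1L, 1L, 2L, 2L, 3L),
                MONTH = c(1L, 1L, 2L, 2L, 4L),
                YEAR  = c(2000L, 2000L, 2000L, 2001L, 2001L),
                V1    = c(1L, 3L, 2L, 1L, 3L),
                V2    = c(2L, 2L, 3L, 2L, 2L),
                V3    = c(3L, 1L, 1L, 3L, 1L))

keys <- c("DAY", "MONTH", "YEAR")

# All rows of w whose key appears in v; both (1, 1, 2000) rows of w are kept
vw_all <- merge(unique(v), w)
nrow(vw_all)  # 3 on this sample

# Assumption: keep only the first row of w for each key, then merge
w_first <- w[!duplicated(w[keys]), ]
vw <- merge(unique(v), w_first)
```

If every matching row is wanted rather than one per key, vw_all is already the answer and the duplicated() step can be skipped.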

With your approach (take the first row from each combination), I think you could do (untested):

vw <- ddply(wv_inter,.(DAY, MONTH, YEAR),function(x) x[1,])
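The same "first row of each combination" idea can probably be written without plyr. This is a minimal base R sketch, not part of the original answer: it applies duplicated() to the key columns of the merged data.frame. Which row counts as "first" within a key depends on the row order merge returns, so treat that as an assumption.

```r
# Sample data copied from the question
v <- data.frame(DAY   = c(1L, 1L, 2L, 2L, 2L),
                MONTH = c(1L, 1L, 2L, 2L, 3L),
                YEAR  = c(2000L, 2000L, 2000L, 2000L, 2001L))
w <- data.frame(DAY   = c(1L, 1L, 2L, 2L, 3L),
                MONTH = c(1L, 1L, 2L, 2L, 4L),
                YEAR  = c(2000L, 2000L, 2000L, 2001L, 2001L),
                V1    = c(1L, 3L, 2L, 1L, 3L),
                V2    = c(2L, 2L, 3L, 2L, 2L),
                V3    = c(3L, 1L, 1L, 3L, 1L))

keys <- c("DAY", "MONTH", "YEAR")

# Merge as in the question, then keep the first row seen for each key
wv_inter <- merge(v, w, by = keys)
vw <- wv_inter[!duplicated(wv_inter[keys]), ]
rownames(vw) <- NULL  # renumber rows after subsetting
```

All non-key columns come along automatically, so nothing has to be listed per variable.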

Tags: dataframe, r, unique, subset, plyr

nofunsally
