I have a data.frame v whose unique rows I would like to use
# v
  DAY MONTH YEAR
1   1     1 2000
2   1     1 2000
3   2     2 2000
4   2     2 2000
5   2     3 2001
to subset a data.frame w.
# w
  DAY MONTH YEAR V1 V2 V3
1   1     1 2000  1  2  3
2   1     1 2000  3  2  1
3   2     2 2000  2  3  1
4   2     2 2001  1  2  3
5   3     4 2001  3  2  1
The result should be a data.frame vw that keeps only the rows of w matching the unique rows of v, where a match is defined by the combination (DAY, MONTH, YEAR).
# vw
  DAY MONTH YEAR V1 V2 V3
1   1     1 2000  1  2  3
2   2     2 2000  2  3  1
Right now I am using the code below, where I merge the data.frames and then use ddply to pick only the unique/first instance of each row. This works, but it will become cumbersome if I have to include V1=x$V1[1], etc. for all of my variables in the ddply part of the code. Is there a way to use the first instance of (DAY, MONTH, YEAR) and keep the rest of the columns on that row?
Or is there another way to approach the problem of using unique rows from one data.frame to subset another data.frame?
v <- structure(list(DAY = c(1L, 1L, 2L, 2L, 2L), MONTH = c(1L, 1L,
2L, 2L, 3L), YEAR = c(2000L, 2000L, 2000L, 2000L, 2001L)), .Names = c("DAY",
"MONTH", "YEAR"), class = "data.frame", row.names = c(NA, -5L
))
w <- structure(list(DAY = c(1L, 1L, 2L, 2L, 3L), MONTH = c(1L, 1L,
2L, 2L, 4L), YEAR = c(2000L, 2000L, 2000L, 2001L, 2001L), V1 = c(1L,
3L, 2L, 1L, 3L), V2 = c(2L, 2L, 3L, 2L, 2L), V3 = c(3L, 1L, 1L,
3L, 1L)), .Names = c("DAY", "MONTH", "YEAR", "V1", "V2", "V3"
), class = "data.frame", row.names = c(NA, -5L))
vw_example <- structure(list(DAY = 1:2, MONTH = 1:2, YEAR = c(2000L, 2000L),
V1 = 1:2, V2 = 2:3, V3 = c(3L, 1L)), .Names = c("DAY", "MONTH",
"YEAR", "V1", "V2", "V3"), class = "data.frame", row.names = c(NA,
-2L))
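library(plyr)  # provides ddply() and .()
wv_inter <- merge(v, w, by=c("DAY","MONTH","YEAR"))
# keep the first row of each (DAY, MONTH, YEAR) group, column by column
vw <- ddply(wv_inter,.(DAY, MONTH, YEAR),function(x) data.frame(DAY=x$DAY[1],MONTH=x$MONTH[1],YEAR=x$YEAR[1], V1=x$V1[1], V2=x$V2[1], V3=x$V3[1]))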
In base R, I would take unique() of v before merging. By default, merge joins on the column names the two data.frames have in common, so the by argument is unnecessary here.
vw <- merge(unique(v), w)
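As a minimal sketch of the base R approach, the example below rebuilds v and w from the question and runs the merge. One caveat: merge(unique(v), w) keeps every row of w whose key occurs in v, so a key that appears twice in w produces two rows. If the goal is the two-row vw shown in the question, one option is to drop repeated keys from w with duplicated() before merging. The names keys, all_matches and w_first are illustrative, and treating "first row per key" as the intent is an assumption based on the example output.

```r
# Data from the question, rebuilt with data.frame() for brevity
v <- data.frame(DAY   = c(1L, 1L, 2L, 2L, 2L),
                MONTH = c(1L, 1L, 2L, 2L, 3L),
                YEAR  = c(2000L, 2000L, 2000L, 2000L, 2001L))
w <- data.frame(DAY   = c(1L, 1L, 2L, 2L, 3L),
                MONTH = c(1L, 1L, 2L, 2L, 4L),
                YEAR  = c(2000L, 2000L, 2000L, 2001L, 2001L),
                V1    = c(1L, 3L, 2L, 1L, 3L),
                V2    = c(2L, 2L, 3L, 2L, 2L),
                V3    = c(3L, 1L, 1L, 3L, 1L))

# Key columns shared by both data.frames (illustrative helper name)
keys <- c("DAY", "MONTH", "YEAR")

# Every row of w whose key occurs in v; repeated keys in w are all kept
all_matches <- merge(unique(v), w)

# Keep only the first row of w for each key, then merge on the common columns
w_first <- w[!duplicated(w[keys]), ]
vw <- merge(unique(v), w_first)
```

Here all_matches has three rows, because the key (1, 1, 2000) occurs twice in w, while vw has one row per matching key.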
With your approach (take the first row from each combination), I think you could do (untested):
vw <- ddply(wv_inter, .(DAY, MONTH, YEAR), function(x) x[1, ])