
Select most recent date, by row in R [duplicate]

Tags: r, subset

I have reduced my problem to the following task. Given a data frame with IDs and dates:

set.seed(123)
myids <- sample(c('a001', 'a002', 'a003'), 12, replace = TRUE)
mydates <- as.Date(sample(c("2007-06-22", "2004-02-13", "2007-05-22", "2001-10-10", "2008-05-05", "2004-02-15"), 12, replace = TRUE))
mydf <- data.frame(myids, mydates)

I need to select only the row with the most recent date, for each subject. The result should be:

a001    5/5/08
a002    5/5/08
a003    2/15/04

Anyone know how to do this?

marcel asked Feb 09 '23 02:02


1 Answer

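Here's a data.table solution. Within each myids group, .SD[which.max(mydates)] keeps the row with the latest date, and keyby returns the groups sorted by myids.

library(data.table)
# setDT() converts mydf to a data.table by reference (no copy is made)
setDT(mydf)[, .SD[which.max(mydates)], keyby = myids]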
#    myids    mydates
# 1:  a001 2008-05-05
# 2:  a002 2008-05-05
# 3:  a003 2004-02-15
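
If you'd rather not load a package, the same idea can be sketched in base R. This is only one possible approach, not part of the answer above: sort by ID and date, then keep the last row of each ID. The names `ordered` and `latest` are made up for illustration, and the data is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the example data from the question
set.seed(123)
myids <- sample(c('a001', 'a002', 'a003'), 12, replace = TRUE)
mydates <- as.Date(sample(c("2007-06-22", "2004-02-13", "2007-05-22",
                            "2001-10-10", "2008-05-05", "2004-02-15"),
                          12, replace = TRUE))
mydf <- data.frame(myids, mydates)

# Sort by ID, then by date ascending, so each ID's latest date comes last
ordered <- mydf[order(mydf$myids, mydf$mydates), ]

# fromLast = TRUE marks every row except the last one per ID as a duplicate,
# so negating it keeps exactly one row per ID: the most recent
latest <- ordered[!duplicated(ordered$myids, fromLast = TRUE), ]
rownames(latest) <- NULL
latest
```

As with which.max(), an ID whose latest date appears more than once still yields a single row.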

jlhoward answered Feb 11 '23 16:02