I have a 180,000 x 400 dataframe where the rows correspond to users but every user has exactly two rows.
id date ...
1 2012 ...
3 2010 ...
2 2013 ...
2 2014 ...
1 2011 ...
3 2014 ...
I want to subset the data so that only the most recent row for each user is retained (i.e. the row with the highest value for date for each id).
I first tried using which(), looping over ids with an ifelse() statement in sapply(), which was painfully slow (O(n^2), I believe).
Then I tried sorting the df by id and then looping through in increments of two and comparing adjacent dates, but this was also slow (I guess because loops in R are hopeless). The comparison of the two dates is the bottleneck, as the sort was pretty much instant.
Is there a way to vectorize the comparison?
SOLUTION from Remove duplicates keeping entry with largest absolute value
aa <- df[order(df$id, -df$date), ] #sort by id and reverse of date
aa[!duplicated(aa$id),]
Runs very quickly!!
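As a minimal sketch of why this works, the same two steps can be run on a toy data frame that mirrors the example in the question. The names `sorted` and `latest` are introduced here for illustration only, and the dates are assumed to be numeric years as in the sample.

```r
# Toy data mirroring the example in the question (numeric years assumed)
df <- data.frame(id   = c(1, 3, 2, 2, 1, 3),
                 date = c(2012, 2010, 2013, 2014, 2011, 2014))

# Sort by id, then by date descending, so the newest row of each id comes first
sorted <- df[order(df$id, -df$date), ]

# duplicated() flags every repeat of an id after its first appearance,
# so negating it keeps only the first (newest) row per id
latest <- sorted[!duplicated(sorted$id), ]
latest
```

Both order() and duplicated() are vectorized, so no explicit loop is needed. If date is stored as a Date object rather than a number, the unary minus will not work; something like -xtfrm(df$date) in the order() call should behave the same way.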
Here's a simple and fast approach using the data.table package
library(data.table)
setDT(df)[, .SD[which.max(date)], id]
# id date
# 1: 1 2012
# 2: 3 2014
# 3: 2 2014
Or (could be a bit faster because the data is keyed by id)
setkey(setDT(df), id)[, .SD[which.max(date)], id]
Or using the OP's idea via the data.table package
unique(setorder(setDT(df), id, -date), by = "id")
Or
setorder(setDT(df), id, -date)[!duplicated(id)]
Or a base R solution (this returns only the latest date per id, not the full rows)
with(df, tapply(date, id, function(x) x[which.max(x)]))
## 1 2 3
## 2012 2014 2014
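The tapply() call above returns just the maximum date for each id. If the full rows are wanted while staying in base R, one possible sketch uses ave() to broadcast each group's maximum back onto the rows. The name `latest` is illustrative, and the dates are again assumed to be numeric.

```r
# Toy data mirroring the example in the question (numeric years assumed)
df <- data.frame(id   = c(1, 3, 2, 2, 1, 3),
                 date = c(2012, 2010, 2013, 2014, 2011, 2014))

# ave() returns a vector as long as df holding the per-id maximum date,
# so the comparison is TRUE only on rows carrying their group's latest date
latest <- df[df$date == ave(df$date, df$id, FUN = max), ]
latest
```

Unlike the !duplicated() approach, this keeps every tied row when an id has two rows with the same maximum date, and it preserves the original row order rather than sorting by id.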
Another way
library(dplyr)
df %>%
group_by(id) %>%
filter(date == max(date)) # Will keep all existing columns but allow multiple rows in case of ties
# Source: local data table [3 x 2]
# Groups: id
#
# id date
# 1 1 2012
# 2 2 2014
# 3 3 2014
Or
df %>%
group_by(id) %>%
slice(which.max(date)) # Will keep all columns but won't return multiple rows in case of ties
Or
df %>%
group_by(id) %>%
summarise(max(date)) # Will remove all other columns and won't return multiple rows in case of ties