Remove duplicate rows in R data frame, based on a date field and another field

I'm new to R, learning to handle database data, and I've hit a wall.

I want to remove duplicate rows/observations from a table based on two fields: a user ID and a date indicating the last time the user's record changed, keeping only the most recently dated row per user.

My truncated data set would look like the following:

UID    | DateLastChange
1      |  01/01/2016
1      |  01/03/2016
2      |  01/14/2015
3      |  02/15/2014
3      |  03/15/2016

I would like to end up with:

UID    | DateLastChange
1      |  01/03/2016
2      |  01/14/2015
3      |  03/15/2016
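For reference, a minimal reproducible version of this sample (assuming the dates are stored as character strings; df1 is just an illustrative name):

df1 <- data.frame(
  UID = c(1, 1, 2, 3, 3),
  DateLastChange = c("01/01/2016", "01/03/2016", "01/14/2015",
                     "02/15/2014", "03/15/2016"),
  stringsAsFactors = FALSE  # keep the dates as character, not factor
)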

I have attempted to use duplicated and unique, but neither seems to let me choose which of the duplicate rows to keep. I can imagine building a new table of unique UIDs and then left joining in some way so that only the row with the most recent date matches, roughly as sketched below.
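Something along these lines is what I had in mind (just a sketch, untested; it relies on converting the dates to Date class first):

# convert the character dates so max() compares real dates
df1$DateLastChange <- as.Date(df1$DateLastChange, "%m/%d/%Y")
# one row per UID holding its most recent date
latest <- aggregate(DateLastChange ~ UID, data = df1, FUN = max)
# merge on both UID and DateLastChange, so only the latest row per UID survives
merge(df1, latest)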

Any advice would be much appreciated. Scott

2 Answers

We can use data.table

library(data.table)
# sort by UID and by descending date, then keep the first (most recent) row per UID
setDT(df1)[order(UID, -as.IDate(DateLastChange, "%m/%d/%Y")), head(.SD, 1), by = UID]
#     UID DateLastChange
#1:   1     01/03/2016
#2:   2     01/14/2015
#3:   3     03/15/2016

Or using duplicated

setDT(df1)[order(UID, -as.IDate(DateLastChange, "%m/%d/%Y"))][!duplicated(UID)]
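In both versions the rows are first ordered by UID and by descending date, so within each UID the first row is the most recent one; head(.SD, 1) and !duplicated(UID) each keep just that first row.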
akrun


Using dplyr (the data can be in any order):

library(dplyr)
# convert the character dates to Date class so max() picks the true latest date
dat$DateLastChange <- as.Date(dat$DateLastChange, "%m/%d/%Y")
dat %>% group_by(UID) %>% summarize(DateLastChange = max(DateLastChange))
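The summarised column is then of class Date (printed as 2016-01-03, etc.); if the original mm/dd/yyyy strings are wanted back, one way (a sketch) is to add a final formatting step:

dat %>%
  group_by(UID) %>%
  summarize(DateLastChange = max(DateLastChange)) %>%
  mutate(DateLastChange = format(DateLastChange, "%m/%d/%Y"))  # back to character like "01/03/2016"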
Andrew Lavers