
R: Uniques (or dplyr distinct) + most recent date

Tags: r, unique, dplyr

I have a data frame in which the same Name appears on several rows with different dates. I'd like to filter this df down to one row per unique Name, keeping the most recent occurrence of each. I am a big fan of dplyr and have used combinations of distinct and select before, but the documentation makes it seem that this cannot be done with distinct alone:

"Variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved."

This seems like a problem that would occur commonly, so I was wondering if anyone had any advice. An example df is below, which reflects that my real data has Names as a character class and the Date as POSIXct that I generated using the lubridate package.

structure(list(Name = c("John", "John", "Mary", "John", "Mary", 
"Chad"), Date = structure(c(1430438400, 1433116800, 1335830400, 
1422748800, 1435708800, 1427846400), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), .Names = c("Name", "Date"), row.names = c(NA, -6L
), class = "data.frame")

The desired result is:

structure(list(Name = c("John", "Mary", "Chad"), Date = structure(c(1433116800, 
1435708800, 1427846400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("Name", 
"Date"), row.names = c(2L, 5L, 6L), class = "data.frame")
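
To illustrate the documented behaviour quoted above, here is a minimal sketch. It assumes the example data is stored under the name DF (the name itself is an assumption) and a dplyr version in which distinct() accepts .keep_all. It shows that distinct() on its own keeps the first row seen for each Name, not the latest one.

```r
library(dplyr)

# Same values as the dput() output above, written out as dates.
# The object name DF is an assumption.
DF <- data.frame(
  Name = c("John", "John", "Mary", "John", "Mary", "Chad"),
  Date = as.POSIXct(
    c("2015-05-01", "2015-06-01", "2012-05-01",
      "2015-02-01", "2015-07-01", "2015-04-01"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# distinct() keeps the first row it encounters for each Name.
# .keep_all = TRUE retains the Date column alongside Name.
first_seen <- DF %>% distinct(Name, .keep_all = TRUE)
```

Here first_seen holds the 2015-05-01 row for John and the 2012-05-01 row for Mary, neither of which is the most recent entry for that Name.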

Thank you for your help.

Asked by Z_D on Jul 21 '15 at 21:07




1 Answer

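The simplest way would be to sort by Date, newest first, and then keep the first row for each Name:

DF %>% arrange(desc(Date)) %>% distinct(Name, .keep_all = TRUE)

Since dplyr 0.5.0, distinct(Name) returns only the Name column unless .keep_all = TRUE is given.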
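
As a runnable sketch of this sort-then-deduplicate approach on the question's example data (the names DF and latest are assumptions):

```r
library(dplyr)

# Example data from the question; the object name DF is an assumption.
DF <- data.frame(
  Name = c("John", "John", "Mary", "John", "Mary", "Chad"),
  Date = as.POSIXct(
    c("2015-05-01", "2015-06-01", "2012-05-01",
      "2015-02-01", "2015-07-01", "2015-04-01"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Sort newest first, then keep the first (newest) row per Name.
# .keep_all = TRUE keeps the Date column in the result.
latest <- DF %>%
  arrange(desc(Date)) %>%
  distinct(Name, .keep_all = TRUE)
```

Note that the result comes back sorted by Date, newest first, rather than in the original row order.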

If you would rather not reorder the rows by date, these also work (thanks to @akrun):

DF %>% group_by(Name) %>% slice(which.max(Date))    # @akrun's better idea: exactly one row per Name
DF %>% group_by(Name) %>% filter(Date == max(Date)) # my idea: keeps original row order, and every row tied for the latest Date
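
The grouped variants can be compared side by side on the question's data. The following is a sketch, not part of the original answer: the object names are assumptions, and slice_max() is an alternative that assumes dplyr 1.0.0 or later. ungroup() is added so the results are no longer grouped.

```r
library(dplyr)

# Example data from the question; the object name DF is an assumption.
DF <- data.frame(
  Name = c("John", "John", "Mary", "John", "Mary", "Chad"),
  Date = as.POSIXct(
    c("2015-05-01", "2015-06-01", "2012-05-01",
      "2015-02-01", "2015-07-01", "2015-04-01"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# One row per Name: which.max() returns a single index per group.
latest_slice <- DF %>%
  group_by(Name) %>%
  slice(which.max(Date)) %>%
  ungroup()

# Keeps every row whose Date equals the group's maximum (ties included).
latest_filter <- DF %>%
  group_by(Name) %>%
  filter(Date == max(Date)) %>%
  ungroup()

# Assumes dplyr >= 1.0.0; with_ties = FALSE guarantees one row per Name.
latest_max <- DF %>%
  group_by(Name) %>%
  slice_max(Date, n = 1, with_ties = FALSE) %>%
  ungroup()
```

On this data all three return the same three rows, because no Name has two rows sharing its latest Date. If such ties existed, only the filter() version would return more than one row for that Name.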

Answered by Frank on Oct 13 '22 at 17:10