
Filter duplicates from a data frame in R [duplicate]

I have a dataframe with one observation per row and two observations per subject. I'd like to filter out just the rows with duplicate 'day' numbers.

library(dplyr)

ex <- data.frame(id = rep(1:5, 2), day = c(1:5, 1:3, 5:6))

The following code keeps only the second row of each duplicated pair, not the first. Again, I'd like to keep both of the duplicated rows.

ex %>% 
    group_by(id) %>% 
    filter(duplicated(day))

The following code works, but seems clunky. Does anyone have a more efficient solution?

ex %>% 
    group_by(id) %>% 
    filter(duplicated(day, fromLast = TRUE) | duplicated(day, fromLast = FALSE))
afleishman asked Nov 04 '16 19:11



1 Answer

Single tidyverse pipe:

exSinglesOnly <- 
    ex %>% 
    group_by(id,day) %>% # the complete group of interest
    mutate(duplicate = n()) %>% # count number in each group
    filter(duplicate == 1) %>% # select only unique records
    select(-duplicate) # remove group count column
> exSinglesOnly
Source: local data frame [4 x 2]
Groups: id, day [4]

     id   day
  <int> <int>
1     4     4
2     5     5
3     4     5
4     5     6
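
The helper count column is not strictly needed, because n() can be called directly inside filter(). A minimal sketch of that variant follows, assuming dplyr is installed. The names ex_singles and ex_dups are hypothetical and used only for illustration. It shows both readings of the question: keeping only the rows whose (id, day) pair is unique, as the pipe above does, and keeping only the rows whose pair is repeated, as the asker's own working code does.

```r
library(dplyr)

# Same example data as in the question
ex <- data.frame(id = rep(1:5, 2), day = c(1:5, 1:3, 5:6))

# ex_singles (hypothetical name): rows whose (id, day) pair occurs exactly once
ex_singles <- ex %>%
    group_by(id, day) %>%
    filter(n() == 1) %>%
    ungroup()

# ex_dups (hypothetical name): rows whose (id, day) pair occurs more than once
ex_dups <- ex %>%
    group_by(id, day) %>%
    filter(n() > 1) %>%
    ungroup()
```

The trailing ungroup() is optional. It only drops the grouping so that later steps are not silently applied per (id, day) group.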
leerssej answered Oct 19 '22 13:10