
Find duplicated elements with dplyr

Tags: r, dplyr

I tried using the code presented here to find ALL duplicated elements with dplyr like this:

library(dplyr)

mtcars %>%
mutate(cyl.dup = cyl[duplicated(cyl) | duplicated(cyl, from.last = TRUE)])

How can I convert the code presented here to find ALL duplicated elements with dplyr? My code above just throws an error. Or even better, is there another function that will achieve this more succinctly than the convoluted x[duplicated(x) | duplicated(x, from.last = TRUE)] approach?

luciano asked Jan 30 '15 20:01




3 Answers

I guess you could use filter for this purpose:

mtcars %>% 
  group_by(carb) %>% 
  filter(n()>1)

Small example. I added summarize() to show that the resulting data set keeps only the rows whose 'carb' value occurs more than once. I used 'carb' instead of 'cyl' because 'carb' has values that occur only once (6 and 8), whereas every 'cyl' value is repeated:

mtcars %>% group_by(carb) %>% summarize(n=n())
#Source: local data frame [6 x 2]
#
#  carb  n
#1    1  7
#2    2 10
#3    3  3
#4    4 10
#5    6  1
#6    8  1

mtcars %>% group_by(carb) %>% filter(n()>1) %>% summarize(n=n())
#Source: local data frame [4 x 2]
#
#  carb  n
#1    1  7
#2    2 10
#3    3  3
#4    4 10
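
To connect this back to the idiom in the question: mutate() expects one value per row (or a single value), so a subsetted vector of a different length makes it fail, whereas filter() accepts the logical mask directly. The following is only a sketch for comparison, not part of the original answer; the names by_group and by_flag are made up here. Note that the base R argument is spelled fromLast, not from.last.

```r
library(dplyr)

# Rows whose carb value occurs more than once, found by grouping
by_group <- mtcars %>%
  group_by(carb) %>%
  filter(n() > 1) %>%
  ungroup()

# The same rows, found by flagging duplicates scanned from both ends
by_flag <- mtcars %>%
  filter(duplicated(carb) | duplicated(carb, fromLast = TRUE))

# Both approaches should keep the same number of rows
nrow(by_group)
nrow(by_flag)
```

The grouped version reads more naturally in a pipeline; the flagged version avoids grouping and keeps the original row names of the data frame.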
Marat Talipov answered Oct 18 '22 02:10


Another solution is to use the janitor package:

library(janitor)  # provides get_dupes()

mtcars %>% get_dupes(wt)
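
If adding janitor as a dependency is not an option, a roughly similar result can be sketched with dplyr alone. This is an approximation, not janitor's implementation: it assumes that what matters is keeping the rows whose wt value is repeated, together with a count column. The column name dupe_count mirrors the one get_dupes() adds, and the variable name dupes is made up for this example.

```r
library(dplyr)

dupes <- mtcars %>%
  add_count(wt, name = "dupe_count") %>%  # how often each wt value occurs
  filter(dupe_count > 1) %>%              # keep only repeated wt values
  arrange(wt)                             # put matching rows next to each other

dupes
```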
radek answered Oct 18 '22 02:10


We can find duplicated elements with dplyr as follows.

library(dplyr)

# Only the repeat occurrences (the first row for each carb value is dropped)
mtcars %>%
  filter(duplicated(.[["carb"]]))

# Every row whose carb value occurs more than once
mtcars %>%
  filter(carb %in% unique(.[["carb"]][duplicated(.[["carb"]])]))
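
The difference between the two filters is easier to see on a small vector. The sketch below uses a made-up vector x and made-up names (later_copies, all_copies) purely for illustration:

```r
x <- c("a", "b", "a", "c", "b", "a")

# TRUE only for the second and later occurrences of a value
later_copies <- duplicated(x)
later_copies
# FALSE FALSE  TRUE FALSE  TRUE  TRUE

# TRUE for every element whose value occurs more than once
all_copies <- x %in% x[duplicated(x)]
all_copies
#  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
```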
Keiku answered Oct 18 '22 01:10