Find duplicated elements with dplyr

Tags:

dplyr

I tried using the code presented here to find ALL duplicated elements with dplyr like this:

library(dplyr)

mtcars %>%
  mutate(cyl.dup = cyl[duplicated(cyl) | duplicated(cyl, from.last = TRUE)])

How can I adapt the code presented here to find ALL duplicated elements with dplyr? My code above just throws an error. Or, even better, is there another function that achieves this more succinctly than the convoluted x[duplicated(x) | duplicated(x, from.last = TRUE)] approach?
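A probable cause of the error: the argument of duplicated() is spelled fromLast, so from.last = TRUE is not matched to it and the expression reduces to cyl[duplicated(cyl)], i.e. only the 29 repeated values rather than one value per row, which mutate() rejects. More generally, indexing cyl[...] returns only the selected values, so its length differs from the row count whenever some value is unique. A minimal sketch that keeps the logical vector itself as the new column instead; cyl.dup is the column name from the question, and carb.dup is a hypothetical extra column added only for contrast:

```r
library(dplyr)

# duplicated() flags the 2nd, 3rd, ... occurrence of each value;
# fromLast = TRUE flags every occurrence except the last one.
# OR-ing the two flags every copy of a repeated value, and the result
# is a logical vector with one entry per row, which mutate() accepts.
flagged <- mtcars %>%
  mutate(
    cyl.dup  = duplicated(cyl)  | duplicated(cyl,  fromLast = TRUE),
    carb.dup = duplicated(carb) | duplicated(carb, fromLast = TRUE)
  )

# Keep only the rows whose carb value occurs more than once
flagged %>% filter(carb.dup)
```

Every cyl value in mtcars is repeated, so cyl.dup is TRUE for all 32 rows, while carb.dup is FALSE for the two cars with carb equal to 6 or 8.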


asked Jan 30 '15 20:01

luciano


3 Answers

I guess you could use filter for this purpose:

mtcars %>% 
  group_by(carb) %>% 
  filter(n()>1)

Small example (note that I added summarize() to show that the filtered data set no longer contains rows whose 'carb' value occurs only once. I used 'carb' instead of 'cyl' because some 'carb' values are unique, whereas every 'cyl' value is repeated, so filtering on 'cyl' would not drop anything):

mtcars %>% group_by(carb) %>% summarize(n=n())
#Source: local data frame [6 x 2]
#
#  carb  n
#1    1  7
#2    2 10
#3    3  3
#4    4 10
#5    6  1
#6    8  1

mtcars %>% group_by(carb) %>% filter(n()>1) %>% summarize(n=n())
#Source: local data frame [4 x 2]
#
#  carb  n
#1    1  7
#2    2 10
#3    3  3
#4    4 10
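The same group_by()/filter(n() > 1) pattern extends to duplicates defined by several columns. A sketch using cyl and gear together as the key (an illustrative choice, not part of the original answer); note that the result stays grouped until ungroup() is called:

```r
library(dplyr)

# Rows whose (cyl, gear) combination occurs more than once.
# n() is evaluated per group, so filter() keeps or drops whole groups.
dup_pairs <- mtcars %>%
  group_by(cyl, gear) %>%
  filter(n() > 1) %>%
  ungroup()   # drop the grouping so later verbs act on the whole table

# Group sizes of the combinations that survived
dup_pairs %>% count(cyl, gear)
```

In mtcars only the combinations cyl = 4/gear = 3 and cyl = 6/gear = 5 occur once, so 30 of the 32 rows remain.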


answered Oct 18 '22 02:10

Marat Talipov

Another solution is to use the janitor package:

library(janitor)

# Returns the rows with a duplicated wt value, plus a dupe_count column
mtcars %>% get_dupes(wt)
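If janitor is not available, roughly the same rows can be obtained with dplyr alone. This is a sketch assuming dplyr 1.0 or later (for relocate()); the dupe_count name mimics the get_dupes() output, but the exact column order and sorting of get_dupes() may differ:

```r
library(dplyr)

# Count how often each wt value occurs, keep the repeated ones,
# and move the key and its count to the front, similar to get_dupes(wt)
wt_dupes <- mtcars %>%
  add_count(wt, name = "dupe_count") %>%
  filter(dupe_count > 1) %>%
  arrange(wt) %>%
  relocate(wt, dupe_count)

wt_dupes
```

In mtcars only wt = 3.44 (three cars) and wt = 3.57 (two cars) are repeated, so five rows are returned.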

answered Oct 18 '22 02:10

radek

We can find duplicated elements with dplyr as follows.

library(dplyr)

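# Only the repeated occurrences (the first occurrence of each value is dropped)
mtcars %>%
  filter(duplicated(.[["carb"]]))

# All duplicated elements (every row whose value occurs more than once)
mtcars %>%
  filter(carb %in% unique(.[["carb"]][duplicated(.[["carb"]])]))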
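Inside filter() the column can be referred to by name, so the .[["carb"]] indexing is not strictly needed. A sketch of two equivalent forms of the "all duplicated" filter next to the "only repeats" filter, with the row counts they give on mtcars:

```r
library(dplyr)

# All rows whose carb value is repeated: keep values that appear again ...
all_dups_in <- mtcars %>%
  filter(carb %in% carb[duplicated(carb)])

# ... or, equivalently, flag every copy by scanning from both directions
all_dups_or <- mtcars %>%
  filter(duplicated(carb) | duplicated(carb, fromLast = TRUE))

# Only the repeats: the first occurrence of each carb value is dropped
repeats_only <- mtcars %>%
  filter(duplicated(carb))
```

The first two filters keep 30 rows (everything except carb 6 and carb 8); the last keeps 26, because each of the six distinct carb values loses its first occurrence.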

answered Oct 18 '22 01:10

Keiku

