Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding duplicates in a dataframe and returning count of each duplicate record

Tags:

r

I have a dataframe like

col1 col2 col3
A    B    C
A    B    C
A    B    B
A    B    B
A    B    C
B    C    A

I want to get an output in the below format:

col1 col2 col3 Count
A    B    C    3 Duplicates
A    B    B    2 Duplicates

I dont want to use any specific column in the function to find the duplicates.

That is the reason of not using add_count from dplyr.

Using duplicate will have

    col1 col2 col3 count
2   A    B    C    3
3   A    B    B    2
5   A    B    C    3

So not the desired output.

like image 223
Deep Avatar asked Dec 14 '18 04:12

Deep


People also ask

How do you count the number of duplicates in a data frame?

You can count the number of duplicate rows by counting True in pandas. Series obtained with duplicated() . The number of True can be counted with sum() method.

How do you count repeated values in pandas?

You can use groupby with function size. Then I reset index with rename column 0 to count .

How do you count the number of repeated values in Python?

Operator. countOf() is used for counting the number of occurrences of b in a. It counts the number of occurrences of value. It returns the Count of a number of occurrences of value.


2 Answers

We can use group_by_all to group by all columns and then remove the ones which are not duplicates by selecting rows which have count > 1.

library(dplyr)

df %>%
  group_by_all() %>%
  count() %>%
  filter(n > 1)

#  col1  col2  col3      n
# <fct> <fct> <fct>   <int>
#1 A     B     B         2
#2 A     B     C         3
like image 150
Ronak Shah Avatar answered Oct 13 '22 00:10

Ronak Shah


We can use data.table

library(data.table)
setDT(df1)[, .(n =.N), names(df1)][n > 1]
#   col1 col2 col3 n
#1:    A    B    C 3
#2:    A    B    B 2

Or with base R

subset(aggregate(n ~ ., transform(df1, n = 1), FUN = sum), n > 1)
#  col1 col2 col3 n
#2    A    B    B 2
#3    A    B    C 3
like image 44
akrun Avatar answered Oct 13 '22 00:10

akrun