 

Find how many times duplicated rows repeat in R data frame [duplicate]

Tags: r

I have a data frame like the following example

a <- c(1, 1, 1, 2, 2, 3, 4, 4)
b <- c(3.5, 3.5, 2.5, 2, 2, 1, 2.2, 7)
df <- data.frame(a, b)

I can remove duplicated rows from an R data frame with the following code, but how can I find how many times each duplicated row is repeated? I need the result as a vector.

unique(df) 

or

df[!duplicated(df), ] 
rose asked Aug 13 '13 05:08



2 Answers

Here is a solution using the function ddply() from the plyr package:

library(plyr)
ddply(df, .(a, b), nrow)
  a   b V1
1 1 2.5  1
2 1 3.5  2
3 2 2.0  2
4 3 1.0  1
5 4 2.2  1
6 4 7.0  1
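
Since the question asks for a vector, here is a minimal base-R sketch of the same count-per-group idea without plyr. It is not part of the original answer: `key` and `counts` are names made up for illustration, and pasting the columns into a single key assumes the values survive conversion to character unchanged (fine for this small example, not guaranteed for arbitrary doubles).

```r
a <- c(1, 1, 1, 2, 2, 3, 4, 4)
b <- c(3.5, 3.5, 2.5, 2, 2, 1, 2.2, 7)
df <- data.frame(a, b)

# Build one character key per row (assumption: values print unambiguously)
key <- do.call(paste, c(df, sep = "\r"))

# Map every row to the index of its first occurrence, then count per index.
# The counts come out in the same order as the rows of unique(df).
counts <- tabulate(match(key, unique(key)))
counts
```

Because `counts` follows the row order of `unique(df)`, `cbind(unique(df), counts)` should pair each distinct row with its frequency.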
Didzis Elferts answered Sep 30 '22 01:09


You could always kill two birds with one stone and get the unique rows and their counts together:

aggregate(list(numdup=rep(1,nrow(df))), df, length)
# or even:
aggregate(numdup ~., data=transform(df,numdup=1), length)
# or even:
aggregate(cbind(df[0],numdup=1), df, length)
  a   b numdup
1 3 1.0      1
2 2 2.0      2
3 4 2.2      1
4 1 2.5      1
5 1 3.5      2
6 4 7.0      1
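
To get just the vector the question asks for, one option is to pull the `numdup` column out of the result. This is a small sketch built on the first aggregate() call above; `res` is a name introduced here for illustration, and the counts follow the row order of the aggregate() output rather than the order of the original data frame.

```r
a <- c(1, 1, 1, 2, 2, 3, 4, 4)
b <- c(3.5, 3.5, 2.5, 2, 2, 1, 2.2, 7)
df <- data.frame(a, b)

# Count how often each distinct (a, b) combination occurs
res <- aggregate(list(numdup = rep(1, nrow(df))), df, length)

# The counts as a plain vector, one entry per distinct row
numdup <- res$numdup
numdup

# Keep only the combinations that occur more than once
res[res$numdup > 1, ]
```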
thelatemail answered Sep 30 '22 00:09