I have a data frame like the following example
a = c(1, 1, 1, 2, 2, 3, 4, 4)
b = c(3.5, 3.5, 2.5, 2, 2, 1, 2.2, 7)
df <- data.frame(a, b)
I can remove duplicated rows from an R data frame with either of the lines below, but how can I find how many times each duplicated row is repeated? I need the result as a vector.
unique(df)
or
df[!duplicated(df), ]
To count the number of duplicate rows in an R data frame, we can first convert the data frame into a data.table object with setDT and then count the rows in each group of identical values, for example with data.table's .N symbol.
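The data.table approach described above can be sketched roughly as follows. This is a minimal illustration, assuming the data.table package is installed and using the question's example data; the names dt, counts and dup_vector are made up for the sketch. It uses as.data.table() so that df itself is left untouched, whereas setDT(df) would convert df in place.

```r
library(data.table)

# Example data from the question
a <- c(1, 1, 1, 2, 2, 3, 4, 4)
b <- c(3.5, 3.5, 2.5, 2, 2, 1, 2.2, 7)
df <- data.frame(a, b)

# as.data.table() makes a copy; setDT(df) would convert df by reference instead
dt <- as.data.table(df)

# .N is the number of rows in each group; grouping by both columns
# gives one row per distinct (a, b) combination
counts <- dt[, .N, by = .(a, b)]

# The counts as a plain vector, in order of first appearance of each row
dup_vector <- counts$N
print(counts)
print(dup_vector)
```

Because data.table keeps groups in order of first appearance, dup_vector should line up with the rows of unique(df).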
In pandas (Python), you can count the number of duplicate rows by counting the True values in the pandas.Series returned by duplicated(). The number of True values can be counted with the sum() method. To count the False values instead (the number of non-duplicate rows), invert the Series with the negation operator ~ and then count True with sum().
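The same idea also works in R, the language of the question: duplicated() returns a logical vector, and sum() counts its TRUE values. A small sketch using the question's example data (the variable names is_dup, n_dup and n_unique are only illustrative):

```r
# Example data from the question
df <- data.frame(a = c(1, 1, 1, 2, 2, 3, 4, 4),
                 b = c(3.5, 3.5, 2.5, 2, 2, 1, 2.2, 7))

# TRUE for every row that repeats an earlier row
is_dup <- duplicated(df)

# Number of rows that are repeats of an earlier row
n_dup <- sum(is_dup)

# Number of rows kept by unique(df); ! plays the role of pandas' ~
n_unique <- sum(!is_dup)

print(c(duplicates = n_dup, unique = n_unique))
```

Note that this gives only the overall totals, not how many times each individual row is repeated.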
Here is a solution using the function ddply() from the plyr library:
library(plyr)
ddply(df,.(a,b),nrow)

  a   b V1
1 1 2.5  1
2 1 3.5  2
3 2 2.0  2
4 3 1.0  1
5 4 2.2  1
6 4 7.0  1
You could always kill two birds with the one stone:
aggregate(list(numdup=rep(1,nrow(df))), df, length)
# or even:
aggregate(numdup ~., data=transform(df,numdup=1), length)
# or even:
aggregate(cbind(df[0],numdup=1), df, length)

  a   b numdup
1 3 1.0      1
2 2 2.0      2
3 4 2.2      1
4 1 2.5      1
5 1 3.5      2
6 4 7.0      1
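The question asks for the counts as a vector, while the answers above return data frames. One possible base R sketch, not taken from either answer, builds a text key per row and tabulates it. The names key and counts are illustrative, and the approach assumes that pasting the column values together identifies each row uniquely, which holds for the example data but may need care with very close floating-point values.

```r
# Example data from the question
a <- c(1, 1, 1, 2, 2, 3, 4, 4)
b <- c(3.5, 3.5, 2.5, 2, 2, 1, 2.2, 7)
df <- data.frame(a, b)

# One character key per row, built by pasting all columns together;
# "\r" is used as a separator unlikely to occur in the data
key <- do.call(paste, c(df, sep = "\r"))

# Fixing the factor levels to unique(key) keeps the counts
# in the same order as the rows of unique(df)
counts <- as.vector(table(factor(key, levels = unique(key))))

# Show the counts next to the de-duplicated rows
print(cbind(unique(df), count = counts))
```

If the ordering does not matter, taking the count column from either data frame above (for example the numdup column of the aggregate() result) would give the same numbers in a different order.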