My objective is to count how many duplicates there are in a column.
I have a data frame of 3516 obs. of 1 variable;
the values are all dates, each with about 144 duplicates, from 1/4/16 to 7/3/16.
Example (I put 1 duplicate per date for the example's sake):
1/4/16
1/4/16
31/3/16
31/3/16
30/3/16
30/3/16
29/3/16
29/3/16
28/3/16
28/3/16
So I used the function date = count(date),
where date is my df of dates.
But once I execute it, my dates are no longer in their original order.
I hope someone can help me solve this problem.
If you want the number of duplicates in your column, you can use duplicated:
sum(duplicated(df$V1))
#[1] 5
This assumes V1 is your column name.
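A minimal sketch with the example data from the question, assuming the column is called V1 (a placeholder name, as above). duplicated flags the second and later occurrences of each value, so summing the flags counts the extra copies:

```r
# Example data from the question; V1 is an assumed column name
df <- data.frame(
  V1 = c("1/4/16", "1/4/16", "31/3/16", "31/3/16", "30/3/16",
         "30/3/16", "29/3/16", "29/3/16", "28/3/16", "28/3/16"),
  stringsAsFactors = FALSE
)

# TRUE for every value that has already appeared earlier in the column
dup_flags <- duplicated(df$V1)

# Number of duplicate entries (copies beyond the first of each date)
n_dups <- sum(dup_flags)
print(n_dups)
```

The row order of df is untouched here, since nothing is reassigned to the column.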
EDIT
As per the update, if you want the count of each date, you can use the table
function, which will give you exactly that:
table(df$V1)
# 1/4/16 28/3/16 29/3/16 30/3/16 31/3/16
#      2       2       2       2       2
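Note that table sorts its labels as text, which is why 1/4/16 is listed first in the output above; this is the same kind of reordering the question describes. A sketch of two ways to control the order, assuming the dates are day/month/year strings (the variable names here are only illustrative):

```r
dates <- c("1/4/16", "1/4/16", "31/3/16", "31/3/16", "30/3/16",
           "30/3/16", "29/3/16", "29/3/16", "28/3/16", "28/3/16")

# Option 1: keep the order in which each date first appears
counts_in_order <- table(factor(dates, levels = unique(dates)))
print(counts_in_order)

# Option 2: parse to Date so the counts sort chronologically
# (assumes day/month/two-digit-year, as in the example data)
parsed <- as.Date(dates, format = "%d/%m/%y")
counts_by_date <- table(parsed)
print(counts_by_date)
```

Option 1 leaves the values as text; Option 2 changes the labels to ISO dates such as 2016-03-28.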
If we need to count the total number of duplicates, we can subtract 1 from each date's count and sum the result:
sum(table(df1$date)-1)
#[1] 5
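As a quick check, here is a sketch showing that this gives the same total as sum(duplicated(...)) on the example data (the variable names are illustrative):

```r
dates <- c("1/4/16", "1/4/16", "31/3/16", "31/3/16", "30/3/16",
           "30/3/16", "29/3/16", "29/3/16", "28/3/16", "28/3/16")

# Extra copies per date: each date's count minus its first occurrence
extra_per_date <- table(dates) - 1

total_from_table <- sum(extra_per_date)
total_from_duplicated <- sum(duplicated(dates))

# Both approaches count every occurrence after the first
stopifnot(total_from_table == total_from_duplicated)
print(total_from_table)
```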
Suppose we need the count of each date. One option would be to group by 'date' and get the number of rows. This can be done with data.table.
library(data.table)
setDT(df1)[, .N, date]
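A self-contained sketch of the same call on the example data, assuming the data.table package is installed and the column is named date. Grouping with by returns the groups in order of first appearance, so the original date sequence is kept:

```r
library(data.table)

# Example data from the question; 'date' is the assumed column name
df1 <- data.frame(
  date = c("1/4/16", "1/4/16", "31/3/16", "31/3/16", "30/3/16",
           "30/3/16", "29/3/16", "29/3/16", "28/3/16", "28/3/16"),
  stringsAsFactors = FALSE
)

# .N is the number of rows in each group
counts <- setDT(df1)[, .N, by = date]
print(counts)
```

Each row of counts holds one date and its number of occurrences in column N; subtracting 1 from N gives the duplicates per date.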