How to count unique rows in a data frame?

Question

I have a data frame in R which has a lot of duplicate records. I am interested in finding out how many records of each are in this data frame.

For example, I have this data frame:

Fake Name       Fake ID    Fake Status   Fake Program
June             0003         Green        PR1
June             0003         Green        PR1
Television       202          Blue         PR3
Television       202          Green        PR3    
Television       202          Green        PR3
CRT              12           Red          PR0

From the above, I would like to get something like this:

Fake Name       Fake ID    Fake Status   Fake Program     COUNT
June             0003         Green        PR1              2
Television       202          Blue         PR3              1
Television       202          Green        PR3              2
CRT              12           Red          PR0              1

Any help would be appreciated. Thank you.

Rui Barradas · Accepted Answer

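The following uses duplicated() to build the result data frame and then rle() to get the counts. Because rle() counts runs of the duplicate flag, this works for the example data but assumes that identical rows are adjacent and that duplicated and unique groups alternate.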

res <- dat[!duplicated(dat), ]

d <- duplicated(dat) | duplicated(dat, fromLast = TRUE)
res$COUNT <- rle(d)$lengths

res
#   Fake Name Fake ID Fake Status Fake Program COUNT
#1       June    0003       Green          PR1     2
#3 Television     202        Blue          PR3     1
#4 Television     202       Green          PR3     2
#6        CRT      12         Red          PR0     1
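If identical rows are not adjacent, or two unique rows sit next to each other, the run lengths from rle() no longer line up with the unique rows. A minimal sketch of an order-independent variant of the same duplicated() idea follows. The data frame is a reconstruction of the example from the question, with the IDs assumed to be character strings, and `key` and `n_per_row` are illustrative names rather than part of the original answer.

```r
# Reconstruction of the example data (assumption: IDs stored as character).
dat <- data.frame(
  Fake.Name    = c("June", "June", "Television", "Television", "Television", "CRT"),
  Fake.ID      = c("0003", "0003", "202", "202", "202", "12"),
  Fake.Status  = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  Fake.Program = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"),
  stringsAsFactors = FALSE
)

# Collapse each row into a single string so rows can be compared as keys.
key <- do.call(paste, c(unname(as.list(dat)), sep = "\r"))

# For every row, count how many rows share its key (row order does not matter).
n_per_row <- ave(seq_along(key), key, FUN = length)

# Keep the first occurrence of each row and attach its count.
first <- !duplicated(key)
res <- dat[first, ]
res$COUNT <- n_per_row[first]

res
```

The result keeps the rows in order of first appearance, as in the answer above, but the counts come from matching keys rather than from run lengths.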

Psidom · Answer

Use group_by_all() from dplyr, then count the rows in each group with n():

library(dplyr)
df %>% group_by_all() %>% summarise(COUNT = n())

# A tibble: 4 x 5
# Groups:   Fake.Name, Fake.ID, Fake.Status [?]
#  Fake.Name  Fake.ID Fake.Status Fake.Program COUNT
#  <fct>        <int> <fct>       <fct>        <int>
#1 CRT             12 Red         PR0              1
#2 June             3 Green       PR1              2
#3 Television     202 Blue        PR3              1
#4 Television     202 Green       PR3              2

Or, even better, as suggested in @Ryan's comment:

df %>% group_by_all %>% count
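The scoped verbs such as group_by_all() were superseded by across() in dplyr 1.0.0. A minimal sketch of the same count written that way, assuming dplyr 1.0.0 or later is installed and using a reconstruction of the example data (IDs assumed to be character strings):

```r
library(dplyr)

# Reconstruction of the example data (assumption: IDs stored as character).
df <- data.frame(
  Fake.Name    = c("June", "June", "Television", "Television", "Television", "CRT"),
  Fake.ID      = c("0003", "0003", "202", "202", "202", "12"),
  Fake.Status  = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  Fake.Program = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"),
  stringsAsFactors = FALSE
)

# Count rows per unique combination of all columns; name the count column COUNT.
res <- df %>% count(across(everything()), name = "COUNT")

res
```

Unlike the summarise() version, count() returns an ungrouped result, so no further ungroup() step should be needed.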

Joby D · Answer

In base R, the table function provides multi-way counts of every factor combination in your data frame. The result can then be converted to a data frame that matches your original structure, with an added "Freq" column containing the counts.

data.frame(table(df))

#    Fake.Name Fake.ID Fake.Status Fake.Program Freq
#1         CRT    0003        Blue          PR0    0
#2        June    0003        Blue          PR0    0
#3  Television    0003        Blue          PR0    0
#4         CRT      12        Blue          PR0    0

Of course, every combination might not be needed, so you can restrict it to the rows with positive counts:

subset(data.frame(table(df)), Freq > 0)

#    Fake.Name Fake.ID Fake.Status Fake.Program Freq
#22        CRT      12         Red          PR0    1
#38       June    0003       Green          PR1    2
#63 Television     202        Blue          PR3    1
#72 Television     202       Green          PR3    2
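To see why the filter matters, it may help to compare the sizes of the two results. In the example data each of the four columns has three distinct values, so the full table has 3 * 3 * 3 * 3 = 81 combinations, of which only four actually occur. The sketch below assumes a reconstruction of the example data with the IDs stored as character strings; `full` and `nonzero` are illustrative names.

```r
# Reconstruction of the example data (assumption: IDs stored as character).
df <- data.frame(
  Fake.Name    = c("June", "June", "Television", "Television", "Television", "CRT"),
  Fake.ID      = c("0003", "0003", "202", "202", "202", "12"),
  Fake.Status  = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  Fake.Program = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"),
  stringsAsFactors = FALSE
)

# Every combination of the observed values, including those that never occur.
full <- as.data.frame(table(df))
nrow(full)     # 81 rows: 3 * 3 * 3 * 3 combinations

# Only the combinations that are present in the data.
nonzero <- subset(full, Freq > 0)
nrow(nonzero)  # 4 rows
```

Because the full table grows with the product of the number of distinct values per column, this approach can become expensive on data frames with many columns or many distinct values.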

Jack Novack · Answer

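You could use n_distinct() from dplyr, which returns the number of distinct values in a single column (one number, not a count per row):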

n_distinct(data$col)
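As a small illustration of what this returns, here is a sketch on a reconstruction of the example data (IDs assumed to be character strings). The second call, which uses distinct() to count unique whole rows, is an addition and not part of the original answer.

```r
library(dplyr)

# Reconstruction of the example data (assumption: IDs stored as character).
df <- data.frame(
  Fake.Name    = c("June", "June", "Television", "Television", "Television", "CRT"),
  Fake.ID      = c("0003", "0003", "202", "202", "202", "12"),
  Fake.Status  = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  Fake.Program = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"),
  stringsAsFactors = FALSE
)

# Number of distinct values in one column.
n_names <- n_distinct(df$Fake.Name)   # 3: June, Television, CRT

# Number of distinct whole rows.
n_rows <- nrow(distinct(df))          # 4

c(n_names, n_rows)
```

Neither call gives the per-row COUNT column asked for in the question; they only report how many distinct values or rows exist.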

Tags: dataframe, r, aggregate, dplyr

Alokin
