How to count unique rows in a data frame?

I have a data frame in R that contains many duplicate records. I want to find out how many times each distinct record occurs in this data frame.

For example, I have this data frame:

Fake Name       Fake ID    Fake Status   Fake Program
June             0003         Green        PR1
June             0003         Green        PR1
Television       202          Blue         PR3
Television       202          Green        PR3    
Television       202          Green        PR3
CRT              12           Red          PR0

From the above, I would like to get something similar to the following:

Fake Name       Fake ID    Fake Status   Fake Program     COUNT
June             0003         Green        PR1              2
Television       202          Blue         PR3              1
Television       202          Green        PR3              2
CRT              12           Red          PR0              1

Any help would be appreciated. Thank you.

Alokin asked Jul 10 '18 19:07
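To make the example reproducible, here is one way to build the question's data in R. The column names (Fake.Name, Fake.ID, ...) follow the output shown in the answers, and storing Fake.ID as character so that "0003" keeps its leading zeros is an assumption; the answers refer to this object as df or dat.

```r
# Example data from the question. Fake.ID is stored as character
# (an assumption) so the leading zeros of "0003" are preserved.
df <- data.frame(
  Fake.Name    = c("June", "June", "Television", "Television", "Television", "CRT"),
  Fake.ID      = c("0003", "0003", "202", "202", "202", "12"),
  Fake.Status  = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  Fake.Program = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"),
  stringsAsFactors = FALSE
)
str(df)
```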


4 Answers

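The following uses duplicated to keep the first occurrence of each row (the result data.frame) and then rle to get the counts. Note that rle counts runs of consecutive TRUE/FALSE flags, so this only works when identical rows are adjacent and groups of duplicates alternate with single rows, as they do in the example; two adjacent single rows, or two adjacent groups of duplicates, would be merged into one run.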

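# keep the first occurrence of every distinct row
res <- dat[!duplicated(dat), ]

# flag every row that has at least one identical copy, then use run lengths as counts
d <- duplicated(dat) | duplicated(dat, fromLast = TRUE)
res$COUNT <- rle(d)$lengths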

res
#   Fake Name Fake ID Fake Status Fake Program COUNT
#1       June    0003       Green          PR1     2
#3 Television     202        Blue          PR3     1
#4 Television     202       Green          PR3     2
#6        CRT      12         Red          PR0     1
Rui Barradas answered Oct 17 '22 05:10
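The rle approach above depends on row order. A minimal order-independent sketch in base R builds one string key per row, keeps the first occurrence of each key, and counts keys with match() and tabulate(). It assumes the separator "\r" never occurs inside the data; the shuffled row order below is made up for illustration, and with it the rle version's counts would no longer line up with the rows.

```r
# Question's rows in a shuffled order (hypothetical) where identical rows are not adjacent.
dat <- data.frame(
  Fake.Name    = c("June", "Television", "CRT", "Television", "June", "Television"),
  Fake.ID      = c("0003", "202", "12", "202", "0003", "202"),
  Fake.Status  = c("Green", "Blue", "Red", "Green", "Green", "Green"),
  Fake.Program = c("PR1", "PR3", "PR0", "PR3", "PR1", "PR3"),
  stringsAsFactors = FALSE
)

# One string key per row; assumes "\r" never appears inside the data.
key <- do.call(paste, c(dat, sep = "\r"))

first <- !duplicated(key)   # TRUE at the first occurrence of each distinct row
res <- dat[first, ]

# match() numbers every row by its group (in order of first appearance);
# tabulate() then counts how many rows fall into each group.
res$COUNT <- tabulate(match(key, key[first]))
res
```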


Use group_by_all() to group by every column, then count the rows in each group with n():

library(dplyr)

df %>% group_by_all() %>% summarise(COUNT = n())

# A tibble: 4 x 5
# Groups:   Fake.Name, Fake.ID, Fake.Status [?]
#  Fake.Name  Fake.ID Fake.Status Fake.Program COUNT
#  <fct>        <int> <fct>       <fct>        <int>
#1 CRT             12 Red         PR0              1
#2 June             3 Green       PR1              2
#3 Television     202 Blue        PR3              1
#4 Television     202 Green       PR3              2

Or, more concisely, as suggested in @Ryan's comment (the count column is then named n rather than COUNT):

df %>% group_by_all %>% count
Psidom answered Oct 17 '22 07:10
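For readers without dplyr, a rough base R analogue of group_by_all() plus n() is aggregate() with a formula: add a column of ones and sum it within each combination of the other columns ("." on the right-hand side means "all remaining columns"). This is a sketch, not an exact equivalent; in particular the formula interface drops rows containing NA by default (na.action = na.omit), whereas group_by_all() keeps NA as its own group.

```r
df <- data.frame(
  Fake.Name    = c("June", "June", "Television", "Television", "Television", "CRT"),
  Fake.ID      = c("0003", "0003", "202", "202", "202", "12"),
  Fake.Status  = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  Fake.Program = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"),
  stringsAsFactors = FALSE
)

# Add a column of ones, then sum it within every combination of the other columns.
res <- aggregate(COUNT ~ ., data = transform(df, COUNT = 1L), FUN = sum)
res
```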


In base R, the table function gives multi-way counts of every combination of values (factor levels) in your data frame, including combinations that never occur. The result can then be converted to a data frame that matches your original structure, with an added "Freq" column containing the counts.

data.frame(table(df))

#    Fake.Name Fake.ID Fake.Status Fake.Program Freq
#1         CRT    0003        Blue          PR0    0
#2        June    0003        Blue          PR0    0
#3  Television    0003        Blue          PR0    0
#4         CRT      12        Blue          PR0    0

Usually not every combination is needed, so you can keep only the rows with positive counts:

subset(data.frame(table(df)), Freq > 0)

#    Fake.Name Fake.ID Fake.Status Fake.Program Freq
#22        CRT      12         Red          PR0    1
#38       June    0003       Green          PR1    2
#63 Television     202        Blue          PR3    1
#72 Television     202       Green          PR3    2
Joby D answered Oct 17 '22 07:10
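Why the filtering step in the table answer matters: table() creates one cell for every combination of the distinct values in each column, so the unfiltered data frame has as many rows as the product of those counts. With three distinct values in each of the four example columns that is 3 * 3 * 3 * 3 = 81 rows, and the number grows quickly with more columns or values. A small sketch checking this on the example data:

```r
df <- data.frame(
  Fake.Name    = c("June", "June", "Television", "Television", "Television", "CRT"),
  Fake.ID      = c("0003", "0003", "202", "202", "202", "12"),
  Fake.Status  = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  Fake.Program = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"),
  stringsAsFactors = FALSE
)

# Number of distinct values per column.
n_levels <- sapply(df, function(x) length(unique(x)))

# table() cross-classifies all columns: one row per combination, observed or not.
full <- data.frame(table(df))
nrow(full) == prod(n_levels)   # TRUE

# Keeping only observed combinations leaves one row per distinct record.
counts <- subset(full, Freq > 0)
counts
```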


With dplyr, you could use n_distinct(), which returns the number of distinct values in a single column:

n_distinct(data$col)
Jack Novack answered Oct 17 '22 06:10
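n_distinct() returns a single number rather than a count for each record. For comparison, a base R sketch of the same quantity (n_distinct(x) is documented as equivalent to length(unique(x))) and of the number of distinct rows in the whole data frame, using the question's example data:

```r
df <- data.frame(
  Fake.Name    = c("June", "June", "Television", "Television", "Television", "CRT"),
  Fake.ID      = c("0003", "0003", "202", "202", "202", "12"),
  Fake.Status  = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  Fake.Program = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"),
  stringsAsFactors = FALSE
)

# Distinct values in one column (what n_distinct(df$Fake.Status) computes).
n_status <- length(unique(df$Fake.Status))

# Distinct rows in the whole data frame.
n_rows <- nrow(unique(df))
```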