Count of unique elements of each row in a data frame in R

Tags: dataframe, r

I have a data frame like below:

Group1  Group2  Group3  Group4
A       B       A       B   
A       C       B       A   
B       B       B       B   
A       C       B       D   
A       D       C       A   

I want to add a new column to the data frame which will have the count of unique elements in each row. Desired output:

Group1  Group2  Group3  Group4  Count
A       B       A       B       2
A       C       B       A       3
B       B       B       B       1
A       C       B       D       4
A       D       C       A       3

I am able to find such a count for each row using

length(unique(c(df[,c(1,2,3,4)][1,])))

I want to do the same for every row in the data frame. I tried apply() with var=1, but without success. A more elegant solution would also be welcome.

smaug asked Apr 24 '17 06:04



2 Answers

We can use apply() with MARGIN = 1 to loop over the rows:

df1$Count <- apply(df1, 1, function(x) length(unique(x)))
df1$Count
#[1] 2 3 1 4 3
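For anyone who wants to run this, here is a minimal, self-contained sketch of the same apply() approach. The data frame is rebuilt from the question, and the `group_cols` vector is my own addition (not part of the original answer): it restricts the count to the group columns, so re-running the line after Count has been added does not include Count in the comparison.

```r
# Example data from the question, rebuilt so the snippet runs on its own
df1 <- data.frame(
  Group1 = c("A", "A", "B", "A", "A"),
  Group2 = c("B", "C", "B", "C", "D"),
  Group3 = c("A", "B", "B", "B", "C"),
  Group4 = c("B", "A", "B", "D", "A"),
  stringsAsFactors = FALSE
)

# Assumption: only these columns should be compared (hypothetical helper name)
group_cols <- c("Group1", "Group2", "Group3", "Group4")

# apply() converts the selected columns to a character matrix and
# calls the function once per row (MARGIN = 1)
df1$Count <- apply(df1[group_cols], 1, function(x) length(unique(x)))

# Running the same line again is safe, because Count is not in group_cols
df1$Count <- apply(df1[group_cols], 1, function(x) length(unique(x)))
df1$Count
```

Keep in mind that apply() coerces the data frame to a matrix first, so with mixed column types every value is compared as a character string.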

Or using the tidyverse:

library(dplyr)
df1 %>%
    rowwise() %>%
    do(data.frame(., Count = n_distinct(unlist(.))))
# A tibble: 5 × 5
#   Group1 Group2 Group3 Group4 Count
#*  <chr>  <chr>  <chr>  <chr> <int>
#1      A      B      A      B     2
#2      A      C      B      A     3
#3      B      B      B      B     1
#4      A      C      B      D     4
#5      A      D      C      A     3

We can also use a regex, which is faster. It assumes that each cell contains exactly one character:

nchar(gsub("(.)(?=.*?\\1)", "", do.call(paste0, df1), perl = TRUE))
#[1] 2 3 1 4 3
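To make the regex easier to follow, here is a sketch that splits the one-liner into steps. The intermediate names (`rows`, `deduped`, `counts`, `multi`) are mine, and the second data frame is a made-up example to show where the single-character assumption breaks.

```r
# Example data from the question
df1 <- data.frame(
  Group1 = c("A", "A", "B", "A", "A"),
  Group2 = c("B", "C", "B", "C", "D"),
  Group3 = c("A", "B", "B", "B", "C"),
  Group4 = c("B", "A", "B", "D", "A"),
  stringsAsFactors = FALSE
)

# Step 1: paste each row into one string: "ABAB" "ACBA" "BBBB" "ACBD" "ADCA"
rows <- do.call(paste0, df1)

# Step 2: delete every character that occurs again later in the string.
# (.) captures one character; the lookahead (?=.*?\\1) checks that the
# same character appears somewhere to its right.
deduped <- gsub("(.)(?=.*?\\1)", "", rows, perl = TRUE)

# Step 3: what is left is one copy of each distinct character
counts <- nchar(deduped)

# Hypothetical data with two-character cells: the regex counts characters,
# not cells, so it reports 4 although the row has only 2 distinct values
multi <- data.frame(x = "AB", y = "CD", stringsAsFactors = FALSE)
wrong <- nchar(gsub("(.)(?=.*?\\1)", "", do.call(paste0, multi), perl = TRUE))
right <- apply(multi, 1, function(x) length(unique(x)))
```

Note that do.call(paste0, df1) uses every column, so if a Count column has already been added to df1 it should be dropped first.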

More detailed explanation is given here

akrun answered Oct 21 '22 07:10



Using duplicated() in base R:

df$Count <- apply(df,1,function(x) sum(!duplicated(x)))

#  Group1 Group2 Group3 Group4 Count
#1      A      B      A      B     2
#2      A      C      B      A     3
#3      B      B      B      B     1
#4      A      C      B      D     4
#5      A      D      C      A     3
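As a small illustration of why this works, here is what duplicated() returns for a single row. The vector is the second row of the example data; the variable names are just for this sketch.

```r
# Second row of the example data as a plain character vector
x <- c("A", "C", "B", "A")

# duplicated() flags every value that has already appeared earlier
dup <- duplicated(x)

# Counting the values that are NOT flagged gives the number of distinct values
n_unique <- sum(!dup)

# This is the same number that length(unique(x)) returns
same_as_unique <- n_unique == length(unique(x))
```

Here `dup` is FALSE, FALSE, FALSE, TRUE, so `n_unique` is 3, which matches the Count shown for that row.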
989 answered Oct 21 '22 08:10
