
Count unique values for every column

I would like to return the count of the unique (distinct) values for every column in a data frame. For example, if I have the table:

    Testdata <- data.frame(var_1 = c("a","a","a"), var_2 = c("b","b","b"), var_3 = c("c","d","e"))

    var_1 | var_2 | var_3
    a     | b     | c
    a     | b     | d
    a     | b     | e

I would like the output to be:

    Variable | Unique_Values
    var_1    | 1
    var_2    | 1
    var_3    | 3

I have tried playing around with loops using the unique function, e.g.

    for(i in names(Testdata)){
        # Code using unique function
    }

However I suspect there is a simpler way.
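For reference, the loop sketched above can be completed as follows. This is a minimal sketch; the name `counts` is a hypothetical choice, not from the original post. It preallocates a named integer vector and fills one entry per column.

```r
Testdata <- data.frame(var_1 = c("a","a","a"), var_2 = c("b","b","b"), var_3 = c("c","d","e"))

# Preallocate a named integer vector with one slot per column (hypothetical name).
counts <- setNames(integer(ncol(Testdata)), names(Testdata))

for (i in names(Testdata)) {
  # Testdata[[i]] extracts the column as a vector; unique() drops repeated values.
  counts[i] <- length(unique(Testdata[[i]]))
}

counts
```

The loop works, but the answers below show shorter ways to get the same named counts.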

Zfunk asked Mar 05 '14 11:03




2 Answers

You could use apply:

apply(Testdata, 2, function(x) length(unique(x)))  # MARGIN = 2 means "over columns"
# var_1 var_2 var_3
#     1     1     3
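A closely related sketch, assuming base R only: `vapply()` iterates over the columns of the data frame directly (a data frame is a list of columns), so it avoids the matrix conversion that `apply()` performs, and it checks that each result is a single integer. The names `counts` and `result` are illustrative, and the last step reshapes the named vector into the `Variable | Unique_Values` layout requested in the question.

```r
Testdata <- data.frame(var_1 = c("a","a","a"), var_2 = c("b","b","b"), var_3 = c("c","d","e"))

# vapply() applies the function to each column and requires an integer(1) result.
counts <- vapply(Testdata, function(x) length(unique(x)), integer(1))

# Reshape the named vector into a two-column data frame.
result <- data.frame(Variable = names(counts),
                     Unique_Values = unname(counts),
                     stringsAsFactors = FALSE)
result
```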
sgibb answered Sep 30 '22 14:09


In dplyr:

library(dplyr)
Testdata %>% summarise_all(n_distinct)

🙂

(For those curious about the complete syntax:

In dplyr >= 0.8.0, using purrr-style lambda syntax:

Testdata %>% summarise_all(list(~n_distinct(.))) 

In dplyr < 0.8.0 (funs() is deprecated as of 0.8.0):

Testdata %>% summarise_all(funs(n_distinct(.))) 

)
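In dplyr 1.0.0 and later, `summarise_all()` and the other scoped verbs are superseded by `across()`. A minimal sketch of the equivalent call, assuming dplyr >= 1.0.0 is installed (the name `res` is only for illustration):

```r
library(dplyr)

Testdata <- data.frame(var_1 = c("a","a","a"), var_2 = c("b","b","b"), var_3 = c("c","d","e"))

# across() applies n_distinct() to every column picked by everything();
# the result is a one-row data frame with one count per column.
res <- Testdata %>% summarise(across(everything(), n_distinct))
res
```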

More information on summarizing multiple columns can be found here: https://dplyr.tidyverse.org/reference/summarise_all.html

leerssej answered Sep 30 '22 12:09