I would like to return the count of the unique (distinct) values for every column in a data frame. For example, if I have the table:
Testdata <- data.frame(var_1 = c("a","a","a"), var_2 = c("b","b","b"), var_3 = c("c","d","e"))

var_1 | var_2 | var_3
a     | b     | c
a     | b     | d
a     | b     | e
I would like the output to be:
Variable | Unique_Values
var_1    | 1
var_2    | 1
var_3    | 3
I have tried playing around with loops using the unique function, e.g.
for (i in names(Testdata)) {
  # Code using unique function
}
However I suspect there is a simpler way.
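For comparison, the loop from the question can be completed along these lines. This is a base-R sketch: it stores one count per column in a named integer vector and then builds the Variable / Unique_Values table. The names `counts` and `result` are illustrative, not part of the original question.

```r
# The question's example data
Testdata <- data.frame(var_1 = c("a", "a", "a"),
                       var_2 = c("b", "b", "b"),
                       var_3 = c("c", "d", "e"))

# Preallocate one integer slot per column, named after the columns
counts <- integer(length(Testdata))
names(counts) <- names(Testdata)

# Loop over the column names, counting the distinct values in each column
for (i in names(Testdata)) {
  counts[[i]] <- length(unique(Testdata[[i]]))
}

# Assemble the two-column layout asked for in the question
result <- data.frame(Variable = names(counts),
                     Unique_Values = unname(counts))
result
#   Variable Unique_Values
# 1    var_1             1
# 2    var_2             1
# 3    var_3             3
```

The loop works, but the answers that follow do the same thing in one call.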
You could use apply:
apply(Testdata, 2, function(x) length(unique(x)))  # MARGIN = 2 applies the function to each column
# var_1 var_2 var_3
#     1     1     3
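Note that `apply()` first converts the data frame to a matrix with `as.matrix()`, so all columns are coerced to a common type before counting. A possible alternative, sketched below, is `sapply()` (or `lengths()` combined with `lapply()`), which visits each column as-is because a data frame is a list of columns. The names `counts` and `counts2` are illustrative.

```r
Testdata <- data.frame(var_1 = c("a", "a", "a"),
                       var_2 = c("b", "b", "b"),
                       var_3 = c("c", "d", "e"))

# sapply() iterates over the columns (list elements) without matrix coercion
counts <- sapply(Testdata, function(x) length(unique(x)))
counts
# var_1 var_2 var_3
#     1     1     3

# Equivalent: unique() per column, then lengths() counts each result
counts2 <- lengths(lapply(Testdata, unique))
```

Both return a named integer vector, which can be turned into the Variable / Unique_Values table with `data.frame(Variable = names(counts), Unique_Values = unname(counts))`.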
In dplyr:
library(dplyr)
Testdata %>% summarise_all(n_distinct)
#   var_1 var_2 var_3
# 1     1     1     3
🙂
(For those curious about the complete syntax:
In dplyr 0.8.0 and later, using purrr-style lambda syntax:
Testdata %>% summarise_all(list(~n_distinct(.)))
Before dplyr 0.8.0:
Testdata %>% summarise_all(funs(n_distinct(.)))
)
More information on summarizing multiple columns can be found here: https://dplyr.tidyverse.org/reference/summarise_all.html
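In dplyr 1.0.0 and later, `summarise_all()` is superseded by `across()`. The sketch below assumes dplyr >= 1.0.0 and tidyr >= 1.0.0 are installed. It uses `across()` for the counts and tidyr's `pivot_longer()` to reshape the one-row result into the Variable / Unique_Values layout from the question. The intermediate names `wide` and `long` are illustrative.

```r
# Assumes dplyr >= 1.0.0 (for across()) and tidyr >= 1.0.0 (for pivot_longer())
library(dplyr)
library(tidyr)

Testdata <- data.frame(var_1 = c("a", "a", "a"),
                       var_2 = c("b", "b", "b"),
                       var_3 = c("c", "d", "e"))

# across() applies n_distinct() to every column,
# giving a one-row data frame with one count per column
wide <- Testdata %>% summarise(across(everything(), n_distinct))

# pivot_longer() turns the column names into a Variable column
# and the counts into an Unique_Values column
long <- wide %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "Unique_Values")
long
```

`long` is a tibble with one row per original column, matching the requested output.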