Count unique values for every column

Tags: dataframe, r, unique, count

I would like to return the count of the unique (distinct) values for every column in a data frame. For example, if I have the table:

    Testdata <- data.frame(var_1 = c("a","a","a"),
                           var_2 = c("b","b","b"),
                           var_3 = c("c","d","e"))

    var_1 | var_2 | var_3
    a     | b     | c
    a     | b     | d
    a     | b     | e

I would like the output to be:

    Variable | Unique_Values
    var_1    | 1
    var_2    | 1
    var_3    | 3

I have tried playing around with loops using the unique function, e.g.

    for(i in names(Testdata)){
        # Code using unique function
    }

However, I suspect there is a simpler way.

asked Mar 05 '14 11:03

Zfunk

2 Answers

You could use apply:

apply(Testdata, 2, function(x) length(unique(x)))
# var_1 var_2 var_3
#     1     1     3
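As a possible variation (not part of the original answer), `sapply` iterates over the columns of a data frame directly, whereas `apply` first converts the data frame to a matrix. The sketch below also reshapes the result into the two-column layout asked for in the question. The names `counts` and `result` are only illustrative.

```r
# Example data from the question
Testdata <- data.frame(var_1 = c("a", "a", "a"),
                       var_2 = c("b", "b", "b"),
                       var_3 = c("c", "d", "e"))

# Count distinct values per column; sapply returns a named integer vector
counts <- sapply(Testdata, function(x) length(unique(x)))

# Reshape into the Variable / Unique_Values layout from the question
result <- data.frame(Variable = names(counts),
                     Unique_Values = unname(counts))
print(result)
```

Note that `unique` treats `NA` as a value of its own, so a column containing missing values will count `NA` as one of its distinct values.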

answered Sep 30 '22 14:09

sgibb

In dplyr:

Testdata %>% summarise_all(n_distinct)

🙂

For those curious about the complete syntax:

In dplyr >= 0.8.0, using purrr-style lambda syntax:

Testdata %>% summarise_all(list(~n_distinct(.)))

In dplyr < 0.8.0:

Testdata %>% summarise_all(funs(n_distinct(.)))

More information on summarizing multiple columns can be found here: https://dplyr.tidyverse.org/reference/summarise_all.html
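As a further sketch, assuming dplyr 1.0.0 or later is installed: the scoped `summarise_all` verb has been superseded by `across()`, so the same count could be written as below. The variable names `wide` and `long` are illustrative, and the final step uses base R to get the long layout from the question.

```r
library(dplyr)

# Example data from the question
Testdata <- data.frame(var_1 = c("a", "a", "a"),
                       var_2 = c("b", "b", "b"),
                       var_3 = c("c", "d", "e"))

# One row, one column per variable, each holding its distinct count
wide <- Testdata %>%
  summarise(across(everything(), n_distinct))

# Convert the one-row result to the Variable / Unique_Values layout
long <- data.frame(Variable = names(wide),
                   Unique_Values = unlist(wide, use.names = FALSE))
print(long)
```

`summarise_all` still works in current dplyr releases, so this is an alternative spelling rather than a required change.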

answered Sep 30 '22 12:09

leerssej
