 

Add count of unique / distinct values by group to the original data

I wish to count the number of unique values of one variable within groups defined by a second variable, and then add that count to the existing data.frame as a new column. For example, if the existing data frame looks like this:

       color  type
    1  black chair
    2  black chair
    3  black  sofa
    4  green  sofa
    5  green  sofa
    6    red  sofa
    7    red plate
    8   blue  sofa
    9   blue plate
    10  blue chair

I want to add, for each color, the count of unique types present in the data:

       color  type unique_types
    1  black chair            2
    2  black chair            2
    3  black  sofa            2
    4  green  sofa            1
    5  green  sofa            1
    6    red  sofa            2
    7    red plate            2
    8   blue  sofa            3
    9   blue plate            3
    10  blue chair            3

I was hoping to use ave, but can't seem to find a straightforward method that doesn't require many lines. I have >100,000 rows, so I'm also not sure how much efficiency matters.

It's somewhat similar to this issue: Count number of observations/rows per group and add result to data frame
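For anyone reproducing this, the example data can be rebuilt roughly as follows (a minimal sketch; the construction is my assumption, only the values come from the printout above):

    df <- data.frame(
      color = c("black", "black", "black", "green", "green",
                "red", "red", "blue", "blue", "blue"),
      type  = c("chair", "chair", "sofa", "sofa", "sofa",
                "sofa", "plate", "sofa", "plate", "chair"),
      stringsAsFactors = FALSE  # keep type as character; only needed on R < 4.0
    )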

Asked Jul 02 '13 by Bryan



2 Answers

Here's a solution with the dplyr package - it has n_distinct() as a wrapper for length(unique()).

    library(dplyr)

    df %>%
      group_by(color) %>%
      mutate(unique_types = n_distinct(type))
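If you are on dplyr 1.1.0 or newer (an assumption about your installed version, not something the answer requires), the per-operation .by argument gives the same result without leaving the data grouped:

    library(dplyr)

    # grouping applies only to this mutate(); the result comes back ungrouped
    df %>% mutate(unique_types = n_distinct(type), .by = color)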
Answered by Sam Firke


Using ave (since you asked for it specifically):

    within(df, {
      count <- ave(type, color, FUN = function(x) length(unique(x)))
    })

Make sure that type is a character vector and not a factor.
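Because ave() returns a vector of the same mode as its input, the count produced above comes back as character. A small sketch of my own (not part of the original answer) that coerces to a numeric column and also tolerates a factor type:

    df$count <- as.numeric(
      ave(as.character(df$type), df$color,
          FUN = function(x) length(unique(x)))
    )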


Since you also say your data is huge and that speed/performance may therefore be a factor, I'd suggest a data.table solution as well.

    require(data.table)
    setDT(df)[, count := uniqueN(type), by = color]   # v1.9.6+

    # if you don't want df to be modified by reference
    ans = as.data.table(df)[, count := uniqueN(type), by = color]

uniqueN was implemented in v1.9.6 and is a faster equivalent of length(unique(.)). It also works directly on data.frames/data.tables (counting unique rows).
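If your installed data.table is older than 1.9.6 (an assumption worth checking with packageVersion("data.table")), the length(unique(.)) form mentioned above drops in unchanged:

    require(data.table)
    # same result as uniqueN(), just slower on recent versions
    setDT(df)[, count := length(unique(type)), by = color]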


Other solutions:

Using plyr:

    require(plyr)
    ddply(df, .(color), mutate, count = length(unique(type)))

Using aggregate:

    agg <- aggregate(type ~ color, data = df, FUN = function(x) length(unique(x)))
    merge(df, agg, by = "color", all = TRUE)
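Two small caveats here (my notes, not part of the original answer): the aggregated column keeps the name type, so the merge produces type.x/type.y, and merge() sorts the result by the by column, losing the original row order. A sketch that renames the count and restores the order via a temporary row id:

    df$row_id <- seq_len(nrow(df))                      # remember the original order
    agg <- aggregate(type ~ color, data = df,
                     FUN = function(x) length(unique(x)))
    names(agg)[2] <- "unique_types"                     # avoid clashing with "type"
    res <- merge(df, agg, by = "color", all.x = TRUE)
    res <- res[order(res$row_id), ]                     # back to the original row order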
Answered by Arun