How to count the number of unique values by group? [duplicate]

Tags:

r

    ID = c('A', 'A', 'A', 'B', 'B', 'B')
    color = c('white', 'green', 'orange', 'white', 'green', 'green')
    d = data.frame(ID, color)

My desired result is

    unique_colors = c(3, 3, 3, 2, 2, 2)
    d = data.frame(ID, color, unique_colors)

or, more clearly, in a new data frame c:

    ID = c('A', 'B')
    unique_colors = c(3, 2)
    c = data.frame(ID, unique_colors)

I've tried different combinations of aggregate and ave, as well as by and with, and I suppose the solution is some combination of those functions.

The solution would include:

length(unique(d$color))

to calculate the number of unique elements.

747

asked Jan 27 '15 15:01

rmuc8

1 Answer

I think you've got it all wrong here. There is no need for either plyr or <- when using data.table.

Recent versions of data.table, v >= 1.9.6, have a new function uniqueN() just for that.

    library(data.table) ## >= v1.9.6
    setDT(d)[, .(count = uniqueN(color)), by = ID]
    #    ID count
    # 1:  A     3
    # 2:  B     2

If you want to create a new column with the counts, use the := operator

setDT(d)[, count := uniqueN(color), by = ID]
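To see what the := form does to d, here is a minimal self-contained sketch using the data from the question. It assumes data.table >= 1.9.6 is installed. The point it illustrates is that setDT() and := both work by reference, so d itself gains the new column without any assignment.

```r
library(data.table)  # assumption: data.table >= 1.9.6 is installed

# Sample data from the question
ID <- c('A', 'A', 'A', 'B', 'B', 'B')
color <- c('white', 'green', 'orange', 'white', 'green', 'green')
d <- data.frame(ID, color)

setDT(d)                               # convert d to a data.table in place
d[, count := uniqueN(color), by = ID]  # add the per-group count by reference
print(d)                               # d now has ID, color and count columns
```

Every row of a group carries that group's count, which is the first layout asked for in the question.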

Or with dplyr use the n_distinct function

    library(dplyr)
    d %>%
      group_by(ID) %>%
      summarise(count = n_distinct(color))
    # Source: local data table [2 x 2]
    #
    #   ID count
    # 1  A     3
    # 2  B     2

Or (if you want a new column) use mutate instead of summarise

    d %>%
      group_by(ID) %>%
      mutate(count = n_distinct(color))
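For completeness, the base R functions mentioned in the question can also produce both layouts. The following is only a sketch, assuming no packages beyond base R; the column name unique_colors is taken from the question, and the seq_along() indexing is just one way to keep ave() independent of whether color is stored as character or factor.

```r
# Sample data from the question
ID <- c('A', 'A', 'A', 'B', 'B', 'B')
color <- c('white', 'green', 'orange', 'white', 'green', 'green')
d <- data.frame(ID, color)

# One row per group: apply length(unique(.)) to color within each ID
counts <- aggregate(color ~ ID, data = d,
                    FUN = function(x) length(unique(x)))
names(counts)[2] <- "unique_colors"

# New column: ave() returns a vector as long as its input, so each row
# receives the count of its own group. Row indices are passed in so the
# result stays numeric regardless of the type of color.
d$unique_colors <- ave(seq_along(d$color), d$ID,
                       FUN = function(i) length(unique(d$color[i])))

print(counts)
print(d)
```

aggregate() gives the condensed data frame, while ave() gives the row-aligned column; both rely on the same length(unique(...)) idea from the question.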

107

answered Oct 05 '22 18:10

David Arenburg
