Count number of distinct values in a vector

Q: How do you count unique values in a vector?

The length of the vector returned by unique() gives you the number of unique values. The uniqueN() function from the data.table package is equivalent to length(unique(group)).
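A minimal sketch of that equivalence, using the example vector from the question. The requireNamespace() guard is an addition for illustration, so that the snippet still runs when data.table is not installed.

```r
group <- c(1, 2, 3, 1, 2, 3, 4, 6)

# Base R: number of distinct values
base_count <- length(unique(group))

# data.table::uniqueN() should give the same count;
# it is only called when the package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt_count <- data.table::uniqueN(group)
  stopifnot(dt_count == base_count)
}

print(base_count)  # prints [1] 5
```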

Q: How do you find the number of unique values in a vector in R?

Use the unique() function. It returns the distinct values of a vector, or of a column in a data frame, with the duplicates removed, which makes it useful in exploratory data analysis. Wrapping the result in length() gives the number of distinct values.
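As a rough illustration for the data frame case, here is a sketch with a made-up data frame; df, id and group are hypothetical names chosen for the example.

```r
# Hypothetical data frame, for illustration only
df <- data.frame(id = 1:8, group = c(1, 2, 3, 1, 2, 3, 4, 6))

# Distinct values in one column, and how many there are
distinct_groups <- unique(df$group)
n_groups <- length(distinct_groups)

# Number of distinct values in every column at once
per_column <- vapply(df, function(col) length(unique(col)), integer(1))

print(n_groups)    # prints [1] 5
print(per_column)
```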

Q: How do you count unique values in CPP?

Using the sort function: determine the size of the array, pass the array and its size to a sort routine (for example std::sort), and then use a temporary variable to count the distinct elements by walking the sorted array and incrementing the counter whenever an element differs from the one before it.
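The same sort-then-compare idea can be sketched in R, the language of the main question. This is only an illustration of the technique; count_distinct_sorted is a hypothetical helper name, and length(unique(x)) remains the simpler choice in R.

```r
# Count distinct values by sorting and comparing neighbours.
# Note: sort() drops NA values by default, so NA is not counted here.
count_distinct_sorted <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n == 0L) {
    return(0L)
  }
  # One for the first element, plus one for every change between neighbours
  1L + sum(x[-1L] != x[-n])
}

group <- c(1, 2, 3, 1, 2, 3, 4, 6)
print(count_distinct_sorted(group))  # prints [1] 5
```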

Tags: performance, r, count

I have a vector of scalar values and I want to know how many different values it contains.

For instance, in group <- c(1,2,3,1,2,3,4,6) the unique values are 1,2,3,4,6, so I want to get 5.

I came up with:

length(unique(group))

But I'm not sure it's the most efficient way to do it. Isn't there a better way to do this?

Note: My case is more complex than the example, consisting of around 1000 numbers with at most 25 different values.

asked Aug 05 '13 10:08

AdrieanKhisbe

2 Answers

Here are a few ideas, all pointing towards your solution already being very fast. length(unique(x)) is what I would have used as well:

x <- sample.int(25, 1000, TRUE)

library(microbenchmark)
microbenchmark(length(unique(x)),
               nlevels(factor(x)),
               length(table(x)),
               sum(!duplicated(x)))
# Unit: microseconds
#                 expr     min       lq   median       uq      max neval
#    length(unique(x))  24.810  25.9005  27.1350  28.8605   48.854   100
#   nlevels(factor(x)) 367.646 371.6185 380.2025 411.8625 1347.343   100
#     length(table(x)) 505.035 511.3080 530.9490 575.0880 1685.454   100
#  sum(!duplicated(x))  24.030  25.7955  27.4275  30.0295   70.446   100
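As a side note that is not part of the benchmark itself: the four expressions agree on the question's example, but they treat missing values differently with their default arguments. The sketch below uses the question's vector plus one appended NA; count_all is a hypothetical helper introduced only to avoid repeating the four calls.

```r
# Hypothetical helper: apply the four benchmarked expressions to a vector
count_all <- function(x) {
  c(unique     = length(unique(x)),
    factor     = nlevels(factor(x)),
    table      = length(table(x)),
    duplicated = sum(!duplicated(x)))
}

x <- c(1, 2, 3, 1, 2, 3, 4, 6)
counts <- count_all(x)            # all four give 5

# With an NA present, unique() and duplicated() count NA as a value,
# while factor() and table() drop it by default.
counts_na <- count_all(c(x, NA))  # unique = 6, factor = 5, table = 5, duplicated = 6

print(counts)
print(counts_na)
```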

answered Sep 27 '22 20:09

flodel

I have used this function

length(unique(array))

and it works fine, and doesn't require external libraries.

answered Sep 27 '22 20:09

lindix

