
Count number of distinct values in a vector

I have a vector of scalar values, and I want to know how many different values it contains.

For instance, in group <- c(1,2,3,1,2,3,4,6) the unique values are 1,2,3,4,6, so I want to get 5.

I came up with:

length(unique(group)) 

But I'm not sure it's the most efficient way to do it. Isn't there a better way to do this?

Note: My case is more complex than the example, consisting of around 1000 numbers with at most 25 different values.

AdrieanKhisbe asked Aug 05 '13 10:08




2 Answers

Here are a few ideas, all pointing towards your solution already being very fast. length(unique(x)) is what I would have used as well:

x <- sample.int(25, 1000, TRUE)

library(microbenchmark)
microbenchmark(length(unique(x)),
               nlevels(factor(x)),
               length(table(x)),
               sum(!duplicated(x)))
# Unit: microseconds
#                 expr     min       lq   median       uq      max neval
#    length(unique(x))  24.810  25.9005  27.1350  28.8605   48.854   100
#   nlevels(factor(x)) 367.646 371.6185 380.2025 411.8625 1347.343   100
#     length(table(x)) 505.035 511.3080 530.9490 575.0880 1685.454   100
#  sum(!duplicated(x))  24.030  25.7955  27.4275  30.0295   70.446   100
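Speed aside, it may be worth checking that the four expressions agree on your data. The sketch below reuses the group vector from the question; the names counts and with_na are only illustrative. The candidates give the same count on that example but, as far as I know, diverge once the vector contains NA, because factor() and table() drop NA by default while unique() and duplicated() treat it as a value:

```r
# The question's example vector, which has 5 distinct values
group <- c(1, 2, 3, 1, 2, 3, 4, 6)

# All four candidates from the benchmark, applied to the same input
counts <- c(
  unique     = length(unique(group)),
  factor     = nlevels(factor(group)),
  table      = length(table(group)),
  duplicated = sum(!duplicated(group))
)
counts  # every entry is 5

# With a missing value the approaches no longer agree
with_na <- c(group, NA)
length(unique(with_na))    # 6: NA counts as its own value
sum(!duplicated(with_na))  # 6: same behaviour as unique()
nlevels(factor(with_na))   # 5: factor() excludes NA by default
length(table(with_na))     # 5: table() excludes NA by default
```

If NA should not count as a value, something like length(unique(x[!is.na(x)])) keeps the fast approach while ignoring missing entries.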
flodel answered Sep 27 '22 20:09



I have used this expression (array here stands for your vector):

length(unique(array))

It works fine and doesn't require external libraries.
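If you need this in several places, one option is to wrap the expression in a small helper. This is only a sketch: count_distinct is a made-up name, not a base R function, and it is shown here on the vector from the question.

```r
# Hypothetical helper; it only wraps the base R expression used above
count_distinct <- function(v) length(unique(v))

# The vector from the question
group <- c(1, 2, 3, 1, 2, 3, 4, 6)
count_distinct(group)             # 5

# unique() also handles other atomic types, e.g. character vectors
count_distinct(c("a", "b", "a"))  # 2
```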

lindix answered Sep 27 '22 20:09
