Suppose I have a vector and I don't know, a priori, its unique elements (here: 1 and 2).
vec <- c(1, 1, 1, 2, 2, 2, 2)
Is there a better (or more elegant) way of counting how often each unique element occurs in vec, i.e. getting the same result as table(vec)? It doesn't matter whether the result is a data.frame or a named vector.
R> table(vec)
vec
1 2
3 4
Reason: I was curious to know whether there is a better way. I also noticed that there is a for loop in the base implementation (in addition to a .C call). I don't know if that's a big concern, but when I do something like
R> table(rep(1:1000,100000))
R takes a really long time. I am sure it's because of the large repetition count (100000), but is there a way to make it faster?
EDIT: This also does a good job, in addition to Chase's answer.
R> rle(sort(sampData))
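The result of rle() is a list of run lengths and values rather than a named vector, so a small extra step is needed to get something shaped like the output of table(). A minimal sketch using the small vec from the top of the question (the names runs, counts and counts_df are only illustrative):

```r
vec <- c(1, 1, 1, 2, 2, 2, 2)

# After sorting, equal values are adjacent, so each run is one distinct value
runs <- rle(sort(vec))

# Named vector: names are the distinct values, entries are their counts
counts <- setNames(runs$lengths, runs$values)

# The same information as a data frame
counts_df <- data.frame(value = runs$values, count = runs$lengths)

counts
counts_df
```

Unlike tabulate(), this should also work for non-integer or negative values, since it only relies on sorting.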
This is an interesting problem, and I'm curious to see other thoughts on it. Looking at the source for table() reveals that it builds on tabulate(). tabulate() has a few quirks, namely that it only deals with positive integers and returns an integer vector without names. We can use unique() on our vector to supply the names(). If you need to tabulate zero or negative values, I guess going back and reviewing table() would be necessary, as tabulate() doesn't seem to do that per the examples on the help page.
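table2 <- function(data) {
  # tabulate() counts positive integers only, with one bin per value in 1..max(data)
  x <- tabulate(data)
  # Sorted distinct values, used as names for the counts
  y <- sort(unique(data))
  # Keep only the bins of values that occur, so counts and names line up when there are gaps
  x <- x[y]
  names(x) <- y
  return(x)
}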
And a quick test:
> set.seed(42)
> sampData <- sample(1:5, 10000000, TRUE, prob = c(.3,.25, .2, .15, .1))
>
> system.time(table(sampData))
   user  system elapsed
  4.869   0.669   5.503
> system.time(table2(sampData))
   user  system elapsed
  0.410   0.200   0.605
>
> table(sampData)
sampData
      1       2       3       4       5
2999200 2500232 1998652 1500396 1001520
> table2(sampData)
      1       2       3       4       5
2999200 2500232 1998652 1500396 1001520
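As mentioned, tabulate() skips zeros and negative values. One possible workaround, sketched here under the assumption that the data are integers spanning a modest range, is to shift the values so that the minimum lands in bin 1 and then shift the names back. table_int is a hypothetical helper name, not something from base R:

```r
# Hypothetical helper: counts for an integer vector that may contain
# zeros, negative values, or gaps between values.
table_int <- function(data) {
  data <- data[!is.na(data)]
  if (length(data) == 0L) return(setNames(integer(0), character(0)))
  lo <- min(data)
  # Shift so the smallest value maps to bin 1, which tabulate() can count
  counts <- tabulate(data - lo + 1L, nbins = max(data) - lo + 1L)
  # Drop empty bins and map the bin indices back to the original values
  present <- which(counts > 0L)
  setNames(counts[present], present + lo - 1L)
}

x <- c(-2L, 0L, 0L, 3L, 3L, 3L)
res <- table_int(x)
res
```

The trade-off is memory: tabulate() allocates one bin per integer between the minimum and maximum, so this is probably a poor fit for sparse data with a very wide range.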
EDIT: I just realized there is a count() function in plyr, which is another alternative to table(). In the test above, it performs better than table() and slightly worse than the hack-job solution I put together:
library(plyr)
system.time(count(sampData))
   user  system elapsed
  1.620   0.870   2.483
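To rerun this kind of comparison without waiting on ten million elements, something like the following could be used. It is only a sketch: it uses a smaller sample, sticks to base R, and first checks that the approaches agree before timing them. The variable names are illustrative, and the timings will differ by machine and R version.

```r
set.seed(42)
# Smaller sample than in the timings above, so this runs quickly
x <- sample(1:5, 1e5, replace = TRUE, prob = c(.3, .25, .2, .15, .1))

by_table    <- as.vector(table(x))
by_tabulate <- tabulate(x, nbins = 5L)
runs        <- rle(sort(x))

# All three approaches should report the same counts for the values 1..5
stopifnot(all(by_table == by_tabulate), all(runs$lengths == by_tabulate))

# Rough timings for each approach
system.time(table(x))
system.time(tabulate(x, nbins = 5L))
system.time(rle(sort(x)))
```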