Let's say I have two R vectors of type character:
vector1 = c("dog", "cat", "fish")
vector2 = c("fish", "fish", "fish")
The first vector, vector1, contains three distinct elements: dog, cat, fish. However, vector2 contains all the same elements.
I'm looking for an efficient way to check this property in an R vector, preferably using base R.
My idea would be to check length(unique(vector1)) == 1. If TRUE, the vector contains only one distinct element. If FALSE, it contains more than one.
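As a quick illustration of that idea applied to the two vectors above, here is a minimal sketch. The helper name `all_same` is hypothetical (it is not a base R function) and is only used to wrap the check:

```r
vector1 <- c("dog", "cat", "fish")
vector2 <- c("fish", "fish", "fish")

# Hypothetical helper: TRUE when the vector holds exactly one distinct value
all_same <- function(vec) length(unique(vec)) == 1

all_same(vector1)  # FALSE
all_same(vector2)  # TRUE
```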
If you're concerned with speed, it looks like either @AshOfFire's all(vec == vec[1]) or uniqueN(vec) == 1 (from the data.table package) is the best. all(...) performs better when the elements are different, but worse when they're the same. There are other posts on this topic here as well: Test for equality among all elements of a single vector
Since you explicitly ask about speed, I don't know if this should be flagged as a dupe or not. Doing some quick microbenchmarking you can see the relative speeds:
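library(data.table)  # provides uniqueN()

# Time the four candidate checks on the same vector.
# unit = "relative" reports each timing relative to the fastest expression.
benchmark_vec <- function(vec) {
  microbenchmark::microbenchmark(
    length(unique(vec)) == 1,
    all(vec == vec[1]),
    isTRUE(max(vec) == min(vec)),
    uniqueN(vec) == 1,
    unit = "relative"
  )
}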
It looks like your best bet might be to use uniqueN(vec) == 1, since isTRUE(max(vec) == min(vec)) is quite slow when the elements are not the same.
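One thing the timings do not show is that the base R checks behave differently on vectors containing NA or on empty vectors. The sketch below illustrates this, and the helper `all_same_na_safe` is a hypothetical name introduced only for this example. Treating an empty vector as "all the same" is an assumption; pick whichever convention suits your use case.

```r
x_na    <- c("a", NA, "a")
x_empty <- character(0)

length(unique(x_na)) == 1      # FALSE: NA counts as its own distinct value
all(x_na == x_na[1])           # NA: comparing with NA gives NA, not FALSE

length(unique(x_empty)) == 1   # FALSE: there are zero distinct values
all(x_empty == x_empty[1])     # TRUE: all() of an empty logical vector is TRUE

# Hypothetical helper that makes the NA and empty cases explicit
all_same_na_safe <- function(vec) {
  if (length(vec) == 0) return(TRUE)       # assumption: empty is trivially uniform
  if (anyNA(vec)) return(all(is.na(vec)))  # uniform only if every element is NA
  all(vec == vec[1])
}

all_same_na_safe(x_na)     # FALSE
all_same_na_safe(x_empty)  # TRUE
```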
Vector of same elements:
benchmark_vec(rep("a", 1e4))
#Unit: relative
# expr min lq mean median uq max neval
# length(unique(vec)) == 1 6.059888 8.980080 8.807812 9.057240 10.131907 6.538035 100
# all(vec == vec[1]) 2.039980 2.117614 2.517966 2.769089 2.820726 2.147200 100
# isTRUE(max(vec) == min(vec)) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100
# uniqueN(vec) == 1 1.679993 1.794075 2.148665 2.442206 2.547134 1.385782 100
benchmark_vec(rep("a", 1e5))
#Unit: relative
# expr min lq mean median uq max neval
# length(unique(vec)) == 1 5.732161 6.898531 7.935981 7.098417 6.776363 52.733981 100
# all(vec == vec[1]) 2.084232 2.416316 2.366826 2.482896 2.454888 2.025258 100
# isTRUE(max(vec) == min(vec)) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100
# uniqueN(vec) == 1 1.254857 1.653767 1.632287 1.707769 1.755275 1.401932 100
benchmark_vec(rep("a", 1e6))
#Unit: relative
# expr min lq mean median uq max neval
# length(unique(vec)) == 1 5.993300 6.057139 6.478008 6.083278 6.104486 9.124542 100
# all(vec == vec[1]) 2.249211 2.182959 2.261631 2.182345 2.224141 6.421657 100
# isTRUE(max(vec) == min(vec)) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100
# uniqueN(vec) == 1 1.325453 1.451214 1.869176 1.457588 1.470810 6.657961 100
Vector of different elements:
benchmark_vec(sample(LETTERS, 1e4, replace = TRUE))
#Unit: relative
# expr min lq mean median uq max neval
# length(unique(vec)) == 1 2.989151 2.999928 3.233178 3.031249 3.260122 4.471498 100
# all(vec == vec[1]) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100
# isTRUE(max(vec) == min(vec)) 92.619106 91.765377 89.899963 92.227913 93.951104 64.507502 100
# uniqueN(vec) == 1 1.478271 1.494703 1.608562 1.531235 1.637707 2.528502 100
benchmark_vec(sample(LETTERS, 1e5, replace = TRUE))
#Unit: relative
# expr min lq mean median uq max neval
# length(unique(vec)) == 1 3.142858 3.010220 2.945582 2.887597 2.925129 4.482119 100
# all(vec == vec[1]) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100
# isTRUE(max(vec) == min(vec)) 97.273487 79.818135 73.850769 75.442748 71.441439 41.772795 100
# uniqueN(vec) == 1 1.280180 1.431999 2.009661 1.431815 1.447287 32.446761 100
benchmark_vec(sample(LETTERS, 1e6, replace = TRUE))
#Unit: relative
# expr min lq mean median uq max neval
# length(unique(vec)) == 1 3.228670 2.898367 2.799075 2.941651 2.914313 1.360938 100
# all(vec == vec[1]) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100
# isTRUE(max(vec) == min(vec)) 92.506220 79.923456 67.347683 78.362151 75.028194 13.611664 100
# uniqueN(vec) == 1 2.263843 2.031869 1.861165 2.058220 2.051759 1.129074 100