I have two vectors:
a <- letters[1:5]
b <- c('a','k','w','p','b','b')
Now I want to count how many times each letter in vector a shows up in b. I want to get:
# 1 2 0 0 0
What should I do?
tabulate works on integer vectors and is fast. Match your letters to the universe of possible letters, then tabulate the resulting indices; pass length(a) as the number of bins to ensure that there is one count for each possible value.
> tabulate(match(b, a), length(a))
[1] 1 2 0 0 0
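To see what each step does, you can inspect the intermediate result of match(). This is a minimal sketch reusing the question's a and b; naming the counts with setNames() is an extra convenience added here, not part of the original answer.

```r
a <- letters[1:5]
b <- c('a', 'k', 'w', 'p', 'b', 'b')

# Position of each element of b within a; NA where the letter is not in a
idx <- match(b, a)
print(idx)  # 1 NA NA NA 2 2

# Count how often each position 1..length(a) occurs; NA entries are ignored
counts <- tabulate(idx, nbins = length(a))
print(counts)  # 1 2 0 0 0

# Without nbins, tabulate() uses max(idx) bins, so trailing zeros are lost
print(tabulate(idx))  # 1 2

# Optionally label each count with the letter it belongs to
print(setNames(counts, a))
```

The nbins argument is what guarantees a result of length(a) even when the last letters of a never appear in b.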
This is faster than the 'obvious' table() solution:
library(microbenchmark)
f0 = function() table(factor(b,levels=a))
f1 = function() tabulate(match(b, a), length(a))
and then
> microbenchmark(f0(), f1())
Unit: microseconds
 expr     min       lq  median       uq     max neval
 f0() 566.824 576.2985 582.950 594.4200 798.275   100
 f1()  56.816  60.0180  63.305  65.4185 120.441   100
The tabulate() approach is not only faster but also more general, e.g., it can match numeric values without coercing them to a string representation.
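As a sketch of the numeric case, the same pattern can be applied directly to numeric vectors. The values a_num and b_num below are hypothetical, chosen only for illustration.

```r
# Hypothetical numeric "universe" and observations, for illustration only
a_num <- c(10, 20, 30)
b_num <- c(10, 30, 30, 40)

# match() compares the numbers directly; 40 is not in a_num and becomes NA
counts_num <- tabulate(match(b_num, a_num), length(a_num))
print(counts_num)  # 1 0 2
```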
Make b into a factor with the levels specified by a. Values that are not in a will turn into <NA>. When you tabulate, they will be discarded (unless you specify useNA="ifany").
table(factor(b,levels=a))
a b c d e
1 2 0 0 0
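A minimal sketch of the factor approach, showing how to get a plain count vector out of the table and what useNA = "ifany" adds. The variable names f, counts and counts_na are illustrative.

```r
a <- letters[1:5]
b <- c('a', 'k', 'w', 'p', 'b', 'b')

# Letters not among the levels ('k', 'w', 'p') become NA in the factor
f <- factor(b, levels = a)

# By default table() drops the NA values
counts <- table(f)
print(as.vector(counts))  # 1 2 0 0 0

# useNA = "ifany" adds an <NA> cell counting the unmatched values
counts_na <- table(f, useNA = "ifany")
print(counts_na)
```

as.vector() strips the table's names and dimensions, which is handy when you want the same bare vector that tabulate() returns.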