
How to find the number of identical elements in two vectors?

Tags: r

I have two vectors:

 a <- letters[1:5]
 b <- c('a','k','w','p','b','b')

Now I want to count how many times each letter in vector a shows up in b. I want to get:

 # 1  2  0  0  0

What should I do?

user3749549 asked Dec 19 '22 13:12


2 Answers

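tabulate() works on integer vectors and is fast. Use match() to map each element of b to its position in a (the universe of possible letters), then tabulate those indices. Passing length(a) as the number of bins ensures there is one count for each possible value. Elements of b that do not occur in a become NA, which tabulate() ignores.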

> tabulate(match(b, a), length(a))
 [1] 1 2 0 0 0
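
To see why this works, it can help to look at the intermediate result of match(). The sketch below reuses a and b from the question; idx and counts are illustrative names only, and the setNames() step is an optional addition, not part of the original answer.

```r
a <- letters[1:5]
b <- c('a', 'k', 'w', 'p', 'b', 'b')

# Position of each element of b within a; elements not in a give NA
idx <- match(b, a)
idx
# [1]  1 NA NA NA  2  2

# tabulate() skips the NAs and counts positions 1..length(a)
counts <- tabulate(idx, nbins = length(a))

# Optional: tabulate() returns an unnamed vector, so attach the letters as names
setNames(counts, a)
```

The names are only a convenience for reading the result; the counts themselves are the same as in the one-liner above.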

This is faster than the 'obvious' table() solution

library(microbenchmark)
# f0: factor-based table(); f1: match() followed by tabulate()
f0 = function() table(factor(b, levels = a))
f1 = function() tabulate(match(b, a), length(a))

and then

> microbenchmark(f0(), f1())
Unit: microseconds
 expr     min       lq  median       uq     max neval
 f0() 566.824 576.2985 582.950 594.4200 798.275   100
 f1()  56.816  60.0180  63.305  65.4185 120.441   100

The tabulate() approach is not only faster but also more general; for example, it can match numeric values without coercing them to a string representation.
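
As a rough illustration of the numeric case, here is a small sketch. The vectors vals and obs are made-up example data, not taken from the question.

```r
# Hypothetical numeric data (not from the original question)
vals <- c(1.5, 2.5, 10)          # universe of possible values
obs  <- c(2.5, 10, 10, 7, 1.5)   # observations; 7 is not in vals

# match() works on the numeric values themselves; 7 has no match and gives NA
num_counts <- tabulate(match(obs, vals), nbins = length(vals))
num_counts
# [1] 1 1 2
```

With table(factor(obs, levels = vals)) the values would instead pass through factor levels, which are stored as character strings.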

Martin Morgan answered Mar 07 '23 10:03


Make b into a factor with the levels specified by a. Values that are not in a will turn into <NA>. When you tabulate with table(), they are discarded unless you specify useNA="ifany".

table(factor(b,levels=a))

a b c d e 
1 2 0 0 0 
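
For comparison, a minimal sketch of what useNA="ifany" changes, reusing a and b from the question. The names f and with_na are illustrative only.

```r
a <- letters[1:5]
b <- c('a', 'k', 'w', 'p', 'b', 'b')

# 'k', 'w' and 'p' are not among the levels, so they become NA in the factor
f <- factor(b, levels = a)

# Default: NA entries are left out, giving one count per level
table(f)

# useNA = "ifany" appends an <NA> entry when the factor contains NAs
with_na <- table(f, useNA = "ifany")
with_na
```

Here the extra <NA> entry should hold 3, the number of elements of b that do not appear in a.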

Ben Bolker answered Mar 07 '23 08:03