 

Find the common elements of multiple vectors that appear in at least a given percentage of them

Tags:

r

Let's say I have 4 vectors:

a <- c("Mark","Kate","Greg", "Mathew")
b <- c("Mark","Tobias","Mary", "Mathew", "Greg")
c <- c("Mary","Chuck","Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate","Mark","Igor", "Greg", "Robin", "Mathew")

I would like to select the overlapping names from these vectors, on the condition that a name must appear in at least 3 of the 4 vectors. Ideally, it should also be easy to change the percentage of vectors a name has to be present in.

Can I modify intersect somehow?
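For comparison (this is not part of the original question), folding `intersect` over all the vectors with base R's `Reduce` keeps only the names present in every vector, which amounts to a fixed 100% threshold. A minimal sketch using the question's data shows why plain `intersect` cannot express "3 out of 4" on its own:

```r
# The four example vectors from the question
a <- c("Mark", "Kate", "Greg", "Mathew")
b <- c("Mark", "Tobias", "Mary", "Mathew", "Greg")
c <- c("Mary", "Chuck", "Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate", "Mark", "Igor", "Greg", "Robin", "Mathew")

# Reduce() folds intersect() pairwise over the list:
# intersect(intersect(intersect(a, b), c), d)
common_all <- Reduce(intersect, list(a, b, c, d))
print(common_all)
```

Only "Mathew" occurs in all four vectors, so names such as "Mark" or "Greg" (present in 3 of 4) are lost.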

Shaxi Liver asked Mar 10 '23 06:03


1 Answer

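I think this would work. The table function does most of the heavy lifting: it counts the occurrences of each name across all the vectors, and dividing those counts by the number of vectors turns them into fractions that can be compared against perc.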

find_perc <- function(..., perc = .75){
    vecs <- list(...)                              # collect the input vectors
    list_len <- length(vecs)                       # how many vectors
    tab_it <- table(unlist(lapply(vecs, unique)))  # count each name at most once per vector
    tab_it_perc <- tab_it / list_len               # fraction of vectors containing each name
    names(tab_it_perc[tab_it_perc >= perc])        # return those with freq >= perc
}


> find_perc(a, b, c, d)
[1] "Greg"   "Mark"   "Mathew"
> find_perc(a, b, c, d, perc = .5)
[1] "Greg"   "Igor"   "Kate"   "Mark"   "Mary"   "Mathew" "Robin"  "Tobias"
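A possible variant, sketched here as an assumption rather than as part of the answer above: the hypothetical helper `find_min_count` takes a list of vectors and a minimum number of vectors `min_n` instead of a fraction. Comparing integer counts sidesteps floating-point thresholds (for example, typing 0.67 when you mean "2 of 3"), and setting `min_n` to the number of vectors reproduces the `Reduce(intersect, ...)` result, although the names come back in `table`'s sorted order rather than input order.

```r
# Hypothetical helper: names that appear in at least `min_n` of the vectors in `vec_list`.
# The default mirrors the question's 3-of-4 rule via ceiling(0.75 * number of vectors).
find_min_count <- function(vec_list, min_n = ceiling(0.75 * length(vec_list))) {
  # unique() makes a name repeated inside one vector count only once for that vector
  counts <- table(unlist(lapply(vec_list, unique)))
  names(counts)[counts >= min_n]
}

a <- c("Mark", "Kate", "Greg", "Mathew")
b <- c("Mark", "Tobias", "Mary", "Mathew", "Greg")
c <- c("Mary", "Chuck", "Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate", "Mark", "Igor", "Greg", "Robin", "Mathew")
vecs <- list(a, b, c, d)

at_least_3 <- find_min_count(vecs, min_n = 3)             # same as perc = .75 here
in_all     <- find_min_count(vecs, min_n = length(vecs))  # strict intersection
print(at_least_3)
print(in_all)
```

Passing the vectors as a single list also makes it easy to use the helper on a list built programmatically, e.g. from `split()` or `lapply()`.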
bouncyball answered Mar 13 '23 14:03