Let's say I have 4 vectors:
a <- c("Mark","Kate","Greg", "Mathew")
b <- c("Mark","Tobias","Mary", "Mathew", "Greg")
c <- c("Mary","Chuck","Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate","Mark","Igor", "Greg", "Robin", "Mathew")
I would like to select the names that overlap across these vectors, with the condition that a name has to appear in at least 3 of the 4 vectors. I would also like to make it easy to adjust the percentage of vectors in which a name has to be present.
Can I modify intersect somehow?
I think this would work. We use the table function to do most of the heavy lifting.
find_perc <- function(..., perc = .75){
  list_len <- length(list(...))             # how many vectors
  tab_it <- table(c(...))                   # tabulate all the names
  tab_it_perc <- tab_it / list_len          # calculate the frequencies
  names(tab_it_perc[tab_it_perc >= perc])   # return those with freq >= perc
}
> find_perc(a, b, c, d)
[1] "Greg" "Mark" "Mathew"
> find_perc(a, b, c, d, perc = .5)
[1] "Greg" "Igor" "Kate" "Mark" "Mary" "Mathew" "Robin" "Tobias"
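One caveat: because find_perc pools every element before tabulating, a name repeated inside a single vector is counted more than once. If that can happen in your data, a possible variant (a minimal sketch; find_perc_unique and a2 are hypothetical names, not part of the answer above) de-duplicates each vector first. The last line shows that plain intersect, folded over the list with Reduce, corresponds to the strict case where a name must be in every vector.

```r
a <- c("Mark", "Kate", "Greg", "Mathew")
b <- c("Mark", "Tobias", "Mary", "Mathew", "Greg")
c <- c("Mary", "Chuck", "Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate", "Mark", "Igor", "Greg", "Robin", "Mathew")

# Hypothetical variant of find_perc(): each name counts at most once per vector
find_perc_unique <- function(..., perc = 0.75) {
  vecs <- list(...)
  pooled <- unlist(lapply(vecs, unique))   # drop repeats inside each vector, then pool
  share <- table(pooled) / length(vecs)    # share of vectors containing each name
  names(share[share >= perc])              # keep names meeting the threshold
}

# "Kate" repeated inside one vector is still present in only 2 of the 4 vectors
a2 <- c(a, "Kate", "Kate")
find_perc_unique(a2, b, c, d)

# Names present in all four vectors (equivalent to perc = 1)
Reduce(intersect, list(a, b, c, d))
```

With the data from the question, the first call should again return "Greg", "Mark" and "Mathew", while the original find_perc(a2, b, c, d) would also return "Kate", since it counts her four times.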