
In R, find elements of a vector in a list using vectorization

I have a vector v1

v1 = c(1, 200, 4000)

I would like to find the index of each element of v1 within the element of a list L1 that contains it, in a vectorized way (i.e. without a loop), where

> L1
[[1]]
[1] 1 2 3 4

[[2]]
[1] 100 200 300 400

[[3]]
[1] 1000 2000 3000 4000

The output should be c(1, 2, 4).

Is there a way to do this without using a loop or apply (which is computationally the same as using a loop)? I have to do this for very long vectors.

Omry Atia asked Dec 06 '22 12:12


2 Answers

We can go over the list with sapply and look up the matching position inside each element

sapply(L1, function(x) which(x %in% v1))
#[1] 1 2 4

Or with Vectorize

Vectorize(function(x) which(x %in% v1))(L1)
#[1] 1 2 4

If each element of L1 should be checked only against the corresponding element of v1

mapply(function(x, y) which(x %in% y), L1, v1)
#[1] 1 2 4

As @nicola mentioned, match could also be used to get the first index. If there are duplicate elements, which is more useful because it returns every matching position

mapply(match, v1, L1)
#[1] 1 2 4

Or using purrr::map2_int (which is called directly here, so the code does not depend on %>% being attached)

purrr::map2_int(L1, v1, ~ which(.x %in% .y))
#[1] 1 2 4
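
The difference between match and which noted above can be seen on a small vector with a repeated value. This is only an illustrative sketch; the vector x is made up for the example and is not part of the question's data.

```r
# Hypothetical vector with the value 5 appearing twice
x <- c(5, 7, 5)

first_only <- match(5, x)      # match() stops at the first occurrence
all_hits <- which(x %in% 5)    # which() reports every occurrence

first_only
# [1] 1
all_hits
# [1] 1 3
```

With duplicates, the sapply/mapply variants built on which may therefore return more than one index per list element, in which case sapply returns a list rather than a plain vector.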
akrun answered Feb 16 '23 12:02


We can flatten the list and do a single lookup on the flattened vector, which seems to be the fastest by far in the benchmark below.

v1 <- c(1, 200, 4000)
L1 <- list(1:4, 1:4*100, 1:4*1000)

sequence(lengths(L1))[match(v1, unlist(L1))]
# [1] 1 2 4
sequence(lengths(L1))[which(unlist(L1) %in% v1)]
# [1] 1 2 4
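
To make it easier to see why this works, the one-liner can be split into its intermediate steps. This is a sketch using the same v1 and L1; the names flat, within, hit, pos and list_id are introduced here for illustration only. It assumes every value of v1 occurs exactly once across the whole list, since match returns only the first hit in the flattened vector.

```r
v1 <- c(1, 200, 4000)
L1 <- list(1:4, 1:4 * 100, 1:4 * 1000)

# Step 1: flatten the list into one vector, keeping the list order
flat <- unlist(L1)

# Step 2: for every flattened value, its position inside its own list element
within <- sequence(lengths(L1))
# [1] 1 2 3 4 1 2 3 4 1 2 3 4

# Step 3: one vectorized lookup of v1 in the flattened vector
hit <- match(v1, flat)
# [1]  1  6 12

# Step 4: translate flat positions back to within-element positions
pos <- within[hit]
# [1] 1 2 4

# Optional: which list element each value came from
list_id <- rep(seq_along(L1), lengths(L1))[hit]
# [1] 1 2 3
```

Note that the which(unlist(L1) %in% v1) variant returns positions in list order rather than in the order of v1; the two agree here only because v1 is already in list order.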

library(microbenchmark)
library(tidyverse)

microbenchmark(
  akrun_sapply = {sapply(L1, function(x) which(x %in% v1))},
  akrun_Vectorize = {Vectorize(function(x) which(x %in% v1))(L1)},
  akrun_mapply = {mapply(function(x, y) which(x %in% y), L1, v1)},
  akrun_mapply_match = {mapply(match, v1, L1)},
  akrun_map2 = {purrr::map2_int(L1, v1, ~ .x %in% .y %>% which)},
  CPak = {setNames(rep(1:length(L1), times=lengths(L1)), unlist(L1))[as.character(v1)]},
  zacdav = {sequence(lengths(L1))[match(v1, unlist(L1))]},
  zacdav_which = {sequence(lengths(L1))[which(unlist(L1) %in% v1)]},
  times = 10000
)

Unit: microseconds
               expr     min       lq      mean   median       uq        max neval
       akrun_sapply  18.187  22.7555  27.17026  24.6140  27.8845   2428.194 10000
    akrun_Vectorize  60.119  76.1510  88.82623  83.4445  89.9680   2717.420 10000
       akrun_mapply  19.006  24.2100  29.78381  26.2120  29.9255   2911.252 10000
 akrun_mapply_match  14.136  18.4380  35.45528  20.0275  23.6560 127960.324 10000
         akrun_map2 217.209 264.7350 303.64609 277.5545 298.0455   9204.243 10000
               CPak  15.741  19.7525  27.31918  24.7150  29.0340    235.245 10000
             zacdav   6.649   9.3210  11.30229  10.4240  11.5540   2399.686 10000
       zacdav_which   7.364  10.2395  12.22632  11.2985  12.4515   2492.789 10000
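
The timings above were measured on a three-element list, so they may not carry over directly to the very long vectors mentioned in the question. A rough way to check on larger input is sketched below; the size n and the generated data L_big and v_big are assumptions made for this example, and no timings are shown because they depend on the machine.

```r
set.seed(1)
n <- 1000  # hypothetical number of list elements; increase to taste

# Build a list whose values are unique across all elements
L_big <- lapply(seq_len(n), function(i) (i - 1) * 10 + 1:10)

# Pick one value at random from each list element
v_big <- vapply(L_big, function(x) x[sample.int(length(x), 1)], numeric(1))

# Element-by-element lookup versus the flattened lookup
by_loop <- mapply(match, v_big, L_big)
by_flat <- sequence(lengths(L_big))[match(v_big, unlist(L_big))]

# Both approaches should give the same positions
stopifnot(all(by_loop == by_flat))

# Coarse timing with base R only; use microbenchmark for finer resolution
system.time(for (k in 1:100) mapply(match, v_big, L_big))
system.time(for (k in 1:100) sequence(lengths(L_big))[match(v_big, unlist(L_big))])
```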
zacdav answered Feb 16 '23 11:02