Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get indexes of a vector of numbers in another vector

Tags:

r

vector

Let's suppose we have the following vector:

v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)

Given a sequence of numbers, for instance c(2,3,5,8), I am trying to find what the position of this sequence of numbers is in the vector v. The result I expect is something like:

FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE

I am trying to use which(v == c(2,3,5,8)), but it doesn't give me what I am looking for.

like image 998
eirvine Avatar asked Feb 07 '18 09:02

eirvine


4 Answers

Using base R you could do the following:

v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)

idx <- which(v == x[1])
idx[sapply(idx, function(i) all(v[i:(i+(length(x)-1))] == x))]
# [1]  2 12

This tells you that the exact sequence appears twice, starting at positions 2 and 12 of your vector v.

It first checks the possible starting positions, i.e. where v equals the first value of x and then loops through these positions to check if the values after these positions also equal the other values of x.

like image 158
talat Avatar answered Oct 11 '22 17:10

talat


Two other approaches using the shift-function trom data.table:

library(data.table)

# option 1
which(rowSums(mapply('==',
                     shift(v, type = 'lead', n = 0:(length(x) - 1)),
                     x)
              ) == length(x))

# option 2
which(Reduce("+", Map('==',
                      shift(v, type = 'lead', n = 0:(length(x) - 1)),
                      x)
             ) == length(x))

both give:

[1]  2 12

To get a full vector of the matching positions:

l <- length(x)
w <- which(Reduce("+", Map('==',
                           shift(v, type = 'lead', n = 0:(l - 1)),
                           x)
                  ) == l)
rep(w, each = l) + 0:(l-1)

which gives:

[1]  2  3  4  5 12 13 14 15

The benchmark which was included earlier in this answer has been moved to a separate community wiki answer.


Used data:

v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)
like image 22
Jaap Avatar answered Oct 11 '22 19:10

Jaap


You can use rollapply() from zoo

v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)

library("zoo")
searchX <- function(x, X) all(x==X)
rollapply(v, FUN=searchX, X=x, width=length(x))

The result TRUEshows you the beginning of the sequence.
The code could be simplified to rollapply(v, length(x), identical, x) (thanks to G. Grothendieck):

set.seed(2)
vl <- as.numeric(sample(1:10, 1e6, TRUE))
# vm <- vl[1:1e5]
# vs <- vl[1:1e4]
x <- c(2,3,5)

library("zoo")
searchX <- function(x, X) all(x==X)
i1 <- rollapply(vl, FUN=searchX, X=x, width=length(x))
i2 <- rollapply(vl, width=length(x), identical, y=x)

identical(i1, i2)

For using identical() both arguments must be of the same type (num and int are not the same).
If needed == coerces int to num; identical() does not any coercion.

like image 15
jogo Avatar answered Oct 11 '22 17:10

jogo


I feel like looping should be efficient:

w = seq_along(v)
for (i in seq_along(x)) w = w[v[w+i-1L] == x[i]]

w 
# [1]  2 12

This should be writable in C++ following @SymbolixAU approach for extra speed.

A basic comparison:

# create functions for selected approaches
redjaap <- function(v,x)
  which(Reduce("+", Map('==', shift(v, type = 'lead', n = 0:(length(x) - 1)), x)) == length(x))
loop <- function(v,x){
  w = seq_along(v)
  for (i in seq_along(x)) w = w[v[w+i-1L] == x[i]]
  w
}

# check consistency
identical(redjaap(v,x), loop(v,x))
# [1] TRUE

# check speed
library(microbenchmark)
vv <- rep(v, 1e4)
microbenchmark(redjaap(vv,x), loop(vv,x), times = 100)
# Unit: milliseconds
#            expr      min       lq      mean   median       uq       max neval cld
#  redjaap(vv, x) 5.883809 8.058230 17.225899 9.080246 9.907514  96.35226   100   b
#     loop(vv, x) 3.629213 5.080816  9.475016 5.578508 6.495105 112.61242   100  a 

# check consistency again
identical(redjaap(vv,x), loop(vv,x))
# [1] TRUE
like image 11
Frank Avatar answered Oct 11 '22 18:10

Frank