
R vector-vector matching with ordered indices

Tags:

r

Here I have two string vectors whose order is important and cannot be changed.

vec1 <- c("carrot","carrot","carrot","apple","apple","mango","mango","cherry","cherry")
vec2 <- c("cherry","apple")

I wish to find out whether the elements of vec2 appear in vec1 and, if so, where (index/position) and in what order.

I tried which(vec1 %in% vec2), which gives 4 5 8 9. These are the correct indices, but in the wrong order. I tried match(vec2,vec1), which gives 8 4. Only the first match for each element is returned, so this would only work if the values in vec1 were unique.

Ideally, I am looking for this result: 8 9 4 5. cherry is first matched at pos 8 and 9 and then apple is matched at 4 and 5.

Is there a smart way to do this without resorting to loops?

rmf asked May 07 '15 12:05


2 Answers

You can try this:

unlist(lapply(vec2, function(x) which(vec1 %in% x)))
[1] 8 9 4 5

For each element of vec2 in turn, this returns the positions in vec1 where that element occurs, so the indices come out grouped in the order of vec2.
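To see why the order comes out right, it can help to look at the intermediate list before unlist() flattens it. The following is only a sketch: the names pos and res, and the use of setNames() to label the pieces, are additions for illustration and not part of the original answer.

```r
vec1 <- c("carrot","carrot","carrot","apple","apple","mango","mango","cherry","cherry")
vec2 <- c("cherry","apple")

# One list element per entry of vec2, in vec2's order;
# each element holds the positions in vec1 where that value occurs.
pos <- lapply(setNames(vec2, vec2), function(x) which(vec1 %in% x))
# pos$cherry is 8 9 and pos$apple is 4 5

# use.names = FALSE drops the labels, leaving a plain integer vector.
res <- unlist(pos, use.names = FALSE)
res
# [1] 8 9 4 5
```

An element of vec2 that never occurs in vec1 contributes an empty integer vector, so it simply disappears from the flattened result.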

Mamoun Benghezal answered Nov 03 '22 06:11


which(!is.na(match(vec1,vec2)))[order(match(vec1,vec2)[!is.na(match(vec1,vec2))])]

Wow... there's probably an easier way to do this, but here is how it works, step by step:

> match(vec1,vec2)
[1] NA NA NA  2  2 NA NA  1  1

OK, so by reversing the arguments to match(), I can use which() to get the indices where the result is not NA:

> which(!is.na(match(vec1,vec2)))
[1] 4 5 8 9

This gets the indices you want, but not in the order you want. Calling order() on the match() vector gives the permutation that re-sorts them into the desired order. Here, I match again and keep only the non-NA values:

> order(match(vec1,vec2)[!is.na(match(vec1,vec2))])
[1] 3 4 1 2

Subset the indices by this ordering and you get:

> which(!is.na(match(vec1,vec2)))[order(match(vec1,vec2)[!is.na(match(vec1,vec2))])]
[1] 8 9 4 5

If this is slow, save the result of match() first so it is not recomputed three times.
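As a rough sketch of that suggestion, the match can be computed once and reused. The variable names m, keep and res are made up here for illustration.

```r
vec1 <- c("carrot","carrot","carrot","apple","apple","mango","mango","cherry","cherry")
vec2 <- c("cherry","apple")

# Position in vec2 of each element of vec1 (NA where there is no match).
m <- match(vec1, vec2)

# Indices of vec1 whose value occurs in vec2.
keep <- which(!is.na(m))

# Reorder those indices by their position in vec2.
# order() leaves ties in their original order, so 8 stays before 9 and 4 before 5.
res <- keep[order(m[keep])]
res
# [1] 8 9 4 5
```

This gives the same result as the one-liner above while calling match() only once.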

Mark answered Nov 03 '22 06:11