Julia version of R's Match?

Tags:

r

julia

From R's help pages of match():

Description:

‘match’ returns a vector of the positions of (first) matches of its first argument in its second.

That is, given two vectors, match(v1, v2) returns a vector whose i-th element is the index at which v1[i] first appears in v2.

Is there a similar function in Julia? I cannot find one.

Lindon asked Sep 01 '15 14:09


2 Answers

It sounds like you're looking for indexin (just as search fodder, this is also called ismember by Matlab). It is very slightly different: it returns a vector where the i'th element is the last index where v1[i] appears in v2.

julia> v1 = [8,6,7,11]; v2 = -10:10;
       idxs = indexin(v1, v2)
4-element Array{Int64,1}:
 19
 17
 18
  0

It returns zero for the index of an element in v1 that does not appear in v2. So you can "reconstruct" the parts of v1 that are in v2 simply by indexing by the nonzero indices:

julia> v2[idxs[idxs .> 0]]
3-element Array{Int64,1}:
 8
 6
 7

If you look at the implementation, you'll see that it uses a dictionary to store and look up the indices. This means that it only makes one pass over v1 and v2 each, as opposed to searching through v2 for every element in v1. It should be much more efficient in almost all cases.
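
To make the one-pass idea concrete, here is a minimal sketch of such a dictionary-based lookup. It is not the Base source: the name `lastindexin` is hypothetical, and it assumes 0 marks a value with no match, as in the output above.

```julia
# Hypothetical helper, not the Base implementation: illustrates the
# dictionary-based lookup described above.
function lastindexin(a::AbstractVector, b::AbstractVector)
    bdict = Dict{eltype(b), Int}()
    for (i, x) in enumerate(b)
        bdict[x] = i              # one pass over b; later indices overwrite earlier ones
    end
    return [get(bdict, x, 0) for x in a]   # one pass over a; 0 means "not in b"
end

lastindexin([8, 6, 7, 11], -10:10)
```

Because later assignments overwrite earlier ones, each value ends up mapped to its last position in b. With the v1 and v2 from above, this gives 19, 17, 18, 0.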

If it's important to match R's behavior and return the first index, we can crib off the base implementation and just build the dictionary backwards so the lower indices overwrite the higher ones:

function firstindexin(a::AbstractArray, b::AbstractArray)
    bdict = Dict{eltype(b), Int}()
    for i=length(b):-1:1
        bdict[b[i]] = i
    end
    [get(bdict, i, 0) for i in a]
end

julia> firstindexin([1,2,3,4], [1,1,2,2,3,3])
4-element Array{Int64,1}:
 1
 3
 5
 0

julia> indexin([1,2,3,4], [1,1,2,2,3,3])
4-element Array{Int64,1}:
 2
 4
 6
 0
mbauman answered Sep 30 '22 16:09


I don't think this exists out of the box, but as @Khashaa's comment (and Tim Holy's answer to the other question) points out, you should be able to come up with your own definition fairly quickly. A first attempt:

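function matched(v1::Array, v2::Array)
  # Int result so indices are not stored as floats; 0 means "no match"
  result = zeros(Int, length(v1))
  for i = 1:length(v1)
    # first index at which v1[i] occurs in v2
    # (on Julia 1.0+ this is findfirst(isequal(v1[i]), v2), which returns nothing on no match)
    result[i] = findfirst(v2, v1[i])
  end
  return result
end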

(Note that I called the function matched because match is already defined in Base for string matching; if you wanted to extend it, you'd have to import Base.match first.) If you care about performance, you could certainly make this faster by applying some of the tricks from the performance section of the Julia docs.
If I understand correctly, this function should do what you're looking for. Try it with, e.g.:

v1 = [rand(1:10) for i = 1:100]
v2 = [rand(1:10) for i = 1:100]
matched(v1, v2)
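
For a reproducible check, here is a small sketch with fixed inputs instead of random ones. It uses explicit loops, so it does not depend on the argument order of findfirst. The name `matched_loop` is hypothetical, and 0 is assumed to mean "not found".

```julia
# Hypothetical loop-based variant: the first index of each v1[i] in v2, or 0.
function matched_loop(v1::AbstractVector, v2::AbstractVector)
    result = zeros(Int, length(v1))
    for i in 1:length(v1)
        for j in 1:length(v2)
            if v2[j] == v1[i]
                result[i] = j   # record the first hit and stop scanning
                break
            end
        end
    end
    return result
end

matched_loop([3, 1, 5], [1, 3, 3, 1])   # first matches at 2 and 1; 5 is absent
```

This scans v2 once per element of v1, so it costs on the order of length(v1) * length(v2) comparisons in the worst case.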
Nils Gudat answered Sep 30 '22 18:09