Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MemberQ in Mathematica

I am a bit at a loss how to do the following efficiently in Mathematica:

a = { 1, 2, 3, 4, 5 };  (* list of integers *)
b = { 2, 4, 6, 8 };     (* another list of integers *)
filter = Table[MemberQ[b, element], {element,a}]

Expected output is:

{False, True, False, True, False}

My lists a and b are big, so Mathematica is doing a kazillion linear searches through b. I want it to do faster lookups with a hashtable. But there seems to be no such structure. The closest I could find is a SparseArray, but

sa = SparseArray[{1 -> True, 2 -> True}];
MemberQ[sa, 1]

is False.

I'm sure this must be possible in Mathematica in one line of code or less, I just can't see it for the trees, or something.

Any hero to the rescue? Meanwhile, I'm going to do this with C#.

like image 258
Gleno Avatar asked Sep 12 '10 19:09

Gleno


2 Answers

The following question is equivalent to yours and contains an answer in Mathematica:

https://stackoverflow.com/questions/1954636/given-list-subset-of-list-return-bitmask

Here's that answer (which I'll feel free to steal since it's actually mine):

bitmask[a_, b_] := Module[{f},
  f[_] = False;
  (f[#] = True)& /@ b;
  f /@ a]

The idea is to make a hash table, f, that can answer in constant time whether a given element is a member of list b. First f is given a default value of False (if we haven't seen it before it's not in b). Then a single pass through the list b is made, setting f[i] to True for each i in b. That's all the set up. Now a single pass of the hash function f over the list a gives us the answer. Total time is O(n+m) -- one pass through each list.

like image 116
dreeves Avatar answered Oct 13 '22 14:10

dreeves


Another idea would be to use rules and Dispatch. It seems to be faster when the length of the blist is large:

alist = Prime /@ Range[20000];
blist = alist + 2;

membQ = First@Timing[MemberQ[blist,#]&/@alist;]

sparse = First@Timing[
    s = SparseArray[blist -> ConstantArray[True, Length@blist], Max[alist, blist], False];
    s[[#]] & /@ alist;
]

rules = First@Timing[
    bRules = Dispatch[Append[Rule[#, True] & /@ blist, _ -> False]];
    (# /. bRules) & /@ alist;
]

fct = First@Timing[
    f[_] = False;
    (f[#] = True) & /@ blist;
    f /@ alist;
]

On my laptop, rules (0.06s) < fct (0.09s) < sparse (0.9s) < membQ (40s).

Edit / Comparing fct and rules timings

@Karsten please feel free to rollback to your original answer

Tables generated with Prime[...]

alt text

Tables generated with RandomInteger[...]

alt text

like image 37
Karsten W. Avatar answered Oct 13 '22 15:10

Karsten W.