Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently compute the inner product of two dictionaries

Suppose I represent a feature vector using a dictionary (why? because I know the features are sparse, but, more on that later).

How should I implement the inner product of two such dictionaries (denoted, A, B)

I tried the naive approach:

for k in A:
  if k in B:
    sum += A[k] * B[k]

but it turns out to be slow.

Some more details:

  • I'm using a dictionary to represent features because

    1. The feature keys are strings
    2. There are ~20K possible keys
    3. Each vector is sparse (say, about 1000 non-zero elements).
  • I'm really interested in computing the pairwise inner product of N=2000 different dictionaries (that is, their linear kernel).

like image 628
Tomer Levinboim Avatar asked Aug 16 '13 08:08

Tomer Levinboim


2 Answers

Not sure about faster, but here's another approach:

keys = A.viewkeys() & B.viewkeys()
the_sum = sum(a[k] * b[k] for k in keys)
like image 181
Eric Avatar answered Oct 05 '22 22:10

Eric


Hmm, it seems your approach is actually the best for dense vectors:

>>> # Eric's answer
>>> timeit.timeit('sum([A[k]*B[k] for k in set(A.keys()) & set(B.keys())])', setup='A=dict((i,i) for i in xrange(100));B=dict((i,i) for i in xrange(100))', number=10000)
0.4360210521285808

>>> # My comment
>>> timeit.timeit('for k,v in A.iteritems(): sum += v*B.get(k,0)', setup='A=dict((i,i) for i in xrange(100));B=dict((i,i) for i in xrange(100));sum=0', number=10000)
0.4082838999682963

# My comment, more compact
>>> timeit.timeit('sum(v*B.get(k,0) for k,v in A.iteritems())', setup='A=dict((i,i) for i in xrange(100));B=dict((i,i) for i in xrange(100))', number=10000)
0.38053266868496394

>>> #Your approach
>>> timeit.timeit('for k in A: sum += A[k]*B[k] if k in B else 0.', setup='A=dict((i,i) for i in xrange(100));B=dict((i,i) for i in xrange(100));sum=0', number=10000)
0.35574231962510794

>>> # Your approach, more compact
>>> timeit.timeit('sum(A[k]*B[k] for k in A if k in B)', setup='A=dict((i,i) for i in xrange(100));B=dict((i,i) for i in xrange(100))', number=10000)
0.3400850549682559

For sparser ones, Eric's answer performs better but yours is still the fastest:

# Mine
>>> timeit.timeit('sum(v*B.get(k,0) for k,v in A.iteritems())', setup='import random;A=dict((i,i) for i in xrange(100) if random.random() < 0.3);B=dict((i,i) for i in xrange(100) if random.random() < 0.2)', number=10000)
0.1390782696843189

# Eric's
>>> timeit.timeit('sum([A[k]*B[k] for k in set(A.keys()) & set(B.keys())])', setup='import random;A=dict((i,i) for i in xrange(100) if random.random() < 0.3);B=dict((i,i) for i in xrange(100) if random.random() < 0.2)', number=10000)
0.11702822992151596

# Yours
>>> timeit.timeit('sum(A[k]*B[k] for k in A if k in B)', setup='import random;A=dict((i,i) for i in xrange(100) if random.random() < 0.3);B=dict((i,i) for i in xrange(100) if random.random() < 0.2)', number=10000)
0.07878250570843193

EDIT

After messing around for a bit, it seems sum([x for x ...]) is significantly faster than sum(x for x in ...). Rebenchmarking with this and Janne's remark for the keys in Eric's answer, yours is still on top (with Joowani's giving a slight improvement):

>>> timeit.timeit('sum([v*B.get(k,0) for k,v in A.items()])', setup='import random;A=dict((i,i) for i in xrange(100) if random.random() < 0.3);B=dict((i,i) for i in xrange(100) if random.random() < 0.2)', number=100000)
1.1604375791416714
>>> timeit.timeit('sum([A[k]*B[k] for k in A.viewkeys() & B.viewkeys()])', setup='import random;A=dict((i,i) for i in xrange(100) if random.random() < 0.3);B=dict((i,i) for i in xrange(100) if random.random() < 0.2)', number=100000)
0.9234189571552633
>>> timeit.timeit('sum([A[k]*B[k] for k in A if k in B])', setup='import random;A=dict((i,i) for i in xrange(100) if random.random() < 0.3);B=dict((i,i) for i in xrange(100) if random.random() < 0.2)', number=100000)
0.5411289579401455
>>> timeit.timeit('sum([A[k]*B[k] for k in A if k in B]) if len(A)<len(B) else sum([A[k]*B[k] for k in B if k in A])', setup='import random;A=dict((i,i) for i in xrange(100) if random.random() < 0.3);B=dict((i,i) for i in xrange(100) if random.random() < 0.2)', number=100000)
0.5198972138696263

Scaling to very large sizes, you see exactly the same pattern:

>>> #Mine
>>> timeit.timeit('sum([v*B.get(k,0) for k,v in A.iteritems()])', setup='import random;A=dict((i,i) for i in xrange(10000) if random.random() < 0.1);B=dict((i,i) for i in xrange(10000) if random.random() < 0.2)', number=100000)
45.328807250833506

>>> #Eric's
>>> timeit.timeit('sum([A[k]*B[k] for k in A.viewkeys() & B.viewkeys()])', setup='import random;A=dict((i,i) for i in xrange(10000) if random.random() < 0.1);B=dict((i,i) for i in xrange(10000) if random.random() < 0.2)', number=100000)
28.042937058640973

>>> #Yours
>>> timeit.timeit('sum([A[k]*B[k] for k in A if k in B])', setup='import random;A=dict((i,i) for i in xrange(10000) if random.random() < 0.1);B=dict((i,i) for i in xrange(10000) if random.random() < 0.2)', number=100000)
16.55080344861699

>>> #Joowani's
>>> timeit.timeit('sum([A[k]*B[k] for k in A if k in B]) if len(A)<len(B) else sum([A[k]*B[k] for k in B if k in A])', setup='import random;A=dict((i,i) for i in xrange(10000) if random.random() < 0.1);B=dict((i,i) for i in xrange(10000) if random.random() < 0.2)', number=100000)
15.485236119691308

I think Joowani's trick does not improve it significantly here because vectors are roughly the same size, but depending on your problem (if some vectors are ridiculously smaller than others) this might be more significant...

EDIT AGAIN

Oops, seems like I should have taken another coffee before posting... As Eric pointed out (although I completely missed it...), defining the array in setup keeps it the same for all trials, which is not really the best way to benchmark. With PROPER random vectors being tested, results are not significantly different, but for the sake of completeness:

>>> timeit.timeit('mine(dict((i,i) for i in xrange(100) if random.random() < 0.3),dict((i,i) for i in xrange(100) if random.random() < 0.2))', setup='import random;joowanis=lambda A,B:sum([A[k]*B[k] for k in A if k in B]) if len(A)<len(B) else sum([A[k]*B[k] for k in B if k in A]);mine=lambda A,B:sum([v*B.get(k,0) for k,v in A.iteritems()]);erics=lambda A,B:sum([A[k]*B[k] for k in A.viewkeys() & B.viewkeys()]);yours=lambda A,B:sum([A[k]*B[k] for k in A if k in B])', number=100000)
6.294158102577967
>>> timeit.timeit('erics(dict((i,i) for i in xrange(100) if random.random() < 0.3),dict((i,i) for i in xrange(100) if random.random() < 0.2))', setup='import random;joowanis=lambda A,B:sum([A[k]*B[k] for k in A if k in B]) if len(A)<len(B) else sum([A[k]*B[k] for k in B if k in A]);mine=lambda A,B:sum([v*B.get(k,0) for k,v in A.iteritems()]);erics=lambda A,B:sum([A[k]*B[k] for k in A.viewkeys() & B.viewkeys()]);yours=lambda A,B:sum([A[k]*B[k] for k in A if k in B])', number=100000)
6.068052507449011
>>> timeit.timeit('yours(dict((i,i) for i in xrange(100) if random.random() < 0.3),dict((i,i) for i in xrange(100) if random.random() < 0.2))', setup='import random;joowanis=lambda A,B:sum([A[k]*B[k] for k in A if k in B]) if len(A)<len(B) else sum([A[k]*B[k] for k in B if k in A]);mine=lambda A,B:sum([v*B.get(k,0) for k,v in A.iteritems()]);erics=lambda A,B:sum([A[k]*B[k] for k in A.viewkeys() & B.viewkeys()]);yours=lambda A,B:sum([A[k]*B[k] for k in A if k in B])', number=100000)
5.745110704570834
>>> timeit.timeit('joowanis(dict((i,i) for i in xrange(100) if random.random() < 0.3),dict((i,i) for i in xrange(100) if random.random() < 0.2))', setup='import random;joowanis=lambda A,B:sum([A[k]*B[k] for k in A if k in B]) if len(A)<len(B) else sum([A[k]*B[k] for k in B if k in A]);mine=lambda A,B:sum([v*B.get(k,0) for k,v in A.iteritems()]);erics=lambda A,B:sum([A[k]*B[k] for k in A.viewkeys() & B.viewkeys()]);yours=lambda A,B:sum([A[k]*B[k] for k in A if k in B])', number=100000)
5.737499445367575

To scale:

>>> timeit.timeit('mine(dict((i,i) for i in xrange(10000) if random.random() < 0.1),dict((i,i) for i in xrange(10000) if random.random() < 0.2))', setup='import random;joowanis=lambda A,B:sum([A[k]*B[k] for k in A if k in B]) if len(A)<len(B) else sum([A[k]*B[k] for k in B if k in A]);mine=lambda A,B:sum([v*B.get(k,0) for k,v in A.iteritems()]);erics=lambda A,B:sum([A[k]*B[k] for k in A.viewkeys() & B.viewkeys()]);yours=lambda A,B:sum([A[k]*B[k] for k in A if k in B])', number=1000)
5.0510995368395015
>>> timeit.timeit('erics(dict((i,i) for i in xrange(10000) if random.random() < 0.1),dict((i,i) for i in xrange(10000) if random.random() < 0.2))', setup='import random;joowanis=lambda A,B:sum([A[k]*B[k] for k in A if k in B]) if len(A)<len(B) else sum([A[k]*B[k] for k in B if k in A]);mine=lambda A,B:sum([v*B.get(k,0) for k,v in A.iteritems()]);erics=lambda A,B:sum([A[k]*B[k] for k in A.viewkeys() & B.viewkeys()]);yours=lambda A,B:sum([A[k]*B[k] for k in A if k in B])', number=1000)
4.350612399185138
>>> timeit.timeit('yours(dict((i,i) for i in xrange(10000) if random.random() < 0.1),dict((i,i) for i in xrange(10000) if random.random() < 0.2))', setup='import random;joowanis=lambda A,B:sum([A[k]*B[k] for k in A if k in B]) if len(A)<len(B) else sum([A[k]*B[k] for k in B if k in A]);mine=lambda A,B:sum([v*B.get(k,0) for k,v in A.iteritems()]);erics=lambda A,B:sum([A[k]*B[k] for k in A.viewkeys() & B.viewkeys()]);yours=lambda A,B:sum([A[k]*B[k] for k in A if k in B])', number=1000)
4.15619379016789
>>> timeit.timeit('joowanis(dict((i,i) for i in xrange(10000) if random.random() < 0.1),dict((i,i) for i in xrange(10000) if random.random() < 0.2))', setup='import random;joowanis=lambda A,B:sum([A[k]*B[k] for k in A if k in B]) if len(A)<len(B) else sum([A[k]*B[k] for k in B if k in A]);mine=lambda A,B:sum([v*B.get(k,0) for k,v in A.iteritems()]);erics=lambda A,B:sum([A[k]*B[k] for k in A.viewkeys() & B.viewkeys()]);yours=lambda A,B:sum([A[k]*B[k] for k in A if k in B])', number=1000)
4.185129374341159

I think the bottom line is that you cannot expect significant speedup by cleverly reordering your expressions for this kind of thing... Maybe you could try doing the numerical part in C/Cython or using Scipy's Sparse package ?

like image 38
val Avatar answered Oct 05 '22 23:10

val