 

Python dictionary comprehension very slow

I have a dictionary d1 and a list l1.

The dictionary keys are strings, and the values are objects I have defined myself. If it helps, I can describe the objects in more detail, but for now the relevant part is that each object has a list attribute, names, and some of the elements of names may or may not appear in l1.

What I want to do is throw away every entry of d1 whose object's names attribute does not contain any of the elements that appear in l1.
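A minimal sketch of the filtering rule described above, assuming a hypothetical class Entry that stands in for the real (unshown) object and only carries the names list. An entry is kept if at least one of its names appears in l1; the helper name keep_matching is also just for illustration.

```python
class Entry:
    """Hypothetical stand-in for the user-defined object; only `names` matters here."""

    def __init__(self, names):
        self.names = list(names)


def keep_matching(d1, l1):
    """Return a new dict with only the entries whose `names` share an element with l1."""
    wanted = set(l1)  # build the lookup set once
    # isdisjoint() is True when there is no common element, so negate it to keep overlaps
    return {k: obj for k, obj in d1.items() if not wanted.isdisjoint(obj.names)}


l1 = ['cat', 'dog', 'mouse']
d1 = {'1': Entry(['dog', 'orange']), '5': Entry(['chair', 'table'])}
print(sorted(keep_matching(d1, l1)))  # ['1']
```

This only drops entries; it does not trim the names lists themselves, which is what the list-based example below additionally does.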

As a trivial example:

l1 = ['cat', 'dog', 'mouse', 'horse', 'elephant', 
      'zebra', 'lion', 'snake', 'fly']

d1 = {'1':['dog', 'mouse', 'horse','orange', 'lemon'],
      '2':['apple', 'pear','cat', 'mouse', 'horse'], 
      '3':['kiwi', 'lime','cat', 'dog', 'mouse'], 
      '4':['carrot','potato','cat', 'dog', 'horse'], 
      '5':['chair', 'table', 'knife']}

So the resulting dictionary will keep the key-value pairs '1' to '4', but each list will contain only the animals (the fruit and vegetables are excluded), and there will be no '5' key-value pair, because none of the furniture values appear in l1.

To do this I used a nested list/dictionary comprehension which looked like this:

d2 = {k: [a for a in l1 if a in d1[k]] for k in d1.keys()}
print(d2)

>>>>{'1': ['dog', 'mouse', 'horse'], 
     '3': ['cat', 'dog', 'mouse'], 
     '2': ['cat', 'mouse', 'horse'], 
     '5': [], 
     '4': ['cat', 'dog', 'horse']}

d2 = {k: v for k, v in d2.items() if len(v) > 0}
print(d2)

>>>>{'1': ['dog', 'mouse', 'horse'], 
     '3': ['cat', 'dog', 'mouse'], 
     '2': ['cat', 'mouse', 'horse'],  
     '4': ['cat', 'dog', 'horse']}

This seems to work, but for large dictionaries (7000+ items) it takes around 20 seconds. On its own that is not horrible, but I need to do this inside a loop that will iterate 10,000 times, so currently it's not feasible. Any suggestions on how to do this quickly?

Davy Kavanagh asked Aug 10 '12 14:08



1 Answer

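You are effectively computing the set intersection of each list occurring in the dictionary values with the list l1. Using lists for set intersections is rather inefficient because of the linear searches involved: each a in d1[k] test scans the list element by element, and your comprehension performs one such test for every element of l1 for every key. You should turn l1 into a set and use set.intersection() or set membership tests instead (depending on whether it is acceptable that the result is a set again), since a set membership test takes constant time on average.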
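For the set.intersection() variant, a minimal sketch on the example data from the question (the name wanted is illustrative). Each value becomes an unordered set, so the original list order is lost.

```python
l1 = ['cat', 'dog', 'mouse', 'horse', 'elephant',
      'zebra', 'lion', 'snake', 'fly']
d1 = {'1': ['dog', 'mouse', 'horse', 'orange', 'lemon'],
      '2': ['apple', 'pear', 'cat', 'mouse', 'horse'],
      '3': ['kiwi', 'lime', 'cat', 'dog', 'mouse'],
      '4': ['carrot', 'potato', 'cat', 'dog', 'horse'],
      '5': ['chair', 'table', 'knife']}

wanted = set(l1)  # convert once; hashing makes each lookup O(1) on average

# intersection() accepts any iterable, so the dict values can stay lists
d2 = {k: wanted.intersection(v) for k, v in d1.items()}
d2 = {k: v for k, v in d2.items() if v}  # an empty set is falsy, so '5' is dropped

print(sorted(d2))       # ['1', '2', '3', '4']
print(sorted(d2['1']))  # ['dog', 'horse', 'mouse']
```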

The full code could look like this:

l1 = set(l1)  # hash-based lookups instead of linear list scans
d2 = {k: [s for s in v if s in l1] for k, v in d1.items()}
d2 = {k: v for k, v in d2.items() if v}  # drop keys whose list ended up empty
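To check that the set-based version gives the same result as the question's code and to compare the two on larger input, here is a rough benchmark sketch. The synthetic data sizes are arbitrary assumptions and question_version / answer_version are hypothetical helper names; actual timings depend on the machine, so none are quoted here. The list version does roughly len(d1) * len(l1) list scans, while the set version does one average-O(1) lookup per element of each value.

```python
import random
import timeit

random.seed(0)  # reproducible synthetic data; sizes are arbitrary assumptions
vocab = ['w%d' % i for i in range(2000)]
l1 = random.sample(vocab, 500)
d1 = {str(i): random.sample(vocab, 10) for i in range(300)}


def question_version(d1, l1):
    # Original approach: scan l1 and do a linear `in` search on each value list
    d2 = {k: [a for a in l1 if a in d1[k]] for k in d1}
    return {k: v for k, v in d2.items() if v}


def answer_version(d1, l1):
    # Set-based approach: one O(1) average lookup per element of each value list
    s1 = set(l1)
    d2 = {k: [s for s in v if s in s1] for k, v in d1.items()}
    return {k: v for k, v in d2.items() if v}


# Element order can differ (l1 order vs. value order), so compare sorted lists
a, b = question_version(d1, l1), answer_version(d1, l1)
assert a.keys() == b.keys()
assert all(sorted(a[k]) == sorted(b[k]) for k in a)

print('lists:', timeit.timeit(lambda: question_version(d1, l1), number=5))
print('set:  ', timeit.timeit(lambda: answer_version(d1, l1), number=5))
```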

Instead of the two dictionary comprehensions, it might also be preferable to use a single for loop here, which avoids building an intermediate dictionary:

l1 = set(l1)  # hash-based lookups instead of linear list scans
d2 = {}
for k, v in d1.items():
    v = [s for s in v if s in l1]
    if v:  # only keep keys that still have at least one match
        d2[k] = v
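Since the question runs this inside a 10,000-iteration loop, one possible arrangement, assuming l1 does not change between iterations, is to build the set once outside the loop and wrap the for loop in a function that takes the prebuilt set. filter_values is a hypothetical helper name, and the range(3) loop merely stands in for the real outer loop.

```python
def filter_values(d1, wanted):
    """Keep only elements of each value that are in `wanted` (a set); drop empty results.

    `filter_values` is a hypothetical helper; `wanted` must already be a set.
    """
    d2 = {}
    for k, v in d1.items():
        kept = [s for s in v if s in wanted]  # O(1) average per membership test
        if kept:  # skip keys with no matches, e.g. the furniture entry
            d2[k] = kept
    return d2


l1 = ['cat', 'dog', 'mouse', 'horse', 'elephant',
      'zebra', 'lion', 'snake', 'fly']
d1 = {'1': ['dog', 'mouse', 'horse', 'orange', 'lemon'],
      '2': ['apple', 'pear', 'cat', 'mouse', 'horse'],
      '3': ['kiwi', 'lime', 'cat', 'dog', 'mouse'],
      '4': ['carrot', 'potato', 'cat', 'dog', 'horse'],
      '5': ['chair', 'table', 'knife']}

wanted = set(l1)  # built once, outside the repeated loop (assumes l1 is unchanged)
for _ in range(3):  # stands in for the 10,000 outer iterations
    d2 = filter_values(d1, wanted)

print(d2['4'])    # ['cat', 'dog', 'horse']
print('5' in d2)  # False
```

If l1 does change on every iteration, the set has to be rebuilt each time, but that is still a single pass over l1 rather than one pass per key.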
Jolly Jumper answered Oct 21 '22 21:10
