Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding all keys in a dictionary from a given list QUICKLY

Tags:

I have a (potentially quite big) dictionary and a list of 'possible' keys. I want to quickly find which of the keys have matching values in the dictionary. I've found lots of discussion of single dictionary values here and here, but no discussion of speed or multiple entries.

I've come up with four ways, and for the three that work best I compare their speed on different sample sizes below - are there better methods? If people can suggest sensible contenders I'll subject them to the analysis below as well.

Sample lists and dictionaries are created as follows:

import cProfile
from random import randint

length = 100000

listOfRandomInts = [randint(0,length*length/10-1) for x in range(length)]
dictionaryOfRandomInts = {randint(0,length*length/10-1): "It's here" for x in range(length)}

 

Method 1: the 'in' keyword:

def way1(theList,theDict):
    resultsList = []
    for listItem in theList:
        if listItem in theDict:
            resultsList.append(theDict[listItem])
    return resultsList

cProfile.run('way1(listOfRandomInts,dictionaryOfRandomInts)')

32 function calls in 0.018 seconds

 

Method 2: error handling:

def way2(theList,theDict):
    resultsList = []
    for listItem in theList:
        try:
            resultsList.append(theDict[listItem])
        except:
            ;
    return resultsList

cProfile.run('way2(listOfRandomInts,dictionaryOfRandomInts)')

32 function calls in 0.087 seconds

 

Method 3: set intersection:

def way3(theList,theDict):
    return list(set(theList).intersection(set(theDict.keys())))

cProfile.run('way3(listOfRandomInts,dictionaryOfRandomInts)')

26 function calls in 0.046 seconds

 

Method 4: Naive use of dict.keys():

This is a cautionary tale - it was my first attempt and BY FAR the slowest!

def way4(theList,theDict):
    resultsList = []
    keys = theDict.keys()
    for listItem in theList:
        if listItem in keys:
            resultsList.append(theDict[listItem])
    return resultsList

cProfile.run('way4(listOfRandomInts,dictionaryOfRandomInts)')

12 function calls in 248.552 seconds

 

EDIT: Bringing the suggestions given in the answers into the same framework that I've used for consistency. Many have noted that more performance gains can be achieved in Python 3.x, particularly list comprehension-based methods. Many thanks for all of the help!

Method 5: Better way of performing intersection (thanks jonrsharpe):

def way5(theList, theDict):
    return = list(set(theList).intersection(theDict))

25 function calls in 0.037 seconds

 

Method 6: List comprehension (thanks jonrsharpe):

def way6(theList, theDict):
    return [item for item in theList if item in theDict]

24 function calls in 0.020 seconds

 

Method 7: Using the & keyword (thanks jonrsharpe):

def way7(theList, theDict):
    return list(theDict.viewkeys() & theList)

25 function calls in 0.026 seconds

For methods 1-3 and 5-7 I timed them as above with list/dictionaries of length 1000, 10000, 100000, 1000000, 10000000 and 100000000 and show a log-log plot of time taken. Across all lengths the intersection and in-statement method perform better. The gradients are all about 1 (maybe a bit higher), indicating O(n) or perhaps slightly super-linear scaling.

Log-Log plot comparing time-scaling of the 6 sensible methods with list/dict length

like image 871
StackG Avatar asked Apr 30 '15 10:04

StackG


People also ask

How do you get all the keys of a dictionary in a list?

You can get all the keys in the dictionary as a Python List. dict. keys() returns an iterable of type dict_keys() . You can convert this into a list using list() .

What Python method is used to get all the keys from a dictionary?

keys() method in Python Dictionary, returns a view object that displays a list of all the keys in the dictionary in order of insertion.


1 Answers

Of a couple of additional methods I've tried, the fastest was a simple list comprehension:

def way6(theList, theDict):
    return [item for item in theList if item in theDict]

This runs the same process as your fastest approach, way1, but more quickly. For comparison, the quickest set-based way was

def way5(theList, theDict):
    return list(set(theList).intersection(theDict))

timeit results:

>>> import timeit
>>> setup = """from __main__ import way1, way5, way6
from random import randint
length = 100000
listOfRandomInts = [randint(0,length*length/10-1) for x in range(length)]
dictionaryOfRandomInts = {randint(0,length*length/10-1): "It's here" for x in range(length)}
"""
>>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
14.550477756582723
>>> timeit.timeit('way5(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
19.597916393388232
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.652289059326904

Having added @abarnert's suggestion:

def way7(theList, theDict):
    return list(theDict.viewkeys() & theList)

and re-run the timing I now get:

>>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.110055883138497
>>> timeit.timeit('way5(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
17.292466681101036
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
14.351759544463917
>>> timeit.timeit('way7(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
17.206370930653392

way1 and way6 have switched places, so I re-ran again:

>>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.648176054011941
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.847062579316628

So it looks like the set approach is slower than the list, but the difference between the list and list comprehension is (surprisingly, to me at least) is a bit variable. I'd say just pick one, and not worry about it unless it becomes a real bottleneck later.

like image 192
jonrsharpe Avatar answered Oct 05 '22 10:10

jonrsharpe