I need to store objects that have a number (>2) of integer member variables and do quick look-ups using any member variables as a search key.
For easier illustration let's say the objects are tuples of 3 integers and I need to do quick look-ups using any element of a tuple as a key in a list of such tuples:
collection = [(1, 200, 9),
(2, 300, 8),
(3, 400, 7)]
Look-ups would be like:
collection.lookup_by_first_element(1) # Return (1, 200, 9)
collection.lookup_by_second_element(300) # Return (2, 300, 8)
collection.lookup_by_third_element(250) # Return None
I need the lookups to be fast/efficient. My best bet so far is to use an in memory sqlite database with three columns for the three tuple elements and put indexes on all three columns.
A search tree would also do, but those have one key for lookup and I don't see how could I do lookups based on more than one key. How would you do it?
This is a simple solution. You could easily put this in a class and provide a neater interface.
>>> from collections import defaultdict
>>> collection = [(1, 200, 9),
... (2, 300, 8),
... (3, 400, 7)]
>>> keyed_dict = defaultdict(list)
>>> for tup in collection:
... for i, e in enumerate(tup):
... keyed_dict[(i, e)].append(tup)
...
>>> keyed_dict[(1, 300)]
[(2, 300, 8)]
Update:
For what it's worth, the above is way faster than the numpy solution for lookup:
from timeit import timeit
setup_code = '''
import numpy
clen = {0} # use .format() to fill this value
collection = [(n, (n + 1) * 100, clen - n) for n in xrange(clen)]
npcollection = numpy.array(collection)
def numpy_lookup(collection, column, value):
if numpy.any(collection[:, column] == value): return collection[collection[:, column] == value]
return 'None'
keyed_dict = dict()
for tup in collection:
for i, e in enumerate(tup):
keyed_dict[i, e] = tup
'''
for e in range(5):
setup = setup_code.format(str(10 ** e))
kd_stmt = '[keyed_dict[0, n] for n in range({0})]'.format(str(10 ** e))
np_stmt = '[numpy_lookup(npcollection, 0, n) for n in range({0})]'.format(str(10 ** e))
print 'using keyed_dict: ',
print timeit(stmt=kd_stmt, setup=setup, number=10)
print 'using numpy_lookup: ',
print timeit(stmt=np_stmt.format(str(10 ** e)), setup=setup, number=10)
output:
using keyed_dict: 1.09672546387e-05
using numpy_lookup: 0.000250101089478
using keyed_dict: 3.00407409668e-05
using numpy_lookup: 0.00193691253662
using keyed_dict: 0.000190019607544
using numpy_lookup: 0.0199580192566
using keyed_dict: 0.00195384025574
using numpy_lookup: 0.317503929138
using keyed_dict: 0.0319399833679
using numpy_lookup: 15.0127439499
senderle is right (I upvoted), but I want to elaborate (and more than just in a comment).
Dictionary lookups are O(1) and really fast (basically your key gets turned into a hash that addresses a specific location in memory to immediately retrieve your data from).
In contrast looking for a value by searching through an array is slow at least O(N), so for larger arrays it will take longer to find the right value. (E.g., you have to sift through all N keys, find the right one, and then can return the tuple.)
So if your keys are unique as you say, it makes sense to just create a large dictionary that is indexed based on every key you may use to lookup. Granted is you will have to represent each tuple of m items (m=3 in your case), m times in the dictionary while with a single array it only had to be represented once in the array.
So you want to define a Collection class
class Collection(object):
def __init__(self, collection):
self.collection_dict = dict()
for tup in collection:
for i, v in enumerate(tup):
self.collection_dict[(i, v)] = tup
def lookup_by_first_element(self, e):
return self.collection_dict.get((0, e), None)
def lookup_by_second_element(self, e):
return self.collection_dict.get((1, e), None)
def lookup_by_third_element(self, e):
return self.collection_dict.get((2, e), None)
collection = [(1, 200, 9),
(2, 300, 8),
(3, 400, 7)]
c = Collection(collection)
The internal c.collection_dict
is:
{(0, 1): (1, 200, 9),
(0, 2): (2, 300, 8),
(0, 3): (3, 400, 7),
(1, 200): (1, 200, 9),
(1, 300): (2, 300, 8),
(1, 400): (3, 400, 7),
(2, 7): (3, 400, 7),
(2, 8): (2, 300, 8),
(2, 9): (1, 200, 9)}
And your lookups work as you requested
>>> c.lookup_by_first_element(1) == (1, 200, 9)
True
>>> c.lookup_by_second_element(300) == (2, 300, 8)
True
>>> c.lookup_by_third_element(250) is None
True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With