Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to access all dictionaries within a dictionary where a specific key has a particular value

I have a dictionary of dictionaries, with each nested dictionary having the exact same keys, like this:

all_dicts = {'a':{'name': 'A', 'city': 'foo'},
             'b':{'name': 'B', 'city': 'bar'},
             'c':{'name': 'C', 'city': 'bar'},
             'd':{'name': 'B', 'city': 'foo'},
             'e':{'name': 'D', 'city': 'bar'},
            }

How to I get a list (or dictionary) of all the dictionaries where 'city' has value 'bar'?

The following code works, but isn't scalable:

req_key = 'bar'
selected = []
for one in all_dicts.keys():
    if req_key in all_dicts[one]:
    selected.append(all_dicts[one])

Say 'city' can have 50,000 unique values and the dictionary all_dicts contains 600,000 values, iterating over the dictionary for every single value of 'city' is not very efficient.

Is there a scalable and efficient way of doing this?

like image 652
IM94 Avatar asked Jan 10 '17 10:01

IM94


1 Answers

What you could do is create an index on that dictionary, like this:

cityIndex={}
for item in all_dicts.values():
    if item['city'] in cityIndex:
        cityIndex[item['city']].append(item)
    else:
        cityIndex[item['city']]=[item]

This will require some initial processing time as well as some additional memory, but afterwards it will be very fast. If you want all items with some cityName, you'll get them by doing:

mylist=cityIndex[cityName] if cityName in cityIndex else []

This gives you many benefits if all_dicts is built once and queried afterwards many times.

If all_dicts is being modified during the execution of your program, you will need some more code to maintain the cityIndex. If an item is added to all_dicts, just do:

if item['city'] in cityIndex:
    cityIndex[item['city']].append(item)
else:
    cityIndex[item['city']]=[item]

while if an item is removed, this is a straightforward way to remove it from the index as well (assuming the combination of 'name' and 'city' is unique among your items):

for i, val in enumerate(cityIndex[item['city']]):
    if val['name']==item['name']:
        break
del cityIndex[item['city']][i]

If there are many more queries than updates, you will still get a huge performance improvement.

like image 101
xzoert Avatar answered Oct 01 '22 16:10

xzoert