Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filtering a dictionary by multiple values

I'm new to Python and I've had a difficult time formulating a "pythony" way of filtering a dictionary.

I have a dictionary that looks like this:

'217': {'586': 2.0, '578': 5.0, '172': 1.0, '1222': 1.0, '597': 4.0, '1303': 2.0, '195': 5.0, ...}
'660': {'176': 4.0, '174': 3.0, '231': 5.0, '233': 4.0, '797': 4.0, '541': 3.0, '27': 1.0, '210': 4.0, ...}

and so on.

Then I have a list of strings:

['2', '4', '17', '21', '22', '24', '27', '28', '29', '33', '39', ...]

What I want to achieve is a filtered dictionary where only the tuples with any values in the list of strings exist. I've managed to do exactly that but with using only one string as a requirement and it looks like this:

filtered_dict = {k: v for (k, v) in my_dict.iteritems() if my_list[0] in v}

However if I remove the [0] I get the following error:

TypeError: unhashable type: 'list' Here are some expected output values:

'115 : {'174': 4.0, '172': 4.0, '27': 3.0, '183': 5.0, '180': 3.0, '1039': 5.0, ... }

'212' : {'4': 3.0, '473': 4.0, '208': 5.0, '123': 4.0, '476': 3.0, '474': 4.0, ...}

As you can see, the third value in the first tuple has '27' in it which was in my_list. The first value in the second tuple has the value '4' which also was in my_list.

I would very much appreciate any sort of help on this.

Thank you!

like image 680
Super_Soet Avatar asked Mar 12 '23 16:03

Super_Soet


2 Answers

You can use a set, keeping pairs if value is not disjoint from the set:

st = {'2', '4', '17', '21', '22', '24', '27', '28', '29', '33', '39'}

filtered_dict = {k: v for (k, v) in my_dict.iteritems() if not st.isdisjoint(v)}

They will only be disjoint if they do not share at least one element:

In [6]: st ={1,2,3}

In [7]: v = [1,4,5]

In [8]: st.isdisjoint(v)
Out[8]: False

In [11]: v = [7,4,5]

In [12]: st.isdisjoint(v)
Out[12]: True

It is somewhat equivalent to:

st = {'2', '4', '17', '21', '22', '24', '27', '28', '29', '33', '39'}

filtered_dict = {k: v for (k, v) in my_dict.iteritems() if any(val in st for val in v)}

Where if any element from v is in the set we keep the elements.

Worst case we check len(v) elements against the set, iterating over the mylist elements and checking if x is in v is O(len(my_list)) for every case and also creates the same k,v pairing every time you get a match.

If we add a dict with more match keys and add the pairs to to a list, you can see what happens:

In [22]: my_list  = ['2', '4', '17', '21', '22', '24', '27', '28', '29', '33', '39']

In [23]: my_dict = {'217': {'586': 2.0, '578': 5.0, '172': 1.0, '1222': 1.0, '597': 4.0, '1303': 2.0, '195': 5.0},
   ....:            '660': {'176': 4.0, '174': 3.0, '231': 5.0, '233': 4.0, '797': 4.0, '541': 3.0, '27': 1.0, '210': 4.0},
   ....:            '661': {'2': 4.0, '4': 3.0, '29': 5.0, '17': 4.0, '39': 4.0, '541': 3.0, '27': 1.0, '210': 4.0}}


In [24]: print [(k,v) for k,v in my_dict.iteritems() for x in my_list if x in v]
[('661', {'541': 3.0, '39': 4.0, '2': 4.0, '4': 3.0, '17': 4.0, '210': 4.0, '27': 1.0, '29': 5.0}), ('661', {'541': 3.0, '39': 4.0, '2': 4.0, '4': 3.0, '17': 4.0, '210': 4.0, '27': 1.0, '29': 5.0}), ('661', {'541': 3.0, '39': 4.0, '2': 4.0, '4': 3.0, '17': 4.0, '210': 4.0, '27': 1.0, '29': 5.0}), ('661', {'541': 3.0, '39': 4.0, '2': 4.0, '4': 3.0, '17': 4.0, '210': 4.0, '27': 1.0, '29': 5.0}), ('661', {'541': 3.0, '39': 4.0, '2': 4.0, '4': 3.0, '17': 4.0, '210': 4.0, '27': 1.0, '29': 5.0}), ('661', {'541': 3.0, '39': 4.0, '2': 4.0, '4': 3.0, '17': 4.0, '210': 4.0, '27': 1.0, '29': 5.0}), ('660', {'797': 4.0, '27': 1.0, '541': 3.0, '210': 4.0, '176': 4.0, '174': 3.0, '231': 5.0, '233': 4.0})]

Because there are 7 matches, we get the same pairing seven times, you won't get dupes in a dict but you will have to iterate len(my_list) times and create the pairing length of x in v being True times. As the length of my_list grows the run time will increase accordingly.

Just making my_list 1000 elements there is a considerable difference:

In [35]: len(my_list)
Out[35]: 1000

In [36]: %%timeit
   ....: st = set(my_list)
   ....: {k: v for (k, v) in my_dict.iteritems() if not st.isdisjoint(v)}
   ....: 
10000 loops, best of 3: 21.9 µs per loop
In [37]: timeit {k:v for k,v in my_dict.iteritems() for x in my_list if x in v}
10000 loops, best of 3: 136 µs per loop

Almost all that time is spent in the set creation, if you use a set from the start it runs 100 times faster:

In [40]: timeit {k: v for (k, v) in my_dict.iteritems() if not st.isdisjoint(v)}

1000000 loops, best of 3: 1.09 µs per loop
like image 131
Padraic Cunningham Avatar answered Mar 24 '23 05:03

Padraic Cunningham


If i understand your question correctly, you need to traverse list mylist and compare each element with values of dictionary mydict

{k:v for k,v in mydict.iteritems() for x in mylist if x in v}

1) k:v for k,v in mydict.iteritems() is iterating your dictionary pair by pair.

2) for x in mylist if x in v is iterating mylist and checks whether mylist element in inner dictionary v, therefore checks x in v is equal to checking x in v.keys() which x in the keys of v

like image 33
Haifeng Zhang Avatar answered Mar 24 '23 06:03

Haifeng Zhang