I'm new to Python and I've had a difficult time formulating a "pythony" way of filtering a dictionary.
I have a dictionary that looks like this:
'217': {'586': 2.0, '578': 5.0, '172': 1.0, '1222': 1.0, '597': 4.0, '1303': 2.0, '195': 5.0, ...}
'660': {'176': 4.0, '174': 3.0, '231': 5.0, '233': 4.0, '797': 4.0, '541': 3.0, '27': 1.0, '210': 4.0, ...}
and so on.
Then I have a list of strings:
['2', '4', '17', '21', '22', '24', '27', '28', '29', '33', '39', ...]
What I want to achieve is a filtered dictionary where only the tuples with any values in the list of strings exist. I've managed to do exactly that but with using only one string as a requirement and it looks like this:
filtered_dict = {k: v for (k, v) in my_dict.iteritems() if my_list[0] in v}
However if I remove the [0] I get the following error:
TypeError: unhashable type: 'list' Here are some expected output values:
'115 : {'174': 4.0, '172': 4.0, '27': 3.0, '183': 5.0, '180': 3.0, '1039': 5.0, ... }
'212' : {'4': 3.0, '473': 4.0, '208': 5.0, '123': 4.0, '476': 3.0, '474': 4.0, ...}
As you can see, the third value in the first tuple has '27' in it which was in my_list. The first value in the second tuple has the value '4' which also was in my_list.
I would very much appreciate any sort of help on this.
Thank you!
You can use a set, keeping pairs if value is not disjoint from the set:
st = {'2', '4', '17', '21', '22', '24', '27', '28', '29', '33', '39'}
filtered_dict = {k: v for (k, v) in my_dict.iteritems() if not st.isdisjoint(v)}
They will only be disjoint if they do not share at least one element:
In [6]: st ={1,2,3}
In [7]: v = [1,4,5]
In [8]: st.isdisjoint(v)
Out[8]: False
In [11]: v = [7,4,5]
In [12]: st.isdisjoint(v)
Out[12]: True
It is somewhat equivalent to:
st = {'2', '4', '17', '21', '22', '24', '27', '28', '29', '33', '39'}
filtered_dict = {k: v for (k, v) in my_dict.iteritems() if any(val in st for val in v)}
Where if any element from v is in the set we keep the elements.
Worst case we check len(v)
elements against the set, iterating over the mylist
elements and checking if x is in v is O(len(my_list))
for every case and also creates the same k,v pairing every time you get a match.
If we add a dict with more match keys and add the pairs to to a list, you can see what happens:
In [22]: my_list = ['2', '4', '17', '21', '22', '24', '27', '28', '29', '33', '39']
In [23]: my_dict = {'217': {'586': 2.0, '578': 5.0, '172': 1.0, '1222': 1.0, '597': 4.0, '1303': 2.0, '195': 5.0},
....: '660': {'176': 4.0, '174': 3.0, '231': 5.0, '233': 4.0, '797': 4.0, '541': 3.0, '27': 1.0, '210': 4.0},
....: '661': {'2': 4.0, '4': 3.0, '29': 5.0, '17': 4.0, '39': 4.0, '541': 3.0, '27': 1.0, '210': 4.0}}
In [24]: print [(k,v) for k,v in my_dict.iteritems() for x in my_list if x in v]
[('661', {'541': 3.0, '39': 4.0, '2': 4.0, '4': 3.0, '17': 4.0, '210': 4.0, '27': 1.0, '29': 5.0}), ('661', {'541': 3.0, '39': 4.0, '2': 4.0, '4': 3.0, '17': 4.0, '210': 4.0, '27': 1.0, '29': 5.0}), ('661', {'541': 3.0, '39': 4.0, '2': 4.0, '4': 3.0, '17': 4.0, '210': 4.0, '27': 1.0, '29': 5.0}), ('661', {'541': 3.0, '39': 4.0, '2': 4.0, '4': 3.0, '17': 4.0, '210': 4.0, '27': 1.0, '29': 5.0}), ('661', {'541': 3.0, '39': 4.0, '2': 4.0, '4': 3.0, '17': 4.0, '210': 4.0, '27': 1.0, '29': 5.0}), ('661', {'541': 3.0, '39': 4.0, '2': 4.0, '4': 3.0, '17': 4.0, '210': 4.0, '27': 1.0, '29': 5.0}), ('660', {'797': 4.0, '27': 1.0, '541': 3.0, '210': 4.0, '176': 4.0, '174': 3.0, '231': 5.0, '233': 4.0})]
Because there are 7 matches, we get the same pairing seven times, you won't get dupes in a dict but you will have to iterate len(my_list) times and create the pairing length of x in v
being True times. As the length of my_list grows the run time will increase accordingly.
Just making my_list 1000 elements there is a considerable difference:
In [35]: len(my_list)
Out[35]: 1000
In [36]: %%timeit
....: st = set(my_list)
....: {k: v for (k, v) in my_dict.iteritems() if not st.isdisjoint(v)}
....:
10000 loops, best of 3: 21.9 µs per loop
In [37]: timeit {k:v for k,v in my_dict.iteritems() for x in my_list if x in v}
10000 loops, best of 3: 136 µs per loop
Almost all that time is spent in the set creation, if you use a set from the start it runs 100 times faster:
In [40]: timeit {k: v for (k, v) in my_dict.iteritems() if not st.isdisjoint(v)}
1000000 loops, best of 3: 1.09 µs per loop
If i understand your question correctly, you need to traverse list mylist
and compare each element with values of dictionary mydict
{k:v for k,v in mydict.iteritems() for x in mylist if x in v}
1) k:v for k,v in mydict.iteritems() is iterating your dictionary pair by pair.
2) for x in mylist if x in v is iterating mylist and checks whether mylist element in
inner dictionary v, therefore checks x in v
is equal to checking x in v.keys()
which x in the keys of v
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With