Intersecting two dictionaries

I am working on a search program over an inverted index. The index itself is a dictionary whose keys are terms and whose values are themselves dictionaries of short documents, with ID numbers as keys and their text content as values.

To perform an 'AND' search for two terms, I thus need to intersect their postings lists (dictionaries). What is a clear (not necessarily overly clever) way to do this in Python? I started out by trying it the long way with iter:

    p1 = index[term1]
    p2 = index[term2]
    i1 = iter(p1)
    i2 = iter(p2)
    while ...  # not sure of the 'iter != end' syntax in this case
    ...

asked Sep 01 '13 00:09

norman

2 Answers

A little-known fact is that you don't need to construct sets to do this:

In Python 2:

    In [78]: d1 = {'a': 1, 'b': 2}

    In [79]: d2 = {'b': 2, 'c': 3}

    In [80]: d1.viewkeys() & d2.viewkeys()
    Out[80]: {'b'}

In Python 3 replace viewkeys with keys; the same applies to viewvalues and viewitems.
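For reference, a minimal Python 3 sketch of the same idea, reusing d1 and d2 from the session above. One caveat: only the keys and items views are set-like. The values view does not support &, and intersecting items views assumes the values are hashable.

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'c': 3}

# Keys views are set-like, so & works without building sets first.
common_keys = d1.keys() & d2.keys()

# Items views compare (key, value) pairs; this needs hashable values.
common_items = d1.items() & d2.items()

print(common_keys)   # {'b'}
print(common_items)  # {('b', 2)}
```

Both results are ordinary set objects, not views.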

From the documentation of viewitems:

    In [113]: d1.viewitems??
    Type:       builtin_function_or_method
    String Form:<built-in method viewitems of dict object at 0x64a61b0>
    Docstring:  D.viewitems() -> a set-like object providing a view on D's items

For larger dicts this is also slightly faster than constructing sets and then intersecting them:

    In [122]: d1 = {i: rand() for i in range(10000)}

    In [123]: d2 = {i: rand() for i in range(10000)}

    In [124]: timeit d1.viewkeys() & d2.viewkeys()
    1000 loops, best of 3: 714 µs per loop

    In [125]: %%timeit
    s1 = set(d1)
    s2 = set(d2)
    res = s1 & s2

    1000 loops, best of 3: 805 µs per loop

For smaller dicts, set construction is faster:

    In [126]: d1 = {'a': 1, 'b': 2}

    In [127]: d2 = {'b': 2, 'c': 3}

    In [128]: timeit d1.viewkeys() & d2.viewkeys()
    1000000 loops, best of 3: 591 ns per loop

    In [129]: %%timeit
    s1 = set(d1)
    s2 = set(d2)
    res = s1 & s2

    1000000 loops, best of 3: 477 ns per loop
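If you want to repeat the comparison on Python 3 outside IPython, here is a rough sketch using the standard timeit module. It assumes random.random as a stand-in for rand() (which presumably came from an interactive NumPy/pylab namespace), and via_views and via_sets are hypothetical helper names. Timings depend on the machine and Python version, so no numbers are shown.

```python
import random
import timeit

d1 = {i: random.random() for i in range(10000)}
d2 = {i: random.random() for i in range(10000)}

def via_views():
    # Intersect the set-like keys views directly.
    return d1.keys() & d2.keys()

def via_sets():
    # Build two sets first, then intersect them.
    return set(d1) & set(d2)

runs = 200
for fn in (via_views, via_sets):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} us per call")
```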

We're comparing nanoseconds here, which may or may not matter to you. In any case, you get back a set, so using viewkeys/keys eliminates a bit of clutter.
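Applied to the inverted index from the question, this could look like the sketch below. The toy index contents and the and_search name are made up for illustration; the only assumption taken from the question is that the index maps each term to a {doc_id: text} dictionary.

```python
# Hypothetical toy index: term -> {doc_id: text}
index = {
    "python": {1: "python dict tips", 2: "python set tricks", 4: "python basics"},
    "dict": {1: "python dict tips", 3: "dict views in depth"},
}

def and_search(index, term1, term2):
    """Return {doc_id: text} for documents that contain both terms."""
    p1 = index.get(term1, {})
    p2 = index.get(term2, {})
    shared_ids = p1.keys() & p2.keys()
    # A document's text is the same in either postings dict, so take it from p1.
    return {doc_id: p1[doc_id] for doc_id in sorted(shared_ids)}

print(and_search(index, "python", "dict"))  # {1: 'python dict tips'}
```

Using index.get with an empty dict as the default means an unknown term simply yields no results instead of raising KeyError.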

answered Oct 16 '22 01:10

Phillip Cloud

In general, to construct the intersection of dictionaries in Python, you can first use the & operator to calculate the intersection of sets of the dictionary keys (dictionary keys are set-like objects in Python 3):

    dict_a = {"a": 1, "b": 2}
    dict_b = {"a": 2, "c": 3}

    intersection = dict_a.keys() & dict_b.keys()  # {'a'}

On Python 2, keys() returns a list, so you have to convert the dictionary keys to sets yourself (or, on Python 2.7, use viewkeys() instead):

    keys_a = set(dict_a.keys())
    keys_b = set(dict_b.keys())
    intersection = keys_a & keys_b

Given the intersection of the keys, you can then build the intersection of the values in whatever way suits your data. You have to make a choice here, since the concept of set intersection doesn't tell you what to do if the associated values differ. (This is presumably why the & intersection operator is not defined directly for dictionaries in Python.)
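One way to make that choice explicit is to pass the rule in as a function. This is a minimal sketch; intersect_with and combine are hypothetical names, not part of Python or of the question's code.

```python
def intersect_with(a, b, combine):
    """Intersect two dicts on keys, resolving each shared key with combine(va, vb)."""
    return {k: combine(a[k], b[k]) for k in a.keys() & b.keys()}

dict_a = {"a": 1, "b": 2}
dict_b = {"a": 2, "c": 3}

keep_left = intersect_with(dict_a, dict_b, lambda va, vb: va)
summed = intersect_with(dict_a, dict_b, lambda va, vb: va + vb)
as_pair = intersect_with(dict_a, dict_b, lambda va, vb: (va, vb))

print(keep_left)  # {'a': 1}
print(summed)     # {'a': 3}
print(as_pair)    # {'a': (1, 2)}
```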

In this case it sounds like your values for the same key would be equal, so you can just choose the value from one of the dictionaries:

    dict_of_dicts_a = {"a": {"x": 1}, "b": {"y": 3}}
    dict_of_dicts_b = {"a": {"x": 1}, "c": {"z": 4}}

    shared_keys = dict_of_dicts_a.keys() & dict_of_dicts_b.keys()

    # values equal so choose values from a:
    dict_intersection = {k: dict_of_dicts_a[k] for k in shared_keys}  # {"a": {"x": 1}}

Other reasonable ways of combining values depend on the types of the values in your dictionaries and what they represent. For example, for dictionaries of dictionaries you might want the union of the values for each shared key. The union of two dictionaries is well defined (when both contain the same key, the value from the right-hand operand is kept), and in Python 3.9 and later you can get it using the | operator:

    # union of values for each key in the intersection:
    dict_intersection_2 = {k: dict_of_dicts_a[k] | dict_of_dicts_b[k] for k in shared_keys}

In this case, where the values for key "a" are identical in both dictionaries, this gives the same result as before.
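To see the union actually make a difference, here is a sketch with a variation on the data above: the inner dictionary for "a" in dict_of_dicts_b gets an extra, made-up "w" entry. The merge helper is a hypothetical name; it uses | where available (Python 3.9+) and falls back to unpacking on older Python 3 versions.

```python
import sys

dict_of_dicts_a = {"a": {"x": 1}, "b": {"y": 3}}
dict_of_dicts_b = {"a": {"x": 1, "w": 9}, "c": {"z": 4}}  # "w" added for illustration

def merge(left, right):
    """Union of two dicts; on duplicate keys the right-hand value wins."""
    if sys.version_info >= (3, 9):
        return left | right
    return {**left, **right}  # same result on Python 3.5 to 3.8

shared_keys = dict_of_dicts_a.keys() & dict_of_dicts_b.keys()
merged = {k: merge(dict_of_dicts_a[k], dict_of_dicts_b[k]) for k in shared_keys}

print(merged)  # {'a': {'x': 1, 'w': 9}}
```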

answered Oct 16 '22 01:10

James


Tags: python, dictionary, iteration, intersection
