
When dictionary keys are identical, why does Python keep only the last key-value pair?

Let's say I create a dictionary a_dictionary where two of the key-value pairs have an identical key:

In [1]: a_dictionary = {'key': 5, 'another_key': 10, 'key': 50} 

In [2]: a_dictionary
Out[2]: {'key': 50, 'another_key': 10}

Why does Python choose here to keep the last key-value pair instead of throwing an error (or at least raising a warning) about using identical keys?

The way I see it, the main downside here is that you may lose data without being aware.

(If it's relevant, I ran the code above on Python 3.6.4.)

Alex asked Jan 01 '23 23:01


1 Answer

If your question is why Python dict displays were originally designed this way… Probably nobody knows.


We know when the decision was made. Python 0.9.x (1991-1993) didn't have dict displays; Python 1.0.x (1994) did. And they worked exactly the same as they do today. From the docs [1]:

A dictionary display yields a new dictionary object.

The key/datum pairs are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum.

Restrictions on the types of the key values are listed earlier in section types.

Clashes between duplicate keys are not detected; the last datum (textually rightmost in the display) stored for a given key value prevails.

And, testing it:

$ ./python
Python 1.0.1 (Aug 21 2018)
Copyright 1991-1994 Stichting Mathematisch Centrum, Amsterdam
>>> {'key': 1, 'other': 2, 'key': 3}
{'other': 2, 'key': 3}
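The same rule can be observed on a current interpreter. Below is a minimal sketch, assuming Python 3.7 or later; `traced` is a hypothetical helper, not anything from the docs, that just logs each sub-expression as it runs. It suggests that every key and every datum in the display is still evaluated, including the datum that ends up overwritten:

```python
# Record the order in which the display's sub-expressions are evaluated.
calls = []

def traced(label, value):
    """Hypothetical helper: log that an expression ran, then return its value."""
    calls.append(label)
    return value

d = {
    traced("k1", "key"): traced("v1", 1),
    traced("k2", "other"): traced("v2", 2),
    traced("k3", "key"): traced("v3", 3),  # same key again: this datum prevails
}

print(d["key"])    # the rightmost datum stored for 'key'
print(len(calls))  # all six sub-expressions ran; nothing was skipped
print(calls)       # pairs appear left to right
```

So the discarded datum is not optimized away; it is stored and then replaced, which is what "clashes are not detected" amounts to in practice.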

But there's no mention of why Guido chose this design in:

  • The 1.0 docs.
  • The Design & History FAQ.
  • Guido's History of Python blog.
  • Anywhere else I can think of that might have it.

Also, if you look at different languages with similar features, some of them keep the last key-value pair like Python, some keep an arbitrary key-value pair, some raise some kind of error… there are enough of each that you can't argue that this was the one obvious design and that's why Guido chose it.


If you want a wild guess that's probably no better than what you could guess on your own, here's mine:

The compiler not only could, but does, effectively construct dict values out of literals by creating an empty dict and inserting key-value pairs into it. So, you get duplicates-allowed, last-key-wins semantics by default; if you wanted anything else, you'd have to write extra code. And, without a compelling reason to pick one over another, Guido chose to not write extra code.
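To make that guess concrete, here is a rough behavioral model in plain Python. It is only a sketch of the idea (start empty, store each pair in turn), not a claim about what CPython's bytecode literally does, and `build_like_display` is a made-up name:

```python
def build_like_display(pairs):
    """Hypothetical helper mimicking the guess: start empty, store each pair in turn."""
    result = {}
    for key, value in pairs:
        result[key] = value  # a repeated key simply overwrites the earlier value
    return result

pairs = [('key', 5), ('another_key', 10), ('key', 50)]
manual = build_like_display(pairs)
literal = {'key': 5, 'another_key': 10, 'key': 50}

print(manual)
print(manual == literal)  # the naive loop and the display agree
```

Getting anything other than last-wins out of this loop (an error, a warning, first-wins) would take an extra membership test on every store, which is the "extra code" the guess refers to.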


So, if there's no compelling reason for the design, why has nobody tried to change it in the 24 years since?

Well, someone filed a feature request (b.p.o. #16385) to make duplicate keys an error in 3.4, but apparently went away when it was suggested that they bring it up on python-ideas. It may well have come up a few other times, but obviously nobody wanted it changed badly enough to push for it.
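Since the language itself doesn't complain, a check like the one that feature request asked for can be approximated from outside, on source text. The following is a minimal sketch, assuming Python 3.8+ (where `ast.parse` produces `ast.Constant` nodes for literals); `find_duplicate_literal_keys` is a hypothetical name, and it only looks at constant keys, so computed keys and `**` unpacking are ignored:

```python
import ast

def find_duplicate_literal_keys(source):
    """Return (line, key) for each constant key repeated within one dict display."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # `key` is None for `**mapping` unpacking; non-constant keys are skipped.
            if not isinstance(key, ast.Constant):
                continue
            if key.value in seen:
                duplicates.append((key.lineno, key.value))
            seen.add(key.value)
    return duplicates

source = "a_dictionary = {'key': 5, 'another_key': 10, 'key': 50}"
print(find_duplicate_literal_keys(source))  # [(1, 'key')]
```

This catches the case from the question without changing what dict displays mean at runtime.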

Meanwhile, the closest thing to an actual argument for Python's existing behavior is this comment by Terry J. Reedy:

Without more use cases and support (from discussion on python-ideas), I think this should be rejected. Being able to re-write keys is fundamental to Python dicts and why they can be used for Python's mutable namespaces. A write-once or write-key-once dict would be something else.

As for literals, a code generator could depend on being able to write duplicate keys without having to go back and erase previous output.
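For a sense of what the "something else" in that comment might look like, here is a small sketch of a write-key-once mapping. `WriteOnceDict` is a hypothetical class for illustration, not something from the standard library or the bug report; it builds on `collections.UserDict` so that every store goes through `__setitem__`:

```python
from collections import UserDict

class WriteOnceDict(UserDict):
    """Hypothetical mapping in which each key may be assigned only once."""

    def __setitem__(self, key, value):
        if key in self.data:
            raise KeyError(f"key already set: {key!r}")
        super().__setitem__(key, value)

d = WriteOnceDict()
d['key'] = 5
d['another_key'] = 10
try:
    d['key'] = 50  # second write to the same key is refused
except KeyError as exc:
    print("rejected:", exc)

print(dict(d))  # {'key': 5, 'another_key': 10}
```

Note that this does nothing for displays: in `WriteOnceDict({'key': 5, 'key': 50})` the inner `{...}` has already collapsed to one entry before the class ever sees it.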


1. I don't think the docs for 1.0 are directly linkable anywhere, but you can download the whole 1.0.1 source archive and build the docs from the TeX source.

abarnert answered Jan 04 '23 13:01