Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python dict: get vs setdefault

The following two expressions seem equivalent to me. Which one is preferable?

data = [('a', 1), ('b', 1), ('b', 2)]  d1 = {} d2 = {}  for key, val in data:     # variant 1)     d1[key] = d1.get(key, []) + [val]     # variant 2)     d2.setdefault(key, []).append(val) 

The results are the same but which version is better or rather more pythonic?

Personally I find version 2 harder to understand, as to me setdefault is very tricky to grasp. If I understand correctly, it looks for the value of "key" in the dictionary, if not available, enters "[]" into the dict, returns a reference to either the value or "[]" and appends "val" to that reference. While certainly smooth it is not intuitive in the least (at least to me).

To my mind, version 1 is easier to understand (if available, get the value for "key", if not, get "[]", then join with a list made up from [val] and place the result in "key"). But while more intuitive to understand, I fear this version is less performant, with all this list creating. Another disadvantage is that "d1" occurs twice in the expression which is rather error-prone. Probably there is a better implementation using get, but presently it eludes me.

My guess is that version 2, although more difficult to grasp for the inexperienced, is faster and therefore preferable. Opinions?

like image 642
Cerno Avatar asked Sep 14 '11 21:09

Cerno


People also ask

What is the difference between GET and Setdefault in Python?

The get the method allows you to avoid getting a KeyError when the key doesn't exist however setdefault the method allows set default values when the key doesn't exist.

What is Setdefault in Python dictionary?

Python Dictionary setdefault() Method The setdefault() method returns the value of the item with the specified key. If the key does not exist, insert the key, with the specified value, see example below.

Should I use dict () or {}?

With CPython 2.7, using dict() to create dictionaries takes up to 6 times longer and involves more memory allocation operations than the literal syntax. Use {} to create dictionaries, especially if you are pre-populating them, unless the literal syntax does not work for your case.

What is the use of Setdefault method in dictionary?

The setdefault() method returns the value of a key (if the key is in dictionary). If not, it inserts key with a value to the dictionary.


2 Answers

Your two examples do the same thing, but that doesn't mean get and setdefault do.

The difference between the two is basically manually setting d[key] to point to the list every time, versus setdefault automatically setting d[key] to the list only when it's unset.

Making the two methods as similar as possible, I ran

from timeit import timeit  print timeit("c = d.get(0, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000) print timeit("c = d.get(1, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000) print timeit("d.setdefault(0, []).extend([1])", "d = {1: []}", number = 1000000) print timeit("d.setdefault(1, []).extend([1])", "d = {1: []}", number = 1000000) 

and got

0.794723378711 0.811882272256 0.724429205999 0.722129751973 

So setdefault is around 10% faster than get for this purpose.

The get method allows you to do less than you can with setdefault. You can use it to avoid getting a KeyError when the key doesn't exist (if that's something that's going to happen frequently) even if you don't want to set the key.

See Use cases for the 'setdefault' dict method and dict.get() method returns a pointer for some more info about the two methods.

The thread about setdefault concludes that most of the time, you want to use a defaultdict. The thread about get concludes that it is slow, and often you're better off (speed wise) doing a double lookup, using a defaultdict, or handling the error (depending on the size of the dictionary and your use case).

like image 65
agf Avatar answered Sep 20 '22 08:09

agf


The accepted answer from agf isn't comparing like with like. After:

print timeit("d[0] = d.get(0, []) + [1]", "d = {1: []}", number = 10000) 

d[0] contains a list with 10,000 items whereas after:

print timeit("d.setdefault(0, []) + [1]", "d = {1: []}", number = 10000) 

d[0] is simply []. i.e. the d.setdefault version never modifies the list stored in d. The code should actually be:

print timeit("d.setdefault(0, []).append(1)", "d = {1: []}", number = 10000) 

and in fact is faster than the faulty setdefault example.

The difference here really is because of when you append using concatenation the whole list is copied every time (and once you have 10,000 elements that is beginning to become measurable. Using append the list updates are amortised O(1), i.e. effectively constant time.

Finally, there are two other options not considered in the original question: defaultdict or simply testing the dictionary to see whether it already contains the key.

So, assuming d3, d4 = defaultdict(list), {}

# variant 1 (0.39) d1[key] = d1.get(key, []) + [val] # variant 2 (0.003) d2.setdefault(key, []).append(val) # variant 3 (0.0017) d3[key].append(val) # variant 4 (0.002) if key in d4:     d4[key].append(val) else:     d4[key] = [val] 

variant 1 is by far the slowest because it copies the list every time, variant 2 is the second slowest, variant 3 is the fastest but won't work if you need Python older than 2.5, and variant 4 is just slightly slower than variant 3.

I would say use variant 3 if you can, with variant 4 as an option for those occasional places where defaultdict isn't an exact fit. Avoid both of your original variants.

like image 41
Duncan Avatar answered Sep 20 '22 08:09

Duncan