
Why does defaultdict default_factory default to None?

You don't have to specify a default factory; omitting it is the same as passing None explicitly:

>>> from collections import defaultdict
>>> defaultdict()
defaultdict(None, {})
>>> defaultdict(None)
defaultdict(None, {})

Why None though? Then we get this thing:

>>> dd = defaultdict()
>>> dd[0]
# TypeError: 'NoneType' object is not callable  <-- expected behaviour
# KeyError: 0                                   <-- actual behaviour

None is even explicitly allowed: if you try to make a defaultdict from some other non-callable object, say defaultdict(0), a check fails with:

TypeError: first argument must be callable or None
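A minimal sketch of that constructor check, for anyone who wants to reproduce it. The helper name try_factory is hypothetical and exists only for this illustration:

```python
from collections import defaultdict

def try_factory(factory):
    """Return 'ok' if defaultdict(factory) can be built, else the error text."""
    try:
        defaultdict(factory)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"

# None and callables are accepted; a non-callable such as 0 is rejected.
results = {
    "None": try_factory(None),
    "list": try_factory(list),
    "0": try_factory(0),
}
print(results)
```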

I thought something like lambda: None would be a better default factory. Why is the default_factory optional? I don't understand the use-case.

wim asked Mar 10 '17 20:03


2 Answers

When Guido van Rossum initially proposed a DefaultDict it had a default value (unlike the current defaultdict which uses a callable rather than a value) that was set during construction and was read-only (also unlike defaultdict).

After some discussion, Guido revised the proposal. Here are the relevant highlights:

Many, many people suggested to use a factory function instead of a default value. This is indeed a much better idea (although slightly more cumbersome for the simplest cases).

...

Let's add a generic missing-key handling method to the dict class, as well as a default_factory slot initialized to None.

...

[T]he default implementation is designed so that we can write

d = {}
d.default_factory = list

The important thing to note is that the new functionality no longer belongs to a subclass. That means that setting the default_factory in the constructor would break existing code. So by design, setting the default_factory had to happen after the dict was created. Its initial value is set to None, and it is now a mutable attribute so that it can be meaningfully overwritten.

After yet more discussion, it was decided that maybe it would be best not to complicate the regular dict type with a defaultdict specialization.
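To make the revised proposal concrete, here is a rough sketch of the "missing-key handling method plus default_factory slot" idea. Since the plain dict never gained a default_factory attribute, the sketch uses a hypothetical subclass, FactoryDict, as a stand-in; it is an illustration under that assumption, not the actual defaultdict implementation:

```python
class FactoryDict(dict):
    """Hypothetical stand-in for the proposed dict with a default_factory slot."""

    default_factory = None  # initial value, as described in the proposal

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ on subclasses when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value

d = FactoryDict()            # behaves like a plain dict at this point
d.default_factory = list     # factory assigned after construction
d["a"].append(1)
print(d)
```

With the factory left at None, a missing key still raises KeyError, which is the same behaviour the question observes for defaultdict().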

Steven Bethard then asked for clarification regarding the constructor:

Should default_factory be an argument to the constructor? The three answers I see:

  • "No." I'm not a big fan of this answer. Since the whole point of creating a defaultdict type is to provide a default, requiring two statements (the constructor call and the default_factory assignment) to initialize such a dictionary seems a little inconvenient.
  • "Yes and it should be followed by all the normal dict constructor arguments." This is okay, but a few errors, like defaultdict({1:2}) will pass silently (until you try to use the dict, of course).
  • "Yes and it should be the only constructor argument." This is my favorite mainly because I think it's simple, and I couldn't think of good examples where I really wanted to do defaultdict(list, some_dict_or_iterable) or defaultdict(list, **some_keyword_args). It's also forward compatible if we need to add some of the dict constructor args in later.

Guido van Rossum decided that:

The defaultdict signature takes an optional positional argument which is the default_factory, defaulting to None. The remaining positional and all keyword arguments are passed to the dict constructor. IOW:

d = defaultdict(list, [(1, 2)])

is equivalent to:

d = defaultdict()  
d.default_factory = list  
d.update([(1, 2)])

Note that the expanded code mirrors exactly how it worked when Guido was considering altering dict to provide the defaultdict behavior.
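A quick check of that equivalence, as a sketch (the variable names one_step and three_step are just for illustration):

```python
from collections import defaultdict

# Factory and initial items passed to the constructor in one call.
one_step = defaultdict(list, [(1, 2)])

# The expanded form: empty defaultdict, then factory, then contents.
three_step = defaultdict()
three_step.default_factory = list
three_step.update([(1, 2)])

same_items = one_step == three_step
same_factory = one_step.default_factory is three_step.default_factory
print(same_items, same_factory)
```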

He also provides some justifications upthread:

Even if the default_factory were passed to the constructor, it still ought to be a writable attribute so it can be introspected and modified. A defaultdict that can't change its default factory after its creation is less useful.

Bengt Richter explains why you might want a mutable default factory:

My guess is that realistically default_factory will be used to make clean code for filling a dict, and then turning the factory off if it's to be passed into unknown contexts. Those contexts can then use old code to do as above, or if worth it can temporarily set a factory to do some work. Tightly coupled code I guess could pass factory-enabled dicts between each other.
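A small sketch of the fill-then-turn-off pattern described in that quote. The function group_by_length and its sample input are hypothetical, chosen only to show the idea:

```python
from collections import defaultdict

def group_by_length(words):
    """Group words by length, then disable the factory before returning."""
    groups = defaultdict(list)
    for word in words:
        groups[len(word)].append(word)  # the factory creates lists on demand
    groups.default_factory = None       # from here on, missing keys raise KeyError
    return groups

groups = group_by_length(["a", "bb", "cc"])
print(groups)

# Callers can no longer create empty entries by accident.
try:
    groups[99]
except KeyError:
    print("99 is missing, and no entry was created")
```

Turning the factory off means a stray lookup in code that expects an ordinary dict will not silently insert a new key.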

Steven Rumbalski answered Sep 19 '22 23:09


My guess is that the design is intentional in order to make a defaultdict instance act like a normal dict, by default, whilst allowing the behaviour to be dynamically modified by simple attribute access later on.

For example:

>>> from collections import defaultdict
>>> d = defaultdict()
>>> d['k']  # hey I'm just a plain old dict ;)
KeyError: 'k'
>>> d.default_factory = list
>>> d['L']  # actually, I'm really a defaultdict(list)
[]
>>> d.default_factory = int  # just kidding!  I'm a counter
>>> d['i']
0
>>> d
defaultdict(<class 'int'>, {'L': [], 'i': 0})

We can also make it behave like a vanilla dict again (raising KeyError for missing keys) by setting the factory back to None.

I have yet to find a pattern where this could be useful, but this usage wouldn't be possible if defaultdict had to be instantiated with a callable as a required positional argument.

wim answered Sep 19 '22 23:09