I was reading about the collection defaultdict and came across these lines of code:
import collections
tree = lambda: collections.defaultdict(tree)
some_dict = tree()
some_dict['colours']['favourite'] = "yellow"
I understand that lambda takes a variable and performs some function on it. I've seen lambda being used like this: lambda x: x + 3. In the second line of code above, what variable is lambda taking and what function is it carrying out?
I also understand that defaultdict can take parameters such as int or list. In the second line, defaultdict takes the parameter tree which is a variable. What is the significance of that?
defaultdict takes a zero-argument callable in its constructor, which is called when a key is not found, as you correctly explained. lambda: 0 will of course always return zero, but the preferred way to do that is defaultdict(int), which does the same thing.
The Python defaultdict type behaves almost exactly like a regular Python dictionary, but if you try to access or modify a missing key, then defaultdict will automatically create the key and generate a default value for it. This makes defaultdict a valuable option for handling missing keys in dictionaries.
A defaultdict works like a normal dict, but it is initialized with a function (“default factory”) that takes no arguments and provides the default value for a nonexistent key. As long as a default factory is set, looking up a missing key with d[key] does not raise a KeyError; the key is created with the value returned by the default factory.
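As a minimal sketch of the missing-key behaviour described above (the names counts and plain are made up for illustration), note that only d[key] lookups invoke the factory, and a defaultdict created without a factory still raises KeyError:

```python
from collections import defaultdict

counts = defaultdict(int)      # int() returns 0, so missing keys start at 0
counts["apple"] += 1           # no KeyError: the key is created on first access

# .get() and the `in` operator do not invoke the factory
missing = counts.get("pear")   # "pear" is not inserted by this call
has_pear = "pear" in counts

# Without a factory (default_factory is None) a missing key still raises KeyError
plain = defaultdict()
try:
    plain["x"]
    raised = False
except KeyError:
    raised = True
```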
defaultdict is not necessarily slower than a regular dict. The timings there are flawed, as they include creating the object. Other than that, there are different kinds of performance, ease of maintenance being one.
The code is roughly equivalent (ignoring metadata introduced by the def statement) to
import collections
def tree():
    return collections.defaultdict(tree)
some_dict = tree()
some_dict['colours']['favourite'] = "yellow"
The lambda expression simply defines a function of zero parameters, and the function is bound to the name tree.
Typically, you only use lambda expressions when you actually want an anonymous function, for example when passing it as an argument to another function, as in
sorted_list = sorted(some_list_of_tuples, key=lambda x: x[0])
It is considered better practice to use a def statement when you really want a named function.
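One concrete difference between the two forms is the metadata attached to the function. The sketch below uses the hypothetical names tree_lambda and tree_def so that both versions can sit side by side:

```python
import collections

# Lambda bound to a name: works, but the function itself stays anonymous
tree_lambda = lambda: collections.defaultdict(tree_lambda)

# def statement: the function records its own name
def tree_def():
    return collections.defaultdict(tree_def)

lambda_name = tree_lambda.__name__
def_name = tree_def.__name__
```

Here lambda_name is '<lambda>' while def_name is 'tree_def', which is the name that shows up in tracebacks and reprs.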
defaultdict takes a callable to be used to produce a default value for a new key. int() returns 0, list() returns an empty list, and tree() returns a new defaultdict, so int, list, and tree can all be used as arguments to defaultdict. The recursive nature of defining tree to return a defaultdict using itself as the default-value generator means you can generate nested dicts to an arbitrary depth; each "leaf" dict is itself another defaultdict.
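To illustrate the arbitrary-depth nesting, here is a small sketch. The keys 'a' through 'd' and the json round trip are only there to make the resulting structure visible as plain dicts:

```python
import collections
import json

def tree():
    return collections.defaultdict(tree)

some_dict = tree()
some_dict['colours']['favourite'] = "yellow"
some_dict['a']['b']['c']['d'] = 1   # intermediate levels are created on demand

# defaultdict is a dict subclass, so json can serialise it;
# loading the result back gives ordinary nested dicts
as_plain = json.loads(json.dumps(some_dict))
```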
In the second line of code above, what variable is lambda taking and what function is it carrying out?
A lambda function is an anonymous (without name) function. So a lambda expression like:
tree = lambda: collections.defaultdict(tree)
is, except for some details (with def, the function's __name__ attribute contains the name of the function rather than '<lambda>'), equivalent to:
def tree():
    return collections.defaultdict(tree)
The difference from a simple expression is that here we encode the computation in a function. We can never call it, call it once, or call it multiple times.
It also allows us to tie a knot. Notice that we pass a reference to the function (lambda expression) in the result. We thus have a function that constructs a defaultdict with the function itself as its factory. We can thus recursively construct subtrees.
I also understand that defaultdict can take parameters such as int or list. In the second line, defaultdict takes the parameter tree which is a variable. What is the significance of that?
The tree that we pass to the defaultdict is thus a reference to the function created by the lambda expression. This means that whenever the defaultdict invokes the "factory", we get another defaultdict whose factory is again tree.
If we thus evaluate some_dict['foo']['bar']['qux'], we have a defaultdict in a defaultdict in a defaultdict. All these defaultdicts have the tree function as their factory. If we later construct extra children, these will again be defaultdicts with tree as factory.
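This can be checked directly through the default_factory attribute. A short sketch, using the same 'foo', 'bar', 'qux' keys as above:

```python
import collections

tree = lambda: collections.defaultdict(tree)
some_dict = tree()

# A lookup alone is enough to create all three nested levels
inner = some_dict['foo']['bar']['qux']

# Every level, including the innermost one, uses tree as its factory
same_factory = inner.default_factory is tree
keys_under_bar = list(some_dict['foo']['bar'])
```

One consequence worth keeping in mind is that merely reading a key, such as a misspelled one, inserts an empty subtree at that position.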
The list or int case is not special. If you invoke list (like list()), then you construct a new empty list. The same happens with int: if you call int(), you will obtain 0. The fact that this is a reference to a class object is irrelevant: the defaultdict does not take this into account (it does not know what the factory is, it only invokes it with no parameters).
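To make that point concrete, the sketch below passes an ordinary function as the factory. The function make_default and the list calls are hypothetical names used only to record when the factory runs:

```python
from collections import defaultdict

calls = []

def make_default():
    # Hypothetical factory: records each invocation and returns a fixed value
    calls.append("called")
    return "n/a"

d = defaultdict(make_default)
value = d["missing"]   # key absent: the factory is called with no arguments
again = d["missing"]   # key now present: the factory is not called again

# Classes are handled the same way, since calling them with no arguments works too
by_int = defaultdict(int)["x"]
by_list = defaultdict(list)["x"]
```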