Exposing `defaultdict` as a regular `dict`

Tags: python, python-3.x, defaultdict, wrapper

I am using `defaultdict(set)` to populate an internal mapping in a very large data structure. After it's populated, the whole structure (including the mapping) is exposed to the client code. At that point, I don't want anyone modifying the mapping.

And nobody does, intentionally. But sometimes client code may accidentally refer to an element that doesn't exist. A normal dictionary would raise `KeyError`, but since the mapping is a `defaultdict`, it simply creates a new element (an empty set) at that key. This is quite hard to catch, since everything happens silently. But I need to ensure this doesn't happen (the semantics don't actually break, but the mapping grows to a huge size).
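To make the problem concrete, here is a minimal sketch of the silent growth described above. The keys and the variable name `mapping` are placeholders, not taken from the real data structure.

```python
from collections import defaultdict

# Hypothetical stand-in for the internal mapping described above.
mapping = defaultdict(set)
mapping["a"].add(1)

size_before = len(mapping)

# An accidental read of a missing key does not raise; it inserts an empty set.
value = mapping["typo"]

size_after = len(mapping)
print(value)                    # set()
print(size_before, size_after)  # 1 2

# Membership tests and .get() do not insert anything.
print("other" in mapping)       # False
print(mapping.get("other"))     # None
print(len(mapping))             # 2
```

Only subscript access (`mapping[key]`) goes through `__missing__`, which is why the growth comes from plain lookups rather than from `in` or `.get()`.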

What should I do? I can see these choices:

1. Find all the instances in current and future client code where a dictionary lookup is performed on the mapping, and convert each to `mapping.get(k, set())` instead. This is just terrible.
2. "Freeze" the `defaultdict` after the data structure is fully initialized, by converting it to `dict`. (I know it's not really frozen, but I trust client code to not actually write `mapping[k] = v`.) Inelegant, and a large performance hit.
3. Wrap the `defaultdict` in a `dict` interface. What's an elegant way to do that? I'm afraid the performance hit may be huge though (this lookup is heavily used in tight loops).
4. Subclass `defaultdict` and add a method that "shuts down" all the `defaultdict` features, leaving it to behave as if it's a regular `dict`. It's a variant of 3 above, but I'm not sure if it's any faster. And I don't know if it's doable without relying on implementation details.
5. Use a regular `dict` in the data structure, rewriting all the code there to first check whether the element is in the dictionary and add it if it's not. Not good.

551

asked Nov 20 '12 02:11

max

2 Answers

The `defaultdict` docs say for `default_factory`:

> If the default_factory attribute is None, this raises a KeyError exception with the key as argument.

What if you just set your `defaultdict`'s `default_factory` to `None`? E.g.,

    >>> from collections import defaultdict
    >>> d = defaultdict(int)
    >>> d['a'] += 1
    >>> d
    defaultdict(<class 'int'>, {'a': 1})
    >>> d.default_factory = None
    >>> d['b'] += 2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 'b'

Not sure if this is the best approach, but seems to work.
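As a rough sketch of how this could fit the populate-then-expose workflow from the question: the function name `build_mapping` and the sample pairs below are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_mapping(pairs):
    """Populate a defaultdict(set), then switch off default creation."""
    mapping = defaultdict(set)
    for key, item in pairs:
        mapping[key].add(item)  # default_factory is still active here
    # From here on, a missing key raises KeyError, as with a plain dict.
    mapping.default_factory = None
    return mapping


mapping = build_mapping([("a", 1), ("a", 2), ("b", 3)])

try:
    mapping["missing"]
    raised = False
except KeyError:
    raised = True

print(raised)                # True
print(len(mapping))          # 2
print(sorted(mapping["a"]))  # [1, 2]
```

The object is still a `defaultdict` and still mutable, so this guards against accidental lookups only, not against deliberate writes or someone resetting `default_factory`. Lookups of existing keys never reach `__missing__`, so hits should not pay extra for this, though that is worth measuring in the real tight loops.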

122

answered Sep 22 '22 08:09

Neal

Once you have finished populating your `defaultdict`, you can simply create a regular `dict` from it:

    my_dict = dict(my_default_dict)

One can optionally use the `typing.Final` type annotation.
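A small sketch of the conversion combined with the annotation, reusing the names `my_dict` and `my_default_dict` from above; the sample key is made up. Note that `Final` is only read by static type checkers and enforces nothing at runtime.

```python
from collections import defaultdict
from typing import Final

my_default_dict = defaultdict(set)
my_default_dict["a"].add(1)

# Final tells a type checker that the name should not be rebound;
# the dict itself can still be modified at runtime.
my_dict: Final = dict(my_default_dict)

print(type(my_dict) is dict)  # True

try:
    my_dict["missing"]
    raised = False
except KeyError:
    raised = True
print(raised)                 # True

# The copy is shallow: both mappings share the same set objects.
print(my_dict["a"] is my_default_dict["a"])  # True
```

Because `dict(...)` copies only the top-level table of keys and references, the sets themselves are not duplicated.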

If the `defaultdict` is recursive (nested), see this answer, which uses a recursive solution.
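For the nested case, one possible recursive conversion is sketched below. It is not necessarily the approach taken in the answer referred to above, and the helper names `to_plain_dict` and `tree` are hypothetical.

```python
from collections import defaultdict


def to_plain_dict(value):
    """Recursively replace defaultdicts (and nested dicts) with plain dicts."""
    if isinstance(value, dict):
        return {key: to_plain_dict(inner) for key, inner in value.items()}
    return value


def tree():
    # A recursive defaultdict: missing keys create further nested levels.
    return defaultdict(tree)


nested = tree()
nested["a"]["b"]["c"] = 1

plain = to_plain_dict(nested)
print(plain)                          # {'a': {'b': {'c': 1}}}
print(type(plain["a"]["b"]) is dict)  # True
```

Unlike a single `dict(...)` call, this rebuilds every level, so inner lookups on missing keys also raise `KeyError`.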

answered Sep 19 '22 08:09

Asclepius
