I have a dictionary of dictionaries in Python 2.7.
I need to quickly count the number of all keys, including the keys within each of the dictionaries.
So in this example I would need the number of all keys to be 6:
dict_test = {'key2': {'key_in3': 'value', 'key_in4': 'value'}, 'key1': {'key_in2': 'value', 'key_in1': 'value'}}
I know I can iterate through each key with for loops, but I am looking for a quicker way to do this, since I will have thousands/millions of keys and doing it this way is just inefficient:
count_the_keys = 0

for key in dict_test.keys():
    for key_inner in dict_test[key].keys():
        count_the_keys += 1

# something like this would be more effective
# of course .keys().keys() doesn't work
print len(dict_test.keys()) * len(dict_test.keys().keys())
In Python 2.7, dict.keys() returns a list of all the keys in the dictionary, in arbitrary order; in Python 3 it returns a view object instead, and from Python 3.7 onward the keys appear in insertion order.
The number of entries in a dictionary (i.e. the count of its top-level keys) can be found using the len() function.
The keys() method retrieves all of the keys from the dictionary. Keys must be hashable (for example a string, a number, or a tuple of hashable elements) and must be unique.
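As a quick illustration of the point about len(), applying it to the outer dictionary counts only the top-level keys, which is why the inner dictionaries have to be measured separately. This is a minimal sketch using the question's dict_test; the names top_level, inner and total are only for illustration:

```python
dict_test = {
    'key2': {'key_in3': 'value', 'key_in4': 'value'},
    'key1': {'key_in2': 'value', 'key_in1': 'value'},
}

# len() on a dict is the number of its own keys; nested keys are not included.
top_level = len(dict_test)

# Each inner dictionary has to be measured separately.
inner = [len(v) for v in dict_test.values()]

# Outer keys plus all inner keys.
total = top_level + sum(inner)
```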
Keeping it Simple
If we know all the values are dictionaries, and do not wish to check that any of their values are also dictionaries, then it is as simple as:
len(dict_test) + sum(len(v) for v in dict_test.itervalues())
Refining it a little, to actually check that the values are dictionaries before counting them:
len(dict_test) + sum(len(v) for v in dict_test.itervalues() if isinstance(v, dict))
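To see why the isinstance check matters, consider a dictionary whose values are not all dictionaries. The sketch below uses a made-up dictionary, mixed, and calls .values() rather than .itervalues() so that it runs on both Python 2.7 and Python 3; on 2.7 with very large dictionaries, .itervalues() avoids building an intermediate list.

```python
# Hypothetical data: one nested dict, one string, one integer.
mixed = {
    'key1': {'key_in1': 'value', 'key_in2': 'value'},
    'key2': 'value',
    'key3': 42,
}

# Only dictionary values contribute inner keys; other values are skipped.
checked = len(mixed) + sum(len(v) for v in mixed.values() if isinstance(v, dict))

# Without the check, len('value') would add 5 spurious "keys",
# and len(42) raises TypeError.
try:
    unchecked = len(mixed) + sum(len(v) for v in mixed.values())
except TypeError:
    unchecked = None
```

Here checked counts the three outer keys plus the two keys of the one nested dictionary.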
And finally, if you wish to handle arbitrary depth, something like the following:
def sum_keys(d):
    return (0 if not isinstance(d, dict)
            else len(d) + sum(sum_keys(v) for v in d.itervalues()))

print sum_keys({'key2': {'key_in3': 'value', 'key_in4': 'value'},
                'key1': {'key_in2': 'value', 'key_in1': dict(a=2)}})
# => 7
In this last case, we define a function that will be called recursively. Given a value d, we return either:
0 if that value is not a dictionary; or
the number of keys in the dictionary, plus the total number of keys in all of its children.
Making it Faster
The above is a succinct and easily understood approach. We can get a little faster using a generator:
def _counter(d):
    # how many keys do we have?
    yield len(d)

    # stream the key counts of our children
    for v in d.itervalues():
        if isinstance(v, dict):
            for x in _counter(v):
                yield x

def count_faster(d):
    return sum(_counter(d))
This gets us a bit more performance:
In [1]: %timeit sum_keys(dict_test)
100000 loops, best of 3: 4.12 µs per loop

In [2]: %timeit count_faster(dict_test)
100000 loops, best of 3: 3.29 µs per loop
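Both versions above rely on recursion, so a dictionary nested more deeply than the interpreter's recursion limit (1000 frames by default in CPython) would raise an error. If that is a concern, one possible alternative is to keep an explicit stack. This is only a sketch and has not been benchmarked here; the name count_iterative is made up, and .values() is used so the code runs on both Python 2.7 and Python 3.

```python
def count_iterative(d):
    """Count keys at every nesting level without recursion."""
    total = 0
    stack = [d]  # dictionaries still waiting to be counted
    while stack:
        current = stack.pop()
        total += len(current)
        # Queue up any child dictionaries; other values are ignored.
        for v in current.values():
            if isinstance(v, dict):
                stack.append(v)
    return total


nested = {'key2': {'key_in3': 'value', 'key_in4': 'value'},
          'key1': {'key_in2': 'value', 'key_in1': dict(a=2)}}
result = count_iterative(nested)  # 7, the same as sum_keys for this input
```

Unlike sum_keys, this sketch assumes the top-level argument is itself a dictionary.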
How about
n = sum([len(v)+1 for k, v in dict_test.items()])
What you are doing is iterating over all keys k and values v. The values v are your subdictionaries. You get the length of those dictionaries and add one to include the key used to index the subdictionary.
Afterwards you sum over the list to get the complete number of keys.
EDIT:
To clarify, this snippet works only for dictionaries of dictionaries, as asked, not for dictionaries of dictionaries of dictionaries...
So do not use it for more deeply nested examples :)
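A small check of the one-liner, using the question's dict_test and a made-up deeper dictionary, deeper, to illustrate the caveat. The list brackets are dropped here so that sum() consumes a generator expression instead of building a list first; that is an optional variation, not part of the original snippet.

```python
dict_test = {
    'key2': {'key_in3': 'value', 'key_in4': 'value'},
    'key1': {'key_in2': 'value', 'key_in1': 'value'},
}

# Two levels: each inner dict contributes its own keys plus 1 for its outer key.
n = sum(len(v) + 1 for v in dict_test.values())

# Hypothetical three-level example: 'a' sits one level too deep to be seen.
deeper = {'key1': {'key_in1': {'a': 2}}}
n_deeper = sum(len(v) + 1 for v in deeper.values())  # counts key1 and key_in1 only
```

For dict_test this gives the expected 6, while for deeper it gives 2 even though the structure holds 3 keys in total.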