
How to count all elements in a nested dictionary?


How do I count the number of subelements in a nested dictionary in the most efficient manner possible? The len() function doesn't work as I initially expected it to:

>>> food_colors = {'fruit': {'orange': 'orange', 'apple': 'red', 'banana': 'yellow'},
...                'vegetables': {'lettuce': 'green', 'beet': 'red', 'pumpkin': 'orange'}}
>>> len(food_colors)
2

What if I actually want to count the number of subelements (here, the expected result is 6)? Is there a better way to do this than looping through each element and summing the number of subelements? In this particular application, I have about five million subelements to count and every clock cycle counts.

jamieb asked Jan 03 '11 02:01


2 Answers

Is it guaranteed that each top-level key has a dictionary as its value, and that no second-level key has a dictionary? If so, this will go as fast as you can hope for:

sum(len(v) for v in food_colors.itervalues())  # Python 2; on Python 3 use food_colors.values()

If the data structure is more complicated, it will need more code, of course. I'm not aware of any intrinsics to do deep data structure walks.
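For the more complicated case, a deep walk can be sketched with a short recursive function. This is only a minimal sketch, not a built-in: the name `count_leaves` is made up for illustration, it assumes Python 3 (hence `.values()` rather than `.itervalues()`), and it assumes that only dictionaries nest and that every non-dictionary value counts as one element.

```python
def count_leaves(obj):
    """Count non-dict values at any depth of a nested dictionary.

    Hypothetical helper for illustration; an empty dict contributes 0.
    """
    if isinstance(obj, dict):
        return sum(count_leaves(v) for v in obj.values())
    return 1


food_colors = {
    'fruit': {'orange': 'orange', 'apple': 'red', 'banana': 'yellow'},
    'vegetables': {'lettuce': 'green', 'beet': 'red', 'pumpkin': 'orange'},
}

# Two-level shortcut from the answer above (Python 3 spelling).
flat_count = sum(len(v) for v in food_colors.values())

# General walk that also handles deeper nesting.
deep_count = count_leaves(food_colors)
```

For the strict two-level layout in the question, the one-line `sum(len(v) ...)` does less work per element and is likely the faster of the two; the recursive version trades some speed for generality and is limited by Python's recursion depth on very deep structures.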

zwol answered Nov 12 '22 20:11


For your specific question, you can just use this:

>>> d = {'fruit':
...         {'orange': 'orange', 'apple': 'red', 'banana': 'yellow'},
...      'vegetables':
...         {'lettuce': 'green', 'beet': 'red', 'pumpkin': 'orange'}}
>>> len(d)
2            # that is 1 reference for 'fruit' and 1 for 'vegetables'
>>> len(d['fruit'])
3            # 3 fruits listed...
>>> len(d['vegetables'])
3            # you thought of three of those...
>>> len(d['fruit'])+len(d['vegetables'])
6

While you can use the various tools that Python has to count the elements in this trivial dictionary, the more interesting and productive thing is to think about the structure of the data in the first place.

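The basic data structures of Python are lists, sets, tuples, and dictionaries. Lists, tuples, and dictionary values can 'hold', by reference, any nested version of themselves or of the other data structures. Set elements and dictionary keys must be hashable, so a set cannot contain a list, a dictionary, or another set.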

This list is a nested list:

>>> l = [1, [2, 3, [4]], [5, 6]]
>>> len(l)
3
>>> l[0]
1
>>> l[1]
[2, 3, [4]]
>>> l[2]
[5, 6]

The first element is the integer 1. Elements 1 and 2 are lists themselves. The same can be true of any other of the basic Python data structures. These are recursive data structures. You can print them with the pprint module.
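Because len() only sees the top level, counting every item in a nested list like this takes a walk over the structure. The following is a rough sketch under stated assumptions: `count_items` is a hypothetical helper name, the code targets Python 3, and only lists, tuples, and sets are treated as containers. `pprint.pformat` is the standard-library call that returns the pretty-printed text instead of printing it.

```python
from pprint import pformat


def count_items(obj):
    """Count the non-container items in nested lists, tuples, or sets.

    Hypothetical helper for illustration only.
    """
    if isinstance(obj, (list, tuple, set)):
        return sum(count_items(x) for x in obj)
    return 1


l = [1, [2, 3, [4]], [5, 6]]

top_level = len(l)          # only the outermost elements
all_items = count_items(l)  # every integer at any depth

# pformat returns the pretty-printed form as a string.
pretty = pformat(l)
```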

If you organize your dictionary a bit better, it is easier to extract information from it with Python's simplest tools:

>>> color = 'color'
>>> family = 'family'
>>> sensation = 'sensation'
>>> good_things = {
...     'fruit': {
...         'orange': {
...             color: 'orange',
...             family: 'citrus',
...             sensation: 'juicy'
...         },
...         'apple': {
...             color: ['red', 'green', 'yellow'],
...             family: 'Rosaceae',
...             sensation: 'woody'
...         },
...         'banana': {
...             color: ['yellow', 'green'],
...             family: 'musa',
...             sensation: 'sweet'
...         }
...     },
...     'vegetables': {
...         'beets': {
...             color: ['red', 'yellow'],
...             family: 'Chenopodiaceae',
...             sensation: 'sweet'
...         },
...         'broccoli': {
...             color: 'green',
...             family: 'kale',
...             sensation: 'The butter you put on it',
...         }
...     }
... }

Now the queries against that data make more sense:

>>> len(good_things)
2                        # 2 groups: fruits and vegetables
>>> len(good_things['fruit'])
3                        # three fruits cataloged
>>> len(good_things['vegetables'])
2                        # I can only think of two vegetables...
>>> print good_things['fruit']['apple']
{'color': ['red', 'green', 'yellow'], 'sensation': 'woody', 'family': 'Rosaceae'}
>>> len(good_things['fruit']['apple']['color'])
3                        # apples have 3 colors
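Aggregate queries over a structure like this can be sketched the same way. The example below is an illustration only: it uses a trimmed copy of the data, assumes Python 3, and introduces a hypothetical helper, `color_count`. The helper is there because some entries store 'color' as a single string and others as a list, and calling len() on a string would count its characters instead of colors.

```python
# Trimmed, illustrative copy of the structure above.
good_things = {
    'fruit': {
        'orange': {'color': 'orange', 'family': 'citrus'},
        'apple': {'color': ['red', 'green', 'yellow'], 'family': 'Rosaceae'},
        'banana': {'color': ['yellow', 'green'], 'family': 'musa'},
    },
    'vegetables': {
        'beets': {'color': ['red', 'yellow'], 'family': 'Chenopodiaceae'},
        'broccoli': {'color': 'green', 'family': 'kale'},
    },
}


def color_count(entry):
    """Number of colors for one item; a bare string counts as one color."""
    colors = entry['color']
    return len(colors) if isinstance(colors, list) else 1


# Items per top-level group, and the total across groups.
per_group = {group: len(items) for group, items in good_things.items()}
total_items = sum(per_group.values())

# Total colors across every item in every group.
total_colors = sum(
    color_count(entry)
    for items in good_things.values()
    for entry in items.values()
)
```

Storing 'color' as a list everywhere, even for a single color, would remove the need for the type check.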
dawg answered Nov 12 '22 19:11