Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Python with a nested dictionary should I check the type? What other approach can I use?

Disclaimer: Hello all Python masters and fans. I would like to thank everyone for their caring support and dear advises which helped me so much. I am a Python newbie who is trying to learn and advance whilst keeping in mind the importance of best practices. Here's a question in which I am seeking a swift way to avoid type checking as if there's one thing I learnt here, that it's not a good thing to do and there must be another way of doing it.

I am constructing a data-object to represent a site map. I want this in memory so I can quickly map URLs before querying the DB.

Each node must have 2 elements. A name (of website's section) and an ID (its ID in the DB) (4 to 8 digits normally but here represented with one digit only).

If this node has children (on the page), it has Name, ID and another dictionary representing the children.

I have decided to use the following for performance, ease of iteration and memory reasons: I tried in the past only lists [name, id, [name, id, ..]], dictionaries and I think this is a not-such a bad way.

sitemap = {'section_one': 0,
           'section_two': [1, {'c_sect_2_1': 10,
                         'c_sect_2_2': [11, {'c_sect_2_2_1': 110,
                                           'c_sect_2_2_2': 111,
                                           'c_sect_2_2_3': 112}],
                          'c_sect_2_3': 12,
                          'c_sect_2_4': 13}],
           'section_three': 2,
           'section_four': 3,
           'section_five': 4}

I opted for lists because I might need to modify them (hence no tuples) I am using dictionaries (hashable) and I can easily check if they contain a section.

Using this data-set, with below function, I map a URL (e.g. /section_two/c_sect_2_2/c_sect_2_2_3) and see if it exists or not to fetch the data from the DB. My function for this:

def map_url(url): #url here is a list e.g. ['section_two', 'c_sect_2_2', 'c_sect_2_2_3']
    sm = sitemap
    for e in url:
        if e in sm:
            if isinstance(sm[e], int):
                return sm[e] #e = where it stops matching due to no children
            sm = sm[e][1] #if not INT it's a list. list[1] has another dict to go-through
    return False #the URL could not be matched-mapped

My questions are:

  1. Instead of checking if value of an item in the dictionary is an integer to see whether it has children or not, what should I do? What can I do?
  2. What might be an alternative to this whole thing? (the way data-structure is built and/or the iteration through it)

I need this way of url mapping as my website can have a lot of nested sections and I don't want to query the DB multiple times just to see if it exists or not.

Finally, I thank you all for your precious times and advises.

like image 963
Phil Avatar asked Dec 26 '22 19:12

Phil


2 Answers

Instead of checking if value of an item in the dictionary is an integer to see whether it has children or not, what should I do? What can I do?

The problem seems to be that you use different representations for sections with children and for sections without children. A section without children should really just be a section with an empty list of children:

sitemap = {'section_one': [0, {}],
           'section_two': [1, {'c_sect_2_1': [10, {}],
                               'c_sect_2_2': [11, {'c_sect_2_2_1': [110, {}],
                                                   'c_sect_2_2_2': [111, {}],
                                                   'c_sect_2_2_3': [112, {}]}],
                               'c_sect_2_3': [12, {}],
                               'c_sect_2_4': [13, {}]}],
           'section_three': [2, {}],
           'section_four': [3, {}],
           'section_five': [4, {}]}

Now your code should become a bit simpler.

What might be an alternative to this whole thing? (the way data-structure is built and/or the iteration through it)

You could transform the sitemap into a flat dictionary at the start of your program, so that it becomes something like

flat_sitemap = { 
    'section_one': 0,
    'section_two': 1,
    'section_two/c_sect_2_1': 10,
       # ...
    'section_two/c_sect_2_2/c_sect_2_2_1': 110
       # ...
    }

That way your queries would work in expected O(1) time at the cost of a higher space usage.

As for processing the original structure in a different way, you could use recursion. I often find it easier to formulate an algorithm on a tree-like structure in a recursive way, but it depends a bit on your way of thinking. Here's an example (I'm assuming the format of sitemap that is shown in my first sample):

def map_url(url, sm=[None, sitemap]):
    if not url: return sm[0]
    if url[0] not in sm[1]: return False
    return map_url(url[1:], sm[1][url[0]])

print map_url(['section_two', 'c_sect_2_2', 'c_sect_2_2_3']) # => 112
print map_url(['section_two', 'c_sect_2_2'])                 # => 10
print map_url(['section_two', 'notexisting'])                # => False
print map_url([])                                            # => None

As you can see, this makes the special case explicit where you pass an empty URL. You should definitely think about what should happen in that particular case.

You can even leave the second line of the function away. In that case a KeyError would be thrown if an URL cannot be matched (which seems sensible as well).

like image 84
Niklas B. Avatar answered Dec 29 '22 07:12

Niklas B.


A more consistent approach would be to always use the combination of id and dict, but use an empty dict if there are no children.

sitemap = {'section_one': [0, {}],
           'section_two': [1, {'c_sect_2_1': [10,{}],
                         'c_sect_2_2': [11, {'c_sect_2_2_1': [110,{}],
                                           'c_sect_2_2_2': [111,{}],
                                           'c_sect_2_2_3': [112,{}]}],
                          'c_sect_2_3': [12,{}],
                          'c_sect_2_4': [0,{}]}],
           'section_three':[2,{}],
           'section_four': [3,{}],
           'section_five': [4,{}]}

That way you can always check if a remaining url part is in the child dict.

like image 32
Michael Mauderer Avatar answered Dec 29 '22 07:12

Michael Mauderer