
Random "int is not subscriptable" behaviour

I'm reading a valid JSON file (nested 5 levels deep), then adding some data to it, and subsequently trying to use that data for some calculations.

I'm getting "'int' object is not subscriptable" errors, seemingly at random, and I can't wrap my head around it. Casting to str() doesn't help, printing with pprint doesn't alleviate it, and casting to int() on input doesn't help either. I'm desperately running out of options...

main function

with open(rNgram_file, 'r', encoding='utf-8') as ngram_file:
    data = json.load(ngram_file)
    data = rank_items(data)
    data = probability_items(data)

rank_items(data)

All the values are counted at the fifth nesting level and summed working upwards in the tree. I added the int() cast as a possible solution, but that didn't help. The issue occurs when reading x_grams['_rank'].

for ngram, one_grams in data.items():
        ngram_rank = 0
        for one_gram, two_grams in one_grams.items():
            one_gram_rank = 0
            [..]
                    for four_gram, values in four_grams.items():
                        # 4gram = of, values = 34
                        three_gram_rank += values
                    four_grams['_rank'] = int(three_gram_rank)
                    two_gram_rank += three_gram_rank
            [..]
            two_grams['_rank'] = int(one_gram_rank)
            ngram_rank += one_gram_rank
        one_grams['_rank'] = int(ngram_rank)

probability_items(data)

This is where the errors occur. Seemingly at random, it complains that int is not subscriptable wherever x_rank or x_grams['_rank'] is printed or assigned, even when the value is only evaluated with type() (which, if it works, says <class 'int'>). I marked the most common lines with a comment below. Weirdly enough, lines 2 and 3 never raise an exception...

    for ngram, one_grams in data.items():
        ngram_rank = int(one_grams['_rank'])               # never gives an error
        print("NgramRank: ", str(ngram_rank))              # never gives an error
        if ngram != '_rank':
            for one_gram, two_grams in one_grams.items():
                pprint(type(two_grams['_rank']))             # common error point
                one_gram_rank = str(two_grams['_rank'])      # never reaches this
                if one_gram != '_rank':
                    for two_gram, three_grams in two_grams.items():
                        pprint(type(three_grams['_rank']))   # common error point
                        pprint(str(three_grams['_rank']))    # never reaches this
                        two_gram_rank = str(three_grams['_rank'])
                        [..]
                    one_gram_prob = int(one_gram_rank) / int(ngram_rank)
                    two_grams['_prob'] = one_gram_prob
            ngram_prob = int(ngram_rank) / int(ngram_rank)
            one_grams['_prob'] = ngram_prob

At random, an exception is thrown at one of the common error points above, so the lines below them are never reached. But if you delete the common error points, the lines below become the error points. And sometimes it runs all the way through the innermost for loop, printing <class 'int'> for each evaluation, until it halts at an exception.

I have no clue what's happening. I don't even understand how this error can occur when I'm only evaluating the value with type().

Since this is a weird issue, and I'm obviously making a weird mistake, I put all the code in a gist here: https://gist.github.com/puredevotion/7922480

Hope someone can help!

TraceBack details

['Traceback (most recent call last):\n', '  File "Ngram_ranking.py", line 121, in probability_items\n    pprint(type(four_grams[\'_rank\']))\n', "TypeError: 'int' object is not subscriptable\n"]

*** extract_tb:
[('Ngram_ranking.py', 121, 'probability_items', "pprint(type(four_grams['_rank']))")]

*** format_tb:
['  File "Ngram_ranking.py", line 121, in probability_items\n    pprint(type(four_grams[\'_rank\']))\n']

*** tb_lineno: 121
Exception in on line 121: pprint(type(four_grams['_rank'])): 'int' object is not subscriptable

TraceBack for line 115

['Traceback (most recent call last):\n', '  File "Ngram_ranking.py", line 115, in probability_items\n    pprint(type(three_grams[\'_rank\']))\n', "TypeError: 'int' object is not subscriptable\n"]

*** extract_tb:
[('Ngram_ranking.py', 115, 'probability_items', "pprint(type(three_grams['_rank']))")]

*** format_tb:
['  File "Ngram_ranking.py", line 115, in probability_items\n    pprint(type(three_grams[\'_rank\']))\n']

*** tb_lineno: 115
Exception in on line 115: pprint(type(three_grams['_rank'])): 'int' object is not subscriptable

pprint(data) at the top of probability_items(data)

{'aesthetic': {'_rank': 290,
           'feeling': {'_rank': 10,
                       'the': {'_rank': 10,
                               'feeling': {'_rank': 10, 'of': 10}}},
           'perception': {'_rank': 280,
                          'and': {'_rank': 190,
                                  'the': {'_rank': 190,
                                          'design': 15,
                                          'environment': 5,
                                          'music': 100,
                                          'painting': 15,
                                          'work': 5,
                                          'works': 50}},
                          'of': {'_rank': 90,
                                 'the': {'_rank': 50,
                                         'work': 30,
                                         'world': 20},
                                 'their': {'_rank': 40, 'female': 40}}}}}
puredevotion asked Oct 03 '22 04:10



1 Answer

The problem is that you have a multi-level nested dictionary and you replicate the same code for every level, even though the levels are not uniform.

I'll just take some part of your dictionary

{
'aesthetic': 
    {
    '_rank': 290,
    'feeling': 
        {
        '_rank': 10,
        'the': 
            {
            '_rank': 10,
            'feeling': 
                {
                '_rank': 10, 
                'of': 10
                }
            }
         },
    }
}

Your top level dictionary is uniform as the value (for key aesthetic) is always a dictionary. But the lower levels also have ints as some of their values.

Thus when you do

for ngram, one_grams in data.items():

you have ngram='aesthetic' and one_grams={the dictionary}

int(one_grams['_rank'])

will always work, as the value dictionary has a _rank element, so you never get an error here.

Now we move to the next step

for one_gram, two_grams in one_grams.items():

Calling .items() on the one_grams dictionary yields these (one_gram, two_grams) pairs:

[('_rank', 290), ('feeling', {'_rank': 10, 'the': {'_rank': 10, 'feeling': {'_rank': 10, 'of': 10}}})]

Notice two_grams is an int for the first entry and a dict for the second. Since you iterate over the entire items() while doing

two_grams['_rank']

you run into the error (which tells you that you've hit an int when dict was expected). The same problem occurs in the inner loops.
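As a minimal sketch of that failure, the snippet below applies the same lookup to every value of a cut-down one_grams dictionary and records which keys raise TypeError. The helper name find_failing_keys is hypothetical and exists only for illustration.

```python
# Cut-down copy of the 'aesthetic' entry from the question's data.
one_grams = {
    '_rank': 290,
    'feeling': {'_rank': 10, 'the': {'_rank': 10, 'feeling': {'_rank': 10, 'of': 10}}},
}


def find_failing_keys(mapping):
    """Return the keys whose value cannot be subscripted with '_rank'."""
    failing = []
    for key, value in mapping.items():
        try:
            value['_rank']
        except TypeError:
            # Raised when value is an int rather than a dict.
            failing.append(key)
    return failing


failing_keys = find_failing_keys(one_grams)
print(failing_keys)  # ['_rank']
```

Only the '_rank' entry fails, because it is the only entry at this level whose value is an int.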

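Before Python 3.7, dictionaries are not guaranteed to be ordered, so items() can return the entries in any order, and with string hash randomization (on by default since Python 3.3) that order can change between runs. Thus _rank may come first, or after other entries whose values are dictionaries. In the latter case you descend into the inner for loops and encounter the same problem there, which is why the error appears to move around at random.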
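To illustrate how the iteration order decides where the loop stops, here is a small sketch. Python 3.7 and later keep insertion order, so the two possible orders are simulated with explicit lists of pairs instead of relying on dictionary behaviour. The function name entries_processed_before_error is made up for this example.

```python
inner = {'_rank': 10, 'the': {'_rank': 10, 'feeling': {'_rank': 10, 'of': 10}}}


def entries_processed_before_error(pairs):
    """Count how many (key, value) pairs are handled before value['_rank'] fails."""
    processed = 0
    for key, value in pairs:
        try:
            value['_rank']
        except TypeError:
            break
        processed += 1
    return processed


# Two orders in which the same entries could be returned.
rank_first = [('_rank', 290), ('feeling', inner)]
rank_last = [('feeling', inner), ('_rank', 290)]

print(entries_processed_before_error(rank_first))  # 0
print(entries_processed_before_error(rank_last))   # 1
```

With '_rank' first the loop fails immediately; with '_rank' last it gets through the nested dictionary before failing, which matches the runs that print <class 'int'> a few times and then halt.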

You can skip the _rank keys while iterating

for one_gram, two_grams in one_grams.items():
    if one_gram == '_rank':
        continue

in all the loops.
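A possible way to apply this at every level without repeating the loop is a small recursive helper. This is only a sketch, not the code from the gist: add_probabilities is a hypothetical name, and the isinstance check is an addition that also skips the plain integer counts at the deepest level. It assumes, as in the question, that each '_prob' is the entry's own _rank divided by its parent's _rank.

```python
data = {
    'aesthetic': {
        '_rank': 290,
        'feeling': {'_rank': 10, 'the': {'_rank': 10, 'feeling': {'_rank': 10, 'of': 10}}},
    }
}


def add_probabilities(node, parent_rank):
    """Store node['_rank'] / parent_rank under '_prob' in every nested dict."""
    for key, child in node.items():
        if key == '_rank' or not isinstance(child, dict):
            continue  # skip the bookkeeping entry and plain integer counts
        add_probabilities(child, node['_rank'])
    # Added only after the loop, so this dict is not resized while iterating.
    node['_prob'] = node['_rank'] / parent_rank


for ngram, one_grams in data.items():
    # The top level acts as its own parent, so its probability is 1.0.
    add_probabilities(one_grams, one_grams['_rank'])
```

After the call, data['aesthetic']['feeling']['_prob'] holds 10 / 290 and the integer counts such as 'of': 10 are left untouched.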

RedBaron answered Oct 04 '22 19:10
