
How can I count the occurrences of each word in a document using a dictionary comprehension?

I have a list of lists in Python that holds text. Each inner list is the set of words from one document, so there is one list per document and one outer list for all documents.

Each inner list contains only unique words. My goal is to count how many times each word occurs across all the documents. I can do this successfully with the code below:

term_appearance = {}  # word -> number of documents that contain it
for x in texts_list:
    for l in x:
        if l in term_appearance:
            term_appearance[l] += 1
        else:
            term_appearance[l] = 1

But I want to do the same with a dictionary comprehension. This is my first attempt at writing one, and based on existing Stack Overflow posts I came up with the following:

from collections import defaultdict
term_appearance = defaultdict(int)

{{term_appearance[l] : term_appearance[l] + 1 if l else term_appearance[l] : 1 for l in x} for x in texts_list}

Previous post for reference:

Simple syntax error in Python if else dict comprehension

As suggested in the above post, I also tried the following code:

{{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list}

The above code produced empty lists but ultimately raised the following traceback:

[]
[]
[]
[]
Traceback (most recent call last):
  File "term_count_fltr.py", line 28, in <module>
    {{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list}
  File "term_count_fltr.py", line 28, in <setcomp>
    {{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list}
TypeError: unhashable type: 'dict'

Any help in improving my current understanding would be much appreciated.

Looking at the above error, I also tried:

[{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list]

This ran without any error, but the output was only empty lists.

Pappu Jha asked Oct 08 '15 03:10



1 Answer

As explained in the other answers, the issue is that a dictionary comprehension creates a new dictionary, and you don't get a reference to that dictionary until after it has been created. So a dictionary comprehension cannot keep running counts the way your loop does.
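To make this concrete, here is a minimal sketch of what the comprehension from the question does. The names term_appearance and texts_list come from the question; the sample data is made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: one list of unique words per document.
texts_list = [["a", "b"], ["a", "c"]]

term_appearance = defaultdict(int)

# The comprehension only reads term_appearance; it never writes a count
# back, so every lookup sees the default value 0.
per_document = [{l: term_appearance[l] + 1 if l else 1 for l in x}
                for x in texts_list]

print(per_document)           # every word maps to 1 in its own dict
print(dict(term_appearance))  # keys were created by the lookups, all still 0
```

Each inner comprehension builds a separate dictionary, and nothing ever accumulates across documents.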

Given that, you are re-implementing what collections.Counter already does. You could simply use Counter. Example -

from collections import Counter
term_appearance = Counter()
for x in texts_list:
    term_appearance.update(x)

Demo -

>>> l = [[1,2,3],[2,3,1],[5,4,2],[1,1,3]]
>>> from collections import Counter
>>> term_appearance = Counter()
>>> for x in l:
...     term_appearance.update(x)
...
>>> term_appearance
Counter({1: 4, 2: 3, 3: 3, 4: 1, 5: 1})

If you really want to do this in some kind of comprehension, you can do:

from collections import Counter
term_appearance = Counter()
[term_appearance.update(x) for x in texts_list]

Demo -

>>> l = [[1,2,3],[2,3,1],[5,4,2],[1,1,3]]
>>> from collections import Counter
>>> term_appearance = Counter()
>>> [term_appearance.update(x) for x in l]
[None, None, None, None]
>>> term_appearance
Counter({1: 4, 2: 3, 3: 3, 4: 1, 5: 1})

The output [None, None, None, None] is the list built by the list comprehension, echoed because this was run interactively. If you run it in a script with python <script>, that list is simply discarded.


You can also use itertools.chain.from_iterable() to flatten texts_list into a single iterable and pass that to Counter. Example:

from collections import Counter
from itertools import chain
term_appearance = Counter(chain.from_iterable(texts_list))

Demo -

>>> from collections import Counter
>>> from itertools import chain
>>> term_appearance = Counter(chain.from_iterable(l))
>>> term_appearance
Counter({1: 4, 2: 3, 3: 3, 4: 1, 5: 1})
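As a quick sanity check, the sketch below compares the manual loop from the question with both Counter approaches on word data. The sample words are hypothetical, and the variable names manual, updated and flattened are only for illustration.

```python
from collections import Counter
from itertools import chain

# Hypothetical sample data: one list of unique words per document.
texts_list = [["red", "green"], ["green", "blue"], ["green", "red"]]

# Manual loop, as in the question.
manual = {}
for x in texts_list:
    for l in x:
        if l in manual:
            manual[l] += 1
        else:
            manual[l] = 1

# Counter updated one document at a time.
updated = Counter()
for x in texts_list:
    updated.update(x)

# Counter built from the flattened iterable.
flattened = Counter(chain.from_iterable(texts_list))

print(manual == dict(updated) == dict(flattened))  # True
print(flattened.most_common(1))                    # [('green', 3)]
```

All three give the same counts, so the choice is mostly about readability.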

Also, there is another issue in the line from your traceback -

{{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list}

This is actually a set comprehension with a dictionary comprehension nested inside.

This is why you get the error TypeError: unhashable type: 'dict'. The inner dictionary comprehension runs first and creates a dict, and the outer set comprehension then tries to add that dict to the set. Dictionaries are not hashable, hence the error.
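A minimal sketch of the same failure in isolation, using made-up data: collecting dicts into a set fails, while collecting them into a list works, which is why the square-bracket version in the question ran without an error.

```python
# Hypothetical sample data: two small per-document dicts.
dicts = [{"a": 1}, {"b": 1}]

try:
    as_set = {d for d in dicts}   # set elements must be hashable
    error_message = None
except TypeError as exc:
    error_message = str(exc)

as_list = [d for d in dicts]      # a list accepts any object

print(error_message)  # the message mentions: unhashable type: 'dict'
print(as_list)
```

The exact wording of the message can vary between Python versions, but it is a TypeError in each case.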

Anand S Kumar answered Oct 05 '22 00:10
