finding frequencies of pair items in a list of pairs

Let's say I have a long list of this type:

text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b'], ... ]

For each first element, I want to build a dictionary that counts how often each second element appears with it. For the example above, I'd like to get something like this:

{'a': {'b':2, 'd':1},
 'w': {'a':1}
}

Here's my unsuccessful attempt. I built a list of the unique first elements, called words, and then ran:

dic = {}

for word in words:
  inner_dic = {}
  for pair in text:
    if pair[0] == word:
      num = text.count(pair)
      inner_dic[pair[1]] = num
  dic[pair[0]] = inner_dic

I get an obviously erroneous result. One problem with the code is that it overcounts pairs. I am not sure how to solve this.

Morteza R asked Dec 05 '22 04:12


2 Answers

You should do this instead:

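dic = {}

for word in words:
  inner_dic = {}
  for pair in text:
    if pair[0] == word:
      # count how many times this exact pair occurs in text
      num = text.count(pair)
      inner_dic[pair[1]] = num
  # key on word: after the inner loop, pair is just the last pair in text
  dic[word] = inner_dic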

That is, use dic[word] rather than dic[pair[0]]. When the inner loop finishes, pair is always the last pair in text, so dic[pair[0]] stores every inner_dic under the first element of that last pair, even when pair[0] isn't word.
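
As a quick check of this fix, the sketch below runs the corrected loop on the four-pair sample from the question. How words was built is not shown in the question, so the set comprehension here is an assumption.

```python
# Sample data from the question (the elided "..." entries are left out).
text = [['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]

# Assumption: the unique first elements are collected like this.
words = sorted({pair[0] for pair in text})

dic = {}
for word in words:
    inner_dic = {}
    for pair in text:
        if pair[0] == word:
            # list.count compares whole pairs, so a repeated pair
            # just stores the same total again.
            inner_dic[pair[1]] = text.count(pair)
    dic[word] = inner_dic  # key on the current word, not the last pair seen

print(dic)  # {'a': {'b': 2, 'd': 1}, 'w': {'a': 1}}
```

Note that text.count(pair) rescans the whole list on every call, so this approach gets slow on a long list.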

rlms answered Dec 06 '22 19:12


The collections module makes short work of tasks like this.

Use a Counter for the counting part (it is a kind of dictionary that returns 0 for missing keys, making it easy to use += 1 for incrementing counts). Use a defaultdict for the outer dict (it automatically makes a new Counter for each new first element):

>>> from collections import defaultdict, Counter
>>> d = defaultdict(Counter)
>>> text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]
>>> for first, second in text:
...     d[first][second] += 1
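
To inspect the result, something like the following should work. It is a minimal sketch on the same sample data; the name plain is only illustrative.

```python
from collections import Counter, defaultdict

text = [['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]

d = defaultdict(Counter)
for first, second in text:
    d[first][second] += 1

# Convert the nested defaultdict/Counter into plain dicts for display.
plain = {first: dict(counts) for first, counts in d.items()}
print(plain)  # {'a': {'b': 2, 'd': 1}, 'w': {'a': 1}}

# A Counter reports 0 for a key it has never seen, without raising KeyError.
print(d['a']['z'])  # 0
```

The whole list is walked only once, so the work grows linearly with the number of pairs.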

Here is the equivalent using regular dictionaries:

text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]

d = {}
for first, second in text:
    if first not in d:
        d[first] = {}
    inner_dict = d[first]
    if second not in inner_dict:
        inner_dict[second] = 0
    inner_dict[second] += 1
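
If the explicit membership tests feel verbose, the same loop can probably be shortened with dict.setdefault and dict.get. This is a sketch of one alternative, not part of the original answer:

```python
text = [['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]

d = {}
for first, second in text:
    # setdefault returns the existing inner dict, or stores and returns a new one.
    inner_dict = d.setdefault(first, {})
    # get supplies 0 when this second element has not been counted yet.
    inner_dict[second] = inner_dict.get(second, 0) + 1

print(d)  # {'a': {'b': 2, 'd': 1}, 'w': {'a': 1}}
```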

Either the short way or the long way works with the json module (both Counter and defaultdict are dict subclasses, so they can be JSON encoded).
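
For instance, a minimal sketch of encoding the nested counts with json.dumps (the sort_keys option is used here only to make the output order predictable):

```python
import json
from collections import Counter, defaultdict

text = [['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]

d = defaultdict(Counter)
for first, second in text:
    d[first][second] += 1

# json.dumps accepts dict subclasses such as defaultdict and Counter.
encoded = json.dumps(d, sort_keys=True)
print(encoded)  # {"a": {"b": 2, "d": 1}, "w": {"a": 1}}

# Decoding gives back ordinary dicts, not Counter or defaultdict objects.
decoded = json.loads(encoded)
print(type(decoded['a']))  # <class 'dict'>
```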

Hope this helps. Good luck with your text analysis :-)

Raymond Hettinger answered Dec 06 '22 19:12