Let's say I have a long list of this type:
text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b'], ... ]
Given the first elements, I want to construct a dictionary that would show a count of the second elements. For example in the particular example above, I'd like to have something like this:
{'a': {'b':2, 'd':1},
'w': {'a':1}
}
Here's how I unsuccessfully tried to solve it. I constructed a list of the unique first elements, called words, and then:
dic = {}
for word in words:
    inner_dic = {}
    for pair in text:
        if pair[0] == word:
            num = text.count(pair)
            inner_dic[pair[1]] = num
        dic[pair[0]] = inner_dic
I get an obviously erroneous result; among other things, the code seems to overcount pairs. I am not sure how to fix this.
You should do this instead:
dic = {}
for word in words:
    inner_dic = {}
    for pair in text:
        if pair[0] == word:
            num = text.count(pair)
            inner_dic[pair[1]] = num
        dic[word] = inner_dic
That is, you should be assigning to dic[word] rather than dic[pair[0]]. The latter assigns inner_dic to the first element of the last pair checked, even when pair[0] isn't word.
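With that one-line fix in place, the corrected loop runs end to end on the sample data. (The question doesn't show how words was built, so it is constructed here with a set; that part is an assumption.)

```python
text = [['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]

# Assumed construction of `words`: the unique first elements.
words = set(pair[0] for pair in text)

dic = {}
for word in words:
    inner_dic = {}
    for pair in text:
        if pair[0] == word:
            # list.count counts exact [first, second] pairs, so no overcounting
            inner_dic[pair[1]] = text.count(pair)
        dic[word] = inner_dic  # keyed by word, not pair[0]

# dic == {'a': {'b': 2, 'd': 1}, 'w': {'a': 1}}
```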
The collections module makes short work of tasks like this.
Use a Counter for the counting part (it is a kind of dictionary that returns 0 for missing values, making it easy to use +=1
for incrementing counts). Use defaultdict for the outer dict (it can automatically make a new counter for each "first" prefix):
>>> from collections import defaultdict, Counter
>>> d = defaultdict(Counter)
>>> text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]
>>> for first, second in text:
...     d[first][second] += 1
Here is the equivalent using regular dictionaries:
text = [['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]
d = {}
for first, second in text:
    if first not in d:
        d[first] = {}
    inner_dict = d[first]
    if second not in inner_dict:
        inner_dict[second] = 0
    inner_dict[second] += 1
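If you'd rather avoid the explicit membership tests, the same loop can be condensed with dict.setdefault and dict.get. This is just a variation on the plain-dict version above, not a different technique:

```python
text = [['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]

d = {}
for first, second in text:
    inner = d.setdefault(first, {})            # create the inner dict on first sight
    inner[second] = inner.get(second, 0) + 1   # the 0 default plays the role of Counter

# d == {'a': {'b': 2, 'd': 1}, 'w': {'a': 1}}
```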
Either the short way or the long way will work perfectly with the json module (both Counter and defaultdict are kinds of dicts that can be JSON encoded).
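For instance, a minimal sketch of that round trip, using the defaultdict/Counter version:

```python
import json
from collections import defaultdict, Counter

text = [['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]

d = defaultdict(Counter)
for first, second in text:
    d[first][second] += 1

# json.dumps accepts the nested defaultdict/Counter directly,
# since both are dict subclasses.
s = json.dumps(d, sort_keys=True)
# s == '{"a": {"b": 2, "d": 1}, "w": {"a": 1}}'
```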
Hope this helps. Good luck with your text analysis :-)