
Python: count tuple occurrences in a list

Is there a way to count how many times each tuple occurs in this list of tokens?

I have tried the count method but it does not work.

This is the list:

['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay']

These are the tuples of consecutive tokens built from the list:

('hello', 'how')
('how', 'are')
('are', 'you')
('you', 'doing')
('doing', 'today')
('today', 'are')
('you', 'okay')

I would like the result to be something like this:

('hello', 'how') 1
('how', 'are') 1
('are', 'you') 2
('you', 'doing') 1
('doing', 'today') 1
('today', 'are') 1
('you', 'okay') 1

MyTivoli asked Dec 13 '22 23:12


2 Answers

You can easily use a Counter for that. A generic function to count n-grams is the following:

from collections import Counter
from itertools import islice

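def count_ngrams(iterable, n=2):
    # Note: pass a re-iterable sequence such as a list; a one-shot iterator
    # would be consumed unevenly by the n islice views.
    return Counter(zip(*[islice(iterable, i, None) for i in range(n)]))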

This generates:

>>> count_ngrams(['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay'],2)
Counter({('are', 'you'): 2, ('doing', 'today'): 1, ('you', 'doing'): 1, ('you', 'okay'): 1, ('today', 'are'): 1, ('how', 'are'): 1, ('hello', 'how'): 1})
>>> count_ngrams(['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay'],3)
Counter({('are', 'you', 'okay'): 1, ('you', 'doing', 'today'): 1, ('are', 'you', 'doing'): 1, ('today', 'are', 'you'): 1, ('how', 'are', 'you'): 1, ('doing', 'today', 'are'): 1, ('hello', 'how', 'are'): 1})
>>> count_ngrams(['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay'],4)
Counter({('doing', 'today', 'are', 'you'): 1, ('today', 'are', 'you', 'okay'): 1, ('are', 'you', 'doing', 'today'): 1, ('how', 'are', 'you', 'doing'): 1, ('you', 'doing', 'today', 'are'): 1, ('hello', 'how', 'are', 'you'): 1})

Willem Van Onsem answered Dec 27 '22 09:12


This solution requires a third-party package (iteration_utilities, which provides the Iterable class) but should do what you want:

>>> from iteration_utilities import Iterable

>>> l = ['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay']

>>> Iterable(l).successive(2).as_counter()
Counter({('are', 'you'): 2,
         ('doing', 'today'): 1,
         ('hello', 'how'): 1,
         ('how', 'are'): 1,
         ('today', 'are'): 1,
         ('you', 'doing'): 1,
         ('you', 'okay'): 1})
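
If installing a package is not an option, a roughly equivalent sketch using only the standard library might be the following. It assumes Python 3.10 or later for itertools.pairwise; the fallback definition for older versions is an illustrative helper written for this example, not part of iteration_utilities.

```python
from collections import Counter
from itertools import tee

try:
    from itertools import pairwise  # available from Python 3.10
except ImportError:
    def pairwise(iterable):
        # Fallback for older versions: two independent iterators over the
        # input, with the second one advanced by a single element.
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

l = ['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay']

# Count every pair of successive tokens.
counts = Counter(pairwise(l))
print(counts.most_common(1))  # [(('are', 'you'), 2)]
```

Unlike slicing, pairwise also accepts one-shot iterators such as generators, but it only covers pairs, not longer n-grams.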

MSeifert answered Dec 27 '22 10:12