I'm very new to Python and I'm sure there is a much easier way to accomplish what I need but here goes.
I'm trying to create a program which performs frequency analysis on a list of letters called inputList
and retrieves the two-letter pairs, adding them to another dictionary. So I need it to populate a second dictionary with all the two-letter pairs.
I have a rough idea how I can do this, but I'm a bit stuck on the syntax to make it work.
    for bigram in inputList:
        bigramDict[str(bigram + bigram+1)] = 1
Where bigram+1 is the letter in the next iteration.
As an example, if I were to have the text "stackoverflow" in inputList,
I need it to first put the letters "st" as the key and 1 as the value, on the second iteration "ta" as the key, and so on. The problem I'm having is retrieving the value the variable will have on the next iteration without moving to the next iteration.
I hope I explained myself clearly. Thanks for your help.
A straightforward way to obtain n-grams for a sequence is slicing:
    def ngrams(seq, n=2):
        return [seq[i:i+n] for i in range(len(seq) - n + 1)]
Combine this with collections.Counter and you're ready:
    from collections import Counter
    print(Counter(ngrams("abbabcbabbabr")))
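Applied to the example from the question, this might look like the sketch below. It assumes inputList is a list of single-character strings (the question does not show how it is built). Slicing a list yields lists, which cannot be used as dictionary keys, so each pair is joined into a string before counting.

```python
from collections import Counter

def ngrams(seq, n=2):
    # Every window of n consecutive items, obtained by slicing.
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

# Assumption: inputList holds single-character strings, as in the question.
inputList = list("stackoverflow")

# Slices of a list are lists (unhashable), so join each pair into a string key.
bigramDict = Counter("".join(pair) for pair in ngrams(inputList))

print(bigramDict["st"])  # count for the first pair
print(len(bigramDict))   # number of distinct pairs
```

Counter is a dict subclass, so bigramDict can be used wherever a plain dictionary is expected.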
In case you need ngrams() to be lazy:
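    from collections import deque
    from itertools import islice
    def ngrams(it, n=2):
        it = iter(it)
        # Take only the first n items; deque(it, maxlen=n) would consume the whole iterator.
        deq = deque(islice(it, n), maxlen=n)
        yield tuple(deq)
        for p in it:
            # Appending to a full deque drops the oldest item, sliding the window.
            deq.append(p)
            yield tuple(deq)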
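A lazy version is useful when the input is a generator or stream that cannot be sliced. As an alternative to the deque approach, here is a minimal sketch built on itertools.tee and zip; the name ngrams_tee is made up for this example.

```python
from itertools import islice, tee

def ngrams_tee(iterable, n=2):
    # Make n independent iterators over the same input.
    iterators = tee(iterable, n)
    for offset, it in enumerate(iterators):
        # Advance the k-th copy by k items so the copies are staggered.
        next(islice(it, offset, offset), None)
    # zip stops as soon as the most-advanced copy is exhausted.
    return zip(*iterators)

# A generator has no len() and cannot be sliced, but it works here.
letters = (ch for ch in "stackoverflow")
first_three = list(islice(ngrams_tee(letters), 3))
print(first_three)
```

Only as many items as needed are pulled from the input, and an input shorter than n produces no n-grams at all.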
(See below for more elegant code for the latter).