
Getting next variable in a for loop

I'm very new to Python and I'm sure there is a much easier way to accomplish what I need but here goes.

I'm trying to create a program that performs frequency analysis on a list of letters called inputList, retrieves the two-letter pairs, and adds them to another dictionary. So I need it to populate a second dictionary with all the two-letter pairs.

I have a rough idea of how I can do this, but I am a bit stuck on the syntax to make it work.

for bigram in inputList:
    bigramDict[str(bigram + bigram+1)] =  1

Where bigram+1 is the letter in the next iteration

As an example, if I were to have the text "stackoverflow" in inputList, I need it to first put the letters "st" as the key and 1 as the value, then "ta" as the key on the second iteration, and so on. The problem I'm having is retrieving the value the variable will have on the next iteration without moving to the next iteration.

I hope I explained myself clearly. Thanks for your help

Xtrato asked Dec 15 '22 22:12



1 Answer

A straightforward way to obtain n-grams for a sequence is slicing:

def ngrams(seq, n=2):
    return [seq[i:i+n] for i in range(len(seq) - n + 1)]

Combine this with collections.Counter and you're ready:

from collections import Counter
print(Counter(ngrams("abbabcbabbabr")))
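
Applied to the setup in the question, the same idea might look like the sketch below. It assumes inputList is a list of single-character strings and reuses the bigramDict name from the question; count_bigrams is a hypothetical helper name, not something from the question or from any library. Pairing the list with a copy of itself shifted by one gives each letter together with the next one, which is the "next iteration" value the question asks about.

```python
from collections import Counter

def count_bigrams(input_list):
    # Hypothetical helper: zip pairs each letter with its successor,
    # so no index arithmetic or look-ahead inside the loop is needed.
    return Counter(a + b for a, b in zip(input_list, input_list[1:]))

# Assumption: inputList holds one letter per element, as in the question.
inputList = list("stackoverflow")
bigramDict = count_bigrams(inputList)

print(bigramDict["st"], bigramDict["ta"])
```

A Counter is a dict subclass, so bigramDict can be used anywhere a plain dictionary is expected; pairs that occur more than once are counted rather than reset to 1.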

In case you need ngrams() to be lazy:

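from collections import deque
from itertools import islice

def ngrams(it, n=2):
    it = iter(it)
    # Fill the first window with only n items; deque(it, maxlen=n)
    # would consume the whole iterator and keep just the last n.
    deq = deque(islice(it, n), maxlen=n)
    if len(deq) < n:
        return  # input shorter than n: no n-grams
    yield tuple(deq)
    for p in it:
        # Appending to a full deque drops the oldest item.
        deq.append(p)
        yield tuple(deq)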

(See below for more elegant code for the latter).
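
One possible shorter lazy variant, sketched here with itertools.tee, is shown below. It is not necessarily the code the note above refers to, and ngrams_tee is a hypothetical name chosen for this example. The n copies of the iterator are staggered by 0, 1, ..., n-1 items and zipped together, and the resulting tuples are joined into strings so they can serve as the string keys the question asks for.

```python
from collections import Counter
from itertools import islice, tee

def ngrams_tee(iterable, n=2):
    # Hypothetical helper: make n independent copies of the iterator.
    copies = tee(iterable, n)
    # Skip i items on the i-th copy so the copies are staggered.
    shifted = [islice(copy, i, None) for i, copy in enumerate(copies)]
    # zip stops at the shortest copy, so only full n-grams are produced.
    return zip(*shifted)

# Works on a one-shot iterator, not just on sequences that support slicing.
counts = Counter("".join(gram) for gram in ngrams_tee(iter("abbabcbabbabr")))
print(counts["ab"], counts["ba"])
```

Note that tee buffers items internally until every copy has consumed them, so for small n the memory use stays proportional to n rather than to the input length.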

georg answered Jan 01 '23 11:01
