NLTK makes it easy to compute bigrams of words. What about letters?

Question

I've seen tons of documentation all over the web about how Python's NLTK makes it easy to compute bigrams of words.

What about letters?

What I want to do is plug in a dictionary and have it tell me the relative frequencies of different letter pairs.

Ultimately I'd like to build some kind of Markov process to generate likely-looking (but fake) words.

miku · Accepted Answer

Here is an example using Counter from the collections module. It reports raw counts rather than relative frequencies, and it splits the text into consecutive, non-overlapping pairs:

#!/usr/bin/env python

import sys
from collections import Counter
from itertools import islice
from pprint import pprint

def split_every(n, iterable):
    """ yield consecutive, non-overlapping pieces of length n """
    i = iter(iterable)
    piece = ''.join(islice(i, n))
    while piece:
        yield piece
        piece = ''.join(islice(i, n))

def main(text):
    """ return ngram counts for text """
    freqs = Counter()
    for pair in split_every(2, text): # adjust n here
        freqs[pair] += 1
    return freqs

if __name__ == '__main__':
    # the first command-line argument is the path of the text file to analyze
    with open(sys.argv[1]) as handle:
        freqs = main(handle.read())
    pprint(freqs.most_common(10))

Usage:

$ python 14168601.py lorem.txt
[('t ', 32),
 (' e', 20),
 ('or', 18),
 ('at', 16),
 (' a', 14),
 (' i', 14),
 ('re', 14),
 ('e ', 14),
 ('in', 14),
 (' c', 12)]
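
For the relative frequencies the question asks about, one option is to count overlapping letter pairs inside each word and divide by the total. The following is only a minimal sketch under some assumptions: the function name letter_bigram_frequencies and the two-word sample list are made up for illustration, and a real run would read the words from a dictionary file instead.

```python
from collections import Counter

def letter_bigram_frequencies(words):
    """Return {bigram: relative frequency} for overlapping letter pairs within each word."""
    counts = Counter()
    for word in words:
        word = word.strip().lower()
        # zip(word, word[1:]) pairs each letter with its successor,
        # so pairs never span a word boundary
        counts.update(a + b for a, b in zip(word, word[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: n / total for bigram, n in counts.items()}

if __name__ == "__main__":
    sample = ["banana", "bandana"]  # hypothetical stand-in for a dictionary file
    freqs = letter_bigram_frequencies(sample)
    for bigram, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
        print(bigram, round(freq, 3))
```

Counting per word rather than over the raw file keeps spaces and newlines out of the bigrams, which is probably what you want when the input is a word list.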

vpekar · Answer

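If bigrams are all you need, you don't need NLTK. You can simply do it as follows:

from collections import Counter
text = "This is some text"
# zip(text, text[1:]) pairs each character with its successor, giving overlapping bigrams
bigrams = Counter(x+y for x, y in zip(*[text[i:] for i in range(2)]))
for bigram, count in bigrams.most_common():
    print(bigram, count)

Output (the order among bigrams with equal counts depends on the Python version):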

is 2
s  2
me 1
om 1
te 1
 t 1
 i 1
e  1
 s 1
hi 1
so 1
ex 1
Th 1
xt 1
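
As for the Markov process mentioned in the question, overlapping letter pairs can serve as transition counts for a first-order chain. What follows is a rough sketch, not a definitive implementation: build_transitions, generate_word, and the "^" / "$" boundary markers are names invented here, and the markers are assumed never to occur in the input words.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical boundary markers; assumed absent from the words

def build_transitions(words):
    """Map each character to a Counter of the characters that follow it."""
    transitions = defaultdict(Counter)
    for word in words:
        padded = START + word.strip().lower() + END
        for current, following in zip(padded, padded[1:]):
            transitions[current][following] += 1
    return transitions

def generate_word(transitions, rng=random, max_len=12):
    """Walk the chain from START until END is drawn or max_len letters are emitted."""
    letters = []
    current = START
    while len(letters) < max_len:
        followers = transitions[current]
        if not followers:
            break  # no observed successor, e.g. an empty word list
        # draw the next character in proportion to its observed count
        current = rng.choices(list(followers), weights=list(followers.values()))[0]
        if current == END:
            break
        letters.append(current)
    return "".join(letters)

if __name__ == "__main__":
    table = build_transitions(["banana", "bandana", "cabana"])  # made-up sample words
    print([generate_word(table) for _ in range(5)])
```

A chain that looks at only one previous letter tends to produce rough results; conditioning on the previous two or three letters (trigrams or longer) usually gives more word-like output at the cost of needing more training words.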

Tags:

python

nlp

nltk

n-gram

isthmuses
