Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Markov chain on letter scale and random text

I would like to generate a random text using letter frequencies from a book in a .txt file, so that each new character (string.lowercase + ' ') depends on the previous one.

How do I use Markov chains to do so? Or is it simpler to use 27 arrays with conditional frequencies for each letter?

like image 519
Julia Avatar asked Jan 18 '23 22:01

Julia


1 Answers

I would like to generate a random text using letter frequencies from a book in a txt file

Consider using collections.Counter to build-up the frequencies when looping over the text file two letters at a time.

How do I use markov chains to do so? Or is it simpler to use 27 arrays with conditional frequencies for each letter?

The two statements are equivalent. The Markov chain is what you're doing. The 27 arrays with conditional frequencies is how you're doing it.

Here is some dictionary based code to get you started:

from collections import defaultdict, Counter
from itertools import ifilter
from random import choice, randrange

def pairwise(iterable):
    it = iter(iterable)
    last = next(it)
    for curr in it:
        yield last, curr
        last = curr

valid = set('abcdefghijklmnopqrstuvwxyz ')

def valid_pair((last, curr)):
    return last in valid and curr in valid

def make_markov(text):
    markov = defaultdict(Counter)
    lowercased = (c.lower() for c in text)
    for p, q in ifilter(valid_pair, pairwise(lowercased)):
        markov[p][q] += 1
    return markov

def genrandom(model, n):
    curr = choice(list(model))
    for i in xrange(n):
        yield curr
        if curr not in model:   # handle case where there is no known successor
            curr = choice(list(model))
        d = model[curr]
        target = randrange(sum(d.values()))
        cumulative = 0
        for curr, cnt in d.items():
            cumulative += cnt
            if cumulative > target:
                break

model = make_markov('The qui_.ck brown fox')
print ''.join(genrandom(model, 20))
like image 91
Raymond Hettinger Avatar answered Jan 28 '23 16:01

Raymond Hettinger