i've been trying the last couple days to convert this js script to python code.
My implementation (blindfull cp mostly, some minor fixes here and there) so far:
import random
class markov:
memory = {}
separator = ' '
order = 2
def getInitial(self):
ret = []
for i in range(0, self.order, 1):
ret.append('')
return ret
def breakText(self, txt, cb):
parts = txt.split(self.separator)
prev = self.getInitial()
def step(self):
cb(prev, self.next)
prev.shift()#Javascript function.
prev.append(self.next)
#parts.forEach(step) # - step is the function above.
cb(prev, '')
def learn(self, txt):
mem = self.memory
def learnPart(key, value):
if not mem[key]:
mem[key] = []
mem[key] = value
return mem
self.breakText(txt, learnPart)
def step(self, state, ret):
nextAvailable = self.memory[state] or ['']
self.next = nextAvailable[random.choice(nextAvailable.keys())]
if not self.next:
return ret
ret.append(next)
nextState = state.slice(1)
return self.step(nextState, ret)
def ask(self, seed):
if not seed:
seed = self.genInitial()
seed = seed + self.step(seed, []).join(self.separator)
return seed
Issues:
I have absolutely no knowledge of javascript.
When i try to "learn" some text to a "markov" class object [e.g.: a=markov(); a.learn("sdfg");] i get the following error: "TypeError: unhashable type: 'list'", for the "mem" dictionary at the "learnPart" function, member of the "learn" function.
So my question so far is why does this exception [TypeError for a list object, falsely referring to a dictionary object (which is hashable)] occur?
thanks in advance for any suggestions, directions, points, help in general :D
Guy who wrote the article speaking. Glad you found it useful! Now, my first implementation of a Markov chain was actually in Python, so this answer will focus on how to write it in a more Pythonic way. I'll show how to go about making an order-2 Markov chain, since they're easy to talk about, but you can of course make it order-N with some modifications.
In js, the two prominent data structures are the generic object and the array (which is an extension to the generic object). In Python however, you have other options for more finely-grained control. Here're the major differences in the two implementations:
A state in our chain is really a tuple - an immutable, ordered structure, with a fixed amount of elements. We always want n
elements (in this case, n=2
) and their order has meaning.
Manipulating the memory will be easier if we use a defaultdict wrapping a list, so we can skip the "checking if a state doesn't exist, and then doing X", and instead just do X.
So, we stick a from collections import defaultdict
at the top and change how markov.memory
is defined:
memory = defaultdict(list)
Now we change markov.getInitial
to return a tuple (remember this explains an order-2 chain):
def getInitial(self):
return ('', '')
(if you want to expand it further, you can use a really neat Python trick: tuple([''] * 2)
will return the same thing. Instead of empty strings, you can use None
)
We'll get to changing what uses genInitial
in a bit.
A strong concept which doesn't exist in js (yet) but does exist in Python is the yield
operator (see this question for great explanations).
Another feature of Python is its generic for
loop. You can go over nearly anything quite easily, including generators (functions which use yield
). Combining the two, and we can redefine breakText
:
def breakText(self, txt):
#our very own (ε,ε)
prev = self.getInitial()
for word in txt.split(self.separator):
yield prev, word
#will be explained in the next paragraph
prev = (prev[1], word)
#end-of-sentence, prev->ε
yield prev, ''
The magic part above, prev = (prev[1], word)
can be explained best by example:
>>> a = (0, 1)
>>> a
(0, 1)
>>> a = (a[1], 2)
>>> a
(1, 2)
That's how we advance through the word list. And now we move up to what uses breakText
, to the redefinition of markov.learn
:
def learn(self, txt):
for part in self.breakText(txt):
key = part[0]
value = part[1]
self.memory[key].append(value)
Because our memory is a defaultdict
, we don't have to worry about the key not existing.
OK, we have half of the chain implemented, time to see it in action! What we have so far:
from collections import defaultdict
class Markov:
memory = defaultdict(list)
separator = ' '
def learn(self, txt):
for part in self.breakText(txt):
key = part[0]
value = part[1]
self.memory[key].append(value)
def breakText(self, txt):
#our very own (ε,ε)
prev = self.getInitial()
for word in txt.split(self.separator):
yield prev, word
prev = (prev[1], word)
#end-of-sentence, prev->ε
yield (prev, '')
def getInitial(self):
return ('', '')
(I changed the class name from markov
to Markov
because I cringe every time a class begins with a lowercase letter). I saved it as brain.py
and loaded up Python.
>>> import brain
>>> bob = brain.Markov()
>>> bob.learn('Mary had a little lamb')
>>> bob.memory
defaultdict(<class 'list'>, {('had', 'a'): ['little'], ('Mary', 'had'): ['a'], ('', ''): ['Mary'], ('little', 'lamb'): [''], ('a', 'little'): ['lamb'], ('', 'Mary'): ['had']})
Success! Let's look at the result more carefully, to see that we got it right:
{ ('', ''): ['Mary'],
('', 'Mary'): ['had'],
('Mary', 'had'): ['a'],
('a', 'little'): ['lamb'],
('had', 'a'): ['little'],
('little', 'lamb'): ['']}
zips up Ready to drive on? We still have to use this chain!
step
functionWe've already met what we need to remake step
. We have the defaultdict, so we can use random.choice
right away, and I can cheat a bit because I know the order of the chain. We can also get rid of the recursion (with some sorrow), if we see it as a function which takes a single step through the chain (my bad in the original article - a badly named function).
def step(self, state):
choice = random.choice(self.memory[state] or [''])
if not choice:
return None
nextState = (state[1], choice)
return choice, nextState
I regretfully added the or ['']
because random.choice
moans about empty lists. Finally, we move a larger portion of the logic to ask
(the actual construction of the sentence):
def ask(self, seed=False):
ret = []
if not seed:
seed = self.getInitial()
while True:
link = self.step(seed)
if link is None:
break
ret.append(link[0])
seed = link[1]
return self.separator.join(ret)
Yes, a bit yucky. We could have given step
a better name and made it a generator, but I'm late for a meeting with my pregnant wife who's about to give birth to a baby who left the stove on fire in my car that's being towed! I better hurry!
But first, a talk with bob:
from collections import defaultdict
import random
class Markov:
memory = defaultdict(list)
separator = ' '
def learn(self, txt):
for part in self.breakText(txt):
key = part[0]
value = part[1]
self.memory[key].append(value)
def ask(self, seed=False):
ret = []
if not seed:
seed = self.getInitial()
while True:
link = self.step(seed)
if link is None:
break
ret.append(link[0])
seed = link[1]
return self.separator.join(ret)
def breakText(self, txt):
#our very own (ε,ε)
prev = self.getInitial()
for word in txt.split(self.separator):
yield prev, word
prev = (prev[1], word)
#end-of-sentence, prev->ε
yield (prev, '')
def step(self, state):
choice = random.choice(self.memory[state] or [''])
if not choice:
return None
nextState = (state[1], choice)
return choice, nextState
def getInitial(self):
return ('', '')
And loading it up:
>>> import brain
>>> bob = brain.Markov()
>>> bob.learn('Mary had a little lamb')
>>> bob.ask()
'Mary had a little lamb'
>>> bob.learn('Mary had a giant crab')
>>> bob.ask(('Mary', 'had'))
'a giant crab'
There is, of course, room for improvement and expanding on the concept. But it wouldn't be any fun if if I just gave you the answer.
Hopefully this will still help after 4 months.
The complex answer
The issue here is that learnPart
is trying to use the return value of getInitial, which is a list, as a key to a dictionary. Lists are mutable, and hence not hashable, which means they can't be used as keys to a dictionary.
You could try adding this line to learnPart
:
def learnPart(key, value):
key = tuple(key) #<-----Try adding this line
if not mem[key]:
mem[key] = []
mem[key] = value
return mem
But I do not think that will solve all the problems.
The simple answer
There are plenty of Markov Chain implementations written in Python out there. A quick search on Github yielded 168 projects: https://github.com/search?l=Python&q=markov+chain
I kind of made a simplified version of the code:
import re
class Brain():
H = ''
def learn(self, txt):
self.H = txt
def ask(self,ask):
H=self.H
ask = re.compile(r"%s(.*)"%(ask),re.I|re.DOTALL)
m = ask.search(H)
print m.group(1)
Here's the execution:
>>> import brain
>>> bob = brain.Brain()
>>> bob.learn('Mary had a little lamb' )
>>> bob.ask('Mary had')
'a little lamb'
I agree that this isn't exactly a Markov chain algorithm.But it has a few advantages:
i. You can supply ask()
with raw text as shown above.
ii. It has a fewer lines of code.
iii. And hopefully, it is easier to understand.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With