I have a unicode string in Python and basically need to go through, character by character and replace certain ones based on a list of rules. One such rule is that a is changed to ö if a is after n. Also, if there are two vowel characters in a row, they get replaced by one vowel character and :. So if I have the string "natarook", what is the easiest and most efficient way of getting "nötaro:k"? Using Python 2.6 and CherryPy 3.1 if that matters.
edit: two vowels in a row does mean the same vowels (oo, aa, ii)
# -*- coding: utf-8 -*-
def subpairs(s, prefix, suffix):
def sub(i, sentinal=object()):
r = prefix.get(s[i:i+2], sentinal)
if r is not sentinal: return r
r = suffix.get(s[i-1:i+1], sentinal)
if r is not sentinal: return r
return s[i]
s = '\0'+s+'\0'
return ''.join(sub(i) for i in xrange(1,len(s)))
vowels = [(v+v, u':') for v in 'aeiou']
prefix = {}
suffix = {'na':u'ö'}
suffix.update(vowels)
print subpairs('natarook', prefix, suffix)
# prints: nötaro:k
prefix = {'na':u'ö'}
suffix = dict(vowels)
print subpairs('natarook', prefix, suffix)
# prints: öataro:k
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With