I'd like to create a regex statement in Python 2.7.8 that will substitute characters. It will work like this...
ó -> o
ú -> u
é -> e
á -> a
í -> i
ù,ú -> u
These are the only unicode characters that I would like to change. Such unicode characters as, ë, ä I don't want to change. So the word, thójlà will become tholja. I'm sure there is a way so that I don't have to create all the regex separately like below.
word = re.sub(ur'ó', ur'o', word)
word = re.sub(ur'ú', ur'u', word)
word = re.sub(ur'é', ur'e', word)
....
I've been trying to figure this out but haven't had any luck. Any help is appreciated!
Try with str.translate and maketrans...
print('thójlà'.translate(str.maketrans('óúéáíùú', 'oueaiuu')))
# thojlà
This way you ensure the only substitutions you want to make.
If you had many strings to change, you should assign your maketrans to a variable, like
table = str.maketrans('óúéáíùú', 'oueaiuu')
and then, each string can be translated as
s.translate(table)
With String's replace() function you can do something like:
x = "thójlà"
>>> x
'thójlà'
>>> x = x.replace('ó','o')
'thojlà'
>>> x = x.replace('à','a')
'thojla'
A generalized way:
# -*- coding: utf-8 -*-
replace_dict = {
'á':'a',
'à':'a',
'é':'e',
'í':'i',
'ó':'o',
'ù':'u',
'ú':'u'
}
str1 = "thójlà"
for key in replace_dict:
str1 = str1.replace(key, replace_dict[key])
print(str1) #prints 'thojla'
A third way, if your list of character mappings is getting too large:
# -*- coding: utf-8 -*-
replace_dict = {
'a':['á','à'],
'e':['é'],
'i':['í'],
'o':['ó'],
'u':['ù','ú']
}
str1 = "thójlà"
for key, values in replace_dict.items():
for character in values:
str1 = str1.replace(character, key)
print(str1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With