A site in Vietnamese is virtually no different from one in English. However, there is a problem with slugs. When I type characters such as "ư", "ơ", "á", Django does not recognize them. The solution is to replace accented characters with their unaccented equivalents, e.g.:
ư -> u
ơ -> o
á -> a
For example, "những-viên-kẹo" would become "nhung-vien-keo". However, I do not know how to do this. Can someone help me? Thank you very much!
[edit]
I take it back: Django's django.template.defaultfilters.slugify() does what you want, using unicodedata.normalize and .encode('ascii', 'ignore'). Just feeding your string into slugify will work:
from django.template.defaultfilters import slugify
print(slugify(u"những-viên-kẹo"))
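To see what the normalize-then-encode step does on its own, here is a minimal sketch without Django, written for Python 3. The helper name ascii_fold is made up for illustration, and this is not Django's actual source; the real slugify also lowercases the text and cleans up punctuation and whitespace.

```python
import unicodedata

def ascii_fold(text):
    # NFKD splits each accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with errors="ignore" then drops the combining marks
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(ascii_fold("những-viên-kẹo"))  # nhung-vien-keo
```

Any character that has no ASCII base letter after decomposition is silently dropped by this approach.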
To do this automatically, add this to the .save() method in your models:
from django.db import models
from django.template.defaultfilters import slugify

class MyModel(models.Model):
    title = models.CharField(max_length=255)
    slug = models.SlugField(blank=True)

    def save(self, *args, **kwargs):
        # Only generate a slug when one has not been set explicitly
        if not self.slug:
            self.slug = slugify(self.title)
        super(MyModel, self).save(*args, **kwargs)
The solution I wrote earlier (below) would still be useful for languages that require additional characters in their transliteration, e.g. German's ü -> ue, ß -> ss, etc.
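As a rough illustration of the German case, here is a sketch for Python 3 using a translation dict with multi-character replacements. The names german_map and transliterate_german are made up for this example, and the table only covers the umlauts and ß.

```python
# Hypothetical mapping: each key is a code point, each value its replacement
german_map = {
    ord("ä"): "ae",
    ord("ö"): "oe",
    ord("ü"): "ue",
    ord("Ä"): "Ae",
    ord("Ö"): "Oe",
    ord("Ü"): "Ue",
    ord("ß"): "ss",
}

def transliterate_german(text):
    # str.translate replaces mapped characters and leaves the rest untouched
    return text.translate(german_map)

print(transliterate_german("Müller Straße"))  # Mueller Strasse
```

Plain normalization would turn ü into u rather than ue, which is why an explicit table is needed here.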
[original post]
Python allows you to use a translation dict to map characters to a replacement string.
A simple version for your case would be:
# Characters missing from the map (n, h, g, ...) are left unchanged by
# translate(), so only the accented letters need entries.
vietnamese_map = {
    ord(u'ư'): u'u',
    ord(u'ơ'): u'o',
    ord(u'á'): u'a',
    ord(u'ữ'): u'u',
    ord(u'ê'): u'e',
    ord(u'ẹ'): u'e',
}
And then you can call:
print(u"những-viên-kẹo".translate(vietnamese_map))
To get:
u"nhung-vien-keo"
For more advanced use (i.e. a dynamic dict), see e.g. http://effbot.org/zone/unicode-convert.htm
Note that the above is just to show you what the map needs to look like; it's not a particularly convenient way of entering the data. A more convenient way to do the exact same thing is something like:
_map = u"nn hh ữu nn gg vv ii êe nn kk ẹe oo"
# Take the above string and generate a translation dict
vietnamese_map = dict((ord(m[0]), m[1:]) for m in _map.split())
print(u"những-viên-kẹo".translate(vietnamese_map))
You can try normalizing it in Python:
http://pyright.blogspot.com/2009/11/unicode-normalization-python-3x-unicode.html
This can save you from retyping the whole Vietnamese alphabet (a, á, ớ, bờ, cờ, dờ, đờ) and from overlooking other special Latin characters. Just run a normalization function and test that everything works. Remember to test the letter "đ": I ran into the problem that the normalization function did not convert Đ to D.
Good luck :P
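A minimal sketch of the normalization approach from this answer, written for Python 3. The names D_STROKE and strip_vietnamese_marks are made up for illustration. It handles đ/Đ with an explicit mapping first, because those letters have no Unicode decomposition and normalization alone leaves them as they are.

```python
import unicodedata

# đ and Đ do not decompose into d/D plus a mark, so map them by hand
D_STROKE = {ord("đ"): "d", ord("Đ"): "D"}

def strip_vietnamese_marks(text):
    text = text.translate(D_STROKE)
    # NFD splits the remaining accented letters into base letter + marks
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_vietnamese_marks("Đà Nẵng"))  # Da Nang
```

Unlike encoding to ASCII, this keeps any non-ASCII characters that are not combining marks, so test it against the input you actually expect.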