I have this block of Django code:
list_temp = []
tagname_re = re.compile(r'^[\w+\.-]+$', re.UNICODE)
for key,tag in list.items():
    if len(tag) > settings.FORM_MAX_LENGTH_OF_TAG or len(tag) < settings.FORM_MIN_LENGTH_OF_TAG:
        raise forms.ValidationError(_('please use between %(min)s and %(max)s characters in you tags') % { 'min': settings.FORM_MIN_LENGTH_OF_TAG, 'max': settings.FORM_MAX_LENGTH_OF_TAG})
    if not tagname_re.match(tag):
        raise forms.ValidationError(_('please use following characters in tags: letters , numbers, and characters \'.-_\''))
    # only keep one same tag
    if tag not in list_temp and len(tag.strip()) > 0:
        list_temp.append(tag)
This allows me to use Unicode characters in tag names.
But it does not work with my script, Khmer (Khmer Symbols range: 19E0–19FF, The Unicode Standard, Version 4.0), and I don't know why.
My question:
How can I change the above code, tagname_re = re.compile(r'^[\w+\.-]+$', re.UNICODE), to accept my Unicode characters?
When I enter the tag "នយោបាយ", I get this message:
please use following characters in tags: letters , numbers, and characters '.-_'
This will make your regular expressions work with all Unicode regex engines. In addition to the standard notation, \p{L}, Java, Perl, PCRE, the JGsoft engine, and XRegExp 3 allow you to use the shorthand \pL. The shorthand only works with single-letter Unicode properties.
UTF-8 is a byte oriented encoding. The encoding specifies that each character is represented by a specific sequence of one or more bytes.
Call str.encode() to encode a str as UTF-8 bytes. Call bytes.decode() to decode UTF-8 encoded bytes to a Unicode string.
Since Python 3.0, strings are stored as Unicode, i.e. each character in the string is represented by a code point. So, each string is just a sequence of Unicode code points. To store or transmit these strings efficiently, the sequence of code points is converted into a sequence of bytes. This process is known as encoding.
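To make the code point and byte view concrete, here is a small sketch using the tag from the question. The tag is written with \u escapes so the code points are unambiguous; the variable names are only illustrative.

```python
# The tag from the question, "នយោបាយ", spelled out as code points.
tag = "\u1793\u1799\u17c4\u1794\u17b6\u1799"

# A Python 3 str is a sequence of code points.
code_points = ["U+%04X" % ord(ch) for ch in tag]

# Encoding turns the code points into bytes; decoding reverses it.
encoded = tag.encode("utf-8")
decoded = encoded.decode("utf-8")

print(code_points)
print(len(tag), "code points,", len(encoded), "bytes in UTF-8")
print(decoded == tag)
```

Each of these Khmer code points takes three bytes in UTF-8, so the byte length is three times the string length.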
ោ (U+17C4 KHMER VOWEL SIGN OO) and ា (U+17B6 KHMER VOWEL SIGN AA) are not letters, they're combining marks, so they don't match \w.
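One possible workaround, sketched here with only the standard library, is to check each character's Unicode general category and accept marks (M*) alongside letters (L*) and numbers (N*). The helper name is_valid_tag is hypothetical and not part of Django. The accepted punctuation is taken from the original pattern; this is an approximation of what \w allows, not an exact replacement.

```python
import re
import unicodedata

# The tag from the question, "នយោបាយ", spelled out as code points.
TAG = "\u1793\u1799\u17c4\u1794\u17b6\u1799"

# The original pattern: \w does not cover combining marks.
tagname_re = re.compile(r'^[\w+\.-]+$', re.UNICODE)


def is_valid_tag(tag):
    """Hypothetical helper: accept letters, marks, numbers and + . - _"""
    if not tag:
        return False
    for ch in tag:
        if ch in "+.-_":
            continue
        # The first letter of the category is the major class (L, M, N, ...).
        if unicodedata.category(ch)[0] not in ("L", "M", "N"):
            return False
    return True


print(unicodedata.category("\u17c4"))  # category of KHMER VOWEL SIGN OO
print(tagname_re.match(TAG))           # the original regex rejects the tag
print(is_valid_tag(TAG))               # the category check accepts it
```

In the form code, the check `if not tagname_re.match(tag)` could then become `if not is_valid_tag(tag)`. If installing a third-party package is acceptable, the regex module on PyPI supports \p{L} and \p{M} style properties, which the built-in re module does not.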