I have text in my database, and I send some text to my view via XHR. The find function fails to match some Unicode characters.
I want to find the selected text using:
text.find(selection)
but sometimes the variable 'selection' contains a character like this:
ę # in xhr unichr(281)
whereas the variable 'text' contains:
ę # in db has two chars unichr(101) + unichr(808)
They are just different forms of the same thing. How can I make .find work more reliably here?
unicodedata.normalize might help you here.
Basically, if you normalize the data coming from the db and normalize your selection to the same form, you should get better results when using str.find, str.__contains__ (i.e. in), str.index, and friends.
>>> import unicodedata
>>> u1 = chr(281)
>>> u2 = chr(101) + chr(808)
>>> print(u1, u2)
ę ę
>>> u1 == u2
False
>>> unicodedata.normalize('NFC', u2) == u1
True
NFC stands for Normalization Form C, the canonically composed form. The unicodedata documentation describes the other possible forms (NFD, NFKC, and NFKD).
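Applied to the original question, the idea could look roughly like the sketch below. The helper name normalized_find and the sample strings db_text and xhr_selection are made up for illustration; only unicodedata.normalize and str.find come from the standard library.

```python
import unicodedata


def normalized_find(text, selection, form="NFC"):
    """Normalize both strings to the same form, then search.

    Hypothetical helper, not part of any library. Returns the index in the
    *normalized* text, or -1 if the selection is not found.
    """
    norm_text = unicodedata.normalize(form, text)
    norm_selection = unicodedata.normalize(form, selection)
    return norm_text.find(norm_selection)


# Sample data (assumed): the db stores "e" + combining ogonek,
# while the XHR payload uses the single precomposed character.
db_text = "pi" + chr(101) + chr(808) + "kny"
xhr_selection = chr(281) + "k"

print(db_text.find(xhr_selection))              # plain find misses the match
print(normalized_find(db_text, xhr_selection))  # match found after normalizing
```

Note that the returned index refers to the normalized text, which may be shorter than the original, so it should not be used to slice the unnormalized string. Normalizing once when the text is written to the db would avoid that mismatch.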