 

Python: the same character does not compare equal

Tags:

python

unicode

I have text in my database, and I send some text from an XHR request to my view. The find method fails to find some Unicode characters.

I want to find the selected text using:

text.find(selection)

but sometimes the variable 'selection' contains a single character like this:

ę  # in xhr unichr(281)

whereas the variable 'text' contains:

ę  # in db has two chars unichr(101) + unichr(808)

These are just different representations of the same character. How can I make .find work more reliably here?

strz asked Mar 21 '16 17:03


1 Answer

unicodedata.normalize might help you here.

Basically, if you normalize the data coming from the db and normalize your selection to the same form, you should get more reliable results when using str.find, str.__contains__ (i.e. in), str.index, and friends.

>>> import unicodedata
>>> u1 = chr(281)
>>> u2 = chr(101) + chr(808)
>>> print(u1, u2)
ę ę
>>> u1 == u2
False
>>> unicodedata.normalize('NFC', u2) == u1
True

NFC stands for Normalization Form C (canonical composition). The documentation for unicodedata.normalize describes the other possible forms (NFD, NFKC, and NFKD).
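Applied to the original question, the idea could look roughly like the sketch below. It uses Python 3 (chr rather than unichr); the helper name find_normalized and the sample strings are made up for illustration and are not from the question.

```python
import unicodedata


def find_normalized(text, selection, form="NFC"):
    """Return the index of selection in text after normalizing both.

    Hypothetical helper: the returned index refers to the *normalized*
    text, which can be shorter than the original when combining
    sequences are composed into single code points.
    """
    norm_text = unicodedata.normalize(form, text)
    norm_selection = unicodedata.normalize(form, selection)
    return norm_text.find(norm_selection)


# Illustrative data only: decomposed form as it might be stored in the db,
# precomposed form as it might arrive via xhr.
db_text = "zi" + chr(101) + chr(808) + "ba"
xhr_selection = chr(281)

print(db_text.find(xhr_selection))              # -1: plain find misses it
print(find_normalized(db_text, xhr_selection))  # 2: found after normalizing
```

If you need offsets into the original, un-normalized text, it is probably simpler to normalize the data once when it is written to the db, so that stored text and incoming selections already share one form.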

wim answered Sep 23 '22 14:09