I'm trying to search for a pattern in SQLAlchemy results (actually filtering with 'like' or op('regexp')(pattern), which I believe is implemented with a regex somewhere). The string and the search pattern are both in Hebrew and presumably (maybe I'm wrong) Unicode,
where r = u'לבן' and c = u'לבן, ורוד, '
when I do re.search(r,c) I get the SRE.match object
but when I query the db like:
f = session.query(classname)
c = f[0].color
and c gives me:
'\xd7\x9c\xd7\x91\xd7\x9f,\xd7\x95\xd7\xa8\xd7\x95\xd7\x93,'
or print (c):
לבן,ורוד,
Practically the same, but running re.search(r, c) gives me no match object.
Since I suspected a Unicode issue, I tried to convert it to Unicode with unicode(c)
and I get "UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 0: ordinal", which I guess means this is already a Unicode string. So where's the catch here?
I would prefer using SQLAlchemy's 'like', but I get the same problem there, even though I know for sure (as I showed in my example) that the data contains the string.
Should I transform the search string or pattern somehow? Is this related to Unicode, or to something else?
The collation of the db table I'm querying is utf8_unicode_ci.
c = f[0].color
is not returning a Unicode string (or its repr() would show a u'...' kind of string), but a UTF-8 encoded string.
Try
c = f[0].color.decode("utf-8")
which results in
u'\u05dc\u05d1\u05df,\u05d5\u05e8\u05d5\u05d3,'
or
u'לבן,ורוד,'
if your console can display Hebrew characters.
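To tie this back to the re.search call from the question, here is a minimal sketch of decoding before matching. It hard-codes the bytes shown in the question instead of querying the database, and to_text is a hypothetical helper name, not part of SQLAlchemy or re. It is written so that it should also run on Python 3, where the undecoded value would be a bytes object.

```python
# -*- coding: utf-8 -*-
import re

# The search pattern from the question, as a Unicode string.
pattern = u'לבן'

# The value the query returned in the question: UTF-8 encoded bytes,
# hard-coded here as a stand-in for f[0].color (an assumption for this sketch).
raw = b'\xd7\x9c\xd7\x91\xd7\x9f,\xd7\x95\xd7\xa8\xd7\x95\xd7\x93,'


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass Unicode strings through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


text = to_text(raw)

# Pattern and subject are now both Unicode, so the search can succeed.
match = re.search(pattern, text)
print(match is not None)
```

Decoding once, right where the value leaves the database layer, keeps every later comparison Unicode against Unicode.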