Convert unicode string to byte string

Tags: python, unicode

I get a string from a function that is represented like u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0', but to process it I need it to be a byte string (like '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0').

How do I convert it without changing the underlying values?

My best guess so far is to take s.encode('unicode_escape'), which returns '\\xd0\\xbc\\xd0\\xb0\\xd1\\x80\\xd0\\xba\\xd0\\xb0', and then process every 5 characters so that '\\xd0' becomes one character represented as '\xd0'.

Alexander Egurnov asked Jun 24 '12 03:06


1 Answer

ISO 8859-1 (aka Latin-1) maps the first 256 Unicode code points (U+0000 to U+00FF) one-to-one onto the byte values 0x00 to 0xFF, so encoding with it leaves each value unchanged.

>>> u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'.encode('latin-1')
'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'
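
The session above is Python 2. As a rough sketch of the same idea in Python 3 (where str is the Unicode type and bytes is the byte-string type), the following may help. The variable names are illustrative, and the final decode step assumes the bytes are UTF-8, which the question does not state:

```python
# Python 3 sketch: recover the raw bytes from a string whose characters
# are really byte values (every code point is in the range 0-255).
s = '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

# Latin-1 maps code points 0-255 to the identical byte values.
raw = s.encode('latin-1')
print(raw)  # b'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

# Assumption: the original data was UTF-8 encoded text.
# If so, decoding the recovered bytes yields the intended string.
text = raw.decode('utf-8')
```

Note that encode('latin-1') raises UnicodeEncodeError if the string contains any code point above U+00FF, so this only works when every character stands for a single byte.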
Ignacio Vazquez-Abrams answered Oct 24 '22 10:10