Converting a string to the unicode type in Python

I'm trying this code:

s = "سلام"
'{:b}'.format(int(s.encode('utf-8').encode('hex'), 16))

but this error occurs:

'{:b}'.format(int(s.encode('utf-8').encode('hex'), 16))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd3 in position 0: ordinal not in range(128)

I tried '{:b}'.format(int(s.encode('utf-8').encode('hex'), 16)) but nothing changed.

What should I do?

asked Oct 08 '13 21:10

Aidin.T

1 Answer

Since you're using Python 2, s = "سلام" is a byte string (in whatever encoding your terminal uses, presumably UTF-8):

>>> s = "سلام"
>>> s
'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'

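You cannot encode byte strings (as they are already "encoded"): Python 2 first tries to decode them implicitly as ASCII, which is what raises the UnicodeDecodeError. You're looking for unicode ("real") strings, which in Python 2 must be prefixed with u: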

>>> s = u"سلام"
>>> s
u'\u0633\u0644\u0627\u0645'
>>> '{:b}'.format(int(s.encode('utf-8').encode('hex'), 16))
'1101100010110011110110011000010011011000101001111101100110000101'
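For reference, here is a minimal sketch of the same conversion that avoids the Python 2-only 'hex' codec, so it should also run on Python 3. The helper name to_binary and its encoding parameter are made up for illustration; binascii.hexlify is the standard-library call that does the hex step.

```python
# -*- coding: utf-8 -*-
import binascii


def to_binary(text, encoding="utf-8"):
    """Return the bits of `text`, encoded with `encoding`, as a string of 0s and 1s.

    `to_binary` is a hypothetical helper, not part of any library.
    """
    raw = text.encode(encoding)                          # unicode text -> bytes
    hex_digits = binascii.hexlify(raw).decode("ascii")   # bytes -> hex digits
    return "{:b}".format(int(hex_digits, 16))            # hex digits -> binary digits


print(to_binary(u"\u0633\u0644\u0627\u0645"))
# -> 1101100010110011110110011000010011011000101001111101100110000101
```

Note that '{:b}' drops leading zero bits, so text whose first byte is below 0x80 (plain ASCII, for example) gives a result that is not a multiple of 8 digits long.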

If you're getting a byte string from a function such as raw_input, then your string is already encoded, so just skip the encode part:

'{:b}'.format(int(s.encode('hex'), 16))

or (if you're going to do anything else with it) convert it to unicode:

s = s.decode('utf8')

This assumes that your input is UTF-8 encoded; if that might not be the case, check sys.stdin.encoding first.
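The byte 0xd3 in the question's error message does not match the UTF-8 bytes shown above (which start with 0xd8), so the input there may well be in some other encoding. A rough sketch of decoding without hard-coding UTF-8 could look like this. The helper name to_text and its fallback order are assumptions for illustration, not an established recipe:

```python
import sys


def to_text(raw, encoding=None):
    """Decode bytes to text; pass text through unchanged.

    `to_text` is a hypothetical helper, not part of any library.
    """
    if not isinstance(raw, bytes):
        return raw  # already text (unicode in Python 2, str in Python 3)
    # Assumed fallback order: explicit argument, then the encoding that
    # stdin reports (it can be None, e.g. when piped), then UTF-8.
    encoding = encoding or getattr(sys.stdin, "encoding", None) or "utf-8"
    return raw.decode(encoding)


s = to_text(b"\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85", "utf-8")
print(repr(s))
```

If the guessed encoding is wrong, decode will either raise a UnicodeDecodeError or silently produce the wrong characters, so passing the encoding explicitly is the safer choice when you know it.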

i18n stuff is complicated; here are two articles that will help you further:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets
What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text

answered Sep 30 '22 11:09

georg

Tags: python, encoding, unicode, utf-8
