Usage of unicode() and encode() functions in Python

I have a problem with the encoding of the path variable when inserting it into an SQLite database. I tried to solve it with encode("utf-8"), which didn't help. Then I used unicode(), which gives me type unicode.

print type(path)                  # <type 'unicode'>
path = path.replace("one", "two") # <type 'str'>
path = path.encode("utf-8")       # <type 'str'> strange
path = unicode(path)              # <type 'unicode'>

In the end I got the unicode type, but I still get the same error that occurred when the type of the path variable was str:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

Could you help me solve this error and explain the correct usage of encode("utf-8") and unicode() functions? I'm often fighting with it.

EDIT:

This execute() statement raised the error:

cur.execute("update docs set path = :fullFilePath where path = :path", locals()) 

I forgot to change the encoding of the fullFilePath variable, which suffers from the same problem, but I'm quite confused now. Should I use only unicode(), only encode("utf-8"), or both?

I can't use

fullFilePath = unicode(fullFilePath.encode("utf-8")) 

because it raises this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 32: ordinal not in range(128)

Python version is 2.7.2

asked Apr 23 '12 20:04 by xralf



1 Answer

In Python 2, str represents text as bytes, while unicode represents text as characters.

You decode bytes into unicode, and you encode unicode into bytes, using some encoding in both directions.

That is:

>>> 'abc'.decode('utf-8')  # str to unicode
u'abc'
>>> u'abc'.encode('utf-8') # unicode to str
'abc'
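
The UnicodeDecodeError in the question is probably the same mechanism at work: in Python 2, unicode(some_str) without an explicit encoding decodes with the default ASCII codec, which rejects any byte above 0x7f. Here is a minimal sketch of that failure, written for Python 3 so it can be run today (bytes there plays the role of Python 2's str). The file name is hypothetical; it was chosen only because 'ř' encodes to the bytes 0xc5 0x99 in UTF-8.

```python
# Python 3 sketch: bytes here plays the role of Python 2's str.
text = "soubor_\u0159.txt"      # hypothetical file name containing 'ř'
raw = text.encode("utf-8")      # characters -> bytes

# Decoding with the same codec that was used for encoding round-trips cleanly.
assert raw.decode("utf-8") == text

# Decoding as ASCII fails, because UTF-8 encodes 'ř' as bytes above 0x7f.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    failing_byte = raw[exc.start]   # first byte the ASCII codec rejected
```

So the rule of thumb is to name the codec when decoding, and not to encode text that is already unicode just to decode it again.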

UPD Sep 2020: The answer was written when Python 2 was mostly used. In Python 3, str was renamed to bytes, and unicode was renamed to str.

>>> b'abc'.decode('utf-8') # bytes to str
'abc'
>>> 'abc'.encode('utf-8')  # str to bytes
b'abc'
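
Applied to the update statement from the question, the idea is to keep both values as text and let sqlite3 bind them, without calling encode() first. Below is a rough sketch in Python 3 using an in-memory database. The docs table and its path column come from the question, but the schema, the sample paths and the explicit parameter dictionary (used here instead of locals()) are assumptions made for illustration.

```python
import sqlite3

# In-memory database with an assumed one-column schema for the docs table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table docs (path text)")

# Hypothetical paths; both stay text (str in Python 3) the whole time.
old_path = "docs/one/\u0159.txt"
new_path = old_path.replace("one", "two")

cur.execute("insert into docs (path) values (?)", (old_path,))

# Same statement as in the question, with the named parameters passed explicitly.
cur.execute(
    "update docs set path = :fullFilePath where path = :path",
    {"fullFilePath": new_path, "path": old_path},
)
conn.commit()

stored = cur.execute("select path from docs").fetchone()[0]
conn.close()
```

In Python 2 the equivalent would be to make sure both path and fullFilePath are unicode objects before calling execute(), decoding any byte strings with the codec they were actually written in.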

answered Oct 07 '22 02:10 by newtover