Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to convert a string to utf-8 in Python

People also ask

How do I UTF-8 encode a string in Python?

Use encode() to convert a String to UTF-8 The encode() method returns the encoded version of the string. In case of failure, a UnicodeDecodeError exception may occur.

How do I convert string to UTF?

In order to convert a String into UTF-8, we use the getBytes() method in Java. The getBytes() method encodes a String into a sequence of bytes and returns a byte array. where charsetName is the specific charset by which the String is encoded into an array of bytes.

What is encoding UTF-8 in Python?

UTF-8 is one of the most commonly used encodings, and Python often defaults to using it. UTF stands for “Unicode Transformation Format”, and the '8' means that 8-bit values are used in the encoding.


In Python 2

>>> plain_string = "Hi!"
>>> unicode_string = u"Hi!"
>>> type(plain_string), type(unicode_string)
(<type 'str'>, <type 'unicode'>)

^ This is the difference between a byte string (plain_string) and a unicode string.

>>> s = "Hello!"
>>> u = unicode(s, "utf-8")

^ Converting to unicode and specifying the encoding.

In Python 3

All strings are unicode. The unicode function does not exist anymore. See answer from @Noumenon


If the methods above don't work, you can also tell Python to ignore portions of a string that it can't convert to utf-8:

stringnamehere.decode('utf-8', 'ignore')

Might be a bit overkill, but when I work with ascii and unicode in same files, repeating decode can be a pain, this is what I use:

def make_unicode(inp):
    if type(inp) != unicode:
        inp =  inp.decode('utf-8')
    return inp

Adding the following line to the top of your .py file:

# -*- coding: utf-8 -*-

allows you to encode strings directly in your script, like this:

utfstr = "ボールト"