Is this the best way to ensure that a python unicode "string" is encoded in utf-8?

Question

Given an arbitrary "string" from a library I do not have control over, I want to make sure the "string" is a unicode type and encoded in utf-8. I would like to know if this is the best way to do this:

import types

input = <some value from a lib I don't have control over>

if isinstance(input, types.StringType):
    input = input.decode("utf-8")
elif isinstance(input, types.UnicodeType):
    input = input.encode("utf-8").decode("utf-8")

In my actual code I wrap this in a try/except and handle the errors, but I left that part out.

jd. · Accepted Answer

A Unicode object is not encoded (it is internally, but this should be transparent to you as a Python user). The line input.encode("utf-8").decode("utf-8") does not make much sense: you get the exact same sequence of Unicode characters at the end that you had in the beginning.

if isinstance(input, str):
    input = input.decode('utf-8')

is all you need to ensure that str objects (byte strings) are converted into Unicode strings.
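
As a rough sketch of the same idea that also runs on Python 3 (where the byte-string type is bytes and the text type is str), the check can be wrapped in a small helper. The names to_text, text_type and binary_type are made up for illustration, and the helper assumes the library really does hand back UTF-8 encoded bytes:

```python
import sys

# Pick the text and byte-string types for the running interpreter.
if sys.version_info[0] >= 3:
    text_type, binary_type = str, bytes
else:
    text_type, binary_type = unicode, str  # Python 2 names


def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding byte strings if needed."""
    if isinstance(value, binary_type):
        # Assumption: the library hands back UTF-8 encoded bytes.
        return value.decode(encoding)
    if isinstance(value, text_type):
        # Already Unicode; there is nothing to decode.
        return value
    raise TypeError("expected a text or byte string, got %r" % type(value))


text = to_text(b"caf\xc3\xa9")
# Decoding an already-decoded value changes nothing.
assert to_text(text) == text
# The encode/decode round trip from the question is a no-op as well.
assert text.encode("utf-8").decode("utf-8") == text
```

UTF-8 only comes back into play when the text leaves the program, for example text.encode("utf-8") right before writing to a socket or a binary file.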

Jakob Bowyer · Answer

Simply:

try:
    # Decodes a byte string (str) as UTF-8.
    input = unicode(input, 'utf-8')
except TypeError:
    # unicode() refuses to decode a value that is already unicode.
    pass

It's always better to seek forgiveness than ask permission.
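
In the same forgiveness-first spirit, here is a minimal sketch of handling bytes that turn out not to be valid UTF-8, which is the error case the question mentions but leaves out. The helper name decode_lenient is hypothetical, and falling back to replacement characters is only one possible policy; re-raising or logging may suit a given program better:

```python
def decode_lenient(raw, encoding="utf-8"):
    """Decode a byte string, substituting U+FFFD for undecodable bytes."""
    try:
        # Try a strict decode first so that valid input is never altered.
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Invalid input: keep what can be decoded and mark the rest.
        return raw.decode(encoding, "replace")


good = decode_lenient(b"caf\xc3\xa9")  # valid UTF-8
bad = decode_lenient(b"abc\xff")       # 0xff is never valid in UTF-8
assert good == u"caf\xe9"
assert bad == u"abc\ufffd"
```

This only covers byte strings; a value that is already Unicode needs no decoding and should be passed through untouched.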

Tags: python, unicode

mcot