
Is this the best way to ensure that a python unicode "string" is encoded in utf-8?

Tags:

python

unicode

Given an arbitrary "string" from a library I do not have control over, I want to make sure the "string" is a unicode type and encoded in utf-8. I would like to know if this is the best way to do this:

import types

input = <some value from a lib I dont have control over>

if isinstance(input, types.StringType):
    input = input.decode("utf-8")
elif isinstance(input, types.UnicodeType):
    input = input.encode("utf-8").decode("utf-8")

In my actual code I wrap this in a try/except and handle the errors but I left that part out.
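For illustration only, here is one shape the omitted error handling could take. The helper name `decode_utf8` and the fallback policy (substituting U+FFFD for bytes that cannot be decoded) are assumptions, not something stated in the question.

```python
def decode_utf8(raw):
    """Decode a byte string as UTF-8, falling back to replacement characters."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed policy: keep going and mark undecodable bytes with U+FFFD.
        return raw.decode("utf-8", "replace")


good = decode_utf8(b"caf\xc3\xa9")  # valid UTF-8 input
bad = decode_utf8(b"abc\xff")       # 0xff never appears in valid UTF-8
```

Whether to replace, ignore, or re-raise on bad input depends on what the calling code can tolerate.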

mcot asked Mar 14 '11 21:03



2 Answers

A Unicode object is not encoded (internally it is, but that should be transparent to you as a Python user). The line input.encode("utf-8").decode("utf-8") therefore accomplishes nothing: you end up with the exact same sequence of Unicode characters that you started with.

if isinstance(input, str):
    input = input.decode('utf-8')

is all you need to ensure that str objects (byte strings) are converted into Unicode strings.
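As a rough sketch of the same idea, not the answerer's code: the helper name `ensure_text` is hypothetical. It checks against `bytes`, which is an alias for `str` on Python 2.6+ and the byte-string type on Python 3, so the check should behave the same on both. The last line illustrates that encoding to UTF-8 and decoding again gives back an equal string.

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding it only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text (or not a string at all): hand it back unchanged.
    return value


text = ensure_text(b"caf\xc3\xa9")       # byte string gets decoded
already_text = ensure_text(u"caf\xe9")   # Unicode string passes through
round_trip = text.encode("utf-8").decode("utf-8")  # equal to text
```

Invalid UTF-8 input still raises UnicodeDecodeError here, so the caller decides how to handle it.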

jd. answered Oct 12 '22 12:10



Simply:

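try:
    # unicode() with an explicit encoding decodes byte strings; it raises
    # TypeError if input is already a unicode object.
    input = unicode(input, 'utf-8')
except TypeError:
    pass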

It's always better to ask forgiveness than permission.

Jakob Bowyer answered Oct 12 '22 12:10
