Given an arbitrary "string" from a library I do not have control over, I want to make sure the "string" is a unicode type and encoded in UTF-8. I would like to know if this is the best way to do this:
    import types
    input = <some value from a lib I don't have control over>
    if isinstance(input, types.StringType):
        input = input.decode("utf-8")
    elif isinstance(input, types.UnicodeType):
        input = input.encode("utf-8").decode("utf-8")
In my actual code I wrap this in a try/except and handle the errors, but I left that part out.
A Unicode object is not encoded (it is internally, but this should be transparent to you as a Python user). The line input.encode("utf-8").decode("utf-8") does not make much sense: you get exactly the same sequence of Unicode characters at the end that you had at the beginning.
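A quick way to convince yourself of this is a small round-trip check. This is only a sketch: the sample string is an arbitrary example, not something from the question, and the names `text` and `round_tripped` are made up for illustration.

```python
# An arbitrary Unicode string containing non-ASCII characters.
text = u"caf\u00e9 \u2603"

# Encoding to UTF-8 bytes and decoding again yields an equal Unicode string.
round_tripped = text.encode("utf-8").decode("utf-8")

assert round_tripped == text
assert type(round_tripped) is type(text)
```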
    if isinstance(input, str):
        input = input.decode('utf-8')
is all you need to ensure that str objects (byte strings) are converted into Unicode strings.
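If you need this in several places, the check could be wrapped in a small helper along these lines. This is a sketch rather than a definitive implementation: the name `ensure_unicode` is hypothetical, and it assumes that any byte string the library hands back really is UTF-8. It tests against `bytes`, which is just another name for `str` on Python 2.6+, so the same code should also run on Python 3.

```python
def ensure_unicode(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding byte strings if needed.

    Hypothetical helper; assumes byte strings are encoded in `encoding`.
    """
    if isinstance(value, bytes):  # `bytes` is `str` on Python 2
        # Raises UnicodeDecodeError if the bytes are not valid `encoding`.
        return value.decode(encoding)
    # Already a Unicode string: nothing to decode.
    return value


try:
    name = ensure_unicode(b"caf\xc3\xa9")  # UTF-8 bytes for "café"
except UnicodeDecodeError:
    name = None  # handle bad input however suits your application
```

Catching `UnicodeDecodeError` at the call site keeps the helper itself simple and lets each caller decide what to do with bytes that are not valid UTF-8.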
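Simply:
    try:
        # unicode() with an explicit encoding decodes byte strings and
        # raises TypeError if input is already a unicode object.
        input = unicode(input, 'utf-8')
    except TypeError:
        pass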
It's always better to seek forgiveness than ask permission.