Is there any reason to prefer unicode(somestring, 'utf8') as opposed to somestring.decode('utf8')?

My only thought is that .decode() is a bound method, so Python may be able to resolve it more efficiently, but correct me if I'm wrong.
To decode a string encoded in UTF-8 format, we can use the decode() method defined on strings (on bytes objects in Python 3). This method accepts two arguments, encoding and errors: encoding is the encoding of the string to be decoded, and errors decides how to handle errors that arise during decoding.
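A minimal sketch of both arguments, assuming a Python 3 interpreter (where decode() lives on bytes rather than str):

>>> data = b'caf\xc3\xa9'                          # UTF-8 bytes for "café"
>>> data.decode('utf-8')                           # encoding argument
'café'
>>> b'caf\xff'.decode('utf-8', errors='replace')   # errors argument
'caf�'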
Unicode is a general representation of text, which can be encoded in many different ways into a sequence of binary data (represented in Python 2 by str). In Python 3, unicode was renamed to str and there is a new bytes type for a plain sequence of bytes.
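A small illustration of that split, again assuming Python 3:

>>> type('café')                    # text is str (formerly unicode)
<class 'str'>
>>> type('café'.encode('utf-8'))    # encoded data is bytes
<class 'bytes'>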
Encoding is the process of transforming a sequence of Unicode characters into a sequence of bytes. Decoding is the process of transforming a sequence of encoded bytes back into a sequence of Unicode characters. The Unicode Standard assigns a code point (a number) to each character in every supported script.
Unicode is a standard used to represent characters from almost all languages. Every Unicode character is assigned a unique integer code point between 0 and 0x10FFFF. A Unicode string is a sequence of zero or more code points.
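A quick sketch of code points and the encode/decode round trip, assuming Python 3:

>>> ord('é')                      # code point assigned by the Unicode Standard
233
>>> 'é'.encode('utf-8')           # encoding: code points -> bytes
b'\xc3\xa9'
>>> b'\xc3\xa9'.decode('utf-8')   # decoding: bytes -> code points
'é'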
I'd prefer 'something'.decode(...) since the unicode type is no longer there in Python 3.0, while text = b'binarydata'.decode(encoding) is still valid.
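A quick check of that claim (the exact error text assumes CPython 3.x):

>>> b'binarydata'.decode('utf-8')
'binarydata'
>>> unicode(b'binarydata', 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'unicode' is not defined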
It's easy to benchmark it (under Python 2, where both forms exist):
>>> from timeit import Timer
>>> ts = Timer("s.decode('utf-8')", "s = 'ééé'")
>>> ts.timeit()
8.9185450077056885
>>> tu = Timer("unicode(s, 'utf-8')", "s = 'ééé'")
>>> tu.timeit()
2.7656929492950439
>>>
Obviously, unicode() is faster.
FWIW, I don't know where you get the impression that methods would be faster - it's quite the contrary.
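For completeness, a rough Python 3 analogue would compare bytes.decode() against the str() constructor (my assumption, since unicode() is gone in 3.x); actual timings will vary by interpreter and machine:

from timeit import Timer

setup = "s = 'ééé'.encode('utf-8')"                 # bytes to decode
t_method = Timer("s.decode('utf-8')", setup)        # bound-method form
t_constructor = Timer("str(s, 'utf-8')", setup)     # constructor form

print(t_method.timeit())
print(t_constructor.timeit())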