I noticed the following holds:
>>> u'abc' == 'abc'
True
>>> 'abc' == u'abc'
True
Will this always be true, or could it depend on the system locale? (It seems strings are Unicode in Python 3, e.g. this question, but bytes in 2.x.)
Unicode is a universal character set designed for processing, storing, and interchanging text in any language, while ASCII is a much older and much smaller standard for representing letters, digits, and symbols in computers.
Every ASCII character has an equivalent in Unicode. The difference is that ASCII covers only lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), symbols such as punctuation marks, and a few control characters, while Unicode also covers the letters of many other scripts, such as Arabic and Greek.
Unicode is a superset of ASCII, and the numbers 0–127 have the same meaning in ASCII as they have in Unicode.
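A quick way to check the superset claim, as a minimal Python 3 sketch (the variable names are just illustrative): for every code point from 0 to 127, encoding with ASCII and encoding with UTF-8 should give the same single byte.

```python
# Python 3 sketch: ASCII characters keep the same numbers in Unicode.
ascii_text = "abc"

# ord() returns the Unicode code point; for ASCII characters this is
# the same number ASCII assigns to them.
code_points = [ord(ch) for ch in ascii_text]

# Encoding ASCII-only text as ASCII or as UTF-8 gives identical bytes.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Check the whole 0-127 range, not just the three letters above.
all_match = all(
    chr(i).encode("ascii") == chr(i).encode("utf-8") for i in range(128)
)

print(code_points, same_bytes, all_match)
```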
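ASCII uses seven bits to encode each character. Various 8-bit "extended ASCII" encodings later used the eighth bit to add more characters, but they are separate, mutually incompatible encodings rather than a single standard. In contrast, Unicode text can be stored in several encoding forms: UTF-8, UTF-16, and UTF-32, which use 8-bit, 16-bit, and 32-bit code units respectively. UTF-8 and UTF-16 are variable-length, so a single character may take more than one code unit.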
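To make the encoding sizes concrete, here is a small Python 3 sketch. The sample text is an arbitrary choice for illustration; the little-endian codec names are used only so that no byte order mark is added to the output.

```python
# Python 3 sketch: the same three characters in three Unicode encodings.
text = "a\u00e9\u20ac"  # 'a', e with acute accent, euro sign

utf8 = text.encode("utf-8")       # variable length: 1 + 2 + 3 bytes
utf16 = text.encode("utf-16-le")  # 2 bytes each for these characters
utf32 = text.encode("utf-32-le")  # always 4 bytes per character

# Plain ASCII cannot represent anything above code point 127.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(len(utf8), len(utf16), len(utf32), ascii_ok)
```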
Python 2 coerces between unicode and str using the ASCII codec when comparing the two types. So yes, for ASCII-only strings like the ones in the question, this is always true.
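Python 3 no longer does this coercion (bytes and str never compare equal there), but the Python 2 behaviour can be roughly emulated. This is only a sketch: py2_style_equal is a made-up helper, not anything from the standard library, and it assumes that a failed decode is treated as "not equal", which is what Python 2 does after emitting a UnicodeWarning.

```python
# Python 3 sketch that mimics how Python 2 compares str with unicode.
# py2_style_equal is a hypothetical helper written for illustration.
def py2_style_equal(byte_string, text):
    """Decode the byte string as ASCII, then compare the two as text."""
    try:
        return byte_string.decode("ascii") == text
    except UnicodeDecodeError:
        # Python 2 warns here and treats the two values as unequal.
        return False


ascii_result = py2_style_equal(b"abc", "abc")
# UTF-8 bytes for e-acute are not valid ASCII, so the decode fails.
non_ascii_result = py2_style_equal(b"\xc3\xa9", "\u00e9")

print(ascii_result, non_ascii_result)
```

The locale never enters into it: the only input besides the two strings is the hard-wired ASCII codec.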
That is to say, unless you mess up your Python installation and use sys.setdefaultencoding() to change that default. You cannot do that normally, because the sys.setdefaultencoding() function is deleted from the module at start-up time, but there is a cargo cult going around where people use reload(sys) to reinstate that function and change the default encoding to something else to try and fix implicit encoding and decoding problems. This is a dumb thing to do for precisely this reason.
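To see why changing the default is risky, here is a Python 3 sketch that makes the default codec an explicit parameter. compare_with_default is a hypothetical helper that stands in for Python 2's implicit coercion, and the codec names passed to it are just examples of what someone might set.

```python
# Python 3 sketch: the comparison result depends on the "default" codec.
# compare_with_default is a hypothetical stand-in for Python 2's
# implicit str-to-unicode coercion.
def compare_with_default(byte_string, text, default_encoding):
    """Decode with the given default codec, then compare as text."""
    try:
        return byte_string.decode(default_encoding) == text
    except UnicodeDecodeError:
        return False


data = b"\xc3\xa9"   # the UTF-8 bytes for e-acute
text = "\u00e9"      # e-acute as a text string

results = {
    codec: compare_with_default(data, text, codec)
    for codec in ("ascii", "utf-8", "latin-1")
}
print(results)
```

The same pair of values compares equal under one default and unequal under the others, so code that relies on the comparison now behaves differently depending on how the interpreter was configured.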