What is the difference between the encoding names utf-8 and utf8 (if there is any)?
Given the following example:
u = u'€'
print('utf-8', u.encode('utf-8'))
print('utf8 ', u.encode('utf8'))
It produces the following output:
utf-8 b'\xe2\x82\xac'
utf8  b'\xe2\x82\xac'
UTF-8 is a valid IANA character set name, whereas utf8 is not; it is not even a registered alias. Outside Python, utf8 may instead appear as the codeset part of an implementation-provided locale name, where the language, territory, and codeset settings are implementation-defined.
UTF-8 is a byte-oriented, variable-length encoding: each character (code point) is represented by a specific sequence of one to four bytes.
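As a rough illustration of how the number of bytes varies per character, the sketch below encodes a few sample characters and records how many bytes each one needs. Apart from the euro sign, the sample characters are assumptions chosen here for illustration and do not come from the question.

```python
# Sample characters picked for illustration (an assumption, not from the question):
# 'A', e-acute, the euro sign, and an emoji outside the Basic Multilingual Plane.
samples = ['A', '\u00e9', '\u20ac', '\U0001F600']

# Map each character to the number of bytes in its UTF-8 encoding.
byte_lengths = {ch: len(ch.encode('utf-8')) for ch in samples}

for ch, n in byte_lengths.items():
    # Print the code point and the raw bytes rather than the character itself,
    # so the output does not depend on the terminal's encoding.
    print('U+{:04X} -> {} byte(s): {!r}'.format(ord(ch), n, ch.encode('utf-8')))
```

The higher the code point, the longer the byte sequence tends to be, which is why ASCII text stays compact while the euro sign takes three bytes.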
As a content author or developer, you should nowadays always choose the UTF-8 character encoding for your content or data. This Unicode encoding is a good choice because you can use a single character encoding to handle any character you are likely to need. This greatly simplifies things.
String encoding: since Python 3.0, strings are stored as Unicode, i.e. each character in the string is represented by a code point. So, each string is just a sequence of Unicode code points. To store or transmit these strings, the sequence of code points is encoded into a sequence of bytes.
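The distinction between code points and bytes can be sketched as follows. The sample string is a hypothetical value chosen for illustration; only built-in `ord()`, `str.encode()`, and `bytes.decode()` are used.

```python
# Hypothetical sample string: "cafe" with an e-acute, a space, and the euro sign.
text = 'caf\u00e9 \u20ac'

# A str is a sequence of code points; ord() exposes each one as an integer.
code_points = [ord(ch) for ch in text]

# Encoding turns the code points into bytes; decoding reverses the step.
encoded = text.encode('utf-8')
decoded = encoded.decode('utf-8')

# The string length counts code points, the bytes length counts bytes.
print(len(text), len(encoded))
print(code_points)
```

The two lengths differ because the non-ASCII characters each take more than one byte in UTF-8, while the round trip through `decode()` returns the original string.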
There's no difference. See the table of standard encodings. Specifically for 'utf_8', the following are all valid aliases:
'U8', 'UTF', 'utf8'
Also note the statement in the first paragraph:
Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.
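One way to check this yourself is to ask the codec registry what each spelling resolves to. This is a minimal sketch using `codecs.lookup()` from the standard library; the list of spellings is just a selection for illustration.

```python
import codecs

# Spellings to compare; all are expected to resolve to the same codec.
spellings = ['utf-8', 'utf8', 'UTF-8', 'utf_8', 'U8', 'UTF']

# codecs.lookup() returns a CodecInfo object whose .name is the codec's canonical name.
canonical = {s: codecs.lookup(s).name for s in spellings}
print(canonical)

# Encoding the euro sign with each spelling and collecting the results in a set:
# if every spelling uses the same codec, the set holds a single bytes value.
results = {'\u20ac'.encode(s) for s in spellings}
print(results)
```

If a spelling were not a valid alias, `codecs.lookup()` would raise `LookupError` instead of returning a codec.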
You can also check the aliases of a specific encoding using the encodings module. Its aliases dictionary maps each alias (key) to the name of the codec module (value):
>>> from encodings.aliases import aliases
>>>
>>> for k, v in aliases.items():
...     # keep only the aliases that map to the utf_8 codec module
...     if 'utf_8' in v:
...         print('Encoding name:{:>10} -- Module Name: {:}'.format(k, v))
...
Encoding name:       utf -- Module Name: utf_8
Encoding name:        u8 -- Module Name: utf_8
Encoding name: utf8_ucs4 -- Module Name: utf_8
Encoding name: utf8_ucs2 -- Module Name: utf_8
Encoding name:      utf8 -- Module Name: utf_8
And as pointed out in mgilson's answer:
Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.