
Difference between encoding utf-8 and utf8 in Python 3.5

What is the difference between encoding utf-8 and utf8 (if there is any)?

Given the following example:

u = u'€'
print('utf-8', u.encode('utf-8'))
print('utf8 ', u.encode('utf8'))

It produces the following output:

utf-8 b'\xe2\x82\xac'
utf8  b'\xe2\x82\xac'
bastelflp asked Feb 13 '16 18:02



2 Answers

There's no difference. See the table of standard encodings. Specifically for 'utf_8', the following are all valid aliases:

'U8', 'UTF', 'utf8'

Also note the statement in the first paragraph:

Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec
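As a quick check of the statement above, the following sketch asks the codec registry what each spelling resolves to. The list of spellings is only an illustrative selection, not an exhaustive one:

```python
import codecs

# A few spellings to compare; this list is illustrative, not exhaustive.
spellings = ['utf-8', 'utf8', 'UTF-8', 'utf_8', 'U8', 'UTF']

# codecs.lookup() returns a CodecInfo object; its .name attribute is the
# canonical name of the codec that the spelling resolved to.
canonical_names = {s: codecs.lookup(s).name for s in spellings}

for spelling, name in canonical_names.items():
    print('{:>6} -> {}'.format(spelling, name))

# Encoding the same character with every spelling should give one result.
encoded = {'€'.encode(s) for s in spellings}
print(encoded)
```

If every spelling resolves to the same canonical name and the set of encoded results has a single element, the spellings are interchangeable.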

mgilson answered Oct 11 '22 01:10


You can also check the aliases of a specific encoding using the encodings module. Its aliases dictionary maps each alias (key) to the name of the codec module it resolves to (value):

>>> from encodings.aliases import aliases
>>>
>>> # Print every alias whose target codec module is exactly 'utf_8'.
>>> for k, v in aliases.items():
...     if v == 'utf_8':
...         print('Encoding name:{:>10} -- Module Name: {:}'.format(k, v))
...


Encoding name:       utf -- Module Name: utf_8
Encoding name:        u8 -- Module Name: utf_8
Encoding name: utf8_ucs4 -- Module Name: utf_8
Encoding name: utf8_ucs2 -- Module Name: utf_8
Encoding name:      utf8 -- Module Name: utf_8

And as pointed out in mgilson's answer:

Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.
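This also explains why 'utf-8' itself does not show up in the alias listing: the hyphen is turned into an underscore before the alias table is consulted. A minimal sketch of that step, assuming the normalize_encoding helper from the encodings package (it is importable, but as far as I know it is not part of the documented API, so treat it as an implementation detail of CPython):

```python
from encodings import normalize_encoding  # assumption: CPython helper, not documented API
from encodings.aliases import aliases

# The hyphenated spelling is not stored as a key in the alias table ...
hyphen_is_alias_key = 'utf-8' in aliases
print(hyphen_is_alias_key)

# ... because punctuation is normalized to underscores first.
normalized = normalize_encoding('utf-8')
print(normalized)

# 'utf8' (no separator) is a real alias key pointing at the utf_8 module.
target = aliases['utf8']
print(target)
```

So 'utf-8' ends up at the utf_8 module through normalization, while 'utf8' gets there through the alias table.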

Iron Fist answered Oct 11 '22 02:10