 

What kind of encoding declaration should I use in Python?

I learned from the website http://www.python.org/dev/peps/pep-0263/ that I should add an encoding declaration to a Python file when I want to write non-ASCII Unicode characters in it, but I am still confused about it.

Assume that I work on Linux with vim, and I create a new .py file with the following code:

#!/usr/bin/python2.7
# -*- coding: utf8 -*-
s = u'ޔ'
print s

1. I tried to replace line 2 with the following code:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

but it doesn't work. Aren't they the same?

2. I am not very familiar with Linux, and I really don't know why I should add -*- at the beginning and end of the encoding declaration. Also, when I tried to replace # -*- coding: utf8 -*- with # code=utf8 or # code: utf8, I got an error:

File "pythontest.py", line 3
SyntaxError: Non-ASCII character '\xde' in file pythontest.py on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

But these declarations are mentioned on the website http://www.python.org/dev/peps/pep-0263/!

And according to the documentation, the following declaration is also allowed:

# This Python file uses the following encoding: utf-8

What's this? I don't think a computer can recognize it. What on earth should the encoding declaration look like? I feel more and more confused.

Thanks for your help.

Searene, asked Nov 26 '11 12:11




2 Answers

The abstract of the PEP you link really says it all:

This PEP proposes to introduce a syntax to declare the encoding of a Python source file. The encoding information is then used by the Python parser to interpret the file using the given encoding. Most notably this enhances the interpretation of Unicode literals in the source code and makes it possible to write Unicode literals using e.g. UTF-8 directly in an Unicode aware editor.

(the emphasis is mine).

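Even if what you wanted to do had worked (replacing the encoding of the source file programmatically), it wouldn't have made any sense. Think about it: the code is static (it doesn't change). It would make no sense to try to read it with a different encoding: there is only one correct one (the one the author used when editing the source). In fact it cannot work: the file is decoded when it is compiled, before any of its statements (including reload(sys)) run, and sys.setdefaultencoding only changes the codec Python 2 uses for implicit conversions between str and unicode at run time, not the way source files are read.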
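A minimal sketch of that point, written for Python 3 (the question uses Python 2.7, where the PEP 263 rules are the same, but reload(sys) and sys.setdefaultencoding no longer exist in Python 3). It compiles the same bytes under two different coding declarations; the helper name `run` is made up for this example. The declaration alone decides which string ends up in `s`, and the compiler makes that choice before any statement in the compiled code executes.

```python
# Python 3 sketch: the coding declaration controls how the compiler decodes
# the raw bytes of the source, before any code in that source executes.

# Body of a tiny "file": a Thaana letter (U+0794) stored as UTF-8 bytes.
# Its first byte is 0xDE, the same byte reported in the question's error.
BODY = "s = '\u0794'\n".encode("utf-8")
assert BODY == b"s = '\xde\x94'\n"


def run(source: bytes) -> str:
    """Compile and execute `source`, then return the value bound to `s`."""
    namespace: dict = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["s"]


as_utf8 = run(b"# -*- coding: utf-8 -*-\n" + BODY)
as_latin1 = run(b"# -*- coding: latin-1 -*-\n" + BODY)

print(ascii(as_utf8))    # prints '\u0794' (one character)
print(ascii(as_latin1))  # prints '\xde\x94' (two characters)
```

Nothing the program does at run time could change these results, because by then the source text has already been decoded.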

As for the syntax:

# This Python file uses the following encoding: utf-8

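the PEP itself lists that syntax as an example under "Without interpreter line, using plain text". It is written so that humans can read it: if you open a file in a text editor and find it full of gibberish, you can manually set the encoding of the source in the editor's menu. The interpreter recognizes it too, though, because encoding: utf-8 contains coding: utf-8, which matches the regular expression given in the PEP.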
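The plain-English line is a real declaration that the compiler honors, not only a note for people. A small Python 3 sketch to check this (the byte 0xE9, the latin-1 encoding and the helper name `run` are arbitrary choices for the demo): 0xE9 is not valid UTF-8 on its own, so the source is rejected without a declaration and decoded as 'é' with the sentence-style one.

```python
from typing import Optional

# 0xE9 is 'é' in Latin-1 but is not valid as a standalone UTF-8 byte.
BODY = b"s = '\xe9'\n"


def run(source: bytes) -> Optional[str]:
    """Compile and execute `source`; return `s`, or None if it fails to compile."""
    namespace: dict = {}
    try:
        code = compile(source, "<demo>", "exec")
    except (SyntaxError, UnicodeDecodeError):
        return None  # rejected: the bytes are not valid in the assumed encoding
    exec(code, namespace)
    return namespace["s"]


no_declaration = run(BODY)  # Python 3 assumes UTF-8 when nothing is declared
sentence_form = run(b"# This Python file uses the following encoding: latin-1\n" + BODY)

print(ascii(no_declaration))  # prints None
print(ascii(sentence_form))   # prints '\xe9', i.e. 'é'
```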

EDIT: As for why you should put the encoding between # -*- and -*-... That's purely conventional. The first symbol, the hash sign, marks the line as a comment (so it won't be compiled to bytecode). The -*- markers are the syntax Emacs uses for file-local variables, so the same line also tells Emacs which encoding to use; the Python parser itself ignores them and only looks for coding: or coding= in the comment.

It is not any different than putting in your source:

# TODO: fix this nasty bug

in which the TODO: part tells the developer (and some IDEs) that this is a message requiring an action. You could really use whatever you wanted, including @MarkZar or WTF!... it's just convention!

HTH!

mac, answered Nov 02 '22 05:11


The important part of a Python encoding declaration is coding: utf-8. It must be in a comment on the first or second line of the file, and you can do whatever you want with the rest of the comment.

Here are the lines in the PEP that describe this behaviour:

More precisely, the first or second line must match the regular expression "coding[:=]\s*([-\w.]+)". The first group of this expression is then interpreted as encoding name. If the encoding is unknown to Python, an error is raised during compilation. There must not be any Python statement on the line that contains the encoding declaration.
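The rule quoted above can be sketched as a small Python 3 function. `declared_encoding` is a hypothetical helper written for this answer, not a standard-library function, and it follows only the PEP wording; CPython applies a few extra checks (for example around a UTF-8 byte-order mark) that this sketch skips. Running it on the variants from the question shows why # code: utf8 and # code=utf8 are ignored: the regular expression needs the word coding, not code.

```python
import re
from typing import Optional

# Regular expression quoted from PEP 263.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source: str) -> Optional[str]:
    """Return the encoding named on line 1 or 2, following the PEP 263 wording.

    Only comment-only lines are considered, because the PEP forbids any
    Python statement on the line that carries the declaration.
    """
    for line in source.splitlines()[:2]:
        if not line.lstrip().startswith("#"):
            continue  # not a comment-only line, so it cannot declare an encoding
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None


examples = {
    "emacs style": "#!/usr/bin/python2.7\n# -*- coding: utf8 -*-\n",     # -> utf8
    "plain coding=": "# coding=utf8\n",                                   # -> utf8
    "sentence": "# This Python file uses the following encoding: utf-8\n",  # -> utf-8
    "vim style": "# vim: set fileencoding=utf8 :\n",                      # -> utf8
    "code=": "# code=utf8\n",                                             # -> None
    "code:": "# code: utf8\n",                                            # -> None
    "third line": "#!/usr/bin/python2.7\n\n# coding: utf8\n",             # -> None
}

for label, text in examples.items():
    print(f"{label:15} -> {declared_encoding(text)}")
```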

number5, answered Nov 02 '22 05:11