
Specify Python source file encoding from the command line

PEP 263 specifies a syntax for declaring the encoding of a Python source file within the source file itself.

Is it possible to specify the encoding from the command line?

Or is there a reason why this might be undesirable?

I'm thinking of something like:

$ python --encoding utf-8 myscript.py

or even:

$ PYTHONSOURCEENCODING=utf-8 python myscript.py

user3414663 asked May 18 '15 08:05



2 Answers

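This is a hack rather than what you're looking for, and it only works on systems that have sed, but you can prepend the coding line to any Python script with sed '1s/^/# -*- coding: utf-8 -*-\n/' script.py | python. Note that the \n in the replacement is a GNU sed extension, so other sed implementations may not insert a newline there. The declaration has to end up on the first or second line of the source for Python to honor it, which is why it is prepended rather than appended.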

To generalize this, you can wrap the command in a shell function defined in your .bashrc or profile.
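The same prepend-and-run idea can also be sketched in Python, which avoids the dependency on sed. This is only an illustration: the names with_coding_line and run_source are hypothetical, and the sketch assumes Python 3, where compile() accepts bytes and honors a PEP 263 coding line found in them.

```python
def with_coding_line(source: bytes, encoding: str) -> bytes:
    """Return the source with a PEP 263 coding line prepended (like the sed hack)."""
    return b"# -*- coding: " + encoding.encode("ascii") + b" -*-\n" + source


def run_source(source: bytes, encoding: str, filename: str = "<script>") -> dict:
    """Compile and run raw source bytes as if they declared the given encoding.

    Hypothetical helper: returns the namespace the script ran in.
    """
    code = compile(with_coding_line(source, encoding), filename, "exec")
    namespace = {"__name__": "__main__"}
    exec(code, namespace)
    return namespace
```

For a file on disk, one would read it in binary mode (open(path, "rb").read()) and pass the bytes to run_source. Because a line is prepended, line numbers in tracebacks are off by one, which is one more reason to treat this as a hack.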

As an aside, I think the reason this wasn't implemented in the first place is that encoding is, and should be considered, a property of each file itself, not of the call that spawns the process. File encoding and process spawning belong to quite different conceptual spaces, at least to my thinking.

a p answered Oct 19 '22 17:10


Although there could be special use cases where this feature could help, I think it could be confusing.

When you execute a Python script, there can be 2 different encodings:

  • the source script encoding, which can be declared in the script itself via PEP 263
  • the environment encoding, which can be set through environment variables (for example the locale settings or PYTHONIOENCODING)

The former is static in the script, and its only use is to allow the programmer to use non-ASCII characters in string literals.

The latter is what should be used for IO. It may change on different runs of the script.
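The distinction between the two encodings can be made concrete with a small sketch. It assumes Python 3, and literal_under is a hypothetical helper name. The same bytes are compiled under two different declared source encodings, which changes the value of the string literal, while the IO encodings reported at the end come from the environment and are unaffected by the declaration.

```python
import locale
import sys


def literal_under(encoding: str) -> str:
    """Compile the same bytes with a different declared source encoding."""
    # The bytes 0xC3 0xA9 are one character in UTF-8 but two in Latin-1.
    source = (
        b"# -*- coding: " + encoding.encode("ascii") + b" -*-\n"
        b"s = '\xc3\xa9'\n"
    )
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["s"]


# Source encoding: decides how the literal's bytes are decoded.
print(ascii(literal_under("utf-8")))    # '\xe9' (one character)
print(ascii(literal_under("latin-1")))  # '\xc3\xa9' (two characters)

# Environment encoding: used for IO, may differ from run to run.
print(sys.stdout.encoding, locale.getpreferredencoding(False))
```

The first two lines of output are fixed by the declaration inside the source; the last line depends on the machine and on how the script is launched.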

If you pass the script encoding on the command line (or through an environment variable), it becomes easy to confuse it with the encoding of the local runtime environment.

Serge Ballesta answered Oct 19 '22 18:10