Here is a little program:
import sys
f = sys.argv[1]
print type(f)
print u"f=%s" % (f)
Here is my running of the program:
$ python x.py 'Recent/רשימת משתתפים.LNK'
<type 'str'>
Traceback (most recent call last):
  File "x.py", line 5, in <module>
    print u"f=%s" % (f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 7: ordinal not in range(128)
$
The problem seems to be that Python treats sys.argv[1] as an ASCII string, which it can't convert to Unicode. But I'm using a Mac with a fully Unicode-aware Terminal, so x.py is actually getting a Unicode string.
How do I tell Python that sys.argv[] is Unicode and not ASCII? Failing that, how do I convert ASCII (that has Unicode inside it) into Unicode? The obvious conversions don't work.
In Python 3, the built-in functions chr() and ord() convert between Unicode code points and characters (Python 2 uses unichr() to get a Unicode character). A character can also be written in a string literal as a hexadecimal code point escape using \x, \u, or \U.
sys.argv is a list that contains all the command-line arguments passed to the script, with the script name at index 0. It is essential when working with command-line arguments.
Python 3's string type uses the Unicode Standard for representing characters, which lets Python programs work with a very wide range of characters. Unicode (https://www.unicode.org/) is a specification that aims to list every character used by human languages and give each character its own unique code.
1. Python 2 uses the str type to store bytes and the unicode type to store Unicode code points. String literals are str (that is, bytes) by default, and the default encoding is ASCII.
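As a minimal sketch of where the error comes from, the snippet below performs explicitly the ASCII decode that Python 2 attempts implicitly when a bytestring meets a Unicode string. It assumes the terminal sent the argument as UTF-8, and the helper name first_undecodable_offset is hypothetical, used only for illustration. The u"" literal keeps it runnable on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-

def first_undecodable_offset(raw, encoding="ascii"):
    """Return the offset of the first byte that `encoding` cannot decode, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Assumption: the shell passed the argument as UTF-8 encoded bytes.
raw_arg = u"Recent/רשימת משתתפים.LNK".encode("utf-8")

# ASCII cannot decode the first Hebrew byte, so an offset is returned.
ascii_failure = first_undecodable_offset(raw_arg, "ascii")

# The same bytes are valid UTF-8, so no failing offset is found.
utf8_failure = first_undecodable_offset(raw_arg, "utf-8")
```

Here ascii_failure is 7, matching "position 7" in the traceback, because "Recent/" occupies offsets 0 through 6 and the next byte is 0xd7.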
The UnicodeDecodeError you see occurs because you are mixing the Unicode string u"f=%s" and the sys.argv[1] bytestring, which makes Python 2 try to decode the bytestring as ASCII. Use the same type for both:
both bytestrings:
$ python2 -c'import sys; print "f=%s" % (sys.argv[1],)' 'Recent/רשימת משתתפים'
This passes bytes transparently from/to your terminal. It works for any encoding.
both Unicode:
$ python2 -c'import sys; print u"f=%s" % (sys.argv[1].decode("utf-8"),)' 'Rec..
Here you should replace 'utf-8' with the encoding your terminal uses. You might use sys.getfilesystemencoding() here if the terminal is not Unicode-aware.
Both commands produce the same output:
f=Recent/רשימת משתתפים
In general you should convert bytestrings that you consider to be text to Unicode as soon as possible.
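One way to follow this advice is to decode the arguments once, at the program boundary. The sketch below is an illustration rather than a definitive recipe: to_text and decode_args are hypothetical helper names, and falling back to UTF-8 when sys.getfilesystemencoding() returns nothing is an assumption. On Python 3 the arguments already arrive as text, so the helper passes them through unchanged.

```python
import sys


def to_text(arg, encoding="utf-8"):
    """Return arg as Unicode text, decoding it only if it is a bytestring."""
    if isinstance(arg, bytes):
        return arg.decode(encoding)
    return arg


def decode_args(argv, encoding=None):
    """Decode every command-line argument to Unicode text."""
    # Assumption: use the filesystem encoding when none is given,
    # and UTF-8 if Python cannot report one.
    encoding = encoding or sys.getfilesystemencoding() or "utf-8"
    return [to_text(arg, encoding) for arg in argv]
```

In a script this could be called as args = decode_args(sys.argv[1:]) before any formatting with Unicode strings, so the rest of the program only ever handles text.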