
os.listdir doesn't print non-English letters correctly

On Python 2.7,

import os

for dir in os.listdir("E:/Library/Documents/Old - Archives/Case"):
    print dir

prints out:

Danny.xlsx
Dannyh.xlsx
~$??? ?? ?????? ??? ???? ???????.docx

while this:

# using a unicode literal
for dir in os.listdir(u"E:/Library/Documents/Old - Archives/Case"):
   print dir

prints out:

Dan.xlsx
Dann.xlsx

Traceback (most recent call last):
  File "E:\...\FirstModule.py", line 31, in <module>
    print dir
  File "C:\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 2-4: character maps to <undefined>

The file's name is in Hebrew, like this: המסמך.xls

How can I make it appear in Hebrew in Python too?

mirandalol asked Jan 17 '23 01:01


2 Answers

The version with the u'' string literal works fine: pass a Unicode pathname and you get Unicode filenames back, which lets you work with filenames that include characters outside the current code page.
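
To illustrate the rule that the result type follows the argument type, here is a small sketch. It is written for Python 3 so it can be run today, where the two types are spelled str and bytes; on Python 2.7 the same rule applies to unicode and str. The temporary directory and the helper name listdir_types are made up for the example and only stand in for the folder in the question.

```python
import os
import shutil
import tempfile

# The Hebrew file name from the question, written with escapes.
HEBREW_NAME = u"\u05d4\u05de\u05e1\u05de\u05da.xls"


def listdir_types(directory):
    """Return (names listed via a text path, names listed via a bytes path)."""
    text_names = os.listdir(directory)               # text in -> text out
    byte_names = os.listdir(os.fsencode(directory))  # bytes in -> bytes out
    return text_names, byte_names


tmp = tempfile.mkdtemp()  # throwaway directory, hypothetical stand-in
try:
    # Create the file through a bytes path so the example does not depend
    # on the locale being able to encode Hebrew.
    raw_path = os.path.join(os.fsencode(tmp), HEBREW_NAME.encode("utf-8"))
    open(raw_path, "w").close()
    text_names, byte_names = listdir_types(tmp)
finally:
    shutil.rmtree(tmp)

print(type(text_names[0]).__name__, type(byte_names[0]).__name__)
```

With a text path every returned name is text, so nothing is lost to the code page at listing time; any failure after that point comes from output, not from os.listdir.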

Your problem comes solely from trying to print the filename. Getting Unicode output to the Windows Command Prompt is a trial.
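
The failure can be reproduced without a console at all, which suggests the listing itself is fine. The sketch below assumes the console code page is cp1252, as in the traceback; console_safe is a hypothetical helper, not something from the question, and it only makes the name printable by escaping what the code page lacks. It does not make the Hebrew appear.

```python
def console_safe(name, encoding="cp1252"):
    """Return name with characters missing from the code page shown as backslash escapes."""
    return name.encode(encoding, "backslashreplace").decode(encoding)


# The Hebrew file name from the question, written with escapes.
hebrew_name = u"\u05d4\u05de\u05e1\u05de\u05da.xls"

try:
    # Roughly what printing to a cp1252 console has to do with the name.
    hebrew_name.encode("cp1252")
    failed = False
except UnicodeEncodeError:
    failed = True

print(failed)
print(console_safe(hebrew_name))
```

Names that fit the code page, such as Danny.xlsx, pass through unchanged, so the helper can be applied to every entry in the loop.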

The default C standard library print function is limited to the locale code page. Unless you call the Win32 API function WriteConsoleW directly (using ctypes) you're never going to get reliable console Unicode support; and even then it won't work unless a suitable non-default font is chosen. This affects pretty much all non-native command line tools, not just Python.
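
A minimal sketch of the WriteConsoleW route mentioned above, using ctypes. The function name write_console and the fallback at the end are my own additions for illustration; the Win32 calls (GetStdHandle, GetConsoleMode, WriteConsoleW) are real, but treat this as a starting point under those assumptions, not a complete console layer. It returns False when stdout is not a Windows console, and the Hebrew will still only render if the console font has the glyphs.

```python
import sys

STD_OUTPUT_HANDLE = 0xFFFFFFF5  # (DWORD)-11 in the Win32 headers


def write_console(text):
    """Write Unicode text with WriteConsoleW.

    Returns False when stdout is not a Windows console (another platform,
    or output redirected to a file or pipe) so the caller can fall back.
    """
    if sys.platform != "win32":
        return False

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE,
                                        ctypes.POINTER(wintypes.DWORD)]
    kernel32.GetConsoleMode.restype = wintypes.BOOL
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID,
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    mode = wintypes.DWORD()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # handle is not a real console

    # WriteConsoleW counts UTF-16 code units, not code points.
    units = len(text.encode("utf-16-le")) // 2
    written = wintypes.DWORD()
    ok = kernel32.WriteConsoleW(handle, text, units,
                                ctypes.byref(written), None)
    return bool(ok)


name = u"\u05d4\u05de\u05e1\u05de\u05da.xls"  # the file name from the question
if not write_console(name + u"\n"):
    # Fallback when no Windows console is attached: escaped, but never crashes.
    print(name.encode("ascii", "backslashreplace").decode("ascii"))
```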

bobince answered Jan 29 '23 12:01


Solved it: adding # -*- coding: utf-8 -*- at the top of the source file fixed it.

mirandalol answered Jan 29 '23 13:01