
How can I traverse directories named in Japanese in Python?

I'm trying to build a simple helper utility that looks through my projects and reports the open ones to me on the command line. But whenever a folder or filename is in Japanese, os.listdir returns gibberish (for example, '\x82\xa9\x82\xcc\x96I'), and that gibberish can't be passed back to os.listdir to get into the folder either. For example, os.listdir('C:\Documents and Settings\\x82\xa9\x82\xcc\x96I') raises an error:

'WindowsError: [Error 3] 指定されたパスが見つかりません。' (in English: "The system cannot find the path specified.")

Does anybody know how I can get around this? Thanks a lot.

StormShadow asked Jul 14 '11 08:07


2 Answers

You may need to decode the byte string into Unicode, then re-encode it in UTF-8 before passing it to os.listdir. It looks like your Japanese string is encoded in Shift-JIS:

>>> '\x82\xa9\x82\xcc\x96I'.decode('shift-jis').encode('utf-8')
'\xe3\x81\x8b\xe3\x81\xae\xe8\x9c\x82'
>>> print '\x82\xa9\x82\xcc\x96I'.decode('shift-jis')
かの蜂
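
For readers on Python 3, where byte strings and text strings are separate types, a rough equivalent of the session above might look like the following. This is only a sketch: the variable names are illustrative, and Shift-JIS is assumed as the encoding, as in the answer.

```python
# Bytes as shown in the question (assumed here to be Shift-JIS encoded).
raw_name = b'\x82\xa9\x82\xcc\x96I'

# Decode the bytes into a text (Unicode) string.
decoded_name = raw_name.decode('shift_jis')

# Re-encode the text as UTF-8, mirroring the Python 2 session above.
utf8_name = decoded_name.encode('utf-8')

print(decoded_name)
print(utf8_name)
```

In Python 3 the decoded `str` is usually what you want to keep working with; the UTF-8 bytes are shown only for comparison with the original session.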

Alternatively, make use of the following feature of os.listdir to get Unicode strings out of it in the first place:

On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects. Undecodable filenames will still be returned as string objects.

So:

os.listdir(ur'C:\Documents and Settings')
# ---------^
Fred Foo answered Oct 23 '22 12:10


You should try passing the directory name as a Unicode literal (u'your/path'). That way, the result is also Unicode, which is probably required to work with Japanese characters.

From the documentation:

On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects. Undecodable filenames will still be returned as string objects.
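
As a minimal sketch of what traversal with text (Unicode) paths can look like, here is a Python 3 version, where ordinary string literals are already Unicode and os.listdir and os.walk return text names when given a text path. The helper name `list_tree`, the temporary directory, and the sample file name are all made up for illustration; they are not from the question.

```python
import os
import tempfile


def list_tree(root):
    """Return the path of every directory and file found under root."""
    found = []
    # os.walk yields text names because root is a text (str) path.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            found.append(os.path.join(dirpath, name))
    return found


# Demo in a throwaway directory (hypothetical names, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    project_dir = os.path.join(tmp, 'かの蜂')
    os.mkdir(project_dir)
    with open(os.path.join(project_dir, 'メモ.txt'), 'w', encoding='utf-8') as f:
        f.write('test')

    paths = list_tree(tmp)
    for path in paths:
        print(path)
```

Because every name comes back as text, each returned path can be passed straight back to os.listdir or open without any manual decoding.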

Björn Pollex answered Oct 23 '22 11:10