 

Python os.stat and unicode file names

In my Django application, a user has uploaded a file with a Unicode character in its name.

When I'm serving file downloads, I call:

os.path.exists(media)

to test that the file is there. This, in turn, seems to call

st = os.stat(path)

This then blows up with the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xcf' in position 92: ordinal not in range(128)
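For context, Python 2's os.path.exists is roughly a try/except around os.stat that treats only OS errors as "missing", so an encoding failure inside os.stat escapes as an exception instead of returning False. The Python 3 sketch below emulates that shape. `ascii_stat` and `exists_py2_style` are hypothetical names, and `ascii_stat` stands in for Python 2 encoding a unicode path with an ASCII filesystem encoding (for example under the "C" locale). Newer Python 3 releases also catch ValueError inside os.path.exists, so the real call there returns False instead of raising.

```python
import os


def ascii_stat(path):
    # Hypothetical stand-in for Python 2's os.stat under an ASCII locale:
    # the text path is encoded with the filesystem encoding before the call.
    return os.stat(path.encode("ascii"))


def exists_py2_style(path, stat=ascii_stat):
    # Simplified shape of Python 2's genericpath.exists: only OS errors
    # mean "not there"; any other exception reaches the caller.
    try:
        stat(path)
    except OSError:
        return False
    return True


# Hypothetical upload path containing U+00CF, the character from the traceback.
path = "/srv/media/uploads/" + "\xcf" + ".pdf"
error = None
try:
    exists_py2_style(path)
except UnicodeEncodeError as exc:
    # UnicodeEncodeError is a ValueError, not an OSError, so it is not swallowed.
    error = exc
    print(exc)
```

A plain-ASCII path that does not exist still returns False, because the failure then comes from os.stat itself as an OSError.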

What can I do about this? Is there an option to path.exists to handle it?

Update: Actually, all I had to do was encode the argument to exists, i.e.:

os.path.exists(media.encode('utf-8'))
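A Python 3 sketch of the same workaround: encode the text path to UTF-8 bytes yourself, so Python never has to choose an encoding for it. `exists_utf8` is a hypothetical helper, and the sketch assumes filenames are stored on disk as UTF-8 bytes (typical on Linux). It creates a file whose name contains U+00CF in a temporary directory and then looks it up through the helper.

```python
import os
import tempfile


def exists_utf8(path):
    # Hypothetical helper mirroring media.encode('utf-8'): text paths are
    # turned into UTF-8 bytes before the check; bytes pass through unchanged.
    if isinstance(path, str):
        path = path.encode("utf-8")
    return os.path.exists(path)


with tempfile.TemporaryDirectory() as tmp:
    name = "\xcf-upload.txt"  # hypothetical upload name containing U+00CF
    # Build the path as bytes so creating the file does not depend on the locale.
    raw = os.path.join(os.fsencode(tmp), name.encode("utf-8"))
    with open(raw, "wb") as f:
        f.write(b"testing\n")
    found = exists_utf8(os.path.join(tmp, name))
    print(found)
```

Passing bytes sidesteps the implicit encoding step entirely; the trade-off is that the code now hard-codes UTF-8 instead of trusting the system's configured filesystem encoding.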

Thanks to everyone who answered.

asked Jan 16 '10 08:01 by interstar





1 Answer

I'm assuming you're in Unix. If not, please remember to say which OS you're in.

Make sure your locale is set to UTF-8. Modern Linux systems do this by default, usually by setting the environment variable LANG to "en_US.UTF-8" or the UTF-8 variant for another language. Also make sure the filenames themselves are encoded in UTF-8.

With that set, there's no need to mess with encodings to access files in any language, even in Python 2.x.

[~/test] echo $LANG
en_US.UTF-8
[~/test] echo testing > 漢字
[~/test] python2.6
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.stat("漢字")
posix.stat_result(st_mode=33188, st_ino=548583333L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8L, st_atime=1263634240, st_mtime=1263634230, st_ctime=1263634230)
>>> os.stat(u"漢字")
posix.stat_result(st_mode=33188, st_ino=548583333L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8L, st_atime=1263634240, st_mtime=1263634230, st_ctime=1263634230)
>>> open("漢字").read()
'testing\n'
>>> open(u"漢字").read()
'testing\n'

If this doesn't work, run "locale"; if the values are "C" instead of en_US.UTF-8, you may not have the locale installed correctly.
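From inside Python, a rough counterpart to running "locale" is to ask which encodings the interpreter derived from it. This is a Python 3 sketch; `encoding_report` is a hypothetical helper. Note that Python 3.7+ on Linux may coerce a plain "C" locale to UTF-8, so the filesystem encoding can read UTF-8 even when "locale" prints "C".

```python
import codecs
import locale
import sys


def encoding_report():
    # Encoding Python uses to turn text paths into bytes for OS calls.
    filesystem = sys.getfilesystemencoding()
    # Encoding derived from the locale settings (LC_ALL / LC_CTYPE / LANG).
    preferred = locale.getpreferredencoding(False)
    return {"filesystem": filesystem, "locale": preferred}


report = encoding_report()
# Normalize aliases such as "UTF8" or "utf_8" to the codec's canonical name.
is_utf8 = codecs.lookup(report["filesystem"]).name == "utf-8"
print(report)
if not is_utf8:
    print("Filesystem encoding is not UTF-8; check the output of `locale`.")
```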

If you're on Windows, I think Unicode filenames should just work (at least for the os module), since Python passes unicode paths to the Windows Unicode file API transparently.

answered Oct 21 '22 13:10 by Glenn Maynard
