Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python not able to open file with non-english characters in path

I have a file with the following path : D:/bar/クレイジー・ヒッツ!/foo.abc

I am parsing the path from a XML file and storing it in a variable called path in the form of file://localhost/D:/bar/クレイジー・ヒッツ!/foo.abc Then, the following operations are being done :

path=path.strip()
path=path[17:] #to remove the file://localhost/  part
path=urllib.url2pathname(path)
path=urllib.unquote(path)

The error is :

IOError: [Errno 2] No such file or directory: 'D:\\bar\\\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81\\foo.abc'

Update 1 : I am using Python 2.7 on Windows 7

like image 387
bcosynot Avatar asked May 12 '11 07:05

bcosynot


People also ask

How do you open a file in a specific path in Python?

Use open() to open a file in a different directory Join path with filename into a path to filename from the current directory. Call open(file) with file as the resultant path to filename to open filename from the current directory.

Why does Python not recognize my file?

The error FileNotFoundError occurs because you either don't know where a file actually is on your computer. Or, even if you do, you don't know how to tell your Python program where it is. Don't try to fix other parts of your code that aren't related to specifying filenames or paths.

How do I get the path of a text file in Python?

Run it with the python (or python3 ) command. You can get the absolute path of the current working directory with os. getcwd() and the path specified with the python3 command with __file__ . In Python 3.8 and earlier, the path specified by the python (or python3 ) command is stored in __file__ .


2 Answers

Provide the filename as a unicode string to the open call.

How do you produce the filename?

if provided as a constant by you

Add a line near the beginning of your script:

# -*- coding: utf8 -*-

Then, in a UTF-8 capable editor, set path to the unicode filename:

path = u"D:/bar/クレイジー・ヒッツ!/foo.abc"

read from a list of directory contents

Retrieve the contents of the directory using a unicode dirspec:

dir_files= os.listdir(u'.')

read from a text file

Open the filename-containing-file using codecs.open to read unicode data from it. You need to specify the encoding of the file (because you know what is the “default windows charset” for non-Unicode applications on your computer).

in any case

Do a:

path= path.decode("utf8")

before opening the file; substitute the correct encoding if not "utf8".

like image 38
tzot Avatar answered Oct 02 '22 20:10

tzot


The path in your error is:

'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'

I think this is the UTF8 encoded version of your filename.

I've created a folder of the same name on Windows7 and placed a file called 'abc.txt' in it:

>>> a = '\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
>>> os.listdir('.')
['?????\xb7???!']
>>> os.listdir(u'.') # Pass unicode to have unicode returned to you
[u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01']
>>> 
>>> a.decode('utf8') # UTF8 decoding your string matches the listdir output
u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'
>>> os.listdir(a.decode('utf8'))
[u'abc.txt']

So it seems that Duncan's suggestion of path.decode('utf8') does the trick.


Update

I can't test this for you, but I suggest that you try checking whether the path contains non-ascii before doing the .decode('utf8'). This is a bit hacky...

ASCII_TRANS = '_'*32 + ''.join([chr(x) for x in range(32,126)]) + '_'*130
path=path.strip()
path=path[17:] #to remove the file://localhost/  part
path=urllib.unquote(path)
if path.translate(ASCII_TRANS) != path: # Contains non-ascii
  path = path.decode('utf8')
path=urllib.url2pathname(path)
like image 102
MattH Avatar answered Oct 02 '22 19:10

MattH