Python not able to open file with non-english characters in path

I have a file with the following path: D:/bar/クレイジー・ヒッツ！/foo.abc

I am parsing the path from an XML file and storing it in a variable called path, in the form file://localhost/D:/bar/クレイジー・ヒッツ！/foo.abc. The following operations are then applied to it:

path=path.strip()
path=path[17:] #to remove the file://localhost/  part
path=urllib.url2pathname(path)
path=urllib.unquote(path)

The error is :

IOError: [Errno 2] No such file or directory: 'D:\\bar\\\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81\\foo.abc'

Update 1 : I am using Python 2.7 on Windows 7

asked May 12 '11 07:05

bcosynot

2 Answers

Provide the filename as a unicode string to the open call.

How do you produce the filename?

if provided as a constant by you

Add a line near the beginning of your script:

# -*- coding: utf8 -*-

Then, in a UTF-8 capable editor, set path to the unicode filename:

path = u"D:/bar/クレイジー・ヒッツ！/foo.abc"

read from a list of directory contents

Retrieve the contents of the directory using a unicode dirspec:

dir_files= os.listdir(u'.')

read from a text file

Open the file that contains the filenames with codecs.open, so that you read unicode data from it. You need to specify the encoding of that file yourself (you are the one who knows what the “default windows charset” for non-Unicode applications is on your computer).
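As a rough sketch of the text-file case, the snippet below writes a small list file and reads it back with codecs.open. The file name paths.txt, the temporary directory, and the choice of UTF-8 as the file's encoding are assumptions made for illustration; substitute the encoding your own file really uses.

```python
import codecs
import os
import tempfile

# Hypothetical list file holding one path per line, saved as UTF-8 (an assumption).
list_dir = tempfile.mkdtemp()
list_file = os.path.join(list_dir, 'paths.txt')
with open(list_file, 'wb') as f:
    f.write(u'D:/bar/\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01/foo.abc\n'.encode('utf8'))

# codecs.open decodes while reading, so every line arrives as unicode.
with codecs.open(list_file, 'r', encoding='utf8') as f:
    paths = [line.strip() for line in f]

# Clean up the temporary file and directory.
os.remove(list_file)
os.rmdir(list_dir)
```

Each entry of paths is then a unicode string that can be handed to open directly.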

in any case

Decode the byte string to unicode:

path= path.decode("utf8")

before opening the file; substitute the correct encoding if not "utf8".
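To illustrate the decode step, here is a minimal sketch that uses the byte string from the error message in the question. The helper name to_unicode is hypothetical, and UTF-8 is assumed because that is what the bytes in the error appear to be.

```python
# Byte string copied from the IOError message in the question.
raw_dir = (b'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc'
           b'\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81')


def to_unicode(value, encoding='utf8'):
    """Return value as a unicode string, decoding it only if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


dir_name = to_unicode(raw_dir)
```

Checking the type first means the helper can be called on a value that is already unicode without raising an error.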

answered Oct 02 '22 20:10

tzot

The path in your error is:

'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'

I think this is the UTF8 encoded version of your filename.

I've created a folder of the same name on Windows7 and placed a file called 'abc.txt' in it:

>>> a = '\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
>>> os.listdir('.')
['?????\xb7???!']
>>> os.listdir(u'.') # Pass unicode to have unicode returned to you
[u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01']
>>> 
>>> a.decode('utf8') # UTF8 decoding your string matches the listdir output
u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'
>>> os.listdir(a.decode('utf8'))
[u'abc.txt']

So it seems that Duncan's suggestion of path.decode('utf8') does the trick.

Update

I can't test this for you, but I suggest that you try checking whether the path contains non-ascii before doing the .decode('utf8'). This is a bit hacky...

import urllib

# 256-entry translation table: printable ASCII maps to itself, every other byte to '_'
ASCII_TRANS = '_'*32 + ''.join([chr(x) for x in range(32,127)]) + '_'*129
path=path.strip()
path=path[17:] #to remove the file://localhost/  part
path=urllib.unquote(path) # turn %XX escapes back into raw bytes
if path.translate(ASCII_TRANS) != path: # Contains non-ascii
  path = path.decode('utf8')
path=urllib.url2pathname(path)
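The same idea can be wrapped in a small function. This is only a sketch under stated assumptions: the function name file_url_to_unicode_path is hypothetical, the URL is assumed to carry the folder name as percent-encoded UTF-8, and on Python 2 the input is assumed to be a byte string. The import fallback lets the snippet run on Python 3 as well. It leaves out url2pathname, whose result depends on the platform.

```python
try:
    from urllib import unquote  # Python 2: byte string in, byte string out
except ImportError:
    from urllib.parse import unquote_to_bytes as unquote  # Python 3 counterpart

PREFIX = 'file://localhost/'


def file_url_to_unicode_path(url):
    """Strip the file://localhost/ prefix and return the path as unicode."""
    url = url.strip()
    if url.startswith(PREFIX):
        url = url[len(PREFIX):]
    raw = unquote(url)         # percent escapes become raw bytes
    return raw.decode('utf8')  # assumption: those bytes are UTF-8


url = ('file://localhost/D:/bar/'
       '%E3%82%AF%E3%83%AC%E3%82%A4%E3%82%B8%E3%83%BC'
       '%E3%83%BB%E3%83%92%E3%83%83%E3%83%84%EF%BC%81/foo.abc')
unicode_path = file_url_to_unicode_path(url)
```

Decoding unconditionally is safe for plain ASCII paths as well, since ASCII is a subset of UTF-8, so the non-ascii check is not needed in this variant.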

answered Oct 02 '22 19:10

MattH

Tags: python, file, path, url-encoding