I have a small script that extracts a .zip file. It works well, but only for .zip files that don't contain files with letters like "ä", "ö", "ü" (and so on) in their filenames. Otherwise I get this error:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "threading.pyc", line 552, in __bootstrap_inner
  File "install.py", line 92, in run
  File "zipfile.pyc", line 962, in extractall
  File "zipfile.pyc", line 950, in extract
  File "zipfile.pyc", line 979, in _extract_member
  File "ntpath.pyc", line 108, in join
UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 32: ordinal not in range(128)
Here is the extracting part of my script:
zip = zipfile.ZipFile(path1)
zip.extractall(path2)
How can I solve this?
One suggestion:
I get the same kind of error when I do this:
>>> c = chr(129)
>>> c + u'2'
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    c + u'2'
UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 0: ordinal not in range(128)
There is a unicode string passed to join somewhere.
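The failure comes from Python 2 silently decoding the byte string with the ASCII codec when it is combined with a unicode string. Here is a minimal sketch of that step made explicit (it also runs on Python 3). The helper name `try_decode` is hypothetical, and cp437 as the code page is only an assumption for illustration:

```python
def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


raw = b"\x94"  # the byte reported in the traceback from the question

# The implicit conversion uses ASCII, which only covers bytes 0-127.
ascii_result = try_decode(raw, "ascii")

# Decoding with an explicit code page succeeds; cp437 is only a guess here.
cp437_result = try_decode(raw, "cp437")
```

With ASCII the decode fails, while an explicit code page turns the byte into a letter, so the fix has to make sure both operands are the same string type before they are joined.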
Could it be that the path of the zip file is a unicode string? What if you do this:
zip = zipfile.ZipFile(str(path1))
zip.extractall(str(path2))
or this:
zip = zipfile.ZipFile(unicode(path1))
zip.extractall(unicode(path2))
This is line 128 in ntpath:
def join(a, *p): # 63
    for b in p: # 68
        path += "\\" + b # 128
Second Suggestion:
from ntpath import *

def join(a, *p):
    """Join two or more pathname components, inserting "\\" as needed.
    If any component is an absolute path, all previous path components
    will be discarded."""
    path = a
    for b in p:
        b_wins = 0  # set to 1 iff b makes path irrelevant
        if path == "":
            b_wins = 1
        elif isabs(b):
            # This probably wipes out path so far. However, it's more
            # complicated if path begins with a drive letter:
            #     1. join('c:', '/a') == 'c:/a'
            #     2. join('c:/', '/a') == 'c:/a'
            # But
            #     3. join('c:/a', '/b') == '/b'
            #     4. join('c:', 'd:/') = 'd:/'
            #     5. join('c:/', 'd:/') = 'd:/'
            if path[1:2] != ":" or b[1:2] == ":":
                # Path doesn't start with a drive letter, or cases 4 and 5.
                b_wins = 1
            # Else path has a drive letter, and b doesn't but is absolute.
            elif len(path) > 3 or (len(path) == 3 and
                                   path[-1] not in "/\\"):
                # case 3
                b_wins = 1
        if b_wins:
            path = b
        else:
            # Join, and ensure there's a separator.
            assert len(path) > 0
            if path[-1] in "/\\":
                if b and b[0] in "/\\":
                    path += b[1:]
                else:
                    path += b
            elif path[-1] == ":":
                path += b
            elif b:
                if b[0] in "/\\":
                    path += b
                else:
                    # !!! modify the next line so it works !!!
                    path += "\\" + b
            else:
                # path is not empty and does not end with a backslash,
                # but b is empty; since, e.g., split('a/') produces
                # ('a', ''), it's best if join() adds a backslash in
                # this case.
                path += '\\'
    return path

import ntpath
ntpath.join = join
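One possible way to change the line marked "modify the next line so it works" is to convert both operands to unicode before concatenating them. This is only a sketch: the helper names `to_text` and `join_component` are hypothetical, and cp437 is an assumed encoding for the byte strings, which may not match your archive:

```python
def to_text(value, encoding="cp437"):
    """Decode byte strings to unicode; return unicode strings unchanged."""
    if isinstance(value, bytes):  # on Python 2, bytes is the plain str type
        return value.decode(encoding)
    return value


def join_component(path, b, encoding="cp437"):
    """Append b to path with a backslash, with both sides as unicode."""
    return to_text(path, encoding) + u"\\" + to_text(b, encoding)


# Mixing a unicode directory with a byte-string member name no longer
# triggers the implicit ASCII decode.
joined = join_component(u"C:\\target", b"sch\x94n.txt")
```

In the patched function above, the marked line would then read `path = join_component(path, b)`, assuming every other branch is adjusted the same way.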
For portability (for example, if you zip files on Windows and extract them on Linux), you can store all file paths in the archive as unicode. When extracting, do not use ZipFile.extractall: it writes the files straight to disk and does not handle unicode paths inside the archive. Try this instead:
import errno
import os
import sys
import zipfile

zf = zipfile.ZipFile(sys.argv[1], 'r')
for m in zf.infolist():
    data = zf.read(m)  # extract zipped data into memory
    # convert unicode file path to utf8
    disk_file_name = m.filename.encode('utf8')
    dir_name = os.path.dirname(disk_file_name)
    if dir_name:  # top-level entries have no directory to create
        try:
            os.makedirs(dir_name)
        except OSError as e:
            # an already existing directory is fine; re-raise anything else
            if e.errno != errno.EEXIST:
                raise
    if m.filename.endswith('/'):
        continue  # directory entry: nothing to write
    with open(disk_file_name, 'wb') as fd:
        fd.write(data)
zf.close()
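If you run this on Python 3 instead, `zipfile` already returns names as text, but it falls back to cp437 when the archive does not flag a name as UTF-8 (general purpose bit 11, `0x800`). A minimal sketch of undoing that fallback follows. The function name `recover_name` is hypothetical, and the original encoding (cp1252 in the example) is an assumption you have to supply yourself:

```python
UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8


def recover_name(filename, flag_bits, original_encoding):
    """Undo the cp437 fallback decoding for names not stored as UTF-8."""
    if flag_bits & UTF8_FLAG:
        return filename  # already decoded correctly
    raw = filename.encode("cp437")  # back to the bytes stored in the archive
    return raw.decode(original_encoding)


# A name written as cp1252 bytes shows up with a wrong character after the
# cp437 fallback; re-decoding with the assumed encoding repairs it.
fixed = recover_name(u"\xf7.txt", 0, "cp1252")
```

In an extraction loop this would be called as `recover_name(m.filename, m.flag_bits, "cp1252")` for each entry returned by `infolist()`.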