Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is python zipfile thread-safe?

In a django project, I need to generate some pdf files for objects in db. Since each file takes a few seconds to generate, I use celery to run tasks asynchronously.

Problem is, I need to add each file to a zip archive. I was planning to use the python zipfile module, but different tasks can be run in different threads, and I wonder what will happen if two tasks try to add a file to the archive at the same time.

Is the following code thread safe or not? I cannot find any valuable information in the python's official doc.

try:
    zippath = os.path.join(pdf_directory, 'archive.zip')
    zipfile = ZipFile(zippath, 'a')
    zipfile.write(pdf_fullname)
finally:
    zipfile.close()

Note: this is running under python 2.6

like image 391
Thibault J Avatar asked Feb 08 '12 14:02

Thibault J


People also ask

Is Python threading thread-safe?

Python is not by its self thread safe. But there are moves to change this: NoGil, etc. Removing the GIL does not make functions thread-safe.

What is Python ZIP file?

Python's zipfile is a standard library module intended to manipulate ZIP files. This file format is a widely adopted industry standard when it comes to archiving and compressing digital data. You can use it to package together several related files.

Can I read ZIP file in Python?

Python can work directly with data in ZIP files. You can look at the list of items in the directory and work with the data files themselves.

How do I unzip a ZIP file module in Python?

Import the zipfile module Create a zip file object using ZipFile class. Call the extract() method on the zip file object and pass the name of the file to be extracted and the path where the file needed to be extracted and Extracting the specific file present in the zip.


2 Answers

No, it is not thread-safe in that sense. If you're appending to the same zip file, you'd need a lock there, or the file contents could get scrambled. If you're appending to different zip files, using separate ZipFile() objects, then you're fine.

like image 179
nosklo Avatar answered Nov 03 '22 01:11

nosklo


Python 3.5.5 makes writing to ZipFile and reading multiple ZipExtFiles threadsafe: https://docs.python.org/3.5/whatsnew/changelog.html#id93

As far as I can tell, the change has not been backported to Python 2.7.

Update: after studying the code and some testing, it becomes apparent that the locking is still not thoroughly implemented. It correctly works only for writestr and doesn't work for open and write.

like image 44
Eugene Pakhomov Avatar answered Nov 03 '22 00:11

Eugene Pakhomov