Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Passing a file-like object to write() method of another file-like object

I am trying to get a large file from the web, and stream it directly into the zipfile writer provided by the zipfile module, something like:

from urllib.request import urlopen
from zipfile import ZipFile

zip_file = ZipFile("/a/certain/local/zip/file.zip","a")
entry = zip_file.open("an.entry","w")
entry.write( urlopen("http://a.certain.file/on?the=web") )

Apparently, this doesn't work because .write accepts a bytes argument, not an I/O reader. However, since the file is rather large I don't want to load the whole file into RAM before compressing it.

The simple solution is to use bash (never really tried, could be wrong):

curl -s "http://a.certain.file/on?the=web" | zip -q /a/certain/local/zip/file.zip

but it wouldn't be a very elegant, nor convenient, thing to put a single line of bash in a Python script.

Another solution is to use urllib.request.urlretrieve to download the file and then pass the path to zipfile.ZipFile.open, but that way I would still have to wait for the download to complete, and besides that also consume a lot more disk I/O resource.

Is there a way, in Python, to directly pass the download stream to a zipfile writer, like the the bash pipeline above?

like image 237
busukxuan Avatar asked Feb 02 '17 20:02

busukxuan


People also ask

What is file like object in Python?

File objects are also called file-like objects or streams. There are actually three categories of file objects: raw binary files, buffered binary files and text files. Their interfaces are defined in the io module. The canonical way to create a file object is by using the open() function.

How do you create a file like an object in Python?

To create a file object in Python use the built-in functions, such as open() and os. popen() . IOError exception is raised when a file object is misused, or file operation fails for an I/O-related reason. For example, when you try to write to a file when a file is opened in read-only mode.

How do I copy a text file to another file in Python?

The shutil. copy() method in Python is used to copy the content of the source file to destination file or directory.


1 Answers

You can use shutil.copyfileobj() to efficiently copy data between file objects:

from shutil import copyfileobj

with ZipFile("/a/certain/local/zip/file.zip", "w") as zip_file:
    with zip_file.open("an.entry", "w") as entry:
        with urlopen("http://a.certain.file/on?the=web") as response:
            shutil.copyfileobj(response, entry)

This'll call .read() with a given chunksize on the source file object, then pass that chunk to the .write() method on the target file object.

If you are using Python 3.5 or older (where you can't yet directly write to a ZipFile member), your only option is to stream to a temporary file first:

from shutil import copyfileobj
from tempfile import NamedTemporaryFile

with ZipFile("/a/certain/local/zip/file.zip", "w") as zip_file:
    with NamedTemporaryFile() as cache:
        with urlopen("http://a.certain.file/on?the=web") as response:
            shutil.copyfileobj(response, cache)
            cache.flush()
            zipfile.write('an.entry', cache.name)

Using a NamedTemporaryFile() like this only works on POSIX systems, on Windows, you can't open the same filename again, so you'd have to use a tempfile.mkstemp() generated name, open the file from there, and use try...finally to clean up afterwards.

like image 161
Martijn Pieters Avatar answered Sep 20 '22 17:09

Martijn Pieters