The goal is to download a file from the internet and create a file object, or a file-like object, from it without it ever touching the hard drive. This is just for my own knowledge: I want to know if it's possible or practical, particularly because I would like to see if I can circumvent having to code a file deletion line.
This is how I would normally download something from the web, and map it to memory:
    import mmap

    import requests

    u = requests.get("http://www.pythonchallenge.com/pc/def/channel.zip")

    with open("channel.zip", "wb") as f:  # I want to eliminate this, as this writes to disk
        f.write(u.content)

    with open("channel.zip", "r+b") as f:  # and this as well, because it reads from disk
        mm = mmap.mmap(f.fileno(), 0)
        mm.seek(0)
        print(mm.readline())
        mm.close()  # question: if I do not include this, does this become a memory leak?
The first thing to do is to reuse a single connection for all the files via HTTP keep-alive (persistent connections). The easiest way to do that is to use the Requests library with a session. If this isn't fast enough, run several downloads in parallel with either multiprocessing or threads.
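A minimal sketch of the connection-reuse idea, assuming the caller passes in a `requests.Session()` (or any object with a compatible `get` method). The helper name `fetch_all` is made up for illustration and is not part of Requests.

```python
def fetch_all(session, urls):
    """Fetch every URL through one session so the connection can be reused.

    `session` is assumed to be a requests.Session (anything with a compatible
    .get() works); `fetch_all` is a hypothetical helper name.
    """
    bodies = {}
    for url in urls:
        response = session.get(url)
        response.raise_for_status()  # fail loudly on 4xx/5xx responses
        bodies[url] = response.content  # bytes held in memory only
    return bodies
```

With Requests this would typically be called as `with requests.Session() as s: bodies = fetch_all(s, urls)`, where `urls` is your own list of addresses.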
Download multiple files with a Python loop

To download the list of URLs to the associated files, loop through the iterable (inputs) that we created, passing each element to download_url. After each download completes, we print the downloaded URL and the time it took to download.
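The original definitions of `inputs` and `download_url` are not shown here, so the following is only a sketch of what such a loop might look like. It assumes `inputs` is an iterable of `(url, path)` pairs and uses the standard library's `urllib.request` instead of Requests to stay dependency-free; `download_all` is a hypothetical name.

```python
import time
import urllib.request


def download_url(args):
    """Hypothetical helper: save one URL to a file, return (url, seconds taken)."""
    url, path = args
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
    return url, time.perf_counter() - start


def download_all(inputs):
    """Download each (url, path) pair in turn and report the timing of each."""
    results = []
    for pair in inputs:
        url, elapsed = download_url(pair)
        print(f"url: {url}  time (s): {elapsed:.3f}")
        results.append((url, elapsed))
    return results
```

Note that this variant does write to disk, as the paragraph above describes; it is the sequential baseline that a session or a thread pool would then speed up.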
r.raw (HTTPResponse) is already a file-like object (just pass stream=True):
    #!/usr/bin/env python
    import sys

    import requests  # $ pip install requests
    from PIL import Image  # $ pip install pillow

    url = sys.argv[1]
    r = requests.get(url, stream=True)
    r.raw.decode_content = True  # Content-Encoding
    im = Image.open(r.raw)  # NOTE: it requires pillow 2.8+
    print(im.format, im.mode, im.size)
In general, if you have a bytestring, you can wrap it as f = io.BytesIO(r.content) to get a file-like object without touching the disk:
    #!/usr/bin/env python
    import io
    import zipfile
    from contextlib import closing

    import requests  # $ pip install requests

    r = requests.get("http://www.pythonchallenge.com/pc/def/channel.zip")
    with closing(r), zipfile.ZipFile(io.BytesIO(r.content)) as archive:
        print({member.filename: archive.read(member) for member in archive.infolist()})
You can't pass r.raw to ZipFile() directly because the former is a non-seekable file.
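To try the io.BytesIO approach without any network access, here is a small offline sketch. The archive is built in memory to stand in for `r.content`; the helper names `make_zip_bytes` and `read_zip_bytes` and the sample file name are made up for illustration.

```python
import io
import zipfile


def make_zip_bytes(files):
    """Build a zip archive entirely in memory and return its raw bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def read_zip_bytes(payload):
    """Open zip bytes through a seekable io.BytesIO; return {filename: contents}."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


payload = make_zip_bytes({"readme.txt": b"hello"})  # stands in for r.content
members = read_zip_bytes(payload)
```

The key point is that io.BytesIO supports seek(), which ZipFile relies on to locate the archive's directory at the end of the data.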
"I would like to see if I can circumvent having to code a file deletion line"

tempfile can delete files automatically: f = tempfile.SpooledTemporaryFile(); f.write(u.content). The data is kept in memory until the .fileno() method is called (if some API requires a real file) or the file size exceeds max_size. Even if the data is written to disk, the file is deleted as soon as it is closed.
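A short sketch of the SpooledTemporaryFile approach, using a literal byte string in place of `u.content`. The helper name `spool` and the 1 MiB threshold are arbitrary choices for illustration; with the default `max_size=0`, no size limit applies and the data stays in memory unless `.fileno()` is called.

```python
import tempfile


def spool(payload, max_size=1024 * 1024):
    """Copy bytes into a SpooledTemporaryFile and rewind it for reading.

    The data stays in memory unless it grows past max_size or .fileno()
    is called; `spool` is a hypothetical helper name.
    """
    f = tempfile.SpooledTemporaryFile(max_size=max_size)
    f.write(payload)
    f.seek(0)
    return f


# Closing the file (here via the with-block) discards the data; no manual delete.
with spool(b"first line\nsecond line\n") as f:
    first = f.readline()
```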
Your answer is u.content. The content is already in memory; unless you write it to a file, it won't be stored on disk.
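As a rough in-memory counterpart to the mmap code in the question, the bytes can be wrapped in io.BytesIO and read line by line. This is only a sketch: `first_line` is a hypothetical helper, and the sample bytes stand in for `u.content`.

```python
import io


def first_line(payload):
    """Return the first line of a bytes payload without writing it to disk."""
    with io.BytesIO(payload) as buf:  # closing just releases the in-memory buffer
        return buf.readline()


sample = b"line one\nline two\n"  # stand-in for u.content
line = first_line(sample)
```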