
Using Python Requests to 'bridge' a file without loading into memory?

I'd like to use the Python Requests library to GET a file from a url and use it as a multipart-encoded file in a POST request. The catch is that the file could be very large (50MB-2GB) and I don't want to load it into memory. (Context here.)

Following examples in the docs (multipart, stream down and stream up) I cooked up something like this:

    with requests.get(big_file_url, stream=True) as f:
        requests.post(upload_url, files={'file': ('filename', f.content)})

but I'm not sure I'm doing it right. It is in fact throwing this error (excerpted from the traceback):

    with requests.get(big_file_url, stream=True) as f:
    AttributeError: __exit__

Any suggestions?

ergelo · asked Apr 12 '13

2 Answers

As other answers have pointed out already: requests doesn't support POSTing multipart-encoded files without loading them into memory. (As for the AttributeError: __exit__ in the question: Response objects in the requests version used there are not context managers, so assign the response to a variable instead of using a with statement.)

To upload a large file without loading it into memory using multipart/form-data, you could use poster:

#!/usr/bin/env python
# Python 2 (urllib2; the poster library has no Python 3 support)
import sys
from urllib2 import Request, urlopen

from poster.encode import multipart_encode # $ pip install poster
from poster.streaminghttp import register_openers

register_openers() # install openers globally

def report_progress(param, current, total):
    sys.stderr.write("\r%03d%% of %d" % (int(1e2*current/total + .5), total))

url = 'http://example.com/path/'
params = {'file': open(sys.argv[1], "rb"), 'name': 'upload test'}
response = urlopen(Request(url, *multipart_encode(params, cb=report_progress)))
print response.read()
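
To run it, pass the file to upload as the first command-line argument, e.g. python upload.py /path/to/bigfile (the script name upload.py is just for illustration; the script reads the path from sys.argv[1]).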

It can be adapted to accept a GET response object instead of a local file:

import posixpath
import sys
from urllib import unquote
from urllib2 import Request, urlopen
from urlparse import urlsplit

from poster.encode import MultipartParam, multipart_encode # pip install poster
from poster.streaminghttp import register_openers

register_openers() # install openers globally

class MultipartParamNoReset(MultipartParam):
    def reset(self):
        pass # do nothing (to allow self.fileobj without seek() method)

get_url = 'http://example.com/bigfile'
post_url = 'http://example.com/path/'

get_response = urlopen(get_url)
param = MultipartParamNoReset(
    name='file',
    filename=posixpath.basename(unquote(urlsplit(get_url).path)),  # XXX a backslash is not treated as a separator here
    filetype=get_response.headers['Content-Type'],
    filesize=int(get_response.headers['Content-Length']),
    fileobj=get_response)

params = [('name', 'upload test'), param]
datagen, headers = multipart_encode(params, cb=report_progress)  # report_progress() as defined in the first script
post_response = urlopen(Request(post_url, datagen, headers))
print post_response.read()

This solution requires a valid Content-Length header (a known file size) in the GET response. If the file size is unknown, chunked transfer encoding can be used to upload the multipart/form-data content instead. A similar solution could be implemented using urllib3.filepost, which ships with the requests library, e.g., based on @AdrienF's answer, without using poster.
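
For illustration, here is a minimal sketch of that chunked-transfer variant using only requests, with no poster. The multipart framing is hand-rolled, and big_file_url, upload_url, and the field/file names are placeholder assumptions, not names from any library. Passing a generator as data makes requests send the body with Transfer-Encoding: chunked:

    import uuid
    import requests

    big_file_url = 'http://example.com/bigfile'  # placeholder
    upload_url = 'http://example.com/path/'      # placeholder

    def multipart_body(resp, boundary, chunk_size=8192):
        # Yield the multipart/form-data body piece by piece so that at
        # most one chunk is held in memory at a time.
        yield ('--%s\r\n'
               'Content-Disposition: form-data; name="file"; filename="bigfile"\r\n'
               'Content-Type: application/octet-stream\r\n\r\n' % boundary).encode('ascii')
        for chunk in resp.iter_content(chunk_size):
            yield chunk
        yield ('\r\n--%s--\r\n' % boundary).encode('ascii')

    get_response = requests.get(big_file_url, stream=True)
    boundary = uuid.uuid4().hex
    post_response = requests.post(
        upload_url,
        data=multipart_body(get_response, boundary),  # generator => chunked upload
        headers={'Content-Type': 'multipart/form-data; boundary=' + boundary})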

jfs · answered Sep 20 '22

In theory you can just use the raw object:

In [1]: import requests

In [2]: raw = requests.get("http://download.thinkbroadband.com/1GB.zip", stream=True).raw

In [3]: raw.read(10)
Out[3]: '\xff\xda\x18\x9f@\x8d\x04\xa11_'

In [4]: raw.read(10)
Out[4]: 'l\x15b\x8blVO\xe7\x84\xd8'

In [5]: raw.read() # takes forever...

In [6]: raw = requests.get("http://download.thinkbroadband.com/5MB.zip", stream=True).raw

In [7]: requests.post("http://www.amazon.com", files={'file': ('thing.zip', raw, 'application/zip')}, stream=True)
Out[7]: <Response [200]>
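
Spelled out as a plain script rather than an interactive session, the same idea looks like the sketch below (big_file_url and upload_url are placeholders). One caveat, per the first answer: requests builds the multipart body in memory when given a file object in files=, so only the GET side is guaranteed to stream:

    import requests

    big_file_url = 'http://example.com/bigfile'  # placeholder
    upload_url = 'http://example.com/upload'     # placeholder

    # .raw is the underlying urllib3 file-like object; note that no
    # content decoding (gzip/deflate) is applied to it.
    raw = requests.get(big_file_url, stream=True).raw

    response = requests.post(
        upload_url,
        files={'file': ('thing.zip', raw, 'application/zip')})
    print(response.status_code)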
Batiste Bieler · answered Sep 20 '22