I'd like to use the Python Requests library to GET a file from a URL and use it as a multipart-encoded file in a POST request. The catch is that the file could be very large (50 MB-2 GB) and I don't want to load it into memory. (Context here.)
Following the examples in the docs (multipart, stream down and stream up), I cooked up something like this:
with requests.get(big_file_url, stream=True) as f:
    requests.post(upload_url, files={'file': ('filename', f.content)})
but I'm not sure I'm doing it right. It is in fact throwing this error (redacted from the traceback):
with requests.get(big_file_url, stream=True) as f:
AttributeError: __exit__
Any suggestions?
As other answers have pointed out already: requests doesn't support POSTing multipart-encoded files without loading them into memory.
To upload a large file without loading it into memory using multipart/form-data, you could use poster:
#!/usr/bin/env python
import sys
from urllib2 import Request, urlopen
from poster.encode import multipart_encode    # $ pip install poster
from poster.streaminghttp import register_openers

register_openers()  # install openers globally

def report_progress(param, current, total):
    sys.stderr.write("\r%03d%% of %d" % (int(1e2*current/total + .5), total))

url = 'http://example.com/path/'
params = {'file': open(sys.argv[1], "rb"), 'name': 'upload test'}
response = urlopen(Request(url, *multipart_encode(params, cb=report_progress)))
print response.read()
It can be adapted to accept a GET response object instead of a local file:
import posixpath
import sys
from urllib import unquote
from urllib2 import Request, urlopen
from urlparse import urlsplit
from poster.encode import MultipartParam, multipart_encode  # $ pip install poster
from poster.streaminghttp import register_openers

register_openers()  # install openers globally

class MultipartParamNoReset(MultipartParam):
    def reset(self):
        pass  # do nothing (to allow self.fileobj without a seek() method)

get_url = 'http://example.com/bigfile'
post_url = 'http://example.com/path/'

get_response = urlopen(get_url)
param = MultipartParamNoReset(
    name='file',
    filename=posixpath.basename(unquote(urlsplit(get_url).path)),  # XXX assumes no backslash in the path
    filetype=get_response.headers['Content-Type'],
    filesize=int(get_response.headers['Content-Length']),
    fileobj=get_response)
params = [('name', 'upload test'), param]
datagen, headers = multipart_encode(params, cb=report_progress)  # report_progress from the previous snippet
post_response = urlopen(Request(post_url, datagen, headers))
print post_response.read()
This solution requires a valid Content-Length header (known file size) in the GET response. If the file size is unknown, then chunked transfer encoding could be used to upload the multipart/form-data content. A similar solution could be implemented using urllib3.filepost, which is shipped with the requests library, e.g. based on @AdrienF's answer, without using poster.
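The chunked-encoding route doesn't strictly require poster: the multipart framing is simple enough to generate lazily yourself. Below is a minimal Python 3 sketch (the multipart_stream helper and its names are mine, not from any library); requests accepts a generator for data= and sends it with chunked transfer encoding, so no Content-Length is needed:

```python
import uuid

def multipart_stream(field_name, filename, fileobj, content_type,
                     chunk_size=64 * 1024):
    """Return (generator, headers) for a streaming multipart/form-data body."""
    boundary = uuid.uuid4().hex

    def body():
        # opening boundary and part headers
        yield ('--%s\r\n'
               'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
               'Content-Type: %s\r\n\r\n'
               % (boundary, field_name, filename, content_type)).encode('ascii')
        # file payload, never holding more than chunk_size in memory
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break
            yield chunk
        # closing boundary
        yield ('\r\n--%s--\r\n' % boundary).encode('ascii')

    headers = {'Content-Type': 'multipart/form-data; boundary=%s' % boundary}
    return body(), headers

# usage sketch (hypothetical URLs):
# r = requests.get(big_file_url, stream=True)
# data, headers = multipart_stream('file', 'bigfile', r.raw,
#                                  r.headers.get('Content-Type',
#                                                'application/octet-stream'))
# requests.post(upload_url, data=data, headers=headers)
```

Real servers can be picky about multipart details, so treat this as a starting point rather than a drop-in replacement for a tested encoder.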
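For completeness, the urllib3.filepost route mentioned above looks roughly like this. Note that encode_multipart_formdata builds the entire body in memory, so unlike poster it is only suitable for small files (a sketch, with example.com as a placeholder URL):

```python
from urllib3.filepost import encode_multipart_formdata

# encode_multipart_formdata accepts {name: value} fields, where a value may
# be a plain string or a (filename, data) tuple; it returns the fully
# encoded body plus a matching Content-Type header.
body, content_type = encode_multipart_formdata(
    {'file': ('upload.txt', b'hello world'),
     'name': 'upload test'})

# requests.post('http://example.com/path/', data=body,
#               headers={'Content-Type': content_type})
```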
In theory you can just use the raw object:
In [1]: import requests
In [2]: raw = requests.get("http://download.thinkbroadband.com/1GB.zip", stream=True).raw
In [3]: raw.read(10)
Out[3]: '\xff\xda\x18\x9f@\x8d\x04\xa11_'
In [4]: raw.read(10)
Out[4]: 'l\x15b\x8blVO\xe7\x84\xd8'
In [5]: raw.read() # takes forever...
In [6]: raw = requests.get("http://download.thinkbroadband.com/5MB.zip", stream=True).raw
In [7]: requests.post("http://www.amazon.com", {'file': ('thing.zip', raw, 'application/zip')}, stream=True)
Out[7]: <Response [200]>
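A caveat on the transcript above: the dict is passed as the second positional argument, which is data= (plain form fields), not files=, so no multipart encoding happens at all. Passing it to files= fixes that, but requests then reads the whole stream into memory to build the body, which is the original problem. A sketch of the corrected (still memory-hungry) call, using an in-memory stand-in and a placeholder URL:

```python
import io
import requests

raw = io.BytesIO(b'fake zip bytes')  # stand-in for response.raw

# files= (not the positional data= slot) triggers multipart encoding;
# prepare() shows what would be sent without hitting the network.
req = requests.Request(
    'POST', 'http://example.com/upload',
    files={'file': ('thing.zip', raw, 'application/zip')},
).prepare()
print(req.headers['Content-Type'])  # multipart/form-data; boundary=...
```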