I have a Flask application that should provide an endpoint for downloading a large file. However, instead of serving it from the file system or generating it on the fly, the application first has to download this file from another server via HTTP.
Of course, I could first send a GET request to the external server, download the file completely, store it in the file system or in memory, and then, as a second step, return it as the response to the original request. For example (the basic authentication is included to show why a simple proxy on a lower layer is not sufficient):
#!flask/bin/python
from flask import Flask
import requests
from requests.auth import HTTPBasicAuth

app = Flask(__name__)


@app.route('/download')
def download():
    # The external server requires HTTP basic authentication.
    auth = HTTPBasicAuth("some_user", "some_password")
    session = requests.Session()
    session.auth = auth
    # The whole file is downloaded into memory before anything is returned.
    response = session.get("http://example.com")
    return response.content


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=1234, debug=True)
However, this increases both the latency and the storage requirements of the application. Moreover, even if the client only needs part of the file (i.e. it sends an HTTP range request), the file first has to be fetched completely from the external server.
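To make concrete what a partial download asks for, here is a minimal sketch that parses a single-range `Range` header (e.g. `bytes=0-1023`, following the byte-range syntax of RFC 9110) into inclusive byte offsets. The helpers `parse_single_range` and `bytes_requested` are hypothetical names for illustration, and multi-range headers such as `bytes=0-99,200-299` are deliberately not handled. The point is the size difference: a client asking for the first KiB of a 100 MB file needs 1024 bytes, while the approach above still pulls all 100 MB from the external server.

```python
import re
from typing import Optional, Tuple

# A single byte range: "bytes=0-1023", "bytes=500-" or "bytes=-200".
# The range unit is matched case-insensitively; multi-range headers are not handled.
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$", re.IGNORECASE)


def parse_single_range(header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive (first, last) byte offsets requested by a single-range
    Range header, or None if the header is malformed or not satisfiable."""
    match = _RANGE_RE.match(header.strip())
    if match is None or file_size <= 0:
        return None
    first_str, last_str = match.groups()
    if not first_str and not last_str:
        return None  # "bytes=-" names no range at all
    if not first_str:
        # Suffix range: the last N bytes of the file.
        suffix = int(last_str)
        if suffix == 0:
            return None
        return max(file_size - suffix, 0), file_size - 1
    first = int(first_str)
    last = int(last_str) if last_str else file_size - 1
    if first >= file_size or last < first:
        return None
    # A last offset beyond the end of the file is clamped to the final byte.
    return first, min(last, file_size - 1)


def bytes_requested(header: str, file_size: int) -> int:
    """Bytes the client actually needs; falls back to the whole file when the header
    cannot be used (a real server would answer an unsatisfiable range with 416)."""
    rng = parse_single_range(header, file_size)
    return file_size if rng is None else rng[1] - rng[0] + 1


# Asking for the first KiB of a 100 MB file needs only 1024 bytes.
print(bytes_requested("bytes=0-1023", 100 * 1024 * 1024))  # 1024
```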
Is there a more elegant option to solve this, i.e. to provide support for HTTP range requests that are directly forwarded to the external server?
Based on the answers to "Proxying to another web service with Flask", "Download large file in python with requests" and "Flask large file download", I managed to build a Flask HTTP proxy that works in stream mode.
from flask import Flask, request, Response
import requests

PROXY_URL = 'http://ipv4.download.thinkbroadband.com/'


def download_file(streamable):
    # Consume the upstream response chunk by chunk; leaving the "with" block
    # closes the upstream connection once streaming ends.
    with streamable as stream:
        stream.raise_for_status()
        for chunk in stream.iter_content(chunk_size=8192):
            yield chunk


def _proxy(*args, **kwargs):
    # Forward the incoming request (method, headers including Range, body,
    # cookies) to the upstream server, rewriting only the host part of the URL.
    resp = requests.request(
        method=request.method,
        url=request.url.replace(request.host_url, PROXY_URL),
        headers={key: value for (key, value) in request.headers if key != 'Host'},
        data=request.get_data(),
        cookies=request.cookies,
        allow_redirects=False,
        stream=True)

    # iter_content() decodes gzip/deflate and Flask re-frames the body itself,
    # so these upstream headers would no longer be accurate.
    excluded_headers = ['content-encoding', 'content-length', 'transfer-encoding', 'connection']
    headers = [(name, value) for (name, value) in resp.raw.headers.items()
               if name.lower() not in excluded_headers]

    return Response(download_file(resp), resp.status_code, headers)


app = Flask(__name__)


@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def download(path):
    return _proxy()


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=1234, debug=True)
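download_file() consumes the upstream response, which was opened in stream mode, and yields each chunk as soon as it arrives.
_proxy() creates the upstream request, then creates and returns a Flask Response that uses the download_file() iterator as its content. Because every incoming header except Host is forwarded, a client's Range header reaches the external server; if that server supports range requests, its 206 Partial Content status and Content-Range header are passed back to the client.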
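The generator pattern behind download_file() can be tried without a network connection. In this minimal sketch, a hypothetical iter_chunks() reads from an io.BytesIO object that stands in for the upstream response (an assumption for illustration; in the real proxy, requests' iter_content() does the reading). Only one chunk is held in memory at a time, and the `with` block closes the source once iteration ends, just as download_file() closes the upstream connection.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield successive chunks from a binary file-like object until it is exhausted.

    Mirrors download_file(): each chunk can be passed on immediately instead of
    buffering the whole body first.
    """
    with stream:  # closes the source when the generator finishes or is closed
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            yield chunk


# Stand-in for the upstream body: 20,000 bytes delivered in 8192-byte chunks.
body = io.BytesIO(b"x" * 20_000)
sizes = [len(chunk) for chunk in iter_chunks(body)]
print(sizes)  # [8192, 8192, 3616]
```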
I tested it with https://www.thinkbroadband.com/download, where several archive files can be downloaded for testing purposes. Be careful: the archives are corrupted, so you should use a checksum to make sure you received the expected file.
Some examples:
curl 'http://0.0.0.0:1234/100MB.zip' --output /tmp/100MB.zip
curl 'http://0.0.0.0:1234/20MB.zip' --output /tmp/20MB.zip
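Since the test archives cannot be checked by unpacking them, a checksum is the practical check. This is a minimal sketch of computing the SHA-256 digest of a downloaded file in chunks, so a 100 MB file never has to fit in memory; sha256_of_file() is a hypothetical helper, and the temporary file only stands in for /tmp/100MB.zip. Compare the result with the digest of the same file downloaded directly from the upstream server (or with the output of `sha256sum`).

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Self-contained demo: a temporary file stands in for a proxied download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
path = tmp.name
digest = sha256_of_file(path)
print(digest)
os.remove(path)
```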
I also ran some other tests, downloading large images from random websites. So far I have had no issues.
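One possible refinement, sketched under my own assumptions: _proxy() forwards every request header except Host, including connection-specific ones such as Connection or Keep-Alive. The hypothetical helper filter_headers() below drops the hop-by-hop headers listed in RFC 2616 section 13.5.1, plus any header named in a Connection header (which RFC 9110 section 7.6.1 asks proxies to remove), while leaving end-to-end headers such as Range and Content-Range untouched, so range requests still reach the external server.

```python
from typing import Iterable, List, Tuple

# Hop-by-hop headers apply to a single connection only, so a proxy should not forward them.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def filter_headers(headers: Iterable[Tuple[str, str]],
                   extra_excluded: Iterable[str] = ()) -> List[Tuple[str, str]]:
    """Drop hop-by-hop headers, any header named in a Connection header, and any
    names in extra_excluded. Names are compared case-insensitively."""
    headers = list(headers)
    excluded = HOP_BY_HOP | {name.lower() for name in extra_excluded}
    # "Connection: keep-alive, X-Foo" marks X-Foo as connection-specific too.
    for name, value in headers:
        if name.lower() == "connection":
            excluded |= {token.strip().lower() for token in value.split(",") if token.strip()}
    return [(name, value) for name, value in headers if name.lower() not in excluded]


incoming = [("Host", "localhost:1234"), ("Range", "bytes=0-1023"), ("Accept", "*/*"),
            ("Connection", "keep-alive, X-Trace"), ("X-Trace", "1")]
print(filter_headers(incoming, ["host"]))
```

In _proxy(), this could replace the two comprehensions, e.g. `headers=dict(filter_headers(request.headers.items(), ['host']))` for the upstream request and `filter_headers(resp.raw.headers.items(), ['content-encoding', 'content-length'])` for the response. This variant is a sketch and has not been tested against the thinkbroadband files.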