
How to forward HTTP range requests using Python and Flask?

Tags: python, http, flask

I have a Flask application that should provide an endpoint for downloading a large file. However, instead of serving it from the file system or generating it on the fly, the application first has to download the file from another server via HTTP.

Of course, I could first perform a GET request to the external server, download the file completely, store it in the file system or in memory, and then return it as the response to the original request. This would look, for example, like this (including basic authentication to show why a simple proxy on a lower layer is not sufficient):

#!flask/bin/python
from flask import Flask
import requests
from requests.auth import HTTPBasicAuth

app = Flask(__name__)

@app.route('/download')
def download():
    auth = HTTPBasicAuth("some_user", "some_password")
    session = requests.Session()
    session.auth = auth
    response = session.get("http://example.com")
    return response.content

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=1234, debug=True)

However, this increases both the latency and the storage requirements of the application. Moreover, even if the receiver only needs part of the file (i.e. it performs an HTTP range request), the whole file has to be fetched from the external server first.

Is there a more elegant option to solve this, i.e. to provide support for HTTP range requests that are directly forwarded to the external server?

Asked Jun 24 '20 10:06 by koalo


1 Answer

Based on the questions "Proxying to another web service with Flask", "Download large file in python with requests" and "Flask large file download", I managed to build a Flask HTTP proxy that works in stream mode.

from flask import Flask, request, Response
import requests

PROXY_URL = 'http://ipv4.download.thinkbroadband.com/'

def download_file(streamable):
    with streamable as stream:
        stream.raise_for_status()
        for chunk in stream.iter_content(chunk_size=8192):
            yield chunk


def _proxy(*args, **kwargs):
    resp = requests.request(
        method=request.method,
        url=request.url.replace(request.host_url, PROXY_URL),
        headers={key: value for (key, value) in request.headers if key != 'Host'},
        data=request.get_data(),
        cookies=request.cookies,
        allow_redirects=False,
        stream=True)

    excluded_headers = ['content-encoding', 'content-length', 'transfer-encoding', 'connection']
    headers = [(name, value) for (name, value) in resp.raw.headers.items()
               if name.lower() not in excluded_headers]

    return Response(download_file(resp), resp.status_code, headers)


app = Flask(__name__)

@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def download(path):
    return _proxy()

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=1234, debug=True)

download_file() consumes the response, which was opened in stream mode, and yields each chunk as soon as it arrives.

_proxy() creates the upstream request with stream=True, then creates and returns a Flask Response that uses the download_file() iterator as its content.
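Since _proxy() passes on every client header except Host, a Range header presumably reaches the upstream server unchanged, and the 206 status and Content-Range header come back through the response. The following is a minimal sketch of making that header handling explicit. The helper names are hypothetical and not part of the code above, and forcing Accept-Encoding to identity is an assumption on my part, so that byte ranges refer to the uncompressed file rather than to a compressed stream that iter_content() would try to decode.

```python
# Hypothetical helpers (not part of the answer's code) showing which headers
# matter when a range request passes through the proxy.

# Hop-by-hop headers apply to a single connection and should not be forwarded.
HOP_BY_HOP = {
    'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
    'te', 'trailer', 'transfer-encoding', 'upgrade',
}


def upstream_request_headers(client_headers):
    """Build the headers sent upstream from a mapping of client headers.

    Range and other end-to-end headers are kept. Host and hop-by-hop headers
    are dropped. Accept-Encoding is replaced by identity (an assumption, see
    the text above).
    """
    skipped = HOP_BY_HOP | {'host', 'accept-encoding'}
    headers = {
        key: value for key, value in client_headers.items()
        if key.lower() not in skipped
    }
    headers['Accept-Encoding'] = 'identity'
    return headers


def downstream_response_headers(upstream_headers):
    """Filter upstream response headers before handing them to Flask.

    Uses the answer's exclusion list plus the remaining hop-by-hop headers.
    Content-Range and Accept-Ranges pass through untouched.
    """
    excluded = HOP_BY_HOP | {'content-encoding', 'content-length'}
    return [(name, value) for name, value in upstream_headers.items()
            if name.lower() not in excluded]
```

In _proxy(), these would replace the two inline comprehensions, e.g. headers=upstream_request_headers(request.headers) and headers = downstream_response_headers(resp.raw.headers).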

I tested it with https://www.thinkbroadband.com/download, where several archive files are freely available for test purposes. (Be careful: the archives are corrupted, so use a checksum to make sure you got the expected file.)
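One way to do the checksum comparison is to hash the file fetched through the proxy and the file fetched directly from the origin, then compare the two digests. A minimal sketch using hashlib from the standard library follows. The function name and the choice of SHA-256 are my own assumptions, not something the test site prescribes.

```python
import hashlib


def sha256_of_file(path, chunk_size=8192):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

If sha256_of_file('/tmp/100MB.zip') matches the digest of a copy downloaded without the proxy, the streamed file arrived intact.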

Some examples:

curl 'http://0.0.0.0:1234/100MB.zip' --output /tmp/100MB.zip
curl 'http://0.0.0.0:1234/20MB.zip' --output /tmp/20MB.zip
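The same endpoints can be exercised with a partial download, which is what the question asks about. Below is a hedged sketch of a small client that sends a Range header, written with urllib from the standard library rather than requests. The helper names are hypothetical, and the commented usage assumes the proxy is running locally on port 1234.

```python
import urllib.request


def range_header(start, end=None):
    """Format a single byte range such as bytes=0-1023 (end is inclusive)."""
    if start < 0 or (end is not None and end < start):
        raise ValueError('invalid byte range')
    return 'bytes={}-{}'.format(start, '' if end is None else end)


def fetch_range(url, start, end=None):
    """Request part of a resource; returns (status, Content-Range, body)."""
    req = urllib.request.Request(url, headers={'Range': range_header(start, end)})
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.headers.get('Content-Range'), resp.read()


# Assumes the proxy from the answer is running locally on port 1234:
# status, content_range, body = fetch_range('http://127.0.0.1:1234/20MB.zip', 0, 1023)
# A server that honours the range answers with status 206 and a 1024-byte body.
```

A status of 200 instead of 206 would indicate that the range was ignored somewhere along the chain and the full file was sent.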

I also ran some other tests against random websites to fetch large images. So far I have had no issues.

Answered Oct 12 '22 10:10 by Arount