
Handling large file uploads with Flask

What would be the best way to handle very large file uploads (1 GB +) with Flask?

My application takes multiple files, assigns them a single unique file number, and saves them on the server in a location the user selects.

How can we run file uploads as a background task, so that the user does not have to watch the browser spin for an hour and can instead proceed to the next page right away?

  • The Flask development server can take massive files (50 GB took 1.5 hours; the upload itself was quick, but writing the file into a blank file was painfully slow)
  • If I wrap the app with Twisted, the app crashes on large files
  • I've tried using Celery with Redis but this doesn't seem to be an option with posted uploads
  • I'm on Windows and have fewer options for webservers
Infinity8 asked Jun 23 '17 17:06


People also ask

How do you handle file uploads in Flask?

Handling file uploads in Flask is straightforward. It needs an HTML form with its enctype attribute set to 'multipart/form-data' that posts the file to a URL. The URL handler fetches the file from the request.files object and saves it to the desired location.

How do I handle a large file upload?

Possible solutions: 1) Configure maximum upload file size and memory limits for your server. 2) Upload large files in chunks. 3) Apply resumable file uploads. Chunking is the most commonly used method to avoid errors and increase speed.
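
As a rough illustration of option 2, the sketch below splits a byte stream into fixed-size chunks, each tagged with its index and byte offset so that a server could reassemble them. The name `iter_chunks` and the 1 MB default are assumptions chosen for this example; they are not part of Flask or of any upload library.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_chunks(stream: BinaryIO,
                chunk_size: int = 1_000_000) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (index, byte_offset, data) for each chunk read from stream."""
    index = 0
    offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            # An empty read means the stream is exhausted.
            break
        yield index, offset, data
        index += 1
        offset += len(data)


if __name__ == "__main__":
    payload = io.BytesIO(b"abcdefghij")
    for index, offset, data in iter_chunks(payload, chunk_size=4):
        print(index, offset, data)
```

Only one chunk is held in memory at a time, which is the main reason chunking helps with very large files.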


2 Answers

I think the simplest way around this is to send the file in lots of small parts (chunks). There are two parts to making this work: the front end (website) and the back end (server). For the front end, you can use something like Dropzone.js, which has no additional dependencies and decent CSS included. All you have to do is add the class dropzone to a form, and it automatically turns it into one of their special drag-and-drop fields (you can also click and select).

However, by default, Dropzone does not chunk files. Luckily, chunking is really easy to enable. Here's a sample file upload form with Dropzone.js and chunking enabled:

    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <link rel="stylesheet"
              href="https://cdnjs.cloudflare.com/ajax/libs/dropzone/5.4.0/min/dropzone.min.css"/>
        <link rel="stylesheet"
              href="https://cdnjs.cloudflare.com/ajax/libs/dropzone/5.4.0/min/basic.min.css"/>
        <script type="application/javascript"
                src="https://cdnjs.cloudflare.com/ajax/libs/dropzone/5.4.0/min/dropzone.min.js">
        </script>
        <title>File Dropper</title>
    </head>
    <body>

    <form method="POST" action='/upload' class="dropzone dz-clickable"
          id="dropper" enctype="multipart/form-data">
    </form>

    <script type="application/javascript">
        Dropzone.options.dropper = {
            paramName: 'file',
            chunking: true,
            forceChunking: true,
            url: '/upload',
            maxFilesize: 1025, // megabytes
            chunkSize: 1000000 // bytes
        }
    </script>
    </body>
    </html>

And here's the back-end part using Flask:

    import logging
    import os

    from flask import render_template, Blueprint, request, make_response
    from werkzeug.utils import secure_filename

    from pydrop.config import config

    blueprint = Blueprint('templated', __name__, template_folder='templates')

    log = logging.getLogger('pydrop')


    @blueprint.route('/')
    @blueprint.route('/index')
    def index():
        # Route to serve the upload form
        return render_template('index.html',
                               page_name='Main',
                               project_name="pydrop")


    @blueprint.route('/upload', methods=['POST'])
    def upload():
        file = request.files['file']

        save_path = os.path.join(config.data_dir, secure_filename(file.filename))
        current_chunk = int(request.form['dzchunkindex'])

        # If the file already exists it's ok if we are appending to it,
        # but not if it's a new file that would overwrite the existing one
        if os.path.exists(save_path) and current_chunk == 0:
            # 400 and 500s will tell dropzone that an error occurred and show an error
            return make_response(('File already exists', 400))

        try:
            # Note: in append mode ('ab') every write lands at the end of the
            # file regardless of seek(), so this relies on chunks arriving in order
            with open(save_path, 'ab') as f:
                f.seek(int(request.form['dzchunkbyteoffset']))
                f.write(file.stream.read())
        except OSError:
            # log.exception will include the traceback so we can see what's wrong
            log.exception('Could not write to file')
            return make_response(("Not sure why,"
                                  " but we couldn't write the file to disk", 500))

        total_chunks = int(request.form['dztotalchunkcount'])

        if current_chunk + 1 == total_chunks:
            # This was the last chunk, the file should be complete and the size we expect
            if os.path.getsize(save_path) != int(request.form['dztotalfilesize']):
                log.error(f"File {file.filename} was completed, "
                          f"but has a size mismatch. "
                          f"Was {os.path.getsize(save_path)} but we"
                          f" expected {request.form['dztotalfilesize']} ")
                return make_response(('Size mismatch', 500))
            else:
                log.info(f'File {file.filename} has been uploaded successfully')
        else:
            log.debug(f'Chunk {current_chunk + 1} of {total_chunks} '
                      f'for file {file.filename} complete')

        return make_response(("Chunk upload successful", 200))
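
The handler above opens the file in append mode, where writes always land at the end of the file, so it depends on chunks arriving in order. A minimal sketch of an offset-based alternative follows. `write_chunk` and `size_matches` are hypothetical helper names, and the code uses only the standard library rather than Flask so that it can be run on its own.

```python
import os
import tempfile


def write_chunk(path: str, offset: int, data: bytes) -> None:
    """Write data at the given byte offset, creating the file if needed."""
    # 'r+b' honours seek() for writes; 'ab' would always write at the end.
    mode = "r+b" if os.path.exists(path) else "wb"
    with open(path, mode) as f:
        f.seek(offset)
        f.write(data)


def size_matches(path: str, expected_size: int) -> bool:
    """Return True if the file on disk has the expected total size."""
    # A size check alone cannot prove every chunk arrived; a real handler
    # would also need to track which chunk indexes it has received.
    return os.path.exists(path) and os.path.getsize(path) == expected_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "upload.bin")
        # Chunks deliberately arrive out of order.
        write_chunk(target, 4, b"efgh")
        write_chunk(target, 0, b"abcd")
        write_chunk(target, 8, b"ij")
        print(size_matches(target, 10))
        with open(target, "rb") as f:
            print(f.read())
```

In the Flask handler, the offset would come from the `dzchunkbyteoffset` form field and the expected size from `dztotalfilesize`, as in the answer's code.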
Abdul Rehman answered Oct 04 '22 12:10


Use copy_current_request_context. It copies the current request context, so you can use a thread or anything else to run your task in the background.

Maybe an example will make this clearer. I have tested it with a 3.37 GB file, debian-9.5.0-amd64-DVD-1.iso.

    # coding:utf-8

    import datetime
    import os
    import threading

    from flask import (Flask, render_template, request, redirect, url_for,
                       copy_current_request_context)
    from werkzeug.utils import secure_filename

    app = Flask(__name__)


    @app.route('/upload', methods=['POST', 'GET'])
    def upload():
        # Runs in a worker thread with a copy of the request context
        @copy_current_request_context
        def save_file(closeAfterWrite):
            print(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S') + " i am doing")
            f = request.files['file']
            basepath = os.path.dirname(__file__)
            upload_path = os.path.join(basepath, '', secure_filename(f.filename))
            f.save(upload_path)
            # Now that the file is saved, really close the upload stream
            closeAfterWrite()
            print(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S') + " write done")

        def passExit():
            pass

        if request.method == 'POST':
            f = request.files['file']
            # Swap close() for a no-op so the stream stays open after the
            # response is sent; the thread calls the real close() when done
            normalExit = f.stream.close
            f.stream.close = passExit
            t = threading.Thread(target=save_file, args=(normalExit,))
            t.start()
            return redirect(url_for('upload'))
        return render_template('upload.html')


    if __name__ == '__main__':
        app.run(debug=True)

This is the template; it should be saved as templates\upload.html:

    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Title</title>
    </head>
    <body>
        <h1>example</h1>
        <form action="" enctype='multipart/form-data' method='POST'>
            <input type="file" name="file">
            <input type="submit" value="upload">
        </form>
    </body>
    </html>
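
The same idea, handing the slow disk write to a worker thread so that the caller can return immediately, can be sketched without Flask. In this standard-library-only example, `save_in_background` is a hypothetical helper and the 1 MB buffer size is an assumption. In a real Flask view the source stream would be the uploaded file, and it has to stay open until the thread finishes, which is what the `passExit` swap in the answer is for.

```python
import io
import os
import shutil
import tempfile
import threading
from typing import BinaryIO


def save_in_background(src: BinaryIO, dest_path: str,
                       buffer_size: int = 1024 * 1024) -> threading.Thread:
    """Copy src to dest_path on a worker thread and return the thread."""

    def worker() -> None:
        try:
            with open(dest_path, "wb") as dest:
                # Copy in fixed-size pieces so memory use stays bounded.
                shutil.copyfileobj(src, dest, buffer_size)
        finally:
            # Close the source only after the copy has finished.
            src.close()

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "copy.bin")
        t = save_in_background(io.BytesIO(b"x" * 3_000_000), target)
        # The caller is free to do other work here.
        t.join()
        print(os.path.getsize(target))
```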
obgnaw answered Oct 04 '22 10:10
