 

What is the best approach to handling large file uploads in a Rails app?

I am interested in understanding the different approaches to handling large file uploads (2-5 GB files) in a Rails application.

I understand that a file of this size will need to be broken down into smaller parts for transfer. I have done some research, and here is what I have so far:

  • Server-side configuration will be required to accept large POST requests, and probably a 64-bit machine to handle anything over 4 GB.
  • AWS (S3) supports multipart upload.
  • The HTML5 FileSystem API has a persistent uploader that uploads the file in chunks.
  • A BitTorrent library, although this requires a BitTorrent client (e.g. Transmission), which is not ideal.
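Whichever transport is used, "breaking the file into smaller parts" on the sending side usually means reading it in fixed-size pieces rather than loading it whole. Below is a minimal sketch under my own assumptions: the module name `ChunkReader` and the 8 MiB default are hypothetical choices, not part of any library mentioned here.

```ruby
# Sketch: read a large file in fixed-size chunks so only one chunk is in
# memory at a time. The 8 MiB default is an assumed value, not a requirement.
module ChunkReader
  DEFAULT_CHUNK_SIZE = 8 * 1024 * 1024

  # Yields [index, offset, bytes] for each chunk of the file at +path+.
  # Returns an Enumerator when called without a block.
  def self.each_chunk(path, chunk_size = DEFAULT_CHUNK_SIZE)
    return enum_for(:each_chunk, path, chunk_size) unless block_given?

    File.open(path, "rb") do |file|
      index = 0
      offset = 0
      # IO#read(n) returns nil at end of file, which ends the loop.
      while (bytes = file.read(chunk_size))
        yield index, offset, bytes
        index += 1
        offset += bytes.bytesize
      end
    end
  end
end

# Example (hypothetical file name):
# ChunkReader.each_chunk("big_video.mp4") do |index, offset, bytes|
#   puts "chunk #{index}: #{bytes.bytesize} bytes at offset #{offset}"
# end
```

Each chunk carries its byte offset, which is what a resumable scheme needs to tell the receiver where the piece belongs.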

Can all of these methods be resumed the way FTP transfers can? The reason I don't want to use FTP is that I would like to keep everything in the web app, if possible. I have used CarrierWave and Paperclip, but I am looking for something that supports resuming, as uploading a 5 GB file could take some time!
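FTP resumes by restarting the transfer at a byte offset (the `REST` command). A web app can use the same idea: the server records how many bytes it already holds, and the client re-sends only what is missing. This sketch shows just the client-side arithmetic; `remaining_ranges` is a hypothetical helper, and how the server reports `received_bytes` is left open.

```ruby
# Hypothetical helper: given the total file size, the chunk size, and the
# number of bytes the server says it already has, return the [offset, length]
# ranges that still need to be sent.
def remaining_ranges(total_size, chunk_size, received_bytes)
  raise ArgumentError, "chunk_size must be positive" unless chunk_size.positive?
  unless received_bytes.between?(0, total_size)
    raise ArgumentError, "received_bytes out of range"
  end

  ranges = []
  offset = received_bytes
  while offset < total_size
    length = [chunk_size, total_size - offset].min # last chunk may be shorter
    ranges << [offset, length]
    offset += length
  end
  ranges
end
```

For example, `remaining_ranges(25, 10, 12)` resumes mid-chunk and returns `[[12, 10], [22, 3]]`; a completed upload returns `[]`.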

Of the approaches I have listed, I would like to understand what has worked well, and whether there are other approaches I may be missing. I'd prefer no plugins if possible; I would rather not use Java applets or Flash. Another concern is that these solutions hold the file in memory while uploading, which is also something I would rather avoid if possible.

cih asked May 25 '13 12:05



2 Answers

I've dealt with this issue on several sites, using a few of the techniques you've illustrated above and a few that you haven't. The good news is that it is actually pretty realistic to allow massive uploads.

A lot of this depends on what you actually plan to do with the file after you have uploaded it... The more work you have to do on the file, the closer you are going to want it to your server. If you need to do immediate processing on the upload, you probably want to do a pure rails solution. If you don't need to do any processing, or it is not time-critical, you can start to consider "hybrid" solutions...

Believe it or not, I've actually had pretty good luck just using mod_porter. mod_porter makes Apache do a bunch of the work that your app would normally do. It helps avoid tying up a thread and a bunch of memory during the upload, and it results in a file local to your app, for easy processing. If you pay attention to the way you are processing the uploaded files (think streams), you can make the whole process use very little memory, even for what would traditionally be fairly expensive operations. This approach requires very little setup in your app and no real modification to your code, but it does require a particular environment (an Apache server), as well as the ability to configure it.
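mod_porter itself is an Apache module and isn't shown here. To illustrate the "think streams" point once the upload is on local disk, here is a minimal sketch that moves a file to its final location and computes its SHA-256 in one pass, holding only a fixed-size buffer in memory. The method name, paths, and 1 MiB buffer are assumptions for illustration.

```ruby
require "digest"

# Copy +src_path+ to +dest_path+ while computing a SHA-256 of the contents,
# reading buffer_size bytes at a time so memory use stays flat regardless
# of file size. Returns the hex digest.
def stream_copy_with_digest(src_path, dest_path, buffer_size = 1024 * 1024)
  digest = Digest::SHA256.new
  File.open(src_path, "rb") do |src|
    File.open(dest_path, "wb") do |dest|
      while (buf = src.read(buffer_size))
        digest.update(buf) # hash this buffer
        dest.write(buf)    # and write the same bytes out
      end
    end
  end
  digest.hexdigest
end
```

The same read-a-buffer loop applies to other per-file work (virus scanning, re-uploading to storage), as long as each step consumes the data incrementally.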

I've also had good luck using jQuery-File-Upload, which supports good stuff like chunked and resumable uploads. Without something like mod_porter, this can still tie up an entire thread of execution during upload, but it should be decent on memory, if done right. This also results in a file that is "close" and, as a result, easy to process. This approach will require adjustments to your view layer to implement, and will not work in all browsers.
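As I understand jQuery-File-Upload's chunked mode, each chunk arrives as a separate request carrying a `Content-Range: bytes START-END/TOTAL` header. This server-side sketch rests on that assumption: it appends a chunk only if it starts exactly where the partial file ends, and otherwise reports the current size so the client can resume. `parse_content_range` and `append_chunk` are hypothetical names. In a Rails controller, the header would come from `request.headers["Content-Range"]`. Locking for concurrent requests is not handled.

```ruby
# Matches e.g. "bytes 0-999/5000" (inclusive start and end byte positions).
CONTENT_RANGE = %r{\Abytes (\d+)-(\d+)/(\d+)\z}

# Parse a Content-Range header into integers; returns nil if malformed.
def parse_content_range(header)
  m = CONTENT_RANGE.match(header.to_s.strip)
  return nil unless m

  first, last, total = m.captures.map(&:to_i)
  return nil if last < first || last >= total
  { first: first, last: last, total: total }
end

# Append one chunk to the partial file at +path+ and return the file's size.
# A chunk that does not start at the current end (duplicate or out of order)
# is ignored, and the current size is returned so the client can resume.
def append_chunk(path, content_range, bytes)
  range = parse_content_range(content_range)
  raise ArgumentError, "bad Content-Range" unless range

  current = File.exist?(path) ? File.size(path) : 0
  return current unless range[:first] == current
  unless bytes.bytesize == range[:last] - range[:first] + 1
    raise ArgumentError, "chunk length does not match Content-Range"
  end

  File.open(path, "ab") { |f| f.write(bytes) }
  File.size(path)
end
```

Writing chunks straight to disk keeps memory bounded by the chunk size, though, as noted above, the request still occupies a thread while each chunk is received.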

You mentioned FTP and BitTorrent as possible options. These are not as bad as you might think, as you can still get the files pretty close to the server. They are not even mutually exclusive, which is nice, because (as you pointed out) they do require an additional client that may or may not be present on the uploading machine. The way this works is, basically, you set up an area for them to dump to that is visible to your app. Then, if you need to do any processing, you run a cron job (or whatever) to monitor that location for uploads and trigger your server's processing method. This does not get you the immediate response the methods above can provide, but you can set the interval small enough to get pretty close. The only real advantage to this method is that the protocols used are better suited to transferring large files; in my experience, the additional client requirement and fragmented process usually outweigh any benefits from that.
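One wrinkle with a drop folder is telling a finished upload from one still in progress. A common heuristic, assumed here rather than taken from the answer, is to treat a file as complete once its size has stopped changing between two scans. `DropFolderWatcher` is a hypothetical class; the caller would hand each returned path to its own processing method.

```ruby
# Hypothetical poller for an upload drop directory. A file counts as ready
# once its size is unchanged between two consecutive polls. This is only a
# heuristic: a stalled upload also looks "complete".
class DropFolderWatcher
  def initialize(dir)
    @dir = dir
    @last_sizes = {} # path => size seen on the previous poll
    @handled = {}    # paths already returned once
  end

  # Returns (sorted) paths that became stable since the previous poll.
  def poll
    current = {}
    Dir.children(@dir).each do |name|
      path = File.join(@dir, name)
      current[path] = File.size(path) if File.file?(path)
    end

    ready = current.select do |path, size|
      !@handled[path] && @last_sizes[path] == size
    end.keys

    ready.each { |path| @handled[path] = true }
    @last_sizes = current
    ready.sort
  end
end
```

This state lives in memory, so it suits a long-running loop (`loop { watcher.poll.each { ... }; sleep 30 }`). If cron starts a fresh process each time, the sizes would have to be persisted between runs, or uploaders would need to signal completion some other way (for example by renaming the file when done).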

If you don't need any processing at all, your best bet may be to simply send them straight to S3. This solution falls down the second you actually need to do anything with the files other than serve them as static assets...
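Going straight to S3 for 2-5 GB files means using S3's multipart upload. S3 limits each part to between 5 MiB (except the last part) and 5 GiB, and allows at most 10,000 parts; it is worth checking the current AWS documentation before relying on these numbers. This sketch only plans the part size and count. Presigning and sending the parts is not shown, and `plan_parts` and its 8 MiB preferred size are hypothetical.

```ruby
MIN_PART_SIZE = 5 * 1024**2 # 5 MiB: S3 minimum for every part except the last
MAX_PART_SIZE = 5 * 1024**3 # 5 GiB: S3 maximum size of a single part
MAX_PARTS     = 10_000      # S3 maximum number of parts per upload

# Returns [part_size, part_count] for a multipart upload of +file_size+ bytes,
# using +preferred+ as the part size unless S3's limits force a larger one.
def plan_parts(file_size, preferred = 8 * 1024**2)
  raise ArgumentError, "file_size must be positive" unless file_size.positive?

  # Smallest part size that keeps the number of parts within MAX_PARTS.
  needed = (file_size + MAX_PARTS - 1) / MAX_PARTS
  part_size = [preferred, MIN_PART_SIZE, needed].max
  raise ArgumentError, "file too large for multipart upload" if part_size > MAX_PART_SIZE

  part_count = (file_size + part_size - 1) / part_size # ceiling division
  [part_size, part_count]
end
```

For a 5 GiB file, `plan_parts(5 * 1024**3)` keeps the 8 MiB preference and yields 640 parts. Because S3 can retry or re-upload parts individually, a failed part costs one part's worth of data rather than the whole file.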

I do not have any experience using the HTML5 FileSystemAPI in a rails app, so I can't speak to that point, although it seems that it would significantly limit the clients you are able to support.

Unfortunately, there is not one real silver bullet - all of these options need to be weighed against your environment in the context of what you are trying to accomplish. You may not be able to configure your web server or permanently write to your local file system, for example. For what it's worth, I think jQuery-File-Upload is probably your best bet in most environments, as it only really requires modification to your application, so you could move an implementation to another environment most easily.

Brad Werth answered Sep 23 '22 08:09


This project defines a new protocol over HTTP to support resumable uploads of large files. It bypasses Rails by providing its own server.

http://tus.io/
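Based on my reading of the tus 1.0 core protocol, resuming works in two steps: a `HEAD` request to the upload URL returns how many bytes the server has in an `Upload-Offset` header, then a `PATCH` sends the remaining bytes with that offset and `Content-Type: application/offset+octet-stream`. Both requests carry `Tus-Resumable`. This sketch only builds the requests with Ruby's `Net::HTTP`. The upload URL is assumed to exist already (creating it is a separate protocol extension), and the helper names are hypothetical.

```ruby
require "net/http"
require "uri"

TUS_VERSION = "1.0.0"

# HEAD request asking the server how many bytes of the upload it has.
def tus_offset_request(upload_url)
  req = Net::HTTP::Head.new(URI(upload_url))
  req["Tus-Resumable"] = TUS_VERSION
  req
end

# Read the server's offset from the HEAD response's Upload-Offset header.
def tus_offset_from(response)
  Integer(response["Upload-Offset"])
end

# PATCH request sending +chunk+, which must start at byte +offset+.
def tus_patch_request(upload_url, offset, chunk)
  req = Net::HTTP::Patch.new(URI(upload_url))
  req["Tus-Resumable"] = TUS_VERSION
  req["Upload-Offset"] = offset.to_s
  req["Content-Type"] = "application/offset+octet-stream"
  req.body = chunk
  req
end
```

Sending uses plain `Net::HTTP`, e.g. `Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }`. After an interrupted transfer, the client repeats the `HEAD` and continues from the returned offset.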

Bmxer answered Sep 22 '22 08:09