
Uploading Large HD Video Files to Amazon Web Services S3

Ultimate Goal: Upload large video files (roughly 200 MB to 3 GB) from a content producer's computer to an AWS S3 bucket, to be used with the Elastic Transcoder service.

  • The content producer will be a pro user, so a little extra work on their part is not a huge burden. However, keeping it as simple as possible for them (and me) is ideal. Would be best if a web form could be used to initiate.
  • There wouldn't be many hundreds of content producers, so some extra time or effort could be devoted to setting up some sort of account or process for each individual content producer. Although automation is king.
  • Some said you could use some sort of Java Applet or maybe Silverlight.
  • One thing I thought of was using SFTP to upload to EC2 first, then moving the files to S3 afterwards. But making that secure sounds like kind of a pain.
  • After some research I discovered that S3 allows cross-origin resource sharing (CORS), so this could allow uploading directly to S3. However, how stable would this be with huge files? (A rough sketch of a CORS configuration follows this list.)
    • How to directly upload files to Amazon S3 from your client side web app
    • Direct Upload to S3 (with a little help from jQuery)
  • Looks like S3 allows multipart uploading as well.
    • Uploading Objects Using Multipart Upload API
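
For reference, here's a rough sketch (Python/boto3, purely illustrative; the bucket name and origin are placeholders) of the kind of CORS configuration the bucket would apparently need for direct browser uploads:

```python
# Rough sketch (placeholders: bucket name and origin) of a CORS rule
# that lets a browser-based uploader PUT/POST directly to the bucket.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_cors(
    Bucket="my-upload-bucket",  # placeholder bucket name
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["https://uploader.example.com"],  # the web form's origin
                "AllowedMethods": ["PUT", "POST"],
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["ETag"],  # lets the browser read part ETags for multipart
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)
```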

Any ideas?

Asked by Adam on Oct 17 '13


1 Answer

You could implement the front-end in pretty much anything you can code to speak the native S3 multipart upload API... which is the approach I'd recommend for this, because of its stability.

With a multipart upload, "you" (meaning the developer, not the end user, I would suggest) choose a part size, minimum 5 MB per part, and the file can be split into no more than 10,000 "parts", each exactly the same size (the one "you" selected at the beginning of the upload), except for the last part, which is however many bytes are left over at the end... so the ultimate maximum size of the uploaded file depends on the part size you choose.

The size of a "part" essentially becomes your restartable/retryable block size (win!)... so your front-end implementation can resend a failed part as many times as needed until it goes through correctly. Parts don't even have to be uploaded in order; they can be uploaded in parallel, and if you upload the same part more than once, the newer one replaces the older one. With each part, S3 returns a checksum that you compare to your locally calculated one. The object doesn't become visible in S3 until you finalize the upload, and when you finalize, if S3 doesn't have all the parts (which it should, because they were all acknowledged as they were uploaded), the finalize call will fail.
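
To make that concrete, here's a minimal sketch of that flow in Python with boto3; the bucket, key, filename, and part size are placeholders, and the retry loop is deliberately simplistic:

```python
# Minimal sketch of the flow described above: fixed-size parts, per-part
# retry, and comparing S3's returned ETag with a locally computed MD5.
# Bucket, key, filename, and part size are placeholders.
import hashlib
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-upload-bucket", "videos/example.mp4"
PART_SIZE = 64 * 1024 * 1024  # minimum 5 MB per part, at most 10,000 parts

upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)["UploadId"]
parts = []

with open("example.mp4", "rb") as f:
    part_number = 1
    while True:
        chunk = f.read(PART_SIZE)
        if not chunk:
            break
        local_md5 = hashlib.md5(chunk).hexdigest()
        for attempt in range(5):  # resend a failed part until it goes through
            try:
                resp = s3.upload_part(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                                      PartNumber=part_number, Body=chunk)
            except Exception:
                continue
            # For plain (non-KMS-encrypted) uploads, the part ETag is the MD5 of the part.
            if resp["ETag"].strip('"') == local_md5:
                break
        else:
            raise RuntimeError(f"part {part_number} failed after 5 attempts")
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
        part_number += 1

# The object only becomes visible once the upload is finalized.
s3.complete_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                             MultipartUpload={"Parts": parts})
```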

The one thing you do have to keep in mind, though, is that multipart uploads apparently never time out, and if they are never either finalized/completed or actively aborted by the client utility, you will pay for the storage of the uploaded parts of those incomplete uploads. So you want to implement an automated back-end process that periodically calls ListMultipartUploads to identify uploads that, for whatever reason, were never finished or canceled, and aborts them.
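
A sketch of what that cleanup job could look like, again in Python/boto3 (the bucket name and the 7-day cutoff are just illustrative choices):

```python
# Sketch of a periodic cleanup job: abort multipart uploads that were
# started but never completed. Bucket name and 7-day cutoff are placeholders.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")
BUCKET = "my-upload-bucket"
cutoff = datetime.now(timezone.utc) - timedelta(days=7)

# Note: a real job should also follow the pagination markers in the response.
resp = s3.list_multipart_uploads(Bucket=BUCKET)
for upload in resp.get("Uploads", []):
    if upload["Initiated"] < cutoff:
        s3.abort_multipart_upload(Bucket=BUCKET, Key=upload["Key"],
                                  UploadId=upload["UploadId"])
```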

I don't know how helpful this is as an answer to your overall question, but developing a custom front-end tool should not be a complicated matter -- the S3 API is very straightforward. I can say this because I developed a utility to do exactly that (for my internal use -- this isn't a product plug). I may one day release it as open source, but it likely wouldn't suit your needs anyway -- it's essentially a command-line utility that automated/scheduled processes can use to stream ("pipe") the output of a program directly into S3 as a series of multipart parts (the files are large, so my default part size is 64 MB), and when the input stream is closed by the program generating the output, it detects this and finalizes the upload. :) I use it to stream live database backups, passed through a compression program, directly into S3 as they are generated, without ever needing those massive files to exist anywhere on any hard drive.
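
To illustrate the streaming idea only (this isn't my actual tool, just a rough Python/boto3 sketch with placeholder names), the core loop would look something like this:

```python
# Illustration only (not the tool described above): read fixed-size chunks
# from stdin and upload each one as a part, e.g.
#   mysqldump mydb | gzip | python stream_to_s3.py
# Bucket, key, and part size are placeholders.
import sys
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-backup-bucket", "backups/mydb.sql.gz"
PART_SIZE = 64 * 1024 * 1024

upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)["UploadId"]
parts, part_number = [], 1

while True:
    chunk = sys.stdin.buffer.read(PART_SIZE)
    if not chunk:  # the producing program closed its output: finalize
        break
    resp = s3.upload_part(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                          PartNumber=part_number, Body=chunk)
    parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
    part_number += 1

s3.complete_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                             MultipartUpload={"Parts": parts})
```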

Your desire to have a smooth experience for your clients, in my opinion, highly commends S3 multipart for the role, and if you know how to code in anything that can generate a desktop or browser-based UI, can read local desktop filesystems, and has libraries for HTTP and SHA/HMAC, then you can write a client to do this that looks and feels exactly the way you need it to.

You wouldn't need to set up anything manually in AWS for each client, so long as you have a back-end system that authenticates the client utility to you, perhaps by a username and password sent over an SSL connection to an application on a web server, and then provides the client utility with automatically-generated temporary AWS credentials that the client utility can use to do the uploading.
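
For example, one way the back-end could mint those temporary credentials is AWS STS. This is only a sketch, assuming the back-end has already authenticated the user and holds IAM-user credentials allowed to call GetFederationToken; the bucket name, actions, and duration are illustrative:

```python
# Sketch of back-end logic that hands an authenticated uploader temporary
# AWS credentials scoped to their own prefix. Bucket name, actions, and
# duration are illustrative. GetFederationToken has to be called with the
# long-term credentials of an IAM user, not a role.
import json
import boto3

sts = boto3.client("sts")

def credentials_for(username: str) -> dict:
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:AbortMultipartUpload",
                       "s3:ListMultipartUploadParts"],
            "Resource": f"arn:aws:s3:::my-upload-bucket/{username}/*",
        }],
    }
    resp = sts.get_federation_token(
        Name=username[:32],           # token name, 2-32 characters
        Policy=json.dumps(policy),
        DurationSeconds=43200,        # 12 hours, enough for a multi-GB upload
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken, Expiration.
    return resp["Credentials"]
```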

Answered by Michael - sqlbot on Sep 28 '22