Need help deciding between EBS vs S3 on Amazon Web Services

Tags: amazon-web-services, amazon-s3, amazon-ec2, amazon

I'm working on a project that incorporates file storage and sharing features, and after months of researching the best way to leverage AWS I'm still a little concerned.

Basically my decision is between using EBS storage to house user files or S3. The system will incorporate on-the-fly zip archiving when the user wants to download a handful of files. Also, when users download any files I don't want the URL to the files exposed.

The two best options I've come up with are:

1. Have an EC2 instance with a number of EBS volumes mounted to store user files.
   - pros: It seems much faster than S3, and zipping files from the EBS volume is straightforward.
   - cons: I believe Amazon caps how much EBS storage you can use, and it is not as redundant as S3.
2. After files are uploaded and processed, the system pushes them to an S3 bucket for long-term storage. When files are requested, I retrieve them from S3 and send them back to the client.
   - pros: Redundancy, no file storage limits.
   - cons: It seems very SLOW; there is no way to mount an S3 bucket as a volume in the filesystem; and serving zipped files would mean transferring each file to the EC2 instance, zipping, and then finally sending the output (again, slow!).

Are any of my assumptions flawed? Can anyone think of a better way of managing massive amounts of file storage?

851

asked Aug 10 '12 23:08

andrewvnice

1 Answer

If your service is going to be used by an undetermined number of users, scalability will always be a concern. Whichever option you adopt, you will need to scale the service to meet demand, so it is convenient to assume that your service will run in an Auto Scaling Group with a pool of EC2 instances rather than on a single instance.

Regarding protecting the URL so that only authorized users can download the files, there are many ways to do this without requiring your service to act as an intermediary. You will need to deal with at least two issues:

- File name predictability: to avoid URL predictability, you could name the uploaded file with a hash and store the original file names and ownerships in a database like SimpleDB. Optionally, you can set an HTTP header such as "Content-Disposition: attachment; filename=original_file_name.ext" to advise the user's browser to name the downloaded file accordingly.
- Authorization: when the user asks to download a given file, your service issues a temporary authorization for that specific user, using Query String Authentication (a pre-signed URL) or Temporary Security Credentials, that grants read access to the file for a limited period of time. Your service then redirects to the S3 bucket URL for a direct download. This can greatly offload your EC2 pool instances, making them available to process other requests more quickly.
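The file naming idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `make_object_key`, `content_disposition`, and the in-memory `file_index` dictionary (standing in for the database) are hypothetical names, and the random salt is one possible way to keep keys unguessable, not something the approach requires.

```python
import hashlib
import secrets
from pathlib import PurePosixPath
from urllib.parse import quote


def make_object_key(original_name: str, owner_id: str) -> str:
    """Return an unguessable storage key that keeps only the file extension."""
    # A random salt keeps the key unpredictable even if name and owner are known.
    salt = secrets.token_hex(16)
    material = f"{salt}:{owner_id}:{original_name}".encode("utf-8")
    digest = hashlib.sha256(material).hexdigest()
    extension = PurePosixPath(original_name).suffix.lower()
    return digest + extension


def content_disposition(original_name: str) -> str:
    """Build a Content-Disposition value that restores the original file name."""
    # The filename* form percent-encodes the name, so spaces and non-ASCII are safe.
    return f"attachment; filename*=UTF-8''{quote(original_name, safe='')}"


# Hypothetical in-memory stand-in for the database (SimpleDB in the answer).
file_index = {}

key = make_object_key("Q3 report.pdf", owner_id="user-42")
file_index[key] = {"owner": "user-42", "original_name": "Q3 report.pdf"}
header = content_disposition(file_index[key]["original_name"])
```

The stored key reveals nothing about the original name, while the header value lets the browser save the download under the name the user uploaded.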

To reduce the space used in, and the traffic to, your S3 bucket (remember that you pay per GB stored and transferred), I would also recommend compressing each individual file with a standard algorithm like gzip before uploading it to S3, and setting the header "Content-Encoding: gzip" so that the user's browser decompresses it automatically. If your programming language of choice is Java, I suggest taking a look at the code of the webcache-s3-maven-plugin, a plugin that I created to upload static resources from web projects.
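As a rough sketch of the compression step, assuming Python rather than Java, the standard `gzip` module is enough to prepare the payload and the headers to store with it. The helper name `gzip_for_upload` is hypothetical, and the actual upload call is left out because it depends on the client library you use.

```python
import gzip


def gzip_for_upload(data: bytes, content_type: str) -> tuple[bytes, dict[str, str]]:
    """Compress a payload and return it with the headers to store alongside it."""
    compressed = gzip.compress(data, compresslevel=9)
    headers = {
        "Content-Type": content_type,  # type of the uncompressed content
        "Content-Encoding": "gzip",    # tells the browser to decompress on receipt
    }
    return compressed, headers


body = b"body { margin: 0; }\n" * 200
compressed, headers = gzip_for_upload(body, "text/css")
```

Note that files which are already compressed (images, video, existing archives) usually gain little from this step.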

Regarding the time it takes to compress a folder: you often cannot guarantee that a folder will be compressed quickly enough for the user to download it immediately, since a huge folder could take minutes or even hours to compress. For this I suggest using the SQS and SNS services to process compression asynchronously. It would work as follows:

1. The user requests folder compression.
2. The frontend EC2 instance creates a compression request in an SQS queue.
3. A backend EC2 instance consumes the compression request from the SQS queue.
4. The backend instance downloads the files from S3 to a local drive. Since the generated files are temporary, I would suggest using at least m1.small instances with ephemeral disks rather than EBS; they are local to the virtual machine, which reduces I/O latency and processing time.
5. After the compressed file is generated, the service uploads it to the S3 bucket, optionally setting the Object Expiration properties, which tell S3 to delete the file automatically after a certain period of time (again, to reduce your storage costs). It then publishes a notification to an SNS topic saying that the file is ready to be downloaded.
6. If the user is still online, read the notification from the topic and notify the user that the zip file is ready to be downloaded. If the notification has not arrived after a while, tell the user that compression is taking longer than expected and that the service will notify them by e-mail as soon as the file is ready.
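The flow above can be tried out locally before any AWS wiring. The sketch below is a simulation under loud assumptions: the `storage` dictionary stands in for the S3 bucket, and two `queue.Queue` objects stand in for the SQS queue and the SNS topic. None of these are AWS APIs, and the function names are hypothetical.

```python
import io
import queue
import zipfile

# In-memory stand-ins (assumptions for illustration, not AWS APIs).
storage = {                           # plays the role of the S3 bucket
    "folder/a.txt": b"alpha",
    "folder/b.txt": b"beta",
}
compression_requests = queue.Queue()  # plays the role of the SQS queue
notifications = queue.Queue()         # plays the role of the SNS topic


def request_compression(folder: str) -> None:
    """Frontend: enqueue a request instead of zipping inside the web request."""
    compression_requests.put({"folder": folder})


def process_one_request() -> None:
    """Backend worker: take one request, build the archive, store it, notify."""
    request = compression_requests.get()
    folder = request["folder"].strip("/")
    prefix = folder + "/"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for key in sorted(storage):
            if key.startswith(prefix):
                # Store entries relative to the requested folder.
                archive.writestr(key[len(prefix):], storage[key])
    archive_key = f"archives/{folder}.zip"
    storage[archive_key] = buffer.getvalue()
    notifications.put({"folder": folder, "archive_key": archive_key})
    compression_requests.task_done()


request_compression("folder")
process_one_request()
message = notifications.get(timeout=1)
```

In this simulation, `notifications.get(timeout=...)` raising `queue.Empty` corresponds to the "taking longer than expected" branch in the last step, where the service falls back to notifying the user by e-mail.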

In this scenario you could have two Auto Scaling Groups, one for the frontend and one for the backend, which may have different scalability requirements.

131

answered Oct 04 '22 06:10

Alessandro Oliveira
