I have deployed a REST server on an Amazon EC2 instance. I have also configured an Amazon S3 bucket to store all the data generated by users while interacting with the API; this data is mainly images. Users can upload images by making an HTTP PUT request to a certain URL with credentials. The PUT request has to go through the EC2 instance, since the upload needs to be authorized and users cannot access the S3 bucket directly. When the EC2 instance receives a valid PUT request, I use the AWS PHP SDK to upload the object to the S3 bucket with the putObject method. For this first part, I don't think there are other alternatives. However, to allow users to download previous uploads, I have two different alternatives:
The first one is to give the user a URL that points directly to the S3 bucket/key, since files are uploaded publicly. The user can then download the image directly from the S3 servers without any interaction with EC2.
The second one is to use the REST API running on the EC2 instance to serve the image contents in response to an HTTP GET request. In this case I would use the AWS PHP SDK to "download" the image from the S3 servers and return it to the user, using the getObject method.
Another possible solution, which seems inelegant to me, is to return an HTTP redirect from the EC2 instance to the S3 URL, but then the client has to make two connections to retrieve a single image (a bad thing if the user is on a mobile device).
I have implemented the second option and it seems to work fine.
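For reference, this is roughly what the upload and the proxied download look like with the AWS PHP SDK (a simplified sketch; the bucket name, keys, content type and SDK v3 syntax are just placeholders for illustration):

```php
<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

// Client in the same region as the EC2 instance (Ireland).
$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'eu-west-1',
]);

// Upload handler: runs after the PUT request has been authorized.
$s3->putObject([
    'Bucket'      => 'my-bucket',            // placeholder bucket name
    'Key'         => 'users/123/image.jpg',  // placeholder key
    'Body'        => fopen('php://input', 'rb'),
    'ContentType' => 'image/jpeg',
]);

// Download handler: proxy the object back to the client on GET.
$result = $s3->getObject([
    'Bucket' => 'my-bucket',
    'Key'    => 'users/123/image.jpg',
]);
header('Content-Type: ' . $result['ContentType']);
echo $result['Body'];
```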
My question is: does accessing the files through the REST API on the EC2 instance, which downloads the contents from S3, add a big overhead compared to accessing the files directly via an S3 URL? Both instances are running in the same region (Ireland). I do not know how a transfer from S3 to EC2 (or vice versa) is counted in terms of bandwidth. Would an S3-EC2-user transfer count as double an S3-user transfer? Is this transfer done over some kind of local area network?
I prefer the second way, as I can control access to the content, log who is accessing each file, change the bucket transparently for the user, and so on.
Thanks!
These are actually multiple questions combined into one, but I'll try to answer them.
You can set up uploads to go directly to S3, without passing through your EC2 instance, while still being able to authenticate the upload before it happens. The upload is performed with a POST request made directly to S3. For it to work, you need to attach a policy to the request and sign it (your code on EC2 would generate the policy and signature). For a more detailed guide, see Browser Uploads to S3 using HTML POST.
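For example, with version 3 of the AWS SDK for PHP, the PostObjectV4 helper can generate the policy and signature for the browser form (a minimal sketch; the bucket name, key prefix and expiration are assumptions):

```php
<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;
use Aws\S3\PostObjectV4;

$client = new S3Client([
    'version' => 'latest',
    'region'  => 'eu-west-1',
]);

// Fields the browser form must submit, plus the policy conditions to sign.
$formInputs = ['acl' => 'private', 'key' => 'users/123/${filename}'];
$options    = [
    ['acl' => 'private'],
    ['bucket' => 'my-bucket'],             // placeholder bucket name
    ['starts-with', '$key', 'users/123/'], // restrict where the user can write
];

$postObject = new PostObjectV4($client, 'my-bucket', $formInputs, $options, '+15 minutes');

// Return these to the client so it can build the POST form that goes straight to S3.
$attributes = $postObject->getFormAttributes(); // action URL, method, enctype
$inputs     = $postObject->getFormInputs();     // policy, signature, credentials, key, acl
```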
Proxying the S3 content through your EC2 instance will certainly add some overhead, but how much it matters really depends on your app's scale. If you proxy a few requests per second and the files are small, the overhead will most likely not be very noticeable. If you have hundreds of requests per second, then proxying them through a single EC2 instance will not really work (even if your instance could handle the traffic, you might run into S3 slow-down errors).
Connections between EC2 and S3 in the same region are fast enough, certainly much faster than any connections between an external host and S3.
Data transfers inside a region are not billed, so your S3-EC2-user transfers would cost the same as your S3-user transfers.
If you need to handle large traffic, I recommend using Query String Authentication to generate signed URLs for your S3 objects, and just redirecting to these signed URLs from your download code.
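With version 3 of the AWS SDK for PHP this could look roughly like the following (a sketch; the bucket, key and expiration time are assumptions):

```php
<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'eu-west-1',
]);

// Build a pre-signed GET URL that is only valid for a limited time.
$command = $s3->getCommand('GetObject', [
    'Bucket' => 'my-bucket',            // placeholder bucket name
    'Key'    => 'users/123/image.jpg',  // placeholder key
]);
$request   = $s3->createPresignedRequest($command, '+10 minutes');
$signedUrl = (string) $request->getUri();

// Redirect the client straight to S3; your code still decides who gets access.
header('Location: ' . $signedUrl, true, 302);
exit;
```

The client still makes two connections (EC2 for the redirect, S3 for the download), but the image bytes themselves never pass through your instance, and you keep the ability to authorize and log every request.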