Currently, I have a website which serves dynamic (PHP-MySQL) content from an Apache server, and serves static content (JavaScript, images) from a separate lighttpd server.
For reasons of scale I would like to use Amazon CloudFront and possibly S3. To be honest I'm not entirely sure how S3 or CloudFront work. I'm used to the normal server behaviour of "upload a file... it becomes available", and S3 "buckets" and CloudFront edge-mirroring are daunting.
I need a better understanding of how this works and have some questions:
1) I don't want to store any images on my own servers. I want them to be entirely in the cloud. Am I correct that this means I will need to use S3 for storage as an "origin server"? Will CloudFront on its own not be enough? Is CloudFront just the edge-CDN service?
2) We currently upload images via a PHP script which FTPs them to our image server, or via manual FTP upload. How will that change if I use S3? I heard you can't FTP to it? :(
3) If using S3, can I still create hierarchical directories and store images inside these? The images are stored various folders deep and I can't afford to change the code, but I heard S3 was a flat "bucket"?
4) Finally, I heard that with CloudFront, if a file changes, you have to issue an invalidation request, which costs money. Is this because CloudFront is caching the image from the origin? I'm not used to this as in my current setup I just replace an image via FTP and it updates! Is there no way to imitate this classic behaviour?
Sincerest thanks for help.
Benefit from CloudFront's native acceleration. With its support for all HTTP methods including GET, PUT and POST, CloudFront can accelerate your entire website. CloudFront continuously implements new technologies in internet communication protocols and makes them available to you.
Amazon CloudFront works with S3 but copies files from S3 to the outer "edge" of Amazon's servers, allowing for fast retrieval. My tests show that it retrieves files in about half the time of S3. It costs slightly more than serving from Amazon S3 alone, but not much.
Amazon CloudFront is a good choice for distribution of frequently accessed static content that benefits from edge delivery—like popular website images, videos, media files or software downloads.
1) I don't want to store any images on my own servers. I want them to be entirely in the cloud. Am I correct that this means I will need to use S3 for storage as an "origin server"? Will CloudFront on its own not be enough? Is CloudFront just the edge-CDN service?
S3 is designed for the long-term, reliable storage of data with eleven 9s of durability. Buckets (as they're called) are region-specific and live in one of Amazon's regional data centers.
Conversely, CloudFront is designed as a series of edge servers. By default, when you request an object (i.e., file) from a CloudFront hostname, that object is pulled from the origin location and cached in the nearest CloudFront edge location for 24 hours (this can be adjusted programmatically). At the end of 24 hours the cache expires, and CloudFront will pull a fresh copy the next time that object is requested.
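To make the caching behaviour described above concrete, here is a toy Python model of a single edge location. It is only a sketch: the `EdgeCache` class, the dict standing in for the origin, and the fake clock are all made up for illustration and are not how CloudFront is implemented internally.

```python
import time


class EdgeCache:
    """Toy model of one edge location: caches origin objects for ttl_seconds."""

    def __init__(self, origin, ttl_seconds=24 * 60 * 60, clock=time.time):
        self.origin = origin      # dict standing in for the origin (e.g. an S3 bucket)
        self.ttl = ttl_seconds    # 24 hours, matching the default described above
        self.clock = clock        # injectable so the example can fake the passage of time
        self._cache = {}          # path -> (body, expires_at)

    def get(self, path):
        now = self.clock()
        entry = self._cache.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]       # cache hit: the origin is not contacted
        body = self.origin[path]  # miss or expired: pull a fresh copy from the origin
        self._cache[path] = (body, now + self.ttl)
        return body


# Walk through the lifecycle with a fake clock (seconds since start).
now = [0.0]
origin = {"img/logo.png": "version 1"}
edge = EdgeCache(origin, clock=lambda: now[0])

first = edge.get("img/logo.png")          # pulled from the origin and cached
origin["img/logo.png"] = "version 2"      # the file is replaced at the origin
stale = edge.get("img/logo.png")          # the edge still serves the cached copy
now[0] += 24 * 60 * 60                    # 24 hours pass, the cache entry expires
fresh = edge.get("img/logo.png")          # the edge pulls the new copy
```

This is why replacing a file at the origin does not show up immediately: until the cached entry expires, the edge never asks the origin again.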
A common setup is to configure CloudFront to use S3 as its origin location. CloudFront also has the ability to use any server, if that's what you prefer (it sounds like you don't).
2) We currently upload images via a PHP script which FTPs them to our image server, or via manual FTP upload. How will that change if I use S3? I heard you can't FTP to it? :(
S3 isn't an FTP server, so it doesn't speak FTP or SFTP. However, nearly all FTP clients for Mac OS X include support for Amazon S3. Amazon S3 also has a web service API, so you can automate the push using one of the AWS SDKs if you'd like.
One tool, Cyberduck, does SSH, SFTP, FTP, Amazon S3, and a few other things. It's available for both Mac and Windows. There are also other tools out there that provide a GUI for uploading to S3 as simply as though you were uploading via FTP.
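As a rough idea of what automating the push through an SDK could look like, here is a sketch in Python rather than PHP. It assumes a client object that behaves like `boto3.client("s3")`; the `upload_image` helper, the bucket name, and the file paths are hypothetical and only there for illustration.

```python
import mimetypes


def upload_image(s3_client, bucket, local_path, key):
    """Upload one local file to S3 and return the key it was stored under.

    s3_client is assumed to behave like boto3.client("s3").
    """
    # Guess the Content-Type from the file extension so browsers render the image.
    content_type, _ = mimetypes.guess_type(local_path)
    extra_args = {"ContentType": content_type or "application/octet-stream"}
    # Keys normally have no leading slash; the slashes inside are kept as-is.
    key = key.lstrip("/")
    s3_client.upload_file(local_path, bucket, key, ExtraArgs=extra_args)
    return key


# Real usage (requires the boto3 package and configured AWS credentials;
# the bucket and paths below are placeholders):
#   import boto3
#   upload_image(boto3.client("s3"), "my-image-bucket",
#                "/tmp/shoe.jpg", "products/2011/shoe.jpg")
```

Passing the client in as a parameter keeps the helper easy to try out with a stand-in object before pointing it at a real bucket.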
3) If using S3, can I still create hierarchical directories and store images inside these? The images are stored various folders deep and I can't afford to change the code, but I heard S3 was a flat "bucket"?
Yes and no.
S3 really is a flat store with no true directories, but object names (keys) can contain slashes. For example, "abc/def/ghi/jkl.txt" is not actually 3 folders and a file, but rather one object with slashes in its name. Most GUI tools choose to visualize this as folders and subdirectories, and the S3 URL looks just like any other URL. Speaking personally, I've never needed to do anything different for S3 than I used to do for SFTP.
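Here is a small Python sketch of how a tool can turn flat names into a folder view. The `list_folder` helper and the sample keys are invented for illustration; S3's own listing API offers a comparable prefix and delimiter mechanism, but this code does not call S3 at all.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Split flat keys under `prefix` into (subfolders, files), like a directory listing."""
    folders, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue                      # not inside the "folder" we are listing
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # There is another delimiter further down, so this is a "subfolder".
            folders.add(prefix + head + delimiter)
        else:
            files.append(key)
    return sorted(folders), sorted(files)


keys = [
    "abc/def/ghi/jkl.txt",
    "abc/def/readme.txt",
    "abc/logo.png",
    "index.html",
]

top = list_folder(keys)                       # (["abc/"], ["index.html"])
inside = list_folder(keys, prefix="abc/def/") # (["abc/def/ghi/"], ["abc/def/readme.txt"])
```

The "folders" exist only in how the names are grouped, which is why existing code that builds nested image paths usually keeps working unchanged.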
4) Finally, I heard that with CloudFront, if a file changes, you have to issue an invalidation request, which costs money. Is this because CloudFront is caching the image from the origin? I'm not used to this as in my current setup I just replace an image via FTP and it updates! Is there no way to imitate this classic behaviour?
Right, this is because CloudFront caches the source file at the nearest edge server. By default, the expiration is 24 hours, but you can configure a shorter one, or expire an object sooner with an "invalidation request". I've seen this take anywhere from 3-15 minutes to complete, because CloudFront has to check all of the edge servers to make sure that they're all cleared.
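For reference, issuing an invalidation request from code might look something like the Python sketch below. It assumes a client that behaves like `boto3.client("cloudfront")`; the `invalidate_paths` helper, the distribution ID, and the image path are hypothetical.

```python
import uuid


def invalidate_paths(cloudfront_client, distribution_id, paths, caller_reference=None):
    """Ask CloudFront to drop its cached copies of the given paths.

    cloudfront_client is assumed to behave like boto3.client("cloudfront").
    """
    # Invalidation paths are written with a leading slash.
    items = ["/" + p.lstrip("/") for p in paths]
    batch = {
        "Paths": {"Quantity": len(items), "Items": items},
        # Any string that is unique per request; a random UUID is one option.
        "CallerReference": caller_reference or str(uuid.uuid4()),
    }
    return cloudfront_client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=batch,
    )


# Real usage (requires the boto3 package and AWS credentials;
# the distribution ID and path below are placeholders):
#   import boto3
#   invalidate_paths(boto3.client("cloudfront"), "EXAMPLEDISTID",
#                    ["images/products/shoe.jpg"])
```

Each path listed in the batch counts as one file for billing purposes, so it is worth sending only the files that actually changed.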
If you don't want the caching, you can just use S3 straight-up. This is the closest equivalent to replacing an image via FTP, but then you lose all of the benefits of using a CDN in the first place.
According to the Amazon CloudFront pricing page:
"No additional charge for the first 1,000 files that you request for invalidation each month. $0.005 per file listed in your invalidation requests thereafter."
That's half-of-a-penny for each file you invalidate over 1,000 in a month. I use CloudFront regularly and have never crossed that limit, but if you're running a larger site with lots and lots of changes, then it's certainly a possibility.
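To put numbers on that, here is the arithmetic as a short Python sketch. The free allowance and the price come straight from the quote above and may have changed since; the `invalidation_cost` function name is just for illustration.

```python
from decimal import Decimal

FREE_FILES_PER_MONTH = 1000              # from the pricing quote above
PRICE_PER_EXTRA_FILE = Decimal("0.005")  # USD per file beyond the free allowance


def invalidation_cost(files_invalidated):
    """Monthly invalidation charge in USD for a given number of invalidated files."""
    billable = max(0, files_invalidated - FREE_FILES_PER_MONTH)
    return billable * PRICE_PER_EXTRA_FILE


light_month = invalidation_cost(800)     # under the allowance, so nothing is charged
busy_month = invalidation_cost(1500)     # 500 billable files at half a cent each
```

So a month with 1,500 invalidated files would come to $2.50 under the quoted pricing.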
I hope this helps! :)