
Amazon S3: renaming and overwriting files, recommendations and risks

I have a bucket with two kinds of file names:

  1. [Bucket]/[file]
  2. [Bucket]/[folder]/[file]

For example, I could have:

  1. MyBucket/bar
  2. MyBucket/foo/bar

I want to rename all the [Bucket]/[folder]/[file] files to [Bucket]/[file], overwriting (and thus discarding) any existing [Bucket]/[file] files.
In the example above, I want MyBucket/foo/bar to become MyBucket/bar (and overwrite / discard the original MyBucket/bar).

I tried two methods:

  1. Using s3cmd's move command: s3cmd mv s3://MyBucket/foo/bar s3://MyBucket/bar
  2. Using Amazon's SDK for PHP: rename('s3://MyBucket/foo/bar', 's3://MyBucket/bar')

Both methods seem to work. However, since I have to do this as a batch process on thousands of files, my questions are:

  1. Which method is preferred?
  2. Are there other better methods?
  3. Must I delete the old files prior to the move/rename? (It seems to work fine without it, but I might not be aware of the risks involved.)

Thank you.

EyalAr asked May 01 '12 13:05



2 Answers

In the roughly 5 months since I asked this question, I have gained some insights, so I will answer it myself:

From what I have seen, there is no major performance difference. I can imagine that calling s3cmd from within PHP might be costly, since it invokes an external process for each request; but then again, Amazon's SDK uses cURL to send its requests, so there is not much of a difference.

One difference I did notice is that Amazon's SDK tends to throw cURL exceptions (seemingly at random, and rarely), whereas s3cmd did not crash at all. My scripts run on tens of thousands of files, so I had to learn the hard way to deal with these cURL exceptions.
My theory is that cURL fails when there is a communication conflict on the server (for example, when two processes try to use the same resource). I am working on a development server on which several processes sometimes access S3 with cURL simultaneously; these are the only situations in which cURL exhibited this behaviour.

In conclusion:
Using s3cmd might be more stable, but the SDK allows more versatility and better integration with your PHP code, as long as you remember to handle the rare cases in which the SDK throws a cURL exception (I'd say about 1 in every 1000 requests, when several processes run simultaneously).
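
One way to handle those rare failures is simply to retry the request after a short pause. Below is a minimal sketch of that idea, written in Python rather than PHP for brevity. It assumes the errors are transient, which matches what I observed but is not guaranteed; `with_retries` and its parameters are hypothetical names, not part of any SDK.

```python
import time


def with_retries(operation, attempts=3, base_delay=0.5,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call operation(); on a listed exception, wait and try again.

    The wait doubles after each failed attempt (exponential backoff).
    Hypothetical helper for illustration only.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

In a batch script, each single rename would be wrapped in such a call, with `retry_on` narrowed to the exception type your client actually raises for connection problems. Keep in mind that a retry after a half-completed move may find the source already deleted, so it is worth logging failures instead of retrying forever.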

EyalAr answered Sep 19 '22 16:09


Since both methods, s3cmd and the SDK, eventually issue the same REST calls, you can safely choose whichever suits you best.
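
To my knowledge, a standard S3 bucket has no rename request, so a move boils down to a server-side copy to the new key followed by a delete of the old key. A rough sketch of that sequence in Python is shown here; it assumes `client` is a boto3 S3 client (for example `boto3.client("s3")`), and `move_object` is a hypothetical helper name, not something either tool exposes.

```python
def move_object(client, bucket, source_key, target_key):
    """Move an object within a bucket: copy to the new key, then delete the old one.

    `client` is assumed to be a boto3 S3 client (an assumption for this
    sketch); any object with the same two methods would work.
    """
    client.copy_object(
        Bucket=bucket,
        Key=target_key,
        CopySource={"Bucket": bucket, "Key": source_key},
    )
    # Delete the source only after the copy call returned without raising.
    client.delete_object(Bucket=bucket, Key=source_key)
```

With the question's example this would be `move_object(client, "MyBucket", "foo/bar", "bar")`. If the copy fails, the source is left untouched, which is the safer failure mode for a batch job.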

When you move a file and the target already exists, the target is always replaced, so there is no need to delete the old file first. If you don't want this behavior, check whether the target key already exists before performing the move.
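
For a batch of thousands of files, that check can be done up front on a listing of the keys, before any request is sent. The Python sketch below assumes you already have the list of keys in the bucket (however you obtained it); `plan_flatten` is a hypothetical helper. It also flags a case that is easy to miss: two folders holding a file with the same name, which would silently overwrite each other after flattening.

```python
def plan_flatten(keys, overwrite=True):
    """Plan 'folder/file' -> 'file' moves without touching S3.

    Returns (moves, skipped): moves maps source key to target key,
    skipped lists (source key, reason) pairs. Hypothetical helper.
    """
    existing = set(keys)
    moves, skipped, claimed = {}, [], {}
    for key in sorted(keys):
        if "/" not in key:
            continue  # already at the top level
        target = key.rsplit("/", 1)[-1]
        if not target:
            continue  # key ends with '/', a folder placeholder, not a file
        if target in claimed:
            # Another folder's file already maps to this name.
            skipped.append((key, f"target already claimed by {claimed[target]}"))
        elif target in existing and not overwrite:
            skipped.append((key, "target exists"))
        else:
            claimed[target] = key
            moves[key] = target
    return moves, skipped
```

Reviewing `skipped` before running the real moves shows exactly which files would have been lost or left behind.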

Alessandro Oliveira answered Sep 19 '22 16:09