Is there a more efficient way to list files from a bucket in Amazon S3 and also extract the metadata for each of those files? I'm using the AWS PHP SDK.
if ($paths = $s3->get_object_list('my-bucket')) {
    foreach ($paths as $path) {
        $meta = $s3->get_object_metadata('my-bucket', $path);
        echo $path . ' was modified on ' . $meta['LastModified'] . '<br />';
    }
}
At the moment I need to run get_object_list() to list all the files and then get_object_metadata() for each file to get its metadata.
If I have 100 files in my bucket, that's 101 calls to get this data. It would be good if it were possible to do it in one call.
E.g.:
if ($paths = $s3->get_object_list('my-bucket')) {
    foreach ($paths as $path) {
        echo $path['FileName'] . ' was modified on ' . $path['LastModified'] . '<br />';
    }
}
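Note: for standard fields such as LastModified, ETag and Size, the ListObjects response itself already carries them per key, so something close to the above is possible with the SDK's list_objects call instead of get_object_list(). A rough sketch, reusing the same $s3 client and bucket name as above (this only covers one page of up to 1,000 keys; see the pagination answer below):
// Sketch: read standard metadata straight from the list_objects response,
// avoiding one HEAD request per object.
$res = $s3->list_objects('my-bucket');
if ($res->isOK()) {
    foreach ($res->body->Contents as $object) {
        echo (string) $object->Key
            . ' was modified on ' . (string) $object->LastModified
            . ' (' . (string) $object->Size . ' bytes)<br />';
    }
}
Custom x-amz-meta-* headers are not included in that response, which is what the answers below address.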
I know this is a bit old, but I ran into this problem and solved it by extending the AWS SDK to use its batch functionality for this kind of request. It makes retrieving custom metadata for lots of files much quicker. This is my code:
/**
 * Name: Steves_Amazon_S3
 *
 * Extends the AmazonS3 class in order to create a function to
 * more efficiently retrieve a list of files and their custom
 * metadata using the CFBatchRequest functionality.
 */
class Steves_Amazon_S3 extends AmazonS3 {

    public function get_object_metadata_batch($bucket, $filenames, $opt = null) {
        $batch = new CFBatchRequest();

        // Queue one HEAD request per object so they are sent in parallel.
        foreach ($filenames as $filename) {
            $this->batch($batch)->get_object_headers($bucket, $filename);
        }

        $response = $this->batch($batch)->send();

        // Fail if any requests were unsuccessful
        if (!$response->areOK()) {
            return false;
        }

        $result = array();
        foreach ($response as $file) {
            $temp = array();

            $temp['name'] = (string) basename($file->header['_info']['url']);
            $temp['etag'] = (string) basename($file->header['etag']);
            $temp['size'] = $this->util->size_readable((integer) $file->header['content-length']);
            $temp['size_raw'] = $file->header['content-length'];
            $temp['last_modified'] = (string) date("jS M Y H:i:s", strtotime($file->header['last-modified']));
            $temp['last_modified_raw'] = strtotime($file->header['last-modified']);

            // Custom x-amz-meta-* headers; @ suppresses notices when a header is absent.
            @$temp['creator_id'] = (string) $file->header['x-amz-meta-creator'];
            @$temp['client_view'] = (string) $file->header['x-amz-meta-client-view'];
            @$temp['user_view'] = (string) $file->header['x-amz-meta-user-view'];

            $result[] = $temp;
        }

        return $result;
    }
}
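For context, here is roughly how the extended class might be used. This assumes your credentials are configured the same way you already construct AmazonS3, and 'my-bucket' is just a placeholder:
// Hypothetical usage of the batch helper above.
$s3 = new Steves_Amazon_S3();
$filenames = $s3->get_object_list('my-bucket');   // all keys in the bucket
$files = $s3->get_object_metadata_batch('my-bucket', $filenames);
if ($files !== false) {
    foreach ($files as $file) {
        echo $file['name'] . ' was modified on ' . $file['last_modified'] . '<br />';
    }
}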
You need to know that the list_objects function has a limit: it will not return more than 1,000 objects per call, even if the max-keys option is set to a larger number.
To work around this you need to load the data in several passes:
private function _getBucketObjects($prefix = '', $booOneLevelOny = false)
{
    $objects = array();
    $lastKey = null;

    do {
        $args = array();

        // Continue listing from the last key returned by the previous page.
        if (isset($lastKey)) {
            $args['marker'] = $lastKey;
        }

        if (strlen($prefix)) {
            $args['prefix'] = $prefix;
        }

        if ($booOneLevelOny) {
            $args['delimiter'] = '/';
        }

        $res = $this->_client->list_objects($this->_bucket, $args);
        if (!$res->isOK()) {
            return null;
        }

        foreach ($res->body->Contents as $object) {
            $objects[] = $object;
            $lastKey = (string) $object->Key;
        }

        // S3 sets IsTruncated to 'true' while there are more pages to fetch.
        $isTruncated = (string) $res->body->IsTruncated;
        unset($res);
    } while ($isTruncated == 'true');

    return $objects;
}
As a result, you'll have a full list of the objects.
What if you have some custom headers? They are not returned by the list_objects function. In that case, this will help:
$arrHeaders = array();
foreach (array_chunk($arrObjects, 1000) as $object_set) {
    $batch = new CFBatchRequest();

    // Queue a HEAD request for every real object (skip folder placeholders).
    foreach ($object_set as $object) {
        if (!$this->isFolder((string) $object->Key)) {
            $this->_client->batch($batch)->get_object_headers($this->_bucket, $this->preparePath((string) $object->Key));
        }
    }

    $response = $this->_client->batch($batch)->send();

    if ($response->areOK()) {
        foreach ($response as $arrHeaderInfo) {
            $arrHeaders[] = $arrHeaderInfo->header;
        }
    }

    unset($batch, $response);
}
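Here $arrObjects is the array returned by _getBucketObjects() above. The isFolder() and preparePath() calls are the answer author's own helpers and aren't shown; a plausible minimal version, assuming "folders" are the zero-byte placeholder keys ending in a slash and that no key rewriting is actually needed, might look like this:
// Hypothetical helpers to make the snippet above self-contained.
private function isFolder($key)
{
    // S3 "folders" are usually zero-byte placeholder objects whose key ends in '/'.
    return substr($key, -1) === '/';
}

private function preparePath($key)
{
    // No transformation in the simplest case; adjust if your keys need cleanup.
    return $key;
}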