I am using @aws-sdk/client-s3 and can't figure out how to get the objects that have a given tag in a given S3 bucket.
Unfortunately, S3 does not support filtering objects by tag out of the box: there is no built-in API that returns all objects with a certain tag (see the AWS S3 ListObjects API). You can work around this with the following steps:
List all objects in the bucket: ListObjectsV2Command
For each object, fetch its tags: GetObjectTaggingCommand
Filter based on the desired tag:
if (tags.TagSet.some(tag =>
    tag.Key === targetTagKey &&
    tag.Value === targetTagValue)) {
  matchingObjects.push(obj);
}
Note: This approach can be expensive if you have a large number of objects. If you often need to filter by tags, consider maintaining tag info in a database or a metadata index.
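The three steps above could be sketched end to end roughly as follows. This is a minimal sketch rather than a definitive implementation: hasTag, findObjectsByTag and findS3ObjectsByTag are hypothetical helper names, and the listPage/getTagSet callbacks are an indirection I added so the loop can be exercised without AWS. It assumes ListObjectsV2 pagination via IsTruncated and NextContinuationToken, and that the region and credentials come from the environment. It still issues one GetObjectTagging request per object, so the cost concern in the note applies.

```javascript
// True if the TagSet contains the given key/value pair.
function hasTag(tagSet, key, value) {
  return (tagSet ?? []).some((t) => t.Key === key && t.Value === value);
}

// Core loop. `listPage(token)` and `getTagSet(objectKey)` are hypothetical
// callbacks that wrap the actual S3 calls (see findS3ObjectsByTag below).
async function findObjectsByTag({ listPage, getTagSet }, key, value) {
  const matchingObjects = [];
  let token;
  do {
    // Step 1: list one page of objects.
    const page = await listPage(token);
    for (const obj of page.Contents ?? []) {
      // Step 2: fetch the tags of each object.
      const tagSet = await getTagSet(obj.Key);
      // Step 3: keep the object if the desired tag is present.
      if (hasTag(tagSet, key, value)) matchingObjects.push(obj);
    }
    token = page.IsTruncated ? page.NextContinuationToken : undefined;
  } while (token);
  return matchingObjects;
}

// Wiring against the real SDK. Assumes @aws-sdk/client-s3 is installed and
// that region/credentials are resolved from the environment.
async function findS3ObjectsByTag(bucket, key, value) {
  const { S3Client, ListObjectsV2Command, GetObjectTaggingCommand } =
    await import("@aws-sdk/client-s3");
  const client = new S3Client({});
  return findObjectsByTag(
    {
      listPage: (token) =>
        client.send(
          new ListObjectsV2Command({ Bucket: bucket, ContinuationToken: token })
        ),
      getTagSet: async (objectKey) =>
        (
          await client.send(
            new GetObjectTaggingCommand({ Bucket: bucket, Key: objectKey })
          )
        ).TagSet,
    },
    key,
    value
  );
}
```

Usage might look like `const objects = await findS3ObjectsByTag("<bucket-name>", "department", "it");`, where the bucket name and tag are placeholders.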
In 2024, AWS released S3 Metadata, built on S3 Tables, which lets you query an object's system-defined and custom-defined tags, along with its properties over time. The feature is currently available in only three regions and, being new, comes with some other limitations. It also currently applies only to new objects, or to tags added to objects, from the time you enable S3 Metadata. However, it does work for your use case if the regions align.
You can query objects using the Athena SDK, the Athena web editor, or directly through Spark. I'll demonstrate using the SDK from Node.js with @aws-sdk/client-athena.
<account_id>:s3tablescatalog/<table_bucket_name>. Next, under Databases select aws_s3_metadata, and under Tables select the table you previously created. Finally, under Table permission, select the Select permission and click Grant.
Replace <region>, <table-bucket-name> and <table-name> with your values.
import { AthenaClient, StartQueryExecutionCommand, GetQueryResultsCommand } from "@aws-sdk/client-athena";
const client = new AthenaClient({
  region: "<region>",
});
const params = {
  WorkGroup: "primary",
  QueryExecutionContext: {
    Database: "aws_s3_metadata",
    Catalog: "s3tablescatalog/<table-bucket-name>",
  },
  // Find the five most recent records whose "department" tag equals "it"
  QueryString: `
    SELECT key, object_tags
    FROM "aws_s3_metadata"."<table-name>"
    WHERE object_tags['department'] = 'it'
    ORDER BY record_timestamp DESC
    LIMIT 5;
  `
};
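const command = new StartQueryExecutionCommand(params);
const response = await client.send(command);
// Crude fixed wait for the query to finish; sleep is a user-defined helper
await sleep(10000);
// Pass only the QueryExecutionId returned by StartQueryExecution
const commandResult = new GetQueryResultsCommand({
  QueryExecutionId: response.QueryExecutionId,
});
const responseResult = await client.send(commandResult);
console.log(responseResult.ResultSet);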
Some notes on the code:
The query defined in QueryString looks for objects with a tag key of department and a value of it. You can adjust this to your actual tags.
I have also defined my own sleep function to wait for the execution to complete; you can fill this in with your own.
The GetQueryResultsCommand response may also be paginated if your results are longer. Refer to the documentation for parsing the output and implementing pagination.
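For the sleep and pagination notes above, one possible approach is to poll GetQueryExecutionCommand until the query reaches a terminal state instead of waiting a fixed ten seconds, and then follow NextToken through the result pages. The sketch below is only an illustration under stated assumptions: sleep, waitForQuery, rowsToObjects and runTagQuery are hypothetical names, the polling interval and attempt limit are arbitrary, and it assumes the first row of the first page is the column-header row, which is how Athena typically returns SELECT results.

```javascript
// One possible `sleep`: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the query reaches a terminal state. `getState` is a hypothetical
// callback returning the current state string (QUEUED, RUNNING, SUCCEEDED, ...).
async function waitForQuery(getState, { intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await getState();
    if (state === "SUCCEEDED") return;
    if (state === "FAILED" || state === "CANCELLED") {
      throw new Error(`Athena query ended in state ${state}`);
    }
    await sleep(intervalMs);
  }
  throw new Error("Timed out waiting for the Athena query");
}

// Convert one ResultSet page into plain objects keyed by column name.
// `skipHeader` drops the first row (assumed to be the header on page one).
function rowsToObjects(resultSet, skipHeader) {
  const columns = resultSet.ResultSetMetadata.ColumnInfo.map((c) => c.Name);
  const rows = skipHeader ? resultSet.Rows.slice(1) : resultSet.Rows;
  return rows.map((row) =>
    Object.fromEntries(
      columns.map((name, i) => [name, row.Data[i]?.VarCharValue])
    )
  );
}

// Wiring against the real SDK. Assumes @aws-sdk/client-athena is installed;
// `client` is an AthenaClient and `params` is a StartQueryExecution input.
async function runTagQuery(client, params) {
  const { StartQueryExecutionCommand, GetQueryExecutionCommand, GetQueryResultsCommand } =
    await import("@aws-sdk/client-athena");
  const { QueryExecutionId } = await client.send(
    new StartQueryExecutionCommand(params)
  );
  await waitForQuery(async () => {
    const res = await client.send(
      new GetQueryExecutionCommand({ QueryExecutionId })
    );
    return res.QueryExecution.Status.State;
  });
  const results = [];
  let NextToken;
  do {
    const page = await client.send(
      new GetQueryResultsCommand({ QueryExecutionId, NextToken })
    );
    // Only the first page (no token yet) is assumed to carry the header row.
    results.push(...rowsToObjects(page.ResultSet, NextToken === undefined));
    NextToken = page.NextToken;
  } while (NextToken);
  return results;
}
```

Usage might look like `const rows = await runTagQuery(client, params);`, with client and params defined as in the answer.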
As for costs:
I would recommend reviewing the individual cost pages for more information. Hope this helped.