I'm testing out S3 Select and as far as I understand from the examples, you can treat a single object (CSV or JSON) as a data store.
I wanted to have a single JSON document per S3 object and search the entire bucket as a 'database'. I'm saving each 'file' as <ID>.json
and each file has JSON documents with the same schema.
Is it possible to search multiple objects in a single call? i.e. Find all JSON documents where customerId = 123
?
Athena can query multiple objects at once, while with S3 select, we can only query a single object (ex. a single flat file)
Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 terabytes. The largest object that can be uploaded in a single PUT is 5 gigabytes. For objects larger than 100 megabytes, customers should consider using the Multipart Upload capability.
Our solution is built with Amazon S3 event notifications, AWS Lambda, AWS Glue Catalog, and Amazon Athena. These services allow you to search thousands of objects in an S3 bucket by filenames, object metadata, and object keys.
S3 Select is a feature of S3 that lets you specify targeted portions of an S3 object to retrieve and return to you rather than returning the entire contents of the object. You can use some basic SQL expressions to select certain columns and filter for particular records in your structured file.
It appears that Amazon S3 Select operates on only one object.
You can use Amazon Athena to run queries across paths, which will include all files within that path. It also supports partitioning.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With