 

S3 performance for LIST by prefix with millions of objects in a single bucket

I have a project where there will be about 80 million objects in an S3 bucket. Every day, I will be deleting about 4 million and adding 4 million. The object names will be in a pseudo directory structure:

/012345/0123456789abcdef0123456789abcdef

For deletion, I will need to list all objects with a prefix of 012345/ and then delete them. I am concerned about the time this LIST operation will take. While it seems clear that S3's access time for an individual object does not increase with the number of objects in the bucket, I haven't found anything definitive that says a LIST operation over 80 million objects, searching for 10 objects that all have the same prefix, will remain fast in such a large bucket.
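To make the workload concrete, here is a minimal sketch of the list-then-delete step using boto3 (the question doesn't name an SDK, and the bucket name below is a placeholder):

```python
import boto3

# Placeholder names; the question doesn't specify a bucket.
BUCKET = "example-asset-bucket"
PREFIX = "012345/"

s3 = boto3.client("s3")

# list_objects_v2 returns at most 1,000 keys per response, so use a
# paginator to walk every key under the prefix.
keys = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        keys.append(obj["Key"])

# delete_objects accepts at most 1,000 keys per request, so delete in batches.
for i in range(0, len(keys), 1000):
    batch = [{"Key": k} for k in keys[i : i + 1000]]
    s3.delete_objects(Bucket=BUCKET, Delete={"Objects": batch})
```

Since each LIST response and each batch delete is capped at 1,000 keys, clearing a 4-million-key prefix works out to roughly 4,000 LIST calls plus 4,000 batch deletes, whatever the answer to the performance question turns out to be.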

A side comment on a 2008 question about the maximum number of objects that can be stored in a bucket says:

In my experience, LIST operations do take (linearly) longer as object count increases, but this is probably a symptom of the increased I/O required on the Amazon servers, and down the wire to your client.

From the Amazon S3 documentation:

There is no limit to the number of objects that can be stored in a bucket and no difference in performance whether you use many buckets or just a few. You can store all of your objects in a single bucket, or you can organize them across several buckets.

While I am inclined to believe the Amazon documentation, it isn't entirely clear which operations that statement refers to.

Before committing to this expensive plan, I would like to definitively know if LIST operations when searching by prefix remain fast when buckets contain millions of objects. If someone has real-world experience with such large buckets, I would love to hear your input.

Asked Jul 31 '14 by Brad

People also ask

What is the maximum GET requests per second based on bucket prefixes?

You can send 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in an Amazon S3 bucket. There are no limits to the number of prefixes that you can have in your bucket.

How often can you expect to lose data if you store 10000000 objects in S3?

As AWS notes, “If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years.”

What is the best way to get better performance for storing several files in S3?

Although S3 bucket names are globally unique, each bucket is stored in a Region that you select when you create the bucket. To optimize performance, we recommend that you access the bucket from Amazon EC2 instances in the same AWS Region when possible. This helps reduce network latency and data transfer costs.

Which is the maximum S3 object size for upload in a single PUT operation?

Upload an object in a single operation using the AWS SDKs, REST API, or AWS CLI—With a single PUT operation, you can upload a single object up to 5 GB in size. Upload a single object using the Amazon S3 Console—With the Amazon S3 Console, you can upload a single object up to 160 GB in size.
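As a rough illustration of those two upload paths with boto3 (bucket and file names below are placeholders): put_object issues a single PUT, while upload_file switches to multipart above a configurable threshold, which is how uploads larger than the 5 GB single-PUT limit are handled programmatically.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Single PUT: one request, suitable for objects up to 5 GB.
with open("small.bin", "rb") as f:
    s3.put_object(Bucket="example-asset-bucket", Key="small.bin", Body=f)

# upload_file automatically uses multipart upload once the file exceeds
# the threshold, so it is not bound by the 5 GB single-PUT limit.
config = TransferConfig(multipart_threshold=64 * 1024 * 1024)  # 64 MB
s3.upload_file("large.bin", "example-asset-bucket", "large.bin", Config=config)
```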


2 Answers

Prefix searches are fast if you've chosen the prefixes correctly. Here's an explanation: https://cloudnative.io/blog/2015/01/aws-s3-performance-tuning/
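To show what a prefix-scoped listing looks like in practice, here is a small boto3 sketch (the bucket name is a placeholder). Passing a Delimiter makes S3 group keys by their first path segment and return those segments as CommonPrefixes instead of listing every individual object:

```python
import boto3

s3 = boto3.client("s3")

# With Delimiter="/", keys are grouped by their first path segment and
# returned as CommonPrefixes rather than as individual objects.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="example-asset-bucket", Delimiter="/"):
    for cp in page.get("CommonPrefixes", []):
        print(cp["Prefix"])  # e.g. "012345/"
```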

Answered Sep 21 '22 by r3m0t


I've never seen a problem, but why would you ever list a million files just to pull a few files out of the list? It isn't really an S3 performance issue; it's likely just that the call takes longer because it has to return that many results.

Why not store the file names in a database, index them, and query from there? That would be a better solution, I think.
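A minimal sketch of that idea using SQLite (the table and column names are made up for illustration): record each key as it is uploaded, and the deletion step becomes an indexed query instead of an S3 LIST.

```python
import sqlite3

conn = sqlite3.connect("objects.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS objects (prefix TEXT NOT NULL, key TEXT PRIMARY KEY)"
)
conn.execute("CREATE INDEX IF NOT EXISTS idx_prefix ON objects (prefix)")

# Record each key as it is uploaded to S3.
conn.execute(
    "INSERT OR IGNORE INTO objects (prefix, key) VALUES (?, ?)",
    ("012345", "012345/0123456789abcdef0123456789abcdef"),
)
conn.commit()

# At deletion time, an indexed query replaces the S3 LIST call.
rows = conn.execute(
    "SELECT key FROM objects WHERE prefix = ?", ("012345",)
).fetchall()
keys_to_delete = [r[0] for r in rows]
```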

Answered Sep 21 '22 by Paul Frederiksen