
S3 - What exactly is a prefix? And what rate limits apply?

I was wondering if anyone knew what exactly an S3 prefix is and how it interacts with Amazon's published S3 rate limits:

Amazon S3 automatically scales to high request rates. For example, your application can achieve at least 3,500 PUT/POST/DELETE and 5,500 GET requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket.

While that's really clear, I'm not quite certain what a prefix is.

Does a prefix require a delimiter?

If we have a bucket where we store all files at the "root" level (completely flat, without any prefixes/delimiters), does that count as a single "prefix", and is it subject to the rate limits posted above?

The way I'm interpreting Amazon's documentation suggests that this IS the case, and that the flat structure would be considered a single "prefix" (i.e. it would be subject to the published rate limits above).

Suppose that your bucket (admin-created) has four objects with the following object keys:

Development/Projects1.xls

Finance/statement1.pdf

Private/taxdocument.pdf

s3-dg.pdf

The s3-dg.pdf key does not have a prefix, so its object appears directly at the root level of the bucket. If you open the Development/ folder, you see the Projects1.xls object in it.

In the above example, would s3-dg.pdf be subject to a different rate limit (5,500 GET requests/second) than each of the other prefixes (Development/, Finance/, Private/)?
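
To make the question concrete, here's how I'd probe that layout with boto3 (just a sketch; "my-bucket" is a hypothetical bucket holding the four keys above). With a delimiter, the ListObjectsV2 API rolls keys up into common prefixes, which matches how the docs talk about "folders":

    import boto3  # AWS SDK for Python

    s3 = boto3.client("s3")

    # With Delimiter="/", keys containing "/" are rolled up into CommonPrefixes,
    # while keys at the "root" (like s3-dg.pdf) come back as plain Contents.
    resp = s3.list_objects_v2(Bucket="my-bucket", Delimiter="/")

    for cp in resp.get("CommonPrefixes", []):
        print("prefix:", cp["Prefix"])      # Development/, Finance/, Private/
    for obj in resp.get("Contents", []):
        print("root object:", obj["Key"])   # s3-dg.pdf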


What's more confusing is that I've read a couple of blog posts about Amazon using the first N bytes of the key as a partition key and encouraging the use of high-cardinality prefixes; I'm just not sure how that interacts with a bucket with a "flat file structure".
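
For reference, the scheme those blogs describe (the pre-2018 guidance) looked something like the following sketch; the key layout is made up for illustration:

    import hashlib

    def randomized_key(original_key: str) -> str:
        """Old-style guidance: prepend a short hash so keys spread across
        many partitions instead of all sharing one hot prefix."""
        digest = hashlib.md5(original_key.encode()).hexdigest()[:4]
        return f"{digest}/{original_key}"

    print(randomized_key("2018/09/21/server1.log"))
    # e.g. '3f7a/2018/09/21/server1.log'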

asked Sep 21 '18 by dm03514


2 Answers

You're right, the announcement seems to contradict itself. It's just not written clearly, but the information is correct. In short:

  1. Each prefix can achieve up to 3,500 write / 5,500 read requests per second, so for many workloads you won't need more than one prefix at all.
  2. Prefixes are considered to be the whole path (up to the last '/') of an object's location, and are no longer hashed only by the first 6-8 characters. Therefore it is enough to split the data between any two "folders" to double the maximum request rate, provided requests are divided evenly between the two (see the sketch after this list).
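
As a rough sketch of point 2 (the bucket layout and key scheme here are made up, not anything AWS prescribes), spreading objects deterministically across N prefixes multiplies the per-prefix ceiling:

    import hashlib

    NUM_PREFIXES = 4  # 4 prefixes * 5,500 GET/s ~= 22,000 GET/s, if traffic is even

    def sharded_key(object_name: str) -> str:
        # Deterministically assign each object to one of NUM_PREFIXES "folders",
        # so both writers and readers compute the same shard for a given name.
        shard = int(hashlib.sha256(object_name.encode()).hexdigest(), 16) % NUM_PREFIXES
        return f"shard-{shard}/{object_name}"

    print(sharded_key("invoices/2018/inv-001.pdf"))
    # e.g. 'shard-2/invoices/2018/inv-001.pdf'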

For reference, here is a response from AWS support to my clarification request:

Hello Oren,

Thank you for contacting AWS Support.

I understand that you read the AWS post on S3 request rate performance being increased, and that you have additional questions regarding this announcement.

Before this upgrade, S3 supported 100 PUT/LIST/DELETE requests per second and 300 GET requests per second. To achieve higher performance, a random hash/prefix schema had to be implemented. Since last year, the request rate limits have increased to 3,500 PUT/POST/DELETE and 5,500 GET requests per second. This increase is often enough for applications to mitigate 503 SlowDown errors without having to randomize prefixes.

However, if the new limits are not sufficient, prefixes would need to be used. A prefix has no fixed number of characters. It is any string between a bucket name and an object name, for example:

  • bucket/folder1/sub1/file
  • bucket/folder1/sub2/file
  • bucket/1/file
  • bucket/2/file

Prefixes of the object 'file' would be: /folder1/sub1/, /folder1/sub2/, /1/, /2/. In this example, if you spread reads across all four prefixes evenly, you can achieve 22,000 read requests per second (4 × 5,500).
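
Going by that definition (everything up to the last '/'), the prefix of a key can be derived with a one-liner. This is just an illustrative sketch, not an official API; note the support engineer wrote prefixes with a leading slash, while object keys themselves don't include one:

    def prefix_of(key: str) -> str:
        # Everything up to and including the last '/'; root-level keys have no prefix.
        return key.rsplit("/", 1)[0] + "/" if "/" in key else ""

    for k in ["folder1/sub1/file", "folder1/sub2/file", "1/file", "2/file", "s3-dg.pdf"]:
        print(repr(k), "->", repr(prefix_of(k)))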

answered Nov 10 '22 by Oren


This looks like it is (somewhat obscurely) addressed in an Amazon release announcement:

https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/

Performance scales per prefix, so you can use as many prefixes as you need in parallel to achieve the required throughput. There are no limits to the number of prefixes.

This S3 request rate performance increase removes any previous guidance to randomize object prefixes to achieve faster performance. That means you can now use logical or sequential naming patterns in S3 object naming without any performance implications. This improvement is now available in all AWS Regions. For more information, visit the Amazon S3 Developer Guide.

answered Nov 10 '22 by dm03514