How many objects are returned by aws s3api list-objects?

I am using:

aws s3api list-objects --endpoint-url https://my.end.point/ --bucket my.bucket.name --query 'Contents[].Key' --output text

to get the list of files in a bucket.

The aws s3api list-objects documentation page says that this command returns only up to 1000 objects; however, I noticed that in my case it returns the names of all the files in my bucket. For example, when I run the following command:

aws s3api list-objects --endpoint-url https://my.end.point/ --bucket my.bucket.name --query 'Contents[].Key' --output text | tr "\t" "\n" | wc -l

The output is 13512, meaning that more than 13,000 file names were returned.

Am I missing something?

I use the following aws cli version:

aws-cli/1.10.57 Python/2.7.3 Linux/3.2.0-4-amd64 botocore/1.4.47
Rustam Issabekov, asked Aug 20 '16 12:08


The list-objects documentation [1] says: "Returns some or all (up to 1000) of the objects in a bucket. You can use the request parameters as selection criteria to return a subset of the objects in a bucket."

I think the "(up to 1000)" in that description is highly misleading. It refers to the maximum page size of each underlying HTTP request the CLI sends, not to the total number of objects the command returns. The documentation of the --page-size option makes this clear:

The size of each page to get in the AWS service call. This does not affect the number of items returned in the command's output. Setting a smaller page size results in more calls to the AWS service, retrieving fewer items in each call. This can help prevent the AWS service calls from timing out.
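To make the page-size point concrete, here is a small sketch (the helper name `calls_needed` is hypothetical, not part of the CLI) that estimates how many service calls a full listing needs. It assumes every page except the last is full and that at least one call is always made, even for an empty bucket. The total number of keys returned does not depend on the page size; only the number of calls does.

```python
def calls_needed(total_objects: int, page_size: int) -> int:
    """Estimate how many ListObjects requests a full listing needs.

    Hypothetical helper: assumes every page except the last holds exactly
    `page_size` keys and that one request is made even for an empty bucket.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    # Integer ceiling division, with a floor of one call.
    return max(1, (total_objects + page_size - 1) // page_size)


# All 3,500 keys are returned in every case; only the number of calls changes.
for size in (1000, 500, 100):
    print(f"--page-size {size}: {calls_needed(3500, size)} calls")
```

Under these assumptions, the 13512 keys from the question would take 14 calls at the default page size of 1000.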

The AWS CLI documentation on pagination [2] makes it even clearer:

For commands that can return a large list of items, the AWS Command Line Interface (AWS CLI) adds three options that you can use to control the number of items included in the output when the AWS CLI calls a service's API to populate the list.

By default, the AWS CLI uses a page size of 1000 and retrieves all available items. For example, if you run aws s3api list-objects on an Amazon S3 bucket that contains 3,500 objects, the CLI makes four calls to Amazon S3, handling the service-specific pagination logic for you in the background and returning all 3,500 objects in the final output.
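The "handling the service-specific pagination logic for you in the background" part can be sketched as a loop. This is a simplified, offline model, not the CLI's actual code. `fake_list_objects` is a hypothetical stand-in for one ListObjects call that returns at most `max_keys` keys sorting after `marker`, plus an IsTruncated flag. The loop keeps requesting pages, using the last key seen as the next marker, until the flag is false.

```python
from typing import List, Optional, Tuple


def fake_list_objects(
    keys: List[str], marker: Optional[str], max_keys: int
) -> Tuple[List[str], bool]:
    """Hypothetical stand-in for one ListObjects call (no network).

    Returns up to `max_keys` keys that sort after `marker`, plus an
    IsTruncated-style flag that says whether more keys remain.
    """
    remaining = [k for k in sorted(keys) if marker is None or k > marker]
    return remaining[:max_keys], len(remaining) > max_keys


def list_all(keys: List[str], page_size: int = 1000) -> Tuple[List[str], int]:
    """Mimic the default CLI behaviour: fetch pages until the listing is
    no longer truncated, then return every key plus the number of calls."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    collected: List[str] = []
    marker: Optional[str] = None
    calls = 0
    while True:
        page, truncated = fake_list_objects(keys, marker, page_size)
        calls += 1
        collected.extend(page)
        if not truncated:
            return collected, calls
        marker = page[-1]  # the next request starts after the last key seen


bucket = [f"file-{i:05d}" for i in range(3500)]
keys, calls = list_all(bucket)
print(len(keys), calls)  # 3500 4
```

The caller sees one combined list of 3,500 keys, even though no single call returned more than 1000. That is why the command in the question printed all 13512 names.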

As Ankit already correctly stated, the --max-items option is the right way to limit the result and stop the automatic pagination early:

To include fewer items at a time in the AWS CLI output, use the --max-items option. The AWS CLI still handles pagination with the service as described above, but prints out only the number of items at a time that you specify. [2]
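A rough model of --max-items follows. It is again an offline sketch with hypothetical names, not the CLI's implementation. Keys are still fetched page by page, but output stops once `max_items` keys have been collected, and a resume token is returned so a later call can continue where the first one stopped. The token here is a plain list index. That is a simplifying assumption: the real CLI prints an opaque NextToken string that you pass back with --starting-token.

```python
from typing import List, Optional, Tuple


def list_with_max_items(
    keys: List[str],
    max_items: int,
    page_size: int = 1000,
    starting_token: Optional[int] = None,
) -> Tuple[List[str], Optional[int]]:
    """Hypothetical model of --max-items / --starting-token.

    Returns at most `max_items` keys plus a token (here just an index)
    to resume from, or None when the listing is complete.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    ordered = sorted(keys)
    pos = starting_token or 0
    out: List[str] = []
    while pos < len(ordered) and len(out) < max_items:
        page = ordered[pos:pos + page_size]   # one simulated service call
        take = page[: max_items - len(out)]   # keep only what is still allowed
        out.extend(take)
        pos += len(take)
    next_token = pos if pos < len(ordered) else None
    return out, next_token


bucket = [f"file-{i:05d}" for i in range(3500)]
first, token = list_with_max_items(bucket, max_items=2500)
rest, token2 = list_with_max_items(bucket, max_items=2500, starting_token=token)
print(len(first), token, len(rest), token2)  # 2500 2500 1000 None
```

In this model, the first call stops partway through the third page, and the second call picks up from the returned token and finishes the listing.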

References

[1] https://docs.aws.amazon.com/cli/latest/reference/s3api/list-objects.html
[2] https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-pagination.html

Martin Löper, answered Sep 25 '22 12:09