I am using:
aws s3api list-objects --endpoint-url https://my.end.point/ --bucket my.bucket.name --query 'Contents[].Key' --output text
to get the list of files in a bucket.
The aws s3api list-object
documentation page says that this command returns only up to a 1000 objects, however I noticed that in my case it returns the names of all files in my bucket. For example when I run the following command:
aws s3api list-objects --endpoint-url https://my.end.point/ --bucket my.bucket.name --query 'Contents[].Key' --output text | tr "\t" "\n" | wc -l
I get 13512 displayed, meaning that more than 13 thousand file names were returned.
Am I missing smth?
I use the following aws cli version:
aws-cli/1.10.57 Python/2.7.3 Linux/3.2.0-4-amd64 botocore/1.4.47
The total volume of data and number of objects you can store are unlimited. Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 TB.
As AWS notes, “If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years.”
If you are referring to the number of objects that you can store in one Amazon S3 bucket, then it's unlimited. There is no limit to the number of objects that can be stored in a bucket and no difference in performance whether you use many buckets or just a few.
Returns some or all (up to 1000) of the objects in a bucket. You can use the request parameters as selection criteria to return a subset of the objects in a bucket. [1]
I think that the part "(up to 1000)" in the documentation's description is highly misleading. It refers to the maximal page size per underlying HTTP request which is sent by the cli. The documentation of the --page-size
option makes this clear:
The size of each page to get in the AWS service call. This does not affect the number of items returned in the command's output. Setting a smaller page size results in more calls to the AWS service, retrieving fewer items in each call. This can help prevent the AWS service calls from timing out.
It gets even clearer when reading the AWS documentation about pagination [2] which describes:
For commands that can return a large list of items, the AWS Command Line Interface (AWS CLI) adds three options that you can use to control the number of items included in the output when the AWS CLI calls a service's API to populate the list.
By default, the AWS CLI uses a page size of 1000 and retrieves all available items. For example, if you run aws s3api list-objects on an Amazon S3 bucket that contains 3,500 objects, the CLI makes four calls to Amazon S3, handling the service-specific pagination logic for you in the background and returning all 3,500 objects in the final output.
As Ankit already stated correctly, using the --max-items
option is the correct solution to limit the result and stop the automatic pagination:
To include fewer items at a time in the AWS CLI output, use the --max-items option. The AWS CLI still handles pagination with the service as described above, but prints out only the number of items at a time that you specify. [2]
[1] https://docs.aws.amazon.com/cli/latest/reference/s3api/list-objects.html
[2] https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-pagination.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With