
How to get more than 1000 objects from S3 by using list_objects_v2?

I have more than 500,000 objects in S3 and I am trying to get the size of each one. I am using the following Python code:

import boto3

bucket = 'bucket'
prefix = 'prefix'

contents = boto3.client('s3').list_objects_v2(Bucket=bucket, MaxKeys=1000, Prefix=prefix)["Contents"]

for c in contents:
    print(c["Size"])

But it only gives me the sizes of the first 1000 objects. According to the documentation, a single call cannot return more than 1000. Is there any way I can get more than that?

tahir siddiqui asked Jan 22 '19 18:01



1 Answer

The built-in boto3 paginator is the easiest way to get past the 1000-object limit of a single list_objects_v2 call. It issues the follow-up requests for you and yields one page of results at a time. This can be implemented as follows:

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='prefix')

for page in pages:
    # 'Contents' is missing from a page when no keys match the prefix
    for obj in page.get('Contents', []):
        print(obj['Size'])

For more details: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Paginator.ListObjectsV2
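
If you would rather see what the paginator is doing, the same loop can be written by hand with continuation tokens. The following is only a rough sketch, not the library's own implementation: `iter_object_sizes` is a hypothetical helper name, and it assumes the client you pass in behaves like a boto3 S3 client, returning `IsTruncated` and `NextContinuationToken` from `list_objects_v2` while more keys remain.

```python
def iter_object_sizes(client, bucket, prefix=""):
    """Yield (key, size) for every object under `prefix`.

    Hypothetical helper: `client` is assumed to be a boto3 S3 client
    (or anything exposing a compatible list_objects_v2 method).
    """
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # "Contents" is absent when the page holds no matching keys.
        for obj in response.get("Contents", []):
            yield obj["Key"], obj["Size"]
        # IsTruncated is true while more pages remain.
        if not response.get("IsTruncated"):
            break
        # Ask for the next page, starting where this one ended.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   for key, size in iter_object_sizes(boto3.client("s3"), "bucket", "prefix"):
#       print(key, size)
```

Because this is a generator, it never holds all 500,000 entries in memory at once. For everyday use the paginator above is probably the simpler choice.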

J Tasker answered Sep 19 '22 13:09