
How to get size of all files in an S3 bucket with versioning?

I know this command can provide the size of all files in a bucket:

aws s3 ls s3://mybucket --recursive --summarize --human-readable

But this does not account for versioning.

If I run this command:

aws s3 ls s3://mybucket/myfile --human-readable

It shows something like "100 MiB", but the file may have 10 versions, which would total closer to "1 GiB".

The closest I have come is getting the sizes of every version of a given file:

aws s3api list-object-versions --bucket mybucket --prefix "myfile" --query 'Versions[?StorageClass=`STANDARD`].Size' > /tmp/s3_myfile_version_sizes

Then take the sum of all version sizes.

But I would have to rerun this command for every file in a bucket.

Is there an easier way to do this?

asked Mar 31 '17 22:03 by jhoang7



1 Answer

You can run list-object-versions on the bucket as a whole:

aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size'

Use jq to sum it up:

aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size' | jq add

Or, if you need human-readable output:

aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size' | jq add | numfmt --to=iec-i --suffix=B

You can also add a prefix to get the size of a given "folder", along with the number of object versions:

aws s3api list-object-versions --bucket my-bucket --prefix my-folder --query 'Versions[*].Size' | jq 'length, add'

Or you can use jq filtering to write more complex filters, for example, including only non-current objects:

aws s3api list-object-versions --bucket my-bucket --prefix my-folder | jq '[.Versions[]|select(.IsLatest == false)|.Size] | length,add'

If jq is not available, the --output text option unfortunately puts all the values on a single tab-separated line, so here's a hack that forces one line per version (by selecting two columns) and then adds up the first column:

aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].[Size,Size]' --output text | awk '{s+=$1} END {printf "%.0f\n", s}'
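
The same text-output trick can also give a per-file breakdown across all versions, which is what the question was working around by rerunning the command per file. Below is a minimal sketch, not a definitive implementation: per_key_totals is a hypothetical helper name, and it assumes keys contain no tab characters. It reads "Key<TAB>Size" lines and prints each key with its version count and total bytes.

```shell
# Hypothetical helper: reads tab-separated "Key<TAB>Size" lines on stdin and
# prints "Key<TAB>VersionCount<TAB>TotalBytes", sorted by key.
# Assumption: keys contain no tab characters.
per_key_totals() {
  awk -F '\t' '
    NF >= 2 { bytes[$1] += $2; versions[$1]++ }   # skip lines such as "None"
    END {
      for (k in bytes)
        printf "%s\t%d\t%.0f\n", k, versions[k], bytes[k]
    }
  ' | sort
}

# Possible usage against a real bucket (needs AWS credentials):
#   aws s3api list-object-versions --bucket my-bucket \
#     --query 'Versions[*].[Key,Size]' --output text | per_key_totals
```

Selecting two fields per version makes the CLI emit one row per version, so no extra hack is needed to split the lines.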

If you have a large number of objects, it might be better to use data provided by the Amazon S3 Storage Inventory:

Amazon S3 inventory provides a comma-separated values (CSV) flat-file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).
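
Once an inventory file has been downloaded, the total can be computed locally without listing the bucket at all. The following is a rough sketch under stated assumptions: sum_inventory_sizes is a hypothetical helper, the inventory was configured to include all versions and the Size field, the CSV has quoted fields with no header row, and no field contains a literal comma. The position of the Size column depends on your inventory configuration, so it is passed as an argument.

```shell
# Hypothetical helper: sums one column of a decompressed inventory CSV on stdin.
# $1 = 1-based position of the Size field (assumption: check your inventory's
#      manifest to find which column holds Size).
# Assumption: fields are double-quoted and contain no literal commas.
sum_inventory_sizes() {
  awk -F ',' -v col="$1" '
    {
      v = $col
      gsub(/"/, "", v)            # strip the surrounding quotes
      if (v != "") total += v     # delete markers may have an empty size
    }
    END { printf "%.0f\n", total }
  '
}

# Possible usage (the file name is a placeholder):
#   gunzip -c inventory-part.csv.gz | sum_inventory_sizes 6
```

Because this reads a static file, it avoids paging through list-object-versions, at the cost of the data being only as fresh as the last inventory run.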

answered Sep 20 '22 05:09 by John Rotenstein