Is there a way to programmatically find zero-byte files in Amazon S3?
The total size of the bucket is more than 100 GB, so it is not practical for me to sync it back to a server and then run
find . -size 0 -type f
Combining s3cmd with awk should do the trick easily.
Note: s3cmd ls outputs four columns: date, time, size, and name. You want to match the size (column 3) against 0 and print the object name (column 4). This should do the trick:
$ s3cmd ls -r s3://bucketname | awk '{if ($3 == 0) print $4}'
s3://bucketname/root/
s3://bucketname/root/e
If you want to see all information, just drop the $4 so that it only says print.
$ s3cmd ls -r s3://bucketname | awk '{if ($3 == 0) print}'
2013-03-04 06:28 0 s3://bucketname/root/
2013-03-04 06:28 0 s3://bucketname/root/e
Memory-wise, this should be fine as it's a simple bucket listing.
There is no direct way to search for zero-byte objects in Amazon S3. You can do it by listing all objects and then sorting them by size, which groups all the zero-byte files together.
If you want a list of all files with a size of zero, you can use Bucket Explorer: list the objects of the selected bucket, then click the Size column header (sort by size), and the zero-byte files will be grouped together.
Disclosure: I am a developer of Bucket Explorer.
Just use Boto:
from boto.s3.connection import S3Connection

aws_access_key = ''
aws_secret_key = ''
bucket_name = ''

s3_conn = S3Connection(aws_access_key, aws_secret_key)
bucket = s3_conn.get_bucket(bucket_name)

# bucket.list() lazily pages through the keys, fetching only metadata.
for key in bucket.list():
    if key.size == 0:
        print(key.key)
As for the number of files: Boto requests only the object metadata (not the actual file content), 1,000 keys at a time (the AWS limit), and bucket.list() returns a generator, so memory usage stays small.
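Boto (version 2) is no longer maintained; if you are on boto3 instead, a minimal equivalent sketch would look like the following (the bucket name is a placeholder and credentials are assumed to come from the usual AWS configuration):
import boto3

# Minimal boto3 sketch; 'bucketname' is a placeholder.
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucketname')

# objects.all() pages through the listing 1,000 keys at a time,
# so only metadata is held in memory.
for obj in bucket.objects.all():
    if obj.size == 0:
        print(obj.key)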
JMESPath query:
aws s3api list-objects --bucket $BUCKET --prefix $PREFIX --output json --query 'Contents[?Size==`0`]'
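The same JMESPath expression can also be applied client-side from Python. Here is a rough sketch using boto3's paginator; the bucket name and prefix are placeholders, and selecting .Key yields just the keys of the zero-byte objects:
import boto3

# Sketch: filter the listing with the same JMESPath expression via boto3.
client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucketname', Prefix='some/prefix')

# .search() applies the JMESPath expression to each page of results.
for key in pages.search('Contents[?Size == `0`].Key'):
    if key:  # skip empty pages, which can yield None
        print(key)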