
AWS S3 - How to get all files that are GLACIER storage class

My goal is to convert all files that are currently in the GLACIER storage class to STANDARD using the AWS CLI's s3api commands. To do this, I first need to get a list of all these files, then issue a restore command, and finally a copy command to change them all to STANDARD.

The problem is that the number of files is too large (~5 million), and the command ends in a core dump (segmentation fault) whenever --max-items exceeds roughly 600k to 700k. If I don't supply the --max-items parameter, I get the same error. So I can't list any files beyond the first ~700k. Here's the command I used:

aws s3api list-objects --bucket my-bucket --query 'Contents[?StorageClass==`GLACIER`]' --max-items 700000 > glacier.txt

Is there any workaround?

Casper asked Oct 14 '25 07:10


2 Answers

I discovered the --starting-token option of the list-objects command, so I wrote a script that scans the objects in batches of 100k. The script writes the S3 keys of all GLACIER objects to a file.

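#!/bin/bash
# Page through the bucket listing and collect the keys of GLACIER objects.
BUCKET="s3-bucket-name"
PREFIX="foldername"
PROFILE="awscliprofile"
MAX_ITEM=100000

var=0
NEXT_TOKEN=""
while true; do

    var=$((var+1))

    echo "Iteration #$var - Next token: $NEXT_TOKEN"

    # Pass --starting-token only once a previous page has returned a token.
    TOKEN_ARGS=()
    if [ -n "$NEXT_TOKEN" ]; then
        TOKEN_ARGS=(--starting-token "$NEXT_TOKEN")
    fi

    aws s3api list-objects \
    --bucket "$BUCKET" \
    --prefix "$PREFIX" \
    --profile "$PROFILE" \
    --max-items "$MAX_ITEM" \
    "${TOKEN_ARGS[@]}" > temp

    # Prints the line that follows each GLACIER match; this assumes the JSON
    # output lists "Key" right after "StorageClass".
    awk '/GLACIER/{getline; print}' temp >> glacier.txt

    # The last page has no NextToken, so the variable ends up empty.
    NEXT_TOKEN=$(grep NextToken temp | awk '{print $2}' | sed 's/\("\|",\)//g')
    rm temp
    if [ -z "$NEXT_TOKEN" ]; then
        echo "No more files..."
        break
    fi
done
echo "Exiting."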

After that I can use restore-object and finally copy-object to change the storage class of all these files to STANDARD. See more scripts here. Hope this helps anyone who needs to achieve the same thing.
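For the restore and copy steps mentioned above, something along these lines might work. This is only a rough sketch, not the scripts referred to above: the KEY_FILE name, the 7-day restore window, the Bulk tier, and the function names are all assumptions made for illustration, and the key file is assumed to hold one plain object key per line (the raw lines collected by the listing script would need cleaning up first).

```shell
#!/bin/bash
# Sketch only: BUCKET, KEY_FILE and the restore settings are placeholder assumptions.
BUCKET="s3-bucket-name"
KEY_FILE="keys.txt"   # hypothetical file: one plain object key per line

# Step 1: ask S3 to restore a temporary copy of one archived object.
restore_key() {
    aws s3api restore-object \
        --bucket "$BUCKET" \
        --key "$1" \
        --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Bulk"}}'
}

# Step 2: once the restore has completed, copy the object onto itself
# with the new storage class.
copy_key_to_standard() {
    aws s3api copy-object \
        --bucket "$BUCKET" \
        --key "$1" \
        --copy-source "$BUCKET/$1" \
        --storage-class STANDARD
}

# Run the given function once per non-empty line of KEY_FILE.
# The file is read on descriptor 3 so the aws calls keep their normal stdin.
for_each_key() {
    local key
    while IFS= read -r key <&3; do
        if [ -n "$key" ]; then
            "$1" "$key"
        fi
    done 3< "$KEY_FILE"
}
```

You would run for_each_key restore_key first, wait for the restores to finish (Bulk restores can take many hours), and then run for_each_key copy_key_to_standard. Keys with special characters may need URL-encoding in --copy-source, and as far as I know a single copy-object call is limited to objects of up to 5 GB.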

Casper answered Oct 16 '25 20:10


Here is a one-liner solution:

aws s3api list-objects --bucket <bucket-name> | grep "StorageClass" > nonglacier.txt

Then you can count the GLACIER entries using:

cat nonglacier.txt | grep GLACIER | wc -l

This can also be combined into a single command:

aws s3api list-objects --bucket <bucket-name> | grep "StorageClass" | grep GLACIER | wc -l
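
As a variation on the counting idea above, the storage class can be pulled out with --query instead of grep, which avoids matching the word GLACIER inside object keys. This is just a sketch: the function name count_by_storage_class is made up for illustration, and I have not checked whether it behaves any better than the plain listing on a bucket with millions of objects.

```shell
# Sketch: tally objects per storage class.
# $1 is the bucket name (a placeholder supplied by the caller).
count_by_storage_class() {
    aws s3api list-objects \
        --bucket "$1" \
        --query 'Contents[].StorageClass' \
        --output text |
        tr '\t' '\n' | sort | uniq -c
}
```

Calling count_by_storage_class <bucket-name> prints one line per storage class with its object count.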
Keshav Kumar answered Oct 16 '25 20:10


