Compress file on S3

I have a 17.7GB file on S3. It was generated as the output of a Hive query, and it isn't compressed.

I know that by compressing it, it will be about 2.2 GB (gzip). How can I download this file locally as quickly as possible, given that transfer bandwidth is the bottleneck (250 kB/s)?

I've not found any straightforward way to compress the file on S3, or enable compression on transfer in s3cmd, boto, or related tools.

asked Jan 24 '13 by Matt Joiner


3 Answers

S3 does not support stream compression, nor is it possible to compress an uploaded file remotely.

If this is a one-time process, I suggest downloading it to an EC2 instance in the same region, compressing it there, and then uploading it to your destination.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html
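The EC2 approach above can be sketched with boto3 (the question mentions boto). This is a minimal sketch, not the answerer's exact method: the bucket and key names are placeholders, and it assumes you run it on an instance in the bucket's region so the 17.7 GB transfer stays inside AWS.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path):
    """Compress src_path into dst_path with gzip, streaming in constant memory."""
    with open(src_path, "rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def compress_on_ec2(bucket="my-bucket", key="hive-output/part-00000"):
    """Download, compress, and re-upload an object. Run this on an EC2
    instance in the same region as the bucket; names are hypothetical."""
    import boto3  # pip install boto3 if not already present on the AMI

    s3 = boto3.client("s3")
    s3.download_file(bucket, key, "hive-output")      # fast: intra-region
    gzip_file("hive-output", "hive-output.gz")        # ~17.7 GB -> ~2.2 GB
    s3.upload_file("hive-output.gz", bucket, key + ".gz")
```

You would then download only the ~2.2 GB `.gz` object over the slow link.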

If you need to do this more frequently, see:

Serving gzipped CSS and JavaScript from Amazon CloudFront via S3

answered Oct 12 '22 by Michel Feldheim


Late answer, but I found this to work perfectly.

aws s3 sync s3://your-pics .

find . -name "*.jpg" | while read -r file; do gzip "$file"; echo "$file"; done

aws s3 sync . s3://your-pics --content-encoding gzip --dryrun

This downloads all files in the S3 bucket to the machine (or EC2 instance), compresses the image files (note that gzip renames each file.jpg to file.jpg.gz), and uploads them back to the S3 bucket with Content-Encoding: gzip. Verify the result before removing the --dryrun flag.

answered Oct 12 '22 by Navaneeth Pk


There are now pre-built apps in Lambda that you could use to compress images and files in S3 buckets. Create a new Lambda function, select a pre-built app of your choice, and complete the configuration.

  1. Create a new Lambda function.
  2. Search for a pre-built app.
  3. Select the app that suits your needs and complete the configuration process by providing the S3 bucket names.
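If none of the pre-built apps fit, a hand-rolled Lambda along these lines is a reasonable starting point. This is a sketch, not one of the pre-built apps: the handler assumes the standard S3 put-event trigger shape, reads the whole object into memory (fine for images, not for a 17.7 GB file), and writes a gzipped copy alongside the original.

```python
import gzip


def gzip_bytes(data):
    """Compress a bytes payload with gzip."""
    return gzip.compress(data)


def handler(event, context):
    """Triggered by an S3 object-created event; writes key + '.gz' next to it."""
    import boto3

    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if key.endswith(".gz"):
            continue  # skip our own output so the trigger doesn't loop
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        s3.put_object(
            Bucket=bucket,
            Key=key + ".gz",
            Body=gzip_bytes(body),
            ContentEncoding="gzip",
        )
```

Remember to scope the function's IAM role to read and write only the buckets involved.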
answered Oct 12 '22 by CloudArch