I have a 17.7GB file on S3. It was generated as the output of a Hive query, and it isn't compressed. I know that by compressing it, it'll be about 2.2GB (gzip). How can I download this file locally as quickly as possible when transfer is the bottleneck (250kB/s). I've not found any straightforward way to compress the file on S3, or enable compression on transfer in s3cmd, boto, or related tools.

Late answer but I found this working perfectly. <pre class="prettyprint"><code>aws s3 sync s3://your-pics . for file in "$(find . -name "*.jpg")"; do gzip "$file"; echo "$file"; done aws s3 sync . s3://your-pics --content-encoding gzip --dryrun </code></pre> This will download all files in s3 bucket to the machine (or ec2 instance), compresses the image files and upload them back to s3 bucket. Verify the data before removing dryrun flag.

There are now pre-built apps in Lambda that you could use to compress images and files in S3 buckets. So just create a new Lambda function and select a pre-built app of your choice and complete the configuration. <ol> <li>Step 1 - Create a new Lambda function</li> <li>Step 2 - Search for prebuilt app <img src="https://i.stack.imgur.com/b8ZAf.jpg" alt="enter image description here"> </li> <li>Step 3 - Select the app that suits your need and complete the configuration process by providing the S3 bucket names. <img src="https://i.stack.imgur.com/j1Qby.jpg" alt="enter image description here"> </li> </ol>

Compress file on S3

3 Answers

S3 does not support stream compression nor is it possible to compress the uploaded file remotely.

If this is a one-time process I suggest downloading it to a EC2 machine in the same region, compress it there, then upload to your destination.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html

If you need this more frequently

Serving gzipped CSS and JavaScript from Amazon CloudFront via S3

173

answered Oct 12 '22 19:10

Michel Feldheim

Late answer but I found this working perfectly.

aws s3 sync s3://your-pics .

for file in "$(find . -name "*.jpg")"; do gzip "$file"; echo "$file";  done

aws s3 sync . s3://your-pics --content-encoding gzip --dryrun

This will download all files in s3 bucket to the machine (or ec2 instance), compresses the image files and upload them back to s3 bucket. Verify the data before removing dryrun flag.

answered Oct 12 '22 19:10

Navaneeth Pk

There are now pre-built apps in Lambda that you could use to compress images and files in S3 buckets. So just create a new Lambda function and select a pre-built app of your choice and complete the configuration.

Step 1 - Create a new Lambda function
Step 2 - Search for prebuilt app
Step 3 - Select the app that suits your need and complete the configuration process by providing the S3 bucket names.

answered Oct 12 '22 18:10

CloudArch

Related questions
                            
                                Creating an S3 bucket policy that allows access to Cloudfront but restricts access to anyone else
                            
                                s3cmd get an entire directory
                            
                                Multiple Cloudfront Origins with Behavior Path Redirection
                            
                                Write parquet from AWS Kinesis firehose to AWS S3
                            
                                How to download s3 object directly into memory in java
                            
                                How to read input from S3 in a Spark Streaming EC2 cluster application
                            
                                Filter S3 list-objects results to find a key matching a pattern
                            
                                Rails: allow download of files stored on S3 without showing the actual S3 URL to user
                            
                                Are there any difference between amazon cloudfront and amazon s3 transfer acceleration?
                            
                                Amazon S3: when/why [closed]
                            
                                Add Metadata, headers (Expires, CacheControl) to a file uploaded to Amazon S3 using the Laravel 5.0 Storage facade
                            
                                Trying to Install AWS CLI, stuck on a step
                            
                                How to configure django-compressor and django-staticfiles with Amazon's S3?
                            
                                Rails direct upload to Amazon S3
                            
                                React dropzone, how to upload image?
                            
                                Does the ListObjects command guarantee the results are sorted by key?
                            
                                Amazon S3 listing "directories"
                            
                                What is the difference between boto3 list_objects and list_objects_v2?
                            
                                AmazonS3: Getting warning: S3AbortableInputStream:Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection
                            
                                How to run aws configure on Amazon AWS EC2 automatically without interaction without prompt?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Compress file on S3

Tags:

compression

amazon-s3

hive

file-transfer

emr

Matt Joiner

People also ask

3 Answers

Michel Feldheim

Navaneeth Pk

CloudArch

Recent Activity

Donate For Us