How to zip files in an Amazon S3 bucket and get the zip's URL

I have a bunch of files inside an Amazon S3 bucket. I want to zip those files and download the result via an S3 URL using Java Spring.

jeff ayan asked Apr 07 '17 10:04

People also ask

Can we zip a file in an S3 bucket?

In a nutshell: first create an in-memory buffer using the BytesIO method, then use the ZipFile method to write into this object while iterating over the S3 objects, then upload the buffer with the put method and create a presigned URL for it.
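That recipe is Python; since the question asks for Java Spring, here is a rough Java equivalent of the same flow, only a sketch, using the AWS SDK for Java v2 (the bucket name and object keys are placeholders, and the whole archive is held in memory, so this only suits small files):

import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class ZipInMemory {
    public static void main(String[] args) throws Exception {
        S3Client s3 = S3Client.create();
        String bucket = "my-bucket";                       // placeholder
        List<String> keys = List.of("a.txt", "b.txt");     // placeholder keys

        // Equivalent of the BytesIO + ZipFile step: build the zip in memory
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (String key : keys) {
                zip.putNextEntry(new ZipEntry(key));
                zip.write(s3.getObjectAsBytes(
                        GetObjectRequest.builder().bucket(bucket).key(key).build())
                        .asByteArray());
                zip.closeEntry();
            }
        }

        // Equivalent of the put step: upload the finished zip as a new object.
        // A presigned URL for it can then be generated (see the presigner sketch below).
        s3.putObject(PutObjectRequest.builder().bucket(bucket).key("archive.zip").build(),
                RequestBody.fromBytes(buffer.toByteArray()));
    }
}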

How do I extract a ZIP file in Amazon S3 by using Lambda?

If you head to the Properties tab of your S3 bucket, you can set up an Event Notification for all object "create" events (or just PutObject events). As the destination, you can select the Lambda function where you will write your code to unzip and gzip files. Now, every time a new object lands in the bucket, the Lambda function is triggered.
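A sketch of what such a handler could look like in Java; the handler class, the unzipped/ key prefix, and the in-memory extraction are illustrative assumptions, not part of the original answer:

import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class UnzipHandler implements RequestHandler<S3Event, Void> {
    private final S3Client s3 = S3Client.create();

    @Override
    public Void handleRequest(S3Event event, Context context) {
        event.getRecords().forEach(record -> {
            String bucket = record.getS3().getBucket().getName();
            String key = record.getS3().getObject().getKey(); // note: may be URL-encoded
            try (ZipInputStream zip = new ZipInputStream(s3.getObject(
                    GetObjectRequest.builder().bucket(bucket).key(key).build()))) {
                ZipEntry entry;
                while ((entry = zip.getNextEntry()) != null) {
                    // readAllBytes() stops at the end of the current entry
                    byte[] data = zip.readAllBytes();
                    s3.putObject(PutObjectRequest.builder()
                            .bucket(bucket).key("unzipped/" + entry.getName()).build(),
                            RequestBody.fromBytes(data));
                }
            } catch (Exception e) {
                throw new RuntimeException("Failed to unzip " + key, e);
            }
        });
        return null;
    }
}

Each entry is read fully into memory here, so entry sizes are bounded by the RAM you allocate to the function.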


2 Answers

S3 is not a file server, nor does it offer operating system file services, such as data manipulation.

If there are many "HUGE" files, your best bet is to:

  1. Start a simple EC2 instance.
  2. Download all those files to the EC2 instance, compress them, and re-upload the archive back to the S3 bucket with a new object name (see the sketch after this list).
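Here is a sketch of step 2 with the AWS SDK for Java v2, streaming each object straight into a zip on the instance's disk so that no file is held fully in RAM (bucket name, prefix, and archive key are placeholders):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.S3Object;

public class ZipOnEc2 {
    public static void main(String[] args) throws Exception {
        S3Client s3 = S3Client.create();
        String bucket = "my-bucket";   // placeholder

        // Stream every object under the prefix into a zip file on local disk
        Path zipFile = Files.createTempFile("archive", ".zip");
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (S3Object obj : s3.listObjectsV2(b -> b.bucket(bucket).prefix("data/")).contents()) {
                zip.putNextEntry(new ZipEntry(obj.key()));
                try (InputStream in = s3.getObject(
                        GetObjectRequest.builder().bucket(bucket).key(obj.key()).build())) {
                    in.transferTo(zip);   // copies in chunks; no full file in memory
                }
                zip.closeEntry();
            }
        }

        // Re-upload the archive under a new object name
        s3.putObject(PutObjectRequest.builder().bucket(bucket).key("archive.zip").build(),
                RequestBody.fromFile(zipFile));
    }
}

Note that listObjectsV2 returns at most 1,000 keys per call; for larger buckets, use listObjectsV2Paginator instead.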

Yes, you can use AWS Lambda to do the same thing, but Lambda is bound to a 900-second (15-minute) execution timeout (thus it is recommended to allocate more RAM to boost Lambda execution performance).

Traffic from S3 to an EC2 instance (or other services) in the same region is FREE.

If your main purpose is just to read those files within the same AWS region using EC2 or other services, then you don't need this extra step. Just access the files directly.

(Update: as mentioned by @Robert Reiz, you can now also use AWS Fargate to do the job.)

Note:

It is recommended to access and share files using the AWS API. If you intend to share the files publicly, you must take security seriously and impose download restrictions. AWS traffic out to the internet is never cheap.
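A presigned URL with a short expiry is the usual way to impose such a restriction. A minimal sketch with the AWS SDK for Java v2 (bucket and key are placeholders):

import java.time.Duration;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;

public class PresignZip {
    public static void main(String[] args) {
        try (S3Presigner presigner = S3Presigner.create()) {
            String url = presigner.presignGetObject(GetObjectPresignRequest.builder()
                    .signatureDuration(Duration.ofMinutes(15))   // link expires after 15 minutes
                    .getObjectRequest(GetObjectRequest.builder()
                            .bucket("my-bucket").key("archive.zip").build())
                    .build())
                    .url().toString();
            System.out.println(url);   // share this URL; it stops working after expiry
        }
    }
}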

mootmoot answered Sep 22 '22 20:09


Zip them on your end instead of doing it in AWS, ideally in the frontend, directly in the user's browser. You can stream the download of several files in JavaScript, use that stream to create a zip, and save that zip to the user's disk.

The advantages of moving the zipping to the frontend:

  • You can use it with S3 URLs, a bunch of presigned links, or even mix content from different sources: some from S3, some from anywhere else.
  • You don't waste Lambda memory or have to spin up an EC2 or Fargate instance, which saves money. Let the user's computer do it for you.
  • It improves the user experience: no need to wait for the zip to be created before downloading it; the download starts while the zip is being created.

StreamSaver is useful for this purpose, but its zipping example (Saving multiple files as a zip) is limited to files under 4 GB, as it doesn't implement zip64. You can combine StreamSaver with client-zip, which supports zip64, with something like this (I haven't tested this):

import { downloadZip } from 'client-zip';
import streamSaver from 'streamsaver';

// Each entry pairs a filename with a fetch Response; client-zip streams the
// response bodies into the zip as they arrive
const files = [
  {
    name: 'file1.txt',
    input: await fetch('https://test.com/file1'),
  },
  {
    name: 'file2.txt',
    input: await fetch('https://test.com/file2'),
  },
];

// downloadZip returns a Response; pipe its body straight to a file on disk
downloadZip(files).body.pipeTo(streamSaver.createWriteStream('final_name.zip'));

If you choose this option, keep in mind that if CORS is enabled on your bucket, you will need to add the frontend URL where the zipping is done to the AllowedOrigins field of your bucket's CORS configuration.
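For reference, a minimal CORS configuration of that shape, in the JSON format the S3 console accepts (the origin is a placeholder):

[
    {
        "AllowedOrigins": ["https://app.example.com"],
        "AllowedMethods": ["GET"],
        "AllowedHeaders": ["*"]
    }
]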

About performance: as @aviv-day points out in a comment, this may not be suitable for all scenarios. The client-zip library has a benchmark that can give you an idea of whether it fits yours. Generally, if you have a big set of small files (I don't have a hard number for what counts as big here, but say somewhere between 100 and 1000), just zipping them will take a lot of time and drain the end user's CPU. Also, if you are offering the same set of files zipped to all users, it's better to zip it once and serve it already zipped.

Zipping in the frontend works well with a small group of files that can change dynamically depending on the user's preferences about what to download. I haven't really tested this, and I suspect the bottleneck would be network speed rather than the zipping itself, since it happens on the fly, so I don't really think a scenario with a big set of files would actually be a problem. If anyone has benchmarks on this, it would be nice to share them with us!

javrd answered Sep 25 '22 20:09