 

How do I bulk upload to s3?

I recently refactored some of my code to insert rows into the database using MySQL's LOAD DATA, and it works great. However, for each record I must also upload two files to S3, and this totally destroys the magnificent speed upgrade: whereas I was able to process 600+ of these documents per second, they now trickle in at about 1 per second because of S3.

What are your workarounds for this? Looking at the API, I see that it is mostly RESTful, so I'm not sure what to do; maybe I should just store all of this in the database instead. The text files are usually no more than 1.5 KB. (The other file we store is an XML representation of the text.)

I already cache these files for HTTP requests to my web server, as they are used quite a lot.

By the way, our current implementation uses Java; I have not yet tried threads, but that might be an option.

Recommendations?

asked Mar 20 '09 by eyberg

1 Answer

You can use the [putObjects][1] method of JetS3t to upload multiple files at once.

Alternatively, you could use a background thread to upload to S3 from a queue, and add files to the queue from the code that loads the data into the database.
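A minimal sketch of that producer/consumer idea in plain Java: the loader thread enqueues S3 keys while a small pool of workers drains the queue concurrently. The class and method names here are my own, and the upload itself is stubbed; swap the stub for your real PUT (JetS3t, raw REST, etc.).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Producer/consumer queue: the thread running LOAD DATA enqueues keys,
// and nWorkers threads upload them to S3 in parallel.
public class S3UploadQueue {
    private static final String POISON = "";  // empty key = shut down a worker

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers;
    private final AtomicInteger uploaded = new AtomicInteger();
    private final int nWorkers;

    public S3UploadQueue(int nWorkers) {
        this.nWorkers = nWorkers;
        this.workers = Executors.newFixedThreadPool(nWorkers);
        for (int i = 0; i < nWorkers; i++) {
            workers.submit(() -> {
                try {
                    String key;
                    // Keep draining the queue until we see a poison pill.
                    while (!(key = queue.take()).equals(POISON)) {
                        uploadToS3(key);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    private void uploadToS3(String key) {
        // Stub: replace with the actual S3 PUT for this key.
        uploaded.incrementAndGet();
    }

    public void enqueue(String key) {
        try {
            queue.put(key);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueueing", e);
        }
    }

    // Sends one poison pill per worker, waits for the pool to drain,
    // and returns the number of completed uploads.
    public int shutdown() throws InterruptedException {
        for (int i = 0; i < nWorkers; i++) queue.put(POISON);
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return uploaded.get();
    }
}
```

This way the database loader never blocks on S3 round-trips; throughput is limited by how many uploads you run in parallel rather than by one-at-a-time latency.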

[1]: http://jets3t.s3.amazonaws.com/api/org/jets3t/service/multithread/S3ServiceMulti.html#putObjects(org.jets3t.service.model.S3Bucket, org.jets3t.service.model.S3Object[])

answered Oct 06 '22 by Adam Hughes