
Migrate S3 data to Google Cloud Storage

I have a Python web application deployed on Google App Engine.

I need to grab a log file stored on Amazon S3 and load it into Google Cloud Storage. Once it is in Google Cloud Storage I may need to perform some transformations and eventually import the data into BigQuery for analysis.

I tried using gsutil as a sort of proof of concept, since gsutil uses boto under the hood and I'd like to use boto in my project. This did not work.

I'd like to know if anyone has managed to transfer a file directly between the two clouds. If possible, I'd like to see a simple example. In the end, this task has to be accomplished through code executing on GAE.

Eyalk asked Nov 28 '12 13:11



1 Answer

Per this thread, you can stream data from S3 to Google Cloud Storage using gsutil, but every byte still takes two hops: from S3 to your local computer, and then from your computer to GCS. Since you're using App Engine, however, you should be able to pull from S3 and write to GCS from within your app. The data path is the same as above, except that App Engine is the intermediary: every byte travels from S3 to your app and then to GCS. You could use boto for the pull side and the Google Cloud Storage API for the push side.
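A minimal sketch of that pull-then-push flow might look like the following. It is not a tested App Engine deployment: the function names (copy_stream, transfer_s3_to_gcs), the parameter names, and the example GCS path are hypothetical, and it assumes boto 2 for the S3 side and the App Engine GCS client library (imported as cloudstorage) for the GCS side. The copy loop itself only needs objects with read() and write(), so it can be checked locally with in-memory streams.

```python
import io


def copy_stream(src, dst, chunk_size=1024 * 1024):
    """Copy bytes from a readable object to a writable one in fixed-size chunks.

    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def transfer_s3_to_gcs(aws_key_id, aws_secret, s3_bucket, s3_key, gcs_path):
    """Hypothetical wiring: pull with boto 2, push with the GAE GCS client.

    gcs_path is assumed to look like '/my-gcs-bucket/logs/access.log'.
    """
    # Imported lazily so copy_stream stays usable without these libraries.
    import boto
    import cloudstorage as gcs

    conn = boto.connect_s3(aws_key_id, aws_secret)
    key = conn.get_bucket(s3_bucket).get_key(s3_key)
    if key is None:
        raise ValueError("S3 object not found: %s/%s" % (s3_bucket, s3_key))

    dst = gcs.open(gcs_path, "w", content_type="text/plain")
    try:
        # The boto Key object is read in chunks, so the whole file
        # never has to fit in memory at once.
        return copy_stream(key, dst)
    finally:
        dst.close()
        key.close()


if __name__ == "__main__":
    # Local check with in-memory streams standing in for S3 and GCS.
    source = io.BytesIO(b"log line\n" * 1000)
    target = io.BytesIO()
    copied = copy_stream(source, target, chunk_size=4096)
    print(copied == len(target.getvalue()))
```

Reading in chunks keeps memory use bounded regardless of the log file's size. For large files on App Engine, request time limits still apply, so you may need to run the transfer outside a normal user-facing request (for example, from a task queue).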

Marc Cohen answered Sep 29 '22 00:09
