
How to load an S3 open dataset in Google Colaboratory?

I am trying to access the SpaceNet challenge dataset (https://registry.opendata.aws/spacenet/) in Google Colaboratory. How can I load it there?

Umair Javaid asked Jun 21 '18 22:06





1 Answer

You need to create an AWS account, configure an IAM user, and generate an access key ID and a secret access key for that user.

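In Colab,

import boto3

# Replace the placeholders with the keys generated for your IAM user.
s3r = boto3.resource('s3', aws_access_key_id='XXXXXXXXXXXXXXXXXXXX',
    aws_secret_access_key='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
buck = s3r.Bucket('bucket name')
# RequestPayer acknowledges that the download is billed to your account.
buck.download_file(remotefilename, localfilename,
    ExtraArgs={'RequestPayer': 'requester'})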

Here is the boto3 documentation to start with.

http://boto3.readthedocs.io/en/latest/guide/s3-example-download-file.html
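Rather than pasting the keys into a notebook cell as in the snippet above, you could prompt for them at run time so they are never saved with the notebook. This is only a minimal sketch: read_aws_credentials is a hypothetical helper name, not part of boto3, and it assumes you are happy to type the keys in once per session.

```python
import getpass


def read_aws_credentials(prompt=getpass.getpass):
    """Ask for both keys interactively and return them as boto3 keyword arguments.

    `prompt` is injectable (hypothetical design choice) so the helper can be
    exercised without a terminal; by default the input is not echoed.
    """
    access_key = prompt("AWS access key ID: ").strip()
    secret_key = prompt("AWS secret access key: ").strip()
    if not access_key or not secret_key:
        raise ValueError("both the access key ID and the secret access key are required")
    return {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


# Usage in a notebook cell (run `pip install boto3` first if it is missing):
#   import boto3
#   s3r = boto3.resource('s3', **read_aws_credentials())
```

The returned dictionary uses the same keyword names that boto3.resource accepts, so it can be unpacked directly into the call.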

One more thing to note: when you download the data, AWS charges your account for the requests and the data transfer, although the charges may fall within your free tier.

That is why all of the steps above are needed: the bucket you are downloading from is configured as Requester Pays.

You can learn about Amazon S3 pricing here,

https://aws.amazon.com/s3/pricing/
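Since the requester is billed, it can help to check how much data sits under a prefix before downloading it. The following is a rough sketch, assuming a boto3 S3 client created with your credentials; total_size_bytes is a hypothetical helper name, and the listing requests themselves are also billed to you.

```python
def total_size_bytes(s3_client, bucket, prefix=""):
    """Return (object_count, total_bytes) for all objects under `prefix`.

    `s3_client` is expected to behave like boto3.client('s3'). The
    RequestPayer argument is passed on every page request because the
    bucket is Requester Pays.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    count = 0
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix,
                                   RequestPayer="requester"):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            count += 1
            total += obj["Size"]
    return count, total


# Usage (bucket name and prefix are placeholders):
#   import boto3
#   client = boto3.client('s3', aws_access_key_id='...', aws_secret_access_key='...')
#   n, size = total_size_bytes(client, 'bucket name', 'some/prefix/')
#   print(n, 'objects,', size / 1e9, 'GB')
```

Comparing the reported size against the data transfer rates on the pricing page gives a rough idea of the cost before you commit to the download.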

EDIT1:

Install the AWS CLI from the link below,

https://aws.amazon.com/cli/

and follow the instructions for SpaceNet data access here,

https://medium.com/the-downlinq/getting-started-with-spacenet-data-827fd2ec9f53
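If you prefer to drive the CLI from Python inside the notebook, something along these lines might work. It is a sketch under assumptions: the aws command is already installed and configured with your keys (for example via aws configure), the function names are hypothetical, and the bucket, key and destination are placeholders you need to fill in from the SpaceNet instructions.

```python
import subprocess


def build_s3_cp_command(bucket, key, dest):
    """Build an `aws s3 cp` command for a single object in a Requester Pays bucket."""
    return [
        "aws", "s3", "cp",
        f"s3://{bucket}/{key}",
        dest,
        "--request-payer", "requester",
    ]


def run_s3_cp(bucket, key, dest):
    """Run the copy and raise CalledProcessError if the CLI exits with an error."""
    subprocess.run(build_s3_cp_command(bucket, key, dest), check=True)


# Usage (placeholders, not real object names):
#   run_s3_cp('bucket name', 'path/to/remote/file', 'localfile')
```

Passing the command as a list avoids shell quoting problems with keys that contain spaces.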

Hope it helps.

Kannaiyan answered Sep 30 '22 18:09
