I am trying to access the SpaceNet challenge dataset (https://registry.opendata.aws/spacenet/) in Google Colaboratory. How can I get it into Google Colaboratory?
If the dataset is hosted in a repository, click on the dataset file, then click View Raw. Copy the link to the raw dataset and store it as a string variable called url in Colab (a cleaner method, though not strictly necessary). The last step is to pass the url to pandas read_csv to get the DataFrame.
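A minimal sketch of that approach. The raw-file URL below is a placeholder, not an actual SpaceNet file, so the load is guarded:

```python
import pandas as pd

# Placeholder raw-file URL -- substitute the "View Raw" link you copied.
url = "https://raw.githubusercontent.com/<user>/<repo>/main/data.csv"

# pandas can read a CSV directly from an HTTP(S) URL.
try:
    df = pd.read_csv(url)
except Exception:
    df = None  # placeholder URL / no network: nothing to load yet
```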
You need to click the Mount Drive option in the pane on the left side of the notebook, and you'll get access to all the files stored in your Drive. For importing multiple files in one go, you may need to write a function.
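The programmatic equivalent of the Mount Drive button is shown below. The google.colab module only exists inside a Colab runtime, so the import is guarded here:

```python
# google.colab is only available inside a Colab runtime, so guard the import.
try:
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Drive files then appear under /content/drive/MyDrive
    drive.mount('/content/drive')
```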
You need to create an AWS account, configure an IAM user, and generate an Access Key and Secret Access Key.
Then, in Colab:

import boto3

s3r = boto3.resource('s3',
                     aws_access_key_id='XXXXXXXXXXXXXXXXXXXX',
                     aws_secret_access_key='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
buck = s3r.Bucket('bucket-name')
# The SpaceNet bucket is requester pays, so the request must declare it
buck.download_file(remote_filename, local_filename,
                   ExtraArgs={'RequestPayer': 'requester'})
Here is the boto3 documentation to start with.
http://boto3.readthedocs.io/en/latest/guide/s3-example-download-file.html
One more thing to note: the download bucket is configured as requester pays, so AWS bills your account for the data transfer (the charge may still fall within your free tier). That is also why you need the credentials above.
You can learn about Amazon S3 pricing here,
https://aws.amazon.com/s3/pricing/
EDIT1:
Install the AWS CLI tools from the link below,
https://aws.amazon.com/cli/
and follow the instructions for spacenet data access here,
https://medium.com/the-downlinq/getting-started-with-spacenet-data-827fd2ec9f53
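A sketch of the CLI route, assuming the bucket name s3://spacenet-dataset/ from the Open Data registry entry; the object key below is illustrative only, and the commands are skipped when the CLI or credentials are missing:

```shell
#!/bin/sh
# Requester-pays download sketch with the AWS CLI.
if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not installed; see https://aws.amazon.com/cli/"
    exit 0
fi
if ! aws sts get-caller-identity >/dev/null 2>&1; then
    echo "no AWS credentials configured; run 'aws configure' first"
    exit 0
fi

# SpaceNet buckets are requester pays, so --request-payer is required
# and the transfer cost bills to your account.
aws s3 ls s3://spacenet-dataset/ --request-payer requester

# Copy a single object (illustrative key) to the current directory.
aws s3 cp "s3://spacenet-dataset/<some/key.tif>" . --request-payer requester
```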
Hope it helps.