
To access S3 bucket from R

I have set up R on an EC2 instance on AWS, and I have a few CSV files uploaded to an S3 bucket. Is there a way to access the CSV files in the S3 bucket from R?

Any help/pointers would be appreciated.

user3803555 asked Oct 09 '14 00:10



1 Answer

Have a look at the cloudyr aws.s3 package (https://github.com/cloudyr/aws.s3); it might do what you need. Unfortunately, at the time of writing, this package is at quite an early stage and a little unstable.

I've had good success simply using R's system() function to call the AWS CLI. This is relatively easy to get started with, very robust, and very well supported.

  1. Start here: http://aws.amazon.com/cli/
  2. List objects using S3 API: http://docs.aws.amazon.com/cli/latest/reference/s3api/list-objects.html
  3. Get objects using S3 API: http://docs.aws.amazon.com/cli/latest/reference/s3api/get-object.html

So, for example, on the command line, try the following:

pip install awscli
aws configure
aws s3 help
aws s3api list-objects --bucket some-bucket --query 'Contents[].{Key: Key}'
aws s3api get-object --bucket some-bucket --key some_file.csv new_file_name.csv

In R, you can just do something like:

system("aws s3api list-objects --bucket some-bucket --query 'Contents[].{Key: Key}' > my_bucket.json")
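To get the data into R rather than into a file on disk, the same CLI calls can be wrapped in small functions. What follows is only a rough sketch, assuming the AWS CLI is installed and configured as above and that the jsonlite package is available for parsing the listing. The helper names (s3_list_args, s3_get_args, list_s3_keys, read_s3_csv) are made up for illustration and are not part of any package.

```r
# Build the CLI arguments for listing a bucket (same query as above).
s3_list_args <- function(bucket) {
  c("s3api", "list-objects", "--bucket", bucket,
    "--query", "Contents[].{Key: Key}", "--output", "json")
}

# Build the CLI arguments for downloading one object to a local path.
s3_get_args <- function(bucket, key, dest) {
  c("s3api", "get-object", "--bucket", bucket, "--key", key, dest)
}

# Hypothetical helper: return the object keys in a bucket as a character vector.
# Assumes jsonlite is installed; the CLI output is captured instead of redirected.
list_s3_keys <- function(bucket) {
  out <- system2("aws", shQuote(s3_list_args(bucket)), stdout = TRUE)
  status <- attr(out, "status")  # only set when the command exits non-zero
  if (!is.null(status) && status != 0) {
    stop("aws s3api list-objects failed with exit status ", status)
  }
  keys <- jsonlite::fromJSON(paste(out, collapse = "\n"))$Key
  if (is.null(keys)) character(0) else keys
}

# Hypothetical helper: download one CSV to a temporary file and read it.
# Extra arguments are passed through to read.csv().
read_s3_csv <- function(bucket, key, ...) {
  dest <- tempfile(fileext = ".csv")
  on.exit(unlink(dest), add = TRUE)  # remove the temporary copy afterwards
  status <- system2("aws", shQuote(s3_get_args(bucket, key, dest)), stdout = FALSE)
  if (status != 0) {
    stop("aws s3api get-object failed with exit status ", status)
  }
  read.csv(dest, ...)
}
```

With the placeholder names from the commands above, list_s3_keys("some-bucket") would list the keys and df <- read_s3_csv("some-bucket", "some_file.csv") would load one file into a data frame. system2() with shQuote() is used here instead of a single system() string so that bucket names and keys containing spaces are passed through intact.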
John Sandall answered Oct 07 '22 03:10
