
How to create a signed S3 url

I would like to read csv files from S3 using fread from the data.table package like this:

url_with_signature <- signURL(url, access_key, secret_key)
DT <- fread(url_with_signature)

Is there a package or piece of code somewhere that will allow me to build the URL using an access/secret key pair?

I would prefer not to use awscli to read the data.

Bulat asked Feb 07 '23 22:02


1 Answer

You can use the aws.s3 package.

First set your credentials and load the library:

# These variables should be set in your environment, but you could set them in R:
Sys.setenv("AWS_ACCESS_KEY_ID" = "mykey",
       "AWS_SECRET_ACCESS_KEY" = "mysecretkey",
       "AWS_DEFAULT_REGION" = "us-east-1")

library("aws.s3")

If you have an R object obj that you want to save to S3 and read back later:

s3save(obj, bucket = "my_bucket", object = "object")
# and then later; s3load restores obj into the current environment
# under its original name rather than returning it
s3load("object", bucket = "my_bucket")

Substitute real values for the bucket name and the object name (the key of the object in the bucket). You can also save and load in RDS format with s3saveRDS and s3readRDS; unlike s3load, s3readRDS returns the object, so its result can be assigned.

If you need to read a text file, it's a bit more complicated: the package's get_object function returns a raw vector, which we have to convert to text ourselves:

raw_data <- get_object('data.csv', 'my_bucket')

# this method of parsing the data is copied from the httr library
# change the `from` encoding as needed
data <- iconv(readBin(raw_data, character()), from="UTF-8", to="UTF-8")

# data is now a single string holding the file contents; pass it as text,
# otherwise read.csv would interpret it as a file name
read.csv(text = data)
fread(text = data)

# All this can be done without temporary objects:
fread(iconv(
  readBin(get_object('data.csv', 'my_bucket'), character()),
  from="UTF-8", to="UTF-8"))
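The parsing step can be tried out without touching S3 at all. The following is a minimal sketch in base R, assuming the object holds UTF-8 text with no embedded NUL bytes. `parse_csv_raw` is a hypothetical helper (it is not part of aws.s3), and the raw vector is built locally with `charToRaw` as a stand-in for what `get_object` returns.

```r
# Hypothetical helper: turn a raw vector of CSV bytes into a data.frame.
parse_csv_raw <- function(raw_data, encoding = "UTF-8") {
  txt <- rawToChar(raw_data)   # bytes -> one character string
  Encoding(txt) <- encoding    # declare the encoding; no conversion happens
  read.csv(text = txt, stringsAsFactors = FALSE)
}

# Stand-in for get_object('data.csv', 'my_bucket'): build the bytes locally.
raw_data <- charToRaw("id,name\n1,alpha\n2,beta\n")
df <- parse_csv_raw(raw_data)
print(df)
```

With a real bucket, the same helper would be called as `parse_csv_raw(get_object('data.csv', 'my_bucket'))`.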

As far as I know, the kind of ‘signed URL’ helper you describe is not available. A caveat, should you try to develop such a solution: think carefully about the security implications of storing your secret access key in source code.

Another concern with the ‘signed URL’ is that it would be held in memory as an R object. If the workspace is saved, it would also be written to disk. Any such solution needs a careful security review.
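One way to keep the secret key out of source code is to read credentials from the environment and fail early when they are missing. This is only a sketch: `require_env` is a hypothetical helper, and the `Sys.setenv` call is there purely so the example runs on its own. In practice the variables would be set outside the script, for example in `~/.Renviron`.

```r
# Hypothetical helper: return a required environment variable or stop.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# Demonstration only: set a non-secret placeholder so the example is runnable.
Sys.setenv(AWS_DEFAULT_REGION = "us-east-1")
region <- require_env("AWS_DEFAULT_REGION")

# A missing variable produces a clear error up front.
msg <- tryCatch(require_env("SOME_UNSET_VARIABLE_FOR_DEMO"),
                error = function(e) conditionMessage(e))
print(msg)
```

The same check could be applied to `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` before any S3 call is made.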

pusillanimous answered Feb 19 '23 19:02