Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R Connect to AWS Athena

I am attempting to connect to AWS Athena based upon what I have read online, but I am having issues.

Steps taking

  • Update Java
  • replace user/pass with accesskey/secretKey
  • pass accesskey/secretKey with user/pass as well

Any ideas?

Error Message:

Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.sql.SQLException: AWS accessId/secretKey or AWS credentials provider must be provided

System Information

  sysname                           release                                       version 
  "Linux"                           "4.4.0-62-generic"      "#83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017" 
  nodename                          machine                                        login
 "ip-***-**-**-***"                 "x86_64"                                      "unknown" 
  user                              effective_user 
 "rstudio"                          "rstudio"

Code https://www.r-bloggers.com/interacting-with-amazon-athena-from-r/

library(RJDBC)

URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.0.jar'
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)

drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir="s3://mybucket",
                                   user=Sys.getenv("myuser"),
                                   password=Sys.getenv("mypassword"))
like image 225
user2600629 Avatar asked Dec 23 '22 20:12

user2600629


1 Answers

The Athena JDBC driver is expecting your AWS Access Key Id as the user, and the Secret Key as the password:

accessKeyId <- "your access key id..."
secretKey <- "your secret key..."

jdbcConnection <- dbConnect(
  drv, 
  'jdbc:awsathena://athena.us-east-1.amazonaws.com:443',
  s3_staging_dir="s3://mybucket",
  user=accessKeyId,
  password=secretKey
)

The R-bloggers article obtains those from environment variables using Sys.getenv("ATHENA_USER") and Sys.getenv("ATHENA_PASSWORD"), but that is optional.


Updated: Using a Credentials Provider with the Athena driver from R

@Sam is correct that a Credentials Provider is the best practice for handling AWS credentials. I recommend the DefaultCredentialsProviderChain, it covers several options for loading credentials from CLI profiles, environment variables, etc.

  1. Download the AWS SDK for Java, specifically the SDK jar from (lib) and a directory of third-party dependency jars (third-party/lib).
  2. Add a bit of R code to add all the jar files to rJava's classpath

    # Load JAR Files
    library("rJava")
    
    .jinit()
    
    # Load AWS SDK jar
    .jaddClassPath("/path/to/aws-java-sdk-1.11.98/lib/aws-java-sdk-1.11.98.jar")
    
    # Add Third-Party JARs
    jarFilePaths <- dir("/path/to/aws-java-sdk-1.11.98/third-party/lib/", full.names=TRUE, pattern=".jar")
    for(i in 1:length(jarFilePaths)) {
        .jaddClassPath(jarFilePaths[i])
    }
    
  3. Configure the Athena driver to load the credentials provider class by name

    athenaConn <- dbConnect(
      athenaDriver, 
      'jdbc:awsathena://athena.us-east-1.amazonaws.com:443',
      s3_staging_dir="s3://mybucket",
      aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain"
      )
    

Getting the classpath set up is key. When dbConnect is executed, the Athena driver will attempt to load the named class from the JARs, and this will load all dependencies. If the classpath does not include the SDK JAR, you will see errors like:

Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.auth.DefaultAWSCredentialsProviderChain

And without the third-party JAR references, you may see errors like this:

Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory

like image 143
James Avatar answered Jan 03 '23 11:01

James