I am attempting to connect to AWS Athena based upon what I have read online, but I am having issues.
Steps taking
Any ideas?
Error Message:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.sql.SQLException: AWS accessId/secretKey or AWS credentials provider must be provided
System Information
sysname release version
"Linux" "4.4.0-62-generic" "#83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017"
nodename machine login
"ip-***-**-**-***" "x86_64" "unknown"
user effective_user
"rstudio" "rstudio"
Code https://www.r-bloggers.com/interacting-with-amazon-athena-from-r/
library(RJDBC)
URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.0.jar'
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")
con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
s3_staging_dir="s3://mybucket",
user=Sys.getenv("myuser"),
password=Sys.getenv("mypassword"))
The Athena JDBC driver is expecting your AWS Access Key Id as the user
, and the Secret Key as the password
:
accessKeyId <- "your access key id..."
secretKey <- "your secret key..."
jdbcConnection <- dbConnect(
drv,
'jdbc:awsathena://athena.us-east-1.amazonaws.com:443',
s3_staging_dir="s3://mybucket",
user=accessKeyId,
password=secretKey
)
The R-bloggers article obtains those from environment variables using Sys.getenv("ATHENA_USER")
and Sys.getenv("ATHENA_PASSWORD")
, but that is optional.
Updated: Using a Credentials Provider with the Athena driver from R
@Sam is correct that a Credentials Provider is the best practice for handling AWS credentials. I recommend the DefaultCredentialsProviderChain, it covers several options for loading credentials from CLI profiles, environment variables, etc.
lib
) and a directory of third-party dependency jars (third-party/lib
).Add a bit of R code to add all the jar files to rJava's classpath
# Load JAR Files
library("rJava")
.jinit()
# Load AWS SDK jar
.jaddClassPath("/path/to/aws-java-sdk-1.11.98/lib/aws-java-sdk-1.11.98.jar")
# Add Third-Party JARs
jarFilePaths <- dir("/path/to/aws-java-sdk-1.11.98/third-party/lib/", full.names=TRUE, pattern=".jar")
for(i in 1:length(jarFilePaths)) {
.jaddClassPath(jarFilePaths[i])
}
Configure the Athena driver to load the credentials provider class by name
athenaConn <- dbConnect(
athenaDriver,
'jdbc:awsathena://athena.us-east-1.amazonaws.com:443',
s3_staging_dir="s3://mybucket",
aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain"
)
Getting the classpath set up is key. When dbConnect
is executed, the Athena driver will attempt to load the named class from the JARs, and this will load all dependencies. If the classpath does not include the SDK JAR, you will see errors like:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.auth.DefaultAWSCredentialsProviderChain
And without the third-party JAR references, you may see errors like this:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With