Can I use an Athena view as a source for an AWS Glue job?

Tags: amazon-web-services, jobs, amazon-athena, aws-glue

I'm trying to use an Athena view as a data source for my AWS Glue job. When I run the job, I get an error about the classification of the view. What should I define the classification as? Thank you.

Error Message Appearing

asked Nov 01 '18 13:11

Nikitas Bompolias

1 Answer

Yes, you can, by using the Athena JDBC driver. This approach bypasses the Glue Data Catalog, because only Athena (and not Glue, as of 25-Jan-2019) can access views directly.

1. Download the driver and store the jar in an S3 bucket.
2. Specify the S3 path to the driver as a dependent jar in your job definition.
3. Load the data into a dynamic frame using the code below (using an IAM user with permission to run Athena queries).

from awsglue.dynamicframe import DynamicFrame
# ...
# glueContext is assumed to be the GlueContext created earlier in the job script.
athena_view_dataframe = (
    glueContext.read.format("jdbc")
    .option("user", "[IAM user access key]")
    .option("password", "[IAM user secret access key]")
    .option("driver", "com.simba.athena.jdbc.Driver")
    .option("url", "jdbc:awsathena://athena.us-east-1.amazonaws.com:443")
    .option("dbtable", "my_database.my_athena_view")
    .option("S3OutputLocation", "s3://bucket/temp/folder")  # CSVs/metadata dumped here on load
    .load()
)

# Convert the Spark DataFrame into a Glue DynamicFrame.
athena_view_datasource = DynamicFrame.fromDF(athena_view_dataframe, glueContext, "athena_view_source")
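
For the "dependent jar" step, here is a rough sketch of what the job definition could look like if you create the job with boto3 rather than the console. It relies on the Glue special parameter --extra-jars. The job name, role ARN, script path, and jar file name are all made-up placeholders, and build_glue_job_definition is just a helper I named for illustration, so adjust everything to your own setup.

```python
def build_glue_job_definition(job_name, role_arn, script_s3_path, driver_jar_s3_path):
    """Return keyword arguments suitable for boto3's glue.create_job call."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_s3_path},
        # --extra-jars is the Glue special parameter for dependent jars;
        # it takes the S3 path of the uploaded Athena JDBC driver.
        "DefaultArguments": {"--extra-jars": driver_jar_s3_path},
    }


# All values below are hypothetical placeholders.
definition = build_glue_job_definition(
    job_name="athena-view-job",
    role_arn="arn:aws:iam::123456789012:role/MyGlueRole",
    script_s3_path="s3://my-bucket/scripts/athena_view_job.py",
    driver_jar_s3_path="s3://my-bucket/jars/AthenaJDBC42.jar",
)

# To actually create the job (requires boto3 and AWS credentials):
# import boto3
# boto3.client("glue").create_job(**definition)
```

In the console, the equivalent setting is the dependent jars path field in the job's properties.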

The driver docs (pdf) provide alternatives to IAM user auth (e.g. SAML, custom provider).
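
As one possible illustration of avoiding hard-coded keys, the Simba driver accepts an AwsCredentialsProviderClass option. The sketch below builds the reader options with the instance profile provider instead of user/password. I have not verified this inside every Glue environment, so treat the provider class choice as an assumption to check against the driver docs; athena_jdbc_options is a hypothetical helper name.

```python
def athena_jdbc_options(region, view, s3_output_location, credentials_provider_class):
    """Build Spark JDBC reader options for the Athena JDBC driver."""
    return {
        "driver": "com.simba.athena.jdbc.Driver",
        "url": f"jdbc:awsathena://athena.{region}.amazonaws.com:443",
        "dbtable": view,
        "S3OutputLocation": s3_output_location,
        # Replaces the "user"/"password" options; the driver asks this
        # class for credentials instead of using static IAM user keys.
        "AwsCredentialsProviderClass": credentials_provider_class,
    }


options = athena_jdbc_options(
    region="us-east-1",
    view="my_database.my_athena_view",
    s3_output_location="s3://bucket/temp/folder",
    # Assumption: this provider can resolve credentials where the job runs.
    credentials_provider_class=(
        "com.simba.athena.amazonaws.auth.InstanceProfileCredentialsProvider"
    ),
)

# In the Glue job script:
# athena_view_dataframe = glueContext.read.format("jdbc").options(**options).load()
```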

The main side effect of this approach is that each load writes the query results, in CSV format, to the bucket specified by the S3OutputLocation option.
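
If those leftover result files bother you, something like the following could clean them up after the job has finished reading. This is only a sketch: split_s3_uri and delete_athena_results are helper names I made up, and the function expects a boto3 S3 client to be passed in. It deletes everything under the prefix, so point S3OutputLocation at a folder used for nothing else.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def delete_athena_results(s3_client, s3_output_location):
    """Delete every object under the output location; return the count."""
    bucket, prefix = split_s3_uri(s3_output_location)
    # A trailing slash keeps 'temp/folder' from also matching 'temp/folder2/'.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # Each listing page holds at most 1000 keys, which is also
            # the per-request limit for delete_objects.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted


# Usage (requires boto3 and AWS credentials):
# import boto3
# delete_athena_results(boto3.client("s3"), "s3://bucket/temp/folder")
```

An S3 lifecycle rule that expires objects under that prefix would be a lower-maintenance alternative.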

I don't believe that you can create a Glue Connection to Athena via JDBC because you can't specify an S3 path to the driver location.

Attribution: AWS support totally helped me get this working.

answered Sep 30 '22 04:09

Alejandro C De Baca
