I'm confused as to how I should use terraform to connect Athena to my Glue Catalog database.
I use
resource "aws_glue_catalog_database" "catalog_database" {
name = "${var.glue_db_name}"
}
resource "aws_glue_crawler" "datalake_crawler" {
database_name = "${var.glue_db_name}"
name = "${var.crawler_name}"
role = "${aws_iam_role.crawler_iam_role.name}"
description = "${var.crawler_description}"
table_prefix = "${var.table_prefix}"
schedule = "${var.schedule}"
s3_target {
path = "s3://${var.data_bucket_name[0]}"
}
s3_target {
path = "s3://${var.data_bucket_name[1]}"
}
}
to create a Glue DB and the crawler to crawl an s3 bucket (here only two), but I don't know how I link the Athena query service to the Glue DB. In the terraform documentation for Athena
, there doesn't appear to be a way to connect Athena to a Glue catalog but only to an S3 Bucket. Clearly, however, Athena can be integrated with Glue.
How can I terraform an Athena database to use my Glue catalog as its data source rather than an S3 bucket?
You can by using the Athena JDBC driver. This approach circumvents the catalog, as only Athena (and not Glue as of 25-Jan-2019) can directly access views. Download the driver and store the jar to an S3 bucket. Specify the S3 path to the driver as a dependent jar in your job definition.
AWS Athena vs AWS Glue A key difference between Glue and Athena is that Athena is primarily used as a query tool for analytics and Glue is more of a transformation and data movement tool. Creating tables for Glue to use in ETL jobs.
A database in the AWS Glue Data Catalog is a container that holds tables. You use databases to organize your tables into separate categories. Databases are created when you run a crawler or add a table manually. The database list in the AWS Glue console displays descriptions for all your databases.
The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. You can then use the metadata to query and transform that data in a consistent manner across a wide variety of applications.
Our current basic setup for having Glue crawl one S3 bucket and create/update a table in a Glue DB, which can then be queried in Athena, looks like this:
Crawler role and role policy:
resource "aws_iam_role" "glue_crawler_role" {
name = "analytics_glue_crawler_role"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "glue.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
}
EOF
}
resource "aws_iam_role_policy" "glue_crawler_role_policy" {
name = "analytics_glue_crawler_role_policy"
role = "${aws_iam_role.glue_crawler_role.id}"
policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"glue:*",
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:GetBucketAcl",
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::analytics-product-data",
"arn:aws:s3:::analytics-product-data/*",
]
},
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": [
"arn:aws:logs:*:*:/aws-glue/*"
]
}
]
}
EOF
}
S3 Bucket, Glue Database and Crawler:
resource "aws_s3_bucket" "product_bucket" {
bucket = "analytics-product-data"
acl = "private"
}
resource "aws_glue_catalog_database" "analytics_db" {
name = "inventory-analytics-db"
}
resource "aws_glue_crawler" "product_crawler" {
database_name = "${aws_glue_catalog_database.analytics_db.name}"
name = "analytics-product-crawler"
role = "${aws_iam_role.glue_crawler_role.arn}"
schedule = "cron(0 0 * * ? *)"
configuration = "{\"Version\": 1.0, \"CrawlerOutput\": { \"Partitions\": { \"AddOrUpdateBehavior\": \"InheritFromTable\" }, \"Tables\": {\"AddOrUpdateBehavior\": \"MergeNewColumns\" } } }"
schema_change_policy {
delete_behavior = "DELETE_FROM_DATABASE"
}
s3_target {
path = "s3://${aws_s3_bucket.product_bucket.bucket}/products"
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With