Amazon Athena reads data from input Amazon S3 buckets using the IAM credentials of the user who submitted the query; query results are stored in a separate S3 bucket.
Here is the script in Hashicorp site https://www.terraform.io/docs/providers/aws/r/athena_database.html
resource "aws_s3_bucket" "hoge" {
bucket = "hoge"
}
resource "aws_athena_database" "hoge" {
name = "database_name"
bucket = "${aws_s3_bucket.hoge.bucket}"
}
Where it says
bucket - (Required) Name of s3 bucket to save the results of the query execution.
How can I specify the input S3 bucket in the terraform script?
You would use the storage_descriptor
argument in the aws_glue_catalog_table
resource:
https://www.terraform.io/docs/providers/aws/r/glue_catalog_table.html#parquet-table-for-athena
Here is an example of creating a table using CSV file(s):
resource "aws_glue_catalog_table" "aws_glue_catalog_table" {
name = "your_table_name"
database_name = "${aws_athena_database.your_athena_database.name}"
table_type = "EXTERNAL_TABLE"
parameters = {
EXTERNAL = "TRUE"
}
storage_descriptor {
location = "s3://<your-s3-bucket>/your/file/location/"
input_format = "org.apache.hadoop.mapred.TextInputFormat"
output_format = "org.apache.hadoop.mapred.TextInputFormat"
ser_de_info {
name = "my-serde"
serialization_library = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
parameters = {
"field.delim" = ","
"skip.header.line.count" = "1"
}
}
columns {
name = "column1"
type = "string"
}
columns {
name = "column2"
type = "string"
}
}
}
The input S3 bucket is specified in each table you create in the database, as such, there's no global definition for it.
As of today, the AWS API doesn't have much provision for Athena management, as such, neither does the aws
CLI command, and nor does Terraform. There's no 'proper' way to create a table via these means.
In theory, you could create a named query to create your table, and then execute that query (for which there is API functionality, but not yet Terraform). It seems a bit messy to me, but it would probably work if/when TF gets the StartQuery functionality. The asynchronous nature of Athena makes it tricky to know when that table has actually been created though, and so I can imagine TF won't fully support table creation directly.
TF code that covers the currently available functionality is here: https://github.com/terraform-providers/terraform-provider-aws/tree/master/aws
API doco for Athena functions is here: https://docs.aws.amazon.com/athena/latest/APIReference/API_Operations.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With