Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Athena Terraform Scripts

Amazon Athena reads data from input Amazon S3 buckets using the IAM credentials of the user who submitted the query; query results are stored in a separate S3 bucket.

Here is the script in Hashicorp site https://www.terraform.io/docs/providers/aws/r/athena_database.html

resource "aws_s3_bucket" "hoge" {
  bucket = "hoge"
}

resource "aws_athena_database" "hoge" {
  name = "database_name"
  bucket = "${aws_s3_bucket.hoge.bucket}"
}

Where it says

bucket - (Required) Name of s3 bucket to save the results of the query execution.

How can I specify the input S3 bucket in the terraform script?

like image 399
Himalay Majumdar Avatar asked Sep 25 '18 19:09

Himalay Majumdar


2 Answers

You would use the storage_descriptor argument in the aws_glue_catalog_table resource:

https://www.terraform.io/docs/providers/aws/r/glue_catalog_table.html#parquet-table-for-athena

Here is an example of creating a table using CSV file(s):

resource "aws_glue_catalog_table" "aws_glue_catalog_table" {
  name          = "your_table_name"
  database_name = "${aws_athena_database.your_athena_database.name}"
  table_type    = "EXTERNAL_TABLE"

  parameters = {
    EXTERNAL = "TRUE"
  }

  storage_descriptor {
    location      = "s3://<your-s3-bucket>/your/file/location/"
    input_format  = "org.apache.hadoop.mapred.TextInputFormat"
    output_format = "org.apache.hadoop.mapred.TextInputFormat"

    ser_de_info {
      name                  = "my-serde"
      serialization_library = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

      parameters = {
        "field.delim"            = ","
        "skip.header.line.count" = "1"
      }
    }

    columns {
      name = "column1"
      type = "string"
    }

    columns {
      name = "column2"
      type = "string"
    }

  }
}
like image 69
ishwr_ Avatar answered Nov 17 '22 00:11

ishwr_


The input S3 bucket is specified in each table you create in the database, as such, there's no global definition for it.

As of today, the AWS API doesn't have much provision for Athena management, as such, neither does the aws CLI command, and nor does Terraform. There's no 'proper' way to create a table via these means.

In theory, you could create a named query to create your table, and then execute that query (for which there is API functionality, but not yet Terraform). It seems a bit messy to me, but it would probably work if/when TF gets the StartQuery functionality. The asynchronous nature of Athena makes it tricky to know when that table has actually been created though, and so I can imagine TF won't fully support table creation directly.

TF code that covers the currently available functionality is here: https://github.com/terraform-providers/terraform-provider-aws/tree/master/aws

API doco for Athena functions is here: https://docs.aws.amazon.com/athena/latest/APIReference/API_Operations.html

like image 1
Ralph Bolton Avatar answered Nov 16 '22 23:11

Ralph Bolton