Create/update Amazon Athena tables from Amazon S3 bucket files

I have uploaded a MySQL CSV export (and a zipped SQL dump of all tables) to an Amazon S3 bucket. Now I want to link Amazon Athena to the files in the S3 bucket. But when I write schemas for the different tables, the SELECT query returns the same result for every table. I have searched a lot but cannot work out the correct way to do this.

I want to create/update different table schemas in Athena from the CSV / zipped SQL files in the S3 bucket.

Asked Apr 17 '19 by SHIRISH-DK


People also ask

How do you update tables in Athena?

To see a new table column in the Athena Query Editor after you run ALTER TABLE ADD COLUMNS , manually refresh the table list in the editor, and then expand the table again.
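For reference, the statement itself is plain Hive-style DDL; a minimal sketch, where the table name `test1` and column `f3` are placeholders:

```sql
-- Add a new string column to the end of an existing Athena table's schema
-- ('test1' and 'f3' are hypothetical names)
ALTER TABLE test1 ADD COLUMNS (f3 string);
```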

Can Athena query S3 bucket?

You can run SQL queries directly on file-based data sources in S3 from Athena. This comes in very handy when you have to analyse huge data sets stored as multiple files in S3. Depending on how your data is distributed across files and which file format it uses, your queries can be very performant.
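As an illustration, once a table has been defined over the S3 files, querying it is ordinary SQL (the table and column names below are hypothetical):

```sql
-- Aggregates across every file under the table's S3 location
SELECT f1, COUNT(*) AS row_count
FROM test1
GROUP BY f1;
```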

How do I create a table from a CSV file in Athena?

Use a CREATE TABLE statement to create an Athena table based on the data. Reference the OpenCSVSerDe class after ROW FORMAT SERDE and specify the character separator, quote character, and escape character in WITH SERDEPROPERTIES , as in the following example.
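A minimal sketch of such a statement, assuming a two-column CSV at a hypothetical S3 location (note that OpenCSVSerDe reads every column as `string`; cast in your queries as needed):

```sql
CREATE EXTERNAL TABLE example_csv (
  id   string,
  name string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
LOCATION 's3://my-bucket/example-data/';
```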


1 Answer

Amazon Athena will look in a defined directory for data. All data files within that directory will be treated as containing data for a given table.

You use a CREATE TABLE command to both define the schema and point Athena at the directory, e.g.:

CREATE EXTERNAL TABLE test1 (
  f1 string,
  s2 string
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ("separatorChar" = ",", "escapeChar" = "\\") 
LOCATION 's3://my-bucket/data-directory/'

You will need to run a CREATE EXTERNAL TABLE command for each table, and the data for each table should be in a separate directory. The CSV files can be gzip-compressed (Athena cannot read ZIP archives, but gzip works and reduces the data scanned, making queries faster and cheaper).
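For example, a second table gets its own DDL statement pointing at its own directory (the table name, columns, and S3 path below are placeholders):

```sql
-- Each table reads only the files under its own S3 prefix;
-- those files may be plain CSV or gzip-compressed CSV
CREATE EXTERNAL TABLE test2 (
  id     string,
  amount string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "escapeChar" = "\\")
LOCATION 's3://my-bucket/test2-data/';
```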

As an alternative to writing these table definitions yourself, you can create a crawler in AWS Glue. Point the crawler at the data directory and give it a name; the crawler will examine the data files and create table definitions that match them.

Answered Oct 14 '22 by John Rotenstein