I would like to create a database in Athena via the API. I have Parquet files in S3 that I would like to query through the API, using Athena to run the queries. Is there any way to create an Athena database via the API?

How to create Athena database via API

1 Answer

Creating a database in Athena can be done by creating your own API request or using the SDK.

Here is a Python example using the SDK:

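import boto3

client = boto3.client('athena')

# Athena writes query results and metadata to this S3 location.
config = {'OutputLocation': 's3://TEST_BUCKET/'}

# Submit the DDL statement. The call returns a QueryExecutionId
# without waiting for the statement to finish.
client.start_query_execution(
    QueryString='create database TEST_DATABASE',
    ResultConfiguration=config
)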
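Note that `start_query_execution` only submits the statement, so you may want to wait for it to finish before creating tables in the new database. Here is a minimal sketch of polling `get_query_execution` for the final state. The helper name `wait_for_query`, the poll interval, and the poll limit are my own choices for illustration, not part of the SDK:

```python
import time

# States after which Athena will not change the query status again.
TERMINAL_STATES = {'SUCCEEDED', 'FAILED', 'CANCELLED'}


def wait_for_query(client, query_execution_id, poll_seconds=1.0, max_polls=60):
    """Poll get_query_execution until the query reaches a terminal state.

    `client` is expected to be a boto3 Athena client, e.g. boto3.client('athena').
    Returns the final state string; raises TimeoutError if max_polls is exhausted.
    """
    for _ in range(max_polls):
        response = client.get_query_execution(QueryExecutionId=query_execution_id)
        state = response['QueryExecution']['Status']['State']
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError('query %s did not finish in time' % query_execution_id)
```

To use it, keep the return value of `start_query_execution` and pass its `'QueryExecutionId'` entry to `wait_for_query(client, ...)`.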

There are SDKs available for Java, .NET, Node, PHP, Python, Ruby, Go, and C++. If you want to create your own API requests, I recommend developing a good understanding of the request signing process first. You could also use the AWS CLI like this:

$ aws athena start-query-execution --query-string "CREATE database ATHENA_TEST_TWO" --result-configuration "OutputLocation=s3://TEST_BUCKET/"

Once you have a database created, you can then pass the database name in your query requests.

context = {'Database': 'TEST_DATABASE'}
client.start_query_execution(
    QueryString='CREATE TABLE ...',
    QueryExecutionContext=context,
    ResultConfiguration=config
)

For DDL that creates a table from Parquet files, see the examples in the Amazon Athena User Guide.
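As a rough idea of what such DDL looks like, here is a small sketch that builds a `CREATE EXTERNAL TABLE ... STORED AS PARQUET` statement as a string, which could then be passed as the `QueryString`. The function name `parquet_table_ddl`, the table name, the columns, and the `parquet-data/` prefix are all hypothetical; the columns have to match the schema of your actual Parquet files:

```python
def parquet_table_ddl(database, table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement for Parquet files under `location`.

    `columns` is a list of (name, athena_type) pairs. No quoting or validation
    is done here, so only pass identifiers you trust.
    """
    column_sql = ',\n  '.join(
        '%s %s' % (name, col_type) for name, col_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_sql}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns, and prefix, for illustration only.
ddl = parquet_table_ddl(
    'TEST_DATABASE',
    'scientists',
    [('name', 'string'), ('year_born', 'int')],
    's3://TEST_BUCKET/parquet-data/',
)
```

The resulting `ddl` string takes the place of `'CREATE TABLE ...'` in the request above.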

Edit: In response to @condo1234's question:

But how do I associate a database with a file in S3?

The short answer is you don't. You associate a table with files sharing a prefix in a bucket in S3.

For example, say I want to create a table to analyze data held in s3://TEST_BUCKET. Through the AWS Console, I used the poorly named "Create Folder" button to create a prefix called one-table-many-files/. I then created two CSV files:

f1.csv

Codd,1923
Ellison,1944
Chamberlin,1944
Boyce,1947

f2.csv

Hopper,1906
Floyd,1953
Moriarty Wolf Chambers,1980

I then uploaded these text files to the example bucket/prefix combination s3://TEST_BUCKET/one-table-many-files/.
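I did the upload through the Console, but the same thing can be done from code. Here is a minimal sketch using the S3 `put_object` call; the helper name `upload_csv_files` is made up for this example, and `s3_client` is assumed to be something like `boto3.client('s3')`:

```python
def upload_csv_files(s3_client, bucket, prefix, files):
    """Upload each {filename: text} entry under `prefix`; return the keys written.

    The object key is simply the prefix followed by the file name, which is
    all a "folder" in S3 really is.
    """
    keys = []
    for filename, text in files.items():
        key = prefix + filename
        s3_client.put_object(Bucket=bucket, Key=key, Body=text.encode('utf-8'))
        keys.append(key)
    return keys


# The two example files from above.
example_files = {
    'f1.csv': 'Codd,1923\nEllison,1944\nChamberlin,1944\nBoyce,1947\n',
    'f2.csv': 'Hopper,1906\nFloyd,1953\nMoriarty Wolf Chambers,1980\n',
}
```

Calling `upload_csv_files(s3_client, 'TEST_BUCKET', 'one-table-many-files/', example_files)` would write both objects under the same prefix.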

I ran the following DDL:

CREATE EXTERNAL TABLE php_test.computer_scientists (
  name string,
  year_born int
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
LOCATION 's3://TEST_BUCKET/one-table-many-files/';

I then ran the following SQL statement:

SELECT * FROM php_test.computer_scientists;

I got the following results back, with data from both files in the bucket + prefix combination specified in the DDL.

"name","year_born"
"Hopper","1906"
"Floyd","1953"
"Moriarty Wolf Chambers","1980"
"Codd","1923"
"Ellison","1944"
"Chamberlin","1944"
"Boyce","1947"
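If you fetch the results through the API rather than from the CSV file Athena writes to S3, `get_query_results` returns them as rows of `VarCharValue` cells. Here is a small sketch that turns one page of such a response into a list of dicts. The helper name `rows_from_results` is my own, and it assumes a SELECT query whose first row holds the column headers; larger result sets are paginated, which this sketch does not handle:

```python
def rows_from_results(response):
    """Convert one get_query_results response page into a list of dicts.

    Assumes the first row of the page holds the column names, as it does on
    the first page of a SELECT result. NULL cells come back without a
    'VarCharValue' key and are mapped to None.
    """
    rows = response['ResultSet']['Rows']
    if not rows:
        return []
    header = [cell.get('VarCharValue') for cell in rows[0]['Data']]
    return [
        dict(zip(header, (cell.get('VarCharValue') for cell in row['Data'])))
        for row in rows[1:]
    ]
```

With a boto3 Athena client this would be called as `rows_from_results(client.get_query_results(QueryExecutionId=...))` once the query has succeeded. All values arrive as strings, just like in the output above.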

Notice that I am using the word "prefix" and not "folder"? That is because S3 has no concept of a folder! Prefixes are still useful, however, because they are what makes Athena partitioning possible.

Per your request, here is a PHP example as well.

<?php
print('Welcome to PHP');

require 'aws-autoloader.php';

$athena = new Aws\Athena\AthenaClient(['version' => 'latest', 'region' => 'us-east-1']);

$athena->startQueryExecution([
    'QueryString' => 'CREATE DATABASE php_test;',
    'ResultConfiguration' => [
        'OutputLocation' => 's3://TEST_BUCKET/', // REQUIRED
    ],
]);

?>

See the PHP SDK Documentation for more.

answered Sep 30 '22 02:09

Zerodf
