 

How to query Parquet data from Amazon Athena?

Athena creates a table over data stored in S3 using the fields you specify. I have done this using JSON data. Could you help me with how to create a table using Parquet data?

I have tried the following:

  1. Converted sample JSON data to Parquet.
  2. Uploaded the Parquet data to S3.
  3. Created a table using the columns of the JSON data.

By doing this I am able to execute a query, but the result is empty.

Is this approach right, or is there another approach to follow for Parquet data?

Sample JSON data:

{"_id":"0899f824e118d390f57bc2f279bd38fe","_rev":"1-81cc25723e02f50cb6fef7ce0b0f4f38","deviceId":"BELT001","timestamp":"2016-12-21T13:04:10:066Z","orgid":"fedex","locationId":"LID001","UserId":"UID001","SuperviceId":"SID001"},
{"_id":"0899f824e118d390f57bc2f279bd38fe","_rev":"1-81cc25723e02f50cb6fef7ce0b0f4f38","deviceId":"BELT001","timestamp":"2016-12-21T13:04:10:066Z","orgid":"fedex","locationId":"LID001","UserId":"UID001","SuperviceId":"SID001"}
asked Mar 14 '17 by rajeswari


People also ask

Can Athena read Parquet?

Athena allows you to use open-source columnar formats such as Apache Parquet and Apache ORC. Converting your data to a columnar format not only improves query performance but also saves on costs.
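For example, one way to convert an existing table's data to Parquet is a CREATE TABLE AS SELECT (CTAS) statement. The table names and output location in this sketch are assumptions:

-- Hypothetical names and location; CTAS writes the query
-- results to S3 as Parquet files and registers a new table.
CREATE TABLE my_table_parquet
WITH (
  format = 'PARQUET',
  external_location = 's3://my-bucket/parquet-output/'
) AS
SELECT * FROM my_table_json;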

Can Athena query text files?

Unlike Apache Drill, however, Athena is limited to data stored in Amazon's own S3 storage service. It can query a variety of file formats, including, but not limited to, CSV, Parquet, and JSON.

Can Athena read snappy Parquet?

For example, Athena can successfully read the data in a table that uses the Parquet file format when some Parquet files are compressed with Snappy and other Parquet files are compressed with GZIP. The same principle applies to the ORC, text-file, and JSON storage formats.


3 Answers

Steps:
  1. Create your my_table_json.
  2. Insert data into my_table_json (verify that the created JSON files exist in the table's LOCATION).
  3. Create my_table_parquet: the same CREATE statement as my_table_json, except with a STORED AS PARQUET clause added.
  4. Run: INSERT INTO my_table_parquet SELECT * FROM my_table_json (a sketch of all four steps follows below).
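
A minimal sketch of those steps, assuming a subset of the columns from the question's sample data (the bucket paths are placeholders):

-- Step 1: table over the original JSON files (hypothetical location).
CREATE EXTERNAL TABLE my_table_json (
  `_id` string,
  deviceId string,
  `timestamp` string,
  orgid string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-bucket/json/';

-- Step 2 happens outside SQL: upload the JSON files to s3://my-bucket/json/.

-- Step 3: identical columns, but stored as Parquet (hypothetical location).
CREATE EXTERNAL TABLE my_table_parquet (
  `_id` string,
  deviceId string,
  `timestamp` string,
  orgid string
)
STORED AS PARQUET
LOCATION 's3://my-bucket/parquet/';

-- Step 4: Athena writes Parquet files into my_table_parquet's location.
INSERT INTO my_table_parquet SELECT * FROM my_table_json;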

answered Oct 20 '22 by belostoky


If your data has been successfully stored in Parquet format, you would then create a table definition that references those files.

Here is an example statement that uses Parquet files:

CREATE EXTERNAL TABLE IF NOT EXISTS elb_logs_pq (
  request_timestamp string,
  elb_name string,
  request_ip string,
  request_port int,
  ...
  ssl_protocol string )
PARTITIONED BY(year int, month int, day int) 
STORED AS PARQUET
LOCATION 's3://athena-examples/elb/parquet/'
TBLPROPERTIES ("parquet.compress"="SNAPPY");

This example was taken from the AWS blog post Analyzing Data in S3 using Amazon Athena, which does an excellent job of explaining the benefits of using compressed and partitioned data in Amazon Athena.
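Because the table is partitioned by year, month, and day, filtering on those columns lets Athena skip data outside the requested partitions. A hypothetical query against the table above:

-- Scans only the Parquet files under the matching partition.
SELECT request_ip, COUNT(*) AS request_count
FROM elb_logs_pq
WHERE year = 2015 AND month = 1 AND day = 1
GROUP BY request_ip
ORDER BY request_count DESC
LIMIT 10;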

answered Oct 20 '22 by John Rotenstein


If your table definition is valid but your queries return no rows, try this:

-- The MSCK REPAIR TABLE command will load all partitions into the table.
-- This command can take a while to run depending on the number of partitions to be loaded.
MSCK REPAIR TABLE {tablename}
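
For example, with the partitioned table from the earlier answer (the single-partition path in the ALTER TABLE alternative is a hypothetical S3 location):

-- Load every partition found under the table's LOCATION.
MSCK REPAIR TABLE elb_logs_pq;

-- Or register a single partition explicitly (hypothetical path).
ALTER TABLE elb_logs_pq ADD PARTITION (year = 2015, month = 1, day = 1)
LOCATION 's3://athena-examples/elb/parquet/year=2015/month=1/day=1/';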

answered Oct 20 '22 by Eric Linden