Write pandas dataframe into AWS athena database

Question

I have run a query using pyathena and created a pandas dataframe from the result. Is there a way to write a pandas dataframe to an AWS Athena database directly, like data.to_sql for a MySQL database?

Here is an example of the dataframe that I need to write into the AWS Athena database:

data=pd.DataFrame({'id':[1,2,3,4,5,6],'name':['a','b','c','d','e','f'],'score':[11,22,33,44,55,66]})

Robert Navado · Accepted Answer

Another modern (as of February 2020) way to achieve this is the aws-data-wrangler library. It automates many routine (and sometimes annoying) data processing tasks.

Applied to the case from the question, the code would look like this:

import pandas as pd
import awswrangler as wr

data=pd.DataFrame({'id':[1,2,3,4,5,6],'name':['a','b','c','d','e','f'],'score':[11,22,33,44,55,66]})

# Typical Pandas, Numpy or Pyarrow transformation HERE!

wr.pandas.to_parquet(  # Storing the data and metadata to Data Lake
    dataframe=data,
    database="database",
    path="s3://your-s3-bucket/path/to/new/table",
    partition_cols=["name"],
)

This is amazingly helpful, because aws-data-wrangler can derive the table name from the path (you can also pass the table name as a parameter) and defines the proper column types in the Glue catalog according to the dataframe.
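To make those two conveniences concrete, here is a minimal stdlib-only sketch of the idea. It is not aws-data-wrangler's actual implementation: the table name is simply taken as the last segment of the S3 path, and each column gets an Athena type based on the Python type of its first value. The helper names `table_name_from_path` and `infer_athena_types`, and the small type mapping, are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical, simplified mapping from Python value types to Athena types.
_ATHENA_TYPES = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def table_name_from_path(path: str) -> str:
    """Return the last non-empty segment of an S3 path as the table name."""
    segments = [s for s in urlparse(path).path.split("/") if s]
    if not segments:
        raise ValueError(f"no table name can be derived from {path!r}")
    return segments[-1]


def infer_athena_types(columns: dict) -> dict:
    """Map each column name to an Athena type based on its first value.

    Empty columns and unknown value types fall back to "string".
    """
    schema = {}
    for name, values in columns.items():
        value_type = type(values[0]) if values else str
        schema[name] = _ATHENA_TYPES.get(value_type, "string")
    return schema


# Same columns as the dataframe in the question, as plain Python lists.
data = {
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["a", "b", "c", "d", "e", "f"],
    "score": [11, 22, 33, 44, 55, 66],
}
table = table_name_from_path("s3://your-s3-bucket/path/to/new/table")
schema = infer_athena_types(data)
```

For the path used above, the derived table name is `table`, which is why the read example queries a table with that name.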

It is also helpful for reading the data back through Athena directly into a pandas dataframe:

df = wr.pandas.read_table(database="database", table="table")

The whole process is fast and convenient.
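The `wr.pandas.*` calls shown in this answer belong to the early releases of the library. In later releases (1.0 onward, as far as I know) the same round trip is written with `wr.s3.to_parquet` and `wr.athena.read_sql_query`. Below is a hedged sketch of that, assuming the Glue database already exists and AWS credentials are configured in the environment. The wrapper names `write_to_athena` and `read_from_athena` are hypothetical and only group the calls.

```python
def write_to_athena(df, path, database, table, partition_cols=None):
    """Write df as a Parquet dataset on S3 and register it in the Glue catalog."""
    # Imported lazily because awswrangler is an optional third-party dependency.
    import awswrangler as wr

    return wr.s3.to_parquet(
        df=df,
        path=path,
        dataset=True,  # dataset mode is needed for database/table/partition_cols
        database=database,
        table=table,
        partition_cols=partition_cols,
    )


def read_from_athena(database, table):
    """Run a SELECT * through Athena and return the result as a pandas dataframe."""
    import awswrangler as wr

    return wr.athena.read_sql_query(
        sql=f'SELECT * FROM "{table}"',
        database=database,
    )
```

With the dataframe from the question, the write would be `write_to_athena(data, "s3://your-s3-bucket/path/to/new/table", "database", "table", ["name"])`, followed by `read_from_athena("database", "table")` to load it back.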

Tags: python, database, pandas, amazon-athena

PritamJ
