 

Write pandas dataframe into AWS athena database

I have run a query using pyathena and created a pandas dataframe. Is there a way to write the pandas dataframe back to an AWS Athena database directly? Something like data.to_sql for a MySQL database.

Sharing an example of the dataframe below for reference; this is what needs to be written into the AWS Athena database:

data=pd.DataFrame({'id':[1,2,3,4,5,6],'name':['a','b','c','d','e','f'],'score':[11,22,33,44,55,66]})
asked Oct 27 '25 by PritamJ


1 Answer

Another modern way (as of February 2020) to achieve this is to use the aws-data-wrangler library. It automates many routine (and sometimes annoying) tasks in data processing.

Combined with the case from the question, the code would look like this:

import pandas as pd
import awswrangler as wr

data=pd.DataFrame({'id':[1,2,3,4,5,6],'name':['a','b','c','d','e','f'],'score':[11,22,33,44,55,66]})

# Typical Pandas, Numpy or Pyarrow transformation HERE!

wr.pandas.to_parquet(  # Storing the data and metadata to Data Lake
    dataframe=data,
    database="database",
    path="s3://your-s3-bucket/path/to/new/table",
    partition_cols=["name"],
)

This is amazingly helpful, because aws-data-wrangler knows how to parse the table name from the path (though you can also pass the table name as a parameter) and defines the proper column types in the Glue catalog based on the dataframe.
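Note that the wr.pandas module shown above belongs to the pre-1.0 aws-data-wrangler releases. In awswrangler 1.x and later, the same write goes through wr.s3.to_parquet with dataset=True. A rough equivalent of the snippet above (the bucket path, database, and table names are just placeholders) would be:

import pandas as pd
import awswrangler as wr

data = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                     'name': ['a', 'b', 'c', 'd', 'e', 'f'],
                     'score': [11, 22, 33, 44, 55, 66]})

# Write the dataframe as partitioned Parquet and register the table in the
# Glue catalog so it becomes queryable from Athena.
wr.s3.to_parquet(
    df=data,
    path="s3://your-s3-bucket/path/to/new/table/",
    dataset=True,               # enables database/table/partition handling
    database="database",        # existing Glue database
    table="table",              # table name to create or update in the catalog
    partition_cols=["name"],
)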

It is also helpful for querying data with Athena directly into a pandas dataframe:

df = wr.pandas.read_table(database="dataase", table="table")

The whole process is fast and convenient.
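For the read side, newer awswrangler releases (1.x and later) expose the Athena helpers under wr.athena instead of wr.pandas. A minimal sketch, assuming the table created above:

import awswrangler as wr

# Run an Athena query and load the result straight into a pandas dataframe.
df = wr.athena.read_sql_query("SELECT * FROM table", database="database")

# Or read the whole table without writing SQL by hand.
df = wr.athena.read_sql_table(table="table", database="database")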

answered Oct 29 '25 by Robert Navado


