I have run a query using pyathena and loaded the result into a pandas dataframe. Is there a way to write a pandas dataframe to an AWS Athena database directly, like data.to_sql for a MySQL database?
Here is an example of a dataframe that needs to be written to the AWS Athena database:
data=pd.DataFrame({'id':[1,2,3,4,5,6],'name':['a','b','c','d','e','f'],'score':[11,22,33,44,55,66]})
Another modern (as of February 2020) way to achieve this is to use the aws-data-wrangler library. It automates many routine (and sometimes annoying) data processing tasks.
Applied to the case from the question, the code would look like this:
import pandas as pd
import awswrangler as wr
data=pd.DataFrame({'id':[1,2,3,4,5,6],'name':['a','b','c','d','e','f'],'score':[11,22,33,44,55,66]})
# Typical Pandas, Numpy or Pyarrow transformation HERE!
wr.pandas.to_parquet(  # Storing the data and metadata to Data Lake
    dataframe=data,
    database="database",
    path="s3://your-s3-bucket/path/to/new/table",
    partition_cols=["name"],
)
This is amazingly helpful, because aws-data-wrangler can parse the table name from the path (you can also provide the table name as a parameter) and defines the proper types in the Glue catalog according to the dataframe.
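To make the idea of parsing the table name from the path concrete, here is a minimal sketch using only the standard library. It is not the library's actual code: the helper name table_name_from_path and the choice to lowercase the result are assumptions made for illustration.

```python
from urllib.parse import urlparse


def table_name_from_path(path: str) -> str:
    """Return the last non-empty segment of an S3 path as a table name.

    Hypothetical helper for illustration; aws-data-wrangler may apply
    different rules internally.
    """
    parsed = urlparse(path)
    # Split the key prefix and ignore empty pieces caused by a trailing slash.
    segments = [segment for segment in parsed.path.split("/") if segment]
    if not segments:
        raise ValueError(f"no key prefix in {path!r} to derive a table name from")
    # Lowercasing is an assumption of this sketch, not confirmed library behaviour.
    return segments[-1].lower()


print(table_name_from_path("s3://your-s3-bucket/path/to/new/table"))  # table
```

With the path from the example above, the derived name is "table", which is the table name used when reading the data back.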
It is also helpful for querying data with Athena directly into a pandas dataframe:
df = wr.pandas.read_table(database="database", table="table")
The whole process is fast and convenient.
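The wr.pandas calls above match the library as it was in early 2020. Later releases (1.0 and up, to the best of my knowledge) moved the same functionality to wr.s3.to_parquet and wr.athena.read_sql_query, so a rough equivalent might look like the sketch below. The helper names parquet_write_args, write_dataframe and read_table are hypothetical, and the bucket, database and table names are the placeholders from the example; check the documentation of your installed version before relying on it.

```python
def parquet_write_args(path, database, table, partition_cols=None):
    """Collect the keyword arguments passed to wr.s3.to_parquet (hypothetical helper)."""
    return {
        "path": path,
        "dataset": True,  # dataset mode is what enables Glue catalog registration
        "database": database,
        "table": table,
        "partition_cols": partition_cols,
    }


def write_dataframe(df, path, database, table, partition_cols=None):
    """Write df to S3 as Parquet and register it as a Glue/Athena table."""
    import awswrangler as wr  # imported lazily; needs AWS credentials at call time

    return wr.s3.to_parquet(
        df=df, **parquet_write_args(path, database, table, partition_cols)
    )


def read_table(database, table):
    """Read a whole Athena table back into a pandas dataframe."""
    import awswrangler as wr

    return wr.athena.read_sql_query(sql=f'SELECT * FROM "{table}"', database=database)


# Example call, assuming valid credentials and an existing Glue database:
# write_dataframe(data, "s3://your-s3-bucket/path/to/new/table",
#                 database="database", table="table", partition_cols=["name"])
# df = read_table(database="database", table="table")
```

Unlike the older call, this version passes the table name explicitly instead of leaving it to be inferred from the path.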