Snowflake pandas pd_writer writes out tables with NULLs

I have a Pandas dataframe that I'm writing out to Snowflake using a SQLAlchemy engine and the to_sql function. It works fine, but I have to use the chunksize option because of a Snowflake limit on how much a single INSERT can carry. That's also fine for smaller dataframes. However, some dataframes are 500k+ rows, and at 15k records per chunk, it takes forever to finish writing to Snowflake.
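For reference, the chunked call looks roughly like this (a sketch; the table name, engine, and chunk size are placeholders):

df.to_sql(
    "my_table",           # placeholder table name
    engine,
    if_exists="append",
    index=False,
    chunksize=15000,      # keep each INSERT under Snowflake's per-statement limit
)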

I did some research and came across the pd_writer method provided by Snowflake's Python connector, which apparently loads the dataframe much faster. My Python script does complete faster, and I can see it creates a table with all the right columns and the right row count, but every single value in every single row is NULL.

I thought it was a NaN-to-NULL issue and tried everything possible to replace the NaNs with None, and while the replacement happens within the dataframe, by the time it gets to the table, everything becomes NULL.
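For example, one of the swaps I tried (a sketch; df is the dataframe being written):

# NaN becomes None inside the dataframe...
df = df.where(pd.notnull(df), None)
# ...yet every value still lands in the Snowflake table as NULL.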

How can I use pd_writer to get these huge dataframes written properly into Snowflake? Are there any viable alternatives?

EDIT: Following Chris' answer, I decided to try with the official example. Here's my code and the result set:

import os
import pandas as pd
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine
from sqlalchemy.exc import SQLAlchemyError
from snowflake.connector.pandas_tools import write_pandas, pd_writer


def create_db_engine(db_name, schema_name):
    return create_engine(
        URL(
            account=os.environ.get("DB_ACCOUNT"),
            user=os.environ.get("DB_USERNAME"),
            password=os.environ.get("DB_PASSWORD"),
            database=db_name,
            schema=schema_name,
            warehouse=os.environ.get("DB_WAREHOUSE"),
            role=os.environ.get("DB_ROLE"),
        )
    )


def create_table(out_df, table_name, idx=False):
    engine = create_db_engine("dummy_db", "dummy_schema")
    connection = engine.connect()

    try:
        out_df.to_sql(
            table_name, connection, if_exists="append", index=idx, method=pd_writer
        )

    except SQLAlchemyError:
        # SQLAlchemy wraps connection and query failures in its own exception hierarchy
        print("Unable to connect to database!")

    finally:
        connection.close()
        engine.dispose()

    return True


df = pd.DataFrame([("Mark", 10), ("Luke", 20)], columns=["name", "balance"])

print(df.head())

create_table(df, "dummy_demo_table")

The code runs with no hitches, but when I look at the table that gets created, it's all NULLs. Again.

This is what dummy_demo_table shows me:

[screenshot: both rows present, but every value is NULL]

asked Aug 12 '20 by CodingInCircles



1 Answer

Turns out, the documentation (arguably Snowflake's weakest point) is out of sync with reality. This is the real issue: https://github.com/snowflakedb/snowflake-connector-python/issues/329. All it takes is a single upper-case character in the column name, and it works perfectly.

My workaround is simply to run df.columns = map(str.upper, df.columns) before invoking to_sql.
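Applied to the demo from the question, that's one extra line before the write:

df = pd.DataFrame([("Mark", 10), ("Luke", 20)], columns=["name", "balance"])
df.columns = map(str.upper, df.columns)  # "name" -> "NAME", "balance" -> "BALANCE"
create_table(df, "dummy_demo_table")     # values now land intact instead of NULL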

answered Sep 21 '22 by CodingInCircles