I'm trying to insert data from a Pandas dataframe into a table in Snowflake, and I'm having trouble figuring out how to do it properly. To start with, I have created a table in Snowflake that has some columns of type VARIANT. For example:
CREATE OR REPLACE TABLE mydatabase.myschema.results (
    DATE date,
    PRODUCT string,
    PRODUCT_DETAILS variant,
    ANALYSIS_META variant,
    PRICE float
)
Then in Pandas, I have a dataframe like this:
import pandas as pd
record = {'DATE': '2020-11-05',
'PRODUCT': 'blue_banana',
'PRODUCT_DETAILS': "{'is_blue': True, 'is_kiwi': nan}",
'ANALYSIS_META': "None",
'PRICE': 13.02}
df = pd.DataFrame(record, index=[0])
As you see, I've encoded the VARIANT columns as strings, since that's what I understood from the snowflake-connector documentation: a Snowflake VARIANT type maps to the str dtype in Pandas and vice versa.
So, what I've tried so far is the following:
from snowflake.connector import pandas_tools
pandas_tools.write_pandas(
conn=conn,
df=df,
table_name="results",
schema="myschema",
database="mydatabase")
And this does work, returning
(True,
1,
1,
[('czeau/file0.txt', 'LOADED', 1, 1, 1, 0, None, None, None, None)])
However, the results I get in Snowflake are not of the proper VARIANT type. The field ANALYSIS_META is correctly NULL, but the field PRODUCT_DETAILS is of type str.
Also, for example, this query throws an error, even though it should work for JSON/VARIANT fields:
SELECT * FROM MYDATABASE.MYSCHEMA.RESULTS
WHERE PRODUCT_DETAILS:is_blue
So with all that, my question is: how should I properly format my Pandas dataframe in order to insert the VARIANT fields correctly as nested fields into a Snowflake table? I thought that casting a dictionary into a string would do the trick, but apparently it doesn't work as I expected. What am I missing here?
After some investigation, I found the following solution to work:
1. Ensure that the columns are JSON-compliant
The key here is that json.dumps will transform your data into the right format (the right quotation marks, the representation of null, and so on).
import pandas as pd
import json
record = {'DATE': '2020-11-05',
'PRODUCT': 'blue_banana',
'PRODUCT_DETAILS': json.dumps({'is_blue': True, 'is_kiwi': None}),
'ANALYSIS_META': json.dumps(None),
'PRICE': 13.02}
df = pd.DataFrame(record, index=[0])
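To see why this matters: calling str() on a Python dict produces a Python repr, not valid JSON (single quotes, True/None instead of true/null), which is why the original attempt landed as a plain string. A quick sketch of the difference:

```python
import json

details = {'is_blue': True, 'is_kiwi': None}

# str() gives a Python repr, not JSON: single quotes, True/None
print(str(details))         # {'is_blue': True, 'is_kiwi': None}

# json.dumps gives valid JSON: double quotes, true/null
print(json.dumps(details))  # {"is_blue": true, "is_kiwi": null}

# Only the json.dumps version round-trips through a JSON parser
json.loads(json.dumps(details))   # works
# json.loads(str(details))        # raises json.JSONDecodeError
```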
2. Ensure you use parse_json and INSERT iteratively
Instead of using write_pandas as originally tried, we can INSERT into the table row by row, making sure to call parse_json on the columns of the desired VARIANT type, while passing each value as a string literal (by wrapping it in ' marks). The caveat is that this solution will be very slow if you have large amounts of data.
sql = """INSERT INTO MYDATABASE.MYSCHEMA.RESULTS
SELECT
to_date('{DATE}'),
'{PRODUCT}',
parse_json('{PRODUCT_DETAILS}'),
parse_json('{ANALYSIS_META}'),
{PRICE}
"""
### CREATE A SNOWFLAKE CONN...
for i, r in df.iterrows():
conn.cursor().execute(sql.format(**dict(r)))
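As a side note, interpolating values into the SQL with str.format breaks as soon as a value contains a single quote, and is unsafe in general. A sketch of the same row-by-row insert using the connector's placeholder binding instead; this assumes the default pyformat (%s) paramstyle of snowflake-connector-python, and insert_row is a hypothetical helper name:

```python
# %s placeholders are filled in by the driver, so quoting and escaping
# of the JSON strings are handled for us.
INSERT_SQL = """INSERT INTO MYDATABASE.MYSCHEMA.RESULTS
SELECT to_date(%s), %s, parse_json(%s), parse_json(%s), %s
"""

def insert_row(cursor, row):
    # row: a dict-like object with the same keys as the dataframe columns
    params = (
        row['DATE'],
        row['PRODUCT'],
        row['PRODUCT_DETAILS'],  # already a JSON string via json.dumps
        row['ANALYSIS_META'],
        row['PRICE'],
    )
    cursor.execute(INSERT_SQL, params)

# usage, given an open Snowflake connection `conn`:
# for _, r in df.iterrows():
#     insert_row(conn.cursor(), r)
```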