Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating Temporary Table on Netezza from Pandas DataFrame

I am looking to change over my workflow from SAS to Python, and have been prety successful thus far except for one pretty big thing. I cannot figure out how to upload Pandas DataFrames to my company's Netezza for use in later queries. This is actually pretty important as we have many datasets that we upload and use for queries.

I have the following pandas DataFrame:

[In ]  df
[Out]
       col1  col2  col3  
    0     1     2     3
    1     4     5     6
    2     7     8     9

I would like to upload this DataFrame to my Netezza box via ODBC connection. The connection is already set up as follows:

import pyodbc
conn = pyodbc.connect("Driver=NetezzaSQL;Server=...;")

I have used this connection in conjunction with Pandas read_sql to extract data and store it in a DataFrame. However, I have not yet figured out how to extract data. In SAS, I would do the following:

proc sql _method;
connect to netezza as net_dw
(auth domain info goes here...)
execute( create temporary table my_table
                                ( col1   int,
                                  col2   int,
                                  col3   int ) distribute on (col1) by net_dw)

insert into temp.my_table
select col1, col2, col3 from work.my_table;
quit;

I tried the following using Pandas:

t = pd.read_sql('''create temporary table test1 (col1 int, col2 int, col3 int); insert into temp.test1 select * from df''', conn)

but got a TypeError: 'NoneType' object is not iterable.

Is it possible to upload temporary tables to Netezza using pyodbc and Pandas?

like image 461
invoker Avatar asked Nov 22 '22 08:11

invoker


1 Answers

nzalchemy can help here.

  • Install sqlalchemy using pip install sqlalchemy
  • Install nzalchemy
  • For linux systems installing unixOdbc is required (yum instal unixODBC-devel or brew install unixOdbc
  • DataFrame's .to_sql method can be used to move data frames directly to Netezza as a table
from sqlalchemy import create_engine
from urllib import parse_quote_plus

# assumes NZ_HOST, NZ_USER, NZ_PASSWORD are set
import os

params = parse_quote_plus(f"DRIVER=NetezzaSQL;SERVER={os['NZ_HOST']};"
            f"DATABASE={os['NZ_DATABASE']};USER={os['NZ_USER']};"
            f"PASSWORD={os['NZ_PASSWORD']}")
engine = create_engine(f"netezza+pyodbc:///?odbc_connect={params}", 
            echo=True)

# assuming df to be a dataframe
df.to_sql('users', con=engine)
engine.execute("SELECT * FROM users").fetchall()

Details of the api - DataFrame.to_sql

like image 175
Aniket Kulkarni Avatar answered Nov 28 '22 05:11

Aniket Kulkarni