Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Connecting Python with Teradata using Teradata module

Tags:

I have installed python 2.7.0 and Teradata module on Windows 7. I am not able to connect and query TD from python.

pip install Teradata

Now I want to import teradata module in my source code and perform operations like -

  1. Firing queries to teradata and get result set.
  2. Check if connection is made to teradata.

Please help me writing code for the same as I am new to Python and there is no information available with me to connect to teradata.

like image 444
anurag Avatar asked Mar 11 '16 10:03

anurag


2 Answers

There are a number of ways to connect to Teradata and export table to Pandas. Here are four+:

Using teradata module

# You can install teradata via PIP: pip install teradata # to get a list of your odbc drivers names, you could do: teradata.tdodbc.drivers # You don’t need to install teradata odbc driver if using method='rest'.      # See sending data from df to teradata for connection example   import teradata import pandas as pd  host,username,password = 'HOST','UID', 'PWD' #Make a connection udaExec = teradata.UdaExec (appName="test", version="1.0", logConsole=False)   with udaExec.connect(method="odbc",system=host, username=username,                             password=password, driver="DRIVERNAME") as connect:      query = "SELECT * FROM DATABASEX.TABLENAMEX;"      #Reading query to df     df = pd.read_sql(query,connect)     # do something with df,e.g.     print(df.head()) #to see the first 5 rows 

Using TeradataSQL

from @ymzkala : This package doesn't require you to install Teradata drivers (other than this package).

# Installing python -m pip install teradatasql  import teradatasql  with teradatasql.connect(host='host', user='username', password='password') as connect:     df = pd.read_sql(query, connect) 

Using pyodbc module

import pyodbc   #You can install teradata via PIP: pip install pyodbc  #to get a list of your odbc drivers names, you could do: pyodbc.drivers()  #Make a connection link = 'DRIVER={DRIVERNAME};DBCNAME={hostname};UID={uid};PWD={pwd}'.format(                       DRIVERNAME=DRIVERNAME,hostname=hostname,                         uid=username, pwd=password) with pyodbc.connect(link,autocommit=True) as connect:      #Reading query to df     df = pd.read_sql(query,connect) 

Using sqlalchemy Module

 #You can install sqlalchemy via PIP: pip install sqlalchemy-teradata  #Note: It is not pip install sqlalchemy. If you already have sqlalchemy, you still need sqlalchemy-teradata to get teradata dialects  from sqlalchemy import create_engine  #Make a connection  link = 'teradata://{username}:{password}@{hostname}/?driver={DRIVERNAME}'.format(                username=username,hostname=hostname,DRIVERNAME=DRIVERNAME)  with create_engine(link) as connect:      #Reading query to df     df = pd.read_sql(query,connect) 

There is a fifth way, using giraffez module. I enjoy using this module as it come with MLOAD, FASTLOAD, BULKEXPORT etc. The only issue for beginners is its requirements (e.g C/C++ compiler ,Teradata CLIv2 and TPT API headers/lib files).

Note: Updated 13-07-2018, using of context manager to ensure closing of sessions

Update: 31-10-2018: Using teradata to send data from df to teradata

We can send data from df to Teradata. Avoiding 'odbc' 1 MB limit and odbc driver dependency, we can use 'rest' method. We need host ip_address, instead of driver argument. NB: The order of columns in df should match the order of columns in Teradata table.

import teradata import pandas as pd  # HOST_IP can be found by executing *>>nslookup viewpoint* or *ping  viewpoint*  udaExec = teradata.UdaExec (appName="test", version="1.0", logConsole=False)  with udaExec.connect(method="rest",system="DBName", username="UserName",                       password="Password", host="HOST_IP_ADDRESS") as connect:      data = [tuple(x) for x in df.to_records(index=False)]      connect.executemany("INSERT INTO DATABASE.TABLEWITH5COL values(?,?,?,?,?)",data,batch=True) 

Using 'odbc', you have to chunk your data to less than 1MB chunks to avoid "[HY001][Teradata][ODBC Teradata Driver] Memory allocation error" error: E.g.

import teradata import pandas as pd import numpy as np  udaExec = teradata.UdaExec (appName="test", version="1.0", logConsole=False)  with udaExec.connect(method="odbc",system="DBName", username="UserName",                       password="Password", driver="DriverName") as connect:      #We can divide our huge_df to small chuncks. E.g. 100 churchs     chunks_df = np.array_split(huge_df, 100)      #Import chuncks to Teradata     for i,_ in enumerate(chunks_df):          data = [tuple(x) for x in chuncks_df[i].to_records(index=False)]         connect.executemany("INSERT INTO DATABASE.TABLEWITH5COL values(?,?,?,?,?)",data,batch=True) 
like image 196
Prayson W. Daniel Avatar answered Oct 02 '22 12:10

Prayson W. Daniel


To add on to Prayson's answer, you can use the teradatasql package (found on pypi). This package doesn't require you to install Teradata drivers (other than this package). Use it like so:

import teradatasql import pandas as pd  with teradatasql.connect(host='host', user='username', password='password') as connect:     data = pd.read_sql('select top 5 * from table_name;', connect)  
like image 33
ymzkala Avatar answered Oct 02 '22 14:10

ymzkala