
Generate SQL statements from a Pandas Dataframe

Tags: python, sql, pandas

I am loading data from various sources (CSV, XLS, JSON, etc.) into pandas DataFrames, and I would like to generate the SQL statements that create and fill a database with this data. Does anyone know of a way to do this?

I know pandas has a to_sql function, but it only works with a database connection; it cannot generate a string.

Example

What I would like is to take a dataframe like so:

    import pandas as pd
    import numpy as np

    dates = pd.date_range('20130101', periods=6)
    df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

And a function that would generate this (this example is PostgreSQL but any would be fine):

    CREATE TABLE data (
      index timestamp with time zone,
      "A" double precision,
      "B" double precision,
      "C" double precision,
      "D" double precision
    )
asked Jun 26 '15 at 11:06 by Jorick Spitzen


2 Answers

If you only want the CREATE TABLE SQL code (and not the insertion of the data), you can use the get_schema function of the pandas.io.sql module:

    In [10]: print(pd.io.sql.get_schema(df.reset_index(), 'data'))
    CREATE TABLE "data" (
      "index" TIMESTAMP,
      "A" REAL,
      "B" REAL,
      "C" REAL,
      "D" REAL
    )

Some notes:

  • I had to use reset_index because get_schema otherwise leaves the index out of the result.
  • If you pass an SQLAlchemy engine for a particular database flavor (the con argument), the result is adjusted to that flavor (e.g. the data type names).
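As a rough illustration of the first note, the sketch below names the index before resetting it and then checks that the generated string is executable DDL. The column name "ts" and the in-memory SQLite check are assumptions made for this example, not part of the original answer; without a con argument pandas uses its SQLite type names.

```python
import sqlite3

import numpy as np
import pandas as pd

# Same shape of frame as in the question: six dates, four float columns.
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# rename_axis names the index, so reset_index emits a "ts" column
# instead of the default "index". The name "ts" is an arbitrary choice.
frame = df.rename_axis("ts").reset_index()

# With no `con` argument pandas falls back to its SQLite type names.
schema = pd.io.sql.get_schema(frame, "data")
print(schema)

# Sanity check: run the string on a throwaway in-memory database
# and read back the column names of the table it created.
conn = sqlite3.connect(":memory:")
conn.execute(schema)
created_columns = [row[1] for row in conn.execute('PRAGMA table_info("data")')]
conn.close()
print(created_columns)
```

Running the statement against a scratch database is a cheap way to catch quoting problems before the string is saved to a file or handed to another tool.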
answered Sep 25 '22 at 13:09 by joris


GENERATE SQL CREATE STATEMENT FROM DATAFRAME

    SOURCE = df
    TARGET = 'data'

    def SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET):
        # SOURCE: source dataframe
        # TARGET: name of the target table to be created in the database
        import pandas as pd
        sql_text = pd.io.sql.get_schema(SOURCE.reset_index(), TARGET)
        return sql_text

Check the SQL CREATE TABLE Statement String

    sql_text = SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
    print(sql_text)

GENERATE SQL INSERT STATEMENT FROM DATAFRAME

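    def SQL_INSERT_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET):
        # Build one INSERT statement per row; the index is not included.
        columns = ', '.join(str(c) for c in SOURCE.columns)
        sql_texts = []
        for index, row in SOURCE.iterrows():
            # tolist() yields plain Python scalars, so the tuple prints as (1.0, 2.0, ...)
            sql_texts.append('INSERT INTO ' + TARGET + ' (' + columns + ') VALUES ' + str(tuple(row.tolist())))
        return sql_texts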

Check the SQL INSERT INTO Statement String

    sql_texts = SQL_INSERT_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
    print('\n\n'.join(sql_texts))
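Formatting a row with str(tuple(...)) leaves quotes inside strings unescaped, prints missing values as nan, and leaves column names unquoted, which matters in PostgreSQL when the table was created with quoted names such as "A". The following is a minimal sketch of a stricter variant, not a definitive implementation. The helpers quote_identifier, sql_literal and insert_statements are hypothetical names introduced here, and the quoting rules assume standard SQL string and identifier escaping; for untrusted data, parameterized queries over a real connection remain the safer route.

```python
import numbers
import sqlite3

import pandas as pd


def quote_identifier(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + str(name).replace('"', '""') + '"'


def sql_literal(value):
    """Render one scalar cell as a SQL literal (hypothetical helper)."""
    if pd.isna(value):  # None, NaN and NaT all become NULL
        return "NULL"
    if isinstance(value, numbers.Integral):
        return str(int(value))
    if isinstance(value, numbers.Real):
        return repr(float(value))
    # Everything else (str, Timestamp, ...) becomes a quoted string.
    return "'" + str(value).replace("'", "''") + "'"


def insert_statements(frame, table):
    """Return one INSERT statement per row of `frame`."""
    columns = ", ".join(quote_identifier(c) for c in frame.columns)
    prefix = "INSERT INTO " + quote_identifier(table) + " (" + columns + ") VALUES "
    statements = []
    # name=None yields plain tuples of Python scalars, one per row.
    for row in frame.itertuples(index=False, name=None):
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(prefix + "(" + values + ");")
    return statements


# Small demo frame with a missing value and a string containing a quote.
demo = pd.DataFrame(
    {"A": [1.5, float("nan")], "label": ["it's", "plain"]},
    index=pd.to_datetime(["2013-01-01", "2013-01-02"]),
).reset_index()

statements = insert_statements(demo, "data")
print("\n".join(statements))

# Round trip on an in-memory SQLite database: create the table, then insert.
conn = sqlite3.connect(":memory:")
conn.execute(pd.io.sql.get_schema(demo, "data"))
for statement in statements:
    conn.execute(statement)
row_count = conn.execute('SELECT COUNT(*) FROM "data"').fetchone()[0]
conn.close()
print(row_count)
```

Calling reset_index before generating the statements keeps the INSERT columns in line with a CREATE TABLE produced from the same reset frame.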
answered Sep 26 '22 at 13:09 by Jansen Simanullang