I am loading data from various sources (csv, xls, json etc...) into Pandas dataframes and I would like to generate statements to create and fill a SQL database with this data. Does anyone know of a way to do this?
I know pandas has a to_sql function, but that only works with a database connection; it cannot generate a string.
What I would like is to take a dataframe like so:
    import pandas as pd
    import numpy as np

    dates = pd.date_range('20130101', periods=6)
    df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
And a function that would generate this (this example is PostgreSQL but any would be fine):
    CREATE TABLE data (
        index timestamp with time zone,
        "A" double precision,
        "B" double precision,
        "C" double precision,
        "D" double precision
    )
If you only want the CREATE TABLE SQL code (and not the insert of the data), you can use the get_schema function of the pandas.io.sql module:
    In [10]: print(pd.io.sql.get_schema(df.reset_index(), 'data'))
    CREATE TABLE "data" (
      "index" TIMESTAMP,
      "A" REAL,
      "B" REAL,
      "C" REAL,
      "D" REAL
    )
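To see what the reset_index() call changes, here is a minimal sketch using the same example frame. It assumes a reasonably recent pandas and that, with no connection passed, get_schema falls back to SQLite-style type names (which matches the TIMESTAMP/REAL output shown here):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# get_schema only looks at the frame's columns, so the DatetimeIndex is skipped.
without_index = pd.io.sql.get_schema(df, "data")

# reset_index() turns the unnamed index into a regular column called "index".
with_index = pd.io.sql.get_schema(df.reset_index(), "data")

print(without_index)
print(with_index)
```

Only the second statement should contain an "index" column.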
Some notes:
reset_index is used because get_schema otherwise did not include the index.

    SOURCE = df
    TARGET = 'data'
    import pandas as pd

    def SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET):
        # SOURCE: source dataframe
        # TARGET: name of the target table to be created in the database
        sql_text = pd.io.sql.get_schema(SOURCE.reset_index(), TARGET)
        return sql_text
CREATE TABLE statement string:

    sql_text = SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
    print(sql_text)
    def SQL_INSERT_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET):
        sql_texts = []
        for index, row in SOURCE.iterrows():
            # One INSERT per row; the values are rendered via Python's tuple repr
            sql_texts.append('INSERT INTO ' + TARGET + ' (' + ', '.join(SOURCE.columns) + ') VALUES ' + str(tuple(row.values)))
        return sql_texts
INSERT INTO statement string:

    sql_texts = SQL_INSERT_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
    print('\n\n'.join(sql_texts))
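One caveat with the str(tuple(row.values)) approach: it relies on Python's repr, so NaN, timestamps and strings containing quotes may not come out as valid SQL. Below is a rough sketch of a more explicit alternative. The helper names (sql_literal, quote_identifier, insert_statements) are made up for this example, and the quoting rules assume a PostgreSQL-like dialect (single quotes doubled inside strings, identifiers in double quotes). It only covers None, booleans, numbers, dates and strings; infinities and pd.NaT are not handled.

```python
import math
import numbers
from datetime import date, datetime


def quote_identifier(name):
    """Double-quote a table or column name (hypothetical helper)."""
    return '"' + str(name).replace('"', '""') + '"'


def sql_literal(value):
    """Render one Python value as a SQL literal (hypothetical helper)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, numbers.Integral):
        return str(int(value))
    if isinstance(value, numbers.Real):
        number = float(value)
        # NaN is how pandas marks missing floats; emit NULL instead.
        return "NULL" if math.isnan(number) else repr(number)
    if isinstance(value, datetime):
        return "'" + value.isoformat(sep=" ") + "'"
    if isinstance(value, date):
        return "'" + value.isoformat() + "'"
    # Fallback: treat everything else as text and escape single quotes.
    return "'" + str(value).replace("'", "''") + "'"


def insert_statements(table, columns, rows):
    """Build one INSERT statement per row of plain Python values."""
    column_list = ", ".join(quote_identifier(c) for c in columns)
    prefix = f"INSERT INTO {quote_identifier(table)} ({column_list}) VALUES "
    return [
        prefix + "(" + ", ".join(sql_literal(v) for v in row) + ");"
        for row in rows
    ]


rows = [(datetime(2013, 1, 1), 0.5, float("nan"), "O'Brien")]
for statement in insert_statements("data", ["index", "A", "B", "name"], rows):
    print(statement)
```

With a DataFrame, something like insert_statements('data', frame.columns, frame.itertuples(index=False, name=None)) with frame = df.reset_index() should work. If a database connection is available after all, parameterized queries are generally safer than building the strings by hand.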