
How to specify flavor of sql in pd.io.sql.get_schema?

I am trying to use pd.io.sql.get_schema to generate a postgres schema from a dataframe.

There is no documentation for pd.io.sql.get_schema, but this issue (https://github.com/pandas-dev/pandas/issues/9960) says that I can specify a flavor of SQL.

However, this feature seems to be deprecated. Instead, it looks like I can specify an engine such as postgresql (see "Generate SQL statements from a Pandas Dataframe"). How do I do this?

Here is my code so far:

pd.io.sql.get_schema(df.reset_index(), 'data')

Open to all suggestions for generating schema.

RustyShackleford asked Jul 11 '18 19:07




1 Answer

I believe you create a SQLAlchemy engine for a PostgreSQL database and then pass that engine to the con kwarg, so that the generated DDL follows the engine's dialect. For example:

import numpy as np
import pandas as pd
import sqlalchemy

dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))

url = 'postgresql://USER:PASSWORD@HOST:PORT/DATABASE'
con = sqlalchemy.create_engine(url, client_encoding='utf8')
print(pd.io.sql.get_schema(df.reset_index(), 'data', con=con))

Output:

CREATE TABLE data (
        index TIMESTAMP WITHOUT TIME ZONE,
        "A" FLOAT(53),
        "B" FLOAT(53),
        "C" FLOAT(53),
        "D" FLOAT(53)
)
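
If you only want the DDL text and have no database to point an engine at, one possible alternative is to skip get_schema and compile the CREATE TABLE statement with SQLAlchemy's PostgreSQL dialect directly. This is a rough sketch, not what pandas does internally: the helper sqlalchemy_type is a name I made up, and its dtype mapping is an assumption that only covers the datetime and float columns of the example frame.

```python
import numpy as np
import pandas as pd
from sqlalchemy import Column, DateTime, Float, MetaData, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateTable

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
frame = df.reset_index()


def sqlalchemy_type(dtype):
    """Hypothetical helper: map a pandas dtype to a SQLAlchemy type.

    Only the two dtypes present in the example frame are handled.
    """
    if pd.api.types.is_datetime64_any_dtype(dtype):
        return DateTime()
    if pd.api.types.is_float_dtype(dtype):
        return Float(53)
    raise TypeError(f'no mapping for dtype {dtype}')


# Describe the table in SQLAlchemy terms, one Column per DataFrame column.
table = Table(
    'data',
    MetaData(),
    *(Column(name, sqlalchemy_type(dtype)) for name, dtype in frame.dtypes.items()),
)

# Compile against the PostgreSQL dialect; no connection is opened.
ddl = str(CreateTable(table).compile(dialect=postgresql.dialect()))
print(ddl)
```

For real data you would need to extend the mapping (integers, strings, booleans and so on), which is the part get_schema otherwise handles for you.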
Grr answered Oct 27 '22 23:10