Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create a temporary table in MySQL using Pandas

Pandas has a great feature, where you can write your dataframe to a table in SQL.

df.to_sql(con=cnx, name='some_table_name', if_exists='replace', flavor='mysql', index=False)

Is there a way to make a temporary table this way?

There is nothing in the documentation as far as I can tell.

like image 590
Monica Heddneck Avatar asked Dec 10 '15 02:12

Monica Heddneck


Video Answer


3 Answers

The DataFrame.to_sql() uses the built into pandas pandas.io.sql package, which itself relies on the SQLAlchemy as a database abstraction layer. In order to create a "temporary" table in SQLAlchemy ORM, you need to supply a prefix:

t = Table(
    't', metadata,
    Column('id', Integer, primary_key=True),
    # ...
    prefixes=['TEMPORARY'],
)

From what I see, pandas.io.sql does not allow you to specify the prefixes or easily change the way tables are created.

One way to approach this problem would be to create the temporary table beforehand and use to_sql() with if_exists="append" (all using the same database connection).


Here is also what I've tried to do: override the pandas.io.sql.SQLTable's _create_table_setup() method and pass the prefixes to the Table constructor. For some reason, the table was still created non-temporary. Not sure if it would help, but here is the code I was using: gist. This is kind of hacky, but I hope it would at least serve as an example code to get you started on this approach.

like image 59
alecxe Avatar answered Oct 13 '22 17:10

alecxe


Easy workaround without fancy magic

This was a quick and easy workaround for me.

Simply apply a RegEx to the generated SQL to add in whatever statements you want.

import io
import pandas as pd

# Get the SQL that would be generated by the create table statement
create_table_sql = pd.io.sql.get_schema(df, tmp_table_name)

# Replace the `CREATE TABLE` part of the generated statement with 
# whatever you need.
create_tmp_table_sql = re.sub(
    "^(CREATE TABLE)?",
    "CREATE TEMP TABLE",
    create_table_sql
)

# Write to the database in a transaction (psycopg2)
with conn.cursor() as cur:
    cur.execute(create_tmp_table_sql)
    output = io.StringIO()
    df.to_csv(output, sep="\t", header=False, index=False, na_rep="NULL")
    output.seek(0)
    cur.copy_from(output, tmp_table_name, null="NULL")

Credit to Aseem for a fast way to write to Postgres.

like image 37
iamyojimbo Avatar answered Oct 13 '22 18:10

iamyojimbo


This may be a bit hacky and it doesn't technically create a temporary table, it just acts like one, but you could create use the @contextmanager decorator from contextlib to create the table upon opening the context and drop it upon close. Could look something like:

from contextlib import contextmanager

import numpy as np
import sqlalchemy as sqla
import pandas as pd


@contextmanager
def temp_table(frame, tbl, eng, *args, **kwargs):
    frame.to_sql(tbl, eng, *args, **kwargs)
    yield
    eng.execute('DROP TABLE {}'.format(tbl))

df = pd.DataFrame(np.random.randint(21, size=(10, 10)))
cnx = sqla.create_engine(conn_string)

with temp_table(df, 'some_table_name', cnx, if_exists='replace', flavor='mysql', index=False):
    # do stuff with "some_table_name"

I tested it using Teradata and it works fine. I don't have a MySQL laying around that I can test it out on, but as long as DROP statements work in MySQL, it should work as intended.

like image 1
Eric Ed Lohmar Avatar answered Oct 13 '22 19:10

Eric Ed Lohmar