Pandas has a great feature, where you can write your dataframe to a table in SQL.
df.to_sql(con=cnx, name='some_table_name', if_exists='replace', flavor='mysql', index=False)
Is there a way to make a temporary table this way?
There is nothing in the documentation as far as I can tell.
The DataFrame.to_sql()
uses the built into pandas pandas.io.sql
package, which itself relies on the SQLAlchemy as a database abstraction layer. In order to create a "temporary" table in SQLAlchemy ORM, you need to supply a prefix:
t = Table(
't', metadata,
Column('id', Integer, primary_key=True),
# ...
prefixes=['TEMPORARY'],
)
From what I see, pandas.io.sql
does not allow you to specify the prefixes
or easily change the way tables are created.
One way to approach this problem would be to create the temporary table beforehand and use to_sql()
with if_exists="append"
(all using the same database connection).
Here is also what I've tried to do: override the pandas.io.sql.SQLTable
's _create_table_setup()
method and pass the prefixes
to the Table
constructor. For some reason, the table was still created non-temporary. Not sure if it would help, but here is the code I was using: gist. This is kind of hacky, but I hope it would at least serve as an example code to get you started on this approach.
This was a quick and easy workaround for me.
Simply apply a RegEx to the generated SQL to add in whatever statements you want.
import io
import pandas as pd
# Get the SQL that would be generated by the create table statement
create_table_sql = pd.io.sql.get_schema(df, tmp_table_name)
# Replace the `CREATE TABLE` part of the generated statement with
# whatever you need.
create_tmp_table_sql = re.sub(
"^(CREATE TABLE)?",
"CREATE TEMP TABLE",
create_table_sql
)
# Write to the database in a transaction (psycopg2)
with conn.cursor() as cur:
cur.execute(create_tmp_table_sql)
output = io.StringIO()
df.to_csv(output, sep="\t", header=False, index=False, na_rep="NULL")
output.seek(0)
cur.copy_from(output, tmp_table_name, null="NULL")
Credit to Aseem for a fast way to write to Postgres.
This may be a bit hacky and it doesn't technically create a temporary table, it just acts like one, but you could create use the @contextmanager
decorator from contextlib
to create the table upon opening the context and drop it upon close. Could look something like:
from contextlib import contextmanager
import numpy as np
import sqlalchemy as sqla
import pandas as pd
@contextmanager
def temp_table(frame, tbl, eng, *args, **kwargs):
frame.to_sql(tbl, eng, *args, **kwargs)
yield
eng.execute('DROP TABLE {}'.format(tbl))
df = pd.DataFrame(np.random.randint(21, size=(10, 10)))
cnx = sqla.create_engine(conn_string)
with temp_table(df, 'some_table_name', cnx, if_exists='replace', flavor='mysql', index=False):
# do stuff with "some_table_name"
I tested it using Teradata and it works fine. I don't have a MySQL laying around that I can test it out on, but as long as DROP
statements work in MySQL, it should work as intended.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With