REPLACE rows in mysql database table with pandas DataFrame

Tags:

python

replace

pandas

mysql

Python Version - 2.7.6

Pandas Version - 0.17.1

MySQLdb Version - 1.2.5

In my database (PRODUCT), I have a table (XML_FEED). XML_FEED is huge (millions of records). I also have a pandas.DataFrame (PROCESSED_DF) with thousands of rows.

Now I need to run this:

REPLACE INTO TABLE PRODUCT.XML_FEED
(COL1, COL2, COL3, COL4, COL5),
VALUES (PROCESSED_DF.values)

Question:

Is there a way to run REPLACE INTO TABLE in pandas? I already checked pandas.DataFrame.to_sql(), but that is not what I need. I would prefer not to read the XML_FEED table into pandas because it is very large.

asked Jan 07 '16 17:01

Yogesh Yadav

3 Answers

With the release of pandas 0.24.0, there is now an official way to achieve this by passing a custom insert method to the to_sql function.

I was able to achieve the behavior of REPLACE INTO by passing this callable to to_sql:

def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql.expression import Insert

    # Compile hook that rewrites INSERT statements as REPLACE statements.
    # Note: @compiles registers the hook for every Insert construct in the
    # process, not only for the statement executed below.
    @compiles(Insert)
    def replace_string(insert, compiler, **kw):
        s = compiler.visit_insert(insert, **kw)
        s = s.replace("INSERT INTO", "REPLACE INTO")
        return s

    # pandas passes the column names (keys) and an iterable of row tuples.
    data = [dict(zip(keys, row)) for row in data_iter]

    conn.execute(table.table.insert(replace_string=""), data)

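You would pass it like so, with the target table name as the first argument and an SQLAlchemy engine or connection as con:

df.to_sql('XML_FEED', con=engine, if_exists='append', method=mysql_replace_into)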

Alternatively, if you want the behavior of INSERT ... ON DUPLICATE KEY UPDATE ... instead, you can use this:

def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.dialects.mysql import insert

    data = [dict(zip(keys, row)) for row in data_iter]

    # Multi-row INSERT using the MySQL dialect's insert() construct.
    stmt = insert(table.table).values(data)
    # On a key collision, set every column to the value from the incoming row.
    update_stmt = stmt.on_duplicate_key_update(
        **dict(zip(stmt.inserted.keys(), stmt.inserted.values())))

    conn.execute(update_stmt)

Credits to https://stackoverflow.com/a/11762400/1919794 for the compile method.
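For comparison, here is a rough sketch of the SQL shape that the second callable corresponds to. REPLACE INTO deletes the conflicting row and inserts a new one, so columns you do not supply fall back to their defaults, whereas ON DUPLICATE KEY UPDATE modifies the existing row in place. The helper name build_upsert_sql is hypothetical, and the exact text SQLAlchemy emits may differ from this string.

```python
def build_upsert_sql(table, columns, placeholder="%s"):
    """Return an INSERT ... ON DUPLICATE KEY UPDATE statement (MySQL syntax).

    Hypothetical helper for illustration only; table and column names are
    interpolated as-is, so they must come from a trusted source.
    """
    cols = ", ".join(columns)
    marks = ", ".join([placeholder] * len(columns))
    # VALUES(col) refers to the value the INSERT part tried to write.
    updates = ", ".join("{0} = VALUES({0})".format(c) for c in columns)
    return ("INSERT INTO {0} ({1}) VALUES ({2}) "
            "ON DUPLICATE KEY UPDATE {3}").format(table, cols, marks, updates)


upsert_sql = build_upsert_sql("XML_FEED", ["COL1", "COL2"])
print(upsert_sql)
```

A statement like this could be run with cursor.executemany(upsert_sql, rows) on a MySQLdb cursor, assuming rows is a list of tuples in the same column order.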

answered Oct 24 '22 11:10

devnull

As of this version (0.17.1), I have been unable to find any direct way to do this in pandas, and I have filed a feature request for it. In my project, I handled it by running some queries with MySQLdb and then calling DataFrame.to_sql(if_exists='append').

Suppose

1) product_id is the primary key of table PRODUCT

2) feed_id is the primary key of table XML_FEED.

SIMPLE VERSION

import MySQLdb
import sqlalchemy
import pandas

con = MySQLdb.connect('localhost', 'root', 'my_password', 'database_name')
con_str = 'mysql+mysqldb://root:my_password@localhost/database_name'
engine = sqlalchemy.create_engine(con_str)  # because I am using MySQL
df = pandas.read_sql('SELECT * from PRODUCT', con=engine)

# Delete the rows that are about to be re-inserted. The ids are passed as
# query parameters instead of being formatted into the SQL string.
product_ids = df['product_id'].tolist()
placeholders = ', '.join(['%s'] * len(product_ids))
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(placeholders)
cur = con.cursor()
if product_ids:  # "IN ()" with an empty list is a syntax error
    cur.execute(delete_str, tuple(product_ids))
    con.commit()

# flavor='mysql' avoids creating an SQLAlchemy engine, but it is deprecated.
df.to_sql('XML_FEED', if_exists='append', con=engine)

Please note: The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.
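The REPLACE behavior described in the note can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, since SQLite also accepts REPLACE INTO. The helper build_replace_sql and the title column are hypothetical, and the rows list stands in for PROCESSED_DF.values.tolist().

```python
import sqlite3


def build_replace_sql(table, columns, placeholder="%s"):
    """Return a REPLACE INTO statement with one placeholder per column.

    Hypothetical helper; identifiers are interpolated as-is and must be trusted.
    """
    cols = ", ".join(columns)
    marks = ", ".join([placeholder] * len(columns))
    return "REPLACE INTO {0} ({1}) VALUES ({2})".format(table, cols, marks)


# SQLite stand-in for the MySQL table; sqlite3 uses '?' placeholders.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE XML_FEED (feed_id INTEGER PRIMARY KEY, title TEXT)")
con.executemany("INSERT INTO XML_FEED VALUES (?, ?)",
                [(1, "old"), (2, "untouched")])

# Row 1 collides with an existing key and is replaced; row 3 is new.
rows = [(1, "new"), (3, "added")]
sql = build_replace_sql("XML_FEED", ["feed_id", "title"], placeholder="?")
con.executemany(sql, rows)
con.commit()

result = con.execute(
    "SELECT feed_id, title FROM XML_FEED ORDER BY feed_id").fetchall()
print(result)  # [(1, 'new'), (2, 'untouched'), (3, 'added')]
```

With MySQLdb, the same idea would presumably use the default %s placeholder and cursor.executemany(sql, PROCESSED_DF.values.tolist()), which avoids reading XML_FEED into pandas at all.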

answered Oct 24 '22 11:10

Yogesh Yadav

I needed a generic solution to this problem, so I built on shiva's answer--maybe it will be helpful to others. This is useful in situations where you grab a table from a MySQL database (whole or filtered), update/add some rows, and want to perform a REPLACE INTO statement with df.to_sql().

It finds the table's primary keys, performs a delete statement on the MySQL table with all keys from the pandas dataframe, and then inserts the dataframe into the MySQL table.

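def to_sql_update(df, engine, schema, table):
    # Note: this modifies the caller's DataFrame in place.
    df.reset_index(inplace=True)
    # Look up the primary-key column(s) of the target table.
    sql = ''' SELECT column_name from information_schema.columns
              WHERE table_schema = '{schema}' AND table_name = '{table}' AND
                    COLUMN_KEY = 'PRI';
          '''.format(schema=schema, table=table)
    id_cols = [x[0] for x in engine.execute(sql).fetchall()]
    id_vals = [df[col_name].tolist() for col_name in id_cols]
    # Build one DELETE that matches every key tuple present in the DataFrame.
    # "WHERE 0" matches nothing, so each key tuple can be appended with OR.
    # Values are interpolated as quoted strings, so they must be trusted.
    sql = ''' DELETE FROM {schema}.{table} WHERE 0 '''.format(schema=schema, table=table)
    for row in zip(*id_vals):
        sql_row = ' AND '.join([''' {}='{}' '''.format(n, v) for n, v in zip(id_cols, row)])
        sql += ' OR ({}) '.format(sql_row)
    engine.execute(sql)

    # Insert the DataFrame rows now that the conflicting rows are gone.
    df.to_sql(table, engine, schema=schema, if_exists='append', index=False)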
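The DELETE step above formats key values straight into the SQL text. A possible variant, sketched here under assumptions, builds the same multi-key DELETE with placeholders and returns the values separately. The helper build_delete is hypothetical, and sqlite3 is used only so the example runs without a MySQL server; the feed table and its columns are made up for the demonstration.

```python
import sqlite3


def build_delete(table, id_cols, rows, placeholder="%s"):
    """Build a DELETE matching every primary-key tuple in rows.

    Returns (sql, params), or (None, []) when rows is empty.
    Table and column names are interpolated and must be trusted.
    """
    rows = [tuple(r) for r in rows]
    if not rows:
        return None, []
    # One "(col1 = ? AND col2 = ?)" group per key tuple, joined with OR.
    one_row = "(" + " AND ".join(
        "{0} = {1}".format(c, placeholder) for c in id_cols) + ")"
    sql = "DELETE FROM {0} WHERE {1}".format(
        table, " OR ".join([one_row] * len(rows)))
    params = [value for row in rows for value in row]
    return sql, params


# Demonstration on a table with a composite primary key.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE feed (shop INTEGER, sku TEXT, price REAL, "
            "PRIMARY KEY (shop, sku))")
con.executemany("INSERT INTO feed VALUES (?, ?, ?)",
                [(1, "a", 1.0), (1, "b", 2.0), (2, "a", 3.0)])

delete_sql, delete_params = build_delete(
    "feed", ["shop", "sku"], [(1, "a"), (2, "a")], placeholder="?")
con.execute(delete_sql, delete_params)
remaining = con.execute("SELECT shop, sku FROM feed").fetchall()
print(remaining)  # [(1, 'b')]
```

In to_sql_update, the key tuples would come from zip(*id_vals), and the default %s placeholder matches what MySQLdb expects.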

answered Oct 24 '22 11:10

dbc
