Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to upsert pandas DataFrame to MySQL with SQLAlchemy

I'm pushing data from a data-frame into MySQL, right now it is only adding new data to the table if the data does not exists(appending). This works perfect, however I also want my code to check if the record already exists then it needs to update. So I need it to append + update. I really don't know how to start fixing this as I got stuck....someone tried this before?

This is my code:

engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                        .format(user="root",
                                pw="*****",
                                db="my_db"))
my_df.to_sql('my_table', con = engine, if_exists = 'append')
like image 736
NorthAfrican Avatar asked Feb 01 '26 03:02

NorthAfrican


2 Answers

You can use next solution on DB side:

First: create table for insert data from Pandas (let call it test):

CREATE TABLE `test` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `capacity` INT(11) NOT NULL,
  PRIMARY KEY (`id`)
);

Second: Create table for resulting data (let call it cumulative_test) exactly same structure as test:

CREATE TABLE `cumulative_test` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `capacity` INT(11) NOT NULL,
  PRIMARY KEY (`id`)
);

Third: set trigger on each insert into the test table will insert ore update record in the second table like:

DELIMITER $$

CREATE
    /*!50017 DEFINER = 'root'@'localhost' */
    TRIGGER `before_test_insert` BEFORE INSERT ON `test` 
    FOR EACH ROW BEGIN
    DECLARE _id INT;
    
    SELECT id INTO _id
    FROM `cumulative_test` WHERE `cumulative_test`.`name` = new.name;
    
    IF _id IS NOT NULL THEN
        UPDATE cumulative_test
        SET `cumulative_test`.`capacity` = `cumulative_test`.`capacity` + new.capacity;
     ELSE 
        INSERT INTO `cumulative_test` (`name`, `capacity`) 
        VALUES (NEW.name, NEW.capacity);
    END IF; 
END;
$$

DELIMITER ;

So you will already insert values into the test table and get calculated results in the second table. The logic inside the trigger can be matched for your needs.

like image 174
Slava Rozhnev Avatar answered Feb 02 '26 17:02

Slava Rozhnev


Similar to the approach used for PostgreSQL here, you can use INSERT … ON DUPLICATE KEY in MySQL:

with engine.begin() as conn:
    # step 0.0 - create test environment
    conn.execute(sa.text("DROP TABLE IF EXISTS main_table"))
    conn.execute(
        sa.text(
            "CREATE TABLE main_table (id int primary key, txt varchar(50))"
        )
    )
    conn.execute(
        sa.text(
            "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
        )
    )
    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
    )

    # step 1 - create temporary table and upload DataFrame
    conn.execute(
        sa.text(
            "CREATE TEMPORARY TABLE temp_table (id int primary key, txt varchar(50))"
        )
    )
    df.to_sql("temp_table", conn, index=False, if_exists="append")

    # step 2 - merge temp_table into main_table
    conn.execute(
        sa.text(
            """\
            INSERT INTO main_table (id, txt) 
            SELECT id, txt FROM temp_table
            ON DUPLICATE KEY UPDATE txt = VALUES(txt)
            """
        )
    )

    # step 3 - confirm results
    result = conn.execute(
        sa.text("SELECT * FROM main_table ORDER BY id")
    ).fetchall()
    print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]
like image 44
Gord Thompson Avatar answered Feb 02 '26 17:02

Gord Thompson