
Delete rows from SQL Server based on content in a dataframe

I have an inventory table in SQL Server called dbo.inventory, which contains Year, Month, Material and Stock_quantity. I receive a new inventory count as a CSV file each day and need to load it into the dbo.inventory table. However, I need to delete records in the database if the Year and Month from the CSV file already exist there, in order to avoid loading multiple inventory counts for the same month.

In SQL I would do it like this:

Delete t1 
FROM dbo.inventory t1
JOIN csv t2 ON t1.Year = t2.Year and t1.Month = t2.Month

I don't know how to do this in a Python script. I want to avoid loading my CSV file as a staging table into the data warehouse; instead I just want to delete the existing rows matching Year and Month and then load the new ones.

I have used the following in another setup:

delete_date = sales.Date.max()
connection = engine.connect()
connection.execute(f"""delete from sales where Date = '{delete_date}'""")
connection.close()

But this doesn't work here, because the input for what should be deleted is a dataframe, which in theory could contain multiple years and months if it is a correction to earlier loaded figures.

Morten_DK asked Sep 02 '19 19:09




1 Answer

pandas doesn't support deleting SQL rows based on specific conditions. You have to tell SQL Server which rows you want to delete:

import sqlalchemy as sa

engine = sa.create_engine('mssql+pyodbc://...')
meta = sa.MetaData()

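# Map the Inventory table in your database to a SQLAlchemy object.
# autoload_with alone triggers reflection; the separate autoload=True flag
# is deprecated in SQLAlchemy 1.4 and removed in 2.0.
inventory = sa.Table('Inventory', meta, autoload_with=engine)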

# Build the WHERE clause of your DELETE statement from the distinct
# (Year, Month) pairs in the dataframe. Equivalent T-SQL:
#      WHERE (Year = ... AND Month = ...) OR (Year = ... AND Month = ...) OR ...
# .values.tolist() also converts NumPy scalars to plain Python values.
keys = df[['Year', 'Month']].drop_duplicates().values.tolist()
cond = sa.or_(*[
    sa.and_(inventory.c['Year'] == year, inventory.c['Month'] == month)
    for year, month in keys
])

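# Define and execute the DELETE. engine.begin() commits on success and
# rolls back on error; engine.connect() no longer autocommits in SQLAlchemy 2.0.
delete = inventory.delete().where(cond)
with engine.begin() as conn:
    conn.execute(delete)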

# Now you can insert the new data
df.to_sql('Inventory', engine, if_exists='append', index=False)
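
If you would rather not reflect the table, the same delete-then-insert idea can be sketched with plain parameterized statements. The example below is only an illustration under stated assumptions: it uses the standard-library sqlite3 module with an in-memory database as a stand-in for SQL Server so that it runs anywhere, the function name replace_months is hypothetical, and the new data is a list of tuples rather than a dataframe (with pandas, the distinct pairs could come from df[['Year', 'Month']].drop_duplicates()). pyodbc uses the same ? placeholder style, but the connection handling there may differ.

```python
import sqlite3


def replace_months(conn, rows):
    """Delete existing rows for every (Year, Month) found in `rows`, then insert `rows`.

    `rows` is a list of (year, month, material, stock_quantity) tuples.
    Hypothetical helper, not part of any library.
    """
    # Distinct (Year, Month) pairs present in the new data
    periods = sorted({(year, month) for year, month, _, _ in rows})
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "DELETE FROM inventory WHERE Year = ? AND Month = ?", periods
        )
        conn.executemany(
            "INSERT INTO inventory (Year, Month, Material, Stock_quantity) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )


# Demo against an in-memory SQLite database (stand-in for SQL Server)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory "
    "(Year INTEGER, Month INTEGER, Material TEXT, Stock_quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?)",
    [(2019, 7, "A", 10), (2019, 8, "A", 12), (2019, 8, "B", 5)],
)
conn.commit()

# New count: corrects August and adds September; July is left untouched
new_rows = [(2019, 8, "A", 15), (2019, 8, "B", 7), (2019, 9, "A", 20)]
replace_months(conn, new_rows)

result = conn.execute(
    "SELECT Year, Month, Material, Stock_quantity "
    "FROM inventory ORDER BY Year, Month, Material"
).fetchall()
print(result)
```

Running the delete and the insert in one transaction means a failed load does not leave the month empty. Passing values as parameters also avoids building SQL text from data, as the f-string in the question does.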
Code Different answered Nov 14 '22 23:11