I have a table that is populated by a daily scheduled job, which deletes the last 7 days of data and then repopulates the table with the 7 most recent days' worth of data from another source (a mainframe).
Recently, users reported a number of duplicates going back to the beginning of October 2011, on the order of hundreds of thousands of rows.
I noticed strange behavior with the DELETE that runs as part of each job:
DELETE FROM fm104d
WHERE location = '18'
  AND CONVERT(datetime, CASE WHEN ISDATE(pull_date) = 0 THEN '19000101'
                             ELSE pull_date END) > DATEADD(day, -7, GETDATE())
The above returns "(0 row(s) affected)".
When I run the above after replacing the DELETE with a SELECT *, I get 32,000+ rows in return.
Why would the SELECT and DELETE behave differently?
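One way to see what each statement actually does against the same predicate is a dry run: count the rows the SELECT finds, run the DELETE inside a transaction, capture @@ROWCOUNT, and then roll back. The following is a minimal T-SQL sketch that reuses the table and predicate from the question. The variable names @selected and @deleted are illustrative. Run it against a test copy or during a quiet window, because the DELETE still holds its locks until the rollback.

```sql
-- Dry run: compare how many rows the SELECT sees with how many the DELETE removes,
-- then roll back so nothing is actually deleted.
DECLARE @selected int, @deleted int;  -- hypothetical variable names

BEGIN TRANSACTION;

SELECT @selected = COUNT(*)
FROM fm104d
WHERE location = '18'
  AND CONVERT(datetime, CASE WHEN ISDATE(pull_date) = 0 THEN '19000101'
                             ELSE pull_date END) > DATEADD(day, -7, GETDATE());

DELETE FROM fm104d
WHERE location = '18'
  AND CONVERT(datetime, CASE WHEN ISDATE(pull_date) = 0 THEN '19000101'
                             ELSE pull_date END) > DATEADD(day, -7, GETDATE());

SET @deleted = @@ROWCOUNT;  -- read immediately after the DELETE, before any other statement

SELECT @selected AS rows_selected, @deleted AS rows_deleted;

ROLLBACK TRANSACTION;  -- undo the DELETE; this batch is only a diagnostic
```

If rows_selected and rows_deleted differ here, the difference is reproducible outside the scheduled job, and it is worth comparing the two statements' actual execution plans.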
UPDATE
Here is the Actual Execution Plan:
http://pastie.org/2869202
Note the WHERE clause in the DELETE statement: it specifies which record(s) should be deleted. If you omit the WHERE clause, all records in the table will be deleted!
When rows are deleted from a heap table (a table without a clustered index), the pages emptied by the delete can remain allocated to the table. To resolve this issue, you can use one of the following methods: use the TABLOCK hint with the DELETE statement, run ALTER TABLE <heap table name> REBUILD, or create and then drop a clustered index on the heap table.
You should use the WHERE clause to filter the records and fetch only the ones you need. The WHERE clause is used not only in SELECT statements but also in UPDATE, DELETE, and other statements.
A DELETE statement takes exclusive (X) locks on the rows or pages it removes and an intent-exclusive (IX) lock on the table (which can escalate to a table-level X lock when many rows are affected). Until the transaction commits, no other transaction can modify those rows, and under the default READ COMMITTED isolation level no other transaction can read them either. You can use the NOLOCK hint to read the data anyway, accepting that you may see uncommitted changes.
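The three heap remedies can be sketched in T-SQL as follows. The sketch uses the question's table name fm104d and the job's predicate. The index name ix_fm104d_tmp and the choice of location as its key are hypothetical. ALTER TABLE ... REBUILD requires SQL Server 2008 or later.

```sql
-- 1. Delete with a table lock so that pages emptied by the delete can be deallocated from the heap.
DELETE FROM fm104d WITH (TABLOCK)
WHERE location = '18'
  AND CONVERT(datetime, CASE WHEN ISDATE(pull_date) = 0 THEN '19000101'
                             ELSE pull_date END) > DATEADD(day, -7, GETDATE());

-- 2. Rebuild the heap in place (SQL Server 2008 and later).
ALTER TABLE fm104d REBUILD;

-- 3. Create a clustered index (hypothetical name and key), then drop it; both steps rewrite the table.
CREATE CLUSTERED INDEX ix_fm104d_tmp ON fm104d (location);
DROP INDEX ix_fm104d_tmp ON fm104d;
```

Pick one of the three approaches; they are listed together only for comparison. Options 2 and 3 rewrite the whole table, so they are heavier operations than adding a hint to the existing DELETE.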
You won't believe this. In fact, I didn't, because it makes almost no logical sense. But in the end, the solution that worked was to add an index.
Credit for this goes to my local DBA: "Did you think about adding an index? I just did, to test, and sure enough it works."
Here's the index as added:
CREATE INDEX ixDBO_fir104d__SOURCE_LOCATION__Include
ON [dbo].[fir104d] ([SOURCE_LOCATION])
INCLUDE ([Transaction_Date],[PULL_DATE])
GO
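To confirm that the index exists and to see whether the table is still a heap, you can query the sys.indexes catalog view. A nonclustered index like the one above does not change whether the table is a heap. This is a small sketch that uses dbo.fir104d, the table name from the CREATE INDEX statement.

```sql
-- List every index on the table. A row with index_id = 0 (type_desc = 'HEAP', name NULL)
-- means the table has no clustered index and is stored as a heap.
SELECT i.name,
       i.index_id,
       i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.fir104d')
ORDER BY i.index_id;
```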
I let the job run as scheduled and, sure enough, all is as it was.
My guess is that something in the execution plan would show that it wasn't using an index, or was using the wrong one, but my developer mind can't make much sense of that level of detail.
Thanks to everybody for the time and effort you've all spent.
UPDATE
I received news from a different dev that the data in this table had also become corrupted, to the point where it took "several hours of DBA involvement to resolve." The dev also had to perform some other data fixes (read: data file reloads).
At the end of the day, adding the index was probably a good thing given the way the scheduled job runs, but apparently there was even more to the story!