 

How to salvage a SQL Server 2008 query from the KILLED/ROLLBACK state?

I have a stored procedure that inserts batches of millions of rows, produced by a certain query, into a SQL Server database. It has one parameter that selects the batch; when this parameter is omitted, it gathers the list of batches and recursively calls itself in order to iterate over them. In (pseudo-)code, it looks something like this:

CREATE PROCEDURE spProcedure @code INT = 0 AS BEGIN
    IF @code = 0 BEGIN
        -- Declare and open a cursor over the batch codes,
        -- and fetch the first one into @code
        ...
        WHILE @@FETCH_STATUS = 0 BEGIN
            EXEC spProcedure @code
            FETCH NEXT ... INTO @code
        END
    END
    ELSE BEGIN

        -- Disable indexes
        ...

        -- Insert the rows for this batch
        INSERT INTO table
        SELECT (...)

        -- Enable indexes
        ...

    END
END

Now it can happen that this procedure is slow, for whatever reason: it can't get a lock, or one of the indexes it uses is misdefined or disabled. In that case, I want to be able to kill the procedure, truncate and recreate the resulting table, and try again. However, when I try to kill the procedure, the process frequently oozes into a KILLED/ROLLBACK state from which there seems to be no return. From Google I have learned to run sp_lock, find the spid, and then kill it with KILL <spid>. But when I try to kill it, it tells me

SPID 75: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 554 seconds.
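For concreteness, the sequence looks roughly like this (75 is the spid from the message above; the WITH STATUSONLY form just re-reports the same rollback estimate):

-- Find the spid holding the locks
EXEC sp_lock;

-- Try to kill it
KILL 75;

-- Ask SQL Server how far the rollback has progressed
KILL 75 WITH STATUSONLY;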

I did find a forum message hinting that another spid should be killed before this one can start its rollback. But that didn't work for me either, and I don't understand why that would be the case... could it be because I am recursively calling my own stored procedure? (But that should all run under the same spid, right?)

In any case, my process is just sitting there, dead, not responding to kills, and locking the table. This is very frustrating, as I want to get on with developing my queries, not wait for hours while my server sits there dead, pretending to finish a supposed rollback.

Is there some way to tell the server not to store any rollback information for my query? Or not to allow any other queries to interfere with the rollback, so that it does not take so long? Or how should I rewrite my query, or how do I kill the process successfully without restarting the server?

asked May 12 '10 by littlegreen

4 Answers

A few comments.

First, gbn is correct that you cannot cancel a rollback in progress. This is how SQL Server maintains transactional integrity, and you wouldn't want that behavior changed. If you absolutely don't care and just want to get your DB back to where it was when the last backup was taken, then follow his steps.

One caveat, however. There are times I've seen where a spid isn't really rolling back, it's just stuck (usually at 0% or 100% progress). The most reliable indicator in this case is that the CPU/IO counters for the spid in Activity Monitor are not changing (and the spid isn't being blocked by another spid). In that case, you might have to restart the SQL Server service (you don't need to reboot the whole machine) to clear the spid.
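If you prefer a query to Activity Monitor, the same counters are visible in sys.dm_exec_requests (75 here is just a placeholder for the spid in question); run it a few times and see whether the numbers move:

-- If cpu_time, reads and writes stop increasing between runs
-- and blocking_session_id is 0, the spid is probably stuck, not rolling back
SELECT session_id,
       status,
       command,
       percent_complete,
       cpu_time,
       reads,
       writes,
       blocking_session_id
FROM   sys.dm_exec_requests
WHERE  session_id = 75;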

With regard to reorganizing your query so that these rollbacks don't cripple you: yes, it's possible. Just use explicit transactions:

    WHILE @@FETCH_STATUS = 0 BEGIN
        BEGIN TRAN
            EXEC spProcedure @code
        COMMIT TRAN
        FETCH NEXT ... INTO @code
    END

This way the data is committed after each batch. If you run into an issue and have to kill the spid, it should only roll back the batch it is currently working on.

If even a single batch is too much, you could probably refactor spProcedure itself to insert in smaller chunks of 10k-100k records, committing after each one.
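A rough sketch of that chunked approach (the table names, the 50,000 chunk size, and the NOT EXISTS check for rows already copied are all placeholders to adapt to your own schema):

DECLARE @rows INT = 1;

WHILE @rows > 0 BEGIN
    BEGIN TRAN;

    -- Copy the next chunk of rows that haven't been inserted yet
    INSERT INTO dbo.TargetTable (Id, Payload)
    SELECT TOP (50000) s.Id, s.Payload
    FROM dbo.SourceTable AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.TargetTable AS t WHERE t.Id = s.Id);

    SET @rows = @@ROWCOUNT;

    COMMIT TRAN;
END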

answered by BradC


It's rolling back the transaction.

It will keep doing this even if you restart the instance.

If you were 99 million rows into a 100 million row insert or delete, all 99 million need to be rolled back. You cannot change this behavior: any single DML statement is atomic.

If you want to fix it:

  • Stop SQL Server
  • Delete the DB files
  • Start SQL Server
  • DROP the DB in its broken state
  • Restore

YMMV of course :-)
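Scripted out, the SQL side of those steps is roughly the following (the database name and backup path are placeholders; stopping the service and deleting the files happens outside SQL Server):

-- After the service has been stopped, the files deleted, and the service restarted,
-- the database comes back in a broken state; drop the metadata and restore
DROP DATABASE MyDatabase;

RESTORE DATABASE MyDatabase
FROM DISK = N'D:\Backups\MyDatabase.bak'
WITH REPLACE, RECOVERY;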

answered by gbn


Been here. The last time for us was processing somewhere around 3.2 billion records. The initial statement went to 99% complete within ten minutes, then spent 20 hours on I/O.

The first time through, it ran for about 8 hours before an automated backup job killed the process and bounced the server. It took nearly 2 days to get it back online so that we could start the process over, this time making sure the backup job was turned off.

answered by NotMe


As I understand it, no: not without potentially wrecking the consistency of the data file by doing something nasty like a hard reset. Even then, SQL Server will go into recovery on startup and still carry out the rollback, to make sure the interrupted transaction is successfully undone.

Unless something like a NOLOCK hint can get you past the existing lock (you haven't mentioned whether it's an exclusive lock), you could probably still script out the schema of the table as, say, MyTable2, keep writing your queries against that, and go back and alter them once the rollback has finished.
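Something along these lines would give you an empty structural copy to develop against (dbo.MyTable and MyTable2 are placeholders, and the NOLOCK hint may or may not get past the lock, depending on what is actually held):

-- Copies the column definitions but no rows (and no indexes or constraints)
SELECT *
INTO dbo.MyTable2
FROM dbo.MyTable WITH (NOLOCK)
WHERE 1 = 0;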

answered by Andrew