Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Thread abort leaves zombie transactions and broken SqlConnection

I feel like this behavior should not be happening. Here's the scenario:

  1. Start a long-running sql transaction.

  2. The thread that ran the sql command gets aborted (not by our code!)

  3. When the thread returns to managed code, the SqlConnection's state is "Closed" - but the transaction is still open on the sql server.

  4. The SQLConnection can be re-opened, and you can try to call rollback on the transaction, but it has no effect (not that I would expect this behavior. The point is there is no way to access the transaction on the db and roll it back.)

The issue is simply that the transaction is not cleaned up properly when the thread aborts. This was a problem with .Net 1.1, 2.0 and 2.0 SP1. We are running .Net 3.5 SP1.

Here is a sample program that illustrates the issue.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using System.Data.SqlClient;
using System.Threading;

namespace ConsoleApplication1
{
    class Run
    {
        static Thread transactionThread;

        public class ConnectionHolder : IDisposable
        {
            public void Dispose()
            {
            }

            public void executeLongTransaction()
            {
                Console.WriteLine("Starting a long running transaction.");
                using (SqlConnection _con = new SqlConnection("Data Source=<YourServer>;Initial Catalog=<YourDB>;Integrated Security=True;Persist Security Info=False;Max Pool Size=200;MultipleActiveResultSets=True;Connect Timeout=30;Application Name=ConsoleApplication1.vshost"))
                {
                    try
                    {
                        SqlTransaction trans = null;
                        trans = _con.BeginTransaction();

                        SqlCommand cmd = new SqlCommand("update <YourTable> set Name = 'XXX' where ID = @0; waitfor delay '00:00:05'", _con, trans);
                        cmd.Parameters.Add(new SqlParameter("0", 340));
                        cmd.ExecuteNonQuery();

                        cmd.Transaction.Commit();

                        Console.WriteLine("Finished the long running transaction.");
                    }
                    catch (ThreadAbortException tae)
                    {
                        Console.WriteLine("Thread - caught ThreadAbortException in executeLongTransaction - resetting.");
                        Console.WriteLine("Exception message: {0}", tae.Message);
                    }
                }
            }
        }

        static void killTransactionThread()
        {
            Thread.Sleep(2 * 1000);

            // We're not doing this anywhere in our real code.  This is for simulation
            // purposes only!
            transactionThread.Abort();

            Console.WriteLine("Killing the transaction thread...");
        }

        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            using (var connectionHolder = new ConnectionHolder())
            {
                transactionThread = new Thread(connectionHolder.executeLongTransaction);
                transactionThread.Start();

                new Thread(killTransactionThread).Start();

                transactionThread.Join();

                Console.WriteLine("The transaction thread has died.  Please run 'select * from sysprocesses where open_tran > 0' now while this window remains open. \n\n");

                Console.Read();
            }
        }
    }
}

There is a Microsoft Hotfix targeted at .Net2.0 SP1 that was supposed to address this, but we obviously have newer DLL's (.Net 3.5 SP1) that don't match the version numbers listed in this hotfix.

Can anyone explain this behavior, and why the ThreadAbort is still not cleaning up the sql transaction properly? Does .Net 3.5 SP1 not include this hotfix, or is this behavior that is technically correct?

like image 799
womp Avatar asked Jun 02 '11 19:06

womp


2 Answers

Since you're using SqlConnection with pooling, your code is never in control of closing the connections. The pool is. On the server side, a pending transaction will be rolled back when the connection is truly closed (socket closed), but with pooling the server side never sees a connection close. W/o the connection closing (either by physical disconnect at the socket/pipe/LPC layer or by sp_reset_connection call), the server cannot abort the pending transaction. So it really boils down to the fact that the connection does not get properly release/reset. I don't understand why you're trying to complicate the code with explicit thread abort dismissal and attempt to reopen a closed transaction (that will never work). You should simply wrap the SqlConnection in an using(...) block, the implied finally and connection Dispose will be run even on thread abort.

My recommendation would be to keep things simple, ditch the fancy thread abort handling and replace it with a plain 'using' block (using(connection) {using(transaction) {code; commit () }}.

Of course I assume you do not propagate the transaction context into a different scope in the server (you do not use sp_getbindtoken and friends, and you do not enroll in distributed transactions).

This little program shows that the Thread.Abort properly closes a connection and the transaction is rolled back:

using System;
using System.Data.SqlClient;
using testThreadAbort.Properties;
using System.Threading;
using System.Diagnostics;

namespace testThreadAbort
{
    class Program
    {
        static AutoResetEvent evReady = new AutoResetEvent(false);
        static long xactId = 0;

        static void ThreadFunc()
        {
            using (SqlConnection conn = new SqlConnection(Settings.Default.conn))
            {
                conn.Open();
                using (SqlTransaction trn = conn.BeginTransaction())
                {
                    // Retrieve our XACTID
                    //
                    SqlCommand cmd = new SqlCommand("select transaction_id from sys.dm_tran_current_transaction", conn, trn);
                    xactId = (long) cmd.ExecuteScalar();
                    Console.Out.WriteLine("XactID: {0}", xactId);

                    cmd = new SqlCommand(@"
insert into test (a) values (1); 
waitfor delay '00:01:00'", conn, trn);

                    // Signal readyness and wait...
                    //
                    evReady.Set();
                    cmd.ExecuteNonQuery();

                    trn.Commit();
                }
            }

        }

        static void Main(string[] args)
        {
            try
            {
                using (SqlConnection conn = new SqlConnection(Settings.Default.conn))
                {
                    conn.Open();
                    SqlCommand cmd = new SqlCommand(@"
if  object_id('test') is not null
begin
    drop table test;
end
create table test (a int);", conn);
                    cmd.ExecuteNonQuery();
                }


                Thread thread = new Thread(new ThreadStart(ThreadFunc));
                thread.Start();
                evReady.WaitOne();
                Thread.Sleep(TimeSpan.FromSeconds(5));
                Console.Out.WriteLine("Aborting...");
                thread.Abort();
                thread.Join();
                Console.Out.WriteLine("Aborted");

                Debug.Assert(0 != xactId);

                using (SqlConnection conn = new SqlConnection(Settings.Default.conn))
                {
                    conn.Open();

                    // checked if xactId is still active
                    //
                    SqlCommand cmd = new SqlCommand("select count(*) from  sys.dm_tran_active_transactions where transaction_id = @xactId", conn);
                    cmd.Parameters.AddWithValue("@xactId", xactId);

                    object count = cmd.ExecuteScalar();
                    Console.WriteLine("Active transactions with xactId {0}: {1}", xactId, count);

                    // Check count of rows in test (would block on row lock)
                    //
                    cmd = new SqlCommand("select count(*) from  test", conn);
                    count = cmd.ExecuteScalar();
                    Console.WriteLine("Count of rows in text: {0}", count);
                }
            }
            catch (Exception e)
            {
                Console.Error.Write(e);
            }

        }
    }
}
like image 196
Remus Rusanu Avatar answered Oct 10 '22 08:10

Remus Rusanu


This is a bug in Microsoft's MARS implementation. Disabling MARS in your connection string will make the problem go away.

If you require MARS, and are comfortable making your application dependent on another company's internal implementation, familiarize yourself with http://dotnet.sys-con.com/node/39040, break out .NET Reflector, and look at the connection and pool classes. You have to store a copy of the DbConnectionInternal property before the failure occurs. Later, use reflection to pass the reference to a deallocation method in the internal pooling class. This will stop your connection from lingering for 4:00 - 7:40 minutes.

There are surely other ways to force the connection out of the pool and to be disposed. Short of a hotfix from Microsoft, though, reflection seems to be necessary. The public methods in the ADO.NET API don't seem to help.

like image 36
Cody Jones Avatar answered Oct 10 '22 08:10

Cody Jones