
WCF Net.Msmq Service occasionally faults

Tags:

wcf

msmq-wcf

I have a self-hosted WCF service (it runs inside a Windows service). This service listens for messages on an MSMQ queue. The service is PerCall and transactional, running on Windows 2008 R2, .NET 4.0, MSMQ 5.0.

Once every couple of weeks the service stops processing messages. The Windows service remains running, but the WCF ServiceHost itself stops. The ServiceHost faults with the following exception:

Timestamp: 3/21/2015 5:37:06 PM
Message: HandlingInstanceID: a26ffd8b-d3b4-4b89-9055-4c376d586268
An exception of type 'System.ServiceModel.MsmqException' occurred and was caught.
---------------------------------------------------------------------------------
03/21/2015 13:37:06
Type : System.ServiceModel.MsmqException, System.ServiceModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
Message : An error occurred while receiving a message from the queue: The transaction's operation sequence is incorrect. (-1072824239, 0xc00e0051). Ensure that MSMQ is installed and running. Make sure the queue is available to receive from.
Source : System.ServiceModel
Help link :
ErrorCode : -1072824239
Data : System.Collections.ListDictionaryInternal
TargetSite : Boolean TryReceive(System.TimeSpan, System.ServiceModel.Channels.Message ByRef)
dynatrace_invocationCount : 0
Stack Trace :
   at System.ServiceModel.Channels.MsmqInputChannelBase.TryReceive(TimeSpan timeout, Message& message)
   at System.ServiceModel.Dispatcher.InputChannelBinder.TryReceive(TimeSpan timeout, RequestContext& requestContext)
   at System.ServiceModel.Dispatcher.ErrorHandlingReceiver.TryReceive(TimeSpan timeout, RequestContext& requestContext)

Searching for the particular exception ("The transaction's operation sequence is incorrect") doesn't yield much information, and most suggestions for remedying a faulted service are to restart the ServiceHost from within the Faulted event.

I can do that, but I am hoping that there is a known, fixable cause for this exception and/or a cleaner way to handle it.

JNappi asked Mar 23 '15 21:03


2 Answers

We faced this issue in our product and opened a ticket with Microsoft. In the end they admitted it is a bug in the .NET Framework and said it would be fixed soon.

The issue was reported on Windows Server 2008 and 2012, but never on 2016 or Windows 10.

So we did two things: we recommended that all customers upgrade to Windows Server 2016, and we added code that handles the service host's Faulted event and restarts the service. (You can simulate the same error by restarting the MSMQ service while the WCF service host is open.)

The code to restore the service is below.

First, add an event handler for the host's "Faulted" event:

SH.Faulted += new EventHandler(SH_Faulted);
//SH is the ServiceHost

Then, inside the event handler:

private static void SH_Faulted(object sender, EventArgs e)
{
    if (SH.State != CommunicationState.Opened)
    {
        int intSleep = 15 * 1000;

        // Abort the host
        SH.Abort();

        // Remove the event handler
        SH.Faulted -= new EventHandler(SH_Faulted);

        // Sleep so that MSMQ has enough time to recover; better to make this optional.
        System.Threading.Thread.Sleep(intSleep);
        try
        {
            ReConnectCounter++;
            LogEvent(string.Format("Service '{0}' faulted restarting service count # {1}", serviceName, ReConnectCounter));

            // Restart the service again here
        }
        catch (Exception ex)
        {
            // Failed... you can retry if you like
        }
    }
}
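The handler above leaves the actual restart as a comment. One way that step might look is sketched below. This is not the original poster's code: the class name `RestartingHost`, its members, and the retry loop are hypothetical, and it assumes the endpoints come from app.config. The main point it illustrates is that a faulted ServiceHost cannot be reopened, so a new instance has to be created each time.

```csharp
using System;
using System.ServiceModel;
using System.Threading;

// Hypothetical helper (not from the original answer): owns the ServiceHost
// and replaces it with a fresh instance whenever it faults.
public sealed class RestartingHost
{
    private readonly Type _serviceType;  // WCF service implementation type
    private readonly TimeSpan _delay;    // pause before each reopen attempt
    private readonly object _sync = new object();
    private ServiceHost _host;
    private bool _stopped;

    public RestartingHost(Type serviceType, TimeSpan delay)
    {
        _serviceType = serviceType;
        _delay = delay;
    }

    public void Start()
    {
        lock (_sync)
        {
            if (_stopped) return;

            // A faulted host cannot be reused, so always build a new one.
            var host = new ServiceHost(_serviceType);
            try
            {
                host.Open();
            }
            catch
            {
                host.Abort();  // release whatever the failed Open() acquired
                throw;
            }

            // Subscribing after Open() keeps a failed Open() in the caller's
            // retry loop instead of triggering a nested Faulted callback.
            host.Faulted += OnFaulted;
            _host = host;
        }
    }

    public void Stop()
    {
        lock (_sync)
        {
            _stopped = true;
            if (_host == null) return;

            _host.Faulted -= OnFaulted;
            if (_host.State == CommunicationState.Opened)
                _host.Close();
            else
                _host.Abort();
            _host = null;
        }
    }

    private void OnFaulted(object sender, EventArgs e)
    {
        var faulted = (ServiceHost)sender;
        faulted.Faulted -= OnFaulted;
        faulted.Abort();  // Close() would throw on a faulted host

        while (true)
        {
            Thread.Sleep(_delay);  // give MSMQ time to recover
            try
            {
                Start();
                return;
            }
            catch (Exception)
            {
                // Log here; the queue may still be unavailable, so try again.
            }
        }
    }
}
```

In the Windows service, this could be used as `new RestartingHost(typeof(MyQueueService), TimeSpan.FromSeconds(15)).Start()` in `OnStart`, with `Stop()` called from `OnStop`; `MyQueueService` is a placeholder for the real service type. Note that, like the handler above, the sketch sleeps on the thread that raised the Faulted event.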

The error will eventually happen again, but your service will keep working until Microsoft fixes the issue or you upgrade to Windows Server 2016.

Update: After further investigation, and with help from Microsoft, we found the root cause of the issue, which is the relative order of the following timeouts:

MachineLevelDTCTimeOut (20 minutes) >= DefaultTimeOut (15 minutes) >= WCF service transactionTimeout > receiveTimeout

Adding the following to the configuration file should fix the issue:

<system.transactions>
      <defaultSettings timeout="00:05:00"/>
</system.transactions>

More detailed article: https://blogs.msdn.microsoft.com/asiatech/2013/02/18/wcfmsmq-intermittent-mq_error_transaction_sequence-error/

Sufyan Jabr answered Nov 12 '22 06:11


We have the same problem in our production environment. There is an issue open with Microsoft about it, but unfortunately it has been marked "Closed as Deferred" since 2013. The following workaround is mentioned by EasySR20:

If you set the service's receiveTimeout a few seconds less than the service's transactionTimeout this will prevent the exception from happening and taking down the service host. These are both settings that can be set in the server's app.config file.

I haven't confirmed this resolves the issue, but it's one option.
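As a rough illustration of the workaround quoted above, the two settings might be placed in app.config as follows. The binding and behavior names and the timeout values are made up for the example; the only point is that receiveTimeout is a few seconds shorter than transactionTimeout. The service and endpoint elements would still need to reference these names.

```xml
<configuration>
  <system.serviceModel>
    <bindings>
      <netMsmqBinding>
        <!-- Illustrative value: 10 seconds shorter than transactionTimeout below -->
        <binding name="msmqBinding" receiveTimeout="00:00:50" />
      </netMsmqBinding>
    </bindings>
    <behaviors>
      <serviceBehaviors>
        <behavior name="msmqServiceBehavior">
          <!-- Illustrative value: timeout applied to the service's transactions -->
          <serviceTimeouts transactionTimeout="00:01:00" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
  </system.serviceModel>
</configuration>
```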

We have implemented the service fault restart option instead.

Taylor Buchanan answered Nov 12 '22 07:11