
NServiceBus MSMQ messages intermittently get stuck on the Outgoing Queue

We have a Pub/Sub system based on NServiceBus, and we have an intermittent issue where messages get stuck on the Publisher's outgoing queue indefinitely, rather than being transmitted to the Subscribers' input queues.

Points to note:

  1. When we restart the Publisher Service and Subscriber services, message flow resumes normally for a while.
  2. The problem seems to occur more often after a sustained idle period between messages.
  3. The publisher service resides on the LAN; the subscribers are on the other side of a firewall.
  4. Some messages get through! As mentioned, after service restarts things go fine for a while.
  5. Using QueueExplorer, I can see the messages on the Outgoing queue have a state of WAITING.

Annoyingly, our development environment does not exhibit this behaviour, but then again the publisher and subscribers all reside on the same LAN in that environment.

Scott Ferguson asked Dec 14 '11 03:12


4 Answers


MSMQ messages being stuck in an outgoing queue is purely an MSMQ issue.
Restarting the Publisher and Subscriber services should make no difference, as they are not directly involved in message delivery. If you can fix the problem by restarting ONLY the Pub/Sub services and NOT the Message Queuing services, then it looks like a resource/memory leak problem.

I imagine something like this happening:

  1. Messages flow to destination, which uses up kernel memory in storing them
  2. For some reason, kernel memory runs out (too many messages, memory leak, whatever)
  3. Destination now rejects new messages as they cannot be loaded into memory from the wire
  4. Connection is reset and not re-connected until WaitTime value reached; Queue is "waiting" at this point
  5. System loops through (3) and (4) until ...
  6. Pub/Sub services are restarted and now there are sufficient resources for messages to be delivered
  7. Goto (2)

Occasional messages get through when just enough kernel memory is temporarily freed up by one of the many services and device drivers that use it.

Item 4 of this blog post is the most likely culprit: http://blogs.msdn.com/b/johnbreakwell/archive/2006/09/18/insufficient-resources-run-away-run-away.aspx

Cheers
John Breakwell

John Breakwell answered Nov 15 '22 18:11


We had a similar scenario in production. It turned out we had migrated one of our subscriber endpoints to a new physical host and forgot to unsubscribe before shutting down the old endpoint. Our publisher was trying to deliver messages to both the old and new endpoints but could only reach the new one. Eventually the publisher's outbound queue grew so large that it started affecting all outgoing messages.
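A quick way to look for a stale subscriber like this is to check whether each subscribed host still accepts TCP connections on the MSMQ port. The following is only a minimal sketch: it assumes the default MSMQ TCP port (1801) and that you can get the list of subscriber hosts from your subscription storage. The function names and host names are made up for illustration.

```python
import socket

MSMQ_TCP_PORT = 1801  # default TCP port MSMQ uses for message transfer


def is_reachable(host, port=MSMQ_TCP_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable_subscribers(hosts, port=MSMQ_TCP_PORT, timeout=3.0):
    """Return the hosts that do not accept a TCP connection on `port`."""
    return [host for host in hosts if not is_reachable(host, port, timeout)]


if __name__ == "__main__":
    # Hypothetical host names; substitute the hosts your publisher sends to.
    subscribers = ["old-subscriber-host", "new-subscriber-host"]
    for host in unreachable_subscribers(subscribers):
        print(f"cannot reach MSMQ port on {host}")
```

An open port does not prove the destination queue exists or is healthy, so treat this as a first check only. A host that shows up as unreachable is a candidate for an old subscription that was never removed.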

Charlie Barker answered Nov 15 '22 18:11


I have run into this issue as well. I know it is not Item 4, as I don't send anything to the destination before the message gets stuck in the outgoing queue. If I let both publisher and subscriber sit idle for about 10 minutes before sending a message, it never leaves the outgoing queue. If I send a message before that amount of time has passed, it flows fine. Also, if I restart the subscriber, the message will then flow. This is reproducible every time I let them sit idle for 10 minutes.

I think I found the answer here; at least, this fixed the issue I was having:

http://support.microsoft.com/kb/2554746

Also, in my case it had nothing to do with restarting, so don't let that throw you off. I did see the symptoms in the netstat output, and messages would initially go through when the client was first started up.
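If you want to watch the MSMQ connections in netstat while reproducing the idle period, filtering the output down to the MSMQ port makes it easier to read. This is a rough sketch rather than anything from the KB article: it assumes the default MSMQ TCP port (1801) and the column layout of Windows `netstat -an`. The sample text and addresses are invented for illustration.

```python
MSMQ_TCP_PORT = 1801  # default TCP port MSMQ uses for message transfer


def msmq_connections(netstat_output, port=MSMQ_TCP_PORT):
    """Pick out TCP rows whose remote port is `port` from `netstat -an` text.

    Returns a list of (local_address, remote_address, state) tuples.
    """
    connections = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: TCP <local address> <remote address> <state>
        if len(fields) != 4 or fields[0].upper() != "TCP":
            continue
        _, local, remote, state = fields
        # rpartition also copes with IPv6 addresses such as [::1]:1801
        if remote.rpartition(":")[2] == str(port):
            connections.append((local, remote, state))
    return connections


# Invented sample in the assumed Windows `netstat -an` layout.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:49231         10.0.1.9:1801          ESTABLISHED
  TCP    10.0.0.5:49300         10.0.1.9:443           TIME_WAIT
  TCP    0.0.0.0:1801           0.0.0.0:0              LISTENING
"""

if __name__ == "__main__":
    for local, remote, state in msmq_connections(SAMPLE):
        print(local, "->", remote, state)
```

Run on the publisher, this shows only the outbound connections to subscribers' MSMQ ports, so you can compare their state right after startup with their state after the idle period.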

BlackICE answered Nov 15 '22 18:11


Just to throw my 2p in:

We had an issue where the message queuing service had some kind of memory leak and would consume large amounts of memory, which it did not release.

This led to messages getting stuck for long periods of time, although they would eventually be delivered (sometimes after 3 days).

We have not bothered fixing this yet, as it only happens when the service is under heavy load, which does not happen often.

tom redfern answered Nov 15 '22 19:11