 

What could cause a Windows Service to hang when a Console App doing the exact same thing using the exact same base libraries doesn't?

I hate asking questions like this - they're so undefined... and undefinable, but here goes.

Background: I've got a DLL that contains the guts of a timed-process application. My timer receives a configuration for the interval at which it runs and a delegate to invoke each time the interval elapses. I've got another DLL that contains the process that I inject.
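The arrangement described above (a timer that receives an interval plus an injected callback) might look roughly like the minimal Python sketch below. The original .NET code isn't shown, so `TimedProcess`, `start` and `stop` are hypothetical names, not the author's class. The sketch logs a "starting" and a "finished" line around every run, so a "starting" line with no matching "finished" points at the call that hung. It also logs exceptions from the callback instead of letting them end the loop silently.

```python
import logging
import threading
import time
from typing import Callable

log = logging.getLogger("timed_process")


class TimedProcess:
    """Hypothetical stand-in for the timed-process class described above.

    Runs `action` every `interval_seconds` on a background thread until
    stop() is called.
    """

    def __init__(self, interval_seconds: float, action: Callable[[], None]) -> None:
        self.interval_seconds = interval_seconds
        self.action = action  # the injected delegate
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: float = 5.0) -> bool:
        """Ask the loop to exit; returns False if the thread is still busy after `timeout`."""
        self._stop.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()

    def _run(self) -> None:
        # Event.wait() returns True as soon as stop() is called, ending the loop promptly.
        while not self._stop.wait(self.interval_seconds):
            log.info("tick: starting action")
            try:
                self.action()
            except Exception:
                # Log and keep going rather than letting the failure vanish.
                log.exception("action raised; continuing with next interval")
            log.info("tick: action finished")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    process = TimedProcess(0.1, lambda: print("doing the injected work"))
    process.start()
    time.sleep(0.35)
    print("stopped cleanly:", process.stop())
```

In a service, the equivalent of `stop()` would be called from the stop handler; if it returns `False`, the injected action is still blocked, which is the same symptom as a service that "dies quietly, without stopping".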

I created two applications, one Windows Service and one Console Application. Each application reads its own configuration file and loads the same libraries, passing the configured timer interval and delegate into my timed-process class.

Problem: Yesterday, and for the last n weeks, everything was working fine in our production environment using the Windows Service. Today, the Windows Service runs for around 20-30 minutes and then hangs (with a timer interval of 30 seconds), while the Console Application has run without issue for the past 4 hours. Detailed logging doesn't indicate any failure. It's as if the Windows Service just... dies quietly, without stopping.

Given that my Windows Service and Console Application are doing exactly the same thing, I can only assume that something is causing the Windows Service process to hang, but I have no idea what. I've checked the configuration files and they're identical; I even copied and pasted the contents of one into the other just to be sure. No dice.

Can anyone suggest what might cause a Windows Service to hang when a counterpart Console Application using the same base libraries doesn't? Or can anyone point me toward tools that would help me diagnose this issue?

Thanks for everyone's help - still digging.

asked Dec 13 '22 01:12 by BenAlabaster


2 Answers

You need to figure out what changed on the production server. At first, the IT people responsible will swear that nothing changed, but you have to be persistent. I've seen this happen so often I've lost count. Software doesn't spoil. Period. The change must have been to the environment.

Difference in execution: you have two apps running the same code. The most likely difference (and culprit) is that the service runs under a different set of security credentials from your console app and might fall victim to security vagaries. Check that first. Which Windows account is running the service? What are its role and scope? Is any third-party security software running on the server, perhaps killing errant apps? Do you have to register your service with a third-party security service? Is your .NET assembly properly signed? Are your .NET assemblies properly registered and configured on the server? Last but not least, don't forget that a user with debugger rights, which you most likely are, gets away with a lot more than many other account types.
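One quick way to answer "Which Windows account is running the service?" is `sc qc <servicename>`, whose `SERVICE_START_NAME` line names the configured account. The Python sketch below parses that output. `parse_sc_fields` and `service_account` are hypothetical helper names, and `service_account` only works on Windows, where `sc.exe` exists.

```python
import re
import subprocess
from typing import Dict, Optional

# Matches lines such as "        SERVICE_START_NAME : LocalSystem".
_FIELD = re.compile(r"^\s*([A-Z0-9_]+)\s*:\s?(.*)$")


def parse_sc_fields(sc_output: str) -> Dict[str, str]:
    """Collect the 'KEY : value' lines printed by sc.exe into a dictionary.

    Lines without a KEY (the "[SC] ..." header, continuation lines of
    multi-line values) are skipped.
    """
    fields: Dict[str, str] = {}
    for line in sc_output.splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def service_account(service_name: str) -> Optional[str]:
    """Windows only: return the account the service is configured to run as."""
    result = subprocess.run(
        ["sc", "qc", service_name], capture_output=True, text=True, check=True
    )
    return parse_sc_fields(result.stdout).get("SERVICE_START_NAME")
```

Comparing the returned account (for example `LocalSystem` or `NT AUTHORITY\NetworkService`) with the login you run the console app under shows whether the two processes really have the same rights.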

Another thought: since timing seems to be part of the issue, check the scheduled tasks on the machine. Perhaps a process set to run every 30 minutes is interfering with yours.

answered Feb 09 '23 12:02 by Paul Sasik


You can debug a Windows service by running it interactively within Visual Studio. This may help you to isolate the problem by setting (perhaps conditional) breakpoints.

Alternatively, you can use the Visual Studio "Attach to Process" dialog to find the service process and attach to it with the "Debug CLR" option enabled. Again, this allows you to set breakpoints as needed.

Are you using any assertions? If an assertion fires without being redirected to a log file, your service will hang. If the code throws an unhandled exception (perhaps because a memory leak eventually exhausts available memory), your service process will crash. If you configure the Service Control Manager (SCM) to restart your process after a crash, you should be able to see that the service has been restarted. As you have identical code running in both environments, neither situation seems likely. But remember that your service is hosted by the SCM, which is a very different environment from the one your console app runs in.

I often use a "heartbeat", where each active thread in the service sends a regular (say every 30 seconds) message to a local MSMQ. This enables manual or automated monitoring, and should give you some clues when these heartbeat messages stop arriving.
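The heartbeat logic can be sketched in a few lines. This Python version swaps the MSMQ queue for an in-process dictionary guarded by a lock, purely to show the idea; `HeartbeatRegistry`, `beat` and `stale` are hypothetical names. Each worker records a timestamp every time it completes a cycle, and a monitor reports any worker that has been silent for longer than a threshold.

```python
import threading
import time
from typing import Dict, List, Optional


class HeartbeatRegistry:
    """In-process stand-in for the MSMQ heartbeat queue (illustration only)."""

    def __init__(self, max_silence: float) -> None:
        self.max_silence = max_silence          # seconds without a beat before a worker counts as stale
        self._last_beat: Dict[str, float] = {}
        self._lock = threading.Lock()           # beat() is called from many threads

    def beat(self, worker: str, now: Optional[float] = None) -> None:
        """Record that `worker` is alive; `now` can be injected for testing."""
        with self._lock:
            self._last_beat[worker] = time.monotonic() if now is None else now

    def stale(self, now: Optional[float] = None) -> List[str]:
        """Return, sorted, the workers whose last beat is older than max_silence."""
        current = time.monotonic() if now is None else now
        with self._lock:
            return sorted(
                worker
                for worker, last in self._last_beat.items()
                if current - last > self.max_silence
            )
```

With a 30-second heartbeat, a threshold of two or three missed beats (an assumption, not a rule) avoids false alarms from one slow cycle. Whichever worker goes quiet first, and when, narrows down where the hang starts.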

Another possibility is some sort of permissions problem, because the service is probably running as a different local or domain user from the console app.

After the hang, can you use the SCM to stop the service? If you can't, there is probably some sort of thread deadlock. After the service appears to hang, open a command prompt and type sc queryex servicename. This should give you the current STATE of the service.
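To capture that STATE automatically (for example from a scheduled monitoring script), the output of `sc queryex` can be parsed. In the Python sketch below, `ServiceStatus`, `parse_queryex` and `query_service` are hypothetical names, and `query_service` only works on Windows. It also extracts the `PID` line, which tells you which process to attach a debugger to. A service that stays in `3 STOP_PENDING` after you ask the SCM to stop it is one way the "can't stop it" case shows up.

```python
import re
import subprocess
from typing import NamedTuple, Optional


class ServiceStatus(NamedTuple):
    state_code: int        # e.g. 1 = STOPPED, 3 = STOP_PENDING, 4 = RUNNING
    state_name: str
    pid: Optional[int]     # process ID, or None if the PID line is missing


_STATE = re.compile(r"^\s*STATE\s*:\s*(\d+)\s+(\w+)", re.MULTILINE)
_PID = re.compile(r"^\s*PID\s*:\s*(\d+)", re.MULTILINE)


def parse_queryex(sc_output: str) -> ServiceStatus:
    """Extract STATE and PID from the text printed by `sc queryex <name>`."""
    state = _STATE.search(sc_output)
    if state is None:
        raise ValueError("no STATE line found; did sc report an error?")
    pid = _PID.search(sc_output)
    return ServiceStatus(
        int(state.group(1)), state.group(2), int(pid.group(1)) if pid else None
    )


def query_service(service_name: str) -> ServiceStatus:
    """Windows only: run `sc queryex` for one service and parse the result."""
    result = subprocess.run(
        ["sc", "queryex", service_name], capture_output=True, text=True, check=True
    )
    return parse_queryex(result.stdout)
```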

answered Feb 09 '23 12:02 by HTTP 410