I have a .NET Remoting service which works fine most of the time. If an exception or error happens, it logs the error to a file but still continues to run.
However, about once every two weeks the service stops responding to clients, which causes the client application to crash with a SocketException with the following message:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
No exception or stack trace is written to our log file, so I can't figure out where the service is crashing, which leads me to believe that the failure is somewhere outside of my code. What additional steps can I take to find the root cause of this crash? I would imagine that it writes something to an event log somewhere, but I am not very familiar with the Windows Event Log system, so I'm not sure where to look.
Thanks in advance for any assistance with this.
EDIT: Forgot to mention: stopping or restarting the service does nothing; the service never responds. I have to kill the process manually before I can start the service again.
EDIT 2:
using System;
using System.Collections;
using System.IO;
using System.Net;
using System.Runtime.Remoting.Channels;
using System.Runtime.Remoting.Messaging;

// Sink provider that inserts ClientIPServerSink into the server channel sink chain.
public class ClientInfoServerSinkProvider :
    IServerChannelSinkProvider
{
    private IServerChannelSinkProvider _nextProvider = null;

    public ClientInfoServerSinkProvider()
    {
    }

    public ClientInfoServerSinkProvider(
        IDictionary properties,
        ICollection providerData)
    {
    }

    public IServerChannelSinkProvider Next
    {
        get { return _nextProvider; }
        set { _nextProvider = value; }
    }

    public IServerChannelSink CreateSink(IChannelReceiver channel)
    {
        IServerChannelSink nextSink = null;
        if (_nextProvider != null)
        {
            nextSink = _nextProvider.CreateSink(channel);
        }
        return new ClientIPServerSink(nextSink);
    }

    public void GetChannelData(IChannelDataStore channelData)
    {
    }
}

// Sink that copies the caller's IP address from the transport headers into the CallContext.
public class ClientIPServerSink :
    BaseChannelObjectWithProperties,
    IServerChannelSink,
    IChannelSinkBase
{
    private IServerChannelSink _nextSink;

    public ClientIPServerSink(IServerChannelSink next)
    {
        _nextSink = next;
    }

    public IServerChannelSink NextChannelSink
    {
        get { return _nextSink; }
        set { _nextSink = value; }
    }

    public void AsyncProcessResponse(
        IServerResponseChannelSinkStack sinkStack,
        Object state,
        IMessage message,
        ITransportHeaders headers,
        Stream stream)
    {
        IPAddress ip = headers[CommonTransportKeys.IPAddress] as IPAddress;
        CallContext.SetData("ClientIPAddress", ip);
        sinkStack.AsyncProcessResponse(message, headers, stream);
    }

    public Stream GetResponseStream(
        IServerResponseChannelSinkStack sinkStack,
        Object state,
        IMessage message,
        ITransportHeaders headers)
    {
        return null;
    }

    public ServerProcessing ProcessMessage(
        IServerChannelSinkStack sinkStack,
        IMessage requestMsg,
        ITransportHeaders requestHeaders,
        Stream requestStream,
        out IMessage responseMsg,
        out ITransportHeaders responseHeaders,
        out Stream responseStream)
    {
        if (_nextSink != null)
        {
            // Record the client IP, then pass the call down the sink chain.
            IPAddress ip =
                requestHeaders[CommonTransportKeys.IPAddress] as IPAddress;
            CallContext.SetData("ClientIPAddress", ip);
            ServerProcessing spres = _nextSink.ProcessMessage(
                sinkStack,
                requestMsg,
                requestHeaders,
                requestStream,
                out responseMsg,
                out responseHeaders,
                out responseStream);
            return spres;
        }
        else
        {
            responseMsg = null;
            responseHeaders = null;
            responseStream = null;
            // Default enum value, which is ServerProcessing.Complete.
            return new ServerProcessing();
        }
    }
}
This is like trying to find out why nobody picks up the phone when you call a friend, when the real problem is that his house burned down to the ground. An imperfect view of what is going on is the core issue, and it is especially bad with a service because there is so little to look at.
This can't get better until you use that telephone to talk to the service programmer and get him involved with the problem. Somebody is going to have to debug this. And yes, it will be difficult: a failure once every two weeks might not be considered critical enough, or two weeks might be too long to sit around waiting for it to happen. The only practical thing you can do to help is to create a minidump of the hung process and pass that to the service programmer so he's got something to poke at. If the service runs on another machine, get the LAN admin involved as well.
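For reference, a dump can be taken without writing any code (Task Manager's "Create dump file" command, or Sysinternals ProcDump with `procdump -ma <pid>`). If you would rather do it from a small utility, the following is a minimal sketch that P/Invokes `MiniDumpWriteDump` from dbghelp.dll. The `MiniDumper` class, its method names, and the dump file naming are my own assumptions for illustration, not anything from the service in question. It is Windows-only, and the caller needs enough rights to open the target process.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper (not part of the original service): writes a
// full-memory minidump of a running process via dbghelp.dll.
public static class MiniDumper
{
    // MINIDUMP_TYPE flag MiniDumpWithFullMemory from the Windows SDK.
    private const uint MiniDumpWithFullMemory = 0x00000002;

    [DllImport("dbghelp.dll", SetLastError = true)]
    private static extern bool MiniDumpWriteDump(
        IntPtr hProcess,
        uint processId,
        SafeHandle hFile,
        uint dumpType,
        IntPtr exceptionParam,
        IntPtr userStreamParam,
        IntPtr callbackParam);

    // Builds "<directory>/<processName>_<pid>.dmp"; the naming scheme is arbitrary.
    public static string BuildDumpPath(string directory, string processName, int pid)
    {
        return Path.Combine(directory, processName + "_" + pid + ".dmp");
    }

    // Writes the dump and returns the path of the file that was created.
    public static string WriteDump(int pid, string directory)
    {
        using (Process target = Process.GetProcessById(pid))
        {
            string path = BuildDumpPath(directory, target.ProcessName, pid);
            using (FileStream file = new FileStream(
                path, FileMode.Create, FileAccess.ReadWrite, FileShare.None))
            {
                bool ok = MiniDumpWriteDump(
                    target.Handle,
                    (uint)pid,
                    file.SafeFileHandle,
                    MiniDumpWithFullMemory,
                    IntPtr.Zero,
                    IntPtr.Zero,
                    IntPtr.Zero);
                if (!ok)
                {
                    throw new Win32Exception(Marshal.GetLastWin32Error());
                }
            }
            return path;
        }
    }
}
```

Calling `MiniDumper.WriteDump(pid, directory)` while the service is hung should leave a .dmp file that can be opened in Visual Studio or WinDbg to inspect what every thread was doing at that moment.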
The issue was a deadlock in my own code. If memory serves, I had two lock objects and acquired one while holding the other, so two threads ended up waiting for each other. I was able to determine this by attaching a debugger to the remote service.
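To illustrate the kind of bug described above, here is a minimal sketch rather than the actual service code. `LockOrderDemo`, `LockA`, and `LockB` are made-up names, and `Barrier` assumes .NET 4.0 or later. The first method has two threads take the same two locks in opposite order, using `Monitor.TryEnter` with a timeout so the demo reports the problem instead of hanging. The second shows the usual fix of always taking the locks in the same order.

```csharp
using System;
using System.Threading;

// Hypothetical demo of a lock-ordering deadlock; not the original service code.
public static class LockOrderDemo
{
    private static readonly object LockA = new object();
    private static readonly object LockB = new object();

    // Holds 'outer', waits until the other thread holds its own outer lock,
    // then tries 'inner' with a timeout instead of blocking forever.
    private static bool TryNested(
        object outer, object inner,
        Barrier bothHoldOuter, Barrier bothTried, TimeSpan timeout)
    {
        lock (outer)
        {
            bothHoldOuter.SignalAndWait();
            bool acquired = Monitor.TryEnter(inner, timeout);
            if (acquired)
            {
                Monitor.Exit(inner);
            }
            // Keep 'outer' held until both threads have finished trying.
            bothTried.SignalAndWait();
            return acquired;
        }
    }

    // Returns true when neither thread could take its inner lock, i.e. plain
    // nested lock statements in this order would have deadlocked.
    public static bool OppositeOrderWouldDeadlock(TimeSpan timeout)
    {
        bool firstAcquired = true;
        bool secondAcquired = true;
        using (Barrier bothHoldOuter = new Barrier(2))
        using (Barrier bothTried = new Barrier(2))
        {
            Thread t1 = new Thread(() => firstAcquired =
                TryNested(LockA, LockB, bothHoldOuter, bothTried, timeout));
            Thread t2 = new Thread(() => secondAcquired =
                TryNested(LockB, LockA, bothHoldOuter, bothTried, timeout));
            t1.Start();
            t2.Start();
            t1.Join();
            t2.Join();
        }
        return !firstAcquired && !secondAcquired;
    }

    // Both threads take LockA before LockB, so no wait cycle can form.
    public static int ConsistentOrderCompletes(int iterations)
    {
        int counter = 0;
        ThreadStart work = () =>
        {
            for (int i = 0; i < iterations; i++)
            {
                lock (LockA)
                {
                    lock (LockB)
                    {
                        counter++;
                    }
                }
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        return counter;
    }
}
```

In a real service the opposite-order case only bites when two calls happen to interleave at exactly the wrong moment, which would fit a hang that shows up once every couple of weeks and leaves nothing in the log.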