
IcmpSendEcho2 fails with WSA_QOS_ADMISSION_FAILURE and ERROR_NOACCESS

I have an application that pings a bunch of servers. It runs great for days, but suddenly will have many failures of one of two types:

WSA_QOS_ADMISSION_FAILURE (11010) "A QoS error occurred due to lack of resources"

or

ERROR_NOACCESS (998) "Invalid access to memory location."

The odd thing is that the errors come in bunches, i.e. all pings might fail for a few minutes with one of the above errors, and then it clears up. Later, all pings will fail for a few minutes with the other error. The two never seem to interleave.

This happens on Windows 2008 R2. I can't reproduce it at will, but if I wait for a day or two, it always happens again.

I checked and rechecked, then checked again to ensure I close all handles that were opened.

It never happens when the app first starts, so it doesn't seem to be related to finding or loading DLLs. It fixes itself after a while, so it doesn't seem to be resource exhaustion. And it runs just fine for days, so it doesn't seem to be an API usage problem.

At a loss here. Does anyone have any ideas?

Thanks

DougN asked Dec 14 '22 23:12



1 Answer

It turns out that the error code 11010 is actually not WSA_QOS_ADMISSION_FAILURE from WinSock (which is not involved here), but a completely different value from the IP stack's ICMP_ECHO_REPLY structure, with a much more useful meaning:

IP_REQ_TIMED_OUT   (11010)   The request timed out

You are supposed to call GetIpErrorString() first and only "if the function fails, use FormatMessage to obtain the message string for the returned error".
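To make the ambiguity concrete, here is a minimal sketch in portable C of how one might decide which API should translate a given code. The constants are copied by hand from what I believe the SDK header ipexport.h defines (treat `MAX_IP_STATUS` in particular as an assumption), and `pick_decoder` and the `decode_source` enum are hypothetical names made up for this example. The range check is only a heuristic on top of the documented rule (try GetIpErrorString() first, then FormatMessage); the real Windows calls are indicated in comments so the snippet compiles without the SDK.

```c
/* Values mirrored from the Windows SDK header ipexport.h. They are repeated
   here (an assumption of this sketch) so it builds without the SDK. */
#define IP_STATUS_BASE    11000UL
#define IP_REQ_TIMED_OUT  (IP_STATUS_BASE + 10)   /* 11010 */
#define MAX_IP_STATUS     (IP_STATUS_BASE + 50)   /* assumed upper bound */

/* Hypothetical result type: which API should produce the message text. */
typedef enum {
    DECODE_WITH_IP_HELPER,       /* call GetIpErrorString() */
    DECODE_WITH_FORMAT_MESSAGE   /* call FormatMessage() */
} decode_source;

/* Hypothetical helper: classify an error code returned after a failed ping. */
decode_source pick_decoder(unsigned long code)
{
    /* Codes inside the IP status range collide numerically with WinSock
       codes (11010 is both IP_REQ_TIMED_OUT and WSA_QOS_ADMISSION_FAILURE),
       so they should be translated by the IP Helper, not by FormatMessage. */
    if (code >= IP_STATUS_BASE && code <= MAX_IP_STATUS)
        return DECODE_WITH_IP_HELPER;

    /* Anything else, such as 998 (ERROR_NOACCESS), is an ordinary Win32
       error code, so FormatMessage is the appropriate fallback. */
    return DECODE_WITH_FORMAT_MESSAGE;
}
```

With this, 11010 is routed to the IP Helper and reads as a plain timeout, while 998 still goes through FormatMessage.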


Unfortunately, that does not help with that other value, 998.

One clue might be the page "Mapping NT Status Error Codes to Win32 Error Codes", which shows that several NT status conditions map (or mapped when the page was last updated, in 2005) to the Win32 code 998 (ERROR_NOACCESS), so the code covers more than its message suggests:

STATUS_DATATYPE_MISALIGNMENT            ERROR_NOACCESS
STATUS_ACCESS_VIOLATION                 ERROR_NOACCESS
STATUS_DATATYPE_MISALIGNMENT_ERROR      ERROR_NOACCESS

It seems likely that whenever something fails during the IOCTL call (which passes the ICMP echo request to the kernel, where it is actually handled), the underlying exception is swallowed if possible and only this generic Win32 code is sent back.

Therefore it might be that you are really passing some not entirely correct data to the function (such as an unaligned buffer on the stack, which might explain why it happens sporadically), or the error might even hint at a bug inside the ICMP stack. I'm afraid that only some hardcore kernel debugging could reveal the real cause.
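If you want to rule out the alignment theory cheaply before reaching for a kernel debugger, a check along these lines could be placed in front of the ping call. This is only a sketch: `is_aligned` and `reply_storage` are hypothetical names, and the 8-byte alignment and 256-byte size are values assumed for illustration, not documented requirements of IcmpSendEcho2.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when ptr is a multiple of alignment, 0 otherwise.
   alignment must be a non-zero power of two. */
int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical reply buffer wrapper. alignas forces the byte array to start
   on an 8-byte boundary no matter where the struct is placed (stack, heap,
   or inside another struct). Both 8 and 256 are assumed values. */
struct reply_storage {
    alignas(8) unsigned char bytes[256];
};
```

Logging `is_aligned(buffer, 8)` for the request and reply buffers whenever a ping returns 998 would show whether the failures correlate with a misaligned address; declaring the buffer as a `struct reply_storage` removes the variable altogether.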

Yirkha answered May 09 '23 03:05
