
Getting random 10060 (conn. timeout) errors on a server socket

A few users of my software have recently told me that it doesn't work on Windows 8. After some investigation, it turns out that, for some strange reason, my server socket doesn't always accept connections but lets them time out instead.

Even stranger: it also happens when connecting to localhost, not just when accessing it remotely.

"What have you tried?"

  • Obvious stuff: turn off firewalls (no effect), see if other software does work (it does), try modifying random lines of code (no effect).
  • Less obvious stuff: use a single global WSAStartup call instead of one per client or server instance (no effect).

Reminder: the exact same code works fine on Windows XP and Windows 7, and the problem affects localhost connections as well (so it is not a hardware issue). Also, only a third of the connections fail; the rest work fine.

Okay, now some real code, since that's a lot more useful than all these words.

Socket setup:

int iResult;

struct addrinfo *result = NULL;
struct addrinfo hints;

ZeroMemory(&hints, sizeof(hints));
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;
hints.ai_protocol = IPPROTO_TCP;
hints.ai_flags = AI_PASSIVE;

// "Resolve" our localhost
iResult = getaddrinfo(NULL, port, &hints, &result);
if (iResult != 0) {
    printf("error (2) : %d\n", iResult);
    return false;
}

// Create the socket
listenSocket = socket(result->ai_family, result->ai_socktype, result->ai_protocol);
if (listenSocket == INVALID_SOCKET) {
    freeaddrinfo(result);
    printf("error (3) : %d\n", WSAGetLastError());
    return false;
}

// Bind it
iResult = bind(listenSocket, result->ai_addr, result->ai_addrlen);
if (iResult == SOCKET_ERROR) {
    freeaddrinfo(result);
    closesocket(listenSocket);
    printf("error (4) : %d\n", WSAGetLastError());
    return false;
}

freeaddrinfo(result);

// Listen
iResult = listen(listenSocket, SOMAXCONN);
if (iResult == SOCKET_ERROR) {
    closesocket(listenSocket);
    printf("%d\n", WSAGetLastError());
    return false;
}

As you can probably see, it's taken almost directly from MSDN and should be fine. Besides, it works for 2/3 of the connections, so I really doubt the setup code is at fault.

The receiver code:

    if (listenSocket == INVALID_SOCKET) return false;
#pragma warning(disable:4127)
    fd_set fds;

    SOCKET client;
    do {
        FD_ZERO(&fds);
        FD_SET(listenSocket, &fds);

        struct timeval timeout;
        timeout.tv_sec = 5;
        timeout.tv_usec = 0;
        if (!select(1, &fds, NULL, NULL, &timeout)) continue; // See you next loop!

        struct sockaddr_in addr;
        socklen_t addrlen = sizeof(addr);

        // Accept the socket
        client = accept(listenSocket, (struct sockaddr *)&addr, &addrlen);

        if (client == INVALID_SOCKET) {
            printf("[HTTP] Invalid socket\n");
            closesocket(listenSocket);
            return false;
        }

        // Set a 1s timeout on recv()
        struct timeval tv;
        tv.tv_sec = 1;
        tv.tv_usec = 0;
        setsockopt(client, SOL_SOCKET, SO_RCVTIMEO, (char*)&tv, sizeof(tv));

        // Receive the request
        char recvbuf[513];
        int iResult;
        std::stringbuf buf;

        clock_t end = clock() + CLOCKS_PER_SEC; // 1s from now

        do {
            iResult = recv(client, recvbuf, 512, 0);
            if (iResult > 0) {
                buf.sputn(recvbuf, iResult);
            } else if (iResult == 0) {
                // Hmm...
            } else {
                printf("[HTTP] Socket error: %d\n", WSAGetLastError());
                break;
            }
        } while (!requestComplete(&buf) && clock() < end);

This code prints "[HTTP] Socket error: 10060" (WSAETIMEDOUT), so the code that comes after it is irrelevant here.

The select call is there because the actual loop does some other things as well; I left those out because they're not socket-related.

Even stranger: according to Wireshark, Windows seems to be producing actual network errors: http://i.imgur.com/BIrbD.png

I've been trying to figure this out for a while now, and I'm probably just doing something stupid, so I'd really appreciate any answers.

Tom van der Woerdt asked Oct 05 '12 10:10


2 Answers

I've been working on this annoying issue for an entire day now, and eventually managed to resolve it by rewriting the entire server from scratch with a different implementation. I did trace the issue back to setsockopt, which no longer seems to handle SO_RCVTIMEO properly: the timeout effectively drops to zero seconds, which makes random connections time out.

My new implementation no longer uses a timeout; it is now simply non-blocking and asynchronous. It works very well, but it takes a lot more code.
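The rewritten server itself isn't shown. One much smaller way to stop relying on SO_RCVTIMEO is to wait for data with select() against an overall deadline before each recv(). This is a swapped-in technique, not the non-blocking asynchronous design described above; the names recv_until and socket_t, and the -2 "timed out" return code, are hypothetical choices made for this sketch.

```cpp
#include <cassert>
#include <chrono>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
#else
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
typedef int socket_t;
#endif

// Hypothetical helper: waits at most until `deadline` for data on `s`, then
// reads up to `len` bytes. Returns recv()'s byte count (0 = peer closed),
// -1 on a socket error, or -2 if the deadline passed with no data.
// It never touches SO_RCVTIMEO, so no platform-specific option format is involved.
int recv_until(socket_t s, char* buf, int len,
               std::chrono::steady_clock::time_point deadline) {
    assert(len > 0);
    using namespace std::chrono;
    steady_clock::time_point now = steady_clock::now();
    if (now >= deadline) return -2;
    long long left_us = duration_cast<microseconds>(deadline - now).count();

    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(s, &fds);
    struct timeval tv;  // select() takes a timeval on both Winsock and POSIX
    tv.tv_sec = static_cast<long>(left_us / 1000000);
    tv.tv_usec = static_cast<long>(left_us % 1000000);

    // Winsock ignores the first argument; POSIX needs highest descriptor + 1.
    int ready = select(static_cast<int>(s) + 1, &fds, nullptr, nullptr, &tv);
    if (ready < 0) return -1;   // select() itself failed
    if (ready == 0) return -2;  // deadline reached without data
    int n = static_cast<int>(recv(s, buf, len, 0));
    return n < 0 ? -1 : n;
}
```

Computing the deadline once per request (for example `steady_clock::now() + std::chrono::seconds(1)`) gives the question's receive loop a total one-second budget, instead of a per-call timeout whose units differ between platforms.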

I'll assume that it's simply a bug in Windows 8 that will be fixed with an update before it's released. I doubt that Microsoft wanted to change the Berkeley Sockets API like this.

Tom van der Woerdt answered Oct 06 '22 01:10


On Windows, the SO_RCVTIMEO option requires a DWORD argument in milliseconds, not a timeval structure. See http://msdn.microsoft.com/en-us/library/windows/desktop/ms740476(v=vs.85).aspx.
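Following that documentation, a portable way to set the timeout is to pass a DWORD of milliseconds under Winsock and a struct timeval everywhere else. This is a minimal sketch; set_recv_timeout and the socket_t alias are hypothetical names, and only the setsockopt calls themselves are real API.

```cpp
#include <cassert>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
#else
#include <sys/socket.h>
#include <sys/time.h>
typedef int socket_t;
#endif

// Hypothetical helper: sets the receive timeout on `s` to `timeout_ms`
// milliseconds. Returns 0 on success, -1 (SOCKET_ERROR on Windows) on failure.
int set_recv_timeout(socket_t s, int timeout_ms) {
    assert(timeout_ms >= 0);
#ifdef _WIN32
    // Winsock: the option value is a DWORD holding milliseconds.
    DWORD ms = static_cast<DWORD>(timeout_ms);
    return setsockopt(s, SOL_SOCKET, SO_RCVTIMEO,
                      reinterpret_cast<const char*>(&ms), sizeof(ms));
#else
    // POSIX: the option value is a struct timeval.
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
#endif
}
```

In the question's receiver loop, `set_recv_timeout(client, 1000)` would replace the timeval block before the recv() loop. A value of 0 means "never time out" on both platforms.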

Passing a timeval causes Windows to interpret it as a DWORD, so the seconds member is read as if it were milliseconds: the 1-second timeout in the question becomes 1 ms. I don't know why a timeval argument worked on versions before Windows 8; it was probably an undocumented feature that was removed in Windows 8.
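The arithmetic behind this can be checked without Windows by reinterpreting the first four bytes of a Winsock-style timeval as a DWORD. This sketch assumes Winsock's layout (two `long` fields, 32 bits each on Windows) and a little-endian CPU such as x86; WinsockTimeval, read_as_dword and to_milliseconds are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed mirror of Winsock's struct timeval: `long` is 32 bits on Windows,
// in both 32- and 64-bit builds.
struct WinsockTimeval {
    std::int32_t tv_sec;
    std::int32_t tv_usec;
};

// Mimics Windows reading the option buffer as a DWORD: only the first
// 4 bytes are used, which is tv_sec on a little-endian machine.
std::uint32_t read_as_dword(const WinsockTimeval& tv) {
    std::uint32_t ms;
    std::memcpy(&ms, &tv, sizeof ms);
    return ms;
}

// The value the documentation asks for: the timeout in milliseconds.
std::uint32_t to_milliseconds(const WinsockTimeval& tv) {
    assert(tv.tv_sec >= 0 && tv.tv_usec >= 0);
    return static_cast<std::uint32_t>(tv.tv_sec) * 1000u +
           static_cast<std::uint32_t>(tv.tv_usec) / 1000u;
}
```

For the question's `tv_sec = 1, tv_usec = 0`, read_as_dword returns 1 while to_milliseconds returns 1000, so Windows would apply a 1 ms receive timeout; that would be consistent with some requests not arriving in time and recv() failing with 10060.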

Crush answered Oct 06 '22 01:10