I am using epoll in what I believe to be the typical manner for TCP sockets (based largely on this example, but slightly adapted to C++); one main listening socket bound to the port, and each new connection socket (from accept()) is also added for alerts when it's ready for recv(). I have created a test script that basically hammers it with connections and send/receives. When any single client is connected, it will work flawlessly, endlessly.
However, adding a second simultaneous test client will cause one of them to hang and fail. After a couple days of debugging, I finally decided to just have it spit out the socket ID it's working with into a file, and I am perplexed by what I found.
When one script starts, I get just a stream of, in this case, 6. However, when the second script starts, I get a stream of 7. Just 7. And it remains at 7, exclusively communicating with the second client, completely ignoring the first, until the first one reaches its timeout and closes. (Then, when client 2 reconnects, it gets ID 6 instead.)
It is worth noting that this test script does not use a persistent connection; it disconnects and reconnects after a few messages go back and forth (for a more accurate simulation). But even through that, client 1 is ignored. If I set the timeout high enough that client 2 actually has time to exit, the server still won't resume with client 1, as whatever it was waiting for just kinda gets lost.
Is this normal behaviour, for epoll (or sockets in general) to completely abandon a previous task when a new one arises? Is there some option I have to specify?
EDIT: This is as much of the code as I can show; I'm not necessarily expecting a "this is what you did wrong", more of a "these are some things that will break/fix a similar situation".
    #define EVENTMODE (EPOLLIN | EPOLLET | EPOLLRDHUP | EPOLLHUP)
    #define ERRCHECK (EPOLLERR | EPOLLHUP | EPOLLRDHUP)

    //Setup event buffer:
    struct epoll_event* events = (epoll_event*)calloc(maxEventCount, sizeof(event));

    //Setup done, main processing loop:
    int iter, eventCount;
    while (1) {
        //Wait for events indefinitely:
        eventCount = epoll_wait(pollID, events, maxEventCount, -1);
        if (eventCount < 0) {
            syslog(LOG_ERR, "Poll checking error, continuing...");
            continue;
        }
        for (iter = 0; iter<eventCount; ++iter) {
            int currFD = events[iter].data.fd;
            cout << "Working with " << events[iter].data.fd << endl;
            if (events[iter].events & ERRCHECK) {
                //Error or hangup:
                cout << "Closing " << events[iter].data.fd << endl;
                close(events[iter].data.fd);
                continue;
            } else if (!(events[iter].events & EPOLLIN)) {
                //Data not really ready?
                cout << "Not ready on " << events[iter].data.fd << endl;
                continue;
            } else if (events[iter].data.fd == socketID) {
                //Event on the listening socket, incoming connections:
                cout << "Connecting on " << events[iter].data.fd << endl;
                //Set up accepting socket descriptor:
                int acceptID = accept(socketID, NULL, NULL);
                if (acceptID == -1) {
                    //Error:
                    if (!(errno == EAGAIN || errno == EWOULDBLOCK)) {
                        //NOT just letting us know there's nothing new:
                        syslog(LOG_ERR, "Can't accept on socket: %s", strerror(errno));
                    }
                    continue;
                }
                //Set non-blocking:
                if (setNonBlocking(acceptID) < 0) {
                    //Error:
                    syslog(LOG_ERR, "Can't set accepting socket non-blocking: %s", strerror(errno));
                    close(acceptID);
                    continue;
                }
                cout << "Listening on " << acceptID << endl;
                //Add event listener:
                event.data.fd = acceptID;
                event.events = EVENTMODE;
                if (epoll_ctl(pollID, EPOLL_CTL_ADD, acceptID, &event) < 0) {
                    //Error adding event:
                    syslog(LOG_ERR, "Can't edit epoll: %s", strerror(errno));
                    close(acceptID);
                    continue;
                }
            } else {
                //Data on accepting socket waiting to be read:
                cout << "Receive attempt on " << event.data.fd << endl;
                cout << "Supposed to be " << currFD << endl;
                if (receive(event.data.fd) == false) {
                    sendOut(event.data.fd, streamFalse);
                }
            }
        }
    }
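On the "things that will break a similar situation" front: the registration of the listening socket is not shown above, but if it is also registered with EPOLLET, a single accept() per event can strand connections that arrived in the same burst. A minimal sketch of draining the accept queue, assuming a non-blocking listening socket; `setNonBlockingSketch` and `acceptAll` are hypothetical names standing in for code the post does not show:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical stand-in for setNonBlocking(), whose body is not shown in the post.
static int setNonBlockingSketch(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

// Accept every pending connection on a non-blocking listening socket and
// register each one with epoll. Returns the number registered, or -1 if
// accept() fails with something other than "nothing left to accept".
static int acceptAll(int epollFD, int listenFD, uint32_t eventMask) {
    assert(epollFD >= 0 && listenFD >= 0);
    int registered = 0;
    for (;;) {
        int fd = accept(listenFD, nullptr, nullptr);
        if (fd < 0) {
            if (errno == EINTR) continue;                        // interrupted, retry
            if (errno == EAGAIN || errno == EWOULDBLOCK) break;  // queue drained
            return -1;                                           // real error
        }
        epoll_event ev{};
        ev.events = eventMask;
        ev.data.fd = fd;
        if (setNonBlockingSketch(fd) < 0 ||
            epoll_ctl(epollFD, EPOLL_CTL_ADD, fd, &ev) < 0) {
            close(fd);  // could not set up this connection, drop it
            continue;
        }
        ++registered;
    }
    return registered;
}
```

In the loop above this would replace the single accept() call, e.g. `acceptAll(pollID, socketID, EVENTMODE)`.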
EDIT: The code has been revised, and the removal of edge-triggering will indeed stop epoll from locking onto one client. It still has issues with clients not receiving data; debugging is under way to see if it's the same issue or something else.
EDIT: It seems to be the same error in a different suit. It does try to receive on the second socket, but further logging reports that it actually hits an EWOULDBLOCK almost every time. Interestingly enough, the logs are reporting much more activity than is warranted - over 150,000 lines, when I'd expect about 60,000. Deleting all the "Would block" lines reduces it to about the number I'd expect... and lo and behold, the resulting lines create the exact same pattern. Putting edge-triggering back in stops the would-block behaviour, apparently preventing it from spinning its wheels as fast as it can for no apparent reason. Still doesn't solve the original problem.
EDIT: Just to cover my bases, I figured I'd do more debugging on the sending side, since the hung client is obviously waiting for a message it never gets. However, I can confirm that the server sends a response for every request it processes; the hung client's request just gets lost entirely, and therefore never responded to.
I have also made sure that my receive loop reads until it actually hits EWOULDBLOCK (this is normally unnecessary because the first two bytes of my message header contain the message size), but it didn't change anything.
'Nother EDIT: I should probably clarify that this system uses a request/reply format, and the receiving, processing, and sending is all done in one shot. As you may guess, this requires reading the receive buffer until it's empty, the primary requirement for edge-triggered mode. If the received message is incomplete (which should never happen), the server basically returns false to the client, which while technically an error will still allow the client to proceed with another request.
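Since TCP is a byte stream, a request can arrive split across reads or glued to the next one, so "incomplete" is not strictly impossible. A minimal sketch of per-connection framing, assuming (as receive() appears to) that the first two bytes hold the total message size in network byte order, header included, with a minimum of 4; `FrameBuffer` and `feed` are hypothetical names, not part of the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-connection state: bytes received so far that have not
// yet formed a complete message.
struct FrameBuffer {
    std::string pending;
    bool malformed = false;  // set when a size field below the minimum is seen

    // Append freshly read bytes and return every complete message now
    // available. Leftover bytes stay in `pending` for the next call.
    std::vector<std::string> feed(const char* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        pending.append(data, len);
        std::vector<std::string> complete;
        while (pending.size() >= 2) {
            // Big-endian 16-bit total size, read byte by byte to avoid
            // alignment and aliasing problems.
            std::size_t fullSize =
                (static_cast<std::size_t>(static_cast<unsigned char>(pending[0])) << 8) |
                static_cast<std::size_t>(static_cast<unsigned char>(pending[1]));
            if (fullSize < 4) {  // assumed minimum, taken from the check in receive()
                malformed = true;
                pending.clear();
                break;
            }
            if (pending.size() < fullSize) break;  // wait for more bytes
            complete.push_back(pending.substr(0, fullSize));
            pending.erase(0, fullSize);
        }
        return complete;
    }
};
```

With one `FrameBuffer` per connection, the read loop would hand each chunk to `feed()` and process whatever complete messages come back, rather than assuming one read equals one request.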
Debugging has confirmed that the client that hangs sends out a request and waits for a response, but that request never triggers anything in epoll; the server completely ignores the first client once the second is connected.
I also removed the attempt to receive immediately after accepting; in a hundred thousand tries, it wasn't ready once.
More EDIT: Fine, fine - if there's one thing that can goad me into an arbitrary task, it's questioning my ability. So, here, the function where everything must be going wrong:
    bool receive(int socketID)
    {
        short recLen = 0;
        char buff[BUFFERSIZE];
        FixedByteStream received;
        short fullSize = 0;
        short diff = 0;
        short iter = 0;
        short recSoFar = 0;
        //Loop through received buffer:
        while ((recLen = read(socketID, buff, BUFFERSIZE)) > 0) {
            cout << "Receiving on " << socketID << endl;
            if (fullSize == 0) {
                //We don't know the size yet, that's the first two bytes:
                fullSize = ntohs(*(uint16_t*)&buff[0]);
                if (fullSize < 4 || recLen < 4) {
                    //Something went wrong:
                    syslog(LOG_ERR, "Received nothing.");
                    return false;
                }
                received = FixedByteStream(fullSize);
            }
            diff = fullSize - recSoFar;
            if (diff > recLen) {
                //More than received bytes left, get them all:
                for (iter=0; iter<recLen; ++iter) {
                    received[recSoFar++] = buff[iter];
                }
            } else {
                //Less than or equal to received bytes left, get only what we need:
                for (iter=0; iter<diff; ++iter) {
                    received[recSoFar++] = buff[iter];
                }
            }
        }
        if (recLen < 0 && errno == EWOULDBLOCK) {
            cout << "Would block on " << socketID << endl;
        }
        if (recLen < 0 && errno != EWOULDBLOCK) {
            //Had an error:
            cout << "Error on " << socketID << endl;
            syslog(LOG_ERR, "Connection receive error: %s", strerror(errno));
            return false;
        } else if (recLen == 0) {
            //Nothing received at all?
            cout << "Received nothing on " << socketID << endl;
            return true;
        }
        if (fullSize == 0) {
            return true;
        }
        //Store response, since it needs to be passed as a reference:
        FixedByteStream response = process(received);
        //Send response:
        sendOut(socketID, response);
        return true;
    }
As you can see, it cannot loop after encountering an error. I may not use C++ much, but I've been coding for long enough to check for such mistakes before seeking assistance.
    bool sendOut(int socketID, FixedByteStream &output)
    {
        cout << "Sending on " << socketID << endl;
        //Send to socket:
        if (write(socketID, (char*)output, output.getLength()) < 0) {
            syslog(LOG_ERR, "Connection send error: %s", strerror(errno));
            return false;
        }
        return true;
    }
What if it EWOULDBLOCK's? Same as if my motherboard melts - I'll fix it. But it hasn't happened yet, so I'm not going to fix it, I'm just making sure I know if it ever needs fixing.
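For reference, if it ever does need fixing: on a non-blocking socket a write can also succeed partially, which the `< 0` check above would not notice. A minimal sketch of a send helper that reports how much was actually queued, assuming Linux (it uses `send()` with `MSG_NOSIGNAL` instead of `write()` so a closed peer yields an error rather than SIGPIPE); `writeSome` is a hypothetical name:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Try to send `len` bytes on a non-blocking socket. Returns the number of
// bytes the kernel accepted (possibly fewer than `len` if the send buffer
// filled up), or -1 on a real error. The caller keeps the unsent tail and
// retries once epoll reports EPOLLOUT for this descriptor.
static ssize_t writeSome(int fd, const char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, data + sent, len - sent, MSG_NOSIGNAL);
        if (n > 0) {
            sent += static_cast<std::size_t>(n);
            continue;
        }
        if (n < 0 && errno == EINTR) continue;                        // interrupted, retry
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) break; // buffer full
        return -1;                                                     // real error
    }
    return static_cast<ssize_t>(sent);
}
```

A return value smaller than the message length is the "EWOULDBLOCK on send" case; the remainder has to be buffered per connection and flushed later.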
And no, process() doesn't do anything with the sockets, it only accepts and returns a fixed-length char array. Again, this program works perfectly with one client, just not with two or more.
Last EDIT: After yet more debugging, I have found the source of the problem. I'll just go ahead and answer myself.
A socket that has been established as a server can accept connection requests from multiple clients.
epoll monitors I/O events for multiple file descriptors. epoll_wait waits for those events and blocks the calling thread if none are currently available. epoll supports both edge-triggered (ET) and level-triggered (LT) notification; select and poll only support LT, and LT is also epoll's default mode.
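The practical difference between the two modes can be seen with a small experiment. This is a sketch, assuming Linux and using a socketpair in place of a real client; `countNotifications` is a hypothetical helper. One byte is written and deliberately never read, then epoll_wait is called several times:

```cpp
#include <cassert>
#include <cstdint>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Register one end of a socketpair with `mask`, write a single byte to the
// other end, never read it, and count how many of `polls` consecutive
// epoll_wait calls (100 ms timeout each) report the descriptor as ready.
// Returns -1 if the setup fails.
static int countNotifications(uint32_t mask, int polls) {
    assert(polls >= 0);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;
    int ep = epoll_create1(0);
    epoll_event ev{};
    ev.events = mask;
    ev.data.fd = sv[0];
    int notified = -1;
    if (ep >= 0 && epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
        write(sv[1], "x", 1) == 1) {
        notified = 0;
        for (int i = 0; i < polls; ++i) {
            epoll_event out{};
            if (epoll_wait(ep, &out, 1, 100) == 1) ++notified;
        }
    }
    if (ep >= 0) close(ep);
    close(sv[0]);
    close(sv[1]);
    return notified;
}
```

With `EPOLLIN` alone (LT), every call reports the unread byte again, so a handler that fails to read from the right descriptor will spin. With `EPOLLIN | EPOLLET`, the byte is reported once and then stays silent until new data arrives, so the same mistake shows up as a hang instead.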
epoll is a Linux kernel system call for a scalable I/O event notification mechanism, first introduced in version 2.5.44 of the Linux kernel. Its function is to monitor multiple file descriptors to see whether I/O is possible on any of them.
event.data.fd? Why are you trying to use that? events[iter].data.fd is the one with the value you want to receive on. You may want to name your variables more distinctly to avoid this problem in the future so you don't waste everyone's time. This is clearly not an issue with epoll.
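To see why this produces exactly the "locks onto the newest client" symptom: the struct passed to epoll_ctl is just scratch space, and it keeps whatever descriptor was registered last. A minimal sketch, assuming Linux and using socketpairs as stand-ins for two clients; `watch`, `readyDescriptors`, and `demoStaleScratch` are hypothetical names:

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <vector>

// Register fd for edge-triggered reads, reusing one scratch struct the way
// the question's code reuses `event`.
static bool watch(int epollFD, int fd, epoll_event& scratch) {
    scratch.events = EPOLLIN | EPOLLET;
    scratch.data.fd = fd;
    return epoll_ctl(epollFD, EPOLL_CTL_ADD, fd, &scratch) == 0;
}

// Wait once and return the descriptor carried by each *returned* event.
static std::vector<int> readyDescriptors(int epollFD, int maxEvents, int timeoutMs) {
    assert(maxEvents > 0);
    std::vector<epoll_event> events(maxEvents);
    std::vector<int> ready;
    int count = epoll_wait(epollFD, events.data(), maxEvents, timeoutMs);
    for (int i = 0; i < count; ++i) {
        ready.push_back(events[i].data.fd);  // not scratch.data.fd
    }
    return ready;
}

// Two "clients" are registered, only the first one sends. Returns true when
// the returned event names the first client while the scratch struct still
// names the second.
static bool demoStaleScratch() {
    int a[2], b[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) < 0) return false;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, b) < 0) {
        close(a[0]);
        close(a[1]);
        return false;
    }
    int ep = epoll_create1(0);
    epoll_event scratch{};
    bool ok = ep >= 0 && watch(ep, a[0], scratch) && watch(ep, b[0], scratch);
    ok = ok && write(a[1], "x", 1) == 1;
    if (ok) {
        std::vector<int> ready = readyDescriptors(ep, 8, 1000);
        ok = ready.size() == 1 && ready[0] == a[0] && scratch.data.fd == b[0];
    }
    if (ep >= 0) close(ep);
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return ok;
}
```

In the question's loop, the equivalent change is to call receive() and sendOut() with `events[iter].data.fd` (already stored in `currFD`) instead of `event.data.fd`.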