
Why are there always 5 connections with no program attached?

This question is similar to "Network port open, but no process attached?" and "netstat shows a listening port with no pid but lsof does not". However, the answers to those questions don't solve my problem, which is quite strange.

I have a server application called lps that waits for TCP connections on port 8588.

[root@centos63 lcms]# netstat -lnp | grep 8588   
tcp        0      0 0.0.0.0:8588                0.0.0.0:*                   LISTEN          6971/lps

As you can see, the listening socket looks fine. However, when I connect several thousand test clients (written by a colleague) to the server, whether 2000, 3000, or 4000 of them, there are always 5 clients (a random 5 each time) that connect and send a login request but never receive a response. Take 3000 clients as an example. This is what netstat gives:

[root@centos63 lcms]# netstat -nap | grep 8588 | grep ES | wc -l
3000

And this is lsof command output:

[root@centos63 lcms]# lsof -i:8588 | grep ES | wc -l
2995

These are the 5 missing connections:

[root@centos63 lcms]# netstat -nap | grep 8588 | grep -v 'lps'                   
tcp    92660      0 192.168.0.235:8588          192.168.0.241:52658         ESTABLISHED -                   
tcp    92660      0 192.168.0.235:8588          192.168.0.241:52692         ESTABLISHED -                   
tcp    92660      0 192.168.0.235:8588          192.168.0.241:52719         ESTABLISHED -                   
tcp    92660      0 192.168.0.235:8588          192.168.0.241:52721         ESTABLISHED -                   
tcp    92660      0 192.168.0.235:8588          192.168.0.241:52705         ESTABLISHED -                   

These 5 connections are established to the server on port 8588, but no program is attached to them. The second column (Recv-Q) keeps increasing as the clients send their requests.
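A minimal sketch of the mechanism that can produce this picture, assuming standard Linux TCP behaviour: the kernel completes the three-way handshake and starts buffering the client's data as soon as the connection sits in the listening socket's completed-connection queue, before any process calls accept(). Until accept() runs, no process holds a file descriptor for that socket, which is consistent with netstat printing "-" for the PID, lsof not listing it, and Recv-Q growing. The function name below is hypothetical; it runs on loopback with a kernel-chosen port.

```python
import socket

def data_buffered_before_accept(payload=b"login"):
    """Send data to a listener that has NOT called accept() yet, then accept
    and return what was already waiting in the new socket's receive queue."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    srv.listen(16)

    # connect() returns once the handshake completes; nobody has accepted yet.
    cli = socket.create_connection(srv.getsockname())
    cli.sendall(payload)         # buffered by the kernel for the queued connection

    # Only now does the "server" call accept(); the data is already there.
    conn, _addr = srv.accept()
    conn.settimeout(2.0)
    received = b""
    while len(received) < len(payload):
        chunk = conn.recv(len(payload) - len(received))
        if not chunk:
            break
        received += chunk

    for s in (conn, cli, srv):
        s.close()
    return received
```

While such a connection is still queued, it shows up in netstat as ESTABLISHED with a non-zero Recv-Q and no owning process.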

The questions linked above mention NFS mounts and RPC. For RPC, I ran rpcinfo -p, and its output has nothing to do with port 8588. For NFS, nfsstat reports Error: No Client Stats (/proc/net/rpc/nfs: No such file or directory).

Question: How can this happen? It is always exactly 5, and never the same 5 clients. I don't think it's a port conflict, since the other clients connect to the same server IP and port and are all handled properly by the server.

Note: I'm using Linux epoll to accept client connections. I also added debug code to my program that records every socket returned by accept (along with the client's information), but the 5 connections never appear in that log. This is the uname -a output:

Linux centos63 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Thanks for your kind help! I'm really confused.


Update 2013-06-08: After upgrading the system to CentOS 6.4, the same problem still occurs. I finally went back to my epoll code and found a page saying that the listening fd should be set to non-blocking and accept should be called repeatedly until it fails with EAGAIN or EWOULDBLOCK. And yes, that works: no more connections are left pending. But why? Unix Network Programming, Volume 1 says:

accept is called by a TCP server to return the next completed connection from the 
front of the completed connection queue. If the completed connection queue is empty,
the process is put to sleep (assuming the default of a blocking socket).

So if there are still completed connections in the queue, why is the process put to sleep?
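A sketch of the "accept until EAGAIN" loop described in the update, written with Python's standard socket module. It assumes the listener has already been switched to non-blocking mode with setblocking(False); in Python, EAGAIN/EWOULDBLOCK from accept() is raised as BlockingIOError. The helper name accept_until_eagain is hypothetical.

```python
import socket

def accept_until_eagain(listener):
    """Accept every connection currently queued on a NON-BLOCKING listener.

    Returns the list of accepted sockets; stops as soon as accept() reports
    EAGAIN/EWOULDBLOCK, which Python raises as BlockingIOError."""
    accepted = []
    while True:
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            break                     # completed-connection queue is empty
        conn.setblocking(False)       # assumption: clients are also served via epoll
        accepted.append(conn)
    return accepted
```

In an epoll loop this would be called from the branch that handles a readiness event on the listening fd, so one event can drain the whole queue.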

Update 2013-07-01: I add the listening socket with EPOLLET (edge-triggered), so unless I keep calling accept until EAGAIN is returned, I can't accept every pending connection. I had only just realized this; my fault. Remember: with EPOLLET, always read or accept until EAGAIN, even on a listening socket. Thanks again to Matthew for providing me with a test program.
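The difference between edge- and level-triggered registration can be seen with a small experiment, sketched here with Python's select.epoll (Linux only). The helper name is hypothetical, and it deliberately mimics a server that calls accept() only once per readiness event. With level-triggered EPOLLIN, epoll keeps reporting the listener while connections remain queued, so every client is eventually accepted. With EPOLLET, the listener is reported when new connections arrive, so connections that were already queued when the event was delivered are not reported again and stay in the queue. Exact counts depend on timing, so the check only asserts that some clients are left behind.

```python
import select
import socket

def accepted_one_per_event(n_clients, edge_triggered):
    """Connect n_clients, then serve the listener the way a naive epoll loop
    would: exactly ONE accept() per readiness event. Returns how many clients
    were accepted before epoll stopped reporting the listener."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(64)
    srv.setblocking(False)

    ep = select.epoll()
    mask = select.EPOLLIN | (select.EPOLLET if edge_triggered else 0)
    ep.register(srv.fileno(), mask)

    # All clients finish the handshake before the "server" looks at epoll.
    clients = [socket.create_connection(srv.getsockname()) for _ in range(n_clients)]

    accepted = []
    while True:
        events = ep.poll(0.2)          # timeout in seconds
        if not events:
            break                      # epoll went quiet
        for _fd, _events in events:
            try:
                conn, _addr = srv.accept()   # only one accept per event
            except BlockingIOError:
                continue
            accepted.append(conn)

    count = len(accepted)
    ep.close()
    for s in accepted + clients + [srv]:
        s.close()
    return count
```

Switching the edge-triggered case to an accept-until-EAGAIN loop removes the leftovers, which matches what the update reports.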

leowang asked Nov 13 '22 03:11


1 Answer

I've tried duplicating your problem using the following parameters:

  1. The server uses epoll to manage connections.
  2. I make 3000 connections.
  3. Connections are blocking.
  4. The server is basically 'reduced' to handling the connections only and performing very little complicated work.

I cannot duplicate the problem. Here is my server source code.

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>

#include <errno.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/epoll.h>

#include <err.h>
#include <sysexits.h>
#include <string.h>
#include <unistd.h>

struct {
  int numfds;
  int numevents;
  struct epoll_event *events;
} connections = { 0, 0, NULL };

static int create_srv_socket(const char *port) {
  int fd = -1;
  int rc;
  struct addrinfo *ai = NULL, hints;

  memset(&hints, 0, sizeof(hints));
  hints.ai_socktype = SOCK_STREAM;  /* TCP only; otherwise one entry per socket type is returned */
  hints.ai_flags = AI_PASSIVE;      /* wildcard address, suitable for bind() */

  if ((rc = getaddrinfo(NULL, port, &hints, &ai)) != 0)
    errx(EX_UNAVAILABLE, "Cannot resolve address: %s", gai_strerror(rc));

  if ((fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol)) < 0)
    err(EX_OSERR, "Cannot create socket");

  /* SO_REUSEADDR must be set before bind() to have any effect. */
  rc = 1;
  if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &rc, sizeof(rc)) < 0)
    err(EX_OSERR, "Cannot setup socket options");

  if (bind(fd, ai->ai_addr, ai->ai_addrlen) < 0)
    err(EX_OSERR, "Cannot bind to socket");

  freeaddrinfo(ai);

  if (listen(fd, 25) < 0)
    err(EX_OSERR, "Cannot setup listen length on socket");

  return fd;
}

static int create_epoll(void) {
  int fd;
  if ((fd = epoll_create1(0)) < 0)
    err(EX_OSERR, "Cannot create epoll");
  return fd;
}

static bool epoll_join(int epollfd, int fd, int events) { 
  struct epoll_event ev;
  ev.events = events;
  ev.data.fd = fd;

  if ((connections.numfds+1) >= connections.numevents) {
    connections.numevents+=1024;
    /* size of one struct epoll_event, not of the pointer */
    connections.events = realloc(connections.events,
      sizeof(*connections.events)*connections.numevents);
    if (!connections.events)
      err(EX_OSERR, "Cannot allocate memory for events list");
  }

  if (epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
    warn("Cannot add socket to epoll set");
    return false;
  }

  connections.numfds++;
  return true;
}

static void epoll_leave(int epollfd, int fd) {
  if (epoll_ctl(epollfd, EPOLL_CTL_DEL, fd, NULL) < 0)
    err(EX_OSERR, "Could not remove entry from epoll set");

  connections.numfds--;
}


static void cleanup_old_events(void) {
  if ((connections.numevents - 1024) > connections.numfds) {
    struct epoll_event *shrunk;

    connections.numevents -= 1024;
    shrunk = realloc(connections.events,
      sizeof(*connections.events)*connections.numevents);
    if (shrunk)   /* on failure keep the larger, still valid buffer */
      connections.events = shrunk;
  }
}


static void disconnect(int fd) {
  shutdown(fd, SHUT_RDWR);
  close(fd);
  return;
}

static bool read_and_reply(int fd) {
  char buf[128];
  int rc;
  memset(buf, 0, sizeof(buf));

  if ((rc = recv(fd, buf, sizeof(buf), 0)) <= 0) {
    if (rc < 0)   /* rc == 0 is an orderly shutdown by the peer */
      warn("Cannot read from socket");
    return false;
  }

  if (send(fd, buf, rc, MSG_NOSIGNAL) < 0) {
    warn("Cannot send to socket");
    return false;
  }

  return true;
}

int main(void)
{
  int srv = create_srv_socket("8558");
  int ep = create_epoll();
  int rc = -1;
  struct epoll_event ev;

  /* Level-triggered (no EPOLLET): epoll keeps reporting the listener while
     completed connections remain queued, so one accept() per event is enough. */
  if (!epoll_join(ep, srv, EPOLLIN))
    err(EX_OSERR, "Server cannot join epollfd");

  while (1) {
    int i, cli;

    rc = epoll_wait(ep, connections.events, connections.numfds, -1);
    if (rc < 0 && errno == EINTR)
      continue;
    else if (rc < 0)
      err(EX_OSERR, "Cannot properly perform epoll wait");

    for (i=0; i < rc; i++) {
      /* Copy the event: epoll_join() below may realloc() the array. */
      ev = connections.events[i];

      if (ev.data.fd != srv) {

        if (ev.events & EPOLLIN) {
          if (!read_and_reply(ev.data.fd)) {
            epoll_leave(ep, ev.data.fd);
            disconnect(ev.data.fd);
            continue;   /* fd is closed now; skip the error checks below */
          }
        }

        if (ev.events & (EPOLLERR | EPOLLHUP)) {
          if (ev.events & EPOLLERR)
            warnx("Error on fd: %d", ev.data.fd);
          else
            warnx("Closing disconnected fd: %d", ev.data.fd);

          epoll_leave(ep, ev.data.fd);
          disconnect(ev.data.fd);
        }

      }
      else {

        if (ev.events & EPOLLIN) {
          if ((cli = accept(srv, NULL, NULL)) < 0) {
            warn("Could not accept connection");
            continue;
          }

          if (!epoll_join(ep, cli, EPOLLIN))
            close(cli);
        }

        if (ev.events & (EPOLLERR | EPOLLHUP))
          errx(EX_OSERR, "Server FD %d has failed", ev.data.fd);

      }
    }

    cleanup_old_events();
  }

}

Here is the client:

from socket import *
import time
scks = list()

for i in range(0, 3000):
  s = socket(AF_INET, SOCK_STREAM)
  s.connect(("localhost", 8558))
  scks.append(s)

time.sleep(600)

When running this on my local machine I get 6001 sockets using port 8558 (1 listening, 3000 client side sockets and 3000 server side sockets).

$ ss -ant | grep 8558 | wc -l
6001

When counting the IPv4 sockets held by the client process, I get 3000.

# lsof -p$(pgrep python) | grep IPv4 | wc -l
3000

I also ran the test successfully with the server on a remote machine.

I'd suggest you attempt to do the same.

In addition, try turning off iptables completely, just in case it's some connection-tracking quirk. Sometimes a netfilter tunable in /proc can help too, so try sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1.

Edit: I've done another test which produces the output you see on your side. Your problem is that you are shutting down the connection on the server side pre-emptively.

I can duplicate results similar to what you are seeing doing the following:

  • After reading some data into my server, call shutdown(fd, SHUT_RD).
  • Call send(fd, buf, sizeof(buf), 0) on the server.

After doing this the following behaviours are seen.

  • On the client I get 3000 connections open in netstat/ss with ESTABLISHED.
  • In lsof output I get 2880 established connections (a consequence of how I was doing the shutdown).
  • The remaining connections (shown by lsof -i:8558 | grep -v ES) are in CLOSE_WAIT.

This only happens on a half-shutdown connection.

As such, I suspect this is a bug in your client or server program. Either you are sending something to the server that it objects to, or the server is incorrectly closing connections for some reason.

You need to confirm what state the "anomalous" connections are in (CLOSE_WAIT or something else).
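One way to check the state and ownership of such connections without netstat is to read /proc/net/tcp directly. The helper below is a hypothetical sketch, not a standard tool, and it assumes the Linux /proc/net/tcp layout: hex local and remote address:port, a hex state code, tx_queue:rx_queue in hex, and the socket inode as the tenth whitespace-separated field. A connection still waiting in the accept queue is not attached to any file, so the kernel reports its inode as 0; netstat then has nothing to map to a PID. Addresses are left in the kernel's hex form.

```python
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}

def sockets_on_local_port(lines, port):
    """Yield (remote_hex, state, rx_queue, inode) for every /proc/net/tcp
    entry whose LOCAL port equals `port`. inode == 0 means no process owns it."""
    for line in lines:
        fields = line.split()
        if len(fields) < 10 or fields[0] == "sl":
            continue                                  # header or malformed line
        local, remote, st, queues = fields[1], fields[2], fields[3], fields[4]
        if int(local.split(":")[1], 16) != port:
            continue
        rx_queue = int(queues.split(":")[1], 16)      # bytes waiting to be read
        inode = int(fields[9])
        yield remote, TCP_STATES.get(st.upper(), st), rx_queue, inode
```

Usage, on the server: `with open("/proc/net/tcp") as f: rows = list(sockets_on_local_port(f, 8588))`, then look for ESTABLISHED rows with inode 0 and a growing rx_queue.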

At this stage I also consider this a programming problem rather than something that belongs on Server Fault. Without seeing the relevant portions of the client and server source, nobody will be able to track down the cause of the fault. That said, I am fairly confident this has nothing to do with the way the operating system handles the connections.

Matthew Ife answered Nov 26 '22 01:11