Very odd socket behavior in Java; not always closing ports?

Over the course of developing a fairly large project, we've accumulated a lot of unit tests. Many of these tests start servers, connect to them, and then close both the servers and the clients, usually within the same process.

However, these tests fail at random with a "Failed to bind address 127.0.0.1:(port)" error. When a test is re-run, the error usually disappears.

At first we assumed the problem was in our tests, so we wrote a small standalone test in Clojure, posted below (with comments for the non-Clojure people).

(ns test
  (:import [java.net Socket ServerSocket]))

(dotimes [n 10000] ; Run the test ten thousand times
  (let [server (ServerSocket. 10000) ; Start a server on port 10000
        client (Socket. "localhost" 10000) ; Connect a client to the server on port 10000
        p (.getLocalPort client)] ; Get the client's local (ephemeral) port
    (.close client) ; Close the client
    (.close server) ; Close the server
    (println "n = " n) ; Debug
    (println "p = " p) ; Debug
    (println "client = " client) ; Debug
    (println "server = " server) ; Debug
    (let [server (ServerSocket. p)] ; Start a server on the local port of the client we just closed
      (.close server) ; Close the server
      (println "client = " client) ; Debug
      (println "server = " server)))) ; Debug

The exception appears, at random, on the line where we start the second server. It looks as if Java is holding on to the local port even though the client that was using it has already been closed.
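For readers more at home in Java, here is a rough Java equivalent of the first Clojure test. It is a sketch, not the original code: the class name `RebindRepro` is hypothetical, and the server listens on an OS-chosen port (port 0) instead of the hard-coded 10000 so it does not collide with anything already listening there. Instead of printing debug lines, it counts how often the final bind fails; that count varies between runs and operating systems, so no particular value should be expected.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class RebindRepro {
    // Repeats the Clojure test `iterations` times and returns how many
    // times binding a server to the client's old local port failed.
    static int countBindFailures(int iterations) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        int failures = 0;
        for (int n = 0; n < iterations; n++) {
            int p;
            // Assumption: port 0 (OS-chosen) replaces the hard-coded 10000.
            try (ServerSocket server = new ServerSocket(0);
                 Socket client = new Socket(loopback, server.getLocalPort())) {
                p = client.getLocalPort(); // the client's ephemeral local port
            } // try-with-resources closes client first, then server, as in the Clojure code
            // Try to listen on the port the client was just using.
            try (ServerSocket rebound = new ServerSocket(p)) {
                // Bind succeeded; the socket is closed again immediately.
            } catch (BindException e) {
                failures++; // "Address already in use"
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("bind failures: " + countBindFailures(1000));
    }
}
```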

So, my question: Why on earth is Java doing this, and why is it so seemingly random?

EDIT: Someone suggested I set the socket's SO_REUSEADDR option (setReuseAddress) to true. I've done this and nothing changed; here is the updated code:

(ns test
  (:import [java.net Socket ServerSocket InetSocketAddress]))

(dotimes [n 10000] ; Run the test ten thousand times
  (let [server (ServerSocket.)] ; Create an unbound server socket
    (.setReuseAddress server true) ; Enable SO_REUSEADDR before binding
    (.bind server (InetSocketAddress. 10000)) ; Bind the socket to port 10000
    (let [client (Socket. "localhost" 10000) ; Connect a client to port 10000
          p (.getLocalPort client)] ; Get the client's local port
      (.close client) ; Close the client
      (.close server) ; Close the server
      ;; (Thread/sleep 1000) ; A sleep for testing
      (println "n = " n) ; Debug
      (println "p = " p) ; Debug
      (println "client = " client) ; Debug
      (println "server = " server) ; Debug
      (let [server (ServerSocket.)] ; Create an unbound server socket
        (.setReuseAddress server true) ; Enable SO_REUSEADDR before binding
        (.bind server (InetSocketAddress. p)) ; Bind to the local port the client just used
        (.close server) ; Close the server
        (println "client = " client) ; Debug
        (println "server = " server))))) ; Debug

I've also noticed that a sleep of 10 ms or even 100 ms does not prevent the problem, but a sleep of 1000 ms has (so far) prevented it.

EDIT 2: Someone pointed me to SO_LINGER, but I can't find a way to set it on a ServerSocket. Does anyone have any ideas?

EDIT 3: Turns out that SO_LINGER is disabled by default. What else can we look at?
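In Java, SO_LINGER is an option on a connected `Socket` (`Socket.setSoLinger(boolean, int)`); `ServerSocket` has no such method, which is why it cannot be found there. The minimal sketch below, assuming a loopback connection to an OS-chosen port and using the hypothetical class name `LingerDemo`, reads the default value (`getSoLinger()` returns -1 when the option is disabled) and then enables it with a zero timeout. On common TCP stacks a zero linger timeout makes `close()` abort the connection with an RST instead of the normal FIN exchange, so that end does not go through TIME_WAIT; this is an abortive close that can discard unsent data, not a general-purpose fix.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class LingerDemo {
    // Returns {defaultLinger, lingerAfterSet} for a freshly connected client socket.
    static int[] lingerValues() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            int before = client.getSoLinger(); // -1 means SO_LINGER is disabled
            // Enable SO_LINGER with a 0-second timeout: close() will now
            // reset the connection instead of closing it gracefully.
            client.setSoLinger(true, 0);
            int after = client.getSoLinger();
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] v = lingerValues();
        System.out.println("default SO_LINGER = " + v[0] + ", after set = " + v[1]);
    }
}
```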

UPDATE: The problem has been solved for the most part, using dynamic port allocation over a range of 10,000 or so ports. However, I'd still like to see what people can come up with.
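The update does not show how the dynamic port allocation is done. One hypothetical way is to hand out ports from a fixed range in rotation, skipping any that fail to bind, so a port released by one test is not handed to the next test straight away. `PortAllocator`, `bindInRange`, and the 20000-29999 range are illustrative assumptions, not the poster's code.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.concurrent.atomic.AtomicInteger;

final class PortAllocator {
    // Shared cursor into the range; advanced on every bind attempt.
    private static final AtomicInteger NEXT = new AtomicInteger();

    // Hypothetical helper: returns a ServerSocket bound on loopback to some
    // free port in [from, to], rotating through the range between calls.
    static ServerSocket bindInRange(int from, int to) throws IOException {
        int size = to - from + 1;
        InetAddress loopback = InetAddress.getLoopbackAddress();
        for (int attempt = 0; attempt < size; attempt++) {
            // floorMod keeps the offset non-negative even if the counter wraps.
            int port = from + Math.floorMod(NEXT.getAndIncrement(), size);
            ServerSocket socket = new ServerSocket(); // created unbound
            try {
                socket.bind(new InetSocketAddress(loopback, port));
                return socket;
            } catch (BindException e) {
                socket.close(); // port busy: try the next one
            } catch (IOException e) {
                socket.close(); // unexpected error: release and rethrow
                throw e;
            }
        }
        throw new BindException("no free port in range " + from + "-" + to);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket a = bindInRange(20000, 29999);
             ServerSocket b = bindInRange(20000, 29999)) {
            System.out.println("allocated " + a.getLocalPort() + " and " + b.getLocalPort());
        }
    }
}
```

A simpler alternative, where a test can read its port back, is `new ServerSocket(0)`: the OS picks a free ephemeral port and `getLocalPort()` reports which one was chosen.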

gdude2002 asked Aug 27 '12 14:08


2 Answers

I'm not (too) familiar with the Clojure syntax, but you should invoke socket.setReuseAddress(true) before binding. This allows the program to bind the port again even if there are still sockets on it in the TIME_WAIT state.
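In plain Java, the pattern this answer describes looks roughly like the sketch below. The `ServerSocket.setReuseAddress` Javadoc states that the behaviour of changing SO_REUSEADDR after the socket is bound is undefined, so the socket is created unbound, the option is set, and only then is `bind` called. `ReuseAddressDemo` and `listenReusable` are hypothetical names, and binding to loopback is an assumption made for the example.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class ReuseAddressDemo {
    // Creates a listening socket with SO_REUSEADDR enabled. The option is
    // set while the socket is still unbound, then the socket is bound.
    static ServerSocket listenReusable(int port) throws IOException {
        ServerSocket server = new ServerSocket(); // no port yet: still unbound
        try {
            server.setReuseAddress(true);
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            return server;
        } catch (IOException e) {
            server.close(); // do not leak the socket if setup fails
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket s = listenReusable(0)) { // 0 = let the OS pick a port
            System.out.println("port " + s.getLocalPort() + ", SO_REUSEADDR " + s.getReuseAddress());
        }
    }
}
```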

andri answered Oct 15 '22 21:10


The test itself is invalid. Testing this behaviour is pointless, and has nothing to do with any required application behaviour: it is just exercising a corner condition in the TCP stack, which certainly no application should try to rely on. I would expect that opening a listening socket on a port that had just been an outbound connected port would never succeed at all due to TIME_WAIT, or at best succeed half the time due to uncertainty as to which end issued the close first.

I would remove the test. The rest of it doesn't do anything useful either.

user207421 answered Oct 15 '22 21:10