
Docker services stop communicating after some time

I have six containers running in Docker swarm: Kafka+Zookeeper, MongoDB, A, B, C, and Interface. Interface is the main access point from the public; it is the only container that publishes a port, 5683. The interface container connects to A, B, and C during startup. I am using a docker-compose file + docker stack deploy, and each service has a name that the interface uses as the hostname. Everything starts successfully and works fine. After some time (20 minutes, 1 hour, ...) I am no longer able to make requests to the interface. The interface receives my requests, but the application has lost its connection to service A, B, C, or all of them. If I restart the interface, it is able to reconnect to services A, B, and C.
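For reference, the stack definition looks roughly like the following. This is a minimal sketch: the image names, the environment variables, and the internal ports of A, B, and C are assumptions; only the published port 5683 comes from the description above.

```yaml
# deployed with: docker stack deploy -c docker-compose.yml mystack
version: "3.3"
services:
  interface:
    image: myrepo/interface          # hypothetical image name
    ports:
      - "5683:5683"                  # the only published port
    environment:
      # service names double as hostnames on the overlay network;
      # the variable names and internal ports are assumptions
      - A_ENDPOINT=service-a:8081
      - B_ENDPOINT=service-b:8082
      - C_ENDPOINT=service-c:8083
  service-a:
    image: myrepo/service-a
  service-b:
    image: myrepo/service-b
  service-c:
    image: myrepo/service-c
```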

At first I thought it was an application problem, so I exposed two new ports on each service (interface, A, B, C) and attached a profiler and a debugger to them. The applications are running properly: no leaks, no blocked threads, working normally and waiting for connections. The debugger shows me that when I make a request to the interface and the interface tries to call service A, a "Connection reset by peer" exception is thrown.
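Attaching a remote debugger to a running service can be done along these lines; this is a sketch, not the exact setup used here, and the debug port 5005, the stack name, and the use of JDWP are assumptions:

```sh
# Add a JDWP agent to the interface service and publish the debug port
# (the *:5005 bind address requires Java 9+):
docker service update \
  --env-add 'JAVA_TOOL_OPTIONS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005' \
  --publish-add 5005:5005 \
  mystack_interface
```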

During this debugging I found out something interesting. I attached the debugger to the interface when the services started, and the debugger itself was likewise disconnected after some time. On top of that, I was not able to reconnect it until I made a request to the container's application; until then, reconnection attempts failed with a handshake error.

Another interesting thing I found was that I was not even able to reach the interface itself anymore. So I used Wireshark to see what was going on: the SYN/ACK part of the handshake was fine, then the application sent some data and the interface responded with FIN, ACK. I assume the same thing happens when the interface tries to call service A and the connection is closed with a FIN. The codebase of Interface, A, B, and C is the same as far as the Netty server is concerned.

Finally, I don't think it's an application issue. Why? I tried to deploy the containers not as services: I ran each container separately, published the ports of each one, and set the service endpoints to localhost (no overlay network). And it works; the containers run without problems. I should also have said at the beginning that the Java applications (interface, A, B, C) run without problems when they run as standalone applications, i.e. not in Docker.
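A sketch of that bridge-mode experiment (image names, ports, and environment variables are placeholders): each service publishes its port on the host, and the interface reaches them through the host instead of through overlay service names.

```sh
# Services publish their ports on the Docker host:
docker run -d --name service-a -p 8081:8081 myrepo/service-a
docker run -d --name service-b -p 8082:8082 myrepo/service-b
docker run -d --name service-c -p 8083:8083 myrepo/service-c

# The interface uses host networking so that "localhost" resolves to
# the Docker host where the ports above are published:
docker run -d --network host \
  -e A_ENDPOINT=localhost:8081 \
  -e B_ENDPOINT=localhost:8082 \
  -e C_ENDPOINT=localhost:8083 \
  myrepo/interface
```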

Could you please help me figure out what the issue could be? Why does Docker close sockets when the overlay network is used?

I am using the newest Docker; I also tried older versions.

asked Apr 09 '17 by Ondrej Tomcik

People also ask

Does Docker have a timeout?

Docker has limited patience when it comes to stopping containers. There's a timeout, which is 10 seconds by default for each container.
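The grace period can be extended per invocation (the container name is a placeholder):

```sh
# Give the container 30 seconds to shut down cleanly instead of the
# default 10 before Docker sends SIGKILL:
docker stop --time 30 mycontainer
```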

Why is it difficult for Docker containers to communicate with each other?

Containers can communicate with each other only if they share a network; containers that don't share one cannot reach each other. That's one of the isolation features provided by Docker. A container can belong to more than one network, and a network can have multiple containers inside.
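A user-defined bridge network makes this easy to see (the names are placeholders):

```sh
docker network create mynet
docker run -d --network mynet --name web nginx
# containers on the same user-defined network can reach each other by name:
docker run --rm --network mynet alpine ping -c 1 web
```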

Why does my Docker container stop immediately?

You're running a shell in a container, but you haven't assigned a terminal: If you're running a container with a shell (like bash ) as the default command, then the container will exit immediately if you haven't attached an interactive terminal.
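For example:

```sh
# exits immediately: bash has no terminal to read from
docker run ubuntu bash
# stays up: -i keeps stdin open and -t allocates a terminal
docker run -it ubuntu bash
```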

How do I stop Docker from stopping?

docker rm -f: the final option for stopping a running container is to use the --force or -f flag in conjunction with the docker rm command. Typically, docker rm is used to remove an already stopped container, but the -f flag will cause it to first issue a SIGKILL.
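For example (the container name is a placeholder):

```sh
# SIGKILL the running container, then remove it in one step:
docker rm -f mycontainer
```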




1 Answer

Finally, I was able to solve the problem.

Once more, what was happening: the interface opens a permanent TCP connection to A, B, and C. When you run services A, B, C as standalone Java applications, everything works. When we dockerize them and run them in swarm, it works for only a few minutes. The strange part was that the connection between the interface and another service was interrupted at the moment a client made a request to the interface.

After many, many unsuccessful tests and after debugging each container, I tried to run each Docker container separately, with mapped ports, and specified localhost as the endpoint (each container exposed its ports and the interface connected to localhost). A funny thing happened: it worked. When you run containers like this, a different network driver is used for the containers: the bridge driver. If you run them in swarm, the overlay network driver is used.
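You can confirm which driver a given network uses (the network name is a placeholder; stack deployments typically create one named &lt;stack&gt;_default):

```sh
docker network ls
docker network inspect -f '{{ .Driver }}' mystack_default   # "overlay" in swarm
```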

So it had to be something with the Docker network, not with the application itself. The next step was running tcpdump in each container after a couple of minutes, once it should have stopped working (a sketch of the capture command follows the list below). It was very interesting:

  • Client -> Interface (OK, request accepted)
  • Interface -> (forwards the request because it belongs to A) A
    • Interface -> A [POST]
    • A -> Interface [RESET]
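A capture along these lines inside each container is enough to see this; the port is an assumption, and tcpdump has to be available in the image:

```sh
# Capture the service traffic inside a container:
docker exec -it <container> tcpdump -i eth0 -nn port 5683
```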

A was resetting the open TCP connection after a couple of minutes without communication. Why?

Docker uses IP Virtual Server (IPVS), and IPVS maintains its own connection table. The default timeout for CLOSE_WAIT connections in the IPVS table is 60 seconds. Hence, when the server sends something after 60 seconds, the IPVS connection entry is no longer available, so the packet looks invalid for a new TCP session and gets an RST. On the client side, the connection remains forever in the FIN_WAIT2 state, because the app still has the socket open; the kernel's fin_wait timer kicks in only for orphaned TCP sockets.
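If you want to inspect these timeouts yourself, ipvsadm can print and change them. This assumes ipvsadm is installed and that you run it in the network namespace that actually hosts the swarm load balancer:

```sh
# Show the current IPVS timeouts (tcp / tcpfin / udp), in seconds:
ipvsadm -l --timeout
# They can also be raised, e.g. the TCP timeout to 900 s:
ipvsadm --set 900 120 300
```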

This is what I read about it and how I understand it. I am not sure my explanation of the problem is correct, but based on these assumptions I implemented a ping-pong between the interface and the A, B, C services that fires whenever there has been no communication for a while, staying under the 60-second timeout. And it's working.
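Since the services share the same Netty codebase, such a keep-alive can be built on Netty's IdleStateHandler. Below is a minimal sketch: the plain-text "PING" frame and the class name are assumptions, and the 30-second idle threshold is simply chosen to stay under the 60-second timeout:

```java
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;
import io.netty.util.CharsetUtil;

public class KeepAliveInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        // Fire an IdleStateEvent when nothing has been written for 30 s,
        // well under the 60 s IPVS timeout described above.
        ch.pipeline().addLast(new IdleStateHandler(0, 30, 0));
        ch.pipeline().addLast(new ChannelDuplexHandler() {
            @Override
            public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
                if (evt instanceof IdleStateEvent) {
                    // Hypothetical ping frame; the real services would send
                    // whatever message the peer's pipeline understands.
                    ctx.writeAndFlush(Unpooled.copiedBuffer("PING", CharsetUtil.UTF_8));
                } else {
                    super.userEventTriggered(ctx, evt);
                }
            }
        });
        // ... the rest of the application pipeline goes here
    }
}
```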

answered Sep 19 '22 by Ondrej Tomcik