Monthly Archives: April 2016

Quickly detect TCP disconnection

An application may stay blocked on a TCP socket read after the connection has been dropped. Unless a separate thread monitors an application-level heartbeat, or TCP native keepalive is enabled and tuned, it can take 2 hours or more for the application to detect the disconnection (the Linux default for tcp_keepalive_time is 7200 seconds).

TCP keepalive is an optional feature and has to be explicitly enabled (SO_KEEPALIVE) when creating a new socket. Tune the following parameters using sysctl, and persist the change in /etc/sysctl.conf as well.

net.ipv4.tcp_keepalive_time=10
net.ipv4.tcp_keepalive_probes=2
net.ipv4.tcp_keepalive_intvl=3

The above means the first keepalive probe (heartbeat) is sent once the connection has been idle for 10 seconds. If no response is received for that probe, another probe is sent every 3 seconds. If 2 probes go unanswered, the connection is regarded as dropped and the blocking I/O is signaled, roughly 16 seconds after the last traffic. The application thus gets a chance to return from the blocking I/O call, receive an error or exception, and establish a new connection.
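The same behaviour can be sketched per socket instead of system-wide. The snippet below is a minimal illustration, not a definitive implementation: `enable_keepalive` and `detection_time` are hypothetical helper names, and the `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` options are assumed to be available (they are on Linux; other platforms may lack them, so the code checks first).

```python
import socket


def enable_keepalive(sock, idle=10, interval=3, probes=2):
    """Turn on TCP keepalive for one socket and override the sysctl defaults."""
    # Keepalive is off by default; this is the explicit opt-in.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket equivalents of tcp_keepalive_time / _intvl / _probes.
    # These constants are platform-specific, so guard each one.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def detection_time(idle, interval, probes):
    """Approximate seconds from last traffic until the drop is reported."""
    return idle + interval * probes


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print(detection_time(10, 3, 2))      # settings from this post
print(detection_time(7200, 75, 9))   # Linux defaults
sock.close()
```

Once the probes fail, a blocked `recv()` on such a socket raises an `OSError` (typically with errno `ETIMEDOUT`), which is where the application would reconnect.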

Further tuning can be done on tcp_user_timeout and on the fin-wait, fin-wait2, and close-wait timeouts. For more details, please read the following pages.

Patching RMQ issue

“RabbitMQ Best Practices”

TCP_USER_TIMEOUT

TCP keepalive overview
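As a rough illustration of the tcp_user_timeout tuning mentioned above: keepalive probes are only sent on an idle connection, whereas `TCP_USER_TIMEOUT` bounds how long transmitted data may remain unacknowledged before the connection is closed. The sketch below assumes Linux, where Python exposes `socket.TCP_USER_TIMEOUT`; the helper name `set_user_timeout` and the 16-second value are illustrative choices, not taken from the pages listed.

```python
import socket


def set_user_timeout(sock, timeout_ms=16000):
    """Set TCP_USER_TIMEOUT (milliseconds) if the platform supports it.

    Returns True when the option was applied, False when it is unavailable.
    """
    opt = getattr(socket, "TCP_USER_TIMEOUT", None)
    if opt is None:
        return False  # not a Linux build of Python, nothing to do
    sock.setsockopt(socket.IPPROTO_TCP, opt, timeout_ms)
    return True


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = set_user_timeout(sock)
print("TCP_USER_TIMEOUT applied:", applied)
sock.close()
```

Picking a value close to the keepalive detection time keeps the two mechanisms consistent, so a dead peer is noticed within a similar window whether the connection is idle or has data in flight.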