
Quickly detect TCP disconnection

An application may block on TCP socket I/O when the connection has been dropped. Without a separate thread monitoring a network heartbeat (an application-level heartbeat), or TCP native keepalive enabled and tuned, it can take up to 2 hours for the application to detect the disconnection.

TCP keepalive is an optional feature and has to be explicitly enabled when creating a new socket. Tune the following using sysctl, and persist the change in /etc/sysctl.conf as well.

net.ipv4.tcp_keepalive_time=10
net.ipv4.tcp_keepalive_probes=2
net.ipv4.tcp_keepalive_intvl=3

The above means the first keepalive probe is sent after the connection has been idle for 10 seconds. If no response is received, another probe is sent every 3 seconds. If 2 probes in a row go unanswered, the connection is regarded as dropped and the blocking I/O is signaled. The application thus gets a chance to return from the blocking I/O, receive an error or exception, and re-establish a new connection.
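To enable keepalive on a given socket (the sysctl values above only set the defaults), here is a minimal sketch in Python; the TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT options are Linux-specific, and the endpoint is a placeholder:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Keepalive is off by default and must be enabled per socket.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Per-socket overrides of the sysctl defaults above (Linux only):
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 10)   # idle seconds before the first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 3)   # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 2)     # unanswered probes before the drop
sock.connect(("mq.example.com", 5672))   # placeholder endpoint

With these values a dead peer is detected after roughly 10 + 2 x 3 = 16 seconds of silence instead of 2 hours.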

Further tuning can be done on TCP_USER_TIMEOUT and the FIN-WAIT-1, FIN-WAIT-2, and CLOSE-WAIT timeouts; for more details please read the following pages.

Patching RMQ issue

“RabbitMQ Best Practices”

TCP_USER_TIMEOUT

TCP keepalive overview

Cinder volume attachment issue

OpenStack is flawed by design: operations are not fully transactional, which can leave state in different components out of sync. One example is frequent volume attachment/detachment, especially when network glitches make the MQ unstable; strange errors start showing up. This happens because multiple components are involved (Cinder, Nova, and the compute node), each maintaining its own state, and there is no master. Once a message gets delayed or lost, they can easily run out of sync, and there is no transaction manager coordinating the operation. In general, an operation spanning multiple components requires a global transaction manager to guarantee consistency. This can also be achieved using two-phase commit (2PC) built into the framework.
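For illustration only, here is a minimal sketch of what 2PC coordination across components could look like; the Participant interface and all names are hypothetical, not actual Cinder/Nova code:

# Hypothetical participant interface; a real system would wrap Cinder,
# Nova, etc. behind prepare/commit/rollback.
class Participant:
    def prepare(self, txn_id: str) -> bool:
        raise NotImplementedError   # vote: can this component commit?

    def commit(self, txn_id: str) -> None:
        raise NotImplementedError

    def rollback(self, txn_id: str) -> None:
        raise NotImplementedError

def two_phase_commit(txn_id: str, participants: list) -> bool:
    # Phase 1: nothing is committed until every participant votes yes.
    if all(p.prepare(txn_id) for p in participants):
        # Phase 2a: unanimous yes, so commit everywhere.
        for p in participants:
            p.commit(txn_id)
        return True
    # Phase 2b: any no (or failure) rolls back everywhere.
    for p in participants:
        p.rollback(txn_id)
    return False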

Network connection reset

Recently we have been seeing many connection reset issues and did some root-cause analysis.

This mainly happened with long-lived connections: pooled database connections, MQ connections, and also RPC reply connections.

In the end, all of them were fixed by raising the idle timeout in the VIP/LB and firewall profiles.

The timeout was simply too short for connections that may have no activity during that period.
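Where raising the timeout is not possible, one application-side mitigation is to validate pooled connections before handing them out. A sketch assuming SQLAlchemy manages the database pool (the DSN is a placeholder):

from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://user:pass@db-host/app",   # placeholder DSN
    pool_pre_ping=True,    # test each connection on checkout; replace it if dead
    pool_recycle=1800,     # recycle connections before the LB/firewall idle timeout kicks in
)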

 

RabbitMQ rescue

The RabbitMQ cluster showed bad partitioning, so I tried to rebuild it.

First of all, I had issues with stopping/starting.

Start/Stop hanging:

  1. killall -u rabbitmq -q
  2. Back up the existing rabbitmq.config to rabbitmq.org (/etc/rabbitmq/rabbitmq.config).
  3. Remove all other cluster members from rabbitmq.config, keeping only the current host.
  4. rm -rf /var/lib/rabbitmq/mnesia (this path needs to be verified: check /etc/rabbitmq/rabbitmq-env.conf to see where RABBITMQ_MNESIA_BASE is pointing. Also pay attention to the permissions on this directory. The owner has to be rabbitmq, otherwise RabbitMQ will fail to start, because it runs as user rabbitmq and cannot create directories and files without sufficient permission.)
  5. service rabbitmq-server restart

Now the server should start without issue. Next, let's do further cleanup.

  1. rabbitmqctl stop_app
  2. rabbitmqctl force_reset
  3. rabbitmqctl start_app
  4. rabbitmqctl stop

There should be no errors.

Now restore /etc/rabbitmq/rabbitmq.config from the backup rabbitmq.org and start RabbitMQ again:

  1. service rabbitmq-server start

Now we should have all nodes running.

Assuming node001 is the master, start the service on node001:

  1. rabbitmqctl start_app

On all other nodes:

  1. rabbitmqctl stop_app
  2. rabbitmqctl join_cluster rabbit@node001
  3. rabbitmqctl start_app

Check cluster status on all nodes:

  1. rabbitmqctl cluster_status

There should be no more partitions, and all nodes should be running:

Cluster status of node rabbit@node001 ...
[{nodes,[{disc,[rabbit@node001,rabbit@node002,rabbit@node003]}]},
{running_nodes,[rabbit@node001,rabbit@node002,rabbit@node003]},
{partitions,[]}]

The partitions list should be empty ([]).

Don't forget to enable the HA queue policy, etc.:

rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'

Another option is cluster_partition_handling in the configuration, which changes how partition recovery works (the default is ignore); a sample config snippet follows the list:

  • pause_minority
  • {pause_if_all_down, [nodes], ignore | autoheal}
  • autoheal
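For example, to have nodes in the minority side of a partition pause themselves, the classic-format /etc/rabbitmq/rabbitmq.config would contain something like the following (a minimal sketch; merge it into your existing config rather than replacing it):

[
  {rabbit, [
    {cluster_partition_handling, pause_minority}
  ]}
].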

Combined with a load balancer, things can behave strangely. I highly recommend not putting a VIP in front of the MQ servers.

Another commonly seen issue is RabbitMQ becoming unresponsive because messages flood particular queues that have no consumer. It may stop responding completely or delay message delivery. A workaround is to put a size limit on the queue to force RMQ not to exhaust memory:

rabbitmqctl set_policy POLICY_NAME "QUEUE_NAME" '{"max-length":100}' --apply-to queues

Good luck

Optimize to the extreme

Application layers:

  • I/O- or CPU-intensive; traffic and CPU usage patterns; patterns of change over time
  • service dependencies and the strength of those connections
  • internal optimization: efficient buffer utilization, fewer buffer memory copies, fewer user/kernel space crossings, async I/O where possible

IaaS layers:

  • characteristics of devices: SSD vs. spinning HD, network bandwidth/QoS
  • capacities: availability of CPU cores, memory, disk space, and network

Smart scheduler:

  • scheduling based on knowledge provided by both layers
  • dynamic consolidation/scaling based on current load