{"id":13,"date":"2015-11-20T14:09:44","date_gmt":"2015-11-20T22:09:44","guid":{"rendered":"http:\/\/iaastalk.com\/?p=13"},"modified":"2016-02-04T17:30:11","modified_gmt":"2016-02-05T01:30:11","slug":"rabbitmq-rescue","status":"publish","type":"post","link":"http:\/\/iaastalk.com\/?p=13","title":{"rendered":"RabbitMQ rescue"},"content":{"rendered":"<p>The RabbitMQ cluster was badly partitioned, so I set out to rebuild it.<\/p>\n<p>First of all, I had trouble stopping and starting the service.<\/p>\n<p>Start\/stop hanging:<\/p>\n<ol>\n<li>killall -u rabbitmq -q<\/li>\n<li>Back up the existing rabbitmq.config to rabbitmq.org (\/etc\/rabbitmq\/rabbitmq.config).<\/li>\n<li>Remove all other cluster members from rabbitmq.config and keep only the current host.<\/li>\n<li>rm -rf \/var\/lib\/rabbitmq\/mnesia (verify this path first: check \/etc\/rabbitmq\/rabbitmq-env.conf to see where RABBITMQ_MNESIA_BASE points. Also pay attention to the permissions on this directory. The owner has to be rabbitmq, otherwise RabbitMQ will fail to start, because it runs as user rabbitmq and cannot create directories and files without sufficient permissions.)<\/li>\n<li>service rabbitmq-server restart<\/li>\n<\/ol>\n<p>Now the server should start without issue. 
Now let&#8217;s do further cleanup.<\/p>\n<ol>\n<li>rabbitmqctl stop_app<\/li>\n<li>rabbitmqctl force_reset<\/li>\n<li>rabbitmqctl start_app<\/li>\n<li>rabbitmqctl stop<\/li>\n<\/ol>\n<p>There should be no errors.<\/p>\n<p>Now restore \/etc\/rabbitmq\/rabbitmq.config from the backup rabbitmq.org and start RabbitMQ again.<\/p>\n<ol>\n<li>service rabbitmq-server start<\/li>\n<\/ol>\n<p>Now all nodes should be running.<\/p>\n<p>Assume node001 is the master and start the app on node001:<\/p>\n<ol>\n<li>rabbitmqctl start_app<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p>On all other nodes:<\/p>\n<ol>\n<li>rabbitmqctl stop_app<\/li>\n<li>rabbitmqctl join_cluster rabbit@node001<\/li>\n<li>rabbitmqctl start_app<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p>Check the cluster status on all nodes:<\/p>\n<ol>\n<li>rabbitmqctl cluster_status<\/li>\n<\/ol>\n<p>There should be no more partitions, and all nodes should be running:<\/p>\n<p>Cluster status of node rabbit@phx04rmqa001 &#8230;<br \/>\n[{nodes,[{disc,[rabbit@node001,rabbit@node002, rabbit@node003]}]},<br \/>\n{running_nodes,[rabbit@node001,rabbit@node002, rabbit@node003]},<br \/>\n{partitions,[]}]<\/p>\n<p>partitions should be empty: [].<\/p>\n<p>&nbsp;<\/p>\n<p>Don&#8217;t forget to enable HA queues, etc.<\/p>\n<p>rabbitmqctl set_policy ha-all &quot;&quot; '{&quot;ha-mode&quot;:&quot;all&quot;,&quot;ha-sync-mode&quot;:&quot;automatic&quot;}'<\/p>\n<p>Another option is <span class=\"code \">cluster_partition_handling<\/span> in the configuration, which changes how partition recovery works (the default is <span class=\"code \">ignore<\/span>):<\/p>\n<ul>\n<li><span class=\"code \">pause_minority<\/span><\/li>\n<li><span class=\"code \">{pause_if_all_down, [nodes], ignore | autoheal}<\/span><\/li>\n<li><span class=\"code \">autoheal<\/span><\/li>\n<\/ul>\n<p>Combined with a load balancer, things can behave strangely. 
I highly recommend not putting a VIP in front of the MQ servers.<\/p>\n<p>Another commonly seen issue is RabbitMQ not responding because messages have flooded particular queues that have no consumer. It may stop responding completely or delay message delivery. A workaround is to put a size limit on the queue so that RabbitMQ does not exhaust memory:<\/p>\n<p>rabbitmqctl set_policy POLICY_NAME &quot;QUEUE_NAME&quot; '{&quot;max-length&quot;:100}' --apply-to queues<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>Good luck<\/p>\n","protected":false},"excerpt":{"rendered":"<p>RabbitMQ cluster shows partitioned badly. I was trying to rebuild. First of all I had issue with stopping \/ starting. Start\/Stop hanging: killall -u rabbitmq -q backup existing rabbitmq.config to rabbitmq.org\u00a0 (\/etc\/rabbitmq\/rabbitmq.config) remove all other cluster members from rabbitmq.config and only keep current host. rm -rf \/var\/lib\/rabbitmq\/mnesia\u00a0 (this needs to be verified. Pls check \/etc\/rabbitmq\/rabbitmq-env.conf to 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/iaastalk.com\/index.php?rest_route=\/wp\/v2\/posts\/13"}],"collection":[{"href":"http:\/\/iaastalk.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/iaastalk.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/iaastalk.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/iaastalk.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=13"}],"version-history":[{"count":6,"href":"http:\/\/iaastalk.com\/index.php?rest_route=\/wp\/v2\/posts\/13\/revisions"}],"predecessor-version":[{"id":29,"href":"http:\/\/iaastalk.com\/index.php?rest_route=\/wp\/v2\/posts\/13\/revisions\/29"}],"wp:attachment":[{"href":"http:\/\/iaastalk.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=13"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/iaastalk.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=13"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/iaastalk.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=13"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}