Chef Backend Cluster 2.0.1 - Full follower recovery

Sean Horn

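When a chef-backend cluster's followers have been down too long, the leader may already have removed the WAL segments they need to catch up. You will then see messages like the following in each follower's /var/log/chef-backend/postgresql/VERSION/current log file: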


2018-04-25_16:36:29.42242 FATAL:  the database system is starting up
2018-04-25_16:36:30.90058 LOG:  started streaming WAL from primary at 16F3/2D000000 on timeline 88
2018-04-25_16:36:30.90124 FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 00000058000016F30000002D has already been removed
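To check a follower's log for this condition without reading it by hand, a small shell helper along these lines may be useful. This is a minimal sketch: the function name wal_segment_removed is hypothetical, and the pattern assumes the message text matches the sample log line above.

```shell
# Hypothetical helper: exit 0 if the given PostgreSQL log contains the
# "requested WAL segment ... has already been removed" error shown above,
# non-zero otherwise (1 for no match, 2 if the file cannot be read).
wal_segment_removed() {
    log_file="$1"
    grep -q 'requested WAL segment [0-9A-F]* has already been removed' "$log_file"
}
```

On a follower it could be called as: wal_segment_removed /var/log/chef-backend/postgresql/VERSION/current && echo "follower needs a full recovery"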

You can recover the followers (1807 and 1806 in this example) one at a time. On the current leader, run the following, replacing FOLLOWER_NAME with the name of the follower you want to recover:

chef-backend-ctl remove-node FOLLOWER_NAME


Then, on the follower you just removed, start a recovery by running the following, replacing LEADER_IP_ADDRESS with the IP address of the current leader:


chef-backend-ctl join-cluster LEADER_IP_ADDRESS --recovery
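Because the two commands run on different machines, it can help to print them with the placeholders filled in before running anything. The sketch below does only that and never calls chef-backend-ctl itself. The function name print_recovery_steps is hypothetical, and 192.0.2.10 is a documentation-range address standing in for your leader's real IP.

```shell
# Hypothetical helper: print the two recovery commands for one follower,
# with the placeholders filled in. It runs nothing on the cluster.
# $1: follower name as known to the cluster
# $2: IP address of the current leader
print_recovery_steps() {
    follower_name="$1"
    leader_ip="$2"
    echo "on leader:   chef-backend-ctl remove-node ${follower_name}"
    echo "on follower: chef-backend-ctl join-cluster ${leader_ip} --recovery"
}

# Example with follower 1807 and an assumed leader address.
print_recovery_steps 1807 192.0.2.10
```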


Try this with either follower first. That follower should be able to resync from the leader, and the cluster should then start working again, because two healthy nodes out of three are enough to start serving requests. You can then repeat the same steps for the other follower. If the cluster is still broken, submit a ticket with support and they can take a look.
